An Intro to Kernel Development - MMU - Part 4


Let's see more Debugging...

In the previous post we went through the MMU table layout and the structure of the entries. When I reread it, one thing bothered me: why does an L1 entry cover 1 GB? At least, that's how I phrased it. Today, let's understand how the VA is split and how the range covered by each table index is calculated.

When we define the layout in TCR_EL1, we specify how the VA is split. In my example that is a 48-bit VA with a 4 KB page size.

T0SZ and T1SZ are fields of TCR_EL1 that set the VA size (for TTBR0 and TTBR1 respectively). The region is 2^(64 - TxSZ) bytes.
TG0 and TG1 are fields of TCR_EL1 that set the page (granule) size.

For 4 KB pages, the levels are fixed:

VA bits [47:39] - L0, each entry covers 512 GB
VA bits [38:30] - L1, each entry covers 1 GB
VA bits [29:21] - L2, each entry covers 2 MB
VA bits [20:12] - L3, each entry covers 4 KB
VA bits [11:0] - offset within the page

For 16 KB pages:

VA bit [47] - L0, each entry covers 128 TB
VA bits [46:36] - L1, each entry covers 64 GB
VA bits [35:25] - L2, each entry covers 32 MB
VA bits [24:14] - L3, each entry covers 16 KB
VA bits [13:0] - offset within the page

That answers the 1 GB question: with 4 KB pages an L1 entry is selected by bits [38:30], so every address that differs only in bits [29:0] goes through that same entry, which is 2^30 bytes = 1 GB.

If we set T0SZ to 16, then 64 - 16 = 48, giving a 48-bit address space. With a 48-bit VA and 4 KB pages we use the entire bit range up to bit 47: bits [47:12] index into the tables, so all four levels are used for indexing.

If we set T0SZ to 34, then 64 - 34 = 30, so only 30 bits are translated. With 4 KB pages, bits [29:21] index L2 and bits [20:12] index L3, so the walk starts at Level 2.

Now let's say we want to map VA 0x80000000 to PA 0x90000000. What do we do?
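Before answering that, the per-level ranges and the T0SZ rule above can be condensed into a small C sketch. It assumes a stage 1 walk without FEAT_LVA/FEAT_LPA2 (so T0SZ is at least 16) and models only the TTBR0 fields of TCR_EL1. The names (granule, level_shift, entry_span, start_level, va_index, tcr_t0_fields) are hypothetical helpers for illustration, not taken from the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Granule size as log2(page size): 4 KB = 2^12, 16 KB = 2^14. */
enum granule { GRAN_4K = 12, GRAN_16K = 14 };

/* Each table is one page of 8-byte descriptors, so every level resolves
 * (page_shift - 3) VA bits: 9 bits for 4 KB, 11 bits for 16 KB. */
static unsigned bits_per_level(enum granule g)
{
    return (unsigned)g - 3;
}

/* Lowest VA bit used to index `level` (0..3). */
static unsigned level_shift(enum granule g, int level)
{
    assert(level >= 0 && level <= 3);
    return (unsigned)g + (unsigned)(3 - level) * bits_per_level(g);
}

/* Bytes covered by one entry (block or table) at `level`. */
static uint64_t entry_span(enum granule g, int level)
{
    return (uint64_t)1 << level_shift(g, level);
}

/* First level of the walk for a (64 - t0sz)-bit VA. */
static int start_level(unsigned t0sz, enum granule g)
{
    assert(t0sz >= 16 && t0sz < 64);
    unsigned va_bits = 64 - t0sz;
    assert(va_bits > (unsigned)g);
    unsigned per = bits_per_level(g);
    /* Levels needed to resolve bits [va_bits-1 : page_shift], rounded up. */
    unsigned levels = (va_bits - (unsigned)g + per - 1) / per;
    assert(levels >= 1 && levels <= 4);
    return 4 - (int)levels;
}

/* Table index of `va` at `level`: drop bits above the VA size, shift the
 * level's field down, then mask it to bits_per_level bits. */
static unsigned va_index(uint64_t va, unsigned t0sz, enum granule g, int level)
{
    assert(t0sz >= 16 && t0sz < 64);
    uint64_t va_mask = ((uint64_t)1 << (64 - t0sz)) - 1;
    uint64_t idx_mask = ((uint64_t)1 << bits_per_level(g)) - 1;
    return (unsigned)(((va & va_mask) >> level_shift(g, level)) & idx_mask);
}

/* TTBR0 half of TCR_EL1: T0SZ in bits [5:0], TG0 in bits [15:14].
 * TG0: 0b00 = 4 KB, 0b10 = 16 KB (0b01 = 64 KB).
 * Careful: TG1 (bits [31:30]) uses a different encoding,
 * 0b10 = 4 KB and 0b01 = 16 KB. */
static uint64_t tcr_t0_fields(unsigned t0sz, enum granule g)
{
    uint64_t tg0 = (g == GRAN_16K) ? 0x2 : 0x0;
    return (uint64_t)(t0sz & 0x3F) | (tg0 << 14);
}
```

With T0SZ = 16 and 4 KB pages, va_index(0x80000000, 16, GRAN_4K, level) gives 0, 2, 0, 0 for L0 to L3, the same indexes the walk-through below extracts by hand. With GRAN_16K the same VA lands in L1 index 0 and L2 index 64, and start_level(34, GRAN_4K) returns 2, matching the T0SZ = 34 case.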
First, we need to know the VA size and the page size. Let's take a 48-bit VA and a 4 KB page.

With 4 KB pages, everything except the last 12 bits [11:0] (the page offset) is used for indexing, and every level uses 9 bits (512 entries per table). Because we selected a 48-bit VA, the MMU starts the walk from L0, so we need index values for L0 to L3.

The input VA is 0x80000000, or 0x0000000080000000 as a full 64-bit value. Bits [47:12] are used to index into the tables:

To index into the L0 table, we want bits [47:39]: (0x80000000 >> 39) & 0x1FF = 0
To index into the L1 table, we want bits [38:30]: (0x80000000 >> 30) & 0x1FF = 2
To index into the L2 table, we want bits [29:21]: (0x80000000 >> 21) & 0x1FF = 0
To index into the L3 table, we want bits [20:12]: (0x80000000 >> 12) & 0x1FF = 0

We just shift the VA and AND it with a mask to get the value in that bit range, and use that as the index into the table. The walk ends either at a block descriptor in one of the levels or at the final page entry in L3. Note that 0x90000000 is not 1 GB aligned, so this mapping cannot be a single L1 block; a 2 MB block at L2 (or 4 KB pages at L3) works.

For a 16 KB granule (14-bit offset instead of 12), each level uses 11 bits (mask 0x7FF), except L0, which only has bit [47]:

L0: >> 47 (bit [47])
L1: >> 36 (bits [46:36])
L2: >> 25 (bits [35:25])
L3: >> 14 (bits [24:14])

In case you get a translation fault, check the QEMU log:

cat qemu-1.log | grep -2 'with ESR' | head

Taking exception 3 [Prefetch Abort] on CPU 0
...from EL1 to EL1
...with ESR 0x21/0x86000005
...with FAR 0x80000880
...with SPSR 0x200003c5
--
Taking exception 3 [Prefetch Abort] on CPU 0
...from EL1 to EL1
...with ESR 0x21/0x86000005
...with FAR 0x80001200

Here we tried to fetch an instruction from address 0x80000880 (the FAR), and the ESR value is 0x86000005. The 0x21 before the slash is the exception class (an instruction abort taken at the same EL), and the low 6 bits of the ESR, 0x05, are the fault status code. You can interpret it using the ARM manual; here is a quick summary of the useful codes (we got 0x5, an L1 translation fault):

0x0 - Address size fault, level 0
0x1 - Address size fault, level 1
0x2 - Address size fault, level 2
0x3 - Address size fault, level 3
0x4 - Translation fault, level 0
0x5 - Translation fault, level 1
0x6 - Translation fault, level 2
0x7 - Translation fault, level 3
0x9 - Access flag fault, level 1
0xD - Permission fault, level 1
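To make the fault-code table above easier to apply, here is a small C sketch that splits an ESR_EL1 value into its exception class (bits [31:26]) and fault status code (bits [5:0]) and decodes codes 0x00 to 0x0F. It only names the four abort exception classes, leaves every other code undecoded, and the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Exception Class, ESR_EL1 bits [31:26]. */
static unsigned esr_ec(uint64_t esr)
{
    return (unsigned)((esr >> 26) & 0x3F);
}

/* Fault Status Code, bits [5:0]: IFSC for instruction aborts, DFSC for
 * data aborts. Only meaningful for the abort ECs named in ec_name(). */
static unsigned esr_fsc(uint64_t esr)
{
    return (unsigned)(esr & 0x3F);
}

static const char *ec_name(unsigned ec)
{
    switch (ec) {
    case 0x20: return "Instruction Abort from a lower EL";
    case 0x21: return "Instruction Abort from the same EL";
    case 0x24: return "Data Abort from a lower EL";
    case 0x25: return "Data Abort from the same EL";
    default:   return "other exception class";
    }
}

enum fault_type { FAULT_ADDR_SIZE, FAULT_TRANSLATION, FAULT_ACCESS_FLAG,
                  FAULT_PERMISSION, FAULT_OTHER };

/* For FSC 0x00..0x0F, bits [3:2] select the fault type and bits [1:0]
 * the level. (0x08 and 0x0C, level 0 access flag/permission faults,
 * are only defined with FEAT_LPA2.) Anything else is not decoded here. */
static enum fault_type fsc_type(unsigned fsc)
{
    return fsc <= 0x0F ? (enum fault_type)(fsc >> 2) : FAULT_OTHER;
}

static unsigned fsc_level(unsigned fsc)
{
    return fsc & 0x3;
}

static const char *fault_type_name(enum fault_type t)
{
    static const char *const names[] = {
        "Address size fault", "Translation fault",
        "Access flag fault", "Permission fault", "other fault",
    };
    assert((unsigned)t <= FAULT_OTHER);
    return names[t];
}

/* Print the decoded pieces of an ESR/FAR pair taken from the QEMU log. */
static void print_abort(uint64_t esr, uint64_t far)
{
    unsigned ec = esr_ec(esr), fsc = esr_fsc(esr);
    printf("EC 0x%02X: %s\n", ec, ec_name(ec));
    if (fsc_type(fsc) == FAULT_OTHER)
        printf("FSC 0x%02X: %s\n", fsc, fault_type_name(FAULT_OTHER));
    else
        printf("FSC 0x%02X: %s, level %u\n", fsc,
               fault_type_name(fsc_type(fsc)), fsc_level(fsc));
    printf("FAR 0x%llx\n", (unsigned long long)far);
}
```

Calling print_abort(0x86000005, 0x80000880) with the values from the log prints:

EC 0x21: Instruction Abort from the same EL
FSC 0x05: Translation fault, level 1
FAR 0x80000880

So the L1 entry covering 0x80000880 was missing; with a 48-bit VA and 4 KB pages that is L1 index (0x80000880 >> 30) & 0x1FF = 2.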
https://github.com/michealkeines/kernel_fuzzing/blob/main/kernel-simple-mmu/core/mmu.c

I have implemented 48-bit versions for both 4 KB and 16 KB pages. The 4 KB version worked perfectly, but the 16 KB version didn't work no matter what I did. I was completely fed up, because it felt like I had set everything correctly. While debugging, I found that I was getting an L1 translation fault. I filled every index in L1 with a block entry and then it worked, and I narrowed it down to the index that mattered, which was 2. That is the index we use with 4 KB pages; with 16 KB pages, bits [46:36] of an address around 0x80000000 would give L1 index 0. Then I kept checking whether my TCR register was correct, in case I had missed a bit there, but nothing: everything looked right, yet the MMU still used the 4 KB page index. I completely gave up and googled it, and found a mailing list post saying that 16 KB pages are not supported on the ARM CPUs QEMU emulates. I don't know if this is true, but it looks like it.

References:
Translation granule (ARM Developer)
AArch64 MMU Programming (Lowenware Blog)
The Starting Level of Address Translation (ARM Developer)
Image – address translation diagram (CSDN)
ESR_EL1: Exception Syndrome Register (ARM Developer)