Since I posted the article about malware using the 0x33 segment selector to execute 64-bit code in a 32-bit (WOW64) process, a few people have asked me how the segment selector actually works deep down (a lot of people think it’s software based). For those who haven’t read the previous article, I suggest you read it first: http://www.malwaretech.com/2013/06/rise-of-dual-architecture-usermode.html
Global Descriptor Table
The global descriptor table (GDT) is a structure used by x86 and x86_64 CPUs. The structure resides in memory, consists of multiple 8-byte descriptors, and is pointed to by the GDTR register. Although GDT entries can be segment descriptors, call gates, task state segments, or LDT descriptors, we will focus only on segment descriptors, as they are the ones relevant to this article.
A segment descriptor uses a ridiculous layout for backwards compatibility reasons. There is a 4-byte segment base address stored across bytes 3, 4, 5, and 8; the segment limit is 2 and a half bytes, stored across bytes 1, 2, and half of byte 7; the descriptor flags are the other half of byte 7, and the access flags are byte 6. That’s probably pretty confusing, so I’ve made an example image.
|(A segment descriptor) Fragmentation is cool now.
The only part of the segment descriptor that is relevant for this article is the “Flags” part, which is a total of 4 bits:
- Granularity (if 0, the segment limit is in 1 Byte blocks; if 1, the segment limit is in 4 Kilobyte blocks).
- D/B bit (If 0, the segment is 16-bit; if 1, the segment is 32-bit).
- L Bit (If 0, the D/B bit is used; if 1, the segment is 64-bit and D/B bit must be 0).
- AVL bit (available for use by system software; the CPU itself doesn’t use it).
|The 4 bit “Flags” part of the segment descriptor.
In real mode, registers are 16-bit, which means the CPU should only be able to address 2^16 bytes (64 KB) of memory, yet that’s not the case. The CPU has a 20-bit address bus, which allows it to address 2^20 bytes (1 MB), but how is that achieved? The CPU has segment registers, which are also 16-bit; the segment register is multiplied by 16 (shifted left 4 bits) then added to the address in order to give a 20-bit address, allowing the whole 1 MB of memory to be accessed.
Protected mode segmentation is significantly different: the segment register doesn’t hold a segment address at all, it holds a selector, which is split up into 3 parts: Index, TI (Table Indicator), and RPL (Requested Privilege Level):
- Index (13 bits) specifies which GDT/LDT descriptor to use: 0 for the 1st, 1 for the 2nd, etc.
- TI (1 bit) specifies which descriptor table should be used (0 for GDT, 1 for LDT).
- RPL (2 bits) specifies the requested CPU protection ring (ring 0 to 3). In the case of the code segment register, this is how the CPU keeps track of which privilege level the current code is executing at.
Segment selector format
When you switch into 64-bit mode by doing a “CALL 0x33:Address” or “JMP 0x33:Address”, you’re not actually changing segments in the real-mode sense, you’re only changing the segment selector. The segment selector for 32-bit code is 0x23, so by changing the selector to 0x33 you’re not modifying the TI or RPL, only changing the index from 4 to 6. (If you’re wondering how the indexes are 4 and 6 rather than 0x23 and 0x33, it’s because the low 3 bits hold the TI and RPL: 0x23 (00100011) is actually RPL = 3, TI = 0, Index = 4, and 0x33 (00110011) is actually RPL = 3, TI = 0, Index = 6.)
|A visual representation of the above.
So, changing the code segment register doesn’t necessarily mean you’re changing segments like it would in real mode; it depends entirely on what the selector’s corresponding descriptor says. As we know, 0000000000100 is binary for 4 and 0000000000110 is binary for 6, so we need to pull GDT entries 4 and 6.
Here we can see the only difference between the entries for the 32-bit and 64-bit selectors is in the “Limit” and “Flags” fields. The limit is easily explained: it’s 0 for the 64-bit entry because there is no limit, and the 32-bit limit is 0xFFFFF because the granularity bit is set, scaling the limit to 4 KB units and making the segment span the full 4 GB (the maximum addressable space using 32-bit registers). To understand the difference in the Flags field, we’ll have to view the individual bits.
Here we can see the Granularity and D/B bits are set for entry 4, but for entry 6 they’re not. Entry 6 also has the L bit set. Why? The L bit means the CPU should be in 64-bit mode when this segment descriptor is in use, so the D/B bit must be 0, as the CPU is in neither 16-bit nor 32-bit mode. The granularity bit is 0 because the descriptor has no limit set, as we showed earlier, so the limit granularity is irrelevant. So there you have it: both segment descriptors point to exactly the same address; the only difference is that when the 0x33 (64-bit) selector is set, the CPU will execute code in 64-bit mode. The selector is not magic and doesn’t tell Windows how to interpret the code; it actually makes use of a CPU feature that allows the CPU to easily be switched between x86 and x64 mode using the GDT.
If you’re interested in how to dump GDT entries, you need to set up a virtual machine and remotely debug it with WinDbg (you can’t use the local kernel debugger). Once you’re connected remotely, you can do “dg 0x23” to dump the entry for segment selector 0x23, and it will output it as pretty text. If you want to get the raw bytes for the entry, you’ll need to do “r gdtr” to get the address of the GDT from the GDT register, then do “dq (GDT_Address+(0x8*SELECTOR)) L1”, for example: “dq (fffff80000b95000+(0x08*4)) L1”.