Monday, June 3, 2013

Rise of the dual architecture usermode rootkit

A bit about past rootkits

In the past it was very common to see usermode rootkits that only targeted one architecture, usually 32-bit. A standard rootkit injects code into specific (or all) running processes in order to modify code inside them, which allows it to hide itself by manipulating the results the program returns. The problem arose when 64-bit systems were introduced: a 64-bit operating system means a 64-bit kernel, 64-bit addresses and 64-bit registers. In order for 32-bit applications to run on a 64-bit system, a subsystem was introduced which we know as WoW64 (Windows on Windows 64-Bit). WoW64, in short, is responsible for emulating the 32-bit code and translating 32-bit calls to 64-bit so that the system can call kernel functions.

This layer makes it nearly impossible to interact with 64-bit processes from a 32-bit process (running under WoW64) in the way a rootkit would need to (writing process memory, creating threads). Because of this, most rootkit developers have opted to only support 32-bit systems, as they are more widespread, while others have opted to write two separate rootkits (one for each architecture) and pack them into a 32-bit dropper.

32-bit System Calls

On 32-bit systems, a call to a kernel function is made by setting up a few registers and then executing an instruction to enter kernel mode. Here is an example: NtOpenFile in ntdll on Windows XP 32-bit.

[Figure: Example of a typical ntdll function on 32-bit Windows XP]

All this is doing is setting EAX to the ordinal of NtOpenFile within the SSDT (System Service Dispatch Table), then calling the address pointed to by 0x7FFE0300, so let's see what it points to.

> dd 0x7FFE0300 L1
7ffe0300  77ab64f0

> u 0x77AB64F0 L3
77ab64f0 8bd4         mov    edx,esp
77ab64f2 0f34          sysenter
77ab64f4 c3             ret

As we can see it is a simple stub that stores the current stack pointer into EDX, then uses SYSENTER to transfer execution to the kernel. 
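To make the role of EAX concrete, here is a toy model of the dispatch step in Python. It is only a sketch: the table contents, the ordinals and the handler functions are all made up for illustration, and the real SSDT lives in kernel memory and is nothing like a Python list.

```python
from typing import Sequence

# Hypothetical kernel routines; the names mirror real services but the
# bodies just return a string so the dispatch is visible.
def nt_close(args: Sequence[int]) -> str:
    return f"NtClose{tuple(args)}"

def nt_open_file(args: Sequence[int]) -> str:
    return f"NtOpenFile{tuple(args)}"

# Toy stand-in for the SSDT: routines indexed by ordinal (ordinals invented).
TOY_SSDT = [nt_close, nt_open_file]

def system_call(eax: int, stack_args: Sequence[int]) -> str:
    """Model of the kernel side: EAX selects the routine, while the
    arguments are found through the stack pointer saved in EDX."""
    if not 0 <= eax < len(TOY_SSDT):
        raise ValueError(f"invalid service ordinal {eax}")
    return TOY_SSDT[eax](stack_args)

print(system_call(1, [0x10, 0x20]))  # NtOpenFile(16, 32)
```

The point is only that the usermode stub never names the kernel function directly: it passes a number, and the kernel looks the routine up.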

64-bit System Calls from WoW64

On 64-bit systems, when a process running under WoW64 needs to make a call to the kernel, things are slightly more complicated. The 32-bit process has two versions of ntdll loaded: a 32-bit one, which is exposed to the process, and a 64-bit one, which sits behind the WoW64 layer. Let's take a look at NtOpenFile in the 32-bit ntdll.

[Figure: Example of a typical ntdll function on 64-bit Windows XP (WoW64)]

This is similar to the 32-bit version: it loads the SSDT ordinal into EAX. However, in this call we also see that it loads the address of the first argument into EDX, ready to be translated. Let's see what FS:0xC0 points to (to avoid confusion, I will add that the FS segment points to the WoW64 TEB (Thread Environment Block), which on my OS is at address 0x7EFDD000).

> dd 0x7EFDD000+0xC0 L1
7efdd0c0  749b2320

> u 0x749b2320 L1
749b2320 ea1e279b743300 jmp 0033:wow64cpu!CpupReturnFromSimulatedCode: (749b271e)

What we find here is a far jump to an address with the segment selector 0x33. Now let's take a look at what's at the target of the JMP.
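Before following the jump, the seven instruction bytes from the disassembly (ea1e279b743300) can be unpacked by hand to confirm the selector and the target. This is a minimal Python sketch assuming the standard 32-bit direct far jump encoding (opcode EA, then a 4-byte offset, then a 2-byte selector, both little-endian); the function name is my own.

```python
import struct

def decode_far_jmp(code: bytes):
    """Decode a 32-bit direct far jump (opcode EA, ptr16:32 operand).

    Returns (selector, offset)."""
    if len(code) < 7 or code[0] != 0xEA:
        raise ValueError("not an EA far jump")
    # Little-endian: the 4-byte offset comes first, then the 2-byte selector.
    offset, selector = struct.unpack_from("<IH", code, 1)
    return selector, offset

raw = bytes.fromhex("ea1e279b743300")
selector, offset = decode_far_jmp(raw)
print(f"jmp {selector:04x}:{offset:08x}")  # jmp 0033:749b271e
```

The decoded values match what the debugger printed: selector 0x33 and target 0x749b271e.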

> u 0x749B271E L4
749b271e 67448b0424            mov     r8d,[esp]
749b2723 458985bc000000     mov     [r13+0xbc],r8d
749b272a 4189a5c8000000     mov     [r13+0xc8],esp
749b2731 498ba42480140000 mov     rsp,[r12+0x1480]

Would you look at that, it's 64-bit code! If we continue to disassemble this function, we can see that it sets up the stack and converts the arguments before ending up at the same place our 32-bit counterpart ended up.

> u 0x749B2DD0 L3
749b2dd0 4189adb8000000    mov     [r13+0xb8],ebp
749b2dd7 0f05                       syscall
749b2dd9 c3                          ret
Home sweet home! You will notice this code is slightly different from the previous stub because we are in 64-bit mode. It is important to note that here we have SYSCALL instead of SYSENTER. This comes down to CPU make and current mode: SYSCALL is available in 64-bit mode on both Intel and AMD processors, whereas AMD processors do not support SYSENTER in long mode. Either way, they both essentially do the same thing: transfer execution to kernel mode.
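The two stubs can be told apart from their raw bytes alone, since SYSENTER encodes as 0F 34 and SYSCALL as 0F 05. The sketch below is a naive byte scan in Python, not a disassembler, so on arbitrary code it could match those byte pairs inside other instructions; the helper name is hypothetical and the inputs are the exact bytes from the two listings above.

```python
# Two-byte opcodes of the instructions that enter kernel mode.
KERNEL_TRANSITIONS = {
    b"\x0f\x34": "sysenter",
    b"\x0f\x05": "syscall",
}

def find_kernel_transition(code: bytes):
    """Return (mnemonic, offset) of the earliest transition opcode, or None.

    Naive substring search: fine for tiny known stubs, unreliable on
    arbitrary code because it ignores instruction boundaries."""
    best = None
    for pattern, name in KERNEL_TRANSITIONS.items():
        pos = code.find(pattern)
        if pos != -1 and (best is None or pos < best[1]):
            best = (name, pos)
    return best

stub32 = bytes.fromhex("8bd40f34c3")            # mov edx,esp / sysenter / ret
stub64 = bytes.fromhex("4189adb80000000f05c3")  # mov [r13+0xb8],ebp / syscall / ret

print(find_kernel_transition(stub32))  # ('sysenter', 2)
print(find_kernel_transition(stub64))  # ('syscall', 7)
```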

What does it all mean

Well, performing a far jump (or possibly a far call) using the segment selector 0x33 (64-bit code segment), as seen with "jmp 0033:wow64cpu!CpupReturnFromSimulatedCode", breaks you out of WoW64 and into the magical land of 64-bit code. Because you can choose where the jump lands after it breaks out of WoW64, it is actually possible to "jmp 0033:LocationWithinMyProgram", which effectively switches your WoW64 process into a 64-bit process until you switch back by performing a jump with the segment selector 0x23 (32-bit code segment).
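The values 0x33 and 0x23 are not magic numbers. An x86 segment selector packs three fields: the requested privilege level in bits 0-1, a table indicator in bit 2 (0 for the GDT, 1 for the LDT), and the descriptor index in the remaining bits. A short Python sketch (the function name is my own) splits the two selectors apart:

```python
def decode_selector(selector: int) -> dict:
    """Split an x86 segment selector into its three fields."""
    return {
        "index": selector >> 3,                       # descriptor table entry
        "table": "LDT" if selector & 0b100 else "GDT",
        "rpl": selector & 0b11,                       # requested privilege level
    }

print(decode_selector(0x33))  # {'index': 6, 'table': 'GDT', 'rpl': 3}
print(decode_selector(0x23))  # {'index': 4, 'table': 'GDT', 'rpl': 3}
```

Both selectors request ring 3 and both refer to the GDT; they simply pick different descriptors, one describing a 64-bit code segment and the other a 32-bit one.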

A full tutorial for this method was posted on VxHeavens about 7 years ago, giving it the name Heaven's Gate; however, only recently has it started to be abused by rootkit developers. Because this method allows a 32-bit (WoW64) process to execute 64-bit code, it is now being used by rootkit developers to inject into both 64-bit and 32-bit processes from a single 32-bit process on WoW64-compatible systems. However, the method isn't without its downsides. Once in 64-bit mode it is very difficult to load any 64-bit DLLs, leaving developers with only ntdll functions to play with. Then there's the issue of how you go about storing 32-bit and 64-bit instructions in the same application whilst still having it work on 32-bit systems.


Already this year I have witnessed the release of 3 rootkits abusing this method. I am currently trying to get a sample in order to perform further analysis, but it has been tricky because these rootkits are not attracting much interest. I guess we will have to wait and see if these rootkits become a game changer or just another flop using a method that should have stayed dead and buried.

For a more in depth explanation of how the 0x33 segment selector works, see here.