Let’s Analyze: Dridex (Part 2)

In the previous article we went over how to dump the names of most of the functions Dridex resolves dynamically to complicate analysis. Today we will use similar methods to get the other main piece of the puzzle: the encrypted strings.

Encrypted Strings

As we’ve already got a nice list of the functions called, we can look for those involved in string operations, such as MultiByteToWideChar or CompareStringA, in order to find encrypted strings. I went with CompareStringA because it takes two input strings, so there’s a better chance of finding one that has recently been decrypted.
I got lucky: the first call to CompareStringA I looked at appeared to be inside a wrapper function that the bot implements for easy string comparison (taking both strings as input parameters).
A simple string comparison function

By right clicking on the function and selecting “jump to xrefs to”, we can just pick one at random and trace back each of the two string arguments and see if any start off as encrypted. 
This one is promising because var_24 (the second parameter to our CompareString function) is a stack variable that is not written to at any point in the calling function; the only reference is the lea eax, [var_24] in the excerpt above, so it must be written somewhere inside sub_40C47C. Even more promising, right above the lea is a mov that loads a pointer to some non-ASCII data (which looks encrypted) into edx. The assumption here is that sub_40C47C decrypts the data pointed to by edx into the buffer pointed to by eax; but is that assumption correct?

Without going overboard with screenshots, I can tell you that the pointer to the empty stack variable in eax is moved into ebx, which is saved and restored by every function called from inside sub_40C47C (that is, it doesn’t change at any point during the function until the end, where its value is moved into eax and the original register is restored). If all assumptions and facts remain correct, we can put a breakpoint at the end of the function and eax should be a pointer to a string.
 Whoever said assumptions get you nowhere?
If you’re not familiar with WinDbg, the command “da poi(eax)” breaks down to “da” (display as ASCII) and “poi(eax)” (the pointer-sized value stored at the address held in eax), which should be (and is) the address of our decrypted string!
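To make the “da poi(eax)” step concrete, here is a minimal Python sketch of the same two reads over a toy memory image: first a 32-bit little-endian pointer is fetched from the address in eax, then a NUL-terminated ASCII string is read from wherever that pointer leads. The helper names, the addresses, and the string are all made up for illustration; nothing here comes from the sample.

```python
import struct

BASE = 0x1000  # hypothetical load address of the toy memory image

# Toy memory: a dword at 0x1000 pointing to 0x1008, padding, then a C string.
MEM = struct.pack("<I", 0x1008) + b"\x00" * 4 + b"example\x00"

def read_dword(addr):
    """Equivalent of poi(addr) on 32-bit: the pointer-sized value stored at addr."""
    return struct.unpack_from("<I", MEM, addr - BASE)[0]

def read_ascii(addr):
    """Equivalent of 'da addr': bytes from addr up to the first NUL, as ASCII."""
    start = addr - BASE
    end = MEM.index(b"\x00", start)
    return MEM[start:end].decode("ascii")

eax = 0x1000                       # pretend value of eax at the breakpoint
print(read_ascii(read_dword(eax)))
```

If eax held the string address directly rather than a pointer to it, a plain “da eax” (read_ascii(eax) in this sketch) would be the matching command.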
Dridex actually uses both ASCII and Unicode strings, but the function we have found only deals with ASCII. We could go back and look for a call to a Unicode function such as CompareStringW and repeat the previous process, or we could do a binary search (ALT + B) for a byte sequence from the function and hope to find another that looks similar. I liked the look of that “push 2800h; mov ebp, edx”, so I grabbed the bytes using the WinDbg command “u 0x0040c485 L2” and entered them into the binary search (make sure to check “find all occurrences”).

Again, for those unfamiliar with WinDbg, “u <address>” disassembles the code at an address and “L2” tells the disassembler to only disassemble two instructions.
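The binary search itself amounts to scanning the code for every occurrence of a short byte sequence. Below is a rough Python sketch of that idea over a toy buffer. The pattern is one possible encoding of “push 2800h; mov ebp, edx” (68 00 28 00 00 followed by 8B EA); this is an assumption, since the same mov can also be encoded as 89 D5, so use whatever bytes your debugger actually shows. The buffer contents and base address are invented.

```python
def find_all(data, pattern, base=0):
    """Return the address of every occurrence of pattern in data."""
    hits = []
    pos = data.find(pattern)
    while pos != -1:
        hits.append(base + pos)
        # Resume one byte later so overlapping matches are not skipped.
        pos = data.find(pattern, pos + 1)
    return hits

# Assumed encoding of "push 2800h; mov ebp, edx"; verify against the real bytes.
PATTERN = bytes.fromhex("68002800008bea")

# Toy stand-in for a code section: two copies of the pattern among filler bytes.
image = b"\x90" * 5 + PATTERN + b"\xcc" * 9 + PATTERN + b"\xc3"

for addr in find_all(image, PATTERN, base=0x1000):
    print(hex(addr))
```

Each hit is only a candidate; you still need to open the surrounding function and check that it really looks like the decrypter.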

After a quick binary search we are presented with two identical functions (the one we’ve already found and another, the Unicode version); it’s almost too easy. All that’s left is to write a script similar to the one in the previous article and dump each decrypted string along with the address the decrypter was called from.

This code is pretty much the same as the previous one, except that instead of setting breakpoints on the xrefs to the decrypter function, “func_ascii” and “func_unicode” are the addresses of the last instruction of each function. We get the xref address with “PrevHead(Dword(GetRegValue(“ESP”)), 0)”, which reads the return address off the stack and then finds the instruction prior to it (the call to the decrypter).
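For readers without IDA to hand, the return-address trick can be modelled in plain Python. When the breakpoint sits on the function’s final ret, the dword at the top of the stack is the return address, and the call that got us here is the instruction that starts immediately before it. The sketch below stands in for PrevHead with a lookup over a sorted list of instruction start addresses; the helper name and every address are hypothetical, and this is not the IDAPython API itself.

```python
import bisect

def prev_head(heads, addr):
    """Return the largest instruction start strictly below addr, or None."""
    i = bisect.bisect_left(heads, addr)
    return heads[i - 1] if i > 0 else None

# Hypothetical instruction start addresses around a call site, in ascending order.
heads = [0x401000, 0x401005, 0x401007, 0x40100C, 0x40100E]

# A 5-byte call at 0x401007 pushes 0x40100C as its return address.
return_address = 0x40100C

call_site = prev_head(heads, return_address)
print(hex(call_site))
```

Looking up the previous instruction start, rather than subtracting a fixed 5 bytes, also copes with indirect calls, whose encodings vary in length.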

The following snippet creates a tuple containing the decrypted string and the address of the call. The tuple is appended to a list if it isn’t already there, which lets us output only the call_loc:dec_string combinations we’ve not already handled; it also lets us call DumpString() from the Python command line to dump all the unique decrypted string combinations.
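The bookkeeping described here can be sketched in a few lines of plain Python. This is an illustrative reconstruction based only on the description above, not the original script: record() is a hypothetical helper name, and the addresses and strings are made up. In the real script the two values would come from the debugger at each breakpoint hit.

```python
seen = []  # (call_loc, dec_string) pairs, in the order they were first observed

def record(call_loc, dec_string):
    """Store the pair if it is new; return True when it was added."""
    entry = (call_loc, dec_string)
    if entry in seen:
        return False
    seen.append(entry)
    return True

def DumpString():
    """Print every unique call_loc:dec_string pair collected so far."""
    for call_loc, dec_string in seen:
        print("%08X:%s" % (call_loc, dec_string))

# Made-up hits standing in for breakpoint events.
record(0x401007, "example-a")
record(0x401007, "example-a")   # duplicate, ignored
record(0x401007, "example-b")   # same call site, different string
DumpString()
```

A list keeps the pairs in discovery order; for a large number of hits, a set alongside the list would make the membership check cheaper.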


Once we’ve run the script and gotten some data, we might notice that multiple calls came from the same address but decrypted different strings.
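One quick way to spot these cases is to group the collected pairs by call address and keep only the addresses that produced more than one distinct string. The sketch below assumes the data is a list of (call_loc, dec_string) tuples as described earlier; the function names and sample values are hypothetical.

```python
from collections import defaultdict

def group_by_call(pairs):
    """Map each call address to the set of strings decrypted from it."""
    by_call = defaultdict(set)
    for call_loc, dec_string in pairs:
        by_call[call_loc].add(dec_string)
    return by_call

def shared_call_sites(pairs):
    """Return only the call addresses that yielded more than one string."""
    return {loc: sorted(strs)
            for loc, strs in group_by_call(pairs).items()
            if len(strs) > 1}

# Made-up data: one call site yields two strings, another yields one.
pairs = [(0x401007, "example-a"), (0x401007, "example-b"), (0x402010, "example-c")]

for loc, strs in shared_call_sites(pairs).items():
    print("%08X -> %s" % (loc, ", ".join(strs)))
```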

To resolve this, let’s look back at how the decryption function is called.

There is a number in ecx which could be some kind of ID or offset into a block, telling the decrypter which string to return. The next step would be to find some strings you’re interested in, head to the address the call came from, then disassemble it to find out how the target string is determined and where in the call chain it was decided that this specific string was needed. Once you’ve gotten all that information, you can merge some of the code from the last article and this one to comment all the places in which a given string is referenced.

Next we’ll start looking at the C&C code, which will require you to have a good handle on how the strings are referenced. I’ve walked you through the first step and explained what the next step entails, so you should have something to work on while you wait for the next article! Hint: focus on HTTP-related strings, as the C&C protocol is encrypted XML over HTTPS (you can check you’re in the right place by cross-referencing the string with calls to functions which might send data to a remote host).