Monday, April 9, 2012

Reverse Engineering AudioDr7 PDF Malware

1. Introduction
This article presents a step-by-step analysis of a malware sample we will call AudioDr7, after the URL it attempts to contact. The MD5 hash of the malware is “ca1c1adab23e5baeeb3b49e0809e4ad4” and a sample can be found at offensivecomputing.net. The malware is embedded in a PDF document. Several tools aid the analysis: tools to extract the JavaScript, execute the payload, obtain the shellcode, and later run the malicious code in an emulator and a debugger. All of these are shown later in this article.

2. The Malware
A sample of the malware analyzed in this article can be obtained at http://www.offensivecomputing.net/.

Figure 1.0 - Malware found on Offensive Computing

The analysis is performed on a system running Ubuntu 10.04. The PDF document is examined in a text editor in order to identify any suspicious objects contained within the file. In Figure 1.1 Vim is used to view the PDF file and examine its contents. Object 13 is the object shown in Figure 1.1. The extremely large content of the variable "s" strongly suggests malicious code: it is a long string of numbers that most likely represents some form of shellcode.

Figure 1.1 - Large string from object 13 from the malicious PDF
The string of numbers is followed by JavaScript code that appears to parse a string. Figure 1.2 shows the code segment following the declaration of the variable "s". After the preliminary inspection of the PDF document, the tool Jsunpack [1] is used to extract any JavaScript from the PDF to a separate file.

Figure 1.2 - JavaScript code from object 13 from the malicious PDF
Figure 1.3 displays the end of the output from Jsunpack. JavaScript is found and it is written to a separate file named “malware.exe.out”. The output contains the same information displayed in Figure 1.1 and Figure 1.2. The declaration of the variable “s” is followed by the code to parse a string.

Figure 1.3 - Output from Jsunpack executed with the malicious PDF
3. Analysis of the JavaScript
The next step in the analysis is to obtain the shellcode, if it exists within the PDF. The next tool to use is SpiderMonkey [2] or Google’s V8 JavaScript Engine [3]. Both are JavaScript interpreters that allow us to run JavaScript code. We use SpiderMonkey to execute the JavaScript contained in the file malware.exe.out. A patched version of SpiderMonkey 1.7 is also available that makes malware analysis easier: it redefines vulnerable functions and objects in order to prevent infection of the system. The patched version of SpiderMonkey 1.7 is used for this analysis alongside a file named pre.js that defines document objects so that references to them do not raise a reference error. The file pre.js can be found inside the Jsunpack folder.

Figure 1.4 - Output of SpiderMonkey executed with the malicious PDF
The command to run SpiderMonkey with pre.js and the JavaScript found in malware.exe.out is shown in Figure 1.4. Two interesting results can be obtained from SpiderMonkey. First, the pre.js file from Jsunpack identifies the exploit that the malware attempts to take advantage of. In this case it is “collab.getIcon”. The second interesting result is the set of log files created by SpiderMonkey.

Figure 1.5 - Folder containing the two log files created by SpiderMonkey
In Figure 1.5 the files “eval.001.log” and “eval.002.log” are the two files created by SpiderMonkey. The first file contains the string that is created by the parsing function in Figure 1.2.

Figure 1.6 - Contents of eval.001.log
The second file holds the result of executing the string in the first file, and from it we obtain the payload. Here we find the shellcode assigned to the variable “payload”. The patched SpiderMonkey makes it easier for us to execute the JavaScript and obtain the shellcode. If the process were done manually we would have to replace the eval and unescape calls with print statements, and the JavaScript would have to be modified and executed twice to obtain the same output.

Figure 1.7 - Snippet from the contents of eval.002.log

Figure 1.7 shows a snippet of the contents of eval.002.log. The payload starting with “%uC033” and ending with “%u0070” is copied and saved in a separate file, “payload.txt”. In order to analyze the shellcode we need to convert it to its hex representation, and for this we use a Perl script provided by “Malware Analyst’s Cookbook and DVD” [4].
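
The conversion step can be sketched in a few lines of Python. This is a minimal sketch and not the Perl script from the book: the function name unescape_payload is hypothetical, and it assumes the payload uses only the %uXXXX and %XX escape forms. Each %uXXXX unit is a 16-bit value stored low byte first, so “%uC033” becomes the bytes 33 C0.

```python
import re

# Matches either a %uXXXX unit (16-bit) or a %XX unit (8-bit).
ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")

def unescape_payload(payload: str) -> bytes:
    """Decode a %u-escaped JavaScript string into raw bytes (hypothetical helper)."""
    out = bytearray()
    for match in ESCAPE_RE.finditer(payload):
        if match.group(1) is not None:
            # %uXXXX: 16-bit value, low byte first in memory (little-endian).
            out += int(match.group(1), 16).to_bytes(2, "little")
        else:
            # %XX: a single byte.
            out.append(int(match.group(2), 16))
    return bytes(out)

if __name__ == "__main__":
    # Only the first and last units quoted in the article are used here.
    print(unescape_payload("%uC033%u0070").hex())
```

Reading payload.txt and passing its contents to this function would yield the raw shellcode bytes that the later steps work with.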

Figure 1.8 - Payload converted to shellcode with Perl Script

Figure 1.8 shows the hex and ASCII representation of the shellcode we converted from the payload string. The ASCII representation displays a URL, http://audiodr7..., that is most likely the address the malware will contact to download more malicious code. The shellcode should be saved in a separate file, named “shellcode.txt” in this example. Figure 1.9 shows the command to save the output to a separate file.

Figure 1.9 - Shellcode saved to text file named "shellcode.txt"

4. Analysis of Shellcode
The next step is to use a tool called libemu [5], which runs shellcode in an emulated environment. Libemu should report any Windows API functions that are called and provide the instructions that are executed.

Figure 1.10 - Output of libemu executed with shellcode

In Figure 1.10 the step size is 100000 and the verbose option is enabled. Libemu shows that the Windows function GetTempPathA is called by the malware and that execution stops there. Execution stops because GetTempPathA is expected to return a temporary path for the program to use; none is given, so the program cannot continue. This is one limitation of libemu. However, we can perform a manual analysis of the binary instructions of the malware using the user-level debugger Immunity Debugger [6].

The hex code is needed to load the malware into Immunity Debugger. Figure 1.8 displays the hex code, and this code is copied to a separate file named “hexdump.txt”. To obtain the hex code without the offset or ASCII information, the command in Figure 1.11 is used.

Figure 1.11 - Hex dump only of the malicious shellcode

Instead of displaying it on the screen we save it to the file hexdump.txt as shown in Figure 1.12.
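
For readers without the tool used in the figures, the same hex-only output can be approximated in Python. This is a minimal sketch under stated assumptions: the helper names hex_only and save_hex are hypothetical, and the 16-bytes-per-line layout is a choice made here, not something taken from the figures.

```python
def hex_only(data: bytes, width: int = 16) -> str:
    """Return the bytes as hex text, `width` bytes per line, no offsets or ASCII."""
    lines = [data[i:i + width].hex() for i in range(0, len(data), width)]
    return "\n".join(lines)

def save_hex(data: bytes, path: str, width: int = 16) -> None:
    """Write the hex-only dump to a text file such as hexdump.txt."""
    with open(path, "w") as handle:
        handle.write(hex_only(data, width) + "\n")
```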

Figure 1.12 - Command to output shellcode to text file in hex code format

Immunity Debugger is installed on a system running Windows XP SP2. From the hex dump file we can easily obtain an executable file by using the online Sandsprite tool “shellcode 2 exe” [7]. The hex dump is pasted into the textbox provided by the webpage, and the executable is created and downloaded to the system.

Figure 1.13 - Shellcode 2 exe web interface

The file created is named “shellcode.exe_”. This file can be opened with Immunity Debugger.

Figure 1.14 - Shellcode executable loaded into Immunity Debugger

Four keys are used in this analysis: “F8” steps through the program, “F7” steps into a function, “F2” sets a software breakpoint, and “F9” runs the program until a breakpoint is reached. For an explanation of how to use Immunity Debugger refer to Dr. Fu’s Security Blog [8].

The first interesting instruction is at the address 00401002. Here the instruction “MOV EAX, DWORD PTR FS:[EAX+30]” copies an address into the EAX register. Access through the FS segment should raise a red flag because on 32-bit Windows this region holds the Thread Environment Block, which stores critical information. The layout of this location can be verified with WinDbg: attach WinDbg to any process or executable and examine the data structure for the thread information block.

Figure 1.15 - Data structure for Thread Environment Block in WinDBG

As we can see in Figure 1.15, FS:[30] refers to the ProcessEnvironmentBlock field, which is a 32-bit pointer. The next values loaded into the EAX register come from the instructions starting at the address 00401008: first DS:[EAX+C] is read and then DS:[EAX+1C]. DS:[EAX+C] yields the address of “Ldr”, which is a pointer to _PEB_LDR_DATA. This can be verified with WinDbg.

Figure 1.16 - Data structure of _PEB in WinDBG

The second instruction, DS:[EAX+1C], now loads the address of InInitializationOrderModuleList into the EAX register. This address points to the beginning of a list of modules, and the malware will probably try to access one of these modules later. This can also be verified with WinDbg.
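
The pointer chain described above (FS:[30], then +C, then +1C) can be modelled with a toy memory map. This is only an illustrative sketch: the offsets come from the text, but the function names and every address in the example are hypothetical values chosen for illustration, not values read from the sample.

```python
# Offsets named in the text (32-bit Windows).
TEB_PEB_OFFSET = 0x30         # FS:[30] -> ProcessEnvironmentBlock
PEB_LDR_OFFSET = 0x0C         # PEB+0C  -> Ldr (_PEB_LDR_DATA)
LDR_INIT_ORDER_OFFSET = 0x1C  # Ldr+1C  -> InInitializationOrderModuleList

def read_dword(memory: dict, address: int) -> int:
    """Read a 32-bit value from a toy memory model (address -> value)."""
    return memory[address]

def find_module_list(memory: dict, teb_base: int) -> int:
    """Follow TEB -> PEB -> Ldr -> InInitializationOrderModuleList."""
    peb = read_dword(memory, teb_base + TEB_PEB_OFFSET)
    ldr = read_dword(memory, peb + PEB_LDR_OFFSET)
    return read_dword(memory, ldr + LDR_INIT_ORDER_OFFSET)

# Hypothetical addresses, chosen only to make the sketch runnable.
EXAMPLE_TEB = 0x7FFDF000
EXAMPLE_MEMORY = {
    0x7FFDF000 + 0x30: 0x7FFD7000,  # TEB -> PEB
    0x7FFD7000 + 0x0C: 0x00251EA0,  # PEB -> Ldr
    0x00251EA0 + 0x1C: 0x00251F58,  # Ldr -> first list entry
}
```

Calling find_module_list(EXAMPLE_MEMORY, EXAMPLE_TEB) performs the same three reads that the shellcode performs with its MOV instructions.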

Figure 1.17 - Data structure of _PEB_LDR_DATA
As we can see in Figure 1.17, InInitializationOrderModuleList is at the offset 1C. Next let us set a breakpoint at 0040105F. As we can see from Figure 1.18 there is a nested loop. After some analysis we can conclude that the malware has its own hash table and attempts to locate a specific function to load from kernel32.dll. At the address 0040105B the instruction CMP EDI, EAX compares the hash values, and if they are not equal the search continues. When the malware finds a match, execution falls through the JNZ instruction and continues to the instruction at 0040105F, which pops the top of the stack into the ESI register.

Figure 1.18 - Section of shellcode loaded in Immunity Debugger
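
The lookup loop described above can be sketched as follows. The article does not state which hash function this sample uses, so the rotate-right-by-13 additive hash below is an assumption (it is a scheme commonly seen in shellcode), and the function names are hypothetical. The point of the sketch is the structure: hash each export name and compare it with the stored value, as CMP EDI, EAX does.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def name_hash(name: str) -> int:
    """ASSUMPTION: rotate-right-13 additive hash; not confirmed for this sample."""
    result = 0
    for byte in name.encode("ascii"):
        result = (ror32(result, 13) + byte) & 0xFFFFFFFF
    return result

def resolve_by_hash(target: int, export_names):
    """Return the first export whose hash equals `target`, else None."""
    for name in export_names:
        if name_hash(name) == target:  # mirrors the CMP EDI, EAX comparison
            return name
    return None

if __name__ == "__main__":
    exports = ["LoadLibraryA", "GetProcAddress", "GetTempPathA", "WinExec"]
    print(resolve_by_hash(name_hash("GetTempPathA"), exports))
```

Storing only hashes lets the shellcode avoid embedding readable API names, which is why the function name only becomes visible in the debugger after the lookup succeeds.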

After the breakpoint has been set at 0040105F we can run the program to the breakpoint with the key “F9”. Continue to step through the program until the instruction ADD EAX, EBX at address 00401071. Here we find the function that the malware was searching for in the EAX register. The function is GetTempPathA, which matches the output of libemu.

Figure 1.19 - Registers of shellcode.exe at address 00401071

We continue to step through the program. Inside the function GetTempPathA the temp folder for the system is obtained and returned to the malware as a string. The figure below displays the stack contents at the address 7C822220, which is inside the function GetTempPathA. The value stored is “C:\DOCUME~1\Mario\LOCALS~1\Temp\”.

Figure 1.19 - Stack contents of shellcode.exe at address 7C822220

We continue to step through the program, and at the instruction PUSH EAX at address 0040109E we can see that the ESI register contains the path of the system temp folder followed by the file name of an executable, “e.exe”. This is most likely the file the malware wants to download.

Figure 1.20 - Immunity Debugger instructions and registers at the address 0040109E

We continue to step through the program and note the functions that are called by the malware, since they should reveal what the malware is trying to accomplish. A breakpoint is set at the POP EDI instruction to quickly find the different functions the malware will call. This location is chosen because it comes after the hash lookup routine that searches for a function; once a match is found, the function name is displayed for the EAX register.

Figure 1.21 - Immunity Debugger showing the function in EAX register

The second function called by the malware is GetProcAddress, which is exported by kernel32.dll. The function name can be seen in the EAX register in Figure 1.21. We continue to the next function by pressing “F9”.

Figure 1.22 - Immunity Debugger showing the function in the EAX register

Above in Figure 1.22 the third function called is stored in the EAX register. The function is LoadLibraryA and it is also found in kernel32.dll. If we further examine the calls to LoadLibraryA we find that two extra libraries are loaded into memory: first twain_32.dll and then urlmon.dll.

Again we execute the program to the breakpoint at 00401073 and the fourth function called is URLDownloadToFileA from the library urlmon.dll. The function can be seen in the EAX register in Figure 1.23.

Figure 1.23 - Immunity Debugger showing the function in the EAX register

Examining the call to URLDownloadToFileA we find the web address it connects to in order to download an executable. The address is “http://audiodr7...” and it is the same one that appeared in the hex dump of the shellcode in Figure 1.8. Figure 1.24 shows the stack contents at the address 772BAAD3 inside the URLDownloadToFileA function.

Figure 1.24 - Stack contents at address 772BAAD3

Again we execute the program to the previously set breakpoint by pressing “F9” and we obtain the fifth function called. The function WinExec from the library kernel32 is called and the address is stored in the EAX register. After the WinExec function is called the malware terminates and the system is infected.

Figure 1.25 - Immunity Debugger showing the function in the EAX register

5. Conclusion
Now we have an overview of what the AudioDr7 malware is trying to accomplish and which functions it attempts to call. To summarize, five important functions are called.
  1. GetTempPathA – Obtains the location of the temporary folder for the system
  2. GetProcAddress – Obtains the address of an exported function from a loaded library
  3. LoadLibraryA – Loads two extra libraries, twain_32.dll and urlmon.dll
  4. URLDownloadToFileA – Connects to the audiodr7 URL and downloads the file “e.exe” to the temp location
  5. WinExec – The last function called, in order to execute the downloaded file “e.exe”
To conclude, many tools exist to help aid in the analysis of malware. The approach described above is one way to reverse engineer malware, specifically malware that is embedded into a PDF document.

6. References
[1] Jsunpack, Available at https://code.google.com/p/jsunpack-n/
[2] SpiderMonkey, Available at https://developer.mozilla.org/en/SpiderMonkey
[3] V8 JavaScript Engine, Available at http://code.google.com/p/v8/
[4] Michael Hale Ligh et al., “Malware Analyst’s Cookbook and DVD”, Available at
[5] Libemu – x86 Shellcode Emulation, Available at http://libemu.carnivore.it/
[6] Immunity Debugger, Available at http://www.immunitysec.com/products-immdbg.shtml
[7] Shellcode 2 Exe, Available at http://sandsprite.com/shellcode_2_exe.php
[8] Dr. Xiang Fu, Malware Analysis Tutorial 4: Int2dh Anti-Debugging, Available at,

Tuesday, March 27, 2012

Jsunpack Patch for Detecting PDF JavaScript

1. Introduction
Jsunpack [1] is a great tool for examining the structure of a PDF and extracting the JavaScript embedded inside a document. Specifically, the Python script “pdf.py”, which is included in Jsunpack, handles the PDF document. The “pdf.py” script displays the objects contained within a given PDF, detects embedded JavaScript, and outputs the JavaScript functions to a separate file for analysis. However, “pdf.py” may not always detect the embedded JavaScript. An example of a PDF document that bypasses detection is examined later. An experimental approach is followed to figure out why Jsunpack does not detect the embedded JavaScript, and a solution is presented to patch Jsunpack.

2. JavaScript Detected
There are two versions of a PDF document that display “Hello World” and pop up an alert box using JavaScript code. The first is the original version, which was manually created in Notepad and displays its contents in plain text.

   
Figure 1.1 - Version 1 of the uncompressed pdf document labeled "works_original.pdf"

Notepad++ is used to view the contents of the original PDF document shown in Figure 1.1. We can see there are two objects with JavaScript tags: object 6 and object 8. Object 8 contains the actual JavaScript code that produces the alert box, which displays “This is my alert box”. We expect pdf.py to detect the JavaScript in objects 6 and 8, and it does! Figure 1.2 shows partial output of pdf.py executed with the original PDF as the input file.

Figure 1.2 - Output of pdf.py executed with version 1 of the pdf document (works_original.pdf) 

The original PDF document is labeled “works_original.pdf” since it is detected by pdf.py as containing JavaScript.

3. JavaScript Not Detected
The second document is a compressed version of the “works_original.pdf” file. The second version uses FlateDecode to compress the streams. When the “works_original.pdf” is saved in Adobe Acrobat Professional 9, the application automatically compresses and converts the original version to the compressed version. Pdf.py can be used to examine the contents. We can see that the structure of the PDF has been modified. New objects are created in the document that did not exist in the “works_original.pdf”. The compressed version is labeled “notwork.pdf” since the JavaScript is not detected by pdf.py. Figure 1.3 is the output from pdf.py with the compressed version (notwork.pdf) as the input.

Figure 1.3 - Output of pdf.py executed with version 2 of the pdf document (notwork.pdf)

A couple of interesting results can be seen in the figure above. First and most importantly, no JavaScript is detected in the compressed file. Second, not all objects are displayed, and references are included to objects that do not appear in the output. For example, object 8 has a tag “/Names” which refers to an object 13 that is not visible. To get a better idea of what is going on and what is contained in the compressed streams, the tool pdfstreamdumper [2] is used. This tool decompresses all the streams that have been encoded with filters like “FlateDecode” and presents the text in a graphical user interface.
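
FlateDecode streams are zlib-compressed data, so the decompression step can also be approximated with Python's standard zlib module. This is a rough sketch, not how pdfstreamdumper or pdf.py work internally: the function name inflate_streams is hypothetical, it locates streams with a simple regular expression rather than a real PDF parser, and it ignores every filter other than Flate.

```python
import re
import zlib

# Everything between a "stream" keyword and the following "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(pdf_bytes: bytes) -> list:
    """Return each stream body, inflated when it is zlib (FlateDecode) data."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes left after the data.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            results.append(raw)  # not Flate-encoded; keep the raw bytes
    return results

if __name__ == "__main__":
    # A tiny hand-built object, used only to exercise the function.
    sample = (b"10 0 obj\n<</Filter/FlateDecode>>\nstream\n"
              + zlib.compress(b"13 0 14 22 15 49 16 146")
              + b"\nendstream\nendobj\n")
    print(inflate_streams(sample))
```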

Figure 1.4 - Objects listed by Pdfstreamdumper for the notwork.pdf 

Pdfstreamdumper provides a list of the objects contained in the PDF, displayed in Figure 1.4. The list is consistent with the output that “pdf.py” returns, so where are the missing objects? If we examine each object and its contents we discover that the missing objects are contained within other objects. For example, object 10 contains objects 13, 14, 15 and 16.

Figure 1.5 - Contents of object 10 shown by Pdfstreamdumper for the notwork.pdf

To understand the syntax we can refer to the PDF Document Reference [3]; however, it is clear after some simple analysis. As we can see from Figure 1.5, four objects are listed consecutively as pairs of numbers. The first number of each pair is the object number and the second is the byte offset at which that object's content begins within the stream's object data. So the first pair, “13 0”, declares that object 13 comes first, at offset 0. The next pair, “14 22”, is object 14, whose content is at offset 22. The same applies to the next two pairs, “15 49” and “16 146”. If we look back at the output of pdf.py we can see the tags for object 10, including the tag that allows it to hold multiple objects.
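
The pair syntax above can be turned into a small parser. This is a minimal sketch rather than the code in pdf.py: the function name parse_object_stream is hypothetical, and it assumes the stream has already been decompressed and that the caller supplies the number of objects and the byte position where the object data begins (the “/N” and “/First” entries of an object stream dictionary in the PDF specification).

```python
def parse_object_stream(decoded: bytes, n: int, first: int) -> dict:
    """Split a decoded object stream into {object number: object bytes}."""
    # The header is a run of "objnum offset" pairs before the object data.
    header = decoded[:first].split()
    pairs = [(int(header[2 * i]), int(header[2 * i + 1])) for i in range(n)]

    objects = {}
    for index, (number, offset) in enumerate(pairs):
        start = first + offset
        # An object ends where the next one starts, or at the end of the stream.
        end = first + pairs[index + 1][1] if index + 1 < n else len(decoded)
        objects[number] = decoded[start:end].strip()
    return objects

if __name__ == "__main__":
    # Hand-built stream with two hypothetical objects, for illustration only.
    header = b"13 0 14 9 "
    body = b"<</A 1>> <</JS 15 0 R>>"
    print(parse_object_stream(header + body, n=2, first=len(header)))
```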

Figure 1.6 - Snippet from the output of pdf.py for notwork.pdf

We see that the tag “/ObjStm” allows multiple objects to be embedded in object 10, and we can confirm this by looking at the PDF Document Reference [3]. The tag “/N” tells us how many objects are included inside object 10; as we can see in Figure 1.6, and as pdfstreamdumper verifies, that number is 4. The same process can be followed to determine where objects 5 and 6, missing from the pdf.py output, are located. Object 5 is embedded in object 2. Object 6 is embedded in object 3. Figure 1.7 and Figure 1.8 show the contents of objects 2 and 3 respectively using pdfstreamdumper.

Figure 1.7 - Contents of object 2 shown in Pdfstreamdumper for the file notwork.pdf

Figure 1.8 - Contents of object 3 shown in Pdfstreamdumper for the file notwork.pdf

The locations of the missing objects are known and this information can be used to figure out why the pdf.py script does not detect the JavaScript in the “notwork.pdf” document. The python debugger is utilized to step through the “pdf.py” functions and determine how each object is parsed, specifically object 10. This article assumes the reader knows how to use the python debugger and does not go into detail on the debugging process.

4. Results and Solution
The results of the debugging session are as follows. The Python script “pdf.py” does not handle the “/ObjStm” tag. Any object with the tag “/ObjStm” has its stream decompressed, if necessary; however, the information in the stream is not parsed by “pdf.py”. What we can do here is add code to pdf.py to handle the “/ObjStm” tag. Figure 1.9 is the code I wrote, which detects whether an object has an “/ObjStm” tag. It also checks each embedded object and determines whether JavaScript exists.
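
The idea of checking each embedded object for JavaScript can be illustrated independently of pdf.py. The sketch below is not the patch shown in Figure 1.9: the function name find_javascript and the sample object contents are hypothetical, and it simply searches each embedded object's bytes for a “/JS” or “/JavaScript” key.

```python
import re

# A /JS or /JavaScript key followed by a non-word character or end of data.
JS_TAG_RE = re.compile(rb"/(?:JavaScript|JS)\b")

def find_javascript(objects: dict) -> list:
    """Return the sorted numbers of embedded objects that carry a JavaScript tag."""
    return sorted(number for number, body in objects.items()
                  if JS_TAG_RE.search(body))

if __name__ == "__main__":
    # Hypothetical contents for objects unpacked from an object stream.
    embedded = {
        13: b"<</Names[(My Code) 15 0 R]>>",
        15: b"<</S/JavaScript/JS 16 0 R>>",
        16: b"app.alert('This is my alert box');",
    }
    print(find_javascript(embedded))
```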

Figure 1.9 - Code created for pdf.py to address objects streams in a PDF document

This code has also been submitted to Jsunpack’s source code and the patch request is pending review. Figure 1.10 displays the new output for “notwork.pdf” when executed with the modified “pdf.py” script, which includes the code shown in Figure 1.9.

Figure 1.10 - Output of modified pdf.py executed with the file notwork.pdf

As we can see in Figure 1.10, the Python script detects the JavaScript! Objects 13-16 and 5-6 were missing from the output of the unmodified version. Our modification makes those objects visible in the output and also writes the JavaScript functions to a separate file. In the example above the JavaScript is exported to a file named “notwork.pdf.out”. Overall this solution improves the pdf.py script by allowing it to handle objects in an object stream. More importantly, it detects whether JavaScript exists inside an object stream.

5. Additional Patch to Pdf.py 
Another patch that has been made to the pdf.py script is in regards to the “/Names” tag. I noticed that if the “/Names” tag includes a custom name with the tag then the parsing only captures the text of the name and not the reference number. An example is shown below.

Figure 1.11 - Snippet of the output from pdf.py for the file notwork.pdf

For object 14 there is a “/Names” tag and the output only displays the text “My Code”, which is the name given to the reference to object 15. However, the reference to object 15 does not appear. Tracing into the program with the Python debugger shows that the issue is caused by the parentheses, which stop the parsing function from capturing anything after the closing parenthesis. This is easily fixed by adding a condition to an existing “if” statement in “pdf.py”. The change is displayed in Figure 1.12.

Figure 1.12 - Code created for pdf.py to address the missing reference number for the tag "/Names" 

The tag variable is an array that contains the stream for the current object. As well as looking for the condition “\\”, I added the condition “curtag == ‘Names’”. This line now checks whether the current tag is a “/Names” tag; if it is, the parsing function continues to collect the following characters in the tag, which include the object reference number.
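
To show what a complete capture of a “/Names” entry looks like, here is a standalone sketch that extracts both the text name and the reference that follows it. It is not the change made to pdf.py in Figure 1.12: the function name parse_names is hypothetical, and the regular expression assumes names without nested or escaped parentheses.

```python
import re

# "(name) objnum gennum R": a literal string followed by an indirect reference.
NAME_REF_RE = re.compile(rb"\(([^)]*)\)\s*(\d+)\s+(\d+)\s+R")

def parse_names(tag_value: bytes) -> list:
    """Return (name, object number, generation) for each entry in a /Names value."""
    return [(m.group(1).decode("latin-1"), int(m.group(2)), int(m.group(3)))
            for m in NAME_REF_RE.finditer(tag_value)]

if __name__ == "__main__":
    print(parse_names(b"[(My Code) 15 0 R]"))
```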

Figure 1.13 - Snippet of output from the modified pdf.py for the file notwork.pdf

Figure 1.13 shows the new output of the modified “pdf.py” script which includes the reference as well as the text name which is given to the tag “/Names”.

6. References
[1] Jsunpack, Available at http://code.google.com/p/jsunpack-n/
[2] Pdfstreamdumper, Available at http://sandsprite.com/blogs/index.php?uid=7&pid=57
[3] Pdf Document Reference, Available at http://www.adobe.com/devnet/pdf/pdf_reference.html

Tuesday, March 20, 2012

ZeroAccess Rootkit - Part 2

1. Debuggers
Debugging an application means detecting and removing bugs from it. Debuggers are essential in software programming because they can help quickly identify a syntactic or logic error in a program. In the field of malware analysis, debuggers are used to study how malicious code works in order to provide a method of detection and removal. Debuggers are also used by software pirates who reverse engineer popular software to find ways to remove protections put in place by the application developers. Because software pirates rely on debuggers, developers use anti-debugging techniques as a deterrent to those who reverse engineer their code. There is no complete solution to stop a committed reverse engineer; however, anti-debugging techniques make the process more difficult, require a higher level of expertise to bypass, and increase the time needed to analyze an application [2].

Like application developers, who use anti-debugging techniques as a layer of protection for their software, malware authors also adopt these techniques for the malware they create. In this scenario anti-debugging techniques serve as a deterrent to malware analysts. The purpose is to prevent accurate analysis of the malicious code and in effect increase the lifespan of the malware.

2. Dynamic Behavior of Int2d
Many anti-debugging techniques exist; however, this section concentrates on the int2d instruction since it is frequently used in the Max++ rootkit. The int2d interrupt is a special interrupt reserved for the Microsoft kernel debugging service. It raises an exception to be handled by the kernel debugger. If the kernel debugger does not handle the exception, it is passed to user-level exception handling. When an interrupt 2d is executed, the exception address is set to the current value of the EIP register. The EIP register is the instruction pointer and always points to the next instruction. After the exception address has been recorded, EIP is incremented by one byte. An exception breakpoint is issued and the exception is either handled or not handled by an exception handler. When no debugger is attached to the system, execution resumes at the exception address; it resumes normally because the exception is assumed to have been corrected. If a debugger is present, execution continues at the EIP address, which is one byte after the exception address. The program skips one byte and this is known as a byte scission.

This difference in the observed behavior of the int2d instruction can be used to determine whether a debugger is present on the system. Since one byte is skipped, the instruction can also be used to change the execution of a program based on the debugging environment. A program may run differently when a debugger is attached to the system than when no debugger is attached. This technique proves problematic for malware analysis.

This section also explores the dynamic nature of the int2d instruction. The complexities of int2d are more than meets the eye and the factors that change its behavior are numerous. Examples of factors that can change the observed behavior of the int2d instruction are the values of the registers, the structured exception handling, whether a user-level debugger is present, and whether a kernel-level debugger is attached. Different behaviors can be observed with different combinations of these factors. An experimental approach is followed to examine the change in behavior exhibited by int2d.

3. Int2d Experiment Design
To analyze the int2d instruction, the C program in Figure 2.2 is utilized. Written by Dr. Xiang Fu [1], the Int2dExp.cc program is used in this paper to perform experiments with the int2d instruction. The file is compiled into a binary executable to later debug. The program consists of two print statements. The first print statement displays the characters “AAAA”. The second print statement displays the characters “BBBB”. Variables are also included in the code to give room to insert assembly instructions in a debugger. Immunity debugger allows us to debug the executable and modify the assembly instructions.

Figure 2.2 – C code for Int2dExp.cc

Figure 2.3 shows the int2dexp binary file opened in Immunity Debugger. The important section of the assembly instructions is shown below. From the memory address “004010DA” to “004010EF”, the variables “a” through “d” are initialized. The next two lines store the value “AAAA” and display it by calling the “printf” function from the “cygwin.dll” file at address “004010FD”. The second print statement is located at address “00401125” and displays the characters “BBBB”.

Figure 2.3 – Assembly instructions shown in Immunity debugger for int2dexp executable

The instructions are modified to incorporate the int2d instruction. To test the different behaviors we insert an int2d instruction, overwriting the previous initialization of the variables. The int2d is followed by a one-byte instruction, “INC”. To test whether the byte after int2d is skipped we include a “CMP” and a “JE” instruction. “CMP” subtracts one value from the other without storing the result and sets the flags accordingly, which the debugger displays. If the two values are equal the result is zero and the “Z flag”, which stands for “zero flag”, is set to one. If the two values are not equal the Z flag is set to zero. The “JE” instruction stands for “jump if equal” and depends on the Z flag: if the two values in the compare instruction are equal, the Z flag is 1 and “JE” jumps to the address specified. The “JE” instruction therefore lets us see whether int2d causes a byte scission.

Figure 2.4 shows the modified int2dexp with the EAX register set to one. If a byte is skipped then the “INC EAX” instruction is not executed. The EAX register retains the same value, and at the “CMP” instruction the two values remain equal. The jump is taken and execution jumps from the address “0040110D” to “0040112A”. The jump target is right after the second print statement, which prevents the characters “BBBB” from being displayed. In the case shown in Figure 2.4 only the characters “AAAA” are displayed.

Figure 2.4 - Assembly instructions in Immunity debugger for int2dexp where EAX equals one and the JE is included

Another way to run the same experiment is to replace the instruction “JE” with the instruction “JNZ”. “JNZ” stands for “jump if not zero” and does the exact opposite of the “JE” instruction: if the two values in the compare instruction are not equal to each other, then the JNZ instruction jumps to the specified address. For the same example above, if we replace JE with JNZ the program displays “AAAABBBB” instead of only “AAAA”. The “JNZ” example can be seen in Figure 2.5.
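
The logic of the two experiment variants can be modelled in a few lines of Python. This is a simplified sketch, not a translation of Int2dExp.cc: the function name run_experiment is hypothetical, it assumes the CMP instruction compares EAX against the value EAX held before int2d, and it models only whether the byte after int2d is skipped, not the other factors that influence int2d.

```python
def run_experiment(byte_skipped: bool, jump: str = "JE", initial_eax: int = 1) -> str:
    """Return what the modified program would print under this simplified model."""
    output = "AAAA"            # the first print statement always runs
    eax = initial_eax          # value loaded into EAX before int2d

    # int2d executes here; the byte that follows it is the one-byte INC EAX.
    if not byte_skipped:
        eax += 1               # INC EAX runs only when no byte is skipped

    zero_flag = (eax == initial_eax)  # CMP sets the Z flag when values are equal

    if jump == "JE":
        taken = zero_flag      # jump if equal
    elif jump == "JNZ":
        taken = not zero_flag  # jump if not zero (values differ)
    else:
        raise ValueError("jump must be 'JE' or 'JNZ'")

    if not taken:
        output += "BBBB"       # the second print is reached only without the jump
    return output

if __name__ == "__main__":
    for skipped in (True, False):
        for jump in ("JE", "JNZ"):
            print(skipped, jump, run_experiment(skipped, jump))
```

Under this model a skipped byte with JE yields “AAAA” and with JNZ yields “AAAABBBB”, which agrees with the two cases described in the text.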

Figure 2.5 – Assembly instructions in Immunity debugger for int2dexp where EAX equals one and JNZ is included

The program above allows us to test the int2d behavior against two factors. First the debugging environment is changed. The execution is examined with a user level debugger attached, a kernel level debugger attached, and with no debugger attached. The second factor that is changed is the value of the EAX register. The EAX register can be easily modified by changing the value at address “00401102”. Figure 2.6 shows an example where the EAX register is changed to the value two.

Figure 2.6 - Assembly instructions in Immunity debugger for int2dexp where EAX equals two and JE is included

4. Int2d Experiment Configuration
A VirtualBox image of Windows XP SP2 was used as the system under test; this article refers to it as the host system. The machine running WinDbg, referred to here as the guest system, was Windows 7 Home Edition. Debugger tools were installed on both systems. Below is the serial port configuration for the host system.

Figure 2.7 – Serial port configuration for the windows host system

Figure 2.8 displays the command issued to start a WinDbg session through the Windows SDK command prompt. The port must match the VirtualBox serial configuration shown above.

Figure 2.8 – Windows SDK 7.1 Command prompt and command to connect to host system

A successful connection to the host machine presents the window shown below in Figure 2.9. WinDbg executes an “int 3” interrupt on the machine by default when first connected, and the command “g”, which stands for go, resumes the execution of the host system.

Figure 2.9 – A successful connection established in WinDbg

This command is also used to continue execution of the host machine when an exception has been raised and the host system waits for the exception to be handled. This command is used in the following experiments.

5. Int2d Experiment Results
Figure 2.10 presents the results of the experiments with the int2d instruction. The values 1, 2, 3, 4, and 99 are used for the EAX register. The int2dprint program is also executed in different debugging environments, and the different combinations are listed below.

Figure 2.10 - Results for executing int2dexp.exe in various debugging environments and with different values for the EAX register

One particular area of interest is the row where the EAX register value is one and the different debugging environments are tested. Red text indicates that the “INC” instruction executed and the int2d did not cause a byte to be skipped. This behavior is observed only when a kernel debugger, in this case WinDbg, is attached to the system. When WinDbg is not attached and the EAX register value is one, the int2d interrupt does cause a byte to be skipped. This is significant because the difference in behavior can be used to determine whether a debugger is attached.

From the figure above we can see that there is a way to identify precisely which of four configurations the system is set up in. The first configuration is a kernel debugger and a user level debugger attached to the system. The second is a kernel debugger and no user level debugger attached. The third is no kernel debugger and a user level debugger attached. The last is no kernel debugger and no user level debugger attached. Each of the configurations can be identified by its unique behavior.

The configuration of no kernel debugger and no user level debugger can be identified when EAX is equal to zero. Figure 2.10 shows that only in this setup, where EAX is equal to zero, the “int2dprint” program displays no characters in the command window.

The configuration of no kernel debugger with Immunity Debugger attached can be identified when EAX is equal to two. This is the only setup where the output of “int2dprint” is immediately “AAAA” for the “JZ0” command and “AAAABBBB” for the “JNZ” command. Two other configurations print the same output, but only after the WinDbg breakpoint is resumed by the guest system.

The third configuration, a kernel debugger and no user level debugger attached, can be identified when the EAX register is equal to zero. Only in this setup is the “INC EAX” executed, and the resulting display is “AAAABBBB” for the “JZ0” command and “AAAA” for the “JNZ” command.

The last configuration, a kernel debugger and a user level debugger both attached, can also be identified, but in two steps. When EAX is equal to zero, two configurations share the same “int2dprint” result: a kernel debugger with a user level debugger attached, and no kernel debugger with a user level debugger attached. For both of these setups the result of the program is “AAAA” for the “JZ0” command and “AAAABBBB” for the “JNZ” command. To tell them apart, test the EAX value of two after the EAX value of zero. If the output is not “AAAA” for “JZ0” when EAX is equal to two, then a kernel debugger and a user level debugger are both attached. Alternatively, process of elimination can be used, since the other three configurations can be identified directly.
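
The decision procedure described in the paragraphs above can be summarized as a small Python sketch. It is a model of the reasoning only, not a detection tool: the expected outputs are taken from the prose description of Figure 2.10 for the “JZ0” variant, the function name classify_environment and its parameters are hypothetical, and in practice some combinations freeze the system instead of producing output.

```python
from typing import Optional


def classify_environment(out_eax0: str, out_eax2: Optional[str] = None) -> str:
    """Map observed output of the JZ0 variant to a debugging configuration.

    out_eax0: text printed when EAX is zero ("" if nothing was printed).
    out_eax2: text printed when EAX is two; only needed to separate the
              two configurations that have a user level debugger attached.
    """
    if out_eax0 == "":
        # Only the bare system prints no characters at all.
        return "no kernel debugger, no user level debugger"
    if out_eax0 == "AAAABBBB":
        # INC EAX executed, so no byte was skipped.
        return "kernel debugger, no user level debugger"
    if out_eax0 == "AAAA":
        # Shared result: a second run with EAX = 2 breaks the tie.
        if out_eax2 is None:
            return "user level debugger, kernel debugger undetermined"
        if out_eax2 == "AAAA":
            return "no kernel debugger, user level debugger"
        return "kernel debugger, user level debugger"
    raise ValueError(f"unexpected output: {out_eax0!r}")


if __name__ == "__main__":
    print(classify_environment(""))
    print(classify_environment("AAAABBBB"))
    print(classify_environment("AAAA", "AAAA"))
    print(classify_environment("AAAA", ""))
```

The second argument is optional because three of the four configurations are decided by the EAX equals zero run alone; only the shared “AAAA” result needs the follow-up run with EAX equal to two.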

Here lies the reason the int2d instruction serves as an anti-debugging technique. An int2d interrupt can cause a program to execute differently with a debugger attached than without one. As shown above with Immunity Debugger, when EAX equals one, “AAAABBBB” is printed when a debugger is attached and “AAAA” is printed when no debugger is attached. Malware authors use this interrupt to prevent accurate analysis of their malware.

An important note is that int2d can be used to crash a system. As shown in Figure 2.10, when the EAX register is equal to zero, the computer is in debug mode, and Immunity Debugger is not attached, executing the int2d instruction freezes the system and requires a manual reboot. A system can also be crashed with Immunity Debugger attached: if the EAX register is changed to two and int2d is executed, the system again freezes and requires a manual reboot.

6. References
[1] Dr. Xiang Fu, Malware Analysis Tutorial 4: Int2dh Anti-Debugging, Available at http://fumalwareanalysis.blogspot.com/2011_10_01_archive.html