In this series of posts I will be looking at a PDF malware attack from beginning to end. In this first post I shall analyse a malicious PDF. In subsequent posts I shall decipher the resulting JavaScript, and finally I hope to reverse any malicious binaries that are downloaded as a result of the PDF exploitation.

I obtained the malicious PDF from the Woodmann Reversing forum here. The forum post covers the JavaScript found in the PDF file. I decided to take this sample and demonstrate the steps required to reverse it and identify any malware that it downloads. I shall then attempt to reverse the binary, should the site still serve it.

For the PDF analysis, I used the excellent PDF-Tools from Didier Stevens, which can be found here. The main Python script I used was pdf-parser, shown below:

pdfparser

The next stage of the analysis is to check the statistics for the PDF file, which may reveal some interesting details about its structure. See the image below:

stats
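pdf-parser builds these statistics by actually parsing the file. As a rough illustration of the idea (closer in spirit to Didier Stevens’ pdfid.py), the Python sketch below simply counts indirect objects, streams and a few name tokens that often indicate active content. The keyword list is my own illustrative choice, not the tools' list, and this byte-level scan will miss obfuscated names (for example hex-escaped ones such as /J#61vaScript) and anything hidden inside compressed object streams.

```python
import re
from collections import Counter

# Name tokens that often indicate active content (an illustrative subset, not pdfid's full list).
SUSPICIOUS_NAMES = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]

# A PDF name ends at whitespace or a delimiter; this lookahead stops "/JS" matching inside "/JavaScript".
NAME_END = rb"(?![^\s()<>\[\]{}/%])"


def pdf_keyword_stats(data: bytes) -> Counter:
    """Count indirect objects, streams and suspicious name tokens in raw PDF bytes."""
    stats = Counter()
    # "N G obj" opens an indirect object; "endobj" is not counted because it lacks the leading numbers.
    stats["obj"] = len(re.findall(rb"\d+\s+\d+\s+obj\b", data))
    # The word boundary keeps "endstream" from being counted as a second "stream".
    stats["stream"] = len(re.findall(rb"\bstream\b", data))
    for name in SUSPICIOUS_NAMES:
        stats[name.decode()] = len(re.findall(re.escape(name) + NAME_END, data))
    return stats
```

Usage would be along the lines of `pdf_keyword_stats(open("sample.pdf", "rb").read())`, where sample.pdf is a hypothetical filename. A high object count together with non-zero /JavaScript, /JS or /OpenAction counts is the kind of result that justifies digging further.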

In the image, we can see that there are a large number of objects, and these objects may contain malicious JavaScript. pdf-parser allows us to analyse these objects further and search them for traces of JavaScript. The search is demonstrated below:

search
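A keyword search over indirect objects can be approximated in a few lines. This sketch assumes an uncompressed, reasonably well-formed file: it finds every "N G obj ... endobj" block and reports the ones whose raw bytes contain the keyword, ignoring case. It is not how pdf-parser implements its search option, only an illustration of the idea.

```python
import re
from typing import List, Tuple

# Matches "N G obj ... endobj"; (?<!\d) makes each match begin at the first digit of the object number.
OBJ_RE = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)


def search_objects(data: bytes, keyword: bytes) -> List[Tuple[int, int]]:
    """Return (object number, generation) of every object whose raw body contains keyword, ignoring case."""
    needle = keyword.lower()
    return [
        (int(m.group(1)), int(m.group(2)))
        for m in OBJ_RE.finditer(data)
        if needle in m.group(3).lower()
    ]
```

Searching for b"javascript" returns the objects that reference JavaScript, which are the ones worth examining individually.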

It is clear that there are a large number of JavaScript objects present in the PDF file, and these objects may contain malicious code. pdf-parser lets us analyse them further using the --raw flag, which makes pdf-parser output raw data and results in the following:

rawsearch

Next, we identified that object 29 contains something interesting. To investigate further I used pdf-parser with the --object option, which outputs the data of that object, as seen below:

streamdecode
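Pulling out a single object, as the --object option does, can be sketched with a regular expression. The helper below is hypothetical and deliberately naive: it returns the bytes between "29 0 obj" and the first following "endobj", so a stream that happens to contain the bytes "endobj" would cut it short.

```python
import re
from typing import Optional


def extract_object(data: bytes, obj_num: int, gen: int = 0) -> Optional[bytes]:
    """Return the raw body of indirect object 'obj_num gen obj', or None if it is absent."""
    # (?<!\d) stops object 29 from matching inside "129 0 obj"; re.S lets .*? span newlines.
    pattern = rb"(?<!\d)%d\s+%d\s+obj\b(.*?)\bendobj" % (obj_num, gen)
    m = re.search(pattern, data, re.S)
    return m.group(1).strip() if m else None
```

For this sample, `extract_object(data, 29)` would return the dictionary and stream of object 29 as raw bytes.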

This output reveals a FlateDecode stream. FlateDecode is a PDF stream filter that compresses data with the zlib/deflate method, so the stream can be decoded using zlib decompression. Thankfully, the wonderful pdf-parser can decode it for us with the --filter option.

function
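The FlateDecode step itself is plain zlib decompression. A minimal sketch, assuming we already have the raw bytes of one object (such as object 29 above) and that /FlateDecode is its only filter, so chained filters such as /ASCIIHexDecode are not handled:

```python
import re
import zlib
from typing import Optional


def decode_flate_stream(obj_body: bytes) -> Optional[bytes]:
    """Inflate the stream of an object whose only filter is /FlateDecode; None if there is nothing to decode."""
    if b"/FlateDecode" not in obj_body:
        return None
    # The stream data starts after the "stream" keyword and its end-of-line marker.
    m = re.search(rb"stream\r?\n(.*?)endstream", obj_body, re.S)
    if m is None:
        return None
    # decompressobj leaves any trailing EOL bytes in unused_data and returns partial
    # output for a truncated stream instead of raising, which helps with malformed samples.
    return zlib.decompressobj().decompress(m.group(1))
```

Using zlib.decompressobj rather than zlib.decompress is a design choice: malicious PDFs are often deliberately malformed, and partial output is more useful to an analyst than an exception.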

With the stream decoded, we begin to get an idea of the malicious JavaScript. It is clearer in the image below:

array

So, we continue to decode the streams to get a fuller picture of the content. We then attempt to reconstruct the content into JavaScript that can be executed using SpiderMonkey. I have used Didier Stevens’ modified version of SpiderMonkey, which adds some extra functionality to aid in reversing JavaScript. The reconstructed JavaScript is shown below:

shellcode1

And here is the end, which uses document.write to output the array into a text file for later analysis. SpiderMonkey will write the data into a file for us.

function2

Using SpiderMonkey, we then execute the following commands to get some legible JavaScript into a text file. See below:

list

The two files containing the data are the write.log files. Their content is shown below:

collab

Some more identifiable code can now be seen. The malware authors have attempted to make analysis more difficult by obfuscating the JavaScript variable names. However, some plaintext function names are still visible, such as Collab, collectEmailInfo and printf. These functions are recent targets of Adobe Reader exploits, so it is clear that something malicious will follow. In the next installment, we shall look deeper into what this JavaScript actually achieves.
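As a quick triage aid, decoded JavaScript such as the write.log output can be scanned for plaintext names like the ones spotted here. The list in this sketch is illustrative rather than exhaustive: Collab, collectEmailInfo and printf come from this sample, while getIcon (Collab.getIcon is another Acrobat method abused by exploits) and unescape (commonly used to build shellcode strings) are my own additions. An author who hides these names with string concatenation would defeat this simple check.

```python
import re
from typing import Dict

# Collab, collectEmailInfo and printf come from this sample; getIcon and unescape are illustrative additions.
SUSPICIOUS_NAMES = ["Collab", "collectEmailInfo", "printf", "getIcon", "unescape"]


def find_suspicious_names(js: str) -> Dict[str, int]:
    """Count whole-word occurrences of each suspicious name in a JavaScript string."""
    hits = {}
    for name in SUSPICIOUS_NAMES:
        count = len(re.findall(r"\b" + re.escape(name) + r"\b", js))
        if count:
            hits[name] = count
    return hits
```

Running it over the contents of a write.log file gives a short list of names to look up first when working out what the script does.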

Mark