Anatomy of a malicious PDF file

Date : February 09, 2010

For more than a year, attacks using malicious PDF (Portable Document Format) files have increased sharply:

During 2009, Cert-IST issued 4 Potential Danger notices to warn its community of new attacks exploiting previously unknown vulnerabilities in Adobe Reader or Adobe Acrobat (so-called "0-day" attacks).
The antivirus vendor BitDefender placed the Exploit.PDF-JS.Gen threat at the top of its "Top 10 for December 2009", where it accounts for 12.04% of all infections. This name covers all PDFs that trigger various vulnerabilities in the PDF reader's JavaScript engine in order to run malicious code on the user's computer.

To better understand this threat, this article explains how a malicious PDF file works and which measures are available to reduce the risk.

Anatomy of a malicious PDF

Malicious PDF files exploit vulnerabilities in the PDF reader (typically Acrobat Reader) to force it to run arbitrary code when the PDF file is opened. These vulnerabilities are usually buffer overflows that occur when the PDF reader parses a crafted file. They are typically found:

in PDF primitives. For example, a call to the PDF primitive "/Colors" with an argument greater than 2^24 causes a heap overflow in Adobe Reader 9.x before 9.2 (see CERT-IST/AV-2009.469, CVE-2009-3459),
in the JavaScript interpreter embedded in the PDF reader. For example, a call to the JavaScript "Collab.getIcon" method causes a stack overflow in Acrobat Reader 9.0 (see CERT-IST/AV-2009.078, CVE-2009-0927).

To exploit a memory overflow and run arbitrary code, the attacker would normally need to know the exact layout of the PDF reader's memory at the moment the overflow occurs. More often, however, the attacker uses JavaScript code that performs a "heap spray" before triggering the vulnerability. This technique makes it possible to execute arbitrary code without knowing the exact memory location to modify: the attacker fills the memory with many jump instructions leading to the attack code, which increases the probability that one of them is executed when the overflow occurs.
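Why spraying improves the attacker's odds can be illustrated with a deliberately simplified model. The sketch below assumes that the corrupted pointer lands uniformly at random inside a memory region, which real exploits do not guarantee. The function name and the sizes used are hypothetical and chosen only for illustration.

```python
def landing_probability(sprayed_bytes: int, region_bytes: int) -> float:
    """Chance that a uniformly random address in the region hits sprayed data."""
    if region_bytes <= 0:
        raise ValueError("region_bytes must be positive")
    # The spray cannot cover less than nothing or more than the whole region.
    covered = max(0, min(sprayed_bytes, region_bytes))
    return covered / region_bytes


MIB = 1024 * 1024

# A single 4 KiB payload in a 256 MiB region is almost never hit by chance.
print(landing_probability(4096, 256 * MIB))

# Once 200 MiB of that region is sprayed, most addresses lead to the payload.
print(landing_probability(200 * MIB, 256 * MIB))
```

Under this toy assumption, the probability of success is simply the fraction of the region that the spray covers, which is why attackers allocate very large blocks before triggering the overflow.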

Therefore, even though it is not strictly necessary, most malicious PDF files contain JavaScript code:

either because the targeted vulnerability lies in the JavaScript interpreter,
or because the attack technique uses JavaScript (e.g. a heap-spray attack).

To trigger that JavaScript code when the document is opened, or even later when a specific event occurs, the attacker uses one of the following PDF primitives: "/OpenAction" or "/AA" (Additional Action).
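Because these primitives appear as plain names in the file, a first-pass check can simply count them. The following is a minimal sketch rather than a real parser: it assumes the names are neither obfuscated nor hidden inside compressed streams, and the function name and the sample bytes are hypothetical.

```python
import re

# Names that trigger an action or carry JavaScript in a PDF file.
SUSPICIOUS_NAMES = (b"/OpenAction", b"/AA", b"/JavaScript", b"/JS")

# A PDF name ends at whitespace, at a delimiter, or at the end of the data.
_NAME_END = rb"(?=[\x00\s()<>\[\]{}/%]|\Z)"


def count_suspicious_names(pdf_bytes: bytes) -> dict:
    """Count occurrences of each suspicious name in the raw bytes of a PDF."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        pattern = re.escape(name) + _NAME_END
        counts[name.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts


# Hypothetical fragment: a catalog that runs a JavaScript action when opened.
sample = (
    b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
    b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
)
print(count_suspicious_names(sample))
```

A non-zero count does not prove that a file is malicious, since legitimate documents also use these names. It only indicates that the file deserves a closer look.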

Infection vectors

There are two typical scenarios in which a malicious PDF can infect a user's system:

The infection most often occurs when the victim visits a malicious web site designed to force the web browser to open the trapped PDF automatically. In this case, it is the "Acrobat Reader" browser plugin that is attacked. The attack can be very stealthy (invisible and silent) if the trapped web page embeds the PDF through a tiny "iframe" tag (e.g. 1 x 1 pixel in size).
The infection can also occur when the victim receives a trapped PDF file attached to an e-mail. The e-mail itself uses any plausible pretext to encourage the victim to open the attachment. In this case, it is the standalone "Acrobat Reader" application that is attacked.

There is a third scenario which can be used in the very specific (and uncommon) circumstance where the vulnerability also affects the extension that Adobe attaches to the Windows File Explorer application. This extension allows Windows Explorer to display a preview of a PDF file or a popup tooltip with general information when the mouse cursor is moved over the PDF file icon. In this circumstance, just viewing the content of a directory where a rogue PDF file is located (for example in a network share) is enough to trigger the infection. This infection scenario has been demonstrated in the case of the "JBIG2Decode" vulnerability (see CERT-IST/AV-2009.078, CVE-2009-0658).
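The tiny "iframe" used in the web scenario leaves a recognisable trace in the page source. The sketch below, which relies only on the Python standard library, lists iframes that point at a PDF and are at most 1 x 1 pixel. It assumes the size is given through the width and height attributes (not through CSS), and the class name, function names and URL are hypothetical.

```python
from html.parser import HTMLParser


def _is_tiny(value):
    """True when a width/height attribute is a pixel count of at most 1."""
    text = (value or "").strip().lower()
    if text.endswith("px"):
        text = text[:-2]
    try:
        return int(text) <= 1
    except ValueError:
        return False  # missing, percentage or malformed sizes are not flagged


class TinyPdfFrameFinder(HTMLParser):
    """Collects the src of iframes that load a PDF in a frame of at most 1 x 1."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attr = {key: (value or "") for key, value in attrs}
        src = attr.get("src", "")
        # Ignore any query string when checking the file extension.
        if not src.lower().split("?", 1)[0].endswith(".pdf"):
            return
        if _is_tiny(attr.get("width")) and _is_tiny(attr.get("height")):
            self.hits.append(src)


def find_tiny_pdf_frames(html: str) -> list:
    finder = TinyPdfFrameFinder()
    finder.feed(html)
    return finder.hits


page = '<p>News</p><iframe src="http://example.invalid/a.pdf" width="1" height="1"></iframe>'
print(find_tiny_pdf_frames(page))
```

An attacker can of course hide the frame in other ways (CSS, script-generated markup), so this check covers only the simple case described above.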

Protective measures

It should be noted first that it can be difficult for an antivirus program to detect malicious PDF files, because the PDF format specification gives the attacker many opportunities to obfuscate malicious code. First of all, the attacker can use lexical manipulations such as:

Replacing a character with its hexadecimal or octal equivalent,
Inserting line breaks into strings,
Inserting multiple blanks between words.
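A scanner can undo these lexical manipulations before looking for suspicious items. The sketch below shows one possible normalisation, assuming the "#xx" hexadecimal escapes of PDF names and the octal escapes and backslash line continuations of PDF literal strings. The function names are hypothetical, and corner cases such as an escaped backslash followed by digits are deliberately ignored.

```python
import re


def normalize_name(raw: bytes) -> bytes:
    """Decode #xx hexadecimal escapes in a PDF name."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        raw,
    )


def normalize_string(raw: bytes) -> bytes:
    """Undo line continuations and octal escapes in a literal string body."""
    # A backslash directly followed by an end-of-line joins the two lines.
    raw = re.sub(rb"\\(\r\n|\r|\n)", b"", raw)
    # \ddd is the character with that octal code (kept within one byte).
    return re.sub(
        rb"\\([0-7]{1,3})",
        lambda m: bytes([int(m.group(1), 8) & 0xFF]),
        raw,
    )


def collapse_blanks(raw: bytes) -> bytes:
    """Replace every run of PDF whitespace with a single space."""
    return re.sub(rb"[ \t\r\n\f\x00]+", b" ", raw)


print(normalize_name(b"/J#61vaScript"))
print(normalize_string(b"ev\\141l\\\n(x)"))
print(collapse_blanks(b"/S   /JavaScript\n\n/JS"))
```

After normalisation, the three obfuscated inputs read "/JavaScript", "eval(x)" and "/S /JavaScript /JS", which simple keyword matching can then recognise.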

It is also possible to use a JavaScript function to dynamically generate code (using the "unescape" and "eval" JavaScript functions).
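An analyst can reveal what such a script would pass to "eval" by decoding the escaped text without executing it. The following sketch reimplements the decoding rules of JavaScript's "unescape" ("%XX" and "%uXXXX" sequences) in Python. The function name is hypothetical, and the sample string is a harmless illustration.

```python
import re

_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def js_unescape(text: str) -> str:
    """Decode %XX and %uXXXX sequences the way JavaScript's unescape does."""

    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    # Anything that is not a well-formed escape is left untouched.
    return _ESCAPE.sub(replace, text)


print(js_unescape("%61%6C%65%72%74%28%31%29"))  # prints: alert(1)
```

In real samples the decoded result is often binary attack code rather than readable script, so the output should be treated as data to inspect and never be executed.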

Finally, malicious code can be packed into a PDF object called a "stream". PDF makes extensive (and legitimate) use of stream objects: for example, regular text or an image is usually stored in the PDF file as a stream (which can in fact contain any binary data) and is unpacked when the PDF file is displayed. PDF provides about half a dozen packing/unpacking functions (called "filters"), such as hexadecimal or base-85 encoding, Zlib or LZW compression, AES encryption, etc.
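Unpacking a stream means applying its filters one after another, in the order in which the stream declares them. The sketch below handles three of the filters mentioned above with the Python standard library, using the filter names from the PDF specification ("ASCIIHexDecode", "ASCII85Decode", "FlateDecode"). LZW and AES are deliberately left out, and the function names and sample payload are hypothetical.

```python
import base64
import binascii
import zlib


def _ascii_hex(data: bytes) -> bytes:
    # Keep what precedes the ">" end marker and drop all whitespace.
    digits = data.split(b">", 1)[0].translate(None, b" \t\r\n\f\x00")
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as followed by 0
    return binascii.unhexlify(digits)


def _ascii85(data: bytes) -> bytes:
    body = data.strip()
    if body.startswith(b"<~"):
        body = body[2:]
    if body.endswith(b"~>"):
        body = body[:-2]  # remove the end-of-data marker
    return base64.a85decode(body)


DECODERS = {
    "ASCIIHexDecode": _ascii_hex,
    "ASCII85Decode": _ascii85,
    "FlateDecode": zlib.decompress,
}


def decode_stream(data: bytes, filters: list) -> bytes:
    """Apply each named filter in order and return the unpacked bytes."""
    for name in filters:
        if name not in DECODERS:
            raise ValueError(f"unsupported filter: {name}")
        data = DECODERS[name](data)
    return data


# Build a doubly packed sample (Zlib, then base-85) and unpack it again.
payload = b"app.alert('hello');"
packed = base64.a85encode(zlib.compress(payload)) + b"~>"
print(decode_stream(packed, ["ASCII85Decode", "FlateDecode"]))
```

Chaining several filters in this way costs the attacker nothing, whereas a scanner has to support every filter and every combination in order to see the packed content.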

All these techniques can be combined to build very complex PDF files that the antivirus engine must reverse (it has to parse the file, put it in a canonical form, unpack the streams, etc.). And indeed, most malicious PDFs are not detected as malware by antivirus products.

The only truly effective protections against malicious PDFs are:

to keep the PDF reader up to date with security patches,
to disable JavaScript interpretation in Acrobat Reader,
and possibly, to disable the PDF web browser plugin (to avoid "drive-by download" infections when visiting web sites).

Starting with Adobe Reader 9.2 (and 8.1.7), a "blacklist" feature is available to deny execution of some JavaScript functions in PDF files. This feature, however, seems rather unattractive compared with completely disabling JavaScript.

Malicious files can be detected manually with tools such as PDFiD (a Python script that checks whether a PDF contains suspicious items, such as JavaScript code or the "/OpenAction" primitive). But these tools are still very basic, and we are not aware of any more elaborate solution that could be installed on a (web or e-mail) gateway to detect and block suspicious PDF files at the organisation border.

Conclusion

Until recently, the PDF format was deemed a safe file format. It was even used for some time as a safe alternative to the ".doc" format, which was known to be dangerous. But times have changed, and PDF has clearly proved to be very dangerous as well.

PDF is a versatile language. The ability to include JavaScript code (and Flash ActionScript since version 9 of Adobe Reader) in a PDF file increases the number of possible flaws in the reader (the JavaScript engine could be vulnerable). Moreover, it provides the ability to build sophisticated attacks. It is likely that "the worst kind of PDF attacks" is yet to come...

For more information

Malicious PDF Files – Pawel Jacewicz (NASK review 2009 – Page 28)
http://www.nask.pl/review/Review_2009a.PDF
Q&A Didier Stevens on malicious PDFs (Insecure magazine page 29)
http://www.net-security.org/dl/insecure/INSECURE-Mag-23.pdf
Anatomy of Malicious PDF Documents – Didier Stevens (Hakin9 magazine)
http://iaclub.ist.psu.edu/files/PDF_Seminar/anatomy_of_malicious_pdfs.pdf