Automated Penetration Testing with White-Box Fuzzing

By: John Neystadt

Summary: This article covers how to employ a white-box testing approach to fuzzing, leveraging available source or disassembled code of the tested software.

Overview
White-box fuzzing or smart fuzzing is a systematic methodology that is used to find buffer overruns (remote code execution); unhandled exceptions, read access violations (AVs), and thread hangs (permanent denial-of-service); leaks and memory spikes (temporary denial-of-service); and so forth.

You can perform fuzzing on any code that parses input that is received across a trust boundary. This includes files, network sockets, pipes, remote procedure call (RPC) interfaces, driver IOCTLs, ActiveX objects, and message queues (including Microsoft Windows messages).

This article presents a case study of fuzzing during the development of Microsoft Internet Security and Acceleration (ISA) Server 2006, and discusses effort, bug density, and ROI. During this release, the internal testing team found over 30 bugs that were ranked either Important or Critical, according to Microsoft Security Response Center (MSRC) criteria, in over 500 KLOC of parsing code.

By monitoring hacker conferences and forums in the years 2005–2006, one can see that security researchers and hackers are increasingly using fuzzing as one of the main techniques for finding vulnerabilities. Hackers typically practice black-box fuzzing—generating various permutations of the data, without actually correlating it with the code that parses the data.

Black-box fuzzing ([1], [2], [7]) focuses on the input format and ignores the tested software target. While this method is efficient and allows the same test tools to be reused across tested targets that share a data format, it misses significant code paths: paths that depend on configuration options, or on specific complex conditions that are governed by application logic.

In this article, we discuss how to employ a white-box testing approach to fuzzing—leveraging available source or disassembled code of the tested software.

While white-box fuzzing requires a greater testing effort than black-box fuzzing, it provides better testing coverage, achieves higher tested-software quality, and can ultimately eliminate the classes of defects that fuzzing is able to find.

Defect Types Found with Fuzzing

Fuzzing is a systematic way to find defects in code that account for the most severe security bugs. Fuzzing is capable of finding remote code execution (buffer overruns), permanent denial-of-service (unhandled exceptions, read AVs, thread hangs), and temporary denial-of-service (leaks, memory spikes).

Fuzzing finds not only defects in buffer-boundary validation, but also faults in state-machine logic, error handling, and clean-up code.
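For instance, the most common defect class, the buffer overrun, often stems from a length field that the parser trusts. The following is a minimal sketch of such a defect in C; the record layout and the name parse_record are hypothetical, chosen only to illustrate the pattern:

#include <stdio.h>
#include <string.h>

/* Hypothetical record layout: [1-byte name length][name bytes]. */
void parse_record(const unsigned char *buf, size_t buflen)
{
    char name[16];
    size_t namelen = buf[0];   /* attacker-controlled length */

    /* Defect: namelen is never checked against sizeof(name) or buflen,
       so a fuzzed length value overruns the stack buffer. */
    memcpy(name, buf + 1, namelen);
    name[namelen] = '\0';
    printf("name: %s\n", name);
    (void)buflen;              /* ignoring buflen is part of the bug */
}

int main(void)
{
    /* Well-formed input parses fine; a fuzzer that flips buf[0] to a
       large value will crash parse_record. */
    const unsigned char good[] = { 3, 'a', 'b', 'c' };
    parse_record(good, sizeof(good));
    return 0;
}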



Fuzzing Taxonomy

Dumb fuzzing: Corruption of data packets at random, without awareness of the data structure.

Smart fuzzing: Corruption of data packets with awareness of the data structure, such as encodings (for example, base-64 encoding) and relations (checksums, bits indicating the presence of optional fields, fields indicating offsets or lengths of other fields).

Black-box fuzzing: Sending of malformed data without verification of which code paths were hit and which were not.

White-box fuzzing: Sending of malformed data with verification that all target code paths were hit, modifying the software configuration and the fuzzed data to traverse all data validations in the tested code.

Generation: Automatic generation of fuzzed data, not based on any previous input.

Mutation: Corruption of valid data, according to defect patterns, to produce fuzzed data.

Mutation template: A well-formed buffer that represents an equivalence class of the input. The fuzzer takes the mutation template as input and produces a fuzzed buffer to be sent to the tested software.

Code coverage: Technology (such as that bundled in Microsoft Visual Studio 2005) that allows inspection of which code paths were executed during testing. This is useful for verifying test effectiveness and improving test coverage.
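To make the dumb/smart distinction concrete, here is a minimal sketch in C. It assumes a hypothetical packet layout of a 2-byte big-endian length field followed by a payload; the function names are illustrative only:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Dumb mutation: corrupt random bytes with no awareness of structure;
   most outputs will fail the parser's length check immediately. */
static void dumb_mutate(unsigned char *buf, size_t len, int flips)
{
    for (int i = 0; i < flips; i++)
        buf[rand() % len] ^= (unsigned char)(1 << (rand() % 8));
}

/* Smart mutation: corrupt only the payload, then re-derive the length
   field so that the relation between the fields stays intact and the
   fuzzed packet reaches the code behind the validation. */
static void smart_mutate(unsigned char *buf, size_t len, int flips)
{
    if (len <= 2)
        return;
    for (int i = 0; i < flips; i++)
        buf[2 + rand() % (len - 2)] ^= (unsigned char)(1 << (rand() % 8));
    unsigned short payload_len = (unsigned short)(len - 2);
    buf[0] = (unsigned char)(payload_len >> 8);
    buf[1] = (unsigned char)(payload_len & 0xFF);
}

int main(void)
{
    unsigned char a[] = { 0x00, 0x05, 'h', 'e', 'l', 'l', 'o' };
    unsigned char b[] = { 0x00, 0x05, 'h', 'e', 'l', 'l', 'o' };

    srand((unsigned)time(NULL));
    dumb_mutate(a, sizeof(a), 2);   /* may corrupt the length field  */
    smart_mutate(b, sizeof(b), 2);  /* length field stays consistent */

    for (size_t i = 0; i < sizeof(b); i++)
        printf("%02X ", b[i]);
    printf("\n");
    return 0;
}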

A case study that was conducted on a small Web application (450 lines of code) with four planted defects showed the following results for the different techniques. The fuzzing was done by using the technology that is described later in this article.

Technique            Effort    Code coverage    Defects found
Black box + dumb     10 min    50%              25%
White box + dumb     30 min    80%              50%
Black box + smart    2 hr      80%              50%
White box + smart    2.5 hr    99%              100%

Fuzzing Targets

Trust Boundaries
Fuzzing is a verification method for the code that processes input that is received across trust boundaries. Typical trust boundaries include:
* Files that are received from the Internet (for example, downloaded from the Web, or received as mail or news attachments).
* Network sockets.
* Pipes.
* RPC interfaces.
* Driver IOCTLs.
* ActiveX objects.

However, fuzzing can be applied to other, less typical trust boundaries—for example:
* Structured data that is stored in a database as blobs (for example, XML) and is written and read by different users.
* Configuration files that are written by one user and read by another.
* Message-queue systems (whether persisted in a database or based on Windows messages).
* Shared memory that is used to pass structured information between processes.

Primary vs. Secondary Fuzzing Targets

In a typical componentized software design, code that parses input data places the results in internal data structures or classes, and is fairly well separated from the logic that then consumes those structures. The parsing code is the primary target for fuzzing; any unverified assumption that it makes about the data structure can cause unexpected results. However, the code that further consumes the parsed data might also carry assumptions about that data, and applying fuzzing to this secondary code might find security defects, too.

Therefore, during template selection and code-coverage analysis, secondary code should be accounted for, too.
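A minimal sketch of the primary/secondary split in C follows; all of the names are hypothetical. The parser (primary target) copies an item count through unchecked, and the consumer (secondary target) trusts it:

#include <stdio.h>
#include <string.h>

/* Hypothetical parsed representation shared between components. */
typedef struct {
    unsigned int item_count;   /* taken verbatim from the wire */
    unsigned char items[8];
} Message;

/* Primary target: the parser validates framing, but passes the
   item count through without checking it against the array size. */
static int parse_message(const unsigned char *buf, size_t len, Message *out)
{
    if (len < 1)
        return -1;
    out->item_count = buf[0];
    memcpy(out->items, buf + 1,
           len - 1 > sizeof(out->items) ? sizeof(out->items) : len - 1);
    return 0;
}

/* Secondary target: the consumer assumes item_count <= 8, an
   assumption that the parser never enforced. A fuzzed count larger
   than 8 causes an out-of-bounds read here, not in the parser. */
static void consume_message(const Message *m)
{
    for (unsigned int i = 0; i < m->item_count; i++)
        printf("%02X ", m->items[i]);
    printf("\n");
}

int main(void)
{
    Message m;
    const unsigned char wire[] = { 0x03, 0xAA, 0xBB, 0xCC };
    if (parse_message(wire, sizeof(wire), &m) == 0)
        consume_message(&m);
    return 0;
}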

Should Managed Code Be Fuzzed?

Naturally, most of the defects that fuzzing can find occur in software that is developed in C or C++, languages that leave memory management to the programmer. However, fuzzing can also find defects in software that is written in languages that hide memory management, such as C#, Visual Basic, or Java. The bugs that can be found in managed code are unhandled exceptions, deadlocks, and memory spikes. Typically, these are denial-of-service (DoS) or information-disclosure class bugs.

Fuzzing Process
There are two primary ways to produce a fuzzed data buffer:
* Automatic generation of fuzzed data
* Mutation of a sample of valid data (mutation template), obtained from a capture or created by some other test automation

Generation can be split into two steps:
1. Generation of a valid mutation template
2. Mutation of the template to produce a fuzzed buffer
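Put together, the mutation path of the process can be sketched as a simple loop in C. Here, parse_packet is a stand-in for the tested code, and the template is hard-coded for brevity; in practice, templates come from captures or from test automation:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Stand-in for the parser under test. */
static int parse_packet(const unsigned char *buf, size_t len)
{
    return (len > 0 && buf[0] == 0x01) ? 0 : -1;
}

int main(void)
{
    /* A well-formed mutation template (captured or generated). */
    const unsigned char template_buf[] = { 0x01, 0x02, 0x03, 0x04 };
    unsigned char fuzzed[sizeof(template_buf)];

    srand((unsigned)time(NULL));
    for (int iteration = 0; iteration < 1000; iteration++) {
        /* Mutate a fresh copy of the template. */
        memcpy(fuzzed, template_buf, sizeof(fuzzed));
        fuzzed[rand() % sizeof(fuzzed)] ^= (unsigned char)(1 << (rand() % 8));

        /* Feed the fuzzed buffer to the tested code. A crash, hang, or
           leak here is a finding; in practice, the parser runs in a
           separate, monitored process. */
        parse_packet(fuzzed, sizeof(fuzzed));
    }
    return 0;
}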


Generation vs. Mutation
While it has been said that monkeys typing randomly would eventually produce Hamlet entirely by chance [4], the number of permutations involved makes a purely random search computationally intractable. Brute-force generation is feasible for simple parsers (see Step 1); for complex parsers, however, fuzzers would have to run for an impractically long time. That much time is what it takes to get deep inside complex parsing code: the fuzzed data must be sufficiently valid to pass the various validations and checks that otherwise trigger parsing errors and prevent the data from reaching the inner components of the parser.

A much more efficient technique for achieving good fuzzing coverage is to produce fuzzing buffers by mutating well-formed buffers, the mutation templates. Each such template represents an equivalence class of the input.

For example, consider code that contains the following statement:


if (packet->Referrer && strcmp (packet->Referrer, sMySite) == 0 &&
    packet->UserAgent && strcmp (packet->UserAgent, uaIE6) == 0 &&
    packet->WwwAuthenticate != NULL)
{
    if (packet->AcceptLanguage && strstr (packet->AcceptLanguage, "en-us") == 0)
    {
        /* Flawed assumption: AcceptCharset is present whenever the
           conditions above hold. */
        printf ("%s", packet->AcceptCharset);
    }
}


The flaw in this code is the assumption that the AcceptCharset header is always present in the request if all of the other conditions hold. To find such a flaw, an HTTP request should be attempted that contains the four headers Referrer, UserAgent, WwwAuthenticate, and AcceptLanguage, but not the AcceptCharset header. Typically, this code will crash when the NULL AcceptCharset pointer is dereferenced in the call to printf.

If our fuzzing engine can only add or delete random headers from the request, it will take a long time (if it manages at all) to generate an HTTP request that has exactly those four headers present. However, if we provide a template buffer with all of the possible HTTP headers present, the fuzzer will very quickly create the bogus request that contains all of the headers except AcceptCharset.
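A minimal sketch of that template-driven approach in C follows. The header names mirror the example above, the values are placeholders, and build_request is a hypothetical helper; a real fuzzer would send each variant to the tested parser instead of printing it:

#include <stdio.h>
#include <string.h>

/* Template: every header that the parser understands. */
static const char *headers[] = {
    "Referrer: http://www.mysite.com",
    "UserAgent: MSIE 6.0",
    "WwwAuthenticate: Negotiate",
    "AcceptLanguage: en-us",
    "AcceptCharset: utf-8",
};
enum { NUM_HEADERS = sizeof(headers) / sizeof(headers[0]) };

/* Build one request variant, omitting header 'skip' (-1 omits none). */
static void build_request(char *out, size_t cap, int skip)
{
    snprintf(out, cap, "GET / HTTP/1.1\r\n");
    for (int i = 0; i < NUM_HEADERS; i++) {
        if (i == skip)
            continue;
        size_t used = strlen(out);
        snprintf(out + used, cap - used, "%s\r\n", headers[i]);
    }
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "\r\n");
}

int main(void)
{
    char request[1024];

    /* One variant per deleted header. The variant that omits only
       AcceptCharset drives the flawed code above into the NULL
       dereference almost immediately. */
    for (int skip = 0; skip < NUM_HEADERS; skip++) {
        build_request(request, sizeof(request), skip);
        puts(request);   /* stand-in for sending to the tested parser */
    }
    return 0;
}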


