Dissecting a 19-year-old bug

Software QA FYI - SQAFYI

By: Eric Lippert

It was with a bizarre combination of nostalgia and horror that I read this morning about a 19-year-old rather severe security hole in Windows. Today I want to expand a bit on Robert Freeman’s writeup, to describe the underlying bug in more detail, the pattern that likely produced it, better ways to write the code, and whether static analysis tools could find this bug. I’m not going to delve into the specifics of how this initially-harmless-looking bug can be exploited by attackers.
What’s so safe about a SAFEARRAY?
Many of the data structures familiar to COM programmers today, like VARIANT, BSTR and SAFEARRAY, were created for “OLE Automation”; old-timers will of course remember that OLE stood for “object linking and embedding”, the “paste this Excel spreadsheet into that Word document” feature. OLE Automation was the engine that enabled Word and Excel objects to be accessed programmatically by Visual Basic. (In fact the B in BSTR stands for “Basic”.) Naturally, Visual Basic uses these data structures for its representations of strings and arrays.
The data structure which particularly concerns us today is SAFEARRAY:
typedef struct tagSAFEARRAY
{
USHORT cDims; // number of dimensions
USHORT fFeatures; // type of elements
ULONG cbElements; // byte size per element
ULONG cLocks; // lock count
PVOID pvData; // data buffer
SAFEARRAYBOUND rgsabound[1]; // bounds, one per dimension
} SAFEARRAY;
typedef struct tagSAFEARRAYBOUND
{
ULONG cElements; // number of indices in this dimension
LONG lLbound; // lowest valid index
} SAFEARRAYBOUND;

SAFEARRAYs are so-called because unlike an array in C or C++, a SAFEARRAY inherently knows the dimensionality of the array, the type of the data in the array, the number of bytes in the buffer, and finally, the bounds on each dimension. How multi-dimensional arrays and arrays of unusual types are handled is irrelevant to our discussion today, so let’s assume that the array involved in the attack is a single-dimensional array of VARIANT.
The operating system method which contained the bug was SafeArrayRedim, which takes an existing array and a new set of bounds for the least significant dimension — though again, for our purposes, we’ll assume that there is only one dimension. The function header is:
HRESULT SafeArrayRedim(
SAFEARRAY *psa,
SAFEARRAYBOUND *psaboundNew
)

Now, we do not have the source code of this method, but based on the description of the exploit we can guess that it looks something like the code below that I made up just now. Bits of code that are not particularly germane to the defect I will omit, and I’ll assume that somehow the standard OLE memory allocator has been obtained. Of course there are many cases that must be considered here — such as “what if the lock count is non zero?” — that I am going to ignore in pursuit of understanding the relevant bug today.
As you’re reading the code, see if you can spot the defect:
{
// Omitted: verify that the arguments are valid; produce
// E_INVALIDARG or other error if they are not.
PVOID pResourcesToCleanUp = NULL; // We'll need this later.
HRESULT hr = S_OK;

// How many bytes do we need in the buffer for the original array?
// and for the new array?
LONG cbOriginalSize = SomehowComputeTotalSizeOfOriginalArray(psa);
LONG cbNewSize = SomehowComputeTotalSizeOfNewArray(psa, psaboundNew);
LONG cbDifference = cbNewSize - cbOriginalSize;
if (cbDifference == 0)
{
goto DONE;
}
SAFEARRAYBOUND originalBound = psa->rgsabound[0];
psa->rgsabound[0] = *psaboundNew;
// continues below ...

Things are looking pretty reasonable so far. Now we get to the tricky bit.
Why is it so hard to shrink an array?

If the array is being made smaller, the variants that are going to be dropped on the floor might contain resources that need to be cleaned up. For example, if we have an array of 1000 variants containing strings, and we reallocate that to only 300, those 700 strings need to be freed. Or, if instead of strings they are COM objects, they need to have their reference counts decreased.
But now we are faced with a serious problem. We cannot clean up the resources after the reallocation. If the reallocation succeeds then we no longer have any legal way to access the memory that we need to scan for resources to free; that memory could be shredded, or worse, it could be reallocated to another block on another thread and filled in with anything. You simply cannot touch memory after you’ve freed it. But we cannot clean up resources before the reallocation either, because what if the reallocation fails? It is rare for a reallocation that shrinks a block to fail. While the documentation for IMalloc::Realloc() doesn’t call out it can fail when shrinking (is this a doc bug?), it doesn’t rule it out either. It seems prudent to assume that the reallocation can always fail without evidence to the contrary. In that case we have to return the original array, untouched, and deallocating 70% of the strings in the array is definitely not “untouched”.
The solution to this impasse is we have to allocate a new block and copy the resources into that new block before the reallocation. After a successful reallocation we can clean up the resources; after a failed reallocation we of course do not.

Full article...

Other Resource

... to read more articles, visit http://sqa.fyicenter.com/art/

Dissecting a 19-year-old bug