Assertive debugging: correcting software as if we meant it
By: Mark Halpern
Assertive debugging is a new way to make embedded systems ensure their own health by having your code monitor itself.
Debugging is an art that needs much further study .... The most effective debugging techniques seem to be those which are designed and built into the program itself—many of today's best programmers will devote nearly half of their programs to facilitating the debugging process on the other half; the first half... will eventually be thrown away, but the net result is a surprising gain in productivity.
—Donald Knuth, The Art of Computer Programming [1]
As Don Knuth implies, debugging is a much-neglected subject, and we're paying a terrible price for that neglect. We've made little progress in debugging methods in half a century, with the result that projects everywhere are bogged down by buggy software. The price in lost time and wasted resources, when the projects are commercial, must run into the billions; the price when the projects are military is paid not only in dollars but in lives. This situation is intolerable; new ideas and approaches must be found. This article offers one such new approach.
I propose that a new system for debugging software called the Assertive Debugging System (ADS) can transform debugging from a minor art form to a modern industrial process. ADS exploits an old idea—assertions were first suggested by John von Neumann in 1947 [2]. ADS, however, does something with assertions that neither he nor anyone else, to my knowledge, has proposed, much less done: it uses them systematically and exhaustively rather than as ad hoc tools employed only when the programmer remembers them and feels like using them. In doing so, ADS transforms assertions from an idea that's been floating around for half a century without achieving much into a technology that could effect a revolution in program development. And unlike the methods Knuth had in mind, it doesn't throw away the part of the program devoted to debugging, but preserves it as valuable documentation of the state of the subject program and for later reuse when that program is modified.
Bugs: the major bottleneck
It's nearly impossible to find a scientific or engineering project these days that doesn't depend on computing, and almost as hard to find one that's not slipping its schedule because of buggy software. The debugging problem is a critical one for nearly all our projects. The penalties we pay for buggy software are already high: lost business when our customers are dissatisfied and lost sales when our products are tardy coming to market; these penalties will get much higher as we increasingly use computers for critical applications—mission-critical and even life-critical.
In such critical applications, being able to take serious debugging measures, and to prove you've taken them, will become much more important, even legally required. For applications on which so much depends, today's half-hearted gestures at debugging will no longer be acceptable. ADS represents an approach to program debugging that directly addresses these issues: it enables developers to shorten the debugging process, and it supports the systematic and documentable debugging of software objects that I contend will soon be required just to stay in business—perhaps required just to stay out of jail.
The most remarkable thing about debugging today is how little it differs from debugging at the dawn of the modern computing age, half a century ago. We still do it by letting a faulty program run up to what we conjecture is a critical point, then stop execution and look at the state of what we think are the key variables. If one of these variables differs in value from what we expected, we try to understand how it could have assumed that value. If we can't understand where it went wrong, we repeat the process, stopping at some earlier point. After an unpredictable number of iterations of this process, we stop the program close enough to the location of the bug, and the standard revelation occurs: we find that we have forgotten to reset some counter, flush some buffer, allow for the overflow of some data structure, or have committed one of the other half-dozen classic programming errors.
This is how software was debugged in the mid '50s, and how it's debugged today. It's a process that will always, if time and customer patience permit, eventually find the bug that's troubling you—but, usually, only that particular occurrence of it, and only after a debugging effort of unpredictable length, and without leaving anyone the wiser about the program being debugged, or about how to find other such bugs.
What is a bug?
To make clear which bugs are the really troublesome bugs—the ones that ADS is meant to deal with—I offer here a rough taxonomy of software problems in general, with estimates of their relative gravity. You'll find nothing original in this taxonomy; all it does is gather and organize some common truths and put them in a form convenient for understanding ADS. Only programmer errors are considered here—problems caused by hardware failure, operator error, or other conditions not under the programmer's control are not nearly so difficult to deal with, nor so serious a problem. Programmer errors are:
1. Algorithm design errors. The programmer (or his client) has misunderstood the problem, and hence the way to solve it; consequently, his algorithm, even if implemented perfectly, will not work. For example, he may be trying to compute the orbit of an artificial Earth satellite on the assumption that the planet is perfectly spherical. His error has nothing to do with computing, but rather with his or his client's understanding of the problem he is dealing with.
2. Program design errors. Although the programmer's understanding of his problem and his approach to solving it are correct, he has blundered in designing a program to implement that solution. For example, he has failed to realize that the program he is expecting the computer to execute would take longer to run than the expected lifetime of the universe. This error is computer-related; it reflects a defect in the programmer's understanding, not of any specific computer or programming language, but of computing in general.
3. Program implementation errors. The programmer has erred in generating the instructions to be executed by his computer. Of this type, there are two varieties.
* Formal or syntactic errors. His program has violated a rule imposed by his program-development tools—but the violation is of the type caught by those tools.
* Substantive or logic errors. The program compiles but doesn't run to completion, or runs but yields bad output. The programmer has made either a mechanical error (such as a typo), a formal error of a type not caught by the development system, or—and this is the critical type—an error in detailed program logic, such as neglecting to flush a buffer or writing beyond the end of a data area. (Conceivably, he may have encountered a bug in his program-development tools. This, of course, is not his fault but the fault of a programmer—another programmer.)
Type 1 errors have nothing to do with computing; they're just plain old ignorance, carelessness, or stupidity, for which no general remedy is known. Type 2 errors are computer-related but aren't particularly troublesome; they're so gross that they're usually found early in the program's design stage, and they're relatively uncommon. Type 3a errors are already reasonably well handled—most modern program-development systems detect all the common syntactic errors and closely pinpoint them. Sometimes they can even fix them, as the program used to compose this article silently changes hte to the.
Really dangerous bugs
Type 3b errors are the real villains: easy to introduce, hard to notice, and patient in waiting for the worst possible moment to manifest themselves. The reason they're so great a problem is that they're so trivial, so inconspicuous, so hard to focus on. Type 3b bugs (henceforth just "bugs") are dangerous precisely because they're seldom immediately troublesome. A program infected with them is often asymptomatic until it crashes disastrously or yields obviously faulty output. Generally these bugs let programs run with no sign of trouble long after they have in fact corrupted the results. By the time it's evident that something is wrong, much has happened to delete or corrupt the evidence needed to determine just where the problem originated; hence the long and painful period of backtracking that the debugging process almost always begins with.
What is needed, then, to deal with the debugging problem is some way to make bugs manifest themselves quickly, so as to give us the earliest possible warning of their existence, and let us take action before continued program execution can obliterate their traces. Ideally, we would like bugs to become so blatant that their presence can be detected even before they have acted; we want to catch them when they are just about to do their dirty work. That is what ADS is designed to do.
How ADS works
The way to catch bugs while they're fresh and out in the open is by monitoring the behavior of a great variety of variables at run time, looking for violations of assertions made by the programmer when he defined them. "Variable" means here not just those quantities a mathematician would think of and label as such, but any program construct any of whose properties change in a predictable way, either absolutely or relative to some other program construct. Among these would be the numeric variables that specify how often a loop is to be traversed, how many characters a buffer can hold before it's to be written out, how many states a switch can assume, and so on; they define, collectively, the route the program is meant to follow. It's the major premise of ADS that no bug can take effect without soon causing some variable to violate a constraint, and that if such violations are systematically detected, virtually every bug will cause an alarm to sound while it's still "fresh," easily found, and understood.
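To make the idea concrete, here is a minimal sketch of such a self-monitoring variable. ADS as described would extend the programmer's own source language, so the Python class and every name in it are invented purely for illustration:

```python
# Hypothetical sketch of ADS-style run-time monitoring; MonitoredInt,
# set(), and the constraint predicate are invented, not ADS syntax.

class MonitoredInt:
    """An integer variable that checks a programmer-supplied assertion
    on every assignment, sounding the alarm the moment it's violated."""

    def __init__(self, name, value, constraint):
        self.name = name
        self.constraint = constraint   # predicate: proposed value -> bool
        if not constraint(value):
            raise AssertionError(
                f"{name}: initial value {value} violates its assertion")
        self.value = value

    def set(self, new_value):
        # Test the assertion before the change takes effect, so the bug
        # is caught while it's still "fresh".
        if not self.constraint(new_value):
            raise AssertionError(
                f"{self.name}: value {new_value} violates its assertion")
        self.value = new_value

# A loop counter asserted to stay in 0..9:
i = MonitoredInt("i", 0, lambda v: 0 <= v <= 9)
try:
    for _ in range(10):
        i.set(i.value + 1)   # the increment from 9 to 10 trips the alarm
except AssertionError as e:
    print(e)
```

Because the check runs before each assignment takes effect, the violation is reported at the moment the bug acts, not after continued execution has obliterated its traces.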
The rigorous and systematic testing of such assertions throughout execution amounts to erecting walls on both sides of the narrow path that a program must take if its results are to be correct, so that the slightest deviation from that path causes an almost immediate collision between the running program and some assertion. Consequently, something valuable is learned from every execution-time failure: a bug is found (or at least its hiding area is narrowed down significantly) or a programmer's misconception is uncovered.
For each of his program constructs, the programmer asserts at definition time all the constraints on its behavior he can think of. The possible constraints include the following; others will doubtless suggest themselves as experience in the use of ADS grows:
* its maximum and minimum size
* the step size by which it will vary
* whether it's cyclic, monotonic, or "random" in the sequence of values it can assume
* the relationships it bears to one or more other variables
* an explicit list of the values it may take on or those it may not
* for a pointer or link, the type of construct it can point to
These assertions are expressed in a notation that's a natural extension of the source language the programmer uses, and they may be grouped in various ways, so that the programmer can activate or deactivate sets of related assertions with one command.
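As a hedged illustration of how the kinds of constraints listed above might be declared at definition time (again in Python with invented names; the article's actual notation would extend the source language itself):

```python
# Illustrative only: a variable declared with several of the constraint
# kinds listed above. Every name here is an assumption, not ADS syntax.

class Monitored:
    def __init__(self, name, value, lo=None, hi=None, step=None,
                 monotonic=None, allowed=None):
        self.name = name
        self.lo, self.hi, self.step = lo, hi, step
        self.monotonic, self.allowed = monotonic, allowed
        self.value = value

    def set(self, v):
        old = self.value
        if self.lo is not None and v < self.lo:
            self._violation(v, "below asserted minimum")
        if self.hi is not None and v > self.hi:
            self._violation(v, "above asserted maximum")
        if self.step is not None and abs(v - old) != self.step:
            self._violation(v, f"changed by other than step {self.step}")
        if self.monotonic == "increasing" and v <= old:
            self._violation(v, "not monotonically increasing")
        if self.allowed is not None and v not in self.allowed:
            self._violation(v, "not in the asserted set of values")
        self.value = v

    def _violation(self, v, why):
        raise AssertionError(f"{self.name} -> {v!r}: {why}")

# A buffer index that must climb by exactly 1 and never pass 63:
idx = Monitored("buf_index", 0, lo=0, hi=63, step=1, monotonic="increasing")
idx.set(1)                                    # legal step

# A switch restricted to an explicit list of states:
state = Monitored("state", "idle", allowed={"idle", "run", "halt"})
state.set("run")                              # legal state
```

The pointer-type constraint (the last item in the list) would largely fall to the compiler in a typed language; the others require run-time checks of the kind sketched here.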
At each compilation of a subject program, the activated assertions generate, in the object program, code that can be used to check the variables to which they apply, at every change of value, for violations of any of the constraints so imposed ("can be used" because not every test needs to be executed every time). When the monitoring code detects that a variable has violated (or, in some cases, is about to violate) an assertion, it halts execution of the program and takes the exception action specified by the programmer.
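One way such group activation and programmer-specified exception actions might look in code; the group mechanism and every name below are assumptions made for this sketch, not ADS features:

```python
# Hedged sketch of grouped assertions with a programmer-chosen exception
# action; ACTIVE_GROUPS, make_check, and halt are all invented names.

ACTIVE_GROUPS = {"bounds"}   # groups switched on, e.g. at compilation time

def make_check(group, name, predicate, on_violation):
    """Build a checker; it is a no-op while its group is deactivated."""
    def check(value):
        if group in ACTIVE_GROUPS and not predicate(value):
            on_violation(name, value)   # the specified exception action
    return check

def halt(name, value):
    raise RuntimeError(f"assertion failed: {name} = {value!r}")

check_index = make_check("bounds", "index", lambda v: 0 <= v < 256, halt)
check_even  = make_check("parity", "index", lambda v: v % 2 == 0, halt)

check_even(3)     # silently skipped: the "parity" group is deactivated
check_index(10)   # passes its assertion
try:
    check_index(300)
except RuntimeError as e:
    print(e)
```

Deactivated groups cost one membership test here; in the compiled scheme the article describes, their checking code would simply not be generated.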