Software QA FYI - SQAFYI

Software Negligence and Testing Coverage [1]

By: Cem Kaner

This presentation explores the legal concept of negligence and the technical concept of coverage. The article advances several related propositions:

1. The phrase, complete coverage, is misleading. This "completeness" is measured only relative to a specific population of possible test cases, such as lines of code, branches, n-length sub-paths, predicates, etc. Even if you achieve complete coverage for a given population of tests (such as, all lines of code tested), you have not done complete, or even adequate, testing.
2. We can and should expand the list of populations of possible test cases. We can measure coverage against each of these populations. The decision as to whether to try for 1%, 10%, 50% or 100% coverage against any given population is non-obvious. It involves tradeoffs based on thoughtful judgment.

3. Negligence liability attaches when injury or loss is caused by a failure to satisfy a duty that was imposed by law, as a matter of public policy. So the question of whether or not it is negligent to fail to test every line of code turns into a question of whether the company had a public duty to test every line of code.
4. The nature of the duty involved depends on the role of the company in the development and release of the program.
   * A company that publishes a software product has a duty to take reasonable measures to ensure that its products are safe.
   * A company that sells software development services might have a duty to its client to deliver a program that is reasonably competently coded and tested.
   * A company that acts as an independent test laboratory might owe a duty of competent testing and reporting to the client or to the public.
5. In any of these three cases, it is not obvious whether failure to achieve 100% line coverage is negligent. The plaintiff will have to prove that the tradeoff made by the software company was unreasonable.

I'll start by considering the situation of the software developer/publisher. This provides the room to explore the coverage issues that are the focus of this paper. The situations of the service providers are of independent interest to the testing community and so will also be considered below.

Under negligence law, software development companies must not release products that pose an unreasonable risk of personal injury or property damage. [2] An injured customer can sue your company for negligence if your company did not take reasonable measures to ensure that the product was safe.

Reasonable measures are those measures that a reasonable, cautious company would take to protect the safety of its customers. How do we determine whether a company has taken reasonable measures? One traditional approach in law involves a simple cost-benefit analysis. This was expressed as a formula by Judge Learned Hand in the classic case of United States v. Carroll Towing Co. [3]:

Let B be the burden (expense) of preventing a potential accident.

Let L be the severity of the loss if the accident occurs. [4]

Let P be the probability of the accident.

Then failure to attempt to prevent a potential accident is unreasonable if B < P x L.

For example, suppose that a software error is certain to cause a total of $1,000,000 in damage to your customers (P = 1, L = $1,000,000). If you could prevent this by spending less than $1,000,000, but don't, you are negligent. If prevention would cost more than $1,000,000, and you don't spend the money, you are not negligent.
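The comparison can be sketched in a few lines of Python. The function name and the burden figures below are illustrative assumptions, not anything taken from the case. The first two calls treat the accident as certain (P = 1), as the $1,000,000 example does; the third shows how a lower probability changes the answer.

```python
def failure_to_prevent_is_unreasonable(burden: float, probability: float, loss: float) -> bool:
    """Hand formula: failing to prevent the accident is unreasonable if B < P x L."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return burden < probability * loss


LOSS = 1_000_000  # total damage to customers, from the example in the text

# Hypothetical prevention costs (assumptions for illustration only).
print(failure_to_prevent_is_unreasonable(400_000, 1.0, LOSS))    # True: 400,000 < 1,000,000
print(failure_to_prevent_is_unreasonable(1_500_000, 1.0, LOSS))  # False: 1,500,000 > 1,000,000
print(failure_to_prevent_is_unreasonable(400_000, 0.25, LOSS))   # False: 400,000 > 250,000
```

The hard part in practice is not the arithmetic but estimating B, P, and L honestly before the accident has happened.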

In retrospect, after an accident has occurred, now that we know that there is an error and what it is, it will almost always look cheaper to have fixed the bug and prevented the accident. But if the company didn't know about this bug when it released the program, our calculations should include the cost of finding the bug. What would it have cost to make the testing process thorough enough that you would have found this bug during testing?

For example, if a bug in line 7000 crashes the program, B would not be the cost of adding one test case that miraculously checks this line (plus the cost of fixing the line). B would be:

* the cost of strengthening the testing so that line 7000's bug is found in the normal course of testing, or
* the cost of changing the design and programming practices in a way that would have prevented this bug (and others like it) in the first place.

Coming back to the coverage question, it seems clear that you can prevent the crash-on-line-7000 bug by making sure that you at least execute every line in the program. This is line coverage.

Line coverage measures the number or percentage of lines of code that have been executed. But some lines contain branches: the line tests a variable and does different things depending on the variable's value. To achieve complete branch coverage, you check each line, and each branch on multi-branch lines. To achieve complete path coverage, you must test every path through the program, an impossible task. [5]
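A toy example may make the three measures concrete. The function `shipping_fee` and the helper `coverage` are hypothetical, written only for this illustration; the helper computes coverage by mirroring the function's two decisions rather than by instrumenting the code, so it is a sketch, not a real coverage tool.

```python
def shipping_fee(order_total: float, is_member: bool) -> float:
    """Toy function with two independent decisions and no else-branches."""
    fee = 5.0
    if order_total >= 50:
        fee -= 2.0
    if is_member:
        fee -= 3.0
    return fee


def coverage(tests):
    """Return (all_lines_executed, branch_coverage, path_coverage) for shipping_fee.

    Each test is an (order_total, is_member) pair. There are 2 decisions,
    so 4 branch outcomes and 4 distinct paths in total.
    """
    outcomes = [(total >= 50, member) for total, member in tests]
    branches = {(i, taken) for path in outcomes for i, taken in enumerate(path)}
    paths = set(outcomes)
    # Every line runs once each decision has been true at least once.
    all_lines = all((i, True) in branches for i in range(2))
    return all_lines, len(branches) / 4, len(paths) / 4


print(coverage([(60.0, True)]))                 # (True, 0.5, 0.25)
print(coverage([(60.0, True), (10.0, False)]))  # (True, 1.0, 0.5)
```

One test executes every line yet exercises only half of the branch outcomes, and two tests reach full branch coverage while covering only half of the paths. With n independent decisions there are 2 to the power n paths, and loops make the count grow further, which is why complete path coverage is out of reach for real programs.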

The argument made at the start of this article would have us estimate B as the cost of achieving complete line coverage. Is that the right estimate of what it would cost a reasonable software company to find this bug? I don't think so.

Line coverage is just one narrow type of coverage of the program. Yes, complete line coverage would catch a syntax error on line 7000 that crashes the program, but what about all the other bugs that wouldn't show up under this simple testing? Suppose that it would cost an extra $50,000 to achieve complete line coverage. If you had an extra $50,000 to spend on testing, is line coverage what you would spend it on? Probably not.

Most traditional coverage measures look at the simplest building blocks of the program (lines of code) and the flow of control from one line to the next. These are easy and obvious measures to create, but they can miss important bugs.

A great risk of a measurement strategy is that it is too tempting to pick a few convenient measures and then ignore anything else that is more subtle or harder to measure. When people talk of complete coverage or 100% coverage, they are using terribly misleading language. Many bugs will not be detected even if there is complete line coverage, complete branch coverage, or even if there were complete path coverage.
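As a minimal sketch of this point, consider a hypothetical one-line function (the name and the inputs are assumptions for illustration). It has a single line, no branches, and exactly one path, so one passing test achieves complete line, branch, and path coverage, yet a data-dependent failure remains.

```python
def average_response_time(total_ms: float, request_count: int) -> float:
    """One line, no branches, one path through the code."""
    return total_ms / request_count


# This single test gives 100% line, branch, and path coverage.
assert average_response_time(100.0, 4) == 25.0

# The bug is still there: no code exists to handle a count of zero,
# so there is no line for a coverage tool to flag as unexecuted.
try:
    average_response_time(100.0, 0)
except ZeroDivisionError:
    print("crash with request_count == 0 despite complete coverage")
```

Structural coverage counts the code that exists; it cannot count the code that should have been written.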

If you spend all of your extra money trying to achieve complete line coverage, you are spending none of your extra money looking for the many bugs that won't show up in the simple tests that can let you achieve line coverage quickly. Here are some examples:

* A key characteristic of object-oriented programming is that each object can deal with any type of data (integer, real, string, etc.) that you pass to it. Suppose that you pass data to an object that it wasn't designed to accept. The program might crash or corrupt memory when it tries to deal with it. Note that you won't run into this problem by checking every line of code, because the failure is that the program doesn't expect this situation and therefore supplies no relevant lines for you to test.
  There is an identifiable population of tests that can reveal this type of problem. If you pass every type of data to every object in your product, you will find every error that involves an object that doesn't properly handle a type of data that is passed to it. You can count the number of possible tests involved here, and you can track the number you've actually run. Therefore, we can make a coverage measure here.
* A Windows program might fail when printing. [6] You achieve complete coverage of printer compatibility tests (across printers) if you use the set of all Windows-supported printers, using all Windows printer drivers available for each of these printers. These drivers are part of the operating system, not part of your program, but your program can fail or cause a system failure when working with them. The critical test case is not whether a particular line of code is tested, but whether it is tested in conjunction with a specific driver.
* Suppose that you test a desktop publishing program. One effective way to find bugs and usability failures is to use the program to create interesting documents. This approach is particularly effective if you use a stack of existing documents and try to make exact copies of them with your program. To create your stack, perhaps you'll use all the sample files and examples that come with PageMaker, Quark, FrameMaker, and one or two other desktop publishers. In this case, you achieve complete coverage if you recreate all of the samples provided by all available desktop publishers.
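Each of the examples above defines a countable population of tests, so coverage against it can be tracked the same way as line coverage. The sketch below uses the printer example; the printer and driver names are placeholders, and `population_coverage` is a hypothetical helper, not part of any real tool.

```python
def population_coverage(population, executed) -> float:
    """Fraction of a defined test population that has actually been run."""
    population = set(population)
    if not population:
        raise ValueError("the test population must not be empty")
    # Tests outside the defined population do not count toward coverage.
    return len(population & set(executed)) / len(population)


# Hypothetical data: the drivers available for each supported printer.
drivers_by_printer = {
    "PrinterA": ["v1", "v2"],
    "PrinterB": ["v1"],
    "PrinterC": ["v1", "v2", "v3"],
}
population = {(p, d) for p, drivers in drivers_by_printer.items() for d in drivers}

executed = {("PrinterA", "v1"), ("PrinterC", "v3"), ("PrinterZ", "v9")}

print(len(population))                                      # 6
print(f"{population_coverage(population, executed):.0%}")   # 33%
```

The same helper applies to the other populations: pairs of (object, data type) for the first example, or the set of sample documents for the desktop publishing example.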

The Appendix to this article lists 101 measures of testing coverage. Line coverage is just one of many. There are too many possible tests for you to achieve complete coverage for every type of coverage in the list.

I hope that the list helps you make priority decisions consciously and communicate them explicitly. The tradeoffs will differ across applications. In one case you might set an objective of 85% for line coverage, [7] 100% for data coverage, but only 5% for printer / driver compatibility coverage. For a different program whose primary benefit is beautiful output, you would assign printer coverage a much higher weight.
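One way to make such objectives explicit is to record them next to the measured values and report the gaps. In this sketch the targets come from the example in the text, while the achieved figures and the helper name `unmet_objectives` are assumptions made for illustration.

```python
def unmet_objectives(targets: dict, achieved: dict) -> dict:
    """Return {measure: (achieved, target)} for every objective not yet met."""
    return {
        name: (achieved.get(name, 0.0), goal)
        for name, goal in targets.items()
        if achieved.get(name, 0.0) < goal
    }


# Objectives from the example in the text.
targets = {"line": 0.85, "data": 1.00, "printer/driver": 0.05}

# Hypothetical measurements for one release.
achieved = {"line": 0.88, "data": 0.92, "printer/driver": 0.05}

print(unmet_objectives(targets, achieved))  # {'data': (0.92, 1.0)}
```

Writing the targets down does not make them reasonable, but it does make the tradeoff visible and easier to explain and justify later.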

If you had an extra $50,000 to spend, would you focus your efforts on increasing line coverage or increasing some of the others? Surely, the answer should depend on the nature of your application, the types of risks involved in your application, and the probable effectiveness of the different types of tests. The most desirable strategy will be the one that is most likely to find the most bugs, or to find the most serious bugs.

The legal (negligence) test for the coverage tradeoffs that you make is reasonability. No matter what tradeoffs you make, and no matter how much money you spend on testing, you will miss some bugs. [8] Whether or not those bugs are products of negligence in the testing process depends on your reasonability, not on your luck in selecting just the right tests.

Your task is to prioritize among tests in the way that a reasonably careful company would, and to me that means selecting the test strategy that you rationally believe is the most likely to find the most bugs or the most serious bugs.

There is no magic talisman in coverage that you can use blindly and be free of negligence liability. Being reasonable in your efforts to safeguard your customer requires careful thought and analysis. Achieving complete (line, branch, whatever) coverage will not insulate you. The plaintiff's attorney will just ask you why you spent all that money on line coverage, at the expense of, say, interrupt coverage. Try to assign your weights sensibly, in a way that you can explain and justify.

The same reasoning applies to customer satisfaction in general. If your approach will control the risks, you've done your job. But if you can identify gaps that leave an unreasonable degree of risk to customer safety or satisfaction, there is no reasonable alternative to addressing those risks.

As a final note, I hope that you'll take a moment to appreciate the richness, multidimensionality, and complexity of what we do as testers. Sometimes we hear that only programmers should be testers, or that all testing should be driven from a knowledge of the workings of the code. This list highlights the degree to which that view is mistaken. Programming skills and code knowledge are essential for glass box testing tasks, but as we explore the full range of black box testing approaches, we find that we also need skills and knowledge in:

* the application itself (subject matter experts)
* safety and hazard analysis
* usability, task analysis, human error (human factors analysis)
* hardware (modems, printers, etc.)
* customer relations.

A person who has these skills but who can't program may be an invaluable member of a black box testing team.
