Measure test coverage with Cobertura

Software QA FYI - SQAFYI

By: Elliotte Rusty Harold

Find untested code where bugs lurk

Summary: Cobertura is an open source tool that measures test coverage by instrumenting a code base and watching which lines of code are and are not executed as the test suite runs. In addition to identifying untested code and locating bugs, Cobertura can help you optimize code by flagging dead, unreachable code and can provide insights into how an API operates in practice. Elliotte Rusty Harold shares how you can take advantage of Cobertura using code-coverage best practices.

Test-driven development is the most significant innovation in programming in the last 10 years, although test-first programming and unit testing are hardly new. Some of the best programmers have been using these techniques for half a century; however, only in the last few years have they become widely recognized as critical components in developing robust, bug-free software on time and on budget. But test-driven development is only as good as the tests are. Tests improve code quality, but only for the parts of the code base that are actually being tested. You need a tool that tells you which parts of a program are not being tested so you can write tests for those parts and find more bugs.

Mark Doliner's Cobertura (cobertura is Spanish for coverage) is a free-as-in-speech tool, licensed under the GPL, that handles this job. Cobertura monitors tests by instrumenting the bytecode with extra statements that log which lines are and are not reached as the test suite executes. It then produces a report in HTML or XML that shows exactly which packages, classes, methods, and individual lines of code are not being tested. You can then write more tests for those specific areas to reveal any lingering bugs.
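To make the idea of instrumentation concrete, here is a rough sketch of what the inserted logging statements amount to, written out by hand at the source level. The LineHits recorder and the InstrumentedAbs class are hypothetical names invented for this illustration; Cobertura rewrites bytecode and uses its own runtime classes, so treat this as a conceptual model rather than a picture of its internals.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical hit recorder; Cobertura's real runtime classes are different.
final class LineHits {
    private static final Map<Integer, Integer> HITS = new TreeMap<>();

    // Registers a line as instrumented, with an initial count of zero.
    static void register(int line) { HITS.putIfAbsent(line, 0); }

    // Called by the inserted statements each time a line is reached.
    static void touch(int line) { HITS.merge(line, 1, Integer::sum); }

    // Number of times the given line has been reached so far.
    static int count(int line) { return HITS.getOrDefault(line, 0); }

    // Copy of all counts, ordered by line number.
    static Map<Integer, Integer> snapshot() { return new TreeMap<>(HITS); }
}

final class InstrumentedAbs {
    static {
        for (int line = 1; line <= 3; line++) {
            LineHits.register(line);
        }
    }

    // Original source being measured:
    //   1: if (n < 0)
    //   2:     return -n;
    //   3: return n;
    static int abs(int n) {
        LineHits.touch(1);
        if (n < 0) {
            LineHits.touch(2);
            return -n;
        }
        LineHits.touch(3);
        return n;
    }

    public static void main(String[] args) {
        // A "test suite" that never passes a negative number.
        abs(5);
        abs(7);
        // Line 2 keeps a count of zero, which is what a report would flag.
        System.out.println(LineHits.snapshot());
    }
}
```

A line whose count is still zero when the suite finishes is exactly the kind of line the report highlights as untested.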

Reading Cobertura output
Let's begin with the generated Cobertura output. Figure 1 shows a report produced by running Cobertura on the Jaxen test suite (see Resources). You can see that coverage ranges from great (almost 100 percent in the org.jaxen.expr.iter package) to extremely poor (no coverage at all in org.jaxen.dom.html).

Cobertura calculates coverage both by the number of lines tested and by the number of branches tested. For a first pass, the difference between these two is not hugely important. Cobertura also calculates the average McCabe's cyclomatic complexity for the class (see Resources).
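When the difference does start to matter, a small example helps. The Discount class below is hypothetical and is not taken from Jaxen. A single call with member set to true executes every line of price(), so line coverage is complete, yet the path that skips the body of the if statement is never taken, so I would expect a branch-aware report to show only one of the two branches as covered.

```java
final class Discount {
    // Every line runs when member is true, but the "skip the if body"
    // branch is only exercised when member is false.
    static int price(int base, boolean member) {
        int discount = 0;
        if (member) {
            discount = 10;
        }
        return base - discount;
    }

    public static void main(String[] args) {
        // Full line coverage, partial branch coverage:
        System.out.println(price(100, true));
        // Adding this second call exercises the remaining branch:
        System.out.println(price(100, false));
    }
}
```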

You can drill down through the HTML report to view the coverage of a particular package or class. Figure 2 shows coverage statistics for the org.jaxen.function package. In this package, coverage ranges from 100 percent for the SumFunction class to a mere 5 percent for the IdFunction class.

Drilling down further into individual classes, you can see exactly which lines aren't being tested. Figure 3 shows part of the coverage in the NameFunction class. The left-hand column shows the line number. The next column shows the number of times that line is executed during the test run. As you can see, line 112 is executed 100 times, and line 114 is executed 28 times. Lines highlighted in red are not tested at all. This report reveals that although the method as a whole is tested, many branches are not.

Identify missing tests
Using Cobertura's reports, you can identify the untested parts of the code and write tests for them. For example, Figure 3 shows that Jaxen needs tests that apply the name() function to text nodes, comment nodes, processing instruction nodes, attribute nodes, and namespace nodes.
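As a sketch of what such tests check, the following applies name() to an attribute, a processing instruction, a comment, and a text node. It deliberately uses the XPath engine bundled with the JDK (javax.xml.xpath) as a stand-in so that it runs without extra libraries; real Jaxen tests would go through Jaxen's own classes instead. The sample document and the NameFunctionChecks class are assumptions made up for this illustration, and namespace nodes are left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NameFunctionChecks {
    // Hypothetical sample document with one node of each kind under test.
    static final String XML =
        "<?target data?><root id=\"1\"><!--note-->text</root>";

    // Evaluates an XPath expression against the sample document
    // and returns the result as a string.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "name(/root/@id)",                // attribute node
            "name(/processing-instruction())", // processing instruction node
            "name(/root/comment())",          // comment node
            "name(/root/text())"              // text node
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> '" + eval(expression) + "'");
        }
    }
}
```

Under XPath 1.0, name() yields the qualified name for an attribute, the target for a processing instruction, and the empty string for comment and text nodes, so each of these cases deserves its own assertion.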

Adding all the missing tests is time-consuming when you have as much uncovered code as Cobertura has identified here -- but it's worth doing. You don't have to do it all at once. Begin with the least-tested code, such as any packages that have no coverage. Once you have tested all packages a little, you can write some tests for each class that shows no coverage. Once you partially test all the classes, write tests that cover any uncovered methods. Once all the methods are tested, you can begin looking at what's necessary to activate any untested statements.

Leave (almost) no code untested
Is there anything you can test but shouldn't? It depends on who you ask. In the JUnit FAQ, J. B. Rainsberger writes, "The general philosophy is this: if it can't break on its own, it's too simple to break. First example is the getX() method. Suppose the getX() method only answers the value of an instance variable. In that case, getX() cannot break unless either the compiler or the interpreter is also broken. For that reason, don't test getX(); there is no benefit. The same is true of the setX() method, although if your setX() method does any parameter validation or has any side effects, you likely need to test it."

I don't agree. I've lost count of the number of bugs I've found in code that was "too simple to break." It's true that some getters and setters are so trivial that there's no way they can fail. But I've never been able to figure out how to tell which methods really are too simple to fail and which ones just look that way. It's not hard to write tests that cover simple methods like setters and getters. The minimal time you take to do this will be more than compensated by the number of such methods in which you'll find unexpected bugs.
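Here is the sort of thing I have in mind, as a contrived sketch. The Range class is hypothetical and its bug is planted on purpose: a getter that was copied, pasted, and never corrected. It compiles cleanly and looks fine at a glance, but a trivial round-trip check exposes it.

```java
final class Range {
    private int low;
    private int high;

    void setLow(int low) { this.low = low; }
    void setHigh(int high) { this.high = high; }

    int getLow() { return low; }

    // Deliberate copy-and-paste bug: this should return high.
    int getHigh() { return low; }
}

final class RangeAccessorCheck {
    // Returns true only if the value passed to setLow comes back from getLow.
    static boolean lowRoundTrips() {
        Range range = new Range();
        range.setLow(1);
        range.setHigh(10);
        return range.getLow() == 1;
    }

    // Returns true only if the value passed to setHigh comes back from getHigh.
    static boolean highRoundTrips() {
        Range range = new Range();
        range.setLow(1);
        range.setHigh(10);
        return range.getHigh() == 10;
    }

    public static void main(String[] args) {
        System.out.println("getLow round-trips: " + lowRoundTrips());
        System.out.println("getHigh round-trips: " + highRoundTrips());
    }
}
```

In a real suite each check would be a one-line JUnit assertion, which is about as cheap as a test can get.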

Generally, it's fairly easy to reach 90 percent test coverage once you start measuring. Increasing coverage to 95 percent or more can require some sneakiness. For example, you might load different versions of supporting libraries to test workarounds for bugs that don't show up in all versions of the libraries. Or you can restructure code so the tests can reach parts of the code they wouldn't normally touch. You might extend classes simply to make their protected methods public so they can be tested. These tricks might seem excessive, but about half the time they've helped me find previously undiscovered bugs.
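The subclassing trick is the easiest of these to show. In this minimal sketch, Tokenizer and TestableTokenizer are hypothetical names. The subclass exists only in the test tree and overrides the protected method with a public one, which Java permits because an override may widen access. (Everything here sits in one package for brevity; the subclass earns its keep when the tests live in a different package from the class under test.)

```java
import java.util.Locale;

// Hypothetical production class with a protected helper.
class Tokenizer {
    // Trims surrounding whitespace and lowercases the input.
    protected String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }
}

// Test-only subclass that widens the helper's access to public.
final class TestableTokenizer extends Tokenizer {
    @Override
    public String normalize(String raw) {
        return super.normalize(raw);
    }

    public static void main(String[] args) {
        System.out.println(new TestableTokenizer().normalize("  Hello World "));
    }
}
```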

Perfect, 100 percent code coverage is not always attainable. Sometimes you find lines, methods, or even entire classes that simply cannot be reached by tests, no matter how much you contort the code. The following are some examples of challenges you might come across:

* Code that executes only on a specific platform. For example, in a well-designed GUI application, the code that adds an Exit menu item would run on a Windows PC but not a Mac.

* catch clauses that catch exceptions that won't happen, such as IOExceptions thrown when reading from a ByteArrayInputStream.

* Methods inside nonpublic classes that are never actually invoked but must be implemented in order to satisfy the contract of an interface.

* Code blocks that handle virtual-machine bugs, such as a failure to recognize the UTF-8 encoding.
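The second item in the list above is the one I run into most often, so here is a small sketch of it. ByteSummer is a hypothetical class. Because the stream is held in a variable of type InputStream, the compiler insists that IOException be handled, but the underlying ByteArrayInputStream reads from memory and does not throw it, so no ordinary test can drive execution into the catch block.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ByteSummer {
    // Sums the unsigned values of the bytes by reading them through a stream.
    static int sum(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        int total = 0;
        try {
            int b;
            while ((b = in.read()) != -1) {
                total += b;
            }
        } catch (IOException e) {
            // Required because InputStream.read() declares IOException,
            // but an in-memory stream does not throw it here, so a
            // coverage report will always show this line as untested.
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new byte[] {1, 2, 3}));
    }
}
```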

Given these and similar issues, I find the efforts of some extreme programmers to delete all untested code automatically to be unrealistic and perhaps satirical. The fact that you can't always have absolutely perfect test coverage doesn't mean you shouldn't have better coverage.

Nonetheless, more often than not, unreachable statements and methods are vestigial code that no longer has any purpose and can be cut from the code base with no effect. It is sometimes possible to test untested code through really skanky hacks that use reflection to access private members. It is also possible to write tests for untested, package-protected code by putting the test classes in the same package as the classes they're testing. Don't do this. Any code that cannot be reached through the published (public and protected) interfaces should be deleted instead. Unreachable code should not be part of a code base. The smaller a code base, the easier it is to understand and maintain.
