
Software Testing Technology

By: Ben-Avi




Overview


INTRODUCTION




This section provides a high-level overview and perspective on software testing technology as it is practiced today and as it may be practiced in the near future. Of all of the areas within computer science, none has received more attention in the past decade than the issue of software quality. Software quality lies at the foundation of many (or even all) of the problems facing the computing community; the issue of quality determines whether particular solutions to problems that involve a computer are, or are not, accepted. Certainly, there are sufficiently many examples of systems (i.e. software solutions) that have failed to meet minimum levels of acceptability; there are, in addition, a few instances of genuine success.

What is the difference between the successes and the failures? The answer is manifold: it not only involves the application of good software engineering techniques and the use of modern programming practices, software design methodologies, and more reliable high-level languages, but also must take into account the software engineering methods used (whatever they are) to assure the quality of the delivered software system. Quality is assured primarily through some form of software testing; a number of instances of formal program verification exist, but in relative terms the total amount of software that has been processed with this theoretically important but costly method is very small. Very often, what passes for software quality assurance with testing has not been called that at all: instead, it was termed advanced debugging, interface testing, or acceptance testing, and so forth.

Now, however, there is growing agreement on the role of testing as a software quality assurance discipline, as well as on the terminology, technology, and phenomenology of testing, and on the expectations about it. This section presents some intuitive descriptions of each of these categories.



HISTORY OF TESTING.


Beginnings.




The first formal technical conference devoted to software testing was held in June 1972 at the University of North Carolina. Although that certainly was not the first time a program was tested, it was the first time that researchers who were primarily concerned with the practicalities of software quality assembled to discuss the whole range of issues involved.

Subsequent to that meeting there have been many International Software Engineering Conferences, and a spate of smaller workshops and symposia devoted either fully or in part to questions of program testing. At the Oregon Conference on Computing in the 1980s, testing technology was considered a formal part of software engineering technology, probably for the first time.

The history of testing goes back to the beginnings of the computing field; the programs that ran on the earliest machines were "tested." In fact, there is even an early paper by Turing indicating that "testing is the empirical form of software quality assurance, while proving is the theoretical way." Results of both approaches have varied widely. Some software has performed excellently, and other software has done quite badly. There are plenty of "doom and gloom" anecdotes, such as the famous missing comma in the Venus probe software, the early-warning system fault that caused the rising moon to be detected as an incoming ICBM, and the like. Perhaps expectedly, many of the successes are not so well known or discussed.





The Direction for the Future.


There is a steadily increasing need for effective software quality assurance at affordable prices. The need arises from various sectors of industry, all of which need practical methods for assuring quality (minimizing the chance of a latent error) without having the costs of doing so go off the scale. At the end of this section are some practical guidelines on what the costs to do this might actually be.

One obstacle to the widespread adoption of software testing is its negative reward structure. The objective of a dedicated software tester is to find errors and report them to somebody, but that attitude and perspective may be at odds with those of the software implementors and/or their managers. Hence, in many cases it is necessary to consider special organizational setups, both to isolate this negativity (or to turn it around as a positive force) and to maximize any benefits of synergy.

As more and more experience with testing is gained, an increasingly strong set of intuitions and guidelines on handling the software quality assurance problem is being developed as well as ideas on how to get the most out of what is currently known about the techniques for achieving software quality. Knowing how to choose among the many technical alternatives is the responsibility of the well-informed quality assurance manager.





MOTIVATING FORCES.


This section describes the basics of testing (what is involved and how it is done). It suggests some of the primary motivating forces that push toward the use of formal testing methods and indicates in a general way what some of the benefits of testing are.



Basics of Testing.


A quality analysis oriented test of a computer program consists of:


- Running the program with a controlled set of inputs.
- Observing the run-time effect the inputs have on the program.
- Examining the program outputs to determine their acceptability.


One usually attends to these items informally during the program debugging process, whereas, during formal program testing, one thinks less in terms of making the program do something and more in terms of showing precisely what the program does. One way to distinguish between formal testing and a less formal debugging activity is to examine the tools used. Debugging relies on the program production tools (i.e., compilers, debugging packages that examine a program's internal operation), whereas testing may involve other kinds of tools.
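
To make the three steps concrete, here is a minimal sketch in Python (the routine under test and its test cases are hypothetical, invented purely for illustration):

```python
# Minimal sketch of a controlled test: run the test object with chosen
# inputs and examine the outputs for acceptability. The function under
# test (median_of_three) is a hypothetical example.

def median_of_three(a, b, c):
    """Return the median of three numbers (the test object)."""
    return sorted([a, b, c])[1]

# Each test case pairs a controlled input with its expected output.
test_cases = [
    ((1, 2, 3), 2),
    ((3, 1, 2), 2),
    ((5, 5, 1), 5),   # duplicate values
    ((-1, 0, 1), 0),  # negative input
]

for inputs, expected in test_cases:
    actual = median_of_three(*inputs)
    status = "PASS" if actual == expected else "FAIL"
    print(f"{status}: median_of_three{inputs} = {actual} (expected {expected})")
```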

In addition to the process of running programs under controlled and observable circumstances (a process called dynamic testing), a second stage called static testing is ordinarily included. Static testing may include manual code inspections, structured walkthroughs, or the use of automated tools that analyze software by looking for certain kinds of common errors (those not caught during the normal program production process). Contemporary testing technology advocates combinations of static and dynamic quality analysis processes.

The input data used to make a program demonstrate its own features are called a test case or test data. Individual programs can be tested, a process called unit testing.

Whole systems of software can also be tested in an activity called system testing. Normally, system testing is divided into a set of individual activities that are larger than unit testing but smaller than full system testing. Testing can be done from either the black box or the white box perspective, depending on whether the internal operation of the program is being observed during the testing process. Normally, black box testing is reserved only for the system level; most programs are fully unit tested using white box methods.

During white box testing, the tester observes the extent to which test cases exercise program structure. The resulting coverage is used as an indicator of the likelihood of any remaining undiscovered errors. Note that input/output relationships still have to be examined in detail for appropriateness.

One of the more common strategies for testing involves declaring a minimum level of testing coverage, ordinarily expressed as a percentage of the elemental segments that are exercised in aggregate during the testing process. This is called C1 coverage; full C1 coverage means that every logically separate part of the program has been exercised at least once during the set of tests. Unit testing usually requires at least 85 to 90% C1 coverage.
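
Stated as a formula (a direct restatement of the definition above, computed cumulatively over a series of tests):

$$ C_1 = \frac{\text{number of segments exercised at least once}}{\text{total number of segments in the program}} \times 100\% $$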

At the system level it is possible to define similar measures of coverage that relate to the extent to which subroutines and invocations to them are exercised. Many different strategies exist for governing the process of multiple unit testing and/or system-level testing.

An underlying premise of structurally motivated and measured testing is that at a minimum, it is necessary to know that all of the functions in a software system have been exercised at least once. It is not reasonable to accept (or release, or approve) any software system unless it is known that every part of it has been exercised. Although this criterion cannot be guaranteed (in formally provable terms) to force discovery of all errors, it can be shown to be a practically effective method for systematically analyzing large and complex software systems (see below).



Typical Applications.




Much of contemporary testing technology arises from the need to examine the quality of software that is embedded in products sold directly to consumers or that otherwise affects the public. This factor was not predicted as the major motivating force in the past, perhaps because nobody expected that there would be such widespread use of software. It is valuable to take a look at some typical application areas and identify for each of them the primary areas of concern for software quality.





Instrumentation Systems. The use of computers, and especially microcomputers, in instrumentation has been increasing rapidly in the past five years. For example, many oscilloscopes, data displays, signal analyzers, and communication link analyzers are more effective and can have more capabilities and flexibility with embedded microcomputers. The instrument manufacturers have found, however, that they are poorly equipped to deal with the software component in such instruments. In a typical instrumentation system, the software is coded at the assembly language level, a choice usually dictated by economies of program storage space and execution time. It seems well understood now that the cost to repair or modify such systems is very large, particularly if the work must be done after product introduction and/or if the production run is large. As a consequence, the instrument manufacturers generally provide fairly basic levels of software quality assurance through testing, with the effort level based on the relative cost to repair the software in the field.





Process Control Systems. Computers are finding increasing application in process control, particularly for those systems that have rapid real-time response requirements that cannot ordinarily be met by human controllers. Examples of systems that fall into this category range from nuclear reactors to petroleum plants to automated assembly lines. The cost of failure of such computer-based control systems ranges from politically unacceptable to economically undesirable. Largely as a function of the potential impact of a failure caused by software, organizations responsible for these kinds of systems are "betting" on a number of techniques. In many cases, triple modular redundancy is provided at the hardware level to reduce the chances of a hardware failure to vanishingly small probability. Corresponding methods of multiple implementation of software control processes have also been used effectively, particularly in very critical applications.



Appliances. As with instrumentation systems, most appliance-based software is written at a very low level, sometimes even in direct machine code. The cost of failure, or the impact on the organization, as the result of a deficiency in software is ordinarily measured by a combination of direct and indirect costs. Direct costs include the costs of repair, as well as the cost to provide legal protection in the event of software failure. The indirect costs involve such factors as product image, organizational image, and reputation.





Automotive Computers. Everyone has been talking for some time about the revolution in automotive electronics, in which many of the current mechanically implemented control functions have been replaced by full digital control. While this has been slower to occur than many had thought, recent model cars do contain computers, and they are used in a more than peripheral fashion. Announcements have appeared for several retrofits based on dash computers, in many instances similar to previously available "rally-style" computers in function but not in technology. For this class of application, software reliability can affect the driver and passengers only in an indirect manner. An unanswered question, however, is the extent of liability assumed by the supplier of the software/hardware combination.





Telephony. Computers are increasingly the central feature of complex telephone switching equipment, both for the public network and for privately maintained in-house systems. In addition to the switching function is the data transfer function represented by the new value-added data transmission networks. For all of these applications, the implications of software failure are generally less severe than those previously noted, being intrinsically restricted to the loss of communication and/or the effect of an errant call. For value-added data switching networks, however, the implications of a failure are not so clear cut. While the telephone companies themselves have devoted substantial effort to systematically testing computerized switching systems and substantial experience has been gained in effectiveness evaluation of switched data networks, much quality assurance work remains to be done.





Military Systems. Because military systems typically involve the risk of human life, substantial emphasis has always been placed on the software quality issue. Modern concepts of third-party verification and validation of software projects had their origin in the military community. Notwithstanding this head start, however, contemporary technology seems to be passing up military systems in favor of developing techniques more applicable to the popular languages and/or the popular computers. Much of the research and development activity in the area of program verification has been focused on military applications software. Full verification and validation for military-style software is a very expensive proposition; estimates range from $10 to $25 per statement and higher for "the full treatment." A number of companies have been active in this field for some time with measurable effectiveness. After all, strategic systems that employ computers have experienced no failures, although this may have been the result of luck.





Liability Questions.




At a meeting of futurists, one scenario focused on the implications for software engineering of a hypothetical multi-million dollar malpractice claim against the programmers responsible for a vehicle control package. While the application in mind was somewhat facetious, indications today are that the issues of liability and, correspondingly, professional malpractice will be important ones in the future.

Underwriters in the United States, with the backing of Lloyds of London, are now writing this kind of policy. The key question to be answered is, "What is the interaction between the cost of malpractice insurance and the kind and level of software testing that has been performed?"





Benefits of Testing.


Apart from all of these motivating forces, additional benefits may accrue as the result of instituting a systematic testing discipline. Some of these are:





- By focusing attention on the issues of software quality, programmers as well as program testers are made conscious of the need for error-free software.
- The processes involved in analyzing computer programs from the perspective of a program tester almost automatically ensure that the more flagrant kinds of errors will be detected.
- The systematic testing process, even if it does not identify significant errors, still acts as a backup to other techniques such as design reviews, structured walkthroughs, and so on.
- Instituting a systematic testing activity provides a framework in which new quality assurance technologies can be applied as soon as they become available.





As subsequent sections will show, contemporary technology does permit general kinds of statements about testing activity, how it should be organized, and what should be expected of it.





ORGANIZING FOR TESTING.




If a testing activity is going to be instituted, some appropriate questions are:


- When should the program-testing methodology be applied?
- How should the testing group be organized?
- How much effort should be applied to the testing activity?
- What outcomes can be expected from the testing group?




Some answers to these and related questions are given in the paragraphs below.





When to Test


As already indicated, program testing is an activity that requires that the computer software be relatively stable. Testing as discussed here is a post-development process, or it could be interpreted as an "acceptance test" process. A software system can be considered reasonably stable:





- When the structure of the system is well determined, that is, all of the major subsystems and most of the modules are defined.
- At the time of first release of a working prototype.





It is important to delay the formal testing process until after the program is debugged, because frequent changes shift the burden of testing toward retesting. Most typical applications of software-testing technology advocate testing at least at the following three levels (or stages) of software development:





- After individual modules have been completed and thoroughly tested by their programmers (unit testing).
- During integration of individual modules into subsystems or during integration of subsystems into the final system (development testing).
- During final integration of hardware and software (system testing).





A tradeoff exists between the onset of formal testing and the quality ultimately achieved, although this relationship is not yet thoroughly documented. It appears necessary to plan for program testing fairly early in the software development process; by doing so, the software engineer is assured of a sufficiently good set of test cases with which to begin formal testing. Most of the time, programmers devise tests that exercise the working features of the program, rather than tests that identify errors. Hence, the testing process must concentrate on those features that were not exercised previously.





Organizational Schemes. Precisely how the testing team is organized and how it fits into the software life cycle are important questions that need to be addressed early. As already suggested, current thinking advocates the beginning of formal test procedures immediately after module development. The question remains, however, whether the testing should be done by a programmer, by a separate group in the organization, by an independent group (either within the organization or outside), or by somebody else. Some of the factors to be investigated in choosing the organizational structure for the testing team are:



To whom does the testing team report? If the test team reports to the ultimate buyer, then its independence is not in question. If software is supplied by a subcontractor, then the testing team should report to the prime contractor. In general, it is important that the testing team have adequate freedom to ensure that it can deliver its report without interference.





How is the budgeting done? Testing budgets could be set as a function of total system implementation costs or as a fixed percentage of staff time, or be based on the size of the software system (e.g., a fixed dollar amount of testing per statement).





Note that a number of organizations offering quality assurance services are springing up all over the United States and elsewhere in the world. Only time will tell the extent to which such organizations can survive, and how effective they will be as independent entities.





THE PSYCHOLOGY OF TESTING




This section presents some basic information, both positive and negative, about the psychology of testing.




The Negative Side




As with many human-based methodologies, the overall effectiveness of program-testing activity can be influenced by the attitudes held about testing. The prevailing attitude, and one that should be resisted as much as possible, views testing in the following ways:





- Testing is a dirty business, involving cleaning up someone else's mess and not involving any of the "fun" of creating programs.
- Testing is unimaginative and lacks the sense of adventure and opportunity for self-expression ordinarily associated with software production.
- Testing is menial work and involves handling too many details.
- Testing is too difficult, because of the complexity of programs, because of the difficulty of the logic of programs, and because of deficiencies in technology.
- Testing is not important, is under-funded, and generally involves too much work in too little time.
- The tools associated with testing are primitive and difficult to use.
- The techniques of testing are not rigorous. Too much ad hoc thinking is required; there is too little systematic knowledge to rely upon; too often every new activity is another "special case"; and there are too few generally accepted principles with demonstrated value.
- Testing has a negative reward structure; finding mistakes requires a critic's mentality.





All of these factors, or any one of them in the extreme, can change a successful program testing activity into a fiasco.





The Positive Side


It is possible to make the following claims about the psychology of testing:





- Testing is a challenge because it is a complicated problem that requires significant creativity at all times and because it rewards discipline.
- Testing is significant because software creators will appreciate having their oversights discovered, because managers will then feel more confident in the product, and because program testers can take pride in the software.
- Testing is interesting because the technology is imperfectly understood (and therefore represents an opportunity for innovation), because insights gained can be used in developing automated tools, and because many different approaches will be valued.

This positive view, unfortunately, has been apparent only in a few instances. Fortunately, this situation is changing in many areas of the computing community.



SOME MANAGEMENT GUIDELINES




This section presents general guidelines on the size, difficulty, cost, and expected outcomes of a formal software testing activity. Naturally, such estimates must be used carefully. The estimates are based on experience in testing software written in a high-level language (HLL).





Size of Testing Budget




A common way to express the size of the testing activity is as a percentage of the total software development costs. For HLL-coded systems, experience suggests that budgets in the range of 25 to 45% of the total development cost for the software are appropriate. Here, total development cost is intended to include all costs up to the time when the first release of the software is made. Naturally enough, the amount of effort devoted to the formal testing activity is a function of how critical the system is. The higher percentage figure appears to be minimally adequate for the highest-priority software systems. Examples of systems that fall into this class are those that are man-rated and those that have a large production run (with value far exceeding the total software development cost).

A second rule of thumb that has been found useful in the past is to provide insurance based on the expected cost of failure. For a product with an embedded computer, for example, it would be reasonable to allocate 5 to 10% of the total cost of repairing the first software failure (or set of failures), in the field if necessary, to a well-organized search for errors through systematic testing. The attitude to have is that there are errors in the software, and the only question is whether they are going to be found before or after issuance of the product! Note that this attitude allows for a product to be issued with a known error if the cost to repair the error at the time it is found is too large (e.g., when the program is "burned" into ROM), or when its impact is judged too small to warrant changes until the "next model."





Size of the Testing Problem




It is possible to estimate the total difficulty of the testing activity for software systems written in an HLL from past experience in analyzing software. Naturally there is a rather wide variance in these properties, depending on both the kind of problem being solved and the style of programming. To appreciate the numbers, it is important to understand the precise meanings of some testing-related terminology:



1. A statement in a program is either a declaration or an action statement, and may run to more than one line of text.
2. A segment is a logically independent piece of program text that corresponds to the actions the program takes as the consequence of a program decision (including the decision to make a subroutine invocation). Hence a program with no logic has one segment; each IF statement contributes two segments, and so forth.
3. A test is one transfer of control to the software system (regardless of the part being examined) and the consequential actions the system takes. This corresponds (in effect) to a single subroutine invocation.



Note that a subroutine with no logical statements -- the so-called straight-line program -- has one segment consisting of all the statements of the program, and at least one test would be required. Coverage is measured (in the C1 measure) by the percentage of segments that are exercised at least once in aggregate, that is, over all of the tests run. Typical programs will have a number of segments approximately equal to between 5 and 25% of the number of statements. (That is, each segment corresponds typically to about 4 to 20 statements.) A good "middle" figure is 10%: multiply the number of statements by 0.1 and you get the approximate number of segments.

The number of tests required to achieve an 85% C1 coverage level also tends to be predictable. For typical HLL-coded programs, each test exercises between 5 and 50 statements. This figure should not be misinterpreted: achieving 85% C1 coverage may involve some tests that exercise only a few statements and others that exercise as much as 50% of the program.

The 85% C1 coverage figure was chosen because it represents a reasonably attainable goal in practice with a large software system. It is important to appreciate that, in most cases, bringing the figure to 100%, if it is possible at all, would increase the number of tests required substantially.
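
As a worked illustration of these rules of thumb (using only the mid-range figures quoted above; nothing here is measured data):

```python
# Worked example of the sizing rules of thumb above (mid-range figures,
# not measurements from any real system).

statements = 1000            # program size
segment_ratio = 0.10         # segments are ~10% of statements (the "middle" figure)
stmts_per_test = 25          # each test exercises roughly 5 to 50 statements

segments = statements * segment_ratio            # ~100 segments
covered_for_85 = 0.85 * segments                 # ~85 segments must be exercised

# Crude lower bound on the number of tests, ignoring overlap between tests:
tests_needed = (0.85 * statements) / stmts_per_test

print(f"segments: {segments:.0f}")                               # 100
print(f"segments to exercise for 85% C1: {covered_for_85:.0f}")  # 85
print(f"rough test count: {tests_needed:.0f}")                   # 34
```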





The Results of Testing


The most important question is: Given 85% coverage of segments in a complex HLL-coded program, what is the likelihood of finding errors of varying severity? It is difficult to quote a precise figure, since the number of errors found is a function of the skill of the program testing team, the quality of the software (or lack of it), and the amount of prior testing that has been performed. For a typical situation in which software has been fully unit-tested by the programmers, but has not been systematically tested by a separate group, one can expect a latent defect count at a rate of 0.5 to 5.0% of the number of statements. This suggests that a 1000-statement program, when declared "finished" by the programmer, may have as many as 50 defects of varying levels of severity. Such defects fall into two classes: fatal and nonfatal. Approximately 15 to 20% of the defects are fatal, which implies that the total number of fatal errors can be as much as 1% of the total number of statements. (This may or may not be a comforting fact, depending on one's point of view!)



QUALITY ASSURANCE AND TESTING TOOLS




This section addresses the functional characteristics needed in modern quality assurance support tools. Quality assurance is virtually dependent on the use of tools because of the highly complex nature of the analyses that must be done.

The section describes each general category of tool, the ways in which these functions can be integrated into a single automated testing system for use on complex software, and projections of tool development in the future.





Generic Descriptions of Tools




The two broad categories of software quality assurance support tools are: static and dynamic analysis tools. Most tool functions fall cleanly into one category or the other, but there are some exceptions like symbolic evaluation systems and mutation analysis systems (which actually run interpretively). The main tools used in quality assurance are:



- Static analyzers, which examine programs systematically and automatically
- Code inspectors, which inspect programs automatically to make sure they adhere to minimum quality standards
- Standards enforcers, which impose simple rules on the programmer
- Coverage analyzers, which measure the coverage values (C0, C1, and so on) achieved during testing
- Output comparators, used to determine whether the output of a program is appropriate or not
- Test file/data generators, used to set up test inputs
- Test harnesses, used to simplify test operations
- Test archiving systems, used to provide documentation about programs



Each of these is discussed below. The discussion is organized into the two major divisions in the above list: static and dynamic.





Static Testing Tools


Static testing tools are those that perform analyses of the programs without executing them at all.





Static Analyzers.




A static analyzer operates from a precomputed data base of descriptive information derived from the source text of the program. The idea of a static analyzer is to prove allegations, which are claims about the analyzed programs that can be demonstrated by systematic examination of all of the cases. There is a close relation to code inspectors, but static analyzers are stronger. All of these systems are language-dependent in the sense that they apply to a particular language, and often to a particular system as well.





Experience: Many applications of static analyzers find deficiency reports at a rate of 0.1 to 2.0% of NCSS (noncomment source statements). Some of these are real, and others are spurious in the sense that they are false warnings that are ignored after interpretation.
Possibility: Build a static analyzer for the language/machine being used; then apply it to all code produced.
Assessment: Probably a favorable benefit/cost ratio if the software analyzed is critical and the tool is well designed.
Problems: Dependence on language vagaries, variance between compilers, and high initial tool investment costs.





Code Inspectors.


A code inspector does a simple job of enforcing standards in a uniform way for many programs. The rules enforced can be single-statement or multiple-statement rules. It is also possible to build code inspector assistance programs that force the inspector to do a good job by linking him to the process through an interactive environment. The AUDIT system is typical: it enforced some standards and also imposed some minimum conditions on the program. AUDIT, in use by the Navy for some time, was found effective in production coding. Code inspection activity is found in some COBOL tools and in some parts of tools like RXVP.





Experience: Found useful in many circumstances.
Possibility: Implement an automated code inspector system for use in handling production coding, and possibly nonproduction code.
Assessment: Used properly, it can have a big payoff.
Problems: Language dependence, programmer resistance, initial investment costs, and difficulty in constructing a good programmer interface.





Standards Enforcers.


This tool is like a code inspector, except that the rules are generally simpler. The main distinction is that a full-blown static analyzer looks at whole programs, whereas a standards enforcer looks only at single statements. Since only single statements are treated, the standards enforced tend to be cosmetic ones; even so, they are valuable because they enhance the readability of the programs. It seems well established that the readability of a program is an indirect indicator of its quality.





Experience: Once initial programmer resistance is overcome, such tools are found helpful as a filter protecting against completely unreadable code.

Possibility: Establish standards and support tools to enforce them and then have all suppliers use that standard.

Assessment: Indirect benefit on software quality, a factor difficult to justify quantitatively but one that has a certain palliative effect.

Problems: Programmer resistance and the attitude that program format is unimportant.





Other Tools.




Related tools are used to catch bugs indirectly through listings of the program that highlight the mistakes. One example is a program generator that is used (mostly in COBOL environments, but possibly in others as well) to produce the pro-forma parts of each source module. Use of such a method ensures that all programs look alike, which in itself enhances the readability of the programs. Another example is the structured programming preprocessor that produces attractive printed output. Such augmented program listings typically have automatic indentation, indexing features, and in some cases much more.



Dynamic Testing Tools




Dynamic testing tools seek to support the dynamic testing process. Besides individual tools that accomplish these functions, a few integrated systems group the functions under a single implementation. A test consists of a single invocation of the test object and all of the execution that ensues until the test object returns control to the point where the invocation was made. Subsidiary modules called by the test object can be real or they can be simulated by testing stubs. A series of tests is normally required to test one module or to test a set of modules. Test support tools must perform these functions:





- Input setting: selection of the test data that the test object reads when called.
- Stub processing: handling outputs and selecting inputs when a stub is called.
- Results display: providing the tester with the values that the test object produces so that they can be validated.
- Test coverage measurement: determining the test effectiveness in terms of the structure of the program.
- Test planning: helping the tester to plan tests so they are both efficient and effective at forcing discovery of defects.



Coverage Analyzers (Execution Verifiers).




A coverage analyzer or execution verifier (or automated testing analyzer, or automated verification system, etc.) is the most common and important tool for testing. It is often relatively simple. C1 is the most commonly used measure, but Cd is used sometimes (see below). Most often, C1 is measured by planting subroutine calls -- called software probes -- along each segment of the program. The test object is then executed, and some kind of run-time system is used to collect the data, which are then reported to the user in fixed-format reports.
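
A minimal sketch of the probe idea, with hand-planted probes standing in for the instrumentor's automatic insertion (the test object and segment numbering are hypothetical):

```python
# Sketch of C1 measurement via software probes. In a real coverage
# analyzer the probe calls are planted automatically by an instrumentor;
# here they are written by hand for illustration.

hits = set()                 # run-time record of exercised segments
TOTAL_SEGMENTS = 3

def probe(segment_id):
    hits.add(segment_id)

def classify(x):             # the (hypothetical) test object
    if x < 0:
        probe(1)             # segment 1: negative branch
        return "negative"
    else:
        probe(2)             # segment 2: non-negative branch
        if x == 0:
            probe(3)         # segment 3: zero case
            return "zero"
        return "positive"

# Run some tests, then report cumulative C1 coverage.
for test_input in (-5, 7):
    classify(test_input)

c1 = 100.0 * len(hits) / TOTAL_SEGMENTS
print(f"C1 coverage: {c1:.0f}% ({sorted(hits)} of {TOTAL_SEGMENTS} segments)")
# -> 67%: segment 3 (the zero case) was never exercised.
```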



Experience: Use of coverage analysis can be incorporated into most quality assurance situations, although it is more difficult when there is too little space or when non-real-time operation is inequivalent to real-time operation (an artifact of the instrumentation process).

Possibility: Require use of C1 (at least) in all cases; 85% C1 is a practical, minimally acceptable value of testing coverage.

Assessment: Highest payoff in terms of quality achieved at the lowest possible cost through use of C1 measurement.

Problems: Programmer resistance, nonstandard system use, difficulty in interpreting C1 values for multiple tests (requires some kind of history analyzer).





Output Comparators.




Output comparators are used in dynamic testing -- both single-module and multiple-module (system level) varieties -- to check that predicted and actual outputs are equivalent. This is also done during regression testing. The typical output comparator system objective is to identify differences between two files: the old and the new output from a program. Typical operating systems for the better minicomputers often have an output comparator, sometimes called a file comparator, built in.
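
A minimal version of such a comparator is easy to sketch; the one below uses Python's standard difflib module, with hypothetical file names:

```python
# Minimal output comparator: report differences between the old and new
# output files of a program (e.g., for regression testing).
import difflib

def compare_outputs(old_path, new_path):
    with open(old_path) as f:
        old_lines = f.readlines()
    with open(new_path) as f:
        new_lines = f.readlines()
    diff = list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_path, tofile=new_path))
    if not diff:
        print("Outputs are equivalent.")
    else:
        print("".join(diff))   # only the differences are shown

# compare_outputs("run_old.txt", "run_new.txt")  # hypothetical file names
```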





Experience: Basic utility; finds differences effectively and usually quite efficiently.

Possibility: Equip the quality assurance facility with an output comparator for general use.

Assessment: A needed tool for the QA environment.

Problems: May produce too much output if old and new files are lengthy and/or have many differences.



Test File Generators.




A test file generator creates a file of information that is used as input to the program and does so based on commands given by the user and/or from data descriptions (in a COBOL program's data definition section, for example). Mostly, this was a COBOL-oriented idea in which the file of test data is intended to simulate transaction inputs in a data base management situation. This idea has been adapted to other environments.
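
A toy sketch of the idea (the record layout, field names, and value ranges are all invented for illustration):

```python
# Sketch of a test file generator: build a file of input transactions
# from a simple field description. Field names and ranges are invented
# for illustration.
import csv
import random

fields = {
    "account": lambda: random.randint(10000, 99999),
    "code":    lambda: random.choice(["DEP", "WDL", "XFR"]),
    "amount":  lambda: round(random.uniform(0.01, 5000.00), 2),
}

def generate_test_file(path, n_records):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(fields.keys())
        for _ in range(n_records):
            writer.writerow([make() for make in fields.values()])

generate_test_file("transactions_test.csv", 100)  # 100 simulated transactions
```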





Experience: Good benefit when this kind of tool is used.
Possibility: Use a test file generator, or adapt the concept, to create lengthy files of input transactions. The best use is to have the input data varied automatically to account for a range of cases.

Assessment: For a given software testing situation, it may be a good idea to include a test file generator system -- either commercial or home brew -- as a way to save valuable testing effort.

Problems: Difficulty of use.





Test Data Generators.




The test data generation problem is a difficult one, and at least for the present is one for which no general solution exists (a known theoretical fact). On the other hand, there is a practical need for methods to generate test data that meet a particular objective, normally to execute a previously unexercised segment in the program. One of the practical difficulties with test data generation is that it requires generation of sets of inequalities that represent the conditions along a chosen path, and the reality is that:



- Paths are too long and produce very complex formulas.
- Formula sets are nonlinear.
- Many paths are illegal (not logically possible).



Practical approaches to automatic test data generation run into very difficult technical limits. In practice, the techniques of variational test data generation are often quite effective: the test data are derived (rather than created) from an existing path that passes near the intended segment for which test data are to be found. This is often very easy to do, apparently because program structure tends to assist in the process.
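
A crude sketch of the variational approach (the test object, target segment, and search rule are all hypothetical):

```python
# Variational test data generation, sketched: derive new test data from
# an existing input by varying one value until an unexercised segment
# (here, the x == 0 case) is reached. The test object is hypothetical.

def classify(x):
    if x < 0:
        return "negative"     # segment A
    if x == 0:
        return "zero"         # segment B -- the target, not yet exercised
    return "positive"         # segment C

def vary_toward_target(seed, target="zero", steps=100):
    """Step the seed input toward zero until the target segment fires."""
    x = seed
    for _ in range(steps):
        if classify(x) == target:
            return x          # found test data for the target segment
        x += 1 if x < 0 else -1
    return None

print(vary_toward_target(seed=7))   # -> 0
```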



Experience: Automatically generating test data is effectively impossible, but good R&D work is now being done on the problem.
Possibility: Use variational and machine-assisted methods only.
Assessment: Technical issues of this problem may be more formidable than the problem is in practice.
Problems: Recursive undecidability of test data generation problem, and difficulty in developing good interactive test data generation heuristics.



Test Harness Systems.




A test harness system is one that is bound (i.e., link-edited and relocated) around the test object and that (1) permits the easy modification and control of test inputs and outputs and (2) provides for online measurement of C1 coverage values. Some test harnesses are batch oriented, but the high degree of interaction available in a full-interactive system makes it seem very attractive in practical use. Modern thinking favors interactive test harness systems, which tend to be the focal point for installing many other kinds of analysis support.
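
A minimal harness might look like the following sketch, under the assumptions that the test object is a single routine and its subsidiary module is replaced by a stub (all names are hypothetical):

```python
# Minimal test harness sketch: drive the test object with controlled
# inputs, replace a subsidiary module with a stub, and display results.
# All names are hypothetical.

def lookup_rate(region):          # stub standing in for a real module
    stub_rates = {"north": 0.05, "south": 0.07}
    return stub_rates.get(region, 0.0)

def compute_tax(amount, region):  # the test object
    return amount * lookup_rate(region)

def run_tests(cases):
    for (args, expected) in cases:
        actual = compute_tax(*args)
        print(f"compute_tax{args} = {actual} "
              f"[{'OK' if abs(actual - expected) < 1e-9 else 'MISMATCH'}]")

run_tests([
    ((100.0, "north"), 5.0),
    ((100.0, "south"), 7.0),
    ((100.0, "west"),  0.0),   # unknown region falls through the stub
])
```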





Experience: Clear benefit/cost improvement, even with batch-oriented systems. (TESTMANAGER, a typical batch system, had nearly 300 installations.)

Possibility: Current technology permits development of a test system for almost any system and for almost any language.

Assessment: Give strong consideration to building a test harness system for the candidate language and on the candidate machine, for both single-module and system testing.

Problems: Need for an interactive environment and for customization to the language/machine combination; development cost.



Test-Archiving Systems.




The goal of a test archiving system is to keep track of a series of tests and to act as the basis for documenting that the tests have been done and that no defects were found during the process. A typical design involves establishing both procedures for handling files of test information and procedures for documenting which tests were run when and with what effect.
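
A sketch of the record-keeping idea (the archive format and fields are invented for illustration):

```python
# Sketch of a test-archiving record: append one entry per test run so
# that later (e.g., regression) testing can document what was run,
# when, and with what effect.
import json
import time

ARCHIVE = "test_archive.jsonl"   # hypothetical archive file

def archive_test_run(test_id, inputs, outcome, coverage_c1):
    record = {
        "test_id": test_id,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "inputs": inputs,
        "outcome": outcome,       # e.g., "pass" or "fail"
        "c1_percent": coverage_c1,
    }
    with open(ARCHIVE, "a") as f:
        f.write(json.dumps(record) + "\n")

archive_test_run("T-017", {"x": -5}, "pass", 67.0)
```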





Experience: Test archive systems are mainly developed on a system-specific/application-specific basis.
Possibility: Implement manual methods at least, and give consideration to automating test documentation processes.
Assessment: High potential value during regression testing (after maintenance phase is reached) because of automation.
Problems: Dependency on local environment, inadequate experience in automation.





Characteristics of Tools


The characteristics of modern software quality assurance tools differ somewhat from previous systems. Modern tools are modular and highly adaptable to different environments. They also tend to make use of the facilities provided by most operating systems rather than provide those capabilities internally. The ingredients of an AVS (automated verification system) package of modern design are:



1. Instrumentor: This component modifies source programs so that they emit subroutine calls to a RUNTIME package that reports to a COVERAGE analyzer.


2. Runtime package: This component receives data from programs that were processed by the INSTRUMENTOR and generates either an on-line trace file or a series of calls directly to the COVERAGE component.
3. Coverage analyzer: This component accepts information from the RUNTIME package and produces C1 coverage reports. This component also generates S1 (system level) coverage reports as an option.
4. Command processor: A unified component that interfaces between the user and the other components.
5. Test assistance component: This tool displays partial listings of program text for known-to-be-executable paths and provides input/output analysis that assists in employing the variational test data generation method.
6. Log processor: This component generates an archival log of all actions taken by the tester during processing of a system.
7. Test harness: This component provides standardized drivers and stub controllers for the test object (i.e., the program being tested).
8. Results analyzer: This component automatically compares the values produced by a program with the reference values recorded prior to the program's execution.
9. Archive controller: This unit manages a set of system-dependent files that keep permanent records of the effects of testing throughout the test process.
10. Report controller: This component monitors all the output produced by other components and guarantees that the total volume of information does not overwhelm the user.



Future Tool Prognosis




The future development of tools is dependent on at least two factors: (1) the level of use made of current tools -- individual tools and integrated sets of tools -- and (2) the degree of interest expressed by the software engineering community in having advanced tools. Tools tend to lead the application of methodologies, at least in the sense that when a new methodology is conceived, it is normally necessary to construct (or at least envision) a set of tools to go along with it. This means, in effect, that each new methodology invented by researchers in the field will have to be tried out first with prototypes of what may ultimately become integrated tool sets. This happened in the early 1970s with such systems as JAVS and RXVP, among others, where early experimental prototypes of systems that are now in practical use were developed, in most cases without complete information about their ultimate operational function. It seems clear now that the central focus for the future will be on tools as interactive support facilities, connected to data base systems used to keep track of all of the detailed information, and effectively built in to the system and/or methodology that surrounds their use.





LEVELS OF UNIT TESTING TECHNOLOGY




Quality assurance is a set of disciplines that can be applied in a number of ways, and at varying levels of sophistication. The essential ingredients of typical quality assurance activities are three:





1. Systematic inspection and analysis, an activity that seeks to find discrepancies between requirements and the actual software
2. Dynamic test planning, an activity that defines a series of tests that will be run to demonstrate various functions of the software
3. Comprehensive test evaluation, an activity that occurs during dynamic testing and involves analysis of the test results after tests are run.



These three steps are included in one form or another in almost all testing processes -- although not necessarily under precisely these names. Note that the code inspection process is fundamentally different from the other two, both of which involve execution of the program in some way. Hence, the technology for systematic inspection of programs can be handled in a fully different fashion from the methods used to treat dynamic testing.

The following section describes a sequence of increasingly sophisticated levels of testing methodology, expressed in terms of coverage measures that indicate how thoroughly the processes in steps 2 and 3 are actually accomplished. The use of coverage measures for this purpose is intended as a basis for describing the methodologies associated with each of them. The measures are structural indicators of the thoroughness with which each portion of the software under examination is actually exercised.

As studies point out, the relationship between achieved coverage and the chance of an error remaining in the software is an indirect one at best, even though there is substantial evidence to indicate that the correlation is very high. As a convenience to the reader, definitions of the various coverage measures are given at the beginning of each section.



Inspection Methods




The idea of code inspections was first described in full in 1976; Fagan described IBM's internally developed procedures. They were an outgrowth of structured programming and as such may be naturally combined with regular structured programming methods. The method of code inspection, when applied to critical software, can be expected to discover a number of problems with a software system. Code inspection is best performed, according to Fagan, with teams in which individuals have very well-defined roles:





- The moderator, the key person on the team, whose job it is to "... preserve objectivity and to increase the integrity of the inspection."
- The designer, the expert responsible for doing the design for the system.
- The coder/implementor, who builds the software.
- The tester, responsible for all software testing.





This team applies a set of well-defined rules to the candidate software and seeks to find "errors" in the programs. Naturally, the errors are either fixed immediately, or they are held for later verification through dynamic testing.



Experience: Fagan's report became the "bible" for this technique; it quotes 80% or better error detection rates at costs that are quite low: between 650 and 900 NCSS per team-hour for preparation and inspection at two stages. (This approximates 200 NCSS per hour overall.)

Possibility: Application of code inspection generally increases overall final-product productivity by reducing error content early.
Assessment: Use code inspection methods as part of the quality assurance discipline, as software is delivered.
Problems: Rules to be followed are specific to each language/machine/application combination, so must be redone each time.





Testing Methods




The methods of testing always focus on execution characteristics of the programs being examined. A main theme of the testing literature is the trend to interpret the effect of tests in terms of the level of structural exercise the programs attain. The levels of testedness then have a specific meaning in terms of testing coverage, described next at various levels.





Ad Hoc Testing (C0 Coverage). Program structure is viewed in a model called a "directed graph representation" of the program. In this model, nodes correspond to "places" in the program text, and edges correspond to actions (or segments). Conventional programming methods normally involve only a limited amount of testing -- sometimes called debugging -- and typically result in less than full exercise of a program. Many writers during the 1970s advocated the following measure as a way to characterize the quality of a set of tests.





C0 coverage measure: C0 is the number of statements in a module that are exercised, divided by the total number of statements present in the module, expressed as a percentage.





The intention of this measure is to ensure that the highest possible fraction of program statements is exercised at least once. The surprising thing about much of the work done by programmers is that most of the time they test programs very naively. C0 is not normally considered an acceptable level of quality in the testing process but is generally felt to be better than nothing at all. In other words, the C0 measure records what fraction of the total number of statements has been exercised.





Experience: Most programs when completed are about 50% tested in the C1 measure, and perhaps 75 to 90% tested in the C0 measure.
Possibility: Use C0 testing as the basic measure of programmer testing, but do not use it to measure actual quality assurance testing coverage.
Assessment: Use C0 only if no other measure is possible; it is best to use C1 as the minimum (see below).
Problems: The main problem is that C0, even if achieved, would leave some segments unexercised.





Basic Testing (C1 Coverage).





The next level in testing comprehensiveness is established by the first truly structure-based measure, called C1 and defined as follows:





C1 coverage measure: The percentage of segments exercised in a test, relative to the total number of segments in a program.





The C1 measure has its origin in research work aimed at developing a strong measure for testing effectiveness that takes into account most, if not all, of the elementary features of a software system. C1 measures the total number of segments exercised in each test, or when computed cumulatively, the total fraction of segments that are exercised in one or more tests in a series.

The C1 goal is to have a set of tests that in aggregate exercise a high percentage of all of the segments it is possible to exercise. Current thinking is that a value of C1 of 85% or greater is a practical level of testing coverage. Experience suggests that this level of C1 coverage is adequate to discover perhaps 90% of the available errors, although there is no way of knowing that for certain. The 85% level arises from the fact that the JAVS system was purchased by the U.S. Air Force with an 85% self-test requirement (a requirement dating from 1973 and 1974). Even though 85% is not perfection, it is much stronger than the 25 to 50% actually achieved by most programmers -- even the good ones. Even when coverage information is provided as part of the programming process, evidence suggests that unless prodded, programmers don't exceed the threshold levels for C1. The methodology for getting C1 = 85% is quite well understood in general terms, and involves structured test planning, dynamic operation of the program, and a lot of study of the source text. This is discussed in more detail below.





Experience: 85% C1 or more means finding defects at a rate of 2 to 5% of NCSS, approximately 10 to 20% of which are probably fatal or dangerous.
Possibility: Apply C1-based measurement as soon as possible to critical applications.
Assessment: C1 is a good starting point for a systematic methodology for testing, and a good minimum requirement.
Problems: No proof exists to back up the empirical evidence that software testing using the C1 measure has a predictable effect in forcing discovery of errors.



Intermediate Testing (C2 Coverage).




The next most sophisticated level above C1 testing attempts to force exercise of some basic features of the program besides just trying all the segments. This is the C2 measure, which is defined as follows:





C2 coverage measure: This coverage measure assesses the quality of tests by requiring full C1 coverage, plus one test for each iteration's interior and exterior.





C2 came about because a measure was needed to emulate the work of proof-of-correctness methods. The strength of C2 lies in the fact that achieving C1 forces exercise of all of the volume of the code, and C2 adds exercise of the iterations. Note that C2 will normally require more tests than C1 alone would. In practice, although there is only a small amount of evidence, it is felt that C2 is significantly stronger than C1, in the sense that accomplishing it would be more likely to uncover looping errors in programs, in much the same way a proof of correctness does.
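
A small sketch makes the iteration requirement concrete (the loop and test inputs are hypothetical): C2 asks for at least one test in which the loop body never runs (the exterior) and one in which it runs at least once (the interior):

```python
# C2's extra requirement over C1, sketched: for each loop, one test must
# exercise the loop's exterior (zero iterations) and one its interior
# (at least one iteration). The function is a hypothetical example.

def sum_positive_prefix(values):
    total = 0
    for v in values:          # the iteration that C2 singles out
        if v <= 0:
            break
        total += v
    return total

print(sum_positive_prefix([]))          # exterior: loop body never runs
print(sum_positive_prefix([3, 4, -1]))  # interior: loop body runs
```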



Experience: Minimal, but much consideration has been given to use of this measure.
Possibility: Apply C2 in the most important one-third of the application programs.
Assessment: Likely candidate for future use, if possible.
Problems: Tends to be structure dependent, so that if the program has a structural fault, there will be less chance of finding any defect easily (this is a common limitation of all structure-based measures).





Advanced Testing (Ct Coverage).




Ct is a very strong test method. It designs tests in terms of the pure-structured representation of a program. The definition of Ct is as follows:



Ct coverage measure: The percentage of independent execution subtrees in the hierarchical decomposition tree of a program that have been exercised, relative to all of the possible subtrees that can be found for that program.



Ct for a pure-structured program represents the same kind of testing that would be accomplished in a full-blown proof of the program. In other words, the set of tests indicated by this method is the same as the set that would be used if each of the verification conditions for the program were tested. The Ct measure is computed as follows: the hierarchical decomposition tree is generated from the directed graph (digraph) for the program, and the set of all possible subtrees of this tree is found. Each subtree represents one distinct executional class for the program. Ct is measured as the percentage of all such subtrees that have been tried during testing. Another way to find the subtrees is to think of the program as if it were pure-structured, and then choose the tests by inspection from that representation of the program. The use of decomposition methods is not necessary when the program is already pure-structured.





Experience: Primarily pencil-and-paper investigations, although some organizations have automated test planning methods based on this principle.
Possibility: Use Ct as the basis for very strong testing of modules of the application software.
Assessment: Use of Ct will probably represent the strongest of the dynamic testing disciplines.
Problems: Ct is highly dependent on program structure, so when there are structural faults, Ct may not be effective.



Symbolic Evaluation Methods.




Besides proof of correctness, symbolic evaluation and symbolic execution are the strongest methods known for assuring quality through testing. Actually, symbolic evaluation is a static analysis-type method, because the program is never actually executed. Instead, a particular path through the program is evaluated in detail from the source-level version of the program. All the indicated computations are performed symbolically, subject to constraints that may exist along the path because of the kind and number of conditionals that specify whether the path is executable. The result is a set of formulas that tell what the path computes. This is very close to proof of correctness. The relation is so close that the method is considered, along with test data generation systems and proof systems, as the partial basis for quality assurance systems of the future. Most of the work in this area has been done quite recently.
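
A toy illustration of the idea, using the sympy library (the program fragment and path are invented): because the inputs are symbols rather than values, evaluating one path yields both the formula the path computes and the predicate under which the path is executable:

```python
# Toy symbolic evaluation of one path through a (hypothetical) program:
#     if x > 0:  y = 2*x + 1
#     else:      y = -x
# Evaluating the "then" path symbolically yields both the formula the
# path computes and the predicate that makes the path executable.
import sympy

x = sympy.Symbol("x")

path_predicate = x > 0          # constraint along the chosen path
path_result = 2 * x + 1         # what the path computes, as a formula

print("path executes when:", path_predicate)
print("path computes y =", sympy.simplify(path_result))
print("satisfiable for:", sympy.solveset(path_predicate, x, sympy.S.Reals))
```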



Experience: Primarily in academic-like environments, and for small (less than 2,000 NCSS) programs. Many errors are found during the symbolic evaluation process, but the method appears to depend on human interaction.

Possibility: Increased R&D work could lead to a user-interactive system that reduces the amount of work to be performed.

Assessment: Good prospects for this approach for the future.

Problems: Combinatoric growth in formula size, complexity of the formulas and/or the path predicates, and difficulty in reducing the formulas to a human-readable format.





Relation to Program Proving Methods.




A proof of correctness is a mathematical demonstration of the consistency between a program and its specifications, relative to sets of statements about the environment and the semantic behavior of the program. The largest programs ever proved correct are about 1,700 statements long.

As part of the proof process, some 43 errors in the 1,700-line program were found; it was later discovered that about 86 percent of these would have been detected through C1-based testing. Some proofs that have been completed, and even published in the technical literature, have later been found to be proofs of programs containing easily visible errors. This points out the fundamental dependence of proof methods on mathematically complete statements about the environment in which the programs are supposed to run.

The basic steps in a proof are: (1) establishing the assertions, (2) constructing the proof, and (3) checking the result. These steps are only partially automatable. Finding the assertions is a process similar to that involved in setting up the plans for C2-based testing. Constructing the proof can be done automatically, but most systems require human assistance when given a problem drawn from a real-world example. Lastly, the check process itself must be verified by running some actual tests of the program.
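The flavor of steps (1) and (3) can be suggested with a small sketch (hypothetical Python; the program and its assertions are the author's illustration, not from the article). The loop invariant below is the kind of assertion a verifier would try to establish; executing it against test data corresponds to checking the result by running actual tests:

```python
# A sketch of the assertion step rendered as runtime checks (hypothetical
# Python; the program and assertions are invented for illustration). The
# loop invariant is the kind of statement a verifier would try to prove;
# executing it on test data is the "run some actual tests" check.

def summate(values):
    """Sum a list, with an inductive assertion checked on each iteration."""
    total = 0
    for i, v in enumerate(values):
        # Invariant: total equals the sum of the first i elements.
        assert total == sum(values[:i]), "invariant violated"
        total += v
    # Postcondition: total equals the sum of the whole list.
    assert total == sum(values), "postcondition violated"
    return total

print(summate([1, 2, 3, 4]))  # -> 10
```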





Experience: Much in the R&D community, little in the practical world. Largest program proved is about 2,000 statements or less.
Possibility: Scaling problems will limit proving methods to smaller systems.
Assessment: Use proofs for most important modules only.
Problems: What checks the proof? Known errors in proof-of-correctness exercises. Complexity of the method.



Future/Projected Techniques vs. Current Methods.




Prior sections have identified the basic methods of software testing, expressed in terms of the kind of coverage that must be achieved at each level for single-module analysis. The methodology for multiple modules and for systems is similar, except for the differences that arise from complexity and size. Future methods of testing include at least domain testing and mutation analysis. Domain testing involves the analysis of the input space of each module to establish a set of domains that can be used as the basis for a systematic demonstration of each module's behavior. The input domains' boundaries are analyzed in detail for their intersections; test data that push the behavior of the program close to the boundaries are deemed better than ad hoc test data. Mutation analysis is the process of systematically constructing small variations of the given program, called mutants, and then determining whether they can be distinguished from it by the existing test data. Each mutant that is distinguished is termed "retired"; the remaining mutants are either equivalent to the original program or inequivalent but undistinguished by the test data. In the latter case, one must add test data. Although experimental, the mutation scheme is attracting a great deal of interest because it provides a direct, quantitative measure of the quality of test data for a program.



The disadvantage of the method is that it may require rather large numbers of mutants, and the number of equivalent mutants may be quite large.
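A minimal sketch of the mutation idea follows (hypothetical Python; the function, mutants, and tests are invented for the example). Each mutant differs from the original by one small change, and test data that make a mutant's output differ from the original's retire that mutant:

```python
# A minimal sketch of mutation analysis (hypothetical Python; the
# function, mutants, and tests are invented). A mutant is retired when
# some test makes its output differ from the original's.

def original(a, b):
    return a + b

mutants = {
    "plus -> minus": lambda a, b: a - b,
    "plus -> times": lambda a, b: a * b,
}

# (2, 2) alone would leave "plus -> times" alive, since 2 + 2 == 2 * 2;
# (0, 5) was added to distinguish it.
tests = [(2, 2), (0, 5)]

for name, mutant in mutants.items():
    killed = any(original(a, b) != mutant(a, b) for a, b in tests)
    print(name, "retired" if killed else "ALIVE -- add test data")
```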





Possibility: Methods like these will be used soon as the basis of future quality assurance disciplines.
Assessment: Both domain testing and mutation analysis methods are too immature for real-world application.
Problems: Domain testing appears to be limited in effectiveness to linear-constraint (linear logic) applications. Mutation analysis is still in the experimental stages and needs extensive field validation prior to actual use in a practical situation.





TESTING METHODOLOGIES




This section describes some of the basics of two major classes of methodologies: single-module testing, and multiple-module testing (or system testing). The goal of the section is to establish a baseline of understanding about the processes that apply in typical quality assurance circumstances.

A way of determining how to apply a testing methodology is to examine the characteristics of the various methods against a backdrop of some typical situations. The four cases we can employ for this purpose are:



1. Case a: A high-criticality single module
2. Case b: A medium-sized, medium-criticality system
3. Case c: A large, medium-criticality system
4. Case d: A large, high-criticality system.



Case a is written in a high-level language, is perhaps 250 to 500 statements long, and has a very complicated control structure. It has many different things to do, and it must be error free. Case b is a set of about 25 modules that support an on-line facility of some kind. If the software system fails, there is no serious loss because there is an automated backup system; however, a failure is expensive in terms of lost production and associated waste. The system is written in a structured extension of FORTRAN and contains about 6,500 statements overall. The calling depth in the structure of the system is between 4 and 7 -- not complex and not flat. Case c is a comprehensive system for control of a facility; it is 80% in a high-level language like Pascal or Ada and 20% in an assembly language. The total volume of code is in the range of 175,000 NCSS. If this system fails, there is a substantial loss of value, but no lives would be lost, and the damage would not normally be extensive or expensive to repair. Case d is a geographically dispersed interactive control system in which human life is at stake -- like an advanced air traffic control system. The system approaches the limits of current complexity, in the sense that the latest methods are employed in its design and implementation, and it runs to perhaps 1,000,000 NCSS overall.





Single-Module Test Methodology




Single-module testing puts maximum focus on each separately compilable and/or separately invokable software module. The process of exhaustively analyzing a single module has three stages. First, there is an analysis and preprocessing stage, during which the module is structurally analyzed and studied. If an automated test system is being used, this stage includes preparing the module to be the test object in the subsequent phases. Typically, there is some programmer-supplied test data that can be used to establish an initial value for the coverage measure being used. The coverage level obtained by these initial tests is normally quite low; in fact, there is reason to believe that a major cause of the high defect rates in software is that programs are never tested very well by their programmers. The second stage is the middle game, during which tests are generated systematically to increase coverage. Only a few errors are likely to be found here, since even though the programmer may not have exercised the program thoroughly, many of the errors that can be found through inspection will already have been removed. The final stage is the endgame of testing, during which it may be necessary to plan a number of difficult tests so that all of the segments can be exercised. As an alternative to requiring exercise of a segment, the program tester can spell out the reasons why a segment was not exercised.
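The instrumentation behind the first stage can be suggested with a small sketch (hypothetical Python, using the language's standard tracing hook rather than any particular test system): record which lines of a module execute under the programmer's initial test, as a line-level approximation of C1 coverage:

```python
# A hypothetical sketch of first-stage instrumentation, using Python's
# standard sys.settrace hook rather than any particular test system:
# record which lines execute under the programmer's initial test.

import sys

def trace_lines(target, *args):
    """Run target(*args) and return the set of its executed line numbers."""
    executed = set()
    code = target.__code__

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            executed.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        target(*args)
    finally:
        sys.settrace(None)
    return executed

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

covered = trace_lines(classify, 5)  # the programmer's lone initial test
print("lines executed:", sorted(covered))  # the n < 0 branch was never taken
```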



Experience: Most errors are found near the end of the single-module testing process, although the reasons why this is true are not clear.
Assessment: Every module should be tested to at least a high C1 level of structural coverage, and much more testing should be done if possible.
Problems: Single-module testing is perhaps an order of magnitude easier with the right tool environment (i.e., an interactive test system); without good tools, the work is difficult.





System-Testing Methodology




System testing is analogous to single-module testing, except that a segment corresponds to a whole module. The two modes of attack can be different -- a fact that is highly dependent on the application software and its internal structure. System testing is typically done either bottom up or top down. If the bottom-up methodology is chosen, the flow of attention during the testing process moves from the single modules at the bottom of the system toward the topmost (driver) modules later on. If the top-down method is used, attention begins at the topmost module and continues until every module in the system has been handled; stubs (empty procedures) are sometimes used to simulate subsidiary module behavior. The choice of method depends on many factors, but primarily on the structure of the software system itself. Large systems that are flat (in the sense that they have a small maximum invocation chain length) tend, other factors being equal, to work better in a bottom-up mode than top down. Systems that are very narrow and deep tend to fall more easily into the top-down mode.
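A small sketch of the top-down mode with a stub may be useful (hypothetical Python; the modules and data are invented). The topmost module is exercised first, with its subsidiary module simulated by a stub that returns canned answers:

```python
# A sketch of top-down testing with a stub (hypothetical Python; the
# modules and data are invented). The topmost module is tested first;
# its subsidiary module is simulated by a stub with canned answers.

def lookup_tax(region):
    """Stub: stands in for an unwritten rate-table module."""
    return {"north": 5, "south": 7}.get(region, 0)  # percent

def total_cents(amount_cents, region, tax_fn=lookup_tax):
    """Topmost module under test; the subsidiary is injected so the
    stub can later be replaced by the real module."""
    return amount_cents * (100 + tax_fn(region)) // 100

assert total_cents(10000, "north") == 10500
assert total_cents(10000, "unknown") == 10000
print("topmost module behaves as specified against the stub")
```

When the real subsidiary module is written, it replaces the stub and the same tests are rerun against the pair.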

Experience: Most experience is with single module testing, but in some cases, there is extensive system-level experience. Early data seemed to suggest that there is no effective difference between one level of testing and the other qualitatively, only quantitatively.
Possibility: Direct extension of single-module testing to the system-level case, provided the right data are presented.
Assessment: Proceed in all cases with system testing as if it were module testing, assuming that this option exists.
Problems: Lack of experience in testing very large systems of software; unavailability of appropriate tools.


Application to Cases




The application of these methodologies to the cases described above should now be relatively straightforward.



- Case a: A high-criticality single module. In this case, the strongest methodology for single-module testing should be applied to the critical module. The C2 measure should be used if possible.
- Case b: A medium-sized, medium-criticality system. It is not possible to focus all the effort on each module, so a limited approach seeking to maximize the payoffs from rigorous single-module testing should be followed. This will tend to maximize the outcome in terms of numbers of errors found.
- Case c: A large, medium-criticality system. For a large system, there is presumably a somewhat larger budget, and so it is possible to consider building special purpose tools that can be used to support the testing activity (see below). In addition, this level of effort can involve other features of quality assurance besides simple testing.
- Case d: A large, high-criticality system. This situation requires the use of the best methodologies, thorough analysis of single modules throughout the software, and a good level of system testing.



Contemporary Results.




It is possible to set up a systematic test factory or assembly line and have it work well. In an experimental test factory, the goal was to process a large amount of software at relatively low cost. In 24 man-months of work in the automated software test factory, some 60,881 statements of a PL/I-like language were processed. The programs comprised some 4,378 segments and required 1,544 tests to achieve 89.7% overall C1 coverage. In the processing, some 1,486 discrepancies were found between the software as it existed and the specifications as they existed. About 190 of these were fatal program errors; the remainder were deficiencies that were accepted as such by the client organization. The main lesson learned in this case was the necessity of a good tool environment, the costs of which are not reflected in the above figures.

In another activity, single-module and system-level testing were done on the Central Flow Control software. Some 23,700 statements of JOVIAL/J2 were tested and 98+% C1 coverage was obtained. The results of the error detecting process were as follows:





- 3.57% NCSS defect rate in unit testing
- 0.25% NCSS defect rate in subsystem testing
- 0.08% NCSS defect rate in system testing
- 3.91% NCSS overall defect rate





The important feature of these values is that they were found by systematic examination of an existing software system.

A final example comes from the nuclear energy field. In this experiment, C1- and C2-based testing were done on programs to determine whether multiple programming teams using different programming languages could produce significantly better software. Using a coverage-analysis system based on RXVP, the team at the Kernforschungszentrum Karlsruhe was able to discover 100 errors in about 4,500 statements, an error rate of roughly 2.2%. Most of these errors were "in the specification," in the sense that the programs had been written correctly even where the specification was faulty.



Experience: When done with good tools and a positive atmosphere, it is possible to find a significant number of errors (defects, deficiencies) in completed programs.

Possibility: Continued emphasis on this kind of technique can reduce the chances for errors to appear in programs.

Assessment: Costs for full C1-based methods are about 20 to 40% in addition to the programming costs (private-sector cost estimate).

Problems: Requires good tools; requires the right QA management attitude; is far from a fully comprehensive methodology (C1 does not guarantee the discovery of errors); cost of the right tool environment.



THE FUTURE




Predicting the future is difficult in a technical area as changeable as software quality assurance. Likely future events can be classified in three categories: methodology, supporting tools, and theory (which tends to lead methodological advances).





Theory Area


In the area of theory, it is reasonable to expect that most of the technical issues surrounding software testing will be fully resolved soon, but there is no guarantee of this. Progress seems to be good in most cases, but as with all theoretical work, there are some problems on which progress is slow. Some of the unanswered questions deal with: What is the direct relationship between coverage analysis and testing? How will we know when errors can no longer be present in a program? What constitutes completely adequate and fully characterizing test data?

This is the most sensitive of the three technical areas to predict. Only time will tell for sure.





Tools Area


Contemporary tools, like those used to produce the results described in the preceding sections, are still far from ready for full production use; they have the flavor of prototypes being evaluated for eventual use. Each new large-scale project leads to a new level of understanding of what the tools should be and how they should operate, but does not fully answer the question of what the ideal tool should be. Certain required features of tools can be discerned from current experience, however. Among them are:





- A high level of interaction with the program tester is very important, because it helps him or her pay attention to more detail.
- Documentation backup systems are critical in tools, to ensure that enough data are preserved.
- Structural test-planning methods are crucial as a way to eliminate the burden of test-planning computations.





Beneath these required features lies a second layer of needs and wants.




Methodology Area





Theoretical advances drive the development of formal testing methodologies, so that when technical progress is made in the theory of testing, it can be applied in practice. The current art of testing is based primarily on systematic structural and behavioral analysis of software, and this basis can be expected to continue for quite some time. New standards for quality assurance may have a positive effect on the kinds of methodologies actually practiced. For example, if a minimum level-of-coverage criterion is adopted as the standard for acceptance testing of software systems, then one would anticipate the quick adoption of corresponding systematic testing methodologies. In some cases, draft standards have already been created, although not without controversy. Methodologies for system-level testing are less well developed than those for single-module quality assurance; it is in this area that the greatest improvements can be made. One can expect that as more experience is gained, better methodologies will result.

