
Model-Based Testing in Practice

By: S. R. Dalal, A. Jain, N. Karunanithi, et al.

ABSTRACT
Model-based testing is a new and evolving technique for generating a suite of test cases from requirements. Testers using this approach concentrate on a data model and generation infrastructure instead of hand-crafting individual tests. Several relatively small studies have demonstrated how combinatorial test generation techniques allow testers to achieve broad coverage of the input domain with a small number of tests. We have conducted several relatively large projects in which we applied these techniques to systems with millions of lines of code. Given the complexity of testing, the model-based testing approach was used in conjunction with test automation harnesses. Since no large empirical study has been conducted to measure the efficacy of this new approach, we report on our experience with developing tools and methods in support of model-based testing. The four case studies presented here offer details and results of applying combinatorial test-generation techniques on a large scale to diverse applications. Based on the four projects, we offer our insights into what works in practice and our thoughts about obstacles to transferring this technology into testing organizations.

Keywords Model-based testing, automatic test generation, AETG software system.

1 INTRODUCTION
Product testers, like developers, are placed under severe pressure by the short release cycles expected in today’s software markets. In the telecommunications domain, customers contract for large, custom-built systems and demand high reliability of their software. Due to increased competition in telecom markets, the customers are also demanding cost reductions in their maintenance contracts. All of these issues have encouraged product test organizations to search for techniques that improve upon the traditional approach of hand-crafting individual test cases.

Test automation techniques offer much hope for testers. The simplest application is running tests automatically. This allows suites of hand-crafted tests to serve as regression tests. However, automated execution of tests does not address the problems of costly test development and uncertain coverage of the input domain.

We have been researching, developing, and applying the idea of automatic test generation, which we call model-based testing. This approach involves developing and using a data model to generate tests. The model is essentially a specification of the inputs to the software, and can be developed early in the cycle from requirements information. Test selection criteria are expressed in algorithms, and can be tuned in response to experience. In the ideal case, a regression test suite can be generated that is a turnkey solution to testing the piece of software: the suite includes inputs, expected outputs, and the necessary infrastructure to run the tests automatically. While the model-based testing approach is not a panacea, it offers considerable promise in reducing the cost of test generation, increasing the effectiveness of the tests, and shortening the testing cycle. Test generation can be especially effective for systems that are changed frequently, because testers can update the data model and then rapidly regenerate a test suite, avoiding tedious and error-prone editing of a suite of hand-crafted tests.

At present, many commercially available tools expect the tester to be 1/3 developer, 1/3 system engineer, and 1/3 tester. Unfortunately, such savvy testers are rare, or the budget to hire them is simply not there. It is a mistake to develop technology that does not adequately address the competence of a majority of its users. Our efforts have focused on developing methods and techniques to support model-based testing that will be adopted readily by testers, and this goal influenced our work in many ways. We discuss our approach to model-based testing, including some details about modeling notations and test-selection algorithms, in Section 2. Section 3 surveys related work. Four large-scale applications of model-based testing are presented in Section 4. Finally, we offer some lessons learned about what works and does not work in practice in Section 5.

2 METHODS AND TOOLS FOR MODEL-BASED TESTING
Model-based testing depends on three key technologies: the notation used for the data model, the test-generation algorithm, and the tools that generate supporting infrastructure for the tests (including expected outputs). Unlike the generation of test infrastructure, model notations and test-generation algorithms are portable across projects. Figure 1 gives an overview of the problem; it shows the data flows in a generic test-generation system.

We first discuss the different levels at which model-based testing can be applied, then describe the model notation and test-generation algorithm used in our work.

Levels of testing
During development and maintenance life cycles, tests may be applied to very small units, collections of units, or entire systems. Model-based testing can assist test activities at all levels.

At the lowest level, model-based testing can be used to exercise a single software module. By modeling the input parameters of the module, a small but rich set of tests can be developed rapidly. This approach can be used to help developers during unit test activities.

An intermediate-level application of model-based testing is checking simple behaviors, what we call a single step in an application. Examples of a single step are performing an addition operation, inserting a row in a table, sending a message, or filling out a screen and submitting the contents. Generating tests for a single step requires just one input data model, and allows computation of the expected outputs without creating an oracle that is more complex than the system under test.
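
To make the single-step idea concrete, the following sketch (in Python, with hypothetical names; not taken from the projects described here) models an addition step: the input model is just the operand values, and the expected output is computed directly from the requirement, so no elaborate oracle is needed.

# Hypothetical single-step test for an addition operation (illustrative names only).
operand_values = [-1, 0, 1, 255]              # boundary-oriented choices for each operand

def run_addition_step(x, y):
    """Stand-in for driving one step of the system under test."""
    return x + y                              # in a real harness: invoke the system here

def test_addition_step():
    for x in operand_values:
        for y in operand_values:
            expected = x + y                  # the oracle falls out of the requirement
            actual = run_addition_step(x, y)
            assert actual == expected, f"add({x}, {y}): got {actual}, expected {expected}"

test_addition_step()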

A greater challenge that offers comparably greater benefits is using model-based testing at the level of complex system behaviors (sometimes known as flow testing). Step-oriented tests can be chained to generate comprehensive test suites. This type of testing most closely represents customer usage of software. In our work, we have chosen sequences of steps based on operational profiles [11], and used the combinatorial test-generation approach to choose values tested in each step. An alternate approach to flow testing uses models of a system’s behavior instead of its inputs to generate tests; this approach is surveyed briefly in Section 3.

Model notation

The ideal model notation would be easy for testers to understand, describe a large problem as easily as a small system, and still be a form understood by a test-generation tool. Because data model information is essentially requirements information, another ideal would be a notation appropriate for requirements documents (i.e., for use by customers and requirements engineers). Reconciling these goals is difficult.
We believe there is no ideal modeling language for all purposes, which implies that several notations may be required.
Ideally the data model can be generated from some representation of the requirements.
In practice, a requirements data model specifies the set of all possible values for a parameter, whereas a test-generation data model specifies a set of valid and invalid values that will be supplied for that parameter in a test. For example, an input parameter might accept integers in the range 0..255; the data model might use the valid values 0, 100, and 255 as well as the invalid values -1 and 256. (We have had good experience with values chosen based on boundary-value analysis.) Additionally, the model must specify constraints among the specific values chosen. These constraints capture semantic information about the relationships between parameters. For example, two parameters might each accept empty (null) values, but cannot both be empty at the same time. A test-generation data model can also specify combinations of values (“seeds”) that must appear in the set of generated test inputs. The use of seeds allows testers to ensure that well-known or critical combinations of values are included in a generated test suite.

Our approach to meeting this challenge has employed a relatively simple specification notation called AETGSpec, which is part of the AETG software system. Work with product testers demonstrated to us that the AETGSpec notation used to capture the functional model of the data can be simple to use yet effective in crafting a high-quality set of test cases. The AETGSpec notation is deliberately small; we have stayed away from constructs that would increase expressiveness at the expense of ease of use. For example, complex relational operators such as join and project would have provided more constructs for input test specifications, but we could never demonstrate a practical use for them.

# This data model has four fields.
field a b c d;
# The relation 'r' describes the fields.
r rel {
    # Valid values for the fields.
    a: 1.0 2.1 3.0;
    b: 4 5 6 7 8 9 10;
    c: 7 8 9;
    d: 1 3 4;
    # Constraints among the fields.
    if b < 9 then c >= 8 and d <= 3;
    a < d;
    # This must appear in the generated tuples.
    seed {
        a b c d
        2.1 4 8 3
    }
}
Figure 2: Example data model in AETGSpec notation


An example model written in AETGSpec notation appears in Figure 2. Besides the constructs shown in the example, AETGSpec supports hierarchy in both fields and relations; that is, a relation could have other relations and a field could use other fields in a model. The complete syntax of the language is beyond the scope of this paper.
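
To make the constraint semantics concrete, the following sketch (in Python, illustrative only; it is not part of the AETG tool set) renders the Figure 2 model as plain data plus a predicate that checks whether a candidate tuple satisfies the model's constraints and seed.

# Illustrative Python rendering of the Figure 2 model (not part of the AETG tools).
values = {
    "a": [1.0, 2.1, 3.0],
    "b": [4, 5, 6, 7, 8, 9, 10],
    "c": [7, 8, 9],
    "d": [1, 3, 4],
}

def satisfies_constraints(t):
    """Check the two constraints of Figure 2 for a tuple t = {field: value}."""
    if t["b"] < 9 and not (t["c"] >= 8 and t["d"] <= 3):
        return False                    # if b < 9 then c >= 8 and d <= 3
    return t["a"] < t["d"]              # a < d

seed = {"a": 2.1, "b": 4, "c": 8, "d": 3}   # must appear among the generated tuples
assert satisfies_constraints(seed)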

Thanks to the relative simplicity of the notation, we have had good experience in teaching testers how to write a data model and generate test data. Experience discussed in Section 4 showed that testers learned the notation in about an hour, and soon thereafter were able to create a data model and generate test tuples.
After an input data model has been developed, it must be checked. Deficiencies in the model, such as an incorrect range for a data item, lead to failed tests and much wasted effort when analyzing those failures. One approach to minimizing defects in the model is ensuring traceability from the requirements to the data model; in other words, users should be able to look at a test case and trace it back to the requirement being tested. Simple engineering techniques, such as including as much information as possible in each tuple, reduce the effort of debugging the model. Still, some defects will remain in the model and will be detected only after tests have been generated. Incorporating iterative changes in the model without drastically altering the output is vital but difficult. Using “seed” values in the data model can help, but ultimately the test-selection algorithm will be significantly perturbed by introducing a new value or a new constraint, most likely resulting in an entirely new set of test cases.

Test-generation algorithm
We use the AETG software system to generate combinations of input values. This approach has been described extensively elsewhere [4], so we just summarize it here.

Test    Parameters (factors)
no.     1  2  3  4  5  6  7  8  9  10
1       a  a  a  a  a  a  a  a  a  a
2       a  a  a  a  b  b  b  b  b  b
3       b  b  a  b  b  a  b  a  b  a
4       a  b  b  b  a  a  a  b  b  b
5       b  a  b  b  a  b  b  a  a  b
6       b  b  b  a  b  b  a  b  a  a

Table 1: Test cases for 10 parameters with 2 values each

The central idea behind AETG is the application of experimental designs to test generation [6]. Each separate element of a test input tuple (i.e., a parameter) is treated as a factor, and the different values for each parameter are treated as levels. For example, a set of inputs that has 10 parameters with 2 possible values each would use a design appropriate for 10 factors at 2 levels each. The design ensures that every value (level) of every parameter (factor) is tested at least once with every level of every other factor, which is called pairwise coverage of the input domain. Pairwise coverage provides a huge reduction in the number of test cases when compared with testing all combinations. By applying combinatorial design techniques, the example with 2^10 = 1,024 combinations can be tested with just 6 cases, assuming that all combinations are allowed. The generated cases are shown in Table 1 to illustrate pairwise combinations of values. The combinatorial design technique is highly scalable; pairwise coverage of 126 parameters with 2 values each can be attained with just 10 cases.
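
The following sketch (Python, illustrative only) checks that the six cases of Table 1 indeed cover every pairwise combination of values, and contrasts the 6 cases with the 2^10 = 1,024 cases needed for exhaustive coverage.

from itertools import combinations, product

# The six test cases from Table 1: 10 parameters, 2 values ('a'/'b') each.
suite = [
    "aaaaaaaaaa",
    "aaaabbbbbb",
    "bbabbababa",
    "abbbaaabbb",
    "babbabbaab",
    "bbbabbabaa",
]

# Exhaustive testing would need 2**10 = 1024 cases; pairwise needs only 6.
print(2 ** 10, "exhaustive cases vs", len(suite), "pairwise cases")

# Check that every pair of parameters sees all four value combinations.
for i, j in combinations(range(10), 2):
    seen = {(case[i], case[j]) for case in suite}
    assert seen == set(product("ab", repeat=2)), (i, j, seen)
print("all", len(list(combinations(range(10), 2))), "parameter pairs covered")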

In practice, some combinations are not valid, so constraints must be considered when generating test tuples. The AETG approach uses avoids, i.e., combinations of values that must not appear together in any generated tuple. The AETG algorithms also allow the user to select the degree of interaction among values. The most commonly used degree of interaction is 2, which results in pairwise combinations; higher degrees can be used to obtain greater coverage of the input domain at the cost of correspondingly larger test sets.
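
The AETG algorithms themselves are described in [4]; as a rough sketch of the underlying idea only (a naive greedy strategy, not the AETG implementation), a generator can repeatedly add the allowed tuple that covers the most still-uncovered t-wise value combinations while honoring the avoids.

from itertools import combinations, product

def greedy_twise(domains, t=2, avoids=()):
    """Greedy t-wise generation sketch (illustrative only, not the AETG algorithm)."""
    n = len(domains)

    def allowed(case):
        # A case is allowed if it matches no "avoid" (a forbidden value combination).
        return not any(all(case[i] == v for i, v in a.items()) for a in avoids)

    # Every t-wise value combination; some may be impossible under the avoids.
    uncovered = {(idx, vals)
                 for idx in combinations(range(n), t)
                 for vals in product(*(domains[i] for i in idx))}
    tests = []
    while uncovered:
        best, best_gain = None, 0
        for case in product(*domains):          # exhaustive scan; fine for small models
            if not allowed(case):
                continue
            gain = sum(1 for idx, vals in uncovered
                       if tuple(case[i] for i in idx) == vals)
            if gain > best_gain:
                best, best_gain = case, gain
        if best is None:                        # leftovers occur only in avoided cases
            break
        tests.append(best)
        uncovered -= {(idx, vals) for idx, vals in uncovered
                      if tuple(best[i] for i in idx) == vals}
    return tests

# Example: three parameters; "float" operands must never meet the "-" operator.
cases = greedy_twise([["int", "float"], [0, 255], ["+", "-"]],
                     avoids=[{0: "float", 2: "-"}])
print(len(cases), "tests:", cases)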

The approach of generating tuples of values with pairwise combinations can offer significant value even when computing expected values is prohibitively expensive. The idea is to use the generated tuples as test data; the generated data set can subsequently be used to craft high-quality tests by hand. For example, a fairly complex database can easily be modeled, and a large data set can be quickly generated for the database. Use of a generated data set ensures that all pairwise combinations occur, which would be difficult to achieve by hand. The data set is also smaller, yet far richer in combinations, than arbitrary field data.
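
For instance, generated tuples can be written out directly as database rows; in the sketch below (illustrative only), the table and column names are hypothetical.

# Illustrative: turn generated tuples into rows for a hypothetical "accounts" table.
tuples = [("savings", 0, "USD"), ("checking", 250000, "EUR"), ("savings", 250000, "USD")]

def to_insert(row):
    cols = ("acct_type", "balance_cents", "currency")   # hypothetical columns
    vals = ", ".join(repr(v) for v in row)
    return f"INSERT INTO accounts ({', '.join(cols)}) VALUES ({vals});"

for row in tuples:
    print(to_insert(row))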

Initial work with product testers was facilitated by offering access to the AETG software system over the web; the service is named AETG Web. Because the service eliminated costly delays in installing and configuring software, testers could begin using it almost immediately.

Strengths, Weaknesses, and Applicability
The major strengths of our approach to automatic test generation are the tight coupling of the tests to the requirements, the ease with which testers can write the data model, and the ability to regenerate tests rapidly in response to changes. Two weaknesses of the approach are the need for an oracle and the demand for development skills from testers, skills that are unfortunately rare in test organizations. The approach presented here is most applicable to a system for which a data model is sufficient to capture the system’s behavior (control information is not required in the model); in other words, the complexity of the system under test’s response to a stimulus is relatively low. If a behavioral model must account for sequences of operations in which later operations depend on actions taken by earlier operations, such as a sequence of database update and query operations, additional modeling constructs are required to capture control-flow information. We are actively researching this area, but it is beyond the scope of this paper.

3 RELATED WORK
Heller offers a brief introduction to using design-of-experiment techniques to choose small sets of test cases [8]. Mandl describes his experience with applying experiment design techniques to compiler testing [10]. Dunietz et al. report on their experience with attaining code coverage based on pairwise, triplet-wise, and higher coverage of values within test tuples [7]. They were able to attain very high block coverage with relatively few cases, but attaining high path coverage required far more cases. Still, their work argues that these test-selection algorithms result in high code coverage, a highly desirable result. Burr presents experience with deriving a data model from a high-level specification and generating tests using the AETG software system [2]. Other researchers have worked on many areas of automated test data and test case generation; Ince offers a brief survey [9]. Burgess offers some design criteria that apply when constructing systems to generate test data [1]. Ostrand and Balcer discuss work closely related to ours [12]. As in our approach, a tester uses a modeling notation to record parameters, values, and constraints among parameters; subsequently, a tool generates tuples automatically. However, their algorithm does not guarantee pairwise coverage of input elements.

Clarke reports on experience with testing telecommunications software using a behavioral model [3]. This effort used a commercially available tool to represent the behavioral model and generate tests based on paths through that model. Although Clarke reports impressive numbers concerning the cost of generating tests, no indicators are given about the tests’ effectiveness at revealing system failures.


4 CASE STUDIES
We present experience and results from four applications of our technology to Bellcore products.

Project 1: Arithmetic and table operators
The first project addressed a highly programmable system that supported various basic operators [5]. This work had many parallels to compiler testing, but the focus was very narrow. Tests were generated for arithmetic and table operators, as shown in Table 2.

Category        Examples
Arithmetic      add, subtract, multiply
String          clrbit, setbit, concat, match
Logical         and, or, xor
Time and date   datestr, timestr, date+, time+
Table           addrow, delrow, selrow

Table 2: Manipulators tested in project 1

The data model was developed manually. Individual data values were also chosen manually, with special attention to boundary values. The data model included both valid and invalid values. Tuples (i.e., combinations of test data) were generated by the AETG software system to achieve pairwise coverage of all valid values. (Testing of table manipulators was slightly different because both tables and table operations were generated.) All manipulator tests were run using test infrastructure that was written in the language provided by the programmable system. This infrastructure (“service logic”) performed each operation, compared the result to an expected value, and reported success or failure. Creating the required service logic took more time than any other element of the project.
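
The actual service logic was written in the language of the programmable system; the following Python sketch is only a hypothetical analogue of what that infrastructure does for each generated case: perform the operation, compare against the expected value, and report the outcome.

import operator

# Hypothetical analogue of the "service logic" harness: run one manipulator
# per generated case, compare against the expected value, report the outcome.
MANIPULATORS = {"add": operator.add, "subtract": operator.sub, "multiply": operator.mul}

def run_case(name, args, expected):
    try:
        actual = MANIPULATORS[name](*args)     # in the real harness: invoke the system
    except Exception as exc:                   # invalid values may legitimately raise
        actual = f"error: {exc}"
    verdict = "PASS" if actual == expected else "FAIL"
    print(f"{verdict} {name}{args}: expected {expected}, got {actual}")

run_case("add", (2, 3), 5)
run_case("multiply", (4, 4), 15)               # deliberately failing example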

Testing arithmetic/string manipulators
Figure 3 shows a model (an AETG software system relation) for generating test cases. In this example, each test case consists of an arithmetic expression with two operators and three operands. The table lists all possibilities for each. An exam...

field type1 type2 type3;
field value1 value2 value3;
field op1 op2;
a rel {
    type1 type2 type3: int float hex;
    value1 value2 value3: min max nominal;
    op1 op2: "+" "*" "/" "-";
}

Figure 3: AETGSpec data model for an expression with three operands and two operators
