Software QA FYI - SQAFYI

Scalability and Performance Testing of Server Software

By: Srinivasan Desikan

This article discusses the concepts of performance and scalability testing with respect to four resources: CPU, disk, memory and network. The four resources are related to each other, and we need to understand their relationship completely in order to implement a strategy for scalability and performance testing.

Like the children in a house, the four resources in a computer require equal attention, as they need to grow together to serve scalability and performance. Many software engineers do not understand the purpose of this growth very well. All these resources are interdependent, and a change to one of them affects the others. For example, once the memory requirements of the system are addressed, the CPU tends to become the more heavily used resource. This in turn results in multiple cycles of upgrade as the requirements of the customers keep increasing. All these resource requirements tend to grow when a new release of the product happens, and they show only upward growth as software becomes more and more complex. Multiple upgrade cycles were possible until a few years ago. Due to several factors, such as economic recession, cost consciousness and software being used in plenty of other equipment such as mobile phones and PDAs, the development and testing strategy needs to change so that the growth of these four resources can be justified. Software is no longer only for computers but also for other equipment; hence upgrading the resources is not easy anymore.

Let us discuss the following issues and steps to derive a good testing strategy based on some assumptions:

· Choosing the right product configuration
· Choosing the right resources
· Utilizing the benchmarking and scalability results

Choosing the Right Product Configuration

Confusion often arises between customers and software vendors when deciding the right product configuration. This happens because the developers and testers of server software assume only one typical customer scenario. They are reluctant to appreciate the fact that each customer scenario is different and requires specific customization at the design level of the product.

Often, customers are lost when they are told to increase the number of CPUs, the memory and the network bandwidth for better performance and scalability. Also, measurable guidelines that specify the level of performance or scalability improvement or degradation when resources are adjusted are not provided. This results in multiple upgrade cycles.

Choosing the Right Resources

It is important to understand and acknowledge the customer's view of the resources the product uses intensively and the resources that are critical for product usage. It is easy to tell the customer that product A is CPU intensive, product B requires more memory and product C requires better bandwidth. However, the products that run on a particular server could be from multiple vendors, and it is very difficult for the customer to increase all the resources in the server when all the products are expected to run on the same server at the same time. Many times the engineers are not aware of this fact and look at only one product when giving a solution, which often fails to meet the expectations of the customer. The concept of a system is missing in the development and testing of many companies. The product development teams do not get to decide what components constitute a system; it is decided by the customer and may involve multiple products from multiple suppliers.

Utilizing Benchmarking and Scalability Results

The benchmarking and scalability results achieved in the test lab are often not repeatable for the customer in their own setup. This is because the product is tested in a simulated environment, or with all parameters such as memory, CPU, disk and network kept in a "safe condition". These parameters are generally available to customers in the marketing docs.

The perspectives and requirements of each customer are completely different, and it is very difficult to set the right strategy. Let us have a look at some basic assumptions before we proceed further.

· The CPU can be utilized, as long as it can be freed when a high-priority job comes in.
· The “memory available” can be used by the multiple threads of the software, as long as the threads exit within a reasonable interval after performing their task, and the memory is relinquished when another job requires it.
· Adding CPU or memory to get better performance is not as expensive as it was before, as long as we know the percentage increase in performance and scalability for each added resource.
· Network packets can be generated by the product, as long as network bandwidth is available and its cost is low. This assumption holds when most of the packets generated are for the LAN and not for the WAN. In the case of a WAN or routes involving multiple hops, the packets need to be reduced.
· More disk space or the complete I/O bandwidth can be used for the product as long as they are available. While disk costs are getting cheaper, I/O bandwidth is not. This may limit the "disk footprint" of a product.
· The customer gets the maximum Return on Investment (ROI) only if resources such as CPU, disk, memory and network are optimally used. So intelligence is needed in the software to understand the server configuration and its usage, and to auto-tune parameters (e.g. number of threads) accordingly to get optimum returns.
· Graceful degradation in performance or scalability can be expected when resources in the machine are also utilized for other activities on the server.
· Predictable variations in performance or scalability are acceptable for different configurations of the same product.
· Variation in levels of performance and scalability is acceptable when some parameters are tuned, as long as we know the impact of adjusting each of those tunable parameters.
· The product can behave completely differently on low-end and high-end servers, as long as both support ROI. This in fact motivates customers to upgrade their resources.
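The auto-tuning assumption above (software that inspects the server configuration and picks a thread count) can be illustrated with a minimal sketch. The function name `suggest_worker_threads` and all of its defaults (one reserved CPU, two threads per CPU, a cap of 64) are hypothetical values chosen for illustration, not figures from the article.

```python
import os


def suggest_worker_threads(cpu_count=None, reserved_cpus=1,
                           threads_per_cpu=2, max_threads=64):
    """Suggest a worker-thread count from the server's CPU count.

    All defaults are illustrative assumptions; a real product would
    derive them from its own benchmarking data.
    """
    if cpu_count is None:
        # os.cpu_count() may return None when the count is unknown.
        cpu_count = os.cpu_count() or 1
    # Leave some CPU headroom so a high-priority job can still be served.
    usable = max(1, cpu_count - reserved_cpus)
    # Cap the result so very large servers do not get unbounded threads.
    return min(max_threads, usable * threads_per_cpu)


if __name__ == "__main__":
    print("Suggested worker threads:", suggest_worker_threads())
```

Passing `cpu_count` explicitly makes it possible to see what the product would choose on a low-end and a high-end server without needing both machines in the lab.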

From the above assumptions, you may get the impression that our perspective is completely different from what customers really expect. The above statements cannot be told to the customer verbatim. However, these assumptions will guide us toward answers to some problems that have traditionally existed in software but have never been answered for customers. They help us create a basic scenario for performance and scalability.

Performance testing can be conducted in different conditions. The analysis of the four resources (CPU, memory, disk and network) gives us the following test scenarios:

a. Record the normal performance data (without fine-tuning the parameters) and the best performance data (after fine-tuning the parameters) on a typical recommended server configuration (as documented in the product user guide as the minimum requirement). Note down the utilization of CPU, disk space, disk I/O, memory and network (called baseline utilization).
b. Record the normal and best performance data on a typical customer configuration (obtained from sales/marketing data - also called popular configuration). Note down the utilization of CPU, disk space, disk I/O, memory and network (called normal utilization).
c. Record the normal and best performance data on specific customer scenarios and note down the resource usage (special configuration). Multiple special configurations are possible, depending upon how many customer scenarios we would like to simulate in the lab.
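One way to keep the data from these scenarios comparable is to store each run in the same record shape. The sketch below is only an assumption about how such a record could look; the class and field names (`Utilization`, `PerfRecord`, `tuning_gain_pct`) are hypothetical, and throughput is used as the performance figure purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utilization:
    """Resource utilization noted during a run, in percent."""
    cpu_pct: float
    memory_pct: float
    disk_space_pct: float
    disk_io_pct: float
    network_pct: float


@dataclass
class PerfRecord:
    """Normal and best performance data for one configuration."""
    configuration: str        # e.g. "baseline", "popular", or a special scenario
    normal_throughput: float  # measured without fine-tuning
    best_throughput: float    # measured after fine-tuning
    utilization: Utilization

    def tuning_gain_pct(self):
        """Percentage improvement that tuning gave over the untuned run."""
        gain = self.best_throughput - self.normal_throughput
        return gain / self.normal_throughput * 100.0


if __name__ == "__main__":
    # Made-up numbers, only to show how a record would be filled in.
    record = PerfRecord(
        configuration="baseline",
        normal_throughput=200.0,
        best_throughput=250.0,
        utilization=Utilization(70.0, 55.0, 30.0, 40.0, 25.0),
    )
    print(record.configuration, record.tuning_gain_pct())
```

Keeping the baseline, popular and special configurations in the same structure makes it straightforward to compare how much tuning helps on each of them.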

Tuning the configurable parameters of the product is a trial-and-error process, and getting the best performance data may require multiple iterations of testing. Trial and error does not mean testing without any baseline data. The design and architecture of the product can give indicative performance and scalability figures, which become the baseline as well as the area for improvement. Most of the time, the limitations in performance and scalability are introduced by design and architecture. Tuning as a procedure also does not mean tuning only product parameters; it includes OS, service and product parameters. Trial and error for understanding the benefit or impact of tuning the parameters is normally a pre-system or pre-integration test activity and should not be performed during test cycles. One also needs to understand that tuning a parameter for performance may affect another type of testing, such as reliability. Such points need to be kept in mind, and setups that are created for performance and scalability testing need to be reused for other types of testing to understand the complete impact.

Although we have talked about setting up configurations, it may not always be possible to reproduce a specific customer scenario. However, the experience of testing various configurations gives some idea of the expected performance levels, tuning details and so on. Such points, along with the assumptions, need to be documented so that customers can derive specific performance levels for the product.

Validating the performance results is another gray area of this discussion. Some basic assumptions need to be made to validate this data. The actual expectations may vary from product to product. Some of the basic assumptions can be:

A11. The performance should increase by 50% when I double the number of CPUs from the minimum requirement (the performance here could be response time or throughput) and by 40% thereafter, until x number of CPUs are added (normally you get breakeven at 16 or 32 CPUs). This aspect has to be tested with the assumption that the product is CPU intensive.
A12. The performance should increase by 40% when I double my memory from the minimum requirement and by 30% thereafter.
A13. The performance should increase by at least 30% when I double the number of NICs or increase the network bandwidth in the system.
A14. The performance should increase by at least 50% when I double the I/O bandwidth.
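As a worked illustration of A11, the sketch below computes the performance expected after a number of CPU doublings and checks a measured figure against it. It rests on two assumptions that the article does not state explicitly: that "40% thereafter" applies to each further doubling, and that performance is a throughput-style number where higher is better. The function names are hypothetical.

```python
def expected_performance(base_perf, doublings,
                         first_gain=0.50, later_gain=0.40):
    """Expected performance after doubling a resource `doublings` times.

    Assumes the first doubling yields `first_gain` and every later
    doubling yields `later_gain` (one reading of assumption A11).
    """
    perf = base_perf
    for i in range(doublings):
        gain = first_gain if i == 0 else later_gain
        perf *= 1.0 + gain
    return perf


def meets_assumption(measured, base_perf, doublings,
                     first_gain=0.50, later_gain=0.40):
    """True if the measured (higher-is-better) figure reaches the target."""
    target = expected_performance(base_perf, doublings, first_gain, later_gain)
    return measured >= target


if __name__ == "__main__":
    # Made-up baseline of 100 units at the minimum CPU count.
    for n in range(4):
        print(n, "doublings ->", round(expected_performance(100.0, n), 1))
```

The same functions could be reused for A12 by passing `first_gain=0.40` and `later_gain=0.30`. For response-time measurements, where lower is better, the comparison would have to be inverted.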

The above assumptions try to validate the four resources discussed: what needs to grow, how it needs to grow and the benefits of that growth. These assumptions also help us develop scalability requirements for the product. In the points above, we have seen situations where there is a demand for increasing the resources. Obviously, you will not see an improvement in performance after upgrading resources such as RAM or CPU unless the utilization of those resources is already quite high. Unnecessary upgrades need to be avoided.
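The point about avoiding unnecessary upgrades can be sketched as a simple check over recorded utilization figures. The function name `upgrade_candidates` and the 80% threshold are assumptions made for illustration; the article does not specify what counts as "quite high".

```python
def upgrade_candidates(utilization, threshold_pct=80.0):
    """Return the resources whose utilization is at or above the threshold.

    `utilization` maps a resource name to its utilization in percent.
    The 80% default is an illustrative assumption, not a recommendation.
    """
    return sorted(name for name, pct in utilization.items()
                  if pct >= threshold_pct)


if __name__ == "__main__":
    # Made-up utilization figures from a hypothetical test run.
    observed = {"cpu": 92.0, "memory": 45.0, "disk_io": 81.5, "network": 20.0}
    print("Worth considering for upgrade:", upgrade_candidates(observed))
```

Resources that do not appear in the result are not close to saturation, so adding more of them would be unlikely to improve performance.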

Metrics and Measurements for performance and scalability testing

Now it is time to propose a set of methodologies and define metrics and measurements based on the above discussion and assumptions. The metrics and measurements defined below have a strong relationship with the assumptions stated above, and they need to be changed when one or more of the assumptions change.
