
A Cautionary Tale

By: P. Martin

A Cautionary Tale - In this article, Patric shares his experience of debugging an application to identify a performance bottleneck. He concludes with a very important piece of advice: do not make assumptions, and consider data variation.



A Cautionary Tale ...
There are some basic rules of thumb which will serve you well in testing any application that deals with lists of data (and which applications don't?).

1."1; few; many"
2."don't make any assumptions"
3."remember to mix it up a little"


This case study covers a very interesting example of where following the rules of thumb *exactly* paid dividends.

The Problem

There was an interactive reporting solution that was having performance issues: essentially there was some pathological performance degradation under some circumstances.

The code contributing the biggest slice of time wastage was in a 3rd party component that really could not be changed. There were multiple passes at the problem and the usual things were done:
* Direct: get the problem project and simply pause the debugger at the obvious pain point
* Indirect: scrub the code looking for opportunities to improve performance
* (Eventually) Validation through testing: write an automated script to exercise the reporting component through a number of configurable scenarios

These scrubs made a palpable difference on each iteration: for reasons related to the constraints of the 3rd party component and the application framework it was hosted in, traceability was not great in the code and it turned out to be difficult to get the job right first time.

There were significant improvements made, however: minutes became seconds in some cases, which gives an indication of the potential severity of the issue for an interactive application.

However, it turns out that even if the coding had been got right first time, there was a lurking issue that would have forced a return to the drawing board due to a serious performance problem.

The bombshell
In this case it's worth going into a little detail:
Developers and testers have some level of awareness of how the complexity of a given process affects its run time: essentially they tend to expect that (roughly) the run time will go as O(n), and for many apps there is strong pressure for n to be a small number, otherwise the user experience can be very disappointing. Many defects are raised around the issue that n is not in fact a small number. In this instance, n seemed to be an acceptable figure, if not a great one.

The developers and testers were in for a big upset when, at the last moment of the product release cycle, a project was submitted that completely defied the performance seen by the developers and QA. It took ages to perform what was near instant on much larger chunks of data.

What was going on?
The developer debugged the application: it showed the same call sequence as with any other data, just a very different time taken.

What was different in the project?
Well, one of the default limits had been busted wide open, from the default value of several hundred to several thousand. There had never been any testing with this number of items in these lists, but a few thousand is still a small number for modern machines running functions of an acceptable order in n.

So, expectations were being defied: what was going on?

The automated test was re-run with the limits set up to the default-busting value - still performance was acceptable and far better than in the problem project.

It was now down to the data in the project. The automated test which created data in a controlled way could no longer help.

Back to the developer.
Working furiously up to the very last days of the development deadline, the developer suddenly saw the answer while debugging and comparing the behaviour of the "good" big projects and the "bad" big projects.

The behaviour of a key function involved in the interactive performance of the component was not simply characterised as O(n). A better indication would be O(n) + O(x-y), where x and y are the counts of items in the lists used by the component.

Where x and y were below the application defaults, the second term was never noticed through behaviour, and the 3rd party component source was never scrutinised to the level where the flaw could have been found. When the application defaults were massively exceeded, this O(x-y) term had the opportunity to become "quite significant".
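To make the shape of the problem concrete, here is a minimal sketch in Python. The names and the per-item work are entirely hypothetical (the real 3rd party component's source is not shown in the article); it only illustrates a cost of roughly O(n) + O(x-y): a per-row pass plus a reconciliation loop that runs only when the two list lengths differ.

import time

def expensive_reconcile_step():
    # Stand-in for whatever costly per-item work the real component did
    # when the two lists were out of step (purely illustrative).
    time.sleep(0.01)

def refresh_report(rows, labels):
    """Hypothetical refresh whose cost is roughly O(n) + O(x - y),
    where x = len(rows) and y = len(labels)."""
    # O(n) term: every row is rendered once.
    rendered = [str(row) for row in rows]

    # O(x - y) term: reconcile the lists when their lengths differ.
    # When x == y this loop body never executes, so test data built from
    # equal-sized lists can never expose it.
    for _ in range(abs(len(rows) - len(labels))):
        expensive_reconcile_step()

    return rendered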

Why didn't the automated tests catch this when the new larger sizes were set in configuration?
Because the tests made a reasonable assumption - that n was what mattered - and hence neither the tester nor the developer noticed that the list sizes were all the same: x-y was always zero.

The problem project had real data with large and differing list counts - all it took was for one list to be several thousand items long and another to be small for the lurking performance issue to be exposed.
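For comparison, here is a sketch (again hypothetical, reusing the refresh_report function from the sketch above) of the kind of scenario matrix that would have exposed the flaw: the two list sizes are varied independently rather than in lockstep, so the x-y term gets a chance to show itself.

import itertools
import time

LIST_SIZES = [1, 10, 500, 5000]  # "1; few; many", plus a default-busting value

def time_refresh(x, y):
    rows = list(range(x))
    labels = ["label-%d" % i for i in range(y)]
    start = time.perf_counter()
    refresh_report(rows, labels)  # sketch function from above
    return time.perf_counter() - start

def test_refresh_is_interactive():
    # Vary the two list sizes independently - "mix it up a little".
    for x, y in itertools.product(LIST_SIZES, repeat=2):
        elapsed = time_refresh(x, y)
        # Threshold is arbitrary; anything slower deserves a bug report.
        assert elapsed < 1.0, "x=%d, y=%d took %.2fs" % (x, y, elapsed)

With the sleep-based sketch above, the (5000, 1) combination blows straight through the threshold - exactly the early warning the real test runs never produced, because x-y was always zero.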

The moral of the story
It could be argued the original bug report contained the kernel of the solution, by providing an example of rule of thumb #3.

Rule of thumb #1 was initially thought to be adequately covered, but this was fatally undermined by the pathological behaviour of the application. The application appeared to match the testers' and developers' internal model under certain highly specific conditions, which were sadly unrealistic in one key respect: rule of thumb #3 ("remember to mix it up a little") was not being exercised.

Note: It is doubtful the flaw in the 3rd party component could have been found through desk-checking the source in any realistic timescale.
