Load Testing as Science and Art

By: Elad Rosenheim

The aim of this post is not specifically to shed more light on what went wrong before the launch of SCN, but I do promise to get there. In fact, I hope to do much more: inspire some thinking about how load tests are made for any big, complex system. I've been through a few of these, and I guess many of you have as well.

One thing that's always noticeable to me, either when I present my findings or when I read of others' experiences, is this aura of "OMG SCIENCE AT WORK! HYPOTHESES, ISOLATING VARIABLES, STATISTICS...FEAR ALL YE PRODUCT PEOPLE!". I do admit it's kinda satisfying as an engineer to bask in that light. However, when you look at the details, there appears an intricate layer of reasoning, switchbacks, convenient omissions and the like, which makes it read more like a novel (and a bad one at times). Why does that happen?

The Unknown
One major reason, I think, is the sheer number of unknowns you're facing. Even when you have the legacy of an existing system such as the old SDN, numerous questions come up. Here are but a few - some are easy, some are hard.

* The old system of course had the concept of replies to discussions, and now we added Likes and Shares on top - so how do you estimate the number of such actions? How do these actions affect the number of replies? A safe bet here, which we actually took, is to leave the rate of replies as it is and add likes and shares at 4x that rate. Why 4x? Because these are much easier for the user to do than replying, and we needed to have SOME factor there. Of course, one could argue, likes and shares would also increase the number of replies - because people would reply just for the sake of being liked. Here you can hopefully see the endless loop of discussion which may surround every little detail. So, you decide, this corner of the system really doesn't matter all that much; you go for some nice factor, telling yourself that it's all fine because you added a lot of load for SOME OTHER FEATURE.

* And here's another one: looking at the logs from the old system, we were quite surprised at the number of requests for RSS feeds. I'm not an avid fan of the technology myself, and I had to wonder how many of these subscriptions are actually "active" - how many users subscribed once to a blog/discussion and have since forgotten about it? In other words, how many would bother to subscribe again in the new system, given that the new system has more modern (and arguably better) functionality which serves a similar purpose: Followed Activity? For this corner of the system, we actually decreased the number of RSS requests compared to the old system, while making sure to set a really high rate for the All/Followed Activity page, which we already knew was quite heavy at times. In this process of bargaining, we always made sure the total number of page-views per hour would sum up to what we calculated as representative of a "busy hour in a busy day" - in the old system. Nice, but is this old hourly rate enough anyway? Maybe not, but then you have to set your baseline SOMEWHERE and start loading from there.

* The long tail of content: Some content, such as Oliver Kohl's blog, is more popular than others ;-) and so you look at your first draft of the test and think: maybe I'm actually making it too hard on that poor system...I mean, I'm randomly requesting threads that no one would look for anymore, when actually it might be that 20% of threads get 80% of the views, and maybe it's even closer to 5% getting 95%! If that is indeed the case, my system is gonna smoke that test with its in-memory caches! Results would be wonderful...but if you're like me, you just don't take that path of glory. You know you're putting an unrealistically harsh load here, and you accept it because it evens out with somewhere else where you unknowingly made life way too easy on the system.

* Product people don't know any better than you: there are many respects in which they actually do know better (as painful as this is to admit), but surprisingly not in this case. I've had this time and time again: you can't go to someone who's an expert on functionality and demand some numbers. It's hard to even get a realistic usage scenario from them. They just don't think that way, they don't have that info, and these nice little user stories are totally made up anyway - and they know it even better than you do. In the rare case they get TOO INTERESTED, however, you face the possibility of someone actually questioning your basic axioms all over again. Of course, you're open to that, but as George Smiley once said in "Tinker Tailor Soldier Spy":

"TO A POINT".
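
The bargaining described in the first two bullets (keep replies as they are, add likes and shares at 4x, trade RSS requests for activity-page requests, hold the hourly total fixed) can be written down as a small calculation. This is only a sketch: every number except the 4x factor is a made-up placeholder, the request-type names are hypothetical, and the idea of moving the freed RSS budget entirely to the activity page is an assumption for illustration, not the mix that was actually used for SCN.

```python
# All rates are hypothetical placeholders, NOT measurements from SDN or SCN.
OLD_PAGE_VIEWS_PER_HOUR = {      # assumed "busy hour in a busy day" from old logs
    "thread_view": 60_000,
    "rss_feed": 30_000,
    "activity_page": 10_000,
}
RSS_KEEP_FRACTION = 0.5          # assumption: only half of RSS traffic carries over
OLD_REPLIES_PER_HOUR = 500       # assumed reply rate, left unchanged
LIKE_SHARE_FACTOR = 4            # the 4x factor mentioned in the text


def rebalance_page_views(old_rates, rss_keep_fraction):
    """Cut RSS requests and hand the freed budget to the activity page,
    so the hourly page-view total stays exactly where it was."""
    rates = dict(old_rates)
    freed = rates["rss_feed"] * (1 - rss_keep_fraction)
    rates["rss_feed"] -= freed
    rates["activity_page"] += freed
    return rates


def write_actions(replies_per_hour, factor):
    """Keep replies at the old rate; add likes and shares at `factor` times that."""
    return {
        "reply": replies_per_hour,
        "like_or_share": replies_per_hour * factor,
    }


page_views = rebalance_page_views(OLD_PAGE_VIEWS_PER_HOUR, RSS_KEEP_FRACTION)
writes = write_actions(OLD_REPLIES_PER_HOUR, LIKE_SHARE_FACTOR)

# The baseline total is the one number that is not up for negotiation.
assert sum(page_views.values()) == sum(OLD_PAGE_VIEWS_PER_HOUR.values())
```

With these placeholder inputs, RSS drops to 15,000 requests per hour, the activity page rises to 25,000, and likes plus shares come to 2,000 per hour against 500 replies. The value of writing it down is mostly that each guess becomes a named constant someone can argue with.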

I could go on in a similar vein forever here, but I guess a pattern does emerge: there's just a lot you don't know, and nobody's gonna help you. So, you try to strike a balance which FEELS right - to you.
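
To put a rough number on the long-tail question raised above, here is a sketch that compares picking threads uniformly at random (the harsh option) with picking them from a skewed popularity distribution. The Zipf-style weighting, the exponent, and the catalogue size are all assumptions chosen for illustration; nothing in the old logs is claimed to follow this shape.

```python
import random

NUM_THREADS = 1000      # hypothetical catalogue size
ZIPF_EXPONENT = 1.0     # assumed skew; fit it to real access logs if you have them


def zipf_weights(n, s):
    """Weight of the thread at popularity rank r (1-based) is 1 / r**s."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]


def uniform_weights(n):
    """Every thread is equally likely to be requested."""
    return [1.0] * n


def top_share(weights, fraction):
    """Expected share of requests landing on the most popular `fraction` of threads."""
    ranked = sorted(weights, reverse=True)
    top_n = int(len(ranked) * fraction)
    return sum(ranked[:top_n]) / sum(ranked)


def pick_threads(weights, k, seed=0):
    """Draw k thread indices (0 = most popular) for a load script to request."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=k)


uniform_share = top_share(uniform_weights(NUM_THREADS), 0.20)
zipf_share = top_share(zipf_weights(NUM_THREADS, ZIPF_EXPONENT), 0.20)
sample = pick_threads(zipf_weights(NUM_THREADS, ZIPF_EXPONENT), k=10)
```

Under the uniform pick, the top 20% of threads receive 20% of requests by construction, so caches get little help. Under the assumed skew, `zipf_share` comes out far higher, which is exactly the "path of glory" the bullet warns about: the same system can look much better or much worse depending on a distribution nobody actually measured.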
