It takes more than faith to avoid a software disaster
By: Shane Schick
History is replete with major IT screw-ups, yet people are always surprised by their own
The irony was not lost on me when, on my way to deliver a speech on software disasters to the Toronto Association of Software Systems and Quality, I found myself unable to open a PowerPoint file on my laptop.
I ended up taking it up to my IT department, where someone fiddled with it for about 20 minutes before handing me a different laptop. I'm still not sure what the problem was, but I learned my lesson - if you're going to talk to a group of testers, it's wise to test your own systems beforehand.
Speaking to TASSQ was more daunting than I expected. Usually when software failures happen in well-known corporate enterprises, the response from people like me is almost too reflexive: we wag our fingers sanctimoniously and scold them for rushing things into production before they're ready. This would, of course, have been a useless message to bring to TASSQ, where I would run the risk of preaching to the converted. Instead, I thought I could talk about the crisis management styles with which various organizations have responded to well-known disasters, which I think can say a lot about their ability to avoid another one.
As regular readers of Computing Canada will know, the first tactic most organizations employ is containment. They tend to emphasize how small a proportion of their overall user base was affected by a system failure. This was never more galling than in the case of a Québec hospital that lost 14,000 patient X-rays, whose spokespeople pointed out that 85 per cent of their patients were not affected by the problem. That was probably small comfort to a patient who was reportedly diagnosed with cancer six months late.
Companies also tend to minimize the details they provide about software disasters, even when they have demonstrated considerable skill in communicating their success stories. Here I referenced RBC's so-called routine software upgrade in 2003, when spokespeople said that those responsible for fixing the problem were so busy with their work they didn't have time to explain it to the thousands of customers who were waiting to have their accounts put back in order. What a change from the RBC that regularly puts on conferences that showcase its innovations in financial data management.
The third dysfunctional tendency I pointed out was the way in which those struck by software disasters try to lie low for a while afterwards, not following up on the investigations they pledge to pursue when the heat is on. What I should have added is that the media, including Computing Canada, are also to blame for failing to keep on top of stories that, however forgotten by users, could provide some useful lessons long after the fact.
Underlying these points were a few other observations. The companies that have these disasters never see them coming. Despite conflicts between IT departments and other lines of business, there is a surprising faith that the technology will work. While some of them may have well-thought-out IT security policies in place, they don't seem to have a crisis management plan - because they can't imagine the kind of crises software can bring about.
The imagination of disaster, and the ability to help formulate contingencies should one occur, could be the way software quality assurance evolves as a discipline. That's the message I left with TASSQ. As with any theory, it will take a while longer before it's really put to the test.