A Real-Life Lesson in the Importance of Testing IT Systems
Some of you may have read the Washington Post article “Crash of Va. computer network has implications for tech world, state politics” by Rosalind S. Helderman and Anita Kumar, published on Thursday, September 2, 2010. The quick summary is that a redundant data storage unit failed in a warehouse outside of Richmond last week, wreaking havoc on the computer networks of a number of Virginia agencies for more than seven days. Now, sometimes technology just fails even if you do everything right. This is a sad reality, and the disaster in Virginia may simply be a case of very bad luck.
Having said that, I suspect the disaster in Virginia is more likely an example of an all-too-common problem in IT: systems are not tested after installation and, more importantly, are not tested regularly afterward to validate that they still function properly.
I am going to hypothesize that the redundant storage system was installed, that everyone assumed it would work because they had bought an expensive system, and that no one ever bothered to test the failover process. You can see this attitude in a quote from the Washington Post article: “This is surprising — it’s a selling point for them [EMC] when they talk to a major organization, that this stuff never goes down,” said Bill Kreher. Trusting that a system will work without taking the time to test it is a common problem in any field (see the BP Deepwater Horizon disaster), but it is especially common in information technology.
The assumption, made all too often, is that you don’t need to test a product because the vendor says it will work and because testing it is hard and complicated. So in-house IT teams and IT consultants take the easy way out and skip the initial and ongoing quality assurance that they should perform. Then a real-world event occurs, the system doesn’t work, and the organization is in real trouble.
At New Signature we live by the mantra that you should regularly test information technology systems to validate that they are fully operational and perform as expected. This is one of the reasons we institute a quarterly maintenance process for our clients. Part of this maintenance process is to perform real-world tests of critical components, with careful planning and during scheduled downtime (e.g., unplug the UPS from the wall and make sure it carries the load, take one of the servers offline and make sure the redundancy kicks in, do something unexpected to the system and see how it responds, etc.). Without this regular quality assurance, organizations are far more likely to find themselves in the very bad place that the state of Virginia has been in over the past week.
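For readers who like to see the idea in concrete terms, here is a minimal sketch of what one such drill might look like when scripted. It is an illustration only, not our actual maintenance tooling: the names `run_drill`, `DrillResult`, and `RedundantPair` are hypothetical, and the toy `RedundantPair` class stands in for whatever real servers, storage, or power equipment you would exercise. The pattern is the point: confirm the service is healthy, inject one fault on purpose, check that the service survives, and always restore the component afterward.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    name: str
    passed: bool
    detail: str


def run_drill(name: str,
              inject_fault: Callable[[], None],
              service_is_healthy: Callable[[], bool],
              restore: Callable[[], None]) -> DrillResult:
    """Inject one fault, check that the service survives, then always restore."""
    # Never start a drill on a system that is already broken.
    if not service_is_healthy():
        return DrillResult(name, False, "service unhealthy before the drill; fault not injected")
    try:
        inject_fault()
        survived = service_is_healthy()
    finally:
        # Put the component back even if the health check itself raised.
        restore()
    detail = "redundancy took over" if survived else "service went down with the fault"
    return DrillResult(name, survived, detail)


class RedundantPair:
    """Toy stand-in (hypothetical): the service is up while either node works."""

    def __init__(self, standby_works: bool = True) -> None:
        self.primary_up = True
        self.standby_works = standby_works

    def healthy(self) -> bool:
        return self.primary_up or self.standby_works

    def fail_primary(self) -> None:
        self.primary_up = False

    def restore_primary(self) -> None:
        self.primary_up = True


if __name__ == "__main__":
    # A standby that was installed but never verified.
    pair = RedundantPair(standby_works=False)
    result = run_drill("take primary offline",
                       pair.fail_primary, pair.healthy, pair.restore_primary)
    print(result)
```

In a real drill, the three callables would wrap whatever actually takes a component offline, probes the service, and brings the component back, and the drill would run only inside a planned maintenance window. A failed result during scheduled downtime is cheap; the same discovery during a real outage is not.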
If you have an IT system that has never been tested, or hasn’t been tested lately, please contact us at New Signature. We can help you develop and execute a plan to ensure that your information technology systems are tested regularly and that important maintenance is performed.