If TurboTax Online can stress test in production, and do it live with real customers on the site at the same time, anyone can do it. Hallmark, American Girl, Best Buy, Quicken Online, the Kentucky Derby and countless others are doing it. Companies with high-traffic online applications in every sector are doing it: retail, finance, banking, online gaming, and so on.
This week I was on a pre-sales call and heard a phrase that comes up often: “We would never test in production.” A number of questions usually pop up that underlie the reasons why companies feel that way:
- What about live customers on the site?
- What about security concerns?
- What about dirtying up production data?
- What about fake orders in production?
… to list a few. These are all valid concerns, and a proper testing methodology will address all of them. Suffice it to say that every major concern that comes up can be, and has been, addressed. I will cover some of these specific concerns in a later post.
Ok, so… “Hey Dan, that’s all good, but why bother testing in production?” This is what’s really important. I have seen so many companies start out with a “we would never test in production” mindset, suffer a serious production crash, and become immediate believers.
There are just too many variables between production and lab environments to test in one and make assumptions about performance and capacity in the other. Lab testing is still an important part of a good performance management process, but a 200 millisecond page response time in a lab can become 10 seconds in production when a batch job kicks off under load. What if you are at peak traffic levels and a database backup kicks off? What if you are on shared infrastructure and a site other than yours experiences peak traffic through a shared network device like a load balancer or firewall?
Even if you’re not performance testing in production, what about failover testing? This “I’m not going to test in production” mantra usually permeates every part of the organization. So forget performance testing for a second... what about pulling the plug on servers and switches to test failover? If you have failover mechanisms in place but are scared to pull the plug in production for fear of impacting a “real customer”… something is wrong with that picture! Google pulls the plug on entire data centers every 30 days to make sure everything fails over correctly, and they do it under live load.
There’s a lot that could be covered on this topic, but I just wanted to spark the discussion with some food for thought. Recently I had a conversation with the guy who managed the New York Stock Exchange production floor systems. Every weekend, after the markets shut down on Friday, they would conduct performance tests stressing the trading transaction systems, apply a methodology for cleanup and reset, and be good to go by Monday morning. They started doing this in the mid-nineties.