The Performance Beacon

The web performance, analytics, and optimization blog

The only way to conduct real-world tests … is in the real world.

This guest post is authored by Seth Eliot, Sr. Knowledge Engineer in Test at Microsoft and a well-known speaker and blogger. Seth shared additional content during an October 31 webinar with SOASTA, “Testing in Production Advances with Big Data and the Cloud.”

In 1985 Microsoft ran a classified ad in the newspaper recruiting testers:

As a Software Tester you will design, execute, and document tests of application software. You will generate test scripts and automatic test packages. This is a challenging and highly visible position within a fast-growing division of Microsoft.


In case you hadn’t noticed, it’s not 1985 anymore. So why are so many companies still trapped in outdated testing paradigms – you know, designing and executing test scripts, capturing test results, and making quality assessments based on that data? Today, forward-thinking companies deploying contemporary large-scale applications are employing data-driven techniques traditionally used by operations as part of their testing – call them “TestOps” – and tapping into Big Data to render quality assessments.

Sure, you can test your application or service in a lab and get some data from that exercise. But if you want to get a proper sense of full-scale usage and real telemetry from hundreds of thousands of servers, there’s only one place to go: production.

I know what you’re thinking: “Hold it – don’t we test before production?”

That was the typical process, yes. But that simply isn’t enough anymore. With that 1985 approach (call it “Big Upfront Testing” or BUFT), we tested everything we could, put the system into production, and walked away … and hoped. But, of course, hope isn’t a strategy. Today, we need to do the right amount of Upfront Testing – and then continue that testing by Testing in Production (TiP).

If we’re supporting anywhere from dozens to thousands of servers and managing Big Data – elements found only in production settings – the only logical place to perform those tests is the actual production environment. That’s where we find a rich diversity of users, and where we encounter events and edge cases that we could never anticipate in a test script. The reality is that the data center is a sophisticated environment, with complex network topologies, power infrastructure, and interacting services. We need exposure to that complexity to truly test our applications and systems.

The Four Approaches to Testing in Production

When we talk about this new discipline of TiP, there are four major approaches to consider.

1. Passive Monitoring

With this approach, we monitor metrics and results drawn from real usage. It is purely observational – no lab required. Your service, running in production, constantly emits a stream of data about users, usage scenarios, resource consumption, and more. We use that data to make assessments about the quality and performance of the application. Note that passive monitoring does not initiate transactions; it simply observes the naturally occurring data. We watch for functional correctness as well as the time the production system takes to execute the requests real users actually make – page loads, email sends, and similar tasks.
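To make this concrete, here is a minimal sketch in Python of the kind of passive check a service might run over its own telemetry. The log format, field names, and the 2-second target are assumptions for illustration, not a prescribed schema.

    import json
    import statistics

    # Minimal passive-monitoring sketch (hypothetical log format).
    # It initiates no transactions; it only summarizes telemetry the
    # service already emits for real user requests.
    def assess_page_loads(log_lines, slo_ms=2000.0):
        """Summarize real-user page load times against a service-level target."""
        samples, errors = [], 0
        for line in log_lines:
            event = json.loads(line)       # e.g. {"page_load_ms": 812, "ok": true}
            if not event.get("ok", True):
                errors += 1
            samples.append(event["page_load_ms"])
        if not samples:
            return None
        samples.sort()
        p95 = samples[int(0.95 * (len(samples) - 1))]
        return {
            "count": len(samples),
            "median_ms": statistics.median(samples),
            "p95_ms": p95,
            "error_rate": errors / len(samples),
            "meets_target": p95 <= slo_ms,
        }

A job like this can run continuously against the live telemetry stream, turning data the service already produces into an ongoing quality assessment.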

2. Active Monitoring

This is the more traditional approach that most testing professionals are familiar with: “synthetic” testing, in which no real users are requesting tasks and services. Instead, the tester simulates a collection of real users. The advantage is that when a targeted synthetic test fails, you know exactly what to debug. You can also run these tests from outside the data center, exercising the same end-to-end scenarios that real users experience; those scenarios may be harder to debug, but they are more realistic. SOASTA helps automate the tests performed in active monitoring.
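As an illustration, here is a minimal sketch of a synthetic probe in Python. The URL, timeout, and latency threshold are hypothetical; a scheduler outside the data center would run something like this on a fixed interval.

    import time
    import urllib.request

    # Minimal active-monitoring probe (hypothetical URL and thresholds).
    def probe(url="https://example.com/health", timeout_s=5.0, max_latency_s=2.0):
        """Issue one synthetic transaction and judge pass/fail."""
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                ok = (resp.status == 200)
        except OSError:                    # covers timeouts and HTTP errors
            ok = False
        latency_s = time.monotonic() - start
        return {"ok": ok and latency_s <= max_latency_s, "latency_s": latency_s}

Because the probe controls its own input, a failure points directly at what to debug – the trade-off against realism noted above.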

3. Experimentation

With experimentation, we try new things – in production, with real users – and learn what works and what doesn’t. That can, understandably, create genuine concern among IT folks, but it can be an essential stage: a development cycle that proceeds for months or years without any real-world experimentation prior to launch will learn – far too late – what works and what doesn’t. The key is exposure control. Launch your prototype to real users, but not too many. Maybe limit it to certain browsers or geographies, or to a random selection of 1 percent of your users. For instance, Netflix – which processes 1 billion API requests per day – deployed its next-generation service by putting a small slice of its users on the new service and gradually migrating them all over. With careful monitoring, and by keeping the older service in full production mode, Netflix was able to control its exposure and minimize its risk.
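One common way to implement exposure control is deterministic bucketing, sketched below in Python. The hashing scheme and the 1 percent default are illustrative assumptions, not Netflix’s actual mechanism.

    import hashlib

    # Minimal exposure-control sketch: route a deterministic ~1% of users
    # to the new implementation. The bucketing scheme is an assumption for
    # illustration, not any particular company's mechanism.
    def in_experiment(user_id, percent=1.0, salt="next-gen-service"):
        """Deterministically assign a user to the experiment bucket."""
        digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
        bucket = int(digest[:8], 16) % 10000   # stable bucket in 0..9999
        return bucket < percent * 100          # 1.0% -> buckets 0..99

    def handle_request(user_id, new_service, old_service):
        # Exposure is controlled and reversible: lowering `percent` to zero
        # instantly routes every user back to the proven service.
        return new_service() if in_experiment(user_id) else old_service()

Because the assignment is deterministic, each user gets a consistent experience, and dialing the percentage up or down migrates traffic gradually – exactly the kind of controlled rollout described above.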

4. System Stress

With the system stress approach, we are performing synthetic transactions “on steroids.” One version is load testing in production. We run synthetic transactions at high rates to assess the scale and capability of our service. We do this in production – but we don’t want to adversely affect real users, of course.

Big Data plays a big role here: We are injecting more than just synthetic users. We’ll also inject vast volumes of “synthetic data” (ideally modeled after production data). Then we monitor key performance indicators to see whether real users or performance are adversely impacted. If so, we stop the test, find the source of the problem, and fix it. Incrementally and iteratively, we can repeat this process to eliminate sources of adverse impact and design in the quality our application or service requires.
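The stop-on-impact loop might look like the following sketch in Python, where send_synthetic_batch and read_kpis are hypothetical hooks into your load generator and monitoring system, and the thresholds are illustrative.

    import time

    # Minimal in-production load ramp with a KPI guardrail (hypothetical hooks).
    def ramp_load(send_synthetic_batch, read_kpis,
                  start_rps=10, step_rps=10, max_rps=500,
                  max_error_rate=0.01, max_p95_ms=2000):
        """Increase synthetic load stepwise; abort before real users are hurt."""
        kpis = None
        rps = start_rps
        while rps <= max_rps:
            send_synthetic_batch(rate_per_second=rps)
            time.sleep(60)                # let the system settle at this rate
            kpis = read_kpis()            # e.g. {"error_rate": 0.002, "p95_ms": 640}
            if kpis["error_rate"] > max_error_rate or kpis["p95_ms"] > max_p95_ms:
                send_synthetic_batch(rate_per_second=0)    # stop the test
                return {"passed_rps": rps - step_rps, "failed_at_rps": rps, "kpis": kpis}
            rps += step_rps
        return {"passed_rps": max_rps, "failed_at_rps": None, "kpis": kpis}

Each iteration that passes the guardrail raises the load; the first adverse KPI reading halts injection, and the team fixes the bottleneck before trying again.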


TiP is a modern testing discipline capable of keeping pace with today’s demand for agile, accelerated software development cycles. While a bit unsettling to contemplate at first, TiP represents the logical next step in the evolution of software testing.
