The Performance Risks of Testing in a Scaled Environment

Very often, performance testing is executed against a smaller environment than the live production system. While testing against the production environment is important, it may not be possible due to cost, practicality and risk. SOASTA’s CloudTest Methodology promotes testing throughout the application lifecycle, including in staging/test environments. This post details the risks associated with performance testing solely in a downsized environment prior to go-live.

One of the most complicated questions to address in performance testing is this: “If we halve the size of the performance/load testing environment, can’t we just multiply the figures up?” The question is straightforward and the answer is simple – NO. However, explaining why in simple terms is more difficult.

First, let’s take a square. If we halve the length of its sides, do we get a square half the size? Well, yes and no – each side is half the length, but the area (its capacity) is 1/4 of the original square.

This simplistic view illustrates that if the environment is halved, the capacity is not necessarily halved with it. Performance environments are not simple squares – they have many shapes, and reducing their size does not yield a proportional decrease in capacity.
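The square arithmetic can be written out as a minimal Python sketch. The side lengths are arbitrary illustrative values, and `square_capacity` is a hypothetical helper used only to show why "multiplying the figures up" gives the wrong answer.

```python
def square_capacity(side: float) -> float:
    """Capacity of the square in the analogy, modelled as its area."""
    return side * side


# Illustrative values only: a full-size square and one with halved sides.
full_capacity = square_capacity(10.0)
scaled_capacity = square_capacity(5.0)

# Halving the side leaves a quarter of the capacity, not half.
capacity_ratio = scaled_capacity / full_capacity

# The "just multiply the figures up" estimate doubles the scaled result,
# which undershoots the real full-size capacity.
naive_estimate = scaled_capacity * 2

print(capacity_ratio, naive_estimate, full_capacity)
```

A real environment will not follow this exact curve; the point is only that the relationship between size and capacity need not be linear.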

IT projects are inherently complicated – each piece of the system is built separately and then assembled before delivery. Let’s consider a motor vehicle that has a top speed (aka production) of 100mph. If we halve the environment (e.g. halve the number of CPUs and therefore the speed) we are effectively halving the size of the engine, which gives the car 1/4 of the capacity and performance. The maximum speed the resized car can reach is now 25mph. We are driving a car with an engine that is 25% of the original size, but is still supporting the same chassis and load as when the engine was four times as powerful.

Now let’s say other parts of the car are software subsystems – they bolt indirectly onto the engine: wheels, nuts, bolts, steering, and the axle. The smaller engine is driving these but can never load them past 25mph. So everything looks okay in the scaled environment when we are going full speed, i.e. 25mph. But if we take the axle, an essential part of the system that is attached to the engine, and introduce a fault by updating it without changing anything else, it may well break at 30mph. Because the scaled test never goes beyond 25mph, this performance fault will only be found in production. The key lesson here is: by using a scaled environment you are unlikely to find performance issues past your environment’s capacity.
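The axle example can be sketched in a few lines of Python. This is a toy model under the analogy's own numbers (25mph scaled, 100mph production, axle failing at 30mph); `faults_found` and the component names are hypothetical and do not correspond to any real load testing API.

```python
from typing import Dict, List

SCALED_TOP_SPEED = 25       # mph, the most the scaled environment can reach
PRODUCTION_TOP_SPEED = 100  # mph, the live system's top speed


def faults_found(break_speeds: Dict[str, float], top_speed: float) -> List[str]:
    """Return the components that fail at or below the highest speed a test can reach."""
    return sorted(name for name, limit in break_speeds.items() if limit <= top_speed)


# Assumed breaking points: the updated axle now fails at 30mph.
break_speeds = {"axle": 30, "steering": 150, "wheels": 180}

found_in_scaled = faults_found(break_speeds, SCALED_TOP_SPEED)
found_in_production = faults_found(break_speeds, PRODUCTION_TOP_SPEED)

print(found_in_scaled)      # nothing fails within the scaled test's range
print(found_in_production)  # the axle fault only appears at production load
```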

Now let’s take the wheel nuts; they have been designed with generous tolerance and break at 200mph. If a design fault means the tolerance is actually lowered to 102mph, the system won’t break in test or live, but the safety margin of the associated components has been drastically reduced without visibility. So the key lesson here is: by using a scaled environment it becomes more difficult to see if the overall capacity and tolerance of the system has been reduced – making the chances of live performance failure more likely.
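To make the lost margin concrete, here is a small sketch using the figures from the wheel nut example. `headroom` is a hypothetical helper; the only inputs are the speeds already quoted in the analogy.

```python
SCALED_TOP_SPEED = 25       # mph
PRODUCTION_TOP_SPEED = 100  # mph


def headroom(break_speed: float, top_speed: float) -> float:
    """Margin between the speed at which a component breaks and the top speed it is driven at."""
    return break_speed - top_speed


designed_break_speed = 200  # mph, the intended tolerance of the wheel nuts
actual_break_speed = 102    # mph, after the design fault

designed_margin = headroom(designed_break_speed, PRODUCTION_TOP_SPEED)
actual_margin = headroom(actual_break_speed, PRODUCTION_TOP_SPEED)

# Both tests still pass, because neither one reaches the breaking speed.
passes_scaled_test = actual_break_speed > SCALED_TOP_SPEED
passes_production_test = actual_break_speed > PRODUCTION_TOP_SPEED

print(designed_margin, actual_margin, passes_scaled_test, passes_production_test)
```

The margin drops from 100mph to 2mph while every test stays green, which is why the reduction goes unnoticed.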

It gets more complicated.

In reality, the system is made up of many components that are essential to its speed: spark plugs, pistons, camshaft, etc. When a hardware environment is scaled down, its components are not all scaled proportionally. Our essential IT components consist of memory, L1 cache, CPU speed, network, disk I/O, and database, and these interact in a complex way. Resizing one will affect the overall performance. If you have to prioritize any of these, then keep the memory the same. So the key takeaway here is: when working in a scaled environment – do not scale everything down. Identify, prioritize and attempt to keep as many key attributes as close to live as possible.
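One way to see which attributes have drifted from live is to compare the two environments attribute by attribute. The sketch below assumes made-up specifications for both environments; the attribute names and numbers are purely illustrative and `scale_ratios` is a hypothetical helper.

```python
from typing import Dict, List


def scale_ratios(production: Dict[str, float], scaled: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each scaled attribute to its production value (1.0 means identical to live)."""
    return {name: scaled[name] / production[name] for name in production}


# Assumed, illustrative specifications. Memory is deliberately kept the same as live.
production_env = {"cpu_cores": 32, "memory_gb": 256, "disk_iops": 20000, "network_gbps": 10}
scaled_env = {"cpu_cores": 16, "memory_gb": 256, "disk_iops": 5000, "network_gbps": 10}

ratios = scale_ratios(production_env, scaled_env)

# Attributes that were scaled down, most heavily reduced first.
scaled_down: List[str] = sorted(
    (name for name, ratio in ratios.items() if ratio < 1.0),
    key=ratios.get,
)

print(ratios)
print(scaled_down)
```

A list like `scaled_down` is a reasonable starting point for documenting the known limitations of the test environment.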

That’s a simple analogy. In reality, you also have to consider the risks of deadlocking (less likely in a scaled environment), configuration differences, and actual scalability capabilities of the application, not to mention the parts of the architecture that are typically untested in a scaled environment, such as bandwidth usage, load balancers and router/firewall capacity. The ideal solution is to test against a full-sized production system at least once before going live.

To recap:

  1. Scaled environments do not downsize proportionally.
  2. By using a scaled environment you are unlikely to find performance issues past your scaled environment’s capacity.
  3. By using a scaled environment you can reduce the capacity and tolerance of the overall system without visibility – this increases the risk of live performance issues.
  4. If you have to work in a scaled environment – do not scale everything down. Identify, prioritize and attempt to keep as many key attributes as close as possible to live.
  5. Identify and communicate the limitations of the scaled performance test environment to management.
  6. Study the environment and architecture – attempt to scale any performance testing environment sensibly.
  7. Always attempt to performance test against the live target architecture before rolling out to customers.