Web Performance Optimization – Fun for Performance Engineers!

The Velocity conference is underway (SOASTA is one of the sponsors). A look at the schedule shows that the topics cover all the critical factors to consider when building scalable, fast and reliable websites and services.

  • Operation: Deploying large and scalable cloud infrastructure, infrastructure automation, real-time monitoring, distributed systems, NoSQL databases etc.
  • Mobile performance: Analyzing mobile performance, optimization, writing fast client-side code, device comparison etc.
  • Web Performance: Optimizing server-side scripting with Node.js, client-side optimization (images, JavaScript, browser-specific optimization, HTML5 etc.), automated web performance testing etc.

Today’s performance engineers have the opportunity to get involved in all these areas, which is why this is such a great time to be in performance engineering.

Want proof? Take a look at the following project some of our performance engineers tackled some months ago.

The web application to be tested was developed by the government of one of the largest countries in the world. It was a form-based application that all adult citizens in the country had to use. We’re talking a massive number of users! The application was globally distributed across multiple private datacenters to offer failover as well as the best possible user experience. The technical environment was a typical Struts application framework with an Oracle database as the back end. The target was a clean test at 172,000 concurrent users on the production environment. Yes, you read that right. On a production environment! Our performance engineers run a fair portion of their tests on production systems, especially at the end of the application development life-cycle. This is the environment the application is going to be accessed from, and some of the problems you find at that stage (load balancer problems, CDNs, bandwidth etc.) can’t be found when the application is installed in a lab or in a staging environment. If you want to learn more about performance testing in production, there is a webinar available.

This was a two-month project, so our engineers had time to perform state-of-the-art performance testing: starting with a low number of concurrent users (500) and fixing issues at that level before moving on to higher levels, i.e. 1000, 5000, 10k, 50k etc. That’s a fairly typical approach, but one that is sometimes overlooked by our customers. We usually educate them while going through our performance testing methodology.
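The stepped approach can be sketched in a few lines of Python. This is only an illustration, not the way the tests were actually driven: the function names and the `fake_run_test` stand-in are hypothetical, and the levels above 50k are assumed from the 100k milestone and the 172,000-user target mentioned in this post.

```python
from typing import Callable, Optional, Sequence

# Load levels from the article, extended (as an assumption) up to the target.
RAMP_LEVELS = [500, 1_000, 5_000, 10_000, 50_000, 100_000, 172_000]


def first_failing_level(
    levels: Sequence[int],
    run_test: Callable[[int], bool],
) -> Optional[int]:
    """Run tests at increasing load and return the first level that fails.

    Returns None when every level passes.
    """
    for users in levels:
        if not run_test(users):
            return users  # stop here: fix the issues before going higher
    return None


# Hypothetical stand-in for a real load test: pretend the system
# only stays clean up to 110,000 concurrent users.
def fake_run_test(users: int) -> bool:
    return users <= 110_000


blocked_at = first_failing_level(RAMP_LEVELS, fake_run_test)
print(f"First failing level: {blocked_at}")
```

The point of stopping at the first failing level is that problems found at 500 users are far cheaper to diagnose than the same problems buried under the noise of a 100k-user test.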

Many problems were found and fixed before reaching 100k: server misconfiguration, oversized pages, poor client-side caching, unoptimized SQL and a slow login process (one of the optimizations engineers have to deal with regularly, as companies tend to retrieve all kinds of heavy information during the login procedure).

One problem really drove our engineers crazy for a few weeks. (The type of problem they love, as it is really challenging and rewarding to get fixed!) At a fairly substantial level of concurrent users (110k+), they were observing the following real-time chart:

They ran hundreds of tests at this level and were plagued by the same issue: as soon as the first users finished filling out their form and submitted it, overall response time went through the roof, throughput dropped significantly and the error rate reached 6.5%! Not a pretty picture and a really bad user experience: spending 15 minutes filling out government forms and getting an error on submission is not what most users want to go through.

After many days of head-scratching, the problem was finally identified. This application was distributed across the world, with a global load balancer taking the requests and routing them to local load balancers closer to the user. As it was a highly secured application, the global load balancer was responsible for serving a highly encrypted (2048-bit) certificate during the submit process. When it was serving this certificate, CPU on the load balancer would go sky high and throughput would drop. Our engineers had the idea of reducing the encryption level to 128 bits as a test, and sure enough the CPU level on the load balancer was normal and it was able to serve the certificate as expected. But as soon as the encryption level went back to 2048 bits, the problems returned. We ended up contacting the load balancer manufacturer, and the data collected during the tests was all the information they needed to provide a firmware fix.

These are the final test results with the firmware fix.

Very clean test! Very low average response time (440ms), extremely low error rate (0.001%!) and 172,000 virtual users getting through the process without a hitch!

That’s what performance engineering is all about these days! Getting involved with large scale projects, learning the full scope of performance optimization, teaching customers performance best practices. And best of all, there is product innovation to bring engineers all this fun! CLOUDTEST! And soon a very big surprise for all performance engineers in the world … !

Now is the Time to be in Performance Engineering!

I’ve always considered performance engineering the most rewarding discipline in software testing. In my opinion, this is where you have the most opportunity to learn, especially technically. Great performance engineers follow the principles Cem Kaner describes in his Bug Advocacy paper, and especially this one:

The best tester isn’t the one who finds the most bugs or who embarrasses the most programmers. The best tester is the one who gets the most bugs fixed.

It’s about finding the right ways to communicate problems and giving as much useful information as possible to the developers, DBAs and IT guys responsible for the infrastructure where the application under test resides. It’s about dealing with objections from these people, motivating them to take the problem seriously and to start investigating it. It’s also about pointing them in the right direction. Great Performance Engineers need to be good salesmen and need an amazing amount of knowledge to get the issues they’ve found fixed, whether they are in the application code, the infrastructure in which the application resides or elsewhere in the overall architecture!

Great Performance Engineers get to learn about:

  • The intricacies of load balancers, especially since they’re one of the primary sources of contention when dealing with high volume applications. A lot of companies take load balancer configuration for granted and don’t bother testing their algorithm under load.  A BIG mistake!
  • CDN configuration. Again one of the top problems our Performance Engineers find when testing applications from outside the firewall.
  • Bandwidth usage and its implication on the overall performance of the application.
  • Auto-scaling mechanisms.
  • Garbage collection, memory leaks, unoptimized database schema and queries, optimizing CPU consumption, etc.
  • Everything about front-end optimization: browser caching, Expires headers, cache busters, image optimization, lazy loading, progressive rendering, etc.

Performance Engineers are able to test today at a scale they couldn’t dream about 4 years ago. Look at the test below: a 58 min test with 7 Terabytes of data received! A “big data” problem Performance Engineers can have fun with these days.

They’re able to test from inside and outside the firewall, providing coverage for problems they couldn’t previously replicate. They can, with CloudTest, get performance results in real-time and have conversations with developers, DBAs, Ops and other IT constituents during the test, increasing their chance to solve problems quickly, and to learn. A recent engagement with a large telecommunications company in the US brought 90 people together during the 2 hour test. A great learning opportunity!

If you’re eager to learn, and help companies get the best performance from their application, this is the best time to be in performance engineering. Best of all, SOASTA is hiring!

At SOASTA we’re in the wisdom business

An interesting discussion with a former LoadRunner Product Manager piqued my interest and triggered this post. The real trigger was this part: “CloudTest (like others, including LoadRunner) is still just a testing solution and as such serves as a filter or gate-keeping function in the lifecycle.” [attrib: Jim Duggan, Gartner]. I’m gonna disagree on that one, as I’ve had a different opinion on the “gate” part for the past 15 years. In my opinion, the goal of performance testing (and software testing in general) is not to be used as a gate. Testing is only one part of the equation when you need to decide whether or not to go live. There are other important factors to take into consideration, and depending on the context and timing they might be more important. The marketing and sales side of the software industry often prevails, for better or for worse.

The true goal of Performance Testing is to provide wisdom. Maybe the sole purpose of LoadRunner is to be a testing solution that serves as a gate, but at SOASTA we’re in the wisdom business.


The first objective of performance testing is to gather relevant data from EVERY component involved in the overall performance of your application. This is your raw and dumb data.

  • Data from your application itself: memory, CPU consumption, number of processes, heap size, etc.
  • Data from the application’s underlying infrastructure: Application servers, web servers, databases, SSL servers, CMS, memcache servers etc.
  • Data from your application’s ecosystem: CDNs, Load balancers, switches, routers, DNS, etc.

You end up with LOTS of data. To give you an idea of the magnitude of data you might gather, one of the tests we’ve performed with a major TV network generated over 7 terabytes of data in less than one hour. The transfer rate was 17 gigabits per second! Thanks to CloudTest we’re able to gather these terabytes of data in real-time. But this data is USELESS if you don’t have a mechanism to transform it into information. In order to deal with this BIG DATA PROBLEM you need a real-time, in-memory OLAP engine. Gartner predicts that by 2014, 30 percent of analytic applications will use in-memory functions to add scale and computational speed. It looks like SOASTA is ahead of the curve!
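As a rough sanity check on those figures, the average rate follows directly from volume and duration. The sketch below assumes decimal terabytes and reuses the 58-minute duration quoted for the 7-terabyte test elsewhere on this blog; it computes an average over the whole run, so a quoted figure of 17 gigabits per second would sit a little above it.

```python
def average_gbps(total_bytes: float, duration_seconds: float) -> float:
    """Average transfer rate in gigabits per second (1 Gbit = 10**9 bits)."""
    return total_bytes * 8 / duration_seconds / 1e9


total_bytes = 7e12          # 7 TB, assuming decimal terabytes
duration_seconds = 58 * 60  # 58 minutes (assumed to be the same test)

rate = average_gbps(total_bytes, duration_seconds)
print(f"{rate:.1f} Gbit/s")  # prints "16.1 Gbit/s"
```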

Information is created by analyzing relationships and connections between data. Information should help you make some sense from your data and become relevant to your business:

  • What’s the relationship between the number of virtual users and the memory consumed by my application server?
  • How much server capacity am I left with when I reach 10,000 concurrent users during my test?
  • What is the correlation between the number of process counts on the database server and the overall throughput?
  • At what stage during the test do I see a drop in overall response time? Can I correlate this drop with other data to understand this behavior?
  • What is the correlation between an increase in response time and the number of errors coming from my SSL Server?
  • Why is 90% of response time for my overall homepage taken by this particular file? Where does it come from? Why does it take longer than the other page assets?

By combining, correlating and aggregating your data you’re able to build enough information to understand the behavior of your application and its entire ecosystem.
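To make the first question above concrete, here is a minimal sketch of a Pearson correlation between virtual users and application server memory. The sample values are made up for illustration, and this is not how CloudTest computes anything; a real analysis would run over the metrics collected during the test.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same moments during a ramp-up.
virtual_users = [1_000, 2_000, 4_000, 6_000, 8_000, 10_000]
app_server_memory_mb = [2_100, 2_350, 2_900, 3_400, 3_950, 4_500]

r = pearson(virtual_users, app_server_memory_mb)
print(f"correlation between users and memory: {r:.3f}")
```

A coefficient close to 1 suggests memory grows in step with load; a coefficient near 0 would point the investigation elsewhere. Correlation alone does not prove which metric drives the other.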

Knowledge is created by receiving, absorbing and understanding the information. From this knowledge we can make decisions and take actions. If you’re observing a surge of traffic going into one particular webserver, impacting the overall response time for some of your visitors, you can pinpoint the problem to a misconfiguration in one of your load balancers. You’ve got sufficient knowledge to take action and make a change on the fly. That’s what we call actionable intelligence at SOASTA. And with modern testing tools, such as CloudTest, it can happen in real-time. That’s the agile way of doing performance testing!
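The webserver example can be turned into a simple check. The sketch below assumes the load balancer is expected to spread requests evenly (equal weights), and the server names, request counts and 1.5x tolerance are all hypothetical.

```python
from typing import Dict, List


def overloaded_servers(
    request_counts: Dict[str, int],
    tolerance: float = 1.5,
) -> List[str]:
    """Return servers receiving more than `tolerance` times an even share."""
    total = sum(request_counts.values())
    if total == 0:
        return []
    fair_share = total / len(request_counts)
    return sorted(
        name
        for name, count in request_counts.items()
        if count > tolerance * fair_share
    )


# Hypothetical per-server request counts over the same time window.
counts = {"web-1": 10_200, "web-2": 9_800, "web-3": 31_000, "web-4": 10_000}

suspects = overloaded_servers(counts)
print(suspects)  # prints "['web-3']"
```

With weighted balancing the even-share assumption no longer holds, and the expected share per server would have to come from the configured weights instead.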

Wisdom deals with the future and predictability. This is when you apply knowledge to change your process and reach your true objectives. This is the level where you test your predictions, execute them, monitor the results and adapt. As an example, what if you’re expecting 200,000 concurrent users on your website after a big product announcement? You think your visitors will behave in a particular way (you’ve got historical data to back this up): 50% browsing the site, 15% logging in, 15% putting items in and out of the shopping cart and 20% watching a video. How much bandwidth do you need? What should be the right configuration for your load balancer? What assets should be within the CDN? What should be the configuration of the memcache server? What if one of the web servers crashes? What would be the impact for your visitors? What would be the impact on your business?
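A first step toward answering the bandwidth question is to turn the mix into numbers. In the sketch below the user count and percentages come from the example above, while the per-user bandwidth figures are made-up placeholders that you would replace with values measured on your own application.

```python
from typing import Dict

TOTAL_USERS = 200_000
MIX_PERCENT = {"browse": 50, "login": 15, "cart": 15, "video": 20}

# Hypothetical average bandwidth per active user, in kilobits per second.
KBPS_PER_USER = {"browse": 50, "login": 10, "cart": 20, "video": 1_500}


def users_per_scenario(total: int, mix_percent: Dict[str, int]) -> Dict[str, int]:
    """Split a total user count across scenarios by integer percentages."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("scenario percentages must add up to 100")
    # Integer division: exact here, but may drop a few users for odd totals.
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


def estimated_gbps(users: Dict[str, int], kbps_per_user: Dict[str, int]) -> float:
    """Estimated aggregate bandwidth in gigabits per second."""
    total_kbps = sum(users[name] * kbps_per_user[name] for name in users)
    return total_kbps / 1e6


users = users_per_scenario(TOTAL_USERS, MIX_PERCENT)
print(users)
print(f"{estimated_gbps(users, KBPS_PER_USER):.1f} Gbit/s (with placeholder rates)")
```

Even with placeholder rates, the split makes one thing visible: in a mix like this, the video viewers are likely to dominate bandwidth, so that is the assumption worth validating first.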

Today, a lot of companies stop at the information stage and analyze their metrics retroactively to understand what has happened. There are some behavioral reasons, but most of them are just not equipped to reach further levels. You need a mechanism to:

  • Gather raw data in an efficient manner.
  • Build information by combining, correlating and aggregating these data to start to understand behavior.
  • Bring you knowledge, in real-time, so you can make decisions and take actions, FAST.
  • Perform predictive analysis and build prediction models so you can help your business and your customers in the best possible way. Gartner predicts that by 2014, 30 percent of analytic applications will use proactive, predictive and forecasting capabilities. SOASTA is already there!

At SOASTA we’re not in the gate business. We’re in the wisdom business.
