Archive for June 2011

The Fragility of Web Applications

Most web applications contain some code that, if exercised even slightly more than expected, could send the whole stack toppling over. Web apps and their associated infrastructure are fragile, and they must be performance tested thoroughly across all key functional areas. This testing should be done above expected traffic levels, and the following example shows why that is so important.

I was recently testing a site for a very large online retailer. Their site has the typical shop, buy, and self-service functional areas. On this particular application there is an Ajax call that creates an empty shopping cart (sometimes referred to as a transient session) as soon as you start browsing the catalog. It’s a lightweight, seemingly harmless GET request that passes through to the app tier to initialize the empty shopping cart. This cart is an in-memory object on the application tier.

What we discovered through testing was that if we generated exactly the traffic profile they were expecting, with people browsing the catalog and creating empty shopping carts along with customers adding products to their carts, there was enough capacity to perform well. However, if we adjusted the load mix just slightly to produce either more empty carts or more carts with items in them, the entire application slowed down and ultimately fell over completely. This affected not only people in the shopping experience but everyone: the entire site went down.
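One way to act on this is to script deliberately skewed variants of the expected load mix instead of testing only the forecast profile. The Python below is a minimal sketch under stated assumptions: the transaction names and weights are hypothetical, not the retailer’s real profile, and `perturb_mix` and `variants` are illustrative helpers, not part of any load-testing tool. Each variant raises one transaction type’s weight by a percentage and renormalizes, so the total load stays the same while its composition shifts.

```python
def perturb_mix(mix: dict[str, float], name: str, pct: float) -> dict[str, float]:
    """Raise one transaction type's weight by `pct` percent, then renormalize
    so the weights sum to 1.0 again (total load unchanged, composition shifted)."""
    if name not in mix:
        raise KeyError(f"unknown transaction type: {name}")
    skewed = dict(mix)
    skewed[name] *= 1 + pct / 100
    total = sum(skewed.values())
    return {k: v / total for k, v in skewed.items()}


def variants(mix: dict[str, float], pct: float) -> dict[str, dict[str, float]]:
    """One skewed mix per transaction type, so every type gets its turn."""
    return {name: perturb_mix(mix, name, pct) for name in mix}


if __name__ == "__main__":
    # Hypothetical expected profile: fraction of sessions doing each thing.
    expected = {"browse_empty_cart": 0.70, "add_to_cart": 0.25, "checkout": 0.05}
    for skewed_toward, mix in variants(expected, pct=10).items():
        shares = ", ".join(f"{k}={v:.3f}" for k, v in mix.items())
        print(f"+10% {skewed_toward}: {shares}")
```

Each skewed mix can then be fed to whatever tool drives the test; the point is to run every variant, not just the expected one.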

This got me thinking about how fragile apps really are unless you test each of their components past expected load levels and assess not only that component’s performance, but also the performance of the components around it.

Every web application has a weak link somewhere. Do you know where yours is? I bet a very small load test that sends one particular type of request to your application could have catastrophic results. It could be 10% more users logging in than normal or, in the worst case, more people trying to check out than you had planned for. I’m amazed at how many companies market a flash sale without changing anything in their application to account for a load profile that is totally different from normal. We need to find those weak links and build them out to be more resilient.

Web Performance Optimization – Fun for Performance Engineers!

The Velocity conference is underway (SOASTA is one of the sponsors). A look at the schedule shows that the topics cover all the critical factors to consider when building scalable, fast and reliable websites and services:

  • Operations: Deploying large and scalable cloud infrastructure, infrastructure automation, real-time monitoring, distributed systems, NoSQL databases, etc.
  • Mobile performance: Analyzing mobile performance, optimization, writing fast client-side code, device comparison, etc.
  • Web performance: Optimizing server-side scripting with Node.js, client-side optimization (images, JavaScript, browser-specific optimization, HTML5, etc.), automated web performance testing, etc.

Today’s performance engineers have the opportunity to get involved in all these areas, and that is why this is such a great time to be in performance engineering.

Want proof? Take a look at the following project some of our performance engineers tackled a few months ago.

The web application to be tested was developed by the government of one of the largest countries in the world. It was a form-based application that all adult citizens in the country had to use. We’re talking a massive number of users! The application was globally distributed across multiple private datacenters to offer failover options as well as to provide the best user experience possible. The technical environment was a typical Struts application framework with an Oracle database on the back end. The target was a clean test at 172,000 concurrent users on the production environment. Yes, you read that right: on a production environment! Our performance engineers run a fair portion of their tests on production systems, especially at the end of the application development life cycle. Production is the environment the application will actually be accessed from, and problems you find at that stage (load balancer problems, CDNs, bandwidth, etc.) can’t be found when the application is installed in a lab or a staging environment. If you want to learn more about performance testing in production, there is a webinar available.

This was a two-month project, so our engineers had time to perform state-of-the-art performance testing: starting with a low number of concurrent users (500) and fixing issues at that low level before moving on to greater levels, i.e. 1,000, 5,000, 10k, 50k, etc. That’s a fairly typical approach, but it is sometimes overlooked by our customers. We usually educate them as we go through our performance testing methodology.
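The stepped approach boils down to a simple loop: run a level, and only move up once that level is clean. This Python sketch assumes a hypothetical `run_test` callback standing in for a real test run; the `Result` fields and the pass/fail thresholds are illustrative defaults, not CloudTest settings or the thresholds used on this project.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Result:
    users: int
    error_rate: float        # fraction of failed requests, e.g. 0.01 == 1%
    avg_response_ms: float


def step_up(
    levels: List[int],
    run_test: Callable[[int], Result],
    max_error_rate: float = 0.001,
    max_avg_ms: float = 2000.0,
) -> Tuple[Optional[int], Optional[Result]]:
    """Run each load level in ascending order.

    Returns (highest clean level, first failing result). The failing result
    is None when every level passed; the clean level is None when the very
    first level already failed.
    """
    last_clean: Optional[int] = None
    for users in levels:
        result = run_test(users)
        if result.error_rate > max_error_rate or result.avg_response_ms > max_avg_ms:
            # Stop climbing: fix the problem at this level before going higher.
            return last_clean, result
        last_clean = users
    return last_clean, None


if __name__ == "__main__":
    # Levels named in the text, up to the 172,000-user target.
    levels = [500, 1_000, 5_000, 10_000, 50_000, 100_000, 172_000]

    def fake_run(users: int) -> Result:
        # Hypothetical stand-in for a real test: pretend it breaks past 100k users.
        if users > 100_000:
            return Result(users, error_rate=0.05, avg_response_ms=8000.0)
        return Result(users, error_rate=0.0, avg_response_ms=450.0)

    clean, failure = step_up(levels, fake_run)
    print("highest clean level:", clean)
    print("first failing level:", failure.users if failure else None)
```

Stopping at the first failing level, rather than pushing on, mirrors the fix-then-climb discipline described above.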

Many problems were found and fixed before reaching 100k: server misconfigurations, oversized pages, poor client-side caching, SQL optimization, and login process optimization (one of the typical optimizations engineers have to deal with regularly, as companies tend to retrieve all kinds of heavy information during the login procedure).

One problem really drove our engineers crazy for a few weeks (the type of problem they love, as it is really challenging and rewarding to get fixed!). At a fairly substantial level of concurrent users (110k+), they were observing the following real-time chart:

They ran hundreds of tests at this level and were plagued by the same issue: as soon as the first users finished filling out their forms and submitted them, overall response time went through the roof, throughput dropped significantly and the error rate reached 6.5%! Not a pretty picture, and a really bad user experience: spending 15 minutes filling out government forms only to get an error on submit is not what most users want to go through.

After many days of head-scratching, the problem was finally identified. The application was distributed across the world, with a global load balancer taking the requests and routing them to local load balancers closer to the user. As it was a highly secured application, the global load balancer was responsible for serving a highly encrypted (2048-bit) certificate during the submit process. While it was serving this certificate, CPU on the load balancer would go sky high and throughput would drop. As a test, our engineers had the idea of reducing the encryption level to 128 bits, and sure enough the load balancer’s CPU stayed at a normal level and it was able to serve the certificate as expected. But as soon as the encryption level went back to 2048 bits, the problems returned. We ended up contacting the load balancer manufacturer, and the data collected during the tests was all the information they needed to provide a firmware fix.

These are the final test results with the firmware fix.

Very clean test! Very low average response time (440 ms), an extremely low error rate (0.001%!) and 172,000 virtual users getting through the process without a hitch!

That’s what performance engineering is all about these days: getting involved with large-scale projects, learning the full scope of performance optimization, and teaching customers performance best practices. And best of all, there is product innovation to bring engineers all this fun: CloudTest! And soon, a very big surprise for all performance engineers in the world…!

Now is the Time to be in Performance Engineering!

I’ve always considered performance engineering the most rewarding discipline in software testing. In my opinion, this is where you have the most opportunity to learn, especially technically. Great performance engineers follow the principles Cem Kaner describes in his Bug Advocacy paper, especially this one:

The best tester isn’t the one who finds the most bugs or who embarrasses the most programmers. The best tester is the one who gets the most bugs fixed.

It’s about finding the right ways to communicate problems and giving as much useful information as possible to the developers, DBAs and IT staff responsible for the infrastructure where the application under test resides. It’s about dealing with objections from these people, and motivating them to take the problem seriously and start investigating it. It’s also about pointing the investigation in the right direction. Great Performance Engineers need to be good salespeople and need an amazing amount of knowledge to get the issues they’ve found fixed, whether they lie in the application code, the infrastructure in which the application resides, or elsewhere in the overall architecture!

Great Performance Engineers get to learn about:

  • The intricacies of load balancers, especially since they’re one of the primary sources of contention when dealing with high-volume applications. A lot of companies take load balancer configuration for granted and don’t bother testing their balancing algorithm under load. A BIG mistake!
  • CDN configuration. Again, one of the top problems our Performance Engineers find when testing applications from outside the firewall.
  • Bandwidth usage and its implications for the overall performance of the application.
  • Auto-scaling mechanisms.
  • Garbage collection, memory leaks, unoptimized database schemas and queries, CPU consumption optimization, etc.
  • Everything about front-end optimization: browser caching, Expires headers, cache busters, image optimization, lazy loading, progressive rendering, etc.
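Two of the front-end items above, far-future caching headers and cache busters, work as a pair. Here is a minimal Python sketch; the helper names `fingerprint` and `caching_headers`, the 8-character SHA-256 prefix and the one-year max-age are all hypothetical, illustrative choices rather than any particular framework’s convention. The content hash goes into the file name, so a changed file gets a new URL, which is what makes it safe to let browsers cache the old URL for a very long time.

```python
import hashlib
import time
from email.utils import formatdate

# Illustrative choice: one year, in seconds.
ONE_YEAR = 365 * 24 * 60 * 60


def fingerprint(filename: str, content: bytes, length: int = 8) -> str:
    """Cache buster: put a short content hash in the name (app.js -> app.<hash>.js).

    Changing the file changes the hash, hence the URL, so browsers fetch the
    new version instead of serving a stale cached copy.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension: just append the hash
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"


def caching_headers(now: float, max_age: int = ONE_YEAR) -> dict[str, str]:
    """Far-future caching headers for fingerprinted assets.

    Safe only because a fingerprinted URL never serves different content.
    """
    return {
        "Cache-Control": f"public, max-age={max_age}",
        # Expires is the older HTTP/1.0 header; HTTP/1.1 caches prefer max-age.
        "Expires": formatdate(now + max_age, usegmt=True),
    }


if __name__ == "__main__":
    print(fingerprint("app.js", b"console.log('v1');"))
    print(caching_headers(time.time()))
```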

Today, Performance Engineers can test at a scale they couldn’t have dreamed of four years ago. Look at the test below: a 58-minute test with 7 terabytes of data received! A “big data” problem Performance Engineers can have fun with these days.
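To put that volume in perspective, here is the average inbound rate it implies. This is a back-of-the-envelope figure that assumes decimal terabytes (10^12 bytes) and data arriving evenly over the full 58 minutes, which simplifies away the ramp-up:

```latex
\[
t = 58 \times 60\ \text{s} = 3480\ \text{s}
\]
\[
\bar{r} = \frac{7 \times 10^{12}\ \text{B}}{3480\ \text{s}}
\approx 2.01 \times 10^{9}\ \text{B/s}
\approx 2.0\ \text{GB/s}
\approx 16.1\ \text{Gbit/s}
\]
```

Real traffic peaks above this average, since the load ramps up and down during the test.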

They’re able to test from inside and outside the firewall, providing coverage for problems they couldn’t previously replicate. With CloudTest, they can get performance results in real time and have conversations with developers, DBAs, Ops and other IT constituents during the test, increasing their chances of solving problems quickly, and of learning. A recent engagement with a large telecommunications company in the US brought 90 people together during the 2-hour test. A great learning opportunity!

If you’re eager to learn and to help companies get the best performance from their applications, this is the best time to be in performance engineering. Best of all, SOASTA is hiring!

What Keeps a 100-Year-Old Company Feeling Like a Teenager?

This year IBM turns 100 years old, and last week IBM’s “Watson” was named 2011 “Person of the Year” by the Webby Awards. The mere thought of IBM as a start-up (circa 1911) boggles my mind, especially when you consider that this year they reached their all-time high-water mark of $205B in market valuation. This feat is even more amazing given the somewhat rocky road they traveled during the late ’80s and ’90s. Over the last ten years, under the watchful eye of Sam Palmisano, IBM has begun to experience a rebirth. While they are nowhere near the dominant leadership position they held over the technology sector from 1940 to 1980, when the market was defined as “IBM and the Seven Dwarfs,” today they are beginning to show signs of re-emergence as, if nothing else, the “supervising adult” that their 100 years of existence entitles them to be.

So what keeps an old company relevant after all these (100) years? Probably the same things that keep older people young…they hang out with their kids or grandchildren. In IBM’s case, they have learned (over the years) to partner with a few young, innovative companies that are proving to be the new “game changers” in several traditional markets, even in some markets that have long been considered IBM’s strongholds. By partnering with these young “upstarts,” IBM has given their customers fresh alternatives in new technologies and approaches for dealing with a rapidly changing business world. Perhaps even more importantly for many of their customers, IBM is also delivering a much-needed layer of “adult supervision” in this increasingly crowded and complex vendor landscape. Their years of experience enable IBM to act as a trusted advisor to their customers on how to navigate this vendor minefield.

As one of those lucky few young upstarts that Grandpa IBM has chosen to partner with, we here at SOASTA get the advantage of their many years of experience surviving and thriving in both up and down markets. We also get to benefit from what may be the greatest technology distribution channel ever assembled, a channel that no start-up could ever replicate organically.

Only time will tell whether Grandpa IBM and its young upstart partner SOASTA will make an interesting combination for the IBM nation, but the hourglass has been turned. One thing is for sure: SOASTA is one youngster that is eager to learn from Grandpa.

Will the Cloud Change the Industry Giants or Will the Giants Change the Cloud?

Over the past couple of years or so the technology giants such as IBM, HP, Cisco, Oracle and Microsoft have all announced cloud strategies — all with promises of multi-billion dollar investments backing these strategies up. Some have even predicted that cloud computing is the key to their future over the next ten years. This type of affirmation from these industry leaders is pretty heady stuff for a cloud industry in its own infancy…but will it become a reality? Will the industry leaders of today’s technology market be the leaders of tomorrow’s?

Most people following this market will say “absolutely,” even though the majority of cloud innovation continues to flow from early stage start-ups in this space today. It is also clear that the technology giants have both the cash reserves and the desire to take a leadership position in the cloud through acquisition. But is this enough?

The biggest obstacle to the giants achieving their ultimate goal of owning the cloud may be in their existing business models. The very same business model that has been the envy of the industry for the past thirty years may, in fact, be a roadblock to owning the cloud unless something changes.

Companies such as IBM, HP, and Oracle have been built largely on top of enormous distribution channels of direct sales and support personnel to attract new customers and retain existing ones. While these channels have been the envy of the industry for the past thirty years, they also come at an enormous cost that must be recovered through these companies’ pricing models. Conversely, the cloud’s on-demand, pay-per-use business model (often described as the "anti-lock-in" model) may be the antithesis of the enterprise site license that was popular in the ’80s and ’90s. Most importantly for publicly traded companies, the cloud’s on-demand model offers limited visibility into projected future revenue streams, which is a major issue when delivering guidance to shareholders. Even worse, with on-demand servers priced at $0.10 to $0.80 per server-hour, the average deal rarely, if ever, goes over $100,000, let alone the $1M threshold that commonly justifies the existence of a direct sales organization. So the problem is: how do these companies sell cloud services? Their existing channels are focused on $1M deals, so getting their attention for a $50,000 cloud services deal may prove problematic, while developing a new organization around selling cloud services may cause major internal channel conflict.
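A quick back-of-the-envelope calculation shows why those thresholds are hard to reach with hourly pricing. Assuming a single customer keeps N servers running 24/7 for a full year (8,760 hours) at price p per server-hour, a deal of size D requires:

```latex
\[
N = \frac{D}{p \times 8760\ \text{h}}
\]
\[
D = \$1{,}000{,}000,\ p = \$0.80/\text{h}: \quad
N = \frac{1{,}000{,}000}{0.80 \times 8760} \approx 143\ \text{servers}
\]
\[
D = \$1{,}000{,}000,\ p = \$0.10/\text{h}: \quad
N = \frac{1{,}000{,}000}{0.10 \times 8760} \approx 1{,}142\ \text{servers}
\]
```

By the same formula, a $100,000 deal at $0.80 per hour corresponds to roughly 14 servers running all year, and most on-demand usage is far burstier than round-the-clock.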

These challenges may explain why there have been more announcements and declarations than actual cloud successes from the industry giants so far. That said, I would not bet against them, as cash is still king…but I suspect independent cloud divisions will have to be formed to achieve any quick success. It also means that lean and agile cloud start-ups that have evolved from the beginning around an on-demand model have more of a fighting chance than start-ups in previous generations. I, for one, am looking forward to seeing how the cloud market plays out between the "Davids" and the "Goliaths."
