Implementing 2-Legged OAuth in JavaScript (and CloudTest)

Introduction

If you’re reading this, you are probably looking for information on how to implement 2-Legged OAuth in JavaScript. I recently had to implement 2-legged OAuth in a CloudTest performance test for one of our customers. Because 2-legged OAuth is not yet part of the official OAuth spec (as of 6/15/2011), there is relatively little information out there about how to make this all work. Where information does exist, it unfortunately doesn’t work for every implementation, since there isn’t a specification for it. I hope this saves you some time… it definitely would have helped me out. You will need a working knowledge of JavaScript to find the implementation details in this article useful. Without one, you may still find the general OAuth overview interesting.

High Level OAuth Overview

OAuth is a way for applications to authenticate with one another. In essence, a client application signs a string of values and passes that signature to the server, along with the values it used to create it (except one: your secret key). Strictly speaking this is not encryption; with the common HMAC-SHA1 method, the signature is a keyed hash. The server then uses the values you sent across to look up your secret key and attempts to generate the same signature you did. It then compares the two signatures. If they match, it’s a success. If not, it’s a failure.
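To make the overview concrete, here is a minimal Node.js sketch of how an OAuth 1.0 HMAC-SHA1 signature is typically produced, following the base-string rules in RFC 5849. It assumes the URL is already normalized (lowercase scheme and host, no default port, no query string) and that every parameter, OAuth and payload alike, is passed in the `params` object. The function names are my own, not from any particular OAuth library.

```javascript
const crypto = require("crypto");

// RFC 3986 percent-encoding, as OAuth 1.0 requires. encodeURIComponent
// leaves ! ' ( ) * unescaped, so those are escaped explicitly.
function percentEncode(value) {
  return encodeURIComponent(String(value)).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Signature base string: METHOD & encoded URL & encoded parameter string.
// Parameters are encoded first, sorted by name, and joined as k=v&k=v.
// Object keys are unique, so sorting by name alone is enough here.
function signatureBaseString(method, url, params) {
  const normalized = Object.keys(params)
    .map((k) => [percentEncode(k), percentEncode(params[k])])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .map(([k, v]) => `${k}=${v}`)
    .join("&");
  return [method.toUpperCase(), percentEncode(url), percentEncode(normalized)].join("&");
}

// HMAC-SHA1 over the base string, keyed with "consumerSecret&tokenSecret".
// In 2-legged OAuth there is no token, so the key ends in a bare "&".
function signHmacSha1(method, url, params, consumerSecret, tokenSecret = "") {
  const key = `${percentEncode(consumerSecret)}&${percentEncode(tokenSecret)}`;
  const base = signatureBaseString(method, url, params);
  return crypto.createHmac("sha1", key).update(base).digest("base64");
}
```

Since 2-legged OAuth has no spec of its own, confirm details like the empty token secret against what your server actually expects.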

The difference between 3-Legged OAuth and 2-Legged OAuth is that in the 3-Legged variant, the client first passes some credentials to the server and gets an access token back if authentication is successful.  Then this token is passed along in subsequent requests.  This is commonly called ‘the dance’ in OAuth developer circles.  When you authenticate with Netflix through various platforms (AppleTV, iPhone, Netflix.com), you do a 3-Legged OAuth dance.  This allows for users, applications, and authentication to be abstracted out into separate tiers.

Some other types of applications may be better suited to having authentication and message passing happen in one request, and one request only. This is where 2-legged comes in. In 2-legged OAuth you pass the signed string, the values used to sign it, and the message payload in a single GET or POST. If it is rejected, the message fails. If it’s accepted, the message is processed. The app I was testing was a central logging system, and every message was a log event. There was no time (or functional need) for a three-way handshake in this app, and also no notion of maintained state. 2-Legged OAuth cuts out the middleman. If authentication is successful the message is processed, no dancing around.
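Here is a sketch of that single-request flow, again in Node.js and assuming form-encoded key/value payloads (a JSON body would not be covered by the signature this way). The client folds the payload and the `oauth_*` values into one parameter set, signs it, and sends everything at once; the server looks up the secret by consumer key, recomputes the signature, and compares. The helper names and the `lookupSecret` callback are hypothetical.

```javascript
const crypto = require("crypto");

// RFC 3986 percent-encoding (escapes ! ' ( ) * that encodeURIComponent skips).
const enc = (s) =>
  encodeURIComponent(String(s)).replace(/[!'()*]/g, (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase());

// HMAC-SHA1 over METHOD&URL&sorted-params; 2-legged key is "secret&".
function hmacSign(method, url, params, consumerSecret) {
  const normalized = Object.keys(params)
    .map((k) => [enc(k), enc(params[k])])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .map(([k, v]) => `${k}=${v}`)
    .join("&");
  const base = `${method.toUpperCase()}&${enc(url)}&${enc(normalized)}`;
  return crypto.createHmac("sha1", `${enc(consumerSecret)}&`).update(base).digest("base64");
}

// Client side: one request carries the payload, the OAuth values, and the signature.
function buildSignedParams(method, url, payload, consumerKey, consumerSecret) {
  const params = {
    ...payload,
    oauth_consumer_key: consumerKey,
    oauth_nonce: crypto.randomBytes(16).toString("hex"),
    oauth_signature_method: "HMAC-SHA1",
    oauth_timestamp: String(Math.floor(Date.now() / 1000)),
    oauth_version: "1.0",
  };
  return { ...params, oauth_signature: hmacSign(method, url, params, consumerSecret) };
}

// Server side: look up the secret by consumer key, recompute, and compare
// in constant time. Returns true only if the signatures match.
function verify(method, url, received, lookupSecret) {
  const { oauth_signature: sig, ...rest } = received;
  const secret = lookupSecret(rest.oauth_consumer_key);
  if (!secret || typeof sig !== "string") return false;
  const expected = Buffer.from(hmacSign(method, url, rest, secret));
  const actual = Buffer.from(sig);
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}
```

A production server would also reject stale `oauth_timestamp` values and reused `oauth_nonce` values to block replayed requests; that bookkeeping is left out to keep the sketch short.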


The Fragility of Web Applications

Most web applications have some code in them that, if exercised even slightly more than expected, could send the whole stack toppling over. Web apps and their associated infrastructure are fragile, and they must be performance tested thoroughly across all key functional areas. This testing should be done above expected traffic levels… the following is an example of why this is so important.

I was recently testing for a very large online retailer. Their site has the typical shop, buy, and self service functional areas. On this particular application there is an Ajax call to create an empty shopping cart (sometimes referred to as a transient session) as soon as you start browsing the catalog. It’s a lightweight and seemingly harmless GET request that passes through to the app tier to initialize the empty shopping cart. This cart is an in-memory object at the application tier.

What we discovered through testing was that if we generated the exact profile of traffic they were expecting with people browsing the catalog and creating empty shopping carts, along with customers adding products to the cart, then there was enough capacity to perform well. However, if we adjusted the load mix just slightly to have either more empty carts, or more carts with items in them, then the entire application slowed down and ultimately fell completely over. This affected not only people in the shopping experience but everywhere… the entire site went down.

This really got me thinking about how fragile apps really are unless you test all of the different components past their expected load levels, and assess not only their performance, but also the performance of the components around them.

Every web application has a weak link somewhere. Do you know where yours is? I bet that a very small load test that directs one particular type of request at your application could have catastrophic results. It could be 10% more users logging in than normal, or, in the worst case, more people trying to check out than you had planned for. I’m amazed at how many people market a flash sale and don’t change anything on their application to account for a load profile totally different from normal. We need to find those weak links and build them out to be more resilient.

Best Practice Architecture – The Graceful Turn-Away Page

One thing that I’ve seen best-in-class web applications make use of is the ‘graceful turn-away page’, also known as the graceful deferral, turn-away, or throttle page. I apologize in advance for this post being a little light on technical content, but that’s because a throttle can be implemented in a number of different ways depending on application architecture. Therefore I’d like to focus on why this should be a mandatory piece of functionality for any online application. I will talk a little bit about how this is done in some applications, and the rest is up to you, the smart people, to decide how to implement it for your application.

First things first – what is it? Simply put, it’s application functionality that throttles new users and requests flooding into an application while preserving the experience for people already on the site. Without queuing and throttling mechanisms, applications that get more traffic than expected will normally blow up and impact everyone. Throttling gives new users coming in a graceful page saying something to the effect of ‘we are experiencing higher traffic than normal, please come back later’. It stops people at the front door of the store while letting everyone else inside finish shopping. The key here is that sessions already established and in flight are not impacted at all.
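The core rule (admit anyone already inside, stop new arrivals once you are at capacity) fits in a few lines. This is a minimal in-process sketch, assuming sessions are identified by an ID such as a cookie value and that an idle timeout decides when a session stops counting; the class name, limit, and timeout are illustrative assumptions, not any product’s API. In a real deployment this decision often lives on the load balancer, as described below.

```javascript
// Admission control for a graceful turn-away page: requests from sessions
// already admitted always pass; new visitors are admitted only while the
// active-session count is below the configured limit.
class TurnAwayThrottle {
  constructor(maxSessions, idleTimeoutMs) {
    this.maxSessions = maxSessions;
    this.idleTimeoutMs = idleTimeoutMs;
    this.sessions = new Map(); // sessionId -> last-seen timestamp (ms)
  }

  // Forget sessions that have been idle longer than the timeout.
  expire(now) {
    for (const [id, lastSeen] of this.sessions) {
      if (now - lastSeen > this.idleTimeoutMs) this.sessions.delete(id);
    }
  }

  // sessionId: an existing session cookie, or a freshly minted ID for a
  // new visitor. Returns "admit" or "turn-away".
  handle(sessionId, now = Date.now()) {
    this.expire(now);
    if (this.sessions.has(sessionId)) {
      this.sessions.set(sessionId, now); // in-flight session: never turned away
      return "admit";
    }
    if (this.sessions.size >= this.maxSessions) return "turn-away";
    this.sessions.set(sessionId, now);
    return "admit";
  }
}
```

A "turn-away" result would be answered with the static throttle page, while every admitted session keeps getting full service.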

One of the best examples I’ve seen of this is for event registrations held by Active Network. Active Network handles registration for things like marathons, triathlons, and Ironman events. These events typically have a fixed number of registration slots but get orders of magnitude more users coming to the site trying to get one of those spots. To further complicate matters, registration opens at a very well-publicized, set date and time. So the traffic profile is something like this: they have potentially tens of thousands of people sitting on the web site clicking refresh while waiting for the event to open. At some point they flip a switch and let everyone in. First come, first served on the seats available. At some point, though, they may see such an unexpected spike in traffic that they need to queue people at the front door so that people already in the registration process get through with a good customer experience. If a spike like this happens, a nice friendly turn-away page is shown to all new users. Performance is great for everyone still in the registration process. Brilliant.

I have seen this on some best-in-class web apps, but not nearly enough. I feel that this functionality should be mandatory for every site out there. Again, this is not just a friendly ‘fail page’ that everyone sees when things blow up. This should happen well before that. Any web server can show a branded error page. The key here is the throttling capability. There is usually some effort involved in making this work properly, but it’s well worth it.

In order for this functionality to work properly, it is usually implemented at the application tier, the load balancer tier, or some combination of the two. The application server usually knows how many sessions are present in the application. Application servers like WebSphere and JBoss know how many in-memory sessions are present and can report this out over JMX. This is a great metric on which to base the turn-away threshold. Web or application server threads or connections are another common metric to watch when deciding when to turn users away. The load balancer is a great place to do the actual throttling, because most modern load balancers, like F5 Networks BIG-IP boxes, can serve static HTML pages. Plus they are aware of how many connections are open to the web and/or application servers.

Does your web app have a graceful turn-away or throttling mechanism to handle unexpected traffic? If not, I would highly recommend such functionality. Create it and test it to see that it works as intended. It doesn’t even have to be automatic. It’s okay to go out on a load balancer and manually put the page up if that’s what it takes. As long as it holds back the flood of traffic from the tiers that need to be running optimally to serve existing customers, it’s a win. It will create a better customer experience for everyone during turbulent traffic periods. If you can’t keep the site from going down entirely, at least you won’t have a zero dollar revenue window because the whole thing blew up.

Randomizing Click Paths Inside of a Test Case – Not a Good Thing

QA and performance test cases are usually a sequential set of steps to take action in an application.  Something like this:

Step 1: Home page
Step 2: Click catalog
Step 3: Click ‘clothing’
Step 4: Choose the ‘fancy sweater’
{some other steps in the checkout process abbreviated for space}
Step 13: Click place order

The key here is that the test case is a structured repeatable set of steps… more on why this is important later.  There have been times when I have been presented with a test case that looks something like this:

Step 1: Home page
From here:
70% of the users click on catalog
30% click on login

Catalog subflow:

15% click ‘Clothing’
11% click ‘New arrivals’
….

Clothing subflow:
New arrivals subflow:
Login subflow:

This test case is actually not very useful in nearly any situation you could imagine using it, and certainly not in the most common applications of performance testing: baseline tests, stress tests, problem isolation tests, throughput tests… you get the idea. In addition, it’s complicated to write and maintain, and it offers little flexibility in troubleshooting and load calibration.

I never understood where test cases like this came from until I recently watched a live demo of one of the world’s leading web analytics tools. This particular tool visualized user click paths by starting with a vertical column on the left listing all of the landing pages on a web site. Usually the majority of users landed on the home page, but sometimes it was a microsite or a specific popular product where the bulk of users entered the site and then navigated on from there. When you clicked ‘home page’, for instance, the tool branched out showing the percentages of user traffic that went on from the home page to other pages. It was at that moment that I realized where these test case designs were coming from: analytics tools! Unfortunately, what comes out of an analytics tool doesn’t necessarily translate into a useful test case. The analytics tool provides you with workload data and traffic numbers, first and foremost. Secondly, it tells you where users are spending time on the site… and from there you should take that information and design modular, reusable performance scenarios. To make matters worse, there isn’t a testing tool on the planet that makes writing test cases like this easy (because they were never meant to exist like that). A die-hard QA engineer will most likely tell you that this isn’t a real test case by definition anyway, so it makes sense that tools wouldn’t make it easy to do. Yet I do see them out there a lot.

Test cases in performance and QA are meant to be steps of actions that can be used to generate a certain load level or reproduce a problem or error condition. They are meant to be followed closely and, assuming certain conditions, to have a very specific outcome at the end. These outcomes can be as simple as placing an order and getting a success page, or as specific as a certain click path that generates a specific error. In performance testing, the goal of a workload mix is to generate a specific amount of load, usually expressed as orders per second, page views per hour, and so on. When problems are encountered in any form of QA, the test cases are boiled down to just those needed to isolate and reproduce the problem. Test cases made up of randomized click paths make all of those things extremely difficult. Don’t vary the percentages of users executing test cases inside of a running load test. I have never found an application for this kind of test in 12 years of running tests for the biggest online applications*. (*this statement limited to the web space)

Designing your test cases like this is an attempt to blend two sciences together: workload modeling and test case development. Don’t mix the two. A better approach is to have modular test cases and a workload model built from the right mix of those modular test cases. Control the execution of those test cases at the Composition level. Object-oriented programming teaches us to put functionality in the object to which it belongs. The same applies here. Another concept that applies is that of the ‘business process’. Many monitoring technologies and engineering methodologies are based on the concept of a business process. A BP is a simple set of transaction steps that results in a specific action at the end (order placement, logging in, modifying account info, etc.). There is sometimes a little overlap in shared actions across business processes to account for in the workload, but it’s usually minimal (logging in, for instance, can happen in a few paths throughout the site).
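As a rough illustration of that separation, here is a sketch in which each test case is a fixed list of steps and the workload model is a separate table of virtual user counts. The scenario names, steps, and numbers are hypothetical; in CloudTest this control lives in the Composition rather than in code like this.

```javascript
// Modular test cases: each is a fixed, repeatable sequence of steps for one
// business process. No branching or random percentages inside.
const testCases = {
  browseCatalog: ["Home page", "Click catalog", "Click 'clothing'", "Choose 'fancy sweater'"],
  placeOrder: ["Home page", "Choose 'fancy sweater'", "Add to cart", "Checkout", "Click place order"],
  login: ["Home page", "Click login", "Submit credentials"],
};

// The workload model lives outside the test cases: it only says how many
// virtual users run each test case and how quickly they ramp up.
const workload = [
  { testCase: "browseCatalog", virtualUsers: 700, rampUpMinutes: 10 },
  { testCase: "placeOrder", virtualUsers: 100, rampUpMinutes: 10 },
  { testCase: "login", virtualUsers: 200, rampUpMinutes: 5 },
];

// Share of total virtual users per test case, derived from the model.
function mix(model) {
  const total = model.reduce((sum, row) => sum + row.virtualUsers, 0);
  return Object.fromEntries(model.map((row) => [row.testCase, row.virtualUsers / total]));
}

// Stressing one area means changing one number, not rewriting a script.
// Returns a new model; the original is left untouched.
function scale(model, testCase, factor) {
  return model.map((row) =>
    row.testCase === testCase ? { ...row, virtualUsers: Math.round(row.virtualUsers * factor) } : row
  );
}
```

Tripling order placement is then `scale(workload, "placeOrder", 3)`, with every test case left exactly as it was.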

Here is a summary of the points raised:

Why having randomized click paths in a scenario is bad

1) When you run this test multiple times you’re going to get different results (statistically close at best, but more often than not far off in practice). This makes calibration very challenging, because you can never reproduce the same load consistently and repeatedly.
2) If you hit a problem with one area of the site you have to rewrite, duplicate, or drastically change your test cases to isolate the problem and reproduce it.
3) Maintenance is hard – test cases like this become animals.  You actually have to change your test case to increase the amount of load going to a certain area of the site.  Not good.  Test tools give you mechanisms for controlling this logic outside of the test case for a good reason.

Why having modular, independently controllable scenarios is good

1) You can tweak the test load to a solid calibration.  I recently calibrated a site to within .01 of the targets on all page views at the end of a test because of modular scenarios that could be tweaked to achieve very specific page view targets.  This would be impossible to do reliably with random click path scenarios.
2) If you need to stress an area of the application, weeding out the scenarios you don’t need is easy, and adding load to one particular area of the site can be achieved just by cranking up the virtual users for that one scenario. That beats cutting part of the script out, creating a new script from it, and so on.
3) You can adapt to changes in load by simply modifying the workload, and to changes in test cases by modifying the test cases.  One shouldn’t need to rewrite a test case because recent analytics data shows that more people are creating accounts than logging in.  Simply changing the virtual user count to increase load on a registration checkout scenario is so much easier.

Functionality needs to be abstracted out to the highest level possible that still allows you to achieve business process and transaction goals. Configure the load levels, percentages, and ramp ups in the test definition and not the test case.
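Keeping load levels in the test definition also turns calibration into simple arithmetic. A minimal sketch, assuming each virtual user loops its test case back to back and that one iteration (pages plus think time) takes a roughly constant number of seconds; the function names are my own:

```javascript
// One virtual user produces pagesPerIteration page views every
// iterationSeconds, so hourly page views scale linearly with VU count.
function pageViewsPerHour(virtualUsers, pagesPerIteration, iterationSeconds) {
  return (virtualUsers * pagesPerIteration * 3600) / iterationSeconds;
}

// Inverse: virtual users needed to hit a page-view target, rounded up.
function virtualUsersFor(targetPerHour, pagesPerIteration, iterationSeconds) {
  return Math.ceil((targetPerHour * iterationSeconds) / (pagesPerIteration * 3600));
}
```

For example, a 5-page scenario that takes 60 seconds per iteration needs 100 virtual users for 30,000 page views per hour. Iteration time stretches as response times degrade under load, so treat the result as a starting point and recalibrate from measured page views.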

Performance Problems Should be Filed as Bugs!

Seriously. If you are running a performance test and you find bad performance in some code, it should be filed as a bug and assigned to the developer who wrote it. This will either strike you as common sense and something already being done where you work, or as a really different way of doing things from what’s in place right now.

Doing this is a culturally mature thing on engineering teams. It says “we care about performance” on the dev team. Engineering leadership then expects their developers to fix the bugs, and be proactive about performance all the time.

It also says that the Performance QA team needs to uncover problems with thorough testing and let developers know in a timely fashion. At the end of the day, if a team’s performance testing isn’t agile enough to keep pace with the rest of the SDLC, they’re not really doing agile.

n-tier Architectures Are an Off-the-Shelf Bottleneck

It’s true. Whether you realize it or not, the typical n-tier web architecture is an out-of-the-box bottleneck. Pretty much all popular application technology stacks are a performance problem waiting to happen. When you picture your web application infrastructure, think of it as a funnel. By default, capacity is wide at the top and narrows tightly at the bottom. Here is a diagram of a typical 3-tier architecture with a load balancer:

typical 3 tier architecture with a load balancer

Let’s use an F5 Networks/Apache/JBoss/Oracle stack for reference since this closely mimics most enterprise level e-commerce applications. If you haven’t tuned and optimized your environment at all levels, your capacity model looks something like this:

  • F5 Load Balancer (Hardware)
    • Capable of thousands and thousands of connections
    • Built to handle tens to hundreds of thousands of concurrent users
    • Bound by CPU, memory and bandwidth
  • Apache Web Servers
    • ~128-256 threads by default
    • Bound by CPU, memory, and bandwidth
  • JBoss Application Servers
    • ~25-100 threads by default
    • Bound by CPU and memory
  • Oracle Database Servers
    • ~10-40 connections typically
    • Bound by CPU and memory

You should see a pattern here. The further down the stack you go, the less throughput you get. By default, Apache is usually configured with no more than 128 or 256 threads. This almost always has to be tuned to 512 or 1024, if not higher, depending on the nature of the traffic. JBoss is set to no more than 100 threads out of the box. IIS for ASP.NET applications is in the 15-25 thread range. Your database connections are always a fraction of the total thread count of your app servers, usually no more than 25%.

So, if your load balancer can support 10k simultaneous requests at the top, but your 2 web servers can only process 256 simultaneous requests combined, your 4 app servers can only process 160 combined, and your database can only take 40… this model will likely not serve your expected traffic pattern very well and needs some attention.
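Plugging the example numbers into a quick calculation makes the funnel obvious. The per-server figures below are simply the combined totals above divided by server count (256 across 2 web servers, 160 across 4 app servers); the tier with the smallest total caps the whole stack.

```javascript
// Concurrent requests each tier can process, using the example figures.
const tiers = [
  { name: "Load balancer", servers: 1, perServer: 10000 },
  { name: "Web (Apache)", servers: 2, perServer: 128 },
  { name: "App (JBoss)", servers: 4, perServer: 40 },
  { name: "Database (Oracle)", servers: 1, perServer: 40 },
];

// Total capacity per tier, then the tier with the smallest total.
function findBottleneck(stack) {
  return stack
    .map((t) => ({ name: t.name, capacity: t.servers * t.perServer }))
    .reduce((min, t) => (t.capacity < min.capacity ? t : min));
}
```

Here the database’s 40 connections (25% of the 160 app threads) are the ceiling. It is a simplification: in practice each tier’s effective limit also depends on how long requests hold threads while waiting on the tier below.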

All of the major technologies at play in modern web applications need to be tested and tuned for optimal throughput and behavior. This means Apache, JBoss, Tomcat, IIS, Oracle, MySQL, Postgres… you name it, they all have generic out-of-the-box configurations that need to be tuned to suit your app. Incidentally, optimizing doesn’t always mean increasing thread counts. If each thread running on your JBoss servers is eating up lots of CPU cycles, you might need to tune the thread count down and then scale the tier out horizontally.

The ONLY way to ascertain the right numbers for these configurations is with testing. Just last week I was on a test where, after years of running a popular online application, the team was looking at the thread pool settings on their app servers for the first time ever. In their case, opening up the count meant a massive throughput increase. Don’t overlook these key performance pinch points. Again, these are the ‘default’ settings… if you’ve never looked at them under load for your app, you really, really need to.

The Static HTML Page Test – a Great Tool for the Performance Engineer

Here is a simple test you can do to help isolate performance problems in complex n-tier environments – it’s called the static HTML page test.  I’ve used it many times over the years, and I just busted this one out last night in a customer test.  It’s invaluable.  Here is the scenario:

You’re performance testing an application and there’s a bottleneck somewhere.  But where?  In an n-tier environment the requests usually follow a trail that looks something like this:

Load Balancer -> Web Server -> App Server -> Database

By creating a simple static page with an .html extension on it (very important), you take the last two tiers of the request chain out of the equation. The .html extension is important because web servers decide where to route requests for content based on their extension. When something comes through with .jsp, .aspx, or .php on it, the web server knows to send that request to a Java app server or through the PHP/ASP page processor. If the extension is .html, the web server will serve it up itself. Note that this is true in most cases. While rare, it may be that web server settings or web application settings in your environment cause your .html pages to be served by the app server. This is really, really bad. In general, HTML should be served by the web tier (Apache, IIS, etc.).
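The routing rule the test relies on can be modeled in a few lines. This is a toy model of the decision, not how Apache or IIS are actually configured (they map extensions to handlers in their own configuration); the handler names are illustrative.

```javascript
const path = require("path");

// Illustrative mapping: dynamic extensions are handed to an app tier or
// page processor; anything else is served as a static file by the web server.
const dynamicHandlers = {
  ".jsp": "Java app server",
  ".aspx": "ASP.NET page processor",
  ".php": "PHP page processor",
};

function routeFor(urlPath) {
  // Ignore the query string, then compare the extension case-insensitively.
  const ext = path.extname(urlPath.split("?")[0]).toLowerCase();
  return dynamicHandlers[ext] || "web server (static file)";
}
```

A request for `/static/test.html` never leaves the web tier in this model, which is exactly why the static page isolates the tiers above the app server.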

So why is this test valuable? Let’s say you’ve been running a performance test that has a user hitting the homepage, browsing a product catalog, and adding an item to the shopping cart. Over multiple tests you haven’t been able to push more than 200 hits per second and 7 Mbit/s of throughput. You throw a static page on the web tier, and it just so happens that you can drive 1000 hits per second and 35 Mbit/s of bandwidth. With this simple test you just ruled out bandwidth problems, thread count settings in your web servers, and connection limitations on switches, load balancers, and firewalls… to name a few of the bigger ones.
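Working the numbers from that example helps show what the static test rules out. A small helper with the example figures as input (the function and field names are my own):

```javascript
// Compare a full-stack (dynamic page) test against a static .html test.
// Inputs: hits per second and megabits per second from each test run.
function compareStaticTest(dynamic, staticPage) {
  // Average response size implied by each run: bits/s / 8 / hits/s.
  const bytesPerHit = (run) => (run.mbitPerSec * 1e6) / 8 / run.hitsPerSec;
  return {
    dynamicBytesPerHit: bytesPerHit(dynamic),
    staticBytesPerHit: bytesPerHit(staticPage),
    hitsRatio: staticPage.hitsPerSec / dynamic.hitsPerSec,
    bandwidthRatio: staticPage.mbitPerSec / dynamic.mbitPerSec,
  };
}

// Figures from the example above.
const result = compareStaticTest(
  { hitsPerSec: 200, mbitPerSec: 7 },
  { hitsPerSec: 1000, mbitPerSec: 35 }
);
```

Both runs work out to about 4,375 bytes per hit, so the static test pushed similar-sized responses five times faster. The network path and web tier clearly had headroom, which points the investigation at the app server and database tiers.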

Simple test, huge results.  Happy testing!

Testing 1.1 Million Concurrent Users from 8 Global Locations

Just like any other day at a fast moving company like SOASTA, you never know what you’re going to find when you turn a corner.  Today, I walked around the corner and saw a bunch of product and performance engineers in our biggest conference room.  The projector was running and everyone was staring at the screen, which had a CloudTest dashboard up on it.  So I stick my head in, look at the dashboard, and what do I find?  A one million, one hundred and four thousand concurrent user test running for one of our customers.  Whoa!

This rocks on so many levels. First, that’s a lot of load on any system. Second, it’s being launched from 8 different data centers around the world, including multiple Microsoft Azure and Amazon EC2 locations, among them Hong Kong, Singapore, Ireland, Chicago, San Antonio, and others. Perhaps most impressive in this whole setup, though, is the fact that each of our load servers was running 3,000 users apiece… with lots of headroom.

Check out the CloudTest dashboard.  The future of performance testing is so bright.

Screenshot of the CloudTest Dashboard from SOASTA

We are in a ‘Caching’ Renaissance Right Now

We all know how important caching is. The concept of caching in web applications has been around for a long time, but I believe that we are in a caching renaissance within n-tier application stacks. It started back in the late 1990s and early 2000s with efforts to take load off of the database on high traffic e-commerce sites. Repositories and caching setups at the application server tiers tried to keep requests from going to the database (rightfully so). Even before that, the databases themselves had their own caches to speed up queries so they weren’t going to disk every time. Keeping load off of the database makes a huge difference to performance and capacity, and it’s still a mainstay principle of performance engineering to this day.
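The pattern most of these application-tier caches follow is often called cache-aside: check the cache, and only on a miss go to the database and store the result. A minimal sketch, assuming an in-memory `Map` stands in for the cache (in practice something like memcached) and a caller-supplied async loader stands in for the database query; the class name and TTL are illustrative.

```javascript
// Cache-aside (lazy loading): serve from the cache while an entry is fresh;
// on a miss or an expired entry, load from the database and store it.
class CacheAside {
  constructor(loadFromDb, ttlMs) {
    this.loadFromDb = loadFromDb; // async (key) => value
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
    this.hits = 0;
    this.misses = 0;
  }

  async get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > now) {
      this.hits++;
      return entry.value;
    }
    this.misses++;
    const value = await this.loadFromDb(key);
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}
```

A cold cache means every early request is a miss that goes all the way to the database, which is the same warm-up effect visible in the test results later in this post.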

In the not so distant past, memcached came along, and gave us a way to pull entire arbitrary objects out of a cache that was outside of the application tier and database tier.  Naturally, it was only a matter of time until that concept was put in place to cache content one level higher, on top of the web tier.  Now you have things like Varnish sitting above the web servers and caching entire pages and pieces of content to keep load off of the 3-tier architecture as much as possible.  What’s next, someone takes it one step higher and puts a cache layer on top of your content cache layer?  Well, guess what?  Akamai sort of does that with their Dynamic Site Accelerator.  They cache dynamic pages and content and keep all of the load off of your entire infrastructure where possible.

Caching is really important, and it needs to be applied properly and judiciously throughout the architecture.  Here is a real world example of a test I ran that illustrates the importance of caching:

Virtual users versus average response time screenshot in the CloudTest Dashboard

The teal area chart is virtual users ramping up over time. The orange area in front is response time. For the first two minutes of the test, response times are crap. They range from 2.5 to 10.5+ seconds. To make matters worse, this is a WEB SERVICES CALL, so it’s just one component of a much larger transaction. In essence, the end user would see this time plus all of the remaining time to finish whatever they were doing. In this case, it’s an order placement, the worst place to have this kind of problem. But something happens at the 2 minute mark, and response times immediately get better: into the 100ms range, up to about 1 second under max load. Here is another view:

Send rate versus average response time in the CloudTest Dashboard

I’ve kept average response time, but I swapped virtual users for throughput.  Yeah, when response time was crappy, so was throughput.  But then, response time goes down, and throughput immediately shoots up.  Any guesses on what happened there?  Yep, you guessed it!  The cache fully populated.  Part of this order placement call was to do inventory lookups.  Those inventory lookups fetch other associated metadata about the products being bought that should come from cache.

Having a cache in place doesn’t always mean it’s working as intended, either. Thankfully, it was in this case. The customer was happy. The only way to ensure that it is, is to test. I have seen a lot of applications behaving in ways the teams didn’t think they would. For example, if even one flag on a product object changes with each instantiation or request, the cache might think it’s a new, fresh object and hit the database anyway. In the case of pages, if you’re using Varnish or a similar page accelerator and you’re passing in dynamic values on the query string or in the POST data, the content accelerator might not cache the content. You must test whether your caches are working properly.
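One common fix for the volatile-value problem is to build cache keys only from the inputs that actually change the response. Here is a sketch for query strings, assuming you know which parameters are volatile (timestamps, tracking IDs and the like); the parameter names are hypothetical. Varnish does this kind of normalization in its own configuration language (VCL), so treat this as an illustration of the idea rather than a drop-in.

```javascript
// Build a cache key from the path plus only the query parameters that
// determine the response, sorted so parameter order doesn't matter.
// Volatile values would otherwise make every request look like a new object.
function cacheKey(url, volatileParams) {
  const u = new URL(url);
  const kept = [...u.searchParams.entries()]
    .filter(([name]) => !volatileParams.has(name))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  const query = new URLSearchParams(kept).toString();
  return u.pathname + (query ? "?" + query : "");
}
```

With `ts` treated as volatile, `/product?id=42&ts=1700000000&color=red` and `/product?color=red&id=42&ts=1700000099` map to the same key, so the second request becomes a cache hit.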

First, though, you need to have one :)

Testing in Production – If the NYSE Can Do It, Anyone Can

If TurboTax Online can stress test in production, and do it live with real customers on the site at the same time, anyone can do it. Hallmark, American Girl, Best Buy, Quicken Online, Kentucky Derby and countless others are doing it. Companies with high traffic online applications from every sector are doing it: Retail, Finance, Banking, Online Gaming, and so on.

This week I was on a pre-sales call and I heard a phrase that comes up often: “We would never test in production”. There are a number of questions that usually pop up that underlie the reasons why companies might feel that way:

  1. What about live customers on the site?
  2. What about security concerns?
  3. What about dirtying up production data?
  4. What about fake orders in production?

… to list a few. These are all very valid concerns, and a proper testing methodology will address all of them. Suffice it to say, though, that every major concern that could possibly come up can be, and has been, addressed. I will cover some of these specific concerns in a later post.

Ok, so… “Hey Dan – that’s all good, but, why bother testing in production?”.  This is what’s really important.  I have seen so many companies start out with a ‘we would never test in production’ mindset, have a serious production crash, and become immediate believers.

There are just too many variables between production and lab environments to test in one and make assumptions about performance and capacity in the other. Lab testing is also an important part of good performance management processes… but a 200 millisecond page response time in a lab can become 10 seconds in production when a batch job kicks off under load. What if you are at peak traffic levels and a database backup kicks off? What if you are on shared infrastructure and a site other than yours experiences peak traffic through a shared network device like a load balancer or firewall?

Even if you’re not performance testing in production, what about failover testing? This “I’m not going to test in production” mantra usually permeates every part of the organization. So forget performance testing for a second… what about pulling the plug on servers and switches to test failover? If you have failover mechanisms in place but are scared to pull the plug in production for fear of impacting a ‘real customer’… something is wrong with that picture! Google pulls the plug on entire data centers every 30 days to make sure everything fails over correctly, and they do it under live load.

There’s a lot that could be covered in this topic, but I just wanted to spark it up with some food for thought. Recently I had a conversation with the guy who managed the New York Stock Exchange production floor systems. Every weekend after the markets shut down on Friday, they would conduct performance tests stressing the trading transaction systems, apply a methodology for clean up and reset, and be good to go by Monday morning. They started doing this in the mid-nineties.
