I recently had the pleasure of doing an O’Reilly webcast with Gopal Brugalette, senior performance engineer at Nordstrom.com. Gopal is also a woodworker, farmer, and former nuclear physicist — but that merits an entire separate webcast. 😉
The topic of our talk was how to get your site ready for holidays and special events. You can watch a recording of the webcast here, but if you want an express version of the talk, here are my top twelve takeaways about Nordstrom’s pain points, hard-won lessons, and performance best practices.
1. Always be ready
At Nordstrom, anything that ships into production must be ready for anniversary sale and holiday traffic surges — regardless of the time of year. This wasn’t always the case. Several years ago, performance was an afterthought. Nordstrom would postpone preparing for holidays and special events till right before the rush, and as a result, both performance and in-house efficiency suffered. Gopal reminded us that we should care about performance year round, because people use websites year round.
2. Crashes aren’t the only thing you should worry about
Slowdowns can cause as much damage as outages — damage to your traffic, damage to your revenue, damage to your customers’ satisfaction. If you care about the total customer experience, then you need to focus on more than outages.
3. Embrace performance as a feature
Performance is a key aspect of the customer experience. Think about performance as a part of functionality, not as a “nice-to-have” add-on.
4. Ensure that everyone in your organization owns performance
Performance shouldn’t be the sole domain of engineers and operations folks. Everyone on your team — developers, designers, product managers — should own the customer experience, and everyone on your team should ask themselves the same two questions about every new design/feature consideration:
- What is the impact to the customer?
- What is the technical impact?
5. Get the right tools
Gopal stressed the importance of implementing a set of application performance monitoring solutions that give you all the tools you need — from code profiling to log analysis to real user monitoring (RUM) — to monitor and analyze your site’s performance, figure out where your problems are, and find and fix the root cause.
6. Embrace continuous improvement
Gopal also stressed the value of small evolutionary steps — adding new tools and processes gradually — versus huge overnight changes. “We’re constantly getting better at what we do,” he shared. “We do what we do, making it better a little bit at a time.”
7. Study the past, but don’t assume it’s a predictor of the future
Don’t assume that load patterns for one event will be consistent for other events. Your site changes, your users change, your users’ behaviour changes. As this graph demonstrates — showing Nordstrom’s holiday load patterns alongside its annual anniversary sale load patterns — there are no constants.
8. Test early, test often, test everywhere
Gopal made a really interesting point during our talk: “Very few people would think about shipping major functional defects to production, but I’m always surprised at how many people will want to ship performance defects into production.” It’s potentially worse to ship performance issues than it is to ship functional problems with a new feature. This is where product management and technical teams need to come together to test throughout the development and production cycle.
9. Be prepared. But don’t be overprepared
Focus on the most realistic case rather than the worst case. For example, a worst-case scenario is everyone in North America coming to your site on Black Friday, resulting in a spectacular crash. This isn’t likely to happen. Rather than falling down a rabbit hole of planning for wildly unrealistic scenarios, focus on what’s more likely to happen.
10. Build monitoring into your dev and test environments
You can’t test everything and you can’t analyze everything, but you can monitor everything (which will help you understand what to test and analyze). Put your monitoring plan into place early. Every time you consider a new feature, ask yourself, “How am I going to monitor the performance of this feature?” Then build that instrumentation into your dev and test environments. When you ship to production, ship the monitoring with it.
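To make "ship the monitoring with it" a little more concrete, here is a minimal Python sketch of instrumentation that travels with a feature. It is only an illustration, not a description of Nordstrom's tooling: the `monitored` decorator, the in-memory `TIMINGS` store, and the `gift_finder` feature are all hypothetical names, and a real setup would send the measurements to whatever monitoring system you already use.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: feature name -> list of durations in seconds.
# In practice these values would be forwarded to your monitoring backend.
TIMINGS = defaultdict(list)


def monitored(feature_name):
    """Record how long each call to the decorated function takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the call raises, so failures are timed too.
                TIMINGS[feature_name].append(time.perf_counter() - start)
        return wrapper
    return decorator


# Hypothetical new feature. The decorator is part of the feature's code,
# so the same instrumentation runs in dev, test, and production.
@monitored("gift_finder")
def gift_finder(query):
    catalogue = ("scarf", "boots", "scarf set")
    return [item for item in catalogue if query in item]
```

Because the timing lives in the same code as the feature, there is no separate "add monitoring" step to forget when the feature ships.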
11. Coordinate with your vendors
Most third-party vendors will go into lockdown along with you during the holidays, but your vendors might not be aware of your special events. They might plan changes that could have an impact on your site.
To illustrate: One year, during Nordstrom’s anniversary sale, Gopal’s team realized that their order management system wasn’t processing orders any more. After much digging, they discovered that the vendor they used for fraud checks had implemented a change to the service just hours before the start of the sale — and this change was blocking all orders. Now Nordstrom lets all vendors know about its anniversary sale well in advance.
12. Realize that issues will happen where you can’t predict them
Great quote from Gopal: “I am 100% confident that everything we tested will work just fine.” The catch is everything you didn’t test. When loads are different from what you modelled, that’s where you’re going to have problems.
An example from Nordstrom: During the early-access stage of the most recent anniversary sale, mobile sign-ins ended up being nine times greater than anticipated. Nordstrom was able to handle the load without crashing, but the spike caused a large number of mobile users to receive “access denied” error messages. This is why real-time monitoring is crucial. Nordstrom used RUM to get to the root cause of the issue.
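One simple way to catch this kind of surprise is to compare the load you are actually seeing against the load you modelled. The Python sketch below is a hedged illustration only: the function name, the metric names, and the default threshold are assumptions made for the example, not anything Nordstrom described.

```python
def find_load_surprises(modelled, observed, threshold=2.0):
    """Return metrics whose observed load is far from the modelled load.

    modelled and observed map metric names to counts for the same time
    window. The result maps each surprising metric to its observed/modelled
    ratio, or to None when there was no modelled baseline at all.
    """
    surprises = {}
    for metric, seen in observed.items():
        expected = modelled.get(metric)
        if not expected:
            # Never modelled (missing or zero): by definition untested.
            surprises[metric] = None
        elif seen / expected >= threshold:
            surprises[metric] = seen / expected
    return surprises
```

With illustrative numbers, a modelled 1,000 mobile sign-ins against an observed 9,000 would be flagged with a ratio of 9.0, while a metric that is only slightly over its forecast would not be flagged. The threshold is a judgment call that depends on how much headroom you tested for.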
There’s no silver bullet that will guarantee you a 100% seamless holiday shopping season (or any other special event). The best course of action, not surprisingly, is a customer-first philosophy, the right tools, and year-round diligence.
About the Author
Tammy has spent the past two decades obsessed with the many factors that go into creating the best possible user experience. As senior researcher and evangelist at SOASTA, she explores the intersection between web performance, UX, and business metrics. Tammy is a frequent speaker at events including IRCE, Shop.org Summit, Velocity, and Smashing Conference. She is the author of 'Time Is Money: The Business Value of Web Performance' (O'Reilly, 2016).