Velocity NY 2016 Keynote – Connecting Analytics to Actions

Most web sites today collect a staggering amount of performance statistics. Backend data, front-end data, real user measurements, synthetic data: all can be useful. But without connecting this data to specific actions we can take to improve the site, all we are collecting are summary statistics for static reports.

In this talk Buddy will describe how SOASTA applies analytics to the data we collect to generate specific actions site operators can take to improve their performance. Further, we will explore the impact these changes have on revenue, and approaches to prioritizing work to ensure that the most impactful activities always come first.

Speaker: Buddy Brewer, SOASTA




  My name’s Buddy, and I work at a company called SOASTA. We were onstage at a keynote earlier, talking about performance analytics, so that’s a big part of what we do at SOASTA. Really, it’s pretty much all I’ve done in my career, which is kind of funny, because it happened purely by chance: I ended up working for a startup company when I got out of school in 2001 that happened to be doing web performance. We were timing how long it took for people to download pages on IE6 or whatever, right? We were doing it synthetically from little software agents that people would download, mostly over dial-up connections, because that’s what people had at the time. It’s been really interesting to watch the industry evolve since 2001. We ended up becoming part of Gomez, which was a really popular synthetic monitoring company that was purchased by Compuware.


  Then after I left the synthetic monitoring business, where I spent most of my career in software engineering (I do product management now), I pivoted away from that and got more interested in real user measurement for various reasons, and started a small company with the guy who wrote the Boomerang library for RUM, Philip Tellis. We started this little company called Log-Normal that we sold to SOASTA, and it became part of SOASTA. I’m telling you this whole story because I think it’s interesting to see how, like I said, the industry and the problem space have evolved over the years. Right? Back in the first chunk of my career, it was really all about backend time and network download times. Right? CDNs were just becoming a thing. There was no front-end engineering. Right? I mean, you were like an asshole if you put JavaScript in the page. You were supposed to do all that stuff on the server side with JSP tag libraries and stuff like that, render all of your content on the server, and then push it to the browser.


  Then the browser was kind of like this dumb terminal, right? It was almost a reversion to 1970s-era, 3270-type stuff, where it just displayed content and the server did all the thinking, so all the time was spent on the backend. As the ecosystem of tools grew, the first APM companies were starting to build out their tooling and launch their products, and they focused on a very particular narrative that I think persists to this day, in 2016, which is this notion of drilling down to that line of code that you or somebody you know wrote, that’s totally in your control, that’s running on a server somewhere where you can log in and fix it, where you can install agents. It’s a really nice, convenient narrative that, to this day, I think we see a lot. I’m from the APM industry, so I’m sort of criticizing myself as being part of the problem here, but we’ve always kind of clung to that notion, because it’s so elegant. Right? It’s like: you found this problem, a spike in a time series chart, because God knows APM companies love time series charts.


  You click on that, and then you drill down, and you drill down, and you drill down, and there’s this line of code in your backend that you can go fix. Well, that falls down in most cases when I talk with customers about the problems they’re actually confronted with in real life, when their users are complaining, for two reasons. One of them is really well known. Steve Souders made it famous when he wrote his first book about high-performance websites, where he showed that eighty to ninety percent of the time spent waiting on a webpage is on the front end. Right? He said, “Start there.” We find that that’s true to this day. In that keynote earlier that our chairman Ken gave, he was talking about how many user experiences we’ve measured over the course of the four years that we’ve had this RUM product at SOASTA, and we can look across all of that data, which is kind of neat. That’s a nice thing about being a vendor: I get to see data across lots of websites, not just one site that I operate.


  We found that it generally holds true, and that’s what this chart shows. Unfortunately, with the ambient light, it’s kind of hard to see because it’s dark, but the blue down here is the backend time, and this big expanse of green here is front-end time. Right? All of that code that’s happening on your backend is important. It doesn’t mean you shouldn’t invest in some sort of server-side, agent-based APM solution; that’s kind of a table-stakes insurance policy you’ve got to have, and there are a lot of really great companies, like New Relic, AppDynamics, and Dynatrace, that supply those types of technologies. But it’s not enough. It’s not nearly enough anymore. In fact, I was having dinner with Steve a couple of nights ago when we first got here, and we were kind of laughing about this. He had this provocative thing he used to like to say, which was that if you optimized all of your SQL queries down to zero, the reality is most of your users wouldn’t notice, because they’re spending so much time waiting on all this front-end stuff.


  The backend stuff is such a tiny slice that when things are operating normally, and I’m not talking about exceptional cases where the backend’s taking thirty seconds, but under normal circumstances, you could optimize the backend stuff all the way to zero, and most people wouldn’t notice. There’s a lot of focus on the front end now, and a lot of the traditional, very backend-focused things we’ve done don’t apply. That’s the first thing, but the second thing actually makes it a little bit worse, because this wouldn’t be such a hard problem if you actually controlled all the code. One of the things we find now is that everyone’s been moving increasingly to cloud technologies and third parties. No one’s building their own comment engines or recommendation systems or ratings-and-review systems anymore. They bolt all that stuff on with third-party JavaScript, third-party ads, and third-party infrastructure components, obviously including CDNs, which have been around forever; that’s a form of infrastructure. In any one of these places, things could go wrong, and you’ll get blamed for it, as the site operator.


  No one’s going to say, when they log onto your eCommerce site, for example, “Man, I’m really pissed at Tealium today, because they’re slow, and they’re delivering this tag that blocked render.” They’re going to be angry at your brand, of course, and none of this is code that you actually control, so even if there were APM solutions that drilled down to the exact line of code, you wouldn’t be able to do anything about it, because of these third parties. What I wanted to present today for you guys to consider, and if we have time, I’d actually love to hear what you think and learn from your experiences, because I’m in product management and we need that kind of feedback to inform our roadmap, is nine tips for how you can connect analytics to actions, because that’s the dilemma I think people are faced with. It used to be easy to connect analytics to actions. Right? I mean, that’s the drill-down thing: here’s my code, ship off a build, and now I’m done.


  The analytics were sort of inherently actionable, but now there’s this action gap. I was joking about this with some folks yesterday too: the problem with analytics is, they don’t actually do anything. Right? All they do is kind of complain and moan. It’s like, “That’s slow. This isn’t working.” Right? Like, “I’m sad,” and you have to look at the analytics and turn them into something to do, and that’s really difficult. Here’s a framework I want to propose for how you might think about it. It really boils down to two disciplines that are important to get right. The first one is situational awareness, which analytics are really good at providing, and the second one is contingency planning: thinking ahead, right? If this breaks, or this other thing doesn’t work, then what am I going to do about it, in a world where I can’t always just fix the code and ship another build? It starts with gathering all of your dashboards and your control panels in one place.


  By the way, that’s another key point: as more things move to the cloud, the actions aren’t always “change this line of front-end code and deploy it” anymore. It’s “go log into your Amazon console, or your CDN control panel, and change a setting, or push a new configuration.” This seems obvious, but I think a lot of people don’t do this when I go out and meet with our customers that are struggling with performance: getting everything in one spot, so that the person responsible for the Akamai configs isn’t super far removed from the person that’s looking at the user experience data. Here’s another thing I think is interesting about this industry. I have this theory that if you look at all the exhibitors at Velocity this year, you’ll find that eighty percent of them fall into two categories. The first category is analytics companies of some form or another, of which we’re, of course, one. SOASTA, we’re right there. The other one is all these infrastructure providers.


  The analytics companies don’t actually do anything to change the behavior of your website, and the infrastructure providers are sort of mute. Right? They just stare at you until you explicitly tell them to do something. The challenge is to get all of this stuff in one spot, because in order to do this front-end-operations work of connecting analytics to actions, the first thing you have to do is assemble all of the things you have available, all the tools, in one spot. By the way, that’s not just the performance tools, either. It’s also things you might not have thought about, like marketing analytics that can tell you how the revenue is doing, so you can correlate that with some of the performance metrics you’re getting. Get all of those things in one spot, whether that means you’ve got a whole slew of TVs across a NOC somewhere, or you’ve just got thirteen tabs open in a browser on the day you launch a really important campaign, or on Black Friday or Cyber Monday.


  We have this product called the Digital Operations Center that kind of rolls all that up for you, and this hardware thing, but just get it all in one spot. Then number two, and I think this is a big mistake people make because we’ve put this notion of drilling down on a pedestal: don’t optimize too early. One of the things I find happens over and over and over again is people will look at their website, and they’ll say, “I know I need to improve performance,” for whatever reason, right? “Aberdeen says so.” “I read an analyst report that said people are waiting too long.” Okay, great. However you get the religion, terrific. Go work on the performance of your website. But then what they do, and this is where they start to go off track, is they say, “All right, let’s figure out what the slowest parts of the site are and start optimizing them.” Oftentimes, the parts of your site that are the slowest aren’t necessarily the parts you should optimize first, because people don’t have an even level of patience across everything. We see this pattern happen a lot in retail.


  You can’t see the little captions on the x-axis, but I can, so I’ll read them for you. It’s product, search results, and category. We see that happen a lot. The moments when you’re deciding whether or not to put something in your shopping cart are when people are most sensitive to the performance of the page. Oftentimes, though, those aren’t the slowest things on the site. Things that happen deeper in the cart, maybe where the session switched over to SSL and there was a negotiation that slowed that particular step down, happen frequently enough that they pull the whole median up, so those steps have a longer response time in general. Certainly, the things that happen really deep, if you’re doing any kind of credit card validation or anything, are often the slowest pages on the site, and I’ll hear people say, “I’m really excited about working on performance. I’m going to work on these things, because they’re the slowest parts of the site,” and that’s actually wrong.


  By the time people get that deep in the funnel, they’ve already made a certain level of commitment, and therefore they’re more patient. So what do you do? We created an algorithm for this; it’s based on a statistic I don’t fully understand, called the Spearman correlation coefficient. The idea in general is, you collect, in the same context, and this is really important when you’re doing your analytics, both how long people are waiting and the actions you ultimately want them to take. Right? Whether that’s looking at a second or a third or a fourth page on your media property so you can deliver more ads, or completing the funnel and checking out the transaction, or signing up for a newsletter, collect all that behavioral data in context with all of the performance data. Then you can do these correlations where you say, “Okay, well, I can do a cohort analysis for each one of these steps.” Right? Let’s take the product page, for example.


  When people have to wait two seconds on the page, how likely are they to convert? When people have to wait three seconds, how likely are they to convert? When they have to wait four seconds, how likely are they to convert? You can produce these curves, and the curves tell you something about how sensitive people are to the performance of that particular step. Those blue bars are the score that we apply. The taller the blue bar is, the more sensitive the user is to the speed of that page; in other words, the less patient they are with having to wait on that page. What we recommend is that you focus your optimization efforts where the users are most sensitive, because what we usually see is a pattern like this, where there’s a big drop-off. It’s usually three or four pages that have these really high impact scores, where performance is really important, and then there’s a long tail where it’s not nearly as important.
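To make the Spearman idea concrete, here is a minimal, hypothetical sketch of a page-level sensitivity score. The bucket data is invented, and SOASTA's actual scoring algorithm is not public; this just shows the mechanics of rank correlation between load time and conversion rate.

```python
def ranks(values):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # group adjacent equal values (ties) together
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented example: load-time buckets (seconds) on a product page vs. the
# conversion rate observed in each bucket. A strongly negative rho means
# users on this page are very speed-sensitive; its absolute value could
# serve as the page's impact score.
load_time = [1, 2, 3, 4, 5, 6]
conv_rate = [0.062, 0.055, 0.041, 0.030, 0.022, 0.018]
print(round(spearman(load_time, conv_rate), 2))
```

A perfectly monotonic decay like the one above yields rho near -1.0; a page where conversion barely moves with load time would score near zero and land in the long tail.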


  If you’ve got a big event coming up, and we’re getting into the Thanksgiving shopping season pretty soon, so for anyone who’s in retail that’s probably a good case study, you’re thinking about, “Okay, what are the parts of the site where I need the most safety nets, as far as performance?” What we recommend is, you do this type of analysis and then target those pages: not the pages that are the slowest, but the pages where users are most sensitive to changes in speed. This is the conversion impact, but incidentally, you can do the exact same type of analysis if you’re a media property. We have an algorithm that we call the Activity Impact Score. What it does is very similar, except instead of focusing on a conversion, in other words a specific action, we’re focusing on how likely they are to look at more pages, so we’re optimizing for session length in terms of page views. There’s an assumption there that if you’re a media property, the longer the session is, the more profitable that user is for you.


  After all, whether you’re in retail or media or any other business, the two big chunks of how you generate revenue are traffic acquisition, in other words how you get people onto the site, and then what they do when they’re there. Just to be clear, my belief is that the performance industry can only help you with one of those two things. Right? No matter how fast your site is, that’s not going to magically bring strangers to your website, so that’s on you. You’ve got to figure out how, through advertising or whatever, to get people there. But what we can help you with is, when they get there, we can help you deliver the best experience in a way that produces the outcome you need as a business, whether that’s ad revenue or selling things. Which kind of leads us to the third one. We haven’t taken any actions yet.


  We’re just trying to build some situational awareness and figure out where to focus our efforts, because the other thing I find is, I’ve never talked to anyone in the fifteen years I’ve been doing this who has all of the time they need to do everything they want. Right? There’s always a decision to be made about what I’m going to do this week or this month, because I can’t do everything, so this is about prioritization before we start actually making some of these things actionable. The other thing is figuring out what your target speed actually is. Analytics companies are really good at giving you a baseline. There’s not a whole lot of magic left in that. It used to be that evaluating analytics tools was a big deal because some of them used the arithmetic mean, and performance data isn’t normally distributed, so the mean’s not a very good summary statistic; the median or a percentile is better. That used to be a thing. It’s still a thing, except substantially everyone does percentile-based analysis now, and so it’s less about the …
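The mean-versus-percentile point is easy to see with a toy dataset. The numbers below are invented, but the shape is typical of real load-time data: most sessions cluster around a couple of seconds, with a long tail of very slow ones that drags the mean far above what a typical user experienced.

```python
import statistics

# 18 typical sessions around 2s, plus two slow-tail sessions (seconds)
load_times = [1.8, 1.9, 2.0, 2.0, 2.1, 2.1, 2.2, 2.2, 2.3, 2.4,
              2.4, 2.5, 2.5, 2.6, 2.7, 2.8, 3.0, 3.2, 14.0, 45.0]

mean = statistics.mean(load_times)
median = statistics.median(load_times)
# crude nearest-rank 95th percentile
p95 = sorted(load_times)[int(0.95 * len(load_times)) - 1]

print(f"mean={mean:.1f}s median={median:.1f}s p95={p95:.1f}s")
```

Here the mean lands around five seconds, a number that describes almost nobody's experience, while the median reflects the typical user and the p95 captures the slow tail explicitly.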


  We’re all really good. Everyone in the industry is really good at telling you how fast you are. The part that’s tricky, and the part that’s really important for you to also know, is how fast you actually need to be, so that you’re not just making stuff up. I mean, I talk to people all the time who can’t do any better, and they want to have some sort of a goal. Right? Just to put a stake in the ground, they’ve come up with something out of the air: either “the analyst said so, Gartner said this, Aberdeen said that,” or, “We had a meeting about it, and we just kind of arrived at a nice, round number.” You can do a lot better than that with this cohort analysis I was talking about. People often talk about their conversion rates and their bounce rates as if there’s only one. Right? “The conversion rate for my site is three and a half percent.”


  Well, that’s true overall, but the reality is that you also have many, many different conversion rates among different cohorts, and for the purposes of performance, what we’re interested in is a cohort-based analysis on how long people are waiting. Let’s say your conversion rate overall is three and a half percent; in this example on the slide it’s five point two percent. You can see via the green line here, which is the conversion rate, with the x-axis being time, how long people are waiting, and the blue region showing how many sessions were in each load-time bucket, that the people with fast experiences converted at a rate much better than the overall number, and then it decays as people have to wait longer and longer. We can use that analysis to figure out what our optimal load time is. That gives us a goal, if we’re trying to improve the site. We can even go further with what we call our what-if analysis widget in our mPulse product.
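The cohort breakdown just described can be sketched in a few lines: bucket sessions by how long the user waited, then compute a conversion rate per bucket instead of one site-wide number. The session data below is invented purely for illustration.

```python
from collections import defaultdict

# (page load time in seconds, did the session convert?)
sessions = [(1.2, True), (1.4, True), (1.8, False), (2.1, True),
            (2.3, False), (2.9, False), (3.4, False), (3.8, False),
            (4.2, False), (4.9, False), (5.5, False), (6.1, False)]

buckets = defaultdict(lambda: [0, 0])  # bucket -> [sessions, conversions]
for load_time, converted in sessions:
    b = int(load_time)  # 1-second-wide buckets
    buckets[b][0] += 1
    buckets[b][1] += converted

for b in sorted(buckets):
    total, conv = buckets[b]
    print(f"{b}-{b + 1}s: {total} sessions, {conv / total:.0%} converted")
```

Plotting conversion rate against bucket gives exactly the decaying green curve from the slide, and the per-bucket session counts give the blue region.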


  We can actually do these little sliders where you can say, “Well, what if I sped the site up by a hundred milliseconds? What would that generate for me in terms of revenue? What if I sped it up by a second? What would that generate?” Or, “What’s the absolute ideal peak speed that I should be operating at, and what’s the ROI?” One of our customers, for example, used this because most companies don’t have a dedicated performance team that has control of the code, so engineers have to make changes in order to improve the performance of the site, and those engineers have lots of demands on their time that aren’t performance. There’s a product manager somewhere in the company who’s arguing for redoing the personalization features, or something, right? It’s all this new feature development.


  Speed becomes its own form of technical debt, and there’s this need to be able to justify why speed should be prioritized, so we built this for one of our customers so they could do this analysis and actually use it as an argument, the basis on which they wanted to delay new feature development for a month or two while they worked on speed, because it had a defined ROI. The what-if analysis thing is really cool. We’re really proud of it, and it’s kind of a centerpiece of our RUM strategy, where we talk about how fast you are, how fast you should be, and then how you get there. We explain more about it on a blog that Tammy Everts, here in the front row, writes and maintains and manages. There’s a really great post by one of our engineers, Nic Jansma, describing some of the details, if you’re interested in more than I just described here. For the purposes of this talk, where we’re talking about a more operational use of this kind of stuff, it’s great to do what-if analysis strategically.


  If I were to invest in making my site faster, whatever that means, like spending time redesigning the site, hiring more headcount, or making an infrastructure investment, what’s the ROI? But the corollary is, you can also move that slider the other way and say, “What happens if I slow down?” There’s goal setting for making improvements, and there’s also deriving your incident severity thresholds from revenue impact. This is another challenge I think a lot of people are faced with: “Okay, great. My baseline response time, suppose it’s three and a half seconds. Now suppose it’s Black Friday, and the website slows down to four seconds. What now? Is that just a blip on the chart, and I don’t bother doing anything, or is it something that I alert on, but at an info level?” Right? It gets rolled up in the nightly report, and people can look at it and say, “Hey, this is something to keep an eye on.” Or is it an emergency? Should someone be paged, or is it even more than that?
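The slider in both directions can be sketched from the cohort curve. This is a hedged toy model, not mPulse's actual what-if widget: the conversion curve, traffic volume, and average order value are all invented numbers, and the real analysis is more sophisticated.

```python
# Conversion rate by load time (seconds), linearly interpolated between
# the invented points below.
CURVE = [(1.0, 0.060), (2.0, 0.050), (3.0, 0.038), (4.0, 0.028),
         (5.0, 0.020), (8.0, 0.010)]

def conv_rate(t):
    if t <= CURVE[0][0]:
        return CURVE[0][1]
    for (t0, c0), (t1, c1) in zip(CURVE, CURVE[1:]):
        if t <= t1:
            return c0 + (c1 - c0) * (t - t0) / (t1 - t0)
    return CURVE[-1][1]

def hourly_revenue(median_load_s, sessions_per_hour=10_000, order_value=80.0):
    """Expected revenue per hour at a given median load time."""
    return sessions_per_hour * conv_rate(median_load_s) * order_value

baseline = hourly_revenue(3.5)  # current median load time
faster = hourly_revenue(3.4)    # what if we sped up by 100 ms?
slower = hourly_revenue(4.0)    # what if Black Friday drags us to 4 s?

print(f"baseline ${baseline:,.0f}/h, +100ms faster ${faster:,.0f}/h, "
      f"4s slowdown costs ${baseline - slower:,.0f}/h")
```

Moving the slider one way prices the ROI of a speedup; moving it the other way prices the risk of a slowdown, which is what feeds the severity-threshold discussion that follows.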


  Do I need to get the war room going, like Tom from Uber was talking about? Obviously, they had a very severe outage, and they had a very big, robust response to it. I think a lot of the folks I talk to struggle with defining what those thresholds need to be, especially because it’s kind of a human problem. Right? There’s a negotiation that happens. Someone who’s more conservative says, “No, I want everybody in a war room if the site so much as hiccups,” and then the ops folks who actually have to do the work say, “No, I want a much bigger change in response time before I get out of bed at 3:00 in the morning.” What we’re proposing here, as a way to take the analytics and make them a little bit more actionable, is to use this type of revenue analysis to do a risk assessment. People usually do this what-if analysis to figure out how much more money they can make.


  Instead, use it to figure out how much money you can lose, and then change the question from “What are the response time thresholds that generate different severities?” to “How much revenue am I prepared to lose per hour?” You can negotiate this with your head of eCom or head of digital media, or whoever. Get the line of business involved in the conversation and say, “Okay, how much revenue are we ready to put at risk?” Then let that derive the response times that would produce those different revenue losses, and assign severities to those. It brings a little more science, and a little more relevance in a business setting, to what those response time thresholds ought to be, and then you can revisit them periodically as your users change and your site content changes. Once we’ve developed some situational awareness, the next thing we need to do is actually figure out what the heck we’ve got, right, in terms of front-end infrastructure. There are a couple of ways to do that.
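Inverting the what-if in code makes the negotiation concrete: instead of picking response-time thresholds by gut feel, start from an agreed revenue-at-risk figure and solve for the load time that loses it. Every number here is illustrative, and the linear decay model is a deliberate simplification.

```python
SESSIONS_PER_HOUR = 10_000
ORDER_VALUE = 80.0       # dollars, invented
BASELINE_LOAD = 3.5      # seconds, the current median
BASELINE_CONV = 0.033    # conversion rate at the baseline load time

def conv_rate(t):
    # Toy model: conversion decays ~1 percentage point per extra second
    # of waiting, with a floor so it never goes negative.
    return max(0.005, BASELINE_CONV - 0.010 * (t - BASELINE_LOAD))

def loss_per_hour(t):
    """Hourly revenue lost relative to the baseline load time."""
    baseline = SESSIONS_PER_HOUR * BASELINE_CONV * ORDER_VALUE
    return baseline - SESSIONS_PER_HOUR * conv_rate(t) * ORDER_VALUE

def threshold_for_loss(max_loss, step=0.01):
    """Smallest load time (to 10 ms) whose hourly loss reaches max_loss."""
    t = BASELINE_LOAD
    while loss_per_hour(t) < max_loss:
        t += step
    return round(t, 2)

# Negotiated with the line of business: info alert at $1k/h at risk,
# page the on-call at $5k/h.
print("info alert at", threshold_for_loss(1_000), "s")
print("page on-call at", threshold_for_loss(5_000), "s")
```

The output gives response-time thresholds that are defensible in a business conversation, and they can be recomputed whenever the conversion curve or traffic changes.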


  A lot of times, people don’t actually realize how many different components make up a site, because so many different people are responsible for building a large one. A good way to map things out without having to put forth a lot of effort is, for example, Ghostery has this thing called the TrackerMap, where they’ll just go to your site. Actually, they already have this data; they’ll just display it for you, showing everyone who is actually on your site, and we’re particularly interested in those components that have dials and knobs and things we can turn. Right? There are different categories of those that I think are common. One is tag management. I have kind of a love-hate relationship with tag management, because Ilya Grigorik from Google, a few years ago at Velocity, did this really cool talk where he took all the HTTP Archive data, loaded it into Google BigQuery, and started running some different queries. He found that, in general, there’s this very strong correlation between having tag management on your site and having crap loads of third parties and lots of load time problems, because people just get …


  It’s like a kid in a candy store, right? It’s super easy to turn all this stuff on, and it can produce problems. But the flip side of that coin, the thing that I really like about tag management, is that it gives you operational control over those third parties. If you load all that stuff into tag management, you now have kill switches on your site. The eCommerce companies that built really successful sites before any of these types of companies existed, like Walmart, for example, built it themselves. Which is fine. You can totally do that, and there’s a …


  It’s kind of cool to do it on the server side, so you don’t have the serial problem of the tag manager having to load before you load these other components, but the bottom line is that you have to get all of your tags under some type of control in order to be able to make the decision that says, “My analytics just told me that Disqus is having a bad day. Well, maybe it’s more important to me that I deliver my articles and the ads associated with them than that I deliver comments.” It’s nice to have the ability to turn that off, and putting all that stuff under tag management is a good way to do it. There’s really no excuse anymore. Google Tag Manager is free. I’m not sure what their traffic limits are, but it’s not hard to get into that. The other thing is the CDNs. None of the CDNs want to be viewed as a commodity anymore, so they’re offering all these front-end optimization features. FEO; people come down on different sides of FEO.


  There was a period of time where I think there was this notion that FEO, front-end optimization, was this magic wand that you could just bolt onto your website, and it would just make things faster. It doesn’t really work that way. It turns out that the best application of FEO requires actually understanding how it works. Akamai, for example, has this thing called Ion. It’s got something like thirty-seven or forty different things you can turn on that do neat things. You can turn on a setting that says, “I want all of my images that are below the fold to lazy load.” A lot of people build that themselves; there’s a way to just turn that on in FEO. If you have a base page that takes a really long time to download and creates a head-of-line blocking problem, you can turn on this thing that flushes the page early, so the browser can get started rendering stuff. But you have to actually turn these things on.


  In order to know that you need to turn those things on, it’s helpful to have analytics that actually say, “Hey, this page has a really long base page download time, and it’s associated with a step in your funnel that generates a lot of revenue, and it’s having problems.” The remediation could now be as simple as turning on a configuration, instead of having to go change your code and push a build and all that kind of stuff, but you have to know that those dials are there. That’s the point here: map out your front-end infrastructure. CDNs, and obviously Amazon Web Services, have these really complicated, for lack of a better word, dashboard consoles where there are lots of things you can turn. Figure out how all this stuff works before you start having problems in production. That’s the corollary. Then take your key pages. Now that you’ve mapped out your infrastructure, you know what your tools are, you have your analytics in one place, and you know which pages are most important to the user, operationalize those pages with things like tag managers, front-end optimizers, and CDNs.


  The point here is that I think there’s been this lore, where the cool thing now that everyone wants to talk about is immutable infrastructure, infrastructure as code, everything is a deploy. “I deploy a thousand times a second. How fast do you deploy?” That’s nice, it’s good, but it’s not the only solution to every problem. It becomes this hammer, and there’s this tendency to go whack every problem with the same hammer, when there are other tools you can bring to bear on the problem. Not every problem in production necessitates a code deployment of some type to fix. With those key pages, think about, “How do I design these pages to handle failure, and how can I adjust them without having to push code?” Accounting for failure means, again, having the ability to turn things off: identify which third parties are blocking rendering, for example, and see if you can defer them, then put them under tag management so that you can at least turn them off if there’s a problem.


  The other thing that I’ve seen people do that’s useful is to actually consider, in the design of the page, ways to modify at runtime how the page behaves. For example, say your default is to return twenty-five search results, and all of a sudden, under load, there’s some sort of problem. It turns out most users are kind of focused on the top ten anyway. Maybe you could change your search results on the fly from twenty-five to ten, but you have to build those kinds of configurations in, so that your operations folks, the folks who are close to the analytics, can actually do these things on the fly without having to get an engineer involved and necessitate a code push. Get third-party tags under control. I think we talked about this already, but a couple of things I find interesting about this: media gets hit really hard with this problem, because of how poorly ad providers behave. Not everybody knows that, but everybody in the media industry is freaking out about ad blockers these days.
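The runtime-knob idea can be sketched as page code reading a mutable config that ops can flip without a deploy. The flag names and the in-memory dict here are hypothetical; in practice the store might be Redis, Consul, or a tag manager setting.

```python
# Runtime configuration that ops can change without a code push.
RUNTIME_CONFIG = {
    "search.results_per_page": 25,   # default under normal load
    "comments.enabled": True,        # kill switch for a third-party widget
}

def search_results(all_hits, config=RUNTIME_CONFIG):
    """Return only as many hits as the current knob allows."""
    return all_hits[: config["search.results_per_page"]]

hits = [f"product-{i}" for i in range(100)]
print(len(search_results(hits)))  # 25 under normal load

# Ops sees backend latency climbing during a sale and flips the knobs on
# the fly; most users only look at the top ten results anyway.
RUNTIME_CONFIG["search.results_per_page"] = 10
RUNTIME_CONFIG["comments.enabled"] = False
print(len(search_results(hits)))  # 10 in degraded mode
```

The design choice is that the page always consults the config at request time, so the change takes effect immediately and reverting it is equally cheap.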


  James Turnbull, if anyone was there, gave a really great talk about that at Velocity Santa Clara, and what he’s seeing at Opera. I think the stat that came out of this report is pretty staggering: when you think about how many pixels ads take up on a page, it’s usually only about nine percent, but they use up half the bandwidth, and they account for half the load time. One of James’ points in his keynote at Velocity this year in Santa Clara was about the impact that has on mobile devices, particularly for people in emerging economies that you might actually be interested in moving into, who have to pay quite a lot of money for bandwidth. You’re chewing up their bill, but yet you have to advertise, you have to do something, even if a lot of ad providers these days are essentially bad actors with respect to performance and bandwidth.


  If you can’t, again, wave a magic wand and fix the ad delivery industry, then at least you can put those things under your control as much as you can, so maybe you can selectively turn certain ad units on and off. If you can do that, make sure you know how. Make sure you know where those knobs and dials are. If you can’t, then figure out a way to do that so that you can control those pieces. We actually devote a lot of our product development these days to trying to analyze this third-party problem. What we’ll usually do is take someone through an analysis like this, a tree map that sweeps up all of the user experiences and then organizes them by host name. You can see that the most popular host name is the first party, but after that, we pretty quickly get into third parties: Coremetrics, which is marketing analytics, Monetate, and there’s a Bing widget in there, apparently. Anyway, look at all of this and figure out where the big red boxes are, right?


  Those are the third parties that are in lots and lots of user experiences and consistently slowing things down. Then from there, what we usually do is take that particular host that we’re interested in and put all of the user experiences into a distribution, so that we can see if it’s being skewed by an outlier, or, as in this case, there’s this weird bimodal (or maybe even multimodal) thing going on, with this separate population over here. It helps you drill down and figure out where the problem is. Maybe it’s focused, maybe it’s something that happens across the board, maybe it’s something that affects people on mobile devices more than on desktop. Then finally, once we understand the distribution and can filter down to the population that we’re interested in, we can even start to look at individual resources. That’s what this view is: trying to find the URL that’s causing the problem.
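The core of that analysis, sweep up per-resource timings from many user experiences, group them by host, and look at each host’s distribution rather than a single waterfall, can be sketched in a few lines. The data and the ranking heuristic here are illustrative, not mPulse’s actual algorithm:

```python
# Sketch: group resource timings from many beacons by host name, then rank
# hosts by how many experiences they appear in and how slow they typically
# are -- roughly the "big red boxes" in the tree map described above.
from collections import defaultdict
from statistics import median

# Each beacon row: (hostname, resource duration in ms). Illustrative data.
beacons = [
    ("www.example.com", 120), ("www.example.com", 140),
    ("tags.vendor-a.com", 900), ("tags.vendor-a.com", 950),
    ("tags.vendor-a.com", 80),   # bimodal: sometimes fast, often slow
]

def by_host(beacon_rows):
    """Return (host, count, median_ms) tuples, worst offenders first."""
    groups = defaultdict(list)
    for host, ms in beacon_rows:
        groups[host].append(ms)
    return sorted(
        ((host, len(ms), median(ms)) for host, ms in groups.items()),
        key=lambda row: row[1] * row[2],  # popularity x typical slowness
        reverse=True,
    )
```

Keeping the raw per-host samples around (not just the summary) is what lets you then plot the distribution and spot the bimodal populations mentioned above.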


  This is a little bit different type of analysis than the time-series-based analysis that those of us in the monitoring industry have been forcing on people for years, where you click on the peak on the time series chart, and you click on this, and you click on this, and then you’ve got this waterfall with the long stripe that says, “This thing took a long time.” There’s some utility in that, but when it’s the only tool you have, the problem is that all you’ve done is solve one user experience. What we want to do is orient our analytics around sweeping up lots and lots of user experiences, so each one of these is a distribution. We’re not trying to find the thing that caused a problem for someone once. We’re trying to find the things that generally cause problems. Then, ideally, you could just take the offender off your site, but to be honest, most of the people that I work with these days have already thought about a lot of that low-hanging fruit, and they’ve removed it.


  Now what they’re dealing with are things that they can’t just remove wholesale from their website, so they’re left with a problem they have to manage. What we want to help people understand is when and how and where the performance degrades, and then again give you some kind of actionability, so that you can take something off temporarily and then put it back on. Ultimately, there’s only so much that you can do in engineering and operations to deliver good user experiences. It’s not sufficient to silo that responsibility in those groups. The folks who actually design the content and the collateral that goes on the page have a role in this too, and one of the things I would suggest you work with your marketing and advertising people, anyone in your creative departments, on is this: for these campaigns, consider having a backup plan if the first one doesn’t work out. A lot of times there’s this debate, the classic debate between the designer who wants the beautiful landing page that’s thirty meg or whatever.


  I’m exaggerating to make a point, but then there’s the engineer who wants it to be all text so that it loads instantly, and then there’s this kind of horse trading that happens. Usually, it’s the people on the creative side who wield the most power, so it skews toward heavier pages. Instead of arriving at just the one page that you’re going to do, something to consider is to say, “Okay, fine. Have it your way. Do the bigger page, but just humor me. Let’s make a backup one just in case.” Because the thing that we found out, the more that our company has started working with people in marketing departments versus our traditional constituencies in ops and engineering, is how much money people drop on these campaigns. It’s staggering, the amount of money that’s at risk, and they do it every day: launch a new campaign, and a new campaign.


  Think of it as a hedge against the risk of delivering a poorly performing page that costs you lots of money. Think about how much money gets spent on traffic acquisition, the emails that get blasted out, the click advertising, everything to get traffic onto the site, only to drive people away because the page doesn’t perform. It seems like an easy insurance policy to take out: just have a backup in place so that if something happens, you can swap it in. In order to know that you need to swap it in, you need some sort of real-time view that tracks things by campaign to see how it’s doing. That’s another part of our mPulse product. If you saw the keynotes this morning, Ken, our chairman, showed some of this stuff, but the idea is that you’ve got this multi-panel view. You can track the overall revenue for the day, you can track the speed of the experiences people are getting, and then you can break things out by campaign. We’ve seen a couple things happen here.


  One, the campaign starts performing poorly, and there’s this scramble to figure out how to fix it. What I’m suggesting here today is that, based on those experiences, maybe just have a backup plan, and then have a way to slot that one in. Where we have seen people slide in other campaigns, it has usually been because the revenue starts flatlining because the content’s stale, and so they’ll move to the next one. Whether it’s performance-based or content-based, the idea is that you have to have some visibility into it, and you also have to have the ability to swap content in and out in real time. Number eight: this one’s simple, not a whole lot of complexity here. I just wanted to make a point about mobile devices and how important they are. A couple of things have happened recently that I thought were interesting. I saw this report, actually something that Fastly put out, quoting Adobe, talking about how much traffic came from mobile devices on Cyber Monday last year.


  It turns out that forty-four percent of visits to websites on that day, nearly half, and that’s almost a year ago, came from mobile, so clearly we’re beyond the tipping point now. People have been saying mobile’s important for years. It’s arrived. The other thing that’s not in there is the revenue those visits accounted for, which I want to say was something like twenty-six percent, and that tells you one of two things. Either omnichannel’s really important, and you have to deliver good experiences across devices, because people are starting on mobile and then completing their checkout on desktop, or, worse than that, it just means that your performance on mobile is so crappy that even though half the people go there, they don’t transact, because they’re driven away. They probably opened up the Amazon app, because that’s really fast. We see this pattern a lot, where desktop …


  This is from a real customer. It reflects almost exactly what that study was saying, because it is nearly half, nearly but not quite half, on mobile and tablet, and they’re pretty doggone slow compared to what we see from desktop devices. (Somebody’s playing marbles up there.) Anyway, what’s particularly bad, and we see this over and over again, is tablet performance. We’re not quite sure why. It seems like one of two things. Either it’s because tablets are really, really old: people upgrade their phones every year, but they hang on to their tablets as long as they can, because who wants to be on three upgrade cycles, three hamster wheels, for three devices all the time? So people usually let their tablets lag behind their laptops and their phones. The other reason is that whether people are doing an m-dot site or responsive design or whatever, they’re only really doing two layouts. It’s a binary approach: I either have my phone layout, or I have my desktop layout.


  The phone layout looks comical on a tablet, because the buttons are huge, so they serve the desktop layout, but the desktop layout loads so much JavaScript and everything into the device that it overruns the processor. That’s something to pay very close attention to as you’re drilling down through your analytics: map that out, and because mobile traffic is growing so much relative to the overall pie of traffic that you’re getting, just like we were talking about operationalizing key pages, maybe operationalizing mobile is another focus that you should put in place. Then finally is knowing what normal looks like. Five or ten years ago, when I was at Gomez doing the synthetic monitoring thing, our standard narrative when we would go in to a new customer was to say, “You need to get this monitoring in place early.”


  It’s a little self-serving, but there’s a reason why: “You need to get this stuff in early, because you need to generate a baseline, so that you can see what your patterns look like over the course of thirty days, or a week, or within the day,” because it turns out the data has a seasonality over time, meaning that the values change, and you can’t just rely on a fixed number. This is what plays hell with static thresholds. This is why people hate static thresholds. There are a couple of solutions to that problem. One, and this is a true story (it didn’t look like this; I got this picture off Flickr or something), one of our customers was telling me that because of the way the data changes over time, simple threshold-based alerting didn’t work. There were either too many false positives or too many false negatives. His solution to that problem was to have IT or the help desk provision him another screen that he would put on his desk, and that was the analytics screen.


  This is a major retailer, by the way. On the analytics screen on his desk, he would keep real-time response time analytics for key pages, and while he worked throughout the course of the day, he’d have this up. Then if something spiked, he’d look over at it, and using what he knew about the site, he would contextualize it. Right? He’d say, “Okay, that’s normal. That’s not a big deal,” or, “Wow, our site, the one everyone’s heard of, the one that does a staggering amount of revenue, is having a problem. I’d better call someone about that.” He was, singlehandedly, the entire detection and alert system. They had issues that cost them revenue because he went on vacation. You could take that and put it in a NOC and create multiples of those people and all that kind of stuff, but we feel like a better approach is to try to use data and algorithms. One of the things that we just came out with recently in our product is the ability to do what we call tolerance bands. Right?


  You can use standard deviation, Holt-Winters; there are a lot of different ways to do this. The idea is that you build up a statistical model using a set of data over the last thirty or sixty days, or however much you can get, that suggests what the trading range throughout the day for each one of these key metrics is, and then alert based off of that. What you essentially have are dynamic response time thresholds. Whether you do this the old-fashioned, manual way that we used to tell people, or you do it the more modern way, it’s really important to understand what normal means before you have some sort of big event where everything just has to work. Those are the nine. Unfortunately, a lot of this is best practices rather than just pointing out pieces of software that solve the problem, because among the companies around the performance space, if you go around the expo floor, you’ll find lots of people doing interesting, innovative things, but unfortunately no one vendor has the solution to the entire performance problem.
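To make the tolerance-band idea concrete, here is a minimal sketch using the simplest of the methods mentioned, mean plus or minus a few standard deviations per time-of-day bucket. Holt-Winters would additionally model trend and seasonality; this is just the shape of the idea, not SOASTA’s actual implementation:

```python
# Sketch of "tolerance bands": learn a per-hour trading range for a metric
# from historical samples, then flag new values that fall outside it.
# This uses mean +/- k standard deviations; fancier methods (Holt-Winters)
# follow the same pattern of model -> band -> alert.
from statistics import mean, stdev

def build_bands(history, k=3.0):
    """history: {hour_of_day: [past load times in ms]} -> {hour: (lo, hi)}."""
    bands = {}
    for hour, samples in history.items():
        m, s = mean(samples), stdev(samples)
        bands[hour] = (m - k * s, m + k * s)
    return bands

def is_anomaly(hour, value_ms, bands):
    """True when the observed value is outside that hour's tolerance band."""
    lo, hi = bands[hour]
    return not (lo <= value_ms <= hi)
```

The band is effectively a dynamic threshold: the same 2-second load time that is normal at 9 a.m. can be an alert at 3 a.m., because each hour carries its own learned range.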


  It’s up to you to put all of these things together. That’s really what drives us. These are three of our actual customers. That’s Tom; he works at Target. Gopal’s at Nordstrom, and Paul works on MSN at Microsoft. They have this problem every day, where they’re stuck in the middle between these two massive investments they’ve made in analytics and infrastructure, which they have to wire together, oftentimes manually. When we think about our product roadmap and where we’re trying to go, a big part of our focus is: how can we automatically tell you that you need to change this configuration in your CDN, or turn off this tag in the tag manager? Or better yet, maybe at some point in the future we can enable more of an SRE workflow, where you define thresholds that say, “You know what? Comments are important, but they’re not important beyond ten seconds of waiting, so if that happens, just automatically turn them off and then notify me.” I think there’s a lot of opportunity to do that. We hope to lead that charge.
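That threshold-driven workflow, per-feature budgets, with automatic disable-and-notify when a budget is blown, can be sketched very simply. All of the names and numbers here are hypothetical, just to show the shape of the loop:

```python
# Hypothetical sketch of the SRE-style workflow described above: each
# optional feature gets a latency budget; a periodic check disables any
# feature over budget and notifies a human, instead of paging first.
BUDGETS_MS = {"comments": 10_000, "recommendations": 5_000}

def evaluate(measurements_ms, budgets=BUDGETS_MS):
    """measurements_ms: {feature: observed p75 latency in ms} -> actions."""
    actions = []
    for feature, p75 in measurements_ms.items():
        if p75 > budgets.get(feature, float("inf")):
            actions.append(("disable", feature))   # flip the kill switch
            actions.append(("notify", feature))    # then tell a human
    return actions
```

The interesting design choice is the ordering: the system acts first (turn the slow feature off) and informs second, which only works if the disable path is a safe, reversible configuration change rather than a deploy.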


  I’m sure a lot of other companies are looking at that too, and if any of you have ideas or war stories or whatever to share in that area, I’d love to hear them. Thanks, everybody.