By Ian Withrow
This is a multipart blog series in which I’m going to share some musings on the business value of web performance monitoring. Since most people in business have at least a passing affinity for money, I think this is a relevant topic for examination. While the business impact of web performance is certainly not a new topic, I haven’t yet found any public material that tries to do an analysis for Web Performance Monitoring (WPM).
In this post I plan to talk about the impact of WPM on operational metrics, such as Mean Time to Identify (MTTI), which relate to failures or problems on a web property. The argument goes that these metrics, commonly tracked by IT and operations, are good proxies for business costs and that WPM will improve them. As a consequence, WPM will improve the business bottom line. I’ll also address the more complicated topic of how much value this might be. Let me caveat outright that this is an illustrative example, and a generalized one at that. However, it can be useful to you as a framework for completing this analysis for your own business.
First, to be thorough we have to ask: are operational metrics, like MTTI, actually meaningful? What these metrics try to do is track how long it takes an operational group to discover that their web property is underperforming and, in the case of Mean Time to Repair, return it to the norm. There is a body of evidence that for both large and small companies, the faster a site performs the better KPIs like time on site, customer satisfaction, conversion rate, and shopping cart size are. In fact it seems that the fundamental desire for more speed is limited only by our biological ability to perceive it, about 100-200ms depending on whether you play video games. So in practice operational metrics should measure how long a site or portion of it is slower than a historical baseline, which is really just a period of time during which the business potential of the site is lower. So it seems reasonable to conclude that operational metrics do in fact have a link to important business metrics, like profitability.
Notice here that I’m skirting the topic of when a site is completely down. This is deliberate because while WPM will tell you this, a good operations department has plenty of other ways that they’ll know this. Since my concern here is the incremental business value of WPM for operational metrics I’m focusing on the case where a site is impaired or unavailable externally to at least some users but this isn’t obvious internally at the business.
Second, we need to understand how WPM improves operational metrics. To do this I must address two issues: what a good MTTI is, and what it would be in a world without WPM. The difference between the two is the ‘per incident’ value of WPM, measured in time.
You might think that the ideal MTTI is zero, and maybe at a theoretical level it is, but in practice this could be disastrous. The internet has plenty of volatility, and generating an alarm and operational attention every time you get a measurement that is slower than the norm would require substantial headcount for the sole purpose of wild goose chases. In practice it’s better to look at average performance over a short period of time, say 10 minutes, and then if the alarm threshold is still violated you can be confident there is a problem worth pursuing.
So what if you didn’t have a WPM solution? The trite answer is that you may never know about some problems. However, let’s say you have an active fan base or partners who will complain. At first I thought that maybe they would call and you would know in an hour or two, but upon reflection I think this is too optimistic. Sure, it could happen, but we are talking about WEB business here; people email or post complaints. In fact businesses encourage this because phone support is expensive, and some businesses may not even have a customer support number. A few quick Google searches show that companies that publish aggressive policies about turnaround time tend to target 12 to 24 hours for email. Factors to consider for your business are when someone actually looks at the email (versus responds) and how long it would take them to figure out what to do with a complaint of this nature (they may not be trained). Bottom line: in the best case we are looking at anywhere from half a day to days to identify that there is a problem. At this point we’ve hopefully dwarfed how long it takes to fix the problem, although even this may not be a fair assumption.
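The 10-minute averaging rule described above can be sketched in a few lines of Python. This is only an illustration, not how any particular WPM product works: the function name `windowed_alarms`, the one-measurement-per-minute cadence, and the 3-second threshold are all assumptions made for the example.

```python
from collections import deque


def windowed_alarms(response_times_s, threshold_s, window_size=10):
    """Return one True/False per measurement.

    True means the mean of the last `window_size` measurements exceeds
    `threshold_s`. Assumes one measurement per minute, so the default
    window of 10 corresponds to roughly 10 minutes (an assumption).
    """
    window = deque(maxlen=window_size)  # old measurements fall off automatically
    alarms = []
    for response_time in response_times_s:
        window.append(response_time)
        window_full = len(window) == window_size
        mean = sum(window) / len(window)
        # Stay quiet until a full window is available, so a single slow
        # measurement right after startup cannot trigger an alarm.
        alarms.append(window_full and mean > threshold_s)
    return alarms


# A single slow measurement (internet volatility) does not raise an alarm...
one_spike = [1.0] * 9 + [9.0] + [1.0] * 5
print(any(windowed_alarms(one_spike, threshold_s=3.0)))

# ...but a sustained slowdown does, a few minutes after it starts.
sustained = [1.0] * 10 + [5.0] * 10
print(windowed_alarms(sustained, threshold_s=3.0).index(True))
```

The trade-off is visible in the second case: the alarm fires some minutes after the slowdown begins rather than instantly, which is the price paid for not chasing every blip.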
Finally, it certainly looks like time justifies a WPM solution, but what about the dollar bills? I’m going to do a quick and dirty back-of-the-envelope calculation to highlight how I’d approach the problem. Again, consider this a rough estimate of the magnitude of the opportunity and a starting place for recalculating with your own numbers. First let me lay out a few assumptions. This summer’s Velocity conference provided some handy data points for me to use here. In the Metrics 101 workshop we heard that an hour of downtime can cost $50K for a serious online player (see slide 11).
However, as I said before I’m interested in when performance is impaired. Here is some data pulled from a variety of sources:
- Bing: 2 seconds = 4.3% revenue change
- Shopzilla: 5-8 seconds = 5%-12% revenue change
- Mozilla: 2.2 seconds = 15.4% change in conversions
- Strangeloop (Small customers): 2 seconds is 13% to 25% change in conversion rate, if you are fast already
Depending on who you are, a 2 second change in performance seems like it can have anywhere from a 4% to 25% hit on the potential of your site, whether you care about conversion rate or search revenue. Let’s say 15% for our analysis. So if the hourly value of our hypothetical site is roughly $50K, then 15% of that is $7.5K, which is what we’ll lose per hour when performance slows down significantly. We also found in the preceding section that without WPM it will take half a day or more to figure out that we have a problem; let’s say 18 hours, the midpoint of the 12 to 24 hour email turnaround. Crunching the numbers, we see that the value of performance monitoring is roughly $135K (18 hours * $7.5K) less about $1.25K (1/6 hour * $7.5K), the loss during the 10 minutes WPM itself needs to confirm a problem. Throw in some overhead and WPM saves us about $100K every time we have a legitimate performance problem, although if your SLA for email response or your MTTI without WPM is longer, then this number goes up really fast.
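To rerun this estimate with your own numbers, the arithmetic can be wrapped in a small Python function. This is a rough sketch: the function and parameter names are made up for illustration, the defaults are the hypothetical figures used in this post, and it assumes the 10-minute detection window with WPM is charged at the same hourly loss rate as the rest of the incident.

```python
def wpm_value_per_incident(
    hourly_site_value=50_000.0,   # hypothetical value of one hour of the site, in dollars
    impairment_fraction=0.15,     # assumed share of that value lost while the site is slow
    mtti_without_wpm_h=18.0,      # assumed hours to notice the problem without WPM
    mtti_with_wpm_h=1.0 / 6.0,    # assumed 10-minute averaging window with WPM
):
    """Estimate the dollars saved per incident by detecting a slowdown sooner."""
    hourly_loss = hourly_site_value * impairment_fraction
    loss_without_wpm = hourly_loss * mtti_without_wpm_h
    loss_with_wpm = hourly_loss * mtti_with_wpm_h
    return loss_without_wpm - loss_with_wpm


# Using the defaults above: 18 hours versus 10 minutes at $7.5K per hour.
print(f"${wpm_value_per_incident():,.0f} saved per incident before overhead")

# A slower email SLA (say 48 hours to notice) raises the figure quickly.
print(f"${wpm_value_per_incident(mtti_without_wpm_h=48.0):,.0f} with a 48-hour MTTI")
```

With the default inputs this comes to a little under $135K per incident before any overhead is subtracted.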
In conclusion, a quick calculation shows that WPM can be quite compelling with respect to MTTI. If you have one incident per month or more, you can quickly save millions of dollars each year. Some other things you may wish to consider in your own analysis are: what the hourly value of your business is, how optimized your site is to begin with (paradoxically, the more optimized you are the more you have to lose), if and when you would know about a problem without WPM, how many real incidents you have a year, and what percent of your users are impacted in a typical incident. Also, I’ve only looked at MTTI here; Mean Time to Repair (MTTR) can also be meaningfully improved by WPM.
Have a quibble or a WPM justification war story to share? Leave a comment or send me a note at Ian.Withrow@keynote.com
Photo by jeffmcneil