Web Performance Watch

How Do I Compare Thee? Linode vs Bluehost Web Host Performance Shootout

If you are evaluating or switching hosting companies for your Web site, don't just ask for an SLA guarantee, but do a site performance shootout before you cut the check. Recently I wrote a post on Beta Program, a site that I created to highlight how businesses can use Web technologies to run their operations better, for everything from using Web-based accounting software to building a better, faster website. The article, Mirror Mirror On The Wall, Who’s The Fastest Web Host Of Them All, demonstrated the performance gains I experienced when moving a website from Web hosting company Bluehost to Linode. For that article, I had focused on the overall user experience, concluding that the same site performance fluctuated between 1-2 seconds for Linode, compared to between 2-4 seconds for Bluehost.

A commenter asked, "Vik– Just curious, when you ran your monitoring experiment comparing Linode to Bluehost, did you notice any trends in the performance details? Like significant differences in DNS lookup time, versus time to first byte, versus content components. Just wondering if the data reveals any specific “soft spots” with Bluehost?" I thought it would be useful to share my findings from this experiment on the Keynote Web performance blog.

Methodology. I took a website that was built on the LAMP stack - Linux, Apache, MySQL, and PHP - and duplicated it on both Bluehost and Linode Web hosts. In both cases, I used the lowest plan that was available, and with Linode this gave me a Virtual Private Server (VPS). I made very minor modifications to the site to make it work right on the Linode environment, along the lines of changing the value of some variables that referred to the root URL. Then, I used Keynote's IE browser monitoring agent to run measurements every 5 minutes from the US-8, a group of 8 cities in the US. You could use other products as well, including the free WebPageTest service, which I used to create the video of the two sites. These were all high-speed, high-bandwidth connections, and I was able to ensure that I had a clean lab of performance monitoring agents to conduct this test. My assertions here are made after observing over 2000 datapoints per day on each website, from Mar 6 until today, for about 3 weeks. For all datapoints, I used the arithmetic mean (though I could have used the 95th percentile or median or geometric mean if I so chose, and if I remembered high school math better).
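To show why the choice of summary statistic can matter, here is a minimal Python sketch. The sample load times are invented for illustration and are not taken from the Keynote measurements; the nearest-rank percentile is just one common convention.

```python
import statistics


def summarize(load_times_s):
    """Return several summary statistics for page load times in seconds."""
    ordered = sorted(load_times_s)
    n = len(ordered)
    # Nearest-rank 95th percentile: ceil(0.95 * n), computed with integers.
    rank = (95 * n + 99) // 100
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "geometric_mean": statistics.geometric_mean(ordered),
        "p95": ordered[rank - 1],
    }


# Hypothetical samples: mostly around 2 seconds, with one slow outlier.
samples = [1.8, 1.9, 2.0, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 9.0]
stats = summarize(samples)
for name, value in stats.items():
    print(f"{name}: {value:.2f} s")
```

With data like this, the single slow visit pulls the arithmetic mean well above the median, which is worth keeping in mind when reading averages.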

Visually Comparing Site Speed. Using WebPageTest, I first ran a site speed comparison. This measurement was taken from a server in Dulles, VA, and the video shows how long it takes for the two sites to load in the same browser. It's a great first step to help you understand the site visitor's experience. If you don't see the widget below, watch it on YouTube.

 

 

Measuring the User Experience Time. UX time is the time elapsed from when the browser started navigating to the page until the browser finished loading the page contents. This includes DNS lookup time (the time it takes for the browser to translate the URL you typed in to a host address), time to process all JavaScript on the page, to load the base HTML page, and all objects on the page such as images and JavaScript files. Here is how the two hosts stacked up over 3 weeks:

Indian_bento_site_speed

The site performance on Bluehost fluctuated between 2-3 seconds, and on Linode between 1.5-2 seconds. To answer the question on whether there were "soft spots" in Bluehost, I calculated the arithmetic mean on each of these performance metrics as well:

Indian_bento_metrics

As you can tell, the big gains are in the network and server infrastructure. For example, it took Bluehost 448ms, almost half a second, to return the first byte of the Web page, whereas for Linode it was 39ms. If you look at all the content downloaded, there isn't much of a difference between the hosting companies - Linode is only 4% faster (though, at almost 1.5s, there is room for improvement with speed optimization on the website content itself). 
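As a quick worked example of the time-to-first-byte gap, the short Python sketch below uses only the two averages quoted above (448ms and 39ms); the function name is made up for illustration.

```python
def compare_metric(slow_ms, fast_ms):
    """Return the absolute saving in ms and the speedup ratio between two averages."""
    return slow_ms - fast_ms, slow_ms / fast_ms


# Average time to first byte from the article: Bluehost 448 ms, Linode 39 ms.
saved_ms, ratio = compare_metric(448, 39)
print(f"Time to first byte: {saved_ms} ms saved, about {ratio:.1f}x faster")
# prints "Time to first byte: 409 ms saved, about 11.5x faster"
```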

Linode recently wrote about its network upgrades in an initiative called Linode Nextgen. It appears that their work is paying off, and if you host a website, you and your users would be well served by Linode.

Posted by Vik Chaudhary on March 29, 2013 at 02:35 PM in Site Load Time, Web Page Monitoring, Web Performance, Web Performance Testing, Web/Tech, Website Monitoring, Website Monitoring Software, Website Performance Monitoring | Permalink | Comments (2) | TrackBack (0)


Olympic Drag?

122/366 - London 2012 tickets

Today begins the 2012 Olympic games, and after years of anticipation the world is eager to share the experience as it unfolds in London. This year’s event will be the most watched, followed and liked ever. Which raises the question:

Will the Olympics be a massive drag for the rest of us online?

This week’s Benchmark magazine includes an interview with Bhavesh Upadhyaya—head of operations for iStreamPlanet, which will be delivering live video streaming during the games. He says that online video utilization is doubling every year, and the demand for the Olympics will be huge. In fact, he suggests that the last mile may be at risk of saturation. “You're going to get to the point where, potentially for some major cities, you might be saturating an ISP with all the video that's being delivered.”

FCC-peak-speed-attainment-by-ISP

Whoa! We knew that video streaming consumes a disproportionate amount of Internet bandwidth. Netflix consumes 33% of all U.S. traffic during prime time. And the FCC reported this month in their study of U.S. broadband performance that sustained download speeds during peak hours were pretty fast. But most of that traffic is more predictable consumption of on-demand (recorded) video.

The Olympics is an entirely different enchilada.

Expect Delays

“This is the first time that anyone has actually attempted to execute at this scale,” says Donald Foss, Keynote’s director of global testing services. “It’s massive on a scale that, frankly, is only seen during the Olympics.”

So potentially some of your customers might feel a dent in performance if their neighbors are maxing out the ISP’s pipes streaming Usain Bolt’s historic attempt to sweep the big three sprints.

What can you do? Minimize the impact of latency and congestion by keeping your pages lean—especially over the next few weeks. Monitor your CDN provider’s performance closely. Some may be working with streaming content providers for the games. And exploit caching options as much as possible.

Posted by Aaron Rudger on July 27, 2012 at 05:40 AM in Current Affairs, Web/Tech | Permalink | Comments (0) | TrackBack (0)


Taking Advantage of the Cloud

As we’ve mentioned here before, adopting cloud computing strategies can generate transformative advantages for IT organizations, but not without important considerations. Reducing cost and improving user experience can be achieved by moving applications and infrastructure to the cloud. So how do CIOs get started, and more importantly, enforce and improve the quality of service they deliver to the business in a cloud paradigm? Vik Chaudhary recently spoke with the editors at CIO Insight on how companies can take advantage of the cloud with three straightforward recommendations.

 

Posted by Aaron Rudger on October 05, 2011 at 07:40 AM in Application Performance Testing, Web/Tech, Website Performance Monitoring | Permalink | Comments (0) | TrackBack (0)


Deconstructing the Target.com “Fail Doggie”: A Keynote Perspective

Have you met the Target.com “fail doggie?”

He’s cute, but if you are a Target customer, you don’t really want to encounter him in a browser.  I met the “fail doggie” while using the MyKeynote portal to research the nature and extent of the major outage that occurred on Tuesday, September 13th 2011.  That was the day that Target began allowing online purchases of “Missoni by Target” items.

Fail-doggie-intro

A lot at stake

According to one source, the Target site re-build from scratch, launched just weeks before this incident, drew on the talents of over 20 vendors, including many of the biggest names in the e-commerce technology space.

Imagine spending millions of dollars and two years of time to “create a more user-friendly, reliable experience” and then have this happen.  Not fun.  When a major event like this outage happens, it can be difficult to get complete details from any one participant.

Passions are high when a crisis like this occurs.  Where can you go to get an objective vantage point from which to make accurate assessments and analysis?  At Keynote Systems, we’ve long been looked to as a neutral third-party with accurate and actionable Internet and mobile performance data, and that day proved to be no exception.

In this blog post, I’ll explain how I used the tools that every Keynote customer has in MyKeynote along with measurements being run for two of our public web performance indices to determine what users were seeing that day and the following morning and to determine just how extensive the outage was.

Target.com appears in a number of our public index measurements, so I had several to pick from – note that we never publish insights based on a customer’s private data, but only based on publicly available data that we collected without payment by any company. Two were particularly useful for figuring out what was going on and capturing screenshots as the day went on. The first measurement visited their home page only, while the second arrived at the home page and then performed a multiple step transaction, just as a customer would when shopping, placing items in a cart and checking out.

First a brief bit of background: both scripts were written in the Keynote Internet Testing Environment, known as KITE and both measurements were being run with our real browser product, Keynote Transaction Perspective. I point that out so you’ll know that none of what you will see here was retrieved by a “bot” or other emulation system; we were getting the experience of a user launching an actual Internet Explorer browser to go visit Target.com.

How things went down that day

As word of the outage quickly spread, we started taking calls from various news media companies asking if we had data.  We took a look at the home page monitoring scatter plot below and saw what looked like a brief spike in response times followed by a restoration of reasonable response times shortly thereafter. 

02-firstScatterPlot


At first glance, it was tempting to say that the outage had been brief and thus was relatively uneventful.  But a deeper look at the chart revealed telltale signs that something was not right.  Notice the distribution of the dots before and after the spike?  See how they are randomly distributed somewhere between two and six seconds before the spike (time on the network is on the left scale) but then they are all tightly packed down below the one-second mark afterward?

Each of those dots represents a full visit with a real browser, so I could click down on any of them to see a listing of what was on each of those pages and how long it took to get to the browser.  I did that eventually, but much like an operations team technician would be at such a moment, I was in triage mode at this point.  The first thing I wanted to do was to look for clues as to why those dots were organized as they were.

First, I hovered my mouse over a datapoint from before the spike in performance.   This would be my “normal” baseline to compare to.  Notice that the page contained 147 elements (separate downloaded objects) and a total size of about 2 MB.  The time on the network to download the page and all elements was 5.156 seconds. 

03-baselineScatterPlot

Next, I took a look at the two red triangles, which represent pages where we know there was an error of some sort or another:

04-SpikeScatterPlot

In this case, the element count and page size were lower than the baseline but the time to download was skyrocketing to a number eight times higher. The lower object count and page size were due to timeouts being hit… we couldn’t get all of the objects into the browser before time was up, so the agent running the browser quit trying and reported the error.

What was really interesting was the object count from the very next green dot after the red triangles.

05-1ElementScatterPlot

That data point showed only one page element and a very small download size of 640 bytes. 

One element?  I knew that if we had downloaded only one element, that it must be the base page of html, but a page with only 640 bytes couldn’t possibly have much to say.  That was my first clue that visitors were probably getting raw error messages.

I quickly scanned over the remaining “spiky” datapoints after the incident began and found more of the same: just one page element and a size of 639 or 640 bytes.   Here’s the last point caught during that initial spike:

06-639byteScatterPlot

So that was all fairly consistent with a site failure; something went very wrong around 8:00am EDT and the server took a long time to send out a very small page that was probably just an error message. 

What about all of those green dots hugging the bottom line after the spike had subsided?   I hovered over a few and got the same thing each time: 5 page elements and a page size of 31550 bytes. 

07-5ElementScatter

This wasn’t some random subset of the real page and no error was being recorded, so clearly these speedy responses were something altogether different. 

Now it was finally time to start drilling in for some details.  I clicked one of those data points and made my way to the page detail waterfall graph:

08-PageDetailWaterfal
 


Welcome to our waiting room

I hovered over each of the bars and noticed that each object came from a folder called “spawaitingroom.” The image files consisted of a red stripe, a Target logo and a photo of the Target dog posing next to a tool box.  I had met the “fail doggie” for the first time:

  09-WaterfallPopupDoggie


So let’s recap what I had learned so far:

  1. Initial data points have no page objects… just the base page, and that base page was TINY (640 bytes) in comparison to the pages that preceded it, but the amount of time the server took to return that page was HUGE.
  2. The datapoints following the big spike downloaded much faster and had larger base pages but only had five page elements instead of the 140+ found in the normal pages.
  3. Drilling in on the five-element page datapoints, I found that all five of the objects came from a folder called “spawaitingroom” – the “fail doggie” was part of a “waiting room” feature that was being served in lieu of the real home page. 
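The triage reasoning in the recap can be sketched as a small classifier. This is an illustrative Python sketch only: the `Datapoint` fields and the thresholds are assumptions chosen to match the numbers observed above (147 elements for a normal page, 1 element and about 640 bytes for the raw error, 5 elements and 31550 bytes for the waiting room), not anything built into MyKeynote.

```python
from dataclasses import dataclass


@dataclass
class Datapoint:
    elements: int     # number of objects downloaded for the page
    size_bytes: int   # total bytes downloaded
    is_error: bool    # True if the agent recorded an error (a red triangle)


def classify(dp, tiny_page_bytes=1024, normal_min_elements=100):
    """Label a datapoint using assumed thresholds for illustration."""
    if dp.is_error:
        return "error"
    if dp.elements == 1 and dp.size_bytes < tiny_page_bytes:
        return "tiny base page only"
    if dp.elements < normal_min_elements:
        return "substitute page"
    return "normal page"


baseline = Datapoint(elements=147, size_bytes=2_000_000, is_error=False)
raw_error = Datapoint(elements=1, size_bytes=640, is_error=False)
waiting_room = Datapoint(elements=5, size_bytes=31550, is_error=False)

for dp in (baseline, raw_error, waiting_room):
    print(dp.elements, dp.size_bytes, "->", classify(dp))
```

Note that neither the raw error page nor the waiting room page is flagged as an error here, which mirrors the difficulty described in the next section.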

Getting the whole picture: what were people actually seeing?

I was making good progress, but I still had a lot of questions to answer.  I could guess that a Target dog next to a toolbox was probably some variation of an “under construction” page, but I didn’t have the text of the page yet.  I really wanted to know what those pages looked like but all I had were some pieces to the puzzle.

The thing that was complicating my sleuthing was that neither the original 640-byte server error pages nor the fail doggie pages were being sent as an “error” (http status of 4xx or 5xx) – they were being sent as successful pages (http status 200).  That prevented me from getting as much diagnostic information as I otherwise might have.  I could poke around at each scatterplot and piece together what I was seeing, but without an error, I wasn’t going to have the html from the page or any screenshots, both of which MyKeynote stores on a fatal error. Fortunately, all was not lost.  There's a benefit in more than 400,000,000 objects per day stored away inside MyKeynote.

Web Content Trending to the rescue

Here’s where the other measurement that was scripted to go five steps deep into the target.com site came in very handy.  The second monitoring script was set up to do the following:

  1. Go to the target.com home page.
  2. Perform a search for “lil wayne”
  3. Filter the search results to the “music” category.
  4. Click on the first album in the resulting list to view the details.
  5. Click the “add to cart” button on the details page and then confirm that “1 item added to cart.” appears on the screen.

Without the real site being served up, there was no way the script could complete the search for an album.  When the Keynote agent piloting the Internet Explorer browser went to find the search box and type “lil wayne” into it, there was no search box. The resulting error provided a steady stream of screenshots of the home page throughout the day.

Let me explain a little more about why I got the screenshots.  With Keynote’s Web Content Trending option turned on, every screen is proactively captured and if an error is detected in subsequent steps, all captured screenshots are saved to the MyKeynote portal to support troubleshooting by our customers.  If there is no error, the proactive screenshots are discarded before ever being sent to the database.  Every time the second step failed (which was on EVERY visit at this point), the screenshot of the home page taken on arrival was stored.
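The capture-then-keep-on-error behavior described above can be sketched in a few lines of Python. This is not Keynote's implementation; the class and method names are hypothetical and the "screenshot" is just a placeholder byte string.

```python
class ScreenshotBuffer:
    """Hold per-step captures and keep them only if a later step fails."""

    def __init__(self):
        self._pending = []   # captures taken during the current visit
        self.saved = []      # captures kept for troubleshooting

    def capture(self, step_name, image):
        self._pending.append((step_name, image))

    def finish(self, error_detected):
        # Keep everything captured so far only when the visit hit an error.
        if error_detected:
            self.saved.extend(self._pending)
        self._pending.clear()


buf = ScreenshotBuffer()
buf.capture("home page", b"placeholder image bytes")
buf.finish(error_detected=True)    # step 2 failed, so the home page capture is kept
print(len(buf.saved))
```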

To get those screenshots, I simply had to drill into the scatterplot chart and I saw this:

10-EisForError


I clicked on that “E” which is the error recorded for the second step and saw the page details screen below. I have added callouts so you can see where the links to the screenshot and html are:

11-SnapshotAndHTML


Things were about to get a lot clearer in a hurry. 

The wrong kind of "direct communication"

I clicked on the thumbnail of the screen snapshot and sighed as I saw what visitors had seen in the moments when the site became unusable around 7:58-8:00am:

12-ServerError

We captured the above screenshot at 8:01am.  It shows what the 640 and 639 byte html pages with no images looked like in the browser.  This is a raw server error that was passed through all the way to users’ browsers (something developers and operations teams work very hard to avoid).

In the minutes that followed, the target.com team took rapid action to replace the cryptic server error with something more friendly.

By 8:14 am, we had captured the first of those friendlier images; the “fail doggie” page made its debut. About 17 minutes later we captured a new version of the page.  We saw additional changes again at 11:30 and 12:38.  It should be noted that these are times when we visited with the transaction-based measurement and that particular measurement was only set to visit ten times per hour.  The point is that these times are when we observed the pages and made captures, not necessarily the exact times that they changed.

Here, then, is the full gallery of “fail doggie” pages we recorded:

8:14 am EDT – “Oh no”

This page requested the user to “please try again” and provided a single link to “Target help”

13-OhNo


8:31am EDT – “Hello”

This page let folks know the team was “hard at work making the site better” and dropped the “please try again” in favor of “Sorry for the inconvenience – we’ll be back up and running shortly.”  It also featured links to three services that were still online: redcard, weekly ad, and find a store.

14-Hello


11:30am EDT – “Woof” #1

Around 11:30 the message changed to “We are suddenly extremely popular.”  Visitors were also asked to “Please stay here and we’ll try to get you in as soon as we can!”  Finally, this version also explained what the three links that began appearing in the previous version were, saying “We are up and running here” just above the links.

15-Woof-01


12:38pm EDT – “Woof” #2

This version added a plea to not keep hitting refresh, something that can make bringing a site back up very difficult when a large group is all doing it at the same time: “Please know that there is no need to refresh your browser.  Your request will automatically retry in 30 seconds.”  This was the most well organized page, broken into four separate paragraphs, and adding back in an apology with the sentence, “Thank you and our apologies for the inconvenience.”

16-Woof02


Sizing up the impact: just how bad was it, and for how long?

I was starting to put it all together.  Now I knew what the raw server error looked like and I had a play-by-play set of screen captures showing how the “waiting room” page evolved over time that morning. 

What I still wanted to know was just how “unavailable” the site had been.  Were any users getting through to the real home page at any point?  The fail doggie page said it would auto-refresh in 30 seconds and implied that perhaps one might eventually get through.  Was the real home page ever turning up, and if so how often? 

I waited until the next morning to size up the duration and intensity of the outage.  My goal was to confidently establish an “end point” for the incident and then do the numbers.

How can you measure “availability” when the site is displaying “OK” pages, but not the right ones?

The home page only measurement had the highest frequency (about 40 per hour) so I wanted to use that to calculate availability.  There are many different reports and graphs in the MyKeynote portal, and most of them will tell you at a glance what your “availability” is – that is, what percentage of the measurements succeeded versus failed in some way.  The catch here was that server error and “fail doggie” pages were sent to the browser as “OK” (http status 200) pages, not errors. 

Scripts can be easily enhanced with validations that look either for required text that should be there or error text that should not be there.  Either validation option would have marked the server message or fail doggie pages as errors and impacted the built-in availability calculation, but the index measurement I was using didn’t have that validation in place.  In practice, validation is usually used at the end of a multi-page script to be sure the right final page had been reached.  Fortunately the information we needed to answer the questions was readily available anyway (more on that below). We just had to step back and consider the available tools to size it up.
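To make the idea of content validation concrete, here is a minimal Python sketch of the two checks described above (required text and error text) and of how they would change an availability figure. It is not KITE script syntax; the function names and the sample pages are assumptions, and the only string borrowed from the incident is the “spawaitingroom” folder name.

```python
def validate_page(html, required_text=None, error_text=None):
    """Return True if the page content looks like the intended page."""
    if required_text is not None and required_text not in html:
        return False
    if error_text is not None and error_text in html:
        return False
    return True


def availability(results):
    """Percentage of measurements that passed validation."""
    return 100.0 * sum(results) / len(results)


# Hypothetical pages, all of which would arrive with HTTP status 200.
real_home = '<html><input name="search"></html>'
waiting_room = '<html><img src="/spawaitingroom/dog.png"></html>'

results = [
    validate_page(page, required_text='name="search"', error_text="spawaitingroom")
    for page in (real_home, waiting_room, waiting_room, waiting_room)
]
print(f"Availability with validation: {availability(results):.0f}%")
```

Without the validation, all four visits would count as successes, because the status code alone says "OK".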

The tool I chose to use was MyKeynote’s Object Trending report.  This report is available for any measurement that has been configured with our Web Content Trending (WCT) option.  WCT stores performance information for every object in the page, not just a roll-up for the page as a whole. Once again, storing all those details day and night was about to become very handy.

The Object Trending report has several options, all of which provide ways to view the performance of page elements over time.  To display it, I chose the Target measurement, selected Object Trending and set the date range to the 24 hour period after the incident had begun:

17-ObjectTrendingSelected


Here’s what that report looks like by default, which is a separate line for each domain that objects originated from:

18-ObjectTrendingGraphDisplayed


The above default view is great for determining if one or more domains are particularly slow or unavailable, but I needed even more detail than that, so I dropped down the menu at the bottom of the graph and changed it to “Object data by object without parameters:”

19-OTWithoutParams

This option would give me each separate element in the page, ignoring any query-string data that might be appended to the object name.  Think of the home page as a stage with a cast of characters.  The object trending report was about to show me who had appeared in what number of performances. 
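Grouping objects "without parameters" amounts to dropping the query string before counting. A rough Python equivalent, using made-up example URLs rather than real Target.com objects, might look like this:

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def without_parameters(url):
    """Drop the query string and fragment so variants of one object group together."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def count_objects(urls):
    """Count how many times each object appeared, ignoring parameters."""
    return Counter(without_parameters(u) for u in urls)


# Hypothetical object URLs seen across several visits.
seen = [
    "http://www.example.com/img/logo.png?v=1",
    "http://www.example.com/img/logo.png?v=2",
    "http://www.example.com/css/site.css",
]
counts = count_objects(seen)
print(counts["http://www.example.com/img/logo.png"])
```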

I clicked “Generate Graph Now” and scrolled down to the table below the graph. Now I had a listing of the frequency of every object that had appeared on the home page from 8:00am EDT onward:

20-ObjTrendTableRoughCutEdges


What I was particularly interested in was the object name on the left and the “Included datapoints” on the right.   Every visit always had retrieved at least the base page (www.target.com). By comparing the count of base pages observed to the count of all the other objects, I could make some meaningful conclusions about how often the “fail doggie” had been turning up. 

I needed to do a little math, so I pulled the results into an Excel spreadsheet with a simple copy and paste, sorted by the datapoints column and added a new column to compare the count of each element to the count of the base page.  I did this a number of times, varying the time period a bit to narrow in on useful takeaways.

For starters, I re-ran the report to look just at the objects observed in the first hour between 8:00am EDT and 9:00am EDT.  The cast of characters is quite small; we either got just the base page of html, or the base page plus the elements of the fail doggie page.  Remember, these aren’t observations of all user traffic, they are the results observed by the visits of our real browser agents. While we were just a drop in the bucket of actual visitors, we did make it through to the site 31 times in that first hour.  Here’s what we saw: The “fail doggie” appeared just 55% of the time and we got nothing but the base page in the remaining 45% of visits.  The hundreds of thousands (or millions) of users attempting to visit during that same period likely saw a similar mix.
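The spreadsheet column I added is simple enough to sketch in Python. The 31 base-page visits come from the report; the count of 17 for a waiting-room image and the file name itself are assumptions, picked only because 17 out of 31 rounds to the 55% quoted above.

```python
def share_of_visits(object_counts, base_page):
    """Express each object's count as a percentage of base-page visits."""
    base = object_counts[base_page]
    return {
        name: 100.0 * count / base
        for name, count in object_counts.items()
        if name != base_page
    }


# 31 visits in the first hour; the image name and its count are illustrative.
first_hour = {
    "www.target.com": 31,
    "spawaitingroom/dog.png": 17,
}
shares = share_of_visits(first_hour, "www.target.com")
for name, pct in shares.items():
    print(f"{name}: {pct:.0f}% of visits")
```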

21-ExcelFirstHour


So I could see that things had been pretty “messy” during the initial confusion of the incident.  An hour of outage is worth a lot of money to a site like target.com, and cryptic errors couldn’t have been good for inspiring confidence, but an hour is just an hour and perhaps many people hadn’t even tried to visit the site yet. 

The next question I wanted to answer was, “What percentage of Target.com’s traffic was able to make it to the home page throughout the rest of the day?”

Here’s a view of my spreadsheet based on running the Object Trending report for the period starting one hour after the incident (9:00am EDT) and ending at midnight that evening:

22-Excel-9-Midnight

So taking the big view of that entire day up to midnight EDT (which is 9pm Pacific time), I determined that 93% got the fail doggie and no more than 7% got through to the real home page.

Not pretty no matter how you slice it

I re-ran the report with various time windows, and the results varied only slightly.  Most surprisingly, even pushing the end time all the way to 9am the following day, the stats still showed that 85% of all visits got the fail doggie page elements.  Focusing on Midnight 9/14/11 to 9:00am 9/14/11 the number was still high at 75%.  Was this all cleared up by the start of business on 9/14/11?  Looking at just the hour between 8:00am and 9:00am that second day 9/14/11, the number was still an amazing 50%.

By this time my boss was wondering why I was spending so much time with all those spreadsheets and charts, so I stopped my investigation there and moved on to share updated results with all the folks that had been asking for data.  I annotated a screenshot of a scatterplot graph and wrote a narrative to go with it, sharing the details with several media outlets and even conducting a radio interview with Wall Street Journal Radio.  Here are a couple of links to places the results showed up, along with a copy of that annotated scatter-plot graph.

Investor’s Business Daily: Target Website Crash Offers Lessons

Retail Online Integration: What You Can Learn From Target's Site Crash

23-target-scatterplot


Epilog:

The one question I left unanswered was just how many truly useful pages there had been in the 7% that were not “fail doggie” pages.  I’m curious how many would have actually allowed a customer to purchase.  Given the size of Target.com’s typical traffic this time of year (reported as 29.5 million for the month of the prior October by Investor’s Business Daily), I was pretty content to stop with the work I had done over those two days, observing with a long sigh that the folks at Target, who are no amateurs at online retail, had missed out on a LOT of potential transactions by turning away something north of 93% of all visitor attempts that first day.

Posted by Dave Karow on September 30, 2011 at 05:10 PM in Application Performance Testing, Current Affairs, Load Testing, Test Website, Testing Web Applications, Transaction Monitoring, Web Load Test, Web Page Monitoring, Web Performance, Web/Tech, Website Availability Monitoring | Permalink | Comments (1) | TrackBack (0)


Contrarian View: Amazon Outage Proves the Promise of Cloud Computing

By Ian Withrow

Those who know me well can tell you that I’m hardly a frothy fanboy; indeed, I’m a dyed-in-the-wool skeptic. So it may come as a surprise to you to hear that I view the fallout of the recent Amazon Web Services (AWS) outage as a very positive sign for Cloud Computing. Sure, some sites got taken down, including one of my personal favorites, Quora. However, another favorite site of mine managed to survive the incident with comparatively minor hiccups: Netflix. This is the bright spot I want to highlight. I just happened to have a performance measurement for Netflix in my Keynote account. On the east coast starting at 12am April 21st, Netflix’s performance for successful transactions stayed a consistent couple of seconds and the site was available 96% of the time. Granted, this isn’t flawless execution; note that the 27 failed data points are all timeouts resulting in just a red screen. However, compared to what happened to many sites, this is outstanding. (Y-axis details obscured)

AWS Promise

It’s not dumb luck that got Netflix off this easy. It’s the product of hard work and engineering time invested in building their Amazon Web Services deployment the right way. As Netflix has been touting in various cloud conferences this year, they’ve been forced to fully embrace AWS due to their tremendous growth. Basically they only run credit card transactions in their private network. To ensure they always have enough capacity (and incidentally are highly available) they have turned provisioning decisions over to their operational systems.  Whenever an Amazon instance is poorly performing they terminate it and get a new one.  Likewise if there is an availability zone acting up (like what happened) then they automatically switch over to another.

This is how real high availability has always been done in networking: ensure that you can automatically failover to logically, physically, and geographically separate resources.  Any real engineer will tell you that problems and failures will happen.  Your availability track record is not based on how frequently this occurs but how gracefully you recover from them.
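The failover idea can be sketched without any cloud API at all. The Python below is a toy illustration, not Netflix's system and not an AWS call: the zone names, instance names, health flags, and latency threshold are all hypothetical inputs that a real operational system would gather from its own monitoring.

```python
def pick_zone(zone_health, preferred):
    """Return the preferred zone if healthy, otherwise the first healthy alternative."""
    if zone_health.get(preferred, False):
        return preferred
    for zone, healthy in zone_health.items():
        if healthy:
            return zone
    raise RuntimeError("no healthy zone available")


def instances_to_replace(latencies_ms, threshold_ms):
    """List instances whose measured latency exceeds the threshold."""
    return [name for name, ms in latencies_ms.items() if ms > threshold_ms]


# Hypothetical monitoring snapshot.
zone_health = {"zone-a": False, "zone-b": True}
latencies_ms = {"instance-1": 120, "instance-2": 900}

print(pick_zone(zone_health, preferred="zone-a"))
print(instances_to_replace(latencies_ms, threshold_ms=500))
```

The point of the sketch is that the decision is made by code reading monitoring data, not by a person being paged.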

Herein is the promise of Cloud Computing: namely the favorable relationship between cost and failover capabilities. In a private network world you would have to build and pay for a lot of stuff yourself: multiple data centers, double the hardware, internet access connections on opposite sides of the building, etc.  Very quickly the cost of high availability gets prohibitive, locking out all but the deepest of pockets.  Netflix explicitly said at Cloud Connect they came to the conclusion that they, even with all their growth, just weren’t big enough to justify building their own network of redundant data centers.

Enter Cloud Computing.  Now having access to redundant data centers is just a matter of purchasing the right performance monitoring tools and the engineering time in programming your applications and operational systems to take full advantage of on demand resources.  In the end you only pay for what you use of the infrastructure, not what you might need as is the case when doing it yourself. That’s the real shame and promise highlighted by this outage: young companies like Quora and Foursquare could easily have done just what Netflix has done.  The barrier to entry here isn’t a huge budget but the knowledge and priorities to do the work. The next step, of course, after fully leveraging Amazon is to be able to fail over to different cloud providers; I’d bet you $100 Netflix is working on exactly this right now.

In a way this drives home a point we’ve known all along.  Cloud Computing is not outsourcing, which implies a transfer of risk and responsibility. You, not Amazon or Microsoft or Google etc., are responsible for the performance of your applications whether they are in the cloud or not.  Cloud Computing is a powerful tool to increase performance and availability manyfold while reducing costs, if it’s used correctly. If you don’t use the tool properly then an outage isn’t Amazon’s fault, it’s yours.  I’ll leave you with this thought: Amazon seems to agree. According to Gartner analyst Lydia Leong, this isn’t an outage that generates service credits. (Quote at very end of article)

Posted by Ian Withrow on April 22, 2011 at 11:36 AM in Web Page Monitoring, Web Performance, Web/Tech, Website Availability Monitoring, Website Monitoring, Website Monitoring Service, Website Performance Monitoring | Permalink | Comments (0) | TrackBack (0)

Technorati Tags: Amazon Web Services, Cloud Application Perspective, Cloud Computing, Cloud Monitoring, EC2, Keynote, Outage

“The Cloud” Terminology Rant

By Ian Withrow

My last few posts have been pretty rich with Keynote-specific content. But today the temperature is supposed to reach 80 degrees, so for my last post of the month I think it’s time for something a little bit more spring: a rant. Specifically, a rant on what to some may seem like splitting hairs and to others may seem like the most important of things: terminology.

In blog posts, in many other areas of the web, and even in the airport these days you’ll see something referred to as “The Cloud.” I must confess upon consideration that I strongly dislike this simplification in spite of its popularity today. The problem is that it is ambiguous and means different things depending on the agenda of who you are talking to and when you are speaking with them. (I’m probably no exception.) For example, today the loudest voices might mean ‘Cloud Computing’, or perhaps they are referring to running your applications through a web browser when they say ‘Cloud’. The problem is these meanings, and others, have little in common beneath the skin.

What ‘Cloud’ really is, when the topic is something related to networking, computers, and communications, is an abstraction. Starting at least in the ’70s and perhaps earlier, when data networking was young, a cloud was used in network diagrams simply to represent a logical group of infrastructure that wasn’t relevant to the discussion at hand, as in this X.25 network diagram courtesy of Wikipedia.

X.25 Network

As the Internet Protocol (Internet) supplanted earlier solutions like X.25 it maintained their conventions, typically using a cloud to represent the internet backbone in a diagram. Other technologies, like Frame Relay and ATM, also followed this practice. However, the internet quickly eclipsed these other solutions in terms of mindshare and soon, at least in this world, you started hearing people say “the internet cloud.” It’s at this point that I’m really wishing I had my old Newton’s Telecom Dictionary so I could find out what he had to say about it.

Regardless, fast forward a few years and someone who was combining virtualization with lots of spare hardware decides that “Cloud Computing” is a good term to use. After all, you are using the internet cloud to access computing resources: Cloud Computing. Makes sense, right? But now simplifying it to just “the Cloud” is confusing as heck. It’s like they decided to call it ‘Black Box Computing’ and then for simplicity’s sake started calling it the ‘Black Box’. It might work if no one else was using this abstraction, but the truth couldn’t be more the opposite. So in conclusion, on this fine spring day I encourage you to ask anyone who is talking about the cloud to clarify what type of cloud.

 

 

Posted by Ian Withrow on April 08, 2011 at 03:19 PM in Web/Tech, Weblogs | Permalink | Comments (0) | TrackBack (0)

Technorati Tags: Cloud, Cloud Computing, Cloud Definition, Cloud Monitoring, Performance Monitoring

A Look Behind the Scenes at Monitoring the Cloud

The Internet is amazingly, almost incomprehensibly, large.  Nearly 300 million websites exist today—90 million more than a year ago. The rate of change in the complexity of content, applications and the tools we use to access it all is insanely great (to borrow from Steve Jobs).

What hasn’t changed is our expectation that it all works, and works exceptionally well.

The overwhelming complexity and diversity of what defines “the user experience” is driven by constant innovation. Keynote continuously manages a world-class infrastructure and develops new technologies to ensure that the entirety of your customers’, employees’ and partners’ Internet, mobile and private cloud experience can be monitored and optimized, no matter from where, or how they interact with your content and services.

That’s a lot of sausage to grind! So how do we do it? Take a look behind the scenes at Keynote.

 

Posted by Aaron Rudger on March 31, 2011 at 05:54 AM in Web/Tech | Permalink | Comments (0) | TrackBack (0)

The Website Sausage Factory and Impact on Performance – a TechCrunch Case Study

By Ian Withrow

As we’ve discussed in various blog posts, websites are like sausage.  OK, maybe not so directly, but like a sausage they are made of ingredients that come from many sources even though they are presented in one tidy package to the user.  Today I’m going to break apart the sausage that is the TechCrunch blog, show how this can be easily done for any website using KITE, and as a special bonus show how a sausage maker can monitor all the pieces of their links using a Keynote technology called Virtual Pages.

TechCrunch Composition

First, a fair warning: just like with sausage making, finding out what is inside your favorite website is not always pretty.  If you feel you are of a squeamish disposition then you have been warned.  Second, note that all details in the post are from the time of writing and the balance of ingredients is likely to change over time.

TechCrunch is a behemoth of a site, weighing in at just over 4 MB of data, 329 page elements, and a whopping 65 domains.  About half of this comes from them directly, or really via WordPress, which is evidently the platform they use to power their blog.  The rest comes from over 20 3rd parties.  You read that correctly: 65 domains in total, and half the content originates from someone else.  After direct content the next biggest category is social sites: Facebook, Twitter, and tools related to these properties total about 1 MB.  Google is about 500 KB, and the ‘Misc.’ category of various 3rd party tools that TechCrunch uses to improve user experience is about 200 KB.  Ads and ad-related content are about 180 KB.  Below is a snapshot from KITE breaking it out for you.  (Note that you need to complete the process in the next section to actually get this view.)

Download Size
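
For readers who want to try a rough version of this breakdown outside KITE, the grouping step can be sketched in a few lines of Python. The sketch assumes you already have a list of (URL, size in bytes) pairs exported from some waterfall tool; the function names are hypothetical and nothing here reflects how KITE computes its view.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_domain(elements):
    """Group (url, size_in_bytes) pairs by hostname.

    Returns {hostname: {"count": n, "bytes": total}}.
    """
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for url, size in elements:
        host = urlparse(url).hostname or "(unknown)"
        summary[host]["count"] += 1
        summary[host]["bytes"] += size
    return dict(summary)


def largest_domains(summary, top=5):
    """Return the `top` domains sorted by total bytes, largest first."""
    ranked = sorted(summary.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    return ranked[:top]
```

Feeding it the elements of a single page load gives a per-domain count and byte total, which is the raw material for a category chart like the one above.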


Now it’s time for a few fun observations.  The amount of content from Facebook and Twitter is huge!  Each alone is bigger than most websites are in total.  Digging into this is unfortunately off topic for this post but it is definitely something on my radar screen for the future.  Another interesting area is the level of user tracking that goes on.  I could identify at least 6 different 3rd parties that were tracking TechCrunch visitors, not including Google and Facebook.  TechCrunch knows what you are, if not who.

Finally, while these stats make it seem like TechCrunch is hardly advertising, understand that TechCrunch is a very, very, very long page (vertically) and all the ad content is at the top where the user is most likely to see it.  They aren’t dummies giving away their yummy sausage for free.

Scripting Sites for 3rd Party Monitoring & Analysis

To make sense of this mess I used Keynote’s KITE product.  There are a lot of other great, free products out there that one can use to view all the content and domains of a page.  However, KITE has the ability to permanently parcel out these domains into what we call Virtual Pages for ongoing monitoring and analysis.  Note this section won’t be a detailed how-to; I’m going to focus on highlighting what is possible with the tool.  Afterward you should be prepared to experiment or watch this training video, depending on your learning style.

After downloading TechCrunch in KITE I organized the content by domain as shown below.

Transaction Performance Details

This lets me easily see the composition and breakout of a page in a manual fashion.  If I just want to see the domains I can simply collapse the domain groupings.  There are tons of options that I can add to this view like content size and various time breakouts based on my interest.  Here is a complete list:

Keynote Components List

With just this you can see that I can casually learn a lot about the page.  However, if I’m serious about how TechCrunch and its 3rd parties perform then I need ongoing data points.  If I’m going to gather a lot of data then I don’t want to do this parsing and analysis manually; it just won’t scale.  The solution is to organize this content into permanent logical pieces.  For example, in a simple scenario I’d carve out a Virtual Page for my advertising so I could monitor and analyze the performance of that content separately from my own content with Keynote.  As you can imagine, the more complicated your site becomes, the more important this exercise is.  True, you can always pick through a waterfall manually to see who did it in the event of a problem, but if you want to have ongoing data about 3rd party performance or be proactive with alerts then you’ll need something like Virtual Pages.  The nice thing is that once you’ve designated content into a Virtual Page you can monitor and analyze it like a regular page.

Let’s discuss how I broke up and organized TechCrunch.  Please note I’m not holding this up as the standard for the best or only way to use Virtual Pages.  One thing we need to keep in mind is cost.  Each page (virtual or otherwise) adds to the cost of the measurement, so in the real world we probably can’t go hog wild with these.  Given an unlimited budget I’d define a Virtual Page for each 3rd party, possibly even one for each domain if I were especially crazy for detailed data.  My guess is you live in the real world, and even if your site isn’t as complicated as this one you’ll need to create some buckets.  Most likely you’d start with prior experience, defining Virtual Pages where you knew or suspected there was a problem.  Here I simply broke the site into the following logical categories:

  • TechCrunch direct content plus AOL
  • Google (but not Google owned advertising)
  • Facebook
  • Twitter (and related tools like Postup)
  • WordPress (even though this is the core of the site, I want to evaluate my vendor here)
  • Misc. Tools and Widgets for the users
  • Analytics and User Tracking
  • Ads and Ad related content

Why no CDN category? We certainly encounter CDNs here but each is tied to a specific 3rd party.  Facebook has its own CDN, the ad platforms have CDNs and so forth.  So instead I left the CDNs with their respective masters.

Here is a brief teaser for how this is done in KITE

Step 1) Pick the URL you want to virtualize and run the page once (we did this already)

Step 2) Right click on the page in question and select ‘Insert Virtual Action’

Add Virtual Action

Step 3) We now have a new Virtual Page at the bottom of the script.  Right click ‘Match Page Elements’ and select ‘Add URL Match’.  Here I’ve used the naming convention “vp:TechCrunch” to distinguish Virtual Pages from real pages.  You can name them anything you want in practice though.  There are other options that you can use to construct Virtual Pages, such as content type, that have interesting possibilities, but for addressing 3rd party content, URL seems ideal to me.  As you see below I’ve created a list of URL matches that should capture all the differently named TechCrunch domains.

Add Page Match

Step 4) In the Script Properties Editor you can create the settings for each URL Match.  Note in my script I used a variety of regular expressions so that I could get away with far fewer rules than the 65 domains and still cover all of the page content.

Script Properties Editor

Note that to do this I never had to write any code or do any advanced scripting.  It was all point, click, and form completion.  Hopefully by now you can see how easy it is to create Virtual Pages in KITE and have an idea of its possibilities.
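
KITE handles all of this through forms, but for readers curious about the matching logic itself, the idea of covering many domains with a handful of regular expressions can be sketched in Python. This is not KITE's rule syntax. The bucket names borrow the "vp:" convention from above, and the hostname patterns are illustrative assumptions rather than the rules used in the actual script.

```python
import re
from urllib.parse import urlparse

# Hypothetical rules: (virtual page name, hostname pattern).
# The first matching rule wins, so a more specific rule placed
# earlier takes precedence over a broader one below it.
RULES = [
    ("vp:Ads", re.compile(r"(^|\.)(doubleclick\.net|googlesyndication\.com)$")),
    ("vp:Google", re.compile(r"(^|\.)(google|gstatic|googleapis)\.com$")),
    ("vp:Facebook", re.compile(r"(^|\.)(facebook\.com|fbcdn\.net)$")),
    ("vp:Twitter", re.compile(r"(^|\.)(twitter\.com|twimg\.com)$")),
    ("vp:TechCrunch", re.compile(r"(^|\.)techcrunch\.com$")),
]


def virtual_page_for(url, default="vp:Unmatched"):
    """Return the name of the first rule whose pattern matches the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    for name, pattern in RULES:
        if pattern.search(host):
            return name
    return default
```

Anchoring each pattern on a domain suffix is what lets five rules stand in for dozens of hostnames, and the catch-all default makes it easy to spot content that no rule has claimed yet.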

How can I Benefit from Virtual Pages?

The obvious and immediate answer is you can now isolate and monitor the performance of certain 3rd parties or subsections of your website.  If Facebook slows down you’ll know immediately and explicitly that this is the case, regardless of the overall impact on your performance.  Moreover, you can easily track and directly report on the performance of these guys over time without needing to manually crunch the data and objects yourself.  Another interesting possibility is you could monitor your own additions to your site to see how they fare.  Finally, another angle might be to isolate and monitor all the JavaScript that your site utilizes.  There are countless ways that Virtual Pages might be used, and my list probably just scratches the surface.  Have fun with it!
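
As a rough illustration of the "you'll know immediately" idea, the sketch below compares the latest load time for each bucket against that bucket's own history and flags the ones that have slowed down. The data layout, the function name, and the 1.5x threshold are assumptions made for the example; a monitoring product would do this for you with its own alerting rules.

```python
from statistics import mean


def slow_buckets(history, latest, factor=1.5):
    """Flag buckets whose latest load time exceeds `factor` x their baseline.

    history: {bucket: [seconds, ...]} from earlier measurement runs
    latest:  {bucket: seconds} from the most recent run
    Returns {bucket: (baseline_seconds, latest_seconds)}.
    """
    flagged = {}
    for bucket, seconds in latest.items():
        past = history.get(bucket)
        if not past:
            continue  # no baseline yet, nothing to compare against
        baseline = mean(past)
        if seconds > factor * baseline:
            flagged[bucket] = (baseline, seconds)
    return flagged
```

Because each bucket is judged against its own baseline, a slow 3rd party stands out even when the overall page time barely moves.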

 

Posted by Ian Withrow on December 31, 2010 at 10:55 AM in Site Load Time, Testing Web Applications, Transaction Monitoring, Web Page Monitoring, Web Performance, Web Performance Testing, Web/Tech, Weblogs, Website Availability Monitoring, Website Monitoring, Website Monitoring Service, Website Monitoring Software, Website Performance Monitoring | Permalink | Comments (0) | TrackBack (0)

Technorati Tags: 3rd party content, TechCrunch, Web Development, Web Monitoring, Web Performance

Chrome Web Apps Store + Chrome = A Cloud-Driven User?

According to one Google statistic, the average PC user spends 90 percent of their time browsing the web. It stands to reason, then, that with most people interested in web browsing and using cloud applications, a web-centric operating system may be the next logical step in the evolution of the PC. Enter Google's latest offering, the Chrome Web Store, which is being touted as "the app store for the web" - as long as you have the Chrome browser. Over time, these applications will be available for other browsers.

A few items of note:

  • While the grand opening of the App Store will take place next year, you can go on right now and try various applications. When you select a product, the "installation" will open a new tab with the icon of the product you chose in it.
  • There will be some paid apps. At this time it appears there will be about 20, most of them games.
  • Google is also introducing packaged apps, which can allow an application to work in offline mode. On the Apps Store the options are themes for the background; extensions, which Google defines as features that you can add to a browser; and apps, which are defined as "advanced interactive websites".

Now Firefox has extensions, but the Google Apps store provides for easier access and quite honestly, a cleaner more easily explainable interface for the uninitiated. Unlike the Apple iStore, the Chrome Web Store is open source and developers have far more leeway to develop than Apple developers. (Think the Android Market for mobile devices.)

I'm excited by the potential of this, but let's be honest: this setup is only as good as your web applications and your browser allow it to be. Let's take a quick look. There are extensions for blogging and for popular websites such as ESPN and CNN, and some of these are similar to what I'm used to on Firefox. In addition, there are applications for Education, Games, and much more. Some of these apps I use on my mobile phone, which means an immediate level of connectivity between the desktop and mobile platforms. Two examples would be Springpad and Box.net.

I could go on and on, but if you look at the hundreds of applications available right now, with more arriving every day, including free and paid applications, you can see that the "application gap" between the desktop and the cloud will rapidly decrease. The other part? Performance. I tested Springpad, Huffington Post, and WGT Golf Challenge. I was really impressed with the performance of Huffington Post NewsGlide. I actually preferred the NewsGlide interface to the regular website. WGT is a graphical golf game with solid physics (meaning that shots behave in a very lifelike fashion) and the performance is solid, if occasionally choppy.

The second part of a high-performing browser-based setup is the operating system. Google lays out its premise in a video presentation that explains how Chrome OS skips most of the conventional loading of an operating system. PCs running Chrome OS booted up in 7 seconds in tests, which is at least seven (yes SEVEN) times faster than I've ever launched a PC running Windows on a desktop replacement machine.

So we're talking solid performance with the applications, a blazing fast OS coming out soon, which leaves the browser.

Look at this benchmark performance chart showing the relative performance of various web browsers, based on tests performed a few months ago. The free Futuremark Peacekeeper benchmark was used for testing. Looking at the results, Chrome is clearly superior in performance to Firefox or IE (which are used by the majority of PC owners), and in my personal experience, I've noticed that Firefox has shown more of a propensity to crash than before.

Start with a high-performing browser. Toss in an open source market of applications and an environment that fosters rapid development among the development community. Have the applications and browser work within a new OS that is "webcentric", which caters to a user base that is all about web browsing and becoming increasingly comfortable with cloud applications. With all of these factors taken together, I anticipate 2011 will be the year when the Chrome Web Store carves out a space akin to the iStore.

 

Posted by G.H. Brooks on December 31, 2010 at 07:00 AM in Application Performance Testing, Current Affairs, Web/Tech | Permalink | Comments (0) | TrackBack (0)

Elevate Performance Awareness in your Company by Publishing Keynote Graphs

By Ian Withrow

I’ll warn you right off: this isn’t my typical post about a concept or a market trend.  I’m going to straight up show you how to do something cool in MyKeynote that’s available right now: export a saved graph.  Why, you ask, would anyone want to do this?  It’s the perfect way to raise visibility within your organization about the valuable information that your Keynote measurements generate.

In many of the organizations I talk to, Keynote data is an invaluable tool for IT and operational groups, but it’s a shame in my opinion that this information doesn’t get out to groups like product management and sales as readily as it should. So today I’m going to show you how to create a graph in MyKeynote that will provide an ongoing view into the performance of your website or application.  I’ll also suggest some types of graphs different business functions might be interested in.  Finally, I’ll spitball some ideas on how you might distribute this information.

First Create and Save a Graph

First, before we can become heroes (or villains who dare speak the truth) we need to figure out how to actually share a graph.  The first part of this is a routine MyKeynote process so stick with me or skip down to the next section if you already know how to save a graph.

 To start, we head to the graph section and perform the following steps:

  1. Select the measurement we want to graph.  This could be more than one measurement if we wanted to make a comparison.  I’ve gone with the Keynote Business 40.  The Keynote Business 40 Internet Performance Index (KB40) measures the average download time for the home pages of 40 important US-based business Web sites.  To find out more go here.
  2. Pick the graph type.  Time history is a great option to show performance over time.
  3. Time period is up to you, but it’s important to use a relative period if you want your graph to have permanent relevance.
  4. Finally, I’ve elected just to show performance to keep things simple in this post.
  5. OK, let’s generate the graph.  This graph is a big one with a lot of data and measurements in it, so it takes some time to generate.  Keep in mind how long your audience will wait when you set up your graph.

Graph Creation v2

Once generated, save the graph using the menu in the top right hand corner.

Save Graph

The next step is important not to mess up, so read carefully.  Name your graph whatever you want and click save.  You’ll get a dialog window like the one below.  If you want a ‘relative’ graph you need to choose “Cancel”.

Relative Graphs

Retrieving the URL for your Graph

OK, now we are done with MyKeynote, and the experienced users out there may note that we haven’t done anything unusual or new yet.  Now for the trick!  Keynote actually publishes your saved graphs in two heretofore undocumented RSS feeds.  One feed publishes the entire MyKeynote graph page, complete with legend, and the other publishes just the graphs themselves.  Those feeds are:

  1. For the full graph page: http://my.keynote.com/newmykeynote/mykeynoterss.do
  2. For just the graph:  http://my.keynote.com/newmykeynote/mykeynoteembed.do

You’ll find that visiting one of these feeds requires you to log in to your MyKeynote account.  Do so with a browser like IE or a feed reader that can handle authenticated RSS feeds and you’ll see something like this.

Graph RSS

The URL behind Keynote Business 40 and any other graph is permanent (as long as the graph is saved) and can be requested by clients without logging in.  Note that you can also get this link from the MyKeynote home page “Saved Graphs” widget.  There you can find an RSS feed button that will provide the same view above.  Below is an example of what the link provides:

KB40
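
If you would rather pull the graph links out of a feed programmatically than copy them by hand, a small parser along these lines may help. It is a sketch that assumes the feed follows the conventional RSS 2.0 layout (channel, item, title, link); the feeds are undocumented, so the real structure may differ. It works on feed text you have already downloaded after logging in, and the sample XML and its example.com URL are made up for illustration.

```python
import xml.etree.ElementTree as ET


def saved_graph_links(rss_text):
    """Return {graph title: graph URL} from RSS 2.0 text.

    Assumes the conventional <channel><item><title/><link/></item>
    layout; items missing a title or link are skipped.
    """
    root = ET.fromstring(rss_text)
    links = {}
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title and link:
            links[title] = link
    return links


# Made-up sample in the assumed layout, for trying the function out.
SAMPLE = """<rss version="2.0"><channel><title>Saved Graphs</title>
<item><title>Keynote Business 40</title><link>http://example.com/graph?id=1</link></item>
<item><title>No link yet</title></item>
</channel></rss>"""
```

Calling `saved_graph_links(SAMPLE)` returns a dictionary keyed by graph name, which is a convenient shape for the distribution ideas discussed next.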

Putting your Data to Work

This is a handy feature to be sure, but how best to make use of it? Well, at the simplest level you could simply distribute the link internally to interested parties.  Better might be to embed this page within another page, such as your company’s intranet.  Perhaps the cleverest idea I’ve encountered so far is to actually serve different graphs based on the organizational function of the user logging into an internal knowledge base site.

Presumably different groups want to know different things.  A product manager or marketing manager needs to know how their key pages or transactions are performing and for a large company there may be multiple product areas with different interests.  Developers may want to track their pages to see the impact of any code changes.  In contrast sales may wish to see the company’s page benchmarked against key competitors.  Any outage in a competitor’s site can provide an immediate piece of FUD to use.  Similarly a problem in their site is a heads up that they may be getting customer complaints.  As you can see timescale, graph type, and level of detail can easily vary based on the user’s interest.
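
Serving a different graph per business function can be as simple as a lookup table in whatever generates your intranet page. The sketch below is one hypothetical way to do it in Python: the role names and the example.com URLs are placeholders, and you would substitute the permanent URLs of your own saved graphs.

```python
from html import escape

# Hypothetical mapping from business function to a saved-graph URL.
GRAPHS_BY_ROLE = {
    "product": "http://example.com/graphs/key-transactions",
    "engineering": "http://example.com/graphs/page-detail",
    "sales": "http://example.com/graphs/competitor-benchmark",
}
DEFAULT_GRAPH = "http://example.com/graphs/overview"


def graph_embed(role, width=800, height=400):
    """Return an <iframe> snippet for the graph suited to `role`.

    Unknown roles fall back to a general overview graph.
    """
    url = GRAPHS_BY_ROLE.get(role.lower(), DEFAULT_GRAPH)
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(url, quote=True), width, height
    )
```

Keeping the mapping in one place means adding a new audience is a one-line change, and the fallback ensures nobody lands on an empty page.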

As mentioned earlier, it’s important to consider load time.  Generating graphs of measurements from many agents over a long period of time can take a few minutes.  You may want to limit the scope of your graphs to something that can be quickly generated to maximize the amount of attention it gets from viewers.  A drill-down graph can easily be linked for users who are willing to wait for more details.

This feature is new and we see a lot of areas to make it even better.  Let us know how you make use of this feature and what improvements you think are most important to improving its impact.

Posted by Ian Withrow on December 16, 2010 at 04:42 PM in Application Performance Testing, Site Load Time, Transaction Monitoring, Web Page Monitoring, Web Performance, Web Performance Testing, Web/Tech, Weblogs, Website Availability Monitoring, Website Monitoring, Website Monitoring Service, Website Monitoring Software, Website Performance Monitoring | Permalink | Comments (0) | TrackBack (0)

Technorati Tags: Cloud, Internet, Keynote, Performance, Web

Copyright © 1995-2012 Keynote Systems, Inc. All rights reserved.

