Web Performance Watch

“The Cloud” Terminology Rant

By Ian Withrow

My last few posts have been pretty rich with Keynote-specific content. But today the temperature is supposed to reach 80 degrees, so for my last post of the month I think it’s time for something a little more spring: a rant. Specifically, a rant on what to some may seem like splitting hairs and to others may seem like the most important of things: terminology.

In blog posts, in many other areas of the web, and even at the airport these days, you’ll see something referred to as “The Cloud.” I must confess that, upon consideration, I strongly dislike this simplification in spite of its popularity today. The problem is that it is ambiguous: it means different things depending on the agenda of whoever is speaking and when they are speaking. (I’m probably no exception.) For example, today the loudest voices might mean ‘Cloud Computing’, or perhaps they are referring to running your applications through a web browser when they say ‘Cloud’. These meanings, and others, have little in common beneath the skin.

What ‘Cloud’ really is, when the topic is networking, computers, and communications, is an abstraction. Starting at least in the 1970s and perhaps earlier, when data networking was young, a cloud was used in network diagrams simply to represent a logical group of infrastructure that wasn’t relevant to the discussion at hand. Take, for example, this X.25 network diagram courtesy of Wikipedia.

X.25 Network

As the Internet Protocol (Internet) supplanted earlier solutions like X.25, it maintained their conventions, typically using a cloud to represent the internet backbone in a diagram. Other technologies, like Frame Relay and ATM, also followed this practice. However, the internet quickly eclipsed these other solutions in terms of mindshare and soon, at least in this world, you started hearing people say “the internet cloud.” It’s at this point that I’m really wishing I had my old Newton’s Telecom Dictionary so I could find out what he had to say about it.

Regardless, fast forward a few years and someone who was combining virtualization with lots of spare hardware decides that “Cloud Computing” is a good term to use. After all, you are using the internet cloud to access computing resources: Cloud Computing. Makes sense, right? But now simplifying it to just “the Cloud” is confusing as heck. It’s like they decided to call it ‘Black Box Computing’ and then, for simplicity’s sake, started calling it the ‘Black Box’. It might work if no one else was using this abstraction, but the truth couldn’t be more the opposite.

So in conclusion, on this fine spring day, I encourage you to ask anyone who is talking about the cloud to clarify what type of cloud.

Posted by Ian Withrow on April 08, 2011 at 03:19 PM in Web/Tech, Weblogs

Technorati Tags: Cloud, Cloud Computing, Cloud Definition, Cloud Monitoring, Performance Monitoring

The Website Sausage Factory and Impact on Performance – a TechCrunch Case Study

By Ian Withrow

As we’ve discussed in various blog posts, websites are like a sausage. OK, maybe not so directly, but like a sausage they are made of ingredients that come from many sources even though they are presented in one tidy package to the user. Today I’m going to break apart the sausage that is the TechCrunch blog, show how this can be easily done for any website using KITE, and as a special bonus show how a sausage maker can monitor all the pieces of their links using a Keynote technology called Virtual Pages.

TechCrunch Composition

First, a fair warning: just like with sausage making, finding out what is inside your favorite website is not always pretty. If you feel you are of a squeamish disposition then you have been warned. Second, note that all details in this post are from the time of writing and the balance of ingredients is likely to change over time.

TechCrunch is a behemoth of a site, weighing in at just over 4 MB of data, 329 page elements, and a whopping 65 domains. About half of this comes from them directly, or really via WordPress, which is evidently the platform they use to power their blog. The rest comes from over 20 3rd parties. You read that correctly: 65 domains in total, and half the content originates from someone else. After direct content, the next biggest category is social sites; Facebook, Twitter, and tools related to these properties total about 1 MB. Google is about 500 KB, and the ‘Misc.’ category of various 3rd party tools that TechCrunch uses to improve user experience is about 200 KB. Ads and ad related content are about 180 KB. Below is a snapshot from KITE breaking it out for you. (Note that you need to complete the process in the next section to actually get this view.)

Download Size


Now it’s time for a few fun observations.  The amount of content from Facebook and Twitter is huge!  Each alone is bigger than most websites are in total.  Digging into this is unfortunately off topic for this post but it is definitely something on my radar screen for the future.  Another interesting area is the level of user tracking that goes on.  I could identify at least 6 different 3rd parties that were tracking TechCrunch visitors, not including Google and Facebook.  TechCrunch knows what you are, if not who.

Finally, while these stats make it seem like TechCrunch is hardly advertising, understand that TechCrunch is a very, very, very long page (vertically) and all the ad content is at the top where the user is most likely to see it.  They aren’t dummies giving away their yummy sausage for free.

Scripting Sites for 3rd Party Monitoring & Analysis

To make sense of this mess I used Keynote’s KITE product. There are a lot of other great, free products out there that one can use to view all the content and domains of a page. However, KITE has the ability to permanently parcel out these domains into what we call Virtual Pages for ongoing monitoring and analysis. Note that this section won’t be a detailed how-to; I’m going to focus on highlighting what is possible with the tool. After that, you should be prepared to experiment or watch this training video, depending on your learning style.

After downloading TechCrunch in KITE I organized the content by domain as shown below.

Transaction Performance Details

This lets me easily see the composition and breakout of a page in a manual fashion.  If I just want to see the domains I can simply collapse the domain groupings.  There are tons of options that I can add to this view like content size and various time breakouts based on my interest.  Here is a complete list:

Keynote Components List

With just this you can see that I can casually learn a lot about the page. However, if I’m serious about how TechCrunch and its 3rd parties perform then I need ongoing data points. If I’m going to gather a lot of data then I don’t want to do this parsing and analysis manually; it just won’t scale. The solution is to organize this content into permanent logical pieces. For example, in a simple scenario I’d carve out a Virtual Page for my advertising so I could monitor and analyze the performance of that content separately from my own content with Keynote. As you can imagine, the more complicated your site becomes, the more important this exercise is. True, you can always pick through a waterfall manually to see who did it in the event of a problem, but if you want to have ongoing data about 3rd party performance or be proactive with alerts then you’ll need something like Virtual Pages. The nice thing is that once you’ve designated content into a Virtual Page you can monitor and analyze it like a regular page.
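For readers who want a feel for the manual version of this analysis outside of KITE, here is a minimal Python sketch that groups page elements by domain and totals their size. It assumes you already have a list of (URL, size in bytes) pairs, for example copied out of a waterfall. The function name `bytes_by_domain` and the sample URLs and sizes are invented for illustration; they are not TechCrunch measurements and not KITE output.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical sample of page elements: (URL, size in bytes).
# These values are made up purely to exercise the function.
elements = [
    ("http://www.example.com/", 52_000),
    ("http://static.example.net/app.js", 120_000),
    ("http://www.example.com/style.css", 18_000),
    ("http://widgets.example.org/like.js", 90_000),
]

def bytes_by_domain(items):
    """Sum element sizes per hostname."""
    totals = defaultdict(int)
    for url, size in items:
        totals[urlparse(url).hostname] += size
    return dict(totals)

totals = bytes_by_domain(elements)

# Print the heaviest domains first.
for domain, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{domain}: {size / 1000:.0f} KB")
```

Doing this once by hand is easy enough; the point of the paragraph above is that repeating it for every measurement is what does not scale.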

Let’s discuss how I broke up and organized TechCrunch. Please note I’m not holding this up as the standard for the best or only way to use Virtual Pages. One thing we need to keep in mind is cost. Each page (virtual or otherwise) adds to the cost of the measurement, so in the real world we probably can’t go hog wild with these. Given an unlimited budget I’d define a Virtual Page for each 3rd party, possibly even one for each domain if I were especially crazy for detailed data. My guess is you live in the real world and, even if your site isn’t as complicated as this one, you’ll need to create some buckets. Most likely you’d start with prior experience, defining Virtual Pages where you knew or suspected there was a problem. Here I simply broke the site into the following logical categories:

  • TechCrunch direct content plus AOL
  • Google (but not Google owned advertising)
  • Facebook
  • Twitter (and related tools like Postup)
  • WordPress (even though this is the core of the site, I want to evaluate my vendor here)
  • Misc. Tools and Widgets for the users
  • Analytics and User Tracking
  • Ads and Ad related content

Why no CDN category? We certainly encounter CDNs here but each is tied to a specific 3rd party. Facebook has its own CDN, the ad platforms have CDNs, and so forth. So instead I left the CDNs with their respective masters.

Here is a brief teaser for how this is done in KITE:

Step 1) Pick the URL you want to virtualize and run the page once (we did this already)

Step 2) Right click on the page in question and select ‘Insert Virtual Action’

Add Virtual Action

Step 3) You now have a new Virtual Page at the bottom of your script. Right click ‘Match Page Elements’ and select ‘Add URL Match’. Here I’ve used the naming convention “vp:TechCrunch” to distinguish Virtual Pages from real pages. You can name them anything you want in practice, though. There are other options that you can use to construct Virtual Pages, such as content type, that have interesting possibilities, but to address 3rd party content, URL seems ideal to me. As you see below, I’ve created a list of URL matches that should capture all the differently named TechCrunch domains.

Add Page Match

Step 4) In the Script Properties Editor you can create the settings for each URL Match.  Note in my script I used a variety of regular expressions so that I could get away with far fewer rules than the 65 domains and still cover all of the page content.

Script Properties Editor

Note that to do this I never had to write any code or do any advanced scripting.  It was all point, click, and form completion.  Hopefully by now you can see how easy it is to create Virtual Pages in KITE and have an idea of its possibilities.
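Although KITE handles all of this through forms, the underlying idea of covering many domains with a handful of regular expressions can be sketched in a few lines of Python. This is only an illustration: the rule list, the `virtual_page_for` function, and every bucket name other than “vp:TechCrunch” are my own assumptions. They are not the rules from the script above, and the pattern syntax KITE accepts may differ.

```python
import re
from urllib.parse import urlparse

# Hypothetical rules: (Virtual Page name, hostname pattern). First match wins.
# Each pattern matches the listed domain itself or any subdomain of it.
RULES = [
    ("vp:Facebook", re.compile(r"(^|\.)(facebook\.com|fbcdn\.net)$")),
    ("vp:Twitter", re.compile(r"(^|\.)(twitter\.com|twimg\.com)$")),
    ("vp:Google", re.compile(r"(^|\.)(google\.com|gstatic\.com|googleapis\.com)$")),
    ("vp:TechCrunch", re.compile(r"(^|\.)techcrunch\.com$")),
]

def virtual_page_for(url, rules=RULES, default="vp:Misc"):
    """Return the name of the first bucket whose pattern matches the URL's host."""
    host = urlparse(url).hostname or ""
    for name, pattern in rules:
        if pattern.search(host):
            return name
    return default

print(virtual_page_for("http://static.ak.fbcdn.net/some/script.js"))
print(virtual_page_for("http://www.techcrunch.com/2010/12/"))
print(virtual_page_for("http://tracker.example.org/pixel.gif"))
```

Four rules here cover any number of subdomains, which is the same trick that lets a short rule list stand in for 65 separate domains.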

How can I Benefit from Virtual Pages?

The obvious and immediate answer is that you can now isolate and monitor the performance of certain 3rd parties or subsections of your website. If Facebook slows down, you’ll know immediately and explicitly that this is the case, regardless of the overall impact on your performance. Moreover, you can easily track and directly report on the performance of these guys over time without needing to manually crunch the data and objects yourself. Another interesting possibility is that you could monitor your own additions to your site to see how they fare. Finally, another angle might be to isolate and monitor all the JavaScript that your site utilizes. There are countless ways that Virtual Pages might be used, and my list probably just scratches the surface. Have fun with it!

Posted by Ian Withrow on December 31, 2010 at 10:55 AM in Site Load Time, Testing Web Applications, Transaction Monitoring, Web Page Monitoring, Web Performance, Web Performance Testing, Web/Tech, Weblogs, Website Availability Monitoring, Website Monitoring, Website Monitoring Service, Website Monitoring Software, Website Performance Monitoring

Technorati Tags: 3rd party content, TechCrunch, Web Development, Web Monitoring, Web Performance

Elevate Performance Awareness in your Company by Publishing Keynote Graphs

By Ian Withrow

I’ll warn you right off this isn’t my typical post about a concept or a market trend.  I’m going to straight up show you how to do something cool in MyKeynote that’s available right now: export a saved graph.  Why you ask would anyone want to do this?  It’s the perfect way to raise visibility within your organization about the valuable information that your Keynote measurements generate.

In many of the organizations I talk to, Keynote data is an invaluable tool for IT and operational groups, but it’s a shame, in my opinion, that this information doesn’t get out to groups like product management and sales as readily as it should. So today I’m going to show you how to create a graph in MyKeynote that will provide an ongoing view into the performance of your website or application. I’ll also suggest some types of graphs different business functions might be interested in. Finally, I’ll spitball some ideas on how you might distribute this information.

First Create and Save a Graph

First, before we can become heroes (or villains who dare speak the truth) we need to figure out how to actually share a graph.  The first part of this is a routine MyKeynote process so stick with me or skip down to the next section if you already know how to save a graph.

To start, we head to the graph section and perform the following steps:

  1. Select the measurement we want to graph. This could be more than one measurement if we wanted to make a comparison. I’ve gone with the Keynote Business 40. The Keynote Business 40 Internet Performance Index (KB40) measures the average download time for the home pages of 40 important US-based business Web sites. To find out more go here.
  2. Pick the graph type. Time history is a great option to show performance over time.
  3. The time period is up to you, but it’s important to use a relative period if you want your graph to have permanent relevance.
  4. Finally I’ve elected just to show performance to keep things simple in this post.
  5. OK, let’s generate the graph. This graph is a big one with a lot of data and measurements in it, so it takes some time to generate. Keep in mind how long your audience will wait when you set up your graph.

Graph Creation v2

Once generated, save the graph using the menu in the top right hand corner.

Save Graph

The next step is important not to mess up, so read carefully. Name your graph whatever you want and click save. You’ll get a dialog window like the one below. If you want a ‘relative’ graph you need to choose “Cancel”.

Relative Graphs

Retrieving the URL for your Graph

Ok now we are done with MyKeynote and for the experienced users out there you may note that we haven’t done anything unusual or new yet.  Now for the trick!  Keynote actually publishes your saved graphs in two heretofore undocumented RSS feeds.  One feed publishes the entire MyKeynote graph page, complete with legend and the other publishes just the graphs themselves.  Those feeds are:

  1. For the full graph page: http://my.keynote.com/newmykeynote/mykeynoterss.do
  2. For just the graph:  http://my.keynote.com/newmykeynote/mykeynoteembed.do

You’ll find that visiting one of these feeds requires you to log in to your MyKeynote account. Do so with a browser like IE or a feed reader that can handle authenticated RSS feeds and you’ll see something like this.

Graph RSS

The URL behind Keynote Business 40 and any other graph is permanent (as long as the graph is saved) and can be requested by clients without logging in. Note that you can also get this link from the MyKeynote home page “Saved Graphs” widget. There you can find an RSS feed button that will provide the same view as above. Below is an example of what the link provides:

KB40
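If you would rather pull the graph links out of a feed with a script than copy them by hand, something like the following Python sketch could be a starting point. It assumes the feed is ordinary RSS 2.0 with a title and link per item, which I have not confirmed for the MyKeynote feeds. The sample feed text, the `graphs.example.com` URLs, and the `saved_graph_links` function are all made up for illustration, and the sketch skips the login step entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed body, assuming a plain RSS 2.0 layout.
# The real MyKeynote feed may be structured differently.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Saved Graphs</title>
    <item>
      <title>Keynote Business 40</title>
      <link>https://graphs.example.com/saved/kb40</link>
    </item>
    <item>
      <title>Checkout Flow</title>
      <link>https://graphs.example.com/saved/checkout</link>
    </item>
  </channel>
</rss>"""

def saved_graph_links(feed_xml):
    """Map each item's title to its link."""
    root = ET.fromstring(feed_xml)
    return {
        item.findtext("title", default="").strip():
            item.findtext("link", default="").strip()
        for item in root.iter("item")
    }

links = saved_graph_links(SAMPLE_FEED)
for title, url in links.items():
    print(f"{title}: {url}")
```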

Putting your Data to Work

This is a handy feature to be sure, but how best to make use of it? Well, at the simplest level you could distribute the link internally to interested parties. Better might be to embed this page within another page, such as your company’s intranet. Perhaps the cleverest idea I’ve encountered so far is to actually serve different graphs based on the organizational function of the user logging into an internal knowledge base site.

Presumably different groups want to know different things.  A product manager or marketing manager needs to know how their key pages or transactions are performing and for a large company there may be multiple product areas with different interests.  Developers may want to track their pages to see the impact of any code changes.  In contrast sales may wish to see the company’s page benchmarked against key competitors.  Any outage in a competitor’s site can provide an immediate piece of FUD to use.  Similarly a problem in their site is a heads up that they may be getting customer complaints.  As you can see timescale, graph type, and level of detail can easily vary based on the user’s interest.
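The idea of serving a different graph per organizational function can be as simple as a lookup table. Here is a minimal Python sketch; the role names, the `intranet.example.com` URLs, and the `graph_for` function are hypothetical placeholders, and in practice each URL would be the permanent link of one of your saved graphs.

```python
# Hypothetical mapping from organizational function to a saved-graph URL.
GRAPH_BY_ROLE = {
    "product": "https://intranet.example.com/graphs/key-pages",
    "development": "https://intranet.example.com/graphs/release-impact",
    "sales": "https://intranet.example.com/graphs/competitor-benchmark",
}

# Fallback for anyone whose function has no dedicated graph.
DEFAULT_GRAPH = "https://intranet.example.com/graphs/overview"

def graph_for(role):
    """Pick the graph to embed for a user's function, ignoring case and padding."""
    return GRAPH_BY_ROLE.get(role.strip().lower(), DEFAULT_GRAPH)

print(graph_for("Sales"))
print(graph_for("finance"))
```

Keeping a default entry means a new department still sees something useful before anyone has built a graph for them.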

As mentioned earlier, it’s important to consider load time. Generating graphs of measurements from many agents over a long period of time can take a few minutes. You may want to limit the scope of your graphs to something that can be quickly generated to maximize the amount of attention it gets from viewers. A drill down graph can easily be referenced for users who are willing to wait for more details.

This feature is new and we see a lot of areas to make it even better.  Let us know how you make use of this feature and what improvements you think are most important to improving its impact.

Posted by Ian Withrow on December 16, 2010 at 04:42 PM in Application Performance Testing, Site Load Time, Transaction Monitoring, Web Page Monitoring, Web Performance, Web Performance Testing, Web/Tech, Weblogs, Website Availability Monitoring, Website Monitoring, Website Monitoring Service, Website Monitoring Software, Website Performance Monitoring

Technorati Tags: Cloud, Internet, Keynote, Performance, Web

Adoption rate of 3rd party web content

A colleague of mine pondered the other day about the prevalence of 3rd party content in web sites. This is a great question because the performance of 3rd party content has a substantial impact on user experience; unfortunately, it is typically negative. But to really answer this question accurately you need to define what subset of sites you care about. Even if you could analyze the 200 million or so domains out there, this is probably not what most people are interested in. So I modified the question to: what portion of successful web sites have 3rd party content? Happily, my colleague was good enough to provide a white paper produced by anti-malware company Dasient that recently studied the Fortune 500 on this very topic.

First, before we look at their findings let’s define 3rd party content.  That’s basically any content that a user’s client receives that wasn’t provided by your domain.  For example: Ads, APIs/Widgets/etc, and analytics all count but having a separate domain for your images does not.
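To make that definition concrete, here is a minimal Python sketch of the test. It treats a resource as 1st party if its host is one of your own domains or a subdomain of one, so a separate image domain you own does not count as 3rd party. The `is_third_party` function and the `example` domains are invented for illustration; this is not how Dasient or KITE classify content.

```python
from urllib.parse import urlparse

def is_third_party(resource_url, first_party_domains):
    """True if the resource's host is not one of our domains or their subdomains."""
    host = (urlparse(resource_url).hostname or "").lower()
    for domain in first_party_domains:
        domain = domain.lower()
        if host == domain or host.endswith("." + domain):
            return False
    return True

# Hypothetical site that serves pages from one domain and images from another.
OWN_DOMAINS = {"example.com", "example-images.com"}

print(is_third_party("http://www.example.com/index.html", OWN_DOMAINS))
print(is_third_party("http://img.example-images.com/logo.png", OWN_DOMAINS))
print(is_third_party("http://ads.example.net/banner.js", OWN_DOMAINS))
```

The list of owned domains has to be supplied by someone who knows the site, since nothing in a URL says who operates it.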

Back to the Fortune 500: Dasient found that 75% of companies had what they called 3rd party JavaScript and 42% had advertising. Surprisingly, even many manufacturing companies had advertising. See the graphics below for the breakout by industry vertical.

Fortune 500 3rd party JavaScript take rate by vertical:

3rd party Java script widgets

Fortune 500 Advertising take rate by vertical:

Ads Fortune 500

Here is a link to the survey if you want to read their analysis, but one thing struck me. You could use the different behaviors of verticals to explain the results behind these charts, but you could also use traffic. Verticals that seem like they would have more heavily trafficked sites were far more likely to use 3rd party content of one form or another.

To investigate further, I decided to do my own little mini-investigation of highly trafficked websites. To do so, I started with Google’s US top 100 list of most visited web properties. I then decided that everyone who the list said had advertising was automatically on the ‘3rd party content present’ list, except for companies actually in the web advertising business, like Google. I also exempted sites that I simply didn’t believe had any advertising despite what Google said, like the NIH. To inspect the remainder quickly and efficiently I needed to make use of a web development tool. KITE has a handy feature whereby you can group the content of a particular page by domain. Then it’s a simple task of visually inspecting the domains to see if any come from a 3rd party. As an aside, KITE has this feature so you can set up Keynote monitoring and alerts for 3rd party content, but I digress. Below is a KITE screen shot with Yahoo’s content sorted by domain.

Kite Capture 1

As you can see, while most content at Yahoo comes from them, there is one object that comes from a third party, underlined in red. In the end I found only two players who didn’t have 3rd party content: a few Google properties and Wikipedia, which is a non-profit and happens not to use 3rd party analytics JavaScript. Even more traditional organizations in this list, like HP.com or NIH.gov, had 3rd parties in the mix. I then expanded to Google’s global 1000 list and, while I wasn’t masochistic enough to check everyone, the trend held at the top and the bottom of the list.

So what have we learned? The vast majority of top companies in the US have 3rd party content on their websites, and pretty much any successful site will have it. We have to be careful here to get cause and effect straight, though. I hypothesize that the virtues that have helped to make top web properties so successful have led them to embrace 3rd party content. It’s certainly not a consumer-only thing. Even sites like Cisco.com, which couldn’t be more business oriented, use 3rd party content. This all makes sense. As the public cloud (internet) gets bigger and more diverse, you are going to want to leverage specialists to deliver rich content, integrate with 3rd parties like Twitter that can help get the word out, and analyze the visitors to your page carefully. If you don’t, then you are likely to miss out on traffic and repeat visitors. So if the question is whether successful websites use third party content, then the answer is yes.

Posted by Ian Withrow on December 08, 2010 at 05:06 PM in Web Page Monitoring, Web Performance, Web Performance Testing, Web/Tech, Weblogs, Website Availability Monitoring, Website Monitoring, Website Monitoring Service, Website Monitoring Software, Website Performance Monitoring

Technorati Tags: 3rd party web content, cloud, internet, web monitoring, web performance

Net Neutrality: Implications for Cloud Performance

In the view of its most passionate advocates, the Internet is all about "openness" - which can be loosely defined as equal access to information at any time. As has been widely reported, last week a Federal Appeals Court ruled that the FCC could not regulate Comcast's ability to control the web content that comes through their network. Many feel that this could, over time, pose a serious threat to the openness of the Internet.

The concept of "net neutrality" came to the fore in 2008 in response to Comcast secretly (initially) limiting bandwidth for BitTorrent traffic (among other sites). The FCC mandated that ISPs handle all data equally. Comcast decided to appeal this, and won the appeal.

Comcast's complaint is that some applications and websites generate a traffic load so great that it can drag down the entire system, and so the ISP and cable company wants to handle different types of traffic with different priorities.

If we use the common highway analogy, any high-bandwidth traffic (not just Peer to Peer (or P2P) networks) is something like truck traffic. It is fair to suspect Comcast and other network infrastructure companies will turn their infrastructure into a toll road, where half of the lanes are devoted to trucks and the other half to cars only. If you are driving a truck, the cost will be considerably higher. Comcast has said that they have no plans to discriminate against high-volume traffic. Color me skeptical, as the court's ruling clears the way for creating different tiers on the Internet, some slow, some fast.

The implications of the appeals court ruling are wide-ranging. The DC Court of Appeals didn't address Comcast's arguments as much as it did the FCC's jurisdiction in this matter. In short, they said that the FCC didn't have sufficient enforcement authority.

However, high-speed Internet falls under an area of light jurisdiction known as "Title I - Information Services", and with a majority vote of the FCC's five commissioners, high-speed Internet may get moved to the category known as "Title II - Telecommunication services", a category that will allow for a higher level of regulation. This is a possibility that may indeed allow the FCC to regulate ISPs as stringently as they currently do the airwaves.

Let us assume that the appeals court ruling stands and that broadband providers will be allowed to control their bandwidth usage as they see fit. Where does this leave users and companies that push intensive applications? Consider this train of thought:

You run a Web site that is pushing a lot of data, or perhaps you have a cloud application that generates high volume. It doesn't matter if the volume is generated from a relatively small user base, or from lower amounts of data being sent and received by a site with tens of millions of users. If we are looking at a new world where the volume of bandwidth can change your overhead costs, the cost of performance will be measured in two new terms:

  • First, assume that your site is relegated to the "slow lane" to keep costs low. You have two primary ways to compensate - optimize the back-end, or the front-end.  As stated in an earlier column of mine, enhancing front-end performance, which is much more easily attainable than back-end enhancements, will impact the user experience as much as 40 to 50 percent. The alternative is a slow user experience that will cause a serious loss in business.
  • Second, assume your alternative is to pay extra for the "fast lane" of your broadband provider. Now you weigh the benefits of a faster user experience - everywhere the broadband network reaches - against the cost of the fast lane.

Web performance has always had an economic component for owners of Web sites. Now it gets even more clearly defined in the core network. Dollars and cents and the cost of a poor user experience - who expected the US judicial system to provide incentive to enhance cloud performance?

Posted by G.H. Brooks on April 16, 2010 at 06:46 PM in Web/Tech, Weblogs

Technorati Tags: cloud performance, web performance

Keynote Web Performance Watch Blog

A forum for discussion and commentary on technology, trends and
touchpoints of interest to the Web performance community.

Copyright © 1995-2012 Keynote Systems, Inc. All rights reserved.

