November 08, 2008

Web 2.0 – Fatigue or Future?

At another recent “summit” in San Francisco to celebrate Web 2.0, a casual poll of the attendees revealed something to me that might portend a future direction for IT.

As I milled about a conclave of attendees from venture capital, self-designated Web 2.0 companies, and media (Ken Auletta was even in attendance) there was a lot of smiling and people watching. There wasn’t much else to do.

The attendees expressed skepticism that there was anything new being said behind the well-guarded doors of the conference’s main hall. There seemed to be fatigue with yet another IT event claiming to be a summit, especially about a topic that is getting a little long in the tooth.

There may be something to this after last Tuesday. As if by magic wand, or clicking of one’s heels, the world seems a much different place. Assumptions are being challenged and priorities reexamined. Let’s call it what it is: the Obama effect.

Is what we have today the best the IT industry can do? How many more photo sharing, social networking and shopping sites do we need?

If the reputation that VCs enjoy – deserved or not – of being the smart thinking set is any indicator, then the smart money has already moved on. Take a look at serial entrepreneur and investor Vinod Khosla – he calls his stable of companies the “renewable portfolio”.

“Innovative bottom up methods will solve problems that now seem intractable- from energy to poverty to disease.”

- Vinod Khosla

 
Whether we acknowledge it now or later, it is clear that a new era has begun. The best and brightest will once again think about making an impact for the good of the world.

It has already started. I saw a presentation given by Dara O'Rourke about a company he started. It was born from his curiosity about what the sunscreen he was applying on his young daughter contained. He found out and, to his horror, it wasn't good. Now you and I can benefit from his shock: www.goodguide.com. (It is interesting to note that Mr. O'Rourke was the hands-down audience favorite. What did the VC panel of "judges" think? That's another matter.)


Another presentation that caught our imagination was www.sungevity.com, a company bringing solar energy to the home.

There are many tough problems waiting to be solved that need our attention. Let’s get started, shall we?

September 04, 2008

Olympic Performance and Winning Web Sites

The 2008 Summer Olympics was an historic event in many ways. China was a remarkable host, ranking first by gold medal count with 51 and second overall with 100 medals in total.  CCTV in China is one of the largest television broadcasters in the world, and owns the online video rights to the 2008 Beijing Olympics. For viewers in mainland China, there was an unprecedented 3800 hours of Olympic coverage available. Although the site’s pages could, at certain points, take as long as 8 seconds to load, those pages were incredible. CCTV used a mix of its own proprietary player for the live streaming videos and extensive use of Flash and Flex for video content. The wide range of technology implemented to bring this event to viewers was significant and well executed, with the site’s availability remaining at 99.90%. Worth noting is that the CCTV streaming video was supplied to 77 other countries that did not have IOC rights granted, and the performance remained top notch.

France ended with 40 medals in total: 7 gold, 16 silver and 17 bronze. Members of the French public who wanted to keep up with the Olympics news would not have been disappointed. In particular, the AFP, French Athletes and L'Equipe websites stood out with very high availability and fast performance times. Canal Plus' performance times varied considerably during the day. Visitors to that site would have faced performance times of around 8 seconds during the early morning or late evening; however, during the day and at peak viewing times, the page was taking up to 20 seconds to download. This behaviour happened every day during the tournament, an indication that extra heavy load on the Canal Plus website was causing slower performance.

Germany ended in 5th place with 41 medals: 16 gold, 10 silver and 15 bronze. Gold for performance and availability clearly goes to the Google DE News site, with 100% availability and a 1.21-second average download time. ORF, ARD Sport and the Schenker sites also performed very well and did not leave their visitors frustrated. Visitors to Leichtathletik would have found that performance was slower in the morning than at other times of the day: the page took 10 seconds to load every morning around 9am, but was faster at other times.

Team GB ended in 4th with their best medal haul for many years. 47 in total, 19 golds, 13 silver and 15 bronze. Visitors to UK sites would have found that Eurosport and the UK Athletics sites would have provided the quickest and most available updates on the goings on in Beijing. Sky Sport's availability levels seem to drop during peak times. Keynote's measurement services which were used to monitor these sites have a 60 seconds default timeout value. On the occasions where Sky Sport's availability drops it signifies that the page did not complete downloading fully within that 60 seconds.
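The timeout rule described above can be made concrete with a small sketch. This is not Keynote's implementation: the `Measurement` type and `summarize` function are hypothetical names, and the sketch assumes that average download time is computed only over runs that completed within the timeout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TIMEOUT_SECONDS = 60.0  # the default timeout mentioned in the post


@dataclass
class Measurement:
    # Seconds taken to download the full page, or None if the attempt failed outright.
    download_seconds: Optional[float]


def is_available(m: Measurement, timeout: float = TIMEOUT_SECONDS) -> bool:
    """A run counts as available only if the page finished within the timeout."""
    return m.download_seconds is not None and m.download_seconds <= timeout


def summarize(measurements: List[Measurement]) -> Tuple[float, Optional[float]]:
    """Return (availability percent, average download seconds of available runs)."""
    if not measurements:
        raise ValueError("no measurements to summarize")
    ok = [m.download_seconds for m in measurements if is_available(m)]
    availability = 100.0 * len(ok) / len(measurements)
    average = sum(ok) / len(ok) if ok else None  # assumption: failures are excluded
    return availability, average
```

Under these assumptions, a page that needs 75 seconds to finish lowers availability in exactly the same way as a page that never responds, which is why slow peak-time downloads show up as availability dips.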

The United States finished the games with 110 total medals, 36 of them gold, 38 silver, and 36 bronze, a remarkably consistent finish. The internet experience was not nearly as consistent, but it was very exciting for viewers at home nonetheless.  From a performance standpoint, the US Track & Field site, with its slim graphics and pinpoint data offerings, was by far the fastest site, with an overall average download time of 1.43 seconds and 99.6% availability. Silver would go to either Fox Sports or the Associated Press. The Fox Sports site, managed by Microsoft, had an average download time of 3.00 seconds and 99.90% availability, whereas the Associated Press came in with an average of 1.80 seconds but 99.84% availability.  The Associated Press might have taken the gold had it not been for intermittent web server connectivity issues experienced between 10:28am and 11:55am (EDT) on August 17th.  Fox Sports too was in contention for the gold; however, download performance is greatly influenced by the composition and design of a web site.  Fox Sports provides a very rich, dynamic experience with interactive graphics, JavaScript, and links to video, all of which took longer for visitors to download; even so, three seconds is quite impressive. The most ambitious site in the US was, by far, the NBC Olympic site. In an unprecedented but highly anticipated move, NBC partnered with Microsoft and Limelight Networks to provide over 2,200 hours of live coverage on 20 simultaneous “channels” and more than 3,000 on-demand video content, using Microsoft’s Silverlight technology.  This is in addition to providing real-time updates on individual performances, medal counts, and commentary.  (This is in stark contrast to some of the fast downloading and more available sites.)  At first glance, the overall average download performance of 4.56 seconds and availability of 99.70% is quite respectable but, when digging deeper, it is truly impressive.
Beginning August 8th, the “size” of the initial home page grew by almost 40%, from around one megabyte to approximately 1.3 to 1.4 megabytes.  During this same time, the time it took NBC’s web servers to respond to online requests improved by almost 350%, virtually eliminating any performance impact that the larger files, images, and videos displayed on the NBC site would otherwise have had on download performance.  NBC was able to deliver a richer experience and more content without sacrificing the expectations of eager online visitors.

If IT were an Olympic sport, the gold would go to the worldwide teams delivering streaming video to viewers like us.

August 22, 2008

Silverlight and the Olympic sized streaming effort

When my wife complained about missing out on some of the Olympic action on TV due to her work schedule, I was quick to remind her about the availability of the games online. Part of the reason for my enthusiasm was that, as a Product Manager for the Streaming Perspective at Keynote, I got a peek into the efforts that go into making that happen. Streaming Perspective is being used extensively by media companies in various countries to ensure that the Olympic streams on their Web sites meet customer expectations.

Closer to home in the US, Keynote integrated Microsoft Silverlight into its streaming measurement infrastructure to help measure the performance aspects of the Silverlight streams as seen on NBCOlympics.com on MSN. More details in the press release.

How measurement helps:

As any performance enthusiast would know, ensuring the highest quality video experience for a live event requires not only making sure, through load testing, that the infrastructure is up to the task, but also continuously monitoring the performance of the video streams. This becomes more important as the monetization aspects of video streams on the Web become clearer. The demand for the videos online was impressive: NBC alone averaged more than 1.5 million viewers daily (sources: http://www.marketingcharts.com/interactive/fans-around-the-world-following-the-olympics-online-5703/ and http://www.beet.tv/2008/08/nbc-olympics-on.html).

Some technical details on the metrics:

Some of the key points that Streaming Perspective collects are:

 Availability from different geographical locations
 DNS Resolution Delta
 TCP/IP Connect Deltas
 Number of TCP Connections
 Total Sent & Received Bytes
 Total Number of Sent & Received Socket Buffers
 Connect Time
 Initial Buffering Delta
 Total Rebuffering Delta
 Total Rebuffering Count
 Average Rendered Frames Per Second

When you take a holistic view of the data collected, the performance bottlenecks become very obvious, allowing your team to take the necessary action to fix the issue.
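As a rough illustration of what looking across these metrics together can mean, here is a minimal sketch that names the stage contributing the most delay in each sample and counts how often each stage is the worst. The `StreamSample` record, its field names, and the choice of milliseconds are assumptions made for the example, not the Streaming Perspective data format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StreamSample:
    """One streaming measurement; field names are illustrative, times in milliseconds."""
    location: str
    dns_ms: float
    connect_ms: float
    initial_buffering_ms: float
    rebuffering_ms: float
    rebuffer_count: int
    avg_fps: float


def slowest_stage(sample: StreamSample) -> str:
    """Name the stage that contributed the most delay in this sample."""
    stages = {
        "dns": sample.dns_ms,
        "connect": sample.connect_ms,
        "initial_buffering": sample.initial_buffering_ms,
        "rebuffering": sample.rebuffering_ms,
    }
    return max(stages, key=stages.get)


def bottleneck_counts(samples: Iterable[StreamSample]) -> Counter:
    """Count how often each stage was the slowest across all samples."""
    return Counter(slowest_stage(s) for s in samples)
```

If one stage dominates the counts from a particular location, that is a reasonable place to start investigating.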

How can you measure from your desktop?

In addition to streaming video measurements, my colleague Bard Boston also managed to get Silverlight network/file metrics using KITE from his desktop (to get a free copy of KITE, go to http://kite.keynote.com). The unique aspect of this is that by using Streaming Perspective to monitor stream quality in a Silverlight player, and coupling that with the network metrics of the Silverlight HTTP delivery methodology, administrators have an unparalleled view into streams delivered over the internet. Here is an overview of the metrics collected.

For those of you who want to see it for yourself, Bard has listed the steps to get these metrics right from your desktop. Detailed instructions will soon be posted on the KITE community (http://kite.keynote.com/community).


Step 1: Download Silverlight Beta 2 (for video streaming) to your desktop. This can be found at either nbcolympics.com or Microsoft.com.

Step 2: Download and install KITE.

Step 3: Build a KITE script using the stream URL found in the NBC player.


August 06, 2008

Keynote KITE Goes Live

It's been a busy summer at Keynote as we've been preparing to launch KITE. We've been happy to get really positive early feedback from demos we've been doing. And for those who signed up for and were accepted into the early adopter program, download is available as of yesterday.  If you didn't get the info, send an email to: kite.labs@keynote.com.  (And if you're still interested in joining the KITE community, you can still sign up.)

Whenever you launch a new product you wonder how users will respond.  With KITE, what's impressed us is how easily folks have seen the need for a desktop tool that lets them test Web sites for performance.  It's like the old "the car that sells itself" ads.  Duh - of course.  Just because Web infrastructure is better understood today than a decade ago doesn't make it easier.

And from what we see from KITE registrations, there are many more departments - from dev to QA - that are interested in Web performance than you might have thought.  It ain't just for ops anymore, and isn't that better for us all?

Side Reading: For those of you looking to put KITE and your Web performance skills to the test, a new book is out by Andy King called "Website Optimization Secrets".  Steve Souders' 14-steps to fast Web sites is also must reading.

July 18, 2008

SaaS operations - the complexity of an airline logistics system

I was talking to our head of operations, Eric Stokesberry, just the other day. He said that we like to take pride in the fact that our global network (comprising Windows XP machines, actual mobile handsets, Linux boxes and the like in 2400 locations) executes almost 250 million Internet measurements every single day. That's pretty impressive, considering that Salesforce.com, the premier SaaS company in the world, does about 50,000 transactions per day.

It struck me that our SaaS operations system has all the complexity of a powerful, Sabre-like airline reservations system and an airport's logistics system. Our customers make reservations, through an automated scheduling process, to execute Web (or mobile) application/site test scripts, for, say, every 2 minutes. For Web performance measurements, the time it takes to execute the test script itself depends upon both connection speed and the construction of the application itself.

The speed of the connection that our measurement computers are connected to could be last mile DSL speeds, or it could be Tier1 broadband speeds - the customer has the choice. Just like airline customers decide to book themselves on economy or first class. If you're in first class, the lines to board the plane are going to be much shorter. In our world, test scripts can "board" much faster on broadband, because test scripts finish faster (than on DSL), and therefore more of them can execute in a given time period.

Sometimes (and you know this from your own experience, unfortunately) airlines overbook their planes, and passengers have to be rescheduled or routed to a different plane. The speed at which the airline does this directly relates to how satisfied its customers feel during the rescheduling process. In a similar way, if too many test scripts are taking, say, 5 minutes to execute, but they are supposed to execute every 2 minutes, overbooking happens. Other test scripts that were scheduled to run right after the first test completed now cannot start. The system begins to feel like an overbooked flight.
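The overbooking arithmetic can be sketched in a few lines. This is a deliberately simplified model, not a description of Keynote's scheduler: it assumes a single script with a fixed run time, scheduled at a fixed interval starting at time zero, and the function names are made up for the example.

```python
import math


def machines_needed(run_seconds: float, interval_seconds: float) -> int:
    """Minimum number of machines so that a scheduled run never has to wait."""
    return max(1, math.ceil(run_seconds / interval_seconds))


def backlog_on_one_machine(run_seconds: float, interval_seconds: float,
                           elapsed_seconds: float) -> int:
    """Scheduled runs that have not yet started on a single machine."""
    # Runs are scheduled at 0, interval, 2 * interval, ...
    scheduled = math.floor(elapsed_seconds / interval_seconds) + 1
    # A busy machine can only start a run at 0, run, 2 * run, ...
    started = math.floor(elapsed_seconds / run_seconds) + 1
    return max(0, scheduled - started)
```

With the numbers from the example (a 5-minute script scheduled every 2 minutes), `machines_needed(300, 120)` gives 3, and after one hour `backlog_on_one_machine(300, 120, 3600)` gives 18 runs still waiting for their turn.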

Fortunately, this is where Eric and his team come in. They have spare computers, connected and ready to go, which can be brought online in an instant. Tests get re-directed to their spare computers, and everything begins humming again. This is like an airline where, if the flight is overbooked, you show up at the gate, and magically they open another counter for you where your check-in is handled, and you are led to a new gate, with a new plane, and new staff.

Except you never even knew all this was happening in the background!

Now that's a good customer experience.

June 20, 2008

Announcing KITE for Early Adopters at O'Reilly Velocity Conference


I'm very excited about being at the O'Reilly Velocity Web Performance and Operations Conference in Burlingame, CA, on June 23-24. For those reading this post, I assume you are interested in the online experience and its performance.   Like me, I'm sure you are excited that we now have a dedicated O'Reilly conference addressing the topic of Web Performance and Online Operations.

At Velocity, I will be demonstrating, along with my colleague Abelardo Gonzalez, Keynote's Internet Test Environment (KITE) right after the opening session, on Monday, June 23, at 9:30 a.m. in the Plenary Sessions, Salons E-F, and participating in the Performance Metrics Panel led by John Rauser with Peter Sevcik (NetForecast), Eric Goldsmith (AOL), Eric Schurman (Microsoft), on Tuesday, June 24, at 5:15 p.m.

KITE is a product that is used by Keynote customers today.  But at Velocity we will be announcing the KITE Early Adopter program in which Keynote is allowing anyone in the world to sign up (at http://kite.keynote.com) to work with us to put this FREE product to use.  Early Adopters will have in their arsenal a really strong and sophisticated product to measure, test and diagnose the performance of Web applications and sites.

KITE is a new desktop-based test and measurement environment for recording, editing and analyzing the performance of Web sites across the Internet cloud that is intended to bridge the gap between web application developers, QA teams, performance analysts and web operations.

KITE enables Web developers, QA professionals and others, to execute rapid performance analysis and validation to measure the end user experience of next generation Web 2.0 applications that include AJAX and asynchronously downloaded content with point and click ease.  Scripts can be shared as benchmarks and to perform triage among all the web application life cycle groups, including developers, QA, performance analysts and Web operations/IT Departments.

In my next post, I will show how you can use KITE as a performance diagnostic tool for the Google App Engine app Tweetwheel.com!

Here's the performance error I have seen on Tweetwheel.com (click on the image to enlarge):

Tweetwheel

Here is how KITE shows the performance times by domain - tweetwheel.com, code.google.com, and ie7-js.google.com (click on the image to enlarge):

Tweetwheel2

If you are interested in joining the KITE Early Adopters Program please sign up for our release at http://kite.keynote.com/. See you at Velocity on Monday!

How to Choose Tools for Building on Cloud Platforms

You can go about this in a couple of different ways.  One is to bring in people who understand application construction or development, often independent developers, and give them tools to help figure out where the application problem is.  There are a variety of tools being used in the marketplace to do this that will give a developer some sense of where the response time of these applications is being spent. My advice to anyone building applications on these cloud platforms would be that they should look at functional, performance and scalability testing tools.

There are many functional testing tools on the marketplace, as this is a well-established practice - see http://www.opensourcetesting.org/functional.php. In terms of performance testing, a developer should look for a software product that allows them to break down response times by domain. This is a must-have.  For example, in the previous Tweetwheel example it would be necessary to break down the response times for Tweetwheel.com, the domain for the Twitter API, and the Google.com domains.
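To show what a breakdown by domain amounts to, here is a minimal sketch in Python. It assumes you already have a list of (URL, seconds) pairs for the requests a page made, for instance exported from whatever measurement tool you use; the `time_by_domain` function and the sample URLs and timings in the usage note are invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def time_by_domain(requests):
    """Sum response time per host for an iterable of (url, seconds) pairs."""
    totals = defaultdict(float)
    for url, seconds in requests:
        host = urlsplit(url).hostname or "unknown"
        totals[host] += seconds
    # Slowest domain first, so the likely culprit is at the top.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Given hypothetical timings of 0.5 seconds for a page on www.tweetwheel.com, 2.0 and 1.5 seconds for two calls to twitter.com, and 0.25 seconds for a script on code.google.com, the result lists twitter.com first with 3.5 seconds, which is the kind of signal you need before deciding whose problem it is.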

For scalability testing, I would look at tools that are easy to use, so that you can easily set up a load test of 1000 or 5000 virtual users and assess whether your application starts breaking down at that load or still performs well.  You can buy many software or SaaS products that allow you to do these tests. Of course, I work for a company that provides some of these products – Keynote Systems.
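For a feel of what such a test does under the hood, here is a toy sketch using only the Python standard library. It is not a substitute for a real load testing product: `run_load_test` is a hypothetical helper, `request_fn` stands in for whatever performs one user's request against your application, and the thread pool caps how many virtual users are actually in flight at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(request_fn, virtual_users, max_workers=100):
    """Call request_fn once per virtual user, concurrently, and summarize the results."""
    def one_user(user_id):
        start = time.perf_counter()
        try:
            request_fn(user_id)
            ok = True
        except Exception:
            ok = False  # any exception counts as a failed request
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(one_user, range(virtual_users)))

    latencies = sorted(seconds for ok, seconds in results if ok)
    errors = sum(1 for ok, _ in results if not ok)
    return {
        "users": virtual_users,
        "errors": errors,
        "error_rate": errors / virtual_users if virtual_users else 0.0,
        "median_seconds": latencies[len(latencies) // 2] if latencies else None,
        "max_seconds": latencies[-1] if latencies else None,
    }
```

Running it at 1000 and then 5000 virtual users and comparing the error rate and median time between the two runs gives a first indication of whether the application degrades under load.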

June 18, 2008

Responsiveness, Availability and Usability of Cloud Computing Platforms

In order to solve the cloud computing performance dilemma, I think about breaking it down or examining the performance at the application construction level.  To diagnose the problem you have to break down the responsiveness, availability and usability of the various components that are being served up by the different cloud platforms.

A developer has to examine the application code and ask: "Where in this application are we calling a service that is served by Twitter, or served by Google App Engine? How many seconds is that taking, and is it responsive or not?"

Let’s say the Tweetwheel developers found no performance problems with the Twitter API. Then the next step is examining data on the Google App Engine and asking: is some component (JavaScript file, image, database, etc) of the Google App Engine responding quickly? Is it taking one second or three seconds?  After all this will determine the overall responsiveness of the Tweetwheel application.
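One simple way to answer the "how many seconds is that taking" question from inside the application is to wrap each outbound call in a timer. The sketch below is an assumption about how a developer might do this in Python, not how Tweetwheel is written; `ServiceTimer` and the service labels are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ServiceTimer:
    """Accumulates elapsed time per named service."""

    def __init__(self):
        self.seconds = defaultdict(list)

    @contextmanager
    def timed(self, service):
        """Record how long the wrapped block took under the given service name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the wrapped call raises, so failures are timed too.
            self.seconds[service].append(time.perf_counter() - start)

    def totals(self):
        """Total seconds spent per service."""
        return {name: sum(times) for name, times in self.seconds.items()}

    def slowest(self):
        """Name of the service with the largest total time, or None if empty."""
        totals = self.totals()
        return max(totals, key=totals.get) if totals else None
```

A developer would write `with timer.timed("twitter_api"):` around the Twitter call and `with timer.timed("app_engine"):` around a datastore or file fetch, then compare `timer.totals()` to see which component is taking one second and which is taking three.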

This is just one approach.

How do you choose the right products for testing the responsiveness, scalability, and usability of web apps that are built on cloud computing platforms?

June 17, 2008

We may be experiencing difficulties with our cloud computer...

I think most application developers who will be building on cloud computing platforms will have to figure out who is both responsible and accountable for the performance of an application.

I think about that all the time.

An example of a really interesting cloud computing platform is Google App Engine (http://code.google.com/appengine/), which essentially consists of an application runtime engine and database hosted on Google servers.  Google App Engine can provide storage and processing power for anyone who wants to write a web application.

I took a look at Google’s application gallery (http://appgallery.appspot.com/) and viewed dozens of applications that people have written and posted already. At first blush these looked very much like Facebook type applications.  And when I say Facebook type applications I mean applications that are not used for a serious business purpose, but mostly for fun. 

An example of such an application is Tweetwheel, http://www.tweetwheel.com, where you “Find out which of your Twitter friends know each other!”  To participate, you enter your twitter id (mine is “vikchaudhary”) and a huge social graph is created that displays spokes of all the people that are following you and the people who are following them.  Tweetwheel is built with Google App Engine and Pylons.

But interestingly, I went to Tweetwheel many times to use it, and it often hung when loading my friends and this message appeared: “Twitter is taking too long to respond. If it doesn’t respond soon, refresh and try again.” Clearly the publicity that Tweetwheel has gotten from Google App Engine’s web site was tremendous. In fact, it appears they are getting so much attention that their application may be getting swamped, or maybe it’s Twitter’s own site that’s getting swamped. 

This made me wonder – if there indeed is a performance problem with Tweetwheel, how does one dig further to find out where the problem lies? It appears that diagnosing the problem is not that simple since there are three players in the organization of Tweetwheel:  the Tweetwheel website, the Twitter API function that is running on the Twitter servers, and the Google code that is running on Google App Engine. 

So we already have a United Nations of sorts – in the construction of the application.  Who is responsible for maintaining peace and stability among the applications?  When an application goes down, is it Tweetwheel that needs to look at it, is it the Twitter guys with the API or is it Google App Engine? 

What can we do about diagnosing who is responsible and how to fix performance problems for apps built with these new cloud computing platforms?

June 05, 2008

Be first in the last mile

Ah, the early days of the commercial Internet. Broadband was a pipedream, dial-up ruled the roost. “Lighter pages!” was the clarion call from VPs of engineering departments. Anyone remember @Home? (Later becoming Excite@Home – a most unfortunate combination of corporate brands.) They were the first to seriously tackle the broadband problem (sorry, ISDN doesn’t count) and like many firsts ended up in the dust bin of history. Don’t let that happen to your online retail business, I’ll explain in just a moment by way of an example.

10 years later broadband is plentiful and all our problems have been solved – or have they? Over the last few weeks if you were shopping for that cashmere sweater (yes it’s summer and that’s why it’s getting colder in San Francisco) at several online retailers  chances are that you never got very far online. The performance problems at some sites would have made you forget $4.50 gasoline prices, grab your car keys and head down to the local mall.

There are of course multiple culprits. If you’re a business manager or marketer reading this and decide to go down the hall with your flashlight to ask your techies (they don’t like lights, very green of them), they might say something like “it’s the Internet stupid”. But I say, don’t buy it. Sure there are network slowdowns but week after week? And only for your Web site?

Some pretty smart people at Google have looked into the last mile performance problem. And they will tell you that as much as 98% (those guys like precision) of the wait time could be caused not by the back-end but . . . wait for it. . . the front-end. Yup, the browser.

So 10 years later does the clarion call still remain “lighter pages!”? It might, or it might just be smarter scripting techniques. But rather than guessing, figure out if you have a problem by looking into your last mile performance.
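A quick back-of-the-envelope check is to compare how long the HTML document itself took to arrive with how long the whole page took to finish. The sketch below uses one common definition, which is an assumption here and not necessarily the one behind the 98% figure: everything after the HTML document arrives is counted as front-end time.

```python
def front_end_share(html_seconds, total_seconds):
    """Percent of total page load time spent after the HTML document arrived."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= html_seconds <= total_seconds:
        raise ValueError("html_seconds must be between 0 and total_seconds")
    return 100.0 * (total_seconds - html_seconds) / total_seconds
```

For example, a page whose HTML arrives in 0.5 seconds but which takes 10 seconds to finish loading spends 95% of its time on the front end by this definition.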

If you don’t know how you compare against the industry, just spend a few minutes in search land and your questions will be answered (psst, it’s no secret that Keynote publishes benchmarks for free and also has a nifty service to determine last mile performance).

Now that example I promised.  Remember that other pioneering company, Friendster?  What happened to them?   When asked that question by The New York Times (free registration required), John Doerr's response was pretty sobering:

“Everything boiled down to our inability to improve performance.”