A 100-minute outage on the web platform of Google’s Gmail service wasn’t just an inconvenience for millions of users; it was also an illustration of the risks associated with cloud computing.
The idea behind cloud computing is that users outsource many of their computing needs via the Internet, using only the applications they need when they need them, instead of running software they own directly on their own desktop computers or corporate servers. It is the hidden genius of today’s Internet: it serves anyone who has ever run an online search, and it has helped many companies manage their resources better.
Online retailer Amazon has a web services unit that helps companies run everything from online language learning to tax preparation. Salesforce.com offers customer relationship management services via its own cloud system.
Google’s entire business model is predicated on users going online, and the company is increasingly focused on inducing people to move their computing needs from desktop-bound software to web-based programs.
Cloud computing is also a growth area for Google’s business, and part of its increasingly direct challenge to Microsoft. In one of its rare forays into conventional paid advertising, the company has even bought billboard space to promote its “Apps at work” suite for business, a service that costs $50 (U.S.) per user account.
But the cloud isn’t an amorphous blob of good computing feeling. It’s a complex world driven by human and machine calculations about how best to allocate resources among servers, storage units, content, and end users. Sometimes the cloud disappoints. Here’s the account of what happened to Gmail from Ben Treynor, Google’s vice-president of engineering and “site reliability czar”:
"we took a small fraction of Gmail's servers offline to perform routine upgrades. This isn't in itself a problem — we do this all the time, and Gmail's web interface runs in many locations and just sends traffic to other locations when one is offline."
"we had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct web queries to the appropriate Gmail server for response."
"Within minutes nearly all of the request routers were overloaded."
In other words, Google miscalculated. That’s especially troubling. Of all technology companies, Google ought to set the gold standard for cloud computing: it is likely the most cash-rich and well-capitalized, and the most capable of “scaling” its systems to handle large numbers of users. If Google can make a mistake or misallocate resources, larger questions arise about whether it, or anyone else, can offer cloud computing to clients (business or personal) who need reliability.
Treynor concluded his blog post by saying, “Gmail remains more than 99.9% available to all users, and we're committed to keeping events like today's notable for their rarity.”
But for anyone who wants to pull a presentation from the cloud for a big business meeting, or who has stored key information in their e-mail account, 99.9 per cent availability isn’t good enough.
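For a sense of scale, the arithmetic behind a “99.9 per cent” figure can be sketched in a few lines of Python. This is an illustrative back-of-the-envelope calculation, not Google’s methodology: the helper name `downtime_budget_minutes` is hypothetical, and the 30-day and 365-day measurement windows are assumptions, since Treynor’s post does not say over what period availability is measured.

```python
# Back-of-the-envelope sketch: how much downtime does "99.9% available" permit?
# Assumption: availability is measured over a fixed window (30 or 365 days);
# the source does not state which window Google actually uses.

MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability: float, window_days: float) -> float:
    """Maximum minutes of downtime allowed in a window at a given availability."""
    return (1.0 - availability) * window_days * MINUTES_PER_DAY


OUTAGE_MINUTES = 100  # length of the Gmail web outage described above

if __name__ == "__main__":
    for label, days in [("30-day month", 30), ("365-day year", 365)]:
        budget = downtime_budget_minutes(0.999, days)
        share = OUTAGE_MINUTES / budget  # fraction of the budget one outage consumes
        print(f"{label}: {budget:.1f} min allowed; the outage used {share:.0%} of it")
```

By this rough measure, 99.9 per cent allows about 43 minutes of downtime in a 30-day month and about 526 minutes in a year, so a single 100-minute outage is more than twice the monthly allowance yet under a fifth of the yearly one. That is how a service can suffer a notable outage and still report better than 99.9 per cent availability.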
A trip to the clouds will continue to entail a risk that some may find too much to bear.
