Saturday, December 29, 2007

Computing on the Edge of the Cloud

While you read these words, fork yourself a little mindlet and send it off a few years into the future. Imagine it reading a page just like this one - with a little extra something, a computational payload to be processed along with the rest of the content.

The concept is simple enough. The various volunteer computing projects have long since established the core principles: partition a huge problem into many smaller ones and shoo them off to thousands of volunteer client computers. As the results trickle in, they are swallowed, digested and - pardon my French - nicely excreted, so as to enlighten the operators, whatever their quest. Although we have found no aliens yet, nor discovered a cure for cancer, the method actually works. A lot of numbers are being crunched that way, each and every day.

The weak link is this voluntary thing. People participating in such programs must consciously decide to do so. Downloading and installing a distributed computing client (such as a screensaver) is trivial and well within the capabilities of the average patron. More often than not, though, it is one of those things we are likely to put off to some other day. Or completely forget about. In essence, the success (in terms of numbers crunched) of an endeavor like folding@home hinges on communicating the worthiness of the cause to the widest possible audience. On that vehicle called marketing.

So to harness at least a bit of that awe-inspiring potential of a billion computers, we might want to tackle the issue from a different angle. We might put it into banner ads.

In the Browser
That concept should not be too hard to digest, either. JavaScript has been with us for well over a decade, and while that supremely underestimated language is neither down-to-the-metal C nor all-your-meta LISP, its scripting nature makes it very well suited for wrapping up a minuscule piece of action along with the data in distress. These cadgets (code-and-data widgets) would be stamped out in huge numbers by industrial-strength code generators. As for reporting back, a simple XMLHttpRequest-based scheme should do the trick nicely.
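
To make the notion concrete, here is a minimal sketch of what such a cadget might look like. Everything in it - the job id, the payload, the /collect endpoint - is invented for illustration; the point is merely that the code, the data and the report-back channel fit comfortably in a few lines of plain JavaScript.

```javascript
// A cadget sketch: a slice of data bundled with the code that
// processes it, plus an XMLHttpRequest report-back. The job id,
// payload and "/collect" endpoint are all hypothetical.
var cadget = {
  jobId: "demo-001",
  data: [3, 1, 4, 1, 5, 9, 2, 6],
  // The code half: a deliberately trivial computation (sum of squares).
  compute: function () {
    var sum = 0;
    for (var i = 0; i < this.data.length; i++) {
      sum += this.data[i] * this.data[i];
    }
    return sum;
  },
  // Quietly post the result back to the collection server.
  report: function (result) {
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/collect", true);
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.send(JSON.stringify({ jobId: this.jobId, result: result }));
  }
};

// Only attempt the report where XMLHttpRequest exists (i.e. a browser).
if (typeof XMLHttpRequest !== "undefined") {
  cadget.report(cadget.compute());
}
```

The visitor never clicks anything; the page loads, the computation runs, the answer leaves.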

On top of that, of course, there is the ubiquity of a language born and bred in the browser. In terms of deployment, JavaScript is second to none.

Performance, however, looms. For this prospect to gain any traction, a bare minimum of speed is required, and JavaScript was never known to dispatch itself with abandon. There are probably a lot of reasons for that (though I suspect neglect to be chief among them), but judging by the testimony of Jeff Atwood over at Coding Horror, times may get very interesting out there in browserland. The various implementations are getting faster and more efficient with each release, and Atwood even speculates that this may become a significant battleground in an imminent browser war.

Which makes perfect sense in this AJAX-crazed society. There are a lot of tricks in the book, so in a few years, JavaScript should be able to hold its own with ease. As interpreted scripting languages go, JavaScript has quite a lot going for it, not least an open, formal specification, which enormously lowers the barrier to entry for new - or improved - implementations. I believe we are set for a rather interesting ride.

The Infrastructure
Thus satisfied with the state of affairs in the browser, we are left with the minor issues of distributing these cadgets and collating their feedback. The whole business of generating possibly billions of distinct cadgets, disseminating them redundantly across the web, and keeping track of what may or may not return, is daunting in a distinctly Darwinian sense.

Sort of like combining Google Adsense and Google Analytics.

Indeed, the constituents of this scheme lie mostly in well-trodden territory. While I'm certainly no expert on massively distributed computing, I reckon that the few remaining principles could be fleshed out fairly easily. This isn't exactly MapReduce, since the code and the data are bundled, but it isn't that different, either. Recombining a function with a set of arguments sounds a lot like splicing dynamic content with an HTML template, and how hard is that?

To ease the burden on cadget generators and networks alike, helper libraries might be deployed to a CDN like CacheFile. That way, only the essential computation would have to be generated and sent off to the browser.

The resulting cadget would end up alongside the bitmaps, banner ads, blogrolls and whatever else travels along with modern web pages. It would execute and quietly send back the result.

Sideshow or Big Picture?
Case by case, such a system could never compete with MapReduce in a Google datacenter, or an equivalent number of carefully orchestrated folding@home patrons. The overhead of plain-text transmissions to and from finicky, flickering scripting agents is likely too high. On the other hand, the accumulated effect of millions of JavaScript engines chipping away at some problem or other, even for short runs, should not be discounted out of hand. We are dealing with hitherto undisclosed laws of very large numbers.

To hark back to the issue of banner ads, one might imagine an ecology of computing projects and sponsorship leagues, vying for placements at blogs, portals and corporate websites alike. Personally, I might choose to flag my support for an effort like this - should anyone ever need some large-scale geological analysis - by sporting a cadget banner right here, on this blog.

Who knows, the dreaded Slashdot Effect might well end up being eagerly anticipated by some.
