I’ve just had a really pleasant experience looking at Heroku – the ‘cloud application platform’ from Salesforce.com but it’s left me wondering where it fits in.
A mate of mine who works for Salesforce.com suggested I look at Heroku after I told him that I’d had some good and bad experiences with Google’s AppEngine and Amazon’s EC2. I’d been looking for somewhere to host some Python code that I’d written in my spare time and I had looked at both AppEngine and EC2 and found pros and cons with both of them.
As it turns out it was a good suggestion because Heroku’s approach is very good for the spare-time developer like me. That’s not to say that it’s only an entry level environment – I’m sure it will scale with my needs, but getting up and running with it is very easy.
Having had some experience of the various platforms, I’m wondering where Heroku fits in. My high-level thoughts…
Amazon’s EC2 – A Linux prompt in the sky
Starting with EC2, I found EC2 the simplest concept to get to grips with but by far the most complex to configure. For the uninitiated, EC2 provides you with a machine instance in the cloud which is a very simple concept to understand. Every time you start a machine instance you effectively get a Linux prompt, of varying degrees of power and capacity, in the sky. What this means is that you have to manually configure the OS, database, web infrastructure, caching, etc. This is excellent in that it gives unrivalled flexibility and after all, we’ve all had to configure our development and test environment anyway so we should understand the technology.
But imagine that you’ve architected your system to have multiple machines hosting the database, multiple machines processing logic and multiple web servers managing user load; you have to configure each of these instances yourself. This is non-trivial and if you want to be able to flexibly scale each of the machine layers then you own that problem yourself (although there are after market solutions to this too).
But what it does mean is that if you’re taking a system that is currently deployed on internal infrastructure and deploying it to the cloud, you can mimic the internal configuration in the cloud. This in turn means that the application itself does not necessarily need to be re-archtected.
The sheer amount of additional infrastructure that Amazon makes available to cloud developers (Queuing, cloud storage, MapReduce farms, storage, caching, etc) coupled with their experience of managing both the infrastructure and the associated business models, makes Amazon an easy choice for serious cloud deployments.
Google AppEngine – Sandbox deployment dumbed down to the point of being dumb?
So I’m a fan of Google, in the same way that I might say I’m a fan of oxygen. It’s ominpresent and it turns out that it’s easier to use a Google service than not – for pretty much all of Google’s services. They really understand the “giving crack cocaine free to school kids” model of adoption. They also like Python (my drug of choice) and so using AppEngine was a natural choice for me. AppEngine presents you with an abstracted view of a machine instance that runs your code and supports Java, Python or Google’s new Go language. With such language restrictions it’s clear to see that, unlike EC2, Google is presenting developers with a cosseted, language-aware, sand-boxed environment in which to run code. The fact that Google tunes the virtual machines to host and scale code optimally is, depending on your mindset, either a very good thing or close to being the end of the world. For me, not wanting, knowing how to, or needing to push the bounds of the language implementation, I found the AppEngine environment intuitive and easy. It’s Google right?
But some of the Python restrictions, such as not being able to use modules that contain C code are just too restrictive. Google also doesn’t present the developer with a standard SQL database interface, which adds another layer of complexity as you have to use Google’s high replication datastore. Google would argue, with some justification I’m sure, that you can’t use a standard SQL database in an environment when the infrastructure that happens to be running your code at any given moment could be anywhere in Google’s data centres worldwide. But it meant that my code wouldn’t port without a little bit of attention.
The other issue I had with Google is that the pricing model works from quotas for various internal resources. Understanding how your application is likely to use these resources and therefore arriving at a projected cost is pretty difficult. So whilst Google has made getting code into the cloud relatively easy, it’s also put in place too many restrictions to make it of serious value.
Heroku- Goldilock’s porridge too hot, too cold or just right?
It would be tempting, and not a little symmetrical, to place Heroku squarely between the two other PaaS environments above. And whilst that is sort of where it fits in my mind, it would also be too simplistic. Heroku does avoid the outright complexity of EC2 and seems to also avoid some of the terminal restrictions (although it’s early days) of AppEngine. But the key difference with EC2 lies in how Heroku manages Dynos (Heroku’s name for an executing instance). To handle scale and to maximise use of its own resources, Heroku runs your code only for the specific instance that it is being executed. After that, the code, the machine instance and any data it contained are forgotten. This means that things like a persistent file system or a having a piece of your code always running cannot be relied upon.
These problems are pretty easily surmountable. Amazon’s S3 can be used as a persistent file store and Heroku apps can also launch a worker process that can be relied upon to not be restarted in the same way as the other Dyno web processes.
Scale is managed intelligently by Heroku in that you simply increase the number of web and worker processes that your application has access to – obviously this also has an impact on the cost. Finally there is an apparently thriving add-on community that provides (at additional monthly cost) access to caching, queuing and in fact any type of additional service that you might otherwise have installed for free on your Amazon EC2 instance.
I guess the main conclusion of this simple comparison is that whilst Heroku does make deploying web apps simple, you can’t simple take code already deployed on internal servers and git commit it to Heroku.com. Heroku forces you to think about the interactions your application will have with its new deployment environment, because if it didn’t, your app wouldn’t scale. This is also true of Google’s AppEngine, but the restrictions that AppEngine places on the type of code you can run makes it of limited value to my mind. These restrictions do not appear to be there with Amazon EC2. You can simply take an internally hosted system and build a deployment environment in the cloud that mimics the current environment. But at some point down the line, you’re going to have to think about making the code a better cloud citizen. With EC2, you’re simply able to defer the point of re-architecture. And the task of administering EC2 is a full time job in itself and should not be underestimated. Heroku is amazingly simply by comparison.
Anyway, those are my top of mind thoughts on the relative strengths and weaknesses of the different cloud hosting solutions I’ve personally looked at. Right now I have to say that Heroku really does strike an excellent balance between ease and capability. Worth a look.