Cloudy With a Chance of Scale

Recently I met with a company looking for some long term advice on building out their database infrastructure. They had a pretty good mix of scaling vertically for overall architecture, while scaling horizontally by segmenting customers into their own schemas. The had a failover server in place, but as the business was growing, they were looking at ways to better future proof operations against growth, and also build more redundancy into the system, including multi-datacenter redundancy. After talking with them for a bit, I drew up a radical solution: “To The Cloud!” I think I am generally considered a cloud skeptic. Most of how we are taught to scale systems and databases from a technical standpoint doesn’t work well in the cloud. I mean, if you have a good methodology for problem solving you can make a lot of improvements in any environment; we’ve certainly seen that with customers we’ve worked with at OmniTI. But if you are just into looking at low-level numbers, or optimizing performance around disk i/o (generally the most common problem in databases), those methods just aren’t going to be as effective in the cloud. That is not to say that if you are willing to embrace some of the properties of what makes for successful cloud operations, then I think it can be a pretty successful strategy. One of the key factors which I often see overlooked in most “will the cloud work for me” discussions is whether or not your business lends itself well to the way cloud operations work. In the case of this particular client, it’s a really good match. First, this company already segments their customer data, so there is a natural way to split up the database and operations. Second, they don’t do any significant amount of cross customer data, which means they don’t have to re-engineer those bits to make the switch. Further, the customers have different dataset sizes, different access patterns, and different operational needs, and most importantly, they pay different rates based on desired levels of service. This matches up extremely well with a service like Imagine that, instead of buying that next bigger server, instead of setting up cross-data-center WAL shipping, instead of buying machines in a different colo somewhere across the country, instead of all that, they could instead buy individual servers with Heroku, sized according to customer data size and performance needs. For smaller customers you start with minimal resources, and as the customer grows, you dial up the server instance size. Furthermore, you get automated failover setups, and an also easily store backups in a different datacenter based on given regions. You can even work to match customers to different availability zones based on their users endpoints. And if you want to do performance testing or development work, you can create copies of the production databases and hack away. These are the kinds of services OmniTI has built on top of Solaris, Zones, and ZFS, and believe me they will change the way you think about database operations. Of course, it’s not all ponies and rainbows. You still have to move clients on to the new infrastructure, but that should be pretty manageable. You’d also need to build out some infrastructure for monitoring, and you’ll need to be able juggle operational changes. Some of this is not significantly different; pushing DDL changes across schemas is pretty similar to doing it across servers, but you’ll probably want to create some toolsets around this. Also you’re less likely to bear fruit from micro-optimizations; that doesn’t mean that you throw away your pgfouine reports, but the return on performance improvements and query optimization will be much lower. That said, if you can get good enough performance for your largest customers (and remember, you’ll have easy capabilities for distributing read loads), you end up an extremely scalable system, not just technically, but from a business standpoint as well. If you aren’t building this on top of Heroku’s Postgres service, the numbers will probably look different, but the idea that you’ve matched your infrastructure capabilities to a significant range of possible growth patterns should be compelling for both suits and the people who maintain the systems.