As we sit in the midst of record traffic and holiday rushes online, with people scrambling to get their gifts ordered and shipped before time runs out, I recently wrote a piece for Retail Info Sys News about best practices for monitoring web operations during the holiday rush. The folks at Circonus asked me to expand on that, which I did in this guest post on the Circonus blog. If you run web operations, do e-commerce, or are just wondering about what goes on behind the scenes, I’d encourage you to check it out.
Recently I met with a company looking for some long-term advice on building out their database infrastructure. They had a pretty good mix of scaling vertically for overall architecture, while scaling horizontally by segmenting customers into their own schemas. They had a failover server in place, but as the business was growing, they were looking at ways to better future-proof operations against growth, and also build more redundancy into the system, including multi-datacenter redundancy. After talking with them for a bit, I drew up a radical solution: “To The Cloud!” I think I am generally considered a cloud skeptic. Most of how we are taught to scale systems and databases from a technical standpoint doesn’t work well in the cloud. I mean, if you have a good methodology for problem solving you can make a lot of improvements in any environment; we’ve certainly seen that with customers we’ve worked with at OmniTI. But if you are just into looking at low-level numbers, or optimizing performance around disk I/O (generally the most common problem in databases), those methods just aren’t going to be as effective in the cloud. That is not to say it can’t work; if you are willing to embrace some of the properties that make for successful cloud operations, I think it can be a pretty successful strategy. One of the key factors which I often see overlooked in most “will the cloud work for me” discussions is whether or not your business lends itself well to the way cloud operations work. In the case of this particular client, it’s a really good match. First, this company already segments their customer data, so there is a natural way to split up the database and operations. Second, they don’t do any significant amount of cross-customer data access, which means they don’t have to re-engineer those bits to make the switch.
Further, the customers have different dataset sizes, different access patterns, and different operational needs, and most importantly, they pay different rates based on desired levels of service. This matches up extremely well with a service like postgres.heroku.com. Imagine that, instead of buying that next bigger server, instead of setting up cross-data-center WAL shipping, instead of buying machines in a different colo somewhere across the country, instead of all that, they could buy individual servers with Heroku, sized according to customer data size and performance needs. For smaller customers you start with minimal resources, and as the customer grows, you dial up the server instance size. Furthermore, you get automated failover setups, and you can also easily store backups in a different datacenter based on given regions. You can even work to match customers to different availability zones based on their users’ endpoints. And if you want to do performance testing or development work, you can create copies of the production databases and hack away. These are the kinds of services OmniTI has built on top of Solaris, Zones, and ZFS, and believe me they will change the way you think about database operations. Of course, it’s not all ponies and rainbows. You still have to move clients on to the new infrastructure, but that should be pretty manageable. You’d also need to build out some infrastructure for monitoring, and you’ll need to be able to juggle operational changes. Some of this is not significantly different; pushing DDL changes across schemas is pretty similar to doing it across servers, but you’ll probably want to create some toolsets around this. Also you’re less likely to bear fruit from micro-optimizations; that doesn’t mean that you throw away your pgfouine reports, but the return on performance improvements and query optimization will be much lower.
That said, if you can get good enough performance for your largest customers (and remember, you’ll have easy capabilities for distributing read loads), you end up with an extremely scalable system, not just technically, but from a business standpoint as well. If you aren’t building this on top of Heroku’s Postgres service, the numbers will probably look different, but the idea that you’ve matched your infrastructure capabilities to a significant range of possible growth patterns should be compelling for both the suits and the people who maintain the systems.
Last night at BWPUG, Greg Smith gave his talk on “Managing High Volume Writes with Postgres”, which dives deep into the intersection of checkpoint behavior and shared buffers, and also into dealing with vacuum. One of the things I always like about Greg’s talks is that they are a good way to measure what we’ve learned between reading code and running large scale / highly loaded systems in the wild. Even in the cases where we disagree, it’s good to get a different point of view on things. If you manage Postgres systems and get the chance to see this talk, it’s worth taking a look (and I suspect he’ll post the slides up somewhere this week, if they aren’t already available). One of the other cool things that came out of the talk was one of the guys on my team again validating why we love working with Circonus. We have an unofficial slogan that with Circonus, “if you can write a query, you can make a graph”. Well, Keith noticed that we didn’t have any monitoring for the background writer info on one of our multi-TB Postgres systems that was recently upgraded from 8.3 to 9.1, so he jumped into Circonus and just like that, we had metrics and a graph faster than Greg could move off the slide. This will be awesome once we accumulate some more data, but here’s a screenshot I took from last night while we were in the talk: Yay graphs! Update: Shortly after posting, Keith mentioned that he had updated the graph to speak in MB rather than buffers. So, here is an updated screenshot with friendlier output and more data. (Note that Phil, one of our other DBAs, also flipped the buffers allocated to a right axis.)
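For anyone wanting to try the “write a query, make a graph” approach with the background writer numbers, here is a minimal Python sketch of the buffers-to-MB conversion mentioned in the update. This is not the check Keith actually wrote; the query text, the function names, and the choice of columns are my own assumptions for illustration. The columns are the ones exposed by `pg_stat_bgwriter` in the 9.1 era (newer Postgres releases have moved some of them to other views), and the 8192 byte block size is the Postgres default, which you should confirm with `SHOW block_size` on your own system.

```python
# Hypothetical sketch: convert pg_stat_bgwriter buffer counters to megabytes.
# The query is an assumption for illustration, not the check used in the post.
BGWRITER_QUERY = """
SELECT buffers_checkpoint, buffers_clean, buffers_backend, buffers_alloc
FROM pg_stat_bgwriter
"""

# Default Postgres block size in bytes; verify with SHOW block_size.
DEFAULT_BLOCK_SIZE = 8192


def buffers_to_mb(buffers, block_size=DEFAULT_BLOCK_SIZE):
    """Convert a count of shared buffers (pages) into megabytes."""
    return buffers * block_size / (1024 * 1024)


def row_to_mb(row, block_size=DEFAULT_BLOCK_SIZE):
    """Convert every counter in a {column: buffer_count} row to megabytes."""
    return {name: buffers_to_mb(count, block_size) for name, count in row.items()}
```

You would run `BGWRITER_QUERY` through whatever driver or monitoring agent you already use, then pass the resulting row (as a dict) to `row_to_mb` before graphing it. Since these are cumulative counters, a graph of the rate of change is usually more useful than the raw totals.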
Most people tend to think of Postgres as a very conservative piece of software, one designed to “Not Lose Your Data”. This reputation is probably warranted, but the other side of that coin is that Postgres also suffers when it comes to performance because it chooses to be safe with your data out of the box. While a lot of systems tend to side towards being “fast by default”, leaving durability needs as an exercise for the user, the Postgres community takes the opposite approach. I think I heard it put once as “We care more about your data than our benchmarks”. That said, Postgres does have several options that can be used for performance gains in the face of durability tradeoffs. While only you can know the right mix for your particular data needs, it’s worth reviewing and understanding the options available.
“by default” - OK, this isn’t a real setting, but you should understand that, by default, Postgres will work to ensure full ACID guarantees, and more specifically that any data that is part of a COMMIT is immediately synced to disk. This is of course the slowest option you can choose, but given it’s also a popular code path, the Postgres devs have worked hard to optimize this scenario.
“synchronous commits” - By default synchronous_commit is turned on, meaning all commits are fsynced to disk as they happen. The first trade-off of durability for performance should start here. Turning off synchronous commits introduces a window between when the client is notified of commit success, and when the data is truly pushed to disk. In effect, it lets the database cheat a little. The key to this parameter is that, while you might introduce data loss, you would never introduce data corruption. Since it tends to produce significantly faster operations for write-based workloads, many people find that is a durability tradeoff they are willing to make. As an added bonus, if you think that most of your code could take advantage of this but some part of your system can’t afford the tradeoff, this setting can be set per transaction, so you can ensure durability in the specific cases where you need it. That level of fine-grained control is pretty awesome.
“delayed commits” - Similar sounding in theory to synchronous_commit, the settings for “commit_siblings” and “commit_delay” try to provide “grouped commits”, meaning multiple transactions are committed with a single fsync() call. While this certainly has the possibility of increasing performance in a heavily loaded system, when the system is not loaded these will actually slow down commits, and that overall lack of granularity compared to synchronous_commit usually means you should favor turning off synchronous_commit and bypass these settings when trading off durability for performance.
“non-synching” - Fsync was the original parameter for durability vs performance tradeoffs, and it can still be useful in some environments today. When turned off, Postgres throws out all logic of synchronizing write activity with command input. This does mean that, running in this mode, in the event of hardware or server failure you can end up with corrupt data, not just missing data. In many cases this might not happen, or might happen in an area that doesn’t matter (say a corrupt index, which you can just REINDEX), but it could also happen within a system catalog, which can be disastrous. This leads many a Postgres DBA to tell you to never turn this off, but I’d say ignore that advice and evaluate things based on the tradeoffs of durability vs performance that are right for you.
Consider this: if you have a standby set up (WAL based, Slony, Bucardo, etc…), and you are designing for a low MTTR, chances are in most cases hardware failure on the primary will lead to a near immediate switch to the standby anyway, so a corrupt database that you have already moved beyond will be irrelevant to your operations. This assumes that you can afford to lose some data, but if you are using asynchronous replication, you’ve already come to that conclusion. Of course, you are giving up single node durability, which might not be worth the tradeoffs in performance, especially since you can get most of the performance improvements by turning off synchronous_commit. In some situations you might fly in the face of conventional wisdom and turn off fsync in production, but leave it on in development; imagine an architecture where you’ve built redundancy on top of EC2 (so a server crash means a lost node), but you are developing on a desktop machine where you don’t want to have to rebuild in the case of a power failure, and don’t want to run multiple nodes. Life is a series of tradeoffs between cost and efficiency, and Postgres tries to give you the flexibility you need to adjust to fit your particular situation. If you are setting up a new system, take a moment to think about the needs of your data. And before you replace Postgres with a new system, verify what durability guarantees that new system is giving you; it might be easier to set Postgres to something comparable. If you are trying to find the right balance for your own situation, please feel free to post your situation in the comments, and I’ll be happy to try to address it.
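To make the per-transaction control over synchronous_commit a little more concrete, here is a minimal Python sketch that builds the SQL for a single transaction with the setting pinned on or off. The helper name and the example tables are hypothetical, invented purely for illustration; the only Postgres feature it leans on is `SET LOCAL`, which changes a setting for the current transaction only and reverts at COMMIT or ROLLBACK.

```python
# Hypothetical helper: wrap statements in a transaction that sets
# synchronous_commit for that transaction only, using SET LOCAL.
def wrap_transaction(statements, synchronous=True):
    """Return the list of SQL lines for one transaction.

    synchronous=True  -> wait for the WAL flush before reporting success.
    synchronous=False -> report success early; a crash may lose the last
                         few commits, but does not corrupt the database.
    """
    mode = "ON" if synchronous else "OFF"
    lines = ["BEGIN;", f"SET LOCAL synchronous_commit TO {mode};"]
    # Normalize each statement so it ends with exactly one semicolon.
    lines.extend(stmt.strip().rstrip(";") + ";" for stmt in statements)
    lines.append("COMMIT;")
    return lines


# Example tables are made up: a payment we cannot afford to lose,
# and a page view counter where losing a few rows is acceptable.
critical = wrap_transaction(["INSERT INTO payments VALUES (1, 100)"], synchronous=True)
relaxed = wrap_transaction(["INSERT INTO page_views VALUES (1)"], synchronous=False)
```

In practice you would send these statements through your usual driver rather than building strings by hand; the point is only that the durability decision can ride along with each transaction, so the server-wide default can lean either way.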
Last Friday was the first PGDay Denver, a regional one day Postgres conference, put on by Kevin Kempter and the folks who run the Denver Postgres User Group. We had between 50 and 75 people, which is a pretty good turnout for a first time event. I gave two talks, my “Essential PostgreSQL.conf” talk (slides here) and my “Advanced WAL File Management with OmniPITR” talk (slides here). It was my first time in Denver (outside of the airport at least), and I have to say that the city is very well laid out for conference goers. The one tricky part was getting from the airport to downtown, but once you are downtown, there are plenty of good places to eat/drink, plenty of hotels, and the conference center itself is massive. After a couple nights on the town, I was honestly left surprised that I hadn’t been to a conference here before (maybe OSCon should swing through some year?) and hoping I’ll get the chance to go back. In any case, thanks to the PGDay Denver folks for putting together a nice event, and hopefully we’ll see others follow their lead with more PGDays in their part of the country.
Hey Folks! Looks like we had a snafu with the Meetup site where it was showing the meet on our old schedule last week rather than the new schedule. We’re in the process of fixing that, but wanted to make sure everyone knew that we are still going to meet on our new night, which is tomorrow, Tuesday, September 20th. This month, Theo will talk about application and systems performance measurement and why almost everyone does it wrong. It’s not hard to do right, but people often approach these things completely wrong. So, we’ll look at some numbers, understand why they are misleading and talk about the right way to approach these problems. Since we can’t always approach things the right way, we’ll talk a bit about adding a tiny bit of value to the “wrong” approach.
When: September 20th, ~6:30PM.
Where: 7070 Samuel Morse Dr, Columbia, MD, 21042.
Host: OmniTI.
As always we will have time for networking and we can do some more open Q & A, and we’ll likely hit one of the local restaurants after the meet.
BWPUG Meetup Page
BWPUG Mailing List
Self-reflection and process analysis are two critical components of success that I think people all too often overlook. When things are going well, people think they are doing things correctly, so they don’t need to self-analyze. Worse, when things are going badly, they often try to rationalize the problem away. I think this is a missed golden opportunity for most people, businesses, or teams. This past week I wrote a piece on the OmniTI Seeds blog discussing this topic; if you happen to lead such a group, you owe it to yourself and those around you to recognize The Opportunity of Crises.
I’m sitting in SFO tonight, awaiting my return trip back to Hurricane Pending Maryland. (As a former Floridian, I must of course scoff at any notions that this hurricane is significant). Walking through the airport I noticed a large billboard about “Big Data and the Cloud”. This is the kind of billboard you only see in Silicon Valley; I don’t see signs like that in Portland or Ottawa, and certainly not when I had to change flights in Detroit this year. Anyway, these two buzzwords aren’t a local phenomenon, and are actually taking the tech world by storm. Big Data has become serious enough that there are multiple conferences now for folks interested in the topic. And cloud, well, it’s perhaps harder to define, but more and more businesses are moving to the cloud every day. The problem here is that most of the traditional ideas on big data run entirely counter to the ideas that work well in the cloud. Last spring I moderated a panel at PGEast in New York that focused on Postgres in the cloud. As someone who works on multi-terabyte systems, and someone who deals with cloud servers on at least a semi-regular basis, I tried to prod and poke my panelists into sharing their take on how they see Postgres’s role in the cloud. Not too surprisingly, the idea behind “Big Data” on Postgres in the cloud was not a particularly popular one. The tools you need to do the job effectively with Postgres just aren’t there. Not to say you can’t try, but so far I haven’t seen many wild successes. Next month at Surge though, I’m going to be involved in another panel focusing on “Pushing Big Data To The Cloud”. This time though I’m turning over moderating duties to Baron Schwartz, a long-time thought leader in the MySQL community.
Joining me on the panel are several folks who all have a stake in the idea of Big Data in the cloud: John Hugg and Philip Wickline from VoltDB and Hadapt, respectively, two new database vendors built with scale-out in mind; Bryan Cantrill, VP of Engineering at Joyent, a cloud provider with their own strong opinions on dealing with data in the clouds; and Kate Matsudaira, someone who is currently managing those multi-TB databases, all in the cloud, over at SEOMoz. This should be a really good mix of people using different technology, with different biases about the problems involved. If you’re looking to work on Big Data in The Cloud, I hope you’ll join us; it should be a lot of fun.
In spite of all previous notions to the contrary, thanks to some last minute wrangling by the conference organizers, I will be making the trek out to Chicago this September for Postgres Open after all. I had been planning to sit out the event and just stay focused on Surge (which, I must say, looks even more kick ass than last year), but after looking at the schedule, and some persuading at OSCon, I’m very excited about what has been put together, and look forward to seeing many of my fellow Postgres community members once again. Oh, and in case you were wondering, I’ll be reprising my talk from this year’s Velocity conference, “Managing Databases in a DevOps Environment”. At Velocity, the talk was intended to highlight how people already familiar with DevOps should approach their database systems. I’m not sure how well “DevOps” is understood within the Postgres community, so I think I’ll try to emphasize the differences between managing databases and traditional services, to hopefully give better expectations to DBAs whose organizations might be undergoing such a change. If you’re going to be at Postgres Open and are interested in the topic, I’d love to hear your feedback on what aspects of this topic you’re most interested in. (PS. I’ll also be heading to the Velocity Summit next week in San Francisco; for those attending, I’d love to hear your thoughts on this topic as well.)
I often run my ops like I take care of data: a bit overzealously. Case in point, when setting up a new database, I like to throw on a metric for database size, which gets turned into both a graph for trending and an alert on database size. Everyone is always on board with trending database size in a graph, but the alert is one people tend to question. This is not entirely without justification. On a new database, with no data or activity, deciding when to alert is pretty fuzzy. When we set up a new client within our managed hosting service, I usually just toss up an arbitrary number, like 2GB or something. The idea isn’t that a 2GB database is a problem, it’s that when we cross 2GB, we should probably take a look at the trending graph and do a projection. Depending on how things look, we’ll bump up the threshold on the alert to a new level, based on when we think we might want to look at things again. For example, in this graph we take a month long sample, and then project it out for three months. We can then set a new threshold somewhere along that line. While this is good for capacity planning, there’s more that can be gained from this process. The act of alerting forces us to pay attention. And if we get notices earlier than we expected, we go back in and re-evaluate the data patterns. Of course, sometimes people will question this. Getting a notice that your database has passed 4GB can seem pointless when you have 100+ GB of free space on your disks. And besides, isn’t that what free space monitors are for? Here is a graph of another client’s database growth. Their data size is not particularly large (don’t confuse scalability with size; it doesn’t take a large database to have scalability issues), but what’s important is that we kept getting notices that the size was growing, and when talking with the developers, no one thought it should be growing at nearly this rate.
Eventually we were able to track down the problem to a purging job that had gone awry. Once that was fixed, the growth pattern leveled off completely (and the database size returned to the tiny amount that was expected!).
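If you want to try the “sample a month, project three months” approach to picking the next alert threshold without a graphing tool, here is a minimal Python sketch using an ordinary least-squares line. This is my own illustration, not how Circonus computes its projections; the function names and the sample numbers are made up, and a straight line is only a reasonable assumption when growth is roughly steady.

```python
# Hypothetical sketch: fit a straight line to database size samples and
# project it forward to pick the next alert threshold.
def fit_line(days, sizes_gb):
    """Ordinary least-squares fit; returns (slope_gb_per_day, intercept_gb)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(sizes_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, sizes_gb))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project_size(days, sizes_gb, days_ahead):
    """Projected size in GB, days_ahead days after the last sample."""
    slope, intercept = fit_line(days, sizes_gb)
    return intercept + slope * (days[-1] + days_ahead)


# Made-up month of samples: day offsets and database size in GB.
sample_days = [0, 10, 20, 30]
sample_sizes = [1.0, 1.5, 2.0, 2.5]
next_threshold_gb = project_size(sample_days, sample_sizes, days_ahead=90)
```

If the alert then fires well before the projected date, that is the signal to go back and look at what changed, which is exactly how the runaway purging job above got noticed.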