Disaster Recovery at 1000 GB's

I had mentioned to a few people our TB+ disaster recovery scheme at the PG-East conference last week, with hopes that we would be doing a full-on recovery test in early April. Lucky for us, we’ve been able to do a rough run through, so I wanted to report some results.

First, a quick recap of why most of the common [http://www.postgresql.org/docs/current/interactive/backup.html backup solutions] suck for our needs:

* pg_dump is pretty much a joke with 1TB+ of data, especially on our system, which has constant data churn and enough mutating schema to make getting a consistent snapshot unworkable.
* PITR would be nice for failover, but it isn’t a real disaster recovery system. The key problems are corrupted xlogs making their way to the slave, or data corruption issues getting propagated into your “backup”. If you don’t have a static snapshot, you can hose yourself in some un-fun ways.
* Slony (or Bucardo, or other replication systems) also suffers from data corruption getting propagated onto your slaves, with no way to get back to a legitimate copy of your work. Again, this is fine when trying to solve failover, but not always the right answer for backups.

So, what we need is to make a copy of the database and stick that some place safe and secure, so that in case something goes horribly wrong (for **really** scary versions of horribly), we can get back to data that we know is good. The basic scheme goes like this:
* Need a working PostgreSQL setup with archive_mode = on. We’re using 8.3.1 on Solaris 10 / u4, all on top of [http://en.wikipedia.org/wiki/ZFS ZFS].
* Issue a pg_start_backup() in postgres.
* Do a zfs snapshot of each filesystem (all tablespaces, and be sure to include all of $PGDATA, including all xlogs).
* Issue a pg_stop_backup() in postgres.
* Ship the snapshots over to a backup machine (ideally somewhere remote, possibly on tape).
* Wait 3 days for the last step to finish :-)
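To make that concrete, here is a rough sketch of how the snapshot portion can be scripted. This is not our production script; the pool and filesystem names (tank/pgdata, tank/pgxlog, tank/pgtblspc) are made up for illustration, and a real version would want error checking around each step.

    #!/bin/bash
    # Hypothetical snapshot script; the zfs filesystem layout is made up.
    SNAP=backup-$(date +%Y%m%d)

    # Tell postgres a base backup is starting (requires archive_mode = on)
    psql -U postgres -c "SELECT pg_start_backup('$SNAP');"

    # Snapshot every filesystem postgres touches: $PGDATA, the xlogs,
    # and every tablespace, so nothing is missing from the copy.
    zfs snapshot tank/pgdata@$SNAP
    zfs snapshot tank/pgxlog@$SNAP
    zfs snapshot tank/pgtblspc@$SNAP

    # Tell postgres the backup is finished
    psql -U postgres -c "SELECT pg_stop_backup();"

The zfs snapshots themselves are essentially instantaneous, so the window between the start and stop backup calls stays small even at this size.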
Once you have the full snapshot, the idea is that you can lay this onto your server, fire up postgres, and it will start as if coming up from crash recovery. The key to this process is having WAL archiving running (hence archive_mode = on) so that you can run the snapshot commands between the start and stop backup calls and still capture all of the xlog files (and it is critical that you have all of them).
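As a rough illustration of what laying the snapshot onto a server can look like (the pool names and stream file names here are hypothetical, not our actual layout), the recovery box just receives the shipped streams and mounts them where postgres expects its data:

    # Hypothetical restore: receiving the previously shipped zfs send streams
    # recreates the filesystems exactly as they were at snapshot time.
    zfs receive tank/pgdata < /backups/pgdata-backup-20080404.zfs
    zfs receive tank/pgxlog < /backups/pgxlog-backup-20080404.zfs

    # Mount them where postgres expects $PGDATA and the xlogs to live.
    zfs set mountpoint=/pgdata tank/pgdata
    zfs set mountpoint=/pgdata/pg_xlog tank/pgxlog

With the filesystems back in place, the actual start-up process is anticlimactic: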
    bash-3.00$ pg_ctl -D /pgdata/ start
    pg_ctl: another server might be running; trying to start server anyway
    server starting
    bash-3.00$ cat /pgdata/pg_log/postgresql-2008-04-07.log
    2008-04-07 12:30:50 EDT:@:14724: LOG: database system was interrupted; last known up at 2008-04-04 14:22:49 EDT
    2008-04-07 12:30:50 EDT:@:14724: LOG: database system was not properly shut down; automatic recovery in progress
    2008-04-07 12:30:51 EDT:@:14724: LOG: record with zero length at 381/A20329C8
    2008-04-07 12:30:51 EDT:@:14724: LOG: redo is not required
    2008-04-07 12:30:51 EDT:@:14722: LOG: database system is ready to accept connections
    2008-04-07 12:30:51 EDT:@:14727: LOG: autovacuum launcher started
Exciting, eh? We’re currently going through some more testing, but all results so far have been positive (tables look right, everything seems easy to access, etc…); we’ll certainly post more if we uncover something. For those looking at the details above, it didn’t actually take 3 days to send the data to the other machine, but it is upwards of 24 hours, and can take longer when pushing to tape or similar setups, which becomes really important when you want to pull the data back out of the backups. Also, it’s worth noting you could probably do this with something other than zfs, but zfs certainly makes it nice and easy to script the whole thing and ship it around.
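For what it’s worth, the “ship it around” part is basically zfs send pointed at a file (which can then go to tape) or piped over ssh to the backup machine; something along these lines, with the hostnames and filesystem names again being hypothetical:

    # Hypothetical: dump the snapshot stream to a file bound for tape,
    # or pipe it straight to the backup machine over ssh.
    zfs send tank/pgdata@backup-20080404 > /backups/pgdata-backup-20080404.zfs
    zfs send tank/pgdata@backup-20080404 | ssh backuphost "zfs receive backuppool/pgdata"

    # Once a baseline exists, an incremental send against the previous
    # snapshot moves far less data than a full one.
    zfs send -i tank/pgdata@backup-20080328 tank/pgdata@backup-20080404 | \
        ssh backuphost "zfs receive backuppool/pgdata"

Incremental sends are presumably what would make repeating this on a regular schedule tolerable with this much data.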