A Present From My Filesystem
People ask me why I haven’t blogged much lately and I generally tell them I’ve been busy fixing broken stuff. Then they tell me to blog about the broken stuff I’m fixing. So, ok. Here’s a random encounter from before the holidays….
There’s nothing like trying to log into a database server to do some simple schema work, only to have you’re login hang trying to load your session. After some brief poking around and attempts to login with different accounts, I was able to see that we had a disk problem on the database. (Nail in the coffin, zpool status, the zfs command which shows the condition of your drives, was hanging trying to show the last pool). So, poke the SA’s and see what they can dig up… and unfortunately things were bad enough, we were going to need a reboot. So, we reboot.
Upon restart, well, things don’t restart; where “things” means postgres. SMF shows that we tried to start, but something has gone awry. Looking in the database logs, I see this
Uh, that’s not good. And looking at the file, it’s even weirder. It’s owned (and grouped) by root, something that shouldn’t actually ever happen. There are a couple ways to go with that, the way I thought would work would be to copy the file off to a side directory (“borked” seemed like a good directory name for this), and then change the permissions so that postgres owned the file again. After that, fire it back up
Well duh. Ok, so look in the dir, and turns out there are only three files like this, so fix them up the same way, and see if it will work. A pg_ctl later, and now I see this in my logs:
zoinks… but the server doesn’t die and actually, I think it will work. And it does.
Well, at least enough to get things working. But that’s really not the whole story; when something looks like it has gone wrong, you need to dig just a little further. First I want to check out who those three files were:
pgods=# select \* from pg_class where relfilenode in (58757497, 58757498, 58757499);
relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
------------------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
ban_activity_log_snap1 | 17322 | 41561 | 10 | 0 | 58757497 | 16385 | 860 | 77570 | 41562 | 0 | f | f | r | 6 | 0 | 0 | 0 | 0 | 0 | f | f | f | f | 1281741311 | |
pg_toast_41559 | 99 | 41563 | 10 | 0 | 58757498 | 16385 | 0 | 0 | 0 | 41564 | t | f | t | 3 | 0 | 0 | 0 | 0 | 0 | f | t | f | f | 1281741311 | |
pg_toast_41559_index | 99 | 0 | 10 | 403 | 58757499 | 16385 | 1 | 0 | 0 | 0 | f | f | i | 2 | 0 | 0 | 0 | 0 | 0 | f | f | f | f | 0 | |
Well, the relation is totally harmless; it’s a replicated copy from a different database, so I can toss it out and rebuild it if needed, but I also need to see what the toast table and index are for.
pgods=# select \* from pg_class where oid = 41559;
relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
------------------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
ban_activity_log_snap1 | 17322 | 41561 | 10 | 0 | 58757497 | 16385 | 860 | 77570 | 41562 | 0 | f | f | r | 6 | 0 | 0 | 0 | 0 | 0 | f | f | f | f | 1281741311 | |
Again, I’ve dodged a bullet. It looks like all of those files came from the same relation. So chances are we’re ok. But before I start this under SMF, I do a couple of checkpoints and xlog switches. A little bit superstitious, but also I want to avoid having to go back through the previous batch of xlogs if I can. (I also did a vacuum verbose on the postgres database in the cluster).
Now I know that some people might be uncomfortable running there database after something like that; they might want to do some kind of dump/restore or reload or backup. Realistically, this is a multi-tb database, and we can’t afford the downtime needed for those luxuries, so we’re going to keep pushing forward. It’d be nice if Postgres had better tools to do filesystem level verification of tables/index files, but sometimes you have to make do. We’ve got some things working in our favor; zfs checksums on the files, and we’ll dig around a bit to see what else we can turn up. But for now, I consider it within the spirit of the season that things didn’t go worse.
BWPUG Meeting November 10th
Scalability, I Has It
BWPUG September Meeting 2010-09-08: PostgreSQL Security and SE-Postgres
Now What? (Wrt OpenSolaris and Your Database)
Too Soon?
The first thing I have said to everyone is that it is honestly too soon to make any moves. Oracle is notorious for being poor communicators, and at this point I don’t think we’ve seen enough official communication to really know what’s going to happen. This doesn’t mean you can’t start planning though! We’ve been looking at some of the available options since before the Oracle/Sun merger was closed, so it doesn’t hurt to start evaluating the options out there. However there’s no need to rush in to things; it is possible that the announcement of OpenSolaris’s death might be premature. I personally don’t believe Solaris can’t survive based on the model we’ve just seen laid out; there are too many people learning the gnu tool chain who won’t be willing to invest big money into a tool that is hard for them to use. They need a low cost / free option for people to familiarize themselves on (and all the better if it installs gnu tools by default). There’s an outside chance Oracle might come to this conclusion, which would give new life to OpenSolaris. A more likely alternative to that theory is that some other group might pick up OpenSolaris maintenance and start pushing it forward. Certainly not an easy task, but there are already several different distribution of OpenSolaris available, so the userland level management has the resources, we mostly would need to figure out how to handle the more core technologies that have been maintained by Sun. I think this might also be possible, as there are numerous companies already heavily invested in OpenSolaris technology, and there are Solaris internals hackers looking to move out of Oracle, it’s not an impossible leap to think we might see something worked out. And if Oracle continues to make technology available via the CDDL (which most of the current signs seem to indicate), this could work out. I would say that this might not resemble the OpenSolaris as it is now, but could definitely be an option for current users who’d like to remain on the OpenSolaris platform.Other Options?
Of course, you might not want to put all your eggs in that basket. So what other options do we have? Well, that mostly depends on what you’re getting out of OpenSolaris now, and what you want out of your OS going forward. For many people, I suspect that Solaris 11 Express might be a suitable replacement, especially for those running mixed OpenSolaris / Solaris environments. Migrating up to full Solaris 11 will also cover most of your technology needs, so depending on pricing I suspect people may find that a cheaper alternative to migrating to a new platform. Of course, if you want to stick with a free software solution, this won’t really be an option. FreeBSD seems to be the most obvious alternative platform. If you’re currently taking advantage of dtrace, zfs, and zones, FreeBSD gives you options to cover all three. It won’t be the same; the dtrace and zfs implementations are pretty close aiui, but for zones you’ll probably have to use either Jails or OpenVS, neither of which am I a fan of. I think you’d also find a larger overlap in system utilities (tar, find, grep, etc..) between FreeBSD and Solaris, so for people (and scripts) making the transition, this might be an easier move. The big question here is probably hardware support; if you can’t get FreeBSD running on your hardware, that’s likely to be a show stopper, unless you can work out a new hardware purchase in the transition :-) So, if you don’t want to go closed Solaris, and FreeBSD isn’t an option, that probably leaves you on Linux. People sometimes think I don’t like Linux; I’m actually very comfortable on it. My first “unix” was Linux, and we run some extremely demanding systems on Linux and it has performed well in those cases. However if you’re trying to do deep introspection, systemtap is a poor man’s dtrace. And if you are relying on zfs, you’ll have a hard time finding a suitable replacement amongst the current Linux options. Personally I am most comfortable on ext3, but I tend to give up on file system snapshots, which is a painful submission if you have to make it. XFS is probably the next most common option, and generally I’ve no bones about using it if you want to avoid ext3. Of the three “advanced” replacements; ext4, btrfs, and zfs on linux; I think ext4 is probably your best bet, but only because zfs is too new for any serious database systems, and if you are moving off OpenSolaris to get away from Oracle, “butter” seems like an odd choice.And so…
I think it’s wise to keep things in perspective. There are some cases where you want to be a technology leader (we’ve been running Postgres 9 for months on some systems), but generally speaking when it comes to picking the operating system and filesystem for your database, it’s best to tread lightly. Now is a fine time to start evaluating your options; at least figure out what features are critical to your enterprise that you’ll need to replace (and don’t just think about database, you might be relying on crossbow for something, or who knows what else). We’ll certainly be watching the current options available, and I suspect diversifying a little, over the next 6 months, as we wait for the picture to clear up where we can. We’re not in a hurry (after all, we do have the source code of what we’re running now), and I don’t see much reason for others to be either.Database Scalability Patterns - OSCon 2010
I Hardly Gnu, You?
“Everyone who bitches about the GPL … I’d like them to remove all the GPL software from their computers, and see how they get on.”Now, this is all apart of some brouhaha over Wordpress and it’s templates or something. I personally don’t care about any of that (I’ve long moved past Wordpress, as I find them to be bad open source citizens), but it did get me to thinking, as a person who does think the BSD license is a better open source license than the GPL, just how much of an inconveneince would this be? I certainly do rely on a bunch of gnu software that I’d just as soon not live without, like my playstation, or our dvr, and probably our router too. However I don’t really have a choice in these things, and to be fair I don’t think there is anything perticularly special about linux that makes it better for a dvr than you could do with BSD. But let’s consider things where I do have a choice, where most software developers have a choice, which is in the tools we work with and things we work on. This is pretty broad, so let’s just consider the classic LAMP stack that most people work on. Apache is of course, available under the Apache license, so it’s already in the clear. While I like Linux, I’ve long replaced it in my life with a mix of OSX, *Solaris, and *BSD, for computers where I get to choose the OS. Yes, our eeepc does run Linux, but I could probably switch that to some BSD system if needed. Likewise MySQL has always had the backseat next to Postgres or Oracle, and even the NoSQL fanboys have plenty of non-gpl options (Couch, Hadoop, Cassandra, Voldemort, etc…) to pick from. And finally, whether your P is Perl, PHP, or Python, all of those languages are available under non-GPL licenses. So, I guess the “LAMP” stack could go on. Surely there must be some things though right? I started to think about other tools I work with regularly, like X and vim, and while I’m sure there are some tools that might be gpl, certainly many are not, and I’d guess between Solaris and BSD, I could make a GPL-free stack that I’d be comfortable working on, with software that is already available. This isn’t to say I wouldn’t miss anything. I love my recursive grep, find the -P argument for xargs amazing, and find BSD tar just crippling to work with. None of those would be as bad as losing screen, which is a must have for any serious server work. Of course there are alternatives for all of these (not that I know anyone who uses tmux), and I’d bet some of these features could be easily re-implemented in a BSD version if needed. I think where this really get’s you is in the software that has been built upon these base tools, perticularly in the area of PHP software development. A lot of people over that last 10 years have produced GPL lciense software, like Drupal, PHP-Nuke, phpBB, phpMyAdmin, and more. I think most people didn’t really think about there license choices back then (perhaps not now either); “if it’s good enough for Linus”. It’s too bad, I know I’d much rather license phpPgAdmin under the BSD than GPL, but I think we’re pretty much stuck at this point. I have noticed some newer projects (Habari for instance) have chosen non-GPL licenses; I don’t know if that is a trend or anything, but it wouldn’t hurt if it was.
BWPUG July Meeting 2010-07-14: PostgreSQL Performance Farm
Hey all! This month BWPUG returns to DC/Falls Church! When: July 14th, 6:30PM (this Wednesday!) Where: 3150 Fairview Park Dr, Falls Church, VA Host: Noblis, Inc. and the Noblis Innovation and Collaboration Center (NICC) It’s basically at 495 and 50 in Northern Virginia. We’ll be discussing the work and progress made so far on the PostgreSQL Performance Farm scripts done by Scott (an intern working for Noblis over the summer from Virginia Tech). If you’re planning to attend, please RSVP to [email protected]. The host/facility we’ll be meeting at does require a government issued photo ID (eg: driver’s license or passport). Hope to see you there! Thanks, Stephen