BWPUG June Meeting 2010-06-09: PostgreSQL on FreeBSD

After a brief hiatus, BWPUG is back with an all-new meeting for June! This month's speaker will be Greg Smith, who reprises his talk from BSDCan, presenting on “PostgreSQL on FreeBSD”. The talk discusses some of the technical and business hurdles in deploying databases on FreeBSD, and touches on topics like what former users of OpenSolaris might be looking for in a new OS.

When: June 9th, 6:30PM.
Where: 7070 Samuel Morse Dr, Columbia, MD, 21046.
Host: OmniTI

As always, we'll have time for networking, and we'll likely hit one of the local restaurants after the meeting; hope to see you there.

phpPgAdmin 4.2.3 Released

April 13, 2010

The phpPgAdmin Team is happy to announce a new bugfix release for phpPgAdmin. Version 4.2.3 fixes several long-standing bugs and some PHP 5 compatibility issues in the 4.2.x branch. All users of phpPgAdmin are encouraged to upgrade to this new version.

Download: to download right now, visit http://phppgadmin.sourceforge.net/?page=download (RPMs available soon).

Demo: to give the fully-functional demo a try, visit http://phppgadmin.kattare.com/phppgadmin4/

Deprecation warning: note that this will likely be the last version of phpPgAdmin released on the 4.2.x branch. We are currently planning to release phpPgAdmin 5.0 this summer, which will support only PHP 5 and above, along with Postgres 7.3+. Legacy users should continue to use 4.2.3.

Regards,
The phpPgAdmin Team

Watch for Momentary Monitoring

One of the things I preach about a lot is good monitoring of your database servers; having tools in place to tell you both what good looks like and when things go bad is critical for large scale success. But sometimes you just need to monitor a momentary process, where setting up a check in your normal monitoring software is overkill. In these cases, one tool that can help out is the watch command.

Case in point: the other day I needed to back up a fairly large partitioned table (about 1.3TB on disk). The plan? A quick little script to pg_dump each of the partitions (about 325 of them), fed through xargs -P so I don't swamp the box but still get some concurrency out of things. And of course, I planned to run the whole thing in a screen session. But dumping this much data will take some time, so how to check on the progress?

When working on databases, one of the most natural things for me is to whip up some SQL to see what's going on inside my database. Pipe that through watch, and you have some quick and simple monitoring. This example happens to be on Postgres, but you could do it with any database's command line program.
watch -n 5 'psql -U robert -h mk -d wario -c "select procpid, waiting, query_start, current_query from pg_stat_activity where current_query ~ '\''^COPY'\'' ORDER BY procpid"'
which gave me some output like:
Every 5.0s: psql -U robert -h mk -d wario -c "select procpid, waiting, query_start, curren...    Mon Apr 05 15:22:03 2010

 procpid | waiting |          query_start          |                                       current_query
---------+---------+-------------------------------+---------------------------------------------------------------------------------------------
   12706 | f       | 2010-04-05 15:21:34.565754-05 | COPY wario.tblhits_p2005_1010 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
   12728 | f       | 2010-04-05 15:21:39.563847-05 | COPY wario.tblhits_p2005_1017 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
   12732 | f       | 2010-04-05 15:21:43.43266-05  | COPY wario.tblhits_p2005_1024 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
   12735 | f       | 2010-04-05 15:21:48.385349-05 | COPY wario.tblhits_p2005_1031 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
(4 rows)
Here I can see all of my COPY processes running, and I've selected out what I think are the most relevant bits, like when each COPY started and whether it is waiting on any locks. (Luckily I had named my partitions obviously enough to know which ones were being worked on.) Also, I threw in the ORDER BY clause so that as each piece finishes and the next one starts, they cycle through the output. It's a little hard to get the full effect without a screencast (sorry, too lazy for that), but the next bit of the cycle would look something like this:
Every 5.0s: psql -U robert -h mk -d wario -c "select procpid, waiting, query_start, curren...    Mon Apr 05 15:22:19 2010

 procpid | waiting |          query_start          |                                       current_query
---------+---------+-------------------------------+---------------------------------------------------------------------------------------------
   12728 | f       | 2010-04-05 15:21:39.563847-05 | COPY wario.tblhits_p2005_1017 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
   12732 | f       | 2010-04-05 15:21:43.43266-05  | COPY wario.tblhits_p2005_1024 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
   12735 | f       | 2010-04-05 15:21:48.385349-05 | COPY wario.tblhits_p2005_1031 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
   12836 | f       | 2010-04-05 15:22:10.746129-05 | COPY wario.tblhits_p2005_1107 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
(4 rows)
Notice the new line starts at the bottom and everything else shifts up. This worked well for this script, but watch has other options, like highlighting changes between refreshes, and other goodies that come in handy. So the next time you need to keep an eye on something, pair up a little bit of SQL with watch for some quick and easy monitoring.
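For reference, the parallel dump step described at the top of the post could be sketched roughly like this. This is a dry run (the commands are echoed, not executed), and the partition names, connection flags, and concurrency level are assumptions carried over from the examples above, not my exact script:

```shell
# Dry-run sketch: fan out one pg_dump per partition, 4 at a time, via xargs -P.
# The partition list is faked with printf here; in practice it would come from
# something like:
#   psql -U robert -h mk -d wario -At \
#     -c "select tablename from pg_tables where tablename ~ '^tblhits_p'"
# Drop the leading "echo" to actually run the dumps.
printf '%s\n' tblhits_p2005_1010 tblhits_p2005_1017 tblhits_p2005_1024 \
| xargs -P 4 -I {} echo pg_dump -U robert -h mk -t wario.{} -f {}.dump wario
```

And while we're on the subject, `watch -d` will highlight the characters that changed between refreshes, which makes the cycling COPY rows even easier to spot.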

Postgres Returning for Google Summer of Code 2010

I’m very happy to announce that the Postgres project has been selected to participate in this year's Google Summer of Code program. Over the next couple of weeks we'll be looking to solidify our mentor base; if you work on Postgres and would be willing to mentor a student, please send me a note so we can get you signed up. If you are a student and you're interested in working on Postgres, now is the time to get your proposal together. Student applications will open up on March 29th, so we'd like to have our mentors in place for review by then, and hopefully have students discussing their proposals with the Postgres devs as much as needed. If anyone has any questions, feel free to email me, or track me down on IRC. Thanks everyone, I'm looking forward to another interesting year with GSoC, and hoping you'll join in.

Postgres at MySQL Conference?

During the MySQL conference call for papers there was some talk of getting one or two Postgres sessions into the mix, as a lot of MySQL users seem to have questions about Postgres these days. Alas, looking through the MySQLcon schedule, I don't see any on there. I've also looked through the BOFs, and there's nothing about Postgres to be found there either. So maybe no one is interested in Postgres after all. However, I held a Postgres BOF at MySQLcon last year and we got a handful of people, and since I am going to be at MySQLcon again this year, I might as well host one again. I think it's too late to schedule one formally, but I can put some info on the schedule sheets once I'm at the conference; if you are interested in learning some more about Postgres, please keep an eye out.

Actually, the Relational Model Doesn’t Scale

Before all my fellow DBAs' heads explode, let me just say that I am a relational guy. I like the relational model, think it's the best tool for the job, and think every programmer (not just DBAs) should aspire to be as familiar with it as they are with AJAX, MVC, or whatever other technology pattern you think is important. I'll even take that a step further: I think the NoSQL movement is mostly a re-hash of failed technologies from the last century. Object and document databases had their run in the market (some might say "they had their time"), and they were pretty thoroughly beaten by the RDBMS; that some people have reinvented that wheel doesn't change the game.

That said, I find the recent comments from Jeff Davis on the relational model and scalability to be overlooking some things. The state of computing tasks has changed over the past two decades, and what we know about computer engineering has also changed. Working on highly scalable systems like we do at OmniTI, you can't escape some of the inherent problems you face in these types of environments. As much as I'd like the answer to every problem to be "just use an RDBMS", Brewer's CAP theorem just isn't something you can ignore.

When most people think about the relational model, they think of it in terms of parent-child relationships between tables. Without getting too deep into the details, I think it's pretty fair to say that primary keys and foreign keys are a very large part of any relational implementation, and that pretty much all RDBMSs strive to let you add these constraints to your model; it's what helps keep the data consistent. But there's the rub: CAP theorem points out that as we strive for tighter and tighter consistency, we pull away from availability and sacrifice partition tolerance. Two theoretical systems run smack dab into each other in the real world.
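To make the consistency point concrete, here's a tiny sketch of a foreign key doing its job. I'm using the sqlite3 command line here purely so the example runs anywhere, and the table names are made up; the behavior is the same in Postgres or any RDBMS that enforces the constraint:

```shell
# A foreign key keeping data consistent: a child row must reference an
# existing parent, or the insert is rejected. (SQLite needs the PRAGMA to
# enforce foreign keys; Postgres enforces them by default.)
sqlite3 :memory: <<'EOF'
PRAGMA foreign_keys = ON;
CREATE TABLE partner (id INTEGER PRIMARY KEY);
CREATE TABLE hit (partner_id INTEGER REFERENCES partner(id));
INSERT INTO partner VALUES (1);
INSERT INTO hit VALUES (1);  -- fine, partner 1 exists
INSERT INTO hit VALUES (2);  -- rejected: no partner 2
EOF
```

Enforcing that check across a cluster of nodes is exactly where the consistency/availability tension shows up.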
This isn't really something new; if you have ever de-normalized, dropped a foreign key, or split data across multiple nodes, you've run into this before. Now, where CAP theorem falls on its face (imho) is that it ignores another holy trinity of software development: Cheap, Fast, and Good. The size of your problem is dictated by the resources you have available; if you can afford decent tools (and let's be clear, decent is not your web dev throwing MySQL up on an EC2 instance), it is quite likely that the stressors of the relational model will never impact you in the way most CAP folks are worried about. This is also one of the places the NoSQL movement fails, by throwing the baby out with the bath water. Giving up your data integrity before you have scalability issues is a form of premature optimization. The trick, as Theo would say, is having the experience to know when such optimizations are and aren't premature.

So what's the take-away? I like to say that you use the relational model because it is best, and you use something else because it is necessary. Most SQL implementations can scale very well, and they should be your first choice when starting a new project. But we also can't pretend that there aren't inherent problems as these systems grow larger; let's understand the trade-offs and engineer appropriately.

OmniTI Is Heading to PGEast 2010

PGEast is the premiere Postgres conference held inside the U.S. each year, and this year's conference, in Philadelphia, is now less than a month away. The organization and format have evolved a little from previous years, but one thing still continues: a very strong presentation lineup. We at OmniTI are very happy to be among that group, with four talks in this year's conference lineup:

Know More Waiting, A Guide To PostgreSQL 9.0, by Robert Treat (hey, that's me), will give an overview of the upcoming PostgreSQL 9.0 release. While we're still a few months ahead of release, we have a pretty good idea of what's coming, and this talk will help you start planning for how you will be able to take advantage of the new features coming our way.

PostgreSQL, meet AMQP, by Theo Schlossnagle, looks at pg_amqp, a "contrib" style module for Postgres that provides transaction-style message queuing from inside of Postgres, using the AMQP standard.

Yet Another Replication Tool: RubyRep, by Denish Patel, will delve into one of the newer Postgres replication solutions on the block. RubyRep is designed for dead-simple installation and setup, while still delivering advanced features like data comparison, synchronization between servers, and even master-master replication options.

Database Scalability Patterns, by Robert Treat (me again), takes a look at the common patterns around scaling your database solution, and at some of the different options available to people scaling with Postgres.

But wait, there's more! While we at OmniTI are definitely excited to be participating in PGEast this year, there are a number of other good talks and speakers, including Magnus Hagander, Jeff Davis, Baron Schwartz, and many others. For full talk details, check out the conference talks page; I hope you'll journey out and say hi, it should be a pretty good time.

BWPUG March 10th, Falls Church Take Two.

Barring a repeat of last month's snowmageddon/snowpocalypse, we're going to take another stab at heading down to Falls Church, VA, for the March BWPUG meeting. If you haven't felt like trucking out to Columbia, then please try to make this one; depending on response, we may be able to do this more often.

When: March 10th, 6:30PM.
Where: 3150 Fairview Park Dr, Falls Church, VA
Host: Noblis, Inc. and the Noblis Innovation and Collaboration Center (NICC)

It's basically at 495 and 50 in Northern Virginia. We'll be discussing the upcoming PostgreSQL 9.0 release, including a preview of my upcoming talk at PG East. If you're planning to attend, please RSVP to Stephen Frost via sfrost _at_ noblis.org. The host facility does require a government-issued photo ID (e.g. driver's license or passport). Hope to see you there!

MySQL, Open Source’s Version of “Too Big to Fail” ?

When I was younger, I remember hearing the phrase "too big to fail" used to describe very large companies in the US, often financial institutions of some type. At the time I thought the phrase was an indicator of the size of a company, the diversity of its business dealings, and its financial reserves. The idea was that, as the size of the company grew, its ability to withstand a hit in any one market would increase, because other areas of the business could keep it going. Last year, as the financial crisis was getting into full swing and our government was looking at bailing out companies, this phrase took on a fairly different meaning, referring more to the idea that a company had grown so big and so well integrated into the daily economy that its failure would be catastrophic to the larger financial ecosystem. Or, as I more cynically thought of it, the company had grown so big it was able to grease politicians at every level of the system, thereby ensuring its future. Too big to fail indeed.

So now we look at the story of MySQL, and the wild cries from its former leader about the need to save MySQL. No surprise coming from him, but a little bit of a surprise that people would buy into the story when so many holes are easily apparent. Right off the bat, the claims of Oracle's monopoly are almost laughable; Oracle is certainly powerful, but they are by no means a monopoly; by most accounts they don't own even 50% of the database market share, whether you look at revenue or mindshare. Certainly if they were a monopoly power we would see far more influence from them within the industry; a good example is how they were unable to push their recursive query syntax into the SQL standard. Does that sound like monopoly power? But even if it were true, that shouldn't mean the end of MySQL. MySQL is one of the largest open source projects on the planet; its user base is certainly in the millions.
You would think that such a large user base would give the software the strength to withstand the rise and fall of corporate masters, just as our companies get stronger legs to stand on as they spread out into different segments. This wouldn't be the first time that open source licensing has saved software from a poor commercial fate; in fact, it isn't even the first time in the realm of open source databases. Both Ingres and Firebird have had their share of bumpy roads, and in the end their open source nature has allowed companies to rebuild around those products and users to continue to press forward with the software, long after the commercial vendors were aiming to shut their doors. Surely if these open source databases can do it, then MySQL and its community must be big enough not to fail, no?

Apparently not. Instead of focusing energy into building a more stable structure around MySQL (like OurDelta and/or the Open Database Alliance), or into enhancing alternatives to MySQL (MariaDB, Drizzle), we instead get a different version of "Too Big To Fail" from MySQL's founder. We get told about the importance of MySQL as competition for Oracle, as if open source could not compete against Oracle without a commercially backed MySQL. We see demands for restrictions to be placed on Oracle that MySQL's own customers have asked for for years and MySQL AB ignored. We see complaints about Oracle's lack of discussion of their plans for MySQL, and when they do speak out, we're told that their promises are empty, even though they are better than the "no promises" we got from MySQL and Sun. We are told that, like the financial companies did last fall, we must force the government to intervene, to prop up MySQL so that it will have a future going forward. The European Union is concerned about keeping competition in the database market; this sounds like a fair goal to me. But exactly which competition are they concerned about?
The commercial database market has healthy competition from several major vendors (Oracle, IBM, Microsoft) as well as many older and newer smaller commercial vendors (Sybase, Progress, Vertica). The open source world mirrors this, from well-established open source projects (Postgres, SQLite), to cross-over open source projects/products in the vein of MySQL itself (Ingres or EnterpriseDB), to upcoming systems that are pushing their way into database management (Hadoop, Tokyo Cabinet). Have no fear: if you want competition in the database market, it's there. Even if MySQL were to disappear (and it won't), the competition will be there. I would agree that a MySQL under Oracle would mean less competition between *those two products*, in the way that we saw IBM settle the differences between Informix and DB2, but this will have virtually zero impact on the database industry as a whole when it comes to the number of choices available. (Trust me, there is way too much money to be made from a MySQL that gets "killed" by Oracle for companies not to give it a go.)

Disclosure: I work at OmniTI; we do database management and consulting for various database systems, including Oracle and MySQL. I don't think the Oracle purchase affects my bottom line either way, but I have to admit I would feel safer upgrading an Oracle product than a MySQL product, so that probably makes me less afraid of an Oracle purchase than your average MySQL user. OTOH, I've actually hung out with Monty on occasion, and Black Vodka aside, I'm pretty sure I'd rather go drinking with him than Larry. *shrug*

LISA 2009 Wrap-up

While a good portion of the Postgres community was making their way to France for PGDay Europe, fellow BWPUG member Greg Smith and I were manning the home front at the 2009 Large Installation System Administration (aka LISA) conference, held this year in Baltimore, MD. The two of us took to the exhibition floor to man a booth for the PostgreSQL project, a two-day stint that gave us plenty of face time with the LISA attendees. For me it had been three years since my last LISA conference (in my other local city, Washington, D.C.), so I was curious to see how things had changed since then. Some thoughts/notes I took while working the show floor:

1) There were a lot of Postgres users at the show. A lot of *happy* Postgres users. Compared to three years ago, when we ran into just a few, more than half the people who stopped at the booth were already using Postgres.

2) No one asked me "so why should I use Postgres instead of MySQL?". Which is not to say the topic of MySQL didn't come up, but the above question is by far the #1 question I normally hear working community booths (I even got it at OSCon this summer), so to not get anyone asking was quite a surprise in retrospect. I think this is probably due to two factors: first, that Postgres advocacy has been working hard to make the case for Postgres and clarify the differences between the two projects, and second, that we've gotten a lot of converts over the past three years, so there's much more knowledge about Postgres these days. A couple of people showed there was still work to do; some glossed over differences between the projects, and one person even thought Postgres was the commercial version of MySQL; so the job of Postgres advocacy goes on.

3) So where did they come from? Many of the people who told us they were happy Postgres users also mentioned previous database systems they had worked on.
These aren't formal numbers, but I'd say the breakdown was close to 55% MySQL, 35% Oracle, 5% Sybase, and 5% MSSQL. Again, rough numbers, but that seems about right. As the LISA crowd is heavy on system administrators, the complaints were mostly that MySQL was a pain to keep running (regular corruption issues and similar problems), and that Oracle just couldn't justify its cost.

4) One person I spoke with told me about a problem they had setting up authentication. They run a university where they initially set up authentication for students via LDAP, which they thought was pretty nice. They then ran into a problem because the students had to write scripts for classes, which required them to hard-code their LDAP passwords in the scripts, which were easily read by other students. They ended up solving the problem by configuring the Apache server to run files as the script owner rather than the more standard "nobody" user, which allowed them to prevent others from seeing individual scripts. This isn't the first person I've run into with this type of problem; I'd love to see more people blogging on topics like this.

5) Several people asked about the business model behind Postgres. Many people get stuck on the idea that every open source project has a single corporate backer/owner. I've been a big proponent of highlighting both the strength of the Postgres community and the nature of being a true open source project, so for me these are great questions to get to talk about, but it's something we should make sure other folks volunteering for booth duty are prepared to answer.

Finally, I want to say a big thanks to the folks running LISA and to the crowd at large. In a conference thin on DBAs, we still managed to get a number of donations, which will help with further advocacy efforts. I guess system admins are into solid database software too. :-)