At OmniTI, we are often approached by companies who think they are the next MySpace. They’re sure they are going to have 10 million users any day now, and they come to us to get ideas and advice on just how to make that happen. Most of the time they have already formulated a plan based on “how the big guys are doing it”. Quite often they have reached the point where they are hitting walls and trying to make sense of it.
I tend to think part of the problem is that how people think the big guys are doing it and how it is actually getting done are two very different things, especially when it comes to databases. The common myths range from the all too frequent “denormalize everything” and “eliminate all foreign keys for performance” to convoluted schemes involving horizontal and vertical partitioning (nay, sharding) amongst servers, with a complexity level that makes application coders dream of the simple times when they got to code cross-browser JavaScript for Netscape and IE 4. Do you really think the answer is to store all of your timestamp information in text fields? Some people do.
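To make the timestamp point concrete, here is a minimal Python sketch of one thing that can go wrong when timestamps live in text fields: the text sorts as text, not as time. The sample values and the US-style format are assumptions made up for illustration, not data from any real site.

```python
from datetime import datetime

# Hypothetical timestamps stored as text (the format is an assumption for illustration).
as_text = ["12/31/2006 23:59", "01/01/2007 00:00", "06/15/2007 12:30"]
FMT = "%m/%d/%Y %H:%M"

# Sorting the raw strings compares characters, so the month digits decide the order.
text_order = sorted(as_text)

# Parsing into real timestamps first gives chronological order.
real_order = sorted(as_text, key=lambda s: datetime.strptime(s, FMT))

print(text_order)  # the December 2006 entry lands last
print(real_order)  # the December 2006 entry lands first
```

A native timestamp column gives you the second ordering for free, along with range queries and date arithmetic, without every query having to parse strings.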
It’s funny that even with case studies by Sun and customers like Friendster and Renkoo, people stare at us like we’re aliens when we recommend something like PostgreSQL to them. In keeping with the common myths, I think they assume that since we are into scaling and open source, we’re going to bust out a MySQL-oriented plan making use of MySQL Cluster and Federated tables with replicated slaves and all sorts of other buzzword-oriented ideas. Truth is, we almost never recommend those types of solutions, even if we go with a MySQL solution for you. But it’s the PostgreSQL recommendations that make people think we have no clue what we’re talking about, because PostgreSQL and its use in social networking sites are almost never talked about.
The truth is that you can build a highly scalable architecture using any database solution, if you do it right. Sure, some tasks are easier depending on which system you pick; need to drop an index on a table without rewriting the table? MySQL can’t help you, but PostgreSQL can. Need to reindex an index without blocking writers? PostgreSQL ain’t gonna work there, but Oracle will. It’s all a matter of degrees. Which sounds simple in theory, but gets very real in practice.
So what are “the big guys” using? Here are the top 10 social networking sites from June 2007 (based on data from compete.com), along with the database solution being used:
- myspace.com, SQL Server
- facebook.com, Oracle & MySQL
- bebo.com, Oracle
- tagged.com, Oracle & MySQL
- blackplanet.com, Oracle & MySQL
- myyearbook.com, PostgreSQL
- hi5.com, PostgreSQL
- classmates.com, PostgreSQL
- friendster.com, MySQL
- xanga.com, MySQL
Most people are surprised to see any entries from PostgreSQL on the list, let alone three. And while Oracle and MySQL both have a hefty share, three of the sites use them in a hybrid fashion. You should be able to run a site on either Oracle or MySQL alone, so I have to think the hybrids highlight some difficulty with each of those databases on some level. I speculate the difficulty is monetary for Oracle (it seems unlikely you couldn’t replace any MySQL server with Oracle on a technical level) and technical for MySQL (you don’t buy Oracle licenses because you feel bad for not spending all your VC money; you do it because you need them).
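For anyone who wants to check the counts, the list above can be tallied in a few lines. This is just a quick Python sketch over the ten entries as listed; the variable names are mine.

```python
from collections import Counter

# The top 10 list above, transcribed as site -> databases in use.
sites = {
    "myspace.com": ["SQL Server"],
    "facebook.com": ["Oracle", "MySQL"],
    "bebo.com": ["Oracle"],
    "tagged.com": ["Oracle", "MySQL"],
    "blackplanet.com": ["Oracle", "MySQL"],
    "myyearbook.com": ["PostgreSQL"],
    "hi5.com": ["PostgreSQL"],
    "classmates.com": ["PostgreSQL"],
    "friendster.com": ["MySQL"],
    "xanga.com": ["MySQL"],
}

# How many sites use each database (hybrid sites count once per database).
counts = Counter(db for dbs in sites.values() for db in dbs)

# Sites running more than one database.
hybrid = [site for site, dbs in sites.items() if len(dbs) > 1]

print(counts)
print(hybrid)
```

By this tally MySQL appears at five sites, Oracle at four, PostgreSQL at three, and SQL Server at one, with three sites running the Oracle and MySQL combination.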
This is not to say that PostgreSQL is always the answer. I know I don’t believe SQL Server is the answer, and it sits at #1. :-) But PostgreSQL is a legitimate option, and one more companies would be wise to consider. Or at least, not be so confused if we recommend it. It is also “what the big guys are using”.