Techdocs Update

Just got done with a slew of various website updates and wanted to give a pointer to 2 new articles we added to techdocs. Both by Carlos Crosetti, one is [http://www.geocities.com/ccrosetti/otl/index.html Programming OTL using PostgreSQL and unixODBC in the Solaris environment], the other [http://www.geocities.com/ccrosetti/otlw/index.html Programming OTL using PostgreSQL and ODBC in the Windows environment]. Both have a bunch of pretty pictures, always a bonus, so go check them out.

PostgreSQL Naming Names

With the help of several members within the community, I’ve compiled a list of [http://www.postgresql.org/about/users well-known companies and organizations that are making use of PostgreSQL]. It includes both companies who use us as a traditional enterprise style RDBMS (Afilias, The National Weather Service) and companies that have found PostgreSQL as a good database to use to build thier own products on (Apple, OpenMFG). In some cases we’ve provided pointers to public information including articles, case studies, or on-line documentation so you can see how these organizations make use of PostgreSQL. The list doesn’t included companies like CNN, ITN, Fox News, MTV, or Disney who some of us in the postgresql community know make use of postgresql by way of third party products and services they use. I based this decision on the thought that most of these companies probably don’t know they use PostgreSQL, so they probably aren’t the best candidates for this list. (However maybe you should keep in mind that every company that has an Apple runs PostgreSQL and the BSD operating system, nothing to sneeze at) Incidentally, if you know of any “name” companies making use of PostgreSQL who don’t consider it a trade seceret and would be willing to go on record, please do drop me a line. By name companies I am thinking companies that are [http://bwnt.businessweek.com/global_1000/2003/index.asp Global 1000], or [http://metamoney.com/globalListIndex.html Web 100], or have similar name recognition. We know they’re out there, we just need to convince them to speak up. [[image /xzilla/templates/default/img/emoticons/smile.png alt=”:-)” style=”display: inline; vertical-align: bottom;” class=”emoticon” /]]

Technical Prejudices

Nat Torkington recently had a pointer to a rather serious topic about using google to see peoples prejudices towards others, known as the [prejudice map](http://blog.outer-court.com/prejudice/). It is probably worth looking at on its own merits, but rather than focus on something mundane like improving humanity, I thought I’d try this technique on something much more important: geek religious wars! To that end here are the results of some common thoughts in the open source database community:

PostgreSQL is known for: features, stability, reliability

MySQL is known for: simplicity, reliability, speed, scaling, data corruption and performance.

And Sqlite? Apparently it’s known for not having any documents found [[image /xzilla/templates/default/img/emoticons/tongue.png alt=”:-P” style=”display: inline; vertical-align: bottom;” class=”emoticon” /]]

Bruce Momjian on BSDTalk Podcast

PostgreSQL core-team member Bruce Momjian was [http://bsdtalk.blogspot.com/2006/02/bsdtalk015-interview-with-postgresql.html recently interviewed] on the BSDTalk podcast. The overall podcast feel is pretty low-key, and the audio volume on Bruce’s segment is a little low, but if you turn up your volume it should be clear enough. Bruce hits on a number of topics including PostgreSQL on Windows, the PostgreSQL [http://www.bookpool.com/sm/1590595475 book] market, and in-place upgrades, so if you’re currently using postgresql or just getting started, I’d say give it a twirl.

ajaxMyTop for PostgreSQL (Link Fixed)

Jeremy Zawadony had [http://jeremy.zawodny.com/blog/archives/006199.html a pointer to a project called ajaxMyTop] which is an ajax-ified version of his mytop program which does process listing for mysql. The [http://sourceforge.net/project/screenshots.php?group_id=156708 screenshots] looked cool, but I really wanted to play around with it so I converted it to PostgreSQL. If you’d like to play with it, you can find the code on the [http://sourceforge.net/tracker/index.php?func=detail&aid=1422853&group_id=156708&atid=800870 ajaxmytop sourceforge patch page]. ==ED NOTE== I’ve re-uploaded the tarball to the sourceforge page so it should now be working. Probably forgot to check a box or something. Also it’s not a patch, it’s the full code needed, so you can just download and install.

What the Free Non-free Databases Signal

If you go back about a year, and certainly not more than 2, you’ll find the big commercial db vendors all claiming to be ignoring the open source database offerings. Generally the claim was that you might use an open source database for really small work, but for anything serious the open source databases (and really they were focused on mysql) just weren’t feature complete enough to compete with their products. So with IBM now joining Microsoft and Oracle in [http://www.earthtimes.org/articles/show/5188.html offering a free version of their db’s], is there something we can come away with? I think it’s this: the commercial databases cannot compete on features any more. Basically databases tend to compete on price and features. For years these commercial vendors have stuck to higher prices justified by better features, but that’s just no longer the case. 2005 really marked the end of that trend for two reasons; First, mysql 5.0 was released, which pretty well rounded out the check-mark feature list that most pointy hairs look at when deciding on a database technology. Second, PostgreSQL (which was already feature complete) hit the mainstream tech press, with companies like Pervasive, GreenPlum and EnterpriseDB making use of PostgreSQL and helping push it’s name into the commercial tech press (and the rest of the public) in ways that [http://www.tmcnet.com/usubmit/2006/02/01/1332690.htm only commercial companies can do]. These two events meant that people in the techworld who weren’t really up on their open source databases were now realizing that there really was something to look at, and that the proprietary guys just weren’t offering any value for their higher prices. And so those proprietary vendors will now try to compete on price as well. Eventually they will lose. As more people begin to use PostgreSQL, it will only get better and better. Right now the sweet spot of PostgreSQL can be summed up very nicely by looking at the free db2 market: two processors, 4GB of RAM and unlimited database size. I use machines like that and can get systems into the 100’s of transactions per second range, which covers a lot of ground for most people. I’ll grant, if you have a couple of million dollars to throw at a database solution, you should investigate the big three and see how they stack up against the open source offerings, but if your in the sweet spot, PostgreSQL is your no-brainer choice. Why? Since the competition is even based on price and features (evidenced from the new “free” db’s), PostgreSQL still offers you the advantage of a real competition of vendors based on service without the need to change technologies. Whether you’re hiring me or you’re hiring [http://www.sun.com/software/solaris/postgres.jsp Sun] or something in between, PostgreSQL gives you a true market place of support avenues, rather than a single vendor solution. All else being equal, PostgreSQL still comes out ahead.

Letting Non-superusers Control Logging

I’ve decided that my quest to port one of my systems from 7.3 -> 8.1 is turning into a case study on why I shouldn’t upgrade. It’s not just the [http://archives.postgresql.org/pgsql-performance/2006-01/msg00265.php query trouble], it’s the little quirks that that trip you up as well. Todays topic? Controlling Logging For Non-Superusers… So the problem I face is very simple. One of the apps that hits the 7.3 database is an xml-rpc service that runs on about a half-dozen servers, doing a couple thousand transactions every minute. When something goes awry, it’s hard to debug, especially on 7.3. So one thing we do is include a debugging flag on each server that allows it to turn on sql logging by issuing the proper SET command. Unfortunatly I looked in my 8.1 log where we’re doing some testing and ran across this:

2006-01-30 14:03:03 EST : 11267 : 10.25.10.54 : ERROR:  permission denied to set parameter "log_statement"

2006-01-30 14:03:03 EST : 11267 : 10.25.10.54 : STATEMENT: SET log_statement = on

Ugh. Apparently this behavior was changed somewhere after 7.3. I say somewhere becuase I didn’t see any mention of it in the release notes. (Bonus points for anyone who posts a link in the comments) Luckily it’s not a complete catastrophe. Basically you need to write a function that allows non-superusers to turn the logging on and off. In my case, I don’t need to duplicate the full functionality of log_statement, just the basic idea of turning logging on or off, so I whipped out this little function:

CREATE OR REPLACE FUNCTION log_statement(boolean) RETURNS text AS $$

DECLARE

    param     text;

    retval    text;

    setsql    text;

    devnull   text;

BEGIN

    IF $1 THEN

        param := 'all';

        retval := 'on';

    ELSE

       param := 'none';

       retval := 'off';

    END IF;



    setsql := 'SELECT set_config(\'log_statement\',\'' || param || '\',true)';

    RAISE DEBUG '%',setsql; 

    EXECUTE setsql INTO devnull;



    RETURN retval;

END

$$ LANGUAGE 'plpgsql' SECURITY DEFINER;

You can could probably shorten the code down a little, but it will work like this and should be pretty simple to read. If you really used this functionality alot, you might want to do some type of overloaded version of set_config itself and make it security definer so you could control all of the log_* parameters but that’s a bit beyond what I needed. The only downside here is that we still need to touch application code to implement the change, but I guess you can’t have everything.

Techdocs Update

More [http://people.planetpostgresql.org/devrim/index.php?/archives/35-Solaris-packages-for-PostgreSQL.html good news for Solaris users], we’ve added a new document to the PostgreSQL [http://techdocs.postgresql.org Techdocs] site, this one discussing some best practices for [http://techdocs.postgresql.org/techdocs/pg_solaris10_jdbc_v1.2.1.pdf deploying PostgreSQL 8.1 on Solaris] (pdf). I don’t use Solaris personally, but found it a nice read anyway, so even if you DBA on other platforms I’d recommend giving it a once-over. Many thanks to Chris Drawater for another fine contribution.

Viewing Your Current Session’s Settings

Probably the most well known way to look at settings is by using the SHOW ALL command in psql. This commands basically does what it says; it shows all the postgresql variables as a resultset onto your screen. It works ok, but I’ve found when doing tuning it really gives me a lot of extraneous information that I don’t need, and doesn’t give me some of the information I do need. So instead I look at [http://www.postgresql.org/docs/current/static/view-pg-settings.html pg_settings]. By using pg_settings, you can query on specific information relevant to your current session. Want to see all of the values you’re using that aren’t defaults, try this:

bjt=# select name, setting, source, category  from pg_settings where source <> 'default';

              name              |               setting               |       source       |                       category                       

--------------------------------+-------------------------------------+--------------------+------------------------------------------------------

 autovacuum                     | on                                  | configuration file | Auto Vacuum

 checkpoint_segments            | 8                                   | configuration file | Write-Ahead Log / Checkpoints

 checkpoint_timeout             | 600                                 | configuration file | Write-Ahead Log / Checkpoints

 config_file                    | /var/lib/pgsql/data/postgresql.conf | override           | File Locations

 data_directory                 | /var/lib/pgsql/data                 | override           | File Locations

 effective_cache_size           | 90000                               | configuration file | Query Tuning / Planner Cost Constants

 enable_nestloop                | on                                  | session            | Query Tuning / Planner Method Configuration

 fsync                          | on                                  | configuration file | Write-Ahead Log / Settings

 geqo_effort                    | 5                                   | session            | Query Tuning / Genetic Query Optimizer

 hba_file                       | /var/lib/pgsql/data/pg_hba.conf     | override           | File Locations

 ident_file                     | /var/lib/pgsql/data/pg_ident.conf   | override           | File Locations

 join_collapse_limit            | 25                                  | session            | Query Tuning / Other Planner Options

 lc_collate                     | C                                   | override           | Client Connection Defaults / Locale and Formatting

 lc_ctype                       | C                                   | override           | Client Connection Defaults / Locale and Formatting

 lc_messages                    | C                                   | configuration file | Client Connection Defaults / Locale and Formatting

 lc_monetary                    | C                                   | configuration file | Client Connection Defaults / Locale and Formatting

 lc_numeric                     | C                                   | configuration file | Client Connection Defaults / Locale and Formatting

 lc_time                        | C                                   | configuration file | Client Connection Defaults / Locale and Formatting

 listen_addresses               | *                                   | configuration file | Connections and Authentication / Connection Settings

 log_destination                | syslog                              | configuration file | Reporting and Logging / Where to Log

 log_directory                  | pg_log                              | configuration file | Reporting and Logging / Where to Log

 log_line_prefix                | :%h:                                | configuration file | Reporting and Logging / What to Log

 log_min_duration_statement     | 5000                                | configuration file | Reporting and Logging / When to Log

 log_min_error_statement        | error                               | configuration file | Reporting and Logging / When to Log

 log_rotation_age               | 1440                                | configuration file | Reporting and Logging / Where to Log

 log_rotation_size              | 0                                   | configuration file | Reporting and Logging / Where to Log

 log_statement                  | ddl                                 | configuration file | Reporting and Logging / What to Log

 log_truncate_on_rotation       | on                                  | configuration file | Reporting and Logging / Where to Log

 maintenance_work_mem           | 32768                               | configuration file | Resource Usage / Memory

 max_connections                | 1120                                | configuration file | Connections and Authentication / Connection Settings

 max_fsm_pages                  | 200000                              | configuration file | Resource Usage / Free Space Map

 max_prepared_transactions      | 0                                   | configuration file | Resource Usage

 max_stack_depth                | 8192                                | configuration file | Resource Usage / Memory

 port                           | 5432                                | command line       | Connections and Authentication / Connection Settings

 random_page_cost               | 3.5                                 | session            | Query Tuning / Planner Cost Constants

 redirect_stderr                | off                                 | configuration file | Reporting and Logging / Where to Log

 server_encoding                | SQL_ASCII                           | override           | Client Connection Defaults / Locale and Formatting

 shared_buffers                 | 50000                               | configuration file | Resource Usage / Memory

 statement_timeout              | 2700000                             | configuration file | Client Connection Defaults / Statement Behavior

 stats_command_string           | on                                  | configuration file | Statistics / Query and Index Statistics Collector

 stats_row_level                | on                                  | configuration file | Statistics / Query and Index Statistics Collector

 stats_start_collector          | on                                  | configuration file | Statistics / Query and Index Statistics Collector

 superuser_reserved_connections | 20                                  | configuration file | Connections and Authentication / Connection Settings

 temp_buffers                   | 3200                                | configuration file | Resource Usage / Memory

 TimeZone                       | EST5EDT                             | command line       | Client Connection Defaults / Locale and Formatting

 transaction_isolation          | read committed                      | override           | Client Connection Defaults / Statement Behavior

 transaction_read_only          | off                                 | override           | Client Connection Defaults / Statement Behavior

 wal_buffers                    | 1024                                | configuration file | Write-Ahead Log / Settings

 work_mem                       | 1024                                | configuration file | Resource Usage / Memory

(49 rows)

of course, even this might be too verbose, but the beauty is that you can make these queries as simple or complex as you’d like, for example if you’re doing performance tuning, you might want something fancy like:

bjt=# select name, setting, source, context from pg_settings where category ~ 'Tun|Memory' order 

bjt-#   by source='session' desc, source;

           name            | setting |       source       |  context   

---------------------------+---------+--------------------+------------

 enable_nestloop           | on      | session            | user

 geqo_effort               | 5       | session            | user

 join_collapse_limit       | 25      | session            | user

 random_page_cost          | 3.5     | session            | user

 effective_cache_size      | 90000   | configuration file | user

 maintenance_work_mem      | 32768   | configuration file | user

 max_stack_depth           | 8192    | configuration file | superuser

 shared_buffers            | 50000   | configuration file | postmaster

 temp_buffers              | 3200    | configuration file | user

 work_mem                  | 1024    | configuration file | user

 constraint_exclusion      | off     | default            | user

 cpu_index_tuple_cost      | 0.001   | default            | user

 cpu_operator_cost         | 0.0025  | default            | user

 cpu_tuple_cost            | 0.01    | default            | user

 default_statistics_target | 10      | default            | user

 enable_bitmapscan         | on      | default            | user

 enable_hashagg            | on      | default            | user

 enable_hashjoin           | on      | default            | user

 enable_indexscan          | on      | default            | user

 enable_mergejoin          | on      | default            | user

 enable_seqscan            | on      | default            | user

 enable_sort               | on      | default            | user

 enable_tidscan            | on      | default            | user

 from_collapse_limit       | 8       | default            | user

 geqo                      | on      | default            | user

 geqo_generations          | 0       | default            | user

 geqo_pool_size            | 0       | default            | user

 geqo_selection_bias       | 2       | default            | user

 geqo_threshold            | 12      | default            | user

(29 rows)

That certainly gives you a quick overview of your current setting within your session when your doing performance tuning. And also don’t forget you can also create views of these types of queries if you want:

bjt=# create view "dba.tune" as select name, setting, source, context from pg_settings where 

bjt-#   category ~ 'Tun|Memory' order by source='session' desc, source;

CREATE VIEW

… and make use of the info as you need…

bjt=# select * from "dba.tune" where source <> 'default';

         name         | setting |       source       |  context   

----------------------+---------+--------------------+------------

 enable_nestloop      | on      | session            | user

 geqo_effort          | 5       | session            | user

 join_collapse_limit  | 25      | session            | user

 random_page_cost     | 3.5     | session            | user

 effective_cache_size | 90000   | configuration file | user

 maintenance_work_mem | 32768   | configuration file | user

 max_stack_depth      | 8192    | configuration file | superuser

 shared_buffers       | 50000   | configuration file | postmaster

 temp_buffers         | 3200    | configuration file | user

 work_mem             | 1024    | configuration file | user

(10 rows)

More Reasons to Switch to Postgresql

John Drummond gives [http://www.johndrummond.com/?p=7 a break down on what has lead him to look at postgresql]. It’s interesting to think what a good combo C# and PostgreSQL for developers looking for a platform for the next 10 years. There are certainly a number of windows users out there, but this also gives them an easy migration path between Win32 development and the LAMP stack (linux/apache/mono/postgresql) should they want it.