8.2.3 Stats Bug and Solaris

I’ve been following reports of the 8.2.3 statistics bug for a few weeks now, mainly with intrest/concern as we run some 8.2.3 databases at [http://www.omniti.com/home OmniTI]. The reports all seemed reasonable (see emails [http://archives.postgresql.org/pgsql-hackers/2007-02/msg01751.php here], [http://archives.postgresql.org/pgsql-hackers/2007-02/msg01902.php here], and [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00068.php here]), but it seemed odd because I haven’t experienced such pains myself. I was kind of chalking it up to our 8.2.3 instances just not being heavily taxed (they can’t all be TB databases [[image /xzilla/templates/default/img/emoticons/wink.png alt=”;-)” style=”display: inline; vertical-align: bottom;” class=”emoticon” /]] ), but with some people [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01529.php swearing off the 8.2 branch] until this gets fixed, I thought it might be worth a closer look to make sure of what was going on. I don’t have answers per say, but I do have some information… I ran a few tests on our machines that run Solaris and PostgreSQL (compiled with Sun Studio, not GCC):
the test that [http://people.planetpostgresql.org/joshua/ Joshua Drake] used: bash-3.00$ time for i in `bin/psql -d postgres -c "select generate_series(1,10000)"`; do bin/psql -d postgres -qc "select 1 from pg_database where datname = 'postgres'"; done;
Of course, this is Solaris, so I poked [http://www.lethargy.org/~jesus/ Theo] for some dtrace-fu, and (in the end) got the following results:
bash-3.00# /usr/sbin/dtrace -n 'syscall:::entry/pid==22260/{@a[probefunc]=count();}' dtrace: description 'syscall:::entry' matched 229 probes ^C close 10717 getpid 10717 ioctl 10717 lwp_sigmask 10717 open 10717 rename 10717 setcontext 10717 setitimer 10717 fstat 21434 write 21434 recv 80650 pollsys 91336
The first couple times I did this with a lower number of calls (1000 range) the numbers didnt seem so bad, but the ratio looked sketchy so I kicked it up a notch to get a better reproduction and sure enough, they start to look at lot like those reports on -hackers. This makes more sense as the bug in question isn’t something that should be able to be optimized away. So what does this mean? I dunno… if you do a fair amount of testing to make sure your servers can handle the load you’re going to be putting on them, I don’t see much reason to avoid 8.2.3 (Again, our boxes are humming along fine). Even if you think this will cause you a performance issue, testing your schema and application against any of the 8.2.x branch is a good way to spend time while you wait for 8.2.4 (the fix is in CVS now, so we should see a released fix soon), and in many cases resolving those types of fixes are generally more time consuming anyway.