Why the F&#% Doesn't Postgres Have Hints?!?!

Monday, February 7, 2011

Once again someone has brought up the idea of having a hints system in Postgres, which means once again we're all subjected to watching people trot out the same tired, faulty, and even self-contradictory reasons to justify the idea that Postgres doesn't have, need, or even want a hinting system. As frail as the arguments might be, people are so entrenched in their positions now that even having a discussion on the topic is difficult. And in fact, there are really two discussions going on here: one is whether Postgres should have any type of hinting system, and the other, more specific one is whether we should have a query hinting system.

Does Postgres Need Hinting?

I think anyone who argues against hinting in an absolute sense is approaching things from a position of intellectual dishonesty. Even posts that claim to argue against hints (like Josh's recent post) eventually seem to acquiesce to the idea that some system for hinting would be ok; maybe not query hinting, but "selectivity" hints or "statistics" hints or some similar jargon.

Now, it's true, some people do take the overly high-minded position that even that level of effort is misguided and that time would be better spent fixing these so-called bugs in the planner. While that sounds great *in theory*, if you watch the mailing lists long enough, you'll find people who have reported optimizer bugs and not been able to get a fix. Greg's laundry list of work-arounds isn't something new that he made up; it's a collection gathered from watching people, year after year, run into performance problems and then be forced to come up with their own solutions because a planner improvement just isn't forthcoming. Don't get me wrong; I'd love nothing more than to be able to send poor queries to the list and have code improvements flow out of that. But the fact that, after 10 years of me and thousands of others sending in those types of emails, we still don't have a planner good enough to kill the hinting conversation tells me that's an unachievable goal, and those of us who need to GSD would be better off with something more tangible.

Hinting Yes, But Certainly Not Query Hinting

Ok, so presuming you are on board with the idea that hinting is necessary, even if not ideal, the question really should be framed as "what kind of hinting should we use?". Most of the people involved in Postgres development who are vocal on this subject are firmly against query hinting as implemented in all of the other major databases. I think their arguments break down into three different ideas, all of which are interrelated.

The first idea is that solving these problems at the query level is far too narrow a solution, and that we need a more general solution that can solve things without modifying individual queries. While I think that would be great, I've often pointed out that the reason so many implementations have fallen back to query hinting is that the query is where performance problems actually manifest. In other words, if I have a table that gets 20 different queries, I don't necessarily want any special statistics changes or hinting mechanism at all, until the one query comes along that has performance issues. At that point I need something to improve that query, while being careful not to have any negative impact on the existing queries. Tricks like modifying join_collapse_limit or random_page_cost might solve your specific query problem, but if you change those at a global level you might wreck the rest of your system too. This pushes us back to query-level hinting. Granted, you could argue that those knobs don't require direct query modification, but it's not like those types of tricks aren't already in the system as well. Example? The highly touted "offset 0" optimization that you'll hear talked about. I admit, I've told people to use this very trick to fix queries whose performance broke after they upgraded from older Postgres versions. On the one hand, this makes me look like a "performance guru" because I know this super double-secret handshake optimizer trick; on the other hand, what kind of pompous jackass can stand there with a straight face and tell someone the "problem" with their query is that they aren't using a syntax hack that was entirely unnecessary for the same queries on the same database product released 2 years ago? Yes, that's right. This isn't just a matter of some queries being slow; queries that used to run perfectly fine completely nosedive on performance. (You might think that's a bug, but let's try not to digress.) Any query hinting system we had would ostensibly be no worse than the majority of work-arounds in play today, and IMHO it would likely be better understood than explaining to people the intricacies of the different enable_foo GUCs (which aren't a good enough solution for problems within complex queries anyway).
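To make the work-arounds described above concrete, here is a minimal SQL sketch. The `customers` and `orders` tables are hypothetical, and the setting values are placeholders rather than tuning advice. `SET LOCAL` confines planner settings such as join_collapse_limit and random_page_cost to a single transaction instead of changing them server-wide, and the second query shows the "offset 0" trick in its usual form.

```sql
-- Hypothetical tables: customers(id, name), orders(id, customer_id, status, total).
BEGIN;
-- SET LOCAL lasts only until COMMIT or ROLLBACK; other sessions and
-- later transactions keep the server-wide values.
SET LOCAL join_collapse_limit = 1;  -- plan explicit JOINs in the order written
SET LOCAL random_page_cost = 2.0;   -- placeholder value, not a recommendation

SELECT c.name, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.status = 'open';
COMMIT;

-- The "offset 0" trick: OFFSET 0 returns the same rows, but the planner
-- will not pull a subquery with an OFFSET clause up into the outer query,
-- so the subquery is planned on its own (an "optimization fence").
SELECT c.name, o.total
FROM (SELECT * FROM orders WHERE status = 'open' OFFSET 0) AS o
JOIN customers c ON c.id = o.customer_id;
```

Note that even a transaction-scoped setting still applies to every scan and join in the statements it covers, which is why the enable_foo GUCs cannot single out one subquery inside a larger query.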

The second idea against traditional query hinting is that the Postgres community can come up with a superior method for hinting, which would likely target data statistics rather than the query planner, and that efforts should be focused on that instead. This is again one of those ideas that sounds great *in theory*, and nowadays there's even some fancy hand-waving about overriding the optimizer's notion of column distinctness, or perhaps storing statistics on the correlation of data between multiple columns in a table, that makes it sound like there's something to this line of thinking. The problem is that while these solutions do address some of the (relatively) simpler cases, there are whole classes of problems yet to be dealt with. Have you ever tried to do joins across aggregated subquery columns in Postgres? Not only has Postgres struggled with these types of queries for years, but improved statistics or selectivity hinting is pretty much irrelevant here, because no one can even come up with a plan for how these types of suggestions could be stored and fed into the optimizer. (Hint: It's hard to store statistics on columns that don't exist beyond the execution of a single statement.) (Side note: I do think you might be able to feed these in on a per-query basis, but then we'd be back to argument number 1.) This isn't to say it can't be done; I do occasionally see academic papers involving Postgres and improved query performance pop up, but I can't remember the last time I saw one cited in a Postgres commit, and again, we've been working on this problem for years and have yet to see a concrete proposal.
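For comparison, overriding column distinctness is available as a per-column attribute option starting with PostgreSQL 9.0. A minimal sketch, again assuming a hypothetical `orders.customer_id` column:

```sql
-- Requires PostgreSQL 9.0 or later; table and column names are hypothetical.
-- A negative n_distinct is a fraction of the row count (-0.05 means the
-- distinct values are about 5% of the rows); a positive value is an absolute count.
ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = -0.05);

-- Optionally have ANALYZE sample this column more heavily as well.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

-- Neither change affects plans until the table is analyzed again.
ANALYZE orders;
```

As the paragraph argues, this kind of override attaches to a real table column; there is nowhere to attach it for a column that only exists as the output of an aggregated subquery.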

The last major argument against query-level hints is that because this solution ignores the underlying data, the fixes are at best temporary (since data can change over time) and will eventually bite you once that happens. Or, to put it a different way, query hints are nothing more than footguns that will eventually cause you pain. To that, all I can say is DBAs hate being eaten by crocodiles. (OK, if that isn't good enough for you, I'll also point out that it's a bit of a fallacy to think that the database can predict data changes better than a DBA can; at best, the database can do statistical analysis after the fact. The DBA can do that as well, but can also know about potential data changes before they are implemented.)

Ok, Nice Rant, But You Know This Doesn't Change Anything, Right?

Yes, I know. I know because I have had all of these discussions before. Hints are religious in Postgres, so you shouldn't expect a rapid change here. On the other hand, Windows support used to be religious in Postgres. And you know what else? Replication used to be religious in Postgres. So, while you should never underestimate the stubbornness of the Postgres development group, it is possible for them to come around to ideas that were once considered verboten. I think it will probably be at least another 5 years before it happens, though; there isn't enough overlap between the Oracle and Postgres communities for people to realize how many users are unable to migrate to Postgres because of planner/optimizer deficiencies. But I do think that day will come.

Posted by Robert Treat in postgres, sql at 08:33 | Comments (10) | Trackbacks (0)

Comments

s/you should expect a rapid change/you should not expect a rapid change/

#1 Filip on 2011-02-07 10:23

Indeed! Fixed, thanks

#1.1 Robert Treat on 2011-02-07 10:48

When I saw the trackback, from the intro text I assumed I'd picked up some trolling. No, these are all fair comments. I don't actually have any religion here. If I thought hints were the right approach, I'd be collecting data to support why they are necessary. The funny thing about playing with the optimizer is that the more you do it, the less regular hints seem like the way around the problems you run into though.

Hinting to fix selectivity errors will need to happen on a per-column basis, in the query. Overriding there should be stable over time, as selectivity tends to stay proportionately the same even as data sets grow. That's one reason why driving hints from that angle seems better to some developers.

The only way you're going to get a major change in this area is to have someone who is running into problems here fund someone to work on the relevant parts of the optimizer, with test data they can share with a few major developers. The lack of repeatable test cases for the problems people are running into is a major obstacle to doing better here.

#2 Greg Smith on 2011-02-07 13:46

About JOINs with aggregated results, even when a plan is weird, all my problems were solved with "WITH" clauses.

I guess that those hints are requested by people who still use simple SQL queries from generic ORMs.

Not trolling, just a comment.

#3 Daniel Cristian Cruz on 2011-02-08 05:26

Quite the opposite, actually. First, most ORM users can't add query hinting even when they want to; ORMs don't support it, and if you break out of the ORM to do it, chances are you could rewrite the query for speed.

Really, it's needed for folks doing complex analytical queries: those 14-page-long queries that do large amounts of aggregation across datasets. WITH queries help, but they don't solve all of those problems.

#3.1 Robert Treat on 2011-02-22 11:20

As someone who was an Oracle DBA for a long time, then worked for EnterpriseDB, and now works in a large Postgres shop, I admit I found the recent thread about hinting quite entertaining.
First, let me say that hints are something you use as a last resort. They are a hard-coded directive and certainly do not scale. However, sometimes you need to make things work because the business requires it. The VP of Sales & Marketing does not really care if your database implementation is not the most elegant; they want their app to perform. I agree that "in theory" it would be great if we didn't need a lot of these features because all designs/programs were perfect, but that is just not reality. As a DBA I'd love to tell the business people we need to redesign the schema and/or rewrite the code so performance is optimal and we don't need things like hints. Or I could add a one-line hint. For the business, what's a better use of resources? The difference I see between Postgres and, say, Oracle is that Postgres is a not-for-profit company that is not driven by customers. There are plenty of Oracle features that exist to help companies get out of the messes they've gotten themselves into. It's admirable that the Postgres community wants to make the optimizer perfect instead, but Oracle has been working on its optimizer for a long time with a large group of really smart people, and I'm thinking if they haven't made it perfect, odds are the PG community isn't going to either.

#4 Peter Steinheuser on 2011-02-08 10:35

Not only is the Postgres community extremely unlikely to produce the "perfect optimizer", it is also very rude to tell your users that developers consider them dumber than a computer program. Also, commercial companies should take notice of that "we're not for profit" BS. Translated to English, that means: we're not going to listen to what the users ask for; we're artists, we'll do whatever the heck we think is right.

#4.1 Mladen Gogala on 2011-02-10 10:10

Yes, let's do whatever all users want. (sic)

Sure, it's not good to have a huge, bloated project, with bad code becoming a constraint on better and more important features.

I don't need hints. I set the right settings before the commands (http://www.postgresql.org/docs/9.0/interactive/runtime-config-query.html). Never needed more than this, just the manual.

#4.1.1 Daniel Cristian Cruz on 2011-02-11 06:04

Again, you show your ignorance in this area. First, by admitting that you make per-query planner adjustments, you are essentially admitting that query-level hinting is a feature you would use. Second, if those tunables are good enough for you, it shows you're not writing queries complex enough to have been affected by this; think of the case where you want to turn off index scans for a specific subselect, but turning off all index scans is disastrous. You need a more fine-grained hinting system than what the current GUCs offer.

#4.1.1.1 Robert Treat on 2011-02-22 11:23

Nice post.

I just tried to add my comments on Josh's blog, but after I read your article, some additional thoughts came to mind:

It is a pain when products restrict functionality on purpose, because it means that they did not consider use cases beyond their current viewpoint.

A metaphor to describe this:

I sometimes face this issue with Java libraries. Yes, it is nice that a library owner prohibits other developers from inheriting functionality, because it ensures that it's not possible to break the intended functionality.

But if you hit an edge case where you need to tweak some of the original intentions, you find that it gets really hard to change anything. In the end, you end up overriding tons of methods just because it's the only way to change something for a good reason.

I guess that the same type of "problem" can be found in many scenarios and is really not unique to PostgreSQL's query-hint problem.

I think that PostgreSQL followers are seeing the problem from the wrong perspective.

They always tell everyone a "we don't want it" story, but in reality what they mean is something like "sorry, we lack the manpower to implement hinting mechanics; we don't even know yet what they should look like".

The difference is subtle:

In the first case, the customer is angry, because the perceived message is: "they don't want to fix it".

But in the second case, he has the chance to realize that helping the PG team might be a win-win situation (even if it's just a small donation to emphasize that this feature is important to them).

On the plus side: it seems like February's "query hint" discussions improved the PG FAQ page a little bit.

#5 Bernhard Neuhauser on 2011-03-02 19:30

Hi! I'm Robert Treat, COO of OmniTI, perhaps the best internet technology consulting company on the planet.

A veteran open source developer and advocate, I have been recognized as a major contributor to the PostgreSQL project, and can often be found speaking on open source, databases, and large scale web operations.
