Once again someone has brought up the idea of having a hints system in Postgres, which means once again we’re all subjected to watching people trod out the same tired, faulty, and even self-contradictory reasons to try to justify the idea that Postgres doesn’t have, need, or even want a hinting system. As frail as the arguments might be, people are so entrenched in their positions now that even having a discussion on the topic is difficult. And in fact, there are really two discussion going on here; one is whether Postgres should have *any* type of hinting system, and the other a more specific discussion on if we should have a *query* hinting system.
Does Postgres Need Hinting?
I think anyone who argues against hinting in an absolute sense is approaching things from the standpoint of intellectual dishonesty. Even posts that pretend to sustain the point of not having hints (like Josh’s recent post
) eventually seem to acquiesce to the idea that some system for hinting would be ok; maybe not query hinting but “selectivity” hints or “statistics” hints or some such similar jargon.
Now, it’s true, some people do take the overly high minded idea that even that level of effort is mis-guided and that time would be better spent fixing these so-called bugs in the planner. While that sounds great *in theory*, I’ve found that if you watch the mailing lists long enough, you’ll find people who have reported optimizer bugs and not been able to get a fix. Greg’s laundry list of work-arounds
isn’t something new that he made up; it’s a collection of watching people year after year run into performance problems and then be forced to come up with solutions because a planner improvement just isn’t going to be forthcoming. Don’t get me wrong; I’d love nothing more than to be able to send poor queries to the list and have code improvements flow out of that, but given that after 10 years of me and 1000’s of others sending in those types of emails we’ve still not achieved a good enough planner to have killed the hinting conversation tells me that’s an unachievable goal, and for those who need to GSD, we would be better off with something more tangible.
Hinting Yes, But Certainly Not Query Hinting
Ok, so presuming you are onboard with the idea that hinting is something that is necessary, even if not ideal, then the question really should be framed as “what kind of hinting should we use?”. Most of the people involved in Postgres development who are vocal on this subject tend to be firmly against the idea of query hinting as implemented in all of the other major databases. I think the arguments for this break down into 1 of 3 different ideas, all of which are inter-related.
The first idea is that solving these problem at the query level is far too narrow a solution, and that we need a more general solution that can solve things without having to modify individual queries. While I think that would be great, I’ve often pointed out that the reason so many implementations have fallen back to query hinting is that that is where the manifestation of performance problems actually exist. In other words, if I have a table that gets 20 different queries on it, I don’t necessarily want any special statistics changes or hinting mechanism at all, until the 1 query comes along that has performance issues. At this point I need something to improve that query, being careful not to have any negative impact on the already existing queries. Tricks like modifying join_collapse_limit, or random_page_cost, might solve your specific query problem, but if you change those on a global level you might wreck your existing system too. This pushes us back to query level hinting. Granted you could argue that those knobs don’t require direct query modification, but it’s not like those type of tricks aren’t already in the system as well. Example? The highly touted “offset 0” optimization that you’ll hear talked about is one example. I admit, I’ve told people to use this very same trick to fix queries after they have upgraded from older Postgres versions and had their query performance break. On the one hand, this makes me look like a “performance guru” because I know this super double-secret handshake optimizer trick; on the other hand, what kind of pompous jackass can stand there with a straight face and tell someone the “problem” with their query is that they aren’t using this syntax hack which was entirely unnecessary on the same queries using the same database product that was released 2 years ago. Yes, that’s right. This isn’t just that it makes some queries slow; it makes queries that used to run perfectly fine totally nosedive on performance. (You might think that’s a bug, but let’s try not to digress.) Any query hinting system we had would ostensibly be no worse than the majority of work-arounds that are in play today, and IMHO they’d likely be better understood than explaining to people the intricacies of the different enable_foo GUC’s (which aren’t a good enough solution to solve problems within complex queries anyway)
The second idea against traditional query hinting is that the Postgres community can come up with a superior method at hinting, which would likely target data statistics rather than the query planner, and that efforts should be focused on that instead. This is again one of those ideas that sounds great *in theory*, and now-a-days there’s even some fancy hand-waving talking about overriding the optimizers notion of column distinctness
, or perhaps storing statistics on the correlation of data between multiple columns in a table, that make it sound like there’s something to this line of thinking. The problem is that while these solution do address some of the (relatively) simpler cases, there are whole classes of problems yet to be dealt with. Have you ever tried to do joins across aggregated subquery columns in Postgres? Not only has Postgres struggled with these types of queries for years, the whole idea of improved statistics or selectivity hinting is pretty much irrelevant because no one can even come up with a plan for how these types of suggestions could be stored and fed into the optimizer. (Hint: It’s hard to store statistics on columns that don’t exist beyond the execution of a single statement). (Side note: I do think you might be able to feed these in on a per-query basis, but then we’d be back to argument number 1). This isn’t to say it can’t be done; I do occasionally see academic papers involving Postgres and improved query performance
pop up, but I can’t remember the last time I saw one cited in a Postgres commit, and again we’ve been working on this problem for *years* and have yet to see a concrete proposal.
The last major argument against query level hints is that because this solution ignores the underlying data, the fixes are at best temporary (since data can change over time), and will eventually bite you once that happens. Or, to put it a different way, query hints are nothing more than footguns which will eventually cause you pain. To that, all I can say is DBA’s hate being eaten by crocodiles
. (OK, if that isn’t good enough for you, I’ll also point out that it’s a bit of a fallacy to think that the database can be a better predictor of data changes than a DBA; the database can only at best do statistical analysis after the fact; the DBA can do that as well, but can also know about potential data changes before they are implemented.)
Ok, Nice Rant, But You Know This Doesn’t Change Anything, Right?
Yes, I know. I know because I have had all of these discussions before. Hints are religious in Postgres, so you shouldn’t expect a rapid change here. On the other hand, Windows support used to be religious in Postgres. And you know what else? Replication used to be religious in Postgres. So, while you should never underestimate the stubbornness of the Postgres development group, it is possible for them to come around to ideas that were once consider verboten. I think it will probably be at least another 5 years before it happens though; there isn’t enough overlap with the Oracle and Postgres communities to realize how many people are not able to migrate to Postgres because of planner/optimizer deficiencies; but I do think that day will come.