I spend large portion of my life working on MySQL Performance Optimization and so MySQL Optimizer is quite important to me. For probably last 10 years I chased first Monty and later Igor with Optimizer complains and suggestions. Here are some general ideas which I think can help to make optimizer in MySQL, MariaDB or Drizzle better.
Make it Plugable Every Major optimizer change causes a lot of pain for some of the users. Even if you improve performance 99 our of 100 queries there are going to be countless users complaining about the change. Due to this problem Optimizer Team was more conservative than I think they could have been. The solution is simple – make optimizer pluggable and make it possible to stick to old optimizer behavior with new MySQL Version.
Make Cost Model Adjustable MySQL Optimizer looks at query plan in terms of disk IOs/Seeks in same way for all data sets. In practice some people have their database 100% in RAM (even for Disk tables such MyISAM or Innodb) others keep database on SSD which has completely different ratio between CPU and IO cost.
Focus on Execution Methods Performance problems can be due to optimizer picking the wrong plan, such as doing full table scan when Index access is better or because MySQL simply does not have execution method to resolve query in optimal way – loose index scan, hash join, sort merge join are all the examples of such. For me it is most important to ensure MySQL has proper ways to execute the query. It may not always pick them right but at least it allows to get query going right manually.
Zero Administration Tunables and Hints Zero Administration is great. I would love to see Optimizer which always choses the fastest plan for the query (not the plan with lowest “cost” but the one which actually gives best performance). I also recognize there are always going to be cases when Optimizer will not pick the right plan. So I would like to see tuning knobs (which relates to cost model and various optimizations) as well as simply hints. Any way MySQL could possibly execute query should be possible to force with hints. In MySQL 4.0 this was the case, in recent versions number of optimizations have been added which can’t be easily forced with hints.
Another benefit of having ways to force any optimizer behavior with hints (rather than by changing the code) is the data Optimizer Team can get back from customers and community – it because very easy for users to show there is the plan which works better and so let the team know which cases are not handled best by the optimizer logic.
I cannot agree more. Some database projects (for instance PostgreSQL) have the following policy: if the optimizer is wrong, then the optimizer must be fixed, and there is no point supporting hints. Rationale is this policy is supposed to increase the quality of the optimized in the long term. It may work or not work, this is debatable. I will not comment on the relative quality of Oracle/MySQL/PostgreSQL query optimizer.
My point is it is not really a pragmatic approach. Giving full control over the query execution plans to the developer or DBA is an invaluable tool to get the job done, especially for large databases, in a high throughput, high availability context. Results of a query optimizer can only be as good as the collected statistics, which are sometimes quite difficult to generate in some situations.
Didier,
There are a lot of cases why optimizer can do mistakes. I have not seen optimizer which would never pick the slower plan for some kind of query. It is good if optimizer is fixed but you can’t always upgrade to the next release. For example if you have problem with MySQL 5.0 the problem may be fixed in 5.4.x but you may not really be able to upgrade immediately.
We’ve all been bit by the optimizer suddenly picking a poor plan when the data set changes, most probably without even knowing that the problem lies therein. Its not like saving an explain of every query is really all that common practice (though we have to give a nod to mk-query-digest which would make doing that relatively easy). The most common scenario is probably “hey query X is going REALLY slow on live but its really fast on dev.. explain both.. realize that query on live is now scanning a table.. add index hints or archive some data. This is all reasonable, but it probably means angry database users waiting on highly technical work to complete, which is never a good place to be in.
I think this is a large driving force behind people moving bits of their architecture to simpler key/value systems. Among other things, the value of predictable scalability is outweighing the flexibility of SQL, even when it means rewriting and refactoring large pieces of applications.
So yes, I have to agree that adding more places to force queries to run with a predictable plan is indeed something MySQL will need to continue to meet the demands of users.
Clint,
I think Drizzle is moving in right direction with this. And NDB (MySQL Cluster) is another example
You should not be forced to ONLY access your storage using SQL. There are cases when you can avoid expense of parsing and picking optimizer plan when transaction is predictable and performance critical.