One schema optimization we often do is extending an index when there are queries that could use more key parts. Typically this is a safe operation: unless the index length increases dramatically, queries that could use the old index can also use a prefix of the new one, can't they? It turns out there are special cases where this is not true.
CREATE TABLE `idxitest` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `a` int(11) NOT NULL,
  `b` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=6029313 DEFAULT CHARSET=latin1

mysql> select count(*) from idxitest where a=5 and b=5;
+----------+
| count(*) |
+----------+
|    60434 |
+----------+
1 row in set (0.69 sec)

mysql> explain select count(*) from idxitest where a=5 and b=5;
+----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref   | rows   | Extra       |
+----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
|  1 | SIMPLE      | idxitest | ref  | a             | a    | 4       | const | 707820 | Using where |
+----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
1 row in set (0.00 sec)
The obvious optimization is to extend the index from (a) to (a,b), which will make this query faster and should not hurt any other queries much, right?
mysql> alter table idxitest drop key a,add key(a,b);
Query OK, 0 rows affected (24.84 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from idxitest where a=5 and b=5;
+----------+
| count(*) |
+----------+
|    60434 |
+----------+
1 row in set (0.02 sec)

mysql> explain select count(*) from idxitest where a=5 and b=5;
+----+-------------+----------+------+---------------+------+---------+-------------+--------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref         | rows   | Extra       |
+----+-------------+----------+------+---------------+------+---------+-------------+--------+-------------+
|  1 | SIMPLE      | idxitest | ref  | a             | a    | 8       | const,const | 120640 | Using index |
+----+-------------+----------+------+---------------+------+---------+-------------+--------+-------------+
1 row in set (0.00 sec)
Wow. The query runs 30 times faster, not only because it has to scan fewer rows but also because it is now an index-covered query: it does not need to access the data rows at all.
It turns out it is too early to celebrate. The application also had another query which previously ran so fast it could hardly be noticed. That query, however, became a lot slower:
# Old Schema
mysql> select * from idxitest where a=100 order by id desc limit 1;
+---------+-----+---+
| id      | a   | b |
+---------+-----+---+
| 3000000 | 100 | 7 |
+---------+-----+---+
1 row in set (0.00 sec)

mysql> explain select * from idxitest where a=100 order by id desc limit 1;
+----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref   | rows   | Extra       |
+----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
|  1 | SIMPLE      | idxitest | ref  | a             | a    | 4       | const | 126074 | Using where |
+----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
1 row in set (0.00 sec)

# New Schema
mysql> select * from idxitest where a=100 order by id desc limit 1;
+---------+-----+---+
| id      | a   | b |
+---------+-----+---+
| 3000000 | 100 | 7 |
+---------+-----+---+
1 row in set (1.01 sec)

mysql> explain select * from idxitest where a=100 order by id desc limit 1;
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | idxitest | index | a             | PRIMARY | 4       | NULL |   36 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

# The plan also can look something like this:
mysql> explain select * from idxitest where a=100 order by id desc limit 1;
+----+-------------+----------+------+---------------+------+---------+-------+------+------------------------------------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref   | rows | Extra                                    |
+----+-------------+----------+------+---------------+------+---------+-------+------+------------------------------------------+
|  1 | SIMPLE      | idxitest | ref  | a             | a    | 4       | const |    1 | Using where; Using index; Using filesort |
+----+-------------+----------+------+---------------+------+---------+-------+------+------------------------------------------+
1 row in set (0.01 sec)
So why did this query become so much slower? The reason is that its plan benefits from an InnoDB-specific feature: index entries are sorted by primary key for each complete key value. So when you have an index on (a) and id is the primary key, the real index is (a,id); when we extend the index to (a,b), it really becomes (a,b,id). If a query used both the a and id key parts of the original index, it will quite likely be unable to use the new index to the same extent.
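To make this concrete, here is the effective key layout spelled out as a conceptual sketch (these are comments, not index definitions you would actually create):

-- KEY (a)   is stored by InnoDB as (a, id)    -- the primary key is appended implicitly
-- KEY (a,b) is stored by InnoDB as (a, b, id) -- id now orders entries only within each (a,b) pair
-- Hence WHERE a=100 ORDER BY id DESC LIMIT 1 can read the (a, id) entries for a=100
-- backwards and stop after a single entry, while in (a, b, id) the entries for a=100
-- are no longer ordered by id alone.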
What is the solution? You can have “redundant” indexes on (a) and (a,b) at the same time. This is something that is supposed to work, but often it does not, as shown below.
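As a sketch, starting from the original schema that has only KEY `a` (`a`), the redundant pair can be created like this (MySQL picks the name a_2 for the new index automatically):

alter table idxitest add key (a,b);

With both indexes in place, here is what happens: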
CREATE TABLE `idxitest` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `a` int(11) NOT NULL,
  `b` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `a_2` (`a`,`b`)
) ENGINE=InnoDB AUTO_INCREMENT=6029313 DEFAULT CHARSET=latin1

mysql> select * from idxitest where a=100 order by id desc limit 1;
+---------+-----+---+
| id      | a   | b |
+---------+-----+---+
| 3000000 | 100 | 7 |
+---------+-----+---+
1 row in set (1.03 sec)

mysql> explain select * from idxitest where a=100 order by id desc limit 1;
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | idxitest | index | a,a_2         | PRIMARY | 4       | NULL | 2247 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
The MySQL optimizer considers using both the (a) and (a,b) indexes and in the end decides to use neither, instead doing a full index scan until it finds a=100. This looks like an optimizer glitch in this case, because it estimates it will scan 2247 rows with the selected plan, while with the (a) index it is guaranteed to get the result by scanning only 1 row.
To really get things going, you will need to use FORCE INDEX(a) to force the MySQL optimizer onto the right plan.
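For illustration, here is a minimal sketch against the redundant-index schema above, where the single-column index is named a (the EXPLAIN output is omitted, since it depends on your data and server version):

-- Hint the optimizer to use the (a) index; it can then walk the hidden (a, id)
-- ordering backwards and stop at the first row matching a=100.
select * from idxitest FORCE INDEX(a) where a=100 order by id desc limit 1;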
These results mean you should be very careful applying index changes suggested by mk-duplicate-key-checker when it comes to redundant indexes. If you have query plans that depend on InnoDB ordering data by primary key inside indexes, they can be significantly affected.
Optimizer behavior may differ between MySQL versions. These tests were done with 5.1.45, though I've seen the same behavior with MySQL 5.0 too.
Hi Peter,
I've seen inconsistent behavior in MySQL/InnoDB with regard to the PK being a suffix of all indexes. That is, many times execution plans did not go as planned, just as you presented.
Regarding “index entries being sorted by primary key for each complete key value”: although tests have (almost) always supported this, and despite the fact that I sometimes rely on it for query optimization, I was unsuccessful in finding documentation for it. Is this feature documented? Is it declared to be so? Or is it just the current implementation? I'll be grateful if you could point me to the docs.
I may write about this in the future: I found temporary tables to behave differently with the InnoDB plugin's compressed format, but, again, inconsistently.
Regards
Shlomi,
I'm not sure whether this is documented, though recent versions of the MySQL optimizer rely on this behavior in multiple places. There were also a number of bugs, and support for this is relatively recent: I think even in early MySQL 5.0 you would need to add id explicitly at the end of the index if you wanted to sort by it.
It is part of the InnoDB design to store entries sorted by PK for each key value; I believe it would be a major change to alter that.
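For such older versions, spelling the primary key column out explicitly would look something like this (an illustrative sketch; the index name a_id is arbitrary):

-- Only needed on versions that do not let you sort by the implicit trailing id.
alter table idxitest add key a_id (a, id);

On recent versions such an index is redundant, since InnoDB already appends id to the (a) index internally.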
Peter, thanks.
PS, instead of using FORCE INDEX, what I actually do is drop the “ORDER BY id” when the desired order is ASC, and just rely on the fact that rows are retrieved in that order.
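In other words, a sketch of the two forms (this relies on InnoDB's current retrieval order rather than any documented guarantee):

-- explicit ordering:
select * from idxitest where a=100 order by id limit 1;
-- relying on the implicit (a, id) retrieval order instead:
select * from idxitest where a=100 limit 1;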
Shlomi,
Yes… this is why I used DESC in the example. It is actually what was in the real query too 🙂
No offence 😀
It seems a better way for InnoDB to deal with secondary indexes would be to explicitly require that the PK be specified. I guess I am becoming an enemy of all things implicit.
I don't quite understand one thing here:
SELECT * FROM idxitest WHERE a=100 ORDER BY id DESC LIMIT 1;
As we know, the primary key is also the clustered index in InnoDB. If MySQL wants to use the primary key for the statement above, it can't really use the id column of the clustered index, is this right? IMO, MySQL has to scan all the index leaves to find all records with a=100. At the same time, all the leaves are just the data pages. So in this situation, is there any difference between a full index scan and a full table scan?
Thanks.
Simon,
If there is only a key on column (a), then for each a value the entries are sorted by primary key, so the rows can be retrieved in sorted order.
Also, you're right: in InnoDB, scanning the PRIMARY KEY completely and a full table scan are the same thing.
Thanks Peter, that's clearer now.
To support what has been said above, I found this behavior, and it strengthened my understanding of index usage. This was tested on 5.5/Windows.
CREATE TABLE cust2(
  loginname VARCHAR(14) NOT NULL,
  custname VARCHAR(255) NOT NULL,
  salary FLOAT NOT NULL,
  PRIMARY KEY uidx_name (custname,loginname),
  KEY idx_sal (salary),
  KEY idx_namesal (custname,salary)
) ENGINE=INNODB;
INSERT INTO cust2 (custname,salary,loginname) VALUES ('raghu',1000,'raghu'),('imran',2000,'imran'),('doofus',500,'doofus'),('joey',400,'joey'),('phoebe',200,'phoebe'),('monica',200,'monica');

EXPLAIN SELECT * FROM cust2 WHERE loginname='raghu' OR custname='joey';
Initially I thought the primary key would be used (I deliberately reversed the order in the WHERE clause), but the index that was used was idx_sal. Needless to say, this index actually expands to (salary, PK), in this case (salary, custname, loginname).
"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
"1","SIMPLE","cust2","index","PRIMARY,idx_namesal","idx_sal","4",\N,"6","Using where; Using index"
Reversing the order in the WHERE clause still used the same index:
EXPLAIN SELECT * FROM cust2 WHERE custname='raghu' OR loginname='raghu';
"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
"1","SIMPLE","cust2","index","PRIMARY,idx_namesal","idx_sal","4",\N,"6","Using where; Using index"
But changing the operator to AND made it use the primary key:
EXPLAIN SELECT * FROM cust2 WHERE custname='raghu' AND loginname='raghu';
"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
"1","SIMPLE","cust2","const","PRIMARY,idx_namesal","PRIMARY","811","const,const","1",""
This indeed shows that we need to verify not only our queries but index behavior as well.
Thanks!