May 14, 2008

Concurrent inserts on MyISAM and the binary log

Posted by Baron Schwartz

Recently I had an interesting surprise with concurrent inserts into a MyISAM table. The inserts were not happening concurrently with SELECT statements; they were blocking and the process list was filling up with queries in Locked status.

My first thought was that the customer had deleted from the table, which leaves "holes" in the middle of it and prevents concurrent inserts. (You can configure the server to permit concurrent inserts even when there are holes, but it's disabled by default.) However, that turned out not to be the cause; the table was only inserted into (and selected from). Instead, the blocked statements were because of INSERT... SELECT statements that were running against the table, selecting data from it and inserting into another table.

Let's look at what happens here: suppose you have two tables tbl1 and tbl2 and concurrent inserts into tbl2 are running fine. If you now run the following query,

SQL:
  1. INSERT INTO tbl1 SELECT * FROM tbl2

The concurrent inserts into tbl2 can block. This happens if you have the binary log enabled. If you think about it, this makes sense and is correct behavior. The statements have to be serialized for the binary log; otherwise replaying the binary log can result in a different order of execution.

The MySQL manual actually says this, but not in the clearest way. It just says

If you are using the binary log, concurrent inserts are converted to normal inserts for CREATE ... SELECT or INSERT ... SELECT statements.

If you use mysqladmin debug, you'll see an ordinary SELECT gets a lock on the table like this:

CODE:
  1. Locked - read         Low priority read lock

But on INSERT...SELECT, you'll see this:

CODE:
  1. Read lock  without concurrent inserts

That read lock is what's blocking the concurrent inserts from happening.

There's no solution to this, if you need the binary log enabled. (It needs to be enabled for replication.) There are workarounds, though. You can use the old trick of SELECT INTO OUTFILE followed by LOAD DATA INFILE. You can use InnoDB instead. Or you can do something more elaborate and application-specific, but that's a topic for another post.

May 13, 2008

MySQL Is back to Open Source Camp ?

Posted by peter

Looking at Kaj's Blog Annoucement MySQL has pulled back on the plans to release portions of the servers as close Source only.

I am extremely happy to hear these news ! This is good for MySQL as a company, MySQL customers and MySQL users.

I'm hoping Community feedback was serious contributer to this decision, though I know there were a lot of Internal discussions as well. In any case this sends a great message to community - Speak up and you may be heard.

I also hope Marten Mickos took this decision being convinced rather than getting the order from the top as this is only one battle in "what is going to be opensource" war

Anyway thank you everyone who made this happen, in particularly Monty, which I know fought a lot for this.

P.S. This is great news but I'd like to see and know more. Looks like servers is left alone being Open Source, what is about MySQL WorkBench, MySQL Proxy Extensions, MySQL Monitoring software ?

MySQL Performance Engineer opening

Posted by peter

As you know we love to analyze performance of various MySQL features, benchmark, compare, analyze things and post our findings on MySQL Performance Blog. However recently we got too busy with serving out customers and the backlog of things to take a look and write about is just growing larger and larger. So we decided to hire someone who could focus on such tasks.

What are we expecting from MySQL Performance Engineer ?

  • Such person should have a passion for making things to work fast or finding why they are slow.
  • One should have an "evil mind" to be able to put system in the conditions developers would not think about.
  • One should have an "X-Ray vision" understanding what happens on all layers (MySQL, Operation System, Hardware) when for example row is being inserted into Innodb table on Linux stored on LVM volume over RAID10
  • One should understand trusting documentation is worst mistake ever and if you need to find the truth source code is your best friend.
  • One should have good MySQL knowledge including main storage engines, and best have experience using it in practice
  • One should know how one could profile applications - valgrind, oprofile, dtrace and similar tools
  • One should know C/C++ to be able to understand performance issues discovered and prototype or develop fixes for them
  • One should know some scripting languages - bash,perl,php to be able to script benchmark runs
  • This remote position which is open for candidate worldwide. We also do not require perfect Spoken English for this position, though candidate should be able to read and write in English language pretty well.

    If you are such a person or know someone who might be interested, drop us a note.

    You can learn more about our company at Percona web site.

May 1, 2008

Quickly preloading Innodb tables in the buffer pool

Posted by peter

In the previous post I mentioned a way I use to preload Clustered Index (data) for Innodb tables. Though I thought this topic would benefit from a bit more information.

But lest first start with feature request for Innodb Team: All ways I mention here are hacks and they can't be as efficient as native support. It would be great if Innodb would implement command to preload table to Innodb buffer pool, which would simply go through .ibd file sequentially and inject pages in the buffer pool. This would make preload done using sequential file scan even if indexed suffered a lot of page splits.

Now lets continue to the hacks :)
[read more...]

Learning about MySQL Table Fragmentation

Posted by peter

Recently I was working with the customer who need quick warmup - to get Innodb table fetched in memory as fast as possible to get good in memory access performance.

To do it I run the query: "SELECT count(*) FROM tbl WHERE non_idx_col=0" I use this particular form of query because it will do full table scan - running count(*) without where clause may pick to scan some small index instead.

If your table is not fragmented one of two things should happen - either you should be reading at your hard drive sequential read rate or you would see MySQL becoming CPU bound if IO subsystem is too fast.
[read more...]

T2000 CPU Performance - Watch out

Posted by peter

Sun is aggressively pushing T2000 as Scalable MySQL Platforms, and indeed it is Scalable in terms of high concurrency workloads - it is able to execute a lot of concurrent threads and so speed gain from 1 thread to say 32 thread will be significant.

But thing a lot of people miss is - Being Scalable is Not Enough - you need to scale from reasonable base to claim the good performance, and this is where T2000 performs subpar in many cases.
[read more...]

April 30, 2008

Can having information public hurt consulting business ?

Posted by peter

People frequently ask me if the fact we keep information public can hurt our consulting business ? Lets keep aside for the moment amount of new business publishing this information brings to us but think it also have significant negative effect because people find information on MySQL Performance Blog and use it instead of purchasing our services ?

[read more...]

Percona Team presentations from MySQL Users Conference 2008 published

Posted by peter

If you've missed our presentations on MySQL Users Conference you can catch up now by taking a look at the slides, which are now published in presentations section of our company web site. You can also find a lot of old presentations in the same location.

Enjoy !

April 28, 2008

MySQL Replication vs DRBD Battles

Posted by peter

Well these days we see a lot of post for and against (more, more) using of MySQL and DRBD as a high availability practice.

I personally think DRBD has its place but there are far more cases when other techniques would work much better for variety of reasons.
First let me start with Florian's comments on the issue as I think they are most interested ones.
[read more...]

The MySQL optimizer, the OS cache, and sequential versus random I/O

Posted by Baron Schwartz

In my post on estimating query completion time, I wrote about how I measured the performance on a join between a few tables in a typical star schema data warehousing scenario.

In short, a query that could take several days to run with one join order takes an hour with another, and the optimizer chose the poorer of the two join orders. Why is one join order so much slower than the other, and why did the optimizer not choose the faster one? That's what this post is about.

Let's start with the MySQL query optimizer. The optimizer tries to choose the best join order based on its cost metric; it tries to estimate the cost for a query, then choose the query plan that has the lowest cost. The unit of cost for the MySQL query optimizer is a single random 4k data page read. In general, it's a pretty good metric, but it has one major weakness: the server doesn't know whether a read will be satisfied from the operating system cache, or whether it'll have to go to disk. (This distinction is abstracted away by the storage engine; the optimizer doesn't know how a given storage engine stores its data).

I'll try to omit the details and keep this clean. Let's take a look at the tables.

SQL:
  1. mysql> SHOW TABLE STATUS LIKE 'fact'\G
  2. *************************** 1. row ***************************
  3. Name: fact
  4. Engine: MyISAM
  5. Rows: 147045493
  6. Avg_row_length: 117
  7. Data_length: 17217646764
  8. Index_length: 11993816064
  9.  
  10. mysql> SHOW TABLE STATUS LIKE 'dim1'\G
  11. *************************** 1. row ***************************
  12. Name: dim1
  13. Engine: MyISAM
  14. Rows: 453193
  15. Avg_row_length: 122
  16. Data_length: 55605116
  17. Index_length: 93812736
  18.  
  19. mysql> SHOW TABLE STATUS LIKE 'dim2'\G
  20. *************************** 1. row ***************************
  21. Name: dim2
  22. Engine: MyISAM
  23. Rows: 811
  24. Avg_row_length: 105
  25. Data_length: 85368
  26. Index_length: 154624

It's a big fact table and two fairly small dimension tables, which is normal. Here is the query:

SQL:
  1. SELECT fact.col1, min(fact.col2) AS min_col2
  2. FROM fact, dim1, dim2
  3. WHERE fact.col4 = dim1.col4
  4. AND dim1.col3 <> 'hello world'
  5. AND dim2.col5 = 1
  6. AND fact.dim2_id = dim2.dim2_id
  7. AND fact.col2> some_const
  8. GROUP BY fact.col1

There are indexes on all the columns in all the ways you'd expect: all the dimension columns are indexed on every table, and there's a separate index on every column in the WHERE clause. Here's the query plan initially.

SQL:
  1. *************************** 1. row ***************************
  2. TABLE: dim1
  3. type: range
  4. key_len: 195
  5. rows: 18790
  6. Extra: USING WHERE; USING TEMPORARY; USING filesort
  7. *************************** 2. row ***************************
  8. TABLE: fact
  9. type: ref
  10. key_len: 4
  11. rows: 606
  12. Extra: USING WHERE
  13. *************************** 3. row ***************************
  14. TABLE: dim2
  15. type: eq_ref
  16. key_len: 2
  17. rows: 1
  18. Extra: USING WHERE

This query will run for days and never complete. No one ever let it finish to see how long it would run.

How do I know it will run for days? Here's my train of thought:

  • It's performing index lookups into the fact table, which is big.
  • An index lookup is a random I/O.
  • A modern disk can do about 100 random I/O's per second, as a rule of thumb.
  • If you do the math with the rows column in EXPLAIN, you realize that this equates to about 18790 * 606 = 11386740 I/O operations, assuming that the indexes are fully in memory.
  • When you divide this by 100 I/O operations per second, and then divide that by 86400 seconds in a day, you get about 2.6 days.

Ouch! That's slow.

Now let's look at the alternative: table-scan the fact table, and do index lookups in the two dimension tables. MySQL doesn't want to choose this join order, so we'll force it with STRAIGHT_JOIN:

SQL:
  1. EXPLAIN SELECT STRAIGHT_JOIN  ....
  2. +-------+-----------+-----------+---------------------------------+
  3. | TABLE | type      | rows      | Extra                           |
  4. +-------+-----------+-----------+---------------------------------+
  5. | fact  | ALL       | 147367284 | USING TEMPORARY; USING filesort |
  6. | dim1  | eq_ref    | 1         | USING WHERE                     |
  7. | dim2  | eq_ref    | 1         | USING WHERE                     |
  8. +-------+-----------+-----------+---------------------------------+

As we saw in the previous post, which I linked at the top of this post, we can scan the fact table in less than an hour. And it turns out that joining to the dimension tables doesn't slow the query perceptibly, because these tables are small and they stay in memory, in the OS cache. (They don't get evicted from memory by the cache's LRU policy, because they are frequently used -- once per row in the fact table. The LRU policy evicts old blocks from the fact table instead, which is perfect -- these blocks are used only once and not needed again, so they can be replaced).

The difference between the two queries -- 55 minutes and 2.6 days -- is basically the difference between scanning data sequentially on disk and random disk I/O.

So now you know why one join order is faster than the other. But why didn't the optimizer know this, too? The optimizer does know that random access is slower than sequential access, but it doesn't know that the dimension tables will stay in memory, and this is an important distinction.

Let's put ourselves into the mindset of the optimizer. We'll assume that every join to the dimension tables will go to disk instead of being read from cache. Now the STRAIGHT_JOIN becomes a table scan of about 313 sequential reads (150 million rows / 117 bytes per row / 4096 bytes per read), plus about 150 million random I/Os for the first dimension table, plus 150 million random I/Os for the second dimension table. That's 300 million random I/O operations.

In contrast, the optimizer chose a plan that it thought would cause only 11.3 million random I/O operations.

The optimizer was being smart, given its lack of knowledge about the OS cache. This is why an expert is sometimes needed to provide the missing information. If the MySQL optimizer were right and each of these had to go to disk, our STRAIGHT_JOIN plan would take more than a month to complete! Good thing we know the difference between cache and disk!


This page was found by: mysql performance bl... mysql performance mysqlperformanceblog www.mysqlperformance... mysql inurl:blog