The idea of using SSD/Flash as a cache is not new, and there are different solutions for it, both open source, like L2ARC for ZFS and Flashcache from Facebook, and proprietary, like directCache from Fusion-io.
They all have some limitations, however, which is why I am considering an L2 cache at the database level, as an extension to the InnoDB buffer pool.
Fortunately, there is a project in progress, Flash_Cache_For_InnoDB by David, which implements this.
David helped us port his work to the latest Percona Server, and you can get it from our Launchpad: Percona Server 5.5.28.
I think the name Flash_Cache is confusing due to its similarity to Flashcache from Facebook, so I prefer to call it L2cache.
Here is a quick benchmark on tpcc-mysql with 2500 warehouses (250GB). I compared:
- data on RAID
- data on SSD (Intel SSD 910)
- data on RAID with L2 cache on Intel SSD 910; the cache size is 150GB
As we can see, the result is quite good and promising, so I am going to continue researching further integration with Percona Server.
Right now there are two challenging questions that we need to resolve:
- Backup. In the current state only mysqldump will work. To make backups with Percona XtraBackup, we will need additional support for L2 cache in XtraBackup.
- Recovery time. Right now, if a crash happens, the cache is safe, but recovery time is on the longer side; we need to see how it can be improved.
I guess “BP 125GB” should be “BP 1250GB”, Vadim
Iiang,
No, I used buffer pool 13, 75, 125GB
Aha, my fault :) I thought “130GB”/“750GB”.
Testing with a buffer pool > the amount of data (250GB) would invalidate the testing of the L2cache as all the data would be in the L1 cache (the buffer pool).
Is this a read-only cache or a write-back cache as well? So many of these SSD cache things accelerate reads only. For the MySQL DBs I have, reads are all served from the buffer pool, so from a storage I/O perspective it’s pretty much all writes.
nate,
It caches writes too.
I suggest you check out the link describing the original architecture
https://code.google.com/p/david-mysql-tools/wiki/Flash_Cache_For_InnoDB
and there is a diagram
http://blog.chinaunix.net/attachment/201108/29/196376_13145957329oNS.png
Hi,
I would like to share my thoughts on this.
From a purely theoretical standpoint, I personally don’t like the idea of making software more complex;
in my ideal world, memory and storage would all be the same high speed, so buffers and caches would no longer be needed.
Back in the real world, of course, it’s hard to get rid of the duality of fast, expensive memory vs. slower, cheap mass storage, but I would rather use faster storage with a better caching mechanism for the durable data than make the software more complex. Maybe improve what is already there, such as the flushing mechanism, but without introducing a new intermediate layer.
To be honest, I did not go through the code, so I can’t evaluate it more properly, but the idea of going from a two-speed memory scenario to a three-speed one, managed by the same software, makes me think there will be a lot more to maintain and many more bugs to fix. If this layer were truly modular, an independent piece of software that can be enabled or disabled, that would make the L2 cache purely a potential benefit. I don’t have enough insight to know whether it can be implemented with a sort of in/out ‘flushing API’, but I think that would be better.
Good job as usual.
Claudio
Vadim,
I wonder if we have any performance regression compared to a server that does not have this code compiled in? That would be an important point to know.
Another question is about the graphs: with 13GB the graph looks “normal”, while on the others it looks like it either takes a while to stabilize or is just plain unstable. Still, it always looks positive: there are no dots where performance falls so badly that it would be worse than having no cache, which is a good sign.
The results for the 125GB buffer pool size are very puzzling, though. The 150GB SSD cache is very close in size to the 125GB of memory, so why are we looking at such a large performance difference? For reads the gain should be rather limited; for writes it should only be a problem with a small log, and not at the start of the benchmark, where no flushing is taking place. How large were the logs you used?
Peter,
For the 125GB case, we have a very write-intensive workload, and as you understand, our cache must be able to write changes to slow storage.
In fact, the instability is similar to what we see when we have a big buffer pool in InnoDB and slow storage: we need an adaptive algorithm. The cache is affected by the same issue.
We have too many changes that eventually need to land on disk, and the cache needs to slow down to be able to do that. That’s why we see a performance difference compared to memory.
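To illustrate the point about the cache having to slow down, here is a toy model in Python. It is only a sketch under made-up assumptions, not how L2cache is actually implemented: `write_rate`, `drain_rate`, and `cache_pages` are hypothetical numbers, and every write is assumed to dirty a new page.

```python
def simulate(write_rate, drain_rate, cache_pages, seconds):
    """Toy write-back cache model (all units are pages or pages/sec).

    write_rate  -- pages the workload wants to dirty each second (hypothetical)
    drain_rate  -- pages the slow storage can absorb each second (hypothetical)
    cache_pages -- total capacity of the cache
    Returns the number of pages the cache accepted in each second.
    """
    dirty = 0
    accepted_per_sec = []
    for _ in range(seconds):
        # The slow storage drains some dirty pages first.
        dirty -= min(dirty, drain_rate)
        # The cache can accept only as many new dirty pages as it has room for.
        accepted = min(write_rate, cache_pages - dirty)
        dirty += accepted
        accepted_per_sec.append(accepted)
    return accepted_per_sec


rates = simulate(write_rate=100, drain_rate=20, cache_pages=400, seconds=10)
print(rates)
```

In this model the cache accepts writes at full speed while it has free room, and once it is full of dirty pages the accepted rate drops to what the slow storage can drain.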
There is an article about flash cache and recovery:
Flash-based Extended Cache for Higher Throughput and Faster Recovery
http://vldb.org/pvldb/vol5/p1615_woon-hakkang_vldb2012.pdf
Vadim,
What is your configuration for InnoDB and the L2 cache?
I think if you set innodb_log_file_size=4G and innodb_flush_neighbors=0, you can get a more stable and better result.
Besides, your L2 cache read hit ratio will also increase.
Claudio Nanni – “I would rather use faster storage and with better caching mechanism for the durable data than making the software more complex”
This is fine, and of course it’s going to be the normal situation for a lot of people. The problem is that if you have a very large DB that needs rotational storage (for various reasons, not least capacity), you probably can’t use SSD storage directly. If you also have hot data that will fit on an SSD but not in RAM, then this will probably be a great solution.
Of course, in an ideal world we’d have cheap persistent storage with the performance characteristics of DDR4, but we don’t live in that world.
I think the TPS will slow down when the thread that flushes the L2 cache to disk starts working, because the disk may be idle during the first hours, while all data is written into the L2 cache.
Once there are more than 150GB*10% dirty pages, the disk may start to receive dirty pages from the L2 cache.
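The arithmetic behind that guess can be sketched in Python. The 10% dirty threshold is the assumption made in this comment, not a confirmed L2cache setting, and the write rate below is a made-up number used only to show the calculation.

```python
def flush_start(cache_gb, dirty_pct, write_mb_per_sec):
    """Estimate when flushing from the cache to disk would begin.

    cache_gb         -- cache size in GB (150 in the benchmark above)
    dirty_pct        -- assumed dirty-page threshold in percent (a guess)
    write_mb_per_sec -- hypothetical rate at which new dirty data arrives
    Returns (threshold in GB, seconds until the threshold is reached).
    """
    threshold_gb = cache_gb * dirty_pct / 100
    seconds = threshold_gb * 1024 / write_mb_per_sec
    return threshold_gb, seconds


threshold_gb, seconds = flush_start(cache_gb=150, dirty_pct=10, write_mb_per_sec=10)
print(threshold_gb, "GB of dirty pages before the disk starts receiving writes")
```

With a 150GB cache and a 10% threshold, flushing would start at 15GB of dirty pages; how long that takes depends entirely on the real write rate.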