Make MySQL faster in one hour

I want to hear from people who test this with real workloads on servers with 8+ cores, or who do any type of testing on platforms other than Linux/x86. The patch for MySQL 5.0 is at code.google.com. The change replaces the mutex_t used in the InnoDB rw_lock_struct with a pthread_mutex_t. Calls to lock, unlock, create and destroy rw_lock_struct::mutex in sync0rw.c must also be updated.

InnoDB implements a mutex (mutex_t) and a read-write lock (rw_lock_struct). Both spin when a lock cannot be granted. On my platforms, the code spins for about 4 microseconds and then the thread waits on a condition variable. rw_lock_struct uses mutex_t to protect its internal state. I think that InnoDB is faster on SMP when pthread_mutex_t is used in place of mutex_t for rw_lock_struct::mutex.

The following describes the overhead from the InnoDB mutex when there is contention. A thread that must sleep waiting for a lock does the following:

- spin for a few microseconds trying to get the lock
- reserve a slot in the sync array (one lock/unlock of the global sync array pthread_mutex_t)
- reset an event (lock/unlock the event pthread_mutex_t)
- wait on the event (lock/unlock the global sync array pthread_mutex_t, lock the event pthread_mutex_t, wait on a pthread_cond_t)

There are 4 pthread_mutex_lock calls and 3 pthread_mutex_unlock calls on this code path, and 2 of the lock calls are for a global mutex, which can be another source of mutex contention. All of this can be replaced with the pair pthread_mutex_lock/pthread_mutex_unlock when rw_lock_struct::mutex is changed to use a pthread mutex.

Of course, you shouldn't take my word for it, so I will provide a few results. These were measured on an 8-core x86 server that used Linux 2.6.
Three mysqld binaries were tested:

- base: MySQL 5.0.37 and the Google patch excluding the smpfix changes
- smpfix+tcmalloc: MySQL 5.0.37 and the Google patch including the smpfix changes and linked with tcmalloc
- pthread_mutex: base with rw_lock_struct::mutex changed to use pthread_mutex_t

Results for sysbench --test=oltp --oltp-read-only. This displays transactions per second for sysbench run with 1, 2, 4, 8, 16, 32 and 64 concurrent users.

Results for sysbench --test=oltp --oltp-read-write. This displays transactions per second for sysbench run with 1, 2, 4, 8, 16, 32 and 64 concurrent connections.

Results for concurrent queries. Each query is a primary key - foreign key join between tables that each have 2M rows. Too long means it ran for 10s of minutes and I killed it. This displays the time in seconds to complete the query for 1, 2, 4, 8 and 16 concurrent users.

| Binary          | 1 user | 2 users | 4 users | 8 users | 16 users |
|-----------------|--------|---------|---------|---------|----------|
| base            | 2.6    | 3.9     | 8.1     | 182.5   | Too long |
| smpfix+tcmalloc | 2.6    | 3.7     | 4.9     | 7.6     | 15.2     |
| pthread_mutex   | 2.5    | 3.7     | 9.1     | 27.8    | 58.6     |

Results for concurrent inserts. Each user does a sequence of insert statements to a different table. Too long means it ran for 10s of minutes and I killed it. This displays the time in seconds to complete the inserts for 1, 2, 4, 8 and 16 concurrent users.

| Binary          | 1 user | 2 users | 4 users | 8 users  | 16 users |
|-----------------|--------|---------|---------|----------|----------|
| base            | 15.5   | 32.4    | 78.2    | Too long | Too long |
| smpfix+tcmalloc | 12.6   | 21.5    | 40.5    | 112.4    | 232.9    |
| pthread_mutex   | 13.5   | 23.8    | 76.0    | 378.7    | Too long |
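For readers who want to see where the 4 lock calls and 3 unlock calls come from, here is a rough sketch in C of the contended wait path described earlier, next to the single lock/unlock pair that replaces it. This is not the InnoDB source. The names sketch_event_t, sync_array_mutex, contended_wait, pthread_path and the counting helpers are all made up for illustration, the spin phase is left out because it takes no pthread mutex, and the timeout is an assumption added only so the sketch returns without needing a second thread to set the event.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for an InnoDB event: a flag guarded by a mutex,
   plus a condition variable for sleepers. Not the real InnoDB struct. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
} sketch_event_t;

/* Stand-in for the single global mutex that guards the sync array. */
static pthread_mutex_t sync_array_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Call counters; only meaningful in a single-threaded demonstration. */
static int lock_calls;
static int unlock_calls;

static void counted_lock(pthread_mutex_t *m)   { pthread_mutex_lock(m);   lock_calls++; }
static void counted_unlock(pthread_mutex_t *m) { pthread_mutex_unlock(m); unlock_calls++; }

void sketch_event_init(sketch_event_t *ev)
{
    assert(ev != NULL);
    pthread_mutex_init(&ev->mutex, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->is_set = false;
}

/* The path taken after spinning has failed.
   Returns true if the event was set, false if the wait timed out. */
bool contended_wait(sketch_event_t *ev, long timeout_ms)
{
    struct timespec deadline;
    bool was_set;

    assert(ev != NULL);

    /* Reserve a slot in the sync array: global lock/unlock. */
    counted_lock(&sync_array_mutex);
    /* ... a real implementation would claim a free cell here ... */
    counted_unlock(&sync_array_mutex);

    /* Reset the event: event lock/unlock. */
    counted_lock(&ev->mutex);
    ev->is_set = false;
    counted_unlock(&ev->mutex);

    /* Wait on the event: global lock/unlock, then event lock and sleep. */
    counted_lock(&sync_array_mutex);
    /* ... a real implementation would mark the cell as waiting here ... */
    counted_unlock(&sync_array_mutex);

    /* Absolute deadline for the timed wait (assumption: see lead-in). */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    counted_lock(&ev->mutex);
    while (!ev->is_set) {
        /* Releases ev->mutex while asleep and reacquires it on return. */
        if (pthread_cond_timedwait(&ev->cond, &ev->mutex, &deadline) != 0)
            break;  /* timed out or failed: stop waiting */
    }
    was_set = ev->is_set;
    /* Release after wakeup; outside the path counted in the text. */
    pthread_mutex_unlock(&ev->mutex);

    return was_set;
}

/* The replacement: one lock/unlock pair on a pthread mutex. Any sleeping
   under contention is left to the pthread implementation. */
void pthread_path(pthread_mutex_t *m)
{
    counted_lock(m);
    /* ... read or update the rw-lock's internal state here ... */
    counted_unlock(m);
}
```

Calling contended_wait on a freshly initialized event with a 1 ms timeout leaves the counters at 4 locks and 3 unlocks by the time the thread goes to sleep, and two of those locks hit the one global mutex. pthread_path leaves them at 1 and 1. The counters are plain ints, so this only makes sense when run from one thread.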