Jeremy Satterfield
Coding, Making and Tulsa Life

The Pains of MySQL Perfomance Under Docker

Over the course of the last several days I've been plagued with a couple pretty massive performance issues running MySQL under Docker(-compose) in our developer environments. Everyone in our office is running either a Mac or some flavor of Linux. In both cases I spent a couple of days search for a solution, running into several roadblocks, and exhausting many Google search terms. Since I had such a difficult time finding the solution, here are the symptoms I was seeing and what finally worked in our case, maybe it'll save someone else time.

Background

In both cases we were seeing incredibly long query times in certain areas of the site. The particular queries were those which we would expect to see performance issues exacerbated if they existed, large result sets with several joins and sorted. However, running the same queries against the same data on the same machines against MySQL installed natively on the host still ran these queries in fractions of a second, while the Docker instances took several minutes and spiked the CPU usage. This is an existing app with existing data, but only about 6GB on disk, so nothing big.

We are using MySQL's official images on Docker Hub. Using MySQL 5.6 to match what we currently have on our production databases. No customizations to speak of other than setting environment variables for user credentials and mounting named volumes to load and store the data.

The Issue with Mac

This was the fairly simple one to fix. After past experiences with similar CPU pegging performance issues with MySQL, I suspected this to be something to do with IO being limited somewhere. It took a bit of looking around, but I eventually ran across some forum posts talking about fsync flushing issues. I've run across enough similar issues in the past, it's a wonder my mind doesn't start there.

So reading on, I came across this gist which is a script to turn off fsync within the Hyperkit that Docker is running under Docker for Mac. This did the trick. I started up the docker containers and loaded page with the worst offending queries and it loaded in a matter of seconds.

Since this was an issue with something as low level as fsync, you'd be right in thinking that there's an issue with the driver Docker is using to access the disk. Further reading in one of the Github issues I found that they had apparently made changes in 17.11+ to use`raw` format which no longer needed these fsync changes. Docker 17.12 landed in stable a couple days later and appears to have indeed fixed the issue.

The Issue with Linux

This proved to be a much tricker issue to fix and the solution turned out to be one I tried by chance and not anything I found existing on the web. Like Mac I was seeing CPU spike for complex queries and several minutes to return results. It turns out the Mac issue is wide spread enough that it's nearly impossible to do a Google search that gives you Linux specific steps to try to narrow it down.

Here is a short list of many of the steps that did not work or showed minimal improvements:

  • forcing Docker's storage driver to something other than the default (I ended up on overlay2 because it's recommended, but again there's no noticeable difference)
  • using a bind (named) volume for storing data files
  • using a new mount volume from the host machine
  • using a mount volume to the existing data files for my host machines existing MySQL data files, which worked great with a native instance
  • upgrading to MySQL 5.7
  • providing a custom settings file with tuned settings that worked for the native instance
  • running a standalone Docker instance (read: not compose) with minimal parameters provided
  • change how my root file system was mounted on boot to include barrier=0

At this point I was 3 days in and pretty frustrated. I decided to change tactics, and I set out to compare the performance of the Docker vs native instance in a better way. I started out by comparing the EXPLAIN results for the same query in both environments, and instantly noticed a pretty big difference, the Docker instance wasn't using a pretty important PRIMARY index which the native instance is using. I dug a little deeper and no matter how the data was stored (bind v mount) if it was going through the Docker instance it wasn't using the index. I verified that the index existing in all instances.

Finally, for no reason that I can particularly pin down, I decided to run MySQL's optimize script on all the tables. That fixed the issue. I have no idea why this fixed anything, but it has done the trick for all the Linux systems in our office. If you have an explanation I'd be happy to hear about it down in the comments.