Speeding up INSERTs on MySQL

Was working on a problem today that required a lot of bulk inserts to a MySQL table. I was getting about 200 inserts/second on my development system, which is OK considering there was some minor processing going on.

Since some batches can contain well over a million rows, I started working on how to optimise these queries so we can get them in there faster.

Firstly, watch your indexes on the table. Obviously the more indexes you have on the table the more work the DB has to do on INSERT to maintain them.

Second, if possible, ditch your ORM. Instead of building and hydrating objects for each row in the table used direct/prepared queries. Most ORMs worth their salt can handle this. i.e. Propel can give you direct access to the underlying PDO connection to use your prepared statements.

/** @var $con PDO */
$con = Propel::getConnection();

Obviously if you have business logic tied up in your objects, it’s best to use them instead of duplicating code.

Third, lock the table for a batch of inserts. Here’s some sample code

$i = 0;
$con = Propel::getConnection;
$con->beginTransaction();
 
foreach ($rowsToInsert as $row) {
 
  if ($i % 100 == 0) {
    $con->commit();
    $con->beginTransaction();
  }
 
  TablePeer::insertUsingPreparedStatement($row);
  $i++;
}

If your using MyISAM you can lock the table using the LOCK TABLE statement

$con->exec('LOCK TABLES table_name WRITE');
$con->exec('UNLOCK TABLES');

Depending on how important your data is you can change the (100) value in the $i % conditional to higher or lower. The reason this speeds up Inserts so much is that MySQL won’t flush it’s write cache to the disk until the transaction is finished, as opposed to every INSERT statement. However, having unflushed data in your cache is dangerous because it may disappear if something happens to the DB server, or get rolled back if your script carks it. Also, since a transaction/lock table call will stop all other access to the table, if it’s frequently read from those queries will be waiting on locks, so it’s good to refresh them frequently.

Using these three methods I almost tripled the performance of my script which now inserts between 550-600 rows per second. Win!

There’s some more tips over here in the MySQL manual and some of the comments are quite helpful too.