Faster Rails: Indexing Large Database Tables Without Downtime

As the scope and size of a Rails project grows, actions that were blazingly fast can become slow, and even downright unacceptable. The cause is often the rapid growth of your database tables: once they hold millions of rows, lookups and updates without index support become much slower. If this is the case, adding the missing indexes to your database is a cheap and easy way to drastically improve the performance of your application.
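Adding an index is typically a one-line Rails migration. Here is a minimal sketch, with the table and column names made up for illustration:

    class AddIndexToBranchUpdates < ActiveRecord::Migration[5.0]
      def change
        # Speeds up lookups that filter or join on branch_id.
        add_index :branch_updates, :branch_id
      end
    end

On a small table this is all you need. The danger described next only appears once the table is large.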

However, adding a new index to a database table that’s already big can be dangerous. Don’t forget: in PostgreSQL, index creation is a synchronous operation that blocks INSERT, UPDATE, and DELETE statements on the table until the full index is built. On a live production database this can have severe consequences, since indexing a very large table can take many hours. For a system like Semaphore, even short periods of blocked writes are unacceptable. If this happens during a deployment, we can cause unwanted downtime for the whole system.
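PostgreSQL offers CREATE INDEX CONCURRENTLY to build an index without blocking writes, and Rails exposes it through the algorithm: :concurrently option. A minimal sketch of a non-blocking version of the migration above (names still made up; disable_ddl_transaction! is required because a concurrent build cannot run inside the transaction Rails normally wraps migrations in):

    class AddIndexToBranchUpdatesConcurrently < ActiveRecord::Migration[5.0]
      # CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
      disable_ddl_transaction!

      def change
        add_index :branch_updates, :branch_id, algorithm: :concurrently
      end
    end

The comments below dig into the caveats of concurrent index builds, on both Postgres and MySQL.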

Imported from Disqus

Augusts Bautra · 3 years ago

Superb! I have encountered indexing migrations on tables with some 500k records that take 5 minutes to run, effectively bringing down production, so this seems like a life saver.


Danny Fallon · 3 years ago

This article is aimed at Postgres, so I’m not sure why you mentioned MySQL. It is unfair to say MySQL locks writes on your table when adding an index, but since I’m here I’ll give some more details:

The MyISAM engine will lock tables when performing a DDL operation like adding a column or an index. However, it also locks the table when you perform a write, which is why it hasn’t been the default MySQL storage engine since 5.5.5, when MySQL moved to InnoDB. Since MySQL 5.6, InnoDB DDL operations have improved significantly, and adding or dropping indexes (in addition to a great many other DDL changes) allows concurrent writes. The only index-based operation that blocks writes is the creation of a FULLTEXT index; normal index additions and drops are non-blocking.
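To make this concrete: with InnoDB you can spell out the online-DDL clauses so that MySQL errors out rather than silently falling back to a blocking table copy. A sketch in a Rails migration, with illustrative table and index names:

    class AddIndexToBranchUpdates < ActiveRecord::Migration[5.0]
      def up
        # ALGORITHM=INPLACE, LOCK=NONE: build the index online, and
        # fail fast if MySQL cannot do so without blocking writes.
        execute <<-SQL
          ALTER TABLE branch_updates
            ADD INDEX index_branch_updates_on_branch_id (branch_id),
            ALGORITHM=INPLACE, LOCK=NONE
        SQL
      end

      def down
        remove_index :branch_updates, :branch_id
      end
    end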

Much like Postgres, the method is not without caveats. While the new index is built in the background for all existing data, writes are kept in a log (the online alter log). On very heavily written tables you could exceed the alter log size and fail to add the index. The size limit of the alter log can be raised, so this can be worked around. The real problem is that once the index is built for all data that existed prior to the ADD INDEX statement, the alter log is replayed onto the index so it can catch up with the live dataset. This replay takes a lock, and if the alter log is large you could end up blocking for a while, or having a stability problem as a result: a thundering herd soaking up all your MySQL connections/threads. You’re not going to fix this problem inside Rails; if you have writes that high, you likely need an external migration solution.
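The alter log limit mentioned above is controlled by the innodb_online_alter_log_max_size variable. Raising it is a runtime change, sketched here in Ruby under the assumption that the connection has the MySQL privileges SET GLOBAL requires:

    # Raise the online alter log to 1 GB for a heavy index build.
    # The default is 134217728 bytes (128 MB).
    ActiveRecord::Base.connection.execute(
      "SET GLOBAL innodb_online_alter_log_max_size = 1073741824"
    )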

If you’re going to stick with MySQL optimisations, I would also highly recommend that multiple DDL changes to a table be performed using change_table where possible:

    def up
      change_table(:table_name, { bulk: true }) do |table|
        table.column :col6, :integer  # a column type is required; :integer is illustrative
        table.index [:col1, :col2]
        table.index [:col5, :col6]
        table.remove_index [:col1]
      end
    end

This will run the DDL statements as one query - it speeds up completion time at the risk of the alter log being a bit slower to process. I am not sure the same thing is possible with Postgres.

Igor Sarcevic → Danny Fallon · 3 years ago

@Danny thanks for the feedback. You are correct, this article is very PostgreSQL-specific. It is the primary database we use to develop Semaphore, and I made a mistake by mentioning MySQL.

Thanks for sharing the information about MySQL.


Aleksey Leshchuk · 3 years ago

There is a nice gem called zero_downtime_migrations, which nicely covers many other cases besides concurrent index creation.
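For reference, wiring the gem in is just a Gemfile entry; as I understand it, it then raises an error when a migration would lock the table, pointing you at a safe alternative:

    # Gemfile
    gem 'zero_downtime_migrations'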

https://github.com/LendingH…

Abe Voelker · 3 years ago

It’s worth mentioning that concurrent index creation is not a fire-and-forget thing like traditional index creation; a concurrent build can sometimes (silently) fail in the background, so you have to verify later that the index was built successfully (I think your graph made it pretty obvious yours did).

https://medium.com/carwow-p…
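To expand on the failure mode Abe describes: when a concurrent build fails, Postgres leaves the half-built index behind, flagged as invalid in the system catalogs. One way to check for such leftovers, sketched in Ruby against the pg_index catalog:

    # Lists indexes whose concurrent build failed (flagged as not indisvalid).
    # Drop any such index, then retry the CREATE INDEX CONCURRENTLY.
    invalid_indexes = ActiveRecord::Base.connection.select_values(<<-SQL)
      SELECT c.relname
      FROM pg_class c
      JOIN pg_index i ON i.indexrelid = c.oid
      WHERE NOT i.indisvalid
    SQL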