Last week I made some modifications to our database migration framework (Bering) to support zero-downtime database deployment. My introduction to zero-downtime database deployment comes from Michael Nygard’s excellent book, Release It!.

Database migrations are one of the primary sources of planned outages during a system deployment. As we’re deploying to production every week, this is a big concern for us. In a scaled-out web-based system, it is pretty easy to deploy a new version of the application without incurring downtime: adjust the load balancer to pull servers out of the pool while they are being upgraded. Applying schema changes to a central database is another matter.

Database migrations tend to introduce a chicken-or-the-egg-type problem: the database changes can’t be applied without breaking the existing version of the application and the new version of the application won’t work without the changes to the database. Either way, you’re faced with the prospect of all or part of the system being unavailable for the duration of the deployment. Not good.

Zero-downtime database deployment presents a way out of this conundrum. The idea is to separate database migrations into two sets of changes:

  • expansion scripts are database changes that are safe to apply without breaking backwards compatibility with the existing version of the application. Changes like creating new tables, adding columns or tweaking indexes generally fall into this category.
  • contraction scripts are database changes that clean up any database structure that is no longer needed after the upgrade. Deleting columns or tables and adding constraints are generally contraction operations.
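To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customer` and `legacy_notes` tables, the `email` column, and both "application" queries are hypothetical names invented for illustration; they are not taken from our schema or from Bering.

```python
import sqlite3

# Hypothetical expansion scripts: additive, so the old application keeps working.
EXPANSION = [
    "ALTER TABLE customer ADD COLUMN email TEXT",
    "CREATE INDEX idx_customer_email ON customer (email)",
]
# Hypothetical contraction script: removes structure only the old application uses.
CONTRACTION = ["DROP TABLE legacy_notes"]


def old_app_reads(conn):
    """Queries the existing application version is assumed to issue."""
    names = conn.execute("SELECT name FROM customer ORDER BY id").fetchall()
    notes = conn.execute("SELECT note FROM legacy_notes").fetchall()
    return names, notes


def new_app_reads(conn):
    """Query the new application version is assumed to issue."""
    return conn.execute("SELECT name, email FROM customer ORDER BY id").fetchall()


def works(query, conn):
    """Return True if the query runs against the current schema."""
    try:
        query(conn)
        return True
    except sqlite3.OperationalError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE legacy_notes (customer_id INTEGER, note TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Ada')")

# Each tuple is (old application works, new application works).
before = (works(old_app_reads, conn), works(new_app_reads, conn))

for statement in EXPANSION:
    conn.execute(statement)
after_expansion = (works(old_app_reads, conn), works(new_app_reads, conn))

for statement in CONTRACTION:
    conn.execute(statement)
after_contraction = (works(old_app_reads, conn), works(new_app_reads, conn))

print(before, after_expansion, after_contraction)
```

The middle state is the interesting one: after the expansion scripts and before the contraction scripts, both versions of the application can run against the same schema.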

The expansion scripts are run at some point prior to upgrading the application, and the contraction scripts are run once the system has been upgraded and is considered stable. This has the nice benefit of decoupling database migrations from application deployments. The expansion scripts could be run a day or more in advance of the application deployment, at a time that is convenient for database changes. The contraction scripts could potentially be run days after the deployment, once everything has been validated with the new release.

This approach also greatly simplifies the task of deployment rollbacks. As much as we try to ensure that all database changes are reversible by having ‘down’ scripts, rolling back database changes is rarely easy and can readily lead to lost or inconsistent data, depending on the time elapsed between the migration and the rollback. With zero-downtime database deployment, however, if a problem is discovered with the new version of the application either during or after the release, it is safe to roll back to the existing version of the application without needing to roll back the database changes, as the expansion migrations are compatible with both versions of the system.
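The ordering and rollback rules described above can be sketched as a small state model. This is only an illustration: the `Release` class and its method names are hypothetical, and the rule that an application rollback is refused after contraction is my own inference (the old version may depend on structure the contraction scripts removed) rather than something a tool enforces for us.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """Hypothetical model of one release; tracks which steps have run."""
    app_version: str = "old"
    expanded: bool = False
    contracted: bool = False
    log: list = field(default_factory=list)

    def expand(self):
        # Expansion is safe at any time before the application upgrade.
        self.expanded = True
        self.log.append("expand")

    def upgrade_app(self):
        if not self.expanded:
            raise RuntimeError("run expansion scripts before upgrading the application")
        self.app_version = "new"
        self.log.append("upgrade_app")

    def rollback_app(self):
        # Assumption: only safe while the old structure still exists.
        if self.contracted:
            raise RuntimeError("old version may not work after contraction")
        self.app_version = "old"
        self.log.append("rollback_app")

    def contract(self):
        if self.app_version != "new":
            raise RuntimeError("contract only once the new version is running and stable")
        self.contracted = True
        self.log.append("contract")


# A release with one application rollback and no database rollback.
release = Release()
release.expand()
release.upgrade_app()
release.rollback_app()   # problem found: revert the application only
release.upgrade_app()    # fixed build goes out against the same expanded schema
release.contract()       # run once the new version is considered stable
print(release.log)
```

Note that the database is never rolled back in this sequence; only the application version moves back and forth while the schema stays in its expanded state.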

Supporting zero-downtime database deployment is as simple as having two schema version tables in your database – one for tracking the latest version of the expansion scripts applied and one for tracking the contraction script versions. Then it is just a matter of keeping separate database migrations folders for each type of script. I needed to make some extensions to Bering to support a configurable version table name and script folder location, but aside from that, it was pretty easy to get set up and going.
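As a rough sketch of the two-version-table idea (not Bering's actual API, which I am not reproducing here), the following Python uses sqlite3 and a hypothetical `migrate` helper. The table names `expansion_version` and `contraction_version`, the `expand` and `contract` folder names, and the `NNN_description.sql` file naming are all assumptions made for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def current_version(conn, version_table):
    """Return the highest version recorded in the given table (0 if none)."""
    # version_table is a trusted, configured name, not user input.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {version_table} (version INTEGER NOT NULL)")
    row = conn.execute(f"SELECT MAX(version) FROM {version_table}").fetchone()
    return row[0] or 0


def migrate(conn, folder, version_table):
    """Apply scripts in folder newer than the version stored in version_table."""
    applied = []
    current = current_version(conn, version_table)
    # Assumed naming convention: "<number>_<description>.sql".
    scripts = sorted((int(p.name.split("_", 1)[0]), p) for p in Path(folder).glob("*.sql"))
    for version, path in scripts:
        if version <= current:
            continue
        conn.executescript(path.read_text())
        conn.execute(f"INSERT INTO {version_table} (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied


with tempfile.TemporaryDirectory() as root:
    expand = Path(root, "expand")
    contract = Path(root, "contract")
    expand.mkdir()
    contract.mkdir()
    (expand / "001_add_email.sql").write_text("ALTER TABLE customer ADD COLUMN email TEXT;")
    (contract / "001_drop_legacy_notes.sql").write_text("DROP TABLE legacy_notes;")

    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);"
        "CREATE TABLE legacy_notes (note TEXT);"
    )

    # Expansion can run well ahead of the application deployment.
    first_expand = migrate(conn, expand, "expansion_version")
    second_expand = migrate(conn, expand, "expansion_version")  # nothing new to apply

    # Contraction is tracked separately and runs later.
    contract_before = current_version(conn, "contraction_version")
    first_contract = migrate(conn, contract, "contraction_version")
```

Because each folder has its own version table, the same runner can be pointed at either set of scripts, and the two sets advance independently of each other.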

Applying zero-downtime database deployment is a bit of an experiment for us. I plan to report more as we get more experience with it.