Peripatetic thinking
Last week I made some modifications to our database migration framework (Bering) to support zero-downtime database deployment. My introduction to zero-downtime database deployment comes from Michael Nygard’s excellent book, Release It!.
Database migrations are one of the primary sources of planned outages during a system deployment. As we’re deploying to production every week, this is a big concern for us. In a scaled-out web-based system, it is pretty easy to deploy a new version of the application without incurring downtime: adjust the load balancer to pull servers out of the pool while they are being upgraded. Applying schema changes to a central database is another matter.
Database migrations tend to introduce a chicken-or-the-egg-type problem: the database changes can’t be applied without breaking the existing version of the application and the new version of the application won’t work without the changes to the database. Either way, you’re faced with the prospect of all or part of the system being unavailable for the duration of the deployment. Not good.
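Zero-downtime database deployment presents a way out of this conundrum. The idea is to separate database migrations into two sets of changes: expansion scripts, which only add to the schema so that it remains compatible with the existing version of the application, and contraction scripts, which remove whatever the new version no longer needs.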
The expansion scripts are run at some point prior to upgrading the application and the contraction scripts are run once the system has been upgraded and considered stable. This produces a nice benefit of decoupling database migrations from application deployments. The expansion scripts could be run a day or more in advance of the application deployment at a time that is convenient for database changes. The contraction scripts could be run potentially days after the deployment once everything has been validated with the new release.
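To make the split concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not how Bering does it; the `users` table, the `name` to `display_name` rename, and the two "application" read functions are all hypothetical, chosen only to show one expansion script and one contraction script around an application upgrade.

```python
import sqlite3

# Expansion (hypothetical): purely additive, so the old application keeps working.
EXPANSION = """
ALTER TABLE users ADD COLUMN display_name TEXT;
UPDATE users SET display_name = name;
"""

# Contraction (hypothetical): drop what only the old application needed.
# SQLite before 3.35 has no DROP COLUMN, so the table is rebuilt instead;
# most other databases would use ALTER TABLE ... DROP COLUMN here.
CONTRACTION = """
CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT);
INSERT INTO users_new (id, display_name) SELECT id, display_name FROM users;
DROP TABLE users;
ALTER TABLE users_new RENAME TO users;
"""


def old_app_names(conn):
    """Query issued by the existing version of the application."""
    return [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]


def new_app_names(conn):
    """Query issued by the new version of the application."""
    return [row[0] for row in conn.execute("SELECT display_name FROM users ORDER BY id")]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO users (name) VALUES ('Ada'), ('Grace');
    """)

    # Step 1: expansion, run ahead of the application deployment.
    conn.executescript(EXPANSION)
    # Both versions of the application can read the expanded schema.
    during_upgrade = (old_app_names(conn), new_app_names(conn))

    # Step 2: contraction, run once the new version is considered stable.
    conn.executescript(CONTRACTION)
    after_contraction = new_app_names(conn)

    # The old version no longer works, which is why contraction waits.
    try:
        old_app_names(conn)
        old_still_works = True
    except sqlite3.OperationalError:
        old_still_works = False

    return during_upgrade, after_contraction, old_still_works
```

Calling `demo()` returns what each version could read between the two scripts, what the new version reads after contraction, and whether the old query still runs. The sketch skips one thing a real migration would need: rows written by the old application between expansion and contraction would have no `display_name`, so the two columns have to be kept in sync (for example with a trigger) for as long as both versions are live.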
This approach also greatly simplifies the task of deployment rollbacks. As much as we try to ensure that all database changes are reversible by having ‘down’ scripts, rolling back database changes is rarely easy and can lead to lost or inconsistent data, depending on the time elapsed between the migration and the rollback. With zero-downtime database deployment, however, if a problem is discovered with the new version of the application either during or after the release, it is safe to roll back to the existing version of the application without rolling back the database changes, because the expansion migrations are compatible with both versions of the system.
Supporting zero-downtime database deployment is as simple as having two schema version tables in your database – one for tracking the latest version of the expansion scripts applied and one for tracking the contraction script versions. Then it is just a matter of keeping separate database migrations folders for each type of script. I needed to make some extensions to Bering to support a configurable version table name and script folder location, but aside from that, it was pretty easy to get set up and going.
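As a rough illustration of the two-table idea, here is a small Python sketch using sqlite3. It is not Bering's implementation or API; the `migrate` function, the `NNN_description.sql` file naming, and the table names `expansion_version` and `contraction_version` are assumptions made for the example.

```python
import sqlite3
from pathlib import Path


def script_version(path):
    """Read the numeric prefix of a script named like 002_add_column.sql."""
    return int(path.name.split("_", 1)[0])


def current_version(conn, version_table):
    """Return the highest version recorded in version_table (0 if none)."""
    # The table name comes from trusted configuration, not user input;
    # SQL identifiers cannot be bound as query parameters.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {version_table} (version INTEGER NOT NULL)"
    )
    row = conn.execute(f"SELECT MAX(version) FROM {version_table}").fetchone()
    return row[0] if row[0] is not None else 0


def migrate(conn, script_dir, version_table):
    """Apply every script in script_dir newer than the recorded version."""
    applied = []
    current = current_version(conn, version_table)
    for script in sorted(Path(script_dir).glob("*.sql"), key=script_version):
        version = script_version(script)
        if version <= current:
            continue  # already applied on an earlier run
        conn.executescript(script.read_text())
        conn.execute(
            f"INSERT INTO {version_table} (version) VALUES (?)", (version,)
        )
        conn.commit()
        applied.append(version)
    return applied
```

With this shape, `migrate(conn, "migrations/expansion", "expansion_version")` can run days before a release and `migrate(conn, "migrations/contraction", "contraction_version")` days after it; because each folder has its own version table, the two sequences advance independently. The folder paths here are placeholders.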
Applying zero-downtime database deployment is a bit of an experiment for us. I plan to report more as we get more experience with it.
80% technical, 20% social change. This blog is dedicated to finding ways to sustainably release software more frequently.
exortech.com » Blog Archive » Weekly Release Blog #18 - Long-running database migrations
March 26th, 2009 at 10:26 pm
[...] we started practicing zero-downtime database deployment, we have the freedom to decouple database expansion migrations from the actual release as we ensure [...]
saem
March 31st, 2009 at 3:43 pm
You might want to give liquibase a shot. It has the ability to generate sql if desired, and unlike migrations which are up/down, it’s based on changesets.
exortech
March 31st, 2009 at 9:44 pm
Thanks for the feedback, Saem. I haven’t tried out LiquiBase, but it looks interesting (btw, Hawk also contacted me with this recommendation). I can see the value of changesets over versioning individual migrations. Does LiquiBase support a script language for building migrations – instead of having to use xml?
exortech.com » Blog Archive » Speaking at DevTeach Vancouver
June 19th, 2009 at 6:44 am
[...] Zero down-time deployment [...]
exortech.com » Blog Archive » Back from Bangalore (and Hyderabad and Mumbai)
September 16th, 2009 at 9:56 pm
[...] presentation also stimulated some good side chatter on twitter. In general, zero-downtime database deployment, continuous monitoring and WAGMI seemed to be popular topics. Thanks to everyone who made it out [...]
Zero Downtime Continuous Deployment « Lean Builds
November 30th, 2009 at 4:04 pm
[...] since you’ve potentially got lots of dependency problems; luckily, the Exortech blog recently addressed this issue as [...]
Scott Rich
January 15th, 2010 at 9:48 am
Great article, thanks. I hope this isn’t a naive question, but this scheme doesn’t seem to allow for any form of “mutating” schema change, changing the logical or physical type of a column, for example. Is the assumption that this has to be done by moving the new app version to a new column and deprecating the old?
Simon Harris
January 19th, 2010 at 8:19 pm
How do you handle views (if at all)?
Have you had to handle column type changes?
Chris
February 20th, 2010 at 4:16 pm
You may want to have a look at a product called ChronicDB. It offers zero-downtime database deployment in a way that solves the chicken-or-the-egg problem.
Pawan
October 1st, 2010 at 4:28 am
If you have staging DB servers, stop the data flow from staging to the primary DB servers. You can then take one more mirror server, do the deployment on it, and test it. Once that succeeds, fail over to the mirror, take the principal down, and restart the data flow from staging to the primary server. This will not affect existing users. Hope this helps.
Benjamin Starks
November 9th, 2010 at 10:39 am
Mutating schema changes are the major problem we are facing too. With 20+ web-server frontends all hitting the database, we can’t just bring all of them down to change the logical structure.
We looked at Liquibase for a while, but even though it automatically builds scripts that can apply a schema change, running them against our 8GB+ db takes long enough to lock up the application. We are giving ChronicDB a go next.