How to Survive Wiping Your Database

Engineering

We use Ruby on Rails heavily at LiveRamp, and we use Rails migrations to manage our database schemas. Rails migrations are a great way to manage the schema for a single database, but we have multiple production databases. Managing multiple schemas has been a pain point for us, so we recently set out to create a tool that would let us run multiple sets of migrations from within a single project.

We ran into some minor difficulties at first. Some of our old migrations used the old naming scheme (numbered consecutively instead of with timestamps) and were mixed in with our newer migrations. We wanted our new tools to be able to assume the timestamp naming convention throughout, so we went back and renamed the old migrations. This seemed to work fine, and our new tools worked great when developing on our local machines.

About a week went by and everything was fine. Then, all of a sudden, our website went down. We started investigating and found that a bunch of important database tables were completely empty. It took a bit of digging, but eventually we tracked the problem back to a migration that we had renamed a week earlier. Apparently this was the first time since the rename that we had run rake db:migrate in production. Rails decides which migrations to run by comparing the version number in each file name against the versions it has recorded as already applied, so the renamed migration, with its new number, looked like a brand new migration and was run again. To complete the perfect storm, this particular migration had been copied from an old schema.rb, so every create_table call carried a :force => true option, which tells Rails to drop the table first if it already exists. Because the tables did exist from the original run, they were dropped and recreated empty.
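
To make the re-run mechanism concrete, here is a minimal sketch in plain Ruby of how a renamed migration ends up looking pending. It is a simplified model and not Rails' actual implementation: Rails records applied versions in its schema_migrations table, which the sketch replaces with a Set, and the file names, timestamps, and helper names (migration_version, pending_migrations) are all hypothetical.

```ruby
require "set"

# Hypothetical stand-in for the versions Rails has recorded as applied.
applied_versions = Set[3, 4]

# Extract the leading version number from a migration file name.
def migration_version(filename)
  filename[/\A(\d+)_/, 1].to_i
end

# A migration is treated as pending when its version is not in the applied set.
def pending_migrations(filenames, applied_versions)
  filenames.reject { |name| applied_versions.include?(migration_version(name)) }
end

# Hypothetical file names before and after switching to timestamp versions.
before_rename = ["003_create_users.rb", "004_create_orders.rb"]
after_rename  = ["20100101000003_create_users.rb", "20100101000004_create_orders.rb"]

pending_before = pending_migrations(before_rename, applied_versions)
pending_after  = pending_migrations(after_rename, applied_versions)

puts "pending before rename: #{pending_before.length}"
puts "pending after rename:  #{pending_after.length}"
```

Nothing about the migrations' contents changed in this sketch, only their names, yet both now count as unapplied. Combined with :force => true, that is enough to empty a table.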

Once we identified the problem, it was simply a matter of restoring the database from our backups. We ran into a few hiccups when trying to replay the database logs, but ultimately we got everything resolved.

While it seemed like a fiasco at the time, we learned a number of valuable lessons from this incident.

  1. Editing old migrations — particularly renaming them — is bad. Just don’t do it.
  2. Back up your databases. Seriously.
  3. Want to be sure something like this never happens? Take “drop table” permissions away from the user that you use to run migrations. Dropping tables is a cleanup task that can be done (carefully) by administrators.
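
Revoking the permission, as in item 3, is the strongest safeguard because the database itself enforces it. As a complementary and much weaker check, a team could also scan migration files for table-dropping directives before deploying. The following is only a sketch under that assumption: the pattern list and the destructive_lines helper are hypothetical, the regular expressions are a rough heuristic rather than a Ruby parser, and this is not something we describe having built.

```ruby
# Patterns that would cause an existing table to be dropped (heuristic only).
DESTRUCTIVE_PATTERNS = [
  /:force\s*=>\s*true/, # hash-rocket syntax, as in the incident
  /\bforce:\s*true/,    # newer keyword-style hash syntax
  /\bdrop_table\b/
].freeze

# Return [line_number, stripped_line] pairs for lines matching any pattern.
def destructive_lines(source)
  findings = []
  source.each_line.with_index(1) do |line, number|
    next unless DESTRUCTIVE_PATTERNS.any? { |pattern| line.match?(pattern) }
    findings << [number, line.strip]
  end
  findings
end

# Hypothetical migration body resembling one copied from schema.rb.
sample = <<~MIGRATION
  create_table :users, :force => true do |t|
    t.string :email
  end
MIGRATION

findings = destructive_lines(sample)
findings.each { |number, line| puts "line #{number}: #{line}" }
```

A check like this can only flag suspicious lines for a human to review. It will miss anything the patterns do not anticipate, so it does not replace the permission change or the backups.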