Legacy databases in Django

This is all documented elsewhere on the Django docs and the internets, but I summarize it here for easy understanding and reference.

The scenario here is that of a Django project that has its own default database and a legacy database.

Legacy databases

A legacy database is a database that Django has not created itself. The process of taking control of that database starts with
$ python manage.py inspectdb > legacy/models.py
which results in a models.py file that Django has reconstructed from the current database schema at the moment of inspection.

Django manages the default database via the usual model first and migrations flow. The legacy database is not managed because during inspectdb Django attaches the flag

managed = False

on the imported models Meta inner class.

This means that Django is able to use the ORM for querying and CRUD operations on the legacy database, but that will not be managing the schema.

Multiple databases

From now on, we will assume that we have a Django project with two databases. The default one works as expected, the legacy one is inherited from an existing schema. We assume that we have created a specific Django application (also named legacy) that contains the models obtained from the legacy database via inspectdb.

To use both databases, we have to configure define all of them on the settings

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': getenv('DATABASE_NAME'),
        'USER': getenv('DATABASE_USER'),
        'PASSWORD': getenv('DATABASE_PASSWORD'),
        'HOST': getenv('DATABASE_HOST'),
        'PORT': 3306,
    },
    'legacy': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': getenv('LEGACY_DATABASE_NAME'),
        'USER': getenv('LEGACY_DATABASE_USER'),
        'PASSWORD': getenv('LEGACY_DATABASE_PASSWORD'),
        'HOST': getenv('LEGACY_DATABASE_HOST'),
        'PORT': 3306,
    }
}

Django will use the ‘default’ database by (you guess it) default when using the default manager Django attaches to any model like (suppose you have a Food model) Food.

models.Potato.objects.filter(provider='market').order_by('freshness')

If you want to use the model from another database (imagine that on the legacy database you have rock bands), you can specify the database on the manager also:

models.Singer.objects.using('legacy').filter(name='Mercury')

This might feel awkward and if you know in advance which models you want in which database, then you can leverage this to Django for it to choose the right database for any model. You need a custom router, overriding the default one on the settings:

DATABASE_ROUTERS = ['legacy.db.LegacyRouter',]

In particular, you need to tell the router which database to use for reading and for writing:

class LegacyRouter:
    route_app_labels = {'legacy',}
    legacy_database = 'legacy'

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.legacy_database
        return None

    def db_for_write(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.legacy_database
        return None

Also, we need to create migrations for the legacy database and migrate them if we want to use the admin on them.

$ python manage.py createmigrations
$ python manage.py migrate

This is necessary because Django creates permissions for the tables. It doesn’t create the tables because the models are not managed. You can see that on the migrations just created. The models come with options

options={
    'db_table': 'singers',
    'managed': False,
},

This has the effect of an empty SQL when migrating. You can check this doing

$ python manages.py sqlmigrate legacy 0001

Multiple databases when testing

When running

python manage.py test

Django creates a mysqlite database in memory and applies all the migrations to it. This ensures that tests will encounter the schema expected by the application, which is very handy for doing some integration tests that involve the persistence layer.

Out of the box, Django only migrates models for the default database.

To me, __legacy code __is simply code without tests.

In order for Django to create and migrate models for more than the default database we need to add the databases that we need for the tests on the TestCase. We can do that on the test settings module once for all the tests

from django.test import TestCase
TestCase.databases = { 'default', 'legacy' }

Also, if we want to recreate the legacy database from scratch during testing, we need to tell django to apply migrations during testing.

A trick you can use is to have an extra setting

IS_TESTING = False

everywhere, but use an different test settings module and override it there:

IS_TESTING = True

This allow you to change the

options={
    'db_table': 'singers',
    'managed': False,
},

on the migrations files by putting

from django.conf import settings
MANAGED = settings.IS_TESTING

on top and changing the options like this

options={
    'db_table': 'singers',
    'managed': MANAGED,
},

This way, when running tests with the test settings, Django will also create the legacy database with the schema that the application code (models and admin mostly) expects.

Adopting a legacy model

Typically, if you’re using a legacy database, sooner or later you will need to change it. It’s time to adopt it.

To adopt a legacy model we need to first make it managed, so change the Meta setting to True:

class Singer(models.Model):
    singer_id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=255)

    class Meta:
        managed = True
        db_table = singers'

then create a migration

$ python manage.py makemigrations

This causes a new migration that alter the model options to ‘managed’: True.

You don’t have to use the trick with IS_TESTING here, because from that migration forward, the model is managed in tests also.

Then you can manage the model as you do with the default database (add rows, remove rows, whatever).

To ensure that Django will not mix tables from one database to the other when migrating, you need to tell the router which tables go to which database.

class LegacyRouter:
    route_app_labels = {'legacy',}
    legacy_database = 'legacy'

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.legacy_database
        return None

    def db_for_write(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.legacy_database
        return None

    def allow_relation(self, obj1, obj2, **hints):
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        is_legacy_db = db == self.legacy_database
        is_legacy_app = app_label in self.route_app_labels
        if is_legacy_db:
            return is_legacy_app
        else:
            return not is_legacy_app

With this router, only the models on the legacy.models module will be migrated and accessed from the lagacy database, all the other models will remain on the default database, both in tests and in normal code and operations.

A note on migrations

With this setup, the commands

  • python manage.py showmigrations
  • python manage.py sqlmigrate
  • python manage.py migrate

will act only with the default database. This is by design.

To interact with the legacy database we need to pass the –database=legacy extra parameter.

  • python manage.py showmigrations–database=legacy`
  • python manage.py sqlmigrate–database=legacy
  • python manage.py migrate–database=legacy

If you adopt the legacy database, this means you will need to start migrating the schema there. The first time you do it, django will create a django_migrations table on the legacy database to keep track of the changes. Also note that Django will add an entry for any migration in the project to any database (both in the default database and in the legacy database). This is by design.

This doesn’t mean that Django creates tables in both databases. Django creates the tables and manages the schema according to what the router says. You can check that the actual SQL emitted for the default database is empty for the migrations for the legacy application with sqlmigrate. You can also check that no models of the default database are created to the legacy one also with sqlmigrate.