This is part 3 in a multi-part series. Read part 2 or part 1.
The object selector
Our site, like many sites, is highly interrelated. One piece of content isn’t really just one row in a database table. If we wanted to export an article, we would want to include its authors, photos and other media, and a variety of other content directly linked to it.
This meant the typical Django serialization wouldn’t work. Django’s default export command (dumpdata) assumes that only one table is involved. Luckily, Django’s serialization framework serializes each object independently. That meant what we really needed was a way to select the required objects and feed them through the serialization process.
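Here is a minimal, Django-free sketch of that select-then-serialize split, assuming each object simply lists the objects it links to. Node, collect_objects, and export are hypothetical names for illustration, not django-objectdump's API. In a real Django project, the collected list could be passed to django.core.serializers.serialize, which accepts any iterable of model instances, not just a QuerySet.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(eq=False)  # eq=False keeps identity-based hashing, like distinct DB rows
class Node:
    """Hypothetical stand-in for a model instance and the objects it links to."""
    label: str
    related: List["Node"] = field(default_factory=list)


def collect_objects(root: Node) -> List[Node]:
    """Breadth-first walk returning every object reachable from root, each once."""
    seen = {root}
    ordered = [root]
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for rel in obj.related:
            if rel not in seen:  # shared or cyclic links are visited only once
                seen.add(rel)
                ordered.append(rel)
                queue.append(rel)
    return ordered


def export(root: Node, serialize: Callable[[List[Node]], str]) -> str:
    """Select first, then hand the whole selection to a format-specific serializer."""
    return serialize(collect_objects(root))


if __name__ == "__main__":
    author = Node("author")
    photo = Node("photo", [author])
    article = Node("article", [author, photo])
    print(export(article, lambda objs: ",".join(o.label for o in objs)))
    # prints: article,author,photo
```

Because selection and serialization are separate steps, the same selection can be written out as JSON, XML, or a custom format just by swapping the serialize callable.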
I started with django-fixture-magic. It had the right basic idea, but it had a few limitations:
- The output format (JSON) was hard-coded. We needed a format-agnostic command.
- Limited support for natural keys. Natural keys were crucial for avoiding database ID collisions.
- Limited ability to control which objects get exported. There were some relationships that we wanted to ignore, but the parts we wanted to ignore weren’t universal.
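Why natural keys matter: auto-increment IDs are only unique within one database, so author 7 on the exporting site may be a different person (or nobody) on the importing site. The sketch below uses hypothetical data and a hypothetical import_authors function, with a slug standing in for the natural key, to show references being resolved by natural key instead of by the exported ID. Django's own mechanism for this is a natural_key() method on the model paired with a get_by_natural_key() method on its manager.

```python
from itertools import count
from typing import Dict, Iterator, List


def import_authors(rows: List[dict], target: Dict[int, dict], ids: Iterator[int]) -> Dict[str, int]:
    """Load exported authors into target, matching existing rows by slug.

    The slug plays the role of the natural key; the exported integer id is
    ignored because it only meant something in the source database.
    Returns slug -> primary key on the importing side.
    """
    by_slug = {row["slug"]: pk for pk, row in target.items()}
    for row in rows:
        if row["slug"] not in by_slug:
            pk = next(ids)  # fresh local id, never the exported one
            target[pk] = {"slug": row["slug"], "name": row["name"]}
            by_slug[row["slug"]] = pk
    return by_slug


if __name__ == "__main__":
    # The importing site already uses id 7 for a different author.
    target = {7: {"slug": "john-roe", "name": "John Roe"}}
    exported = [{"id": 7, "slug": "jane-doe", "name": "Jane Doe"}]
    mapping = import_authors(exported, target, count(8))
    print(mapping["jane-doe"])  # prints 8, so Jane's articles do not end up pointing at John
```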
There are quite a few forks of this project, and some of them address these limitations, but we decided to make our own fork, django-objectdump. Since we were changing it substantially, a new name seemed appropriate.
Changes
Fine-grained control over the selection process
Because the connections are rather elaborate, specifying them on the command line every time seemed cumbersome. Instead, the configuration is static and lives in your settings.py file.
For each model you can specify to ignore it completely, ignore all or some foreign key relationships, ignore all or some many-to-many relationships, specify additional relations to include, ignore all or some reverse relationships, and exclude one or more fields on the model.
Here is part of our configuration as an example:
```python
# settings.py
# get_concepts is a callable defined elsewhere (not shown).
OBJECTDUMP_SETTINGS = {
    'MODEL_SETTINGS': {
        'licensing.grantedlicense': {
            'm2m_fields': False,         # don't follow many-to-many relations
            'reverse_relations': False,  # don't follow reverse relations
        },
        'reference.genericarticlerelation': {'ignore': True},  # skip this model entirely
        'reference.genericarticle': {
            'addl_relations': ['resources.all', get_concepts],  # extra relations to include
            'exclude': ['reporting_categories'],  # field left out of the export
        },
        'reporting.reportingcategory': {'ignore': True},
        'resource_carousel.externalresource': {
            'm2m_fields': ['categories', ],
            'reverse_relations': False,
        },
        'resource_carousel.slide': {'addl_relations': ['content_object', ]},
    }
}
```
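To make these options concrete, here is a hypothetical sketch of how a selector might consult such a dictionary before walking a model's relations. It assumes that a missing key means "follow everything", False means "follow nothing", and a list names the only fields to follow; should_export, select_fields, and relations_to_follow are illustrative names, and django-objectdump's actual rules may differ.

```python
from typing import Dict, List, Union

FieldSpec = Union[bool, List[str]]

# Trimmed copy of the configuration above; models not listed use the defaults.
MODEL_SETTINGS: Dict[str, dict] = {
    "reference.genericarticlerelation": {"ignore": True},
    "resource_carousel.externalresource": {
        "m2m_fields": ["categories"],
        "reverse_relations": False,
    },
}


def should_export(label: str) -> bool:
    """A model marked 'ignore' is never exported or walked into."""
    return not MODEL_SETTINGS.get(label, {}).get("ignore", False)


def select_fields(spec: FieldSpec, available: List[str]) -> List[str]:
    """Assumed semantics: True follows all, False follows none, a list follows only those."""
    if spec is True:
        return list(available)
    if spec is False:
        return []
    return [name for name in available if name in spec]


def relations_to_follow(label: str, kind: str, available: List[str]) -> List[str]:
    """Which relations of one kind ('m2m_fields', 'reverse_relations', ...) to walk."""
    spec = MODEL_SETTINGS.get(label, {}).get(kind, True)  # missing key: follow everything
    return select_fields(spec, available)


if __name__ == "__main__":
    print(relations_to_follow("resource_carousel.externalresource", "m2m_fields", ["categories", "tags"]))
    # prints: ['categories']
```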
Debugging information
We also built in plenty of debugging information to find out why the hell certain records ended up in the export. It can print out how it is following the relationships, print out what will get exported, and even output a .dot file showing the relationships visually.
Natural key support is included
That was a big requirement. As with Django’s dumpdata command, the output format is specified when you call the command.
Added GenericForeignKey support
Generic foreign keys are tricky and aren’t supported by Django’s serialization framework. We provided a hook for them, which requires the output format to support it. Since we were writing our own output format, that wasn’t going to be a problem.
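In Django, a GenericForeignKey is stored as a content_type foreign key plus an object_id, and both are database IDs that generally won't match on another site. A minimal sketch of what such a hook might do: write the reference as a (model label, natural key) pair on export and resolve it back to local IDs on import. The lookup dictionaries stand in for ContentType and the model tables, and every name here is an assumption for illustration; the Slide class only echoes the resource_carousel.slide entry in the configuration above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

NaturalKey = Tuple[str, ...]

# Hypothetical source-site lookups standing in for ContentType and model rows.
SOURCE_CONTENT_TYPES: Dict[int, str] = {14: "reference.genericarticle"}
SOURCE_NATURAL_KEYS: Dict[Tuple[str, int], NaturalKey] = {
    ("reference.genericarticle", 5): ("volcanoes",),
}


@dataclass
class Slide:
    """Stand-in for a model whose generic relation is content_type_id + object_id."""
    content_type_id: int
    object_id: int


def dump_generic_fk(slide: Slide) -> dict:
    """Replace both database ids with a portable (model label, natural key) pair."""
    label = SOURCE_CONTENT_TYPES[slide.content_type_id]
    natural_key = SOURCE_NATURAL_KEYS[(label, slide.object_id)]
    return {"content_object": [label, list(natural_key)]}


def load_generic_fk(
    data: dict,
    target_content_types: Dict[str, int],
    target_pks: Dict[Tuple[str, NaturalKey], int],
) -> Slide:
    """Resolve the pair back to ids that are valid in the importing database."""
    label, natural_key = data["content_object"]
    return Slide(target_content_types[label], target_pks[(label, tuple(natural_key))])
```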
Improved topological sorting
The selected objects needed to be serialized in an order such that, when an object is deserialized, all the objects it relates to have already been deserialized. This is called topological sorting. (Yeah, I learned a new concept!) The dependency tracking in django-fixture-magic was pretty good, but it had issues with more complex relationships.
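As an illustration of the concept (not the sorting code from either project), here is Kahn's algorithm applied to individual objects, assuming the selector has already recorded which objects each one references. Objects are plain string labels, and topo_sort is a hypothetical name. Python 3.9+ also ships graphlib.TopologicalSorter in the standard library for the same job.

```python
from collections import deque
from typing import Dict, List


def topo_sort(deps: Dict[str, List[str]]) -> List[str]:
    """Order objects so each one appears after everything it references.

    deps maps an object to the objects it references, which must be
    deserialized first. Uses Kahn's algorithm; raises ValueError on a cycle.
    """
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    waiting_on = {n: set() for n in nodes}  # references not yet emitted
    dependents: Dict[str, List[str]] = {n: [] for n in nodes}  # objects referencing n
    for obj, targets in deps.items():
        for target in targets:
            if target not in waiting_on[obj]:
                waiting_on[obj].add(target)
                dependents[target].append(obj)

    ready = deque(sorted(n for n in nodes if not waiting_on[n]))  # sorted for stable output
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dep in dependents[node]:
            waiting_on[dep].discard(node)
            if not waiting_on[dep]:
                ready.append(dep)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle: some objects cannot be ordered")
    return order


if __name__ == "__main__":
    deps = {
        "lesson:1": ["article:2", "author:3"],
        "article:2": ["author:3", "photo:4"],
        "photo:4": ["author:3"],
    }
    print(topo_sort(deps))
    # prints: ['author:3', 'photo:4', 'article:2', 'lesson:1']
```

Real data can also contain cycles, such as two rows that point at each other through nullable foreign keys; this sketch simply refuses to order them rather than breaking the cycle.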
Results
Django-objectdump allowed us to export our highly interrelated content completely and easily. Exporting a single Lesson could pull in 200 or more additional objects. Once we used the debugging information to dial in the configuration for each primary type of content model, export and import were effortless in any format.