Unified code

We decided to have one code base for all sites. I have been a part of efforts to unify code across multiple sites with disastrous results. Those efforts typically failed because the sites had very different needs and managed very different types of content. For example, on the surface it might seem that National Geographic Magazine and the National Geographic Education websites both simply manage articles, photos and videos, but they are actually quite different. The education site, for example, tracks lots of additional metadata for education needs, such as national standards compliance, age and grade levels and curriculum properties. What allowed us to use one code base in this instance was that each site was to function identically: they would manage the same data in the same ways.

Within our one code repository, each site would have its own set of configurations. This also meant that when the code was deployed on a server, it could serve one or more of the sites, depending on needs. That gave us flexibility in scaling.

codebase
├─ apps
│  └─ ...
├─ bootstrap.py
├─ fabfile.py
├─ manage.py
├─ requirements.txt
├─ settings
│  ├─ __init__.py
│  ├─ base.py
│  ├─ dev.py
│  └─ production.py
├─ sites
│  ├─ __init__.py
│  └─ site1
│     ├─ __init__.py
│     ├─ conf
│     │  ├─ nginx.conf
│     │  ├─ gunicorn_conf.py
│     │  └─ upstart.conf
│     ├─ static
│     │  └─ ...
│     ├─ templates
│     │  └─ ...
│     ├─ site_settings.py
│     ├─ start_gunicorn.sh
│     └─ urls.py
├─ static
│  └─ ...
├─ templates
│  └─ ...
└─ urls.py

apps

Django applications or other Python code so specific to the overall site that it is useless for any other site. Also, legacy code that we are still stuck with.

bootstrap.py

A file that will set up the virtualenv and install the requirements. Typically used on server deployments, but could be used on a local machine.

fabfile.py

A file that contains scripts to run on servers.

manage.py

The default Django command.

requirements.txt

All of the requirements for running the code. Can be installed with pip install -r requirements.txt

settings

The basic settings and derivations.

base.py includes all settings

dev.py includes overrides of base.py for development

production.py includes overrides of base.py for production use

__init__.py imports dev.py to make development easier by not having to specify the --settings flag.

sites

This module contains overrides for each site installed using this code base. Each site will have its own module under this.

sites/site1

The module for a site. It contains overrides specific to this site.

sites/site1/conf

Configuration files for the various services. They are symlinked to other locations where necessary.

nginx.conf is the nginx configuration for this site. It is symlinked to /etc/nginx/sites-available/<sitename>

gunicorn_conf.py is the Gunicorn configuration for this site. It is referenced by path in sites/site1/start_gunicorn.sh

upstart.conf is Ubuntu's Upstart configuration. It is symlinked to /etc/init/<sitename>.conf
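
As a rough illustration of the Gunicorn piece, a gunicorn_conf.py could look something like the sketch below. The setting names (bind, workers, raw_env, timeout) are real Gunicorn settings, but every value, including the port and the settings module path, is an assumption made for illustration and not the configuration we actually ran.

```python
# sites/site1/conf/gunicorn_conf.py (illustrative sketch; all values are assumptions)
import multiprocessing

# Listen on a local port that this site's nginx.conf would proxy to (hypothetical port).
bind = "127.0.0.1:8001"

# A common rule of thumb: two workers per CPU core, plus one.
workers = multiprocessing.cpu_count() * 2 + 1

# Point Django at this site's settings module.
raw_env = ["DJANGO_SETTINGS_MODULE=sites.site1.site_settings"]

# Seconds of silence before an unresponsive worker is killed and restarted.
timeout = 30
```

Because a Gunicorn configuration file is plain Python, each site can compute its own values while the rest of the code base stays identical.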

sites/site1/static

New or modified static files go here. They are collected with the default static media, but override any items with the same name.

sites/site1/templates

New or modified templates go here.

sites/site1/site_settings.py

Settings for this site. It imports settings.production so it only needs to override specific things, such as static directories, template directories, site id, default language, database, etc. Gunicorn is configured to use these settings for this site.
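
To make the override idea concrete, here is a minimal sketch of what such a file might contain. In a real site_settings.py the first line would be `from settings.production import *`; the two stand-in tuples below replace that import only so the sketch runs on its own. The paths, site id and language code are all hypothetical, and TEMPLATE_DIRS is the older Django setting name, which later Django versions replaced with the TEMPLATES setting.

```python
# sites/site1/site_settings.py (illustrative sketch; paths and values are assumptions)
import os

# Stand-ins for values that `from settings.production import *` would normally
# provide. They are defined here only so the sketch is self-contained.
TEMPLATE_DIRS = ("/srv/codebase/templates",)
STATICFILES_DIRS = ("/srv/codebase/static",)

# --- Site-specific overrides start here ---
SITE_DIR = os.path.join("/srv/codebase", "sites", "site1")  # hypothetical location

SITE_ID = 2                        # this site's row in Django's sites framework
LANGUAGE_CODE = "es"               # default language for this site
ROOT_URLCONF = "sites.site1.urls"  # use the site's own urls.py

# Put the site's directories first so its files win over the shared defaults.
TEMPLATE_DIRS = (os.path.join(SITE_DIR, "templates"),) + TEMPLATE_DIRS
STATICFILES_DIRS = (os.path.join(SITE_DIR, "static"),) + STATICFILES_DIRS
```

Listing the site's directories ahead of the shared ones is what lets a site replace a single template or static file without copying the rest.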

sites/site1/start_gunicorn.sh

A shell script, run from Upstart, that starts the Gunicorn WSGI server at server start.

sites/site1/urls.py

Overridden urls, if necessary.

This is part 1 of a series of posts. Stay tuned for additional posts.

During a weekly progress meeting my boss asked a simple question: “How hard would it be to translate a small part of our content and make a special site for it?” Apparently, there was some grant money available to do this, and in the non-profit realm grant money is our lifeblood. But there was a bit of a catch: everything had to be done before September, which gave us about six weeks to do it.

This project wasn’t far removed from projects already on our timeline. We wanted the ability to have language versions of our content, but it wasn’t high up on our priorities. However, when someone is going to give you lots of money, your priorities can change. Patricia, my boss, has a far-reaching vision for where the site can go and augmented the grant’s requirements with a few of her own.

She wanted to accomplish this project in a way that would make it easier to find partners for additional translations, instead of doing the absolute minimum it would take to provide translations. Also, assuming that the initial content translated would not be the last, a separate team would probably manage the translated site(s). With different teams managing different sets of content and the sites focusing on slightly different audiences, each site would require some autonomy.

So our basic project guidelines were:

Convert a single site into multiple (semi-independent) sites
Each site may be in a different language
Be able to push content from one site to another
Each site may want to alter the content independently
Know which content is on which other sites (to link across)
Have producer-customizable landing pages for featuring content

And have it all done in six weeks.

And this won’t be the only thing that you’re working on.

I should mention now that no one on our team had done anything like this before.

Our Team

Our primary team consists of a project manager, a business analyst, an application architect, a developer and a designer. We don’t really confine ourselves to titles. We all have primary responsibilities, but are able to assist each other as well. While we all work within National Geographic Education (NGEd) and the web site is our primary responsibility, each of us also works on additional NGEd side projects in differing capacities.

Breaking down the Project

We tend to break projects down into small components that are independently deployable. These components are then ranked in order of importance and also by “must have” and “nice to have”. The goal is to break up the work so we can each work independently, or bring on additional help to work on a very specific component.

Here is how we broke it down, with a brief explanation:

Design the infrastructure for developing and hosting the various sites

There were a few specific design goals in this.

Easily maintained. Adding a new site shouldn’t mean a ton of additional work for everyone.
Easily understood by developers and producers. A brand new developer on the project should have as few "WTF" moments as possible. The team managing the content should only see their content and their site.
Allow for a site-specific look-and-feel, without having to re-implement the entire look-and-feel.

Export a piece of content and the content upon which it depends

If you want to move an article from site A to site B, you’ll obviously want the key photographs used, tags referenced, vocabulary listed (hey, we’re an educational site!), etc. The goal of this component would be to:

Find all related pieces of the requested content
Serialize it into the requested format, ideally in the standard way our development framework, Django, does it so it can be deserialized using standard Django tools.
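
The first step, finding everything a piece of content depends on, can be sketched without Django as a walk over a dependency graph. This is only an outline of the idea under stated assumptions: the function name, the `get_related` callback and the sample graph are hypothetical, and a real version would inspect each model's foreign keys and many-to-many fields instead of a dictionary.

```python
def collect_with_dependencies(root, get_related):
    """Return root plus everything it depends on, dependencies first."""
    seen = set()
    ordered = []

    def visit(item):
        if item in seen:
            return  # already handled; also protects against circular references
        seen.add(item)
        for dependency in get_related(item):
            visit(dependency)
        # Append only after the dependencies, so they are deserialized first.
        ordered.append(item)

    visit(root)
    return ordered


# Hypothetical content graph: an article uses a photo and a tag; the photo is tagged too.
graph = {
    "article": ["photo", "tag"],
    "photo": ["tag"],
    "tag": [],
}

export_order = collect_with_dependencies("article", lambda item: graph[item])
print(export_order)  # ['tag', 'photo', 'article']
```

Emitting dependencies before the objects that reference them means the destination site can load the file from top to bottom without hitting a missing reference.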

Serialize content into a format that is compatible for translating

Django has a few standard ways of serializing content—JSON, XML, YAML—but none are good for doing translations. The goal of this component was:

Find a standard translation document format that:
    Was extendable, to allow for Django-specific metadata
    Ideally allowed for multiple pieces of content in one file
    Had existing tools available for using it
    Appeared widely-used
Write a standard Django serialization component for this format

Make arbitrary UUID natural keys and dynamically add them to existing content

This task was actually defined after working on the initial infrastructure design. We’ll discuss it more in depth later, but suffice it to say we needed an arbitrary natural key that wouldn’t get translated when moving from one site to another. This component needed to:

Automatically add and manage a UUID field on the specified content
Add the appropriate functions required for Django to use that field as the model's natural key when serializing and deserializing content.
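
The idea behind those two points can be shown in a framework-free sketch. Django's serializers look for a `natural_key()` method on the model and a `get_by_natural_key()` method on its manager; the classes below only imitate that pairing in plain Python, and their names (UUIDKeyed, Registry) are hypothetical stand-ins, not the component we built.

```python
import uuid


class UUIDKeyed:
    """A piece of content that carries a random UUID as its natural key."""

    def __init__(self, title, key=None):
        self.title = title
        # Generate a key on first creation; reuse it when the content is copied.
        self.uuid = key or str(uuid.uuid4())

    def natural_key(self):
        # Django expects natural keys to be tuples.
        return (self.uuid,)


class Registry:
    """Stand-in for one site's table of content, indexed by natural key."""

    def __init__(self):
        self._by_key = {}

    def save(self, obj):
        self._by_key[obj.natural_key()] = obj

    def get_by_natural_key(self, key):
        return self._by_key[(key,)]


english_site = Registry()
spanish_site = Registry()

article = UUIDKeyed("Volcanoes")
english_site.save(article)

# The translated copy gets a new title but keeps the same UUID.
spanish_site.save(UUIDKeyed("Volcanes", key=article.uuid))

match = spanish_site.get_by_natural_key(article.uuid)
print(match.title)  # Volcanes
```

A title or slug would change in translation, so it cannot identify the same article on both sites; an arbitrary UUID has nothing in it to translate.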

Translate content (and work with translators)

We were vague on this point because we didn’t have access to the people arranging and co-ordinating the translations (due to vacations). We treated it as a black box that we would interface with manually or automatically, depending on the nature of the box.

Internationalize static strings in templates and code

Django already had methods for handling internationalization and localization. All we had to do was find everywhere we had a string and mark it for internationalization. This included our Python code, HTML, and JavaScript. Once these strings are marked, you then need to translate them, which is covered in the above task.

Create a dynamic page management system for stand-alone pages

Our current site handled this very badly, and we had already marked that part for replacement. Since the new translated site would also require these stand-alone pages, we added that to the requirements. Our plan wasn’t actually to create something from scratch, but to integrate an existing open source project, if possible. The component required:

Easy to use interface for producers
Customizable modules
Fast

Create a centralized system to manage content (browse content, select for copying, manage update notifications, track which content is where)

While doing a one-off translated site probably wouldn’t require this, we were planning for on-going translations. This required managing a workflow across multiple content teams and translators. We needed a way to manage it. The primary focus of this component was:

A client-side API for searching content, requesting a serialized version, and submitting translated content for insertion
Interface with the translators. One aspect of this was the inspection of the serialized objects and finding and removing objects that already existed on the destination site. No need to translate content that has already been translated.
A server-side API for finding out if a piece of content existed in another language on another site.
Notification of updates to content already translated to each team.
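
The step of removing objects that already exist on the destination site can be sketched as a simple filter on natural keys. The function name, the dictionary layout and the sample keys are all assumptions for illustration; in practice the set of existing keys would come from a query against the destination site.

```python
def strip_existing(serialized_objects, existing_keys):
    """Drop serialized objects whose natural key already exists on the destination."""
    return [obj for obj in serialized_objects if obj["uuid"] not in existing_keys]


# Hypothetical export: an article plus the photo and tag it depends on.
export = [
    {"model": "article", "uuid": "a1", "title": "Volcanoes"},
    {"model": "photo", "uuid": "p7", "title": "Lava flow"},
    {"model": "tag", "uuid": "t3", "title": "geology"},
]

# Keys the destination site reports it already has (translated earlier).
already_on_destination = {"p7", "t3"}

to_translate = strip_existing(export, already_on_destination)
print([obj["uuid"] for obj in to_translate])  # ['a1']
```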

Setting Priorities

These tasks are pretty much sorted into our order of priority. For example, the last component, the centralized management system, was really a "want". Future articles will discuss each task in depth.

Part of the priority setting was also time-limiting each component and having several methods of implementation. A quick evaluation would decide if the ideal implementation would take too long and we needed to settle for a less ideal, but more easily implemented, solution.
