This is part 1 of a series of posts. Stay tuned for additional posts.
During a weekly progress meeting my boss asked a simple question: “How hard would it be to translate a small part of our content and make a special site for it?” Apparently, there was some grant money available to do this, and in the non-profit realm, grant money is our lifeblood. But there was a catch: everything had to be done before September, which gave us about six weeks.
This project wasn’t far removed from projects already on our timeline. We wanted the ability to have language versions of our content, but it wasn’t high up on our priorities. However, when someone is going to give you lots of money, your priorities can change. Patricia, my boss, has a far-reaching vision for where the site can go and augmented the grant’s requirements with a few of her own.
She wanted to accomplish this project in a way that would make it easier to find partners for additional translations, instead of doing the absolute minimum it would take to provide translations. Also, assuming the first batch of translated content would not be the last, a separate team would probably manage the translated site(s). With different teams managing different sets of content and the sites focusing on slightly different audiences, each site would require some autonomy.
So our basic project guidelines were:
- Convert a single site into multiple (semi-independent) sites
- Each site may be in a different language
- Be able to push content from one site to another
- Each site may want to alter the content independently
- Know which content is on which other sites (to link across)
- Have producer-customizable landing pages for featuring content
And have it all done in six weeks.
And this won’t be the only thing that you’re working on.
I should mention now that no one on our team has done anything like this before.
Our Team
Our primary team consists of a project manager, a business analyst, an application architect, a developer and a designer. We don’t really confine ourselves to titles. We all have primary responsibilities, but are able to assist each other as well. While we all work within National Geographic Education (NGEd) and the web site is our primary responsibility, each of us also works on additional NGEd side projects in differing capacities.
Breaking down the Project
We tend to break projects down into small components that are independently deployable. These components are then ranked in order of importance and marked as either “must have” or “nice to have”. The goal is to break up the work so we can each work independently, or bring on additional help to work on a very specific component.
Here is how we broke it down, with a brief explanation:
Design the infrastructure for developing and hosting the various sites
There were a few specific design goals in this.
- Easily maintained. Adding a new site shouldn’t mean a ton of additional work for everyone.
- Easily understood by developers and producers. A brand new developer on the project should have as few "WTF" moments as possible. The team managing the content should only see their content and their site.
- Allow for a site-specific look-and-feel, without having to re-implement the entire look-and-feel.
Export a piece of content and the content upon which it depends
If you want to move an article from site A to site B, you’ll obviously want the key photographs used, tags referenced, vocabulary listed (hey, we’re an educational site!), etc. The goal of this component would be to:
- Find all related pieces of the requested content
- Serialize it into the requested format, ideally in the standard way our development framework, Django, does it so it can be deserialized using standard Django tools.
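As a rough illustration of the first goal, finding the related pieces amounts to walking a dependency graph. The sketch below uses plain Python dictionaries instead of Django models. The `CONTENT` data, the `depends_on` key, and the `collect_with_dependencies` function are all hypothetical names made up for this example; they are not part of our code or of Django.

```python
# Hypothetical in-memory content store: each item lists the keys it depends on.
CONTENT = {
    "article:volcanoes": {"title": "Volcanoes", "depends_on": ["photo:etna", "term:magma"]},
    "photo:etna": {"title": "Mount Etna", "depends_on": ["tag:italy"]},
    "term:magma": {"title": "magma", "depends_on": []},
    "tag:italy": {"title": "Italy", "depends_on": []},
    "article:rivers": {"title": "Rivers", "depends_on": []},
}


def collect_with_dependencies(key, store):
    """Return the keys for `key` and everything it depends on, dependencies first."""
    ordered = []
    seen = set()

    def visit(current):
        if current in seen:  # also guards against circular references
            return
        seen.add(current)
        for dep in store[current]["depends_on"]:
            visit(dep)
        ordered.append(current)  # appended only after its dependencies

    visit(key)
    return ordered


export_order = collect_with_dependencies("article:volcanoes", CONTENT)
```

Listing dependencies first means whatever loads the export can insert the objects in order without running into a reference to something that does not exist yet.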
Serialize content into a format that is suitable for translation
Django has a few standard ways of serializing content (JSON, XML, YAML), but none are well suited to translation. The goals of this component were:
- Find a standard translation document format that:
  - Was extensible, to allow for Django-specific metadata
  - Ideally allowed for multiple pieces of content in one file
  - Had existing tools available for working with it
  - Appeared to be widely used
- Write a standard Django serialization component for this format
Make arbitrary UUID natural keys and dynamically add them to existing content
This task was actually defined after working on the initial infrastructure design. We’ll discuss it more in depth later, but suffice it to say we needed an arbitrary natural key that wouldn’t get translated when moving from one site to another. This component needed to:
- Automatically add and manage a UUID field on the specified content
- Add the appropriate functions required for Django to use that field as the model's natural key when serializing and deserializing content.
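To make the idea concrete, here is a minimal sketch in plain Python, not our actual component. Django's natural key convention is a `natural_key()` method on the model that returns a tuple, and a `get_by_natural_key()` method on the model's manager. The `Article` and `ContentRegistry` classes below are hypothetical stand-ins for a model and a manager, so the example runs without Django.

```python
import uuid


class ContentRegistry:
    """Stand-in for a Django manager; looks objects up by their natural key."""

    def __init__(self):
        self._by_uuid = {}

    def add(self, obj):
        self._by_uuid[obj.uuid] = obj

    def get_by_natural_key(self, uuid_value):
        # Same method name Django looks for on a manager.
        return self._by_uuid[uuid_value]


class Article:
    """Stand-in for a Django model with an automatically assigned UUID."""

    def __init__(self, title, uuid_value=None):
        self.title = title
        # Assign a UUID only when the object does not already carry one.
        self.uuid = uuid_value or str(uuid.uuid4())

    def natural_key(self):
        # Django expects natural_key() to return a tuple.
        return (self.uuid,)


registry = ContentRegistry()
english = Article("Volcanoes")
registry.add(english)

# A translated copy keeps the UUID even though the title changes.
spanish = Article("Volcanes", uuid_value=english.uuid)
found = registry.get_by_natural_key(*spanish.natural_key())
```

A title or slug would change in translation, so it cannot identify the same article across sites. The UUID stays the same, which is why the lookup with the Spanish copy's key finds the English original.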
Translate content (and work with translators)
We were vague on this point because we didn’t have access to the people arranging and co-ordinating the translations (due to vacations). We treated it as a black box that we would interface with manually or automatically, depending on the nature of the box.
Internationalize static strings in templates and code
Django already had methods for handling internationalization and localization. All we had to do was find everywhere we had a string and mark it for internationalization. This included our Python code, HTML, and JavaScript. Once these strings are marked, you then need to translate them, which is covered in the above task.
Create a dynamic page management system for stand-alone pages
Our existing site handled this very poorly, and we had already marked that part for replacement. Since the new translated site would also require these stand-alone pages, we added that to the requirements. Our plan wasn’t actually to create something from scratch, but to integrate an existing open source project, if possible. The component required:
- Easy-to-use interface for producers
- Customizable modules
- Fast
Create a centralized system to manage content (browse content, select for copying, manage update notifications, track which content is where)
While doing a one-off translated site probably wouldn’t require this, we were planning for on-going translations. This required managing a workflow across multiple content teams and translators. We needed a way to manage it. The primary focus of this component was:
- A client site API for searching content, requesting a serialized version, and submitting translated content for insertion
- Interface with the translators. One aspect of this was the inspection of the serialized objects and finding and removing objects that already existed on the destination site. No need to translate content that has already been translated.
- A server-side API for finding out if a piece of content existed in another language on another site.
- Notification of updates to content already translated to each team.
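The inspection step mentioned above, dropping objects the destination site already has, can be sketched as a simple filter. This assumes each serialized object carries a `uuid` entry and that the destination site can report the set of UUIDs it already holds. Both the object layout and the `remove_existing` function are hypothetical, not Django's serialization format.

```python
def remove_existing(serialized_objects, destination_uuids):
    """Split serialized objects into those to translate and those to skip."""
    to_translate = []
    skipped = []
    for obj in serialized_objects:
        if obj["uuid"] in destination_uuids:
            skipped.append(obj)  # already on the destination site
        else:
            to_translate.append(obj)
    return to_translate, skipped


# Hypothetical export: an article plus the photo and term it depends on.
export = [
    {"model": "photo", "uuid": "a1", "fields": {"caption": "Mount Etna"}},
    {"model": "term", "uuid": "b2", "fields": {"name": "magma"}},
    {"model": "article", "uuid": "c3", "fields": {"title": "Volcanoes"}},
]
# UUIDs the destination site reports it already has.
already_there = {"b2"}

to_translate, skipped = remove_existing(export, already_there)
```

Only the objects in `to_translate` would go to the translators; the skipped ones are already present on the destination site.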
Setting Priorities
The tasks above are listed roughly in priority order. For example, the last component, the centralized management system, was really a "want". Future articles will discuss each task in depth.
Part of the priority setting was also time-limiting each component and having several methods of implementation in mind. A quick evaluation would decide whether the ideal implementation would take too long and we needed to settle for a less ideal, but more easily implemented, solution.