Putting together a Dockerfile

I couldn't wait any longer. I wanted to see it running in Docker!

Choosing a Linux distribution

We want the absolute smallest container we can get to run our project. The container is going to run Linux. We currently have Ubuntu on our servers, but default Ubuntu includes lots of stuff we don't need.

We chose Alpine Linux because it was small and had a large set of packages to install.

Setting up the Dockerfile

We based our Dockerfile on João Ferreira Loff's Alpine Linux Python 2.7 slim image.

FROM alpine:3.5

# Install needed packages. Notes:
#   * dumb-init: a proper init system for containers, to reap zombie children
#   * musl: standard C library
#   * linux-headers: commonly needed, and an unusual package name from Alpine.
#   * build-base: used so we include the basic development packages (gcc)
#   * bash: so we can access /bin/bash
#   * git: to ease up clones of repos
#   * ca-certificates: for SSL verification during Pip and easy_install
#   * python2: the binaries themselves
#   * python2-dev: headers for building C extensions, e.g. gevent
#   * py-setuptools: required only in major version 2, installs easy_install so we can install Pip.
#   * build-base: used so we include the basic development packages (gcc)
#   * linux-headers: commonly needed, and an unusual package name from Alpine.
#   * python-dev: headers for building C extensions, e.g. gevent
#   * postgresql-client: for accessing a PostgreSQL server
#   * postgresql-dev: for building psycopg2
#   * py-lxml: instead of using pip to install lxml, this is faster. Must make sure requirements.txt has correct version
#   * libffi-dev: for compiling Python cffi extension
#   * tiff-dev: For Pillow: TIFF support
#   * jpeg-dev: For Pillow: JPEG support
#   * openjpeg-dev: For Pillow: JPEG 2000 support
#   * libpng-dev: For Pillow: PNG support
#   * zlib-dev: For Pillow: zlib (compressed PNG) support
#   * freetype-dev: For Pillow: TrueType support
#   * lcms2-dev: For Pillow: Little CMS 2 support
#   * libwebp-dev: For Pillow: WebP support
#   * gdal: For some Geo capabilities
#   * geos: For some Geo capabilities
ENV PACKAGES="\
  dumb-init \
  musl \
  linux-headers \
  build-base \
  bash \
  git \
  ca-certificates \
  python2 \
  python2-dev \
  py-setuptools \
  build-base \
  linux-headers \
  python-dev \
  postgresql-client \
  postgresql-dev \
  py-lxml \
  libffi-dev \
  tiff-dev \
  jpeg-dev \
  openjpeg-dev \
  libpng-dev \
  zlib-dev \
  freetype-dev \
  lcms2-dev \
  libwebp-dev \
  gdal \
  geos \
"

RUN echo \
  # replacing default repositories with edge ones
  && echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" > /etc/apk/repositories \
  && echo "http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories \
  && echo "http://dl-cdn.alpinelinux.org/alpine/edge/main" >> /etc/apk/repositories \

  # Add the packages, with a CDN-breakage fallback if needed
  && apk add --no-cache $PACKAGES || \
    (sed -i -e 's/dl-cdn/dl-4/g' /etc/apk/repositories && apk add --no-cache $PACKAGES) \

  # make some useful symlinks that are expected to exist
  && if [[ ! -e /usr/bin/python ]];        then ln -sf /usr/bin/python2.7 /usr/bin/python; fi \
  && if [[ ! -e /usr/bin/python-config ]]; then ln -sf /usr/bin/python2.7-config /usr/bin/python-config; fi \
  && if [[ ! -e /usr/bin/easy_install ]];  then ln -sf /usr/bin/easy_install-2.7 /usr/bin/easy_install; fi \

  # Install and upgrade Pip
  && easy_install pip \
  && pip install --upgrade pip \
  && if [[ ! -e /usr/bin/pip ]]; then ln -sf /usr/bin/pip2.7 /usr/bin/pip; fi \
  && echo

# Chaining the ENV allows for only one layer, instead of one per ENV statement
ENV HOMEDIR=/code \
    LANG=en_US.UTF-8 \
    LC_ALL=en_US.UTF-8 \
    PYTHONUNBUFFERED=1 \
    NEW_RELIC_CONFIG_FILE=$HOMEDIR/newrelic.ini \
    GUNICORNCONF=$HOMEDIR/conf/docker_gunicorn_conf.py \
    GUNICORN_WORKERS=2 \
    GUNICORN_BACKLOG=4096 \
    GUNICORN_BIND=0.0.0.0:8000 \
    GUNICORN_ENABLE_STDIO_INHERITANCE=True \
    DJANGO_SETTINGS_MODULE=settings

WORKDIR $HOMEDIR

# Copying this file over so we can install requirements.txt in one cache-able layer
COPY requirements.txt $HOMEDIR/
RUN pip install --upgrade pip \
  && pip install -r $HOMEDIR/requirements.txt

# Copy the code
COPY . $HOMEDIR

EXPOSE 8000
CMD ["sh", "-c", "$HOMEDIR/docker-entrypoint.sh"]

The first change that we made was to use Alpine Linux version 3.5, which has just been released.

Next we listed all the OS-level packages we'll need in the PACKAGES environment variable.

The next RUN statement sets the package repositories to the edge version, installs the packages in PACKAGES, creates a few convenience symlinks, and installs pip for our Python installs.
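
The CDN fallback in that RUN statement depends on how the shell chains commands: && and || have equal precedence and are evaluated left to right, so a failure before the || triggers the parenthesized fallback, and the steps after it still run. Here is a minimal sketch of the same shape. The functions fetch_primary, fetch_fallback, and next_step are hypothetical stand-ins for the apk and symlink commands, not part of the Dockerfile:

```shell
#!/bin/sh
# Hypothetical stand-ins: the first "mirror" always fails, the second succeeds.
fetch_primary()  { echo "primary mirror failed" >&2; return 1; }
fetch_fallback() { echo "fallback mirror ok"; }
next_step()      { echo "creating symlinks"; }

# Same shape as the RUN statement: A || (B) && C
# If A fails, the subshell B runs; C runs when either A or B succeeded.
result=$(fetch_primary 2>/dev/null || (fetch_fallback) && next_step)
echo "$result"
```

One consequence of this grouping is that the fallback also runs if an earlier step in the chain fails, not only when the first apk add does.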

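We set up all the environment variables next. One caveat: Docker expands variables in an ENV instruction using the values that were in effect before that instruction, so $HOMEDIR in NEW_RELIC_CONFIG_FILE and GUNICORNCONF may expand to an empty string here. Defining HOMEDIR in its own, earlier ENV statement avoids that.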

After setting the working directory, we copy our requirements.txt file into the container and install all our requirements. We do this step separately so it creates a cached layer that won't change unless the requirements.txt file changes. This saves tons of time if you keep building and re-building the image.
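
Docker decides whether to reuse the COPY layer by checksumming the contents of the copied file, not its timestamps. The sketch below imitates that idea on a throwaway requirements.txt. It is only an analogy: cksum stands in for Docker's own checksum, and the two pinned packages are borrowed from the build output in this post.

```shell
#!/bin/sh
# Work in a temporary directory so nothing in the real project is touched.
workdir=$(mktemp -d)
printf 'Django==1.8.15\n' > "$workdir/requirements.txt"
before=$(cksum < "$workdir/requirements.txt")

# Touching the file changes its timestamp but not its content.
touch "$workdir/requirements.txt"
after_touch=$(cksum < "$workdir/requirements.txt")

# Appending a requirement changes the content, and with it the checksum.
printf 'Fabric==1.10.2\n' >> "$workdir/requirements.txt"
after_edit=$(cksum < "$workdir/requirements.txt")

[ "$before" = "$after_touch" ] && echo "touch: cached layer could be reused"
[ "$before" != "$after_edit" ] && echo "edit: layer has to be rebuilt"
rm -rf "$workdir"
```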

We copy all our code over to the container, tell the container to expose port 8000 and specify the command to run (unless we specify a different command at runtime).

You'll notice that the command looks strange. Docker runs the exec (JSON array) form of CMD directly, without a shell, so nothing would expand the environment variable HOMEDIR. To get it expanded, we have to prefix our command $HOMEDIR/docker-entrypoint.sh with sh -c.
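
The difference can be reproduced outside Docker. This is a small sketch, assuming only a POSIX shell: the single-quoted string plays the role of an exec-form argument that no shell ever parses, while sh -c hands the same string to a shell that expands it from the environment.

```shell
#!/bin/sh
export HOMEDIR=/code

# Exec-form analogue: the argument is passed along verbatim, nothing expands it.
literal=$(printf '%s' '$HOMEDIR/docker-entrypoint.sh')

# sh -c analogue: a shell parses the string and expands $HOMEDIR.
expanded=$(sh -c 'echo "$HOMEDIR/docker-entrypoint.sh"')

echo "without a shell: $literal"
echo "with sh -c:      $expanded"
```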

But there's something missing

You'll notice that in this version there aren't any environment variables for the database, cache, or any of the other settings we set up earlier. We'll get them in there eventually, but for right now, we want to see if we can build and run this container and have it connect to our local database and cache.

If you build it, it can run

Building the Docker image is really easy:

docker build -t ngs:latest .

This tags the built image as ngs:latest, which isn't what we are going to do in production, but it helps when testing everything.

The output looks something like this:

$ docker build -t ngs:latest .
Sending build context to Docker daemon 76.43 MB
Step 1 : FROM alpine:3.5
 ---> 88e169ea8f46
Step 2 : ENV PACKAGES "  dumb-init   musl   linux-headers   build-base   bash   git   ca-certificates   python2   python2-dev   py-setuptools   build-base   linux-headers   python-dev   postgresql-client   postgresql-dev   py-lxml   libffi-dev   tiff-dev   jpeg-dev   openjpeg-dev   libpng-dev   zlib-dev   freetype-dev   lcms2-dev   libwebp-dev   gdal   geos "
 ---> Using cache
 ---> 184f9b7e79f9
Step 3 : RUN echo   && echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" > /etc/apk/repositories   && echo "http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories   && echo "http://dl-cdn.alpinelinux.org/alpine/edge/main" >> /etc/apk/repositories   && apk add --no-cache $PACKAGES ||     (sed -i -e 's/dl-cdn/dl-4/g' /etc/apk/repositories && apk add --no-cache $PACKAGES)   && if [[ ! -e /usr/bin/python ]];        then ln -sf /usr/bin/python2.7 /usr/bin/python; fi   && if [[ ! -e /usr/bin/python-config ]]; then ln -sf /usr/bin/python2.7-config /usr/bin/python-config; fi   && if [[ ! -e /usr/bin/easy_install ]];  then ln -sf /usr/bin/easy_install-2.7 /usr/bin/easy_install; fi   && easy_install pip   && pip install --upgrade pip   && if [[ ! -e /usr/bin/pip ]]; then ln -sf /usr/bin/pip2.7 /usr/bin/pip; fi   && echo
 ---> Using cache
 ---> 514dcc2f010d
Step 4 : ENV HOMEDIR /code LANG en_US.UTF-8 LC_ALL en_US.UTF-8 PYTHONUNBUFFERED 1 NEW_RELIC_CONFIG_FILE $HOMEDIR/newrelic.ini GUNICORNCONF $HOMEDIR/conf/docker_gunicorn_conf.py GUNICORN_WORKERS 2 GUNICORN_BACKLOG 4096 GUNICORN_BIND 0.0.0.0:8000 GUNICORN_ENABLE_STDIO_INHERITANCE True DJANGO_SETTINGS_MODULE settings
 ---> Running in 2d58f77c0a8e
 ---> 1342bb501c0f
Removing intermediate container 2d58f77c0a8e
Step 5 : WORKDIR $HOMEDIR
 ---> Running in a20a2fa64d2e
 ---> df977d30491c
Removing intermediate container a20a2fa64d2e
Step 6 : COPY requirements.txt $HOMEDIR/
 ---> e6ae37797b36
Removing intermediate container 820e3406fb5c
Step 7 : RUN pip install --upgrade pip   && pip install -r $HOMEDIR/requirements.txt
 ---> Running in 4c65be60af03
Requirement already up-to-date: pip in /usr/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
Collecting beautifulsoup4==4.5.1 (from -r /code/requirements.txt (line 2))
  Downloading beautifulsoup4-4.5.1-py2-none-any.whl (83kB)
Collecting cmsplugin-forms-builder==1.1.1 (from -r /code/requirements.txt (line 3))
...
Installing collected packages: beautifulsoup4, Django, ...
  Running setup.py install for future: started
    Running setup.py install for future: finished with status 'done'
  Installing from a newer Wheel-Version (1.1)
  Running setup.py install for unidecode: started
    Running setup.py install for unidecode: finished with status 'done'
Successfully installed Django-1.8.15 Fabric-1.10.2 ...
 ---> 165f7ae9507e
Removing intermediate container 4c65be60af03
Step 8 : COPY . $HOMEDIR
 ---> 1058d14b462f
Removing intermediate container 55f77f2e60d6
Step 9 : EXPOSE 8000
 ---> Running in 38e8c650a529
 ---> 7c53dcf41f2a
Removing intermediate container 38e8c650a529
Step 10 : CMD sh -c $HOMEDIR/docker-entrypoint.sh
 ---> Running in 1b8781bf6458
 ---> a255a40e30b8
Removing intermediate container 1b8781bf6458
Successfully built a255a40e30b8

I've truncated most of the output from installing the Python dependencies. If I run it again, steps 6 and 7 use the existing cache:

Step 6 : COPY requirements.txt $HOMEDIR/
 ---> Using cache
 ---> e6ae37797b36
Step 7 : RUN pip install --upgrade pip   && pip install -r $HOMEDIR/requirements.txt
 ---> Using cache
 ---> 165f7ae9507e

If I make changes to any other part of our project, steps 1-7 use the cache, and it only has to copy over the new code.

How big is it?

So how big is the image? Running docker images gives us:

REPOSITORY             TAG                 IMAGE ID            CREATED             SIZE
ngs                    latest              a255a40e30b8        11 minutes ago      590.1 MB

So 590.1 MB. What makes up that space? We can take a look at the layers created by our Dockerfile. Running docker history ngs:latest returns:

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
a255a40e30b8        7 minutes ago       /bin/sh -c #(nop)  CMD ["sh" "-c" "$HOMEDIR/d   0 B
7c53dcf41f2a        7 minutes ago       /bin/sh -c #(nop)  EXPOSE 8000/tcp              0 B
1058d14b462f        7 minutes ago       /bin/sh -c #(nop) COPY dir:0da094a2328f4e5bfb   73.69 MB
165f7ae9507e        7 minutes ago       /bin/sh -c pip install --upgrade pip   && pip   227.1 MB
e6ae37797b36        11 minutes ago      /bin/sh -c #(nop) COPY file:25e352c295f212113   3.147 kB
df977d30491c        11 minutes ago      /bin/sh -c #(nop)  WORKDIR /code                0 B
1342bb501c0f        11 minutes ago      /bin/sh -c #(nop)  ENV HOMEDIR=/code LANG=en_   0 B
514dcc2f010d        3 days ago          /bin/sh -c echo   && echo "http://dl-cdn.alpi   285.3 MB
184f9b7e79f9        3 days ago          /bin/sh -c #(nop)  ENV PACKAGES=  dumb-init     0 B
88e169ea8f46        6 days ago          /bin/sh -c #(nop) ADD file:92ab746eb22dd3ed2b   3.984 MB

At the bottom layer is the Alpine Linux 3.5 distro, which is only 3.984 MB. Our OS-level packages take up 285.3 MB. Our Python dependencies take up 227.1 MB. Our code is 73.69 MB.
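
As a quick sanity check, the layer sizes reported by docker history should add up to the image size reported by docker images. A small sketch of that arithmetic, with the numbers copied by hand from the output above:

```shell
#!/bin/sh
# Layer sizes in MB from the docker history output
# (the 3.147 kB requirements.txt layer is 0.003147 MB).
total=$(printf '%s\n' 3.984 285.3 227.1 0.003147 73.69 \
  | LC_ALL=C awk '{ sum += $1 } END { printf "%.1f MB", sum }')
echo "$total"   # prints 590.1 MB
```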

Make it run! Make it run!

We want this container to connect to resources running on our local computer.

Make PostgreSQL and Redis listen more

My default installations of Redis and PostgreSQL only listen for connections on the loopback address. I modified them to listen on every interface.

Now my container will be able to connect to them.
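
For reference, the settings involved are listen_addresses in postgresql.conf and bind in redis.conf. The helpers below are a hedged sketch of one way to flip them. The function names are hypothetical, the config file locations vary by platform (so they are passed in as arguments), and each edit leaves a .bak copy of the original file behind.

```shell
#!/bin/sh
# Hypothetical helpers; pass the path to your own config file.

# Make PostgreSQL listen on all interfaces by setting (and uncommenting)
# listen_addresses in the given postgresql.conf.
pg_listen_all() {
  sed -i.bak -E "s/^[#[:space:]]*listen_addresses[[:space:]]*=.*/listen_addresses = '*'/" "$1"
}

# Make Redis listen on all IPv4 interfaces by rewriting the active
# (uncommented) bind line in the given redis.conf.
redis_bind_all() {
  sed -i.bak -E "s/^bind[[:space:]].*/bind 0.0.0.0/" "$1"
}
```

You would call these as pg_listen_all /path/to/postgresql.conf and redis_bind_all /path/to/redis.conf, then restart both services. PostgreSQL also consults pg_hba.conf to decide which client addresses may connect, so an entry covering the container's address may be needed as well. Listening on every interface is only reasonable on a development machine.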

Give the container the address

The container has no idea where it is running. Typically all the connections are made when Docker sets up the containers (and that is what we want, eventually). For now, we need to tell the container where it is running.

We are going to do this with a temporary script called docker-run.sh:

#!/bin/bash
# Find the first non-loopback IPv4 address on this machine
export DOCKERHOST=$(ifconfig | grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" | grep -v 127.0.0.1 | awk '{ print $2 }' | cut -f2 -d: | head -n1)
# Remove the container left over from the previous run, if any
docker rm ngs-container
# Quote the URLs so the shell doesn't treat "?" as a glob character
docker run -ti \
    -p 8000:8000 \
    --add-host "dockerhost:$DOCKERHOST" \
    --name ngs-container \
    -e "DATABASE_URL=postgresql://coordt:password@dockerhost:5432/education" \
    -e "CACHE_URL=rediscache://dockerhost:6379/0?CLIENT_CLASS=site_ext.cacheclient.GracefulClient" \
    ngs:latest

The first command sets the DOCKERHOST environment variable to the local computer's current IP address.
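
To see what that pipeline does at each stage, here it is wrapped in a function and fed two sample inputs instead of live ifconfig output. The function name first_host_ip and both samples are made up for illustration; the pipeline itself is the one from the script.

```shell
#!/bin/sh
# Same pipeline as in docker-run.sh, reading ifconfig-style text from stdin.
first_host_ip() {
  grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" \
    | grep -v 127.0.0.1 \
    | awk '{ print $2 }' \
    | cut -f2 -d: \
    | head -n1
}

# Made-up sample in the BSD/macOS style: the address is the second field.
bsd_sample='lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    inet 127.0.0.1 netmask 0xff000000
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    inet 192.168.1.20 netmask 0xffffff00 broadcast 192.168.1.255'

# Made-up sample in the older Linux style: the second field is "addr:<ip>",
# which is why the pipeline ends with cut -f2 -d:
linux_sample='eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:10.0.0.5  Bcast:10.0.0.255  Mask:255.255.255.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0'

printf '%s\n' "$bsd_sample"   | first_host_ip   # prints 192.168.1.20
printf '%s\n' "$linux_sample" | first_host_ip   # prints 10.0.0.5
```

On a machine with several active interfaces the pipeline simply picks the first address listed, which may not be the one you expect.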

The second command removes any existing container named ngs-container. Note: Docker doesn't clean up after itself very well. This is well known, and there are several different solutions, I'm sure. After some Docker building and running, you end up with lots of unused images and containers. Because the script reuses the name ngs-container each time, it can remove the previous run's container before starting a new one.
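
Two standard Docker options can help with the clutter: docker rm -f removes a container even if it is still running, and docker run --rm deletes the container as soon as it exits. The wrappers below are a sketch rather than part of the original script. The function names and the DOCKER override variable are hypothetical conveniences; setting DOCKER=echo lets you preview the commands without running them.

```shell
#!/bin/sh
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the commands.
DOCKER=${DOCKER:-docker}

# Remove a container by name, ignoring the error Docker reports
# when no such container exists.
remove_container() {
  "$DOCKER" rm -f "$1" 2>/dev/null || true
}

# Run an image as a throwaway container: --rm deletes it on exit.
# $1 is the container name, $2 is the image.
run_disposable() {
  "$DOCKER" run --rm -ti --name "$1" "$2"
}
```

For example, remove_container ngs-container followed by run_disposable ngs-container ngs:latest. On Docker 1.13 and later, docker container prune and docker image prune can also clear out stopped containers and dangling images.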

The last command tells Docker to run the ngs:latest image with a pseudo-tty and interactivity (-ti), map container port 8000 to local port 8000 (-p 8000:8000), add dockerhost to the container's /etc/hosts file with the local computer's current IP address (--add-host dockerhost:$DOCKERHOST), name the container ngs-container (--name ngs-container), and set the DATABASE_URL and CACHE_URL environment variables.

Now make docker-run.sh executable with chmod a+x docker-run.sh, and you can run it.

$ ./docker-run.sh
Copying '/code/static/concepts/jquery-textext.js'
Copying '/code/static/autocomplete_light/addanother.js'
...
Post-processed 'js/tiny_mce/plugins/inlinepopups/skins/clearlooks2/img/alert.gif' as 'js/tiny_mce/plugins/inlinepopups/skins/clearlooks2/img/alert.568d4cf84413.gif'
Post-processed 'js/tiny_mce/plugins/inlinepopups/skins/clearlooks2/img/corners.gif' as 'js/tiny_mce/plugins/inlinepopups/skins/clearlooks2/img/corners.55298b5baaec.gif'
...
4256 static files copied to '/code/staticmedia', 4256 post-processed.
Operations to perform:
  Synchronize unmigrated apps: redirects, ...
  Apply all migrations: teachingatlas, ...
Synchronizing apps without migrations:
  Creating tables...
    Running deferred SQL...
  Installing custom SQL...
Running migrations:
  No migrations to apply.

If you remember from a previous post, the docker-entrypoint.sh runs two commands before it starts gunicorn.

The first is collecting (and post-processing) the static media. I've truncated the output for copying and the post-processing of said static media, but you can see that it ran.

The next is a database migration. I've truncated the output somewhat, but you can see that nothing was required to migrate.

Now when I try http://localhost:8000, I get a web page! Success!

Next time

In the next installment I'll get the container serving its own static files.

The road to Docker, Django and Amazon ECS, part 4