This is part two in a series on taking a simple Python project from local script to production. In part one I talked about a gotcha I ran into when converting an old project from Python 2 to Python 3.

This part will go over how I put my Python process, its inputs, and its outputs into a Docker container and made an image publicly available on Dockerhub.

There are a few requirements that I will not go over here. Go to Docker.com and follow the instructions there to:

Download Docker
Create a Docker ID
Log in with your Docker ID on Dockerhub

What is Docker?

Docker is a containerization platform. Containerization is a way to package units of code with their dependencies so that they have everything they need to run in isolation.

Using Docker can help fix the "it works on my machine" problem, and writing dockerized code is a great way to encourage thoughtful code practices. Docker containers should be simple, responsible for as little as possible, and dependent on as few externals as possible.

Docker image vs docker container

Throughout this post, and online, you'll see the terms container and image. An image is basically a snapshot of your dockerized code and its environment, created when you use the docker build command (more on that below). Running docker run on an image starts a container, so a container is a running instance of an image.

Anatomy of a Dockerfile

I decided to dockerize my csv writer from the previous post in this series so that I could move it between environments easily.

For this I needed a Dockerfile. A Dockerfile is a text file that does not have a file extension.

Here's what the dockerfile for my Python code looks like:

FROM python:3.7
ARG export_file=goodreads.csv
COPY $export_file goodreads_export.csv
COPY converter.py /
CMD ["python", "./converter.py"]

FROM

The FROM keyword sets the base image that the rest of the Dockerfile builds on. Docker images don't come with languages automatically loaded, so to get a Python interpreter to run the code, I build on top of the python:3.7 image.

A note on Docker registries:
The default Docker registry is Dockerhub. If a Docker image is available on Dockerhub, you don't need to specify a URL when pulling from or pushing to a Docker repo. You just need the author's username and the repo name. For example, you can pull the Docker image from this post with the command docker pull thejessleigh/goodreads-libib-converter. If you're using a different registry, you'll need to tell Docker where to go. For example, if you're using Quay you'd run docker pull quay.io/example-username/test-docker-repo.

The python dependency in my Dockerfile doesn't have a username because it's an official repo hosted on Dockerhub.
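To make those defaulting rules concrete, here is a minimal Python sketch of how an image reference breaks down into a registry, a repository, and a tag. It is a simplified illustration, not Docker's actual parser: the function name parse_image_ref and the ImageRef type are made up for this example, and it ignores digests (the @sha256:... form). It assumes the usual conventions that Dockerhub is addressed as docker.io, that official images live under an implicit library namespace, and that a missing tag means latest.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into registry, repository, and tag (simplified)."""
    # The first path component counts as a registry host only if it looks like
    # one (contains "." or ":", or is "localhost"). Otherwise assume Dockerhub.
    first, slash, rest = ref.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", ref

    # A tag, if present, follows the ":" in the final path component.
    head, sep, last = remainder.rpartition("/")
    name, colon, tag = last.partition(":")
    repository = f"{head}/{name}" if sep else name
    if not colon:
        tag = "latest"

    # Official Dockerhub images (no username) sit in the "library" namespace.
    if registry == "docker.io" and "/" not in repository:
        repository = f"library/{repository}"
    return ImageRef(registry, repository, tag)
```

Under these assumptions, python:3.7 resolves to the library/python repository on docker.io, while quay.io/example-username/test-docker-repo keeps quay.io as its registry and falls back to the latest tag.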

ARG

ARG declares an argument. It is the only instruction in a Dockerfile that can precede FROM, although I prefer to have FROM come first for the sake of consistency.

In the above example, I declare an ARG export_file and give it a default. It expects a file called goodreads.csv in the same directory as the Dockerfile. If I want to pass in something different, I instruct it to use a different filename with --build-arg=export_file=my_goodreads_export.csv when building the image.

COPY

COPY and ADD both copy files into the Docker image. This is where I'm bringing in the input file and also the actual Python code that the Docker image executes.

COPY takes two arguments:

the location of the file you're putting into the image
the location of the file inside the docker image

So whatever file I include as the CSV to convert will be referred to as goodreads_export.csv inside the Docker container. This is nifty, because it means that no matter what I build the docker image with, the filename will always be consistent. I don't have to worry about making the Python code handle different filenames or paths. It can always look for ./goodreads_export.csv.
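As a rough illustration of why the fixed filename is convenient, here is a minimal sketch of what a converter built around hardcoded paths could look like. This is not the actual converter.py from this series: the convert function and the column names in COLUMN_MAP are hypothetical placeholders. The only things taken from the post are the input name goodreads_export.csv and the output name libib_export.csv.

```python
import csv
from pathlib import Path

# Fixed paths inside the container: the Dockerfile always copies the input
# to goodreads_export.csv, whatever it was called on the host.
INPUT_PATH = Path("goodreads_export.csv")
OUTPUT_PATH = Path("libib_export.csv")

# Hypothetical column mapping (input name -> output name), for illustration only.
COLUMN_MAP = {"Title": "title", "Author": "authors", "ISBN13": "ean_isbn13"}


def convert(input_path: Path = INPUT_PATH, output_path: Path = OUTPUT_PATH) -> int:
    """Copy the mapped columns from the input CSV to the output CSV.

    Returns the number of data rows written.
    """
    count = 0
    with input_path.open(newline="", encoding="utf-8") as src, \
            output_path.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(COLUMN_MAP.values()))
        writer.writeheader()
        for row in reader:
            # Columns missing from the input are written as empty strings.
            writer.writerow({new: row.get(old, "") for old, new in COLUMN_MAP.items()})
            count += 1
    return count
```

Inside the container, calling convert() with no arguments would read and write relative to the working directory, so the code never needs to know what the export file was called on the host.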

There are some subtle differences between COPY and ADD that @ryanwhocodes has already written about, so I'll leave his post here.

Update September 2019: It appears that this post is no longer available on dev.to so I have replaced the embedded post with an archive.org link.

RUN

RUN issues an instruction that is executed and committed as part of the image. If I were dockerizing a Python project that needed to install external packages, I could use RUN to pip install those dependencies. However, converter.py is a very simple process that doesn't need external packages, so I don't need to run anything as part of my build process.

CMD

Only one CMD instruction takes effect per Dockerfile. If the Dockerfile contains multiple CMDs, only the last one is used.

CMD is the command the image runs by default when you start an instance of it as a container. It is not executed as part of the build process for an image. CMD is different from RUN in this way.

Building a docker image

Now we have everything necessary to build a Docker image for our Python code from the Dockerfile.

As stated above, a Docker image is an inert snapshot of an environment that is ready to execute a command or program, but has not yet executed that command.

To build using the above Dockerfile, we run

docker build --build-arg=export_file=goodreads_export.csv -t goodreads-libib-converter .

--build-arg tells Docker to build the image with a file called goodreads_export.csv, overriding the default expectation of goodreads.csv.

-t goodreads-libib-converter "tags" the image as goodreads-libib-converter. This is how you give your image a human readable REPOSITORY name.

. sets the build context to the current directory, which is also where Docker looks for the Dockerfile by default.

After I do this, I can see that the image was successfully created by checking my image list.

> docker image list
REPOSITORY                 TAG       IMAGE ID       CREATED             SIZE
goodreads-libib-converter  latest    1234567890     12 seconds ago      924MB

Running a Docker container

Now that I have an image, I have a standalone environment capable of running my program, but it hasn't actually executed the core procedure specified with CMD yet. Here's how I do that:

docker run goodreads-libib-converter

I see the print debugging statements I have in my converter.py file execute, so I know how many CSV rows are being converted. When I ran the program locally, it created an output file called libib_export.csv. However, when I check the contents of my directory now, it's not there. How is that useful!?

Accessing Files Written Out

I'm no longer running the Python code in the directory I was before. I'm running it inside the Docker container. Therefore, any files that are written out will also be stored inside the Docker container. The output file doesn't do me much good in there!

I'm running the Docker container locally, so all I have to do is find the container and copy the output file from its dockerized location to the place I actually want it.

docker cp container_id:/libib_export.csv ~/outputs/libib_export.csv

This extracts the resultant CSV output from converter.py and puts it somewhere I can access it.

I can figure out the container_id (or the human readable name) with

> docker ps -a
CONTAINER ID  IMAGE                      COMMAND                  CREATED             NAMES
e00000000000  goodreads-libib-converter  "python ./converter.…"   24 seconds ago      naughty_mcclintock

Yes, naughty_mcclintock is actually the procedurally generated name for the container I've been working with locally.
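If looking up a generated name every time gets tedious, one option is to name the container yourself and script the run-then-copy steps. The following is a minimal sketch, assuming Docker is installed and on the PATH. The helper names extraction_commands and run_and_extract are hypothetical, as is any container name you pass in. It relies on docker run --name, docker cp, and docker rm.

```python
import os
import subprocess
from typing import List


def extraction_commands(image: str, container_name: str,
                        container_path: str, host_path: str) -> List[List[str]]:
    """Build the docker commands that run an image and copy one file out."""
    # subprocess does not go through a shell, so expand "~" ourselves.
    host_path = os.path.expanduser(host_path)
    return [
        # --name gives the container a predictable name instead of a generated one.
        ["docker", "run", "--name", container_name, image],
        # docker cp also works on stopped containers, so it can run after the process exits.
        ["docker", "cp", f"{container_name}:{container_path}", host_path],
        # Remove the stopped container once the file has been copied out.
        ["docker", "rm", container_name],
    ]


def run_and_extract(image: str, container_name: str,
                    container_path: str, host_path: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in extraction_commands(image, container_name, container_path, host_path):
        subprocess.run(command, check=True)
```

For example, run_and_extract("goodreads-libib-converter", "converter-run", "/libib_export.csv", "~/outputs/libib_export.csv") would run the image under the made-up name converter-run and copy the output file to the same place as the docker cp command above.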

Copying a file from a container to my desired location is fine for a local environment, but has limited uses if I ever want to take this project to production. There are other, better options for dealing with output files from Docker containers, but we'll get into that ✨ in another installment in this series ✨

Committing a docker image

After we've run the container to confirm that it works, we probably want to create a new image based on the changes it made when it executed. We're preparing the image that we want to push up into an external Docker registry, like Dockerhub.

When committing a Docker image, we need to specify the registry (if it's something other than dockerhub), the author name, the repository name, and the tag name.

docker commit -m "Working Python 3 image" naughty_mcclintock thejessleigh/goodreads-libib-converter:python3

My docker commit was successful, so I see a sha256 hash output in my terminal. Creating a commit message is, of course, optional. But I like to do it to keep organized.

A note on Docker image tags:
When you pull a Docker image and you don't specify a tag, it will use the default tag (latest). Tags are the way you can keep track of changes in your project without overwriting previous versions. For example, if you (for some reason) are still using Python 2, you can access the Python 2 image by running docker pull thejessleigh/goodreads-libib-converter:python2. Right now the :python3 and latest tags on my Docker repo are the same, but you can pull either one.

Pushing a docker image to Dockerhub

Now that I have an image I want to put out into the world, I can push it up to Dockerhub.

First, I need to log into Dockerhub and create a repository. Repositories require a name, and should have a short description which details the purpose of the project, and a long description that explains dependencies, requirements, build arguments, etc. You can also make a Docker repository private.

Once I've done that, I run docker push, which sends the latest commit of the project, under the tag I've specified, up to the external registry. If you don't specify a tag, this push will overwrite the latest tag in your repository.

docker push thejessleigh/goodreads-libib-converter:python3

If you go to my Dockerhub profile you can see the goodreads-libib-converter project, and pull both the Python 2 and Python 3 incarnations.

Next Steps

Now that I have a working Docker image, I want to put it into production so that anyone can convert their Goodreads library CSV into a Libib library CSV. I'm going to go about this using AWS, which requires a bit of setup.

The next installment in this series will go over setting up an AWS IAM account, setting up awscli and configuring your local profiles, and creating an S3 bucket that your IAM account can access.

EDIT: Never did get around to that next post in the series. I should do that someday.

Dockerizing a Simple Python Process