Docker: What I Learned From Working Out What's in It for Me
I recently spent some time researching and using Docker, specifically with a view to how it could help me develop a scalable SOA. I gave a couple of presentations on some of the basics I picked up.
Following the order of the talk, I concluded that Docker has some really interesting use cases:
For developers: Docker is great for pulling in application dependencies. For example, starting up an ElasticSearch server is as simple as two commands at the shell.
For devops: Docker is great for getting developers to package up apps with all their dependencies into a declaratively constructed, directly deployable, entirely repeatable artifact.
For architects: Docker is entirely in line with the tenets of Twelve Factor Apps, and can easily be used in various infrastructure scenarios.
I didn’t have time in the talk to cover some of the interesting details I discovered along the way, so I’ll cover them here instead:
- Structure Dockerfiles hierarchically
- Use Versions in Tags
- Check Dockerfiles into GitHub
- Running in the foreground
- Inter-container communication and web services APIs
Structure Dockerfiles hierarchically
It seems like premature optimization, but I found distinct develop-build-test cycle time improvements when I paid attention to the image tree Docker maintains.
My earlier attempts to iteratively build two Docker images, each from its own Dockerfile, were unnecessarily slow. Each Dockerfile was large and monolithic, and the commands within it had no real order. The two images had some similarities (Ubuntu-based, Oracle Java) and some notable differences (one runs ElasticSearch with Supervisor, the other packages up and runs a Clojure application from GitHub sources). Any change to the shared parts of one had to be replicated in the other, and a change early in either Dockerfile meant rebuilding the entire image, rather than just the later “states” Docker manages.
The approach I moved to, and what I recommend, is nothing novel. Each Dockerfile instruction is run in a container whose resulting state is committed as a new intermediate image, and on later builds Docker reuses those cached images only until it reaches an instruction that has changed; everything from that point onwards is rebuilt. The best way to minimize build-time churn is therefore to keep the base images stable. To wit: develop a hierarchy of images whose base layers are the most stable, and where changes occur as close to the leaf nodes of the tree as possible.
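To make the caching argument concrete, here is a toy model in Python (a simplification, not Docker's actual implementation): each instruction's cache key is a hash of its parent layer's key plus the instruction text, so a change anywhere invalidates that layer and every layer after it. The instructions in STEPS are hypothetical, and real Docker also takes the contents of files added with ADD into account, which this sketch ignores.

```python
import hashlib
from typing import List


def layer_keys(instructions: List[str], parent: str = "ubuntu:saucy") -> List[str]:
    """Toy cache key per instruction: hash of the parent key plus the instruction,
    so each key depends on every instruction above it."""
    keys = []
    for instruction in instructions:
        parent = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
        keys.append(parent)
    return keys


def layers_to_rebuild(previous: List[str], current: List[str]) -> int:
    """Count the layers of `current` that miss a cache populated by `previous`."""
    cached = set(layer_keys(previous))
    return sum(1 for key in layer_keys(current) if key not in cached)


# Hypothetical Dockerfile instructions, ordered from most to least stable.
STEPS = [
    "RUN apt-get update && apt-get -y upgrade",
    "RUN apt-get install -y curl git",
    "RUN apt-get install -y oracle-java8-installer",
    "ADD . /app",
    'CMD ["java", "-jar", "/app/app.jar"]',
]
```

In this model, editing the first instruction makes all five layers miss the cache, while editing only the final CMD rebuilds a single layer. That is the case for keeping frequently edited instructions at the bottom of a Dockerfile, and frequently edited images at the leaves of the tree.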
For illustration, here is the hierarchy of some images I built according to that scheme:
The base image is simply an updated ubuntu:saucy image. The common image adds common command-line tools (curl etc.) to base. The java image is based on common with Oracle Java 8 installed; it has a number of intermediate images of its own, each corresponding to a step in the trickery needed to get Java installed “unattended”. The clojure image adds basic Clojure development tools, and search-clojure-archive builds on that to retrieve, build and run the web app.
Your scheme will be different. I view Java as a fairly base-level dependency. Yours might be Ruby, or Nginx. The point is to push the rapidly changing configurations to the edges of the hierarchy, and get considerably quicker build and test times.
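The same reasoning applies between images. As a sketch, the hierarchy described above can be modelled as a map from each image to its parent, from which you can compute which images a change forces you to rebuild. The map mirrors the images named in this post, minus the intermediate Java images; the helper functions are hypothetical and not part of any Docker tooling.

```python
from typing import Dict, List, Optional

# Parent of each image in the hierarchy described above; the intermediate
# images used while installing Java are left out for brevity.
PARENTS: Dict[str, Optional[str]] = {
    "base": None,  # an updated ubuntu:saucy
    "common": "base",
    "java": "common",
    "clojure": "java",
    "search-clojure-archive": "clojure",
}


def lineage(image: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the image followed by its ancestors, up to the root."""
    chain = [image]
    parent = parents[image]
    while parent is not None:
        chain.append(parent)
        parent = parents[parent]
    return chain


def needs_rebuild(changed: str, parents: Dict[str, Optional[str]] = PARENTS) -> List[str]:
    """Images that must be rebuilt when `changed` changes, parents before children."""
    affected = [image for image in parents if changed in lineage(image, parents)]
    return sorted(affected, key=lambda image: len(lineage(image, parents)))
```

Here needs_rebuild("search-clojure-archive") returns just that one image, whereas needs_rebuild("java") returns java, clojure and search-clojure-archive: the closer to the root a change lands, the more of the tree it drags along.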
Use Versions in Tags
When building Dockerfiles into images, make consistent use of docker build -t <tag>. The documentation isn’t superbly lucid, but you are encouraged to structure the “tag” used here as <username>/<image>:<version>. The version identifier can be anything, but latest has some semantic significance: it is the tag Docker applies when you build without specifying a version, and the one it pulls or runs when you name an image without one.
I picked a semver scheme. Whatever you choose, be consistent, and keep the Dockerfile’s inline documentation in step with the versions of the images you publish.
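A small helper can keep tags honest. This Python sketch checks that a tag follows <username>/<image>:<version> with a MAJOR.MINOR.PATCH version or latest. The regular expression is an assumption, deliberately stricter than Docker's own naming rules, and the username "example" is hypothetical.

```python
import re
from typing import NamedTuple

# Assumed, simplified pattern: lowercase username and image name, then either
# a MAJOR.MINOR.PATCH (semver-style) version or the special "latest" tag.
TAG_PATTERN = re.compile(
    r"^(?P<user>[a-z0-9]+)"
    r"/(?P<image>[a-z0-9]+(?:[._-][a-z0-9]+)*)"
    r":(?P<version>\d+\.\d+\.\d+|latest)$"
)


class ImageTag(NamedTuple):
    user: str
    image: str
    version: str


def parse_tag(tag: str) -> ImageTag:
    """Split a <username>/<image>:<version> tag, rejecting anything else."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a <username>/<image>:<version> tag: {tag!r}")
    return ImageTag(match["user"], match["image"], match["version"])
```

A tag that passes, such as example/search-clojure-archive:1.2.0, can then be handed to docker build -t; a bare example/java is rejected rather than silently becoming latest.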
Check Dockerfiles into GitHub
Because why not?
Whilst looking for ways to solve a problem or accomplish a goal, I found other people’s Dockerfiles very useful. My issues were seldom unique; the problems had generally all been solved by others already.
Quinten Krijger was kind enough to put a bunch of Dockerfiles in his repo; they are worth a look.
Running in the foreground
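When you package your app into a container that serves as a single-purpose, self-contained application (as with pretty much any other Dockerfile), you quickly find that the app needs to run as a foreground process, most likely as the container’s default command. A container stops as soon as its main process exits, so if the app forks itself into the background, the container exits straight away.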
If the app container you’re creating is based on your own code, this is entirely within your control.
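For your own code, a minimal Python sketch of a foreground-friendly app might look like the following. It never forks, writes its log line to stdout (where docker logs can see it), and handles SIGTERM, which docker stop sends first; if the app runs as PID 1 inside the container, the kernel does not apply SIGTERM's default action, so an explicit handler is what lets it stop promptly. The port (8080), the "ok" response and a Dockerfile line such as CMD ["python", "app.py"] are all hypothetical.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int) -> HTTPServer:
    # Bind to all interfaces so the port can be published from the container.
    return HTTPServer(("0.0.0.0", port), Handler)


def main(port: int = 8080) -> None:
    server = make_server(port)

    def on_sigterm(signum, frame) -> None:
        # shutdown() blocks until serve_forever() returns, and the handler runs
        # on the thread executing serve_forever(), so call it from another thread.
        threading.Thread(target=server.shutdown).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    print(f"listening on port {port}", flush=True)
    server.serve_forever()  # blocks: the process stays in the foreground
    server.server_close()


if __name__ == "__main__":
    main()
```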
If the app container is based on a third-party daemonizing service, or if your container must have additional services running, using something like Supervisor might help. There are plenty of examples of configuring Supervisor, including in Docker’s own documentation.
Before adding too many services into a single image, ask yourself whether this is really necessary – could these dependencies be fulfilled by other independent Docker containers, with communication over the network?
Inter-container communication and web services APIs
If a good Docker container is single-purpose, how can you build a system of containers? Docker’s docker run --name <name> and docker run --link <name>:<alias> options are designed to help you with just that.
Run a service container with --name foo, and its exposed ports become available to be mapped into client containers. Run a client container with --link foo:bar, and the “foo” container’s ports are exposed via a variety of environment variables within the client, using “bar”, upper-cased to BAR, as the variable-name prefix.
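The naming scheme is mechanical enough to capture in a few lines. This Python sketch, assuming the scheme from Docker's container-linking documentation (<ALIAS>_PORT_<port>_<PROTO> plus _ADDR, _PORT and _PROTO variants), derives the variable names a linked client would see for one exposed port; the helper itself is hypothetical.

```python
from typing import Dict


def link_env_names(alias: str, port: int, proto: str = "tcp") -> Dict[str, str]:
    """Names of the variables a client started with --link <name>:<alias>
    sees for one exposed port of the linked container."""
    # The alias is upper-cased (and hyphens become underscores) to form the prefix.
    prefix = f"{alias.upper().replace('-', '_')}_PORT_{port}_{proto.upper()}"
    return {
        "url": prefix,               # value looks like tcp://<addr>:<port>
        "addr": f"{prefix}_ADDR",    # IP address of the linked container
        "port": f"{prefix}_PORT",    # the exposed port number
        "proto": f"{prefix}_PROTO",  # tcp or udp
    }
```

For example, link_env_names("bar", 9200)["addr"] gives BAR_PORT_9200_TCP_ADDR.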
For example, suppose a client application depends on an ElasticSearch service. An ES service would typically expose its HTTP API on port 9200. The application within the client container can be written to expect the ElasticSearch dependency to be fulfilled by injected environment variables corresponding to the alias you decide to always run it with (“elasticsearch”), specifically $ELASTICSEARCH_PORT_9200_TCP_ADDR and $ELASTICSEARCH_PORT_9200_TCP_PORT.
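On the client side, the application only has to read those two variables. Here is a minimal Python sketch; the function name and error message are hypothetical, and it accepts an explicit mapping so it can be exercised without a real link.

```python
import os
from typing import Mapping, Optional


def elasticsearch_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the ElasticSearch base URL from the variables injected by
    --link <name>:elasticsearch."""
    env = os.environ if env is None else env
    try:
        host = env["ELASTICSEARCH_PORT_9200_TCP_ADDR"]
        port = env["ELASTICSEARCH_PORT_9200_TCP_PORT"]
    except KeyError as missing:
        raise RuntimeError(
            f"{missing.args[0]} is not set; was the container run with "
            "--link <name>:elasticsearch?"
        ) from None
    return f"http://{host}:{port}"
```

Failing fast with a clear message when the variables are absent makes a forgotten --link obvious at start-up rather than at the first query.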
By running the ElasticSearch container with --name es, and the client with --link es:elasticsearch, the service is wired into the client via the predetermined env vars, whose values are only set at run time. Very neat, and full documentation for linking containers is on Docker’s site.