Docker has transformed software development with its simple model of containerization that lets you rapidly package workloads into reproducible units. While Docker’s easy to get to grips with, there’s more nuance to its usage than is always apparent. This is especially true when you’re looking to optimize your Docker usage to increase efficiency and performance.
Here are seven of the most common Docker anti-patterns you should look for and avoid. Although your containers and images might fulfill your immediate needs, the presence of any of these practices suggests you’re deviating from containerization principles in a way which could be harmful further down the line.
1. Applying Updates Inside Containers
Arguably the most common Docker anti-pattern is trying to update containers using techniques carried over from traditional virtual machines. Container filesystems are ephemeral, so all changes are lost when the container stops. Their state should be reproducible from the Dockerfile used to build the image.
This means you shouldn’t run an apt upgrade inside your containers: they’d then differ from the image they were built from. Containers are intended to be freely interchangeable; separating your data from your code and dependencies lets you replace container instances at any time.
Patches should be applied by periodically rebuilding your image, stopping existing containers, and starting new ones based on the revised image. Community toolchain projects are available to simplify this process and let you know of available upstream updates.
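In practice, the rebuild-and-replace cycle is a short sequence of CLI commands. A minimal sketch, assuming a hypothetical image and container both named my-app:

```shell
# Rebuild the image; --pull fetches the latest base image so
# upstream patches are baked into the new build
docker build --pull -t my-app:latest .

# Replace the running container with one based on the fresh image
docker stop my-app
docker rm my-app
docker run -d --name my-app my-app:latest
```

Community projects such as Watchtower can automate the replacement step by watching your registry for updated images.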
2. Running Multiple Services Inside One Container
Containers should be independent and focused on one particular function. Although you may previously have run your web and database servers on a single physical machine, a fully decoupled approach would see the two components separated into individual containers.
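As a sketch, a Docker Compose file keeps the two concerns in separate containers (service names, images, and the password are illustrative):

```yaml
services:
  web:
    image: my-app:latest
    ports:
      - "8080:80"
    depends_on:
      - db

  db:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: example
    volumes:
      - db-data:/var/lib/mysql

volumes:
  db-data:
```

Each service now builds, logs, scales, and updates independently, while Compose wires them together on a shared network.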
This methodology prevents individual container images from becoming too large. You can inspect the logs from each service using built-in Docker commands and update them independently of each other.
Multiple containers give you enhanced scalability as you can readily increase the replica count of individual parts of your stack. Database running slow? Use your container orchestrator to add a few more MySQL container instances, without allocating any extra resources to the components that are already running well.
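With an orchestrator, that scaling step becomes a one-liner. Assuming a Compose service named mysql, for example:

```shell
# Run three replicas of the database service, leaving other services untouched
docker compose up -d --scale mysql=3
```

Note that adding replicas of a stateful service like MySQL only creates more containers; replication between them still has to be configured at the database level.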
3. Image Builds With Side Effects
Docker image builds should be idempotent operations that always produce the same result. Running docker build shouldn’t impact your broader environment in the slightest, as its sole objective is to produce a container image.
Nonetheless, many teams create Dockerfiles that manipulate external resources. A Dockerfile can morph into an all-encompassing CI script that publishes releases, creates Git commits, and writes to external APIs or databases.
These actions don’t belong in a Dockerfile. Creating a Docker image is an independent operation which should be its own CI pipeline stage. Release preparation then occurs as a separate stage, so you can always run docker build without unexpectedly publishing a new tag.
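One way to keep these concerns separate is to model the image build and the release as distinct pipeline stages. A hypothetical GitLab CI sketch (registry URL and job names are illustrative):

```yaml
stages:
  - build
  - release

build-image:
  stage: build
  script:
    # Building the image has no side effects beyond producing the artifact
    - docker build -t registry.example.com/my-app:$CI_COMMIT_SHA .
    - docker push registry.example.com/my-app:$CI_COMMIT_SHA

publish-release:
  stage: release
  # Publishing only happens on explicit release tags,
  # never as a hidden side effect of docker build
  rules:
    - if: $CI_COMMIT_TAG
  script:
    - docker pull registry.example.com/my-app:$CI_COMMIT_SHA
    - docker tag registry.example.com/my-app:$CI_COMMIT_SHA registry.example.com/my-app:$CI_COMMIT_TAG
    - docker push registry.example.com/my-app:$CI_COMMIT_TAG
```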
4. Over-complicating Your Dockerfile
In a similar vein, it is possible for Dockerfiles to do too much. Limiting your Dockerfile to the bare minimum set of instructions you need minimizes your image size and enhances readability and maintainability.
Problems can often occur when using multi-stage Docker builds. This feature makes it easy to develop complex build sequences referencing multiple base images. Many independent stages can be an indicator that you’re mixing concerns and coupling processes too tightly together.
Look for logical sections in your Dockerfile that serve specific purposes. Try to break these up into individual Dockerfiles, creating self-contained utility images that can run independently to fulfill parts of your broader pipeline.
You could create a “builder” image with the dependencies needed to compile your source. Use this image as one stage in your CI pipeline, then feed its output as artifacts into the next stage. You might now copy the compiled binaries into a final Docker image which you use in production.
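Sketched as two Dockerfiles, assuming a Go application as an example, the builder image carries the full toolchain while the production image stays minimal:

```dockerfile
# Dockerfile.build - self-contained "builder" utility image
FROM golang:1.22
WORKDIR /src
COPY . .
RUN go build -o /out/app ./cmd/app
```

Your CI pipeline runs this image as one stage and extracts the compiled /out/app binary as an artifact; the next stage copies it into a minimal runtime image:

```dockerfile
# Dockerfile - final production image; only the compiled binary is copied in
FROM alpine:3.19
COPY app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```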
5. Hardcoded Configuration
Container images which include credentials, secrets, or hardcoded configuration keys can cause serious headaches as well as security risks. Baking settings into your image compromises Docker’s fundamental attraction: the ability to deploy the same image into multiple environments.
Use environment variables and declared Docker secrets to inject configuration at the point you start a container. This maintains images as reusable assets and limits sensitive data access to runtime only.
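For example, configuration can be supplied when the container starts (variable names and values here are illustrative):

```shell
# Inject configuration through environment variables at runtime
docker run -d \
  -e DATABASE_HOST=db.internal.example.com \
  -e LOG_LEVEL=info \
  my-app:latest

# Docker secrets (for Swarm services) mount sensitive values as files
# under /run/secrets/ instead of baking them into the image
echo "s3cret" | docker secret create db_password -
docker service create --secret db_password my-app:latest
```

The same image can now run in any environment; only the injected values change.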
This rule still applies to images that are intended for internal use only. Hardcoding secrets implies they’re also committed to your version control software, potentially rendering them vulnerable to theft in the event of a server breach.
6. Using Separate Development and Deployment Images
You should only build one container image for each change in your application. Maintaining multiple similar images for individual environments suggests you’re not benefiting from Docker’s “runs anywhere” mentality.
It’s best to promote a single image across your environments, from staging through to production. This gives you confidence you’re running the exact same logical environment in each of your deployments, so what worked in staging will still run in production.
Having a dedicated “production” image suggests you may be suffering from some of the other anti-patterns indicated above. You’ve probably got a complex build sequence that could be broken up, or production-specific credentials hardcoded into your image. Images should be separated by development lifecycle stage, not by deployment environment. Handle differences between environments by injecting configuration using variables.
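Promotion then becomes a matter of retagging the same image rather than rebuilding it. A sketch, with an illustrative registry and version scheme:

```shell
# The image that passed staging tests is retagged for production,
# guaranteeing the deployed binaries are byte-for-byte identical
docker pull registry.example.com/my-app:1.2.0-staging
docker tag registry.example.com/my-app:1.2.0-staging registry.example.com/my-app:1.2.0
docker push registry.example.com/my-app:1.2.0
```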
7. Storing Data Inside Containers
The ephemeral nature of container filesystems means you shouldn’t be writing data within them. Persistent data created by your application’s users, such as uploads and databases, should be stored in Docker volumes or it will be lost when your containers restart.
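As a minimal sketch, a named volume keeps user uploads outside the container’s writable layer (names and paths are illustrative):

```shell
# The volume persists independently of any container's lifecycle
docker volume create app-uploads
docker run -d --name my-app -v app-uploads:/var/www/uploads my-app:latest
```

Removing or replacing the container leaves the app-uploads volume, and everything written to it, intact.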
Avoid writing other kinds of useful data to the filesystem wherever possible. Stream logs to your container’s output stream, where they can be consumed via the docker logs command, instead of dumping them to a directory which would be lost after a container failure.
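With logs going to the container’s standard output and error streams, they remain available through the Docker CLI (container name is illustrative):

```shell
# Follow the container's output stream; --tail limits the initial backlog
docker logs -f --tail 100 my-app
```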
Container filesystem writes can also incur a significant performance penalty when modifying existing files. Docker’s use of the “copy-on-write” layering strategy means files that exist in lower filesystem layers are read from that layer, rather than your image’s final layer. If a change is made to the file, Docker must first copy it into the uppermost layer, then apply the change. This process could take several seconds for larger files.
Watching for these anti-patterns will make your Docker images more reusable and easier to maintain. They’re the difference between merely using Docker containers and adopting a containerized workflow. You can easily write a functioning Dockerfile but a poorly planned one will restrict your ability to capitalize on all the potential benefits.
Containers should be ephemeral, self-contained units of functionality created from a reproducible build process. They map to stages in your development process, not deployment environments, but don’t directly facilitate that process themselves. Images should be the artifacts produced by a CI pipeline, not the mechanism defining that pipeline.
Adopting containers requires a mindset shift. It’s best to start from the fundamentals, be aware of the overarching objectives, then look at how you can incorporate them into your process. Using containers without due consideration of these aspects can end up creating a headache in the long-term, far from the increased flexibility and reliability touted by proponents of the approach.