Docker images are created by building Dockerfiles. The build process executes the instructions in the Dockerfile to create the filesystem layers that form the final image.

What if you already have an image? Can you retrieve the Dockerfile it was built from? In this article, we’ll look at two methods that can achieve this.

The Objective

When you’re building your own Docker images, you should store your Dockerfiles as version-controlled files in your source repository. This practice ensures you can always retrieve the instructions used to assemble your images.

Sometimes you won’t have access to a Dockerfile though. Perhaps you’re using an image that’s in a public registry but has an inaccessible source repository. Or you could be working with image snapshots which don’t directly correspond to a versioned Dockerfile. In these cases, you need a technique that can create a Dockerfile from an image on your machine.

Docker doesn’t offer any built-in functionality for achieving this. Built images lack an association with the Dockerfile they were created from. However, you can reverse engineer the build process to produce a good approximation of an image’s Dockerfile on-demand.

The Docker History Command

The docker history command reveals the layer history of an image. It shows the command used to build each successive filesystem layer, making it a good starting point when reproducing a Dockerfile.

Here’s a simple Dockerfile for a Node.js application:

FROM node:16
COPY app.js .
RUN ./app.js --init
CMD ["app.js"]

Build the image using docker build:

$ docker build -t node-app:latest .

Now inspect the image’s layer history with docker history:

$ docker history node-app:latest
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
c06fc21a8eed   8 seconds ago    /bin/sh -c #(nop)  CMD ["app.js"]               0B        
74d58e07103b   8 seconds ago    /bin/sh -c ./app.js --init                      0B        
22ea63ef9389   19 seconds ago   /bin/sh -c #(nop) COPY file:0c0828d0765af4dd...   50B       
424bc28f998d   4 days ago       /bin/sh -c #(nop)  CMD ["node"]                 0B        
<missing>      4 days ago       /bin/sh -c #(nop)  ENTRYPOINT ["docker-entry...   0B        
...

The history includes the complete list of layers in the image, including those inherited from the node:16 base image. Layers are ordered so the most recent one is first. You can spot where the layers created by the sample Dockerfile begin based on the creation time: the top three entries show Docker’s internal representation of the COPY, RUN, and CMD instructions used in the Dockerfile.

The docker history output is more useful when it’s limited to each layer’s command. You can also disable truncation to view the full command associated with each layer:

$ docker history node-app:latest --format "{{.CreatedBy}}" --no-trunc
/bin/sh -c #(nop)  CMD ["app.js"]
/bin/sh -c ./app.js --init
/bin/sh -c #(nop) COPY file:0c0828d0765af4dd87b893f355e5dff77d6932d452f5681dfb98fd9cf05e8eb1 in . 
/bin/sh -c #(nop)  CMD ["node"]
/bin/sh -c #(nop)  ENTRYPOINT ["docker-entrypoint.sh"]
...

From this list of commands, you can gain an overview of the steps taken to assemble the image. For simple images like this one, this can be sufficient information to accurately reproduce a Dockerfile.
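The cleanup can also be scripted. The following is a minimal sketch, assuming the /bin/sh -c format shown in the listing above; history_to_dockerfile is a hypothetical helper name, not a Docker command. It reverses the listing so the oldest layer comes first, drops the #(nop) marker from metadata instructions, and turns the remaining /bin/sh -c lines into RUN instructions.

```shell
#!/bin/sh
# history_to_dockerfile: hypothetical helper, not part of Docker.
# Reads the output of
#   docker history IMAGE --format "{{.CreatedBy}}" --no-trunc
# on stdin (newest layer first) and prints approximate Dockerfile
# instructions, oldest layer first.
history_to_dockerfile() {
    # Reverse the line order, because docker history lists the newest layer first.
    awk '{ lines[NR] = $0 } END { for (i = NR; i >= 1; i--) print lines[i] }' |
    sed -E \
        -e 's|^/bin/sh -c #\(nop\) +||' \
        -e 's|^/bin/sh -c |RUN |' \
        -e 's/[[:space:]]+$//'
    # 1st rule: metadata instructions keep only the instruction itself.
    # 2nd rule: anything else run through /bin/sh -c was a RUN instruction.
    # 3rd rule: trim trailing whitespace left over from the history output.
}

# Example usage (requires Docker, so it is left commented out):
# docker history node-app:latest --format "{{.CreatedBy}}" --no-trunc | history_to_dockerfile
```

Lines that don’t start with /bin/sh -c are passed through unchanged apart from the whitespace trim, so the result still needs a manual review before you treat it as a Dockerfile.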

Automating Layer Extraction with Whaler and Dfimage

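Copying commands out of docker history is a laborious process. You also need to strip out the /bin/sh -c prefix at the start of each line. Metadata instructions such as COPY and CMD are additionally prefixed with #(nop), which marks them as a no-op shell comment that Docker recorded without running a command.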

Fortunately, there are community tools available that can automate Dockerfile creation from an image’s layer history. For the purposes of this article, we’ll focus on Whaler, which is packaged as the alpine/dfimage (Dockerfile-from-Image) Docker image, published under the alpine organization on Docker Hub.

Running the dfimage image and supplying a Docker tag will output a Dockerfile that can be used to reproduce the referenced image. You must bind your host’s Docker socket into the dfimage container so it can access your image list and pull the tag if needed.

$ docker run --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    alpine/dfimage node-app:latest

Analyzing node-app:latest
Docker Version: 20.10.13
GraphDriver: overlay2
Environment Variables
|PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|NODE_VERSION=16.14.2
|YARN_VERSION=1.22.18

Image user
|User is root

Dockerfile:
...
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["node"]
COPY file:bcbc3d5784a8f1017653685866d30e230cae61d0da13dae32525b784383ac75f in .
    app.js

RUN ./app.js --init
CMD ["app.js"]

The created Dockerfile contains everything you need to go from scratch (an empty filesystem) to the final layer of the specified image. It includes all the layers that come from the base image. You can see these in the first ENTRYPOINT and CMD instructions in the sample output above (the other base image layers have been omitted for brevity’s sake).

With the exception of COPY, the instructions specific to our image match what was written in the original Dockerfile. You can now copy these instructions into a new Dockerfile, either using the whole dfimage output or by taking just the part that pertains to the final image. The latter option is only a possibility if you know the original base image’s identity so you can add a FROM instruction to the top of the file.
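If you want the instructions in a file without copying them by hand, a small filter can separate them from the report that precedes them. This is a rough sketch that assumes the output layout matches the sample above, with a line reading "Dockerfile:" before the instructions; extract_dockerfile and the Dockerfile.recovered filename are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# extract_dockerfile: hypothetical helper, not part of dfimage or Whaler.
# Reads the tool's report on stdin and prints only the lines that follow
# the "Dockerfile:" marker line, as seen in the sample output.
extract_dockerfile() {
    # The print rule runs before the marker rule, so the marker line itself
    # is skipped and every later line is printed.
    awk 'found { print } /^Dockerfile:[ \t]*$/ { found = 1 }'
}

# Example usage (requires Docker, so it is left commented out):
# docker run --rm \
#     -v /var/run/docker.sock:/var/run/docker.sock \
#     alpine/dfimage node-app:latest | extract_dockerfile > Dockerfile.recovered
```

The recovered file still starts from the base image’s own instructions. If you know the base image, you can delete those lines and add a FROM instruction at the top instead.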

The Limitations

In many cases dfimage will be able to assemble a usable Dockerfile. Nonetheless it’s not perfect and an exact match is not guaranteed. The extent of the discrepancies compared to the image’s original Dockerfile will vary depending on the instructions that were used.

Not all instructions are captured in the layer history. Those that aren’t recorded are lost and there’s no way you can determine what they were. The FROM instruction that names the base image, for example, has no entry of its own. The best accuracy is obtained with command and metadata instructions like RUN, ENV, WORKDIR, ENTRYPOINT, and CMD. RUN instructions are recorded even when their command made no filesystem changes, as the 0B entry for ./app.js --init in the docker history output above shows.

COPY and ADD instructions present unique challenges. The history doesn’t contain the host file path which was copied into the container. You can see that a copy occurred, but in place of the source path the history shows a hash reference (file:...) to the content that was copied into the image from the build context.

As you do get the final destination, this can be enough to help you work out what’s been copied and why. You can then use this information to interpolate a new source path into the Dockerfile which you can use for future builds. In other cases, inspecting the file inside the image might help reveal the copy’s purpose so you can determine a meaningful filename for the host path.
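One way to keep track of the paths you still need to restore is to swap each hash reference for an obvious placeholder. The sketch below is an illustration under stated assumptions: mark_copy_sources and the SOURCE_PATH placeholder are hypothetical, and only the file:<hash> in <destination> form shown earlier in this article is handled.

```shell
#!/bin/sh
# mark_copy_sources: hypothetical helper for reviewing a recovered Dockerfile.
# Reads Dockerfile lines on stdin. COPY and ADD lines of the form
#   COPY file:<hash> in <destination>
# are rewritten to use the placeholder SOURCE_PATH, with a comment line
# above them that preserves the original hash reference.
mark_copy_sources() {
    awk '
    /^(COPY|ADD) file:[0-9a-f]+ in / {
        instr = $1            # COPY or ADD
        ref = $2              # file:<hash>, kept for the comment
        dest = $0
        sub(/^[A-Z]+ file:[0-9a-f]+ in /, "", dest)   # keep only the destination
        sub(/[ \t]+$/, "", dest)                      # trim trailing whitespace
        print "# TODO: replace SOURCE_PATH; history only recorded " ref
        print instr " SOURCE_PATH " dest
        next
    }
    { print }                 # every other line is passed through unchanged
    '
}
```

Each TODO comment then marks a spot where you need to work out the real host path, for instance by inspecting the file at the destination inside the image.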

Summary

Docker images don’t include a direct way to work backwards to the Dockerfile they were built from. It’s still possible to piece together the build process though. For simple images with few instructions, you can often work out the instructions manually by looking at the CREATED BY column in the docker history command’s output.

Larger images with more complex build processes are best analyzed by tools like dfimage. This does the hard work of parsing the verbose docker history output for you, producing a new Dockerfile that’s a best effort match for the likely original.

Reverse engineering efforts aren’t perfect and some Dockerfile instructions are lost or mangled during the build process. Consequently you shouldn’t assume Dockerfiles created in this way are an accurate representation of the original. You might have to make some manual adjustments to ADD and COPY instructions too, resurrecting host file paths that were converted to build context references.

James Walker
James Walker is a contributor to How-To Geek DevOps. He is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs. He has experience managing complete end-to-end web development workflows, using technologies including Linux, GitLab, Docker, and Kubernetes.