WALT relies on the well-known docker technology to package its OS images. For some users, this underlying technology may go unnoticed (e.g., when just using high-level tools such as walt image shell). However, if you want to build a new image from scratch, or make the image build process more reproducible, working with a Dockerfile is the solution. This post proposes a few tips and best practices for writing Dockerfiles.

There is a large set of documentation available at https://docs.docker.com, including a section about Dockerfiles and one about the build system. This post will link to specific subsections and provide complementary information.

Note that most of this blog post is not specific to WALT: it applies to any kind of docker image development.

The docker image building process

A simple example

Let’s start with a simple example. We will generate a Debian OS image with a small set of popular text editors. Here is the content of the Dockerfile:

FROM debian:trixie
RUN apt update
RUN apt install -y vim jed
RUN apt distclean

We can build the OS image and give it the name debian-with-editors by running:

$ docker build -f Dockerfile -t "debian-with-editors" .
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM debian:trixie
 ---> 53f0b37b86c4
Step 2/4 : RUN apt update
 ---> Running in ccdd22d5625f
[...]
Removing intermediate container ccdd22d5625f
 ---> d21fc06c76a2
Step 3/4 : RUN apt install -y vim jed
 ---> Running in 5672af07d4d3
[...]
Removing intermediate container 5672af07d4d3
 ---> c565951dea10
Step 4/4 : RUN apt distclean
 ---> Running in 065f17cc2f21
[...]
Removing intermediate container 065f17cc2f21
 ---> 7674c15e4e6b
Successfully built 7674c15e4e6b
Successfully tagged debian-with-editors:latest
$

Depending on the build backend, this output may be formatted differently, but the underlying behavior is the same. We see that docker build has split the work into 4 steps, corresponding to the 4 lines of the Dockerfile. Each step modifies the image incrementally.1

Impact and pitfalls of the build cache

Since we were building the image for the first time, the build cache could not help, and each of the RUN steps really had to be executed.

But if we run the same command again, it completes very fast, because each step is found in the build cache, so there is no need to run the commands again:

$ docker build -f Dockerfile -t "debian-with-editors" .
Sending build context to Docker daemon  5.632kB
Step 1/4 : FROM debian:trixie
 ---> 53f0b37b86c4
Step 2/4 : RUN apt update
 ---> Using cache
 ---> d21fc06c76a2
Step 3/4 : RUN apt install -y vim jed
 ---> Using cache
 ---> c565951dea10
Step 4/4 : RUN apt distclean
 ---> Using cache
 ---> 7674c15e4e6b
Successfully built 7674c15e4e6b
Successfully tagged debian-with-editors:latest
$

Now, let’s add nano to the list of text editors we want to install:

FROM debian:trixie
RUN apt update
RUN apt install -y vim jed nano
RUN apt distclean

And let’s restart the build:

$ docker build -f Dockerfile -t "debian-with-editors" .
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM debian:trixie
 ---> 53f0b37b86c4
Step 2/4 : RUN apt update
 ---> Using cache
 ---> d21fc06c76a2
Step 3/4 : RUN apt install -y vim jed nano
 ---> Running in ed9d95025c68
[...]
Removing intermediate container ed9d95025c68
 ---> 8fed3644a951
Step 4/4 : RUN apt distclean
 ---> Running in a3125f5611d0
[...]
Removing intermediate container a3125f5611d0
 ---> 52b5a4b4d171
Successfully built 52b5a4b4d171
Successfully tagged debian-with-editors:latest
$

You can notice that the step RUN apt update was skipped again. This is because we modified only the 3rd line of the Dockerfile, so docker build could find the result of the first two steps in the build cache and fast-forward to the 3rd step.

This 3rd step obviously had to be run again… but so did the 4th step, even though the corresponding line of the Dockerfile did not change, because the intermediate image it builds upon was changed by step 3.

Generally speaking, each time a Dockerfile step has to be re-executed, all following steps of the Dockerfile are re-executed too. So, if you develop and then maintain a complex Dockerfile, a good practice is to place the lines you often have to change toward the end of the Dockerfile, if possible.

The build cache is automatically2 invalidated in these three cases:

  • The line of the Dockerfile was changed;
  • The line imports a file or directory which was modified (ADD and COPY instructions);
  • A previous step of the Dockerfile was invalidated.
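For instance, the second case can be illustrated with this kind of Dockerfile (nginx.conf is just a hypothetical configuration file stored next to the Dockerfile):

```dockerfile
FROM debian:trixie
RUN apt update && apt install -y nginx && apt distclean
# if the local file nginx.conf is modified, the build cache is invalidated
# from this step on, even though the COPY line itself is unchanged
COPY nginx.conf /etc/nginx/nginx.conf
```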

During development, you may save build time by writing this kind of thing:

FROM [...]
[...]
RUN apt update && apt install -y <all-packages-i-need...>
RUN <two-hours-long-build-command>
[...]
RUN apt update && apt install -y <one-more-package-i-forgot>
RUN <command-using-the-added-package>

If I had modified the first list of packages, this would have invalidated all following steps, including the next one, which is very long. So, in this development phase, I preferred to add the missing package with a new RUN line, just before the command that needs it. I can still refactor the Dockerfile properly at the end of the development, and trigger one last (long) re-build.

We have seen that properly leveraging the build cache can greatly reduce build times, but it also has pitfalls. As an example, let’s recap what happened when I added nano:

FROM debian:trixie
RUN apt update                   # in cache
RUN apt install -y vim jed nano  # executed
RUN apt distclean                # executed

But… what if I added nano 6 months after I first wrote this Dockerfile? This would mean a 6-month gap between running apt update (since it was only run the 1st time) and re-running apt install! You probably know that apt update queries the package repositories and apt install tries to download and install the relevant packages… so a 6-month gap will probably cause package version mismatches, and this new run of apt install will fail!

However, it’s rather easy to fix our Dockerfile and ensure apt update is immediately followed by apt install:

FROM debian:trixie
RUN apt update && apt install -y vim jed nano
RUN apt distclean

Now, if I add yet another editor next year, I should not have any problems. Generally speaking, be cautious with commands such as apt update which give different results over time. The build cache will not be invalidated unless you really modify the line, so make sure the subsequent operations relying on this changing data are written in the same RUN line.

The layer-based backend and its impact on image storage size

This section assumes docker is configured with the overlay2 storage driver, which has been the default value for a long time. By the end of 2025, the default storage driver for new installations became the containerd image store; what is said below is still true, but storage usage increases significantly with this new driver since it stores both compressed and uncompressed versions of images.

Let’s check the size of our sample image:

$ docker image list | grep debian-with-editors
debian-with-editors  latest  f501ab2ac234  9 minutes ago  198MB
$

It is 198MB in size.

When using docker build, we actually create one “filesystem layer” for each of the Dockerfile steps, and the “image size” is the sum of the layer sizes. A “filesystem layer” is just a subdirectory somewhere in /var/lib/docker. This directory stores the files which were created or modified at this step of the Dockerfile. Files which were removed at this step are also recorded there, but in a more special way (check out this blog post if you are interested).

Knowing this, let’s try something in our Dockerfile:

FROM debian:trixie
RUN apt update && apt install -y vim jed nano && apt distclean

I moved apt distclean up, on the same RUN line.

And after a new build, we get:

$ docker image list | grep debian-with-editors
debian-with-editors  latest  34f2147494be  20 seconds ago  177MB
$

This saved 198 - 177 = 21MB!

This effect comes from the layer-based system and it can be explained quite easily. The size of each layer of the previous Dockerfile was:

FROM debian:trixie                             # 120MB -- size of debian:trixie
RUN apt update && apt install -y vim jed nano  # ~78MB
RUN apt distclean                              #  ~0MB -- mostly deleted files

The second layer3 was validated before the superfluous files were removed, so those superfluous files have been saved in the underlying layer directory. And running apt distclean after that will not modify what we stored in the previous layer! So at the end of the build, the image appears cleaned up, but the way it is stored on disk is sub-optimal.

Compared to that, the size of each layer of the new Dockerfile is:

FROM debian:trixie                                              # 120MB
RUN apt update && apt install -y vim jed nano && apt distclean  # ~57MB

This time, we properly removed the superfluous files before validating the layer.

Sometimes, the difference is even much larger. Consider for instance this Dockerfile involving the compilation of a large code base:

[...]
RUN make
RUN make install
RUN make clean

This will actually store all of the intermediate compilation artefacts in the layer storage, resulting in a much larger image than a single RUN line would produce.

In this kind of case, you may also notice that docker build seems to hang for a while before ending the RUN make step and finally running the next one. The reason is that, at that point, docker build is saving the many new files to the layer storage… So optimizing this properly can not only save disk space but also make the build faster!

Generally speaking, when a command creates many temporary files and another command removes them, it is often a good idea to chain them in a single RUN line.
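As a sketch (the repository URL and tool name are hypothetical), here is how a build toolchain can be installed, used, and removed in a single RUN line, so that neither the sources nor the compiler end up stored in the layer:

```dockerfile
FROM debian:trixie
# build a tool from source and remove the toolchain in the same layer
RUN apt update && \
    apt install -y --no-install-recommends build-essential git ca-certificates && \
    git clone https://example.com/some-tool.git /tmp/some-tool && \
    make -C /tmp/some-tool && \
    make -C /tmp/some-tool install && \
    rm -rf /tmp/some-tool && \
    apt purge -y build-essential git && \
    apt autoremove -y && \
    apt distclean
```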

Additional notes about the build system

In some cases, one may leverage the build cache by ensuring the first steps of two related images are the same.
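For instance, starting two related Dockerfiles with the very same lines lets them share the corresponding cached layers (the package names below are just examples):

```dockerfile
# common prefix, written identically in both Dockerfiles, so the
# resulting layers are built once and shared through the build cache
FROM debian:trixie
RUN apt update && apt install -y git curl && apt distclean

# the first Dockerfile then continues with e.g.:
#   RUN apt update && apt install -y vim && apt distclean
# while the second one continues with e.g.:
#   RUN apt update && apt install -y jed && apt distclean
```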

You can make the long lines in your Dockerfile more readable by using backslash line continuation. For instance:

[...]
RUN make && \
    make install && \
    make clean

WALT internally relies on podman (and the related tools buildah and skopeo) instead of docker4. These two toolboxes are mostly similar, but they store their objects in different locations. Users can therefore use docker on the WALT server without risk of interference with the WALT software.

The maximum number of layers in a Docker image is 127. This includes the layers of the base image (the one referenced by the FROM line) and the ones added by the Dockerfile. The default WALT images are often used as base images, so we try to keep this number as low as possible. However, users who run the walt image shell command very often may still reach this limit5. In this case, editing the image is no longer possible unless the user runs walt image squash to squash all layers into a single one.

Keeping disk usage under control

The docker storage is global to the whole system, and all images starting with FROM debian:trixie for instance share the same initial set of layers (that is, the layers of debian:trixie itself). For this reason, the disk space needed to store docker images may be smaller than the sum of the “image sizes”. On the other hand, images built with multi-stage Dockerfiles (we will talk about them below) may consume much more disk storage at build time than the final image size suggests.

For this reason, you should monitor disk usage and run docker system prune from time to time, to clear the layer cache before you run out of disk space. For instance, running docker system prune --filter until=1440h will clear docker objects which are older than 2 months (more precisely, 1440 hours) and no longer referenced, including layers no longer referenced by any image.

You can automate this by creating a systemd timer file and the associated service on your docker build server (e.g., your WALT server). The following will run this task every day at 3:00 in the morning, pruning docker objects older than 2 months.

# cat > /etc/systemd/system/docker-prune.timer << EOF
[Unit]
Description=Daily pruning of old docker objects

[Timer]
OnCalendar=*-*-* 3:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
#
# cat > /etc/systemd/system/docker-prune.service << EOF
[Unit]
Description=Daily pruning of old docker objects
After=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker system prune --filter until=1440h --force
EOF
#
# systemctl daemon-reload
# systemctl enable docker-prune.timer
# systemctl start docker-prune.timer

Even with this in place, keep an eye on your disk usage when you build large images repeatedly (docker system df details the space used by images, containers, and the build cache).

Multi-stage Dockerfiles

“Multi-stage Dockerfiles” are a feature we use very often for building most of the default WALT images. They allow us to clearly separate the build process from the final image. The official documentation contains many details on this subject, but when creating full system images like those in WALT, some rather specific problems may arise. So, let’s say a few words about them.

In any case, it’s obviously a good practice to exclude build tools and build artefacts from a docker image, and multi-stage Dockerfiles are very helpful in this regard.

Multi-stage Dockerfiles and CPU emulation

In this example we build a minimal docker image based on Debian trixie, with an arm64 architecture (the one of the recent Raspberry Pi boards for instance):

# builder stage
FROM debian:trixie as builder
RUN apt update && apt install -y debootstrap
RUN debootstrap --arch=arm64 --foreign --variant=minbase \
      trixie /rpi_fs
# final stage
FROM --platform=linux/arm64 scratch
COPY --from=builder /rpi_fs /
RUN /debootstrap/debootstrap --second-stage

This is basically the way we build default WALT images for Raspberry Pi boards. In the builder stage, we install a standard debian tool called debootstrap, and run it to download the target OS files and populate a directory at /rpi_fs. In the final stage, we use the special scratch image in the FROM line, so the image is empty at first; then we copy the directory /rpi_fs of the builder stage to the root of the image.

Note that when populating an OS directory with a CPU architecture different from that of the build machine, debootstrap must be called in a two-step process:

  1. Run debootstrap as usual but with the options --arch <target-arch> and --foreign added;
  2. Run /debootstrap/debootstrap --second-stage inside the target OS.

The historical way to run step 2 was to use chroot (to dynamically change where the root of the filesystem is), but this is a privileged command, not allowed in Dockerfiles. However, as shown in this example, it is easy to achieve the same with a multi-stage Dockerfile.

In our case, the CPU architecture of the builder stage is that of the build machine, x86_64, and the CPU architecture of the final stage is arm64. This implies that /debootstrap/debootstrap --second-stage will have to run many arm64 binaries. The first of them is /bin/sh, because /debootstrap/debootstrap is actually a shell script, which then runs other package management tools (such as dpkg) as child processes. For this CPU emulation to work transparently, one just has to install qemu user-mode emulation on the build machine, as we explained in a previous blog post.

Impact of multi-stage Dockerfiles on layer storage

In the case of a multi-stage Dockerfile, the “size” of the image is just the sum of the final stage layers. And when running docker system prune, the builder stage layers are cleared, because the image only references the layers of the final stage. However, if you iterate many times while developing this kind of Dockerfile, keep in mind that the build cache can fill up quite quickly, regardless of how minimal the final image is.

Making long build steps shorter

Some build steps may be really long, especially in a builder stage. For instance, if you want to generate an OpenWRT or buildroot image, you may use this kind of Dockerfile:

# builder stage
FROM debian:trixie as builder
RUN apt update && apt install -y <all-packages-needed-for-builder-stage>
WORKDIR /root
RUN git clone <repo-of-openwrt>
WORKDIR /root/openwrt
[...]   # configuration of openwrt
RUN make
[...]   # extract generated files at /openwrt-os
# final stage
FROM --platform=linux/<arch> scratch
COPY --from=builder /openwrt-os /

Here the command RUN make may take a really long time to complete, perhaps half your work day, because those projects work from source code only, so they compile everything. In their default configuration, they even start by compiling a compiler toolchain!

If you are connected remotely to the build machine using ssh, the first thing to do is to use a tool such as screen to run the docker build command, so that you can reattach your shell session later in case of disconnection.

Next, check in the relevant project documentation whether you can reduce the time needed for this command. For instance, OpenWRT and buildroot allow you to install a pre-built compiler toolchain and just reference it in the configuration, instead of compiling it.

These projects also allow parallel builds, so you could type make -j $(nproc) instead. This lets make run up to N build jobs in parallel when possible (i.e., when several build targets can be built independently), with nproc returning the number of processor threads on the build machine. In this case, you should also prefix the command with nice, i.e., nice make -j $(nproc), and possibly with ionice and chrt too, as the OpenWRT docs suggest. This will decrease the priority of the build processes regarding CPU and disk usage, compared to the other processes on the build machine. It may increase the runtime of the build by a few percent, but it will minimize slowdowns in interactive processes running simultaneously (e.g., SSH sessions).
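In a Dockerfile, such a de-prioritized parallel build could look like the following sketch (whether ionice has an effect depends on the I/O scheduler in use on the build machine):

```dockerfile
# parallel build at reduced CPU and I/O priority
RUN nice ionice -c 3 make -j "$(nproc)"
```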

Saving time when a long build step fails

Here is what a failure of a long parallel build step looks like:

$ docker build [...]
[...]  # very long trace
Step 31/54 : RUN nice make -j $(nproc)
 ---> Running in eea0f080ff50
[...]  # very long trace
make[2]: Entering directory '/workdir/openwrt/scripts/config'
make[2]: 'conf' is up to date.
make[2]: Leaving directory '/workdir/openwrt/scripts/config'
make -r toolchain/compile: build failed. Please re-run make with -j1 V=s or V=sc for a higher verbosity level to see what's going on
make: *** [/workdir/openwrt/include/toplevel.mk:233: 0] Error 1
The command '/bin/sh -c nice make -j $(nproc)' returned a non-zero code: 2
$

The initial error message which caused the failure is not visible, for two reasons:

  • In this specific case (OpenWRT build), the build should be started with V=s for a more verbose output (as suggested in the ending message)
  • Since it was a parallel build, the error immediately stopped one of the build jobs when it occurred, but the other jobs stopped later. So, even if it was printed, the initial error message would have been followed by many other traces, possibly overflowing the history buffer of your terminal or screen session.

So what we want now is to run make -j1 V=s as an additional step after the one that failed. This additional step should run fast, thanks to the artefacts already generated by the partial build.

Since docker build failed, the image was not created. However, we noted the <container-id> of the failed build step above (it said Running in eea0f080ff50), so we can save the content of this container as a temporary image and then explore it or run this additional step:

$ docker commit eea0f080ff50 failed-build
$ docker run -it --rm --entrypoint /bin/bash failed-build
root@12581779e2a0:/workdir/openwrt# make -j1 V=s
[...]
Error: [...]
root@12581779e2a0:/workdir/openwrt#

But you might have missed this log message indicating the <container-id> (for instance, if the terminal history buffer overflowed), and it would be really annoying to restart the whole build from scratch, especially with the option -j1 (single-threaded build).

For this reason, I often write this kind of thing in my Dockerfiles:

[...]
RUN nice make -j $(nproc) || make -j1 V=s || touch .error_detected
RUN [ ! -f .error_detected ]
[...]

If the parallel build fails, it will automatically run the additional build with verbose and single-threaded parameters, which properly prints the error message at the end of the output. This additional build should logically fail too, so the touch .error_detected command will create a file. Since the touch command does not fail, docker build considers this build step successful, and it only stops on the next step, because of the .error_detected file.
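The mechanics of these two RUN lines can be checked outside docker with a plain shell snippet, using false to stand in for both failing builds:

```shell
#!/bin/sh
# "false" stands in for the failing parallel build and the failing
# verbose single-threaded retry
rm -f .error_detected
false || false || touch .error_detected   # like the 1st RUN line: always succeeds
# the 2nd RUN line fails as soon as the marker file exists
if [ ! -f .error_detected ]; then
    echo "build ok"
else
    echo "error detected"
fi
```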

When you reconnect to your screen session after a while, you should be able to properly read what the error was. If this is not enough to diagnose the issue, you can also read the <container-id> of the last step, and use docker commit and docker run as above to explore the state of the OS after make failed.

Sometimes you can also break down a long build command into several steps. For instance, instead of using the default make target, which builds everything, you may call make tools first, then make toolchain, and finally make to build what remains (the exact make target names depend on the project). So if the first two build steps are successful and you struggle with the third, you can make various tries in the Dockerfile without having to recompile the tools and toolchain components already in the build cache.
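In the OpenWRT example above, the single long RUN make line of the builder stage could thus become (the exact target names depend on the project):

```dockerfile
# split the monolithic build into separately cached steps
RUN nice make -j "$(nproc)" tools
RUN nice make -j "$(nproc)" toolchain
RUN nice make -j "$(nproc)"
```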

Last words

We hope you found some interesting techniques in this blog post for writing efficient Dockerfiles, and for writing them effectively!

Thanks to Jérémie Finiel for proofreading. If you have questions, we can answer you on the mailing list.


  1. You can easily emulate the build steps of a Dockerfile using lower-level docker commands. For instance, assuming you have built the image resulting from the previous steps, a RUN step is like running docker run <prev-image> <command> followed by docker commit <container-id> <new-image>.↩︎

  2. You can also forcibly disable the build cache by running docker build with the option --no-cache.↩︎

  3. We simplified a bit here by considering that the line FROM debian:trixie added only 1 layer. In reality it adds all the layers of the debian:trixie image.↩︎

  4. The reason why we selected podman and buildah instead of docker for the management of OS images in WALT is that buildah provides the subcommand buildah mount, which is important for exporting WALT images as NFS shares. ↩︎

  5. walt image shell internally relies on a combination of podman run and podman commit. Each run adds one layer to the image. ↩︎