How to leverage distributions' packaging tools and CI/CD to build better container images, using alpine Linux and GitLab as examples.
Context
Why build images?
Since most cloud-oriented open-source projects provide their own pre-built images on highly available public registries (quay.io, gcr.io, docker.io, …), you can legitimately ask whether it’s worth spending time building and distributing your own images.
In fact, apart from the lab environment where it makes sense to quickly evaluate a solution using pre-built images, I found that consuming public images for production was impractical at best, and dangerous at worst:
each image comes with its own way of configuring and running the services (environment variables, volumes, secrets, user, …),
the distribution, tooling, size and quality vary greatly between images,
images don’t necessarily follow the upstream distribution or project’s revisions closely, integrate the options you need (compilation flags for example), apply important security patches to dependencies, run as an unprivileged user, or properly handle PID 1,
if you use containers on the edge, you will probably have a hard time finding images for your architecture (arm64).
Your cluster is a lot easier to manage if you can maintain consistency among your images: same distribution, same way to generate configurations, access secrets, and run services. This is a prerequisite for having a golden image you can use as a basis for all the others.
I wanted to share with you an alternative way of building container images, but first let’s go back in time to understand how we ended up reinventing a square wheel with docker.
The docker way
When docker came out in 2013, it revolutionized software distribution by allowing anyone to easily build, ship and run full Linux system images on top of lightweight isolation features provided by the Linux kernel: namespaces and cgroups.
Before docker, building a system image that you could use on different servers required:
extensive Linux system administration skills,
complex system building process (Linux from scratch, …),
complex virtualization tools (vmware ESXi, qemu, kvm, …), and big network volumes or external devices to install the images from,
a heavy manual procedure in between (what some refer to as ClickOps).
docker fully automated this process:
by using a single file (Dockerfile) describing how to build the image and how to run containers based on that image,
by simplifying images distribution using a network registry,
and by reducing every step of the container’s life-cycle to a single command line instruction (pull, build, push, run).
The installation of docker itself was reduced to copying a single big fat binary somewhere in your operating system.
People started to abuse the limited scripting expressiveness of the Dockerfile the moment they used it to build software instead of just assembling parts to create an image. In the name of repeatability and centralization of the whole build process, features were progressively added to docker to work around problems caused by this misuse, which only made the situation worse.
Layers are useless
Each statement in a Dockerfile that has a side effect on the file system triggers the creation of a new layer in the resulting image. Having a layer built for every RUN statement was nice at the beginning, because it allowed you to develop an image step by step and restart the process where it failed, quickly retrieving the previous successful layers from the cache.
If this makes sense during development, it falls short when you want to go to production. You end up with multi-gigabyte images made of tens of layers containing all the steps of the build: installation of the development packages, intermediate files, and even deleted files.
People started to use tools to squash image layers, or used ugly chains of instructions separated by && in a single RUN statement to limit the creation of new layers, at the cost of lower readability, harder debugging, and much longer rebuild times in case of changes. The recent introduction of a heredoc syntax, 8 years after the first release of docker, says much about the weight of bad design decisions and the time needed to mitigate them.
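For illustration, here is a minimal sketch of both styles side by side (the package names are arbitrary); the heredoc form requires a BuildKit-enabled builder and the # syntax=docker/dockerfile:1 directive at the top of the Dockerfile:
# old style: one RUN statement, instructions chained with && to avoid extra layers
RUN apk update --quiet \
    && apk add -q --no-cache curl build-base

# heredoc style: still a single layer, but readable like a small shell script
RUN <<EOF
apk update --quiet
apk add -q --no-cache curl build-base
EOF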
If layers seem like a good idea on paper, they are in fact pretty useless. Modern image build tools like buildah don’t enable the cache by default and squash layers down to only 2: the base image (FROM statement), and the rest.
Multi-stage build is a hack
And what to say about the multi-stage build functionality, which seems to exist only so we don’t have to think too much about cleaning up our mess after a build, or about not shipping the whole tool-chain inside the image?
Look for instance at this Dockerfile taken from the meilisearch repository:
# Compile
FROM alpine:3.14 AS compiler
RUN apk update --quiet \
    && apk add -q --no-cache curl build-base
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
WORKDIR /meilisearch
COPY Cargo.lock .
COPY Cargo.toml .
COPY meilisearch-auth/Cargo.toml meilisearch-auth/
COPY meilisearch-error/Cargo.toml meilisearch-error/
COPY meilisearch-http/Cargo.toml meilisearch-http/
COPY meilisearch-lib/Cargo.toml meilisearch-lib/
ENV RUSTFLAGS="-C target-feature=-crt-static"
# Create dummy main.rs files for each workspace member to be able to compile all the dependencies
RUN find . -type d -name "meilisearch-*" | xargs -I{} sh -c 'mkdir {}/src; echo "fn main() { }" > {}/src/main.rs;'
# Use `cargo build` instead of `cargo vendor` because we need to not only download but compile dependencies too
RUN if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
        export JEMALLOC_SYS_WITH_LG_PAGE=16; \
    fi && \
    $HOME/.cargo/bin/cargo build --release
# Cleanup dummy main.rs files
RUN find . -path "*/src/main.rs" -delete
ARG COMMIT_SHA
ARG COMMIT_DATE
ENV COMMIT_SHA=${COMMIT_SHA} COMMIT_DATE=${COMMIT_DATE}
COPY . .
RUN if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
        export JEMALLOC_SYS_WITH_LG_PAGE=16; \
    fi && \
    $HOME/.cargo/bin/cargo build --release
# Run
FROM alpine:3.14
ENV MEILI_HTTP_ADDR 0.0.0.0:7700
ENV MEILI_SERVER_PROVIDER docker
RUN apk update --quiet \
    && apk add -q --no-cache libgcc tini curl
COPY --from=compiler /meilisearch/target/release/meilisearch .
EXPOSE 7700/tcp
ENTRYPOINT ["tini", "--"]
CMD ./meilisearch
It is not obvious to figure out:
- where the different parts start and end,
- what is constructed or shared between steps and how,
- where every piece is coming from,
- and where it is finally installed.
There is no contract or constraint between the different parts, and it’s even a common pattern to use different images/distributions in the FROM statements. We are simply copying blobs between stages without consistency or dependency checks. I can only imagine the kind of bugs we can potentially introduce with such a construction.
And yet it can work pretty well, but can’t we do better?
A better way
Back to square one
Linux distributions have accumulated decades of experience in building and packaging software in the most efficient and reliable way. Distribution packaging tool-chains prevent you from repeating the very mistakes that stacking build instructions in a Dockerfile inevitably exposes you to. They can:
- get source code from reliable sources,
- verify check-sums and signatures,
- patch files,
- build and test in isolation for multiple hardware architectures,
- optimize (remove symbols),
- split in sub-packages (architecture independent assets, doc, dev, …),
- cleanup,
- trace dependencies automatically,
compress and package everything into simple archives you can install anywhere, even in running containers.
All of that without having to know the exhaustive list of tools needed to complete all those tasks, and with the guarantee of not skipping any step along the way. As a bonus, you get proper error and warning messages when something goes south or looks suspicious.
There is NOT a single task from the list above that a Dockerfile can do for you out of the box, and that fact alone should make you think several times before ever building something within a Dockerfile again.
Containerfile for assembly
Instead of building your software directly in a Dockerfile, why not use the packaging tools of the distribution you are targeting for your image (alpine Linux), and just install the package?
This is for example a Containerfile (which is the OCI name for a Dockerfile) for another meilisearch image:
FROM reg.itsufficient.me/alpine:3.15
MAINTAINER eric@itsufficient.me
RUN apk add --no-cache meilisearch=0.25.2-r0
## add s6 configuration
ADD etc /etc
# meilisearch port
EXPOSE 7700
The Containerfile is straightforward and easy to understand. No need for layers or cache: it just installs packages, copies files, and completes very quickly.
The build process is now composable: you can easily use the same package in totally different contexts (even on running containers).
It is flexible: we split the build (packages) from the assembly process (image), which we can manage in different projects with different teams and different security policies.
It scales better: you can clearly see the dependency graph, implement it in the form of CI/CD pipelines, and let the system execute everything in parallel whenever possible.
You can even build multiple image versions with the same Containerfile without modifying it:
FROM reg.itsufficient.me/alpine:3.15
MAINTAINER eric@itsufficient.me
ARG TAG
RUN apk add --no-cache meilisearch=${TAG}-r0
## add s6 configuration
ADD etc /etc
# meilisearch port
EXPOSE 7700
You just have to add a tag to your repository, and configure your CI/CD pipeline script to pass the tag to the build engine (docker, podman, buildah, kaniko, …) with --build-arg TAG="${CI_COMMIT_TAG}".
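As an illustration, a minimal sketch of such a job (the stage, builder image and tag rule are assumptions, not the exact templates used here) using buildah as the build engine and GitLab’s predefined registry variables:
build-image:
  stage: build
  image: quay.io/buildah/stable
  rules:
    - if: $CI_COMMIT_TAG
  script:
    # authenticate against the project's container registry
    - buildah login -u gitlab-ci-token -p "$CI_JOB_TOKEN" "$CI_REGISTRY"
    # build the image from the Containerfile, injecting the git tag
    - buildah bud --build-arg TAG="${CI_COMMIT_TAG}" -t "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}" .
    # push the freshly built image
    - buildah push "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"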
One can argue that we just moved the build complexity behind:
a base image that hides the configuration of the package repository (URL and public keys) and entry point,
a supervision suite (s6) that probably breaks the one-process-per-container mantra (we will come back to that later).
On top of that, we now need a way to build and deploy packages to a private repository, and rely on CI/CD pipelines to glue everything together.
These are valid points, but given all the qualitative and functional advantages we get by moving all build logic out of the Containerfile, it is totally worth the effort, as we will see now.
Building packages
alpine Linux uses APKBUILD files (heavily inspired by the gentoo build system) to create packages.
This is what is needed for instance to build, test, package, and optimize meilisearch. It is pretty straightforward and easy to understand even if you know nothing about the APKBUILD syntax (structured sh).
# Maintainer: Eric BURGHARD <eric@itsufficient.me>
pkgname=meilisearch
pkgver=0.25.2
pkgrel=0
pkgdesc="Powerful, fast, and an easy to use search engine"
url="https://www.meilisearch.com"
arch="x86_64"
license="MIT"
makedepends="cargo"
install="$pkgname.pre-install"
source="$pkgname-$pkgver.tar.gz::https://github.com/$pkgname/$pkgname/archive/v$pkgver.tar.gz"
build() {
    cargo build --release
}

check() {
    cargo test all --release
}

package() {
    install -Dm755 target/release/"$pkgname" "$pkgdir"/usr/bin/"$pkgname"
}
sha512sums="
fb22c314b3d2dae4b46640d2ed6fd91a1e80f649c597f8b1bcb63a259173a0a54810dc0a9fe1cddcc991652cd0a590eed827d4f65754e404aa862d4de1b4fa92 meilisearch-0.25.2.tar.gz
"
build(), check() and package() are special functions (or hooks), and most of the heavy work is done automatically behind your back. Many of these functions have a default implementation and run even if they are not defined/overridden in the APKBUILD file (fetch(), unpack(), prepare(), …).
By just running abuild -r, you end up with software that is properly verified, tested, optimized and packaged. Dependencies are installed along with the package, and additional scripts (e.g. user/group creation) can be executed at several points during the process (pre-install, post-install, …).
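For illustration, here is a minimal sketch of what the meilisearch.pre-install hook referenced in the APKBUILD above could look like (the user, group and home directory are assumptions):
#!/bin/sh
# create a dedicated system group and user before the package files are installed
addgroup -S meilisearch 2>/dev/null
adduser -S -D -H -h /var/lib/meilisearch -s /sbin/nologin \
    -G meilisearch -g meilisearch meilisearch 2>/dev/null
exit 0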
We can see immediately what is going to be built, for which architecture, in which version, and where it will be installed. More importantly, you can feel that by just replacing some values you could build a totally different rust project. In contrast, everything seems mixed up and WET in the meilisearch Dockerfile, and there is probably nothing to keep if you want to adapt it to another project.
From experience, you rarely need more than 50 lines to build anything in whatever framework or language. You can easily find examples you can just copy/paste and quickly adapt to your needs.
You can follow the tutorial Building alpine Linux packages inside a container to convince yourself that the procedure is easy. What follows now is what is needed to streamline the production of packages and images for production.
Build cache
Cache is best handled at the language tool level because of the granularity of dependencies. A cache at the image level would be invalidated as soon as one dependency changes, whereas build tools are smart enough to rebuild only what is necessary. As projects often have hundreds of dependencies, no one can seriously consider image layers as an effective cache system.
As the CI/CD scheduler can start a build pipeline on any available physical server (node), we need a way to share the cache over the network. Most modern tools now have connectors for S3-compatible stores, and the GitLab chart (Kubernetes) comes with a pre-configured minio instance you can use to add your own buckets. For rust, I use sccache by just defining environment variables before running cargo:
AWS=XXXX
RUSTC_WRAPPER=/usr/bin/sccache
SCCACHE_BUCKET=sccache
SCCACHE_ENDPOINT=myendpoint:443
SCCACHE_S3_USE_SSL=true
A lot of build tools (bazel, cmake, …) support S3 out of the box or via plugins, but if nothing is available, you can still leverage the cache functionality integrated in GitLab.
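For reference, a minimal sketch of that built-in cache for a rust job, keyed on the branch (the cached paths are assumptions and depend on where cargo is configured to put its artifacts):
build:
  stage: build
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - .cargo/
      - target/
  script:
    - cargo build --release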
CI/CD to rule them all
CI/CD has taken a central place in software development and is used to build, test, assess, package and deploy applications. GitLab generalizes the notion of package as being some files served under a well-defined protocol, and expands the list of integrated registry types with each new version at an incredible pace.
To build software outside a Containerfile, we need at least 3 projects which share the same name and tags but live in different namespaces (or groups). These projects are built either at each new commit (tag) or through API triggers, and publish different kinds of packages upon successful execution. Each project eventually has the right to make changes and commit to its downstream project, as well as to fetch packages published by its upstream project:
- A source project builds, tests, and publishes source releases. It is allowed to update a package project (inject the tag and update the checksum in an APKBUILD file),
- a package project gets and verifies the source release, then builds, tests and deploys packages to an alpine repository. It is allowed to update an image project (inject a tag in a Containerfile),
- an image project assembles and pushes images. It is allowed to update a manifest project (inject a tag in a YAML manifest),
- an optional manifest project describes how the application is deployed:
- Either a CI/CD GitOps pipeline runs when something changes in the project git repository (push strategy),
- or a GitOps controller constantly monitorizes the repository (pull strategy) and reacts to changes.
The manifest can be patched with kustomize to match the running environment before being submitted to the control plane (using kubectl or any other tool), which finally pulls the container image.
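For instance, a minimal sketch of such an overlay’s kustomization.yaml (the base path and image name are assumptions), where only the image tag is patched per environment:
resources:
  - ../../base
images:
  - name: reg.itsufficient.me/meilisearch
    newTag: 0.25.2
Running kubectl apply -k on the overlay directory then renders and submits the patched manifests.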
In some pipelines, we can test whether the tag of the downstream project already exists and trigger a rebuild instead of committing changes (which would then trigger a build). For instance, if a package project-0.1.0-r1.apk replaces project-0.1.0-r0.apk, the already tagged image project:0.1.0 will not change: a rebuild is triggered, and the most recent package is picked up during the assembly phase. However, this does not apply to the source project, as it is tightly coupled to the package project through the checksum of the source archive present in the APKBUILD file. In that case we always need to commit the new checksum.
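A minimal sketch of that check in a pipeline job (the downstream project path, trigger token and project id are assumptions), using the GitLab pipeline trigger API when the tag already exists:
# does the downstream image project already have this tag?
if git ls-remote --tags "https://gitlab.example.com/images/${CI_PROJECT_NAME}.git" \
     "refs/tags/${CI_COMMIT_TAG}" | grep -q .; then
    # yes: simply trigger a rebuild of the existing image
    curl -X POST \
      -F "token=${DOWNSTREAM_TRIGGER_TOKEN}" \
      -F "ref=${CI_COMMIT_TAG}" \
      "https://gitlab.example.com/api/v4/projects/${DOWNSTREAM_PROJECT_ID}/trigger/pipeline"
else
    # no: inject the new tag into the downstream Containerfile and commit it
    echo "tag ${CI_COMMIT_TAG} not found downstream, committing changes"
fi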
Configuring the CI/CD for each alpine package is really easy and totally DRY thanks to GitLab CI/CD templates and to the fact that projects share the same name and tags (only their group are different). You can look at my GitLab templates if this is of any interest for you.
Here is for instance the .gitlab-ci.yml file I use for all my alpine package projects:
include:
  - project: templates/gitlab
    file:
      - deploy_apk.yml
      - commit_downstream_container.yml

variables:
  BUILD_VARIANT: /rust
Packages repository
reposerve
GitLab has package registries (npm, maven, conan, …), a container registry (OCI), and an infrastructure registry (helm), but hasn’t yet tackled the problem of OS package registries (rpm, deb, apk, …).
All the package registries I know of rely on simple static file servers, and alpine is no exception to the rule. Nevertheless, if you want to deploy packages from a CI/CD pipeline with a POST request for instance, you need more than a static server: something has to handle authorization and uploads, and trigger actions when new packages are posted (update and sign the index). I didn’t find any software capable of handling these problems out of the box.
The generalization of microservices and the emergence of new compiled languages have changed the way we approach HTTP services nowadays. From using a generic solution with a complex configuration and scripts (nginx), we shifted to tailor-made solutions with simple configurations. Rust is a perfect candidate for these kinds of projects because it has some of the fastest HTTP frameworks and gives strong guarantees over memory and thread safety.
reposerve is a static file server over directories containing alpine packages and indexes. It offers an /upload path (access to which can be restricted to holders of a valid JWT token) to post one or several packages, and it automatically signs the indexes when new packages are uploaded, using the alpine build tools.
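A minimal sketch of such an upload with curl (the domain and package file are placeholders; the form fields are the same ones used by the deployment script shown later):
curl -H "Authorization: Bearer $CI_JOB_JWT_V2" \
  -F 'version=3.15' \
  -F 'repo=main' \
  -F 'arch=x86_64' \
  -F 'file=@meilisearch-0.25.2-r0.apk' \
  https://alpine.example.com/upload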
Server configuration
Configuring reposerve is easy.
dir: /home/packager/packages
tls:
  crt: /var/run/secrets/reposerve/tls.crt
  key: /var/run/secrets/reposerve/tls.key
jwt:
  jwks: https://gitlab.example.com/-/jwks
  claims:
    iss: 'gitlab.example.com'
    ref_protected: 'true'
    ref_type: 'tag'
    namespace_path: 'alpine'
We need to provide a JWKS URL where we can find the public key needed to verify the signature of the JWT token, as well as a list of claims the JWT token should include. Here we just ask reposerve to verify that the token is issued by gitlab.example.com. The other claims are GitLab specific and mean that the upload must come from a pipeline originating from a project in the alpine namespace (group) and from a commit with a protected tag.
Client configuration
Once reposerve is deployed under a defined domain (alpine.example.com), the configuration on the consumer side is simple. Depending on the version of alpine you are running (3.15 here), you just have to add the deployment’s URL to the list of repositories:
sed -i "1ihttps://alpine.example.com/3.15/main" /etc/apk/repositories
sudo apk update
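Since the index is signed, the repository’s public signing key must also be trusted by apk, otherwise apk update will reject the index as untrusted. A minimal sketch (the key file name and its URL are placeholders; the name must match the key used to sign the index):
# trust the repository signing key
curl -o /etc/apk/keys/packager@example.com-1.rsa.pub \
  https://alpine.example.com/packager@example.com-1.rsa.pub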
CI/CD configuration
GitLab automatically generates a short-lived (1h) JWT token for every running pipeline, exposed in the environment variable CI_JOB_JWT. Inside the token we find claims that describe the project the pipeline belongs to, and they must match the ones declared in the reposerve configuration, otherwise the upload is rejected.
I use this script (embedded in the deploy_apk.yml GitLab template presented above) to build and upload the packages to the repository:
# run abuild in the directory containing the APKBUILD file
/usr/bin/abuild -r -P /tmp/packages
# post the packages to the repository
apkdeploy.sh
apkdeploy.sh uses curl and the JWT token to post all the packages found in a directory to the repository, detecting architectures and versions automatically:
#!/bin/sh
. /etc/os-release
VERSION="${VERSION_ID%.*}"
REPO="$(basename $(dirname $(pwd)))"
DIR="${1:-/tmp/packages}"
# one APKINDEX.tar.gz per architecture: this deals with multi-arch builds
# (command substitutions are intentionally unquoted so each path becomes its own word)
for arch in $(find "$DIR" -name APKINDEX.tar.gz); do
    ARCH="$(basename $(dirname $arch))"
    args="-H 'Authorization: Bearer $CI_JOB_JWT_V2' -F 'version=$VERSION' -F 'repo=$REPO' -F 'arch=$ARCH' "
    for file in $(find "$(dirname $arch)" -name '*.apk'); do
        args="${args}-F file=@$(basename $file) "
    done
    (cd "$(dirname $arch)" && eval curl $args "https://$HOST/upload")
done
Entry point
Process supervision
Process supervision is a crucial part of any Linux system whether it’s running on a real host, a vm, or a container.
The fact that you could suddenly package a Linux distribution without prior knowledge of system init, process management or software packaging was bad in terms of quality and security. A container provides a thin abstraction layer on top of the host operating system and gives the false impression that these questions are not relevant anymore.
That belief is also rooted in the mantra many are still relaying: a container should run only one process. The number of projects on GitHub alone dedicated to being used as a docker entry point (tini, dumb-init, pid1, …), or the number of images that carelessly run their executable as PID 1, says a lot about the gap between theory and reality.
Process supervision is tricky. There are a lot of corner cases, and I think you need to dedicate a vast amount of time to understand everything correctly. That is not my case, and as a matter of fact, I thought for a long time that systemd was a superior init system because it provided a better way of describing services and their dependencies than a regular sysvinit, and because it was compiled (C) and declarative instead of being interpreted. I was really surprised that it was apparently not used in the container world.
In fact, systemd is so intricately tied to kernel functionalities that it’s impossible to run in a containerized environment, unless you also tie systemd to the container engine as podman did, and drill a lot of security holes by allowing the container to manage critical host resources (cgroup) just to make systemd happy.
I later discovered, thanks to the s6 supervision suite, why the systemd approach was flawed and why it would probably never be usable in musl environments due to its strong glibc dependency (which also makes systemd poorly portable).
s6
s6 is the perfect candidate for process supervision and PID 1 management in containerized environments, because its scope has been limited to doing that and only that, in the most efficient way. It is lightweight (5Mb all included) and has been carefully crafted by Laurent Bercot, who has extensive (low level) experience in Linux process programming and supervision. At some point, s6 should become the official service manager of alpine.
The best way to use s6 inside your container is to use s6 overlay.
Choosing a supervision suite just to manage PID 1 correctly may seem overkill, but I have at least 2 running processes inside all my containers:
- rconfd manages configuration files and gets secrets. When secrets change, services are signaled (SIGHUP) and can reload their configuration. This is way better than the traditional approach: no proper secret management, or waiting for Kubernetes to kill a pod because it became unresponsive after losing its access.
- The main service (generally corresponding to the name of the container), which should wait for rconfd’s startup notification (meaning configuration files have been successfully generated). I use for that the poll-free lightweight startup notification of s6, as sketched below.
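A minimal sketch of that wait in the main service’s run script (the scandir path and service name are assumptions tied to how s6-overlay lays out its supervised services):
#!/bin/execlineb -P
# block until rconfd has signaled readiness, then chain-load into the main process
foreground { s6-svwait -U /var/run/s6/services/rconfd }
/usr/bin/meilisearch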
The way an s6 container starts is consistent with the way every Linux host OS starts: by running an init script (provided by the s6-overlay package). s6 init supervises a services tree based on structured directories. This is what I use for my golden base image:
Containerfile
FROM alpine:3.15
MAINTAINER eric@itsufficient.me
ARG TAG=${TAG}
ARG RCONFD_VERSION=0.11.2-r0
## install s6
RUN apk add --no-cache \
    s6-overlay=${TAG}-r0 \
    rconfd=${RCONFD_VERSION}
## add s6 base configuration
ADD etc /etc
## run s6 as root
ENV TERM xterm
USER root
ENTRYPOINT ["/init"]
You just have to install whatever packages you need and ADD some scripts in the /etc/services.d directory of the derived image. s6 will do the rest. No need to redefine ENTRYPOINT or CMD. You could also package the scripts in a $pkgname-s6 package and just install it, but as this is a frequently moving part in images (several images can start services differently), I add it directly in the Containerfile instead.
One key component of s6 is execline, whose goal is to replace the interpreter used by init scripts (i.e. bash) with a no-interpreter, and to reduce the scripts to one-liners. It sounds completely silly, but it is in fact brilliant.
An execline script is a chain of commands + arguments. Each command consumes its own arguments, completes its task, and then replaces itself (like the exec shell command) with the remaining arguments (chain-loading). The script is parsed only once at startup, no interpreter stays in memory during the process, and yet you can do everything bash can do. It looks like a Mission: Impossible script that consumes itself as it runs: only the part that has not yet been executed stays in memory at each step. No interpreter means fewer security risks, fewer resources allocated, and instant startup.
execline is the preferred method for defining services under s6. We could write the following script to start meilisearch. Placed in the right directory (/etc/services.d/meilisearch), it will be picked up automatically by the init script. This is the last missing part for creating our meilisearch container:
/etc/services.d/meilisearch/run
#!/bin/execlineb -P
with-contenv
foreground { s6-echo start meilisearch }
cd /var/lib/meilisearch
s6-setuidgid meilisearch
/usr/bin/meilisearch
The foreground instruction looks weird, but as s6-echo accepts a variable number of arguments and doesn’t exec into anything else, we must use foreground, which forks, waits for the {} delimited block, then execs into the remaining script (from the cd point). This is chain-loading at work: even if I used new lines for readability, it is really only one line.
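s6 can also supervise a dedicated logger per service: a hypothetical companion script in /etc/services.d/meilisearch/log/run (path and rotation options chosen for illustration) would capture the service’s stdout and rotate it automatically:
#!/bin/execlineb -P
# keep at most 10 archived log files of 1MB each in /var/log/meilisearch
s6-log -b n10 s1000000 /var/log/meilisearch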
Conclusion
No doubt that docker radically changed the way we build and deploy applications, and it will forever be remembered as a disruptive technology. If the simplification of every process involved in the life-cycle management of containers undeniably explains its dazzling adoption, it also became its curse after a few years. It bet everything on an all-in-one approach, and sometimes tried to reinvent the wheel just to remain the central piece of everything. This approach also negatively impacted the quality of what was built with docker.
Kubernetes won because of its modularity. It allowed each component, through a stable and evolving API, to have its own release schedule and make its own experiments, and it quickly became the center of a staggering number of external contributions over a broad range of cloud-oriented technologies. It ended up stripping out docker itself: by slowly replacing it piece by piece (OCI, CNI, CSI, …), it finally deprecated docker as a container engine altogether. Today the cri-o, crun, kubelet combo is faster than docker while being highly composable.
Running containers shouldn’t be approached differently from running real hosts. Concerns about efficiency and security should remain the same to really benefit from their additional isolation features. A piece of software running on a host should run seamlessly in a container and vice versa, and you should always prefer less over more, because it leads to simple and composable instead of cluttered and entangled.
I’m grateful to the s6 author for having patiently built such a nice init system for containers, and for showing that simple works better. He recently managed to get sponsored to work full time on his project. I think that s6 will soon be ready to chase systemd on its own turf.
Unifying host and container init systems would facilitate system administration and make container-oriented distributions (flatcar) even faster, lighter and more secure.
I hope to have successfully shown that building procedures on top of decades-old tools and habits is not necessarily a bad thing, and that taking shortcuts in the name of simplified procedures or modernity is not always a good one. As always, feel free to give me your impressions in the comments area below.
Éric BURGHARD