How to build an alpine image to base all your containers on.
What’s the point ?
If you start to build your own containers images, repetitions will appear sooner or later in your manifests, as well as divergences when you change something that you forget to propagate all over the places.
A golden image provides a common base to avoid repetitions and ease update and maintenance. When it has been proven to be stable enough, a base image acquires the golden status.
A simple golden image
I don’t build anything in Containerfile
(or Dockerfile
) anymore: The rational is explained in:
“A better way to build containers images”.
If you followed “Building / consuming alpine Linux packages inside containers and
images”, you know that, in order to be able to install
your own package with apk
(alpine package tool), you need to add a repository public key with a URL
system-wide, and you need to repeat this process for each of your image that use private packages.
The default libmusl’s memory allocator is also not the fastest around (specially in multithreaded context), so you may want to switch to mimalloc for all your alpine deployment.
Well, we can write these 2 configuration tasks inside a Containerfile
file :
FROM alpine:3.17
LABEL org.opencontainers.image.authors=eric@itsufficient.me
ARG MIMALLOC_VERSION=2.0.9-r0
## add repository public key and URL
COPY domain.my-xxxxxxxx.rsa.pub /etc/apk/keys/
RUN sed -i "1ihttps://apk.domain.my/${TAG}/main" /etc/apk/repositories
## install mimalloc
RUN apk add mimalloc=${MIMALLOC_VERSION}
## use mimalloc per default
# don't forget s6 with-contenv in services to inherit variables
ENV LD_PRELOAD=/lib/libmimalloc.so.2.0
ENV MIMALLOC_LARGE_OS_PAGES=1
You just need a mimalloc package
to generate this image, but I’ve got you covered, and you can use this
APKBUILD
file to generate it :
# Maintainer: Éric BURGHARD <eric@itsufficient.me>
pkgname=mimalloc
pkgver=2.0.9
pkgrel=0
pkgdesc="mimalloc is a compact general purpose allocator with excellent performance."
url="https://github.com/microsoft/mimalloc"
arch="all"
license="MIT"
makedepends="cmake"
options="!check" # No test suite
subpackages="$pkgname-doc $pkgname-dev"
source="$pkgname-$pkgver.tar.gz::https://github.com/microsoft/$pkgname/archive/v$pkgver.tar.gz"
build() {
mkdir build && cd build
cmake -DMI_INSTALL_TOPLEVEL=ON -DCMAKE_INSTALL_PREFIX=/usr ..
make -j"$(nproc)"
}
package() {
(cd build && make DESTDIR="$pkgdir" install)
mv "$pkgdir"/usr/lib "$pkgdir"/lib
for file in $(ls | grep -i -e license -e copying -e copyring -e changelog -e contributing -e readme -e code_of_conduct); do
install -m644 -D -t "$pkgdir"/usr/share/doc/"$pkgname" "$file"
done
}
dev() {
default_dev
find "$pkgdir"/lib/cmake/mimalloc/ -name '*.cmake' -exec install -Dm644 {} -t "$subpkgdir"/usr/share/cmake/Modules/ \;
rm -rf "$pkgdir"/lib/cmake
}
sha512sums="
bf6945bfb600ade35dab34c7f570ee4f69a77612547ad874bbbd989a4e594a6a219c222a22c90c5e36f205aae4d5cd1a5e4651caed5433db275d414c6769bf49 mimalloc-2.0.9.tar.gz
"
We can now build a new base image with :
buildah bud -t reg.domain.my/containers/alpine:3.17
and use it as follows in Containerfile
:
FROM reg.domain.my/containers/alpine:3.17
A more useful one
Specialization
If we want to use the same image in different contexts, we need to inject somehow the parameters needed to specialize the container to the workload context. This is inversion of control : the software doesn’t ask for parameters, we must provide them, and this is how Kubernetes is working.
But this is not working very well when something is changing frequently in the workload’s context, like short-lived secrets. The explanation lies into how the containers’ orchestrator (Kubernetes) injects parameters at run time, which can roughly be classified in 2 categories :
Static ones :
Dynamic ones which normally don’t require to restart the pod when something changes :
- Sidecar containers
- Volumes
- CSI plugins (AFAIK not really dynamic in case of vault because the content is fixed at startup and immutable on runtime.)
If you have frequently changing secrets (like rotated every hour), and you don’t want your pod to be killed by the orchestrator because it will likely become unresponsive with expired secrets, then sidecar container is a simpler solution.
Volume requires a (less secure) shared access, a controller (with extended rights) to update the volume content, and a local service to watch for modifications and restart services. If this solution scales better (one controller vs nth sidecars), it has broader security implications and no edge over the simplification of operations.
But even the sidecar brings his fair share of :
Complexity: The sidecar needs to be configured separately. A mount point and eventually the processes’ namespace (cgroup) need to be shared between the sidecar and the workload container to allow the sidecar to write configuration files and signal processes when modifications happen.
Resources consumption: Injecting a several hundred megabytes binary like the vault agent in the vault sidecar does not scale very well when all you need to do is calling a REST API to refresh tokens every hour or so. It will certainly be deprecated by Hashicorp in the near future for that reason.
Instead of a sidecar, a better solution is to use a side-process which runs in the same container and has automatically access to files and processes space. If it’s specialized enough (light on resources’ consumption), it can scale a lot better than a sidecar while having fewer security concerns and being simpler to operate.
We now need a process supervision suite to manage the lifecycle of the workload process and its configuration agent (side-process), as well as their interdependencies (i.e. wait for the first generation of configuration before starting the main process).
A supervision suite (as long as it is lightweight) is a perfect candidate to be included in a golden image. It is useful even if you don’t need dynamic configuration because it offers the guarantee of correctly handling the responsibilities coming with running as PID 1, (some don’t care about or simply defer to dumb-init and the likes).
I’m not considering systemd
for the task which :
is clearly oversized (did I say bloated ?) for the task,
doesn’t work seamlessly inside containers, and
is not portable without
glibc
.
A small supervision suite
The natural choice for process management under alpine is s6, which should at some point become its official service manager.
s6 works by running lightweight long-lived daemons that supervise other processes, and offers simple and effective signaling and readiness notification mechanisms.
The main point of entry is s6-svscan which should run as PID 1 (your container entry point), and is the main supervisor.
The services themselves are usually execline scripts organized in services directories and which are supervised by separate instances of another lightweight daemon: s6-supervise.
This is a low level description and even if some work is underway (s6-rc, s6-frontend) to make it more declarative for easily expressing services interdependencies, it is kind of hardcore to use it directly if all you want is just to start a few services.
s6-overlay to the rescue
The quickest way of using s6 in your container is by using s6-overlay which contains the required scripts organized in init stages to start your services without thinking too much about the technical details.
An official alpine package
exists, but
it depends on other s6 software, and I found this subdivision
very unpractical for maintenance reason: as the revisions of the dependencies are not clearly
stated in its apk
manifest, you can introduce runtime bugs if you have different
versions of s6 software in your private repository (to stay on the edge or to backport newer s6 on
older alpine versions). You also end up managing 9 packages or so instead of just one.
As I never use s6 software separately from one another, I chose instead to make an all-in-one s6-overlay package which includes everything statically linked. Nothing clever here as it simply relies on the s6-overlay Makefile which already gets and compiles all dependencies using the right (hard-coded) revisions. Feel free to grab it and adapt to your need.
# Maintainer: Éric BURGHARD <eric@itsufficient.me>
pkgname=s6-overlay
pkgver=3.1.3.0
pkgrel=0
_pkgdesc="s6 overlay for containers"
pkgdesc="$_pkgdesc"
url="https://github.com/just-containers/s6-overlay"
arch="all"
license="ISC"
options="!check"
makedepends="xz linux-headers"
source="$pkgname-$pkgver.tar.gz::https://github.com/just-containers/$pkgname/archive/v$pkgver.tar.gz"
install="$pkgname.post-install"
subpackages="$pkgname-scripts::noarch $pkgname-symlinks::noarch $pkgname-syslogd::noarch"
builddir="$srcdir/$pkgname-$pkgver"
options="!check suid"
build() {
cd "$builddir"
make
}
package() {
cd "$builddir"
mkdir -p "$pkgdir"
tar xf output/$pkgname-$(uname -m).tar.xz -C "$pkgdir"
# remove suid flags otherwise postcheck() fails. Add it again in post-install
chmod -s "$pkgdir"/package/admin/s6-overlay-helpers/command/s6-overlay-suexec
}
scripts() {
pkgdesc="$_pkgdesc - scripts"
cd "$builddir"
mkdir -p "$subpkgdir"
tar xf output/$pkgname-noarch.tar.xz -C "$subpkgdir"
}
symlinks() {
pkgdesc="$_pkgdesc - symlinks"
cd "$builddir"
mkdir -p "$subpkgdir"
tar xf output/$pkgname-symlinks-noarch.tar.xz -C $subpkgdir
tar xf output/$pkgname-symlinks-arch.tar.xz -C $subpkgdir
}
syslogd() {
pkgdesc="$_pkgdesc - syslogd"
cd "$builddir"
mkdir -p "$subpkgdir"
tar xf output/syslogd-overlay-noarch.tar.xz -C $subpkgdir
}
sha512sums="
30e8aa212d29ff185252d8695ffa845ef1dadafc0f133b235bce2caf73ef90cccacd4678ea5e4e72eb9092276ba47fcfa5a10ea1985568c2985e41c6841748f0 s6-overlay-3.1.3.0.tar.gz
"
Definition of our services
Going back to side-process and configuration/secret management, I developed a small tool
in rust for that purpose: rconfd
. It is
similar to consul-template
, but smaller,
faster and with an intentionally narrowed scope (vault,
jsonnet, and few backends). You can use it with Kubernetes (by using Kubernetes
Auth Method) or CI/CD (by using
JWT/OIDC Auth Method).
How to use rconfd
should normally be the subject of another blog post, but let’s see how to
start the service and how other dependent services can wait for it.
s6 services are declared as separated
directories in /etc/services.d
:
└── etc
└── services.d
└── rconfd
├── notification-fd
└── run
notification-fd
contains an integer: the file descriptor number that will be used by the
service to signal its readiness status. It is usually 3
as 0
to 2
are taken by the standard
descriptors (stdin, stdout, stderr).
run
is an executable (don’t forget execution
rights), and execline
is the natural choice for scripting
your services.
etc/services.d/rconfd/run
#!/command/execlineb -P
with-contenv
foreground { /usr/bin/rconfd -D -j /etc/rconfd -r 3 }
importas -u ? ?
if { eltest ${?} = 0 }
s6-pause
execline
is not an interpreter but a parser, although it offers the same Turing completeness than
bash
. Even if I use new lines to separate commands in the script above, you should read
it as a one line instruction. The commands you normally use in execline
scripts are standalone
executables that consume their arguments and execute into something else (much like env
in shell
scripts) while passing the remaining arguments (chain-loading).
execline
parses the script only once at startup to construct the chain of arguments then
replace itself with the first command of the script. Only one command stays in the memory at any given
step. It is much more secure and efficient than having a full-blown interpreter, generally
subject to all kind of parsing and injection exploits, that just stays in memory until the
very last instruction. It’s a perfect fit to start services in containers: lightweight, fast,
secure, as long as the logic stays simple, and the script size is smaller than 4Kb.
foreground
is used here because rconfd
doesn’t do chain loading. Like other daemon (-D
argument) it normally never returns unless
it encounters an error. Nonetheless, rconfd
can return with a success code if it generates
configuration files and has nothing more to do (when it detects that only static secrets are
used for instance). In that case we replace rconfd
by the smallest possible daemon with
s6-pause. This is a nice trick
(used by docker as well with pause
containers) to tell s6: “it’s ok, we are (kind of) still running normally, so please don’t
restart us” (s6-rc
has the concept of one-shot service).
When rconfd
successfully starts, gets access to secrets and generates configuration files, it
will signal on file descriptor 3 (-r 3
) that it is ready. The given integer should match the
one written in the notification-fd
file.
Putting everything together
We now have all the required parts to assemble our image.
Here is the Containerfile
I use for my golden image. I tag the image using the s6-overlay
version (--build-arg TAG=w.x.y.z
argument of buildah
).
FROM reg.domain.my/containers/alpine:3.17
LABEL org.opencontainers.image.authors=eric@itsufficient.me
ARG TAG
ARG RCONFD_VERSION=0.11.4-r0
## install s6
RUN apk add --no-cache \
s6-overlay=${TAG}-r0 \
s6-overlay-scripts=${TAG}-r0 \
rconfd=${RCONFD_VERSION}
## add s6 configuration
ADD etc /etc
## run s6 as root
ENV TERM xterm
USER root
ENTRYPOINT ["/init"]
How to use it
Let’s say we need a k8s container image for a hypothetical myservice
that needs
a configuration file with secrets fetched from a vault
server. This is how
we could use our golden image (of course we need to have compiled/deployed a private
myservice
apk
beforehand) :
Containerfile
FROM reg.domain.my/containers/s6-overlay:3.1.3.0
LABEL org.opencontainers.image.authors=eric@itsufficient.me
ARG TAG
RUN apk add --no-cache myservice=${TAG}-r1
## add s6 configuration
ADD etc /etc
Now the execline
script to start our service:
etc/services.d/myservice/run
#!/command/execlineb -P
with-contenv
# passively wait for configuration to be ready
foreground { s6-svwait -U /var/run/s6/legacy-services/rconfd }
# double check that everything is ok and our config file is present
importas -u ? ?
if { eltest ${?} = 0 -a -f /etc/myservice.yml }
cd /var/lib/myservice
s6-setuidgid myservice
myservice
I eluded the rconfd
files (/etc/rconfd/*.{json,jsonnet}
) on purpose, but you understood that
rconfd
is responsible to generate /etc/myservice.yml
referenced in the above script.
In case the service is critical, and the container should stop if the service fails (default is endless restarting), just add the following script in the service directory :
etc/services.d/myservice/finish
#!/command/execlineb -S1
if { eltest ${1} -ne 0 }
if { eltest ${1} -ne 256 }
/run/s6/basedir/bin/halt
That’s it.
What we have so far
We have one base image (alpine:3.17
) we can use in place of the official alpine one, that is able to consume
packages from our private repository, and that changes the default memory allocator.
We have a golden image (s6-overlay:3.1.3.0
) based on the later that offers process supervision
and has a configuration service that can fetch secrets from a vault server, (re)generates
configuration files and signals others when something change.
In case we don’t need process supervision, we can rely on our minimalist base image: In
the picture above, we derive a build image containing some GitLab tools (build:3.17
) and
then define sub-images containing pre-installed packages (tool chain) for a defined language
(build/rust:1.64.9
) to speed up compilation jobs in CI pipelines.
By using apk
in Containerfile
, we avoid duplicate work in CI (like resources
consuming compilations), and the process of creating or rebasing new images can’t be faster as
it is limited to just packaging existing files together. This is a crucial point in case of
critical emergency patch.
When a new alpine version is published we just have to rebuild in CI all the required apk
and images once, and then update the tag used in the FROM
instruction in Containerfile
to
rebase all our images on the new version. This job can be driven in CI/CD by simply watching
GitHub RSS feeds and using commits and
triggers along the images’ dependency chain.
As usually, I highly appreciate feedbacks, comments or alternate ways of doing the same thing in the comments’ area below.
Éric BURGHARD
Related posts
Managing roles for PostgreSQL with Vault on Kubernetes
Vault has a database secret engine with a PostgreSQL driver that helps to create short-lived roles with random passwords for your database applications, but putting everything in production is not as simple as it seems.
Intalling Kubernetes with cri-o inside flatcar Container Linux
How to run containers without dockershim / containerd by installing cri-o with crun under flatcar Container Linux.
Building / consuming alpine Linux packages inside containers and images
How to build alpine Linux packages you can later install inside other alpine based containers or images