Do you know what container layers are? How are they stored on the file system? How are they used to run a program? It’s time to answer these questions.

Containers are isolated environments for programs, and the foundation of programs is files. The program itself is an executable file, and almost every program also needs libc (the libc.so file), the time zone database (the /usr/share/zoneinfo directory), and a dynamic linker (the ld-linux.so file).

Containers are also self-sufficient: you can download one and run it without installing anything on the host system. To preserve this property, we must isolate the container’s files from the host’s files.

Chroot

The easiest way to achieve this isolation is chroot(2). We’ll use this syscall via a command with the same name: chroot(1).

Let’s build our first container with bash and ls.

$ which bash ls
/usr/bin/bash
/usr/bin/ls
$ mkdir -p ./container/usr/bin
$ cp /usr/bin/bash /usr/bin/ls ./container/usr/bin/
$ sudo chroot ./container /usr/bin/bash
chroot: failed to run command ‘/usr/bin/bash’: No such file or directory

No such file or directory? That doesn’t sound right. Let’s check it again.

$ ls ./container/usr/bin/bash
./container/usr/bin/bash
$ file /usr/bin/bash
/usr/bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=33a5554034feb2af38e8c75872058883b2988bc5, for GNU/Linux 3.2.0, stripped

Oh, bash is a dynamically linked program. The kernel cannot find its dynamic linker, /lib64/ld-linux-x86-64.so.2, inside the new root, and that is the missing file the error message refers to. We need to copy the dynamic linker and the libraries bash depends on.
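
A quick way to see why the error appears even though the binary is there is to check which of the required paths exist under the new root. The following is a minimal sketch; missing_in_root is a hypothetical helper name, and the list of paths to check is passed in by hand.

```shell
# missing_in_root <rootdir> <abs-path>...
# Prints every absolute path that does not exist under rootdir.
# Hypothetical helper for illustration; it is not part of any tool.
missing_in_root() {
	local rootdir="$1" path
	shift
	for path in "$@"; do
		# -e follows symlinks, so a dangling link also counts as missing.
		[ -e "$rootdir$path" ] || printf '%s\n' "$path"
	done
}
```

At this point in the walkthrough, running missing_in_root ./container /usr/bin/bash /lib64/ld-linux-x86-64.so.2 would report only the dynamic linker.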

$ ldd /usr/bin/bash
	linux-vdso.so.1 (0x00007fff3cff9000)
	libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007ffbb7b03000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffbb78db000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffbb7c9f000)
$ ldd /usr/bin/ls
	linux-vdso.so.1 (0x00007ffffc78d000)
	libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f3f2ac51000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3f2aa29000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f3f2a992000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f3f2acaa000)

Libraries may need other libraries, so let’s check the dependencies of libtinfo.so.6.

$ ldd /lib/x86_64-linux-gnu/libtinfo.so.6
	linux-vdso.so.1 (0x00007ffec9b42000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f786236a000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f78625cd000)

No new dependencies. On Ubuntu 22.04, the output for the other libraries is the same.

Let’s put all dependencies into our container directory and try it again.

$ mkdir -p ./container/lib/x86_64-linux-gnu
$ cp \
	/lib/x86_64-linux-gnu/libtinfo.so.6 \
	/lib/x86_64-linux-gnu/libselinux.so.1 \
	/lib/x86_64-linux-gnu/libpcre2-8.so.0 \
	/lib/x86_64-linux-gnu/libc.so.6 \
	./container/lib/x86_64-linux-gnu/
$ mkdir -p ./container/lib64
$ cp /lib64/ld-linux-x86-64.so.2 ./container/lib64
$ sudo chroot ./container /usr/bin/bash
bash-5.1# ls -al /
total 20
drwxrwxr-x 5 1000 1000 4096 Jan 30 13:06 .
drwxrwxr-x 5 1000 1000 4096 Jan 30 13:06 ..
drwxrwxr-x 3 1000 1000 4096 Jan 30 13:06 lib
drwxrwxr-x 2 1000 1000 4096 Jan 30 13:06 lib64
drwxrwxr-x 3 1000 1000 4096 Jan 30 12:58 usr
bash-5.1# exit
exit

It works! We can pack this directory, send it to another x86-64 Linux machine, and it will work there too.

$ tree ./container
./container
├── lib
│   └── x86_64-linux-gnu
│       ├── libc.so.6
│       ├── libpcre2-8.so.0
│       ├── libselinux.so.1
│       └── libtinfo.so.6
├── lib64
│   └── ld-linux-x86-64.so.2
└── usr
    └── bin
        ├── bash
        └── ls

5 directories, 7 files
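
The manual steps above can be scripted. The following is a minimal sketch, assuming the binary is given as an absolute path and that ldd prints its usual "name => /path (address)" lines; copy_with_deps is a hypothetical helper name. Note that ldd on an executable already lists the libraries that its libraries need, so one call per binary is enough.

```shell
# copy_with_deps <rootdir> <abs-path-to-binary>
# Copies the binary and every shared object reported by ldd into rootdir,
# keeping the same absolute paths. Hypothetical helper for illustration.
copy_with_deps() {
	local rootdir="$1" bin="$2" path
	{
		printf '%s\n' "$bin"
		# Keep only absolute paths: the third field of "name => /path (addr)"
		# lines and the first field of "/path (addr)" lines (the dynamic linker).
		ldd "$bin" 2>/dev/null | awk '
			$2 == "=>" && $3 ~ /^\// { print $3; next }
			$1 ~ /^\// { print $1 }
		'
	} | while read -r path; do
		mkdir -p "$rootdir$(dirname "$path")"
		# -L copies the file a symlink points to instead of the link itself.
		cp -L "$path" "$rootdir$path"
	done
}
```

With this helper, copy_with_deps ./container /usr/bin/bash followed by copy_with_deps ./container /usr/bin/ls should produce the same kind of tree as the one shown above.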

Chroot is not a security feature

Some programs need /proc. Let’s mount it and check what we have there.

$ mkdir ./container/proc
$ sudo mount -t proc proc ./container/proc
$ sudo chroot ./container /usr/bin/bash
bash-5.1# ls -al /proc/1/cwd/
total 72
drwxr-xr-x  20    0    0  4096 Jan 20 23:11 .
drwxr-xr-x  20    0    0  4096 Jan 20 23:11 ..
lrwxrwxrwx   1    0    0     7 Jan 10 02:08 bin -> usr/bin
drwxr-xr-x   3    0    0  4096 Jan 21 11:17 boot
drwxr-xr-x  17    0    0  3840 Jan 20 23:10 dev
drwxr-xr-x  95    0    0  4096 Jan 29 22:21 etc
drwxr-xr-x   4    0    0  4096 Jan 20 23:10 home
lrwxrwxrwx   1    0    0     7 Jan 10 02:08 lib -> usr/lib
lrwxrwxrwx   1    0    0     9 Jan 10 02:08 lib32 -> usr/lib32
lrwxrwxrwx   1    0    0     9 Jan 10 02:08 lib64 -> usr/lib64
lrwxrwxrwx   1    0    0    10 Jan 10 02:08 libx32 -> usr/libx32
drwx------   2    0    0 16384 Jan 10 02:10 lost+found
drwxr-xr-x   2    0    0  4096 Jan 10 02:08 media
drwxr-xr-x   2    0    0  4096 Jan 10 02:08 mnt
drwxr-xr-x   2    0    0  4096 Jan 10 02:08 opt
dr-xr-xr-x 167    0    0     0 Jan 20 23:10 proc
drwx------   4    0    0  4096 Jan 24 09:44 root
drwxr-xr-x  34    0    0  1040 Jan 30 12:52 run
lrwxrwxrwx   1    0    0     8 Jan 10 02:08 sbin -> usr/sbin
drwxr-xr-x   6    0    0  4096 Jan 10 02:09 snap
drwxr-xr-x   2    0    0  4096 Jan 10 02:08 srv
dr-xr-xr-x  13    0    0     0 Jan 20 23:10 sys
drwxrwxrwt  11    0    0  4096 Jan 30 12:52 tmp
drwxr-xr-x  14    0    0  4096 Jan 10 02:08 usr
drwxr-xr-x   1 1000 1000   160 Jan 23 01:42 vagrant
drwxr-xr-x  13    0    0  4096 Jan 10 02:09 var
bash-5.1#

Oops, we’ve got access to the host files: /proc/1/cwd is a link to the working directory of PID 1, which is the host’s root directory.

Chroot is not a security feature, and this is not the only way to break out of a chroot. You can read more about it in these articles:

Even though our container isn’t secure at all, it’s good enough for demonstration purposes, so let’s clean up and move on.

$ sudo umount ./container/proc
$ rm -rf ./container

BusyBox

We can avoid all this dynamic-library fuss if we use BusyBox.

$ mkdir -p ./container/bin
$ cp "$(which busybox)" ./container/bin/sh
$ sudo chroot ./container /bin/sh


BusyBox v1.30.1 (Ubuntu 1:1.30.1-7ubuntu3) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/ # ls /
bin
/ # exit

The BusyBox binary we copied is a statically linked executable, so it doesn’t need any libraries. If you don’t know about BusyBox, I’d recommend reading its man page.

Layers

Even putting a single binary like bash into an isolated root directory is quite a challenge. If you want to put something that doesn’t exist on your machine (some library), it’s even more difficult.

There is one thing that can help us: package managers. If we have a package manager in a container, we can use it to install additional programs and libraries.

We can create a root directory with only a package manager. This base root directory may be relatively small, but if we need to run a lot of containers, things can quickly get worse: each container needs its own copy of these base files, and creating them (i.e. creating a container) takes time and takes up disk space.

Luckily, we have a solution: the overlay filesystem. It can merge multiple directories into a single directory tree, and keep the base directory untouched.

mount \
	-t overlay \
	-o lowerdir=./lower1:./lower2,upperdir=./upper,workdir=./work \
	overlay \
	./mergeddir

When you read a file from ./mergeddir, the kernel looks for it first in ./upper, then in ./lower1, and then in ./lower2, and uses the first match. Writes go only to the ./upper directory: modifying a file that exists only in a lower directory first copies it up to ./upper. When you delete a file or directory, a special marker (a whiteout) is created in the upper directory. The workdir is an empty directory that the kernel uses internally; it must be on the same filesystem as upperdir. You can read more about overlayfs in the kernel documentation.
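
The lookup order can be imitated in user space without mounting anything. The following is a simplified sketch, not how the kernel implements it: it handles single files only, ignores directory merging and opaque directories, and assumes a stat that supports -c (GNU coreutils or BusyBox). The name overlay_lookup is hypothetical.

```shell
# overlay_lookup <relative-path> <upperdir> <lowerdir1> [<lowerdir2> ...]
# Prints the path of the copy that a merged view would expose, searching the
# directories in the order given. Hypothetical helper for illustration.
overlay_lookup() {
	local rel="$1" dir
	shift
	for dir in "$@"; do
		# A whiteout is a character device with device number 0/0; it hides
		# the file in every directory further down the list.
		if [ -c "$dir/$rel" ] && [ "$(stat -c '%t,%T' "$dir/$rel")" = "0,0" ]; then
			return 1
		fi
		if [ -e "$dir/$rel" ] || [ -L "$dir/$rel" ]; then
			printf '%s\n' "$dir/$rel"
			return 0
		fi
	done
	return 1
}
```

For the mount command above, overlay_lookup etc/passwd ./upper ./lower1 ./lower2 would name the copy that reads from ./mergeddir/etc/passwd see.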

It allows us to have only one copy of the base files, and use them in different containers. Then we can run a container with the base files, install some programs, and create a snapshot of the resulting files. These snapshots are layers.

This concept of layers is at the heart of containers. Note that lowerdir lists the layers in reverse order: the topmost layer comes first. Each layer overrides the content of the layers below it, so the second layer overrides the first.

[Figure: Layer 1, Layer 2, Layer 3, and the upper dir combine into the merged directory. Files shown: /bin/sh, /etc/passwd, /myapp.sh, /.env, and a new file.]
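
Because the order is easy to get wrong, it can help to build the lowerdir value from the layers listed bottom-up. A minimal sketch, with join_lowerdir as a hypothetical helper name:

```shell
# join_lowerdir <layer1> <layer2> ... (bottom layer first)
# Prints the value for the lowerdir= mount option, topmost layer first.
# Hypothetical helper for illustration.
join_lowerdir() {
	local lowerdir="" layer
	for layer in "$@"; do
		# Prepend each newer layer; add ":" only when something already follows.
		lowerdir="$layer${lowerdir:+:$lowerdir}"
	done
	printf '%s\n' "$lowerdir"
}
```

For example, join_lowerdir ./lower2 ./lower1 prints ./lower1:./lower2, the value used in the mount command above.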

Creating a container with multiple layers

Let’s create the layers of our container. We’ll use BusyBox this time.

mkdir ./data ./data/layer1 ./data/layer2 ./data/layer3

# layer 1
mkdir ./data/layer1/bin ./data/layer1/etc
cp "$(which busybox)" ./data/layer1/bin/sh
echo 'root:x:0:0:root:/:/bin/sh' >./data/layer1/etc/passwd

# layer 2
cat >./data/layer2/myapp.sh <<'END'
#!/bin/sh -x
echo .* *
. /.env
echo "$MSG"
END
chmod +x ./data/layer2/myapp.sh

# layer 3
mkdir ./data/layer3/etc
cp ./data/layer1/etc/passwd ./data/layer3/etc/passwd
echo 'user:x:1000:1000:user:/:/bin/sh' >>./data/layer3/etc/passwd
echo 'MSG="Hello, world!"' >./data/layer3/.env

Note that to add only one line, we copied the entire file. We would need to copy it even if we wanted to change permissions or other attributes.
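
The copy-then-modify pattern from layer 3 can be written as a small function. This is a sketch under the assumption that layers are plain directories, as in this article; append_in_layer is a hypothetical name.

```shell
# append_in_layer <lower-layer> <new-layer> <relative-path> <line>
# Adds one line to a file from a lower layer by duplicating the whole file
# into the new layer first. Hypothetical helper for illustration.
append_in_layer() {
	local lower="$1" new="$2" rel="$3" line="$4"
	mkdir -p "$new/$(dirname "$rel")"
	# -p keeps mode and timestamps; the lower layer itself is never modified.
	cp -p "$lower/$rel" "$new/$rel"
	printf '%s\n' "$line" >>"$new/$rel"
}
```

The layer 3 passwd file above corresponds to append_in_layer ./data/layer1 ./data/layer3 etc/passwd 'user:x:1000:1000:user:/:/bin/sh'.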

We also need an upperdir (we’ll call it diff), workdir, and mergeddir.

mkdir ./data/container ./data/container/diff ./data/container/work \
      ./data/container/merged

Let’s run it.

LOWERDIR="./data/layer3:./data/layer2:./data/layer1"
UPPERDIR="./data/container/diff"
WORKDIR="./data/container/work"
sudo mount \
	-t overlay \
	-o "lowerdir=$LOWERDIR,upperdir=$UPPERDIR,workdir=$WORKDIR" \
	overlay \
	./data/container/merged
sudo chroot ./data/container/merged /myapp.sh

The last command should print:

+ echo . .. .env bin etc myapp.sh
. .. .env bin etc myapp.sh
+ . /.env
+ MSG='Hello, world!'
+ echo 'Hello, world!'
Hello, world!

We’ve created a container that has 3 layers.

As a final step, we must clean up after ourselves.

sudo umount ./data/container/merged
rm -rf ./data

Tool for creating layers and running containers

Let’s create a bash script wiac (what-is-a-container) that automates everything we did before.

wiac:

#!/bin/bash -eu
TMPDIR="${TMPDIR:-/tmp}"
DATADIR="${DATADIR:-"$TMPDIR/wiac-data"}"

err() { echo "$*" >&2; exit 1; }

copy() {
	if (( $# < 3 )); then
		err "usage: copy <layerdir> <src1> [<src2> ...] <dst>"
	fi
	local layerdir="$1"
	local src=("${@:2:$#-2}")
	local dst="${@:$#}"
	mkdir -p "$layerdir/diff/${dst%/*}"
	cp -R "${src[@]}" "$layerdir/diff/$dst"
}

run() {
	if (( $# < 3 )); then
		err "usage: run <lowerdirN>[...:<lowerdir1>] <containerlayer> <entrypoint...>"
	fi
	mkdir -p "$2/diff" "$2/work" "$2/merged"
	local mergeddir="$2/merged"
	sudo mount -t overlay \
		-o "lowerdir=$1,upperdir=$2/diff,workdir=$2/work" \
		overlay "$mergeddir"
	shift 2 # remove lowerdir and containerlayer from args
	(
		trap 'sudo umount "$mergeddir"' EXIT
		sudo chroot "$mergeddir" "$@"
	)
}

# preprocess_dockerfile removes comments and handles line continuations.
preprocess_dockerfile() {
	sed -n '
: again
/^#/d
/\\$/ {
	N
	s/\\\n//
	t again
}
/^[A-Z]/p
' "$@"
}

build() {
	if [[ $# != 1 ]]; then
		err "usage: build <path>"
	fi
	cd "$1"
	local n=1 lowerdir="" cmd
	while read -r line; do
		set -- $line
		cmd=$1; shift
		case "$cmd" in
		FROM)
			;;
		COPY)
			copy "$DATADIR/layer$n" "$@"
			if [ -z "$lowerdir" ]; then
				lowerdir="$DATADIR/layer$n/diff"
			else
				lowerdir="$DATADIR/layer$n/diff:$lowerdir"
			fi
			n=$((n+1))
			;;
		RUN)
			run "$lowerdir" "$DATADIR/layer$n" /bin/sh -c "$*"
			lowerdir="$DATADIR/layer$n/diff:$lowerdir"
			n=$((n+1))
			;;
		*)
			err "unexpected command $cmd"
			;;
		esac
	done < <(preprocess_dockerfile ./Dockerfile)
	echo "$lowerdir"
}

cmd=$1; shift
case "$cmd" in
copy|run|build)
	"$cmd" "$@"
	;;
*)
	err "unknown command $cmd"
	;;
esac
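
One detail of build is worth knowing: set -- $line leaves $line unquoted, so the shell not only splits it into words but also expands glob patterns such as .* against the build directory before the command reaches the container. The following is a minimal sketch of a glob-safe split, assuming it is acceptable to toggle the noglob option around the split; split_words is a hypothetical name and is not part of wiac.

```shell
# split_words <line>
# Prints the whitespace-separated words of the line, one per line, without
# pathname expansion. Hypothetical helper for illustration.
split_words() {
	set -f          # noglob: keep *, ? and [ literal during word splitting
	set -- $1       # intentionally unquoted: split on whitespace
	set +f
	printf '%s\n' "$@"
}
```

In build, wrapping set -- $line in set -f and set +f in the same way would pass .* through to /bin/sh -c unchanged.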

It can build a container from scratch.

mkdir -p ./src/bin ./src/etc
cp "$(which busybox)" ./src/bin/sh
echo 'root:x:0:0:root:/:/bin/sh' >./src/etc/passwd
cat >./src/Dockerfile <<'END'
FROM scratch
COPY . /
RUN printf '#!/bin/sh -x\necho .* *\n. ./.env\necho $MSG\n' >./myapp.sh && \
    chmod +x ./myapp.sh
RUN echo "user:x:1000:1000:user:/:/bin/sh" >>/etc/passwd && \
    echo 'MSG="Hello, world!"' >/.env
END
~$ cd ./src
~/src$ ../wiac build .
/tmp/wiac-data/layer3/diff:/tmp/wiac-data/layer2/diff:/tmp/wiac-data/layer1/diff
~/src$ ../wiac run /tmp/wiac-data/layer3/diff:/tmp/wiac-data/layer2/diff:/tmp/wiac-data/layer1/diff /tmp/upperlayer /myapp.sh
+ echo . .. Dockerfile bin etc myapp.sh
. .. Dockerfile bin etc myapp.sh
+ . ./.env
+ MSG='Hello, world!'
+ echo Hello, 'world!'
Hello, world!
~/src$

Reusing podman layers

$ podman run --rm -i -t node mount
...
overlay on / type overlay (rw,relatime,context="system_u:object_r:container_file_t:s0:c539,c879",lowerdir=/home/obulatov.linux/.local/share/containers/storage/overlay/l/PGV3FCQXCAYYYHBPDQLHTXJPWX:/home/obulatov.linux/.local/share/containers/storage/overlay/l/Q4GEF43UFNU4FNEAS7MRBU2FQC:/home/obulatov.linux/.local/share/containers/storage/overlay/l/ED34UM2AKWBESQJIMUQDTQQELT:/home/obulatov.linux/.local/share/containers/storage/overlay/l/KSWWLEPB242QCOOU3B5RS6HC5K:/home/obulatov.linux/.local/share/containers/storage/overlay/l/QU2NNNGE6BF434EXHCPCR4IQDX:/home/obulatov.linux/.local/share/containers/storage/overlay/l/VC2QU23NKXJVC45HPNSUIYS6NY:/home/obulatov.linux/.local/share/containers/storage/overlay/l/DGNGMQ7DLOSBUDMSIEAS2ZW7AB:/home/obulatov.linux/.local/share/containers/storage/overlay/l/4VV5PCPETKU3GOAMABZB544SK5:/home/obulatov.linux/.local/share/containers/storage/overlay/l/6VQMGZHRJAWC374ZJWASTEPROT,upperdir=/home/obulatov.linux/.local/share/containers/storage/overlay/1e236df8934fbff8de521d81813e79054c67e963daa6fa91c678d3b01ea55d6f/diff,workdir=/home/obulatov.linux/.local/share/containers/storage/overlay/1e236df8934fbff8de521d81813e79054c67e963daa6fa91c678d3b01ea55d6f/work,volatile,userxattr)
...
$ ./wiac run /home/obulatov.linux/.local/share/containers/storage/overlay/l/PGV3FCQXCAYYYHBPDQLHTXJPWX:/home/obulatov.linux/.local/share/containers/storage/overlay/l/Q4GEF43UFNU4FNEAS7MRBU2FQC:/home/obulatov.linux/.local/share/containers/storage/overlay/l/ED34UM2AKWBESQJIMUQDTQQELT:/home/obulatov.linux/.local/share/containers/storage/overlay/l/KSWWLEPB242QCOOU3B5RS6HC5K:/home/obulatov.linux/.local/share/containers/storage/overlay/l/QU2NNNGE6BF434EXHCPCR4IQDX:/home/obulatov.linux/.local/share/containers/storage/overlay/l/VC2QU23NKXJVC45HPNSUIYS6NY:/home/obulatov.linux/.local/share/containers/storage/overlay/l/DGNGMQ7DLOSBUDMSIEAS2ZW7AB:/home/obulatov.linux/.local/share/containers/storage/overlay/l/4VV5PCPETKU3GOAMABZB544SK5:/home/obulatov.linux/.local/share/containers/storage/overlay/l/6VQMGZHRJAWC374ZJWASTEPROT /tmp/upperdir npm --version
9.3.1
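
Copying the lowerdir value by hand is error-prone, so it can be extracted from the mount output instead. A minimal sketch, assuming the line format shown above; extract_lowerdir is a hypothetical helper name.

```shell
# extract_lowerdir
# Reads `mount` output on stdin and prints the lowerdir value of the overlay
# mounted on /. Hypothetical helper for illustration.
extract_lowerdir() {
	# The lowerdir value contains no commas, so it ends at the next "," or ")".
	sed -n '/^overlay on \/ /s/.*[(,]lowerdir=\([^,)]*\).*/\1/p'
}
```

Assuming the image layers stay in place after the container exits, the two steps above could then be combined as ./wiac run "$(podman run --rm node mount | extract_lowerdir)" /tmp/upperdir npm --version.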

The wiac tool can run it! If only it could pull images… Stay tuned.