Image Building

Dockerfile, BuildKit, and the OCI Image Format

A container image is, structurally, an ordered list of filesystem layers (tar archives, usually gzip-compressed) plus a JSON manifest and a JSON config. Each layer is content-addressed by its SHA-256 digest, so identical layers shared by different images are stored once. When you run an image, the runtime stacks the layers (on Linux, typically via OverlayFS) and gives the container a writable upper layer on top.
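The layer stacking described above can be pictured with a toy model. This is a minimal sketch, not how the kernel implements it: real OverlayFS works on directories and marks deletions with whiteout device files, whereas here each layer is a plain dict and the names `WHITEOUT` and `lookup` are made up for illustration.

```python
from typing import Optional

# Stand-in for an OverlayFS whiteout: "this path was deleted in this layer".
WHITEOUT = object()

def lookup(path: str, layers: list) -> Optional[bytes]:
    """Resolve a path against layers ordered bottom (index 0) to top."""
    for layer in reversed(layers):  # the topmost layer that knows the path wins
        if path in layer:
            entry = layer[path]
            return None if entry is WHITEOUT else entry
    return None

# Two read-only image layers plus the per-container writable layer.
base = {"/etc/os-release": b"debian", "/bin/sh": b"shell"}
app = {"/app/server.js": b"v1"}
upper = {}

layers = [base, app, upper]

# Writes and deletes only ever touch the upper layer.
upper["/app/server.js"] = b"v1-patched"  # modified copy shadows the lower file
upper["/bin/sh"] = WHITEOUT              # deletion hides the lower file

print(lookup("/app/server.js", layers))  # the container sees the patched copy
print(lookup("/bin/sh", layers))         # the container sees no such file
print(app["/app/server.js"])             # the image layer itself is unchanged
```

Because the image layers are never modified, many containers can share them while each keeps its own upper layer.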

Building an image means producing those layers efficiently. Modern builds use BuildKit (parallel stage execution, advanced caching), multi-stage Dockerfiles (compile in a fat builder, copy into a thin runtime), and distroless or scratch bases for minimum attack surface. buildx handles multi-architecture builds; cosign signs the result.

OCI Image Layout

manifest.json: references config + layers by digest
config.json: env, cmd, exposed ports, history
Layers (each is content-addressed, sha256):
  • debian:base.tar.gz
  • apt-installed.tar.gz
  • node_modules.tar.gz
  • app-source.tar.gz
cosign signature: stored as a separate OCI artifact, referenced via the OCI 1.1 referrers mechanism
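To make "references by digest" concrete, here is a rough sketch that assembles a manifest from in-memory blobs. The media type strings come from the OCI image spec; the blob contents, the trimmed-down config, and the helper names `digest` and `descriptor` are illustrative assumptions, and real layers would be tar archives rather than short byte strings.

```python
import hashlib
import json

def digest(blob: bytes) -> str:
    """Content address of a blob, in the sha256:<hex> form used by registries."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()

def descriptor(media_type: str, blob: bytes) -> dict:
    """A descriptor points at a blob by media type, digest, and size."""
    return {"mediaType": media_type, "digest": digest(blob), "size": len(blob)}

# Stand-in bytes; real layers are (compressed) tar archives.
layer_blobs = [b"debian base", b"apt packages", b"node_modules", b"app source"]

# Heavily trimmed config; a real one also records rootfs diff IDs and history.
config_blob = json.dumps({
    "architecture": "amd64",
    "os": "linux",
    "config": {"Env": ["NODE_ENV=production"], "Cmd": ["node", "server.js"]},
}).encode()

manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": descriptor("application/vnd.oci.image.config.v1+json", config_blob),
    "layers": [
        descriptor("application/vnd.oci.image.layer.v1.tar+gzip", blob)
        for blob in layer_blobs
    ],
}

# A blob store keyed by digest: storing the same layer twice keeps one copy.
store = {}
for blob in layer_blobs + [layer_blobs[0]]:
    store[digest(blob)] = blob

print(json.dumps(manifest, indent=2))
print(len(store), "unique blobs stored")
```

Since the key is derived from the content, two images built on the same base layer end up pointing at the same stored blob.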

Key Numbers

  • overlay2 max layers: 128
  • distroless/static base size: 2 MB
  • Build speedup from BuildKit caching: 5-10x
  • BuildKit became default: 2023 (Docker 23.0)
  • Image-spec formats (compatible): Docker, OCI
  • Content addressing for every blob: sha256:

The Cache: Why Order Matters

# BAD — every source change rebuilds node_modules
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "server.js"]

# GOOD — manifest first, then install, then source
FROM node:20
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install --omit=dev
COPY . .
CMD ["node", "server.js"]

# Cache key for each layer is computed from:
#   - the instruction text
#   - the contents of files referenced by COPY/ADD (hashed)
#   - parent layer's digest
# Anything that changes the cache key invalidates this and all later layers.
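The chaining rule in the comments above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not BuildKit's actual key derivation: the base image name stands in for the parent digest of the first step, and `cache_key`, `build_keys`, and `reused` are hypothetical helpers.

```python
import hashlib

def cache_key(parent: str, instruction: str, files: tuple = ()) -> str:
    """Hash the parent key, the instruction text, and any copied file contents."""
    h = hashlib.sha256()
    h.update(parent.encode())
    h.update(instruction.encode())
    for content in files:
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()

def build_keys(steps, base="node:20"):
    """Chain the keys: each step's key becomes the next step's parent."""
    keys, parent = [], base
    for instruction, files in steps:
        parent = cache_key(parent, instruction, files)
        keys.append(parent)
    return keys

def bad_order(manifest: bytes, source: bytes):
    return [("COPY . .", (manifest, source)), ("RUN npm install", ())]

def good_order(manifest: bytes, source: bytes):
    return [
        ("COPY package.json package-lock.json ./", (manifest,)),
        ("RUN npm install --omit=dev", ()),
        ("COPY . .", (manifest, source)),
    ]

def reused(before, after):
    """Count leading steps whose keys are unchanged, i.e. cache hits."""
    hits = 0
    for old, new in zip(before, after):
        if old != new:
            break
        hits += 1
    return hits

manifest = b'{"dependencies": {}}'

# Same dependency manifest, edited source file (v1 -> v2).
bad_hits = reused(build_keys(bad_order(manifest, b"v1")),
                  build_keys(bad_order(manifest, b"v2")))
good_hits = reused(build_keys(good_order(manifest, b"v1")),
                   build_keys(good_order(manifest, b"v2")))

print("bad order, cached steps:", bad_hits)    # 0
print("good order, cached steps:", good_hits)  # 2
```

In the good ordering the source edit only changes the key of the final COPY, so the install step is served from cache.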

Multi-Stage Builds

# Stage 1: heavy builder
FROM golang:1.22 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app

# Stage 2: tiny runtime
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]

# Result: a ~12 MB image instead of ~1.5 GB.
# 'builder' is discarded — only its /app file ends up in the final image.

# Build a specific stage (useful for debugging)
$ docker build --target builder -t myapp-builder .

BuildKit: Mounts, Secrets, Cache

# syntax=docker/dockerfile:1.6
FROM golang:1.22
WORKDIR /src
COPY . .

# Cache the Go build cache and the module download dir between builds
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    go build -o /app .

# Inject a build-time secret without it ending up in any layer
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm install --registry=...

# SSH forwarding for private repos
RUN --mount=type=ssh git clone git@github.com:org/private-repo.git

# Build with secrets and SSH (the token is read from the NPM_TOKEN environment variable)
$ docker build --secret id=npm_token,env=NPM_TOKEN \
              --ssh default \
              -t myapp .

# Inspect cache usage
$ docker buildx du --verbose

buildx: Multi-Architecture Builds

# Create a builder that supports multiple platforms
$ docker buildx create --name multi --driver docker-container --use
$ docker buildx inspect --bootstrap

# Build for both amd64 and arm64 in one command
$ docker buildx build \
    --platform linux/amd64,linux/arm64 \
    -t myorg/app:v1.0 \
    --push .

# Under the hood:
# - amd64 builds natively
# - arm64 builds via QEMU emulation (slow) or remote arm builder (fast)
# - Both get tagged into a multi-arch manifest in the registry

# Verify what was pushed
$ docker buildx imagetools inspect myorg/app:v1.0
Name:      docker.io/myorg/app:v1.0
MediaType: application/vnd.oci.image.index.v1+json
Manifests:
  Name:      docker.io/myorg/app:v1.0@sha256:...
  Platform:  linux/amd64
  Name:      docker.io/myorg/app:v1.0@sha256:...
  Platform:  linux/arm64
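The index shown above is just a list of per-platform manifests, and the pulling client picks the entry that matches its own platform. A minimal sketch of that selection follows; the digests are placeholders, `select_manifest` is a hypothetical helper, and real clients apply more elaborate matching (for example CPU variants such as arm/v7).

```python
from typing import Optional

# Shape of an OCI image index; the digests are placeholders, not real values.
INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "a" * 64,
            "platform": {"os": "linux", "architecture": "amd64"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "b" * 64,
            "platform": {"os": "linux", "architecture": "arm64"},
        },
    ],
}

def select_manifest(index: dict, os_name: str, arch: str) -> Optional[str]:
    """Return the digest of the first manifest matching the given platform."""
    for entry in index["manifests"]:
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return entry["digest"]
    return None  # no image was published for this platform

print(select_manifest(INDEX, "linux", "arm64"))
print(select_manifest(INDEX, "linux", "riscv64"))
```

The tag stays the same for every architecture; only the manifest that gets resolved from the index differs.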

Distroless and scratch

# scratch — empty base. Only works for static binaries.
FROM scratch
COPY app /app
ENTRYPOINT ["/app"]
# Final image: ~5 MB (just the binary)

# distroless — minimal but with /etc/passwd, /etc/ssl/certs, tzdata
FROM gcr.io/distroless/base-debian12
COPY app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]
# Final image: ~25 MB (binary + minimal libs + CA certs)

# Variants
gcr.io/distroless/static-debian12   # Go, Rust static
gcr.io/distroless/base-debian12     # glibc-linked binaries
gcr.io/distroless/cc-debian12       # C/C++ runtime
gcr.io/distroless/python3-debian12  # Python
gcr.io/distroless/java21-debian12   # Java 21 runtime

# Debugging without a shell — use a debug image variant or k8s ephemeral container
$ docker run -it --entrypoint=sh gcr.io/distroless/...:debug

Signing with cosign

# Generate a key pair (or use keyless OIDC)
$ cosign generate-key-pair

# Sign an image
$ cosign sign --key cosign.key myorg/app:v1.0
$ cosign sign myorg/app:v1.0       # keyless, prompts for OIDC provider

# Verify
$ cosign verify --key cosign.pub myorg/app:v1.0
Verification for myorg/app:v1.0 --
The cosign claims were validated
The signatures were verified against the specified public key

# Kubernetes admission policy (sigstore policy-controller)
apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: require-cosign
spec:
  images:
    - glob: "myorg/*"
  authorities:
    - key:
        data: |
          -----BEGIN PUBLIC KEY-----
          ...

Vulnerability Scanning

# Trivy — open source, scans OS packages and language deps
$ trivy image myorg/app:v1.0
myorg/app:v1.0 (alpine 3.19)
==============================
Total: 3 (HIGH: 2, CRITICAL: 1)

┌──────────┬───────────────┬──────────┬────────┬─────────┐
│ Library  │ Vulnerability │ Severity │ Status │ Fixed   │
├──────────┼───────────────┼──────────┼────────┼─────────┤
│ openssl  │ CVE-2024-...  │ CRITICAL │ fixed  │ 3.1.5   │
└──────────┴───────────────┴──────────┴────────┴─────────┘

# Grype, Snyk, Anchore — alternatives. All read SBOM (CycloneDX or SPDX).
# Generate SBOM
$ syft myorg/app:v1.0 -o cyclonedx-json > sbom.json

Tradeoffs

Modern build wins
  • BuildKit cache mounts make repeated builds 5-10x faster
  • Multi-stage shrinks images by 10-100x
  • Distroless eliminates entire classes of CVEs (no shell, no package manager)
  • Multi-arch via buildx is one flag, not a separate pipeline
Sharp edges
  • Cache invalidation surprises: one early COPY . . line ruins layer reuse
  • BuildKit features need # syntax= directives that older builders ignore silently
  • Distroless is hard to debug; keep a debug variant for ad-hoc poking
  • QEMU cross-builds are 5-30x slower than native — invest in native arm builders

Frequently Asked Questions

Why is my Dockerfile cache invalidating early?

Each Dockerfile instruction is a cache step (RUN, COPY, and ADD are the ones that produce filesystem layers); the cache is keyed by the instruction text plus, for COPY/ADD, the contents of the referenced files. If you 'COPY . /app' early, any file change invalidates that step and every later one, including an expensive RUN apt-get install. The fix: copy only the dependency manifest first (package.json, go.mod, requirements.txt), run the install, then copy the source. Source changes then leave the install layer cached, because the manifest files copied before it have not changed.

What does BuildKit do that the legacy builder didn't?

BuildKit is the modern Docker build engine and has been the default builder since Docker 23.0 (2023). Its main wins: parallel stage execution (independent stages run concurrently), better caching (cache directories that persist between builds, e.g. --mount=type=cache,target=/root/.cache), build secrets that never end up in a layer (--mount=type=secret), SSH forwarding for private repos (--mount=type=ssh), multi-platform builds via buildx, and content-addressed layer deduplication. The legacy builder executed one instruction at a time, in order, with a much simpler cache.
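Parallel stage execution follows from treating the Dockerfile as a dependency graph of stages. The sketch below is an assumption-laden illustration rather than BuildKit's parser: it only understands named stages and `COPY --from=<stage>`, the example Dockerfile is invented, and `stage_deps`, `needed`, and `waves` are hypothetical helpers. It uses `graphlib` from the Python 3.9+ standard library.

```python
import re
from graphlib import TopologicalSorter

# Invented example: 'lint' is not needed by 'runtime'.
DOCKERFILE = """\
FROM golang:1.22 AS builder
RUN go build -o /app .

FROM node:20 AS assets
RUN npm run build

FROM alpine:3.19 AS lint
RUN echo lint

FROM gcr.io/distroless/static-debian12 AS runtime
COPY --from=builder /app /app
COPY --from=assets /dist /static
"""

def stage_deps(dockerfile: str) -> dict:
    """Map each stage name to the set of earlier stages it depends on."""
    deps, current = {}, None
    for line in dockerfile.splitlines():
        m = re.match(r"FROM\s+(\S+)(?:\s+AS\s+(\S+))?", line, re.IGNORECASE)
        if m:
            base, name = m.group(1), m.group(2)
            current = name or f"stage{len(deps)}"
            # A stage may also be based on an earlier stage (FROM builder AS ...).
            deps[current] = {base} if base in deps else set()
            continue
        m = re.search(r"--from=(\S+)", line)
        if m and current is not None and m.group(1) in deps:
            deps[current].add(m.group(1))
    return deps

def needed(deps: dict, target: str) -> set:
    """Stages reachable from the target; everything else can be skipped."""
    seen, stack = set(), [target]
    while stack:
        stage = stack.pop()
        if stage not in seen:
            seen.add(stage)
            stack.extend(deps[stage])
    return seen

def waves(deps: dict, target: str) -> list:
    """Group the needed stages into waves that could run concurrently."""
    keep = needed(deps, target)
    sorter = TopologicalSorter({s: deps[s] & keep for s in keep})
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result

print(waves(stage_deps(DOCKERFILE), "runtime"))
# [['assets', 'builder'], ['runtime']]
```

Here builder and assets have no dependency on each other, so they land in the same wave, while lint is never scheduled because the target does not reach it.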

What is a multi-stage build?

A Dockerfile with multiple FROM lines. Each FROM starts a new stage; you can COPY files between stages with COPY --from. Typical pattern: a 'builder' stage with the full toolchain (compilers, build deps), then a thin runtime stage with just the binary. The final image contains only the last stage (or the stage selected with --target); the builder layers are discarded. This is how you go from a 1.5 GB Go build environment to a runtime image of roughly 12-15 MB containing little more than your binary.

Why use distroless images?

Distroless (Google's gcr.io/distroless/...) ships only your application's runtime dependencies: no shell and no package manager (so no busybox, no apt). Benefits: smaller image (10-50 MB), smaller attack surface (no /bin/sh for an attacker to drop into), fewer CVEs in your scanner reports. Drawbacks: harder to debug, since there is no shell to exec into; you need a sidecar with tools, a :debug image variant, 'docker debug', or kubectl ephemeral containers. Distroless images come in language-specific variants (distroless/java, distroless/nodejs, distroless/static for Go/Rust).

What does buildx do for multi-arch?

buildx is the Docker CLI plugin that drives BuildKit, including multi-platform builds. It uses QEMU-based emulation or remote native builders to produce images for multiple architectures (amd64, arm64, arm/v7) from a single Dockerfile. 'docker buildx build --platform linux/amd64,linux/arm64 -t myimage --push .' pushes a multi-arch image index (manifest list) to the registry. When a user pulls 'myimage' on an arm64 machine, the client reads the index and fetches the arm64 variant. Apple Silicon Macs and AWS Graviton instances turned this from nice-to-have into a requirement.

How does cosign signing work?

cosign (from the Sigstore project) signs container images and stores the signatures alongside the image in the registry as separate OCI artifacts. You either sign with a private key and verify with the matching public key, or sign keylessly with a short-lived certificate tied to an OIDC identity (GitHub Actions, Google account) and verify against that identity. This is critical for supply-chain security: a compromised registry can't silently substitute a malicious image, because the signature covers the image digest and would not verify for different content. Kubernetes admission controllers (Sigstore policy-controller, Kyverno) can require valid signatures before allowing images to run.
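The reason a swapped image fails verification is that the signed payload names the image by manifest digest. The sketch below covers only that digest-binding step and deliberately leaves out the cryptographic signature check, which is assumed to have already succeeded on the payload. The payload layout is modelled on the simple-signing format cosign uses; treat the exact field layout as an assumption, and the helper names as hypothetical.

```python
import hashlib
import json

def manifest_digest(manifest_bytes: bytes) -> str:
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()

def signing_payload(reference: str, manifest_bytes: bytes) -> bytes:
    """Build the document that gets signed; it pins the manifest digest."""
    payload = {
        "critical": {
            "identity": {"docker-reference": reference},
            "image": {"docker-manifest-digest": manifest_digest(manifest_bytes)},
            "type": "cosign container image signature",
        },
        "optional": None,
    }
    return json.dumps(payload, sort_keys=True).encode()

def digest_matches(payload: bytes, served_manifest: bytes) -> bool:
    """Check that the manifest the registry served is the one that was signed.

    Assumes the signature over `payload` was already verified elsewhere.
    """
    claimed = json.loads(payload)["critical"]["image"]["docker-manifest-digest"]
    return claimed == manifest_digest(served_manifest)

# Stand-in manifest bytes for the image that was built and signed.
original = b'{"layers": ["app-v1"]}'
payload = signing_payload("myorg/app", original)

print(digest_matches(payload, original))                     # True
print(digest_matches(payload, b'{"layers": ["backdoor"]}'))  # False
```

An attacker who controls the registry can serve different bytes under the same tag, but cannot make those bytes hash to the digest inside a payload they are unable to re-sign.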