CustomResourceDefinitions

The Extension Mechanism Behind Every Operator

A CRD lets you teach the Kubernetes API server about a new kind of object (a Database, a Certificate, an S3Bucket) and then write a controller that reconciles that object's spec into reality. Most Kubernetes functionality that lives outside the built-in API groups is delivered as CRDs: cert-manager, Argo CD, Tekton, Crossplane, most cloud database operators, and the Gateway API itself.

The CRD is just the schema. The behavior comes from a controller that watches your CR objects and acts on them. The pattern — define an API, watch it, reconcile — is so foundational that the Kubernetes community wraps it in a name: the operator pattern. controller-runtime (Go) and kubebuilder (its scaffolding tool) are the de facto way to build one.

Anatomy of a CRD

Key Numbers

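2017: CRDs introduced in Kubernetes 1.7, replacing ThirdPartyResources (TPRs)
1.16: v1 CRDs GA (2019)
~1.5 MiB: etcd's default request size limit, which caps the size of a single CR
Subresources: status and scale, both optional
Versions: multiple per CRD, exactly one storage version
~50 ms: added webhook latency per CR write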

A Real CRD Definition

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.acme.io
spec:
  group: acme.io
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: [db]
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required: [spec]
          properties:
            spec:
              type: object
              required: [engine, version, storage]
              properties:
                engine:
                  type: string
                  enum: [postgres, mysql, mariadb]
                version:
                  type: string
                  pattern: '^[0-9]+\.[0-9]+$'
                storage:
                  type: object
                  required: [size]
                  properties:
                    size:
                      type: string
                      pattern: '^[0-9]+(Mi|Gi|Ti)$'
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 9
                  default: 1
            status:
              type: object
              properties:
                phase: { type: string }
                replicas: { type: integer }   # read by the scale subresource via statusReplicasPath
                conditions:
                  type: array
                  items:
                    type: object
                    required: [type, status]
                    properties:
                      type: { type: string }
                      status: { type: string, enum: ["True", "False", "Unknown"] }   # quoted so YAML does not parse them as booleans
                      message: { type: string }
      subresources:
        status: {}
        scale:
          specReplicasPath: .spec.replicas
          statusReplicasPath: .status.replicas
      additionalPrinterColumns:
        - name: Engine
          type: string
          jsonPath: .spec.engine
        - name: Phase
          type: string
          jsonPath: .status.phase
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp
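
The schema above maps fairly directly onto Go types. What follows is a minimal sketch using only the standard library; the type names (Database, DatabaseSpec, StorageSpec) and the decodeDatabase helper are illustrative assumptions, and a real project would embed metav1.TypeMeta and metav1.ObjectMeta from k8s.io/apimachinery instead of the simplified fields used here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StorageSpec mirrors spec.storage in the CRD schema.
type StorageSpec struct {
	Size string `json:"size,omitempty"`
}

// DatabaseSpec mirrors spec; JSON tags follow the schema's property names.
type DatabaseSpec struct {
	Engine  string      `json:"engine"`
	Version string      `json:"version"`
	Storage StorageSpec `json:"storage"`
	// A pointer distinguishes "not set" (nil) from an explicit value.
	// The API server would fill in the schema default of 1.
	Replicas *int `json:"replicas,omitempty"`
}

// Database is a simplified stand-in for the full object (assumption:
// metadata is reduced to a flat string map for this sketch).
type Database struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   map[string]string `json:"metadata"`
	Spec       DatabaseSpec      `json:"spec"`
}

// decodeDatabase parses a JSON manifest into the typed struct.
func decodeDatabase(raw []byte) (Database, error) {
	var db Database
	err := json.Unmarshal(raw, &db)
	return db, err
}

func main() {
	raw := []byte(`{
	  "apiVersion": "acme.io/v1",
	  "kind": "Database",
	  "metadata": {"name": "main", "namespace": "default"},
	  "spec": {"engine": "postgres", "version": "16.2", "storage": {"size": "20Gi"}}
	}`)
	db, err := decodeDatabase(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(db.Kind, db.Spec.Engine, db.Spec.Storage.Size, db.Spec.Replicas == nil)
}
```

In a kubebuilder project the direction is reversed: you write the Go types and the CRD YAML is generated from them.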

Validation: OpenAPI v3

# What you can express in the schema
type: object
required: [name, port]
properties:
  name:
    type: string
    minLength: 1
    maxLength: 63
    pattern: '^[a-z]([a-z0-9-]*[a-z0-9])?$'
  port:
    type: integer
    minimum: 1
    maximum: 65535
  protocol:
    type: string
    enum: [TCP, UDP, SCTP]
    default: TCP
  tags:
    type: array
    maxItems: 10
    items:
      type: string
  metadata:
    type: object
    additionalProperties:
      type: string

# Plus CEL validation rules (beta and on by default since 1.25, GA in 1.29)
x-kubernetes-validations:
  - rule: "self.replicas <= self.maxReplicas"
    message: "replicas cannot exceed maxReplicas"
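
To make the constraints concrete, here is a rough local equivalent of the name, port, protocol, and tags rules from the schema above. It is only a sketch of the checks the API server performs for you: the Endpoint type and the applyDefaults and validate functions are hypothetical names, not part of any Kubernetes library.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the schema's "name" property.
var namePattern = regexp.MustCompile(`^[a-z]([a-z0-9-]*[a-z0-9])?$`)

// Endpoint is a hypothetical type matching the example schema.
type Endpoint struct {
	Name     string
	Port     int
	Protocol string
	Tags     []string
}

// applyDefaults mimics "default: TCP" for an unset protocol.
func applyDefaults(e *Endpoint) {
	if e.Protocol == "" {
		e.Protocol = "TCP"
	}
}

// validate returns one message per violated constraint; an empty
// result means the object would pass these particular checks.
func validate(e Endpoint) []string {
	var problems []string
	switch {
	case len(e.Name) < 1 || len(e.Name) > 63:
		problems = append(problems, "name: must be 1 to 63 characters")
	case !namePattern.MatchString(e.Name):
		problems = append(problems, "name: must match "+namePattern.String())
	}
	if e.Port < 1 || e.Port > 65535 {
		problems = append(problems, fmt.Sprintf("port: %d is outside 1..65535", e.Port))
	}
	switch e.Protocol {
	case "TCP", "UDP", "SCTP":
	default:
		problems = append(problems, fmt.Sprintf("protocol: %q is not one of TCP, UDP, SCTP", e.Protocol))
	}
	if len(e.Tags) > 10 {
		problems = append(problems, "tags: at most 10 items")
	}
	return problems
}

func main() {
	e := Endpoint{Name: "Bad_Name", Port: 70000}
	applyDefaults(&e)
	for _, p := range validate(e) {
		fmt.Println(p)
	}
}
```

The point of putting these rules in the CRD is that no client or controller has to carry code like this; invalid objects never reach storage.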

The Reconcile Loop (controller-runtime)

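// kubebuilder generates this skeleton
package controller

import (
    "context"
    "time"

    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    acmev1 "acme.io/database-operator/api/v1" // hypothetical module path for the generated API types
)

// DatabaseReconciler reconciles Database objects. cleanupExternalResources and
// ensureStatefulSet are helper methods assumed to be defined elsewhere in this package.
type DatabaseReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// The result list must start on the same line as the closing parenthesis,
// otherwise Go inserts a semicolon there and the file does not compile.
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {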

    var db acmev1.Database
    if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Handle deletion via finalizer
    if !db.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(&db, "acme.io/finalizer") {
            if err := r.cleanupExternalResources(ctx, &db); err != nil {
                return ctrl.Result{}, err
            }
            controllerutil.RemoveFinalizer(&db, "acme.io/finalizer")
            return ctrl.Result{}, r.Update(ctx, &db)
        }
        return ctrl.Result{}, nil
    }

    // Add finalizer if missing
    if !controllerutil.ContainsFinalizer(&db, "acme.io/finalizer") {
        controllerutil.AddFinalizer(&db, "acme.io/finalizer")
        return ctrl.Result{}, r.Update(ctx, &db)
    }

    // Reconcile: ensure StatefulSet, Service, PVC exist and match spec
    if err := r.ensureStatefulSet(ctx, &db); err != nil {
        return ctrl.Result{}, err
    }

    // Update status (subresource — won't conflict with kubectl apply)
    db.Status.Phase = "Running"
    return ctrl.Result{RequeueAfter: time.Minute}, r.Status().Update(ctx, &db)
}

Conversion Webhooks

# CRD with two versions and a conversion webhook
spec:
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: [v1]
      clientConfig:
        service:
          namespace: my-system
          name: my-webhook-service
          path: /convert
        caBundle: <base64 CA>
  versions:
    - name: v1alpha1
      served: true
      storage: false       # served but not the storage version
      schema: ...
    - name: v1
      served: true
      storage: true        # all CRs persisted as v1
      schema: ...

# Webhook receives ConversionReview, returns translated objects
# Critical: conversion must be lossless and idempotent
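
As an illustration of what "lossless" means here, the sketch below converts between two invented versions of the Database spec. Everything about the shapes is an assumption made for the example: v1alpha1 is imagined to have a flat storageSize field and no replicas, v1 has nested storage.size plus replicas, and the v1-only value is parked in a hypothetical annotation when converting down so that it survives a round trip.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical annotation key used to carry a v1-only field through v1alpha1.
const replicasAnnotation = "acme.io/v1-replicas"

// V1Alpha1 is an invented older shape: flat storage size, no replicas.
type V1Alpha1 struct {
	Annotations map[string]string
	Engine      string
	StorageSize string
}

type Storage struct{ Size string }

// V1 is the invented storage version.
type V1 struct {
	Annotations map[string]string
	Engine      string
	Storage     Storage
	Replicas    int
}

// copyAnnotations returns a fresh map, omitting the key named by skip.
func copyAnnotations(in map[string]string, skip string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		if k != skip {
			out[k] = v
		}
	}
	return out
}

// toV1Alpha1 converts down and stashes replicas in an annotation.
func toV1Alpha1(in V1) V1Alpha1 {
	ann := copyAnnotations(in.Annotations, replicasAnnotation)
	ann[replicasAnnotation] = strconv.Itoa(in.Replicas)
	return V1Alpha1{Annotations: ann, Engine: in.Engine, StorageSize: in.Storage.Size}
}

// toV1 converts up, restoring replicas from the annotation if present
// and falling back to 1 for objects that were created as v1alpha1.
func toV1(in V1Alpha1) V1 {
	out := V1{
		Annotations: copyAnnotations(in.Annotations, replicasAnnotation),
		Engine:      in.Engine,
		Storage:     Storage{Size: in.StorageSize},
		Replicas:    1,
	}
	if raw, ok := in.Annotations[replicasAnnotation]; ok {
		if n, err := strconv.Atoi(raw); err == nil {
			out.Replicas = n
		}
	}
	return out
}

func main() {
	orig := V1{Engine: "postgres", Storage: Storage{Size: "20Gi"}, Replicas: 3}
	back := toV1(toV1Alpha1(orig))
	fmt.Println(back.Engine, back.Storage.Size, back.Replicas)
}
```

A real webhook would wrap functions like these in a handler that decodes a ConversionReview request and returns the converted objects in the response.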

Status Subresource and Finalizers

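# Update status without touching spec (won't fight 'kubectl apply').
# A partial body like this one needs PATCH; a PUT expects the complete object
# (apiVersion, kind, metadata) alongside status.
PATCH /apis/acme.io/v1/namespaces/default/databases/main/status
Content-Type: application/merge-patch+json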
{
  "status": {
    "phase": "Running",
    "replicas": 3,
    "conditions": [
      { "type": "Ready", "status": "True", "lastTransitionTime": "..." }
    ]
  }
}
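
Conditions like the one in this body are usually upserted by type, with lastTransitionTime changing only when the status value actually flips. The sketch below shows that logic with standard-library types; the Condition struct and setCondition function are simplified stand-ins written for this example (in real controllers, meta.SetStatusCondition from k8s.io/apimachinery plays this role for metav1.Condition).

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Message            string
	LastTransitionTime time.Time
}

// setCondition inserts or updates the condition with the same Type.
// The transition time moves only when Status changes, so repeated
// reconciles that observe the same state do not churn the object.
func setCondition(conds []Condition, c Condition, now time.Time) []Condition {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		if conds[i].Status != c.Status {
			conds[i].Status = c.Status
			conds[i].LastTransitionTime = now
		}
		conds[i].Message = c.Message
		return conds
	}
	c.LastTransitionTime = now
	return append(conds, c)
}

func main() {
	start := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False", Message: "starting"}, start)
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False", Message: "still starting"}, start.Add(time.Minute))
	conds = setCondition(conds, Condition{Type: "Ready", Status: "True", Message: "all replicas up"}, start.Add(2*time.Minute))
	fmt.Println(len(conds), conds[0].Status, conds[0].LastTransitionTime.Sub(start))
}
```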

# Finalizer prevents deletion until cleanup completes
$ kubectl get database main -o yaml | grep finalizers -A1
  finalizers:
  - acme.io/finalizer

# Even after 'kubectl delete', the object stays until your controller
# does its cleanup and removes the finalizer entry.

Operator Pattern in Practice

cert-manager — Certificate, Issuer, ClusterIssuer CRDs; controllers talk to Let's Encrypt, Vault, or self-signed CAs to mint TLS certs
Argo CD — Application, AppProject CRDs; controller pulls from Git and reconciles cluster state to match
Crossplane — Compositions and XRs let you provision cloud infrastructure declaratively from inside Kubernetes
Postgres operators (Zalando, CrunchyData, CloudNativePG) — handle replication, backups, failover, version upgrades
External-Secrets Operator — sync secrets from AWS/GCP/Vault into Kubernetes Secrets

Tradeoffs

When CRDs win

Domain-specific abstractions (Database, Cluster, Certificate)
Reusable operational knowledge — codify runbooks
kubectl works for free (kubectl get/describe/edit)
Native RBAC, audit logging, and versioning come along

Sharp edges

Every CR adds load to etcd and the API server's watch cache
Validation and conversion webhooks add latency to every write
Schema migrations across versions are hard
etcd's request size limit (1.5 MiB by default) caps how big a CR can be

Frequently Asked Questions

What's a finalizer for?

A finalizer is a string in metadata.finalizers[] that prevents an object from being deleted until the finalizer is removed. When you 'kubectl delete', the API server sets deletionTimestamp but keeps the object alive. Each controller responsible for cleanup notices the deletionTimestamp, does its work (e.g., delete the cloud resource the CR represents), and removes its finalizer entry. When the list is empty, the API server actually deletes the object. This guarantees you don't orphan external resources when a CR is deleted.
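
The lifecycle described above can be modeled in a few lines. This is a toy simulation, not Kubernetes code: Store, Object, and their methods are hypothetical names, and the Deleting flag stands in for a non-nil metadata.deletionTimestamp.

```go
package main

import "fmt"

// Object is a toy stand-in for a custom resource's metadata.
type Object struct {
	Name       string
	Finalizers []string
	Deleting   bool // stands in for metadata.deletionTimestamp being set
}

// Store plays the role of the API server's storage.
type Store map[string]*Object

// Delete marks the object for deletion. It is removed immediately
// only if no finalizers are present.
func (s Store) Delete(name string) {
	if obj, ok := s[name]; ok {
		obj.Deleting = true
		s.collect(name)
	}
}

// RemoveFinalizer is what a controller does after finishing cleanup.
func (s Store) RemoveFinalizer(name, finalizer string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	kept := obj.Finalizers[:0]
	for _, f := range obj.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.Finalizers = kept
	s.collect(name)
}

// collect removes the object once it is marked for deletion and its
// finalizer list is empty.
func (s Store) collect(name string) {
	if obj := s[name]; obj != nil && obj.Deleting && len(obj.Finalizers) == 0 {
		delete(s, name)
	}
}

func main() {
	s := Store{"main": {Name: "main", Finalizers: []string{"acme.io/finalizer"}}}
	s.Delete("main")
	_, stillThere := s["main"]
	fmt.Println("after delete, object present:", stillThere)
	s.RemoveFinalizer("main", "acme.io/finalizer")
	_, stillThere = s["main"]
	fmt.Println("after cleanup, object present:", stillThere)
}
```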

Why does my CR's status keep getting reset?

If you don't enable the status subresource (subresources.status: {} in the CRD), status is just a regular field: anyone updating the CR (especially kubectl apply) overwrites it. Enable the status subresource and the API server exposes a separate /status endpoint (the /scale endpoint comes from the separate scale subresource). Your controller updates only /status; writes to the main endpoint, including kubectl apply, have their status changes ignored. This separation is essential for any controller that publishes computed state.
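
A small model of that separation, assuming the status subresource is enabled: a write to the main endpoint keeps the stored status and bumps metadata.generation when spec changes, while a write to /status changes only status. The Object type and the updateMain and updateStatus functions are hypothetical and exist only for this sketch.

```go
package main

import "fmt"

// Object reduces a custom resource to the parts that matter here.
type Object struct {
	Spec       string
	Status     string
	Generation int64
}

// updateMain mimics a write to the main resource endpoint: any status
// in the request is ignored, and generation increases on a spec change.
func updateMain(stored, incoming Object) Object {
	out := stored
	if incoming.Spec != stored.Spec {
		out.Spec = incoming.Spec
		out.Generation++
	}
	return out
}

// updateStatus mimics a write to the /status endpoint: only status is
// taken from the request, spec and generation stay as stored.
func updateStatus(stored, incoming Object) Object {
	out := stored
	out.Status = incoming.Status
	return out
}

func main() {
	stored := Object{Spec: "replicas=1", Status: "Running", Generation: 1}

	// A user applies a manifest with a new spec and no status.
	stored = updateMain(stored, Object{Spec: "replicas=3"})
	fmt.Printf("%+v\n", stored)

	// The controller reports progress through /status.
	stored = updateStatus(stored, Object{Spec: "ignored", Status: "Scaling"})
	fmt.Printf("%+v\n", stored)
}
```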

What does a conversion webhook do?

When your CRD has multiple versions (v1alpha1, v1beta1, v1), the API server needs to convert between them on read and write so clients see the version they asked for. Conversion can be 'None' (the API server only rewrites the apiVersion field and leaves every other field untouched, so it works only when the versions' schemas are compatible and nothing changes meaning) or 'Webhook': your code, called by the API server, that translates between versions. Webhooks let you rename fields, restructure data, and derive new fields. The cost: every read or write that crosses versions makes a network call to your webhook.

Why use kubebuilder vs raw client-go?

kubebuilder (and the underlying controller-runtime library) handles the boilerplate that any controller needs: leader election, informers (cached lookups + change events), workqueues, retry logic, scheme registration. Writing a controller from scratch with client-go is doable but you re-implement the same patterns. kubebuilder also generates CRD YAML, deepcopy methods, RBAC roles, and webhook scaffolding from your Go types and markers (// +kubebuilder:validation:...). Use raw client-go only when you need very specific behavior that controller-runtime constrains.
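
To give a feel for the boilerplate involved, here is a deliberately tiny sketch of two of those pieces: a queue that collapses duplicate keys, and per-key exponential backoff for retries. The Queue type and its methods are invented for this example and are not the client-go workqueue API, which also handles concurrency, in-flight items, and rate limiting.

```go
package main

import (
	"fmt"
	"time"
)

// Queue is a single-goroutine FIFO that drops keys already waiting.
type Queue struct {
	items    []string
	pending  map[string]bool
	failures map[string]int
}

func NewQueue() *Queue {
	return &Queue{pending: map[string]bool{}, failures: map[string]int{}}
}

// Add enqueues key unless it is already waiting, so a burst of events
// for one object results in a single reconcile.
func (q *Queue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.items = append(q.items, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *Queue) Get() (key string, ok bool) {
	if len(q.items) == 0 {
		return "", false
	}
	key = q.items[0]
	q.items = q.items[1:]
	delete(q.pending, key)
	return key, true
}

// Backoff records a failure for key and returns how long to wait
// before retrying: base doubled per previous failure, capped at max.
func (q *Queue) Backoff(key string, base, max time.Duration) time.Duration {
	n := q.failures[key]
	q.failures[key]++
	d := base
	for i := 0; i < n; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

// Forget clears the failure count after a successful reconcile.
func (q *Queue) Forget(key string) { delete(q.failures, key) }

func main() {
	q := NewQueue()
	q.Add("default/main")
	q.Add("default/main") // duplicate, dropped
	q.Add("default/replica")
	for key, ok := q.Get(); ok; key, ok = q.Get() {
		fmt.Println("reconcile", key)
	}
	for i := 0; i < 4; i++ {
		fmt.Println(q.Backoff("default/main", 5*time.Millisecond, 30*time.Millisecond))
	}
}
```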

Should I use OpenAPI v3 schema or just YAML?

Always use OpenAPI v3 schema. Without one, anyone can put any junk in spec: no validation, no kubectl explain output, no field pruning, no defaulting. With one, the API server enforces field types, required fields, ranges, and regex patterns; kubectl explain produces docs from your descriptions; unknown fields are pruned before storage (and rejected outright when the client requests strict field validation, as recent kubectl versions do by default); defaults fill in. CRDs without schemas are a v1beta1 relic. Modern v1 CRDs require structural schemas.
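
Pruning and defaulting are easy to picture as a recursive walk over the object guided by the schema. The sketch below is a heavily simplified model, assuming a hypothetical Schema type that knows only about nested properties and defaults; the real API server works from the full structural schema and handles many more cases (arrays, additionalProperties, preserve-unknown-fields).

```go
package main

import "fmt"

// Schema is a drastically reduced stand-in for an OpenAPI v3 schema.
type Schema struct {
	Properties map[string]*Schema // non-nil for object-typed fields
	Default    any                // nil means "no default"
}

// pruneAndDefault drops fields the schema does not declare and fills
// in declared defaults for fields that are absent, recursing into
// nested objects.
func pruneAndDefault(obj map[string]any, s *Schema) {
	for key := range obj {
		if _, declared := s.Properties[key]; !declared {
			delete(obj, key) // deleting while ranging over a map is safe in Go
		}
	}
	for name, prop := range s.Properties {
		val, present := obj[name]
		if !present {
			if prop.Default != nil {
				obj[name] = prop.Default
			}
			continue
		}
		if child, ok := val.(map[string]any); ok && prop.Properties != nil {
			pruneAndDefault(child, prop)
		}
	}
}

func main() {
	schema := &Schema{Properties: map[string]*Schema{
		"engine":   {},
		"replicas": {Default: 1},
		"storage":  {Properties: map[string]*Schema{"size": {}}},
	}}
	spec := map[string]any{
		"engine":  "postgres",
		"replcas": 3, // typo: not in the schema, so it is pruned
		"storage": map[string]any{"size": "20Gi", "iops": 9000},
	}
	pruneAndDefault(spec, schema)
	fmt.Println(spec)
}
```

Note how the misspelled field silently disappears and replicas falls back to its default, which is exactly the kind of surprise strict field validation is meant to surface.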

What's the operator pattern?

An operator is a controller that encodes operational knowledge for an application — beyond what stock Deployments and StatefulSets can do. Examples: a Postgres operator that handles backups, upgrades, failover; a cert-manager operator that issues TLS certs by talking to Let's Encrypt. The pattern: define a CRD that represents the high-level concept (Postgres cluster, Certificate), write a controller that reconciles desired state to reality (provisioning Pods, configuring replication, requesting certs). Operators move admin runbooks into code.
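
At its core, "reconciles desired state to reality" means a function that compares the two and emits whatever actions close the gap, and that can be run again at any time with the same result. A minimal sketch with invented types (Desired, Observed) and an invented reconcile function that returns the actions as strings instead of calling an API:

```go
package main

import "fmt"

// Desired is what the custom resource's spec asks for.
type Desired struct{ Replicas int }

// Observed is what currently exists in the cluster.
type Observed struct{ Pods []string }

// reconcile returns the actions needed to move observed state toward
// desired state. It looks only at current state, never at what event
// triggered it, so running it twice in a row is harmless.
func reconcile(name string, want Desired, have Observed) []string {
	var actions []string
	// Scale up: create the missing ordinals.
	for i := len(have.Pods); i < want.Replicas; i++ {
		actions = append(actions, fmt.Sprintf("create %s-%d", name, i))
	}
	// Scale down: remove the highest ordinals first.
	for i := len(have.Pods) - 1; i >= want.Replicas && i >= 0; i-- {
		actions = append(actions, "delete "+have.Pods[i])
	}
	return actions
}

func main() {
	fmt.Println(reconcile("main", Desired{Replicas: 3}, Observed{Pods: []string{"main-0"}}))
	fmt.Println(reconcile("main", Desired{Replicas: 1}, Observed{Pods: []string{"main-0", "main-1", "main-2"}}))
	fmt.Println(reconcile("main", Desired{Replicas: 1}, Observed{Pods: []string{"main-0"}}))
}
```

A real operator adds the domain knowledge on top of this skeleton: ordering of upgrades, backups before destructive changes, failover decisions.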