Storage

PV, PVC, StorageClass, CSI, and Volume Snapshots

Kubernetes' storage model layers four objects: StorageClass describes a kind of storage and how to provision it; PersistentVolumeClaim is a user's request for some; PersistentVolume is the resulting piece of provisioned storage; and the Pod mounts the PVC like a regular volume. The plumbing between cloud APIs and this model is the Container Storage Interface (CSI).

The model has expanded over the years to cover snapshots, online expansion, ReadWriteMany shared filesystems, raw block devices, ephemeral CSI-backed scratch volumes, and topology-aware scheduling that ensures a Pod lands on a node that can access its volume. Each capability is independent — a CSI driver implements only the parts that make sense for its backing store.
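As a sketch of one of these capabilities, a raw block volume is requested with volumeMode: Block and handed to the container through volumeDevices rather than volumeMounts. The names blockdata and block-consumer, the busybox image, and the device path are illustrative assumptions; fast-ssd is assumed to be an existing StorageClass whose driver supports block mode.

```yaml
# PVC asking for an unformatted block device instead of a filesystem
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: blockdata }          # hypothetical name
spec:
  accessModes: [ReadWriteOnce]
  volumeMode: Block                    # default is Filesystem
  storageClassName: fast-ssd           # assumed to exist
  resources:
    requests: { storage: 10Gi }

---
# Pod that receives the device node; no filesystem is mounted
apiVersion: v1
kind: Pod
metadata: { name: block-consumer }     # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36              # placeholder image
      command: ["sleep", "3600"]
      volumeDevices:                   # used instead of volumeMounts
        - name: raw
          devicePath: /dev/xvda        # path inside the container (assumption)
  volumes:
    - name: raw
      persistentVolumeClaim:
        claimName: blockdata
```

The application is then responsible for whatever it writes to the device; Kubernetes does not create a filesystem on it.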

The Storage Stack

Pod (volumeMounts)
  ↓
PVC (100 Gi, RWO)
  ↓
PV (100 Gi, RWO, claim=...)
  ↓
StorageClass (gp3)
  ↓
CSI Driver (Pods + sidecars)
  ↓
Cloud volume API / NFS server / Ceph cluster

Key Numbers

3                    access modes: RWO, RWX, ROX (plus RWOP)
3                    reclaim policies: Retain, Delete, Recycle (deprecated)
~30                  CSI drivers in production usage
1.13 / 1.20          CSI GA / VolumeSnapshot GA
~10s                 typical attach + mount time per PVC
Block / Filesystem   the two volumeMode options

StorageClass and Dynamic Provisioning

# StorageClass — recipe for a kind of storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata: { name: fast-ssd }
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:...:key/...
volumeBindingMode: WaitForFirstConsumer    # bind only when a Pod is scheduled
allowVolumeExpansion: true
reclaimPolicy: Delete
mountOptions: [discard, noatime]

---
# PVC — request from a user
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: pgdata }
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: fast-ssd
  resources:
    requests: { storage: 100Gi }

---
# Pod consuming it
apiVersion: v1
kind: Pod
metadata: { name: postgres }   # name added so the manifest is valid; choose your own
spec:
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pgdata

CSI Driver Anatomy

# A CSI driver is deployed as two components:
# 1. Controller plugin: usually a Deployment; talks to the storage API to provision/attach
# 2. Node plugin: a DaemonSet on every node; handles stage/mount/unmount

# Sidecars (provided by Kubernetes-CSI)
external-provisioner   # watches PVCs, calls CreateVolume/DeleteVolume
external-attacher      # ControllerPublish/Unpublish (cloud attach)
external-resizer       # ControllerExpand
external-snapshotter   # CreateSnapshot
node-driver-registrar  # registers the driver with kubelet
livenessprobe          # k8s-style health for the driver

# CSI gRPC interface (simplified)
service Identity   { rpc GetPluginInfo / GetPluginCapabilities }
service Controller { rpc CreateVolume / DeleteVolume / ControllerPublishVolume /
                      ControllerExpandVolume / CreateSnapshot }
service Node       { rpc NodeStageVolume / NodePublishVolume / NodeUnpublishVolume /
                      NodeExpandVolume }
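
Besides the Pods and sidecars, a driver is normally described to the cluster by a CSIDriver object, which tells Kubernetes how to treat its volumes. The following is a minimal sketch; the field values are illustrative assumptions and not necessarily what the EBS driver ships with.

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.aws.com               # must match the provisioner name in the StorageClass
spec:
  attachRequired: true                # driver implements ControllerPublishVolume (attach step)
  podInfoOnMount: false               # do not pass Pod metadata to NodePublishVolume
  fsGroupPolicy: File                 # illustrative value; controls fsGroup ownership changes
  volumeLifecycleModes: [Persistent]  # add Ephemeral only if the driver supports inline volumes
```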

VolumeSnapshot and Restore

# Snapshot a running PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata: { name: pgdata-2026-05-03 }
spec:
  volumeSnapshotClassName: ebs-snapshot
  source:
    persistentVolumeClaimName: pgdata

---
# Restore: new PVC from the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: pgdata-restored }
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: fast-ssd
  resources:
    requests: { storage: 100Gi }
  dataSource:
    name: pgdata-2026-05-03
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
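
The snapshot above refers to a VolumeSnapshotClass named ebs-snapshot, which plays the same role for snapshots that a StorageClass plays for volumes. A minimal sketch of that class, assuming the same ebs.csi.aws.com driver and a Delete policy (both choices are assumptions to adapt):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata: { name: ebs-snapshot }
driver: ebs.csi.aws.com        # CSI driver that takes the snapshot
deletionPolicy: Delete         # Retain keeps the backing snapshot when the VolumeSnapshot is deleted
```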

ReadWriteMany Options

Backend                 Access   Latency      Best for
NFS server in cluster   RWX      ~1 ms LAN    Cheap shared scratch
CephFS (Rook)           RWX      ~2-5 ms      On-prem, self-managed
AWS EFS                 RWX      ~5-10 ms     AWS multi-AZ workloads
Azure Files             RWX      ~5-15 ms     Azure, SMB compat
GCP Filestore           RWX      ~2-5 ms      GCP, NFS protocol
FSx for Lustre          RWX      <1 ms        HPC, ML training
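
For the first row, a shared NFS export can be exposed without any dynamic provisioner by pre-creating a PV and a claim that points at it. This is a sketch only: the server hostname, export path, object names, and size are hypothetical.

```yaml
# Statically defined PV backed by an existing NFS export
apiVersion: v1
kind: PersistentVolume
metadata: { name: shared-nfs }         # hypothetical name
spec:
  capacity: { storage: 500Gi }
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""                 # no class: keeps dynamic provisioners out of the way
  nfs:
    server: nfs.example.internal       # hypothetical NFS server
    path: /exports/shared              # hypothetical export path

---
# Claim bound explicitly to that PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: shared }
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""
  volumeName: shared-nfs
  resources:
    requests: { storage: 500Gi }
```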

Local PVs and Ephemeral Volumes

# Local PV — directly attached SSD/NVMe, no cloud volume
apiVersion: v1
kind: PersistentVolume
metadata: { name: local-nvme-1 }
spec:
  capacity: { storage: 1Ti }
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/nvme0n1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: [node-7]

---
# Generic ephemeral — PVC lifetime tied to Pod
apiVersion: v1
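kind: Pod
metadata: { name: scratch-demo }   # hypothetical name, added so the manifest is valid
spec:
  containers:
    - name: app
      image: busybox:1.36          # placeholder image (assumption); use your own
      volumeMounts: [ { name: scratch, mountPath: /scratch } ]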
  volumes:
    - name: scratch
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: [ReadWriteOnce]
            storageClassName: fast-ssd
            resources: { requests: { storage: 50Gi } }
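
The local PV above names a StorageClass called local-storage. Local volumes are not dynamically provisioned, so that class is typically just a marker with delayed binding. A minimal sketch:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata: { name: local-storage }
provisioner: kubernetes.io/no-provisioner   # PVs are created by hand or by an external tool
volumeBindingMode: WaitForFirstConsumer     # let the scheduler pick a node before binding
```

Delayed binding matters here because each local PV is pinned to one node by its nodeAffinity; binding before scheduling could tie a claim to a node the Pod cannot run on.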

Tradeoffs

Strengths
  • CSI separates storage from k8s release cadence
  • Dynamic provisioning means users rarely have to create or manage PVs directly
  • Snapshots and expansion are first-class operations
  • WaitForFirstConsumer binding avoids volume/Pod topology mismatches
Sharp edges
  • RWX is expensive and rare — design around RWO + replicas if possible
  • Local PVs lose all data when the node dies; replication is your job
  • Snapshots are crash-consistent only — quiesce databases before snapshotting
  • Reclaim policy Delete + accidental PVC delete = data loss; use Retain for stateful
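
As a sketch of the last point, a separate StorageClass for stateful workloads can set the reclaim policy to Retain. The class name fast-ssd-retain is hypothetical; the other fields mirror the gp3 class shown earlier.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata: { name: fast-ssd-retain }   # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain                 # deleting the PVC leaves the PV and the backing volume in place
```

With Retain, a deleted claim leaves its PV in the Released state; an admin has to clean up or re-bind it manually, so the safety comes at the cost of some housekeeping.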

Frequently Asked Questions

PV vs PVC vs StorageClass — what's the relationship?

PersistentVolume (PV) is a piece of storage in the cluster, provisioned by an admin or dynamically, with a capacity, access modes, and a reclaim policy. PersistentVolumeClaim (PVC) is a user's request for storage with size and access mode requirements. StorageClass describes the 'kind' of storage (gp3 SSD on AWS, premium-rwo on GKE) and which provisioner to use. Flow: the user creates a PVC referencing a StorageClass; the StorageClass's CSI provisioner creates a backing volume in the cloud and a matching PV; the PV is bound to the PVC; the Pod mounts the PVC. The binding model is the same whether you provision dynamically (typical) or pre-create PVs (rare).
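
The rarer pre-created path can be sketched as follows: an admin registers an existing cloud volume as a PV, and a claim binds to it by name. The volume ID and object names are hypothetical, and a zonal volume would typically also need a nodeAffinity block restricting it to its zone.

```yaml
# PV wrapping a volume that already exists in the cloud
apiVersion: v1
kind: PersistentVolume
metadata: { name: pgdata-static }          # hypothetical name
spec:
  capacity: { storage: 100Gi }
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""                     # empty: no dynamic provisioning involved
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0    # hypothetical volume ID
    fsType: ext4

---
# PVC that binds to that specific PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: pgdata-existing }        # hypothetical name
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: ""
  volumeName: pgdata-static
  resources:
    requests: { storage: 100Gi }
```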

What is CSI and why does it exist?

Container Storage Interface is a spec that lets storage vendors implement volume support for Kubernetes (and other orchestrators such as Mesos and Cloud Foundry) without putting code in the Kubernetes core. Before CSI, storage drivers (AWS EBS, Ceph, GCE PD) lived as 'in-tree' code compiled into the Kubernetes binaries, so every driver fix had to ship with a Kubernetes release. CSI moved them out: a CSI driver is a set of Pods that run in the cluster, register with the kubelet, and handle volume operations via gRPC. Today most clouds and storage systems ship a CSI driver; the in-tree drivers are deprecated and being migrated to CSI.

What's the difference between RWO, RWX, ROX?

ReadWriteOnce (RWO): the volume can be mounted read-write by one node at a time (multiple Pods on that node can share it). This is what AWS EBS, GCE PD, and Azure Disk give you. ReadOnlyMany (ROX): many nodes can mount the volume read-only. ReadWriteMany (RWX): many nodes can mount it read-write. ReadWriteOncePod (RWOP) is stricter than RWO: only a single Pod in the whole cluster can mount the volume. RWX requires a real shared filesystem: NFS, CephFS, EFS, FSx for Lustre, GlusterFS. Most databases and stateful apps use RWO; RWX is mainly for apps that need a shared filesystem (e.g., a CMS where multiple replicas write to the same uploads dir).
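
The CMS case can be sketched as a ReadWriteMany claim shared by every replica of a Deployment. The StorageClass name shared-fs, the image, and the mount path are hypothetical; the class must be backed by a driver that supports RWX.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: uploads }
spec:
  accessModes: [ReadWriteMany]
  storageClassName: shared-fs            # hypothetical RWX-capable class
  resources:
    requests: { storage: 50Gi }

---
apiVersion: apps/v1
kind: Deployment
metadata: { name: cms }
spec:
  replicas: 3                            # replicas may land on different nodes
  selector:
    matchLabels: { app: cms }
  template:
    metadata:
      labels: { app: cms }
    spec:
      containers:
        - name: cms
          image: example.com/cms:1.0     # placeholder image
          volumeMounts:
            - name: uploads
              mountPath: /var/www/uploads
      volumes:
        - name: uploads
          persistentVolumeClaim:
            claimName: uploads           # the same claim is mounted by all replicas
```

With an RWO claim instead, replicas scheduled onto a second node would be unable to attach the volume and would stay pending.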

How do volume snapshots work?

VolumeSnapshot is a CRD (in snapshot.storage.k8s.io/v1) that asks a CSI driver with snapshot support to take a point-in-time copy of a PVC. Each VolumeSnapshot is bound to a cluster-scoped VolumeSnapshotContent object that references the provider-specific snapshot ID. You can create a new PVC from a snapshot: it provisions a new volume initialized from that snapshot's data. Backup workflows (Velero, Stash) integrate with this. Caveat: snapshots are crash-consistent, not application-consistent. For databases, quiesce the DB before snapshotting or use its own backup tooling.

What is volume expansion?

If your StorageClass has allowVolumeExpansion: true, you can increase a PVC's storage request by editing it ('kubectl edit pvc'). The CSI driver expands the underlying volume (most clouds support online expansion now), then the kubelet expands the filesystem on the node where the Pod runs. You cannot shrink a volume. The Pod usually doesn't need to restart if both the volume driver and the filesystem (ext4, xfs) support online resize. This lets a volume grow with its data without downtime.
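
As a sketch, growing the pgdata claim from the earlier example only means raising the request in its spec; everything else stays the same. The 200Gi figure is an arbitrary illustration.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: pgdata }
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: fast-ssd           # class must have allowVolumeExpansion: true
  resources:
    requests: { storage: 200Gi }       # was 100Gi; only increases are accepted
```

While the resize is in progress the claim's status.capacity still shows the old size, and a FileSystemResizePending condition can appear until the kubelet has grown the filesystem.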

What are CSI ephemeral volumes?

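Generic ephemeral volumes (1.21+) and CSI inline ephemeral volumes (1.16+) let you declare a volume directly in the Pod spec instead of creating a separate PVC yourself. The volume is created when the Pod starts and deleted when the Pod ends, so it has the same lifetime as the Pod. For generic ephemeral volumes, Kubernetes creates a PVC behind the scenes that is owned by the Pod. Use cases: scratch space backed by a fast cloud volume, secrets-store CSI driver mounting secrets, BCC for storage. Generic ephemeral volumes are like emptyDir but with the full power of CSI (any storage class, with snapshots, etc.); CSI inline volumes work only with drivers that support the ephemeral mode.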
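
For the secrets-store use case, a CSI inline ephemeral volume might look like the sketch below. It assumes the Secrets Store CSI driver is installed; the Pod name, image, mount path, and the SecretProviderClass name app-secrets are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata: { name: secrets-demo }             # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36                    # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: secrets
          mountPath: /mnt/secrets
          readOnly: true
  volumes:
    - name: secrets
      csi:                                   # inline volume: no PVC or PV objects involved
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: app-secrets   # hypothetical SecretProviderClass
```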