☸️ Kubernetes Autoscaling

How Kubernetes Dynamically Adjusts Pods and Nodes — From HPA to Karpenter

Kubernetes autoscaling automatically adjusts the number of pods (HPA), the resources each pod requests (VPA), and the size of the cluster itself (Cluster Autoscaler, Karpenter). The goal is simple: match capacity to demand in near real time, without over-provisioning that burns cloud budget or under-provisioning that causes latency spikes and OOM kills. The key insight is that metrics drive decisions: CPU utilization, memory pressure, request rates, and custom application signals all feed the autoscaler. Choosing the right metric and the right threshold is most of the work.

HPA Algorithm Visualizer

The Horizontal Pod Autoscaler adjusts replica count based on observed metrics. It compares each metric's current value against its target and scales up or down in discrete steps.

Example state (stable): current replicas = desired replicas, so no scaling is needed.

Metric Evaluation

CPU: 45% current, 70% target (OK)

Memory: 60% current, 80% target (OK)

HPA State

Replicas: 3 current, 3 desired (stable)
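The evaluation above follows the replica formula documented for the HPA: for each metric, the controller scales the current replica count by the ratio of current to target value and rounds up, and with several metrics it takes the largest result. Reading the example state as 3 current replicas, CPU at 45% against a 70% target, and memory at 60% against an 80% target (an interpretation of the panel, not something it states outright), the arithmetic works out as:

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{targetMetricValue}} \right\rceil
\]

\begin{align*}
\text{CPU:}    \quad & \lceil 3 \times 45/70 \rceil = \lceil 1.93 \rceil = 2 \\
\text{Memory:} \quad & \lceil 3 \times 60/80 \rceil = \lceil 2.25 \rceil = 3 \\
\text{Result:} \quad & \max(2, 3) = 3 \quad \text{(no change from 3 replicas)}
\end{align*}
```

CPU alone would suggest 2 replicas, but memory still justifies 3, so the replica count stays put. Both ratios (about 0.64 and 0.75) fall outside the default 10% tolerance band, which is why each metric produces a proposal at all.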

Metrics-Driven Scaling

What metric should you use? The wrong metric causes oscillation or slow response. Compare common patterns:

CPU Utilization

Pro: Simple; needs only metrics-server

Con: Bad for I/O-bound workloads (disk-intensive DBs)

Rating: ★★★★☆

Memory Utilization

Pro: Essential for memory-bound workloads (Redis caches, ClickHouse)

Con: Memory leaks grow gradually toward OOM kills and can trigger needless scale-out

Rating: ★★★☆☆

Requests per Second

Pro: Aligns directly with user-facing SLAs

Con: Requires Prometheus Adapter and baseline calibration

Rating: ★★★★★

Custom Metrics

Pro: Queue depth, p99 latency, business KPIs

Con: Complex setup, can introduce oscillations

Rating: ★★★☆☆

CPU Utilization

CPU utilization works out of the box once metrics-server is installed. The HPA controller queries the metrics API every 15 seconds (configurable via --horizontal-pod-autoscaler-sync-period). It scales up when current / target > 1.1 and scales down when current / target < 0.9; ratios inside that default 10% tolerance band trigger no change. A scale-down stabilization window (default 5 minutes) prevents thrashing during rapid fluctuations; scale-up has no stabilization window by default.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: clickhouse-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: clickhouse
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
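Utilization targets are percentages of the containers' resource requests, so the target pods need CPU and memory requests set for them to work. When several metrics are listed, the HPA computes a proposal per metric and uses the largest. A sketch of a two-metric variant of the manifest above, meant as a replacement rather than a second HPA on the same Deployment; the 70% and 80% targets mirror the visualizer example and are assumptions, not recommendations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: clickhouse-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: clickhouse
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of the pods' CPU requests
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80   # percent of the pods' memory requests
```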

VPA vs HPA: When to Use Which

The Vertical Pod Autoscaler (VPA) adjusts the CPU and memory requests of a workload's pods, typically by evicting and recreating them. HPA adds or removes pods. They address different problems.

| Dimension | HPA | VPA | Together (plus Cluster Autoscaler) |
|---|---|---|---|
| What changes | Replica count | CPU/memory requests | Pods + node count |
| Use case | Traffic and request-rate spikes | Right-sizing requests, growing caches | All of the above |
| Disruption | Pods terminated on scale-down (rate configurable) | Pod eviction and restart | Multiple pod changes |
| Conflicts | VPA in "Auto" mode | HPA on the same metric | Requires VPA "Off" mode |
| Best for | API servers, stateless services | Databases, caches, stateful pods | Full-stack autoscaling |

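⚠️ VPA in "Auto" mode and HPA conflict when both act on the same metric (CPU or memory) for the same workload. Use VPA in "Off" or "Initial" mode when HPA is active. Recommendations are computed by the vpa-recommender and published in the VPA object's status, where they can be read and applied manually; in "Initial" mode the VPA admission webhook applies them only when a pod is created.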
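A minimal sketch of a recommendation-only VPA that can coexist with an HPA. It assumes the VPA components and CRDs are installed (they are not part of core Kubernetes). The object name is hypothetical, and the memory bounds simply echo the ClickHouse figures used elsewhere in this article:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: clickhouse-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: clickhouse
  updatePolicy:
    updateMode: "Off"             # compute recommendations only; never evict pods
  resourcePolicy:
    containerPolicies:
    - containerName: "*"          # apply to every container in the pod
      controlledResources: ["memory"]
      minAllowed:
        memory: 8Gi               # illustrative lower bound
      maxAllowed:
        memory: 64Gi              # illustrative upper bound
```

With updateMode "Off", the recommendation appears in the object's status (for example via kubectl describe vpa clickhouse-vpa) and nothing is changed on running pods.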

ClickHouse Workload: A Real Autoscaling Story

ClickHouse is stateful, memory-hungry, and I/O-intensive — a tricky target for autoscaling. Here's a production-grade setup:

Ingress: Load Balancer → Query Gateway

↓

Compute Layer (HPA): clickhouse-server × 3–30

↓

Storage Layer (Fixed): S3 / MinIO (data never moves)

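| Setting | Value | Rationale |
|---|---|---|
| HPA metric | HTTP requests/sec per pod | Not CPU: ClickHouse CPU spikes are query-bound, not pod-bound |
| Scale-up trigger | > 500 req/s per pod for 30s | Matches the 500 target and the 30s scale-up window in the manifest |
| Scale-down trigger | < 100 req/s per pod for 5 min | An HPA has one target per metric; the manifest relies on the 300s scale-down window rather than a separate threshold |
| VPA memory | Initial: 8Gi, Max: 64Gi | ClickHouse buffers are sized to memory; more memory = bigger query buffers |
| Stabilization | 5-minute stabilization window | Prevents scale oscillation during batch query bursts |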

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: clickhouse-query-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: clickhouse-query
  minReplicas: 3
  maxReplicas: 30
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100   # Double pods in one step max
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1      # Remove max 1 pod per step
        periodSeconds: 60
  metrics:
  - type: External
    external:
      metric:
        name: http_requests_total   # the metrics adapter must serve this as a per-second rate, not the raw counter
        selector:
          matchLabels:
            service: clickhouse-query
      target:
        type: AverageValue
        averageValue: "500"
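If the request rate is exposed per pod through the custom metrics API rather than as an external metric, the metrics stanza could use a Pods-type metric instead. A sketch, assuming a hypothetical metric named http_requests_per_second that a metrics adapter serves for the target pods:

```yaml
# Fragment of an autoscaling/v2 HorizontalPodAutoscaler spec (not a full manifest).
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # hypothetical name; must exist in the custom metrics API
    target:
      type: AverageValue
      averageValue: "500"              # average req/s per pod, same target as above
```

With a Pods metric the HPA averages the value across the pods selected by the scale target, so the per-pod semantics are explicit rather than the result of dividing one cluster-wide number by the replica count.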

Cluster-Level Scaling: Cluster Autoscaler vs Karpenter

HPA adjusts pods. Cluster Autoscaler and Karpenter adjust nodes. They work at different layers of the stack.

| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioner model | Node groups + node pools | Provisioner CRD (declarative) |
| Scale-up speed | ~30–60 seconds (node bootstrap) | ~15–30 seconds (direct launch) |
| Spot instance support | Yes (mixed instance pools) | Yes (native, flexible) |
| Bin-packing | Default bin-packing | More efficient (picks instance types to fit pending pods) |
| Instance diversity | Limited to defined node groups | Any instance type on demand |
| Scale-down delay | 10 minutes by default (--scale-down-unneeded-time) | Configurable TTL expiry |
| Best for | Predictable, batch workloads | Dynamic, mixed workloads |

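apiVersion: karpenter.sh/v1alpha5   # legacy API; newer Karpenter releases use NodePool
kind: Provisioner
metadata:
  name: clickhouse-spot
spec:
  ttlSecondsAfterEmpty: 120   # Remove a node once it has been empty for 2 min
  limits:
    resources:
      cpu: "256"
      memory: 512Gi
  provider:
    amiFamily: AL2
    # subnetSelector and securityGroupSelector are also needed on AWS; omitted here
  requirements:
    # Instance types and capacity type are expressed as requirements, not provider fields
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["c6i.8xlarge", "c6i.16xlarge", "c6i.32xlarge"]
    - key: karpenter.sh/capacity-type   # spot is usually far cheaper than on-demand; the discount varies
      operator: In
      values: ["spot"]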
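The Provisioner API shown above (karpenter.sh/v1alpha5) was later replaced by NodePool. A rough equivalent under karpenter.sh/v1 might look like the sketch below. The EC2NodeClass named default is hypothetical and has to be defined separately, and field names should be checked against the Karpenter version actually in use:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: clickhouse-spot
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # hypothetical; defines AMI, subnets, security groups
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["c6i.8xlarge", "c6i.16xlarge", "c6i.32xlarge"]
  limits:
    cpu: "256"                     # stop provisioning past 256 vCPUs in this pool
    memory: 512Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 2m           # remove nodes that have been empty for 2 minutes
```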

Autoscaling Anti-Patterns

These mistakes cause oscillation, OOM kills, or runaway costs:

❌ Scale on CPU with I/O-bound workloads

A ClickHouse pod doing heavy disk scans barely uses CPU but starves for I/O. CPU-based HPA won't respond. Use query rate or queue depth instead.

❌ No stabilization window

With the scale-down stabilizationWindowSeconds set to 0 (the default is 300 seconds), the HPA reacts to every trough as well as every spike. This causes "flapping": rapid scale-up followed by scale-down, which churns pods and burns cloud budget.

❌ Scale-down to zero on stateful services

ClickHouse pods share data via S3, but query state (processing buffers, connection pools) lives in memory. Scaling to zero discards that state, and scaling back up is slow because new pods start cold.

❌ Too-wide min/max range without budget controls

maxReplicas: 100 without spend guards can, for example, result in a $50K/month bill after a traffic spike. Set a realistic maxReplicas, rate-limit growth with behavior.scaleUp policies, and configure cost alerts.
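One way to bound how quickly a spike turns into spend is to rate-limit scale-up in the HPA itself. A minimal sketch of a behavior fragment; the numbers are illustrative assumptions and would need tuning against real traffic and budget:

```yaml
# Fragment of an autoscaling/v2 HorizontalPodAutoscaler spec (not a full manifest).
maxReplicas: 30                    # hard ceiling on replicas, and therefore on cost
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60 # wait for a sustained signal before growing
    selectPolicy: Min              # apply whichever policy allows the smaller increase
    policies:
    - type: Pods
      value: 4                     # add at most 4 pods per minute
      periodSeconds: 60
    - type: Percent
      value: 50                    # or at most 50% more pods per minute
      periodSeconds: 60
```

The stabilization window only delays growth; it is the policies and maxReplicas that actually cap it, which is why they are paired here.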

✅ Right metric: requests/sec per pod

For stateless API servers and ClickHouse query nodes, requests/sec divided by replica count gives a per-pod utilization that's stable, fast-responding, and directly tied to user experience.

✅ Warm pool for latency-sensitive services

Keep minReplicas: 2 even at 3 AM for services with a latency SLA under 100 ms. Cold starts on new pods add 10–30 seconds before they can serve traffic. The cost of idle replicas is almost always worth it.
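The requests/sec per pod signal recommended above has to be derived from somewhere. Assuming the application exports a Prometheus counter named http_requests_total with namespace and pod labels, and that Prometheus Adapter is installed, a rule along these lines (following the adapter's documented configuration format) would expose it as a per-pod rate. The metric names and the 2-minute rate window are assumptions:

```yaml
# Prometheus Adapter configuration fragment (goes in the adapter's config file).
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}   # map the namespace label to the Namespace resource
      pod: {resource: "pod"}               # map the pod label to the Pod resource
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"                  # exposed as http_requests_per_second
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

A shorter rate window reacts faster but is noisier; a longer one smooths bursts at the cost of slower scale-up.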