Trace Sampling Strategies

Head-Based vs Tail-Based · Probabilistic vs Deterministic · The Cost-Quality Tradeoff

A single production service handling 100 RPS, with 30 spans per trace and ~1 KB per span, produces 3 MB/sec of telemetry, or roughly 260 GB/day. At sustained volumes like this, a single observability backend can cost as much as $50K/month. Sampling is the pressure valve, but it has a catch: the traces most worth keeping are the ones that look normal until they don't. A sampling strategy answers two questions: when is the sampling decision made, and what rule is applied?

The core split: head-based sampling decides before the trace is complete (simple, low overhead, but may discard the interesting tail). Tail-based sampling decides after the trace finishes (keeps exactly the right traces, but requires buffering and distributed coordination). Most production systems use a hybrid: head-based for the 99% of fast requests, tail-based for the 1% that are slow or errored.

Why Sample — The Bandwidth Math

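Trace volume is the product of request rate, spans per trace, and bytes per span, accumulated over the retention window. Example values (with the range considered in parentheses):

RPS: 10,000 (100 to 100K)

Avg spans per trace: 30 (5 to 200)

Avg bytes per span: 1,024 (256 B to 4 KB)

Retention (days): 7 (1 to 30)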
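
The arithmetic behind these inputs can be sketched in a few lines. This is a minimal illustration using the example values above (10,000 RPS, 30 spans, 1,024 bytes, 7 days); the function name `trace_volume` is hypothetical, and the result is raw payload size in decimal units, before any compression or indexing overhead.

```python
def trace_volume(rps: int, spans_per_trace: int, bytes_per_span: int, retention_days: int) -> dict:
    """Return raw telemetry volume for a traffic profile (decimal MB/TB)."""
    bytes_per_sec = rps * spans_per_trace * bytes_per_span
    bytes_per_day = bytes_per_sec * 86_400  # seconds per day
    return {
        "mb_per_sec": bytes_per_sec / 1e6,
        "tb_per_day": bytes_per_day / 1e12,
        "tb_retained": bytes_per_day * retention_days / 1e12,
    }

volume = trace_volume(rps=10_000, spans_per_trace=30, bytes_per_span=1_024, retention_days=7)
for name, value in volume.items():
    print(f"{name}: {value:,.2f}")
```

With these inputs the unsampled stream is about 307 MB/sec, which is roughly 26.5 TB/day and about 186 TB held over a 7-day retention window.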

Head-Based Sampling — Decision at Request Start

The sampler decides whether to keep a trace before seeing its outcome. Low overhead, distributed-friendly, but can't prioritize slow/error traces.
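
A head-based decision is little more than a coin flip taken once at the root of the trace and then propagated downstream. The sketch below assumes a 1% rate and uses hypothetical helper names (`head_sample`, `make_traceparent`); the header layout follows the W3C Trace Context `traceparent` format, whose last field carries the sampled flag.

```python
import random

def head_sample(sample_rate: float, rng: random.Random) -> bool:
    """Decide at request start, before latency or status is known."""
    return rng.random() < sample_rate

def make_traceparent(trace_id: str, span_id: str, sampled: bool) -> str:
    """Build a W3C traceparent header: version-traceid-parentid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

rng = random.Random(42)  # seeded only so the demo is repeatable
decisions = [head_sample(0.01, rng) for _ in range(100_000)]
kept = sum(decisions)
print(f"kept {kept} of {len(decisions)} traces")

# Downstream services read the flag instead of flipping their own coin.
header = make_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", decisions[0])
print(header)
```

Because every downstream service honors the propagated flag, no coordination is needed, but nothing in this decision can depend on how the request turns out.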

Tail-Based Sampling — Decision After the Trace

The collector buffers spans and decides to keep a trace only after it completes. Guarantees you keep the interesting 1% — slow requests, errors — but requires stateful buffering and memory.

Tail budget (traces/sec): 100

Error rate threshold: 0%

P99 latency threshold (ms): 500

Trace arrival rate (traces/sec): 10,000
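
To make these parameters concrete, here is a minimal sketch of a tail decision over one second of completed traces. It is an illustration under stated assumptions, not a collector implementation: the `Trace` type and `tail_sample` function are hypothetical, errors are ranked ahead of slow-only traces, and the budget is applied as a simple cut-off.

```python
from dataclasses import dataclass

@dataclass
class Trace:
    trace_id: str
    duration_ms: float
    has_error: bool

def tail_sample(window: list[Trace], latency_threshold_ms: float, budget: int) -> list[Trace]:
    """Keep errored or slow traces from one window, up to the budget."""
    interesting = [t for t in window if t.has_error or t.duration_ms > latency_threshold_ms]
    # Errors sort ahead of slow-only traces; within each group, slowest first.
    interesting.sort(key=lambda t: (not t.has_error, -t.duration_ms))
    return interesting[:budget]

window = [
    Trace("a", 120, False),
    Trace("b", 900, False),
    Trace("c", 80, True),
    Trace("d", 650, False),
    Trace("e", 40, False),
]
kept = tail_sample(window, latency_threshold_ms=500, budget=2)
print([t.trace_id for t in kept])
```

In this example three traces qualify (one error, two slow), and the budget of 2 keeps the error and the slowest of the slow traces.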

Trace Decision Simulator

Generate traces and watch each sampling strategy make keep/discard decisions. Which strategy keeps the most interesting traces?


Hybrid Sampling — The Production Standard

Most large-scale deployments use a two-stage approach: head-based sampling in the application (always keep a fixed share, for example 1%), followed by tail-based sampling in the collector, which selects the interesting traces from those that survived the head stage.

No sampling: 100K traces/sec

Head-only 1%: 1K traces/sec

Hybrid final: 100 traces/sec

Cost reduction: 99.9%
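
The figures above follow from two multiplications. A minimal sketch, assuming 100K traces/sec of raw traffic, a 1% head rate, and a tail budget of 100 traces/sec (the function name `hybrid_rates` is hypothetical):

```python
def hybrid_rates(total_tps: float, head_rate: float, tail_budget_tps: float) -> dict:
    """Trace rates after each stage of a head-then-tail pipeline."""
    after_head = total_tps * head_rate          # traces/sec leaving the application
    final = min(after_head, tail_budget_tps)    # the collector cannot keep more than it receives
    return {
        "after_head": after_head,
        "final": final,
        "reduction": 1 - final / total_tps,
    }

rates = hybrid_rates(total_tps=100_000, head_rate=0.01, tail_budget_tps=100)
print(f"after head: {rates['after_head']:,.0f}/sec")
print(f"final:      {rates['final']:,.0f}/sec")
print(f"reduction:  {rates['reduction']:.1%}")
```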

OTel TailSamplingProcessor Configuration

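The OTel Collector tail_sampling processor (part of the Collector contrib distribution) evaluates every policy against each buffered trace once decision_wait has elapsed. The policies below are independent of one another: a trace is kept if any one of them votes to sample it, so their position in the list does not change the outcome.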

processors:
  tail_sampling:
    decision_wait: 10s          # Buffer spans this long before deciding
    num_traces: 100000          # Max traces held in memory
    expected_new_traces_per_sec: 10000

    policies:
      # 1. Always keep errors
      - name: errors-policy
        type: status_code
        status_code: { status_codes: [ERROR] }

      # 2. Always keep slow traces (trace duration > 1s)
      - name: slow-traces-policy
        type: latency
        latency: { threshold_ms: 1000 }

      # 3. Keep rare operations (assumes spans carry an operation.name attribute)
      - name: rare-ops-policy
        type: string_attribute
        string_attribute:
          key: operation.name
          values: ["/admin.*", "/debug.*"]
          enabled_regex_matching: true   # without this, values are matched literally

      # 4. Keep traces with specific user segments
      - name: vip-users-policy
        type: string_attribute
        string_attribute:
          key: user.tier
          values: ["premium", "enterprise"]

      # 5. Probabilistic sampling on everything else (1 in 10)
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }

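⚠️ Memory bound: num_traces × avg_trace_size ≈ worst-case collector RAM. At 100K traces × 100 KB avg, that is 10 GB. A shorter decision_wait means fewer traces are in flight, which lets you lower num_traces (size it to at least decision_wait × expected_new_traces_per_sec).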

💡 Policies combine as a logical OR: a trace that matches the errors policy is kept whatever the other policies decide, so list order is about readability, not precedence. Listing the specific policies first (errors, slow) and the general one last (probabilistic) keeps the intent easy to read.

✅ Always include a probabilistic policy: without it, traces that match no other policy are silently dropped. The probabilistic policy acts as your baseline sampler.
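
One way to picture how the policies in this configuration combine is a simplified model in which a trace is kept when at least one policy matches it. The sketch below is an illustration only, not the collector's code: the predicate names are hypothetical, the trace is a plain dictionary, and the baseline policy derives its 10% from the trace ID so the same trace always gets the same answer.

```python
def is_error(trace: dict) -> bool:
    return trace.get("status") == "ERROR"

def is_slow(trace: dict) -> bool:
    return trace.get("duration_ms", 0) > 1000

def is_vip(trace: dict) -> bool:
    return trace.get("user.tier") in {"premium", "enterprise"}

def baseline(trace: dict, percentage: int = 10) -> bool:
    # Map the hex trace ID onto 100 buckets and keep the first `percentage` of them.
    return int(trace["trace_id"], 16) % 100 < percentage

POLICIES = [
    ("errors-policy", is_error),
    ("slow-traces-policy", is_slow),
    ("vip-users-policy", is_vip),
    ("probabilistic-policy", baseline),
]

def decide(trace: dict) -> tuple[bool, list[str]]:
    """Return (keep?, names of the policies that matched)."""
    matched = [name for name, policy in POLICIES if policy(trace)]
    return bool(matched), matched

print(decide({"trace_id": "32", "status": "ERROR", "duration_ms": 50}))
print(decide({"trace_id": "32", "status": "OK", "duration_ms": 50}))
print(decide({"trace_id": "05", "status": "OK", "duration_ms": 50}))
```

The second trace shows why the baseline policy matters: it matches nothing specific and falls outside the 10% bucket range, so it is dropped, while the third is kept by the baseline alone.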

Sampling Strategy Comparison

Strategy	Decision timing	Interesting trace recall	Memory overhead	Complexity	Best for
Probabilistic head	At request start	≈ sample rate (~10% of slow/error at a 10% rate)	None	Low	High-volume, uniform traffic
Fixed-rate head	At request start	≈ sample rate (~10% of slow/error at a 10% rate)	None	Low	Predictable storage budgets
Rule-based head	At request start	60–80% of slow/error	None	Medium	Known error patterns
Tail-based	After trace complete	90–99% of slow/error	High (buffer)	High	Low-volume critical services
Hybrid (head+tail)	Both stages	85–95% of slow/error	Medium	High	Large-scale production

FAQ

Does head-based sampling lose the slow request that caused the incident?

Very likely. A head-based sampler decides at request start, before latency or status is known, so at a 1% rate any given slow request has a 99% chance of being dropped. This is why most production deployments add a tail-based layer to catch slow and failed requests, or use rule-based head sampling to always sample requests from VIP users or specific error-prone endpoints.

Why does tail-based sampling require so much memory?

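Because the decision to keep a trace can't be made until its spans have arrived. The collector cannot know which span is the last one, so it holds every trace for the full decision_wait: if traces can take 5 seconds to complete, decision_wait has to be longer than 5 seconds, and all of their spans sit in memory for that long. At 10K traces/sec with avg 500ms duration and 10 KB per trace, only about 5,000 traces (~50 MB) are actually in flight at any moment, but a 10-second decision_wait holds 100,000 traces (~1 GB). This is why num_traces is a hard limit: when the buffer is full, the oldest traces are evicted before they have been evaluated.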
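
Buffer size follows from Little's law: traces held equals arrival rate times hold time. The sketch below plugs in the values from the configuration section (10,000 new traces/sec, a 10-second decision_wait, and the 100 KB average trace size used in the memory note); the function name `buffer_memory_bytes` is hypothetical and per-trace bookkeeping overhead is ignored.

```python
def buffer_memory_bytes(traces_per_sec: int, hold_seconds: int, avg_trace_bytes: int) -> int:
    """Estimate buffered bytes: (arrival rate x hold time) x average trace size."""
    traces_held = traces_per_sec * hold_seconds  # Little's law
    return traces_held * avg_trace_bytes

held = 10_000 * 10  # matches num_traces = 100,000 in the example config
memory = buffer_memory_bytes(traces_per_sec=10_000, hold_seconds=10, avg_trace_bytes=100_000)
print(f"{held:,} traces held, about {memory / 1e9:.0f} GB")

# Halving the hold time halves the buffer.
print(f"{buffer_memory_bytes(10_000, 5, 100_000) / 1e9:.0f} GB at a 5-second wait")
```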

What is the "tail sampling budget"?

The maximum number of traces per second your observability backend can ingest. For Jaeger with an Elasticsearch backend, this might be 500 traces/sec. The tail sampler keeps at most that many, selected by policy priority. If only 50 traces match your policies, only 50 are kept, which leaves head-based sampling as the fallback for the remaining budget.

Can sampling cause me to miss rare bugs that only affect 1 in 10,000 requests?

Yes. At 1% probabilistic sampling, a bug affecting 0.01% of requests is captured in roughly 1 in 1,000,000 requests (0.01% × 1%), which makes it effectively invisible. For rare bugs, use tail-based sampling with a "keep everything from this endpoint" rule, or temporarily disable sampling for that endpoint during incident investigation.
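
The odds can be worked out directly. This sketch assumes the bug and the sampling decision are independent, so the chance that a given request is both affected and sampled is the product of the two rates; the function names and the one-million-request window are illustrative.

```python
def expected_captures(requests: int, bug_rate: float, sample_rate: float) -> float:
    """Expected number of affected requests that also get sampled."""
    return requests * bug_rate * sample_rate

def prob_at_least_one(requests: int, bug_rate: float, sample_rate: float) -> float:
    """Probability that at least one affected request is sampled."""
    return 1 - (1 - bug_rate * sample_rate) ** requests

requests = 1_000_000
expected = expected_captures(requests, bug_rate=1e-4, sample_rate=0.01)
chance = prob_at_least_one(requests, bug_rate=1e-4, sample_rate=0.01)
print(f"expected captures: {expected:.2f}")
print(f"chance of seeing the bug at all: {chance:.1%}")
```

Under these assumptions, a million requests contain about 100 affected ones, yet only about one of them is expected to be sampled, and there is roughly a one-in-three chance of capturing none.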

What does consistent sampling mean and why does it matter?

Consistent sampling means all spans from the same trace are either all kept or all dropped, so you never store a partial trace. This is achieved by deriving the sampling decision deterministically from the trace_id, so any service or collector that sees any span from that trace reaches the same decision without coordination. It matters whenever more than one component makes the decision independently. Tail-based sampling has an additional requirement: spans from the same trace arrive at different times from different services, and they must all reach the same collector instance before the decision is made.
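
A deterministic decision can be sketched by treating part of the trace ID as a random number and comparing it with a threshold. This is an illustration rather than the exact algorithm of any particular SDK: it assumes trace IDs are 32 hex characters generated uniformly at random, and the choice of the low 56 bits and the name `consistent_sample` are assumptions made for the example.

```python
import random

RANDOM_BITS = 56  # assumed: how many trace ID bits are treated as randomness

def consistent_sample(trace_id_hex: str, sample_rate: float) -> bool:
    """Same trace ID and rate always give the same answer, on any host."""
    value = int(trace_id_hex, 16) & ((1 << RANDOM_BITS) - 1)
    return value < sample_rate * (1 << RANDOM_BITS)

rng = random.Random(0)  # seeded only so the demo is repeatable
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(100_000)]

kept = sum(consistent_sample(t, 0.01) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces at 1%")

# A trace kept at a low rate is also kept at any higher rate.
nested = all(consistent_sample(t, 0.10) for t in trace_ids if consistent_sample(t, 0.01))
print(f"1% sample is a subset of the 10% sample: {nested}")
```

The threshold comparison gives a useful side effect: samples taken at different rates are nested, so a service sampling at 10% always has the traces that another service kept at 1%.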

How do exemplar links work with sampled traces?

Exemplars are single representative values from a histogram bucket that link to the actual trace that generated them. With head-based sampling, the exemplars in your metrics are from the 1% sampled traces — which may not include the worst outliers. With tail-based sampling, you can configure exemplars specifically for error and slow traces, making metric drill-down actually useful.