Distributed Tracing

Spans, traceparent, and the propagation that makes it all work

A trace is a tree of spans. Each span represents a unit of work — an HTTP handler, a database query, a goroutine — with a start time, a duration, a name, attributes, and a parent. The root span has no parent. Every other span has exactly one. Together they describe a single request's journey through every service it touched.

The hard part isn't drawing the tree. It's propagating context across process boundaries so service B knows it's part of service A's trace. Context travels in HTTP headers (traceparent), gRPC metadata, message queue attributes, and database session variables. The W3C Trace Context spec standardizes the HTTP header format; the other transports carry the same fields by convention, and every modern instrumentation library implements them.

Anatomy of a Trace

A trace is identified by a 128-bit trace_id. Each span has a 64-bit span_id and an optional parent_span_id. The parent_span_id references glue the spans into a tree; timestamps place them on the timeline.

[Figure: waterfall view of trace_id = 4bf92f3577b34da6a3ce929d0e0e4736 (128-bit) on a 0ms to 480ms time axis. Spans shown: HTTP /checkout (root span, 480ms), auth.verify (35ms), cart.process (320ms), db.query items (45ms), redis.get (8ms), payment.charge (240ms), stripe.api.v1 (220ms). Caption: payment.charge is the critical path, 50% of total latency.]
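A backend receives spans as a flat list and rebuilds the tree by following parent_span_id. The sketch below shows one way to do that with the span names and durations from the example. The span_id values and the nesting are assumptions made for illustration, since the flattened figure preserves neither.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    name: str
    duration_ms: int


# Names and durations come from the /checkout example.
# The ids and the parent/child nesting are invented for this sketch.
spans = [
    Span("a1", None, "HTTP /checkout", 480),
    Span("b1", "a1", "auth.verify", 35),
    Span("b2", "a1", "cart.process", 320),
    Span("c1", "b2", "db.query items", 45),
    Span("c2", "b2", "redis.get", 8),
    Span("c3", "b2", "payment.charge", 240),
    Span("d1", "c3", "stripe.api.v1", 220),
]


def build_tree(flat_spans):
    """Index spans by parent id and return (root, children_by_parent)."""
    children = defaultdict(list)
    root = None
    for span in flat_spans:
        if span.parent_span_id is None:
            root = span
        else:
            children[span.parent_span_id].append(span)
    return root, children


def render(span, children, depth=0):
    """Return the subtree as indented text lines, parents before children."""
    lines = ["  " * depth + f"{span.name} ({span.duration_ms}ms)"]
    for child in children[span.span_id]:
        lines.extend(render(child, children, depth + 1))
    return lines


root, children = build_tree(spans)
print("\n".join(render(root, children)))
```

Arrival order does not matter here: a child that arrives before its parent is simply filed under the parent's id until the parent shows up.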

Key Numbers

128 bits: trace_id width (16 bytes, 32 hex characters)
64 bits: span_id width (8 bytes, 16 hex characters)
32: max attribute key length recommended (semconv)
128: default max attributes per span (OTel SDK)
1 KB: average span size on the wire (OTLP gzip'd)
~30: spans per trace in a typical microservice request
W3C: traceparent / tracestate as the open standard

The Span Model

A span carries ten or so essential fields. Everything else is metadata layered on top.

{`message Span {
  bytes  trace_id     = 1;     // 128-bit trace identifier
  bytes  span_id      = 2;     // 64-bit span identifier
  string trace_state  = 3;     // tracestate header passthrough
  bytes  parent_span_id = 4;   // 64-bit, empty for root
  string name         = 5;     // operation name, e.g. "GET /api/users"
  SpanKind kind       = 6;     // CLIENT, SERVER, INTERNAL, PRODUCER, CONSUMER
  fixed64 start_time_unix_nano = 7;
  fixed64 end_time_unix_nano   = 8;
  repeated KeyValue attributes = 9;     // http.method, db.statement, ...
  repeated Event    events     = 11;    // exception, log-like markers
  repeated Link     links      = 13;    // pointers to related spans
  Status status       = 15;             // UNSET / OK / ERROR + message
}`}

Span kind matters for analytics. SERVER spans are entry points to a service. CLIENT spans are calls to other services. PRODUCER/CONSUMER are message queue ops. INTERNAL is everything else. Backends use kind to build service maps and to label spans as "incoming" vs "outgoing".

W3C traceparent & tracestate

The W3C Trace Context spec defines two HTTP headers that every modern tracer respects. traceparent carries the trace_id, span_id, and sampling flag. tracestate carries vendor-specific context.

{`traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^ ^^
             |  |                                |                |
             |  trace-id (16 bytes hex)          parent-id        flags
             version                                              (sampled bit)

tracestate: vendor1=value1,vendor2=value2,otel=k1:v1;k2:v2

# Receiving service:
#   1. Parse traceparent
#   2. Use trace-id and parent-id as the context for new spans
#   3. Honor the sampled flag (or apply local policy that overrides it)
#   4. Pass through tracestate, prepending its own vendor entry if any`}

The 00 at the start is the version. The trailing 01 is the flags byte; bit 0 is the sampled flag. If the upstream caller sampled the trace, every downstream service should respect that and emit spans (head-based consistent sampling).
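The parsing step can be sketched in a few lines. This is a minimal, strict parser for the version 00 layout, not a replacement for a real propagator; the names TraceParent and parse_traceparent are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# version-traceid-parentid-flags, all lowercase hex
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed or invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are explicitly invalid in the spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # Bit 0 of the flags byte is the sampled flag.
    sampled = bool(int(flags, 16) & 0x01)
    return TraceParent(version, trace_id, parent_id, sampled)


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx)
```

Returning None on bad input matches the usual policy: a receiver that cannot parse the header starts a fresh trace rather than failing the request.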

B3 (Zipkin) Headers

Older, still common in Zipkin-derived ecosystems. B3 splits the trace context across multiple headers (multi-header B3) or stuffs it into one (single-header B3).

{`# Multi-header B3
X-B3-TraceId:      4bf92f3577b34da6a3ce929d0e0e4736
X-B3-SpanId:       00f067aa0ba902b7
X-B3-ParentSpanId: 0020000000000001
X-B3-Sampled:      1

# Single-header B3 (preferred today)
b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1-0020000000000001

# OTel propagators can be combined into a composite
# (b3 = single-header, b3multi = multi-header):
#   OTEL_PROPAGATORS=tracecontext,baggage,b3multi`}

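Production tip: configure your services to extract both W3C and B3 (read either) but inject only W3C (write the standard). That handles legacy callers gracefully without spreading B3 to new services. A composite configured through OTEL_PROPAGATORS injects every format it lists, so the extract-only half usually needs a propagator wired up in code.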
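The read-either, write-W3C policy might look like the sketch below. It works on a plain dict of lowercase header names and validates only lightly; the function names are hypothetical, and a real service would plug equivalent logic into its tracing library's propagator interface.

```python
from typing import Dict, Mapping, Optional, Tuple

Context = Tuple[str, str, bool]  # (trace_id, span_id, sampled)


def extract_w3c(headers: Mapping[str, str]) -> Optional[Context]:
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4:
        return None
    _version, trace_id, span_id, flags = parts
    if len(trace_id) != 32 or len(span_id) != 16 or len(flags) != 2:
        return None
    try:
        return trace_id, span_id, bool(int(flags, 16) & 0x01)
    except ValueError:
        return None


def extract_b3_single(headers: Mapping[str, str]) -> Optional[Context]:
    # Layout: {trace_id}-{span_id}[-{sampling}[-{parent_span_id}]]
    parts = headers.get("b3", "").split("-")
    if len(parts) < 2:
        return None  # a bare sampling decision such as "0" carries no ids
    trace_id, span_id = parts[0], parts[1]
    sampled = len(parts) > 2 and parts[2] in ("1", "d")  # "d" is the debug flag
    # B3 permits 64-bit trace ids; left-pad them to the 128-bit W3C width.
    return trace_id.rjust(32, "0"), span_id, sampled


def extract(headers: Mapping[str, str]) -> Optional[Context]:
    """Prefer the W3C header, fall back to B3 for legacy callers."""
    return extract_w3c(headers) or extract_b3_single(headers)


def inject(ctx: Context, headers: Dict[str, str]) -> None:
    """Write only the W3C header, whatever format the context arrived in."""
    trace_id, span_id, sampled = ctx
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


legacy_request = {
    "b3": "4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1-0020000000000001"
}
outgoing: Dict[str, str] = {}
# A real service would inject the id of its own new client span here;
# the extracted span id is reused only to keep the sketch short.
inject(extract(legacy_request), outgoing)
print(outgoing)
```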

Propagation Across Boundaries

Every transport that crosses a service boundary needs an idiomatic way to carry trace context. The OTel propagator interface gives one API; each integration applies it differently.

HTTP (REST, GraphQL)

traceparent and tracestate headers. Server middleware extracts on incoming, client middleware injects on outgoing. Every OTel HTTP instrumentation does this automatically.

gRPC

gRPC metadata, the equivalent of HTTP headers. Same key names (traceparent). The OTel gRPC interceptor handles both server and client sides; the metadata travels with the request like a header.

Kafka / RabbitMQ / SQS

Per-message headers / properties. The producer attaches traceparent to the message, the consumer extracts it and either continues the same trace (PRODUCER → CONSUMER spans) or starts a new trace linked to the producing trace via a Link.
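The two consumer choices can be sketched as follows. Headers are modeled as a list of (key, bytes) pairs, the shape Kafka uses for record headers; spans are plain dicts, and the function names are hypothetical rather than taken from any client library.

```python
import secrets
from typing import List, Optional, Tuple

Headers = List[Tuple[str, bytes]]  # Kafka-style record headers


def inject_headers(headers: Headers, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Producer side: attach the current context to the outgoing message."""
    flags = "01" if sampled else "00"
    headers.append(("traceparent", f"00-{trace_id}-{span_id}-{flags}".encode("ascii")))


def extract_headers(headers: Headers) -> Optional[Tuple[str, str]]:
    """Consumer side: return (trace_id, span_id) of the producing span, if present."""
    for key, value in headers:
        if key == "traceparent":
            parts = value.decode("ascii").split("-")
            if len(parts) == 4:
                return parts[1], parts[2]
    return None


def start_consumer_span(headers: Headers, continue_trace: bool) -> dict:
    producer = extract_headers(headers)
    span = {"span_id": secrets.token_hex(8), "parent_span_id": None, "links": []}
    if producer and continue_trace:
        # Option 1: same trace, producer span becomes the parent.
        span["trace_id"], span["parent_span_id"] = producer
    else:
        # Option 2: new trace, with a Link back to the producing span.
        span["trace_id"] = secrets.token_hex(16)
        if producer:
            span["links"].append({"trace_id": producer[0], "span_id": producer[1]})
    return span


message_headers: Headers = []
inject_headers(message_headers, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
child = start_consumer_span(message_headers, continue_trace=True)
linked = start_consumer_span(message_headers, continue_trace=False)
```

Continuing the trace suits one-message-at-a-time consumers; the linked form is the safer default when messages sit in the queue for long periods or are processed in batches.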

SQL databases

Either as a transaction-scoped setting (in PostgreSQL, SET LOCAL app.trace_id = '...'; custom settings need a dotted prefix, and SET LOCAL lasts only until the transaction ends) or embedded in a SQL comment (/* traceparent='...' */). The latter works without server-side support and shows up in slow query logs, which is useful for connecting DB latency back to the originating trace.
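The comment approach is easy to sketch. The helper below is hypothetical and appends the comment in the format shown above; the value is URL-encoded so that it can never contain a quote or close the comment early.

```python
from urllib.parse import quote


def with_trace_comment(sql: str, traceparent: str) -> str:
    """Append a traceparent comment to a SQL statement."""
    statement = sql.rstrip().rstrip(";")
    # Percent-encode everything outside the unreserved set, including
    # "'", "*" and "/", so the value cannot break out of the comment.
    value = quote(traceparent, safe="-")
    return f"{statement} /* traceparent='{value}' */"


query = with_trace_comment(
    "SELECT * FROM items WHERE cart_id = %s;",
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
)
print(query)
```

The comment goes at the end of the statement so that the query text itself is unchanged up to that point. One cost to keep in mind: a unique comment per query can defeat statement caches that key on the full SQL text.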

OTel SDK Span Lifecycle

The path a span takes from creation to export.

{`# Pseudocode of OTel SDK internals
def start_span(name, kind, parent_context, attrs=None):
    # 1. Read parent context (from incoming traceparent or the current execution context)
    parent = parent_context.span_context      # invalid/empty when starting a root span

    # 2. Inherit the parent's trace_id, or mint one for a root span
    trace_id = parent.trace_id if parent.is_valid else new_random_128()

    # 3. Sampler decides keep/drop
    decision = sampler.should_sample(parent_context, trace_id, name, kind, attrs)
    if decision == DROP:
        return NonRecordingSpan()  # cheap noop

    # 4. Allocate span object
    span = Span(
        trace_id   = trace_id,
        span_id    = new_random_64(),
        parent_id  = parent.span_id if parent.is_valid else None,
        start_time = now_ns(),
        ...
    )
    return span

# Application code (Python API); SpanKind and StatusCode come from opentelemetry.trace
span = tracer.start_span("payment.charge", kind=SpanKind.CLIENT)
try:
    span.set_attribute("payment.amount_usd", 42.99)
    do_work()
except Exception as e:
    span.record_exception(e)         # adds an Event named "exception"
    span.set_status(StatusCode.ERROR)
    raise
finally:
    span.end()                        # end_time = now_ns()
    # SpanProcessor receives the ended span
    # BatchSpanProcessor queues it; export thread sends OTLP every N seconds`}

Attributes vs Events vs Links

Three different ways to attach context to a span. Choosing the right one matters for both backend storage and queryability.

Mechanism | Shape | Use for
Attributes | key/value pairs on the span itself | Span-scoped facts: http.method, db.statement, user.tier
Events | timestamped log-like markers within the span | Things that happened during the span: exceptions, retry attempts, cache misses
Links | references to other spans (different trace OK) | Many-to-one or causal relations: batch consumers from N producers

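The semantic conventions (semconv) standardize attribute names. Use http.method, not method. Use db.system, not db_type. Vendors and backends rely on these names for service maps and built-in dashboards. The names themselves change between semconv releases (newer releases rename http.method to http.request.method and db.statement to db.query.text), so check which version your instrumentation and backend expect.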

Sampling Decisions in the SDK

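The Sampler interface receives the parent context, the trace_id, and the new span's name, kind, and initial attributes, and decides keep/drop before the span is created. Attributes added later with set_attribute are invisible to it. Three implementations cover most needs.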

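{`# OTel SDK sampler types (Python example)
from opentelemetry.sdk.trace.sampling import (
    ALWAYS_OFF, ALWAYS_ON, Decision, ParentBased,
    Sampler, SamplingResult, TraceIdRatioBased)

sampler = ParentBased(root=TraceIdRatioBased(0.01))
# - If a parent context exists with sampled=true, sample the child
# - If a parent context exists with sampled=false, drop
# - If no parent (root span), apply 1% probabilistic on trace_id

sampler = ALWAYS_ON     # 100% - tail sampling at collector handles cost
sampler = ALWAYS_OFF    # 0% - useful for forks of a request

# Custom sampler that always samples /checkout traces
class HighValueRouteSampler(Sampler):
    def __init__(self, fallback):
        self.fallback = fallback  # sampler applied to every other route

    def should_sample(self, parent_context, trace_id, name, kind=None,
                      attributes=None, links=None, trace_state=None):
        if (attributes or {}).get("http.route") == "/checkout":
            return SamplingResult(Decision.RECORD_AND_SAMPLE, attributes)
        return self.fallback.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state)

    def get_description(self):
        return "HighValueRouteSampler"`}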
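The reason a ratio sampler keys on trace_id rather than a random draw is consistency: every service that sees the same trace_id reaches the same decision without coordination. The sketch below illustrates the idea with plain integers. It is not the SDK's code, and real SDKs may differ in which bits of the id they compare.

```python
from typing import Optional


def ratio_sampled(trace_id_hex: str, ratio: float) -> bool:
    """Keep the trace if the low 64 bits of its id fall below ratio * 2^64."""
    low64 = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low64 < int(ratio * (1 << 64))


def parent_based(parent_sampled: Optional[bool], trace_id_hex: str, ratio: float) -> bool:
    """Follow the parent's decision if there is one, else sample by ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id_hex, ratio)


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
# The low 64 bits are 0xa3ce929d0e0e4736, about 64% of the way through the id space.
print(ratio_sampled(trace_id, 0.01))  # below the 1% threshold? no
print(ratio_sampled(trace_id, 0.70))  # below the 70% threshold? yes
```

Because the threshold grows with the ratio, a trace kept at 1% is also kept at any higher ratio, so services configured with different rates still agree on the traces the strictest one keeps.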

Tradeoffs

More spans = better visibility

Wrapping every function call in a span gives perfect debuggability. It also produces 1000-span traces that nobody can read and storage bills nobody can pay.

Fewer spans = cheaper, less detail

Span only the entry points and outgoing calls. The internals show up as duration on the entry span. Less detail, but the service map stays clean and budgets stay sane.

Attribute richness = backend cost

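Backends that index attributes for search, such as Jaeger on Elasticsearch, add index entries for every attribute; Tempo keeps no such index and pays instead in data scanned per query. Either way, more attributes means more storage and slower query response. Pick a 5-10 attribute discipline and stick to it.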

Manual vs auto-instrumentation

Auto wins on coverage (HTTP, gRPC, DB, cache, queues all instrumented automatically). Manual wins on business semantics (span events for "coupon applied"). Use both.

FAQ

Why is trace_id 128 bits when span_id is only 64?

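The trace_id has to be globally unique across all your services, all your customers, all the time. Birthday-paradox math puts a 50% chance of at least one collision at roughly 5 billion random 64-bit IDs, a volume a busy fleet can reach; at 128 bits the same threshold is around 2 x 10^19 IDs, which is comfortably collision-free. span_id only has to be unique within a trace, where 64 bits is plenty.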
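The birthday arithmetic is short enough to check directly. The sketch below uses the standard approximation p ≈ 1 - exp(-n² / 2^(bits+1)) and assumes IDs are drawn uniformly at random; the function names are made up for this example.

```python
import math


def collision_probability(n_ids: float, bits: int) -> float:
    """Approximate chance that n random ids of the given width contain a duplicate."""
    return -math.expm1(-(n_ids ** 2) / 2.0 ** (bits + 1))


def ids_for_probability(p: float, bits: int) -> float:
    """Approximate number of random ids at which the collision chance reaches p."""
    return math.sqrt(2.0 ** (bits + 1) * -math.log1p(-p))


for width in (64, 128):
    print(width, f"{ids_for_probability(0.5, width):.3g}")

# Chance of a duplicate among one billion ids at each width
print(collision_probability(1e9, 64))
print(collision_probability(1e9, 128))
```

expm1 and log1p are used instead of exp and log so that the very small probabilities at 128 bits do not round to zero.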

Do I need to instrument my code manually?

For HTTP, gRPC, popular databases, and message queues: no. OTel auto-instrumentation libraries wrap them. For your business logic: yes, the spans that name your domain ("checkout", "fraud_check") have to be added by you because no library knows what they mean.

What if a service doesn't propagate context?

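You get a "broken trace": the receiving service starts a new trace_id and the connection is lost. Mitigate by adding OTel auto-instrumentation as early as possible in your stack (often as middleware or a language agent), so context propagates even before code changes. A sidecar proxy alone is not enough, because the application still has to copy the incoming headers onto its outgoing calls.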

How do I trace through a load balancer?

Most LBs (Envoy, nginx, HAProxy) pass headers transparently. Some can also generate trace context themselves. The risk is custom proxies or API gateways that strip "unknown" headers - configure them to allowlist traceparent and tracestate.

Should I trace synchronous internal function calls?

Generally no. Spans for in-process function calls produce noisy traces and double-count time (parent span includes child). Use them sparingly for boundaries where you need timing data, not for every function.
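The double-counting is easy to see by computing a span's self time, meaning its duration minus the time covered by its direct children. A minimal sketch, with spans as (start_ms, end_ms) tuples and made-up numbers:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def self_time_ms(span: Interval, children: List[Interval]) -> int:
    """Span duration minus the union of its direct children's intervals."""
    start, end = span
    covered = 0
    cursor = start  # everything before the cursor is already accounted for
    for child_start, child_end in sorted(children):
        # Clip to the parent and skip any part an earlier child already covered.
        child_start = max(child_start, cursor)
        child_end = min(child_end, end)
        if child_end > child_start:
            covered += child_end - child_start
            cursor = child_end
    return (end - start) - covered


parent = (0, 480)
kids = [(0, 35), (40, 360), (300, 480)]  # the last two overlap by 60ms
print(self_time_ms(parent, kids))
```

Summing raw durations here gives 480 + 35 + 320 + 180 = 1015ms for a request that took 480ms, which is why per-function spans inflate any "total time" chart built on span duration.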

What's a "span link" actually for?

A batch consumer that processes 100 messages from Kafka. Each message comes from a different producing trace. The consumer span Links to all 100 producing spans — one consumer trace, with provenance to N parents. Without links you'd lose the connection or fan out the trace tree absurdly.