strace

Watching Every Syscall a Process Makes — and What It Costs You

strace is the universal "what is this thing actually doing?" tool on Linux. It prints every system call a process makes, with arguments and return values, in human-readable form. It works on any binary — Go, Rust, Python, a closed-source blob — because all of them ultimately go through the same kernel syscall interface.

The mechanism is ptrace(2), the same primitive that powers gdb. Every syscall triggers two stops in the target process: one on entry, one on exit. At each stop the tracer reads the target's registers, decodes the arguments, prints them, and resumes the target. This is powerful but expensive: a 20x slowdown is common for syscall-heavy programs. For production observability, modern eBPF tools (bpftrace, bcc) answer many of the same questions at a fraction of the cost.

How strace Works

A four-stage dance between tracer, target, and kernel. Every single syscall.

Target process      Kernel                                strace (tracer)
1. issues syscall
                    2. STOPS target (syscall-enter stop)
                                                          3. reads regs, decodes args
                                                          4. PTRACE_SYSCALL
                    runs syscall, stops on EXIT
resumes after tracer prints

Key Numbers

~20x: typical slowdown for syscall-heavy programs
2: ptrace stops per traced syscall (entry and exit), each a round trip between target and tracer
~400: syscalls strace knows how to pretty-print
1: tracer per process (limit imposed by ptrace)
~10 µs: overhead added per syscall
1991: year strace was originally written for SunOS (first published in 1992)

The Recipes You'll Actually Use

# Trace a command from start to finish
$ strace ls /tmp

# Follow forks and threads (essential for any non-trivial program)
$ strace -f make -j4

# Attach to a running process
$ strace -p 1234

# Filter to just file-related syscalls
$ strace -e trace=open,openat,close,read,write,stat,lstat,fstat ./app

# Use syscall classes (faster to type)
$ strace -e trace=%file ./app          # syscalls that take a file name (not read/write/close)
$ strace -e trace=%network ./app       # all socket ops
$ strace -e trace=%process ./app       # fork/exec/wait
$ strace -e trace=%signal ./app        # signal-related syscalls (kill, rt_sigaction)

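# Show only failed syscalls (great for "why won't it open?"; -Z needs strace 5.2+)
$ strace -Z ./app                                              # only show errors
$ strace -Z -e trace=openat ./app                              # only failed openat calls

# Inject failures to test error handling (here every openat fails with EACCES,
# including the dynamic loader's; append :when=N+ to start at the Nth call)
$ strace -e trace=openat -e fault=openat:error=EACCES ./app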

# Aggregate counts and timings (what's the program spending time on?)
$ strace -c ./app
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 84.23    0.012345           4      3145           read
 10.11    0.001482           2       846       12 openat
  3.55    0.000520           1       512           close

# Print full strings (default is truncated to 32 bytes)
$ strace -s 1024 -e trace=write ./app

# Save to file, with timestamps and one PID per file
$ strace -ff -tt -o /tmp/trace ./app
$ ls /tmp/trace.*
/tmp/trace.4123  /tmp/trace.4124  /tmp/trace.4125

# Annotate file descriptors with paths and socket endpoints (protocol, IP:port)
$ strace -yy -e trace=%network curl https://example.com

Decoding the Output

A typical line:

openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
       ^         ^             ^                     ^
       |         path          flags                 return value (FD)
       directory FD (AT_FDCWD: resolve relative paths against the current directory)

On failure the kernel returns a negative errno value. strace prints it the way libc reports it, as -1 followed by the decoded name and message: -1 ENOENT (No such file or directory). Pointers are dereferenced when strace knows the type:

fstat(3, {st_mode=S_IFREG|0644, st_size=412, ...}) = 0
read(3, "127.0.0.1\tlocalhost\n::1\tlocalhos"..., 4096) = 412
                                             ^^^
                                             string cut at the -s limit (32 bytes by default)
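
When a trace is long, the decoded errno names make failures easy to tally. The sketch below is one way to do that with standard tools. failed_calls is a hypothetical helper (not part of strace) that reads saved strace output on stdin and counts failing calls by syscall and errno name. The sample lines and paths are invented for illustration, and the parsing assumes the default one-call-per-line output.

```shell
#!/bin/sh
# failed_calls: hypothetical helper, not part of strace.
# Reads strace output on stdin, prints "count syscall ERRNO", most frequent first.
failed_calls() {
    awk '
        / = -1 E[A-Z0-9]+ / {
            # Syscall name: last word before the first "(" (skips "[pid N]" or a timestamp).
            name = $0
            sub(/\(.*/, "", name)
            n = split(name, words, " ")
            name = words[n]
            # errno name: the word that follows "= -1 ".
            err = $0
            sub(/.* = -1 /, "", err)
            sub(/ .*/, "", err)
            count[name " " err]++
        }
        END { for (k in count) print count[k], k }
    ' | LC_ALL=C sort -k1,1nr -k2
}

# Invented sample input in the format strace prints by default.
failed_calls <<'EOF'
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  4124] openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/root/secret", O_RDONLY) = -1 EACCES (Permission denied)
EOF
```

On the sample input this prints "2 openat ENOENT" first, followed by the two single failures. For a real log, redirect a file written with -o into the function, for example failed_calls < /tmp/trace.4123.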

The ptrace API That Powers It

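/* What strace actually does (simplified: x86-64 Linux, one process, no -f) */
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s cmd [args]\n", argv[0]); return 2; }

    /* 1. Fork the target, child issues PTRACE_TRACEME and execs */
    pid_t kid = fork();
    if (kid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);                          /* wait for parent */
        execvp(argv[1], &argv[1]);
        _exit(127);                              /* only reached if exec failed */
    }

    /* 2. Parent waits for the SIGSTOP, sets options */
    int status;
    waitpid(kid, &status, 0);
    long opts = PTRACE_O_TRACESYSGOOD   /* tag syscall stops with bit 7 in signal */
              | PTRACE_O_TRACEEXEC;     /* report exec as an event, not a plain SIGTRAP */
    /* strace -f adds PTRACE_O_TRACEFORK/VFORK/CLONE and then waits on every child */
    ptrace(PTRACE_SETOPTIONS, kid, NULL, (void *)opts);

    /* 3. The main loop: continue, wait, decode, repeat */
    long sig = 0;                                /* signal to pass through on resume */
    while (1) {
        ptrace(PTRACE_SYSCALL, kid, NULL, (void *)sig);  /* resume until next stop */
        waitpid(kid, &status, 0);
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                               /* target is gone */
        sig = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, kid, NULL, &regs);
            /* regs.orig_rax = syscall number, regs.rdi/rsi/rdx = args;
               this fires twice per syscall: once on entry, once on exit */
            fprintf(stderr, "syscall %lld\n", (long long)regs.orig_rax);
        } else if ((status >> 16) == 0) {
            sig = WSTOPSIG(status);              /* ordinary signal: deliver it */
        }
    }
    return 0;
}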

strace vs bpftrace

Aspect           | strace                                       | bpftrace
Mechanism        | ptrace stops                                 | eBPF programs attached to tracepoints/kprobes
Overhead         | ~20x for syscall-heavy programs              | ~1-5% even at high rates
Scope            | One process (and children with -f)           | System-wide or filtered to PIDs
Output           | One line per syscall, full args              | Aggregations: counts, histograms, summaries
Filtering        | By syscall name and class                    | Arbitrary expressions in BPF DSL
Per-call args    | All decoded by default                       | You write what you want
Production-safe? | No (slowdown affects user-visible behavior)  | Yes (designed for it)

# bpftrace: count syscalls by name across the whole system
bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); }'

# bpftrace: latency histogram for read()
bpftrace -e '
  tracepoint:syscalls:sys_enter_read { @s[tid] = nsecs; }
  tracepoint:syscalls:sys_exit_read /@s[tid]/ { @ns = hist(nsecs - @s[tid]); delete(@s[tid]); }
'

Tradeoffs

Pros
  • Universal — works on any binary, any language
  • Pretty-prints hundreds of syscall arg structures for free
  • No kernel-side setup; packaged by every major distro
  • Shows exactly the userspace/kernel boundary, with arguments
Cons
  • 20x overhead can mask the bug you're hunting (timing-dependent races vanish)
  • Single-tracer limit conflicts with debuggers
  • Useless for kernel-internal events (page faults, scheduling)
  • Output volume swamps your terminal for busy programs

Frequently Asked Questions

Why is strace so slow?

Each syscall the target makes triggers two ptrace stops, and each stop is a round trip through the scheduler: the kernel stops the target on syscall entry and wakes the tracer, the tracer reads the registers and arguments and issues PTRACE_SYSCALL to resume, the kernel stops the target again on syscall exit, and the tracer reads the return value and resumes it once more. That is at least four context switches, plus several syscalls made by the tracer itself. So a syscall that took 0.5 microseconds now costs ~10 microseconds, a 20x slowdown that adds up quickly for syscall-heavy workloads. Programs that do mostly compute and few syscalls slow down only mildly; programs that do millions of small syscalls (e.g., a web server serving small files) can become unusable. The cost is built into ptrace: it is a stop-and-inspect model in which the target waits for the tracer at every event, not a sampling model.
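
The arithmetic behind that answer can be sketched in a few lines. estimate_slowdown is a hypothetical helper, and the model is an assumption: every syscall costs a fixed amount, and user-space compute runs at the same speed with or without the tracer. The 0.5 and 10 microsecond figures are the ones quoted above; the call counts and compute time are made up to show the two extremes.

```shell
#!/bin/sh
# estimate_slowdown NATIVE_US TRACED_US CALLS COMPUTE_US
# Hypothetical helper: ratio of traced to untraced run time under a simple
# model (fixed cost per syscall, user-space compute unaffected by tracing).
estimate_slowdown() {
    awk -v native="$1" -v traced="$2" -v calls="$3" -v compute="$4" 'BEGIN {
        untraced_total = compute + calls * native
        traced_total   = compute + calls * traced
        printf "%.1fx\n", traced_total / untraced_total
    }'
}

# Syscall-bound: one million 0.5 us syscalls and no compute in between.
estimate_slowdown 0.5 10 1000000 0
# Compute-bound: 10 s of compute (10000000 us) and ten thousand syscalls.
estimate_slowdown 0.5 10 10000 10000000
```

The first call prints 20.0x and the second 1.0x, which is why the same tool can feel free on one program and unusable on another.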

How does strace differ from ltrace?

strace traces syscalls — the boundary between user space and the kernel. It works on any binary regardless of language. ltrace traces library calls — calls to functions in shared libraries (libc, libssl, etc.). ltrace works by overwriting PLT entries with breakpoints; it's much less reliable than strace and breaks easily on modern binaries that use BIND_NOW, full RELRO, or static linking. For real production debugging, strace is the workhorse; ltrace is a curiosity that mostly works on glibc-linked dynamic binaries.

When should I use bpftrace instead of strace?

Use bpftrace when (a) the syscall overhead would be unacceptable in production, (b) you want aggregate statistics across many processes without picking one to attach to, or (c) you want to filter on conditions strace can't express (e.g., 'show me all openat() calls that returned EACCES from PIDs whose parent is sshd'). Use strace when you need exhaustive per-call argument decoding for a single process during debugging — bpftrace can show syscall counts and selected arguments but doesn't have strace's library of pretty-printers for every syscall struct.

What's the difference between -f and -ff?

-f follows forks/clones — when the traced process spawns a child, strace traces the child too. Output is interleaved, prefixed with the PID. -ff (with -o filename) writes each PID's trace to a separate file (filename.PID). Useful for multi-process programs (web servers, build systems) where interleaved output is unreadable. Without -f, you only see the parent and miss everything the children do — a common confusion when tracing things like make or systemd services.
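
Per-PID files are easier to read one at a time, but sometimes you want a single timeline back. The sketch below shows one way to rebuild it. merge_traces is a hypothetical helper, and it assumes the files were written with -ff -tt so every line starts with an HH:MM:SS.usec timestamp and the trace does not cross midnight. strace also ships a strace-log-merge script for the same job, which is the better choice when it is installed.

```shell
#!/bin/sh
# merge_traces PREFIX
# Hypothetical helper: merges PREFIX.<pid> files written by
# "strace -ff -tt -o PREFIX" into one timeline, tagging each line with its PID.
merge_traces() {
    for f in "$1".*; do
        [ -f "$f" ] || continue
        pid=${f##*.}
        # Insert the PID after the leading timestamp: "HH:MM:SS.usec PID rest".
        sed "s/^\([^ ]*\) /\1 $pid /" "$f"
    done | LC_ALL=C sort
}

# Demo with two invented per-PID files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' \
    '12:00:00.000100 clone(child_stack=NULL, flags=SIGCHLD) = 4124' \
    '12:00:00.000900 wait4(4124, NULL, 0, NULL) = 4124' > "$dir/trace.4123"
printf '%s\n' \
    '12:00:00.000400 openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3' \
    '12:00:00.000600 exit_group(0) = ?' > "$dir/trace.4124"
merge_traces "$dir/trace"
rm -rf "$dir"
```

The demo prints the four lines in timestamp order, with 4123 or 4124 inserted after each timestamp.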

Can I attach strace to a running process?

Yes: strace -p PID. The kernel sends SIGSTOP-like ptrace stops to interrupt the target, and the tracer takes over. Detach with Ctrl-C and the target resumes normally. Caveat: only one tracer per process. If gdb or another strace is already attached, you get EPERM. Also: on systems with kernel.yama.ptrace_scope=1 (default on Ubuntu), you can only attach to direct descendants without CAP_SYS_PTRACE — use sudo or set ptrace_scope=0.
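
Both caveats can be checked before attaching. The sketch below reads two files under /proc; the function names tracer_of and ptrace_scope are hypothetical, and the code assumes a Linux /proc is mounted. TracerPid in /proc/PID/status is 0 when nothing is attached, and the Yama file is missing on kernels built without Yama.

```shell
#!/bin/sh
# tracer_of PID: prints the PID of whatever is tracing PID (0 = nobody),
# or "unknown" if /proc/PID/status cannot be read.
tracer_of() {
    status_file="/proc/$1/status"
    if [ -r "$status_file" ]; then
        awk '/^TracerPid:/ { print $2; found = 1 }
             END { if (!found) print "unknown" }' "$status_file"
    else
        echo unknown
    fi
}

# ptrace_scope: prints the Yama setting (0-3), or "absent" without Yama.
ptrace_scope() {
    if [ -r /proc/sys/kernel/yama/ptrace_scope ]; then
        cat /proc/sys/kernel/yama/ptrace_scope
    else
        echo absent
    fi
}

echo "tracer of this shell: $(tracer_of $$)"
echo "yama ptrace_scope:    $(ptrace_scope)"
```

A non-zero tracer PID means strace -p will fail with EPERM until that tracer detaches. A ptrace_scope of 1 or higher means attaching to a process that is not your descendant needs root or CAP_SYS_PTRACE.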

Why does strace -c sometimes lie about timings?

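strace -c reports per-syscall counts and aggregate time. By default the 'time' column is system CPU time, i.e. time the target spent running in the kernel for each syscall. It excludes user-space time between syscalls (good), but it also excludes time spent blocked, so a read() that waits a second on a socket looks almost free. With -w the column switches to wall-clock latency between syscall entry and exit, which captures blocking but also includes part of the ptrace overhead itself (bad). Either way the resolution is limited and the tracing perturbs what is being measured. For accurate latency profiling, use perf trace or bpftrace's hist() over the syscalls tracepoints. strace -c is fine for spotting which syscalls dominate, not for measuring absolute latencies.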
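
One small thing you can do with the summary is recompute the per-call average, because the usecs/call column is printed as a whole number and hides differences between cheap syscalls. The sketch below does that from the seconds and calls columns. per_call_us is a hypothetical helper, the input is the sample table from the recipes section, and the result inherits every measurement caveat described above.

```shell
#!/bin/sh
# per_call_us: hypothetical helper. Reads a "strace -c" summary on stdin and
# prints "syscall microseconds-per-call" from the seconds and calls columns.
per_call_us() {
    awk '
        # Data rows start with a numeric percentage; skip header, rule and total.
        $1 ~ /^[0-9.]+$/ && $NF != "total" && $4 > 0 {
            printf "%s %.3f\n", $NF, $2 / $4 * 1000000
        }
    '
}

per_call_us <<'EOF'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 84.23    0.012345           4      3145           read
 10.11    0.001482           2       846       12 openat
  3.55    0.000520           1       512           close
EOF
```

For the sample this prints read 3.925, openat 1.752 and close 1.016, where the original column showed 4, 2 and 1.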