strace
Watching Every Syscall a Process Makes — and What It Costs You
strace is the universal "what is this thing actually doing?" tool on Linux.
It prints every system call a process makes, with arguments and return values, in
human-readable form. It works on any binary — Go, Rust, Python, a closed-source blob —
because all of them ultimately go through the same kernel syscall interface.
The mechanism is ptrace(2), the same primitive that powers gdb. Every syscall
triggers two stops in the target process: one on entry, one on exit. At each stop the tracer reads
the target's registers, decodes the arguments or return value, prints them, and resumes the target. This is
powerful but expensive: syscall-heavy programs can easily run 20x slower. For
production observability, eBPF-based tools such as bpftrace and bcc
collect similar data at a fraction of the cost.
How strace Works
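A four-stage dance between tracer, target, and kernel, repeated for every single syscall. (1) The target enters a syscall and the kernel stops it, waking the tracer from waitpid(). (2) The tracer reads the registers to get the syscall number and arguments, then resumes the target with PTRACE_SYSCALL. (3) The kernel executes the syscall and stops the target again on exit. (4) The tracer reads the return value, prints the line, and resumes the target until its next syscall.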
The Recipes You'll Actually Use
# Trace a command from start to finish
$ strace ls /tmp
# Follow forks and threads (essential for any non-trivial program)
$ strace -f make -j4
# Attach to a running process
$ strace -p 1234
# Filter to just file-related syscalls
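$ strace -e trace=open,openat,close,read,write,stat,lstat,fstat,newfstatat,statx ./app
# Use syscall classes (faster to type)
$ strace -e trace=%file ./app      # syscalls that take a filename argument
$ strace -e trace=%desc ./app      # syscalls that take a file descriptor (read, write, close, ...)
$ strace -e trace=%network ./app   # socket and network syscalls
$ strace -e trace=%process ./app   # process lifecycle: fork/clone, exec, wait, exit
$ strace -e trace=%signal ./app    # signal-related syscalls (kill, rt_sigaction, ...)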
# Show only failed syscalls (great for "why won't it open?")
$ strace -Z ./app                  # only syscalls that returned an error
# Inject a failure to test the program's error handling
$ strace -e trace=openat -e fault=openat:error=EACCES ./app
# Aggregate counts and timings (what's the program spending time on?)
$ strace -c ./app
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 84.23    0.012345           4      3145           read
 10.11    0.001482           2       846        12 openat
  3.55    0.000520           1       512           close
# Print full strings (default is truncated to 32 bytes)
$ strace -s 1024 -e trace=write ./app
# Save to file, with timestamps and one file per PID
$ strace -ff -tt -o /tmp/trace ./app
$ ls /tmp/trace.*
/tmp/trace.4123  /tmp/trace.4124  /tmp/trace.4125
# Show socket details (protocol, local and remote endpoints) for each fd
$ strace -yy -e trace=%network curl https://example.com
Decoding the Output
A typical line:
openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
       ^         ^             ^                     ^
       |         path          flags                 return value (FD)
       directory FD for relative paths (AT_FDCWD = current directory)
On failure the kernel returns a negative errno. strace displays it libc-style as = -1 ENOENT (No such file or directory), decoding the errno name and message for you. Pointers are dereferenced when strace knows the type:
fstat(3, {st_mode=S_IFREG|0644, st_size=412, ...}) = 0
read(3, "127.0.0.1\tlocalhost\n::1\tlocalhos"..., 4096) = 412
                                             ^^^
                                             "..." marks truncation at the -s limit (default 32 bytes)
The ptrace API That Powers It
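/* What strace actually calls (heavily simplified; x86-64 Linux only) */
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
    if (argc < 2) return 1;              /* usage: ./mini-strace cmd [args...] */
    /* 1. Fork the target; the child issues PTRACE_TRACEME and execs */
    pid_t kid = fork();
    if (kid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);                  /* wait for parent */
        execvp(argv[1], &argv[1]);
        _exit(127);                      /* only reached if exec failed */
    }
    /* 2. Parent waits for the SIGSTOP, sets options
       (following children would also need PTRACE_O_TRACEFORK and waitpid(-1, ...)) */
    int status, sig = 0, entering = 1;
    waitpid(kid, &status, 0);
    ptrace(PTRACE_SETOPTIONS, kid, NULL,
           (void *)(long)(PTRACE_O_TRACESYSGOOD |  /* tag syscall stops with bit 7 in signal */
                          PTRACE_O_TRACEEXEC));
    /* 3. The main loop: continue, wait, decode, repeat */
    while (1) {
        ptrace(PTRACE_SYSCALL, kid, NULL, (void *)(long)sig);  /* resume until next syscall stop */
        sig = 0;
        waitpid(kid, &status, 0);
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                       /* target is gone */
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, kid, NULL, &regs);
            /* regs.orig_rax = syscall number, regs.rdi/rsi/rdx = args, regs.rax = return value */
            if (entering)
                fprintf(stderr, "syscall(%llu)", regs.orig_rax);
            else
                fprintf(stderr, " = %lld\n", (long long)regs.rax);
            entering = !entering;
        } else if (WSTOPSIG(status) != SIGTRAP) {
            sig = WSTOPSIG(status);      /* forward real signals to the target */
        }
    }
    return 0;
}
strace vs bpftrace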
| Aspect | strace | bpftrace |
|---|---|---|
| Mechanism | ptrace stops | eBPF programs attached to tracepoints/kprobes |
| Overhead | Up to ~20x for syscall-heavy programs | Typically low (a few percent); grows with event rate and per-probe work |
| Scope | One process (and children with -f) | System-wide or filtered to PIDs |
| Output | One line per syscall, full args | Aggregations: counts, histograms, summaries |
| Filtering | By syscall name and class | Arbitrary expressions in BPF DSL |
| Per-call args | All decoded by default | You write what you want |
| Production-safe? | No (slowdown affects user-visible behavior) | Yes (designed for it) |
# bpftrace: count syscalls by name across the whole system
bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); }'
# bpftrace: latency histogram for read()
bpftrace -e '
tracepoint:syscalls:sys_enter_read { @s[tid] = nsecs; }
tracepoint:syscalls:sys_exit_read /@s[tid]/ { @ns = hist(nsecs - @s[tid]); delete(@s[tid]); }
'
Tradeoffs
Strengths:
- Universal: works on any binary, any language
- Pretty-prints hundreds of syscall argument structures for free
- No kernel-side setup; packaged by every major distro
- Shows exactly the userspace/kernel boundary, with arguments
Weaknesses:
- Overhead of up to ~20x can mask the bug you're hunting (timing-dependent races vanish)
- Only one tracer per process, so strace and gdb can't attach to the same target at once
- Blind to kernel-internal events (page faults, scheduling)
- Output volume swamps your terminal for busy programs
Frequently Asked Questions
Why is strace so slow?
Each syscall the target makes triggers two ptrace stops, and each stop costs at least two context switches (target to tracer and back). The kernel stops the target on syscall entry. The tracer reads its registers and arguments, then issues PTRACE_SYSCALL to resume it. The kernel stops the target again on syscall exit. The tracer reads the return value and resumes it once more. A syscall that takes 0.5 microseconds on its own can therefore cost on the order of 10 microseconds under strace. That roughly 20x slowdown compounds for syscall-heavy workloads. Programs that mostly compute and make few syscalls slow down only mildly; programs that make millions of small syscalls (e.g., a web server serving small files) can become unusable. The cost is built into ptrace: it is a synchronous, stop-the-target inspection model, not a sampling model.
How does strace differ from ltrace?
strace traces syscalls — the boundary between user space and the kernel. It works on any binary regardless of language. ltrace traces library calls — calls to functions in shared libraries (libc, libssl, etc.). ltrace works by overwriting PLT entries with breakpoints; it's much less reliable than strace and breaks easily on modern binaries that use BIND_NOW, full RELRO, or static linking. For real production debugging, strace is the workhorse; ltrace is a curiosity that mostly works on glibc-linked dynamic binaries.
When should I use bpftrace instead of strace?
Use bpftrace when (a) strace's overhead would be unacceptable in production, (b) you want aggregate statistics across many processes without picking one to attach to, or (c) you want to filter on conditions strace can't express (e.g., 'show me all openat() calls that returned EACCES from PIDs whose parent is sshd'). Use strace when you need exhaustive per-call argument decoding for a single process during debugging. bpftrace can show syscall counts and selected arguments, but it doesn't have strace's library of pretty-printers for every syscall struct.
What's the difference between -f and -ff?
-f follows forks and clones: when the traced process spawns a child or thread, strace traces it too. Output is interleaved, with each line prefixed by the PID ([pid N]). -ff (used with -o filename) also follows forks, but writes each PID's trace to a separate file (filename.PID). That is useful for multi-process programs (web servers, build systems) where interleaved output is unreadable. Without -f, you only see the parent and miss everything the children do, a common confusion when tracing things like make or systemd services.
Can I attach strace to a running process?
Yes: strace -p PID. strace attaches with ptrace (PTRACE_SEIZE on modern kernels, or the older PTRACE_ATTACH, which stops the target with SIGSTOP). From then on the target stops at every syscall just as if strace had launched it. Detach with Ctrl-C and the target resumes normally. Caveat: only one tracer per process. If gdb or another strace is already attached, you get EPERM. Also: on systems with kernel.yama.ptrace_scope=1 (the default on Ubuntu), an unprivileged user can only attach to its own descendants. For any other process, use sudo (CAP_SYS_PTRACE) or, on a test machine, set ptrace_scope=0.
Why does strace -c sometimes lie about timings?
strace -c reports per-syscall counts and aggregate time. By default the time column is the system CPU time attributed to each syscall. With -w, strace instead sums the wall-clock time between each syscall's entry and exit stops. Either way, user-space time between syscalls is excluded (good), but the ptrace machinery itself adds overhead to what gets measured (bad), and on some kernels CPU-time accounting is coarse-grained. For accurate kernel-time profiling, use perf trace or a bpftrace hist() over syscall entry/exit tracepoints. strace -c is fine for spotting which syscalls dominate, not for measuring absolute latencies.