Adaptive Radix Trees (ART)

DuckDB's Cache-Friendly Index That Replaced B-Trees

Most databases reach for a B-tree when they need an index. B-trees minimize disk I/O by packing hundreds of keys into each page-sized node, but in an in-memory engine like DuckDB, disk access is not the bottleneck; CPU cache efficiency is. The Adaptive Radix Tree (ART) is a trie-based structure that traverses keys byte by byte instead of binary-searching whole keys at each node. Each node adapts its size to the actual number of children it holds, keeping nodes small enough to fit in one or a few CPU cache lines. The result: lookups in O(k) time, where k is the key length in bytes, regardless of how many millions of keys are stored.

Radix Tree vs B-Tree: The Core Difference

B-Tree Lookup

1. Load root node (many keys)
2. Binary search within node
3. Follow pointer to child page
4. Repeat at each level
O(log n) comparisons
Each comparison touches the full key. Large nodes cause cache misses.

ART Lookup

1. Read first byte of key
2. Follow that byte's child pointer
3. Read next byte, follow pointer
4. Repeat for each byte in key
O(k) where k = key length
No comparisons within nodes. Nodes fit in cache lines.
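The byte-at-a-time descent above can be sketched with a toy trie. This is a minimal sketch, not DuckDB's implementation: a plain dict stands in for ART's adaptive nodes, and names like `art_lookup`, `Inner`, and `Leaf` are illustrative.

```python
class Leaf:
    """Stores a full key and its value."""
    def __init__(self, key: bytes, value):
        self.key = key
        self.value = value

class Inner:
    """Maps one key byte to a child node (stand-in for Node4..Node256)."""
    def __init__(self):
        self.children = {}

def art_lookup(root, key: bytes):
    """O(k) descent: one child-pointer follow per key byte, no comparisons."""
    node = root
    for byte in key:
        if isinstance(node, Leaf):
            break                       # reached a leaf before consuming the key
        node = node.children.get(byte)
        if node is None:
            return None                 # no child for this byte: key is absent
    if isinstance(node, Leaf) and node.key == key:
        return node.value
    return None
```

Note that the loop's work is bounded by the key length, never by how many keys the tree holds.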

Adaptive Node Sizes

ART uses four node types that adapt to the number of children. Sparse nodes stay small; dense nodes allow direct indexing. The engine promotes nodes automatically when they grow.

Node4 (1-4 children): linear search through a sorted array of up to 4 key bytes. ~52 bytes, fits in one cache line. Fastest for sparse prefixes.

Node16 (5-16 children): SIMD parallel comparison checks all 16 key bytes at once with a single CPU instruction. ~160 bytes, 2-3 cache lines. Nearly as fast as Node4.

Node48 (17-48 children): a 256-byte index array maps key bytes to 48 child slots. One indirection, no search. ~656 bytes. Constant-time with one extra lookup.

Node256 (49-256 children): direct 256-pointer array; the key byte is the array index, zero search required. ~2 KB. Constant-time, no indirection at all.
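The three search strategies can be sketched roughly in Python (the real engine implements these in C++ over raw arrays, with SSE for Node16; the function names and the empty-slot marker here are assumptions):

```python
def find_child_node4(keys, children, byte):
    # Node4 style: linear scan over a small sorted array of key bytes.
    # (Node16 replaces this loop with one SIMD compare over 16 bytes.)
    for i, k in enumerate(keys):
        if k == byte:
            return children[i]
    return None

EMPTY = 255  # hypothetical "no child" marker for the Node48 index

def find_child_node48(index, children, byte):
    # Node48 style: a 256-entry index maps the key byte to a child slot.
    slot = index[byte]
    return children[slot] if slot != EMPTY else None

def find_child_node256(children, byte):
    # Node256 style: direct addressing, the key byte IS the array index.
    return children[byte]
```

Each step down the size ladder trades a little memory for less search work, until Node256 needs no search at all.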


Path Compression

When a chain of nodes each has only one child, ART collapses them into a single node that stores the shared prefix. This dramatically reduces tree height and memory usage for keys with long common prefixes.

Without compression
'a'
|
'p'
|
'p'
|
'l'
|
'e'
5 nodes, 5 pointer dereferences
->
With path compression
"apple"
1 node, 1 pointer dereference
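The payoff shows up during lookup: the collapsed prefix is compared in one step instead of five pointer dereferences. A minimal sketch, with illustrative names rather than DuckDB's internals:

```python
class CompressedNode:
    """Inner node that stores a collapsed run of single-child bytes."""
    def __init__(self, prefix: bytes):
        self.prefix = prefix      # e.g. b"apple" for the five collapsed nodes
        self.children = {}

def match_prefix(node: CompressedNode, key: bytes, depth: int) -> int:
    """Return the number of prefix bytes matched, or -1 on divergence."""
    p = node.prefix
    if key[depth:depth + len(p)] == p:
        return len(p)             # whole prefix consumed in one comparison
    return -1
```

A lookup that matches advances its depth by the full prefix length at once; a mismatch means the key is not in the tree.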

Lazy Expansion

ART delays creating inner nodes until a second key forces it. If a prefix path has only one key, the key is stored directly without expanding the trie structure.

1
Insert "cat" — stored as a single leaf. No inner nodes created.
2
Insert "car" — shares prefix "ca". Now ART expands: creates inner nodes for "c" and "a", then branches at the third byte ("t" vs "r").
3
Insert "dog" — no shared prefix with "ca*". Stored as a new leaf off the root. No expansion of the "ca" subtree.
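The three steps above can be traced with a toy insert routine. Tuples and a dict stand in for real ART nodes; this is a sketch of the lazy-expansion idea, not the production algorithm:

```python
def insert(node, key: bytes, value, depth=0):
    """Insert with lazy expansion: a lone key stays a plain leaf."""
    if node is None:
        return ("leaf", key, value)              # step 1: no inner nodes yet
    if node[0] == "leaf":
        _, old_key, old_val = node
        if old_key == key:
            return ("leaf", key, value)          # overwrite existing key
        # Step 2: a second key forces expansion; add one inner node and
        # recurse until the two keys diverge.
        inner = ("inner", {old_key[depth]: ("leaf", old_key, old_val)})
        return insert(inner, key, value, depth)
    children = node[1]
    b = key[depth]
    children[b] = insert(children.get(b), key, value, depth + 1)
    return node

def lookup(node, key: bytes, depth=0):
    while node is not None and node[0] == "inner":
        node = node[1].get(key[depth])
        depth += 1
    if node is not None and node[1] == key:
        return node[2]
    return None
```

Inserting "cat" and then "car" creates inner nodes only for the shared bytes "c" and "a"; inserting "dog" afterwards adds a single leaf under the root (step 3) without touching the "ca" subtree.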

Performance: ART vs B-Tree Numbers

O(k): ART lookup time. k = key length in bytes; independent of dataset size.
O(log n): B-tree lookup time. Each level requires a binary search with full key comparisons.
64 B: one cache line holds a Node4. No cache misses during traversal of sparse nodes.
1 instruction: Node16 SIMD search. SSE2 compares all 16 key bytes in a single CPU instruction.

Frequently Asked Questions

Why does DuckDB use ART instead of B-trees for indexing?

B-trees were designed for disk-based systems where minimizing I/O is paramount; large nodes amortize the cost of each page fetch. DuckDB operates primarily in memory, so disk I/O is not the bottleneck. ART's byte-by-byte key traversal avoids the binary-search overhead inside each B-tree node. The smallest ART node, Node4, fits within a single 64-byte cache line, and even the larger node types span only a few lines, making traversal extremely cache-friendly. The result is O(k) lookup time based on key length rather than O(log n) based on dataset size.

What are the four ART node types and when does each one apply?

Node4 stores 1-4 children using sorted arrays and linear scan — it occupies just 52 bytes. Node16 holds 5-16 children and uses SIMD parallel comparison to check all keys in a single CPU instruction. Node48 supports 17-48 children via a 256-byte index array that maps key bytes directly to child slots. Node256 handles 49-256 children with a flat 256-pointer array for constant-time direct lookup. The engine promotes a node to the next size when it exceeds its capacity.
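The promotion rule itself is just a threshold check on the child count (thresholds as listed above; the helper name is illustrative):

```python
def node_type_for(n_children: int) -> str:
    """Smallest adaptive node type that can hold n_children."""
    if n_children <= 4:
        return "Node4"
    if n_children <= 16:
        return "Node16"
    if n_children <= 48:
        return "Node48"
    return "Node256"      # up to 256 children: one per possible key byte
```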

How does path compression work in an Adaptive Radix Tree?

When a sequence of internal nodes each has only a single child, ART collapses them into one node that stores the shared prefix bytes directly. For example, if keys 'application' and 'applicable' share the prefix 'applic', ART stores those six bytes in a single compressed node instead of creating six one-child nodes. This reduces tree height, saves memory, and speeds up traversal because fewer pointer dereferences are needed.

Does ART support range queries or only point lookups?

ART supports both. For a range query, the tree is traversed to the starting key, and an in-order traversal then visits all keys up to the end of the range. Because ART stores keys in lexicographic byte order by construction, range scans follow the natural tree structure without any additional sorting: an in-order traversal yields leaves already in sorted key order, making range iteration efficient.
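Such a scan can be sketched over a hand-built tuple/dict trie (illustrative, not DuckDB's iterator); because children are visited in ascending byte order, the keys come out sorted with no extra sort step:

```python
def range_scan(node, lo: bytes, hi: bytes):
    """Yield (key, value) pairs with lo <= key <= hi, in sorted order."""
    if node[0] == "leaf":
        if lo <= node[1] <= hi:
            yield (node[1], node[2])
        return
    for byte in sorted(node[1]):          # ascending byte order = sorted keys
        yield from range_scan(node[1][byte], lo, hi)

# Hand-built tree for the keys "car", "cat", "dog"
tree = ("inner", {
    ord("c"): ("inner", {ord("a"): ("inner", {
        ord("r"): ("leaf", b"car", 2),
        ord("t"): ("leaf", b"cat", 1),
    })}),
    ord("d"): ("leaf", b"dog", 3),
})
```

A production scan would additionally prune subtrees whose prefix already falls outside the range, rather than descending into every branch as this sketch does.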

What is lazy expansion and how does it save memory?

Lazy expansion means ART does not create inner trie nodes until they are actually required by a second key sharing the same prefix. If only one key passes through a given prefix path, ART stores that key directly in a leaf without expanding the intermediate structure. When a second key with a conflicting prefix is inserted, the engine expands just enough of the trie to distinguish the two keys. This avoids wasting memory on single-key paths that would otherwise require many one-child nodes.