DuckDB Internals

DuckDB is an in-process OLAP database that crunches billions of rows on a laptop. No server, no config, no dependencies — just embed it and query. These pages dissect the engineering that makes this possible: vectorized execution, push-based pipelines, adaptive radix trees, and a query optimizer that rivals production systems ten times its size.

Inspired by the DiDi curriculum from the University of Tübingen. All content is original.

DuckDB Architecture Overview

Key Numbers

Vector size: 2,048
Execution model: Push-based
Index type: ART (Adaptive Radix Tree)
Parallelism: Morsel-driven
Storage: Columnar

Why DuckDB Exists

The Gap
Before DuckDB, local analytical queries meant either: (a) load everything into pandas (slow, memory-hungry), (b) stand up a PostgreSQL/ClickHouse server (ops overhead), or (c) use SQLite (row-store, terrible for analytics). There was no embedded columnar engine.
The Insight
Modern laptops have 8-64 cores and 16-128GB RAM — enough to analyze datasets that previously required distributed clusters. What was missing was a database engine designed to exploit this hardware without the operational complexity of a server.
The Result
DuckDB combines 30 years of database research (vectorized execution from MonetDB, morsel parallelism from HyPer, ART from Tübingen) into a single embeddable library. pip install duckdb and you have a full analytical engine.

Vectorized Execution

Process 2048 values at once instead of row-by-row — how DuckDB's columnar engine exploits CPU caches, SIMD, and pipeline parallelism
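The core idea can be sketched in pure Python (the real engine is C++ with SIMD; 2,048 is DuckDB's default vector size): operators consume and produce fixed-size vectors instead of single rows, so interpretation overhead is paid once per vector, not once per value.

```python
# Sketch of vector-at-a-time execution. Each operator call handles a
# whole vector of up to 2048 values, amortizing per-call overhead.
VECTOR_SIZE = 2048  # DuckDB's default vector size

def scan(column, vector_size=VECTOR_SIZE):
    """Yield the column one vector (chunk) at a time."""
    for i in range(0, len(column), vector_size):
        yield column[i:i + vector_size]

def filter_gt(vectors, threshold):
    """Filter each vector in bulk: one call per vector, not per row."""
    for vec in vectors:
        yield [v for v in vec if v > threshold]

def sum_sink(vectors):
    """Terminal aggregate over all vectors."""
    return sum(sum(vec) for vec in vectors)

data = list(range(10_000))
total = sum_sink(filter_gt(scan(data), 9_000))
print(total)  # 9490500
```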


Query Pipeline

Push-based execution with morsel-driven parallelism — how DuckDB builds and runs query plans without materializing intermediate results
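A minimal sketch of the push model (an assumed simplification, not DuckDB's actual operator API): the source drives the pipeline by pushing chunks downstream, rather than each operator pulling rows from its child as in the Volcano model.

```python
# Push-based pipeline sketch: data flows source -> filter -> sink,
# driven by the source. No operator ever asks its child for rows.
class Sink:
    """Pipeline terminus: collects whatever is pushed into it."""
    def __init__(self):
        self.rows = []
    def push(self, chunk):
        self.rows.extend(chunk)

class Filter:
    """Intermediate operator: pushes qualifying rows onward."""
    def __init__(self, predicate, child):
        self.predicate, self.child = predicate, child
    def push(self, chunk):
        self.child.push([r for r in chunk if self.predicate(r)])

def run_source(chunks, pipeline):
    """The scan is the driver: it pushes each chunk (morsel) through."""
    for chunk in chunks:
        pipeline.push(chunk)

sink = Sink()
run_source([[1, 2, 3], [4, 5, 6]], Filter(lambda r: r % 2 == 0, sink))
print(sink.rows)  # [2, 4, 6]
```

Morsel-driven parallelism falls out naturally: different threads can push different chunks through their own copy of the same pipeline.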


ART Indexing

Adaptive Radix Trees — the cache-friendly, space-efficient index structure DuckDB uses instead of B-trees
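A toy sketch of the "adaptive" part (real ART also has Node16/Node48, path compression, and lazy leaf expansion): small nodes store a handful of (byte, child) pairs and search them linearly, and are promoted to a 256-slot array indexed directly by byte value once they grow.

```python
# Toy adaptive radix node: linear-scan list while small, promoted to a
# direct-indexed 256-entry array once it holds more than 4 children.
class Node:
    def __init__(self):
        self.small = []      # list of (key_byte, child) pairs
        self.big = None      # 256-entry array after promotion
        self.value = None    # payload if a key terminates here

    def child(self, b):
        if self.big is not None:
            return self.big[b]
        for kb, c in self.small:
            if kb == b:
                return c
        return None

    def set_child(self, b, node):
        if self.big is None and len(self.small) >= 4:
            self.big = [None] * 256          # promote "Node4" -> "Node256"
            for kb, c in self.small:
                self.big[kb] = c
        if self.big is not None:
            self.big[b] = node
        else:
            self.small.append((b, node))

def insert(root, key: bytes, value):
    node = root
    for b in key:
        nxt = node.child(b)
        if nxt is None:
            nxt = Node()
            node.set_child(b, nxt)
        node = nxt
    node.value = value

def lookup(root, key: bytes):
    node = root
    for b in key:
        node = node.child(b)
        if node is None:
            return None
    return node.value

root = Node()
insert(root, b"duck", 1)
insert(root, b"dusk", 2)
print(lookup(root, b"duck"))  # 1
```

The key property: lookup cost depends on key length, not tree size, and small nodes stay cache-resident.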


Sorting Large Tables

External merge sort with columnar layouts — how DuckDB sorts datasets bigger than memory without falling off a performance cliff
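The general technique can be sketched as follows (an assumed simplification: DuckDB's actual sorter works on columnar row layouts with normalized keys): sort memory-sized runs, spill each to disk, then stream a k-way merge over the runs.

```python
# External merge sort sketch: phase 1 spills sorted runs to temp files,
# phase 2 merges them lazily so only one value per run is in memory.
import heapq
import tempfile

def spill_run(run):
    """Write one sorted run to a temp file, one value per line."""
    f = tempfile.TemporaryFile(mode="w+")
    f.writelines(f"{v}\n" for v in run)
    f.seek(0)
    return f

def read_run(f):
    """Stream a spilled run back, value by value."""
    for line in f:
        yield int(line)

def external_sort(values, run_size=4):
    # Phase 1: sort runs that fit "in memory" and spill them to disk.
    files = [spill_run(sorted(values[i:i + run_size]))
             for i in range(0, len(values), run_size)]
    # Phase 2: streaming k-way merge (heapq.merge never materializes
    # the full input, which is why there is no performance cliff).
    return list(heapq.merge(*(read_run(f) for f in files)))

print(external_sort([9, 1, 7, 3, 8, 2, 6, 4, 5]))  # [1, 2, ..., 9]
```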


Memory Management

Buffer manager, memory limits, and spill-to-disk — how an in-process database handles datasets larger than available RAM
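A toy sketch of the idea (names and structure are illustrative, not DuckDB's API): keep a bounded set of pages in memory, evict the least-recently-used page to disk when the limit is hit, and transparently reload evicted pages on access.

```python
# Toy buffer manager: bounded in-memory page pool with LRU eviction
# to a "disk" dict standing in for spill files.
from collections import OrderedDict

class BufferManager:
    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.memory = OrderedDict()   # page_id -> data, in LRU order
        self.disk = {}                # stand-in for spilled pages

    def pin(self, page_id, data=None):
        """Return a page, reloading it from 'disk' if it was evicted."""
        if page_id in self.memory:
            self.memory.move_to_end(page_id)      # mark recently used
        else:
            if data is None:
                data = self.disk.pop(page_id)     # reload spilled page
            self.memory[page_id] = data
            while len(self.memory) > self.max_pages:
                victim, vdata = self.memory.popitem(last=False)
                self.disk[victim] = vdata         # evict LRU to disk
        return self.memory[page_id]

bm = BufferManager(max_pages=2)
bm.pin("p1", b"aaa")
bm.pin("p2", b"bbb")
bm.pin("p3", b"ccc")          # evicts p1 to "disk"
print("p1" in bm.disk)        # True
print(bm.pin("p1"))           # b'aaa' (reloaded; p2 evicted instead)
```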


Query Optimizer

Join ordering, filter pushdown, common subexpression elimination — the rewrite rules that make DuckDB queries fast
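One of those rewrites, filter pushdown, can be sketched on a toy plan representation (an assumed simplification of how any optimizer's rule engine works): move a filter below a projection whenever the predicate only touches columns the projection keeps, so rows are discarded before any further work is spent on them.

```python
# Filter pushdown sketch on plans encoded as nested tuples:
# ('filter', pred_cols, child) / ('project', cols, child) / ('scan', name)
def pushdown(plan):
    if plan[0] == "filter" and plan[2][0] == "project":
        pred_cols, project = plan[1], plan[2]
        proj_cols, child = project[1], project[2]
        if set(pred_cols) <= set(proj_cols):
            # Swap the operators: filter first, project the survivors.
            return ("project", proj_cols,
                    ("filter", pred_cols, pushdown(child)))
    return plan

plan = ("filter", ["a"], ("project", ["a", "b"], ("scan", "t")))
print(pushdown(plan))
# ('project', ['a', 'b'], ('filter', ['a'], ('scan', 't')))
```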


Storage Format

Row groups, column segments, min/max indexes, and Parquet integration — how DuckDB persists and scans data on disk
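The min/max idea is worth a small sketch (an assumed simplification; DuckDB's row groups hold ~120K rows and keep per-segment statistics): each row group records min/max per column, and a filtered scan skips whole groups whose range cannot contain matches.

```python
# Zone-map (min/max) skipping sketch: a scan with an equality filter
# touches only row groups whose [min, max] range covers the target.
def make_row_groups(values, group_size=3):
    groups = []
    for i in range(0, len(values), group_size):
        g = values[i:i + group_size]
        groups.append({"min": min(g), "max": max(g), "rows": g})
    return groups

def scan_eq(groups, target):
    scanned, hits = 0, []
    for g in groups:
        if g["min"] <= target <= g["max"]:   # zone map says "maybe here"
            scanned += 1
            hits.extend(v for v in g["rows"] if v == target)
    return hits, scanned

groups = make_row_groups([1, 2, 3, 10, 11, 12, 20, 21, 22])
hits, scanned = scan_eq(groups, 11)
print(hits, scanned)  # [11] 1  -- two of the three groups were skipped
```

This is why scans over data that is sorted or clustered on the filter column are dramatically cheaper: most groups are pruned before any values are read.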


DuckDB vs SQLite

Column-store vs row-store, OLAP vs OLTP, embedded vs embedded — when to use which and why the architecture matters

DuckDB vs Other Databases

|                 | DuckDB                     | PostgreSQL               | SQLite                 | ClickHouse                  |
|-----------------|----------------------------|--------------------------|------------------------|-----------------------------|
| Deployment      | Embedded (in-process)      | Client-server            | Embedded (in-process)  | Client-server / cluster     |
| Storage layout  | Columnar                   | Row-based (heap)         | Row-based (B-tree)     | Columnar (MergeTree)        |
| Best for        | Local analytics, data science | OLTP + moderate OLAP  | OLTP, mobile, config   | Production OLAP at scale    |
| Execution model | Vectorized, push-based     | Volcano (pull, row)      | Virtual machine (row)  | Vectorized, push-based      |
| Parallelism     | Morsel-driven (auto)       | Parallel query (limited) | Single-threaded        | Multi-threaded + distributed |
| Concurrency     | Single writer, multi reader | Full MVCC               | WAL mode multi-reader  | Append-oriented             |
| Index type      | ART (Adaptive Radix Tree)  | B-tree, GIN, GiST        | B-tree                 | Sparse (MergeTree primary)  |
| Dependencies    | Zero                       | Server + extensions      | Zero                   | Server + ZooKeeper/Keeper   |