π² MergeTree Engine
Column Storage, Parts & Background Merges β The Heart of ClickHouse
π Column vs Row Storage
Click a column header to see how column-oriented storage reads only what it needs.
Row-Oriented (PostgreSQL-style)
Column-Oriented (ClickHouse)
π¦ Parts & Background Merges
Each INSERT creates a new "part" on disk. Background merges combine parts for efficiency.
π Activity Log
π Sparse Index (Primary Key) Lookup
ClickHouse doesn't index every row β it indexes every granule (8192 rows). Enter a value to see how it skips granules.
ποΈ Compression Codecs
Column storage enables massive compression. Different codecs suit different data patterns.
β‘ Data Skipping Indexes
Beyond the primary index, MergeTree can skip granules using minmax, set, bloom_filter, and ngrambf indexes.
minmax
Stores min/max per granule block. Best for: range queries on nearly-sorted data.
WHERE temperature BETWEEN 20 AND 25 set(N)
Stores unique values (up to N) per block. Best for: equality on low-cardinality columns.
WHERE status = 404 bloom_filter
Probabilistic membership test. Best for: equality checks on high-cardinality columns.
WHERE user_id = 'abc123' ngrambf_v1
N-gram bloom filter for substring search. Best for: LIKE '%pattern%' queries.
WHERE url LIKE '%/api/v2%' π³ The MergeTree Family
MergeTree
Base engine. Stores all inserted rows. Use for append-only analytics and event logs.
ENGINE = MergeTree() ReplacingMergeTree
Deduplicates rows by sorting key during merges. Last version wins. Use for mutable dimension tables.
ENGINE = ReplacingMergeTree(version) SummingMergeTree
Automatically sums numeric columns for rows with same sorting key during merges. Use for pre-aggregated counters.
ENGINE = SummingMergeTree(columns) AggregatingMergeTree
Stores partial aggregation states (AggregateFunction types). Use with materialized views for real-time rollups.
ENGINE = AggregatingMergeTree() CollapsingMergeTree
Uses a sign column (+1 / -1) to logically delete/update rows during merges. Use for mutable event streams.
ENGINE = CollapsingMergeTree(sign) π Partitioning
Partitions organize data into independent directories. Queries that filter on partition keys skip entire partitions.
ποΈ Key MergeTree Settings
index_granularity
Rows per granule. Smaller = more precise skipping, larger index.
index_granularity = 8192 min_bytes_for_wide_part
Parts below this use compact format (single file). Above = wide format (file per column).
min_bytes_for_wide_part = 10485760 merge_with_ttl_timeout
Minimum delay between TTL-triggered merges
merge_with_ttl_timeout = 14400 max_parts_in_total
Too many parts = "too many parts" error. Keep under this limit.
max_parts_in_total = 100000