ποΈ Data Compression
Codecs, Stacking Strategies & Storage Efficiency in ClickHouse
ποΈ Why Column Stores Champion Compression
Row-oriented databases store one row's columns contiguously. Column-oriented databases store one column's values contiguously. That single difference changes everything for compression.
Row-Oriented Storage
Column-Oriented Storage
Column Homogeneity
All values in a column share the same data type and often similar semantic meaning. This predictability lets specialized codecs work at maximum efficiency.
Repetition Amplification
Low-cardinality columns have enormous repetition at the byte level. A column of status codes (200, 404, 500) compresses far better than the same data scattered across rows.
Sequential Pattern Discovery
Timestamps increment by fixed amounts. Sensor readings change gradually. Network IDs have prefix patterns. Column storage exposes these patterns for codec optimization.
Memory Footprint Reduction
Compressed data stays compressed in memory. ClickHouse's vectorized execution processes compressed data directly β fewer memory reads, more cache-friendly scans.
β‘ The ClickHouse Compression Philosophy
ClickHouse optimizes for compression ratio first, decompression speed second. In OLAP workloads, data is read far more often than it is written, and queries scan millions of rows. Getting more data per read is the primary goal.
π― Primary Goal: Maximum Ratio
Better compression means more data fits in disk I/O buffers, more data stays in CPU cache, and less data needs to be read from storage. The savings compound across every query.
βοΈ Acceptable Tradeoff: CPU Time
Decompression CPU cost is predictable and bounded. With modern CPUs doing 5β10 GB/s LZ4 decompression, the CPU cost is almost always worth the I/O savings.
π Secondary Goal: Encode Speed
Data is written once, read many times. ClickHouse tolerates slower encoding in exchange for faster decoding β writes happen during ingestion windows, reads happen constantly.
ποΈ Codec Selectability
Unlike most databases with one or two compression options, ClickHouse exposes per-column codec selection β letting you match the compressor to the data pattern.
| Philosophy Aspect | ClickHouse Approach | Typical RDBMS Approach |
|---|---|---|
| Compression priority | Ratio over speed β ZSTD default | Speed over ratio β LZ4 or no compression |
| Codec selection | Per-column, user-specified | Global, automatic, or none |
| Specialized codecs | Delta, Gorilla, DoubleDelta, T64 | None β generic only |
| Codec stacking | Up to 4 codecs per column | Single codec or none |
| Block-level optimization | Per-granule codec selection (8192 rows) | Table or page-level only |
| Memory compression | Full support β processed in-place | Usually decompress-on-read |
ποΈ Generic Compression Algorithms
ClickHouse supports several general-purpose compression algorithms. These are the workhorses β they're the fallback when specialized codecs don't apply, and they often work very well as the final stage in a codec stack.
LZ4 Default
Use when: You need the fastest possible decompression. Good default for frequently-accessed hot data where I/O is the bottleneck.
ALTER TABLE events MODIFY COLUMN timestamp_codec LZ4; ZSTD Recommended
Use when: You want the best balance of ratio and speed. ClickHouse's default for most scenarios. ZSTD level 1 is default, but you can tune the level (1β22) for more ratio at more CPU cost.
-- Default ZSTD level 1
ALTER TABLE events MODIFY COLUMN data_codec ZSTD(1);
-- Higher ratio, more CPU
ALTER TABLE events MODIFY COLUMN data_codec ZSTD(3); LZMA
Use when: Data is archived and rarely accessed. Maximum compression ratio matters more than decode speed. Very high CPU usage β not recommended for hot data.
ALTER TABLE archive MODIFY COLUMN old_data_codec LZMA; Zlib (gzip)
Use when: Compatibility with other tools matters, or when you need gzip-compatible output. Slower than LZ4 or ZSTD with similar ratio to ZSTD level 1.
ALTER TABLE logs MODIFY COLUMN payload_codec Zlib; Ziqng
Use when: Maximum compression on cold archive data. Better ratio than ZSTD level 1, but significantly slower. Niche use case.
ALTER TABLE archive MODIFY COLUMN cold_data_codec Ziqng; π Generic Codec Benchmark (approximate)
| Codec | Encode Speed (MB/s) | Decode Speed (MB/s) | Compression Ratio | CPU Cost |
|---|---|---|---|---|
| LZ4 | ~800 | ~2500 | 2β4Γ | Low |
| ZSTD (level 1) | ~350 | ~900 | 3β6Γ | Medium |
| ZSTD (level 3) | ~180 | ~850 | 4β8Γ | Medium-High |
| Zlib | ~100 | ~300 | 3β5Γ | Medium |
| LZMA | ~30 | ~200 | 5β10Γ | High |
| Ziqng | ~20 | ~350 | 5β9Γ | High |
* Benchmarked on Intel Skylake-class hardware. Your hardware will vary. Codec performance depends heavily on data entropy β high-randomness data compresses poorly with all algorithms.
π¬ Specialized Codecs β The ClickHouse Secret Sauce
Generic compression treats data as opaque bytes. ClickHouse's specialized codecs understand the semantics of your data β timestamps increment, floats drift slowly, IDs share prefixes. This understanding lets them achieve dramatically better ratios than generic algorithms.
Specialized codecs work by detecting and exploiting data patterns. They don't just compress β they transform data into a more compact representation based on what the data means, not just what it contains.
Most specialized codecs are lossless β the original data is perfectly reconstructed. They're designed to stack on top of each other, with each codec in the chain handling a specific pattern.
π DEFAULT β No Compression
Stores data uncompressed. Sometimes the right choice β data that's already encrypted or randomly distributed won't compress further, and compression adds CPU overhead for no benefit.
- UUID columns β random by design, no compression benefit
- Already-compressed data (JPEG, video, binary blobs)
- Columns with very high cardinality (>1M unique values)
- Data in memory-only tables where I/O isn't a concern
-- UUIDs don't compress well
CREATE TABLE events (
event_id UUID DEFAULT,
user_id UUID DEFAULT,
payload String DEFAULT -- already compressed external data
) ENGINE = MergeTree()
ORDER BY (event_id); π Delta β Monotonic Sequence Compression
Stores the difference between consecutive values instead of raw values. For monotonically increasing sequences (timestamps, auto-increment IDs), differences are tiny β a timestamp delta might be just 1 instead of 1704067200.
value[i] - value[i-1]. Small deltas compress dramatically better than large raw numbers.
- Timestamps with regular intervals (1s, 1ms, 1ΞΌs)
- Monotonically increasing IDs (user_id, order_id)
- Sequence numbers in event logs
- Sensor readings taken at fixed intervals
-- Delta(4) stores deltas as 4-byte integers
-- Good for timestamps where deltas fit in 4 bytes
CREATE TABLE sensor_events (
timestamp DateTime64(3) CODEC(Delta(4)),
sensor_id UInt32 CODEC(Delta(4)),
temperature Float32 CODEC(Gorilla)
) ENGINE = MergeTree()
ORDER BY (sensor_id, timestamp); ππ DoubleDelta β Predictable Rate-of-Change
Stores the delta of deltas β the second-order difference. For sequences that change at a constant rate (timestamps incrementing by exactly 1ms, counters incrementing by exactly 1), DoubleDelta achieves extraordinary compression.
-- DoubleDelta(4) for 4-byte timestamp deltas
-- Excellent for regular-interval time series
CREATE TABLE metrics (
timestamp DateTime64(3) CODEC(DoubleDelta),
metric_name LowCardinality(String),
value Float64 CODEC(Gorilla)
) ENGINE = MergeTree()
ORDER BY (metric_name, timestamp);
-- DoubleDelta shines for DateTime64(3) at millisecond precision
-- Ratio improvement over raw: 10-50Γ typical ποΈ T64 β Transform for 64-byte Alignment
T64 is a byte-order transformation that rearranges integer data for better compression. It processes 64-byte blocks, transposing data so that similar bytes across values are grouped together β improving dictionary-based compression that follows it.
T64 is typically used as a preprocessing step before other codecs β it reorganizes data into a form that downstream codecs can exploit more effectively.
-- T64 as a preprocessing codec
-- Reorganizes bytes for better downstream compression
CREATE TABLE events (
event_id UInt64 CODEC(T64, ZSTD(1)),
user_id UInt64 CODEC(T64, LZ4),
timestamp DateTime64(3) CODEC(DoubleDelta, ZSTD(1))
) ENGINE = MergeTree()
ORDER BY event_id; π¦ Gorilla β Floating Point Time Series Compression
Gorilla is a purpose-built codec for floating point data that changes gradually β the hallmark of time series from sensors, metrics, and monitoring systems. It's the codec that Facebook's Gorilla paper made famous, and ClickHouse implements it directly.
Gorilla exploits two patterns common in time series:
- Leading zero sharing: If consecutive floats share the same leading bits, only the differing bits are stored
- XOR compression: If value[i] XOR value[i-1] is small, store just the XOR result
- Sensor readings that change gradually (temperature, pressure)
- Financial time series with small per-tick changes
- CPU/memory utilization metrics (similar values across time)
- Any Float32/Float64 where consecutive values often share leading bits
CREATE TABLE temperature_sensors (
sensor_id UInt32,
timestamp DateTime64(3) CODEC(DoubleDelta, ZSTD(1)),
temperature Float64 CODEC(Gorilla), -- slowly varying β great with Gorilla
humidity Float64 CODEC(Gorilla), -- also slowly varying
battery_voltage Float64 CODEC(Gorilla) -- gradual discharge, Gorilla works too
) ENGINE = MergeTree()
ORDER BY (sensor_id, timestamp); π Codec Stacking β The ClickHouse Advantage
This is where ClickHouse's compression system gets really powerful. You can chain up to 4 codecs per column β each codec in the chain handles a specific pattern, transforming data progressively until the final generic codec squeezes out the last bits.
Delta β removes large baseline valuesGorilla β XOR of similar floatsZSTD(1) β dictionary + entropy codingraw_timestamp[i] - raw_timestamp[i-1] β small_delta β XOR_with_previous β compressed_bits π‘ Common Codec Stack Patterns
DoubleDelta, ZSTD(1) DoubleDelta, ZSTD(1) for timestamps, Gorilla, ZSTD(1) for floats
Delta(8), ZSTD(1) Delta(4), ZSTD(3) or ZSTD(3) ZSTD(3) β only option that helps for random data
T64, Delta(8), ZSTD(1) π Stacking Rules & Constraints
π Measuring Your Compression Ratios
ClickHouse provides several ways to inspect actual compression effectiveness. Use these queries to understand how your data is actually compressing and identify opportunities for improvement.
π system.parts β Per-Part Compression Stats
Every part in ClickHouse tracks compressed and uncompressed sizes. Query this to see per-table, per-part actual ratios.
SELECT
database,
table,
partition_id,
name AS part_name,
active,
bytes_on_disk AS compressed_bytes,
primary_key_bytes_in_memory AS pk_bytes,
rows,
ROUND(data_compressed_bytes / data_uncompressed_bytes, 3) AS compression_ratio,
data_uncompressed_bytes,
data_compressed_bytes
FROM system.parts
WHERE database = 'default'
AND table = 'events'
AND active = 1
ORDER BY partition_id, name; π system.columns β Column-Level Info
See which columns are storing data and their default codec specifications.
SELECT
name AS column_name,
data_type,
codec_desc AS codec,
formatReadableSize(data_compressed_bytes) AS compressed_size,
formatReadableSize(data_uncompressed_bytes) AS uncompressed_size
FROM system.columns
WHERE database = 'default'
AND table = 'events'
AND data_compressed_bytes > 0
ORDER BY data_uncompressed_bytes DESC; π§ͺ testCompression β Experimental Codecs
ClickHouse includes a testCompression() table function that compresses a sample of data with different codecs and reports the results.
-- Test different codecs on your data
SELECT
codec,
compressed_size,
uncompressed_size,
ROUND(compression_ratio, 3) AS ratio,
encode_time_us,
decode_time_us
FROM (
SELECT *
FROM testCompression('your_table')
WHERE compression_method NOT LIKE 'Preliminary%'
)
ORDER BY compression_ratio DESC; π Per-Column Ratio Analysis
Identify which columns compress well and which don't β helps focus codec optimization effort.
-- Find columns with poor compression (ratio < 2x)
SELECT
database,
table,
name AS column_name,
data_type,
codec_desc,
ROUND(data_uncompressed_bytes / NULLIF(data_compressed_bytes, 0), 2) AS ratio
FROM system.columns
WHERE database = 'default'
AND table LIKE '%events%'
AND data_compressed_bytes > 0
HAVING ratio < 2.0
ORDER BY ratio ASC; π― Expected Compression Ratios by Data Type
| Data Type | Recommended Codec Stack | Expected Ratio | Notes |
|---|---|---|---|
| Timestamps (DateTime, DateTime64) | DoubleDelta, ZSTD(1) | 10β50Γ | Best when regular intervals |
| Float32/Float64 (slowly varying) | Gorilla, ZSTD(1) | 8β30Γ | Sensor readings, metrics |
| UInt64 IDs (monotonic) | Delta(8), ZSTD(1) | 10β20Γ | Auto-increment, event IDs |
| UInt64 IDs (random) | ZSTD(3) | 1.2β2Γ | Limited gains on random IDs |
| LowCardinality(String) | ZSTD(3) | 5β15Γ | Country codes, status strings |
| String (high entropy) | ZSTD(3) | 1β3Γ | JSON, URLs, random strings |
| Nullable(DateTime) | DoubleDelta, ZSTD(1) | 8β40Γ | NULL bits pack well |
π Primary Key Columns and Compression
Primary key columns appear in every MergeTree data file β the primary key index (.idx), the primary key column data (.bin), and the mark file (.mrk). Compression on these columns has outsized impact.
Why Primary Key Column Compression Matters More
- Every query reads primary key columns first β they're used for WHERE clause filtering, range scans, and primary key lookups
- Primary key columns appear in ORDER BY clauses β almost every query processes them
- They're read on every part scan β even queries that filter to specific partitions still read primary key data for marking
- MergeTree sorts by primary key β this sorting exposes patterns (monotonic increase, low cardinality) that good codecs exploit
Recommendations for Primary Key Columns
DoubleDelta, ZSTD(1) β almost always the right choice. Timestamps in primary keys are sorted and often monotonic.
ZSTD(1) β works well for low-cardinality strings that appear in the primary key. Consider LowCardinality(String) type as well.
Delta(8), ZSTD(1) if monotonically increasing. ZSTD(3) if random. Even random IDs compress somewhat with ZSTD.
βοΈ Data Skipping with Compressed Data
ClickHouse's data skipping relies on reading marks (offset pointers) and using the sparse primary key index to determine which granules (8192-row chunks) to decompress and scan. Compression doesn't break this β it makes it more important.
π How Skipping Works with Compression
π Compression Impact on Data Skipping
| Factor | Low Compression (1β2Γ) | Medium Compression (4β8Γ) | High Compression (10β50Γ) |
|---|---|---|---|
| Disk I/O | High β reading more raw bytes | Balanced β good ratio with moderate CPU | Low β fewer bytes to read |
| Decompression CPU | Low β less data to decompress | Moderate β reasonable workload | Higher β more data to decompress per match |
| Cache Efficiency | Poor β large data doesn't fit in cache | Good β compressed data fits better | Excellent β more data fits in cache |
| Best for | Hot data, very frequent access | General analytics | Wide scans, archival data |
π Mark Files and Granules
ClickHouse organizes data into granules β 8192 consecutive rows that are processed together. Each granule has one entry in the primary key sparse index and one mark (offset pointer).
Codecs apply at the granule level β each granule can have slightly different compressed size, and codec selection is fixed per column but compression ratio varies per granule based on actual data patterns.
π·οΈ Low Cardinality Columns β Built-In Efficiency
Low-cardinality columns (columns with few unique values relative to total rows) compress exceptionally well. ClickHouse has both codec-level optimizations and a dedicated LowCardinality data type that encodes values internally.
π Real-World Example: IoT Sensor Time Series
Complete table design for a sensor monitoring system, with codec selection for each column.
CREATE TABLE iot_sensors (
-- Primary key columns: frequently used in WHERE, ORDER BY
device_id UInt64 CODEC(Delta(8), ZSTD(1)),
timestamp DateTime64(3) CODEC(DoubleDelta, ZSTD(1)),
-- Measurement values: slowly varying floats β Gorilla
temperature Float64 CODEC(Gorilla, ZSTD(1)),
pressure Float64 CODEC(Gorilla, ZSTD(1)),
humidity Float64 CODEC(Gorilla, ZSTD(1)),
battery_voltage Float64 CODEC(Gorilla, ZSTD(1)),
-- Signal quality: 0-100 integer, low cardinality
signal_quality UInt8 CODEC(Delta(1), ZSTD(1)),
-- Location: LowCardinality string
location_id LowCardinality(String),
-- Error code: low cardinality integer β Delta
error_code UInt16 CODEC(Delta(2), ZSTD(1)),
-- Raw payload: high entropy JSON, ZSTD only option
raw_payload String CODEC(ZSTD(3))
) ENGINE = MergeTree()
ORDER BY (device_id, timestamp)
PARTITION BY toYYYYMM(timestamp)
TTL timestamp + INTERVAL 90 DAY; Why these codecs?
- device_id: Monotonically increasing UInt64 β Delta(8) reduces to tiny deltas
- timestamp: Regular 1ms intervals β DoubleDelta gets the delta-of-delta close to 0
- temperature/pressure/humidity: Slowly varying floats β Gorilla XOR gives 8β30Γ ratio
- signal_quality: UInt8 (0β100) β Delta(1) handles tiny deltas perfectly
- location_id: 1000 unique strings β LowCardinality saves ~50Γ over plain String
- raw_payload: High-entropy JSON β ZSTD(3) is the only option that helps
Expected compression
ποΈ MergeTree Storage and Compression Interaction
ClickHouse's MergeTree storage engine is where codec selection actually happens. Understanding how MergeTree stores data helps you make better codec choices.
π MergeTree Part File Structure
Each MergeTree part (a batch of sorted rows) contains these files on disk:
{column}.bin{column}.mrk3{column}.idxprimary.idxprimary.mrk3count.txtchecksums.txtcolumns.txtπ‘ Storage Insight: Codec Selection Affects Disk I/O
When ClickHouse reads a column for query processing, it:
- Reads mark file (
.mrk3) to find granule offsets β fast, small, uncompressed - Uses primary key sparse index to skip granules that don't match WHERE clause β fast
- Reads compressed column data for matching granules only β this is where codec matters
- Decompresses in-memory β CPU cost, but amortized over fast columnar processing
I/O time = (compressed_bytes_read / disk_speed) + (decompressed_bytes / memory_bandwidth Γ cpu_cycles_per_byte) Better compression reduces the first term (I/O) at the cost of increasing the second term (decompression CPU). The tradeoff almost always favors better compression in OLAP workloads.
π Part Merges and Compression
When MergeTree merges parts, it recompresses the data. This is when codec choice affects write performance β and why you should test codec combinations before production deployment.
ALTER TABLE ... MODIFY COLUMN which triggers a full rewrite
π Codec Quick Reference
Specialized Codecs
| Codec | Best For | Data Types |
|---|---|---|
| Delta(N) | Timestamps, monotonic integers, regular intervals | Int, DateTime |
| DoubleDelta | Regular-interval timestamps, constant-rate sequences | Int, DateTime |
| T64 | Integer preprocessing before generic codec | Int (all sizes) |
| Gorilla | Time series floats (temperature, metrics, sensors) | Float32, Float64 |
| DEFAULT | Random data (UUIDs, hashes, encrypted blobs) | Any |
Generic Codecs
| Codec | Ratio | Speed | CPU |
|---|---|---|---|
| LZ4 | 2β4Γ | Fastest decode | Low |
| ZSTD(1) | 3β6Γ | Fast | Medium |
| ZSTD(3) | 4β8Γ | Medium | Medium-High |
| ZSTD(10) | 5β10Γ | Slow | High |
| LZMA | 5β10Γ | Very slow | Very High |
| Zlib | 3β5Γ | Medium | Medium |
π₯ Common Codec Stacks
DoubleDelta, ZSTD(1) β Timestamps (best default for time series) Gorilla, ZSTD(1) β Floating point time series (metrics, sensors) Delta(8), ZSTD(1) β Monotonic UInt64 IDs T64, Delta(8), ZSTD(1) β Counter columns (high-frequency monotonic) ZSTD(3) β High-entropy strings (JSON, URLs, random data) Delta(2), ZSTD(1) β Small integers (status codes, categories) DoubleDelta, Gorilla, ZSTD(1) β Full time series stack (timestamp + float value)