Parts vs partitions

PARTITION BY toYYYYMM(ts) 202401 part_1_3_2 part_4_4_0 part_5_8_3 part_9_9_0 4 parts 202402 part_10_15_2 part_16_18_1 2 parts 202403 part_19_22_2 part_23_25_1 part_26_26_0 3 parts Background merges combine parts within a partition; they never cross partitions.

A partition is the unit of operational manipulation; a part is the unit of merging and IO.

Key Numbers

1k
recommended ceiling on partitions per table
300
default parts_to_throw_insert
monthly
canonical partition granularity
rename
cost of DROP / DETACH / ATTACH
5
main partition operations
never
merges cross partition boundaries
prune
at planning time, before scan

PARTITION BY semantics

CREATE TABLE events (
    ts        DateTime,
    user_id   UInt64,
    event     LowCardinality(String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (user_id, ts);

The expression is evaluated per row at INSERT time. Rows with different partition values cannot share a part, so an insert that touches three months produces at least three parts.

Partition keys can use any deterministic expression:

  • toYYYYMM(ts) — month — the most common choice.
  • toDate(ts) — daily — only for short-retention tables.
  • (toYYYYMM(ts), tenant_id) — tuple — multi-tenant rolling drops.
  • cityHash64(user_id) % 100 — bucketed — rarely a good idea.

Partition pruning

At plan time the optimizer evaluates the partition expression against the query's WHERE. Parts whose partition value cannot match are removed from the scan list before any IO. Inspect the result with EXPLAIN ESTIMATE or via system.query_log:

EXPLAIN ESTIMATE
SELECT count() FROM events WHERE ts BETWEEN '2024-03-01' AND '2024-03-31';
-- ┌─database─┬─table──┬─parts─┬───rows─┐
-- │ default  │ events │     3 │ 412881 │
-- └──────────┴────────┴───────┴────────┘

Pruning fires only when the WHERE expression matches the partition expression structurally. WHERE ts >= '...' prunes toYYYYMM(ts); WHERE substring(date_str, 1, 7) = '2024-03' does not.

Partition operations

-- Drop one month — instant; just removes the directory
ALTER TABLE events DROP PARTITION 202401;

-- Detach (move under detached/, recoverable)
ALTER TABLE events DETACH PARTITION 202401;
ALTER TABLE events ATTACH PARTITION 202401;

-- Move data between tables (must share schema/storage policy)
ALTER TABLE events_archive ATTACH PARTITION 202401 FROM events;

-- Move to a different storage tier
ALTER TABLE events MOVE PARTITION 202401 TO DISK 'cold';
ALTER TABLE events MOVE PARTITION 202401 TO VOLUME 's3';

-- Replace one partition's data with another's
ALTER TABLE events REPLACE PARTITION 202401 FROM events_staging;

These are O(rename) — they manipulate file system metadata, not data. Combined with multi-disk storage policies, partition moves implement automatic hot/cold tiering.

system.parts

SELECT
    partition,
    count()                         AS parts,
    sum(rows)                       AS rows,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    max(modification_time)          AS last_change
FROM system.parts
WHERE table = 'events' AND active
GROUP BY partition
ORDER BY partition DESC;

This is the operational dashboard for a MergeTree table. Watch for:

  • Partition with hundreds of parts → merges falling behind.
  • Partition with size 0 → orphaned, candidate for DETACH.
  • Newest partition has many tiny parts → INSERT batch size too small.

Custom partition keys

Beyond time, partition keys are useful for tenant isolation, retention tiers, or segregating reprocessable from immutable data:

-- Multi-tenant: drop one tenant's data without scanning others
PARTITION BY tenant_id

-- Retention tiers: hot = day, warm = month, cold = year
PARTITION BY (tier, toStartOfInterval(ts,
    if(tier = 'hot',  toIntervalDay(1),
       if(tier = 'warm', toIntervalMonth(1), toIntervalYear(1)))))

The cardinality must stay sane: a partition per user_id will produce millions of partitions and ClickHouse will refuse to start.

Limits and pitfalls

Settings that govern partition health:

  • parts_to_throw_insert = 300 — INSERTs fail when a partition has too many active parts.
  • parts_to_delay_insert = 150 — INSERTs are slowed before the hard limit.
  • max_parts_in_total = 100000 — table-wide ceiling.
  • max_partitions_per_insert_block = 100 — guard against accidental partition explosion.

The "too many parts" error means merges can't keep up. Causes: tiny INSERTs, too-fine partitioning, mutations starving the merge scheduler, slow disk. Fix: batch larger, partition coarser, throttle ALTER.

Tradeoffs

  • + O(1) DROP / MOVE / DETACH for entire time ranges.
  • + Free pruning for time-bounded queries.
  • + Per-partition tiering with multi-disk storage policies.
  • Too-fine partitioning blows up part counts and merge work.
  • Merges never cross partitions, so partition skew = part skew.
  • Predicate must structurally match the partition expression to prune.

FAQ

How many partitions is too many?

Beyond ~1k–10k per table you start seeing slow plan time and slow restarts (parts are scanned at startup). Daily partitioning over a 5-year retention is already 1825 partitions; monthly is 60.

Can I change the partition expression after creation?

No. The partition expression is fixed at table creation. To change it: create a new table with the new key, INSERT … SELECT in the background, swap with RENAME TABLE.

Should partition key match the primary key?

Often yes for time data: PARTITION BY toYYYYMM(ts) + ORDER BY (user_id, ts). The partition prunes the time range and the primary key prunes within the partition.

What does the part name part_1_3_2 mean?

{block_number_min}_{block_number_max}_{merge_level}. A part with min == max is freshly inserted; merge level grows each time it is merged.

Can I DROP a partition that's currently being merged?

Yes. The merge cancels and the resulting parts are removed atomically. There is no risk of partial drop.