ClickHouse vs Elasticsearch
Elasticsearch is a search engine; ClickHouse is an analytics engine. They overlap in one big territory, log and event data, and that is where the comparison matters. Elasticsearch indexes every term in every document and gives you fast full-text search and faceted filtering. ClickHouse stores each column separately and gives you scans, aggregations, and SQL, often at 10–100× the aggregation throughput. Pick the wrong one for your workload and you'll either burn money on disk or wait minutes for a "count by service over the last 24h" query.
Different storage shapes
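Elasticsearch builds an inverted index: each term maps to the list of documents that contain it. ClickHouse stores each column contiguously and compresses it with per-column codecs, so a query reads only the columns it touches. Different physics, so different operations are cheap: term lookups in ES, wide scans and aggregations in ClickHouse.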
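A toy Python sketch of the two shapes, assuming a handful of in-memory log documents (the data and function names are hypothetical). The inverted index answers "which documents contain this term" with one dictionary probe; the columnar layout answers "count errors by service" by scanning just two aligned arrays and never touching the message column.

```python
from collections import defaultdict

# Hypothetical log documents used only for illustration.
docs = [
    {"service": "api", "level": "error", "message": "OOMKilled container"},
    {"service": "api", "level": "info", "message": "request ok"},
    {"service": "db", "level": "error", "message": "disk full"},
    {"service": "api", "level": "error", "message": "timeout"},
]

# Inverted-index shape: term -> posting list of document ids.
postings = defaultdict(list)
for doc_id, doc in enumerate(docs):
    for term in set(doc["message"].lower().split()):
        postings[term].append(doc_id)

def search(term):
    """One dictionary probe returns every matching document id."""
    return postings.get(term.lower(), [])

# Columnar shape: one array per field, aligned by row position.
columns = {field: [doc[field] for doc in docs] for field in docs[0]}

def count_errors_by_service():
    """Scan only the level and service columns; message is never read."""
    counts = defaultdict(int)
    for service, level in zip(columns["service"], columns["level"]):
        if level == "error":
            counts[service] += 1
    return dict(counts)

print(search("OOMKilled"))        # [0]
print(count_errors_by_service())  # {'api': 2, 'db': 1}
```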
Side by side
| | ClickHouse | Elasticsearch |
|---|---|---|
| Storage layout | Columnar parts | Lucene inverted index + doc values |
| Compression | 3–30× typical | ~1.5–3× typical |
| Full-text search | LIKE / ngrambf / tokenbf | Native, ranked |
| Aggregation | Vectorized, billions of rows/sec | doc_values, slower |
| Schema | Strict types; ALTERs that rewrite data (e.g. type changes) are expensive | Dynamic mapping |
| Query language | SQL | Query DSL (JSON) or ES SQL |
| Write throughput | 100s MB/s/node | 10s MB/s/node |
| Resource use | Low memory, high CPU | High RAM (heap + filesystem cache) |
| Update model | Mutations / ReplacingMergeTree | Document version, lazy delete |
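The ReplacingMergeTree half of the update-model row can be modeled in a few lines. This is a simplified sketch, not ClickHouse's code: it assumes rows are (sort_key, version, payload) tuples and that parts are merged in insertion order, and it keeps the highest version per key, which is what a merge does. Before a merge runs, a plain query can still see every version unless it uses FINAL.

```python
def merge_parts(*parts):
    """Keep the highest-version row per sort key, as a ReplacingMergeTree merge would.

    Assumes parts are passed in insertion order, so on a version tie the
    later row wins.
    """
    latest = {}
    for part in parts:
        for key, version, payload in part:
            current = latest.get(key)
            if current is None or version >= current[0]:
                latest[key] = (version, payload)
    # Merged parts stay sorted by the sort key.
    return sorted((key, version, payload) for key, (version, payload) in latest.items())

part_1 = [("order-1", 1, "pending"), ("order-2", 1, "pending")]
part_2 = [("order-1", 2, "shipped")]

# Before the merge, a plain SELECT can see both versions of order-1.
print(len(part_1 + part_2))         # 3
print(merge_parts(part_1, part_2))  # [('order-1', 2, 'shipped'), ('order-2', 1, 'pending')]
```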
Workloads where ES wins
- Free-text search with ranking — relevance scoring, BM25, multi-language analysis. ClickHouse can match on substrings but cannot rank.
- Faceted search UIs with many term filters and facet counts interleaved. The inverted index makes "filter by term + aggregate" tight.
- Exact-match lookups (field = 'foo') on high-cardinality keyword fields across billions of documents.
Workloads where ClickHouse wins
- Aggregations over time ranges, such as "errors per service per minute": often 10–100× faster and far cheaper.
- Long-retention log storage — codec compression cuts disk by an order of magnitude.
- Analytical SQL — joins, window functions, CTEs.
- High write throughput on modest hardware: there is no per-document indexing cost.
The substring problem
ClickHouse's solution to "log message contains 'OOMKilled'" is an ngrambf_v1 data-skipping index:
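ALTER TABLE logs
ADD INDEX msg_ngram message TYPE ngrambf_v1(3, 256, 2, 0) GRANULARITY 4;
-- ADD INDEX only covers new inserts; build it for parts written before the ALTER.
ALTER TABLE logs MATERIALIZE INDEX msg_ngram;
SELECT count() FROM logs
WHERE message LIKE '%OOMKilled%' AND ts > now() - INTERVAL 1 DAY;
Each index granule (here, 4 primary-key granules) gets a bloom filter of the trigrams in its messages, and granules whose filter rules out any trigram of the search string are skipped. It's not as fast as Lucene's posting list, but it's fast enough for most log workloads and the storage cost is small. For "find me docs about kernel panics ranked by relevance," it's not the right tool.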
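A minimal Python model of granule skipping with an ngram bloom filter. It borrows the shape of ngrambf_v1(3, 256, 2, 0) (trigrams, a 256-byte filter, two hash functions) but hashes with SHA-256, an assumption for illustration rather than ClickHouse's hash. Each inner list stands in for the rows covered by one index granule, and the filters are built once, as they would be at insert time. Bloom filters give false positives but never false negatives, so a skipped granule is guaranteed not to match; a needle shorter than three characters has no trigrams and cannot be filtered at all.

```python
import hashlib

NGRAM = 3              # like the first ngrambf_v1 argument
FILTER_BITS = 256 * 8  # 256-byte filter, like the second argument
NUM_HASHES = 2         # like the third argument

def ngrams(text):
    return {text[i:i + NGRAM] for i in range(len(text) - NGRAM + 1)}

def bit_positions(gram):
    # Assumption: derive the hash positions from one SHA-256 digest.
    digest = hashlib.sha256(gram.encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % FILTER_BITS
            for i in range(NUM_HASHES)]

def build_filter(rows):
    bits = 0  # a Python int used as a bitset
    for row in rows:
        for gram in ngrams(row):
            for pos in bit_positions(gram):
                bits |= 1 << pos
    return bits

def may_contain(bits, needle):
    # False means "definitely absent", so the granule can be skipped.
    return all((bits >> pos) & 1 for gram in ngrams(needle) for pos in bit_positions(gram))

def like_search(granules, index, needle):
    """Return matching rows and how many granules had to be read."""
    hits, scanned = [], 0
    for rows, bits in zip(granules, index):
        if not may_contain(bits, needle):
            continue  # skipped without reading the rows
        scanned += 1
        hits.extend(row for row in rows if needle in row)
    return hits, scanned

granules = [
    ["pod started", "request ok"],
    ["container OOMKilled", "restarting"],
    ["disk full", "timeout"],
]
index = [build_filter(rows) for rows in granules]  # built at insert time
print(like_search(granules, index, "OOMKilled"))
```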
Schema and ingestion
ES has dynamic mapping: send a JSON document, and ES infers types and indexes every field. Convenient until mapping explosion: dynamic mapping creates a field per unique key, so a leaky log payload can produce thousands.
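To see how the explosion happens, here is a sketch that counts the leaf field paths dynamic mapping would create, assuming ES-style dotted names for nested objects. The payload is hypothetical: a log line that uses session IDs as object keys. Elasticsearch guards against this with index.mapping.total_fields.limit, which defaults to 1,000 fields per index, and this stream of lines already exceeds it.

```python
import json

def field_paths(obj, prefix=""):
    """Yield dotted paths for leaf fields, the way dynamic mapping names object fields."""
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from field_paths(value, path + ".")
        else:
            yield path

def mapped_fields(lines):
    """Union of field paths across every ingested JSON line."""
    fields = set()
    for line in lines:
        fields.update(field_paths(json.loads(line)))
    return fields

# Hypothetical leaky payload: each line introduces a new object key.
lines = [json.dumps({"msg": "login", "sessions": {f"s{i}": 1}}) for i in range(1000)]
print(len(mapped_fields(lines)))  # 1001
```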
ClickHouse has strict schemas. The JSON and Dynamic types (both 24+) approximate ES's dynamism, but with explicit constraints. The bigger lever is LowCardinality plus ZSTD: a typical structured log table can compress to a tenth of its raw size or less.
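A rough Python sketch of the dictionary encoding behind LowCardinality(String): each distinct string is stored once and rows hold small integer codes, which a general-purpose codec then squeezes further. ClickHouse's real implementation (per-part dictionaries, adaptive key widths) differs in detail; zlib stands in for ZSTD here, and the service names are made up.

```python
import zlib

def dictionary_encode(values):
    """Store each distinct string once; rows become small integer codes."""
    dictionary, positions, codes = [], {}, []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    return [dictionary[code] for code in codes]

# Hypothetical service column with three distinct values.
services = ["checkout", "payments", "search"]
column = [services[i % 3] for i in range(100_000)]

dictionary, codes = dictionary_encode(column)
raw = "".join(column).encode()
encoded = bytes(codes)  # one byte per row while there are at most 256 distinct values

print(len(raw), len(encoded), len(zlib.compress(encoded)))
```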
CREATE TABLE logs (
    ts DateTime64(3) CODEC(DoubleDelta, ZSTD),  -- millisecond timestamps; sorted values give small delta-of-deltas
    level LowCardinality(String),               -- dictionary-encoded: few distinct values
    service LowCardinality(String),
    trace_id UUID,
    message String CODEC(ZSTD(3)),
    fields Map(LowCardinality(String), String) CODEC(ZSTD)  -- unstructured tail
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)
ORDER BY (service, ts);
Cost
At 1 TB/day of structured logs with 30-day retention:
- Elasticsearch: ~30 TB raw, ~60–90 TB on disk with one replica; hot data wants RAM of roughly 5% of disk; compute is typically 5–20× that of ClickHouse.
- ClickHouse: ~3 TB on disk at ~10× compression; one replica doubles that to ~6 TB; RAM is whatever fits the working set.
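The disk figures above reduce to simple arithmetic. This sketch assumes one replica, an ES on-disk size of 1.0–1.5× raw (which is what the 60–90 TB range implies), and ~10× ClickHouse compression; all three are assumptions to replace with measurements from your own data.

```python
import math

def es_disk_tb(daily_tb, days, disk_per_raw, replicas=1):
    """Raw volume times on-disk overhead, times (primary + replicas)."""
    return daily_tb * days * disk_per_raw * (1 + replicas)

def ch_disk_tb(daily_tb, days, compression_ratio, replicas=1):
    """Raw volume divided by the compression ratio, times (primary + replicas)."""
    return daily_tb * days / compression_ratio * (1 + replicas)

low, high = es_disk_tb(1, 30, 1.0), es_disk_tb(1, 30, 1.5)
clickhouse = ch_disk_tb(1, 30, 10)
hot_ram = 0.05 * high  # the ~5%-of-disk RAM rule of thumb for ES

print(low, high, clickhouse, hot_ram)
```

With these inputs the ES cluster holds 15× more bytes on disk than ClickHouse at the high end, before counting RAM or compute.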
Teams that switch from ES to ClickHouse for logs commonly report a 5–20× cost reduction. The penalty: full-text search degrades to substring matching with ngram skipping.
Operational model
In Elasticsearch, shards live on data nodes, master-eligible nodes coordinate cluster metadata, and coordinating nodes fan out queries. Heap pressure is the perpetual operational concern: index merges, fielddata, and query buffers all compete for JVM heap.
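ClickHouse is native C++ with no JVM. It relies on the OS page cache for reading data and bounds query memory with per-query settings such as max_memory_usage and max_bytes_before_external_group_by (the threshold at which GROUP BY state spills to disk). Replication is multi-master, coordinated through ClickHouse Keeper (or ZooKeeper); there is no primary replica to fail over, although replicated tables stop accepting inserts if Keeper loses quorum.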
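Spilling is the general spill-and-merge technique for aggregation. A minimal Python sketch of it, assuming string keys and a threshold counted in distinct keys rather than bytes; ClickHouse's on-disk format and merge strategy differ. Partial counts are written to temporary files as sorted runs, then streamed back through a k-way merge so equal keys meet, without loading every run into memory at once.

```python
import heapq
import pickle
import tempfile
from collections import Counter
from itertools import groupby
from operator import itemgetter

def spill(table):
    """Write the table's counts to a temp file as a sorted run."""
    run = tempfile.TemporaryFile()
    for item in sorted(table.items()):
        pickle.dump(item, run)
    run.seek(0)
    return run

def read_run(run):
    """Stream (key, count) pairs back from a run, then close it."""
    with run:
        while True:
            try:
                yield pickle.load(run)
            except EOFError:
                return

def external_count(keys, max_keys_in_memory):
    """GROUP BY key with count(), spilling when the hash table grows too large."""
    runs, table = [], Counter()
    for key in keys:
        table[key] += 1
        if len(table) > max_keys_in_memory:
            runs.append(spill(table))
            table = Counter()
    runs.append(spill(table))
    # Runs are sorted, so a k-way merge brings equal keys together.
    merged = heapq.merge(*(read_run(run) for run in runs))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, sum(count for _, count in group)

print(dict(external_count(["a", "b", "c", "a", "d", "b", "a"], 2)))
# {'a': 3, 'b': 2, 'c': 1, 'd': 1}
```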
For exact-match lookups on high-cardinality keywords at very low latency (e.g. "find the row with this exact UUID"), Elasticsearch's posting lists are hard to beat. ClickHouse's primary key plus bloom-filter skip indexes get close on well-designed schemas, but they still read whole granules, so they cannot match Lucene's per-term lookup cost.
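The granule cost is easy to see in a sketch of a sparse primary index, assuming keys already sorted by the primary key and the default index_granularity of 8192 rows. Only the first key of each granule is kept as a mark; a lookup binary-searches the marks and then reads whole candidate granules, so even a perfect hit costs 8192 rows, where a posting list jumps straight to the matching document IDs. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

GRANULE_ROWS = 8192  # ClickHouse's default index_granularity

def build_marks(sorted_keys):
    """Sparse index: keep only the first key of every granule."""
    return sorted_keys[::GRANULE_ROWS]

def candidate_granules(marks, key):
    """Granules whose key range may include `key`."""
    first = max(bisect_left(marks, key) - 1, 0)
    last = bisect_right(marks, key)  # exclusive
    return range(first, last)

def lookup(sorted_keys, marks, key):
    """Return (found, rows_read): every candidate granule is read in full."""
    found, rows_read = False, 0
    for g in candidate_granules(marks, key):
        granule = sorted_keys[g * GRANULE_ROWS:(g + 1) * GRANULE_ROWS]
        rows_read += len(granule)
        found = found or key in granule
    return found, rows_read

keys = list(range(0, 200_000, 2))  # 100,000 sorted even numbers
marks = build_marks(keys)
print(len(marks))                   # 13
print(lookup(keys, marks, 50_000))  # (True, 8192)
print(lookup(keys, marks, 50_001))  # (False, 8192)
```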
Tradeoffs
- + ClickHouse: faster aggregation, lower disk, full SQL, predictable resource use.
- + ES: full-text search with ranking, dynamic schema, mature ecosystem (Kibana, Logstash).
- − ClickHouse: substring search via ngram is good, not great.
- − ES: high cost at scale, heap pressure, slow aggregations.
Migrating ES log workloads to ClickHouse
The most common ES → ClickHouse migration is a logging stack. The recipe:
- Replace the @timestamp field with a DateTime64(3) column.
- Map structured fields to typed columns; put low-cardinality string fields (level, service) under LowCardinality.
- Stash the unstructured tail in a Map(LowCardinality(String), String) or the new JSON type.
- Add an ngrambf_v1 index on message for substring search.
- Replace Logstash with the Vector ClickHouse sink, or with Kafka plus the ClickHouse Kafka table engine.
- Replace Kibana with Grafana, OpenObserve, or SigNoz.
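The column-mapping steps can be sketched as a small Python function that turns an ES log document into a row for the logs schema earlier in this article, serialized as one line of ClickHouse's JSONEachRow input format. The set of typed columns and the sample document are assumptions; it also assumes @timestamp always carries an offset or "Z" and that the column's timezone is UTC. In a real pipeline the Vector sink or a Kafka consumer would do this work.

```python
import json
from datetime import datetime, timezone

# Assumption: these ES fields have typed columns in the logs table.
TYPED_COLUMNS = ("level", "service", "trace_id", "message")

def es_doc_to_row(doc):
    """Map one ES log document to a logs row; leftovers go into the fields map."""
    # fromisoformat needs an explicit offset before Python 3.11, so normalize "Z".
    ts = datetime.fromisoformat(doc["@timestamp"].replace("Z", "+00:00"))
    row = {
        # DateTime64(3) text format: millisecond precision, converted to UTC.
        "ts": ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3],
    }
    fields = {}
    for key, value in doc.items():
        if key == "@timestamp":
            continue
        if key in TYPED_COLUMNS:
            row[key] = value
        else:
            # Map(LowCardinality(String), String) values must be strings.
            fields[key] = value if isinstance(value, str) else json.dumps(value)
    row["fields"] = fields
    return row

doc = {
    "@timestamp": "2024-05-01T12:00:00.123Z",
    "level": "error",
    "service": "api",
    "message": "OOMKilled",
    "pod": "api-7f",
    "retries": 3,
}
print(json.dumps(es_doc_to_row(doc)))  # one JSONEachRow line
```

Columns missing from a document (trace_id here) are simply omitted; ClickHouse fills omitted JSONEachRow fields with column defaults.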
Expect a 5–20× cost reduction for log-only workloads, and a quality tradeoff in full-text search ergonomics.
FAQ
Can ClickHouse fully replace Elasticsearch?
For analytical/log workloads, yes — and many teams have done so. For search UIs that need relevance ranking and language analysis, no.
What about hybrid: ES for search + CH for aggs?
A common pattern: index the searchable fields in ES with short retention, and ship the same data to ClickHouse with long retention for analytics.
Does ClickHouse have something like Kibana?
Grafana works well. Tabix and play.clickhouse.com are SQL-first. For log UI specifically, OpenObserve and SigNoz target the ES-replacement use case.
How does the OpenSearch / Elasticsearch fork affect this?
OpenSearch is a fork of Elasticsearch, and both share the underlying Lucene model, so the comparison is identical. License differences matter for vendor choice, not architecture.
What about full-text search in ClickHouse 24+?
The new full_text index (experimental) brings tokenized search closer to ES territory, but it does not compute relevance scores, and language analysis is still primitive.