Different storage shapes

[Figure: storage layouts compared]
Elasticsearch: Lucene inverted index (term "error" → docs, term "auth" → docs, term "503" → docs), plus stored fields per doc and doc_values for aggs. Disk: ~3–5× raw.
ClickHouse: column files (ts.bin, level.bin, message.bin), no inverted index, scan + skip indexes. Disk: ~0.1× raw.

ES inverts the data: each term maps to the documents that contain it. ClickHouse is columnar: a query scans codec-compressed column files. Different physics, so different operations are cheap.
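To make the contrast concrete, here is a toy sketch of the two layouts in Python. It is not how Lucene or ClickHouse store data on disk; the sample documents and variable names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample log documents.
docs = [
    {"ts": 1, "level": "error", "message": "auth failed 503"},
    {"ts": 2, "level": "info", "message": "auth ok"},
    {"ts": 3, "level": "error", "message": "upstream 503"},
]

# Inverted layout: each term points at the ids of the documents containing it.
inverted = defaultdict(set)
for doc_id, doc in enumerate(docs):
    for term in doc["message"].split():
        inverted[term].add(doc_id)

# Columnar layout: one array per field, rows aligned by position.
columns = {name: [d[name] for d in docs] for name in docs[0]}

# A term lookup is a single dictionary probe on the inverted layout.
docs_with_503 = sorted(inverted["503"])

# An aggregation reads only the one column it needs on the columnar layout.
error_count = sum(1 for level in columns["level"] if level == "error")

print(docs_with_503, error_count)
```

The term lookup never touches documents that lack the term, and the count never touches the message column. Each layout makes one of the two operations cheap.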

Side by side

Dimension | ClickHouse | Elasticsearch
Storage layout | Columnar parts | Lucene inverted index + doc values
Compression | 3–30× typical | ~1.5–3× typical
Full-text search | LIKE / ngrambf_v1 / tokenbf_v1 | Native, ranked
Aggregation | Vectorized, billions of rows/sec | doc_values, slower
Schema | Strict types, ALTER expensive | Dynamic mapping
Query language | SQL | Query DSL (JSON) or ES SQL
Write throughput | 100s of MB/s per node | 10s of MB/s per node
Resource use | Low memory, high CPU | High RAM (heap + filesystem cache)
Update model | Mutations / ReplacingMergeTree | Document versioning, lazy delete

Workloads where ES wins

  • Free-text search with ranking — relevance scoring, BM25, multi-language analysis. ClickHouse can match on substrings but cannot rank.
  • Faceted search UIs with many short-tailed filters and aggregations interleaved. The inverted index makes "filter by term + aggregate" tight.
  • High-cardinality keyword fields with WHERE field = 'foo' lookups across billions of docs.

Workloads where ClickHouse wins

  • Aggregations over time ranges — "errors per service per minute" — 10–100× faster, far cheaper.
  • Long-retention log storage — codec compression cuts disk by an order of magnitude.
  • Analytical SQL — joins, window functions, CTEs.
  • High write throughput at modest hardware — no per-document indexing cost.

The substring problem

ClickHouse's solution to "log message contains 'OOMKilled'" is an ngrambf_v1 data-skipping index:

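ALTER TABLE logs
ADD INDEX msg_ngram message TYPE ngrambf_v1(3, 256, 2, 0) GRANULARITY 4;

-- ADD INDEX only applies to newly written parts; build it for existing data too.
ALTER TABLE logs MATERIALIZE INDEX msg_ngram;

SELECT count() FROM logs
WHERE message LIKE '%OOMKilled%' AND ts > now() - INTERVAL 1 DAY;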

Granules whose bloom rejects the trigrams are skipped. It's not as fast as Lucene's posting list, but it's fast enough for most log workloads and the storage cost is negligible. For "find me docs about kernel panics ranked by relevance," it's not the right tool.
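The skipping logic can be sketched in a few lines of Python. This is an illustration of the idea only, not ClickHouse's implementation: the class and function names are hypothetical, SHA-256 stands in for the real hash functions, and each "granule" here is just a small list of strings. The 2048-bit filter and two hashes mirror the 256-byte, 2-hash parameters in the index definition above.

```python
import hashlib

def trigrams(text):
    """All 3-character substrings of text (empty if text is shorter than 3)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

class GranuleBloom:
    """Bloom filter over the trigrams of every row in one granule."""

    def __init__(self, size_bits=2048, hashes=2):
        self.size_bits = size_bits
        self.hashes = hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, gram):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, text):
        for gram in trigrams(text):
            for pos in self._positions(gram):
                self.bits |= 1 << pos

    def may_contain(self, needle):
        # False means "definitely absent"; True means "maybe present".
        # A needle shorter than 3 characters has no trigrams, so nothing can be skipped.
        return all(
            (self.bits >> pos) & 1
            for gram in trigrams(needle)
            for pos in self._positions(gram)
        )

def build_blooms(granules):
    blooms = []
    for rows in granules:
        bloom = GranuleBloom()
        for row in rows:
            bloom.add(row)
        blooms.append(bloom)
    return blooms

def search(granules, blooms, needle):
    """Return matching rows and the number of granules actually scanned."""
    hits, scanned = [], 0
    for rows, bloom in zip(granules, blooms):
        if not bloom.may_contain(needle):
            continue  # the filter rejected at least one trigram: skip the granule
        scanned += 1
        hits.extend(row for row in rows if needle in row)
    return hits, scanned

granules = [
    ["pod started", "healthy"],
    ["container OOMKilled by kernel"],
    ["auth ok", "upstream 503"],
]
blooms = build_blooms(granules)
hits, scanned = search(granules, blooms, "OOMKilled")
print(hits, scanned)
```

A bloom filter can return false positives but never false negatives, so skipped granules are always safe to skip and the surviving granules still need the real LIKE scan.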

Schema and ingestion

ES has dynamic mapping: send a JSON document, ES infers types and indexes everything. Convenient until field explosion (dynamic mapping creates a field per unique key — a leaky log payload can produce thousands).
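A rough way to see field explosion is to count the distinct flattened key paths across a batch of documents, since each distinct path becomes a mapped field under dynamic mapping. The sketch below is plain Python with hypothetical helper names and an invented payload in which a user id leaks into the key position.

```python
def field_paths(doc, prefix=""):
    """Flatten nested dict keys into dotted paths, e.g. counts.user_1."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths |= field_paths(value, path)
        else:
            paths.add(path)
    return paths

def mapped_fields(docs):
    """Union of all key paths seen: a stand-in for what dynamic mapping would create."""
    seen = set()
    for doc in docs:
        seen |= field_paths(doc)
    return seen

# Hypothetical leaky payload: the user id is used as a key instead of a value.
leaky_docs = [{"level": "info", "counts": {f"user_{i}": 1}} for i in range(1000)]
fields = mapped_fields(leaky_docs)
print(len(fields))
```

One thousand documents produce over a thousand fields here, which is already past Elasticsearch's default index.mapping.total_fields.limit of 1000.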

ClickHouse has strict schemas. The JSON type (24+) and Dynamic type (also 24+) approximate ES's dynamism but with explicit constraints. The bigger story is LowCardinality + ZSTD: a typical structured log table compresses to under 0.1× raw bytes.

CREATE TABLE logs (
    ts        DateTime CODEC(DoubleDelta, ZSTD),
    level     LowCardinality(String),
    service   LowCardinality(String),
    trace_id  UUID,
    message   String CODEC(ZSTD(3)),
    fields    Map(LowCardinality(String), String) CODEC(ZSTD)
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)
ORDER BY (service, ts);
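Two of the encodings in this schema are easy to illustrate. The Python sketch below shows the idea behind LowCardinality (dictionary encoding) and DoubleDelta (storing the change in the difference between consecutive values). It is a simplification under stated assumptions: the function names are hypothetical, and the real codecs also store the initial values and bit-pack the result.

```python
def dictionary_encode(values):
    """Replace repeated strings with small integer codes plus a dictionary."""
    dictionary, codes, index = [], [], {}
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

def double_delta(timestamps):
    """Delta of deltas: near zero when timestamps arrive at a steady rate."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

levels = ["info", "info", "error", "info"]
dictionary, codes = dictionary_encode(levels)

timestamps = [1000, 1001, 1002, 1003, 1005]
dd = double_delta(timestamps)

print(dictionary, codes, dd)
```

Small integer codes and runs of zeros are what a general-purpose compressor such as ZSTD handles best, which is why the two layers are stacked.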

Cost

At 1 TB/day of structured logs over a 30-day retention:

  • Elasticsearch: ~30 TB raw, ~60–90 TB on disk after replication; needs RAM ~5% of disk for hot data; typically 5–20× compute vs ClickHouse.
  • ClickHouse: ~3 TB on disk after compression; replicas double that; RAM is whatever fits the working set.

Most teams that switch from ES to ClickHouse for logs report 5–20× cost reduction. The penalty: full-text search degrades to substring + ngram skipping.
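The disk figures above follow from simple arithmetic, sketched here in Python. The function name is hypothetical, and the Elasticsearch inputs are an assumption read back from the numbers quoted: roughly 1× raw on disk per copy, with two to three copies. The ClickHouse inputs use the ~0.1× ratio and one extra replica.

```python
def disk_tb(daily_raw_tb, retention_days, on_disk_ratio, copies):
    """Total disk in TB: raw volume x on-disk ratio x number of copies."""
    return daily_raw_tb * retention_days * on_disk_ratio * copies

# Assumed inputs: 1 TB/day, 30 days retention.
es_low = disk_tb(1, 30, 1.0, 2)         # two copies at ~1x raw
es_high = disk_tb(1, 30, 1.0, 3)        # three copies at ~1x raw
ch_single = disk_tb(1, 30, 0.1, 1)      # ~0.1x raw, one copy
ch_replicated = disk_tb(1, 30, 0.1, 2)  # ~0.1x raw, two replicas

print(f"ES: {es_low:.0f}-{es_high:.0f} TB, ClickHouse: {ch_single:.0f}-{ch_replicated:.0f} TB")
```

Disk is only one part of the bill. Compute and RAM depend on query load and are not modeled here.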

Operational model

Elasticsearch shards live on data nodes, master-eligible nodes coordinate metadata, and coordinating nodes fan out queries. Heap pressure is the perpetual operational concern: index merges, fielddata, and query buffers all compete for JVM heap.

ClickHouse runs natively without a JVM. Memory use is simpler to reason about: the OS page cache holds hot data, and queries are bounded by a few per-query budgets (max_memory_usage, max_bytes_before_external_group_by). Replication is peer-to-peer through Keeper; there is no "master" to fail.

For high-cardinality keyword filtering at very low latency (e.g. "find the row with this exact UUID"), Elasticsearch's posting list is hard to beat. ClickHouse's primary-key + bloom filter combo gets close on well-designed schemas but cannot match Lucene's per-term lookup constants.

Tradeoffs

  • + ClickHouse: faster aggregation, lower disk, full SQL, predictable resource use.
  • + ES: full-text search with ranking, dynamic schema, mature ecosystem (Kibana, Logstash).
  • - ClickHouse: substring search via ngram is good, not great.
  • - ES: high cost at scale, heap pressure, slow aggregations.

Migrating ES log workloads to ClickHouse

The most common ES → ClickHouse migration is a logging stack. The recipe:

  1. Map the @timestamp field to a DateTime64(3) column.
  2. Map structured fields to typed columns; wrap low-cardinality string fields (level, service) in LowCardinality and leave high-cardinality ones as plain String or UUID.
  3. Stash the unstructured tail in a Map(LowCardinality(String), String) or the new JSON type.
  4. Add an ngrambf_v1 index on message for substring search.
  5. Replace Logstash with the Vector ClickHouse sink, or Kafka → CH Kafka engine.
  6. Replace Kibana with Grafana, OpenObserve, or SigNoz.

Expect a 5–20× cost reduction for log-only workloads, and a quality tradeoff in full-text search ergonomics.
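Steps 1 to 3 of the recipe amount to a per-document transformation. The Python sketch below shows one possible shape for it, assuming the column names from the logs table earlier in this article and assuming every document carries an ISO 8601 @timestamp and a trace_id. The function name and the sample document are hypothetical, and no ClickHouse client is involved.

```python
import uuid
from datetime import datetime, timezone

# Typed string columns taken from the logs table definition.
STRING_COLUMNS = ("level", "service", "message")
RESERVED = STRING_COLUMNS + ("@timestamp", "trace_id")

def es_doc_to_row(doc):
    """Convert one ES-style log document into a dict shaped like a logs row."""
    # Step 1: @timestamp becomes a timezone-aware datetime with millisecond precision.
    ts = datetime.fromisoformat(doc["@timestamp"].replace("Z", "+00:00"))
    row = {
        "ts": ts.astimezone(timezone.utc),
        "trace_id": uuid.UUID(doc["trace_id"]),
    }
    # Step 2: known structured fields go to typed columns.
    for name in STRING_COLUMNS:
        row[name] = doc.get(name, "")
    # Step 3: everything else lands in the Map(String, String) tail.
    row["fields"] = {k: str(v) for k, v in doc.items() if k not in RESERVED}
    return row

sample = {
    "@timestamp": "2024-05-01T12:00:00.123Z",
    "level": "error",
    "service": "api",
    "trace_id": "123e4567-e89b-12d3-a456-426614174000",
    "message": "upstream 503",
    "pod": "api-7",
    "attempt": 3,
}
row = es_doc_to_row(sample)
print(row)
```

Values in the tail are stringified because the Map column in the example schema is String-valued; nested objects would need flattening first.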

FAQ

Can ClickHouse fully replace Elasticsearch?

For analytical/log workloads, yes — and many teams have done so. For search UIs that need relevance ranking and language analysis, no.

What about hybrid: ES for search + CH for aggs?

Common pattern. Index the searchable fields in ES with short retention, ship the same data to ClickHouse for analytics with long retention.

Does ClickHouse have something like Kibana?

Grafana works well. Tabix and play.clickhouse.com are SQL-first. For log UI specifically, OpenObserve and SigNoz target the ES-replacement use case.

How does the OpenSearch / Elasticsearch fork affect this?

OpenSearch and Elasticsearch share the underlying Lucene model, so the comparison is the same for either. License differences matter for vendor choice, not architecture.

What about full-text search in ClickHouse 24+?

The new full_text index (experimental) brings tokenized search closer to ES territory but ranking and language analysis are still primitive.