ClickHouse vs BigQuery
BigQuery is Google's serverless data warehouse: you write SQL, Google figures out the cluster. ClickHouse is a self-hosted columnar OLAP engine: you provision the boxes and tune the merges. Both are extremely fast on analytical scans, but they live in different operational universes — pricing, latency, real-time ingestion, and ecosystem are all opposites. The choice is rarely about raw speed; it is about which kind of operational debt your team would rather take on.
Architecture at a glance
ClickHouse couples compute to storage; BigQuery uses Google's Jupiter petabit network to keep them apart.
Side by side
| ClickHouse | BigQuery | |
|---|---|---|
| Hosting | Self-hosted or ClickHouse Cloud | Google-managed only |
| Storage | Local disk / S3 (24+ separation) | Colossus, separated by design |
| Compute | Long-lived servers | Slots leased per query |
| Pricing | $/instance-hour | $/TB scanned (or $/slot-hour) |
| P99 latency | 10ms–1s for hot data | 1–10s baseline |
| Real-time INSERT | 10s–100s of MB/s/node | Streaming insert API, $/MB |
| Joins | Hash / merge / direct | Broadcast / shuffle |
| UDFs | SQL, executable, WebAssembly | JavaScript, SQL |
| Ecosystem | Open source, Apache 2 | GCP-native |
Pricing model: where the surprise lives
BigQuery's on-demand pricing is per terabyte scanned. A query that touches 10 TB costs $50, regardless of how
long it took. The on-demand model rewards narrow queries (column selection, partition predicates) and
punishes SELECT *. The "flat-rate" alternative (slots) charges per compute capacity reservation —
predictable, but you pay even when idle.
ClickHouse pricing is just the box. A 3-node 32-core cluster on AWS runs ~$3k/month and serves whatever you throw at it. There is no per-query cost. This is amazing if you have steady high-volume analytics; expensive if you query rarely.
-- BigQuery: this is a $50 query if events is 10 TB, even with WHERE
SELECT user_id, count(*) FROM events GROUP BY user_id;
-- BigQuery: this is a $5 query (only events of 2024-03 scanned)
SELECT user_id, count(*) FROM events
WHERE _PARTITIONDATE BETWEEN '2024-03-01' AND '2024-03-31'
GROUP BY user_id; Query latency
ClickHouse reaches sub-100ms for primary-key-aligned queries on small data. Cold large-aggregation queries depend on disk and CPU; 10s for a 100-billion-row scan is normal.
BigQuery's baseline overhead is 1–3s before any work happens (slot acquisition + planning). Queries scale almost arbitrarily — Google can throw 2000 slots at one query — so very large aggregations finish faster than on a finite ClickHouse cluster, but interactive dashboards on small data feel sluggish.
Real-time ingestion
ClickHouse ingests 100s of MB/s per node by design. INSERTs land as parts in seconds, queryable immediately. Materialized views fan out to pre-aggregated tables in real time.
BigQuery streaming insert costs ~$0.01/MB with a per-row size cap and an eventual consistency window before rows appear in queries. The newer Storage Write API is cheaper but still bandwidth-billed. Most BigQuery users batch load via GCS for cost reasons, which means "real-time" becomes "every 5 minutes."
Ecosystem and lock-in
BigQuery integrates with the rest of GCP: Dataflow, Pub/Sub, Looker, IAM. Moving data out is a known ETL job; moving the workload off BigQuery means rewriting GoogleSQL → standard SQL and re-engineering streaming ingestion.
ClickHouse is open source. The same engine runs on a laptop, on EC2, on bare metal, on Kubernetes, on ClickHouse Cloud. Dialects are stable across deployments. The cost of leaving is "spin up the same engine somewhere else."
When each wins
- Pick ClickHouse for: real-time event ingestion, sub-second dashboards, predictable monthly cost, on-prem or multi-cloud, log/metrics/observability backends.
- Pick BigQuery for: bursty ad-hoc analytics over petabytes, organizations already deep in GCP, ML pipelines via BigQuery ML, audit-grade SQL with no ops staff.
Concurrency & dashboards
ClickHouse is built to serve thousands of concurrent dashboard queries off the same nodes. The thread-per-query plus shared filesystem cache makes each marginal user nearly free until you saturate CPU.
BigQuery's slot model means every concurrent query takes some of your slot budget. On-demand pricing has no fixed concurrency limit but each query competes for shared Google capacity, producing variable latency. Flat-rate slot reservations give predictable concurrency at the cost of paying for idle slots.
The classic pattern: BigQuery for the data lake / nightly aggregations, ClickHouse downstream for the user-facing dashboards. ETL into BQ once, replicate aggregated tiers into CH for serving.
Tradeoffs
- + ClickHouse: lower latency, real-time inserts, predictable cost.
- + BigQuery: zero ops, infinite burst capacity, Google ML and IAM stack.
- − ClickHouse: you operate Keeper, replicas, mergers, retention.
- − BigQuery: per-TB scanning surprises, GCP lock-in, slow dashboards.
Migration patterns
Most BigQuery → ClickHouse migrations fall into one of three shapes:
- Lift-and-shift. Recreate schemas in ClickHouse, dump as Parquet, ingest via the
s3()table function. Hard parts: GoogleSQL-specific syntax (STRUCT,ARRAY_AGG) and BigQuery UDFs (JavaScript) that need rewriting in ClickHouse SQL. - Tiered. Keep BigQuery as the source of truth for ad-hoc/ETL; replicate hot recent data into ClickHouse for sub-second user-facing queries. Common when BigQuery is the canonical warehouse and CH is the serving layer.
- Greenfield. Write new pipelines directly to ClickHouse (Kafka engine, native HTTP), keeping BigQuery only for whatever specific GCP integrations require it.
FAQ
Can I use BigQuery as a backend for a low-latency app?
Not directly. The 1–3s baseline kills interactive UX. Use BI Engine (in-memory cache) or replicate into a faster store. ClickHouse is engineered for this directly.
Does ClickHouse support compute-storage separation?
Yes since 24.x — S3 disk + zero-copy replication. Not as automatic as BigQuery slots, but it eliminates the "must scale storage with compute" problem.
Which has better SQL?
BigQuery's GoogleSQL is closer to ANSI; ClickHouse has dialect quirks but more analytical functions and the entire *State / *Merge aggregate-state model.
What about cost at petabyte scale?
For steady workloads, ClickHouse on dedicated hardware is typically 3–10× cheaper than BigQuery flat-rate. For sporadic queries, BigQuery on-demand often wins.
How do I migrate?
Most schemas port directly. The hard parts: GoogleSQL specifics (STRUCT, ARRAY_AGG), streaming ingestion rewrite, and BigQuery-specific UDFs.