Spark vs Flink
Batch-First vs Streaming-First, and the Tradeoffs That Follow
Spark and Flink are both top-tier distributed processing engines: both written in Scala/Java, both running on the JVM, both supporting batch and streaming. They look interchangeable from a brochure. They are not: the underlying execution model is fundamentally different, and that difference cascades into latency, state management, exactly-once semantics, and ecosystem fit.
Spark is batch-first: an RDD/DataFrame engine where streaming is incremental batches. Flink is streaming-first: a continuous dataflow engine where batch is bounded streams. For analytical SQL on a data lake, Spark wins on tooling and optimizer. For low-latency stateful streaming, Flink wins on pretty much every dimension.
Architecture Comparison
Streaming Models
| Aspect | Spark Structured Streaming | Flink DataStream |
|---|---|---|
| Execution | Micro-batch (continuous trigger experimental) | True per-record streaming |
| Latency | ~100 ms to seconds | Single-digit milliseconds typical |
| Backpressure | Implicit via batch timing | Explicit, credit-based, well-instrumented |
| Event-time | Watermark per source partition | Watermark per parallel subtask, idleness-aware |
| Out-of-order | Watermark bounds lateness; later rows dropped | allowedLateness + side outputs for late events |
| Window types | Tumbling, sliding, session (fixed gap) | + Session with custom gap, custom assigners |
| State APIs | flatMapGroupsWithState (typed but limited) | Rich KeyedState API: ValueState, ListState, MapState |
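To make those rows concrete, here is a minimal PySpark Structured Streaming sketch of event-time windowing under micro-batching; the broker address, topic name, and trigger interval are illustrative placeholders, not recommendations.

```python
# Hedged sketch: tumbling event-time window with a watermark in Spark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("windowed-counts").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "events")                     # placeholder topic
          .load()
          .selectExpr("CAST(value AS STRING) AS body", "timestamp"))

counts = (events
          .withWatermark("timestamp", "10 minutes")      # bound event-time lateness
          .groupBy(F.window("timestamp", "5 minutes"))   # tumbling window
          .count())

(counts.writeStream
       .outputMode("update")
       .format("console")
       .trigger(processingTime="30 seconds")  # micro-batch interval = latency floor
       .start())
```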
Exactly-Once Approaches
# Spark: micro-batch atomicity
# Each micro-batch:
# 1. Read offsets from source (checkpoint)
# 2. Process the batch
# 3. Write outputs (transactional sinks for exactly-once)
# 4. Commit offsets to checkpoint
# Crash mid-batch: replay from last checkpoint, idempotent reprocess
# Limitation: latency floor = batch interval
# Sink support: Kafka, Delta, Iceberg, file sink, foreachBatch
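The foreachBatch route deserves a sketch, since it is how most sinks without native transactions get exactly-once in practice. This is a hedged example, not the only pattern: paths are placeholders, and the txnAppId/txnVersion options are Delta Lake's idempotent-write mechanism.

```python
# Hedged sketch: exactly-once via idempotent per-batch writes in foreachBatch.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("eos-sink").getOrCreate()
stream = spark.readStream.format("rate").load()  # toy source for illustration

def write_batch(batch_df, batch_id: int):
    # A failed micro-batch is replayed with the same batch_id and the same
    # rows, so exactly-once reduces to making this write idempotent per id.
    (batch_df.write
        .format("delta")                    # requires the Delta Lake package
        .mode("append")
        .option("txnAppId", "events_out")   # stable writer identity
        .option("txnVersion", batch_id)     # a replayed batch_id becomes a no-op
        .save("/lake/events_out"))          # placeholder path

(stream.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/chk/events_out")  # steps 1 and 4: offsets + commits
    .start())
```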
# Flink: async distributed snapshots
# Periodically:
# 1. JobManager injects checkpoint barrier into every input
# 2. Each operator snapshots its state when barrier passes
# 3. Sinks pre-commit (2PC); commit on checkpoint complete
# 4. JobManager records the global checkpoint as durable
# Crash: restore all operators from latest checkpoint, sinks roll back uncommitted
# Latency floor: per-record (snapshots are async)
# Sink support: Kafka, JDBC w/ XA, files, any 2PC sink
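In code, the recipe above is mostly configuration. A minimal PyFlink sketch, assuming the Kafka connector jar is on the classpath; broker, topic, and interval are placeholders.

```python
# Hedged sketch: exactly-once checkpointing plus a 2PC (transactional) Kafka sink.
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment, CheckpointingMode
from pyflink.datastream.connectors.base import DeliveryGuarantee
from pyflink.datastream.connectors.kafka import (KafkaRecordSerializationSchema,
                                                 KafkaSink)

env = StreamExecutionEnvironment.get_execution_environment()
# Step 1: barriers injected every 10 s; operators snapshot as barriers pass.
env.enable_checkpointing(10_000, CheckpointingMode.EXACTLY_ONCE)

# Steps 3-4: the sink pre-commits a Kafka transaction per checkpoint and
# commits it once the JobManager records the checkpoint as complete.
sink = (KafkaSink.builder()
        .set_bootstrap_servers("broker:9092")   # placeholder
        .set_record_serializer(
            KafkaRecordSerializationSchema.builder()
            .set_topic("out")                   # placeholder
            .set_value_serialization_schema(SimpleStringSchema())
            .build())
        .set_delivery_guarantee(DeliveryGuarantee.EXACTLY_ONCE)
        .set_transactional_id_prefix("out-")    # required for EXACTLY_ONCE
        .build())
```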
Batch Workloads
| Aspect | Spark | Flink |
|---|---|---|
| SQL optimizer | Catalyst (~150 rules) + AQE runtime | Apache Calcite-based, fewer specialized rules |
| Code generation | Tungsten whole-stage codegen | Some codegen, less aggressive than Tungsten |
| Lakehouse ecosystem | Delta Lake, Iceberg, Hudi all native | Iceberg connector solid; Delta/Hudi via 3rd party |
| ML | MLlib mature (RDD API in maintenance; DataFrame-based Spark ML pipelines active) | FlinkML in maintenance mode |
| Adaptive Query Execution | Default-on, dynamic skew + coalesce + join switch | Less mature runtime adaptation |
| Cloud support | EMR, Dataproc, Databricks, Synapse, Glue | Amazon Managed Service for Apache Flink (ex-Kinesis Analytics), Ververica Cloud; fewer managed options |
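The AQE row translates directly into configuration. These are real Spark 3.x settings (AQE is on by default since 3.2); they are spelled out here only to show what "runtime adaptation" means concretely.

```python
# Sketch: the three headline AQE behaviors as explicit configs.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.adaptive.enabled", "true")                    # re-plan at runtime
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true") # merge tiny shuffle partitions
         .config("spark.sql.adaptive.skewJoin.enabled", "true")           # split skewed join partitions
         .getOrCreate())
```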
State Management Details
# Spark Structured Streaming state stores
spark.sql.streaming.stateStore.providerClass = org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
# State at each stateful operator partition, checkpointed each batch.
# Recovery: load latest snapshot + replay deltas to current batch.
# Realistic state size: 100s of GB per executor.
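For context, a small sketch of a stateful operator that actually exercises that store: streaming deduplication, where the watermark bounds how long keys are retained. The event_id column is a placeholder derived from a toy source.

```python
# Sketch: watermark-bounded streaming dedup (state lives in the store above).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dedup").getOrCreate()
events = (spark.readStream.format("rate").load()     # toy source: timestamp, value
          .withColumnRenamed("value", "event_id"))   # stand-in for a real id

deduped = (events
           .withWatermark("timestamp", "1 hour")     # state eviction horizon
           .dropDuplicates(["event_id", "timestamp"]))
```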
# Flink state backends
state.backend = rocksdb
state.backend.incremental = true
state.backend.changelog.enabled = true # Flink 1.16+
# Async snapshots, only RocksDB SST diffs uploaded.
# Recovery restores operators from the latest snapshot; no per-batch delta replay.
# Realistic state size: 10s of TB per operator.
# Why the gap?
# - Flink built state-first; checkpoints are core, not retrofitted
# - Flink supports incremental checkpoints since 1.3 (2017)
# - Spark only got a RocksDB state store in 3.2 (2021); full incremental checkpoints in 3.5
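To show what this machinery serves, a hedged PyFlink sketch of the keyed-state API from the streaming table: a per-key running count in ValueState, backed by RocksDB when configured as above. Input data here is a toy collection.

```python
# Sketch: per-key count held in keyed ValueState.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
from pyflink.datastream.state import ValueStateDescriptor

class CountPerKey(KeyedProcessFunction):
    def open(self, runtime_context: RuntimeContext):
        # Registered state lives in the configured backend (RocksDB here).
        self.count = runtime_context.get_state(
            ValueStateDescriptor("count", Types.LONG()))

    def process_element(self, value, ctx):
        n = (self.count.value() or 0) + 1
        self.count.update(n)
        yield (ctx.get_current_key(), n)

env = StreamExecutionEnvironment.get_execution_environment()
counts = (env.from_collection([("a",), ("b",), ("a",)])  # toy bounded input
             .key_by(lambda e: e[0])
             .process(CountPerKey()))
```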
When Each Wins
Spark wins for
- Analytical batch SQL on data lakes (Delta/Iceberg)
- ETL pipelines with Catalyst-friendly SQL
- ML workflows (MLlib + integrations)
- Streaming where seconds latency is fine and code reuse with batch matters
- Teams already on Databricks / Spark cloud services
Flink wins for
- Sub-100ms latency stream processing
- Complex event-time semantics, custom windows
- Multi-TB state, fast incremental checkpoints
- Many non-Kafka sources/sinks (Kinesis, Pulsar, JDBC XA)
- CEP (complex event processing) and stateful pattern matching
Ecosystem Notes
# Spark
# Cloud-managed: Databricks, EMR, Dataproc, Synapse, Glue, Fabric
# Lakehouse: Delta Lake (Linux Foundation), Iceberg (Apache), Hudi (Apache)
# Notebook integration: Jupyter, Databricks, Zeppelin
# Languages: Scala, Java, Python (PySpark), R (SparkR), SQL
# Key vendors: Databricks, Cloudera, AWS (EMR/Glue)
# Flink
# Cloud-managed: Amazon Managed Service for Apache Flink (ex-Kinesis Data Analytics), Ververica Cloud, Alibaba Cloud
# Companion: Flink CDC (Change Data Capture), Flink ML
# Languages: Java, Scala, Python (PyFlink), SQL
# Key vendors: Ververica (formerly data Artisans), Alibaba, Cloudera
# Beam (portability layer)
# Lets you write once, run on Spark, Flink, Dataflow, Samza
# Tradeoff: feature ceiling = lowest common denominator + translation overhead
# Used by Google Dataflow customers; less common elsewhere
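A minimal Beam wordcount makes the portability claim concrete: the pipeline is runner-agnostic, and the engine is chosen at submit time. File paths here are placeholders.

```python
# Sketch: one Beam pipeline, runnable on Spark, Flink, or Dataflow runners.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# e.g. PipelineOptions(["--runner=FlinkRunner"]) or ["--runner=SparkRunner"]
opts = PipelineOptions()

with beam.Pipeline(options=opts) as p:
    (p
     | "Read" >> beam.io.ReadFromText("input.txt")       # placeholder path
     | "Split" >> beam.FlatMap(str.split)
     | "Pair" >> beam.Map(lambda w: (w, 1))
     | "Count" >> beam.CombinePerKey(sum)
     | "Format" >> beam.MapTuple(lambda w, c: f"{w}\t{c}")
     | "Write" >> beam.io.WriteToText("counts"))         # placeholder path
```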
Frequently Asked Questions
Why is Spark called 'batch-first' if it has Structured Streaming?
Spark started as a batch RDD engine; streaming was retrofitted on top as micro-batches. The execution model is fundamentally 'run a small batch every interval'. Flink, in contrast, was designed as a streaming dataflow runtime from day one — every operator processes records continuously, and batch is implemented as a special case of bounded streams. This shows up in latency (Flink is ~10x lower) and in features (Flink's event-time, watermarks, and savepoints are deeper).
Which has better state management?
Flink, by a wide margin. Flink's state backends (RocksDB with incremental checkpoints, ChangelogStateBackend) routinely handle multi-TB state per operator with sub-minute checkpoints. Spark Structured Streaming's state stores are more limited — RocksDB support arrived in 3.2 and is still maturing. For state-heavy applications (stream joins with weeks of retention, sessionization at scale), Flink is the standard choice.
Which is better for batch?
Spark, decisively. Catalyst + Tungsten + AQE form the most polished batch-SQL stack in the open-source ecosystem, and the surrounding tooling (Delta Lake, Iceberg, Hudi, Spark UI, broad cloud support) makes Spark the default for analytical batch processing. Flink supports batch via the same DataStream/Table APIs, but the ecosystem and tooling are thinner, and the query optimizer is less mature than Catalyst.
Can the same code run on Spark and Flink?
Largely no. Spark uses Datasets/DataFrames; Flink uses the DataStream and Table APIs (the legacy DataSet API is deprecated in favor of bounded DataStreams). The Apache Beam project provides a portable API, and your job can run on either Spark or Flink as a runner, but Beam adds a translation layer with its own performance characteristics, and most production users pick a runtime and stick with it.
Which has better Kubernetes support?
Both are mature on K8s. Flink has a native operator (Flink Kubernetes Operator) and was designed with long-running jobs in mind, so its K8s integration handles streaming-specific concerns like checkpoint storage and HA out of the box. Spark on K8s is also production-ready (3.1+) but is more oriented toward batch-style job submission. For long-running streaming jobs on K8s, Flink's lifecycle is slightly cleaner; for ad-hoc batch jobs, Spark's pattern is simpler.