
Intro

Apache Flink is a stream processing framework that can also handle batch workloads. It treats a batch simply as a data stream with finite boundaries, and therefore regards batch processing as a subset of stream processing. This stream-first approach to all processing has a number of interesting consequences.
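As a rough illustration of batch-as-a-special-case, here is a minimal sketch, assuming the Java DataStream API of a recent Flink release. It runs an ordinary DataStream program over a finite, in-memory collection and asks the runtime to execute it in batch mode; only the execution mode and the source distinguish it from an unbounded job.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedAsStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Ask the runtime to treat this job as a batch job; the program itself
        // is written once against the DataStream API and does not change.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // A bounded "stream": a finite, in-memory collection of events
        // (placeholder data for illustration).
        DataStream<String> events = env.fromElements("click", "view", "click", "purchase");

        // The same transformations would apply unchanged to an unbounded source
        // such as a Kafka topic or a socket.
        events
            .map(String::toUpperCase)
            .print();

        env.execute("bounded-as-stream");
    }
}
```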

This stream-first approach has been called the Kappa architecture, in contrast to the more widely known Lambda architecture (where batching is used as the primary processing method with streams used to supplement and provide early but unrefined results). Kappa architecture, where streams are used for everything, simplifies the model and has only recently become possible as stream processing engines have grown more sophisticated.

The four cornerstones on which Flink is built

  1. Streaming
  2. State
  3. Time
  4. Snapshots

Real-time Stream Processors

Advanced stream processors allow you to perform a wide range of tasks, including:

  • Windowed aggregations (e.g., count pageviews per minute)
  • Stream-to-table joins (e.g., enrich clickstream data with user profiles)
  • Event filtering and deduplication
  • Handling late or out-of-order data

Stream processing frameworks are purpose-built to manage state, ordering, and fault tolerance at scale.
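To make the first item in the list above concrete, here is a minimal sketch of a windowed aggregation that counts pageviews per page per minute. It assumes the Java DataStream API with the classic Time-based window assigners; the in-line source data and field positions are placeholders for illustration.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class PageviewsPerMinute {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical pageview stream: (pageUrl, 1) pairs. In a real job this
        // would come from a source such as Kafka rather than fromElements.
        DataStream<Tuple2<String, Integer>> pageviews = env.fromElements(
                Tuple2.of("/home", 1),
                Tuple2.of("/pricing", 1),
                Tuple2.of("/home", 1));

        // Count pageviews per page over one-minute tumbling windows.
        pageviews
            .keyBy(view -> view.f0)
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            .sum(1)
            .print();

        env.execute("pageviews-per-minute");
    }
}
```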

Streaming


  • A stream is a sequence of events
  • Business data is always a stream: bounded or unbounded
  • For Flink, batch processing is just a special case in the runtime
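The unbounded half of the same picture, sketched under the same assumptions (Java DataStream API; host and port are placeholders): the job below reads from a socket that never ends, yet uses exactly the same operators a bounded job would, which is the point of treating batch as a special case of streaming.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnboundedStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // An unbounded stream: events keep arriving on a socket until the job
        // is cancelled. "localhost" and 9999 are placeholders for illustration.
        DataStream<String> events = env.socketTextStream("localhost", 9999);

        // The same operators as in the bounded example earlier; only the
        // source differs.
        events.map(String::toUpperCase).print();

        env.execute("unbounded-stream");
    }
}
```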