
Kafka Guarantees

  • Messages sent by a producer to a particular topic partition are appended in the order they are sent
  • A consumer instance sees messages in the order they are stored in the partition log
  • A topic with replication factor N tolerates up to N-1 server failures without losing committed messages

Message Processing Guarantees

  • No guarantee - No explicit guarantee is provided, so consumers may process a message once, multiple times, or not at all.
  • At most once - "Best effort" delivery semantics. Consumers process each message at most once: a message may be lost, but it is never processed twice.
  • At least once - Consumers will receive and process every message, but they may process the same message more than once.
  • Effectively once - Also (somewhat contentiously) called exactly once, this promises that consumers will process every message exactly once.


Exactly Once Semantics (EOS)

Key Building Blocks for EOS

Kafka achieves EOS using three main features:

(a) Idempotent Producer

  • Every message sent by a producer gets a producer ID (PID) and a sequence number.
  • If retries happen (e.g., network glitch), Kafka detects duplicates by checking the sequence number and ignores them.
  • Ensures no duplicates on retries → gives idempotent writes.

(b) Transactions

  • A producer can group multiple writes (to one or more partitions, topics) into a transaction.
  • Kafka ensures that either:
    • All writes in the transaction are committed, or
    • None are (atomicity).
  • Consumers using read_committed isolation see only committed data.

(c) Replication Protocol

  • With acks=all, Kafka ensures that once a transaction is committed, it’s durably replicated to all in-sync replicas (ISRs).
  • This guarantees durability even if the leader fails immediately after commit.

Putting It Together

Here’s how EOS works in practice:

  1. Producer initializes transactions → initTransactions(), then starts one → beginTransaction().
  2. Writes messages with idempotence enabled.
  3. Calls commitTransaction() → Kafka writes a transaction marker ensuring atomic visibility.
  4. Messages are replicated to all ISRs (with acks=all).
  5. Consumer reads with isolation.level=read_committed → sees each committed message exactly once.
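
A minimal sketch of this flow using the Java producer API follows. The broker address, topic name, record key/value, and the transactional.id (eos-producer-1) are illustrative assumptions, and error handling (abortTransaction() on failure) is trimmed for brevity:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EosProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");             // PID + sequence numbers => no duplicates on retry
        props.put("acks", "all");                            // wait for all in-sync replicas
        props.put("transactional.id", "eos-producer-1");     // enables transactions for this producer

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();                      // step 1: register the transactional.id
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("orders", "123", "{\"id\":123,\"amount\":1000}"));
            producer.commitTransaction();                     // step 3: transaction marker makes the writes visible atomically
        }
        // Step 5 happens on the consumer side: set isolation.level=read_committed
        // so that poll() returns only committed transactional data.
    }
}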

Limitations

  • EOS works only within Kafka (end-to-end with producers + Kafka + consumers).
  • If you write to an external DB/system, you need two-phase commit or outbox patterns.
  • Slight overhead due to transaction coordination.

Replication Protocol

  • When a producer writes a message to a partition, it first goes to the leader replica of that partition.

  • The producer can specify the acks setting:

    • acks=0 → Producer doesn’t wait for acknowledgment.
    • acks=1 → Producer gets an acknowledgment once the leader writes it.
    • acks=all (or -1) → Producer gets acknowledgment only after the message is written to the leader and replicated to all in-sync replicas (ISRs).

Important nuance: Kafka does not guarantee replication to every replica of the partition. It guarantees replication to the current set of in-sync replicas (ISRs).

  • A follower that has fallen out of sync is removed from the ISR set, so it does not block the acknowledgment.
  • A write is considered committed (durable) once every current ISR has it; the min.insync.replicas setting controls how small the ISR set may shrink before acks=all writes are rejected.

Kafka’s replication protocol therefore guarantees that once a message has been acknowledged with acks=all, it has been written to the leader replica and replicated to all in-sync replicas (ISRs).
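
As an illustration (the values below are assumptions, not from the original), a durability-oriented setup pairs the producer-side acks with a topic-level floor on how small the ISR set may get:

# Producer
acks=all
enable.idempotence=true

# Topic (created with replication factor 3)
min.insync.replicas=2

With this combination, an acks=all write is rejected outright, rather than silently under-replicated, whenever fewer than two in-sync replicas are available.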

Outbox Pattern

The Outbox Pattern is one of the most popular patterns used to achieve reliable event-driven integration between a database (like MySQL, PostgreSQL) and a message broker (like Kafka).

Problem it Solves

When a service needs to:

  1. Update its own database (say insert a new Order), and
  2. Publish an event to Kafka (like OrderCreated)

You run into the dual-write problem:

  • If you write to the DB but crash before publishing to Kafka → event is lost.
  • If you publish to Kafka but crash before writing to DB → system state is inconsistent.

This breaks exactly-once semantics across DB + Kafka.

The Outbox Pattern

Instead of doing both separately, you write only to your database in a single atomic transaction:

  1. Service writes business data (e.g., new Order)
  2. In the same transaction, service writes an event record into a special outbox table.

Example (Postgres/MySQL):

BEGIN;
INSERT INTO orders (id, customer, amount, status)
  VALUES (123, 'Deepak', 1000, 'CREATED');
-- uuid() is MySQL syntax; in PostgreSQL use gen_random_uuid()
INSERT INTO outbox (event_id, type, payload, status)
  VALUES (uuid(), 'OrderCreated', '{"id":123,"amount":1000}', 'NEW');
COMMIT;

Now DB and event are guaranteed consistent.

Event Relay (Debezium / Polling)

Next step:

  • A relay process reads events from the outbox table and publishes them to Kafka.
  • Can be done via:
    • Change Data Capture (CDC) tools like Debezium (reads DB transaction log, automatically streams outbox rows to Kafka).
    • Polling (a background service queries the outbox table and sends events).

After publishing, event status can be marked as SENT or the row can be archived.

Benefits

  • Atomicity → DB update + event log are one transaction.
  • Reliability → No lost or duplicated events.
  • Event-driven architecture works smoothly with existing DB.
  • Plays well with Kafka + EOS when you integrate external systems.

Example Flow

Order Service:

  • User places order.
  • Service saves order + inserts event into outbox table.
  • Debezium streams outbox → Kafka topic order.events.

Inventory Service:

  • Consumes from order.events.
  • Updates stock accordingly.

Outbox to Kafka - EOS

1. The Challenge

  • The outbox table is already atomic with your business data.
  • But when your producer service reads from the outbox and publishes to Kafka, two risks appear:
    1. Duplicate sends (retry on failure, service crash before marking event as sent).
    2. Lost events (service marks row as sent, but crash happens before actually producing to Kafka).

We need EOS here.

2. Solution Approaches

Approach A: Kafka Transactions (EOS Producer)

Kafka has idempotent producers + transactions:

  • Configure the producer with:
    • enable.idempotence=true
    • acks=all
    • transactional.id=outbox-relay-1
  • Start a transaction:
    1. Read batch of unsent events from outbox table.
    2. Begin Kafka transaction.
    3. Send all events to Kafka.
    4. Commit Kafka transaction AND update outbox table status (sent=true) in the same transactional flow.

But here’s the catch: Kafka transactions and DB transactions are separate. You cannot commit both atomically in one step.

So you need a transactional outbox + idempotent updates pattern.

Approach B: Idempotent Deduplication Key

Each event in outbox has a unique event_id (UUID).

  • Use event_id as Kafka message key (or put it in headers).
  • Kafka producer with idempotence ensures retries don’t create duplicates.
  • Consumers can deduplicate by event_id if needed.
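
A consumer-side deduplication check could look like the sketch below. The class name and the in-memory set are illustrative; a real system would persist processed event_ids in a database or state store:

import java.time.Duration;
import java.util.HashSet;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DedupConsumerSketch {

    // In production the "seen" set would be a database table or a state store,
    // not an in-memory collection.
    private final Set<String> seenEventIds = new HashSet<>();

    public void pollOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            String eventId = record.key();     // event_id was used as the message key
            if (!seenEventIds.add(eventId)) {
                continue;                       // duplicate delivery: skip reprocessing
            }
            process(record.value());
        }
    }

    private void process(String payload) {
        // business logic goes here
    }
}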

Approach C: Debezium / CDC Outbox (Best Practice)

Instead of writing custom relay code:

  • Use Debezium (reads DB’s transaction log).

  • When your service commits the DB transaction (business row + outbox row), Debezium streams the outbox change directly into Kafka.

  • This gives effectively-once delivery because:

    • The DB transaction log is the single source of truth, so there is no dual write to keep consistent.
    • The connector’s idempotent Kafka producer suppresses duplicates caused by retries (and consumers can still deduplicate by event_id to cover connector restarts).

This avoids the "dual-write" problem entirely.

3. Practical Design (if writing custom relay service)

  1. Outbox row has: event_id (UUID, PK), payload, status, created_at
  2. Relay service logic:
    • Poll outbox rows where status='NEW'.
    • Start Kafka transaction.
    • For each row: producer.send(new ProducerRecord("topic", event_id, payload));
    • Commit Kafka transaction.
    • After successful commit, mark those rows as SENT.
  3. If crash happens:
    • Rows still marked NEW → safe to retry.
    • Kafka idempotence + unique key ensure no duplicates.
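
A sketch of that relay logic in Java is shown below. The outbox schema, the order.events topic, and the class name are illustrative, and error handling (e.g., abortTransaction() on a failed send) is trimmed:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OutboxRelaySketch {

    // The producer is assumed to be configured with transactional.id,
    // enable.idempotence=true and acks=all, and to have already called
    // initTransactions().
    public static void relayOnce(Connection db, KafkaProducer<String, String> producer) throws SQLException {
        List<String> sentIds = new ArrayList<>();

        producer.beginTransaction();
        try (PreparedStatement select = db.prepareStatement(
                "SELECT event_id, payload FROM outbox WHERE status = 'NEW'");
             ResultSet rs = select.executeQuery()) {
            while (rs.next()) {
                String eventId = rs.getString("event_id");
                // event_id as the message key: idempotent producer + stable key
                // keep retries from creating duplicates downstream.
                producer.send(new ProducerRecord<>("order.events", eventId, rs.getString("payload")));
                sentIds.add(eventId);
            }
        }
        producer.commitTransaction();              // Kafka transaction commits first

        // If the service crashes before this UPDATE, the rows remain NEW and the
        // batch is retried on the next poll; duplicates are then fenced/deduplicated.
        try (PreparedStatement update = db.prepareStatement(
                "UPDATE outbox SET status = 'SENT' WHERE event_id = ?")) {
            for (String id : sentIds) {
                update.setString(1, id);
                update.addBatch();
            }
            update.executeBatch();
        }
    }
}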

4. Key Points

  • Idempotence in Kafka removes duplicates caused by retries.
  • Outbox status column ensures no message is lost.
  • CDC (Debezium) is the cleanest and production-proven way.
  • If you roll your own relay, you must use idempotent producer + unique event_id.

So, the safest way to ensure EOS from Outbox → Kafka is:

  • Use CDC (Debezium Outbox pattern) if possible.
  • If using a custom service, use Kafka idempotent producer with unique event_id + mark rows after commit.

Idempotent Writer

A writer produces Events that are written into an Event Stream, and under stable conditions, each Event is recorded only once. However, in the case of an operational failure or a brief network outage, an Event Source may need to retry writes. This may result in multiple copies of the same Event ending up in the Event Stream, as the first write may have actually succeeded even though the producer did not receive the acknowledgement from the Event Streaming Platform. This type of duplication is a common failure scenario in practice and one of the perils of distributed systems.

Solution


Generally speaking, this can be addressed by native support for idempotent clients. This means that a writer may try to produce an Event more than once, but the Event Streaming Platform detects and discards duplicate write attempts for the same Event.

Implementation

To make an Apache Kafka® producer idempotent, configure your producer with the following setting:

enable.idempotence=true

The Kafka producer tags each batch of Events that it sends to the Kafka cluster with a sequence number. Brokers in the cluster use this sequence number to enforce deduplication of Events sent from this specific producer. Each batch's sequence number is persisted so that even if the leader broker fails, the new leader broker will also know if a given batch is a duplicate.
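
For context, the settings most often tuned alongside idempotence look like the following sketch; the values shown are common choices and, in recent client versions, several of them are already the defaults:

enable.idempotence=true
# required when idempotence is enabled
acks=all
# retry aggressively; duplicates from retries are fenced by sequence numbers
retries=2147483647
# ordering is preserved with idempotence for up to 5 in-flight batches
max.in.flight.requests.per.connection=5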

To enable exactly-once processing within an Apache Flink® application that uses Kafka sources and sinks, configure the delivery guarantee to be exactly once, either via the DeliveryGuarantee.EXACTLY_ONCE KafkaSink configuration option if the application uses the DataStream Kafka connector, or by setting the sink.delivery-guarantee configuration option to exactly-once if it uses one of the Table API connectors. Confluent Cloud for Apache Flink provides built-in exactly-once semantics. Downstream of the Flink application, be sure to configure any Kafka consumer with an isolation.level of read_committed since Flink leverages Kafka transactions in the embedded producer to implement exactly-once processing.
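
A minimal sketch of the DataStream KafkaSink configuration described above; the class and option names follow the Flink Kafka connector as described here, while the broker address, topic, and transactional-id prefix are illustrative assumptions:

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class ExactlyOnceKafkaSinkSketch {
    // Builds a KafkaSink that writes with exactly-once semantics. Flink uses
    // Kafka transactions internally, so downstream consumers must read with
    // isolation.level=read_committed (as noted above).
    public static KafkaSink<String> build() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")             // assumed broker address
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("order.events")                   // illustrative topic
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("flink-eos-")            // required for EXACTLY_ONCE
                .build();
    }
}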

To enable exactly-once processing guarantees in Kafka Streams or ksqlDB, configure the application with the following setting, which includes enabling idempotence in the embedded producer:

processing.guarantee=exactly_once_v2
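
For example, a minimal Kafka Streams application with this guarantee enabled might be configured like the sketch below; the application id, broker address, and topic names are illustrative:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class EosStreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-demo-app");        // illustrative application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        // Trivial pass-through topology; real applications would do stateful processing here.
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}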

Considerations

Enabling idempotency for a Kafka producer not only keeps duplicate Events out of the topic, it also ensures that Events are written in order. This is because the brokers accept a batch of Events only if its sequence number is exactly one greater than that of the last committed batch; otherwise, the broker returns an out-of-sequence error.

Exactly-once semantics (EOS) allow Event Streaming Applications to process data without loss or duplication. This ensures that computed results are always consistent and accurate, even for stateful computations such as joins, aggregations, and windowing. Any solution that requires EOS guarantees must enable EOS at all stages of the pipeline, not just on the writer. An Idempotent Writer is therefore typically combined with an Idempotent Reader and transactional processing.
