

Cross-Cluster Data Mirroring

  • Multicluster architecture
    • Hub-and-spoke architecture
    • Active-active architecture
    • Active-standby architecture
    • Stretch clusters
  • MirrorMaker 1 and MirrorMaker 2
  • Other cross-cluster mirroring solutions

MirrorMaker 2.0

MirrorMaker is, at its core, simply a Kafka consumer and producer linked together with a queue. It can, for example, aggregate messages from two local clusters into an aggregate cluster and then copy that cluster to other datacenters.

Architecture components

To understand how MirrorMaker 2 works, keep in mind that it is built on top of Kafka Connect. Kafka Connect is a framework within Apache Kafka that eases the integration of Kafka with other systems: it lets developers stream data into Kafka from various external sources and vice versa (from Kafka to external systems), operating in a scalable and fault-tolerant manner using connector plugins. MirrorMaker 2 relies on three key connectors to perform data and offset replication (a minimal configuration sketch follows the list):

  • The Source Connector replicates the data between Kafka clusters.
  • The Checkpoint Connector translates consumer group offsets between clusters.
  • The Heartbeat Connector enables monitoring of the health of a MirrorMaker 2 instance.
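
To make these roles concrete, below is a minimal properties sketch for a one-way flow from a cluster aliased A to a cluster aliased B (aliases, addresses, and patterns are placeholders; property names follow the Kafka geo-replication documentation):

    # connect-mirror-maker.properties -- minimal one-way A -> B sketch
    clusters = A, B
    A.bootstrap.servers = a-broker1:9092
    B.bootstrap.servers = b-broker1:9092

    # Enable only the A -> B flow: the Source connector replicates the
    # matching topics, the Checkpoint connector tracks the matching groups
    A->B.enabled = true
    A->B.topics = .*
    A->B.groups = .*

    # Replication factors for the internal topics the connectors rely on
    offset-syncs.topic.replication.factor = 3
    checkpoints.topic.replication.factor = 3
    heartbeats.topic.replication.factor = 3

Such a file is passed to the high-level driver (bin/connect-mirror-maker.sh connect-mirror-maker.properties), which runs the three connectors on a dedicated Connect cluster.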

Source: Demystifying Kafka MirrorMaker 2: Use cases and architecture | Red Hat Developer

Highlights of MirrorMaker 2.0

  • Leverages the Kafka Connect framework and ecosystem.
  • Includes both source and sink connectors.
  • Includes a high-level driver that manages connectors in a dedicated cluster.
  • Detects new topics and partitions.
  • Automatically syncs topic configuration between clusters.
  • Manages downstream topic ACLs.
  • Supports "active/active" cluster pairs, as well as any number of active clusters.
  • Supports cross-datacenter replication, aggregation, and other complex topologies.
  • Provides metrics, including end-to-end replication latency across multiple datacenters/clusters.
  • Emits offsets required to migrate consumers between clusters.
  • Tooling for offset translation.
  • No data or partition rebalancing; ordering is guaranteed within a partition.

MirrorMaker 2 Limitations

  • MirrorMaker 2 does not expose replication lag or throughput metrics
  • Syncs offsets automatically, but you still need to build a system around offset translation for consumer failover
  • Limited documentation for monitoring, tuning, and securing a MirrorMaker 2 configuration
  • Failover logic is application-specific and can be time-consuming to set up and maintain
  • Changes to MirrorMaker 2 must be made in the properties file and require a restart of the Connect cluster
  • Scaling requires significant overhead
    • Requires 4 connectors and 3 internal topics
    • Each destination cluster needs its own MirrorMaker 2 connector configuration

Offset Mapping

MM2 uses two internal topics to track the mapping between source and target offsets, as well as the mapping from the source cluster's __consumer_offsets to the target offsets. The offset-syncs topic maps each source topic, partition, and offset to the corresponding offset at the target; MM2 obtains the target offset from the RecordMetadata returned by producer.send().
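
Conceptually, each record in the offset-syncs topic is a tuple like the following (an illustrative Java record modeled on MM2's internal OffsetSync type, not a public API; Java 16+):

    // Illustrative shape of one offset-sync record: a source topic-partition
    // plus the pair of offsets that correspond to each other on both clusters.
    record OffsetSync(String topic,
                      int partition,
                      long upstreamOffset,    // offset on the source cluster
                      long downstreamOffset)  // matching offset on the target
    { }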

For consumers relying on the __consumer_offsets topic to track progress, MM2 maps the consumer offsets in a separate, log-compacted checkpoints topic per source cluster. MM2 periodically queries the source cluster for all committed offsets from all consumer groups, filters for the topics and consumer groups that need to be replicated, and emits a message to the internal checkpoints topic at the target cluster. These checkpoint records are emitted at a configurable interval that can be dynamically controlled.

Using the checkpoints topic, a consumer can, on failover, directly determine (using the MM2 utilities) the target offset corresponding to the source committed offset from which it needs to start consuming.
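
A failing-over consumer (or a migration tool) can fetch these translated offsets with RemoteClusterUtils from the connect-mirror-client artifact. A minimal sketch, assuming a source cluster aliased "primary" and a backup cluster at backup-broker1:9092 (both placeholders):

    import java.time.Duration;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.mirror.RemoteClusterUtils;

    public class FailoverOffsets {
        public static void main(String[] args) throws Exception {
            // MirrorClient config pointing at the *target* (backup) cluster,
            // where the checkpoints topic lives; the address is a placeholder.
            Map<String, Object> mm2Config = new HashMap<>();
            mm2Config.put("bootstrap.servers", "backup-broker1:9092");

            // Translate the committed offsets of group "my-group" from the
            // source cluster (alias "primary") into target-cluster offsets.
            Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(
                    mm2Config, "primary", "my-group", Duration.ofSeconds(30));

            // A consumer would then assign these partitions and seek, e.g.:
            // consumer.assign(translated.keySet());
            // translated.forEach((tp, om) -> consumer.seek(tp, om.offset()));
            translated.forEach((tp, om) ->
                System.out.println(tp + " -> " + om.offset()));
        }
    }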

Offset Translation

Offset translation is a key feature that serves as the foundation for migrating or failing over downstream consumers (including Kafka Streams applications) from the primary to the backup cluster: consumers use the translated offsets to resume consumption from where they left off at the primary cluster, without losing messages or consuming many duplicate messages. This enables a smooth, transparent one-time migration of consumers from one cluster to another, or a failover of consumers from the primary to the backup cluster.

To achieve this transition, two steps are important: (1) consumer offsets must be translated into offsets that make sense in the other cluster, which current MM 2.0 already does; and (2) the translated offsets must be periodically synchronized to the __consumer_offsets topic on the target cluster, so that when consumers switch over to the other cluster, they can start from the last known, translated offsets.

Source: KIP-545: support automated consumer offset sync across clusters in MM 2.0 - Apache Kafka - Apache Software Foundation
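
Step (2) is what KIP-545 added (shipped in Kafka 2.7). A sketch of the relevant per-flow properties, reusing the A -> B flow from the earlier example:

    # Emit checkpoint records with translated offsets for matching groups
    A->B.emit.checkpoints.enabled = true
    A->B.emit.checkpoints.interval.seconds = 60

    # Periodically write the translated offsets into __consumer_offsets
    # on the target cluster (no external tooling needed)
    A->B.sync.group.offsets.enabled = true
    A->B.sync.group.offsets.interval.seconds = 60

Note that MM2 applies synced offsets only for groups with no active consumers on the target cluster, so running consumers are never overwritten.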

Confluent Replicator vs MirrorMaker 2.0

Migration

Cluster Linking

Cluster Linking, a Confluent feature, allows you to directly connect clusters and mirror topics, consumer offsets, and ACLs from one cluster to another. Because mirrored topics preserve the source offsets exactly, consumers can fail over without any offset translation.

Scaling

Source: Is there anyway to activate auto scaling or some form of auto scaling with Strimzi? · strimzi · Discussion #6635 · GitHub

Auto-scaling Kafka is complicated. It usually cannot be done based just on CPU utilization or similar generic metrics.

  • If you want to scale consumers, you need to understand their consumer group membership and which topics they are consuming, because the maximum useful number of replicas is limited by, for example, the number of partitions they consume from. You need tools such as KEDA to autoscale them, which have additional logic to take these things into account (see the sketch after this list).
  • If you want to auto-scale components such as Connect, connectors, the Bridge, etc., Strimzi gives you the scale subresource to plug them into the Kubernetes HPA and tools like KEDA. These are basically consumers and producers in special packaging, so the same rules described above apply to them.
  • For Kafka brokers, auto-scaling is complicated because of their architecture. Adding or removing brokers is simple, but directing load to them is complicated, because brokers are in a way a form of data storage. Moving whole partitions between brokers is expensive: partitions often contain huge amounts of data that need to be shifted from one broker to another, which takes time, carries a performance penalty for other traffic, and can even cost real money in data transfers. And it still might not work: if your bottleneck is, for example, a topic with 5 partitions, it might not matter whether you have 5 or 10 brokers. So, in my experience, auto-scaling Kafka brokers only rarely makes sense.
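
For the consumer case in the first bullet, here is a minimal sketch of what lag-based scaling with KEDA's Kafka scaler could look like (all names, addresses, and thresholds are hypothetical):

    # Hypothetical KEDA ScaledObject scaling a consumer Deployment on lag
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: my-consumer-scaler
    spec:
      scaleTargetRef:
        name: my-consumer-deployment  # Deployment running the consumers
      minReplicaCount: 1
      maxReplicaCount: 12             # keep <= partition count, per the point above
      triggers:
        - type: kafka
          metadata:
            bootstrapServers: my-cluster-kafka-bootstrap:9092
            consumerGroup: my-group
            topic: my-topic
            lagThreshold: "100"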