Confluent Platform Documentation

Confluent Platform | Confluent Documentation

Overview
Get Started
- Platform Overview
- Quick Start
- Learn More About Confluent and Kafka
- Tutorial: Set Up a Multi-Broker Cluster
- Scripted Confluent Platform Demo
Install and Upgrade
- Overview
- System Requirements
- Supported Versions and Interoperability
- Install Manually
- Deploy with Ansible Playbooks
- Deploy with Confluent for Kubernetes
- License
- Upgrade
- Installation Packages
- Migrate to Confluent Platform
- Migrate to and from Confluent Server
- Migrate from Confluent Server to Confluent Kafka
- Migrate from ZooKeeper to KRaft
- Installation FAQ
Build Client Applications
- Overview
- Configure Clients
- Client Guides
- Client Examples
  - Overview
  - Python Client
  - .NET Client
  - JavaScript Client
  - Go Client
  - C++ Client
  - Java
  - Spring Boot
  - KafkaProducer
  - REST
  - Clojure
  - Groovy
  - Kafka Connect Datagen
  - kafkacat
  - Kotlin
  - Ruby
  - Rust
  - Scala
- Kafka Client APIs
- Deprecated Client APIs
- VS Code Extension
- MQTT Proxy
Build Kafka Streams Applications
- Overview
- Quick Start
- Streams API
- Tutorial: Streaming Application Development Basics on Confluent Platform
- Connect Streams to Confluent Cloud
- Concepts
- Architecture
- Examples
- Developer Guide
- Build Pipeline with Connect and Streams
- Operations
- Upgrade
- Frequently Asked Questions
- Javadocs
- ksqlDB
Confluent Private Cloud
- Overview
- Confluent Private Cloud Gateway
- Intelligent Replication
- Release Notes for Confluent Private Cloud
Confluent REST Proxy for Apache Kafka
- Overview
- Quick Start
- API Reference
- Production Deployment
- Connect to Confluent Cloud
Process Data With Flink
- Overview
- Installation and Upgrade
- Get Started
  - Get Started with Applications
  - Get Started with Statements
- Architecture and Features
- Configure Environments, Catalogs and Compute Pools
- Deploy and Manage Flink Jobs
- Disaster Recovery
- Clients and APIs
- How-to Guides
  - Overview
  - Checkpoint to S3
- FAQ
- Get Help
- What’s New
Connect to External Services
- Overview
- Get Started
- Connectors
- Confluent Hub
  - Overview
  - Component Archive Specification
  - Contribute
- Connect on z/OS
- Install
- License
- Supported
  - Supported Self-Managed Connectors
  - Supported Connector Versions in Confluent Platform 8.1
- Preview
- Configure
- Monitor
- Logging
- Connect to Confluent Cloud
- Developer Guide
- Tutorial: Moving Data In and Out of Kafka
- Reference
  - Kafka Connect Javadocs
  - REST interface
  - Kafka Connect Worker Configuration Properties for Confluent Platform
  - Connector Configuration Properties for Confluent Platform
- Transform
- Custom Transforms
- Security
  - Kafka Connect Security Basics
  - Kafka Connect and RBAC
  - Manage CSFLE in Confluent Cloud for Self-Managed Connectors
  - Manage CSFLE in Confluent Platform for Self-Managed Connectors
  - Manage CSFLE for partner-managed connectors
- Design
- Add Connectors and Software
- Install Community Connectors
- Upgrade
- Troubleshoot
- FileStream Connectors
- FAQ
Manage Schema Registry and Govern Data Streams
- Overview
- Get Started with Schema Registry Tutorial
- Install and Configure
- Fundamentals
- Manage Schemas
- Security
- Reference
- FAQ
Manage Security
- Overview
- Deployment Profiles
- Compliance
- Authenticate
- Authorize
- Protect Data
- Configure Security Properties using Prefixes
- Secure Components
- Enable Security for a Cluster
- Add Security to Running Clusters
- Configure Confluent Server Authorizer
- Security Management Tools
- Cluster Registry
- Encrypt using Client-Side Payload Encryption
Deploy Confluent Platform in a Multi-Datacenter Environment
- Overview
- Multi-Data Center Architectures on Confluent Platform
- Cluster Linking on Confluent Platform
- Multi-Region Clusters on Confluent Platform
- Replicate Topics Across Kafka Clusters in Confluent Platform
Configure and Manage
- Overview
- Configuration Reference
- CLI Tools for Use with Confluent Platform
- Change Configurations Without Restart
- Manage Clusters
  - Overview
  - Cluster Metadata Management
  - Manage Self-Balancing Clusters - SBC
    - Self-Balancing Clusters (SBC) is a Confluent feature that is designed to simplify the process of managing an Apache Kafka® cluster. With this feature enabled, a cluster automatically rebalances partitions across brokers when new brokers are added or existing brokers are removed. This ensures that data is evenly distributed across the cluster, which can improve performance and reduce the risk of data loss in the event of a broker failure.
    - Self-Balancing offers:
      - Fully automated load balancing.
      - Dynamic enablement, meaning you can turn it off or on while the cluster is running and set to rebalance when brokers are added or removed, or for any uneven load (anytime).
      - Auto-monitoring of clusters for imbalances based on a large set of parameters, configurations, and runtime variables
      - Continuous metrics aggregation and rebalancing plans, generated instantaneously in most cases, and executed automatically
      - Automatic triggering of rebalance operations based on simple configurations you set on Control Center for Confluent Platform or in Kafka server.properties files. You can choose to auto-balance Only when brokers are added or Anytime, which rebalances for any uneven load.
      - At-a-glance visibility into the state of your clusters, and the strategy and progress of auto-balancing through a few key metrics.
    - On CC, when scaling to a higher CFU, SBC by default redistributes only a subset of topics (~80%) to prioritize faster rebalancing. The distribution of remaining topics is deferred to later, based on observed throughput / other capacity breaches.
    - Overview
    - Tutorial: Adding and Remove Brokers
    - Configure
    - Performance and Resource Usage
  - Auto Data Balancing
  - Tiered Storage
- Metadata Service (MDS) in Confluent Platform
- Docker Operations for Confluent Platform
- Run Kafka in Production
- Production Best Practices
- Manage Topics
Manage Hybrid Environments with USM
- Overview
- Get Started with USM
- USM Agent Operations
- Schema Registry in Hybrid setup
Monitor with Control Center
Monitor
- Logging
- Monitor with JMX
- Monitor with Metrics Reporter
- Monitor Consumer Lag
- Monitor with Health+
Confluent CLI
Release Notes
- Release Notes
- Changelogs
APIs and Javadocs for Confluent Platform
- Overview
- Kafka API and Javadocs for Confluent Platform
- Client APIs
- Confluent APIs for Confluent Platform
Glossary

Multi-Region Clusters on Confluent Platform

Overview

Confluent Server is often run across availability zones or nearby datacenters. If the computer network between brokers across availability zones or nearby datacenters is dissimilar, in term of reliability, latency, bandwidth, or cost, this can result in higher latency, lower throughput and increased cost to produce and consume messages.

To mitigate this, three distinct pieces of functionality were added to Confluent Server:

Follower-Fetching - Before the introduction of this feature, all consume and produce operations took place on the leader. With Multi-Region Clusters, clients are allowed to consume from followers. This dramatically reduces the amount of cross-datacenter traffic between clients and brokers.
Observers - Historically there are two types of replicas: leaders and followers. Multi-Region Clusters introduces a third type of replica, observers. By default, observers will not join the in-sync replicas (ISR) but will try to keep up with the leader just like a follower. With follower-fetching, clients can also consume from observers.
- Automatic observer promotion - Automatic Observer Promotion is the process whereby an observer is promoted into the in-sync replicas list (ISR). This can be advantageous in certain degraded scenarios. For instance, if there have been enough broker failures for a given partition to be below its minimum in-sync replicas constraint then that partition would normally become offline. With automatic observer promotion, one or more observers can take the place of followers in the ISR keeping the partition online until the followers can be restored. Once followers have been restored (they are caught up and have rejoined the ISR) then the observers are automatically demoted from the ISR.
Replica Placement - Replica placement defines how to assign replicas to the partitions in a topic. This feature relies on the broker.rack property configured for each broker. For example, you can create a topic that uses observers with the new --replica-placement flag on kafka-topics to configure the internal property confluent.placement.constraints.

Tutorial: Multi-Region Clusters on Confluent Platform | Confluent Documentation

Multi-Region Clusters allow customers to run a single Apache Kafka® cluster across multiple datacenters. Often referred to as a stretch cluster, Multi-Region Clusters replicate data between datacenters across regional availability zones. You can choose how to replicate data, synchronously or asynchronously, on a per Kafka topic basis. It provides good durability guarantees and makes disaster recovery (DR) much easier.

Benefits:

Supports multi-site deployments of synchronous and asynchronous replication between datacenters
Consumers can leverage data locality for reading Kafka data, which means better performance and lower cost
Ordering of Kafka messages is preserved across datacenters
Consumer offsets are preserved
In event of a disaster in a datacenter, new leaders are automatically elected in the other datacenter for the topics configured for synchronous replication, and applications proceed without interruption, achieving very low RTOs and RPO=0 for those topics.

Kafka Stretch Cluster

A Kafka stretch cluster is a single logical cluster where brokers and consensus nodes (ZooKeeper or KRaft) are physically distributed across multiple data centers or availability zones.

Unlike multi-cluster setups that use asynchronous mirroring, a stretch cluster relies on synchronous replication to ensure data is written to multiple locations before acknowledging success to the producer.

Core Architecture & Mechanics

Logical Unity: It is managed as one cluster, meaning clients (producers/consumers) are generally unaware of the physical distribution.
Rack Awareness: Kafka uses the broker.rack configuration to identify each broker's physical location (e.g., DC1, DC2, DC3), ensuring partition replicas are spread across different sites.
Quorum Requirement: To maintain stability and prevent "split-brain" scenarios, a stretch cluster typically requires three data centers (or two plus a lightweight "tie-breaker" site) so a majority of consensus nodes can always form a quorum if one site fails.

Critical Configurations

To achieve the primary goal of zero data loss (RPO=0), specific settings are required:

acks=all: Producers must wait for all in-sync replicas to acknowledge the write.
min.insync.replicas: Usually set to 2 in a 3-replica setup, ensuring that if one DC fails, the remaining two can still accept writes.
Latency Limits: Because of synchronous writes, the network round-trip time (RTT) between sites directly impacts throughput. It is generally recommended to keep RTT under 50-100ms.

Pros and Cons

Feature	Benefit / Trade-off
Data Durability	RPO=0: No data loss during a single DC failure.
Availability	Near-zero RTO: Automatic failover with no manual intervention.
Complexity	High: Extremely difficult to monitor and tune for network flakiness.
Performance	Lower Throughput: Write latency is tethered to the slowest network link.
Cost	High: Requires expensive, low-latency inter-DC connectivity and redundant capacity.

Kafka Stretch Clusters. Introduction | by Sonam Vermani | Medium

Multi-Data Center Architectures on Confluent Platform | Confluent Documentation

Multi-data center
Multi-Cluster replication
2-Cluster Active-Passive
- A 2-cluster active-passive architecture involves two fully-operational clusters, each running a separate Confluent cluster, where one cluster is the “active” cluster serving all produce and consume requests and the second “passive” cluster is a copy of the “active” but without applications running against it. When the “active” data center fails, applications failover to the “passive” data center. The key difference between this and the 2-cluster active-active architecture is that the active-passive architecture has applications only running in one cluster during normal operating conditions.
2+-Cluster Active-Active
- A 2+-cluster active-active architecture involves two or more fully-independent clusters, each running a separate Confluent cluster, where each cluster is a copy of the other. When one of the data centers fails, applications failover to the other cluster.
Stretched Cluster 3-Data Center
- A stretched 3-data center cluster architecture in kraft mode involves three data centers that are connected by a low latency (sub-100ms) and stable (very tight p99s) network, usually a “dark fiber” network that is owned or leased privately by the company. Confluent Server nodes are configured with appropriate process.roles (either broker,controller for combined roles or dedicated broker and controller roles) and are spread evenly across the three data centers to form a single, stretched KRaft cluster. The KRaft controller quorum now manages the cluster metadata and handles leader elections for all Kafka brokers in this single, stretched cluster.
Stretched Cluster 2.5 Data Center
- A stretched 2.5-data center architecture involves two fully-operational data centers and one light (0.5) data center running a single, stretched cluster. The fully operational data centers run an equal number of Confluent Server nodes configured as brokers and controllers, whereas the light data center runs a subset of kraft controller nodes to maintain the controller quorum (the equivalent of running a single ZooKeeper server in legacy deployments). When any single data center fails, the KRaft controller quorum will remain available. When a fully operational data center fails, applications failover to the other data center.

Multi-Region Clusters on Confluent Platform​

Overview​

Tutorial: Multi-Region Clusters on Confluent Platform | Confluent Documentation​

Kafka Stretch Cluster​

Core Architecture & Mechanics​

Critical Configurations​

Pros and Cons​