Kafka Connect
- Connectors and tasks
  - Connectors are responsible for:
    - Determining how many tasks will run for the connector
    - Deciding how to split the data-copying work between the tasks
    - Getting configurations for the tasks from the workers and passing them along
  - Tasks are responsible for actually getting the data in and out of Kafka
- Workers
  - Kafka Connect's worker processes are the "container" processes that execute the connectors and tasks
  - Connectors and tasks are responsible for the "moving data" part of data integration, while the workers are responsible for the REST API, configuration management, reliability, high availability, scaling, and load balancing
- Source Connectors bring external data into Kafka (like capturing changes in Salesforce)
- Sink Connectors send Kafka data to external systems (like pushing records into Elasticsearch)
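The split between a connector (planning the work) and its tasks (doing the copying) can be sketched in Python. This is only a conceptual illustration, not Kafka Connect's actual Java API: real connectors implement `taskConfigs(maxTasks)` in Java, but the round-robin partitioning idea is the same. The table names and the `split_work` helper are hypothetical.

```python
# Conceptual sketch: a connector deciding how to split data-copying work
# among at most max_tasks tasks, the way a Connect connector's
# taskConfigs(maxTasks) hands each task its share of the work.
# (Illustration only -- not a real Kafka Connect API.)

def split_work(tables, max_tasks):
    """Assign each source table to one of up to max_tasks task configs."""
    num_tasks = min(max_tasks, len(tables))  # never start idle tasks
    configs = [{"tables": []} for _ in range(num_tasks)]
    for i, table in enumerate(tables):
        # Round-robin the tables across the available tasks
        configs[i % num_tasks]["tables"].append(table)
    return configs

# With three tables and tasks.max=2, one task copies two tables
# and the other copies one.
configs = split_work(["orders", "customers", "payments"], max_tasks=2)
print(configs)
```

Each dictionary in the result plays the role of one task's configuration: the worker starts that many tasks and passes each its slice of the work.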
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. An export job can deliver data from Kafka topics into secondary storage and query systems or into batch systems for offline analysis.
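As a concrete example, a connector is defined by submitting a small JSON configuration to a worker's REST API (POST to `/connectors`, by default on port 8083). The sketch below uses the `FileStreamSourceConnector` that ships with Kafka; the connector name, file path, and topic are placeholders:

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/var/log/app.log",
    "topic": "app-logs"
  }
}
```

The workers take it from there: they start the connector, distribute its tasks across the cluster, and expose status and management through the same REST API.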