
## Overview

Kafka is organized around a few key terms: topics, partitions, producers, consumers, and brokers.

• All Kafka messages are organized into topics; messages are always sent to and received from a topic.
• Producers push messages into a Kafka topic, while consumers pull messages off of it.

Lastly, Kafka, as a distributed system, runs in a cluster. Each node in the cluster is called a Kafka broker.

Kafka topics are divided into a number of partitions. Partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers — each partition can be placed on a separate machine to allow for multiple consumers to read from a topic in parallel.
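To make the key-to-partition mapping concrete, here is a simplified sketch. Note this is an illustration only: Kafka's default partitioner actually uses a murmur2 hash of the message key, whereas this sketch substitutes Python's `hashlib.md5` for readability.

```python
import hashlib

def assign_partition(key: str, num_partitions: int) -> int:
    """Map a message key to one of the topic's partitions.

    Sketch only: Kafka's default partitioner uses murmur2, not md5.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Messages with the same key always land on the same partition,
# which is what preserves per-key ordering.
p1 = assign_partition("customer-42", 3)
p2 = assign_partition("customer-42", 3)
assert p1 == p2
```

Because the hash of a given key is stable, all messages for that key go to the same partition, and within a partition ordering is guaranteed.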

Consumers can also be parallelized so that multiple consumers can read from multiple partitions in a topic allowing for very high message processing throughput.

Each message within a partition has an identifier called its offset. The offset defines the ordering of messages as an immutable sequence. Kafka maintains this message ordering for you. Consumers can read messages starting from a specific offset and are allowed to read from any offset point they choose, allowing consumers to join the cluster at any point in time they see fit. Given these constraints, each specific message in a Kafka cluster can be uniquely identified by a tuple consisting of the message’s topic, partition, and offset within the partition.
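The (topic, partition, offset) identity can be sketched with a tiny in-memory stand-in for a broker's partition log (purely illustrative, not a real Kafka API):

```python
class PartitionLog:
    """Toy model of one partition: an append-only list of messages."""

    def __init__(self):
        self.messages = []

    def append(self, message) -> int:
        """Append a message and return its offset (position in the log)."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read_from(self, offset):
        """Consumers may begin reading from any offset they choose."""
        return self.messages[offset:]

log = PartitionLog()
log.append("first")
off = log.append("second")
# The tuple ("my_topic", 0, off) would uniquely identify "second"
# across the whole cluster.
```

Offsets are per-partition, which is why the topic and partition are also needed to identify a message globally.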

## Kafka Setup and Concepts

• `--replication-factor 1`
• `--partitions 1`
• A Kafka Connect cluster (a cluster of workers) is completely different from a Kafka cluster (a cluster of Kafka brokers).

## Kafka/Confluent

Confluent is the commercial/supported distribution of Kafka; the following is based on its quick start.

Starting up:

```
$ confluent start schema-registry
Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
```


Piping in a file for testing:

```
kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic < my_file.txt
```

### Consuming a File -> Console and Elastic

With kafka/confluent running, start the console consumer:

```
./kafka-avro-console-consumer --zookeeper localhost:2181 --topic catalog --from-beginning
```

Here we are watching a topic called `catalog`.

Now ingest data into the catalog topic. The `connect-file-source.properties` file has been set up as:

```
name=catalog
connector.class=FileStreamSource
file=/home/richard/devTools/confluent-4.0.0/testData/testData.json
topic=catalog
```

And a simple script that copies data into `testData.json` line by line, to slowly stream the data:

```python
#!/usr/bin/python
import sys
import time

# Copy the source file line by line into testData.json, flushing and
# pausing after each line so the connector sees the data trickle in.
i = 0
f = open("20180126022806-20180126022131-full-taxonomy.json", "r")
copy = open("testData.json", "wt")
for line in f:
    copy.write(line)
    copy.flush()
    i = i + 1
    sys.stdout.write('.')
    sys.stdout.flush()
    time.sleep(1)
f.close()
copy.close()
```

Now we can load the file source:

```
./confluent load file-source
```

## Logging

As well as the files under `/logs`, one can tail the running Connect log with:

```
confluent log connect
```

### Making a Topic

Streaming to the `catalog` topic should create it automatically. However, if you want to create a topic manually:

```
./kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic catalog
```

## Dashboard and Managers

• Confluent control center

## Kafka Streams

Official docs: Kafka Streams

Kafka Window SQL

## Kafka Consumer Groups

Consumer groups allow monitoring of micro-service health and workload distribution.

• Different instances of a micro-service are placed in a consumer group.
• Kafka will automatically distribute the workload, provided that the consumers are well balanced across the partitions.
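The workload distribution above can be sketched as a round-robin assignment of partitions to group members. This is a simplified illustration only: in real Kafka the group coordinator performs assignment via a configurable assignor (range, round-robin, sticky).

```python
def assign(partitions, consumers):
    """Distribute partitions across group members round-robin.

    Sketch only; Kafka's group coordinator handles this for you.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Three partitions, two micro-service instances in one group:
result = assign([0, 1, 2], ["instance-a", "instance-b"])
# Each partition is consumed by exactly one member of the group,
# so adding instances (up to the partition count) spreads the load.
```

Note that a partition is only ever read by one consumer in a group at a time, so a group larger than the partition count leaves some members idle.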