Once you have started a worker, look at the master's web UI (http://localhost:8080 by default). You should see the new node listed there, along with its number of CPUs and memory (minus one gigabyte left for the OS).
  
== Kafka/Confluent ==
  
Confluent is the commercial/supported distribution of Kafka; see the [[https://docs.confluent.io/current/quickstart.html|quick start]].
- 
Starting up:
<code>
$ confluent start schema-registry
Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
</code>
- 
Piping in a file for testing:

<code>
kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic < my_file.txt
</code>
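A few lines of Python can generate a throwaway ''my_file.txt'' (the filename assumed by the producer command above) with one JSON record per line; the record fields here are made-up sample data, not a schema from this setup:

<code python>
#!/usr/bin/python
import json

# Made-up sample records, one JSON object per line, as the
# console producer above expects.
records = [{"id": i, "name": "item-%d" % i} for i in range(5)]

with open("my_file.txt", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
</code>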
- 
=== Consuming a File -> Console and Elastic ===
- 
With Kafka/Confluent running, start the console consumer:

<code>
./kafka-avro-console-consumer --zookeeper localhost:2181 --topic catalog --from-beginning
</code>
- 
We are watching a topic called **catalog**.

Now ingest data into the catalog topic. The **connect-file-source.properties** file has been set up as:
- 
<code>
name=catalog
connector.class=FileStreamSource
tasks.max=1
file=/home/richard/devTools/confluent-4.0.0/testData/testData.json
topic=catalog
</code>
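Connector configs like the one above are plain ''key=value'' pairs, so they can be sanity-checked with a few lines of Python before loading (a sketch; ''parse_properties'' is a hypothetical helper, not part of Confluent):

<code python>
#!/usr/bin/python

def parse_properties(text):
    """Parse simple key=value properties into a dict,
    skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

config = parse_properties("""
name=catalog
connector.class=FileStreamSource
tasks.max=1
topic=catalog
""")
</code>

Note that real Java properties files support escapes and '':'' separators too; this sketch only handles the plain ''key=value'' form used above.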
- 
And a simple script that copies data into testData.json line by line, to slowly stream the data:
- 
<code python>
#!/usr/bin/python
import sys
import time

# Copy the source file line by line, flushing after each write so the
# Kafka Connect file source picks the data up as a slow stream.
src = open("20180126022806-20180126022131-full-taxonomy.json", "r")
copy = open("testData.json", "wt")
for line in src:
    copy.write(line)
    copy.flush()
    sys.stdout.write('.')
    sys.stdout.flush()
    time.sleep(1)
src.close()
copy.close()
</code>
- 
Now we can load the file source:

<code console>
./confluent load file-source
</code>
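Once loaded, a connector can also be checked over Kafka Connect's REST API (GET ''/connectors/<name>/status'', port 8083 by default). A minimal sketch, assuming the Connect worker runs locally with default settings:

<code python>
#!/usr/bin/python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # Kafka Connect REST default port

def status_endpoint(connector, base=CONNECT_URL):
    # GET /connectors/<name>/status reports connector and task state
    return "%s/connectors/%s/status" % (base, connector)

# With the Connect worker running you could then do:
# with urllib.request.urlopen(status_endpoint("file-source")) as resp:
#     print(json.load(resp)["connector"]["state"])
</code>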
- 
== Logging == 
As well as the files under /logs, one can follow the running logs with:

<code>
confluent log connect
</code>
=== Making a Topic === 

This should create the **catalog** topic automatically. However, if you want to make a topic manually:
<code console>
./kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic catalog
</code>
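With more than one partition, the default producer partitioner hashes the record key to pick a partition (keyless records are spread round-robin). Kafka actually uses murmur2; this sketch substitutes crc32 purely to illustrate the hash-modulo idea:

<code python>
#!/usr/bin/python
import zlib

def pick_partition(key, num_partitions):
    """Simplified key -> partition mapping. Kafka's default
    partitioner uses murmur2; crc32 stands in here for illustration."""
    if key is None:
        raise ValueError("null keys are spread round-robin, not hashed")
    return zlib.crc32(key.encode("utf-8")) % num_partitions
</code>

With the single-partition **catalog** topic created above, every key maps to partition 0, so ordering is total; adding partitions trades that for parallelism.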
== Kafka Plugins == 
  * An alternative Elasticsearch sink: https://github.com/Stratio/kafka-elasticsearch-sink
  
  
 
comparison_of_hadoop_based_tools.txt · Last modified: 2018/01/29 01:29 by root
 