
In your install directory

curl -L -O -k  https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.2.tar.gz
tar -xvf elasticsearch-0.90.2.tar.gz
ln -s  elasticsearch-0.90.2 elasticsearch

Optional: install the Elasticsearch service wrapper

curl -L -k http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz
mv *servicewrapper*/service elasticsearch/bin/
elasticsearch/bin/service/elasticsearch install
ln -s `readlink -f elasticsearch/bin/service/elasticsearch` /usr/local/bin/rcelasticsearch
rcelasticsearch start

Configuration

Set a unique name for the cluster:

Edit /elasticsearch/config/elasticsearch.yml and, on line 32, set

cluster.name: PUT-SOMETHING-UNIQUE-HERE
  • Ideally, also increase the heap size in /elasticsearch/bin/service/elasticsearch.conf
set.default.ES_HEAP_SIZE=1024

Filters for Analysis and Queries

Indexes and Analyzers

Since most people are familiar with SQL databases, this guide describes the setup and configuration of Elasticsearch with reference to common relational database practices.

  • index - An index is like a relational database. It has
    • a mapping which defines the fields in the index, which are grouped into multiple types.
    • a logical namespace which maps to one or more primary shards and zero or more replica shards.
curl -XPUT 'http://localhost:9200/twitter/' -d '
index :
    number_of_shards : 3
    number_of_replicas : 2'
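
As a rough illustration of what these two settings imply, the sketch below works out how many shard copies the cluster has to place for the twitter index above. The class and method names are hypothetical and not part of any Elasticsearch API; the only thing taken from the settings is that each primary shard gets number_of_replicas additional copies.

```java
// Hypothetical helper for illustration only; not part of the Elasticsearch API.
final class ShardMath {

    // Total shard copies = primaries + (primaries * replicas per primary).
    static int totalShardCopies(int numberOfShards, int numberOfReplicas) {
        return numberOfShards * (1 + numberOfReplicas);
    }

    public static void main(String[] args) {
        // Settings from the twitter example: 3 primary shards, 2 replicas each.
        System.out.println(totalShardCopies(3, 2)); // prints 9
    }
}
```

A replica is never allocated on the same node as its primary, so with 2 replicas at least three nodes are needed before all nine copies can be assigned.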

Templates and Types

A mapping is like a schema definition in a relational database. Each index has a mapping, which defines each type within the index, plus a number of index-wide settings. A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.

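An index template defines settings and mappings that will automatically be applied to new indices when they are created.

For example, if you planned to have indexes per year for twitter feeds (twitter2012, twitter2013, twitter2014) and you want to define the same settings and mappings for each of them, a template such as the following matches every index whose name starts with twitter: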

{
    "template" : "twitter*",
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "tweet" : {
            "properties" : {
                "message" : {
                    "type" : "string",
                    "store" : "yes"
                }
            }
        }
    }
}
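
To make the "template" pattern above more concrete, here is a minimal sketch of wildcard matching against index names. It assumes the pattern is a simple glob where * stands for any run of characters; the class and method names are hypothetical, and this is not the code Elasticsearch itself uses.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of matching index names against a template pattern.
final class TemplateMatch {

    // Treat '*' as "any run of characters" and everything else literally.
    static boolean matches(String templatePattern, String indexName) {
        String[] parts = templatePattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), indexName);
    }

    public static void main(String[] args) {
        System.out.println(matches("twitter*", "twitter2013"));  // prints true
        System.out.println(matches("twitter*", "facebook2013")); // prints false
    }
}
```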

The Elasticsearch field datatypes are documented at: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

Core datatypes

Type      | Notes
String    | text and keyword
Numeric   | long, integer, short, byte, double, float, half_float, scaled_float
Date      | date
Boolean   | boolean
Binary    | binary
Range     | integer_range, float_range, long_range, double_range, date_range (range data types, e.g. between values)
Array     | Array support does not require a dedicated type
Object    | object for single JSON objects
Nested    | nested for arrays of JSON objects
Geo-point | geo_point for lat/lon points
Geo-shape | geo_shape for complex shapes like polygons

Specialised datatypes

  • ip for IPv4 and IPv6 addresses
  • completion to provide auto-complete suggestions
  • token_count to count the number of tokens in a string
  • murmur3 to compute hashes of values at index-time and store them in the index
  • attachment: see the mapper-attachments plugin, which supports indexing attachments like Microsoft Office formats, Open Document formats, ePub, HTML, etc. into an attachment datatype.

Percolator type

Multi-fields

It is often useful to index the same field in different ways for different purposes. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations. Alternatively, you could index a text field with the standard analyzer, the english analyzer, and the french analyzer.

This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.

Extensions

Logstash currently pipes events to Statsd. Statsd is a simple client/server mechanism from the folks at Etsy that allows operations and development teams to easily feed a variety of metrics into a Graphite system. For more info on Statsd, read the seminal blog article “Measure Anything, Measure Everything”.

REST API Usage

REST Examples

Insert Data:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}'

One can also use the Chrome REST console, the head plugin, or Sense.

With curl, a large JSON file can be passed as the request body by referencing it with @filename:

#!/bin/bash
 
curl -XPUT 'localhost:9200/analyzertest?pretty' -H 'Content-Type: application/json' -d @testIndex.json
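
The same REST calls can be issued from Java without curl. The following is a minimal sketch using the JDK's own java.net.http classes (Java 11 or later), rebuilding the tweet insert from the "Insert Data" example. The class and method names are hypothetical, and the request is only built, not sent, so no running node is needed to try it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: builds (but does not send) a document PUT request.
final class IndexRequestSketch {

    static HttpRequest buildPut(String baseUrl, String index, String type, String id, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/" + type + "/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        String json = "{\"user\":\"kimchy\",\"message\":\"trying out Elastic Search\"}";
        HttpRequest request = buildPut("http://localhost:9200", "twitter", "tweet", "1", json);
        System.out.println(request.method() + " " + request.uri());
        // To actually send it (requires a node listening on localhost:9200):
        // java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```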

Querying

Full-text Queries   | Term Level Queries | Joining Queries | Geo Queries      | Specialized Queries | Span Queries
Match               | Term               | Nested          | GeoDistance      | MoreLikeThis        | SpanTerm
Multi-match         | Terms              | HasChild        | GeoBoundingBox   | Template            | SpanMulti
Query String        | Range              | HasParent       | GeoShape         | Script              | SpanFirst
Simple Query String | Exists             | ParentId        | GeoDistanceRange |                     | SpanNear
Common Terms        | Missing            |                 | GeoPolygon       |                     | SpanOr
                    | Prefix             |                 | GeoHashCell      |                     | SpanNot
                    | Wildcard           |                 |                  |                     | SpanContaining
                    | Regexp             |                 |                  |                     | SpanWithin
                    | Fuzzy              |                 |                  |                     |
                    | Ids                |                 |                  |                     |
                    | Type               |                 |                  |                     |

Elastic | SQL
query | SELECT
GET _search { "query": { "match_all": {} } } | SELECT *
term (for exact matching): "query": { "term" : { "user" : "Kimchy" } } | SELECT * FROM index WHERE user = 'Kimchy'

There are a number of different types of match:

  • term - the term query finds documents that contain the exact term. Note that if the field is analyzed and has words or characters removed, the query may not match, even if the raw data is an exact match.
"bool" : {
      "must" : {
        "term" : { "correlationId" : "74560af3-a9b5-11e7-8563-fb711eff2a2e" }
      }
    }

This is a valid query, but it may not match if the correlationId field is analyzed.

  • match - match queries are more tolerant, because the query text is analysed in the same way as the indexed data
POST lambda-2017-10-04/_search
{
   "query": {
    "bool" : {
      "must" : {
        "match" : { "correlationId" : "74560af3-a9b5-11e7-8563-fb711eff2a2e" }
      }
    }
   }
}
  • match_phrase query analyzes the text and creates a phrase query out of the analyzed text. For example:
    {
        "match_phrase" : {
            "message" : "this is a test"
        }
    }
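  • match_phrase_prefix - the same as match_phrase, except that it allows prefix matches on the last term in the text. For example: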
    {
        "match_phrase_prefix" : {
            "message" : "quick brown f"
        }
    }
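
The difference between term and match on an analysed field can be shown with a toy model. The sketch below uses a deliberately simplified analyzer (lowercase, then split on anything that is not a letter or digit); this is an assumption for illustration, not the real standard analyzer, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of term vs match; not Elasticsearch code.
final class TermVsMatch {

    // Simplified analyzer: lowercase, then split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // term: the query value is looked up unchanged among the indexed tokens.
    static boolean termMatches(String fieldValue, String queryValue) {
        return new HashSet<>(analyze(fieldValue)).contains(queryValue);
    }

    // match: the query text is analysed first; any shared token is a hit.
    static boolean matchMatches(String fieldValue, String queryText) {
        Set<String> indexed = new HashSet<>(analyze(fieldValue));
        for (String token : analyze(queryText)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String id = "74560af3-a9b5-11e7-8563-fb711eff2a2e";
        System.out.println(analyze(id));          // prints [74560af3, a9b5, 11e7, 8563, fb711eff2a2e]
        System.out.println(termMatches(id, id));  // prints false
        System.out.println(matchMatches(id, id)); // prints true
    }
}
```

In this model the whole id is never stored as a single token, so the term lookup fails. The same model also suggests why match on an analysed id can return documents whose ids merely share one segment; mapping such a field as not analysed avoids both problems.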

Subset Results

The data returned for each hit can be restricted to a subset with partial_fields, which also accepts wildcards:

{
  "partial_fields": {
    "subsetData": {
      "include": "@fields.logdate"
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "text": {
            "mule-app-shrimp_simpledebug.@fields.MessageType": "EVT-Title-New"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "facets": {}
}

Selecting columns returned

By default all fields are returned in _source. Specific fields can be returned instead by listing them in fields:

  "fields": [
     "@message"
  ], 

Full example:

POST _search/
{
  "query": {
    "bool": {
      "must": [
        {
          "text": {
            "mule-app-shrimp_simpledebug.@fields.MessageType": "EVT-Title-New"
          }
        }
      ],
      "must_not": [],
      "should": []      
    }    
  },
  "fields": [
     "@message"
  ], 
  "from": 0,
  "size": 10,
  "sort": [],
  "facets": {},
  "filter": {}
}

Updates

Count

Elastic supports a “count” API that can be used in place of search to get the total number of matching documents:

%elasticsearch
count /lambda-2017-11-15 {
"_source": ["application", "payload.errorDetails.errorCode"],
  "query": {
    "match": {
      "payload.details.action": {
        "query": "custSearchWithContent",
        "type": "phrase"
      }
    }
  }
}

Aggregate - More Complex Counts

More complex counts use the aggregations (aggs) API.

For example, a distinct count, which in SQL is

SELECT COUNT(DISTINCT inv_number) FROM invoices;

is more like:

 {
      "size": 0,
      "aggs": {
        "unique_invoiceid": {
          "cardinality": {
            "field": "inv_number"
          }
        }
      }
    }
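
For comparison, this is what COUNT(DISTINCT inv_number) computes when done in memory. The sketch uses made-up invoice numbers and hypothetical names; it gives an exact answer, whereas the cardinality aggregation returns an approximate count on large data sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical in-memory equivalent of a distinct count.
final class DistinctCount {

    // Duplicates collapse when the values are placed in a set.
    static long countDistinct(List<String> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        // Sample data invented for this example.
        List<String> invNumbers = Arrays.asList("INV-1", "INV-2", "INV-1", "INV-3");
        System.out.println(countDistinct(invNumbers)); // prints 3
    }
}
```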

Specialised Queries

https://www.elastic.co/guide/en/elasticsearch/reference/5.4/specialized-queries.html

Elastic provides a number of extended query tools, including:

  • more_like_this query - finds documents which are similar to the specified text, document, or collection of documents.

Java Bulk Upload

import java.io.IOException;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkProcessor; 
import org.elasticsearch.action.index.IndexRequest;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
 
import com.rishav.xml.xmlDef;
 
public class esBulkLoad {
 public static void main(String[] args) throws IOException {
 
  // HBase configuration
  Configuration config = HBaseConfiguration.create();
  HTable htable = new HTable(config, "xmlTest");
  Scan scan1 = new Scan();
  scan1.setCaching(500);
  ResultScanner scanner1 = htable.getScanner(scan1); 
 
  //ElasticSearch configuration
  Client client = new TransportClient().addTransportAddress(
    new InetSocketTransportAddress("localhost",9300));
 
  final Logger logger = LoggerFactory.getLogger(esBulkLoad.class);
 
  // define ES bulkprocessor
  BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
      @Override
      public void beforeBulk(long executionId, BulkRequest request) {
          logger.info("Going to execute new bulk composed of {} actions", request.numberOfActions());
      }
 
      @Override
      public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
          logger.info("Executed bulk composed of {} actions", request.numberOfActions());
      }
 
      @Override
      public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
          logger.warn("Error executing bulk", failure);
      }
      }).setBulkActions(1000).setConcurrentRequests(1).build();
 
  // read Hbase records
  for (Result res : scanner1) {
   byte[] b;
 
   int col = 0;
   for (String xmlTag : xmlDef.xmlDef[3]) {
    b = res.getValue(Bytes.toBytes(xmlDef.xmlDef[2][col]), Bytes.toBytes(xmlDef.xmlDef[3][col]));
    xmlDef.xmlDef[5][col] = Bytes.toString(b);
    col++;
   }
 
   // build ES IndexRequest and add it to bulkProcessor
   IndexRequest iRequest = new IndexRequest("xml", "response", xmlDef.xmlDef[5][0]);
   iRequest.source(jsonBuilder().startObject()
       .field(xmlDef.xmlDef[3][0], xmlDef.xmlDef[5][0])
       .field(xmlDef.xmlDef[3][1], xmlDef.xmlDef[5][1])
       .field(xmlDef.xmlDef[3][2], xmlDef.xmlDef[5][2])
       .field(xmlDef.xmlDef[3][3], xmlDef.xmlDef[5][3])
       .field(xmlDef.xmlDef[3][4], xmlDef.xmlDef[5][4])
       .endObject());
   bulkProcessor.add(iRequest);
  }
  scanner1.close();
  htable.close();
 
  // flush any remaining bulk requests, then release the client connection
  bulkProcessor.close();
  client.close();
 }
}
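
The effect of setBulkActions(1000) in the loader above can be pictured with a toy batcher. This is not how BulkProcessor is implemented; it is a minimal sketch, with hypothetical names, of a count-based flush plus a final flush on close, and it ignores the size and time based triggers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of count-based batching; not the Elasticsearch BulkProcessor.
final class ToyBatcher {
    private final int bulkActions;
    private final List<String> pending = new ArrayList<>();
    private int flushes = 0;

    ToyBatcher(int bulkActions) {
        this.bulkActions = bulkActions;
    }

    // Queue a document and flush once the batch is full.
    void add(String doc) {
        pending.add(doc);
        if (pending.size() >= bulkActions) {
            flush();
        }
    }

    // Flush whatever is left, as closing the processor would.
    void close() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        // A real implementation would send the pending documents as one bulk request here.
        flushes++;
        pending.clear();
    }

    static int flushesFor(int documents, int bulkActions) {
        ToyBatcher batcher = new ToyBatcher(bulkActions);
        for (int i = 0; i < documents; i++) {
            batcher.add("doc-" + i);
        }
        batcher.close();
        return batcher.flushes;
    }

    public static void main(String[] args) {
        System.out.println(flushesFor(2500, 1000)); // prints 3
    }
}
```

In this model, skipping the close() call would leave the last 500 documents unsent, which is why the loader closes the bulk processor once the scan is finished.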

Unit Testing

  • ESTestCase - provides a test framework that doesn't need a running Elasticsearch instance
public class SimpleEsUnitTest extends ESTestCase {
 
	@Test
	public void check_es_query() {
		// TODO do some stuff with lucene indexes ?
	}
}
  • ESIntegTestCase - starts an Elasticsearch instance for the tests
@ClusterScope(scope = Scope.SUITE)
@ThreadLeakScope(ThreadLeakScope.Scope.NONE)
@RunWith(com.carrotsearch.randomizedtesting.RandomizedRunner.class)
public class OpenTraceSearchIntegrationTest extends ESIntegTestCase {
 
	private Client client;
 
	@Override
	@Before
	public void setUp() throws Exception {
		super.setUp();
		this.client = client();
	}
 
	@Test
	public void search_open_tracing_traces_by_operationName() throws Exception {
		// GIVEN
		final String operationName = randomOperationName();
		indexOpenTraceDocument(randomTraceID(), operationName);
		indexOpenTraceDocument(randomTraceID(), operationName);
 
		// WHEN
		final SearchResponse response = searchOpenTracesByOperationName(operationName);
 
		// THEN
		then(response.getHits())
				.hasSize(2)
				.extracting(SearchHit::getSourceAsMap)
				.allSatisfy(hit -> then(hit).containsEntry(OPERATION_NAME_FIELD, operationName));
	}

Here indexOpenTraceDocument inserts the document into Elasticsearch beforehand:

private void indexOpenTraceDocument(final String traceID, final String operationName) throws Exception {
		this.client.prepareIndex(OPEN_TRACE_INDEX, TRACE_TYPE)
				.setSource(generateOpenTrace(traceID, operationName), XContentType.JSON)
				.execute()
				.get();
		// refreshes the index otherwise we would not find anything
		refresh();
	}
 
	private static String generateOpenTrace(final String traceID, final String operationName) {
		return "{\n" +
				"   \"traceID\": \"" + traceID + "\",\n" +
				"   \"spanID\": \"3b1237777ef2d83\",\n" +
				"   \"parentSpanID\": \"bbe20e919b94f710\",\n" +
				"   \"operationName\": \"" + operationName + "\",\n" +
				"   \"references\": [],\n" +
				"   \"startTime\": 1510878645507000,\n" +
				"   \"duration\": 129000,\n" +
				"   \"tags\": [\n" +
				"     {\n" +
				"       \"key\": \"mvc.controller.class\",\n" +
				"       \"type\": \"string\",\n" +
				"       \"value\": \"Apis\"\n" +
				"     },\n" +
				"     {\n" +
				"       \"key\": \"mvc.controller.method\",\n" +
				"       \"type\": \"string\",\n" +
				"       \"value\": \"pong\"\n" +
				"     },\n" +
				"     {\n" +
				"       \"key\": \"source\",\n" +
				"       \"type\": \"string\",\n" +
				"       \"value\": \"KevinWasPong\"\n" +
				"     },\n" +
				"     {\n" +
				"       \"key\": \"spring.instance_id\",\n" +
				"       \"type\": \"string\",\n" +
				"       \"value\": \"172.20.41.251:Service2:18081\"\n" +
				"     },\n" +
				"     {\n" +
				"       \"key\": \"span.kind\",\n" +
				"       \"type\": \"string\",\n" +
				"       \"value\": \"server\"\n" +
				"     }\n" +
				"   ],\n" +
				"   \"logs\": [],\n" +
				"   \"processID\": \"\",\n" +
				"   \"process\": {\n" +
				"     \"serviceName\": \"service2\",\n" +
				"     \"tags\": [\n" +
				"       {\n" +
				"         \"key\": \"ip\",\n" +
				"         \"type\": \"int64\",\n" +
				"         \"value\": \"-1407964677\"\n" +
				"       }\n" +
				"     ]\n" +
				"   },\n" +
				"   \"warnings\": null,\n" +
				"   \"startTimeMillis\": 1510878645507\n" +
				" }";
	}
}

Elastic Machine Learning

Plugins

 
elastic_search.txt · Last modified: 2017/12/15 03:34 by root
 