elastic_custom_analyser [2017/11/15 01:07] (current)
root created
 +An important component of the product search offering is how products are indexed and how queries are analysed. Product search uses a custom analyser which configures how a query is broken down. Among other things, this determines:
 +* how queries are stemmed
 +* which synonyms are used
 +* which stop words are applied
 +* how language is processed in general
  
 +An understanding of these topics is useful for anyone carrying out new development work against Elasticsearch. An analyser is built up of a tokenizer and a number of filters which are applied in order, and it is applied both during document indexing and at query time. The Elasticsearch documentation on default and custom analysers is a useful resource on this topic.
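
The idea of filters applied in order can be illustrated with a small Python sketch. This is not Elasticsearch code: the function names, the simplified tokenizer and the tiny stop word list are all assumptions made for the example, and stemming is left out entirely.

```python
import re
import unicodedata

# Assumed, deliberately tiny stop word list (illustration only).
STOP_WORDS = {"a", "an", "the", "of", "and"}


def tokenize(text):
    """Rough stand-in for a tokenizer: keep runs of letters/digits, allowing inner apostrophes."""
    return re.findall(r"[^\W_]+(?:'[^\W_]+)*", text)


def strip_possessive(tokens):
    """Remove a trailing 's from each token."""
    return [t[:-2] if t.lower().endswith("'s") else t for t in tokens]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def ascii_fold(tokens):
    """Drop combining accents so that e-acute becomes plain e."""
    folded = []
    for t in tokens:
        decomposed = unicodedata.normalize("NFKD", t)
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return folded


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def analyse(text, filters):
    """Tokenize, then pass the token list through each filter in order."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


FILTERS = [strip_possessive, lowercase, ascii_fold, remove_stop_words]
print(analyse("The Caf\u00e9's Wines", FILTERS))  # ['cafe', 'wines']
```

Order matters in this sketch: with the list [remove_stop_words, lowercase] the token "The" survives, because it only matches the stop word list after it has been lowercased.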
 +
 +== Index time analysis == 
 +
 +When a document is added to the index, each field can have its own analyser applied to it. For most fields the default analysers are sufficient. For more complicated fields, especially those which are generally written in English and are likely to be searched, a custom analyser is used. Which analyser is used against which field is specified when the index is created. This is generally handled using the es-tools application, which has its own documentation. The mapping JSON which is loaded into an index specifies which analyser to use for which field. A number of fields (name, extendedDescription) use a custom analyser (wtr_analyzer), which is configured as in the code block below.
 +
 +<code json>
 +"wtr_analyzer": {
 +    "tokenizer": "standard",
 +    "filter": [
 +        "english_possessive_stemmer",
 +        "lowercase",
 +        "asciifolding",
 +        "english_stop",
 +        "stemmer_override",
 +        "minimal_english"
 +    ]
 +}
 +</code>
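
How a field might be tied to this analyser in the mapping JSON can be sketched in Python as follows. The field type "text" and the bare "properties" fragment are assumptions (they match Elasticsearch 5 and later); the actual mapping loaded by es-tools is not reproduced here and may be typed or nested differently.

```python
import json

# Fields named in the text as using the custom analyser.
ANALYSED_FIELDS = ["name", "extendedDescription"]


def build_properties(fields, analyzer="wtr_analyzer"):
    """Return a mapping 'properties' fragment assigning one analyser to each field.

    Assumption: the fields are of type "text"; the real mapping may differ.
    """
    return {field: {"type": "text", "analyzer": analyzer} for field in fields}


mapping_fragment = {"properties": build_properties(ANALYSED_FIELDS)}
print(json.dumps(mapping_fragment, indent=2))
```

Fields that are not listed would simply omit the "analyzer" key and fall back to the default analyser.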
 +
 +All the **filters** in wtr_analyzer are standard filters provided by Elasticsearch. A custom analyser is used in order to control exactly which filters are applied and in which order. Whenever a document is inserted, any field that has wtr_analyzer specified will have its contents processed by the analyser's filters prior to indexing. Stemming is particularly important: the consequences of any change to this configuration should be well understood before the change is made. The Elastic documentation on choosing a stemmer is especially relevant here.
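
One way to understand the effect of a change before making it is to ask Elasticsearch which tokens an analyser produces for a sample text, using the _analyze API. The Python sketch below only builds the request; nothing is sent. The index name "products" and the sample text are hypothetical.

```python
import json


def build_analyze_request(index, text, analyzer="wtr_analyzer"):
    """Return (method, path, body) for a call to the _analyze API of an index.

    The caller is expected to send these with whichever HTTP client is in use.
    """
    body = {"analyzer": analyzer, "text": text}
    return "POST", f"/{index}/_analyze", json.dumps(body)


# "products" is a hypothetical index name used only for illustration.
method, path, body = build_analyze_request("products", "Chef's knives")
print(method, path, body)
```

The response lists the tokens that would be indexed for that text, which makes it straightforward to compare the output before and after a stemming change.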
 +
 +^ Filter ^ Documentation ^
 +| tokenizer | https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-tokenizer.html |
 +| english_possessive_stemmer | Removes 's from the end of words |
 +| lowercase | Lowercases tokens prior to indexing |
 +| asciifolding | Folds non-ASCII characters to their ASCII equivalents |
 +| english_stop | https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html |
 +| stemmer_override | Overrides the default stemming provided by the minimal_english stemmer |
 +| minimal_english | https://www.elastic.co/guide/en/elasticsearch/guide/current/choosing-a-stemmer.html |
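
Of the filter names above, lowercase and asciifolding can be referenced in Elasticsearch without further configuration. The others look like named instances of built-in filter types (stemmer, stop, stemmer_override) that would be defined in the index settings. The definitions in this Python sketch are assumptions modelled on the Elasticsearch reference documentation, not the actual es-tools configuration, and the override rule is hypothetical.

```python
# Filters that can be used by name without a definition in the index settings.
BUILT_IN = {"lowercase", "asciifolding"}

# Assumed definitions (NOT taken from es-tools); the real ones may differ.
ASSUMED_FILTERS = {
    "english_possessive_stemmer": {"type": "stemmer", "language": "possessive_english"},
    "english_stop": {"type": "stop", "stopwords": "_english_"},
    # Hypothetical override rule, shown only to illustrate the rule format.
    "stemmer_override": {"type": "stemmer_override", "rules": ["running, runs => run"]},
    "minimal_english": {"type": "stemmer", "language": "minimal_english"},
}

WTR_ANALYZER = {
    "tokenizer": "standard",
    "filter": [
        "english_possessive_stemmer",
        "lowercase",
        "asciifolding",
        "english_stop",
        "stemmer_override",
        "minimal_english",
    ],
}


def undefined_filters(analyzer, custom_filters):
    """List filters the analyser references that are neither built in nor defined."""
    return [
        name
        for name in analyzer["filter"]
        if name not in BUILT_IN and name not in custom_filters
    ]


analysis = {"filter": ASSUMED_FILTERS, "analyzer": {"wtr_analyzer": WTR_ANALYZER}}
print(undefined_filters(WTR_ANALYZER, ASSUMED_FILTERS))  # []
```

A check like undefined_filters can catch a renamed or missing filter definition before an index is created with a broken analyser.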
 +
 +== Query time analysis ==
 +
 +Prior to searching Elasticsearch, the query terms are also passed through an analyser, the query analyser. The analyser is configured against a given index and can be updated at any time. Which analyser to apply to the search terms is specified as part of the query sent to Elasticsearch; this is described on the Confluence page covering the product catalog search query. The query analyser's filters are initially configured when the index is created by the es-tools application. The filters employed in the wtr_query_analyzer are shown in the following code block.
 +
 +<code json>
 +"wtr_query_analyzer": {
 +    "tokenizer": "standard",
 +    "filter": [
 +        "english_possessive_stemmer",
 +        "lowercase",
 +        "asciifolding",
 +        "english_stop",
 +        "stemmer_override",
 +        "minimal_english",
 +        "wtr_synonym_filter",
 +        "dictionary_decompounder"
 +    ]
 +}
 +</code>
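
Specifying the analyser as part of the query can be sketched with a plain match query, which accepts an "analyzer" parameter. This is a minimal Python illustration only: the real product catalog search query is documented elsewhere and is likely more elaborate, and the field and search terms here are just examples.

```python
import json


def build_match_query(field, terms, analyzer="wtr_query_analyzer"):
    """Build a match query body that names the analyser to apply to the search terms."""
    return {"query": {"match": {field: {"query": terms, "analyzer": analyzer}}}}


# Example field and terms, chosen only for illustration.
query = build_match_query("name", "organic red wine")
print(json.dumps(query, indent=2))
```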
 +
 +The filters used at query time are the same as those used at index time, with the addition of the wtr_synonym_filter and the dictionary_decompounder. The wtr_synonym_filter provides a custom list of synonyms which are applied to queries. Initial configuration of the synonym filter is handled by the es-tools application; however, the list can subsequently be modified using the elastic search manager tool.
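
The two common synonym rule forms, equivalent terms ("a, b") and explicit mappings ("a => b"), can be illustrated with a toy Python expansion. This is a simplification, not how Elasticsearch implements synonyms: it handles single-word tokens only and ignores token positions. The rules shown are hypothetical and are not taken from the actual synonym list.

```python
def parse_rules(rules):
    """Build a token -> replacement-list table from synonym rules."""
    table = {}
    for rule in rules:
        if "=>" in rule:
            # Explicit mapping: each term on the left is replaced by the terms on the right.
            left, right = rule.split("=>")
            targets = [t.strip() for t in right.split(",")]
            for source in left.split(","):
                table[source.strip()] = targets
        else:
            # Equivalent terms: each term expands to the whole group.
            group = [t.strip() for t in rule.split(",")]
            for token in group:
                table[token] = group
    return table


def apply_synonyms(tokens, table):
    """Replace each token with its synonyms, leaving unknown tokens unchanged."""
    expanded = []
    for token in tokens:
        expanded.extend(table.get(token, [token]))
    return expanded


# Hypothetical rules for illustration only.
RULES = ["aubergine, eggplant", "courgette => zucchini"]
TABLE = parse_rules(RULES)
print(apply_synonyms(["aubergine", "pasta"], TABLE))  # ['aubergine', 'eggplant', 'pasta']
print(apply_synonyms(["courgette"], TABLE))  # ['zucchini']
```

Because the synonym filter sits after the stemmer in wtr_query_analyzer, the tokens it sees have already been lowercased and stemmed, which is worth bearing in mind when amending the list.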
 +
 +== Links and Reference == 
 +
 +* wtr_synonym_filter https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html
 +* dictionary_decompounder https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-compound-word-tokenfilter.html
 +* Additional information on synonyms can be found in the Elasticsearch documentation:
 +** https://www.elastic.co/guide/en/elasticsearch/guide/current/using-synonyms.html
 +** https://www.elastic.co/guide/en/elasticsearch/guide/current/synonym-formats.html
 +** https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html
 
elastic_custom_analyser.txt · Last modified: 2017/11/15 01:07 by root
 