Back Home

Index

Based table of data

CREATE TABLE pictures (
  id SERIAL PRIMARY KEY,
  title VARCHAR(300),
  ..

Create a reverse index on the data

CREATE INDEX pictures_title ON pictures 
  USING gin(to_tsvector('english', title));

One may need to use a trigger on inserts to update the text search index .. there is a tsvector_update_trigger

Tokenization and Stemming

Accents

To do workout what the dictionary 'english' is doing in to_tsquery and to_tsvector since my demos shows not much

SELECT to_tsvector('english', 'Häagen'), to_tsquery('english', 'Häagen')

returns

'häagen':1	'häagen'

However if we add the unaccent extension/function

CREATE extension unaccent 

then for

SELECT to_tsvector('english', unaccent('Häagen')), to_tsquery('english', unaccent('Häagen'))

we can get a match

'haagen':1	'haagen'

Phrase Searches

Ranking

ts_rank_cd

Auto Complete

The basic search covers the text into a query and use the @@ operator

SELECT document_id, document_text FROM documents  
WHERE document_tokens @@ to_tsquery('jump & quick');  

This will return if the (stemmed) words are present. Below is the phrase queries where we can restrict and order where the words are actually found

Synonyms

Ranking and Boosting

setweight(to_tsvector('english', '...'), 'A', '{vehicl}')

Stats on Search Words

to_tsvector will give us a stemmed and stop word removed data set

SELECT to_tsvector(''english'', description ) FROM sql_product

This can be passed to ts_stat to count the most common words

SELECT * FROM ts_stat( 'select to_tsvector(''english'', description ) from sql_product')
ORDER BY nentry DESC

the ts_stat function gives us statistics on the test search

  • ndoc - number of documents the word appears in
  • nentry - the number of occurrences of the word
 
postgres_text_search.txt · Last modified: 2019/02/28 08:16 by root
 
RSS - 200 © CrosswireDigitialMedia Ltd