Marks specified tokens as keywords, which are not stemmed.
The keyword_marker filter assigns specified tokens a keyword attribute of true. Stemmer token filters, such as stemmer or porter_stem, skip tokens with a keyword attribute of true.
To work properly, the keyword_marker filter must be listed before any stemmer token filters in the analyzer configuration.
The keyword_marker filter uses Lucene's KeywordMarkerFilter.
To see how the keyword_marker filter works, you first need to produce a token stream containing stemmed tokens. The following analyze API request uses the stemmer filter to create stemmed tokens for fox running and jumping.
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "stemmer" ],
  "text": "fox running and jumping"
}
The request produces the following tokens. Note that running was stemmed to run and jumping was stemmed to jump.
[ fox, run, and, jump ]
To prevent jumping from being stemmed, add the keyword_marker filter before the stemmer filter in the previous analyze API request. Specify jumping in the keywords parameter of the keyword_marker filter.
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "keyword_marker",
      "keywords": [ "jumping" ]
    },
    "stemmer"
  ],
  "text": "fox running and jumping"
}
The request produces the following tokens. running is still stemmed to run, but jumping is not stemmed.
[ fox, run, and, jumping ]
To see the keyword attribute for these tokens, add the following arguments to the analyze API request:
- explain: true
- attributes: keyword
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "keyword_marker",
      "keywords": [ "jumping" ]
    },
    "stemmer"
  ],
  "text": "fox running and jumping",
  "explain": true,
  "attributes": "keyword"
}
The API returns the following response. Note that the jumping token has a keyword attribute of true.
{ "detail": { "custom_analyzer": true, "charfilters": [], "tokenizer": { "name": "whitespace", "tokens": [ { "token": "fox", "start_offset": 0, "end_offset": 3, "type": "word", "position": 0 }, { "token": "running", "start_offset": 4, "end_offset": 11, "type": "word", "position": 1 }, { "token": "and", "start_offset": 12, "end_offset": 15, "type": "word", "position": 2 }, { "token": "jumping", "start_offset": 16, "end_offset": 23, "type": "word", "position": 3 } ] }, "tokenfilters": [ { "name": "__anonymous__keyword_marker", "tokens": [ { "token": "fox", "start_offset": 0, "end_offset": 3, "type": "word", "position": 0, "keyword": false }, { "token": "running", "start_offset": 4, "end_offset": 11, "type": "word", "position": 1, "keyword": false }, { "token": "and", "start_offset": 12, "end_offset": 15, "type": "word", "position": 2, "keyword": false }, { "token": "jumping", "start_offset": 16, "end_offset": 23, "type": "word", "position": 3, "keyword": true } ] }, { "name": "stemmer", "tokens": [ { "token": "fox", "start_offset": 0, "end_offset": 3, "type": "word", "position": 0, "keyword": false }, { "token": "run", "start_offset": 4, "end_offset": 11, "type": "word", "position": 1, "keyword": false }, { "token": "and", "start_offset": 12, "end_offset": 15, "type": "word", "position": 2, "keyword": false }, { "token": "jumping", "start_offset": 16, "end_offset": 23, "type": "word", "position": 3, "keyword": true } ] } ] } }
The keyword_marker filter supports the following configurable parameters:
- ignore_case
  (Optional, Boolean) If true, matching for the keywords and keywords_path parameters ignores letter case. Defaults to false.
- keywords
  (Required*, array of strings) Array of keywords. Tokens that match these keywords are not stemmed.
  This parameter, keywords_path, or keywords_pattern must be specified. You cannot specify this parameter and keywords_pattern.
- keywords_path
  (Required*, string) Path to a file that contains a list of keywords. Tokens that match these keywords are not stemmed.
  This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each word in the file must be separated by a line break.
  This parameter, keywords, or keywords_pattern must be specified. You cannot specify this parameter and keywords_pattern.
- keywords_pattern
  (Required*, string) Java regular expression used to match tokens. Tokens that match this expression are marked as keywords and not stemmed (see the example after this list).
  This parameter, keywords, or keywords_path must be specified. You cannot specify this parameter and keywords or keywords_path.
  Warning: Poorly written regular expressions can cause Elasticsearch to run slowly or result in stack overflow errors, causing the running node to suddenly exit.
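For example, rather than listing each keyword explicitly, you can mark every token that matches a regular expression. The following request is a minimal sketch that uses an illustrative pattern, .*ing, to mark tokens ending in ing as keywords; assuming the pattern is matched against the whole token, both running and jumping are left unstemmed.
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "keyword_marker",
      "keywords_pattern": ".*ing"
    },
    "stemmer"
  ],
  "text": "fox running and jumping"
}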
To customize the keyword_marker filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following create index API request uses a custom keyword_marker filter and the porter_stem filter to configure a new custom analyzer. The custom keyword_marker filter marks tokens specified in the analysis/example_word_list.txt file as keywords. The porter_stem filter does not stem these tokens.
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_custom_keyword_marker_filter",
            "porter_stem"
          ]
        }
      },
      "filter": {
        "my_custom_keyword_marker_filter": {
          "type": "keyword_marker",
          "keywords_path": "analysis/example_word_list.txt"
        }
      }
    }
  }
}
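To check the new analyzer, you can run an analyze API request against the index. This is a sketch that assumes the analysis/example_word_list.txt file exists on each node and contains the word jumping.
GET /my-index-000001/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "fox running and jumping"
}
With that word list, the response should contain the tokens fox, run, and, and jumping, because the porter_stem filter skips the marked jumping token.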