Bucket aggregations always return a field named doc_count showing the number of documents that were aggregated and partitioned
in each bucket. Computation of the value of doc_count is very simple. doc_count is incremented by 1 for every document collected
in each bucket.
While this simple approach is effective when computing aggregations over individual documents, it fails to accurately represent
documents that store pre-aggregated data (such as histogram or aggregate_metric_double fields), because one summary field may
represent multiple documents.
To allow for correct computation of the number of documents when working with pre-aggregated data, we have introduced a
metadata field type named _doc_count. _doc_count must always be a positive integer representing the number of documents
aggregated in a single summary field.
When field _doc_count is added to a document, all bucket aggregations will respect its value and increment the bucket doc_count
by the value of the field. If a document does not contain any _doc_count field, _doc_count = 1 is implied by default.
-
A
_doc_countfield can only store a single positive integer per document. Nested arrays are not allowed. -
If a document contains no
_doc_countfields, aggregators will increment by 1, which is the default behavior.
The following create index API request creates a new index with the following field mappings:
-
my_histogram, ahistogramfield used to store percentile data -
my_text, akeywordfield used to store a title for the histogram
PUT my_index
{
"mappings" : {
"properties" : {
"my_histogram" : {
"type" : "histogram"
},
"my_text" : {
"type" : "keyword"
}
}
}
}
The following index API requests store pre-aggregated data for
two histograms: histogram_1 and histogram_2.
PUT my_index/_doc/1
{
"my_text" : "histogram_1",
"my_histogram" : {
"values" : [0.1, 0.2, 0.3, 0.4, 0.5],
"counts" : [3, 7, 23, 12, 6]
},
"_doc_count": 45
}
PUT my_index/_doc/2
{
"my_text" : "histogram_2",
"my_histogram" : {
"values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
"counts" : [8, 17, 8, 7, 6, 2]
},
"_doc_count": 62
}
|
Field |
If we run the following terms aggregation on my_index:
GET /_search
{
"aggs" : {
"histogram_titles" : {
"terms" : { "field" : "my_text" }
}
}
}
We will get the following response:
{
...
"aggregations" : {
"histogram_titles" : {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets" : [
{
"key" : "histogram_2",
"doc_count" : 62
},
{
"key" : "histogram_1",
"doc_count" : 45
}
]
}
}
}