$percentile (aggregation)
Definition
$percentileNew in version 7.0.
Returns an array of scalar values that correspond to specified percentile values.
You can use
$percentileas an accumulator in the$groupstage or as an aggegation expression.
Syntax
The syntax for $percentile is:
{ $percentile: { input: <expression>, p: [ <expression1>, <expression2>, ... ], method: <string> } }
Command Fields
$percentile takes the following fields:
Field | Type | Necessity | Description |
|---|---|---|---|
| Expression | Required |
|
| Expression | Required |
|
| String | Required | The method that |
Behavior
You can use $percentile in:
$groupstages as an accumulator$setWindowFieldsstages as an accumulator$projectstages as an aggregation expression
$percentile has the following characteristics as an accumulator, it:
Calculates a single result for all the documents in the stage.
Uses the t-digest algorithm to calculate approximate, percentile based metrics.
Uses approximate methods to scale to large volumes of data.
$percentile has the following characteristics as an aggregation
expression, it:
Accepts an array as input
Calculates a separate result for each input document
Type of Operation
In a $group stage, $percentile is an accumulator and calculates
a value for all documents in the window.
In a $project stage, $percentile is an aggregation expression and
calculates values for each document.
In $setWindowFields stages, $percentile returns a result
for each document like an aggregation expression, but the results are
computed over groups of documents like an accumulator.
Calculation Considerations
In $group stages, $percentile always uses an approximate
calculation method.
In $project stages, $percentile might use the discrete
calculation method even when the approximate method is specified.
In $setWindowFields stages, the workload determines the calculation
method that $percentile uses.
The computed percentiles $percentile returns might vary, even on the
same datasets. This is because the algorithm calculates approximate
values.
Duplicate samples can cause ambiguity. If there are a large number of duplicates, the percentile values may not represent the actual sample distribution. Consider a data set where all the samples are the same. All of the values in the data set fall at or below any percentile. A "50th percentile" value would actually represent either 0 or 100 percent of the samples.
$percentile returns the minimum value for p = 0.0.
$percentile returns the maximum value for p = 1.0.
Array Input
If you use $percentile as an aggregation expression in a
$project stage, you can use an array as input.
The syntax is:
{ $percentile: { input: [ <expression1, <expression2>, .., <expressionN> ], p: [ <expression1>, <expression2>, ... ], method: <string> } }
Window Functions
A window function lets you calculate results over a moving "window" of
neighboring documents. As each document passes though the pipeline, the
$setWindowFields stage:
Recomputes the set of documents in the current window
calculates a value for all documents in the set
returns a single value for that document
You can use $percentile in a $setWindowFields stage to calculate
rolling statistics for time series or
other related data.
When you use $percentile in a $setWindowField stage, the
input value must be a field name. If you enter an array instead of a
field name, the operation fails.
Examples
The following examples use the testScores collection. Create the
collection:
db.testScores.insertMany( [ { studentId: "2345", test01: 62, test02: 81, test03: 80 }, { studentId: "2356", test01: 60, test02: 83, test03: 79 }, { studentId: "2358", test01: 67, test02: 82, test03: 78 }, { studentId: "2367", test01: 64, test02: 72, test03: 77 }, { studentId: "2369", test01: 60, test02: 53, test03: 72 } ] )
Calculate a Single Value as an Accumulator
Create an accumulator that calculates a single percentile value:
db.testScores.aggregate( [ { $group: { _id: null, test01_percentiles: { $percentile: { input: "$test01", p: [ 0.95 ], method: 'approximate' } }, } } ] )
Output:
{ _id: null, test01_percentiles: [ 67 ] }
The _id field value is null so $group selects all the
documents in the collection.
The percentile accumulator takes its input data from the test01
field.
In this example, the percentiles array, p, has one value so the
$percentile operator only calculates one term for the test01
data. The 95th percentile value is 67.
Calculate Multiple Values as an Accumulator
Create an accumulator that calculates multiple percentile values:
db.testScores.aggregate( [ { $group: { _id: null, test01_percentiles: { $percentile: { input: "$test01", p: [ 0.5, 0.75, 0.9, 0.95 ], method: 'approximate' } }, test02_percentiles: { $percentile: { input: "$test02", p: [ 0.5, 0.75, 0.9, 0.95 ], method: 'approximate' } }, test03_percentiles: { $percentile: { input: "$test03", p: [ 0.5, 0.75, 0.9, 0.95 ], method: 'approximate' } }, test03_percent_alt: { $percentile: { input: "$test03", p: [ 0.9, 0.5, 0.75, 0.95 ], method: 'approximate' } }, } } ] )
Output:
{ _id: null, test01_percentiles: [ 62, 64, 67, 67 ], test02_percentiles: [ 81, 82, 83, 83 ], test03_percentiles: [ 78, 79, 80, 80 ], test03_percent_alt: [ 80, 78, 79, 80 ] }
The _id field value is null so $group selects all the
documents in the collection.
The percentile accumulator calculates values for three fields,
test01, test02, and test03.
The accumulator calculates the 50th, 75th, 90th, and 95th percentile values for each input field.
The percentile values are returned in the same order as the elements of
p. The values in test03_percentiles and test03_percent_alt
are the same, but their order is different. The order of elements in
each result array matches the corresponding order of elements in p.
Use $percentile in a $project Stage
In a $project stage, $percentile is an aggregation expression and
calculates values for each document.
You can use a field name or an array as input in a $project stage.
db.testScores.aggregate( [ { $project: { _id: 0, studentId: 1, testPercentiles: { $percentile: { input: [ "$test01", "$test02", "$test03" ], p: [ 0.5, 0.95 ], method: 'approximate' } } } } ] )
Output:
{ studentId: '2345', testPercentiles: [ 80, 81 ] }, { studentId: '2356', testPercentiles: [ 79, 83 ] }, { studentId: '2358', testPercentiles: [ 78, 82 ] }, { studentId: '2367', testPercentiles: [ 72, 77 ] }, { studentId: '2369', testPercentiles: [ 60, 72 ] }
When $percentile is an aggregation expression there is a result for
each studentId.
Use $percentile in a $setWindowField Stage
To base your percentile values on local data trends, use $percentile
in a $setWindowField aggregation pipeline stage.
This example creates a window to filter scores:
db.testScores.aggregate( [ { $setWindowFields: { sortBy: { test01: 1 }, output: { test01_95percentile: { $percentile: { input: "$test01", p: [ 0.95 ], method: 'approximate' }, window: { range: [ -3, 3 ] } } } } }, { $project: { _id: 0, studentId: 1, test01_95percentile: 1 } } ] )
Output:
{ studentId: '2356', test01_95percentile: [ 62 ] }, { studentId: '2369', test01_95percentile: [ 62 ] }, { studentId: '2345', test01_95percentile: [ 64 ] }, { studentId: '2367', test01_95percentile: [ 67 ] }, { studentId: '2358', test01_95percentile: [ 67 ] }
In this example, the percentile calculation for each document also incorporates data from the three documents before and after it.
Learn More
The $median operator is a special case of the
$percentile operator that uses a fixed value of p: [ 0.5 ].
For more information on window functions, see:
$setWindowFields.