You are viewing an offline version of MongoDB documentation. Some page features might be unavailable. To view the latest version of the page or use interactive features, visit the live page.

Database Manual

Aggregation Operations

Reference

Operators

$percentile (aggregation)

Definition

$percentile

New in version 7.0.

Returns an array of scalar values that correspond to specified percentile values.

You can use $percentile as an accumulator in the $group stage or as an aggegation expression.

Syntax

The syntax for $percentile is:

{
   $percentile: {
      input: <expression>,
      p: [ <expression1>, <expression2>, ... ],
      method: <string>
   }
}

Command Fields

$percentile takes the following fields:

Field	Type	Necessity	Description
`input`	Expression	Required	`$percentile` calculates the percentile values of this data. `input` must be a field name or an expression that evaluates to a numeric type. If the expression cannot be converted to a numeric type, the `$percentile` calculation ignores it.
`p`	Expression	Required	`$percentile` calculates a percentile value for each element in `p`. The elements represent percentages and must evaluate to numeric values in the range `0.0` to `1.0`, inclusive. `$percentile` returns results in the same order as the elements in `p`.
`method`	String	Required	The method that `mongod` uses to calculate the percentile value. The method must be `'approximate'`.

Behavior

You can use $percentile in:

$group stages as an accumulator
$setWindowFields stages as an accumulator
$project stages as an aggregation expression

$percentile has the following characteristics as an accumulator, it:

Calculates a single result for all the documents in the stage.
Uses the t-digest algorithm to calculate approximate, percentile based metrics.
Uses approximate methods to scale to large volumes of data.

$percentile has the following characteristics as an aggregation expression, it:

Accepts an array as input
Calculates a separate result for each input document

Type of Operation

In a $group stage, $percentile is an accumulator and calculates a value for all documents in the window.

In a $project stage, $percentile is an aggregation expression and calculates values for each document.

In $setWindowFields stages, $percentile returns a result for each document like an aggregation expression, but the results are computed over groups of documents like an accumulator.

Calculation Considerations

In $group stages, $percentile always uses an approximate calculation method.

In $project stages, $percentile might use the discrete calculation method even when the approximate method is specified.

In $setWindowFields stages, the workload determines the calculation method that $percentile uses.

The computed percentiles $percentile returns might vary, even on the same datasets. This is because the algorithm calculates approximate values.

Duplicate samples can cause ambiguity. If there are a large number of duplicates, the percentile values may not represent the actual sample distribution. Consider a data set where all the samples are the same. All of the values in the data set fall at or below any percentile. A "50th percentile" value would actually represent either 0 or 100 percent of the samples.

$percentile returns the minimum value for p = 0.0.

$percentile returns the maximum value for p = 1.0.

Array Input

If you use $percentile as an aggregation expression in a $project stage, you can use an array as input. The syntax is:

{
   $percentile: {
      input: [ <expression1, <expression2>, .., <expressionN> ],
      p: [ <expression1>, <expression2>, ... ],
      method: <string>
   }
}

Window Functions

A window function lets you calculate results over a moving "window" of neighboring documents. As each document passes though the pipeline, the $setWindowFields stage:

Recomputes the set of documents in the current window
calculates a value for all documents in the set
returns a single value for that document

You can use $percentile in a $setWindowFields stage to calculate rolling statistics for time series or other related data.

When you use $percentile in a $setWindowField stage, the input value must be a field name. If you enter an array instead of a field name, the operation fails.

Examples

The following examples use the testScores collection. Create the collection:

db.testScores.insertMany( [
   { studentId: "2345", test01: 62, test02: 81, test03: 80 },
   { studentId: "2356", test01: 60, test02: 83, test03: 79 },
   { studentId: "2358", test01: 67, test02: 82, test03: 78 },
   { studentId: "2367", test01: 64, test02: 72, test03: 77 },
   { studentId: "2369", test01: 60, test02: 53, test03: 72 }
] )

Calculate a Single Value as an Accumulator

Create an accumulator that calculates a single percentile value:

db.testScores.aggregate( [
   {
      $group: {
         _id: null,
         test01_percentiles: {
            $percentile: {
               input: "$test01",
               p: [ 0.95 ],
               method: 'approximate'
            }
         },
      }
   }
] )

Output:

{ _id: null, test01_percentiles: [ 67 ] }

The _id field value is null so $group selects all the documents in the collection.

The percentile accumulator takes its input data from the test01 field.

In this example, the percentiles array, p, has one value so the $percentile operator only calculates one term for the test01 data. The 95th percentile value is 67.

Calculate Multiple Values as an Accumulator

Create an accumulator that calculates multiple percentile values:

db.testScores.aggregate( [
   {
       $group: {
          _id: null,
          test01_percentiles: {
             $percentile: {
                input: "$test01",
                p: [ 0.5, 0.75, 0.9, 0.95 ],
                method: 'approximate'
             }
          },
          test02_percentiles: {
             $percentile: {
                input: "$test02",
                p: [ 0.5, 0.75, 0.9, 0.95 ],
                method: 'approximate'
             }
          },
          test03_percentiles: {
             $percentile: {
                input: "$test03",
                p: [ 0.5, 0.75, 0.9, 0.95 ],
                method: 'approximate'
             }
          },
          test03_percent_alt: {
             $percentile: {
                input: "$test03",
                p: [ 0.9, 0.5, 0.75, 0.95 ],
                method: 'approximate'
             }
          },
       }
    }
] )

Output:

{
    _id: null,
   test01_percentiles: [ 62, 64, 67, 67 ],
   test02_percentiles: [ 81, 82, 83, 83 ],
   test03_percentiles: [ 78, 79, 80, 80 ],
   test03_percent_alt: [ 80, 78, 79, 80 ]
}

The _id field value is null so $group selects all the documents in the collection.

The percentile accumulator calculates values for three fields, test01, test02, and test03.

The accumulator calculates the 50th, 75th, 90th, and 95th percentile values for each input field.

The percentile values are returned in the same order as the elements of p. The values in test03_percentiles and test03_percent_alt are the same, but their order is different. The order of elements in each result array matches the corresponding order of elements in p.

Use `$percentile` in a `$project` Stage

In a $project stage, $percentile is an aggregation expression and calculates values for each document.

You can use a field name or an array as input in a $project stage.

db.testScores.aggregate( [
   {
      $project: {
         _id: 0,
         studentId: 1,
         testPercentiles: {
            $percentile: {
               input: [ "$test01", "$test02", "$test03" ],
               p: [ 0.5, 0.95 ],
               method: 'approximate'
            }
         }
      }
   }
] )

Output:

{ studentId: '2345', testPercentiles: [ 80, 81 ] },
{ studentId: '2356', testPercentiles: [ 79, 83 ] },
{ studentId: '2358', testPercentiles: [ 78, 82 ] },
{ studentId: '2367', testPercentiles: [ 72, 77 ] },
{ studentId: '2369', testPercentiles: [ 60, 72 ] }

When $percentile is an aggregation expression there is a result for each studentId.

Use `$percentile` in a `$setWindowField` Stage

To base your percentile values on local data trends, use $percentile in a $setWindowField aggregation pipeline stage.

This example creates a window to filter scores:

db.testScores.aggregate( [
   {
      $setWindowFields: {
         sortBy: { test01: 1 },
         output: {
            test01_95percentile: {
               $percentile: {
                  input: "$test01",
                  p: [ 0.95 ],
                  method: 'approximate'
               },
               window: {
                  range: [ -3, 3 ]
               }
            }
         }
      }
   },
   {
      $project: {
         _id: 0,
         studentId: 1,
         test01_95percentile: 1
      }
   }
] )

Output:

{ studentId: '2356', test01_95percentile: [ 62 ] },
{ studentId: '2369', test01_95percentile: [ 62 ] },
{ studentId: '2345', test01_95percentile: [ 64 ] },
{ studentId: '2367', test01_95percentile: [ 67 ] },
{ studentId: '2358', test01_95percentile: [ 67 ] }

In this example, the percentile calculation for each document also incorporates data from the three documents before and after it.

Learn More

The $median operator is a special case of the $percentile operator that uses a fixed value of p: [ 0.5 ].

For more information on window functions, see: $setWindowFields.

Back

$or

$pow

Definition

Syntax

Command Fields

Behavior

Type of Operation

Calculation Considerations

Array Input

Window Functions

Examples

Calculate a Single Value as an Accumulator

Calculate Multiple Values as an Accumulator

Use $percentile in a $project Stage

Use $percentile in a $setWindowField Stage

Learn More

Use `$percentile` in a `$project` Stage

Use `$percentile` in a `$setWindowField` Stage