$substrCP (aggregation)
Definition
$substrCPReturns the substring of a string. The substring starts with the character at the specified UTF-8 code point (CP) index (zero-based) in the string for the number of code points specified.
$substrCPhas the following operator expression syntax:{ $substrCP: [ <string expression>, <code point index>, <code point count> ] } FieldTypeDescriptionstring expressionstring
The string from which the substring will be extracted.
string expressioncan be any valid expression as long as it resolves to a string. For more information on expressions, see Expression Operators.If the argument resolves to a value of
nullor refers to a field that is missing,$substrCPreturns an empty string.If the argument does not resolve to a string or
nullnor refers to a missing field,$substrCPreturns an error.code point indexnumber
Indicates the starting point of the substring.
code point indexcan be any valid expression as long as it resolves to a non-negative integer.code point countnumber
Can be any valid expression as long as it resolves to a non-negative integer or number that can be represented as an integer (such as 2.0).
ExampleResults{ $substrCP: [ "abcde", 1, 2 ] }"bc"{ $substrCP: [ "Hello World!", 6, 5 ] }"World"{ $substrCP: [ "cafétéria", 0, 5 ] }"cafét"{ $substrCP: [ "cafétéria", 5, 4 ] }"éria"{ $substrCP: [ "cafétéria", 7, 3 ] }"ia"{ $substrCP: [ "cafétéria", 3, 1 ] }"é"
Behavior
The $substrCP operator uses the code points to extract
the substring. This behavior differs from the
$substrBytes operator which extracts the substring
by the number of bytes, where each character uses between one and four
bytes.
Example
Single-Byte Character Set
Consider an inventory collection with the following documents:
{ "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" } { "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" } { "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }
The following operation uses the $substrCP operator to
separate the quarter value into a yearSubstring and a
quarterSubstring. The quarterSubstring field represents the
rest of the string from the specified byte index following the
yearSubstring. It is calculated by subtracting the byte index
from the length of the string using $strLenCP.
db.inventory.aggregate( [ { $project: { item: 1, yearSubstring: { $substrCP: [ "$quarter", 0, 2 ] }, quarterSubtring: { $substrCP: [ "$quarter", 2, { $subtract: [ { $strLenCP: "$quarter" }, 2 ] } ] } } } ] )
The operation returns the following results:
{ "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubtring" : "Q1" } { "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubtring" : "Q4" } { "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubtring" : "Q2" }
Single-Byte and Multibyte Character Set
Create a food collection with the following documents:
db.food.insertMany( [ { "_id" : 1, "name" : "apple" }, { "_id" : 2, "name" : "banana" }, { "_id" : 3, "name" : "éclair" }, { "_id" : 4, "name" : "hamburger" }, { "_id" : 5, "name" : "jalapeño" }, { "_id" : 6, "name" : "pizza" }, { "_id" : 7, "name" : "tacos" }, { "_id" : 8, "name" : "寿司sushi" } ] )
The following example uses the $substrCP operator to create a three
byte menuCode from the name value:
db.food.aggregate( [ { $project: { "name": 1, "menuCode": { $substrCP: [ "$name", 0, 3 ] } } } ] )
The operation returns the following results:
{ "_id" : 1, "name" : "apple", "menuCode" : "app" } { "_id" : 2, "name" : "banana", "menuCode" : "ban" } { "_id" : 3, "name" : "éclair", "menuCode" : "écl" } { "_id" : 4, "name" : "hamburger", "menuCode" : "ham" } { "_id" : 5, "name" : "jalapeño", "menuCode" : "jal" } { "_id" : 6, "name" : "pizza", "menuCode" : "piz" } { "_id" : 7, "name" : "tacos", "menuCode" : "tac" } { "_id" : 8, "name" : "寿司sushi", "menuCode" : "寿司s" }
See also: