Group Data with the Bucket Pattern
The bucket pattern separates long series of data into distinct objects. Separating large data series into smaller groups can improve query access patterns and simplify application logic. Bucketing is useful when you have similar objects that relate to a central entity, such as stock trades made by a single user.
You can use the bucket pattern for pagination by grouping your data based on the elements that your application shows per page. This approach uses MongoDB's flexible data model to store data according to the data your applications needs.
Tip
Time series collections apply the bucket pattern automatically, and are suitable for most applications that involve bucketing time series data.
About this Task
Consider the following schema that tracks stock trades. The initial schema does not use the bucket pattern, and stores each trade in an individual document.
db.trades.insertMany( [ { "ticker" : "MDB", "customerId": 123, "type" : "buy", "quantity" : 419, "date" : ISODate("2023-10-26T15:47:03.434Z") }, { "ticker" : "MDB", "customerId": 123, "type" : "sell", "quantity" : 29, "date" : ISODate("2023-10-30T09:32:57.765Z") }, { "ticker" : "GOOG", "customerId": 456, "type" : "buy", "quantity" : 50, "date" : ISODate("2023-10-31T11:16:02.120Z") } ] )
The application shows stock trades made by a single customer at a time,
and shows 10 trades per page. To simplify the application logic, use the
bucket pattern to group the trades by customerId in groups of 10.
Steps
Group the data by customerId.
Reorganize the schema to have a single document for each
customerId:
{ "customerId": 123, "history": [ { "type": "buy", "ticker": "MDB", "qty": 419, "date": ISODate("2023-10-26T15:47:03.434Z") }, { "type": "sell", "ticker": "MDB", "qty": 29, "date": ISODate("2023-10-30T09:32:57.765Z") } ] }, { "customerId": 456, "history": [ { "type" : "buy", "ticker" : "GOOG", "quantity" : 50, "date" : ISODate("2023-10-31T11:16:02.120Z") } ] }
With the bucket pattern:
Documents with common
customerIdvalues are condensed into a single document, with thecustomerIdbeing a top-level field.Trades for that customer are grouped into an embedded array field, called
history.
Add an identifier and count for each bucket.
1 db.trades.drop() 2 3 db.trades.insertMany( 4 [ 5 { 6 "_id": "123_1698349623", 7 "customerId": 123, 8 "count": 2, 9 "history": [ 10 { 11 "type": "buy", 12 "ticker": "MDB", 13 "qty": 419, 14 "date": ISODate("2023-10-26T15:47:03.434Z") 15 }, 16 { 17 "type": "sell", 18 "ticker": "MDB", 19 "qty": 29, 20 "date": ISODate("2023-10-30T09:32:57.765Z") 21 } 22 ] 23 }, 24 { 25 "_id": "456_1698765362", 26 "customerId": 456, 27 "count": 1, 28 "history": [ 29 { 30 "type" : "buy", 31 "ticker" : "GOOG", 32 "quantity" : 50, 33 "date" : ISODate("2023-10-31T11:16:02.120Z") 34 } 35 ] 36 }, 37 ] 38 )
The _id field value is a concatenation of the customerId
and the first trade time in seconds (since the unix epoch)
in the history field.
The count field indicates how many elements are in that
document's history array. The count field is used to
implement pagination logic.
Next Steps
After you update your schema to use the bucket pattern, update your application logic for reading and writing data. See the following sections:
Query for Data with the Bucket Pattern
In the updated schema, each document contains data for a single page in
the application. You can use the _id and count field to
determine how to return and update data.
To query for data on the appropriate page, use a regex query to return
data for a specified customerId, and use skip to return to the data for the correct page. The regex
query on _id uses the default _id index,
which results in performant queries without the need for an additional
index.
The following query returns data for the first page of trades for
customer 123:
db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).limit(1)
To return data for later pages, specify a skip value of one less
than the page you want to show data for. For example, to show data for
page 10, run the following query:
db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).skip(9).limit(1)
Note
The preceding query returns no results because the sample data only contains documents for the first page.
Insert Data with the Bucket Pattern
Now that the schema uses the bucket pattern, update your application
logic to insert new trades into the correct bucket. Use an update
command to insert the trade into the bucket with the appropriate
customerId and bucket.
The following command inserts a new trade for customerId: 123:
db.trades.updateOne( { "_id": /^123_/, "count": { $lt: 10 } }, { "$push": { "history": { "type": "buy", "ticker": "MSFT", "qty": 42, "date": ISODate("2023-11-02T11:43:10") } }, "$inc": { "count": 1 }, "$setOnInsert": { "_id": "123_1698939791", "customerId": 123 } }, { upsert: true } )
The application displays 10 trades per page. The update filter searches
for a document for customerId: 123 where the count is less than
10, meaning that bucket does not contain a full page of data.
If there is a document that matches
"_id": /^123_/and itscountis less than 10, the update command pushes the new trade into the matched document'shistoryarray.If there is not a matching document, the update command inserts a new document with the new trade (because
upsertistrue). The_idfield of the new document is a concatenation of thecustomerIdand the time in seconds since the unix epoch of the trade.
The logic for update commands avoids unbounded arrays by ensuring that no history array contains more than 10
documents.
After you run the update operation, the trades collection has the
following documents:
[ { _id: '123_1698349623', customerId: 123, count: 3, history: [ { type: 'buy', ticker: 'MDB', qty: 419, date: ISODate("2023-10-26T15:47:03.434Z") }, { type: 'sell', ticker: 'MDB', qty: 29, date: ISODate("2023-10-30T09:32:57.765Z") }, { type: 'buy', ticker: 'MSFT', qty: 42, date: ISODate("2023-11-02T11:43:10.000Z") } ] }, { _id: '456_1698765362', customerId: 456, count: 1, history: [ { type: 'buy', ticker: 'GOOG', quantity: 50, date: ISODate("2023-10-31T11:16:02.120Z") } ] } ]
Results
After you implement the bucket pattern, you don't need to incorporate pagination logic to return results in your application. The way the data is stored matches the way it is used in the application.