Example of Materialized View (CLI)¶
Consider an events
Data Source which, for each action performed in an ecommerce website, stores a timestamp, the user that performed the action, the product, which type of action - buy
, add to cart
, view
, and so on - and a JSON column containing some metadata, such as the price.
The events
Data Source is expected to store billions of rows per month. Its data schema is as follows:
DEFINITION OF THE EVENTS.DATASOURCE FILE
SCHEMA > `date` DateTime, `product_id` String, `user_id` Int64, `event` String, `extra_data` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYear(date)" ENGINE_SORTING_KEY "date, cityHash64(extra_data)" ENGINE_SAMPLING_KEY "cityHash64(extra_data)"
You want to publish an API Endpoint calculating the top 10 products in terms of sales for a date range ranked by total amount sold. Here's where Materialized Views can help you.
Materialize the results¶
After doing the desired transformations, set the TYPE
parameter to materialized
and add the name of the Data Source, which materializes the results.
DEFINITION OF THE TOP PRODUCT PER_DAY.PIPE
NODE only_buy_events DESCRIPTION > filters all the buy events SQL > SELECT toDate(date) AS date, product_id, JSONExtractFloat(extra_data, 'price') AS price FROM events WHERE event = 'buy' NODE top_per_day SQL > SELECT date, topKState(10)(product_id) AS top_10, sumState(price) AS total_sales FROM only_buy_events GROUP BY date TYPE materialized DATASOURCE top_products_view
Do the rest in the Data Source schema definition for the Materialized View, named top_products_view
:
DEFINITION OF THE TOP PRODUCTS VIEW.DATASOURCE FILE
SCHEMA > `date` Date, `top_10` AggregateFunction(topK(10), String), `total_sales` AggregateFunction(sum, Float64) ENGINE "AggregatingMergeTree" ENGINE_SORTING_KEY "date"
The destination Data Source uses an AggregatingMergeTree engine, which for each date
stores the corresponding AggregateFunction
for the top 10 products and the total sales.
Having the data precalculated as it gets ingested makes the API Endpoint run in real time, no matter the number of rows in the events
Data Source.
As for the Pipe used to build the API Endpoint, top_products_agg
, it's as follows:
DEFINITION OF THE TOP PRODUCTS PER DAY PIPE
NODE top_products_day SQL > SELECT date, topKMerge(10)(top_10) AS top_10, sumMerge(total_sales) AS total_sales FROM dev__top_products_view GROUP BY date
When preaggregating, the Aggregate Function uses the mode State
, while when getting the calculation it makes use of Merge
.
Push to Tinybird¶
Once it's done, push everything to your Tinybird account:
PUSH YOUR PIPES AND DATA SOURCES USING THE CLI
tb push datasources/top_products_view.datasource tb push pipes/top_product_per_day.pipe --populate tb push endpoints/top_products_endpoint.pipe
When pushing the top_product_per_day.pipe
, use the --populate
flag. This causes the transformation to run in a job, and the Materialized View top_products_view
to be populated.
You can repopulate Materialized Views at any moment:
Command to force populate the materialized view
tb push pipes/top_product_per_day.pipe --populate --force