Example of Materialized View (CLI)

Consider an events Data Source which, for each action performed in an ecommerce website, stores a timestamp, the user that performed the action, the product, which type of action - buy, add to cart, view, and so on - and a JSON column containing some metadata, such as the price.

The events Data Source is expected to store billions of rows per month. Its data schema is as follows:

DEFINITION OF THE EVENTS.DATASOURCE FILE
SCHEMA >
    `date` DateTime,
    `product_id` String,
    `user_id` Int64,
    `event` String,
    `extra_data` String

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYear(date)"
ENGINE_SORTING_KEY "date, cityHash64(extra_data)"
ENGINE_SAMPLING_KEY "cityHash64(extra_data)"

You want to publish an API Endpoint calculating the top 10 products in terms of sales for a date range ranked by total amount sold. Here's where Materialized Views can help you.

Materialize the results

After doing the desired transformations, set the TYPE parameter to materialized and add the name of the Data Source, which materializes the results.

DEFINITION OF THE TOP PRODUCT PER_DAY.PIPE
NODE only_buy_events
DESCRIPTION >
    filters all the buy events

SQL >
    SELECT 
        toDate(date) AS date, 
        product_id, 
        JSONExtractFloat(extra_data, 'price') AS price
    FROM events 
    WHERE event = 'buy'

NODE top_per_day
SQL >
    SELECT 
        date, 
        topKState(10)(product_id) AS top_10, 
        sumState(price) AS total_sales
    FROM only_buy_events 
    GROUP BY date

TYPE materialized
DATASOURCE top_products_view

Do the rest in the Data Source schema definition for the Materialized View, named top_products_view:

DEFINITION OF THE TOP PRODUCTS VIEW.DATASOURCE FILE
SCHEMA >
    `date` Date,
    `top_10` AggregateFunction(topK(10), String),
    `total_sales` AggregateFunction(sum, Float64)

ENGINE "AggregatingMergeTree"
ENGINE_SORTING_KEY "date"

The destination Data Source uses an AggregatingMergeTree engine, which for each date stores the corresponding AggregateFunction for the top 10 products and the total sales.

Having the data precalculated as it gets ingested makes the API Endpoint run in real time, no matter the number of rows in the events Data Source.

As for the Pipe used to build the API Endpoint, top_products_agg, it's as follows:

DEFINITION OF THE TOP PRODUCTS PER DAY PIPE
NODE top_products_day
SQL >
    SELECT 
        date, 
        topKMerge(10)(top_10) AS top_10, 
        sumMerge(total_sales) AS total_sales
    FROM dev__top_products_view 
    GROUP BY date

When preaggregating, the Aggregate Function uses the mode State, while when getting the calculation it makes use of Merge.

Push to Tinybird

Once it's done, push everything to your Tinybird account:

PUSH YOUR PIPES AND DATA SOURCES USING THE CLI
tb push datasources/top_products_view.datasource
tb push pipes/top_product_per_day.pipe --populate
tb push endpoints/top_products_endpoint.pipe

When pushing the top_product_per_day.pipe, use the --populate flag. This causes the transformation to run in a job, and the Materialized View top_products_view to be populated.

You can repopulate Materialized Views at any moment:

Command to force populate the materialized view
tb push pipes/top_product_per_day.pipe --populate --force
Updated