Example of Materialized View

The following example shows how to create a Materialized View in Tinybird.

For this example, the question to answer is: "What was the average purchase price of all products in each city on a particular day?"

Get baseline performance

First, you need to check the baseline performance before creating an Endpoint on top of the original ingested Data Source.

Filtering and extracting

Create a first node to filter the data to include only buy events. Simultaneously, normalize event timestamps to rounded days and extract city and price data from the extra_data column containing JSON.

NODE 1: FILTERING & EXTRACTING
SELECT
  toDate(date) day,
  JSONExtractString(extra_data, 'city') as city,
  JSONExtractFloat(extra_data, 'price') as price
FROM events
WHERE event = 'buy'

Averaging

Next, create a node to aggregate data and get the average purchase price by city per day:

Node 2: Averaging
SELECT
  day,
  city,
  avg(price) as avg
FROM only_buy_events_per_city
GROUP BY day, city

Creating a parameterized API Endpoint

Finally, create a node that you can publish as an API Endpoint, adding a parameter to filter results to a particular day:

Node 3: Creating a parameterized API Endpoint
%
SELECT *
FROM avg_buy_per_day_and_city
WHERE day = {{Date(day, '2020-09-09')}}

Create the Materialized View

A Materialized View is formed by a Pipe that ends in the creation of a new Data Source instead of an Endpoint. Start by duplicating the existing Pipe:

  1. Duplicate the events_pipe so that you get a Pipe with the same Nodes as the baseline.
  2. Rename the Pipe to events_pipe_mv.

In this case you're going to materialize the second node in the Pipe, because it's the one performing the aggregation. The third node simply provides you with a filter to create a parameterized Endpoint.

You can't use query parameters in nodes that are published as Materialized Views.

To create the Materialized View:

  1. Select the node options.
  2. Select Create a Materialized View from this Node.
  3. Update the View settings as required.
  4. Select Create Materialized View.

Your Materialized View has been created as a new Data Source. By default, the name that Tinybird gives the Data Source is the name of the materialized Pipe ode appended with _mv, in this case avg_buy_per_day_and_city_mv.

Append the names of all your Transformation Pipes and Materialized View Data Sources with _mv, or another common identifier.

This example uses a new Data Source. You can also select an existing Data Source as the destination for the Materialized View, but it must have the same schema as the Materialized View output. If both schemas match, Tinybird offers that Data Source as an option that you can select when you're creating a Materialized View.

Populating Existing Data

When Tinybird creates a Materialized View, it initially only populates a partial set of data from the original Data Source. This allows you to quickly validate the results of the Materialized View.

Once you have validated with the partial dataset, you can populate the Materialized View with all the existing data in the Original Data source. To do so, select Populate With All Data.

You now have a Materialized View Data Source that you can use to query against in your Pipes.

Testing performance improvements

To test how the Materialized View has improved the performance of the API Endpoint, return to the original Pipe.

In the original Pipe, do the following:

  1. Copy the SQL from the original API Endpoint Node, avg_buy.
  2. Create a new transformation Node, called avg_buy_mv.
  3. Paste the query from the original API Endpoint Node into your new Node.
  4. Update the query to select from your new Materialized View, avg_buy_per_day_and_city_mv.

Because this query is an aggregation, you need to rewrite the query, because data in Materialized Views in Tinybird exists in intermediate states. As new data is ingested, the data in the Materialized View gets appended in blocks of partial results. A background periodically process merges the appended partial results and saves them in the Materialized View.

Because you are processing data in real time, you might not be able to wait for the background process to complete. To account for this, reaggregate in the API Endpoint query using the -merge combinator. This example uses an avg aggregation, so you need to use avgMerge to compact the results in the Materialized View.

When you run your the modified query, you get the same results as you got when you ran the final node against the original Data Source. This time, however, the performance has improved significantly.

With the Materialized View, you get the same results, but process less data twice as fast at a fraction of the cost.

Pointing the API Endpoint at the new node

Now that you've seen how much the performance of the API Endpoint query has improved by using a Materialized View, you can easily change which node the API Endpoint uses. Select the node dropdown, and then select the new node you created by querying the Materialized View.

This way, you improve the API Endpoint performance while retaining the original URL, so applications which call that API Endpoint see an immediate performance boost.

Updated