Materialized Views

A Materialized View is the continuous, streaming result of a Pipe saved as a new Data Source. As new data is ingested into the origin Data Source, the transformed results from the Pipe are continually inserted in the new Materialized View, which you can query as any other Data Source. Tinybird represents Materialized Views using the icon.

Preprocessing data at ingest time reduces latency and cost-per-query, and can significantly improve the performance of your API Endpoints. For example, you can transform the data through SQL queries, using calculations such as counts, sums, averages, or arrays, or transformations like string manipulations or joins. The resulting Materialized View acts as a Data Source you can query or publish.

Typical use cases of Materialized Views include:

  • Aggregating, sorting, or filtering data at ingest time.
  • Improving the speed of a query that's taking too much time to run.
  • Simplifying query development by automating common filters and aggregations.
  • Reducing the amount of data processed by a single query.
  • Changing an existing schema for a different use case.

You can create a new Materialized View and populate it with all existing data without any cost. On-going incremental writes to Materialized Views count towards your Tinybird usage.

Create Materialized Views

To create a Materialized View using the Tinybird UI, follow these steps:

  1. Write your Pipe using as many Nodes as needed.
  2. Select the downward arrow (▽) next to Create API Endpoint and select Create Materialized View.
  3. Select the node you want to use as output.
  4. Edit the name of the destination Data Source.
  5. Adjust the Engine Type, Sorting Keys, and so on.

For a detailed example, see Example of Materialized View.

Error messages

With any Materialized View, Tinybird runs a speed simulation to ensure that the Materialized View won't produce any lag.

If you're getting an error in Tinybird that your query is not compatible with real-time ingestion, review your Materialized View query setup.

Review the 5 rules for faster queries and keep the following principles in mind:

  1. Avoid doing huge joins without filtering.
  2. Use GROUP BY before filtering.
  3. Remember that array JOINs are slow.
  4. Filter the right side of JOINs to speed up Materialized Views.

If you're batching, especially when ingesting from Kinesis, consider decreasing the amount of data you batch.

Create Materialized Views (CLI)

Consider an origin data source, for example my_origin.datasource, like the following:

Origin data source

SCHEMA >
    `id` Int16,
    `local_date` Date,
    `name` String,
    `count` Int64

You might want to create an optimized version of the Data Source that preaggregates count for each ID. To do this, create a new Data Source that uses a SimpleAggregateFunction as a Materialized View.

First, define the destination data source, for example my_destination.datasource:

Destination data source
SCHEMA >
    `id` Int16,
    `local_date` Date,
    `name` String,
    `total_count` SimpleAggregateFunction(sum, UInt64)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(local_date)"
ENGINE_SORTING_KEY "local_date,id"

Write a transformation Pipe, for example my_transformation.pipe:

Transformation Pipe
NODE transformation_node

SQL >
    SELECT
        id,
        local_date,
        name,
        sum(count) as total_count
    FROM
        my_origin
    GROUP BY
        id,
        local_date,
        name

TYPE materialized
DATASOURCE my_destination

Once you have the origin and destination Data Sources defined and the transformation Pipe, you can push them:

Push the Materialized Views
tb push my_origin.datasource
tb push my_destination.datasource
tb push my_transformation.pipe --populate

Any time you ingest data into my_origin, the data in my_destination is automatically updated.

For a detailed example, see Example of Materialized View (CLI).

Guided process using tb materialize

Alternatively, you can use the tb materialize command to generate the target .datasource file needed to push a new Materialized View.

The goal of the command is to guide you through all the needed steps to create a Materialized View.

Given a Pipe, tb materialize:

  1. Asks you which Node of the Pipe you want to materialize. By default, it selects the last one in the Pipe. If there's only one, it's automatically selected, skipping asking you. From the selected query, the commands guesses the best parameters for the following steps.
  2. It warns you of the errors the query has, if any, that prevents it from materializing. If everything is correct, it continues.
  3. It creates the target Data Source file that receives the results of the materialization, setting default engine parameters. If you are materializing an aggregation you should make sure the ENGINE_SORTING_KEY columns in the .datasource file are in the right order you are going to filter the table.
  4. It modifies the query to set up the materialization settings and pushes the Pipe to create the materialization. You can skip the Pipe checks if needed as well.
  5. It asks you if you want to populate the Materialized View with existing data. If you select to populate, it asks you if you want to use a subset of the data or fully populate with all existing data.
  6. It creates a backup file of the Pipe adding the _bak suffix to the file extension. It completes the aggregate functions with -State combinators where needed and adds the target Data Source name. The backup file is preserved in case you want to recover the original query.

The command generates and modifies the files involved in the materialization. If you run into an error or you need to modify something in the materialization, you can reuse the files as a better starting point.

Force populate Materialized Views

Sometimes you might want to force populate a Materialized View, most likely because you changed the transformation in the Pipe and you want the data from the origin Data Source to be reingested.

You can do this using tb push and the --force and --populate flags:

Populate a Materialized View
tb push my_may_view_pipe.pipe --force --populate

The response contains a Jobs API job_url that you can use to check progress and status of the job.

Query Materialized Views

If the type of engine in your Data Source is MergeTree and you're not doing aggregations, you can query a Materialized View as a standard Data Source.

If the engine is AggregatingMergeTree, SummingMergeTree or other special engine, and you have functions in your Pipe with the -State modifier, use the -Merge modifier, or max() and group by the Sorting Key columns. See Best practices for Materialized Views.

For Deduplication use cases with ReplacingMergeTree, see Deduplication strategies.

Limitations

Materialized Views work as insert triggers, which means a delete or truncate operation on your original Data Source doesn't affect the related Materialized Views.

As transformation and ingestion in the Materialized View is done on each block of inserted data in the original Data Source, some operations such as GROUP BY, ORDER BY, DISTINCT and LIMIT might need a specific engine, such as AggregatingMergeTree or SummingMergeTree, which can handle data aggregations.

The Data Source resulting from a Materialized View generated using JOIN is automatically updated only if and when a new operation is performed over the Data Source in the FROM.

You can't create Materialized Views that depend on the UNION of several Data Sources.

Considerations on populates

As described in Best practices for Materialized Views, populates are the process to move the data that was already in the original Data Source through to the new Materialized View.

If you have a continuous or streaming ingestion into the original Data Source, the populate might produce duplicated rows in the Materialized View. The populate runs as a separate job that goes partition by partition moving the existing data into the Materialized View. At the same time, the Materialized View is already automatically receiving new rows. There may be an overlap where the populate job moves a partition that includes rows that were already ingested into the Materialized View.

To handle this scenario, see Backfill strategies.

Next steps

Updated