Materialized views¶
A materialized view is the continuous, streaming result of a pipe saved as a new data source. As new data is ingested into the origin data source, the transformed results from the pipe are continually inserted in the new materialized view, which you can query as any other data source.
Preprocessing data at ingest time reduces latency and cost-per-query, and can significantly improve the performance of your endpoints. For example, you can transform the data through SQL queries, using calculations such as counts, sums, averages, or arrays, or transformations like string manipulations or joins. The resulting materialized view acts as a data source you can query or publish.
Typical use cases of Materialized views include:
- Aggregating, sorting, or filtering data at ingest time.
- Improving the speed of a query that's taking too much time to run.
- Simplifying query development by automating common filters and aggregations.
- Reducing the amount of data processed by a single query.
- Changing an existing schema for a different use case.
You can create a new materialized view and populate it with all existing data without any cost.
Create materialized views¶
Materialized views are defined in a .pipe file using the TYPE MATERIALIZED
directive. See Materialized pipes.
Consider an origin
data source, for example my_origin.datasource
, like the following:
Origin data source
SCHEMA > `id` Int16, `local_date` Date, `name` String, `count` Int64
You might want to create an optimized version of the data source that preaggregates count
for each ID. To do this, create a new data source that uses a SimpleAggregateFunction
as a materialized view.
First, define the destination
data source, for example my_destination.datasource
:
Destination data source
SCHEMA > `id` Int16, `local_date` Date, `name` String, `total_count` SimpleAggregateFunction(sum, UInt64) ENGINE "AggregatingMergeTree" ENGINE_PARTITION_KEY "toYYYYMM(local_date)" ENGINE_SORTING_KEY "local_date,id"
Write a materialized pipe, for example my_transformation.pipe
:
Transformation Pipe
NODE transformation_node SQL > SELECT id, local_date, name, sum(count) as total_count FROM my_origin GROUP BY id, local_date, name TYPE materialized DATASOURCE my_destination
Once you have the origin and destination data sources defined and the materialized pipe, deploy the project to apply the changes. Materialized views are populated automatically in the target environment.
Populate materialized views on demand¶
You can populate a materialized view on demand using the tb materialization populate
command. For example:
Populate the materialized view
tb materialization populate my_transformation.pipe
You can also truncate the materialized view before populating it. For example:
Truncate the materialized view
tb materialization populate my_transformation.pipe --truncate
See tb materialization populate for more details.
Next steps¶
- Learn about Copy pipes.
- Publish your data with API endpoints.