Evolve data sources¶
After you've deployed your project, you can evolve your data sources and Tinybrid will handle the data migration. For example, you might need to add a new column, change the data type of a column, or change the sorting key.
Types of changes¶
You can evolve your data sources in two ways:
- Edit the schema.
- Edit the engine settings or indexes.
Edit the schema¶
When you make changes to the schema, such as adding or editing columns or changing a data type, you can follow these steps:
- In Tinybird Local, start a dev session with
tb dev
. - Edit the .datasource file to add the changes. See SCHEMA instructions.
- Add a forward query instruction to tell Tinybird how to migrate your data.
- Run
tb deploy --check
to validate the deployment before creating it. This is a good way of catching potential breaking changes. - Deploy and promote your changes in Tinybird Cloud using
tb --cloud deploy --auto
.
When Tinybird Cloud creates the deployment, it automatically populates the new table following the updated schema.
Forward query¶
If you make changes to a .datasource file that are incompatible with the live version, you must use a forward query to transform the data from the live schema to the new one. Otherwise, your deployment fails due to a schema mismatch.
The FORWARD_QUERY
instruction is a SELECT
query executed on the live data source. The query must include the column selection part of the query, for example SELECT a, b, c
or SELECT * except 'guid', toUUID(guid) AS guid
. The FROM
and WHERE
clauses aren't supported.
The following is an example of a forward query that changes the session_id
column from a String
to a UUID
type:
tinybird/datasources/forward-query.datasource - Data Source with a FORWARD_QUERY declaration
DESCRIPTION > Analytics events landing data source SCHEMA > `timestamp` DateTime `json:$.timestamp`, `session_id` UUID `json:$.session_id`, `action` String `json:$.action`, `version` String `json:$.version`, `payload` String `json:$.payload` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" ENGINE_TTL "timestamp + toIntervalDay(60)" FORWARD_QUERY > SELECT timestamp, toUUID(session_id) as session_id, action, version, payload
Tinybird runs a backfill to migrate the data to the new schema. After changes have been deployed and promoted, if you want to deploy other changes that don't affect that data source, you can remove the forward query.
Forward query in a materialized data source¶
Forward queries are important when evolving materialized data sources, especially when dealing with TTL (Time To Live) settings.
If your landing data source has a shorter TTL than your materialized data source, you must use a forward query when making schema changes to prevent data loss.
For example, imagine the following scenario:
- Landing data source has a 7-day TTL.
- Materialized data source has no TTL (keeps data indefinitely).
- You change the data type of a column in the materialized data source.
Without a forward query, recalculating the materialized data source would only process the last 7 days of data (due to the landing source's TTL), causing you to lose any historical data beyond that period. To retain that historical data, use a forward query to transform the data from the live schema to the new one.
For example, the following materialized data source uses a forward query to transform the data type of the visits
column from AggregateFunction(count, UInt16)
to AggregateFunction(count, UInt64)
:
DESCRIPTION > Materialized data source for daily page visits aggregation SCHEMA > `date` Date, `page_url` String, `visits` AggregateFunction(count, UInt64) ENGINE "AggregatingMergeTree" ENGINE_PARTITION_KEY "toYYYYMM(date)" ENGINE_SORTING_KEY "date, page_url" FORWARD_QUERY > SELECT date, page_url, CAST(visits, 'AggregateFunction(count, UInt64)') AS visits
Edit the engine settings or indexes¶
When you make changes to the engine settings or indexes, such as changing the sorting or partition key or adding a data skipping index, you can follow these steps:
- In Tinybird Local, start a dev session with
tb dev
. - Edit the .datasource file to add the changes. See engine settings. No forward query is required.
- Run
tb deploy --check
to validate the deployment before creating it. This is a good way of catching potential breaking changes. - Deploy and promote your changes in Tinybird Cloud using
tb --cloud deploy --auto
.
When Tinybird Cloud creates the deployment, it automatically populates the new table following the changes.
Next steps¶
- Learn more about deployments.
- Learn about datafiles, like .datasource and .pipe files. See Datafiles.