Evolve data sources¶
After you've deployed your project, you can evolve your data sources. For example, you might need to add a new column, change the data type of a column, or change the sorting key. Tinybird Forward handles the data migration.
Types of changes¶
You can evolve your data sources by editing one or more of the following:
- Landing data source schema.
- Landing data source engine settings.
- Materialized data source.
Landing data source schema¶
When you make changes to the schema of a landing data source, such as adding or editing columns or changing a data type, you can follow these steps:
- In Tinybird Local, start a dev session with
tb dev
. - Edit the .datasource file to add the changes. See SCHEMA instructions.
- Add a forward query instruction to tell Tinybird how to migrate your data.
- Run
tb deploy --check
to validate the deployment before creating it. This is a good way of catching potential breaking changes. - Deploy and promote your changes in Tinybird Cloud using
tb --cloud deploy
.
When Tinybird Cloud creates the deployment, it automatically populates the new table following the updated schema.
Forward query¶
If you make changes to a .datasource file that are incompatible with the live version, you must use a forward query to transform the data from the live schema to the new one. Otherwise, your deployment fails due to a schema mismatch.
The FORWARD_QUERY
instruction is a SELECT
query executed on the live data source. The query must include the column selection part of the query, for example SELECT a, b, c
or SELECT * except 'guid', toUUID(guid) AS guid
. The FROM
and WHERE
clauses aren't supported.
The following is an example of a forward query that changes the session_id
column from a String
to a UUID
type:
tinybird/datasources/forward-query.datasource - Data Source with a FORWARD_QUERY declaration
DESCRIPTION > Analytics events landing data source SCHEMA > `timestamp` DateTime `json:$.timestamp`, `session_id` UUID `json:$.session_id`, `action` String `json:$.action`, `version` String `json:$.version`, `payload` String `json:$.payload` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" ENGINE_TTL "timestamp + toIntervalDay(60)" FORWARD_QUERY > SELECT timestamp, toUUID(session_id) as session_id, action, version, payload
Tinybird runs a backfill to migrate the data to the new schema. These backfills are logged in datasources_ops_log
with the event_type
set to deployment_backfill
.
After changes have been deployed and promoted, if you want to deploy other changes that don't affect that data source, you can remove the forward query.
Landing data source engine settings¶
When you make changes to the engine settings of a landing data source, such as changing the sorting or partition key, you can follow these steps:
In Tinybird Local, start a dev session with
tb dev
.Edit the .datasource file to add the changes. No forward query is required. See engine settings.
Run
tb deploy --check
to validate the deployment before creating it. This is a good way of catching potential breaking changes.Deploy and promote your changes in Tinybird Cloud using
tb --cloud deploy
.
When Tinybird Cloud creates the deployment, it automatically populates the new table following the changes.
Materialized data sources¶
When editing materialized data sources, you need to consider the settings of the landing data sources that feed into them, especially the TTL (Time To Live) settings.
Forward queries are essential when evolving materialized data sources, both schema and engine settings, to retain historical data.
If your landing data source has a shorter TTL than your materialized data source, you will get a warning when you deploy your changes. You will need to add a forward query to prevent data loss or, if you accept loss of historical data, add the --allow-destructive-operations
flag to your deployment command.
For example, consider this scenario:
- Landing data source has a 7-day TTL.
- Materialized data source has no TTL (keeps data indefinitely).
- You want to change the data type of a column in the materialized data source.
Without a forward query, recalculating the materialized data source would only process the last 7 days of data due to the landing source's TTL, causing you to lose historical data beyond that period. To retain all historical data, use a forward query to transform the data from the live schema to the new one.
Here's an example materialized data source that uses a forward query to transform the data type of the visits
column from AggregateFunction(count, UInt16)
to AggregateFunction(count, UInt64)
:
DESCRIPTION > Materialized data source for daily page visits aggregation SCHEMA > `date` Date, `page_url` String, `visits` AggregateFunction(count, UInt64) ENGINE "AggregatingMergeTree" ENGINE_PARTITION_KEY "toYYYYMM(date)" ENGINE_SORTING_KEY "date, page_url" FORWARD_QUERY > SELECT date, page_url, CAST(visits, 'AggregateFunction(count, UInt64)') AS visits
Omitting the forward query instruction fully recalculates the materialized data source.
You can omit the forward query when:
- Landing data source has a longer TTL than the materialized data source, or no TTL.
- Making non-backward compatible changes, like adding a new group by column.
- Accepting loss of historical data.
Next steps¶
- Learn more about deployments.
- Learn about datafiles, like .datasource and .pipe files. See Datafiles.