Evolve data sources

After you've deployed your project, you can evolve your data sources. For example, you might need to add a new column, change the data type of a column, or change the sorting key. Tinybird Forward handles the data migration.

Types of changes

You can evolve your data sources by editing one or more of the following:

  • Landing data source schema.
  • Landing data source engine settings.
  • Materialized data source.

Landing data source schema

When you make changes to the schema of a landing data source, such as adding or editing columns or changing a data type, you can follow these steps:

  1. In Tinybird Local, start a dev session with tb dev.
  2. Edit the .datasource file to add the changes. See SCHEMA instructions.
  3. Add a forward query instruction to tell Tinybird how to migrate your data.
  4. Run tb deploy --check to validate the deployment before creating it. This is a good way of catching potential breaking changes.
  5. Deploy and promote your changes in Tinybird Cloud using tb --cloud deploy.

When Tinybird Cloud creates the deployment, it automatically populates the new table following the updated schema.

Forward query

If you make changes to a .datasource file that are incompatible with the live version, you must use a forward query to transform the data from the live schema to the new one. Otherwise, your deployment fails due to a schema mismatch.

The FORWARD_QUERY instruction is a SELECT query executed on the live data source. The query must include the column selection part of the query, for example SELECT a, b, c or SELECT * except 'guid', toUUID(guid) AS guid. The FROM and WHERE clauses aren't supported.

The following is an example of a forward query that changes the session_id column from a String to a UUID type:

tinybird/datasources/forward-query.datasource - Data Source with a FORWARD_QUERY declaration
DESCRIPTION >
    Analytics events landing data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` UUID `json:$.session_id`,
    `action` String `json:$.action`,
    `version` String `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

FORWARD_QUERY >
    SELECT timestamp, toUUID(session_id) as session_id, action, version, payload

Tinybird runs a backfill to migrate the data to the new schema. These backfills are logged in datasources_ops_log with the event_type set to deployment_backfill.

After changes have been deployed and promoted, if you want to deploy other changes that don't affect that data source, you can remove the forward query.

Landing data source engine settings

When you make changes to the engine settings of a landing data source, such as changing the sorting or partition key, you can follow these steps:

  1. In Tinybird Local, start a dev session with tb dev.

  2. Edit the .datasource file to add the changes. No forward query is required. See engine settings.

  3. Run tb deploy --check to validate the deployment before creating it. This is a good way of catching potential breaking changes.

  4. Deploy and promote your changes in Tinybird Cloud using tb --cloud deploy.

When Tinybird Cloud creates the deployment, it automatically populates the new table following the changes.

Materialized data sources

When editing materialized data sources, you need to consider the settings of the landing data sources that feed into them, especially the TTL (Time To Live) settings.

Forward queries are essential when evolving materialized data sources, both schema and engine settings, to retain historical data.

If your landing data source has a shorter TTL than your materialized data source, you will get a warning when you deploy your changes. You will need to add a forward query to prevent data loss or, if you accept loss of historical data, add the --allow-destructive-operations flag to your deployment command.

For example, consider this scenario:

  • Landing data source has a 7-day TTL.
  • Materialized data source has no TTL (keeps data indefinitely).
  • You want to change the data type of a column in the materialized data source.

Without a forward query, recalculating the materialized data source would only process the last 7 days of data due to the landing source's TTL, causing you to lose historical data beyond that period. To retain all historical data, use a forward query to transform the data from the live schema to the new one.

Here's an example materialized data source that uses a forward query to transform the data type of the visits column from AggregateFunction(count, UInt16) to AggregateFunction(count, UInt64):

DESCRIPTION >
    Materialized data source for daily page visits aggregation

SCHEMA >
    `date` Date,
    `page_url` String,
    `visits` AggregateFunction(count, UInt64)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, page_url"

FORWARD_QUERY >
    SELECT date, page_url, CAST(visits, 'AggregateFunction(count, UInt64)') AS visits

Omitting the forward query instruction fully recalculates the materialized data source.

You can omit the forward query when:

  • Landing data source has a longer TTL than the materialized data source, or no TTL.
  • Making non-backward compatible changes, like adding a new group by column.
  • Accepting loss of historical data.

Next steps

  • Learn more about deployments.
  • Learn about datafiles, like .datasource and .pipe files. See Datafiles.
Updated