Evolve data sources

After you've deployed your project, you can evolve your data sources and Tinybrid will handle the data migration. For example, you might need to add a new column, change the data type of a column, or change the sorting key.

This is an experimental version of Tinybird. Join #forward in our Slack community to share your feedback.

Types of changes

You can evolve your data sources in two ways:

  1. Edit the schema.
  2. Edit the engine settings or indexes.

Edit the schema

When you make changes to the schema, such as adding or editing columns or changing a data type, you can follow these steps:

  1. In Tinybird Local, start a dev session with tb dev.
  2. Edit the .datasource file to add the changes. See SCHEMA instructions.
  3. Add a forward query instruction to tell Tinybird how to migrate your data.
  4. Run tb deploy --check to validate the deployment before creating it. This is a good way of catching potential breaking changes.
  5. Deploy and promote your changes in Tinybird Cloud using tb --cloud deploy --auto.

When Tinybird Cloud creates the deployment, it automatically populates the new table following the updated schema.

Forward query

If you make changes to a .datasource file that are incompatible with the live version, you must use a forward query to transform the data from the live schema to the new one. Otherwise, your deployment fails due to a schema mismatch.

The FORWARD_QUERY instruction is a SELECT query executed on the live data source. The query must include the column selection part of the query, for example SELECT a, b, c or SELECT * except 'guid', toUUID(guid) AS guid. The FROM and WHERE clauses aren't supported.

The following is an example of a forward query that changes the session_id column from a String to a UUID type:

tinybird/datasources/forward-query.datasource - Data Source with a FORWARD_QUERY declaration
DESCRIPTION >
    Analytics events landing data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` UUID `json:$.session_id`,
    `action` String `json:$.action`,
    `version` String `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

FORWARD_QUERY >
    SELECT timestamp, toUUID(session_id) as session_id, action, version, payload 

Tinybird runs a backfill to migrate the data to the new schema. After changes have been deployed and promoted, if you want to deploy other changes that don't affect that data source, you can remove the forward query.

Forward query in a materialized data source

Forward queries are important when evolving materialized data sources, especially when dealing with TTL (Time To Live) settings.

If your landing data source has a shorter TTL than your materialized data source, you must use a forward query when making schema changes to prevent data loss.

For example, imagine the following scenario:

  • Landing data source has a 7-day TTL.
  • Materialized data source has no TTL (keeps data indefinitely).
  • You change the data type of a column in the materialized data source.

Without a forward query, recalculating the materialized data source would only process the last 7 days of data (due to the landing source's TTL), causing you to lose any historical data beyond that period. To retain that historical data, use a forward query to transform the data from the live schema to the new one.

For example, the following materialized data source uses a forward query to transform the data type of the visits column from AggregateFunction(count, UInt16) to AggregateFunction(count, UInt64):

DESCRIPTION >
    Materialized data source for daily page visits aggregation

SCHEMA >
    `date` Date,
    `page_url` String,
    `visits` AggregateFunction(count, UInt64)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, page_url"

FORWARD_QUERY >
    SELECT date, page_url, CAST(visits, 'AggregateFunction(count, UInt64)') AS visits

Edit the engine settings or indexes

When you make changes to the engine settings or indexes, such as changing the sorting or partition key or adding a data skipping index, you can follow these steps:

  1. In Tinybird Local, start a dev session with tb dev.
  2. Edit the .datasource file to add the changes. See engine settings. No forward query is required.
  3. Run tb deploy --check to validate the deployment before creating it. This is a good way of catching potential breaking changes.
  4. Deploy and promote your changes in Tinybird Cloud using tb --cloud deploy --auto.

When Tinybird Cloud creates the deployment, it automatically populates the new table following the changes.

Next steps

  • Learn more about deployments.
  • Learn about datafiles, like .datasource and .pipe files. See Datafiles.
Updated