Datasource files (.datasource)

Datasource files describe your data sources. You can use .datasource files to define the schema, engine, and other settings of your data sources. See Data sources.

Connectors for cloud storage and Kafka are coming soon. They aren't currently available for v2.

Available instructions

The following instructions are available for .datasource files.

DeclarationRequiredDescription
SCHEMA <indented_schema_definition>YesDefines a block for a Data Source schema. The block must be indented.
DESCRIPTION <markdown_string>NoDescription of the Data Source.
TOKEN <token_name> APPENDNoGrants append access to a Data Source to the Token with name <token_name>. If the token doesn't exist, it's automatically created.
TAGS <tag_names>NoComma-separated list of tags. Tags are used to organize your data project.
ENGINE <engine_type>NoSets the engine for Data Source. Default value is MergeTree.
ENGINE_SORTING_KEY <sql>NoSets the ORDER BY expression for the Data Source. If unset, it defaults to DateTime, numeric, or String columns, in that order.
ENGINE_PARTITION_KEY <sql>NoSets the PARTITION expression for the Data Source.
ENGINE_TTL <sql>NoSets the TTL expression for the Data Source.
ENGINE_VER <column_name>NoColumn with the version of the object state. Required when using ENGINE ReplacingMergeTree.
ENGINE_SIGN <column_name>NoColumn to compute the state. Required when using ENGINE CollapsingMergeTree or ENGINE VersionedCollapsingMergeTree.
ENGINE_VERSION <column_name>NoColumn with the version of the object state. Required when ENGINE VersionedCollapsingMergeTree.
ENGINE_SETTINGS <settings>NoComma-separated list of key-value pairs that describe engine settings for the Data Source.
INDEXES <index definitions>NoDefines one or more indexes for the Data Source. See Data Skipping Indexes for more information.
SHARED_WITH <workspace_name>NoShares the Data Source with one or more Workspaces. Use in combination with --user_token with admin rights in the origin Workspace.
FORWARD_QUERY <sql>NoDefines a query to execute on the Data Source. The results of the query are returned instead of the original schema defined in the SCHEMA declaration.

The following example shows a typical .datasource file:

tinybird/datasources/example.datasource
# A comment
TOKEN tracker APPEND

DESCRIPTION >
    Analytics events **landing data source**

TAGS stock, recommendations

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"
ENGINE_SETTINGS "index_granularity=8192"

INDEXES >
    INDEX idx1 action TYPE bloom_filter GRANULARITY 3

SHARED_WITH >
    analytics_production
    analytics_staging

Schema

A SCHEMA declaration is a newline, comma-separated list of columns definitions. For example:

Example SCHEMA declaration
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

Each column in a SCHEMA declaration is in the format <column_name> <data_type> <json_path> <default_value>, where:

  • <column_name> is the name of the column in the Data Source.
  • <data_type> is one of the supported Data Types.
  • <json_path> is optional and only required for NDJSON data sources. See JSONpaths.
  • <default_value> sets a default value to the column when it's null. A common use case is to set a default date to a column, like updated_at DateTime DEFAULT now().

To change or update JSONPaths or other default values in the schema, push a new version of the schema using tb push --force or use the alter endpoint on the Data sources API.

JSONPath expressions

SCHEMA definitions support JSONPath expressions. For example:

Schema syntax with jsonpath
DESCRIPTION Generated from /Users/username/tmp/sample.ndjson

SCHEMA >
    `d` DateTime `json:$.d`,
    `total` Int32 `json:$.total`,
    `from_novoa` Int16 `json:$.from_novoa`

See JSONPaths for more information.

Engine settings

ENGINE declares the engine used for the Data Source. The default value is MergeTree.

See Engines for more information.

Forward query

The FORWARD_QUERY instruction is used to define a SELECT query to execute on the Data Source. The query transforms the data from the old schema, the one currently deployed, to a new one defined in the query.

The query must include the column selection part of the query, for example SELECT a, b, c or SELECT * except 'guid', toUUID(guid) AS guid. The FROM and WHERE clauses are not supported.

The following example shows a FORWARD_QUERY declaration in a .datasource file:

tinybird/datasources/forward-query.datasource - Data Source with a FORWARD_QUERY declaration
DESCRIPTION >
    Analytics events landing data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` UUID `json:$.session_id`,
    `action` String `json:$.action`,
    `version` String `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

FORWARD_QUERY >
    select timestamp, toUUID(session_id) as session_id, action, version, payload 

An alternative way to define a FORWARD_QUERY is to use the EXCEPT clause. For example:

FORWARD_QUERY >
    select * except 'session_id', toUUID(session_id) as session_id
Updated