Datasource files (.datasource)¶

Datasource files describe your data sources. You can use .datasource files to define the schema, engine, and other settings of your data sources.

Available instructions¶

The following instructions are available for .datasource files.

Declaration	Required	Description
`SCHEMA <indented_schema_definition>`	Yes	Defines a block for a data source schema. The block must be indented.
`ENGINE <engine_type>`	No	Sets the engine for data source. Default value is `MergeTree`.
`ENGINE_SORTING_KEY <sql>`	No	Sets the `ORDER BY` expression for the data source.
`ENGINE_PARTITION_KEY <sql>`	No	Sets the `PARTITION` expression for the data source.
`ENGINE_TTL <sql>`	No	Sets the `TTL` expression for the data source.
`ENGINE_VER <column_name>`	No	Column with the version of the object state. Required when using `ENGINE ReplacingMergeTree`.
`ENGINE_SIGN <column_name>`	No	Column to compute the state. Required when using `ENGINE CollapsingMergeTree` or `ENGINE VersionedCollapsingMergeTree`.
`ENGINE_VERSION <column_name>`	No	Column with the version of the object state. Required when `ENGINE VersionedCollapsingMergeTree`.
`ENGINE_SETTINGS <settings>`	No	Comma-separated list of key-value pairs that describe engine settings for the data source.
`FORWARD_QUERY <sql>`	No	Defines a query to execute on the data source. The results of the query are returned instead of the original schema defined in the `SCHEMA` declaration. See Evolve data sources.
`TOKEN <token_name> READ\|APPEND`	No	Grants read or append access to a datasource to the token named <token_name>. If the token isn't specified or <token_name> doesn't exist, it will be automatically created.

The following example shows a typical .datasource file:

tinybird/datasources/example.datasource

# A comment
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"
ENGINE_SETTINGS "index_granularity=8192"

Schema¶

A SCHEMA declaration is a newline, comma-separated list of columns definitions. For example:

Example SCHEMA declaration

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

Each column in a SCHEMA declaration is in the format <column_name> <data_type> <json_path> <default_value>, where:

<column_name> is the name of the column in the data source.
<data_type> is one of the supported Data types.
<json_path> is optional and only required for NDJSON data sources.
<default_value> sets a default value to the column when it's null. A common use case is to set a default date to a column, like updated_at DateTime DEFAULT now().

JSONPath expressions¶

SCHEMA definitions need JSONPath expressions when working with Parquet or NDJSON data.

It supports base fields json:$.field, arrays json:$.an_array[:], nested fields json:$.nested.nested_field, and storing the whole object json:$.

For example, given this JSON object:

{
  "field": "test",
  "nested": { "nested_field": "bla" },
  "an_array": [1, 2, 3],
  "a_nested_array": { "nested_array": [1, 2, 3] }
}

The schema would be something like this:

jsonpaths.datasource

SCHEMA >
    field String `json:$.field`,
    nested_nested_field String `json:$.nested.nested_field`,
    an_array Array(Int16) `json:$.an_array[:]`,
    a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`,
    whole_message String `json:$`

Use brackets for attributes with dots that are not actual nested attributes, like this:

{
    "attributes": {
        "otel.attributes": {
            "cli_command": "datasource ls",
        }
    }
}

jsonpaths.datasource

SCHEMA >
    cli_command String `json:$.attributes.['otel.attributes'].cli_command`

Tinybird's JSONPath syntax support has some limitations: It support nested objects at multiple levels, but it supports nested arrays only at the first level, as in the example before. To ingest and transform more complex JSON objects, store the whole JSON as a String "<column_name> String json:$", and use JSONExtract functions to parse at query time or in materializations.

Column compression codecs¶

Tinybird applies compression codecs to data types to optimize storage. You can override the default compression codecs by adding the CODEC(<codec>) statement after the type declarations in your .datasource schema. For example:

SCHEMA >
    `product_id` Int32 `json:$.product_id`,
    `timestamp` DateTime64(3) `json:$.timestamp` CODEC(DoubleDelta, ZSTD(1)),

Engine settings¶

ENGINE declares the engine used for the data source. The default value is MergeTree.

See Engines for more information.

Connector settings¶

A data source file can contain connector settings for certain type of sources, such as Kafka or S3. See Connectors.

Forward query¶

If you make changes to a .datasource file that are incompatible with the live version, you must use the FORWARD_QUERY instruction to transform the data from the live schema to the new one. Otherwise, your deployment will fail due to a schema mismatch.

See Evolve data sources for more information.

Get started

Ingest data

Work with data

Analytics agents

Test and deploy

Monitor your data

Administration

Pricing

Deployment options

Reference

Datasource files (.datasource)¶

Available instructions¶

tinybird/datasources/example.datasource

Schema¶

Example SCHEMA declaration

JSONPath expressions¶

jsonpaths.datasource

jsonpaths.datasource

Column compression codecs¶

Engine settings¶

Connector settings¶

Forward query¶