Datasource files (.datasource)¶
Datasource files describe your data sources. You can use .datasource files to define the schema, engine, and other settings of your data sources.
Connectors for cloud storage and Kafka are coming soon. They aren't currently available for v2.
Available instructions¶
The following instructions are available for .datasource files.
Declaration | Required | Description |
---|---|---|
SCHEMA <indented_schema_definition> | Yes | Defines a block for a Data Source schema. The block must be indented. |
ENGINE <engine_type> | No | Sets the engine for the Data Source. The default value is MergeTree. |
ENGINE_SORTING_KEY <sql> | No | Sets the ORDER BY expression for the Data Source. If unset, it defaults to DateTime, numeric, or String columns, in that order. |
ENGINE_PARTITION_KEY <sql> | No | Sets the PARTITION expression for the Data Source. |
ENGINE_TTL <sql> | No | Sets the TTL expression for the Data Source. |
ENGINE_VER <column_name> | No | Column with the version of the object state. Required when using ENGINE ReplacingMergeTree. |
ENGINE_SIGN <column_name> | No | Column to compute the state. Required when using ENGINE CollapsingMergeTree or ENGINE VersionedCollapsingMergeTree. |
ENGINE_VERSION <column_name> | No | Column with the version of the object state. Required when using ENGINE VersionedCollapsingMergeTree. |
ENGINE_SETTINGS <settings> | No | Comma-separated list of key-value pairs that describe engine settings for the Data Source. |
FORWARD_QUERY <sql> | No | Defines a query to execute on the Data Source. The results of the query are returned instead of the original schema defined in the SCHEMA declaration. See Evolve data sources. |
TOKEN <token_name> READ|APPEND | No | Grants read or append access to the Data Source to the token named <token_name>. If no token is specified or <token_name> doesn't exist, the token is created automatically. |
The following example shows a typical .datasource file:
tinybird/datasources/example.datasource
# A comment
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"
ENGINE_SETTINGS "index_granularity=8192"
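As a sketch of the TOKEN instruction listed in the table, the following line grants append access to a token. The token name events_append is hypothetical, and the unquoted form is an assumption based on the TOKEN <token_name> READ|APPEND pattern; the exact quoting rules may differ.

```
# Hypothetical token name; created automatically if it doesn't exist
TOKEN events_append APPEND
```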
Schema¶
A SCHEMA declaration is a comma-separated list of column definitions, one per line. For example:
Example SCHEMA declaration
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`
Each column in a SCHEMA declaration is in the format <column_name> <data_type> <json_path> <default_value>, where:
- <column_name> is the name of the column in the Data Source.
- <data_type> is one of the supported Data types.
- <json_path> is optional and only required for NDJSON data sources.
- <default_value> sets a default value for the column when it's null. A common use case is to set a default date for a column, like updated_at DateTime DEFAULT now().
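As a minimal sketch of the four-part column format, the following schema combines a JSONPath with a default value. The column names event_id and updated_at are hypothetical; the DEFAULT clause follows the JSONPath, matching the <json_path> <default_value> order described above.

```
SCHEMA >
    # Plain column: name, type, JSONPath
    `event_id` String `json:$.event_id`,
    # Column with a default applied when the incoming value is null
    `updated_at` DateTime `json:$.updated_at` DEFAULT now()
```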
JSONPath expressions¶
SCHEMA definitions support JSONPath expressions. For example:
Schema syntax with jsonpath
DESCRIPTION Generated from /Users/username/tmp/sample.ndjson

SCHEMA >
    `d` DateTime `json:$.d`,
    `total` Int32 `json:$.total`,
    `from_novoa` Int16 `json:$.from_novoa`
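JSONPath expressions can also reach into nested objects. The sketch below assumes a hypothetical NDJSON event shaped like {"user": {"id": "u1"}, "tags": ["a", "b"]}; the column names are illustrative, and the [:] suffix for mapping a JSON array to an Array column is an assumption to verify against the JSONPath reference.

```
SCHEMA >
    # Nested field: $.user.id selects the "id" key inside "user"
    `user_id` String `json:$.user.id`,
    # Array field (assumed syntax): [:] selects every element of "tags"
    `tags` Array(String) `json:$.tags[:]`
```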
Engine settings¶
ENGINE declares the engine used for the Data Source. The default value is MergeTree.
See Engines for more information.
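For instance, a Data Source that keeps only the latest row per key might use ReplacingMergeTree together with ENGINE_VER, which the instructions table lists as required for that engine. This is a sketch under assumed column names (session_id, status, updated_at), not a recommended configuration.

```
SCHEMA >
    `session_id` String `json:$.session_id`,
    `status` LowCardinality(String) `json:$.status`,
    `updated_at` DateTime `json:$.updated_at`

ENGINE "ReplacingMergeTree"
# Rows sharing the same sorting key are deduplicated during merges
ENGINE_SORTING_KEY "session_id"
# The row with the highest updated_at value is kept
ENGINE_VER "updated_at"
```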
Connector settings¶
A .datasource file can contain connector settings for certain types of sources, such as Kafka or S3. See Connectors.
Forward query¶
If you make changes to a .datasource file that are incompatible with the live version, you must use the FORWARD_QUERY instruction to transform the data from the live schema to the new one. Otherwise, your deployment will fail due to a schema mismatch.
See Evolve data sources for more information.
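As a rough sketch, suppose the session_id column from the earlier example changes from String to UUID. A forward query could cast the live values to the new type. The indented block form and the SELECT without a FROM clause are assumptions modeled on the SCHEMA syntax; confirm the exact form in Evolve data sources.

```
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` UUID `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

# Assumed form: converts rows from the live schema to the new one
FORWARD_QUERY >
    SELECT timestamp, CAST(session_id, 'UUID') AS session_id, action, version, payload
```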