Data Source API ¶
Use the Data Source API to list, create, update or delete your Tinybird Data Sources. You can also insert or delete data from Data Sources.
New to Data Sources? Read more about them in the Data Sources documentation.
All endpoints require authentication using a Token with the appropriate scope.
- POST /v0/datasources/?¶
This endpoint supports three modes, selected with the `mode` parameter (for example, `-d "mode=create"`):

- Create a new Data Source with a schema
- Append data to an existing Data Source
- Replace data in an existing Data Source

Each mode has different rate limits. When importing remote files by URL, if the server hosting the remote file supports HTTP Range headers, the import process is parallelized.
Request parameters¶

- `mode` (String): Default: `create`. Other modes: `append` and `replace`. The `create` mode creates a new Data Source and attempts to import the data of the CSV if a URL is provided or the body contains any data. The `append` mode inserts the new rows provided into an existing Data Source (it will also create it if it does not exist yet). The `replace` mode removes the previous Data Source and its data and replaces it with the new one; Pipes or queries pointing to this Data Source immediately start returning data from the new one, without disruption, once the replace operation is complete. The `create` mode automatically names the Data Source if no `name` parameter is provided; for the `append` and `replace` modes to work, the `name` parameter must be provided and the schema must be compatible.
- `name` (String): Optional. Name of the Data Source to create, append to, or replace data in. This parameter is mandatory when using the `append` or `replace` modes.
- `url` (String): Optional. The URL of the CSV with the data to be imported.
- `dialect_delimiter` (String): Optional. The one-character string separating the fields. Tinybird tries to guess the delimiter based on the CSV contents using some statistics, but sometimes fails to identify the correct one. If you know your CSV's field delimiter, use this parameter to define it explicitly.
- `dialect_new_line` (String): Optional. The one- or two-character string separating the records. As with `dialect_delimiter`, if you know your CSV's record delimiter, use this parameter to define it explicitly.
- `dialect_escapechar` (String): Optional. The escape character removes any special meaning from the following character. This is useful when the CSV does not use double quotes to encapsulate a column but contains double quotes inside a column's content, escaped with, for example, a backslash.
- `schema` (String): Optional. Data Source schema in the format `column_name Type, column_name_2 Type2, …`. When creating a Data Source with format `ndjson`, the `schema` must include the `jsonpath` for each column; see the JSONPaths section for more details.
- `engine` (String): Optional. Engine for the underlying data. Requires the `schema` parameter.
- `engine_*` (String): Optional. Engine parameters and options; check the Engines section for more details.
- `progress` (String): Default: `false`. When using `true` and sending the data in the request body, Tinybird returns block status while loading, using line-delimited JSON.
- `token` (String): Auth token with create or append permissions. Required only if no Bearer Authorization header is found.
- `type_guessing` (String): Default: `true`. When using `false`, all columns are created as `String`; otherwise Tinybird tries to guess the column types based on the CSV contents. This parameter is not taken into account when replacing or appending data to an existing Data Source. Disabling type guessing lets you quickly import everything as strings, explore the data with SQL, and cast it to the right type or shape via a Pipe.
- `debug` (String): Optional. Enables returning debug information from logs. It can include `blocks`, `block_log`, and/or `hook_log`.
- `replace_condition` (String): Optional. When used in combination with the `replace` mode, it allows you to replace the portion of your Data Source that matches the `replace_condition` SQL statement with the contents of the `url` or query passed as a parameter. See this guide to learn more.
- `replace_truncate_when_empty` (Boolean): Optional. When used in combination with the `replace` mode, it allows truncating the Data Source when empty data is provided. Not supported when `replace_condition` is specified.
- `format` (String): Default: `csv`. Indicates the format of the data to be ingested into the Data Source. Use `format=ndjson` for NDJSON and `format=parquet` for Parquet files.

Examples¶
Creating a CSV Data Source from a schema¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources" \
    -d "name=stocks" \
    -d "schema=symbol String, date Date, close Float32"

Creating a CSV Data Source from a local CSV file with schema inference¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources?name=stocks" \
    -F csv=@local_file.csv

Creating a CSV Data Source from a remote CSV file with schema inference¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources" \
    -d "name=stocks" \
    -d url='https://.../data.csv'

Creating an empty Data Source with a ReplacingMergeTree engine and custom engine settings¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources" \
    -d "schema=pk UInt64, insert_date Date, close Float32" \
    -d "engine=ReplacingMergeTree" \
    -d "engine_sorting_key=pk" \
    -d "engine_ver=insert_date" \
    -d "name=test123" \
    -d "engine_settings=index_granularity=2048, ttl_only_drop_parts=false"
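The same create call works for NDJSON files by setting `format=ndjson` and including a `jsonpath` for each column in the `schema`, as described in the parameter table. A minimal sketch using only Python's standard library; the token, Data Source name, and column names are placeholders, and the request is only built here, not sent:

```python
import urllib.parse
import urllib.request

TOKEN = "<DATASOURCES:CREATE token>"  # placeholder

params = {
    "name": "events",  # hypothetical Data Source name
    "format": "ndjson",
    # For format=ndjson, each column must carry its jsonpath (`json:$.field`).
    "schema": (
        "symbol String `json:$.symbol`, "
        "date Date `json:$.date`, "
        "close Float32 `json:$.close`"
    ),
}
req = urllib.request.Request(
    "https://api.tinybird.co/v0/datasources",
    data=urllib.parse.urlencode(params).encode(),
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually create the Data Source
```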
Appending data to a Data Source from a local CSV file¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=append" \
    -F csv=@local_file.csv

Appending data to a Data Source from a remote CSV file¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources" \
    -d mode='append' \
    -d name='data_source_name' \
    -d url='https://.../data.csv'

Replacing data with a local file¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=replace" \
    -F csv=@local_file.csv

Replacing data with a remote file from a URL¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources" \
    -d mode='replace' \
    -d name='data_source_name' \
    --data-urlencode "url=http://example.com/file.csv"
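The `replace_condition` parameter described above can be combined with `mode=replace` to replace only the matching portion of a Data Source. A minimal sketch with Python's standard library; the Data Source name, condition, and file URL are hypothetical, and the request is only constructed, not sent:

```python
import urllib.parse
import urllib.request

TOKEN = "<DATASOURCES:CREATE token>"  # placeholder

# Replace only the rows matching the condition with the contents of the URL.
params = {
    "mode": "replace",
    "name": "events",                               # hypothetical Data Source
    "replace_condition": "(date >= '2024-01-01')",  # hypothetical condition
    "url": "https://example.com/january.csv",       # hypothetical file
}
req = urllib.request.Request(
    "https://api.tinybird.co/v0/datasources",
    data=urllib.parse.urlencode(params).encode(),
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to run the partial replace
```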
- GET /v0/datasources/?¶
- Getting a list of your Data Sources¶

curl \
    -H "Authorization: Bearer <DATASOURCES:READ token>" \
    -X GET "https://api.tinybird.co/v0/datasources"

Get a list of the Data Sources in your account. The token you use determines which Data Sources are returned: only those accessible with that token appear in the response.
Successful response¶

{
    "datasources": [{
        "id": "t_a049eb516ef743d5ba3bbe5e5749433a",
        "name": "your_datasource_name",
        "cluster": "tinybird",
        "tags": {},
        "created_at": "2019-11-13 13:53:05.340975",
        "updated_at": "2022-02-11 13:11:19.464343",
        "replicated": true,
        "version": 0,
        "project": null,
        "headers": {},
        "shared_with": [
            "89496c21-2bfe-4775-a6e8-97f1909c8fff"
        ],
        "engine": {
            "engine": "MergeTree",
            "engine_sorting_key": "example_column_1",
            "engine_partition_key": "",
            "engine_primary_key": "example_column_1"
        },
        "description": "",
        "used_by": [],
        "type": "csv",
        "columns": [{
            "name": "example_column_1",
            "type": "Date",
            "codec": null,
            "default_value": null,
            "jsonpath": null,
            "nullable": false,
            "normalized_name": "example_column_1"
        }, {
            "name": "example_column_2",
            "type": "String",
            "codec": null,
            "default_value": null,
            "jsonpath": null,
            "nullable": false,
            "normalized_name": "example_column_2"
        }],
        "statistics": {
            "bytes": 77822,
            "row_count": 226188
        },
        "new_columns_detected": {},
        "quarantine_rows": 0
    }]
}
Request parameters¶

- `attrs` (String): Comma-separated list of the Data Source attributes to return in the response. Example: `attrs=name,id,engine`. Leave empty to return a full response.

Note that the `statistics` `bytes` and `row_count` attributes might be `null` depending on how the Data Source was created.
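For instance, to fetch only a few attributes of each Data Source via `attrs`, the curl call above can be reproduced from Python's standard library; the token is a placeholder and the request is only built here, not sent:

```python
import json
import urllib.parse
import urllib.request

TOKEN = "<DATASOURCES:READ token>"  # placeholder

query = urllib.parse.urlencode({"attrs": "name,id,engine"})
req = urllib.request.Request(
    f"https://api.tinybird.co/v0/datasources?{query}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
# with urllib.request.urlopen(req) as resp:
#     names = [ds["name"] for ds in json.load(resp)["datasources"]]
```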
- POST /v0/datasources/(.+)/alter¶
Modify the Data Source schema.
This endpoint supports altering the following aspects of a Data Source, controlled by the request parameters below:

Request parameters¶

- `schema` (String): Optional. Sets the whole schema, adding new columns to the existing ones of the Data Source.
- `description` (String): Optional. Sets the description of the Data Source.
- `kafka_store_raw_value` (Boolean): Optional. Default: `false`. When set to `true`, the `value` column of a Kafka Data Source saves the JSON as a raw string.
- `kafka_store_headers` (Boolean): Optional. Default: `false`. When set to `true`, the headers of a Kafka Data Source are saved as a binary map.
- `ttl` (String): Optional. Set to any value accepted by ClickHouse for a TTL, or to `false` to remove the TTL.
- `dry` (Boolean): Optional. Default: `false`. Set to `true` to show what would be modified in the Data Source, without running any modification at all.
The `schema` parameter can be used to add new columns at the end of the existing ones in a Data Source. Be aware that Tinybird currently doesn't validate whether the change will affect the existing Materialized Views (MVs) attached to the Data Source, so this change may break existing MVs. For example, avoid changing a Data Source that has an MV created with something like `SELECT * FROM datasource ...`. If you want MVs that are forward compatible with column additions, create them specifying the columns explicitly instead of using the `*` operator. Also, take into account that, for now, the only engines supporting the addition of new columns are those in the MergeTree family.
To add a column to a Data Source, call this endpoint with the Data Source name and the new schema definition.
For example, given a Data Source created like this:

Creating a Data Source from a schema¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources" \
    -d "name=stocks" \
    -d "schema=symbol String, date Date, close Float32"

If you want to add a new column `concept String`, call this endpoint with the new schema:

Adding a new column to an existing Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
    -d "schema=symbol String, date Date, close Float32, concept String"

If everything goes well, the response lists the operations performed:

ADD COLUMN operation resulting from the schema change¶

{
    "operations": [
        "ADD COLUMN `concept` String"
    ]
}
You can also view the inferred operations without executing them by adding `dry=true` to the parameters. To modify the description of a Data Source:

Modifying the description of a Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
    -d "name=stocks" \
    -d "description=My new description"
To save the JSON as a raw string in the `value` column of a Kafka Data Source:

Saving the raw string in the value column of a Kafka Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
    -d "name=stocks" \
    -d "kafka_store_raw_value=true" \
    -d "kafka_store_headers=true"
To modify the TTL of a Data Source:

Modifying the TTL of a Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
    -d "name=stocks" \
    -d "ttl=12 hours"
To remove the TTL of a Data Source:

Removing the TTL of a Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
    -d "name=stocks" \
    -d "ttl=false"
To add default values to the columns of a Data Source:

Modifying default values¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
    -d "name=stocks" \
    -d "schema=symbol String DEFAULT '-', date Date DEFAULT now(), close Float32 DEFAULT 1.1"
To add default values to the columns of an NDJSON Data Source, add the default definition after the jsonpath definition:

Modifying default values in an NDJSON Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
    -d "name=stocks" \
    -d "schema=symbol String `json:$.symbol` DEFAULT '-', date Date `json:$.date` DEFAULT now(), close Float32 `json:$.close` DEFAULT 1.1"
To make a column nullable, change its type by wrapping the old type in `Nullable()`:

Converting column "close" to Nullable¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
    -d "name=stocks" \
    -d "schema=symbol String `json:$.symbol`, date Date `json:$.date`, close Nullable(Float32) `json:$.close`"
To drop a column, remove the column from the schema definition. Columns that are part of the primary or partition key can't be removed:

Removing column "close" from the Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
    -d "name=stocks" \
    -d "schema=symbol String `json:$.symbol`, date Date `json:$.date`"
You can also alter the JSONPaths of existing Data Sources. In that case you have to specify the JSONPath in the schema in the same way as when you created the Data Source.
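Because schema changes can break dependent Materialized Views, it can be worth previewing the inferred operations with `dry=true` before applying them. A sketch with Python's standard library, reusing the `stocks` schema from the examples above; the token is a placeholder and the request is only built, not sent:

```python
import urllib.parse
import urllib.request

TOKEN = "<DATASOURCES:CREATE token>"  # placeholder

def alter_request(datasource, schema, dry=False):
    """Build an alter request; with dry=True the API only reports the operations."""
    params = {"schema": schema}
    if dry:
        params["dry"] = "true"
    return urllib.request.Request(
        f"https://api.tinybird.co/v0/datasources/{datasource}/alter",
        data=urllib.parse.urlencode(params).encode(),
        headers={"Authorization": f"Bearer {TOKEN}"},
        method="POST",
    )

preview = alter_request(
    "stocks",
    "symbol String, date Date, close Float32, concept String",
    dry=True,
)
# urllib.request.urlopen(preview)  # returns the inferred operations, e.g. ADD COLUMN
```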
- POST /v0/datasources/(.+)/truncate¶
Truncates a Data Source in your account. If the Data Source has dependent Materialized Views, those won't be truncated in cascade; if you want to delete data from dependent Materialized Views, you'll have to make a subsequent call to this method for each of them. The Auth token in use must have the `DATASOURCES:CREATE` scope.

Truncating a Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/name/truncate"
This also works for the `quarantine` table of a Data Source. Remember that the quarantine table of a Data Source has the same name with the `_quarantine` suffix.

Truncating the quarantine table of a Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:DROP token>" \
    -X POST "https://api.tinybird.co/v0/datasources/:name_quarantine/truncate"
- POST /v0/datasources/(.+)/delete¶
Deletes rows from a Data Source in your account given a SQL condition. The Auth token in use must have the `DATASOURCES:CREATE` scope.

Deleting rows from a Data Source given a SQL condition¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    --data "delete_condition=(country='ES')" \
    "https://api.tinybird.co/v0/datasources/:name/delete"
When deleting rows from a Data Source, the response is not the final result of the deletion but a Job. You can check the job status and progress using the Jobs API. In the response, `id`, `job_id`, and `delete_id` have the same value:

Delete API Response¶

{
    "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job_url": "https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job": {
        "kind": "delete_data",
        "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
        "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
        "status": "waiting",
        "created_at": "2023-04-11 13:52:32.423207",
        "updated_at": "2023-04-11 13:52:32.423213",
        "started_at": null,
        "is_cancellable": true,
        "datasource": {
            "id": "t_c45d5ae6781b41278fcee365f5bxxxxx",
            "name": "shopping_data"
        },
        "delete_condition": "event = 'search'"
    },
    "status": "waiting",
    "delete_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b"
}
To check on the progress of the delete job, use the `job_id` from the Delete API response to query the Jobs API. For example, to check the status of the above delete job:

Checking the status of the delete job¶

curl \
    -H "Authorization: Bearer <TOKEN>" \
    "https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b"
Would respond with:
Job API Response¶

{
    "kind": "delete_data",
    "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "status": "done",
    "created_at": "2023-04-11 13:52:32.423207",
    "updated_at": "2023-04-11 13:52:37.330020",
    "started_at": "2023-04-11 13:52:32.842861",
    "is_cancellable": false,
    "datasource": {
        "id": "t_c45d5ae6781b41278fcee365f5bc2d35",
        "name": "shopping_data"
    },
    "delete_condition": "event = 'search'",
    "rows_affected": 100
}
Data Source engines supported
Tinybird uses ClickHouse as the underlying storage technology. ClickHouse features different strategies to store data, and these strategies define not only where and how the data is stored but also what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a Table Engine that determines those factors.

Currently, Tinybird supports deleting data from Data Sources with the following engines:

- MergeTree
- ReplacingMergeTree
- SummingMergeTree
- AggregatingMergeTree
- CollapsingMergeTree
- VersionedCollapsingMergeTree
Dependent views deletion
If the Data Source has dependent Materialized Views, those won't be cascade deleted. If you want to delete data from dependent Materialized Views, you'll have to make a subsequent call to this method for each affected view with a proper `delete_condition`. This also applies to the associated `quarantine` Data Source.

Request parameters¶

- `delete_condition` (String): Mandatory. A string representing the `WHERE` SQL clause you'd add to a regular `DELETE FROM <table> WHERE <delete_condition>` statement. Most of the time you'll want a simple `delete_condition` such as `column_name=value`, but any valid SQL expression, including conditional operators, is accepted.
- `dry_run` (String): Default: `false`. Allows you to test the deletion. When using `true`, it executes all deletion validations and returns the number of matched `rows_to_be_deleted`.
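A deletion can be validated first with `dry_run=true` and then submitted for real, polling the returned job until it finishes. A sketch with Python's standard library; the Data Source name and condition are hypothetical, the token is a placeholder, and the request is only built here, not sent:

```python
import urllib.parse
import urllib.request

TOKEN = "<DATASOURCES:CREATE token>"  # placeholder

# dry_run=true runs all deletion validations and reports how many rows
# would match, without deleting anything.
params = {
    "delete_condition": "(country='ES')",
    "dry_run": "true",
}
req = urllib.request.Request(
    "https://api.tinybird.co/v0/datasources/shopping_data/delete",  # hypothetical name
    data=urllib.parse.urlencode(params).encode(),
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="POST",
)
# Dropping dry_run submits the real deletion; the response then includes a
# job_url to poll in the Jobs API until the job status is "done".
```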
- GET /v0/datasources/(.+)¶
- Getting information about a particular Data Source¶
curl \
    -H "Authorization: Bearer <DATASOURCES:READ token>" \
    -X GET "https://api.tinybird.co/v0/datasources/datasource_name"
Get Data Source information and stats. The token provided must have read access to the Data Source.
Successful response¶

{
    "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea",
    "name": "datasource_name",
    "statistics": {
        "bytes": 430833,
        "row_count": 3980
    },
    "used_by": [{
        "id": "t_efdc62b5e67142bd9bf9a7f113a34353",
        "name": "pipe_using_datasource_name"
    }],
    "updated_at": "2018-09-07 23:50:32.322461",
    "created_at": "2018-11-28 23:50:32.322461",
    "type": "csv"
}
Request parameters¶

- `attrs` (String): Comma-separated list of the Data Source attributes to return in the response. Example: `attrs=name,id,engine`. Leave empty to return a full response.

`id` and `name` are two ways to refer to the Data Source in SQL queries and API endpoints. The only difference is that the `id` never changes: it keeps working even if you change the `name` (which is the name used to display the Data Source in the UI). In general, you can use `id` and `name` interchangeably. Using the above response as an example:

select count(1) from events_table

is equivalent to:

select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea

The id `t_bd1c62b5e67142bd9bf9a7f113a2b6ea` is not a descriptive name, so you can add a description like `t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea`.

The `statistics` property contains information about the table. Those numbers are an estimation: `bytes` is the estimated data size on disk and `row_count` the estimated number of rows. These statistics are updated whenever data is appended to the Data Source.

The `used_by` property contains the list of Pipes that are using this Data Source. Only the Pipe `id` and `name` are sent.

The `type` property indicates the `format` used when the Data Source was created. Available formats are `csv`, `ndjson`, and `parquet`. The Data Source `type` indicates what file format you can use to ingest data.
- DELETE /v0/datasources/(.+)¶
- Dropping a Data Source¶
curl \
    -H "Authorization: Bearer <DATASOURCES:DROP token>" \
    -X DELETE "https://api.tinybird.co/v0/datasources/:name"
Drops a Data Source from your account.
Request parameters¶

- `force` (String): Default: `false`. The `force` parameter is taken into account when trying to delete Materialized Views. By default (`false`) the deletion is not carried out; enable it by setting it to `true`. If the given Data Source is being used as the trigger of a Materialized Node, it will not be deleted in any case.
- `dry_run` (String): Default: `false`. Allows you to test the deletion. When using `true`, it executes all deletion validations and returns the possibly affected materializations and other dependencies of the given Data Source.
- `token` (String): Auth token. Only required if no Bearer Authorization header is sent. It must have `DROP:datasource_name` scope for the given Data Source.
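Before dropping a Data Source, `dry_run=true` can be used to list the materializations and other dependencies the drop would affect. A sketch with Python's standard library; the Data Source name is hypothetical, the token is a placeholder, and the request is only built here, not sent:

```python
import urllib.parse
import urllib.request

TOKEN = "<DATASOURCES:DROP token>"  # placeholder

query = urllib.parse.urlencode({"dry_run": "true"})
req = urllib.request.Request(
    f"https://api.tinybird.co/v0/datasources/stocks?{query}",  # hypothetical name
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="DELETE",
)
# urllib.request.urlopen(req)  # lists affected dependencies, drops nothing
```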
- PUT /v0/datasources/(.+)¶
Update Data Source attributes.

Updating the name of a Data Source¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"

Promoting a Data Source to a Snowflake one¶

curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X PUT "https://api.tinybird.co/v0/datasources/:name" \
    -d "connector=1d8232bf-2254-4d68-beff-4dd9aa505ab0" \
    -d "service=snowflake" \
    -d "cron=*/30 * * * *" \
    -d "query=select a, b, c from test" \
    -d "mode=replace" \
    -d "external_data_source=database.schema.table" \
    -d "ingest_now=True"
Request parameters¶

- `name` (String): New name for the Data Source.
- `token` (String): Auth token. Only required if no Bearer Authorization header is sent. It should have `DATASOURCES:CREATE` scope for the given Data Source.
- `connector` (String): Connector ID to link it to.
- `service` (String): Type of service to promote it to. Only `snowflake` or `bigquery` allowed.
- `cron` (String): Cron-like pattern to execute the connector's job.
- `query` (String): Optional. Custom query to collect from the external data source.
- `mode` (String): Only `replace` is allowed for connectors.
- `external_data_source` (String): External data source to use for Snowflake.
- `ingest_now` (Boolean): Set to `true` to ingest the data immediately instead of waiting for the first execution determined by `cron`.