Add data from a local file¶
You can add data from a local file to Tinybird using the Data sources API or the tb datasource CLI command.
Supported file types¶
Tinybird supports these file types and compression formats at ingest time:
File type | Method | Accepted extensions | Compression formats supported |
---|---|---|---|
CSV | File upload, URL | .csv, .csv.gz | gzip |
NDJSON | File upload, URL, Events API | .ndjson, .ndjson.gz | gzip |
Parquet | File upload, URL | .parquet, .parquet.gz | gzip |
Avro | Kafka | | gzip |
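As a quick local check before uploading, the accepted extensions in the table can be matched with a shell `case` statement. This is a minimal sketch: `is_supported_upload` is a hypothetical helper name, not part of the Tinybird CLI, and it only inspects the file name, not the file contents.

```shell
# Hypothetical helper: succeeds only for the file-upload extensions listed in the table.
is_supported_upload() {
  case "$1" in
    *.csv|*.csv.gz|*.ndjson|*.ndjson.gz|*.parquet|*.parquet.gz) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage with illustrative file names.
is_supported_upload "local_file.csv" && echo "local_file.csv: supported extension"
is_supported_upload "events.avro" || echo "events.avro: not a file-upload extension"
```

Avro is deliberately rejected here because the table lists Kafka, not file upload, as its ingestion method.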
Analyze the schema of a file¶
Before you upload data from a file or create a data source, you can analyze the schema of the file. Tinybird infers column names, types, and JSONPaths. This helps you identify the most appropriate data types for your columns. See Data types.
The following example shows how to analyze a local CSV file.
tb datasource analyze local_file.csv
Append data from a file¶
You can append data from a local or remote file to a data source in Tinybird Local or Tinybird Cloud.
The following example shows how to append data from a local file to a data source in Tinybird Cloud:
tb --cloud datasource append <data_source_name> local_file.csv
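The same append can be sketched against the Data sources API with `curl`. Treat this as a hedged sketch under assumptions that this page doesn't state: the `/v0/datasources` endpoint with `mode=append`, the `https://api.tinybird.co` host (the host depends on your region), and a token with append permission stored in `TB_TOKEN`. The `append_csv` function name and the `my_data_source` default are hypothetical.

```shell
# Assumed API host; replace with the host for your region.
TB_HOST="${TB_HOST:-https://api.tinybird.co}"
# Hypothetical data source name used as a default for this sketch.
DATA_SOURCE="${DATA_SOURCE:-my_data_source}"

# Assumed endpoint shape: append mode plus the target data source name.
APPEND_URL="${TB_HOST}/v0/datasources?mode=append&name=${DATA_SOURCE}"

# Hypothetical helper: upload a local CSV file as a multipart form field.
# Expects the token in the TB_TOKEN environment variable.
append_csv() {
  curl -H "Authorization: Bearer ${TB_TOKEN}" \
       -X POST "$APPEND_URL" \
       -F "csv=@$1"
}
```

With a valid token exported, `append_csv local_file.csv` would send the file. Nothing is uploaded until the function is called.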
When appending CSV files, you can improve performance by excluding the CSV header line. In that case, make sure the columns in the CSV file are in the same order as the columns of the data source. If you can't guarantee the order of the columns in your CSV, include the CSV header.
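One way to apply this advice is to drop the header only after confirming that it matches the expected column order. The following is a minimal sketch using standard tools. The sample data, the `expected_header` value, and the `local_file_no_header.csv` output name are assumptions for illustration.

```shell
# Sample file standing in for local_file.csv (hypothetical columns).
printf 'id,name,created_at\n1,alice,2024-01-01\n2,bob,2024-01-02\n' > local_file.csv

# Column order the data source expects (an assumption for this sketch).
expected_header="id,name,created_at"
actual_header="$(head -n 1 local_file.csv)"

if [ "$actual_header" = "$expected_header" ]; then
  # Order matches, so the header line can be dropped before appending.
  tail -n +2 local_file.csv > local_file_no_header.csv
else
  # Order differs or is unknown: keep the header in the file you append.
  echo "Column order differs from the data source; keep the header." >&2
fi
```

The headerless file can then be appended with `tb --cloud datasource append <data_source_name> local_file_no_header.csv`.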
Replace data from a file¶
You can replace all existing data, or a selection of data, in a data source with the contents of a file. The replacement data can come from a local or remote file.
The following example shows how to replace data in Tinybird Cloud:
tb --cloud datasource replace <data_source_name> local_file.csv
Replace data based on conditions¶
Instead of replacing all data, you can replace specific partitions. To do this, define an SQL condition that acts as the filter. Tinybird first deletes all existing rows that match the condition, then ingests the new file. Only the rows in the file that match the condition are ingested.
Replacements are made by partition, so make sure that the condition filters on the partition key of the data source. If the source file contains rows that don't match the filter, those rows are ignored.
The following example shows how to replace partial data using a condition:
tb --cloud datasource replace <data_source_name> local_file.csv --sql-condition "my_partition_key > 123"
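To see locally which rows of a file a condition like the one above would keep, you can mirror it with `awk` before running the replacement. This is only an approximation under assumptions: the sample data is invented, `my_partition_key` is assumed to be the first column, and the actual filtering is done by Tinybird with SQL, not by this script.

```shell
# Sample file standing in for local_file.csv (hypothetical data).
printf 'my_partition_key,value\n100,a\n123,b\n124,c\n200,d\n' > local_file.csv

# Mirror the condition "my_partition_key > 123": keep the header line (NR == 1)
# and every row whose first column is numerically greater than 123.
awk -F',' 'NR == 1 || $1 > 123' local_file.csv > preview.csv

# Show the rows that would be ingested; the rest of the file would be ignored.
cat preview.csv
```

Rows with keys 100 and 123 are left out of the preview, which corresponds to the rows that the replacement ignores.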
All the dependencies of the data source are recalculated so that your data is consistent after the replacement. If you have n-level dependencies, they're also updated by this operation.
Although replacements are atomic, Tinybird can't guarantee data consistency if you keep appending data to any related data source while the replacement takes place. Data that arrives during the replacement is discarded.