GCS Connector¶
You can set up a GCS Connector to load your CSV, NDJSON, or Parquet files into Tinybird from any GCS bucket. Tinybird does not automatically detect new files; ingestion must be triggered manually.
Setting up the GCS Connector requires:
- Configuring a Service Account in GCP.
- Creating a connection file in Tinybird.
- Creating a data source that uses this connection.
Set Up the Connector¶
1. Create a GCS Connection¶
You can create a GCS Connection in Tinybird either by using the CLI or by manually creating a connection file.
Option 1: Use the CLI (Recommended)¶
Run the following command to create a connection:
tb connection create gcs
You will be prompted to enter:
- A name for your connection.
- The GCS bucket name.
- The service account credentials (JSON key file). See the Google Cloud docs for more details.
- Whether to create the connection for your Cloud environment.
Option 2: Manually Create a Connection File¶
Create a .connection file with the required credentials:
gcs_sample.connection
TYPE gcs
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON {{ tb_secret("GCS_KEY") }}
Ensure your GCP Service Account has the roles/storage.objectViewer role.
Use a different Service Account key for each environment by leveraging Tinybird Secrets.
2. Create a GCS Data Source¶
After setting up the connection, create a data source.
Create a .datasource file using tb create --prompt or manually:
gcs_sample.datasource
DESCRIPTION >
    Analytics events landing data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

IMPORT_CONNECTION_NAME gcs_sample
IMPORT_BUCKET_URI gs://my-bucket/*.csv
IMPORT_SCHEDULE '@on-demand'
The IMPORT_CONNECTION_NAME setting must match the name of your .connection file.
3. Sync Data (Trigger Ingestion)¶
Since automatic ingestion (@auto mode) is not supported, you must manually sync data when new files are available.
Using the API:¶
curl -X POST "https://api.tinybird.co/v0/datasources/sync" \
  -H "Authorization: Bearer <your-tinybird-token>" \
  -d '{"name": "<datasource_name>"}'
Using the CLI:¶
tb datasource sync <datasource_name>
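Both triggers call the same Data Sources API endpoint. As a minimal sketch (the endpoint path and payload are taken from the curl example above; the request is built but not sent, since sending requires a real token), you can prepare the sync call in Python:

```python
import json
import urllib.request


def build_sync_request(host: str, token: str, datasource: str) -> urllib.request.Request:
    """Build (without sending) the POST request that triggers an on-demand sync."""
    return urllib.request.Request(
        url=f"{host}/v0/datasources/sync",
        data=json.dumps({"name": datasource}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_sync_request("https://api.tinybird.co", "<your-tinybird-token>", "gcs_sample")
# urllib.request.urlopen(req) would send it once a valid token is in place.
print(req.full_url, req.method)
```

The helper name and the gcs_sample data source are illustrative; substitute your own data source name and token.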
.connection settings¶
The GCS connector uses the following settings in .connection files:
Instruction | Required | Description |
---|---|---|
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON | Yes | Service Account Key in JSON format, inlined. We recommend using Tinybird Secrets. |
Once a connection is used in a data source, you can't change the Service Account Key. To modify it, you must:
- Remove the connection from the data source.
- Deploy the changes.
- Add the connection again with the new values.
.datasource settings¶
The GCS connector uses the following settings in .datasource files:
Instruction | Required | Description |
---|---|---|
IMPORT_CONNECTION_NAME | Yes | Name given to the connection inside Tinybird. For example, 'my_connection' . This is the name of the connection file you created in the previous step. |
IMPORT_BUCKET_URI | Yes | Full bucket path, including the gs:// protocol, bucket name, object path, and an optional pattern to match against object keys. For example, gs://my-bucket/my-path discovers all files in the bucket my-bucket under the prefix /my-path . You can use patterns in the path to filter objects, for example, ending the path with *.csv matches all objects that end with the .csv suffix. |
IMPORT_SCHEDULE | Yes | Use @on-demand to sync new files as needed; only files added to the bucket since the last execution are appended to the data source. You can also use @once , which behaves the same as @on-demand . However, @auto mode is not supported yet; if you use this option, only the initial sync will be executed. |
IMPORT_FROM_TIMESTAMP | No | Sets the date and time from which to start ingesting files in a GCS bucket. The format is YYYY-MM-DDTHH:MM:SSZ . |
We don't support changing these settings after the Data Source is created. If you need to do that, you must:
- Remove the connection from the data source.
- Deploy the changes.
- Add the connection again with the new values.
- Deploy again.
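If you set IMPORT_FROM_TIMESTAMP, the value must be a UTC timestamp in the YYYY-MM-DDTHH:MM:SSZ layout. A minimal Python sketch for producing it (the cutoff date is an arbitrary example):

```python
from datetime import datetime, timezone

# Format a UTC cutoff as the YYYY-MM-DDTHH:MM:SSZ string IMPORT_FROM_TIMESTAMP expects.
cutoff = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
stamp = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
print(stamp)  # 2024-01-01T00:00:00Z
```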
GCS Wildcard Patterns¶
Use GCS wildcards to match multiple files:
- * (single asterisk): Matches files at one directory level.
  - Example: gs://bucket-name/*.ndjson matches all .ndjson files in the root directory, but not in subdirectories.
- ** (double asterisk): Recursively matches files across multiple directory levels.
  - Example: gs://bucket-name/**/*.ndjson matches all .ndjson files anywhere in the bucket.
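To make the two wildcard levels concrete, here is a small illustrative matcher (not Tinybird's actual implementation) in which * stops at / while ** crosses directory levels; it matches against object keys, with the gs://bucket-name/ prefix stripped:

```python
import re


def gcs_match(pattern: str, key: str) -> bool:
    """Toy GCS-style wildcard match: '*' stays within one level, '**' recurses."""
    regex = ""
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            regex += ".*"      # '**' may cross '/' boundaries
            i += 2
        elif pattern[i] == "*":
            regex += "[^/]*"   # '*' stops at the next '/'
            i += 1
        else:
            regex += re.escape(pattern[i])
            i += 1
    return re.fullmatch(regex, key) is not None


print(gcs_match("*.ndjson", "events.ndjson"))             # root-level file: matches
print(gcs_match("*.ndjson", "2024/events.ndjson"))        # subdirectory: no match
print(gcs_match("**/*.ndjson", "2024/01/events.ndjson"))  # '**' recurses: matches
```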
The GCS connector does not allow overlapping ingestion paths. For example, you cannot have:
gs://my_bucket/**/*.csv
gs://my_bucket/transactions/*.csv
Supported File Types¶
The GCS Connector supports the following formats:
File Type | Accepted Extensions | Supported Compression |
---|---|---|
CSV | .csv , .csv.gz | gzip |
NDJSON | .ndjson , .ndjson.gz , .jsonl , .jsonl.gz | gzip |
Parquet | .parquet , .parquet.gz | snappy , gzip , lzo , brotli , lz4 , zstd |
JSON files must follow the Newline Delimited JSON (NDJSON) format. Each line must be a valid JSON object and must end with a \n character.
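As an illustration of that shape, each record serializes to exactly one line terminated by \n (the field names here mirror the example schema above and are otherwise arbitrary):

```python
import json

events = [
    {"timestamp": "2024-01-01 00:00:00", "session_id": "abc", "action": "page_view"},
    {"timestamp": "2024-01-01 00:00:05", "session_id": "abc", "action": "click"},
]

# NDJSON: one JSON object per line, each line terminated by '\n'.
ndjson = "".join(json.dumps(event) + "\n" for event in events)
print(ndjson, end="")
```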
GCS Permissions¶
To authenticate Tinybird with GCS, you need a GCP service account key in JSON format with the Object Storage Viewer role.
- In the Google Cloud Console, create or use an existing service account.
- Assign the roles/storage.objectViewer role.
- Generate a JSON key file and download it.
- Store the key as a Tinybird secret:
tb secret set GCS_KEY '<your-json-key-content>'
Limitations¶
- No @auto mode: Ingestion must be triggered manually.
- File format support: Only CSV, NDJSON, and Parquet are supported.
- Permissions: Ensure your service account has the correct role assigned.