GCS Connector

You can set up a GCS Connector to load your CSV, NDJSON, or Parquet files into Tinybird from any GCS bucket. Tinybird does not automatically detect new files; ingestion must be triggered manually.

Setting up the GCS Connector requires:

  1. Configuring a Service Account in GCP.
  2. Creating a connection file in Tinybird.
  3. Creating a data source that uses this connection.

Set Up the Connector

1. Create a GCS Connection

You can create a GCS Connection in Tinybird using either the CLI or by manually creating a connection file.

Option 1: Use the CLI

Run the following command to create a connection:

tb connection create gcs

You will be prompted to enter:

  1. A name for your connection.
  2. The GCS bucket name.
  3. The service account credentials (JSON key file). See the Google Cloud docs for more details.
  4. Whether to create the connection for your Cloud environment.

Option 2: Manually Create a Connection File

Create a .connection file with the required credentials:

gcs_sample.connection
TYPE gcs
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON {{ tb_secret("GCS_KEY") }}

Ensure your GCP Service Account has the roles/storage.objectViewer role.

Use different Service Account keys for each environment leveraging Tinybird Secrets.

2. Create a GCS Data Source

After setting up the connection, create a data source.

Create a .datasource file using tb create --prompt or manually:

gcs_sample.datasource
DESCRIPTION >
    Analytics events landing data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

IMPORT_CONNECTION_NAME gcs_sample
IMPORT_BUCKET_URI gs://my-bucket/*.csv
IMPORT_SCHEDULE '@on-demand'

The IMPORT_CONNECTION_NAME setting must match the name of your .connection file.

3. Sync Data (Trigger Ingestion)

Since automatic ingestion (@auto mode) is not supported, you must manually sync data when new files are available.

Using the API:

curl -X POST "https://api.tinybird.co/v0/datasources/sync" \
  -H "Authorization: Bearer <your-tinybird-token>" \
  -d '{"name": "<datasource_name>"}'

Using the CLI:

tb datasource sync <datasource_name>
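If you trigger syncs from your own tooling, the API call above can be scripted. The following is a minimal Python sketch that builds the same request as the curl example, assuming the endpoint and JSON payload shown there; the token and data source name are placeholders:

```python
import json
import urllib.request

TINYBIRD_API = "https://api.tinybird.co"

def build_sync_request(datasource_name: str, token: str) -> urllib.request.Request:
    """Build the POST request that triggers an on-demand sync,
    mirroring the curl example above."""
    body = json.dumps({"name": datasource_name}).encode()
    return urllib.request.Request(
        url=f"{TINYBIRD_API}/v0/datasources/sync",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_sync_request("gcs_sample", "<your-tinybird-token>")
    # Uncomment to actually trigger the sync:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```

Separating request construction from sending makes the call easy to test and to wrap in retry logic, for example in a cron job that runs after your upstream pipeline drops new files into the bucket.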

.connection settings

The GCS connector uses the following settings in .connection files:

Instruction | Required | Description
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON | Yes | Service Account Key in JSON format, inlined. We recommend using Tinybird Secrets.

Once a connection is used in a data source, you can't change the Service Account Key. To modify it, you must:

  1. Remove the connection from the data source.
  2. Deploy the changes.
  3. Add the connection again with the new values.

.datasource settings

The GCS connector uses the following settings in .datasource files:

Instruction | Required | Description
IMPORT_CONNECTION_NAME | Yes | Name given to the connection inside Tinybird. For example, 'my_connection'. This is the name of the connection file you created in the previous step.
IMPORT_BUCKET_URI | Yes | Full bucket path, including the gs:// protocol, bucket name, object path, and an optional pattern to match against object keys. For example, gs://my-bucket/my-path discovers all files in the bucket my-bucket under the prefix /my-path. You can use patterns in the path to filter objects; for example, ending the path with *.csv matches all objects that end with the .csv suffix.
IMPORT_SCHEDULE | Yes | Use @on-demand to sync new files as needed; only files added to the bucket since the last execution are appended to the data source. You can also use @once, which behaves the same as @on-demand. @auto mode is not supported yet; if you use it, only the initial sync runs.
IMPORT_FROM_TIMESTAMP | No | Sets the date and time from which to start ingesting files in a GCS bucket. The format is YYYY-MM-DDTHH:MM:SSZ.
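The IMPORT_FROM_TIMESTAMP value must be a UTC timestamp with second precision. A small helper (nothing Tinybird-specific, just standard-library formatting) that produces it from a datetime:

```python
from datetime import datetime, timezone

def import_from_timestamp(dt: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:MM:SSZ, the format
    IMPORT_FROM_TIMESTAMP expects (UTC, second precision)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(import_from_timestamp(datetime(2024, 3, 1, 12, 30, 0, tzinfo=timezone.utc)))
# → 2024-03-01T12:30:00Z
```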

Changing these settings after the data source is created isn't supported. If you need to change them, you must:

  1. Remove the connection from the data source.
  2. Deploy the changes.
  3. Add the connection again with the new values.
  4. Deploy again.

GCS Wildcard Patterns

Use GCS wildcards to match multiple files:

  • * (single asterisk): Matches files at one directory level.
    • Example: gs://bucket-name/*.ndjson (matches all .ndjson files in the root directory, but not in subdirectories).
  • ** (double asterisk): Recursively matches files across multiple directory levels.
    • Example: gs://bucket-name/**/*.ndjson (matches all .ndjson files anywhere in the bucket).
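The two wildcard rules above can be sketched as a regex translation: `**` crosses directory levels, while `*` stays within one. This is an illustrative matcher, not the connector's implementation:

```python
import re

def gcs_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate GCS-style wildcards into a regex:
    '**' matches across directory levels, '*' within a single level."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")   # may span '/' separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # stops at '/' separators
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

flat = gcs_pattern_to_regex("gs://bucket-name/*.ndjson")
deep = gcs_pattern_to_regex("gs://bucket-name/**/*.ndjson")

print(bool(flat.match("gs://bucket-name/events.ndjson")))          # True
print(bool(flat.match("gs://bucket-name/2024/events.ndjson")))     # False
print(bool(deep.match("gs://bucket-name/2024/03/events.ndjson")))  # True
```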

GCS does not allow overlapping ingestion paths. For example, you cannot have:

  • gs://my_bucket/**/*.csv
  • gs://my_bucket/transactions/*.csv

Supported File Types

The GCS Connector supports the following formats:

File Type | Accepted Extensions | Supported Compression
CSV | .csv, .csv.gz | gzip
NDJSON | .ndjson, .ndjson.gz, .jsonl, .jsonl.gz | gzip
Parquet | .parquet, .parquet.gz | snappy, gzip, lzo, brotli, lz4, zstd

JSON files must follow the Newline Delimited JSON (NDJSON) format. Each line must be a valid JSON object and must end with a \n character.
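To produce files the connector will accept, write one standalone JSON object per line, each terminated by \n. A sketch that writes a gzip-compressed .ndjson.gz file (the event fields are illustrative, matching the sample schema earlier):

```python
import gzip
import json

# Hypothetical sample events matching the gcs_sample.datasource schema above.
events = [
    {"timestamp": "2024-03-01 12:30:00", "session_id": "s1",
     "action": "click", "version": "1.0", "payload": "{}"},
    {"timestamp": "2024-03-01 12:30:05", "session_id": "s2",
     "action": "view", "version": "1.0", "payload": "{}"},
]

# One JSON object per line, each ending with '\n' -- the NDJSON rule above.
ndjson = "".join(json.dumps(e) + "\n" for e in events)

# Write a gzip-compressed copy, one of the accepted extensions.
with gzip.open("events.ndjson.gz", "wt", encoding="utf-8") as f:
    f.write(ndjson)
```

Note that json.dumps on a whole list (a single JSON array) would not be valid NDJSON; each object must be serialized on its own line.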

GCS Permissions

To authenticate Tinybird with GCS, you need a GCP service account key in JSON format with the Storage Object Viewer role (roles/storage.objectViewer).

  1. In the Google Cloud Console, create or use an existing service account.
  2. Assign the roles/storage.objectViewer role.
  3. Generate a JSON key file and download it.
  4. Store the key as a Tinybird secret:
tb secret set GCS_KEY '<your-json-key-content>'

Limitations

  • No @auto mode: Ingestion must be triggered manually.
  • File format support: Only CSV, NDJSON, and Parquet are supported.
  • Permissions: Ensure your service account has the correct role assigned.