Ingest data from BigQuery using Google Cloud Storage

Read on to learn how to send data from BigQuery to Tinybird, for example when you need to periodically run full replaces of a table or do a one-off ingest.

This process relies on BigQuery's export capabilities: you bulk export the data as gzipped CSV files to Google Cloud Storage and then ingest them using the Data Sources API.

Prerequisites

To follow these steps, you need a Tinybird account and access to BigQuery and Google Cloud Storage.

1. Export the BigQuery table

The first step is to unload the BigQuery table, or a query result set, to one or more gzipped CSV files in a Google Cloud Storage bucket.

Run the following SQL statement to unload the data:

EXPORT DATA
  OPTIONS (
    uri = 'gs://bucket/folder/*.csv',
    format = 'CSV',
    compression = 'GZIP',
    overwrite = true,
    header = true,
    field_delimiter = ';')
AS (
  SELECT field1, field2
  FROM mydataset.table1
  ORDER BY field1
);

See the EXPORT DATA statement reference in the Google Cloud docs for more details.

To automate the unloading, you can create a BigQuery Scheduled Query.
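
For example, you can create the scheduled query programmatically with the BigQuery Data Transfer Service Python client. The following is a minimal sketch, not a definitive setup: the project ID, display name, and schedule are placeholders, and the query is the EXPORT DATA statement from the previous step.

import textwrap
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# The EXPORT DATA statement from the previous step
export_query = textwrap.dedent("""
    EXPORT DATA
      OPTIONS (
        uri = 'gs://bucket/folder/*.csv',
        format = 'CSV',
        compression = 'GZIP',
        overwrite = true,
        header = true,
        field_delimiter = ';')
    AS (
      SELECT field1, field2
      FROM mydataset.table1
      ORDER BY field1
    );
""")

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="daily-export-to-gcs",  # placeholder name
    data_source_id="scheduled_query",
    params={"query": export_query},
    schedule="every 24 hours",  # adjust to your cadence
)

transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path("your-project-id"),  # placeholder project
    transfer_config=transfer_config,
)
print(f"Created scheduled query: {transfer_config.name}")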

2. Ingest data into Tinybird

After your data is in a Google Cloud Storage bucket, you can ingest it into Tinybird by generating a signed URL for each file and sending it to the Data Sources API.
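
At its core, ingesting one file is a single POST request to the Data Sources API with the signed URL in the url parameter. The following is a minimal sketch, assuming a Data Source named bigquery_export (a placeholder) and a signed URL you have already generated:

import requests

TINYBIRD_HOST = "https://api.tinybird.co"
TINYBIRD_TOKEN = "your_tinybird_token"

# Signed URL generated for one of the exported files (placeholder)
signed_url = "https://storage.googleapis.com/..."

response = requests.post(
    f"{TINYBIRD_HOST}/v0/datasources",
    params={"name": "bigquery_export", "mode": "append", "format": "csv"},
    headers={"Authorization": f"Bearer {TINYBIRD_TOKEN}"},
    data={"url": signed_url},
)
print(response.status_code, response.text)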

You can use the following example script to automate this for every file in the bucket:

import time
from datetime import timedelta

import requests
from google.cloud import storage
from google.oauth2 import service_account

# Configuration
BUCKET_NAME = "your-gcs-bucket"
TINYBIRD_HOST = "https://api.tinybird.co"
TINYBIRD_DATASOURCE = "your_datasource_name"
TINYBIRD_TOKEN = "your_tinybird_token"
SERVICE_ACCOUNT_JSON = "path/to/service-account.json"
URL_EXPIRATION = 3600  # 1 hour

# Authenticate with GCS
credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_JSON)
storage_client = storage.Client(credentials=credentials)

def generate_signed_url(blob_name):
    """Generate a signed URL for a blob in GCS."""
    bucket = storage_client.bucket(BUCKET_NAME)
    blob = bucket.blob(blob_name)
    url = blob.generate_signed_url(expiration=timedelta(seconds=URL_EXPIRATION))
    return url

def list_gcs_files():
    """List files in the specified GCS bucket."""
    bucket = storage_client.bucket(BUCKET_NAME)
    # Use list_blobs(prefix="folder/") if the export writes to a specific folder
    return [blob.name for blob in bucket.list_blobs()]

def send_to_tinybird(url, tinybird_token, datasource_name):
    """Send one signed URL to the Tinybird Data Sources API."""
    # The exported files are gzipped CSVs, so use format=csv
    endpoint = f"{TINYBIRD_HOST}/v0/datasources?name={datasource_name}&mode=append&format=csv&token={tinybird_token}"

    data = {"url": url}
    response = requests.post(endpoint, data=data)

    if response.status_code == 200:
        print(f"Successfully sent {url} to Tinybird")
    else:
        print(f"Failed to send data: {response.text}")

def main():
    print("Fetching GCS files...")
    files = list_gcs_files()
    signed_urls = [generate_signed_url(file) for file in files]
    print(f"Generated {len(signed_urls)} signed URLs.")

    print("Sending to Tinybird...")
    for url in signed_urls:
        send_to_tinybird(url, TINYBIRD_TOKEN, TINYBIRD_DATASOURCE)
        # Wait between requests to stay within Data Sources API rate limits
        time.sleep(15)

if __name__ == "__main__":
    main()
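
Note that the script appends each file to the Data Source. If you need periodic full replaces of the table, as mentioned in the introduction, the Data Sources API also supports mode=replace; check the Data Sources API docs for how replaces behave when you ingest multiple files.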

For more detailed use cases, see the Ingest from Google Cloud Storage guide.

Limits

Because you're using the Data Sources API, its limits apply.

Remember that you can limit the size of the exported files to stay within your Tinybird plan limits; exporting with a wildcard URI, as in the example above, splits the data across multiple files.

Next steps

See the following resources:
