Ingest data from BigQuery using Google Cloud Storage¶
Read on to learn how to send data from BigQuery to Tinybird, for example when you need to periodically run full replaces of a table or do a one-off ingest.
This process relies on BigQuery's export capabilities: bulk exporting data as gzipped CSV files to Google Cloud Storage and then ingesting them using the Data Sources API.
Prerequisites¶
To follow these steps, you need a Tinybird account and access to BigQuery and Google Cloud Storage.
Export the BigQuery table¶
The first step is to unload the BigQuery table, or a query result set, to gzipped CSV files in a Google Cloud Storage bucket.
Run the following SQL statement to unload the data:
```sql
EXPORT DATA OPTIONS (
  uri = 'gs://bucket/folder/*.csv',
  format = 'CSV',
  compression = 'GZIP',
  overwrite = true,
  header = true,
  field_delimiter = ';'
) AS (
  SELECT field1, field2
  FROM mydataset.table1
  ORDER BY field1
);
```
You can find more details about the EXPORT DATA statement in the Google Cloud docs.
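If you prefer to trigger the export programmatically, for example for a one-off ingest, you can run the same statement with the BigQuery Python client. The following is a minimal sketch; the project, bucket, and table names are placeholders you should replace with your own:

```python
from google.cloud import bigquery

# Placeholder project ID: replace with your own GCP project.
client = bigquery.Client(project="your-gcp-project")

export_sql = """
EXPORT DATA OPTIONS (
  uri = 'gs://bucket/folder/*.csv',
  format = 'CSV',
  compression = 'GZIP',
  overwrite = true,
  header = true,
  field_delimiter = ';'
) AS (
  SELECT field1, field2 FROM mydataset.table1 ORDER BY field1
);
"""

# EXPORT DATA runs as a regular query job; result() blocks until the export finishes.
job = client.query(export_sql)
job.result()
```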
To automate the unloading, you can create a BigQuery scheduled query. See the Google Cloud official docs for scheduled queries.
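As an illustration, here's a sketch of creating that scheduled query with the BigQuery Data Transfer Service Python client (`google-cloud-bigquery-datatransfer`). The project ID, display name, and schedule are assumptions to adapt to your setup:

```python
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# Placeholder project ID: replace with your own GCP project.
parent = transfer_client.common_project_path("your-gcp-project")

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="Export table1 to GCS",          # assumed name
    data_source_id="scheduled_query",
    # Use the full EXPORT DATA statement from the previous step as the query.
    params={"query": "EXPORT DATA OPTIONS (...) AS (SELECT field1, field2 FROM mydataset.table1)"},
    schedule="every 24 hours",                     # assumed schedule
)

transfer_config = transfer_client.create_transfer_config(
    parent=parent,
    transfer_config=transfer_config,
)
print(f"Created scheduled query: {transfer_config.name}")
```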
Ingest data into Tinybird¶
After your data is in a Google Cloud Storage bucket, you can ingest it into Tinybird by generating signed URLs for the exported files and sending them to the Data Sources API.
You can use this example script:
```python
import time
from datetime import timedelta

import requests
from google.cloud import storage
from google.oauth2 import service_account

# Configuration
BUCKET_NAME = "your-gcs-bucket"
TINYBIRD_HOST = "https://api.tinybird.co"
TINYBIRD_DATASOURCE = "your_datasource_name"
TINYBIRD_TOKEN = "your_tinybird_token"
SERVICE_ACCOUNT_JSON = "path/to/service-account.json"
URL_EXPIRATION = 3600  # 1 hour

# Authenticate with GCS
credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_JSON)
storage_client = storage.Client(credentials=credentials)


def generate_signed_url(blob_name):
    """Generate a signed URL for a blob in GCS."""
    bucket = storage_client.bucket(BUCKET_NAME)
    blob = bucket.blob(blob_name)
    return blob.generate_signed_url(expiration=timedelta(seconds=URL_EXPIRATION))


def list_gcs_files():
    """List files in the specified GCS bucket."""
    bucket = storage_client.bucket(BUCKET_NAME)
    return [blob.name for blob in bucket.list_blobs()]


def send_to_tinybird(url, tinybird_token, datasource_name):
    """Send a signed URL to the Tinybird Data Sources API."""
    # The exported files are gzipped CSV, so the import format is csv.
    u = f"{TINYBIRD_HOST}/v0/datasources?name={datasource_name}&mode=append&format=csv&token={tinybird_token}"
    data = {"url": url}
    response = requests.post(u, data=data)
    if response.status_code == 200:
        print("Successfully sent data to Tinybird")
    else:
        print(f"Failed to send data: {response.text}")


def main():
    print("Fetching GCS files...")
    files = list_gcs_files()
    signed_urls = [generate_signed_url(file) for file in files]
    print(f"Generated {len(signed_urls)} signed URLs.")
    print("Sending to Tinybird...")
    for url in signed_urls:
        send_to_tinybird(url, TINYBIRD_TOKEN, TINYBIRD_DATASOURCE)
        time.sleep(15)


if __name__ == "__main__":
    main()
```
For more detailed use cases, check the Ingest from Google Cloud Storage guide.
Limits¶
Because you're using the Data Sources API, its limits apply.
Remember that you can limit the size of the exported files to accommodate your Tinybird plan's limits.
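For example, a minimal sketch (reusing the `storage_client` and `BUCKET_NAME` from the script above) that skips exported files over a configurable size threshold before generating signed URLs; the threshold value is an assumption you should adjust to your plan:

```python
# Hypothetical threshold; adjust it to your Tinybird plan's import limits.
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024 * 1024  # 5 GB


def list_files_within_limit(storage_client, bucket_name):
    """Return the names of blobs whose size is within the allowed limit."""
    ok, too_big = [], []
    for blob in storage_client.list_blobs(bucket_name):
        (ok if blob.size <= MAX_FILE_SIZE_BYTES else too_big).append(blob.name)
    if too_big:
        print(f"Skipping {len(too_big)} file(s) over the size limit: {too_big}")
    return ok
```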
Next steps¶
See the following resources: