Ingest from Snowflake using Azure Blob Storage¶
Read on to learn how to send data from Snowflake to Tinybird, for example when you need to periodically run full replaces of a table or do a one-off ingest.
This process relies on unloading, or bulk exporting, data as gzipped CSVs and then ingesting them using the Data Sources API.
Prerequisites¶
To follow these steps you need a Tinybird account, access to Snowflake, and permission to create SAS tokens for Azure Blob Storage.
Unload the Snowflake table¶
Snowflake lets you unload query results to flat files in an external storage service. For example:
COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
FROM mytable
CREDENTIALS = ( AZURE_SAS_TOKEN = '****' )
FILE_FORMAT = ( TYPE = CSV COMPRESSION = GZIP )
HEADER = FALSE;
The most basic approach is unloading directly to the container, but for production use cases consider adding a named stage, as suggested in the Snowflake docs. Stages give you fine-grained control over access rights.
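For reference, here is a minimal sketch of unloading through a named stage. The stage name my_unload_stage is hypothetical and the credentials are placeholders, as in the example above:

-- Hypothetical named stage pointing at the same container.
CREATE STAGE my_unload_stage
  URL = 'azure://myaccount.blob.core.windows.net/unload/'
  CREDENTIALS = ( AZURE_SAS_TOKEN = '****' )
  FILE_FORMAT = ( TYPE = CSV COMPRESSION = GZIP );

-- Unload into the stage instead of referencing the container URL directly.
COPY INTO @my_unload_stage/mytable/
FROM mytable
HEADER = FALSE;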
Create a SAS token for the file¶
Using Azure CLI, generate a shared access signature (SAS) token so Tinybird can read the file:
az storage blob generate-sas \
  --account-name myaccount \
  --account-key '****' \
  --container-name unload \
  --name data.csv.gz \
  --permissions r \
  --expiry <expiry-ts> \
  --https-only \
  --output tsv \
  --full-uri

The command prints the signed URL for the file:

'https://myaccount.blob.core.windows.net/unload/data.csv.gz?se=2024-05-31T10%3A57%3A41Z&sp=r&spr=https&sv=2022-11-02&sr=b&sig=PMC%2E9ZvOFtKATczsBQgFSsH1%2BNkuJvO9dDPkTpxXH0g%5D'
You can do the same with S3 and GCS, which also support generating presigned URLs.
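For example, on S3 the AWS CLI can produce a presigned URL for an unloaded file (a quick sketch, assuming a hypothetical bucket and object path):

aws s3 presign s3://my-bucket/unload/data.csv.gz --expires-in 3600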
Ingest into Tinybird¶
Take the generated URL and make a call to Tinybird. You need a Token with DATASOURCES:CREATE permissions:
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources?name=my_datasource_name" \
  -d url='https://myaccount.blob.core.windows.net/unload/data.csv.gz?se=2024-05-31T10%3A57%3A41Z&sp=r&spr=https&sv=2022-11-02&sr=b&sig=PMC%2E9ZvOFtKATczsBQgFSsH1%2BNkuJvO9dDPkTpxXH0g%5D'
You now have your Snowflake table in Tinybird.
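Importing from a URL may run asynchronously, so the API response can arrive before the data has finished loading. As a quick check (a sketch, assuming your Token also has read permissions), you can list your Data Sources and confirm my_datasource_name appears:

curl \
  -H "Authorization: Bearer <token>" \
  "https://api.tinybird.co/v0/datasources"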
Automation¶
For production scenarios, such as appending data on a schedule or replacing data that has been updated in Snowflake, you might need to define scheduled actions to move the data. See examples in the Ingest from Google Cloud Storage guide and in the Schedule data ingestion with cron and GitHub Actions guide.
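For illustration, here is a minimal shell sketch of a scheduled full replace. It assumes the unload has already run in Snowflake, that generate_sas_url.sh is a hypothetical wrapper around the az command shown earlier, and that TB_TOKEN holds a Token with the required scope. It uses the Data Sources API's mode=replace parameter to swap the contents of an existing Data Source:

#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper that prints a fresh SAS-signed URL for the unloaded file.
SIGNED_URL="$(./generate_sas_url.sh)"

# Replace the contents of the existing Data Source with the new export.
curl \
  -H "Authorization: Bearer $TB_TOKEN" \
  -X POST "https://api.tinybird.co/v0/datasources?mode=replace&name=my_datasource_name" \
  -d url="$SIGNED_URL"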
Limits¶
Because you're using the Data Sources API, its limits apply.
You might need to adjust your COPY INTO <location> expression by adding PARTITION BY or MAX_FILE_SIZE = 5000000000. For example:
COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
FROM mytable
CREDENTIALS = ( AZURE_SAS_TOKEN = '****' )
FILE_FORMAT = ( TYPE = CSV COMPRESSION = GZIP )
HEADER = FALSE
MAX_FILE_SIZE = 5000000000;
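If you go the PARTITION BY route, here is a sketch assuming a hypothetical created_at column; Snowflake writes separate files under a prefix per partition value:

COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
FROM mytable
PARTITION BY ('date=' || TO_VARCHAR(created_at, 'YYYY-MM-DD'))
CREDENTIALS = ( AZURE_SAS_TOKEN = '****' )
FILE_FORMAT = ( TYPE = CSV COMPRESSION = GZIP )
HEADER = FALSE;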
Next steps¶
See the following resources: