Ingest from Snowflake using Azure Blob Storage

Read on to learn how to send data from Snowflake to Tinybird, for example when you need to periodically run full replaces of a table or do a one-off ingest.

This process relies on unloading, or bulk exporting, data as gzipped CSVs and then ingesting them using the Data Sources API.

Prerequisites

To follow these steps, you need a Tinybird account, access to Snowflake, and permission to create SAS tokens for Azure Blob Storage.

1. Unload the Snowflake table

Snowflake lets you unload query results as flat files to an external storage service. For example:

COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
  FROM mytable
  CREDENTIALS = ( AZURE_SAS_TOKEN='****' )
  FILE_FORMAT = ( TYPE = CSV  COMPRESSION = GZIP )
  HEADER = FALSE;

The most basic implementation is unloading directly, but for production use cases consider a named stage, as suggested in the Snowflake docs and sketched below. Stages give you fine-grained control over access rights.
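
As a minimal sketch, a named stage could wrap the same Azure container so the SAS token doesn't travel with every COPY INTO. The stage and table names here are illustrative:

-- Create a named external stage pointing at the Azure container (illustrative names)
CREATE OR REPLACE STAGE my_unload_stage
  URL = 'azure://myaccount.blob.core.windows.net/unload/'
  CREDENTIALS = ( AZURE_SAS_TOKEN = '****' )
  FILE_FORMAT = ( TYPE = CSV COMPRESSION = GZIP );

-- Unload into the stage instead of passing credentials in the COPY INTO statement
COPY INTO @my_unload_stage/mytable/
  FROM mytable
  HEADER = FALSE;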

2. Create a SAS token for the file

Using Azure CLI, generate a shared access signature (SAS) token so Tinybird can read the file:

az storage blob generate-sas \
    --account-name myaccount \
    --account-key '****' \
    --container-name unload \
    --name data.csv.gz \
    --permissions r \
    --expiry <expiry-ts> \
    --https-only \
    --output tsv \
    --full-uri

> 'https://myaccount.blob.core.windows.net/unload/data.csv.gz?se=2024-05-31T10%3A57%3A41Z&sp=r&spr=https&sv=2022-11-02&sr=b&sig=PMC%2E9ZvOFtKATczsBQgFSsH1%2BNkuJvO9dDPkTpxXH0g%5D'

You can follow the same approach with S3 and GCS by generating presigned URLs.
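
For example, assuming an equivalent file layout, the S3 and GCS counterparts might look like this (bucket names, paths, and the key file are illustrative):

# S3: generate a presigned URL valid for 1 hour
aws s3 presign s3://my-bucket/unload/data.csv.gz --expires-in 3600

# GCS: sign a URL with a service account key file
gsutil signurl -d 1h service-account.json gs://my-bucket/unload/data.csv.gz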

3. Ingest into Tinybird

Take the generated URL and make a call to Tinybird. You need a Token with DATASOURCES:CREATE permissions:

curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?name=my_datasource_name" \
-d url='https://myaccount.blob.core.windows.net/unload/data.csv.gz?se=2024-05-31T10%3A57%3A41Z&sp=r&spr=https&sv=2022-11-02&sr=b&sig=PMC%2E9ZvOFtKATczsBQgFSsH1%2BNkuJvO9dDPkTpxXH0g%5D'

You now have your Snowflake table in Tinybird.
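
Ingesting from a URL runs asynchronously as an import job. If you want to confirm it finished, a minimal check might look like this, assuming the create response returns a job id you can poll through the Jobs API (the exact response field name may vary):

# Poll the import job returned by the previous call until its status is done
curl \
-H "Authorization: Bearer <token>" \
"https://api.tinybird.co/v0/jobs/<job_id>"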

Automation

For production scenarios, such as appending data on a schedule or replacing data that has been updated in Snowflake, you might need to define scheduled actions to move the data; a rough sketch follows. See examples in the Ingest from Google Cloud Storage guide and in the Schedule data ingestion with cron and GitHub Actions guide.
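
As a rough sketch, a scheduled script could regenerate the SAS URL for a fresh unload and call the Data Sources API with mode=replace to swap the contents of the Data Source. The token, account, container, and Data Source name are placeholders:

#!/bin/bash
# Sketch: full replace of an existing Data Source from a freshly unloaded file
TB_TOKEN="<DATASOURCES:CREATE token>"
DATASOURCE_NAME="my_datasource_name"

# Generate a short-lived read-only SAS URL for the unloaded file
FILE_URL="$(az storage blob generate-sas \
    --account-name myaccount \
    --account-key '****' \
    --container-name unload \
    --name data.csv.gz \
    --permissions r \
    --expiry <expiry-ts> \
    --https-only \
    --output tsv \
    --full-uri)"

# Replace the Data Source contents with the new file
curl \
-H "Authorization: Bearer $TB_TOKEN" \
-X POST "https://api.tinybird.co/v0/datasources?mode=replace&name=$DATASOURCE_NAME" \
-d url="$FILE_URL"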

Limits

Because you're using the Data Sources API, its limits apply.

You might need to adjust your COPY INTO <location> statement by adding PARTITION BY or MAX_FILE_SIZE = 5000000000. For example:

COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
  FROM mytable
  CREDENTIALS = ( AZURE_SAS_TOKEN='****' )
  FILE_FORMAT = ( TYPE = CSV  COMPRESSION = GZIP )
  HEADER = FALSE
  MAX_FILE_SIZE = 5000000000;
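
If a single file would still be too large, a partitioned unload is an option. This sketch assumes a hypothetical event_date column to partition by:

COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
  FROM mytable
  PARTITION BY ('dt=' || TO_VARCHAR(event_date))
  CREDENTIALS = ( AZURE_SAS_TOKEN='****' )
  FILE_FORMAT = ( TYPE = CSV  COMPRESSION = GZIP )
  HEADER = FALSE
  MAX_FILE_SIZE = 5000000000;

Note that a partitioned unload produces multiple files, so you would need to generate a SAS URL and make an ingest call for each one, using append mode after the first.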

Next steps

See the following resources:
