Ingest from Snowflake via unloading

In this guide you'll learn how to send data from Snowflake to Tinybird, for scenarios where the native connector can't be used —things outside a one-off load or periodical full replaces of the table, or where limits apply—.

This process relies on unloading (aka bulk export) data as gzipped CSVs and then ingesting via Data Sources API.

This guide explains the process using Azure Blob Storage, but it's easy to replicate using Amazon S3, Google Cloud Storage, or any storage service where you can unload data from Snowflake and share presigned URLs to access the files.

This guide is a walkthrough of the most common, basic process: Unload the table from Snowflake, then ingest this export into Tinybird.

Prerequisites

This guide assumes you have a Tinybird account, you are familiar with creating a Tinybird Workspace and pushing resources to it.

You will also need access to Snowflake, and permissions to create SAS Tokens for Azure Blob Storage or its equivalents in AWS S3 and Google Cloud Storage.

1. Unload the Snowflake table

Snowflake makes it really easy to unload query results to flat files to and external storage service.

COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
  FROM mytable
  CREDENTIALS = ( AZURE_SAS_TOKEN='****' )
  FILE_FORMAT = ( TYPE = CSV  COMPRESSION = GZIP )
  HEADER = FALSE;

The most basic implementation is unloading directly, but for production use cases consider adding a named stage as suggested in the docs. Stages will give you more fine-grained control to grant access rights.

2. Create a SAS token for the file

Using Azure CLI, generate a shared access signature (SAS) token so Tinybird can read the file:

az storage blob generate-sas \
    --account-name myaccount \
    --account-key '****' \
    --container-name unload \
    --name data.csv.gz \
    --permissions r \
    --expiry <expiry-ts> \
    --https-only \
    --output tsv \
    --full-uri

> 'https://myaccount.blob.core.windows.net/unload/data.csv.gz?se=2024-05-31T10%3A57%3A41Z&sp=r&spr=https&sv=2022-11-02&sr=b&sig=PMC%2E9ZvOFtKATczsBQgFSsH1%2BNkuJvO9dDPkTpxXH0g%5D'

Use the same behavior in S3 and GCS to generate pre-signed URLs.

3. Ingest into Tinybird

Take that generated URL and make a call to Tinybird. You'll need a Token with DATASOURCES:CREATE permissions:

curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?name=my_datasource_name" \
-d url='https://myaccount.blob.core.windows.net/unload/data.csv.gz?se=2024-05-31T10%3A57%3A41Z&sp=r&spr=https&sv=2022-11-02&sr=b&sig=PMC%2E9ZvOFtKATczsBQgFSsH1%2BNkuJvO9dDPkTpxXH0g%5D'

You should now have your Snowflake Table in Tinybird.

Automation

To adapt to more "real-life" scenarios (like having to append data on a timely basis, replace data that has been updated in Snowflake, etc.), you may need to define scheduled actions to move the data. You can see examples in the Ingest from Google Cloud Storage guide and in Schedule data ingestion with cron and GitHub Actions guide.

Limits

You will be using Data Sources API, so its limits apply:

DescriptionLimit
Append/Replace data to Data Source5 times per minute
Max file size (uncompressed)Free plan 10GB
Max file size (uncompressed)pro and enterprise 32GB

As a result of these limits, you may need to adjust your COPY INTO <location> expression adding PARTITION or MAX_FILE_SIZE = 5000000000.

COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
  FROM mytable 
  CREDENTIALS=( AZURE_SAS_TOKEN='****')
  FILE_FORMAT = ( TYPE = CSV  COMPRESSION = GZIP )
  HEADER = FALSE
  MAX_FILE_SIZE = 5000000000;

Next steps

These resources may be useful:

Updated