Snowflake Connector

Use the Snowflake Connector to load data from your existing Snowflake account into Tinybird so that you can quickly turn it into high-concurrency, low-latency REST APIs.

The Snowflake Connector is fully managed and requires no additional tooling. You can define a sync schedule inside Tinybird and execution is taken care of for you.

With the Snowflake Connector you can:

  • Start ingesting data instantly from Snowflake using SQL.
  • Use SQL to query, shape, and join your Snowflake data with other sources.
  • Use Auth tokens to control access to API endpoints, implement access policies as you need, and apply row-level security.

Snowflake IP filtering isn't supported by the Snowflake Connector. If you need to filter IPs, use the GCS/S3 Connector.

Load a Snowflake table

Load a Snowflake table in the UI

To add a Snowflake table as a Data Source, follow these steps.

Create a connection

Create a new Data Source using the Snowflake Connector dialog:

  1. Open Tinybird and add a new Data Source by selecting Create new (+) next to the Data Sources section.
  2. In the Data Sources dialog, select the Snowflake connector.
  3. Enter your Snowflake Account Identifier. To find it, log in to Snowflake, open your account info, and copy the Account Identifier.
  4. In Tinybird, in the Connection details dialog, configure authentication with your Snowflake account. Enter your username, password, and Account Identifier.
  5. Select the role and warehouse to access your data.
  6. Copy the SQL snippet from the text box. The snippet creates a new Snowflake Storage Integration linking your Snowflake account with a Tinybird staging area for your Workspace. It also grants permission to the given role to create new Stages to unload data from your Snowflake Account into Tinybird.
  7. With the SQL query copied, open a new SQL Worksheet inside Snowflake. Paste the SQL into the Worksheet query editor. You must edit the query and replace the <your_database> fragment with the name of your Snowflake database.
  8. Select Run. The statement must be executed with a Snowflake ACCOUNTADMIN role, since Snowflake Integrations operate at Account level and usually need administrator permissions.

Select the database, table, and schema

After running the query, the Statement executed successfully message appears. Return to your Tinybird tab to resume the configuration of the Snowflake connector. Set a name for the Snowflake connection and complete this step by selecting Next.

The Snowflake Connector now has enough permissions to inspect the Snowflake objects available to the given role. Browse the tables available in Snowflake and select the table you wish to load.

Start by selecting the database to which the table belongs, then the schema, and the table. Finish by selecting Next.

The maximum allowed table size is 50 million rows. If the table exceeds this limit, the result is truncated.

Configure the schedule

You can configure the schedule on which you wish to load data. By default, the frequency is set to One-off which performs a one-time sync of the table. You can change this by selecting a different option from the menu.

To configure a schedule that runs a regular sync, select the Interval option. You can configure a schedule in minutes, hours, or days by using the menu, and set the value for the schedule in the text field.

You can also select whether the sync should run immediately, or if it should wait until the first scheduled sync. The Replace data import strategy is selected by default. Finish by selecting Next.

The maximum allowed frequency is one sync every 5 minutes.
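As an illustration of that limit, the following Python sketch (a hypothetical helper, not part of Tinybird) converts the interval chosen in the UI to minutes and checks it against the 5-minute minimum:

```python
# Minutes per unit offered in the schedule menu.
UNIT_MINUTES = {"minutes": 1, "hours": 60, "days": 1440}

def interval_allowed(value: int, unit: str, minimum: int = 5) -> bool:
    """Return True if the chosen sync interval meets the 5-minute minimum."""
    return value * UNIT_MINUTES[unit] >= minimum

print(interval_allowed(5, "minutes"))  # True
print(interval_allowed(1, "minutes"))  # False: more frequent than allowed
print(interval_allowed(1, "hours"))    # True
```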

Complete the configuration

The final screen of the dialog shows the interpreted schema of the table, which you can change as needed. You can also modify the name of the Data Source in Tinybird. Select Create Data Source to complete the process.

After you've created the Data Source, a status chart appears showing executions of the loading schedule. The Data Source takes a moment to create the resources required to perform the first sync.

When the first sync has completed, a green bar appears indicating the status. Details about the data, such as storage size and number of rows, are shown. You can also see a preview of the data.

Load a Snowflake table in the CLI

To add a Snowflake table as a Data Source using the Tinybird CLI, follow these steps.

Create a connection

You need to create a connection before you can load a table from Snowflake into Tinybird using the CLI. Creating a connection grants your Tinybird Workspace the appropriate permissions to view data from Snowflake.

Authenticate your CLI and switch to the desired Workspace. Then run:

tb connection create snowflake

The output includes instructions to configure read-only access to your data in Snowflake. Enter your user, password, account identifier, role, warehouse, and a name for the connection.

After entering the required information, copy the SQL block that appears.

** Creating a new Snowflake connection at the xxxx workspace.

User (must have create stage and create integration in Snowflake):
Password:
Account identifier:
Role (optional):
Warehouse (optional):
Connection name (optional, current xxxx):

Enter this SQL statement in Snowflake using your admin account to create the connection:

------
create storage integration if not exists "tinybird_integration_role"
    type = external_stage
    storage_provider = 'GCS'
    enabled = true
    comment = 'Tinybird Snowflake Connector Integration'
    storage_allowed_locations = ('gcs://tinybird-cdk-production-europe-west3/id');

grant create stage on all schemas in database <your_database> to role ACCOUNTADMIN;

grant ownership on integration "tinybird_integration_ACCOUNTADMIN" to role ACCOUNTADMIN;
------

Ready?  (y, N):

** Validating connection...
** xxxx.connection created successfully! Connection details saved into the .env file and referenced automatically in your connection file.

With the SQL query copied, open a new SQL Worksheet inside Snowflake. Paste the SQL into the Worksheet query editor. You must edit the query and replace the <your_database> fragment with the name of your Snowflake database.

Select Run. This statement must be executed with a Snowflake ACCOUNTADMIN role, since Snowflake Integrations operate at Account level and usually need administrator permissions.

The Statement executed successfully message appears. Return to your terminal and select yes (y) to create the connection. A new snowflake.connection file appears in your project files. The .connection file can be safely deleted.

Create the Data Source

After you've created the connection, you can create a Data Source and configure the schedule to import data from Snowflake.

The Snowflake import is configured using the following options, which you can add at the end of your .datasource file:

  • IMPORT_SERVICE: Name of the import service to use. In this case, snowflake.
  • IMPORT_CONNECTION_NAME: The name given to the Snowflake connection inside Tinybird. For example, 'my_connection'.
  • IMPORT_EXTERNAL_DATASOURCE: The fully qualified name of the source table in Snowflake. For example, database.schema.table.
  • IMPORT_SCHEDULE: A cron expression (UTC) with the frequency to run imports. The minimum interval between imports is 5 minutes. For example, */5 * * * *.
  • IMPORT_STRATEGY: The strategy to use when inserting data, either REPLACE or APPEND.
  • IMPORT_QUERY: (Optional) The SELECT query to extract your data from Snowflake when you don't need all the columns or want to make a transformation before ingestion. The FROM must reference a table using the full scope: database.schema.table.

Note: For IMPORT_STRATEGY only REPLACE is supported today. The APPEND strategy will be enabled in a future release.
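To sanity-check an IMPORT_SCHEDULE before pushing, the following Python sketch (an illustrative helper, not part of the Tinybird CLI) computes the smallest gap between runs from the minute field of a cron expression, assuming the field uses '*', '*/N', or a comma-separated list:

```python
def min_cron_interval_minutes(expr: str) -> int:
    """Smallest gap, in minutes, between runs of a cron expression.
    Only the minute field is inspected; supports '*', '*/N', and
    comma-separated lists such as '0,30'."""
    minute_field = expr.split()[0]
    if minute_field == "*":
        return 1
    if minute_field.startswith("*/"):
        return int(minute_field[2:])
    marks = sorted(int(m) for m in minute_field.split(","))
    if len(marks) == 1:
        return 60  # a single fixed minute runs once per hour
    gaps = [b - a for a, b in zip(marks, marks[1:])]
    gaps.append(60 - marks[-1] + marks[0])  # wrap around the hour
    return min(gaps)

print(min_cron_interval_minutes("*/5 * * * *"))   # 5 -> allowed
print(min_cron_interval_minutes("0,30 * * * *"))  # 30 -> allowed
print(min_cron_interval_minutes("* * * * *"))     # 1 -> below the minimum
```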

The following example shows a configured .datasource file for a Snowflake Data Source:

snowflake.datasource file
DESCRIPTION >
    Snowflake demo data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `id` Integer `json:$.id`,
    `orderid` LowCardinality(String) `json:$.orderid`,
    `status` LowCardinality(String) `json:$.status`,
    `amount` Integer `json:$.amount`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

IMPORT_SERVICE snowflake
IMPORT_CONNECTION_NAME my_snowflake_connection
IMPORT_EXTERNAL_DATASOURCE mydb.raw.events
IMPORT_SCHEDULE */5 * * * *
IMPORT_STRATEGY REPLACE
IMPORT_QUERY >
    select
        timestamp,
        id,
        orderid,
        status,
        amount
    from mydb.raw.events

The columns you select in the IMPORT_QUERY must match the columns defined in the Data Source schema. For example, if your Data Source has columns ColumnA and ColumnB, then your IMPORT_QUERY must contain SELECT ColumnA, ColumnB FROM .... A mismatch of columns causes data to arrive in the quarantine Data Source.
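A quick way to catch such a mismatch before pushing is to diff the two column lists. This Python sketch (illustrative only, reusing the example schema above) does exactly that:

```python
def check_import_query_columns(schema_cols, query_cols):
    """Diff the Data Source schema columns against the columns selected
    in IMPORT_QUERY; any mismatch sends rows to quarantine."""
    return {
        "missing_in_query": [c for c in schema_cols if c not in query_cols],
        "not_in_schema": [c for c in query_cols if c not in schema_cols],
    }

schema = ["timestamp", "id", "orderid", "status", "amount"]
query = ["timestamp", "id", "orderid", "status"]  # 'amount' was forgotten
report = check_import_query_columns(schema, query)
print(report["missing_in_query"])  # ['amount']
print(report["not_in_schema"])     # []
```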

Push the configuration to Tinybird

With your connection created and Data Source defined, you can now push your project to Tinybird using:

tb push

The first run of the import begins at the next occurrence of the cron expression.
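For `*/N * * * *` schedules where N divides 60, that first run lands on the next minute boundary that is a multiple of N. A minimal sketch of the calculation (illustrative only, not how Tinybird schedules jobs internally):

```python
from datetime import datetime, timedelta, timezone

def next_run(now: datetime, step_minutes: int) -> datetime:
    """Next UTC fire time of a '*/N * * * *' cron schedule,
    assuming N divides 60. Always strictly after `now`'s minute."""
    base = now.replace(second=0, microsecond=0)
    wait = step_minutes - (base.minute % step_minutes)
    return base + timedelta(minutes=wait)

pushed = datetime(2024, 1, 1, 12, 3, 20, tzinfo=timezone.utc)
print(next_run(pushed, 5))  # 2024-01-01 12:05:00+00:00
```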

Iterate a Snowflake Data Source

Prerequisites

Use the CLI and the version control integration to handle your resources.

To use the advantages of version control, connect your Workspace with your repository, and set the CI/CD configuration.

Check the use case examples repository where you can find basic instructions and examples to handle Snowflake Data Sources iteration using git integration, under the iterate_snowflake section.

To use the Tinybird CLI check its documentation.

For instance, to create the connection in the main-branch Workspace using the CLI:

tb auth # use the main Workspace admin Token
tb connection create snowflake
# these prompts are interactive and will ask you to insert the necessary information

You can only create connections in the main Workspace. When creating the connection in a Branch, it's created in the main Workspace, and from there it's available to every Branch.

For testing purposes, use a different connection than the one in the main-branch Workspace.

Add a new Snowflake Data Source

You can add a new Data Source directly with the UI or the CLI, following the Load a Snowflake table section.

This works for testing purposes, but doesn't carry any connection details. You must add the connection and Snowflake configuration in the .datasource file when moving to production.

To add a new Data Source using the recommended version control workflow check the instructions in the examples repository.

Update a Data Source

  • Snowflake Data Sources can't be modified directly from the UI.
  • When you create a new Tinybird Branch, the existing Snowflake Data Sources won't be connected. You need to re-create them in the Branch.
  • In Branches, it's usually useful to work with fixtures, as they're applied as part of the CI/CD, allowing the full process to be deterministic in every iteration and avoiding quota consumption from external services.

Snowflake Data Sources can be modified from the CLI tool:

tb auth
# modify the .datasource Datafile with your editor
tb push --force {datafile}
# check the command output for errors

To update it using the recommended version control workflow check the instructions in the examples repository.

Delete a Data Source

Snowflake Data Sources can be deleted directly from the UI or CLI like any other Data Source.

To delete it using the recommended version control workflow check the instructions in the examples repository.

Logs

Job executions are logged in the datasources_ops_log Service Data Source. You can check this log directly in the Data Source view page in the UI. Filter by datasource_id to monitor ingestion through the Snowflake Connector from the datasources_ops_log:

SELECT
  timestamp,
  event_type,
  result,
  error,
  job_id
FROM
  tinybird.datasources_ops_log
WHERE
  datasource_id = 't_1234'
AND
  event_type = 'replace'
ORDER BY timestamp DESC

Schema evolution

The Snowflake Connector supports backward-compatible changes made to the source table. This means that if you add a new column in Snowflake, the next sync job automatically adds it to the Tinybird Data Source.

Non-backwards compatible changes, such as dropping or renaming columns, aren't supported and might cause the next sync to fail.
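The compatibility rule can be expressed as a simple diff of the old and new column lists. The following Python sketch (illustrative only, not connector code) classifies a change the way the connector treats it:

```python
def classify_schema_change(old_cols, new_cols):
    """Added columns are backward compatible; dropped columns are not.
    A rename shows up as a drop plus an add, so it is also incompatible."""
    added = [c for c in new_cols if c not in old_cols]
    dropped = [c for c in old_cols if c not in new_cols]
    status = "compatible" if not dropped else "incompatible"
    return status, added, dropped

print(classify_schema_change(["id", "status"], ["id", "status", "amount"]))
# ('compatible', ['amount'], [])
print(classify_schema_change(["id", "status"], ["id", "state"]))
# ('incompatible', ['state'], ['status'])
```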

Limits

See Snowflake Connector limits.
