Table functions

Tinybird table functions allow you to read data from an existing database and schedule a regular Copy Pipe to orchestrate synchronization. You can load full tables or incrementally sync your data.

Tinybird supports table functions for connecting to external databases, such as the postgresql table function used in the examples below.

Prerequisites

Your database needs to be open and public, exposed to the internet with publicly signed certificates, so that you can connect to it by passing the hostname, port, username, and password.

You also need to be familiar with making cURL requests to manage your secrets.

Environment Variables API

The Environment Variables API is currently accessible only at the API level.

Pasting your credentials into a Pipe node or .datafile as plain text is a security risk. Instead, use the Environment Variables API to create two new secrets for your username and password.
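For example, here's a sketch of creating one of those secrets with cURL. It assumes the default API host and an Admin Token; the secret name pg_username and its value are placeholders:

Example request to create a secret
curl \
  -X POST "https://api.tinybird.co/v0/variables" \
  -H "Authorization: Bearer <ADMIN token>" \
  -d "type=secret" \
  -d "name=pg_username" \
  -d "value=<your database user>"

Repeat the call with name=pg_password and the corresponding value.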

In the next step, you can interpolate your new secrets using the tb_secret function:

{{tb_secret('username')}}
{{tb_secret('password')}}

Load an external table

Create a new Pipe node. Call the table function in the FROM and pass the connection details:

PostgreSQL table function example
SELECT *
FROM postgresql(
  'aws-0-eu-central-1.TODO.com:3866',
  'postgres',
  'orders',
  {{tb_secret('pg_username')}},
  {{tb_secret('pg_password')}}
)

Publish this node as a Copy Pipe and run the query manually. You can choose to append only new data or to replace all data.

Using datafiles

You can also define node logic in Pipe files. For example, a PostgreSQL eCommerce orders_backfill scenario with a node called all_orders would look like this:

NODE all_orders
SQL >

    %
    SELECT *
    FROM postgresql(
      'aws-0-eu-central-1.TODO.com:3866',
      'postgres',
      'orders',
      {{tb_secret('pg_username')}},
      {{tb_secret('pg_password')}}
    )

TYPE copy
TARGET_DATASOURCE orders
COPY_SCHEDULE @on-demand
COPY_MODE replace

Include filters

You can filter the source table by a value already stored in Tinybird, for example to fetch only rows newer than the latest orderDate in the target Data Source:

Example Copy Pipe with PostgreSQL table function and filters
SELECT *
FROM postgresql(
  'aws-0-eu-central-1.TODO.com:3866',
  'postgres',
  'orders',
  {{tb_secret('pg_username')}},
  {{tb_secret('pg_password')}}
)
WHERE orderDate > (SELECT max(orderDate) FROM orders)

Schedule runs

When publishing as a Copy Pipe, most users set it to run at a regular interval using a cron expression.
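For example, in the Pipe file you can replace @on-demand with a cron expression; this sketch assumes a 5-minute interval:

Example schedule in a Copy Pipe file
COPY_SCHEDULE */5 * * * *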

You can also trigger the Copy Pipe on demand:

curl -H "Authorization: Bearer <PIPE:READ token>" \
    -X POST "https:/tinybird.co/api/v0/pipes/<pipe_id>/run"

Having on-demand Pipes in your Workspace is helpful, as you can run a full sync manually any time you need it. You might also use them for weekly full syncs.

Synchronization strategies

When copying data from your database to Tinybird, you can use one of the following strategies:

  • Use COPY_MODE replace to synchronize small dimension tables, up to a few million rows, on a frequent schedule (every 1 to 5 minutes).
  • Use COPY_MODE append for incremental appends. For example, you can append events data tagged with a timestamp, combining COPY_SCHEDULE with filters in the Copy Pipe SQL so that only new events are synced; see the sketch after this list.
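As a sketch of the append strategy, here's a Copy Pipe file that assumes a source table called events with an updated_at timestamp column and a target Data Source also called events; the hostname and secret names follow the earlier examples:

Example incremental append Copy Pipe
NODE new_events
SQL >

    %
    SELECT *
    FROM postgresql(
      'aws-0-eu-central-1.TODO.com:3866',
      'postgres',
      'events',
      {{tb_secret('pg_username')}},
      {{tb_secret('pg_password')}}
    )
    WHERE updated_at > (SELECT max(updated_at) FROM events)

TYPE copy
TARGET_DATASOURCE events
COPY_SCHEDULE * * * * *
COPY_MODE append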

Timeouts

When synchronizing dimension tables with COPY_MODE replace and a 1-minute schedule, the copy job might time out because it can't ingest the whole table within the scheduled interval.

Timeouts depend on several factors:

  • The timeout configured in your external database.
  • The external database load.
  • Network connectivity, for example when copying data from different cloud regions.

Follow these tips to avoid timeouts by using incremental appends:

  • Make sure to tag your data with an updated timestamp and use that column to filter the Copy Pipe SQL.
  • Configure the Copy Pipe with an incremental append strategy and a 1-minute schedule. That way, only records from the last minute are ingested, which keeps the copy job short.
  • Create an index in the external table to speed up filtering.
  • Create the target Data Source as a ReplacingMergeTree, using a unique or primary key as the ENGINE_SORTING_KEY. Rows with the same ENGINE_SORTING_KEY are deduplicated. Remember to use the FINAL keyword when querying the Data Source to force deduplication at query time; see the sketch after this list.
  • Combine this approach with an hourly or daily replacement to get rid of deleted rows. Learn how to handle deleted rows when using ReplacingMergeTree.
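As a sketch of the ReplacingMergeTree approach, here's a minimal .datasource file; the columns are hypothetical and order_id stands in for your unique key:

Example ReplacingMergeTree Data Source
SCHEMA >
    `order_id` Int64,
    `orderDate` DateTime,
    `status` String

ENGINE "ReplacingMergeTree"
ENGINE_SORTING_KEY "order_id"

When querying, add FINAL to force deduplication at query time, for example SELECT * FROM orders FINAL.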

Observability

Job executions are logged in the datasources_ops_log Service Data Source. You can check this log directly in the Data Source view page in the UI. To monitor ingestion through the table functions, query datasources_ops_log filtering by datasource_id:

Example query to the datasources_ops_log Service Data Source
SELECT
  timestamp,
  event_type,
  result,
  error,
  job_id
FROM
  tinybird.datasources_ops_log
WHERE
  datasource_id = 't_1234'
AND
  event_type = 'copy'
ORDER BY timestamp DESC

Limits

The table functions inherit all the limits of Copy Pipes.

Environment Variables are created at the Workspace level, so you can connect to one of each external database per Tinybird Workspace.

Check the limits page for limits on ingestion, queries, API Endpoints, and more.

Billing

There are no additional or specific costs for the table function itself; only the costs associated with Copy Pipes apply. For more information on data operations and their charges, see the billing docs.