DynamoDB Connector

The DynamoDB Connector is in private beta testing. Please contact support@tinybird.co if you would like to participate. Steps in this guide may change as the feature is developed.

The DynamoDB Connector allows you to ingest historical and change stream data from Amazon DynamoDB to Tinybird.

The DynamoDB Connector is fully managed and requires no additional tooling. Connect Tinybird to DynamoDB, choose your tables, and Tinybird keeps your data in sync with DynamoDB.

The DynamoDB Connector is:

  • Easy to use. Connect to your DynamoDB tables and start ingesting data in minutes.
  • SQL-based. Using nothing but SQL, query your DynamoDB data and enrich it with dimensions from your streaming data, warehouse, or files.
  • Secure. Use Auth tokens to control access to API endpoints. Implement access policies as you need. Support for row-level security.

Required IAM permissions

The DynamoDB Connector requires certain permissions to access your tables. The IAM Role needs the following permissions:

  • dynamodb:Scan
  • dynamodb:DescribeStream
  • dynamodb:DescribeExport
  • dynamodb:GetRecords
  • dynamodb:GetShardIterator
  • dynamodb:DescribeTable
  • dynamodb:DescribeContinuousBackups
  • dynamodb:ExportTableToPointInTime
  • dynamodb:UpdateTable
  • dynamodb:UpdateContinuousBackups

Below is an example AWS Access Policy:

Note: This is an example policy. When configuring the connector, the UI, CLI and API all provide the necessary policy templates.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:Scan",
                "dynamodb:DescribeStream",
                "dynamodb:DescribeExport",
                "dynamodb:GetRecords",
                "dynamodb:GetShardIterator",
                "dynamodb:DescribeTable",
                "dynamodb:DescribeContinuousBackups",
                "dynamodb:ExportTableToPointInTime",
                "dynamodb:UpdateTable",
                "dynamodb:UpdateContinuousBackups"
            ],
            "Resource": [
                "arn:aws:dynamodb:*:*:table/<your_dynamodb_table>",
                "arn:aws:dynamodb:*:*:table/<your_dynamodb_table>/stream/*",
                "arn:aws:dynamodb:*:*:table/<your_dynamodb_table>/export/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::<your_dynamodb_export_bucket>", "arn:aws:s3:::<your_dynamodb_export_bucket>/*"]
        }
    ]
}

Below is an example trust policy:

Note: This is an example policy. When configuring the connector, the UI, CLI and API all provide the necessary policy templates.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {
                "AWS": "arn:aws:iam::473819111111111:root"
            },
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "ab3caaaa-01aa-4b95-bad3-fff9b2ac789f8a9"
                }
            }
        }
    ]
}

Load a DynamoDB table using the UI (coming soon)

The DynamoDB Connector UI is not available in the beta. Please use the CLI instead.

Load a DynamoDB table using the CLI

To load a DynamoDB table into Tinybird using the CLI, you first need to create a connection, and then a Data Source.

The connection grants your Tinybird Workspace the necessary permissions to access AWS and your tables in DynamoDB. The Data Source then maps a table in DynamoDB to a table in Tinybird and manages the historical and continuous sync.

The DynamoDB Connector was introduced in v5.3.0 of the Tinybird CLI, which is the minimum CLI version required to use it.

Prerequisites

Check that the CLI is pointing to the correct Workspace with tb workspace current. Switch Workspaces with tb workspace use <workspace_name>.

1. Create the DynamoDB connection

The connection grants your Tinybird Workspace the necessary permissions to access AWS and your tables in DynamoDB.

1.1 Run the connection command

tb connection create dynamodb

This command initiates the process of creating a connection. When prompted, type y to proceed.

1.2 Create a new IAM Policy in AWS

The CLI will provide a policy template. Replace <your_dynamodb_table> with the name of your DynamoDB table. Replace <your_dynamodb_export_bucket> with the name of the S3 bucket you want to use for the initial load.

After that:

  • Navigate to the AWS Management Console.
  • Go to IAM > Policies > Create Policy.
  • Select the JSON tab and paste the modified policy text.
  • Save and create the policy.

1.3 Create a new IAM Role in AWS

Return to the CLI and proceed to the next step. The CLI will provide a trust policy template.

Then:

  • Go to IAM > Roles > Create Role.
  • Select "Custom Trust Policy" and paste the trust policy copied from the CLI.
  • In the "Permissions" tab, attach the policy created in the previous step.
  • Complete the role creation process.

1.4 Complete the connection

In the AWS IAM console, find the role you just created. Copy its ARN (Amazon Resource Name), which looks like arn:aws:iam::111111111111:role/my-awesome-role.

The CLI will ask you for:

  • The Role ARN
  • AWS region of your DynamoDB tables
  • Connection name

The connection name is used by Tinybird to identify the connection. It can be any name you choose, but it must follow the standard Tinybird naming conventions: only letters (a-z, A-Z), numbers (0-9), and underscores (_), and it must start with a letter.

When the CLI prompts are completed, the connection will be created directly in Tinybird.

The CLI will generate a .connection file in your project directory. This file is not used and is safe to delete. A future release will allow you to push this file to Tinybird to automate the creation of connections, similar to Kafka connections.

2. Create a DynamoDB Data Source

The Data Source maps a table in DynamoDB to a table in Tinybird and manages the historical and continuous sync.

2.1 Create a Data Source file

Data Source files contain the table schema, and specific DynamoDB properties to target the table that Tinybird will import.

Create a Data Source file called mytable.datasource (you can name the file anything you like, but you must use the .datasource extension).

There are two approaches to defining the schema for a DynamoDB Data Source:

  1. Define just the Partition Key and Sort Key from your DynamoDB table, and access other properties from JSON at query time
  2. Define all DynamoDB item properties as columns

The Partition Key and Sort Key (if any) from your DynamoDB table must be defined in the Data Source schema. These are the only properties that are mandatory to define, as they are used to deduplicate records (upserts and deletes).
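For instance, for a hypothetical table where transaction_id is the Partition Key and timestamp is the Sort Key, the schema must include at least these two columns:

```
SCHEMA >
    `transaction_id` String `json:$.transaction_id`,
    `timestamp` DateTime64(3) `json:$.timestamp`
```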

Approach 1: Define just the Partition Key and Sort Key

If you do not want to map all properties from your DynamoDB table, you can define just the Partition Key and Sort Key.

The entire DynamoDB item will be stored as JSON in a _record column, and you can extract properties using JSONExtract* functions.

For example, if you have a DynamoDB table with transaction_id as the Partition Key, you can define your Data Source schema like this:

mytable.datasource
SCHEMA >
    transaction_id String `json:$.transaction_id`

IMPORT_SERVICE "dynamodb"
IMPORT_CONNECTION_NAME <your_connection_name>
IMPORT_TABLE_ARN <your_table_arn>
IMPORT_EXPORT_BUCKET <your_dynamodb_export_bucket>

Replace <your_connection_name> with the name of the connection created in the first step. Replace <your_table_arn> with the ARN of the table you'd like to import. Replace <your_dynamodb_export_bucket> with the name of the S3 bucket you want to use for the initial sync.
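With this schema, the rest of each item lives in the _record column and can be read at query time. A sketch, assuming your items contain an airline string property (as in the example shown later in this guide):

```sql
SELECT
    transaction_id,
    JSONExtractString(_record, 'airline') AS airline
FROM mytable
```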

Approach 2: Define all DynamoDB item properties as columns

If you want to strictly define all of your properties and their types, you can map them into your Data Source as columns.

Properties can be mapped to any of the supported types in Tinybird. They can also be Arrays of those types, or Nullable. We recommend using Nullable for properties that may not have a value in every item in your DynamoDB table.

For example, if you have a DynamoDB table with items like this:

{
    "timestamp": "2024-07-25T10:46:37.380Z",
    "transaction_id": "399361d5-10fc-4777-8187-88aaa4623569",
    "name": "Chris Donnelly",
    "passport_number": 4904040,
    "flight_from": "Burien",
    "flight_to": "Sanford",
    "airline": "BrianAir"
}

Where transaction_id is the Partition Key, you can define your Data Source schema like this:

mytable.datasource
SCHEMA >
    `timestamp` DateTime64(3) `json:$.timestamp`,
    `transaction_id` String `json:$.transaction_id`,
    `name` String `json:$.name`,
    `passport_number` Int64 `json:$.passport_number`,
    `flight_from` String `json:$.flight_from`,
    `flight_to` String `json:$.flight_to`,
    `airline` String `json:$.airline`

IMPORT_SERVICE "dynamodb"
IMPORT_CONNECTION_NAME <your_connection_name>
IMPORT_TABLE_ARN <your_table_arn>
IMPORT_EXPORT_BUCKET <your_dynamodb_export_bucket>

Replace <your_connection_name> with the name of the connection created in the first step. Replace <your_table_arn> with the ARN of the table you'd like to import. Replace <your_dynamodb_export_bucket> with the name of the S3 bucket you want to use for the initial sync.

Mapping properties

Properties with basic types (String, Number, Boolean, Binary, String Set, Number Set) at the root item level can be mapped easily.

Follow this schema definition pattern:

<PropertyName> <PropertyType> `json:$.<PropertyNameInDDB>`
  • PropertyName is the name of the column within your Tinybird Data Source.
  • PropertyType is the type of the column within your Tinybird Data Source. It must correspond to the property type in DynamoDB: a String column maps to a String property, Int, UInt, or Float variants map to Number properties, an Array(String) maps to a String Set, and an Array(UInt<X>) (or any other numeric variant) maps to a Number Set.
  • PropertyNameInDDB is the name of the property in your DynamoDB table. It must match the letter casing used in DynamoDB.
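Following that pattern, hypothetical String Set and Number Set properties named destinations and prices could be mapped as:

```
`destinations` Array(String) `json:$.destinations`,
`prices` Array(UInt64) `json:$.prices`
```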

Properties within complex types, like Maps, need to be mapped manually with JSONPaths.

For example, a map property at the first level can be mapped in your Data Source schema like:

MyString String `json:$.<map_property_name>.<map_property_value_name>`

For Lists, standalone column mapping is not supported yet. Those properties need to be extracted with JSONExtract* functions or consumed after a transformation with a Materialized View.
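For example, assuming the raw item is available in the _record column (as in Approach 1), a hypothetical List property named segments could be read with something like:

```sql
SELECT
    transaction_id,
    JSONExtractArrayRaw(_record, 'segments') AS segments
FROM mytable
```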

2.2 Push the Data Source

With your connection created and Data Source defined, push your Data Source to Tinybird using tb push.

For example, if your Data Source file is called mytable.datasource, run:

tb push mytable.datasource

Architecture

AWS provides two out-of-the-box features for DynamoDB: DynamoDB Streams and point-in-time recovery (PITR).

  • DynamoDB Streams captures change events for a given DynamoDB table and provides an API to access events as a stream. This enables CDC-like access to the table for continuous updates.
  • PITR allows you to take snapshots of your entire DynamoDB table at a point in time and save the export to S3. This enables historical access to table data for batch uploads.

The DynamoDB Connector uses these features under the hood to send DynamoDB data to Tinybird:

[Diagram: Connecting DynamoDB to Tinybird architecture]