MergeTree engine

The MergeTree engine is designed for high data ingest rates and huge data volumes. Is the default engine for Data Sources.

Key features:

  • The table's sorting key (defined by ENGINE_SORTING_KEY) determines the sort order within each table part.
  • The sorting key doesn't reference individual rows but blocks of 8192 rows called granules.
  • You can partition tables using an arbitrary partition expression. Partition pruning ensures partitions are omitted from reading when the query allows it. When partitioning you should be aware that partitioning isn't the fastest way to speed up queries, in contrast to the ENGINE_SORTING_KEY. A partition is the minimum unit of data for async operations, but for streaming ingest you should not write to more than one or two partitions.

Creating a MergeTree Data Source

Here's how to define a MergeTree data source:

SCHEMA >
    `event_date` Date `json:$.date`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `user_id` UInt64 `json:$.user_id`,
    `payload` JSON `json:$.payload`

ENGINE "MergeTree" 
ENGINE_SORTING_KEY "event_type, event_date, user_id"
[ENGINE_PARTITION_KEY "toYYYYMM(event_date)"]
[ENGINE_PRIMARY_KEY "event_type, event_date"]
[ENGINE_TTL "event_date + INTERVAL 30 DAY"]
[ENGINE_SETTINGS index_granularity=8192]

[INDEXES >
    index_name index_expression TYPE index_type GRANULARITY index_granularity]

Engine Settings

  • ENGINE_SORTING_KEY - Defines the sorting key. This is a tuple of column names or arbitrary expressions that determines how data is sorted within each part. For example: "event_type, event_date".

  • ENGINE_PARTITION_KEY - Optional. Defines the partitioning key. In most cases, you don't need a partition key, and if you do need to partition, generally you don't need a partition key more granular than by month. To partition by month, use "toYYYYMM(event_date)".

  • ENGINE_PRIMARY_KEY - Optional. Defines the primary key. If not specified, the ENGINE_PRIMARY_KEY matches the ENGINE_SORTING_KEY. It can be a subset of the ENGINE_SORTING_KEY. Useful for Engines where you need a long sorting key for aggregation or deudplication purposes, but a shorter primary key will be better for keeping the index size smaller.

  • ENGINE_TTL - Optional. Defines rules for data expiration. The expression must result in a Date or DateTime, for example "event_date + INTERVAL 1 DAY".

  • ENGINE_SETTINGS - Optional. Defines settings for the engine. For example, "index_granularity=8192".

  • INDEXES > - Optional. Defines indexes for the engine.

Data Storage Internals

A table consists of data parts sorted by sorting key. When data is inserted in a table, separate data parts are created and each of them is lexicographically sorted by sorting key. For example, if the sorting key is "event_type, event_date", the data in the part is sorted by event_type, and within each event_type, it's ordered by event_date.

When data is inserted in a DataSource, separate data parts are created and each of them is lexicographically sorted by sorting key. Data belonging to the same sorting key is stored in the same data part. Data belonging to different partitions are separated into different parts. In the background, the DataSource merges data parts for more efficient storage. Parts belonging to different partitions aren't merged.

The merge mechanism doesn't guarantee that all rows with the same primary key will be in the same data part. Each data part is logically divided into granules. A granule is the smallest indivisible data set that Tinybird reads when selecting data. Tinybird doesn't split rows or values, so each granule always contains an integer number of rows. The first row of a granule is marked with the value of the primary key for the row. For each data part, Tinybird creates an index file that stores the marks. For each column, whether it’s in the primary key or not, Tinybird also stores the same marks. These marks let you find data directly in column files.

The granule size is restricted by the index_granularity setting of the table engine. The number of rows in a granule lays in the [1, index_granularity] range, depending on the size of the rows.

Sorting Keys and Indexes in Queries

Take the "event_type, event_date" sorting key as an example. In this case, the sorting and index can be illustrated as follows:

      Whole data:     [-------------------------------------------------------------------------]
      event_type:     [aaaaaaaaaaaaaaaaaabbbbcdeeeeeeeeeeeeefgggggggghhhhhhhhhiiiiiiiiikllllllll]
      event_date:     [1111111222222233331233211111222222333211111112122222223111112223311122333]
      Marks:           |      |      |      |      |      |      |      |      |      |      |
                      a,1    a,2    a,3    b,3    e,2    e,3    g,1    h,2    i,1    i,3    l,3
      Marks numbers:   0      1      2      3      4      5      6      7      8      9      10

If the data query specifies:

  • event_type in ('a', 'h'), the server reads the data in the ranges of marks [0, 3) and [6, 8).
  • event_type = 'a' AND event_date = 3, the server reads the data in the ranges of marks [1, 3) and [7, 8).
  • event_date = 3, the server reads the data in the range of marks [1, 10].

The examples above show that it's always more effective to use an index than a full scan.

Data Skipping Indexes

SCHEMA >
    `event_date` Date `json:$.date`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `user_id` UInt64 `json:$.user_id`,
    `payload` JSON `json:$.payload`

ENGINE "MergeTree" 
ENGINE_SORTING_KEY "event_type, event_date, user_id"

INDEXES >
    index_name index_expression TYPE index_type GRANULARITY index_granularity

For tables from the *MergeTree family, data skipping indices can be specified.

These indices aggregate some information about the specified expression on blocks, which consist of granularity_value granules (the size of the granule is specified using the index_granularity setting in the table engine). Then these aggregates are used in SELECT queries for reducing the amount of data to read from the disk by skipping big blocks of data where the where query can't be satisfied.

The GRANULARITY clause can be omitted, the default value of granularity_value is 1.

Example

SCHEMA >
    u64 UInt64,
    i32 Int32,
    s String

INDEXES >
    INDEX idx1 u64 TYPE bloom_filter GRANULARITY 3,
    INDEX idx2 u64 * i32 TYPE minmax GRANULARITY 3,
    INDEX idx3 u64 * length(s) TYPE set(1000) GRANULARITY 4

Indices from the example can be used by Tinybird to reduce the amount of data to read from disk in the following queries:

SELECT count() FROM ds WHERE u64 == 10
SELECT count() FROM ds WHERE u64 * i32 >= 1234
SELECT count() FROM ds WHERE u64 * length(s) == 1234

Data skipping indexes can also be created on composite columns:

-- on columns of type Map:
INDEX map_key_index mapKeys(map_column) TYPE bloom_filter
INDEX map_value_index mapValues(map_column) TYPE bloom_filter

-- on columns of type Tuple:
INDEX tuple_1_index tuple_column.1 TYPE bloom_filter
INDEX tuple_2_index tuple_column.2 TYPE bloom_filter

-- on columns of type Nested:
INDEX nested_1_index col.nested_col1 TYPE bloom_filter
INDEX nested_2_index col.nested_col2 TYPE bloom_filter

Available Types of Indices

MinMax

Stores extremes of the specified expression (if the expression is tuple, then it stores extremes for each element of tuple), uses stored info for skipping blocks of data like the primary key.

Syntax: minmax

Set

Stores unique values of the specified expression (no more than max_rows rows, max_rows=0 means “no limits”). Uses the values to check if the WHERE expression isn't satisfiable on a block of data.

Syntax: set(max_rows)

Bloom Filter

Stores a Bloom filter for the specified columns. An optional false_positive parameter with possible values between 0 and 1 specifies the probability of receiving a false positive response from the filter. Default value: 0.025. Supported data types: Int*, UInt*, Float*, Enum, Date, DateTime, String, FixedString, Array, LowCardinality, Nullable, UUID and Map. For the Map data type, the client can specify if the index should be created for keys or values using mapKeys or mapValues function.

Syntax: bloom_filter([false_positive])

N-gram Bloom Filter

Stores a Bloom filter that contains all n-grams from a block of data. Only works with datatypes: String, FixedString and Map. Can be used for optimization of EQUALS, LIKE and IN expressions.

Syntax: ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)

  • n: ngram size,
  • size_of_bloom_filter_in_bytes: Bloom filter size in bytes (you can use large values here, for example, 256 or 512, because it can be compressed well).
  • number_of_hash_functions: The number of hash functions used in the Bloom filter.
  • random_seed: The seed for Bloom filter hash functions.

Token Bloom Filter

The same as ngrambf_v1, but stores tokens instead of ngrams. Tokens are sequences separated by non-alphanumeric characters.

Syntax: tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)

Functions Support

Conditions in the WHERE clause contains calls of the functions that operate with columns. If the column is a part of an index, Tinybird tries to use this index when performing the functions. Tinybird supports different subsets of functions for using indexes.

Indexes of type set can be utilized by all functions. The other index types are supported as follows:

Function (operator) / Indexprimary keyminmaxngrambf_v1tokenbf_v1bloom_filterfull_text
equals (=, ==)
notEquals(!=, <>)
like
notLike
match
startsWith
endsWith
multiSearchAny
in
notIn
less (<)
greater (>)
lessOrEquals (<=)
greaterOrEquals (>=)
empty
notEmpty
has
hasAny
hasAll
hasToken
hasTokenOrNull
hasTokenCaseInsensitive (*)
hasTokenCaseInsensitiveOrNull (*)

Functions with a constant argument that is less than ngram size can’t be used by ngrambf_v1 for query optimization.

(*) For hasTokenCaseInsensitive and hasTokenCaseInsensitiveOrNull to be effective, the tokenbf_v1 index must be created on lowercased data, for example INDEX idx (lower(str_col)) TYPE tokenbf_v1(512, 3, 0).

Bloom filters can have false positive matches, so the ngrambf_v1, tokenbf_v1, and bloom_filter indexes can't be used for optimizing queries where the result of a function is expected to be false.

For example:

  • Can be optimized:
    • s LIKE '%test%'
    • NOT s NOT LIKE '%test%'
    • s = 1
    • NOT s != 1
    • startsWith(s, 'test')
  • Can not be optimized:
    • NOT s LIKE '%test%'
    • s NOT LIKE '%test%'
    • NOT s = 1
    • s != 1
    • NOT startsWith(s, 'test')

Settings

For a list of supported settings, see Engine settings.

Updated