MergeTree engine¶
The `MergeTree` engine is designed for high data ingest rates and huge data volumes. It's the default engine for Data Sources.
Key features:
- The table's sorting key (defined by `ENGINE_SORTING_KEY`) determines the sort order within each table part.
- The sorting key doesn't reference individual rows but blocks of 8192 rows called granules.
- You can partition tables using an arbitrary partition expression. Partition pruning ensures partitions are omitted from reading when the query allows it. Keep in mind that partitioning is not the fastest way to speed up queries, in contrast to the `ENGINE_SORTING_KEY`. A partition is the minimum unit of data for asynchronous operations, and for streaming ingest you should not write to more than one or two partitions.
Creating a MergeTree Data Source¶
Here's how to define a MergeTree data source:
```
SCHEMA >
    `event_date` Date `json:$.date`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `user_id` UInt64 `json:$.user_id`,
    `payload` JSON `json:$.payload`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "event_type, event_date, user_id"
[ENGINE_PARTITION_KEY "toYYYYMM(event_date)"]
[ENGINE_TTL "event_date + INTERVAL 30 DAY"]
[ENGINE_SETTINGS index_granularity=8192]
[INDEXES >
    index_name index_expression TYPE index_type GRANULARITY index_granularity]
```
Engine Settings¶
- `ENGINE_SORTING_KEY` - Defines the sorting key. This is a tuple of column names or arbitrary expressions that determines how data is sorted within each part. For example: `"event_type, event_date"`.
- `ENGINE_PARTITION_KEY` - Optional. Defines the partitioning key. In most cases, you don't need a partition key, and if you do, you generally don't need one more granular than by month. To partition by month, use `"toYYYYMM(event_date)"`.
- `ENGINE_TTL` - Optional. Defines rules for data expiration. The expression must result in a Date or DateTime, for example `"event_date + INTERVAL 1 DAY"`.
- `ENGINE_SETTINGS` - Optional. Defines settings for the engine. For example, `"index_granularity=8192"`.
- `INDEXES >` - Optional. Defines data skipping indexes for the engine.
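Putting these together, here's a hedged sketch of a complete definition that keeps 90 days of events partitioned by month (the column names are illustrative):

```
SCHEMA >
    `event_date` Date `json:$.date`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `user_id` UInt64 `json:$.user_id`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "event_type, event_date"
ENGINE_PARTITION_KEY "toYYYYMM(event_date)"
ENGINE_TTL "event_date + INTERVAL 90 DAY"
ENGINE_SETTINGS "index_granularity=8192"
```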
Data Storage Internals¶
A table consists of data parts sorted by sorting key. When data is inserted in a table, separate data parts are created and each of them is lexicographically sorted by sorting key. For example, if the sorting key is `"event_type, event_date"`, the data in the part is sorted by `event_type`, and within each `event_type`, it is ordered by `event_date`.
Data belonging to different partitions is separated into different parts. In the background, the Data Source merges data parts for more efficient storage; parts belonging to different partitions are never merged.
The merge mechanism does not guarantee that all rows with the same primary key end up in the same data part.

Each data part is logically divided into granules. A granule is the smallest indivisible data set that Tinybird reads when selecting data. Tinybird does not split rows or values, so each granule always contains an integer number of rows. The first row of a granule is marked with the value of the primary key for that row.

For each data part, Tinybird creates an index file that stores these marks. For each column, whether it's in the primary key or not, Tinybird also stores the same marks. These marks let you find data directly in column files.
The granule size is restricted by the `index_granularity` setting of the table engine. The number of rows in a granule lies in the `[1, index_granularity]` range, depending on the size of the rows.
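As a worked example, assuming fixed-size granules with the default `index_granularity` of 8192, a data part containing 1,000,000 rows is split into ceil(1,000,000 / 8192) = 123 granules, so the primary index for that part stores only 123 marks rather than one entry per row.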
Sorting Keys and Indexes in Queries¶
Take the `"event_type, event_date"` sorting key as an example. In this case, the sorting and index can be illustrated as follows:
```
Whole data:     [-------------------------------------------------------------------------]
event_type:     [aaaaaaaaaaaaaaaaaabbbbcdeeeeeeeeeeeeefgggggggghhhhhhhhhiiiiiiiiikllllllll]
event_date:     [1111111222222233331233211111222222333211111112122222223111112223311122333]
Marks:           |      |      |      |      |      |      |      |      |      |      |
                a,1    a,2    a,3    b,3    e,2    e,3    g,1    h,2    i,1    i,3    l,3
Marks numbers:   0      1      2      3      4      5      6      7      8      9      10
```
If the data query specifies:
- `event_type IN ('a', 'h')`, the server reads the data in the ranges of marks `[0, 3)` and `[6, 8)`.
- `event_type = 'a' AND event_date = 3`, the server reads the data in the ranges of marks `[1, 3)` and `[7, 8)`.
- `event_date = 3`, the server reads the data in the range of marks `[1, 10]`.
The examples above show that it is always more effective to use an index than a full scan.
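To check how well a filter prunes granules, you can inspect the query plan. This is a minimal sketch assuming ClickHouse-style `EXPLAIN` support is available to you; the `events` Data Source name is illustrative:

```sql
-- Shows, per part, how many granules the primary key
-- condition selects out of the total.
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE event_type IN ('a', 'h');
```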
Data Skipping Indexes¶
```
SCHEMA >
    `event_date` Date `json:$.date`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `user_id` UInt64 `json:$.user_id`,
    `payload` JSON `json:$.payload`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "event_type, event_date, user_id"

INDEXES >
    index_name index_expression TYPE index_type GRANULARITY index_granularity
```
Data skipping indexes can be specified for tables from the `*MergeTree` family. These indexes aggregate some information about the specified expression on blocks, which consist of `granularity_value` granules (the size of the granule is specified using the `index_granularity` setting in the table engine). These aggregates are then used in `SELECT` queries to reduce the amount of data read from disk by skipping big blocks of data where the `WHERE` clause cannot be satisfied.
The `GRANULARITY` clause can be omitted; the default value of `granularity_value` is 1.
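For instance, under that default the following two declarations are equivalent (the `idx0` name and `u64` column are illustrative):

```
-- Omitting GRANULARITY is the same as GRANULARITY 1
INDEX idx0 u64 TYPE minmax
INDEX idx0 u64 TYPE minmax GRANULARITY 1
```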
Example
```
SCHEMA >
    u64 UInt64,
    i32 Int32,
    s String

INDEXES >
    INDEX idx1 u64 TYPE bloom_filter GRANULARITY 3,
    INDEX idx2 u64 * i32 TYPE minmax GRANULARITY 3,
    INDEX idx3 u64 * length(s) TYPE set(1000) GRANULARITY 4
```
Indexes from the example can be used by Tinybird to reduce the amount of data to read from disk in the following queries:
```sql
SELECT count() FROM ds WHERE u64 == 10;
SELECT count() FROM ds WHERE u64 * i32 >= 1234;
SELECT count() FROM ds WHERE u64 * length(s) == 1234;
```
Data skipping indexes can also be created on composite columns:
```
-- on columns of type Map:
INDEX map_key_index mapKeys(map_column) TYPE bloom_filter
INDEX map_value_index mapValues(map_column) TYPE bloom_filter

-- on columns of type Tuple:
INDEX tuple_1_index tuple_column.1 TYPE bloom_filter
INDEX tuple_2_index tuple_column.2 TYPE bloom_filter

-- on columns of type Nested:
INDEX nested_1_index col.nested_col1 TYPE bloom_filter
INDEX nested_2_index col.nested_col2 TYPE bloom_filter
```
Available Types of Indexes¶
MinMax¶
Stores the extremes of the specified expression (if the expression is a `tuple`, it stores the extremes for each element of the `tuple`). Uses the stored info to skip blocks of data, in the same way as the primary key.

Syntax: `minmax`
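For example, a `minmax` index on a numeric column lets the engine skip blocks whose stored [min, max] range can't contain the searched value. A minimal sketch; the `temperature` column name is an assumption:

```
INDEXES >
    INDEX temp_minmax temperature TYPE minmax GRANULARITY 4
```

A filter such as `WHERE temperature > 100` can then skip every block whose stored maximum is 100 or lower.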
Set¶
Stores unique values of the specified expression (no more than `max_rows` rows; `max_rows=0` means "no limits"). Uses the values to check if the `WHERE` expression is not satisfiable on a block of data.

Syntax: `set(max_rows)`
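A `set` index works best on columns with few distinct values per block. A sketch; the `status` column name is an assumption:

```
INDEXES >
    INDEX status_set status TYPE set(100) GRANULARITY 4
```

A query with `WHERE status = 'error'` can skip any block whose recorded value set doesn't include `'error'`; if a block accumulates more than 100 distinct values, the index can't rule it out and the block is read.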
Bloom Filter¶
Stores a Bloom filter for the specified columns. An optional `false_positive` parameter with possible values between 0 and 1 specifies the probability of receiving a false positive response from the filter. Default value: 0.025. Supported data types: `Int*`, `UInt*`, `Float*`, `Enum`, `Date`, `DateTime`, `String`, `FixedString`, `Array`, `LowCardinality`, `Nullable`, `UUID`, and `Map`. For the `Map` data type, the client can specify whether the index should be created for keys or values using the `mapKeys` or `mapValues` function.

Syntax: `bloom_filter([false_positive])`
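For instance, a Bloom filter over a high-cardinality string column can serve point lookups. A sketch; the `session_id` column name is an assumption:

```
INDEXES >
    INDEX session_bf session_id TYPE bloom_filter(0.01) GRANULARITY 3
```

Equality and `IN` filters such as `WHERE session_id = 'abc123'` can use this index; range comparisons (`<`, `>`) cannot, as the support table below shows.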
N-gram Bloom Filter¶
Stores a Bloom filter that contains all n-grams from a block of data. Only works with the `String`, `FixedString`, and `Map` data types. Can be used for optimization of `EQUALS`, `LIKE`, and `IN` expressions.

Syntax: `ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`

- `n`: n-gram size.
- `size_of_bloom_filter_in_bytes`: Bloom filter size in bytes (you can use large values here, for example, 256 or 512, because it can be compressed well).
- `number_of_hash_functions`: The number of hash functions used in the Bloom filter.
- `random_seed`: The seed for the Bloom filter hash functions.
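A sketch of an n-gram filter for substring search over a log line; the `message` column name is an assumption:

```
INDEXES >
    INDEX message_ngram message TYPE ngrambf_v1(4, 1024, 3, 0) GRANULARITY 2
```

A filter like `WHERE message LIKE '%timeout%'` can use this index because the constant is at least 4 characters long (the n-gram size); a shorter pattern such as `'%ab%'` cannot, as noted after the support table below.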
Token Bloom Filter¶
The same as `ngrambf_v1`, but stores tokens instead of n-grams. Tokens are sequences separated by non-alphanumeric characters.

Syntax: `tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`
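A sketch of a token filter for whole-word search; the `message` column name is again an assumption:

```
INDEXES >
    INDEX message_tokens message TYPE tokenbf_v1(512, 3, 0) GRANULARITY 2
```

A query using `WHERE hasToken(message, 'timeout')` can then skip blocks whose token filter rules out the word.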
Special-purpose¶
- Experimental indexes to support approximate nearest neighbor (ANN) search.
- An experimental full-text index to support full-text search.
Functions Support¶
Conditions in the `WHERE` clause contain calls to functions that operate on columns. If the column is part of an index, Tinybird tries to use this index when evaluating those functions. Tinybird supports different subsets of functions for using indexes.
Indexes of type `set` can be utilized by all functions. The other index types are supported as follows:
| Function (operator) / Index | primary key | minmax | ngrambf_v1 | tokenbf_v1 | bloom_filter | full_text |
|---|---|---|---|---|---|---|
| equals (=, ==) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| notEquals (!=, <>) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| like | ✔ | ✔ | ✔ | ✔ | ✗ | ✔ |
| notLike | ✔ | ✔ | ✔ | ✔ | ✗ | ✔ |
| match | ✗ | ✗ | ✔ | ✔ | ✗ | ✔ |
| startsWith | ✔ | ✔ | ✔ | ✔ | ✗ | ✔ |
| endsWith | ✗ | ✗ | ✔ | ✔ | ✗ | ✔ |
| multiSearchAny | ✗ | ✗ | ✔ | ✗ | ✗ | ✔ |
| in | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| notIn | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| less (<) | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ |
| greater (>) | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ |
| lessOrEquals (<=) | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ |
| greaterOrEquals (>=) | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ |
| empty | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ |
| notEmpty | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ |
| has | ✗ | ✗ | ✔ | ✔ | ✔ | ✔ |
| hasAny | ✗ | ✗ | ✔ | ✔ | ✔ | ✗ |
| hasAll | ✗ | ✗ | ✗ | ✗ | ✔ | ✗ |
| hasToken | ✗ | ✗ | ✗ | ✔ | ✗ | ✔ |
| hasTokenOrNull | ✗ | ✗ | ✗ | ✔ | ✗ | ✔ |
| hasTokenCaseInsensitive (*) | ✗ | ✗ | ✗ | ✔ | ✗ | ✗ |
| hasTokenCaseInsensitiveOrNull (*) | ✗ | ✗ | ✗ | ✔ | ✗ | ✗ |
Functions with a constant argument that is less than the n-gram size can't be used by `ngrambf_v1` for query optimization.
(*) For `hasTokenCaseInsensitive` and `hasTokenCaseInsensitiveOrNull` to be effective, the `tokenbf_v1` index must be created on lowercased data, for example `INDEX idx (lower(str_col)) TYPE tokenbf_v1(512, 3, 0)`.
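A minimal sketch of the matching query, reusing the `str_col` column from that example and assuming a `ds` Data Source:

```sql
-- Effective only because the tokenbf_v1 index above was
-- built on lower(str_col), so tokens are stored lowercased.
SELECT count() FROM ds WHERE hasTokenCaseInsensitive(str_col, 'Error');
```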
Bloom filters can have false positive matches, so the `ngrambf_v1`, `tokenbf_v1`, and `bloom_filter` indexes can't be used for optimizing queries where the result of a function is expected to be false.
For example:
- Can be optimized:
  - `s LIKE '%test%'`
  - `NOT s NOT LIKE '%test%'`
  - `s = 1`
  - `NOT s != 1`
  - `startsWith(s, 'test')`
- Can't be optimized:
  - `NOT s LIKE '%test%'`
  - `s NOT LIKE '%test%'`
  - `NOT s = 1`
  - `s != 1`
  - `NOT startsWith(s, 'test')`
Settings¶
For a list of supported settings, see Engine settings.