SummingMergeTree engine

The engine inherits from MergeTree. The difference is that when merging data parts for SummingMergeTree Data Sources, the database replaces all the rows with the same primary key with one row which contains summarized values for the columns with the numeric data type. If the sorting key is composed in a way that a single key value corresponds to large number of rows, this significantly reduces storage volume and speeds up data selection.

Remember that, if not specified, the ENGINE_PRIMARY_KEY matches the ENGINE_SORTING_KEY.

Use the SummingMergeTree engine together with MergeTree. Store complete data in MergeTree table, and use SummingMergeTree for aggregated data storing. Such an approach prevents you from losing valuable data due to an incorrectly composed primary key.

AggregatingMergeTree is generally a better choice because it allows more operations and behaves similarly to SummingMergeTree.

Query clauses

When creating a SummingMergeTree table, the same clauses as when creating a MergeTree table are required.

Usage example

summing_merge_tree.datasource
SCHEMA >
    `key` UInt32,
    `value` UInt32

ENGINE "SummingMergeTree"
ENGINE_SORTING_KEY "key"
$ cat fixtures/summt.csv 
key,value
1,1
1,2
2,1

$ tb datasource append summing_merge_tree fixtures/summt.csv

$ tb sql "SELECT key, sum(value) FROM summing_merge_tree GROUP BY key"
┌─key─┬─sum(value)─┐
│   1 │          3 │
│   2 │          1 │
└─────┴────────────┘

Data Processing

When rows are inserted into a Data Source, they are saved as-is. SummingMergeTree engine merges the inserted parts of data periodically and this is when rows with the same primary key are summed and replaced with one for each resulting part of data.

Merge process is async and trigger can't be controlled, i.e. the summation will be incomplete. Therefore when querying, the aggregate function sum() and GROUP BY clause should be used as described in the example above.

Common Rules for Summation

The values in the columns with the numeric data type are summarized. If the values were 0 in all of the columns for summation, the row is deleted. If the column isn't in the primary key and isn't summarized, an arbitrary value is selected from the existing ones.

The values aren't summarized for columns in the primary key.

Summation in the Aggregatefunction Columns

For columns of the AggregateFunction type, SummingMergeTree engine behaves as the AggregatingMergeTree engine, aggregating according to the function.

Settings

For a list of supported settings, see Engine settings.

Updated