SummingMergeTree engine¶
The engine inherits from MergeTree. The difference is that when merging data parts for SummingMergeTree
Data Sources, the database replaces all the rows with the same primary key with one row which contains summarized values for the columns with the numeric data type. If the sorting key is composed in a way that a single key value corresponds to large number of rows, this significantly reduces storage volume and speeds up data selection.
Remember that, if not specified, the ENGINE_PRIMARY_KEY
matches the ENGINE_SORTING_KEY
.
Use the SummingMergeTree engine together with MergeTree
. Store complete data in MergeTree
table, and use SummingMergeTree
for aggregated data storing. Such an approach prevents you from losing valuable data due to an incorrectly composed primary key.
AggregatingMergeTree is generally a better choice because it allows more operations and behaves similarly to SummingMergeTree.
Query clauses¶
When creating a SummingMergeTree
table, the same clauses as when creating a MergeTree
table are required.
Usage example¶
summing_merge_tree.datasource
SCHEMA > `key` UInt32, `value` UInt32 ENGINE "SummingMergeTree" ENGINE_SORTING_KEY "key"
$ cat fixtures/summt.csv key,value 1,1 1,2 2,1 $ tb datasource append summing_merge_tree fixtures/summt.csv $ tb sql "SELECT key, sum(value) FROM summing_merge_tree GROUP BY key" ┌─key─┬─sum(value)─┐ │ 1 │ 3 │ │ 2 │ 1 │ └─────┴────────────┘
Data Processing¶
When rows are inserted into a Data Source, they are saved as-is. SummingMergeTree engine merges the inserted parts of data periodically and this is when rows with the same primary key are summed and replaced with one for each resulting part of data.
Merge process is async and trigger can't be controlled, i.e. the summation will be incomplete. Therefore when querying, the aggregate function sum()
and GROUP BY
clause should be used as described in the example above.
Common Rules for Summation¶
The values in the columns with the numeric data type are summarized. If the values were 0 in all of the columns for summation, the row is deleted. If the column isn't in the primary key and isn't summarized, an arbitrary value is selected from the existing ones.
The values aren't summarized for columns in the primary key.
Summation in the Aggregatefunction Columns¶
For columns of the AggregateFunction type, SummingMergeTree engine behaves as the AggregatingMergeTree engine, aggregating according to the function.
Settings¶
For a list of supported settings, see Engine settings.