CollapsingMergeTree engine¶
The CollapsingMergeTree
engine inherits from MergeTree and adds the logic of rows collapsing to data parts merge algorithm.
CollapsingMergeTree
asynchronously deletes, or collapses, pairs of rows if all of the fields in a sorting key (ENGINE_SORTING_KEY
) are equivalent except the particular field Sign
, which can have 1
and -1
values. Rows without a pair are kept. The engine may significantly reduce the volume of storage and increase the efficiency of SELECT
queries.
CollapsingMergeTree parameters¶
sign¶
ENGINE_SIGN
- name of the column with the type of row: 1
is a “state” row, -1
is a “cancel” row.
The column data type should be Int8
.
Usage example¶
cmt.datasource
SCHEMA > UserID UInt64 `json:$.UserID`, PageViews UInt8 `json:$.PageViews`, Duration UInt8 `json:$.Duration`, Sign Int8 `json:$.Sign` ENGINE "CollapsingMergeTree" ENGINE_SORTING_KEY "UserID" ENGINE_SIGN "Sign"
tb push datasources/cmt.datasource TB_HOST=$(cat .tinyb | jq ".host" -r) TB_TOKEN=$(cat .tinyb | jq ".token" -r) curl \ -H "Authorization: Bearer $TB_TOKEN" \ -d '{"UserID": 4324182021466249494, "PageViews": 5, "Duration": 146, "Sign": 1}' \ "$TB_HOST/v0/events?name=cmt" tb sql "select * from cmt" ----------------------------------------------------- | UserID | PageViews | Duration | Sign | ----------------------------------------------------- | 4324182021466249494 | 5 | 146 | 1 | -----------------------------------------------------
Let's add another row with the same UserID
and negate the previous row (same row with -1 as Sign
value).{"UserID": 4324182021466249494, "PageViews": 5, "Duration": 146, "Sign": -1}
curl \ -H "Authorization: Bearer $TB_TOKEN" \ -d $'{"UserID": 4324182021466249494, "PageViews": 6, "Duration": 185, "Sign": 1}\n{"UserID": 4324182021466249494, "PageViews": 5, "Duration": 146, "Sign": -1}\n' \ "$TB_HOST/v0/events?name=cmt" tb sql "select * from cmt" ----------------------------------------------------- | UserID | PageViews | Duration | Sign | ----------------------------------------------------- | 4324182021466249494 | 5 | 146 | 1 | | 4324182021466249494 | 6 | 185 | 1 | | 4324182021466249494 | 5 | 146 | -1 | -----------------------------------------------------
What happened here? Collapsing happens during merges, which are performed asynchronously and can't be controlled, so you should force it using FINAL
or query using aggregation.
tb sql "select * from cmt final" ----------------------------------------------------- | UserID | PageViews | Duration | Sign | ----------------------------------------------------- | 4324182021466249494 | 6 | 185 | 1 | ----------------------------------------------------- tb sql "SELECT UserID, sum(PageViews * Sign) AS PageViews, sum(Duration * Sign) AS Duration FROM cmt GROUP BY UserID HAVING sum(Sign) > 0" ---------------------------------------------- | UserID | PageViews | Duration | ---------------------------------------------- | 4324182021466249494 | 6 | 185 | ----------------------------------------------
A different approach would be to negate all the numeric columns and then sum them up.{"UserID": 4324182021466249494, "PageViews": -5, "Duration": -146, "Sign": -1}
curl \ -H "Authorization: Bearer $TB_TOKEN" \ -d '{"UserID": 4324182021466249494, "PageViews": 5, "Duration": 146, "Sign": 1}' \ "$TB_HOST/v0/events?name=cmt" tb sql "select * from cmt" ----------------------------------------------------- | UserID | PageViews | Duration | Sign | ----------------------------------------------------- | 4324182021466249494 | 5 | 146 | 1 | ----------------------------------------------------- curl \ -H "Authorization: Bearer $TB_TOKEN" \ -d $'{"UserID": 4324182021466249494, "PageViews": 6, "Duration": 185, "Sign": 1}\n{"UserID": 4324182021466249494, "PageViews": -5, "Duration": -146, "Sign": -1}\n' \ "$TB_HOST/v0/events?name=cmt" tb sql "SELECT UserID, sum(PageViews) AS PageViews, sum(Duration) AS Duration FROM UAct GROUP BY UserID" ---------------------------------------------- | UserID | PageViews | Duration | ---------------------------------------------- | 4324182021466249494 | 6 | 185 | ----------------------------------------------
Query clauses¶
When creating a CollapsingMergeTree
table, the same query clauses are required as when creating a MergeTree
table.
Settings¶
For a list of supported settings, see Engine settings.