VersionedCollapsingMergeTree engine

This engine allows to quickly write object states that are continually changing and deletes old object states in the background. This significantly reduces the volume of storage.

The engine inherits from MergeTree and adds the logic for collapsing rows to the algorithm for merging data parts. VersionedCollapsingMergeTree serves the same purpose as CollapsingMergeTree but uses a different collapsing algorithm that allows inserting the data in any order with multiple threads.

Engine Parameters

You can use the following parameters to configure the engine.

sign

ENGINE_SIGN is the name of the column with the type of row: 1 is a “state” row, -1 is a “cancel” row.

The column data type should be Int8.

version

ENGINE_VERSION is the name of the column with the version of the object state.

The column data type should be UInt*.

Usage example

vcmt.datasource
SCHEMA >
    UserID UInt64 `json:$.UserID`,
    PageViews UInt8 `json:$.PageViews`,
    Duration UInt8 `json:$.Duration`,
    Sign Int8 `json:$.Sign`,
    Version UInt16 `json:$.Version`

ENGINE "CollapsingMergeTree"
ENGINE_SORTING_KEY "UserID"
ENGINE_SIGN "Sign"
ENGINE_VERSION "Version"
tb push datasources/vcmt.datasource 

TB_HOST=$(cat .tinyb | jq ".host" -r)
TB_TOKEN=$(cat .tinyb | jq ".token" -r)

curl \
    -H "Authorization: Bearer $TB_TOKEN" \
    -d '{"UserID": 4324182021466249494, "PageViews": 5, "Duration": 146, "Sign": 1, "Version": 1}' \
    "$TB_HOST/v0/events?name=vcmt"

tb sql "select * from vcmt"
---------------------------------------------------------------
|              UserID | PageViews | Duration | Sign | Version |
---------------------------------------------------------------
| 4324182021466249494 |         5 |      146 |    1 |       1 |
---------------------------------------------------------------

Let's add another row with the same UserID and negate the previous row (same row with -1 as Sign value).
{"UserID": 4324182021466249494, "PageViews": 5, "Duration": 146, "Sign": -1, "Version": 2}

curl \
    -H "Authorization: Bearer $TB_TOKEN" \
    -d $'{"UserID": 4324182021466249494, "PageViews": 6, "Duration": 185, "Sign": 1, "Version": 2}\n{"UserID": 4324182021466249494, "PageViews": 5, "Duration": 146, "Sign": -1, "Version": 1}\n' \
    "$TB_HOST/v0/events?name=vcmt"

tb sql "select * from vcmt"
---------------------------------------------------------------
|              UserID | PageViews | Duration | Sign | Version |
---------------------------------------------------------------
| 4324182021466249494 |         5 |      146 | 1    |       1 |
| 4324182021466249494 |         6 |      185 | 1    |       2 |
| 4324182021466249494 |         5 |      146 | -1   |       1 |
---------------------------------------------------------------

What happened here? Collapsing happens during merges, which are performed asynchronously and can't be controlled, so you should force it using FINALor query using aggregation.

tb sql "select * from vcmt final"
---------------------------------------------------------------
|              UserID | PageViews | Duration | Sign | Version |
---------------------------------------------------------------
| 4324182021466249494 |         6 |      185 |    1 |       2 |
---------------------------------------------------------------

tb sql "SELECT
    UserID,
    sum(PageViews * Sign) AS PageViews,
    sum(Duration * Sign) AS Duration,
    Version
FROM vcmt
GROUP BY UserID, Version
HAVING sum(Sign) > 0
ORDER BY Version DESC
LIMIT 1 BY UserID"
--------------------------------------------------------
|              UserID | PageViews | Duration | Version |
--------------------------------------------------------
| 4324182021466249494 |         6 |      185 |       2 |    
--------------------------------------------------------

Query clauses

When creating a VersionedCollapsingMergeTree table, the same clauses as when creating a MergeTree table are required.

Settings

For a list of supported settings, see Engine settings.

Updated