AggregatingMergeTree engine

Use the AggregatingMergeTree engine for incremental data aggregation, including for aggregated materialized views.

The engine processes all columns with the following types:

Use AggregatingMergeTree if it reduces the number of rows by orders of magnitude.

Creating an AggregatingMergeTree Data Source

The AggregatingMergeTree Data Source is usually created using a Materialized View from a MergeTree Data Source.

Example:

user_activity.datasource
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `user_id` UInt32 `json:$.user_id`,
    `activity_type` String `json:$.activity_type`,
    `payload` JSON `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "activity_type, user_id, timestamp"
total_daily_activities_mat.pipe
NODE total_daily_activities_1
SQL >

    SELECT
        activity_type,
        toDate(timestamp) AS date,
        countState() AS total_activities
    FROM user_activity
    GROUP BY
        activity_type,
        date

TYPE materialized
DATASOURCE total_daily_activities_mv
total_daily_activities_mv.datasource

SCHEMA >
    `activity_type` String,
    `date` Date,
    `total_activities` AggregateFunction(count)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "activity_type, date"

Querying an AggregatingMergeTree Data Source

As explained in the MergeTree engine, parts are merged in the background, so you are never sure if the data is merged or not. That's why you should always use the -Merge() modifier after a -State() when needed.

Querying an AggregatingMergeTree Data Source
SELECT 
    activity_type, 
    date, 
    countMerge(total_activities) AS daily_total_activities
FROM total_daily_activities_mv
GROUP BY activity_type, date

Settings

For a list of supported settings, see Engine settings.

Was this page helpful?
Updated