Understanding ClickHouse’s AggregatingMergeTree Engine: Purpose-Built for High-Performance Aggregations

Understand and compare ClickHouse’s AggregatingMergeTree engine

Written by

Armend Avdijaj

Aug 7, 2025

ClickHouse is renowned for its blazing-fast OLAP (Online Analytical Processing) capabilities. While many know the default MergeTree engine, ClickHouse provides specialized variants optimized for particular workloads. Among these, AggregatingMergeTree stands out for scenarios involving pre-aggregated data. This post explores the AggregatingMergeTree engine in depth: how it works, where it shines, and how it compares with other engines such as ReplacingMergeTree.

What Is AggregatingMergeTree?

AggregatingMergeTree is a table engine in ClickHouse designed for efficiently storing and merging aggregated data. Unlike regular MergeTree tables that store raw or unprocessed data, AggregatingMergeTree stores states of aggregate functions (like sum, avg, count, etc.) and merges them periodically during background merges.

This makes it particularly powerful for pre-aggregated OLAP workloads, where the same queries with GROUP BY and aggregations are run repeatedly. You can use AggregatingMergeTree tables for incremental data aggregation, including aggregated materialized views. It is appropriate to use AggregatingMergeTree if it reduces the number of rows by orders of magnitude.

How Does It Work Internally?

ClickHouse replaces all rows with the same primary key (or more accurately, with the same sorting key) with a single row (within a single data part) that stores a combination of states of aggregate functions.

The engine processes all columns with the following types: AggregateFunction and SimpleAggregateFunction.

The SimpleAggregateFunction data type stores the intermediate state of an aggregate function, but not its full state as the AggregateFunction type does. For example, sum is a SimpleAggregateFunction. We don’t need to store all the states to get the final sum — the latest summed value alone is sufficient. However, this is not the case with aggregate functions such as avg, where both the sum and the count states must be stored so that avg can be calculated.

The SimpleAggregateFunction type has better performance than the AggregateFunction for the same aggregate functions.
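To make the distinction concrete, a table could declare both column types side by side (a minimal sketch; the table and column names are illustrative):

CREATE TABLE metrics_agg (
  key       UInt64,
  total     SimpleAggregateFunction(sum, UInt64), -- stores the running sum directly
  avg_state AggregateFunction(avg, UInt64)        -- stores a serialized (sum, count) state
)
ENGINE = AggregatingMergeTree()
ORDER BY key;

At read time, the SimpleAggregateFunction column is queried with the plain sum() function, while the AggregateFunction column must be read with the -Merge combinator (avgMerge) and written with the -State combinator (avgState).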

This optimization can be applied to functions for which the following property holds:

The result of applying a function f to a row set S1 ∪ S2 can be obtained by applying f to parts of the row set separately, and then again applying f to the results:
f(S1 ∪ S2) = f(f(S1) ∪ f(S2))

This property guarantees that partial aggregation results are enough to compute the combined one, so we do not have to store and process any extra data. For example, the results of the min or max functions require no extra steps to calculate the final result from the intermediate steps, whereas the avg function requires keeping track of a sum and a count.
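A quick numeric check makes this concrete. For max the property holds: max(max{1, 5}, max{3, 2}) = max(5, 3) = 5, the same as max{1, 5, 3, 2}. For avg it does not: avg(avg{1, 5, 3}, avg{2}) = avg(3, 2) = 2.5, but avg{1, 5, 3, 2} = 11/4 = 2.75. Computing the correct result requires carrying the partial sums (9 and 2) and counts (3 and 1) separately, which is exactly what the AggregateFunction state does.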

Working Example

Let’s consider a common aggregation scenario: you want to analyze daily user activity across a website. Normally, you might store raw clickstream data and run a query like:

SELECT user_id, count() FROM events GROUP BY user_id;

This works well — until your table grows to billions of rows.

With AggregatingMergeTree, you store the intermediate aggregation state and retrieve the aggregated metrics blazingly fast.

Example Table Definition

A. Raw Table

The raw table stores data streamed from a source, such as a Kafka server. Data may be generated every few seconds, and there’s a possibility of delayed arrivals due to network latency.

CREATE TABLE user_activity_raw (
  user_id    UInt64,
  session_id UInt64,
  timestamp  DateTime('UTC')
)
ENGINE = MergeTree()
ORDER BY (user_id, timestamp);

B. State Table

The state table stores the aggregation states per sorting key (defined by ORDER BY in the DDL). Here, the visit count for each user is updated incrementally as new data streams in.

CREATE TABLE user_activity_agg (
  user_id     UInt64,
  num_visits  SimpleAggregateFunction(sum, UInt64),
  last_visit  SimpleAggregateFunction(max, DateTime('UTC'))
)
ENGINE = AggregatingMergeTree()
ORDER BY user_id;

C. Materialized View

The materialized view processes only the incremental data being inserted into the raw table. Each new data block updates the state table automatically.

CREATE MATERIALIZED VIEW user_activity_mv
TO user_activity_agg AS
SELECT
  user_id,
  count(session_id) AS num_visits,
  max(timestamp)    AS last_visit
FROM user_activity_raw
GROUP BY user_id;

Querying the Aggregated Data

To get the fully aggregated results:

SELECT
  a.user_id,
  sum(a.num_visits) AS num_visits,
  max(a.last_visit) AS last_visit
FROM user_activity_agg AS a
GROUP BY a.user_id;
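If the state table used the full AggregateFunction type instead of SimpleAggregateFunction (for example, to track average session duration, which cannot be merged from finalized values), the query would need the -Merge combinator. A sketch, assuming a hypothetical column avg_duration of type AggregateFunction(avg, UInt32):

SELECT
  user_id,
  avgMerge(avg_duration) AS avg_duration
FROM user_activity_agg
GROUP BY user_id;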

Idempotency: Comparison with ReplacingMergeTree

Now, suppose you need idempotency while keeping the data pipeline robust enough to handle delayed data (e.g., due to poor network). You might aggregate data for the last n minutes and store the result for downstream consumption. To avoid duplication in the final aggregated table, ClickHouse offers two options: ReplacingMergeTree and AggregatingMergeTree.

ReplacingMergeTree removes duplicate entries with the same sorting key (defined by the ORDER BY clause, not PRIMARY KEY).
However, deduplication occurs only during merges — which happen in the background at unknown times. Some data may remain unprocessed, and while you can trigger merges manually using OPTIMIZE, it’s heavy and not ideal for real-time guarantees.
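Forcing a merge manually looks like this (user_activity_dedup is a hypothetical ReplacingMergeTree table):

OPTIMIZE TABLE user_activity_dedup FINAL;

As noted, this is a heavyweight operation: it rewrites data parts and is not something to run on every query cycle.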

Thus, ReplacingMergeTree is good for background deduplication to save space, but it does not guarantee the absence of duplicates at query time.

The issue with ReplacingMergeTree is that it provides only eventual correctness. Duplicate rows are identified during merges and only the highest version is retained. This means that queries can produce incorrect results until merges complete.
To ensure correct results, you need to combine background merges with query-time deduplication using the FINAL operator — which can be expensive at scale.
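Query-time deduplication with ReplacingMergeTree typically looks like this (again assuming a hypothetical user_activity_dedup table):

SELECT user_id, num_visits
FROM user_activity_dedup FINAL;

The FINAL modifier forces ClickHouse to merge data parts at read time, which restores correctness but sacrifices much of the engine's read performance on large tables.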

In high-throughput scenarios, tables can accumulate millions of duplicate records before merges catch up. Because deduplication happens at merge time, cleanup may take a long time.

From experience with both engines on large datasets, AggregatingMergeTree (especially with SimpleAggregateFunction) performs better and more predictably than ReplacingMergeTree for deduplication and aggregation workloads.
