Kafka to ClickHouse connector.
Open Source. Native. Fast. Reliable.

Open-source Kafka to ClickHouse connector with built-in deduplication, configurable batching, and native ClickHouse delivery. No Kafka Connect, no custom consumers. Set up in minutes.

Managed Connectors

The Kafka and ClickHouse connectors are built and maintained by the GlassFlow team.

High Performance

The connectors are optimized for throughput and deliver data natively to ClickHouse.

Clean Data

You can deduplicate and join Kafka streams within GlassFlow before ingesting them into ClickHouse. Automatic retries keep your data up to date.

Comparison

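See in detail how GlassFlow compares with alternative solutions.

Solutions are compared on: open source, quick to start, low maintenance, latency, built-in stateful processing, error handling, and transformation support.

  • GlassFlow: very low latency; error handling via built-in retries with backoff; advanced transformation support

  • ClickHouse Kafka Table Engine: very low latency; basic error handling; limited transformation support

  • ClickPipes for Kafka: very low latency; built-in retries and monitoring; basic transformation support

  • Go service: very low latency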
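GlassFlow's error handling is listed as built-in retries with backoff, but the page does not describe the exact policy. As an illustration of the general pattern, here is a minimal sketch of capped exponential backoff with full jitter; the `retry_with_backoff` helper and its default delays are hypothetical, not GlassFlow's implementation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,   # hypothetical: first retry waits up to 0.1 s
    max_delay: float = 5.0,    # hypothetical cap on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on any exception with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Upper bound doubles per attempt (0.1, 0.2, 0.4, ...) until max_delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: a random wait in [0, delay] avoids synchronized retries
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable; in a real pipeline the wrapped operation would be something like a single ClickHouse insert.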

From Kafka to ClickHouse: learn how to stream data using the Kafka Table Engine, ClickPipes, or Kafka Connect, and understand when to use each.

How does it work?

Supports multiple Kafka topics and partitions

GlassFlow natively supports consuming from multiple Kafka topics and partitions in parallel, ensuring high-throughput and scalable ingestion. It automatically handles partition assignment, offset tracking, and rebalancing behind the scenes. This allows you to build unified pipelines that process data from various sources without manual coordination.
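The page does not say which assignment strategy GlassFlow uses. To illustrate the two bookkeeping tasks named above, here is a minimal sketch assuming round-robin assignment of (topic, partition) pairs and hypothetical `assign_round_robin` / `OffsetTracker` names. In practice a Kafka consumer group's assignor performs the assignment, and a rebalance recomputes it when workers join or leave.

```python
from dataclasses import dataclass, field

TopicPartition = tuple[str, int]  # (topic name, partition number)

def assign_round_robin(
    partitions: list[TopicPartition], workers: list[str]
) -> dict[str, list[TopicPartition]]:
    """Spread (topic, partition) pairs from all topics across workers in round-robin order."""
    assignment: dict[str, list[TopicPartition]] = {w: [] for w in workers}
    for i, tp in enumerate(sorted(partitions)):  # sort for a deterministic result
        assignment[workers[i % len(workers)]].append(tp)
    return assignment

@dataclass
class OffsetTracker:
    """Tracks, per (topic, partition), the next offset that is safe to commit."""
    committed: dict[TopicPartition, int] = field(default_factory=dict)

    def mark_processed(self, tp: TopicPartition, offset: int) -> None:
        # Kafka convention: the committed offset is the *next* offset to read,
        # and it never moves backwards if records are acknowledged out of order
        self.committed[tp] = max(self.committed.get(tp, 0), offset + 1)
```

For example, partitions `("orders", 0)`, `("orders", 1)`, and `("clicks", 0)` split across two workers give one worker two partitions from different topics, which is what lets a single pipeline read several topics in parallel.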

Adjustable waiting times for optimal throughput

GlassFlow lets you configure wait times between batch reads from Kafka, allowing you to control how often data is flushed downstream. By adjusting this interval, you can optimize the trade-off between latency and throughput based on your workload. This flexibility helps maximize performance without overwhelming downstream systems like ClickHouse.
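A common way to implement a wait interval is to pair it with a size limit and flush on whichever is reached first. The sketch below assumes that behavior; the `Batcher` class, its `max_size` / `max_wait` parameters, and the periodic `tick()` call are hypothetical, not GlassFlow's configuration keys. An injectable clock keeps the timing testable.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")

class Batcher(Generic[T]):
    """Buffers records and flushes when max_size is reached or max_wait has elapsed."""

    def __init__(self, max_size: int, max_wait: float,
                 flush: Callable[[list[T]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.max_wait = max_wait  # seconds the oldest buffered record may wait
        self.flush_fn = flush     # downstream sink, e.g. a ClickHouse insert
        self.clock = clock
        self.buffer: list[T] = []
        self.first_at: Optional[float] = None  # arrival time of the oldest record

    def add(self, record: T) -> None:
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(record)
        if len(self.buffer) >= self.max_size:
            self.flush()  # size limit reached: flush immediately

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once it has waited max_wait."""
        if self.buffer and self.clock() - self.first_at >= self.max_wait:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.first_at = None
            self.flush_fn(batch)
```

A shorter `max_wait` bounds latency under light traffic; under heavy traffic the size limit triggers first, so the wait time rarely matters.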


Configurable batch sizes

GlassFlow lets you configure batch sizes for reading and processing data from Kafka, controlling how much data is handled in each batch. This lets you balance processing efficiency against memory usage as workload demands change. Larger batches generally raise throughput and reduce per-insert overhead, while smaller batches lower latency; tuning batch sizes lets you match the pipeline to your system's capacity and performance goals.
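To make the trade-off concrete, here is a back-of-the-envelope calculation. The hypothetical `estimate` helper and the inputs below (50,000 events/s at about 500 bytes each) are illustrative assumptions, not GlassFlow benchmarks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BatchEstimate:
    inserts_per_sec: float  # INSERT statements sent to ClickHouse per second
    fill_time_sec: float    # time to accumulate one full batch at the given rate
    buffer_bytes: int       # approximate memory held by one in-flight batch

def estimate(events_per_sec: float, batch_size: int, avg_event_bytes: int) -> BatchEstimate:
    """Rough numbers for a fixed batch size (ignores flushes triggered by wait time)."""
    return BatchEstimate(
        inserts_per_sec=events_per_sec / batch_size,
        fill_time_sec=batch_size / events_per_sec,
        buffer_bytes=batch_size * avg_event_bytes,
    )

small = estimate(50_000, 1_000, 500)
large = estimate(50_000, 100_000, 500)
```

With these inputs, a batch of 1,000 events means 50 inserts/s, fills in 0.02 s, and holds about 0.5 MB; a batch of 100,000 means 0.5 inserts/s but takes 2 s to fill and holds about 50 MB.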

Kafka to ClickHouse Performance

GlassFlow sustains throughput beyond 500K events per second in a single pipeline while performing real-time transformations and delivering optimized batches to ClickHouse. This is achieved without requiring additional stream processing frameworks or custom consumer services.

Key performance characteristics:

  • Batch delivery latency: under 0.12ms per record end-to-end

  • Backpressure handling: automatic. The ingestor pauses Kafka consumption when the internal buffer fills, resuming when ClickHouse catches up. No events are dropped or lost.

  • Insert efficiency: configurable batch sizes prevent the "too many parts" error that degrades ClickHouse performance under high-frequency small inserts

  • Deduplication window: up to 7 days, configurable — duplicates from Kafka retries and rebalances are dropped before they reach ClickHouse
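The page does not describe how GlassFlow stores deduplication state. As a minimal in-memory sketch of a time-windowed deduplicator, assuming each event carries a unique ID and timestamps arrive in non-decreasing order (the `WindowDeduplicator` class is hypothetical):

```python
from collections import OrderedDict

class WindowDeduplicator:
    """Drops events whose ID was already seen within the last `window_sec` seconds."""

    def __init__(self, window_sec: float) -> None:
        self.window_sec = window_sec
        # event ID -> first-seen time, kept in insertion (oldest-first) order
        self.seen: "OrderedDict[str, float]" = OrderedDict()

    def _expire(self, now: float) -> None:
        # Assumes non-decreasing `now`, so the oldest entries are always at the front
        while self.seen:
            _, ts = next(iter(self.seen.items()))
            if now - ts < self.window_sec:
                break
            self.seen.popitem(last=False)

    def accept(self, event_id: str, now: float) -> bool:
        """Return True if the event should be forwarded, False if it is a duplicate."""
        self._expire(now)
        if event_id in self.seen:
            return False
        self.seen[event_id] = now
        return True
```

A 7-day window is `WindowDeduplicator(window_sec=7 * 24 * 3600)`, i.e. 604,800 seconds; at high event rates that much state would likely need a persistent store rather than an in-process dictionary.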

For a full breakdown of the scaling benchmarks, see how GlassFlow scales to 500k+ events/sec.
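The backpressure behavior in the list above (pause Kafka consumption when the internal buffer fills, resume once ClickHouse catches up) can be sketched as a bounded buffer with a resume threshold. The `BackpressureBuffer` class and the `capacity` / `resume_at` watermarks are hypothetical; the page does not specify GlassFlow's thresholds.

```python
from collections import deque

class BackpressureBuffer:
    """Bounded buffer that signals the Kafka reader when to pause and resume."""

    def __init__(self, capacity: int, resume_at: int) -> None:
        if not 0 <= resume_at < capacity:
            raise ValueError("resume_at must be in [0, capacity)")
        self.capacity = capacity
        self.resume_at = resume_at  # resume once the buffer drains to this level
        self.items: deque = deque()
        self.paused = False

    def offer(self, item) -> bool:
        """Return False while paused; the caller keeps the record and does not
        advance its Kafka offset, so nothing is dropped."""
        if self.paused:
            return False
        self.items.append(item)
        if len(self.items) >= self.capacity:
            self.paused = True  # buffer full: stop reading from Kafka
        return True

    def drain(self, n: int) -> list:
        """Remove up to n items for delivery to ClickHouse."""
        batch = [self.items.popleft() for _ in range(min(n, len(self.items)))]
        if self.paused and len(self.items) <= self.resume_at:
            self.paused = False  # downstream caught up: resume reading
        return batch
```

Using a resume level below capacity avoids flapping between paused and running on every single record.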

Frequently asked questions

Feel free to contact us if you have any questions after reviewing our FAQs.

Do you have a demo?

Yes. Visit demo.glassflow.dev to see a live Kafka to ClickHouse pipeline processing real events. You can also book a proof-of-concept session, and we'll walk through your specific use case.

Which datatypes are supported?

GlassFlow supports JSON event streams out of the box, including nested JSON structures. Primitive types (strings, integers, floats, booleans, timestamps) are handled automatically. Complex nested objects can be flattened or mapped to ClickHouse column types during the transformation step.
Additional formats such as Avro and Protobuf are supported in the GlassFlow Enterprise Edition.
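The mapping from JSON primitives to ClickHouse column types is not spelled out on this page. One plausible mapping, sketched below under that assumption, uses real ClickHouse types (`Bool`, `Int64`, `Float64`, `String`, `DateTime64(3)`, `Nullable(String)`); the `infer_clickhouse_type` function and its choices are hypothetical, not GlassFlow's rules.

```python
from datetime import datetime

def infer_clickhouse_type(value: object) -> str:
    """Guess a ClickHouse column type for a JSON primitive (illustrative mapping)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "Bool"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # ISO 8601 strings become timestamps
            return "DateTime64(3)"         # millisecond precision
        except ValueError:
            return "String"
    if value is None:
        return "Nullable(String)"  # JSON null carries no type, so fall back to String
    raise TypeError(f"not a JSON primitive: {type(value).__name__}")
```

Note that `datetime.fromisoformat` accepts more formats (such as a trailing `Z`) on Python 3.11 and later than on earlier versions.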

Can you handle nested JSON?

Yes. GlassFlow includes schema normalization that flattens nested JSON before delivery to ClickHouse. This is important because ClickHouse performs best with flat, typed schemas rather than raw JSON blobs.
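Flattening can be illustrated with a small recursive function. This is a minimal sketch, assuming nested keys are joined with an underscore and arrays are left intact (they could map to ClickHouse `Array` columns); the `flatten` helper and its separator are hypothetical, not GlassFlow's normalization rules.

```python
from typing import Any

def flatten(obj: dict[str, Any], prefix: str = "", sep: str = "_") -> dict[str, Any]:
    """Flatten nested dicts into top-level keys joined by `sep`; lists are kept as-is."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested objects
        else:
            flat[name] = value
    return flat

event = {"id": 1, "user": {"name": "ada", "geo": {"country": "UK"}}, "tags": ["a", "b"]}
row = flatten(event)
```

Here `row` has the flat columns `id`, `user_name`, `user_geo_country`, and `tags`, ready to map onto a typed ClickHouse table.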

How do retries work?

How do I self-host GlassFlow?

Data transformations at TB scale for ClickHouse

Get query-ready data, lower ClickHouse load, and reliable pipelines at enterprise scale.
