Built to make your ClickHouse stream ingestion seamless

Features to help you quickly move Kafka streams to ClickHouse and apply stateful processing.

Connect and integrate

Seamless integration with your development cycles and flexible connections to your Kafka and ClickHouse deployments.

Auto-detect field mapping to ClickHouse

GlassFlow automatically detects and converts date formats between your Kafka topic and your ClickHouse tables. For example, it transforms MongoDB timestamps (arriving via Kafka) into ClickHouse-compatible date formats. No manual mapping required.
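
For illustration, here is roughly what that conversion amounts to, as a minimal sketch using only Python's standard library (the timestamp value is a made-up example, not GlassFlow's internal code):

```python
# Minimal sketch of the date conversion involved (illustrative only;
# not GlassFlow's internals). The timestamp value is a made-up example.
from datetime import datetime, timezone

mongo_ts_ms = 1718205296789  # MongoDB-style millisecond epoch timestamp
dt = datetime.fromtimestamp(mongo_ts_ms / 1000, tz=timezone.utc)
print(dt.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-06-12 15:14:56
```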

Python SDK for programmatic pipeline creation

With GlassFlow's Python SDK, data engineers can define, test, and deploy transformations programmatically, making it easy to integrate GlassFlow into existing DevOps and data engineering workflows.
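
A minimal sketch of what pipeline-as-code can look like. The module, class, and method names below are illustrative assumptions, not the SDK's actual surface; consult the SDK documentation for the real API:

```python
# Hypothetical sketch -- names like Pipeline, KafkaSource, and
# ClickHouseSink are illustrative, not GlassFlow's real SDK API.
import glassflow  # assumed package name

pipeline = glassflow.Pipeline(
    name="orders-to-clickhouse",
    source=glassflow.KafkaSource(brokers=["broker-1:9092"], topic="orders"),
    sink=glassflow.ClickHouseSink(host="clickhouse.internal", table="orders"),
)
pipeline.deploy()  # create or update the pipeline from code
```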

Connect to all Kafka providers

Connect GlassFlow to any Kafka provider, including Amazon MSK, Redpanda, and Confluent. Supported connection protocols include SASL, SSL, and more. Our ClickHouse connector is built on the native protocol, giving you the best possible performance.
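
For orientation, a SASL-over-SSL connection to a managed Kafka cluster typically reduces to a handful of standard client settings. The keys below follow the common librdkafka convention; GlassFlow's own connection form may label them differently, and the broker address and credentials are placeholders:

```python
# Standard librdkafka-style Kafka client settings for SASL over SSL;
# broker address and credentials are placeholders.
kafka_connection = {
    "bootstrap.servers": "broker-1.example.com:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-256",
    "sasl.username": "glassflow",
    "sasl.password": "********",
}
```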

No-code web UI

GlassFlow’s web UI offers a guided experience for building and deploying real-time data pipelines. Powerful enough for engineers, clear enough for analysts.

Process and transform

GlassFlow includes data transformations and stateful processing to make your use cases run smoothly with minimal effort.

7-day deduplication

Duplicates are automatically detected within a window of up to 7 days, keeping your data clean without exhausting storage. Deduplication can be based on the first or last event entering the pipeline.

Stateful processing

A built-in, lightweight state store enables low-latency, in-memory deduplication and joins, retaining context within the selected time window.

Joins, simplified

Define the fields of the Kafka streams that you would like to join, and GlassFlow handles execution and state management automatically.
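
Conceptually, a keyed stream join buffers each side until its counterpart arrives. The sketch below shows the core idea in plain Python; it illustrates the technique, not GlassFlow's implementation, which also handles window expiry and state persistence:

```python
# Toy keyed join of two streams (Python 3.10+). GlassFlow manages this
# buffering, expiry, and state for you.
left_buffer: dict[str, dict] = {}
right_buffer: dict[str, dict] = {}

def on_left(key: str, event: dict) -> dict | None:
    right = right_buffer.pop(key, None)
    if right is not None:
        return {**event, **right}  # both sides present: emit the joined record
    left_buffer[key] = event       # otherwise buffer this side and wait
    return None

def on_right(key: str, event: dict) -> dict | None:
    left = left_buffer.pop(key, None)
    if left is not None:
        return {**left, **event}
    right_buffer[key] = event
    return None

on_left("user_42", {"user_id": "user_42", "name": "Ada"})
print(on_right("user_42", {"order_id": 7, "amount": 9.99}))
# {'user_id': 'user_42', 'name': 'Ada', 'order_id': 7, 'amount': 9.99}
```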

Auto-format any JSON to a flattened table

Nested JSON structures are automatically flattened, ensuring seamless ingestion into ClickHouse tables without complex parsing logic.
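
As a minimal sketch of the technique (the underscore-joined column names are an illustrative choice, not necessarily GlassFlow's exact naming scheme):

```python
# Toy recursive flattening: nested keys are joined into column-style
# names so the record maps onto a flat ClickHouse table.
def flatten(obj: dict, prefix: str = "") -> dict:
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}_"))  # recurse into nesting
        else:
            flat[name] = value
    return flat

event = {"user": {"id": 42, "geo": {"country": "DE"}}, "amount": 9.99}
print(flatten(event))
# {'user_id': 42, 'user_geo_country': 'DE', 'amount': 9.99}
```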

Operate and observe

Get full visibility and control over your running pipelines. You can monitor performance, track data flow in real time, and quickly identify bottlenecks or errors.

DLQ to keep your pipeline running

A dead-letter queue (DLQ) automatically captures and isolates problematic events without disrupting data flow, making debugging and recovery effortless. Simply replay your events after making adjustments.

Analyze each step of the pipeline

End-to-end visibility into data flow, latency, and throughput, complete with metrics, logs, and dashboards. Connect Prometheus and Grafana to centralize your observability.

<12ms processing per event

GlassFlow processes events in under 12ms per record, enabling real-time stream transformations at scale.

Cost-efficient footprint

GlassFlow is lightweight. No clusters to manage or infrastructure to provision. It minimizes operational overhead and costs while ensuring high performance and reliability for every pipeline.

Deploy and secure

Make your data pipelines production-ready with one-click deployment, role-based access control, and end-to-end encryption.

Controlled usage

GlassFlow integrates with standard authentication frameworks such as Kerberos, providing secure and familiar identity management.

Built to scale in your environment

GlassFlow runs natively on Kubernetes, leveraging its scaling, reliability, and orchestration capabilities, making it easy for you to self-host.

GlassFlow is secure

All data handled by GlassFlow is encrypted both at rest and in transit, ensuring end-to-end protection for sensitive information.

Frequently asked questions

Feel free to contact us if you have any questions after reviewing our FAQs.

Do you have a demo?

We have prepared several demo setups that you can run yourself locally or in the cloud. You can find them here.

How is GlassFlow’s deduplication different from ClickHouse’s ReplacingMergeTree?

ReplacingMergeTree (RMT) performs deduplication via background merges, which can delay accurate query results unless you force merges with FINAL—which can significantly impact read performance. GlassFlow moves deduplication upstream, before data is written to ClickHouse, ensuring real-time correctness and reducing load on ClickHouse.

How does GlassFlow’s deduplication work?

GlassFlow’s deduplication is powered by NATS JetStream and is based on a user-defined key (e.g. user_id) and a time window (e.g. 1 hour) to identify duplicates. When multiple events with the same key arrive within the configured time window, only the first event is written to ClickHouse. Any subsequent events with the same key during that window are discarded. This mechanism ensures that only unique events are persisted, avoiding duplicates caused by retries or upstream noise.
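
A minimal sketch of this keep-first windowed logic in plain Python (illustrative only; in GlassFlow the state is managed by NATS JetStream):

```python
# Keep-first windowed deduplication, as a toy (Python 3.10+). Events
# sharing a key within the window are discarded after the first one.
import time

class Deduplicator:
    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.first_seen: dict[str, float] = {}  # key -> first-seen time

    def accept(self, key: str, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        first = self.first_seen.get(key)
        if first is not None and now - first < self.window:
            return False            # duplicate inside the window: discard
        self.first_seen[key] = now  # first occurrence (or window expired)
        return True

dedup = Deduplicator(window_seconds=3600)  # 1-hour window
print(dedup.accept("user_42", now=0))      # True  -> written to ClickHouse
print(dedup.accept("user_42", now=120))    # False -> discarded as duplicate
print(dedup.accept("user_42", now=4000))   # True  -> outside the window
```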

Why do duplicates happen in Kafka pipelines at all?

Duplicate events in Kafka can occur for several reasons, including producer retries, network issues, or consumer reprocessing after failures. For example, if a producer doesn’t receive an acknowledgment, it may retry sending the same event—even if Kafka already received and stored it. Similarly, consumers might reprocess events after a crash or restart if offsets weren’t committed properly. These duplicates become a problem when writing to systems like ClickHouse, which are optimized for fast analytical queries but don’t handle event deduplication natively. Without a deduplication layer, the same event could be stored multiple times, inflating metrics, skewing analysis, and consuming unnecessary storage.
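
A toy model of the retry path (deliberately simplified; real producers use acknowledgments, timeouts, and bounded retries):

```python
# Toy model of at-least-once delivery creating a duplicate: the broker
# persists the event, the acknowledgment is lost, the producer retries.
log = []

def broker_append(event) -> bool:
    log.append(event)  # broker stores the event...
    return False       # ...but the ack never reaches the producer

def send_with_retry(event, attempts: int = 2) -> None:
    for _ in range(attempts):
        if broker_append(event):  # no ack seen, so try again
            return

send_with_retry({"order_id": 1})
print(log)  # [{'order_id': 1}, {'order_id': 1}] -- the same event twice
```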

What happens during failures? Can you lose or duplicate data?

GlassFlow uses NATS JetStream as a buffer. Kafka offsets are only committed after successful ingestion into NATS, and then data is deduplicated and written to ClickHouse. We batch inserts using the ClickHouse native protocol. If the system crashes after acknowledging Kafka but before inserting into ClickHouse, that batch is lost. We’re actively improving recovery guarantees to address this gap.
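
A toy model of that commit ordering (illustrative, not GlassFlow's code): the consumer advances its offset only after the event is safely buffered, so a crash before buffering re-delivers the event rather than losing it, and the deduplication layer absorbs the resulting duplicate:

```python
# Simplified sketch of commit-after-buffer ordering. buffer_event stands
# in for a NATS JetStream publish that returns once the event is stored.
buffered = []
committed_offset = -1

def buffer_event(offset: int, event: dict) -> bool:
    buffered.append((offset, event))  # stand-in for a JetStream publish
    return True                       # ack received

for offset, event in enumerate([{"id": "a"}, {"id": "b"}]):
    if buffer_event(offset, event):
        committed_offset = offset  # commit only after successful buffering

print(committed_offset, buffered)
# 1 [(0, {'id': 'a'}), (1, {'id': 'b'})]
```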

What is the load that GlassFlow can handle?

GlassFlow can easily handle hundreds of records per second across multiple pipelines.

Which features are coming next?

At GlassFlow, we constantly improve our product based on user feedback. Here is a link to our public roadmap. If a feature is missing, you can request it directly on the roadmap page.

How do I self-host GlassFlow?

We have several hosting options. You can find them here.

Cleaned Kafka Streams for ClickHouse

Clean Data. No maintenance. Less load for ClickHouse.
