Built to make your ClickHouse stream ingestion seamless

Features to help you quickly move Kafka streams to ClickHouse and apply stateful processing.

Connect and integrate

Process and transform

Operate and observe

Deploy and secure

Connect and integrate

Seamless integration with your development cycles and flexible connections to your Kafka and ClickHouse deployments.

Auto-detect field mapping to ClickHouse

GlassFlow automatically detects and converts date formats between your Kafka topics and your ClickHouse tables, for example transforming MongoDB timestamps (arriving via Kafka) into ClickHouse-compatible date formats. No manual mapping required.
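
As an illustration (a minimal sketch, not GlassFlow's internal code; the field name createdAt and the target format are assumptions), this is the kind of conversion that would otherwise need manual mapping:

    # Convert a MongoDB-style epoch-milliseconds timestamp (as it typically
    # arrives via Kafka) into a ClickHouse-compatible DateTime string.
    from datetime import datetime, timezone

    def to_clickhouse_datetime(epoch_millis: int) -> str:
        dt = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
        return dt.strftime("%Y-%m-%d %H:%M:%S")

    event = {"user_id": "u42", "createdAt": 1718150400000}  # hypothetical event
    event["createdAt"] = to_clickhouse_datetime(event["createdAt"])
    print(event)  # {'user_id': 'u42', 'createdAt': '2024-06-12 00:00:00'}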

Python SDK for programmatic pipeline creation

With GlassFlow's Python SDK, data engineers can define, test, and deploy transformations programmatically, making it easy to integrate GlassFlow into existing DevOps and data engineering workflows.
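
As a hedged sketch of what programmatic pipeline creation can look like (the endpoint URL and configuration keys below are illustrative assumptions, not the documented SDK API), a pipeline is simply a definition you build and submit from Python:

    # Illustrative only: a pipeline defined as plain data and submitted over HTTP.
    import json
    import urllib.request

    pipeline = {
        "name": "orders-to-clickhouse",
        "source": {"type": "kafka", "topic": "orders", "brokers": ["broker:9092"]},
        "transform": {"deduplicate": {"key": "order_id", "window": "7d"}},
        "sink": {"type": "clickhouse", "database": "default", "table": "orders_clean"},
    }

    request = urllib.request.Request(
        "http://localhost:8080/api/v1/pipelines",  # hypothetical local instance
        data=json.dumps(pipeline).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urllib.request.urlopen(request)  # submit once an instance is running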

Connect to all Kafka providers

Connect GlassFlow to any Kafka provider, including Amazon MSK, Redpanda, and Confluent. Supported connection protocols include SASL, SSL, and more. Our ClickHouse connector is built on the native protocol, giving you the best possible performance.

No-code web UI

GlassFlow’s web UI offers a guided experience for building and deploying real-time data pipelines, enabling not just engineers but also analysts to use it.

Process and transform

GlassFlow includes data transformations and stateful processing to make your real-world use cases run smoothly with minimal effort.

7-day deduplication

Duplicates are automatically detected within a window of up to 7 days, keeping your data clean and your storage from being exhausted. Deduplication can be based on the first or the last event entering the pipeline.

Stateful processing

A built-in, lightweight state store enables low-latency, in-memory deduplication and joins, with context retained within the selected time window.

Joins, simplified.

Define the fields of the Kafka streams that you would like to join, and GlassFlow handles execution and state management automatically.
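
A minimal sketch of what a join definition amounts to (the stream names, keys, and configuration shape are assumptions for illustration, not GlassFlow's exact schema):

    # Hypothetical join definition: name the streams, the fields to join on,
    # and how long matching state should be retained.
    join = {
        "left": {"topic": "orders", "join_key": "user_id"},
        "right": {"topic": "users", "join_key": "id"},
        "window": "1h",                      # retention window for join state
        "output_table": "orders_enriched",
    }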

Auto-format any JSON to a flattened table

Nested JSON structures are automatically flattened, ensuring seamless ingestion into ClickHouse tables without complex parsing logic.
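
The idea, shown as a small Python sketch (illustrative, not GlassFlow's implementation):

    # Flatten nested JSON into flat column names that map directly to
    # ClickHouse table columns.
    def flatten(obj: dict, parent: str = "", sep: str = "_") -> dict:
        flat = {}
        for key, value in obj.items():
            name = f"{parent}{sep}{key}" if parent else key
            if isinstance(value, dict):
                flat.update(flatten(value, name, sep))
            else:
                flat[name] = value
        return flat

    event = {"user": {"id": 42, "geo": {"country": "DE"}}, "amount": 9.99}
    print(flatten(event))
    # {'user_id': 42, 'user_geo_country': 'DE', 'amount': 9.99}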

Operate and observe

Get full visibility and control over your running pipelines. You can monitor performance, track data flow in real time and quickly identify bottlenecks or errors.

DLQ to keep your pipeline running

GlassFlow's dead-letter queue (DLQ) automatically captures and isolates problematic events without disrupting data flow, making debugging and recovery effortless. Simply re-run events after making adjustments.

Analyze each step of the pipeline

End-to-end visibility into data flows, latency, and throughput, complete with metrics, logs, and dashboards. Connect Prometheus and Grafana to centralize your observability.

<12ms processing per event

GlassFlow processes events in under 12ms per record, enabling real-time stream transformations at scale.

Cost-efficient footprint

GlassFlow is lightweight: there are no clusters to manage or infrastructure to provision. It minimizes operational overhead and costs while ensuring high performance and reliability for every pipeline.

Deploy and secure

Make your data pipelines production-ready with one-click deployment, role-based access control, and end-to-end encryption.

Built to scale on your environment

GlassFlow runs natively on Kubernetes, leveraging its scaling, reliability, and orchestration capabilities and making it easy for you to self-host.

Controlled usage

GlassFlow integrates with standard authentication frameworks such as Kerberos, providing secure and familiar identity management.

GlassFlow is secure

All data handled by GlassFlow is encrypted both at rest and in transit, ensuring end-to-end protection for sensitive information.

Frequently asked questions

Feel free to contact us if you have any questions after reviewing our FAQs.

Do you have a demo?

We have prepared several demo setups that you can run yourself locally or in the cloud. You can find them here.

How is GlassFlow’s deduplication different from ClickHouse’s ReplacingMergeTree?

ReplacingMergeTree (RMT) performs deduplication via background merges, which can delay accurate query results unless you force merges with FINAL—which can significantly impact read performance. GlassFlow moves deduplication upstream, before data is written to ClickHouse, ensuring real-time correctness and reducing load on ClickHouse.

How does GlassFlow’s deduplication work?

GlassFlow’s deduplication is powered by NATS JetStream and is based on a user-defined key (e.g. user_id) and a time window (e.g. 1 hour) to identify duplicates. When multiple events with the same key arrive within the configured time window, only the first event is written to ClickHouse. Any subsequent events with the same key during that window are discarded. This mechanism ensures that only unique events are persisted, avoiding duplicates caused by retries or upstream noise.
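
A toy Python sketch of these semantics (illustrative only; the actual engine is backed by NATS JetStream, not an in-memory dict):

    # Keyed, time-windowed deduplication: the first event for a key within the
    # window is written, later events with the same key are discarded.
    import time

    WINDOW_SECONDS = 3600  # e.g. a 1-hour window
    seen: dict[str, float] = {}  # dedup key -> time the key was first seen

    def should_write(event: dict, key_field: str = "user_id") -> bool:
        now = time.time()
        key = event[key_field]
        first_seen = seen.get(key)
        if first_seen is not None and now - first_seen < WINDOW_SECONDS:
            return False  # duplicate within the window: discard
        seen[key] = now
        return True  # first event for this key in the window: write it

    print(should_write({"user_id": "u1"}))  # True
    print(should_write({"user_id": "u1"}))  # False (duplicate within 1 hour)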

Why do duplicates happen in Kafka pipelines at all?

Duplicate events in Kafka can occur for several reasons, including producer retries, network issues, or consumer reprocessing after failures. For example, if a producer doesn’t receive an acknowledgment, it may retry sending the same event—even if Kafka already received and stored it. Similarly, consumers might reprocess events after a crash or restart if offsets weren’t committed properly. These duplicates become a problem when writing to systems like ClickHouse, which are optimized for fast analytical queries but don’t handle event deduplication natively. Without a deduplication layer, the same event could be stored multiple times, inflating metrics, skewing analysis, and consuming unnecessary storage.

What happens during failures? Can you lose or duplicate data?

GlassFlow uses NATS JetStream as a buffer. Kafka offsets are only committed after successful ingestion into NATS, and then data is deduplicated and written to ClickHouse. We batch inserts using the ClickHouse native protocol. If the system crashes after acknowledging Kafka but before inserting into ClickHouse, that batch is lost. We’re actively improving recovery guarantees to address this gap.

What is the load that GlassFlow can handle?

We have created a load test for a local setup. You can find the setup and the results here.

How do I self-host GlassFlow?

We have several hosting options. You can find them here.

Cleaned Kafka Streams for ClickHouse

Clean Data. No maintenance. Less load for ClickHouse.
