More control
With GlassFlow, your data is deduplicated immediately, so your query results are correct without any delay.
Less Load
Drop duplicates and reduce the data volume in ClickHouse. That makes your system faster and cheaper to run.
Clean Data
By deduplicating before ingestion, you ensure that only clean data reaches ClickHouse.
Comparison
See in detail how GlassFlow compares to alternative solutions.
Deduplication
Immediate range
Late event management
Stateful store included
Quick to start
Reduced load for ClickHouse
Low maintenance effort
Open source
ClickHouse
ReplacingMergeTree
Go Service

Limits of ReplacingMergeTree in ClickHouse
ReplacingMergeTree (RMT) is very popular among ClickHouse users, but it has certain limitations. The merge process runs in the background at times you cannot control, and it slows down the system while merges are running. Querying with FINAL brings other challenges. Learn more about all of them in our blog article.
How does it work?
7-day deduplication checks
Our system automatically detects and rejects duplicate records within a time window of up to 7 days (configurable), keeping your data clean and preventing unnecessary storage use. You can define specific fields as deduplication keys, ensuring only unique data is accepted. Duplicates are identified and rejected in real time. With a one-click setup, it's easy to launch fully deduplicated data pipelines with zero manual overhead and minimal lag (<0.12 ms per record).
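To make the idea concrete, here is a minimal Python sketch of key-based deduplication inside a time window. It is an illustration only, not GlassFlow's implementation: the class name `WindowedDeduplicator`, its parameters, and the in-memory dictionary are all assumptions made for this example.

```python
from typing import Any, Dict, Iterable, Tuple


class WindowedDeduplicator:
    """Hypothetical sketch: reject records whose key was seen within the window."""

    def __init__(self, key_fields: Iterable[str], window_seconds: float = 7 * 24 * 3600):
        self.key_fields = tuple(key_fields)       # fields that form the deduplication key
        self.window_seconds = window_seconds      # assumed default: 7 days
        self._seen: Dict[Tuple[Any, ...], float] = {}  # key -> time the key was accepted

    def accept(self, record: Dict[str, Any], now: float) -> bool:
        """Return True if the record is unique within the window, False if it is a duplicate."""
        key = tuple(record[field] for field in self.key_fields)
        accepted_at = self._seen.get(key)
        if accepted_at is not None and now - accepted_at < self.window_seconds:
            return False  # same key seen inside the window: drop it
        self._seen[key] = now  # first sighting, or the earlier one has expired
        return True


dedup = WindowedDeduplicator(key_fields=["event_id"], window_seconds=10)
results = [
    dedup.accept({"event_id": "a", "value": 1}, now=0),   # first time: accepted
    dedup.accept({"event_id": "a", "value": 1}, now=5),   # inside the window: rejected
    dedup.accept({"event_id": "a", "value": 1}, now=11),  # window elapsed: accepted again
]
```

A production system would also evict expired keys and persist the state, which this sketch leaves out for brevity.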
Stateful store built-in
GlassFlow’s built-in stateful store maintains context across streaming events, enabling advanced use cases like deduplication, joins, and aggregations. The state is fully managed and persists automatically without needing external databases or extra infrastructure. With support for keyed state and time-based windows, you can build reliable, real-time pipelines that go far beyond simple transformations.
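As a rough illustration of keyed state combined with time-based windows, the Python sketch below counts events per key in fixed (tumbling) windows. The names `TumblingWindowCounter`, `add`, and `count` are hypothetical and chosen for this example; the state lives in a plain dictionary rather than a managed, persistent store.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple


class TumblingWindowCounter:
    """Hypothetical sketch: per-key event counts in fixed, non-overlapping windows."""

    def __init__(self, window_seconds: int):
        self.window_seconds = window_seconds
        # State is keyed by (record key, window start time).
        self._counts: Dict[Tuple[Hashable, int], int] = defaultdict(int)

    def _window_start(self, event_time: float) -> int:
        # Align the event time down to the start of its window.
        return int(event_time // self.window_seconds) * self.window_seconds

    def add(self, key: Hashable, event_time: float) -> int:
        """Record one event and return the updated count for its key and window."""
        slot = (key, self._window_start(event_time))
        self._counts[slot] += 1
        return self._counts[slot]

    def count(self, key: Hashable, window_start: int) -> int:
        """Read the count for a key in the window starting at window_start."""
        return self._counts.get((key, window_start), 0)


counter = TumblingWindowCounter(window_seconds=60)
counter.add("user-1", event_time=5)
counter.add("user-1", event_time=59)
counter.add("user-1", event_time=61)  # falls into the next window
counter.add("user-2", event_time=10)
```

The same keyed layout can back deduplication or joins: only the value stored per key and window changes.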
Managed Kafka and ClickHouse connector
The integration uses a native ClickHouse connection for top performance and reliability. You can tune batch sizes and wait times to optimize throughput, with built-in retries for handling errors. It includes automatic schema detection and management, plus full support for JSON data types, making it easy to work with complex, nested data.
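The interplay of batch size, wait time, and retries can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the connector's actual code: `BatchingSink`, its parameters, and the injected `write_batch` callable are hypothetical, and the wait time is only checked when a new record arrives instead of on a background timer.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


class BatchingSink:
    """Hypothetical sketch: flush on batch size or wait time, retrying failed writes."""

    def __init__(
        self,
        write_batch: Callable[[List[Record]], None],  # stand-in for the real insert call
        max_batch_size: int = 1000,
        max_wait_seconds: float = 1.0,
        max_retries: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._write_batch = write_batch
        self.max_batch_size = max_batch_size
        self.max_wait_seconds = max_wait_seconds
        self.max_retries = max_retries
        self._clock = clock
        self._buffer: List[Record] = []
        self._oldest: Optional[float] = None  # arrival time of the oldest buffered record

    def add(self, record: Record) -> None:
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(record)
        waited = self._clock() - self._oldest
        if len(self._buffer) >= self.max_batch_size or waited >= self.max_wait_seconds:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        for attempt in range(1, self.max_retries + 1):
            try:
                self._write_batch(batch)
                return
            except Exception:
                if attempt == self.max_retries:
                    self._buffer = batch  # keep the records so the caller can retry later
                    raise
                # A real connector would typically back off here before retrying.
```

Larger batches and longer waits favor throughput, while smaller values favor latency; the retry loop covers transient write errors.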
Frequently asked questions
Feel free to contact us if you have any questions after reviewing our FAQs.