But raw telemetry pipelines still struggle with duplicate events, schema mismatches, and expensive query-time enrichment. GlassFlow sits between your data sources and ClickStack, handling deduplication, schema normalization, and stream enrichment before data lands in ClickHouse, so your observability data is clean, accurate, and cost-efficient from the start.

Cost Effective

Store less. Compute less. Pay less. Get enterprise-grade observability at a lower cost with upstream deduplication, enrichment, and schema management.

Sub-millisecond Processing

GlassFlow's real-time stream processing removes duplicates, enriches events, and flattens payloads before they land in ClickHouse or ClickStack, ensuring accurate alerts and consistently query-optimized tables.

Simple to Use

Manage OpenTelemetry logs, metrics, and traces through one open-source platform powered by ClickStack. GlassFlow handles ingestion, ETL, and data quality—no separate tools or complex orchestration needed.

Observability Stack Comparison

See how a combined GlassFlow and ClickStack platform compares to traditional observability platforms.

| Capability | Datadog / New Relic / ELK Stack | ClickStack + GlassFlow |
| --- | --- | --- |
| Data deduplication | Query-time only | Real-time, upstream, stateful |
| Schema evolution | Manual updates, fragile pipelines | Automatic detection and adjustment |
| Real-time enrichment | Costly JOINs in storage engines | Pre-ingestion enrichment + flattening |
| Open-source | Varies | Fully open-source |
| ETL processing | Requires separate tools | Native streaming ETL |
| Filtering & transformation | Limited | Powerful upstream filtering & shaping |

Limitations of ClickHouse ReplacingMergeTree for Observability

ClickHouse’s ReplacingMergeTree introduces three major challenges that make achieving real-time observability difficult on the ClickStack platform.

For a full breakdown of ReplacingMergeTree's limitations and how to solve them upstream, read our guide →

Deduplication is only eventual and requires FINAL for accuracy

Schema evolution demands manual ALTER TABLE operations that risk breaking ingestion

Query-time enrichment is costly at scale

How does it work?

GlassFlow connects to your Kafka topics or OTLP sources and processes telemetry before it reaches ClickStack. Duplicate spans are dropped within a configurable window of up to 7 days. New fields are detected and added automatically, without manual ALTER TABLE operations. Events are batched and delivered to ClickHouse in sizes optimized for ingestion performance. The result: accurate metrics, reliable alerts, and lower storage costs.
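
To make the batching step concrete, here is a minimal sketch of a size-or-time batcher in plain Python. It is an illustration only, not GlassFlow's implementation: the `Batcher` class, its parameters, and the `sink` callable are hypothetical names, and a real pipeline would also flush on a timer rather than only when a new row arrives.

```python
class Batcher:
    """Buffers rows and flushes them as one batch when either limit is reached.

    Hypothetical helper for illustration; not part of any GlassFlow API.
    """

    def __init__(self, max_rows, max_wait_seconds, sink):
        self.max_rows = max_rows
        self.max_wait_seconds = max_wait_seconds
        self.sink = sink              # callable that receives a list of rows
        self._rows = []
        self._first_row_at = None     # arrival time of the oldest buffered row

    def add(self, row, now):
        """Buffer one row; `now` is the current time in seconds."""
        if not self._rows:
            self._first_row_at = now
        self._rows.append(row)
        too_many = len(self._rows) >= self.max_rows
        too_old = now - self._first_row_at >= self.max_wait_seconds
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Hand the buffered rows to the sink as a single batch."""
        if self._rows:
            self.sink(self._rows)
            self._rows = []
            self._first_row_at = None
```

Larger batches generally mean fewer, cheaper inserts into ClickHouse, while the time limit bounds how long an event can wait in the buffer.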

See it live

Get started on GitHub

Real-Time Deduplication for ClickHouse and ClickStack

GlassFlow intercepts your telemetry streams (logs, traces, metrics) before data lands in ClickHouse. With a configurable sliding window (e.g., up to 7 days), it maintains state to detect and reject duplicate events (based on keys such as event_id or trace_id), forwarding only the first occurrence into storage. This ensures you never ingest duplicate events, which means more accurate metrics and alerts and more cost-efficient storage.
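
The windowed, first-occurrence-wins behaviour described above can be sketched in a few lines of Python. This is a simplified, in-memory illustration under stated assumptions, not GlassFlow's actual code: `WindowedDeduplicator` is a hypothetical name, timestamps are assumed to arrive in non-decreasing order, and state is kept in a single process.

```python
from collections import OrderedDict


class WindowedDeduplicator:
    """Forwards the first event per key and drops repeats seen within the window.

    Hypothetical sketch; assumes timestamps (in seconds) never go backwards.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # key -> timestamp of the first accepted occurrence, oldest first
        self._seen = OrderedDict()

    def _evict_expired(self, now):
        # Oldest keys sit at the front, so stop at the first one still in the window.
        while self._seen:
            key, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def accept(self, key, timestamp):
        """Return True if the event should be forwarded, False if it is a duplicate."""
        self._evict_expired(timestamp)
        if key in self._seen:
            return False
        self._seen[key] = timestamp
        return True


dedup = WindowedDeduplicator(window_seconds=7 * 24 * 3600)
events = [("evt-1", 0.0), ("evt-2", 10.0), ("evt-1", 20.0)]
forwarded = [key for key, ts in events if dedup.accept(key, ts)]
```

Here `forwarded` contains `evt-1` and `evt-2` once each; the repeat of `evt-1` is rejected because it falls inside the window.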

Temporal Joins & Upstream Enrichment

Need to enrich logs with metadata — like user info, environment tags, version identifiers, or cross-stream context? GlassFlow lets you perform temporal stream joins across Kafka topics (or other input streams) within defined time windows. That means by the time data hits ClickHouse or ClickStack, it’s already enriched, flattened, and analytics-ready — no expensive JOINs or heavy query-time transformations required later.
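
As a rough illustration of a temporal join, the sketch below attaches to each event the closest-in-time metadata record with the same key, as long as it falls inside the join window. It works on in-memory lists rather than live streams, and the `Record` type and `temporal_join` function are hypothetical names chosen for this example, not GlassFlow APIs.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str          # join key, e.g. a user or service identifier
    timestamp: float  # event time in seconds
    fields: dict      # payload columns


def temporal_join(events, metadata, window_seconds):
    """Enrich each event with the nearest same-key metadata record within the window.

    Events without a match inside the window pass through unchanged.
    """
    by_key = {}
    for m in metadata:
        by_key.setdefault(m.key, []).append(m)

    enriched = []
    for e in events:
        candidates = [
            m for m in by_key.get(e.key, [])
            if abs(e.timestamp - m.timestamp) <= window_seconds
        ]
        row = dict(e.fields)
        if candidates:
            best = min(candidates, key=lambda m: abs(e.timestamp - m.timestamp))
            # Flatten: metadata fields become top-level columns of the output row.
            row.update(best.fields)
        enriched.append(row)
    return enriched
```

Because the output rows are already flat, the destination table needs no JOIN at query time.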

From Schema Management to Production-Ready Pipelines

GlassFlow supplies native connectors (Kafka → GlassFlow → ClickHouse) and offers a Web UI or SDK to define ingestion pipelines, map fields, set deduplication windows, and configure joins. It handles schema evolution transparently, adapts to new or changing event formats, and routes malformed or invalid events to a dead-letter queue, so bad data doesn’t corrupt your observability backbone. Under actual load, GlassFlow adds sub-millisecond latency per event and scales to millions of records per second, making it ideal for high-volume, real-time observability workloads.
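
The dead-letter routing idea can be illustrated with a small validation step. This is a sketch under assumptions: the `REQUIRED_FIELDS` schema and the `route` function are invented for the example and do not reflect how GlassFlow defines or checks schemas.

```python
# Hypothetical schema: field name -> accepted Python type(s)
REQUIRED_FIELDS = {
    "event_id": str,
    "timestamp": (int, float),
    "service": str,
}


def route(events):
    """Split events into (valid, dead_letter) based on REQUIRED_FIELDS."""
    valid, dead_letter = [], []
    for event in events:
        if not isinstance(event, dict):
            dead_letter.append({"event": event, "invalid_fields": ["<not an object>"]})
            continue
        # A field is a problem if it is missing or has an unexpected type.
        problems = [
            name for name, expected in REQUIRED_FIELDS.items()
            if not isinstance(event.get(name), expected)
        ]
        if problems:
            # Keep the original payload plus the reason, so it can be inspected or replayed.
            dead_letter.append({"event": event, "invalid_fields": problems})
        else:
            valid.append(event)
    return valid, dead_letter
```

Keeping the rejected payload alongside the reason makes it possible to fix the producer and replay the events later instead of silently dropping them.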

Frequently asked questions

Feel free to contact us if you have any questions after reviewing our FAQs.

What happens if GlassFlow goes down — do we lose telemetry or block ingestion?

GlassFlow uses NATS JetStream as a buffer. Kafka offsets are committed only after events are successfully written to NATS; the data is then deduplicated and written to ClickHouse in batches over the ClickHouse native protocol. If the system crashes after the Kafka offset has been committed but before the batch is inserted into ClickHouse, that batch is lost. We're actively improving recovery guarantees to close this gap.

What delivery guarantees does GlassFlow provide (at-least-once, exactly-once), and how does deduplication stay correct?

How do we integrate GlassFlow with our existing pipeline (OpenTelemetry, Kafka, Fluent Bit)?

GlassFlow typically connects to your Apache Kafka broker and consumes events from there. If you would like GlassFlow to connect directly to your collector, please reach out to us via the contact form.

How does GlassFlow handle schema evolution — and what happens when events change or break?

GlassFlow connects to Confluent Schema Registry to monitor schema changes, notifies you when they occur, and adapts the pipeline schema accordingly.
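
Conceptually, monitoring for schema changes comes down to comparing two schema versions. The sketch below shows that comparison on plain dictionaries mapping field names to type names; it does not call Schema Registry, and `diff_schemas` is a hypothetical helper rather than part of GlassFlow.

```python
def diff_schemas(old, new):
    """Compare two {field_name: type_name} schemas and report what changed."""
    added = {f: t for f, t in new.items() if f not in old}
    removed = {f: t for f, t in old.items() if f not in new}
    changed = {
        f: (old[f], new[f])
        for f in old.keys() & new.keys()
        if old[f] != new[f]
    }
    return {"added": added, "removed": removed, "changed": changed}


v1 = {"event_id": "string", "latency_ms": "int"}
v2 = {"event_id": "string", "latency_ms": "double", "region": "string"}
changes = diff_schemas(v1, v2)
```

In this example `changes` reports `region` as added and `latency_ms` as changed from `int` to `double`. Added fields can usually be handled automatically, while type changes and removals are the cases where a notification matters most.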

What performance overhead does GlassFlow introduce, and how does it scale at high volume?

GlassFlow can easily handle a hundred thousand records per second, and depending on the transformations you run, the performance overhead is minimal. We typically see an end-to-end p95 latency of under 1 second.