Case Studies

Restack scales real-time enterprise observability with GlassFlow

84%

Reduction in engineering effort for building and maintaining pipelines

87%

Reduction in pipeline-related incidents

10 TB / day

Stable processing

“I am happy we found GlassFlow. I was fed up seeing my engineers debugging broken observability pipelines and spending their time creating custom solutions, while our focus should have been on improving our own product.”

Andres Tapia, CEO, Restack

Results

After going live with GlassFlow, Restack observed:

  • 84% reduction in engineering effort for building and maintaining pipelines.

  • 87% reduction in pipeline-related incidents.

  • Stable ingestion of approximately 10 TB of data per day.

  • Support for new real-time observability and alerting use cases, such as:

    • alerting when CPU usage stays above 85% for 3 minutes on any node, or

    • alerting when p99 latency exceeds 500 ms for /checkout in production.
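The two alert conditions above can be sketched in a few lines of Python. This is only an illustration of the rule logic, not GlassFlow's or Restack's implementation: the `Sample` type, the function names, and the nearest-rank percentile method are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Sample:
    ts: float     # seconds since epoch (hypothetical schema)
    value: float  # e.g. CPU utilisation in percent


def breached_for(samples: Iterable[Sample], threshold: float, duration_s: float) -> bool:
    """True if `value` stayed above `threshold` continuously for at least `duration_s`."""
    breach_start = None
    for s in sorted(samples, key=lambda s: s.ts):
        if s.value > threshold:
            if breach_start is None:
                breach_start = s.ts          # start of the current breach streak
            if s.ts - breach_start >= duration_s:
                return True
        else:
            breach_start = None              # streak broken, start over
    return False


def p99(values: Sequence[float]) -> float:
    """Nearest-rank 99th percentile (one of several common definitions)."""
    if not values:
        raise ValueError("p99 of an empty sequence is undefined")
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100   # integer ceil(0.99 * n)
    return ordered[rank - 1]


# Rule 1: CPU > 85% for 3 minutes on a node.
cpu = [Sample(ts=t, value=90.0) for t in (0, 60, 120, 180)]
cpu_alert = breached_for(cpu, threshold=85.0, duration_s=180)

# Rule 2: p99 latency > 500 ms for /checkout in production.
latencies_ms = [100.0] * 98 + [900.0] * 2
latency_alert = p99(latencies_ms) > 500.0
```

In a streaming setting these checks would run per node or per endpoint over a sliding window, rather than over an in-memory list.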

GlassFlow removed operational complexity and allowed the team to focus on core platform development.

Key takeaways:

  • Built real-time, enterprise-grade observability without relying on Kafka table engines or slow merge patterns

  • Reduced pipeline engineering effort by 84% and incidents by 87%

  • Ingested approximately 10 TB per day from Kafka into self-hosted ClickHouse
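For a sense of scale, the daily figures in this case study (roughly 10 TB and roughly 6.7 billion messages per day) can be converted into per-second rates. The short calculation below assumes decimal terabytes (1 TB = 10^12 bytes) and perfectly even traffic over the day. Neither assumption comes from the source, and real traffic will peak well above these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the case study; decimal TB is an assumption.
bytes_per_day = 10 * 10**12
messages_per_day = 6_700_000_000


def per_second(daily_total: float) -> float:
    """Average rate per second for a total spread evenly over one day."""
    return daily_total / SECONDS_PER_DAY


avg_mb_per_s = per_second(bytes_per_day) / 1e6          # megabytes per second
avg_msgs_per_s = per_second(messages_per_day)           # messages per second
avg_bytes_per_msg = bytes_per_day / messages_per_day    # implied message size

print(f"{avg_mb_per_s:.1f} MB/s, {avg_msgs_per_s:,.0f} msg/s, {avg_bytes_per_msg:,.0f} B/msg")
```

Under these assumptions the averages come to about 116 MB/s, about 77,500 messages per second, and about 1.5 KB per message.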

Challenge - Real-time Observability

Restack needed reliable real-time observability for their platform and for their enterprise customers. Their existing setup was based on Kafka, a Kafka table engine, and custom Python transformation code.

This approach caused several problems:

  • Transformations were implemented as stateless Python code, which required significant ongoing maintenance.

  • Limited visibility into pipeline behavior made debugging difficult.

  • Pipelines would break intermittently without clear insight into root causes.

  • Heavy reliance on ReplacingMergeTree and FINAL made queries too slow for real-time observability and alerting.

  • Ensuring availability for enterprise customers became increasingly difficult.

As ingestion volume increased, this architecture became fragile and expensive to operate.

Solution

Restack adopted GlassFlow to simplify their streaming architecture while meeting strict real-time observability requirements.

GlassFlow was selected because it allowed Restack to:

  • Consume and transform approximately 6.7 billion messages per day.

  • Produce query-ready data without relying on ReplacingMergeTree or FINAL.

  • Enable real-time alerting on streaming data.

  • Expose built-in pipeline metrics.

  • Export metrics via OpenTelemetry and visualize them in Grafana.

  • Use dead-letter queues to isolate failures instead of breaking pipelines.
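The dead-letter queue idea in the last bullet can be illustrated with a minimal sketch: a record that fails to parse or transform is set aside with its error, and the rest of the batch continues. The function name, the record shape, and the in-memory lists standing in for real queues are assumptions for illustration. GlassFlow's actual interface is not described here.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def process_batch(
    raw_messages: List[str],
    transform: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, str]]]:
    """Split a batch into transformed records and dead-lettered failures."""
    ok: List[Dict[str, Any]] = []
    dead: List[Dict[str, str]] = []
    for raw in raw_messages:
        try:
            record = json.loads(raw)
            ok.append(transform(record))
        except Exception as exc:
            # Keep the original payload and the reason, so the record
            # can be inspected and replayed later.
            dead.append({"raw": raw, "error": f"{type(exc).__name__}: {exc}"})
    return ok, dead


# Hypothetical transform: requires a "cpu" field.
def to_cpu_metric(record: Dict[str, Any]) -> Dict[str, Any]:
    return {"cpu_pct": float(record["cpu"])}


good, dead_letters = process_batch(
    ['{"cpu": 91}', "not json", '{"mem": 1}'],
    to_cpu_metric,
)
```

Here one record is processed and two land in the dead-letter list, one for invalid JSON and one for a missing field. In the earlier setup described above, such failures would break the pipeline.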

GlassFlow was implemented as the core of Restack’s observability data pipeline. The implementation included:

  • Stateless streaming transformations.

  • Deduplication over a seven-day time window.

  • Dead-letter queues to handle malformed or failing records.

  • Pipeline-level metrics exposed via OpenTelemetry.
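Deduplicating over a time window means remembering which keys were seen recently and forgetting them once they age out, so that duplicates are dropped before data reaches ClickHouse. The sketch below shows one simple in-memory way to do this. The class name, the choice of key, and the assumption that timestamps arrive in roughly non-decreasing order are illustrative and not taken from GlassFlow. A production system at this volume would need persistent, partitioned state.

```python
from collections import OrderedDict
from typing import Hashable

SEVEN_DAYS_S = 7 * 24 * 60 * 60


class WindowedDeduplicator:
    """Drops keys already seen within the last `window_s` seconds."""

    def __init__(self, window_s: float = SEVEN_DAYS_S) -> None:
        self.window_s = window_s
        # key -> first-seen timestamp, kept in insertion (oldest-first) order
        self._seen: "OrderedDict[Hashable, float]" = OrderedDict()

    def is_new(self, key: Hashable, ts: float) -> bool:
        """Return True the first time `key` appears within the window."""
        self._evict(ts)
        if key in self._seen:
            return False
        self._seen[key] = ts
        return True

    def _evict(self, now: float) -> None:
        # Remove entries whose age has reached the window length.
        while self._seen:
            _, oldest_ts = next(iter(self._seen.items()))
            if now - oldest_ts < self.window_s:
                break
            self._seen.popitem(last=False)


dedup = WindowedDeduplicator()
first = dedup.is_new("event-123", ts=0.0)                   # first sighting
repeat = dedup.is_new("event-123", ts=3600.0)               # duplicate one hour later
after_window = dedup.is_new("event-123", ts=SEVEN_DAYS_S)   # window has elapsed
```

With the table already free of duplicates at write time, queries do not need to resolve them at read time with ReplacingMergeTree and FINAL.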

The full rollout was completed in fifteen days and required minimal changes to downstream consumers.

Company Overview

Restack is where product teams design, test, and optimize agents at enterprise scale.

It provides an open-source stack that lets product teams improve their agent experience while engineers make those agents reliable at scale on Kubernetes.

As data volume and customer expectations grew, Restack needed a more reliable way to build and operate streaming observability pipelines.

Transformed Kafka data for ClickHouse

Get query-ready data, lower ClickHouse load, and reliable pipelines at enterprise scale.
