Over the past few weeks, we've shipped GlassFlow v3.1.0 and v3.2.0, which together complete a major reliability milestone for GlassFlow's ClickHouse ETL: end-to-end backpressure handling and observable, resilient data delivery from source to sink.

Here's a summary of what changed and why it matters.

v3.1.0: Backpressure Across the Entire Data Plane

Before v3.1.0, when a downstream stage couldn't keep up, upstream components had no coordinated way to slow down. Events would end up in the DLQ, causing NATS memory to fill up and the pipeline to enter a failed state. We've explored this pattern in depth in our post on how Kafka-to-ClickHouse pipelines fail under pressure. v3.1.0 closes that gap.

First-class backpressure handling

Every stage in the pipeline (the Kafka ingestor, dedup, join, the OTLP receiver, and the JetStream publisher) now detects when its downstream operator is full, retries with bounded backoff, and emits Prometheus metrics so you can see it happening in real time.
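To make "retries with bounded backoff" concrete, here is a minimal Python sketch. It is not GlassFlow's implementation: `DownstreamFull` is a hypothetical error raised by a hypothetical `publish` callable when the next stage is full. The delay doubles after each rejection but never exceeds `max_delay`, and the sketch keeps retrying instead of dropping the event; whether and when GlassFlow itself gives up is not covered here.

```python
import time
from typing import Callable


class DownstreamFull(Exception):
    """Hypothetical error raised when the next stage cannot accept more events."""


def publish_with_backoff(
    publish: Callable[[], None],
    base_delay: float = 0.05,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `publish` until it succeeds, backing off while downstream is full.

    The delay doubles after each rejection but is capped at `max_delay`, so a
    long backpressure episode never produces unbounded waits.
    Returns the number of attempts it took.
    """
    delay = base_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            publish()
            return attempts
        except DownstreamFull:
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Injecting `sleep` keeps the function testable without real waiting. Adding random jitter to each delay is a common refinement when many producers back off at the same time.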

A new gfm_ingestor_backpressure_* metric family covers active state, episode count, and per-episode duration. You can now alert directly on gfm_ingestor_backpressure_active == 1 for 5m instead of inferring backpressure indirectly from lag.
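The three dimensions of this metric family map naturally onto a gauge, a counter, and a per-episode duration. The minimal sketch below, using only the standard library and a hypothetical `BackpressureEpisodes` class, shows one way to derive them from a sequence of "downstream full / not full" observations. A real exporter would publish these values through a Prometheus client library rather than keeping them as plain attributes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BackpressureEpisodes:
    """Derives active / episode-count / duration values from publish outcomes.

    active    -> analogous to a *_backpressure_active gauge (0 or 1)
    episodes  -> analogous to an episode counter
    durations -> analogous to observations fed to a duration histogram
    """

    clock: Callable[[], float] = time.monotonic
    active: int = 0
    episodes: int = 0
    durations: List[float] = field(default_factory=list)
    _started_at: float = field(default=0.0, init=False, repr=False)

    def observe(self, downstream_full: bool) -> None:
        """Record whether the latest publish attempt found the downstream full."""
        if downstream_full and not self.active:
            # The first "full" signal opens a new episode.
            self.active = 1
            self.episodes += 1
            self._started_at = self.clock()
        elif not downstream_full and self.active:
            # The first successful publish closes the episode.
            self.active = 0
            self.durations.append(self.clock() - self._started_at)
```

With a gauge like `active`, an alert rule with a `for: 5m` clause fires only when the gauge has read 1 on every evaluation for five minutes, which is a direct signal rather than one inferred from lag.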

Configurable inter-stage stream caps (`MaxMsgs`)

Inter-stage NATS streams now have an explicit MaxMsgs cap via resources.nats.stream.maxMsgs in the pipeline config. The default is auto-computed from your sink batch size, so most pipelines need no manual tuning. When a stream hits its cap, producing components enter a backpressure episode without dropping events.
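The key property of the cap is that a full stream rejects new messages instead of discarding ones it already holds, so the producer keeps anything that did not fit. The sketch below models this with a hypothetical in-memory `CappedStream` and a `produce` helper; it does not use the NATS client, and it does not reproduce how GlassFlow computes the default cap from the sink batch size. In NATS JetStream terms this resembles a stream with a `max_msgs` limit and the discard-new policy, though these notes don't say which policy GlassFlow configures.

```python
from collections import deque


class StreamFull(Exception):
    """Raised when the capped stream rejects a publish (hypothetical)."""


class CappedStream:
    """In-memory stand-in for an inter-stage stream with a MaxMsgs cap.

    publish() refuses new messages once the cap is reached instead of
    evicting older ones, so nothing already stored is lost.
    """

    def __init__(self, max_msgs: int) -> None:
        if max_msgs < 1:
            raise ValueError("max_msgs must be >= 1")
        self.max_msgs = max_msgs
        self._msgs: deque = deque()

    def publish(self, msg: bytes) -> None:
        if len(self._msgs) >= self.max_msgs:
            raise StreamFull(f"stream at cap ({self.max_msgs} msgs)")
        self._msgs.append(msg)

    def consume(self) -> bytes:
        """Remove and return the oldest message, as the downstream stage would."""
        return self._msgs.popleft()


def produce(stream: CappedStream, pending: deque) -> int:
    """Publish as many pending messages as fit and keep the rest.

    Returns how many were published. Messages that did not fit stay at the
    front of `pending`; this is the point where a backpressure episode begins.
    """
    sent = 0
    while pending:
        try:
            stream.publish(pending[0])
        except StreamFull:
            break
        pending.popleft()
        sent += 1
    return sent
```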

Chunked NATS publishing

The NATS async batch writer now splits each request's payload into chunks bounded by a hard memory cap, preventing OOM scenarios when a single Kafka or OTLP request would otherwise produce an outsized NATS write.
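Chunking to a memory cap amounts to greedily packing messages into batches whose total size stays under a byte limit. This is a minimal sketch with a hypothetical `chunk_by_bytes` helper and a cap expressed in bytes. The actual cap value, and how GlassFlow handles a single message larger than the cap, are not stated in these notes; the sketch simply sends such a message on its own.

```python
from typing import Iterator, List


def chunk_by_bytes(messages: List[bytes], max_chunk_bytes: int) -> Iterator[List[bytes]]:
    """Yield chunks of messages whose summed payload size stays <= max_chunk_bytes.

    A single message larger than the cap is yielded alone, because it cannot be
    split without changing its content.
    """
    if max_chunk_bytes < 1:
        raise ValueError("max_chunk_bytes must be >= 1")
    chunk: List[bytes] = []
    size = 0
    for msg in messages:
        # Close the current chunk if adding this message would exceed the cap.
        if chunk and size + len(msg) > max_chunk_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(msg)
        size += len(msg)
    if chunk:
        yield chunk
```

Each yielded chunk would then become its own NATS write, so peak memory per write is governed by the cap rather than by the size of the incoming request.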

Automated data migration container

Upgrading from v2 to v3? A one-shot Kubernetes Job, glassflow/glassflow-etl-data-migration, is included in Helm chart v0.5.17. On chart install or upgrade, it automatically migrates any v2-format pipeline configs stored in Postgres to the v3 format. In most cases, no manual transformation is needed.

There are no breaking changes from v3.0.0. Use Helm chart v0.5.17.

v3.2.0: Resilient Sink, Hardened OTLP, and Deeper Observability

v3.2.0 builds on the backpressure foundations from the previous release and makes the sink meaningfully more durable.

ClickHouse sink retries via NACK

The biggest change: the ClickHouse sink now classifies errors as retryable (timeouts, transient unavailability) or permanent, and handles each differently. Previously, any failure would route the batch to the DLQ on the first attempt.

Now, batches that hit retryable errors are NACKed back to JetStream, which redelivers them according to the consumer's redelivery policy. Your data stays in the pipeline rather than landing in the DLQ unnecessarily. Permanent errors still route to the DLQ as before.
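The retryable-versus-permanent split can be sketched as a small routing function. Everything here is hypothetical: the `Delivery` class only loosely mimics a JetStream message's ack/nak calls, `TimeoutError` and `ConnectionError` stand in for ClickHouse timeouts and transient unavailability, and `max_deliveries` stands in for whatever redelivery limit the consumer policy enforces. Once that limit is reached, the batch goes to the DLQ, which corresponds to the exhausted-retries case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Classification(Enum):
    RETRYABLE = "retryable"
    PERMANENT = "permanent"


# Hypothetical stand-ins for ClickHouse timeouts and transient unavailability.
RETRYABLE_ERRORS = (TimeoutError, ConnectionError)


def classify(err: Exception) -> Classification:
    """Treat known transient errors as retryable and everything else as permanent."""
    if isinstance(err, RETRYABLE_ERRORS):
        return Classification.RETRYABLE
    return Classification.PERMANENT


@dataclass
class Delivery:
    """Hypothetical stand-in for one JetStream delivery of a sink batch."""

    num_delivered: int  # 1 on first delivery, incremented on each redelivery
    outcome: str = "pending"

    def ack(self) -> None:
        self.outcome = "ack"  # batch is done; it will not be redelivered

    def nak(self) -> None:
        self.outcome = "nak"  # ask for redelivery per consumer policy


def handle_batch(
    delivery: Delivery,
    write: Callable[[], None],
    send_to_dlq: Callable[[Exception], None],
    max_deliveries: int = 5,
) -> str:
    """Write one batch, then ack it, NACK it for retry, or route it to the DLQ."""
    try:
        write()
    except Exception as err:
        retryable = classify(err) is Classification.RETRYABLE
        if retryable and delivery.num_delivered < max_deliveries:
            delivery.nak()
            return "nack"
        # Permanent error, or retries exhausted: park the batch in the DLQ,
        # then ack it so it is not redelivered again.
        send_to_dlq(err)
        delivery.ack()
        return "dlq"
    delivery.ack()
    return "written"
```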

Keeping events out of the DLQ is one side of the equation. Deduplication before ingestion is the other.

Three new metrics make sink retries fully observable:

gfm_sink_errors_by_classification_total: broken down by classification and error name
gfm_sink_nack_messages_total
gfm_sink_retries_total: by outcome (retry or exhausted)

If you monitor DLQ traffic, expect volume to drop after upgrading. Update your alerts to use gfm_sink_errors_by_classification_total{classification="permanent"} for a more accurate signal.

OTLP receiver hardening

The OTLP receiver now ships with explicit concurrency and memory controls:

maxConcurrentRequests: 50. When exceeded, the receiver returns HTTP 503 (gRPC ResourceExhausted), which standard OTel exporters handle by retrying
natsChunkSize: 1000. Bounds per-request memory regardless of the upstream batch size
Fixed a bug where the receiver could become stuck (wedged) after a NATS cluster restart

If you're routing OTLP data through GlassFlow, this breakdown of when you actually need Kafka in your OTel pipeline is worth a read.

Operator concurrent reconciles

The Kubernetes operator now reconciles up to 4 pipelines in parallel (configurable via controllerManager.manager.maxConcurrentReconciles). A long-running reconcile on one pipeline no longer blocks reconciles on all others.
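Parallel reconciles behave like a fixed-size worker pool. In the sketch below, a hypothetical `reconcile` function sleeps to simulate work, and a `ThreadPoolExecutor` with four workers stands in for the operator's reconcile workers. A slow reconcile on one pipeline occupies one worker while the other pipelines keep finishing; with a single worker, the slow one submitted first would hold up everything behind it.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List

MAX_CONCURRENT_RECONCILES = 4  # stand-in for controllerManager.manager.maxConcurrentReconciles


def reconcile(pipeline: str, seconds: float) -> str:
    """Hypothetical reconcile; sleeping stands in for real work."""
    time.sleep(seconds)
    return pipeline


def reconcile_all(work: Dict[str, float]) -> List[str]:
    """Reconcile every pipeline on a bounded pool; return names in completion order."""
    finished: List[str] = []
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_RECONCILES) as pool:
        futures = [pool.submit(reconcile, name, secs) for name, secs in work.items()]
        for fut in as_completed(futures):
            finished.append(fut.result())
    return finished
```

If the operator is built on controller-runtime, as the setting name suggests, its work queue also guarantees that the same pipeline is never reconciled by two workers at once; this sketch does not model that.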

End-to-end backpressure observability

Every component (ingestor, dedup, join, OTLP receiver) now emits a ComponentSignal to the operator when a backpressure episode starts, with a 5-minute cooldown between signals to avoid control-plane noise. The new gfm_component_backpressure_* metric family gives you full per-component visibility. DLQ metrics now also carry a reason label (parse_error, schema_mismatch, sink_rejection, retry_exhausted, dedup_overflow, unrecoverable), so you can break down DLQ traffic by cause.
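A per-component cooldown can be sketched as a map from component name to the time of its last signal. The `SignalThrottle` class below is hypothetical; it assumes the 5-minute cooldown is tracked separately for each component and takes an injectable clock so the behavior is easy to test.

```python
import time
from typing import Callable, Dict


class SignalThrottle:
    """Allow at most one signal per component per cooldown window."""

    def __init__(self, cooldown_s: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._last_sent: Dict[str, float] = {}

    def should_emit(self, component: str) -> bool:
        """Return True (and start a new window) if this component may signal now."""
        now = self.clock()
        last = self._last_sent.get(component)
        if last is not None and now - last < self.cooldown_s:
            return False  # still inside this component's cooldown; suppress
        self._last_sent[component] = now
        return True
```

Suppressed episodes would still show up in the per-component metrics; the cooldown only limits how often the operator is notified.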

For a full example of wiring these metrics into a live observability stack, see Build a ClickHouse Telemetry Pipeline with OpenTelemetry and GlassFlow.

New Sources / Integrations directory

The docs now include a dedicated Sources / Integrations directory covering all 29 supported sources, from streaming platforms, telemetry collectors, and databases to object storage and table formats. Each source comes with its own guide and an Open Source or Enterprise label.

There are no breaking changes from v3.1.0. Use Helm chart v0.5.21.

How to Upgrade

Both releases follow the same upgrade path via Helm:

v3.1.0 → chart v0.5.17 (from v3.0.0)
v3.2.0 → chart v0.5.21 (from v3.1.0)

See the Kubernetes installation guide for full instructions.

Read the Full Release Notes

All the details, PR references, and migration notes are in the full release notes in the docs.

Questions? Talk to the Team

If you're upgrading, evaluating GlassFlow, or have questions about how these changes affect your pipelines, we'd love to hear from you.

Book a call with us, or reach out via our contact form.

We're always happy to jump on a call and dive into the details.

