Product Updates
GlassFlow v3.1.0 and v3.2.0 bring end-to-end backpressure handling, resilient ClickHouse retries, and full pipeline observability.
Written by
Pablo Pardo Garcia

Over the past few weeks, we've shipped GlassFlow v3.1.0 and v3.2.0, which together complete a major reliability milestone for GlassFlow's ClickHouse ETL: end-to-end backpressure handling and observable, resilient data delivery from source to sink.
Here's a summary of what changed and why it matters.
v3.1.0: Backpressure Across the Entire Data Plane
Before v3.1.0, when a downstream stage couldn't keep up, upstream components had no coordinated way to slow down. Events would end up in the DLQ, causing NATS memory to fill up and the pipeline to enter a failed state, a pattern we've explored in depth in how Kafka-to-ClickHouse pipelines fail under pressure. v3.1.0 closes that gap.
First-class backpressure handling
Every stage in the pipeline (the Kafka ingestor, dedup, join, the OTLP receiver, and the JetStream publisher) now detects when the downstream operator is full, retries with bounded backoff, and emits Prometheus metrics so you can see it happening in real time.
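To make "retries with bounded backoff" concrete, here is a minimal Python sketch of the general pattern. It is not GlassFlow's implementation: the DownstreamFull exception, the publish callable, and every default value are hypothetical and chosen only for illustration.

```python
import random
import time


class DownstreamFull(Exception):
    """Hypothetical signal that the next stage cannot accept more events."""


def backoff_delays(base=0.05, cap=2.0, factor=2.0):
    """Yield exponentially growing delays that never exceed `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def publish_with_backpressure(publish, event, max_attempts=8, sleep=time.sleep):
    """Retry `publish(event)` while the downstream stage reports it is full.

    Returns the number of retries that were needed. The caller keeps holding
    the event for the whole episode, so nothing is dropped while waiting.
    """
    delays = backoff_delays()
    for attempt in range(max_attempts):
        try:
            publish(event)
            return attempt
        except DownstreamFull:
            # Jitter spreads out retries coming from many producers at once.
            sleep(next(delays) * random.uniform(0.5, 1.0))
    raise DownstreamFull(f"still full after {max_attempts} attempts")
```

The cap on the delay is what makes the backoff bounded: a long episode settles into retrying at a steady interval instead of waiting ever longer.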
A new gfm_ingestor_backpressure_* metric family covers active state, episode count, and per-episode duration. You can now alert directly on gfm_ingestor_backpressure_active == 1 for 5m instead of inferring from lag.
Configurable inter-stage stream caps (MaxMsgs)
Inter-stage NATS streams now have an explicit MaxMsgs cap via resources.nats.stream.maxMsgs in the pipeline config. The default is auto-computed from your sink batch size, so most pipelines need no manual tuning. When a stream hits its cap, producing components enter a backpressure episode without dropping events.
Chunked NATS publishing
The NATS async batch writer now chunks payloads to a hard memory cap per request, preventing OOM scenarios when a single Kafka or OTLP request would otherwise produce an outsized NATS write.
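The idea behind chunking can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual writer: the function name, the byte-based cap, and the handling of oversized messages are all hypothetical.

```python
def chunk_by_bytes(messages, max_bytes):
    """Group byte-string messages into chunks whose total size stays <= max_bytes.

    A single message larger than max_bytes is emitted alone in its own chunk
    rather than being split or dropped (an assumption made for this sketch).
    """
    chunk, size = [], 0
    for msg in messages:
        # Close the current chunk before it would exceed the cap.
        if chunk and size + len(msg) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(msg)
        size += len(msg)
    if chunk:
        yield chunk
```

Because the function is a generator, only one chunk is held in memory at a time, which is the property that keeps a single large request from turning into a single large write.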
Automated data migration container
Upgrading from v2 to v3? A one-shot Kubernetes Job (glassflow/glassflow-etl-data-migration), included in Helm chart v0.5.17, automatically migrates any v2-format pipeline configs in Postgres to v3 on chart install or upgrade. In most cases, no manual transformation is needed.
There are no breaking changes from v3.0.0. Use Helm chart v0.5.17.
v3.2.0: Resilient Sink, Hardened OTLP, and Deeper Observability
v3.2.0 builds on the backpressure foundations from the previous release and makes the sink meaningfully more durable.
ClickHouse sink retries via NACK
The biggest change: the ClickHouse sink now classifies errors as retryable (timeouts, transient unavailability) or permanent, and handles each differently. Previously, any failure would route the batch to the DLQ on the first attempt.
Now, retryable errors NACK back to JetStream, which redelivers per consumer policy. Your data stays in the pipeline rather than landing in the DLQ unnecessarily. Permanent errors still route to DLQ as before.
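The decision logic can be pictured roughly as follows. This Python sketch is an assumption-laden illustration, not the sink's code: the error names, the max_deliveries limit, and the rule that exhausted retries end up in the DLQ are all hypothetical.

```python
from enum import Enum


class Classification(Enum):
    RETRYABLE = "retryable"
    PERMANENT = "permanent"


# Hypothetical error names, chosen for illustration only.
RETRYABLE_ERRORS = frozenset({"timeout", "connection_reset", "server_unavailable"})


def classify(error_name):
    """Map an error name to a classification; unknown errors count as permanent."""
    if error_name in RETRYABLE_ERRORS:
        return Classification.RETRYABLE
    return Classification.PERMANENT


def handle_batch_failure(error_name, delivery_count, max_deliveries=5):
    """Return (action, outcome) for a failed batch.

    delivery_count is how many times the batch has been delivered so far,
    including the attempt that just failed.
    """
    if classify(error_name) is Classification.PERMANENT:
        return "dlq", "permanent"
    if delivery_count >= max_deliveries:
        # Assumed behavior: a retryable error that runs out of redeliveries
        # is finally routed to the DLQ.
        return "dlq", "exhausted"
    # NACK so the broker redelivers the batch according to consumer policy.
    return "nack", "retry"
```

For example, handle_batch_failure("timeout", 1) yields ("nack", "retry"), while an unrecognized error goes straight to the DLQ on the first attempt.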
Keeping events out of the DLQ is one side of the equation. Deduplication before ingestion is the other.
Three new metrics make the retry behavior fully observable:
gfm_sink_errors_by_classification_total: broken down by classification and error name
gfm_sink_nack_messages_total
gfm_sink_retries_total: by outcome (retry or exhausted)
If you monitor DLQ traffic, expect volume to drop after upgrading. Update your alerts to use gfm_sink_errors_by_classification_total{classification="permanent"} for a more accurate signal.
OTLP receiver hardening
The OTLP receiver now ships with explicit concurrency and memory controls:
maxConcurrentRequests: 50. When exceeded, the receiver returns 503/ResourceExhausted, which standard OTel exporters handle with retry
natsChunkSize: 1000. Bounds per-request memory regardless of upstream batch size
Fixed a wedge where the receiver would become stuck after a NATS cluster restart
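A concurrency cap of this kind is usually built on a non-blocking semaphore: requests over the limit are rejected right away instead of being queued. The Python sketch below shows that pattern only; the ConcurrencyLimiter class and its handle method are hypothetical and are not the receiver's API.

```python
import threading


class ConcurrencyLimiter:
    """Reject work immediately once `max_concurrent` requests are in flight."""

    def __init__(self, max_concurrent=50):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def handle(self, request, process):
        # Non-blocking acquire: fail fast rather than queue the request,
        # so memory use stays bounded under load.
        if not self._slots.acquire(blocking=False):
            return 503, "ResourceExhausted"
        try:
            return 200, process(request)
        finally:
            # Always free the slot, even if `process` raises.
            self._slots.release()
```

Rejecting early pushes the waiting back to the exporter, which already knows how to retry, rather than letting requests pile up inside the receiver.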
If you're routing OTLP data through GlassFlow, this breakdown of when you actually need Kafka in your OTel pipeline is worth a read.
Operator concurrent reconciles
The Kubernetes operator now reconciles up to 4 pipelines in parallel (configurable via controllerManager.manager.maxConcurrentReconciles). A long-running reconcile on one pipeline no longer blocks reconciles on all others.
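The effect of bounded parallel reconciles can be illustrated with a small worker pool. This Python sketch is only an analogy for the behavior described above, not the operator's code; reconcile_all and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def reconcile_all(pipelines, reconcile, max_concurrent=4):
    """Run `reconcile(pipeline)` for each pipeline with bounded parallelism.

    Returns {pipeline: result} once every reconcile has finished.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # Submit everything up front; the pool runs at most `max_concurrent`
        # reconciles at once, so one slow pipeline occupies only one worker.
        futures = {name: pool.submit(reconcile, name) for name in pipelines}
        return {name: fut.result() for name, fut in futures.items()}
```

With max_concurrent=1 this degrades to the old behavior, where one slow reconcile holds up every pipeline behind it.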
End-to-end backpressure observability
Every component (ingestor, dedup, join, OTLP receiver) now emits a ComponentSignal to the operator when a backpressure episode starts, with a 5-minute cooldown to avoid control-plane noise. The new gfm_component_backpressure_* metric family gives you full per-component visibility. DLQ metrics now also carry a reason label (parse_error, schema_mismatch, sink_rejection, retry_exhausted, dedup_overflow, unrecoverable) so you can break down traffic by cause.
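The cooldown works like a per-component rate limit on signals. Here is a minimal Python sketch of that idea, assuming a simple "at most one signal per component per window" rule; the SignalCooldown class is hypothetical and says nothing about how ComponentSignal is actually implemented.

```python
import time


class SignalCooldown:
    """Allow at most one signal per component within each cooldown window."""

    def __init__(self, cooldown_seconds=300.0, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._last_sent = {}  # component name -> time of last emitted signal

    def should_emit(self, component):
        now = self._clock()
        last = self._last_sent.get(component)
        if last is not None and now - last < self._cooldown:
            # Still inside the window: suppress the signal.
            return False
        self._last_sent[component] = now
        return True
```

Tracking the timestamp per component means a noisy ingestor cannot suppress the first signal from dedup or join.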
For a full example of wiring these metrics into a live observability stack, see how to Build a ClickHouse Telemetry Pipeline with OpenTelemetry and GlassFlow.
New Sources / Integrations directory
The docs now include a dedicated Sources / Integrations directory covering all 29 supported sources, from streaming platforms, telemetry collectors, and databases all the way to object storage and table formats. Each source comes with its own guide and Open Source / Enterprise labels.
There are no breaking changes from v3.1.0. Use Helm chart v0.5.21.
How to Upgrade
Both releases follow the same upgrade path via Helm:
v3.1.0 → chart v0.5.17 (from v3.0.0)
v3.2.0 → chart v0.5.21 (from v3.1.0)
See the Kubernetes installation guide for full instructions.
Read the Full Release Notes
All the details, PR references, and migration notes are in the docs:
Questions? Talk to the Team
If you're upgrading, evaluating GlassFlow, or have questions about how these changes affect your pipelines, we'd love to hear from you.
Reach out via our contact form.
We're always happy to jump on a call and dive into the details.