Set up real-time ETL pipelines to ClickHouse with GlassFlow and Docker.
Written by
Vimalraj Selvam
Jul 9, 2025
🚀 Getting Started with GlassFlow’s ClickHouse ETL: A Local Setup with Docker
This article was created by Vimalraj Selvam and originally posted on his blog.
If you are working with ClickHouse for observability or analytics and want a clean, low‑code way to stream and transform logs before they reach your database, GlassFlow’s clickhouse-etl is a solid option.
In this guide, you will set up a local end‑to‑end ETL pipeline using GlassFlow’s clickhouse-etl UI and backend, NATS, Kafka, the OpenTelemetry Collector (as a gateway), Fluent Bit, and ClickHouse — all tied together with Docker Compose.
Architecture Overview
High‑level flow:
Fluent Bit generates dummy logs and sends them via OTLP to the OTEL Collector.
The OTEL Collector batches and ships logs to Kafka in a raw format (why not OTLP is discussed later).
The GlassFlow backend listens to Kafka topics, lets you define transformations via the UI, and writes the results into ClickHouse.
The GlassFlow UI provides a clean interface to manage ETL jobs.
NATS acts as an internal messaging bus used by the backend to orchestrate ETL job execution and events.
ClickHouse stores the final structured data.
🐳 Docker Compose Stack
Create a docker-compose.yaml that wires up the following services (a trimmed sketch follows the list):
Services included
fluentbit: generates and ships dummy logs
otel-collector: processes and exports logs to Kafka
kafka: queues raw log events
clickhouse: stores logs
app and ui: Glassflow backend and frontend
nats: internal messaging bus for the backend
nginx: reverse proxy
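Below is a trimmed sketch of such a compose file. The image names and tags (in particular the GlassFlow backend and frontend images), the Kafka KRaft settings, and the port mappings are assumptions; adjust them to match the versions you actually run.

```yaml
services:
  fluentbit:
    image: fluent/fluent-bit:latest
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
      - ./counter.lua:/fluent-bit/etc/counter.lua
    depends_on: [otel-collector]

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-config.yaml"]
    volumes:
      - ./otel-config.yaml:/etc/otel-config.yaml
    depends_on: [kafka]

  kafka:
    image: bitnami/kafka:latest   # single-node KRaft broker
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # native protocol

  nats:
    image: nats:latest

  app:
    image: glassflow/clickhouse-etl-be:latest   # backend image name is an assumption
    depends_on: [kafka, nats, clickhouse]

  ui:
    image: glassflow/clickhouse-etl-fe:latest   # frontend image name is an assumption

  nginx:
    image: nginx:latest
    # mount an nginx.conf that routes / to ui and /api to app (not shown)
    ports:
      - "8080:8080"   # exposes the GlassFlow UI at http://localhost:8080
    depends_on: [ui, app]
```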
Start the full stack:
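```bash
docker compose up -d
```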
Once everything is running, open the GlassFlow UI at http://localhost:8080.
🔥 Fluent Bit: Generating Logs with Timestamps
We use the dummy input plugin and a Lua script that adds a counter and a timestamp to every log record.
fluent-bit.conf
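A minimal sketch of the config, assuming the collector’s OTLP/HTTP receiver is reachable at otel-collector:4318 on the compose network; the tag and dummy message are placeholders.

```ini
[SERVICE]
    Flush        1
    Log_Level    info

[INPUT]
    # Emit a dummy JSON record once per second
    Name         dummy
    Tag          app.logs
    Dummy        {"msg": "hello from fluent bit"}

[FILTER]
    # Run counter.lua on every record; `call` must match the Lua function name
    Name         lua
    Match        app.logs
    script       counter.lua
    call         add_counter

[OUTPUT]
    # Ship records over OTLP/HTTP to the collector
    Name         opentelemetry
    Match        app.logs
    Host         otel-collector
    Port         4318
    logs_uri     /v1/logs
```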
counter.lua
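A sketch of the Lua filter; the function name must match the `call` option above.

```lua
-- counter.lua: append a monotonically increasing counter and the current
-- Unix timestamp to every record passing through the filter.
local counter = 0

function add_counter(tag, timestamp, record)
    counter = counter + 1
    record["counter"] = counter
    record["timestamp"] = os.time()
    -- 1 = record was modified; keep Fluent Bit's original event timestamp
    return 1, timestamp, record
end
```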
Example emitted log:
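Something like the following (values are illustrative):

```json
{"msg": "hello from fluent bit", "counter": 42, "timestamp": 1720512000}
```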
📦 OpenTelemetry Collector: Kafka Exporter
The OTEL Collector handles batching and delivery to Kafka.
otel-config.yaml
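A sketch of the collector config, assuming the Kafka topic is named logs; the broker address and batch settings are placeholders.

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s

exporters:
  kafka:
    brokers: ["kafka:9092"]
    topic: logs
    encoding: raw   # ship the log body as-is (see the note below)

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [kafka]
```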
Note: This example uses the raw encoding. The otlp_json encoding is not used here because GlassFlow’s clickhouse-etl currently does not process nested JSON structures.
🧩 ClickHouse: Schema to Store Logs
Create the table:
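A sketch of a matching table; the database and table names (default.logs) and the sorting key are assumptions.

```sql
CREATE TABLE IF NOT EXISTS default.logs
(
    event_timestamp DateTime,  -- Unix timestamp taken from `timestamp`
    body            String,    -- log message taken from `msg`
    counter         UInt64     -- sequential value, handy for spotting gaps
)
ENGINE = MergeTree
ORDER BY (event_timestamp, counter);
```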
This schema expects:
event_timestamp: Unix timestamp derived from timestamp
body: log message from msg
counter: a sequential value for ordering; useful when testing for data loss
🌐 GlassFlow ETL: UI‑Driven Pipeline Configuration
When the UI is live at http://localhost:8080, you will see a welcome page:

Select Ingest Only for now; the Deduplicate option is helpful when you suspect duplicate logs, but for this walkthrough we will keep things simple.
Set up the Kafka connection. For a local setup, choose No authentication.

Select the topic where logs are produced and choose the offset (earliest or latest).

Configure the ClickHouse connection:

Select the database and destination table:

Save the pipeline, then head to the ClickHouse UI and query your logs:
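For example, assuming the default.logs table from earlier:

```sql
SELECT event_timestamp, counter, body
FROM default.logs
ORDER BY counter DESC
LIMIT 10;
```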

All end‑to‑end configurations are available here: Part 1.
📌 Final Thoughts
GlassFlow’s clickhouse-etl provides a powerful abstraction for streaming logs from Kafka into ClickHouse. It is ideal for observability pipelines, troubleshooting, and analytics systems. With a visual UI, a modular backend, and native Kafka support, it enables rapid prototyping and production deployments with minimal friction.
The GlassFlow team has a strong roadmap. Features I am excited about include:
Support for multiple pipelines (in progress)
Real‑time ingestion view
Field transformations, aggregations, and filtering in the UI
🔗 References
ClickHouse Docs: https://clickhouse.com/docs/
OpenTelemetry Collector: https://opentelemetry.io/docs/collector/
Fluent Bit: https://docs.fluentbit.io/