This article was created by Vimalraj Selvam and originally posted on his blog
🚀 Getting Started with Glassflow’s ClickHouse ETL: A Local Setup with Docker
If you're working with ClickHouse for observability or analytics and looking for a clean, low-code way to stream and transform logs before they hit your database, Glassflow's clickhouse-etl might be the tool you're looking for.
In this blog post, I'll walk through setting up a local end-to-end ETL pipeline using Glassflow's clickhouse-etl UI/backend, NATS, Kafka, the OpenTelemetry Collector as a gateway, Fluent Bit, and ClickHouse - all tied together via Docker Compose.
Architecture Overview
Here's the high-level flow:
- Fluent Bit generates dummy logs and sends them via OTLP to OTEL Collector.
- OTEL Collector batches and ships logs to Kafka in a raw format (why not OTLP? Discussed later).
- Glassflow Backend listens to Kafka topics, allows transformation configs via UI, and writes into ClickHouse.
- Glassflow UI provides a powerful interface to manage ETL jobs.
- NATS acts as an internal messaging bus used by the Glassflow backend to orchestrate ETL job execution and event handling.
- ClickHouse stores the final structured data.
🐳 Docker Compose Stack
The complete docker-compose.yaml file is as follows:
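Here is a minimal sketch of the Compose file. The image tags, the Glassflow backend/frontend image names, and the mounted config paths are assumptions for illustration - adjust them to match your environment and the versions published in the clickhouse-etl repository.

```yaml
# Sketch only: image tags, Glassflow image names, and paths are assumptions.
services:
  fluentbit:
    image: fluent/fluent-bit:3.0
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
      - ./counter.lua:/fluent-bit/etc/counter.lua
    depends_on: [otel-collector]

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.100.0
    command: ["--config=/etc/otel-config.yaml"]
    volumes:
      - ./otel-config.yaml:/etc/otel-config.yaml
    depends_on: [kafka]

  kafka:
    image: bitnami/kafka:3.7
    environment:                      # single-node KRaft setup
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER

  clickhouse:
    image: clickhouse/clickhouse-server:24.3
    ports:
      - "8123:8123"   # HTTP interface (play UI)
      - "9000:9000"   # native protocol

  nats:
    image: nats:2.10
    command: ["-js"]  # enable JetStream for the Glassflow backend

  app:
    image: glassflow/clickhouse-etl-be:stable   # backend image name is an assumption
    depends_on: [kafka, nats, clickhouse]

  ui:
    image: glassflow/clickhouse-etl-fe:stable   # frontend image name is an assumption
    depends_on: [app]

  nginx:
    image: nginx:1.27
    ports:
      - "8080:8080"   # single entry point for the Glassflow UI
    depends_on: [ui, app]
```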
This includes:
- fluentbit: generates and sends dummy logs
- otel-collector: processes and exports logs to Kafka
- kafka: queues raw log events
- clickhouse: stores logs
- app and ui: Glassflow backend and frontend
- nats: internal messaging bus for the backend
- nginx: reverse proxy
Run the full stack with:
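From the directory containing the Compose file:

```shell
docker compose up -d

# optionally, follow the logs of a single service while testing
docker compose logs -f otel-collector
```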
Once it's up, access the Glassflow UI at http://localhost:8080.
🔥 Fluent Bit – Generating Logs with Timestamps
We use the dummy input and a Lua script that adds a counter and timestamp to every log.
fluent-bit.conf:
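A sketch of the config, assuming the standard dummy input, the Lua filter, and Fluent Bit's opentelemetry output pointed at the collector's OTLP/HTTP port. The tag, dummy payload, and function name are illustrative:

```ini
[SERVICE]
    Flush        1
    Log_Level    info

[INPUT]
    Name         dummy
    Tag          dummy.log
    Dummy        {"msg": "hello from fluent-bit"}
    Rate         1

[FILTER]
    Name         lua
    Match        dummy.*
    script       counter.lua
    call         add_counter

[OUTPUT]
    Name         opentelemetry
    Match        *
    Host         otel-collector
    Port         4318
    logs_uri     /v1/logs
```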
counter.lua:
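A minimal version of the script. Fluent Bit's Lua filter calls the function with `(tag, timestamp, record)` and expects `(code, timestamp, record)` back; the field names `counter` and `timestamp` match the schema used later:

```lua
-- Adds a monotonically increasing counter and a UNIX timestamp to each record.
local counter = 0

function add_counter(tag, timestamp, record)
    counter = counter + 1
    record["counter"] = counter
    record["timestamp"] = os.time()
    -- return code 1 = record was modified, keep original timestamp
    return 1, timestamp, record
end
```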
This setup emits logs like:
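An illustrative record (field values are examples, not actual output):

```json
{"msg": "hello from fluent-bit", "counter": 42, "timestamp": 1718000000}
```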
📦 OpenTelemetry Collector – Kafka Exporter
The OTEL Collector handles log batching and delivery to Kafka.
otel-config.yaml:
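A sketch of the collector config, assuming the contrib distribution (which ships the Kafka exporter). The topic name is an assumption; note the `encoding: raw` setting discussed below:

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318   # Fluent Bit sends OTLP/HTTP here

processors:
  batch:
    timeout: 5s

exporters:
  kafka:
    brokers: ["kafka:9092"]
    topic: otel-logs             # topic name is an assumption
    encoding: raw                # raw log bodies, not otlp_json

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [kafka]
```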
NOTE
Observe that I'm using raw as the encoding. The otlp_json encoding can't be used at the moment because Glassflow's clickhouse-etl currently cannot process nested JSON structures.
🧩 ClickHouse – Schema to Store Logs
Create a table in ClickHouse for storing processed logs:
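A sketch of the DDL, matching the three fields described below; the database, table name, and ordering key are assumptions:

```sql
CREATE TABLE IF NOT EXISTS default.logs
(
    event_timestamp DateTime,   -- UNIX timestamp from the Lua script
    body            String,     -- log message (from msg)
    counter         UInt64      -- sequential value for ordering
)
ENGINE = MergeTree
ORDER BY (event_timestamp, counter);
```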
This schema expects:
- event_timestamp: UNIX timestamp derived from timestamp
- body: Log message (from msg)
- counter: A sequential value for ordering - I'm using it to verify there is no data loss while testing.
🌐 Glassflow ETL – UI-Driven Pipeline Configuration
Once the UI is up at http://localhost:8080, you will be presented with the following nice welcome page:
Select Ingest Only for now. The Deduplicate option is very useful when you suspect your logs contain duplicates. For my testing, I just use Ingest Only; we can dig into each option in another post.
Now set up the Kafka connection. We use no authentication for our local setup, so select the No authentication option (🙏 Thanks to the Glassflow developers for the quick feature addition that makes local testing easier).
Once the Kafka connection is set up, select the topic where the logs are produced and choose the offset: either earliest or latest.
Next step is to setup the ClickHouse connection:
On the final step, select the database and the destination table as follows:
Once you have saved the pipeline, head to the ClickHouse play interface and query the logs:
As you can see, it is quite easy to set up and use the UI to select and transform the data as needed.
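For example, assuming the `default.logs` table sketched earlier, you can inspect the latest rows and check the counter for gaps (which would indicate data loss):

```sql
-- latest ingested records
SELECT event_timestamp, counter, body
FROM default.logs
ORDER BY counter DESC
LIMIT 10;

-- if rows == max(counter) - min(counter) + 1, no records were lost
SELECT
    count() AS rows,
    max(counter) - min(counter) + 1 AS expected
FROM default.logs;
```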
All the working e2e configurations are available here - Part 1
📌 Final Thoughts
Glassflow’s clickhouse-etl offers a powerful abstraction layer for streaming logs from Kafka into ClickHouse. It’s ideal for building observability pipelines, troubleshooting tools, or analytics systems. With its visual UI, modular backend, and native Kafka support, it enables rapid prototyping and production deployments with minimal friction.
The Glassflow team has a solid roadmap of upcoming features; the ones I'm most interested in and waiting for are:
- Support for multiple pipelines (the feature is already in progress)
- Real-time ingestion view
- Field transformations / aggregations / filtering through UI
I'm confident the team behind Glassflow will keep adding features that enable us to build even more powerful pipelines.
🔗 References
- GitHub: https://github.com/glassflow/clickhouse-etl
- ClickHouse Docs: https://clickhouse.com/docs/
- OpenTelemetry Collector: https://opentelemetry.io/docs/collector/
- Fluent Bit: https://docs.fluentbit.io/