No-Code Clickhouse ETL with Glassflow

Set up real-time ETL pipelines to ClickHouse with GlassFlow and Docker.

Written by

Vimalraj Selvam

Jul 9, 2025

🚀 Getting Started with Glassflow’s ClickHouse ETL: A Local Setup with Docker

This article was created by Vimalraj Selvam and originally posted on his blog.

If you are working with ClickHouse for observability or analytics and want a clean, low-code way to stream and transform logs before they reach your database, Glassflow's clickhouse-etl is a solid option.

In this guide, you will set up a local end‑to‑end ETL pipeline using Glassflow’s clickhouse-etl UI and backend, NATS, Kafka, OpenTelemetry Collector (as a gateway), Fluent Bit, and ClickHouse — all tied together with Docker Compose.

Architecture Overview

High‑level flow:

[Fluent Bit] → [OTEL Collector] → [Kafka] → [Glassflow ETL (App + NATS)] → [ClickHouse]

  • Fluent Bit generates dummy logs and sends them via OTLP to the OTEL Collector.

  • The OTEL Collector batches the logs and ships them to Kafka in raw format (why raw rather than OTLP encoding is used is explained later).

  • The Glassflow backend listens to Kafka topics, lets you define transformations via the UI, and writes into ClickHouse.

  • The Glassflow UI provides a clean interface to manage ETL jobs.

  • NATS acts as an internal messaging bus used by the backend to orchestrate ETL job execution and events.

  • ClickHouse stores the final structured data.

🐳 Docker Compose Stack

Create a docker-compose.yaml with the following content:

services:
  nats:
    image: nats:alpine
    ports:
      - "4222:4222"
    command: --js
    restart: unless-stopped

  ui:
    image: glassflow/clickhouse-etl-fe:stable
    pull_policy: always
    environment:
      - NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL:-http://app:8080/api/v1}
      - NEXT_PUBLIC_IN_DOCKER=${NEXT_PUBLIC_IN_DOCKER:-true}

  app:
    image: glassflow/clickhouse-etl-be:stable
    pull_policy: always
    depends_on:
      - nats
    restart: unless-stopped
    environment:
      GLASSFLOW_LOG_FILE_PATH: /tmp/logs/glassflow
      GLASSFLOW_NATS_SERVER: nats:4222
    volumes:
      - logs:/tmp/logs/glassflow

  nginx:
    image: nginx:1.27-alpine
    ports:
      - "8080:8080"
    depends_on:
      - ui
      - app
    volumes:
      - logs:/logs:ro
      - ./nginx:/etc/nginx/templates
    restart: unless-stopped
    environment:
      NGINX_ENTRYPOINT_LOCAL_RESOLVERS: "true"

  kafka:
    image: confluentinc/cp-kafka:7.9.0
    hostname: kafka
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_EXTERNAL://localhost:9092,PLAINTEXT_INTERNAL://kafka:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:29093
      KAFKA_LISTENERS: PLAINTEXT://kafka:29092,CONTROLLER://kafka:29093,PLAINTEXT_EXTERNAL://0.0.0.0:9092,PLAINTEXT_INTERNAL://0.0.0.0:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LOG_DIRS: /tmp/kraft-combined-logs
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk

  clickhouse:
    image: clickhouse/clickhouse-server
    user: "101:101"
    container_name: clickhouse
    hostname: clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - ../config/clickhouse/config.d/config.xml:/etc/clickhouse-server/config.d/config.xml
      - ../config/clickhouse/users.d/users.xml:/etc/clickhouse-server/users.d/users.xml

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otel/otel-config.yaml:/etc/otel-config.yaml
    command: ["--config=/etc/otel-config.yaml"]
    depends_on:
      - kafka
    ports:
      - "4317:4317"
      - "4318:4318"

  fluentbit:
    image: cr.fluentbit.io/fluent/fluent-bit:4.0.1
    volumes:
      - ./fluent-bit/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
      - ./fluent-bit/counter.lua:/fluent-bit/etc/counter.lua
    depends_on:
      - otel-collector
    command: ["/fluent-bit/bin/fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.conf"]

volumes:
  logs:

Services included

  • fluentbit: generates and ships dummy logs

  • otel-collector: processes and exports logs to Kafka

  • kafka: queues raw log events

  • clickhouse: stores logs

  • app and ui: Glassflow backend and frontend

  • nats: internal messaging bus for the backend

  • nginx: reverse proxy

Start the full stack:

docker compose up -d

Once running, open the Glassflow UI: http://localhost:8080.
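
The containers take a moment to become healthy, so rather than refreshing the browser you can poll the UI until nginx answers. A minimal Python sketch (the URL assumes the port mapping from the compose file above; adjust if you changed it):

```python
import time
import urllib.request

def wait_for(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `url` until it responds (any status below 500) or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status < 500:
                    return True
        except OSError:
            pass  # connection refused / not ready yet
        time.sleep(delay)
    return False

if __name__ == "__main__":
    print("UI ready:", wait_for("http://localhost:8080"))
```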

🔥 Fluent Bit: Generating Logs with Timestamps

We use the dummy input and a Lua script that adds a counter and timestamp to every log.

fluent-bit.conf

[INPUT]
    Name    dummy
    Tag     test.logs
    Dummy   {"initial": "start"}
    Rate    1

[FILTER]
    Name    lua
    Match   test.logs
    script  /fluent-bit/etc/counter.lua
    call    gen_log

[OUTPUT]
    Name                    opentelemetry
    Match                   *
    Host                    otel-collector
    Port                    4318
    Logs_uri                /v1/logs
    Log_response_payload    true
    logs_body_key           $message
    add_label               app fluent-bit

counter.lua

counter = 0

function gen_log(tag, timestamp, record)
    counter = counter + 1
    local new_record = {}
    new_record["timestamp"] = timestamp
    new_record["counter"] = counter
    new_record["msg"] = "Test log message #" .. counter
    return 1, timestamp, new_record
end

Example emitted log:

{
  "timestamp": 1719050000,
  "counter": 101,
  "msg": "Test log message #101"
}
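
If you want to sanity-check the filter logic outside Fluent Bit, the Lua function translates almost line for line into Python (illustrative only; Fluent Bit runs the Lua version above):

```python
import json

counter = 0

def gen_log(timestamp, record):
    """Python mirror of counter.lua: replace the record with a timestamp, counter, and message."""
    global counter
    counter += 1
    return {"timestamp": timestamp, "counter": counter,
            "msg": f"Test log message #{counter}"}

if __name__ == "__main__":
    print(json.dumps(gen_log(1719050000, {"initial": "start"})))
```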

📦 OpenTelemetry Collector: Kafka Exporter

The OTEL Collector handles batching and delivery to Kafka.

otel-config.yaml

receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
  memory_limiter:
    limit_mib: 100
    spike_limit_mib: 20
    check_interval: 5s
exporters:
  kafka:
    brokers: ["kafka:9093"]
    topic: "logs"
    encoding: raw

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [kafka]

  telemetry:
    logs:
      level: info  # collector's own log verbosity; adjust as needed

Note: This example uses raw encoding. otlp_json is not used here because Glassflow’s clickhouse-etl currently does not process nested JSON structures.
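
To see the difference concretely, here is a sketch of what the same log event looks like under each encoding (the OTLP envelope below is abbreviated; real OTLP JSON carries more metadata):

```python
import json

# With `encoding: raw`, the Kafka message body is the log body itself --
# the flat JSON string produced by the Fluent Bit Lua filter:
raw_message = b'{"timestamp": 1719050000, "counter": 101, "msg": "Test log message #101"}'
flat = json.loads(raw_message)
assert set(flat) == {"timestamp", "counter", "msg"}  # top-level fields map 1:1 to columns

# With `encoding: otlp_json`, the same log arrives wrapped in the OTLP
# envelope (resourceLogs -> scopeLogs -> logRecords), so the fields the
# ETL needs sit several levels deep:
otlp_message = json.dumps({
    "resourceLogs": [{
        "scopeLogs": [{
            "logRecords": [{"body": {"stringValue": raw_message.decode()}}]
        }]
    }]
})
nested = json.loads(otlp_message)
body = nested["resourceLogs"][0]["scopeLogs"][0]["logRecords"][0]["body"]["stringValue"]
assert json.loads(body)["counter"] == 101
```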

🧩 ClickHouse: Schema to Store Logs

Create the table:

CREATE TABLE IF NOT EXISTS logs (
    event_timestamp DateTime,
    body String,
    counter Int32
) ENGINE = MergeTree
ORDER BY event_timestamp;

This schema expects:

  • event_timestamp: Unix timestamp derived from timestamp

  • body: Log message from msg

  • counter: A sequential value for ordering; useful when testing for data loss
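
Because `counter` increments by exactly one per event, any gap in the stored values points at dropped messages. A small helper to find those gaps once you have fetched the counters from ClickHouse (the fetch itself, e.g. `SELECT counter FROM logs`, is left to your client of choice):

```python
def find_gaps(counters):
    """Return the missing values in a run of sequential counters (i.e. dropped events)."""
    counters = sorted(counters)
    missing = []
    for prev, cur in zip(counters, counters[1:]):
        missing.extend(range(prev + 1, cur))
    return missing

if __name__ == "__main__":
    # counters 4, 6 and 7 never arrived
    print(find_gaps([1, 2, 3, 5, 8]))  # [4, 6, 7]
```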

🌐 Glassflow ETL: UI‑Driven Pipeline Configuration

When the UI is live at http://localhost:8080, you will see a welcome page:


Select Ingest Only. The Deduplicate option is helpful when you suspect duplicate logs, but for this walkthrough we will keep things simple.

Set up the Kafka connection. For a local setup, choose No authentication.


Select the topic where logs are produced and choose the offset (earliest or latest).


Configure the ClickHouse connection:


Select the database and destination table:


Save the pipeline, then head to the ClickHouse UI and query your logs:


All end‑to‑end configurations are available here: Part 1.

📌 Final Thoughts

Glassflow’s clickhouse-etl provides a powerful abstraction for streaming logs from Kafka into ClickHouse. It is ideal for observability pipelines, troubleshooting, and analytics systems. With a visual UI, a modular backend, and native Kafka support, it enables rapid prototyping and production deployments with minimal friction.

The Glassflow team has a strong roadmap. Features I am excited about include:

  • Support for multiple pipelines (in progress)

  • Real‑time ingestion view

  • Field transformations, aggregations, and filtering in the UI
