ClickHouse Production Setup for Analytics

Appler LABS
Jul 26, 2025 · 5 min read

The Ultimate ClickHouse Setup Guide for High-Throughput Analytics

Why ClickHouse?

ClickHouse has become the gold standard for analytical databases, capable of:

  • Ingesting millions of rows per second per server
  • Achieving 10-100x compression versus traditional databases
  • Delivering sub-second query response on petabyte-scale datasets
  • Handling thousands of concurrent queries with proper configuration

In this guide, I’ll walk you through a production-ready ClickHouse setup that can handle 100+ billion rows with ease.

Hardware Requirements

For a production ClickHouse cluster:

Node Type     CPU Cores   RAM        Storage                      Network
ClickHouse    16-32       64-128GB   NVMe SSD (2-4TB per node)    10Gbps
ZooKeeper     4-8         16-32GB    SSD (500GB)                  1Gbps

Throughput expectations:

  • Single server: 1-5 billion rows/sec for simple scans
  • Cluster: 50+ billion rows/sec aggregate throughput

Step 1: Installation

On Ubuntu/Debian:
# Install dependencies
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4

# Add the ClickHouse repository
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update

# Install server and client
sudo apt-get install -y clickhouse-server clickhouse-client

# Start the service
sudo service clickhouse-server start
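A quick smoke test from clickhouse-client before touching the configuration:

-- Should return the installed version and the server uptime in seconds
SELECT version(), uptime();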
Configuration Tuning (/etc/clickhouse-server/config.xml):
<clickhouse>
    <logger>
        <level>information</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
    </logger>

    <listen_host>0.0.0.0</listen_host>
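    <!-- 0.0.0.0 listens on all interfaces; keep the ports firewalled or bind to an internal address in production -->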

    <max_connections>4096</max_connections>
    <keep_alive_timeout>3</keep_alive_timeout>

    <max_concurrent_queries>200</max_concurrent_queries>
    <!-- Note: max_threads is a per-query setting; in production set it in a users.xml profile rather than here -->
    <max_threads>32</max_threads>

    <uncompressed_cache_size>8589934592</uncompressed_cache_size> <!-- 8GB -->
    <mark_cache_size>5368709120</mark_cache_size> <!-- 5GB -->

    <path>/var/lib/clickhouse/</path>
    <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
</clickhouse>

Step 2: Cluster Setup

Example Cluster: 2 Shards x 2 Replicas (with a 3-node ZooKeeper ensemble)
  1. ZooKeeper Configuration (in config.xml on all ClickHouse nodes):
<zookeeper>
    <node index="1">
        <host>zk1.example.com</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>zk2.example.com</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>zk3.example.com</host>
        <port>2181</port>
    </node>
</zookeeper>
  2. Cluster Definition:
<remote_servers>
    <analytics_cluster>
        <shard>
            <weight>1</weight>
            <replica>
                <host>ch1.example.com</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>ch2.example.com</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <replica>
                <host>ch3.example.com</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>ch4.example.com</host>
                <port>9000</port>
            </replica>
        </shard>
    </analytics_cluster>
</remote_servers>
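With the cluster defined, queries usually go through a Distributed table that fans out to the shards. A minimal sketch, assuming the local analytics.events table from Step 3 has been created on every node (and that the {shard}/{replica} macros it uses are defined per node):

-- Create the same "umbrella" table on all nodes, then query it instead of the local table
CREATE TABLE analytics.events_distributed ON CLUSTER analytics_cluster
AS analytics.events
ENGINE = Distributed(analytics_cluster, analytics, events, rand());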

Step 3: Optimal Table Design

Event Data Example (100+ billion rows)
CREATE TABLE analytics.events
(
    `timestamp` DateTime64(3, 'UTC') CODEC(Delta(8), ZSTD(3)),
    `event_type` LowCardinality(String),
    `user_id` UInt64 CODEC(DoubleDelta, ZSTD(3)),
    `device_id` UInt32 CODEC(T64, ZSTD(3)),
    `country` LowCardinality(String),
    `region` LowCardinality(String),
    `os` LowCardinality(String),
    `session_id` UUID,
    `page_url` String CODEC(ZSTD(3)),
    `referrer` String CODEC(ZSTD(3)),
    `duration` UInt32,
    `metrics` Nested(
        key String,
        value Float32
    )
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/analytics.events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (event_type, country, region, toStartOfHour(timestamp))
TTL timestamp + toIntervalMonth(6)
SETTINGS
    index_granularity = 8192,
    min_bytes_for_wide_part = 1073741824; -- 1GB
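Once a few batches have landed, it is worth checking that the codecs above actually pay off. A quick way to see per-column compression:

-- Per-column on-disk vs. uncompressed size for the events table
SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = 'analytics' AND table = 'events'
ORDER BY data_compressed_bytes DESC;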
Materialized Views for Dashboards
CREATE MATERIALIZED VIEW analytics.events_hourly
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{shard}/analytics.events_hourly', '{replica}')
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, event_type, country, region)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    country,
    region,
    countState() AS events,
    uniqCombinedState(user_id) AS users,
    sumState(duration) AS total_duration,
    histogramState(100)(duration) AS duration_histogram
FROM analytics.events
GROUP BY hour, event_type, country, region;
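Because the view stores AggregateFunction states, dashboard queries need the matching -Merge combinators to finalize them. A typical read against the rollup:

SELECT
    hour,
    event_type,
    countMerge(events) AS total_events,
    uniqCombinedMerge(users) AS unique_users,
    sumMerge(total_duration) AS sum_duration
FROM analytics.events_hourly
WHERE hour >= now() - INTERVAL 7 DAY
GROUP BY hour, event_type
ORDER BY hour DESC;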

Step 4: Data Loading

High-Throughput Insert Methods
  1. Native Protocol (10M+ rows/sec):
# Using the clickhouse-driver package
from clickhouse_driver import Client

client = Client('clickhouse-server')

data = [...]  # List of tuples matching the table's column order
client.execute(
    "INSERT INTO analytics.events VALUES",
    data,
    types_check=True
)
  2. Bulk Parquet Load (50M+ rows/sec):
-- file() reads relative to user_files_path (/var/lib/clickhouse/user_files by default)
INSERT INTO analytics.events
SELECT * FROM file('events.parquet', 'Parquet');
  3. Kafka Engine (Real-time):
CREATE TABLE analytics.events_queue (
    -- Same schema as main table
) ENGINE = Kafka(
    'kafka-broker:9092',
    'events-topic',
    'clickhouse-group',
    'JSONEachRow'
);

CREATE MATERIALIZED VIEW analytics.events_consumer
TO analytics.events
AS SELECT * FROM analytics.events_queue;
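To pause ingestion (for example while altering the target table), detach the consumer view and re-attach it afterwards; the Kafka table only consumes while at least one materialized view is attached to it:

DETACH TABLE analytics.events_consumer;
-- ... perform maintenance ...
ATTACH TABLE analytics.events_consumer;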

Step 5: Query Optimization

Common Optimizations
  1. Primary Key Design:
-- Good for time-based queries
ORDER BY (event_type, country, timestamp)

-- Good for user analytics
ORDER BY (user_id, timestamp)
  2. Projections (ClickHouse 21.6+; see the backfill note after this list):
ALTER TABLE analytics.events
ADD PROJECTION user_sessions (
    SELECT 
        user_id,
        session_id,
        min(timestamp) AS session_start,
        max(timestamp) AS session_end,
        count() AS events,
        sum(duration) AS total_duration
    GROUP BY user_id, session_id
);
  3. TTL Management:
ALTER TABLE analytics.events
MODIFY TTL timestamp + toIntervalMonth(3)
SETTINGS mutations_sync = 2;
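One gotcha with the projection added above: ALTER TABLE ... ADD PROJECTION only applies to parts written after the ALTER. To cover existing data, trigger a backfill:

-- Rewrites existing parts to include the projection (runs as a background mutation)
ALTER TABLE analytics.events MATERIALIZE PROJECTION user_sessions;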

Performance Benchmarks

Test Cluster (3 nodes, 32vCPU each)
Operation            Throughput       Latency
Inserts (batched)    12M rows/sec     50ms
Simple count         5B rows/sec      100ms
GroupBy (1M keys)    2B rows/sec      300ms
Join (1B x 1B)       500M rows/sec    2s

Query Examples

1. Real-time Dashboard (100ms response):

SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    count() AS events,
    uniqCombined(user_id) AS users
FROM analytics.events
WHERE timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour, event_type
ORDER BY hour DESC, events DESC

2. User Funnel Analysis (500ms response):

WITH user_journey AS (
    SELECT
        user_id,
        sequenceMatch('(?1).*(?2).*(?3)')(
            timestamp,
            event_type = 'view',
            event_type = 'add_to_cart',
            event_type = 'purchase'
        ) AS has_converted
    FROM analytics.events
    WHERE timestamp BETWEEN '2023-01-01' AND '2023-01-07'
    GROUP BY user_id
)
SELECT
    sum(has_converted) AS converted_users,
    count() AS total_users,
    converted_users / total_users AS conversion_rate
FROM user_journey
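If the funnel grows more steps, windowFunnel is often easier to extend than a sequenceMatch pattern. A sketch of the same 3-step funnel with a 7-day conversion window (the timestamp is cast to DateTime so the 604800 window is unambiguously in seconds):

SELECT
    countIf(steps >= 3) AS converted_users,
    count() AS total_users,
    converted_users / total_users AS conversion_rate
FROM
(
    SELECT
        user_id,
        windowFunnel(604800)(
            toDateTime(timestamp),
            event_type = 'view',
            event_type = 'add_to_cart',
            event_type = 'purchase'
        ) AS steps
    FROM analytics.events
    WHERE timestamp BETWEEN '2023-01-01' AND '2023-01-07'
    GROUP BY user_id
);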

Maintenance Operations

Routine Tasks
  1. Monitor System Tables:
SELECT 
    table,
    sum(bytes) AS size_bytes,
    sum(rows) AS total_rows,
    max(modification_time) AS last_modified
FROM system.parts
WHERE active
GROUP BY table
ORDER BY size_bytes DESC;
  2. Optimize Table:
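-- FINAL rewrites every part in each partition; it is heavy on large tables, so run it off-peak and only when needed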
OPTIMIZE TABLE analytics.events FINAL;
  3. Backup Strategy:
# Create a local backup
clickhouse-backup create analytics_backup

# Upload to S3 (remote storage configured in clickhouse-backup's config)
clickhouse-backup upload analytics_backup
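For completeness, the matching restore path with the same tool:

# Fetch the backup from remote storage and restore it
clickhouse-backup download analytics_backup
clickhouse-backup restore analytics_backup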

Scaling Recommendations

  1. Vertical Scaling:

    • Add more CPU cores (most queries parallelize across cores, so scan-heavy workloads scale close to linearly)
    • Increase RAM for larger aggregations
    • Use NVMe storage for parts storage
  2. Horizontal Scaling:

    • Add more shards for write scalability
    • Add more replicas for read scalability
    • Consider multi-region deployment for global users
  3. Resource Isolation:

<!-- users.xml -->
<profiles>
    <default>
        <max_memory_usage>10000000000</max_memory_usage>
    </default>
    <dashboard>
        <max_memory_usage>5000000000</max_memory_usage>
        <priority>10</priority>
    </dashboard>
    <reports>
        <max_memory_usage>20000000000</max_memory_usage>
        <priority>1</priority>
    </reports>
</profiles>
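Profiles defined this way are assigned to users either in users.xml or, if SQL-driven access control is enabled, with DDL. A minimal sketch (dashboard_user is a hypothetical account):

-- Requires access_management to be enabled for the admin user issuing this statement
CREATE USER IF NOT EXISTS dashboard_user IDENTIFIED WITH sha256_password BY 'change-me'
SETTINGS PROFILE 'dashboard';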

Common Pitfalls

  1. Overpartitioning:

    • Keep partitions under 100GB
    • Monthly partitioning works for most cases
  2. Excessive Granularity:

    • index_granularity = 8192 is a good default
    • Smaller values speed up point lookups but inflate the primary index and mark cache
  3. JOIN Abuse:

    • Prefer denormalized schemas
    • Use dictionaries for small dimension tables (sketch after this list)
    • Consider replacing joins with pre-aggregation
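For the dictionary route, a minimal sketch: it assumes a small dimension table dim.countries (country_code, country_name) exists on the local server; lookups then replace a JOIN entirely.

CREATE DICTIONARY analytics.country_dict
(
    country_code String,
    country_name String
)
PRIMARY KEY country_code
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' PASSWORD '' DB 'dim' TABLE 'countries'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 300 MAX 600);

-- Lookup instead of JOIN
SELECT
    dictGet('analytics.country_dict', 'country_name', tuple(country)) AS country_name,
    count() AS events
FROM analytics.events
GROUP BY country_name
ORDER BY events DESC;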

Monitoring Setup

Essential metrics to track:

-- Query throughput (system.metrics has no history; use the query log instead)
SELECT
    toStartOfMinute(event_time) AS minute,
    count() AS queries,
    avg(query_duration_ms) AS avg_duration_ms
FROM system.query_log
WHERE event_time > now() - INTERVAL 1 HOUR
  AND type = 'QueryFinish'
GROUP BY minute
ORDER BY minute;

-- Memory usage
SELECT
    metric,
    value
FROM system.asynchronous_metrics
WHERE metric LIKE '%Memory%';
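For replicated tables it also pays to watch replication lag:

-- Replication health: per-table queue size and delay
SELECT
    database,
    table,
    is_leader,
    absolute_delay,
    queue_size
FROM system.replicas
ORDER BY absolute_delay DESC;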

Grafana Dashboard Example:

# ClickHouse Overview Dashboard
panels:
  - title: Query Rate
    targets:
      - expr: rate(clickhouse_query_total[1m])
        legend: "Queries/sec"
  - title: Memory Usage
    targets:
      - expr: clickhouse_memory_usage_bytes
        legend: "Used Memory"

Conclusion

This ClickHouse setup guide provides a production-ready configuration that can handle:

  • Data ingestion: 10M+ events per second
  • Query performance: Sub-second responses on 100B+ row tables
  • Concurrency: 1000+ simultaneous dashboard users
  • Storage efficiency: 10-20x compression ratios

Key takeaways:

  1. Proper schema design is critical for performance
  2. Materialized views enable real-time dashboards
  3. Resource isolation prevents query interference
  4. Monitoring is essential at scale

For most analytical workloads, ClickHouse delivers unparalleled price/performance compared to traditional data warehouses. Start small with a single node, then scale out as your data grows.