Explore the Architecture

ClickHouse is architected with a layered design for high performance and flexibility.

Core Layers

  1. Query Processing Layer: This layer handles parsing, logical/physical plan building, and execution using a vectorized engine, similar to MonetDB/X100. It supports SQL, PRQL, and Kusto-like KQL. Runtime optimizations such as opportunistic compilation make it extremely efficient across diverse query types and data patterns.

  2. Storage Layer: The storage backend is pluggable with multiple table engines:

    • MergeTree engines (e.g. ReplacingMergeTree, AggregatingMergeTree) form the core persistent format. Based on LSM-tree principles, they maintain sorted parts and continuously merge them in the background.
    • Special-purpose engines like Dictionaries support fast in-memory key-value lookups (even from external sources), and Distributed engines allow transparent sharding across nodes.
    • Virtual engines integrate with external systems like MySQL, Kafka, S3, Iceberg, and Redis for both read and write, enabling data federation and hybrid architectures.
  3. Integration Layer: Facilitates connections to external systems via protocols (MySQL, PostgreSQL, HTTP), or direct engine bindings. It also includes observability, role-based access control, and backup hooks.

  4. Access Layer: Manages sessions, authentication, and protocol-level communication with clients.

Performance Engineering

ClickHouse emphasizes low-level optimizations:

  • It chooses from 30+ hash table implementations dynamically at runtime.
  • Uses data-aware algorithms for sorting, filtering, and join operations.
  • Employs column-level compression codecs (like ZSTD, Gorilla, Delta, AES) for high-speed storage and efficient I/O.

Sharding and Replication

  • Sharding enables massive scalability by partitioning data horizontally across nodes.
  • Replication is handled via ReplicatedMergeTree engines using a Raft-based coordination mechanism called Keeper (a C++ Zookeeper replacement).
  • The Distributed engine provides a global view across shards.

Deployment Modes

  • On-prem: Classic ClickHouse cluster or single-node server.
  • Cloud: Fully-managed ClickHouse Cloud (autoscaling DBaaS).
  • Standalone: CLI tool for local analytics, replacing tools like awk, grep.
  • Embedded (chDB): In-process OLAP database, ideal for Jupyter/Pandas workflows. No IPC or serialization overhead.

For detailed information, see ClickHouse Architecture.

ACID-compliant

ACID PropertyClickHouse SupportNotes
AtomicityNot guaranteedInserts are not forced to disk (fsync is skipped by default), so power loss can lead to data loss.
Consistency⚠️ PartiallyNo constraints or transactional checks across tables; assumes clean input.
IsolationSnapshot IsolationQueries see a consistent snapshot via MVCC-like versioned parts.
DurabilityOptionalNo fsync by default, meaning durability isn’t guaranteed unless explicitly configured.
Type to search, ESC to discard
Type to search, ESC to discard
Type to search, ESC to discard
  Last updated