Understand ClickHouse
Overview
ClickHouse is a high-performance, column-oriented database management system (DBMS) designed for online analytical processing (OLAP). It enables organizations to store, process, and analyze large volumes of data in real time.
Unlike traditional row-based databases, ClickHouse stores data in columns, which makes it faster and more efficient for running complex queries across massive datasets.
ClickHouse is available as both open-source software and a managed cloud offering by the ClickHouse team. This documentation focuses on the open-source version.
Example Use Case
A retail company can use ClickHouse to instantly generate reports showing top-selling products, sales trends, and regional performance, even when analyzing billions of records. This enables fast, data-driven decision-making.
Key Points:
- Optimized for real-time analytics
- Handles large-scale data with low latency
- Supports standard SQL queries
- Ideal for dashboards, monitoring, and reporting use cases
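For instance, the retail scenario above reduces to a single aggregation query. The following is an illustrative sketch; the table and column names are hypothetical:

```sql
-- Top 10 products by revenue over the last 30 days
SELECT
    product_id,
    sum(quantity * price) AS revenue
FROM sales
WHERE order_date >= today() - 30
GROUP BY product_id
ORDER BY revenue DESC
LIMIT 10;
```

Because ClickHouse reads only the columns a query references, reports like this stay fast even over billions of rows.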
Availability:
ClickHouse is available in two forms:
- Open Source: You can download, install, and run ClickHouse on your infrastructure at no cost.
- Cloud Offering: You can use ClickHouse as a fully managed service in the cloud, eliminating the need to manage infrastructure or updates.
This flexibility allows you to choose the deployment method that best fits your project and operational needs.
For more information about ClickHouse, see ClickHouse Docs.
Understand Analytics
Analytics, or Online Analytical Processing (OLAP), involves executing SQL queries that perform complex calculations—such as aggregations, string operations, and arithmetic—on very large datasets.
Unlike Online Transaction Processing (OLTP), which handles small, fast queries that read or write a few rows at a time, OLAP queries often scan billions or even trillions of rows. These queries are designed to support business intelligence, reporting, and data exploration at scale.
In many scenarios, especially dashboards and real-time monitoring, analytics queries are expected to return results in under one second.
Key Features
Data Replication and Integrity
ClickHouse uses an asynchronous multi-master replication scheme to ensure that data is stored redundantly on multiple nodes. After data is written to any available replica, all remaining replicas retrieve their copy in the background.
The system maintains identical data on different replicas. Recovery after most failures is performed automatically, or semi-automatically in complex cases.
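A minimal sketch of a replicated table, assuming a cluster with `{shard}` and `{replica}` macros configured; the table name and schema are hypothetical:

```sql
-- Each replica registers itself under the given coordination path;
-- writes to any one replica are fetched by the others in the background
CREATE TABLE events
(
    event_date Date,
    user_id    UInt64,
    message    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (event_date, user_id);
```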
Role-Based Access Control
ClickHouse implements user account management using SQL queries and allows for role-based access control configuration similar to what can be found in the ANSI SQL standard and popular relational database management systems.
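A short sketch of this SQL-driven access control; the role, database, and user names are hypothetical:

```sql
-- Define a role and grant it read access to one database
CREATE ROLE analyst;
GRANT SELECT ON sales_db.* TO analyst;

-- Create a user and assign the role
CREATE USER jane IDENTIFIED WITH sha256_password BY 'change_me';
GRANT analyst TO jane;
SET DEFAULT ROLE analyst TO jane;
```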
Approximate Calculation
ClickHouse provides ways to trade accuracy for performance.
For example, some of its aggregate functions calculate the distinct value count, the median, and quantiles approximately.
Additionally, queries can be run on a sample of the data to compute an approximate result quickly. Finally, aggregations can be run with a limited number of keys instead of for all keys.
Depending on how skewed the distribution of the keys is, this can provide a reasonably accurate result that uses far fewer resources than an exact calculation.
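These trade-offs can be sketched as follows. The table and column names are hypothetical, and the `SAMPLE` clause additionally assumes the table was created with a sampling key:

```sql
-- Approximate distinct count and median
SELECT
    uniq(user_id)             AS approx_users,   -- approximate COUNT(DISTINCT)
    quantile(0.5)(latency_ms) AS approx_median
FROM requests;

-- Run the aggregation on roughly 10% of the data
SELECT uniq(user_id) FROM requests SAMPLE 0.1;

-- Stop collecting new GROUP BY keys after 100,000 and aggregate only those
SELECT user_id, count()
FROM requests
GROUP BY user_id
SETTINGS max_rows_to_group_by = 100000, group_by_overflow_mode = 'any';
```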
Adaptive Join Algorithms
ClickHouse chooses the join algorithm adaptively; it starts with fast hash joins and falls back to merge joins if there's more than one large table.
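This behavior can also be steered explicitly through the `join_algorithm` setting; the query below is an illustrative sketch with hypothetical table names:

```sql
-- 'auto' starts with a hash join and switches to a merge-based
-- algorithm if memory thresholds are reached
SET join_algorithm = 'auto';

SELECT o.order_id, c.name
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id;
```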
Superior Query Performance
ClickHouse's shift from row-oriented to column-oriented data storage is the foundation of this performance: queries read only the columns they reference, and values of the same type stored together compress well.
Storage Layer: Concurrent inserts are isolated from each other
ClickHouse stores data in multiple table parts, each created per INSERT. Queries scan all parts, while background merges combine smaller parts into larger ones. This design enables fast, lock-free inserts with minimal synchronization, optimizing for high-throughput writes and efficient storage — a key part of ClickHouse’s holistic performance strategy.
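You can observe this part-based layout directly, for example by querying the `system.parts` system table (the table name in the filter is hypothetical):

```sql
-- Active parts of one table; `level` grows as parts are merged
SELECT name, rows, level
FROM system.parts
WHERE database = 'default' AND table = 'events' AND active;
```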
Keeping Writes Lightweight
ClickHouse uses different merge strategies to optimize storage and write performance. These include:
- Replacing merges: Retain only the most recent version of a row from the input parts and discard older versions. This acts as a merge-time cleanup operation.
- Aggregating merges: Combine intermediate aggregation states into a new aggregated result. This effectively implements incremental aggregation.
- TTL (time-to-live) merges: Automatically compress, move, or delete rows based on time-based rules defined in the table schema.
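The first and third strategies correspond to the `ReplacingMergeTree` engine and the `TTL` table clause, respectively. A minimal sketch with hypothetical schemas:

```sql
-- Replacing merges: when parts are merged, keep only the row with
-- the highest updated_at per session_id
CREATE TABLE sessions
(
    session_id UInt64,
    updated_at DateTime,
    payload    String
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY session_id;

-- TTL merges: delete rows 30 days after event_time
CREATE TABLE metrics
(
    event_time DateTime,
    value      Float64
)
ENGINE = MergeTree
ORDER BY event_time
TTL event_time + INTERVAL 30 DAY DELETE;
```

Aggregating merges are available analogously through the `AggregatingMergeTree` and `SummingMergeTree` engines.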
Storage Layer: Data Pruning
ClickHouse optimizes query performance by reducing the amount of data read during execution. This is achieved through:
- Primary keys: Enable fast binary search, avoiding full table scans.
- Table projections: Provide alternate data layouts optimized for common filters.
- Skipping indexes: Store metadata (e.g., min/max values) to eliminate unnecessary reads.
These techniques help skip irrelevant rows and improve overall query efficiency.
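All three mechanisms can be sketched in one hypothetical schema:

```sql
CREATE TABLE hits
(
    event_date Date,
    user_id    UInt64,
    url        String,
    -- skipping index: min/max of user_id stored per block of granules
    INDEX idx_user user_id TYPE minmax GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);  -- primary key enables binary search

-- Projection: an alternate layout sorted by url for URL-filtered queries
ALTER TABLE hits ADD PROJECTION by_url
(
    SELECT * ORDER BY url
);
ALTER TABLE hits MATERIALIZE PROJECTION by_url;
```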
Storage Layer: Data Compression
ClickHouse supports optional data compression using a variety of codecs.
Columnar storage naturally groups similar data types, resulting in high compression efficiency. Users can select from:
- General-purpose codecs (e.g., ZSTD)
- Specialized codecs (e.g., Delta, Gorilla, AES)
Compression not only reduces storage requirements but also improves query performance by minimizing disk and network I/O.
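Codecs are chosen per column in the table definition. A sketch with a hypothetical schema:

```sql
-- Delta suits monotonically increasing values such as timestamps;
-- Gorilla targets slowly changing floats; ZSTD is a strong default
CREATE TABLE readings
(
    ts     DateTime CODEC(Delta, ZSTD),
    sensor UInt32   CODEC(ZSTD),
    value  Float64  CODEC(Gorilla)
)
ENGINE = MergeTree
ORDER BY ts;
```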
When compared with ORC and Parquet, ClickHouse's native compression delivers competitive efficiency and performance.
Compression Comparison Summary
| Feature | ClickHouse | ORC | Parquet |
|---|---|---|---|
| Compression Efficiency | High (with ZSTD, LZ4, Delta, Gorilla, etc.) | Very High (adaptive codecs, dictionary, RLE) | High (with ZSTD, Snappy, Delta) |
| Custom Codec Support | ✅ Yes (column-level, specialized codecs) | ❌ Not user-configurable per column | ⚠️ Limited (ZSTD, Delta, Brotli per column) |
| Codec Examples | ZSTD, LZ4, Gorilla, FPC, Delta, AES, GCD | ZLIB, ZSTD, LZ4, RLE, Dictionary | Snappy, ZSTD, Brotli, Delta, Dictionary |
| Targeted Data Types | Strong for float/int/low-cardinality columns | Optimized for structured, typed tabular data | General-purpose analytics |
| Performance on Read | Very fast due to skipping + compression | Good, but slower than ClickHouse in most cases | Good, sometimes faster than ORC on simple reads |
| Write Performance | Extremely fast | Slower (due to indexing and metadata) | Moderate |
| Encryption Codec | ✅ AES available as a codec | ❌ (handled outside ORC, e.g. Hadoop layer) | ❌ (same) |
Low-Level Optimizations in ClickHouse
ClickHouse achieves high performance through deep, query-specific optimizations. Instead of relying on general-purpose data structures, it uses over 30 specialized hash table variants and dynamically selects the most efficient sorting algorithm based on data type, size, and query logic. This fine-tuned approach significantly enhances query speed and efficiency.
Get Started with ClickHouse
You can refer to the ClickHouse Starter Guides for quick examples and usage of common SQL operations such as CREATE, INSERT, SELECT, UPDATE, and DELETE.