Understand ClickHouse
Overview
ClickHouse is a high-performance, column-oriented database management system (DBMS) designed for online analytical processing (OLAP). It enables organizations to store, process, and analyze large volumes of data in real time.
Unlike traditional row-based databases, ClickHouse stores data in columns, which makes it faster and more efficient for running complex queries across massive datasets.
ClickHouse is available as both open-source software and a managed cloud offering by the ClickHouse team. This documentation focuses on the open-source version.
Example Use Case
A retail company can use ClickHouse to instantly generate reports showing top-selling products, sales trends, and regional performance, even when analyzing billions of records. This enables fast, data-driven decision-making.
Key Points:
- Optimized for real-time analytics
- Handles large-scale data with low latency
- Supports standard SQL queries
- Ideal for dashboards, monitoring, and reporting use cases
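For instance, the retail scenario above reduces to a single aggregation query. The following is an illustrative sketch; the table and column names are hypothetical:

```sql
-- Top 10 products by revenue over the last 30 days
SELECT
    product_id,
    sum(quantity * price) AS revenue
FROM sales
WHERE order_date >= today() - 30
GROUP BY product_id
ORDER BY revenue DESC
LIMIT 10;
```

Because ClickHouse reads only the columns a query references, reports like this stay fast even over billions of rows.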
Availability:
ClickHouse is available in two forms:
- Open Source: You can download, install, and run ClickHouse on your infrastructure at no cost.
- Cloud Offering: You can use ClickHouse as a fully managed service in the cloud, eliminating the need to manage infrastructure or updates.
This flexibility allows you to choose the deployment method that best fits your project and operational needs.
For more information about ClickHouse, see ClickHouse Docs.
Understand Analytics
Analytics, or Online Analytical Processing (OLAP), involves executing SQL queries that perform complex calculations—such as aggregations, string operations, and arithmetic—on very large datasets.
Unlike Online Transaction Processing (OLTP), which handles small, fast queries that read or write a few rows at a time, OLAP queries often scan billions or even trillions of rows. These queries are designed to support business intelligence, reporting, and data exploration at scale.
In many scenarios, especially dashboards and real-time monitoring, analytics queries are expected to return results in under one second.
Key Features
Data Replication and Integrity
ClickHouse uses an asynchronous multi-master replication scheme to ensure that data is stored redundantly on multiple nodes. After data is written to any available replica, all remaining replicas retrieve their copy in the background.
The system maintains identical data on different replicas. Recovery after most failures is performed automatically, or semi-automatically in complex cases.
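A minimal sketch of a replicated table, assuming a cluster with `{shard}` and `{replica}` macros configured; the table name and schema are hypothetical:

```sql
-- Each replica registers itself under the given coordination path;
-- writes to any one replica are fetched by the others in the background
CREATE TABLE events
(
    event_date Date,
    user_id    UInt64,
    message    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (event_date, user_id);
```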
Role-Based Access Control
ClickHouse implements user account management using SQL queries and allows for role-based access control configuration similar to what can be found in the ANSI SQL standard and popular relational database management systems.
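A short sketch of this SQL-driven access control; the role, database, and user names are hypothetical:

```sql
-- Define a role and grant it read access to one database
CREATE ROLE analyst;
GRANT SELECT ON sales_db.* TO analyst;

-- Create a user and assign the role
CREATE USER jane IDENTIFIED WITH sha256_password BY 'change_me';
GRANT analyst TO jane;
SET DEFAULT ROLE analyst TO jane;
```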
Approximate Calculation
ClickHouse provides ways to trade accuracy for performance.
For example, some of its aggregate functions calculate the distinct value count, the median, and quantiles approximately.
Additionally, queries can be run on a sample of the data to compute an approximate result quickly. Finally, aggregations can be run with a limited number of keys instead of for all keys.
Depending on how skewed the distribution of the keys is, this can provide a reasonably accurate result that uses far fewer resources than an exact calculation.
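These trade-offs can be sketched as follows. The table and column names are hypothetical, and the `SAMPLE` clause additionally assumes the table was created with a sampling key:

```sql
-- Approximate distinct count and median
SELECT
    uniq(user_id)             AS approx_users,   -- approximate COUNT(DISTINCT)
    quantile(0.5)(latency_ms) AS approx_median
FROM requests;

-- Run the aggregation on roughly 10% of the data
SELECT uniq(user_id) FROM requests SAMPLE 0.1;

-- Stop collecting new GROUP BY keys after 100,000 and aggregate only those
SELECT user_id, count()
FROM requests
GROUP BY user_id
SETTINGS max_rows_to_group_by = 100000, group_by_overflow_mode = 'any';
```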
Adaptive Join Algorithms
ClickHouse chooses the join algorithm adaptively; it starts with fast hash joins and falls back to merge joins if there's more than one large table.
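This behavior can also be steered explicitly through the `join_algorithm` setting; the query below is an illustrative sketch with hypothetical table names:

```sql
-- 'auto' starts with a hash join and switches to a merge-based
-- algorithm if memory thresholds are reached
SET join_algorithm = 'auto';

SELECT o.order_id, c.name
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id;
```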
Superior Query Performance
ClickHouse's shift from row-oriented to column-oriented data storage is the foundation of this performance: queries read only the columns they reference, and values of the same type stored together compress well.
Storage Layer: Concurrent inserts are isolated from each other
ClickHouse stores data in multiple table parts, each created per INSERT. Queries scan all parts, while background merges combine smaller parts into larger ones. This design enables fast, lock-free inserts with minimal synchronization, optimizing for high-throughput writes and efficient storage — a key part of ClickHouse’s holistic performance strategy.
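You can observe this part-based layout directly, for example by querying the `system.parts` system table (the table name in the filter is hypothetical):

```sql
-- Active parts of one table; `level` grows as parts are merged
SELECT name, rows, level
FROM system.parts
WHERE database = 'default' AND table = 'events' AND active;
```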
Keeping Writes Lightweight
ClickHouse uses different merge strategies to optimize storage and write performance. These include:
- Replacing merges: Retain only the most recent version of a row from the input parts and discard older versions. This acts as a merge-time cleanup operation.
- Aggregating merges: Combine intermediate aggregation states into a new aggregated result. This effectively implements incremental aggregation.
- TTL (time-to-live) merges: Automatically compress, move, or delete rows based on time-based rules defined in the table schema.
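The first and third strategies correspond to the `ReplacingMergeTree` engine and the `TTL` table clause, respectively. A minimal sketch with hypothetical schemas:

```sql
-- Replacing merges: when parts are merged, keep only the row with
-- the highest updated_at per session_id
CREATE TABLE sessions
(
    session_id UInt64,
    updated_at DateTime,
    payload    String
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY session_id;

-- TTL merges: delete rows 30 days after event_time
CREATE TABLE metrics
(
    event_time DateTime,
    value      Float64
)
ENGINE = MergeTree
ORDER BY event_time
TTL event_time + INTERVAL 30 DAY DELETE;
```

Aggregating merges are available analogously through the `AggregatingMergeTree` and `SummingMergeTree` engines.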
Storage Layer: Data Pruning
ClickHouse optimizes query performance by reducing the amount of data read during execution. This is achieved through:
- Primary keys: Enable fast binary search, avoiding full table scans.
- Table projections: Provide alternate data layouts optimized for common filters.
- Skipping indexes: Store metadata (e.g., min/max values) to eliminate unnecessary reads.
These techniques help skip irrelevant rows and improve overall query efficiency.
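All three mechanisms can be sketched in one hypothetical schema:

```sql
CREATE TABLE hits
(
    event_date Date,
    user_id    UInt64,
    url        String,
    -- skipping index: min/max of user_id stored per block of granules
    INDEX idx_user user_id TYPE minmax GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);  -- primary key enables binary search

-- Projection: an alternate layout sorted by url for URL-filtered queries
ALTER TABLE hits ADD PROJECTION by_url
(
    SELECT * ORDER BY url
);
ALTER TABLE hits MATERIALIZE PROJECTION by_url;
```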
Storage Layer: Data Compression
ClickHouse supports optional data compression using a variety of codecs.
Columnar storage naturally groups similar data types, resulting in high compression efficiency. Users can select from:
- General-purpose codecs (e.g., ZSTD)
- Specialized codecs (e.g., Delta, Gorilla, AES)
Compression not only reduces storage requirements but also improves query performance by minimizing disk and network I/O.
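Codecs are chosen per column in the table definition. A sketch with a hypothetical schema:

```sql
-- Delta suits monotonically increasing values such as timestamps;
-- Gorilla targets slowly changing floats; ZSTD is a strong default
CREATE TABLE readings
(
    ts     DateTime CODEC(Delta, ZSTD),
    sensor UInt32   CODEC(ZSTD),
    value  Float64  CODEC(Gorilla)
)
ENGINE = MergeTree
ORDER BY ts;
```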
When compared with ORC and Parquet, ClickHouse's native compression delivers competitive efficiency and performance.
Compression Comparison Summary
| Feature | ClickHouse | ORC | Parquet |
|---|---|---|---|
| Compression Efficiency | High (with ZSTD, LZ4, Delta, Gorilla, etc.) | Very High (adaptive codecs, dictionary, RLE) | High (with ZSTD, Snappy, Delta) |
| Custom Codec Support | ✅ Yes (column-level, specialized codecs) | ❌ Not user-configurable per column | ⚠️ Limited (ZSTD, Delta, Brotli per column) |
| Codec Examples | ZSTD, LZ4, Gorilla, FPC, Delta, AES, GCD | ZLIB, ZSTD, LZ4, RLE, Dictionary | Snappy, ZSTD, Brotli, Delta, Dictionary |
| Targeted Data Types | Strong for float/int/low-cardinality columns | Optimized for structured, typed tabular data | General-purpose analytics |
| Performance on Read | Very fast due to skipping + compression | Good, but slower than ClickHouse in most cases | Good, sometimes faster than ORC on simple reads |
| Write Performance | Extremely fast | Slower (due to indexing and metadata) | Moderate |
| Encryption Codec | ✅ AES available as a codec | ❌ (handled outside ORC, e.g. Hadoop layer) | ❌ (same) |
Low-Level Optimizations in ClickHouse
ClickHouse achieves high performance through deep, query-specific optimizations. Instead of relying on general-purpose data structures, it uses over 30 specialized hash table variants and dynamically selects the most efficient sorting algorithm based on data type, size, and query logic. This fine-tuned approach significantly enhances query speed and efficiency.
Get Started with ClickHouse
You can refer to the ClickHouse Starter Guides for quick examples and usage of common SQL operations such as CREATE, INSERT, SELECT, UPDATE, and DELETE.