Data Storage
A table consists of data parts sorted by primary key.
When data is inserted into a table, separate data parts are created, and each of them is lexicographically sorted by primary key. For example, if the primary key is (CounterID, Date), the data in the part is sorted by CounterID, and within each CounterID it is ordered by Date.
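The ordering described above can be illustrated with a minimal Python sketch. The rows, the hits column, and the sort_part helper are hypothetical and exist only for illustration; this is not how ClickHouse implements sorting internally.

```python
from datetime import date

# Hypothetical rows of (CounterID, Date, hits); the values are made up.
rows = [
    (2, date(2024, 1, 3), 10),
    (1, date(2024, 1, 2), 7),
    (2, date(2024, 1, 1), 4),
    (1, date(2024, 1, 1), 9),
]


def sort_part(rows):
    """Sort rows lexicographically by the (CounterID, Date) key."""
    # Tuples compare element by element, so CounterID is compared first
    # and Date only breaks ties within the same CounterID.
    return sorted(rows, key=lambda r: (r[0], r[1]))


part = sort_part(rows)
for row in part:
    print(row)
```

Rows with the same CounterID end up adjacent, and within each CounterID they appear in Date order.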
Data belonging to different partitions is separated into different parts. In the background, ClickHouse merges data parts for more efficient storage. Parts belonging to different partitions are not merged. The merge mechanism does not guarantee that all rows with the same primary key will be in the same data part.
Data parts can be stored in Wide or Compact format. In Wide format, each column is stored in a separate file in the filesystem; in Compact format, all columns are stored in one file. Compact format can be used to increase the performance of small and frequent inserts.
The data storage format is controlled by the min_bytes_for_wide_part and min_rows_for_wide_part settings of the table engine. If the number of bytes or rows in a data part is less than the corresponding setting's value, the part is stored in Compact format. Otherwise, it is stored in Wide format. If none of these settings is set, data parts are stored in Wide format.
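The selection rule above can be written as a small decision function. This is a sketch of the rule as stated in the text, not ClickHouse's actual code; the choose_part_format name and the use of None to represent an unset setting are assumptions made for illustration.

```python
def choose_part_format(part_bytes, part_rows,
                       min_bytes_for_wide_part=None,
                       min_rows_for_wide_part=None):
    """Return "Compact" or "Wide" for a part, following the rule in the text.

    None stands for "setting not set" (an assumption of this sketch).
    """
    # If neither setting is set, parts are stored in Wide format.
    if min_bytes_for_wide_part is None and min_rows_for_wide_part is None:
        return "Wide"
    # Falling below either configured threshold selects Compact.
    if min_bytes_for_wide_part is not None and part_bytes < min_bytes_for_wide_part:
        return "Compact"
    if min_rows_for_wide_part is not None and part_rows < min_rows_for_wide_part:
        return "Compact"
    return "Wide"


small = choose_part_format(1_000, 50, min_bytes_for_wide_part=10_000_000)
large = choose_part_format(50_000_000, 1_000_000, min_bytes_for_wide_part=10_000_000)
unset = choose_part_format(1_000, 50)
print(small, large, unset)
```

The threshold values in the usage lines are arbitrary examples, not defaults.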
Each data part is logically divided into granules. A granule is the smallest indivisible data set that ClickHouse reads when selecting data. ClickHouse does not split rows or values, so each granule always contains an integer number of rows. The first row of a granule is marked with the value of the primary key for the row. For each data part, ClickHouse creates an index file that stores the marks. For each column, whether it's in the primary key or not, ClickHouse also stores the same marks. These marks let you find data directly in column files.
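To show how marks narrow down what has to be read, here is a simplified Python sketch. It assumes integer keys, a fixed number of rows per granule, and hypothetical helper names (build_marks, granules_for_key); the real index file format and lookup logic in ClickHouse are more involved.

```python
from bisect import bisect_left, bisect_right


def build_marks(sorted_keys, granule_rows):
    """Take the key of the first row of every granule as its mark."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_rows)]


def granules_for_key(marks, key):
    """Return the indexes of granules that may contain `key`.

    Only marks are consulted, so the answer is conservative: a returned
    granule might turn out not to contain the key once it is read.
    """
    # Last granule whose mark is <= key.
    last = bisect_right(marks, key) - 1
    if last < 0:
        return range(0)  # key is smaller than every mark
    # The granule just before the first mark >= key may still end with key.
    first = max(bisect_left(marks, key) - 1, 0)
    return range(first, last + 1)


keys = [1, 1, 2, 2, 2, 3, 5, 5, 7, 9]   # already sorted, as in a data part
marks = build_marks(keys, granule_rows=3)
print(marks)
print(list(granules_for_key(marks, 2)))
```

Because only the first row of each granule is indexed, a lookup selects whole granules, and the rows inside them are then filtered after reading.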
The granule size is restricted by the index_granularity and index_granularity_bytes settings of the table engine. The number of rows in a granule lies in the [1, index_granularity] range, depending on the size of the rows. The size of a granule can exceed index_granularity_bytes if the size of a single row is greater than the value of the setting. In this case, the size of the granule equals the size of the row.
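The two limits can be illustrated with a greedy packing sketch in Python. The split_into_granules function and the greedy strategy are assumptions made for illustration; ClickHouse's actual granule sizing may be computed differently, but the sketch respects the constraints stated above.

```python
def split_into_granules(row_sizes, index_granularity, index_granularity_bytes):
    """Greedily pack rows (given by their byte sizes) into granules."""
    granules = []
    current = []
    current_bytes = 0
    for size in row_sizes:
        full_rows = len(current) >= index_granularity
        full_bytes = current_bytes + size > index_granularity_bytes
        # Close the current granule when either limit would be exceeded.
        # An empty granule always accepts a row, so an oversized row
        # forms a granule on its own.
        if current and (full_rows or full_bytes):
            granules.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        granules.append(current)
    return granules


granules = split_into_granules(
    [10, 10, 10, 50, 10, 10, 10, 10],
    index_granularity=3,
    index_granularity_bytes=40,
)
print(granules)
```

In this example the 50-byte row exceeds the 40-byte limit, so it occupies a granule by itself, while the remaining granules hold at most three rows each.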