How to choose the right data types in ClickHouse
When dealing with large tables and looking for the most efficient and performant queries, data types need to be chosen carefully.
Three basic rules:
- Don't use floats for integers
- Choose the right precision for numbers, and favor the lower precision
- Use
LowCardinality(String)
orFixedString
for text when possible
Let's see a basic example looking at storage and processed bytes for a simple query.
CREATE TABLE deleteme_wrong_type
(
`number` UInt64
)
ENGINE = MergeTree
PARTITION BY number % 10
ORDER BY number AS
SELECT number % 100
FROM numbers(10000000)
While ClickHouse does a great job compressing...
SELECT
total_rows,
formatReadableSize(total_bytes) AS bytes
FROM system.tables
WHERE name = 'deleteme_wrong_type'
FORMAT Vertical
Row 1:
──────
total_rows: 10000000
bytes: 397.03 KiB
...but, when it's time to run a query, you end up processing 80MB of data:
SELECT sum(number)
FROM deleteme_wrong_type
┌─sum(number)─┐
│ 49500000000 │
└─────────────┘
1 rows in set. Elapsed: 0.013 sec. Processed 10.00 million rows, 80.00 MB (767.57 million rows/s., 6.14 GB/s.)
In this case you don't need more than a UInt8
, so let's compare the difference using the right type for the job:
CREATE TABLE deleteme_right_type
(
`number` UInt8
)
ENGINE = MergeTree
PARTITION BY number % 10
ORDER BY number AS
SELECT number % 100
FROM numbers(10000000)
Storage is more or less the same...
SELECT
total_rows,
formatReadableSize(total_bytes) AS bytes
FROM system.tables
WHERE name = 'deleteme_right_type'
FORMAT Vertical
Row 1:
──────
total_rows: 1000000000
bytes: 110.89 KiB
...but, when querying, there's a huge difference in bytes processed, which is 8 times lower at 10MB:
SELECT sum(number)
FROM deleteme_right_type
Query id: 8df38fab-2251-4814-aa1f-9434ca942525
┌─sum(number)─┐
│ 495000000 │
└─────────────┘
1 rows in set. Elapsed: 0.005 sec. Processed 10.00 million rows, 10.00 MB (1.98 billion rows/s., 1.98 GB/s.)
Now imagine you run that query over and over and over... 💸
Choose your data types carefully!