Use Column Encoding

Dec 06, 2021

Adding compression to large, uncompressed columns can significantly improve cluster performance. Compression accomplishes two things:

Reduce storage utilization. Because compression shrinks the on-disk footprint of your data, you’ll use less disk space on your cluster nodes.
Improve query performance. Because there is less data to scan or join on, queries perform less I/O and run faster.

We recommend using the Zstandard (ZSTD) encoding algorithm. This relatively new algorithm provides a high compression ratio and works across all Amazon Redshift data types. ZSTD is especially good with VARCHAR and CHAR fields that have a mixture of long and short strings. Also, unlike some of the other algorithms, ZSTD is unlikely to increase storage utilization.
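As a rough illustration of how ZSTD might be applied to existing columns, the Python sketch below generates one ALTER TABLE ... ALTER COLUMN ... ENCODE zstd statement per column. The helper name zstd_encode_statements and the table and column names (app_logs, message, user_agent) are hypothetical placeholders, not taken from any real schema. The sketch only builds SQL text; it does not connect to a cluster, and it assumes plain unquoted identifiers.

```python
import re

# Assumption: only plain, unquoted identifiers (letters, digits, underscores).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def zstd_encode_statements(table, columns):
    """Build one ALTER TABLE statement per column that sets ZSTD encoding."""
    # Reject anything that is not a simple identifier, so stray input
    # cannot be spliced into the generated SQL.
    for name in (table, *columns):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} ENCODE zstd;"
        for column in columns
    ]


if __name__ == "__main__":
    # "app_logs", "message", and "user_agent" are placeholder names.
    for statement in zstd_encode_statements("app_logs", ["message", "user_agent"]):
        print(statement)
```

Review the generated statements before running them, and check the Amazon Redshift documentation for any restrictions on changing a column's encoding in place. Running ANALYZE COMPRESSION on a table first is one way to see which encodings Redshift itself would suggest for each column.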

Below is a real-world example of applying ZSTD to three Amazon Redshift logging tables. The average storage reduction is over 50%!