Write Smarter Queries

Amazon Redshift is a distributed, shared-nothing database that scales horizontally across multiple nodes. Query execution time is very tightly correlated with:

the # of rows and data a query processes.
the amount of data moving between nodes.

Below is an example of a poorly written query, and two optimizations to make it run faster.

with

table1_cte as

(

select * from table1

table2_cte as

(

select * from table2

select

from table1_cte as a

JOIN table2_cte as b

ON a.id = b.id

Optimization #1: Limit Rows Processed by using a WHERE clause

Queries can run faster by minimizing the amount of data moving between nodes. In practice, this means being careful when writing multi-stage queries where the results of one stage feeds into the next.

In the case of our example query, modifying your ‘WHERE’ clauses to only select rows needed will minimize the amount of data that needs to be moved around and speed up the query.

with

table1_cte as

(

select * from table1 where created_at>'{{l_bound}}' and created_at <'{{u_bound}}'

table2_cte as

(

select * from table1 where created_at >'{{l_bound}}' and created_at <'{{u_bound}}'

select

from table1_cte as a

JOIN table2_cte as b

ON a.id = b.id

Optimization #2: Limit Columns Scanned

Amazon Redshift is a columnar-oriented database. As a result, scanning a table doesn’t read each row in its entirety. Instead, individual columns can be scanned without needing to read other columns. This means that you should be careful to only select columns that you will use for your query. Try avoiding using a

‘SELECT *’

operation in all cases.

with

table1_cte as

(

select id,name,address from table1 where start_time>'{{l_bound}}' and start_time<'{{u_bound}}'

table2_cte as

(

select id,name,address from table1 where start_time>'{{l_bound}}' and start_time<'{{u_bound}}'

select

a.name,b.address

from table1_cte as a

JOIN table2_cte as b

ON a.id = b.id

Using these two optimizations when writing queries will have dramatic positive effects on your query speeds.

Big Data

Write Smarter Queries

How Integrate.io Pricing Works

Mastering SQL Queries in Excel

ETL Developer vs Data Engineer: Key Differences

Solutions

Support

Company

Language

Write Smarter Queries

Related Readings

How Integrate.io Pricing Works

Mastering SQL Queries in Excel

ETL Developer vs Data Engineer: Key Differences

Subscribe To The Stack Newsletter

Solutions

Support

Company

Language

Subscribe To
The Stack Newsletter