The VACUUM command is an important tool in the DBA tool belt. Supporting an active, live database can be a tricky task, especially when long-running queries and database updates become a bottleneck in your software. Often, incorporating a VACUUM in your workflow can be a pivotal and sustainable solution to your database woes.

Redshift Data Storage

At the core of a Redshift cluster is the disc on which it stores data. In fact, it stores data on several discs. If you're not familiar with how bits get stored on a disc, here's a quick (very high level) summary:

A computer disc consists of thousands of microscopic magnets looping around the center. Think of it as a bunch of domino tiles laying down next to each other in a bunch of concentric circles. A tile can either be face up or face down: 1 or 0. Each tile is a magnet that can either be on or off. When data gets stored/updated/deleted in a database, a bolt of electricity will beam its way through the magnets and flip them on or off.

For more information on Integrate.io's native Redshift connector, visit our Integration page.

When you delete a record from a database, the magnets representing that data will be "switched off". The kicker, and why the VACUUM command is so vital, is that those magnets that were just switched off aren't automatically going to be used the next time you write something to the database. This is because the database keeps a pointer of its "next magnet to flip", and that pointer is always positioned at the last, fresh magnet to get flipped over.

What VACUUM does, at a microscopic level, is:

  1. Identify all of those tiles which were "switched off" due to UPDATE or DELETE commands
  2. Reclaim that space by moving other data into it, thus consolidating all of the data. 
  3. Move the pointer back to the next available magnet to flip.

For deeper dive, see our comprehensive guide on Amazon Redshift.

What Does The VACUUM Command Do?

Redshift will automatically sort rows and run VACUUM DELETE in the background. This condenses the data footprint, making data more easily accessible and cheaper to store.

There are a few different types of VACUUM commands:

  • VACUUM FULL: This is the default VACUUM command. It sorts the specified table and reclaims any disc space cleared out by DELETE or UPDATE commands. One can specify a specific table to VACUUM, or it can do an entire database (this operation can take a long time).
  • VACUUM SORT ONLY: This type of VACUUM command only sorts the data on disc. This is a useful distinction if you do not have space constraints but want your VACUUM command to be as quick as possible.
  • VACUUM DELETE ONLY: This VACUUM command only reclaims space that changed as a result of a DELETE or UPDATE command. It is quicker than a VACUUM FULL command, but there is little benefit of this command other than that.
  • VACUUM REINDEX table_name: This is a more complex VACUUM SORT ONLY. Not only does it sort the records, but it also takes into consideration sort key columns. In other words, if a table has many columns with indexes, this type of VACUUM would take those indexes into account vs the VACUUM SORT ONLY which only sorts on a primary key.

When to Use The Vacuum Command

Since Redshift runs a VACUUM in the background, usage of VACUUM becomes quite nuanced. In other words, it becomes difficult to identify when this command will be useful and how to incorporate it into your workflow. Fear not, Integrate.io is here to help.

The biggest reason to incorporate VACUUM into your workflow is if your application has a lot of UPDATE and DELETE statements. Even though Redshift will run the VACUUM in the background periodically, it may not be frequent enough.

More importantly, if the frequent UPDATE and DELETE commands cause your data to by out of order, enforcing a VACUUM SORT will make data more accessible.

How to Use VACUUM

You can treat VACUUM like any other SQL command you run on your Redshift cluster.

If you wish to run VACUUM on every table in your database:

VACUUM;

If you wish to run VACUUM on a specific table:

VACUUM table_name;

If you want to run VACUUM DELETE ONLY on a specific table:

VACUUM DELETE ONLY table_name;

Similarly for SORT ONLY:

VACUUM SORT ONLY table_name;

And lastly, for VACUUM REINDEX:

VACUUM REINDEX table_name;

Redshift Vacuum and Integrate.io

Managing large software applications can be a daunting task, but you don't have to go it alone. Schedule a call with Integrate.io to learn more about our offerings and how managing your data can be one less thing to worry about.