Amazon Redshift launched with disruptive pricing. To compare the cost, we’re looking at the price for storing 1TB of data for one year ($ / TB / Year). With a 3-year commitment for the ds2.8xlarge nodes, the price comes down to $934 / TB / Year. That price point is unheard of in the data warehousing world.
The average Amazon Redshift customers double their data every year. In fact, that is one of the reasons why it’s important to focus on performance improvements – since managing performance becomes a bigger challenge as data volume grows.
At some point, the cost of storing all this data in Amazon Redshift become prohibitive. So keeping a multi-year history of data “forever” can become expensive. Deleting data may not be an option, e.g. for regulatory reasons or multi-year comparisons.
The issue here is that Amazon Redshift prices based on the size of your cluster, i.e. compute and storage are coupled. You’ll have to keep adding nodes for storage, even though you may not need the additional computing power of the additional vCPUs.
Because pricing is so cheap, a common initial behavior is to store all historical raw data in Redshift. But data volume is growing. You may also want to use the faster but more expensive dense compute nodes. Many companies don’t want to make a capital commitment beyond a 1-year term.
So keeping a multi-year history of data “forever” can become expensive. Deleting data may not be an option, e.g. for regulatory reasons or multi-year comparisons.
The issue here is that Amazon Redshift prices based on the size of your cluster, i.e. it couples compute and storage. You’ll have to keep adding nodes for storage, even though you may not need the additional computing power of the additional vCPUs.
Enter Amazon Redshift Spectrum. With Redshift Spectrum, you can leave data as-is in your S3 data lake, and query it via Amazon Redshift. You can de-couple compute from storage. This approach makes sense when you have data that doesn’t require frequent access. Leave your “hot” data in Amazon Redshift, and your “cold” data in S3.
The impact on cost can be substantial. The price for S3 Standard Storage is $281 / TB / Year. And so with Redshift Spectrum, you get the best of both worlds. We call it “data tiering”. You get to keep all your historical data, along with the performance of Amazon Redshift. With Redshift Spectrum you can benefit from the cost savings of using S3.
In “Amazon Redshift Spectrum: Diving into the Data Lake!”, we’re taking an even closer look at using Redshift as part of a data lake architecture, including the use of Amazon Athena and AWS Glue.