The AWS team announced the release of an SSD version of Amazon Redshift on January 24, 2014 (http://aws.typepad.com/aws/2014/01/faster-cost-effective-ssd-storage-options-for-amazon-redshift.html). To put the newly released SSD version to the test, we compared its performance against the traditional HDD version of Amazon Redshift.
The new SSD instance is named DW2 in AWS's product lineup, whereas the original HDD instance is named DW1. We have summarized our benchmarks and notes comparing the two in our slide deck.
Through our benchmarking, we have found that DW2 is indeed much faster than the traditional DW1, anywhere from 4x to 8x at comparable pricing points. The query speeds on DW2 can also increase as you add more DW2 nodes to your Redshift cluster.
Another thing to note: on DW1, complex queries[1] against more than a TB of data take a few minutes to run, even when you increase the cluster size. Given the nature of these queries, it was hard to imagine them finishing in under 30 seconds. On DW2, however, simple queries against the entire TB of data took less than a second, and complex queries returned results in less than 10 seconds. This is game changing, as it allows Redshift to be used in situations where it was not practical before. For example, if you can set up your backend architecture to work with just a few concurrent connections to Redshift, you can start using Redshift directly as a backend database that queries TBs of data. We can also expect Redshift-compatible BI tools such as Tableau to be snappier with the faster Redshift performance. The loading time is quite impressive as well. Of course, adding nodes to the cluster will speed up loading.
Even so, the reduction in load time was notable. Loading 1.2 TB of data into a 12 x dw2.large cluster took 1 hour and 36 minutes. With DW1 (HDD), the load took 2 hours and 37 minutes, even though we used a 2 x dw1.8xlarge cluster, which is equivalent to 16 dw1.xlarge nodes. That is a comparison between a cluster that costs $3/hour ($0.25/hour * 12) and one that costs $13.60/hour ($6.80/hour * 2). On both query speed and loading time, Amazon Redshift SSD (DW2) is significantly faster.
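To make the comparison above concrete, the hourly prices can be turned into the cost of the load itself. The following is a minimal sketch in Python that uses only the node prices, node counts, and load times quoted in this post. The function name `load_cost` is ours, and the sketch assumes billing proportional to elapsed time, which is a simplification.

```python
def load_cost(price_per_node_hour, nodes, hours, minutes):
    """Return (cluster $/hour, $ spent while the load ran).

    Assumes pro-rata billing for the elapsed time (a simplification).
    """
    cluster_rate = price_per_node_hour * nodes
    duration_hours = hours + minutes / 60
    return cluster_rate, cluster_rate * duration_hours


# 12 x dw2.large at $0.25/hour each, load took 1 h 36 min
dw2_rate, dw2_cost = load_cost(0.25, 12, 1, 36)
# 2 x dw1.8xlarge at $6.80/hour each, load took 2 h 37 min
dw1_rate, dw1_cost = load_cost(6.80, 2, 2, 37)

print(f"DW2: ${dw2_rate:.2f}/hour, load cost ${dw2_cost:.2f}")
print(f"DW1: ${dw1_rate:.2f}/hour, load cost ${dw1_cost:.2f}")
print(f"DW1 load cost is {dw1_cost / dw2_cost:.1f}x the DW2 load cost")
```

Under these assumptions the 1.2 TB load comes to roughly $4.80 of cluster time on DW2 versus roughly $35.59 on DW1, so the gap in load cost is wider than the gap in load time alone.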
Given this query speed, we can easily see Amazon Redshift being used for a wider variety of use cases where query speed is critical, such as ad-delivery optimization and financial trading systems. Even though the total price may come out high, there are plenty of benefits to be gained from using this offering.
The only drawback, as mentioned above, is the higher price tag for DW2. Because DW2 is optimized for query speed, its price per TB comes in at about 4x that of DW1. That being said, customers who have less than 500 GB of data and have been using Amazon Redshift HDD (DW1) should definitely take a look at DW2, as it comes out cheaper at that data size.
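The price-per-TB figure and the sub-500 GB case can be checked with some quick arithmetic. The sketch below is illustrative only. The dw2.large and dw1.8xlarge prices come from this post, but the per-node storage sizes and the dw1.xlarge price are assumptions on our part (recalled from AWS's pricing at the time, not stated above) and should be verified. It also compares raw storage only and ignores compression and minimum cluster sizes.

```python
# Values marked ASSUMED are not stated in this post; verify against AWS pricing.
NODES = {
    "dw2.large":   {"usd_per_hour": 0.25, "storage_gb": 160},    # price from post; storage ASSUMED
    "dw1.8xlarge": {"usd_per_hour": 6.80, "storage_gb": 16000},  # price from post; storage ASSUMED
    "dw1.xlarge":  {"usd_per_hour": 0.85, "storage_gb": 2000},   # price and storage ASSUMED
}


def usd_per_tb_hour(node_type):
    """Hourly price per TB of raw storage for one node type."""
    spec = NODES[node_type]
    return spec["usd_per_hour"] / (spec["storage_gb"] / 1000)


def cluster_rate(node_type, data_gb):
    """Hourly price of the smallest cluster whose raw storage holds data_gb."""
    spec = NODES[node_type]
    nodes = max(1, -(-data_gb // spec["storage_gb"]))  # integer ceiling division
    return nodes * spec["usd_per_hour"]


ratio = usd_per_tb_hour("dw2.large") / usd_per_tb_hour("dw1.8xlarge")
print(f"DW2 costs about {ratio:.1f}x as much per TB as DW1")

small_dw2 = cluster_rate("dw2.large", 480)   # 480 GB fits on 3 x dw2.large
small_dw1 = cluster_rate("dw1.xlarge", 480)  # 480 GB fits on 1 x dw1.xlarge
print(f"480 GB: DW2 ${small_dw2:.2f}/hour vs DW1 ${small_dw1:.2f}/hour")
```

With these assumed figures the per-TB ratio lands a little under 4x, and a 480 GB dataset fits on a DW2 cluster that is cheaper per hour than the smallest DW1 node, which is consistent with the claims above.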
All in all, the SSD version of Amazon Redshift is a higher-end version with ultra-fast querying time.
[1] By “complex queries” we mean queries such as the one used in our benchmarks, which joins 4 tables and performs calculations as well.
If you want to perform fast queries on your real-time data, take a look at Integrate.io. We provide continuous loading and ETL solutions for Amazon Redshift.