

You can create a compound sort key, sorting first by date, then customer, product, and finally geography. Let’s say you have a table with 100,000 1 MB blocks per column, and you often filter on one or more of four columns (date, customer, product, geography). Interleaved sorts are most effective with highly selective queries that filter on multiple columns. A table with interleaved keys arranges your data so each sort key column has equal importance.

With eight columns, you can have up to 40K possible projections.įor fast filter queries without the need for indices or projections, Amazon Redshift now supports Interleaved Sort Keys, which will be deployed in every region over the next seven days. What’s more, the number of possible projections increases exponentially with the number of columns. For example, with 3 projections, it may take 2-3 times longer to load your data. You have to maintain not only data you’re loading, but also indices and projections on top of that data. However, both indices and projections can require significant overhead. Instead, columnar systems often create projections, which are copies of the underlying table sorted in different ways. Indices are less selective for columnar databases with larger block sizes. For example, if a table has five years of data sorted by date, 98% of the blocks can be eliminated when you query for a particular month.īut what if you need to filter by a column that is not in the sort key? Row-based systems deal with this by creating secondary indices to reduce the number of blocks accessed when filtering on a non-primary key column. When you sort a table, we use zone maps (cached in memory) to skip 1 MB blocks without relevant data. One way Amazon Redshift improves I/O performance is by using sorted tables to restrict the range of values that need to be scanned. Reducing I/O is critical to improving data warehouse query performance. Many of our customers, including Pinterest, Euclid, Accordant Media, Asana and Coursera, have seen significant performance improvements with Amazon Redshift. Amazon Redshift uses a massively parallel processing, scale-out architecture to ensure compute resources grow with your data set size, and columnar, direct-attached storage to dramatically reduce I/O time.
Sort key in redshift for free#
You can get started with a 160GB cluster for free for 2 months, and scale to a couple of petabytes for less than $1,000/TB/Year. My colleague Tina Adams sent a guest post to introduce another cool and powerful feature for Amazon Redshift.Īmazon Redshift, our fully managed data warehouse service, makes it fast and easy to analyze all your data.
