Archive for May 10th, 2012

SQL Server 2012: ColumnStore Characteristics May 10th, 2012

Vinod Kumar

I was generally exploring for content to explain the nuances of using ColumnStore Index with SQL Server 2012. And interestingly during a conversation with Pinal, he had already written a number of blog posts on the same subject – fundamentals, ignoring columnstore, video demo of performance. Seeing this I thought maybe I want to write something different on the same subject and leave the basic learning to those posts as they make a good read.

Columnstore indexes group and store data for each column and then join all the columns to complete the whole index. These differ from traditional indexes that group and store data for each row and then join all the rows to complete the whole index. For some types of queries, the SQL Server query processor can take advantage of the columnstore layout to significantly improve query execution times. Columnstore indexes in SQL Server have key characteristics that differentiate it from a traditional row-based store. Let me list a few here, these are simple and good to know information:

  1. Just like the explanation I did above, unlike the traditional row-based organization of data (called rowstore format), in columnar database systems such as SQL Server with columnstore indexes, data is grouped and stored one column at a time. SQL Server query processing can take advantage of the new data layout and significantly improve query execution time. In this context you will hear the term xVelocity to describe the columnstore indexing engine.
  2. There is no concept of key columns in a columnstore index, so the limitation on the number of key columns in an index of 16 does not apply to columnstore indexes. But it cannot have more than 1024 columns.
  3. The index cannot be clustered index for columnstore. If a base table is a clustered index, all columns in the clustering key must be present in the non-clustered columnstore index. If a column in the clustering key is not listed in the create index statement, it will be added to the columnstore index automatically.
  4. It can neither be a primary key or foreign key and cannot have the INCLUDE keyword for adding columns.
  5. Columnstore indexes work with table partitioning. No change to the table partitioning syntax is required. A columnstore index on a partitioned table must be partition-aligned with the base table. Therefore, a non-clustered columnstore index can only be created on a partitioned table if the partitioning column is one of the columns in the columnstore index.
  6. The index key record size limitation of 900 bytes also does not apply to columnstore indexes. It cannot be unique Index and cannot include sparse columns. You cannot also include the ASC or DESC keyword as part of the index.
  7. Along with the columnstore index, SQL Server introduces batch processing to take advantage of the columnar orientation of the data. This can be seen in the Logical step of a Execution Plan called “Batch”.
  8. Only the columns needed must be read. Therefore, less data is read from disk to memory and later moved from memory to processor cache.
  9. Most queries do not touch all columns of the table to satisfy a query. Therefore, many columns will never be brought into memory. This, combined with excellent compression, improves buffer pool usage, which reduces total I/O.
  10. In this version of SQL Server 2012, the ColumnStore Index makes the table into ReadOnly mode. And it cannot be combined with Compression, replication, Change tracking, Change data capture and Filestreams.
    For Updates, there are always workarounds as discussed on MSDN.

DMV’s to keep track

sys.column_store_dictionaries: Contains a row for each column in a ColumnStore index.

sys.column_store_index_stats: Contains a row for each ColumnStore index.

Update: Seems like the this DMV will get deprecated, Pinal wrote about this.

sys.column_store_segments: Contains a row for each column in a ColumnStore index.

INDEXPROPERTY – Adds a property IsColumnstore to identify xVelocity columnstore.

This feature is strongly geared towards DW workloads and I had an opportunity to meet customers with TB’s of data to process in short time. And this feature was something they just loved using even given the restrictions. Some of the restrictions may be eliminated in future versions but this xVelocity concept is quite powerful to be ignored. If you do use this, drop a line !!!

Continue reading...