SQL Server 2012: ColumnStore Characteristics May 10th, 2012

Vinod Kumar

I was generally exploring for content to explain the nuances of using ColumnStore Index with SQL Server 2012. And interestingly during a conversation with Pinal, he had already written a number of blog posts on the same subject – fundamentals, ignoring columnstore, video demo of performance. Seeing this I thought maybe I want to write something different on the same subject and leave the basic learning to those posts as they make a good read.

Columnstore indexes group and store data for each column and then join all the columns to complete the whole index. These differ from traditional indexes that group and store data for each row and then join all the rows to complete the whole index. For some types of queries, the SQL Server query processor can take advantage of the columnstore layout to significantly improve query execution times. Columnstore indexes in SQL Server have key characteristics that differentiate it from a traditional row-based store. Let me list a few here, these are simple and good to know information:

  1. Just like the explanation I did above, unlike the traditional row-based organization of data (called rowstore format), in columnar database systems such as SQL Server with columnstore indexes, data is grouped and stored one column at a time. SQL Server query processing can take advantage of the new data layout and significantly improve query execution time. In this context you will hear the term xVelocity to describe the columnstore indexing engine.
  2. There is no concept of key columns in a columnstore index, so the limitation on the number of key columns in an index of 16 does not apply to columnstore indexes. But it cannot have more than 1024 columns.
  3. The index cannot be clustered index for columnstore. If a base table is a clustered index, all columns in the clustering key must be present in the non-clustered columnstore index. If a column in the clustering key is not listed in the create index statement, it will be added to the columnstore index automatically.
  4. It can neither be a primary key or foreign key and cannot have the INCLUDE keyword for adding columns.
  5. Columnstore indexes work with table partitioning. No change to the table partitioning syntax is required. A columnstore index on a partitioned table must be partition-aligned with the base table. Therefore, a non-clustered columnstore index can only be created on a partitioned table if the partitioning column is one of the columns in the columnstore index.
  6. The index key record size limitation of 900 bytes also does not apply to columnstore indexes. It cannot be unique Index and cannot include sparse columns. You cannot also include the ASC or DESC keyword as part of the index.
  7. Along with the columnstore index, SQL Server introduces batch processing to take advantage of the columnar orientation of the data. This can be seen in the Logical step of a Execution Plan called “Batch”.
  8. Only the columns needed must be read. Therefore, less data is read from disk to memory and later moved from memory to processor cache.
  9. Most queries do not touch all columns of the table to satisfy a query. Therefore, many columns will never be brought into memory. This, combined with excellent compression, improves buffer pool usage, which reduces total I/O.
  10. In this version of SQL Server 2012, the ColumnStore Index makes the table into ReadOnly mode. And it cannot be combined with Compression, replication, Change tracking, Change data capture and Filestreams.
    For Updates, there are always workarounds as discussed on MSDN.

DMV’s to keep track

sys.column_store_dictionaries: Contains a row for each column in a ColumnStore index.

sys.column_store_index_stats: Contains a row for each ColumnStore index.

Update: Seems like the this DMV will get deprecated, Pinal wrote about this.

sys.column_store_segments: Contains a row for each column in a ColumnStore index.

INDEXPROPERTY – Adds a property IsColumnstore to identify xVelocity columnstore.

This feature is strongly geared towards DW workloads and I had an opportunity to meet customers with TB’s of data to process in short time. And this feature was something they just loved using even given the restrictions. Some of the restrictions may be eliminated in future versions but this xVelocity concept is quite powerful to be ignored. If you do use this, drop a line !!!

Tags: , , , , , , , , , ,

This entry was posted on Thursday, May 10th, 2012 at 08:30 and is filed under Technology. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

13 Responses to “SQL Server 2012: ColumnStore Characteristics”

  1. […] SQL Server 2012: ColumnStore Characteristics by Vinod Kumar 48.312872 18.086038 Like this:LikeBe the first to like this. […]

  2. Jack says:

    Great article. My only suggestion would be

    1. If you can add some SAMPLE screen shots of each section with some DATA that will really explain the concept better

    2. Using few Sample SQL (in relevant sections) wont hurt either

  3. […] been writing on Columnstore Index for quite a while. Recently my friend Vinod Kumar wrote about  SQL Server 2012: ColumnStore Characteristics. A fantastic read on the subject if you have yet not caught up on that subject. After the blog post […]

  4. […] SQL Server 2012: ColumnStore Characteristics […]

  5. Govind says:

    Thanks Vinod for sharing internals and more importantly limitations/differences. This is a cool new arsenal for the sqlserver adopters today to attack the dw scale issues.

    It will be also prudent to share
    1. If creating an index over columns of differing cardinality what is preferred? Say for example index is over cities, names of people. I would guess city should come first. Or compression takes care of this automatically? Or should this be defined wrt to query pattern?

    2. Since columnar store is pretty old(monetdb has been there from 1993 onwards) /new technology(for sqlserver) –
    Bunch of paper/sites on columnar store for spare time to get deeper understanding – (cstore/monetdb are the original works)
    a. C-store -stonebraker paper – http://db.lcs.mit.edu/projects/cstore/vldb.pdf (this was later commercialized as vertica and sold to HP last year)
    b. Daniel Abadi’s god paper – http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf – whose work has influenced lot of vendors across the board. This paper analyzes how different/same are both stores.
    d. MonetDB – http://www.monetdb.org/Home/Features

    3. The growth of Columnar databases as of last year – http://www.dbms2.com/2011/06/20/columnar-dbms-vendor-customer-metrics/
    – there are lot of interesting facts hidden there in terms of customers/scale (petabyte)

    • Vinod Kumar says:

      Thanks Govindji for the additional notes. The DW workload is something to lookout for now with OLTP workloads are becoming a rare thing :) … Awesome resources. Thanks.

  6. Shashikant Shakya says:

    Nice article Vinod…

    But it’s really hard to get understand for me this as im not much in Indexes, It’s really good that you have included the links of Pinal’s blog.

    Vinod + Pinal = Complete packgae of information.

    Thanks to both of u…

  7. Pinal Dave says:

    Excellent blog post – I was waiting for something like this and thanks for the link!

Leave a Reply to Pinal Dave