Archive for January 10th, 2011

Sparse Columns – Need and Limitation January 10th, 2011

Vinod Kumar

Sparse columns are ordinary columns that have been designed for optimized storage of NULL values. Sparse columns are by nature expected to hold NULL values for most of the rows in the containing table. The benefit of sparse columns is that they reduce the space requirements for NULL values. However, this benefit comes at the cost of increased processing overhead to retrieve non-null values.

Sparse columns address the business/technical problem that arose from the increasing need to store large amount of semi-structured data. Such a need emerged with “Web 2.0” systems used for business collaboration, Wikis, Blogs, discussion forums, etc. Sparse column can be successfully applied in the following examples. Consider the variety of documents, photos, videos, GPS data, emails, contacts, etc. that exists on the Internet today. While many of these contents are stream-oriented, all of them contain a rich set of metadata properties, both standardized sets (e.g. Name, Size, Date) as well as custom-defined sets (e.g. LocationDescription, ViewableBy, Color, Event). Given the nature and variety of content, these property sets are critical for organizing, categorizing and searching these rich semi-structured objects. A large portion of these data is currently stored in file systems, but it seems logical to store such contents in relational databases taking advantage of the extensive querying, transactional and manageability facilities built into these systems. What is key to observe about the semi-structured data is that not all properties apply to all objects. For example Event may only be relevant to the video you captured on a trip to Paris, while Latitude is only relevant to GPS data. Therefore, a relational system would need to provide flexibility in the way such data is organized and managed.

There are some restrictions on what columns that can be marked as Sparse. The following datatypes cannot be marked as Sparse Columns –

  1. Geography
  2. Geometry
  3. Image
  4. Ntext
  5. Text
  6. Timestamp
  7. User-defined data types (UDT)

Columns that cannot be marked as sparse

  1. Computed columns (a computed column can however contains a sparse column)
  2. Identity
  3. RowGuid
  4. Filestream

Other restrictions:

  1. A sparse column cannot have a default value.
  2. A sparse column cannot be bound to a rule.
  3. A sparse column cannot be part of a clustered index or a unique primary key index. However, both persisted and nonpersisted computed columns that are defined on sparse columns can be part of a clustered key.
  4. A sparse column cannot be used as a partition key of a clustered index or heap. However, a sparse column can be used as the partition key of a nonclustered index.
  5. A sparse column cannot be part of a user-defined table type, which are used in table variables and table-valued parameters.

Sparse columns are beyond the limitations, the details on using the same will be something we will discuss for a different post. Hope these will be in your minds while using the same.

Continue reading...