SQL Server: Storage Engine Basics

In my day job I need to guide customers through specific deployment scenarios on SQL Server. Often I end up talking about storage, the transaction engine, and deployment options such as SAN and DAS. This conversation can get really complex, and many times we need to educate customers on choosing one over the other. It is a tough call because each scenario has specific features to discuss, the customer has a budget constraint, and the environment SQL Server runs in involves multiple teams. SQL developers, SQL DBAs, network admins, performance testing teams and system admins can all be different functions inside a given organization.

Storage is a typical topic and sometimes a tough one, given the number of parameters to work with. Let me give some of the high-level considerations on this point.

  1. When we are working with databases, a special eye needs to be kept on the input/output (I/O) load of the application. The I/O characteristics are a function of business requirements and the components enabled for SQL Server. Typical questions to ask about your I/O characteristics are: 
    1. What is the read-versus-write ratio for the database?
    2. Have performance counters been monitored for the typical I/O volume (I/O per second) on the system over a window of usage? 
    3. Based on the usage patterns, can we quantify how much of the I/O is sequential and how much is random?
  2. Has the business called out the SLA and RTO requirements to narrow down the availability and performance needs for the database system?
  3. Based on 1 and 2, has capacity planning been done to determine the hardware required to support the business? 
  4. Has SQL Server been configured to make the best use of the hardware in step 3?
  5. Is proper monitoring in place to track the performance of the database as the workload changes over a period of time?
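To put numbers behind question 1, one common approach (a sketch; `sys.dm_io_virtual_file_stats` exists in SQL Server 2005 and later) is to look at cumulative file-level I/O statistics. The counters accumulate since instance startup, so sample them twice over a known interval to derive I/O per second:

```sql
-- Cumulative read/write counts per database file since instance startup.
-- Sample twice over a known interval to derive I/O per second and the
-- read-versus-write ratio for each database.
SELECT
    DB_NAME(vfs.database_id)   AS database_name,
    mf.name                    AS logical_file_name,
    vfs.num_of_reads,
    vfs.num_of_writes,
    vfs.num_of_bytes_read,
    vfs.num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id
   AND vfs.file_id     = mf.file_id
ORDER BY (vfs.num_of_reads + vfs.num_of_writes) DESC;
```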

The whitepapers on database architecture and storage are long and beyond the scope of this blog. See Microsoft SQL Server Storage Top 10 Best Practices and Physical Database Storage Design for more detailed recommendations from the SQL Server team.

Since this continues the other basics topics, a lot has been omitted here because it is covered elsewhere. Also make sure to read SQL Server: Database Page Basics, SQL Server: Database Basics, SQL Server: Locking basics, SQL Server: TempDB Basics and SQL Server: Transaction Logs and Recovery.

SQL Server – Access Methods

  1. SQL Server doesn’t retrieve data directly from disk; it makes requests to the buffer manager, which serves up the page from cache.
  2. When a scan starts, SQL Server can use look-ahead mechanisms to qualify rows and index entries on a page. Retrieving rows that meet specified criteria is known as a qualified retrieval.
  3. Basics of Row and Index Operations:
    1. Each component maintains its respective on-disk data structures – rows of data or B-Tree indexes.
    2. Understand and manipulate information on data and index pages
    3. Row operation code retrieves, modifies and performs operations on individual rows
    4. Special operations are needed to handle LOB data (Large Object)
    5. Index operation code maintains and supports searches on B-Tree which are used for SQL Server indexes.
  4. Basics of Page Allocation Operations:
    1. The allocation operations code manages a collection of pages for each database and keeps track of which pages in the db have already been used, for what purpose and how much space is available on each page.
    2. Each database is a collection of 8 KB pages spread across one or more physical files.
    3. 13 types of disk pages, specifics are explained in the Page basics blog:
      1. Data is stored on data or LOB pages
      2. Index rows are stored on index pages
      3. Page Free Space (PFS) pages keep track of which pages in a db are available to hold new data
      4. Global Allocation Map (GAM), Shared Global Allocation Map (SGAM), Index Allocation Map (IAM) keep track of other pages
      5. Bulk Changed Map (BCM) and Differential Changed Maps (DCM) are used to make backup and recovery more efficient
      6. Rest of the page types are used for logging and recovery
  5. Versioning Operations; also see SQL Server: TempDB Basics for a few more details on versioning:
    1. Manages access to the version store
    2. Row versioning allows SQL Server to maintain older versions of changed rows
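To see these page structures first-hand, the undocumented (but widely used) DBCC PAGE command can dump an individual page; the database name below is a placeholder, and output reaches the client only when trace flag 3604 is on:

```sql
-- DBCC PAGE is undocumented; enable trace flag 3604 so its output is
-- sent to the client session instead of the error log.
DBCC TRACEON(3604);

-- Arguments: database, file id, page id, print option (3 = full detail).
-- Page 9 of file 1 is the database boot page; 'MyDatabase' is a placeholder.
DBCC PAGE('MyDatabase', 1, 9, 3);
```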

Database Transaction Services

  1. Provides support for Atomicity, Consistency, Isolation and Durability (ACID)
  2. Write-ahead logging ensures that the record of each transaction’s changes is captured on disk in the transaction log before a transaction is acknowledged as committed.
  3. Log records are always written to disk before the data pages where changes were made are actually written.
  4. Writes to the transaction log are synchronous while writes to data pages can be asynchronous.
  5. Allows for transactions that cross databases within the same SQL Server instance.
  6. For cross-instance transactions, it coordinates with the Microsoft Distributed Transaction Coordinator (MS DTC). Use this with utmost caution and understand the overhead of the two-phase commit involved.
  7. Transaction services also coordinate locking to provide isolation.
  8. SQL Server supports two concurrency models
    1. Optimistic Concurrency – provides consistent data by keeping older versions of rows with committed values in the version store. Readers do not block writers and writers do not block readers, but writers do block writers.
    2. Pessimistic Concurrency – guarantees correctness and consistency by locking data so it cannot be changed
  9. SQL Server has five isolation levels:
    1. Read Uncommitted [pessimistic]
    2. Repeatable Read [pessimistic]
    3. Serializable [pessimistic]
    4. Snapshot [optimistic]
    5. Read Committed [optimistic/pessimistic - depends on database setting]
  10. Behavior of transactions depends on the isolation level and concurrency model one is working with
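As a quick illustration of switching between these models (the database name is a placeholder, and in practice the ALTER DATABASE statements need the database free of other active connections):

```sql
-- Pessimistic: serializable holds key-range locks until the transaction ends.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Optimistic: snapshot isolation must first be enabled per database.
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

-- This switches READ COMMITTED itself from locking (pessimistic) to
-- row versioning (optimistic) -- the database setting referenced above.
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;
```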

Database Locking Operations

  1. Acquires and releases various types of locks
    1. Shared locks – reading
    2. Exclusive locks – writing
    3. Intent locks – taken at a higher granularity to signal a potential "plan" to perform some operation
    4. Extent locks – for space allocation
  2. Manages compatibility between the lock types, resolves deadlocks and escalates locks.
  3. Controls table, page, and row locks as well as system data locks.
  4. Locking is how SQL Server balances consistency and concurrency, and it is SQL Server’s way of implementing the isolation levels discussed above. The post SQL Server: Locking basics discusses this in detail.
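A sketch of how to observe these lock types on a live system, using the `sys.dm_tran_locks` DMV:

```sql
-- Current lock requests, including the shared (S), exclusive (X),
-- and intent (IS/IX) modes described above.
SELECT
    resource_type,         -- OBJECT, PAGE, KEY, EXTENT, ...
    resource_database_id,
    request_mode,          -- S, X, IS, IX, U, ...
    request_status,        -- GRANT, WAIT, CONVERT
    request_session_id
FROM sys.dm_tran_locks
ORDER BY request_session_id, resource_type;
```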

Database File Sizes

  1. An ideal recommendation would be to have data files of equal size – SQL Server uses a proportional fill algorithm that favors allocations in files with more free space.
  2. Pre-size data and log files well ahead of time and plan for the growth.
  3. Do not rely on AUTOGROW, instead manage the growth of these files manually. You may leave AUTOGROW ON for safety reasons, but you should proactively manage the growth of the data files.
  4. Configure an alerting mechanism to send emails or log information about available free space.
  5. Also have an alerting mechanism to watch for AUTOGROW events happening in the system, for both application databases and TempDB.
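A minimal sketch of pre-sizing, assuming a database named MyDatabase with the logical file names shown (both names and sizes are placeholders; adjust them to your capacity plan). Fixed-size growth increments keep AUTOGROW as a safety net rather than the sizing strategy:

```sql
-- Pre-size the data and log files and set fixed-size growth increments.
-- MyDatabase and the logical file names below are placeholders.
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_Data, SIZE = 50GB, FILEGROWTH = 1GB);

ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_Log,  SIZE = 10GB, FILEGROWTH = 512MB);
```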

Disk Topology and SAN interfaces

Small Computer System Interface (SCSI)

  1. Supports forcing data to be written to disk, improving recoverability.
  2. SCSI with Tagged Command Queuing (TCQ) supports multiple I/O requests.
  3. Supports hot-swapping.
  4. SCSI can have up to 15 drives per channel.
  5. Less restrictive on physical cable length.
  6. Overloading the channels increases the chance of reaching the transfer rate limit.

Integrated Device Electronics (IDE)

  1. Supports hot-swapping.
  2. IDE has high transfer rates only if there is one drive attached per channel.
  3. Typically greater capacity than SCSI.
  4. Typically cheaper per GB than SCSI drives.
  5. Can only handle one outstanding I/O request per channel.

Serial Advanced Technology Attachment (SATA)

  1. SATA with Native Command Queuing (NCQ) supports multiple I/O requests.
  2. Supports hot-swapping.
  3. Most are explicitly designed to support only one drive per channel; however, interface cards with multiple SATA channels (2 to 12+) are also available.
  4. Typically greater capacity than SCSI.
  5. Typically cheaper per GB than SCSI drives.

Serial-attached SCSI (SAS)

  1. Very fast. Supports SCSI protocol.
  2. Allows for a larger number of disks than SCSI.
  3. Applicable to direct-attached storage (DAS) only.
  4. Replacement technology for parallel SCSI. Backward compatible with SATA drives.

Storage Area Network (SAN)

  1. Can serve multiple servers.
  2. Far fewer practical limitations on the number of disks that can be accessed.
  3. Easier to install additional servers. Easier to manage many servers.
  4. Easier to reallocate disk storage between servers.
  5. Maintenance costs tend to be lower than DAS.

Direct-Attached Storage (DAS)

  1. Greater maximum bandwidth.
  2. Easier to manage for a smaller number of servers.
  3. Initial overhead costs are lower than SAN.
  4. Deployed per server directly attached.
  5. The number of disks is limited by the number of slots in the server and the type of interface used.

For both DAS and SAN, the following categories of performance should be measured:

  1. I/O per second
  2. Megabytes per second
  3. Latency

Performance of both DAS and SAN environments is affected by so many variables that simple recommendations are not possible. Examples of variables include drivers, configuration, underlying and supporting foundational technologies and host bus adapters (HBAs).
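One way to approximate the latency category from inside SQL Server is to derive per-file averages from cumulative stall counters (a sketch; the counters accumulate since instance startup, so sample over an interval to see current behavior):

```sql
-- Average latency per read and per write for each database file,
-- derived from cumulative I/O stall and I/O counts.
SELECT
    DB_NAME(database_id) AS database_name,
    file_id,
    io_stall_read_ms  / NULLIF(num_of_reads, 0)  AS avg_read_latency_ms,
    io_stall_write_ms / NULLIF(num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
ORDER BY avg_write_latency_ms DESC;
```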

Fibre Channel switched fabric may be beneficial for SAN environments, because Fibre Channel can provide multiple links through the fabric and thereby enable I/O path parallelism, so the SAN can process I/O requests more efficiently.

Where to start?

Generally, when prioritizing data among faster disks for SQL Server databases, use the following ranking. These recommendations are generic and need evaluation on a case-by-case basis:

  1. Start with tempdb data files and the tempdb transaction log
  2. Database transaction log files
  3. Search database
  4. Database data files
  5. In a heavily read-oriented portal site, prioritize data over logs.
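For example, placing tempdb on the fastest available disks is done by pointing its files at new paths (the logical names tempdev/templog are the defaults; the drive letter and paths below are placeholders, and the change takes effect at the next instance restart):

```sql
-- Move tempdb to faster storage; new paths apply after the next restart.
-- 'F:\FastDisk\...' is a placeholder path.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'F:\FastDisk\tempdb.mdf');

ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'F:\FastDisk\templog.ldf');
```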

Don’t overlook HBA configuration during deployment

  1. Use up-to-date HBA drivers as recommended by the vendor.
  2. Use the storage-vendor-specific drivers from the HBA manufacturer’s website.
  3. Tune your HBA driver settings as needed by the databases and access patterns.
  4. Ensure that the storage array firmware is at the latest recommended level.
  5. Use multipathing software to achieve balancing across HBAs and LUNs, and ensure it is functioning properly.

Well, some of the conversations I have with customers go on for as long as a day on these topics, and I am sure this is not exhaustive. So please feel free to drop in comments on areas that I have missed here, for the benefit of others. Thanks again for reading this far!!!

Leave a Reply

  1. This is a very powerful article – it has everything one needs to work with SQL Server storage. Every word is an important one, which makes this article a ready reckoner on all days!

    Thanks for another very interesting post, Vinod!

  2. Another good one, and the DB engine basics concepts are well explained. Excellent, Vinod.

  3. Another Great post Vinod. Thanks!!!

    Seems writing THEBEST articles has become your habit.

    Under the “Database Transaction Services” section, you have mentioned the point below:

    4.Writes to the transaction log are synchronous while writes to data pages can be asynchronous.

    I would like to share with you that both Write to T-Log and Write to Data Pages are asynchronous.

    Request you to refer SQL Server 2008 Internals book (Ch 4, Page 182 Para 2).

    Waiting for your next THEBEST :-)

    • Ramkumar – Thanks for the mention and note. I personally have respect and envy for books like SQL Server Internals, and I am glad to see how deeply you have read this content.

      The statement is still correct if you read it carefully. Transaction log writes are a write-ahead concept, and we need to complete them before the request returns. If the log write were done asynchronously, then in case of a system crash this entry could *potentially* be lost. We cannot compromise on ACID properties, and SQL Server makes sure it happens that way.

      Data file writes are asynchronous because, if the records are in the log but not yet in the data files, after a crash the Redo and Undo phases of SQL Server’s recovery will make sure the data files are in a transactionally consistent state.

      Hope this clarifies and thanks again for your comments.