Torn Page Vs CHECKSUM April 23rd, 2010

Vinod Kumar

This also comes from the various questions that were asked after my session at TechEd India 2010. Confusions exists what to turn ON. My suggestions from SQL Server 2005+ will be to use the CHECKSUM ON.

A term torn page comes from the idea of a torn page in a book. A SQL Server database page is 8Kb in size. This page can be thought of in terms of 512 byte units called sectors, which is the lowest unit of I/O for disk drive. So a SQL Server page is 16 sectors. Since the disk will actually write the 8Kb SQL Server page in 16 sectors, it is conceivable that during some type of power failure (without backupsSmile), one of the 16 sectors may not get persisted to the disk media. Think of a page in a book where part of the page is torn. Same idea.

Does this mean SQL Server cannot handle power failures? Absolutely not. Remember that SQL Server simply uses the Windows API to write a page to disk. It is up the disk system to ensure the write is successful when the API returns success back to SQL Server. But what happen is SQL Server initiated the Windows API call and a system power failure occurs while the API call is being executed? If the computer or disk system has some type of alternate power source, then a problem probably will not occur (of course these alternate sources could have a problem as well). But it is always possible for the disk driver to lose power. Since it writes data in 512 byte sectors, it may or may not have written all 16 sectors for a page before the power failure occurs. SQL Server deploys a technique to detect whether a page has been torn when the page is read from disk. This technique is enabled for a database using the ALTER DATABASE option TORN_PAGE_DETECTION. This option is ON by default for SQL Server 2000 databases (but not for 7.0 databases). There are numerous occasions I have seen people get into torn-page-corruption and have no clue … Anyways, that is a different discussion. Moving to the concept of CHECKSUM from SQL Server 2005+ …

Checksum stands for the concept of calculating a unique value based on the bytes stored on a page before the page is written to disk. The concept is that when the page is read back from disk, the checksum can be recalculated (minus the checksum already on the page) and then checked against the checksum stored on the page. If the values are not the same, the page must have changed after it was sent to the disk system or the disk system did not return the right data.

The thinking behind checksum is that the underlying OS or disk system must be at fault for a damaged page since the checksum value was correct before writing it to disk. The only possible window for a problem within the application is the time it takes to calculate the checksum and sending the buffer to an API call or a flaw in the checksum check algorithm. Checksum will become a very important part of root cause analysis for data corruption with SQL Server 2005+. Understanding IO errors esp 823, 824 errors can easily be worked out if we enabled CHECKSUM.

There are backup enhancements that get added as part of CHECKSUM from SQL Server 2005 onwards and I will discuss the same on a later post.

Tags: , , , ,

This entry was posted on Friday, April 23rd, 2010 at 06:49 and is filed under Uncategorized. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

4 Responses to “Torn Page Vs CHECKSUM”

  1. Shashikant Shakya says:

    Really good post, I have copied this and saved it my notes. :)

  2. Terrific work! This is the type of information that should be shared around the web. Shame on the search engines for not positioning this post higher!

  3. MarkSpizer says:

    great post as usual!

Leave a Reply