Archive for November 12th, 2012

SQL Server: Transaction Logs and Recovery November 12th, 2012

Vinod Kumar

In my day-today customer interactions I meet a lot young, smart DBA’s who are still stepping into the ranks of making their applications robust. As part of their daily activities they are getting guided by some seniors who impose some of their views based on their past experience. Unfortunately, in this process the fundamentals are lost and as we build our career we need to understand the basics. One of the topics I explain almost every customer is about how Transaction Logs work, why they keep growing, how to make it cycle/reuse, what happens after backup etc. So I plan to write such posts in the future too, these are lists for reference making learning fun and easy. I am writing this post in the style of notes taken in a class, hope that will make reading easy and leaning faster :).

Transaction Log Basics

  1. Transaction log records changes made to the database and stores enough information to allow SQL Server to recover the database.
  2. Recovery is the process of reconciling the data files and the log.
  3. Any changes to the data that the log indicates have been committed must appear in the data files and any changes that are not marked as committed must not appear in the data files
  4. Log also stores information on how to roll back transactions if a ROLLBACK TRAN is received from the client or a deadlock generates an internal ROLLBACK.
  5. One or more transaction log files are associated with a database when the database is created.
  6. Log record is labeled with a Log Sequence Number (LSN) which is guaranteed to be unique.
  7. Log entries part of the same transaction are linked so that they can be located easily for undo or redo activities.
  8. SQL Server keeps track of the current position in the log by the LSN and when a page is changed the LSN corresponding to the log entry for that change is written into the header of the data page.
  9. Dirty pages can be written to the disk only when the LSN on the page is less than or equal to the LSN for the last record written to the log.
  10. The Buffer Manager guarantees that the transaction log will be written before changes to the database are written (write-ahead logging).
  11. Buffer Manager also guarantees that log pages are written in a specific order making it clear which log blocks must be processed after a system failure regardless of when the failure occurred.
  12. Log records for a transaction are written to disk before the commit transaction is sent to the client process. Actual changed data might not have been written out to data pages yet.
  13. Though log writes are asynchronous, at commit time thread must wait for the write to complete to the point of writing the commit record in the transaction log. Data page writes are completely asynchronous.
  14. Logging involves "demarcating" the begin and end of each transaction with information about changes made to the data in between. At the end, its marked with a commit record.
  15. An aborted transaction undoes the changes made to the original data and these changes are written to the log and marked as "compensation log records".
  16. There are two types of recovery (goal for both is to make sure log and data are in sync):
    1. Restart Recovery – runs every time SQL Server is started. Runs on each database (in parallel) and the SQL Server error log tells you how many transactions were rolled forward and back. This is also called as Crash Recovery.
    2. Restore Recovery – also known as Media Recovery is run by request when a restore operation is requested via the backup route.
  17. Both the above types have to deal with 2 important situations:
    1. When transactions are recorded as committed in the log but not written to the data file.
    2. When changes to the data file don’t correspond to committed transactions. This can happen when a implicit indirect or explicit checkpoint forced this case.

Phases of Recovery

Now that I have set some context to the basic units of transactional logs and why they are used. We need to also understand the stages of recovery process.

  1. Phase 1: Analysis Phase – Forward pass starting at the last checkpoint record in the transaction log. Determines and constructs a Dirty Page Table (DPT) consisting of pages that might have been dirty when SQL Server stopped. Active Transactions Table is also built consisting of uncommitted transactions
  2. Phase 2: Redo Phase – Returns the database to the state it was in at the time SQL Server was stopped. Starting point for this forward pass is the start of the oldest uncommitted transaction. The minimum LSN in the DPT is the first page that is expected to be redone. But all logged operations starting at the oldest open transaction need to be redone for necessary locks to be acquired
  3. Phase 3: Undo Phase – Uses the list of Active Transactions found in Phase 1 to roll each transaction back individually. Any transaction that was not committed is undone
  4. Fast Recovery Phase – available in Enterprise and Developer edition allows for a database to be online as soon as redo phase is finished

Page LSNs and Recovery

  1. Every database page has a LSN in the page header that reflects the location in the transaction log of the last log entry that modified a row on this page.
  2. Also keeps track of the LSN which was on the data page before the change recorded by the latest log record.
  3. During redo, LSN of each log record is compared to the LSN of the data page that the log entry modified.
    1. If it’s equal to the previous page LSN in the log record, the operation indicated in the log entry is redone.
    2. If it’s equal or higher than the actual LSN for the log record, REDO operation is skipped.
  4. A transaction log cannot be truncated prior to the point of the earliest transaction that is still open, no matter how many checkpoints have occurred.

Virtual Log Files

  1. A transaction log for any database is maintained as a set of virtual log files (VLFs) who size is determined internally by SQL Server by the total size of all the log files and the growth increment used when enlarging the log.
  2. Log file is grown in units of entire VLFs and be shrunk only to a VLF boundary:
    1. When a log file is first created it always has between 2 and 16 VLFs.
    2. If its 1 MB or less – log file size is divided by minimum VLF size 31 * 8KB to determine number of VLFs.
    3. If its between 1 and 64 MB – log is split into 4 VLFs.
    4. If greater than 64 MB but less than 1 GB 8 VLFs are created.
    5. If greater than 1 GB, 16 VLFs are created.

      The above rules of thumb can be different depending on the version of SQL Server you are working.

  3. 4 states for a VLF:
    1. Active – active portion of the log begins at the minimum LSN representing an active transaction. The active portion of the log ends at the last LSN written
    2. Recoverable – portion of the log preceding the oldest active transaction is needed only to maintain a sequence of log backups for restoring the database to a former state
    3. Reusable – If transaction log backups are not being maintained or if log was already backed up, VLFs before the oldest active transaction are not needed and can be reused
    4. Unused – One or more VLFs at the physical end of the log files might not have been used yet if not enough log activity has taken place. Or if earlier VLFs have been marked as reusable and then reused
  4. Automatic Truncation – understand when the Log backups are not existent:
    1. You have configured the database to truncate a log on a regular basis by setting the recovery model to SIMPLE.
    2. You have never taken a full database backup.
    3. All log records prior to the oldest active transaction are invalidated and all VLFs not containing any part of the active log are marked as reusable. This process doesn’t shrink the physical file it is used in a cyclic manner.
  5. Automatic Shrinkage – A database is truncated to allow for shrinking. If autotruncate and autoshrink option is set, the log is physically shrunk at regular intervals. This is something not recommended, though this option exists.

Database Recovery Models

As we learn about transaction logs, we cannot miss understanding the recovery models and the effects of the same. There are 3 modes of recovery models:

  1. FULL Recovery Model
    1. Provides the least risk of losing work in case of a damaged data file.
    2. All operations are fully logged – in addition to logging every row changed via DML, the entire row is written to the transaction log using BCP or BULK INSERT operation.
    3. Also fully logs CREATE INDEX operations.
    4. Use this if you need to recover to a Point-in-time from your backup.
  2. BULK_LOGGED Recovery Model
    1. Allows you to restore a database completely in case of media failure and gives the best performance and least log space usage for certain bulk operation.
    2. Minimal logging compared to FULL recovery model.
    3. Minimal logging consists of logging information that is required to roll back the transaction without supporting point-in-time recovery.
    4. Every data file in SQL has a special page called Bulk Changed Map (BCM) page also called Minimally Logged Map (ML Map) page. Each bit on a BCM page represents an extent and if the bit is 1 it means that this extent has been changed by a minimally logged bulk operation since the last transaction log backup.
    5. BCM map is located on page 7 of a 4GB span and all the bits are reset to 0 every time a log backup occurs.
  3. SIMPLE Recovery Model
    1. Offers the simplest backup and restore strategy.
    2. Transaction log is truncated whenever a checkpoint occurs (which happens regularly).
    3. Only backups that can be made are ones that don’t require log backups.

Choosing the right recovery model is very important for functioning of your databases.

Backing Up and Restoring a Database

SQL Server has four main types of backups:

  1. Full Backup – Copies all pages from a database onto a backup device.
  2. Differential Backup – Copies only the extents that were changed since the last full backup was made. Done by examining the bits on the DCM pages for each data file in the database. When any page in an extent is changed, the corresponding bit in the DCM page is set to 1
  3. Log Backup – Copies all the log records that have been written to the transaction log since the last full or log backup was made
  4. File and Filegroup Backup – these backups are intended to increase flexibility in scheduling and media handling compared to full backups, in particular for very large databases. Useful when database contains varying update characteristics or priority levels

Thanks for reading this far. Hope the basics are clear now. Given these pointers, feel free to explore more into MSDN documentation.

Continue reading...