Archive for November 8th, 2012

SQL DBA Checklist November 8th, 2012

Vinod Kumar

Long time back I did write about what activities constitute a DBA role and as I start meeting more DBA in my day job – I get asked, is there any checklist? If you really ask me, that article does call out most of the points. Here is just yet another list. Feel free to add more to this list –

As a DBA we work to keep the systems up and running for the business, hence there are business requirements we need to be aware off like below – know the answers to the below and ask your respective stakeholders before designing the system:

  1. Establish specific SLAs (service level agreements) with stakeholders
  2. What Availability do they want? (aka  find our your maintenance windows)
  3. RPO (recovery point objectives – how much you can afford to lose if DR / corruption occurs)
  4. RTO (recovery time objective – how long to bring database/system back up after DR/Corruption)
  5. Points of contact (owners) for the various databases. This is true in consolidation environments. People change roles or switch jobs – keep this uptodate always.
  6. Change control – how often, who has what rights to DB, what auditing is in place etc. – especially important for shared instances
  7. Response times & escalation procedures (how soon you will do changes if you do them for the customer, how soon you respond when problems arise – are you a 24×6, 24×7, 7×12, 5×10 shop,  do you have people on call, how do they reach on call person, etc.)
  8. Based on the responses, make your backup strategy based on this because the RTO, RPO will be something decided by this strategy.
  9. Know the immediate or near future plans of the organization to support newer versions. Plan on a upgrade or migration strategy before hand and know the nuances. Check my previous post on Upgrading to SQL Server 2012.
  10. Know your various contact points from an Hardware perspective – SAN vendor, Software vendors, Network switch vendors, Network Admins and the various access mechanisms to reach the server. Document them and keep them handy always.

Other lists include:

  1. Test your disaster recovery plan by a junior DBA to check if you have missed any steps.
  2. Pre-grow your databases to cover the projections for the next year
  3. Pre-size the tempDB to its regular maximum size.
  4. Check that security is still tight – people who have left the group/company no longer have logins, no new logins have snuck into highly privileged groups, etc.
  5. Check all your logs thoroughly for abnormality.
  6. If you’re not yet using Policy Based Management, double check that all your configuration settings are still correct.
  7. Take new performance baselines to track if the system is performing better compared to a baseline.
  8. Resync your test and development systems with production or production like data
  9. Implement some of those cool new techniques, functionality, tips, etc. you learned at your last conference. Always look at implementing new efficient methods of code.
  10. Ask your hardware admins to do low level hardware diagnostics
  11. Install MOM/SCOM agents (or at least some other monitoring / alerting tool).
  12. At the very least configure SQL (SMTP) Mail and Operator alerts for job failures.
  13. Ensure you have maintenance plans for re-indexing/defragging  and updating statistics.
  14. Configure your maintenance plans to pick up new databases automatically or write your own scripts to detect and configure new plans (in the process of writing some now).
  15. Add a job to recycle the error log every night to keep logs small.
  16. Keep an script handy for any non-compliance and keep running them time-to-time on the environment.
  17. Test your disaster recovery plan again!!

Additionally I might do things on a daily basis and quarterly basis like, some tasks done daily are:

  1. Monitor and Check for blocking spids
  2. Monitor and check for deadlocks
  3. Monitor and check status of SQLAgent jobs
  4. Monitor disk growth (especially processes that might be eating up transaction log-space).
  5. Monitor for rogue, ad-hoc queries that might be running against your production systems (and take appropriate, punitive action).
  6. Check if there are Login Errors or any suspicious activity on the logs.

Maybe quarterly I might do:

  1. Collect database and table growth history (at least once per week), then review growth rate every quarter.  Compare the actual growth rate against expected growth rate from the TECH spec  (DEV should always provide estimated growth rates in either the tech spec or deployment plan!)
  2. Create a simple SQL Profile trace that you can run as a SQLAgent job that collects samples of durations of the most critical stored procs and queries in the system.   Just like the growth rates above, compare the actual durations against expectations in the tech spec…   Also, look for concerning performance degradation over time.
  3. Collect blocking history and review trends quarterly
  4. Collect deadlock history and review trends quarterly
  5. Check who has access to the database regularly
  6. Keep looking at the organizations operational guides and see if there needs to be changes because of upgrades or if the behavior changed in the next releases.
  7. Check once in a while if the tapes and backups are uptodate and functional. This can be randomly done during the quarter.

I am sure I am missing some of the vital points you are doing in your environment. So please feel free to add it over comments and I will be more than happy to merge them.

Continue reading...