Chess was invented in 16th century, and is still the most popular board game currently, and in history. It has been played by millions of players, and continues to rise in popularity despite competition from all sorts of technologically advanced games. Each player gets eight pieces to make a total of sixteen pieces. There is a king, a queen, two rooks, two knights, two bishops, and eight pawns. The board is made up of the same black and white squares no matter where you play it, and every player has the same pieces. And yet out of all these games of chess, of the millions upon millions that have been played, no two have been identical.
How is this possible? Because chess is about strategy. It’s about being at your best, smart execution, planning your moves. All games of chess are different because different minds are playing them, and there are so many variables – 16 pieces! These multiple pieces need to work together and have a strategy to lead to one final execution plan.
Now let’s link this concept to SQL Server. There are many pieces that play a part in SQL, and they tend to stay the same from day to day and even from business to business. The main components are data, the CPU, memory, the SQL Server application, and the hard drive. However, as any DBA knows, this doesn’t mean SQL Server is ever the same from day to day. Why is this?
SQL changes all the time because while you’re working with the same “chess men.” Every execution plan will behave different because there will be different CPU cycles, different kind of joins, a different IO pattern, different memory.
Here is where the metaphor ends: there is a way to win at chess, but there are only ways to improve SQL Server, there is no trophy at the end. The only way to “win” is to use the best execution plan that has been drawn.
Some HIstory or background
Query Optimization is often considered as a new subject but if you look back in time many of the present time best practices were discovered over 30 years ago. Here are few of the landmark research papers.
1978 – Nonprocedural Query Processing for Databases with Access Paths. Nancy D. Griffeth. ACM SIGMOD Conference 1978.
1978 – Distributed Query Processing in a Relational Data Base System. Robert S. Epstein, Michael Stonebraker, Eugene Wong. ACM SIGMOD Conference 1978
1979 – Parallel Processing of Relations: A Single-Assignment Approach. Joachim W. Schmidt. VLDB 1979.
1980 – A Methodology for Interpreting Tree Queries Into Optimal Semi-Join Expressions. D. M. Chiu, Y. C. Ho. ACM SIGMOD Conference 1980.
1980 – On Retrieval from a Small Version of a Large Data Base. Adrian Walker. VLDB 1980.
1981 – QUIST: A System for Semantic Query Optimization in Relational Databases. Jonathan J. King. VLDB 1981.
1981 – Optimal Semijoin Schedules For Query Processing in Local Distributed Database Systems. Mohamed G. Gouda, Umeshwar Dayal. ACM SIGMOD Conference 1981.
1982 – Query Processing for Distributed Databases Using Generalized Semi-Joins. Yahiko Kambayashi, Masatoshi Yoshikawa, Shuzo Yajima. ACM SIGMOD Conference 1982.
1983 – Estimating Block Transfers and Join Sizes. Stavros Christodoulakis. ACM SIGMOD Conference 1983.
The amount of research related to Query Processing has grown exponentially in last 30 years. History demonstrates that initially Joins were the point of the interests for early years Query Tuning Wizards.
SQL Server execution plans are broken into two parts: query or compiled plans and execution contexts. Query/compiled plans are sharable by many users, whereas execution contexts are not. Query plans are stored in sys.dm_exec_cached_plans.
There are never more than two copies of the query plan in memory. There is one copy for all serial executions and there is possibly a second for all parallel executions. The parallel copy covers all parallel executions, regardless of their degree of parallelism.
The Query plan is generated at compile time and is reused until a recompile happens (due to a service restart, configuration change or user request). Query plan does not have any user or runtime information associated with it. The execution contexts contain runtime information like parameter values and are specific to the one session by the executing user.
||The execution context data structures are reusable. When multiple users execute a stored procedure concurrently, each user will be given a separate execution context.
Multiple execution contexts can exist, especially in high concurrency environments. If a new user executes a procedure, and one of the existing execution contexts is not in use, it is reinitialized with the context for the new user.
||The execution context maintains state information, such as the values of local variables/parameters for that execution that is specific to a given invocation of a query.
To improve plan stability, one deliberate choice from SQL Server 2005 was to keep the compiled plans in cache as long as possible. The architecture was changed so that compiled plans can keep pointers to the application execution context(s) for that plan.
Under memory pressure, the execution contexts are removed first since they can easily be regenerated. The compiled plans are only removed when there is still insufficient memory after removing the execution contexts. Hence in environments with high ad-hoc workloads, the plan cache can grow to a higher than expected value.
Query Plans are Fingerprints
The query plan should be read from right to left and top to bottom. A query plans shows:
- How data is accessed – which indexes are used; was seek or scan performed.
- How data is joined – order of tables joined and algorithm to join tables.
- How data is aggregated – aggregation algorithm used if query contains sum, avg etc
- Sequence of operations
- Use of temporary worktables – optimizer may create temporary work tables for example if the query contains a sort clause.
- Estimated rowcounts – how many rows the optimizer predicted would be returned by the query?
- Actual rowcounts – number of rows actually returned by the query.
- Use of parallelism – whether or not a parallel query plan was used.
- Operator level warnings from SQL Server 2012 and above.
Query and plan fingerprints are new from SQL Server 2008. The query fingerprint is generated by taking the SQL statement and running it through a hash function. If the SQL statements hash to the same value, they are probably the same SQL statement that only differ by literals.
The query fingerprint is exposed as the query_hash column in sys.dm_exec_requests, and sys.dm_exec_query_stats.
||The query fingerprint is also exposed as the queryhash attribute of the XML query plan.
The query hash can be used to determine if the plan cache contains similar queries which cannot be auto-parameterized. This situation can occur more frequently when an application issues ad-hoc SQL statements.
SQL Server 2008 has added 2 DMVs to identify expensive procedures and triggers. To identify expensive procedures use sys.dm_exec_procedure_stats. This dmv can be used to identify expensive TSQL stored procedures, CLR stored procedures or Extended stored procedures. The type and type_desc columns indicate which type of procedure it is.
To identify expensive triggers use sys.dm_exec_trigger_stats. This DMV is functionally equivalent to the new sys.dm_exec_procedure_stats but it provides the data about the overall time taken inside of triggers only. This information is available for both SQL and CLR triggers.
||Use sp_executesql when a single user might use the same batch multiple times and the parameters are known.
In SQL Server, using parameters or parameter markers in Transact-SQL statements increases the ability of the relational engine to match new SQL statements with existing, unused execution plans. If a SQL statement is executed without parameters, SQL Server parameterizes the statement internally to increase the possibility of matching it against an existing execution plan and this process is called as auto-parameterization.
Use auto-parameterization only for applications that cannot be easily modified. Auto-parameterization is useful in a very limited number of scenarios. Avoid designing applications that rely on auto-parameterization.
When the PARAMETERIZATION option (at Database settings) is set to FORCED, any literal value that appears in a SELECT, INSERT, UPDATE or DELETE statement is converted to a parameter during query compilation. Some queries may not be able to be parameterized.
Forced parameterization might improve performance for environments where high volumes of concurrent adhoc SQL statements are submitted like from an ERP or CRM systems.
I have been wanting to write on these concepts for a while and I hope this can be a good start to learn. These are fundamental notes on Execution plans and are these notes are a starting point. Keep learning and reading.