Sliding Window based Aggregation–SQL Server 2012 April 19th, 2013

Vinod Kumar

As part of performance testing at MTC, I get a chance to meet a number of customers from the financial domain. A lot of code I see in applications is around creating custom functions for some simple requirements. One of the most sought after requirement is the need for sliding window operation. Did you know we have this function already with SQL Server 2012? When I talk with customers, a lot of them find it interesting and are mostly unaware of this enhancement made. In this blog post I will introduce this with pointers to more read. Having said that, you can also check some other functions which will mimic Access based functions introduced with SQL Server 2012.

Check the Sliding Window implemented using OVER Clause in MSDN.

Here is a typical example for sliding window implementation.

SELECT SalesDate, Sales,

AVG(Sales) OVER (

             ORDER BY SalesDate

             ROWS BETWEEN 10 PRECEDING AND 10 FOLLOWING) as [SlidingAverage]

FROM tbl_Aggregate

GO

In this above example is a case of running average over a sliding window of rows of 10 previous (see PRECEDING keyword), current and 10 next rows (see FOLLOWING keyword).

We can also use the analytics functions introduced with SQL Server 2012 along with the above syntax.

To show how powerful queries can be made, below is a classic example. Let us create a simple budget table with values for allocation across the past 3 years.

CREATE TABLE tbl_Budget

(YearOfBudget INT, GroupName NVARCHAR(10), Budget DECIMAL)

GO

 

INSERT INTO tbl_Budget VALUES

      (2010, ‘Product’, 50000.00),

      (2010, ‘Support’, 30000.00),

      (2010, ‘Sales’, 40000.00),

      (2011, ‘Product’, 60000.00),

      (2011, ‘Support’, 40000.00),

      (2011, ‘Sales’, 50000.00),

      (2012, ‘Product’, 65000.00),

      (2012, ‘Support’, 45000.00),

      (2012, ‘Sales’, 55000.00)

Now that we have the data in place let us write a query which will show us side-by-side the current budget, previous budget and next years budget for a comparison perspective. This can be used for reporting type queries. We will be using two other analytic functions LAG and LEAD in addition to OVER clause.

SELECT YearOfBudget,

      GroupName,

      Budget,

 

— Next Years’s budget

LEAD(Budget, 1, 0) OVER (PARTITION BY GroupName ORDER BY YearOfBudget ) as NextBudget,

 

–Previous Years’s budget

LAG(Budget, 1, 0) OVER (PARTITION BY GroupName ORDER BY YearOfBudget ) as PreviousBudget

 

FROM tbl_Budget

The output needs to be seen in 3 section (Product, Sales, Support) for our dataset. The below figure shows the same.

image

Hope you got an idea of such functions introduced with this version. So how have you used these functions in your environments? Would surely love to hear from you.

Interesting with the above query is, the Execution plan has a new operator added. It is called Window Spool. Window spool operator uses actual spool table to keep sliding window’s rows. It can use either in-memory optimized or tempdb based regular spool tables.

image

I remember writing about spills over Pinal’s Blog earlier and it is an important consideration. Always remember, optimizing TempDB is an important aspect and these analytic functions add-up to the overall experience. If spooling does occur to disk, you can view it via xEvent profiling too with SQL Server 2012. The event to add is: window_spool_ondisk_warning – Occurs when a Window Spool operator uses on-disk work table. I did write about using the new Extended Events UI in SQL 2012 for Monitoring high CPU over this blog.

Some fineprints as we sign off that you might find useful are:

  1. If PARTITION BY clause is not specified in OVER clause, entire source table is treated as single partition.
  2. If ORDER BY is not specified, ROWS/RANGE clause cannot be present and computation of each row will include entire partition.
  3. If ORDER BY clause is specified, but ROWS/RANGE clause is NOT specified, the default behavior is to have "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" to calculate current row aggregate value.
  4. For an index to be used by Window spool operator, all the columns in PARTITION BY and ORDER BY clause should be specified in composite index.
  5. If the query contains multiple different OVER clauses, only execution strategy in SQL Server 2012 is stacked operation. This means that we will sort for first OVER clause, execute aggregate, sort again for second OVER clause, and execute aggregate again. In the above example, you will see two Window Spool functions.
  6. ROWS clause is always faster than RANGE clause. If ORDER BY column is unique, then it is better to use ROWS over RANGE clause.

Thanks for reading this far. Keep exploring and making the best use of the power that TSQL functions introduce in every release.

Tags: , , , , ,

This entry was posted on Friday, April 19th, 2013 at 08:30 and is filed under Technology. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Leave a Reply