Introduction to Windowing Functions

The next few articles I’m writing are going to disect Windowing Functions in SQL Server. Windowing in SQL Server was first introduced in SQL Server 2005, with very few features. Over time though, with the release of SQL Server 2008, 2008 R2 and 2012, more features have been added that make Windowing Functions a great asset to the T-SQL language.

First what are Windowing Functions, and why use them? In short, Windowing functions allow you to target a specific set of rows inside of your result, and apply a specific function to JUST those rows. This is where the term “window” comes into play. With windowing functions, your concentrating on a “window” of the data set to apply your function to. This makes things that can be really tricky in T-SQL sometimes, like a running total for example, much more straight forward and efficient. Previously before windowing functions, something like a running total, would require self-joins back to the main table, loops, cursors or some other row by row operation. SQL Server performs best when working with sets of data. Let’s take a simple query from the AdventureWorks 2012 database.

select sh.CustomerId,
	   sh.SalesOrderID, 
	   sh.OrderDate,
	   sd.UnitPrice
from Sales.SalesOrderHeader sh
   JOIN Sales.SalesOrderDetail sd on sh.SalesOrderId = sd.SalesOrderId
order by CustomerId

Pretty straight forward query and result set.

query_2

Let’s apply the OVER clause to this query to SUM up the totals for Unit Price.

--query 2
select sh.CustomerId,
	   sh.SalesOrderID, 
	   sh.OrderDate,
	   sd.UnitPrice,
	   SumUnitPrice = SUM(sd.UnitPrice) OVER ()
from Sales.SalesOrderHeader sh
   JOIN Sales.SalesOrderDetail sd on sh.SalesOrderId = sd.SalesOrderId
order by CustomerId

Here are the results of the query:

result_4

As you can see, the SumUnitPrice column is the SUM total of unit prices for the entire result set. Doesn’t seem very useful though does it? This is because the OVER () clause without a PARTITION BY statement SUMS unit price for the entire table. Think of it as using a SUM function, without a GROUP BY clause. You really never see that too often except in certain scenarios. In this case we want to see total dollar amounts by customer id in order date order. Let’s take a look at the OVER clause and the PARTITION BY working together.

--query
select sh.CustomerId,
	   sh.SalesOrderID, 
	   sh.OrderDate,
	   sd.UnitPrice,
	   RunningTotalAmt = SUM(sd.UnitPrice) OVER (PARTITION BY sh.CustomerId order by sh.OrderDate)
from Sales.SalesOrderHeader sh
   JOIN Sales.SalesOrderDetail sd on sh.SalesOrderId = sd.SalesOrderId
order by CustomerId

Here are the results of the new query:

results_6

That looks much better. The PARTITION BY statement combined with the OVER clause makes the results look much better. Here we have a running total based on customer. One important thing to note is the ORDER BY clause after the PARTITION BY statement. Always remember that data is not ordered unless you specify an ORDER BY clause. Here we can specify the ORDER in which we want to SUM the unit price. We can order the results before the final order by clause. In this case, our running total is ordered by Order Date inside the PARTITION BY statement. The PARTITION BY statement is using the CustomerID, combined with the OrderDate to give the running total for each customer in the query. We can also combine functions inside the same query. Here we’re using the same query except I’ve added a COUNT function to count the units also as a running total for each customer, next to the total dollar amount for each customer.

select sh.CustomerId,
	   sh.SalesOrderID, 
	   sh.OrderDate,
	   sd.UnitPrice,
	   RunningTotalAmt = SUM(sd.UnitPrice) OVER (PARTITION BY sh.CustomerId order by sh.OrderDate),
	   ProductCount = COUNT(*) OVER (PARTITION BY sh.CustomerId order by sh.OrderDate)
from Sales.SalesOrderHeader sh
    JOIN Sales.SalesOrderDetail sd on sh.SalesOrderId = sd.SalesOrderId
order by CustomerId 

Here are the results:

results_7

The OVER() clause has had some great updates since SQL Server 2008 R2. In SQL Server 2012 the ORDER BY clause was introduced inside the OVER clause. This determines not only the partitioning, but also the ordering of the result set before the window function is applied. If you compare the books online page of the OVER clause from SQL Server 2008 vs. SQL Server 2012, you’ll see the differences. Here are the links below for each respective version.
2008 R2: https://msdn.microsoft.com/en-us/library/ms189461(v=sql.105).aspx
2012: https://msdn.microsoft.com/en-us/library/ms189461(v=sql.110).aspx


One Comment on “Introduction to Windowing Functions”

  1. […] Introduction to Windowing Functions | Matt Batalon […]


Leave a comment