Tuesday, 10 January 2006

Implementing dynamic PIVOT in SQL Server 2005

It is well-known that PIVOT is one of the new features included in SQL Server 2005 which allows to convert rows into columns. To find out more details about it, check out my previous post. To begin with, we are going to use the following example to analyse some data, which will be pivoted shortly after.

USE AdventureWorks 
GO
SELECT CustomerID,YEAR(DueDate) [Year], TotalDue 
FROM Sales.SalesOrderHeader
ORDER BY CustomerID

According to the result set, there are many sales per customer between the years 2001 and 2004.



In order to pivot the 'TotalDue' per 'Year' we do need to indicate each year in the PIVOT clause. In this case we do also need to know the years which will be taken into account. For instance, this query will pivot 'TotalDue' for the years 2001, 2002, 2003, and 2004.

SELECT CustomerID, [2001] AS '2001', [2002] AS '2002', [2003] AS '2003', [2004]  AS '2004'
FROM (
      SELECT CustomerID,YEAR(DueDate) [Year], TotalDue FROM Sales.SalesOrderHeader
     ) pvt
PIVOT (SUM(TotalDue) FOR [Year] IN ([2001],[2002],[2003],[2004])) AS Child
ORDER BY CustomerID

Having successfully executed the query, we will get the following nice result:


Up to now, everything seems to be perfect. Nevertheless, what's going on if we wanted to pivot for many more years? Obviously, we would have to deal with a big limitation at first glance since we will need to add the years manually inside the query, but it does not make sense for a real business case. As a result, we can say that PIVOT is not scalable, I mean that PIVOT is not 'dynamic' by design. Luckily, the purpose of this post is to show how to implement an algorithm to simulate a dynamic PIVOT in SQL Server by using the native PIVOT clause and sp_executesql.

The algorithm is quite simple and does not deserve major explanation, however, I am going to say that this will only create a query dynamically to pivot 'TotalDue' by adding all years inside, and finally the output code will be executed via sp_executesql.

DECLARE @TableYears AS TABLE([Year] INT NOT NULL)
DECLARE @Year INT, @YearsPVT NVARCHAR(MAX) 
INSERT INTO @TableYears  SELECT DISTINCT YEAR(DueDate) AS [Year] FROM  Sales.SalesOrderHeader 
SET @Year = (SELECT MIN([Year]) FROM @TableYears)
SET @YearsPVT=N''
WHILE @Year IS NOT NULL
BEGIN
  SET @YearsPVT = @YearsPVT + N',['+ CONVERT(NVARCHAR(10),@Year) + N']'
  SET @Year = (SELECT MIN([Year]) FROM @TableYears WHERE [Year]>@Year)
END
SET @YearsPVT = SUBSTRING(@YearsPVT,2,LEN(@YearsPVT))
PRINT @YearsPVT
DECLARE @SQL NVARCHAR(MAX)
 SET @SQL = N'SELECT *
            FROM ( 
                SELECT CustomerID,YEAR(DueDate) [Year], TotalDue FROM Sales.SalesOrderHeader
                ) pvt
            PIVOT (SUM(TotalDue) FOR [Year] IN (' + @YearsPVT + ')) AS Child
            ORDER by CustomerID'
 
EXECUTE sp_executesql @SQL

Now I am going to illustrate another example by using data from Northwind database.

SELECT P.ProductID, C.CategoryName, OD.UnitPrice * OD.Quantity AS TotalAmount  
FROM Products P 
    INNER JOIN dbo.[Order Details] OD
ON P.ProductID=OD.ProductID
   INNER JOIN Categories C
ON C.CategoryID=P.CategoryID

Here the simple result without PIVOT.


Using PIVOT in its old-fashioned way:

SELECT ProductID, [Beverages], [Condiments], [Confections], [Dairy Products], 
       [Grains/Cereals], [Meat/Poultry],[Produce],[Seafood] 
FROM 
(
    SELECT P.ProductID, C.CategoryName, (OD.UnitPrice * OD.Quantity) AS Monto
    FROM Products P 
        INNER JOIN dbo.[Order Details] OD
            ON P.ProductID=OD.ProductID
        INNER JOIN Categories C
            on C.CategoryID=P.CategoryID
) PIV
PIVOT (SUM(Monto) FOR  CategoryName IN ([Beverages], [Condiments], [Confections], [Dairy Products], 
       [Grains/Cereals], [Meat/Poultry],[Produce],[Seafood])) AS Child

Having executed the code above, we will get this result pivoted.



Finally, using dynamic PIVOT the result will be the same.

DECLARE @CatPVT AS NVARCHAR(MAX), @Categorias AS VARCHAR(20)
DECLARE @CatID INT 
SET @CatID=(SELECT MIN(CategoryID) FROM Categories)
SET @Categorias = ( SELECT CategoryName FROM Categories WHERE CategoryID = @CatID)
SET @CatPVT = N''
WHILE @Categorias IS NOT NULL
BEGIN
  SET @CatPVT = @CatPVT + N',['+ @Categorias +N']'
  SET @Categorias = (SELECT TOP(1) CategoryName 
                     FROM Categories WHERE CategoryID > @CatID 
                     ORDER BY CategoryID ASC)
  SET @CatID=(SELECT MIN(CategoryID) FROM Categories WHERE Categoryname=@Categorias)
END
print @CatPVT
SET @CatPVT = SUBSTRING(@CatPVT, 2, LEN(@CatPVT))
 
DECLARE @sql AS NVARCHAR(MAX)
SET @sql = N'SELECT *  
            FROM (SELECT P.ProductID, C.CategoryName, (OD.UnitPrice * OD.Quantity) AS Monto
                  FROM Products P 
                    INNER JOIN dbo.[Order Details] OD
                        ON P.ProductID=OD.ProductID
                    INNER JOIN Categories C
                    ON C.CategoryID=P.CategoryID
            ) PIV
            PIVOT (SUM(Monto) FOR  CategoryName IN ('+ @CatPVT  + ')) AS Child'
 
EXEC sp_executesql @sql

As you have seen, dynamic PIVOT is truly useful for any real business case. Therefore, I hope you make the most out of this algorithm. That is all for now, let me know any remark you may have. Thanks for reading.

Monday, 9 January 2006

Exploring ROW_NUMBER function in SQL Server 2005

One of the new features in SQL Server 2005 are the Ranking functions which are used to analyse data. Among these functions we have, for instance, ROW_NUMBER which helps to enumerate every row of a result set in a simple way. Before showing the illustration of it, I will share the syntax:

ROW_NUMBER ( ) OVER ( [ partition_by_clause ] order_by_clause )
where:
  • partition_by_clause: Divide the result set generated by FROM clause into partitions on which ROW_NUMBER will work.
  • order_by_clause: Determine the order for ROW_NUMBER.
Now the illustration: we are going to enumerate the details of the sales based on the price.

SELECT ROW_NUMBER() OVER( ORDER BY UnitPrice DESC ) AS Item, 
SaleOrderID, ProductID, Quantity,UnitPrice 
FROM OrderDetail 

Item    SaleOrderID     ProductID   Quantity    UnitPrice
----    ------------    ---------   --------    -----------
1        00000015       FX8S           1        7621.71
2        00000016       FX8E           1        3601.77
3        00000016       FX5P           1        2207.87
4        00000014       FX5G           1        6507.82
5        00000018       FX21           1        631.62
6        00000019       FX19           1        751.62

Another example of it would be using PARTITION clause for partitions result sets by ProductGroupID and then ROW_NUMBER will be applied based on ProductStock. Therefore, we are writing a query like this:

SELECT ROW_NUMBER() OVER(PARTITION BY ProductGroupID ORDER BY ProductStock ASC) AS Item, 
ProductID, ProductDesc, ProductGroupID, ProductStock 
FROM Products

Item  ProductID    ProductDesc                       ProductGroupID    ProductStock
----  ---------    ------------------------------    --------------    -----------
1     DS02         MAINBOARD 845 PEMYL               G003               -1
2     DS06         MAINBOARD 850 EV2                 G003               10
1     DS07         TECLADO 104 MULTIMEDIA32          G004               20
1     DS12         SWITCH 5 PORT 10/100, 220V        G006               26
1     DS45         MUEBLE DE OFICINA                 G007               34
1     DS76         DISCO DURO                        G008               20
1     DS07         IMPRESORA LASER                   G009               10
1     DS45         MEMORIA STICK                     G010               2
2     DS29         MEMORIA VIDEO                     G010               5
1     DS08         TINTA CANON BCI24 NEGRO           G011               12
1     DS41         WEB CAM ALTIOR MODELO B02         G012               0
1     DS74         MONITOR DE 15                     G016               6
1     DS83         IMPRESORA MATRICIAL DFX5001       G018               12
2     DS84         IMPRESORA MATRICIAL DFX8502       G018               17
3     DS21         IMPRESORA MATRICIAL FX21944       G018               48

(15 row(s) affected)
As I mentioned before, this functionality is helpful, simple and easy to use it, so it does not worth a major explanation. That is all for now. Thanks for reading.