Wednesday, February 18, 2009

Rank makes things less rank

A query that has popped up a few times is to retrieve a list of people along with the first/last/top sale that they have made. If we were after the last sale ever made, this query would be simple, utilising top we would order by the relevant column and get the top 1, done. However in this case we want the top item PER person, not per the entire set, this is where Rank() comes into play.

Using Rank we can avoid iterating through our data with cursors to achieve this query in a single set operation.

Lets use adventureWorks and retrieve the last sale date made by each of our salespeople.

Our first step is to get all of our sales people and the orders they made:

select 
SOH.OrderDate,
SOH.SalesPersonID,
vEmp.FirstName + ' ' + vEmp.LastName Employee
from
Sales.SalesOrderHeader SOH
inner join Sales.SalesPerson SP on SOH.SalesPersonID = SP.SalesPersonID
inner join HumanResources.vEmployee vEmp on SP.SalesPersonID = vEmp.EmployeeID






Now we would like to provide away to determine which OrderDate is the latest, per SalesPerson. This is where rank comes into play:

select 
SOH.OrderDate,
SOH.SalesPersonID,
vEmp.FirstName + ' ' + vEmp.LastName Employee,
rank() over(Partition by SOH.SalesPersonID order by OrderDate desc) OrderRank
from
Sales.SalesOrderHeader SOH
inner join Sales.SalesPerson SP on SOH.SalesPersonID = SP.SalesPersonID
inner join HumanResources.vEmployee vEmp on SP.SalesPersonID = vEmp.EmployeeID






the key section is:

rank() over(Partition by SOH.SalesPersonID order by OrderDate desc) OrderRank

this now gives us a machanism for determining, per SalesPerson, which order was the latest, it will be the order with a OrderRank of 1.

Breaking down the function, the key parts are the partition and the order by:
Partition:
Patition is analogous with 'Group By' and is used to literally Partition our dataset to allow for multiple ordering withing the one dataset, Partitioning by the SalesPersonID will let us provide subsequent ranking based on each ID
Order By
Order By is the item which will determine our ranking and is based on the set of data provided by the partition

However, this still does not give us the Last order as per our request, to get the last order, we now filter the result set by OrderRank, limiting it only to those rows with a rank of 1, giving us the last order Per SalesPerson.


As Rank() is a derived column, we must select from this result set and filter the subset:

select
OrderDate, SalesPersonID, Employee
from
(
select
SOH.SalesOrderID,
SOH.OrderDate,
SOH.SalesPersonID,
vEmp.FirstName + ' ' + vEmp.LastName Employee,
rank() over(Partition by SOH.SalesPersonID order by OrderDate desc) OrderRank
from
Sales.SalesOrderHeader SOH
inner join Sales.SalesPerson SP on SOH.SalesPersonID = SP.SalesPersonID
inner join HumanResources.vEmployee vEmp on SP.SalesPersonID = vEmp.EmployeeID
) RankedOrders
where
RankedOrders.OrderRank = 1






Now we nearly have the last sale date of our salesperson. as we can see from the result, any items in the ranking which have the same OrderDate are given a ranking of 1, therefore we get 2 results for Stephen Jiang as he completed two order on June 1. Depending on your reporting requirements you can handle this with a more selective ranking order or in our case simply using distinct as we require the sales person and the last sale date.

select distinct
OrderDate, SalesPersonID, Employee
from
(
select
SOH.SalesOrderID,
SOH.OrderDate,
SOH.SalesPersonID,
vEmp.FirstName + ' ' + vEmp.LastName Employee,
rank() over(Partition by SOH.SalesPersonID order by OrderDate desc) OrderRank
from
Sales.SalesOrderHeader SOH
inner join Sales.SalesPerson SP on SOH.SalesPersonID = SP.SalesPersonID
inner join HumanResources.vEmployee vEmp on SP.SalesPersonID = vEmp.EmployeeID
) RankedOrders
where
RankedOrders.OrderRank = 1





And there we have it, a much less rank way of getting the top results without having to resort to cursors.

2 comments:

Smartbiz said...

Cool code...
Using it already...
Was going to do sub queries, but these can take ages to run....

John said...

glad you like it, im hoping to do more posts along these lines, just need to get that pesky 'work' done first :)