For the rest of this article, suppose you have a table describing events, with the following columns:
- EmployeeID: the ID of the employee who triggered the event
- ItemID: the ID of the item for which the event was triggered
- EventDateTime: when the event occurred.
FIRST_VALUE / LAST_VALUE
These functions return the first/last value within the specified PARTITION. “First” and “last” are defined by the ORDER BY specified in the OVER clause.
To get the ID of the employee who created an item (first event), you could use the following expression:
FIRST_VALUE(EmployeeID) OVER (PARTITION BY ItemID ORDER BY EventDateTime ASC)
The expression returns the value of EmployeeID for the row that has the lowest EventDateTime among all rows that share the same ItemID.
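To see the expression in a complete query, here is a minimal sketch. The table variable @Events, its sample rows, and the alias CreatorEmployeeID are assumptions made up for illustration; only the three column names come from the description above.

```sql
-- Hypothetical sample data: the table variable and its rows are assumptions.
DECLARE @Events TABLE (
    EmployeeID    INT      NOT NULL,
    ItemID        INT      NOT NULL,
    EventDateTime DATETIME NOT NULL
);

INSERT INTO @Events (EmployeeID, ItemID, EventDateTime)
VALUES (10, 1, '2012-01-01T09:00:00'),
       (20, 1, '2012-01-02T10:30:00'),
       (30, 2, '2012-01-03T08:15:00'),
       (10, 2, '2012-01-04T14:00:00');

-- Every row keeps its own columns and also gets the employee of the
-- earliest event recorded for the same item.
SELECT ItemID,
       EmployeeID,
       EventDateTime,
       FIRST_VALUE(EmployeeID) OVER (PARTITION BY ItemID ORDER BY EventDateTime ASC) AS CreatorEmployeeID
FROM @Events;
```

With this sample data, both rows of item 1 should report employee 10 as the creator, and both rows of item 2 should report employee 30.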
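LAST_VALUE fills a similar role, but beware of the default window frame:
When the OVER clause contains an ORDER BY but no ROWS or RANGE clause, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used. With that frame, the window ends at the current row, so LAST_VALUE returns the value of the current row (or, in case of ties, of one of its peers) rather than the value of the last row of the partition.
Does anybody see any use for LAST_VALUE with this window frame?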
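One way to get the last value of the whole partition is to state the frame explicitly. The sketch below compares the default frame with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. The table variable @Events, its rows, and the column aliases are assumptions for illustration.

```sql
-- Hypothetical sample data: the table variable and its rows are assumptions.
DECLARE @Events TABLE (
    EmployeeID    INT      NOT NULL,
    ItemID        INT      NOT NULL,
    EventDateTime DATETIME NOT NULL
);

INSERT INTO @Events (EmployeeID, ItemID, EventDateTime)
VALUES (10, 1, '2012-01-01T09:00:00'),
       (20, 1, '2012-01-02T10:30:00'),
       (30, 1, '2012-01-03T08:15:00');

SELECT ItemID,
       EmployeeID,
       EventDateTime,
       -- Default frame: the window stops at the current row.
       LAST_VALUE(EmployeeID) OVER (PARTITION BY ItemID
                                    ORDER BY EventDateTime) AS LastWithDefaultFrame,
       -- Explicit frame: the window covers the whole partition.
       LAST_VALUE(EmployeeID) OVER (PARTITION BY ItemID
                                    ORDER BY EventDateTime
                                    ROWS BETWEEN UNBOUNDED PRECEDING
                                             AND UNBOUNDED FOLLOWING) AS LastInPartition
FROM @Events
ORDER BY ItemID, EventDateTime;
```

Because the sample timestamps are all distinct, LastWithDefaultFrame should simply repeat each row's own EmployeeID, while LastInPartition should return 30 on every row.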
LAG / LEAD
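LAG returns a value from the n-th row before the current row, and LEAD from the n-th row after it, within the same partition.
The following expression returns the date and time of the next event for the current ItemID, or the current date and time (the GETDATE() default) when there is no next event: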
LEAD( EventDateTime, 1, GETDATE() ) OVER (PARTITION BY ItemID ORDER BY EventDateTime)
LAG and LEAD do not accept a window frame: you cannot add a ROWS or RANGE clause, and none is applied implicitly.
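As a complement to the LEAD expression, here is a small sketch that uses LAG to measure the time elapsed since the previous event on the same item. The table variable @Events, its rows, and the aliases are assumptions for illustration.

```sql
-- Hypothetical sample data: the table variable and its rows are assumptions.
DECLARE @Events TABLE (
    EmployeeID    INT      NOT NULL,
    ItemID        INT      NOT NULL,
    EventDateTime DATETIME NOT NULL
);

INSERT INTO @Events (EmployeeID, ItemID, EventDateTime)
VALUES (10, 1, '2012-01-01T09:00:00'),
       (20, 1, '2012-01-01T09:05:00'),
       (30, 2, '2012-01-03T08:15:00');

SELECT ItemID,
       EventDateTime,
       -- Offset defaults to 1 and the default value to NULL when omitted.
       LAG(EventDateTime) OVER (PARTITION BY ItemID ORDER BY EventDateTime) AS PreviousEventDateTime,
       -- NULL for the first event of each item, because there is no previous row.
       DATEDIFF(SECOND,
                LAG(EventDateTime) OVER (PARTITION BY ItemID ORDER BY EventDateTime),
                EventDateTime) AS SecondsSincePreviousEvent
FROM @Events
ORDER BY ItemID, EventDateTime;
```

With this data, the second event of item 1 should show 300 seconds, and the first event of each item should show NULL in both computed columns.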
Are these functions deterministic?
When you use these functions, remember the following: if the value you wish to return is not included in the ORDER BY clause, rows with different values may tie in the ordering. In that case, you cannot guarantee which value will be returned.
Consider the following example:
-- Two partitions; within each one, both rows tie on orderingValue.
WITH DATA (partitionID, orderingValue, Value) AS (
    SELECT 1, 100, 'VAL1' UNION ALL
    SELECT 1, 100, 'VAL2' UNION ALL
    SELECT 2, 100, 'VAL2' UNION ALL
    SELECT 2, 100, 'VAL1'
)
SELECT *
INTO #TMP
FROM DATA;

-- First run: #TMP is a heap.
SELECT *
     , FIRST_VALUE(Value) OVER (PARTITION BY partitionID ORDER BY orderingValue) AS FirstValue
FROM #TMP;

-- Add a clustered index, which may change the order in which rows are read.
CREATE CLUSTERED INDEX myTempIndex
    ON #TMP (partitionID, orderingValue, Value);

-- Second run: same query, same data.
SELECT *
     , FIRST_VALUE(Value) OVER (PARTITION BY partitionID ORDER BY orderingValue) AS FirstValue
FROM #TMP;

-- Clean up.
DROP INDEX myTempIndex ON #TMP;
DROP TABLE #TMP;
Creating a clustered index on the table changed the outcome of the query, even though the data and the query stayed the same.
Even without any index change, I suspect that a different execution plan could lead to the same kind of variation.
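A common way to avoid this ambiguity is to extend the ORDER BY with a tie-breaker so that no two rows of a partition compare as equal. The sketch below reuses the data from the example above in a table variable (@Data is a name chosen for illustration) and adds Value itself as the tie-breaker. Whether Value is the right tie-breaker for real data is an assumption; any column or combination of columns that makes the ordering unique would do.

```sql
-- Same rows as the example above, stored in a hypothetical table variable.
DECLARE @Data TABLE (
    partitionID   INT         NOT NULL,
    orderingValue INT         NOT NULL,
    Value         VARCHAR(10) NOT NULL
);

INSERT INTO @Data (partitionID, orderingValue, Value)
VALUES (1, 100, 'VAL1'),
       (1, 100, 'VAL2'),
       (2, 100, 'VAL2'),
       (2, 100, 'VAL1');

-- Adding Value to the ORDER BY removes the tie on orderingValue, so the
-- first row of each partition no longer depends on the physical row order.
SELECT *
     , FIRST_VALUE(Value) OVER (PARTITION BY partitionID
                                ORDER BY orderingValue, Value) AS FirstValue
FROM @Data;
```

With the tie-breaker in place, FirstValue should be 'VAL1' for every row of both partitions, with or without an index on the table.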