Quantcast
Channel: The Data Specialist
Viewing all articles
Browse latest Browse all 26

Analytic functions in SQL Server 2012 – Part I – Distributions

$
0
0

In this article, I take a look at the analytic functions in SQL Server 2012 to handle frequency distributions.

CUME_DIST

The CUME_DIST function returns the percentage of records that are of same or lower order than the current record.

The expression:
CUME_DIST() OVER(ORDER BY MyValue)
Is equivalent (neglecting precision) to :
1.0 * COUNT(*)
OVER (ORDER BY MyValue
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
/ COUNT(*) OVER ()

Possible scenario: calculate the percentage of households whose income is not greater than the current one.

PERCENT_RANK

The PERCENT_RANK rank is similar to the CUME_DIST function.

The expression:
PERCENT_RANK() OVER(ORDER BY MyValue)

Is equivalent to (neglecting integer division) :

(  RANK() OVER (ORDER BY MyValue) –1      )
/ ( COUNT(*) OVER ()         –1      )

Possible scenario: for each household, calculate the percentage of the other households that earn less than the current one.

PERCENTILE_DISC

Returns the smallest value such that CUME_DIST is equal to or greater than the provided probability.

PERCENTILE_DISC (0.4)
WITHIN GROUP ( ORDER BY MyValue ASC )
OVER()

A few remarks:

  • NULL values are ignored, although this is not the case for the CUME_DIST function;
  • The OVER clause is mandatory, although it may be empty;
  • No ORDER BY is allowed in the OVER clause; there is a specific WITHIN GROUP clause to specify the ordering of the partition.

Possible scenario: what is the income under which 10% of households fall?

PERCENTILE_CONT

This is an interpolated version of PERCENTILE_DISC. It shares the same syntax. The same remarks as above apply.


Filed under: SQL, SQL Server 2012 Tagged: 2012, CUME_DIST, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, SQL SERVER, T-SQL, TRANSACT-SQL

Viewing all articles
Browse latest Browse all 26

Trending Articles