Window Functions In SQL

Many times Data Engineers get confused when exactly to use Window Functions and when they are propriate over aggregate functions? The main two cases for Window Functions would be following:

  1. When you want to measure trends or changes over rows.
  2. When you want to rank a column for outreach or prioritization.

What is window function

A window function is a calculation across a set of rows in a table that are somehow related to the current row. Means they can be used for calculating running totals that incorporate the current row or, ranking records across rows, inclusive of the current one. A window function is similar to aggregate functions combined with group by clauses but have one key difference: Window functions retain the total number of rows between the input table and the output table (or result). Behind the scenes, the window function is able to access more than just the current row of the query result. Result of the window function can be following columns:

  • Average running price
  • Running total orders
  • Running sum sales
  • Rank
  • Percentile

Types of window functions

There are 3 different types of Window functions described below.

  1. CORE:
  • PARTITION BY: A subclause of the OVER clause. Similar to GROUP BY.
  • OVER: Typically precedes the partition by that signals what to “GROUP BY”.
  • AGGREGATES: Aggregate functions that are used in window functions, too (e.g., sum, count, avg).

2. RANKING:

  • ROW_NUMBER(): Ranking function where each row gets a different number.
  • RANK(): Ranking function where a row could get the same rank if they have the same value.
  • DENCE_RANK(): Ranking function similar to rank() but ranks are not skipped with ties.

3. ADVANCED:

  • ALIASES: Shorthand that can be used if there are several window functions in one query.
  • PERSENTILES: Defines what percentile a value falls into over the entire table.
  • LAG/LEAD: Calculating differences between rows’ values.

Window function allows users to compare one row to another without doing any joins. Window functions are effective when you want to measure trends over time or rank a specific column, and it retains the total number of records without collapsing or condensing any of the original dataset.

There are a few key terms to review as a part of understanding core window functions:

  • PARTITION BY: A subclause of the OVER clause. I like to think of PARTITION BY as the GROUP BY equivalent in window functions. PARTITION BY allows you to determine what you’d like to “group by” within the window function. Most often, you are partitioning by a month, region, etc. as you are tracking changes over time.
  • OVER: This syntax signals a window function and precedes the details of the window function itself.

CORE window functions

Window function allows users to compare one row to another without doing any joins. Window functions are effective when you want to measure trends over time or rank a specific column, and it retains the total number of records without collapsing or condensing any of the original dataset.

There are a few key terms to review as a part of understanding core window functions:

  • PARTITION BY: A subclause of the OVER clause. I like to think of PARTITION BY as the GROUP BY equivalent in window functions. PARTITION BY allows you to determine what you’d like to “group by” within the window function. Most often, you are partitioning by a month, region, etc. as you are tracking changes over time.
  • OVER: This syntax signals a window function and precedes the details of the window function itself.

RANKING window functions

There are three types of ranking functions that serve the same use case: how to take a column and rank its values. The choice of which ranking function to use is up to the SQL user, often created in conjunction with someone on a customer or business team.

  • Row_number(): Ranking is distinct amongst records even with ties in what the table is ranked against.
  • Rank(): Ranking is the same amongst tied values and ranks skip for subsequent values.
  • Dense_rank(): Ranking is the same amongst tied values and ranks do not skip for subsequent values.

In following example we will preform results using all three ranking window functions with table orders showed above.

In this example alias account_year_window created at the end of query as multiple window functions been used. Aliases help tighten query syntax. Below is the result table of the query that helps understand difference between ranking window functions.

ADVENCED window functions

When there are a large number of records that need to be ranked, individual ranks (e.g., 1, 2, 3, 4…) are ineffective in helping teams determine the best of the distribution from the rest. Percentiles help better describe large datasets. For example, a team might want to reach out to the Top 5% of customers. The following components are important to consider when building a query with percentiles:

  1. NTILE + the number of buckets you’d like to create within a column (e.g., 100 buckets would create traditional percentiles, 4 buckets would create quartiles, etc.)
  2. OVER
  3. ORDER BY (optional, typically a date column)
  4. AS + the new column name

In the following example lets use the functionality to divide the accounts into 4 levels in terms of the amount of for their orders. Resulting table should have the , the time for each order, the total amount of paper purchased, and one of four levels in a column.

Conclusion

Window functions are similar to aggregate/group by functions. They maintain the total number of rows from the original dataset and typically used in the following ways:

  • Measure and/or track changes over time.
  • Rank a column to be used for outreach and/or prioritization
  • In case of using multiple window functions better create alia that leverage the same PARTITION BY, OVER, and ORDER BY in a single query, leveraging aliases will help tighten your syntax.

Additional Resources

This documentation on Window Functions is helpful to learn more.