Chapter 6

Chapter 6 — Window Functions

Window functions are the single most powerful upgrade between "knows SQL" and "writes professional SQL." They let you perform calculations across a set of related rows without collapsing them into a single output row the way GROUP BY does.

With a window function you can answer questions like: "What is each order's value and the running total up to that order?" or "Which book ranked #1 in sales within each genre this month?" — in a single query, without self-joins or subqueries.

MySQL version: Window functions require MySQL 8.0+. They are evaluated after WHERE and GROUP BY but before ORDER BY and LIMIT — which is why you cannot filter on a window function result in a WHERE clause directly. Use a CTE or subquery to wrap and filter.

1. The OVER() Clause — Anatomy

Every window function is a regular function followed by OVER(). What's inside OVER defines the "window" — which rows the function looks at when computing its result for the current row.

RANK() OVER ( PARTITION BY genre_id ← reset the window per group (optional) ORDER BY units_sold DESC ← sort within the window (often required) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ← frame: which rows count (optional) )
  • PARTITION BY — splits rows into independent groups (like GROUP BY, but rows are kept). Omit to treat the whole result set as one window.
  • ORDER BY — defines row order inside each partition. Required for ranking and navigation functions; optional for pure aggregates.
  • Frame clause — narrows which rows within the partition are included. Defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present.

GROUP BY vs window function — the key difference

-- GROUP BY: one row per genre, detail lost SELECT genre_id, SUM(units_sold) AS total FROM monthly_sales GROUP BY genre_id; -- Window function: every row kept, total added as extra column SELECT genre_id, title, units_sold, SUM(units_sold) OVER (PARTITION BY genre_id) AS genre_total FROM monthly_sales;

2. Ranking Functions

ROW_NUMBER()
Unique sequential integer per row within the partition. Ties get different numbers — which tie wins depends on row order.
RANK()
Same as ROW_NUMBER but ties share the same rank and the next rank skips. 1, 2, 2, 4 …
DENSE_RANK()
Like RANK but no gaps. Ties share rank, next rank is always +1. 1, 2, 2, 3 …
NTILE(n)
Divides rows into n roughly equal buckets. Good for quartiles, deciles, percentile bands.
PERCENT_RANK()
Relative rank 0–1. (rank - 1) / (rows - 1). First row = 0.0, last = 1.0.
CUME_DIST()
Cumulative distribution: fraction of rows with value ≤ current row's value.

ROW_NUMBER, RANK, DENSE_RANK side by side

-- Rank books by units sold — see how ties behave differently SELECT title, units_sold, ROW_NUMBER() OVER (ORDER BY units_sold DESC) AS row_num, RANK() OVER (ORDER BY units_sold DESC) AS rnk, DENSE_RANK() OVER (ORDER BY units_sold DESC) AS dense_rnk FROM monthly_sales ORDER BY units_sold DESC;
titleunits_soldrow_numrnkdense_rnk
Dune42111
Foundation38222
Neuromancer35333
The Hobbit35433
198429554
Middlemarch21665

Neuromancer and The Hobbit both sold 35 copies — both get rank 3 under RANK() and DENSE_RANK(). RANK() then skips to 5 (a gap); DENSE_RANK() continues to 4 (no gap).

Top-N per group — the most common window pattern

Find the top 2 best-selling books within each genre this month. You cannot put a window function in WHERE — wrap it in a CTE:

WITH ranked_books AS ( SELECT b.title, g.name AS genre, SUM(oi.quantity) AS units_sold, RANK() OVER ( PARTITION BY bg.genre_id ORDER BY SUM(oi.quantity) DESC ) AS rnk FROM order_items oi JOIN books b ON b.book_id = oi.book_id JOIN book_genres bg ON bg.book_id = b.book_id JOIN genres g ON g.genre_id = bg.genre_id GROUP BY b.book_id, b.title, bg.genre_id, g.name ) SELECT genre, rnk, title, units_sold FROM ranked_books WHERE rnk <= 2 ORDER BY genre, rnk;
RANK vs ROW_NUMBER for top-N: Use RANK() when you want all tied rows at position N included (e.g. both books that tied for 2nd). Use ROW_NUMBER() when you want exactly N rows per partition, ties broken arbitrarily.

NTILE — bucket customers into quartiles

WITH spend AS ( SELECT CONCAT(c.first_name, ' ', c.last_name) AS customer, SUM(oi.quantity * oi.unit_price) AS total_spent FROM customers c JOIN orders o ON o.customer_id = c.customer_id JOIN order_items oi ON oi.order_id = o.order_id GROUP BY c.customer_id, customer ) SELECT customer, total_spent, NTILE(4) OVER (ORDER BY total_spent DESC) AS quartile FROM spend ORDER BY total_spent DESC;

3. Navigation Functions — LAG, LEAD, FIRST_VALUE, LAST_VALUE

LAG(col, n)
Value from n rows before the current row in the window. Default n=1. Returns NULL at the start.
LEAD(col, n)
Value from n rows after the current row. Returns NULL at the end.
FIRST_VALUE(col)
Value from the first row of the window frame. Every row in the partition can see the partition's first value.
LAST_VALUE(col)
Value from the last row of the current frame. Needs an explicit frame clause to reach the partition end.
NTH_VALUE(col, n)
Value from the nth row of the window frame. 1-indexed.

LAG and LEAD — month-over-month sales change

WITH monthly_totals AS ( SELECT DATE_FORMAT(o.order_date, '%Y-%m') AS month, SUM(oi.quantity * oi.unit_price) AS revenue FROM orders o JOIN order_items oi ON oi.order_id = o.order_id GROUP BY month ) SELECT month, revenue, LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue, revenue - LAG(revenue) OVER (ORDER BY month) AS change, ROUND( (revenue - LAG(revenue) OVER (ORDER BY month)) / LAG(revenue) OVER (ORDER BY month) * 100, 1) AS pct_change FROM monthly_totals ORDER BY month;
monthrevenueprev_month_revenuechangepct_change
2026-011240.00NULLNULLNULL
2026-02980.501240.00-259.50-20.9
2026-031560.75980.50+580.25+59.2
2026-041820.001560.75+259.25+16.6
2026-051390.251820.00-429.75-23.6

LAG with a default value

LAG(col, n, default) — the third argument replaces NULL at the boundary:

-- Replace NULL with 0 for the first row so arithmetic doesn't break LAG(revenue, 1, 0) OVER (ORDER BY month)

FIRST_VALUE — benchmark each month against January

SELECT month, revenue, FIRST_VALUE(revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS jan_revenue, ROUND(revenue / FIRST_VALUE(revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) * 100, 1 ) AS vs_jan_pct FROM monthly_totals ORDER BY month;
LAST_VALUE gotcha: By default the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the current row's value — not the partition's last value. Always add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when you want the true last value of the partition.

4. Aggregate Window Functions — Running Totals and Moving Averages

Any aggregate function (SUM, COUNT, AVG, MIN, MAX) can be used as a window function by adding OVER(). The frame clause controls exactly which rows are included in the aggregate.

Frame clause reference

Frame syntaxMeaning
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWAll rows from partition start up to and including current row → running total
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGAll rows in the partition → same as no frame (partition total on every row)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROWCurrent row plus the 2 rows before it → 3-row moving window
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWINGOne row before, current row, one row after → centred 3-row window
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWDefault when ORDER BY is present. Like ROWS but includes all peers (tied ORDER BY values).

Running total and cumulative percentage

WITH monthly_totals AS ( SELECT DATE_FORMAT(o.order_date, '%Y-%m') AS month, SUM(oi.quantity * oi.unit_price) AS revenue FROM orders o JOIN order_items oi ON oi.order_id = o.order_id GROUP BY month ) SELECT month, revenue, SUM(revenue) OVER ( ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total, ROUND( SUM(revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) / SUM(revenue) OVER () * 100 , 1) AS cumulative_pct FROM monthly_totals ORDER BY month;
monthrevenuerunning_totalcumulative_pct
2026-011240.001240.0018.5%
2026-02980.502220.5033.1%
2026-031560.753781.2556.4%
2026-041820.005601.2583.5%
2026-051390.256991.50100.0%

3-month moving average

SELECT month, revenue, ROUND( AVG(revenue) OVER ( ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) , 2) AS moving_avg_3m FROM monthly_totals ORDER BY month;
SUM() OVER () — no ORDER BY, no PARTITION BY — means "the total of the entire result set." This is the simplest window aggregate: every row sees the same grand total. Useful for computing percentages: revenue / SUM(revenue) OVER () * 100.

Running COUNT — orders placed so far this year

SELECT order_date, order_id, COUNT(*) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS orders_ytd FROM orders WHERE YEAR(order_date) = 2026 ORDER BY order_date;

5. Named Windows — the WINDOW Clause

When the same OVER() definition is repeated several times, define it once with WINDOW … AS (…) at the end of the query and reference it by name. This keeps queries DRY and reduces errors from copy-paste drift.

SELECT month, revenue, SUM(revenue) OVER w AS running_total, AVG(revenue) OVER w AS running_avg, COUNT(*) OVER w AS months_so_far, MIN(revenue) OVER w AS lowest_so_far, MAX(revenue) OVER w AS highest_so_far FROM monthly_totals WINDOW w AS ( ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) ORDER BY month;

6. Practical Patterns

Deduplicate — keep only the most recent order per customer

ROW_NUMBER is the standard way to deduplicate or pick one row per group:

WITH latest_orders AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date DESC, order_id DESC ) AS rn FROM orders ) SELECT customer_id, order_id, order_date, total_amount FROM latest_orders WHERE rn = 1 ORDER BY customer_id;

Detect gaps — find orders where the previous order was more than 90 days ago

WITH order_gaps AS ( SELECT customer_id, order_id, order_date, LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS prev_order_date, DATEDIFF(order_date, LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) ) AS days_since_last FROM orders ) SELECT customer_id, order_date, prev_order_date, days_since_last FROM order_gaps WHERE days_since_last > 90 ORDER BY days_since_last DESC;

Pareto / cumulative share — find which books make up 80% of revenue

WITH book_revenue AS ( SELECT b.title, SUM(oi.quantity * oi.unit_price) AS revenue FROM order_items oi JOIN books b ON b.book_id = oi.book_id GROUP BY b.book_id, b.title ), cumulative AS ( SELECT title, revenue, SUM(revenue) OVER ( ORDER BY revenue DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_revenue, SUM(revenue) OVER () AS grand_total FROM book_revenue ) SELECT title, revenue, ROUND(cumulative_revenue / grand_total * 100, 1) AS cumulative_pct FROM cumulative WHERE (cumulative_revenue - revenue) / grand_total < 0.80 -- books needed to reach 80% ORDER BY revenue DESC;
Cannot filter on a window function in WHERE. MySQL evaluates window functions after WHERE. The pattern is always: compute in a CTE → filter in the outer SELECT's WHERE clause.

Quick Reference

Function / clauseWhat it doesCommon use
ROW_NUMBER() Unique integer per row within partition, no tie-sharing Deduplication, exactly-N-per-group selection
RANK() Ties share rank; next rank skips (gaps) Top-N per group where ties should all be included
DENSE_RANK() Ties share rank; no gaps in sequence Medal-style ranking; consecutive rank bands
NTILE(n) Divides rows into n buckets Quartiles, deciles, percentile segmentation
LAG(col, n, default) Value from n rows earlier in the window Period-over-period comparison, gap detection
LEAD(col, n, default) Value from n rows later in the window Forward-looking metrics, time-to-next-event
FIRST_VALUE(col) Value from first row of the frame Benchmark every row against partition start
LAST_VALUE(col) Value from last row of the frame (extend frame!) Benchmark against partition end — needs ROWS BETWEEN … UNBOUNDED FOLLOWING
SUM/AVG/COUNT OVER (ORDER BY … ROWS BETWEEN …) Running or moving aggregate Running totals, moving averages, YTD counts
SUM(x) OVER () Grand total across all rows Percentage-of-total: x / SUM(x) OVER ()
PARTITION BY Resets window per group Per-category ranks, per-customer running totals
WINDOW w AS (…) Named window reused across multiple functions Avoid repeating identical OVER() clauses
Cannot use in WHERE Window functions evaluated after WHERE/GROUP BY Always wrap in a CTE, then filter in outer WHERE
Next: Chapter 7 — Views: CREATE VIEW, querying views, updatable views, using views to simplify complex queries and control column-level access.