Chapter 6

Chapter 6 — Window Functions

Window functions are the single most powerful upgrade between "knows SQL" and "writes professional SQL." They let you perform calculations across a set of related rows without collapsing them into a single output row the way GROUP BY does.

With a window function you can answer questions like: "What is each order's value and the running total up to that order?" or "Which book ranked #1 in sales within each genre this month?" — in a single query, without self-joins or subqueries.

MySQL version: Window functions require MySQL 8.0+. They are evaluated after WHERE and GROUP BY but before ORDER BY and LIMIT — which is why you cannot filter on a window function result in a WHERE clause directly. Use a CTE or subquery to wrap and filter.

1. The OVER() Clause — Anatomy

Every window function is a regular function followed by OVER(). What's inside OVER defines the "window" — which rows the function looks at when computing its result for the current row.

RANK() OVER ( PARTITION BY genre_id ← reset the window per group (optional) ORDER BY units_sold DESC ← sort within the window (often required) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ← frame: which rows count (optional) )

PARTITION BY — splits rows into independent groups (like GROUP BY, but rows are kept). Omit to treat the whole result set as one window.
ORDER BY — defines row order inside each partition. Required for ranking and navigation functions; optional for pure aggregates.
Frame clause — narrows which rows within the partition are included. Defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present.

GROUP BY vs window function — the key difference

-- GROUP BY: one row per genre, detail lost SELECT genre_id, SUM(units_sold) AS total FROM monthly_sales GROUP BY genre_id; -- Window function: every row kept, total added as extra column SELECT genre_id, title, units_sold, SUM(units_sold) OVER (PARTITION BY genre_id) AS genre_total FROM monthly_sales;

2. Ranking Functions

ROW_NUMBER()

Unique sequential integer per row within the partition. Ties get different numbers — which tie wins depends on row order.

RANK()

Same as ROW_NUMBER but ties share the same rank and the next rank skips. 1, 2, 2, 4 …

DENSE_RANK()

Like RANK but no gaps. Ties share rank, next rank is always +1. 1, 2, 2, 3 …

NTILE(n)

Divides rows into n roughly equal buckets. Good for quartiles, deciles, percentile bands.

PERCENT_RANK()

Relative rank 0–1. (rank - 1) / (rows - 1). First row = 0.0, last = 1.0.

CUME_DIST()

Cumulative distribution: fraction of rows with value ≤ current row's value.

ROW_NUMBER, RANK, DENSE_RANK side by side

-- Rank books by units sold — see how ties behave differently SELECT title, units_sold, ROW_NUMBER() OVER (ORDER BY units_sold DESC) AS row_num, RANK() OVER (ORDER BY units_sold DESC) AS rnk, DENSE_RANK() OVER (ORDER BY units_sold DESC) AS dense_rnk FROM monthly_sales ORDER BY units_sold DESC;

title	units_sold	row_num	rnk	dense_rnk
Dune	42	1	1	1
Foundation	38	2	2	2
Neuromancer	35	3	3	3
The Hobbit	35	4	3	3
1984	29	5	5	4
Middlemarch	21	6	6	5

Neuromancer and The Hobbit both sold 35 copies — both get rank 3 under RANK() and DENSE_RANK(). RANK() then skips to 5 (a gap); DENSE_RANK() continues to 4 (no gap).

Top-N per group — the most common window pattern

Find the top 2 best-selling books within each genre this month. You cannot put a window function in WHERE — wrap it in a CTE:

WITH ranked_books AS ( SELECT b.title, g.name AS genre, SUM(oi.quantity) AS units_sold, RANK() OVER ( PARTITION BY bg.genre_id ORDER BY SUM(oi.quantity) DESC ) AS rnk FROM order_items oi JOIN books b ON b.book_id = oi.book_id JOIN book_genres bg ON bg.book_id = b.book_id JOIN genres g ON g.genre_id = bg.genre_id GROUP BY b.book_id, b.title, bg.genre_id, g.name ) SELECT genre, rnk, title, units_sold FROM ranked_books WHERE rnk <= 2 ORDER BY genre, rnk;

RANK vs ROW_NUMBER for top-N: Use RANK() when you want all tied rows at position N included (e.g. both books that tied for 2nd). Use ROW_NUMBER() when you want exactly N rows per partition, ties broken arbitrarily.

NTILE — bucket customers into quartiles

WITH spend AS ( SELECT CONCAT(c.first_name, ' ', c.last_name) AS customer, SUM(oi.quantity * oi.unit_price) AS total_spent FROM customers c JOIN orders o ON o.customer_id = c.customer_id JOIN order_items oi ON oi.order_id = o.order_id GROUP BY c.customer_id, customer ) SELECT customer, total_spent, NTILE(4) OVER (ORDER BY total_spent DESC) AS quartile FROM spend ORDER BY total_spent DESC;

3. Navigation Functions — LAG, LEAD, FIRST_VALUE, LAST_VALUE

LAG(col, n)

Value from n rows before the current row in the window. Default n=1. Returns NULL at the start.

LEAD(col, n)

Value from n rows after the current row. Returns NULL at the end.

FIRST_VALUE(col)

Value from the first row of the window frame. Every row in the partition can see the partition's first value.

LAST_VALUE(col)

Value from the last row of the current frame. Needs an explicit frame clause to reach the partition end.

NTH_VALUE(col, n)

Value from the nth row of the window frame. 1-indexed.

LAG and LEAD — month-over-month sales change

WITH monthly_totals AS ( SELECT DATE_FORMAT(o.order_date, '%Y-%m') AS month, SUM(oi.quantity * oi.unit_price) AS revenue FROM orders o JOIN order_items oi ON oi.order_id = o.order_id GROUP BY month ) SELECT month, revenue, LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue, revenue - LAG(revenue) OVER (ORDER BY month) AS change, ROUND( (revenue - LAG(revenue) OVER (ORDER BY month)) / LAG(revenue) OVER (ORDER BY month) * 100, 1) AS pct_change FROM monthly_totals ORDER BY month;

month	revenue	prev_month_revenue	change	pct_change
2026-01	1240.00	NULL	NULL	NULL
2026-02	980.50	1240.00	-259.50	-20.9
2026-03	1560.75	980.50	+580.25	+59.2
2026-04	1820.00	1560.75	+259.25	+16.6
2026-05	1390.25	1820.00	-429.75	-23.6

LAG with a default value

LAG(col, n, default) — the third argument replaces NULL at the boundary:

-- Replace NULL with 0 for the first row so arithmetic doesn't break LAG(revenue, 1, 0) OVER (ORDER BY month)

FIRST_VALUE — benchmark each month against January

SELECT month, revenue, FIRST_VALUE(revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS jan_revenue, ROUND(revenue / FIRST_VALUE(revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) * 100, 1 ) AS vs_jan_pct FROM monthly_totals ORDER BY month;

LAST_VALUE gotcha: By default the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the current row's value — not the partition's last value. Always add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when you want the true last value of the partition.

4. Aggregate Window Functions — Running Totals and Moving Averages

Any aggregate function (SUM, COUNT, AVG, MIN, MAX) can be used as a window function by adding OVER(). The frame clause controls exactly which rows are included in the aggregate.

Frame clause reference

Frame syntax	Meaning
`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`	All rows from partition start up to and including current row → running total
`ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`	All rows in the partition → same as no frame (partition total on every row)
`ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`	Current row plus the 2 rows before it → 3-row moving window
`ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`	One row before, current row, one row after → centred 3-row window
`RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`	Default when ORDER BY is present. Like ROWS but includes all peers (tied ORDER BY values).

Running total and cumulative percentage

WITH monthly_totals AS ( SELECT DATE_FORMAT(o.order_date, '%Y-%m') AS month, SUM(oi.quantity * oi.unit_price) AS revenue FROM orders o JOIN order_items oi ON oi.order_id = o.order_id GROUP BY month ) SELECT month, revenue, SUM(revenue) OVER ( ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total, ROUND( SUM(revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) / SUM(revenue) OVER () * 100 , 1) AS cumulative_pct FROM monthly_totals ORDER BY month;

month	revenue	running_total	cumulative_pct
2026-01	1240.00	1240.00	18.5%
2026-02	980.50	2220.50	33.1%
2026-03	1560.75	3781.25	56.4%
2026-04	1820.00	5601.25	83.5%
2026-05	1390.25	6991.50	100.0%

3-month moving average

SELECT month, revenue, ROUND( AVG(revenue) OVER ( ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) , 2) AS moving_avg_3m FROM monthly_totals ORDER BY month;

SUM() OVER () — no ORDER BY, no PARTITION BY — means "the total of the entire result set." This is the simplest window aggregate: every row sees the same grand total. Useful for computing percentages: revenue / SUM(revenue) OVER () * 100.

Running COUNT — orders placed so far this year

SELECT order_date, order_id, COUNT(*) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS orders_ytd FROM orders WHERE YEAR(order_date) = 2026 ORDER BY order_date;

5. Named Windows — the WINDOW Clause

When the same OVER() definition is repeated several times, define it once with WINDOW … AS (…) at the end of the query and reference it by name. This keeps queries DRY and reduces errors from copy-paste drift.

SELECT month, revenue, SUM(revenue) OVER w AS running_total, AVG(revenue) OVER w AS running_avg, COUNT(*) OVER w AS months_so_far, MIN(revenue) OVER w AS lowest_so_far, MAX(revenue) OVER w AS highest_so_far FROM monthly_totals WINDOW w AS ( ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) ORDER BY month;

6. Practical Patterns

Deduplicate — keep only the most recent order per customer

ROW_NUMBER is the standard way to deduplicate or pick one row per group:

WITH latest_orders AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date DESC, order_id DESC ) AS rn FROM orders ) SELECT customer_id, order_id, order_date, total_amount FROM latest_orders WHERE rn = 1 ORDER BY customer_id;

Detect gaps — find orders where the previous order was more than 90 days ago

WITH order_gaps AS ( SELECT customer_id, order_id, order_date, LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS prev_order_date, DATEDIFF(order_date, LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) ) AS days_since_last FROM orders ) SELECT customer_id, order_date, prev_order_date, days_since_last FROM order_gaps WHERE days_since_last > 90 ORDER BY days_since_last DESC;

Pareto / cumulative share — find which books make up 80% of revenue

WITH book_revenue AS ( SELECT b.title, SUM(oi.quantity * oi.unit_price) AS revenue FROM order_items oi JOIN books b ON b.book_id = oi.book_id GROUP BY b.book_id, b.title ), cumulative AS ( SELECT title, revenue, SUM(revenue) OVER ( ORDER BY revenue DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_revenue, SUM(revenue) OVER () AS grand_total FROM book_revenue ) SELECT title, revenue, ROUND(cumulative_revenue / grand_total * 100, 1) AS cumulative_pct FROM cumulative WHERE (cumulative_revenue - revenue) / grand_total < 0.80 -- books needed to reach 80% ORDER BY revenue DESC;

Cannot filter on a window function in WHERE. MySQL evaluates window functions after WHERE. The pattern is always: compute in a CTE → filter in the outer SELECT's WHERE clause.

Quick Reference

Function / clause	What it does	Common use
ROW_NUMBER()	Unique integer per row within partition, no tie-sharing	Deduplication, exactly-N-per-group selection
RANK()	Ties share rank; next rank skips (gaps)	Top-N per group where ties should all be included
DENSE_RANK()	Ties share rank; no gaps in sequence	Medal-style ranking; consecutive rank bands
NTILE(n)	Divides rows into n buckets	Quartiles, deciles, percentile segmentation
LAG(col, n, default)	Value from n rows earlier in the window	Period-over-period comparison, gap detection
LEAD(col, n, default)	Value from n rows later in the window	Forward-looking metrics, time-to-next-event
FIRST_VALUE(col)	Value from first row of the frame	Benchmark every row against partition start
LAST_VALUE(col)	Value from last row of the frame (extend frame!)	Benchmark against partition end — needs `ROWS BETWEEN … UNBOUNDED FOLLOWING`
SUM/AVG/COUNT OVER (ORDER BY … ROWS BETWEEN …)	Running or moving aggregate	Running totals, moving averages, YTD counts
SUM(x) OVER ()	Grand total across all rows	Percentage-of-total: `x / SUM(x) OVER ()`
PARTITION BY	Resets window per group	Per-category ranks, per-customer running totals
WINDOW w AS (…)	Named window reused across multiple functions	Avoid repeating identical OVER() clauses
Cannot use in WHERE	Window functions evaluated after WHERE/GROUP BY	Always wrap in a CTE, then filter in outer WHERE

Next: Chapter 7 — Views: CREATE VIEW, querying views, updatable views, using views to simplify complex queries and control column-level access.