Chapter 6 — Window Functions
Window functions are the single most powerful upgrade between "knows SQL" and "writes
professional SQL." They let you perform calculations across a set of related rows
without collapsing them into a single output row the way GROUP BY does.
With a window function you can answer questions like: "What is each order's value
and the running total up to that order?" or "Which book ranked #1 in sales
within each genre this month?" — in a single query, without self-joins or subqueries.
MySQL version: Window functions require MySQL 8.0+.
They are evaluated after WHERE and GROUP BY but before ORDER BY and LIMIT —
which is why you cannot filter on a window function result in a WHERE clause directly.
Use a CTE or subquery to wrap and filter.
1. The OVER() Clause — Anatomy
Every window function is a regular function followed by OVER().
What's inside OVER defines the "window" — which rows the function looks at when
computing its result for the current row.
RANK() OVER (
PARTITION BY genre_id ← reset the window per group (optional)
ORDER BY units_sold DESC ← sort within the window (often required)
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW ← frame: which rows count (optional)
)
- PARTITION BY — splits rows into independent groups (like GROUP BY, but rows are kept). Omit to treat the whole result set as one window.
- ORDER BY — defines row order inside each partition. Required for ranking and navigation functions; optional for pure aggregates.
- Frame clause — narrows which rows within the partition are included. Defaults to
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present.
GROUP BY vs window function — the key difference
SELECT genre_id, SUM(units_sold) AS total
FROM monthly_sales
GROUP BY genre_id;
SELECT
genre_id,
title,
units_sold,
SUM(units_sold) OVER (PARTITION BY genre_id) AS genre_total
FROM monthly_sales;
2. Ranking Functions
ROW_NUMBER()
Unique sequential integer per row within the partition. Ties get different numbers — which tie wins depends on row order.
RANK()
Same as ROW_NUMBER but ties share the same rank and the next rank skips. 1, 2, 2, 4 …
DENSE_RANK()
Like RANK but no gaps. Ties share rank, next rank is always +1. 1, 2, 2, 3 …
NTILE(n)
Divides rows into n roughly equal buckets. Good for quartiles, deciles, percentile bands.
PERCENT_RANK()
Relative rank 0–1. (rank - 1) / (rows - 1). First row = 0.0, last = 1.0.
CUME_DIST()
Cumulative distribution: fraction of rows with value ≤ current row's value.
ROW_NUMBER, RANK, DENSE_RANK side by side
SELECT
title,
units_sold,
ROW_NUMBER() OVER (ORDER BY units_sold DESC) AS row_num,
RANK() OVER (ORDER BY units_sold DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY units_sold DESC) AS dense_rnk
FROM monthly_sales
ORDER BY units_sold DESC;
| title | units_sold | row_num | rnk | dense_rnk |
| Dune | 42 | 1 | 1 | 1 |
| Foundation | 38 | 2 | 2 | 2 |
| Neuromancer | 35 | 3 | 3 | 3 |
| The Hobbit | 35 | 4 | 3 | 3 |
| 1984 | 29 | 5 | 5 | 4 |
| Middlemarch | 21 | 6 | 6 | 5 |
Neuromancer and The Hobbit both sold 35 copies — both get rank 3 under RANK() and DENSE_RANK().
RANK() then skips to 5 (a gap); DENSE_RANK() continues to 4 (no gap).
Top-N per group — the most common window pattern
Find the top 2 best-selling books within each genre this month.
You cannot put a window function in WHERE — wrap it in a CTE:
WITH ranked_books AS (
SELECT
b.title,
g.name AS genre,
SUM(oi.quantity) AS units_sold,
RANK() OVER (
PARTITION BY bg.genre_id
ORDER BY SUM(oi.quantity) DESC
) AS rnk
FROM order_items oi
JOIN books b ON b.book_id = oi.book_id
JOIN book_genres bg ON bg.book_id = b.book_id
JOIN genres g ON g.genre_id = bg.genre_id
GROUP BY b.book_id, b.title, bg.genre_id, g.name
)
SELECT genre, rnk, title, units_sold
FROM ranked_books
WHERE rnk <= 2
ORDER BY genre, rnk;
RANK vs ROW_NUMBER for top-N: Use RANK() when you want all tied rows
at position N included (e.g. both books that tied for 2nd). Use ROW_NUMBER() when
you want exactly N rows per partition, ties broken arbitrarily.
NTILE — bucket customers into quartiles
WITH spend AS (
SELECT
CONCAT(c.first_name, ' ', c.last_name) AS customer,
SUM(oi.quantity * oi.unit_price) AS total_spent
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY c.customer_id, customer
)
SELECT
customer,
total_spent,
NTILE(4) OVER (ORDER BY total_spent DESC) AS quartile
FROM spend
ORDER BY total_spent DESC;
3. Navigation Functions — LAG, LEAD, FIRST_VALUE, LAST_VALUE
LAG(col, n)
Value from n rows before the current row in the window. Default n=1. Returns NULL at the start.
LEAD(col, n)
Value from n rows after the current row. Returns NULL at the end.
FIRST_VALUE(col)
Value from the first row of the window frame. Every row in the partition can see the partition's first value.
LAST_VALUE(col)
Value from the last row of the current frame. Needs an explicit frame clause to reach the partition end.
NTH_VALUE(col, n)
Value from the nth row of the window frame. 1-indexed.
LAG and LEAD — month-over-month sales change
WITH monthly_totals AS (
SELECT
DATE_FORMAT(o.order_date, '%Y-%m') AS month,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY month
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS change,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
/ LAG(revenue) OVER (ORDER BY month) * 100,
1) AS pct_change
FROM monthly_totals
ORDER BY month;
| month | revenue | prev_month_revenue | change | pct_change |
| 2026-01 | 1240.00 | NULL | NULL | NULL |
| 2026-02 | 980.50 | 1240.00 | -259.50 | -20.9 |
| 2026-03 | 1560.75 | 980.50 | +580.25 | +59.2 |
| 2026-04 | 1820.00 | 1560.75 | +259.25 | +16.6 |
| 2026-05 | 1390.25 | 1820.00 | -429.75 | -23.6 |
LAG with a default value
LAG(col, n, default) — the third argument replaces NULL at the boundary:
LAG(revenue, 1, 0) OVER (ORDER BY month)
FIRST_VALUE — benchmark each month against January
SELECT
month,
revenue,
FIRST_VALUE(revenue) OVER (ORDER BY month
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS jan_revenue,
ROUND(revenue /
FIRST_VALUE(revenue) OVER (ORDER BY month
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) * 100, 1
) AS vs_jan_pct
FROM monthly_totals
ORDER BY month;
LAST_VALUE gotcha: By default the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
so LAST_VALUE returns the current row's value — not the partition's last value.
Always add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
when you want the true last value of the partition.
4. Aggregate Window Functions — Running Totals and Moving Averages
Any aggregate function (SUM, COUNT, AVG, MIN, MAX) can be used as a window function
by adding OVER(). The frame clause controls exactly which rows are included in the aggregate.
Frame clause reference
| Frame syntax | Meaning |
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | All rows from partition start up to and including current row → running total |
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | All rows in the partition → same as no frame (partition total on every row) |
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW | Current row plus the 2 rows before it → 3-row moving window |
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING | One row before, current row, one row after → centred 3-row window |
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | Default when ORDER BY is present. Like ROWS but includes all peers (tied ORDER BY values). |
Running total and cumulative percentage
WITH monthly_totals AS (
SELECT
DATE_FORMAT(o.order_date, '%Y-%m') AS month,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY month
)
SELECT
month,
revenue,
SUM(revenue) OVER (
ORDER BY month
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
ROUND(
SUM(revenue) OVER (ORDER BY month
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
/ SUM(revenue) OVER () * 100
, 1) AS cumulative_pct
FROM monthly_totals
ORDER BY month;
| month | revenue | running_total | cumulative_pct |
| 2026-01 | 1240.00 | 1240.00 | 18.5% |
| 2026-02 | 980.50 | 2220.50 | 33.1% |
| 2026-03 | 1560.75 | 3781.25 | 56.4% |
| 2026-04 | 1820.00 | 5601.25 | 83.5% |
| 2026-05 | 1390.25 | 6991.50 | 100.0% |
3-month moving average
SELECT
month,
revenue,
ROUND(
AVG(revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
, 2) AS moving_avg_3m
FROM monthly_totals
ORDER BY month;
SUM() OVER () — no ORDER BY, no PARTITION BY — means "the total of the entire
result set." This is the simplest window aggregate: every row sees the same grand total.
Useful for computing percentages: revenue / SUM(revenue) OVER () * 100.
Running COUNT — orders placed so far this year
SELECT
order_date,
order_id,
COUNT(*) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS orders_ytd
FROM orders
WHERE YEAR(order_date) = 2026
ORDER BY order_date;
5. Named Windows — the WINDOW Clause
When the same OVER() definition is repeated several times, define it once with
WINDOW … AS (…) at the end of the query and reference it by name.
This keeps queries DRY and reduces errors from copy-paste drift.
SELECT
month,
revenue,
SUM(revenue) OVER w AS running_total,
AVG(revenue) OVER w AS running_avg,
COUNT(*) OVER w AS months_so_far,
MIN(revenue) OVER w AS lowest_so_far,
MAX(revenue) OVER w AS highest_so_far
FROM monthly_totals
WINDOW w AS (
ORDER BY month
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
ORDER BY month;
6. Practical Patterns
Deduplicate — keep only the most recent order per customer
ROW_NUMBER is the standard way to deduplicate or pick one row per group:
WITH latest_orders AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC, order_id DESC
) AS rn
FROM orders
)
SELECT customer_id, order_id, order_date, total_amount
FROM latest_orders
WHERE rn = 1
ORDER BY customer_id;
Detect gaps — find orders where the previous order was more than 90 days ago
WITH order_gaps AS (
SELECT
customer_id,
order_id,
order_date,
LAG(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS prev_order_date,
DATEDIFF(order_date,
LAG(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
)
) AS days_since_last
FROM orders
)
SELECT customer_id, order_date, prev_order_date, days_since_last
FROM order_gaps
WHERE days_since_last > 90
ORDER BY days_since_last DESC;
Pareto / cumulative share — find which books make up 80% of revenue
WITH book_revenue AS (
SELECT
b.title,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN books b ON b.book_id = oi.book_id
GROUP BY b.book_id, b.title
),
cumulative AS (
SELECT
title,
revenue,
SUM(revenue) OVER (
ORDER BY revenue DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue,
SUM(revenue) OVER () AS grand_total
FROM book_revenue
)
SELECT
title,
revenue,
ROUND(cumulative_revenue / grand_total * 100, 1) AS cumulative_pct
FROM cumulative
WHERE (cumulative_revenue - revenue) / grand_total < 0.80
ORDER BY revenue DESC;
Cannot filter on a window function in WHERE. MySQL evaluates window functions after WHERE.
The pattern is always: compute in a CTE → filter in the outer SELECT's WHERE clause.
Quick Reference
| Function / clause | What it does | Common use |
| ROW_NUMBER() |
Unique integer per row within partition, no tie-sharing |
Deduplication, exactly-N-per-group selection |
| RANK() |
Ties share rank; next rank skips (gaps) |
Top-N per group where ties should all be included |
| DENSE_RANK() |
Ties share rank; no gaps in sequence |
Medal-style ranking; consecutive rank bands |
| NTILE(n) |
Divides rows into n buckets |
Quartiles, deciles, percentile segmentation |
| LAG(col, n, default) |
Value from n rows earlier in the window |
Period-over-period comparison, gap detection |
| LEAD(col, n, default) |
Value from n rows later in the window |
Forward-looking metrics, time-to-next-event |
| FIRST_VALUE(col) |
Value from first row of the frame |
Benchmark every row against partition start |
| LAST_VALUE(col) |
Value from last row of the frame (extend frame!) |
Benchmark against partition end — needs ROWS BETWEEN … UNBOUNDED FOLLOWING |
| SUM/AVG/COUNT OVER (ORDER BY … ROWS BETWEEN …) |
Running or moving aggregate |
Running totals, moving averages, YTD counts |
| SUM(x) OVER () |
Grand total across all rows |
Percentage-of-total: x / SUM(x) OVER () |
| PARTITION BY |
Resets window per group |
Per-category ranks, per-customer running totals |
| WINDOW w AS (…) |
Named window reused across multiple functions |
Avoid repeating identical OVER() clauses |
| Cannot use in WHERE |
Window functions evaluated after WHERE/GROUP BY |
Always wrap in a CTE, then filter in outer WHERE |
Next: Chapter 7 — Views:
CREATE VIEW, querying views, updatable views, using views to simplify complex queries
and control column-level access.