Chapter 7
Chapter 7 — Aggregate Functions and GROUP BY
Every query so far has returned one row per row in the table. Aggregate functions
change that — they collapse many rows into a single summary value. Combined with
GROUP BY, they let you produce per-category summaries: total sales per
genre, average price per author, number of orders per customer. This chapter covers
the five core aggregate functions, how GROUP BY works, and how HAVING lets you filter
those summaries.
1. The Five Core Aggregate Functions
Count rows. COUNT(*) counts all rows including NULLs. COUNT(col) counts only non-NULL values in that column.
Total of all non-NULL values. Returns NULL if all values are NULL (use COALESCE to default to 0).
Mean of all non-NULL values. NULLs are excluded from both numerator and denominator — they don't count as zero.
Smallest value. Works on numbers, strings (alphabetical), and dates. Ignores NULLs.
Largest value. Works on numbers, strings, and dates. Ignores NULLs.
Count unique non-NULL values. Useful for "how many different genres do we stock?" rather than "how many books?"
Aggregates across the whole table
Without GROUP BY, an aggregate function collapses the entire table into one row:
| total_books | books_with_genre | unique_genres | total_stock | avg_price | cheapest | most_expensive | oldest_year | newest_year |
|---|---|---|---|---|---|---|---|---|
| 6 | 6 | 4 | 182 | 9.07 | 6.99 | 10.99 | 1939 | 2020 |
All six books collapsed into one summary row. avg_price is rounded here for display — MySQL returns the full decimal.
2. NULL Behaviour in Aggregates
All aggregate functions silently ignore NULL values — except COUNT(*)
which counts rows regardless. This asymmetry causes a common mistake with AVG:
IFNULL(stock, 0). If you genuinely want to average only books where
stock is known, AVG(stock) is correct as-is.
3. GROUP BY — Per-Category Summaries
GROUP BY splits the rows into groups — one group per distinct value of the grouping column(s) — and runs the aggregate function separately on each group. The result has one row per group.
| genre | book_count | avg_price | total_stock |
|---|---|---|---|
| Non-fiction | 2 | 9.49 | 49 |
| Sci-fi | 2 | 9.24 | 36 |
| Fiction | 1 | 8.99 | 42 |
| Mystery | 1 | 6.99 | 55 |
Four genres, four result rows. MySQL grouped all Sci-fi books together, calculated COUNT/AVG/SUM for that group, then repeated for each genre.
Grouping by multiple columns
You can group by two or more columns to get a cross-tabulation. Each unique combination of the grouped columns becomes its own row:
The golden rule of GROUP BY
4. HAVING — Filtering Groups
WHERE filters individual rows before grouping. HAVING filters groups after the aggregate has been calculated. It's the only place you can reference aggregate functions in a filter condition.
| genre | book_count | avg_price |
|---|---|---|
| Non-fiction | 2 | 9.49 |
| Sci-fi | 2 | 9.24 |
Fiction (1 book) and Mystery (1 book) were excluded by HAVING COUNT(*) > 1.
WHERE vs HAVING — the key distinction
- Use for column values:
WHERE price > 5 - Use for date ranges:
WHERE order_date > '2024-01-01' - Use for row-level filters:
WHERE genre IS NOT NULL - Faster — reduces rows before the costly grouping step
- Use for aggregate results:
HAVING COUNT(*) > 1 - Use for group totals:
HAVING SUM(quantity) >= 10 - Use for group averages:
HAVING AVG(price) < 8 - Can also filter by grouped columns, but WHERE is faster for that
5. Full Clause Execution Order
MySQL evaluates clauses in a fixed logical order that is different from the order you write them. Understanding this explains every "column not found" and "can't use aggregate in WHERE" error:
6. Practical Patterns
Revenue per customer
| customer_id | order_count | total_spent | avg_order_value |
|---|---|---|---|
| 1 | 2 | 28.97 | 14.49 |
| 3 | 1 | 20.97 | 20.97 |
| 2 | 1 | 9.99 | 9.99 |
| 4 | 1 | 7.99 | 7.99 |
Customer 1 (Alice) has placed two orders totalling £28.97 — the highest spender.
Finding customers who spent more than £15 total
| customer_id | total_spent |
|---|---|
| 1 | 28.97 |
| 3 | 20.97 |
Books per author — minimum one book
| author_id | books_published | first_year | latest_year | total_copies_in_stock |
|---|---|---|---|---|
| 1 | 2 | 2015 | 2020 | 60 |
| 2 | 1 | 1965 | 1965 | 27 |
| 3 | 1 | 2011 | 2011 | 31 |
| 4 | 1 | 1939 | 1939 | 55 |
| 5 | 1 | 1969 | 1969 | 9 |
Author 1 (Matt Haig) has two books in the catalogue. IDs rather than names appear here — Chapter 8 (JOINs) will combine both tables to show the name alongside.
Counting NULL vs non-NULL — COUNT(*) vs COUNT(col)
| total_authors | with_nationality | unknown_nationality |
|---|---|---|
| 5 | 4 | 1 |
One author has no nationality recorded (Ursula Le Guin's nationality was set to NULL in Chapter 5). COUNT(*) caught it; COUNT(nationality) skipped it.
WITH ROLLUP — adding a grand total row
WITH ROLLUP is a GROUP BY modifier that appends an extra summary row
for each grouping level, with the final row representing the grand total across all
groups:
| genre | book_count | total_stock |
|---|---|---|
| Fiction | 1 | 42 |
| Mystery | 1 | 55 |
| Non-fiction | 2 | 49 |
| Sci-fi | 2 | 36 |
| TOTAL | 6 | 182 |
The final row is the grand total across all genres — added automatically by WITH ROLLUP.
Chapter Summary
| Concept | Key points |
|---|---|
| COUNT(*) | Counts all rows, including those with NULL values in any column. |
| COUNT(col) | Counts only non-NULL values in that column. Use COUNT(*) − COUNT(col) to count NULLs. |
| SUM / AVG / MIN / MAX | All ignore NULL values. AVG denominates by non-NULL count — not total row count. |
| COUNT(DISTINCT col) | Counts unique non-NULL values — useful for "how many different X" questions. |
| GROUP BY | Splits rows into groups; aggregates run per group. Every non-aggregated SELECT column must appear in GROUP BY. |
| HAVING | Filters groups after aggregation. The only place you can use aggregate functions in a condition. WHERE filters rows; HAVING filters groups. |
| Execution order | FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Aliases from SELECT are only available in ORDER BY. |
| WITH ROLLUP | Appends subtotal and grand total rows to a GROUP BY result. ROLLUP sets grouping columns to NULL in summary rows — use IFNULL to label them. |