Chapter 7

Chapter 7 — Aggregate Functions and GROUP BY

Every query so far has returned one row per row in the table. Aggregate functions change that — they collapse many rows into a single summary value. Combined with GROUP BY, they let you produce per-category summaries: total sales per genre, average price per author, number of orders per customer. This chapter covers the five core aggregate functions, how GROUP BY works, and how HAVING lets you filter those summaries.

1. The Five Core Aggregate Functions

COUNT()

COUNT(*) or COUNT(col)

Count rows. COUNT(*) counts all rows including NULLs. COUNT(col) counts only non-NULL values in that column.

SUM()

SUM(col)

Total of all non-NULL values. Returns NULL if all values are NULL (use COALESCE to default to 0).

AVG()

AVG(col)

Mean of all non-NULL values. NULLs are excluded from both numerator and denominator — they don't count as zero.

MIN()

MIN(col)

Smallest value. Works on numbers, strings (alphabetical), and dates. Ignores NULLs.

MAX()

MAX(col)

Largest value. Works on numbers, strings, and dates. Ignores NULLs.

COUNT(DISTINCT)

COUNT(DISTINCT col)

Count unique non-NULL values. Useful for "how many different genres do we stock?" rather than "how many books?"

Aggregates across the whole table

Without GROUP BY, an aggregate function collapses the entire table into one row:

SELECT COUNT(*) AS total_books, COUNT(genre) AS books_with_genre, -- excludes NULLs COUNT(DISTINCT genre) AS unique_genres, SUM(stock) AS total_stock, AVG(price) AS avg_price, MIN(price) AS cheapest, MAX(price) AS most_expensive, MIN(published_year) AS oldest_year, MAX(published_year) AS newest_year FROM books;

total_books	books_with_genre	unique_genres	total_stock	avg_price	cheapest	most_expensive	oldest_year	newest_year
6	6	4	182	9.07	6.99	10.99	1939	2020

All six books collapsed into one summary row. avg_price is rounded here for display — MySQL returns the full decimal.

2. NULL Behaviour in Aggregates

All aggregate functions silently ignore NULL values — except COUNT(*) which counts rows regardless. This asymmetry causes a common mistake with AVG:

-- Suppose 2 of our 6 books have no stock value (NULL) -- These two queries give DIFFERENT answers: SELECT AVG(stock) AS avg_stock FROM books; -- MySQL: ignores the 2 NULLs, averages the 4 known values -- Result: 30.25 (121 / 4) SELECT SUM(stock) / COUNT(*) AS avg_stock FROM books; -- Divides by 6 (all rows), treating NULLs as if stock were 0 -- Result: 20.17 (121 / 6) -- If you mean "NULL stock counts as zero in the average", be explicit: SELECT AVG(IFNULL(stock, 0)) AS avg_stock FROM books; -- Result: 20.17 — matches SUM/COUNT(*) calculation

AVG skips NULLs — it doesn't treat them as zero. If stock is NULL for some books and you want those to contribute 0 to the average, wrap the column in IFNULL(stock, 0). If you genuinely want to average only books where stock is known, AVG(stock) is correct as-is.

3. GROUP BY — Per-Category Summaries

GROUP BY splits the rows into groups — one group per distinct value of the grouping column(s) — and runs the aggregate function separately on each group. The result has one row per group.

-- How many books and what is the average price per genre? SELECT genre, COUNT(*) AS book_count, AVG(price) AS avg_price, SUM(stock) AS total_stock FROM books GROUP BY genre ORDER BY book_count DESC;

genre	book_count	avg_price	total_stock
Non-fiction	2	9.49	49
Sci-fi	2	9.24	36
Fiction	1	8.99	42
Mystery	1	6.99	55

Four genres, four result rows. MySQL grouped all Sci-fi books together, calculated COUNT/AVG/SUM for that group, then repeated for each genre.

Grouping by multiple columns

You can group by two or more columns to get a cross-tabulation. Each unique combination of the grouped columns becomes its own row:

-- Orders per customer per book SELECT customer_id, book_id, COUNT(*) AS times_ordered, SUM(quantity) AS total_qty FROM orders GROUP BY customer_id, book_id ORDER BY customer_id;

The golden rule of GROUP BY

Every column in SELECT must either be in GROUP BY, or wrapped in an aggregate function. If you put a non-grouped, non-aggregated column in SELECT, MySQL (in strict mode) will error. In non-strict mode it picks an arbitrary value from the group — which is almost certainly wrong.

-- ✗ WRONG: title is not in GROUP BY and not aggregated SELECT genre, title, COUNT(*) FROM books GROUP BY genre; -- ERROR 1055: 'title' is not in GROUP BY and is not aggregated -- ✓ CORRECT option 1: add title to GROUP BY (group by both) SELECT genre, title, COUNT(*) FROM books GROUP BY genre, title; -- ✓ CORRECT option 2: aggregate title instead SELECT genre, MIN(title) AS sample_title, COUNT(*) FROM books GROUP BY genre;

4. HAVING — Filtering Groups

WHERE filters individual rows before grouping. HAVING filters groups after the aggregate has been calculated. It's the only place you can reference aggregate functions in a filter condition.

-- Only show genres with more than one book and average price above £8 SELECT genre, COUNT(*) AS book_count, AVG(price) AS avg_price FROM books GROUP BY genre HAVING COUNT(*) > 1 AND AVG(price) > 8.00 ORDER BY avg_price DESC;

genre	book_count	avg_price
Non-fiction	2	9.49
Sci-fi	2	9.24

Fiction (1 book) and Mystery (1 book) were excluded by HAVING COUNT(*) > 1.

WHERE vs HAVING — the key distinction

WHERE — filters rows before grouping

Runs before GROUP BY. Each row is evaluated individually. Cannot reference aggregate functions — the aggregates don't exist yet.

Use for column values: WHERE price > 5
Use for date ranges: WHERE order_date > '2024-01-01'
Use for row-level filters: WHERE genre IS NOT NULL
Faster — reduces rows before the costly grouping step

HAVING — filters groups after aggregating

Runs after GROUP BY. Each group is evaluated. Can (and usually does) reference aggregate functions.

Use for aggregate results: HAVING COUNT(*) > 1
Use for group totals: HAVING SUM(quantity) >= 10
Use for group averages: HAVING AVG(price) < 8
Can also filter by grouped columns, but WHERE is faster for that

-- Combined WHERE + GROUP BY + HAVING + ORDER BY -- "For fiction and sci-fi books only, which genres have average stock above 20?" SELECT genre, COUNT(*) AS book_count, AVG(stock) AS avg_stock, SUM(stock) AS total_stock FROM books WHERE genre IN ('Fiction', 'Sci-fi') -- row filter: only these genres enter grouping GROUP BY genre HAVING AVG(stock) > 20 -- group filter: only groups with high avg stock ORDER BY avg_stock DESC;

5. Full Clause Execution Order

MySQL evaluates clauses in a fixed logical order that is different from the order you write them. Understanding this explains every "column not found" and "can't use aggregate in WHERE" error:

FROM

Identify the source table(s) and load the rows

WHERE

Filter individual rows — aggregates don't exist yet

GROUP BY

Partition surviving rows into groups

HAVING

Filter groups — aggregates are now calculated and available

SELECT

Choose columns, compute expressions, assign aliases

ORDER BY

Sort the final result — aliases from SELECT are now available

LIMIT / OFFSET

Trim the result to the requested page

Why can't I use a SELECT alias in WHERE or HAVING? Because SELECT runs after both WHERE and HAVING. The alias doesn't exist at the time WHERE and HAVING are evaluated. You can use aliases in ORDER BY because that runs after SELECT. To reuse a calculation in HAVING without repeating it, some databases support column aliases in HAVING — MySQL does allow this as an extension, but it's not standard SQL and can behave unexpectedly. Repeating the expression is safer.

6. Practical Patterns

Revenue per customer

SELECT customer_id, COUNT(*) AS order_count, SUM(total_price) AS total_spent, AVG(total_price) AS avg_order_value FROM orders GROUP BY customer_id ORDER BY total_spent DESC;

customer_id	order_count	total_spent	avg_order_value
1	2	28.97	14.49
3	1	20.97	20.97
2	1	9.99	9.99
4	1	7.99	7.99

Customer 1 (Alice) has placed two orders totalling £28.97 — the highest spender.

Finding customers who spent more than £15 total

SELECT customer_id, SUM(total_price) AS total_spent FROM orders GROUP BY customer_id HAVING SUM(total_price) > 15.00 ORDER BY total_spent DESC;

customer_id	total_spent
1	28.97
3	20.97

Books per author — minimum one book

SELECT author_id, COUNT(*) AS books_published, MIN(published_year) AS first_year, MAX(published_year) AS latest_year, SUM(stock) AS total_copies_in_stock FROM books GROUP BY author_id ORDER BY books_published DESC;

author_id	books_published	first_year	latest_year	total_copies_in_stock
1	2	2015	2020	60
2	1	1965	1965	27
3	1	2011	2011	31
4	1	1939	1939	55
5	1	1969	1969	9

Author 1 (Matt Haig) has two books in the catalogue. IDs rather than names appear here — Chapter 8 (JOINs) will combine both tables to show the name alongside.

Counting NULL vs non-NULL — COUNT(*) vs COUNT(col)

-- How many authors have a nationality recorded vs unknown? SELECT COUNT(*) AS total_authors, COUNT(nationality) AS with_nationality, COUNT(*) - COUNT(nationality) AS unknown_nationality FROM authors;

total_authors	with_nationality	unknown_nationality
5	4	1

One author has no nationality recorded (Ursula Le Guin's nationality was set to NULL in Chapter 5). COUNT(*) caught it; COUNT(nationality) skipped it.

WITH ROLLUP — adding a grand total row

WITH ROLLUP is a GROUP BY modifier that appends an extra summary row for each grouping level, with the final row representing the grand total across all groups:

SELECT IFNULL(genre, 'TOTAL') AS genre, COUNT(*) AS book_count, SUM(stock) AS total_stock FROM books GROUP BY genre WITH ROLLUP; -- ROLLUP sets the grouping column to NULL in the summary rows -- IFNULL converts NULL → 'TOTAL' for the label

genre	book_count	total_stock
Fiction	1	42
Mystery	1	55
Non-fiction	2	49
Sci-fi	2	36
TOTAL	6	182

The final row is the grand total across all genres — added automatically by WITH ROLLUP.

Chapter Summary

Concept	Key points
COUNT(*)	Counts all rows, including those with NULL values in any column.
COUNT(col)	Counts only non-NULL values in that column. Use COUNT(*) − COUNT(col) to count NULLs.
SUM / AVG / MIN / MAX	All ignore NULL values. AVG denominates by non-NULL count — not total row count.
COUNT(DISTINCT col)	Counts unique non-NULL values — useful for "how many different X" questions.
GROUP BY	Splits rows into groups; aggregates run per group. Every non-aggregated SELECT column must appear in GROUP BY.
HAVING	Filters groups after aggregation. The only place you can use aggregate functions in a condition. WHERE filters rows; HAVING filters groups.
Execution order	FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Aliases from SELECT are only available in ORDER BY.
WITH ROLLUP	Appends subtotal and grand total rows to a GROUP BY result. ROLLUP sets grouping columns to NULL in summary rows — use IFNULL to label them.

Next: Chapter 8 — JOINs. The queries in this chapter returned author IDs rather than author names — because the name lives in a different table. JOINs combine rows from two or more tables based on a shared key. Chapter 8 covers INNER JOIN, LEFT JOIN, RIGHT JOIN, self-joins, and the difference between them.