Chapter 7

Chapter 7 — Aggregate Functions and GROUP BY

Every query so far has returned one row per row in the table. Aggregate functions change that — they collapse many rows into a single summary value. Combined with GROUP BY, they let you produce per-category summaries: total sales per genre, average price per author, number of orders per customer. This chapter covers the five core aggregate functions, how GROUP BY works, and how HAVING lets you filter those summaries.

1. The Five Core Aggregate Functions

COUNT()
COUNT(*) or COUNT(col)

Count rows. COUNT(*) counts all rows including NULLs. COUNT(col) counts only non-NULL values in that column.

SUM()
SUM(col)

Total of all non-NULL values. Returns NULL if all values are NULL (use COALESCE to default to 0).

AVG()
AVG(col)

Mean of all non-NULL values. NULLs are excluded from both numerator and denominator — they don't count as zero.

MIN()
MIN(col)

Smallest value. Works on numbers, strings (alphabetical), and dates. Ignores NULLs.

MAX()
MAX(col)

Largest value. Works on numbers, strings, and dates. Ignores NULLs.

COUNT(DISTINCT)
COUNT(DISTINCT col)

Count unique non-NULL values. Useful for "how many different genres do we stock?" rather than "how many books?"

Aggregates across the whole table

Without GROUP BY, an aggregate function collapses the entire table into one row:

SELECT COUNT(*) AS total_books, COUNT(genre) AS books_with_genre, -- excludes NULLs COUNT(DISTINCT genre) AS unique_genres, SUM(stock) AS total_stock, AVG(price) AS avg_price, MIN(price) AS cheapest, MAX(price) AS most_expensive, MIN(published_year) AS oldest_year, MAX(published_year) AS newest_year FROM books;
total_booksbooks_with_genreunique_genrestotal_stockavg_pricecheapestmost_expensiveoldest_yearnewest_year
6641829.076.9910.9919392020

All six books collapsed into one summary row. avg_price is rounded here for display — MySQL returns the full decimal.

2. NULL Behaviour in Aggregates

All aggregate functions silently ignore NULL values — except COUNT(*) which counts rows regardless. This asymmetry causes a common mistake with AVG:

-- Suppose 2 of our 6 books have no stock value (NULL) -- These two queries give DIFFERENT answers: SELECT AVG(stock) AS avg_stock FROM books; -- MySQL: ignores the 2 NULLs, averages the 4 known values -- Result: 30.25 (121 / 4) SELECT SUM(stock) / COUNT(*) AS avg_stock FROM books; -- Divides by 6 (all rows), treating NULLs as if stock were 0 -- Result: 20.17 (121 / 6) -- If you mean "NULL stock counts as zero in the average", be explicit: SELECT AVG(IFNULL(stock, 0)) AS avg_stock FROM books; -- Result: 20.17 — matches SUM/COUNT(*) calculation
AVG skips NULLs — it doesn't treat them as zero. If stock is NULL for some books and you want those to contribute 0 to the average, wrap the column in IFNULL(stock, 0). If you genuinely want to average only books where stock is known, AVG(stock) is correct as-is.

3. GROUP BY — Per-Category Summaries

GROUP BY splits the rows into groups — one group per distinct value of the grouping column(s) — and runs the aggregate function separately on each group. The result has one row per group.

-- How many books and what is the average price per genre? SELECT genre, COUNT(*) AS book_count, AVG(price) AS avg_price, SUM(stock) AS total_stock FROM books GROUP BY genre ORDER BY book_count DESC;
genrebook_countavg_pricetotal_stock
Non-fiction29.4949
Sci-fi29.2436
Fiction18.9942
Mystery16.9955

Four genres, four result rows. MySQL grouped all Sci-fi books together, calculated COUNT/AVG/SUM for that group, then repeated for each genre.

Grouping by multiple columns

You can group by two or more columns to get a cross-tabulation. Each unique combination of the grouped columns becomes its own row:

-- Orders per customer per book SELECT customer_id, book_id, COUNT(*) AS times_ordered, SUM(quantity) AS total_qty FROM orders GROUP BY customer_id, book_id ORDER BY customer_id;

The golden rule of GROUP BY

Every column in SELECT must either be in GROUP BY, or wrapped in an aggregate function. If you put a non-grouped, non-aggregated column in SELECT, MySQL (in strict mode) will error. In non-strict mode it picks an arbitrary value from the group — which is almost certainly wrong.
-- ✗ WRONG: title is not in GROUP BY and not aggregated SELECT genre, title, COUNT(*) FROM books GROUP BY genre; -- ERROR 1055: 'title' is not in GROUP BY and is not aggregated -- ✓ CORRECT option 1: add title to GROUP BY (group by both) SELECT genre, title, COUNT(*) FROM books GROUP BY genre, title; -- ✓ CORRECT option 2: aggregate title instead SELECT genre, MIN(title) AS sample_title, COUNT(*) FROM books GROUP BY genre;

4. HAVING — Filtering Groups

WHERE filters individual rows before grouping. HAVING filters groups after the aggregate has been calculated. It's the only place you can reference aggregate functions in a filter condition.

-- Only show genres with more than one book and average price above £8 SELECT genre, COUNT(*) AS book_count, AVG(price) AS avg_price FROM books GROUP BY genre HAVING COUNT(*) > 1 AND AVG(price) > 8.00 ORDER BY avg_price DESC;
genrebook_countavg_price
Non-fiction29.49
Sci-fi29.24

Fiction (1 book) and Mystery (1 book) were excluded by HAVING COUNT(*) > 1.

WHERE vs HAVING — the key distinction

WHERE — filters rows before grouping
Runs before GROUP BY. Each row is evaluated individually. Cannot reference aggregate functions — the aggregates don't exist yet.
  • Use for column values: WHERE price > 5
  • Use for date ranges: WHERE order_date > '2024-01-01'
  • Use for row-level filters: WHERE genre IS NOT NULL
  • Faster — reduces rows before the costly grouping step
HAVING — filters groups after aggregating
Runs after GROUP BY. Each group is evaluated. Can (and usually does) reference aggregate functions.
  • Use for aggregate results: HAVING COUNT(*) > 1
  • Use for group totals: HAVING SUM(quantity) >= 10
  • Use for group averages: HAVING AVG(price) < 8
  • Can also filter by grouped columns, but WHERE is faster for that
-- Combined WHERE + GROUP BY + HAVING + ORDER BY -- "For fiction and sci-fi books only, which genres have average stock above 20?" SELECT genre, COUNT(*) AS book_count, AVG(stock) AS avg_stock, SUM(stock) AS total_stock FROM books WHERE genre IN ('Fiction', 'Sci-fi') -- row filter: only these genres enter grouping GROUP BY genre HAVING AVG(stock) > 20 -- group filter: only groups with high avg stock ORDER BY avg_stock DESC;

5. Full Clause Execution Order

MySQL evaluates clauses in a fixed logical order that is different from the order you write them. Understanding this explains every "column not found" and "can't use aggregate in WHERE" error:

1
FROM
Identify the source table(s) and load the rows
2
WHERE
Filter individual rows — aggregates don't exist yet
3
GROUP BY
Partition surviving rows into groups
4
HAVING
Filter groups — aggregates are now calculated and available
5
SELECT
Choose columns, compute expressions, assign aliases
6
ORDER BY
Sort the final result — aliases from SELECT are now available
7
LIMIT / OFFSET
Trim the result to the requested page
Why can't I use a SELECT alias in WHERE or HAVING? Because SELECT runs after both WHERE and HAVING. The alias doesn't exist at the time WHERE and HAVING are evaluated. You can use aliases in ORDER BY because that runs after SELECT. To reuse a calculation in HAVING without repeating it, some databases support column aliases in HAVING — MySQL does allow this as an extension, but it's not standard SQL and can behave unexpectedly. Repeating the expression is safer.

6. Practical Patterns

Revenue per customer

SELECT customer_id, COUNT(*) AS order_count, SUM(total_price) AS total_spent, AVG(total_price) AS avg_order_value FROM orders GROUP BY customer_id ORDER BY total_spent DESC;
customer_idorder_counttotal_spentavg_order_value
1228.9714.49
3120.9720.97
219.999.99
417.997.99

Customer 1 (Alice) has placed two orders totalling £28.97 — the highest spender.

Finding customers who spent more than £15 total

SELECT customer_id, SUM(total_price) AS total_spent FROM orders GROUP BY customer_id HAVING SUM(total_price) > 15.00 ORDER BY total_spent DESC;
customer_idtotal_spent
128.97
320.97

Books per author — minimum one book

SELECT author_id, COUNT(*) AS books_published, MIN(published_year) AS first_year, MAX(published_year) AS latest_year, SUM(stock) AS total_copies_in_stock FROM books GROUP BY author_id ORDER BY books_published DESC;
author_idbooks_publishedfirst_yearlatest_yeartotal_copies_in_stock
122015202060
211965196527
312011201131
411939193955
51196919699

Author 1 (Matt Haig) has two books in the catalogue. IDs rather than names appear here — Chapter 8 (JOINs) will combine both tables to show the name alongside.

Counting NULL vs non-NULL — COUNT(*) vs COUNT(col)

-- How many authors have a nationality recorded vs unknown? SELECT COUNT(*) AS total_authors, COUNT(nationality) AS with_nationality, COUNT(*) - COUNT(nationality) AS unknown_nationality FROM authors;
total_authorswith_nationalityunknown_nationality
541

One author has no nationality recorded (Ursula Le Guin's nationality was set to NULL in Chapter 5). COUNT(*) caught it; COUNT(nationality) skipped it.

WITH ROLLUP — adding a grand total row

WITH ROLLUP is a GROUP BY modifier that appends an extra summary row for each grouping level, with the final row representing the grand total across all groups:

SELECT IFNULL(genre, 'TOTAL') AS genre, COUNT(*) AS book_count, SUM(stock) AS total_stock FROM books GROUP BY genre WITH ROLLUP; -- ROLLUP sets the grouping column to NULL in the summary rows -- IFNULL converts NULL → 'TOTAL' for the label
genrebook_counttotal_stock
Fiction142
Mystery155
Non-fiction249
Sci-fi236
TOTAL6182

The final row is the grand total across all genres — added automatically by WITH ROLLUP.

Chapter Summary

ConceptKey points
COUNT(*)Counts all rows, including those with NULL values in any column.
COUNT(col)Counts only non-NULL values in that column. Use COUNT(*) − COUNT(col) to count NULLs.
SUM / AVG / MIN / MAXAll ignore NULL values. AVG denominates by non-NULL count — not total row count.
COUNT(DISTINCT col)Counts unique non-NULL values — useful for "how many different X" questions.
GROUP BYSplits rows into groups; aggregates run per group. Every non-aggregated SELECT column must appear in GROUP BY.
HAVINGFilters groups after aggregation. The only place you can use aggregate functions in a condition. WHERE filters rows; HAVING filters groups.
Execution orderFROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Aliases from SELECT are only available in ORDER BY.
WITH ROLLUPAppends subtotal and grand total rows to a GROUP BY result. ROLLUP sets grouping columns to NULL in summary rows — use IFNULL to label them.
Next: Chapter 8 — JOINs. The queries in this chapter returned author IDs rather than author names — because the name lives in a different table. JOINs combine rows from two or more tables based on a shared key. Chapter 8 covers INNER JOIN, LEFT JOIN, RIGHT JOIN, self-joins, and the difference between them.