Every query so far has returned one row per row in the table. Aggregate functions
change that — they collapse many rows into a single summary value. Combined with
GROUP BY, they let you produce per-category summaries: total sales per
genre, average price per author, number of orders per customer. This chapter covers
the five core aggregate functions, how GROUP BY works, and how HAVING lets you filter
those summaries.
1. The Five Core Aggregate Functions
COUNT()
COUNT(*) or COUNT(col)
Count rows. COUNT(*) counts all rows including NULLs. COUNT(col) counts only non-NULL values in that column.
SUM()
SUM(col)
Total of all non-NULL values. Returns NULL if all values are NULL (use COALESCE to default to 0).
AVG()
AVG(col)
Mean of all non-NULL values. NULLs are excluded from both numerator and denominator — they don't count as zero.
MIN()
MIN(col)
Smallest value. Works on numbers, strings (alphabetical), and dates. Ignores NULLs.
MAX()
MAX(col)
Largest value. Works on numbers, strings, and dates. Ignores NULLs.
COUNT(DISTINCT)
COUNT(DISTINCT col)
Count unique non-NULL values. Useful for "how many different genres do we stock?" rather than "how many books?"
Aggregates across the whole table
Without GROUP BY, an aggregate function collapses the entire table into one row:
SELECTCOUNT(*) AS total_books,
COUNT(genre) AS books_with_genre, -- excludes NULLsCOUNT(DISTINCTgenre) AS unique_genres,
SUM(stock) AS total_stock,
AVG(price) AS avg_price,
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
MIN(published_year) AS oldest_year,
MAX(published_year) AS newest_year
FROMbooks;
total_books
books_with_genre
unique_genres
total_stock
avg_price
cheapest
most_expensive
oldest_year
newest_year
6
6
4
182
9.07
6.99
10.99
1939
2020
All six books collapsed into one summary row. avg_price is rounded here for display — MySQL returns the full decimal.
2. NULL Behaviour in Aggregates
All aggregate functions silently ignore NULL values — except COUNT(*)
which counts rows regardless. This asymmetry causes a common mistake with AVG:
-- Suppose 2 of our 6 books have no stock value (NULL)-- These two queries give DIFFERENT answers:SELECTAVG(stock) AS avg_stock FROMbooks;
-- MySQL: ignores the 2 NULLs, averages the 4 known values-- Result: 30.25 (121 / 4)SELECTSUM(stock) /COUNT(*) AS avg_stock FROMbooks;
-- Divides by 6 (all rows), treating NULLs as if stock were 0-- Result: 20.17 (121 / 6)-- If you mean "NULL stock counts as zero in the average", be explicit:SELECTAVG(IFNULL(stock, 0)) AS avg_stock FROMbooks;
-- Result: 20.17 — matches SUM/COUNT(*) calculation
AVG skips NULLs — it doesn't treat them as zero. If stock is NULL
for some books and you want those to contribute 0 to the average, wrap the column
in IFNULL(stock, 0). If you genuinely want to average only books where
stock is known, AVG(stock) is correct as-is.
3. GROUP BY — Per-Category Summaries
GROUP BY splits the rows into groups — one group per distinct value of the grouping
column(s) — and runs the aggregate function separately on each group. The result has
one row per group.
-- How many books and what is the average price per genre?SELECTgenre,
COUNT(*) AS book_count,
AVG(price) AS avg_price,
SUM(stock) AS total_stock
FROMbooksGROUP BYgenreORDER BYbook_countDESC;
genre
book_count
avg_price
total_stock
Non-fiction
2
9.49
49
Sci-fi
2
9.24
36
Fiction
1
8.99
42
Mystery
1
6.99
55
Four genres, four result rows. MySQL grouped all Sci-fi books together, calculated COUNT/AVG/SUM for that group, then repeated for each genre.
Grouping by multiple columns
You can group by two or more columns to get a cross-tabulation. Each unique
combination of the grouped columns becomes its own row:
-- Orders per customer per bookSELECTcustomer_id,
book_id,
COUNT(*) AS times_ordered,
SUM(quantity) AS total_qty
FROMordersGROUP BYcustomer_id, book_idORDER BYcustomer_id;
The golden rule of GROUP BY
Every column in SELECT must either be in GROUP BY, or wrapped in an aggregate function.
If you put a non-grouped, non-aggregated column in SELECT, MySQL (in strict mode)
will error. In non-strict mode it picks an arbitrary value from the group — which
is almost certainly wrong.
-- ✗ WRONG: title is not in GROUP BY and not aggregatedSELECTgenre, title, COUNT(*) FROMbooksGROUP BYgenre;
-- ERROR 1055: 'title' is not in GROUP BY and is not aggregated-- ✓ CORRECT option 1: add title to GROUP BY (group by both)SELECTgenre, title, COUNT(*) FROMbooksGROUP BYgenre, title;
-- ✓ CORRECT option 2: aggregate title insteadSELECTgenre, MIN(title) AS sample_title, COUNT(*) FROMbooksGROUP BYgenre;
4. HAVING — Filtering Groups
WHERE filters individual rows before grouping. HAVING filters groups after the
aggregate has been calculated. It's the only place you can reference aggregate
functions in a filter condition.
-- Only show genres with more than one book and average price above £8SELECTgenre,
COUNT(*) AS book_count,
AVG(price) AS avg_price
FROMbooksGROUP BYgenreHAVINGCOUNT(*) >1ANDAVG(price) >8.00ORDER BYavg_priceDESC;
genre
book_count
avg_price
Non-fiction
2
9.49
Sci-fi
2
9.24
Fiction (1 book) and Mystery (1 book) were excluded by HAVING COUNT(*) > 1.
WHERE vs HAVING — the key distinction
WHERE — filters rows before grouping
Runs before GROUP BY. Each row is evaluated individually.
Cannot reference aggregate functions — the aggregates don't exist yet.
Use for column values: WHERE price > 5
Use for date ranges: WHERE order_date > '2024-01-01'
Use for row-level filters: WHERE genre IS NOT NULL
Faster — reduces rows before the costly grouping step
HAVING — filters groups after aggregating
Runs after GROUP BY. Each group is evaluated.
Can (and usually does) reference aggregate functions.
Use for aggregate results: HAVING COUNT(*) > 1
Use for group totals: HAVING SUM(quantity) >= 10
Use for group averages: HAVING AVG(price) < 8
Can also filter by grouped columns, but WHERE is faster for that
-- Combined WHERE + GROUP BY + HAVING + ORDER BY-- "For fiction and sci-fi books only, which genres have average stock above 20?"SELECTgenre,
COUNT(*) AS book_count,
AVG(stock) AS avg_stock,
SUM(stock) AS total_stock
FROMbooksWHEREgenreIN ('Fiction', 'Sci-fi') -- row filter: only these genres enter groupingGROUP BYgenreHAVINGAVG(stock) >20-- group filter: only groups with high avg stockORDER BYavg_stockDESC;
5. Full Clause Execution Order
MySQL evaluates clauses in a fixed logical order that is different from the order
you write them. Understanding this explains every "column not found" and "can't use
aggregate in WHERE" error:
Sort the final result — aliases from SELECT are now available
7
LIMIT / OFFSET
Trim the result to the requested page
Why can't I use a SELECT alias in WHERE or HAVING?
Because SELECT runs after both WHERE and HAVING. The alias doesn't exist at the
time WHERE and HAVING are evaluated. You can use aliases in ORDER BY because that
runs after SELECT. To reuse a calculation in HAVING without repeating it, some
databases support column aliases in HAVING — MySQL does allow this as an extension,
but it's not standard SQL and can behave unexpectedly. Repeating the expression
is safer.
6. Practical Patterns
Revenue per customer
SELECTcustomer_id,
COUNT(*) AS order_count,
SUM(total_price) AS total_spent,
AVG(total_price) AS avg_order_value
FROMordersGROUP BYcustomer_idORDER BYtotal_spentDESC;
customer_id
order_count
total_spent
avg_order_value
1
2
28.97
14.49
3
1
20.97
20.97
2
1
9.99
9.99
4
1
7.99
7.99
Customer 1 (Alice) has placed two orders totalling £28.97 — the highest spender.
Finding customers who spent more than £15 total
SELECTcustomer_id,
SUM(total_price) AS total_spent
FROMordersGROUP BYcustomer_idHAVINGSUM(total_price) >15.00ORDER BYtotal_spentDESC;
customer_id
total_spent
1
28.97
3
20.97
Books per author — minimum one book
SELECTauthor_id,
COUNT(*) AS books_published,
MIN(published_year) AS first_year,
MAX(published_year) AS latest_year,
SUM(stock) AS total_copies_in_stock
FROMbooksGROUP BYauthor_idORDER BYbooks_publishedDESC;
author_id
books_published
first_year
latest_year
total_copies_in_stock
1
2
2015
2020
60
2
1
1965
1965
27
3
1
2011
2011
31
4
1
1939
1939
55
5
1
1969
1969
9
Author 1 (Matt Haig) has two books in the catalogue. IDs rather than names appear here — Chapter 8 (JOINs) will combine both tables to show the name alongside.
Counting NULL vs non-NULL — COUNT(*) vs COUNT(col)
-- How many authors have a nationality recorded vs unknown?SELECTCOUNT(*) AS total_authors,
COUNT(nationality) AS with_nationality,
COUNT(*) -COUNT(nationality) AS unknown_nationality
FROMauthors;
total_authors
with_nationality
unknown_nationality
5
4
1
One author has no nationality recorded (Ursula Le Guin's nationality was set to NULL in Chapter 5). COUNT(*) caught it; COUNT(nationality) skipped it.
WITH ROLLUP — adding a grand total row
WITH ROLLUP is a GROUP BY modifier that appends an extra summary row
for each grouping level, with the final row representing the grand total across all
groups:
SELECTIFNULL(genre, 'TOTAL') AS genre,
COUNT(*) AS book_count,
SUM(stock) AS total_stock
FROMbooksGROUP BYgenreWITH ROLLUP;
-- ROLLUP sets the grouping column to NULL in the summary rows-- IFNULL converts NULL → 'TOTAL' for the label
genre
book_count
total_stock
Fiction
1
42
Mystery
1
55
Non-fiction
2
49
Sci-fi
2
36
TOTAL
6
182
The final row is the grand total across all genres — added automatically by WITH ROLLUP.
Chapter Summary
Concept
Key points
COUNT(*)
Counts all rows, including those with NULL values in any column.
COUNT(col)
Counts only non-NULL values in that column. Use COUNT(*) − COUNT(col) to count NULLs.
SUM / AVG / MIN / MAX
All ignore NULL values. AVG denominates by non-NULL count — not total row count.
COUNT(DISTINCT col)
Counts unique non-NULL values — useful for "how many different X" questions.
GROUP BY
Splits rows into groups; aggregates run per group. Every non-aggregated SELECT column must appear in GROUP BY.
HAVING
Filters groups after aggregation. The only place you can use aggregate functions in a condition. WHERE filters rows; HAVING filters groups.
Execution order
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Aliases from SELECT are only available in ORDER BY.
WITH ROLLUP
Appends subtotal and grand total rows to a GROUP BY result. ROLLUP sets grouping columns to NULL in summary rows — use IFNULL to label them.
Next: Chapter 8 — JOINs.
The queries in this chapter returned author IDs rather than author names — because
the name lives in a different table. JOINs combine rows from two or more tables based
on a shared key. Chapter 8 covers INNER JOIN, LEFT JOIN, RIGHT JOIN, self-joins, and
the difference between them.
Every chapter so far has queried a single table. Real databases split data across
multiple related tables — orders live in orders, the customer name lives
in customers, the book title lives in books. A JOIN combines
rows from two or more tables based on a matching column, letting you query that
connected data as if it were one result set.
Why split data across tables? If we stored the customer name inside
every order row, changing a customer's name would require updating every one of their
orders — and any that were missed would have inconsistent data. By keeping names in
one place (customers) and referencing them by ID, a single UPDATE fixes
everything. JOINs are the mechanism that puts it back together at query time.
1. Table Aliases — Keeping Queries Readable
JOIN queries reference two or more tables. Qualifying every column name with the
full table name (orders.customer_id, customers.customer_id)
gets verbose fast. A table alias — a short nickname assigned in the FROM clause —
keeps things concise:
-- Without aliases — verboseSELECTorders.order_id, customers.first_name, customers.last_nameFROMordersJOINcustomersONorders.customer_id=customers.customer_id;
-- With aliases — the standard waySELECTo.order_id, c.first_name, c.last_nameFROMordersoJOINcustomerscONo.customer_id=c.customer_id;
The alias follows the table name in the FROM or JOIN clause, with or without the
optional keyword AS. Both forms are identical — orders o and
orders AS o mean the same thing.
a→authors
b→books
c→customers
o→orders
We'll use these single-letter aliases consistently throughout this chapter.
2. INNER JOIN — Only Matching Rows
An INNER JOIN returns rows where the ON condition matches in both tables.
Rows that have no match on the other side are silently excluded. JOIN and
INNER JOIN are identical — INNER is optional.
-- Every order with the customer name and book titleSELECTo.order_id,
c.first_name,
c.last_name,
b.title,
o.quantity,
o.total_priceFROMordersoINNER JOINcustomerscONo.customer_id=c.customer_idINNER JOINbooksbONo.book_id=b.book_idORDER BYo.order_id;
order_id
first_name
last_name
title
quantity
total_price
1
Alice
Nguyen
The Midnight Library
2
17.98
2
Alice
Nguyen
Sapiens
1
10.99
3
Ben
Okafor
Dune
1
9.99
4
Chloe
Martinez
And Then There Were None
3
20.97
5
David
Singh
Reasons to Stay Alive
1
7.99
Three tables joined in one query. Customer IDs and book IDs are replaced with the real names and titles.
Joining all four tables — a full order receipt
-- Full receipt: order → customer → book → authorSELECTo.order_id,
o.order_date,
CONCAT(c.first_name, ' ', c.last_name) AS customer,
b.title,
CONCAT(a.first_name, ' ', a.last_name) AS author,
o.quantity,
o.total_priceFROMordersoJOINcustomerscONo.customer_id=c.customer_idJOINbooksbONo.book_id=b.book_idJOINauthorsaONb.author_id=a.author_idORDER BYo.order_date;
order_id
order_date
customer
title
author
qty
total
1
2024-02-14
Alice Nguyen
The Midnight Library
Matt Haig
2
17.98
2
2024-02-14
Alice Nguyen
Sapiens
Yuval Harari
1
10.99
3
2024-03-05
Ben Okafor
Dune
Frank Herbert
1
9.99
4
2024-03-20
Chloe Martinez
And Then There Were None
Agatha Christie
3
20.97
5
2024-04-10
David Singh
Reasons to Stay Alive
Matt Haig
1
7.99
All four tables joined in one query. Notice David (customer 4) was added in Chapter 5 but never ordered — he won't appear here because INNER JOIN only keeps matched rows.
3. JOIN Types at a Glance
INNER JOIN
Only rows that match in both tables. No match = excluded.
LEFT JOIN
All rows from the left table. Right side NULLs where no match.
RIGHT JOIN
All rows from the right table. Left side NULLs where no match.
LEFT ANTI-JOIN
Left rows with no match on the right. Find orphans / unmatched records.
JOIN type
Keeps from left
Keeps from right
Best for
INNER JOIN
Matched only
Matched only
The common case — you only care about rows that have data on both sides
LEFT JOIN
All rows
Matched rows + NULLs for non-matches
Keep all left rows even without a right-side match — "show all customers, even those with no orders"
RIGHT JOIN
Matched rows + NULLs for non-matches
All rows
Rarely needed — rewrite as LEFT JOIN by swapping table order instead
LEFT JOIN + WHERE right.col IS NULL
Unmatched only
Excluded
Find rows with no match — "customers who have never ordered", "books with no orders"
CROSS JOIN
All rows
All rows
Cartesian product — every left row paired with every right row. Rarely useful; almost always accidental.
4. LEFT JOIN — Keeping Unmatched Left Rows
LEFT JOIN returns all rows from the left table, whether or not they have a matching
row in the right table. When there is no match, every column from the right table is
NULL in that result row.
-- All customers, with their orders if they have anySELECTc.customer_id,
c.first_name,
c.last_name,
o.order_id,
o.total_priceFROMcustomerscLEFT JOINordersoONc.customer_id=o.customer_idORDER BYc.customer_id;
customer_id
first_name
last_name
order_id
total_price
1
Alice
Nguyen
1
17.98
1
Alice
Nguyen
2
10.99
2
Ben
Okafor
3
9.99
3
Chloe
Martinez
4
20.97
4
David
Singh
NULL
NULL
5
David
Singh (if added)
NULL
NULL
David (customer 4) has never ordered — LEFT JOIN still includes him, with NULLs for the orders columns.
LEFT JOIN + aggregate — counting orders per customer including zeros
-- Count orders per customer — include customers with zero ordersSELECTc.customer_id,
CONCAT(c.first_name, ' ', c.last_name) AS customer,
COUNT(o.order_id) AS order_count,
COALESCE(SUM(o.total_price), 0) AS total_spent
FROMcustomerscLEFT JOINordersoONc.customer_id=o.customer_idGROUP BYc.customer_id, c.first_name, c.last_nameORDER BYtotal_spentDESC;
customer_id
customer
order_count
total_spent
1
Alice Nguyen
2
28.97
3
Chloe Martinez
1
20.97
2
Ben Okafor
1
9.99
4
David Singh
0
0.00
David appears with 0 orders and £0.00 spent. COUNT(o.order_id) returns 0 (not COUNT(*)) because o.order_id is NULL for David — COUNT(col) skips NULLs. COALESCE converts NULL SUM to 0.00.
COUNT(*) vs COUNT(right_table_col) in a LEFT JOIN.
After a LEFT JOIN, unmatched rows have NULL in every right-table column.
COUNT(*) would count those NULL rows as 1. Use
COUNT(o.order_id) — it skips NULLs and correctly returns 0 for
customers with no orders.
5. Anti-Join — Finding Rows With No Match
Combine LEFT JOIN with WHERE right_table.col IS NULL to find rows that
have no corresponding entry in the joined table. This is one of the most practical JOIN
patterns in real applications:
-- Customers who have never placed an orderSELECTc.customer_id,
c.first_name,
c.last_nameFROMcustomerscLEFT JOINordersoONc.customer_id=o.customer_idWHEREo.customer_idIS NULL; -- only keep rows where no order matched
customer_id
first_name
last_name
4
David
Singh
-- Books that have never been orderedSELECTb.book_id,
b.titleFROMbooksbLEFT JOINordersoONb.book_id=o.book_idWHEREo.book_idIS NULL;
book_id
title
6
The Left Hand of Darkness
The Left Hand of Darkness has never been ordered — a good candidate for a promotion or a stock review.
6. RIGHT JOIN
RIGHT JOIN keeps all rows from the right table and NULLs for the left where there
is no match — the mirror image of LEFT JOIN. In practice, RIGHT JOIN is rarely used
because any RIGHT JOIN can be rewritten as a LEFT JOIN by swapping the table order,
which most developers find easier to read:
-- These two queries return identical results:-- Using RIGHT JOINSELECTb.title, o.order_idFROMordersoRIGHT JOINbooksbONo.book_id=b.book_id;
-- Equivalent LEFT JOIN (just swap the table order)SELECTb.title, o.order_idFROMbooksbLEFT JOINordersoONb.book_id=o.book_id;
Prefer LEFT JOIN over RIGHT JOIN. Both do the same job but most
developers read SQL top-to-bottom and find it more natural to start with the "primary"
table on the left. Mixing LEFT and RIGHT JOINs in the same query quickly becomes
confusing. Stick to LEFT JOIN and swap table order when needed.
7. Self-Join — Joining a Table to Itself
A self-join joins a table to itself using two different aliases. It's used when rows
in the same table have a relationship to each other — the classic example is an
employees table where each employee has a manager_id that
points to another row in the same table:
-- Suppose our authors table had a "mentored_by" column-- pointing to another author_id in the same table-- Self-join: show each author alongside their mentorSELECTa.first_nameAS author,
m.first_nameAS mentor
FROMauthorsaLEFT JOINauthorsmONa.mentored_by=m.author_id;
The bookshop schema doesn't have a recursive relationship, but self-joins come up
often in organisational charts (employees + managers), category hierarchies (parent
category → child category), and graph structures stored in relational tables.
8. Combining JOINs With Everything Else
JOINs compose cleanly with every clause from previous chapters. The execution order
stays the same — FROM + JOIN happens first, then WHERE, GROUP BY, HAVING, SELECT,
ORDER BY, LIMIT:
Sales report by genre
-- Total revenue and units sold per genre, top genres firstSELECTb.genre,
COUNT(DISTINCTo.order_id) AS order_lines,
SUM(o.quantity) AS units_sold,
SUM(o.total_price) AS revenue
FROMordersoJOINbooksbONo.book_id=b.book_idGROUP BYb.genreORDER BYrevenueDESC;
genre
order_lines
units_sold
revenue
Mystery
1
3
20.97
Fiction
1
2
17.98
Non-fiction
2
2
18.98
Sci-fi
1
1
9.99
Best-selling authors
-- Revenue per author, only those who have sold somethingSELECTCONCAT(a.first_name, ' ', a.last_name) AS author,
SUM(o.quantity) AS units_sold,
SUM(o.total_price) AS revenue
FROMauthorsaJOINbooksbONa.author_id=b.author_idJOINordersoONb.book_id=o.book_idGROUP BYa.author_id, a.first_name, a.last_nameORDER BYrevenueDESC;
author
units_sold
revenue
Agatha Christie
3
20.97
Matt Haig
3
25.97
Yuval Harari
1
10.99
Frank Herbert
1
9.99
Ursula Le Guin doesn't appear — her book has no orders (INNER JOIN excludes her). Use LEFT JOIN chains if you need to include authors with zero sales.
ON vs WHERE — filtering in the right place
-- Row filter (same for INNER and LEFT JOIN — fine in either place)SELECTc.first_name, o.order_id, o.total_priceFROMcustomerscLEFT JOINordersoONc.customer_id=o.customer_idWHEREo.total_price>10.00; -- filters AFTER join; loses unmatched customers-- To filter on the joined table WITHOUT losing unmatched rows from the left,-- move the condition into the ON clause:SELECTc.first_name, o.order_id, o.total_priceFROMcustomerscLEFT JOINordersoONc.customer_id=o.customer_idANDo.total_price>10.00;
-- Customers with no qualifying orders still appear; their order columns are NULL
In a LEFT JOIN, filtering on a right-table column in WHERE converts it to an INNER JOIN.
Once WHERE filters out rows where the right-table column is NULL, the "keep all left rows"
guarantee is broken. Move right-table filters into the ON clause if you still need
unmatched left rows in the result.
Chapter Summary
Concept
Key points
Table alias
Short nickname (a, b, c, o) assigned in FROM/JOIN. Required when the same column name exists in multiple tables. Use alias.column to qualify.
INNER JOIN
Returns only rows that match in both tables. Unmatched rows on either side are excluded. Most common join type.
LEFT JOIN
Returns all rows from the left table. Right-table columns are NULL for rows with no match. Use to include records regardless of whether a related record exists.
Anti-join pattern
LEFT JOIN + WHERE right.col IS NULL — finds left-table rows with no match on the right. "Customers who never ordered", "books never sold".
RIGHT JOIN
Mirror of LEFT JOIN. Rarely used — rewrite as LEFT JOIN by swapping table order.
Self-join
Join a table to itself using two aliases. Used for hierarchical/recursive data: manager → employee, parent category → child category.
COUNT in LEFT JOIN
Use COUNT(right_table.pk) not COUNT(*) — COUNT(*) counts NULL rows as 1; COUNT(col) skips them, giving the correct 0 for unmatched rows.
ON vs WHERE in LEFT JOIN
Filtering on a right-table column in WHERE silently converts LEFT JOIN to INNER JOIN. Move right-table filters into the ON clause to preserve unmatched left rows.
Next: Chapter 9 — String, Date, and Numeric Functions.
MySQL has dozens of built-in functions for transforming data: CONCAT, SUBSTRING, REPLACE,
UPPER/LOWER for strings; DATE_FORMAT, DATEDIFF, DATE_ADD for dates; ROUND, FLOOR, CEIL,
MOD for numbers. Chapter 9 covers the ones you'll reach for every week.