Chapter 8

Chapter 7 — Aggregate Functions and GROUP BY

Every query so far has returned one row per row in the table. Aggregate functions change that — they collapse many rows into a single summary value. Combined with GROUP BY, they let you produce per-category summaries: total sales per genre, average price per author, number of orders per customer. This chapter covers the five core aggregate functions, how GROUP BY works, and how HAVING lets you filter those summaries.

1. The Five Core Aggregate Functions

COUNT()

COUNT(*) or COUNT(col)

Count rows. COUNT(*) counts all rows including NULLs. COUNT(col) counts only non-NULL values in that column.

SUM()

SUM(col)

Total of all non-NULL values. Returns NULL if all values are NULL (use COALESCE to default to 0).

AVG()

AVG(col)

Mean of all non-NULL values. NULLs are excluded from both numerator and denominator — they don't count as zero.

MIN()

MIN(col)

Smallest value. Works on numbers, strings (alphabetical), and dates. Ignores NULLs.

MAX()

MAX(col)

Largest value. Works on numbers, strings, and dates. Ignores NULLs.

COUNT(DISTINCT)

COUNT(DISTINCT col)

Count unique non-NULL values. Useful for "how many different genres do we stock?" rather than "how many books?"

Aggregates across the whole table

Without GROUP BY, an aggregate function collapses the entire table into one row:

SELECT COUNT(*) AS total_books, COUNT(genre) AS books_with_genre, -- excludes NULLs COUNT(DISTINCT genre) AS unique_genres, SUM(stock) AS total_stock, AVG(price) AS avg_price, MIN(price) AS cheapest, MAX(price) AS most_expensive, MIN(published_year) AS oldest_year, MAX(published_year) AS newest_year FROM books;

total_books	books_with_genre	unique_genres	total_stock	avg_price	cheapest	most_expensive	oldest_year	newest_year
6	6	4	182	9.07	6.99	10.99	1939	2020

All six books collapsed into one summary row. avg_price is rounded here for display — MySQL returns the full decimal.

2. NULL Behaviour in Aggregates

All aggregate functions silently ignore NULL values — except COUNT(*) which counts rows regardless. This asymmetry causes a common mistake with AVG:

-- Suppose 2 of our 6 books have no stock value (NULL) -- These two queries give DIFFERENT answers: SELECT AVG(stock) AS avg_stock FROM books; -- MySQL: ignores the 2 NULLs, averages the 4 known values -- Result: 30.25 (121 / 4) SELECT SUM(stock) / COUNT(*) AS avg_stock FROM books; -- Divides by 6 (all rows), treating NULLs as if stock were 0 -- Result: 20.17 (121 / 6) -- If you mean "NULL stock counts as zero in the average", be explicit: SELECT AVG(IFNULL(stock, 0)) AS avg_stock FROM books; -- Result: 20.17 — matches SUM/COUNT(*) calculation

AVG skips NULLs — it doesn't treat them as zero. If stock is NULL for some books and you want those to contribute 0 to the average, wrap the column in IFNULL(stock, 0). If you genuinely want to average only books where stock is known, AVG(stock) is correct as-is.

3. GROUP BY — Per-Category Summaries

GROUP BY splits the rows into groups — one group per distinct value of the grouping column(s) — and runs the aggregate function separately on each group. The result has one row per group.

-- How many books and what is the average price per genre? SELECT genre, COUNT(*) AS book_count, AVG(price) AS avg_price, SUM(stock) AS total_stock FROM books GROUP BY genre ORDER BY book_count DESC;

genre	book_count	avg_price	total_stock
Non-fiction	2	9.49	49
Sci-fi	2	9.24	36
Fiction	1	8.99	42
Mystery	1	6.99	55

Four genres, four result rows. MySQL grouped all Sci-fi books together, calculated COUNT/AVG/SUM for that group, then repeated for each genre.

Grouping by multiple columns

You can group by two or more columns to get a cross-tabulation. Each unique combination of the grouped columns becomes its own row:

-- Orders per customer per book SELECT customer_id, book_id, COUNT(*) AS times_ordered, SUM(quantity) AS total_qty FROM orders GROUP BY customer_id, book_id ORDER BY customer_id;

The golden rule of GROUP BY

Every column in SELECT must either be in GROUP BY, or wrapped in an aggregate function. If you put a non-grouped, non-aggregated column in SELECT, MySQL (in strict mode) will error. In non-strict mode it picks an arbitrary value from the group — which is almost certainly wrong.

-- ✗ WRONG: title is not in GROUP BY and not aggregated SELECT genre, title, COUNT(*) FROM books GROUP BY genre; -- ERROR 1055: 'title' is not in GROUP BY and is not aggregated -- ✓ CORRECT option 1: add title to GROUP BY (group by both) SELECT genre, title, COUNT(*) FROM books GROUP BY genre, title; -- ✓ CORRECT option 2: aggregate title instead SELECT genre, MIN(title) AS sample_title, COUNT(*) FROM books GROUP BY genre;

4. HAVING — Filtering Groups

WHERE filters individual rows before grouping. HAVING filters groups after the aggregate has been calculated. It's the only place you can reference aggregate functions in a filter condition.

-- Only show genres with more than one book and average price above £8 SELECT genre, COUNT(*) AS book_count, AVG(price) AS avg_price FROM books GROUP BY genre HAVING COUNT(*) > 1 AND AVG(price) > 8.00 ORDER BY avg_price DESC;

genre	book_count	avg_price
Non-fiction	2	9.49
Sci-fi	2	9.24

Fiction (1 book) and Mystery (1 book) were excluded by HAVING COUNT(*) > 1.

WHERE vs HAVING — the key distinction

WHERE — filters rows before grouping

Runs before GROUP BY. Each row is evaluated individually. Cannot reference aggregate functions — the aggregates don't exist yet.

Use for column values: WHERE price > 5
Use for date ranges: WHERE order_date > '2024-01-01'
Use for row-level filters: WHERE genre IS NOT NULL
Faster — reduces rows before the costly grouping step

HAVING — filters groups after aggregating

Runs after GROUP BY. Each group is evaluated. Can (and usually does) reference aggregate functions.

Use for aggregate results: HAVING COUNT(*) > 1
Use for group totals: HAVING SUM(quantity) >= 10
Use for group averages: HAVING AVG(price) < 8
Can also filter by grouped columns, but WHERE is faster for that

-- Combined WHERE + GROUP BY + HAVING + ORDER BY -- "For fiction and sci-fi books only, which genres have average stock above 20?" SELECT genre, COUNT(*) AS book_count, AVG(stock) AS avg_stock, SUM(stock) AS total_stock FROM books WHERE genre IN ('Fiction', 'Sci-fi') -- row filter: only these genres enter grouping GROUP BY genre HAVING AVG(stock) > 20 -- group filter: only groups with high avg stock ORDER BY avg_stock DESC;

5. Full Clause Execution Order

MySQL evaluates clauses in a fixed logical order that is different from the order you write them. Understanding this explains every "column not found" and "can't use aggregate in WHERE" error:

FROM

Identify the source table(s) and load the rows

WHERE

Filter individual rows — aggregates don't exist yet

GROUP BY

Partition surviving rows into groups

HAVING

Filter groups — aggregates are now calculated and available

SELECT

Choose columns, compute expressions, assign aliases

ORDER BY

Sort the final result — aliases from SELECT are now available

LIMIT / OFFSET

Trim the result to the requested page

Why can't I use a SELECT alias in WHERE or HAVING? Because SELECT runs after both WHERE and HAVING. The alias doesn't exist at the time WHERE and HAVING are evaluated. You can use aliases in ORDER BY because that runs after SELECT. To reuse a calculation in HAVING without repeating it, some databases support column aliases in HAVING — MySQL does allow this as an extension, but it's not standard SQL and can behave unexpectedly. Repeating the expression is safer.

6. Practical Patterns

Revenue per customer

SELECT customer_id, COUNT(*) AS order_count, SUM(total_price) AS total_spent, AVG(total_price) AS avg_order_value FROM orders GROUP BY customer_id ORDER BY total_spent DESC;

customer_id	order_count	total_spent	avg_order_value
1	2	28.97	14.49
3	1	20.97	20.97
2	1	9.99	9.99
4	1	7.99	7.99

Customer 1 (Alice) has placed two orders totalling £28.97 — the highest spender.

Finding customers who spent more than £15 total

SELECT customer_id, SUM(total_price) AS total_spent FROM orders GROUP BY customer_id HAVING SUM(total_price) > 15.00 ORDER BY total_spent DESC;

customer_id	total_spent
1	28.97
3	20.97

Books per author — minimum one book

SELECT author_id, COUNT(*) AS books_published, MIN(published_year) AS first_year, MAX(published_year) AS latest_year, SUM(stock) AS total_copies_in_stock FROM books GROUP BY author_id ORDER BY books_published DESC;

author_id	books_published	first_year	latest_year	total_copies_in_stock
1	2	2015	2020	60
2	1	1965	1965	27
3	1	2011	2011	31
4	1	1939	1939	55
5	1	1969	1969	9

Author 1 (Matt Haig) has two books in the catalogue. IDs rather than names appear here — Chapter 8 (JOINs) will combine both tables to show the name alongside.

Counting NULL vs non-NULL — COUNT(*) vs COUNT(col)

-- How many authors have a nationality recorded vs unknown? SELECT COUNT(*) AS total_authors, COUNT(nationality) AS with_nationality, COUNT(*) - COUNT(nationality) AS unknown_nationality FROM authors;

total_authors	with_nationality	unknown_nationality
5	4	1

One author has no nationality recorded (Ursula Le Guin's nationality was set to NULL in Chapter 5). COUNT(*) caught it; COUNT(nationality) skipped it.

WITH ROLLUP — adding a grand total row

WITH ROLLUP is a GROUP BY modifier that appends an extra summary row for each grouping level, with the final row representing the grand total across all groups:

SELECT IFNULL(genre, 'TOTAL') AS genre, COUNT(*) AS book_count, SUM(stock) AS total_stock FROM books GROUP BY genre WITH ROLLUP; -- ROLLUP sets the grouping column to NULL in the summary rows -- IFNULL converts NULL → 'TOTAL' for the label

genre	book_count	total_stock
Fiction	1	42
Mystery	1	55
Non-fiction	2	49
Sci-fi	2	36
TOTAL	6	182

The final row is the grand total across all genres — added automatically by WITH ROLLUP.

Chapter Summary

Concept	Key points
COUNT(*)	Counts all rows, including those with NULL values in any column.
COUNT(col)	Counts only non-NULL values in that column. Use COUNT(*) − COUNT(col) to count NULLs.
SUM / AVG / MIN / MAX	All ignore NULL values. AVG denominates by non-NULL count — not total row count.
COUNT(DISTINCT col)	Counts unique non-NULL values — useful for "how many different X" questions.
GROUP BY	Splits rows into groups; aggregates run per group. Every non-aggregated SELECT column must appear in GROUP BY.
HAVING	Filters groups after aggregation. The only place you can use aggregate functions in a condition. WHERE filters rows; HAVING filters groups.
Execution order	FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Aliases from SELECT are only available in ORDER BY.
WITH ROLLUP	Appends subtotal and grand total rows to a GROUP BY result. ROLLUP sets grouping columns to NULL in summary rows — use IFNULL to label them.

Next: Chapter 8 — JOINs. The queries in this chapter returned author IDs rather than author names — because the name lives in a different table. JOINs combine rows from two or more tables based on a shared key. Chapter 8 covers INNER JOIN, LEFT JOIN, RIGHT JOIN, self-joins, and the difference between them.

.tip-box strong { color: #7ecfa0; } .warn-box { background: #1f1a10; border-left: 3px solid #ffd580; border-radius: 0 6px 6px 0; padding: 12px 16px; margin-bottom: 16px; font-size: 0.88em; color: #b08a3a; } .warn-box strong { color: #ffd580; } .danger-box { background: #1f1010; border-left: 3px solid #f08080; border-radius: 0 6px 6px 0; padding: 12px 16px; margin-bottom: 16px; font-size: 0.88em; color: #b05a5a; } .danger-box strong { color: #f08080; } /* ── SQL block ──────────────────────────────────────────────── */ .sql-block { background: #0d0f18; border: 1px solid #2a2d3a; border-radius: 6px; padding: 14px 18px; font-family: 'Courier New', Consolas, monospace; font-size: 0.82em; line-height: 1.7; margin: 10px 0 18px; overflow-x: auto; white-space: pre; } .kw { color: #4f8ef7; font-weight: bold; } .fn { color: #c9a8ff; } .str { color: #ffd580; } .num { color: #f08080; } .cmt { color: #484f58; font-style: italic; } .tbl { color: #7ecfa0; } .out { color: #8b949e; } .op { color: #f08080; } /* ── Result table ───────────────────────────────────────────── */ .result-table { border-collapse: collapse; font-family: 'Courier New', monospace; font-size: 0.79em; margin: 0 0 6px; background: #0d0f18; border: 1px solid #2a2d3a; border-radius: 6px; overflow: hidden; width: 100%; } .result-table th { background: #1a1d27; color: #4f8ef7; padding: 7px 14px; text-align: left; border-bottom: 1px solid #2a2d3a; white-space: nowrap; font-family: system-ui, sans-serif; font-size: 0.85em; } .result-table td { padding: 6px 14px; border-bottom: 1px solid #1a1d27; color: #c9d1d9; } .result-table tr:last-child td { border-bottom: none; } .result-table td.null-val { color: #484f58; font-style: italic; } .result-table td.hi { color: #ffd580; } .result-table td.green { color: #7ecfa0; } .result-caption { font-family: system-ui, sans-serif; font-size: 0.72em; color: #484f58; margin: 0 0 20px; padding-left: 2px; } /* ── Venn diagram ───────────────────────────────────────────── */ .venn-row { display: flex; gap: 14px; flex-wrap: wrap; margin: 16px 0 24px; justify-content: center; } .venn-card { background: #1a1d27; border: 1px solid #2a2d3a; border-radius: 8px; padding: 14px 16px; flex: 1; min-width: 180px; max-width: 220px; text-align: center; } .venn-card__svg { margin: 0 auto 10px; display: block; } .venn-card__title { font-family: system-ui, sans-serif; font-size: 0.8em; font-weight: 700; color: #c9d1d9; margin-bottom: 4px; } .venn-card__desc { font-family: system-ui, sans-serif; font-size: 0.71em; color: #8b949e; line-height: 1.45; } /* ── Join comparison table ──────────────────────────────────── */ .join-table { border-collapse: collapse; width: 100%; font-size: 0.8em; margin: 12px 0 22px; } .join-table th { background: #1a1d27; color: #8b949e; padding: 8px 12px; text-align: left; border-bottom: 2px solid #2a2d3a; font-family: system-ui, sans-serif; } .join-table td { padding: 8px 12px; border-bottom: 1px solid #1a1d27; vertical-align: top; color: #8b949e; font-family: system-ui, sans-serif; } .join-table tr:last-child td { border-bottom: none; } .join-table .jname { font-family: 'Courier New', monospace; color: #4f8ef7; font-weight: 700; white-space: nowrap; } .join-table .keeps { color: #7ecfa0; } .join-table .drops { color: #f08080; } .join-table .use { color: #c9d1d9; } /* ── Alias legend ───────────────────────────────────────────── */ .alias-legend { display: flex; gap: 10px; flex-wrap: wrap; margin: 10px 0 18px; } .alias-pill { background: #1a1d27; border: 1px solid #2a2d3a; border-radius: 20px; padding: 4px 12px; font-family: 'Courier New', monospace; font-size: 0.75em; } .alias-pill .alias { color: #4f8ef7; } .alias-pill .sep { color: #484f58; margin: 0 4px; } .alias-pill .full { color: #7ecfa0; } /* ── Summary table ──────────────────────────────────────────── */ .summary-table { border-collapse: collapse; width: 100%; font-size: 0.82em; margin-top: 10px; } .summary-table th { background: #1a1d27; color: #8b949e; padding: 8px 12px; text-align: left; border-bottom: 2px solid #2a2d3a; } .summary-table td { padding: 7px 12px; border-bottom: 1px solid #1a1d27; color: #8b949e; vertical-align: top; } .summary-table tr:last-child td { border-bottom: none; } .summary-table .label { color: #c9d1d9; font-weight: 600; white-space: nowrap; font-family: 'Courier New', monospace; font-size: 0.95em; } .next-chapter { margin-top: 2rem; padding: 12px 16px; background: #1a1d27; border-radius: 6px; font-family: system-ui, sans-serif; font-size: 0.83em; color: #8b949e; } .next-chapter strong { color: #4f8ef7; }

Chapter 8 — JOINs

Every chapter so far has queried a single table. Real databases split data across multiple related tables — orders live in orders, the customer name lives in customers, the book title lives in books. A JOIN combines rows from two or more tables based on a matching column, letting you query that connected data as if it were one result set.

Why split data across tables? If we stored the customer name inside every order row, changing a customer's name would require updating every one of their orders — and any that were missed would have inconsistent data. By keeping names in one place (customers) and referencing them by ID, a single UPDATE fixes everything. JOINs are the mechanism that puts it back together at query time.

1. Table Aliases — Keeping Queries Readable

JOIN queries reference two or more tables. Qualifying every column name with the full table name (orders.customer_id, customers.customer_id) gets verbose fast. A table alias — a short nickname assigned in the FROM clause — keeps things concise:

-- Without aliases — verbose SELECT orders.order_id, customers.first_name, customers.last_name FROM orders JOIN customers ON orders.customer_id = customers.customer_id; -- With aliases — the standard way SELECT o.order_id, c.first_name, c.last_name FROM orders o JOIN customers c ON o.customer_id = c.customer_id;

The alias follows the table name in the FROM or JOIN clause, with or without the optional keyword AS. Both forms are identical — orders o and orders AS o mean the same thing.

a→authors

b→books

c→customers

o→orders

We'll use these single-letter aliases consistently throughout this chapter.

2. INNER JOIN — Only Matching Rows

An INNER JOIN returns rows where the ON condition matches in both tables. Rows that have no match on the other side are silently excluded. JOIN and INNER JOIN are identical — INNER is optional.

-- Every order with the customer name and book title SELECT o.order_id, c.first_name, c.last_name, b.title, o.quantity, o.total_price FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id INNER JOIN books b ON o.book_id = b.book_id ORDER BY o.order_id;

order_id	first_name	last_name	title	quantity	total_price
1	Alice	Nguyen	The Midnight Library	2	17.98
2	Alice	Nguyen	Sapiens	1	10.99
3	Ben	Okafor	Dune	1	9.99
4	Chloe	Martinez	And Then There Were None	3	20.97
5	David	Singh	Reasons to Stay Alive	1	7.99

Three tables joined in one query. Customer IDs and book IDs are replaced with the real names and titles.

Joining all four tables — a full order receipt

-- Full receipt: order → customer → book → author SELECT o.order_id, o.order_date, CONCAT(c.first_name, ' ', c.last_name) AS customer, b.title, CONCAT(a.first_name, ' ', a.last_name) AS author, o.quantity, o.total_price FROM orders o JOIN customers c ON o.customer_id = c.customer_id JOIN books b ON o.book_id = b.book_id JOIN authors a ON b.author_id = a.author_id ORDER BY o.order_date;

order_id	order_date	customer	title	author	qty	total
1	2024-02-14	Alice Nguyen	The Midnight Library	Matt Haig	2	17.98
2	2024-02-14	Alice Nguyen	Sapiens	Yuval Harari	1	10.99
3	2024-03-05	Ben Okafor	Dune	Frank Herbert	1	9.99
4	2024-03-20	Chloe Martinez	And Then There Were None	Agatha Christie	3	20.97
5	2024-04-10	David Singh	Reasons to Stay Alive	Matt Haig	1	7.99

All four tables joined in one query. Notice David (customer 4) was added in Chapter 5 but never ordered — he won't appear here because INNER JOIN only keeps matched rows.

3. JOIN Types at a Glance

INNER JOIN

Only rows that match in both tables. No match = excluded.

LEFT JOIN

All rows from the left table. Right side NULLs where no match.

RIGHT JOIN

All rows from the right table. Left side NULLs where no match.

LEFT ANTI-JOIN

Left rows with no match on the right. Find orphans / unmatched records.

JOIN type	Keeps from left	Keeps from right	Best for
INNER JOIN	Matched only	Matched only	The common case — you only care about rows that have data on both sides
LEFT JOIN	All rows	Matched rows + NULLs for non-matches	Keep all left rows even without a right-side match — "show all customers, even those with no orders"
RIGHT JOIN	Matched rows + NULLs for non-matches	All rows	Rarely needed — rewrite as LEFT JOIN by swapping table order instead
LEFT JOIN + WHERE right.col IS NULL	Unmatched only	Excluded	Find rows with no match — "customers who have never ordered", "books with no orders"
CROSS JOIN	All rows	All rows	Cartesian product — every left row paired with every right row. Rarely useful; almost always accidental.

4. LEFT JOIN — Keeping Unmatched Left Rows

LEFT JOIN returns all rows from the left table, whether or not they have a matching row in the right table. When there is no match, every column from the right table is NULL in that result row.

-- All customers, with their orders if they have any SELECT c.customer_id, c.first_name, c.last_name, o.order_id, o.total_price FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id ORDER BY c.customer_id;

customer_id	first_name	last_name	order_id	total_price
1	Alice	Nguyen	1	17.98
1	Alice	Nguyen	2	10.99
2	Ben	Okafor	3	9.99
3	Chloe	Martinez	4	20.97
4	David	Singh	NULL	NULL
5	David	Singh (if added)	NULL	NULL

David (customer 4) has never ordered — LEFT JOIN still includes him, with NULLs for the orders columns.

LEFT JOIN + aggregate — counting orders per customer including zeros

-- Count orders per customer — include customers with zero orders SELECT c.customer_id, CONCAT(c.first_name, ' ', c.last_name) AS customer, COUNT(o.order_id) AS order_count, COALESCE(SUM(o.total_price), 0) AS total_spent FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.first_name, c.last_name ORDER BY total_spent DESC;

customer_id	customer	order_count	total_spent
1	Alice Nguyen	2	28.97
3	Chloe Martinez	1	20.97
2	Ben Okafor	1	9.99
4	David Singh	0	0.00

David appears with 0 orders and £0.00 spent. COUNT(o.order_id) returns 0 (not COUNT(*)) because o.order_id is NULL for David — COUNT(col) skips NULLs. COALESCE converts NULL SUM to 0.00.

COUNT(*) vs COUNT(right_table_col) in a LEFT JOIN. After a LEFT JOIN, unmatched rows have NULL in every right-table column. COUNT(*) would count those NULL rows as 1. Use COUNT(o.order_id) — it skips NULLs and correctly returns 0 for customers with no orders.

5. Anti-Join — Finding Rows With No Match

Combine LEFT JOIN with WHERE right_table.col IS NULL to find rows that have no corresponding entry in the joined table. This is one of the most practical JOIN patterns in real applications:

-- Customers who have never placed an order SELECT c.customer_id, c.first_name, c.last_name FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id WHERE o.customer_id IS NULL; -- only keep rows where no order matched

customer_id	first_name	last_name
4	David	Singh

-- Books that have never been ordered SELECT b.book_id, b.title FROM books b LEFT JOIN orders o ON b.book_id = o.book_id WHERE o.book_id IS NULL;

book_id	title
6	The Left Hand of Darkness

The Left Hand of Darkness has never been ordered — a good candidate for a promotion or a stock review.

6. RIGHT JOIN

RIGHT JOIN keeps all rows from the right table and NULLs for the left where there is no match — the mirror image of LEFT JOIN. In practice, RIGHT JOIN is rarely used because any RIGHT JOIN can be rewritten as a LEFT JOIN by swapping the table order, which most developers find easier to read:

-- These two queries return identical results: -- Using RIGHT JOIN SELECT b.title, o.order_id FROM orders o RIGHT JOIN books b ON o.book_id = b.book_id; -- Equivalent LEFT JOIN (just swap the table order) SELECT b.title, o.order_id FROM books b LEFT JOIN orders o ON b.book_id = o.book_id;

Prefer LEFT JOIN over RIGHT JOIN. Both do the same job but most developers read SQL top-to-bottom and find it more natural to start with the "primary" table on the left. Mixing LEFT and RIGHT JOINs in the same query quickly becomes confusing. Stick to LEFT JOIN and swap table order when needed.

7. Self-Join — Joining a Table to Itself

A self-join joins a table to itself using two different aliases. It's used when rows in the same table have a relationship to each other — the classic example is an employees table where each employee has a manager_id that points to another row in the same table:

-- Suppose our authors table had a "mentored_by" column -- pointing to another author_id in the same table -- Self-join: show each author alongside their mentor SELECT a.first_name AS author, m.first_name AS mentor FROM authors a LEFT JOIN authors m ON a.mentored_by = m.author_id;

The bookshop schema doesn't have a recursive relationship, but self-joins come up often in organisational charts (employees + managers), category hierarchies (parent category → child category), and graph structures stored in relational tables.

8. Combining JOINs With Everything Else

JOINs compose cleanly with every clause from previous chapters. The execution order stays the same — FROM + JOIN happens first, then WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT:

Sales report by genre

-- Total revenue and units sold per genre, top genres first SELECT b.genre, COUNT(DISTINCT o.order_id) AS order_lines, SUM(o.quantity) AS units_sold, SUM(o.total_price) AS revenue FROM orders o JOIN books b ON o.book_id = b.book_id GROUP BY b.genre ORDER BY revenue DESC;

genre	order_lines	units_sold	revenue
Mystery	1	3	20.97
Fiction	1	2	17.98
Non-fiction	2	2	18.98
Sci-fi	1	1	9.99

Best-selling authors

-- Revenue per author, only those who have sold something SELECT CONCAT(a.first_name, ' ', a.last_name) AS author, SUM(o.quantity) AS units_sold, SUM(o.total_price) AS revenue FROM authors a JOIN books b ON a.author_id = b.author_id JOIN orders o ON b.book_id = o.book_id GROUP BY a.author_id, a.first_name, a.last_name ORDER BY revenue DESC;

author	units_sold	revenue
Agatha Christie	3	20.97
Matt Haig	3	25.97
Yuval Harari	1	10.99
Frank Herbert	1	9.99

Ursula Le Guin doesn't appear — her book has no orders (INNER JOIN excludes her). Use LEFT JOIN chains if you need to include authors with zero sales.

ON vs WHERE — filtering in the right place

-- Row filter (same for INNER and LEFT JOIN — fine in either place) SELECT c.first_name, o.order_id, o.total_price FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id WHERE o.total_price > 10.00; -- filters AFTER join; loses unmatched customers -- To filter on the joined table WITHOUT losing unmatched rows from the left, -- move the condition into the ON clause: SELECT c.first_name, o.order_id, o.total_price FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.total_price > 10.00; -- Customers with no qualifying orders still appear; their order columns are NULL

In a LEFT JOIN, filtering on a right-table column in WHERE converts it to an INNER JOIN. Once WHERE filters out rows where the right-table column is NULL, the "keep all left rows" guarantee is broken. Move right-table filters into the ON clause if you still need unmatched left rows in the result.

Chapter Summary

Concept	Key points
Table alias	Short nickname (a, b, c, o) assigned in FROM/JOIN. Required when the same column name exists in multiple tables. Use alias.column to qualify.
INNER JOIN	Returns only rows that match in both tables. Unmatched rows on either side are excluded. Most common join type.
LEFT JOIN	Returns all rows from the left table. Right-table columns are NULL for rows with no match. Use to include records regardless of whether a related record exists.
Anti-join pattern	LEFT JOIN + WHERE right.col IS NULL — finds left-table rows with no match on the right. "Customers who never ordered", "books never sold".
RIGHT JOIN	Mirror of LEFT JOIN. Rarely used — rewrite as LEFT JOIN by swapping table order.
Self-join	Join a table to itself using two aliases. Used for hierarchical/recursive data: manager → employee, parent category → child category.
COUNT in LEFT JOIN	Use COUNT(right_table.pk) not COUNT() — COUNT() counts NULL rows as 1; COUNT(col) skips them, giving the correct 0 for unmatched rows.
ON vs WHERE in LEFT JOIN	Filtering on a right-table column in WHERE silently converts LEFT JOIN to INNER JOIN. Move right-table filters into the ON clause to preserve unmatched left rows.

Next: Chapter 9 — String, Date, and Numeric Functions. MySQL has dozens of built-in functions for transforming data: CONCAT, SUBSTRING, REPLACE, UPPER/LOWER for strings; DATE_FORMAT, DATEDIFF, DATE_ADD for dates; ROUND, FLOOR, CEIL, MOD for numbers. Chapter 9 covers the ones you'll reach for every week.