Chapter 8

Chapter 7 — Aggregate Functions and GROUP BY

Every query so far has returned one row per row in the table. Aggregate functions change that — they collapse many rows into a single summary value. Combined with GROUP BY, they let you produce per-category summaries: total sales per genre, average price per author, number of orders per customer. This chapter covers the five core aggregate functions, how GROUP BY works, and how HAVING lets you filter those summaries.

1. The Five Core Aggregate Functions

COUNT()
COUNT(*) or COUNT(col)

Count rows. COUNT(*) counts all rows including NULLs. COUNT(col) counts only non-NULL values in that column.

SUM()
SUM(col)

Total of all non-NULL values. Returns NULL if all values are NULL (use COALESCE to default to 0).

AVG()
AVG(col)

Mean of all non-NULL values. NULLs are excluded from both numerator and denominator — they don't count as zero.

MIN()
MIN(col)

Smallest value. Works on numbers, strings (alphabetical), and dates. Ignores NULLs.

MAX()
MAX(col)

Largest value. Works on numbers, strings, and dates. Ignores NULLs.

COUNT(DISTINCT)
COUNT(DISTINCT col)

Count unique non-NULL values. Useful for "how many different genres do we stock?" rather than "how many books?"

Aggregates across the whole table

Without GROUP BY, an aggregate function collapses the entire table into one row:

SELECT COUNT(*) AS total_books, COUNT(genre) AS books_with_genre, -- excludes NULLs COUNT(DISTINCT genre) AS unique_genres, SUM(stock) AS total_stock, AVG(price) AS avg_price, MIN(price) AS cheapest, MAX(price) AS most_expensive, MIN(published_year) AS oldest_year, MAX(published_year) AS newest_year FROM books;
total_booksbooks_with_genreunique_genrestotal_stockavg_pricecheapestmost_expensiveoldest_yearnewest_year
6641829.076.9910.9919392020

All six books collapsed into one summary row. avg_price is rounded here for display — MySQL returns the full decimal.

2. NULL Behaviour in Aggregates

All aggregate functions silently ignore NULL values — except COUNT(*) which counts rows regardless. This asymmetry causes a common mistake with AVG:

-- Suppose 2 of our 6 books have no stock value (NULL) -- These two queries give DIFFERENT answers: SELECT AVG(stock) AS avg_stock FROM books; -- MySQL: ignores the 2 NULLs, averages the 4 known values -- Result: 30.25 (121 / 4) SELECT SUM(stock) / COUNT(*) AS avg_stock FROM books; -- Divides by 6 (all rows), treating NULLs as if stock were 0 -- Result: 20.17 (121 / 6) -- If you mean "NULL stock counts as zero in the average", be explicit: SELECT AVG(IFNULL(stock, 0)) AS avg_stock FROM books; -- Result: 20.17 — matches SUM/COUNT(*) calculation
AVG skips NULLs — it doesn't treat them as zero. If stock is NULL for some books and you want those to contribute 0 to the average, wrap the column in IFNULL(stock, 0). If you genuinely want to average only books where stock is known, AVG(stock) is correct as-is.

3. GROUP BY — Per-Category Summaries

GROUP BY splits the rows into groups — one group per distinct value of the grouping column(s) — and runs the aggregate function separately on each group. The result has one row per group.

-- How many books and what is the average price per genre? SELECT genre, COUNT(*) AS book_count, AVG(price) AS avg_price, SUM(stock) AS total_stock FROM books GROUP BY genre ORDER BY book_count DESC;
genrebook_countavg_pricetotal_stock
Non-fiction29.4949
Sci-fi29.2436
Fiction18.9942
Mystery16.9955

Four genres, four result rows. MySQL grouped all Sci-fi books together, calculated COUNT/AVG/SUM for that group, then repeated for each genre.

Grouping by multiple columns

You can group by two or more columns to get a cross-tabulation. Each unique combination of the grouped columns becomes its own row:

-- Orders per customer per book SELECT customer_id, book_id, COUNT(*) AS times_ordered, SUM(quantity) AS total_qty FROM orders GROUP BY customer_id, book_id ORDER BY customer_id;

The golden rule of GROUP BY

Every column in SELECT must either be in GROUP BY, or wrapped in an aggregate function. If you put a non-grouped, non-aggregated column in SELECT, MySQL (in strict mode) will error. In non-strict mode it picks an arbitrary value from the group — which is almost certainly wrong.
-- ✗ WRONG: title is not in GROUP BY and not aggregated SELECT genre, title, COUNT(*) FROM books GROUP BY genre; -- ERROR 1055: 'title' is not in GROUP BY and is not aggregated -- ✓ CORRECT option 1: add title to GROUP BY (group by both) SELECT genre, title, COUNT(*) FROM books GROUP BY genre, title; -- ✓ CORRECT option 2: aggregate title instead SELECT genre, MIN(title) AS sample_title, COUNT(*) FROM books GROUP BY genre;

4. HAVING — Filtering Groups

WHERE filters individual rows before grouping. HAVING filters groups after the aggregate has been calculated. It's the only place you can reference aggregate functions in a filter condition.

-- Only show genres with more than one book and average price above £8 SELECT genre, COUNT(*) AS book_count, AVG(price) AS avg_price FROM books GROUP BY genre HAVING COUNT(*) > 1 AND AVG(price) > 8.00 ORDER BY avg_price DESC;
genrebook_countavg_price
Non-fiction29.49
Sci-fi29.24

Fiction (1 book) and Mystery (1 book) were excluded by HAVING COUNT(*) > 1.

WHERE vs HAVING — the key distinction

WHERE — filters rows before grouping
Runs before GROUP BY. Each row is evaluated individually. Cannot reference aggregate functions — the aggregates don't exist yet.
  • Use for column values: WHERE price > 5
  • Use for date ranges: WHERE order_date > '2024-01-01'
  • Use for row-level filters: WHERE genre IS NOT NULL
  • Faster — reduces rows before the costly grouping step
HAVING — filters groups after aggregating
Runs after GROUP BY. Each group is evaluated. Can (and usually does) reference aggregate functions.
  • Use for aggregate results: HAVING COUNT(*) > 1
  • Use for group totals: HAVING SUM(quantity) >= 10
  • Use for group averages: HAVING AVG(price) < 8
  • Can also filter by grouped columns, but WHERE is faster for that
-- Combined WHERE + GROUP BY + HAVING + ORDER BY -- "For fiction and sci-fi books only, which genres have average stock above 20?" SELECT genre, COUNT(*) AS book_count, AVG(stock) AS avg_stock, SUM(stock) AS total_stock FROM books WHERE genre IN ('Fiction', 'Sci-fi') -- row filter: only these genres enter grouping GROUP BY genre HAVING AVG(stock) > 20 -- group filter: only groups with high avg stock ORDER BY avg_stock DESC;

5. Full Clause Execution Order

MySQL evaluates clauses in a fixed logical order that is different from the order you write them. Understanding this explains every "column not found" and "can't use aggregate in WHERE" error:

1
FROM
Identify the source table(s) and load the rows
2
WHERE
Filter individual rows — aggregates don't exist yet
3
GROUP BY
Partition surviving rows into groups
4
HAVING
Filter groups — aggregates are now calculated and available
5
SELECT
Choose columns, compute expressions, assign aliases
6
ORDER BY
Sort the final result — aliases from SELECT are now available
7
LIMIT / OFFSET
Trim the result to the requested page
Why can't I use a SELECT alias in WHERE or HAVING? Because SELECT runs after both WHERE and HAVING. The alias doesn't exist at the time WHERE and HAVING are evaluated. You can use aliases in ORDER BY because that runs after SELECT. To reuse a calculation in HAVING without repeating it, some databases support column aliases in HAVING — MySQL does allow this as an extension, but it's not standard SQL and can behave unexpectedly. Repeating the expression is safer.

6. Practical Patterns

Revenue per customer

SELECT customer_id, COUNT(*) AS order_count, SUM(total_price) AS total_spent, AVG(total_price) AS avg_order_value FROM orders GROUP BY customer_id ORDER BY total_spent DESC;
customer_idorder_counttotal_spentavg_order_value
1228.9714.49
3120.9720.97
219.999.99
417.997.99

Customer 1 (Alice) has placed two orders totalling £28.97 — the highest spender.

Finding customers who spent more than £15 total

SELECT customer_id, SUM(total_price) AS total_spent FROM orders GROUP BY customer_id HAVING SUM(total_price) > 15.00 ORDER BY total_spent DESC;
customer_idtotal_spent
128.97
320.97

Books per author — minimum one book

SELECT author_id, COUNT(*) AS books_published, MIN(published_year) AS first_year, MAX(published_year) AS latest_year, SUM(stock) AS total_copies_in_stock FROM books GROUP BY author_id ORDER BY books_published DESC;
author_idbooks_publishedfirst_yearlatest_yeartotal_copies_in_stock
122015202060
211965196527
312011201131
411939193955
51196919699

Author 1 (Matt Haig) has two books in the catalogue. IDs rather than names appear here — Chapter 8 (JOINs) will combine both tables to show the name alongside.

Counting NULL vs non-NULL — COUNT(*) vs COUNT(col)

-- How many authors have a nationality recorded vs unknown? SELECT COUNT(*) AS total_authors, COUNT(nationality) AS with_nationality, COUNT(*) - COUNT(nationality) AS unknown_nationality FROM authors;
total_authorswith_nationalityunknown_nationality
541

One author has no nationality recorded (Ursula Le Guin's nationality was set to NULL in Chapter 5). COUNT(*) caught it; COUNT(nationality) skipped it.

WITH ROLLUP — adding a grand total row

WITH ROLLUP is a GROUP BY modifier that appends an extra summary row for each grouping level, with the final row representing the grand total across all groups:

SELECT IFNULL(genre, 'TOTAL') AS genre, COUNT(*) AS book_count, SUM(stock) AS total_stock FROM books GROUP BY genre WITH ROLLUP; -- ROLLUP sets the grouping column to NULL in the summary rows -- IFNULL converts NULL → 'TOTAL' for the label
genrebook_counttotal_stock
Fiction142
Mystery155
Non-fiction249
Sci-fi236
TOTAL6182

The final row is the grand total across all genres — added automatically by WITH ROLLUP.

Chapter Summary

ConceptKey points
COUNT(*)Counts all rows, including those with NULL values in any column.
COUNT(col)Counts only non-NULL values in that column. Use COUNT(*) − COUNT(col) to count NULLs.
SUM / AVG / MIN / MAXAll ignore NULL values. AVG denominates by non-NULL count — not total row count.
COUNT(DISTINCT col)Counts unique non-NULL values — useful for "how many different X" questions.
GROUP BYSplits rows into groups; aggregates run per group. Every non-aggregated SELECT column must appear in GROUP BY.
HAVINGFilters groups after aggregation. The only place you can use aggregate functions in a condition. WHERE filters rows; HAVING filters groups.
Execution orderFROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Aliases from SELECT are only available in ORDER BY.
WITH ROLLUPAppends subtotal and grand total rows to a GROUP BY result. ROLLUP sets grouping columns to NULL in summary rows — use IFNULL to label them.
Next: Chapter 8 — JOINs. The queries in this chapter returned author IDs rather than author names — because the name lives in a different table. JOINs combine rows from two or more tables based on a shared key. Chapter 8 covers INNER JOIN, LEFT JOIN, RIGHT JOIN, self-joins, and the difference between them.
.tip-box strong { color: #7ecfa0; } .warn-box { background: #1f1a10; border-left: 3px solid #ffd580; border-radius: 0 6px 6px 0; padding: 12px 16px; margin-bottom: 16px; font-size: 0.88em; color: #b08a3a; } .warn-box strong { color: #ffd580; } .danger-box { background: #1f1010; border-left: 3px solid #f08080; border-radius: 0 6px 6px 0; padding: 12px 16px; margin-bottom: 16px; font-size: 0.88em; color: #b05a5a; } .danger-box strong { color: #f08080; } /* ── SQL block ──────────────────────────────────────────────── */ .sql-block { background: #0d0f18; border: 1px solid #2a2d3a; border-radius: 6px; padding: 14px 18px; font-family: 'Courier New', Consolas, monospace; font-size: 0.82em; line-height: 1.7; margin: 10px 0 18px; overflow-x: auto; white-space: pre; } .kw { color: #4f8ef7; font-weight: bold; } .fn { color: #c9a8ff; } .str { color: #ffd580; } .num { color: #f08080; } .cmt { color: #484f58; font-style: italic; } .tbl { color: #7ecfa0; } .out { color: #8b949e; } .op { color: #f08080; } /* ── Result table ───────────────────────────────────────────── */ .result-table { border-collapse: collapse; font-family: 'Courier New', monospace; font-size: 0.79em; margin: 0 0 6px; background: #0d0f18; border: 1px solid #2a2d3a; border-radius: 6px; overflow: hidden; width: 100%; } .result-table th { background: #1a1d27; color: #4f8ef7; padding: 7px 14px; text-align: left; border-bottom: 1px solid #2a2d3a; white-space: nowrap; font-family: system-ui, sans-serif; font-size: 0.85em; } .result-table td { padding: 6px 14px; border-bottom: 1px solid #1a1d27; color: #c9d1d9; } .result-table tr:last-child td { border-bottom: none; } .result-table td.null-val { color: #484f58; font-style: italic; } .result-table td.hi { color: #ffd580; } .result-table td.green { color: #7ecfa0; } .result-caption { font-family: system-ui, sans-serif; font-size: 0.72em; color: #484f58; margin: 0 0 20px; padding-left: 2px; } /* ── Venn diagram ───────────────────────────────────────────── */ .venn-row { display: flex; gap: 14px; flex-wrap: wrap; margin: 16px 0 24px; justify-content: center; } .venn-card { background: #1a1d27; border: 1px solid #2a2d3a; border-radius: 8px; padding: 14px 16px; flex: 1; min-width: 180px; max-width: 220px; text-align: center; } .venn-card__svg { margin: 0 auto 10px; display: block; } .venn-card__title { font-family: system-ui, sans-serif; font-size: 0.8em; font-weight: 700; color: #c9d1d9; margin-bottom: 4px; } .venn-card__desc { font-family: system-ui, sans-serif; font-size: 0.71em; color: #8b949e; line-height: 1.45; } /* ── Join comparison table ──────────────────────────────────── */ .join-table { border-collapse: collapse; width: 100%; font-size: 0.8em; margin: 12px 0 22px; } .join-table th { background: #1a1d27; color: #8b949e; padding: 8px 12px; text-align: left; border-bottom: 2px solid #2a2d3a; font-family: system-ui, sans-serif; } .join-table td { padding: 8px 12px; border-bottom: 1px solid #1a1d27; vertical-align: top; color: #8b949e; font-family: system-ui, sans-serif; } .join-table tr:last-child td { border-bottom: none; } .join-table .jname { font-family: 'Courier New', monospace; color: #4f8ef7; font-weight: 700; white-space: nowrap; } .join-table .keeps { color: #7ecfa0; } .join-table .drops { color: #f08080; } .join-table .use { color: #c9d1d9; } /* ── Alias legend ───────────────────────────────────────────── */ .alias-legend { display: flex; gap: 10px; flex-wrap: wrap; margin: 10px 0 18px; } .alias-pill { background: #1a1d27; border: 1px solid #2a2d3a; border-radius: 20px; padding: 4px 12px; font-family: 'Courier New', monospace; font-size: 0.75em; } .alias-pill .alias { color: #4f8ef7; } .alias-pill .sep { color: #484f58; margin: 0 4px; } .alias-pill .full { color: #7ecfa0; } /* ── Summary table ──────────────────────────────────────────── */ .summary-table { border-collapse: collapse; width: 100%; font-size: 0.82em; margin-top: 10px; } .summary-table th { background: #1a1d27; color: #8b949e; padding: 8px 12px; text-align: left; border-bottom: 2px solid #2a2d3a; } .summary-table td { padding: 7px 12px; border-bottom: 1px solid #1a1d27; color: #8b949e; vertical-align: top; } .summary-table tr:last-child td { border-bottom: none; } .summary-table .label { color: #c9d1d9; font-weight: 600; white-space: nowrap; font-family: 'Courier New', monospace; font-size: 0.95em; } .next-chapter { margin-top: 2rem; padding: 12px 16px; background: #1a1d27; border-radius: 6px; font-family: system-ui, sans-serif; font-size: 0.83em; color: #8b949e; } .next-chapter strong { color: #4f8ef7; }

Chapter 8 — JOINs

Every chapter so far has queried a single table. Real databases split data across multiple related tables — orders live in orders, the customer name lives in customers, the book title lives in books. A JOIN combines rows from two or more tables based on a matching column, letting you query that connected data as if it were one result set.

Why split data across tables? If we stored the customer name inside every order row, changing a customer's name would require updating every one of their orders — and any that were missed would have inconsistent data. By keeping names in one place (customers) and referencing them by ID, a single UPDATE fixes everything. JOINs are the mechanism that puts it back together at query time.

1. Table Aliases — Keeping Queries Readable

JOIN queries reference two or more tables. Qualifying every column name with the full table name (orders.customer_id, customers.customer_id) gets verbose fast. A table alias — a short nickname assigned in the FROM clause — keeps things concise:

-- Without aliases — verbose SELECT orders.order_id, customers.first_name, customers.last_name FROM orders JOIN customers ON orders.customer_id = customers.customer_id; -- With aliases — the standard way SELECT o.order_id, c.first_name, c.last_name FROM orders o JOIN customers c ON o.customer_id = c.customer_id;

The alias follows the table name in the FROM or JOIN clause, with or without the optional keyword AS. Both forms are identical — orders o and orders AS o mean the same thing.

aauthors
bbooks
ccustomers
oorders

We'll use these single-letter aliases consistently throughout this chapter.

2. INNER JOIN — Only Matching Rows

An INNER JOIN returns rows where the ON condition matches in both tables. Rows that have no match on the other side are silently excluded. JOIN and INNER JOIN are identical — INNER is optional.

-- Every order with the customer name and book title SELECT o.order_id, c.first_name, c.last_name, b.title, o.quantity, o.total_price FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id INNER JOIN books b ON o.book_id = b.book_id ORDER BY o.order_id;
order_idfirst_namelast_nametitlequantitytotal_price
1AliceNguyenThe Midnight Library217.98
2AliceNguyenSapiens110.99
3BenOkaforDune19.99
4ChloeMartinezAnd Then There Were None320.97
5DavidSinghReasons to Stay Alive17.99

Three tables joined in one query. Customer IDs and book IDs are replaced with the real names and titles.

Joining all four tables — a full order receipt

-- Full receipt: order → customer → book → author SELECT o.order_id, o.order_date, CONCAT(c.first_name, ' ', c.last_name) AS customer, b.title, CONCAT(a.first_name, ' ', a.last_name) AS author, o.quantity, o.total_price FROM orders o JOIN customers c ON o.customer_id = c.customer_id JOIN books b ON o.book_id = b.book_id JOIN authors a ON b.author_id = a.author_id ORDER BY o.order_date;
order_idorder_datecustomertitleauthorqtytotal
12024-02-14Alice NguyenThe Midnight LibraryMatt Haig217.98
22024-02-14Alice NguyenSapiensYuval Harari110.99
32024-03-05Ben OkaforDuneFrank Herbert19.99
42024-03-20Chloe MartinezAnd Then There Were NoneAgatha Christie320.97
52024-04-10David SinghReasons to Stay AliveMatt Haig17.99

All four tables joined in one query. Notice David (customer 4) was added in Chapter 5 but never ordered — he won't appear here because INNER JOIN only keeps matched rows.

3. JOIN Types at a Glance

INNER JOIN
Only rows that match in both tables. No match = excluded.
LEFT JOIN
All rows from the left table. Right side NULLs where no match.
RIGHT JOIN
All rows from the right table. Left side NULLs where no match.
LEFT ANTI-JOIN
Left rows with no match on the right. Find orphans / unmatched records.
JOIN typeKeeps from leftKeeps from rightBest for
INNER JOIN Matched only Matched only The common case — you only care about rows that have data on both sides
LEFT JOIN All rows Matched rows + NULLs for non-matches Keep all left rows even without a right-side match — "show all customers, even those with no orders"
RIGHT JOIN Matched rows + NULLs for non-matches All rows Rarely needed — rewrite as LEFT JOIN by swapping table order instead
LEFT JOIN + WHERE right.col IS NULL Unmatched only Excluded Find rows with no match — "customers who have never ordered", "books with no orders"
CROSS JOIN All rows All rows Cartesian product — every left row paired with every right row. Rarely useful; almost always accidental.

4. LEFT JOIN — Keeping Unmatched Left Rows

LEFT JOIN returns all rows from the left table, whether or not they have a matching row in the right table. When there is no match, every column from the right table is NULL in that result row.

-- All customers, with their orders if they have any SELECT c.customer_id, c.first_name, c.last_name, o.order_id, o.total_price FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id ORDER BY c.customer_id;
customer_idfirst_namelast_nameorder_idtotal_price
1AliceNguyen117.98
1AliceNguyen210.99
2BenOkafor39.99
3ChloeMartinez420.97
4DavidSinghNULLNULL
5DavidSingh (if added)NULLNULL

David (customer 4) has never ordered — LEFT JOIN still includes him, with NULLs for the orders columns.

LEFT JOIN + aggregate — counting orders per customer including zeros

-- Count orders per customer — include customers with zero orders SELECT c.customer_id, CONCAT(c.first_name, ' ', c.last_name) AS customer, COUNT(o.order_id) AS order_count, COALESCE(SUM(o.total_price), 0) AS total_spent FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.first_name, c.last_name ORDER BY total_spent DESC;
customer_idcustomerorder_counttotal_spent
1Alice Nguyen228.97
3Chloe Martinez120.97
2Ben Okafor19.99
4David Singh00.00

David appears with 0 orders and £0.00 spent. COUNT(o.order_id) returns 0 (not COUNT(*)) because o.order_id is NULL for David — COUNT(col) skips NULLs. COALESCE converts NULL SUM to 0.00.

COUNT(*) vs COUNT(right_table_col) in a LEFT JOIN. After a LEFT JOIN, unmatched rows have NULL in every right-table column. COUNT(*) would count those NULL rows as 1. Use COUNT(o.order_id) — it skips NULLs and correctly returns 0 for customers with no orders.

5. Anti-Join — Finding Rows With No Match

Combine LEFT JOIN with WHERE right_table.col IS NULL to find rows that have no corresponding entry in the joined table. This is one of the most practical JOIN patterns in real applications:

-- Customers who have never placed an order SELECT c.customer_id, c.first_name, c.last_name FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id WHERE o.customer_id IS NULL; -- only keep rows where no order matched
customer_idfirst_namelast_name
4DavidSingh
-- Books that have never been ordered SELECT b.book_id, b.title FROM books b LEFT JOIN orders o ON b.book_id = o.book_id WHERE o.book_id IS NULL;
book_idtitle
6The Left Hand of Darkness

The Left Hand of Darkness has never been ordered — a good candidate for a promotion or a stock review.

6. RIGHT JOIN

RIGHT JOIN keeps all rows from the right table and NULLs for the left where there is no match — the mirror image of LEFT JOIN. In practice, RIGHT JOIN is rarely used because any RIGHT JOIN can be rewritten as a LEFT JOIN by swapping the table order, which most developers find easier to read:

-- These two queries return identical results: -- Using RIGHT JOIN SELECT b.title, o.order_id FROM orders o RIGHT JOIN books b ON o.book_id = b.book_id; -- Equivalent LEFT JOIN (just swap the table order) SELECT b.title, o.order_id FROM books b LEFT JOIN orders o ON b.book_id = o.book_id;
Prefer LEFT JOIN over RIGHT JOIN. Both do the same job but most developers read SQL top-to-bottom and find it more natural to start with the "primary" table on the left. Mixing LEFT and RIGHT JOINs in the same query quickly becomes confusing. Stick to LEFT JOIN and swap table order when needed.

7. Self-Join — Joining a Table to Itself

A self-join joins a table to itself using two different aliases. It's used when rows in the same table have a relationship to each other — the classic example is an employees table where each employee has a manager_id that points to another row in the same table:

-- Suppose our authors table had a "mentored_by" column -- pointing to another author_id in the same table -- Self-join: show each author alongside their mentor SELECT a.first_name AS author, m.first_name AS mentor FROM authors a LEFT JOIN authors m ON a.mentored_by = m.author_id;

The bookshop schema doesn't have a recursive relationship, but self-joins come up often in organisational charts (employees + managers), category hierarchies (parent category → child category), and graph structures stored in relational tables.

8. Combining JOINs With Everything Else

JOINs compose cleanly with every clause from previous chapters. The execution order stays the same — FROM + JOIN happens first, then WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT:

Sales report by genre

-- Total revenue and units sold per genre, top genres first SELECT b.genre, COUNT(DISTINCT o.order_id) AS order_lines, SUM(o.quantity) AS units_sold, SUM(o.total_price) AS revenue FROM orders o JOIN books b ON o.book_id = b.book_id GROUP BY b.genre ORDER BY revenue DESC;
genreorder_linesunits_soldrevenue
Mystery1320.97
Fiction1217.98
Non-fiction2218.98
Sci-fi119.99

Best-selling authors

-- Revenue per author, only those who have sold something SELECT CONCAT(a.first_name, ' ', a.last_name) AS author, SUM(o.quantity) AS units_sold, SUM(o.total_price) AS revenue FROM authors a JOIN books b ON a.author_id = b.author_id JOIN orders o ON b.book_id = o.book_id GROUP BY a.author_id, a.first_name, a.last_name ORDER BY revenue DESC;
authorunits_soldrevenue
Agatha Christie320.97
Matt Haig325.97
Yuval Harari110.99
Frank Herbert19.99

Ursula Le Guin doesn't appear — her book has no orders (INNER JOIN excludes her). Use LEFT JOIN chains if you need to include authors with zero sales.

ON vs WHERE — filtering in the right place

-- Row filter (same for INNER and LEFT JOIN — fine in either place) SELECT c.first_name, o.order_id, o.total_price FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id WHERE o.total_price > 10.00; -- filters AFTER join; loses unmatched customers -- To filter on the joined table WITHOUT losing unmatched rows from the left, -- move the condition into the ON clause: SELECT c.first_name, o.order_id, o.total_price FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.total_price > 10.00; -- Customers with no qualifying orders still appear; their order columns are NULL
In a LEFT JOIN, filtering on a right-table column in WHERE converts it to an INNER JOIN. Once WHERE filters out rows where the right-table column is NULL, the "keep all left rows" guarantee is broken. Move right-table filters into the ON clause if you still need unmatched left rows in the result.

Chapter Summary

ConceptKey points
Table aliasShort nickname (a, b, c, o) assigned in FROM/JOIN. Required when the same column name exists in multiple tables. Use alias.column to qualify.
INNER JOINReturns only rows that match in both tables. Unmatched rows on either side are excluded. Most common join type.
LEFT JOINReturns all rows from the left table. Right-table columns are NULL for rows with no match. Use to include records regardless of whether a related record exists.
Anti-join patternLEFT JOIN + WHERE right.col IS NULL — finds left-table rows with no match on the right. "Customers who never ordered", "books never sold".
RIGHT JOINMirror of LEFT JOIN. Rarely used — rewrite as LEFT JOIN by swapping table order.
Self-joinJoin a table to itself using two aliases. Used for hierarchical/recursive data: manager → employee, parent category → child category.
COUNT in LEFT JOINUse COUNT(right_table.pk) not COUNT(*) — COUNT(*) counts NULL rows as 1; COUNT(col) skips them, giving the correct 0 for unmatched rows.
ON vs WHERE in LEFT JOINFiltering on a right-table column in WHERE silently converts LEFT JOIN to INNER JOIN. Move right-table filters into the ON clause to preserve unmatched left rows.
Next: Chapter 9 — String, Date, and Numeric Functions. MySQL has dozens of built-in functions for transforming data: CONCAT, SUBSTRING, REPLACE, UPPER/LOWER for strings; DATE_FORMAT, DATEDIFF, DATE_ADD for dates; ROUND, FLOOR, CEIL, MOD for numbers. Chapter 9 covers the ones you'll reach for every week.