Chapter 2

Chapter 2 — INNER JOIN and LEFT JOIN

Chapter 1 split the bookshop into ten normalised tables. Now comes the payoff: JOINs reunite that data at query time, combining rows from multiple tables into a single result set. INNER JOIN and LEFT JOIN cover 90% of real-world multi-table queries. This chapter works through both in depth — what they return, when to use each, and the patterns that will appear in almost every query you write.

This chapter uses the bookshop schema from Chapter 1. Run the CREATE TABLE and INSERT statements from Chapter 1 before following along if you haven't already. All queries are tested against that exact data set.

Why JOINs Exist

Normalisation splits data across tables to eliminate redundancy. The publisher's name lives in publishers, not in every books row. That's correct design — but when a user asks "what books does Doubleday publish?", we need both tables. A JOIN tells MySQL: for each row in books, find the matching row in publishers and glue them together.

The result of a JOIN looks like a single wide table. It isn't stored anywhere — MySQL builds it on the fly for this query, then discards it. The underlying tables stay exactly as they are.

Explicit JOIN vs implicit JOIN: SQL has two syntaxes for joining. Implicit joins list tables in FROM separated by commas and filter in WHERE. Explicit joins use the JOIN keyword. Always use explicit JOIN — it makes the join condition visible and separate from the filter condition, and it prevents accidental cartesian products.
✗ Implicit join — never write this
SELECT b.title, p.name
FROM   books b, publishers p
WHERE  b.publisher_id = p.id;
-- Mix of join condition and filter in WHERE
-- If you forget the WHERE: 6 books × 4 publishers = 24 rows
✓ Explicit JOIN — always write this
SELECT b.title, p.name
FROM   books b
JOIN   publishers p ON p.id = b.publisher_id;
-- Join condition is ON, filter conditions go in WHERE
-- Clear, intentional, impossible to forget the ON

INNER JOIN

books publishers ┌─────────┐ ┌─────────┐ │ │ │ │ │ left■■│■■right │ │ only │ │matchonly │ │ │ │ │ └─────────┘ └─────────┘ ↑ INNER JOIN returns only the shaded intersection Rows with no match on either side are excluded.

INNER JOIN returns only rows where the ON condition is satisfied in both tables. If a book has a NULL publisher_id, it won't appear. If a publisher has no books, it won't appear. Only matched pairs make it into the result.

Basic syntax

SELECT b.title, p.name AS publisher, b.price, b.year_published FROM books b JOIN publishers p ON p.id = b.publisher_id ORDER BY b.year_published;
Result — 6 books, all with a matched publisher
titlepublisherpriceyear_published
Murder on the Orient ExpressHodder & Stoughton7.491934
FoundationDoubleday8.991951
DuneAce Books9.991965
NeuromancerAce Books7.991984
HyperionDoubleday9.991989
Gone GirlDoubleday8.492012
JOIN is shorthand for INNER JOIN. MySQL treats JOIN and INNER JOIN identically. Most developers write JOIN for inner joins and spell out LEFT JOIN / RIGHT JOIN / CROSS JOIN in full — the asymmetry makes the type visible at a glance.

Adding a WHERE clause — filtering after joining

-- Books from UK publishers only SELECT b.title, p.name AS publisher, p.country FROM books b JOIN publishers p ON p.id = b.publisher_id WHERE p.country = 'UK' ORDER BY b.title;
Result — only Murder on the Orient Express matches (Hodder = UK)
titlepublishercountry
Murder on the Orient ExpressHodder & StoughtonUK

Joining three tables — books + authors via book_authors

Books and authors have a many-to-many relationship through book_authors. To get author names on books, we join through the junction table:

-- Books with their primary author's full name SELECT b.title, CONCAT(a.first_name, ' ', a.last_name) AS author, b.price FROM books b JOIN book_authors ba ON ba.book_id = b.id AND ba.author_order = 1 -- primary author only JOIN authors a ON a.id = ba.author_id ORDER BY a.last_name;
Result
titleauthorprice
FoundationIsaac Asimov8.99
Murder on the Orient ExpressAgatha Christie7.49
Gone GirlGillian Flynn8.49
NeuromancerWilliam Gibson7.99
DuneFrank Herbert9.99
HyperionDan Simmons9.99
1
books b — start with the books table (aliased as b)
2
JOIN book_authors ba ON ba.book_id = b.id AND ba.author_order = 1 — join to the junction table, keeping only the primary author entry. The additional AND condition on the ON clause is applied during the join itself — it's more efficient than filtering in WHERE.
3
JOIN authors a ON a.id = ba.author_id — now join to authors using the author_id from the junction row. We now have one result row per book with all columns from all three tables available.
4
SELECT / ORDER BY — pick the columns we want and sort. MySQL has already done the hard work.

Aggregating across a join — average review rating per book

-- Books that have at least one review, with their average rating SELECT b.title, ROUND(AVG(r.rating), 1) AS avg_rating, COUNT(r.id) AS review_count FROM books b JOIN reviews r ON r.book_id = b.id GROUP BY b.id, b.title ORDER BY avg_rating DESC;
Result — only books with reviews appear (INNER JOIN excludes unreviewed books)
titleavg_ratingreview_count
Foundation5.01
Murder on the Orient Express5.01
Dune4.52
Neuromancer4.01

Hyperion and Gone Girl are missing — they have no reviews yet. That's INNER JOIN's defining behaviour. To include them, we need LEFT JOIN.

LEFT JOIN

books reviews ┌─────────┐ ┌─────────┐ │ │ │ │ │■■■■■■■■■■│■■│ │ │ left │ │matchright │ │ only │ │ │ only │ │ │ │ │ └─────────┘ └─────────┘ ↑ LEFT JOIN returns ALL left rows + matched right rows Right-table columns are NULL where there is no match.

LEFT JOIN returns every row from the left table, regardless of whether a match exists in the right table. Where there is no match, all columns from the right table appear as NULL. This is exactly what you want when the relationship is optional: a book may or may not have reviews; you still want the book in the result.

All books with their average rating — including unreviewed books

-- All books, showing NULL avg_rating and 0 reviews for unreviewed books SELECT b.title, ROUND(AVG(r.rating), 1) AS avg_rating, COUNT(r.id) AS review_count FROM books b LEFT JOIN reviews r ON r.book_id = b.id GROUP BY b.id, b.title ORDER BY review_count DESC, b.title;
Result — all 6 books present; Hyperion and Gone Girl have no reviews
titleavg_ratingreview_count
Dune4.52
Foundation5.01
Murder on the Orient Express5.01
Neuromancer4.01
Gone GirlNULL0
HyperionNULL0
COUNT(r.id) vs COUNT(*): when using LEFT JOIN, COUNT(*) counts the NULL row for unmatched left-side entries — giving 1 instead of 0 for books with no reviews. COUNT(r.id) counts only non-NULL values in the right table's column — correctly giving 0 when there are no matching reviews. Always use COUNT(column_from_right_table) after a LEFT JOIN aggregation.

COALESCE — replacing NULLs in output

-- Show "No reviews yet" instead of NULL, and "0.0" instead of NULL avg SELECT b.title, COALESCE(CAST(ROUND(AVG(r.rating), 1) AS CHAR), 'No reviews yet') AS avg_rating, COUNT(r.id) AS review_count FROM books b LEFT JOIN reviews r ON r.book_id = b.id GROUP BY b.id, b.title ORDER BY review_count DESC, b.title;

All customers with their order count — including customers who never ordered

SELECT CONCAT(c.first_name, ' ', c.last_name) AS customer, COUNT(o.id) AS order_count, COALESCE(SUM(o.id), 0) AS placeholder FROM customers c LEFT JOIN orders o ON o.customer_id = c.id GROUP BY c.id ORDER BY order_count DESC;
Result — Alice has 2 orders, Bob has 1, Carol has 1 (all pending)
customerorder_count
Alice Martin2
Bob Nguyen1
Carol Patel1

LEFT JOIN with a publisher that has books with nullable publisher_id

-- Add a book with no publisher to see LEFT JOIN behaviour INSERT INTO books (title, publisher_id, price, year_published, stock) VALUES ('Self-Published Manifesto', NULL, 4.99, 2025, 10); -- INNER JOIN: book with NULL publisher_id is excluded SELECT b.title, p.name AS publisher FROM books b JOIN publishers p ON p.id = b.publisher_id; -- 6 rows — Self-Published Manifesto not in result -- LEFT JOIN: all books, NULL publisher for the unmatched one SELECT b.title, COALESCE(p.name, 'Independent') AS publisher FROM books b LEFT JOIN publishers p ON p.id = b.publisher_id ORDER BY b.title; -- 7 rows — Self-Published Manifesto shows "Independent"

The Anti-Join Pattern — LEFT JOIN + WHERE IS NULL

A LEFT JOIN where you filter for NULL on the right table's column finds rows in the left table that have no match at all in the right table. This is called an anti-join — it's how you answer "which X has no Y?"

-- Which books have NO reviews yet? SELECT b.title, b.price FROM books b LEFT JOIN reviews r ON r.book_id = b.id WHERE r.id IS NULL -- NULL means no review row matched ORDER BY b.title;
Result
titleprice
Gone Girl8.49
Hyperion9.99
Self-Published Manifesto4.99
-- Which customers have NEVER placed an order? SELECT c.first_name, c.last_name, c.email FROM customers c LEFT JOIN orders o ON o.customer_id = c.id WHERE o.id IS NULL; -- Which authors have no books in the catalogue? SELECT a.first_name, a.last_name FROM authors a LEFT JOIN book_authors ba ON ba.author_id = a.id WHERE ba.book_id IS NULL;
The IS NULL filter must reference a NOT NULL column from the right table. Filter on the primary key or a column defined NOT NULL. If you accidentally filter on a nullable column, you can't tell whether NULL means "no match" or "the column value is NULL". r.id, o.id, and ba.book_id above are all primary or not-null keys — safe choices for the IS NULL test.

Joining on Non-Primary-Key Columns

The ON clause can match any column — not just primary and foreign keys. Joins on natural keys (ISBN, email, slug) and even on non-unique values are valid.

-- Join two hypothetical catalogue systems on ISBN (a natural key, not a PK) SELECT b.title, b.price AS our_price, ext.price AS competitor_price FROM books b JOIN external_catalogue ext ON ext.isbn = b.isbn; -- Join on a non-unique column: find all orders placed the same day as order #1 SELECT o2.id AS same_day_order, o2.customer_id, o2.status FROM orders o1 JOIN orders o2 ON DATE(o2.created_at) = DATE(o1.created_at) AND o2.id != o1.id -- exclude the original order itself WHERE o1.id = 1; -- Joining on a genre slug (string comparison) — no integer key involved SELECT b.title FROM books b JOIN book_genres bg ON bg.book_id = b.id JOIN genres g ON g.id = bg.genre_id WHERE g.slug = 'cyberpunk';
Index the join column for performance. MySQL uses the ON column to look up matching rows. If the column isn't indexed, MySQL scans the entire right-table for every left-table row — catastrophic on large tables. Primary keys are automatically indexed. Foreign key columns (like book_id in reviews) should have an index added explicitly, as we did in Chapter 1. For joins on natural keys like ISBN, add an index or UNIQUE constraint.

NULL Handling in JOINs

-- NULL != NULL — a NULL FK never matches any row -- This is why INNER JOIN excludes NULLs automatically -- Demonstrate: ON p.id = b.publisher_id where publisher_id IS NULL SELECT NULL = NULL; +-----------+ | NULL = NULL | +-----------+ | NULL | -- not TRUE — the comparison itself is NULL (unknown) +-----------+ -- MySQL evaluates ON conditions as true/false/null -- A NULL result is treated as false → the row is excluded from INNER JOIN -- Left JOIN includes it but sets all right-table columns to NULL -- Safe NULL check: use IS NULL / IS NOT NULL, never = NULL SELECT title FROM books WHERE publisher_id IS NULL; -- correct SELECT title FROM books WHERE publisher_id = NULL; -- always returns 0 rows

Common Mistakes

Missing the ON clause — accidental CROSS JOIN: JOIN publishers p without ON p.id = b.publisher_id is a CROSS JOIN — every book row is combined with every publisher row. 6 books × 4 publishers = 24 rows of garbage. MySQL 8.0 will usually throw a syntax error for a missing ON, but MySQL 5.x may not. Always write the ON.
-- Mistake 1: filtering on the wrong column SELECT b.title, p.name FROM books b JOIN publishers p ON p.id = b.id; -- wrong: should be b.publisher_id -- Joins book.id to publisher.id — nonsensical match -- Mistake 2: using LEFT JOIN but forgetting IS NULL in anti-join SELECT b.title FROM books b LEFT JOIN reviews r ON r.book_id = b.id; -- Returns ALL books including reviewed ones — forgot WHERE r.id IS NULL -- Mistake 3: COUNT(*) after LEFT JOIN SELECT b.title, COUNT(*) AS review_count -- returns 1 for unreviewed books FROM books b LEFT JOIN reviews r ON r.book_id = b.id GROUP BY b.id; -- Fix: COUNT(r.id) — counts only non-NULL values from the right table -- Mistake 4: ambiguous column name (both tables have "id") SELECT id, title, name -- ERROR: Column 'id' in field list is ambiguous FROM books b JOIN publishers p ON p.id = b.publisher_id; -- Fix: always qualify with alias when both tables share a column name SELECT b.id, b.title, p.name -- explicit table alias qualifiers FROM books b JOIN publishers p ON p.id = b.publisher_id;

Practical Example — Order Summary Report

A real-world query combining multiple JOINs and LEFT JOINs to produce an order line-item report:

-- Order summary: customer name, order date, book title, qty, line total -- Include orders with no items (pending with nothing added) via LEFT JOIN on items SELECT CONCAT(c.first_name, ' ', c.last_name) AS customer, o.id AS order_id, o.status, DATE(o.created_at) AS order_date, b.title AS book, oi.quantity, oi.quantity * oi.unit_price AS line_total FROM orders o JOIN customers c ON c.id = o.customer_id LEFT JOIN order_items oi ON oi.order_id = o.id LEFT JOIN books b ON b.id = oi.book_id ORDER BY o.created_at, oi.id;
Result — all orders, with line items. An order with no items would show NULL for book/qty/total.
customerorder_idstatusorder_datebookqtyline_total
Bob Nguyen3delivered2026-03-05Foundation18.99
Bob Nguyen3delivered2026-03-05Murder on the Orient Express17.49
Alice Martin1delivered2026-04-10Dune219.98
Alice Martin1delivered2026-04-10Neuromancer17.99
Alice Martin2shipped2026-05-20Hyperion19.99
Carol Patel4pending2026-06-01Gone Girl18.49

Quick Reference — Chapter 2

PatternWhat it does
JOIN … ON INNER JOIN — returns only rows with a match in both tables. Excludes NULLs and unmatched rows from either side.
LEFT JOIN … ON Returns all rows from the left table + matched rows from the right. Right-table columns are NULL where there is no match.
LEFT JOIN … WHERE right.id IS NULL Anti-join — rows in the left table that have no matching row in the right table. Always filter on a NOT NULL column.
COUNT(right_table.id) After a LEFT JOIN GROUP BY, counts only matched rows (0 for unmatched). COUNT(*) wrongly gives 1.
COALESCE(col, 'default') Replaces NULL in output with a fallback value. Use on right-table columns in LEFT JOIN results.
table alias FROM books b — alias b. Use b.column everywhere. Short aliases (b, p, a, c, o, r, oi) improve readability and remove ambiguity when tables share column names.
ON clause condition The join condition. Can include AND for extra filters applied during the join (more efficient than WHERE for LEFT JOINs). Use WHERE for post-join filters on the result.
Three-table join FROM a JOIN b ON … JOIN c ON … — chain as many JOINs as needed. Each adds one more table to the working result set.
NULL = NULL Evaluates to NULL (unknown), not TRUE. A NULL FK never matches any row in an INNER JOIN. Use IS NULL / IS NOT NULL for null checks — never = NULL.