Chapter 4
Chapter 4 — Subqueries
A subquery is a SELECT statement nested inside another SQL statement. It can appear almost anywhere an expression or table name is expected — in a SELECT list, a WHERE clause, a FROM clause, or even the target of an INSERT or UPDATE. The outer query sees the subquery's result as if it were a value, a row, or a temporary table.
Subqueries solve problems that joins can't easily express: "find customers who have never ordered," "find books priced above the average," "delete rows that match a condition on a different table." Understanding when to reach for a subquery — and when a join or CTE is better — is one of the most valuable judgement calls in SQL.
1. Three Shapes of Subquery
= (val1, val2) or row constructors. Rare in practice.2. Scalar Subqueries
In the SELECT list — add a comparison column
| title | price | avg_price | diff_from_avg |
|---|---|---|---|
| Dune | 19.99 | 12.84 | 7.15 |
| Foundation | 16.50 | 12.84 | 3.66 |
| Neuromancer | 11.99 | 12.84 | -0.85 |
| 1984 | 9.99 | 12.84 | -2.85 |
| The Hobbit | 8.49 | 12.84 | -4.35 |
(SELECT AVG(price) FROM books) calls
above are typically computed once and reused — but to be safe and explicit, a CTE
(Chapter 5) or user variable is cleaner for complex queries.
In WHERE — filter against a computed value
In HAVING — filter groups against a computed value
3. IN and NOT IN
IN (subquery) tests whether a value appears in the set returned
by the subquery. It's the most natural way to express "rows in table A whose
key appears in table B."
IN — customers who have placed at least one order
NOT IN — customers who have never ordered
orders.customer_id is nullable and even one row has NULL,
NOT IN returns zero rows — the entire result set disappears silently.
This is the most common subquery bug in production SQL.The fix: add
WHERE customer_id IS NOT NULL inside the subquery,
or use NOT EXISTS instead (see below) — it handles NULLs correctly.
4. EXISTS and NOT EXISTS
EXISTS (subquery) returns TRUE if the subquery produces at least one row —
it doesn't matter what the columns are. MySQL stops as soon as it finds the first
matching row, making EXISTS very efficient for existence checks.
EXISTS is always correlated — the subquery references columns from the outer query. This is the key difference from IN: IN computes its full set first, then filters; EXISTS short-circuits per outer row.
EXISTS — customers with at least one order
NOT EXISTS — customers who have never ordered (NULL-safe)
EXISTS with a condition — customers who spent over £100 in a single order
5. ANY and ALL
ANY and ALL compare a value against every value returned
by a subquery using a comparison operator.
val > ANY (subquery)— true if val is greater than at least one value in the set (equivalent toval > MIN(subquery))val > ALL (subquery)— true if val is greater than every value in the set (equivalent toval > MAX(subquery))= ANYis equivalent toIN<> ALLis equivalent toNOT IN(with the same NULL caveats)
ANY — books more expensive than at least one Penguin title
ALL — books more expensive than every Penguin title
> ANY → > (SELECT MIN(…)), > ALL → > (SELECT MAX(…)))
because the aggregate versions are more readable and unambiguously express intent.
Know ANY/ALL for reading code; write the aggregate form.
6. Correlated vs Non-Correlated Subqueries
A non-correlated subquery is self-contained — it doesn't reference any column from the outer query. MySQL can evaluate it once and cache the result.
A correlated subquery references a column from the outer query. MySQL must re-evaluate it once for each row of the outer query. This makes them potentially slow on large tables.
Non-correlated — evaluated once
Correlated — runs once per outer row
When correlated subqueries are the right tool
They shine when the logic genuinely requires row-by-row comparison and a join would produce fan-out. A classic case: "find the most recent order for each customer" without using a window function:
7. Derived Tables — Subqueries in FROM
A subquery in the FROM clause acts as a temporary, unnamed table — often called a derived table. It must be given an alias. Every column you want to use from it must have a name (alias expressions if needed).
Derived table — filter on an aggregate without a CTE
WITH and is reusable. A derived table is inline and
can only be referenced once. Prefer CTEs for anything beyond a single, simple subquery —
they are easier to read and debug.
Lateral derived tables (MySQL 8.0.14+)
A LATERAL derived table can reference columns from tables listed earlier
in the FROM clause — removing the need for correlated subqueries in many cases:
8. Subqueries in INSERT, UPDATE, and DELETE
INSERT … SELECT — copy rows from one table to another
UPDATE with a subquery in SET
UPDATE with a subquery in WHERE
DELETE with a subquery
9. Subquery vs JOIN vs CTE — Decision Guide
| Situation | Reach for | Why |
|---|---|---|
| "Does a related row exist?" with no columns needed from it | EXISTS / NOT EXISTS | Short-circuits. NULL-safe. Expresses intent clearly. |
| Filter on a single aggregate (avg price, max date) | Scalar subquery in WHERE | Concise. Non-correlated = evaluated once. |
| Need columns from both tables in the output | JOIN | Subqueries can't easily return data from both sides. |
| Complex multi-step logic, result reused more than once | CTE (WITH) | Named, readable, reusable. Replaces nested derived tables. |
| "Not in this list" — the list has no NULLs | NOT IN | Readable. Just ensure the subquery excludes NULLs. |
| "Not in this list" — NULLs possible | NOT EXISTS or LEFT JOIN IS NULL | NOT IN breaks silently on NULLs. |
| One result per outer row from a subquery with ORDER BY + LIMIT | LATERAL JOIN (MySQL 8.0.14+) | Only lateral derived tables can correlate and limit at the same time. |
| Add an aggregate alongside detail rows without collapsing them | Window function (Chapter 6) | Correlated subquery in SELECT is a common anti-pattern; window functions do it in one pass. |
Quick Reference
| Form | Returns | Used in | Watch out for |
|---|---|---|---|
| Scalar subquery | One value | SELECT list, WHERE, HAVING, ORDER BY, SET | Error if >1 row returned; NULL if 0 rows returned. |
| IN (subquery) | Set of values | WHERE | NULL in list makes IN always FALSE for that value. |
| NOT IN (subquery) | Set of values | WHERE | Any NULL in subquery returns zero rows. Always add WHERE col IS NOT NULL. |
| EXISTS (subquery) | Boolean | WHERE | Always correlated. Short-circuits on first match — fast. |
| NOT EXISTS (subquery) | Boolean | WHERE | NULL-safe alternative to NOT IN. Preferred for "no match" checks. |
| val > ANY (subquery) | Boolean | WHERE | True if val > at least one value. Equivalent to val > (SELECT MIN(…)). |
| val > ALL (subquery) | Boolean | WHERE | True if val > every value. Equivalent to val > (SELECT MAX(…)). |
| Derived table (FROM) | Virtual table | FROM clause | Must have an alias. Cannot be referenced more than once — use CTE instead. |
| LATERAL JOIN | Virtual table per outer row | JOIN … ON TRUE | MySQL 8.0.14+. Can reference outer columns. Useful with LIMIT inside the lateral. |
| Correlated subquery | Any of the above | WHERE, SELECT list | Runs once per outer row — expensive on large tables. Rewrite as JOIN or window function when possible. |