Chapter 10 — Query Optimisation
Every query you write goes through three stages: the parser checks syntax, the
optimiser decides how to execute it (which indexes to use, what join order
to pick, whether to use a temporary table), and the executor runs the chosen plan.
The optimiser is very good — but it can only work with the indexes you give it and
the SQL you write. This chapter teaches you to read its mind, feed it what it needs,
and spot the patterns that defeat it.
The golden rule of optimisation: measure first, then fix.
Gut-feel optimisation creates complexity without guaranteed gain.
Use EXPLAIN to confirm there's a problem, the slow query log to find which queries
need attention, and benchmark before and after any change.
1. EXPLAIN — Reading the Query Plan
Prefix any SELECT (or DML statement in MySQL 8.0+) with EXPLAIN to see
the execution plan the optimiser chose — without actually running the query.
EXPLAIN
SELECT
b.title,
CONCAT(a.first_name, ' ', a.last_name) AS author,
COUNT(r.review_id) AS reviews
FROM books b
JOIN book_authors ba ON ba.book_id = b.book_id
JOIN authors a ON a.author_id = ba.author_id
LEFT JOIN reviews r ON r.book_id = b.book_id
GROUP BY b.book_id, b.title, a.first_name, a.last_name
ORDER BY reviews DESC\G
The key columns you need to read
| Column | What it tells you | Red flags |
| type |
The access method (see graded list below) |
ALL (full table scan), index (full index scan) |
| key |
The index actually chosen (NULL if none) |
NULL on a large table |
| key_len |
Bytes of the index used — tells you how many columns of a composite index were used |
Much shorter than expected → composite index not fully used |
| rows |
Estimated rows examined per outer row — the optimiser's cost estimate |
Very high relative to table size |
| filtered |
Estimated % of rows that pass the WHERE after index access |
Low (e.g. 5%) means index is poor for this filter |
| Extra |
Additional plan details |
Using filesort, Using temporary — expensive operations |
| possible_keys |
Indexes the optimiser considered |
NULL means no usable index exists for this table |
The type column — access method graded best to worst
const / eq_ref
Best. At most one row matched — primary key or unique index equality. As fast as a lookup can be.
ref
Good. Non-unique index equality — returns a small set of matching rows. Common on foreign key columns.
range
Acceptable. Index range scan — WHERE col BETWEEN, col > val, col IN (…). Reads a contiguous slice of the index.
index
Marginal. Scans the entire index tree (cheaper than a table scan, but still full scan). Often appears when a covering index is used.
ALL
Bad. Full table scan — reads every row. Acceptable only on tiny tables (<a few hundred rows). Investigate immediately on large tables.
Extra column — what the flags mean
| Extra value | Meaning | Fix |
| Using index | Covering index — no table row read needed | Great. No action needed. |
| Using where | WHERE filter applied after index access | Normal. Fine unless rows is very high. |
| Using filesort | ORDER BY couldn't use an index — sort in memory/disk | Add an index that covers the ORDER BY columns. |
| Using temporary | Intermediate result stored in a temp table (GROUP BY, DISTINCT, some subqueries) | Add an index covering the GROUP BY columns, or rewrite the query. |
| Using filesort + Using temporary | Sort after a temp table — double penalty | High priority to fix. Redesign index or query structure. |
| Using index condition | Index condition pushdown — filter applied inside the storage engine | Good. Engine filters before returning rows. |
2. EXPLAIN ANALYZE — Actual Execution Stats
EXPLAIN ANALYZE (MySQL 8.0.18+) actually runs the query and returns the
plan annotated with real timing and row counts alongside the estimates.
Use it to confirm whether the optimiser's estimates match reality.
EXPLAIN ANALYZE
SELECT b.title, COUNT(r.review_id) AS reviews
FROM books b
LEFT JOIN reviews r ON r.book_id = b.book_id
GROUP BY b.book_id
ORDER BY reviews DESC\G
-> Sort: reviews DESC (actual time=12.4..12.5 rows=84 loops=1)
-> Table scan on <temporary> (actual time=0.0..0.4 rows=84 loops=1)
-> Aggregate using temporary table (actual time=11.1..11.8 rows=84 loops=1)
-> Nested loop left join (actual time=0.2..8.4 rows=312 loops=1)
-> Table scan on b (cost=9.3 rows=84) (actual time=0.1..0.5 rows=84 loops=1)
-> Index lookup on r using idx_reviews_book_id (book_id=b.book_id)
(cost=0.9 rows=3) (actual time=0.06..0.09 rows=3 loops=84)
Compare estimated vs actual rows. When the optimiser estimates 3 rows but
actually processes 30,000, its cost model is wrong — usually because table statistics are
stale. Run ANALYZE TABLE tablename; to refresh them.
3. Index Strategy
Single-column index
The baseline. Index one column used frequently in WHERE, JOIN, or ORDER BY. InnoDB primary key is always a clustered index — the table rows are physically ordered by it.
Composite index
Index on (col_a, col_b, col_c). Left-prefix rule: queries must use col_a to benefit, then optionally col_b, then col_c. Skipping a column in the middle breaks the chain.
Covering index
A composite index that includes all columns the query needs — SELECT list + WHERE + ORDER BY. The engine never touches the table rows — EXPLAIN shows "Using index."
Prefix index
Index on the first N characters of a TEXT/VARCHAR column: INDEX (email(20)). Saves space but cannot be a covering index and cannot be used for ORDER BY.
Adding indexes to the bookshop schema
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_order_items_order ON order_items (order_id);
CREATE INDEX idx_order_items_book ON order_items (book_id);
CREATE INDEX idx_reviews_book ON reviews (book_id);
CREATE INDEX idx_reviews_customer ON reviews (customer_id);
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
CREATE INDEX idx_books_cover ON books (publisher_id, price, title);
CREATE INDEX idx_customers_email_pfx ON customers (email(20));
The left-prefix rule — composite index column order matters
SELECT * FROM orders WHERE customer_id = 7 AND order_date >= '2026-01-01';
SELECT * FROM orders WHERE customer_id = 7;
SELECT * FROM orders WHERE order_date >= '2026-01-01';
Covering index — eliminate the table row lookup
SELECT title, price
FROM books
WHERE publisher_id = 3
ORDER BY price;
Index selectivity — not all columns deserve an index
Selectivity = distinct values ÷ total rows. A column with low selectivity
(e.g. a boolean, or a status column with 3 values) is a poor index candidate —
the optimiser may prefer a full table scan anyway because too many rows match.
SELECT
COUNT(DISTINCT status) AS distinct_values,
COUNT(*) AS total_rows,
ROUND(COUNT(DISTINCT status) / COUNT(*), 4) AS selectivity
FROM orders;
4. Anti-Patterns That Defeat Indexes
1 — Function on the indexed column
SELECT * FROM orders WHERE YEAR(order_date) = 2026;
SELECT * FROM orders
WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01';
SELECT * FROM customers WHERE LOWER(email) = 'alice@example.com';
SELECT * FROM customers WHERE email = 'alice@example.com';
2 — Implicit type cast
SELECT * FROM books WHERE isbn = 9780142000670;
SELECT * FROM books WHERE isbn = '9780142000670';
3 — Leading wildcard LIKE
SELECT * FROM books WHERE title LIKE '%foundation%';
SELECT * FROM books WHERE title LIKE 'Foundation%';
ALTER TABLE books ADD FULLTEXT INDEX ft_title (title);
SELECT * FROM books
WHERE MATCH(title) AGAINST ('foundation' IN BOOLEAN MODE);
4 — OR across different columns
SELECT * FROM books
WHERE title = 'Dune' OR isbn = '9780441013593';
SELECT * FROM books WHERE title = 'Dune'
UNION
SELECT * FROM books WHERE isbn = '9780441013593';
5 — Negation operators
SELECT * FROM orders WHERE status != 'pending';
SELECT * FROM orders
WHERE status IN ('shipped', 'delivered', 'cancelled');
6 — Non-deterministic functions in WHERE
SELECT * FROM orders
WHERE order_date >= NOW() - INTERVAL 30 DAY;
5. The Slow Query Log
The slow query log records every query that takes longer than a threshold you set.
It's the most reliable way to find real problems in production — not guessed ones.
Enable the slow query log
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
SET GLOBAL log_queries_not_using_indexes = 'ON';
Reading the slow query log with mysqldumpslow
mysqldumpslow -s t -t 10 /var/log/mysql/slow.log
# Time: 2026-06-15T09:14:22
# User@Host: admin[admin] @ localhost []
# Query_time: 3.842134 Lock_time: 0.000120
# Rows_sent: 1 Rows_examined: 487503
# SET timestamp=1750000462;
# SELECT * FROM orders WHERE YEAR(order_date) = 2026;
Rows_examined vs Rows_sent is your key ratio.
Examining 487,503 rows to send 1 means the query is doing an enormous amount of
unnecessary work. A well-indexed query should examine close to the number of rows it returns.
6. Live Diagnosis
SHOW PROCESSLIST — what's running right now
SHOW FULL PROCESSLIST\G
KILL QUERY 42;
KILL 42;
Key SHOW STATUS counters
FLUSH STATUS;
SHOW STATUS LIKE 'Handler_%';
SHOW STATUS LIKE 'Select_full%';
SHOW STATUS LIKE 'Created_tmp%';
ANALYZE TABLE — refresh statistics for the optimiser
ANALYZE TABLE orders, order_items, books;
SELECT table_name, update_time, table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE()
ORDER BY update_time DESC;
7. A Complete Optimisation Walk-Through
A slow report query has been flagged in the slow query log — 4.2 seconds,
examining 620,000 rows, returning 28. Here's the full diagnosis and fix cycle:
Step 1: run EXPLAIN
EXPLAIN
SELECT
DATE_FORMAT(o.order_date, '%Y-%m') AS month,
COUNT(DISTINCT o.order_id) AS orders,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE YEAR(o.order_date) = 2026
GROUP BY month
ORDER BY month\G
| table | type | key | rows | Extra |
| o |
ALL |
NULL |
420000 |
Using where; Using temporary; Using filesort |
| oi |
ref |
idx_order_items_order |
3 |
NULL |
Diagnoses: (1) YEAR() wraps the column — index on order_date is bypassed.
(2) No index exists for ORDER BY month. (3) GROUP BY forces a temporary table.
Step 2: fix the query and add an index
CREATE INDEX idx_orders_date_id ON orders (order_date, order_id);
SELECT
DATE_FORMAT(o.order_date, '%Y-%m') AS month,
COUNT(DISTINCT o.order_id) AS orders,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.order_date >= '2026-01-01'
AND o.order_date < '2027-01-01'
GROUP BY month
ORDER BY month;
Step 3: verify with EXPLAIN
| table | type | key | rows | Extra |
| o |
range |
idx_orders_date_id |
28 |
Using where; Using index |
| oi |
ref |
idx_order_items_order |
3 |
NULL |
Rows examined dropped from 420,000 to 28. "Using temporary" and "Using filesort"
are gone. Query time: 12ms.
Quick Reference
| Tool / concept | Command / rule | What to act on |
| EXPLAIN |
EXPLAIN SELECT … |
type = ALL, key = NULL, Extra = Using filesort / Using temporary |
| EXPLAIN ANALYZE |
EXPLAIN ANALYZE SELECT … (8.0.18+) |
Large gap between estimated and actual rows → run ANALYZE TABLE |
| Access types (best→worst) |
const → eq_ref → ref → range → index → ALL |
Investigate anything at index or ALL on tables > a few hundred rows |
| Composite index order |
Put equality columns first, range/ORDER BY column last |
Left-prefix rule: skipping a column breaks index use for subsequent columns |
| Covering index |
Include SELECT + WHERE + ORDER BY cols in one index |
EXPLAIN shows "Using index" — no table row read. Fastest possible read path. |
| Function on column |
WHERE YEAR(col) → WHERE col BETWEEN … |
Any function wrapping an indexed column defeats the index |
| Leading wildcard |
LIKE '%term' → use FULLTEXT index + MATCH … AGAINST |
B-tree indexes require a known prefix |
| NOT IN with NULLs |
Add WHERE col IS NOT NULL inside subquery |
Any NULL in the list silently empties the NOT IN result |
| Slow query log |
SET GLOBAL slow_query_log = ON; SET GLOBAL long_query_time = 1; |
High Rows_examined vs Rows_sent ratio → missing or wrong index |
| mysqldumpslow |
mysqldumpslow -s t -t 10 /path/slow.log |
Find the top time-wasting query patterns across thousands of log entries |
| Refresh statistics |
ANALYZE TABLE tbl; |
Run after large bulk operations; stale stats cause poor plan choices |
| SHOW PROCESSLIST |
SHOW FULL PROCESSLIST\G |
Long-running queries holding locks; kill with KILL QUERY id |
| Selectivity check |
COUNT(DISTINCT col) / COUNT(*) |
Values near 0 → poor standalone index candidate; combine in composite or skip |
Course Complete
You've covered all 10 chapters of the Advanced SQL course — from normalisation and every join type through subqueries, CTEs, window functions, views, stored procedures, triggers, events, and query optimisation. The skills in this course represent the full professional SQL toolkit used in production databases worldwide.