Chapter 10

Chapter 10 — Query Optimisation

Every query you write goes through three stages: the parser checks syntax, the optimiser decides how to execute it (which indexes to use, what join order to pick, whether to use a temporary table), and the executor runs the chosen plan. The optimiser is very good — but it can only work with the indexes you give it and the SQL you write. This chapter teaches you to read its mind, feed it what it needs, and spot the patterns that defeat it.

The golden rule of optimisation: measure first, then fix. Gut-feel optimisation creates complexity without guaranteed gain. Use EXPLAIN to confirm there's a problem, the slow query log to find which queries need attention, and benchmark before and after any change.

1. EXPLAIN — Reading the Query Plan

Prefix any SELECT (or DML statement in MySQL 8.0+) with EXPLAIN to see the execution plan the optimiser chose — without actually running the query.

EXPLAIN SELECT b.title, CONCAT(a.first_name, ' ', a.last_name) AS author, COUNT(r.review_id) AS reviews FROM books b JOIN book_authors ba ON ba.book_id = b.book_id JOIN authors a ON a.author_id = ba.author_id LEFT JOIN reviews r ON r.book_id = b.book_id GROUP BY b.book_id, b.title, a.first_name, a.last_name ORDER BY reviews DESC\G

The key columns you need to read

ColumnWhat it tells youRed flags
type The access method (see graded list below) ALL (full table scan), index (full index scan)
key The index actually chosen (NULL if none) NULL on a large table
key_len Bytes of the index used — tells you how many columns of a composite index were used Much shorter than expected → composite index not fully used
rows Estimated rows examined per outer row — the optimiser's cost estimate Very high relative to table size
filtered Estimated % of rows that pass the WHERE after index access Low (e.g. 5%) means index is poor for this filter
Extra Additional plan details Using filesort, Using temporary — expensive operations
possible_keys Indexes the optimiser considered NULL means no usable index exists for this table

The type column — access method graded best to worst

const / eq_ref
Best. At most one row matched — primary key or unique index equality. As fast as a lookup can be.
ref
Good. Non-unique index equality — returns a small set of matching rows. Common on foreign key columns.
range
Acceptable. Index range scan — WHERE col BETWEEN, col > val, col IN (…). Reads a contiguous slice of the index.
index
Marginal. Scans the entire index tree (cheaper than a table scan, but still full scan). Often appears when a covering index is used.
ALL
Bad. Full table scan — reads every row. Acceptable only on tiny tables (<a few hundred rows). Investigate immediately on large tables.

Extra column — what the flags mean

Extra valueMeaningFix
Using indexCovering index — no table row read neededGreat. No action needed.
Using whereWHERE filter applied after index accessNormal. Fine unless rows is very high.
Using filesortORDER BY couldn't use an index — sort in memory/diskAdd an index that covers the ORDER BY columns.
Using temporaryIntermediate result stored in a temp table (GROUP BY, DISTINCT, some subqueries)Add an index covering the GROUP BY columns, or rewrite the query.
Using filesort + Using temporarySort after a temp table — double penaltyHigh priority to fix. Redesign index or query structure.
Using index conditionIndex condition pushdown — filter applied inside the storage engineGood. Engine filters before returning rows.

2. EXPLAIN ANALYZE — Actual Execution Stats

EXPLAIN ANALYZE (MySQL 8.0.18+) actually runs the query and returns the plan annotated with real timing and row counts alongside the estimates. Use it to confirm whether the optimiser's estimates match reality.

EXPLAIN ANALYZE SELECT b.title, COUNT(r.review_id) AS reviews FROM books b LEFT JOIN reviews r ON r.book_id = b.book_id GROUP BY b.book_id ORDER BY reviews DESC\G
-- Typical EXPLAIN ANALYZE output (tree format) -> Sort: reviews DESC (actual time=12.4..12.5 rows=84 loops=1) -> Table scan on <temporary> (actual time=0.0..0.4 rows=84 loops=1) -> Aggregate using temporary table (actual time=11.1..11.8 rows=84 loops=1) -> Nested loop left join (actual time=0.2..8.4 rows=312 loops=1) -> Table scan on b (cost=9.3 rows=84) (actual time=0.1..0.5 rows=84 loops=1) -> Index lookup on r using idx_reviews_book_id (book_id=b.book_id) (cost=0.9 rows=3) (actual time=0.06..0.09 rows=3 loops=84)
Compare estimated vs actual rows. When the optimiser estimates 3 rows but actually processes 30,000, its cost model is wrong — usually because table statistics are stale. Run ANALYZE TABLE tablename; to refresh them.

3. Index Strategy

Single-column index
The baseline. Index one column used frequently in WHERE, JOIN, or ORDER BY. InnoDB primary key is always a clustered index — the table rows are physically ordered by it.
Composite index
Index on (col_a, col_b, col_c). Left-prefix rule: queries must use col_a to benefit, then optionally col_b, then col_c. Skipping a column in the middle breaks the chain.
Covering index
A composite index that includes all columns the query needs — SELECT list + WHERE + ORDER BY. The engine never touches the table rows — EXPLAIN shows "Using index."
Prefix index
Index on the first N characters of a TEXT/VARCHAR column: INDEX (email(20)). Saves space but cannot be a covering index and cannot be used for ORDER BY.

Adding indexes to the bookshop schema

-- Foreign key columns (if not indexed automatically) CREATE INDEX idx_orders_customer ON orders (customer_id); CREATE INDEX idx_order_items_order ON order_items (order_id); CREATE INDEX idx_order_items_book ON order_items (book_id); CREATE INDEX idx_reviews_book ON reviews (book_id); CREATE INDEX idx_reviews_customer ON reviews (customer_id); -- Composite: date range reports that also filter by customer CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date); -- Covering: the catalogue listing query touches only these columns CREATE INDEX idx_books_cover ON books (publisher_id, price, title); -- Prefix: email is long, only the first 20 chars are usually enough CREATE INDEX idx_customers_email_pfx ON customers (email(20));

The left-prefix rule — composite index column order matters

-- Composite index: (customer_id, order_date) -- ✅ Uses index — leads with customer_id SELECT * FROM orders WHERE customer_id = 7 AND order_date >= '2026-01-01'; -- ✅ Uses index (partial) — customer_id alone still benefits SELECT * FROM orders WHERE customer_id = 7; -- ❌ Cannot use index — skips the leading column SELECT * FROM orders WHERE order_date >= '2026-01-01';

Covering index — eliminate the table row lookup

-- Index: (publisher_id, price, title) -- Query only touches those three columns → "Using index" in EXPLAIN SELECT title, price FROM books WHERE publisher_id = 3 ORDER BY price;

Index selectivity — not all columns deserve an index

Selectivity = distinct values ÷ total rows. A column with low selectivity (e.g. a boolean, or a status column with 3 values) is a poor index candidate — the optimiser may prefer a full table scan anyway because too many rows match.

-- Check selectivity before creating an index SELECT COUNT(DISTINCT status) AS distinct_values, COUNT(*) AS total_rows, ROUND(COUNT(DISTINCT status) / COUNT(*), 4) AS selectivity FROM orders; -- selectivity of 0.0003 → poor index candidate alone -- combine with a high-selectivity column in a composite index instead

4. Anti-Patterns That Defeat Indexes

1 — Function on the indexed column

-- ❌ YEAR() wraps the column — index on order_date is useless SELECT * FROM orders WHERE YEAR(order_date) = 2026; -- ✅ Rewrite as a range — index is used SELECT * FROM orders WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01';
-- ❌ LOWER() on the column defeats the index SELECT * FROM customers WHERE LOWER(email) = 'alice@example.com'; -- ✅ Store emails lowercased, or use a case-insensitive collation SELECT * FROM customers WHERE email = 'alice@example.com'; -- (works if email column uses utf8mb4_unicode_ci collation)

2 — Implicit type cast

-- ❌ isbn is VARCHAR — passing an integer forces a cast on every row SELECT * FROM books WHERE isbn = 9780142000670; -- ✅ Pass the correct type SELECT * FROM books WHERE isbn = '9780142000670';

3 — Leading wildcard LIKE

-- ❌ Leading % means MySQL cannot use a B-tree index SELECT * FROM books WHERE title LIKE '%foundation%'; -- ✅ Trailing wildcard only — index is used SELECT * FROM books WHERE title LIKE 'Foundation%'; -- For full-text search, use FULLTEXT index + MATCH … AGAINST ALTER TABLE books ADD FULLTEXT INDEX ft_title (title); SELECT * FROM books WHERE MATCH(title) AGAINST ('foundation' IN BOOLEAN MODE);

4 — OR across different columns

-- ❌ OR on two columns — optimiser often falls back to full scan SELECT * FROM books WHERE title = 'Dune' OR isbn = '9780441013593'; -- ✅ Rewrite as UNION — each branch can use its own index SELECT * FROM books WHERE title = 'Dune' UNION SELECT * FROM books WHERE isbn = '9780441013593';

5 — Negation operators

-- ❌ != / NOT IN / NOT LIKE rarely use indexes efficiently SELECT * FROM orders WHERE status != 'pending'; -- ✅ Rewrite positively when the positive set is small SELECT * FROM orders WHERE status IN ('shipped', 'delivered', 'cancelled');

6 — Non-deterministic functions in WHERE

-- ❌ NOW() is evaluated per row in some contexts — use a parameter instead SELECT * FROM orders WHERE order_date >= NOW() - INTERVAL 30 DAY; -- This is actually fine — NOW() is a constant for the query duration -- but DATE(order_date) > ... wraps the column — avoid that form

5. The Slow Query Log

The slow query log records every query that takes longer than a threshold you set. It's the most reliable way to find real problems in production — not guessed ones.

Enable the slow query log

-- Check current status SHOW VARIABLES LIKE 'slow_query%'; SHOW VARIABLES LIKE 'long_query_time'; -- Enable for this session (doesn't persist across restart) SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1; -- log queries > 1 second SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log'; -- Also log queries that perform a full table scan (no index used) SET GLOBAL log_queries_not_using_indexes = 'ON';
# Add to /etc/mysql/mysql.conf.d/mysqld.cnf for persistence: # [mysqld] # slow_query_log = 1 # slow_query_log_file = /var/log/mysql/slow.log # long_query_time = 1 # log_queries_not_using_indexes = 1

Reading the slow query log with mysqldumpslow

# Summarise the top 10 slowest query patterns mysqldumpslow -s t -t 10 /var/log/mysql/slow.log # -s t = sort by total time # -s c = sort by call count (finds frequently-run slow queries) # -t 10 = top 10
# Typical slow query log entry # Time: 2026-06-15T09:14:22 # User@Host: admin[admin] @ localhost [] # Query_time: 3.842134 Lock_time: 0.000120 # Rows_sent: 1 Rows_examined: 487503 # SET timestamp=1750000462; # SELECT * FROM orders WHERE YEAR(order_date) = 2026;
Rows_examined vs Rows_sent is your key ratio. Examining 487,503 rows to send 1 means the query is doing an enormous amount of unnecessary work. A well-indexed query should examine close to the number of rows it returns.

6. Live Diagnosis

SHOW PROCESSLIST — what's running right now

SHOW FULL PROCESSLIST\G -- Kill a stuck query (replace 42 with the Id from PROCESSLIST) KILL QUERY 42; -- kills the query, keeps the connection KILL 42; -- kills query AND disconnects

Key SHOW STATUS counters

-- Reset counters then run your workload, then check FLUSH STATUS; -- After running queries: SHOW STATUS LIKE 'Handler_%'; -- Important counters: -- Handler_read_rnd_next — rows read by full table scan (should be low) -- Handler_read_key — index lookups (higher is better) -- Handler_read_next — index range scan rows (acceptable) -- Created_tmp_disk_tables — temp tables spilled to disk (should be 0) -- Select_full_join — joins with no index (should be 0) SHOW STATUS LIKE 'Select_full%'; SHOW STATUS LIKE 'Created_tmp%';

ANALYZE TABLE — refresh statistics for the optimiser

-- After large INSERT/DELETE/UPDATE batches, statistics can become stale ANALYZE TABLE orders, order_items, books; -- Check when a table was last analyzed SELECT table_name, update_time, table_rows FROM information_schema.tables WHERE table_schema = DATABASE() ORDER BY update_time DESC;

7. A Complete Optimisation Walk-Through

A slow report query has been flagged in the slow query log — 4.2 seconds, examining 620,000 rows, returning 28. Here's the full diagnosis and fix cycle:

Step 1: run EXPLAIN

EXPLAIN SELECT DATE_FORMAT(o.order_date, '%Y-%m') AS month, COUNT(DISTINCT o.order_id) AS orders, SUM(oi.quantity * oi.unit_price) AS revenue FROM orders o JOIN order_items oi ON oi.order_id = o.order_id WHERE YEAR(o.order_date) = 2026 GROUP BY month ORDER BY month\G
tabletypekeyrowsExtra
o ALL NULL 420000 Using where; Using temporary; Using filesort
oi ref idx_order_items_order 3 NULL

Diagnoses: (1) YEAR() wraps the column — index on order_date is bypassed. (2) No index exists for ORDER BY month. (3) GROUP BY forces a temporary table.

Step 2: fix the query and add an index

-- Fix 1: rewrite the date filter as a range -- Fix 2: add a composite index for date + order_id to cover the GROUP BY CREATE INDEX idx_orders_date_id ON orders (order_date, order_id); -- Rewritten query SELECT DATE_FORMAT(o.order_date, '%Y-%m') AS month, COUNT(DISTINCT o.order_id) AS orders, SUM(oi.quantity * oi.unit_price) AS revenue FROM orders o JOIN order_items oi ON oi.order_id = o.order_id WHERE o.order_date >= '2026-01-01' AND o.order_date < '2027-01-01' GROUP BY month ORDER BY month;

Step 3: verify with EXPLAIN

tabletypekeyrowsExtra
o range idx_orders_date_id 28 Using where; Using index
oi ref idx_order_items_order 3 NULL

Rows examined dropped from 420,000 to 28. "Using temporary" and "Using filesort" are gone. Query time: 12ms.

Quick Reference

Tool / conceptCommand / ruleWhat to act on
EXPLAIN EXPLAIN SELECT … type = ALL, key = NULL, Extra = Using filesort / Using temporary
EXPLAIN ANALYZE EXPLAIN ANALYZE SELECT … (8.0.18+) Large gap between estimated and actual rows → run ANALYZE TABLE
Access types (best→worst) const → eq_ref → ref → range → index → ALL Investigate anything at index or ALL on tables > a few hundred rows
Composite index order Put equality columns first, range/ORDER BY column last Left-prefix rule: skipping a column breaks index use for subsequent columns
Covering index Include SELECT + WHERE + ORDER BY cols in one index EXPLAIN shows "Using index" — no table row read. Fastest possible read path.
Function on column WHERE YEAR(col)WHERE col BETWEEN … Any function wrapping an indexed column defeats the index
Leading wildcard LIKE '%term' → use FULLTEXT index + MATCH … AGAINST B-tree indexes require a known prefix
NOT IN with NULLs Add WHERE col IS NOT NULL inside subquery Any NULL in the list silently empties the NOT IN result
Slow query log SET GLOBAL slow_query_log = ON; SET GLOBAL long_query_time = 1; High Rows_examined vs Rows_sent ratio → missing or wrong index
mysqldumpslow mysqldumpslow -s t -t 10 /path/slow.log Find the top time-wasting query patterns across thousands of log entries
Refresh statistics ANALYZE TABLE tbl; Run after large bulk operations; stale stats cause poor plan choices
SHOW PROCESSLIST SHOW FULL PROCESSLIST\G Long-running queries holding locks; kill with KILL QUERY id
Selectivity check COUNT(DISTINCT col) / COUNT(*) Values near 0 → poor standalone index candidate; combine in composite or skip
Course Complete
You've covered all 10 chapters of the Advanced SQL course — from normalisation and every join type through subqueries, CTEs, window functions, views, stored procedures, triggers, events, and query optimisation. The skills in this course represent the full professional SQL toolkit used in production databases worldwide.