Chapter 1 — Database Design and Normalisation
Good SQL starts before you write a single query. A poorly designed schema creates problems that no amount of clever querying can fully fix: redundant data that drifts out of sync, updates that break things silently, and queries that are far slower than they need to be. This chapter covers normalisation — the formal rules for designing tables that avoid these problems — and ends by building the extended bookshop schema used throughout this entire course.
What this chapter covers: the three anomalies that bad design causes (update, insert, delete). First, Second, and Third Normal Form — what each requires, why it matters, and how to recognise a violation. Denormalisation: when breaking the rules is deliberate and correct. ER relationship types and foreign key behaviour options. The full extended bookshop schema: publishers, genres, book_authors, book_genres, order_items, and reviews — with all CREATE TABLE statements and the design decisions behind them.
The Problem — Three Anomalies Bad Design Creates
Normalisation solves three specific problems that arise when related data is stored in the same table without proper structure. Here's a poorly designed books table that stores the publisher's details directly alongside the book:
| id | title | pub_name | pub_country | price |
| 1 | Dune | Hodder | UK | 9.99 |
| 2 | Foundation | Hodder | UK | 8.99 |
| 3 | Neuromancer | Hodder | UK | 7.99 |
| 4 | Hyperion | Gollancz | UK | 9.99 |
pub_name and pub_country duplicated across rows 1–3
| id | title | pub_id | price |
| 1 | Dune | 1 | 9.99 |
| 2 | Foundation | 1 | 8.99 |
| 3 | Neuromancer | 1 | 7.99 |
| 4 | Hyperion | 2 | 9.99 |
Publisher details live in publishers(id, name, country) — one row each
The denormalised version creates three anomalies:
- Update anomaly: if Hodder rebrands to "Hodder & Stoughton", you must update three rows — and if you miss one, the database now contradicts itself. With a
publishers table, you update exactly one row.
- Insert anomaly: you can't record a new publisher until they have at least one book in the table. There's nowhere to store Penguin unless Penguin has a book row.
- Delete anomaly: if you delete Hyperion (the only Gollancz book), you lose the fact that Gollancz exists as a publisher. Publisher data is coupled to book data and they disappear together.
The Three Normal Forms
Rule: every column must hold a single, indivisible value. No lists, no comma-separated values, no repeating column groups (genre1, genre2, genre3…). Every row must be uniquely identifiable by a primary key.
Test: can you split a column value and still get meaningful data? If yes, it violates 1NF.
Rule: must be in 1NF, and every non-key column must depend on the whole primary key — not just part of it. This only applies when the primary key is composite (two or more columns). If your primary key is a single column, you get 2NF for free.
Test: take each non-key column. Does it describe the whole PK, or just one part of it? If just part → partial dependency → 2NF violation.
Rule: must be in 2NF, and no non-key column may depend on another non-key column. Non-key columns must depend only on the primary key — not on each other.
Test: take each non-key column. Does it describe the primary key, or does it describe another non-key column? If the latter → transitive dependency → 3NF violation.
1NF — Fixing Atomic Values
| id | title | genres |
| 1 | Dune | Science Fiction, Adventure |
| 2 | Gone Girl | Mystery, Thriller, Crime |
| 3 | Foundation | Science Fiction |
Can't query "all science fiction books" without string parsing
| book_id | genre_id |
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 3 | 1 |
book_genres table: each genre per book is its own row
SELECT b.title
FROM books b
JOIN book_genres bg ON bg.book_id = b.id
JOIN genres g ON g.id = bg.genre_id
WHERE g.name = 'Science Fiction';
2NF — Fixing Partial Dependencies
This only matters with composite primary keys. Consider an order_items table where the PK is (order_id, book_id):
| order_id | book_id | book_title | qty |
| 101 | 1 | Dune | 2 |
| 101 | 3 | Foundation | 1 |
| 102 | 1 | Dune | 1 |
book_title depends only on book_id, not on (order_id, book_id)
| order_id | book_id | qty | unit_price |
| 101 | 1 | 2 | 9.99 |
| 101 | 3 | 1 | 8.99 |
| 102 | 1 | 1 | 9.99 |
title comes from books.title via JOIN; unit_price is order-specific (price may change)
Why store unit_price in order_items? The book's current price can change — a sale, a price increase. The order captures the price at the time of purchase. This is a deliberate decision: unit_price in order_items is not redundant with books.price — they represent different things. This pattern is correct even though it looks like duplication.
3NF — Fixing Transitive Dependencies
| id | title | pub_id | pub_name | pub_country |
| 1 | Dune | 1 | Hodder | UK |
| 2 | Foundation | 1 | Hodder | UK |
| 4 | Hyperion | 2 | Gollancz | UK |
pub_name and pub_country depend on pub_id, not on the book's id
| id | name | country |
| 1 | Hodder | UK |
| 2 | Gollancz | UK |
books only stores publisher_id as a FK; publisher details live in publishers
The chain book.id → pub_id → pub_name is the transitive dependency: pub_name doesn't describe the book — it describes the publisher. Moving it to a publishers table breaks the chain and eliminates the update, insert, and delete anomalies from the opening example.
Denormalisation — When Breaking the Rules is Correct
Normalisation is the right starting point. But databases serving read-heavy workloads — reporting dashboards, product listings, analytics — sometimes deliberately denormalise to avoid expensive joins at query time.
Denormalise intentionally, not accidentally. A denormalised table created on purpose, with a documented reason and a process to keep it in sync, is a valid engineering choice. A denormalised table that happened because nobody thought about design is a maintenance problem. The distinction is intention and control.
- Derived/cached columns: storing a pre-calculated
order_total on the orders row avoids summing order_items on every page load. Acceptable if you update it on every insert/update to order_items (a trigger or application logic).
- Redundant columns for speed: adding
author_name to books to avoid a join on the book listing page. Acceptable if update paths are controlled.
- Separate reporting tables: a daily snapshot table that copies and flattens live data for reporting queries. The snapshot is explicitly denormalised and rebuilt on schedule — not the source of truth.
- The rule: the normalised schema is always the authoritative source of truth. Denormalised copies are derived. They should be clearly marked as such in table names (e.g.
report_daily_sales) or schema documentation.
ER Relationships — One-to-One, One-to-Many, Many-to-Many
One-to-One (1:1)
A ──── B
One row in A relates to exactly one row in B and vice versa. Rare in practice — if the data always appears together, put it in one table. Use when splitting a wide table for permissions (users vs user_secrets) or optional data (books vs book_cover_art).
Example: users ↔ user_preferences
One-to-Many (1:N)
A ──── B B B B
One row in A relates to many rows in B. The most common relationship. Implemented with a foreign key in B pointing to A's primary key. One publisher has many books; one customer has many orders.
Example: publishers ↔ books, customers ↔ orders
Many-to-Many (M:N)
A A A ──── B B B
Many rows in A relate to many rows in B. Implemented with a junction table (also called a bridge or associative table) that holds foreign keys to both A and B. One book has many genres; one genre has many books.
Example: books ↔ book_genres ↔ genres
Foreign Keys and ON DELETE Behaviour
A foreign key constraint enforces referential integrity: a row in the child table cannot reference a parent row that doesn't exist. When a parent row is deleted, MySQL needs to know what to do with the orphaned children.
| ON DELETE option | What happens to child rows | Best for |
| RESTRICT |
MySQL blocks the DELETE if any child rows reference this parent. The parent cannot be deleted until all children are deleted first. |
Safety-first. Use when accidental deletion would be catastrophic (e.g. deleting a publisher that still has books). |
| CASCADE |
Child rows are automatically deleted when the parent is deleted. |
Owned data where children have no meaning without the parent. Deleting an order cascades to order_items. Deleting a user cascades to their sessions. |
| SET NULL |
The FK column in child rows is set to NULL when the parent is deleted. The FK column must be nullable. |
Optional references. A review's customer_id could be SET NULL if the customer deletes their account — the review stays, attributed to a deleted user. |
| NO ACTION |
Identical to RESTRICT in MySQL InnoDB. Checks the constraint at the end of the statement. |
Rarely chosen explicitly — defaults vary by engine and SQL standard. |
The Extended Bookshop Schema
The SQL Foundations course used a simplified bookshop schema. The Advanced course extends it into a properly normalised database that can model real-world scenarios. All JOIN, subquery, CTE, window function, and optimisation examples in subsequent chapters use this schema.
Tables at a glance
publishers
id · name · country · founded_year
authors
id · first_name · last_name · nationality · bio
books
id · title · publisher_id · isbn · price · year_published · stock
book_authors
book_id · author_id · author_order
junction — M:N books↔authors
book_genres
book_id · genre_id
junction — M:N books↔genres
customers
id · first_name · last_name · email · created_at
orders
id · customer_id · status · created_at
no total_amount — it's derived
order_items
id · order_id · book_id · quantity · unit_price
reviews
id · book_id · customer_id · rating · body · created_at
ER diagram — entity relationships
publishers ──1:N──> books <──M:N──> authors (via book_authors)
│
├──M:N──> genres (via book_genres)
│
├──1:N──> order_items <──N:1──> orders <──N:1──> customers
│
└──1:N──> reviews <──N:1──> customers
Key relationships:
publishers 1:N books one publisher has many books
books M:N authors a book can have multiple authors (co-authored)
books M:N genres a book can belong to multiple genres
customers 1:N orders one customer places many orders
orders 1:N order_items one order contains many line items
books 1:N order_items one book appears in many orders
books 1:N reviews one book has many reviews
customers 1:N reviews one customer writes many reviews
CREATE TABLE — the full schema
CREATE DATABASE IF NOT EXISTS bookshop
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
USE bookshop;
CREATE TABLE publishers (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(200) NOT NULL,
country VARCHAR(100) NOT NULL DEFAULT '',
founded_year SMALLINT UNSIGNED DEFAULT NULL,
UNIQUE KEY uq_publisher_name (name)
) ENGINE=InnoDB;
CREATE TABLE authors (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
nationality VARCHAR(100) DEFAULT NULL,
bio TEXT DEFAULT NULL,
INDEX idx_author_name (last_name, first_name)
) ENGINE=InnoDB;
CREATE TABLE genres (
id TINYINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(80) NOT NULL,
slug VARCHAR(80) NOT NULL,
UNIQUE KEY uq_genre_slug (slug)
) ENGINE=InnoDB;
CREATE TABLE books (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(300) NOT NULL,
publisher_id INT UNSIGNED DEFAULT NULL,
isbn CHAR(13) DEFAULT NULL,
price DECIMAL(8,2) NOT NULL,
year_published SMALLINT UNSIGNED DEFAULT NULL,
stock INT UNSIGNED NOT NULL DEFAULT 0,
description TEXT DEFAULT NULL,
UNIQUE KEY uq_isbn (isbn),
INDEX idx_books_publisher (publisher_id),
INDEX idx_books_year (year_published),
CONSTRAINT fk_books_publisher
FOREIGN KEY (publisher_id) REFERENCES publishers(id)
ON DELETE SET NULL
ON UPDATE CASCADE
) ENGINE=InnoDB;
CREATE TABLE book_authors (
book_id INT UNSIGNED NOT NULL,
author_id INT UNSIGNED NOT NULL,
author_order TINYINT UNSIGNED NOT NULL DEFAULT 1,
PRIMARY KEY (book_id, author_id),
CONSTRAINT fk_ba_book FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE,
CONSTRAINT fk_ba_author FOREIGN KEY (author_id) REFERENCES authors(id) ON DELETE CASCADE
) ENGINE=InnoDB;
CREATE TABLE book_genres (
book_id INT UNSIGNED NOT NULL,
genre_id TINYINT UNSIGNED NOT NULL,
PRIMARY KEY (book_id, genre_id),
CONSTRAINT fk_bg_book FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE,
CONSTRAINT fk_bg_genre FOREIGN KEY (genre_id) REFERENCES genres(id) ON DELETE CASCADE
) ENGINE=InnoDB;
CREATE TABLE customers (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
email VARCHAR(254) NOT NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
UNIQUE KEY uq_customer_email (email),
INDEX idx_customer_name (last_name, first_name)
) ENGINE=InnoDB;
CREATE TABLE orders (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
customer_id INT UNSIGNED NOT NULL,
status ENUM('pending', 'paid', 'shipped', 'delivered', 'cancelled')
NOT NULL DEFAULT 'pending',
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX idx_orders_customer (customer_id),
INDEX idx_orders_status (status),
INDEX idx_orders_created (created_at),
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(id)
ON DELETE RESTRICT
ON UPDATE CASCADE
) ENGINE=InnoDB;
CREATE TABLE order_items (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
order_id INT UNSIGNED NOT NULL,
book_id INT UNSIGNED NOT NULL,
quantity SMALLINT UNSIGNED NOT NULL DEFAULT 1,
unit_price DECIMAL(8,2) NOT NULL,
INDEX idx_oi_order (order_id),
INDEX idx_oi_book (book_id),
CONSTRAINT fk_oi_order FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,
CONSTRAINT fk_oi_book FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE RESTRICT
) ENGINE=InnoDB;
CREATE TABLE reviews (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
book_id INT UNSIGNED NOT NULL,
customer_id INT UNSIGNED DEFAULT NULL,
rating TINYINT UNSIGNED NOT NULL,
body TEXT DEFAULT NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX idx_reviews_book (book_id),
INDEX idx_reviews_customer (customer_id),
CONSTRAINT chk_rating CHECK (rating BETWEEN 1 AND 5),
CONSTRAINT fk_rev_book FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE,
CONSTRAINT fk_rev_customer FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE SET NULL
) ENGINE=InnoDB;
Sample data to follow along
INSERT INTO publishers (name, country, founded_year) VALUES
('Hodder & Stoughton', 'UK', 1868),
('Victor Gollancz', 'UK', 1927),
('Doubleday', 'USA', 1897),
('Ace Books', 'USA', 1952);
INSERT INTO authors (first_name, last_name, nationality) VALUES
('Frank', 'Herbert', 'American'),
('Isaac', 'Asimov', 'American'),
('William', 'Gibson', 'American'),
('Dan', 'Simmons', 'American'),
('Gillian', 'Flynn', 'American'),
('Agatha', 'Christie', 'British');
INSERT INTO genres (name, slug) VALUES
('Science Fiction', 'science-fiction'),
('Thriller', 'thriller'),
('Mystery', 'mystery'),
('Cyberpunk', 'cyberpunk'),
('Adventure', 'adventure'),
('Crime', 'crime');
INSERT INTO books (title, publisher_id, isbn, price, year_published, stock) VALUES
('Dune', 4, '9780441013593', 9.99, 1965, 50),
('Foundation', 3, '9780553293357', 8.99, 1951, 35),
('Neuromancer', 4, '9780441569595', 7.99, 1984, 28),
('Hyperion', 3, '9780385249492', 9.99, 1989, 40),
('Gone Girl', 3, '9780307588364', 8.49, 2012, 22),
('Murder on the Orient Express', 1, '9780007119318', 7.49, 1934, 60);
INSERT INTO book_authors VALUES
(1, 1, 1),
(2, 2, 1),
(3, 3, 1),
(4, 4, 1),
(5, 5, 1),
(6, 6, 1);
INSERT INTO book_genres VALUES
(1, 1), (1, 5),
(2, 1),
(3, 1), (3, 4),
(4, 1), (4, 5),
(5, 2), (5, 3), (5, 6),
(6, 3), (6, 6);
INSERT INTO customers (first_name, last_name, email) VALUES
('Alice', 'Martin', 'alice@example.com'),
('Bob', 'Nguyen', 'bob@example.com'),
('Carol', 'Patel', 'carol@example.com');
INSERT INTO orders (customer_id, status, created_at) VALUES
(1, 'delivered', '2026-04-10 09:00:00'),
(1, 'shipped', '2026-05-20 11:30:00'),
(2, 'delivered', '2026-03-05 14:00:00'),
(3, 'pending', '2026-06-01 08:45:00');
INSERT INTO order_items (order_id, book_id, quantity, unit_price) VALUES
(1, 1, 2, 9.99),
(1, 3, 1, 7.99),
(2, 4, 1, 9.99),
(3, 2, 1, 8.99),
(3, 6, 1, 7.49),
(4, 5, 1, 8.49);
INSERT INTO reviews (book_id, customer_id, rating, body, created_at) VALUES
(1, 1, 5, 'A masterpiece. Read it twice.', '2026-04-15'),
(1, 2, 4, 'Dense but rewarding.', '2026-03-20'),
(2, 2, 5, 'Asimov at his best.', '2026-03-10'),
(3, 1, 4, 'Invented cyberpunk. Essential.', '2026-04-20'),
(6, 2, 5, 'The plot twist still holds up 90 years on.', '2026-03-25');
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'bookshop'
ORDER BY table_name;
Quick Reference — Chapter 1
| Concept | The rule |
| 1NF | Atomic values only — no lists, no comma-separated values, no repeating column groups. PK required. |
| 2NF | No partial dependencies — every non-key column depends on the whole composite PK. Irrelevant if PK is a single column. |
| 3NF | No transitive dependencies — non-key columns describe the PK, not each other. |
| M:N relationship | Always needs a junction table with FKs to both sides. PK is the composite of both FKs. |
| ON DELETE CASCADE | Child rows deleted automatically. Use for owned data (order_items when order deleted). |
| ON DELETE RESTRICT | Blocks parent deletion while children exist. Use for safety (can't delete customer with orders). |
| ON DELETE SET NULL | Child FK set to NULL. Use for optional references (review stays if customer is deleted). |
| unit_price in order_items | Not a 3NF violation — it records the price at time of purchase, distinct from current books.price. |
| Denormalisation | Deliberate only. Normalised schema = source of truth. Derived copies are clearly separate. |