Claude in VS Code: Pair Programming

Chapter 6 · Writing Tests with Claude

Writing tests is one of the most mechanical parts of software development — and one of the best fits for Claude. Given a function to test, Claude can generate a suite of cases covering the happy path, error paths, and boundary conditions faster than you can type them. The skill is briefing Claude correctly on your testing framework and style, and knowing how to prompt for the edge cases you haven't thought of yet.

Brief Claude on Your Testing Setup First

Before asking Claude to write any tests, give it a one-time brief on your testing environment. Without this, Claude will make reasonable guesses — and those guesses may not match your project. Do this once at the start of a testing session:

Testing brief — send this before the first test request

Framework: We use pytest with pytest-asyncio for async tests.

Style: Tests live in tests/ mirroring the app structure. Test files are named test_[module].py. Test functions are named test_[what]_[condition] (e.g. test_create_order_missing_user_returns_404).

Coverage: We aim to cover the happy path, each distinct error case, and at least one boundary value per numeric input.

Exceptions: We don't mock the database; tests run against a real test DB. We do mock external HTTP calls using respx.

Once Claude has this brief, all test requests in the session will use the right framework, naming conventions, and mocking approach without you repeating it.

Types of Tests Claude Generates Well

Unit

Pure function tests

Write tests for the `calculate_late_fee` function in billing/utils.py. Cover: no late fee (paid on time), one day late, exactly 30 days late, and a paid date earlier than the due date.

Claude's strongest area. Pure functions with defined inputs and outputs are trivial to test and Claude generates comprehensive cases quickly.

Unit

Class / method tests

Write tests for the `OrderProcessor.process()` method. Mock the payment gateway. Test: successful order, payment declined, and inventory unavailable.

Tell Claude exactly what to mock. If you leave it unspecified, Claude will choose — and may choose differently from your project's conventions.
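
As a rough sketch of what such a request might produce, the example below mocks only the collaborators and leaves process() as real code. Everything in it is an assumption made so the block runs on its own: the OrderProcessor constructor, the gateway's charge method, the inventory's reserve method, and both exception names are hypothetical stand-ins, and it uses unittest.mock from the standard library rather than any project-specific mocking helper.

```python
from decimal import Decimal
from unittest.mock import Mock

import pytest


# Hypothetical stand-ins so the sketch is self-contained.
class PaymentDeclined(Exception):
    pass


class OutOfStock(Exception):
    pass


class OrderProcessor:
    def __init__(self, gateway, inventory):
        self.gateway = gateway
        self.inventory = inventory

    def process(self, sku, quantity, amount):
        # Reserve stock first so an unavailable item is never charged.
        if not self.inventory.reserve(sku, quantity):
            raise OutOfStock(sku)
        if not self.gateway.charge(amount):
            raise PaymentDeclined(amount)
        return "confirmed"


def make_processor(charge_ok=True, in_stock=True):
    # Only the collaborators are mocked; process() itself is exercised for real.
    gateway = Mock()
    gateway.charge.return_value = charge_ok
    inventory = Mock()
    inventory.reserve.return_value = in_stock
    return OrderProcessor(gateway, inventory), gateway


def test_process_successful_order_charges_once():
    processor, gateway = make_processor()

    result = processor.process("SKU-1", 2, Decimal("19.98"))

    assert result == "confirmed"
    gateway.charge.assert_called_once_with(Decimal("19.98"))


def test_process_payment_declined_raises():
    processor, _ = make_processor(charge_ok=False)

    with pytest.raises(PaymentDeclined):
        processor.process("SKU-1", 2, Decimal("19.98"))


def test_process_inventory_unavailable_never_charges():
    processor, gateway = make_processor(in_stock=False)

    with pytest.raises(OutOfStock):
        processor.process("SKU-1", 2, Decimal("19.98"))
    gateway.charge.assert_not_called()
```

Note that the third test checks an interaction (the gateway was never charged) as well as the raised exception. That is the kind of expectation worth stating in the prompt if it matters to you.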

Integration

API endpoint tests

Write integration tests for POST /orders using the test client. Test: valid order creation (201), missing required fields (422), unauthenticated request (401), and duplicate order (409).

Specify the expected status code for every case. Claude will generate the fixture setup if you open the existing conftest.py first.

Parametrised

Data-driven tests

Write a parametrised test for `parse_date_range(start, end)` covering: valid range, reversed dates, same start and end, start in the past, and end more than 1 year from now.

Claude writes clean pytest.mark.parametrize tables. Give it the cases; it handles the structure.
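
A parametrised test for the five cases above might look roughly like this. The parse_date_range body is a hypothetical stand-in with invented rules (return the range length in days, raise ValueError otherwise), and the today argument and the fixed TODAY value are assumptions added only to keep the time-dependent cases deterministic.

```python
from datetime import date, timedelta

import pytest

TODAY = date(2026, 1, 10)  # assumed fixed reference date for repeatable tests


def parse_date_range(start, end, today=TODAY):
    """Hypothetical rules: return the length in days, or raise ValueError."""
    if end < start:
        raise ValueError("end before start")
    if start < today:
        raise ValueError("start in the past")
    if end > today + timedelta(days=365):
        raise ValueError("end more than one year out")
    return (end - start).days


@pytest.mark.parametrize(
    "start, end, expected_days",
    [
        pytest.param(date(2026, 2, 1), date(2026, 2, 8), 7, id="valid_range"),
        pytest.param(date(2026, 2, 1), date(2026, 2, 1), 0, id="same_start_and_end"),
    ],
)
def test_parse_date_range_accepts(start, end, expected_days):
    assert parse_date_range(start, end) == expected_days


@pytest.mark.parametrize(
    "start, end",
    [
        pytest.param(date(2026, 2, 8), date(2026, 2, 1), id="reversed_dates"),
        pytest.param(date(2026, 1, 9), date(2026, 2, 1), id="start_in_past"),
        pytest.param(date(2026, 2, 1), date(2027, 1, 11), id="end_over_one_year_out"),
    ],
)
def test_parse_date_range_rejects(start, end):
    with pytest.raises(ValueError):
        parse_date_range(start, end)
```

Splitting accepted and rejected inputs into two tables keeps each test body to a single assertion style, and the id on each pytest.param makes a failure name the case that broke.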

Anatomy of a Well-Generated Test

Example — pytest test for a billing function

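from datetime import date
from decimal import Decimal

from billing.utils import calculate_late_fee


def test_calculate_late_fee_one_day_late():
    # Arrange: a payment made one day after the due date
    due_date = date(2026, 1, 10)
    paid_date = date(2026, 1, 11)
    daily_rate = Decimal("2.50")

    # Act
    result = calculate_late_fee(due_date, paid_date, daily_rate)

    # Assert: one day late costs exactly one day's rate
    assert result == Decimal("2.50")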

A well-structured test has three clear sections: Arrange (set up inputs), Act (call the thing being tested), Assert (verify the result). If you ask Claude to follow this structure explicitly, the output is easier to read and easier to modify. Add "Follow the Arrange-Act-Assert pattern" to your brief.

Using Claude to Find Edge Cases You Haven't Thought Of

This is one of the highest-value ways to use Claude for testing. Rather than asking Claude to write tests for cases you already know about, ask it to find the ones you missed:

Edge case discovery prompts

// Discover first, write second
What edge cases and error conditions should I test for `calculate_late_fee`? List them before writing any code.

// Security-focused
What inputs to this function could cause unexpected behaviour from a security perspective? I want to make sure we're testing those.

// Boundary focus
What are the boundary values for this function? I want parametrised tests that cover exactly at, just below, and just above each boundary.

// Combination cases
What combinations of inputs are most likely to cause problems? I'm thinking about cases where multiple conditions interact.

Ask Claude to list the cases first — before writing any code. Review the list, add or remove cases, then ask Claude to write tests for the approved list. This gives you control over coverage without having to think of every case yourself.
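
Once the list is approved, the boundary-focused prompt tends to turn into a table like the one sketched here. The 30-day cap is a hypothetical rule, invented only to give the function a boundary to test around, and the calculate_late_fee body is a stand-in rather than any real billing code.

```python
from datetime import date, timedelta
from decimal import Decimal

import pytest

MAX_BILLABLE_DAYS = 30  # hypothetical cap, used only to create a boundary


def calculate_late_fee(due_date, paid_date, daily_rate):
    # Stand-in implementation: charge per day late, capped at 30 days.
    days_late = max((paid_date - due_date).days, 0)
    return daily_rate * min(days_late, MAX_BILLABLE_DAYS)


DUE = date(2026, 1, 10)
RATE = Decimal("2.50")


@pytest.mark.parametrize(
    "days_late, expected_fee",
    [
        pytest.param(-1, Decimal("0"), id="one_day_early"),
        pytest.param(0, Decimal("0"), id="paid_on_due_date"),
        pytest.param(1, Decimal("2.50"), id="one_day_late"),
        pytest.param(29, Decimal("72.50"), id="just_below_cap"),
        pytest.param(30, Decimal("75.00"), id="exactly_at_cap"),
        pytest.param(31, Decimal("75.00"), id="just_above_cap"),
    ],
)
def test_calculate_late_fee_boundaries(days_late, expected_fee):
    # Arrange
    paid_date = DUE + timedelta(days=days_late)

    # Act
    fee = calculate_late_fee(DUE, paid_date, RATE)

    # Assert
    assert fee == expected_fee
```

Each boundary gets three rows (just below, exactly at, just above), so an off-by-one error in either direction fails exactly one named case.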

Edge Case Categories to Always Cover

Null / empty

None, empty string, empty list, empty dict, zero-length file

Boundary values

Min, max, exactly at limit, one below limit, one above limit

Type coercion

Integer where float expected, string "0", "false", "null", numeric string

Special characters

Unicode, emoji, newlines in strings, SQL metacharacters, HTML tags

Negative / zero

Negative numbers, zero, negative zero, very large numbers, overflow

Time and dates

Leap day, DST transition, midnight, timezone mismatch, far-future dates

Concurrency

Duplicate requests, race on shared state, timeout during operation

External failure

DB unavailable, HTTP 500, network timeout, malformed response body

Common Pitfalls in Claude-Generated Tests

Pitfall	What it looks like	How to avoid it
Tests that always pass	An assertion like `assert result is not None` — technically passes but verifies almost nothing	Ask Claude: "Assert the exact expected value, not just that a result exists."
Over-mocking	Claude mocks the function being tested, or mocks so much that the test doesn't exercise real code	Specify what to mock and what not to. "Mock the HTTP call only — use the real service layer."
Wrong framework syntax	Claude uses `unittest.mock` when you use `pytest-mock`, or uses `self.assert` in a non-class test	Include framework and library specifics in your brief. Open an existing test file before asking.
Tests coupled to implementation	Tests that break when you refactor internal details, even if the behaviour didn't change	Ask Claude: "Test the observable behaviour, not the internal implementation."
Missing cleanup	Tests that leave state (DB rows, files, env vars) that affect other tests	Ask Claude to add fixtures or teardown. "Make sure each test is independent and cleans up after itself."

Run the tests before committing them

Claude-generated tests can contain subtle errors — wrong import paths, a fixture that doesn't exist yet, an assertion that passes vacuously. Always run the full test file after generation and fix any failures before committing. A test suite with broken tests is worse than no test suite — it trains the team to ignore failures.

Tests for Code That Already Has Bugs

An underused pattern: ask Claude to write a test that reproduces a bug you're about to fix. Write the test first (it should fail), then fix the code (the test passes), then commit both. This is lightweight TDD with Claude doing the test-writing:

Bug-first test prompt

"Write a test that demonstrates this bug: when calculate_late_fee is called with a paid_date before the due_date, it returns a negative fee instead of zero. The test should fail against the current code and pass once the bug is fixed."

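The result of that prompt might look something like the sketch below. Both function bodies are hypothetical: one stands in for the current buggy code and one for the fix, and the check takes the implementation as an argument only so that the before and after behaviour can be shown in a single block. In a real project this would be one ordinary test function that imports calculate_late_fee.

```python
from datetime import date
from decimal import Decimal


def calculate_late_fee_buggy(due_date, paid_date, daily_rate):
    # Hypothetical current code: an early payment yields a negative fee.
    return daily_rate * (paid_date - due_date).days


def calculate_late_fee_fixed(due_date, paid_date, daily_rate):
    # Hypothetical fix: clamp the number of days late at zero.
    return daily_rate * max((paid_date - due_date).days, 0)


def check_paid_early_returns_zero(calculate_late_fee):
    # Arrange: payment made two days before the due date
    due_date = date(2026, 1, 10)
    paid_date = date(2026, 1, 8)

    # Act
    fee = calculate_late_fee(due_date, paid_date, Decimal("2.50"))

    # Assert: paying early must never produce a negative fee
    assert fee == Decimal("0")
```

Run against the buggy version, the check raises AssertionError because the fee comes back negative. Run against the fixed version, it passes. Seeing the test fail first is what shows that it actually reproduces the bug.
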
Next — Chapter 7: Working Across Files
Project-wide changes, asking Claude about architecture, and how Claude's cross-file awareness works. How to make changes that span modules, how to ask architectural questions that require reading multiple files, and the limits of Claude's project-wide understanding.