Writing tests

Claude in VS Code: Pair Programming

Chapter 6  ·  Writing Tests with Claude

Writing tests is one of the most mechanical parts of software development — and one of the best fits for Claude. Given a function to test, Claude can generate a suite of cases covering the happy path, error paths, and boundary conditions faster than you can type them. The skill is briefing Claude correctly on your testing framework and style, and knowing how to prompt for the edge cases you haven't thought of yet.

Brief Claude on Your Testing Setup First

Before asking Claude to write any tests, give it a one-time brief on your testing environment. Without this, Claude will make reasonable guesses — and those guesses may not match your project. Do this once at the start of a testing session:

Testing brief — send this before the first test request
FrameworkWe use pytest with pytest-asyncio for async tests.

StyleTests live in tests/ mirroring the app structure. Test files are named test_[module].py. Test functions are named test_[what]_[condition] (e.g. test_create_order_missing_user_returns_404).

CoverageWe aim to cover: happy path, each distinct error case, and at least one boundary value per numeric input.

ExceptionsWe don't mock the database — tests run against a real test DB. We do mock external HTTP calls using respx.

Once Claude has this brief, all test requests in the session will use the right framework, naming conventions, and mocking approach without you repeating it.

Types of Tests Claude Generates Well

Unit
Pure function tests
Write tests for the `calculate_late_fee` function in billing/utils.py. Cover: no late fee (paid on time), one day late, exactly 30 days late, and a negative due date input.
Claude's strongest area. Pure functions with defined inputs and outputs are trivial to test and Claude generates comprehensive cases quickly.
Unit
Class / method tests
Write tests for the `OrderProcessor.process()` method. Mock the payment gateway. Test: successful order, payment declined, and inventory unavailable.
Tell Claude exactly what to mock. If you leave it unspecified, Claude will choose — and may choose differently from your project's conventions.
Integration
API endpoint tests
Write integration tests for POST /orders using the test client. Test: valid order creation (201), missing required fields (422), unauthenticated request (401), and duplicate order (409).
Specify the expected status code for every case. Claude will generate the fixture setup if you open the existing conftest.py first.
Parametrised
Data-driven tests
Write a parametrised test for `parse_date_range(start, end)` covering: valid range, reversed dates, same start and end, start in the past, and end more than 1 year from now.
Claude writes clean pytest.mark.parametrize tables. Give it the cases; it handles the structure.

Anatomy of a Well-Generated Test

Example — pytest test for a billing function
def test_calculate_late_fee_one_day_late():arrange
    # Arrange
    due_date = date(2026, 1, 10)
    paid_date = date(2026, 1, 11)
    daily_rate = Decimal("2.50")

    # Actact
    result = calculate_late_fee(due_date, paid_date, daily_rate)

    # Assertassert
    assert result == Decimal("2.50")

A well-structured test has three clear sections: Arrange (set up inputs), Act (call the thing being tested), Assert (verify the result). If you ask Claude to follow this structure explicitly, the output is easier to read and easier to modify. Add Follow the Arrange-Act-Assert pattern to your brief.

Using Claude to Find Edge Cases You Haven't Thought Of

This is one of the highest-value ways to use Claude for testing. Rather than asking Claude to write tests for cases you already know about, ask it to find the ones you missed:

Edge case discovery prompts
// Discover first, write second
What edge cases and error conditions should I test for `calculate_late_fee`? List them before writing any code.

// Security-focused
What inputs to this function could cause unexpected behaviour from a security perspective? I want to make sure we're testing those.

// Boundary focus
What are the boundary values for this function? I want parametrised tests that cover exactly at, just below, and just above each boundary.

// Combination cases
What combinations of inputs are most likely to cause problems? I'm thinking about cases where multiple conditions interact.

Ask Claude to list the cases first — before writing any code. Review the list, add or remove cases, then ask Claude to write tests for the approved list. This gives you control over coverage without having to think of every case yourself.

Edge Case Categories to Always Cover

Null / empty
None, empty string, empty list, empty dict, zero-length file
Boundary values
Min, max, exactly at limit, one below limit, one above limit
Type coercion
Integer where float expected, string "0", "false", "null", numeric string
Special characters
Unicode, emoji, newlines in strings, SQL metacharacters, HTML tags
Negative / zero
Negative numbers, zero, negative zero, very large numbers, overflow
Time and dates
Leap day, DST transition, midnight, timezone mismatch, far-future dates
Concurrency
Duplicate requests, race on shared state, timeout during operation
External failure
DB unavailable, HTTP 500, network timeout, malformed response body

Common Pitfalls in Claude-Generated Tests

PitfallWhat it looks likeHow to avoid it
Tests that always pass An assertion like assert result is not None — technically passes but verifies almost nothing Ask Claude: "Assert the exact expected value, not just that a result exists."
Over-mocking Claude mocks the function being tested, or mocks so much that the test doesn't exercise real code Specify what to mock and what not to. "Mock the HTTP call only — use the real service layer."
Wrong framework syntax Claude uses unittest.mock when you use pytest-mock, or uses self.assert in a non-class test Include framework and library specifics in your brief. Open an existing test file before asking.
Tests coupled to implementation Tests that break when you refactor internal details, even if the behaviour didn't change Ask Claude: "Test the observable behaviour, not the internal implementation."
Missing cleanup Tests that leave state (DB rows, files, env vars) that affect other tests Ask Claude to add fixtures or teardown. "Make sure each test is independent and cleans up after itself."
Run the tests before committing them
Claude-generated tests can contain subtle errors — wrong import paths, a fixture that doesn't exist yet, an assertion that passes vacuously. Always run the full test file after generation and fix any failures before committing. A test suite with broken tests is worse than no test suite — it trains the team to ignore failures.

Tests for Code That Already Has Bugs

An underused pattern: ask Claude to write a test that reproduces a bug you're about to fix. Write the test first (it should fail), then fix the code (the test passes), then commit both. This is lightweight TDD with Claude doing the test-writing:

Bug-first test prompt
"Write a test that demonstrates this bug: when calculate_late_fee is called with a paid_date before the due_date, it returns a negative fee instead of zero. The test should fail against the current code and pass once the bug is fixed."
Next — Chapter 7: Working Across Files
Project-wide changes, asking Claude about architecture, and how Claude's cross-file awareness works. How to make changes that span modules, how to ask architectural questions that require reading multiple files, and the limits of Claude's project-wide understanding.