Chain-of-thought

Prompt Engineering

Chapter 5  ·  Chain-of-Thought — Getting Claude to Reason Step by Step

Some questions have a direct answer that Claude can produce in one shot. Others require working through intermediate steps — and if Claude skips those steps, it's more likely to get the answer wrong. Chain-of-thought prompting is the technique of explicitly asking Claude to reason before concluding, and it makes a measurable difference on anything involving logic, multi-step analysis, or careful judgement.

Why Step-by-Step Reasoning Helps

Recall from Chapter 1: Claude generates responses token by token, left to right. It can't go back and revise. This means if it jumps to a conclusion early, the reasoning that follows is partly post-hoc justification rather than genuine working-out. The conclusion shapes the reasoning, rather than the other way around.

When you ask Claude to reason step by step, those intermediate steps become part of the context Claude is generating into. Each step informs the next, and the final conclusion is built on explicit reasoning rather than a direct guess. The reasoning is visible — which means you can check it, catch errors, and push back on specific steps.

Direct answer (skips reasoning)
Q: Our server has 32 GB RAM. We're running MySQL (needs 8 GB), three app instances (2 GB each), and a Redis cache (1 GB). Is 32 GB enough if we add a fourth app instance? A: No, 32 GB won't be enough.
Correct answer, but no working shown. If it were wrong, you'd have no way to spot where the error was.
With chain-of-thought
Q: (same) — Think through this step by step. A: Current usage: - MySQL: 8 GB - 3 app instances × 2 GB = 6 GB - Redis: 1 GB Total: 15 GB used, 17 GB free. Adding a fourth instance: +2 GB → 17 GB used. OS overhead typically 1–2 GB → ~19 GB total. 32 GB available. Yes, there's headroom.
Checkable, correctable. You can spot if a figure is wrong. The conclusion is trustworthy because the working is visible.

What a Chain of Thought Looks Like

A well-structured chain-of-thought response moves through identifiable stages:

1
Restate / Clarify

Claude confirms its understanding of the problem, often surfacing hidden assumptions. "This is asking me to compare X under condition Y…"

2
Break Down

The problem is split into sub-problems or sequential steps. Each piece is small enough to handle individually.

3
Work Through

Each sub-problem is addressed in turn. Intermediate results appear explicitly and inform the next step.

4
Conclude

The final answer is stated, grounded in the steps above. It should follow naturally — not contradict the working.

Trigger Phrases That Work

The exact wording matters less than you might think — Claude recognises the intent. But some phrasings are more reliable than others:

Think through this step by step.

The classic. Works across almost any task. Put it at the end of the prompt where it carries most weight.

Reason through this before giving your answer.

Useful when you want the reasoning to precede the conclusion — prevents Claude from answering first and justifying after.

Walk me through your thinking.

Slightly conversational; works well in diagnostic contexts where you want to follow the logic interactively.

Before answering, list the factors you're weighing.

Good for decision or trade-off questions. Forces Claude to surface assumptions explicitly.

Show your working.

Familiar from maths contexts; also works well for capacity planning, cost estimates, and logic problems.

Think out loud as you work through this.

Produces a more exploratory, less structured response — good when you want Claude to surface uncertainty and alternatives.

Placement tip
As noted in Chapter 1, the end of your message carries high weight. Put your chain-of-thought trigger at the end of the prompt, not the beginning — it's more likely to shape the response structure.

A Real Example — Debugging with Chain-of-Thought

Debugging prompt with explicit step-by-step instruction
You
My Python script is supposed to read a CSV, filter rows where the "status" column equals "active", and write the result to a new file. It runs without errors but the output file is empty. Think through what could cause this step by step, then tell me the most likely cause.
Claude
Step 1 — The file writes without error but is empty. That rules out a file permission issue or write failure. Step 2 — The filter step is the most likely culprit. If no rows match the condition, the output would be empty. Common reasons: a) The column name has trailing whitespace: "status " ≠ "status" b) Case mismatch: "Active" ≠ "active" c) The CSV uses a different delimiter, so "status" was never parsed as a column Step 3 — If the filter matched rows, the write would have content, so the problem is upstream of writing. Step 4 — Least likely: the CSV is genuinely empty or all rows have non-"active" status values. Most likely cause: a whitespace or case mismatch in the "status" column values. Try printing df['status'].unique() before filtering to see the actual values in the column.

When Chain-of-Thought Helps — and When to Skip It

Task typeUse CoT?Why
Multi-step maths or calculations Yes Each step uses the previous result — skipping steps introduces compounding error
Debugging — finding a root cause Yes Reasoning through possibilities systematically beats guessing the most common cause
Trade-off decisions (A vs B) Yes Forces the criteria to be explicit rather than hidden in a conclusion
Security or logic review Yes You want to see what Claude checked and what it dismissed — the working matters as much as the answer
Simple factual questions No Adds verbosity with no benefit — "What port does SSH use?" doesn't need a reasoning chain
Creative writing No Interrupts flow; reasoning steps don't improve creative output
Format / style tasks No Rewriting a sentence or reformatting a table doesn't benefit from explicit intermediate steps
Ambiguous problem you're still exploring Yes "Think out loud" surfaces assumptions and alternatives you might not have considered

Asking Claude to Separate Thinking from Answer

Sometimes you want the reasoning but don't need it cluttering the final response. You can ask Claude to structure the output explicitly:

Separating reasoning from conclusion
You
Should we use a message queue or direct HTTP calls between these two services? Consider latency, reliability, and team complexity. Reason through it first under a "Thinking:" heading, then give a one-paragraph recommendation under "Recommendation:".
Claude
Thinking: Latency: Direct HTTP is lower latency for synchronous needs; a queue adds processing delay. Reliability: A queue decouples failure — if the consumer is down, messages persist. HTTP fails at the call site. Team complexity: Queues require operating a broker (RabbitMQ, SQS, etc.), monitoring dead-letter queues, and debugging async flows. HTTP is simpler to trace. Key question: does the caller need an immediate response, or fire-and-forget? Recommendation: If this is a fire-and-forget notification or background job, use a message queue — the reliability benefit outweighs the complexity cost. If the caller needs an immediate result to continue processing, use direct HTTP with a retry policy. Given that you mentioned the team is junior, I'd default to HTTP unless reliability under consumer failure is a hard requirement.

Extended Thinking — The Built-In Version

Claude models support an optional extended thinking mode (available via the API and some interfaces) where Claude maintains a private reasoning scratchpad before generating its response. This is chain-of-thought at the architecture level — the model genuinely works through the problem internally before committing to a response token.

  • More thorough than a prompted chain-of-thought for hard reasoning tasks
  • The scratchpad content may or may not be shown to you depending on the interface
  • Slower and higher cost — worth it for complex decisions, not for everyday tasks
  • In Claude Code, this is what Fast Mode disables — standard responses are faster; complex reasoning tasks benefit from it being on
Practical note
For most prompted chain-of-thought needs in a chat interface, explicit step-by-step instructions work well without needing extended thinking mode. Reserve extended thinking for hard logical or mathematical problems where even a careful prompted response goes wrong.
Next — Chapter 6: Iterative Refinement
Treating a conversation as a multi-turn collaboration rather than a single-shot request. How to build on Claude's output, redirect it, ask for alternatives, and get progressively closer to exactly what you need — without starting from scratch each time.