How Claude thinks

Prompt Engineering

Chapter 1  ·  How Claude Thinks — Tokens, Context Windows, and Why Phrasing Matters

Before you can write great prompts, it helps to understand what's actually happening inside Claude when you send a message. This chapter pulls back the curtain — not the deep maths, but the mental model that makes a real difference to how you write prompts.

Everything is Tokens

Claude doesn't see words — it sees tokens. A token is a chunk of text, somewhere between a character and a word. Common short words are a single token; longer or unusual words get split into several. Whitespace and punctuation count too.

Here's the sentence "The WireGuard configuration failed unexpectedly." broken into tokens:

The Wire Guard configuration failed unexpect edly .

A few rules of thumb:

  • ~1 token per common English word
  • Technical jargon, proper nouns, and rare words often split into 2–4 tokens
  • Code tends to be token-heavy — wg0.conf is 4 tokens, not one
  • ~750 words ≈ 1,000 tokens
Why it matters
Claude pays attention to token patterns — not individual characters. Typos in unusual words can genuinely confuse it, because the tokenisation produces something it has rarely seen. Common words with typos matter less.

The Context Window — Claude's Working Memory

Claude has no persistent memory between conversations. Within a single conversation, everything — your system prompt, the full chat history, your current message, and Claude's response so far — lives inside the context window. Think of it as a scroll of text that Claude can read from top to bottom.

System prompt
Conversation history
Your message
← available space →
System prompt Conversation history Current message Free space (for Claude's response)

Claude (Sonnet 4.6, the model powering this session) has a 200,000-token context window — roughly 150,000 words. That's enormous; you'll rarely hit the limit in normal use. But the principle matters:

  • Claude reads the whole context on every message. It doesn't "remember" earlier messages — it literally re-reads them. This is why a very long conversation can slow slightly and why context at the top (the system prompt) carries weight.
  • Older context gets equal attention. Claude doesn't forget what you said in message 3 just because you're now on message 30 — it's all still there.
  • When the window fills, something must give. Claude Code automatically summarises older turns to make room. That summary is accurate but lossy — precise code snippets from early in the session may be paraphrased.
Common misconception
"Claude forgot what I said earlier." Almost never true within a session. If Claude contradicts something from earlier, it's more likely a reasoning error than a memory failure — the earlier text is still there.

How Claude Generates a Response

Claude is a next-token predictor. It reads your entire prompt and then generates a response one token at a time, each new token informed by everything before it (the context plus what it's already written in this response).

This has some important consequences:

Claude commits as it writes

Once a token is generated, it's fixed. Claude can't go back and revise the third word after writing the tenth. This is why asking Claude to think step by step or reason before concluding genuinely helps — it forces the reasoning to appear in the context window where it can influence later tokens, rather than being skipped.

The first sentence sets the trajectory

If Claude opens with "Sure, here's a simple explanation…" it has now committed to a simple, explanatory register. If it opens with "There are three key trade-offs to consider…" the rest of the response will naturally follow an analytical structure. Your prompt shapes which opening Claude is likely to produce — and that opening shapes everything after it.

Claude doesn't plan ahead

There's no hidden phase where Claude "thinks" before writing. What you see is what's being generated. Extended thinking (available on some models) is different — it's an explicit reasoning scratchpad — but in a standard response, the generation is the thinking.

Why Phrasing Matters — Practical Examples

Because Claude predicts likely continuations, your framing powerfully shapes the kind of response you get. The same underlying question, phrased differently, can produce very different outputs.

Weak prompt
Tell me about SSH.
Likely result: a broad, encyclopaedic overview covering history, use cases, key exchange — much of which you probably know already.
Better prompt
I've set up SSH key auth on my Debian server but get "Permission denied (publickey)". Walk me through diagnosing it — I can run commands on the server.
Likely result: a focused diagnostic sequence — checking ~/.ssh/authorized_keys permissions, sshd_config, SELinux/AppArmor — exactly what's needed.
Weak prompt
Write a Python function.
Likely result: Claude guesses. You'll get a placeholder that may not resemble what you actually needed.
Better prompt
Write a Python function that takes a list of file paths, filters to only .html files, and returns them sorted by modification date, newest first. Type hints required.
Likely result: exactly that function, with pathlib, os.path.getmtime, and proper type annotations.
The core insight
Claude is trying to produce the most likely useful continuation of your input. The more your prompt resembles the beginning of the kind of document you want to end up with, the more likely Claude is to produce exactly that. A vague opener invites a generic response; a specific, well-structured opener invites a specific, well-structured answer.

Attention — What Claude Weighs Most Heavily

Claude's attention mechanism means not all parts of the context carry equal weight:

Position / Type Relative influence Practical implication
System prompt (top of context) Very high Good place for persistent instructions, persona, constraints
Most recent message Very high The immediate question; Claude focuses here
Earlier conversation turns Moderate Available but competes with recency; restate key details if drift occurs
Middle of a long document you pasted Lower Known as "lost in the middle" — put critical info at the start or end
End of your message High A final instruction or format spec carries real weight

Key Takeaways

Tokens, not words

Unusual words, code, and jargon cost more tokens. Keep prompts crisp; waffle costs context space and dilutes signal.

Context is a scroll

Claude re-reads everything every time. There's no forgetting within a session — only context limits and reasoning errors.

Generation is linear

Claude writes left to right and can't revise. Asking it to reason before concluding genuinely changes the output.

Framing is everything

Your prompt is the start of a document Claude is completing. Make your opener look like the document you want.

Position matters

Critical instructions belong at the start (system prompt) or end (final message). The middle of a long paste is easy to miss.

Next — Chapter 2: Anatomy of a Good Prompt
Now that you know how Claude processes input, we build a framework: the four components every strong prompt should have — clarity, context, constraints, and output format — with worked examples for each.