How Claude thinks
Prompt Engineering
Before you can write great prompts, it helps to understand what's actually happening inside Claude when you send a message. This chapter pulls back the curtain — not the deep maths, but the mental model that makes a real difference to how you write prompts.
Everything is Tokens
Claude doesn't see words — it sees tokens. A token is a chunk of text, somewhere between a character and a word. Common short words are a single token; longer or unusual words get split into several. Whitespace and punctuation count too.
Here's the sentence "The WireGuard configuration failed unexpectedly." broken into tokens:
A few rules of thumb:
- ~1 token per common English word
- Technical jargon, proper nouns, and rare words often split into 2–4 tokens
- Code tends to be token-heavy —
wg0.confis 4 tokens, not one - ~750 words ≈ 1,000 tokens
The Context Window — Claude's Working Memory
Claude has no persistent memory between conversations. Within a single conversation, everything — your system prompt, the full chat history, your current message, and Claude's response so far — lives inside the context window. Think of it as a scroll of text that Claude can read from top to bottom.
Claude (Sonnet 4.6, the model powering this session) has a 200,000-token context window — roughly 150,000 words. That's enormous; you'll rarely hit the limit in normal use. But the principle matters:
- Claude reads the whole context on every message. It doesn't "remember" earlier messages — it literally re-reads them. This is why a very long conversation can slow slightly and why context at the top (the system prompt) carries weight.
- Older context gets equal attention. Claude doesn't forget what you said in message 3 just because you're now on message 30 — it's all still there.
- When the window fills, something must give. Claude Code automatically summarises older turns to make room. That summary is accurate but lossy — precise code snippets from early in the session may be paraphrased.
How Claude Generates a Response
Claude is a next-token predictor. It reads your entire prompt and then generates a response one token at a time, each new token informed by everything before it (the context plus what it's already written in this response).
This has some important consequences:
Claude commits as it writes
Once a token is generated, it's fixed. Claude can't go back and revise the third word after writing the tenth. This is why asking Claude to think step by step or reason before concluding genuinely helps — it forces the reasoning to appear in the context window where it can influence later tokens, rather than being skipped.
The first sentence sets the trajectory
If Claude opens with "Sure, here's a simple explanation…" it has now committed to a simple, explanatory register. If it opens with "There are three key trade-offs to consider…" the rest of the response will naturally follow an analytical structure. Your prompt shapes which opening Claude is likely to produce — and that opening shapes everything after it.
Claude doesn't plan ahead
There's no hidden phase where Claude "thinks" before writing. What you see is what's being generated. Extended thinking (available on some models) is different — it's an explicit reasoning scratchpad — but in a standard response, the generation is the thinking.
Why Phrasing Matters — Practical Examples
Because Claude predicts likely continuations, your framing powerfully shapes the kind of response you get. The same underlying question, phrased differently, can produce very different outputs.
pathlib, os.path.getmtime, and proper type annotations.Attention — What Claude Weighs Most Heavily
Claude's attention mechanism means not all parts of the context carry equal weight:
| Position / Type | Relative influence | Practical implication |
|---|---|---|
| System prompt (top of context) | Very high | Good place for persistent instructions, persona, constraints |
| Most recent message | Very high | The immediate question; Claude focuses here |
| Earlier conversation turns | Moderate | Available but competes with recency; restate key details if drift occurs |
| Middle of a long document you pasted | Lower | Known as "lost in the middle" — put critical info at the start or end |
| End of your message | High | A final instruction or format spec carries real weight |
Key Takeaways
Unusual words, code, and jargon cost more tokens. Keep prompts crisp; waffle costs context space and dilutes signal.
Claude re-reads everything every time. There's no forgetting within a session — only context limits and reasoning errors.
Claude writes left to right and can't revise. Asking it to reason before concluding genuinely changes the output.
Your prompt is the start of a document Claude is completing. Make your opener look like the document you want.
Critical instructions belong at the start (system prompt) or end (final message). The middle of a long paste is easy to miss.