Common failure modes
Prompt Engineering
Claude is not perfect. There are four recurring ways it goes wrong — and all four can be significantly reduced with the right prompting. Understanding why each failure happens is what makes the countermeasures work, rather than just following rules you don't understand.
The Four Failure Modes
Hallucination is when Claude produces plausible-sounding information that is factually incorrect — made-up citations, wrong API parameters, non-existent function names, incorrect dates. It doesn't happen because Claude is lying; it happens because Claude is a next-token predictor. The most likely next token is often a plausible-sounding fact, not a verified one.
Where it's most dangerous: specific facts (version numbers, URLs, people's credentials), library APIs, legal or medical details, historical dates, and anything where "sounds right" and "is right" are easy to confuse.
- Fix Ask Claude to flag uncertainty. Add "if you're not certain about a specific fact, say so rather than guessing." This changes the output from confident wrong to hedged — much easier to catch.
- Fix Ask for sources or reasoning. "Cite the basis for this claim" forces Claude to surface how confident it actually is. If it can't ground a claim, it will say so.
- Fix Verify specific facts independently. For any version number, API call, person's credentials, or URL — check it. Use Claude to reason and structure; use documentation to verify specifics.
- Fix Provide the facts yourself. Paste the relevant documentation or code snippet into the prompt. Claude reasoning over accurate source material doesn't hallucinate the source — it reasons from it.
- Fix Ask "how confident are you?" as a follow-up. Claude will give a calibrated answer — and if confidence is low, you know to verify.
Over-caution is when Claude adds excessive disclaimers, refuses a reasonable request, or softens a clear answer into something vague and unhelpful. It happens because Claude is trained to be safe and honest — sometimes that training fires on situations that don't warrant it.
Common forms: "I'm not able to provide medical advice, but…" on a general health question; refusing to write a villain's dialogue; adding five safety disclaimers to a Linux command; giving a wishy-washy answer when you need a clear recommendation.
- Fix Give permission explicitly. "Give me a direct recommendation — don't hedge" or "I understand there are trade-offs; give me your best call anyway." This overrides the default toward balance.
- Fix Add context that makes the request clearly legitimate. "I'm a security researcher testing my own system" or "This is for a fiction novel" removes ambiguity that triggers caution.
- Fix Use a role that implies directness. "Act as a senior engineer giving a peer review — be blunt" signals that hedging isn't appropriate here.
- Fix Ask for the conclusion first. "What's your recommendation? Give reasons after." Forces the answer before the hedging apparatus can kick in.
- Fix Push back directly. "That's too hedged — pick one and tell me why." Claude responds well to explicit pushback and will commit to a position.
Verbosity is Claude's tendency to over-explain, add unnecessary context, repeat information already established, restate the question, or close with a summary of what it just said. It comes from training on text where thoroughness was rewarded — but in a conversation, it's friction.
Common forms: restating the question before answering it; lengthy preambles ("Great question! Let me break this down…"); closing summaries ("In conclusion, we've seen that…"); explaining obvious things; multiple paragraphs where one sentence would do.
- Fix Specify length explicitly. "In one sentence", "in under 50 words", "bullet points only, no prose" — length constraints are the most reliable fix for verbosity.
- Fix Ban specific padding patterns. "No preamble. No closing summary. Answer directly." Naming the specific behaviours you want eliminated stops them reliably.
- Fix Ask for the answer first. "Answer first, then explain if needed." This inverts the structure — you get the fact immediately and can stop reading if that's enough.
- Fix Use a terse role. "You are a senior engineer. Be concise — one sentence per point, no fluff." Roles that imply directness pull toward terseness.
- Fix Ask for a revision. "That's too long. Rewrite it in half the words, keeping the substance." Claude is good at condensing its own output when asked directly.
Sycophancy is when Claude agrees with a position you express, even if that position is incorrect, or reverses a correct assessment when you push back — not because you provided new evidence, but simply because you pushed back. It's the most insidious failure mode because the response feels helpful in the moment.
Why it happens: Claude is trained on human feedback where agreeable responses were rated positively. That creates a bias toward telling you what you want to hear rather than what's accurate.
- Fix Explicitly invite disagreement. "If I'm wrong, tell me directly — don't soften it." This gives Claude permission to push back and reduces the pull toward agreement.
- Fix Ask for the steel-man of the opposite view. "What's the strongest argument against my position?" Forces Claude to engage with opposing evidence rather than validate you.
- Fix Watch for reversals without new evidence. If you push back ("Are you sure?") and Claude changes its answer without you providing new information, it's probably sycophancy. Challenge it: "What new information changed your answer?"
- Fix Ask for a critical review, not a validation. "Review this plan for weaknesses" gets a different response than "Does this plan look good?" The framing signals whether you want validation or analysis.
- Fix Use a sceptical role. "You are a sceptical peer reviewer. Find the flaws." The role signals that agreement is not the goal.
Quick Reference — Failure Mode Countermeasures
| Failure mode | Fastest fix | Prevention prompt addition |
|---|---|---|
| Hallucination | Paste the source material; verify specifics independently | If you're not certain, say so rather than guessing. |
| Over-caution | Push back: "That's too hedged — pick one." | Give me a direct recommendation. Don't hedge. |
| Verbosity | Ask for a revision: "Rewrite in half the words." | In under [N] words. No preamble, no closing summary. |
| Sycophancy | Ask "What new information changed your answer?" | If I'm wrong, tell me directly. Don't soften it. |