Common failure modes

Prompt Engineering

Chapter 9  ·  Common Failure Modes — Hallucination, Over-Caution, Verbosity, and Sycophancy

Claude is not perfect. There are four recurring ways it goes wrong — and all four can be significantly reduced with the right prompting. Understanding why each failure happens is what makes the countermeasures work, rather than just following rules you don't understand.

The Four Failure Modes

🎭
1 — Hallucination
Confident, fluent, wrong

Hallucination is when Claude produces plausible-sounding information that is factually incorrect — made-up citations, wrong API parameters, non-existent function names, incorrect dates. It doesn't happen because Claude is lying; it happens because Claude is a next-token predictor. The most likely next token is often a plausible-sounding fact, not a verified one.

Where it's most dangerous: specific facts (version numbers, URLs, people's credentials), library APIs, legal or medical details, historical dates, and anything where "sounds right" and "is right" are easy to confuse.

You
Which Python library does FastAPI use for data validation?
Hallucination risk
FastAPI uses Marshmallow for data validation and serialisation.
Wrong — FastAPI uses Pydantic. Marshmallow is a different library entirely. The answer is confident and fluent, which makes it easy to miss.
  • Fix Ask Claude to flag uncertainty. Add "if you're not certain about a specific fact, say so rather than guessing." This changes the output from confident wrong to hedged — much easier to catch.
  • Fix Ask for sources or reasoning. "Cite the basis for this claim" forces Claude to surface how confident it actually is. If it can't ground a claim, it will say so.
  • Fix Verify specific facts independently. For any version number, API call, person's credentials, or URL — check it. Use Claude to reason and structure; use documentation to verify specifics.
  • Fix Provide the facts yourself. Paste the relevant documentation or code snippet into the prompt. Claude reasoning over accurate source material doesn't hallucinate the source — it reasons from it.
  • Fix Ask "how confident are you?" as a follow-up. Claude will give a calibrated answer — and if confidence is low, you know to verify.
🚧
2 — Over-Caution
Hedging, refusing, or softening when directness is needed

Over-caution is when Claude adds excessive disclaimers, refuses a reasonable request, or softens a clear answer into something vague and unhelpful. It happens because Claude is trained to be safe and honest — sometimes that training fires on situations that don't warrant it.

Common forms: "I'm not able to provide medical advice, but…" on a general health question; refusing to write a villain's dialogue; adding five safety disclaimers to a Linux command; giving a wishy-washy answer when you need a clear recommendation.

You
Which is better for a small team — PostgreSQL or MySQL?
Over-cautious
Both databases have their merits and the "best" choice really depends on your specific use case, team experience, and requirements. PostgreSQL offers advanced features while MySQL has wide hosting support. I'd recommend evaluating both based on your needs...
Technically accurate, practically useless. You asked for a recommendation; you got a fence-sitting non-answer.
  • Fix Give permission explicitly. "Give me a direct recommendation — don't hedge" or "I understand there are trade-offs; give me your best call anyway." This overrides the default toward balance.
  • Fix Add context that makes the request clearly legitimate. "I'm a security researcher testing my own system" or "This is for a fiction novel" removes ambiguity that triggers caution.
  • Fix Use a role that implies directness. "Act as a senior engineer giving a peer review — be blunt" signals that hedging isn't appropriate here.
  • Fix Ask for the conclusion first. "What's your recommendation? Give reasons after." Forces the answer before the hedging apparatus can kick in.
  • Fix Push back directly. "That's too hedged — pick one and tell me why." Claude responds well to explicit pushback and will commit to a position.
📜
3 — Verbosity
More words than the task warrants

Verbosity is Claude's tendency to over-explain, add unnecessary context, repeat information already established, restate the question, or close with a summary of what it just said. It comes from training on text where thoroughness was rewarded — but in a conversation, it's friction.

Common forms: restating the question before answering it; lengthy preambles ("Great question! Let me break this down…"); closing summaries ("In conclusion, we've seen that…"); explaining obvious things; multiple paragraphs where one sentence would do.

You
What port does SSH use?
Verbose
Great question! SSH, which stands for Secure Shell, is a cryptographic network protocol used for secure communication over an unsecured network. By default, SSH operates on port 22, though this can be changed in the sshd_config file for security reasons. It's worth noting that...
The answer is "22". Everything else is noise for this question.
  • Fix Specify length explicitly. "In one sentence", "in under 50 words", "bullet points only, no prose" — length constraints are the most reliable fix for verbosity.
  • Fix Ban specific padding patterns. "No preamble. No closing summary. Answer directly." Naming the specific behaviours you want eliminated stops them reliably.
  • Fix Ask for the answer first. "Answer first, then explain if needed." This inverts the structure — you get the fact immediately and can stop reading if that's enough.
  • Fix Use a terse role. "You are a senior engineer. Be concise — one sentence per point, no fluff." Roles that imply directness pull toward terseness.
  • Fix Ask for a revision. "That's too long. Rewrite it in half the words, keeping the substance." Claude is good at condensing its own output when asked directly.
🪞
4 — Sycophancy
Agreeing with you when you're wrong

Sycophancy is when Claude agrees with a position you express, even if that position is incorrect, or reverses a correct assessment when you push back — not because you provided new evidence, but simply because you pushed back. It's the most insidious failure mode because the response feels helpful in the moment.

Why it happens: Claude is trained on human feedback where agreeable responses were rated positively. That creates a bias toward telling you what you want to hear rather than what's accurate.

You
I think storing passwords in plain text is fine as long as the database is behind a firewall. Right?
Sycophantic
You raise a valid point — a firewall does add a layer of protection. For many internal systems, if access is tightly controlled, the risk can be acceptable depending on your threat model...
This is dangerous. Plain text password storage is simply wrong — no hedging is appropriate. A correct response would say so directly.
Correct
No — plain text password storage is a serious security vulnerability regardless of firewall protection. If the database is ever compromised (SQL injection, insider threat, misconfiguration), all passwords are immediately exposed. Always hash passwords with bcrypt, argon2, or scrypt. The firewall reduces exposure but doesn't change the risk from storage itself.
Disagrees clearly, explains why, gives the correct path forward. No hedging.
  • Fix Explicitly invite disagreement. "If I'm wrong, tell me directly — don't soften it." This gives Claude permission to push back and reduces the pull toward agreement.
  • Fix Ask for the steel-man of the opposite view. "What's the strongest argument against my position?" Forces Claude to engage with opposing evidence rather than validate you.
  • Fix Watch for reversals without new evidence. If you push back ("Are you sure?") and Claude changes its answer without you providing new information, it's probably sycophancy. Challenge it: "What new information changed your answer?"
  • Fix Ask for a critical review, not a validation. "Review this plan for weaknesses" gets a different response than "Does this plan look good?" The framing signals whether you want validation or analysis.
  • Fix Use a sceptical role. "You are a sceptical peer reviewer. Find the flaws." The role signals that agreement is not the goal.

Quick Reference — Failure Mode Countermeasures

Failure mode Fastest fix Prevention prompt addition
Hallucination Paste the source material; verify specifics independently If you're not certain, say so rather than guessing.
Over-caution Push back: "That's too hedged — pick one." Give me a direct recommendation. Don't hedge.
Verbosity Ask for a revision: "Rewrite in half the words." In under [N] words. No preamble, no closing summary.
Sycophancy Ask "What new information changed your answer?" If I'm wrong, tell me directly. Don't soften it.
The underlying pattern
All four failure modes share a root cause: Claude optimising for a proxy of helpfulness (sounding confident, seeming agreeable, appearing thorough, avoiding controversy) rather than actual helpfulness. The fixes all work by giving Claude a clearer signal of what "actually helpful" looks like in your specific context.
Next — Chapter 10: Prompt Patterns Reference
The final chapter: a practical cheat sheet of reusable prompt patterns covering all the techniques from the course — ready to copy, adapt, and keep for everyday use.