Advanced AI Workflows & the Claude API

Chapter 1 · The Claude API — Authentication, Models, and Your First Request

Everything you've done in previous courses — using Claude Code, building projects, pair programming in VS Code — runs on top of the Claude API. This course goes one layer deeper: you'll call the API directly, build Claude-powered applications, chain requests together, and design workflows that go beyond what any chat interface can do. This first chapter covers authentication, the current model family, and the anatomy of a request and response.

Getting an API Key

The API key is the credential that identifies your application to Anthropic. Every request must include it. To get one:

Sign in at console.anthropic.com
Navigate to API Keys in the left sidebar
Click Create Key, give it a name (e.g. "meal-planner-dev"), and copy it immediately — it won't be shown again
Store it in an environment variable: ANTHROPIC_API_KEY=sk-ant-...

Never hardcode API keys

Do not put your API key directly in source code. Once committed, it stays in git history, and keys pushed to a public GitHub repository can be found by automated scrapers within minutes. Always read it from an environment variable or a secrets manager. If you keep the key in a .env file, add .env to your .gitignore before your first commit.
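A minimal sketch of reading the key at startup and failing fast when it is missing. The helper names `load_api_key` and `redact` are hypothetical and exist only for illustration; they are not part of the SDK, which reads ANTHROPIC_API_KEY on its own when no key is passed.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is absent.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it or load it from a secrets manager"
        )
    return key


def redact(key: str) -> str:
    """Return a log-safe form of the key showing only its last four characters."""
    return "..." + key[-4:] if len(key) > 4 else "..."
```

Failing at startup gives a clearer error than a 401 on the first request, and logging only the redacted form keeps the key out of log files.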

Installing the SDK

Anthropic provides official SDKs for Python and TypeScript. For Python:

Terminal · shell

      pip install anthropic

      # Or with uv (faster)
      uv add anthropic

The SDK handles authentication, request serialisation, error parsing, and retry logic for you. You can also call the API directly over HTTP — the SDK is a thin convenience wrapper, not a black box. Understanding the raw request format (covered below) makes debugging SDK issues much easier.
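To make the "thin wrapper" point concrete, here is a sketch of the same kind of call made over plain HTTP with only the standard library. The function names `build_raw_request` and `send` are illustrative assumptions. The endpoint and the three headers are the ones the Messages API expects; the model ID is the one used throughout this chapter.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"
API_VERSION = "2023-06-01"  # value sent in the anthropic-version header


def build_raw_request(api_key: str, prompt: str, model: str = "claude-sonnet-4-6"):
    """Build (url, headers, body) for a Messages API call without sending it."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": API_VERSION,
        "content-type": "application/json",
    }
    payload = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return API_URL, headers, json.dumps(payload).encode("utf-8")


def send(url: str, headers: dict, body: bytes) -> dict:
    """POST the request and decode the JSON response (makes a network call)."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the build step from the send step lets you print or log the exact JSON body when an SDK call behaves unexpectedly.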

The Current Model Family

claude-opus-4-8

Opus · Most capable

Anthropic's most intelligent model. Best reasoning, analysis, and complex multi-step tasks. Highest cost.

Best for: research synthesis, complex code generation, nuanced writing, difficult reasoning chains

claude-sonnet-4-6

Sonnet · Balanced

High capability at moderate cost. The default choice for most production applications — strong performance, reasonable price.

Best for: most production use cases, API integrations, coding assistants, content generation

claude-haiku-4-5-20251001

Haiku · Fast & cheap

Fastest response time, lowest cost. Excellent for high-volume, latency-sensitive tasks where cost per request matters.

Best for: classification, extraction, summarisation at scale, real-time features, chatbots

claude-fable-5

Fable · Creative

Optimised for creative and narrative tasks. Strong at fiction, dialogue, and expressive writing.

Best for: creative writing, storytelling, dialogue generation, narrative content

Start with Sonnet, optimise later

When building a new feature, start with claude-sonnet-4-6. Once the feature works correctly, test whether Haiku produces acceptable quality at lower cost. Switch to Opus only if Sonnet's output quality isn't sufficient. This order saves money during development and avoids premature optimisation.
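One way to make that later switch cheap is to keep the model ID out of the call sites. The sketch below assumes a hypothetical environment variable, `APP_MODEL_ID`, which is not something the SDK reads; the default is the Sonnet ID from the model list above.

```python
import os
from typing import Mapping, Optional

# Default taken from this chapter's recommendation to start with Sonnet.
DEFAULT_MODEL = "claude-sonnet-4-6"


def choose_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model ID to use; APP_MODEL_ID (hypothetical) overrides the default."""
    env = os.environ if env is None else env
    return env.get("APP_MODEL_ID", "").strip() or DEFAULT_MODEL
```

With this in place, trying Haiku on a working feature is a configuration change rather than a code change.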

Your First API Request

basic_request.py · python

      import anthropic

      # Reads ANTHROPIC_API_KEY from the environment automatically
      client = anthropic.Anthropic()

      message = client.messages.create(
          model="claude-sonnet-4-6",
          max_tokens=1024,
          messages=[
              {"role": "user", "content": "Explain what an API key is in two sentences."}
          ],
      )

      print(message.content[0].text)

That's the minimum viable request — model, max_tokens, and a messages list with at least one user turn. Everything else is optional. Run it and you'll see Claude's response printed to the terminal.

Request Parameters — What Each Field Does

model

required

The model ID string. Use the exact ID from the model table above — abbreviated names like "sonnet" are not accepted. Example: "claude-sonnet-4-6"

max_tokens

required

Maximum tokens Claude will generate in the response. Claude stops at this limit even mid-sentence. Set it high enough for your expected output — it's a ceiling, not a target. Typical values: 256 (short answers), 1024 (paragraphs), 4096 (long documents), 8192+ (very long outputs)

messages

required

List of conversation turns. Each turn is {"role": "user" | "assistant", "content": "..."}. Must start with a user turn. Alternate roles for multi-turn conversations. For a fresh request: one user message. For conversation history: alternate user/assistant pairs followed by the new user message.

system

optional

The system prompt — instructions that shape Claude's behaviour for the entire conversation. Equivalent to CLAUDE.md for an API application. Set the role, constraints, output format, and persona here. Not part of the messages list — it's a separate top-level field.

temperature

optional

Controls randomness. Range 0–1. Lower = more deterministic and consistent; higher = more varied and creative. Default is 1. Use 0 for extraction/classification tasks. Use 0.7–1 for creative generation.

stop_sequences

optional

List of strings that cause Claude to stop generating when encountered. Useful for structured output where you want Claude to stop at a delimiter. Example: ["###END###"]

stream

optional

Set to True to receive the response as a stream of events rather than waiting for the full response. Covered in depth in Chapter 4. Essential for any UI that shows Claude's response as it's generated.
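The parameters above can be assembled in one place before the call. This is a sketch, not an SDK feature: `validate_messages` and `build_params` are hypothetical helpers, and the validator enforces the message rules as this chapter states them, which may be stricter than what the API itself accepts.

```python
from typing import Any, Dict, List, Optional

Message = Dict[str, str]


def validate_messages(messages: List[Message]) -> None:
    """Check the rules described above: start with user, then alternate roles."""
    if not messages:
        raise ValueError("messages must contain at least one turn")
    if messages[0]["role"] != "user":
        raise ValueError("messages must start with a user turn")
    for previous, current in zip(messages, messages[1:]):
        if previous["role"] == current["role"]:
            raise ValueError("roles must alternate between user and assistant")


def build_params(
    history: List[Message],
    new_user_message: str,
    system: Optional[str] = None,
    temperature: Optional[float] = None,
    stop_sequences: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for client.messages.create()."""
    messages = history + [{"role": "user", "content": new_user_message}]
    validate_messages(messages)
    params: Dict[str, Any] = {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": messages,
    }
    # Optional fields are included only when set, so API defaults apply otherwise.
    if system is not None:
        params["system"] = system
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        params["temperature"] = temperature
    if stop_sequences:
        params["stop_sequences"] = stop_sequences
    return params
```

The result would be passed as `client.messages.create(**params)`. Validating locally turns a 400 response into an immediate, specific error during development.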

Reading the Response Object

Response object — key fields

message.id unique message ID
  # "msg_01XFDUDYJgAACzvnptvVoYEL" — log this for debugging

message.content[0].text the actual response
  # The text Claude generated. content is a list of blocks; for a simple text-only request the text is in the first block, content[0]

message.usage.input_tokens tokens consumed
message.usage.output_tokens tokens generated
  # Input and output tokens are both billed, at different per-token rates

message.stop_reason why Claude stopped
  # "end_turn" → Claude finished naturally ✓
  # "max_tokens" → hit the limit — increase max_tokens if output was cut off
  # "stop_sequence" → hit one of your stop_sequences
  # "tool_use" → Claude wants to call a tool (Chapter 3)

Always check stop_reason

If stop_reason is "max_tokens", Claude's response was cut off — the output is incomplete. Either increase max_tokens or redesign the prompt to produce shorter output. A truncated response that looks complete is a subtle bug that's easy to miss in testing.
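A sketch of making that check hard to forget: route every response through one function that refuses to return truncated text. `extract_text` and `TruncatedResponseError` are hypothetical names, and the `fake` object is a stand-in with the same attribute names as the response fields listed above, so the example runs without a network call.

```python
from types import SimpleNamespace


class TruncatedResponseError(RuntimeError):
    """Raised when the response stopped because it hit max_tokens."""


def extract_text(message) -> str:
    """Return the response text, refusing to pass on truncated output."""
    if message.stop_reason == "max_tokens":
        raise TruncatedResponseError(
            f"{message.id} was cut off after {message.usage.output_tokens} output tokens"
        )
    # Join every text block rather than assuming content[0] is the only one.
    return "".join(block.text for block in message.content if block.type == "text")


# Stand-in for a real response object (illustrative values only).
fake = SimpleNamespace(
    id="msg_example",
    stop_reason="end_turn",
    content=[SimpleNamespace(type="text", text="An API key identifies your app.")],
    usage=SimpleNamespace(input_tokens=18, output_tokens=9),
)
print(extract_text(fake))
```

Raising instead of returning partial text means a truncated response fails loudly in tests rather than slipping through as a complete-looking answer.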

A Request with a System Prompt

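with_system_prompt.py · python

      import anthropic

      client = anthropic.Anthropic()

      message = client.messages.create(
          model="claude-sonnet-4-6",
          max_tokens=512,
          # system is a top-level field, separate from the messages list
          system=(
              "You are a concise technical writer. "
              "Always respond in plain text with no markdown. "
              "Keep answers under 3 sentences."
          ),
          messages=[
              {"role": "user", "content": "What is a webhook?"}
          ],
      )

      print(message.content[0].text)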

Common API Errors and What They Mean

Status	Error type	Cause and fix
401	authentication_error	API key missing, wrong, or revoked. Check your environment variable is set and the key is valid in the console.
400	invalid_request_error	Malformed request — wrong field name, wrong type, messages list starts with assistant turn, or max_tokens exceeds model limit. Read the error message; it's usually specific.
429	rate_limit_error	You've exceeded your requests-per-minute or tokens-per-minute limit. Implement exponential backoff and retry. Chapter 10 covers rate limit strategy in depth.
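413	request_too_large	The request body exceeds the maximum size in bytes that the API accepts. Send less data per request: chunk long documents or trim conversation history. Input that fits the size limit but exceeds the model's context window is rejected as a 400 invalid_request_error instead.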
500	api_error	Anthropic server error. Retry with backoff — these are transient. If persistent, check status.anthropic.com.
529	overloaded_error	Anthropic is temporarily overloaded. Retry with backoff. More common during peak hours — design your application to handle this gracefully.

Handling Errors in Code

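error_handling.py · python

      import anthropic

      client = anthropic.Anthropic()

      try:
          message = client.messages.create(
              model="claude-sonnet-4-6",
              max_tokens=1024,
              messages=[{"role": "user", "content": "What is a webhook?"}],
          )
      except anthropic.AuthenticationError as e:
          # 401: retrying will not help until the key is fixed
          raise RuntimeError("Invalid API key: check ANTHROPIC_API_KEY") from e
      except anthropic.RateLimitError:
          raise  # 429: caller handles retry with backoff
      except anthropic.APIStatusError as e:
          # Any other non-2xx response; must come after its subclasses above
          print(f"API error {e.status_code}: {e.message}")
          raise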
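The "retry with backoff" that the caller is expected to do can be sketched as a small generic wrapper. `retry_with_backoff` is a hypothetical helper, not an SDK function, and the delay values are illustrative defaults. The exception classes to retry are passed in, so the sketch itself does not depend on the SDK.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on `retryable` errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out retries from concurrent clients.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("max_attempts must be at least 1")
```

With the SDK, `call` would be a zero-argument function wrapping the `client.messages.create` request, and `retryable` could be `(anthropic.RateLimitError,)`. Since the SDK already applies some retry logic of its own, a wrapper like this is mainly useful when you want explicit control over the policy.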

Next — Chapter 2: System Prompts
Shaping Claude's behaviour at the API level — writing effective system prompts, controlling persona and constraints, the difference between system and user instructions, and designing system prompts for production applications.