Building a simple AI-powered app

Advanced AI Workflows & the Claude API

Chapter 5 · Building a Simple AI-Powered App

The first four chapters covered the Claude API in pieces — authentication, system prompts, tool use, streaming. This chapter puts them together. You'll build a complete AI-powered chat application: a FastAPI backend that manages conversation history and calls Claude, a streaming endpoint, and a minimal frontend that reads the stream. The goal is a working mental model of how all the parts connect, not a production-ready UI framework.

What We're Building

Browser (HTML + vanilla JS) ⟷ FastAPI (Python backend) ⟷ Claude API (Anthropic SDK)

In-memory session store: conversation history per session

The app has two routes: GET / serves the HTML page, and POST /chat accepts a user message, appends it to the conversation history, calls Claude with streaming, and returns the response as a stream. The browser appends tokens as they arrive.

Project Structure

ai_chat_app/
  main.py            FastAPI app: routes, session store, Claude calls
  .env               ANTHROPIC_API_KEY=sk-ant-...
  static/
    index.html       frontend: chat UI + streaming fetch
  requirements.txt   fastapi, uvicorn, anthropic, python-dotenv

requirements.txt (text)

      fastapi
      uvicorn[standard]
      anthropic
      python-dotenv

Step 1 — Conversation History

Claude has no memory between API calls. Every request must include the full conversation so far. You manage this by keeping a list of {"role": "...", "content": "..."} dicts and appending each new turn. For a real app you'd store this in a database; for this example, a server-side dict keyed by session ID is enough:

Conversation history after two turns

system (sent via the system parameter): "You are a helpful assistant. Be concise."
user: "What is FastAPI?"
assistant: "FastAPI is a modern Python web framework for building APIs…"
user: "How does it compare to Flask?"

The next API call sends all three messages along with the system prompt, which the Messages API takes as a separate system parameter rather than as an entry in the messages list. Claude sees the context from the first question when answering the second; that is how conversation continuity works.
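The history described above can be sketched with plain Python data. This is a minimal illustration, not the app's actual code: `add_turn` and `request_kwargs` are hypothetical names, and it assumes the system prompt is passed separately from the messages, as main.py does later in this chapter.

```python
# Hypothetical helper names; the message shape matches the dicts described above.
SYSTEM_PROMPT = "You are a helpful assistant. Be concise."


def add_turn(history: list[dict], role: str, content: str) -> None:
    """Append one turn, allowing only the two roles used in the message list."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    history.append({"role": role, "content": content})


history: list[dict] = []
add_turn(history, "user", "What is FastAPI?")
add_turn(history, "assistant", "FastAPI is a modern Python web framework for building APIs…")
add_turn(history, "user", "How does it compare to Flask?")

# Assumption: the request carries the system prompt next to the messages,
# not inside them.
request_kwargs = {"system": SYSTEM_PROMPT, "messages": history}
```

Each later turn is another `add_turn` call on the same list, so the next request automatically carries everything said so far.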

Step 2 — The FastAPI Backend

main.py (python)

      import uuid

      import anthropic
      from dotenv import load_dotenv
      from fastapi import Cookie, FastAPI
      from fastapi.responses import FileResponse, StreamingResponse
      from fastapi.staticfiles import StaticFiles
      from pydantic import BaseModel

      load_dotenv()  # loads ANTHROPIC_API_KEY from .env into the environment

      app = FastAPI()
      client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
      app.mount("/static", StaticFiles(directory="static"), name="static")

      # In-memory session store: session_id → list of message dicts
      sessions: dict[str, list] = {}

      SYSTEM_PROMPT = (
          "You are a helpful assistant. "
          "Answer concisely and accurately. "
          "If you don't know something, say so."
      )

      class ChatRequest(BaseModel):
          message: str

      @app.get("/")
      def index(session_id: str | None = Cookie(default=None)):
          response = FileResponse("static/index.html")
          if not session_id or session_id not in sessions:
              session_id = str(uuid.uuid4())
              sessions[session_id] = []
              # Set the cookie on the response that is actually returned.
              # Cookies set on an injected Response parameter are dropped
              # when the route returns its own Response object.
              response.set_cookie("session_id", session_id)
          return response

      def stream_claude(history: list, collected: list):
          with client.messages.stream(
              model="claude-sonnet-4-6",
              max_tokens=2048,
              system=SYSTEM_PROMPT,
              messages=history
          ) as stream:
              for text in stream.text_stream:
                  collected.append(text)  # accumulate for history
                  yield text              # stream to client

      @app.post("/chat")
      def chat(req: ChatRequest, session_id: str = Cookie()):
          # setdefault avoids a KeyError if the server restarted and the
          # browser still holds a cookie for a session that no longer exists
          history = sessions.setdefault(session_id, [])
          history.append({"role": "user", "content": req.message})
          collected = []  # will hold assistant reply chunks

          def after_stream():
              # Save assistant reply to history once stream ends
              history.append({"role": "assistant", "content": "".join(collected)})

          def generate():
              yield from stream_claude(history, collected)
              after_stream()

          return StreamingResponse(generate(), media_type="text/plain")

Step 3 — The Frontend

static/index.html (html)

      <!DOCTYPE html>
      <html lang="en"><head>
      <meta charset="UTF-8">
      <title>AI Chat</title>
      <style>
        body { font-family: system-ui; background: #0d1117; color: #c9d1d9;
               display:flex; flex-direction:column; height:100vh; margin:0; padding:1rem; }
        #log { flex:1; overflow-y:auto; padding:1rem; background:#161b22;
              border-radius:8px; margin-bottom:1rem; white-space:pre-wrap; }
        .user-msg { color:#7ee787; margin:0.5rem 0; }
        .bot-msg  { color:#c9d1d9; margin:0.5rem 0; }
        #form { display:flex; gap:0.5rem; }
        input { flex:1; padding:0.6rem; background:#161b22; border:1px solid #30363d;
               border-radius:6px; color:#c9d1d9; }
        button { padding:0.6rem 1.2rem; background:#a78bfa; border:none;
                border-radius:6px; color:#0d1117; font-weight:700; cursor:pointer; }
      </style></head><body>
      <div id="log"></div>
      <form id="form">
        <input id="msg" placeholder="Type a message…" autocomplete="off">
        <button>Send</button>
      </form>
      <script>
      const log = document.getElementById('log');

      // Create an empty message element with the given CSS class
      function addMsg(cls) {
        const el = document.createElement('div');
        el.className = cls;
        log.appendChild(el);
        return el;
      }

      document.getElementById('form').addEventListener('submit', async (e) => {
        e.preventDefault();
        const input = document.getElementById('msg');
        const text = input.value.trim();
        if (!text) return;
        addMsg('user-msg').textContent = 'You: ' + text;
        input.value = '';
        const botEl = addMsg('bot-msg');
        botEl.textContent = 'Claude: ';
        const res = await fetch('/chat', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ message: text })
        });
        if (!res.ok) {
          botEl.textContent += '[request failed: ' + res.status + ']';
          return;
        }
        const reader = res.body.getReader();
        const dec = new TextDecoder();
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          // stream: true keeps multi-byte characters intact across chunk boundaries
          botEl.textContent += dec.decode(value, { stream: true });
          log.scrollTop = log.scrollHeight;
        }
      });
      </script></body></html>

Running the App

terminal (bash)

      # Install dependencies
      pip install -r requirements.txt

      # Run the server (--reload for auto-restart on file changes)
      uvicorn main:app --reload

      # Then open http://localhost:8000 in a browser

How the Pieces Connect

1. Browser loads the page → receives a session cookie. FastAPI's GET / creates a UUID session and sets it as a cookie. That cookie travels with every subsequent request, identifying whose history to use.

2. User sends a message → appended to history. POST /chat receives the JSON body, looks up the session's history list, and appends {"role": "user", "content": "..."}.

3. Claude is called with the full history. The messages parameter contains every turn so far, so Claude sees the full conversation every time, which is how it maintains context across turns.

4. Response streams to the browser. The generator yields text chunks as they arrive from Claude. FastAPI's StreamingResponse forwards them immediately. The browser appends each chunk to the message element.

5. After stream ends → assistant turn saved to history. The accumulated chunks are joined and appended as {"role": "assistant", "content": "..."}. The next user message will include this reply, keeping the conversation coherent.
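Steps 4 and 5 rely on a generator detail: code placed after the last `yield` runs only once the consumer has read every chunk. The sketch below shows this with a stand-in stream so it runs without a network call; `fake_stream` and the chunk contents are invented for illustration.

```python
from collections.abc import Iterator


def fake_stream() -> Iterator[str]:
    """Stand-in for the Claude text stream (hypothetical, no API call)."""
    yield from ["Fast", "API ", "is a ", "web framework."]


def generate(history: list[dict], chunks: Iterator[str]) -> Iterator[str]:
    collected: list[str] = []
    for text in chunks:
        collected.append(text)  # accumulate for history
        yield text              # forward to the client immediately
    # Reached only after the consumer has read the final chunk
    history.append({"role": "assistant", "content": "".join(collected)})


history = [{"role": "user", "content": "What is FastAPI?"}]
received = list(generate(history, fake_stream()))
```

One consequence of this design: if the client disconnects midway and the generator is never exhausted, the assistant turn is not saved, leaving a user message without a reply in the history.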

What This App Has and What It's Missing

✓ Conversation history: context across turns

✓ Session isolation: separate history per user

✓ Streaming: tokens appear in real time

✓ System prompt: controls Claude's behaviour

✗ Persistence: history lost on server restart

✗ Auth: anyone with the URL can chat

✗ History pruning: history grows unbounded

✗ Error handling: no timeout / retry logic

History pruning matters in production

Every message you send includes the entire history — so costs grow with conversation length. Common strategies: keep only the last N turns, summarise old context into a single message, or use prompt caching (Chapter 6) to avoid re-encoding repeated context on every call.
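The first strategy, keeping only the most recent messages, might look like the sketch below. `prune_history` and `max_messages` are hypothetical names, and the code assumes the trimmed list should still open with a user turn.

```python
def prune_history(history: list[dict], max_messages: int) -> list[dict]:
    """Return a new list with at most max_messages recent messages."""
    if max_messages <= 0:
        return []
    recent = history[-max_messages:]
    # Drop leading assistant turns so the list starts with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
]
trimmed = prune_history(history, max_messages=4)
```

The function returns a copy and leaves the stored history untouched, so the full conversation is still available if you later want to summarise the dropped turns.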

API key security

The API key lives in .env and is read server-side only — it never touches the browser. Never put your API key in frontend code; it would be visible to anyone who opens DevTools.

Next — Chapter 6: Prompt Caching
Every API call re-encodes your system prompt and any shared context — even if it hasn't changed. Prompt caching lets Anthropic store that prefix server-side and reuse it across calls, cutting latency by up to 85% and cost by up to 90% on the cached tokens. Chapter 6 covers how caching works, when to use it, and how to add the cache breakpoints to an existing app.