Building a simple AI-powered app

Advanced AI Workflows & the Claude API

Chapter 5  ·  Building a Simple AI-Powered App

The first four chapters covered the Claude API in pieces — authentication, system prompts, tool use, streaming. This chapter puts them together. You'll build a complete AI-powered chat application: a FastAPI backend that manages conversation history and calls Claude, a streaming endpoint, and a minimal frontend that reads the stream. The goal is a working mental model of how all the parts connect, not a production-ready UI framework.

What We're Building

Browser (HTML + vanilla JS) → FastAPI (Python backend) → Claude API (Anthropic SDK)
FastAPI also holds the in-memory session store: conversation history per session

The app has two routes: GET / serves the HTML page, and POST /chat accepts a user message, appends it to the conversation history, calls Claude with streaming, and returns the response as a stream. The browser appends tokens as they arrive.

Project Structure

ai_chat_app/
  main.py             FastAPI app: routes, session store, Claude calls
  .env                ANTHROPIC_API_KEY=sk-ant-...
  static/
    index.html        frontend: chat UI + streaming fetch
  requirements.txt    fastapi, uvicorn, anthropic, python-dotenv

requirements.txt (text)
fastapi
uvicorn[standard]
anthropic
python-dotenv

Step 1 — Conversation History

Claude has no memory between API calls. Every request must include the full conversation so far. You manage this by keeping a list of {"role": "...", "content": "..."} dicts and appending each new turn. For a real app you'd store this in a database; for this example, a server-side dict keyed by session ID is enough:

Conversation history after two turns
  system: "You are a helpful assistant. Be concise."
  user: "What is FastAPI?"
  assistant: "FastAPI is a modern Python web framework for building APIs…"
  user: "How does it compare to Flask?"

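The next API call includes all four entries: the system prompt goes in the separate system parameter, and the three user and assistant turns go in messages. Claude sees the context from the first question when answering the second, and that is how conversation continuity works.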
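As a rough illustration of the bookkeeping described above, the sketch below builds the same history with plain Python and assembles the arguments for the next call. The helper names `add_turn` and `build_request` are hypothetical and exist only for this example; the message dict shape and the separate `system` argument follow the chapter's own code.

```python
# Hypothetical helpers for illustration; only the message shape comes from the chapter.
SYSTEM_PROMPT = "You are a helpful assistant. Be concise."


def add_turn(history: list[dict], role: str, content: str) -> None:
    """Append one turn to the running conversation."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    history.append({"role": role, "content": content})


def build_request(history: list[dict]) -> dict:
    """Assemble the keyword arguments for the next Claude call."""
    return {
        "system": SYSTEM_PROMPT,    # the system prompt travels separately
        "messages": list(history),  # a copy of every turn so far
    }


history: list[dict] = []
add_turn(history, "user", "What is FastAPI?")
add_turn(history, "assistant", "FastAPI is a modern Python web framework for building APIs…")
add_turn(history, "user", "How does it compare to Flask?")

request = build_request(history)
print(len(request["messages"]))  # three turns; the system prompt sits alongside them
```

Keeping the system prompt out of the list means the history only ever contains user and assistant turns, which makes it easier to prune or persist later.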

Step 2 — The FastAPI Backend

main.py (python)
import uuid

import anthropic
from dotenv import load_dotenv
from fastapi import Cookie, FastAPI, HTTPException
from fastapi.responses import FileResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel

load_dotenv()  # load ANTHROPIC_API_KEY from .env before creating the client
app = FastAPI()
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
app.mount("/static", StaticFiles(directory="static"), name="static")

# In-memory session store: session_id → list of message dicts
sessions: dict[str, list] = {}

SYSTEM_PROMPT = """You are a helpful assistant. \
Answer concisely and accurately. \
If you don't know something, say so."""


class ChatRequest(BaseModel):
    message: str


@app.get("/")
def index(session_id: str | None = Cookie(default=None)):
    # Set the cookie on the response we actually return: FastAPI does not
    # merge cookies from an injected Response into a directly returned one.
    response = FileResponse("static/index.html")
    if not session_id or session_id not in sessions:
        session_id = str(uuid.uuid4())
        sessions[session_id] = []
        response.set_cookie("session_id", session_id, httponly=True)
    return response


def stream_claude(history: list, collected: list):
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=SYSTEM_PROMPT,
        messages=history,
    ) as stream:
        for text in stream.text_stream:
            collected.append(text)  # accumulate for history
            yield text  # stream to client


@app.post("/chat")
def chat(req: ChatRequest, session_id: str = Cookie()):
    # Unknown session (for example after a server restart): ask for a reload
    if session_id not in sessions:
        raise HTTPException(status_code=404, detail="Unknown session; reload the page.")
    history = sessions[session_id]
    history.append({"role": "user", "content": req.message})

    collected = []  # will hold assistant reply chunks

    def after_stream():
        # Save assistant reply to history once stream ends
        history.append({"role": "assistant", "content": "".join(collected)})

    def generate():
        yield from stream_claude(history, collected)
        after_stream()

    return StreamingResponse(generate(), media_type="text/plain")
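The in-memory dict is the simplest store that works, and Step 1 notes that a real app would use a database. A minimal sketch of what that swap might look like with the standard library's sqlite3 module follows. The class name `SqliteSessionStore`, its methods, and the table layout are assumptions made for this example, not part of the chapter's app. It also assumes single-threaded use: FastAPI runs sync endpoints in a thread pool, so a real integration would need one connection per request or similar care.

```python
import sqlite3


class SqliteSessionStore:
    """Hypothetical replacement for the in-memory `sessions` dict."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" keeps the demo self-contained; pass a file path to persist.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, "
            "role TEXT NOT NULL, "
            "content TEXT NOT NULL)"
        )

    def append(self, session_id: str, role: str, content: str) -> None:
        """Store one turn for a session."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def history(self, session_id: str) -> list[dict]:
        """Return the session's turns in insertion order, ready for `messages`."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
        return [{"role": role, "content": content} for role, content in rows]


store = SqliteSessionStore()
store.append("session-a", "user", "What is FastAPI?")
store.append("session-b", "user", "Unrelated question")
store.append("session-a", "assistant", "A Python web framework.")
print(store.history("session-a"))
```

Each session's rows come back in the same list-of-dicts shape the backend already passes to Claude, so the rest of the request flow would not need to change.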

Step 3 — The Frontend

static/index.html (html)
<!DOCTYPE html>
<html lang="en"><head>
<meta charset="UTF-8">
<title>AI Chat</title>
<style>
  body { font-family: system-ui; background: #0d1117; color: #c9d1d9;
         display:flex; flex-direction:column; height:100vh; margin:0; padding:1rem; }
  #log { flex:1; overflow-y:auto; padding:1rem; background:#161b22;
        border-radius:8px; margin-bottom:1rem; white-space:pre-wrap; }
  .user-msg { color:#7ee787; margin:0.5rem 0; }
  .bot-msg { color:#c9d1d9; margin:0.5rem 0; }
  #form { display:flex; gap:0.5rem; }
  input { flex:1; padding:0.6rem; background:#161b22; border:1px solid #30363d;
         border-radius:6px; color:#c9d1d9; }
  button { padding:0.6rem 1.2rem; background:#a78bfa; border:none;
          border-radius:6px; color:#0d1117; font-weight:700; cursor:pointer; }
</style></head><body>

<div id="log"></div>
<form id="form">
  <input id="msg" placeholder="Type a message…" autocomplete="off">
  <button>Send</button>
</form>

<script>
const log = document.getElementById('log');

function addMsg(cls) {
  const el = document.createElement('div');
  el.className = cls;
  log.appendChild(el);
  return el;
}

document.getElementById('form').addEventListener('submit', async (e) => {
  e.preventDefault();
  const input = document.getElementById('msg');
  const text = input.value.trim();
  if (!text) return;

  addMsg('user-msg').textContent = 'You: ' + text;
  input.value = '';

  const botEl = addMsg('bot-msg');
  botEl.textContent = 'Claude: ';

  const res = await fetch('/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: text })
  });

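  if (!res.ok) {
    botEl.textContent += '[request failed: ' + res.status + ']';
    return;
  }

  const reader = res.body.getReader();
  const dec = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    botEl.textContent += dec.decode(value, { stream: true });
    log.scrollTop = log.scrollHeight;
  }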
});
</script></body></html>
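One detail of reading a byte stream is easy to miss: a chunk boundary can fall in the middle of a multi-byte UTF-8 character. The Python sketch below illustrates the problem and the usual fix, an incremental decoder that buffers incomplete bytes until the next chunk arrives. The function name `decode_chunks` is hypothetical. In the browser, passing `{ stream: true }` to `TextDecoder.decode` gives the same buffering behaviour.

```python
import codecs


def decode_chunks(chunks: list[bytes]) -> str:
    """Decode UTF-8 byte chunks incrementally, as a streaming reader would."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush; raises on leftover bytes
    return "".join(parts)


# "é" takes two bytes in UTF-8, so split the data in the middle of it.
data = "café".encode("utf-8")
chunks = [data[:4], data[4:]]

# Decoding each chunk on its own mangles the split character.
naive = "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)

print(decode_chunks(chunks))  # the incremental decoder reassembles "café"
print(naive == "café")        # False: the naive version contains replacement characters
```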

Running the App

terminal (bash)
# Install dependencies
pip install -r requirements.txt

# Run the server from ai_chat_app/ (--reload for auto-restart on file changes)
uvicorn main:app --reload

# Then open http://localhost:8000 in your browser

How the Pieces Connect

  1. Browser loads the page → receives a session cookie. FastAPI's GET / creates a UUID session and sets it as a cookie. That cookie travels with every subsequent request, identifying whose history to use.
  2. User sends a message → appended to history. POST /chat receives the JSON body, looks up the session's history list, and appends {"role": "user", "content": "..."}.
  3. Claude is called with the full history. The messages parameter contains every turn so far; Claude sees the full conversation every time, which is how it maintains context across turns.
  4. Response streams to the browser. The generator yields text chunks as they arrive from Claude. FastAPI's StreamingResponse forwards them immediately. The browser appends each chunk to the message element.
  5. After stream ends → assistant turn saved to history. The accumulated chunks are joined and appended as {"role": "assistant", "content": "..."}. The next user message will include this reply, keeping the conversation coherent.
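Steps 2 to 5 can be tried without a network call. The sketch below swaps the Claude stream for a canned generator, an assumption made so the example runs offline, and keeps the same shape: append the user turn, yield chunks while collecting them, then save the joined reply. The names `fake_claude_stream` and `stream_and_record` are hypothetical.

```python
from typing import Callable, Iterator


def fake_claude_stream(history: list[dict]) -> Iterator[str]:
    """Stand-in for the Claude stream: yields a canned reply in chunks."""
    yield from ["Fast", "API is ", "a web framework."]


def stream_and_record(
    history: list[dict],
    user_message: str,
    stream_fn: Callable[[list[dict]], Iterator[str]] = fake_claude_stream,
) -> Iterator[str]:
    """Yield reply chunks to the caller, then store the full reply in history."""
    history.append({"role": "user", "content": user_message})  # step 2
    collected: list[str] = []
    for chunk in stream_fn(history):  # step 3: the full history goes in
        collected.append(chunk)
        yield chunk  # step 4: forward each chunk immediately
    # Step 5: runs only once the caller has consumed the whole stream.
    history.append({"role": "assistant", "content": "".join(collected)})


history: list[dict] = []
received = list(stream_and_record(history, "What is FastAPI?"))
print("".join(received))
print([turn["role"] for turn in history])
```

Because the final append sits after the loop, a stream that is abandoned partway leaves the user turn in history without a matching assistant turn, which is one reason real apps add error handling around this step.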

What This App Has and What It's Missing

What the app has:
  • Conversation history: context across turns
  • Session isolation: separate history per user
  • Streaming: tokens appear in real time
  • System prompt: controls Claude's behaviour

What it's missing:
  • Persistence: history lost on server restart
  • Auth: anyone with the URL can chat
  • History pruning: history grows unbounded
  • Error handling: no timeout / retry logic
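To make the missing retry logic concrete, here is a minimal sketch of retrying with exponential backoff, written against a plain callable so it runs without the API. The function name `retry_with_backoff` and its parameters are assumptions for illustration. In a streaming endpoint a retry like this is only safe before the first chunk has been sent to the browser. The Anthropic Python SDK also exposes `max_retries` and `timeout` options on the client, which may cover the simple cases; check the SDK documentation for the current defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")


# Demo: a call that fails twice, then succeeds. The sleep function is
# replaced with list.append so the example finishes instantly.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays: list[float] = []
result = retry_with_backoff(flaky, attempts=3, sleep=delays.append)
print(result, calls["count"], delays)
```

A production version would normally retry only on errors known to be transient, such as rate limits or dropped connections, and not on every exception.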
History pruning matters in production
Every request includes the entire history, so the input tokens you pay for on each call grow with conversation length. Common strategies: keep only the last N turns, summarise old context into a single message, or use prompt caching (Chapter 6) to avoid re-encoding repeated context on every call.
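The first strategy, keeping only the most recent turns, might look like the sketch below. The function name `prune_history` and the `max_messages` parameter are hypothetical. The sketch also assumes the trimmed list should begin with a user turn, so it drops a leading assistant message left over from the cut.

```python
def prune_history(history: list[dict], max_messages: int) -> list[dict]:
    """Return at most the last max_messages turns, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = history[-max_messages:]
    # Skip any leading assistant turn so the window opens with a user message.
    start = 0
    while start < len(recent) and recent[start]["role"] != "user":
        start += 1
    return recent[start:]


history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
]

pruned = prune_history(history, max_messages=4)
print([turn["content"] for turn in pruned])  # the window after trimming
```

The original list is left untouched, so the full history can still be stored while only the pruned window is sent to Claude. The trade-off is that anything outside the window is forgotten, which is where the summarising strategy comes in.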
API key security
The API key lives in .env and is read server-side only — it never touches the browser. Never put your API key in frontend code; it would be visible to anyone who opens DevTools.
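A small startup check can make a missing key fail loudly on the server and not on the first chat request. The sketch below is one possible shape; the function name `require_api_key` is hypothetical, and the demo passes a plain dict with a placeholder value in place of the real environment.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; add it to .env")
    return key


# Demo with a stand-in environment; the value is a placeholder, not a real key.
demo_env = {"ANTHROPIC_API_KEY": "placeholder-key"}
print(require_api_key(demo_env))
```

In the app this would run once on the server, after load_dotenv(), so the key still never reaches the browser.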
Next — Chapter 6: Prompt Caching
Every API call re-encodes your system prompt and any shared context — even if it hasn't changed. Prompt caching lets Anthropic store that prefix server-side and reuse it across calls, cutting latency by up to 85% and cost by up to 90% on the cached tokens. Chapter 6 covers how caching works, when to use it, and how to add the cache breakpoints to an existing app.