A single prompt call is a question. An agent is a conversation with a purpose. The difference is that an agent knows what it is trying to accomplish and keeps working until it gets there.
What Makes Something an Agent
Goal-directed: Has a defined objective, not just a task. An agent knows what success looks like and works toward it.
Tool-using: Can take actions beyond text generation — search, read files, call APIs, execute commands. It expands its capability beyond language.
Memory-aware: Maintains context across steps, not just within one call. It remembers what it learned in step 2 when it reaches step 5.
Self-evaluating: Can assess its own output and decide if it meets the goal. If not, it loops and tries again.
Single call: One prompt, one response. You ask a question, AI answers. Done.
Chain: Fixed sequence of calls. Step 1 → Step 2 → Step 3. The path is predetermined.
Agent: Dynamic. It decides what to do next based on results. If it finds what it's looking for at step 2, it skips step 3. If it needs more information, it loops.
Every agent runs a loop: Reason (what do I know? what do I need?) → Act (call a tool or API) → Observe (what did I get back?) → Repeat until goal met or max steps reached.
This loop is the heartbeat of every agent. Once you understand it, agent behavior stops being mysterious: everything an agent does is one more pass through Reason → Act → Observe.
The twin dangers of agents are infinite loops and runaway costs. Every agent you build must have a maximum step count, a cost ceiling, and a human escalation path. These are not optional; they are architecture requirements.
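A minimal sketch of what those requirements can look like in code. The names here (Guardrails, max_cost_usd, escalate_to) are illustrative, not a library API:

from dataclasses import dataclass

@dataclass
class Guardrails:
    max_steps: int = 10                    # hard ceiling on loop iterations
    max_cost_usd: float = 0.50             # abort once estimated spend crosses this
    escalate_to: str = "a human reviewer"  # who gets pinged when a limit trips

def may_continue(g: Guardrails, step: int, cost_so_far: float) -> bool:
    # Check before every loop iteration; escalate instead of looping past a limit.
    return step < g.max_steps and cost_so_far < g.max_cost_usd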
Building a Research Agent
Input: research question
Step 1: Claude decides what to search
Step 2: Search tool returns results
Step 3: Claude evaluates results
Step 4: Claude synthesizes or loops
Output: structured research report
This is the basic pattern. Every research agent follows this skeleton.
Every tool your agent uses needs a clean interface:
- name: What is this tool called?
- description: When should Claude use it? (This is the most important field.)
- input_schema: What data does it need?
- output_schema: What does it return?
Claude uses the description to decide WHEN to use the tool. The schema tells it HOW.
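Here is one way that interface could look for a search tool, using the Anthropic Messages API tool format. Note that the API itself takes only name, description, and input_schema; there is no separate output_schema field, so document the return shape inside the description:

tools = [{
    "name": "search",
    "description": (
        "Search the web for documents relevant to a query. "
        "Use when the question needs facts not already in the conversation. "
        "Input: a short keyword query. Output: a list of result snippets."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keyword search query"},
        },
        "required": ["query"],
    },
}]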
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

def run_agent(question, max_steps=5):
    step = 0
    messages = [{"role": "user", "content": question}]
    while step < max_steps:
        response = client.messages.create(
            model="claude-3-opus-20240229",
            max_tokens=2048,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason == "end_turn":
            # No tool call requested: the final text block is the answer
            return response.content[0].text
        # Execute the tool (the response may hold a text block before the tool_use block)
        tool_call = next(b for b in response.content if b.type == "tool_use")
        result = execute_tool(tool_call.name, tool_call.input)
        # Add both sides of the exchange to the conversation
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_call.id,  # required to tie the result to the call
                "content": result,
            }],
        })
        step += 1
    return "Max steps reached"
This skeleton has the ReAct loop logic and step counting. Cost control is only implicit: max_steps bounds the number of API calls, which caps spend crudely. Use it as your starting point.
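The skeleton references tools (defined above) and execute_tool. A hypothetical stub that returns hardcoded results makes the loop runnable end to end without a real search backend:

def execute_tool(name, tool_input):
    # Hypothetical stub: swap in a real search call in production.
    if name == "search":
        return f"Top result for '{tool_input['query']}': (hardcoded snippet)"
    return f"Unknown tool: {name}"

print(run_agent("What are the four properties of an agent?"))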
The tool description is as important as the tool itself. If Claude cannot understand when to use your tool from the description alone, the agent will use it wrong, too much, or not at all. Write tool descriptions like you are writing them for a smart colleague who has never seen your codebase.
Agent Reliability and Safety
Set max_steps based on your use case:
- Research agent: 10
- Customer support bot: 5
- Anything with write access: 3
Agents that write to databases or send messages need the tightest leashes. If your agent can delete data, it should have max_steps=2.
Session memory: In-context, resets per conversation. Fast, fits in context window, but forgets between sessions.
Persistent memory: Write key facts to a file and reload them on the next session. Survives restarts, but requires file management (a minimal sketch follows below).
Semantic memory: Embed and retrieve relevant context by similarity. Handles large datasets, but adds latency and cost.
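A minimal sketch of the persistent pattern, assuming a local JSON file; the path and helper names are illustrative:

import json
import os

MEMORY_PATH = "agent_memory.json"  # hypothetical location for saved facts

def load_memory():
    # Reload facts saved by a previous session, if any exist
    if os.path.exists(MEMORY_PATH):
        with open(MEMORY_PATH) as f:
            return json.load(f)
    return {}

def save_fact(key, value):
    # Persist one key fact so it survives a restart
    memory = load_memory()
    memory[key] = value
    with open(MEMORY_PATH, "w") as f:
        json.dump(memory, f)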
High-stakes agents must pause before irreversible actions.
Pattern: Agent proposes action → log to queue → human approves → agent executes.
Never skip this for anything with side effects. If the agent can send an email, delete a file, or charge a card, it must ask for approval first.
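One way to implement that pause, assuming a simple in-memory queue and console approval (a production system would persist the queue and notify a reviewer asynchronously):

def propose_action(queue, action, payload):
    # The agent proposes; nothing executes until a human signs off.
    queue.append({"action": action, "payload": payload, "approved": False})

def review_and_execute(queue, execute):
    # A human reviews each proposal; only approved items run.
    for item in queue:
        answer = input(f"Approve {item['action']}({item['payload']})? [y/N] ")
        if answer.strip().lower() == "y":
            item["approved"] = True
            execute(item["action"], item["payload"])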
Build a 3-step research agent that: takes a question as input, calls a mock search tool (returns hardcoded results), asks Claude to evaluate if the results answer the question, and either returns the answer or runs one more search. Enforce max_steps=3. Log each step's reasoning.
- Define the tool interface: name, description, input_schema, output_schema for a search tool.
- Implement run_agent(question, max_steps=3) with the ReAct loop.
- Test with a real question. Observe the loop run. Print each step's reasoning and tool calls (see the logging sketch after this list).
- Verify max_steps is enforced: the agent stops after 3 steps, even if it's not done.
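If the logging step is unclear, one approach is a helper that prints every content block before the agent acts on it, called once per loop iteration right after client.messages.create returns. This is a sketch that assumes the same response object as the skeleton above:

def log_step(step, response):
    # Print the model's reasoning text and any tool calls for this step.
    for block in response.content:
        if block.type == "text":
            print(f"[step {step}] reasoning: {block.text}")
        elif block.type == "tool_use":
            print(f"[step {step}] tool call: {block.name}({block.input})")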
Common Mistakes
- No step ceiling: the agent loops forever on an ambiguous question and burns budget.
- Vague tool descriptions: Claude uses the tool wrong, too often, or not at all.
- Skipping human approval for side effects: one bad loop iteration sends the email or deletes the data.
Before You Move On
- Can distinguish agent from chain from single call — and explain why agent is different.
- Know the four required properties of a real agent (goal-directed, tool-using, memory-aware, self-evaluating).
- Have written a ReAct loop with tool use in Python — even if it's a skeleton or mock.
- Know three memory patterns and can name one use case for each.
What You Proved Today
- Analyze: Mapped the four properties that define an agent and how they differ from chains.
- Integrate: Built a ReAct-pattern agent with tool interfaces and step logging.
- Manage: Designed agent safety controls: step ceiling, cost cap, human escalation path.