A single prompt call is a question. An agent is a conversation with a purpose. The difference is that an agent knows what it is trying to accomplish and keeps working until it gets there.
What Makes Something an Agent
Goal-directed: Has a defined objective, not just a task. An agent knows what success looks like and works toward it.
Tool-using: Can take actions beyond text generation — search, read files, call APIs, execute commands. It expands its capability beyond language.
Memory-aware: Maintains context across steps, not just within one call. It remembers what it learned in step 2 when it reaches step 5.
Self-evaluating: Can assess its own output and decide if it meets the goal. If not, it loops and tries again.
Single call: One prompt, one response. You ask a question, AI answers. Done.
Chain: Fixed sequence of calls. Step 1 → Step 2 → Step 3. The path is predetermined.
Agent: Dynamic. It decides what to do next based on results. If it finds what it's looking for at step 2, it skips step 3. If it needs more information, it loops.
Every agent runs a loop: Reason (what do I know? what do I need?) → Act (call a tool or API) → Observe (what did I get back?) → Repeat until goal met or max steps reached.
This loop is the heartbeat of every agent. Once you understand it, agent behavior stops being mysterious: everything an agent does is one more pass through Reason → Act → Observe.
The twin dangers of agents are infinite loops and runaway costs. Every agent you build must have a maximum step count, a cost ceiling, and a human escalation path. These are not optional; they are architecture requirements.
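A minimal sketch of what those requirements can look like in code. The names here (Guardrails, max_cost_usd, escalate_to) are illustrative, not a library API:

from dataclasses import dataclass

@dataclass
class Guardrails:
    max_steps: int = 10                    # hard ceiling on loop iterations
    max_cost_usd: float = 0.50             # abort once estimated spend crosses this
    escalate_to: str = "a human reviewer"  # who gets pinged when a limit trips

def may_continue(g: Guardrails, step: int, cost_so_far: float) -> bool:
    # Check before every loop iteration; escalate instead of looping past a limit.
    return step < g.max_steps and cost_so_far < g.max_cost_usd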
Building a Research Agent
Input: research question
Step 1: Claude decides what to search
Step 2: Search tool returns results
Step 3: Claude evaluates results
Step 4: Claude synthesizes or loops
Output: structured research report
This is the basic pattern. Every research agent follows this skeleton.
Every tool your agent uses needs a clean interface:
- name: What is this tool called?
- description: When should Claude use it? (This is the most important field.)
- input_schema: What data does it need?
- output_schema: What does it return?
Claude uses the description to decide WHEN to use the tool. The schema tells it HOW.
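Here is one way that interface could look for a search tool, using the Anthropic Messages API tool format. Note that the API itself takes only name, description, and input_schema; there is no separate output_schema field, so document the return shape inside the description:

tools = [{
    "name": "search",
    "description": (
        "Search the web for documents relevant to a query. "
        "Use when the question needs facts not already in the conversation. "
        "Input: a short keyword query. Output: a list of result snippets."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keyword search query"},
        },
        "required": ["query"],
    },
}]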
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

def run_agent(question, max_steps=5):
    step = 0
    messages = [{"role": "user", "content": question}]
    while step < max_steps:
        response = client.messages.create(
            model="claude-3-opus-20240229",
            max_tokens=2048,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason == "end_turn":
            # No tool call requested: the final text block is the answer
            return response.content[0].text
        # Execute the tool (the response may hold a text block before the tool_use block)
        tool_call = next(b for b in response.content if b.type == "tool_use")
        result = execute_tool(tool_call.name, tool_call.input)
        # Add both sides of the exchange to the conversation
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_call.id,  # required to tie the result to the call
                "content": result,
            }],
        })
        step += 1
    return "Max steps reached"
This skeleton has the ReAct loop logic and step counting. Cost control is only implicit: max_steps bounds the number of API calls, which caps spend crudely. Use it as your starting point.
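The skeleton references tools (defined above) and execute_tool. A hypothetical stub that returns hardcoded results makes the loop runnable end to end without a real search backend:

def execute_tool(name, tool_input):
    # Hypothetical stub: swap in a real search call in production.
    if name == "search":
        return f"Top result for '{tool_input['query']}': (hardcoded snippet)"
    return f"Unknown tool: {name}"

print(run_agent("What are the four properties of an agent?"))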
The tool description is as important as the tool itself. If Claude cannot understand when to use your tool from the description alone, the agent will use it wrong, too much, or not at all. Write tool descriptions like you are writing them for a smart colleague who has never seen your codebase.
Agent Reliability and Safety
Set max_steps based on your use case:
- Research agent: 10
- Customer support bot: 5
- Anything with write access: 3
Agents that write to databases or send messages need the tightest leashes. If your agent can delete data, it should have max_steps=2.
Session memory: In-context, resets per conversation. Fast, fits in context window, but forgets between sessions.
Persistent memory: Write key facts to a file and reload them on the next session. Survives restarts, but requires file management (a minimal sketch follows below).
Semantic memory: Embed and retrieve relevant context by similarity. Handles large datasets, but adds latency and cost.
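A minimal sketch of the persistent pattern, assuming a local JSON file; the path and helper names are illustrative:

import json
import os

MEMORY_PATH = "agent_memory.json"  # hypothetical location for saved facts

def load_memory():
    # Reload facts saved by a previous session, if any exist
    if os.path.exists(MEMORY_PATH):
        with open(MEMORY_PATH) as f:
            return json.load(f)
    return {}

def save_fact(key, value):
    # Persist one key fact so it survives a restart
    memory = load_memory()
    memory[key] = value
    with open(MEMORY_PATH, "w") as f:
        json.dump(memory, f)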
High-stakes agents must pause before irreversible actions.
Pattern: Agent proposes action → log to queue → human approves → agent executes.
Never skip this for anything with side effects. If the agent can send an email, delete a file, or charge a card, it must ask for approval first.
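One way to implement that pause, assuming a simple in-memory queue and console approval (a production system would persist the queue and notify a reviewer asynchronously):

def propose_action(queue, action, payload):
    # The agent proposes; nothing executes until a human signs off.
    queue.append({"action": action, "payload": payload, "approved": False})

def review_and_execute(queue, execute):
    # A human reviews each proposal; only approved items run.
    for item in queue:
        answer = input(f"Approve {item['action']}({item['payload']})? [y/N] ")
        if answer.strip().lower() == "y":
            item["approved"] = True
            execute(item["action"], item["payload"])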
Build a 3-step research agent that: takes a question as input, calls a mock search tool (returns hardcoded results), asks Claude to evaluate if the results answer the question, and either returns the answer or runs one more search. Enforce max_steps=3. Log each step's reasoning.
- Define the tool interface: name, description, input_schema, output_schema for a search tool.
- Implement run_agent(question, max_steps=3) with the ReAct loop.
- Test with a real question. Observe the loop run. Print each step's reasoning and tool calls (see the logging sketch after this list).
- Verify max_steps is enforced: the agent stops after 3 steps, even if it's not done.
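If the logging step is unclear, one approach is a helper that prints every content block before the agent acts on it, called once per loop iteration right after client.messages.create returns. This is a sketch that assumes the same response object as the skeleton above:

def log_step(step, response):
    # Print the model's reasoning text and any tool calls for this step.
    for block in response.content:
        if block.type == "text":
            print(f"[step {step}] reasoning: {block.text}")
        elif block.type == "tool_use":
            print(f"[step {step}] tool call: {block.name}({block.input})")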
Common Mistakes
- No step ceiling: the agent loops forever on an ambiguous question and burns budget.
- Vague tool descriptions: Claude uses the tool wrong, too often, or not at all.
- Skipping human approval for side effects: one bad loop iteration sends the email or deletes the data.
Before You Move On
- Can distinguish agent from chain from single call — and explain why agent is different.
- Know the four required properties of a real agent (goal-directed, tool-using, memory-aware, self-evaluating).
- Have written a ReAct loop with tool use in Python — even if it's a skeleton or mock.
- Know three memory patterns and can name one use case for each.
What You Proved Today
- Analyze: Mapped the four properties that define an agent and how they differ from chains.
- Integrate: Built a ReAct-pattern agent with tool interfaces and step logging.
- Manage: Designed agent safety controls: step ceiling, cost cap, human escalation path.