🎓 AIIQM-WELL University · Architect · Module A1 of 5

System Design
— How AI Products Are Structured

90 min
📋 Prerequisites: Builder level complete
📊 Architect Level
Lesson Outcome: You can diagram an AI product's layers, identify where intelligence lives, and design the interfaces that connect them.
AIMY Opening

Before We Begin

You've been using AI as a tool. In this module, you start seeing it as a system. Every AI product has the same skeleton — input layer, intelligence layer, output layer. Once you see it, you see it everywhere.

The architects who build reliable systems are not smarter than everyone else. They just see the structure first, then code second. Code without structure is expensive debugging. Structure without code is just a drawing.

By the end of this module, you'll be able to draw any AI product and explain why it either works reliably or breaks under load.

Analyze Phase

The Anatomy of an AI Product

Every AI product, regardless of complexity, has three core layers. Understanding these layers is the foundation of system architecture.

01
The Three Layers

Input Layer: What data comes in and how it's cleaned. This is where validation happens. Is the input valid? Is it the right size? Does it match what the AI expects?

Intelligence Layer: Where the model runs and decisions are made. The API call to Claude, the local inference, the vector search — this is where the actual AI happens.

Output Layer: How results are formatted and delivered to users. JSON? Email? A database? A file? The output layer decides what the user actually sees.
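
The three layers translate directly into code boundaries. A minimal sketch, in which every name (`validate_input`, `run_model`, `format_output`) is illustrative and the model call is stubbed:

```python
def validate_input(raw: str) -> str:
    """Input layer: reject anything the intelligence layer can't handle."""
    text = raw.strip()
    if not text:
        raise ValueError("empty input")
    if len(text) > 100_000:  # size constraints live at the boundary
        raise ValueError("input too large")
    return text

def run_model(text: str) -> str:
    """Intelligence layer: the real model call would go here (stubbed)."""
    return f"summary of {len(text)} chars"

def format_output(result: str) -> dict:
    """Output layer: shape the result for whoever consumes it."""
    return {"summary": result}

def handle(raw: str) -> dict:
    # Each arrow between layers is a function boundary.
    return format_output(run_model(validate_input(raw)))
```

Notice that each handoff is a plain function call: the layer boundaries are visible in the code, not buried inside one big function.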

02
The Interface Question

The real architecture work is designing interfaces. Not just APIs — every handoff between layers is an interface. Bad interfaces make fragile systems.

An interface defines what data format flows through it, what size constraints exist, what the timeout is, and what happens when it breaks. Every interface needs a contract. If the contract is unclear, someone will break it under pressure.

Most architectural problems are not model problems. They are interface problems.
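
A contract can live in code as well as on the diagram. A sketch of one hypothetical boundary — the handoff from text extraction to the model call; the type and field names are assumptions, not any library's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractionResult:
    text: str          # format: plain UTF-8 text
    source_pages: int

MAX_TEXT_BYTES = 100_000   # size constraint for this boundary

def check_contract(result: ExtractionResult) -> ExtractionResult:
    """What happens when it breaks: fail loudly at the boundary,
    not three steps later inside the model call."""
    if not result.text.strip():
        raise ValueError("extraction produced no text")
    if len(result.text.encode("utf-8")) > MAX_TEXT_BYTES:
        raise ValueError("extracted text exceeds size limit")
    return result
```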

03
Real Product Map: Document Summarizer

Let's map a real example: a system that summarizes documents.

Input Layer: User uploads PDF. System extracts text. Validates: is it valid text? Is it under 100KB? Splits if too long.

Intelligence Layer: Calls Claude API with extraction + structured prompt. Claude returns JSON with summary and key points.

Output Layer: Formats summary as HTML. Stores in database. Sends email notification to user.

Every arrow between boxes is a contract. PDF extraction must produce valid text (not garbage). Claude must return valid JSON (or the next step fails). Email must handle delivery failures (or users lose results).

In most broken AI "products," the model worked fine. The data coming in was wrong, or the output had nowhere clean to go.
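
The "Claude must return valid JSON" arrow can be enforced right at the boundary. A minimal sketch, assuming the summary/key-points shape from the example above:

```python
import json

def parse_summary_response(raw: str) -> dict:
    """Enforce the intelligence->output contract: the next step only
    runs if the payload has the shape it expects."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return JSON: {exc}") from exc
    for key in ("summary", "key_points"):
        if key not in payload:
            raise ValueError(f"missing required key: {key}")
    return payload
```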

Integrate Phase

Drawing Your System

Architecture lives on paper first, code second. The drawing IS the design.

04
The Architecture Diagram Rule

Before you write code, draw the boxes. Every component gets a box. Every handoff gets an arrow. Every arrow needs a contract.

What is a contract? It's the answer to these questions:

  1. What format does data have when it crosses this boundary?
  2. What is the maximum size allowed?
  3. What happens if the data is invalid?
  4. What is the timeout for this step?

Document these contracts on the diagram or in a separate spec. When you code the integration, you check the contract first.
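
One lightweight way to keep those four answers next to the code is documentation-as-data: one entry per arrow on the diagram. The arrow names and values here are illustrative:

```python
# Contract spec per arrow: format, max size, invalid-data behavior,
# timeout. Purely documentation-as-data, checked by humans at review.
CONTRACTS = {
    "pdf_extract -> model_call": {
        "format": "plain UTF-8 text",
        "max_size_bytes": 100_000,
        "on_invalid": "reject upload with clear error message",
        "timeout_seconds": 10,
    },
    "model_call -> formatter": {
        "format": "JSON with 'summary' and 'key_points'",
        "max_size_bytes": 20_000,
        "on_invalid": "log raw output, return null result",
        "timeout_seconds": 60,
    },
}
```

When you code an integration, you look up its arrow here first — the contract check comes before the implementation.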

05
Component Checklist

Every AI system needs these components. If your diagram is missing any, you have a blind spot:

  1. Input validation: Does the data match what we expect?
  2. Prompt construction: How is the prompt built from user input?
  3. Model call: How do we talk to the AI?
  4. Output parsing: Does the output have the structure we expect?
  5. Error handling: What happens when any of the above fail?
  6. Logging: Can we replay what happened?
06
Hands-On: Draw Your B5 Tool

Take the AI tool you built in Builder module B5. Draw its architecture on paper or in a tool like draw.io.

  1. Draw three boxes: Input, Intelligence, Output.
  2. Inside each box, list the components that live there.
  3. Draw arrows between boxes.
  4. For each arrow, write the contract: format, size, timeout, fallback.

You're done when someone else could read your diagram and understand exactly what your system does and why.

If you cannot draw it, you cannot build it reliably. The drawing IS the design. Code is just the implementation.

Manage Phase

Designing for Failure

Reliable systems are not systems that never fail. They are systems that fail predictably and recover automatically.

07
The Fallback Principle

Every AI system must have a fallback. Ask yourself: what happens when the model is slow? When the API is down? When the input is malformed? Design the happy path AND the failure path.

Examples:

  1. API timeout → use cached result from last successful call
  2. Invalid input → reject with clear error message, don't crash
  3. Model returns unparseable output → return null and log the raw output
  4. Database is full → implement queue and retry

Every system path (happy + all failures) should be documented on your architecture diagram.

08
Rate Limits and Cost Control

The Claude API has rate limits. Every production system needs:

  1. Request queuing: Don't slam the API. Queue requests, process them one at a time or in batches.
  2. Retry logic with exponential backoff: If a call fails, wait 1 second, then 2, then 4. Don't retry immediately.
  3. Cost tracking: Track tokens used per call, per user, per day. Set alerts when you exceed budget.

Put these controls in your system before you go to production. Retrofitting them is expensive.
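
Retry with exponential backoff is a small wrapper. A sketch; `fn` is any zero-argument callable wrapping the API call:

```python
import time

def call_with_backoff(fn, max_attempts: int = 4, base_delay: float = 1.0):
    """Retry fn with exponential backoff: wait 1s, 2s, 4s between attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))
```

In production you would catch only transient errors (timeouts, 429/529 responses) rather than bare `Exception`.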

09
The Logging Imperative

Log everything: inputs, outputs, latency, token counts, errors. If you cannot replay a failure, you cannot fix it.

What to log:

  1. Timestamp and request ID (so you can trace the entire flow)
  2. Input data (sanitized — never log passwords or PII)
  3. Model response (raw, before parsing)
  4. Latency (how long did each step take?)
  5. Token counts (from Claude's usage object)
  6. Errors (full stack traces, not just messages)

Store logs in a central place (database, log aggregator). Make them queryable so you can ask "show me all failures for user X" or "what was the slowest step yesterday?"

PTR — Prove The Result

Draw a complete architecture diagram for a system that: takes a PDF → extracts text → sends to Claude → returns structured JSON → saves to a file.

  1. Include boxes for Input, Intelligence, and Output layers.
  2. Label all components inside each box.
  3. Draw arrows between boxes with labels (format, size, timeout).
  4. Design fallback paths for: PDF extraction fails, Claude API is slow, JSON parsing fails, file write fails.

You're done when you can walk someone else through every step and every failure mode. That is the proof.

Module Checkpoint

Before You Move On

✓ Verify These Four Things
  1. You can name the three layers of any AI product (Input, Intelligence, Output).
  2. You can identify interface points (handoffs) in a system and explain what a contract is.
  3. You understand why fallbacks are required, not optional, in production systems.
  4. You have drawn at least one architecture diagram with failure paths and can explain it.

AIM Commitment

What You Proved Today

You moved from seeing AI as a tool to seeing it as a system with structure, contracts, and failure modes.

  1. Analyze: You deconstructed AI products into three distinct layers — each with a clear purpose and responsibility.
  2. Integrate: You drew a system architecture with interface contracts and labeled all components.
  3. Manage: You designed fallback and error paths into every system component and understood why logging matters.