Agentic Workflows

Difficulty: Intermediate
Time Investment: 2-3 hours
Prerequisites: Understanding of prompts, RAG, and basic LM limitations


Learning Resources (Start Here)

Primary Video


Why This Matters

Traditional LM usage is single-shot: you send a prompt, get a response, done.

Agentic workflows introduce iteration and autonomy:

As a Technical Architect, you’ll encounter agentic patterns when:

Understanding these patterns helps you assess what’s possible and what guardrails are needed.


Key Concepts

The Four Agentic Design Patterns

From the Stanford lecture:

| Pattern | What It Does | Example Use Case |
| --- | --- | --- |
| Planning | Multi-step planning to achieve goals | “Build a web app” → breaks into design, code, test, deploy |
| Reflexion | Examines its own work and iterates | Code review: generate → critique → regenerate |
| Tool Use | Calls external APIs, scripts, databases | Customer support: query CRM, retrieve order, process refund |
| Multi-Agent | Specialised agents collaborate | Research: one agent searches, one summarises, one validates |

Key insight: Combining these patterns creates systems far more capable than single-shot prompting.


Pattern 1: Planning

How It Works

The LM breaks a complex goal into a sequence of sub-tasks.

Example Task: “Prepare a financial analysis report”

Planning Output:

Plan:
1. Gather quarterly revenue data from database
2. Calculate YoY growth percentages
3. Generate visualization charts
4. Write executive summary
5. Format as PDF

The agent then executes each step sequentially, using the output of one step as input to the next.
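
A minimal sketch of fixed-plan execution in Python, assuming a placeholder llm() call (not any specific vendor SDK) and a plan returned as numbered lines:

def llm(prompt: str) -> str:
    """Stand-in for a real model call; wire up your own client here."""
    raise NotImplementedError

def plan(goal: str) -> list[str]:
    # Ask for a short numbered plan, one step per line (capped to limit depth).
    response = llm(f"Break this goal into at most 5 numbered steps:\n{goal}")
    return [line.split(". ", 1)[1] for line in response.splitlines() if ". " in line]

def execute(goal: str) -> str:
    output = ""
    for step in plan(goal):
        # Each step sees the previous step's output as context.
        output = llm(f"Goal: {goal}\nCurrent step: {step}\nPrevious output:\n{output}")
    return output

The "at most 5 steps" cap in the planning prompt is the same depth constraint discussed under Common Pitfalls below.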

Common Approaches

Approach A: Fixed Plan
Generate the complete plan once up front, then execute the steps in order.

Approach B: Dynamic Re-Planning
Revise the remaining plan after each step, using intermediate results.

When to Use


Pattern 2: Reflexion (Self-Critique)

How It Works

The LM generates output, reviews it, then regenerates based on its own feedback.

Example - Code Quality:

Prompt 1: "Write a Python function to validate email addresses"
Output 1: [basic regex function]

Prompt 2: "Review the above code. What edge cases are missing?"
Output 2: "Missing: special chars (+, .), subdomains, internationalised domains"

Prompt 3: "Rewrite the function incorporating your feedback"
Output 3: [improved function with edge case handling]
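
The kind of code the third prompt might produce looks something like this. Fully RFC-compliant email validation is far harder, so treat this as illustrative output rather than a reference implementation:

import re

# Stricter than the usual first-draft pattern: allows +, dots, and subdomains.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

def is_valid_email(address: str) -> bool:
    return bool(EMAIL_RE.fullmatch(address))

print(is_valid_email("alice+dev@mail.example.co.uk"))  # True
print(is_valid_email("bob@localhost"))                 # False: no dot in domain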

The Iteration Loop

graph LR
    A[Generate] --> B[Critique]
    B --> C{Good Enough?}
    C -->|No| D[Regenerate]
    D --> B
    C -->|Yes| E[Done]

Key question: When to stop iterating?
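
A minimal sketch of the loop, again assuming a placeholder llm() call. It stops either when the critique reports no issues or after a fixed number of cycles, which is one direct answer to the stopping question:

def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

def reflexion(task: str, max_cycles: int = 3) -> str:
    draft = llm(task)
    for _ in range(max_cycles):
        critique = llm(f"Critique this answer to '{task}'. Reply DONE if it has no issues:\n{draft}")
        if "DONE" in critique:
            break  # quality threshold reached; stop iterating
        draft = llm(f"Task: {task}\nDraft:\n{draft}\nFeedback:\n{critique}\nRewrite the draft.")
    return draft

The max_cycles cap matters in practice: as noted below, improvements tend to plateau after 3-4 cycles.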

Performance Impact

From Stanford research: Reflexion improves output quality significantly on tasks like coding, writing, and reasoning.

Example Benchmark:

Diminishing returns: After 3-4 cycles, improvements plateau.

When to Use


Pattern 3: Tool Use

How It Works

The LM can call external functions/APIs to gather information or perform actions.

Example - Customer Support Agent:

User: "Can I get a refund for order #12345?"

Agent reasoning:
1. I need the refund policy → Call tool: get_policy()
2. I need order details → Call tool: get_order(12345)
3. I need product info → Call tool: get_product_details(order.product_id)
4. Generate response based on retrieved data

The Tool Calling Flow

graph TD
    A[User Query] --> B[LM Analyzes]
    B --> C{Need Tool?}
    C -->|Yes| D[Call Tool]
    D --> E[Tool Returns Data]
    E --> B
    C -->|No| F[Generate Final Response]

Key insight: The LM decides when and which tool to call. You provide the tools; the LM orchestrates them.

Tool Definition Example

{
  "name": "get_order",
  "description": "Retrieves order details by order ID",
  "parameters": {
    "order_id": {
      "type": "string",
      "description": "The unique order identifier"
    }
  }
}

The LM reads this schema and knows how to call the tool correctly.
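
A minimal dispatch loop built on that idea. The convention here (the model replies either with plain text or with a JSON object like {"tool": ..., "args": ...}) is an assumption for the sketch; real function-calling APIs return structured tool calls natively:

import json

def llm(transcript: list[str]) -> str:
    raise NotImplementedError  # stand-in for a model prompted for tool use

def get_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "delivered"}  # stubbed CRM lookup

TOOLS = {"get_order": get_order}

def run(query: str, max_steps: int = 5) -> str:
    transcript = [query]
    for _ in range(max_steps):
        reply = llm(transcript)
        try:
            call = json.loads(reply)
        except ValueError:
            return reply  # plain text means a final answer
        result = TOOLS[call["tool"]](**call["args"])  # execute the requested tool
        transcript.append(f"Tool result: {json.dumps(result)}")
    return "Stopped: exceeded tool-call budget"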

Common Tools

| Tool Type | Example | Use Case |
| --- | --- | --- |
| Search | Web search, vector DB | Research, fact-checking |
| Code Execution | Python interpreter | Math, data analysis |
| APIs | CRM, payment gateway | Business workflows |
| File System | Read/write files | Document processing |

When to Use


Pattern 4: Multi-Agent Collaboration

How It Works

Instead of one general-purpose agent, create specialised agents that collaborate.

Example - Research Task:

Goal: "Research AI safety regulations in the EU"

Agent 1 (Searcher):
- Role: Find relevant articles and papers
- Tools: Web search, academic databases

Agent 2 (Summariser):
- Role: Summarise findings
- Tools: Text processing

Agent 3 (Validator):
- Role: Fact-check claims
- Tools: Cross-reference multiple sources

Agent 4 (Writer):
- Role: Compile final report
- Tools: Document formatter

Collaboration Patterns

Pattern A: Sequential (Pipeline)

Searcher → Summariser → Validator → Writer

Each agent passes output to the next.
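
As code, the pipeline is just function composition. The role prompts are illustrative and llm() is a placeholder:

def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

def make_agent(role: str):
    # Each agent is a function from text to text with a fixed role prompt.
    return lambda text: llm(f"You are the {role}. Process this input:\n{text}")

PIPELINE = [make_agent(r) for r in ("Searcher", "Summariser", "Validator", "Writer")]

def research(goal: str) -> str:
    output = goal
    for agent in PIPELINE:
        output = agent(output)  # each agent consumes the previous agent's output
    return output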

Pattern B: Debate

Agent A proposes solution
Agent B critiques
Agent A revises
Agent C makes final decision

Agents challenge each other to improve output quality.

Pattern C: Voting

3 agents generate solutions independently
Vote on best solution
Execute winning approach

Reduces error rate (majority vote filters outliers).
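
A minimal voting sketch, assuming a placeholder llm() call; real systems usually normalise answers (strip whitespace, extract the final verdict) before comparing them:

from collections import Counter

def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

def vote(task: str, n: int = 3) -> str:
    answers = [llm(task) for _ in range(n)]  # independent attempts
    winner, _count = Counter(answers).most_common(1)[0]
    return winner  # the majority answer filters outlier mistakes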

When to Use


Combining Patterns (Real-World Example)

Scenario: “Build a dashboard for sales analytics”

Agent Workflow:

1. Planning Agent:
   - Break task into: design schema, write SQL, build frontend, test

2. Tool Use (SQL Agent):
   - Call tool: execute_sql()
   - Retrieve sales data

3. Reflexion (Code Agent):
   - Generate React component
   - Review for accessibility issues
   - Regenerate with fixes

4. Multi-Agent (Testing):
   - Unit test agent
   - Integration test agent
   - Security test agent

5. Final output: Deployed dashboard

Key insight: Real agentic systems combine all four patterns.


Scaling Considerations (AI-Generated)

As agentic workflows move from prototypes to production, scaling challenges emerge. Here’s how complexity and requirements change with team size.

Team Size: 1-10 Developers

Approach: Single agent, 3-5 tools, fixed planning

What works:

Scaling limits:

When to evolve: When tool count exceeds 10, or multiple sub-teams form


Team Size: 10-50 Developers

Approach: Multi-agent collaboration, specialised tool libraries, light governance

What changes:

New challenges:

Solutions:

When to evolve: When agent coordination failures become frequent, or cost/token usage spikes


Team Size: 50-200 Developers

Approach: Agent platform, formal governance, centralised monitoring

What this looks like:

Critical infrastructure:

Governance patterns:

Common failure modes:

Solutions:

When to evolve: When coordinating 200+ developers, or multiple business units involved


Team Size: 200+ Developers (Enterprise Scale)

Approach: Federated agent platforms, multi-tenant infrastructure, advanced governance

What changes:

Enterprise considerations:

Platform maturity requirements:

This is rare: Most organisations don’t reach this scale. If you’re here, you’re treating agents like critical infrastructure (similar to CI/CD, observability platforms).


Scaling Anti-Patterns (AI-Generated)

Anti-pattern 1: Premature Platform

Anti-pattern 2: No Tool Governance

Anti-pattern 3: Single Agent for Everything

Anti-pattern 4: No Cost Tracking


Try It Yourself (AI Generated)

Experiment 1: Test Reflexion

Setup: Use Claude Code or Cursor

Task:

Prompt 1: "Write a function to calculate Fibonacci numbers"
Prompt 2: "Review your code for performance issues"
Prompt 3: "Rewrite using memoization"

Observe:


Experiment 2: Tool Use with APIs

Setup: Use an LM with function calling (Claude, GPT-4, Gemini)

Task: Build a simple weather bot

Tools:
- get_weather(city) → returns temp, conditions
- get_forecast(city, days) → returns multi-day forecast

Test query: "What's the weather in London, and should I bring an umbrella tomorrow?"
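
One way to stub the two tools in Python before wiring in a real weather API; the returned values are made up so you can test the agent's reasoning deterministically:

def get_weather(city: str) -> dict:
    # Hypothetical stub: replace with a real weather API call.
    return {"city": city, "temp_c": 14, "conditions": "light rain"}

def get_forecast(city: str, days: int) -> list[dict]:
    # Hypothetical stub: rain_chance > 0.5 should trigger umbrella advice.
    return [{"day": d, "rain_chance": 0.6} for d in range(1, days + 1)]

Register both with your model's function-calling interface and check whether it chains get_weather and get_forecast to answer the two-part question.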

Observe:


Experiment 3: Multi-Agent Debate

Setup: Use 2-3 LM instances

Task: “Should we migrate from monolith to microservices?”

Agent A (Pro-Microservices):
- Generate argument for microservices

Agent B (Pro-Monolith):
- Generate counterargument

Agent C (Architect):
- Evaluate both arguments
- Make final recommendation with trade-offs
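
A minimal orchestration of the debate, with llm() as a placeholder for per-agent calls (in practice each agent would get its own system prompt):

def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

def debate(question: str) -> str:
    pro = llm(f"Argue FOR microservices: {question}")
    con = llm(f"Argue AGAINST microservices, rebutting this:\n{pro}")
    pro = llm(f"Revise your case for microservices given this rebuttal:\n{con}")
    return llm(f"As the architect, weigh both sides and recommend with trade-offs:\nFOR:\n{pro}\nAGAINST:\n{con}")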

Observe: Does the debate surface nuanced trade-offs that a single agent would miss?


Common Pitfalls

Pitfall 1: Over-Planning

Problem: Agent generates a 20-step plan that takes forever.
Solution: Constrain plan depth (e.g., “max 5 steps”).

Pitfall 2: Infinite Reflexion Loops

Problem: Agent keeps finding issues, never finishes.
Solution: Set max iterations (e.g., 3 cycles) or a quality threshold.

Pitfall 3: Tool Calling Errors

Problem: LM calls tools with wrong parameters.
Solution: Validate tool calls before execution; retry on error.
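
A minimal guard in that spirit: validate the requested call against the tool's declared parameters before executing, and hand any errors back to the model for a retry. The schema shape mirrors the get_order example above:

SCHEMAS = {"get_order": {"order_id": str}}  # declared parameters per tool

def validate_call(tool: str, args: dict) -> list[str]:
    errors = []
    spec = SCHEMAS.get(tool)
    if spec is None:
        return [f"unknown tool: {tool}"]
    for name, expected_type in spec.items():
        if name not in args:
            errors.append(f"missing parameter: {name}")
        elif not isinstance(args[name], expected_type):
            errors.append(f"{name} should be a {expected_type.__name__}")
    # An empty list means the call is safe to execute; otherwise
    # return the errors to the model and ask it to try again.
    return errors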

Pitfall 4: Multi-Agent Coordination Overhead

Problem: Agents spend more time coordinating than working.
Solution: Only use multi-agent when specialisation clearly improves outcomes.


Advanced Considerations

Cost Management

Agentic workflows can be expensive (many LM calls, long contexts).

Optimisation strategies:

Latency

Agentic workflows are slower than single-shot prompts.

When latency matters:

Safety & Control

Agents can take unexpected actions.

Guardrails:

See AI Safety & Control for more.



Key Takeaway

Agentic workflows unlock complex automation. But they require:

  1. Careful tool design (what can the agent do?)
  2. Safety guardrails (what should it NOT do?)
  3. Cost/latency budgets (how much iteration is acceptable?)

When evaluating agentic tools (Cursor, Claude Code, etc.), ask:

Start simple (single agent, few tools, fixed plan) and add complexity only when you see clear benefits.