Difficulty: Intermediate
Time Investment: 2-3 hours
Prerequisites: Understanding of prompts, RAG, and basic LM limitations
Traditional LM usage is single-shot: You send a prompt, get a response, done.
Agentic workflows introduce iteration and autonomy: the model plans multi-step work, critiques and revises its own output, calls external tools, and coordinates with other agents.
As a Technical Architect, you'll increasingly encounter these agentic patterns in the tools and systems you evaluate (Cursor, Claude Code, and similar). Understanding them helps you assess what's possible and what guardrails are needed.
From the Stanford lecture:
| Pattern | What It Does | Example Use Case |
|---|---|---|
| Planning | Multi-step planning to achieve goals | “Build a web app” → breaks into design, code, test, deploy |
| Reflexion | Examines own work and iterates | Code review: generate → critique → regenerate |
| Tool Use | Calls external APIs, scripts, databases | Customer support: query CRM, retrieve order, process refund |
| Multi-Agent | Specialised agents collaborate | Research: one agent searches, one summarises, one validates |
Key insight: Combining these patterns creates systems far more capable than single-shot prompting.
The LM breaks a complex goal into a sequence of sub-tasks.
Example Task: “Prepare a financial analysis report”
Planning Output:
Plan:
1. Gather quarterly revenue data from database
2. Calculate YoY growth percentages
3. Generate visualization charts
4. Write executive summary
5. Format as PDF
The agent then executes each step sequentially, using the output of one step as input to the next.
Approach A: Fixed Plan. The agent generates the full plan up front and executes it step by step without revisiting it.
Approach B: Dynamic Re-Planning. The agent re-evaluates the plan after each step and adjusts the remaining steps based on intermediate results.
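A minimal sketch of the difference, assuming a hypothetical `llm()` helper that returns text and a `run_step()` executor; both are illustrative placeholders, not a real API:

```python
def llm(prompt: str) -> str:
    """Illustrative placeholder for a call to your LM provider."""
    raise NotImplementedError

def run_step(step: str, context: str) -> str:
    """Illustrative placeholder that executes one plan step and returns its result."""
    raise NotImplementedError

def fixed_plan(goal: str) -> str:
    # Approach A: plan once up front, then execute every step as originally written.
    steps = llm(f"Break this goal into at most 5 numbered steps:\n{goal}").splitlines()
    context = ""
    for step in steps:
        context += run_step(step, context)
    return context

def dynamic_replan(goal: str, max_steps: int = 5) -> str:
    # Approach B: after each step, ask the LM what to do next given the results so far.
    context = ""
    for _ in range(max_steps):
        next_step = llm(
            f"Goal: {goal}\nProgress so far:\n{context}\n"
            "What is the single next step? Reply DONE if the goal is met."
        )
        if next_step.strip() == "DONE":
            break
        context += run_step(next_step, context)
    return context
```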
The LM generates output, reviews it, then regenerates based on its own feedback.
Example - Code Quality:
Prompt 1: "Write a Python function to validate email addresses"
Output 1: [basic regex function]
Prompt 2: "Review the above code. What edge cases are missing?"
Output 2: "Missing: special chars (+, .), subdomains, internationalised domains"
Prompt 3: "Rewrite the function incorporating your feedback"
Output 3: [improved function with edge case handling]
```mermaid
graph LR
    A[Generate] --> B[Critique]
    B --> C{Good Enough?}
    C -->|No| D[Regenerate]
    D --> B
    C -->|Yes| E[Done]
```
Key question: When to stop iterating?
From Stanford research: Reflexion improves output quality significantly on tasks like coding, writing, and reasoning.
Example benchmarks show diminishing returns: after 3-4 reflexion cycles, improvements plateau.
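A sketch of the generate → critique → regenerate loop with both stopping criteria (an iteration cap and a quality check). It reuses the illustrative `llm()` placeholder from the planning sketch; the "LGTM" convention is just an assumption for the example:

```python
def reflexion(task: str, max_iterations: int = 3) -> str:
    draft = llm(f"Complete this task:\n{task}")
    for _ in range(max_iterations):  # hard cap: gains plateau after 3-4 cycles
        critique = llm(
            f"Task: {task}\nDraft:\n{draft}\n"
            "List concrete problems, or reply LGTM if it is good enough."
        )
        if "LGTM" in critique:  # quality threshold reached
            break
        draft = llm(
            f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n"
            "Rewrite the draft, addressing every point in the critique."
        )
    return draft
```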
The LM can call external functions/APIs to gather information or perform actions.
Example - Customer Support Agent:
User: "Can I get a refund for order #12345?"
Agent reasoning:
1. I need the refund policy → Call tool: get_policy()
2. I need order details → Call tool: get_order(12345)
3. I need product info → Call tool: get_product_details(order.product_id)
4. Generate response based on retrieved data
```mermaid
graph TD
    A[User Query] --> B[LM Analyzes]
    B --> C{Need Tool?}
    C -->|Yes| D[Call Tool]
    D --> E[Tool Returns Data]
    E --> B
    C -->|No| F[Generate Final Response]
```
Key insight: The LM decides when and which tool to call. You provide the tools; the LM orchestrates them.
```json
{
  "name": "get_order",
  "description": "Retrieves order details by order ID",
  "parameters": {
    "order_id": {
      "type": "string",
      "description": "The unique order identifier"
    }
  }
}
```
The LM reads this schema and knows how to call the tool correctly.
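A minimal sketch of that orchestration loop. The `llm_with_tools()` call, its response shape, and the example tools are assumptions for illustration, not any specific vendor's API:

```python
import json

# Tool registry: you provide the implementations, the LM decides when to call them.
TOOLS = {
    "get_policy": lambda: {"refund_window_days": 30},
    "get_order": lambda order_id: {"order_id": order_id, "product_id": "P-1"},
}

def llm_with_tools(messages: list, tool_schemas: list) -> dict:
    """Illustrative placeholder for a function-calling request.
    Assumed to return either {"tool": name, "args": {...}} or {"answer": text}."""
    raise NotImplementedError

def run_agent(user_query: str, tool_schemas: list, max_calls: int = 5) -> str:
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_calls):
        decision = llm_with_tools(messages, tool_schemas)
        if "answer" in decision:
            return decision["answer"]              # no tool needed: final response
        name, args = decision["tool"], decision["args"]
        result = TOOLS[name](**args)               # execute the tool the LM chose
        messages.append({"role": "tool", "name": name,
                         "content": json.dumps(result)})  # feed data back to the LM
    return "Stopped: tool-call budget exhausted"
```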
| Tool Type | Example | Use Case |
|---|---|---|
| Search | Web search, vector DB | Research, fact-checking |
| Code Execution | Python interpreter | Math, data analysis |
| APIs | CRM, payment gateway | Business workflows |
| File System | Read/write files | Document processing |
Instead of one general-purpose agent, create specialised agents that collaborate.
Example - Research Task:
Goal: "Research AI safety regulations in the EU"
Agent 1 (Searcher):
- Role: Find relevant articles and papers
- Tools: Web search, academic databases
Agent 2 (Summariser):
- Role: Summarise findings
- Tools: Text processing
Agent 3 (Validator):
- Role: Fact-check claims
- Tools: Cross-reference multiple sources
Agent 4 (Writer):
- Role: Compile final report
- Tools: Document formatter
Pattern A: Sequential (Pipeline)
Searcher → Summariser → Validator → Writer
Each agent passes output to the next.
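As a sketch, the pipeline is little more than function composition over agent calls; each agent here is just the illustrative `llm()` placeholder with a fixed role prompt, and the names are assumptions:

```python
def make_agent(role: str):
    # Each agent is an LM call with a fixed role prompt (llm() as defined earlier).
    return lambda text: llm(f"You are the {role} agent. Input:\n{text}")

searcher, summariser, validator, writer = (
    make_agent(r) for r in ("Searcher", "Summariser", "Validator", "Writer")
)

def research_pipeline(goal: str) -> str:
    # Sequential pattern: each agent's output becomes the next agent's input.
    return writer(validator(summariser(searcher(goal))))
```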
Pattern B: Debate
Agent A proposes solution
Agent B critiques
Agent A revises
Agent C makes final decision
Agents challenge each other to improve output quality.
Pattern C: Voting
3 agents generate solutions independently
Vote on best solution
Execute winning approach
Reduces error rate (majority vote filters outliers).
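A sketch of the voting pattern, again using the illustrative `llm()` placeholder. Exact-match voting like this suits short, structured answers; for free-form output you would typically have a judge agent pick the best response instead:

```python
from collections import Counter

def majority_vote(question: str, n_agents: int = 3) -> str:
    # Each agent answers independently; the most common answer wins.
    answers = [llm(f"Answer concisely:\n{question}") for _ in range(n_agents)]
    winner, _count = Counter(a.strip() for a in answers).most_common(1)[0]
    return winner  # the majority vote filters out outlier answers
```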
Scenario: “Build a dashboard for sales analytics”
Agent Workflow:
1. Planning Agent:
- Break task into: design schema, write SQL, build frontend, test
2. Tool Use (SQL Agent):
- Call tool: execute_sql()
- Retrieve sales data
3. Reflexion (Code Agent):
- Generate React component
- Review for accessibility issues
- Regenerate with fixes
4. Multi-Agent (Testing):
- Unit test agent
- Integration test agent
- Security test agent
5. Final output: Deployed dashboard
Key insight: Real agentic systems combine all four patterns.
As agentic workflows move from prototypes to production, scaling challenges emerge. Here’s how complexity and requirements change with team size.
Approach: Single agent, 3-5 tools, fixed planning
What works:
- `.cursorrules` or project docs

Scaling limits:
When to evolve: When tool count exceeds 10, or multiple sub-teams form
Approach: Multi-agent collaboration, specialised tool libraries, light governance
What changes:
New challenges:
Solutions:
When to evolve: When agent coordination failures become frequent, or cost/token usage spikes
Approach: Agent platform, formal governance, centralised monitoring
What this looks like:
Critical infrastructure:
Governance patterns:
Common failure modes:
Solutions:
When to evolve: When coordinating 200+ developers, or multiple business units involved
Approach: Federated agent platforms, multi-tenant infrastructure, advanced governance
What changes:
Enterprise considerations:
Platform maturity requirements:
This is rare: Most organisations don’t reach this scale. If you’re here, you’re treating agents like critical infrastructure (similar to CI/CD, observability platforms).
Anti-pattern 1: Premature Platform. Building centralised agent infrastructure before a single team has proven the value.
Anti-pattern 2: No Tool Governance. Anyone can add tools, so duplicate and risky integrations accumulate unchecked.
Anti-pattern 3: Single Agent for Everything. One general-purpose agent where specialised agents (or no agent at all) would perform better.
Anti-pattern 4: No Cost Tracking. Token spend grows unnoticed because nobody measures cost per workflow.
Setup: Use Claude Code or Cursor
Task:
Prompt 1: "Write a function to calculate Fibonacci numbers"
Prompt 2: "Review your code for performance issues"
Prompt 3: "Rewrite using memoization"
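For reference, Output 3 might resemble the memoized version below; your tool's exact code will differ:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n: int) -> int:
    # Memoization caches each result, so runtime drops from O(2^n) to O(n).
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```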
Observe: How does the model critique its own output, and does each iteration actually improve the code?
Setup: Use an LM with function calling (Claude, GPT-4, Gemini)
Task: Build a simple weather bot
Tools:
- get_weather(city) → returns temp, conditions
- get_forecast(city, days) → returns multi-day forecast
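A sketch of how these two tools might be described to the model, reusing the schema style from the tool-use section; the field names are illustrative, so match your provider's function-calling format:

```python
# Illustrative tool descriptions; adapt the exact wrapping to your provider's API.
weather_tools = [
    {
        "name": "get_weather",
        "description": "Current temperature and conditions for a city",
        "parameters": {"city": {"type": "string", "description": "City name"}},
    },
    {
        "name": "get_forecast",
        "description": "Multi-day forecast for a city",
        "parameters": {
            "city": {"type": "string", "description": "City name"},
            "days": {"type": "integer", "description": "Number of days ahead"},
        },
    },
]
```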
Test query: "What's the weather in London, and should I bring an umbrella tomorrow?"
Observe: Which tools the model calls, in what order, and how it combines their results into one answer.
Setup: Use 2-3 LM instances
Task: “Should we migrate from monolith to microservices?”
Agent A (Pro-Microservices):
- Generate argument for microservices
Agent B (Pro-Monolith):
- Generate counterargument
Agent C (Architect):
- Evaluate both arguments
- Make final recommendation with trade-offs
Observe: Does the debate surface nuanced trade-offs that a single agent would miss?
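A sketch of the debate orchestration, using the same illustrative `llm()` placeholder as the earlier sketches:

```python
def debate(question: str) -> str:
    pro = llm(f"Argue FOR microservices. Question: {question}")
    con = llm(f"Argue FOR keeping the monolith, rebutting this case:\n{pro}")
    revised = llm(f"Revise the microservices case given this rebuttal:\n{con}")
    # A third "architect" agent weighs both sides and states the trade-offs.
    return llm(
        f"Question: {question}\nCase A (microservices):\n{revised}\n"
        f"Case B (monolith):\n{con}\n"
        "Recommend one approach and list the key trade-offs."
    )
```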
Problem: Agent generates a 20-step plan that takes forever
Solution: Constrain plan depth (e.g., "max 5 steps")

Problem: Agent keeps finding issues, never finishes
Solution: Set max iterations (e.g., 3 cycles) or a quality threshold

Problem: LM calls tools with wrong parameters
Solution: Validate tool calls before execution; retry on error

Problem: Agents spend more time coordinating than working
Solution: Only use multi-agent when specialisation clearly improves outcomes
Agentic workflows can be expensive (many LM calls, long contexts).
Optimization strategies:
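One concrete starting point is making spend visible per run. A minimal sketch (the per-token rates are placeholders; substitute your provider's pricing):

```python
class CostTracker:
    """Accumulates token usage so every agent run has a visible price tag."""

    def __init__(self, usd_per_1k_input: float, usd_per_1k_output: float):
        self.rates = (usd_per_1k_input, usd_per_1k_output)
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens / 1000 * self.rates[0]
                + self.output_tokens / 1000 * self.rates[1])
```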
Agentic workflows are slower than single-shot prompts.
When latency matters:
Agents can take unexpected actions.
Guardrails:
See AI Safety & Control for more.
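As a sketch, the guardrails mentioned elsewhere in this section (tool validation, iteration caps, cost limits) can be enforced in a thin wrapper; the names and thresholds are illustrative:

```python
ALLOWED_TOOLS = {"get_order", "get_policy"}  # explicit allowlist, not "anything goes"
MAX_COST_USD = 1.00                          # stop the run if spend exceeds this

def guarded_tool_call(name: str, args: dict, cost_so_far: float):
    # Reuses the TOOLS registry from the tool-use sketch above.
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool {name!r} is not on the allowlist")
    if cost_so_far > MAX_COST_USD:
        raise RuntimeError("Cost budget exceeded; stopping the agent")
    return TOOLS[name](**args)
```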
Agentic workflows unlock complex automation. But they require cost management, latency trade-offs, and safety guardrails.
When evaluating agentic tools (Cursor, Claude Code, etc.), ask which of these patterns they use, which tools they can call, and what limits they place on iterations, cost, and unexpected actions.
Start simple (single agent, few tools, fixed plan) and add complexity only when you see clear benefits.