
Agentic AI Evolution

Difficulty: Introductory
Time Investment: 2-3 hours
Prerequisites: Basic understanding of LLMs


Learning Resources (Start Here)

Primary Video


Why This Matters

You’ll learn what becomes possible when you combine reasoning, retrieval, and action, and why LLMs are evolving beyond simple Q&A. The progression from static prompts → retrieval systems → autonomous agents fundamentally changes how you design the systems around them.


Key Concepts

The Evolution Timeline

  1. Basic Prompts (2022-early 2023)
    • Static instructions, no external data
    • Limited by training data cutoff
    • “Hallucinations” when asked about proprietary or recent info
  2. RAG - Retrieval Augmented Generation (2023)
    • LLMs gain the ability to pull in external context
    • Addresses knowledge limitations
    • Still lacks reasoning about multi-step tasks
  3. Agentic AI (2024+)
    • Combines reasoning (ReAct) + retrieval (RAG) + tool use
    • Can break down complex tasks autonomously
    • Iterates and self-corrects based on feedback

The Two Core Patterns

| Pattern | What It Does | Limitation When Used Alone |
| --- | --- | --- |
| ReAct (Reason + Act) | Uses Chain of Thought to break tasks into steps and prompt itself | Lacks external information gathering |
| RAG (Retrieval Augmented Generation) | Retrieves relevant context from external sources before generating | Lacks complex reasoning |

Combining them = Agentic AI: The ability to reason AND gather information creates autonomous task execution.


How It Works

ReAct Pattern

Task: "Can I get a refund for order #12345?"

Step 1 (Reason): I need to check the refund policy
Step 2 (Act): Call tool → retrieve_policy()
Step 3 (Reason): Policy says 30 days. I need order date.
Step 4 (Act): Call tool → get_order_info(12345)
Step 5 (Reason): Order is 10 days old. Within policy. Need to verify product eligibility.
Step 6 (Act): Call tool → get_product_info()
Step 7 (Generate): "Yes, you're eligible. Here's how to proceed..."

The LLM prompts itself through a reasoning chain, taking actions when needed, as in the sketch below.
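A minimal sketch of that loop in Python follows. The tool names mirror the trace above, and llm_reason() is a scripted stand-in for a real model call; both are illustrative assumptions, not any framework's actual API.

def retrieve_policy() -> str:
    return "Refunds allowed within 30 days of purchase."

def get_order_info(order_id: str) -> str:
    return f"Order {order_id} placed 10 days ago."

TOOLS = {
    "retrieve_policy": retrieve_policy,
    "get_order_info": lambda: get_order_info("12345"),
}

def llm_reason(history: list[str]) -> tuple[str, str]:
    # A real agent would send the history to the LLM and parse its chosen
    # action; here the trace is scripted so the loop structure is visible.
    script = [
        ("act", "retrieve_policy"),
        ("act", "get_order_info"),
        ("answer", "Yes, you're eligible. Here's how to proceed..."),
    ]
    step = sum(1 for h in history if h.startswith("Observation"))
    return script[step]

def run_agent(task: str) -> str:
    history = [f"Task: {task}"]
    while True:
        kind, value = llm_reason(history)
        if kind == "answer":          # Generate: the model is done acting
            return value
        observation = TOOLS[value]()  # Act: call the chosen tool
        history.append(f"Observation: {observation}")  # Reason over result next

print(run_agent("Can I get a refund for order #12345?"))

The design point is the loop itself: the model's output decides whether to act again or answer, and each observation is appended to the history it reasons over next.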

RAG Architecture (Simplified)

graph LR
    A[User Query] --> B[Vector Embedding]
    B --> C[Similarity Search]
    C --> D[Vector Database]
    D --> E[Top K Relevant Chunks]
    E --> F[Augmented Prompt]
    F --> G[LLM]
    G --> H[Response]

    I[Documents] --> J[Chunking]
    J --> K[Vector Embedding]
    K --> D

Key insight: Docs are pre-chunked and embedded. At query time, only the most relevant chunks are added to the prompt.
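Here is a toy end-to-end version of that flow. The bag-of-words embed() is a stand-in for a real embedding model, chosen only so the sketch runs; the pipeline shape (pre-embed chunks, embed the query, take the top-k by similarity, augment the prompt) is what matters.

import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word counts. Swap in a real embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Ingestion (ahead of time): chunk documents, embed each chunk once.
chunks = [
    "Refunds are allowed within 30 days of purchase.",
    "Shipping takes 3-5 business days.",
    "Gift cards are non-refundable.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Query time: embed the query, keep only the top-k relevant chunks.
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

query = "Can I get a refund?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}"  # augmented prompt for the LLM
print(prompt)

In production the list becomes a vector database and embed() becomes a real model, but the architecture is unchanged.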


Common Approaches

Approach 1: Pure Prompting

Approach 2: RAG Only

Approach 3: Agentic Workflows (ReAct + RAG + Tools)


Agentic Design Patterns

From the Stanford lecture, four key patterns emerge:

  1. Planning: Multi-step planning to achieve goals
  2. Reflexion: Examines its own work and improves iteratively
  3. Tool Use: Calls external APIs, scripts, or databases
  4. Multi-Agent Collaboration: Different agents with specialised roles coordinate

Example: Reflexion Pattern

Prompt 1: "Here is the <code>. Check the code and provide feedback."
Prompt 2: "Here is the <code> and <feedback>. Use the feedback to improve."

Repeating this cycle improves output quality significantly.
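As a sketch, the cycle is just a loop over those two prompts. llm() below is a placeholder for whatever model call you use (an assumption for runnability):

def llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"<response to: {prompt[:40]}...>"

def reflexion(task: str, iterations: int = 2) -> str:
    draft = llm(task)
    for _ in range(iterations):
        feedback = llm(
            f"Here is the code:\n{draft}\nCheck the code and provide feedback."
        )
        draft = llm(
            f"Here is the code:\n{draft}\nand feedback:\n{feedback}\n"
            "Use the feedback to improve."
        )
    return draft

print(reflexion("Write a Python function to validate email addresses"))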


Try It Yourself

Experiment 1: Compare Prompt vs. Agentic Approach

Task: Ask an LLM to “Summarise the latest news on [topic]”

What to observe: How often does the pure prompt approach fail? What’s the latency difference with RAG?

Note: You may have noticed this yourself if you’ve been using products like ChatGPT since launch: web search first arrived in ChatGPT in October 2024 for paid users, and rolled out to all users in February 2025.

Experiment 2: Test Reflexion

Task: Ask Claude Code to write a function, then ask it to review and improve its own code.

Prompt 1: "Write a Python function to validate email addresses"
Prompt 2: "Review the above code. What edge cases are missing?"
Prompt 3: "Rewrite the function incorporating your feedback"

What to observe: Does the second iteration catch issues (e.g., missing regex for special chars)? How many iterations reach “good enough”?
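For reference, a reviewed second iteration often lands on something like this; it is a hypothetical endpoint of the cycle, not a canonical validator:

import re

# Simplified pattern; full RFC 5322 validation is far more involved.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Reject empty strings, missing domains, spaces, and double dots."""
    if not address or ".." in address:
        return False
    return EMAIL_RE.fullmatch(address) is not None

assert is_valid_email("user.name+tag@example.co.uk")
assert not is_valid_email("no-at-sign.example.com")
assert not is_valid_email("double..dot@example.com")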


Common Limitations & How Agentic AI Addresses Them

| LLM Limitation | Agentic Solution |
| --- | --- |
| Outdated knowledge | RAG: Retrieve current data from vector stores or APIs |
| No domain expertise | RAG: Provide proprietary docs (contracts, policies, code) |
| Single-shot answers | ReAct: Break into steps, iterate |
| Can’t verify accuracy | Reflexion: Self-critique and regenerate |
| No external actions | Tool Use: Call APIs, run scripts, update databases |
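Tool Use is the row that turns text generation into real side effects. A minimal dispatch sketch, assuming the model emits tool calls as JSON (the tool names here are hypothetical):

import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

def run_sql(query: str) -> str:
    return f"3 rows for: {query}"  # stand-in for a real database call

TOOLS = {"get_weather": get_weather, "run_sql": run_sql}

def dispatch(tool_call_json: str) -> str:
    # Execute a model-emitted tool call and return the observation
    # to feed back into the next prompt.
    call = json.loads(tool_call_json)
    return str(TOOLS[call["name"]](**call["arguments"]))

print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))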


Further Reading


Key Takeaway

LLMs are moving from “assistants” to “agents,” and that shift has architectural implications.

Start thinking about AI not as a feature, but as a component in a distributed system that reasons, retrieves, and acts.