Difficulty: Introductory
Time Investment: 2-3 hours
Prerequisites: Basic understanding of LLMs
You’ll learn what’s possible when you combine reasoning, retrieval, and action, and come away understanding how LLMs are evolving beyond simple Q&A. The progression from static prompts → retrieval systems → autonomous agents fundamentally changes what these systems can do:
| Pattern | What It Does | Limitation When Used Alone |
|---|---|---|
| ReAct (Reason + Act) | Uses Chain of Thought to break tasks into steps and prompt itself | Lacks external information gathering |
| RAG (Retrieval Augmented Generation) | Retrieves relevant context from external sources before generating | Lacks complex reasoning |
Combining them = Agentic AI: The ability to reason AND gather information creates autonomous task execution.
Task: "Can I get a refund for order #12345?"
Step 1 (Reason): I need to check the refund policy
Step 2 (Act): Call tool → retrieve_policy()
Step 3 (Reason): Policy says 30 days. I need order date.
Step 4 (Act): Call tool → get_order_info(12345)
Step 5 (Reason): Order is 10 days old. Within policy. Need to verify product eligibility.
Step 6 (Act): Call tool → get_product_info()
Step 7 (Generate): "Yes, you're eligible. Here's how to proceed..."
The LLM prompts itself through a reasoning chain, taking actions when needed.
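A minimal sketch of that loop in Python. The `llm()` call and the tool functions are hypothetical placeholders, not a real provider API; the point is the structure: reason, act, observe, repeat.

```python
# Minimal ReAct-style loop (sketch). `llm` and the tools below are placeholders
# standing in for a real model client and real backend calls.

def llm(prompt: str) -> str:
    """Placeholder: replace with a call to your LLM of choice."""
    raise NotImplementedError

TOOLS = {
    "retrieve_policy": lambda arg: "Refunds allowed within 30 days of purchase.",
    "get_order_info": lambda arg: {"order_id": arg, "age_days": 10},
    "get_product_info": lambda arg: {"refund_eligible": True},
}

def react(task: str, max_steps: int = 8) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Ask the model to either call a tool ("ACT: tool_name arg")
        # or produce a final answer ("ANSWER: ...").
        step = llm(transcript + "\nThink step by step. Reply with ACT or ANSWER.")
        transcript += step + "\n"
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        if step.startswith("ACT:"):
            name, _, arg = step.removeprefix("ACT:").strip().partition(" ")
            result = TOOLS[name](arg) if name in TOOLS else "unknown tool"
            transcript += f"OBSERVATION: {result}\n"  # feed the result back in
    return "Gave up after max_steps."
```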
```mermaid
graph LR
    A[User Query] --> B[Vector Embedding]
    B --> C[Similarity Search]
    C --> D[Vector Database]
    D --> E[Top K Relevant Chunks]
    E --> F[Augmented Prompt]
    F --> G[LLM]
    G --> H[Response]
    I[Documents] --> J[Chunking]
    J --> K[Vector Embedding]
    K --> D
```
Key insight: Docs are pre-chunked and embedded. At query time, only the most relevant chunks are added to the prompt.
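A minimal sketch of both paths, ingestion and query time. The `embed()` function here is a toy character-frequency stand-in so the example runs end to end; swap in a real embedding model in practice.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: character-frequency vector."""
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1
    return v

# Ingestion: chunk documents and embed each chunk up front.
def build_index(documents: list[str], chunk_size: int = 500):
    chunks = [doc[i:i + chunk_size] for doc in documents
              for i in range(0, len(doc), chunk_size)]
    vectors = np.stack([embed(c) for c in chunks])
    return chunks, vectors

# Query time: embed the query, keep the top-k most similar chunks by
# cosine similarity, and prepend them to the prompt before calling the LLM.
def retrieve(query: str, chunks, vectors, k: int = 3) -> list[str]:
    q = embed(query)
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

def augmented_prompt(query: str, context: list[str]) -> str:
    return "Context:\n" + "\n---\n".join(context) + f"\n\nQuestion: {query}"
```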
From the Stanford lecture, four key patterns emerge:
Prompt 1: "Here is the <code>. Check the code and provide feedback."
Prompt 2: "Here is the <code> and <feedback>. Use the feedback to improve."
Repeating this cycle improves output quality significantly.
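A sketch of that generate, critique, revise cycle, again with a placeholder `llm()` call standing in for a real model client:

```python
def llm(prompt: str) -> str:
    """Placeholder: replace with a call to your LLM of choice."""
    raise NotImplementedError

def reflect(task: str, rounds: int = 2) -> str:
    draft = llm(f"Here is the task: {task}. Produce a first draft.")
    for _ in range(rounds):
        # Prompt 1: ask for feedback on the current draft.
        feedback = llm(f"Here is the draft:\n{draft}\n"
                       "Check it and provide specific feedback.")
        # Prompt 2: ask for a revision that incorporates the feedback.
        draft = llm(f"Here is the draft:\n{draft}\n"
                    f"Here is the feedback:\n{feedback}\n"
                    "Use the feedback to improve the draft.")
    return draft
```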
Task: Ask an LLM to “Summarise the latest news on [topic]”
What to observe: How often does the pure prompt approach fail? What’s the latency difference with RAG?
Note: You might have noticed this yourself if you’ve been using products like ChatGPT since they first launched: web search was first added to ChatGPT in October 2024 for paid users, and rolled out to all users in February 2025.
Task: Ask Claude Code to write a function, then ask it to review and improve its own code.
Prompt 1: "Write a Python function to validate email addresses"
Prompt 2: "Review the above code. What edge cases are missing?"
Prompt 3: "Rewrite the function incorporating your feedback"
What to observe: Does the second iteration catch issues (e.g., missing regex for special chars)? How many iterations reach “good enough”?
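For reference, a sketch of the kind of function the second or third iteration might converge on. The regex is illustrative only, not a complete RFC 5322 validator, and your model’s output will differ.

```python
import re

# Illustrative only: a pragmatic pattern an iteration might settle on.
# It allows dots, plus-tags and hyphens, but still rejects some valid
# addresses (quoted local parts, IP-literal domains) that RFC 5322 permits.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_valid_email(address: str) -> bool:
    return bool(EMAIL_RE.fullmatch(address))

assert is_valid_email("user.name+tag@example.co.uk")
assert not is_valid_email("no-at-sign.example.com")
```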
| LLM Limitation | Agentic Solution |
|---|---|
| Outdated knowledge | RAG: Retrieve current data from vector stores or APIs |
| No domain expertise | RAG: Provide proprietary docs (contracts, policies, code) |
| Single-shot answers | ReAct: Break into steps, iterate |
| Can’t verify accuracy | Reflexion: Self-critique and regenerate |
| No external actions | Tool Use: Call APIs, run scripts, update databases |
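The last row is worth making concrete: tool use usually comes down to mapping a structured tool call emitted by the model onto real functions. A minimal dispatcher sketch, where the tool name and the JSON call format are assumptions rather than any specific provider’s API:

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool: swap in a real API call."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted tool call shaped like {"name": ..., "args": {...}}."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["args"])

print(dispatch('{"name": "get_weather", "args": {"city": "Berlin"}}'))
```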
LLMs are moving from “assistants” to “agents.” The architectural implication:
Start thinking about AI not as a feature, but as a component in a distributed system that reasons, retrieves, and acts.