Difficulty: Intermediate Time Investment: 2-3 hours Prerequisites: Understanding of prompts, agentic workflows, file systems
The Problem: High-quality prompts improve LM performance, but they:
Agent Skills solve this by giving LMs:
As a Technical Architect, understanding Skills helps you:
| Feature | High-Quality Prompt | Agent Skill |
|---|---|---|
| Context Load | Static. Full text loaded immediately. | Progressive. Tiny metadata first; full manual only if needed. |
| Logic Type | Declarative. “Act like a tax lawyer.” | Procedural. “Here’s a Python script to validate tax codes. Run it if unsure.” |
| Actionability | LLM describes what it would do. | LLM executes pre-written scripts in secure runtime. |
| Persistence | Lost when chat ends (unless in System Prompt). | Portable folders. Reusable across projects. |
Key takeaway: If a prompt is an instruction manual, a skill is a toolbox that includes the manual and the tools.
.cursorrules (AI Generated)

Both provide context, but they serve different purposes:
| Aspect | .cursorrules | Agent Skills |
|---|---|---|
| Scope | Conventions (coding style, folder structure) | Capabilities (executable scripts, validation logic) |
| The “Script” Component | Can follow rules to write code | Packages pre-written, validated scripts that agent calls as tools |
| Runtime | Local IDE context | Tight coupling with runtime environment (Claude Code container, VM) |
| Primary Use | “Always use TypeScript” | “Here’s a script to generate a PDF report” |
Example:
.cursorrules: “Use Tailwind for styling”

Architect’s takeaway: Use .cursorrules for conventions, Skills for executable capabilities.
/my-skill/
├── SKILL.md # Instructions for when/how to use this skill
└── scripts/ # Executable logic (Python, Bash, etc.)
├── validate.py
└── generate.sh
SKILL.md Example:
# ADR Governor Skill
## When to use
Trigger this skill when:
- User proposes a new technology or architectural change
- A new ADR needs to be created
- An existing ADR needs validation
## How it works
1. Check if ADR exists for this decision domain
2. Run scripts/validate_adr.py to check against enterprise tech radar
3. Draft ADR using templates/adr-template.md
4. Flag deprecated technologies
## Scripts
- validate_adr.py: Checks ADR against enterprise standards
- generate_adr.sh: Creates new ADR from template
The Problem: Loading all skills upfront bloats the context window.
The Solution: Two-stage loading
Stage 1: Metadata Only
{
"skill_name": "adr-governor",
"description": "Validates architectural decisions against enterprise standards",
"triggers": ["new technology", "architecture change", "ADR creation"]
}
(~50 tokens)
Stage 2: Full Load (only if triggered)
SKILL.md (full instructions) + scripts/ (executable logic)
(~5000 tokens)
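The two stages can be sketched in Python. The skill.json filename and these loader functions are illustrative assumptions about the mechanism, not Anthropic's actual API:

```python
"""Sketch of two-stage (progressive) skill loading."""
import json
from pathlib import Path

def load_metadata(skills_dir: Path) -> list[dict]:
    """Stage 1: read only each skill's tiny metadata file (~50 tokens each)."""
    return [json.loads((d / "skill.json").read_text())
            for d in skills_dir.iterdir() if (d / "skill.json").exists()]

def find_triggered(metadata: list[dict], user_message: str) -> list[str]:
    """Match each skill's trigger phrases against the user's message."""
    msg = user_message.lower()
    return [m["skill_name"] for m in metadata
            if any(t in msg for t in m.get("triggers", []))]

def load_full_skill(skills_dir: Path, name: str) -> str:
    """Stage 2: load the full SKILL.md only for skills that actually triggered."""
    return (skills_dir / name / "SKILL.md").read_text()
```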
Impact: An agent can have access to 50 skills without context bloat, loading only what it determines it needs (when triggering works reliably).
Philosophy: Instead of building custom APIs for every task, Anthropic standardises on Bash and the file system.
Principle: If a human can do it in a terminal, the agent *can*\* do it with a skill.
\*if the skill is triggered.
Example:
# Traditional approach: Build custom API
POST /api/skills/threat-model
Body: { "architecture": "..." }
# Skills approach: Standard file operations
$ cat architecture.md | python scripts/threat-model.py > risks.json
Why this matters: Lower barrier to skill creation. Developers already know Bash/Python.
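To make the pipeline above concrete, here is a minimal sketch of what such a threat-model.py could look like. The STRIDE keyword map is an illustrative toy heuristic, not a real threat-modelling implementation:

```python
#!/usr/bin/env python3
"""Sketch of scripts/threat-model.py: naive STRIDE keyword scan over an architecture doc."""
import json
import sys

# Toy keyword → STRIDE category hints (illustrative only).
STRIDE_HINTS = {
    "Spoofing": ["login", "authentication", "sso"],
    "Tampering": ["upload", "user input", "form"],
    "Information Disclosure": ["s3", "database", "pii", "logs"],
    "Denial of Service": ["public endpoint", "rate limit", "queue"],
}

def scan(doc: str) -> list[dict]:
    """Return one risk entry per keyword hit, tagged with its STRIDE category."""
    text = doc.lower()
    return [{"category": cat, "evidence": kw}
            for cat, kws in STRIDE_HINTS.items()
            for kw in kws if kw in text]

if __name__ == "__main__":
    # Matches the pipeline: cat architecture.md | python scripts/threat-model.py > risks.json
    print(json.dumps({"risks": scan(sys.stdin.read())}, indent=2))
```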
The Innovation: LMs can write their own scripts and save them for future use.
Example Workflow:
Implication: Skills enable emergent expertise—the more an agent works, the more capable it becomes (when skills trigger reliably).
Despite their potential, skills don’t always activate automatically. Current trigger reliability is ~20% in production environments. Here are practical fallback strategies:
When to use: Skills exist but agent doesn’t recognise when to trigger them
How it works:
Instead of: "Review my architecture"
Use: "Use the threat-modeler skill to analyse this architecture"
Trade-off: Requires you to know which skill to invoke (defeats some automation benefit)
When to use: Frequent tasks where you want quick invocation
How it works: Create aliases or shell functions:
alias threat-model="claude --skill threat-modeler"
Trade-off: Requires setup; still relies on explicit invocation
Skills are powerful but not yet fully autonomous. Treat them as:
For mission-critical tasks (security, compliance), use deterministic approaches such as CI/CD actions that trigger scripts to ensure consistent execution.
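One concrete option is a CI job that runs a skill's script on every pull request, so execution no longer depends on trigger reliability. A sketch as a GitHub Actions workflow, where the docs/adr and skills/ paths are assumptions about your repo layout:

```yaml
# Hypothetical workflow: run the adr-governor script deterministically on every ADR change.
name: adr-governance
on:
  pull_request:
    paths:
      - "docs/adr/**"
jobs:
  validate-adrs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate ADRs against tech radar
        run: |
          for adr in docs/adr/*.md; do
            python skills/adr-governor/scripts/validate_adr.py < "$adr"
          done
```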
Anthropic’s vision for the AI infrastructure stack:
┌─────────────────────────────────────┐
│ Human / User │
└─────────────────────────────────────┘
↕
┌─────────────────────────────────────┐
│ Agent (Claude) │
├─────────────────────────────────────┤
│ Skills (Expertise) │ ← Procedural knowledge
│ - ADR Governance │
│ - Threat Modeling │
│ - Trade-off Analysis │
├─────────────────────────────────────┤
│ MCP (Connectivity) │ ← Wires to data sources
│ - Slack, GitHub, Databases │
├─────────────────────────────────────┤
│ Runtime (Execution Environment) │ ← Where agent "works"
│ - File system, Bash, APIs │
└─────────────────────────────────────┘
Layers Explained:
Our recent dive into this area has revealed five potentially useful skills for Technical Architects:
threat-modeler (Security Architect)

Problem: Security treated as afterthought; manual STRIDE analysis is tedious
Skill Contents:
Value: Helps automate security analysis during design phase (shift-left security) when triggered reliably.
adr-governor (Technical/Enterprise Architect)

Problem: ADRs are forgotten, inconsistent, or misaligned with enterprise standards
Skill Contents:
Value: Helps maintain architectural consistency at scale, reducing manual gatekeeping overhead.
trade-off-analyzer (Technical Architect)

Problem: Technology choices (Kafka vs. RabbitMQ, monolith vs. microservices) are often biased or lack rigor
Skill Contents:
Value: Forces structured comparison of “ilities” (scalability, availability, maintainability).
iac-security-auditor (Security/Technical Architect)

Problem: Security architects can’t review every Terraform/CloudFormation file; misconfigurations (open S3 buckets) slip through
Skill Contents:
- tfsec, checkov for deterministic scans
- Workflow: .tf file → Run scan → Find violations → Rewrite to be compliant → Explain why

Value: Real-time enforcement of security policy in the IaC layer.
architecture-context-first (Solution Architect)

Problem: Engineers and AI agents rush to solutionise before exploring the full context of the problem, leading to misaligned solutions
Skill Contents:
Value: Acts as “Digital Librarian” for enterprise, ensuring contract-first design.
Task: Build a code-reviewer skill using Claude’s ‘skill-creator’ Skill (tip: you can ask Gemini or ChatGPT to create the prompt for you).
Test: Ask Claude Code to “review this file using the code-reviewer skill”
Observe: Does it trigger the skill? Does it execute the script?
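For reference, the generated skill might end up with a SKILL.md along these lines. This mirrors the ADR Governor structure shown earlier; check_style.py is a hypothetical script name, not something skill-creator is guaranteed to produce:

```markdown
# Code Reviewer Skill

## When to use
Trigger this skill when:
- User asks for a code review of a file or diff
- A pull request needs a quality check

## How it works
1. Read the target file(s)
2. Run scripts/check_style.py for deterministic lint findings
3. Summarise issues by severity, citing line numbers

## Scripts
- check_style.py: flags long functions, missing docstrings, unused imports
```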
Scenario: Enforce “all functions must have docstrings”
Approach A: Prompt
Remember to always add docstrings to functions.
Approach B: Skill
/docstring-enforcer/
├── SKILL.md: "Check if functions have docstrings"
└── scripts/check_docstrings.py: AST parser to find missing docstrings
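A sketch of that check_docstrings.py using Python's built-in ast module; the exact output format is an assumption:

```python
#!/usr/bin/env python3
"""Sketch of scripts/check_docstrings.py: find function definitions lacking a docstring."""
import ast
import sys

def missing_docstrings(source: str) -> list[str]:
    """Return the names of function/method definitions without a docstring."""
    tree = ast.parse(source)
    return [node.name
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and ast.get_docstring(node) is None]

if __name__ == "__main__":
    # Pipeline usage: cat my_module.py | python scripts/check_docstrings.py
    for name in missing_docstrings(sys.stdin.read()):
        print(f"missing docstring: {name}")
```

Because this is a deterministic AST check rather than a prompt, the rule is enforced the same way every time the script runs.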
Test: Generate 5 functions with each approach
Observe:
Problem: Creating 50 hyper-specific skills Solution: Start with 3-5 high-impact skills; expand only when needed
Problem: Skill has vague “when to use” instructions Solution: Define specific triggers (keywords, file patterns, user intents)
Problem: Building complex scripts that do too much Solution: Keep scripts focused (one task per script)
Problem: Skills become outdated (references old standards) Solution: Treat skills like code—version control, regular updates
Skills can call other skills:
trade-off-analyzer
↓ calls
iac-security-auditor (to check security implications of choice)
↓ calls
adr-governor (to document decision)
Benefit: Modular, reusable expertise
Caveat: Avoid deep nesting (hard to debug)
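Because each skill's script is just a process reading stdin and writing stdout, composition can be sketched as plain subprocess calls. The run_skill helper, script paths, and JSON contract below are hypothetical:

```python
"""Sketch of skill composition via subprocesses with a JSON stdin/stdout contract."""
import json
import subprocess
import sys

def run_skill(script: str, payload: dict) -> dict:
    """Run a skill script as a subprocess: JSON in on stdin, JSON out on stdout."""
    result = subprocess.run(
        [sys.executable, script],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # fail loudly if the skill script errors
    )
    return json.loads(result.stdout)

def analyse_choice(option: dict) -> dict:
    """One shallow chain: trade-off-analyzer → iac-security-auditor → adr-governor."""
    tradeoffs = run_skill("skills/trade-off-analyzer/scripts/analyze.py", option)
    security = run_skill("skills/iac-security-auditor/scripts/audit.py", tradeoffs)
    return run_skill("skills/adr-governor/scripts/generate_adr.py", security)
```

Keeping the contract to one JSON document per hop is what keeps the chain debuggable: each step can be re-run by hand from a saved intermediate file.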
Skills are evolving rapidly. Current limitations:
See When Skills Fail for detailed fallback strategies.
Skills turn “smart but inexperienced” LMs into domain experts.
When to build a skill:
Start:
The opportunity: A small team of architects can “clone” their expertise via skills and scale governance across hundreds of developers.