architecture-handbook

Tool Comparison Matrix (AI Generated)

A practical guide to AI/ML tools for Technical Architects. Organised by category with trade-off analysis.


AI-Assisted Development Tools

Code Editors & IDEs

| Tool | Type | Key Features | Best For | Trade-offs |
| --- | --- | --- | --- | --- |
| Cursor | AI-native IDE (VSCode fork) | Composer mode (multi-file edits), voice-to-code, context-aware | Architects who want control + velocity | ✅ High control, iterative feedback; ❌ Paid product, vendor lock-in |
| Claude Code | CLI agent | Agent skills, MCP integration, terminal-based | Architects building governance automation | ✅ Extensible (skills), runs locally; ❌ Command-line only, learning curve |
| GitHub Copilot | Autocomplete plugin | Inline suggestions, works in any IDE | Quick productivity boost, low friction | ✅ IDE-agnostic, low learning curve; ❌ Line-level only (no multi-file) |
| Replit Agent | Cloud IDE | Full-stack app generation, hosting included | Rapid prototyping, demos | ✅ Zero setup, instant deploy; ❌ Cloud-only, less control |
| Devin | Autonomous agent | Fully autonomous (plans, codes, tests, deploys) | Experimental, high-risk projects | ✅ Minimal human input; ❌ Low control, unpredictable |

Architect’s Decision Framework:


LLM Providers

Model Comparison (As of January 2026)

| Provider | Models | Strengths | Weaknesses | Pricing |
| --- | --- | --- | --- | --- |
| Anthropic | Claude Opus 4.5, Sonnet 4.5, Haiku | Long context (200k), tool use, safety | Not best at math/code (vs GPT-4) | ££££ (Opus), £££ (Sonnet), £ (Haiku) |
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-3.5 | Code generation, reasoning, broad training | Shorter context (128k), weaker safety guardrails | ££££ (GPT-4), ££ (GPT-3.5) |
| Google | Gemini 2.0, Gemini Pro | Multimodal (video, audio), fast inference | Newer, less battle-tested | £££ (2.0), ££ (Pro) |
| Open Source | Llama 3.1, Mixtral, Qwen | Free, runs locally, no data sent to cloud | Lower quality, needs infrastructure | Free (compute cost only) |

Key Insight: The "best model for coding" changes monthly. Follow community benchmarks (Twitter/X, Reddit) to track the current leader.

Architect’s Decision:


RAG & Vector Databases

| Database | Type | Best For | Trade-offs |
| --- | --- | --- | --- |
| Pinecone | Managed, cloud-native | Production, scaling quickly | ✅ Easy setup, auto-scaling; ❌ Vendor lock-in, cost at scale |
| Weaviate | Open-source, self-hosted | Need for control, hybrid search (vector + keyword) | ✅ Flexible, open-source; ❌ Operational overhead, slower setup |
| pgvector | Postgres extension | Already using Postgres, simple stack | ✅ No new infrastructure; ❌ Less optimised for scale |
| Chroma | In-memory, local | Prototyping, small datasets | ✅ Fast setup, free; ❌ Not built for production scale |
| Qdrant | Open-source, Rust-based | High performance, self-hosted | ✅ Fast, efficient; ❌ Smaller community, newer tool |

Architect’s Decision:


MLOps Platforms

| Platform | Type | Best For | Trade-offs |
| --- | --- | --- | --- |
| AWS SageMaker | Managed, cloud | AWS-native, integrated with AWS services | ✅ Built-in features, auto-scaling; ❌ Vendor lock-in, higher cost |
| Azure ML | Managed, cloud | Azure-native, enterprise integration | ✅ Enterprise features, security; ❌ Vendor lock-in, complex UI |
| GCP Vertex AI | Managed, cloud | GCP-native, good for BigQuery integration | ✅ Integrated ML tools; ❌ Vendor lock-in, newer platform |
| MLflow | Open-source | Experiment tracking, model registry | ✅ Vendor-neutral, free; ❌ No compute (need to integrate) |
| Kubeflow | Open-source, Kubernetes | Full ML pipelines on K8s | ✅ Cloud-agnostic, powerful; ❌ High complexity, steep learning curve |

Architect’s Decision:


Context Management Tools

| Tool | Purpose | Best For | Trade-offs |
| --- | --- | --- | --- |
| .cursorrules | Project conventions (Cursor-specific) | Coding standards, architecture decisions | ✅ Simple (one file), built into Cursor; ❌ Cursor-only, limited to text rules |
| .clinerules | Project conventions (Claude Code) | Similar to .cursorrules but for Claude Code | ✅ Works with CLI agent; ❌ Claude Code-only |
| Claude Projects | Persistent context (Anthropic) | Project-specific context across sessions | ✅ Cross-session memory; ❌ Anthropic-only, cloud-based |
| ChatGPT Custom Instructions | User-level context (OpenAI) | Personal preferences, global rules | ✅ Works across all chats; ❌ User-level (not project-level) |
| MCP Servers | Data connectivity | Connect LLMs to Slack, GitHub, databases | ✅ Real-time data access; ❌ Requires setup, security config |
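
A `.cursorrules` file is plain text placed at the project root. A hypothetical example to show the style; the rules themselves are illustrative, not from any real project:

```
# .cursorrules (hypothetical example)
- Services communicate via REST only; no direct cross-service database access.
- New endpoints require an OpenAPI schema entry before implementation.
- Follow ADR-012 for error-handling conventions.
- Prefer composition over inheritance in domain code.
```

The same content works nearly verbatim as a `.clinerules` file; the difference is which agent reads it.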

Architect’s Decision:


Security & Compliance Tools

IaC Security Scanners

| Tool | Coverage | Best For | Trade-offs |
| --- | --- | --- | --- |
| tfsec | Terraform | Terraform-specific, fast | ✅ Fast, detailed reports; ❌ Terraform-only |
| Checkov | Terraform, CloudFormation, K8s, ARM | Multi-platform IaC | ✅ Broad coverage, active development; ❌ Slower than tfsec |
| Terrascan | Terraform, K8s, Docker | Policy-as-code | ✅ Custom policies; ❌ Smaller community |
| Snyk IaC | Multi-platform | Enterprise, integrated with Snyk ecosystem | ✅ Enterprise features, good UI; ❌ Paid product |

Architect’s Decision:

Threat Modeling Tools

| Tool | Type | Best For | Trade-offs |
| --- | --- | --- | --- |
| Microsoft Threat Modeling Tool | GUI-based | Windows users, STRIDE methodology | ✅ Free, structured STRIDE; ❌ Windows-only, manual process |
| OWASP Threat Dragon | Web/desktop | Cross-platform, open-source | ✅ Open-source, multi-platform; ❌ Less mature than MS tool |
| IriusRisk | Enterprise platform | Large orgs, compliance tracking | ✅ Automated, enterprise features; ❌ Expensive, complex setup |
| Agent Skills | AI-powered | Automated STRIDE during design phase | ✅ Fast, integrated into workflow; ❌ Requires building the skill |
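
Whatever the tool, STRIDE-per-element boils down to mapping component types to candidate threat categories. A sketch of that core step in Python; the component names are hypothetical, and the mapping follows the common STRIDE-per-element convention rather than any specific tool's output:

```python
# Map each component type to the STRIDE threats it most commonly attracts.
# This mapping is illustrative, not a complete methodology.
STRIDE = {
    "external_entity": ["Spoofing", "Repudiation"],
    "process": ["Spoofing", "Tampering", "Repudiation",
                "Information disclosure", "Denial of service",
                "Elevation of privilege"],
    "data_store": ["Tampering", "Information disclosure", "Denial of service"],
    "data_flow": ["Tampering", "Information disclosure", "Denial of service"],
}

def threats_for(components):
    # components: list of (name, type) pairs taken from a design diagram.
    return {name: STRIDE.get(kind, []) for name, kind in components}

model = threats_for([("user", "external_entity"), ("orders-db", "data_store")])
print(model["orders-db"])  # → ['Tampering', 'Information disclosure', 'Denial of service']
```

An agent skill would wrap a step like this around a parsed architecture diagram, then ask the LLM to elaborate mitigations per threat.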

Architect’s Decision:


Schema & API Tools

| Tool | Purpose | Best For | Trade-offs |
| --- | --- | --- | --- |
| OpenAPI | REST API schemas | REST APIs, code generation | ✅ Industry standard, tooling ecosystem; ❌ Verbose for simple APIs |
| GraphQL | API query language + schema | Flexible queries, strong typing | ✅ Client-driven queries; ❌ More complex backend |
| Avro | Schema for events | Kafka, event-driven systems | ✅ Schema evolution support; ❌ Requires schema registry |
| Protobuf | Binary serialisation + schema | gRPC, high-performance systems | ✅ Compact, fast; ❌ Not human-readable |
| OpenAPI Diff | Breaking change detection | CI/CD validation | ✅ Automated detection; ❌ CLI-only, basic reports |
| Pact | Consumer-driven contracts | Microservices integration testing | ✅ Consumer-driven, good for testing; ❌ Setup overhead |
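
A breaking-change detector like OpenAPI Diff essentially compares two specs and flags removals. A simplified sketch, assuming the specs are already parsed into dicts and checking only removed paths and operations (real tools also inspect parameters, response schemas, and enums):

```python
def breaking_changes(old_spec, new_spec):
    # A change is breaking if an operation present in the old spec
    # disappears in the new one.
    changes = []
    for path, methods in old_spec.get("paths", {}).items():
        new_methods = new_spec.get("paths", {}).get(path)
        if new_methods is None:
            changes.append(f"removed path: {path}")
            continue
        for method in methods:
            if method not in new_methods:
                changes.append(f"removed operation: {method.upper()} {path}")
    return changes

old = {"paths": {"/orders": {"get": {}, "post": {}}, "/orders/{id}": {"get": {}}}}
new = {"paths": {"/orders": {"get": {}}}}
print(breaking_changes(old, new))
```

Wiring a check like this into CI (fail the build when the list is non-empty) is the usual enforcement pattern.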

Architect’s Decision:


Governance & Architectural Drift Prevention

Policy Enforcement Tools

| Tool | Type | Best For | Trade-offs |
| --- | --- | --- | --- |
| OPA (Open Policy Agent) | Policy engine | Enforcing architectural rules in code | ✅ Flexible, declarative policies; ❌ Learning curve (Rego language) |
| Conftest | OPA for configs | Testing IaC, K8s manifests | ✅ Easy integration with CI/CD; ❌ Limited to static analysis |
| Kyverno | Kubernetes-native | K8s policy enforcement | ✅ Simpler than OPA for K8s; ❌ K8s-only |
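
Real OPA policies are written in Rego, but the shape of a policy check is easy to see in plain Python: predicates applied to parsed resources, with failures collected as violations. The resource fields (`public`, `encrypted`) and bucket names below are invented for illustration:

```python
def violations(resources, policy):
    # Apply each (message, predicate) rule to each resource and
    # collect the failures, in the spirit of OPA/Conftest deny rules.
    return [
        f"{r['name']}: {msg}"
        for r in resources
        for msg, predicate in policy
        if not predicate(r)
    ]

policy = [
    ("storage must not be public", lambda r: not r.get("public", False)),
    ("encryption must be enabled", lambda r: r.get("encrypted", False)),
]
resources = [
    {"name": "logs-bucket", "public": True, "encrypted": True},
    {"name": "data-bucket", "public": False, "encrypted": False},
]
print(violations(resources, policy))
```

The value of OPA over ad hoc scripts like this is a shared policy language, a decision API, and reuse of the same rules across CI, admission controllers, and services.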

Architect’s Decision:

Drift Detection & Code Review Tools

| Tool | Focus | Strengths | Gaps |
| --- | --- | --- | --- |
| vFunction | Architectural drift detection | Visualises true architecture vs. intended, monitors drift in real time | Observability-focused (tells you drift happened, doesn't prevent it) |
| CodeRabbit | AI peer review | Deep, context-aware reviews beyond syntax | Recommender, not enforcer (no comprehension verification gate) |
| Qodo (Codium) | AI test generation + review | Good at generating tests, coverage analysis | Focuses on test coverage, not architectural comprehension |
| Traycer | Intent-based review | Detects when code "veers off intent," monitors modularity | Still prioritises velocity over understanding |
| SonarQube | Code quality + security | Established platform, wide language support | Focuses on bugs/vulnerabilities, not architectural patterns |

Architect’s Decision:

AST Analysis & Dependency Tools

| Tool | Purpose | Best For | Trade-offs |
| --- | --- | --- | --- |
| madge | Dependency graph (JS/TS) | Visualising module dependencies, circular-dependency detection | ✅ Fast, simple; ❌ JS/TS only |
| dependency-cruiser | Advanced dependency rules | Enforcing architectural boundaries (layers, modules) | ✅ Powerful rule engine; ❌ Complex configuration |
| jscodeshift | AST transformations (JS/TS) | Automated refactoring, codemod creation | ✅ Powerful for migrations; ❌ JS/TS only, steep learning curve |
| semgrep | Multi-language AST analysis | Finding anti-patterns, enforcing custom rules | ✅ Works across many languages; ❌ Less architectural focus (more security) |
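
madge and dependency-cruiser target JS/TS, but the underlying idea (parse the AST, extract imports, compare against boundary rules) is portable. A sketch using Python's stdlib `ast` module, where the layer name and forbidden-module rules are invented for illustration:

```python
import ast

def imports_of(source):
    # Collect the top-level module names imported by a piece of Python source.
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found

def boundary_violations(source, layer, forbidden):
    # Flag imports that cross an architectural boundary,
    # e.g. domain code importing infrastructure libraries directly.
    return sorted(imports_of(source) & forbidden[layer])

rules = {"domain": {"requests", "sqlalchemy", "boto3"}}
code = "import json\nimport requests\nfrom sqlalchemy import select\n"
print(boundary_violations(code, "domain", rules))  # → ['requests', 'sqlalchemy']
```

Run over every file in a layer during CI, a check like this turns an architecture diagram's "domain must not call infrastructure" arrow into an enforced rule.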

Architect’s Decision:


Cost Comparison (Rough Estimates)

LLM API Costs (per 1M tokens)

| Provider | Model | Input | Output | Use Case |
| --- | --- | --- | --- | --- |
| Anthropic | Claude Opus 4.5 | £15 | £75 | Complex reasoning, long context |
| Anthropic | Claude Sonnet 4.5 | £3 | £15 | Balanced (most use cases) |
| Anthropic | Claude Haiku | £0.25 | £1.25 | High-volume, simple tasks |
| OpenAI | GPT-4o | £5 | £15 | Code generation, general |
| OpenAI | GPT-3.5 Turbo | £0.50 | £1.50 | Cost-sensitive, simple tasks |
| Google | Gemini 2.0 | £4 | £12 | Multimodal, fast inference |
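
Per-request cost is input tokens times the input rate plus output tokens times the output rate, divided by one million. A small helper using the table's prices; the model keys are invented slugs for this sketch, not official API model names:

```python
# Prices in £ per 1M tokens, taken from the table above.
PRICES = {
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-haiku": (0.25, 1.25),
    "gpt-4o": (5.00, 15.00),
}

def request_cost(model, input_tokens, output_tokens):
    # cost = input_tokens * input_rate + output_tokens * output_rate,
    # with rates quoted per 1M tokens.
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# e.g. a 10k-token prompt with a 1k-token response on Sonnet:
print(f"£{request_cost('claude-sonnet-4.5', 10_000, 1_000):.4f}")  # → £0.0450
```

Multiplying by expected daily request volume turns this into a quick budget sanity check before committing to a model tier.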

Cost Optimisation Tips:


Integration Patterns

Tool Stacks for Common Scenarios

Scenario 1: Small Team, Rapid Development

Development: Cursor (AI IDE)
Context: .cursorrules (conventions)
LLM: Claude Sonnet (balanced cost/quality)
RAG: pgvector (simple, no new infrastructure)

Scenario 2: Enterprise, Governance Focus

Development: Claude Code (CLI agent)
Context: Agent Skills (ADR, security, contracts)
LLM: Claude Opus (safety, long context)
RAG: Weaviate (self-hosted, control)
IaC Security: Checkov (multi-platform)
Schema: OpenAPI + Pact (contract-driven)

Scenario 3: ML-Heavy, Cloud-Native

MLOps: AWS SageMaker (managed)
LLM: OpenAI GPT-4o (code generation)
Vector DB: Pinecone (managed, scalable)
Monitoring: MLflow (experiment tracking)

Scenario 4: Cost-Conscious, Open-Source

Development: VSCode + GitHub Copilot
LLM: Llama 3.1 (self-hosted)
RAG: Chroma → Qdrant (free, self-hosted)
MLOps: MLflow (free) + Kubernetes
IaC Security: tfsec (free)

Evaluation Framework

When evaluating a new AI/ML tool, ask:

1. Control vs. Autonomy

2. Cost vs. Quality

3. Lock-in vs. Convenience

4. Setup Time vs. Features

5. Team Expertise


Staying Current

Model capabilities change monthly. This comparison is a snapshot (January 2026).

How to stay updated:

  1. Follow tool-specific Discord/Slack channels
  2. Monitor Twitter/X for benchmark updates
  3. Check tool release notes monthly
  4. Test new models quarterly (benchmark on your workloads)

Red flags (tool might be declining):



Key Takeaway

No “one size fits all” tool. Your stack should match:

Start simple:

  1. Pick one AI IDE (Cursor or GitHub Copilot)
  2. Pick one LLM (Claude Sonnet or GPT-4o)
  3. Use built-in context tools (.cursorrules)
  4. Add complexity (RAG, MLOps, Skills) only when needed

Iterate: Re-evaluate tools quarterly. The landscape changes fast.