Writing Good Specs
Specs are the contract between you and your agent team. A well-written spec produces well-written code; a vague spec produces chaos. Every minute spent on spec clarity saves ten minutes of rework.
- Be specific about acceptance criteria — vague ACs lead to vague tests, which lead to implementations that “work” but don’t meet expectations.
- Include happy path AND error cases — agents will only test what you specify. If you skip error handling in the spec, you skip it in the code.
- Define data models explicitly — field names, types, required vs. optional, constraints. Don’t leave agents guessing your schema.
- Include API contracts if applicable — request/response shapes, status codes, headers. The spec should be enough to build a mock server (see the sketch after this list).
- Reference related specs for cross-feature dependencies — if feature B depends on feature A’s auth module, link to the auth spec explicitly.
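To make the data-model and API-contract bullets concrete, a spec can pin the schema down with explicit types. This is a minimal sketch; the `LoginRequest`/`LoginResponse` shapes, field rules, and the `POST /api/auth/login` route are hypothetical examples, not part of ADD:

```typescript
// Hypothetical data model and API contract for a login spec.
// Field names, optionality, and status codes are spelled out so agents
// never have to guess the schema.

interface LoginRequest {
  email: string;        // required; must be a syntactically valid email
  password: string;     // required; 8–128 characters
  rememberMe?: boolean; // optional; defaults to false
}

interface LoginResponse {
  token: string;  // session token (e.g. JWT), expiry defined in the spec
  userId: string; // UUID
}

// POST /api/auth/login
//   200 -> LoginResponse
//   400 -> { error: "VALIDATION_FAILED", fields: string[] }
//   401 -> { error: "INVALID_CREDENTIALS" }
//   429 -> { error: "RATE_LIMITED" }
```

A contract at this level of detail is enough to build a mock server and to derive most of the error-case tests.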
Start with the `/add:spec` interview. Let the interview do its job — answer thoroughly, and the spec writes itself.
Acceptance Criteria That Work
Acceptance criteria are the atomic unit of verification. Each one becomes a test, and each test becomes a guarantee. Get these right and everything downstream improves.
- Use numbered format: AC-001, AC-002, etc. — makes traceability from spec to test to implementation unambiguous (see the test sketch after this list).
- Each AC should be independently testable — if you can’t write a single test for it, it’s not a good AC.
- Avoid compound ACs — “user can login AND see dashboard” is two behaviors. Split them into AC-001 (login) and AC-002 (dashboard redirect).
- Include boundary conditions — what happens at limits? What about zero? What about maximum?
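One way to keep the spec-to-test traceability unambiguous is to put the AC identifier directly in the test name. A sketch assuming a Vitest/Jest-style runner and a hypothetical `login` function; the ACs themselves are illustrative:

```typescript
import { describe, it, expect } from "vitest";
import { login } from "./auth"; // hypothetical module under test

describe("auth login", () => {
  // AC-001: valid credentials return a session token
  it("AC-001: returns a token for valid credentials", async () => {
    const result = await login({ email: "user@example.com", password: "correct-horse" });
    expect(result.token).toBeDefined();
  });

  // AC-002: successful login redirects to the dashboard (split from AC-001)
  it("AC-002: redirects to /dashboard after successful login", async () => {
    const result = await login({ email: "user@example.com", password: "correct-horse" });
    expect(result.redirectTo).toBe("/dashboard");
  });

  // Boundary condition: an empty password is rejected, not silently accepted
  it("AC-003: rejects an empty password", async () => {
    await expect(
      login({ email: "user@example.com", password: "" })
    ).rejects.toThrow();
  });
});
```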
Edge Cases
The difference between a prototype and a production system is edge case handling. Always include these categories in your specs:
- Input boundaries: empty inputs, maximum lengths, special characters, unicode, null vs. undefined (a table-driven example follows this list)
- Concurrency: simultaneous access, race conditions, duplicate submissions
- Network: timeouts, connection failures, partial responses, retries
- API-specific: malformed requests, missing authentication, expired tokens, rate limiting
- UI-specific: screen sizes, keyboard navigation, loading states, error states, empty states
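For the input-boundary category in particular, table-driven tests keep every case visible in one place. The `validateUsername` function and its rules below are hypothetical stand-ins for whatever input your spec actually covers:

```typescript
import { describe, it, expect } from "vitest";
import { validateUsername } from "./validation"; // hypothetical module under test

// One entry per boundary case named in the spec: empty input, maximum
// length, special characters, unicode.
const cases: Array<{ name: string; input: string; valid: boolean }> = [
  { name: "empty string",        input: "",             valid: false },
  { name: "single character",    input: "a",            valid: true  },
  { name: "maximum length (32)", input: "a".repeat(32), valid: true  },
  { name: "over maximum length", input: "a".repeat(33), valid: false },
  { name: "special characters",  input: "user<script>", valid: false },
  { name: "unicode",             input: "ユーザー",      valid: true  },
];

describe("username input boundaries", () => {
  for (const c of cases) {
    it(`${c.valid ? "accepts" : "rejects"} ${c.name}`, () => {
      expect(validateUsername(c.input)).toBe(c.valid);
    });
  }
});
```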
TDD Discipline
ADD enforces a strict four-phase TDD cycle. Each phase has a purpose, and skipping any of them undermines the entire methodology.
- RED phase: Write ALL failing tests before any implementation. Every acceptance criterion becomes at least one test. The test suite should be a complete expression of the spec (see the sketch after this list).
- GREEN phase: Write MINIMAL code to pass — no gold-plating, no “while I’m here” additions. The only goal is a green test suite.
- REFACTOR phase: Clean up with confidence. Your tests are the safety net. Improve naming, extract functions, reduce duplication — but change no behavior.
- VERIFY phase: An independent agent runs quality gates in a different context with no shared state. This catches assumptions the implementing agent baked in.
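A compressed RED/GREEN illustration, assuming a hypothetical `isTokenExpired` helper. In RED the tests exist and fail because the function is not implemented yet; in GREEN only the code the tests demand is added:

```typescript
// --- token.test.ts (written in RED; fails until token.ts exists) ---
import { describe, it, expect } from "vitest";
import { isTokenExpired } from "./token";

describe("token expiry", () => {
  it("should reject expired tokens", () => {
    expect(isTokenExpired({ expiresAt: Date.now() - 1_000 })).toBe(true);
  });

  it("should accept unexpired tokens", () => {
    expect(isTokenExpired({ expiresAt: Date.now() + 60_000 })).toBe(false);
  });
});

// --- token.ts (written in GREEN) ---
// Minimal implementation: no caching, no clock-skew handling,
// nothing the tests do not demand.
export function isTokenExpired(token: { expiresAt: number }): boolean {
  return token.expiresAt <= Date.now();
}
```

REFACTOR then renames, extracts, or deduplicates under the protection of these tests, and VERIFY re-runs them in a clean context.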
Test Strategy
Different test types serve different purposes. Match your test strategy to what you’re verifying.
- Unit tests for business logic — fast, isolated, high volume. These are your first line of defense.
- Integration tests for API boundaries — verify that modules talk to each other correctly. Test real interactions, not mocks of mocks (an example follows this list).
- E2E tests for critical user paths (Beta+ maturity) — expensive to run but catch what unit and integration tests miss.
- Test naming: describe what behavior is being tested, not how. `should reject expired tokens` is better than `test checkTokenExpiry function`.
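For the integration-test bullet, a boundary test should exercise the real HTTP surface rather than mocked internals. A sketch assuming a supertest-compatible app export (Express-style); the route and error body mirror the hypothetical contract above:

```typescript
import { describe, it, expect } from "vitest";
import request from "supertest"; // assumes supertest is available
import { app } from "./app";     // hypothetical app export (Express-style)

describe("POST /api/auth/login (integration)", () => {
  it("should return 401 for invalid credentials", async () => {
    const res = await request(app)
      .post("/api/auth/login")
      .send({ email: "user@example.com", password: "wrong" });

    expect(res.status).toBe(401);
    expect(res.body.error).toBe("INVALID_CREDENTIALS");
  });
});
```

Note the test name describes the behavior (rejecting invalid credentials), not the implementation.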
Coverage Targets
Coverage expectations scale with project maturity. Don’t over-invest in coverage for a POC, and don’t ship a GA product at 40%.
| Maturity Level | Coverage Target | Rationale |
|---|---|---|
| POC | None required | Validate the idea first. Tests slow exploration. |
| Alpha | 60% | Core paths tested. Gaps acceptable in experimental areas. |
| Beta | 80% | Production-bound code needs real coverage. E2E tests added. |
| GA | 90% | Ship with confidence. Remaining 10% is generated code or trivial getters. |
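If your test runner supports coverage thresholds, the maturity targets can be enforced in CI rather than by convention. A sketch assuming a recent Vitest with the v8 coverage provider (Jest's `coverageThreshold` option works similarly); the numbers correspond to the Beta row above:

```typescript
// vitest.config.ts — fail the run if coverage drops below the Beta target.
// Bump these values to 90 when the project is promoted to GA.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      provider: "v8",
      thresholds: {
        lines: 80,
        functions: 80,
        branches: 80,
        statements: 80,
      },
    },
  },
});
```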
Human-Agent Collaboration
ADD defines three engagement modes that control how much autonomy agents have. Choosing the right mode prevents both bottlenecks and runaway decisions.
- Guided mode — Use when exploring unfamiliar codebases, onboarding to a new project, or making architectural decisions. The agent asks before acting.
- Balanced mode (default) — Use for established projects where specs exist and the agent understands the codebase. The agent acts on clear specs, asks on ambiguity.
- Autonomous mode — Use only for well-specified features at GA maturity. The agent executes full TDD cycles without interruption. Requires clear specs and established patterns.
Before stepping away, run `/add:away`. Review the briefing from `/add:back` before continuing work — the agent may have queued decisions that need your input.
Away Mode Best Practices
Away mode lets your agent team work while you’re gone. But autonomy without boundaries is a recipe for irreversible mistakes.
- Define scope explicitly: “Work on specs/auth-login.md only” is clear. “Make progress” is not.
- Set boundaries: “No database schema changes,” “No new dependencies,” “No merges to main.”
- Set duration: Default is 2 hours. Adjust based on work scope — larger scope needs more time, but also more risk.
- Review the away log when you return — it contains every action taken, every decision made, and every item queued for your review.
- Run `/add:back` for a structured briefing — don’t just read the log; let the agent summarize what matters.
Running Effective Retros
Retrospectives are how ADD gets smarter over time. Without them, agents repeat the same mistakes across cycles.
- Run after every milestone or every 2 weeks — whichever comes first. Stale retros lose context.
- Agent auto-checkpoints capture data continuously — after every verify, TDD cycle, deploy, and away session. The data is there; retros organize it.
- Interactive retros (`/add:retro`) capture both perspectives — the human sees things the agent missed, and vice versa. Both viewpoints matter.
- Agree on 2–3 concrete changes — don’t try to change everything at once. Small, specific improvements compound over time.
- Promote broadly useful learnings to your cross-project library — if it helped here, it will help there.
Knowledge Management
ADD’s three-tier knowledge system prevents agents from starting every project from scratch. Let it work for you.
- Let checkpoints accumulate naturally — they trigger automatically after verify, TDD cycle, deploy, and away sessions. Don’t force them.
- Review `.add/learnings.md` periodically — prune outdated entries, merge duplicates, and clarify vague ones.
- During retros, promote universal patterns to `~/.claude/add/library.md` — cross-project wisdom that helps every future project.
- Project-specific quirks stay in Tier 3 — “this project uses Prisma for ORM” is project-specific. “Always validate foreign keys in integration tests” is universal.
Knowledge Promotion
Knowledge flows upward through the tiers: project discoveries can be promoted to your user library during retros, and truly universal insights can be promoted to plugin-global (in the ADD dev project only).
| Tier | Location | Scope | Promote When |
|---|---|---|---|
| Tier 3: Project | `.add/learnings.md` | This project only | Auto-checkpoints; always active |
| Tier 2: User | `~/.claude/add/library.md` | All your projects | During `/add:retro` when a learning is universal |
| Tier 1: Global | `knowledge/global.md` | All ADD users | Rare; only for fundamental ADD insights |
Common Mistakes
These are the mistakes we see most often. Every one of them has cost teams real time — learn from their pain.
| Anti-Pattern | Why It’s Bad | Do This Instead |
|---|---|---|
| Skipping specs | Implementation drifts, no traceability, agents guess at requirements | Always /add:spec before code |
| Writing tests after code | Tests validate implementation, not behavior — they pass by definition | Strict RED → GREEN order |
| Gold-plating in GREEN phase | Wastes time, tests may not cover extras, scope creep | Minimal code to pass tests, nothing more |
| Ignoring retros | Same mistakes repeated, no learning accumulation | Run retros every 2 weeks minimum |
| Autonomous mode too early | Agent makes wrong decisions without context, rework required | Start Guided, graduate to Balanced |
| Compound acceptance criteria | Can’t test independently, unclear pass/fail | One behavior per AC |
| Skipping VERIFY phase | Quality issues slip through, self-grading homework | Always run /add:verify |
| Not setting away boundaries | Agent makes irreversible changes outside intended scope | Define explicit scope and limits |
| Promoting everything to Tier 2 | Knowledge library becomes noisy, agents waste context window | Only promote universal patterns |
| Starting at GA maturity | Overwhelming process for new projects, friction kills momentum | Start POC, promote as project matures |