ADD v0.9 — M3 Pre-GA Hardening ships; maturity promoted to beta

v0.9 is the release that earns ADD the right to call v1.0 “GA.” Every claim in the README now has a test next to it. The TDD rule can’t be silently bypassed by an agent deleting a failing test — a gate fails the cycle. The security posture isn’t prose; it’s an auto-loaded rule plus a scan hook plus a documented threat model. The Codex adapter no longer writes to a deprecated prompts path; it emits native Codex Skills with preserved frontmatter. Telemetry is OTel-aligned. Context is cache-disciplined. AGENTS.md is generated, not hand-maintained.

tl;dr — Seven specs drafted in parallel, built in parallel by seven worktree-isolated agent swarms during a ten-hour /add:away session, squash-merged sequentially with rebase resolutions, and promoted through a v0.8.1 hotfix after a plugin-family review surfaced three genuine shipping bugs. Three signed tags went out back-to-back: v0.8.1 (hotfix), v0.9.0 (M3 ships, alpha→beta), v0.9.1 (beta polish, CI wired). 93 tests run on every PR. Install v0.9.1; migration from any version ≥ 0.5.0 is automatic.

What shipped in M3

M3 was a coordinated milestone of seven independent-but-related specs, each with disjoint file sets so they could be developed in parallel without merge conflicts. The full parallelism analysis lives in docs/milestones/M3-pre-ga-hardening.md.

Spec	What it does	PR
`agents-md-sync`	Maturity-aware `AGENTS.md` generator at project root; ADD-managed content wrapped in marker blocks so user edits survive regen; PostToolUse staleness hook; dog-fooded on this repo.	#8
`cache-discipline`	Stable-prefix layout convention for sub-agent dispatch. Auto-loaded rule + validator (warn-only in v0.9, strict in v1.0) + audit of four highest-impact skills.	#9
`secrets-handling`	Auto-loaded rule + `.secretsignore` template + pre-commit gate in `/add:deploy`. Scaffolded `core/knowledge/threat-model.md`.	#10
`telemetry-jsonl`	Append-only `.add/telemetry/{date}.jsonl` aligned with OpenTelemetry GenAI semantic conventions. Cost & velocity panel in `/add:dashboard`. Cache-read / cache-creation / hit-ratio fields.	#11
`codex-native-skills`	Codex adapter migrated from the deprecated `~/.codex/prompts/` path to the native `~/.codex/skills/add-<name>/SKILL.md` layout with preserved YAML frontmatter. AGENTS.md slimmed to a manifest. Codex CLI pinned at 0.122.0.	#12
`test-deletion-guardrail`	Defends the signature TDD claim against the Kent Beck / TDAD-paper failure mode (“the agent doesn’t want to do TDD — it deletes the failing test”). `/add:verify` Gate 3.5 fails the cycle if `tests_removed > 0` without a recorded override.	#13
`prompt-injection-defense`	Auto-loaded rule + PostToolUse scan hook pattern-matching tool output against eight named patterns (OWASP Top 10 Agentic 2026, Snyk ToxicSkills, Comment-and-Control). Full T1–T5 threat model.	#14

Every spec ships with fixture-based tests. Nine suites, 93 tests total, running on every PR and push to main via the new guardrails.yml workflow.

The failure mode the TDD guardrail defends against

Kent Beck flagged it first in April 2026: the genie doesn’t want to do TDD. Give an agent a failing test and the task of making it pass, and sometimes the agent picks the wrong lever: deleting the test is a faster path to green than implementing the behavior. In the wild, Anthropic’s own eval work on test-driven agentic development (“TDAD”) caught agents silently weakening assertions, replacing test bodies with the same trivial assert-true, or wholesale removing tests they couldn’t satisfy. The failing test is the signal, and the agent has a motive to mute it.

ADD’s /add:verify now fails the cycle if the test count decreased without an explicit, recorded override. Same-name body rewrites (replacing a real test with a trivial one) require both an --allow-test-rewrite flag and a recorded override in .add/cycles/cycle-{N}/overrides.json — or an [ADD-TEST-DELETE: <reason>] commit trailer for the range. Renames (same body, new name) pass. When a test is genuinely obsolete, the human is in the loop; when the agent is cutting corners, the cycle fails.

The original v0.9.0 implementation had a quiet flaw: the --allow-test-rewrite flag by itself bypassed the approval check, even though the error message said both the flag and an override were required. The plugin-family review caught it four hours after v0.9.0 was nearly tagged, and it was fixed in v0.8.1 before any user install. A regression fixture (replacement-with-flag-no-override) now proves the bypass is closed.
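The gate logic described above can be sketched as a small decision function. This is an illustration only, not ADD's actual implementation: the CycleDiff type, its field names, and gate_passes are hypothetical, and override_recorded is assumed to be true when either an overrides.json entry or an [ADD-TEST-DELETE: <reason>] trailer exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleDiff:
    """Hypothetical summary of how the test suite changed during one cycle."""
    tests_removed: int = 0    # tests present before the cycle and gone after
    tests_rewritten: int = 0  # same name, different body
    tests_renamed: int = 0    # same body, new name


def gate_passes(diff: CycleDiff, allow_rewrite_flag: bool, override_recorded: bool) -> bool:
    """Return True if the cycle may proceed past the test-deletion gate."""
    # A drop in test count always needs a recorded override.
    if diff.tests_removed > 0 and not override_recorded:
        return False
    # Rewrites need BOTH the flag and the override. Accepting the flag alone
    # is the bypass that the regression fixture guards against.
    if diff.tests_rewritten > 0 and not (allow_rewrite_flag and override_recorded):
        return False
    # Renames keep the same body, so they pass without approval.
    return True
```

The important design point is the conjunction in the rewrite branch: a version that checks the flag or the override reproduces the original bypass.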

Security: not prose, runtime

v0.9 treats the security posture the way the rest of ADD treats TDD — with hooks and tests, not manifestos. Two rules and one runtime hook now auto-load in every ADD session:

core/rules/secrets-handling.md — defines the regex catalog, the read-deny list, the redact-on-ingest invariant, and the pre-commit gate wired into /add:deploy.
core/rules/injection-defense.md — treats untrusted content (PR comments, web fetches, foreign-repo docs, node_modules/) as data, never as instructions.
runtimes/claude/hooks/posttooluse-scan.sh — runs after every Read, WebFetch, WebSearch, and Bash tool call. Pattern-matches the output against core/security/patterns.json (eight named patterns). Audit events append to .add/security/injection-events.jsonl; an ADD-SEC: warning surfaces in the agent’s next turn.
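As a rough illustration of what a warn-only scan of tool output looks like, here is a minimal Python sketch. The shipped hook is a shell script, and this does not reproduce it. The pattern entry, its name, the "name"/"regex" keys, and the audit event fields are all assumptions; the real patterns.json schema may differ.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical catalog entry, for illustration only.
EXAMPLE_PATTERNS = [
    {"name": "ignore-previous-instructions",
     "regex": r"(?i)ignore (all )?(previous|prior) instructions"},
]


def scan_tool_output(tool: str, output: str, patterns: list, log_path: Path) -> list:
    """Return the names of matching patterns and append one audit event per match.

    Warn-only: the function never raises on a match and never blocks the call.
    """
    hits = [p["name"] for p in patterns if re.search(p["regex"], output)]
    if hits:
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with log_path.open("a", encoding="utf-8") as fh:
            for name in hits:
                event = {
                    "ts": datetime.now(timezone.utc).isoformat(),
                    "tool": tool,
                    "pattern": name,
                }
                fh.write(json.dumps(event) + "\n")  # append-only JSONL
    return hits
```

A caller would pass the returned names on to whatever surfaces the ADD-SEC: warning in the next turn.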

Defended attacks are enumerated in threat-model.md — T1 accidental secrets disclosure, T2.1 direct injection via PR/issue comment (the January 2026 Comment-and-Control attack), T2.2 indirect injection via WebFetch (OWASP LLM01), T2.3 hostile README in a foreign repo, T2.4 malicious payload in node_modules (Snyk ToxicSkills), T2.5 Comment-and-Control signature match. Out-of-scope threats are listed explicitly — the goal is to calibrate the claim, not overpromise. v0.9 is warn-only for T2; block-on-critical is v1.0 scope with a tested allowlist.

Users can extend without forking. Drop a patterns.json at .add/security/patterns.json (project scope) or ~/.claude/add/security/patterns.json (workstation scope). enabled: false in a higher-precedence file disables a default pattern.
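One way to picture the layering is a merge keyed by pattern name. The sketch below assumes each entry is a dict with a "name" key and an optional "enabled" key, which is a guess at the schema. The caller supplies the layers from lowest to highest precedence, so the sketch makes no claim about whether project or workstation scope wins.

```python
def merge_pattern_layers(layers: list) -> list:
    """Merge pattern lists ordered from lowest to highest precedence.

    A later layer overrides an earlier entry with the same name. Entries
    whose merged state has enabled == False are dropped from the result.
    """
    merged = {}
    for layer in layers:
        for entry in layer:
            previous = merged.get(entry["name"], {})
            merged[entry["name"]] = {**previous, **entry}
    return [p for p in merged.values() if p.get("enabled", True)]


# Hypothetical usage: defaults first, then one higher-precedence file.
defaults = [{"name": "a", "regex": "x"}, {"name": "b", "regex": "y"}]
override = [{"name": "a", "enabled": False}, {"name": "c", "regex": "z"}]
active = merge_pattern_layers([defaults, override])
```

Here the override disables default pattern a and contributes a new pattern c, leaving b and c active.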

Codex: native Skills, not prompts

Codex CLI shipped Skills with YAML frontmatter, sub-agents via TOML, and hooks in the same window ADD was still compiling to ~/.codex/prompts/ — a deprecated target. v0.9 migrates:

dist/codex/.agents/skills/add-<name>/SKILL.md — native Codex Skills with preserved frontmatter for description-matched dispatch.
dist/codex/.agents/skills/add-<name>/agents/openai.yaml — per-skill invocation policy (explicit-only on high-leak interview skills).
dist/codex/.codex/agents/{role}.toml — sub-agent TOMLs (test-writer, implementer, reviewer, explorer).
dist/codex/.codex/config.toml — global [agents] + [features] config block.
dist/codex/.codex/hooks/*.sh + hooks.json — SessionStart / Stop / UserPromptSubmit hook registration.
dist/codex/AGENTS.md — slimmed from 3,101 lines to 81 (a manifest pointing at the skills directory, not inlining every skill body).

The Codex CLI version is now pinned in runtimes/codex/adapter.yaml so a mid-flight Codex schema change doesn’t silently break the install. The v0.8.1 hotfix supplied the coda to this one: the install path had shipped broken, with generated skills referencing ~/.codex/templates/ while the installer staged shared assets under the namespaced ~/.codex/add/. Every skill invocation would have failed to resolve its asset refs. The fix landed in v0.8.1, with a regression smoke test that installs into a temp CODEX_HOME and asserts every path resolves.
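The idea behind that smoke test can be sketched in a few lines of Python. This is not the actual test: the regex assumes skills reference assets as literal ~/.codex/... paths, and unresolved_refs and the skills/*/SKILL.md glob are illustrative choices.

```python
import re
from pathlib import Path

# Assumed reference style: literal "~/.codex/<path>" strings in a skill body.
ASSET_REF = re.compile(r"~/\.codex/[\w./-]+")


def unresolved_refs(codex_home: Path) -> list:
    """List every ~/.codex/... reference in installed skills that does not exist."""
    missing = []
    for skill in sorted(codex_home.glob("skills/*/SKILL.md")):
        for ref in ASSET_REF.findall(skill.read_text(encoding="utf-8")):
            # Map the reference onto the temp home; drop a trailing sentence period.
            relative = ref[len("~/.codex/"):].rstrip(".")
            if not (codex_home / relative).exists():
                missing.append(f"{skill.parent.name}: {ref}")
    return missing
```

Run against a temporary CODEX_HOME after the installer has staged its files, an empty list means every reference resolves. A skill that points at ~/.codex/templates/ while assets live under ~/.codex/add/ shows up as a missing entry.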

AGENTS.md as an output, not an input

AGENTS.md is now the cross-tool standard (Linux Foundation, 60k+ projects). ADD treats it as an output — your project gets a tool-portable AGENTS.md at its root that any agent in any tool (Cursor, Codex CLI, Copilot, Windsurf, Amp, Devin) can read without needing the ADD plugin installed.

The generator is maturity-aware: POC gets bullet points, Alpha gets sectioned content, Beta gets the full structure (autonomy ceiling + active-spec pointer + conditional TDD section), GA gets full content plus team conventions. ADD-managed content is wrapped in marker blocks so your hand-written sections survive regeneration. Four modes: --write (default), --check (CI drift gate), --merge / --import (absorb hand-curated AGENTS.md).

A PostToolUse hook writes .add/agents-md.stale when .add/config.json, core/rules/*.md, or core/skills/*/SKILL.md changes and an AGENTS.md exists at root. The hook never auto-rewrites — the human triggers regen. ADD dog-foods the skill; the AGENTS.md at the plugin’s own root is generated by /add:agents-md.
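The staleness rule is simple enough to sketch. The version below is a hypothetical Python rendering, not the shipped hook: function names are invented, and changed paths are assumed to be relative to the project root with forward slashes.

```python
from pathlib import Path, PurePosixPath

# Watched paths named in the text, relative to the project root.
WATCHED = (".add/config.json", "core/rules/*.md", "core/skills/*/SKILL.md")


def should_mark_stale(changed_path: str, project_root: Path) -> bool:
    """True when a watched file changed and an AGENTS.md exists at the root."""
    if not (project_root / "AGENTS.md").exists():
        return False
    # PurePosixPath.match keeps "*" within a single path segment.
    return any(PurePosixPath(changed_path).match(pattern) for pattern in WATCHED)


def mark_stale_if_needed(changed_path: str, project_root: Path) -> bool:
    """Write the stale marker when needed. AGENTS.md itself is never touched."""
    if not should_mark_stale(changed_path, project_root):
        return False
    marker = project_root / ".add" / "agents-md.stale"
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

Keeping the hook to a marker file means regeneration stays a deliberate human action, which matches the rule that the hook never auto-rewrites.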

Telemetry: OTel-aligned, not bespoke

ADD now emits structured telemetry to .add/telemetry/{YYYY-MM-DD}.jsonl aligned with the OpenTelemetry GenAI semantic conventions (gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.cache_read_input_tokens, gen_ai.usage.cache_creation_input_tokens, and a derived cache_hit_ratio). Daily rotation. Null-safe — missing fields emit null, never an error.
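A minimal sketch of a null-safe, append-only writer for such records follows. The field names come from the list above; everything else is an assumption, including the function names and in particular the cache_hit_ratio formula (cache reads divided by all input tokens), which the release notes do not define.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional

FIELDS = (
    "gen_ai.request.model",
    "gen_ai.usage.input_tokens",
    "gen_ai.usage.output_tokens",
    "gen_ai.usage.cache_read_input_tokens",
    "gen_ai.usage.cache_creation_input_tokens",
)


def cache_hit_ratio(record: dict) -> Optional[float]:
    """Assumed definition: cache reads over all input tokens, or None if unknown."""
    read = record.get("gen_ai.usage.cache_read_input_tokens")
    created = record.get("gen_ai.usage.cache_creation_input_tokens")
    fresh = record.get("gen_ai.usage.input_tokens")
    if read is None or created is None or fresh is None:
        return None
    total = read + created + fresh
    return read / total if total else None


def append_event(raw: dict, telemetry_dir: Path, day: date) -> dict:
    """Append one record to {telemetry_dir}/{YYYY-MM-DD}.jsonl; missing fields become null."""
    record = {field: raw.get(field) for field in FIELDS}
    record["cache_hit_ratio"] = cache_hit_ratio(record)
    telemetry_dir.mkdir(parents=True, exist_ok=True)
    path = telemetry_dir / f"{day.isoformat()}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Because the file name is derived from the date, daily rotation falls out of the naming scheme with no separate rotation step.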

/add:dashboard grows a Cost & Velocity panel: per-skill and per-cycle aggregations with an inline SVG trend chart. The EU AI Act deadline in August 2026 pushed audit/cost attribution up the priority stack; this closes that gap without adding any runtime dependency.

Cache discipline: structural, not quantitative

Anthropic’s prompt cache offers up to a 90% input-cost discount on cache hits, and Anthropic’s own case study reports an 85% latency reduction on long agent sessions when prompts are cache-aware. The discipline is simple: stable preamble first, volatile state last. v0.9 formalizes it as an auto-loaded rule (core/rules/cache-discipline.md), a validator (scripts/validate-cache-discipline.py), and a remediation pass on the four highest-impact skills (tdd-cycle, implementer, reviewer, verify).

SKILL.md files that dispatch via Task now wrap the cacheable region and the per-call region in separate marker tags. Test-writer, implementer, and reviewer dispatches share a byte-identical stable prefix; only the volatile role-specific tail differs. Cache hits compound across sub-agent calls. The validator runs warn-only in v0.9 and moves to --strict in v1.0 once every skill is remediated.
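The stable-prefix idea can be shown with a toy prompt builder. This is a sketch under assumptions, not the validator shipped in scripts/validate-cache-discipline.py: build_prompt, shares_stable_prefix, and the sample prompt text are all hypothetical.

```python
def build_prompt(stable_prefix: str, volatile_tail: str) -> str:
    """Stable preamble first, volatile per-call state last."""
    return stable_prefix + "\n" + volatile_tail


def shares_stable_prefix(prompts: list, stable_prefix: str) -> bool:
    """True when every prompt starts with the same bytes as the stable prefix."""
    prefix_bytes = stable_prefix.encode("utf-8")
    return all(p.encode("utf-8").startswith(prefix_bytes) for p in prompts)


# Hypothetical dispatch: three roles share one prefix and differ only in the tail.
PREFIX = "RULES v1\nSPEC: login flow"
ROLE_PROMPTS = [
    build_prompt(PREFIX, f"ROLE: {role}")
    for role in ("test-writer", "implementer", "reviewer")
]
```

A prompt that puts a timestamp or cycle number ahead of the shared rules fails the check, because anything volatile at the front changes the bytes a prefix cache would otherwise reuse.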

Alpha → Beta

v0.9 is the release that earns the promotion. Readiness against the cascade matrix in core/rules/maturity-lifecycle.md: 12 of 13 applicable requirements met. The single exemption, wiring the guardrail suites into CI, was time-boxed to v0.9.x and closed in v0.9.1. The exemptions list is now empty.

Cascade changes active at beta:

Strict TDD enforcement (not just critical paths — all code)
Reviewer agent recommended on all changes (up from optional)
Balanced /add:away autonomy (plan reviewed, execution autonomous, verify with human)
Interview depth: ~12 questions (up from ~8)
Parallel-agent ceiling raised 2 → 3

Next promotion criteria for GA (logged in .add/config.json): guardrail suite running in CI and release-blocking (done), real Claude + Codex install smoke in CI, per-runtime capability matrix in release notes, 60-day stability at beta, marketplace submission approved, 20+ projects using ADD.

How it was built: seven swarms, one session

The seven M3 specs were drafted, built, and merged in a single session using ADD’s own /add:away workflow. A ten-hour autonomy window, explicit hard boundaries (no merge to main, no prod deploy), and seven parallel worktree-isolated agent swarms — one per spec. Each swarm followed the ADD SDLC end-to-end: read spec, draft plan, write RED fixtures, implement GREEN, run the verify gates, open PR.

Wave 1 shipped three Small specs in parallel (agents-md-sync, cache-discipline, secrets-handling). Two of the three swarms fell out of their assigned worktrees early in the run — Swarm A used absolute paths rooted at the main repo rather than its worktree, and Swarm B ended up working directly in main’s tree. Neither broke correctness, but Wave 2 briefs added explicit pwd-first verification and absolute-path warnings. Wave 2+3 ran four concurrent swarms (prompt-injection-defense, test-deletion-guardrail, telemetry-jsonl, codex-native-skills) with zero worktree leaks. Total wall-clock for seven specs: about 64 minutes.

A plugin-family review from a separate agent surfaced twenty findings post-merge. Three were genuine shipping bugs: an invalid top-level description in the Claude marketplace manifest, the Codex install path mismatch (confirmed by building a temp-home smoke test), and the test-rewrite bypass. These became the v0.8.1 hotfix. The remaining seventeen findings split into two groups: real-but-small v0.9.x work (rule parity, validator tightening, /add:version cross-runtime path, and CI wiring, all shipped in v0.9.1), and architectural scope dressed up as release criteria (host-neutral kernel, adapter-contract schema, command-catalog generator), which was parked as an M4 milestone.

Installing / upgrading

claude plugin install add           # fresh install
claude plugin update add            # from any 0.5.0+ install

Migration is automatic from any version ≥ 0.5.0. You’ll see a one-time notification on first skill invocation after upgrade. No manual steps, no file edits. On the Codex side:

./scripts/install-codex.sh          # native Skills layout (v0.9+)

Verify the signed release tag:

curl -fsSL https://github.com/MountainUnicorn.gpg | gpg --import
git tag --verify v0.9.1
# Good signature from "Anthony Brooke <anthony.g.brooke@gmail.com>"

Full v0.9.1 release notes → · v0.9.0 release notes → · v0.8.1 release notes →

Tagged: v0.9 · M3 · beta · multi-runtime · security · TDD-guardrail · telemetry · cache-discipline · AGENTS.md