QA Workflow for Agent-Built Apps
Short answer
Agent-built apps need a repeatable QA loop, not one-off test files. TestChimp connects markdown plans, SmartTests in Git, test runs, TrueCoverage, and `/testchimp` commands so every merge improves verified coverage.
Who this is for
Any team that ships with Cursor, Claude Code, Copilot, Windsurf, Lovable, Replit, or Codex, merges daily, and cannot rely on session-scoped agent tests alone.
The problem with agent-only QA
Coding agents optimize for local success. Without orchestration you get:
| Symptom | Business impact |
|---|---|
| Demo-only tests | Revenue paths untested under real data |
| No scenario links | Reviewers cannot see requirement regressions |
| Missing Arrange layer | Parallel CI flakes on shared staging |
| No TrueCoverage loop | Production gaps discovered in support tickets |
| Stale suites | Next agent session rewrites unrelated tests |
The TestChimp loop
1. Init (once per repo)
`/testchimp init` scaffolds seed/probe/teardown routes, fixtures, Playwright CI, and TrueCoverage instrumentation (see init). This is the world-state layer that agent-generated tests must plug into.
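To make the seed/probe/teardown idea concrete, here is a minimal in-memory sketch of a world-state layer. Everything in it (the `WorldState` class, its method signatures, the `Order` type, the run IDs) is a hypothetical illustration, not the code `/testchimp init` generates; real routes would talk to your backend rather than a `Map`.

```typescript
// Hypothetical sketch of a seed/probe/teardown world-state layer.
// Names and shapes are assumptions for illustration only.

type Order = { id: string; total: number };

class WorldState {
  // Each test run gets its own namespace, so parallel CI jobs never share rows.
  private runs = new Map<string, Order[]>();

  // seed: create the data a test needs before it starts (Arrange).
  seed(runId: string, orders: Order[]): void {
    this.runs.set(runId, orders.map((o) => ({ ...o })));
  }

  // probe: read state back so assertions do not depend on the UI alone.
  probe(runId: string): Order[] {
    return (this.runs.get(runId) ?? []).map((o) => ({ ...o }));
  }

  // teardown: remove only the data that belongs to this run.
  teardown(runId: string): void {
    this.runs.delete(runId);
  }
}

const world = new WorldState();
world.seed("run-a", [{ id: "o1", total: 40 }]);
world.seed("run-b", [
  { id: "o2", total: 15 },
  { id: "o3", total: 5 },
]);

// Tearing down run-a must leave run-b untouched.
world.teardown("run-a");
const remainingA = world.probe("run-a").length;
const remainingB = world.probe("run-b").length;
console.log(remainingA, remainingB);
```

Scoping data by run ID is one way to avoid the shared-staging flakes listed in the symptoms table; other isolation strategies (per-run tenants, per-run databases) would serve the same purpose.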
2. Test (every feature PR)
`/testchimp test`: agents read markdown plans, extend SmartTests for the PR diff, wire `// @Scenario:` links, and run scoped suites (see test). Orchestration draws on requirement gaps, CI history, and TrueCoverage, not chat memory alone.
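As a rough illustration of why scenario links matter, the sketch below scans test source for `// @Scenario:` comments and reports which planned scenarios have no linked test. It assumes the text after the colon is a scenario title; the exact link syntax, the `extractScenarioLinks` helper, and the scenario names are hypothetical and not taken from TestChimp.

```typescript
// Illustrative only: assumes the text after "// @Scenario:" is a scenario title.
function extractScenarioLinks(source: string): string[] {
  const links: string[] = [];
  for (const line of source.split("\n")) {
    const match = line.match(/^\s*\/\/\s*@Scenario:\s*(.+?)\s*$/);
    const title = match?.[1];
    if (title) links.push(title);
  }
  return links;
}

// A made-up test file, held as a string so the sketch stays self-contained.
const exampleTest = [
  "// @Scenario: Checkout with saved card",
  "test('pays with saved card', async ({ page }) => {",
  "  // @Scenario: Order confirmation email",
  "  await page.goto('/checkout');",
  "});",
].join("\n");

// Scenario titles as they might appear in a markdown plan (hypothetical).
const plannedScenarios = [
  "Checkout with saved card",
  "Order confirmation email",
  "Refund request",
];

const linked = new Set(extractScenarioLinks(exampleTest));
const unlinked = plannedScenarios.filter((title) => !linked.has(title));
console.log(unlinked);
```

A reviewer-facing report like `unlinked` is what turns "no scenario links" from an invisible gap into something a PR check can flag.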
3. Explore (UX risk)
`/testchimp explore`: runs ExploreChimp analytics along SmartTest pathways; UX findings roll up via the same scenario spine (see explore and explorations).
4. Evolve (continuous)
`/testchimp evolve`: closes gaps surfaced by TrueCoverage and test run history after deploys (see evolve).
Three realities to align
| Reality | Source | When misaligned |
|---|---|---|
| Planned | Markdown scenarios + @Scenario links | Features ship without tests |
| Tested | CI runs + test run telemetry | False confidence from stale suites |
| Production | TrueCoverage user events | Untested journeys until incidents |
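
The alignment in the table above can be pictured as set differences between three lists of journeys. The sketch below is a simplified model under that assumption; the `Realities` type, the `findGaps` function, and the journey names are hypothetical and do not describe how TrueCoverage computes gaps.

```typescript
// Simplified model: each reality is a list of journey names (an assumption).
type Realities = {
  planned: string[]; // from markdown scenarios
  tested: string[]; // from CI runs
  production: string[]; // from real user events
};

function findGaps(r: Realities) {
  const tested = new Set(r.tested);
  const planned = new Set(r.planned);
  return {
    // Planned but never exercised in CI: features ship without tests.
    plannedButUntested: r.planned.filter((j) => !tested.has(j)),
    // Used in production but not tested: gaps that surface as incidents.
    usedButUntested: r.production.filter((j) => !tested.has(j)),
    // Tested but no longer planned: candidates for stale suites.
    testedButUnplanned: r.tested.filter((j) => !planned.has(j)),
  };
}

const gaps = findGaps({
  planned: ["signup", "checkout", "refund"],
  tested: ["signup", "checkout", "legacy-export"],
  production: ["signup", "checkout", "gift-card"],
});
console.log(gaps);
```

Each of the three result lists maps to one "When misaligned" cell: untested plans, untested production journeys, and suites that no longer trace to a requirement.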
Builder-specific guides
| Platform | Guide |
|---|---|
| Lovable | Testing apps built with Lovable |
| Cursor | Testing apps built with Cursor |
| Claude Code | Testing apps built with Claude Code |
| Replit | Testing apps built with Replit |
| Copilot | Testing apps built with Copilot |
| Windsurf | Testing apps built with Windsurf |
| Codex | Testing apps built with Codex |
| Vibe-coded | Testing vibe-coded apps |
Getting started
- Connect Git and install the TestChimp skill
- Run `/testchimp init`
- Add markdown scenarios for top revenue paths
- Gate the next PR with `/testchimp test`
- Enable TrueCoverage; schedule `/testchimp evolve` after deploys
See also Autonomous QA platform and QA on Autopilot.
Frequently asked questions
Which agent tools work with this workflow?
Cursor, Claude Code, Copilot, Windsurf, Codex, and others via the TestChimp skill and MCP—any agent that edits your repo and runs `/testchimp` commands.
Do we still need markdown test plans?
Yes—they tell agents and reviewers what must stay covered. `/testchimp test` reads scenarios from Git; traceability is the source of truth, not chat memory.
We already use coding agents and have no QA team. Do we still need TestChimp?
Agents alone produce session-scoped tests. TestChimp orchestrates QA for agent-built apps using markdown plans, CI history, ExploreChimp, and TrueCoverage. Running `/testchimp test` on every PR lets developers drive QA without a separate QA org.
Agent-written tests failed overnight—how does TestChimp recover?
Because SmartTests live in Git with scenario links, the next `/testchimp test` run sees CI history and TrueCoverage gaps, then opens a fix PR—not a fresh chat thread. Deterministic Arrange/Assert steps fail fast; hybrid AI steps absorb copy or layout churn without rerunning entire agent sessions.
Apply these patterns in your repo
Run `/testchimp init` to connect TestChimp to your repo, then `/testchimp test` on PRs to turn these patterns into maintained SmartTests. Use `/testchimp evolve` when you want to expand coverage as your app grows.