AI Test Generation Explained
Short answer
AI test generation fails when it stops at recorded clicks. TestChimp generates Playwright SmartTests with Arrange (fixtures/API), Act (UI), Assert (probes)—scoped by markdown scenarios, PR diffs, and TrueCoverage gaps, then maintains them on every merge.
The generation misconception
Many teams equate AI test generation with exporting a script from chat or recording a session. That produces files that:
- Lack backend Arrange — parallel CI fights over shared users and coupons
- Use UI-only Assert — green toasts while orders or auth state are wrong
- Have no scenario link — reviewers cannot see requirement impact
- Rot after the next UI regen — no evolve loop tied to production
TestChimp treats generation as ongoing portfolio work inside PRs—not a one-time artifact.
Inputs that matter
| Input | Why generation needs it |
|---|---|
| Markdown scenarios | Defines what to prove—not whatever the model guesses (test planning) |
| PR diff | Scopes work to what changed this merge |
| TrueCoverage | Surfaces what users do that tests still miss (TrueCoverage) |
| Manual capture session | Grounds agents in real UI state (manual capture) |
| Test run history | Informs repair, not blind regen (test runs) |
Without these inputs, “AI generation” is indistinguishable from random exploration.
Generation paths in TestChimp
Per-PR generation (/testchimp test)
On every feature PR, agents read plans, identify affected scenarios, and author or update SmartTests. /testchimp init supplies seed/probe routes so Arrange and Assert stay stable (test).
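As a rough illustration of diff scoping, the sketch below selects the scenarios whose covered paths overlap the files changed in a PR. The `Scenario` shape, the `covers` path prefixes, and the `affectedScenarios` helper are hypothetical names chosen for this example; they are not TestChimp's data model or API.

```typescript
// Hypothetical shape: a markdown scenario plus the source path prefixes it exercises.
interface Scenario {
  id: string;
  title: string;
  covers: string[]; // path prefixes, e.g. "src/checkout/"
}

// Return the scenarios whose covered paths overlap the files changed in a PR.
function affectedScenarios(changedFiles: string[], scenarios: Scenario[]): Scenario[] {
  return scenarios.filter((scenario) =>
    scenario.covers.some((prefix) =>
      changedFiles.some((file) => file.startsWith(prefix)),
    ),
  );
}

// Illustrative data only.
const scenarios: Scenario[] = [
  { id: "checkout-tax", title: "Checkout applies tax", covers: ["src/checkout/", "src/tax/"] },
  { id: "login-sso", title: "SSO login", covers: ["src/auth/"] },
];

const changedFiles = ["src/tax/rates.ts", "README.md"];
const scoped = affectedScenarios(changedFiles, scenarios).map((s) => s.id);
console.log(scoped);
```

Here only `checkout-tax` is selected, because the diff touches `src/tax/` and nothing under `src/auth/`.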
Capture → generate prompt
Manual testers record flows; agents convert sessions into SmartTest prompts with deterministic steps wherever possible (creating SmartTests).
Agent skill in IDE
Cursor, Claude Code, and similar tools invoke the TestChimp skill to write Playwright—but orchestration happens when /testchimp test runs with MCP context (QA on Autopilot).
Output format: SmartTests
SmartTests are Playwright on disk with:
- Optional `ai.act` / `ai.verify` for volatile UI
- `// @Scenario:` links for traceability (linking scenarios)
- Standard trace, screenshot, reporter debugging
You review and merge like any application code.
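Because the scenario links are plain comments, a reviewer or CI step can read them without any special tooling. The sketch below assumes the reference is whatever follows `// @Scenario:` on the same line; the exact link syntax is not specified here, and both `scenarioLinks` and the sample reference `checkout/applies-tax` are hypothetical.

```typescript
// Collect scenario references from `// @Scenario:` comments in a test file.
// Assumption: everything after the colon on that line is the scenario reference.
function scenarioLinks(source: string): string[] {
  const links: string[] = [];
  for (const line of source.split("\n")) {
    const match = /^\s*\/\/\s*@Scenario:\s*(.+?)\s*$/.exec(line);
    const ref = match?.[1];
    if (ref) {
      links.push(ref);
    }
  }
  return links;
}

// Illustrative test file content.
const sampleSource = [
  "// @Scenario: checkout/applies-tax",
  "test('checkout applies tax', async () => {",
  "  // plain comment, not a link",
  "});",
].join("\n");

const links = scenarioLinks(sampleSource);
console.log(links);
```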
Example scenario
Situation: An AI tool generates 200 lines of UI clicks for checkout.
Expected outcome: SmartTest with seed coupon, focused Act, probe Assert, scenario link.
Why UI-only automation breaks: The generated script passes while a tax calculation bug ships, because no probe checks the order data behind the UI.
- Arrange: Seed route creates cart with known SKU and tax jurisdiction.
- Act: Playwright completes checkout UI steps.
- Assert: Probe returns order total, tax line, and payment status.
TestChimp workflow: Evolve adds international tax variant when TrueCoverage shows prod usage.
Same Arrange/Act/Assert pattern as expired-coupon checkout.
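The difference between a UI-only Assert and a probe Assert in the scenario above can be sketched with plain data. The `OrderProbe` and `SeededCart` shapes, the basis-point tax rate, and the helper names are assumptions made for illustration; a real probe route would define its own payload, and the Act step would be Playwright UI calls rather than a simulated response.

```typescript
// Hypothetical probe payload; a real probe route would define its own fields.
interface OrderProbe {
  totalCents: number;
  taxCents: number;
  paymentStatus: "paid" | "pending" | "failed";
}

// Arrange inputs that a seed route would make deterministic.
interface SeededCart {
  subtotalCents: number;
  taxRateBps: number; // basis points: 825 = 8.25%
}

// Derive the expected order state from the seeded cart.
function expectedOrder(cart: SeededCart): OrderProbe {
  const taxCents = Math.round((cart.subtotalCents * cart.taxRateBps) / 10000);
  return { totalCents: cart.subtotalCents + taxCents, taxCents, paymentStatus: "paid" };
}

// Compare the probe response with expectations; return readable mismatches.
function probeMismatches(actual: OrderProbe, expected: OrderProbe): string[] {
  const mismatches: string[] = [];
  if (actual.taxCents !== expected.taxCents) {
    mismatches.push(`taxCents: expected ${expected.taxCents}, got ${actual.taxCents}`);
  }
  if (actual.totalCents !== expected.totalCents) {
    mismatches.push(`totalCents: expected ${expected.totalCents}, got ${actual.totalCents}`);
  }
  if (actual.paymentStatus !== expected.paymentStatus) {
    mismatches.push(`paymentStatus: expected ${expected.paymentStatus}, got ${actual.paymentStatus}`);
  }
  return mismatches;
}

// Arrange: a cart with a known subtotal and tax rate.
const cart: SeededCart = { subtotalCents: 4000, taxRateBps: 825 };

// Simulated Act result: the UI shows a success toast, but tax was never applied.
const uiToast: string = "Order placed";
const buggyProbe: OrderProbe = { totalCents: 4000, taxCents: 0, paymentStatus: "paid" };

// Assert: the UI-only check passes, the probe check does not.
const uiOnlyPasses = uiToast === "Order placed";
const mismatches = probeMismatches(buggyProbe, expectedOrder(cart));
console.log(uiOnlyPasses, mismatches);
```

With a 40.00 subtotal at 8.25%, the expected tax is 3.30 and the expected total is 43.30, so the probe comparison reports two mismatches even though the toast check is green.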
Maintenance loop (generation is not one-shot)
| Phase | Command | Purpose |
|---|---|---|
| PR work | /testchimp test | Generate or repair scoped to diff + plans |
| Post-deploy | /testchimp evolve | Close TrueCoverage and plan gaps (evolve) |
| UX risk | /testchimp explore | ExploreChimp on SmartTest paths (explore) |
Generation without maintenance is technical debt at startup velocity.
Why SmartTests beat record-replay generation
| | Record-replay / chat export | TestChimp SmartTests |
|---|---|---|
| Asset | Opaque session or chat file | Playwright in Git |
| Arrange | Shared staging luck | Seed routes per run |
| Assert | UI text | Probes + optional UI |
| Traceability | None | @Scenario links |
| CI debug | Poor | Standard Playwright |
See record-replay vs TestChimp and pure agentic vs SmartTests.
Compare commercial “AI generation” tools
English SaaS runners (testRigor, Testsigma, Momentic, …) optimize authoring speed but often lock you into vendor execution. TestChimp generates Playwright you own with orchestration and TrueCoverage: AI test generation tool (landing) · TestChimp vs testRigor.
Frequently asked questions
Is AI test generation a one-time export from chat?
Not with TestChimp. Generation happens in PRs via `/testchimp test`, anchored to markdown scenarios and informed by prior runs and TrueCoverage—suites evolve after deploy through `/testchimp evolve`.
Do generated tests flake in CI?
Flakes usually mean missing Arrange (shared coupons, users) or UI-only Assert. TestChimp pushes seeds and probes; hybrid AI covers volatile UI. Agents fix harness and tests together in the failing PR.
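One way to keep parallel CI jobs from colliding on shared coupons is to derive seed data from the run and worker. The helper below is a hypothetical sketch, not part of TestChimp; in a Playwright suite the worker number could come from `testInfo.workerIndex`, and the run identifier from whatever variable your CI system exposes.

```typescript
// Build a coupon code unique to one CI run and one parallel worker,
// so concurrent jobs never redeem the same seeded coupon.
function seedCouponCode(runId: string, workerIndex: number, scenario: string): string {
  const slug = scenario
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, "-") // collapse spaces and punctuation into dashes
    .replace(/^-|-$/g, ""); // trim a leading or trailing dash
  return `${slug}-${runId}-W${workerIndex}`;
}

// Illustrative values only.
const couponForWorker0 = seedCouponCode("8841", 0, "expired coupon");
const couponForWorker1 = seedCouponCode("8841", 1, "expired coupon");
console.log(couponForWorker0, couponForWorker1);
```

Passing the generated code to the seed route in Arrange, and to the probe in Assert, keeps each worker's data isolated from the others.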
Can we review AI-generated tests before merge?
Yes—Playwright diffs in Git with `// @Scenario:` links showing requirement impact. Same review flow as hand-written automation.
We already use coding agents and have no QA team. Do we still need TestChimp?
Agents alone produce session-scoped tests. TestChimp orchestrates AI generation with markdown plans, CI history, ExploreChimp, and TrueCoverage—`/testchimp test` on every PR so developers drive QA without a separate org.
Generate and maintain SmartTests in Git
Scope generation to scenarios and TrueCoverage: `/testchimp test` on every PR, `/testchimp evolve` after deploy.