TestChimp's approach to test automation

Consider this scenario in a sample e-commerce app:

A shopper reaches checkout and enters a coupon code, but the coupon has expired. Expected outcome: checkout fails and no order is created.

Simple as it sounds, this scenario exposes the difference between fragile automation and reliable automation.

The core problem with typical web-based or no-code QA tools

Web-based QA tools operate outside the system boundary of the app under test.

Because they can operate only through the app UI, they must perform Arrange, Act, and Assert entirely in the UI layer (precisely the expensive top layer the test pyramid warns about):

  • Arrange in UI: click through screens to create setup data, or rely on pre-existing data in the environment.
  • Act in UI: perform user steps.
  • Assert in UI: infer backend truth from visible UI text and states.

For our expired-coupon checkout example, this usually becomes:

  1. Manually wrangle the DB to insert an expired coupon into staging, then remember to reuse it in the test.
  2. Log in with a known user.
  3. Attempt checkout.
  4. Verify error text in the UI.
  5. Assume no order was created because the UI said so.

This is where suites become brittle. Teams end up with test-order dependencies (one test creates data, another mutates it, another deletes it), environment-specific assumptions, and non-deterministic retries (a retry often runs against data mutated by the first attempt).

Another common failure mode is data aging in shared environments. Expectations drift as time passes (a classic non-determinism source discussed by Martin Fowler in Eradicating Non-Determinism in Tests):

  • Time-windowed entities (coupons, subscriptions, trial periods, scheduled jobs) naturally change state.
  • A test that passed last month can fail today without any product regression.

An example close to our checkout narrative: a happy-path test may rely on a shared "valid coupon" created long ago. A few weeks later that coupon expires, and the same test starts failing at checkout. The failure looks random, but the root cause is stale shared test data.

Boundary difference at a glance

```mermaid
flowchart LR
subgraph Typical["Typical web/no-code tool (outside system boundary)"]
T[Tool] --> UI1[UI only]
UI1 --> ARR1["Arrange in UI"]
UI1 --> ACT1["Act in UI"]
UI1 --> ASS1["Assert in UI"]
ASS1 --> RISK1["Infer backend truth indirectly"]
end

subgraph TestChimp["TestChimp + Claude (inside codebase boundary)"]
C[Claude orchestrated by TestChimp] --> FIX["Arrange via fixtures"]
FIX --> SEED["Seed endpoints create run-scoped entities"]
C --> UI2["Act in UI (only scenario behavior)"]
C --> PROBE["Assert via UI + backend probes"]
PROBE --> REL["Deterministic, parallel-safe tests"]
end
```

What Playwright recommends

Playwright is designed for parallel workers and retries (see parallelism and retries). That execution model implies a non-negotiable rule:

Each test run attempt must be independently executable with its own isolated data slice.

In other words, a correct Playwright suite should avoid:

  • Dependency on pre-stored shared environment data.
  • Dependency on other tests running before it.
  • Shared mutable entities across retry attempts.
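
For context, a minimal Playwright configuration that opts into this execution model might look like the sketch below (the retry and worker counts are arbitrary illustrations):

```ts
// playwright.config.ts: a minimal sketch; the counts are illustrative only
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true, // distribute tests across parallel workers
  retries: 2,          // each retry is a fresh attempt and needs its own data slice
  workers: 4,          // arbitrary worker count for illustration
});
```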

For our hero scenario, every single run attempt should be able to create its own expired-coupon world state, execute checkout, and verify outcomes without touching another attempt's data.

Doing this well requires the right primitives: seed endpoints, probe endpoints, and fixture definitions. Historically, organizational friction made this difficult (for example, QA teams without codebase access). With Claude/Cursor agents operating in the repository, that limitation is removed, and TestChimp upskills the agent to implement this correctly through the infrastructure setup flow in /testchimp init.

Parallel + retry isolation (mini diagram)

```mermaid
flowchart TB
T["Spec: expired coupon checkout"] --> W1["Worker 1 / attempt 1"]
T --> W2["Worker 2 / retry attempt 2"]

W1 --> D1["Fixture seeds:\nuser_run123_attempt1\ncoupon_run123_attempt1\ncart_run123_attempt1"]
W2 --> D2["Fixture seeds:\nuser_run123_attempt2\ncoupon_run123_attempt2\ncart_run123_attempt2"]

D1 --> A1["Act + Assert on attempt1 data only"]
D2 --> A2["Act + Assert on attempt2 data only"]
```

TestChimp's opinionated Arrange -> Act -> Assert

TestChimp orchestrates Claude inside your codebase, so it is not restricted to browser-only setup.

Tests are still authored as:

Arrange -> Act -> Assert

But each phase happens in the right layer.

Arrange: world-state via fixtures and seed endpoints

Claude first identifies the exact world-state needed for the scenario:

  • A valid user.
  • A coupon card marked as expired.
  • Cart contents and pricing context.

Then it authors or updates fixtures (reusable setup utilities for tests; see Playwright fixtures) that create this state using seed endpoints (infrastructure endpoints dedicated to building backend test entities). In TestChimp terms, this wiring is part of the PR workflow executed in /testchimp test.
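
To make "seed endpoint" concrete, here is a rough sketch of what one could look like. Express is assumed purely for illustration; the route path, payload shape, and db helper are hypothetical, not TestChimp's actual generated code:

```ts
// seedRoutes.ts: hypothetical test-environment-only seed endpoint
import { Router } from 'express';
import { db } from './db'; // assumed data-access layer

export const seedRouter = Router();

// Creates a coupon that is already expired, scoped by a caller-supplied id
// (mount this router only in test environments, never in production)
seedRouter.post('/test-seeds/expired-coupon', async (req, res) => {
  const { scopedId } = req.body; // e.g. "run123_attempt1"
  const coupon = await db.coupons.create({
    code: `EXPIRED_${scopedId}`,
    expiresAt: new Date(Date.now() - 24 * 60 * 60 * 1000), // one day ago
  });
  res.json({ couponCode: coupon.code });
});
```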

Fixtures are parameterized with testInfo (run id, retry attempt metadata, and related context from Playwright testInfo), so entities are scoped per attempt and do not clash across workers or retries.
For example, instead of a static user@example.com, a fixture can create user.e2e3f9a1@testchimp.io (derived from testInfo) so each worker/retry gets an isolated identity.
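
A minimal sketch of such a fixture, assuming the hypothetical seed endpoints above and a configured baseURL (a companion /test-seeds/user route is also assumed):

```ts
// fixtures.ts: a testInfo-scoped world-state fixture; endpoint paths are hypothetical
import { test as base } from '@playwright/test';

type World = { email: string; couponCode: string };

export const test = base.extend<{ world: World }>({
  world: async ({ request }, use, testInfo) => {
    // Unique per worker and per retry attempt; a real setup might also mix in
    // testInfo.testId for per-test uniqueness within a worker
    const scopedId = `w${testInfo.workerIndex}_attempt${testInfo.retry}`;
    const email = `user.${scopedId}@testchimp.io`;

    await request.post('/test-seeds/user', { data: { email } });
    const res = await request.post('/test-seeds/expired-coupon', {
      data: { scopedId },
    });
    const { couponCode } = await res.json();

    await use({ email, couponCode });
  },
});
```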

Result: the test does not "hope" data exists. It builds what it needs every time, matching how SmartTests are meant to run in deterministic CI.

Act: UI interactions for behavior under test

Once the fixture establishes state, the test body performs only the UI interactions that actually represent the scenario:

  • Open checkout.
  • Apply the coupon.
  • Submit the payment attempt.

The UI is used for behavior execution, not for heavy environment setup (see the end-to-end spec sketch after the walkthrough below).

Assert: verify at the right layer

Assertions are done where they are most accurate:

  • UI assert where user-visible behavior is the requirement (for example, "coupon expired" message).
  • Backend probe assert where state truth matters (for example, confirming no order record was created).

To support that, Claude can define probe endpoints where needed, keeping state checks deterministic and explicit.
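
Such a probe can be a small read-only route alongside the seed endpoints. Again a hedged sketch (Express-style; the route, query parameter, and db helper are assumptions):

```ts
// probeRoutes.ts: hypothetical read-only probe endpoint for state assertions
import { Router } from 'express';
import { db } from './db'; // assumed data-access layer

export const probeRouter = Router();

// Reports how many orders exist for a given run-scoped user
probeRouter.get('/test-probes/orders', async (req, res) => {
  const email = String(req.query.email);
  const orders = await db.orders.findByUserEmail(email); // assumed query helper
  res.json({ count: orders.length });
});
```

The end-to-end spec after the walkthrough below shows this probe being consumed from the test side.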

Walking the hero use case end-to-end

Let's replay the expired-coupon checkout scenario in this style:

  1. Arrange: the fixture calls seed endpoints to create:
    • user_run123_attempt1
    • expired_coupon_run123_attempt1
    • cart linked to that user
  2. Act: the test drives the checkout flow in the UI and applies that coupon.
  3. Assert:
    • UI shows "coupon expired" rejection.
    • Backend probe confirms no order entity exists for that run-scoped user/cart.

If this test retries, it gets a fresh isolated slice (attempt2) and runs independently, with no hidden coupling to previous attempts and no impact from other tests mutating shared DB data.
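
Putting the pieces together, the whole spec might read as follows. This is a sketch under the same assumptions as the earlier snippets (hypothetical selectors, copy, and endpoint paths), not TestChimp's actual generated output:

```ts
// expired-coupon.spec.ts: the hero scenario end to end
import { expect } from '@playwright/test';
import { test } from './fixtures'; // the testInfo-scoped fixture sketched earlier

test('expired coupon blocks checkout', async ({ page, request, world }) => {
  // Act: only the scenario behavior happens in the UI
  await page.goto('/checkout');
  await page.getByLabel('Coupon code').fill(world.couponCode);
  await page.getByRole('button', { name: 'Apply coupon' }).click();
  await page.getByRole('button', { name: 'Pay now' }).click();

  // Assert in the UI: the user-visible requirement
  await expect(page.getByText('Coupon expired')).toBeVisible();

  // Assert via backend probe: no order entity exists for this scoped user
  const probe = await request.get(
    '/test-probes/orders?email=' + encodeURIComponent(world.email),
  );
  expect(await probe.json()).toEqual({ count: 0 });
});
```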

Hero flow (expired coupon checkout)

```mermaid
sequenceDiagram
participant PW as Playwright Worker
participant FX as Fixture (testInfo scoped)
participant API as Seed/Probe APIs
participant UI as Checkout UI
participant DB as Backend State

PW->>FX: Start test attempt (runId, retry)
FX->>API: Seed user/cart/expired coupon using scoped ids
API->>DB: Create isolated entities
PW->>UI: Perform checkout + apply coupon
UI-->>PW: Show "coupon expired"
PW->>API: Probe order state for scoped user/cart
API->>DB: Query order existence
API-->>PW: No order created
PW-->>PW: Test passes (reliable Arrange -> Act -> Assert)
```

Why this is significantly more reliable than browser-only automation

This approach reduces flake because it:

  • Eliminates shared mutable test data.
  • Removes dependence on environment-specific preloaded entities.
  • Reduces UI setup noise by moving Arrange to APIs/fixtures.
  • Uses backend probing for truth when UI alone is insufficient.
  • Aligns perfectly with Playwright's worker + retry execution model.

The result is high-signal tests that fail for product reasons, not orchestration noise.

This vs typical web-based QA tools

| Concern | Typical outside-the-system tools | TestChimp's approach |
| --- | --- | --- |
| Arrange | UI setup or pre-existing env data | Fixture-driven, seed-endpoint setup |
| Isolation | Often shared data, test-order coupling | Per-run/per-attempt scoped entities |
| Assert | Mostly UI-only | UI + backend probes where needed |
| Retry safety | Fragile with shared state | Built for parallel + retry execution |
| Infra evolution | Limited ability to change system internals | Claude can update seeds/probes/fixtures in repo |

What teams this approach is best suited for

This model is best suited for development teams that own the codebase and treat test automation as part of the "definition of done."

Why:

  • The approach depends on editing system internals (seed endpoints, probe endpoints, fixtures).
  • Those changes should be code-reviewed by people who understand domain behavior and platform constraints.
  • Dev teams are best positioned to maintain this setup as product logic evolves.

Prerequisites (important)

TestChimp's approach requires codebase ownership and the ability to modify test infrastructure in the system under test.

If a QA team is fully outsourced and does not have repository-level access to implement and maintain seed/probe/fixture plumbing, they will not be able to gain the benefits of this approach.

Final takeaway

TestChimp's opinionated approach is not "more UI automation." It is proper Playwright automation authored by Claude/Cursor with the right constructs in the right layer.

Arrange and Assert are API/fixture-led, Act stays focused on meaningful UI behavior, and each run attempt gets isolated state. That is why the tests are significantly more reliable than typical no-code or browser-only workflows.

Further reading