15 posts tagged with "Announcement"

How to Find Duplicate Tests in a Playwright Suite (Semantic Graph for Agentic QA)

July 4, 2026 · 10 min read

Founder & CEO, TestChimp

TL;DR: When coding agents can write dozens of Playwright tests in a single session, the bottleneck shifts from authoring to governance: are the new tests distinct and useful, or just near-duplicates of what you already have? Semantic Graph is a free, open-source CLI that scans your suite, embeds each test semantically, clusters related tests, and renders an interactive graph so you—and your agent—can spot redundancy before it compounds.

Semantic Graph visualization — folder tree, 2D similarity graph, and cluster list view

The new problem: agents author tests en masse

For most of the last decade, the hard part of E2E testing was throughput: humans could not write and maintain enough tests to keep up with product velocity.

That constraint is collapsing. With Claude Code, Cursor, and agent skills like the TestChimp skill, a single prompt can produce a folder of well-formed Playwright specs in minutes. Coverage gaps that used to take a sprint to close can shrink to an afternoon.

The bottleneck has moved.

Era	Primary constraint	What "good" looked like
Manual QA	Authoring speed	Enough tests to cover the happy path
Human + low-code tools	UI-layer setup friction	Stable POMs, fewer flakes
Agentic QA	Suite quality at scale	Distinct, high-signal tests—not copies

When an agent is rewarded for adding tests—closing coverage gaps, responding to PR feedback, or filling in scenarios from a test plan—it has no innate sense of "this already exists, slightly reworded." Left unchecked, suites balloon with:

Duplicate tests that assert the same behaviour under different titles
Near-duplicates that differ only in fixture data or selector phrasing
Clustered redundancy where five tests all exercise the same checkout edge case
Invisible overlap across folders, because no human (and no agent) holds the entire suite in working memory

This is the QA equivalent of boiling the lake: lots of heat, little new coverage. Worse, duplicate tests inflate CI time, confuse failure triage, and give a false sense of depth: your line count grows while your behavioural breadth stalls.

The question is no longer "Can we write more tests?" It is:

"Are we writing useful, distinct tests—or just duplicative ones?"

That question needs a semantic answer, not a filename diff.

What is Semantic Graph?

Semantic Graph is an open-source tool from TestChimp that maps your Playwright test suite by meaning, not syntax.

It is published as @testchimp/semantic-graph on npm and lives in the TestChimp/semantic-graph repository. Run one command against your tests directory; the CLI:

Scans *.spec.ts, *.test.ts, and related Playwright files
Parses each test's suite path, title, intent comments, scenario annotations, and body
Embeds the canonical test text with an embedding model (OpenAI or Voyage AI)
Clusters tests by semantic similarity using DBSCAN
Lays out a 2D graph with UMAP so similar tests appear close together
Names clusters with a lightweight LLM pass (e.g. "auth", "checkout", "api-contracts")
Serves a local interactive UI at http://localhost:3859

No database. No TestChimp account required. Embeddings are computed in memory each run—ideal for local audits, pre-merge reviews, or giving an agent a structural view of the suite before it authors more tests.

How it works (the pipeline)

Understanding the pipeline helps you interpret the graph—and tune how agents use it.

1. Parse tests into embedding-ready text

The core library (@testchimp/semantic-graph-core) includes a vendored Playwright-aware parser. For each test it builds canonical text:

Suite: checkout > guest flow
Test: rejects expired coupon at payment step
Body:
Scenario: Guest checkout with invalid coupon
// intent: verify error copy and no charge created
await page.goto('/checkout');
...

Parsing captures intent comments and scenario annotations—the same metadata agents should be authoring anyway when following requirement traceability conventions. Two tests with different selectors but the same intent will land close together in embedding space.
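As a rough illustration of the canonical text shown above, the assembly step could be sketched as follows. The `ParsedTest` shape and the `buildCanonicalText` name are assumptions made for this example; the parser inside @testchimp/semantic-graph-core may model tests differently.

```typescript
// Hypothetical shape for a parsed test; not the library's actual type.
interface ParsedTest {
  suitePath: string[]; // e.g. ["checkout", "guest flow"]
  title: string;
  scenario?: string; // scenario annotation, if present
  intent?: string; // intent comment, if present
  body: string; // source of the test body
}

// Assemble the text that would be handed to the embedding model.
function buildCanonicalText(test: ParsedTest): string {
  const lines: string[] = [
    `Suite: ${test.suitePath.join(" > ")}`,
    `Test: ${test.title}`,
    "Body:",
  ];
  if (test.scenario) lines.push(`Scenario: ${test.scenario}`);
  if (test.intent) lines.push(`// intent: ${test.intent}`);
  lines.push(test.body.trim());
  return lines.join("\n");
}

const canonical = buildCanonicalText({
  suitePath: ["checkout", "guest flow"],
  title: "rejects expired coupon at payment step",
  scenario: "Guest checkout with invalid coupon",
  intent: "verify error copy and no charge created",
  body: "await page.goto('/checkout');",
});
console.log(canonical);
```

Putting suite path, title, and intent ahead of the body means two tests that share an intent also share a large part of their embedded text, even when their selectors differ.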

2. Embed with cosine similarity

Each test's text is sent to an embedding API in batches (default model: text-embedding-3-small for OpenAI, voyage-4 for Voyage). The tool computes cosine similarity between vectors and applies configurable thresholds:

Signal	Default threshold	Meaning
Graph edge	≥ 0.75	Tests are semantically related
Similar	≥ 0.80	Worth reviewing together
Potential duplicate	≥ 0.92	Strong dedup candidate

These thresholds mirror how humans judge redundancy: not byte-identical, but "would a failure in one make the other pointless?"
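To make the thresholds concrete, here is a minimal sketch of cosine similarity and the three-tier classification from the table. The threshold values come from the table above; the function and type names are illustrative, not the tool's API.

```typescript
type Signal = "potential-duplicate" | "similar" | "edge" | "unrelated";

// Default thresholds as listed in the table above.
const THRESHOLDS = { edge: 0.75, similar: 0.8, duplicate: 0.92 };

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Map a similarity score to the strongest signal it satisfies.
function classify(score: number): Signal {
  if (score >= THRESHOLDS.duplicate) return "potential-duplicate";
  if (score >= THRESHOLDS.similar) return "similar";
  if (score >= THRESHOLDS.edge) return "edge";
  return "unrelated";
}

console.log(classify(cosineSimilarity([1, 0], [1, 0])));
```

Checking the strictest threshold first keeps the tiers mutually exclusive: a pair at 0.95 is reported once as a potential duplicate rather than three times.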

3. Cluster with DBSCAN

Similar embeddings are grouped with DBSCAN density clustering—no need to pick k clusters upfront. Each cluster gets an LLM-generated label (e.g. "settings-page", "admin-tasks") so the legend is readable at a glance.
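For readers unfamiliar with DBSCAN, a compact textbook version is sketched below. It assumes cosine distance as the metric, and the `eps` and `minPts` values in the example are arbitrary; the article does not state which parameters or implementation the tool actually uses.

```typescript
const NOISE = -1;
const UNVISITED = -2;

// Cosine distance: near 0 for vectors pointing the same way, larger as they diverge.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Returns one label per point: a cluster index (0, 1, ...) or NOISE.
function dbscan(points: number[][], eps: number, minPts: number): number[] {
  const labels = new Array<number>(points.length).fill(UNVISITED);
  // All points within eps of point i (including i itself).
  const neighbours = (i: number): number[] =>
    points.map((_, j) => j).filter((j) => cosineDistance(points[i], points[j]) <= eps);

  let cluster = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== UNVISITED) continue;
    const seeds = neighbours(i);
    if (seeds.length < minPts) {
      labels[i] = NOISE; // may later be claimed as a border point
      continue;
    }
    labels[i] = cluster;
    const queue = seeds.filter((j) => j !== i);
    while (queue.length > 0) {
      const j = queue.shift()!;
      if (labels[j] === NOISE) labels[j] = cluster; // border point joins the cluster
      if (labels[j] !== UNVISITED) continue;
      labels[j] = cluster;
      const reachable = neighbours(j);
      if (reachable.length >= minPts) queue.push(...reachable); // core point: keep expanding
    }
    cluster++;
  }
  return labels;
}

const toyEmbeddings = [[1, 0], [0.99, 0.1], [0.98, 0.05], [0, 1], [0.1, 0.99], [-1, -1]];
const labels = dbscan(toyEmbeddings, 0.05, 2);
console.log(labels);
```

Tests that do not sit in any dense region are labelled noise instead of being forced into the nearest group, which is one reason density clustering suits a suite with many one-off scenarios.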

4. Visualize with UMAP + D3

A seeded UMAP projection maps high-dimensional embeddings to 2D coordinates. The bundled UI (built with D3.js) renders:

Graph view — nodes as tests, edges as similarity links; click a node to see nearest neighbours and duplicate flags
Clusters view — grouped list with colour-coded legend
Folder tree — scope the graph to a directory or single file

Zoom into tests/checkout/ before a refactor. Scan the whole suite before a release. Hand the URL to an agent and ask it to propose merges.

Why this matters for agentic QA workflows

Semantic Graph is not a replacement for TrueCoverage (production-informed prioritization) or requirement traceability. It solves an orthogonal problem: intra-suite redundancy.

Here is where it fits in a modern agent loop:

Before the agent writes

Run Semantic Graph and attach the cluster summary to the agent's context. Instructions become concrete:

"We already have four tests in the checkout cluster covering coupon validation. Do not add another unless you are testing a different failure mode."

This is cheaper and more reliable than asking the agent to grep test titles.

After the agent writes

Re-run the graph on the PR branch. New nodes that snap onto existing clusters—or spike duplicate scores above 0.92—are review flags. Pair with CI the same way you gate on lint or coverage deltas.
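One way to turn that review flag into a gate is sketched below. The `SimilarityPair` shape is a hypothetical input format invented for this example, not the response schema of the tool's JSON APIs; you would adapt it to whatever the graph output actually contains.

```typescript
// Hypothetical input: one entry per pair of tests with its similarity score.
interface SimilarityPair {
  a: string; // test identifier, e.g. file path plus title
  b: string;
  score: number;
}

// Return pairs that involve at least one newly added test and meet the duplicate threshold.
function findDuplicateFlags(
  pairs: SimilarityPair[],
  newTests: Set<string>,
  threshold = 0.92,
): SimilarityPair[] {
  return pairs
    .filter((p) => p.score >= threshold && (newTests.has(p.a) || newTests.has(p.b)))
    .sort((x, y) => y.score - x.score); // strongest candidates first
}

const flags = findDuplicateFlags(
  [
    { a: "checkout/coupon.spec.ts#expired", b: "checkout/promo.spec.ts#expired promo", score: 0.95 },
    { a: "auth/login.spec.ts#valid", b: "auth/login.spec.ts#invalid", score: 0.81 },
    { a: "settings/a.spec.ts#x", b: "settings/b.spec.ts#y", score: 0.97 },
  ],
  new Set(["checkout/promo.spec.ts#expired promo"]),
);
console.log(flags);
```

Restricting the check to pairs that touch a new test keeps the gate focused on what the PR introduced, so pre-existing redundancy does not block unrelated changes.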

During suite health reviews

Quarterly "suite diet" sessions used to mean spreadsheets and gut feel. Now: filter to clusters with high internal similarity, merge or delete, and measure CI time recovered.

Complement to production signals

TrueCoverage tells you what behaviours users need tested. Semantic Graph tells you whether your existing tests are saying the same thing twice. Both are necessary for a suite that is broad and lean.

What you see in the UI

The demo above shows the full workflow:

Left panel — folder tree mirroring your repo layout; click a folder or file to scope the view
Graph mode — force-directed layout; proximate nodes are semantically alike
Clusters mode — tests bucketed with named themes
Popover — click any test to see top similar neighbours, similarity scores, and potential duplicate badges

The UI ships inside the npm package—no separate install. It is the same "freebie" static app published as @testchimp/semantic-graph-viz in the monorepo for anyone who wants to embed or fork it.

Try it yourself

Prerequisites

Node.js 18+
An API key for embeddings (and cluster naming):
- OpenAI — one key covers embeddings + LLM, or
- Anthropic + Voyage — Claude for cluster labels, Voyage for embeddings (Anthropic does not ship an embedding API)

Quick start (OpenAI)

export PROVIDER=openai
export API_KEY=sk-...

npx @testchimp/semantic-graph visualize --tests-dir ./tests

Open the printed URL (default port 3859). Add --verbose for embedding progress and diagnostics.

Claude + Voyage

export PROVIDER=anthropic
export API_KEY=sk-ant-...
export VOYAGE_API_KEY=pa-...

npx @testchimp/semantic-graph visualize --tests-dir ./tests

All options

Flag	Description
`--tests-dir <path>`	Root folder to scan (required)
`--port <n>`	Listen port (default `3859`)
`--verbose` / `-v`	Diagnostics to stderr

See the README for environment variables, monorepo build instructions, and npm publish details.

Continuous governance with TestChimp

Semantic Graph is deliberately local and standalone—a flashlight you can shine on any Playwright repo, TestChimp customer or not.

For continuous duplicate detection, requirement traceability, release confidence, and keeping suites healthy as agents keep authoring, see TestChimp—the git-native QA governance platform built for agentic teams. Install the TestChimp Agent Skill and run /testchimp test after each PR to orchestrate coverage, exploration, and plan alignment in one loop.

FAQ

What test file types are supported?

The scanner picks up *.spec.ts, *.spec.js, *.test.ts, *.test.js, and .mjs / .cjs variants under your chosen root—standard Playwright test layouts.

Does it require a TestChimp account?

No. Semantic Graph runs entirely locally. You only need embedding (and optionally LLM) API keys.

How is this different from code coverage?

Code coverage measures which lines executed. Semantic Graph measures whether test intentions overlap. A suite can have high line coverage and still be full of redundant scenarios.

How is this different from duplicate detection by test name?

Titles lie. Agents especially love paraphrasing: "should reject invalid coupon" vs "guest user sees error for expired promo code." Embeddings capture the full body and intent, not the string on line one.

Can I use it in CI?

Today the primary interface is the local visualize command and JSON APIs (/api/graph, /api/similar). For CI gates, parse the API responses or run before review and archive the graph output. Continuous server-side governance is on the TestChimp platform roadmap.

What embedding models are supported?

Defaults: text-embedding-3-small (OpenAI) and voyage-4 (Voyage). Override with EMBEDDING_MODEL. LLM cluster naming defaults to gpt-5-nano or claude-3-5-haiku-latest.

Is the source code open?

Yes. MIT-licensed monorepo: github.com/TestChimp/semantic-graph. Packages: @testchimp/semantic-graph-core, @testchimp/semantic-graph, @testchimp/semantic-graph-viz.

Summary

Agentic QA solved test authoring at scale. The next discipline is test distinctness at scale—ensuring every new spec adds behavioural breadth, not noise.

Semantic Graph gives you a semantic map of your Playwright suite: embeddings for meaning, DBSCAN for clusters, UMAP for intuition, and a local UI for humans and agents alike. Run it before you merge agent-authored tests. Run it when CI gets slow. Run it when you suspect the lake is boiling but not reducing risk.

Get started: github.com/TestChimp/semantic-graph · npx @testchimp/semantic-graph visualize

From Manual Session to Automation Test

June 13, 2026 · 4 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

Manual testing still finds what automation misses—but too often, the path from a good manual run to a reliable automated test is broken.

Teams try Playwright codegen or record-replay tools, get a script quickly, and then spend weeks fighting flakes: shared data, missing assertions, no link back to the scenario, and no fit with POMs or fixtures already in the repo.

Today we’re announcing a workflow we recommend for turning manual sessions into SmartTests: capture with traceability, then let a coding agent upskilled with TestChimp author automation that actually belongs in your codebase.

Manual session to automation

The problem with “just record it”

Record-replay—including Playwright codegen—optimizes for mirroring UI clicks. That is not the same as authoring a repeatable test.

Real automation needs:

Arrange: seed data, fixtures, run-scoped entities
Act: the journey that matters (often shorter than what a human clicked through)
Assert: UI checks and backend state where outcomes live

Recorders capture the act layer well. They usually skip arrange and assert, and they never know which business scenario you were proving.

The result is familiar: tests that pass once on a developer machine, then fail in CI because the world-state was never set up—or because the script asserts the wrong thing (or nothing at all).
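The three layers can be illustrated without a browser. The sketch below uses an in-memory `FakeShop` class, which is entirely hypothetical and stands in for a real application, to show why a replayed Act step gives a different result when the Arrange step never ran.

```typescript
// Purely illustrative in-memory stand-in for an application backend.
class FakeShop {
  private coupons = new Map<string, { expired: boolean }>();
  private charges: string[] = [];

  seedCoupon(code: string, expired: boolean): void {
    this.coupons.set(code, { expired });
  }

  applyCoupon(code: string): string {
    const coupon = this.coupons.get(code);
    if (!coupon) return "unknown coupon";
    if (coupon.expired) return "coupon expired";
    this.charges.push(code);
    return "ok";
  }

  chargeCount(): number {
    return this.charges.length;
  }
}

function expiredCouponScenario(shop: FakeShop): void {
  // Arrange: create the world-state this scenario depends on.
  const code = `EXPIRED-${Date.now()}`;
  shop.seedCoupon(code, true);

  // Act: only the step that matters for the scenario.
  const message = shop.applyCoupon(code);

  // Assert: the user-facing outcome and the backend state.
  if (message !== "coupon expired") throw new Error(`unexpected message: ${message}`);
  if (shop.chargeCount() !== 0) throw new Error("no charge should be created");
}

expiredCouponScenario(new FakeShop());

// A replay that skips Arrange hits a different code path on a fresh environment.
const replayOnly = new FakeShop().applyCoupon("EXPIRED-123");
console.log(replayOnly);
```

The replay-only call exercises the "unknown coupon" path rather than the expired-coupon path, which is the same failure mode as a recorded script running against a CI environment that was never seeded.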

What we do instead

TestChimp connects manual execution, test planning, and agent-authored Playwright in one loop.

1) Capture the manual session—with scenario context

Use the TestChimp Chrome extension Manual tab to record a session while exercising your app. Start from Test Planning so the scenario is pre-linked (recommended), or link a scenario as part of the workflow.

What gets stored:

Step-by-step actions and screenshots
Linked scenarios (business context)
Environment and release metadata
Pass/fail outcome and optional bugs/notes

The session is auditable manual evidence and the reference for automation—not a throwaway recording.

2) Generate prompt → coding agent

Open the session in TestChimp (Executions → Manual Sessions) and click Copy test generate prompt. Paste it into your agent host (Cursor, Claude Code, etc.) with the TestChimp skill installed.

The agent pulls rich context via get-manual-session-details (CLI or MCP):

Recorded steps
Linked scenarios and scenario steps
Screenshots for visual grounding
Project layout and existing POMs, fixtures, seed/probe endpoints

It uses the manual walkthrough as reference, navigates the app to validate selectors, and writes a SmartTest that reuses your harness—not a blind replay file.

3) Continuous improvement—not one-shot codegen

Authoring does not stop at the first green run. TestChimp’s feedback loop surfaces coverage gaps (planned scenarios and TrueCoverage behaviour signals). Your agent runs /testchimp test on PRs and /testchimp evolve on a schedule or after deploys to close gaps, extend fixtures, and keep tests aligned with how users actually behave (QA on Autopilot).

The Web IDE is where you view tests, run them, and see insights aligned with your test folder structure—not where we expect most authoring to happen anymore.

How this differs from record-replay vendors

Tools like mabl, Katalon, and Testim (and codegen at the framework level) center on capture → replay. They can speed up first script creation, but they typically:

omit fixture-backed world-state
lack in-repo scenario traceability at authoring time
rarely generate backend probe assertions
produce tests that do not compose with your existing Playwright patterns

TestChimp’s manual-to-auto path is informed agent authoring: session + scenario + screenshots + your repo conventions → repeatable Playwright in Git. See the full comparison: Why record-replay falls short in creating repeatable tests.

When to use which path

Situation	What we recommend
Exploratory selector discovery	Playwright codegen or inspector—disposable output
Turning a validated manual scenario into CI automation	Manual capture → generate prompt → TestChimp agent
Ongoing suite maintenance and gap closure	`/testchimp evolve` + coverage insights
Viewing tests and folder-aligned insights	TestChimp Web IDE

Get started

Install the Chrome extension and add the TestChimp skill to your coding agent.
Capture a manual session from a linked scenario (manual test capture guide).
Copy test generate prompt and let the agent author the SmartTest (Creating SmartTests).
Wire /testchimp test into your PR flow and schedule /testchimp evolve for portfolio upkeep.

Manual testing stays human. Automation becomes engineering-grade—because the agent authors like an engineer who read the scenario, not like a recorder that only heard the clicks.

Test Runs: Turn Testing Into Release Confidence

June 7, 2026 · 11 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

TL;DR: TestChimp now has Test Runs—named validation campaigns that roll up scenario progress across manual sessions and automation batches. If you have used test runs in TestRail, Qase, or PractiTest, the concept will feel familiar. What is different is that scope, progress, and drill-down inherit the folder structure of your test plan—not a flat, manually curated case list copied into yet another container.

Test Run viewer — overview, trends, and folder-scoped scenario progress

What is a test run?

In software testing, a test run is the execution of a defined set of tests against a specific version or build of the system under test. The ISTQB glossary defines it as “the execution of a test suite on a specific version of the test object.” Test execution—the process of running those tests and recording outcomes—is a core part of the fundamental test process described in ISO/IEC/IEEE 29119.

In practice, teams use test runs to answer a release question: Given the scenarios we committed to validate for this sprint or version, how far along are we—and what is still failing?

Traditional test management systems such as TestRail and Qase model a run as a container: selected test cases, assignees, pass/fail/blocked status, and often milestone or environment context. TestRail’s guidance notes that runs are typically created per sprint or release so managers can track progress in real time.

Test Runs in TestChimp preserve that coordination purpose while changing what sits underneath—the plan, the executions, and how progress rolls up.

The gap test runs are meant to close

Most teams already know the shape of a release cycle:

a defined set of scenarios to validate
manual testers working through critical paths
automation running in CI on every build
a lead asking, “Are we done yet?”

Traditional tools answer the last question with a run—but the artifacts rarely stay connected.

User stories often live in Jira or similar issue trackers. Scenarios live in a TMS. Manual evidence sits in screenshots and Slack. Automation results sit in GitHub Actions, Jenkins, or a Playwright CI report. Requirement traceability—linking requirements to verifying tests, as described in ISO/IEC/IEEE 29148—is often maintained in spreadsheets or a test traceability matrix that goes stale.

The run becomes another manually curated list, disconnected from how the product is organized and how work actually happens.

We built Test Runs in TestChimp to close that loop without duplicating your plan in a flat case catalog.

Same concept, different foundation

A Test Run in TestChimp is still a time-bound validation campaign: a title, optional environment and release context, collaborators, a due date, and a scope of scenarios to validate.

What changes is everything underneath.

Traditional TMS test run	TestChimp Test Run
Flat list of test cases copied into the run (TestRail `add_run`)	Scope selected from your plans folder tree (stories and scenarios)
Manual results entered in the TMS UI	Manual sessions linked from the Chrome extension or web UI—with step evidence
Automation results imported via API or re-entered (TestRail result import)	Automation batches linked after CI Playwright runs; no duplicate result entry
Progress is case-by-case checkboxes	Progress is scenario status (passing / failing / not attempted) from the latest linked execution
Roll-up is a fixed “suite” or “section”	Roll-up follows any folder in your plan—checkout today, authentication tomorrow

You are not maintaining a parallel catalog. You are pointing a run at the test plan you already have.

One run, both execution types

The most common fracture in enterprise QA is two parallel tracks:

manual validation tracked in a test management tool
automated validation tracked in CI or a vendor dashboard

Qase’s own documentation describes the tension: auto-generated CI run names pile up quickly, and teams need runs that “tell a story at a glance” when reviewing overnight failures before a release.

A Test Run in TestChimp is deliberately execution-type agnostic. Link a manual session from exploratory regression. Link tonight’s Playwright batch. Link both to the same run. Scenario status reflects the latest relevant execution—whether a human marked a session passed or CI reported a SmartTest failure.
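The "latest relevant execution" rule can be sketched as a small reduction. The `LinkedExecution` shape and field names below are assumptions for illustration; TestChimp's actual data model is not described in this post.

```typescript
type ScenarioStatus = "passing" | "failing" | "not attempted";

// Hypothetical record for an execution linked to a run.
interface LinkedExecution {
  scenarioId: string;
  kind: "manual" | "automation";
  outcome: "passed" | "failed";
  finishedAt: number; // epoch milliseconds
}

// For each scenario in scope, report the outcome of its most recent linked execution.
function scenarioStatuses(
  scope: string[],
  executions: LinkedExecution[],
): Map<string, ScenarioStatus> {
  const latest = new Map<string, LinkedExecution>();
  for (const execution of executions) {
    const current = latest.get(execution.scenarioId);
    if (!current || execution.finishedAt > current.finishedAt) {
      latest.set(execution.scenarioId, execution);
    }
  }
  const statuses = new Map<string, ScenarioStatus>();
  for (const id of scope) {
    const execution = latest.get(id);
    if (!execution) statuses.set(id, "not attempted");
    else statuses.set(id, execution.outcome === "passed" ? "passing" : "failing");
  }
  return statuses;
}

const statuses = scenarioStatuses(
  ["checkout/coupon", "checkout/guest", "auth/login"],
  [
    { scenarioId: "checkout/coupon", kind: "manual", outcome: "passed", finishedAt: 1000 },
    { scenarioId: "checkout/coupon", kind: "automation", outcome: "failed", finishedAt: 2000 },
    { scenarioId: "auth/login", kind: "automation", outcome: "passed", finishedAt: 1500 },
  ],
);
console.log(statuses);
```

Because the execution kind plays no part in the comparison, a later CI failure overrides an earlier manual pass, and the reverse holds too.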

That is the same unified coverage story we told with manual testing and traceability—now packaged for release-scale questions instead of only folder-level requirement traceability insights.

Folder-based progress, not flat lists

Because TestChimp organizes stories and scenarios as markdown files in folders (Test Planning as Code), a test run inherits something traditional tools struggle to offer: scoped views at any granularity.

Select the root of the run and see overall progress for the whole release. Select checkout/ and see only checkout scenarios. Select a single story file and see exactly what is left on that requirement.

No re-tagging. No re-grouping cases into ad hoc suites every sprint. The folder structure you already use for planning becomes the structure you use for reporting—the same principle as coverage at any folder level in Test Planning.
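Folder-scoped progress amounts to a prefix filter over scenario paths. The sketch below assumes scenario status is keyed by the scenario file's path within the plan; the `rollUp` name and the example paths are hypothetical.

```typescript
type ScenarioStatus = "passing" | "failing" | "not attempted";

interface Counts {
  passing: number;
  failing: number;
  notAttempted: number;
}

// Count statuses for every scenario under a folder, or for a single file.
// An empty string selects the root of the run.
function rollUp(statusByPath: Record<string, ScenarioStatus>, selection: string): Counts {
  const trimmed = selection.replace(/\/+$/, "");
  const prefix = trimmed === "" ? "" : trimmed + "/";
  const counts: Counts = { passing: 0, failing: 0, notAttempted: 0 };
  for (const [path, status] of Object.entries(statusByPath)) {
    if (path !== trimmed && !path.startsWith(prefix)) continue;
    if (status === "passing") counts.passing++;
    else if (status === "failing") counts.failing++;
    else counts.notAttempted++;
  }
  return counts;
}

const runStatus: Record<string, ScenarioStatus> = {
  "checkout/coupons/expired-coupon.md": "passing",
  "checkout/guest-flow.md": "failing",
  "authentication/login.md": "not attempted",
};

console.log(rollUp(runStatus, ""));
console.log(rollUp(runStatus, "checkout/"));
console.log(rollUp(runStatus, "checkout/guest-flow.md"));
```

Appending a slash to the folder before matching avoids counting a sibling folder such as `checkout-v2/` when `checkout` is selected.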

That matters when:

feature teams own folders, not individual case IDs
a release spans several modules but not the entire backlog
you need a standup answer for one area without re-filtering a 2,000-row grid

Trend charts in the run viewer show how passing, failing, and not-attempted counts move over time—useful for daily readouts without exporting to a spreadsheet.

Why this fits the agentic era

Test runs are not a throwback to heavyweight process. They are a lightweight coordination layer on top of artifacts agents can already read.

Your scenarios are files. Your tests link with @Scenario comments (requirement traceability in code). Executions feed the same traceability graph whether they are manual or automated. A test run simply names the campaign—“Sprint 42 regression”, “v2.1 sign-off”—and gives humans a place to see progress while agents keep authoring against the same plan.

We are not replacing CI dashboards or extension manual capture. We are giving product and QA leads a single pane for this validation cycle, grounded in requirements rather than orphaned case records.

See it in action

For step-by-step setup—creating a run, defining scope, linking batches and sessions, reading the viewer—see Test Runs in the docs.

Frequently asked questions

What is a test run in software testing?

A test run is a structured execution of a selected set of tests against a specific build, release, or milestone. The ISTQB glossary defines it as running a test suite on a particular version of the system under test. Teams use runs to track who tested what, record pass/fail outcomes, and report release readiness.

How is a TestChimp Test Run different from a TestRail or Qase test run?

The coordination goal is the same: scope a set of tests, track progress, report status. The foundation is different. Traditional tools copy flat test cases into a run container (TestRail runs, Qase test runs). TestChimp scopes runs from your existing plans folder tree and aggregates results from linked manual sessions and automation batches—without maintaining a duplicate case list.

Can one test run include both manual testing and test automation?

Yes. TestChimp Test Runs are execution-type agnostic. Link manual sessions captured via the Chrome extension and automation batches from Playwright CI to the same run. Each scenario’s status reflects the latest linked execution, whether the outcome came from a human or from CI.

Do I need to duplicate test cases to create a test run?

No. You select scope from folders and files already in Test Planning. Scenarios remain the same markdown artifacts your team authors and version-controls; the run is a pointer and progress lens, not a second catalog.

What is folder-based test run progress?

Because stories and scenarios live in a nested folder structure (Test Planning as Code), the test run viewer lets you drill into any folder or file and see passing, failing, and not-attempted counts for just that subtree. Root shows the full run; authentication/ shows auth only—without re-tagging cases or rebuilding suites each sprint.

How do Test Runs relate to requirement traceability?

Requirement traceability links requirements and scenarios to executions over time, supporting the verification relationships described in standards such as ISO/IEC/IEEE 29148. Test Runs add a named campaign layer: a due date, collaborators, explicit scope, and release-oriented progress for one validation cycle. Traceability tracks ongoing product health; a test run tracks one specific regression cycle or release sign-off.

When should we use test runs vs Test Planning insights alone?

Use Test Planning insights when you want continuous coverage visibility for a folder, environment, and time range. Use Test Runs when you need a time-bound campaign with assigned collaborators, a due date, and a dedicated dashboard fed by executions you link during that cycle—similar to how teams use TestRail test runs per sprint, but unified across manual and automated work.

Can we link executions to a test run after they complete?

Yes. Automation batches can be linked from Executions → Automation Batches (list or batch viewer). Manual sessions can be linked at capture time in the extension or afterward from the manual session viewer. Many-to-many linking is supported—a batch or session can belong to multiple active runs.

Try it

Open Test Runs from the TestChimp sidebar, create a run scoped to the folder your team owns, and link the next manual session or automation batch you execute.

If you are comparing approaches, our requirement traceability post explains the foundation; this feature adds the campaign layer on top when you need to track a specific release or regression cycle end to end.

We are iterating on collaborator workflows, PDF reporting, and deeper agent integration. Feedback welcome—especially from teams migrating off TestRail-style run models.

Multi-platform test automation: one test codebase for web and mobile

June 4, 2026 · 11 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

TL;DR: If your product ships both a web app and native mobile apps, you are probably maintaining two automation codebases that repeat the same Arrange logic (users, listings, payments, feature flags) before any UI step runs. TestChimp Multi-Platform Projects put Playwright (web), Mobilewright (iOS/Android), and API tests in one Git-connected scaffold, with shared business logic for world-state setup and platform-specific UI tests, coverage, and UX analytics. UI interactions stay platform-specific; test infrastructure does not have to, and neither do your requirements, TrueCoverage, or Atlas view of quality.

TestChimp Multi-Platform project: shared test codebase with Web, iOS, and Android coverage

The hidden cost of “Appium for mobile, Playwright for web”

Cross-platform products rarely differ at the data layer. A booking marketplace needs the same primitives whether the customer taps Book in Safari or in your iOS app:

A test user with a known identity
Inventory (for example, a few property listings)
A valid payment method linked to that user
Whatever else your domain requires before the flow under test is meaningful

None of that is inherently web or mobile. It is application state—the Arrange phase in the classic Arrange → Act → Assert model (Martin Fowler on Given-When-Then).

Yet the dominant split for years has been:

Layer	Typical tooling
Web UI	Playwright
Native mobile UI	Appium (often with WebDriver-style clients)
Shared setup	Duplicated across two repos or two top-level trees

Teams end up with parallel helper libraries, duplicate seed scripts, and drift—web tests create users one way, mobile tests another, and failures become “which stack is wrong?” instead of “did we break the product?”

The Act and Assert steps should differ by surface: selectors, gestures, and viewport behaviour are platform-specific. The Arrange layer often should not.

Why Mobilewright changes the consolidation story

Mobilewright brings native iOS and Android automation closer to the Playwright mental model: async tests, auto-waiting, project matrices in config, and fixtures that feel familiar if you already run npx playwright test.

That alignment matters for multi-platform engineering, not only for “mobile testing” as an isolated workstream:

Same language and patterns (commonly TypeScript/JavaScript in one repo)
Same CI habits (config projects, parallel workers, artifact uploads)
Same opportunity to share code for factories, API clients, and database seeding

TestChimp already extended the plan → repo → agent → CI loop to native mobile (native mobile testing announcement). Multi-Platform Projects are the next step: one TestChimp project type and one tests tree for teams that ship web and mobile together.

What TestChimp Multi-Platform Projects provide

When you create a TestChimp project with type Multi-Platform, the platform scaffolds a single tests/ directory that includes:

web/ — browser SmartTests via Playwright (playwright.config.js, web/e2e/, web/pages/, web/fixtures/)
mobile/ — native UI tests via Mobilewright (mobilewright.config.ts, mobile/e2e/common|ios|android/, mobile/pages/, mobile/fixtures/)
api/ — platform-agnostic HTTP specs (often the fastest way to Arrange and to assert backend state)
shared/ — cross-suite helpers and fixture factories (seed users, auth builders)—excluded from test discovery, intended for reuse
setup/ — global setup run once before suites in both configs

Platform-specific UI code lives in platform-specific folders. Business logic that creates entities and prepares situations can live in shared/, api/fixtures/, or factories imported by both web and mobile specs.

tests/
  setup/
  shared/              ← shared Arrange logic (users, listings, payments, flags)
  api/
    fixtures/
  mobile/
    fixtures/
    pages/
    e2e/
      common/
      ios/
      android/
  web/
    fixtures/
    pages/
    e2e/
  playwright.config.js
  mobilewright.config.ts

Result for QA and platform teams:

Less duplicated infrastructure — one place to update “premium user with saved card”
Less maintenance — fix seeding once; web and mobile suites consume the same factories
More consistency — the same world-state definitions drive cross-platform regression
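As an illustration of shared Arrange logic, a factory under `shared/` might look like the sketch below. The file contents, type, and function names are hypothetical, and the example only builds the entity definition; a real factory would typically also call your seed endpoints or API fixtures.

```typescript
// Hypothetical contents of a factory module under tests/shared/.
interface TestUser {
  email: string;
  plan: "free" | "premium";
  savedCardLast4?: string;
}

let sequence = 0;

// Build a run-scoped, unique user definition so parallel web and mobile runs do not collide.
function buildUser(runId: string, overrides: Partial<TestUser> = {}): TestUser {
  sequence += 1;
  return {
    email: `user-${runId}-${sequence}@example.test`,
    plan: "free",
    ...overrides,
  };
}

// One definition of "premium user with saved card" that both suites would import.
function premiumUserWithSavedCard(runId: string): TestUser {
  return buildUser(runId, { plan: "premium", savedCardLast4: "4242" });
}

// A web spec and a mobile spec would each call the same factory.
const webUser = premiumUserWithSavedCard("run-1");
const mobileUser = premiumUserWithSavedCard("run-1");
console.log(webUser.email, mobileUser.email);
```

If the definition of a premium user changes, for example a new required field, the update happens in one function and both suites pick it up on the next run.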

Smart Steps (ai.act, ai.verify) remain web-only today; native mobile continues to use standard Mobilewright APIs for UI Act steps. For platform capabilities and CI notes, see Mobile testing.

One project, platform-specific coverage and UX intelligence

Consolidating tests in one repo does not mean blending web and mobile into one misleading coverage number. Multi-Platform Projects keep one TestChimp project and one plans/tests Git mapping, while treating Web, iOS, and Android as first-class execution platforms everywhere insights matter.

Think of it as: shared requirements and shared Arrange code, sliced execution and analytics per surface.

Area	What stays unified	What is platform-specific
Test plans	Markdown scenarios and user stories in `plans/`	Coverage and execution history per platform
TrueCoverage	Same project, env/release/branch scope	Production RUM + test attribution per platform
Atlas	Same product vocabulary (screens/states)	SiteMap tree, bugs, and baselines per platform

Requirement traceability (Test Planning)

Requirement traceability links scenarios in Git to SmartTest runs. On a Multi-Platform project, the Insights tab and scenario execution history respect an execution scope that includes platform alongside environment, release, branch, and time range.

Choose Web, iOS, or Android to see which scenarios passed or failed on that surface.
Drill into a user story to view execution history filtered to the platform you care about—useful when mobile lags web or when a shared scenario is covered by both web/e2e/ and mobile/e2e/ specs.
Folder roll-ups in Test Planning still work; the platform dimension answers questions like “Is checkout covered on iOS in QA this week?” without spinning up a second project.

Agents and CI should report runs with the correct platform identity (via @testchimp/playwright / Mobilewright reporter wiring) so linked // @Scenario: tests attribute to the right slice. Your plans can describe behaviour once; coverage status reflects where that behaviour is actually exercised.
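
To make the linking idea concrete, here is a minimal sketch of how `// @Scenario:` comments could be read out of a spec file. It assumes the scenario title is simply the rest of the line after the colon; the real annotation format may carry more structure, and `extractScenarios` is a hypothetical helper, not a TestChimp API.

```typescript
// Collect scenario titles from `// @Scenario:` comment lines in a test file's source.
// Assumption: the title is everything after the colon, trimmed.
function extractScenarios(source: string): string[] {
  const pattern = /^\s*\/\/\s*@Scenario:\s*(.+?)\s*$/gm;
  const titles: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    titles.push(match[1]);
  }
  return titles;
}
```

Running the same extraction over web/e2e/ and mobile/e2e/ files would show which scenarios are linked from each platform's specs.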

TrueCoverage (production-informed gaps)

TrueCoverage compares real user journeys (RUM) with automation coverage (test-tagged events). Each surface has its own instrumentation path—@testchimp/rum-js on web, testchimp-rum-ios and testchimp-rum-android on native—with TESTCHIMP_PROJECT_TYPE set to web, ios, or android as described in Instrumenting your app.

On Multi-Platform projects, the TrueCoverage execution scope offers the same Web / iOS / Android selector. That keeps comparisons honest:

Production events from the iOS app are not mixed with web test runs when you evaluate gaps.
Agents prioritizing fixtures and tests can target the platform where users actually hit the gap—for example high drop-off on Android checkout vs healthy web funnel.

Instrument every surface you ship; scope analytics one platform at a time when deciding what to automate next.
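
The per-platform comparison can be pictured as a set difference computed inside one platform's scope. The sketch below is an assumption-laden illustration: the `TrackedEvent` shape and `coverageGaps` function are invented for this example and do not describe TrueCoverage's actual data model.

```typescript
type Platform = "web" | "ios" | "android";

// Hypothetical event record; field names are illustrative.
interface TrackedEvent {
  name: string;
  platform: Platform;
  source: "production" | "test";
}

// Events seen in production on one platform that no test run on that same platform emitted.
function coverageGaps(events: TrackedEvent[], platform: Platform): string[] {
  const scoped = events.filter((e) => e.platform === platform);
  const tested = new Set(scoped.filter((e) => e.source === "test").map((e) => e.name));
  const gaps = new Set<string>();
  for (const e of scoped) {
    if (e.source === "production" && !tested.has(e.name)) {
      gaps.add(e.name);
    }
  }
  return Array.from(gaps).sort();
}
```

Because the filter runs first, a web test that emits an event does not hide a gap for the same event on Android.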

Atlas (UX bugs on the right surface)

Atlas is TestChimp’s app-structure map: screens and states, with UX and non-functional bugs tagged where ExploreChimp or SmartTests observed them. For multi-platform products, the SiteMap is not a single blurred tree—you browse and triage per platform.

A platform selector (Web, iOS, Android) loads the screen-state tree for that execution platform.
Bugs discovered during exploration or annotated runs are associated with screen-state context on that platform, so a layout regression on mobile does not drown in unrelated web noise.
markScreenState checkpoints in web Playwright tests and mobile Mobilewright tests feed the vocabulary ExploreChimp and Atlas use; platform-specific folders keep Act steps separate while structure stays comparable across surfaces.

That matters for engineering leads reviewing quality: you open Atlas, pick iOS, and see UX issues on the iOS SiteMap—assign owners per screen, run targeted ExploreChimp from a node, and track fix status without conflating desktop-only flows.

Arrange vs Act: what to share (and what not to)

Phase	Web	Mobile	Share?
Arrange	API/fixtures/DB seed	Same backends	Yes — prefer `api/`, `shared/`, or backend fixtures
Act	Playwright locators & navigation	Mobilewright gestures & native selectors	No — keep under `web/` and `mobile/`
Assert	DOM + optional API probes	Native UI + optional API probes	Often partial — API assertions can be shared; UI assertions stay local

This is the same insight as fixtures and Object Mother patterns in xUnit-style testing (xUnit Test Patterns — test fixture, Object Mother): push incidental complexity of setup out of the test body and into reusable, composable building blocks. Agents authoring tests benefit even more when Arrange is API-backed rather than repeated through slow UI clicks (fixtures in agentic automation).
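
A minimal sketch of API-backed Arrange follows. The seed endpoint, payload, and response shape are hypothetical, and the HTTP call is injected as a function so the example does not depend on any particular client; in a Playwright spec, that function could wrap the built-in request context.

```typescript
// Minimal HTTP surface the Arrange helper needs (an assumption for this sketch).
interface ApiResponse {
  status: number;
  body: { id?: string };
}
type PostJson = (path: string, data: unknown) => Promise<ApiResponse>;

// Hypothetical seed endpoint and payload; your backend's paths will differ.
async function arrangePremiumUser(post: PostJson): Promise<string> {
  const res = await post("/api/test/seed-user", { plan: "premium", savedCard: true });
  if (res.status !== 201 || !res.body.id) {
    throw new Error(`Seeding failed with status ${res.status}`);
  }
  // The returned id is what web and mobile Act steps would log in with.
  return res.body.id;
}
```

Keeping the helper free of UI code is what lets both web/ and mobile/ specs call it before their platform-specific Act steps.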

How to get started

Sign in to TestChimp and open Add project.
Choose project type Multi-Platform (web + native mobile in one codebase).
Connect Git and map your plans/ and tests/ folders (same workflow as web-only projects).
Run your usual agent workflow—for example /testchimp test after a PR—using the TestChimp skill on Claude or Cursor.

Docs to read next:

Mobile testing (iOS and Android) — configs, CI, and web vs mobile capability table
Creating SmartTests — Playwright additions on web; Mobilewright on native
Instrumenting your app for TrueCoverage — RUM on web and native when you want production-informed coverage

If your team already runs separate web and mobile automation repos, migrating Arrange into shared/ and api/ first—before moving UI specs—is usually the lowest-risk path. You keep platform runners; you stop duplicating the world behind them.

Frequently asked questions

Is Multi-Platform the same as creating separate web and mobile TestChimp projects?

No. Multi-Platform is one project and one scaffold where both Playwright and Mobilewright configs and folder layouts coexist. Separate Web and Mobile project types still exist when you only need one surface.

Do I have to abandon Appium to use this?

TestChimp’s native path is Mobilewright, not Appium. Teams often adopt it when they want Playwright-like authoring and shared TypeScript with web suites. If you are standardized on Appium, weigh the cost of maintaining duplicate Arrange code against migrating Act layers over time, and consider centralizing setup in API tests first.

Can API tests really replace UI for Arrange?

For many domains, yes—and Playwright’s request context (and direct HTTP clients in api/*.spec.js) are the fastest, least flaky way to reach a given situation. UI Act remains necessary to validate what users see and tap; UI Arrange is usually optional once APIs or admin seeds exist (QA in production).

What’s the biggest win if we already have Playwright on web?

The win is often consolidation of test infrastructure, not “another mobile runner.” Mobilewright lets mobile join the same repo conventions as web so agents and engineers maintain one mental model for fixtures, plans, and CI.

If plans and tests are in one repo, is coverage merged across web and mobile?

No—not by default. Requirement coverage, TrueCoverage comparisons, and Atlas navigation use an explicit platform dimension on Multi-Platform projects (Web, iOS, Android). Shared scenarios in plans/ can be linked from both web/ and mobile/ tests; the platform scope shows where those links actually ran and passed.

TestChimp now supports native mobile testing

May 23, 2026 · 4 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

TL;DR: TestChimp now supports native mobile app testing on both iOS and Android. This brings the same seamless workflow we unlock for your web testing - just say "/testchimp test".

TestChimp native mobile testing support

What shipped

Mobile is not a separate product bolted on the side. It is the same plan → repo → agent → CI loop you use for web SmartTests, extended to native apps via Mobilewright—a Playwright-style API and toolchain for iOS and Android.

Create a TestChimp project with project type iOS or Android, connect Git for your plans and tests folders, install the TestChimp skill on Claude or Cursor, and after each PR say /testchimp test. The platform keeps doing what you expect: wiring RUM, reading scenarios, closing coverage gaps, and surfacing analytics—now on screens that live inside your app, not only in the browser.

For setup details and parity tables, see Mobile testing (iOS and Android).

Five value props for Claude-based test authoring—four are live on mobile

TestChimp’s agentic QA model rests on five pillars. On native mobile, four are fully supported today:

Value prop	What it gives you	Mobile status
Requirement traceability	Plans ↔ tests feedback loop; scenarios stay linked to coverage	Supported
TrueCoverage	Real user behaviour ↔ tests feedback loop; production informs what to automate	Supported
QA workflow execution	Seed/probe endpoints, fixtures for reusable world-states, test authoring, scenario linking	Supported
ExploreChimp	Analytics on screenshots, logs, and network from exploratory runs	Supported
Smart Steps	Intent-based steps in test scripts (`ai.act`, `ai.verify`, …)	Not yet

Smart Steps remain web-only for now. Native mobile tests use standard Mobilewright APIs for UI interaction—the same deterministic, async execution model you know from Playwright, without the intent-comment layer on top.

Everything else—the closed loops between requirements, production behaviour, fixtures, and tests—carries over.

The same seamless workflow as web

You do not need a new playbook. The habit stays the same:

Install the TestChimp skill on Claude or Cursor.
After each PR, run /testchimp test (or your team’s equivalent in the agent host).

TestChimp then orchestrates the work you would otherwise stitch together manually:

RUM libraries — Wire up testchimp-rum-ios and testchimp-rum-android so production and test runs speak the same event vocabulary.
Instrumentation — Understand real user behaviour: segments, interaction flows, and scenarios—not just “the app launched.”
Plans and stories — Read markdown scenarios, pull requirement traceability insights, and see what is still untested.
Test authoring — Author Mobilewright tests to cover gaps, with traceability annotations where your plan expects them.
Spot analytics — Run ExploreChimp-style analysis on new screens: visuals, logs, network.

You still get continuous transparency of QA posture in one platform—requirements, coverage, failures, and exploration—whether the surface is a browser tab or a native view controller.

Familiar tests, less flakiness

Mobile tests are authored in a Playwright-familiar style via Mobilewright: auto-waits, async execution, and fixtures that behave like the ecosystem you already trust on web. That consistency matters when agents (and humans) move between repos that ship both web and mobile.

Fair credit where it is due: the reliability characteristics of that execution model come from Mobilewright—and we are grateful they exist. Mobilewright moved our timeline for serious native support forward by at least a year. If you need cloud-hosted real devices in CI, Mobile Use integrates with the same stack.

What to do next

Docs: Mobile testing (iOS and Android) — feature parity, CI notes, and links to RUM instrumentation.
TrueCoverage on device: Instrumenting your app for TrueCoverage.
Agent commands: QA on Autopilot — init, test, explore, evolve.

If you are already on TestChimp for web, create an iOS or Android project, point Git at your plans and tests folders, and run /testchimp test on your next mobile PR. Smart Steps will follow; the feedback loops you care about for shipping quality are already there.

SKILLs are becoming SaaS’s best distribution hack (here’s why)

May 11, 2026 · 3 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

For years, the hardest part of selling a complex technical product was not the demo—it was the learning curve. Buyers had to internalize workflows, edge cases, and “the right way” to use each feature before they could reliably get value.

That is changing fast. Agent Skills—portable folders of instructions, checklists, and resources that teach an AI agent how to work with your product—are starting to look like one of the most attractive distribution mechanisms for technical SaaS. Instead of hoping every customer reads the docs in the right order, you ship a repeatable operating procedure the agent can follow on demand.

A skill turns every “new user” into a “power user”

A well-designed Agent Skill effectively turns every user into a power user: one that knows which workflows to follow, how to use the product correctly, and how to extract maximum value from every feature.

That compresses time-to-value—the path to the “aha moment”—because the agent is not improvising from vague prompts; it is executing your intended playbook.

What we are seeing at TestChimp

We have been seeing this firsthand since launching the TestChimp Agent Skill.

For teams, the workflow is intentionally simple:

Author a few user stories (or import from Jira).
Install the TestChimp skill on your coding agent.
After each PR, simply say /testchimp test.

The skill teaches Claude how to coordinate with TestChimp to:

instrument the app for TrueCoverage,
fetch and interpret coverage gaps,
write tests that address the gaps and link them to scenarios correctly,
run targeted exploratory testing to catch UX issues,
and use AI-native test steps in tests where they help.

The upgrade loop: your perfect user ships with your product

The best part is what happens when you ship new features.

With a properly designed, self-updating TestChimp Agent Skill, your "user" continuously learns your latest workflows, capabilities, and best practices—and applies them the way you intended. Your agent-side “instruction manual” can move as fast as your product, without requiring every human user to re-read release notes and learn every new capability you ship.

If you are building technical SaaS in the agent era, the product surface area is no longer only your UI and APIs. It is also the skill: the packaged expertise that turns your users into power users.

References and further reading

Authoritative guides and registries for Agent Skills (format, discovery, and ecosystem):

Boiling the lake - QA style

April 28, 2026 · 3 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

Boil the lake - credits: https://garryslist.org/posts/boil-the-ocean

Garry Tan recently introduced a simple but powerful idea: The old adage “don’t boil the ocean” is bad advice in the AI agent era. Well - at the very least, “lakes” are now very much “boilable”.

The core insight is: AI compresses certain work by orders of magnitude. That doesn’t just make things faster - it fundamentally changes what’s feasible.

Most people ask the wrong question:

“What existing human workflows can we speed up with AI?”

That’s incremental thinking. The real leverage comes from asking:

“What powerful workflows did we avoid entirely because they were too expensive to do with humans?”

Those are your “lakes”. And with AI, many of them go from infeasible → trivial.

The QA lake

In QA - making “test authoring faster” is akin to the former. The bigger ROI lies in the granular workflows that get unlocked now that agents can take autonomy in your test automation.

The Big Idea:

Could agents execute a workflow where they continuously monitor “planned reality” (user stories / scenarios) and “production reality” (real user behaviour patterns) to improve the “tested reality” (test suite + test infra), in a continuous feedback loop? All of it would run in the background, looping you in to approve the plans it makes.

Feedback Loop enabled by TestChimp

This is exactly the future we were building TestChimp for: one where agents participate in each phase of QA and access real-world insights and plan artifacts to self-direct their work strategically.

Claude + TestChimp

Today, we are adding the final piece of the puzzle: A SKILL that you can install on Claude / Cursor that enables just that.

In TestChimp, test plans are already maintained as Markdown files in the repo, directly accessible to agents.
Requirements are linked to tests via in-code comments that agents can author.
Test executions are auto-tracked by our Playwright plugin.
Event ingests are tracked across prod and test to generate TrueCoverage insights.

The Skill “upskills” Claude to read those insights via our CLI / MCP, to plan and execute the entire QA workflow:

Understand coverage gaps, prioritize (using signals exposed by TestChimp) and plan
Author fixtures that emulate real-world situations observed
Update test infrastructure (seed / probe endpoints) as needed
Author tests - (provisioning PR-local envs to test in and validating tests work)
Update instrumentations to learn about real user behaviour (for future cycles - covering new user journeys introduced)

QA workflow orchestrated by TestChimp - Overview

The best part: All of this is condensed to just 2 commands - enabling a frictionless DevX:

/testchimp test -> (Run after each PR) Updates plans, authors seeds / fixtures, authors tests, validates them in PR-scoped isolated environments, and instruments code for TrueCoverage.
/testchimp evolve -> (Run periodically / on deploy) Audits test coverage against requirements and real-user insights, then “evolves” your QA infra and test suite to cover critical under-tested areas, takes corrective actions, and runs targeted exploratory runs.

Claude can write tests. With the right feedback loop, it can fully manage an effective, self-evolving QA posture that de-risks your product continuously. This is what TestChimp enables, by making each phase of QA agent-native, informed by requirements and real user behaviour insights, in a tight feedback loop.

Shift-Left with Git Branch-Aware Testing

March 5, 2026 · 4 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

The traditional QA bottleneck is a well-known pain point for modern development teams. For years, the industry has pushed to "shift-left" – to move testing earlier in the development lifecycle. However, a major technical hurdle has always remained: the environment gap.

When QA happens on a global "staging" environment or only after code merges to the main branch, the feedback loop is too slow. Bugs found post-merge cause expensive context-switching for developers and delay releases.

Today, we’re bridging that gap. We’ve added full branch awareness to the TestChimp platform, enabling true shift-left testing at the PR level.

Shift-Left Git Testing

Why Branch-Aware Testing?

Branch-aware testing means your QA process mirrors your Git workflow. Instead of testing "the app," you test the "feature-in-progress."

1. Test Authoring at the Feature Level

You can now switch between repository feature branches directly within TestChimp. File versions are maintained per branch, allowing QAs to sync with branch-specific remote content.

Most importantly, QAs can author tests and raise Pull Requests from TestChimp that merge directly into the feature branch. This ensures that by the time a developer is ready to merge their code, the corresponding tests are already part of the PR.

[!TIP] Security & Outsourcing: Our new GitHub App-based approach means you don't need to give external QA resources full repository access. They can work exclusively on the tests and plans folders (with PRs raised via TestChimp platform), maintaining a tight security posture.

2. Branch-Specific Test Execution

Gone are the days of manually pointing tests at different URLs. In your project settings, you can now configure a template string for branch-specific deployment URLs (e.g., Vercel or Netlify preview URLs).

When you run tests on a branch, TestChimp resolves the correct URL and injects it as a BASE_URL environment variable. Your scripts simply consume process.env.BASE_URL, ensuring they always target the correct preview deployment.
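
As a sketch of the resolution step, the example below expands a URL template for a branch and then reads the injected value. The `{branch}` placeholder syntax and both function names are assumptions made for illustration; the placeholder format used in TestChimp project settings may differ.

```typescript
// Assumption: the template marks the branch position with a `{branch}` placeholder.
function resolveBranchUrl(template: string, branch: string): string {
  // Preview hosts usually need a DNS-safe slug, so slashes and underscores become hyphens.
  const slug = branch
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return template.split("{branch}").join(slug);
}

// Test code reads the injected value instead of hard-coding a host.
// `env` stands in for process.env so the sketch has no Node-specific dependency.
function baseUrlFrom(env: Record<string, string | undefined>, fallback: string): string {
  return env["BASE_URL"] ?? fallback;
}
```

In a Playwright config, the value returned by `baseUrlFrom(process.env, ...)` would typically be assigned to `use.baseURL`, so relative `page.goto("/cart")` calls follow the branch deployment.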

Branch Management UI

3. Exploratory Testing & Smart Bug Diffing

Exploratory testing is no longer a "post-release" activity. All exploratory runs can now be executed against the branch-specific deployment.

Our agents are now smart enough to report only new bugs found on the feature branch compared to the default branch. This allows you to instantly see what UX, performance, accessibility, or internationalization issues were introduced by a specific PR – before they ever touch production.
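
Conceptually, reporting only new bugs is a difference between two sets of findings. The sketch below assumes a simple fingerprint built from screen, category, and a normalized summary; how TestChimp actually matches bugs across branches is not described here, so treat `fingerprint` and `newBugsOnBranch` as hypothetical.

```typescript
// Hypothetical bug record; field names are illustrative.
interface BugReport {
  screen: string;
  category: string;
  summary: string;
}

// Assumption: two reports are the same bug when screen, category, and normalized summary match.
function fingerprint(bug: BugReport): string {
  return [bug.screen, bug.category, bug.summary.trim().toLowerCase()].join("|");
}

// Keep only the branch findings that the default branch does not already have.
function newBugsOnBranch(branchBugs: BugReport[], defaultBugs: BugReport[]): BugReport[] {
  const known = new Set(defaultBugs.map(fingerprint));
  return branchBugs.filter((bug) => !known.has(fingerprint(bug)));
}
```

A stricter or fuzzier fingerprint changes what counts as "new", which is the main design decision in any diff of this kind.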

4. QA Intelligence: Sliced by Branch

In the Atlas page, you can now filter results by branch to see exactly how a specific screen or flow was affected by a PR. This level of granularity allows teams to answer the questions that actually matter during code review:

"What user stories are breaking in this PR?"
"Are unrelated scenarios being affected by these changes?"

Seamless CI Integration

If you already have a CI pipeline that generates preview URLs, TestChimp fits right in. Simply pass that preview URL as the BASE_URL environment variable in your CI action, and your tests will execute against the live branch deployment with zero extra configuration.

Strategic Planning, Tactical Execution

While test authoring and execution are now branch-aware, we’ve intentionally kept Test Planning artifacts product-scoped.

Strategy should be stable. Planning artifacts continue to sync with the repo's default branch, ensuring your high-level test coverage goals remain consistent even as individual features are developed and tested in parallel branches.

The Future is Shift-Left

By moving QA participation closer to the development phase, you’re not just catching bugs – you’re preventing them from ever reaching the main branch. Branch-aware testing turns QA from a gatekeeper into a core part of the feature development engine.

Special Purpose Testing Agents

January 14, 2026 · 3 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

If you’re already familiar with ExploreChimp, you know it’s like having a driver navigate your web app for you. Guided by SmartTests, ExploreChimp scans the DOM, screenshots, network calls, and browser metrics to spot bugs that traditional automation scripts can't see.

While ExploreChimp gives you broad coverage, some problems only show up when you deliberately push specific edges.

That’s why we’re launching our "Troop of Special Purpose Testing Agents".

They’re all guided by the same SmartTests, but each agent is purpose-built to tackle a specific class of bugs.

Here is the starting lineup that we are launching today:

Form Validation Tester: Meet “Deadpool”

Writing negative test cases for forms is soul-crushing work. You have to think of every wrong input a user could throw at your app, then write tests to catch it.

Form Validation Agent In Action

Our Form Validation Tester, affectionately nicknamed Deadpool, does all the heavy lifting for you. You only need to define the “Happy Path” - the correct way to fill a form. From there, Deadpool goes rogue, trying inputs such as:

Past and future dates
Negative numbers
Random strings
Whitespace-only inputs
Invalid data formats, like numbers in text fields

It pushes your forms to the limits to ensure your validation logic holds up - all without writing a single line of negative test code.
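
To illustrate the idea of deriving negative cases from a happy path, here is a small sketch. The field kinds, the specific bad values, and both function names are assumptions chosen for the example; they do not describe how the agent itself generates inputs.

```typescript
type FieldKind = "date" | "number" | "text";

// One field of the happy-path form fill (hypothetical shape).
interface HappyField {
  name: string;
  kind: FieldKind;
  value: string;
}

// Candidate bad values per field kind, mirroring the categories listed above.
function negativeVariants(field: HappyField): string[] {
  const common = ["", "   "]; // empty and whitespace-only
  switch (field.kind) {
    case "date":
      return [...common, "1900-01-01", "2999-12-31", "not-a-date"];
    case "number":
      return [...common, "-1", "abc"];
    case "text":
      return [...common, "12345", "zx9!qv"];
  }
}

// Each case breaks exactly one field and keeps every other field on the happy path.
function negativeCases(form: HappyField[]): Array<Record<string, string>> {
  const cases: Array<Record<string, string>> = [];
  for (const target of form) {
    for (const bad of negativeVariants(target)) {
      const filled: Record<string, string> = {};
      for (const f of form) {
        filled[f.name] = f === target ? bad : f.value;
      }
      cases.push(filled);
    }
  }
  return cases;
}
```

Changing one field at a time keeps a failure attributable to a single validation rule.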

Theme Tester: Spot Invisible Problems

Themes are more than just aesthetic. Switching between dark mode, high contrast, or custom color palettes can break visual harmony that users expect to "just work".

Theme Agent In Action

The Theme Tester loops through all the themes your app supports, hunting for:

Contrast issues
Text visibility problems
Ugly color combinations

It can toggle themes via cookies, local storage, or by interacting with your app’s UI - whatever works best for your setup.
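
One common way to quantify a contrast issue is the WCAG 2.x contrast ratio. The sketch below implements that published formula for `#rrggbb` colours; whether the Theme Tester uses this exact check is an assumption, and the function names are made up for the example.

```typescript
// Linearize one sRGB channel read from a "#rrggbb" string (WCAG 2.x relative luminance definition).
function channel(hex: string, start: number): number {
  const c = parseInt(hex.slice(start, start + 2), 16) / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  return 0.2126 * channel(hex, 1) + 0.7152 * channel(hex, 3) + 0.0722 * channel(hex, 5);
}

// Ratio of the lighter to the darker colour, each offset by 0.05; ranges from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG AA requires at least 4.5:1 for normal-size text.
function passesAA(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

Running a check like this for each theme's text and background pairs turns "ugly or unreadable" into a number that can be compared across themes.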

Localization Tester: No More "Lost in Translation"

Supporting multiple locales introduces a whole new set of bugs. Dates, currencies, text overflow, RTL layouts, and even cultural appropriateness can break your user experience.

Localization Agent Configuration

Our Localization Tester handles it all:

Detects broken translations or dangling template strings
Checks date and currency formatting across locales
Verifies layout integrity in RTL languages
Flags potential cultural missteps

With this agent on your team, you can support global audiences with confidence.
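
Dangling template strings are one of the easier localization bugs to check mechanically. The sketch below scans rendered text for a few common placeholder syntaxes; the list of syntaxes is an assumption, the helper is hypothetical, and a simple pattern like this can produce false positives on text that legitimately contains braces or percent signs.

```typescript
// Matches unresolved placeholders such as {{user.name}}, ${amount}, {count}, %s, and %d.
// Assumption: your i18n library uses one of these syntaxes; adjust the pattern otherwise.
const DANGLING_PLACEHOLDER = /\{\{[^{}]*\}\}|\$\{[^{}]*\}|\{[A-Za-z_][\w.]*\}|%[sd]/g;

// Return every placeholder that survived into text the user would actually see.
function findDanglingPlaceholders(renderedText: string): string[] {
  return renderedText.match(DANGLING_PLACEHOLDER) ?? [];
}
```

Applied to the visible text of each screen per locale, an empty result means no obvious placeholders leaked through.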

Screen Discovery Agent: Building Your App’s GPS

No test scripts yet? No problem.

The Screen Discovery Agent methodically crawls your app, visiting key screens. It automatically generates your initial SmartTest suite, so the rest of the troop can get to work.

Expanded Behaviour Map

[!TIP] You can easily add more user journeys with our Chrome Extension.

More Agents Coming Soon

This is just the beginning. We’re already working on more powerful agents, including:

RBAC Tester – to verify role-based authorizations work as intended
Network Resilience Checker – to see how your app behaves when connectivity gets fuzzy, backend breaks...

And we’re always looking for more ideas. If you’ve got a pain point in testing, we want to hear about it!

Ready to Try the Troop?

Stop stressing over the worst parts of testing. Let our agents handle the tedious tasks so you can focus on what really matters: building amazing experiences.

Try the Troop today: https://testchimp.io
Read the docs: https://docs.testchimp.io/explorations/intro

From Bug Report to Pull Request: The TestChimp x OpenHands Integration

January 5, 2026 · 3 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

Let’s be honest. Finding a bug is only the start.

Then comes the context switching – reproducing the issue, digging through logs, writing a fix that doesn’t break something else…

As of today, that workflow is outdated.

Fix Bug Cover

Today, we are launching our OpenHands Integration. This isn’t just a “Chat with AI” wrapper. It is a fully automated pipeline that takes a bug found in TestChimp and turns it into a ready-to-merge Pull Request in your GitHub repository. Here is how it works, why it actually fixes things (instead of hallucinating), and how to set it up.

The "Context Gap" (Why AI usually fails at debugging)

Most AI coding agents are smart, but blind. You tell them “The cart button is broken,” and they hallucinate a fix because they can’t see the state of the application. We solved the Context Gap. When TestChimp captures a bug (whether manually or via our automated agents), we record the entire runtime reality of that failure. When you click “Fix” via the OpenHands integration, we feed the cloud agent the complete necessary context including:

Visual Bounding Boxes: We show the agent exactly where the bug is physically located on the screen.
API Payloads: The agent sees the actual network requests and response bodies that triggered the error.
Console Logs: JavaScript errors, warnings, and stack traces captured at the exact moment of failure.
DOM Context: The full element selectors and structure information.
Screen-State: Specifics on which screen and state the app was in.

The OpenHands agent doesn’t guess anymore. It fetches these artifacts on-demand, analyzes your codebase, and writes a precise fix.

The Workflow: One Click, Real Code

We built this for speed. Here is what the new flow looks like:

1. Spot the Bug (or Batch Them)

You can select a single bug or use the checkboxes to select multiple bugs at once. If you have five related UI glitches, select them all. The agent is smart enough to identify common issues, group them, and address them together.

2. Click “Fix”

Hit the tool icon next to the bug. TestChimp validates your config and sends the context package to the OpenHands cloud instance.

Fix Bug Screenshot

3. Watch it Work Live

This is the cool part. We pop a success modal with a direct link to the OpenHands Conversation. You can click that link and watch the agent “think” in real-time. You see it analyzing the screenshots, reading the API logs, and reasoning through the code changes.

4. Review the PR

Once the agent is done, it automatically raises a Pull Request in your connected GitHub repository. You review the code, run your CI, and merge.

Technical Setup (How to turn it on)

This feature is available now for TestChimp Teams subscribers.

Prerequisites: You need an OpenHands account (cloud or self-hosted), and your GitHub repository must be connected to both OpenHands and TestChimp.

Note: The repo connected in OpenHands must match the repo configured in TestChimp.

Configuration Steps:

Go to Project Settings -> Integrations -> OpenHands.
Enter your OpenHands API Key.
Select your Installation Type (Cloud or Self-hosted).
Click Save Configuration.

Why this matters

We are moving from “Bug Tracking” to “Bug Killing.” By giving an autonomous agent access to on-demand artifacts like bounding boxes and DOM states, we are removing the manual labor from regression testing. Stop fixing bugs manually. Let the chimp handle it.
