How to Find Duplicate Tests in a Playwright Suite (Semantic Graph for Agentic QA)

July 4, 2026 · 10 min read

Founder & CEO, TestChimp

TL;DR: When coding agents can write dozens of Playwright tests in a single session, the bottleneck shifts from authoring to governance: are the new tests distinct and useful, or just near-duplicates of what you already have? Semantic Graph is a free, open-source CLI that scans your suite, embeds each test semantically, clusters related tests, and renders an interactive graph so you—and your agent—can spot redundancy before it compounds.

Semantic Graph visualization — folder tree, 2D similarity graph, and cluster list view

The new problem: agents author tests en masse

For most of the last decade, the hard part of E2E testing was throughput: humans could not write and maintain enough tests to keep up with product velocity.

That constraint is collapsing. With Claude Code, Cursor, and agent skills like the TestChimp skill, a single prompt can produce a folder of well-formed Playwright specs in minutes. Coverage gaps that used to take a sprint to close can shrink to an afternoon.

The bottleneck has moved.

Era	Primary constraint	What "good" looked like
Manual QA	Authoring speed	Enough tests to cover the happy path
Human + low-code tools	UI-layer setup friction	Stable POMs, fewer flakes
Agentic QA	Suite quality at scale	Distinct, high-signal tests—not copies

When an agent is rewarded for adding tests—closing coverage gaps, responding to PR feedback, or filling in scenarios from a test plan—it has no innate sense of "this already exists, slightly reworded." Left unchecked, suites balloon with:

Duplicate tests that assert the same behaviour under different titles
Near-duplicates that differ only in fixture data or selector phrasing
Clustered redundancy where five tests all exercise the same checkout edge case
Invisible overlap across folders, because no human (and no agent) holds the entire suite in working memory

This is the QA equivalent of boiling the lake in the wrong direction: lots of heat, little new coverage. Worse, duplicate tests inflate CI time, confuse failure triage, and give a false sense of depth—your line count grows while your behavioural breadth stalls.

The question is no longer "Can we write more tests?" It is:

"Are we writing useful, distinct tests—or just duplicative ones?"

That question needs a semantic answer, not a filename diff.

What is Semantic Graph?

Semantic Graph is an open-source tool from TestChimp that maps your Playwright test suite by meaning, not syntax.

It is published as @testchimp/semantic-graph on npm and lives in the TestChimp/semantic-graph repository. Run one command against your tests directory; the CLI:

Scans *.spec.ts, *.test.ts, and related Playwright files
Parses each test's suite path, title, intent comments, scenario annotations, and body
Embeds the canonical test text with an embedding model (OpenAI or Voyage AI)
Clusters tests by semantic similarity using DBSCAN
Lays out a 2D graph with UMAP so similar tests appear close together
Names clusters with a lightweight LLM pass (e.g. "auth", "checkout", "api-contracts")
Serves a local interactive UI at http://localhost:3859

No database. No TestChimp account required. Embeddings are computed in memory each run—ideal for local audits, pre-merge reviews, or giving an agent a structural view of the suite before it authors more tests.

How it works (the pipeline)

Understanding the pipeline helps you interpret the graph—and tune how agents use it.

1. Parse tests into embedding-ready text

The core library (@testchimp/semantic-graph-core) includes a vendored Playwright-aware parser. For each test it builds canonical text:

Suite: checkout > guest flow
Test: rejects expired coupon at payment step
Body:
Scenario: Guest checkout with invalid coupon
// intent: verify error copy and no charge created
await page.goto('/checkout');
...

Parsing captures intent comments and scenario annotations—the same metadata agents should be authoring anyway when following requirement traceability conventions. Two tests with different selectors but the same intent will land close together in embedding space.

2. Embed with cosine similarity

Each test's text is sent to an embedding API in batches (default model: text-embedding-3-small for OpenAI, voyage-4 for Voyage). The tool computes cosine similarity between vectors and applies configurable thresholds:

Signal	Default threshold	Meaning
Graph edge	≥ 0.75	Tests are semantically related
Similar	≥ 0.80	Worth reviewing together
Potential duplicate	≥ 0.92	Strong dedup candidate

These thresholds mirror how humans judge redundancy: not byte-identical, but "would a failure in one make the other pointless?"

3. Cluster with DBSCAN

Similar embeddings are grouped with DBSCAN density clustering—no need to pick k clusters upfront. Each cluster gets an LLM-generated label (e.g. "settings-page", "admin-tasks") so the legend is readable at a glance.

4. Visualize with UMAP + D3

A seeded UMAP projection maps high-dimensional embeddings to 2D coordinates. The bundled UI (built with D3.js) renders:

Graph view — nodes as tests, edges as similarity links; click a node to see nearest neighbours and duplicate flags
Clusters view — grouped list with colour-coded legend
Folder tree — scope the graph to a directory or single file

Zoom into tests/checkout/ before a refactor. Scan the whole suite before a release. Hand the URL to an agent and ask it to propose merges.

Why this matters for agentic QA workflows

Semantic Graph is not a replacement for TrueCoverage—production-informed prioritization—or requirement traceability. It solves a orthogonal problem: intra-suite redundancy.

Here is where it fits in a modern agent loop:

Before the agent writes

Run Semantic Graph and attach the cluster summary to the agent's context. Instructions become concrete:

"We already have four tests in the checkout cluster covering coupon validation. Do not add another unless you are testing a different failure mode."

This is cheaper and more reliable than asking the agent to grep test titles.

After the agent writes

Re-run the graph on the PR branch. New nodes that snap onto existing clusters—or spike duplicate scores above 0.92—are review flags. Pair with CI the same way you gate on lint or coverage deltas.

During suite health reviews

Quarterly "suite diet" sessions used to mean spreadsheets and gut feel. Now: filter to clusters with high internal similarity, merge or delete, and measure CI time recovered.

Complement to production signals

TrueCoverage tells you what behaviours users need tested. Semantic Graph tells you whether your existing tests are saying the same thing twice. Both are necessary for a suite that is broad and lean.

What you see in the UI

The demo above shows the full workflow:

Left panel — folder tree mirroring your repo layout; click a folder or file to scope the view
Graph mode — force-directed layout; proximate nodes are semantically alike
Clusters mode — tests bucketed with named themes
Popover — click any test to see top similar neighbours, similarity scores, and potential duplicate badges

The UI ships inside the npm package—no separate install. It is the same "freebie" static app published as @testchimp/semantic-graph-viz in the monorepo for anyone who wants to embed or fork it.

Try it yourself

Prerequisites

Node.js 18+
An API key for embeddings (and cluster naming):
- OpenAI — one key covers embeddings + LLM, or
- Anthropic + Voyage — Claude for cluster labels, Voyage for embeddings (Anthropic does not ship an embedding API)

Quick start (OpenAI)

export PROVIDER=openai
export API_KEY=sk-...

npx @testchimp/semantic-graph visualize --tests-dir ./tests

Open the printed URL (default port 3859). Add --verbose for embedding progress and diagnostics.

Claude + Voyage

export PROVIDER=anthropic
export API_KEY=sk-ant-...
export VOYAGE_API_KEY=pa-...

npx @testchimp/semantic-graph visualize --tests-dir ./tests

All options

Flag	Description
`--tests-dir <path>`	Root folder to scan (required)
`--port <n>`	Listen port (default `3859`)
`--verbose` / `-v`	Diagnostics to stderr

See the README for environment variables, monorepo build instructions, and npm publish details.

Continuous governance with TestChimp

Semantic Graph is deliberately local and standalone—a flashlight you can shine on any Playwright repo, TestChimp customer or not.

For continuous duplicate detection, requirement traceability, release confidence, and keeping suites healthy as agents keep authoring, see TestChimp—the git-native QA governance platform built for agentic teams. Install the TestChimp Agent Skill and run /testchimp test after each PR to orchestrate coverage, exploration, and plan alignment in one loop.

FAQ

What test file types are supported?

The scanner picks up *.spec.ts, *.spec.js, *.test.ts, *.test.js, and .mjs / .cjs variants under your chosen root—standard Playwright test layouts.

Does it require a TestChimp account?

No. Semantic Graph runs entirely locally. You only need embedding (and optionally LLM) API keys.

How is this different from code coverage?

Code coverage measures which lines executed. Semantic Graph measures whether test intentions overlap. A suite can have high line coverage and still be full of redundant scenarios.

How is this different from duplicate detection by test name?

Titles lie. Agents especially love paraphrasing: "should reject invalid coupon" vs "guest user sees error for expired promo code." Embeddings capture the full body and intent, not the string on line one.

Can I use it in CI?

Today the primary interface is the local visualize command and JSON APIs (/api/graph, /api/similar). For CI gates, parse the API responses or run before review and archive the graph output. Continuous server-side governance is on the TestChimp platform roadmap.

What embedding models are supported?

Defaults: text-embedding-3-small (OpenAI) and voyage-4 (Voyage). Override with EMBEDDING_MODEL. LLM cluster naming defaults to gpt-5-nano or claude-3-5-haiku-latest.

Is the source code open?

Yes. MIT-licensed monorepo: github.com/TestChimp/semantic-graph. Packages: @testchimp/semantic-graph-core, @testchimp/semantic-graph, @testchimp/semantic-graph-viz.

Summary

Agentic QA solved test authoring at scale. The next discipline is test distinctness at scale—ensuring every new spec adds behavioural breadth, not noise.

Semantic Graph gives you a semantic map of your Playwright suite: embeddings for meaning, DBSCAN for clusters, UMAP for intuition, and a local UI for humans and agents alike. Run it before you merge agent-authored tests. Run it when CI gets slow. Run it when you suspect the lake is boiling but not reducing risk.

Get started: github.com/TestChimp/semantic-graph · npx @testchimp/semantic-graph visualize

References and further reading

TestChimp Semantic Graph repository — source, README, and issue tracker
@testchimp/semantic-graph on npm — CLI package
Playwright Test documentation — supported project layouts
OpenAI Embeddings guide — text-embedding-3-small and related models
Voyage AI documentation — embeddings when using Claude as the LLM provider
UMAP: Uniform Manifold Approximation and Projection — dimensionality reduction for the 2D layout
DBSCAN clustering — density-based cluster assignment
Fixtures in agentic test automation — complementary TestChimp blog on Arrange-layer quality
TrueCoverage for agentic QA — production-informed test prioritization
TestChimp Agent Skills — orchestrate QA workflows in Claude and Cursor

The new problem: agents author tests en masse​

What is Semantic Graph?​

How it works (the pipeline)​

1. Parse tests into embedding-ready text​

2. Embed with cosine similarity​

3. Cluster with DBSCAN​

4. Visualize with UMAP + D3​

Why this matters for agentic QA workflows​

Before the agent writes​

After the agent writes​

During suite health reviews​

Complement to production signals​

What you see in the UI​

Try it yourself​

Prerequisites​

Quick start (OpenAI)​

Claude + Voyage​

All options​

Continuous governance with TestChimp​

FAQ​

What test file types are supported?​

Does it require a TestChimp account?​

How is this different from code coverage?​

How is this different from duplicate detection by test name?​

Can I use it in CI?​

What embedding models are supported?​

Is the source code open?​

Summary​

References and further reading​