Blog | TestChimp Documentation

UX Bug Traceability: Translating Bugs to Business Impact

February 21, 2026 · 2 min read

Founder & CEO, TestChimp

Which bugs are slowing down your checkout flow?
Are there localization issues in onboarding causing drop-off?

Most QA setups can't answer the questions that actually matter to the business.

The Gap Between QA and Business Impact

Traditional QA workflows surface issues: broken buttons, layout glitches, validation errors, performance slowdowns. But when leadership asks:

How is this affecting conversion?
Is onboarding friction hurting activation?
Which workflow is most at risk?

The connection between bugs and business outcomes is often missed entirely. As a result, QA is perceived as a cost center rather than what it really is: revenue protection.

That's the gap we're addressing today by adding UX bug traceability to our QA intelligence layer.

Guided Exploration, Not Random Crawling

Our secret sauce: our exploratory agents don't wander randomly.

They use your existing automation tests as guides. Because tests are already linked to scenarios (via structured code comments), that same traceability naturally extends to exploratory runs and their findings.

This means every discovered issue can be traced back to:

The scenario
The user story
The business objective behind it

Exploration becomes business-contextual, not isolated.
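The chain above can be modeled as a few lookups. The following TypeScript is a minimal sketch. It assumes that a finding records the test that guided the exploration and that each scenario records its parent story. The record shapes, field names, and the `traceFinding` helper are hypothetical and are not TestChimp's data model.

```typescript
// Hypothetical record shapes for illustration only; not TestChimp's schema.
interface UserStory { id: string; title: string; objective: string }
interface Scenario { title: string; storyId: string }
interface Finding { description: string; testName: string }
interface Trace { finding: string; scenario: string; story: string; objective: string }

// Maps each test name to the scenario title named in its `// @Scenario:` comment.
type TestLinks = Map<string, string>;

// Walk finding -> guiding test -> scenario -> user story -> business objective.
// Returns undefined as soon as any link in the chain is missing.
function traceFinding(
  finding: Finding,
  links: TestLinks,
  scenarios: Map<string, Scenario>,
  stories: Map<string, UserStory>,
): Trace | undefined {
  const scenarioTitle = links.get(finding.testName);
  if (scenarioTitle === undefined) return undefined;
  const scenario = scenarios.get(scenarioTitle);
  if (scenario === undefined) return undefined;
  const story = stories.get(scenario.storyId);
  if (story === undefined) return undefined;
  return {
    finding: finding.description,
    scenario: scenario.title,
    story: story.title,
    objective: story.objective,
  };
}

// Example data: one story, one scenario, one test link, one exploratory finding.
const stories = new Map<string, UserStory>([
  ["checkout-1", { id: "checkout-1", title: "Guest checkout", objective: "Increase checkout conversion" }],
]);
const scenarios = new Map<string, Scenario>([
  ["Checkout - Pay as guest", { title: "Checkout - Pay as guest", storyId: "checkout-1" }],
]);
const links: TestLinks = new Map([["guest pays with card", "Checkout - Pay as guest"]]);

const trace = traceFinding(
  { description: "Pay button responds slowly", testName: "guest pays with card" },
  links,
  scenarios,
  stories,
);
console.log(trace?.objective);
```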

UX Bug Traceability Linkage

Structured for Insight at Any Level

In TestChimp, user stories are organized in nested folders. That structure becomes powerful when paired with traceable exploratory results.

Insights roll up automatically:

By area of the application
By workflow
By product surface
By team ownership

You can zoom in to a single scenario or zoom out to understand impact across an entire product area.
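Here is one way to picture the roll-up. If each finding carries the folder path of its user story, every ancestor folder can be credited with that finding. The following is a minimal sketch under that assumption. The `rollUpByFolder` helper and the example paths are hypothetical.

```typescript
// Hypothetical: each finding is tagged with its user story's folder path,
// e.g. "checkout/payment". Each finding is counted once at every ancestor folder.
function rollUpByFolder(storyPaths: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const path of storyPaths) {
    const parts = path.split("/").filter((p) => p.length > 0);
    // Credit "checkout", then "checkout/payment", and so on down the path.
    for (let i = 1; i <= parts.length; i++) {
      const prefix = parts.slice(0, i).join("/");
      counts.set(prefix, (counts.get(prefix) ?? 0) + 1);
    }
  }
  return counts;
}

const counts = rollUpByFolder([
  "checkout/payment",
  "checkout/payment",
  "checkout/shipping",
  "onboarding/signup",
]);
console.log(counts.get("checkout")); // 3
console.log(counts.get("checkout/payment")); // 2
```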

UX Bug Traceability Screenshot

Beyond a Floating Bug List

Instead of maintaining a detached list of issues, you gain visibility into:

Which flows are degrading user experience
Which bugs are causing latency in the key user journeys that matter to your revenue
Where UX friction is tied to user retention

It's no longer just about "bugs found."

It's about understanding what is hurting your revenue, and prioritizing accordingly.

Test Planning as Code: Your Test Artifacts, Version-Controlled and Agent-Ready

February 3, 2026 · 4 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

We used to live in forms.

Historically, dropdowns and text fields were the default way we planned and managed work. But in the agentic era, the winning UX isn’t a fancy form. It’s plain, boring text.

We already see it everywhere: skills.md for upskilling agents, claude.md for context, spec-based development in Cursor. But look at your test planning tools. Jira and Linear were built in a pre-AI era. They’re database-centric, form-heavy, and fundamentally hostile to agentic workflows.

Shouldn’t testing be as modern as coding?

That’s why we’ve reimagined test planning for the agentic era. We call it Test Planning as Code.

Plans as strongly typed markdown

In TestChimp, your plans live as strongly typed markdown files—user stories and test scenarios as .md files with YAML frontmatter, organized in folders and version-controlled alongside your codebase.

Test Scenario as Markdown

There are some pretty significant advantages to maintaining stories and scenarios as simple .md files.

First, they sync to your code repository. That means your coding and testing agents can read them and work on them directly. No proprietary API, no “export for AI”—just the same files your team already uses.

Second, you can organize them in a nested folder structure however you want. By area. By journey. By team. That structure gives agents broader context. They see related stories and linked scenarios, not just isolated tickets floating in a database.

This is what actually gets stored in your repo. No proprietary formats. No lock-in. Just plain markdown.
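The file format itself appears only in the screenshot, so the fields below (`title`, `status`, `priority`) are hypothetical. This minimal sketch shows why the format suits agents: separating flat `key: value` frontmatter from the markdown body takes only a few lines. The sketch handles single-line values only. Real YAML frontmatter should go through a proper YAML parser.

```typescript
interface ParsedDoc {
  frontmatter: Record<string, string>;
  body: string;
}

// Split a markdown document into flat `key: value` frontmatter and body.
// Only single-line scalar values are handled; nested YAML needs a real parser.
function parseFrontmatter(text: string): ParsedDoc {
  const lines = text.split(/\r?\n/);
  if (lines[0] !== "---") return { frontmatter: {}, body: text };
  const end = lines.indexOf("---", 1); // closing delimiter
  if (end === -1) return { frontmatter: {}, body: text };
  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    // Split on the first colon only, so values may contain colons or dashes.
    frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { frontmatter, body: lines.slice(end + 1).join("\n").trim() };
}

// Hypothetical scenario file; field names are illustrative only.
const scenarioFile = [
  "---",
  "title: Login - Invalid Credentials Error",
  "status: Ready for testing",
  "priority: High",
  "---",
  "Given a registered user enters a wrong password,",
  "an error message is shown and the user stays on the login page.",
].join("\n");

const doc = parseFrontmatter(scenarioFile);
console.log(doc.frontmatter["title"]);
```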

“I don’t want to manage status in a text editor”

Humans still need workflows. We need priorities, due dates, and assignees.

TestChimp layers those workflows on top of the files. You get the familiarity of a structured UI for human workflows—rich forms, status dropdowns, filters—without losing the benefits of file-first planning. The source of truth stays in the files; the platform makes them easier to work with.

User Story Form

And because TestChimp indexes everything, the AI can actually work with your plan. It can help write or refine a user story more accurately. It can suggest relevant test scenarios. It can even detail them out, grounded in your actual requirements.

Linked Scenarios

Linking tests is trivial

Once you have scenarios, linking tests is trivial. Just add a comment in your test code:

// @Scenario: Login - Invalid Credentials Error

No spreadsheets. No manual mapping. No juggling multiple tools.
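Because the link is a plain comment, any tool can recover it from source code. The sketch below is a rough, regex-based illustration and does not describe how TestChimp actually indexes tests. It assumes that each `// @Scenario:` comment sits inside the body of the `test(...)` it belongs to. The `extractScenarioLinks` helper is hypothetical.

```typescript
interface ScenarioLink { test: string; scenario: string }

// Associate each `// @Scenario:` comment with the most recent `test('...')` title.
// Heuristic line scan, not a TypeScript parser.
function extractScenarioLinks(source: string): ScenarioLink[] {
  const links: ScenarioLink[] = [];
  let currentTest: string | undefined;
  for (const line of source.split(/\r?\n/)) {
    // `test(` but not `test.describe(` or `test.beforeEach(`.
    const testMatch = line.match(/\btest\(\s*['"`](.+?)['"`]/);
    if (testMatch) currentTest = testMatch[1];
    const scenarioMatch = line.match(/\/\/\s*@Scenario:\s*(.+?)\s*$/);
    if (scenarioMatch && currentTest !== undefined) {
      links.push({ test: currentTest, scenario: scenarioMatch[1] });
    }
  }
  return links;
}

const spec = [
  "test('rejects a wrong password', async ({ page }) => {",
  "  // @Scenario: Login - Invalid Credentials Error",
  "  await page.goto('/login');",
  "});",
].join("\n");

console.log(extractScenarioLinks(spec));
```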

Export to Git keeps your test plan in the repo—stories and scenarios live under a path you choose, with full history and pull-request workflows. Your agents and your CI see the same files.

Export to Git

Coverage at any granularity

As tests run, coverage insights update automatically. And because your stories are organized in folders, you can see coverage at any granularity—per story, per area, per component.

If you’re working in a team where different groups own different parts of the system, you already know how useful this is.

Requirement Traceability

You can finally answer the question: Which scenarios are due next week, ready for testing, but still missing coverage? No spreadsheet. No manual roll-up. Just select the folder, apply the filters, and look at the Insights tab.
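That query reduces to a filter over scenario records. The following is a minimal sketch. It assumes that each scenario has a folder path, a status, a due date, and a linked-test count. These field names, the `"Ready for testing"` status value, and the dates are hypothetical.

```typescript
interface ScenarioRecord {
  title: string;
  folder: string;
  status: string;
  dueDate: string; // ISO date, e.g. "2026-02-12"
  linkedTests: number;
}

// Scenarios under `folder` (including subfolders), due within [from, to],
// ready for testing, and with no linked tests yet.
function dueReadyUncovered(
  scenarios: ScenarioRecord[],
  folder: string,
  from: string,
  to: string,
): ScenarioRecord[] {
  return scenarios.filter(
    (s) =>
      (s.folder === folder || s.folder.startsWith(folder + "/")) &&
      s.status === "Ready for testing" &&
      s.dueDate >= from && // ISO dates compare correctly as strings
      s.dueDate <= to &&
      s.linkedTests === 0,
  );
}

const backlog: ScenarioRecord[] = [
  { title: "Pay with saved card", folder: "checkout/payment", status: "Ready for testing", dueDate: "2026-02-12", linkedTests: 0 },
  { title: "Apply discount code", folder: "checkout", status: "Ready for testing", dueDate: "2026-02-11", linkedTests: 2 },
  { title: "Edit shipping address", folder: "checkout/shipping", status: "Draft", dueDate: "2026-02-10", linkedTests: 0 },
  { title: "Sign up with email", folder: "onboarding", status: "Ready for testing", dueDate: "2026-02-10", linkedTests: 0 },
];

const gaps = dueReadyUncovered(backlog, "checkout", "2026-02-09", "2026-02-15");
console.log(gaps.map((s) => s.title));
```

Only "Pay with saved card" qualifies. The discount scenario already has tests, the shipping scenario is still a draft, and the signup scenario is outside `checkout`.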

Wrapping up

Test Planning as Code is a different take on where test artifacts should live and who should be able to use them. Files in the repo instead of rows in a database; workflows layered on for humans, and that same file structure giving agents the context they need. If that approach resonates—or you’re just curious how it works in practice—we’ve documented the full workflow in the Test Planning section: authoring user stories, authoring test scenarios, export to Git, and requirement traceability.

Requirement Traceability, Without the Spreadsheet Circus

January 28, 2026 · 2 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

Q: How do you currently get requirement traceability?
Which user stories and scenarios are covered by tests, and what’s failing?

A:
For most teams, it looks something like this:

User stories live in Jira.
Test cases live somewhere else.
The mapping between them lives in an Excel sheet that someone manually maintains.

That spreadsheet is periodically uploaded to a test management tool like PractiTest. Then test execution results are pushed via an API to get a view of coverage and failures.

It works—until it doesn’t.

The problem with today’s approach

This is how requirement traceability is typically achieved today: a hodgepodge of tools stitched together with process and hope.

Multiple sources of truth
Manually maintained spreadsheets that inevitably go stale
Fragile workflows that break as teams and test suites scale

No single system actually owns the full picture. Instead, teams spend time keeping artifacts in sync rather than improving product quality.

A simpler model with TestChimp

In TestChimp, requirement traceability isn’t an afterthought. It’s built in.

You already author detailed user stories and break them down into meaningful test scenarios directly in the platform - with AI assistance that understands your product through your existing test scripts and documentation.

Linking tests to those scenarios is intentionally simple. In your test script, add a comment:

// @Scenario: <scenario title>

That’s it! TestChimp takes care of the rest:

Automatically links tests to scenarios
Tracks execution results across runs
Aggregates outcomes at scenario, story, and suite level

Requirement Traceability

You get clear, real-time dashboards that let you answer business-relevant questions:

Which user stories are missing test coverage?
Which scenarios are currently failing?
Which tests are flaky or unstable?

All without juggling multiple tools or maintaining brittle Excel sheets.
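The three dashboard questions can be answered from two inputs: the scenarios that each story contains, and the outcome history of each scenario's linked tests. This sketch uses deliberately simple heuristics, and these heuristics are assumptions rather than TestChimp's definitions. In particular, treating any mix of passes and failures as flaky is a crude rule. All names are hypothetical.

```typescript
type Outcome = "passed" | "failed";

interface StoryScenarios { story: string; scenarios: string[] }

// Outcomes of linked test runs per scenario, oldest first.
type RunHistory = Map<string, Outcome[]>;

interface CoverageReport {
  storiesMissingCoverage: string[];
  failingScenarios: string[];
  flakyScenarios: string[];
}

// Heuristics (assumptions):
// - a story lacks coverage if any of its scenarios has no runs;
// - a scenario is failing if its latest run failed;
// - a scenario is flaky if its history mixes passes and failures.
function buildReport(stories: StoryScenarios[], runs: RunHistory): CoverageReport {
  const report: CoverageReport = { storiesMissingCoverage: [], failingScenarios: [], flakyScenarios: [] };
  for (const { story, scenarios } of stories) {
    if (scenarios.some((s) => (runs.get(s) ?? []).length === 0)) {
      report.storiesMissingCoverage.push(story);
    }
  }
  runs.forEach((outcomes, scenario) => {
    if (outcomes.length === 0) return;
    if (outcomes[outcomes.length - 1] === "failed") report.failingScenarios.push(scenario);
    if (outcomes.includes("passed") && outcomes.includes("failed")) report.flakyScenarios.push(scenario);
  });
  return report;
}

const storyList: StoryScenarios[] = [
  { story: "Guest checkout", scenarios: ["Pay as guest", "Guest sees order summary"] },
  { story: "Login", scenarios: ["Login - Invalid Credentials Error"] },
];
const runLog: RunHistory = new Map<string, Outcome[]>([
  ["Pay as guest", ["passed", "failed", "passed"]],
  ["Login - Invalid Credentials Error", ["passed", "failed"]],
]);

console.log(buildReport(storyList, runLog));
```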

⸻

One system, end to end

Instead of retrofitting traceability after the fact, TestChimp treats it as a first-class concept - connecting requirements, scenarios, and executions in one place.

No spreadsheets.
No manual syncing.
Just a single system that understands what you’re building, how it’s tested, and where the gaps are.

Special Purpose Testing Agents

January 14, 2026 · 3 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

If you’re already familiar with ExploreChimp, you know it’s like having a driver navigate your web app for you. Guided by SmartTests, ExploreChimp scans the DOM, screenshots, network calls, and browser metrics to spot bugs that traditional automation scripts can't see.

While ExploreChimp gives you broad coverage, some problems only show up when you deliberately push specific edges.

That’s why we’re launching our "Troop of Special Purpose Testing Agents".

They’re all guided by the same SmartTests, but each agent is purpose-built to tackle a specific class of bugs.

Here is the starting lineup we are launching today:

Form Validation Tester: Meet “Deadpool”

Writing negative test cases for forms is soul-crushing work. You have to think of every wrong input a user could throw at your app, then write tests to catch it.

Form Validation Agent In Action

Our Form Validation Tester, affectionately nicknamed Deadpool, does all the heavy lifting for you. You only need to define the “Happy Path” - the correct way to fill a form. From there, Deadpool goes rogue:

Past and future dates
Negative numbers
Random strings
Whitespace-only inputs
Invalid data formats, like numbers in text fields

It pushes your forms to the limits to ensure your validation logic holds up - all without writing a single line of negative test code.
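Beyond the categories listed, the post doesn't describe how Deadpool chooses its inputs. The following sketch only illustrates the general idea of deriving invalid variants from a field's kind and its happy-path value. The `negativeInputs` helper and the field kinds are hypothetical.

```typescript
type FieldKind = "text" | "number" | "date";

// Derive values a form should reject, given the happy-path value for a field.
function negativeInputs(kind: FieldKind, happyValue: string): string[] {
  // Empty and whitespace-only inputs apply to every field kind.
  const blank = ["", "   ", "\t"];
  const byKind: Record<FieldKind, string[]> = {
    // Negative numbers and non-numeric strings.
    number: ["-1", `-${happyValue}`, "abc", "12abc"],
    // Far past, far future, and malformed dates.
    date: ["1900-01-01", "9999-12-31", "2026-13-45", "not-a-date"],
    // Numbers where text is expected, gibberish, and an oversized value.
    text: ["12345", "%$#@!~", "x".repeat(10000)],
  };
  // Never report the happy-path value itself as a negative case.
  return [...blank, ...byKind[kind]].filter((v) => v !== happyValue);
}

console.log(negativeInputs("number", "42"));
```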

Theme Tester: Spot Invisible Problems

Themes are more than aesthetics. Switching between dark mode, high contrast, or custom color palettes can break the visual harmony that users expect to "just work".

Theme Agent In Action

The Theme Tester loops through all the themes your app supports, hunting for:

Contrast issues
Text visibility problems
Ugly color combinations

It can toggle themes via cookies, local storage, or by interacting with your app’s UI - whatever works best for your setup.
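The post doesn't say how the Theme Tester scores contrast. A common yardstick is the WCAG 2.x contrast ratio, sketched below. The 4.5:1 threshold for normal text and the 3:1 threshold for large text are WCAG's AA levels. The theme palette and helper names are hypothetical.

```typescript
// WCAG 2.x relative luminance of an sRGB color given as "#rrggbb".
function relativeLuminance(hex: string): number {
  const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`Expected #rrggbb, got ${hex}`);
  const [r, g, b] = [m[1], m[2], m[3]].map((h) => {
    const c = parseInt(h, 16) / 255;
    // Undo sRGB gamma encoding to get linear light.
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Ratio of lighter to darker luminance, each offset by 0.05; ranges from 1 to 21.
function contrastRatio(fg: string, bg: string): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  const [lighter, darker] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA: at least 4.5:1 for normal text, 3:1 for large text.
function passesAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}

// Hypothetical text/background pairs for two themes.
const themes: Record<string, { text: string; background: string }> = {
  light: { text: "#777777", background: "#ffffff" },
  dark: { text: "#e0e0e0", background: "#121212" },
};
for (const [theme, { text, background }] of Object.entries(themes)) {
  console.log(theme, contrastRatio(text, background).toFixed(2), passesAA(text, background));
}
```

With these values, the light theme's `#777777` on white comes out just under 4.5:1. It therefore fails AA for normal text but still passes the large-text threshold.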

Localization Tester: No More "Lost in Translation"

Supporting multiple locales introduces a whole new set of bugs. Dates, currencies, text overflow, RTL layouts, and even cultural appropriateness can break your user experience.

Localization Agent Configuration

Our Localization Tester handles it all:

Detects broken translations or dangling template strings
Checks date and currency formatting across locales
Verifies layout integrity in RTL languages
Flags potential cultural missteps

With this agent on your team, you can support global audiences with confidence.
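One part of this check, spotting dangling template strings, can be approximated with a regular expression over the rendered text. The sketch below is a heuristic and represents an assumption about how such a check might work. The placeholder syntaxes it covers (`{{name}}`, `${name}`, `%s`/`%1$s`, `{name}`) are common conventions, and you would tune them to your i18n library.

```typescript
// Alternatives are ordered so "{{name}}" and "${name}" match whole before "{name}" can.
const PLACEHOLDER =
  /\{\{\s*[\w.]+\s*\}\}|\$\{\s*[\w.]+\s*\}|%(?:\d+\$)?[sd]|\{[A-Za-z_][\w.]*\}/g;

// Return placeholders that survived into user-visible text (likely missing interpolation).
function findDanglingPlaceholders(renderedText: string): string[] {
  return renderedText.match(PLACEHOLDER) ?? [];
}

console.log(findDanglingPlaceholders("Bonjour {{firstName}}, votre total est %s"));
console.log(findDanglingPlaceholders("Bonjour Marie, votre total est 12,50 €"));
```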

Screen Discovery Agent: Building Your App’s GPS

No test scripts yet? No problem.

The Screen Discovery Agent methodically crawls your app, visiting key screens. It automatically generates your initial SmartTest suite, so the rest of the troop can get to work.

Expanded Behaviour Map

Tip: You can easily add more user journeys with our Chrome Extension.

More Agents Coming Soon

This is just the beginning. We’re already working on more powerful agents, including:

RBAC Tester: verifies that role-based authorization works as intended
Network Resilience Checker: shows how your app behaves when connectivity gets fuzzy, the backend breaks, and more

And we’re always looking for more ideas. If you’ve got a pain point in testing, we want to hear about it!

Ready to Try the Troop?

Stop stressing over the worst parts of testing. Let our agents handle the tedious tasks so you can focus on what really matters: building amazing experiences.

Try the Troop today: https://testchimp.io
Read the docs: https://docs.testchimp.io/explorations/intro

From Bug Report to Pull Request: The TestChimp x OpenHands Integration

January 5, 2026 · 3 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

Let’s be honest. Finding a bug is only the start.

Then comes the context switching – reproducing the issue, digging through logs, writing a fix that doesn’t break something else…

As of today, that workflow is outdated.

Fix Bug Cover

Today, we are launching our OpenHands Integration. This isn’t just a “Chat with AI” wrapper. It is a fully automated pipeline that takes a bug found in TestChimp and turns it into a ready-to-merge Pull Request in your GitHub repository. Here is how it works, why it actually fixes things (instead of hallucinating), and how to set it up.

The "Context Gap" (Why AI usually fails at debugging)

Most AI coding agents are smart, but blind. You tell them “The cart button is broken,” and they hallucinate a fix because they can’t see the state of the application. We solved the Context Gap. When TestChimp captures a bug (whether manually or via our automated agents), we record the entire runtime reality of that failure. When you click “Fix” via the OpenHands integration, we feed the cloud agent the complete necessary context including:

Visual Bounding Boxes: We show the agent exactly where the bug is physically located on the screen.
API Payloads: The agent sees the actual network requests and response bodies that triggered the error.
Console Logs: JavaScript errors, warnings, and stack traces captured at the exact moment of failure.
DOM Context: The full element selectors and structure information.
Screen-State: Specifics on which screen and state the app was in.

The OpenHands agent doesn’t guess anymore. It fetches these artifacts on demand, analyzes your codebase, and writes a precise fix.

The Workflow: One Click, Real Code

We built this for speed. Here is what the new flow looks like:

1. Spot the Bug (or Batch Them)

You can select a single bug or use the checkboxes to select multiple bugs at once. If you have five related UI glitches, select them all. The agent is smart enough to identify common issues, group them, and address them together.

2. Click “Fix”

Hit the tool icon next to the bug. TestChimp validates your config and sends the context package to the OpenHands cloud instance.

Fix Bug Screenshot

3. Watch it Work Live

This is the cool part. We pop a success modal with a direct link to the OpenHands Conversation. You can click that link and watch the agent “think” in real-time. You see it analyzing the screenshots, reading the API logs, and reasoning through the code changes.

4. Review the PR

Once the agent is done, it automatically raises a Pull Request in your connected GitHub repository. You review the code, run your CI, and merge.

Technical Setup (How to turn it on)

This feature is available now for TestChimp Teams subscribers.

Prerequisites: You need an OpenHands account (cloud or self-hosted), and your GitHub repository must be connected to both OpenHands and TestChimp.

Note: The repo connected in OpenHands must match the repo configured in TestChimp.

Configuration Steps:

Go to Project Settings -> Integrations -> OpenHands.
Enter your OpenHands API Key.
Select your Installation Type (Cloud or Self-hosted).
Click Save Configuration.

Why this matters

We are moving from “Bug Tracking” to “Bug Killing.” By giving an autonomous agent access to on-demand artifacts like bounding boxes and DOM states, we are removing the manual labor from regression testing. Stop fixing bugs manually. Let the chimp handle it.

SmartTests Now Support The Full Playwright Ecosystem

December 17, 2025 · 4 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

We’re excited to announce that SmartTests now fully support the core Playwright testing patterns and constructs you know and love. This means you can write maintainable, well-structured test suites that leverage Playwright’s powerful features while still getting all the AI-powered adaptability that makes TestChimp SmartTests special.

What Are SmartTests?

For those new to SmartTests, you can think of a SmartTest as a Playwright script with a couple of twists:

Intent Comments:

SmartTest Steps include intent comments that describe what you’re trying to accomplish. When a test runs, it executes as a standard Playwright script for speed and determinism. But when a step fails, our AI agent steps in to fix the issue on the fly and raises a PR with the changes – giving you the best of both worlds: fast script execution and intelligent adaptability.
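Conceptually, hybrid execution wraps each step in a try/fallback pattern. The sketch below illustrates only that control flow. `Step`, `Agent`, and `runHybrid` are hypothetical stand-ins and are not TestChimp's API. The sketch also leaves out the part where the fix is written back to the script and raised as a PR.

```typescript
interface Step {
  intent: string; // the intent comment, e.g. "Click the Sign in button"
  run: () => Promise<void>; // the deterministic script code for this step
}

// Stand-in for an AI agent that can achieve an intent against the live page.
interface Agent {
  perform(intent: string): Promise<void>;
}

interface StepResult { intent: string; mode: "script" | "agent" }

// Run each step as plain script; only when it throws, hand its intent to the agent.
async function runHybrid(steps: Step[], agent: Agent): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const step of steps) {
    try {
      await step.run();
      results.push({ intent: step.intent, mode: "script" });
    } catch {
      // E.g. a selector changed: fall back to agent mode for this step only.
      // If the agent also fails, its error propagates and the run rejects.
      await agent.perform(step.intent);
      results.push({ intent: step.intent, mode: "agent" });
    }
  }
  return results;
}

const demoSteps: Step[] = [
  { intent: "Open the sign-in page", run: async () => {} },
  {
    intent: "Click the Sign in button",
    run: async () => {
      throw new Error("locator '#sign-in-button' not found");
    },
  },
];

const fakeAgent: Agent = {
  async perform(intent) {
    console.log(`agent handling: ${intent}`);
  },
};

runHybrid(demoSteps, fakeAgent).then((results) => console.log(results));
```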

Screen-state annotations:

Markers that specify which screen and state the UI is in at a given step in the script. These annotations are authored and used by ExploreChimp to tag bugs with the correct screen-state in the SiteMap.

What's New: Full Playwright Compatibility

SmartTests now support all the essential Playwright patterns that help you build professional, maintainable test suites:

1. Hooks for Setup and Teardown

SmartTests now support all four Playwright hooks at both file and suite levels:

beforeAll: runs once before all tests in a suite
afterAll: runs once after all tests in a suite
beforeEach: runs before each test
afterEach: runs after each test

This means you can set up test data, initialize page objects, authenticate users, and clean up resources exactly as you would in standard Playwright tests.

2. Page Object Models (POMs)

SmartTests fully support the Page Object Model pattern, allowing you to encapsulate page interactions in reusable classes. This keeps your tests clean, maintainable, and aligned with best practices.

Example:

import { test, type Page } from '@playwright/test';

// Page Object Model: encapsulates the sign-in page's selectors and actions.
class SignInPage {
  constructor(private page: Page) {}

  // Relative URLs resolve against baseURL from playwright.config.
  async navigate() {
    await this.page.goto('/signin');
  }

  async login(email: string, password: string) {
    await this.page.fill('#email', email);
    await this.page.fill('#password', password);
    await this.page.click('#sign-in-button');
  }
}

test('user can sign in', async ({ page }) => {
  const signInPage = new SignInPage(page);
  await signInPage.navigate();
  await signInPage.login('user@example.com', 'password123');
});

3. Fixtures for File Uploads

SmartTests support Playwright fixtures, making it easy to handle file uploads and other test artifacts. Upload your fixture files (like test data, images, or documents) under the fixtures folder in the SmartTests tab, and they will be available during test execution.

4. Playwright Configuration

The SmartTests folder contains a playwright.config.js file that configures the Playwright execution environment for your project. This is essential for:

Browser Authentication: Set up HTTP basic auth for staging environments
Custom Headers: Add authorization tokens, API keys, or custom headers
Base URLs: Configure default URLs for your test environment
Viewport Settings: Set default browser viewport sizes
And more: All standard Playwright configuration options

Example playwright.config.js:

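const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  use: {
    // Relative page.goto() URLs resolve against this.
    baseURL: 'https://staging.example.com',
    // HTTP basic auth for the staging environment (placeholder credentials).
    httpCredentials: {
      username: 'staging-user',
      password: 'staging-password'
    },
    // Extra headers sent with every request.
    extraHTTPHeaders: {
      'Authorization': 'Bearer your-token',
      'X-Environment': 'staging'
    }
  }
});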

5. Test Suites with Multiple Tests

SmartTests support organizing multiple tests in a single file using Playwright’s test.describe() blocks. You can create nested suites, group related tests together, and apply suite-level hooks – just like in standard Playwright.

Why This Matters

These additions mean SmartTests are now fully compatible with Playwright’s ecosystem. You can:

✅ Write maintainable tests using industry-standard patterns like POMs and hooks

✅ Organize your test suite with proper grouping and structure

✅ Handle complex setups with configuration files and fixtures

✅ Reuse existing Playwright knowledge without learning new patterns

✅ Still get AI-powered fixes when tests fail – the best of both worlds!

Getting Started

If you’re already using SmartTests, you can start using these features immediately. Just structure your tests using standard Playwright patterns, and SmartTests will handle the rest.

For new users, SmartTests work just like Playwright tests – with the added benefit of AI-powered failure recovery & stepwise execution enabling guided exploration.

What's Next?

SmartTests continue to evolve, and we’re committed to maintaining full compatibility with Playwright’s ecosystem while adding intelligent features that make testing easier and more reliable. Stay tuned for more updates!

Got questions or feedback? We’d love to hear from you! Drop us a line at contact@testchimp.io.

The Silent Killer Churning Your Users: Slow, Janky UX

December 1, 2025 · 3 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

Everyone loves to talk about “building features” and “shipping fast.” But we rarely talk about the thing that silently kills conversions, frustrates users, and destroys retention:

Performance.

Not the “page still loads eventually” kind – but the slow, janky, slightly-off performance that users instantly notice and abandon your product for.

And the data is brutal:

Aberdeen Group research found that a 1-second delay in page load time reduced conversions by 7%.
The probability of a bounce increases by 32% as load time goes from 1s → 3s.
Apps that invest in performance optimizations see up to 30% higher retention.

Users don’t always tell you this directly, but every UX study confirms it:

Slow, sluggish experiences are one of the most complained-about frustrations – and a top reason users bounce.

But We Already Have Automated Tests… Isn’t Our App “Tested”?

This is the dangerous assumption teams make.

Yes, you may have automation test coverage.

Yes, your flows might “functionally work.”

But functional checks don’t catch:

the button that feels slow
the layout shift that makes the user misclick
the subtle JavaScript bloat that accumulates over releases
the screen that takes 1.2s longer than it used to
the resource that takes a long time to load because of cache misconfiguration
the memory leak that only appears after a few steps

These aren’t textbook “bugs” so no one files them.

And because performance is subjective (“eh, feels a bit sluggish?”), it rarely gets documented with hard numbers.

Result: regressions creep in release after release – until your retention chart quietly slopes downward.

Performance Bug Detection in TestChimp’s Exploratory Agent

To fix this blind spot, TestChimp’s exploratory agent now automatically flags performance and memory issues – alongside the other usability bugs it catches.

And just like other bugs it finds, every performance issue is tied to the exact screen/state it appeared in.

You get a clear map of where your app slows down, why, and by how much.

No more vague complaints.

No more guessing.

Performance bugs, accurately tracked, and backed by hard evidence.

Performance bugs in TestChimp exploratory agent

What the Agent Analyzes

The agent captures and analyzes deep browser performance metrics such as:

CLS (Cumulative Layout Shift) – where janky content shifts occur
INP (Interaction to Next Paint) – slow button responses, input lag
Long Tasks – heavy JS blocking the main thread
Large or unoptimized resource loads
TBT (Total Blocking Time)
Memory heap usage and leaks
Network timing and caching misses
And more…

It combines this with screenshot data to highlight:

Which screens are causing frustration
Which buttons are slow to respond
Where layout instability is happening
Which resources are dragging down load times
Where caching is failing

Essentially:

The stuff that actually impacts user experience – and revenue – but never gets caught in ordinary test suites.
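The post doesn't detail how the agent computes these metrics. For reference, here are simplified versions of two public Web Vitals definitions. Total Blocking Time is the sum of the portion of each long task beyond 50 ms. CLS is the largest session window of layout shifts, where a window closes after a 1 s gap or once it spans 5 s. The input shapes mirror browser performance entries but are assumptions here. Both functions simplify the browser and Lighthouse implementations.

```typescript
interface TaskEntry { startTime: number; duration: number } // milliseconds
interface ShiftEntry { startTime: number; value: number; hadRecentInput: boolean }

// TBT: for long tasks between FCP and TTI, only the time beyond 50 ms counts as blocking.
// Simplification: a task is included if it starts inside the window.
function totalBlockingTime(tasks: TaskEntry[], fcp: number, tti: number): number {
  return tasks
    .filter((t) => t.startTime >= fcp && t.startTime <= tti)
    .reduce((sum, t) => sum + Math.max(0, t.duration - 50), 0);
}

// CLS: the largest session window of layout shifts. A shift starts a new window if it
// comes 1 s or more after the previous shift, or 5 s or more after the window's first shift.
// Shifts that follow recent user input are excluded.
function cumulativeLayoutShift(shifts: ShiftEntry[]): number {
  let maxWindow = 0;
  let windowValue = 0;
  let windowStart = -Infinity;
  let lastShift = -Infinity;
  for (const s of [...shifts].sort((a, b) => a.startTime - b.startTime)) {
    if (s.hadRecentInput) continue;
    if (s.startTime - lastShift >= 1000 || s.startTime - windowStart >= 5000) {
      windowValue = 0;
      windowStart = s.startTime;
    }
    windowValue += s.value;
    lastShift = s.startTime;
    maxWindow = Math.max(maxWindow, windowValue);
  }
  return maxWindow;
}

const sampleTasks: TaskEntry[] = [
  { startTime: 1200, duration: 70 },
  { startTime: 1500, duration: 30 },
  { startTime: 2000, duration: 120 },
];
const sampleShifts: ShiftEntry[] = [
  { startTime: 0, value: 0.1, hadRecentInput: false },
  { startTime: 500, value: 0.05, hadRecentInput: false },
  { startTime: 2000, value: 0.2, hadRecentInput: false },
  { startTime: 2100, value: 0.3, hadRecentInput: true },
];
console.log(totalBlockingTime(sampleTasks, 1000, 3000), cumulativeLayoutShift(sampleShifts));
```

In the sample, TBT is 20 + 0 + 70 = 90 ms. CLS is 0.2 because the first window sums to 0.15, the shift at 2,000 ms starts a new window, and the shift that follows user input is ignored.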

Why This Matters

Performance isn’t a “nice-to-have.”

It’s a direct business driver:

Higher conversions
Lower bounce rates
Higher user trust
Better retention
Cleaner UX
Higher SEO ranking
Less app fatigue and frustration

By treating performance issues as first-class bugs, you’re not just “optimizing”; you’re making your product feel premium and effortless, the way users expect modern webapps to be.

E2E tests as a Map of App Pathways

November 27, 2025 · 4 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

End-to-end tests are ultimately just a sequence of user actions and expectation checks. Conceptually, each test is a walk through your app:

Goto url -> Login -> Go to Settings Page -> Update role -> Verify role is updated

You can represent this as a path: every step is a node, and the edges show how the user moves from one step to the next.

Now imagine aggregating all the paths from all the tests in your suite. You end up with a tree-like structure—essentially a map of every known pathway through your product.
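Aggregating paths this way amounts to building a trie, in which tests that share a prefix share nodes. The following is a minimal sketch. It assumes that each test has already been reduced to a list of step labels. The `buildBehaviourTree` helper is hypothetical. Keying nodes by label alone is a simplification, because a real map would also need screen-state information to distinguish steps with identical labels.

```typescript
// A trie over step sequences: shared prefixes merge, so common trunks appear once.
interface PathNode {
  step: string;
  children: Map<string, PathNode>;
  tests: string[]; // tests whose path passes through this node
}

function buildBehaviourTree(tests: Record<string, string[]>): PathNode {
  const root: PathNode = { step: "(start)", children: new Map(), tests: [] };
  for (const [testName, steps] of Object.entries(tests)) {
    let node = root;
    for (const step of steps) {
      let child = node.children.get(step);
      if (!child) {
        child = { step, children: new Map(), tests: [] };
        node.children.set(step, child);
      }
      child.tests.push(testName);
      node = child;
    }
  }
  return root;
}

function countNodes(node: PathNode): number {
  let total = 1;
  node.children.forEach((child) => {
    total += countNodes(child);
  });
  return total;
}

const tree = buildBehaviourTree({
  "update role": ["Goto url", "Login", "Go to Settings Page", "Update role", "Verify role is updated"],
  "change password": ["Goto url", "Login", "Go to Settings Page", "Change password"],
  "view dashboard": ["Goto url", "Login", "Open dashboard"],
});
console.log(countNodes(tree));
```

The three tests contain 12 steps in total. Because they share the "Goto url -> Login" trunk and the Settings step, only 8 nodes remain, including the root.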

Behaviour Map

This isn’t just a “cool visualization.”

It unlocks powerful, practical applications – especially when using AI agents for testing.

Better RAG for Testing Agents

This tree acts as a graph index over your product’s behavioural pathways. Just like a database index accelerates queries, this structure enables an agent to answer deeper questions about your app’s behaviour – making retrieval-augmented reasoning much more effective. With it, an agent doesn’t have to hallucinate how the app works. It can look up structure, pathways, and reachable states deterministically.

Automatically Expanding Your Test Suite

Once you have this “pathway map,” an agent can intelligently expand your test suite by targeting untested branches. To do this well, the agent needs two answers:

How do I reach the required state?
Which branches from that state are already covered?

In TestChimp (under Atlas → Behaviour Tree), selecting any node shows:

the exact path from the root to that node (how to get there), and

all outgoing edges (which branches are already explored by existing tests).

From there, the agent simply:

Navigates to the node by following the script steps.
Looks at the UI state.
Brainstorms unexplored actions (new branches).
Converts each unexplored branch into a new test.

In other words, the map gives the agent the same advantage a human has when using Google Maps – it can get anywhere, deliberately.
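The two questions and the agent's loop map onto two small operations. The first finds the path to a node. The second subtracts the branches already covered from a list of candidate actions. In this sketch the candidates are hard-coded and stand in for what an agent would brainstorm from the live UI. The tree shape and helper names are hypothetical.

```typescript
interface TreeNode { step: string; children: TreeNode[] }
interface Located { path: string[]; node: TreeNode }

// Depth-first search: the steps from the root to the first node with the given label.
function locate(node: TreeNode, target: string, prefix: string[] = []): Located | undefined {
  const path = [...prefix, node.step];
  if (node.step === target) return { path, node };
  for (const child of node.children) {
    const found = locate(child, target, path);
    if (found) return found;
  }
  return undefined;
}

// Candidate actions minus the outgoing edges that existing tests already cover.
function unexploredBranches(node: TreeNode, candidates: string[]): string[] {
  const covered = new Set(node.children.map((c) => c.step));
  return candidates.filter((c) => !covered.has(c));
}

const root: TreeNode = {
  step: "Goto url",
  children: [{
    step: "Login",
    children: [{
      step: "Go to Settings Page",
      children: [
        { step: "Update role", children: [] },
        { step: "Change password", children: [] },
      ],
    }],
  }],
};

const hit = locate(root, "Go to Settings Page");
if (hit) {
  console.log(hit.path.join(" -> ")); // Goto url -> Login -> Go to Settings Page
  console.log(unexploredBranches(hit.node, ["Update role", "Change password", "Delete account", "Toggle dark mode"]));
}
```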

Controlled Agentic Exploration

Agent-led exploratory testing can be powerful: the agent can analyze DOM, screenshots, network logs, and console output while walking through your app.

But in practice, fully-agentic exploration has challenges:

Slow – inference happens at every step
Easily distracted – coarse objectives lead to wandering
Unfocused – without context, exploration becomes random

It’s like asking a human to explore an unfamiliar city with no map:

slow progress, random detours, and little sense of the big picture.

Your behavioural pathway graph is the map.

With it, the agent can:

reason about where it is,
figure out where to go next,
and explore far more methodically.

You can even focus exploration narrowly – for example:

“Analyze the Settings page as an admin user.”

Because each step in the graph is annotated with the screen and state (from previous explorations), the agent can determine:

how to reach that precise screen state, and
how to explore meaningfully once there.
To try variations (e.g., test different scenarios in Settings), the agent simply follows the shared trunk of paths that lead to that screen – much like several routes through a city share the same highway.

Bridging Pathways With App Structure: Screens & States

Throughout this post we’ve mentioned “screens” and “states.”

Here’s how they fit in.

A human knows, while navigating:

“I’m on the login page”
“Now I’m on the home page”
“Now I’m in the settings page as an admin”

Traditional Playwright scripts do not carry that semantic information.

But an agent can.

As it walks through a test step-by-step, it can look at the UI and infer:

Which screen am I on?
What state am I in? (logged in, admin, item added, etc.)

This is exactly what ExploreChimp does.

During guided exploration, it maps each step to the screen and state the UI is currently in.

That enriched context enables the agent to answer questions like:

“How do I get to the Settings page as an admin user?”
“What screens does this test touch?”
“Which parts of the product lack coverage?”

By connecting behavioural paths with semantic screen/state understanding, TestChimp gains a rich structural model of your app – fueling downstream capabilities like:

generating user stories,
planning test strategies,
writing new tests,
and performing targeted exploratory analysis.

Screen-State markers in SmartTests

November 18, 2025 · 3 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

Ok, first a quick recap on SmartTests:

SmartTests are plain Playwright scripts with intent comments before steps, which enable hybrid execution (falling back to agent-mode execution when needed).

SmartTests are used by ExploreChimp to guide its explorations in pre-defined pathways, along which it identifies UX issues of the webapp such as performance, visual glitches, usability, content and more.

The Challenge: Context for Bugs

When ExploreChimp finds bugs, it tags them with the “Screen” and “State” where they were captured. This context helps with troubleshooting and understanding when issues occur.

A Screen is a conceptual view of your application: Dashboard, Homepage, Shopping Cart, etc.
A State represents a specific situation within that screen: Empty Cart vs Cart with Items, Logged In vs Logged Out, etc.

ExploreChimp autonomously determines the current screen and state based on the steps taken and the current screenshot. While this makes getting started easier, it may not always align with your mental model or with the granularity at which you want things tracked.

The Solution: Screen-State Annotations

Now you can add explicit screen-state markers directly in your SmartTest scripts. These annotations tell ExploreChimp exactly which screen and state the app is in at a given point in the test, ensuring bugs are tagged with the context you care about.

How It Works

After ExploreChimp runs, if the script didn’t contain screen-state markers, it updates the script with screen-state annotations it determined during the walk.

If you don’t want the agent to update the script, you can turn this off by unchecking “Update script with screen-state annotations” under Advanced Settings (in the Exploration config wizard).

You can edit these annotations to match your conceptual model. For example, you may want to track UX bugs for “Cart with out-of-stock items” vs “Cart with in-stock items” instead of the agent-suggested states.

On the next run, ExploreChimp uses your annotations instead of guessing, so bugs are tagged consistently with your terminology.

Here is an example of a SmartTest with screen-state annotations:

test('Shopping Cart Flow', async ({ page }) => {
  // Navigate to homepage
  await page.goto('https://example.com');
  // @Screen: Homepage @State: Default

  // Search for a product
  await page.getByPlaceholder('Search products').fill('laptop');
  await page.getByRole('button', { name: 'Search' }).click();
  // @Screen: Search Results @State: With Results

  // Add item to cart
  await page.getByRole('link', { name: /laptop/i }).first().click();
  await page.getByRole('button', { name: 'Add to Cart' }).click();
  // @Screen: Shopping Cart @State: Cart with Items

  // Proceed to checkout
  await page.getByRole('button', { name: 'Proceed to Checkout' }).click();
  // @Screen: Checkout @State: Payment Step
});
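Because the markers follow the regular `// @Screen: ... @State: ...` shape shown in the example above, tools can read them back easily. Below is a minimal sketch that extracts the markers together with their line numbers. It is an illustration and does not describe how ExploreChimp parses scripts. The `parseScreenStates` helper is hypothetical.

```typescript
interface ScreenState { line: number; screen: string; state: string }

// Extract `// @Screen: X @State: Y` markers with their 1-based line numbers.
function parseScreenStates(source: string): ScreenState[] {
  const marker = /\/\/\s*@Screen:\s*(.+?)\s*@State:\s*(.+?)\s*$/;
  const out: ScreenState[] = [];
  source.split(/\r?\n/).forEach((text, i) => {
    const m = marker.exec(text);
    if (m) out.push({ line: i + 1, screen: m[1], state: m[2] });
  });
  return out;
}

const script = [
  "await page.goto('https://example.com');",
  "// @Screen: Homepage @State: Default",
  "await page.getByRole('button', { name: 'Add to Cart' }).click();",
  "// @Screen: Shopping Cart @State: Cart with Items",
].join("\n");

for (const { line, screen, state } of parseScreenStates(script)) {
  console.log(`${line}: ${screen} / ${state}`);
}
```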

Benefits

Consistent bug tagging: Bugs are tagged consistently using your terminology, not AI-generated labels.
Better organization: View bugs by screen-state in Atlas → SiteMap with your own categories.
Easy refinement: Edit annotations to match your mental model easily – no need to retrain or reconfigure.

Getting Started

Run ExploreChimp on your SmartTest (annotations are added automatically).
Review and edit the annotations in your script to match your terminology.
The next time ExploreChimp is run on that test, it will use your annotations for consistent bug tagging.

The annotations are simple comments, so they don’t affect test execution – they’re purely for ExploreChimp’s context understanding.

ai-wright: AI Steps in Playwright Scripts

November 10, 2025 · 3 min read

Nuwan Samarasekera

Founder & CEO, TestChimp

Bring AI-native actions and verifications into your Playwright tests – open source, vision-enabled, and BYOL.

The Problem

Most “AI testing” frameworks make you throw away what already works.

They replace your entire test suite with “agentic” systems — where an LLM drives every click, assertion, and navigation step.

Sounds cool… until you hit:

Slow, flaky, or non-deterministic runs
Proprietary test formats
Complete vendor lock-in

For most teams, that’s a non-starter.

What if you could keep your existing Playwright scripts, and just inject AI where it’s actually needed – the ambiguous, messy, or dynamic parts of your app?

The Idea

ai-wright brings AI steps to Playwright.

You still write regular Playwright tests – deterministic, fast, inspectable – but when you hit a fuzzy point, you can drop in a step like:

await ai.act('Click on a top rated campaign', { page, test });

await ai.verify('The campaign description should not contain offensive words', { page, test });

That’s it. AI only handles that step.

Everything else stays Playwright-native.

Why It’s Different

Vision-Enabled

Existing libraries (like ZeroStep and auto-playwright) use sanitized HTML – which misses what’s actually on screen.

This causes many issues:

HTML ≠ UI reality – static DOM can’t reveal if elements are disabled, visible, obscured, or off-screen – resulting in LLMs attempting to interact with non-interactive elements.
Loss of semantics – sanitized HTML strips ARIA roles, computed text, layout cues, and shadow DOM content, which are critical for accurate reasoning.
Unbounded prompt size – large DOMs often get too verbose for the prompt, requiring truncation (and the loss of context that comes with it).
Fragile selectors – HTML-based approaches force LLMs to guess selectors; ai-wright uses precise SoM IDs bound to live DOM nodes, enabling accurate one-shot execution.
ai-wright is vision-enabled: it combines Set-of-Marks (SoM) annotated screenshots with structured DOM context for grounded, visual reasoning.
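The Set-of-Marks idea can be sketched in a few lines: give a numeric mark only to elements a user could act on, send the labeled marks to the model, and map the ID it picks back to a single element. This is a simplified, hypothetical illustration (the `Candidate` shape and helper names are invented here), not ai-wright's source; it omits drawing the marks onto the screenshot.

```typescript
// Simplified, hypothetical Set-of-Marks sketch; not ai-wright's source.
// In a real run, visibility, disabled state, and boxes come from the live page,
// and the IDs are also drawn onto the screenshot sent to the model.

interface Box { x: number; y: number; width: number; height: number; }

interface Candidate {
  nodeRef: string; // stand-in for a handle to the live DOM node
  label: string; // accessible name or visible text
  visible: boolean;
  disabled: boolean;
  box: Box;
}

interface Viewport { width: number; height: number; }

// Only elements a user could act on right now get a mark.
function isActionable(c: Candidate, vp: Viewport): boolean {
  const { x, y, width, height } = c.box;
  const onScreen =
    width > 0 && height > 0 && x + width > 0 && y + height > 0 && x < vp.width && y < vp.height;
  return c.visible && !c.disabled && onScreen;
}

// Assign sequential mark IDs (1, 2, 3, ...) to actionable elements.
function assignMarks(candidates: Candidate[], vp: Viewport): Map<number, Candidate> {
  const marks = new Map<number, Candidate>();
  let nextId = 1;
  for (const c of candidates) {
    if (isActionable(c, vp)) marks.set(nextId++, c);
  }
  return marks;
}

// Text half of the prompt: one "[id] label" line per mark.
function describeMarks(marks: Map<number, Candidate>): string {
  const lines: string[] = [];
  marks.forEach((c, id) => lines.push(`[${id}] ${c.label}`));
  return lines.join("\n");
}

// The model answers with a mark ID, which maps to exactly one element.
function resolveMark(marks: Map<number, Candidate>, id: number): Candidate {
  const target = marks.get(id);
  if (!target) throw new Error(`Model chose unknown mark ${id}`);
  return target;
}

// Usage: a disabled button and an element below the fold get no mark.
const viewport: Viewport = { width: 1280, height: 720 };
const marks = assignMarks(
  [
    { nodeRef: "#search", label: "Search", visible: true, disabled: false, box: { x: 900, y: 20, width: 80, height: 32 } },
    { nodeRef: "#buy", label: "Buy now", visible: true, disabled: true, box: { x: 100, y: 400, width: 120, height: 40 } },
    { nodeRef: "#careers", label: "Careers", visible: true, disabled: false, box: { x: 40, y: 1900, width: 60, height: 20 } },
  ],
  viewport,
);
console.log(describeMarks(marks)); // prints "[1] Search"
console.log(resolveMark(marks, 1).nodeRef); // prints "#search"
```

Because the model only ever names an ID from this map, it cannot pick an element that was filtered out as disabled or off-screen.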

The result: AI that operates just like a normal user would – based on what it sees on the screen.

Better Reasoning

Instead of one-shot “guess the next click”, ai-wright uses a multi-step reasoning loop.

It plans ahead, performs coarse-grained objective handling (e.g., “fill out login form,” not just “click button”), and adapts to UI state changes – minimizing retries and random flailing.

It can identify blockers (such as modals) and execute pre-steps before acting on the objective.
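The loop described above can be outlined as follows. This is a hypothetical sketch of the control flow only, not ai-wright's implementation: the `Planner` interface, `Observation` shape, and `runObjective` function are invented for illustration, and in practice an LLM would produce the plan and the blocker-clearing steps.

```typescript
// Hypothetical sketch of a plan-act-observe loop; not ai-wright's implementation.

// What the agent sees after each action (simplified).
interface Observation {
  blocker?: string; // e.g. a modal or banner covering the target
  objectiveDone: boolean;
}

// In a real agent an LLM would back both methods.
interface Planner {
  planSteps(objective: string, obs: Observation): string[];
  clearBlocker(blocker: string): string;
}

// Pursues one coarse-grained objective: clear blockers with a pre-step,
// run the planned steps, and re-observe after every action so the plan
// can adapt to UI changes instead of blindly retrying.
async function runObjective(
  objective: string,
  planner: Planner,
  observe: () => Promise<Observation>,
  execute: (step: string) => Promise<void>,
  maxRounds = 5,
): Promise<string[]> {
  const log: string[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const obs = await observe();
    if (obs.objectiveDone) return log;
    if (obs.blocker) {
      const preStep = planner.clearBlocker(obs.blocker);
      await execute(preStep);
      log.push(preStep);
      continue; // re-observe before planning
    }
    for (const step of planner.planSteps(objective, obs)) {
      await execute(step);
      log.push(step);
      const after = await observe();
      if (after.objectiveDone) return log;
      if (after.blocker) break; // new blocker appeared: clear it, then replan
    }
  }
  throw new Error(`Objective not reached in ${maxRounds} rounds: ${objective}`);
}

// Usage against a simulated login page with a cookie banner in front of it.
const ui = { bannerOpen: true, filled: false, submitted: false };
const planner: Planner = {
  clearBlocker: (blocker) => `dismiss ${blocker}`,
  planSteps: () => ["fill credentials", "click Sign in"],
};
const observeUi = async (): Promise<Observation> => ({
  blocker: ui.bannerOpen ? "cookie banner" : undefined,
  objectiveDone: ui.submitted,
});
const executeStep = async (step: string): Promise<void> => {
  if (step === "dismiss cookie banner") ui.bannerOpen = false;
  if (step === "fill credentials") ui.filled = true;
  if (step === "click Sign in" && ui.filled) ui.submitted = true;
};

runObjective("log in", planner, observeUi, executeStep).then((steps) => console.log(steps.join(" -> ")));
// prints "dismiss cookie banner -> fill credentials -> click Sign in"
```

The `maxRounds` cap is one simple way to bound retries; the objective ("log in") stays coarse while the individual clicks and fills are left to the planner.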

BYOL (Bring Your Own License)

ai-wright is LLM-agnostic – unlike existing solutions, which either require proprietary licenses or support only specific providers.

You can use your own OpenAI, Claude, or Gemini key, or a self-hosted model – avoiding vendor lock-in.

You can also use your TestChimp license, which proxies the LLM calls so you don’t pay separate token costs.
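Being LLM-agnostic typically comes down to coding against a small provider interface and choosing the backend from configuration. The sketch below shows that pattern with hypothetical names (`LlmProvider`, `LlmConfig`, `pickProvider`, `verifyWith`); it does not reflect ai-wright's real configuration options, and the network call is stubbed so it runs offline.

```typescript
// Hypothetical sketch of a provider-agnostic LLM layer; names and config
// keys are illustrative, not ai-wright's actual options.

// The only surface the test helpers depend on.
interface LlmProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

interface LlmConfig {
  provider: "openai" | "anthropic" | "gemini" | "self-hosted";
  apiKey?: string; // your own key (BYOL)
  baseUrl?: string; // endpoint of a self-hosted model
}

// Stubbed network call so the sketch runs offline; a real transport would
// call the chosen provider's HTTP API with the key.
type Transport = (target: string, prompt: string) => Promise<string>;

function pickProvider(config: LlmConfig, send: Transport): LlmProvider {
  if (config.provider === "self-hosted") {
    const url = config.baseUrl;
    if (!url) throw new Error("self-hosted provider needs a baseUrl");
    return { name: `self-hosted (${url})`, complete: (prompt) => send(url, prompt) };
  }
  if (!config.apiKey) throw new Error(`${config.provider} provider needs an apiKey`);
  const target = config.provider;
  return { name: target, complete: (prompt) => send(target, prompt) };
}

// Caller code is identical whichever backend was picked.
async function verifyWith(llm: LlmProvider, claim: string): Promise<boolean> {
  const answer = await llm.complete(`Answer yes or no: ${claim}`);
  return answer.trim().toLowerCase().startsWith("yes");
}

// Usage with a fake transport that always answers "yes".
const alwaysYes: Transport = async (target) => `yes (via ${target})`;
const llm = pickProvider({ provider: "self-hosted", baseUrl: "http://localhost:8000/v1" }, alwaysYes);
verifyWith(llm, "The page lists at least one campaign").then((ok) => console.log(llm.name, ok));
// prints "self-hosted (http://localhost:8000/v1) true"
```

Keeping the interface this narrow means swapping a hosted provider for a self-hosted model changes only the config object, not the helpers that call it.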

Fully Open Source

Unlike agentic SaaS offerings, which are closed-source, proprietary solutions, ai-wright is fully open source, giving you complete transparency and community support.

ai-wright lets you inject AI where it matters — the tricky, ambiguous, or dynamic parts of your app — without giving up the speed, determinism, and maintainability of Playwright.

With vision-enabled reasoning, resilient multi-step planning, LLM flexibility, and a fully open source foundation, ai-wright bridges the best of both worlds: reliable, scriptable tests and AI-powered intelligence where you need it most – without any vendor lock-in.

AI where it helps, plain Playwright everywhere else.

