ai-wright: AI Steps in Playwright Scripts

· 3 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

Bring AI-native actions and verifications into your Playwright tests – open source, vision-enabled, and BYOL.

The Problem

Most “AI testing” frameworks make you throw away what already works.

They replace your entire test suite with “agentic” systems — where an LLM drives every click, assertion, and navigation step.

Sounds cool… until you hit:

  • Slow, flaky, or non-deterministic runs
  • Proprietary test formats
  • Complete vendor lock-in

For most teams, that’s a non-starter.

What if you could keep your existing Playwright scripts, and just inject AI where it’s actually needed – the ambiguous, messy, or dynamic parts of your app?

The Idea

ai-wright brings AI steps to Playwright.

You still write regular Playwright tests – deterministic, fast, inspectable – but when you hit a fuzzy point, you can drop in a step like:

await ai.act('Click on a top rated campaign', { page, test });

Or

await ai.verify('The campaign description should not contain offensive words', { page, test });

That’s it. AI only handles that step.

Everything else stays Playwright-native.

Why It’s Different

  1. Vision-Enabled

Existing libraries (like ZeroStep and auto-playwright) use sanitized HTML, which misses what’s actually on screen.

This causes several issues:

  1. HTML ≠ UI reality – static DOM can’t reveal whether elements are disabled, visible, obscured, or off-screen, so LLMs attempt to interact with non-interactive elements.
  2. Loss of semantics – sanitized HTML strips ARIA roles, computed text, layout cues, and shadow DOM content, which are critical for accurate reasoning.
  3. Unbounded prompt size – large DOMs are often too verbose for a prompt and must be truncated, losing context.
  4. Fragile selectors – HTML-based approaches force LLMs to guess selectors; ai-wright uses precise SoM (Set-of-Marks) IDs bound to live DOM nodes, enabling accurate one-shot execution.

ai-wright is vision-enabled: it blends SoM-annotated screenshots with structured DOM context for grounded, visual reasoning.

The result: AI that operates just like a normal user would – based on what it sees on the screen.
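
To make the Set-of-Marks idea concrete, here is a minimal sketch of the marking step: keep only the elements a user could actually interact with, then give each one a numeric ID that can be drawn on the screenshot and listed next to it in the prompt. This is an illustration under stated assumptions, not ai-wright’s actual implementation; the `ElementSnapshot`, `Mark`, and `Viewport` types and the `assignMarks` function are hypothetical names.

```typescript
// Hypothetical types: a snapshot of one DOM element captured from the live page.
interface Box { x: number; y: number; width: number; height: number }
interface Viewport { width: number; height: number }

interface ElementSnapshot {
  tag: string;
  text: string;
  visible: boolean;  // assumed to be computed from the live page, not static HTML
  disabled: boolean;
  box: Box;
}

// A mark is the numeric ID overlaid on the screenshot plus a short text label.
interface Mark { id: number; label: string; box: Box }

// True when the box has non-zero area and overlaps the viewport at all.
function intersectsViewport(box: Box, viewport: Viewport): boolean {
  return (
    box.width > 0 &&
    box.height > 0 &&
    box.x < viewport.width &&
    box.y < viewport.height &&
    box.x + box.width > 0 &&
    box.y + box.height > 0
  );
}

// Drop hidden, disabled, and off-screen elements, then number the rest from 1.
function assignMarks(elements: ElementSnapshot[], viewport: Viewport): Mark[] {
  return elements
    .filter((e) => e.visible && !e.disabled && intersectsViewport(e.box, viewport))
    .map((e, index) => ({ id: index + 1, label: `${e.tag} "${e.text}"`, box: e.box }));
}

const viewport: Viewport = { width: 1280, height: 720 };
const elements: ElementSnapshot[] = [
  { tag: "button", text: "Donate", visible: true, disabled: false, box: { x: 100, y: 200, width: 120, height: 40 } },
  { tag: "button", text: "Submit", visible: true, disabled: true, box: { x: 100, y: 260, width: 120, height: 40 } },
  { tag: "a", text: "Terms", visible: true, disabled: false, box: { x: 100, y: 2400, width: 80, height: 20 } },
  { tag: "a", text: "Top rated campaign", visible: true, disabled: false, box: { x: 300, y: 200, width: 200, height: 20 } },
];

const marks = assignMarks(elements, viewport);
for (const mark of marks) {
  console.log(`[${mark.id}] ${mark.label}`);
}
```

Because the model answers with a mark ID rather than a guessed selector, the chosen ID can be mapped straight back to the element it was assigned to.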

  2. Better Reasoning

Instead of one-shot “guess the next click”, ai-wright uses a multi-step reasoning loop.

It plans ahead, performs coarse-grained objective handling (e.g., “fill out login form,” not just “click button”), and adapts to UI state changes – minimizing retries and random flailing.

It can also identify blockers (such as modals) and execute pre-steps to clear them before acting on the objective.
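
A reasoning loop of this kind can be sketched roughly as follows. The planner here is a hard-coded stand-in for the LLM call, and `UiState`, `Step`, `planNext`, and `runObjective` are hypothetical names chosen for the example; the sketch only shows the shape of the loop (observe state, clear blockers, act on the objective, stop when done), not how ai-wright implements it.

```typescript
// Hypothetical, simplified view of the UI state the agent observes each turn.
interface UiState { modalOpen: boolean; formFilled: boolean }

type Step =
  | { kind: "dismiss"; target: string }   // pre-step that clears a blocker
  | { kind: "act"; description: string }  // coarse-grained action on the objective
  | { kind: "done" };

// Stand-in for the LLM planner: blockers first, then the objective itself.
function planNext(objective: string, state: UiState): Step {
  if (state.modalOpen) return { kind: "dismiss", target: "modal" };
  if (!state.formFilled) return { kind: "act", description: objective };
  return { kind: "done" };
}

// Stand-in for executing a step in the browser and observing the new state.
function applyStep(state: UiState, step: Exclude<Step, { kind: "done" }>): UiState {
  return step.kind === "dismiss"
    ? { ...state, modalOpen: false }
    : { ...state, formFilled: true };
}

// Re-plan after every step, with a hard cap so the loop cannot run forever.
function runObjective(objective: string, initial: UiState, maxSteps = 5): string[] {
  const log: string[] = [];
  let state = initial;
  for (let i = 0; i < maxSteps; i++) {
    const step = planNext(objective, state);
    if (step.kind === "done") {
      log.push("done");
      return log;
    }
    log.push(step.kind === "dismiss" ? `dismiss ${step.target}` : `act: ${step.description}`);
    state = applyStep(state, step);
  }
  log.push("gave up");
  return log;
}

const log = runObjective("fill out login form", { modalOpen: true, formFilled: false });
console.log(log.join(" -> "));
```

The step cap is a design choice for the sketch: re-planning after each state change handles surprises like a modal, while the cap keeps a confused planner from looping indefinitely.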

  3. BYOL (Bring Your Own License)

ai-wright is LLM-agnostic, unlike existing solutions, which either require proprietary licenses or support only specific providers.

You can use your own OpenAI, Claude, or Gemini key, or a self-hosted model, avoiding vendor lock-in.

You can choose to use your TestChimp license as well – which will proxy the LLM calls, removing separate token costs for you.

  4. Fully Open Source

Unlike agentic SaaS offerings, which are closed-source and proprietary, ai-wright is fully open source, giving you complete transparency and community support.

ai-wright lets you inject AI where it matters — the tricky, ambiguous, or dynamic parts of your app — without giving up the speed, determinism, and maintainability of Playwright.

With vision-enabled reasoning, resilient multi-step planning, LLM flexibility, and a fully open source foundation, ai-wright bridges the best of both worlds: reliable, scriptable tests and AI-powered intelligence where you need it most – without any vendor lock-in.

AI where it helps, plain Playwright everywhere else.

Building Agents? Watch Memento

· 2 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

LLMs sound like humans – so we often end up instructing them as if they experience the world like us.

But there’s a subtle difference – especially when used as Agents.

👀 Humans experience a continuous stream of input and reasoning.

We build tiny hypotheses along the way:

“Let me hover over the tooltip to see what this button is for.”

It’s a loop of sense → reason → act, in continuity.

🧠 Agents, on the other hand, live in snapshots:

See screen → Decide → Act → See new screen.

They’re like a human who:

  • Looks at the screen
  • Writes a letter to a controller to perform an action
  • Closes their eyes while it’s happening ← VERY IMPORTANT
  • Opens their eyes to a new scene – with no memory of the past

The only continuity? 📝

A notepad on the table – a few scribbled notes before they "blacked out".

So we asked ourselves:

“If this were me, how would I use that notepad?”

We’d been giving agents summaries of prior steps – but something was still missing.

So we made a small tweak to the prompt:

👉 “Write a note to your future self”

Result: the agent now jots down whatever it wants its future self to know, such as:

  • What hypothesis it’s testing
  • Why it chose this action
  • What to look for in the new state

So in the next iteration when it wakes up, it knows: “What was I thinking?”
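
Here is a minimal sketch of how such a note can be threaded through a stateless agent loop. The model is replaced by a hard-coded stub, and `AgentTurn`, `buildPrompt`, `fakeModel`, and `runAgent` are hypothetical names; the only thing carried from one step to the next is the note itself.

```typescript
// What the agent returns each turn: an action plus a note for its next "wake-up".
interface AgentTurn {
  action: string;
  noteToFutureSelf: string;
}

const NO_NOTE = "(none, this is the first step)";

// Each prompt contains only the current screen and the previous note.
function buildPrompt(objective: string, screen: string, previousNote: string | null): string {
  return [
    `Objective: ${objective}`,
    `Current screen: ${screen}`,
    `Note from your past self: ${previousNote ?? NO_NOTE}`,
    "Decide the next action. Write a note to your future self.",
  ].join("\n");
}

// Stand-in for the LLM: a real agent would call a model here.
function fakeModel(prompt: string): AgentTurn {
  if (prompt.indexOf(NO_NOTE) !== -1) {
    return {
      action: "hover over the unlabeled gear icon",
      noteToFutureSelf: "Hypothesis: the gear icon opens settings. Look for a tooltip that confirms it.",
    };
  }
  return {
    action: "click the gear icon",
    noteToFutureSelf: "Tooltip confirmed the hypothesis. Expect the settings page next.",
  };
}

function runAgent(objective: string, screens: string[]): { prompts: string[]; turns: AgentTurn[] } {
  const prompts: string[] = [];
  const turns: AgentTurn[] = [];
  let note: string | null = null;
  for (const screen of screens) {
    const prompt = buildPrompt(objective, screen, note);
    const turn = fakeModel(prompt);
    prompts.push(prompt);
    turns.push(turn);
    note = turn.noteToFutureSelf; // the notepad left on the table
  }
  return { prompts, turns };
}

const run = runAgent("Open the settings page", [
  "Toolbar with an unlabeled gear icon",
  "Toolbar with a tooltip reading 'Settings'",
]);
console.log(run.prompts[1]);
```

The second prompt contains the first turn’s note verbatim, so the agent can see what it was thinking without being given the full history.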

That single line, “Write a note to your future self”, gave our agent a memory-like thread.

A small change. A big leap in clarity and navigation. 🚀

#AI #Agents #LLM #StartUp #BuildInPublic #AgenticAI