
The Silent Killer Churning Your Users: Slow, Janky UX

· 3 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

Everyone loves to talk about “building features” and “shipping fast.” But we rarely talk about the thing that silently kills conversions, frustrates users, and destroys retention:

Performance.


Not the “page still loads eventually” kind – but the slow, janky, slightly-off performance that users instantly notice and abandon your product for.

And the data is brutal:

  • Amazon found that a 1-second delay in page load time reduced conversions by 7%.

  • The probability of a bounce increases by 32% as load time goes from 1s → 3s.

  • Apps that invest in performance optimizations see up to 30% higher retention.

Users don’t always tell you this directly, but every UX study confirms it:

Slow, sluggish experiences are one of the most complained-about frustrations – and a top reason users bounce.

But We Already Have Automated Tests… Isn’t Our App “Tested”?

This is the dangerous assumption teams make.

Yes, you may have automated test coverage.

Yes, your flows might “functionally work.”

But functional checks don’t catch:

  • the button that feels slow
  • the layout shift that makes the user misclick
  • the subtle JavaScript bloat that accumulates over releases
  • the screen that takes 1.2s longer than it used to
  • the resource that takes too long to load due to a cache misconfiguration
  • the memory leak that only appears after a few steps

These aren’t textbook “bugs,” so no one files them.

And because performance is subjective (“eh, feels a bit sluggish?”), it rarely gets documented with hard numbers.

Result: regressions creep in release after release – until your retention chart quietly slopes downward.

Performance Bug Detection in TestChimp’s Exploratory Agent

To fix this blind spot, TestChimp’s exploratory agent now automatically flags performance and memory issues – alongside the other usability bugs it catches.

And just like other bugs it finds, every performance issue is tied to the exact screen/state it appeared in.

You get a clear map of where your app slows down, why, and by how much.

No more vague complaints.

No more guessing.

Performance bugs, accurately tracked and backed by hard evidence.


What the Agent Analyzes

The agent captures and analyzes deep browser performance metrics such as:

  • CLS (Cumulative Layout Shift) – where janky content shifts occur
  • INP (Interaction to Next Paint) – slow button responses, input lag
  • Long Tasks – heavy JS blocking the main thread
  • Large or unoptimized resource loads
  • TBT (Total Blocking Time)
  • Memory heap usage and leaks
  • Network timing and cache misses

And more…

It combines this with screenshot data to highlight:

  • Which screens are causing frustration
  • Which buttons are slow to respond
  • Where layout instability is happening
  • Which resources are dragging down load times
  • Where caching is failing
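To make one of those metrics concrete, here’s a minimal sketch of how a CLS score is derived from layout-shift entries. This is a simplified aggregation for illustration only (the current CLS definition groups shifts into session windows and reports the worst window; the plain sum below captures the core idea), not TestChimp’s implementation:

```typescript
// Simplified CLS sketch: sum layout shifts that weren't caused by
// recent user input. Entry shape mirrors the browser's LayoutShift
// performance entries.
interface LayoutShiftEntry {
  value: number;           // shift score reported by the browser
  hadRecentInput: boolean; // shifts right after input don't count toward CLS
}

function cumulativeLayoutShift(entries: LayoutShiftEntry[]): number {
  return entries
    .filter((e) => !e.hadRecentInput)
    .reduce((sum, e) => sum + e.value, 0);
}

// Example: two unexpected shifts and one input-driven shift.
const score = cumulativeLayoutShift([
  { value: 0.08, hadRecentInput: false },
  { value: 0.02, hadRecentInput: true },
  { value: 0.05, hadRecentInput: false },
]);
console.log(score.toFixed(2)); // → "0.13"
```

A score above roughly 0.1 is where users start noticing “janky” content shifts – exactly the kind of regression that functional assertions never catch.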

Essentially:

The stuff that actually impacts user experience – and revenue – but never gets caught in ordinary test suites.

Why This Matters

Performance isn’t a “nice-to-have.”

It’s a direct business driver:

  1. Higher conversions
  2. Lower bounce rates
  3. Higher user trust
  4. Better retention
  5. Cleaner UX
  6. Higher SEO ranking
  7. Less app fatigue and frustration

By treating performance issues as first-class bugs, you’re not just “optimizing” – you’re making your product feel premium and effortless, the way users expect modern web apps to be.

E2E tests as a Map of App Pathways

· 4 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

End-to-end tests are ultimately just a sequence of user actions and expectation checks. Conceptually, each test is a walk through your app:

Goto url -> Login -> Go to Settings Page -> Update role -> Verify role is updated

You can represent this as a path: every step is a node, and the edges show how the user moves from one step to the next.

Now imagine aggregating all the paths from all the tests in your suite. You end up with a tree-like structure—essentially a map of every known pathway through your product.
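That aggregation can be sketched as a prefix tree (trie) over step labels, so tests that share an opening sequence collapse into a common trunk. The node shape here is illustrative, not TestChimp’s internal representation:

```typescript
// Sketch: aggregate test step sequences into a pathway tree.
// Each node keys its children by action label, so shared prefixes
// (e.g. "Goto url -> Login") become a single shared trunk.
interface PathNode {
  children: Map<string, PathNode>;
}

function buildPathwayMap(tests: string[][]): PathNode {
  const root: PathNode = { children: new Map() };
  for (const steps of tests) {
    let node = root;
    for (const step of steps) {
      if (!node.children.has(step)) {
        node.children.set(step, { children: new Map() });
      }
      node = node.children.get(step)!;
    }
  }
  return root;
}

// Two tests that share the "Goto url -> Login" trunk:
const map = buildPathwayMap([
  ['Goto url', 'Login', 'Go to Settings Page', 'Update role'],
  ['Goto url', 'Login', 'Open Cart', 'Checkout'],
]);
const afterLogin = map.children.get('Goto url')!.children.get('Login')!;
console.log([...afterLogin.children.keys()]); // the two branches after Login
```

Two tests, one trunk, two branches – scale that to a whole suite and you have the map.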


This isn’t just a “cool visualization.”

It unlocks powerful, practical applications – especially when using AI agents for testing.

Better RAG for Testing Agents

This tree acts as a graph index over your product’s behavioural pathways. Just like a database index accelerates queries, this structure enables an agent to answer deeper questions about your app’s behaviour – making retrieval-augmented reasoning much more effective.

With it, an agent doesn’t have to hallucinate how the app works. It can look up structure, pathways, and reachable states deterministically.

Automatically Expanding Your Test Suite

Once you have this _“pathway map,”_ an agent can intelligently expand your test suite by targeting untested branches. To do this well, the agent needs two answers:

  • How do I reach the required state?
  • Which branches from that state are already covered?

In TestChimp (under Atlas → Behaviour Tree), selecting any node shows:

  • the exact path from the root to that node (how to get there), and
  • all outgoing edges (which branches are already explored by existing tests).
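Those two lookups can be sketched over a toy pathway tree. The `Tree` shape and helper names here are hypothetical, for illustration – not TestChimp’s API:

```typescript
// Sketch: the two lookups an agent needs over a pathway tree.
// Nodes are plain objects keyed by step label.
type Tree = { [step: string]: Tree };

// Depth-first search for the path from the root to a named step.
function pathTo(tree: Tree, target: string, prefix: string[] = []): string[] | null {
  for (const [step, child] of Object.entries(tree)) {
    const path = [...prefix, step];
    if (step === target) return path;
    const found = pathTo(child, target, path);
    if (found) return found;
  }
  return null;
}

// Outgoing edges from a node = branches already covered by existing tests.
function outgoingEdges(tree: Tree, path: string[]): string[] {
  let node = tree;
  for (const step of path) node = node[step];
  return Object.keys(node);
}

const tree: Tree = {
  'Goto url': {
    Login: { 'Go to Settings Page': { 'Update role': {} }, 'Open Cart': {} },
  },
};
const path = pathTo(tree, 'Login')!; // ['Goto url', 'Login']
console.log(outgoingEdges(tree, path)); // covered branches from Login
```

Anything the agent brainstorms that is *not* in the `outgoingEdges` result is a candidate for a new test.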

From there, the agent simply:

  1. Navigates to the node by following the script steps.

  2. Looks at the UI state.

  3. Brainstorms unexplored actions (new branches).

  4. Converts each unexplored branch into a new test.

In other words, the map gives the agent the same advantage a human has when using Google Maps – it can get anywhere, deliberately.

Controlled Agentic Exploration

Agent-led exploratory testing can be powerful: the agent can analyze DOM, screenshots, network logs, and console output while walking through your app.

But in practice, fully-agentic exploration has challenges:

  • Slow – inference happens at every step
  • Easily distracted – coarse objectives lead to wandering
  • Unfocused – without context, exploration becomes random

It’s like asking a human to explore an unfamiliar city with no map:

slow progress, random detours, and little sense of the big picture.

Your behavioural pathway graph is the map.

With it, the agent can:

  • reason about where it is,
  • figure out where to go next,
  • and explore far more methodically.

You can even focus exploration narrowly – for example:

“Analyze the Settings page as an admin user.”

Because each step in the graph is annotated with the screen and state (from previous explorations), the agent can determine:

  • how to reach that precise screen state, and
  • how to explore meaningfully once there.

To try variations (e.g., test different scenarios in Settings), the agent simply follows the shared trunk of paths that lead to that screen – much like several routes through a city share the same highway.

Bridging Pathways With App Structure: Screens & States

Throughout this post we’ve mentioned “screens” and “states.”

Here’s how they fit in.

A human knows, while navigating:

  • “I’m on the login page”
  • “Now I’m on the home page”
  • “Now I’m in the settings page as an admin”

Traditional Playwright scripts do not carry that semantic information.

But an agent can.

As it walks through a test step-by-step, it can look at the UI and infer:

  • Which screen am I on?
  • What state am I in? (logged in, admin, item added, etc.)

This is exactly what ExploreChimp does.

During guided exploration, it maps each step to the screen and state the UI is currently in.

That enriched context enables the agent to answer questions like:

“How do I get to the Settings page as an admin user?”

“What screens does this test touch?”

“Which parts of the product lack coverage?”

By connecting behavioural paths with semantic screen/state understanding, TestChimp gains a rich structural model of your app – fueling downstream capabilities like:

  • generating user stories,
  • planning test strategies,
  • writing new tests,
  • and performing targeted exploratory analysis.

Screen-State markers in SmartTests

· 3 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

Ok, first a quick recap on SmartTests:

SmartTests are plain Playwright scripts with intent comments before steps, enabling hybrid execution (falling back to agent-mode execution when needed).

SmartTests are used by ExploreChimp to guide its explorations along pre-defined pathways, identifying UX issues in your web app such as performance, visual glitches, usability, content, and more.

The Challenge: Context for Bugs

When ExploreChimp finds bugs, it tags them with the “Screen” and “State” where they were captured. This context helps with troubleshooting and understanding when issues occur.

  • A Screen is a conceptual view of your application: Dashboard, Homepage, Shopping Cart, etc.
  • A State represents a specific situation within that screen: Empty Cart vs Cart with Items, Logged In vs Logged Out, etc.

ExploreChimp autonomously determines the current screen and state based on the steps taken and the current screenshot. While this makes getting started easier, it may not always align with your mental model or the granularity you want things tracked at.

The Solution: Screen-State Annotations

Now you can add explicit screen-state markers directly in your SmartTest scripts. These annotations tell ExploreChimp exactly which screen and state the app is in at a given point in the test, ensuring bugs are tagged with the context you care about.

How It Works

After ExploreChimp runs, if the script didn’t already contain screen-state markers, it updates the script with the annotations it determined during the walk.

If you don’t want the agent to update the script, you can turn this off by unchecking “Update script with screen-state annotations” under Advanced Settings (in the Exploration config wizard).

You can edit these annotations to match your conceptual model. For example, you may want to track UX bugs for “Cart with out-of-stock items” vs “Cart with in-stock items” instead of the agent-suggested states.

On the next run, ExploreChimp uses your annotations instead of guessing, so bugs are tagged consistently with your terminology.

Here is an example of a SmartTest with screen-state annotations:

test('Shopping Cart Flow', async ({ page }) => {
  // Navigate to homepage
  await page.goto('https://example.com');
  // @Screen: Homepage @State: Default

  // Search for a product
  await page.getByPlaceholder('Search products').fill('laptop');
  await page.getByRole('button', { name: 'Search' }).click();
  // @Screen: Search Results @State: With Results

  // Add item to cart
  await page.getByRole('link', { name: /laptop/i }).first().click();
  await page.getByRole('button', { name: 'Add to Cart' }).click();
  // @Screen: Shopping Cart @State: Cart with Items

  // Proceed to checkout
  await page.getByRole('button', { name: 'Proceed to Checkout' }).click();
  // @Screen: Checkout @State: Payment Step
});
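For illustration, here’s a sketch of how such markers could be extracted from a script. The marker format follows the example above; the regex and the `Marker` shape are assumptions of this sketch, not ExploreChimp’s implementation:

```typescript
// Sketch: extract "// @Screen: X @State: Y" markers from a SmartTest script.
interface Marker {
  line: number;  // 1-based line number where the marker appears
  screen: string;
  state: string;
}

function parseMarkers(script: string): Marker[] {
  const pattern = /^\s*\/\/\s*@Screen:\s*(.+?)\s*@State:\s*(.+?)\s*$/;
  return script.split('\n').flatMap((text, i) => {
    const m = text.match(pattern);
    return m ? [{ line: i + 1, screen: m[1], state: m[2] }] : [];
  });
}

const script = [
  "await page.goto('https://example.com');",
  '// @Screen: Homepage @State: Default',
  "await page.getByRole('button', { name: 'Search' }).click();",
  '// @Screen: Search Results @State: With Results',
].join('\n');
console.log(parseMarkers(script));
```

Because the markers are ordinary comments, a parse like this can run over any SmartTest without affecting Playwright execution.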

Benefits

  • Consistent bug tagging: Bugs are tagged consistently using your terminology, not AI-generated labels.

  • Better organization: View bugs by screen-state in Atlas → SiteMap with your own categories.

  • Easy refinement: Edit annotations to match your mental model easily – no need to retrain or reconfigure.

Getting Started

  1. Run ExploreChimp on your SmartTest (annotations are added automatically).

  2. Review and edit the annotations in your script to match your terminology.

  3. The next time ExploreChimp runs on that test, it will use your annotations for consistent bug tagging.

The annotations are simple comments, so they don’t affect test execution – they’re purely for ExploreChimp’s context understanding.

ai-wright: AI Steps in Playwright Scripts

· 3 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

Bring AI-native actions and verifications into your Playwright tests – open source, vision-enabled, and BYOL.

The Problem

Most “AI testing” frameworks make you throw away what already works.

They replace your entire test suite with “agentic” systems — where an LLM drives every click, assertion, and navigation step.

Sounds cool… until you hit:

  • Slow, flaky, or non-deterministic runs
  • Proprietary test formats
  • Complete vendor lock-in

For most teams, that’s a non-starter.

What if you could keep your existing Playwright scripts, and just inject AI where it’s actually needed – the ambiguous, messy, or dynamic parts of your app?

The Idea

ai-wright brings AI steps to Playwright.

You still write regular Playwright tests – deterministic, fast, inspectable – but when you hit a fuzzy point, you can drop in a step like:

await ai.act('Click on a top rated campaign', { page, test });

Or

await ai.verify('The campaign description should not contain offensive words', { page, test });

That’s it. AI only handles that step.

Everything else stays Playwright-native.

Why It’s Different

  1. Vision-Enabled

Existing libraries (like ZeroStep and auto-playwright) use sanitized HTML – which misses what’s actually on screen.

This causes many issues:

  1. HTML ≠ UI reality – static DOM can’t reveal if elements are disabled, visible, obscured, or off-screen – resulting in LLMs attempting interaction with non-interactive elements.
  2. Loss of semantics – sanitized HTML strips ARIA roles, computed text, layout cues, and shadow DOM content, which are critical for accurate reasoning.
  3. Unbounded prompt size – large DOMs often become too verbose, requiring truncation (and with it, loss of context).
  4. Fragile selectors – HTML-based approaches force LLMs to guess selectors; ai-wright uses precise SoM IDs bound to live DOM nodes, enabling accurate one-shot execution.
In contrast, ai-wright is vision-enabled: it blends SoM (Set-of-Marks) annotated screenshots with structured DOM context for grounded, visual reasoning.

The result: AI that operates just like a normal user would – based on what it sees on the screen.

  2. Better Reasoning

Instead of one-shot “guess the next click”, ai-wright uses a multi-step reasoning loop.

It plans ahead, performs coarse-grained objective handling (e.g., “fill out login form,” not just “click button”), and adapts to UI state changes – minimizing retries and random flailing.

It can identify blockers (such as modals) and execute pre-steps before acting on the objective.

  3. BYOL (Bring Your Own License)

ai-wright is LLM-agnostic – unlike existing solutions, which either require proprietary licenses or support only specific providers.

You can use your own OpenAI, Claude, or Gemini key, or your self-hosted model – avoiding vendor lock-in.

You can also choose to use your TestChimp license, which will proxy the LLM calls, removing separate token costs for you.

  4. Fully Open Source

Unlike closed-source, proprietary agentic SaaS offerings, ai-wright is fully open source, giving you complete transparency and community support.

ai-wright lets you inject AI where it matters — the tricky, ambiguous, or dynamic parts of your app — without giving up the speed, determinism, and maintainability of Playwright.

With vision-enabled reasoning, resilient multi-step planning, LLM flexibility, and a fully open source foundation, ai-wright bridges the best of both worlds: reliable, scriptable tests and AI-powered intelligence where you need it most – without any vendor lock-in.

AI where it helps, plain Playwright everywhere else.

Building Agents? Watch Memento

· 2 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

LLMs sound like humans – so we often end up instructing them as if they experience the world like us.

But there’s a subtle difference – especially when used as Agents.

👀 Humans experience a continuous stream of input and reasoning.

We build tiny hypotheses along the way:

“Let me hover over the tooltip to see what this button is for.”

It’s a loop of sense → reason → act, in continuity.

🧠 Agents, on the other hand, live in snapshots:

See screen → Decide → Act → See new screen.


They’re like a human who:

  • Looks at the screen
  • Writes a letter to a controller to perform an action
  • Closes their eyes while it’s happening ← VERY IMPORTANT
  • Opens their eyes to a new scene – with no memory of the past

The only continuity? 📝

A notepad on the table – a few scribbled notes before they "blacked out".

So we asked ourselves:

“If this were me, how would I use that notepad?”

We’d been giving agents summaries of prior steps – but something was still missing.

So we made a small tweak to the prompt:

👉 “Write a note to your future self”

Result: the agent now jots down whatever it wants its future self to know, such as:

  • What hypothesis it’s testing
  • Why it chose this action
  • What to look for in the new state

So in the next iteration when it wakes up, it knows: “What was I thinking?”

That single line – “Write a note to your future self” – gave our agent a memory-like thread.

A small change. A big leap in clarity and navigation. 🚀
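The snapshot loop described above can be sketched as follows. `observeScreen`, `callModel`, the `AgentTurn` shape, and the prompt wording are all illustrative stubs for this sketch – not our production agent:

```typescript
// Sketch: an agent loop where the only continuity between snapshots
// is the note the model writes to its future self.
interface AgentTurn {
  action: string;           // what the model decided to do
  noteToFutureSelf: string; // the "scribbled note" carried to the next turn
}

async function runAgent(
  observeScreen: () => Promise<string>,
  callModel: (prompt: string) => Promise<AgentTurn>,
  act: (action: string) => Promise<void>,
  maxSteps: number,
): Promise<string[]> {
  const notes: string[] = [];
  let note = '(first step - no prior note)';
  for (let i = 0; i < maxSteps; i++) {
    const screen = await observeScreen();
    // The note from the previous turn is the only thread of continuity.
    const turn = await callModel(
      `Screen:\n${screen}\n\nNote from your past self:\n${note}\n\n` +
        `Decide the next action, then write a note to your future self.`,
    );
    await act(turn.action);
    note = turn.noteToFutureSelf;
    notes.push(note);
  }
  return notes;
}
```

Each iteration starts from a fresh snapshot, but because the previous turn’s note is injected into the prompt, the agent wakes up knowing what it was thinking.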

#AI #Agents #LLM #StartUp #BuildInPublic #AgenticAI