
4 posts tagged with "Smart Tests"

Smart testing strategy and tooling


Your E2E tests are unreliable? Here's why

· 6 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

End-to-end tests are a necessary evil: they are the last line of defense confirming that something actually works in a real browser, but they break often enough that the suite becomes a burden instead of a trustworthy signal.

There are three main sources of variance that make E2E tests unreliable. Understanding them is the first step toward a suite you can actually rely on.

1. World-state variance

This is what happens when your tests run in a different world-state than the one they were written against. A common cause is a shared environment where manual testing and automated runs both happen. The world changes between runs; the next run fails for reasons that have nothing to do with the code under test.

This kind of variance does more than flake tests. It also slows feedback: if those environments only get updates after PRs merge to main, you get weaker root-cause isolation and more expensive triage when something breaks.

2. System variance

These are variances built into the stack itself: network latency, transient failures, UI paint timing, and so on. Mature frameworks like Playwright address a lot of this with built-in waiting, auto-waiting locators, and expect polling—so a big slice of “flakiness” is really tooling and patterns, not fate.

3. Product variance

Even in a steady state, with no product changes, modern web apps are not as simple as calculators. Behavior can be inherently non-deterministic, and that is only more true now that AI often sits in the user journey (for example, a splash or offer that appears only sometimes). Much of that variance may be irrelevant to what a given test is trying to prove.

When tests are authored with a fragile, UI-selector-heavy approach, those product-level variances show up as broken steps. The test is coupled to incidental UI, not to intent.


Solve for these three kinds of variance, and you get a suite that is finally trustworthy.

World-state variance → controlled environments

Use ephemeral environments loaded into known, predefined world-states, so each run matches the state the test was authored against as closely as possible.
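One way to wire this in, as a sketch: Playwright's global setup hook can reset an ephemeral environment to a known seed before the suite runs. The reset endpoint and seed name below are hypothetical, not part of Playwright or any real API.

```typescript
// global-setup.ts — referenced from playwright.config.ts via
// `globalSetup: './global-setup.ts'`. Before the suite runs, reset the
// ephemeral environment to a known, predefined world-state.
// The /test-hooks/reset endpoint and the seed name are hypothetical.
export default async function globalSetup() {
  const base = process.env.BASE_URL ?? 'https://ephemeral.example.com';
  await fetch(`${base}/test-hooks/reset?seed=checkout-ready`, { method: 'POST' });
}
```

The point is that every run starts from the same seeded state, so a failure points at the code under test rather than at whatever the last run left behind.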

System variance → solid automation primitives

This is largely where mature frameworks shine. With Playwright, you get strong primitives for timing and stability—so you are not reinventing waits on every test.
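Under the hood, those primitives boil down to retry-until-deadline. Here is a minimal, self-contained sketch of the pattern (not Playwright's actual implementation) to show why a polled assertion tolerates transient state where a one-shot assertion fails:

```typescript
// A simplified sketch of the auto-retrying pattern behind Playwright's
// expect polling: re-evaluate a condition until it holds or a timeout
// elapses, instead of asserting once against a momentary state.
async function pollUntil<T>(
  probe: () => Promise<T>,
  predicate: (value: T) => boolean,
  { timeoutMs = 5000, intervalMs = 100 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (predicate(value)) return value;
    if (Date.now() > deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms (last value: ${value})`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Example: a value that only becomes ready after a short delay, like a
// UI element that paints late. A one-shot check would fail; polling passes.
let status = 'loading';
setTimeout(() => { status = 'ready'; }, 200);
pollUntil(async () => status, (s) => s === 'ready').then((s) => console.log(s));
```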

Product variance → intent where the UI is messy

This is where agentic steps in tests can help: natural-language instructions executed by an AI, instead of brittle coupling to selectors—only on the messy, flaky parts of the flow.

There is no free lunch: natural-language steps tend to be slower, costlier, and harder to debug than plain script. The goal is to use them surgically, not everywhere.
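One way to keep that usage surgical, sketched with stand-in functions: run the fast scripted step first, and only hand the step to the agent when the script fails. `runAgentStep` below stands in for an API like `ai.act` and is hypothetical, not a real TestChimp or Playwright call.

```typescript
// A hypothetical sketch of "agentic fallback": prefer the deterministic
// scripted step; fall back to a natural-language agent step on failure.
type Step = () => Promise<void>;

async function withAgentFallback(
  scripted: Step,
  instruction: string,
  runAgentStep: (instruction: string) => Promise<void>,
): Promise<'script' | 'agent'> {
  try {
    await scripted();
    return 'script'; // fast, deterministic path succeeded
  } catch {
    await runAgentStep(instruction); // slower, costlier agent path
    return 'agent';
  }
}

// Usage with stand-in steps: the scripted step throws, so the agent runs.
withAgentFallback(
  async () => { throw new Error('selector not found'); },
  'Expand the tree displayed in the left pane',
  async (instruction) => console.log(`agent: ${instruction}`),
).then((path) => console.log(path)); // logs "agent"
```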

SmartTests: intent-based Playwright scripts

That tradeoff is what TestChimp SmartTests are designed around: intent-based Playwright scripts.

They are still scripts for the most part, with an extra capability when you need flexibility—the parts of the app that fight selector-based automation.

Instead of:

await page.locator('.anticon.anticon-plus-square.ant-tree-switcher-line-icon > svg').nth(1).click();

Now you can write:

await ai.act('Expand the tree displayed in the left pane');

Only where you need it—the brittle, shifting UI—not for every line of the test.

Because the tests remain Playwright-based, system variance is handled by the same patterns and tooling you already trust. Run them in ephemeral environments with controlled world-states, and you have a test suite with all three variances accounted for: a suite you can trust.

And yes, we are cooking up something on the ephemeral-environment side too. Stay tuned...

References

The themes above—non-deterministic tests, UI-level flakiness, and mitigations—are well documented in both industry practice and research. These sources are a good starting point if you want to go deeper.

  1. Martin Fowler, Eradicating Non-Determinism in Tests — A widely cited overview of why tests become non-deterministic (including async and shared-state issues) and how to structure tests to get repeatable results.
    https://martinfowler.com/articles/nonDeterminism.html

  2. Google Testing Blog (George Pirocanac), Test Flakiness - One of the main challenges of automated testing (Dec 2020) — Describes categories of flakiness and why inconsistent automated tests slow development; follow-up posts in the same series expand on causes and responses.
    https://testing.googleblog.com/2020/12/test-flakiness-one-of-main-challenges.html

  3. Google Testing Blog (John Micco), Flaky Tests at Google and How We Mitigate Them (May 2016) — Early, concrete account of flaky tests at scale, including mitigation strategies and discussion of where UI tests skew flaky.
    https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html

  4. Wing Lam, Stefan Winter, Anjiang Wei, Tao Xie, Darko Marinov, Jonathan Bell, A Large-Scale Longitudinal Study of Flaky Tests, Proc. ACM Program. Lang. 4, OOPSLA, Article 202 (2020). Peer-reviewed study of when tests become flaky and how changes in code, tests, and dependencies contribute.
    https://doi.org/10.1145/3428270
    Conference entry: https://2020.splashcon.org/details/splash-2020-oopsla/78/A-Large-Scale-Longitudinal-Study-of-Flaky-Tests

  5. Microsoft Playwright, Auto-waiting (Actionability) — Official documentation for pre-action checks (visible, stable, receiving events, enabled, etc.) that reduce timing-driven failures.
    https://playwright.dev/docs/actionability

  6. Microsoft Playwright, Assertions — Describes auto-retrying assertions (expect) that wait until conditions hold, complementary to actionability for stable checks.
    https://playwright.dev/docs/test-assertions

  7. Heroku / 12factor, Dev/prod parity (The Twelve-Factor App) — Classic framing for keeping development, staging, and production sufficiently aligned so “works in my environment” mismatches show up earlier; relevant when reasoning about world-state and shared environments.
    https://12factor.net/dev-prod-parity

  8. Google Research (Diego Cavalcanti), De-Flake Your Tests: Automatically Locating Root Causes of Flaky Tests in Code at Google, ICSME 2020 — Empirical work on locating flaky-test root causes in code at Google scale; reports high accuracy for the proposed technique in their evaluation.
    https://research.google/pubs/de-flake-your-tests-automatically-locating-root-causes-of-flaky-tests-in-code-at-google/

Simplified View: No-Code Editor - Full Code Power

· 3 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

TestChimp tests have always been plain Playwright under the hood — with extra capabilities like plain-English steps and lightweight scenario linking via code comments. That gives you fixtures, hooks, page objects, and the test organization you expect from a serious engineering setup.

Most QA teams are a mix of technical and non-technical teammates. Code-only authoring keeps contribution narrow. A separate no-code tool often means a second suite that drifts from the “real” tests and never gets the same CI treatment.

We added Simplified View in the web IDE so you do not have to choose.

SmartTests Simplified View in the web IDE

What Simplified View Is

Simplified View is a no-code surface for creating and editing SmartTests that still compiles to fully functional Playwright scripts. Everyone works on the same test; people just choose how they interact with the tests.

Your teammates can:

  • Add plain-English steps that are run agentically.
  • Use structured building blocks for common actions — less boilerplate and fewer syntax slips.
  • Drop in free-form code when you need it — custom waits, tricky selectors, helpers: full Playwright, no lock-in.

You pick the level of code per step and per person, not one rule for the whole team.

Why this matters

Non-technical members can contribute directly to test automation — not only by filing tickets for engineers to translate later. They build and edit steps in Simplified View; the result is still Playwright your automation folks can refine, reuse, and run in the same pipelines as everything else.

That lifts throughput for the whole team: more people can ship checks in parallel, fewer scenarios sit in a queue waiting for a coder, and engineers spend time on structure and hard cases instead of retyping flows from docs. Underneath, it stays real Playwright — deterministic runs, familiar debugging, ExploreChimp, CI, and Git workflows you already rely on.

Getting Started

Open a SmartTest in the web IDE and switch to Simplified View to author or edit steps. When you need the full script, switch to code view; both views stay aligned with the same underlying test.

For more on creating and editing SmartTests, see Creating Smart Tests.


SmartTests Now Support The Full Playwright Ecosystem

· 4 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

We’re excited to announce that SmartTests now fully support the core Playwright testing patterns and constructs you know and love. This means you can write maintainable, well-structured test suites that leverage Playwright’s powerful features while still getting all the AI-powered adaptability that makes TestChimp SmartTests special.

What Are SmartTests?

For those new to SmartTests, you can think of a SmartTest as a Playwright script with a couple of twists:

Intent Comments:

SmartTest steps include intent comments that describe what you’re trying to accomplish. When a test runs, it executes as a standard Playwright script for speed and determinism. But when a step fails, our AI agent steps in to fix the issue on the fly and raises a PR with the changes, giving you the best of both worlds: fast script execution and intelligent adaptability.

Screen-state annotations:

Markers that specify which screen and state the UI is in at a given step in the script. These annotations are authored and used by ExploreChimp to tag bugs to the correct screen-state in the SiteMap.

What's New: Full Playwright Compatibility

SmartTests now support all the essential Playwright patterns that help you build professional, maintainable test suites:

1. Hooks for Setup and Teardown

SmartTests now support all four Playwright hooks at both file and suite levels:

  • beforeAll – Run once before all tests in a suite
  • afterAll – Run once after all tests in a suite
  • beforeEach – Run before each test
  • afterEach – Run after each test

This means you can set up test data, initialize page objects, authenticate users, and clean up resources exactly as you would in standard Playwright tests.
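Put together, the four hooks slot into a SmartTest file exactly as they do in plain Playwright. A brief sketch (suite name, routes, and selectors are illustrative, not from a real app; it runs under the Playwright test runner rather than plain Node):

```typescript
import { test, expect } from '@playwright/test';

test.describe('checkout suite', () => {
  test.beforeAll(async () => {
    // Runs once before all tests in this suite, e.g. seed test data.
  });

  test.beforeEach(async ({ page }) => {
    // Runs before each test, e.g. start every test from a known page.
    await page.goto('/signin');
  });

  test.afterEach(async () => {
    // Runs after each test, e.g. capture diagnostics or reset state.
  });

  test.afterAll(async () => {
    // Runs once after all tests, e.g. clean up seeded data.
  });

  test('cart shows added item', async ({ page }) => {
    await page.getByRole('button', { name: 'Add to Cart' }).click();
    await expect(page.getByText('1 item')).toBeVisible();
  });
});
```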

2. Page Object Models (POMs)

SmartTests fully support the Page Object Model pattern, allowing you to encapsulate page interactions in reusable classes. This keeps your tests clean, maintainable, and aligned with best practices.

Example:

import { test, Page } from '@playwright/test';

class SignInPage {
  constructor(private page: Page) {}

  async navigate() {
    await this.page.goto('/signin');
  }

  async login(email: string, password: string) {
    await this.page.fill('#email', email);
    await this.page.fill('#password', password);
    await this.page.click('#sign-in-button');
  }
}

test('user can sign in', async ({ page }) => {
  const signInPage = new SignInPage(page);
  await signInPage.navigate();
  await signInPage.login('user@example.com', 'password123');
});

3. Fixtures for File Uploads

SmartTests support Playwright fixtures, making it easy to handle file uploads and other test artifacts. Upload your fixture files (like test data, images, or documents) under the fixtures folder in the SmartTests tab, and they will be available during test execution.
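As a sketch, a test could reference a file uploaded to the fixtures folder like this; the file name, form label, and route are illustrative, and the snippet runs under the Playwright test runner:

```typescript
import { test, expect } from '@playwright/test';
import * as path from 'path';

test('user can upload a document', async ({ page }) => {
  await page.goto('/documents');
  // Attach a file stored under the fixtures folder to the file input.
  await page
    .getByLabel('Upload file')
    .setInputFiles(path.join(__dirname, 'fixtures', 'sample-invoice.pdf'));
  // Verify the app acknowledges the uploaded file.
  await expect(page.getByText('sample-invoice.pdf')).toBeVisible();
});
```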

4. Playwright Configuration

The SmartTests folder in your project contains a playwright.config.js file that configures the Playwright execution environment. This is essential for:

  • Browser Authentication: Set up HTTP basic auth for staging environments
  • Custom Headers: Add authorization tokens, API keys, or custom headers
  • Base URLs: Configure default URLs for your test environment
  • Viewport Settings: Set default browser viewport sizes
  • And more: All standard Playwright configuration options

Example playwright.config.js:

const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  use: {
    baseURL: 'https://staging.example.com',
    httpCredentials: {
      username: 'staging-user',
      password: 'staging-password'
    },
    extraHTTPHeaders: {
      'Authorization': 'Bearer your-token',
      'X-Environment': 'staging'
    }
  }
});

5. Test Suites with Multiple Tests

SmartTests support organizing multiple tests in a single file using Playwright’s test.describe() blocks. You can create nested suites, group related tests together, and apply suite-level hooks – just like in standard Playwright.

Why This Matters

These additions mean SmartTests are now fully compatible with Playwright’s ecosystem. You can:

✅ Write maintainable tests using industry-standard patterns like POMs and hooks

✅ Organize your test suite with proper grouping and structure

✅ Handle complex setups with configuration files and fixtures

✅ Reuse existing Playwright knowledge without learning new patterns

✅ Still get AI-powered fixes when tests fail – the best of both worlds!

Getting Started

If you’re already using SmartTests, you can start using these features immediately. Just structure your tests using standard Playwright patterns, and SmartTests will handle the rest.

For new users, SmartTests work just like Playwright tests – with the added benefit of AI-powered failure recovery and stepwise execution that enables guided exploration.

What's Next?

SmartTests continue to evolve, and we’re committed to maintaining full compatibility with Playwright’s ecosystem while adding intelligent features that make testing easier and more reliable. Stay tuned for more updates!

Got questions or feedback? We’d love to hear from you! Drop us a line at contact@testchimp.io.

Screen-State markers in SmartTests

· 3 min read
Nuwan Samarasekera
Founder & CEO, TestChimp

Ok, first a quick recap on SmartTests:

SmartTests are plain Playwright scripts with intent comments before steps, which enable hybrid execution (falling back to agent-mode execution when needed).

SmartTests are used by ExploreChimp to guide its explorations along pre-defined pathways, where it identifies UX issues in the web app such as performance problems, visual glitches, and usability and content issues.

The Challenge: Context for Bugs

When ExploreChimp finds bugs, it tags them with the “Screen” and “State” where they were captured. This context helps with troubleshooting and understanding when issues occur.

  • A Screen is a conceptual view of your application: Dashboard, Homepage, Shopping Cart, etc.
  • A State represents a specific situation within that screen: Empty Cart vs Cart with Items, Logged In vs Logged Out, etc.

ExploreChimp autonomously determines the current screen and state based on the steps taken and the current screenshot. While this makes getting started easier, it may not always align with your mental model or the granularity at which you want things tracked.

The Solution: Screen-State Annotations

Now you can add explicit screen-state markers directly in your SmartTest scripts. These annotations tell ExploreChimp exactly which screen and state the app is in at a given point in the test, ensuring bugs are tagged with the context you care about.

How It Works

After ExploreChimp runs, if the script didn’t contain screen-state markers, it updates the script with the screen-state annotations it determined during the walk.

If you don’t want the agent to update the script, you can turn this off by unchecking “Update script with screen-state annotations” under Advanced Settings (in the Exploration config wizard).

You can edit these annotations to match your conceptual model. For example, you may want to track UX bugs for “Cart with out-of-stock items” vs “Cart with in-stock items” instead of the agent-suggested states.

On the next run, ExploreChimp uses your annotations instead of guessing, so bugs are tagged consistently with your terminology.

Here is an example of a SmartTest with screen-state annotations:

test('Shopping Cart Flow', async ({ page }) => {
  // Navigate to homepage
  await page.goto('https://example.com');
  // @Screen: Homepage @State: Default

  // Search for a product
  await page.getByPlaceholder('Search products').fill('laptop');
  await page.getByRole('button', { name: 'Search' }).click();
  // @Screen: Search Results @State: With Results

  // Add item to cart
  await page.getByRole('link', { name: /laptop/i }).first().click();
  await page.getByRole('button', { name: 'Add to Cart' }).click();
  // @Screen: Shopping Cart @State: Cart with Items

  // Proceed to checkout
  await page.getByRole('button', { name: 'Proceed to Checkout' }).click();
  // @Screen: Checkout @State: Payment Step
});
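Because the markers follow a fixed comment shape, they are easy to extract mechanically. As a hypothetical sketch (this mirrors the "// @Screen: ... @State: ..." format shown above, and is not TestChimp's actual parser):

```typescript
// Extract screen-state markers from a SmartTest script. The regex
// matches comment lines of the form "// @Screen: X @State: Y".
interface ScreenState {
  line: number;
  screen: string;
  state: string;
}

function parseScreenStates(script: string): ScreenState[] {
  const marker = /^\s*\/\/\s*@Screen:\s*(.+?)\s*@State:\s*(.+?)\s*$/;
  return script.split('\n').flatMap((text, i) => {
    const m = text.match(marker);
    return m ? [{ line: i + 1, screen: m[1], state: m[2] }] : [];
  });
}

const snippet = [
  "await page.goto('https://example.com');",
  '// @Screen: Homepage @State: Default',
].join('\n');
console.log(parseScreenStates(snippet));
// One marker found: line 2, screen "Homepage", state "Default"
```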

Benefits

  • Consistent bug tagging: Bugs are tagged consistently using your terminology, not AI-generated labels.

  • Better organization: View bugs by screen-state in Atlas → SiteMap with your own categories.

  • Easy refinement: Edit annotations to match your mental model easily – no need to retrain or reconfigure.

Getting Started

  • Run ExploreChimp on your SmartTest (annotations are added automatically).

  • Review and edit the annotations in your script to match your terminology.

  • The next time ExploreChimp is run on that test, it will use your annotations for consistent bug tagging.

The annotations are simple comments, so they don’t affect test execution – they’re purely for ExploreChimp’s context understanding.