
How to Catch Non-Functional Bugs in E2E (UX, Performance, Accessibility)

Short answer

Functional E2E proves that state changed correctly, but probes can pass while users still suffer slow flows, confusing errors, broken keyboard paths, and layouts that break in RTL. Once probe coverage exists, layer ExploreChimp on SmartTest paths: keep Playwright for Arrange/Act/Assert, and use explorations for the UX and non-functional regressions that record-replay tools miss.

Part of Testing Guides by UI patterns.

Who this is for

Teams whose functional CI is green but who still get support tickets about "confusing," "slow," or "can't complete on mobile." This applies especially to SaaS onboarding, checkout, and admin consoles, where form-validation probes pass but the error UX fails accessibility.

Functional vs non-functional

| Type | Example | Typical assert |
|---|---|---|
| Functional | Coupon rejected, no order | Probe paidOrderCount=0 |
| Performance | Checkout takes 12s | P95 step timing, LCP on path |
| UX copy | Error blames user for server 500 | ExploreChimp flags mismatch |
| Accessibility | No aria-invalid on field error | axe or ExploreChimp a11y recipe |
| Visual | RTL overflow hides Pay button | Theme/locale exploration |
| Resilience | Double-click creates duplicate toast noise | Interaction stress on path |

Non-functional bugs do not replace probes—they catch what probes intentionally ignore.

Complexity map

| Scenario | Edge case | Why functional E2E misses it | Approach |
|---|---|---|---|
| Slow API | Spinner forever | Probe eventually passes | Step timing budget on path |
| Wrong error tone | "You failed" on 500 | Toast visible = pass | ExploreChimp semantic review |
| Missing focus trap | Modal scrolls background | Click still works | a11y exploration |
| i18n layout | DE string overflows CTA | Probe OK | Locale exploration |
| Skeleton flash | CLS on dashboard | Data loads | Visual/regression on SmartTest entry |
| Confusing empty state | Zero results, no guidance | Count probe = 0 | UX exploration recipe |
| Keyboard-only | Tab order skips Pay | Click path tested | Tab-through exploration |
| Mobile viewport | Sticky header covers field | Desktop spec green | Viewport variant on same path |
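The keyboard-only row is one case where a small deterministic check can complement exploration. Here is a minimal sketch. It assumes you collect the focus sequence yourself by pressing Tab in Playwright (`page.keyboard.press('Tab')`) and reading an identifier, such as a `data-testid`, from `document.activeElement` through `page.evaluate`. The `tabOrderProblems` helper and the identifiers are hypothetical:

```typescript
// Hypothetical helper: given the identifier focused after each Tab press,
// report expected controls that never got focus and ones reached out of order.
interface TabOrderReport {
  missing: string[];    // expected controls that never received focus
  outOfOrder: string[]; // expected controls focused before an earlier expected one
}

function tabOrderProblems(focused: string[], expected: string[]): TabOrderReport {
  // Remember the first Tab step at which each element received focus.
  const firstIndex = new Map<string, number>();
  focused.forEach((id, i) => {
    if (!firstIndex.has(id)) firstIndex.set(id, i);
  });

  const missing = expected.filter((id) => !firstIndex.has(id));
  const outOfOrder: string[] = [];
  let lastSeen = -1;
  for (const id of expected) {
    const idx = firstIndex.get(id);
    if (idx === undefined) continue; // already reported as missing
    if (idx < lastSeen) outOfOrder.push(id);
    else lastSeen = idx;
  }
  return { missing, outOfOrder };
}

// Example: tabbing moves from the expiry field straight to a footer link, skipping Pay.
const skippedPay = tabOrderProblems(
  ['email', 'card-number', 'expiry', 'footer-link'],
  ['email', 'card-number', 'expiry', 'pay'],
);
console.log(skippedPay.missing); // [ 'pay' ]
```

This check only covers reachability and order. Whether the order makes sense to a keyboard user remains a job for the tab-through exploration.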

Playwright accessibility baseline

Keep lightweight a11y checks scoped to critical forms rather than running full-page scans in every spec:

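```typescript
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('checkout exposes accessible errors', async ({ page }) => {
  await page.goto('/checkout');
  // Click Pay with the card fields left empty to trigger validation errors.
  await page.getByRole('button', { name: 'Pay' }).click();

  // Web-first assertion: waits until the error state is rendered,
  // so axe scans the invalid form rather than a mid-render DOM.
  await expect(page.getByLabel('Card number')).toHaveAttribute('aria-invalid', 'true');

  // Scan only the checkout form, not the whole page.
  const results = await new AxeBuilder({ page })
    .include('[data-testid="checkout-form"]')
    .analyze();
  expect(results.violations.filter((v) => v.impact === 'critical')).toEqual([]);
});
```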

See @axe-core/playwright and Playwright accessibility testing.

Limit: axe catches many WCAG issues, but not confusing copy or perceived performance. That is ExploreChimp's lane.
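The baseline test gates only on `critical` findings. The minimal sketch below makes the threshold configurable, so a team can also fail on `serious` findings while leaving `minor` and `moderate` ones for triage. The `blockingViolations` and `summarize` helpers are hypothetical. `ViolationLike` is a local subset of the fields axe-core reports for each violation; it is not the library's own type:

```typescript
// Structural subset of an axe-core violation; the real objects in
// `results.violations` carry more fields (help, helpUrl, tags, ...).
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';

interface ViolationLike {
  id: string;
  impact?: Impact | null;
  nodes: unknown[];
}

const IMPACT_RANK: Record<Impact, number> = {
  minor: 0,
  moderate: 1,
  serious: 2,
  critical: 3,
};

// Keep violations at or above the threshold. Violations with no impact value
// are kept too, so they surface for review instead of being silently dropped.
function blockingViolations(violations: ViolationLike[], threshold: Impact): ViolationLike[] {
  return violations.filter(
    (v) => v.impact == null || IMPACT_RANK[v.impact] >= IMPACT_RANK[threshold],
  );
}

// One line per violation, for a readable assertion message.
function summarize(violations: ViolationLike[]): string {
  return violations
    .map((v) => `${v.id} (${v.impact ?? 'unknown'}): ${v.nodes.length} node(s)`)
    .join('\n');
}
```

In the Playwright test, `const blocking = blockingViolations(results.violations, 'serious');` followed by `expect(blocking, summarize(blocking)).toEqual([]);` fails with one line per offending rule instead of a raw object diff.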

ExploreChimp on SmartTest paths

After functional SmartTests cover a path with probes:

  1. /testchimp explore — run ExploreChimp on the same URLs/states annotated with markScreenState
  2. Recipes: Form Validation Tester and Theme Tester for systematic non-functional passes
  3. Triage — UX findings link to scenarios via UX bug traceability

ExploreChimp differs from URL-only agents—it follows documented SmartTest pathways with screen-state context (comparison).

Performance signals without flaky budgets

Avoid performance asserts that rely on hard page.waitForTimeout delays or absolute timings. Prefer:

  • Playwright trace action durations on CI regressions
  • Probe poll intervals that log slow backends (already covered in flaky waits)
  • Web Vitals in test env where instrumentation exists—compare release over release, not absolute ms in CI
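Here is a minimal sketch of the release-over-release comparison. It assumes you already export per-step durations in milliseconds, for example from Playwright trace action timings or your own step timers. The `percentile` and `compareStepTiming` helpers and the 20% tolerance are illustrative choices, not TestChimp or Playwright APIs:

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

interface TrendResult {
  baselineP95: number;
  currentP95: number;
  ratio: number;      // currentP95 / baselineP95
  regressed: boolean; // true when the ratio exceeds 1 + tolerance
}

// Compare this release's P95 for one step against the previous release's P95,
// using a relative tolerance instead of an absolute millisecond budget.
function compareStepTiming(
  baselineMs: number[],
  currentMs: number[],
  tolerance = 0.2, // flag only if P95 grows by more than 20%
): TrendResult {
  const baselineP95 = percentile(baselineMs, 95);
  const currentP95 = percentile(currentMs, 95);
  const ratio = currentP95 / baselineP95;
  return { baselineP95, currentP95, ratio, regressed: ratio > 1 + tolerance };
}
```

A relative check is less likely than a fixed millisecond budget to fail just because a shared runner is uniformly slow, as long as baseline and current samples come from comparable runners.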

TrueCoverage compares prod vs test-run traffic—useful to prioritize which paths need exploration, not a substitute for UX review.
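TrueCoverage performs the prod-versus-test comparison itself. The sketch below only illustrates the prioritization idea: rank paths whose share of production traffic most exceeds their share of test runs. The `PathTraffic` shape and the `explorationPriorities` helper are hypothetical and do not reflect TrueCoverage's export format:

```typescript
// Hypothetical shape: how often each user path appears in production
// sessions versus in test runs.
interface PathTraffic {
  path: string;
  prodSessions: number;
  testRuns: number;
}

// Rank paths by (share of prod traffic) minus (share of test runs), highest first.
function explorationPriorities(rows: PathTraffic[], top = 5): string[] {
  const prodTotal = rows.reduce((sum, r) => sum + r.prodSessions, 0);
  const testTotal = rows.reduce((sum, r) => sum + r.testRuns, 0);
  return rows
    .map((r) => ({
      path: r.path,
      gap:
        (prodTotal ? r.prodSessions / prodTotal : 0) -
        (testTotal ? r.testRuns / testTotal : 0),
    }))
    .sort((a, b) => b.gap - a.gap)
    .slice(0, top)
    .map((r) => r.path);
}
```

A large positive gap marks a path that users exercise far more than your tests do. That makes it a candidate for exploration once probes exist on it.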

When record-replay and no-code tools fall short

Recorders capture clicks that worked once—they rarely assert error compassion, focus order, or locale layout. TestChimp keeps markdown plans + SmartTests in Git and adds /testchimp explore for non-functional passes on those same paths.

Anti-patterns

| Anti-pattern | Why it fails | Better approach |
|---|---|---|
| Snapshot entire page | Copy/locale flake | Probe + targeted a11y |
| Skip functional probes | UX theater | Probes first, explore second |
| Explore random URLs | Misses real journeys | SmartTest-anchored paths |
| axe only in unit tests | Integration gaps | Critical flows in E2E |
| Ignore mobile | Desktop-only green | Viewport matrix on top paths |
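For the sticky-header case, a viewport variant can check geometry directly. Here is a minimal sketch. It assumes the spec runs under a mobile viewport and that both boxes come from Playwright's `locator.boundingBox()` after `locator.scrollIntoViewIfNeeded()`. The `overlapArea` and `coveredFraction` helpers are hypothetical:

```typescript
// Same shape as the object Playwright's locator.boundingBox() resolves to.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Area, in CSS pixels, where two boxes overlap; 0 when they do not intersect.
function overlapArea(a: Box, b: Box): number {
  const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return w > 0 && h > 0 ? w * h : 0;
}

// Fraction of the field hidden under a sticky element (0 = fully visible).
function coveredFraction(field: Box, sticky: Box): number {
  const area = field.width * field.height;
  return area > 0 ? overlapArea(field, sticky) / area : 0;
}
```

`boundingBox()` resolves to `null` for elements that are not visible, so check for that before calling `coveredFraction`. Then assert that the result is `0` on each viewport in the matrix.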

TestChimp workflow

  1. /testchimp init — probes on business-critical flows
  2. /testchimp test — maintain functional SmartTests per PR
  3. /testchimp explore — non-functional regressions on annotated paths
  4. /testchimp evolve — close gaps between plans, explore findings, and prod

Requirement traceability shows functional coverage; exploration backlog tracks UX/a11y debt explicitly.

Example scenario

Situation: Server returns 500 on coupon apply; UI shows generic 'Something went wrong'.

Expected outcome: User sees actionable message; support can diagnose.

Why UI-only automation breaks: Functional probe shows discountCents=0 (correct) but copy blames the user ('Invalid code').

  1. Arrange: Seed a cart, and stub the coupon service to return a server error (500).
  2. Act: Apply coupon in UI.
  3. Assert: Probe confirms no discount; ExploreChimp flags error copy vs server fault classification.
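A cheap guardrail can sit next to the probe in the Assert step. Here is a minimal sketch. It assumes the test can observe both the coupon response status (for example via `page.waitForResponse`) and the rendered error text. The keyword list and the `reviewErrorCopy` helper are hypothetical and only catch blatant mismatches, which is why the semantic review stays with ExploreChimp:

```typescript
// Hypothetical phrases that put the blame on the user's input.
const USER_BLAME_PATTERNS: RegExp[] = [
  /\binvalid\b/i,
  /\byou (entered|typed|provided)\b/i,
  /\bcheck your\b/i,
];

interface CopyFinding {
  flagged: boolean;
  reason?: string;
}

// Flag copy that presents a server-side failure (5xx) as a user mistake.
// 4xx responses are not checked: "Invalid code" can be correct there.
function reviewErrorCopy(httpStatus: number, message: string): CopyFinding {
  const serverFault = httpStatus >= 500 && httpStatus <= 599;
  if (!serverFault) return { flagged: false };
  const match = USER_BLAME_PATTERNS.find((re) => re.test(message));
  return match
    ? { flagged: true, reason: `server ${httpStatus} shown as user error: "${message}"` }
    : { flagged: false };
}
```

For the scenario above, a 500 paired with "Invalid code" is flagged, while neutral copy such as "Coupons are temporarily unavailable" passes.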

TestChimp workflow: run /testchimp explore on the checkout SmartTest path once the probe spec exists. It surfaces the non-functional regression without weakening the Assert step.

This follows the same Arrange/Act/Assert pattern as the expired-coupon checkout example.

Frequently asked questions

Should non-functional tests block release?

Triage by severity. Critical a11y findings and checkout blockers should block a release; cosmetic copy nits can go to the backlog. Functional probes still gate data correctness.
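One way to encode that triage rule as a CI gate is sketched below. The finding shape, area names, and severity scale are hypothetical; none of them is an ExploreChimp or axe export format. Functional probe failures would still fail the build through the normal test run:

```typescript
// Hypothetical finding shape for non-functional results.
type Severity = 'critical' | 'major' | 'minor' | 'cosmetic';
type Area = 'a11y' | 'checkout' | 'copy' | 'visual' | 'performance';

interface Finding {
  id: string;
  area: Area;
  severity: Severity;
}

interface GateDecision {
  blockers: string[]; // finding ids that should block the release
  backlog: string[];  // finding ids to triage later
}

// Block only on critical a11y findings and critical checkout findings;
// everything else, including cosmetic copy nits, goes to the backlog.
function triage(findings: Finding[]): GateDecision {
  const blocks = (f: Finding) =>
    f.severity === 'critical' && (f.area === 'a11y' || f.area === 'checkout');
  return {
    blockers: findings.filter(blocks).map((f) => f.id),
    backlog: findings.filter((f) => !blocks(f)).map((f) => f.id),
  };
}
```

A pipeline step could fail whenever `blockers` is non-empty and post `backlog` to the team's tracker.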

Is ExploreChimp a replacement for Playwright?

No—Playwright/SmartTests own Arrange/Act/Assert with probes. ExploreChimp adds UX, a11y, and visual passes on those paths.

Can axe replace ExploreChimp?

axe catches many WCAG violations; ExploreChimp addresses confusing flows, error compassion, theme/locale layout, and path-specific UX recipes.

We only have functional CI—is that wrong?

It is a solid start. Once probes exist, add exploration on your top 5 revenue paths. That is where exploration pays off most for the bugs that generate support tickets.

How do record-replay tools handle UX?

They replay clicks—they do not systematically evaluate error quality, keyboard paths, or locale layout across releases.

Performance budgets in Playwright?

Use traces and trend comparison; avoid brittle ms asserts on shared CI runners. Pair with Real User Monitoring where TrueCoverage instruments prod.

Link UX bugs to requirements?

ExploreChimp findings can trace to scenarios and screen states (see the UX bug traceability docs). This keeps UX debt visible alongside the `// @Scenario:` links on functional tests.

When to run /testchimp explore?

After functional SmartTests stabilize on a path—typically post-merge to staging or scheduled on main, plus after major UI refactors.

Apply these patterns in your repo

Run `/testchimp init` to connect TestChimp to your repo, then `/testchimp test` on PRs to turn these patterns into maintained SmartTests. Use `/testchimp evolve` when you want to expand coverage as your app grows.
