How to Catch Non-Functional Bugs in E2E (UX, Performance, Accessibility)

Short answer

Functional E2E proves that state changed correctly. Probes can pass while users still suffer slow flows, confusing errors, broken keyboard paths, and layouts that break in RTL. Layer ExploreChimp on SmartTest paths once probe coverage exists: keep Playwright for Arrange/Act/Assert, and use explorations for the UX and non-functional regressions that record-replay tools miss.

Part of Testing Guides by UI patterns.

Who this is for

Teams with green functional CI but support tickets about "confusing," "slow," or "can't complete on mobile." Especially SaaS onboarding, checkout, and admin consoles where form validation probes pass but error UX fails accessibility.

Functional vs non-functional

| Type | Example | Typical assert |
| --- | --- | --- |
| Functional | Coupon rejected, no order | Probe `paidOrderCount=0` |
| Performance | Checkout takes 12s | P95 step timing, LCP on path |
| UX copy | Error blames user for server 500 | ExploreChimp flags mismatch |
| Accessibility | No `aria-invalid` on field error | axe or ExploreChimp a11y recipe |
| Visual | RTL overflow hides Pay button | Theme/locale exploration |
| Resilience | Double-click creates duplicate toast noise | Interaction stress on path |

Non-functional checks do not replace probes. They catch what probes intentionally ignore.

Complexity map

| Scenario | Edge case | Why functional E2E misses it | Approach |
| --- | --- | --- | --- |
| Slow API | Spinner forever | Probe eventually passes | Step timing budget on path |
| Wrong error tone | "You failed" on 500 | Toast visible = pass | ExploreChimp semantic review |
| Missing focus trap | Modal scrolls background | Click still works | a11y exploration |
| i18n layout | DE string overflows CTA | Probe OK | Locale exploration |
| Skeleton flash | CLS on dashboard | Data loads | Visual/regression on SmartTest entry |
| Confusing empty state | Zero results no guidance | Count probe = 0 | UX exploration recipe |
| Keyboard-only | Tab order skips Pay | Click path tested | Tab-through exploration |
| Mobile viewport | Sticky header covers field | Desktop spec green | Viewport variant on same path |
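
The keyboard-only row can be made concrete with a small order check. This is a minimal sketch, assuming a spec has already recorded the accessible names that received focus while pressing Tab through the page. `missingTabStops`, `focusOrder`, and `expectedStops` are hypothetical names for illustration, not part of Playwright or TestChimp.

```typescript
// Hypothetical helper: none of these names come from Playwright or TestChimp.
// `focusOrder` holds the accessible names recorded while tabbing through a page.

/** Returns the expected stops that were never focused in the expected order. */
function missingTabStops(focusOrder: string[], expectedStops: string[]): string[] {
  const missing: string[] = [];
  let cursor = 0; // search position just after the last matched stop

  for (const stop of expectedStops) {
    const foundAt = focusOrder.indexOf(stop, cursor);
    if (foundAt === -1) {
      // Either never focused, or focused before an earlier expected stop.
      missing.push(stop);
    } else {
      cursor = foundAt + 1;
    }
  }
  return missing;
}
```

A spec could collect `focusOrder` by pressing Tab a bounded number of times and reading the focused element's name, then assert that `missingTabStops(focusOrder, ['Card number', 'Pay'])` is empty.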

Playwright accessibility baseline

Keep lightweight a11y checks on critical forms rather than running full-page scans in every spec:

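import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('checkout exposes accessible errors', async ({ page }) => {
  await page.goto('/checkout');
  // Submit the empty form so validation errors render.
  await page.getByRole('button', { name: 'Pay' }).click();
  // Auto-retrying assertion: waits until the field reports its error state.
  await expect(page.getByLabel('Card number')).toHaveAttribute('aria-invalid', 'true');
  // Scope the scan to the checkout form instead of the whole page.
  const results = await new AxeBuilder({ page })
    .include('[data-testid="checkout-form"]')
    .analyze();
  // Fail only on critical-impact violations.
  expect(results.violations.filter(v => v.impact === 'critical')).toEqual([]);
});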

See @axe-core/playwright and Playwright accessibility testing.

Limit: axe catches many WCAG issues, not confusing copy or perceived performance—that's ExploreChimp's lane.
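
If the impact filter needs to grow beyond a single inline expression, it can be pulled into a pure helper. The sketch below assumes only the `id`, `impact`, and `nodes` fields of an axe violation. `Violation`, `blockingViolations`, and `summarize` are illustrative names, and the default blocking set is an assumption to adjust per team.

```typescript
// Minimal shape of an axe violation; the real result object carries more fields.
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';

interface Violation {
  id: string;
  impact?: Impact | null;
  nodes: unknown[];
}

/** Keeps only violations whose impact is in the blocking set (assumed default). */
function blockingViolations(
  violations: Violation[],
  blocking: Impact[] = ['critical', 'serious'],
): Violation[] {
  return violations.filter(
    (v) => v.impact != null && blocking.indexOf(v.impact) !== -1,
  );
}

/** One line per violation so a failed assertion stays readable in CI logs. */
function summarize(violations: Violation[]): string[] {
  return violations.map(
    (v) => `${v.id} (${v.impact || 'unknown'}): ${v.nodes.length} node(s)`,
  );
}
```

In a spec, `expect(summarize(blockingViolations(results.violations))).toEqual([])` would print the offending rule ids when the check fails.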

ExploreChimp on SmartTest paths

After functional SmartTests cover a path with probes:

/testchimp explore — run ExploreChimp on the same URLs/states annotated with markScreenState
Recipes — Form Validation Tester, Theme Tester for systematic non-functional passes
Triage — UX findings link to scenarios via UX bug traceability

ExploreChimp differs from URL-only agents—it follows documented SmartTest pathways with screen-state context (comparison).

Performance signals without flaky budgets

Avoid using hard-coded page.waitForTimeout sleeps or absolute millisecond thresholds as performance asserts. Prefer:

Playwright trace action durations to investigate CI regressions
Probe poll-interval logging to surface slow backends (already covered in flaky waits)
Web Vitals in the test env where instrumentation exists, compared release over release rather than as absolute ms in CI
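
A release-over-release comparison can be sketched as a relative check on P95 step timings. This is an illustration under stated assumptions: the durations are collected elsewhere (for example from traces), the nearest-rank percentile method and the 25% default tolerance are choices made here, and `percentile` and `isTimingRegression` are hypothetical names.

```typescript
/** Nearest-rank percentile over step durations in milliseconds. */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile needs at least one sample');
  }
  const sorted = samples.slice().sort((a, b) => a - b); // copy, then sort ascending
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

/**
 * Flags a regression when the current P95 exceeds the baseline P95 by more
 * than `tolerance` (0.25 means 25%). No absolute millisecond limit is used.
 */
function isTimingRegression(
  baseline: number[],
  current: number[],
  tolerance = 0.25,
): boolean {
  const before = percentile(baseline, 95);
  const after = percentile(current, 95);
  return after > before * (1 + tolerance);
}
```

Because both runs come from the same CI environment, noise on shared runners tends to shift baseline and current together, which is why the comparison is relative.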

TrueCoverage compares prod vs test-run traffic—useful to prioritize which paths need exploration, not a substitute for UX review.
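
The prioritization idea can be illustrated with a simple ranking. The input shape below is hypothetical and is not the TrueCoverage API: it assumes you can export, per path, the share of prod sessions and the share of test-run sessions that touch it.

```typescript
// Hypothetical input shape (not the TrueCoverage API): shares are fractions in 0..1.
interface PathTraffic {
  path: string;
  prodShare: number; // fraction of prod sessions touching this path
  testShare: number; // fraction of test-run sessions touching this path
}

/** Paths with heavy prod usage and little test-run traffic come first. */
function rankByCoverageGap(traffic: PathTraffic[]): string[] {
  return traffic
    .map((t) => ({ path: t.path, gap: t.prodShare - t.testShare }))
    .filter((t) => t.gap > 0) // drop paths already exercised at least as much as prod
    .sort((a, b) => b.gap - a.gap) // largest gap first
    .map((t) => t.path);
}
```

The ranking only suggests where to explore first. It says nothing about whether the UX on those paths is good.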

When record-replay and no-code tools fall short

Recorders capture clicks that worked once—they rarely assert error compassion, focus order, or locale layout. TestChimp keeps markdown plans + SmartTests in Git and adds /testchimp explore for non-functional passes on those same paths.

Anti-patterns

| Anti-pattern | Why it fails | Better approach |
| --- | --- | --- |
| Snapshot entire page | Copy/locale flake | Probe + targeted a11y |
| Skip functional probes | UX theater | Probes first, explore second |
| Explore random URLs | Misses real journeys | SmartTest-anchored paths |
| axe only in unit tests | Integration gaps | Critical flows in E2E |
| Ignore mobile | Desktop-only green | Viewport matrix on top paths |
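
A viewport matrix on top paths can be generated rather than hand-written. The sketch below is one possible shape, assuming a short list of paths, viewports, and locales that you choose yourself. `Viewport`, `Variant`, and `buildVariants` are illustrative names.

```typescript
interface Viewport {
  name: string;
  width: number;
  height: number;
}

interface Variant {
  path: string;
  viewport: Viewport;
  locale: string;
}

/** Cross product of paths, viewports, and locales, in a stable order. */
function buildVariants(
  paths: string[],
  viewports: Viewport[],
  locales: string[],
): Variant[] {
  const variants: Variant[] = [];
  for (const path of paths) {
    for (const viewport of viewports) {
      for (const locale of locales) {
        variants.push({ path, viewport, locale });
      }
    }
  }
  return variants;
}
```

Each variant could drive one run of the same path, for instance by passing `viewport` and `locale` as Playwright context options. Keep the path list short, since the matrix grows multiplicatively.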

TestChimp workflow

/testchimp init — probes on business-critical flows
/testchimp test — maintain functional SmartTests per PR
/testchimp explore — non-functional regressions on annotated paths
/testchimp evolve — close gaps between plans, explore findings, and prod

Requirement traceability shows functional coverage; exploration backlog tracks UX/a11y debt explicitly.

Example scenario

Situation: Server returns 500 on coupon apply; UI shows generic 'Something went wrong'.

Expected outcome: User sees actionable message; support can diagnose.

Why UI-only automation breaks: The functional probe shows discountCents=0 (correct), so the spec passes, yet the copy is either generic or blames the user ('Invalid code') for a server fault.

Arrange: Seed a cart and stub the coupon service to return a server error.
Act: Apply coupon in UI.
Assert: Probe confirms no discount; ExploreChimp flags error copy vs server fault classification.
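
The mismatch between error copy and fault class can be approximated with a keyword heuristic. This is a deliberately simple sketch and not how ExploreChimp evaluates copy. The phrase list and the names `classifyFault` and `copyBlamesUserForServerFault` are assumptions for illustration.

```typescript
type FaultClass = 'server' | 'client' | 'none';

/** Maps an HTTP status to who is at fault. */
function classifyFault(status: number): FaultClass {
  if (status >= 500) return 'server';
  if (status >= 400) return 'client';
  return 'none';
}

// Illustrative phrases only; a real list would come from your own copy guidelines.
const USER_BLAMING_PHRASES = ['invalid code', 'you entered', 'you failed', 'check your'];

/** True when a server-side fault is reported with copy that blames the user. */
function copyBlamesUserForServerFault(status: number, message: string): boolean {
  if (classifyFault(status) !== 'server') return false;
  const text = message.toLowerCase();
  return USER_BLAMING_PHRASES.some((phrase) => text.indexOf(phrase) !== -1);
}
```

A spec that stubs the coupon service could pass the stubbed status and the visible toast text to this helper as a cheap guard, leaving tone and clarity to exploration.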

TestChimp workflow: /testchimp explore on checkout SmartTest path after probe spec exists—surfaces non-functional regression without weakening Assert.

Same Arrange/Act/Assert pattern as expired-coupon checkout.

Frequently asked questions

Should non-functional tests block release?

Triage by severity—critical a11y and checkout blockers yes; cosmetic copy nits can backlog. Functional probes still gate data correctness.
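
That triage rule can be written down so it is applied consistently. The sketch below assumes a hypothetical `Finding` shape. The field names and categories are illustrative and do not come from TestChimp.

```typescript
type Severity = 'critical' | 'major' | 'minor';
type Category = 'a11y' | 'ux-copy' | 'visual' | 'performance';

// Hypothetical finding shape for illustration.
interface Finding {
  id: string;
  category: Category;
  severity: Severity;
  blocksCheckout: boolean; // true when the user cannot complete checkout
}

/** Critical a11y issues and checkout blockers gate the release; the rest is backlog. */
function releaseDecision(finding: Finding): 'block' | 'backlog' {
  if (finding.blocksCheckout) return 'block';
  if (finding.category === 'a11y' && finding.severity === 'critical') return 'block';
  return 'backlog';
}
```

Functional probe failures are outside this function on purpose, since they gate data correctness independently.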

Is ExploreChimp a replacement for Playwright?

No—Playwright/SmartTests own Arrange/Act/Assert with probes. ExploreChimp adds UX, a11y, and visual passes on those paths.

Can axe replace ExploreChimp?

axe catches many WCAG violations; ExploreChimp addresses confusing flows, error compassion, theme/locale layout, and path-specific UX recipes.

We only have functional CI—is that wrong?

It is a solid start. Once probes exist, add exploration on your top five revenue paths first; that is where support-ticket-class bugs are most likely to pay back the effort.

How do record-replay tools handle UX?

They replay clicks—they do not systematically evaluate error quality, keyboard paths, or locale layout across releases.

Performance budgets in Playwright?

Use traces and trend comparison; avoid brittle ms asserts on shared CI runners. Pair with Real User Monitoring where TrueCoverage instruments prod.

Link UX bugs to requirements?

ExploreChimp findings can trace to scenarios and screen states; see the UX bug traceability docs. This keeps UX debt visible alongside the `// @Scenario:` functional links.

When to run /testchimp explore?

After functional SmartTests stabilize on a path—typically post-merge to staging or scheduled on main, plus after major UI refactors.

Apply these patterns in your repo

Run `/testchimp init` to connect TestChimp to your repo, then `/testchimp test` on PRs to turn these patterns into maintained SmartTests. Use `/testchimp evolve` when you want to expand coverage as your app grows.

Start free on TestChimp · Book a demo
