How to Catch Non-Functional Bugs in E2E (UX, Performance, Accessibility)
Short answer
Functional E2E proves that state changed correctly. Probes can pass while users still suffer slow flows, confusing errors, broken keyboard paths, and layouts that break in RTL. Layer ExploreChimp on SmartTest paths once probe coverage exists: keep Playwright for Arrange/Act/Assert, and use explorations for the UX and non-functional regressions that record-replay tools miss.
Who this is for
Teams with green functional CI but support tickets about "confusing," "slow," or "can't complete on mobile." Especially SaaS onboarding, checkout, and admin consoles where form validation probes pass but error UX fails accessibility.
Functional vs non-functional
| Type | Example | Typical assert |
|---|---|---|
| Functional | Coupon rejected, no order | Probe paidOrderCount=0 |
| Performance | Checkout takes 12s | P95 step timing, LCP on path |
| UX copy | Error blames user for server 500 | ExploreChimp flags mismatch |
| Accessibility | No aria-invalid on field error | axe or ExploreChimp a11y recipe |
| Visual | RTL overflow hides Pay button | Theme/locale exploration |
| Resilience | Double-click creates duplicate toast noise | Interaction stress on path |
Non-functional checks do not replace probes; they catch what probes intentionally ignore.
Complexity map
| Scenario | Edge case | Why functional E2E misses it | Approach |
|---|---|---|---|
| Slow API | Spinner forever | Probe eventually passes | Step timing budget on path |
| Wrong error tone | "You failed" on 500 | Toast visible = pass | ExploreChimp semantic review |
| Missing focus trap | Modal scrolls background | Click still works | a11y exploration |
| i18n layout | DE string overflows CTA | Probe OK | Locale exploration |
| Skeleton flash | CLS on dashboard | Data loads | Visual/regression on SmartTest entry |
| Confusing empty state | Zero results no guidance | Count probe = 0 | UX exploration recipe |
| Keyboard-only | Tab order skips Pay | Click path tested | Tab-through exploration |
| Mobile viewport | Sticky header covers field | Desktop spec green | Viewport variant on same path |
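The mobile-viewport row can be checked geometrically rather than by eye. A minimal sketch, assuming the two boxes come from Playwright's `locator.boundingBox()`, which resolves to `{ x, y, width, height }` or `null`. The helper name `overlapRatio` and the sample coordinates are illustrative, not part of any library:

```typescript
// Shape resolved by Playwright's locator.boundingBox() for a visible element.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Fraction of `target` covered by `overlay` (0 = untouched, 1 = fully hidden).
// Hypothetical helper for illustration; not part of Playwright.
function overlapRatio(target: Box, overlay: Box): number {
  const overlapWidth =
    Math.min(target.x + target.width, overlay.x + overlay.width) -
    Math.max(target.x, overlay.x);
  const overlapHeight =
    Math.min(target.y + target.height, overlay.y + overlay.height) -
    Math.max(target.y, overlay.y);
  if (overlapWidth <= 0 || overlapHeight <= 0) return 0;
  const targetArea = target.width * target.height;
  return targetArea === 0 ? 0 : (overlapWidth * overlapHeight) / targetArea;
}

// Example: a 64px sticky header and an input whose top 20px sit under it.
const header: Box = { x: 0, y: 0, width: 390, height: 64 };
const cardField: Box = { x: 16, y: 44, width: 358, height: 40 };
const covered = overlapRatio(cardField, header); // 0.5: half the field is hidden
```

In a spec running with a mobile viewport, both boxes would be read after focusing the field, and the assertion would be that the ratio is 0. Bounding boxes are relative to the main frame viewport, so the comparison only makes sense when both are read in the same scroll position.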
Playwright accessibility baseline
Keep lightweight a11y checks on critical forms—not full-page scans every spec:
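```typescript
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('checkout exposes accessible errors', async ({ page }) => {
  await page.goto('/checkout');
  // Submit the empty form so validation errors render before scanning.
  await page.getByRole('button', { name: 'Pay' }).click();

  // Scope the scan to the checkout form rather than the whole page.
  const results = await new AxeBuilder({ page })
    .include('[data-testid="checkout-form"]')
    .analyze();

  // Gate only on critical violations, then check the field-level error state.
  expect(results.violations.filter(v => v.impact === 'critical')).toEqual([]);
  await expect(page.getByLabel('Card number')).toHaveAttribute('aria-invalid', 'true');
});
```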
See @axe-core/playwright and Playwright accessibility testing.
Limit: axe catches many WCAG issues, not confusing copy or perceived performance—that's ExploreChimp's lane.
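The impact filter in the spec above can be generalized into a severity gate. A sketch, assuming only the `id` and `impact` fields of axe-core violation objects (impact is `minor`, `moderate`, `serious`, `critical`, or `null`). The function name `blockingViolations` and the sample data are hypothetical:

```typescript
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';

// Subset of an axe-core violation; real results carry more fields.
interface ViolationLike {
  id: string;
  impact?: Impact | null;
}

const IMPACT_ORDER: Impact[] = ['minor', 'moderate', 'serious', 'critical'];

// Returns the rule ids at or above `minImpact`; violations without an impact are skipped.
function blockingViolations(violations: ViolationLike[], minImpact: Impact): string[] {
  const threshold = IMPACT_ORDER.indexOf(minImpact);
  return violations
    .filter(v => v.impact != null && IMPACT_ORDER.indexOf(v.impact) >= threshold)
    .map(v => v.id);
}

// Sample data for illustration only.
const sample: ViolationLike[] = [
  { id: 'label', impact: 'critical' },
  { id: 'color-contrast', impact: 'serious' },
  { id: 'region', impact: 'moderate' },
];
const blockers = blockingViolations(sample, 'serious');
```

A spec could then assert `expect(blockingViolations(results.violations, 'critical')).toEqual([])` on checkout and use a looser threshold on lower-risk pages.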
ExploreChimp on SmartTest paths
After functional SmartTests cover a path with probes:
- `/testchimp explore`: run ExploreChimp on the same URLs/states annotated with `markScreenState`
- Recipes: Form Validation Tester, Theme Tester for systematic non-functional passes
- Triage: UX findings link to scenarios via UX bug traceability
ExploreChimp differs from URL-only agents—it follows documented SmartTest pathways with screen-state context (comparison).
Performance signals without flaky budgets
Avoid performance asserts built on fixed `page.waitForTimeout` delays. Prefer:
- Playwright trace action durations on CI regressions
- Probe poll intervals logging slow backends (already in flaky waits)
- Web Vitals in test env where instrumentation exists—compare release over release, not absolute ms in CI
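Release-over-release comparison can be sketched as a relative check against a stored baseline. The metric names, the 20% tolerance, and the `findRegressions` helper are assumptions for illustration. How the numbers are collected (trace durations, Web Vitals) is left out:

```typescript
// Metric name -> duration in milliseconds (e.g. a step timing or an LCP sample).
type Metrics = Record<string, number>;

interface Regression {
  metric: string;
  baseline: number;
  current: number;
  ratio: number; // current / baseline
}

// Flags metrics that grew by more than `tolerance` relative to the baseline.
// Metrics missing from either side, or with a non-positive baseline, are ignored.
function findRegressions(baseline: Metrics, current: Metrics, tolerance = 0.2): Regression[] {
  const regressions: Regression[] = [];
  for (const metric of Object.keys(baseline)) {
    const before = baseline[metric];
    const after = current[metric];
    if (before === undefined || after === undefined || before <= 0) continue;
    const ratio = after / before;
    if (ratio > 1 + tolerance) {
      regressions.push({ metric, baseline: before, current: after, ratio });
    }
  }
  return regressions;
}

// Hypothetical numbers: checkout submit slowed by 50%, LCP moved by about 3%.
const previousRelease: Metrics = { checkoutSubmit: 800, lcp: 1500 };
const thisRelease: Metrics = { checkoutSubmit: 1200, lcp: 1550 };
const regressions = findRegressions(previousRelease, thisRelease);
```

Only `checkoutSubmit` is flagged here. Because the check is a ratio against the previous release on the same runners, runner noise has less room to cause failures than an absolute millisecond threshold would.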
TrueCoverage compares prod vs test-run traffic—useful to prioritize which paths need exploration, not a substitute for UX review.
When record-replay and no-code tools fall short
Recorders capture clicks that worked once—they rarely assert error compassion, focus order, or locale layout. TestChimp keeps markdown plans + SmartTests in Git and adds /testchimp explore for non-functional passes on those same paths.
Anti-patterns
| Anti-pattern | Why it fails | Better approach |
|---|---|---|
| Snapshot entire page | Copy/locale flake | Probe + targeted a11y |
| Skip functional probes | UX theater | Probes first, explore second |
| Explore random URLs | Misses real journeys | SmartTest-anchored paths |
| axe only in unit tests | Integration gaps | Critical flows in E2E |
| Ignore mobile | Desktop-only green | Viewport matrix on top paths |
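A viewport matrix on top paths can be expressed as Playwright projects. The fragment below is a sketch: the project names and the `@top-path` title tag are hypothetical conventions, while `devices['Desktop Chrome']`, `devices['Pixel 5']`, and the project-level `grep` option come from Playwright itself:

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    // Full suite on desktop.
    { name: 'desktop', use: { ...devices['Desktop Chrome'] } },
    // Only specs whose title contains the (hypothetical) @top-path tag run on mobile.
    { name: 'mobile', use: { ...devices['Pixel 5'] }, grep: /@top-path/ },
  ],
});
```

Restricting the mobile project to tagged specs keeps the matrix to the top paths instead of doubling the whole suite.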
TestChimp workflow
- `/testchimp init`: probes on business-critical flows
- `/testchimp test`: maintain functional SmartTests per PR
- `/testchimp explore`: non-functional regressions on annotated paths
- `/testchimp evolve`: close gaps between plans, explore findings, and prod
Requirement traceability shows functional coverage; exploration backlog tracks UX/a11y debt explicitly.
Example scenario
Situation: Server returns 500 on coupon apply; UI shows 'Invalid code' instead of acknowledging a server fault.
Expected outcome: User sees actionable message; support can diagnose.
Why UI-only automation breaks: Functional probe shows discountCents=0 (correct) but copy blames the user ('Invalid code').
- Arrange: Seed a cart and stub the coupon service to return a server error (500).
- Act: Apply coupon in UI.
- Assert: Probe confirms no discount; ExploreChimp flags error copy vs server fault classification.
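To make "error copy vs server fault classification" concrete, here is a deliberately crude keyword check. It is not how ExploreChimp evaluates copy. The phrase list and the `blamesUserForServerFault` name are hypothetical:

```typescript
// Phrases that put responsibility on the user; an illustrative list only.
const USER_BLAMING_PHRASES = ['invalid', 'you entered', 'check your', 'incorrect'];

// True when a server-side failure (5xx) is reported with user-blaming copy.
function blamesUserForServerFault(status: number, copy: string): boolean {
  const isServerFault = status >= 500 && status <= 599;
  const text = copy.toLowerCase();
  return isServerFault && USER_BLAMING_PHRASES.some(phrase => text.includes(phrase));
}

// 500 with 'Invalid code': the server failed, but the copy blames the user.
const mismatch = blamesUserForServerFault(500, 'Invalid code');
// 500 with neutral copy: no mismatch.
const acceptable = blamesUserForServerFault(500, 'We could not apply the coupon. Please try again.');
// 422 with 'Invalid code': a genuine user error, so the copy fits.
const validRejection = blamesUserForServerFault(422, 'Invalid code');
```

The probe result is the same in all three cases (no discount applied), which is why this class of bug needs a separate pass on top of the functional assert.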
TestChimp workflow: /testchimp explore on checkout SmartTest path after probe spec exists—surfaces non-functional regression without weakening Assert.
Same Arrange/Act/Assert pattern as expired-coupon checkout.
Related
- Form validation and a11y
- Localization and i18n
- UI-only assertions gotcha
- SaaS onboarding
- Explorations intro
Frequently asked questions
Should non-functional tests block release?
Triage by severity—critical a11y and checkout blockers yes; cosmetic copy nits can backlog. Functional probes still gate data correctness.
Is ExploreChimp a replacement for Playwright?
No—Playwright/SmartTests own Arrange/Act/Assert with probes. ExploreChimp adds UX, a11y, and visual passes on those paths.
Can axe replace ExploreChimp?
axe catches many WCAG violations; ExploreChimp addresses confusing flows, error compassion, theme/locale layout, and path-specific UX recipes.
We only have functional CI—is that wrong?
It is a solid start. Add exploration on top 5 revenue paths after probes exist—biggest ROI for support-ticket class bugs.
How do record-replay tools handle UX?
They replay clicks—they do not systematically evaluate error quality, keyboard paths, or locale layout across releases.
Performance budgets in Playwright?
Use traces and trend comparison; avoid brittle ms asserts on shared CI runners. Pair with Real User Monitoring where TrueCoverage instruments prod.
Link UX bugs to requirements?
ExploreChimp findings can trace to scenarios and screen states—see UX bug traceability docs. Keeps UX debt visible alongside // @Scenario: functional links.
When to run /testchimp explore?
After functional SmartTests stabilize on a path—typically post-merge to staging or scheduled on main, plus after major UI refactors.
Apply these patterns in your repo
Run `/testchimp init` to connect TestChimp to your repo, then `/testchimp test` on PRs to turn these patterns into maintained SmartTests. Use `/testchimp evolve` when you want to expand coverage as your app grows.