Testing Guides for AI and Conversational UX

Short answer

These guides cover chatbots, agents, RAG, LLM output validation, streaming, and canvas, combining evals with E2E tests built on AIMock, ai.act/ai.verify, and probe Assert. Each guide includes complexity maps, fixture patterns, probe-first Assert, and page-scoped FAQs, and ties into the TestChimp QA workflow where that helps maintenance and requirement coverage.

Guides in this section

| Guide | What you'll learn |
| --- | --- |
| Chatbots and Conversational UIs | Test chatbots and conversational UI with hybrid Playwright, AIMock golden responses, offline evals, and probe Assert, without brittle exact-text matching |
| AI Agent Workflows and Tool Calling | Agent workflow E2E (tool failure, retries, idempotent side effects, human approval) with stub APIs and probe Assert |
| RAG and Knowledge-Base Search | RAG failures show as wrong answers or missing citations; test retrieval and generation layers separately where possible |
| LLM Output Quality in E2E | Exact string match fails on LLMs; use schema validation, semantic evals, or AIMock for E2E |
| Streaming AI Responses | Streaming UI needs asserts on completion, not first token |
| Canvas, Charts, and Visual Widgets | Pixel coordinates flake on canvas; prefer ai |
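
As a rough illustration of two rows above (schema validation instead of exact string match, and asserting on the completed stream rather than the first token), here is a minimal TypeScript sketch. It uses no test framework and no TestChimp or AIMock API. The `SupportReply` shape, `fakeModelStream`, `collectStream`, and `parseReply` are hypothetical names invented for this example, and the chunked JSON stands in for whatever a stubbed model would emit.

```typescript
// Hypothetical structured reply; the field names are assumptions for this sketch.
interface SupportReply {
  intent: "refund" | "shipping" | "other";
  answer: string;
  citations: string[];
}

const INTENTS: string[] = ["refund", "shipping", "other"];

// Stand-in for a stubbed model stream: yields the reply in arbitrary chunks.
async function* fakeModelStream(chunks: string[]): AsyncGenerator<string> {
  for (const chunk of chunks) {
    yield chunk;
  }
}

// Wait for the stream to finish before asserting anything about its content.
async function collectStream(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk;
  }
  return text;
}

// Structural check: any wording passes as long as the shape is right.
function isSupportReply(value: unknown): value is SupportReply {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const intent = record["intent"];
  const answer = record["answer"];
  const citations = record["citations"];
  return (
    typeof intent === "string" &&
    INTENTS.includes(intent) &&
    typeof answer === "string" &&
    answer.trim().length > 0 &&
    Array.isArray(citations) &&
    citations.every((c) => typeof c === "string")
  );
}

// Returns the parsed reply, or null when the text is not valid JSON of that shape.
function parseReply(raw: string): SupportReply | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isSupportReply(parsed) ? parsed : null;
  } catch (_err) {
    return null;
  }
}

async function demo(): Promise<SupportReply | null> {
  const chunks = [
    '{"intent":"refund","ans',
    'wer":"Refunds are available within 30 days.",',
    '"citations":["refund-policy.md"]}',
  ];
  const fullText = await collectStream(fakeModelStream(chunks));
  return parseReply(fullText);
}
```

In this sketch, validating only the first chunk would fail because it is not yet parseable JSON, while the completed text passes regardless of how the answer is phrased. In a real suite the same check would sit behind whatever completion signal the UI exposes.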

How TestChimp applies to these guides

These guides are scenario playbooks—Arrange/Act/Assert patterns, fixture discipline, probe Assert—not product feature lists. When you wire them into your repo, TestChimp adds:

| Layer | What it does |
| --- | --- |
| Test planning as code | Markdown scenarios in Git consolidate business rules agents read on every PR (test planning) |
| SmartTests + AI steps | Playwright you own; ai.act / ai.verify for volatile UI without abandoning probes (SmartTests) |
| Requirement traceability | // @Scenario: links connect specs to plan rows, which is critical in complex products with many dimension combos (traceability) |
| Per-PR QA workflow | /testchimp test in Claude or Cursor with the TestChimp skill, not a web recorder (QA on Autopilot) |
| ExploreChimp | Non-functional bug capture on SmartTest paths: latency, UX confusion, accessibility gaps most suites miss (explorations) |
| Post-deploy evolve | /testchimp evolve closes plan and production gaps; TrueCoverage is one signal among explore findings and requirement holes |
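
To make the traceability row concrete, the sketch below shows one way a script could compare `// @Scenario:` comments in a spec file against scenario titles from a plan. This is not TestChimp's implementation. Only the `// @Scenario:` prefix comes from the table above; the title-after-the-colon format, the function names, and the sample plan titles are all assumptions made for illustration.

```typescript
// Assumed tag format: a comment line `// @Scenario: <title>`.
const SCENARIO_TAG = /^\s*\/\/\s*@Scenario:\s*(.+?)\s*$/;

// Collect every scenario title referenced in a spec file's source text.
function extractScenarioTags(specSource: string): string[] {
  const titles: string[] = [];
  for (const line of specSource.split("\n")) {
    const match = SCENARIO_TAG.exec(line);
    if (match) {
      titles.push(match[1]);
    }
  }
  return titles;
}

// Return plan scenarios that no spec links to (requirement holes).
function findUncoveredScenarios(planTitles: string[], specSource: string): string[] {
  const covered = new Set(extractScenarioTags(specSource));
  return planTitles.filter((title) => !covered.has(title));
}

// Hypothetical plan rows and spec source, for illustration only.
const examplePlanTitles = [
  "Refund request shows citation",
  "Tool failure triggers retry",
];
const exampleSpecSource = [
  "// @Scenario: Refund request shows citation",
  "test('refund answer cites policy', async () => { /* ... */ });",
].join("\n");

const exampleUncovered = findUncoveredScenarios(examplePlanTitles, exampleSpecSource);
```

Here `exampleUncovered` contains only the retry scenario, since no comment in the spec source links to it.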

Why record-replay and no-code tools fall short

Browser recorders and web-based no-code suites optimize for click capture. They struggle with per-run seed data, probe Assert on authoritative state, requirement matrices in Git, and non-functional regressions on real user paths. TestChimp takes a different approach: orchestrated QA in which Claude (upskilled with the TestChimp skill) maintains SmartTests against markdown plans. See record-replay vs TestChimp.

Frequently asked questions

Which guide should I read first?

Open the guide closest to your current sprint risk—conversational UI if shipping chat, agent workflows if adding tools, RAG if citations matter, streaming if partial-render bugs appear. All guides share Arrange/Act/Assert and probe Assert patterns.

Do these replace TestChimp product docs?

No—product docs explain features; these guides explain how to test specific AI scenarios authoritatively, with vendor docs linked and TestChimp workflow mentioned where it naturally helps maintenance and requirement coverage.

Why not rely on LLM evals alone?

Evals catch prompt and retrieval regressions cheaply but miss auth gates, tool side effects, streaming timing, and cross-tenant isolation. The guides show the hybrid pattern that mature AI teams use: evals plus probe-backed E2E.
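
A small sketch of why reply-text checks alone miss tool side effects, assuming an in-memory stand-in for the system of record. `RefundStore`, `runStubbedAgentTurn`, and the idempotency-key format are hypothetical and not part of TestChimp, AIMock, or any real API. The idea is only that the assertion reads authoritative state through a probe instead of trusting what the assistant says.

```typescript
// Hypothetical in-memory stand-in for the backend that owns refund state.
class RefundStore {
  private refunds = new Map<string, number>();

  // Idempotent write: a repeated call with the same key has no extra effect.
  issue(idempotencyKey: string, amountCents: number): void {
    if (!this.refunds.has(idempotencyKey)) {
      this.refunds.set(idempotencyKey, amountCents);
    }
  }

  // Probes: read authoritative state directly.
  count(): number {
    return this.refunds.size;
  }

  totalCents(): number {
    let sum = 0;
    this.refunds.forEach((amount) => {
      sum += amount;
    });
    return sum;
  }
}

// Stubbed agent turn: a deterministic reply, with the tool call retried once
// to simulate a timeout-and-retry path.
function runStubbedAgentTurn(store: RefundStore, orderId: string): string {
  const key = `refund:${orderId}`;
  store.issue(key, 1999);
  store.issue(key, 1999); // simulated retry of the same tool call
  return "Your refund has been processed.";
}

const store = new RefundStore();
const replyText = runStubbedAgentTurn(store, "A1");

// An eval or text check would stop at replyText. The probe checks what
// actually happened: one refund, not two, despite the retry.
const refundCount = store.count();
const refundTotal = store.totalCents();
```

If `issue` were not idempotent, the reply text would be identical while the store held a duplicate refund, which is the kind of regression an eval on the text would not catch.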

How is this different from no-code AI test tools?

No-code and recorder tools capture UI clicks; they rarely seed per-run threads, stub models with AIMock, or assert authoritative state via probes. TestChimp runs in Claude/Cursor with markdown plans and SmartTests in Git. See the built-with and record-replay comparison guides.