How to Test Streaming AI Responses
Short answer
Streaming chat UIs render partial tokens before the model finishes, so asserting message content on the first chunk is flaky and skips abort and error paths. Wait on a stream-complete signal (data attribute, SSE end event, or AIMock final chunk), assert loading and disabled states while the stream is active, and run probe Assert only after completion. Pair E2E with offline evals: evals judge content quality, and E2E covers timing and UX wiring.
Part of the Testing Guides: AI and conversational UX.
Who this is for
Teams shipping token-streaming chat, copilot panels, or inline completion UIs over SSE, fetch streams, or WebSockets. This does not apply to batch-only APIs, where the full response arrives at once.
Why testing streaming matters
| Bug | User experience | Test gap that misses it |
|---|---|---|
| Stuck spinner | Full answer never appears | Test waits only for the first token |
| Partial shown as final | Truncated instructions | No stream-complete wait |
| Abort ignored | Tokens keep billing; confusing UI | No Stop button test |
| Error mid-stream | Blank bubble | No SSE error event test |
| Double submit | Duplicate messages | Send not disabled during stream |
| Scroll jump | Unreadable stream | No layout stability check |
Complexity map
| Scenario | Edge case | Why tests break | Approach |
|---|---|---|---|
| Partial render | Assert on chunk 1 | Content incomplete | Wait stream-complete |
| Slow stream | CI timeout | False fail | AIMock chunked SSE |
| Abort | User clicks Stop | Orphan request | Assert idle + probe cancelled |
| Error mid-stream | Error event or dropped connection after tokens | Half message shown | Error banner + no save probe |
| Reconnect | SSE drop | Duplicate content | Idempotent message id probe |
| Markdown stream | Unclosed code fence | Broken render | Complete before DOM assert |
| Tool call stream | Args stream, then execute | Assert too early | Wait for tool_started event |
| Concurrent streams | Tab race | Mixed messages | One stream per thread test |
| Accessibility | aria-busy stuck | a11y fail | Assert busy cleared |
| Network offline | fetch aborted | Hung UI | Go offline (context.setOffline) after the stream starts |
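The Reconnect row depends on the client discarding replayed events. A minimal sketch, assuming the server tags each SSE token event with a monotonically increasing numeric `id:` field. When an `EventSource` reconnects, the browser sends the last seen id back as the `Last-Event-ID` header, and a server that resumes imprecisely may replay events the client already rendered. `TokenEvent`, `RenderedMessage`, and `applyEvents` are hypothetical names for illustration.

```typescript
// Hypothetical token event carrying the numeric SSE `id:` field.
interface TokenEvent {
  id: number;
  text: string;
}

interface RenderedMessage {
  lastEventId: number; // highest event id already rendered
  text: string; // accumulated assistant text
}

// Apply events in arrival order, skipping any id at or below the last one applied,
// so events replayed after a reconnect do not duplicate content.
function applyEvents(message: RenderedMessage, events: TokenEvent[]): RenderedMessage {
  let { lastEventId, text } = message;
  for (const event of events) {
    if (event.id <= lastEventId) continue; // replayed duplicate: ignore
    text += event.text;
    lastEventId = event.id;
  }
  return { lastEventId, text };
}

// First connection delivers ids 1-2 and drops; the reconnect replays id 2 before id 3.
const beforeDrop = applyEvents({ lastEventId: 0, text: '' }, [
  { id: 1, text: 'Hello' },
  { id: 2, text: ' world' },
]);
const afterReconnect = applyEvents(beforeDrop, [
  { id: 2, text: ' world' },
  { id: 3, text: '!' },
]);
console.log(afterReconnect.text); // Hello world!
```

The matching E2E assertion is that, after you force a reconnect, the rendered text (and the probe's message row) contains each chunk exactly once.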
Stream completion signals
Pick one authoritative completion signal per app and encode it in a shared test helper:
| Signal | Example |
|---|---|
| Data attribute | [data-stream-complete="true"] |
| ARIA | aria-busy="false" on message |
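| Network | page.waitForResponse matching /api/chat/stream, then await response.finished() (the response event fires when headers arrive, not when the stream ends) |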
| AIMock | Final SSE chunk { "done": true } |
| Custom event | window event chat:stream-end |
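import { expect, type Page } from '@playwright/test';
// Scope to the newest assistant bubble once it has rendered; a bare [data-stream-complete="true"] locator can match an earlier, already-complete turn.
async function waitStreamComplete(page: Page) {
  await expect(page.getByTestId('assistant-message').last()).toHaveAttribute('data-stream-complete', 'true', { timeout: 30_000 });
}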
Never use waitForTimeout to guess stream duration: a fixed wait is either too short (flaky) or too long (slow).
AIMock chunked SSE
Stub the stream endpoint so CI runs are deterministic.
The test route streams these SSE frames with a delay between chunks:
event: token
data: {"text":"Hello"}

event: token
data: {"text":" world"}

event: done
data: {}
Register AIMock in the Arrange step so Playwright exercises the real client SSE parser without the latency variance of a live model.
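To keep the mock and the expected frames in sync, a test helper can build the SSE body from a token list and parse it back. This is a minimal sketch assuming the `token`/`done` event names and `{"text": ...}` payload shown above; `buildSseBody`, `parseSse`, and `readStream` are hypothetical helpers. The parser only reads `event:` and single-line `data:` fields and trims values, so it is not a full SSE implementation (no comments, `retry:`, or multi-line data). Also note that Playwright's `route.fulfill` delivers the whole body at once, so inter-chunk delays need a server-side test route.

```typescript
// Hypothetical parsed frame: SSE event name plus its raw data line.
type SseEvent = { event: string; data: string };

// Serialize tokens as SSE frames: one `token` event per chunk, then a `done` event.
// Each frame is "event: <name>\ndata: <json>\n" followed by a blank line.
function buildSseBody(tokens: string[]): string {
  const frames = tokens.map((text) => `event: token\ndata: ${JSON.stringify({ text })}\n\n`);
  frames.push('event: done\ndata: {}\n\n');
  return frames.join('');
}

// Minimal parser: split on blank lines, read `event:` and single-line `data:` fields.
// Frames without an `event:` field default to "message", as in the SSE spec.
function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of body.split('\n\n')) {
    if (block.trim() === '') continue;
    let event = 'message';
    let data = '';
    for (const line of block.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim();
      else if (line.startsWith('data:')) data = line.slice(5).trim();
    }
    events.push({ event, data });
  }
  return events;
}

// Reassemble the assistant text and report whether the stream reached `done`.
function readStream(body: string): { text: string; done: boolean } {
  let text = '';
  let done = false;
  for (const { event, data } of parseSse(body)) {
    if (event === 'token') text += (JSON.parse(data) as { text: string }).text;
    if (event === 'done') done = true;
  }
  return { text, done };
}

const sseBody = buildSseBody(['Hello', ' world']);
console.log(readStream(sseBody)); // { text: 'Hello world', done: true }
```

A body cut off before the `done` frame reads back with `done: false`, which is the case a stream-complete wait exists to catch.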
Assert phases during streaming
await page.getByRole('textbox').fill('Explain refund policy');
await page.keyboard.press('Enter');
// Phase 1: streaming active
await expect(page.getByTestId('assistant-message').last()).toHaveAttribute('aria-busy', 'true');
await expect(page.getByRole('button', { name: /send/i })).toBeDisabled();
// Phase 2: complete
await waitStreamComplete(page);
await expect(page.getByTestId('assistant-message').last()).toHaveAttribute('aria-busy', 'false');
// Phase 3: content structure (not exact prose)
await ai.verify('Message discusses refund policy or shows citation chips');
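The phase assertions above imply a small lifecycle for each assistant message. One way to keep test helpers and app code agreeing is to write that lifecycle down. A sketch assuming hypothetical `StreamPhase`/`StreamSignal` types and the attribute contract used in the assertions (`aria-busy`, `data-stream-complete`, disabled Send), plus the assumption that a stopped or failed message is never marked complete.

```typescript
// Hypothetical lifecycle of one assistant message and the signals that move it.
type StreamPhase = 'streaming' | 'complete' | 'stopped' | 'error';
type StreamSignal = 'token' | 'done' | 'stop' | 'error';

// Where each signal leads from the streaming phase.
const ON_SIGNAL: Record<StreamSignal, StreamPhase> = {
  token: 'streaming',
  done: 'complete',
  stop: 'stopped',
  error: 'error',
};

// Terminal phases ignore late signals: a token that arrives after Stop
// must not flip the message back to streaming.
function nextPhase(phase: StreamPhase, signal: StreamSignal): StreamPhase {
  return phase === 'streaming' ? ON_SIGNAL[signal] : phase;
}

// DOM state the Phase 1 / Phase 2 assertions expect (assumed attribute contract).
function expectedUi(phase: StreamPhase) {
  const busy = phase === 'streaming';
  return {
    ariaBusy: busy ? 'true' : 'false',
    sendDisabled: busy,
    streamComplete: phase === 'complete', // only a finished stream counts as complete
  };
}

let currentPhase: StreamPhase = 'streaming';
for (const signal of ['token', 'token', 'stop', 'token'] as StreamSignal[]) {
  currentPhase = nextPhase(currentPhase, signal);
}
console.log(currentPhase, expectedUi(currentPhase));
// stopped { ariaBusy: 'false', sendDisabled: false, streamComplete: false }
```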
Abort/stop path
await page.getByRole('textbox').fill('Write a long essay');
await page.keyboard.press('Enter');
await page.getByRole('button', { name: /stop/i }).click();
await expect(page.getByTestId('stream-status')).toHaveText(/stopped|cancelled/i);
await expect.poll(() => probeStreamActive(runId)).toBe(false);
With AIMock, click Stop partway through the chunk sequence to verify that the client actually aborts the fetch or SSE connection.
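On the client side, honoring Stop means checking the abort signal between chunks and leaving the read loop, not only hiding the spinner. A minimal synchronous sketch, assuming chunks arrive as an iterable of strings; `consumeChunks` is a hypothetical helper, and its `signal` parameter accepts anything with an `aborted` flag, which a real `AbortSignal` satisfies. In browser code you would also pass the same signal to `fetch(url, { signal })` so the request itself is cancelled.

```typescript
// Anything with an `aborted` flag; a DOM or Node AbortSignal fits this shape.
interface AbortLike {
  readonly aborted: boolean;
}

type StreamResult = { text: string; status: 'complete' | 'stopped' };

// Read chunks until the source ends or the signal is aborted.
// Checking before each chunk means no text is appended after Stop.
function consumeChunks(
  chunks: Iterable<string>,
  signal: AbortLike,
  onChunk: (textSoFar: string) => void = () => {},
): StreamResult {
  let text = '';
  for (const chunk of chunks) {
    if (signal.aborted) return { text, status: 'stopped' };
    text += chunk;
    onChunk(text);
  }
  return { text, status: signal.aborted ? 'stopped' : 'complete' };
}

// Simulate the user clicking Stop after the second chunk renders.
const stopSignal = { aborted: false };
const result = consumeChunks(['Once', ' upon', ' a', ' time'], stopSignal, (soFar) => {
  if (soFar === 'Once upon') stopSignal.aborted = true;
});
console.log(result); // { text: 'Once upon', status: 'stopped' }
```

Real code would iterate a `ReadableStream` reader with `for await`, but the check is the same: the E2E Stop test passes only if text stops growing and the status settles on stopped.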
Error mid-stream
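Arrange AIMock or a stub to emit a few tokens and then fail, either with an SSE error event or by dropping the connection. Once tokens are flowing the response status is already 200, so a mid-stream failure cannot arrive as an HTTP 500. Then assert per product rules:
- Partial text remains with an error indicator, or is rolled back
- If the policy is all-or-nothing, the probe shows no completed assistant message row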
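Both product rules can be written as one function that test helpers and probe checks share. A sketch assuming the stream is reduced to `token`, `error`, and `done` events, with a dropped connection showing up as the events ending without `done`; `StreamEvent`, `ErrorPolicy`, `finalizeMessage`, and the `keep-partial` / `all-or-nothing` names are hypothetical.

```typescript
// Hypothetical reduced stream events.
type StreamEvent =
  | { type: 'token'; text: string }
  | { type: 'error'; reason: string }
  | { type: 'done' };

type ErrorPolicy = 'keep-partial' | 'all-or-nothing';

interface FinalMessage {
  text: string;
  status: 'completed' | 'partial' | 'discarded';
  error?: string;
}

// Reduce a stream to the message the UI should show and the probe should find.
function finalizeMessage(events: StreamEvent[], policy: ErrorPolicy): FinalMessage {
  let text = '';
  // Mid-stream failure: keep the partial text with an error marker, or drop it entirely.
  const fail = (reason: string): FinalMessage =>
    policy === 'keep-partial'
      ? { text, status: 'partial', error: reason }
      : { text: '', status: 'discarded', error: reason };

  for (const event of events) {
    if (event.type === 'token') text += event.text;
    else if (event.type === 'done') return { text, status: 'completed' };
    else return fail(event.reason);
  }
  // Events ended without `done`: treat it like a dropped connection.
  return fail('stream ended without done');
}

const failedStream: StreamEvent[] = [
  { type: 'token', text: 'Refunds are ' },
  { type: 'error', reason: 'upstream timeout' },
];
console.log(finalizeMessage(failedStream, 'keep-partial').status); // partial
console.log(finalizeMessage(failedStream, 'all-or-nothing').status); // discarded
```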
Hybrid ai.act for stream UI
Use ai.act when stop/send buttons lack stable selectors:
await ai.act('Send a message and click Stop while the assistant is still responding');
await ai.verify('Streaming stops and UI returns to idle state');
Keep timing assertions on data attributes; do not rely on ai.verify alone to decide that a stream has completed.
Anti-patterns
| Anti-pattern | Why it fails | Better approach |
|---|---|---|
| Assert text after 500ms | Partial content | stream-complete wait |
| Real model stream in CI | Latency flake | AIMock chunked SSE |
| Ignore Stop button | Orphan cost/state | Abort scenario |
| networkidle as completion | Background polls | Explicit done signal |
| Snapshot mid-stream | Nondeterministic | Assert after complete |
| No disabled send during stream | Double message bugs | Phase 1 asserts |
Example scenario
Situation: User sends a question and the assistant streams a long policy answer.
Expected outcome: Loading states clear, full message marked complete, user cannot double-send during stream.
Why UI-only automation breaks: the first sentence is visible, so the test passes even though the remainder never arrives and the spinner stays stuck.
- Arrange: AIMock SSE emits 10 chunks then done event.
- Act: Submit question; optionally abort in separate spec.
- Assert: stream-complete true; send re-enabled; probe message status completed.
TestChimp workflow: track stream_completion_state in TrueCoverage; abort and error slices are often missing from suites.
This follows the same Arrange/Act/Assert pattern as the expired-coupon checkout scenario.
Evals vs E2E: when each layer helps
| Layer | Best for | Limitations |
|---|---|---|
| Offline evals | Answer quality on complete (non-streaming) responses | Does not test SSE parser, abort, aria-busy, or disabled send |
| E2E SmartTests (AIMock stream + completion waits + ai.act) | Stream UX, timing, error/abort paths | Not for grading 500 answer variants |
| Hybrid | Evals on content; E2E on streaming wiring | Standard split for copilot products |
Content quality belongs in golden eval sets run against batch API responses; E2E owns client stream handling. TestChimp does not ship eval tooling; its part here is AIMock streams in SmartTests for deterministic timing tests.
Connect scenarios to your QA workflow
Capture business rules in markdown test plans and enforce them with seed routes and probe Assert. Link SmartTests with // @Scenario: for requirement traceability. Use /testchimp test on PRs; /testchimp explore on SmartTest paths for non-functional gaps (ExploreChimp).
Related scenarios
- Conversational UI — multi-turn chat
- LLM output validation — post-stream structured parse
- WebSockets live updates — non-SSE streaming
External references
- Server-Sent Events (MDN)
- Playwright assertions
- Playwright network — waitForResponse patterns
- SmartTests intro
Frequently asked questions
When should I assert during streaming?
Assert loading/disabled states during stream; wait for completion event (data attribute, network response end, or AIMock final chunk) before structural or content asserts on the full message.
How do I stub streaming in CI?
Use AIMock or a test route that emits chunked SSE with known delays and a final done event. This exercises the real client parser without live model latency.
Should I use networkidle after sending a chat message?
No. Background polling can keep the network from ever going idle. Use the explicit stream-complete signal documented in your app.
How do I test the Stop button?
Start a long AIMock stream, click Stop, then assert that the UI is idle and the probe reports the stream inactive. Include this in the suite; abort paths are often untested.
Can ai.verify run before stream completes?
Only for in-progress UX checks. Final ai.verify and probes must wait for stream-complete—otherwise you validate partial answers.
How do I test errors mid-stream?
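Stub the stream to emit tokens, then fail with an SSE error event or a dropped connection (the status is already 200 once tokens flow, so it cannot change to 500). Assert the error UI and the probe policy for partial messages (kept vs discarded).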
Do offline evals replace streaming E2E?
No. Evals grade final text quality; E2E validates SSE wiring, abort handling, and accessibility busy states. Use both.
Apply these patterns in your repo
Run `/testchimp init` to connect TestChimp to your repo, then `/testchimp test` on PRs to turn these patterns into maintained SmartTests. Use `/testchimp evolve` when you want to expand coverage as your app grows.