How to Test Streaming AI Responses

Short answer

Streaming chat UIs render partial tokens before the model finishes, so asserting on message content at the first chunk produces flaky tests and misses the abort and error paths. Wait on a stream-complete signal (a data attribute, the SSE end event, or the AIMock final chunk), assert loading and disabled states while the stream is active, and run probe Assert only after completion. Pair E2E tests with offline evals: evals cover content quality, E2E covers timing and UX wiring.

Who this is for

Teams shipping token-streaming chat, copilot panels, or inline completion UIs built on SSE, fetch streams, or WebSockets. It does not apply to batch-only APIs where the full response arrives at once.

Why testing streaming matters

| Bug | User experience | Test gap that misses it |
| --- | --- | --- |
| Stuck spinner | User never sees the full answer | Test waits only for the first token |
| Partial shown as final | Truncated instructions | No stream-complete wait |
| Abort ignored | Tokens are still charged; confusing UI | No Stop button test |
| Error mid-stream | Blank bubble | No SSE error event test |
| Double submit | Duplicate messages | Send never checked as disabled during stream |
| Scroll jump | Unreadable stream | No layout stability check |

Complexity map

| Scenario | Edge case | Why tests break | Approach |
| --- | --- | --- | --- |
| Partial render | Assert on chunk 1 | Content incomplete | Wait stream-complete |
| Slow stream | CI timeout | False fail | AIMock chunked SSE |
| Abort | User clicks Stop | Orphan request | Assert idle + probe cancelled |
| Error mid-stream | 500 after tokens | Half message shown | Error banner + no save probe |
| Reconnect | SSE drop | Duplicate content | Idempotent message id probe |
| Markdown stream | Unclosed code fence | Broken render | Complete before DOM assert |
| Tool call stream | Args stream then execute | Assert too early | Wait tool_started event |
| Concurrent streams | Tab race | Mixed messages | One stream per thread test |
| Accessibility | aria-busy stuck | a11y fail | Assert busy cleared |
| Network offline | fetch abort | Hung UI | Offline route after start |
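For the Reconnect row, the following is a minimal sketch of the behavior an idempotent message id probe would check. The chunk shape (`messageId`, `seq`, `text`) and the `MessageAssembler` class are hypothetical and not part of any TestChimp or AIMock API. It assumes the server replays already-delivered chunks after an SSE drop.

```typescript
// Hypothetical chunk shape: a stable messageId plus a per-message sequence number.
type Chunk = { messageId: string; seq: number; text: string };

/** Appends chunk text per message, ignoring any chunk whose seq was already applied. */
class MessageAssembler {
  private readonly nextSeq = new Map<string, number>();
  private readonly texts = new Map<string, string>();

  /** Returns true if the chunk was applied, false if it was a replay. */
  apply(chunk: Chunk): boolean {
    const expected = this.nextSeq.get(chunk.messageId) ?? 0;
    if (chunk.seq < expected) return false; // replayed after reconnect: skip it
    // Note: this sketch does not detect gaps (seq greater than expected).
    this.texts.set(chunk.messageId, (this.texts.get(chunk.messageId) ?? "") + chunk.text);
    this.nextSeq.set(chunk.messageId, chunk.seq + 1);
    return true;
  }

  textOf(messageId: string): string {
    return this.texts.get(messageId) ?? "";
  }
}

const assembler = new MessageAssembler();
const firstConnection: Chunk[] = [
  { messageId: "m1", seq: 0, text: "Hello" },
  { messageId: "m1", seq: 1, text: " world" },
];
// After a simulated drop, the server replays seq 1 and then continues with seq 2.
const afterReconnect: Chunk[] = [
  { messageId: "m1", seq: 1, text: " world" },
  { messageId: "m1", seq: 2, text: "!" },
];
const applied = [...firstConnection, ...afterReconnect].map((c) => assembler.apply(c));
```

A reconnect test can then assert that the rendered message contains each chunk exactly once, instead of comparing against exact prose.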

Stream completion signals

Pick one authoritative signal per app and document it in your test helpers:

| Signal | Example |
| --- | --- |
| Data attribute | `[data-stream-complete="true"]` |
| ARIA | `aria-busy="false"` on message |
| Network | `page.waitForResponse` matching `/api/chat/stream` |
| AIMock | Final SSE chunk `{ "done": true }` |
| Custom event | `window` event `chat:stream-end` |

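// Resolves once an element carrying the stream-complete marker is attached.
// Caveat: in multi-turn tests an earlier completed message also matches this selector,
// so scope the locator to the newest message if your app keeps the marker on old ones.
async function waitStreamComplete(page) {
  await page.locator('[data-stream-complete="true"]').last().waitFor({ timeout: 30_000 });
}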

Never use `waitForTimeout` to guess at stream duration.

AIMock chunked SSE

Stub the stream endpoint so CI runs are deterministic:

// Test route streams chunks with delay
// event: token\ndata: {"text":"Hello"}\n\n
// event: token\ndata: {"text":" world"}\n\n
// event: done\ndata: {}\n\n
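The frames above can be generated by a small helper, sketched here under stated assumptions. The function names `formatSseFrame` and `buildTokenStream` are hypothetical, and the event names (`token`, `done`) are taken from the comment block above. How AIMock itself is configured is not shown here.

```typescript
// Hypothetical helpers; not part of AIMock or Playwright.
type SseFrame = { event: string; data: unknown };

/** Serializes one frame in the text/event-stream wire format. */
function formatSseFrame(frame: SseFrame): string {
  return `event: ${frame.event}\ndata: ${JSON.stringify(frame.data)}\n\n`;
}

/** Builds one token frame per input string, followed by a final done frame. */
function buildTokenStream(tokens: string[]): string[] {
  const frames = tokens.map((text) => formatSseFrame({ event: "token", data: { text } }));
  frames.push(formatSseFrame({ event: "done", data: {} }));
  return frames;
}

const frames = buildTokenStream(["Hello", " world"]);
```

Keeping the frames as an array lets a mock server write them one at a time with a delay between writes. If they are instead joined and returned as a single response body, the client receives everything at once, which exercises the parser but not the in-progress UI states.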

Assert phases during streaming

await page.getByRole('textbox').fill('Explain refund policy');
await page.keyboard.press('Enter');

// Phase 1: streaming active
await expect(page.getByTestId('assistant-message').last()).toHaveAttribute('aria-busy', 'true');
await expect(page.getByRole('button', { name: /send/i })).toBeDisabled();

// Phase 2: complete
await waitStreamComplete(page);
await expect(page.getByTestId('assistant-message').last()).toHaveAttribute('aria-busy', 'false');

// Phase 3: content structure (not exact prose)
await ai.verify('Message discusses refund policy or shows citation chips');

Abort/stop path

await page.getByRole('textbox').fill('Write a long essay');
await page.keyboard.press('Enter');
// Click Stop while the stream is still emitting chunks
await page.getByRole('button', { name: /stop/i }).click();
await expect(page.getByTestId('stream-status')).toHaveText(/stopped|cancelled/i);
// probeStreamActive and runId are assumed to come from your own test helpers
await expect.poll(() => probeStreamActive(runId)).toBe(false);

With AIMock, trigger the stop in the middle of the chunk sequence to verify that the client aborts the fetch or SSE connection.
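As a rough model of the client contract this test checks, the sketch below consumes tokens until an `AbortSignal` fires. `consumeTokens` is a hypothetical function, not the API of any real chat client, and the array of tokens stands in for the mocked chunk sequence.

```typescript
/**
 * Hypothetical consumer: renders tokens until the signal is aborted.
 * Returns "cancelled" if the abort arrived before the sequence ended.
 */
function consumeTokens(
  tokens: Iterable<string>,
  signal: AbortSignal,
  onToken: (token: string) => void,
): "completed" | "cancelled" {
  for (const token of tokens) {
    if (signal.aborted) return "cancelled"; // stop reading; later tokens are never rendered
    onToken(token);
  }
  return "completed";
}

const controller = new AbortController();
const rendered: string[] = [];
const status = consumeTokens(["a", "b", "c", "d"], controller.signal, (token) => {
  rendered.push(token);
  // Simulates the user clicking Stop after the second token is on screen.
  if (rendered.length === 2) controller.abort();
});
```

The E2E assertions mirror this: the status reads as stopped or cancelled, and no tokens emitted after the stop appear in the message.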

Error mid-stream

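Arrange AIMock or a stub to emit tokens and then fail mid-stream. Once streaming has started the HTTP status line has already been sent, so in practice the failure is an SSE error event or a dropped connection rather than a late HTTP 500. Then assert the outcome your product rules define:

Partial text remains with an error indicator, or the message is rolled back, per product rules.
The probe shows no completed assistant message row if the policy is all-or-nothing.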
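The two policies can be made explicit with a small state reducer, sketched below. The event shapes, the `Policy` names, and the `persisted` flag are hypothetical stand-ins for whatever your app and probe expose.

```typescript
// Hypothetical event and state shapes for illustration only.
type StreamEvent =
  | { type: "token"; text: string }
  | { type: "error"; message: string }
  | { type: "done" };
type Policy = "keep-partial" | "all-or-nothing";
type MessageState = {
  text: string;
  status: "streaming" | "completed" | "error";
  persisted: boolean; // what a backend probe would report as a saved message row
};

/** Folds stream events into the final message state under the given error policy. */
function reduceStream(events: StreamEvent[], policy: Policy): MessageState {
  let state: MessageState = { text: "", status: "streaming", persisted: false };
  for (const event of events) {
    if (event.type === "token") {
      state = { ...state, text: state.text + event.text };
    } else if (event.type === "done") {
      return { ...state, status: "completed", persisted: true };
    } else {
      // Error mid-stream: keep or discard the partial text, never persist as completed.
      return { text: policy === "keep-partial" ? state.text : "", status: "error", persisted: false };
    }
  }
  return state; // no done or error yet: still streaming
}

const failing: StreamEvent[] = [
  { type: "token", text: "Refunds " },
  { type: "token", text: "take" },
  { type: "error", message: "upstream failure" },
];
const kept = reduceStream(failing, "keep-partial");
const rolledBack = reduceStream(failing, "all-or-nothing");
```

Writing the expected state down this way makes it clear which assertion belongs in the UI check (text and error indicator) and which belongs in the probe (no persisted row).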

Hybrid ai.act for stream UI

Use ai.act when stop/send buttons lack stable selectors:

await ai.act('Send a message and click Stop while the assistant is still responding');
await ai.verify('Streaming stops and UI returns to idle state');

Keep timing assertions on data attributes; do not rely on ai.verify alone to detect completion.

Anti-patterns

| Anti-pattern | Why it fails | Better approach |
| --- | --- | --- |
| Assert text after 500ms | Partial content | stream-complete wait |
| Real model stream in CI | Latency flake | AIMock chunked SSE |
| Ignore Stop button | Orphan cost/state | Abort scenario |
| `networkidle` as completion | Background polls | Explicit done signal |
| Snapshot mid-stream | Nondeterministic | Assert after complete |
| No disabled send during stream | Double message bugs | Phase 1 asserts |

Example scenario

Situation: User sends a question and the assistant streams a long policy answer.

Expected outcome: Loading states clear, full message marked complete, user cannot double-send during stream.

Why UI-only automation breaks: The first sentence is visible, so a test that asserts early passes even though the remainder never arrives and the spinner stays stuck.

Arrange: AIMock SSE emits 10 chunks then done event.
Act: Submit question; optionally abort in separate spec.
Assert: stream-complete true; send re-enabled; probe message status completed.

TestChimp workflow: Track stream_completion_state in TrueCoverage. Abort and error slices are often missing from suites.

Same Arrange/Act/Assert pattern as expired-coupon checkout.

Evals vs E2E: when each layer helps

| Layer | Best for | Limitations |
| --- | --- | --- |
| Offline evals | Answer quality on complete (non-streaming) responses | Does not test SSE parser, abort, aria-busy, or disabled send |
| E2E SmartTests (AIMock stream + completion waits + ai.act) | Stream UX, timing, error/abort paths | Not for grading 500 answer variants |
| Hybrid | Evals on content; E2E on streaming wiring | Standard split for copilot products |

Content quality belongs in golden eval sets run against batch API responses. E2E owns client stream handling. TestChimp does not ship eval tooling; use AIMock streams in SmartTests for deterministic timing tests.

Connect scenarios to your QA workflow

Capture business rules in markdown test plans and enforce them with seed routes and probe Assert. Link SmartTests with // @Scenario: for requirement traceability. Use /testchimp test on PRs; /testchimp explore on SmartTest paths for non-functional gaps (ExploreChimp).

Conversational UI — multi-turn chat
LLM output validation — post-stream structured parse
WebSockets live updates — non-SSE streaming

Frequently asked questions

When should I assert during streaming?

Assert loading and disabled states while the stream is active. Wait for the completion signal (data attribute, network response end, or AIMock final chunk) before making structural or content assertions on the full message.

How do I stub streaming in CI?

Use AIMock or a test route that emits chunked SSE with known delays and a final done event. This exercises the real client parser without live model latency.

Should I use networkidle after sending a chat message?

No. Background polling can keep the page from ever reaching network idle. Use the explicit stream-complete signal documented for your app.

How do I test the Stop button?

Start a long AIMock stream, click Stop, then assert that the UI returns to idle and the probe reports the stream as inactive. Include this in the suite; abort paths are often untested.

Can ai.verify run before stream completes?

Only for in-progress UX checks. The final ai.verify and any probes must wait for stream-complete; otherwise you validate a partial answer.

How do I test errors mid-stream?

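Stub the stream to emit tokens and then fail, for example with an SSE error event or a dropped connection, since the HTTP status cannot change to 500 once the body has started. Assert the error UI, and use a probe to check the policy on partial messages (kept vs discarded).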

Do offline evals replace streaming E2E?

No. Evals grade final text quality; E2E validates SSE wiring, abort, and accessibility busy states. Use both.

Apply these patterns in your repo

Run `/testchimp init` to connect TestChimp to your repo, then `/testchimp test` on PRs to turn these patterns into maintained SmartTests. Use `/testchimp evolve` when you want to expand coverage as your app grows.