
How to Test Streaming AI Responses

Short answer

Streaming chat UIs render partial tokens before the model finishes, so asserting on message content at the first chunk is flaky and misses abort and error paths. Wait on a stream-complete signal (a data attribute, the SSE end event, or AIMock's final chunk), assert loading and disabled states while the stream is active, and run probe Assert only after completion. Pair E2E tests with offline evals: evals cover content quality, E2E covers timing and UX wiring.

Part of Testing Guides by AI and conversational UX.

Who this is for

Teams shipping token-streaming chat, copilot panels, or inline completion UIs using SSE, fetch streams, or WebSockets—not batch-only APIs where the full response arrives at once.

Why testing streaming matters

| Bug | User experience | Missed if you assert early |
| --- | --- | --- |
| Stuck spinner | Never see full answer | Wait only on first token |
| Partial shown as final | Truncated instructions | No stream-complete wait |
| Abort ignored | Charges tokens; confusing UI | No stop button test |
| Error mid-stream | Blank bubble | No SSE error event test |
| Double submit | Duplicate messages | Send not disabled during stream |
| Scroll jump | Unreadable stream | No layout stability check |

Complexity map

| Scenario | Edge case | Why tests break | Approach |
| --- | --- | --- | --- |
| Partial render | Assert on chunk 1 | Content incomplete | Wait stream-complete |
| Slow stream | CI timeout | False fail | AIMock chunked SSE |
| Abort | User clicks Stop | Orphan request | Assert idle + probe cancelled |
| Error mid-stream | 500 after tokens | Half message shown | Error banner + no save probe |
| Reconnect | SSE drop | Duplicate content | Idempotent message id probe |
| Markdown stream | Unclosed code fence | Broken render | Complete before DOM assert |
| Tool call stream | Args stream then execute | Assert too early | Wait tool_started event |
| Concurrent streams | Tab race | Mixed messages | One stream per thread test |
| Accessibility | aria-busy stuck | a11y fail | Assert busy cleared |
| Network offline | fetch abort | Hung UI | Offline route after start |

Stream completion signals

Pick one authoritative signal per app and document it in your test helpers:

| Signal | Example |
| --- | --- |
| Data attribute | [data-stream-complete="true"] |
| ARIA | aria-busy="false" on message |
| Network | page.waitForResponse matching /api/chat/stream |
| AIMock | Final SSE chunk { "done": true } |
| Custom event | window event chat:stream-end |

async function waitStreamComplete(page) {
  // Wait for the newest message to be flagged as complete by the app.
  await page.locator('[data-stream-complete="true"]').last().waitFor({ timeout: 30_000 });
}

Never use waitForTimeout to guess how long a stream will take.
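If you choose the network signal, keep in mind that `page.waitForResponse` resolves when the response headers arrive, which for a stream is before the last token. The following is a minimal sketch of one way to wait for the body to finish and check that it ended with a `done` event. It assumes frames of the form `event: <name>` followed by `data: <json>`; `parseSseFrames` and `waitForStreamDone` are hypothetical helper names, and reading the body of an event stream may behave differently across browsers and transports.

```javascript
// Hypothetical helper: split a captured SSE body into { event, data } frames.
// Simplified parser for tests; it ignores id:, retry:, and comment lines.
function parseSseFrames(body) {
  return body
    .split(/\r?\n\r?\n/) // frames are separated by a blank line
    .filter((block) => block.trim().length > 0)
    .map((block) => {
      let event = 'message'; // SSE default when no event line is present
      const dataLines = [];
      for (const line of block.split(/\r?\n/)) {
        if (line.startsWith('event:')) {
          event = line.slice('event:'.length).trim();
        } else if (line.startsWith('data:')) {
          dataLines.push(line.slice('data:'.length).trim());
        }
      }
      return { event, data: dataLines.join('\n') };
    });
}

// Hypothetical helper: resolves with all frames once the stream body has ended.
// The '/api/chat/stream' path is an assumption taken from the table above.
async function waitForStreamDone(page, urlPart = '/api/chat/stream') {
  const response = await page.waitForResponse((r) => r.url().includes(urlPart));
  // response.text() waits for the full body, not just the headers.
  const frames = parseSseFrames(await response.text());
  const last = frames[frames.length - 1];
  if (!last || last.event !== 'done') {
    throw new Error('Stream ended without a done event');
  }
  return frames;
}

// Usage inside a test: start waiting BEFORE submitting, then await afterwards.
//   const done = waitForStreamDone(page);
//   await page.keyboard.press('Enter');
//   await done;
```

Starting the wait before the submit matters: a response that arrives before the listener is registered would otherwise be missed.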

AIMock chunked SSE

Stub the stream endpoint for deterministic CI:

// Test route streams chunks with delay
// event: token\ndata: {"text":"Hello"}\n\n
// event: token\ndata: {"text":" world"}\n\n
// event: done\ndata: {}\n\n

Register AIMock in Arrange so that Playwright exercises the real client SSE parser without the latency variance of a live model.
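If AIMock is not available, a plain Playwright route can serve the same frames. The sketch below is a minimal illustration, not AIMock's API: `buildSseBody` and `stubChatStream` are hypothetical helper names, and the `**/api/chat/stream` pattern is an assumption based on the endpoint named earlier. It produces the wire format from the comment block above.

```javascript
// Hypothetical helper: serialize text chunks into SSE token frames,
// followed by a final done frame.
function buildSseBody(chunks) {
  const tokenFrames = chunks.map(
    (text) => `event: token\ndata: ${JSON.stringify({ text })}\n\n`
  );
  return tokenFrames.join('') + 'event: done\ndata: {}\n\n';
}

// Hypothetical helper: answer the (assumed) stream endpoint from the test.
// Note: route.fulfill sends the whole body in one response, so this checks
// the client's SSE parsing but not behaviour between delayed chunks.
async function stubChatStream(page, chunks, urlPattern = '**/api/chat/stream') {
  await page.route(urlPattern, (route) =>
    route.fulfill({
      status: 200,
      contentType: 'text/event-stream',
      body: buildSseBody(chunks),
    })
  );
}

// Usage in Arrange:
//   await stubChatStream(page, ['Hello', ' world']);
```

Because the body arrives all at once, scenarios that depend on timing (slow streams, Stop clicked mid-stream) still need a mock that keeps the connection open between chunks.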

Assert phases during streaming

await page.getByRole('textbox').fill('Explain refund policy');
await page.keyboard.press('Enter');

// Phase 1: streaming active
await expect(page.getByTestId('assistant-message').last()).toHaveAttribute('aria-busy', 'true');
await expect(page.getByRole('button', { name: /send/i })).toBeDisabled();

// Phase 2: complete
await waitStreamComplete(page);
await expect(page.getByTestId('assistant-message').last()).toHaveAttribute('aria-busy', 'false');

// Phase 3: content structure (not exact prose)
await ai.verify('Message discusses refund policy or shows citation chips');

Abort/stop path

await page.getByRole('textbox').fill('Write a long essay');
await page.keyboard.press('Enter');
await page.getByRole('button', { name: /stop/i }).click();
await expect(page.getByTestId('stream-status')).toHaveText(/stopped|cancelled/i);
await expect.poll(() => probeStreamActive(runId)).toBe(false);

With AIMock, stop partway through the chunk sequence to verify that the client aborts the fetch or SSE connection.
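One way to confirm that the client really tore down the request, rather than only hiding the spinner, is to watch for the failed request on the Playwright side. This is a minimal sketch, assuming the stream is served from `/api/chat/stream` and stays open long enough for Stop to be clicked. `waitForStreamAbort` is a hypothetical helper name, and the error text reported for an aborted request differs between browsers.

```javascript
// Hypothetical helper: resolves with the stream request once the browser
// reports it as failed, which is how an aborted fetch typically surfaces.
function waitForStreamAbort(page, urlPart = '/api/chat/stream') {
  return page.waitForEvent('requestfailed', {
    predicate: (request) => request.url().includes(urlPart),
    timeout: 10_000,
  });
}

// Usage inside a test (start waiting before clicking Stop):
//   const aborted = waitForStreamAbort(page);
//   await page.getByRole('button', { name: /stop/i }).click();
//   const request = await aborted;
//   console.log(request.failure()?.errorText); // browser-specific text
```

This complements the probe check: the probe shows the server stopped work, while the failed request shows the browser closed its side of the connection.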

Error mid-stream

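Arrange AIMock or a stub to emit a few tokens and then fail. Because the HTTP status line is sent before any tokens, a failure after the first chunk reaches the client as an error event or a dropped connection rather than a fresh 500 response. Then assert the outcome your product rules call for: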

  • Partial text remains with error indicator OR rolled back per product rules
  • Probe shows no completed assistant message row if policy is all-or-nothing
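A stub body for this case could look like the sketch below. The `error` event name and its payload shape are assumptions, as is the helper name `buildFailingSseBody`; match whatever your server actually sends when generation fails partway through.

```javascript
// Hypothetical helper: a stream that emits some tokens and then fails.
function buildFailingSseBody(chunks, message = 'upstream model error') {
  const tokenFrames = chunks.map(
    (text) => `event: token\ndata: ${JSON.stringify({ text })}\n\n`
  );
  // Assumed error frame; the event name and payload are illustrative only.
  const errorFrame = `event: error\ndata: ${JSON.stringify({ message })}\n\n`;
  // Deliberately no done frame: the client must not mark the message complete.
  return tokenFrames.join('') + errorFrame;
}

// Usage: pass the result as the `body` of a text/event-stream stub response,
// then assert the error indicator and the kept-vs-discarded policy.
```

Leaving out the `done` frame is the point of the test: a client that marks the message complete anyway has a bug the happy-path stub cannot reveal.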

Hybrid ai.act for stream UI

Use ai.act when stop/send buttons lack stable selectors:

await ai.act('Send a message and click Stop while the assistant is still responding');
await ai.verify('Streaming stops and UI returns to idle state');

Keep timing asserts on data attributes; do not rely on ai.verify alone to detect completion.

Anti-patterns

| Anti-pattern | Why it fails | Better approach |
| --- | --- | --- |
| Assert text after 500ms | Partial content | stream-complete wait |
| Real model stream in CI | Latency flake | AIMock chunked SSE |
| Ignore Stop button | Orphan cost/state | Abort scenario |
| networkidle as completion | Background polls | Explicit done signal |
| Snapshot mid-stream | Nondeterministic | Assert after complete |
| No disabled send during stream | Double message bugs | Phase 1 asserts |

Example scenario

Situation: User sends a question and the assistant streams a long policy answer.

Expected outcome: Loading states clear, full message marked complete, user cannot double-send during stream.

Why UI-only automation breaks: The first sentence is visible, so the test passes even though the remainder never arrives and the spinner stays stuck.

  1. Arrange: AIMock SSE emits 10 chunks then done event.
  2. Act: Submit question; optionally abort in separate spec.
  3. Assert: stream-complete true; send re-enabled; probe message status completed.

TestChimp workflow: Track stream_completion_state in TrueCoverage. Abort and error slices are often missing from suites.

Same Arrange/Act/Assert pattern as expired-coupon checkout.

Evals vs E2E: when each layer helps

| Layer | Best for | Limitations |
| --- | --- | --- |
| Offline evals | Answer quality on complete (non-streaming) responses | Does not test SSE parser, abort, aria-busy, or disabled send |
| E2E SmartTests (AIMock stream + completion waits + ai.act) | Stream UX, timing, error/abort paths | Not for grading 500 answer variants |
| Hybrid | Evals on content; E2E on streaming wiring | Standard split for copilot products |

Content quality belongs in golden eval sets on batch API responses. E2E owns client stream handling. TestChimp does not ship eval tooling—use AIMock streams in SmartTests for deterministic timing tests.

Connect scenarios to your QA workflow

Capture business rules in markdown test plans and enforce them with seed routes and probe Assert. Link SmartTests with // @Scenario: for requirement traceability. Use /testchimp test on PRs; /testchimp explore on SmartTest paths for non-functional gaps (ExploreChimp).

Frequently asked questions

When should I assert during streaming?

Assert loading and disabled states while the stream is active. Wait for the completion signal (data attribute, end of the network response, or AIMock's final chunk) before making structural or content asserts on the full message.

How do I stub streaming in CI?

Use AIMock or a test route that emits chunked SSE with known delays and a final done event. This exercises the real client parser without live model latency.

Should I use networkidle after sending a chat message?

No. Background polling can keep the network from ever going idle. Use the explicit stream-complete signal documented for your app.

How do I test the Stop button?

Start a long AIMock stream, click Stop, then assert that the UI is idle and the probe reports the stream as inactive. Include this in the suite, since abort paths are often untested.

Can ai.verify run before stream completes?

Only for in-progress UX checks. Final ai.verify and probes must wait for stream-complete—otherwise you validate partial answers.

How do I test errors mid-stream?

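Stub the stream to emit tokens and then fail (an error event or a dropped connection, since the status code cannot change once tokens have been sent). Assert the error UI, and use a probe to check the policy for partial messages (kept vs. discarded).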

Do offline evals replace streaming E2E?

No—evals grade final text quality; E2E validates SSE wiring, abort, and accessibility busy states. Use both.

Apply these patterns in your repo

Run `/testchimp init` to connect TestChimp to your repo, then `/testchimp test` on PRs to turn these patterns into maintained SmartTests. Use `/testchimp evolve` when you want to expand coverage as your app grows.
