How to Run Playwright E2E in GitHub Actions (Parallel, Sharding, Traces)
Short answer
Parallel Playwright in GitHub Actions stays green when Arrange is isolated per worker (runId seeds), Assert polls probes instead of sleeping, and sharding is used to split wall-clock time rather than to spread shared staging users across jobs. Upload traces on failure and wire @testchimp/playwright for test-run history on every PR.
Part of E2E testing in CI.
Who this is for
Startups running Playwright or SmartTests on every PR—especially after enabling workers > 1 or matrix sharding and seeing new flakes. Stacks: Next.js, Vite SPA, Rails with preview deploys on pull_request.
Why CI parallelization fails without foundations
Turning on parallelism multiplies world-state collisions:
- Worker 2 consumes the coupon Worker 1 seeded
- Shared storageState logs everyone in as admin
- waitForTimeout passes on fast runners, fails on GitHub-hosted runners
Fix seed routes and probes before chasing shard count.
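The per-run seeding idea can be sketched as a small helper. This is a minimal sketch under assumptions, not a Playwright or TestChimp API: `makeRunId`, `buildCouponSeed`, and the payload fields are hypothetical names, and the worker index is assumed to come from something like Playwright's `testInfo.workerIndex`.

```typescript
// Hypothetical helpers for per-run Arrange data. Names and fields are
// illustrative assumptions, not part of Playwright or TestChimp.

// Per-process counter; each Playwright worker runs in its own process,
// so the counter alone is not unique across workers.
let sequence = 0;

function makeRunId(workerIndex: number): string {
  sequence += 1;
  // Random suffix guards against collisions across shards and CI re-runs.
  const suffix = Math.random().toString(36).slice(2, 8);
  return `run-w${workerIndex}-${sequence}-${suffix}`;
}

interface CouponSeed {
  runId: string;
  couponCode: string;
  userEmail: string;
}

// Every seeded entity carries the runId, so no two tests share a coupon or user.
function buildCouponSeed(runId: string): CouponSeed {
  return {
    runId,
    couponCode: `COUPON50-${runId}`,
    userEmail: `buyer+${runId}@example.test`,
  };
}

const seedA = buildCouponSeed(makeRunId(0));
const seedB = buildCouponSeed(makeRunId(1));
console.log(seedA.couponCode !== seedB.couponCode); // true
```

The seed object would then be posted to whatever test-only seed route the app exposes, so Worker 2 can never consume a coupon that Worker 1 created.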
Complexity map
| Scenario | Edge case | Why tests break | Approach |
|---|---|---|---|
| Parallel workers | Same coupon/user | Intermittent 409 | Per-run runId in every seed |
| Shard imbalance | One shard has slow specs | Long pole | Split by timing or grep tags |
| Missing browsers | npx playwright install skipped | Launch error | npx playwright install --with-deps |
| Env secrets | BASE_URL wrong | 404 on preview | PR comment URL or Bunnyshell |
| Flaky retry | Masks Arrange bug | Silent debt | Fix probes first; retry only after |
| Artifact size | Full video always | Slow uploads | trace: on-first-retry |
| Reporter noise | HTML in logs | Hard triage | Blob + GitHub summary |
| Branch deploy lag | Preview not ready | Connection refused | wait-on health check |
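For the shard-imbalance row, splitting by timing can be sketched as a greedy "longest spec first" assignment. This sketch assumes you already have per-spec durations from an earlier run; `balanceShards` and the sample file names are hypothetical, and Playwright's own --shard flag does not read timing data.

```typescript
interface SpecTiming {
  file: string;
  seconds: number;
}

// Greedy balancing: take specs from slowest to fastest and always place
// the next one on the shard with the smallest running total.
function balanceShards(specs: SpecTiming[], shardCount: number): SpecTiming[][] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error('shardCount must be a positive integer');
  }
  const shards = Array.from({ length: shardCount }, () => ({
    total: 0,
    specs: [] as SpecTiming[],
  }));
  const slowestFirst = [...specs].sort((a, b) => b.seconds - a.seconds);
  for (const spec of slowestFirst) {
    const lightest = shards.reduce((min, s) => (s.total < min.total ? s : min));
    lightest.specs.push(spec);
    lightest.total += spec.seconds;
  }
  return shards.map((s) => s.specs);
}

// Made-up durations for illustration only.
const plan = balanceShards(
  [
    { file: 'login.spec.ts', seconds: 60 },
    { file: 'checkout.spec.ts', seconds: 300 },
    { file: 'profile.spec.ts', seconds: 60 },
    { file: 'search.spec.ts', seconds: 120 },
    { file: 'cart.spec.ts', seconds: 90 },
  ],
  2,
);
// Shard 1 gets checkout alone (300s); shard 2 gets the other four (330s).
console.log(plan.map((group) => group.map((s) => s.file)));
```

Each group could then be passed as file arguments to npx playwright test in its own matrix job, in place of --shard, so the slow checkout spec no longer shares a job with other long specs.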
Baseline GitHub Actions workflow
# .github/workflows/e2e.yml
name: E2E
on:
  pull_request:
    branches: [main]
jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    env:
      E2E_TEST_MODE: 'true'
      BASE_URL: ${{ vars.PREVIEW_URL || 'http://localhost:3000' }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
      - name: Build app
        run: npm run build
      - name: Start server
        run: npm run start &
      - name: Wait for server
        run: npx wait-on "${{ env.BASE_URL }}" -t 120000
      - name: Run Playwright (shard ${{ matrix.shard }}/4)
        run: npx playwright test --shard=${{ matrix.shard }}/4 --workers=2
        env:
          PLAYWRIGHT_TEST_BASE_URL: ${{ env.BASE_URL }}
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report-shard-${{ matrix.shard }}
          path: |
            playwright-report/
            test-results/
          retention-days: 7
See the Playwright CI docs for official templates and for the blob reporter plus merge-reports pattern that combines shard results.
playwright.config.ts essentials
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 1 : 0,
  workers: process.env.CI ? 2 : undefined,
  reporter: [
    ['list'],
    ['html', { open: 'never' }],
    ['@testchimp/playwright', { projectId: process.env.TESTCHIMP_PROJECT_ID }],
  ],
  use: {
    baseURL: process.env.PLAYWRIGHT_TEST_BASE_URL,
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
  },
  projects: [{ name: 'chromium', use: { ...devices['Desktop Chrome'] } }],
});
Product detail: run SmartTests in CI and runtime plugin.
Sharding vs workers
| Knob | What it does | When to use |
|---|---|---|
| --workers=N | Parallel tests within one job | I/O-bound specs, isolated seeds |
| --shard=i/k | Split suite across jobs | Long suites (>15 min) |
| Both | Maximum throughput | After flake audit |
Rule: shards do not isolate data—seeds must still be per-run, not per-shard.
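The rule can be checked with simple counting. The sketch below uses a hypothetical `distinctKeys` helper and made-up key formats; it only counts how many distinct seed keys a naming scheme produces across every shard and worker slot that can run at once.

```typescript
// Counts how many distinct seed keys a keying scheme yields across
// every (shard, worker) slot that can be running at the same time.
function distinctKeys(
  shards: number,
  workersPerShard: number,
  keyFor: (shard: number, worker: number) => string,
): number {
  const keys = new Set<string>();
  for (let shard = 1; shard <= shards; shard++) {
    for (let worker = 0; worker < workersPerShard; worker++) {
      keys.add(keyFor(shard, worker));
    }
  }
  return keys.size;
}

// 4 shards x 2 workers = 8 concurrent slots, matching the baseline workflow.
const perShardKeys = distinctKeys(4, 2, (shard) => `coupon-shard${shard}`);
const perSlotKeys = distinctKeys(4, 2, (shard, worker) => `coupon-s${shard}-w${worker}`);

console.log(perShardKeys); // 4: two workers in each shard compete for one coupon
console.log(perSlotKeys); // 8: one coupon per concurrent slot
```

A real runId would also vary per test, not just per slot, so consecutive tests on the same worker never inherit each other's data.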
Preview URLs on PRs
Point BASE_URL at preview deploys (Bunnyshell, Vercel, Render). Health-check before tests:
- name: Wait for preview
  run: npx wait-on "${{ env.BASE_URL }}/api/health" -t 180000
Use multi-environment execution when SmartTests target branch-specific URLs.
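A wrong BASE_URL tends to surface late as a 404 or a refused connection, so it can help to validate it before the suite starts. The following is a minimal sketch, assuming the health endpoint lives at /api/health as in the step above; `resolveHealthUrl` is a hypothetical helper, and the environment is passed in as a plain object to keep the function easy to test.

```typescript
// Hypothetical helper: validates BASE_URL up front so a bad value fails
// with a clear message instead of a 404 or connection error mid-suite.
function resolveHealthUrl(
  env: Record<string, string | undefined>,
  healthPath = '/api/health',
): string {
  const raw = env['BASE_URL'];
  if (!raw) {
    throw new Error('BASE_URL is not set');
  }
  let base: URL;
  try {
    base = new URL(raw);
  } catch {
    throw new Error(`BASE_URL is not a valid URL: ${raw}`);
  }
  // Catches values such as "localhost:3000" that parse with a bogus scheme.
  if (base.protocol !== 'http:' && base.protocol !== 'https:') {
    throw new Error(`BASE_URL must be http(s), got ${base.protocol}`);
  }
  // An absolute healthPath replaces any path already present on BASE_URL.
  return new URL(healthPath, base).href;
}

console.log(resolveHealthUrl({ BASE_URL: 'http://localhost:3000' }));
// http://localhost:3000/api/health
```

In a real setup the argument would be process.env, and the call could sit in a Playwright global setup file or a small pre-test script.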
Anti-patterns
| Anti-pattern | Why it fails | Better approach |
|---|---|---|
| workers: 8 before seeds | Collision storm | runId fixtures first |
| waitForTimeout in CI | Still races | expect.poll on probes |
| No artifacts on failure | Un-debuggable | Trace + screenshot upload |
| Retry=3 always | Hides Arrange bugs | Fix data; retry=1 max |
| Single global admin user | Session races | Seed user per runId |
TestChimp workflow
Gate merges with /testchimp test so agents repair SmartTests when selectors drift—scenario markdown supplies context recorders lack. @testchimp/playwright attaches runs to your TestChimp project for history across shards. After deploy, /testchimp evolve closes gaps between plans and production behaviour.
Example scenario
Situation: PR enables 4-way sharding; checkout spec fails only on shard 3.
Expected outcome: Each shard runs isolated checkout with its own coupon and cart.
Why UI-only automation breaks: All shards reuse COUPON50; shard 3 hits 'already redeemed' intermittently.
- Arrange: Every spec seeds cart via /api/test/seed-cart with unique runId.
- Act: Complete checkout on preview URL from PLAYWRIGHT_TEST_BASE_URL.
- Assert: Probe returns paidOrderCount=1 for that runId only.
TestChimp workflow: @testchimp/playwright reporter links failing shard trace to scenario // @Scenario: checkout.
Same Arrange/Act/Assert pattern as expired-coupon checkout.
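The Assert step in this scenario can be sketched as a polling loop over a probe. This is a hand-rolled stand-in that shows the idea behind Playwright's expect.poll, not a replacement for it; `pollProbe`, the `OrderProbe` shape, and the injected `readProbe` function are assumptions, since the article only names the paidOrderCount field.

```typescript
// Hypothetical probe shape: the article only names paidOrderCount.
interface OrderProbe {
  runId: string;
  paidOrderCount: number;
}

// Re-reads the probe until `done` accepts it or the deadline would be
// exceeded, instead of sleeping for a fixed time and hoping.
async function pollProbe(
  readProbe: (runId: string) => Promise<OrderProbe>,
  runId: string,
  done: (probe: OrderProbe) => boolean,
  timeoutMs = 10000,
  intervalMs = 250,
): Promise<OrderProbe> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const probe = await readProbe(runId);
    if (done(probe)) {
      return probe;
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`probe for ${runId} not satisfied within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a real spec, `readProbe` would call the app's test-only probe endpoint for that runId, and the same check is usually shorter with expect.poll, for example `await expect.poll(async () => (await readProbe(runId)).paidOrderCount).toBe(1)`.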
Frequently asked questions
How many shards should we use?
Start with a wall-clock target: if the suite exceeds 15 minutes at workers=2, try 4 shards. Increase only after per-run seeds eliminate shared-data flakes.
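As a rough back-of-the-envelope aid, the shard count can be estimated from the suite duration and the wall-clock target. The sketch below is an assumption-heavy estimate, not a Playwright feature: `suggestShardCount` is a hypothetical helper, it assumes test time divides evenly across shards, and it treats install and build time as a fixed per-shard cost.

```typescript
// Estimates how many shards bring each job under the wall-clock target.
// Assumes test time splits evenly and every shard pays the same setup cost.
function suggestShardCount(
  suiteMinutes: number,
  targetMinutes: number,
  setupMinutes = 0,
  maxShards = 8,
): number {
  if (suiteMinutes <= 0) {
    throw new Error('suiteMinutes must be positive');
  }
  if (targetMinutes <= setupMinutes) {
    throw new Error('targetMinutes must exceed setupMinutes');
  }
  // Each shard runs setup + suite/k minutes, so k >= suite / (target - setup).
  const needed = Math.ceil(suiteMinutes / (targetMinutes - setupMinutes));
  return Math.min(Math.max(needed, 1), maxShards);
}

console.log(suggestShardCount(50, 15)); // 4
console.log(suggestShardCount(50, 15, 3)); // 5
```

The second call shows why fixed setup matters: with 3 minutes of install and build per job, only 12 minutes of each shard's budget is left for tests.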
Should we run E2E on every PR?
Yes for critical paths if preview deploy + seeds are fast. Use grep tags (@smoke) for very large suites on draft PRs.
Does TestChimp replace GitHub Actions?
No—TestChimp orchestrates tests in your repo and records runs. You keep Playwright in Actions; add @testchimp/playwright for traceability.
playwright install-deps vs install?
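On ubuntu-latest use install --with-deps so the system libraries are installed along with the browsers. Caching browser binaries can speed up subsequent runs, but measure it first: Playwright's CI docs note that restoring the cache often takes about as long as downloading the binaries.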
How do we merge shard HTML reports?
Playwright blob reporter + merge-reports CLI combines shard outputs into one HTML artifact—see official CI sharding guide.
Flakes started after parallel—what first?
Audit shared coupons, accounts, carts, storageState. Reproduce locally with --workers=4 before raising timeouts.
Can we run against production?
Avoid mutating prod in E2E. Read-only smoke against prod is rare—prefer preview env with E2E_TEST_MODE seeds.
How does /testchimp test fit CI?
Runs before or as part of PR workflow—agents update SmartTests linked to markdown scenarios so CI failures get repaired in Git, not siloed chat.
Apply these patterns in your repo
Run `/testchimp init` to connect TestChimp to your repo, then `/testchimp test` on PRs to turn these patterns into maintained SmartTests. Use `/testchimp evolve` when you want to expand coverage as your app grows.