How to Test MFA and 2FA Flows
Short answer
MFA adds TOTP clock skew, one-time backup codes, SMS delivery, and step-up prompts, so a passing login test that never exercises the second factor does not prove enforcement works. Seed TOTP secrets and generate codes with otplib, stub SMS in CI, use Playwright clock for time-dependent checks, and assert against protected APIs directly. Do not rely on a shared authenticator app tied to one phone.
Who this is for
Teams shipping TOTP authenticator apps, SMS OTP, WebAuthn/passkeys, or backup codes (Auth0 MFA, Okta MFA, Firebase multi-factor, Duo, custom TOTP) who need Playwright E2E that covers enrollment, step-up, recovery, and lockout—not tests that disable MFA globally in CI.
Typical stacks: Auth0 Guardian, Okta Verify, Firebase multiFactor, AWS Cognito MFA, self-hosted TOTP with @otplib/preset-default.
Why testing MFA matters
MFA bugs are high severity because they bypass your last line of defense:
- Revenue loss: "Remember this device" lasts forever; step-up never triggers on a wire transfer after 90 days.
- Security incidents: backup codes are reusable; the TOTP window accepts codes from ±24 hours; SMS OTPs are logged in plaintext; MFA is skipped when `X-Forwarded-For` is spoofed.
- Support load: clock skew on VMs rejects valid codes; users burn backup codes during failed enrollment; recovery emails loop when a device is lost.
- Compliance exposure: PCI/SOC 2 requires proof that MFA is enforced for admin roles; an audit finds MFA toggled off via client-side flag manipulation.
E2E must assert API rejects sensitive actions without step-up—not only that a 6-digit input appears on screen.
Complexity map
| Scenario | Edge case | Why tests break | Approach |
|---|---|---|---|
| TOTP enrollment | QR secret not captured | Cannot generate codes | Seed secret via API; otplib |
| TOTP clock skew | VM time drift | Valid code rejected | NTP sync; wider window in test env only |
| TOTP replay | Same code twice | Should fail second use | Submit code twice |
| Backup codes | Single-use | Reuse accepted | Second attempt 401 |
| SMS OTP | Real SMS cost | Untested | Twilio test creds / stub webhook |
| WebAuthn | Requires authenticator | Cannot run headless | Virtual authenticator OR API bypass tier |
| Step-up on action | Transfer vs read | Only login MFA tested | Probe POST /transfer without step-up |
| Remember device | Cookie lasts too long | No re-prompt | Clock forward 31 days |
| MFA bypass flag | ?skipMfa=1 in test | Ships to prod | Negative URL tamper test |
| Lost device recovery | Backup email link | Untested | Mailtrap recovery path |
| Admin MFA policy | Role without MFA | Privileged access | Seed admin without MFA → 403 |
| Rate limit OTP | Brute force 6-digit | Lockout untested | Rapid wrong codes |
| Push notification MFA | Cannot automate Duo push | Skipped entirely | Nightly manual or vendor test API |
| Concurrent sessions | MFA on one device only | Confusing UX | Two browser contexts |
| Firebase MFA | Phone second factor | SMS stub needed | Emulator test numbers |
| Okta MFA enrollment | Factor setup redirect | Flaky UI | Okta API enroll factor in Arrange |
| Disable MFA | Re-auth required | Attacker disables | Probe requires password + MFA |
Tools and libraries
| Tool | Use case | Docs |
|---|---|---|
| otplib | Generate TOTP codes from secret in tests | otplib API |
| Playwright clock | Advance time for remember-device expiry | Playwright clock |
| Twilio test credentials | SMS without real send | Magic numbers |
| Auth0/Okta Management API | Enroll MFA factors programmatically | Vendor docs |
| WebAuthn virtual authenticator | Simulate an authenticator through Chromium CDP (WebAuthn domain) | Playwright WebAuthn |
TOTP enrollment and login
Seed secret in Arrange (preferred)
Expose a test-only route, or use the vendor Management API, to enroll the user with a known TOTP secret:
```typescript
import { test, expect } from '@playwright/test';
import { authenticator } from 'otplib';

// runId: unique per-run identifier supplied by your own fixtures
test('login with seeded TOTP secret', async ({ page, request }) => {
  // Arrange: enroll user with a known secret
  const secret = authenticator.generateSecret();
  await request.post('/api/test/enroll-totp', {
    data: { runId, secret, verified: true },
  });
  const token = authenticator.generate(secret);

  // Act: log in and complete the MFA challenge
  await page.goto('/login');
  await page.getByLabel('Email').fill(`e2e-${runId}@test.local`);
  await page.getByLabel('Password').fill(`pw-${runId}`);
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.getByLabel('Authentication code').fill(token);
  await page.getByRole('button', { name: 'Verify' }).click();

  // Assert: the session works against the API, not only the URL
  await page.waitForURL('/dashboard');
  expect((await page.request.get('/api/me')).status()).toBe(200);
});
```
Test enrollment UI separately
Reserve UI enrollment specs for QR display, manual secret entry, and invalid-code errors. For everything else, use a known secret from a test shim that mirrors the production enrollment API.
```typescript
test('invalid TOTP rejected during enrollment', async ({ page }) => {
  await loginAsNewUser(page, runId);
  await page.goto('/settings/security/mfa');
  await page.getByRole('button', { name: 'Set up authenticator' }).click();
  await page.getByLabel('Verification code').fill('000000');
  await page.getByRole('button', { name: 'Verify' }).click();
  await expect(page.getByText(/invalid code/i)).toBeVisible();

  // Probe server state: the factor must not be enrolled after a bad code
  const factors = await page.request.get('/api/me/mfa-factors');
  expect((await factors.json()).totp).toBeFalsy();
});
```
Clock skew and time windows
TOTP validators typically accept codes from ±1 time step around the current one, where a step is 30 seconds. Clock drift on CI VMs therefore causes flakes.
```typescript
import { test, expect } from '@playwright/test';
import { authenticator } from 'otplib';

// Note: page.clock only mocks time inside the browser page. The server validates
// codes against its own clock, so these specs mint codes for a chosen epoch instead.

test('TOTP accepts code within valid window', async ({ page }) => {
  const secret = await enrollTotpForRunId(runId);
  const token = authenticator.generate(secret); // code for the current step
  // ... submit token
});

test('TOTP rejects expired code after window', async ({ page }) => {
  const secret = await enrollTotpForRunId(runId);
  // Assumes otplib v12: clone() with an `epoch` (in ms) two minutes in the past,
  // which is beyond a typical 30s step plus ±1 step of tolerance
  const stale = authenticator.clone({ epoch: Date.now() - 2 * 60 * 1000 });
  const token = stale.generate(secret);
  await submitTotp(page, token);
  await expect(page.getByText(/invalid/i)).toBeVisible();
});
```
Do not widen TOTP windows in production to fix tests—fix CI time sync or use Arrange secrets.
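To make the window arithmetic concrete, here is a minimal sketch of TOTP as defined in RFC 6238, written against Node's built-in `crypto` module. It is an illustration and not otplib's implementation; the `verifyTotp` helper and its `window` parameter are hypothetical names that stand in for whatever your validator uses.

```typescript
import { createHmac } from "node:crypto";

// HOTP (RFC 4226): HMAC-SHA1 over the 8-byte big-endian counter, then dynamic truncation.
function hotp(key: Buffer, counter: number, digits = 6): string {
  const msg = Buffer.alloc(8);
  msg.writeBigUInt64BE(BigInt(counter));
  const mac = createHmac("sha1", key).update(msg).digest();
  const offset = mac[mac.length - 1] & 0x0f;
  const binary = mac.readUInt32BE(offset) & 0x7fffffff;
  return (binary % 10 ** digits).toString().padStart(digits, "0");
}

// TOTP (RFC 6238): the counter is the number of whole steps since the Unix epoch.
function totp(key: Buffer, unixSeconds: number, step = 30, digits = 6): string {
  return hotp(key, Math.floor(unixSeconds / step), digits);
}

// Hypothetical validator: accept codes from `window` steps either side of "now".
function verifyTotp(key: Buffer, code: string, unixSeconds: number, window = 1, step = 30): boolean {
  const current = Math.floor(unixSeconds / step);
  for (let delta = -window; delta <= window; delta++) {
    if (current + delta < 0) continue; // no steps before the epoch
    if (hotp(key, current + delta, code.length) === code) return true;
  }
  return false;
}
```

With `window = 1`, a code stays valid for at most one step after the step it was minted in, so a runner that drifts by more than a step produces rejections that look like flaky tests.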
Backup codes
| Test | Assert |
|---|---|
| Generate backup codes | 8–10 codes returned once; not stored plaintext in probe |
| Login with backup code | Session established; code marked used |
| Reuse backup code | Rejected; probe 401 |
| Regenerate codes | Old codes invalidated |
```typescript
test('backup code is single-use', async ({ page, request }) => {
  // Arrange: mint codes tied to this run through a test-only route
  const { codes } = await request.post('/api/test/mint-backup-codes', {
    data: { runId, count: 5 },
  }).then(r => r.json());

  // First use succeeds
  await loginWithBackupCode(page, runId, codes[0]);
  await page.context().clearCookies();

  // Second use of the same code must be rejected
  await loginWithBackupCode(page, runId, codes[0]);
  await expect(page.getByText(/invalid|already used/i)).toBeVisible();
});
```
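The contract those assertions check can be modeled in a few lines. The following is a minimal in-memory sketch, assuming codes are stored only as hashes; `BackupCodeStore` and its methods are hypothetical names, and a production system would more likely use a salted, slow hash and a database.

```typescript
import { createHash, randomBytes } from "node:crypto";

const sha256 = (value: string): string =>
  createHash("sha256").update(value).digest("hex");

// Hypothetical model of the server-side contract the specs above probe.
class BackupCodeStore {
  private hashes = new Set<string>();

  // Replaces every existing code and returns the plaintext exactly once.
  regenerate(count = 10): string[] {
    const codes = Array.from({ length: count }, () => randomBytes(5).toString("hex"));
    this.hashes = new Set(codes.map(sha256));
    return codes;
  }

  // Set.delete returns true only when the hash was present, so a code redeems once.
  redeem(code: string): boolean {
    return this.hashes.delete(sha256(code));
  }
}
```

A test mint route built on a model like this gives the spec three things to assert: first redemption succeeds, second fails, and codes issued before a regeneration stop working.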
SMS OTP stubbing
Never send real SMS in CI for every spec.
- Twilio test credentials: use magic numbers that simulate delivery
- Vendor stub: Auth0/Okta test tenant with fixed OTP in lab mode
- Webhook capture: test server records the OTP for `expect.poll` retrieval
```typescript
// Poll the test OTP endpoint populated by your SMS webhook in staging.
// expect.poll(...).toMatch() resolves to void, so capture the code inside the callback.
let otp = '';
await expect.poll(async () => {
  const res = await request.get(`/api/test/last-sms-otp?phone=${encodeURIComponent(phone)}`);
  otp = (await res.json()).code ?? '';
  return otp;
}, { timeout: 15_000 }).toMatch(/^\d{6}$/);
```
Firebase phone MFA: use Auth emulator test numbers.
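For the webhook-capture option, the test server only needs to pull the code out of the message body and remember the latest one per phone number. A minimal sketch, assuming six-digit codes and an in-memory store; `captureSms`, `extractOtp`, and `lastOtpFor` are hypothetical names, and the HTTP wiring around them depends on your framework.

```typescript
// Hypothetical in-memory capture used by a test-only SMS webhook handler.
const lastOtpByPhone = new Map<string, string>();

// Pull a standalone six-digit code out of the SMS body, if there is one.
function extractOtp(body: string): string | null {
  const match = body.match(/\b(\d{6})\b/);
  return match ? match[1] : null;
}

// Called by the webhook handler for every outbound SMS in the test environment.
function captureSms(to: string, body: string): void {
  const code = extractOtp(body);
  if (code !== null) lastOtpByPhone.set(to, code);
}

// Backs a retrieval endpoint such as /api/test/last-sms-otp.
function lastOtpFor(phone: string): string | null {
  return lastOtpByPhone.get(phone) ?? null;
}
```

Keying by phone number keeps parallel workers from reading each other's codes, provided each worker uses its own number.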
Step-up authentication
Sensitive actions (password change, API key creation, billing update) should require fresh MFA even when a session already exists:
```typescript
test('wire transfer requires step-up MFA', async ({ page }) => {
  await loginWithMfa(page, runId); // completed MFA at login

  // page.request shares the browser context's cookies; the standalone
  // `request` fixture would call the API without the logged-in session
  const api = page.request;
  const payload = { amount: 10000, toAccount: 'external' };

  const transfer = await api.post('/api/transfers', { data: payload });
  expect(transfer.status()).toBe(403);
  expect(await transfer.json()).toMatchObject({ code: 'STEP_UP_REQUIRED' });

  const secret = await getTotpSecret(runId);
  const stepUp = await api.post('/api/auth/step-up', {
    data: { totp: authenticator.generate(secret) },
  });
  expect(stepUp.status()).toBe(200);

  const retry = await api.post('/api/transfers', { data: payload });
  expect(retry.status()).toBe(200);
});
```
Probe the API, not only a modal appearance.
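The rule being probed is a server-side decision, separate from login MFA. A minimal sketch of that decision, assuming the session records when step-up was last completed and that the policy allows five minutes of freshness; the `Session` shape, `STEP_UP_MAX_AGE_MS`, and `authorizeSensitiveAction` are all hypothetical.

```typescript
// Hypothetical session shape: login MFA does not set stepUpAt, only a step-up challenge does.
interface Session {
  userId: string;
  stepUpAt: number | null; // epoch milliseconds of the last completed step-up
}

// Assumed policy value for illustration.
const STEP_UP_MAX_AGE_MS = 5 * 60 * 1000;

type Decision = { status: 200 } | { status: 403; code: "STEP_UP_REQUIRED" };

// Allow the sensitive action only when a step-up happened recently enough.
function authorizeSensitiveAction(session: Session, now: number): Decision {
  const fresh =
    session.stepUpAt !== null && now - session.stepUpAt <= STEP_UP_MAX_AGE_MS;
  return fresh ? { status: 200 } : { status: 403, code: "STEP_UP_REQUIRED" };
}
```

Because the check lives on the server, a disabled button in the UI tells you nothing about it, which is why the spec posts to the endpoint directly.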
WebAuthn / passkeys (tiered)
WebAuthn in headless CI requires a virtual authenticator. In Chromium, one can be attached through a CDP session:
```typescript
import { test, expect } from '@playwright/test';

test('register passkey and login', async ({ page, context }) => {
  // Chromium-only: the virtual authenticator lives in the CDP WebAuthn domain
  const client = await context.newCDPSession(page);
  await client.send('WebAuthn.enable');
  await client.send('WebAuthn.addVirtualAuthenticator', {
    options: {
      protocol: 'ctap2',
      transport: 'internal',
      hasResidentKey: true,
      hasUserVerification: true,
      isUserVerified: true,
    },
  });

  await page.goto('/settings/security/passkeys');
  await page.getByRole('button', { name: 'Add passkey' }).click();
  // WebAuthn prompt auto-satisfied by virtual authenticator
  await expect(page.getByText(/passkey added/i)).toBeVisible();
});
```
Default PR CI: TOTP via otplib for coverage; nightly headed job for WebAuthn if prod relies on passkeys.
MFA policy by role
Map to RBAC guide:
- Admin role must have MFA enrolled — probe admin API 403 without MFA
- Viewer role optional MFA — login without second factor allowed
- MFA enrollment deadline — grace period then blocked
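These three rules reduce to a small decision function that seed routes and probes can be checked against. A minimal sketch, assuming a per-user enrollment deadline set by policy; the `User` shape and `accessStatus` are hypothetical names.

```typescript
type Role = "admin" | "viewer";

// Hypothetical user record; mfaDeadline is null when no enrollment deadline applies.
interface User {
  role: Role;
  mfaEnrolled: boolean;
  mfaDeadline: number | null; // epoch milliseconds
}

// Returns the HTTP status a protected endpoint would answer with.
function accessStatus(user: User, now: number): 200 | 403 {
  if (user.mfaEnrolled) return 200;
  if (user.role === "admin") return 403; // admins never get in without MFA
  if (user.mfaDeadline !== null && now >= user.mfaDeadline) return 403; // grace period over
  return 200; // viewer inside the grace period, or no deadline set
}
```

Each branch maps to one seeded user in Arrange: an admin without MFA, a viewer without MFA, and a viewer whose deadline has already passed.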
CI checklist
- otplib generates codes from Arrange secrets—no shared Google Authenticator
- Unique user per worker with own TOTP secret
- SMS stubbed; Firebase emulator for phone MFA
- Step-up specs probe API, not modal only
- Backup code single-use and regeneration tested
- Playwright clock for remember-device expiry
- Document nightly WebAuthn / push MFA jobs
Anti-patterns
| Anti-pattern | Why it fails | Better approach |
|---|---|---|
| `DISABLE_MFA=true` in CI | Ships without MFA | Test tenant + otplib |
| Shared TOTP on engineer phone | Cannot parallelize | Per-run secrets |
| Assert 6-digit input visible | Bypass via API | Probe protected action |
| Skip backup code reuse | Account recovery hack | Single-use spec |
| Ignore step-up | Wire fraud | POST sensitive action probe |
| Real SMS every spec | Cost + flake | Stub webhook |
| `waitForTimeout` for OTP SMS | Slow/flaky | `expect.poll` inbox/webhook |
Example scenario
Situation: Logged-in user with MFA attempts a $10,000 wire transfer without completing step-up.
Expected outcome: Transfer blocked with STEP_UP_REQUIRED—no funds moved.
Why UI-only automation breaks: Transfer button disabled in UI but API POST succeeds—test never calls API.
- Arrange: User with MFA enrolled; valid session cookie from login 2 hours ago.
- Act: POST /api/transfers directly without step-up token.
- Assert: 403 STEP_UP_REQUIRED; after valid TOTP step-up, POST returns 200 and audit log records both events.
TestChimp workflow: Instrument mfa_challenge with mfa_method and recovery_path; compare prod step-up rate vs test.
Same Arrange/Act/Assert pattern as expired-coupon checkout.
Connect scenarios to your QA workflow
Capture business rules in markdown test plans and enforce them with seed routes and probe Assert. Link SmartTests with // @Scenario: for requirement traceability. Use /testchimp test on PRs; /testchimp explore on SmartTest paths for non-functional gaps (ExploreChimp).
Related scenarios
- Auth0 and Okta SSO — IdP-enforced MFA
- Firebase Authentication — phone second factor
- RBAC permissions — MFA required for admin role
- Session timeout — step-up vs session expiry
- Magic links — passwordless + MFA combo
- Transactional email — recovery emails
External references
- otplib
- Playwright clock
- Playwright WebAuthn
- Auth0 MFA
- Okta MFA APIs
- Firebase multi-factor auth
- Twilio test credentials
Frequently asked questions
How do I generate TOTP codes in Playwright without a phone?
Enroll users with a known secret via test API or Management API, then use otplib authenticator.generate(secret) in the test. Never share one Google Authenticator entry across parallel workers.
Should I disable MFA in CI?
No. Use test tenants, known TOTP secrets, and SMS stubs. Disabling MFA in CI means step-up and enrollment regressions ship to prod—exactly what MFA protects against.
How do I test step-up MFA for sensitive actions?
With a valid post-login session, POST to a sensitive API endpoint and expect 403 STEP_UP_REQUIRED. Complete step-up via TOTP API, then retry and expect 200. Do not rely on disabled buttons alone.
How do I test backup codes without burning real ones?
Use a test mint route that generates codes tied to runId. Assert first use succeeds, second fails, and regeneration invalidates old codes. Probe server state—not only UI.
Can WebAuthn run in headless CI?
Yes, on Chromium: attach a virtual authenticator through a CDP session in your Playwright test. For default PR pipelines, cover MFA with TOTP via otplib and run WebAuthn nightly if passkeys dominate prod. Check TrueCoverage mfa_method to decide.
SMS OTP is slow and flaky in tests—what should I do?
Stub SMS webhooks to a test OTP retrieval endpoint, use Twilio test credentials, or Firebase Auth emulator phone numbers. Poll with expect.poll; never send real SMS per spec.
How does TestChimp help track MFA coverage?
TrueCoverage compares mfa_method and challenge_context in prod vs test. Use /testchimp evolve to add step-up and backup-code scenarios when admin MFA adoption rises—link SmartTests with // @Scenario: for audit traceability.
Apply these patterns in your repo
Run `/testchimp init` to connect TestChimp to your repo, then `/testchimp test` on PRs to turn these patterns into maintained SmartTests. Use `/testchimp evolve` when you want to expand coverage as your app grows.