Debugging Flaky Tests: A Systematic Approach
A methodology for debugging tests that fail intermittently in CI or when run as part of a test suite, but pass when run in isolation.
Problem Pattern
Tests that hang or time out when run in CI or as part of a test suite, but work fine when run alone. Common root-cause categories:
State pollution: One test modifies global state that affects subsequent tests
Resource leaks: File handles, processes, or network connections not cleaned up
Environment corruption: Package managers (TinyTeX, npm, etc.) get into an inconsistent state
Timing/race conditions: Tests depend on specific execution order or timing
Investigation Methodology
Phase 1: Reproduce Locally
Goal: Confirm you can reproduce the issue outside of CI.
Identify the failing test bucket from CI logs
Extract the test file list from CI configuration
Create a test script that runs the bucket's test files sequentially, in the same order as CI
Run and confirm the hang occurs locally
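A minimal bash sketch of such a script follows. run_bucket is a hypothetical helper and is not part of the repository. It assumes GNU coreutils timeout (often installed as gtimeout on macOS). It takes the command that runs a single test file as an argument, because that invocation depends on your checkout. The per-file time limit turns a hang into a reported result instead of a stuck shell.

```bash
#!/usr/bin/env bash
# run_bucket RUNNER LIMIT FILE...
#   RUNNER  command that runs one test file (a placeholder you supply);
#           it is word-split, so it may include arguments
#   LIMIT   per-file timeout in seconds
# Runs FILEs in the given order, stops at the first failure or hang, and
# prints "OK", "FAIL: <file>" or "HANG: <file>" on stdout.
# Test output goes to stderr so stdout carries only the verdict.
run_bucket() {
  local runner=$1 limit=$2
  shift 2
  local f status
  for f in "$@"; do
    echo "=== running $f" >&2
    # shellcheck disable=SC2086  # RUNNER is intentionally word-split
    timeout "$limit" $runner "$f" >&2
    status=$?
    if [ "$status" -eq 124 ]; then
      # GNU timeout exits with 124 when the time limit is reached
      echo "HANG: $f"
      return 1
    elif [ "$status" -ne 0 ]; then
      echo "FAIL: $f"
      return 1
    fi
  done
  echo "OK"
}
```

For example: run_bucket "$RUNNER" 900 $(cat bucket.txt). Here $RUNNER and bucket.txt are placeholders, and bucket.txt would hold the file list copied from the CI configuration.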
Phase 2: Binary Search to Isolate Culprit
Goal: Find which specific test file causes the issue.
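If test N pollutes shared state, tests 1 through N-1 pass, test N leaves the environment broken, and the hanging test then hangs. Any subset that excludes test N runs cleanly, which is what makes bisection work.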
Split your test list in half
Run first half + the hanging test:
If it hangs: culprit is in first half, repeat with first half
If it passes: culprit is in second half, repeat with second half
Continue until you identify the single test file
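The halving loop above can be sketched in bash. Both bisect_culprit and its CHECK command are hypothetical. CHECK must run the given subset followed by the hanging test, and exit 0 if the hang reproduces. For environment pollution, each CHECK call should start from a clean environment, so that an earlier check cannot leak state into the next one. Like the procedure above, the sketch assumes a single culprit that triggers the hang on its own. It also keeps CI order within each half, since order can matter.

```bash
#!/usr/bin/env bash
# bisect_culprit CHECK FILE...
#   CHECK  command or shell function; receives a subset of candidate files
#          and exits 0 if running that subset, then the hanging test,
#          reproduces the hang (a placeholder you supply)
#   FILE   the tests that ran before the hanging test, in CI order
# Halves the candidate list until one file remains, then prints it.
bisect_culprit() {
  local check=$1
  shift
  local -a candidates=("$@")
  local -a first=() second=()
  local half
  while [ "${#candidates[@]}" -gt 1 ]; do
    half=$(( ${#candidates[@]} / 2 ))
    first=("${candidates[@]:0:half}")
    second=("${candidates[@]:half}")
    echo "checking ${#first[@]} of ${#candidates[@]} candidates" >&2
    # shellcheck disable=SC2086  # CHECK is intentionally word-split
    if $check "${first[@]}"; then
      candidates=("${first[@]}")   # hang reproduced: culprit in first half
    else
      candidates=("${second[@]}")  # no hang: culprit in second half
    fi
  done
  echo "${candidates[0]}"
}
```

With 51 candidates this takes at most six checks (51, 26, 13, 7, 4, 2, 1), compared with up to 51 when trying files one at a time.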
Example: In #13647, binary search across 51 tests identified render-format-extension.test.ts as the culprit.
Phase 3: Narrow Down Within Test File
Goal: Find which specific operation in the test file causes pollution.
Read the test file to understand what it does
Identify distinct operations (e.g., rendering different formats)
Comment out sections and rerun to see whether the hang still occurs
Binary search through the operations to find the specific one
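If the operations in the file can be run as standalone commands (for example, one render per format), trying each in isolation is a simple alternative to commenting out code. The hypothetical find_triggers helper below makes a linear pass rather than a binary search. For each operation it resets the environment, runs that operation, then runs the check, and it prints every operation after which the check fails. RESET and CHECK are placeholders you supply.

```bash
#!/usr/bin/env bash
# find_triggers RESET CHECK OP...
#   RESET  command or function that restores a clean environment
#   CHECK  command or function that exits 0 while the hanging test still works
#   OP     one candidate operation as a shell command string
# Runs each OP on its own between RESET and CHECK, and prints each OP
# after which CHECK fails.
find_triggers() {
  local reset=$1 check=$2
  shift 2
  local op
  for op in "$@"; do
    $reset >&2 || { echo "reset failed" >&2; return 2; }
    echo "=== trying: $op" >&2
    bash -c "$op" >&2
    if ! $check >&2; then
      echo "$op"
    fi
  done
}
```

Because every operation starts from a freshly reset environment, a hit points at that operation alone rather than at an accumulation of earlier ones.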
Example: In #13647, rendering academic/document.qmd with elsevier-pdf format was the specific trigger.
Phase 4: Understand the State Change
Goal: Determine what environmental change causes the issue.
Common suspects: package installations (TinyTeX, npm, pip), configuration file modifications, cache pollution, file system changes.
Create a clean test environment (fresh TinyTeX install)
Take snapshots of the environment before and after the problematic operation
For TinyTeX issues, check:
Installed packages: tlmgr list --only-installed
Package versions: tlmgr info <package>
Format files: ls -la $(kpsewhich -var-value TEXMFSYSVAR)/web2c/luatex/
What tlmgr update --all installs
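Below is a sketch for capturing and diffing such snapshots. It uses only the tlmgr and kpsewhich calls listed above; take_tinytex_snapshot and compare_snapshots are hypothetical helper names. The format-file listing is taken recursively with ls -lR over the whole web2c directory, so regenerated format files show up as changed timestamps or sizes whichever engine subdirectory they live in.

```bash
#!/usr/bin/env bash
# take_tinytex_snapshot DIR
# Records installed packages and the web2c format-file listing into DIR.
take_tinytex_snapshot() {
  local dir=$1
  mkdir -p "$dir"
  tlmgr list --only-installed > "$dir/packages.txt"
  ls -lR "$(kpsewhich -var-value TEXMFSYSVAR)/web2c" > "$dir/formats.txt"
}

# compare_snapshots BEFORE AFTER
# Prints lines only in AFTER with a "+ " prefix, then lines only in BEFORE
# with a "- " prefix. Order-insensitive: both files are sorted before comm.
compare_snapshots() {
  local before=$1 after=$2
  comm -13 <(sort "$before") <(sort "$after") | sed 's/^/+ /'
  comm -23 <(sort "$before") <(sort "$after") | sed 's/^/- /'
}
```

Typical use: take_tinytex_snapshot before/, run the problematic operation, then take_tinytex_snapshot after/. Next, run compare_snapshots on before/packages.txt and after/packages.txt (and likewise on formats.txt). tlmgr list shows package names and short descriptions rather than revisions, so follow up on suspicious packages with tlmgr info <package>.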
Example: In #13647, elsevier-pdf rendering triggered tlmgr update --all which updated core packages and regenerated lualatex format files. The format regeneration expected modern conventions that conflicted with the bundled class file.
Phase 5: Identify Root Cause
Goal: Understand WHY the state change causes the failure.
Compare working vs broken states in detail
For package version issues:
Check whether the test bundles older versions of libraries or classes
Compare with system-installed versions
Review changelogs between versions
Create a minimal reproduction
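To compare a bundled LaTeX class with the copy TeX Live would otherwise resolve, a small sketch can do three things: locate the installed file with kpsewhich, print both \ProvidesClass lines, and diff the two files. compare_class, class_version and has_fontenc_t1 are hypothetical helpers. The sketch assumes the class declares its date and version on the \ProvidesClass line, which is common but not guaranteed. has_fontenc_t1 checks for the exact line found missing in #13647.

```bash
#!/usr/bin/env bash
# class_version FILE
# Prints the first \ProvidesClass line (usually carries date and version).
class_version() {
  grep -m1 '\\ProvidesClass' "$1"
}

# has_fontenc_t1 FILE
# Exits 0 if FILE contains \RequirePackage[T1]{fontenc} verbatim.
has_fontenc_t1() {
  grep -qF '\RequirePackage[T1]{fontenc}' "$1"
}

# compare_class BUNDLED NAME
# Compares a bundled class with the copy kpsewhich resolves for NAME.
compare_class() {
  local bundled=$1 name=$2 system
  system=$(kpsewhich "$name")
  if [ -z "$system" ]; then
    echo "$name not found by kpsewhich" >&2
    return 1
  fi
  echo "bundled: $(class_version "$bundled")"
  echo "system:  $(class_version "$system")"
  diff -u "$bundled" "$system"
}
```

For example: compare_class path/to/bundled/elsarticle.cls elsarticle.cls, where the bundled path is a placeholder.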
Example: In #13647, the bundled elsarticle.cls v3.3 was missing \RequirePackage[T1]{fontenc}. TinyTeX's elsarticle.cls v3.4c includes it. The font encoding mismatch corrupted lualatex format files, causing subsequent lualatex renders to hang.
Phase 6: Verify Solution
Goal: Confirm your fix resolves the issue.
Apply the fix (update package, patch code, etc.)
Create a verification script
Run multiple times to ensure consistency
Test with clean environment each time (critical for environment pollution issues)
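A hedged sketch of such a verification loop follows. The hypothetical verify_fix repeats reset, trigger and check a given number of times, then reports how many runs passed. RESET, TRIGGER and CHECK are placeholders. CHECK must be an executable command rather than a shell function, because timeout executes it directly, and a timeout counts as a failure.

```bash
#!/usr/bin/env bash
# verify_fix RUNS LIMIT RESET TRIGGER CHECK
#   RUNS     number of independent repetitions
#   LIMIT    timeout in seconds for CHECK (a hang counts as a failure)
#   RESET    restores a clean environment before every run
#   TRIGGER  the operation that used to pollute the environment
#   CHECK    the previously hanging test, as an executable command
# Prints "<passed>/<runs> runs passed" and returns non-zero on any failure.
verify_fix() {
  local runs=$1 limit=$2 reset=$3 trigger=$4 check=$5
  local i failures=0
  for (( i = 1; i <= runs; i++ )); do
    # shellcheck disable=SC2086  # commands are intentionally word-split
    if ! { $reset && $trigger && timeout "$limit" $check; } >&2; then
      failures=$(( failures + 1 ))
      echo "run $i: FAILED" >&2
    fi
  done
  echo "$(( runs - failures ))/$runs runs passed"
  [ "$failures" -eq 0 ]
}
```

Running it with and without the fix gives a direct comparison. The unpatched setup should reproduce the hang, and the patched one should pass every run.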
Key Debugging Tools
TinyTeX
Test Isolation
Package/Dependency Comparison
Best Practices
Always reproduce locally first - CI is too slow for iterative debugging
Use binary search - Most efficient way to isolate culprits in large test suites
Test with clean environments - Especially for environment pollution issues
Take snapshots - Before/after comparisons are invaluable
Create verification scripts - Automate testing your fix
Document the root cause - Help others understand the issue
Common Pitfalls
Testing with polluted environment - Always start fresh for environment issues
Assuming causation from correlation - Just because test A runs before test B doesn't mean A causes B's failure
Stopping too early - Finding the problematic test isn't enough; understand WHY it causes issues
Not verifying the fix - Always confirm your solution actually works
Checklist
Reproduce the issue locally
Identify the specific test bucket that triggers the issue
Use binary search to isolate the culprit test file
Narrow down to specific operation within the test
Take environment snapshots before/after
Identify what environmental change occurs
Understand WHY the change causes the failure
Develop and apply a fix
Verify the fix with clean environments
Document the root cause and solution
Case Study: #13647 (tufte.qmd Hanging in CI)
Symptom: tufte.qmd hangs for more than 10 minutes when rendered after a bucket of tests, while the same document renders in about 30 s when run alone. The lualatex engine is stuck at the "running lualatex - 1" step.
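Investigation summary: binary search isolated render-format-extension.test.ts. Within it, the elsevier-pdf render of academic/document.qmd ran tlmgr update --all, which regenerated the lualatex format files. Those files conflicted with the bundled elsarticle.cls v3.3, so later lualatex renders hung. Fix: update the bundled elsarticle.cls.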
References:
quarto-journals/elsevier#38 - Update elsarticle.cls
quarto-journals/elsevier#40 - CTAN update