mutation-testing
Description
Use when running mutation testing, killing mutants, verifying test quality, checking mutation score, or analyzing survivors after the test baseline is green
apm install SebastienDegodez/copilot-instructions/plugins/superpowers-whetstone/skills/mutation-testing
Mutation Testing
Add a third validation layer to the Outside-In TDD workflow. Acceptance tests verify WHAT (observable behavior), Domain tests verify HOW (business rules), and mutation testing verifies that the tests actually catch bugs.
Core Concept
Mutation testing introduces deliberate bugs (mutants) into source code, then runs the test suite. If tests fail, the mutant is killed ✅. If tests pass despite the bug, the mutant survives ❌ (test gap found).
Source code → introduce mutation → run tests
├── tests FAIL → mutant killed ✅
└── tests PASS → mutant survived ❌
A project with 100% code coverage can still have a 60% mutation score, meaning 40% of introduced bugs go undetected.
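To make the gap between coverage and mutation score concrete, here is a minimal sketch in Python (used only for illustration; the function and suite names are hypothetical and have nothing to do with Stryker.NET). The weak suite executes every line of the rule, so coverage is 100%, yet it never probes the boundary and the mutant survives.

```python
from typing import Callable


def is_adult(age: int) -> bool:
    """Original rule: 18 and above counts as adult."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Boundary mutant: `>=` replaced by `>`."""
    return age > 18


def weak_suite(rule: Callable[[int], bool]) -> bool:
    """Executes every line of the rule but never checks age == 18."""
    return rule(30) is True and rule(5) is False


def strong_suite(rule: Callable[[int], bool]) -> bool:
    """Same checks plus the boundary case age == 18."""
    return weak_suite(rule) and rule(18) is True


def mutant_status(suite: Callable, mutant: Callable[[int], bool]) -> str:
    """A mutant is killed when the suite fails against it."""
    return "survived" if suite(mutant) else "killed"


weak_result = mutant_status(weak_suite, is_adult_mutant)
strong_result = mutant_status(strong_suite, is_adult_mutant)
print(weak_result, strong_result)
```

Both suites pass against the original rule; only the one with the boundary case detects the mutant.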
When to Use
Run mutation testing after the relevant test baseline is green:
- ✅ Core behavior tests pass
- ✅ Rule-focused tests pass
- 🧬 Mutation testing → verify tests detect regressions
Never run on a red baseline: mutation testing assumes the tests work correctly first.
Approach for .NET/C#
Primary: Stryker.NET (Recommended)
For .NET projects, Stryker.NET is the established mutation framework with excellent C# support. No config file is needed: all options can be passed via the CLI.
Install (only if not already available):
# Check first: if the tool is listed, skip installation entirely. Do NOT manipulate PATH.
dotnet tool list -g | grep -i dotnet-stryker
# Only run if the above command prints nothing (tool not found)
dotnet tool install -g dotnet-stryker
Run on changed code only (default workflow, use after every story):
# Mutate only files changed since main: fast, targeted
dotnet stryker \
--project src/YourProject.Domain/YourProject.Domain.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--mutate "**/*.cs" --mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--since:main \
--break-at 100 \
-r json
`--since:main`: only mutants within the git diff against `main` are tested. Unchanged code produces no result, which keeps the run fast.
Run full business logic (use before merge):
dotnet stryker \
--project src/YourProject.Core/YourProject.Core.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--mutate "src/YourProject.Core/**/*.cs" \
--mutate "src/YourProject.Application/**/*.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
--threshold-high 90 --threshold-low 80 \
-r json -r cleartext
Cumulative baseline (full picture after incremental runs):
# --with-baseline combines --since with a persistent baseline report
# Use this in CI to keep a full history while only re-testing changed code
dotnet stryker \
--project src/YourProject.Core/YourProject.Core.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--with-baseline:main \
--break-at 100 \
-r json
`--with-baseline` = `--since` + saving/loading a baseline report. It gives a complete score even when only changed files were re-tested.
Alternative: Custom Mutation Tool (For Specific Needs)
Build a custom tool only when:
- Stryker doesn't cover domain-specific mutation patterns
- You need tight integration with custom test infrastructure
- Performance optimization requires targeted mutation scope
Architecture (3 modules):
- Mutations: rules table (`+` → `-`, `true` → `false`, `>=` → `>`)
- Runner: source-to-test mapping, targeted test execution
- Core: orchestration (apply mutation → run tests → restore → report)
For full custom tool reference, see Uncle Bob's empire-2025 mutation testing.
Core Mutation Categories
| Category | Examples |
|---|---|
| Arithmetic | `+` → `-`, `*` → `/`, `++` → `--` |
| Comparison | `>` → `>=`, `<` → `<=`, `==` → `!=` |
| Boolean | `true` → `false`, `&&` → `\|\|`, `!x` → `x` |
| Conditional | negate conditions, swap if/else branches |
| Constant | `0` → `1`, `""` → `"mutant"`, `null` → `new()` |
| Return value | `return true` → `return false` |
| Void method | remove method call entirely |
| LINQ | `.Any()` → `.All()`, `.First()` → `.Last()` |
Workflow
Universal prerequisite (applies to every step, every scenario): Before any mutation activity (first run, CI setup, killing survivors, analyzing reports), the test suite for the affected scope must be green. If tests are failing, fix them first. Mutation results on a red baseline are meaningless: a test that already fails on the original code also fails on every mutant, so it cannot tell a mutant from the original.
Step 1: Verify Prerequisites
Before running mutation testing, confirm:
- ✅ Baseline tests are green for the mutated scope
- ✅ Meaningful unit tests exist (mutation runs against unit tests)
- ✅ No uncommitted changes (mutations modify source temporarily)
- ✅ Tests are fast (< 100 ms each); slow tests mean slow mutation runs
Step 2: Set Mutation Scope
Target critical business logic first:
- Domain policies, decision engines, pricing/risk calculators
- Application orchestration with complex conditionals
- Validation rules and boundary behavior
Exclude from mutation:
- DTOs, data structures without logic
- Infrastructure (repositories, adapters)
- Configuration, DependencyInjection files
- Generated code, marker interfaces
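The include/exclude idea behind the `--mutate` patterns used in this document can be illustrated with a small Python sketch. It relies on Python's `fnmatch`, whose matching rules are an assumption here and are not guaranteed to be identical to Stryker.NET's glob engine; the file paths are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def in_mutation_scope(path: str, patterns: List[str]) -> bool:
    """A file is in scope if it matches an include and no `!` exclude."""
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    included = any(fnmatchcase(path, p) for p in includes)
    excluded = any(fnmatchcase(path, p) for p in excludes)
    return included and not excluded


# Same patterns as the --mutate arguments shown earlier in this document.
PATTERNS = ["**/*.cs", "!**/*Marker.cs", "!**/DependencyInjection.cs"]

for candidate in [
    "src/Domain/EligibilityPolicy.cs",
    "src/Domain/DependencyInjection.cs",
    "src/Domain/IDomainMarker.cs",
]:
    print(candidate, in_mutation_scope(candidate, PATTERNS))
```

Only the policy file stays in scope; the dependency-injection file and the marker interface are filtered out by the `!` patterns.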
Progressive scoping:
| Phase | Scope | Goal |
|---|---|---|
| Week 1-2 | One critical rule module | Baseline + learning |
| Week 3-4 | All core rule modules | Establish quality gate |
| Ongoing | Core + critical orchestration handlers | Full confidence |
Step 3: Run Mutations
During development (fast, on changed code only):
dotnet stryker \
--project src/YourProject.Core/YourProject.Core.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--since:main \
--break-at 100 \
-r json
Before merge (full business logic scope):
dotnet stryker \
--project src/YourProject.Core/YourProject.Core.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--mutate "src/YourProject.Core/**/*.cs" \
--mutate "src/YourProject.Application/**/*.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
-r json -r cleartext
Metrics:
- Total mutants generated
- Mutants killed (tests caught the bug ✅)
- Mutants survived (test gap ❌)
- Mutation score: (killed / total) × 100
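As a worked example of the score formula above, a minimal Python helper (the function name is hypothetical). It follows the simple killed / total definition used here; a tool's own report may treat statuses such as timeouts or ignored mutants differently.

```python
def mutation_score(killed: int, total: int) -> float:
    """Mutation score as a percentage: (killed / total) x 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= killed <= total:
        raise ValueError("killed must be between 0 and total")
    # Multiply before dividing to avoid avoidable floating-point noise.
    return 100 * killed / total


print(mutation_score(68, 100))  # 68.0
print(mutation_score(82, 100))  # 82.0
```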
`--since` note: unchanged files produce no result; this is expected. Survivors and kills only apply to the diff scope.
Expected duration: --since run: ~1-3 min. Full run: ~5-15 min (depends on test suite speed).
Step 4: Analyze Survivors
Query survivors directly from the JSON report (do not read the full file):
jq '[.files | to_entries[] | {file: .key, survivors: [.value.mutants[] | select(.status == "Survived") | {mutator: .mutatorName, line: .location.start.line, replacement: .replacement}]}] | map(select(.survivors | length > 0))' \
StrykerOutput/$(ls -t StrykerOutput | head -1)/reports/mutation-report.json
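For readers who prefer a script to a `jq` one-liner, the same filter can be sketched in Python. It assumes only the fields the `jq` query above already relies on (`files`, `mutants`, `status`, `mutatorName`, `location.start.line`, `replacement`); the sample report content is made up for illustration.

```python
import json
from typing import Any, Dict, List


def load_report(path: str) -> Dict[str, Any]:
    """Read a mutation report from disk (path supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def survivors_by_file(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only files that contain at least one survived mutant."""
    result = []
    for path, entry in report.get("files", {}).items():
        survivors = [
            {
                "mutator": mutant.get("mutatorName"),
                "line": mutant.get("location", {}).get("start", {}).get("line"),
                "replacement": mutant.get("replacement"),
            }
            for mutant in entry.get("mutants", [])
            if mutant.get("status") == "Survived"
        ]
        if survivors:
            result.append({"file": path, "survivors": survivors})
    return result


# Made-up sample data shaped like the fields used above.
SAMPLE_REPORT = {
    "files": {
        "src/Domain/EligibilityPolicy.cs": {
            "mutants": [
                {"mutatorName": "Equality mutation", "replacement": "age > 18",
                 "status": "Survived", "location": {"start": {"line": 42}}},
                {"mutatorName": "Boolean mutation", "replacement": "false",
                 "status": "Killed", "location": {"start": {"line": 50}}},
            ]
        },
        "src/Domain/DriverAge.cs": {
            "mutants": [
                {"mutatorName": "Boolean mutation", "replacement": "true",
                 "status": "Killed", "location": {"start": {"line": 15}}},
            ]
        },
    }
}

survivors = survivors_by_file(SAMPLE_REPORT)
print(json.dumps(survivors, indent=2))
```

With a real run, `survivors_by_file(load_report(path))` would be called with the path to the latest `mutation-report.json`.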
For each surviving mutant:
- Read the mutation: what was changed? (e.g., `>=` → `>`, removed `if` branch)
- Identify unguarded behavior: which business rule isn't tested?
- Categorize:
  - Real gap: behavior change not caught by tests
  - Equivalent mutant: mutation doesn't change observable behavior
Equivalent mutant examples:
- `x = x + 0` changed to `x = x + 1` (dead code)
- Logging statements removed (no observable effect)
- Defensive null checks when value is guaranteed non-null by type
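One more illustration of an equivalent mutant, as a Python sketch with hypothetical function names: when both branches return the same value at the boundary, swapping `>=` for `>` cannot change the result, so no test can ever kill the mutant.

```python
def larger(a: int, b: int) -> int:
    """Original: returns the larger of two integers."""
    return a if a >= b else b


def larger_mutant(a: int, b: int) -> int:
    """Mutant: `>=` replaced by `>`. When a == b both branches yield the same value."""
    return a if a > b else b


# Compare original and mutant over a small grid of inputs, including a == b.
same_everywhere = all(
    larger(a, b) == larger_mutant(a, b)
    for a in range(-5, 6)
    for b in range(-5, 6)
)
print(same_everywhere)
```

Contrast this with the eligibility boundary mutant discussed in this document, where `>=` versus `>` does change the outcome at exactly 18 and is therefore a real gap.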
After classifying survivors, always include a targeted re-run command scoped to the files that contain real gaps. It confirms kills after you write new tests and gives reviewers a runnable artifact:
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--mutate "**/<FileWithRealGap>.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
-r cleartext
Step 5: Kill Surviving Mutants
For each real survivor (not equivalent):
- Write a new test targeting the unguarded behavior
- Run test against mutated code (using Stryker's mutation operator):
  - Expected: test FAILS (catches the bug)
- Run test against original code:
  - Expected: test PASSES
- Re-run Stryker to confirm kill
Example:
Survivor: `if (age >= 18)` mutated to `if (age > 18)` → survived
// New test to kill the boundary mutant
[Fact]
public void WhenDriverIsExactly18_ShouldBeEligible()
{
var policy = new EligibilityPolicy();
var driver = new DriverInfo(Age: 18, LicenseYears: 1);
var vehicle = new VehicleInfo(Type: "sedan", Age: 1);
var result = policy.Evaluate(driver, vehicle);
Assert.True(result.IsEligible); // Fails if mutant uses `age > 18`
}
Step 6: Report & Document
Present summary with before/after metrics:
Mutation Testing Report: Core Business Layer
═══════════════════════════════════════
Scope: YourProject.Core.Policies
Score: 68% → 82% (after killing survivors)
Killed: 82 / 100
Survived: 32 → 18
New tests added: 8
- Boundary tests for age/experience thresholds: 4
- Edge cases for vehicle type combinations: 3
- Null/empty validation: 1
Remaining survivors (equivalent mutants, documented):
- EligibilityPolicy.cs:L42: removed log statement (no observable effect)
- DriverAge.cs:L15: defensive null check (guaranteed non-null by type)
Document legitimate survivors in code comments or architecture decision records.
Mutation Score Targets
Set thresholds based on team policy and risk profile. Common practice is to start with a progressive threshold and tighten it over time.
| Score | Assessment | Action |
|---|---|---|
| High threshold met | Healthy signal | Keep survivor review discipline |
| Near threshold | Potential gaps | Add targeted tests for risky survivors |
| Below threshold | Quality risk | Block merge or require mitigation plan |
Equivalent mutants are the only legitimate exception. Document them explicitly.
Progressive Threshold Strategy
| Phase | Threshold | Enforcement |
|---|---|---|
| Week 1-2 | Baseline only | Measure, learn mutation categories |
| Week 3-4 | Team-defined threshold (e.g., 80%) | Block PR if below |
| Month 2 | Tightened threshold (e.g., 90%) | Ramp up |
| Steady state | Risk-based target per module | Block merge when policy is not met |
CI/CD integration:
# In CI pipeline: fail the build if the score is below the team threshold
dotnet stryker --break-at <team-threshold>
When the CI gate fails, it means survivors remain. Do not lower the threshold to pass; investigate each survivor first. Classify each one as a real gap (write a test) or an equivalent mutant (document it). Only equivalent mutants are an acceptable reason to adjust the threshold.
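How the three thresholds interact can be sketched as follows. This is an illustrative Python model of the rules stated in this document (score at or above high is green, between low and high is a warning, below `--break-at` yields exit code 1); the function name and the "danger" label for scores below the low threshold are assumptions.

```python
from typing import Tuple


def assess(score: float, threshold_high: float, threshold_low: float,
           break_at: float) -> Tuple[str, int]:
    """Return (label, exit_code) for a mutation score in percent."""
    if score >= threshold_high:
        label = "green"
    elif score >= threshold_low:
        label = "warning"
    else:
        label = "danger"  # assumed label for scores below the low threshold
    exit_code = 1 if score < break_at else 0
    return label, exit_code


# Example policy: high 90, low 80, build breaks below 80.
for score in (92.0, 82.0, 68.0):
    print(score, assess(score, threshold_high=90, threshold_low=80, break_at=80))
```

Keeping the break value separate from the color thresholds lets a team tighten the gate over time without changing how reports are colored.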
Integration with Outside-In TDD
Mutation testing is the third validation layer:
1. Gherkin scenarios (WHAT) → Acceptance tests
2. Business rules (HOW) → Domain tests
3. Test effectiveness (REAL?) → Mutation testing
Workflow integration:
- Write Gherkin scenario (outside-in-tdd)
- RED → validate → SYNTHESIZE GREEN (red-synthesize-green)
- After story complete: Run mutation testing on affected business logic modules
- Kill critical survivors before merge
Anti-Patterns
"Let me mutate before tests are green"
No. Fix failing tests first. Mutation assumes a green baseline.
"100% is unrealistic"
Aggressive targets can be appropriate for critical logic, but thresholds are a policy decision. Equivalent mutants remain the only valid exception to survivor cleanup.
"Mutate everything including Infrastructure"
Never mutate repositories, adapters, and pure plumbing. Focus on business logic first.
"Run mutations on every commit"
Too slow. Run on feature completion or weekly. CI runs only on PR.
"Ignore all survivors as equivalent"
Rationalization. Most survivors are real gaps. Investigate each one.
"Chase the score, not the quality"
Mutation score is a signal, not the goal. Focus on killing mutants that represent real behavioral gaps.
Common Mistakes
| Mistake | Fix |
|---|---|
| Running mutation on failing tests | Green baseline required: fix tests first |
| Mutating test files | Configure Stryker to mutate source only |
| Treating all survivors as equivalent | Only equivalent mutants are exempt: document them, kill the rest |
| Mutation testing without fast tests | Optimize test speed: slow tests mean slow mutations |
| Not scoping mutations progressively | Start small (one policy), expand gradually |
| Accepting < 100% on business logic | 100% is the target: find the gap and test it |
Tools & Commands
Install / update Stryker.NET:
dotnet tool install -g dotnet-stryker
dotnet tool update -g dotnet-stryker
On changed code only (fast, during development):
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--since:main \
--break-at 100 \
-r json
Full business logic scope (before merge):
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--mutate "src/<YourProject>.Domain/**/*.cs" \
--mutate "src/<YourProject>.Application/**/*.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
--threshold-high 100 --threshold-low 100 \
-r json -r cleartext
Cumulative baseline in CI (full picture + incremental speed):
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--with-baseline:main \
--break-at 100 \
-r json
Scope to a specific file or feature (debug a survivor):
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--mutate "**/<TargetFile>.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
-r cleartext
Inspect JSON report:
jq '.' StrykerOutput/**/reports/mutation-report.json | head -n 120
Key CLI flags reference:
| Flag | Short | Purpose |
|---|---|---|
| `--project <name.csproj>` | `-p` | Source project to mutate (filename only) |
| `--test-project <path>` | `-tp` | Test project(s), repeatable |
| `--mutate <glob>` | `-m` | Include/exclude files (prefix `!` to exclude), repeatable |
| `--since:<committish>` | | Only test mutants in git diff vs committish |
| `--with-baseline:<committish>` | | Like `--since` + persist baseline for full cumulative report |
| `--break-at <0-100>` | `-b` | Exit code 1 if score < value |
| `--threshold-high <0-100>` | | Score ≥ this → green |
| `--threshold-low <0-100>` | | Score < high but ≥ this → warning |
| `--reporter <name>` | `-r` | `json`, `cleartext`, `dots`, `markdown`, `html`; repeatable |
| `--concurrency <n>` | `-c` | Parallel worker count |
| `--verbosity <level>` | `-V` | `error`, `warning`, `info`, `debug`, `trace` |
References
- Stryker.NET Documentation
- Stryker.NET Configuration
- Uncle Bob's Mutation Testing Plan
- Mutation Testing Patterns
Integration
REQUIRED BACKGROUND: superpowers-whetstone:outside-in-tdd, which defines the two test streams (Application + Domain)
REQUIRED BACKGROUND: superpowers-whetstone:red-synthesize-green, the TDD cycle that produces the tests to mutate
WORKFLOW:
Run mutation testing after story completion, before PR/merge. Use as quality gate, not coverage metric.