Improve the Quality of Your Tests With Mutation Testing

by Indrek Ots

As software engineers, we write tests to ensure the correctness of our code. By rerunning tests after each change, we can quickly identify if any modifications have inadvertently introduced new bugs or broken existing functionality. The existing set of tests forms a safety net that allows us to move faster with less risk of breaking things. But how can we validate the quality of the tests themselves? This post will explore the concept of mutation testing, a potential answer to that very question.

What is Mutation Testing?

Imagine a scenario where you edit existing functionality but accidentally introduce a bug. This can happen to any of us. All is well and good if your tests start to fail. A failing test means that the functionality is sufficiently covered with tests. Your safety net prevented a potential issue. However, if no tests fail after you introduce the bug, it may indicate that certain parts of the code are not being properly tested or that the tests themselves are inadequate.

This is essentially the premise behind mutation testing. Instead of you introducing bugs accidentally, a mutation testing tool does so deliberately by making a small change to the behavior of the code: for example, negating or removing a conditional, or returning null from a method call. The altered version of the code is referred to as a mutant.

Tests are executed after each mutation. The mutant is killed if at least one test fails, indicating that the tests detected a noticeable change in behavior. If no tests fail, the mutant has survived, suggesting there are gaps in our tests or we have tests that never fail. However, in some cases, it can also mean that the code itself is meaningless and isn’t actually needed.
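To make the killed/survived distinction concrete, here is a minimal hand-written sketch. The isAdult method, its mutant, and the two "test suites" are hypothetical names invented for illustration; a real mutation testing tool generates the mutants and runs your actual tests automatically.

```java
import java.util.function.IntPredicate;

// Hand-written illustration of a mutant; all names here are hypothetical.
class MutantSketch {

    // Original behavior.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Mutant: the conditional boundary was changed from >= to >.
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // A weak "test suite": checks only values far away from the boundary.
    static boolean weakSuitePasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    // A stronger suite that also checks the boundary value itself.
    static boolean strongSuitePasses(IntPredicate impl) {
        return weakSuitePasses(impl) && impl.test(18);
    }

    public static void main(String[] args) {
        // A mutant survives if the suite still passes when run against it.
        System.out.println("weak suite, mutant survived: "
                + weakSuitePasses(MutantSketch::isAdultMutant));
        System.out.println("strong suite, mutant survived: "
                + strongSuitePasses(MutantSketch::isAdultMutant));
    }
}
```

Both suites pass against the original code. Only the stronger one fails against the mutant, which is exactly what "killing" it means.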

We can draw a parallel to chaos engineering.

Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.

Mutation testing, however, experiments on our code, and its goal is to introduce changes that make the tests fail. This helps us discover weaknesses in our test suite.

Show me the Code

Let’s have a look at an example scenario using PIT, a mutation testing tool for Java. The following is the method under test. It calculates income tax using two rates: income up to 10,000 is taxed at 20% and everything above 10,000 is taxed at 25%. Ignore the fact that we’re using double to represent money. It’s just an example.

class IncomeTaxCalculator {

    private static final double INCOME_TAX_20 = 0.2;
    private static final double INCOME_TAX_25 = 0.25;
    private static final double INCOME_TAX_25_BRACKET = 10_000.00;

    public static double calculateIncomeTax(double income) {
        var incomeTax = 0.00;
        var remainingTaxableIncome = income;

        if (income > INCOME_TAX_25_BRACKET) {
            var incomeOver25PercentBracket = income - INCOME_TAX_25_BRACKET;
            incomeTax += incomeOver25PercentBracket * INCOME_TAX_25;
            remainingTaxableIncome -= incomeOver25PercentBracket;
        }

        return incomeTax + (remainingTaxableIncome * INCOME_TAX_20);
    }
}

Let’s create our first test.

@Test
void calculateTaxBracket20() {
    assertEquals(1600.00, IncomeTaxCalculator.calculateIncomeTax(8_000.00));
}

The following is the PIT mutation testing report. It shows which mutations were applied and whether the mutants were killed or not. It resembles a code coverage report. We can clearly see that the if-branch of the code was never executed.

 1  package test.pit;
 2
 3  class IncomeTaxCalculator {
 4
 5      private static final double INCOME_TAX_20 = 0.2;
 6      private static final double INCOME_TAX_25 = 0.25;
 7      private static final double INCOME_TAX_25_BRACKET = 10_000.00;
 8
 9      public static double calculateIncomeTax(double income) {
10          var incomeTax = 0.00;
11          var remainingTaxableIncome = income;
12
13          if (income > INCOME_TAX_25_BRACKET) {
14              var incomeOver25PercentBracket = income - INCOME_TAX_25_BRACKET;
15              incomeTax += incomeOver25PercentBracket * INCOME_TAX_25;
16              remainingTaxableIncome -= incomeOver25PercentBracket;
17          }
18
19          return incomeTax + (remainingTaxableIncome * INCOME_TAX_20);
20      }
21  }

Mutations

Line 13 (2 mutations)

1. negated conditional → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket20()]
2. changed conditional boundary → SURVIVED
   Location : calculateIncomeTax
   Killed by : none

Line 14 (1 mutation)

1. Replaced double subtraction with addition → NO_COVERAGE
   Location : calculateIncomeTax
   Killed by : none

Line 15 (2 mutations)

1. Replaced double multiplication with division → NO_COVERAGE
   Location : calculateIncomeTax
   Killed by : none
2. Replaced double addition with subtraction → NO_COVERAGE
   Location : calculateIncomeTax
   Killed by : none

Line 16 (1 mutation)

1. Replaced double subtraction with addition → NO_COVERAGE
   Location : calculateIncomeTax
   Killed by : none

Line 19 (3 mutations)

1. replaced double return with 0.0d for test/pit/IncomeTaxCalculator::calculateIncomeTax → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket20()]
2. Replaced double addition with subtraction → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket20()]
3. Replaced double multiplication with division → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket20()]

To kill the mutants from the if-branch, we need to add a couple more tests. Let’s add a test where the income is greater than 10,000, and let’s also test the boundary condition.

@Test
void calculateTaxBracket25() {
    assertEquals(4500.00, IncomeTaxCalculator.calculateIncomeTax(20_000.00));
}

@Test
void calculateTaxExactlyAtBracket() {
    assertEquals(2000.00, IncomeTaxCalculator.calculateIncomeTax(10_000.00));
}

The report looks better now, but not perfect.

 1  package test.pit;
 2
 3  class IncomeTaxCalculator {
 4
 5      private static final double INCOME_TAX_20 = 0.2;
 6      private static final double INCOME_TAX_25 = 0.25;
 7      private static final double INCOME_TAX_25_BRACKET = 10_000.00;
 8
 9      public static double calculateIncomeTax(double income) {
10          var incomeTax = 0.00;
11          var remainingTaxableIncome = income;
12
13          if (income > INCOME_TAX_25_BRACKET) {
14              var incomeOver25PercentBracket = income - INCOME_TAX_25_BRACKET;
15              incomeTax += incomeOver25PercentBracket * INCOME_TAX_25;
16              remainingTaxableIncome -= incomeOver25PercentBracket;
17          }
18
19          return incomeTax + (remainingTaxableIncome * INCOME_TAX_20);
20      }
21  }

Mutations

Line 13 (2 mutations)

1. negated conditional → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket25()]
2. changed conditional boundary → SURVIVED
   Location : calculateIncomeTax
   Killed by : none

Line 14 (1 mutation)

1. Replaced double subtraction with addition → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket25()]

Line 15 (2 mutations)

1. Replaced double multiplication with division → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket25()]
2. Replaced double addition with subtraction → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket25()]

Line 16 (1 mutation)

1. Replaced double subtraction with addition → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket25()]

Line 19 (3 mutations)

1. replaced double return with 0.0d for test/pit/IncomeTaxCalculator::calculateIncomeTax → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket25()]
2. Replaced double addition with subtraction → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket25()]
3. Replaced double multiplication with division → KILLED
   Location : calculateIncomeTax
   Killed by : test.pit.IncomeTaxCalculatorTest.[engine:junit-jupiter]/[class:test.pit.IncomeTaxCalculatorTest]/[method:calculateTaxBracket25()]

There’s still a surviving conditional boundary mutant on line 13. In this case, the mutation changes the condition to income >= INCOME_TAX_25_BRACKET, but it has no effect on the outcome of the tests. By the rules of mutation testing, this is a potential gap in the tests. If we read the code, however, we can see that the change cannot affect the end result: the 25% rate applies only to the amount above 10,000, and income - INCOME_TAX_25_BRACKET evaluates to 0 when income is exactly 10,000.
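Mutants like this, which change the code without changing its observable behavior, are often called equivalent mutants. The following sketch is one way to convince yourself that this mutant is one of them. The class name, the shortened constant names, and the inclusiveBoundary flag are assumptions made for the illustration; PIT itself does not work this way.

```java
// Compares the original condition (>) with the boundary mutant (>=).
// Hypothetical helper written for this post, not part of PIT.
class BoundaryMutantSketch {

    static final double RATE_20 = 0.2;
    static final double RATE_25 = 0.25;
    static final double BRACKET = 10_000.00;

    // Same logic as calculateIncomeTax; the flag selects > or >=.
    static double tax(double income, boolean inclusiveBoundary) {
        boolean overBracket = inclusiveBoundary ? income >= BRACKET : income > BRACKET;
        double incomeTax = 0.00;
        double remaining = income;
        if (overBracket) {
            double over = income - BRACKET; // exactly 0.0 when income == BRACKET
            incomeTax += over * RATE_25;
            remaining -= over;
        }
        return incomeTax + (remaining * RATE_20);
    }

    // True when the original and the mutant return the same value.
    static boolean agreesOn(double income) {
        return tax(income, false) == tax(income, true);
    }

    public static void main(String[] args) {
        double[] incomes = {0.00, 8_000.00, 10_000.00, 10_000.01, 20_000.00};
        for (double income : incomes) {
            System.out.println(income + " -> same result: " + agreesOn(income));
        }
    }
}
```

The two versions only take different paths when income is exactly 10,000, and on that path the mutant adds 0 × 25% and subtracts 0, so no test input can tell them apart.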

Sometimes, mutation testing produces results where you, the author, have to evaluate whether a surviving mutant is significant. It could be that the code under test is not needed or that it could be rewritten. For example, in our case we could remove the if statement altogether and get a 100% green mutation test result with the following implementation.

public static double calculateIncomeTax(double income) {
    var incomeTax = 0.00;
    var remainingTaxableIncome = income;

    var incomeOver25PercentBracket = Math.max(income - INCOME_TAX_25_BRACKET, 0);
    incomeTax += incomeOver25PercentBracket * INCOME_TAX_25;
    remainingTaxableIncome -= incomeOver25PercentBracket;

    return incomeTax + (remainingTaxableIncome * INCOME_TAX_20);
}

Use common sense. Mutation testing cannot evaluate the readability or the performance of the code. Arguably, having the if statement present makes it very clear that the 25% tax is applied to everything above 10,000. Additionally, the new implementation performs unnecessary work when income <= 10_000.

Code Coverage Anybody?

When we try to understand the quality of our test suite, discussions often lead to measuring code coverage. At first glance, there’s a similarity between mutation testing and code coverage, especially when we look at the red and green lines of code in the mutation testing report.

Code coverage tells you which lines of code were executed during a test. Technically, you can get 100% line coverage with a single test, but that doesn’t guarantee the test executed all possible code paths. You need to specifically measure branch coverage if you’re interested in all the possible decision points in the code. What’s more, you can get 100% line coverage even if you don’t assert anything. Therefore, measuring code coverage alone cannot protect you against poorly maintained tests. That doesn’t mean code coverage is a useless metric. You just need to understand what you’re measuring.
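The point about assertion-free tests can be sketched in plain Java. Everything below is a simplified, hypothetical stand-in: tax is a flat 20% calculation, taxMutant mimics a "return value replaced with 0.0" mutation, and the two methods ending in Passes play the role of tests.

```java
import java.util.function.DoubleUnaryOperator;

// Illustrates why executed lines are not the same as verified behavior.
// All names are hypothetical and exist only for this sketch.
class CoverageSketch {

    // Simplified code under test: a flat 20% tax.
    static double tax(double income) {
        return income * 0.2;
    }

    // Mutant: the return value was replaced with 0.0.
    static double taxMutant(double income) {
        return 0.0;
    }

    // Executes the code (so it counts as covered) but checks nothing.
    static boolean assertionFreeTestPasses(DoubleUnaryOperator impl) {
        impl.applyAsDouble(8_000.00);
        return true; // no assertion, so this "test" can never fail
    }

    // Executes the same code and verifies the result.
    static boolean assertingTestPasses(DoubleUnaryOperator impl) {
        return Math.abs(impl.applyAsDouble(8_000.00) - 1_600.00) < 1e-9;
    }

    public static void main(String[] args) {
        System.out.println("assertion-free test passes on mutant: "
                + assertionFreeTestPasses(CoverageSketch::taxMutant));
        System.out.println("asserting test passes on mutant: "
                + assertingTestPasses(CoverageSketch::taxMutant));
    }
}
```

Both tests would produce the same line coverage for tax, yet only the asserting one notices when the method starts returning 0.0. That difference is invisible to a coverage report and visible to mutation testing.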

Is Mutation Testing the Be-All and End-All?

Mutation testing is an interesting way to discover gaps in our tests and a tool we can add to our toolbox. However, like any tool, it comes with its own set of challenges. Even on a relatively small codebase, the number of generated mutations can grow very high. Now imagine a scenario where the entire test suite takes a minute to finish, and there are 1000 mutations. If the full suite runs once for every mutation, mutation testing would take approximately 17 hours to finish.
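The estimate is simple multiplication, sketched below as an upper bound. It assumes every mutant triggers one full, sequential run of the suite; the class and method names are made up for the example.

```java
import java.time.Duration;

// Back-of-the-envelope estimate of a worst-case mutation testing run.
class MutationRunEstimate {

    // Assumption: one full, sequential suite run per mutant.
    static Duration worstCase(Duration suiteDuration, long mutantCount) {
        return suiteDuration.multipliedBy(mutantCount);
    }

    public static void main(String[] args) {
        Duration total = worstCase(Duration.ofMinutes(1), 1_000);
        // 1,000 minutes is 16 hours and 40 minutes, i.e. roughly 17 hours.
        System.out.println(total.toHours() + " h " + total.toMinutesPart() + " min");
    }
}
```

The total grows linearly with both the suite duration and the number of mutants, which is why the optimisations that follow focus on reducing one or the other.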

Of course, there are optimisations that can be done:

  • limiting the number of mutations generated for each run
  • limiting the types of mutations included in each run
  • excluding tests
  • excluding code under test
  • using PIT’s incremental analysis feature

Due to these scalability issues, mutation testing is probably not something you want to run in your CI/CD pipeline on every commit. Instead, consider running it automatically on a schedule, or locally on a smaller scale, only on the source files you’re working with.

Summary

Mutation testing is a method for testing the quality of the tests. It works by making small changes, or mutations, in your code to see if your tests can find the errors. At first glance, it’s similar to measuring code coverage, which tells you how much of your code is tested, but mutation testing also checks how well your tests work. These two methods are different and complement each other. Remember, the aim isn’t to kill all mutants but to make your tests better.