How to avoid bulky data providers in PHPUnit

 

If you write unit tests, you will eventually discover data providers in PHPUnit. A data provider lets you run the same test again and again, each time with a different input.

What is a data provider?

A data provider is just a way to supply different input arguments to the same test. It's a (static) method that creates a list of inputs to test. The simplest example is a test that adds 2 numbers:


use PHPUnit\Framework\TestCase;

class ExampleTest extends TestCase
{
    /**
     * @test
     * @dataProvider provideAdditions
     */
    public function it_can_add_two_numbers(int $expected, int $firstNumber, int $secondNumber): void
    {
        $this->assertEquals($expected, $firstNumber + $secondNumber);
    }
    
    public static function provideAdditions(): \Generator
    {
        yield 'simple test case' => [3, 1, 2];
        yield 'works with negative numbers' => [3, -1, 4];
    }
}

Some pointers I follow when writing tests

  • If I use a data provider, I use the key to tell other programmers what each test case is testing. This text is also shown when the test fails.
  • I use PHP generators for the data provider instead of a large array, as generators allow the most flexible data providers.
  • I always put the expected value first in my test method arguments. Most people do not do this, but PHPUnit assertions such as assertEquals also take the expected value as their first argument.
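
As a small illustration of the second pointer, here is a minimal sketch of how generators compose: several small providers can be merged with yield from while every case keeps its descriptive key. The function names are made up for this example, and inside a PHPUnit test class they would be (static) methods instead of plain functions.

```php
<?php
declare(strict_types=1);

// Hypothetical provider with the "happy path" cases: [expected, first, second].
function provideSimpleAdditions(): \Generator
{
    yield 'simple test case' => [3, 1, 2];
}

// Hypothetical provider with the cases around negative numbers.
function provideNegativeAdditions(): \Generator
{
    yield 'works with negative numbers' => [3, -1, 4];
    yield 'two negative numbers' => [-3, -1, -2];
}

// "yield from" merges both providers and preserves their string keys.
function provideAdditions(): \Generator
{
    yield from provideSimpleAdditions();
    yield from provideNegativeAdditions();
}

// Outside PHPUnit we can collect the generator to inspect the cases.
$cases = iterator_to_array(provideAdditions());
foreach ($cases as $description => [$expected, $first, $second]) {
    printf("%s: %d + %d = %d\n", $description, $first, $second, $expected);
}
```

Doing the same with arrays means merging them by hand, which is one reason I prefer generators.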

Testing all cases

Imagine I want to test a method with lots of if statements, and I want to cover every combination of inputs for complete coverage. I would then write a data provider similar to this:

public function someDataProviderExample(): \Generator
{
    $input1 = 42;
    $input2 = 0;
    $input3 = 5;
    $enum1 = Enum::ACTIVE;
    $enum2 = Enum::INACTIVE;
    $enum3 = Enum::BLOCKED;
    yield [true, $input1, $enum1];
    yield [true, $input2, $enum1];
    yield [true, $input3, $enum1];
    yield [false, $input1, $enum2];
    yield [false, $input2, $enum2];
    yield [true, $input3, $enum2];
    yield [true, $input1, $enum3];
    yield [false, $input2, $enum3];
    yield [true, $input3, $enum3];
}

So what is the problem with this approach? These data providers can get incredibly large, and every extra input argument multiplies their size. In our example we already have 9 test cases, and adding another argument with 3 options grows the list to 27 test cases.

While this is sometimes unavoidable, it clutters the test class a lot: data providers can end up making up around 80% of such a class.
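
To make the growth concrete, here is a minimal sketch that only counts the combinations. The combinations() helper is hypothetical and the strings stand in for the enum values; the expected value of each case would still have to be written by hand.

```php
<?php
declare(strict_types=1);

/**
 * Yields every combination of the given option lists (a cartesian product).
 *
 * @param array<int, array<int, mixed>> $optionLists
 */
function combinations(array $optionLists): \Generator
{
    if ($optionLists === []) {
        // No lists left: there is exactly one (empty) combination.
        yield [];
        return;
    }
    $first = array_shift($optionLists);
    foreach ($first as $option) {
        foreach (combinations($optionLists) as $rest) {
            yield [$option, ...$rest];
        }
    }
}

$inputs = [42, 0, 5];
$states = ['ACTIVE', 'INACTIVE', 'BLOCKED']; // stand-ins for the enum values
$extra = ['a', 'b', 'c'];                    // a made-up third argument

$two = iterator_to_array(combinations([$inputs, $states]), false);
$three = iterator_to_array(combinations([$inputs, $states, $extra]), false);

printf("%d cases with two arguments, %d with three\n", count($two), count($three));
```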

Glob files solution

A solution I use in certain cases (and I know phpstan uses this in its own tests) is to put all test cases inside a folder called 'testcases' and end up with a data provider similar to this one. My example code uses symfony/finder to find all files and uses the file name as the test description.

use Symfony\Component\Finder\Finder;
public function provideInput(): \Generator
{
    foreach (Finder::create()->in(__DIR__ . '/testcases')->files()->name('*.php') as $inputFile) {
        $description = $inputFile->getFilenameWithoutExtension();
        $expectedFile = preg_replace('/\.php$/i', '.json', (string) $inputFile);
        yield $description => [
            json_decode(file_get_contents($expectedFile), true),
            require($inputFile),
        ];
    }
}

So instead of writing everything in the test case itself, we write it in additional files. We assume there is a testcases folder in which every PHP file has a JSON file with the same base name. If we want a new case, we just add a new PHP file for the input and a JSON file for the expected outcome. With JSON the expected value needs to be something serializable, but a PHP input file with a PHP output file would work fine as well.
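
To show what such a folder could contain, here is a minimal self-contained sketch. It creates a throwaway folder with one made-up case (adds_two_numbers.php and adds_two_numbers.json) and reads it back with plain glob() instead of symfony/finder; the file names and contents are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

// Build a throwaway "testcases" folder so the sketch runs on its own.
$dir = sys_get_temp_dir() . '/testcases_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/adds_two_numbers.php', '<?php return [1, 2];'); // input
file_put_contents($dir . '/adds_two_numbers.json', '3');                   // expected value

/**
 * Same idea as the Finder-based provider, using only glob():
 * every *.php file is the input, the *.json file next to it the expected value.
 */
function provideInputFrom(string $directory): \Generator
{
    foreach (glob($directory . '/*.php') ?: [] as $inputFile) {
        $description = basename($inputFile, '.php');
        $expectedFile = preg_replace('/\.php$/i', '.json', $inputFile);
        yield $description => [
            json_decode(file_get_contents($expectedFile), true),
            require $inputFile,
        ];
    }
}

$cases = iterator_to_array(provideInputFrom($dir));

// Clean up the temporary files again.
unlink($dir . '/adds_two_numbers.php');
unlink($dir . '/adds_two_numbers.json');
rmdir($dir);

var_dump($cases);
```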

Dependency injection data providers

For the integration tests (which I covered in a previous article, "How I made the integration tests work for 2 frameworks") I wanted to run each test against multiple implementations of TestApplicationInterface. Two classes implement this interface, LaravelApplication and SymfonyApplication, and both require a Configuration object. I have multiple Configuration instances. For SymfonyApplication I want to test every configuration; for LaravelApplication I only want the configurations that keep the template engine enabled, because Laravel can't disable the template engine.
So my data provider would look like this (and I would have to repeat a similar structure for every integration test):


public function provideInput(): \Generator
{
    $configNoTemplateMemoryDatalayer = new ApplicationConfiguration(false, InMemoryDataLayer::class);
    $configWithTemplateMemoryDatalayer = new ApplicationConfiguration(true, InMemoryDataLayer::class);
    $configNoTemplateFakerDatalayer = new ApplicationConfiguration(false, FakerDataLayer::class);
    $configWithTemplateFakerDatalayer = new ApplicationConfiguration(true, FakerDataLayer::class);
    $configNoTemplateDbDatalayer = new ApplicationConfiguration(false, DbDataLayer::class);
    $configWithTemplateDbDatalayer = new ApplicationConfiguration(true, DbDataLayer::class);
    yield 'no template, memory data layer, Symfony application' => [new SymfonyApplication($configNoTemplateMemoryDatalayer)];
    yield 'with template, memory data layer, Symfony application' => [new SymfonyApplication($configWithTemplateMemoryDatalayer)];
    yield 'no template, faker data layer, Symfony application' => [new SymfonyApplication($configNoTemplateFakerDatalayer)];
    yield 'with template, faker data layer, Symfony application' => [new SymfonyApplication($configWithTemplateFakerDatalayer)];
    yield 'no template, DB data layer, Symfony application' => [new SymfonyApplication($configNoTemplateDbDatalayer)];
    yield 'with template, DB data layer, Symfony application' => [new SymfonyApplication($configWithTemplateDbDatalayer)];

    yield 'memory data layer, Laravel application' => [new LaravelApplication($configWithTemplateMemoryDatalayer)];
    yield 'faker data layer, Laravel application' => [new LaravelApplication($configWithTemplateFakerDatalayer)];
    yield 'DB data layer, Laravel application' => [new LaravelApplication($configWithTemplateDbDatalayer)];
}
You get the point: this data provider becomes a maintenance nightmare, and if multiple integration tests do this it becomes almost impossible to maintain. We can't use the glob file workaround here because all the data is intertwined. That is why I decided to make a Composer package, apie/phpunit-matrix-data-provider, for this very specific problem. Instead of writing everything in a data provider, we create a factory object whose methods create the instances, and the package uses type hints to link all the dependencies automatically with dependency injection.

class TestObjectFactory
{
    // Methods without arguments are the starting points of the matrix.
    public function createInMemoryDatalayerImplementation(): DatalayerImplementationEnum
    {
        return DatalayerImplementationEnum::InMemoryDatalayer;
    }

    public function createFakerDatalayerImplementation(): DatalayerImplementationEnum
    {
        return DatalayerImplementationEnum::FakerDatalayer;
    }

    public function createDbDatalayerImplementation(): DatalayerImplementationEnum
    {
        return DatalayerImplementationEnum::DbDatalayer;
    }

    // Called once for every DatalayerImplementationEnum value created above.
    public function createConfigurationWithTemplating(DatalayerImplementationEnum $enum): ApplicationConfiguration
    {
        return new ApplicationConfiguration(true, $enum->toClassName());
    }

    public function createConfigurationWithoutTemplating(DatalayerImplementationEnum $enum): ApplicationConfiguration
    {
        return new ApplicationConfiguration(false, $enum->toClassName());
    }

    // Called once for every ApplicationConfiguration created above.
    public function createSymfonyApplication(ApplicationConfiguration $configuration): TestApplicationInterface
    {
        return new SymfonyApplication($configuration);
    }

    public function createLaravelApplication(ApplicationConfiguration $configuration): TestApplicationInterface
    {
        return new LaravelApplication($configuration);
    }
}
And this is what our test and data provider look like:

use Apie\PhpunitMatrixDataProvider\MakeDataProviderMatrix;
use PHPUnit\Framework\TestCase;
class ExampleIntegrationTest extends TestCase
{
    use MakeDataProviderMatrix;
    /**
     * @dataProvider provideApplications
     */
    public function testRoot(TestApplicationInterface $application): void
    {
        $application->bootApplication();
        $response = $application->getHttpRequest('/');
        $this->assertEquals(200, $response->getStatusCode());
    }

    public function provideApplications(): \Generator
    {
        yield from $this->createDataProviderFrom(
            new ReflectionMethod($this, 'testRoot'),
            new TestObjectFactory()
        );
    }
}
The data provider looks a bit hard to grasp, but this is how it works:
  1. It reads the method arguments of testRoot and finds TestApplicationInterface as the type hint.
  2. It iterates over all public methods of TestObjectFactory to find the methods whose return type hint is exactly TestApplicationInterface.
  3. For every method that returns a TestApplicationInterface instance, it reads that method's arguments and finds ApplicationConfiguration.
  4. For every method that returns ApplicationConfiguration it does the same and finds DatalayerImplementationEnum.
  5. For every method that returns DatalayerImplementationEnum it does the same, but now it finds no method arguments.
  6. It calls every method that returns ApplicationConfiguration with every DatalayerImplementationEnum it found, and repeats this one level up until it has all possible combinations of TestApplicationInterface.
So in short: it generates the data provider for you, covering every combination of dependencies. The above example generates 2 x 2 x 3 = 12 test cases.
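
The steps above can be sketched with plain reflection. This is a simplified illustration and not the actual code of the package: resolveAll() and the stand-in classes DataLayer, Config, App and SketchFactory are made up for this example, and the sketch assumes every factory method parameter has a single named type hint.

```php
<?php
declare(strict_types=1);

/**
 * Returns every value the factory can produce for $type: all public methods
 * whose return type is exactly $type are called once for every combination
 * of their (recursively resolved) arguments.
 */
function resolveAll(object $factory, string $type): array
{
    $results = [];
    foreach ((new ReflectionClass($factory))->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        $returnType = $method->getReturnType();
        if (!$returnType instanceof ReflectionNamedType || $returnType->getName() !== $type) {
            continue;
        }
        // Start with one empty argument list and extend it per parameter.
        $argumentLists = [[]];
        foreach ($method->getParameters() as $parameter) {
            $extended = [];
            foreach (resolveAll($factory, $parameter->getType()->getName()) as $value) {
                foreach ($argumentLists as $arguments) {
                    $extended[] = [...$arguments, $value];
                }
            }
            $argumentLists = $extended;
        }
        foreach ($argumentLists as $arguments) {
            $results[] = $method->invokeArgs($factory, $arguments);
        }
    }
    return $results;
}

// Stand-ins for the data layer enum, the configuration and the application.
final class DataLayer { public function __construct(public string $name) {} }
final class Config { public function __construct(public bool $templating, public DataLayer $dataLayer) {} }
final class App { public function __construct(public string $framework, public Config $config) {} }

final class SketchFactory
{
    public function createMemory(): DataLayer { return new DataLayer('memory'); }
    public function createFaker(): DataLayer { return new DataLayer('faker'); }
    public function createDb(): DataLayer { return new DataLayer('db'); }
    public function createWithTemplating(DataLayer $d): Config { return new Config(true, $d); }
    public function createWithoutTemplating(DataLayer $d): Config { return new Config(false, $d); }
    public function createSymfony(Config $c): App { return new App('symfony', $c); }
    public function createLaravel(Config $c): App { return new App('laravel', $c); }
}

$applications = resolveAll(new SketchFactory(), App::class);
printf("%d applications\n", count($applications));
```

With 3 data layers, 2 configuration methods and 2 application methods, the sketch ends up with the same 2 x 2 x 3 = 12 combinations.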

Filtering results

While it works, it actually tests too many cases. As said, the template engine is only available in Symfony if the TwigBundle is included, but Laravel has no such option, so testing a configuration without templating is not possible in a Laravel application. Therefore I decided that any factory method can return null to skip a specific test case.

The createLaravelApplication in TestObjectFactory will become:
public function createLaravelApplication(ApplicationConfiguration $configuration): ?TestApplicationInterface
{
    if (!$configuration->allowsTemplating()) {
        return null;
    }
    return new LaravelApplication($configuration);
}
Now LaravelApplication will only be instantiated with a configuration that enables templating. We now have 1 x 1 x 3 + 1 x 2 x 3 = 9 test cases, getting rid of 3 redundant ones.
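
The arithmetic can be checked with a few nested loops. This is only a counting sketch with plain strings and booleans standing in for the real objects; the continue mirrors the return null rule.

```php
<?php
declare(strict_types=1);

$cases = [];
foreach (['symfony', 'laravel'] as $framework) {
    foreach ([true, false] as $templating) {
        foreach (['memory', 'faker', 'db'] as $dataLayer) {
            // Mirrors the "return null" rule: Laravel without templating is skipped.
            if ($framework === 'laravel' && !$templating) {
                continue;
            }
            $cases[] = [$framework, $templating, $dataLayer];
        }
    }
}

printf("%d of 12 combinations remain\n", count($cases));
```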

More test configuration options

Another integration test checks whether Apie can provide an Apie Laravel Facade. Laravel Facades only work in Laravel applications, not in Symfony applications. We use the return null trick again and add a factory method with a more specific type hint (the library only looks at exact type hints):


public function createLaravelTestApplication(ApplicationInterface $application): ?LaravelTestApplication
{
    // Returning null skips every application that is not a Laravel one.
    return $application instanceof LaravelTestApplication ? $application : null;
}

Now our test looks like this:


public function it_registers_a_laravel_facade_provider(): Generator
{
    yield from $this->createDataProviderFrom(
        new ReflectionMethod($this, 'it_registers_a_laravel_facade'),
        new IntegrationTestHelper()
    );
}

/**
 * @dataProvider it_registers_a_laravel_facade_provider
 * @test
 */
public function it_registers_a_laravel_facade(LaravelTestApplication $testApplication)
{
    $testApplication->bootApplication();
    $apieService = $testApplication->getServiceContainer()->get('apie');
    $this->assertInstanceOf(
        GetItemAction::class,
        Apie::createAction(new ApieContext([ContextConstants::APIE_ACTION => GetItemAction::class]))
    );
    $this->assertSame($apieService, Apie::getFacadeRoot());
    $testApplication->cleanApplication();
}

So this package only helps in very specific cases, but it does make the data providers smaller and easier to maintain, and it makes it ridiculously easy to test my library in 2 frameworks with different options.

I've seen projects where excessively large data providers produced so many tests that they hindered actual development. We already use auto-wired services with dependency injection in the application, so why not use dependency injection in our data providers?
