As I have mentioned several times in previous blog articles: I develop my composer packages for the Apie library in a monorepo. With a monorepo I can update 33 composer packages at the same time without much effort. You commit everything in one single (git) repository and the CI (for example GitHub Actions) will deploy new versions of all the libraries to multiple composer packages. Most large frameworks work this way, and big companies like Google and Facebook use a monorepo for their entire codebase.
Recently I updated the GitHub Actions workflow for the Apie monorepo, because the original setup was a bit too simple. Now it's a good example of GitHub Actions on steroids:
Why a monorepo?
Whenever you develop an (open source) library, you will end up with several decisions about how to set up your library.
- Is your library code framework agnostic or does it only work in a specific framework?
- Leaning into a framework is easier to set up, as frameworks do a lot of things for you.
- Your library is less reusable if it relies on a framework, especially if it depends on framework-specific functionality.
- Framework agnostic libraries lead to a more consistent architecture in general, but they can also make you reinvent the wheel.
- Splitting your library into at least 2 packages is very common: one framework agnostic library and at least one package that links it with a framework.
- Splitting means you end up doing and testing your work twice: you first need to change the library and then update the framework package.
- Do you want one big package that contains everything, or do you want to separate your library into multiple packages so a developer only gets the dependencies they need?
- If you have one big package, it's easy to make assumptions that everything is available.
- If you have one big package, it's very likely your package requires too many dependencies to install or has internal logic to detect whether a specific library is installed.
- A big package is nice for the person developing the package; a small package is nice for the person using your package.
- The downside of many packages is that it's hard to update code, as updating one package requires updating all the packages that depend on it afterwards. If the change still contains a bug, you have to repeat the whole process.
Did you decide to make the library framework agnostic and split it over multiple packages? Then a monorepo is the best solution.
Monorepos in Node
A monorepo in Node basically means committing all your files in one repository and running multiple npm publish commands. There is also a tool called Lerna that adds functionality to publish multiple packages with the same version applied and to build all libraries at the same time.
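As a rough sketch (Lerna's exact configuration depends on the version you use, and the packages path is an assumption here), a fixed-version Lerna setup boils down to a lerna.json like this:

    {
      "version": "1.4.0",
      "packages": ["packages/*"]
    }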
Monorepos in PHP
In PHP we use Composer as package manager, and a composer package is always linked to a single source control repository, in most cases git. So the only way to update multiple packages from a monorepo is pushing specific folders to their own git repository and linking that as the package's git repo. A very common composer package to handle this is symplify/monorepo-builder. It's also the package I use.
It ships with a GitHub Action to do the splitting, but I was not satisfied with it, as I had to create the branches in the git subtree manually. I still create the repositories manually, as I do not want GitHub Actions to have the power to create repositories.
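The configuration itself is minimal. A sketch of a monorepo-builder.php, assuming a recent monorepo-builder version and that every package lives in its own subfolder under packages/:

    <?php

    declare(strict_types=1);

    use Symplify\MonorepoBuilder\Config\MBConfig;

    return static function (MBConfig $mbConfig): void {
        // Tell monorepo-builder in which folders the individual packages live.
        $mbConfig->packageDirectories([__DIR__ . '/packages']);
    };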
Apie Monorepo folder structure
(Screenshot: the folder structure of the Apie monorepo, with every package in its own subfolder under packages/.)
Downsides of monorepos
The downside is that a monorepo often comes with some restrictions to keep your sanity. All your packages require non-conflicting dependencies. So if one package requires symfony 7.*, all packages in the monorepo require symfony 7.*. You can not have a small part of them require symfony 6.* and the other part 7.*. You can however give all packages a 6.*|7.* version constraint (see the sketch after this list). You also have to test every package individually and test all packages combined. If you don't, you will end up with some challenges:
- incorrect package dependencies only show up after you push your work
- the code coverage of the combined run and of a single package run should be merged, because they can hit different code
- you should also test all packages with the lowest allowed composer dependencies and with the highest
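As an illustration of the shared version constraint mentioned above (the package name is just an example), every composer.json in the monorepo would contain something like:

    {
        "require": {
            "symfony/dependency-injection": "6.*|7.*"
        }
    }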
So I had to extend the CI a lot.
Running all tests combined
This is easy to set up. We just install the dependencies from the composer.json in the root of our repository, which pulls in the dependencies of all our packages, and run php vendor/bin/phpunit. We also test our code in all the PHP versions that we claim to support in the composer.json. I wouldn't be the first one to use a PHP 8.3 feature in a library that claims PHP 8.1 support.
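A sketch of what such a combined job can look like (the job name and the exact version list are illustrative, not the literal Apie workflow):

    # Hypothetical job: run the whole suite once per supported PHP version.
    run_all_tests:
      name: Run all tests combined
      runs-on: ubuntu-latest
      strategy:
        matrix:
          php_version: ['8.1', '8.2', '8.3']
      steps:
        - uses: actions/checkout@v4
        - uses: shivammathur/setup-php@v2
          with:
            php-version: ${{ matrix.php_version }}
        - uses: "ramsey/composer-install@v3"
        - run: php vendor/bin/phpunit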
Composer lowest/latest install dependency
In an application, version constraints are locked with the composer.lock. This is not the case for a library, as every application using it can resolve different versions. We can only declare the supported versions within the composer.json of the library with requires and conflicts. However, testing your library with every combination of every allowed composer dependency would be a little bit too much. A much easier solution is to run composer update --prefer-lowest and composer update.
The prefer-lowest setting will install the lowest allowed version of every package instead of the latest version. So we test the entire suite with multiple PHP versions and with multiple sets of composer dependencies. With this combination we can be sure our code works with the versions we actually wrote down in our composer.json file.
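On the command line, the two runs boil down to this (--prefer-stable is commonly added so prefer-lowest does not pick unstable dev versions):

    # install the lowest versions allowed by composer.json and run the tests
    composer update --prefer-lowest --prefer-stable
    vendor/bin/phpunit

    # install the highest allowed versions and run the tests again
    composer update
    vendor/bin/phpunit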
Running single package tests
To ease single package tests and to make sure they will not load dependencies that are not in the composer.json of the package, I run the tests in an isolated docker container. I made this accessible with bin/run-package-test in the monorepo. Running the tests creates a coverage file for the tests found in that package only. I made sure you can download these files as artifacts.
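The actual bin/run-package-test script is specific to Apie, but conceptually it does something like this (the docker image name is a placeholder; it assumes an image with composer and pcov available):

    #!/bin/bash
    # usage: bin/run-package-test <php-version> <package-name>
    PHP_VERSION=$1
    PACKAGE=$2

    # Run the package tests in an isolated container, so only the dependencies
    # of the package's own composer.json are installed. Mounting the monorepo
    # at /app keeps the file paths identical between runs, which matters later
    # when merging coverage files.
    docker run --rm \
        -v "$(pwd)":/app \
        -w "/app/packages/$PACKAGE" \
        "my-php-test-image:$PHP_VERSION" \
        sh -c "composer update && vendor/bin/phpunit --coverage-php coverage/$PACKAGE.cov"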
Workflow matrix
We can run our single package tests in parallel as they can be tested standalone. Luckily GitHub Actions has a matrix setting that can be used to run the same job with different settings. We can use it to run all our individual package tests with different settings (lowest or highest composer dependencies, which PHP version). There is a limit of 256 jobs per matrix, and if you have a private repository it will cost you a lot of GitHub minutes. We first create a job that determines all the packages:
provide_packages_json:
  name: Determine all available packages
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: shivammathur/setup-php@v2
      with:
        php-version: 8.3
        coverage: pcov
    - uses: "ramsey/composer-install@v3"
      with:
        dependency-versions: "lowest"
    - id: output_data
      run: echo "matrix=$(vendor/bin/monorepo-builder packages-json)" >> $GITHUB_OUTPUT
  outputs:
    matrix: ${{ steps.output_data.outputs.matrix }}
Symplify/monorepo-builder has an executable that outputs a JSON list of all packages it could find. We store this in a matrix output.
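For illustration, with hypothetical package names, that JSON output could look like this:

    ["core", "common", "serializer", "console"]

Now we can use a workflow like this for every package test and execute all our tests individually: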
run_phpunit_per_package:
  name: Run tests
  needs:
    - provide_packages_json
  runs-on: ubuntu-latest
  strategy:
    fail-fast: true
    matrix:
      php_version: ['8.1', '8.3', 'latest']
      package: ${{fromJson(needs.provide_packages_json.outputs.matrix)}}
  steps:
    - id: coverage
      uses: actions/checkout@v4
    - uses: "ramsey/composer-install@v3"
      with:
        dependency-versions: "lowest"
    - name: Run tests for ${{ matrix.package }} in php ${{ matrix.php_version }}
      run: bin/run-package-test ${{ matrix.php_version }} ${{ matrix.package }}
    - name: Upload reports' artifacts
      if: success() || failure()
      uses: actions/upload-artifact@v4
      with:
        name: ${{ github.run_id }}_artifact_${{ matrix.php_version }}_${{ matrix.package }}
        if-no-files-found: ignore
        path: ${{ github.workspace }}/coverage
        retention-days: 1
If you have seen a GitHub Action before, it's very easy to read and see what it does. If you have never seen one before, it requires some reading and understanding. Basically it checks out your code, installs the composer dependencies at the lowest possible versions, runs the test suite with a bash script and uploads the coverage output to GitHub as an artifact, so you can download the results. It does this with the matrix setting, so it runs every package with PHP 8.1, PHP 8.3 and the latest available PHP version.
I could also add a third matrix setting for installing the lowest or highest composer dependencies, but doing so would easily exceed the 256-job limit once I add even more packages.
Merging code coverage
In our codebase we have code like this:
    // PHP 8.2 added support for null as a standalone type,
    // so older versions get a nullable string instead.
    if (PHP_VERSION_ID < 80200) {
        return '?string';
    } else {
        return 'null';
    }

and code like this:

    // branch on the presence of an optional dependency
    if (!class_exists(\OptionalLibrary\Object\Test::class)) {
        throw new \RuntimeException(
            'To enable this feature, you need to install the optional library with "composer require optional-library/example"'
        );
    }
It's not possible to get 100% code coverage in a single test run with these code examples, but we do want an accurate code coverage figure. Luckily PHPUnit can generate code coverage files in formats that can be merged with the tool phpcov.
Code coverage report formats
You can output multiple code coverage formats in PHPUnit. Some exist purely for human readability: 'html' outputs to HTML and 'text' outputs to your screen. The other formats are more useful for merging code coverage:
- Cobertura: a format often used in Java applications.
- Crap4J: another format that comes from the Java world. It's best known for its CRAP rating on code, which PHPUnit also generates.
- PHPUnit XML: the native format generated by PHPUnit; outside PHPUnit nobody uses this format.
- Clover: yet another format that originates from the Java world. However, this format is often the default format used by generic tools. For example, Codecov expects a Clover file to generate a coverage badge for the code.
- PHP: this is not really a format, but a PHP file that recreates the original code coverage report object. You can include it in a PHP script to get the code coverage back. This is the only format that can be used to merge code coverage, and the versions have to match!
Merging code coverage output files
So the only way to merge code coverage is by using the PHP format. We can use the phpcov command for this, which can be installed as a phar or with composer. The bash command looks like this:
php -d memory_limit=-1 vendor/bin/phpcov merge --html projectCoverage --clover coverage.xml ./coverage
We are merging so many coverage files that we need to disable the memory limit in PHP. Sadly, phpcov is a very developer-unfriendly command: at the moment you execute it, you need the actual source files as well, and they are stored as absolute paths. If a file can not be found, it is silently ignored. The code coverage report objects are also very 'array-typed', so modifying them in your own code is a pain too (see my article about the bad overuse of arrays in PHP). To fix it, I made sure all tests run in a docker container with matching paths so I can easily merge the coverage files. This really took some serious time to figure out, as I continuously ended up with an empty coverage report and phpcov provides no warning about missing files.
Publish code coverage results
Publishing the code coverage as readable HTML is easy. You need a working GitHub Pages setup and you need to output the code coverage as HTML. Then you can commit the generated HTML to your GitHub Pages branch/repo and read the code coverage.
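One way to automate that commit (not necessarily how Apie does it) is a deploy step that pushes the generated HTML folder to the gh-pages branch, for example with the peaceiris/actions-gh-pages action:

    # sketch: publish step added after the phpcov merge step
    - name: Publish coverage report to GitHub Pages
      uses: peaceiris/actions-gh-pages@v3
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: ./projectCoverage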
In many projects you can see status badges in the README. There are many external services that provide status badges, for example to display the latest stable and dev version. But for code coverage, the best you can usually do is use an external service like Codecov.
But there is an alternative. You can use a GitHub Action to calculate the code coverage yourself and create a coverage badge yourself. If you create an SVG and put it in the same folder as the code coverage, you can easily push a coverage badge that can be used in your README.md. For example, apie/core's badge is generated at https://apie-lib.github.io/projectCoverage/coverage-core.svg, so in a README I can easily show it like this:
[![Coverage](https://apie-lib.github.io/projectCoverage/coverage-core.svg)](https://apie-lib.github.io/projectCoverage/app/packages/core/index.html)
and it will render as a badge showing the current coverage percentage.
Calculating code coverage
The simplest way to calculate the code coverage percentage is parsing the HTML output or reading the Clover format. I read the Clover format, as it is more suitable for an external script than the HTML output. I also group it by package:
    // coverage.xml is the merged Clover file generated by phpcov earlier
    $contents = file_get_contents('coverage.xml');
    $xml = simplexml_load_string($contents);
    $files = $xml->xpath('//file');
    $groups = [];
    foreach ($files as $file) {
        $attributes = $file->attributes();
        $filePath = (string) $attributes->name;
        // every package lives in /app/packages/<package name>/, so we can
        // group the coverage metrics per package from the file path
        if (preg_match('|/app/packages/([^/]+)/|', $filePath, $matches)) {
            $subfolder = $matches[1];
        } else {
            echo "Skipped: $filePath \n";
            continue;
        }
        $metrics = $file->metrics->attributes();
        $elements = (int) $metrics->elements;
        $coveredElements = (int) $metrics->coveredelements;
        if (!isset($groups[$subfolder])) {
            $groups[$subfolder] = [
                'elements' => 0,
                'coveredelements' => 0,
            ];
        }
        $groups[$subfolder]['elements'] += $elements;
        $groups[$subfolder]['coveredelements'] += $coveredElements;
    }
Basically I read all <file> tags and look at the covered elements attributes in the <metrics> tag inside each <file> tag. There are multiple ways to measure code coverage, so the exact numbers could differ for you. It always remains an indication of how well the code is tested.
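With these totals per package you can render the badges. A minimal sketch of how an SVG coverage badge could be generated (the actual badge layout Apie uses may differ):

    // render a minimal coverage badge as an SVG string (sketch, not Apie's exact badge)
    function renderBadge(int $covered, int $total): string
    {
        $percentage = $total > 0 ? (int) round(100 * $covered / $total) : 0;
        // red below 50%, orange below 80%, green otherwise
        $color = $percentage < 50 ? '#e05d44' : ($percentage < 80 ? '#fe7d37' : '#4c1');
        return '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="20">'
            . '<rect width="70" height="20" fill="#555"/>'
            . '<rect x="70" width="50" height="20" fill="' . $color . '"/>'
            . '<text x="35" y="14" fill="#fff" font-family="Verdana" font-size="11" text-anchor="middle">coverage</text>'
            . '<text x="95" y="14" fill="#fff" font-family="Verdana" font-size="11" text-anchor="middle">' . $percentage . '%</text>'
            . '</svg>';
    }

    // write one badge per package next to the published coverage HTML
    foreach ($groups as $package => $totals) {
        file_put_contents(
            'projectCoverage/coverage-' . $package . '.svg',
            renderBadge($totals['coveredelements'], $totals['elements'])
        );
    }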
Conclusion
So the conclusion: a monorepo is a very good way to maintain a large list of composer packages. Setting up the CI does take some time, but once you have it running, it feels like a very big accomplishment. Now with a simple commit I can refactor 33 packages at the same time and show other developers the test code coverage. And that's all thanks to a GitHub Action on steroids.
There are still some improvements I'd like to look at. For example, I want to display a code coverage report on pull requests and I also want to make the tests less susceptible to failing HTTP requests.