Don't be afraid for refactoring



It's easy that once your code 'works' that you will never change the entire structure because you are afraid you 'break' something. The result is that your code will degrade in quality over time as any chosen code structure has weaknesses, where code is not easy to write or modify. The last thing you want is code that everybody is afraid to change. The moment you have this, you have a big problem.

I'm working on the Apie library and I made the doctrine-entity-layer package. It's basically the default data layer implementation for Apie that communicates with the database. It took me about 120 hours to write the basics. You add a column and the doctrine-entity-layer will update the database with the column. It also integrates the indexing in it, so you can do a full text search on all entities. I haven't seen any PHP library ever being able to handle this without assistance from the programmer.

The first version (that was also shown in my sneak preview movie) could handle objects, string value objects, primitive properties and composite value objects. It was still missing polymorphic relations and storing arrays of objects.

I decided to make the polymorphic relations to be stored with this package, but after 16 hours I ended up with some buggy implementation that was not even working. I realised that I had to redesign the package even though the package works really nice.

Current design

The current design revolves around 2 packages: apie/doctrine-entity-converter and apie/doctrine-entity-datalayer.

apie/doctrine-entity-converter

this package is made around making Doctrine entities on the fly from a Apie domain object following the spec I originally designed. A class had methods to convert a domain object to an entity and viceversa:

class apie_resource_example_user implements GeneratedDoctrineEntityInterface
{
    #[ORM\OneToMany(targetEntity: 'apie_index_example_user')
    private Collection $_indexTable;

    #[ORM\Column()]
    #[ORM\Id]
    private string $apie_id;

    public function inject(EntityInterface $object): void
    {
        // code to move doctrine properties in existing object
        DoctrineUtils::setValue(new ReflectionProperty(User::class, 'id'), UserIdentifier::fromNative($this->apie_id));   
    }
    
    public static function getOriginalClassName(): string
    {
        return User::class;
    }
    
    public function updateFrom(EntityInterface $object): self
    {
        // code to fill in properties in the doctrine entity from a domain object.
        $this->apie_id = DoctrineUtils::getValue($object, new ReflectionProperty(User::class, 'id'));
    }
    
    public static self createFrom(object $object): self
    {
    	$instance = (new ReflectionClass($this->getOriginalClassName()))->newInstanceWithoutConstructor();
        return $this->updateFrom($instance);
    }
}

apie/doctrine-entity-datalayer

Apie uses a strict datalayer interface to store and retrieve objects. A data layer is basically a repository to retrieve entities, but also a method to update, create or delete an object.
This package uses apie/doctrine-entity-converter to create doctrine entities for a series of domain objects, creates a Doctrine entity manager with specific settings linking to the created entities and adds a class around it that follows Apie's datalayer interface. The package has no dependency to the Doctrine bundle in Symfony or the laravel-doctrine package for Laravel, but adding the Doctrine bundle does link the entity manager to the Doctrine Bundle to see all executed queries.
The only thing this package also does is automatic migrations with the Doctrine schema tool (in development mode only). This means this will happen if I create a property to an existing domain property:
  • it recreates the entity again with a link to the new property.
  • The Doctrine schema tools sees a missing database column and tries to add the new column.
  • If the migration failed it resets the database if this setting is turned on or throws an error otherwise.

Problems with the current design

I noticed several problems with it design:
  • I had a static factory that creates table classes. The problem is that if you want to extend it you end up with another method in case you want to add a new functionality in the database. I tried to work around it with a createFromContext method, but it's rather confusing.
  • You could not disable features. There always had to be indexing and it's always one index table per resource.
  • There were some bugs related to updating entities as it sometimes resulted in orphaned objects that were still kept in the database.
  • I could not make recursive references work and sadly they are needed in Doctrine. For example a one to many need a reference to it's parent, but the parent should know the child object exist as well.
  • If you have existing data you will run into problems with auto migration if the new column is not nullable. Basically every column should be nullable in all cases. That also makes polymorphic relations easier.
  • Field filters were inconsistent as it searches on the property, but the result returns the return value of getters. This makes an assumption that the getter method will always return the property value.
  • The code I wrote was convoluted with code generation, converting types between domain objects and database objects and it was completely tightly coupled with Doctrine.

Refactoring process

Step 1: extract logical blocks

So the main problem I was struggling was with the conversion of a Doctrine entity to a Apie domain object and viceversa. The only way I could test it was by actually having the entire flow tested in an integration test. Just doing the conversion was not testable and I would have to write the same logic again if I would make an other data layer class. Also it became a nightmare when I tried to make polymorphic relations work.

So the first thing I did is actually making objects that are compatible with Doctrine that can be easily converted into an Apie domain object. Basically I ended up with a Data Transfer Object with all typed public properties that contains some attribute to tell where it comes from. The interface forces you to have a method that tells you what domain class should be used. I called it apie/storage-metadata. It was completely unaware of Doctrine entities and does most magic by just converting properties using php8 attributes to tell the original link. An example class:
class UserWithAddressStorage implements StorageDtoInterface
{
    public static function getClassReference(): ReflectionClass
    {
        return new ReflectionClass(UserWithAddress::class);
    }
    public function __construct(
        #[PropertyAttribute('id')]
        public string $apieId,
        #[OneToOneAttribute('address')]
        public AddressStorage $apieAddress,
        #[PropertyAttribute('password')]
        #[SensitiveParameter]
        public ?string $apiePassword = null,
    ) {
    }
}
I wrote a class that will convert this class in a UserWithAddress domain object using the php8 attributes and reading the property types to convert types with apie/type-converter. The attributes are similar to the Doctrine entities, except simpler and not having a dynamic link to Doctrine. These classes can also be created with different ORM's in mind. Nontheless I could write unit tests for all conversions including polymorphic relations, so all the issues of the previous solution could be easily countered and tested before I replace the original code.

Polymorphic relations are handled similar to Doctrine single table inheritance. I just provide an additional argument to get the declared class of the property.

class AnimalStorage implements StorageClassInstantiatorInterface
{
    public function __construct(
        #[DiscriminatorMappingAttribute]
        private array $discriminatorMapping,
        #[GetMethodAttribute('getId')]
        private string $id,
        #[PropertyAttribute('id')]
        private string $apieId,
        #[PropertyAttribute('hasMilk', Cow::class)]
        private ?bool $apieHasMilk = null,
        #[PropertyAttribute('starving', Elephant::class)]
        private ?bool $apieStarving = null,
        #[PropertyAttribute('poisonous', Fish::class)]
        private ?bool $apiePoisonous = null
    ) {
    }

    public static function getClassReference(): ReflectionClass
    {
        return new ReflectionClass(Animal::class);
    }

    public function createDomainObject(ReflectionClass $class): object
    {
        $class = EntityUtils::findClass($this->discriminatorMapping, $class);
        assert(null !== $class);
        return $class->newInstanceWithoutConstructor();
    }
}
I also added the StorageClassInstantiatorInterface to make it possible a storage object can create it's own domain object as it depends on the discriminator mapping which class instance should be called (class Animal is abstract).

Step 2: make a code generator that creates Storage DTO's

I created apie/storage-metadata-builder that will create the storage DTO's from an existing domain object. To deal with circular references I made it work by creating a todoList construction and a chain of responsibility. I use the same context object for dealing with objects as properties. The main service is just a few lines of code:

    public function generateCode(): GeneratedCode
    {
        $code = new GeneratedCode(
            $this->boundedContextHashmap,
            new GeneratedCodeHashmap([
            ]),
        );
        $this->bootGeneratedCode->boot($code);
        while ($code->hasTodos()) {
            $context = $code->getNextTodo();
            $this->runGeneratedCode->run($context);
        }
        $this->postRunGeneratedCode->postRun($context);

        return $code;
    }
the tests involve basically seeing if the code matches exact code matches.

Step 3: replace the current Doctrine data layer

This step took most time. I had to replace doctrine/entity-converter to use apie/storage-metadata-builder with an extra AddDoctrineFields step to add Doctrine attributes.
The consequence of this architecture change is that instead of 2 packages we have to deal with 4 packages. After replacing the code I ran all the tests and the results were many failing tests:

So I had to fix all of them. There were about 5 tests that took some effort (fixing them took me several hours of work per test), there were some things I had to fix:
  • Doctrine always wants an identifier field, but not all of them  had an identifier.
  • I renamed things and many tests assumed the old names. Some tests also give not meaningful errors, like integration tests telling me they expected status code 200, not 500.
  • All code in apie/core in the Persistence namespace could be removed.
  • In some cases I had to do extra checks if a field has a Doctrine column attribute or not, because you can not add it twice and Doctrine will just not set a property if the column attribute is missing.
  • I use nette/php-generator for code generation, but promoted properties are not returned if you ask for all properties in a class.
  • The search by field name was very different as in the previous case it was always the same as the field, but now it's an array of searching terms.

Step 4: Test in a skeleton project

Once the tests succeeded and phpstan was also no longer complaining I tried to see if I broke anything in an actual application. I committed all my changes and just pushed it to the current 1.0 branch. I used the Apie skeleton project and fixed a few bugs. The biggest one was that the indexing table was not being updated properly, because of several factors. The good test coverage prevented me from making mistakes, so I was suprised this bug was not found in the test to be honest.

Refactoring conclusion

So was the refactoring successful? The original code cost me about 90 hours and an additional 30 hours for extra features. But all missing features I was expecting it would take me more than 90 hours to fix it. The refactoring cost me about 50 hours and fixing some minor details will cost me about 10 hours. Still however when I estimate all missing features would probably take another 20 hours which is a lot shorter. So the refactoring + implementing new features would take less time than the 90 hours it would take me to make it in the old flow. Luckily I have not a stable release yet, so I could easily replace it before I have to think about backwards compatibility.

I also think the new structure is better as it makes me less depending on Doctrine and gave me a lot more insights and knowledge about several libraries. There are still about a few parts in the code where you can see I still rushed the code base, but still I think the refactoring was worthwile.

And make sure you remember: every solution has advantages and disadvantages, so some features might be harder to implement than in the old situation.

Benefits of refactoring code:
  • Less time for future features which makes the code easier to maintain
  • Improving your own programming skills as a programmer
  • Improves your insight about the domain of the application.
  • If you know your code handles a few situations bad, make sure you change it ASAP before you can not change it easily anymore.
  • Is your code well tested? Then you can be quite sure you did not break anything if all tests succeeded.
Of course there are downsides about refactoring code:
  • You never know if the refactoring will actually improve your codebase. This is often the reason some programmers stop refactoring code.
  • Integration tests are fine if setup properly, but some unit tests can be thrown away after a refactoring as it requires more time rewriting a test then writing a new test.


Comments