fbpx

Hi folks, DonCallisto here.

In this article we’ll see why I personally don’t like status flags and how to handle them in a more elegant way. We’ll find out code written without flags which is semantically better and convey business rules way better. Curious? Keep reading!

Before start, I would like to be sure you understand what status flag are. Since I’m not aware of a literature definition for the concept, I come up with one (if someone finds it, please leave a comment and I’ll be happy to edit with an accurate quote). When I talk about status flag I mean a class member which decree the current status of an object (class instance). For example think of a kanban board with several columns like “confirmed”, “assigned”, “in progress”, “done” and “released”. A task inside the kanban, in a given moment, can be pinned in one and only one of these columns. It follows that a task has a status, alternatively confirmed, assigned (to someone) and so on. That’s what I mean when I use “status flag” locution. You can also think them in terms of enums. Having said that, I would not include binary flags in this definition as only two possible values can be assigned, like true or false, on or off, open or closed, and so on and so forth; further on should be clear why.

Going straight to the point, consider a piece of software responsible for blog’s articles management. An article has a status depending on the point in time you look at it. In this example possible status are: draft, ready for review, reviewed and published. There’s also a business rule we need to take into account and states that a status transition can happen only in a finite-state machine like fashion. An article is created in a draft status, then it can be moved to ready for a review, then when reviewed it acquire a reviewed status and only when reviewed it can be published. So we got a pretty straightforward state machine, where transitions happens in a one way direction from a state to the other as depicted below

To ease the understanding of the concept, Article was modeled as a simplified version of a real world scenario so we can focus on what matters the most leaving out the irrelevant details. For example I decided to avoid transitions like the one back from “ready for review” to “draft”, I also kept out some status like “need edit” when a review is done but not accepted and – you’ll see in no time – articles doesn’t have a title or other attributes one expects to find in a real world scenario. These are only some of the simplification you can find reading on this post.

So far so good, context is set and we’re now ready to go on. As Linus Torvalds said once… “Talk is cheap, show me the code!”

Consider the following PHP 8.1 classes (feel free to check out code directly in GitHub; please notice the link point to a specific commit and not to the whole repo, further on, when needed, I’ll link other commits)

User and ArticleException were left out; you can find whole code on GitHub repo linked above.

Status is an enum, a new PHP feature you can take advantage of from PHP 8.1. Status is our status flag, the one you’ve read during introduction. Looking at Article, it’s easy to find how I used Status and I’m pretty sure you, in your developer career, did wrote code in this fashion. No shame in that. Probably you’re wondering what’s wrong with Article class as it seems to model every business rule described before. The point is I’m using a single class to represent and model four different status, so four different behaviors, all in one class. You may argue that there’s only one domain concept – other than Status that you can see pretty much as a value object as it’s immutable and its identity is verified not by its id but by its values – and it is Article, but from my point of view, as long as Article should behave differently depending on its status, it should be modeled as different objects. To support latter statement, consider what follows.

  • publicationDate attribute must be nullable as it has meaning only for a certain Status. This is a smell for little cohesion. Moreover as publicationDate is nullable, getPublicationDate must be able to return null or, alternatively, must throw an exception if publicationDate is null. Either chances are bad as the first forces the client to explicitly check for null values, whereas the second forces the client to catch possible exceptions and can be fuzzy if seen from outside (why on the world publicationDate retrieval can lead to an exception?). In addition, and as a side effect, publicationDate concept has no meaning in certain situation. To prove that, go and ask your domain experts something like “what’s the publication date of a draft/ready for review/reviewed article?”. No surprise if they’ll be puzzled. The same goes for reviewer.
  • There’s a lot handling for public API. For instance, look at readyForReview public method (the same goes for all transitional method). Internally we are forced to put an extra effort to convey and guarantee a domain rule for Status changes.
  • Public API is not telling anything about the business. Leaving out is-ser methods in Article and looking at the class from the outside through its API we have readyForReview, reviewed, publish, changeContentTo, getReviewer and getPublicationDate. For a client it might seem possible to change Article status to any of the possible status regardless of its current one. Anybody which knows well domain rules is forced to look into Article code to find what can go possibly wrong. Even if domain invariants are respected, here we’re hiding them and communicating something wrong about Article nature. Talking about getReviewer and getPublicationDate, I’ve already pointed out what’s the issue.
  • Think of a statistics service (a class) designed to provide daily, weekly, monthly and yearly unique readers for a (published) article. Can you imagine its unique public method? It would be something like public function getReaders(Article $article): int. Again looking at this method some questions could be raised; one above all is what to return when an article is not published. Should we return 0 that’s equivocal with the concept of “no readers”, or should we return null, meaning statistics cannot be retrieved? Should we raise an exception if article is not published? From the point of view of a newcomer, whatever path you take, is it clear the concept behind the return value? Is it clear only Status::PUBLISHED articles will be manageables?
  • Think about entity retrieval. Let’s pretend we wish to retrieve only Status::DRAFT articles. You need to add an extra WHERE condition when querying. Yes of course it’s not dramatic from performance perspective, but this is true as long as you store few records. Reading this article you may find that your queries could have nearly 60% of overhead (when millions records stored).
  • Think about a new attribute which has meaning only for some or just one status but not for the whole object (like publicationDate). Whoever modify Article must put attention when introducing this kind of change as a forgetfulness could lead to inconsistent API and data (see it like adding a method or an attribute to a class where it does not belongs. Yes, it won’t happen when dealing with unrelated classes but could happen in this specific case).

These are some points against status flags. Maybe there could be more, or maybe you won’t consider any of these problematic. As usual your mileage may vary.

An alternative

Taking up one of previous statements

“The point is I’m using a single class to represent and model four different status, so four different behaviors, all in one class.”

Let’s see how code can be changed to express better domain concepts (GitHub code here)

Previous code was split into four classes and an interface. The interface is helpful for circumstances when an article should be handled regardless its Status. Moreover we got rid of Status object and exception object as Status is incapsulated directly into object types. New classes has narrower public APIs and domain concepts are better defined and communicated to the outside. New developers can easily understand what domain rules are when talking about transitions, no exceptions or nullable values could be raised or returned, narrow services API (you can choose to accept only an article in a specific status, if needed), and so on and so forth. Finally, if a new attribute and/or API method should be added it’s easier not to mess up other classes with senseless concepts.

Of the critical points showed above, only questionable one is the retrieval. If it is true that a WHERE condition could lead to worse performance, it’s also true that retrieving all articles (in any state) could require up to four queries (unless you use some kind of technique like single table inheritance or similar which can bring out some other kind of issues).

A practical example

As I didn’t want to left the reader with only a “conceptual” solution where these objects weren’t shown “in action”, I’ve decided to develop a nearly real world scenario. Let’s say we would like to export the whole list of articles in different formats. When using a single object it is child’s play as all methods reside in the same class and only one class is involved, but how we can do this when working with four different objects, considering that every class has its own methods and type?

See the code

An interface is defined for articles (not entities one, but a sort of DTO as I would not wanted to bloat domain class with application concepts) to accept a Visitor, following the Visitor Pattern

Then two types of visitors are defined: one for array format and the other for json format, both under ArticleVisitor interface.

ArticleJsonVisitor is defined as a decorator for ArticleArrayVisitor.

I would not report ExportableArticle implementations as you can find whole code in GitHub repo.

Lastly here’s the exporter

where VisitorAbstractFactory is a factory used to get both visitor and aggregator (find it, again, on GitHub repo) in an Abstract Factory pattern fashion. As you can see we’re using ExportableArticle that’s the corresponding (more or less) for domain’s ArticleInterface; just to show how easy is to export all kind of objects as it would have been with a single Article class.

Lastly, a working example

Here’s the output

Looking at this example is easy to understand that splitting Article into separated and more focused classes does not lead to harder handling. Of course there are more classes but that’s the price for flexibility and pattern usage: the more you’re abstract, flexible and have “quick to change” code, the more classes and abstraction you have in your codebase.

What’s next?

Following this solution could force a change int the way you code (and rightfully so, sometimes). One of the possible scenarios left out is about persistence. When there’s only an Article class persistence is just straightforward, but as soon as you model domain this way some decisions must be taken. For instance, how to handle the ids? As a transition from one state to the other happens to create a new entity, should the new entity take the old id or should it have a brand new one? If old id is used, who have to take care of persistence order (as id must be unique)? If new id is used, how to manage something like permalink or caching strategy or any kind of comparable issues? I would not provide specific answers as you might apply different choices based on different situations but, to give some pointers, if old ids are chosen, so if the id will be assigned from starting entity to new one, a solution could be to wrap persistence operations in a transaction, perform a delete first then add new entity. If new ids are chosen is it possible to use “surrogate identity” (or something similar) and provide to the outside world an immutable id. The key concept here is don’t let the application or infrastructure fool you when modeling your domain: if you choose not to split things because of persistence tools like ORMs you can fall into worst issues. Moreover data is your ultimate source of truth as application and code may vary along the time but data must adhere to business/domain rules and having a unique Article entity could lead to bugs and consequentially corrupted data.

Conclusions

Status flags, sometimes, are a smell for a unique class which incapsulates much more than a single concept. When internal data is meaningful for a subset or worst for a single status you’ll obtain a not so cohesive class. This kind of situation calls for a split. Also when clients behave differently depending on flags this is true. You’ve seen a split in action and what it entails. You’re now also aware that more “uncommon” decision must be taken when dealing with splitted code (when not operating into RAD logics that’s pretty much always true) and more abstract code could be part of the codebase. Be also aware that persistence have to be managed “by hand”. However what you get in return is priceless: more expressive and flexible code, no null checks due to “polymorphic” nature of the class, no fuzzy exceptions, less buggy code and data.

What do you think about this solution? Is it something you’ve already adopted? Will you try to embrace it in your next or current project? Let us know and happy coding!

DonCallisto
DonCallisto
Articoli: 21

Lascia una risposta

Il tuo indirizzo email non sarà pubblicato. I campi obbligatori sono contrassegnati *

Questo sito usa Akismet per ridurre lo spam. Scopri come i tuoi dati vengono elaborati.