Categories: Architecture, Java, Programming
Introduction
In object-oriented code, how much business logic should be co-located with the data it manipulates, and how much should be externalized? Or in other words: how rich should classes representing domain model types be?
A number of important contributors to the theory of object-oriented programming recommend rich domain models, and warn against the opposite: anaemic domain models. However there don’t seem to be any clear guidelines for deciding where to put certain types of logic. This article sets out what I can find of existing recommendations, and adds my own thoughts. Your contributions are very welcome!
While I do have a personal preference regarding this topic, it is only a tentative conclusion; this topic is hard - as can be seen by the length of this article. There are many factors involved, and hard evidence of the superiority of one approach over another is hard to find.
And by the way: anemic = US English, anaemic = British English.
Context
In this article, I’m considering programs with non-trivial business logic which offer an API (ReST, gRPC, etc) to access that functionality.
Applications which don’t have much in the way of business logic (eg basic wrappers around a database) obviously don’t have to worry about rich vs anaemic domain models. Applications which are primarily about presentation (front end stuff) may also have a domain model, but don’t have a lot of logic to place on those types and so have different trade-offs which this article doesn’t consider.
The Pros and Cons of Rich Domain Models
Any system (problem domain) which is to be represented in software is a combination of things (nouns) and processes/workflows which operate on those things. Things have properties (state) while processes do not.
Object oriented software represents those things as classes with fields/properties - aka “domain model types”. But what do we do with the processes/workflows?
Well-known software development writer Martin Fowler has expressed concerns that much of the software he sees has an anaemic domain model, ie too little logic is on the domain-model types, with it instead being on “service” types which manipulate the state of the model types fairly directly in a manner more resembling procedural programming (what he calls “transaction scripts”).
There are definitely advantages to pushing logic into domain-model types. They include:
- Readability - the types represent things in the real problem domain, and having all the logic related to that thing in one place is helpful for comprehending that type. Centralising logic for a type also helps programmers locate existing relevant functionality.
- Invariants - when all the logic that manipulates a set of state is co-located, then it is easier to verify that the state is always self-consistent ie fulfils the “invariants” for that type.
- Data hiding - state which is useful for implementing behaviour but which is not relevant for users of the type can be hidden, thus simplifying the API of the type.
- Polymorphism - related types which support a logical operation with different implementations can be elegantly implemented via virtual method dispatch.
Unfortunately there are some issues with implementing logic as methods on a specific domain-model-type:
- Persistence - should a domain type be responsible for loading/saving itself and other objects it interacts with?
- Operations which require “supporting objects” to implement - how should a reference to these be obtained?
- Code clutter - does it really improve the readability and testability of a model type to have (potentially multiple) complex workflow implementations defined directly on that type?
- Distributed systems - when objects are being passed between processes then the external properties of the types become the dominant concept; does it then still make sense to talk about “rich” representations of this type in specific components of the distributed system?
These topics are discussed below.
Other Programming Styles
Object-oriented design is of course not the only way to implement software systems.
Procedural systems clearly do not have “rich domain model types”, and there are many successful such programs. I would suspect that the most successful are at least highly modular, where the modules correspond to collections of functions and types - and therefore they can be seen as at least sharing the OO principle of “co-locate data and logic” and thus support those benefits of readability/invariants/data-hiding to a degree. Polymorphism is of course not directly possible; procedural programs typically contain various if-statements testing parameter types followed by type-casts. Alternatively, tables of function-references (manual virtual dispatch) can be used. There are also a few non-object-oriented languages which directly support some kind of dynamic dispatch at runtime based on parameter type - eg Rust’s trait objects or Groovy/Raku’s multiple dispatch.
Functional programming is similar to procedural in this aspect.
Anyway, the point is: even in an OO language (such as Java for example), it is clearly possible to write successful programs with “anaemic” domain-model-types which really resemble procedural/functional data-structures rather than classes. So the question is not a binary “anaemic or rich”, but: where is the sweet spot on this scale when working in an object-oriented language?
By the way, I’m not quite sure where to allocate the concept of “extension methods” as provided by C#, Kotlin, and various other languages. Is the logic then part of the “domain model” or not?
Other Articles on this Topic
Many people have presented their thoughts on this topic. Some of the most interesting articles/presentations are:
- Martin Fowler: Anemic Domain Model - a short and often-quoted critique of “Anemic” domain models
- Vaughn Vernon: Effective Aggregate Design - an excellent guide from the perspective of Domain Driven Design
- Jimmy Bogard: Wicked Domain Models - a one-hour stage presentation (YouTube) on the basics of domain models
- Kamil Berdychowski: Domain-Driven Design vs Anemic Model
Other interesting articles/comments can be found in section “Further Reading” at the end of this article.
The Easy Bits
One aspect of “rich” domain models is to ensure they really map to business concepts. That’s not trivial to do, but is not controversial.
Another aspect is to avoid primitive types as much as possible. For example:
- If a property represents one of a set of possible values, use an enum and not an integer.
- If a property represents a money amount, then some type representing that should be used, and not a raw float.
- If some property naturally has two parts, then create a type that represents that pair of objects rather than having two properties on the model type.
- Avoid boolean-typed properties where possible; an enum is usually a better choice.
And so forth. It’s a little more code, but the clarity and type-safety is very likely to be worth it in the long run.
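As a minimal sketch of such value types (the names Money and AccountStatus are mine, not from any particular code-base; records require Java 16+):

```java
import java.math.BigDecimal;
import java.util.Currency;

// A money amount as its own type, rather than a raw float property.
record Money(BigDecimal amount, Currency currency) {
    Money {
        if (amount == null || currency == null) {
            throw new IllegalArgumentException("amount and currency are mandatory");
        }
    }
}

// An enum rather than an integer "status code" or a boolean "isOpen" flag.
enum AccountStatus { OPEN, SUSPENDED, CLOSED }
```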
Each type should have a set of invariants, ie rules which define which values its properties can hold, what relations exist between those properties, and which value transitions are allowed. The operations that make changes to the properties of a type must ensure the invariants are preserved (and reject modifications otherwise); this is best done when the mutation operations (or, for immutable types, the operations that create a modified copy of the object) are methods on the model type. Examples are:
- Email address must not be null or empty
- When address1 is non-null then postcode must also be non-null
- State may never be changed from CLOSED to OPEN
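A hypothetical sketch of a type enforcing invariants like these (reusing the AccountStatus enum from the sketch above):

```java
public class Account {
    private String emailAddress;
    private AccountStatus status = AccountStatus.OPEN;

    public Account(String emailAddress) {
        setEmailAddress(emailAddress);
    }

    // The setter rejects any change which would violate the invariant.
    public void setEmailAddress(String emailAddress) {
        if (emailAddress == null || emailAddress.isBlank()) {
            throw new IllegalArgumentException("email address must not be null or empty");
        }
        this.emailAddress = emailAddress;
    }

    public void close() {
        this.status = AccountStatus.CLOSED;
    }

    public void reopen() {
        // Invariant: state may never be changed from CLOSED to OPEN.
        if (status == AccountStatus.CLOSED) {
            throw new IllegalStateException("a CLOSED account cannot be reopened");
        }
        this.status = AccountStatus.OPEN;
    }
}
```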
Somewhat related to invariants is dealing with concurrency; if the domain model type is mutable then it should ensure that concurrent calls result in correct behaviour. Leaving those checks to code external to the type greatly increases the risks of inconsistent/insufficient locking.
Where possible, plain property-setters should be avoided. In particular:
If you are calling two setters in a row, you are missing a concept (Oliver Gierke).
However the pushing of logic down into model types becomes more problematic when the logic needs references to objects other than the one on which the method is defined. Methods that manage “child objects” are generally not a problem, but accessing other objects which are not “simple children” of the model type can lead to issues. This is addressed in the following sections.
Persistence
One of the major items affecting the functionality that can be available on a domain model is persistence. There are two main aspects which affect the domain model code:
- How deep/complex are the in-memory references between domain-model-types (ie how big is the graph of references for each type)
- Are the domain model types persistence-aware or is that handled outside of the types?
Reference Complexity
Regarding references between domain-model types, Martin Fowler says:
There are objects, many named after the nouns in the domain space, and these objects are connected with the rich relationships and structure that true domain models have.
However this isn’t very clear on exactly how many relationships are considered appropriate. Eric Evans’ DDD principles provide the very helpful concept of an aggregate - a set of domain model types which is atomically read or written. The top (and often only) domain model object in an aggregate is called the aggregate root. Vaughn Vernon (a major contributor to the concepts of DDD) has an excellent and detailed guide to defining the boundaries of aggregates, and this guide leans strongly towards very small graphs ie recommends against domain objects having complex (rich) references to other domain objects in memory.
From part 1 of Vaughn Vernon’s guide:
.. a high percentage of aggregates can be limited to a single entity, the root.
And also:
aggregates are chiefly about consistency boundaries and not driven by a desire to design object graphs.
It therefore seems that Martin Fowler’s recommendation “connected with .. rich relationships” is meant as a logical concept rather than one implying deep graphs of references between in-memory objects at runtime. Each aggregate (set of domain model types) is carefully chosen to match the transactional requirements of the application, and the types in that aggregate then hold only the IDs of logically related objects from different aggregates rather than real references to them.
Or in short, your domain model doesn’t need to be loaded into memory as a complex graph in order to be “a rich domain model”.
Vaughn Vernon is very explicit about this in part 2 of his guide to aggregates. On page 8 he states:
Prefer references to external aggregates only by their globally unique identity, not by holding a direct object reference
ie a domain-model-type should have properties that hold only IDs of external entities, not direct references
Then in “Model Navigation” (also page 8):
Some will use a repository from inside an aggregate for look up. This technique is called disconnected domain model, and it’s actually a form of lazy loading. There’s a different recommended approach, however: Use a repository or domain service to look up dependent objects ahead of invoking the aggregate behavior.
Internal or External Persistence
Some code obviously needs to load the initial aggregate (top-level domain object and its immediate children) that any use-case interacts with, and then call the relevant method(s) on it. This code is part of the service/application layer.
But how do we handle cases where business logic needs to interact with other entities that are not part of the same “aggregate”? Options are:
- Methods on domain-model types use a persistence-context/repository/dao helper to load additional objects as needed
- The service/application code populates domain-model objects with references to all the things that the methods being invoked will need.
- The service/application code provides references to those extra objects via parameters of domain-model methods.
- Logic that interacts with objects outside of that initial set is defined in service/application code and not in the domain model.
Option 1 (“internal persistence” aka “disconnected domain model”) allows the maximum amount of logic to be pushed down into domain model types. The presentation from Bogard linked to earlier happens to use this style (but see minute 36 where the issue of external services is addressed)[1]. However there are some issues:
- Each domain entity needs some way to get at the relevant “persistence context”.
- The performance implications of methods on the domain model type aren’t clear (persistence operations are hidden in the implementation).
- Business logic is mixed with persistence operations, and potentially also error-handling.
Option 2 requires the calling code to be very aware of which properties on the model types are mandatory for which operations - an undesirable coupling. The model type will also have properties which are only “valid for use” (set by the caller) in some cases - an inelegant situation.
One potential issue for both options 1 and 2 is a circular dependency: persistence helpers depend on domain model types (those types are what they read and write), while these options also require a dependency from the domain types on the persistence layer. Persistence frameworks whose APIs use generics are not affected, but approaches which use strongly-typed persistence APIs may be.
In part 2 of his DDD-aggregates article, Vaughn Vernon describes options 1 and 3 (see “Model Navigation” on page 8), but recommends option 3, ie that when a domain-model-type needs to interact with types that are not part of the aggregate then:
- the domain-model type should hold just the ID of those other types (aggregate roots);
- domain-model methods which need other types should take them as parameters;
- such methods should only read from those additional types, not mutate them; and
- the application-layer (ie service-type) should use those IDs to fetch required objects before invoking a method which needs them.
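As a sketch of what option 3 might look like (all type names here are hypothetical, not from Vernon’s article):

```java
interface OrderRepository { Order findById(String id); void save(Order order); }
interface CustomerRepository { Customer findById(String id); }

record Customer(String id, boolean premium) {}

class Order {
    private final String id;
    private final String customerId; // external aggregate referenced by ID only
    private double discountPercent;

    Order(String id, String customerId) { this.id = id; this.customerId = customerId; }

    String getCustomerId() { return customerId; }

    // The external aggregate is passed in as a parameter, and is only read, never mutated.
    void applyLoyaltyDiscount(Customer customer) {
        if (!customer.id().equals(customerId)) {
            throw new IllegalArgumentException("wrong customer for this order");
        }
        this.discountPercent = customer.premium() ? 10.0 : 0.0;
    }
}

// Application/service layer: uses the stored ID to fetch the dependent
// aggregate ahead of invoking the aggregate behaviour.
class OrderService {
    private final OrderRepository orders;
    private final CustomerRepository customers;

    OrderService(OrderRepository orders, CustomerRepository customers) {
        this.orders = orders;
        this.customers = customers;
    }

    void applyLoyaltyDiscount(String orderId) {
        Order order = orders.findById(orderId);
        Customer customer = customers.findById(order.getCustomerId());
        order.applyLoyaltyDiscount(customer);
        orders.save(order);
    }
}
```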
This seems to be a good balance. This approach:
- allows business logic that requires additional objects to still be part of the domain-model type;
- leaves responsibility for persistence in the calling layer ie does not mix persistence and error-handling with business logic;
- simplifies unit-testing of domain model types (no persistence to mock);
- doesn’t require injecting additional references into domain model types (ie avoids problems with options 1 and 2);
- makes dependencies clear (domain methods which require objects outside the aggregate have parameters which make that explicit);
- but does make code that interacts with entities external to the aggregate a little odd/unnatural in that it requires those entities to be provided as parameters even though the type has the IDs of those entities as properties.
Option 4 is effectively falling back to procedural/functional programming for more complicated business logic. As noted earlier, such code can work fine ie this is an option. It does move the domain model types more towards the “anaemic” part of the spectrum.
Lazy-loaded references are somewhere between option 1 and 2. The domain model type has a property which is “uninitialised” until read; some methods on the type reference that property while others ignore it. This allows a “natural” representation of child objects without having the performance impact of loading them if they are not needed. However it can lead to somewhat surprising performance behaviour; lazy loading is addressed in the next section. Note that it is probably best to consider the target of such lazy references as still being part of the same aggregate and not as a mechanism for referencing other aggregates.
So far, we’ve been talking about read-operations. But what about writes?
When using a document database, saves of an aggregate are relatively simple. Document databases don’t support update-operations on specific properties, and nested child entities are (re)persisted with their owning document/aggregate so there isn’t much to optimise - just store the new version of the whole aggregate. When using a relational database there is theoretically a benefit to updating just specific columns in a table row but in practice in most cases re-writing all columns in a row is adequate ie per-property change tracking isn’t necessary. Updating related entities is, however, important to optimise; deleting and reinserting all children of an entity is just not acceptable. It is necessary to know which child objects have been added, which have been deleted, and which have been updated.
Efficient loading/saving can be done by performing persistence operations as needed (eg issuing “update” SQL statements at appropriate points in the business logic), or it can be done by doing some kind of change-detection after the business logic has completed to detect which SQL statements need to be issued.
The “internal persistence” aka “disconnected domain model” aka “active record” pattern approach is one in which domain model types themselves are persistence-aware. In some cases (particularly the “active record” pattern), model types are generated - which obviously does not easily support adding domain logic to those types. In others, types must inherit from specific base types; this is a problem if working in a language which only supports single-inheritance. And either of these approaches can make unit-testing difficult. The best (cleanest) case is where each domain type simply has a single reference to the appropriate repository/dao service and has a few methods which delegate to that, eg save() just calls myrepo.save(this). This allows mocking of those persistence operations, and avoids any complicated persistence logic in the domain model type (just a simple delegation). Nevertheless, having that reference on each model instance complicates instantiation of those types. Domain experts can generally be assumed to understand the need for persistence, ie adding persistence operations as part of domain logic doesn’t greatly interfere with the “communication” aspect. However persistence operations can fail, so performing them as part of methods on domain model types will complicate the method implementations with aspects that aren’t really business-logic-related.
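A minimal sketch of that cleanest case (names hypothetical):

```java
interface InvoiceRepository { void save(Invoice invoice); }

class Invoice {
    // Injected at instantiation - which is exactly what complicates construction.
    private final InvoiceRepository repo;

    Invoice(InvoiceRepository repo) { this.repo = repo; }

    // Persistence is a simple delegation; the repository can be mocked in unit tests.
    void save() {
        repo.save(this);
    }
}
```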
Various frameworks (eg Hibernate, JDO) support automatic entity change detection, ie are consistent with options 1-3 above. Service/application-layer code loads a model, calls methods on it, then the framework figures out which entities have been added/removed/modified and figures out the necessary SQL statements to apply the corresponding changes to the database. However the price is very high: proxies wrap all domain types and collections of types in order to do this detection, an approach which is often fragile. It also blurs the boundary between persistence-entity-types and domain-model-types, ie increases the risk that the domain-model gets distorted to match the needs of the persistence layer. And, in the most common implementation, the domain model types get cluttered with persistence-related annotations.
Option 1 doesn’t require such change-detection; as domain methods change entities, it can register them for persistence later or just persist them immediately.
When using options 2 or 3, but not using automatic change-detection, then there are a few possibilities. The original and current state of objects and collections can be explicitly compared against each other to detect modified objects. Or objects can themselves provide “dirty flags”. Or domain model type methods can return information about which entities have been modified to inform the caller which objects need persistence.
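A sketch of that last possibility (names hypothetical): the domain method reports what it changed, so the calling layer knows which persistence operations to perform.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

record OrderLine(String id) {}

class Order {
    private final List<OrderLine> lines = new ArrayList<>();

    // Returns the removed entity (if any) so the service layer can issue
    // the corresponding delete - no automatic change-detection required.
    Optional<OrderLine> removeLine(String lineId) {
        for (Iterator<OrderLine> it = lines.iterator(); it.hasNext(); ) {
            OrderLine line = it.next();
            if (line.id().equals(lineId)) {
                it.remove();
                return Optional.of(line);
            }
        }
        return Optional.empty();
    }
}
```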
Option (4) solves the problem by simply placing such modification logic on the service/application layer where persistence operations can be immediately carried out.
IMO none of the above are truly elegant:
- Active-record/disconnected-domain-model/internal-persistence mixes persistence and business logic (including persistence-error-handling), and requires injecting references to persistence support types into every domain type. With some implementations it can also mess with the inheritance structure and interfere with unit/integration testing.
- Injecting additional aggregates as properties on the domain model types depending on use-case (invoked method) increases coupling.
- Passing external entities as parameters is somewhat odd when the receiving type has their IDs already.
- Moving business logic to the service layer reduces the “object-orientedness” of the application which can potentially lead to duplicated code or unenforced invariants.
However on balance, option (3) combined with automatic or manual change-detection seems a good compromise - at least with respect to invoking persistence services. The use of other services, and consistency issues, are discussed later.
Whichever option is chosen, this affects the appearance of the domain model type APIs, and the way such types are instantiated. When option (4) is chosen, it also reduces the “richness” of the domain model types - though that isn’t necessarily a bad thing.
Note that in options 1-3 it helps when the aggregate contains only a small number of domain model types (ideally one); this limits the places where references need to be injected (options 1/2) and the depth of call-chains (option 3).
Lazy Loading
Some persistence frameworks support “lazy loading”. When a domain type is instantiated via “loading from a database” and has a collection of related entities, that collection can be initially “unloaded”. If (and only if) the collection is referenced, then the relevant database operation is executed in order to load that data.
This allows the same domain type to be used in multiple use-cases: some where that child collection is used, and some not.
Vernon makes clear that lazy loading is not coupled to the concept of an aggregate. An aggregate is the set of objects which must be atomically persisted together in order to fulfil system invariants. There are therefore two types of lazy loading:
- When the reference is to an entity that is still part of the aggregate - in which case this is just an optimisation of the aggregate to avoid loading in unnecessary circumstances.
- When the reference is to an entity outside of the aggregate - in which case logic must not mutate that object as that would effectively turn two aggregates into one ie break the model rules.
This is one case where persistence annotations on domain model types might be helpful; when annotations are used then it is clear which properties can trigger lazy-loading. When externalised ORM mapping is used (eg JDO mapping APIs or XML mapping specs) then it may not be clear to the reader of the code what the performance implications of accessing a specific property are - or indeed where the aggregate boundary lies.
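For example, with JPA annotations the loading behaviour is at least visible at the property declaration (a sketch; LAZY is in fact the JPA default for @OneToMany):

```java
import jakarta.persistence.*;
import java.util.List;

@Entity
class OrderLine {
    @Id String id;
}

@Entity
class PurchaseOrder {
    @Id String id;

    // Intra-aggregate children, loaded only when first accessed.
    @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true, fetch = FetchType.LAZY)
    List<OrderLine> lines;
}
```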
Lazy collections are either completely loaded or not at all. Code which iterates over such a collection, selecting just a subset of those items for processing, is much less efficient than if only that subset of items had been loaded from the database.
Depending on the framework, inserting a new member into such a collection could also trigger loading of all existing items - even though they are not actually needed to perform an insertion of a new record into the database.
IMO it is therefore a difficult decision whether to use lazy loading for “intra-aggregate” references or not. It could potentially be avoided by having a base domain type without the related entity collection, and a subclass with that property and the business methods that use that property. The service/application code which loads the type from the database instantiates the parent type when invoking methods that do not need the related entities, and the child type otherwise. This approach does, however, distribute the logic for one logical model type across parent and child class definitions. Using lazy loading for “inter-aggregate” references seems quite dangerous to me, tempting developers to modify objects that are not part of the aggregate; passing external aggregates as parameters makes this much clearer.
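The parent/child split described above might look roughly like this (hypothetical names):

```java
import java.math.BigDecimal;
import java.util.List;

enum OrderStatus { OPEN, CANCELLED }
record OrderLine(String id, BigDecimal price) {}

// Base type: sufficient for use-cases which never touch the order lines.
class OrderSummary {
    protected String id;
    protected OrderStatus status = OrderStatus.OPEN;

    void cancel() { status = OrderStatus.CANCELLED; }
}

// Subclass: adds the child collection plus the business methods which need it;
// the code loading the order chooses which type to instantiate per use-case.
class OrderWithLines extends OrderSummary {
    private List<OrderLine> lines;

    BigDecimal computeTotal() {
        return lines.stream().map(OrderLine::price).reduce(BigDecimal.ZERO, BigDecimal::add);
    }
}
```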
Accessing Services
We have already looked at one reason why business logic might need to interact with system services: persistence. And it seems that while it is possible to inject references to persistence helpers into domain objects so that such logic can be part of domain model types, it is also possible to do the persistence in the service/application layer while still having business logic on domain model types.
However there are other cases where business logic needs to access services (complex non-business/non-stateful logic). Examples include:
- Making ReST calls to external systems[2].
- Sending emails (interacting with an SMTP server or similar).
- Rendering to PDF format.
- Validating a user password (rules may be complex or even configurable).
As with persistence, there appears to be a limited set of options:
- Inject a reference to the service into a property of some domain model type.
- Pass a reference to the service into invoked methods which need it.
- Implement logic which needs such services in the service layer, not the domain model type.
Both options 1 and 2 are effectively equivalent to the “active record” approach for persistence, ie the domain model type holds logic that interacts directly with external services - and in the case of ReST calls, must handle errors.
As described in the section on persistence, option 1 complicates instantiation of domain model types. In particular, if the persistence layer is instantiating the type as part of a “load” operation, then injection of references to additional services needs to be somehow hooked into that persistence layer (eg instantiation is done via a factory rather than a default constructor). Loading objects in other ways (eg based upon JSON received as part of a ReST request) also needs to inject the services appropriately.
And as described in the section on persistence, option 2 is somewhat clumsy. It exposes details in the API that ideally would be internal implementation details.
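For illustration, option 2 might look like this (a sketch with hypothetical names):

```java
// The complex/configurable logic lives behind a service interface.
interface PasswordPolicy {
    boolean isAcceptable(String candidate);
}

class User {
    private String passwordHash;

    // Option 2: the needed service is passed in by the caller, so the domain
    // type holds no service references and remains trivial to instantiate.
    void changePassword(String newPassword, PasswordPolicy policy) {
        if (!policy.isAcceptable(newPassword)) {
            throw new IllegalArgumentException("password does not satisfy the policy");
        }
        this.passwordHash = hash(newPassword);
    }

    private static String hash(String s) {
        return Integer.toHexString(s.hashCode()); // placeholder only, not a real password hash
    }
}
```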
Putting such logic in the service/application layer does make the domain model types “less rich” but is a valid alternative.
Any other suggestions (or better proven solutions) would be very welcome!
Polymorphism-related Clutter
Is there an upper limit to the amount of process-like business logic that should be added to a domain model type? Is there a point at which it is clearer (and more testable) to represent a process as its own thing, rather than as a behaviour of a domain model type? If there are N different complicated processes that apply to a particular type, is it reasonable to define all that logic on one type? And if additional processes for the same type may be defined later, will it really be more elegant to add that logic to the existing type or to create a new class for that new process?
And what about logic that is applied to a set of similar types? While it is technically possible to add such logic to an “abstract base type” that relevant subtypes inherit from, I believe inheritance of implementation is falling out of favour. I certainly do not like it, and prefer either:
- putting the logic in a “helper function” which takes a domain-model-type as a parameter, and have concrete model classes call that as needed;
- applying the strategy pattern, ie invoking a domain model method passing an object which the method then calls back into; or
- putting the logic in a “service type” which simply calls into the domain-model-type (and yes, this is “procedural” in style)
In all cases, the actual logic is external to the domain model type, ie the domain model can’t really be considered “rich”.
The first two options do increase the ability to do data-hiding, ie reduce the need for getters/setters on the domain model type, as the data passed to the logic is provided by the domain model type itself. Those options also do make it easier to find all logic that applies to that type - but at the cost of clutter, and of needing to modify the domain model type in order to add new processes.
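As an example, the strategy-pattern option might look like this (hypothetical names); the domain type feeds its own otherwise-hidden state into the external logic, so no extra getters are needed:

```java
import java.math.BigDecimal;

// The external logic, defined once for all types which support discounting.
interface DiscountStrategy {
    BigDecimal discountFor(BigDecimal baseAmount, int customerYears);
}

class Subscription {
    private BigDecimal baseAmount = new BigDecimal("100.00");
    private int customerYears = 3;

    // The method passes internal state to the strategy and applies the result.
    BigDecimal priceWith(DiscountStrategy strategy) {
        return baseAmount.subtract(strategy.discountFor(baseAmount, customerYears));
    }
}
```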
Dealing with Distributed Systems
When working in a distributed environment, and particularly in a multi-language environment, what does it mean to “transfer an object” between two processes? It really can only mean transferring the properties; the code cannot be sent. And therefore there is immediately a hard “break” between the concept of a “rich domain model type” with complex functionality in one application, and the fact that all that is transferred is a set of properties. Any properties of a type which are there just to “provide references to helpers needed by rich-domain-type methods” must be ignored when building the representation which is transferred; the receiver may not need those helpers or have a different solution.
Any architectural design document for the system will need to concentrate on the externally-visible properties of model types as a priority. Then each component that uses a type will have its own set of behaviours associated with that type - ie each component will have its own “domain language”. There may be similarity (and sometimes overlap) between the operations (methods) but the focus will be on the properties - just as with procedural design.
This doesn’t make a rich domain model impossible in any specific component, but does move the emphasis from behaviour to plain data in many discussions of the types.
Isolation of Remote and Persistence Layers from Domain Dependencies
Many architectural design patterns emphasise the benefits of a clean domain model, isolating it from issues related to communicating with the external world. Business logic should not be dealing with http-sessions or query-parameters. It should also not be dealing with SQL result-sets, transactions, etc.
However that code which handles incoming http requests does need to instantiate domain model types (or the service/application code needs to know about on-the-wire types). Similarly, the persistence layer needs to instantiate domain model types (unless there is an intermediate layer of “persistence entity types” which the service/application/business code knows about).
However many programming languages do not cleanly isolate the interfaces of types from their implementations. If the domain model types have complex logic within them, then they will usually depend upon a large suite of third-party libraries. When the remote/persistence layers depend upon those types, then they also depend upon those third-party libraries - and upon the entire declared API of those domain model types. This makes compilation less efficient, but more significantly puts up no barrier to inappropriate code creeping into those remote/persistence layers - after all, all the code needed to implement business logic directly is available in the classpath.
Languages such as C++, where interfaces are defined in header-files separate from implementation, do somewhat reduce this issue. Frameworks such as OSGi (for Java) can also help, by separating external and internal dependencies. Even in languages such as Java, it may also be possible to define a module containing only interface types representing aggregate roots, plus interfaces for factories for those types, and have remote/persistence modules depend only on that “domain model interface” module. However in all cases, great care needs to be taken to avoid having inappropriate concrete types as parameters in any declared method.
Possibly compile-time tools that validate code against architectural rules can help (eg for Java: jqassistant or ArchUnit).
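As a sketch, an ArchUnit test enforcing such a rule might look like this (package names are hypothetical):

```java
import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import org.junit.jupiter.api.Test;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

class ArchitectureTest {
    @Test
    void persistenceLayerDoesNotCallBusinessLogic() {
        JavaClasses classes = new ClassFileImporter().importPackages("com.example.app");

        // Fail the build if persistence code reaches into the service layer.
        noClasses().that().resideInAPackage("..persistence..")
                .should().dependOnClassesThat().resideInAPackage("..service..")
                .check(classes);
    }
}
```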
Having relatively anaemic domain model types also solves this issue. Such types can be defined in a module, and because there is relatively little business logic on them, that module needs few external dependencies. Remote/persistence modules which depend upon this core domain model module are able to instantiate the types, but don’t inherit unwanted dependencies and don’t see irrelevant business logic. These model types can still contain logic that enforces their basic invariants, just not implementation logic which has complex interactions with external objects.
An Existing Successful Anaemic Domain Model
One of the code-bases I’ve been working on recently is large, with a lot of business logic. And while not perfect, it is a successful application; it handles millions of transactions per day, and earns millions of Euros for its owning company per year - and has done for a decade.
It also uses a fairly anaemic domain model, with business logic in services rather than in the domain model types. And it works.
The application is in Java, split into multiple Maven modules, and its structure is roughly as follows:
- domain model type module (where types are relatively anaemic)
- service interface module
- service implementation module (actually, multiple, divided by subject area)
- persistence interface module (defines DAOs for loading/saving domain model types)
- persistence implementation module
- remote (ReST) implementation module
The business logic is cleanly separated from the rest of the system; the service-interface, service-implementation(s), and domain-model-type modules together form this “layer” and have no dependencies on ReST-related concepts (or any future possible interfaces such as GraphQL). The service implementation module(s) do have a dependency on the persistence-interface module - which contains interfaces only.
The persistence implementation module depends upon the domain-model-type module in order to be able to load and save these types. However while that module contains concrete classes (not just interfaces) the code in that module is relatively simple and has few dependencies on external libraries. The persistence implementation module does define its own entity types where needed, ie the relational persistence model is not necessarily 1:1 with the domain model types. In this particular case, JPA and similar are not being used, ie the domain model types are free of persistence annotations. However using JPA or similar would not change anything significant in this architecture.
The remote implementation module depends on the domain-model-types module and the service-interface module; neither of these have complicated dependencies. It defines its own DTO types for communicating with the outside world; these are internal ie not exposed to other modules. Requests are mapped to corresponding data-model-types and then the appropriate service is invoked, passing those model objects. Additional kinds of remote interface can be implemented using the same pattern, without any changes in the domain model itself.
The service implementation modules hold the majority of the business logic. Domain model types do follow the rules described in section “The Easy Bits”, ie are not just plain data structures. In particular, an effort is made to represent domain types properly (avoid primitives where possible) and to define methods which allow domain-model-types to enforce their own invariants where possible.
It is true that this approach does require domain-model-types to have more setters/getters than would be needed in a truly rich domain model; the business logic in the services needs access in a way that could be “internal references” if that logic had been pushed down into the model types. This in turn does increase the possibility of objects ending up in inconsistent states. It also makes locating all the logic associated with a particular type more difficult - care needs to be taken with naming and package structures.
However this approach also solves a lot of the issues which are discussed above in this article, eg:
- Interacting with persistence and other helper objects.
- Clutter on domain types.
- Being part of a distributed system.
When business logic is defined on singleton service objects which are instantiated at application startup, then standard dependency injection can be used to inject references to any other services needed including things such as daos/repositories/persistence-contexts, helpers for making outbound ReST calls, helpers for validating passwords, helpers for generating PDFs, and whatever else may be needed.
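A sketch of such a service (Spring-style; all collaborator types are hypothetical):

```java
import org.springframework.stereotype.Service;

interface InvoiceRepository { Invoice findById(String id); void save(Invoice invoice); }
interface PdfRenderer { byte[] render(Invoice invoice); }
interface EmailSender { void send(String recipient, byte[] attachment); }

class Invoice {
    private String customerEmail;
    private boolean sent;

    void markSent() {
        if (sent) throw new IllegalStateException("invoice already sent");
        sent = true;
    }

    String getCustomerEmail() { return customerEmail; }
}

@Service
class InvoiceService {
    private final InvoiceRepository invoices;
    private final PdfRenderer pdfRenderer;
    private final EmailSender emailSender;

    // All helpers arrive via standard constructor injection at startup.
    InvoiceService(InvoiceRepository invoices, PdfRenderer pdfRenderer, EmailSender emailSender) {
        this.invoices = invoices;
        this.pdfRenderer = pdfRenderer;
        this.emailSender = emailSender;
    }

    // The workflow lives here; the domain type still enforces its own invariants.
    void sendInvoice(String invoiceId) {
        Invoice invoice = invoices.findById(invoiceId);
        invoice.markSent(); // domain method which rejects invalid state transitions
        emailSender.send(invoice.getCustomerEmail(), pdfRenderer.render(invoice));
        invoices.save(invoice);
    }
}
```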
Multiple processes which apply to a domain model type can be defined as methods on a single service type, or on multiple service types, depending on how similar the processes are to each other. Adding new processes does not require modifying the domain model type. Such processes can be unit-tested independently of the domain model type itself. Modern IDEs also make it easy to find all code referencing a particular type, ie the quoted benefit of a rich domain model in making logic for a type “findable” is perhaps not as strong as it once was.
And the focus is on the data that the model types represent, making it easier to deal with distributed systems. When discussing topics regarding data ownership, replication, and inter-system communication, the emphasis is always on the data rather than the behaviour and so keeping the data in focus (avoiding over-complicating domain models) helps such discussions.
This could also be considered compatible with the Hexagonal Architecture pattern; the services-interface and persistence-interface modules form ports with the remote-implementation and persistence-implementation modules being adapters.
While this approach may be criticised as “not OO”, it could be considered “functional” instead - or at least tending towards that end of the spectrum. And functional programming practices are also a valid design pattern. I’m also not aware of any conflict with the concepts of Domain Driven Design in general; the ideas of a ubiquitous language, aggregates, etc. still seem perfectly valid in this kind of application architecture.
My Personal Preferences
Here’s what I personally feel is the best balance to all the issues discussed above.
As noted in “The Easy Bits”, do define domain-model-types that represent the “things” in the problem domain, together with their logical properties. Define the invariants for this type (valid set of values for each property, and rules that define valid combinations of properties), and avoid adding setter-methods which allow these invariants to be violated:
- Where possible, add logical operations instead of setters, eg “clearHistory” which resets a set of properties at the same time rather than a setter for each.
- Otherwise ensure that setters reject calls which would result in an invalid object state.
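For example (a hypothetical sketch):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class AuditedDocument {
    private final List<String> history = new ArrayList<>();
    private Instant lastModified;
    private String lastModifiedBy;

    // One logical operation instead of three setters called in a row;
    // the related properties can never get out of sync with each other.
    void clearHistory(String requestedBy) {
        history.clear();
        lastModified = Instant.now();
        lastModifiedBy = requestedBy;
    }
}
```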
Now add as much logic to the domain-model-type as possible, as long as it doesn’t require:
- Adding references to domain-model objects which are not part of the same aggregate (have disconnected lifetimes).
- Adding references to “helper objects” which aren’t domain-model-types at all.
Then for large, complicated “process-like” methods, consider whether they might be better as either helper functions that the domain-model-type calls, or methods on a “service type”.
When logic is implemented in a service rather than in a domain-model-type, do pay careful attention to service naming, and grouping (eg by package), so that logic associated with a particular type is easy to find. The fact that logic isn’t on a particular class doesn’t mean that it should be randomly scattered throughout a code-base.
Or in short: do take advantage of OO design principles where they naturally apply; create that useful correlation between code and the language that the business experts use. And take advantage of OO to support the invariants for each type. However don’t be afraid of service types and procedural/functional programming styles to implement the processes/workflows that glue them together - even if it makes the domain-model-types themselves “less rich”.
Postscript: Fine-grained Microservices
As noted in section Context, the issue of rich vs anaemic domain models only applies where an application has complicated business logic. When implementing a microservices architecture with very fine-grained microservices, then each code-base is pretty small; if less than (say) 50 classes, then all the rules about well-structured code, domain models, etc. are mostly irrelevant. There is of course a “domain model” encoded in the API of the component, but the internal code can be thrown together without any particular structure and will still be understandable and maintainable.
Further Reading
There are a number of writers who have addressed this in online articles. A few of the best articles were listed in section “Other Articles on this Topic”; here are a few more I discovered during research for this article.
Generally interesting views:
- Sapiens Works: Rich Domain is SOLID - presents the view that DDD “use cases” are not part of the “domain model” itself, and are actually part of the DDD “application layer”.
- Ivan Paulovich: Rich Domain Model with DDD/TDD - a good reminder of some important (anti-)patterns: Feature Envy, Primitive Obsession, Public Setters Abuse.
- StackOverflow: Rich vs Anemic - a somewhat rambling StackOverflow thread, but with interesting contributions
- StackOverflow: The Different Kinds of Services - the question isn’t particularly interesting, but the response from Aaronaught is very informative, clearing up ambiguity in uses of the word “service”.
Articles related to data persistence and OO:
- Mehdi Khalili: ORM Antipatterns: Active Record - actually the whole series of articles on ORM patterns is excellent
- Tulka: Object Oriented Design vs Persistence
- Brad Irby: What is the best pattern to use for data persistence
- Matthias Noback: ORMLess
Articles generally promoting rich domain model:
- Ismael Mota: Anaemic Domain Model vs Rich Domain Model
- Rene Link: Anemic vs Rich Domain Models
- Giudo Deschamps: The Anemic Domain Model
Other:
- Vonos.net: Anaemic Domain Models - some earlier thoughts of mine on this topic, particularly with respect to OSGi and service lifecycles. Mostly superseded by this article.
Footnotes
1. The presentation also talks about creating an abstract type ExpirationType with subclasses in order to avoid a switch, which is IMO unnecessary; I don’t see it as more elegant than a switch, assuming a sane language.

2. In the specific case of ReST calls to external systems, it is worth considering whether they can be removed. When the call is intended to notify an external system of an event, then perhaps sending a message via a message-broker could be a better solution - or at least writing an “event” record to the database and using a separate thread/process to send the actual event. When the call is intended to retrieve data from an external system, then perhaps it is possible to keep a local read model of relevant data from that remote system, so that accessing that data is just a read from a local database. Sending emails is also something that could potentially be handed off to some other thread/process, ie done asynchronously rather than as part of the request-handling thread, thus removing the need for error-handling or complex dependencies.