In object-oriented code, how much logic should be co-located with the data it manipulates, and how much should be externalized? And does the answer change when talking about OSGi vs traditional Java code? No real conclusions here, but perhaps some food for debate…
Procedural vs Object Oriented
Early programming languages were all procedural, and many modern ones still are. Data is stored in datastructures, and the logic that manipulates those datastructures is stored elsewhere. Such logic takes a datastructure as a parameter and either transforms it in place or returns a modified copy. Examples of such languages are C, Fortran, Cobol, Basic and plain Lisp. Object-oriented programming instead tries to group data and associated logic together.
One of the influential object-oriented books I read many years ago (title long forgotten) stated that any class with the word “manager” in its name should be regarded as suspicious; normally such logic should instead be moved to the managed type. Martin Fowler coined the term Anemic Domain Model for the approach in which objects are either “state with no behaviour” or “behaviour with no state” (my rephrasing). Note that trivial getter and setter methods do not count as real behaviour - they are just a cleaner form of access to a datastructure.
Martin’s article (often referencing Eric Evans) talks about a service layer and the services in it: “The key point here is that the service layer is thin - all the key logic lies in the domain layer.”
The procedural programming article on Wikipedia addresses some of the same issues.
The object-oriented idea of grouping data and logic together has some great benefits during analysis and maintenance. Costs for system changes tend to be better aligned with business expectations in an OO system: a small conceptual change is more likely to map to a small code change when the code structure is modelled on the same concepts the business users have. I’ve worked on a number of large bespoke applications that needed maintenance over many years, and having IT effort correlate with managers’ perception of the size of a change is very valuable. At a developer level, encapsulation brings many well-known benefits.
Unfortunately, the idea of grouping data and logic together encounters some problems in the real world:
- OSGi services and their lifecycle
- logic that affects multiple domain objects
An Example and its Implications for Code Structure
One example use-case I encountered recently: a relative-date class which expresses an offset in days. The offset can optionally be counted in working days, in which case the relative-date instance also specifies a calendar. Different countries have different calendars (and a company might have its own private calendar). However calendars can be large objects, so the relative-date instance should hold a calendar identifier rather than a reference to a calendar instance (this also helps with serialization and persistence). The obvious API for a relative-date to provide would be `Date resolveRelativeTo(Date d)`. When the relative-date is not a working-days offset, this is trivial to implement. Otherwise, it needs access to a calendar service through which it can map the calendar-id to a calendar, and thus return the correct date.
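To make the example concrete, here is a minimal sketch of such a class. All names and signatures are my own invention (the original API is not shown in full), and I use `java.time.LocalDate` rather than the legacy `Date`:

```java
import java.time.LocalDate;

// Hypothetical calendar service: maps a calendar-id to working-day logic.
interface CalendarService {
    boolean isWorkingDay(String calendarId, LocalDate date);
}

// A relative date: an offset in days, optionally counted in working days
// against a named calendar. Only the calendar *id* is part of the state;
// the service reference is the problematic part discussed below.
class RelativeDate {
    private final int offsetDays;
    private final String calendarId;             // null => plain calendar days
    private final transient CalendarService calendars; // injected somehow

    RelativeDate(int offsetDays, String calendarId, CalendarService calendars) {
        this.offsetDays = offsetDays;
        this.calendarId = calendarId;
        this.calendars = calendars;
    }

    LocalDate resolveRelativeTo(LocalDate d) {
        if (calendarId == null) {
            return d.plusDays(offsetDays); // trivial case: plain days
        }
        // Working-days case: step forward one day at a time, counting
        // only the days the calendar reports as working days.
        LocalDate result = d;
        int remaining = offsetDays;
        while (remaining > 0) {
            result = result.plusDays(1);
            if (calendars.isWorkingDay(calendarId, result)) {
                remaining--;
            }
        }
        return result;
    }
}
```

The caller still sees only the minimal date-in, date-out API; everything else is hidden inside the object.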
Initialising a new instance of a relative-date with a reference to a suitable calendar service is no problem. However (a) when such a class is serialized/deserialized, what happens? And (b) in an OSGi environment, where a service may be replaced, then what?
With some custom serialization/deserialization code, it is possible to solve (a) by replacing the service reference on serialization with a logical service id of some sort, and performing the reverse on deserialization. I have actually implemented this in the past.
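A minimal sketch of that locate-service-on-deserialization idea, assuming a hypothetical static `ServiceLocator` standing in for whatever registry the application really uses (in OSGi, the service registry): the service field is `transient`, so only the logical state is written, and the service is re-acquired in `readObject`:

```java
import java.io.*;
import java.time.LocalDate;

interface CalendarService {
    boolean isWorkingDay(String calendarId, LocalDate date);
}

// Hypothetical stand-in for the real lookup mechanism.
class ServiceLocator {
    static CalendarService calendarService;
}

class RelativeDate implements Serializable {
    private final int offsetDays;
    private final String calendarId;             // logical id: serializable
    private transient CalendarService calendars; // never written to the stream

    RelativeDate(int offsetDays, String calendarId, CalendarService calendars) {
        this.offsetDays = offsetDays;
        this.calendarId = calendarId;
        this.calendars = calendars;
    }

    // Re-acquire the service after the default fields have been read back.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        this.calendars = ServiceLocator.calendarService;
    }

    LocalDate resolveRelativeTo(LocalDate d) {
        if (calendarId == null) return d.plusDays(offsetDays);
        LocalDate result = d;
        for (int remaining = offsetDays; remaining > 0; ) {
            result = result.plusDays(1);
            if (calendars.isWorkingDay(calendarId, result)) remaining--;
        }
        return result;
    }
}
```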
When using OSGi in Blueprint style, it is possible to solve (b) by using a service proxy. This also works with serialization/deserialization : as above, map to a logical service id on serialization, and rebuild the proxy on deserialization.
However as I mentioned in my recent article on OSGi Dependency Injection, the Blueprint style has many disadvantages and it is generally better for objects with mandatory service references to be managed so that they have the same lifecycle as the services they need.
Note: there might be some security issues when creating service proxies: if the other end is trusted to specify any service, it might be possible to point at a service that was not expected. However it’s a fairly unlikely attack vector, as the service must implement the interface that the deserialized type expects. Any code that gets a service should also ensure that the service is released at an appropriate time (see `BundleContext.ungetService`); AFAIK this is only actually relevant for cleanup of per-bundle objects allocated via a ServiceFactory, but it is the correct thing to do.
As the OSGi dependency injection article notes, holding a service reference is acceptable for a short period of time. In this particular relative-date use-case, the instances only ever existed for short periods of time. They were created as part of a logical transaction or an RPC call, and discarded at the end - ie lived as real in-memory objects for less than a second. I therefore implemented the locate-service-on-deserialization strategy described above, and it works fine. It also allows the class to preserve the nice minimal API that takes a date and returns a date, hiding all other details internally.
However consider what would happen if such an object were to be held long-term (eg placed in a cache). When using “native” OSGi lifecycles, it makes little sense for a POJO to hold a reference to another service; instead such an object should be managed so that it is disabled/discarded when that service is not available. The cleanest way to avoid invoking a non-available mandatory service is for the code doing the invoking to itself be unavailable when any of its mandatory services are. That clearly makes no sense for an object like a relative-date, which appears to leave only one option: pull the relevant logic out into a separate service and reduce the domain object to a dumb datastructure.
In some cases, such refactoring could instead be limited to passing services as parameters to the domain object (rather than the domain object having a permanent reference to such services), though that has its own price: the caller is then exposed to intimate details the domain object would not otherwise need to expose. In the relative-date example, (a) it doesn’t seem logically relevant for the caller to have to provide a calendar; it just wants to transform a date and (b) in this particular example, the calendar is only needed sometimes, which makes it even uglier to force the caller to provide the corresponding service. This approach also works poorly if additional subclasses are added in the future, which might need different services to perform their `resolveRelativeTo` behaviour. Information-hiding really is useful.
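A sketch of that parameter-passing variant (names invented, as before): the service disappears from the object’s state but leaks into its API, including the plain-days case where it is never consulted.

```java
import java.time.LocalDate;

// Hypothetical calendar service, now looked up by the caller.
interface CalendarService {
    boolean isWorkingDay(String calendarId, LocalDate date);
}

// The relative date holds no service reference; the caller must supply
// one on every call - even when the offset is in plain days and the
// service is never used.
class RelativeDate {
    private final int offsetDays;
    private final String calendarId; // null => plain calendar days

    RelativeDate(int offsetDays, String calendarId) {
        this.offsetDays = offsetDays;
        this.calendarId = calendarId;
    }

    LocalDate resolveRelativeTo(LocalDate d, CalendarService calendars) {
        if (calendarId == null) {
            return d.plusDays(offsetDays); // the calendars parameter is ignored here
        }
        LocalDate result = d;
        for (int remaining = offsetDays; remaining > 0; ) {
            result = result.plusDays(1);
            if (calendars.isWorkingDay(calendarId, result)) {
                remaining--;
            }
        }
        return result;
    }
}
```

The object itself is now safe to cache or serialize, but every call site has to know about (and obtain) a service that is an internal implementation detail of only some relative dates.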
It may be worth considering OSGi services in two groups: mandatory and optional. When thinking about domain objects and their use of services, we are almost always talking about mandatory services. And even in OSGi, mandatory services don’t come and go often. Well, except when using bndtools for development, which relies on being able to unload/reload any bundle whose code has changed. On the positive side, in the bndtools situation there are no “in-flight transactions”, ie the short-lived kinds of objects discussed in the previous paragraph are presumably all gone.
It is interesting how the popularity of ideas evolves:
- structured programming (procedural code operating on data structures)
- object-oriented programming with deep class hierarchies (extension based on subclassing)
- object-oriented programming with interfaces and delegation (aggregation preferred over subclassing)
- POJOs (based on annotations and naming patterns, to avoid having framework interfaces pollute the business logic). See JEE for example, which initially was very intrusive.
- and now, with clustering and polyglot programming on the rise, the benefits of a data-structure approach are attracting attention again
And functional programming is also generally based on transforming data structures (though a number of functional languages also support object-style coding).
In the end, procedural code is not the end of the world. Many successful applications have been developed this way, and as the Wikipedia article notes, this is still a disputed area with various advantages and disadvantages. OCaml, for example, is a functional language with object-oriented features, yet some in its community consider the OO features unnecessary and rarely use them. There certainly is no consensus (any more) that OO programming is the most effective style.
And OO programs have always been a mix of these styles; there were always cases where logic did not belong directly on domain types - controllers, validators, etc. Persistence frameworks have almost always been implemented as external logic: `persister.save(user)` rather than `user.save()`.
So if OSGi services also push us towards separating logic and data, then maybe the resulting procedural-plus-datastructures code is acceptable. Interfaces are still useful; they decouple service implementation from service interface, and can be used to hide some minor variations in data structure representation behind the getters and setters. However it does not allow the kind of extensibility that was hoped for with OO, where new business rules could be implemented just by creating a new variant of an existing domain model type and implementing the desired behaviour in the methods of that type. Instead, such modifications will affect both the datastructure class and every service that manipulates it - the loss of polymorphism increases coupling.
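To make that coupling concrete, here is a sketch (names invented) of the procedural-plus-datastructures version of the earlier relative-date example: a dumb data class plus an external resolver. Adding a new kind of relative date means editing the resolver’s conditional logic; the dispatch that polymorphism would have provided is now hand-written branching.

```java
import java.time.LocalDate;

// Anemic domain model: pure state, no behaviour.
class RelativeDate {
    final int offsetDays;
    final boolean workingDays;

    RelativeDate(int offsetDays, boolean workingDays) {
        this.offsetDays = offsetDays;
        this.workingDays = workingDays;
    }
}

// External service holding all the logic. A new variant of relative date
// (month offsets, say) forces a change here and in the data class, where
// a polymorphic resolveRelativeTo() would only need a new subclass.
class RelativeDateResolver {
    LocalDate resolve(RelativeDate rd, LocalDate d) {
        if (!rd.workingDays) {
            return d.plusDays(rd.offsetDays);
        }
        LocalDate result = d;
        for (int remaining = rd.offsetDays; remaining > 0; ) {
            result = result.plusDays(1);
            // naive built-in calendar for illustration: weekdays only
            if (result.getDayOfWeek().getValue() <= 5) {
                remaining--;
            }
        }
        return result;
    }
}
```

The compensating benefit is that the resolver itself can be an OSGi service with its own lifecycle, so the whole hold-a-service-reference problem simply disappears.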
The primary difference between annotations and interfaces is that annotations are optional at runtime (class-resolution time). Interesting thought : what if Java had simply made interfaces optional when loading a class? This could have covered the same use-case as annotations on types, and some of the use-cases for annotations on methods.
Having business logic on domain models has always been a problem for cases where two objects need to cooperate to perform an operation. Which of them holds the overall logic?
Eclipse EMF is another technology which encourages anemic domain models. Datastructures are defined via a GUI editor, and code is then generated. However it is very difficult to associate real business logic with such classes.
When performing dependency injection, each object can have an associated lifetime scope, and injecting a short-lived object into one with a longer one will cause problems, as the long-lived object may later try to reference this object when it is no longer ‘live’. Unfortunately, an OSGi service can be considered to have undetermined scope, as it can be stopped at any time. This is discussed in my recent article on OSGi and Dependency Injection. And unfortunately this again implies that a “domain model” should never have a reference to a service - and the longer the lifetime of the domain object is, the worse the problem. Reducing the functionality implemented directly on the domain model (so the service is not needed) appears to be the only solution…
In OO development, it is important that a class enforces its invariants (ie ensures its internal consistency); it must not expose fields directly, or expose methods that can leave the object in an invalid state. However, whether other functionality to manipulate an object is implemented on the class or externally is open for debate; moving logic related to a class into that class is certainly “more OO”, and via polymorphism allows different implementations of such logic. However, as noted above, that can lead to the object needing to hold references to services, which can have nasty consequences. Some languages support “dynamic dispatch” via mechanisms other than an object’s vtable, which makes externalising object-specific logic more practical; sadly Java does not.