OSGi-aware Serialization

Categories: Java, OSGi

Introduction

Java’s native serialization was not designed for use in an OSGi environment, and it causes some problems that OSGi developers should be aware of.

I recently developed a solution to these problems for my particular (reasonably common) use-case: a client/server design where both ends are OSGi-based and developed by the same team, so many bundles are shared between client and server. Details are presented below.

The Classname Problem

It is quite common to map a Java object (including the objects it references) to a stream of bytes, send it across a network, and turn it back into a tree of objects at the other end. When both ends are implemented in Java, Java’s native serialization is a good choice for the serialization process.

Often a dedicated bundle handles such communication, and this bundle:

  • does the serialization for outgoing data (ie is passed objects from other bundles, not streams) and
  • does the deserialization for incoming data (ie converts the incoming stream to objects before passing them to other bundles)

Writing objects to the stream is done by a java.io.ObjectOutputStream. As it writes each object, it checks whether that object’s class has previously been written to the stream; if not then it writes a class-descriptor to the stream first. This class-descriptor contains the fully qualified name of the class, as a string. All instances of that class (including the first one) then include the id of the class-descriptor before their field-data is written. In an OSGi environment, this presents no problems; given a reference to an object (even an OSGi service), a simple call to Object.getClass() returns the concrete class even when that class is not exported from the defining bundle.
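The once-per-class behaviour can be observed with a small experiment. The following is only a sketch: the class names DescriptorDemoPoint and DescriptorRecordingStream are invented for this illustration, and the only JDK hook relied on is ObjectOutputStream.annotateClass, which is invoked each time a class-descriptor is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical payload type used only for this illustration. */
class DescriptorDemoPoint implements Serializable {
    final int x;
    final int y;

    DescriptorDemoPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

/** Records the name of every class whose descriptor is written to the stream. */
class DescriptorRecordingStream extends ObjectOutputStream {
    final List<String> describedClasses = new ArrayList<>();

    DescriptorRecordingStream(OutputStream out) throws IOException {
        super(out);
    }

    // Invoked when a class-descriptor is written, not for every instance.
    @Override
    protected void annotateClass(Class<?> cl) throws IOException {
        describedClasses.add(cl.getName());
    }

    /** Writes three instances of one class and returns the recorded descriptor names. */
    static List<String> demo() throws IOException {
        try (DescriptorRecordingStream out =
                 new DescriptorRecordingStream(new ByteArrayOutputStream())) {
            out.writeObject(new DescriptorDemoPoint(1, 2));
            out.writeObject(new DescriptorDemoPoint(3, 4));
            out.writeObject(new DescriptorDemoPoint(5, 6));
            return out.describedClasses;
        }
    }
}
```

Calling DescriptorRecordingStream.demo() should return a list with a single entry, even though three objects were written.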

Reading from the stream is done by a java.io.ObjectInputStream. Each time a class-descriptor is found in the incoming data, the local class with that name is located (in method resolveClass) and added to a table. As instances of that type occur in the stream, the table is used to find the corresponding concrete class and a new instance of that class is created. Finding a class by name isn’t a problem in a traditional Java application with a flat class-path: Class.forName(somename) will find it if it is available. However in an OSGi environment, Class.forName doesn’t work well - it only sees classes visible to the bundle doing the deserialization, ie that bundle’s own classes and those in packages it has imported!

Even if there is no dedicated bundle for doing the communications, the problem of mapping a classname to a class during deserialization still exists; whichever OSGi bundle is doing the deserialization might not have access to the implementation class specified in the incoming stream.

Using DynamicImport-Package

The bundle doing deserialization can declare DynamicImport-Package: * in its manifest, which allows it to see all packages exported by all bundles in the OSGi container. However this has the following problems:

  • it still can’t see classes which are not exported by the bundle that defines them;
  • dynamic imports are slow and inelegant;
  • it doesn’t handle cases where multiple bundles export the same class in different versions.

In many cases, the code doing the deserialization knows which bundle will consume the object being deserialized. However that doesn’t always help - asking that bundle to load the class by name may still fail in cases where the object is being referenced via an abstract interface. A bundle may accept an object with interface X as a method-parameter without having access to all implementations of interface X.

I believe the common solution to this kind of problem when developing cooperating OSGi applications is to ensure that the concrete (implementation) types of all objects that might be serialized are exported from their defining bundles, and to have the deserializing bundle use DynamicImport-Package as described above. However I personally find it rather ugly to have bundles export many implementation classes only to support deserialization (when they could otherwise remain cleanly hidden as implementation details).

Delegating to Another Bundle

Using a custom subclass of ObjectInputStream, it is possible to look up classes via a specific bundle - or even a list of bundles.

In some cases, a “communications” bundle may know which other bundle it will be passing the deserialized objects to. In this case, it may be best to look up each class by-name using that bundle rather than the “communications” bundle. Or perhaps try both.
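A minimal sketch of such a subclass follows. To keep it free of OSGi dependencies, each candidate bundle is represented by a hypothetical ClassSource interface; in a real container a method reference such as bundle::loadClass would fit that shape. The names ClassSource, DelegatingObjectInputStream and DelegationDemoMessage are invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.List;

/** Anything that can load a class by name (hypothetical; Bundle.loadClass has this shape). */
interface ClassSource {
    Class<?> load(String name) throws ClassNotFoundException;
}

/** Hypothetical payload type used only for this illustration. */
class DelegationDemoMessage implements Serializable {
    final String text;

    DelegationDemoMessage(String text) {
        this.text = text;
    }
}

/** Resolves classes by asking each source in turn before using the default lookup. */
class DelegatingObjectInputStream extends ObjectInputStream {
    private final List<ClassSource> sources;

    DelegatingObjectInputStream(InputStream in, List<ClassSource> sources) throws IOException {
        super(in);
        this.sources = sources;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        for (ClassSource source : sources) {
            try {
                return source.load(desc.getName());
            } catch (ClassNotFoundException e) {
                // This source cannot see the class; try the next one.
            }
        }
        // Fall back to the standard lookup, which also knows primitive type names.
        return super.resolveClass(desc);
    }
}
```

The order of the list decides which bundle wins when several can see a class with the same name, so the most likely recipient would normally come first.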

This may work in many cases. However it doesn’t guarantee that all classes can successfully be deserialized; consider:

  • Bundle X defines an interface MyDomainObject1
  • Bundle Y defines a service that returns a MyDomainObject1Impl
  • Bundle Z defines a service that accepts a MyDomainObject1 as a parameter
  • Bundle C is the communications bundle

In this case, when C processes an input-stream containing the data for an instance of MyDomainObject1Impl, asking bundle Z to provide that class won’t work - even though Z would be the recipient of the deserialized object. Z only imports the interface from X; the implementation class is private to Y and is not visible to Z.

Writing the Bundle Symbolic Name to the Serialization Stream

I have implemented a solution to this serialization problem which may not work for all use-cases, but does work for a significant portion of them: subclassing ObjectOutputStream so that when a class descriptor is written (method annotateClass), a marker object containing the symbolic name of the bundle that the class came from is also written. This information is easy to obtain: static method org.osgi.framework.FrameworkUtil.getBundle(Class c) returns the bundle associated with any Class. The performance cost is small - this is only done once for each unique class written to the stream, not for each instance.

The receiving end needs a subclass of ObjectInputStream which retrieves the bundle symbolic-name and then loads the class from that bundle (in method resolveClass). There is no standard API to look up an OSGi bundle directly by its symbolic name; I used a bundle-tracker to maintain a map with this information.
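The writer/reader pair can be sketched roughly as below. This is not the implementation described in this article, only an outline under several assumptions: the bundle name is written with writeUTF rather than as a marker object, an empty string means "no bundle, use the default lookup", the bundle lookup is passed in as a plain Function (in OSGi it could wrap FrameworkUtil.getBundle), and the tracker-maintained map is reduced to a Map from symbolic name to ClassLoader. All class names here are invented.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical payload type used only for this illustration. */
class BundleDemoOrder implements Serializable {
    final String id;

    BundleDemoOrder(String id) {
        this.id = id;
    }
}

/** Writer side: appends the owning bundle's symbolic name to each class-descriptor. */
class BundleNameOutputStream extends ObjectOutputStream {
    // Assumption: supplied by the caller; in OSGi this could be based on
    // FrameworkUtil.getBundle(c), returning null when there is no bundle.
    private final Function<Class<?>, String> bundleNameOf;

    BundleNameOutputStream(OutputStream out, Function<Class<?>, String> bundleNameOf)
            throws IOException {
        super(out);
        this.bundleNameOf = bundleNameOf;
    }

    @Override
    protected void annotateClass(Class<?> cl) throws IOException {
        String name = bundleNameOf.apply(cl);
        writeUTF(name == null ? "" : name); // empty string: fall back to default lookup
    }
}

/** Reader side: loads each class via the loader registered for the recorded name. */
class BundleNameInputStream extends ObjectInputStream {
    // Assumption: stands in for the name->bundle map kept up to date by a bundle-tracker.
    private final Map<String, ClassLoader> loadersByBundleName;

    BundleNameInputStream(InputStream in, Map<String, ClassLoader> loadersByBundleName)
            throws IOException {
        super(in);
        this.loadersByBundleName = loadersByBundleName;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        String bundleName = readUTF(); // mirrors the writeUTF in annotateClass
        if (bundleName.isEmpty()) {
            return super.resolveClass(desc);
        }
        ClassLoader loader = loadersByBundleName.get(bundleName);
        if (loader == null) {
            throw new ClassNotFoundException(
                desc.getName() + " (no bundle registered as " + bundleName + ")");
        }
        return Class.forName(desc.getName(), false, loader);
    }
}
```

In a real container the reader would call Bundle.loadClass instead of Class.forName, which is where the special handling of array and primitive type names becomes necessary.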

There are two limitations I am aware of:

  • the bundle symbolic-name must be the same at the sending and receiving end; and
  • there should not be multiple versions of that bundle available.

When developing a typical client/server application whose two ends communicate using serialized objects, neither of these limitations is likely to be a problem.

Note: some classes should be exempted from having the bundle-name output for them. The custom class that wraps the bundle-symname is one (to prevent infinite recursion!). For performance, I also excluded any classes coming from the system bundle - but when testing for that, make sure that primitive types and arrays are correctly handled!
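One possible shape for the "is this a system class?" test is sketched below. It assumes that "system class" can be approximated as "loaded by the bootstrap class loader"; an OSGi implementation might instead test whether FrameworkUtil.getBundle returns null. SystemClassCheck and SystemClassDemoType are names invented for this example.

```java
/** Decides whether the bundle name can be skipped for a class (sketch only). */
final class SystemClassCheck {
    private SystemClassCheck() {
    }

    static boolean isSystemClass(Class<?> c) {
        // Strip array dimensions first: int[][] -> int[] -> int.
        Class<?> element = c;
        while (element.isArray()) {
            element = element.getComponentType();
        }
        if (element.isPrimitive()) {
            // Primitives also report a null loader; the check is explicit for clarity.
            return true;
        }
        // Assumption: a null class loader means the bootstrap loader, eg java.lang.String.
        return element.getClassLoader() == null;
    }
}

/** Hypothetical application class, loaded by a non-bootstrap loader. */
class SystemClassDemoType {
}
```

With this approach an array such as SystemClassDemoType[] is treated like its element type, so the bundle name is still written for arrays of application classes.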

The total amount of code for my implementation was:

  • about 30 SLOC for the custom ObjectOutputStream
  • about 50 SLOC to determine whether a class is a system class (ie whether to skip writing the bundle-symname for it)
  • about 100 SLOC for the custom ObjectInputStream
  • a dozen lines of code for the bundle-tracker that maintains name->bundle information
  • about 200 SLOC to implement a method loadClass(ClassLoader cl, String type) which is capable of handling the types present in the serialized streams. Sadly, Class.getName returns strings that ClassLoader.loadClass does not support - particularly related to primitives and arrays.

So: a significant amount of work, but not huge.

Mixing Standard and Custom Serialization

The custom ObjectOutputStream/ObjectInputStream implementations need to be used as a pair: if one end of a socket sends a ‘standard’ object stream, and the receiver tries to read bundle-names from the stream, the stream will fail to deserialize. A similar failure happens if the custom format is sent but the receiver uses a normal ObjectInputStream.

In many use-cases this will never happen, ie on a given socket the receiver can be sure which serialization format has been used. However if you need to support a mix of code using custom serialization and standard serialization then this is possible. Each stream starts with a ‘stream header’ containing a (magic, version) pair of short (16-bit) integers; standard streams write the constants ObjectStreamConstants.STREAM_MAGIC (0xACED) and STREAM_VERSION (5). A custom stream can write a different version value to mark the modified serialization format, and the receiver can inspect the header to decide which kind of ObjectInputStream to use.
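A receiver could inspect the header before choosing a reader, roughly as in the sketch below. The version number 0x0B05 is an arbitrary value invented for this example, as are the names StreamHeaderPeek and Format; the magic and standard version come from java.io.ObjectStreamConstants.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectStreamConstants;

/** Peeks at the 4-byte stream header to decide which reader to use (sketch only). */
final class StreamHeaderPeek {
    /** Hypothetical version number chosen for the bundle-aware format. */
    static final short BUNDLE_AWARE_VERSION = 0x0B05;

    enum Format { STANDARD, BUNDLE_AWARE, UNKNOWN }

    private StreamHeaderPeek() {
    }

    /** Reads the header and then rewinds; the stream must support mark/reset. */
    static Format peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(4);
        try {
            DataInputStream data = new DataInputStream(in);
            short magic = data.readShort();
            short version = data.readShort();
            if (magic != ObjectStreamConstants.STREAM_MAGIC) {
                return Format.UNKNOWN;
            }
            if (version == ObjectStreamConstants.STREAM_VERSION) {
                return Format.STANDARD;
            }
            return version == BUNDLE_AWARE_VERSION ? Format.BUNDLE_AWARE : Format.UNKNOWN;
        } finally {
            in.reset(); // leave the header in place for whichever reader is chosen
        }
    }
}
```

Because the stream is rewound, the chosen ObjectInputStream still sees the header; a reader for the custom format would therefore also need to override readStreamHeader to accept the non-standard version.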

Other Options

The (dead) Apache Felix Serialization Framework project was an attempt to address the serialization issue by defining a “serialization service” that each bundle could hook into. This was apparently abandoned without any real progress being made, and I am not aware of any other attempts to solve this issue. AFAIK, most people just use the “DynamicImport” approach documented above, and export lots of implementation classes from their bundles just for the purpose of supporting serialization.

It may also be that many people serialize to JSON or XML, in which case there are other problems to deal with…

The serialVersionUID Problem

I previously wrote an article about serialVersionUIDs, and recommended that in general explicit serialVersionUIDs are a bad idea, and that the JVM should be left to compute them automatically. However there is a problem with classes containing synthetic methods. These can at least be detected via custom ObjectInputStream/ObjectOutputStream classes, and reported so that manual serialVersionUIDs can be added to the (relatively few) classes which are affected. This isn’t an OSGi-specific issue, but if custom stream classes are being built to support OSGi-aware serialization then it makes sense to add support for this too.

In methods ObjectOutputStream.annotateClass and ObjectInputStream.resolveClass, the class being handled can be tested and an error reported if it:

  • uses a default (autogenerated) serialVersionUID; and
  • contains any synthetic methods

Reasonable testing will then reveal any problem classes which really require a manually-assigned serialVersionUID due to the existence of synthetic methods.

As checking for the serialVersionUID and synthetic methods is a moderate performance hit, it is best to keep a cache of classes seen so far. In an OSGi environment this needs some careful implementation, in order to correctly support uninstallation of bundles.
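The check and its cache might look something like the following sketch. The use of java.lang.ClassValue for caching is an assumption of this example (it attaches the cached value to the Class itself, which is intended to avoid keeping classes of an uninstalled bundle reachable), and the names SerialUidCheck, UidDemoRisky and UidDemoPinned are invented.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Flags serializable classes that rely on a computed UID and have synthetic methods. */
final class SerialUidCheck {
    // Assumption: ClassValue is used as the cache so that results live with the Class.
    private static final ClassValue<Boolean> RISKY = new ClassValue<Boolean>() {
        @Override
        protected Boolean computeValue(Class<?> type) {
            return Serializable.class.isAssignableFrom(type)
                && !type.isArray()
                && !hasExplicitUid(type)
                && hasSyntheticMethod(type);
        }
    };

    private SerialUidCheck() {
    }

    static boolean isRisky(Class<?> c) {
        return RISKY.get(c);
    }

    static boolean hasExplicitUid(Class<?> c) {
        try {
            Field f = c.getDeclaredField("serialVersionUID");
            int mods = f.getModifiers();
            return Modifier.isStatic(mods) && Modifier.isFinal(mods) && f.getType() == long.class;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    static boolean hasSyntheticMethod(Class<?> c) {
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                return true;
            }
        }
        return false;
    }
}

/** No explicit UID; the compiler adds a synthetic bridge method compareTo(Object). */
class UidDemoRisky implements Serializable, Comparable<UidDemoRisky> {
    int rank;

    @Override
    public int compareTo(UidDemoRisky other) {
        return Integer.compare(rank, other.rank);
    }
}

/** Same shape, but with an explicit UID, so it is not reported. */
class UidDemoPinned implements Serializable, Comparable<UidDemoPinned> {
    private static final long serialVersionUID = 1L;
    int rank;

    @Override
    public int compareTo(UidDemoPinned other) {
        return Integer.compare(rank, other.rank);
    }
}
```

SerialUidCheck.isRisky could be called from annotateClass and resolveClass, logging or throwing for any class it reports.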

Alternatively, if you prefer to assign all serialVersionUIDs manually (and update them whenever an incompatible change is made!), then custom input/output streams can be used to detect any classes without explicit UIDs. Sadly, they cannot be used to detect cases where incompatible changes were made without changing the UID…

Implementation Issues

I mentioned earlier that the stream header can be used to distinguish between “standard” and “custom” serialization formats. However the ObjectOutputStream code has a design flaw: method writeStreamHeader is invoked from the constructor. There is therefore no easy way to pass information to a customised version of this method: the method signature is fixed, and no members of a subclass can be initialised before it is invoked from the parent constructor. The only reasonable way I found to pass parameters to an ObjectOutputStream which influence the (magic, version) header output by the stream was to use a thread local variable!
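The thread-local workaround can be sketched as follows. The class name VersionedObjectOutputStream and its factory method create are invented for this example; the only JDK behaviour relied on is that ObjectOutputStream's constructor calls writeStreamHeader.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

/** Writes a caller-chosen version into the stream header (sketch only). */
class VersionedObjectOutputStream extends ObjectOutputStream {
    // Carries the version into writeStreamHeader(), which runs inside super(out),
    // before any instance field of this subclass can be assigned.
    private static final ThreadLocal<Short> PENDING_VERSION = new ThreadLocal<>();

    private VersionedObjectOutputStream(OutputStream out) throws IOException {
        super(out);
    }

    /** Sets the thread-local only for the duration of the constructor call. */
    static VersionedObjectOutputStream create(OutputStream out, short version)
            throws IOException {
        PENDING_VERSION.set(version);
        try {
            return new VersionedObjectOutputStream(out);
        } finally {
            PENDING_VERSION.remove();
        }
    }

    @Override
    protected void writeStreamHeader() throws IOException {
        Short pending = PENDING_VERSION.get();
        short version = (pending != null) ? pending.shortValue() : STREAM_VERSION;
        writeShort(STREAM_MAGIC);
        writeShort(version);
    }
}
```

Hiding the constructor behind a factory keeps the set/remove pair in one place, so a stale value cannot leak to another stream created later on the same thread.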

Handling primitive types and arrays is tricky. Given an object to serialize, the class-descriptor written includes a string which is simply generated via object.getClass().getName(). When the object is a normal object then the behaviour is reasonably obvious: the string is just the fully-qualified classname. However when the type is a primitive, or an array, or an array of arrays, or an array-of-primitives, etc. then some very special string syntax is generated; for example int[] becomes "[I" and String[][] becomes "[[Ljava.lang.String;". Unfortunately the standard method ClassLoader.loadClass doesn’t accept such strings, so on the deserialization side a significant amount of work is needed to ensure the right object gets instantiated in these cases.
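A compact version of such a name-to-class helper is sketched below. It is far shorter than the implementation mentioned earlier and makes no claim to cover every case; SerialClassNames is an invented name, and the approach of building the array class via java.lang.reflect.Array is one option among several.

```java
import java.lang.reflect.Array;
import java.util.HashMap;
import java.util.Map;

/** Maps the strings produced by Class.getName() back to classes (sketch only). */
final class SerialClassNames {
    private static final Map<String, Class<?>> PRIMITIVES_BY_NAME = new HashMap<>();
    private static final Map<Character, Class<?>> PRIMITIVES_BY_CODE = new HashMap<>();

    static {
        register('Z', boolean.class);
        register('B', byte.class);
        register('C', char.class);
        register('S', short.class);
        register('I', int.class);
        register('J', long.class);
        register('F', float.class);
        register('D', double.class);
        PRIMITIVES_BY_NAME.put("void", void.class);
    }

    private SerialClassNames() {
    }

    private static void register(char code, Class<?> type) {
        PRIMITIVES_BY_CODE.put(code, type);
        PRIMITIVES_BY_NAME.put(type.getName(), type);
    }

    static Class<?> loadClass(ClassLoader loader, String name) throws ClassNotFoundException {
        Class<?> primitive = PRIMITIVES_BY_NAME.get(name);
        if (primitive != null) {
            return primitive; // "int", "boolean", ...
        }
        if (!name.startsWith("[")) {
            return loader.loadClass(name); // ordinary class name
        }
        // Array syntax: one '[' per dimension, then "Lsome.Type;" or a primitive code.
        int dims = 0;
        while (dims < name.length() && name.charAt(dims) == '[') {
            dims++;
        }
        String element = name.substring(dims);
        Class<?> elementType;
        if (element.length() > 2 && element.charAt(0) == 'L' && element.endsWith(";")) {
            elementType = loader.loadClass(element.substring(1, element.length() - 1));
        } else if (element.length() == 1 && PRIMITIVES_BY_CODE.containsKey(element.charAt(0))) {
            elementType = PRIMITIVES_BY_CODE.get(element.charAt(0));
        } else {
            throw new ClassNotFoundException("Malformed array class name: " + name);
        }
        // Create an empty array with the right number of dimensions and take its class.
        return Array.newInstance(elementType, new int[dims]).getClass();
    }
}
```

For example, SerialClassNames.loadClass(loader, "[[Ljava.lang.String;") should yield the same Class object as String[][].class.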

Related Topics

When building RPC-style communication using serialized objects over a network, the handling of exceptions can be tricky. I wrote an article about that earlier.

Quirks of Serialization

Pop quiz: under what circumstances can a class-descriptor for a non-serializable class be written to an ObjectOutputStream?

Answer 1:

class NonSerializable {
}

class Foo<NonSerializable> implements Serializable {
  private String thing;
  private transient NonSerializable thang;
}

Although no instance of the NonSerializable type is ever written to the stream, a class-descriptor is still written. And yep, this is from experience :-)

Answer 2:

class Foo implements Serializable {
  private Class<?> clazz = NonSerializable.class; // ok as long as no INSTANCES are sent..
}
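The second answer is easy to try out. The sketch below uses invented names (QuirkDemoUnsendable, QuirkDemoHolder, ClassReferenceQuirk) and simply round-trips an object that holds a Class reference to a non-serializable type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Deliberately not Serializable. */
class QuirkDemoUnsendable {
}

/** Holds only a Class reference to the non-serializable type, never an instance. */
class QuirkDemoHolder implements Serializable {
    final Class<?> type = QuirkDemoUnsendable.class;
}

final class ClassReferenceQuirk {
    private ClassReferenceQuirk() {
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

Serializing a QuirkDemoHolder succeeds, and the resulting bytes contain the name of QuirkDemoUnsendable, so the receiving side must be able to resolve that class as well.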

Another quirk of serialization: the standard ObjectInputStream silently fails when readObject/readResolve throws an unexpected (Runtime) exception. For anyone subclassing ObjectInputStream and overriding these methods, the solution is to catch all runtime exceptions and rethrow them as checked type InvalidObjectException.