What is EMF?

Categories: Java

Like so many other topics on the internet, most tutorials for the Eclipse Modeling Framework (EMF) plunge straight into some made-up example program that shows how to use it, without first explaining what it is and why you might want to use it. Here is the missing overview!

At its core, EMF is quite simple to describe:

  1. An EMF “meta-model” is equivalent to a simplified UML class diagram, ie it defines classes, their properties, and the relations between classes.
  2. Tooling applications can take the meta-model as input and do things such as analyse the classes for complexity or compliance against certain rules, or generate output based on the input model.
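To make those two points concrete, here is a minimal sketch in plain Java. The classes ModelClass, ModelAttribute and InterfaceGenerator are hypothetical stand-ins invented for this illustration; they are not the real EMF (Ecore) API, which is considerably richer. The idea is only that a model is data describing classes, and a tool is code that walks that data and produces something.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified meta-model types; NOT the real EMF (Ecore) classes.
class ModelAttribute {
    final String name;
    final String type;

    ModelAttribute(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

class ModelClass {
    final String name;
    final List<ModelAttribute> attributes = new ArrayList<>();

    ModelClass(String name) {
        this.name = name;
    }

    // Adds an attribute and returns this, so that calls can be chained.
    ModelClass attribute(String attrName, String attrType) {
        attributes.add(new ModelAttribute(attrName, attrType));
        return this;
    }
}

// A "tool" that takes the model as input and generates Java interface source text.
class InterfaceGenerator {
    static String generate(ModelClass c) {
        StringBuilder sb = new StringBuilder();
        sb.append("public interface ").append(c.name).append(" {\n");
        for (ModelAttribute a : c.attributes) {
            String cap = Character.toUpperCase(a.name.charAt(0)) + a.name.substring(1);
            sb.append("    ").append(a.type).append(" get").append(cap).append("();\n");
            sb.append("    void set").append(cap).append("(").append(a.type).append(" value);\n");
        }
        sb.append("}\n");
        return sb.toString();
    }
}

class MetaModelDemo {
    public static void main(String[] args) {
        ModelClass book = new ModelClass("Book")
                .attribute("title", "String")
                .attribute("pages", "int");
        System.out.print(InterfaceGenerator.generate(book));
    }
}
```

A different tool could walk the same ModelClass objects and emit a configuration file or a complexity report instead; the model itself would not change.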

The EMF project provides the following tools that operate on a meta-model:

  • a code generator that can create corresponding Java classes with various interesting method implementations
  • a generator of JPA configuration files for mapping EMF modelled objects into SQL database tables
  • a generator that generates the Java code for an Eclipse RCP user interface which can edit the EMF modelled objects

Presumably there are a number of other tools which can operate on EMF meta-models (feel free to list them in the comments section of this posting).

 

The Java Code Generator

The standard code generator features include:

  • Adding generic set(int fieldId, Object value) and get(int fieldId) methods to each generated class (in the generated code these are named eSet and eGet, and take a few extra parameters)
  • Creating setter methods which not only update the local field but also send a “change event” to listeners

The generic getters/setters together with the auto-generated field-identifier constants can be used by other EMF classes to do “fast reflection”. These generic getters/setters can also be used via EMF-aware serialization (eg serialize-to-binary or serialize-to-xml) classes. A “field id” can be used to identify a specific property on a bean in a way that is difficult to do with plain Java (a java.lang.reflect.Method object can’t be referenced directly at compile-time, and isn’t quite the same thing as a property anyway).
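As a rough illustration of the idea, here is a hand-written class in the spirit of what the generator produces. Everything in it (Person, ChangeListener, the NAME and AGE constants, the method names) is made up for this sketch, and the real generated code differs in naming and detail. It shows the two features listed above: generic accessors that dispatch on an integer constant rather than using java.lang.reflect, and setters that notify listeners.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener type for this sketch; EMF has its own notification classes.
interface ChangeListener {
    void changed(Object source, int fieldId, Object oldValue, Object newValue);
}

// Hand-written approximation of a generated model class (names are assumptions).
class Person {
    // Field-identifier constants: plain ints that can be referenced at compile time.
    static final int NAME = 0;
    static final int AGE = 1;

    private String name;
    private int age;
    private final List<ChangeListener> listeners = new ArrayList<>();

    void addListener(ChangeListener listener) {
        listeners.add(listener);
    }

    String getName() { return name; }
    int getAge() { return age; }

    // Typed setters update the field and then send a change event.
    void setName(String newName) {
        String old = name;
        name = newName;
        fire(NAME, old, newName);
    }

    void setAge(int newAge) {
        int old = age;
        age = newAge;
        fire(AGE, old, newAge);
    }

    // Generic getter: a switch on the field id, with no reflection involved.
    Object get(int fieldId) {
        switch (fieldId) {
            case NAME: return getName();
            case AGE: return getAge();
            default: throw new IllegalArgumentException("Unknown field id " + fieldId);
        }
    }

    // Generic setter: delegates to the typed setter so the change event still fires.
    void set(int fieldId, Object value) {
        switch (fieldId) {
            case NAME: setName((String) value); break;
            case AGE: setAge((Integer) value); break;
            default: throw new IllegalArgumentException("Unknown field id " + fieldId);
        }
    }

    private void fire(int fieldId, Object oldValue, Object newValue) {
        for (ChangeListener listener : listeners) {
            listener.changed(this, fieldId, oldValue, newValue);
        }
    }
}

class FieldIdDemo {
    public static void main(String[] args) {
        Person p = new Person();
        p.addListener((src, id, oldV, newV) ->
                System.out.println("field " + id + ": " + oldV + " -> " + newV));
        p.set(Person.NAME, "Ada");   // prints "field 0: null -> Ada"
        p.set(Person.AGE, 36);       // prints "field 1: 0 -> 36"
    }
}
```

A serializer written against get(int) and set(int, Object) can handle any such class by looping over its field ids, which is roughly what “fast reflection” means here.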

EMF goes to extreme lengths to encourage developers to access types via their interfaces only. For each type defined via an EMF model, the Java-code-generator creates an interface class corresponding to the model, and a separate implementation class. For each Java package it then generates a factory class through which types from that package can be instantiated (rather than using the “new” operator).
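The resulting shape looks roughly like the following sketch. The names Book, BookImpl, LibraryFactory and LibraryFactoryImpl are hypothetical, and real generated code contains far more than this; EMF's generated factories do expose a shared instance in a similar way (as a field named eINSTANCE), but treat the rest as an approximation.

```java
// Interface corresponding to the modelled type; client code should depend only on this.
interface Book {
    String getTitle();
    void setTitle(String title);
}

// Separate implementation class; callers are not meant to reference it directly.
class BookImpl implements Book {
    private String title;

    @Override
    public String getTitle() { return title; }

    @Override
    public void setTitle(String title) { this.title = title; }
}

// One factory per package, used instead of the "new" operator.
interface LibraryFactory {
    // Shared instance (hypothetical name for this sketch).
    LibraryFactory INSTANCE = new LibraryFactoryImpl();

    Book createBook();
}

class LibraryFactoryImpl implements LibraryFactory {
    @Override
    public Book createBook() {
        return new BookImpl();
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        // Client code never names BookImpl.
        Book book = LibraryFactory.INSTANCE.createBook();
        book.setTitle("Dune");
        System.out.println(book.getTitle()); // prints "Dune"
    }
}
```

Because client code never names the implementation class, the framework is free to hand back something else that implements the same interface.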

The “change events” mechanism can be used to generate “audit trails” of changes to all child objects of an arbitrary root object. This can be useful for supporting “undo/redo” functionality, and also for keeping two trees “in sync” by transmitting just the changes.
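Here is a minimal sketch of that idea, assuming a simplified map-backed object rather than real generated classes. TrackedObject, Change and ChangeRecorder are names invented for this example and are not EMF types. A single recorder listens to several objects, keeps the changes in order, and can undo them one at a time.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// One recorded change: which object, which field, and the values before and after.
class Change {
    final TrackedObject target;
    final String field;
    final Object oldValue;
    final Object newValue;

    Change(TrackedObject target, String field, Object oldValue, Object newValue) {
        this.target = target;
        this.field = field;
        this.oldValue = oldValue;
        this.newValue = newValue;
    }
}

// Hypothetical stand-in for a model object whose setters publish change events.
class TrackedObject {
    private final Map<String, Object> values = new HashMap<>();
    private final Consumer<Change> listener;

    TrackedObject(Consumer<Change> listener) {
        this.listener = listener;
    }

    Object get(String field) {
        return values.get(field);
    }

    void set(String field, Object value) {
        Object old = values.put(field, value);
        listener.accept(new Change(this, field, old, value));
    }

    // Writes a value without notifying the listener; used only when undoing.
    void restore(String field, Object value) {
        values.put(field, value);
    }
}

// Collects every change from every object it listens to, newest first.
class ChangeRecorder implements Consumer<Change> {
    private final Deque<Change> undoStack = new ArrayDeque<>();

    @Override
    public void accept(Change change) {
        undoStack.push(change);
    }

    int size() {
        return undoStack.size();
    }

    // Reverts the most recent change; returns false when nothing is left to undo.
    boolean undo() {
        if (undoStack.isEmpty()) {
            return false;
        }
        Change c = undoStack.pop();
        c.target.restore(c.field, c.oldValue);
        return true;
    }
}

class AuditTrailDemo {
    public static void main(String[] args) {
        ChangeRecorder recorder = new ChangeRecorder();
        TrackedObject order = new TrackedObject(recorder);
        TrackedObject line = new TrackedObject(recorder);

        order.set("status", "open");
        line.set("quantity", 1);
        line.set("quantity", 5);

        recorder.undo();
        System.out.println(line.get("quantity")); // prints 1
    }
}
```

The same stream of Change objects could instead be sent over the network and replayed against a copy of the tree, which is the “in sync” use case.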

The generated Java code can be edited, and reimported into EMF, ie “round trip” functionality works so you can add hand-crafted code to the generated classes and then update/regenerate without losing changes. The generated code is reasonably readable, although a few oddities appear, such as lists being of type EList instead of plain java.util.List, and various EMF-specific Javadoc tags (such as @generated) being added to classes, fields and methods. The meta-model itself is saved as a “.ecore” file (internally, XML).

Eclipse also provides a few library classes that work together with the output of the standard code generator to achieve useful effects described below.

Unfortunately, although hand-made changes are supported, it is nevertheless inconvenient to make major changes to the generated classes; the presence of auto-generated code makes them unnatural to work with as regular code. These classes therefore typically end up as simple “data transfer” types without any significant business logic.

Resources and Proxies

One of the common problems when dealing with Java objects that are loaded from a database or distributed between a client and server application is deciding which of an object’s child properties to populate. An object may contain references to many other objects (particularly when it has a member which is a collection of other objects). Passing the entire object with everything it references across the network (or loading it all from the database) may take a lot of time, memory and bandwidth - and in many cases the code using the object will not need all those referenced objects.

The Hibernate ORM allows relations to be “lazy”; when the main object is loaded and it has a lazy relation to some other object then Hibernate will set the object’s property to be a proxy object (which implements the expected type). If the proxy is ever invoked, then it loads the actual data from the database.

The EMF “Resource Framework” works similarly. Any EMF-generated object can be replaced by a proxy which is of the expected type but internally just wraps a reference to a Resource and a URL (a URI, in EMF's terminology). If the proxy is ever accessed, then it asks the Resource to return the object identified by the URL (which contains among other things the object type and unique id). The Resource checks its internal pool of objects, and if it doesn’t have the object then the appropriate URL handler is invoked to obtain the object somehow (eg via database load or remote RMI call to a server).
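The lookup flow described above can be sketched in plain Java as follows. This is only an illustration of the pattern: SimpleResource, Author and AuthorImpl are made-up names, the “db://author/42” identifier is invented, and java.lang.reflect.Proxy is used purely for convenience. EMF's own proxy mechanism is implemented differently.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

interface Author {
    String getName();
}

class AuthorImpl implements Author {
    private final String name;

    AuthorImpl(String name) {
        this.name = name;
    }

    @Override
    public String getName() { return name; }
}

// Hypothetical stand-in for a resource: a pool of loaded objects plus a loader
// that plays the role of the "URL handler" (database load, remote call, ...).
class SimpleResource {
    private final Map<String, Object> pool = new HashMap<>();
    private final Function<String, Object> loader;
    int loads = 0; // counts how often the loader was actually invoked

    SimpleResource(Function<String, Object> loader) {
        this.loader = loader;
    }

    // Returns the pooled object, loading it on first request.
    Object resolve(String uri) {
        Object target = pool.get(uri);
        if (target == null) {
            target = loader.apply(uri);
            loads++;
            pool.put(uri, target);
        }
        return target;
    }

    // Creates a proxy of the expected type that holds only this resource and the URI.
    <T> T proxy(Class<T> type, String uri) {
        InvocationHandler handler =
                (self, method, args) -> method.invoke(resolve(uri), args);
        Object p = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, handler);
        return type.cast(p);
    }
}

class LazyProxyDemo {
    public static void main(String[] args) {
        SimpleResource resource =
                new SimpleResource(uri -> new AuthorImpl("loaded from " + uri));
        Author author = resource.proxy(Author.class, "db://author/42");
        System.out.println(resource.loads);   // prints 0: nothing loaded yet
        System.out.println(author.getName()); // prints "loaded from db://author/42"
        System.out.println(resource.loads);   // prints 1
    }
}
```

Note that the load happens wherever the first method call on the proxy happens to occur, which is convenient but also makes the timing of IO hard to predict.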

EMF Editors

EMF can also generate an Eclipse editor component that can edit a modelled class (and its fields) in a typesafe (but not particularly elegant) way.

So Why use EMF?

Positives

Being able to generate multiple outputs from a single object model allows “unification” of the various ways that Java classes can declare meta-information about their properties. JAXB for XML import/export has one set of annotations for this, while JPA (Java Persistence API) has an almost equivalent yet different set. Having a common abstract meta-model means that generated code can have the appropriate annotations (or config files) generated reasonably easily. Code is not the only thing that can be generated; there is a standard API for analysing an EMF meta-model, and things like configuration files can also potentially be generated from the data in the model.

Having the option for all setter methods to dispatch a change-event to a listener is useful in many ways. Adding such boiler-plate code by hand quickly becomes tiresome.

Negatives

The generated classes are internally ugly. They extend Eclipse base classes, ie you cannot choose what the actual base class for the type is. The code is sort of readable, but certainly not elegant. It isn’t easy to add business logic to them, so the generated classes tend to be “dumb data transfer objects” rather than have real domain behaviour.

The Resource/Proxy framework shows all the flaws of the Hibernate proxy system - unpredictable runtime behaviour, difficult debugging, and all IO behaviour forced through a single simple interface when in fact different types may need different behaviours. There is also a very steep learning curve associated with the Resource framework code.

Having custom logic in all setters could also be achieved by bytecode-weaving techniques rather than code generation; this would keep unnecessary details from cluttering the code that developers see.

Summary

EMF might be useful if you have a multi-language system, eg C++ talking to Java talking to Python. In this case, you can write the OO design in a modelling tool, then import it into EMF and generate Java code. However, that describes few real-world projects. Having a model is also useful for more complex object relationships, as a diagram may be easier to understand than dozens of interfaces. On the other hand, there are already many modelling tools that can import existing code for this purpose without needing the “generate” part. EMF is primarily an IBM-driven project coming out of their needs for developing systems that are large, complex, multi-programming-language, and very formally designed - and IMO that is where EMF best fits, if anywhere.

Is EMF worth it? The Eclipse Foundation people seem to think so; several significant Eclipse IDE parts and plugins use EMF-modelled classes. I personally find it an ugly wart; compile-time code generation has never seemed elegant to me. It would be much nicer to see this sort of thing achieved with AOP and bytecode weaving applied to normal POJOs using standard Java annotations.