The Graal Virtual Machine


Functionality

Oracle are well known for the Java Virtual Machine project (inherited from Sun). They have now published the first release-candidate for version 1.0 of a general-purpose virtual machine called Graal that supports executing:

  • JVM bytecode (production quality) - ie bytecode generated by languages such as Java, Scala, Groovy, Kotlin
  • JavaScript (production quality) - including Node.js applications
  • LLVM bitcode, ie apps compiled from C, C++, Rust and other languages via the LLVM compiler (experimental)
  • Python, Ruby, and R (experimental)

Code in these languages can call into other code running within Graal, regardless of the language it was written in! Making additional libraries (including the language standard libraries) available takes some extra setup, but is possible.

Not only does this allow running apps in a “standalone” environment, it means that any larger software package which embeds the Graal VM and allows user code to run in that VM can support any language that Graal supports. Examples include database servers which embed the VM for stored procedure logic.

Licensing

With Oracle, it is important to look at the licensing terms and conditions. At first glance these seem OK; the code is licensed entirely under the GPL2-with-classpath-exception, like OpenJDK. Oracle does warn that there is “no support” for the open-source code (aka “community edition”) and recommends that a support licence be bought for the “enterprise edition” instead - but OpenJDK is reliable enough, and so the Graal “community edition” will hopefully be so too.

Components

The Graal project provides:

  • A new JVM-bytecode-to-machinecode compiler, usable for both JIT (just-in-time) and AOT (ahead-of-time) compilation (this compiler happens to be written in Java).
  • The SubstrateVM - a new library that provides the lowest-level functions of a virtual machine - but does not interpret JVM bytecodes.
  • The Truffle framework - which helps implement other languages (eg JavaScript, LLVM bitcode, Python, Ruby, etc) as interpreters written in Java, which the Graal compiler can then compile to machine code.

Java version 9 or later supports “pluggable JITs”, making it possible to plug in the Graal JIT engine in place of the default (C++-based) JIT compiler.

The Graal compiler can also be used to compile code ahead-of-time (ie map JVM bytecode to native machine code), which improves startup time significantly. The result is an executable binary file; this does not embed a full JVM but instead embeds SubstrateVM which is a cut-down VM holding just the components (eg garbage collection) which an ahead-of-time-compiled application needs. Because the output of the AOT compilation contains no Java bytecode, SubstrateVM does not contain a bytecode interpreter or JIT.

When compiling Java code AOT:

  • any classes referenced from the application are embedded in the resulting binary, but not the whole Java standard library
  • and thus the resulting binary is small
  • but dynamic classloading is not supported (eg Class.forName("..")) and there are some limitations on reflection.

The lack of dynamic classloading is unfortunately a problem for many frameworks.
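To make the problem concrete, here is a minimal sketch in plain Java contrasting a static class reference with a lookup by name. The class and method names (ClassLoadingDemo, createStatically, createDynamically) are made up for illustration, and the "configured" class name stands in for whatever a framework would read from a config file or annotation. An ahead-of-time compiler can see the first kind of reference when it decides which classes to embed; for the second kind, the class name is only a string at build time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: contrasts a static class reference with a lookup by name.
final class ClassLoadingDemo {

    // Static reference: ArrayList appears directly in the code, so a build-time
    // analysis can see that the class is needed.
    static List<String> createStatically() {
        return new ArrayList<>();
    }

    // Dynamic lookup: the class is identified only by a string at run time, so in
    // general a build-time analysis cannot know which class will be requested.
    static List<String> createDynamically(String className) {
        try {
            Class<?> cls = Class.forName(className);
            @SuppressWarnings("unchecked")
            List<String> list = (List<String>) cls.getDeclaredConstructor().newInstance();
            return list;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Assumption for illustration: the class name comes from outside the code
        // (here a command-line argument, with a default value).
        String configured = args.length > 0 ? args[0] : "java.util.LinkedList";
        List<String> list = createDynamically(configured);
        list.add("hello");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```

On a traditional JVM both methods work. Frameworks that wire up components by name lean heavily on the second style, which is why the restriction hurts them.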

SubstrateVM is really a library of “core JVM functions” that an AOT application needs.

The Truffle framework helps to build language implementations without writing a traditional compiler. The developer only implements a parser and a simple interpreter (in Java) for their input language. The interpreter itself is ordinary JVM bytecode, so it can run on any JVM; when the Graal compiler is available, Truffle combines the interpreter with the program it is interpreting (a technique called partial evaluation) and has Graal JIT-compile the result to native machine code. Writing an interpreter is far simpler than writing a compiler.
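To give a feel for the “simple interpreter” style, here is a sketch of a tree-walking interpreter for arithmetic expressions. It is plain Java and deliberately does not use the Truffle API; every name in it (MiniInterpreter, ExprNode, and so on) is invented for illustration. The tree is built by hand where a real implementation would have a parser produce it.

```java
import java.util.HashMap;
import java.util.Map;

// A tiny tree-walking interpreter. Plain Java only: this is NOT the Truffle API,
// and all class names are hypothetical.
final class MiniInterpreter {

    // Every node of the syntax tree knows how to evaluate itself.
    abstract static class ExprNode {
        abstract long execute(Map<String, Long> env);
    }

    static final class LiteralNode extends ExprNode {
        private final long value;
        LiteralNode(long value) { this.value = value; }
        @Override long execute(Map<String, Long> env) { return value; }
    }

    static final class VariableNode extends ExprNode {
        private final String name;
        VariableNode(String name) { this.name = name; }
        @Override long execute(Map<String, Long> env) {
            Long v = env.get(name);
            if (v == null) throw new IllegalStateException("undefined variable: " + name);
            return v;
        }
    }

    static final class AddNode extends ExprNode {
        private final ExprNode left, right;
        AddNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
        @Override long execute(Map<String, Long> env) {
            return left.execute(env) + right.execute(env);
        }
    }

    static final class MulNode extends ExprNode {
        private final ExprNode left, right;
        MulNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
        @Override long execute(Map<String, Long> env) {
            return left.execute(env) * right.execute(env);
        }
    }

    // Hand-built tree for the expression (x + 2) * 3; a parser would normally create this.
    static ExprNode sample() {
        return new MulNode(
                new AddNode(new VariableNode("x"), new LiteralNode(2)),
                new LiteralNode(3));
    }

    public static void main(String[] args) {
        Map<String, Long> env = new HashMap<>();
        env.put("x", 4L);
        System.out.println(sample().execute(env)); // prints 18
    }
}
```

Written naively like this, every evaluation pays for a chain of virtual calls. The point of Truffle is that the language implementer still writes code of roughly this shape, and the framework takes care of making it fast.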

Truffle can be run on any modern Java VM (version 8 or later), and has been in development for a long while. Programs in any language for which a Truffle interpreter/compiler has been implemented (JavaScript, Python, Ruby, LLVM, etc.) can be run on a JVM. When running just one language, it is usual to use the dedicated tools provided in the GraalVM distribution, eg node, which is a binary containing the Truffle framework for Node.js, the SubstrateVM, and other necessary bits - but no support for other languages. If you want to run a mix of languages within the same process then you need to execute node --jvm to run Truffle on a full (traditional) JVM instead.

The Graal and Truffle frameworks are themselves written in Java. It is technically interesting that a Java JIT can be implemented in Java - but “self-bootstrapping” languages are actually quite normal. The traditional Java JIT engine is written in C++, which its maintainers find hard to work with.

LLVM bitcode is supported because someone has written an interpreter for LLVM bitcode using the Truffle framework (in Java); Truffle then accelerates this interpreter to JIT-type performance.

The code Truffle generates relies on:

  • the JVM garbage collector
  • the JVM mutex and other synchronization primitives

which can be provided at runtime by either a traditional JVM or SubstrateVM.

The polyglot library provides a gateway between different languages running in the same JVM instance. It is available as a native library for each supported language, eg there is a Python Polyglot library which provides an API for calling into other languages. The API is fairly ugly, ie calling between languages is possible but not elegant.

The Python support in Graal also runs C-language Python extensions! It does this by compiling the C source to LLVM bitcode and then interpreting that bitcode with the Graal LLVM support.

And you can even use Graal AOT to generate a library which can be called using the standard C conventions!

Graal can accelerate mathematical processing in dynamically-typed languages, because the compiled code can skip most of the dynamic dispatch and type checks. Of course in many cases such processing is done by calling into native C libs anyway (R and Python often do this).
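The idea of skipping type checks can be sketched in plain Java. The class below is a hypothetical illustration (SpeculativeAdd is an invented name, and this is not how Graal is actually implemented): an “add” operation for a dynamically-typed language speculates that its operands are always integers, takes a fast path behind a single cheap guard while that holds, and falls back to a generic path once the speculation fails.

```java
// Sketch of type speculation for a dynamically typed "add" operation.
// Plain Java with hypothetical names; it mimics the idea, not Graal's real machinery.
final class SpeculativeAdd {

    // Speculation: every pair of operands seen so far consisted of two Integers.
    private boolean intsOnly = true;

    Object execute(Object left, Object right) {
        if (intsOnly) {
            if (left instanceof Integer && right instanceof Integer) {
                // Fast path: one guard, then primitive arithmetic.
                return (Integer) left + (Integer) right;
            }
            // Guard failed: abandon the speculation and use the generic path from now on.
            intsOnly = false;
        }
        return genericAdd(left, right);
    }

    // Generic path: inspects the operand types on every call.
    private static Object genericAdd(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            return (Integer) left + (Integer) right;
        }
        if (left instanceof Number && right instanceof Number) {
            return ((Number) left).doubleValue() + ((Number) right).doubleValue();
        }
        // Assumption for this sketch: anything else is concatenated as text.
        return String.valueOf(left) + String.valueOf(right);
    }

    boolean isSpecialized() {
        return intsOnly;
    }

    public static void main(String[] args) {
        SpeculativeAdd add = new SpeculativeAdd();
        System.out.println(add.execute(1, 2));     // fast path
        System.out.println(add.isSpecialized());   // still speculating on integers
        System.out.println(add.execute(1.5, 2));   // guard fails, generic path
        System.out.println(add.isSpecialized());   // speculation abandoned
    }
}
```

In a tight numeric loop where the speculation holds, a compiler that trusts the guard can reduce the whole operation to a machine-level integer add, which is where the speed-up for dynamically-typed code comes from.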

As mentioned in the top-10-things blog post, implementing a language interpreter in Truffle is easy - one of the easiest ways to implement a new language at all. And it is then automatically accelerated to very high speeds. It is therefore a serious possibility for anyone considering prototyping a new language. And you also get a debugger for free!

References