Trying out GraalVM

Interested in running Java a bit like you'd run C/C++ or Go? I wanted to give it a try: GraalVM is a very interesting project for a few reasons - a new Java JIT compiler, the ability to run a variety of languages on the JVM, and, most importantly here, the native image capability. With native images, programs for the JVM are compiled down to standalone binaries just like C, C++, Go and other languages, improving startup performance, shrinking the deployable artifact, and removing the need to install a JVM on the target container or computer.


Most important to me was performance and efficiency - the promise of Java code that could run faster and/or more efficiently (Java is already fairly efficient according to this study, though it still has some way to go to reach C-level performance). The other features are also appealing: a small deployable binary, memory savings, faster startup times, and the question of whether Python code on GraalVM would run faster than it did on Jython.

The takeaway is that many of these promises are delivered on, but not all at once or all equally. Performance of native images sometimes lagged the normal JVM significantly.

To get started, download GraalVM. I used the Java 11 build rather than Java 8, although I later found that Java 11 has poorer performance than Java 8 or the newest (at the moment) Java 14 (OpenJDK 14). Once it's downloaded and unpacked (tar xzvf ...), cd into the unpacked directory. Then set some environment variables:
export JAVA_HOME=$PWD
export PATH=$PWD/bin:$PATH
which java; java -version
which gu
The last few commands check that you're using the right Java location and version and that gu, the "GraalVM Component Updater", is available. java -version produced output including this: OpenJDK Runtime Environment GraalVM CE 20.1.0.
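You can also confirm which runtime you're on from inside Java via the standard system properties. This small helper isn't from the original setup - just a sketch; on GraalVM the vendor/VM-name properties mention GraalVM:

```java
public class PrintVmInfo {
    public static void main(String[] args) {
        // Standard system properties identifying the runtime in use.
        // On GraalVM CE these mention "GraalVM" (cf. java.vendor later in this post).
        System.out.println("java.vendor:  " + System.getProperty("java.vendor"));
        System.out.println("java.vm.name: " + System.getProperty("java.vm.name"));
        System.out.println("java.version: " + System.getProperty("java.version"));
    }
}
```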

Using this trivial Java code, let's give GraalVM a try - here's SimpleGraalExample.java:
public class SimpleGraalExample {
    public static void main(String[] args) {
        System.out.println(System.currentTimeMillis());
        System.out.println("Hello from the simplest GraalVM Example");
    }
}


Compile that with plain javac and run it:
javac SimpleGraalExample.java
date +%s%N; java SimpleGraalExample
1595759258474307392
1595759258719
Hello from the simplest GraalVM Example

The date call prints the time since the epoch in seconds plus nanoseconds, while the Java code prints the time in milliseconds. This gives a rough way to see how long the JVM took to start up and reach the first statement: 719 ms - 474 ms = 245 ms in this case. (Shell overhead on the command line accounts for some of that time. I think there's also a way to get the JVM to report its own startup time, so I'll update.)
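One way to have the JVM report this itself is the RuntimeMXBean API, which records when the JVM process started. This is a sketch of the idea rather than the exact measurement above - it measures from JVM process start to the point the code runs, not from the shell prompt:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

public class StartupTime {
    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        // getStartTime(): wall-clock ms since epoch when the JVM started
        // getUptime():    ms elapsed since the JVM started
        long now = System.currentTimeMillis();
        System.out.println("JVM started at (ms since epoch): " + runtime.getStartTime());
        System.out.println("Uptime (ms):                     " + runtime.getUptime());
        System.out.println("Startup to here (ms):          ~" + (now - runtime.getStartTime()));
    }
}
```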

To make a native image, you'll need gcc and a few development libraries - glibc-devel, zlib-devel, and libstdc++ - then install the Graal native-image component using gu:
gu install native-image

Next run:
native-image SimpleGraalExample
On my system, it took about 10s and about 2GB of memory to produce the native image. Once done, you should have an executable file named simplegraalexample (all lower case, per Linux/Unix convention). Run it:
date +%s%N; ./simplegraalexample  
1595759279619196068
1595759279628
Hello from the simplest GraalVM Example
The startup time has dropped to roughly 9ms. So we have a single-file executable with a fast startup time - definitely useful for containers set to autoscale under load, especially if this also shrinks startup times for larger applications that would normally take several seconds to load.

In this example, we've not really exercised the JIT at all. Java's current JIT has two compiler tiers - C1 for quick, decent improvements and C2 for deeper optimizations that take longer. Here's a description from the C++ JDK code:

 *  The system supports 5 execution levels:
 *  * level 0 - interpreter
 *  * level 1 - C1 with full optimization (no profiling)
 *  * level 2 - C1 with invocation and backedge counters
 *  * level 3 - C1 with full profiling (level 2 + MDO)
 *  * level 4 - C2


The idea behind the new JIT compiler in Graal is to write it in Java so that it's easier to maintain and extend, and so that it avoids the memory-safety issues of the current C++ implementation.

To test the JIT and Graal's performance a bit more, use the CountUppercase example from the GraalVM site. I've added a call to System.currentTimeMillis() at the start.
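The source isn't shown here, so this is a sketch reconstructed from the GraalVM demo and the output below (the sentence has 7 uppercase letters, and 7 × 9,999,999 iterations gives the 69999993 total), with the System.currentTimeMillis() call added as described:

```java
public class CountUppercase {
    // Counts the uppercase characters in a string.
    static long countUppercase(String s) {
        return s.chars().filter(Character::isUpperCase).count();
    }

    public static void main(String[] args) {
        System.out.println(System.currentTimeMillis()); // added: start time in ms
        String sentence = String.join(" ", args);
        long total = 0, start = System.currentTimeMillis(), last = start;
        for (int i = 1; i < 10_000_000; i++) {
            total += countUppercase(sentence);
            if (i % 1_000_000 == 0) {
                long now = System.currentTimeMillis();
                System.out.printf("%d (%d ms)%n", i / 1_000_000, now - last);
                last = now;
            }
        }
        System.out.printf("total: %d (%d ms)%n", total, System.currentTimeMillis() - start);
    }
}
```

The repeated countUppercase call in a hot loop is exactly the kind of code the JIT can optimize after profiling, which is why the first iteration is so much slower than the rest.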
javac CountUppercase.java
date +%s%N; java CountUppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1595760201725131521
1595760202020
1 (687 ms)
2 (199 ms)
3 (159 ms)
4 (158 ms)
5 (140 ms)
6 (142 ms)
7 (169 ms)
8 (151 ms)
9 (141 ms)
total: 69999993 (2096 ms)

The time for the first vs the second and subsequent iterations shows the effect of JIT compilation. If you want to watch the compilation happen, add -Dgraal.PrintCompilation=true to the java command above.

Run this again with the new JIT compiler turned off, falling back to the default:
date +%s%N; java -XX:-UseJVMCICompiler CountUppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1595760345136700939
1595760345238
1 (1051 ms)
2 (957 ms)
3 (954 ms)
4 (940 ms)
5 (943 ms)
6 (950 ms)
7 (942 ms)
8 (953 ms)
9 (958 ms)
total: 69999993 (9596 ms)


Below, I tried to "turn on" tiered compilation (C1 + C2) another way, but since tiered compilation has been the default behavior since Java 8, I'm not sure what it really did - the results look much less like the -UseJVMCICompiler run and much more like the Graal JIT.

date +%s%N; java -XX:+TieredCompilation CountUppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1595760835130792310
1595760835241
1 (602 ms)
2 (279 ms)
3 (194 ms)
4 (235 ms)
5 (165 ms)
6 (220 ms)
7 (154 ms)
8 (271 ms)
9 (168 ms)
total: 69999993 (2451 ms)


To stop the JIT at level 2, use a command like this:
java -XX:-UseJVMCICompiler -XX:TieredStopAtLevel=2 CountUpper...

To see the native image performance:
native-image CountUppercase
date +%s%N; ./countuppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1595780282446553637
1595780282455
1 (1467 ms)
2 (1227 ms)
3 (1222 ms)
4 (1211 ms)
5 (1219 ms)
6 (1216 ms)
7 (1233 ms)
8 (1252 ms)
9 (1277 ms)
total: 69999993 (12585 ms)
An almost instant startup, but we've lost performance here - quite a lot, it seems.

The native-image tool has a profile-guided optimization option that is only available in the Enterprise Edition, and a tracing agent that helps identify which classes, reflection calls, and resources are used at run time. The latter can help reduce startup time by letting native-image initialize classes at build time. To use the tracing agent, run the code normally and exercise the code paths. For the simple CountUppercase.java, run it like this:
 
java -agentlib:native-image-agent=config-output-dir=./META-INF/native-image/ CountUppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
 
This will put four files (jni-config.json, proxy-config.json, reflect-config.json, resource-config.json) into META-INF/native-image/. In this case, the files were close to empty, so I knew this wouldn't make much of a difference. However, in a larger application, the trace will identify class usage better than static analysis can.
Now, run:
native-image -cp ./META-INF/ CountUppercase
date +%s%N; ./countuppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1595782668222550375
1595782668277
1 (1364 ms)
2 (1177 ms)
3 (1168 ms)
4 (1174 ms)
5 (1154 ms)
6 (1163 ms)
7 (1152 ms)
8 (1165 ms)
9 (1209 ms)
total: 69999993 (12008 ms)
Unfortunately, not much of a gain at all.

While this benchmark runs slower as a native image, tests with a simple web app (using HTTPServer) showed very little performance difference either way, suggesting that for APIs and Spring Boot apps the native image would have little impact on throughput. (Will add code...)
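The web app itself isn't shown here; this is a minimal sketch of the kind of app meant - the JDK's built-in com.sun.net.httpserver server answering a fixed response - not the exact code tested:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class SimpleWebApp {
    // Starts an HttpServer answering "hello" on every path.
    // Passing port 0 asks the OS for a free port.
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/", exchange -> {
                byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

An app like this is dominated by network I/O rather than hot loops, which is one plausible reason the JIT vs native image difference mostly disappears.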

Testing the Java SciMark benchmark, here are the steps and results (javac and javac -O produced similar results, as you'd expect if the JIT rather than javac is doing most of the optimization):
javac jnt/scimark2/commandline.java
time java -classpath . jnt.scimark2.commandline
 
SciMark 2.0a
 
Composite Score: 1050.056478100177
FFT (1024): 1036.3878030994783
SOR (100x100):   857.5678798364443
Monte Carlo : 408.1116711801417
Sparse matmult (N=1000, nz=5000): 1338.152124290685
LU (100x100): 1610.0629120941362
 
java.vendor: GraalVM Community
java.version: 11.0.7
os.arch: amd64
os.name: Linux
os.version: 5.7.8-100.fc31.x86_64
 
real    0m31.560s
user    0m31.959s
sys    0m0.098s


Using the -UseJVMCICompiler option to turn off the new compiler:

time java -XX:-UseJVMCICompiler -classpath . jnt.scimark2.commandline
 
SciMark 2.0a
 
Composite Score: 1442.2801308185528
FFT (1024): 914.6516464184309
SOR (100x100):   1068.4530497649632
Monte Carlo : 664.6498065711132
Sparse matmult (N=1000, nz=5000): 1135.3139812371085
LU (100x100): 3428.3321701011487


Trying again with the +TieredCompilation option gave results more in line with the GraalVM JIT:
time java -XX:+TieredCompilation -classpath . jnt.scimark2.commandline
 
SciMark 2.0a
 
Composite Score: 1053.134652575841
.....
I suspect it's not turning off the new JIT, since tiered compilation is already the default.

Next, create a native image from a jar and run it. First add a manifest file, create the jar, and then run native-image:
cat META-INF/MANIFEST.MF 
Main-Class: jnt.scimark2.commandline
jar cmvf META-INF/MANIFEST.MF scimark2.jar jnt/scimark2/*.class
java -jar scimark2.jar
native-image -jar scimark2.jar commandline # wait about 40s to run
./commandline

SciMark 2.0a
 
Composite Score: 665.0761252118398
FFT (1024): 678.1650940478376
SOR (100x100):   877.8838572232665
Monte Carlo : 31.565788470303204
Sparse matmult (N=1000, nz=5000): 695.5266550424169
LU (100x100): 1042.2392312753743
 
java.vendor: Oracle Corporation
java.version: 11.0.7
os.arch: amd64
os.name: Linux
os.version: 5.7.8-100.fc31.x86_64

Again, the performance here has dropped (note also the change in java.vendor). While investigating this, I came across a comment from adinn that I'd summarize as: "why would you expect the static compiler to produce code as fast as the JIT?" He carries on with an excellent explanation. The reason I'd expected it to run as fast or faster, though, is that statically compiled and optimized C/C++ code runs very fast (faster than Java). Undoubtedly there is more performance to be found, but overall, the ability to make native images (no JVM installation on the production system), fast startups, support for other programming languages, and the potential of the new Java-based compiler make GraalVM a very interesting project.

I haven't added stats for running GraalVM against a project that uses large frameworks like Spring; I'll add that later. However, this is another place Graal shines - by evaluating code paths and discarding unreachable code, the binaries are much smaller and start up much faster (as above) than normal.
