Saturday, August 8, 2020

Trying out GraalVM

Interested in running Java a bit like running C/C++ or Go? I wanted to give it a try. GraalVM is an interesting project for a few reasons: a new Java JIT compiler, the ability to run a variety of languages on the JVM, and, most important here, native images. With native images, programs written for the JVM are compiled ahead of time into native binaries, just like C, C++, Go and other languages. The promise is better performance, a smaller deployable binary, and no need to install a JVM on the target container or computer.


Most important to me were performance and efficiency: the promise of Java code that could run faster and/or more efficiently (Java is already efficient according to this study, but still has some way to go to reach C-level performance). The other features are also appealing: a small deployable binary, memory savings, faster startup times, and the question of whether Python code would run faster on GraalVM than it did on JPython.

The takeaway is that many of these are delivered, but not all at once or all equally. Performance of native images sometimes lagged the normal JVM significantly.

To get started, download GraalVM. I used the Java 11 version rather than Java 8, although I later found that Java 11 performs worse than Java 8 or the newest (at the moment) Java 14 (OpenJDK 14). Once it's downloaded and unpacked (tar xzvf ...), cd into the unpacked directory, then put its bin directory on your PATH and check the setup:
export PATH=$PWD/bin:$PATH
which java; java -version
which gu
The last few commands check that you're using the right Java location and version and that gu, the "GraalVM Component Updater", is available. The java -version output included this: OpenJDK Runtime Environment GraalVM CE 20.1.0.

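Let's give GraalVM a try with this trivial Java code:
public class SimpleGraalExample {
    public static void main(String[] args) {
        // Print the time in milliseconds to compare with the shell's date output
        System.out.println(System.currentTimeMillis());
        System.out.println("Hello from the simplest GraalVM Example");
    }
}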

Compile that with javac and run it:
javac SimpleGraalExample.java
date +%s%N; java SimpleGraalExample  
Hello from the simplest GraalVM Example

The date call prints the time since the epoch in seconds immediately followed by the nanoseconds, while the Java code prints the time in milliseconds. Comparing the two gives a rough idea of how long the JVM took to start up and run the first statement: 719 ms - 474 ms = 245 ms in this case. (Some of that time is spent on the command line itself. I think there's a way to get the JVM to report its startup time, so I will update this.)
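One candidate for getting the JVM to report its own startup time is the standard java.lang.management API. The sketch below is my assumption about a reasonable approach, not something measured above: RuntimeMXBean.getUptime() returns the milliseconds since the JVM recorded its own start, so it leaves out the process launch before that point and won't match the date-based number exactly. The class name StartupTime is hypothetical, and I haven't checked how this API behaves inside a native image.

```java
import java.lang.management.ManagementFactory;

public class StartupTime {
    // Milliseconds since the JVM recorded its own start time
    // (assumption: a good-enough proxy for "startup time").
    static long millisSinceJvmStart() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        // Read the uptime before doing any other work in main()
        long elapsed = millisSinceJvmStart();
        System.out.println("Hello from the simplest GraalVM Example");
        System.out.println("JVM uptime when main() started: " + elapsed + " ms");
    }
}
```

Running it with java StartupTime prints the greeting followed by the uptime, which can be compared against the date-based estimate.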

To make a native image, you'll need gcc and a few development libraries: glibc-devel, zlib-devel, and libstdc++. Then install the GraalVM native image component using gu:
gu install native-image

Next run:
native-image SimpleGraalExample
On my system, it took about 10 s and about 2 GB of memory to produce the native image. Once done, you should have an executable file named simplegraalexample (all lowercase, following the usual Linux/Unix convention). Run that:
date +%s%N; ./simplegraalexample  
Hello from the simplest GraalVM Example
The startup time has dropped to roughly 9 ms. So we get a single-file executable with a fast startup time, which is definitely useful for containers set to autoscale under load, especially if it also shrinks startup times for larger applications that would normally take several seconds to load.

This example doesn't really exercise the JIT at all. Java's current JIT has two tiers: the C1 "quick and decent improvements" tier and the C2 "deeper optimizations that take longer" tier. Here's a description from the HotSpot C++ source:

 *  The system supports 5 execution levels:
 *  * level 0 - interpreter
 *  * level 1 - C1 with full optimization (no profiling)
 *  * level 2 - C1 with invocation and backedge counters
 *  * level 3 - C1 with full profiling (level 2 + MDO)
 *  * level 4 - C2

The idea behind Graal's new JIT is to write the compiler in Java, so that it's easier to maintain and extend and avoids the memory issues of the current C++ implementation.

To test the JIT and Graal's performance further, use the CountUppercase example from the GraalVM site. I've added a call to System.currentTimeMillis() at the start.
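For reference, here's a minimal sketch of what the benchmark does. It's reconstructed from the output below rather than copied from the GraalVM site, so details may differ, and the class name CountUppercaseSketch is mine. It counts the uppercase letters in the command-line sentence 9,999,999 times and prints the time for each block of a million. The sentence used in these runs has 7 uppercase letters (W, P, O, G, J, I, T), and 7 × 9,999,999 = 69,999,993, which matches the totals reported.

```java
public class CountUppercaseSketch {
    // Counts the uppercase characters in a string using a stream
    static long countUppercase(String s) {
        return s.chars().filter(Character::isUpperCase).count();
    }

    public static void main(String[] args) {
        // Timestamp first, to compare with the shell's `date +%s%N`
        System.out.println(System.currentTimeMillis());
        String sentence = String.join(" ", args);
        long total = 0;
        long start = System.currentTimeMillis();
        long last = start;
        // Repeat the same small task many times so the JIT has something to optimize
        for (int i = 1; i < 10_000_000; i++) {
            total += countUppercase(sentence);
            if (i % 1_000_000 == 0) {
                long now = System.currentTimeMillis();
                System.out.printf("%d (%d ms)%n", i / 1_000_000, now - last);
                last = now;
            }
        }
        System.out.printf("total: %d (%d ms)%n", total, System.currentTimeMillis() - start);
    }
}
```

Because each block of a million calls is timed separately, the first block includes interpretation and compilation while later blocks mostly show compiled-code speed.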
date +%s%N; java CountUppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1 (687 ms)
2 (199 ms)
3 (159 ms)
4 (158 ms)
5 (140 ms)
6 (142 ms)
7 (169 ms)
8 (151 ms)
9 (141 ms)
total: 69999993 (2096 ms)

The time for the first vs second and subsequent iterations shows the effects of compilation. If you want to see the compilation happen, add -Dgraal.PrintCompilation=true to the java execution above.

Run this again with the new JIT compiler turned off, so the default HotSpot compilers are used:
date +%s%N; java -XX:-UseJVMCICompiler CountUppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1 (1051 ms)
2 (957 ms)
3 (954 ms)
4 (940 ms)
5 (943 ms)
6 (950 ms)
7 (942 ms)
8 (953 ms)
9 (958 ms)
total: 69999993 (9596 ms)

Below, I tried to "turn on" tiered compilation (C1 + C2) another way. Since that has been the default behavior since Java 8, I'm not sure what it really did: the results look much less like the -XX:-UseJVMCICompiler run and much more like the Graal JIT.

date +%s%N; java -XX:+TieredCompilation CountUppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1 (602 ms)
2 (279 ms)
3 (194 ms)
4 (235 ms)
5 (165 ms)
6 (220 ms)
7 (154 ms)
8 (271 ms)
9 (168 ms)
total: 69999993 (2451 ms)

To stop at level 2 (C1 with invocation and backedge counters), use a command like this:
java -XX:-UseJVMCICompiler -XX:TieredStopAtLevel=2 CountUpper...

To see the native image performance:
native-image CountUppercase
date +%s%N; ./countuppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1 (1467 ms)
2 (1227 ms)
3 (1222 ms)
4 (1211 ms)
5 (1219 ms)
6 (1216 ms)
7 (1233 ms)
8 (1252 ms)
9 (1277 ms)
total: 69999993 (12585 ms)
Startup is almost instant, but we've lost a lot of performance: about 12.6 s in total versus about 2.1 s with the Graal JIT on the JVM.

The native-image tool has a profile-guided optimization feature that's only available in the Enterprise edition, and a tracing agent that helps identify which classes will be used. The latter can help reduce time by instantiating classes at build time. To use the tracing agent, run the code normally on the JVM and exercise its code paths. For this simple example, run it like this:
java -agentlib:native-image-agent=config-output-dir=./META-INF/native-image/ CountUppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
This puts four files into META-INF/native-image. In this case the files were close to empty, so I knew this wouldn't make much of a difference. In a larger application, however, the trace will identify class usage better than static analysis can.
Now, run:
native-image -cp ./META-INF/ CountUppercase
date +%s%N; ./countuppercase What kind of Performance would anyOne expect from Graal and the new JIT compiler
1 (1364 ms)
2 (1177 ms)
3 (1168 ms)
4 (1174 ms)
5 (1154 ms)
6 (1163 ms)
7 (1152 ms)
8 (1165 ms)
9 (1209 ms)
total: 69999993 (12008 ms)
Unfortunately, not much of a gain at all.
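Where the agent should earn its keep is reflection. In the sketch below (the class ReflectiveCreate is hypothetical, not part of the benchmark), the class to instantiate is only known from the command line, so native-image's static analysis can't tell which class to include. On the JVM it simply works; for a native image, a class looked up this way generally has to be registered for reflection, which is the kind of entry the agent writes to reflect-config.json when you run, for example, java -agentlib:native-image-agent=config-output-dir=./META-INF/native-image/ ReflectiveCreate java.util.ArrayList.

```java
import java.lang.reflect.Constructor;

public class ReflectiveCreate {
    // Creates an instance of a class whose name is only known at run time,
    // using its no-argument constructor.
    static Object create(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Constructor<?> ctor = cls.getDeclaredConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println("Created " + create(name).getClass().getName());
    }
}
```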

Although this benchmark runs slower, tests with a simple web app (using HttpServer) showed very little difference in performance, faster or slower, suggesting that for APIs and Spring Boot apps a native image would have little impact. (Will add code...)
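Until that code is added, here's a minimal stand-in showing the kind of app meant, using the JDK's built-in com.sun.net.httpserver.HttpServer. This is an assumption about its shape, not the code behind the observation above; the class name HelloHttpServer, the /hello path, and port 8080 are all arbitrary choices.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class HelloHttpServer {
    // Starts a server on the given port (0 = any free port) with a single /hello endpoint
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/hello", exchange -> {
                byte[] body = "Hello from GraalVM\n".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;
        HttpServer server = start(port);
        System.out.println("Listening on http://localhost:" + server.getAddress().getPort() + "/hello");
    }
}
```

The same class can be run with java HelloHttpServer or built with native-image HelloHttpServer, then hit with curl or a load generator to compare the two.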

Testing the Java SciMark 2.0 benchmark, here are the steps and results (javac and javac -O produced similar results, as you'd expect if the JIT rather than javac is doing most of the optimization):
javac jnt/scimark2/*.java
time java -classpath . jnt.scimark2.commandline
SciMark 2.0a
Composite Score: 1050.056478100177
FFT (1024): 1036.3878030994783
SOR (100x100):   857.5678798364443
Monte Carlo : 408.1116711801417
Sparse matmult (N=1000, nz=5000): 1338.152124290685
LU (100x100): 1610.0629120941362
java.vendor: GraalVM Community
java.version: 11.0.7
os.arch: amd64 Linux
os.version: 5.7.8-100.fc31.x86_64
real    0m31.560s
user    0m31.959s
sys    0m0.098s

Using the -XX:-UseJVMCICompiler option to turn off the new compiler:

time java -XX:-UseJVMCICompiler -classpath . jnt.scimark2.commandline
SciMark 2.0a
Composite Score: 1442.2801308185528
FFT (1024): 914.6516464184309
SOR (100x100):   1068.4530497649632
Monte Carlo : 664.6498065711132
Sparse matmult (N=1000, nz=5000): 1135.3139812371085
LU (100x100): 3428.3321701011487

Again, trying the -XX:+TieredCompilation option gave results more in line with the GraalVM JIT:
time java -XX:+TieredCompilation -classpath . jnt.scimark2.commandline
SciMark 2.0a
Composite Score: 1053.134652575841
I suspect that it's not turning off the new JIT.

To create and run a native image, first add a manifest file, META-INF/MANIFEST.MF, containing the line below (it must end with a newline, or jar ignores it), then create a jar and run native-image:
Main-Class: jnt.scimark2.commandline
jar cmvf META-INF/MANIFEST.MF scimark2.jar jnt/scimark2/*.class
java -jar scimark2.jar
native-image -jar scimark2.jar commandline # takes about 40 s; the image is named commandline
./commandline

SciMark 2.0a
Composite Score: 665.0761252118398
FFT (1024): 678.1650940478376
SOR (100x100):   877.8838572232665
Monte Carlo : 31.565788470303204
Sparse matmult (N=1000, nz=5000): 695.5266550424169
LU (100x100): 1042.2392312753743
java.vendor: Oracle Corporation
java.version: 11.0.7
os.arch: amd64 Linux
os.version: 5.7.8-100.fc31.x86_64

Again, performance has dropped (note also the change in Java vendor). While investigating this, I came across a comment from adinn that I'd summarize as: "why would you expect the static compiler to produce code as fast as the JIT?" He follows up with an excellent explanation. However, the reason I'd expect it to run as fast as or faster than the JIT is that C/C++ code built with static optimizing compilers runs very fast (faster than Java). There is undoubtedly more performance to be had, but overall, the ability to make native images (no JVM installation on the production system), fast startups, running other programming languages, and the potential of the new Java-based compiler make GraalVM a very interesting project.

I haven't added stats for running GraalVM on a project built with large frameworks like Spring; I'll add that later. However, this is another place where Graal shines: by analyzing code paths and throwing away unused code, it produces binaries that are much smaller and start up much faster (as above) than a normal JVM deployment.

Wednesday, May 27, 2020


What are Quaternions and why would anyone care?

Quaternions are the next step up from complex numbers. Complex numbers are real numbers with an 'imaginary' part. 'Imaginary' is in quotes because that's what we call it, but it's not fiction: it's the square root of -1, often written i (or j in electrical engineering). Written differently, i² = -1. In the early days, people considered this strange and useless, so 'imaginary' was fitting. By the way, if you go back far enough, people didn't see any point in 0 either.

A real number is represented by a, any real number like 2, 1.5781739, 10000, etc. A complex number is represented by a + bi, where a and b are real numbers and i² = -1. Complex numbers are very good at representing rotations (as in a circle around 0), which is why they're important in many scientific and engineering fields.

Quaternions are the next step up: a + bi + cj + dk, or a₀ + a₁i + a₂j + a₃k. Here i² = j² = k² = -1. All good so far, but it gets more complex: ij = k = -ji, jk = i = -kj, and ki = j = -ik. That leads to ijk = -1, which follows from ij = k and k·k = -1. The fact that ij = k rather than -1 is a little confusing, but i, j and k are more than just square roots of -1: they are directions in 'quaternion space'. Their products behave like the standard vector cross product, where i × j = k and j × i = -k.

What's interesting about quaternions is that they have almost all the properties of real and complex numbers with respect to addition, subtraction, multiplication, division, and inverses, except that multiplication is not commutative. In other words, for ordinary numbers a·b = b·a, but as we saw, for quaternions ij = -ji.
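To make the multiplication rules concrete, here's a minimal Java sketch of the quaternion (Hamilton) product; the class and method names are my own. The four component formulas come from expanding (a₁ + b₁i + c₁j + d₁k)(a₂ + b₂i + c₂j + d₂k) term by term with the rules above, and main shows that ij and ji differ in sign and that ijk = -1.

```java
public final class Quaternion {
    final double a, b, c, d; // represents a + b·i + c·j + d·k

    public Quaternion(double a, double b, double c, double d) {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    public static final Quaternion I = new Quaternion(0, 1, 0, 0);
    public static final Quaternion J = new Quaternion(0, 0, 1, 0);
    public static final Quaternion K = new Quaternion(0, 0, 0, 1);

    // Hamilton product, derived from i² = j² = k² = -1, ij = k, jk = i, ki = j
    // (and the reversed products negated)
    public Quaternion times(Quaternion q) {
        return new Quaternion(
            a * q.a - b * q.b - c * q.c - d * q.d,
            a * q.b + b * q.a + c * q.d - d * q.c,
            a * q.c - b * q.d + c * q.a + d * q.b,
            a * q.d + b * q.c - c * q.b + d * q.a);
    }

    // Component-wise comparison with a tolerance, since these are doubles
    public boolean approxEquals(Quaternion q, double eps) {
        return Math.abs(a - q.a) <= eps && Math.abs(b - q.b) <= eps
            && Math.abs(c - q.c) <= eps && Math.abs(d - q.d) <= eps;
    }

    @Override
    public String toString() {
        return a + " + " + b + "i + " + c + "j + " + d + "k";
    }

    public static void main(String[] args) {
        System.out.println("ij  = " + I.times(J));
        System.out.println("ji  = " + J.times(I));
        System.out.println("ijk = " + I.times(J).times(K));
    }
}
```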

In terms of their use, quaternions are good for rotations in more dimensions than the plane (where complex numbers work well), for example rotations in 3D space. In many ways the 4-dimensional approach resembles relativity theory, where time and the three spatial coordinates are linked, or common differential equations where time and space are linked, especially Schrödinger's equation, where time and space are linked via a factor of i. For fun, quaternions can replace complex numbers to create higher-dimensional Mandelbrot sets. Don't stop there: there are also tessarines, another 4-dimensional number system, where j² = 1, and octonions, an 8-dimensional one.
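As an example of the rotation use, the sketch below (hypothetical names; quaternions stored as plain double arrays {a, b, c, d}) rotates a 3D vector v by an angle θ about a unit axis (x, y, z). It builds q = cos(θ/2) + sin(θ/2)(xi + yj + zk), treats v as the pure quaternion 0 + vₓi + v_y j + v_z k, and computes q v q*, where q* is the conjugate, which equals the inverse for a unit quaternion. Rotating the x axis by 90° about the z axis should give the y axis, up to rounding.

```java
public class QuaternionRotation {
    // Hamilton product of quaternions stored as {a, b, c, d} = a + bi + cj + dk
    static double[] multiply(double[] p, double[] q) {
        return new double[] {
            p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3],
            p[0] * q[1] + p[1] * q[0] + p[2] * q[3] - p[3] * q[2],
            p[0] * q[2] - p[1] * q[3] + p[2] * q[0] + p[3] * q[1],
            p[0] * q[3] + p[1] * q[2] - p[2] * q[1] + p[3] * q[0]
        };
    }

    // Rotates vector v by angle (radians) about a unit-length axis using v' = q v q*
    static double[] rotate(double[] v, double[] axis, double angle) {
        double half = angle / 2;
        double s = Math.sin(half);
        double[] q = {Math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s};
        double[] qConj = {q[0], -q[1], -q[2], -q[3]};
        double[] p = {0, v[0], v[1], v[2]}; // the vector as a pure quaternion
        double[] r = multiply(multiply(q, p), qConj);
        return new double[] {r[1], r[2], r[3]};
    }

    public static void main(String[] args) {
        // Rotate the x axis by 90 degrees about the z axis
        double[] r = rotate(new double[] {1, 0, 0}, new double[] {0, 0, 1}, Math.PI / 2);
        System.out.printf("(%.3f, %.3f, %.3f)%n", r[0], r[1], r[2]);
    }
}
```

The axis has to be normalized first; the code assumes it is and doesn't check.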

The story is that Hamilton had been trying to extend complex numbers to triplets and, while out on a walk with his wife along the Royal Canal in Dublin, realized the answer needed four components instead. Supposedly he was so struck by the idea that he carved i² = j² = k² = -1 and ijk = -1 into the stone of a bridge.

Sunday, February 16, 2020

Other Useful Linux Commands

A collection of other bash and Linux commands ... or solutions

There's obviously no point in detailing all of the Linux/Unix/Bash commands available. Here are a few commands or solutions I wanted to remember.

Get motherboard or DIMM info:
dmidecode reads information from the DMI (Desktop Management Interface) table, which is closely related to SMBIOS (System Management BIOS). It needs root, so run it with sudo:

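sudo dmidecode -t 4 # processor (CPU) info
sudo dmidecode -t 2 # baseboard (motherboard) info
sudo dmidecode -t memory # all memory-related entries
sudo dmidecode -t 17 # each memory device (DIMM/SO-DIMM)
sudo dmidecode -t 16 # physical memory array (the motherboard's memory capacity and slots)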

There's also lshw, but it wasn't installed on my system.
Related commands: lspci, lsusb, lscpu, lsscsi, lsblk

(Other ls* commands of interest: lsmem, lslocks, lsns, lsipc, lslogins)

Weather on the command line:

While loop failing:
I had a problem where I wanted to check ssh on a number of hosts. Simple: cat the file, pipe it into a while loop, and use ssh with timeout to do the check, like this:
cat host_list.txt | while read h; do timeout 3 ssh $h; done

It didn't work. It just hung on the ssh command until timeout killed it. ssh reads standard input, so it was interfering with the while loop's input (ssh -n, which reads stdin from /dev/null, avoids this). Switching to a for loop fixed the input problem. After that, ssh didn't seem to be the best choice for the check, nor did telnet to port 22. Netcat in a for loop worked best:
for host in $(cat host_list.txt); do timeout 3 nc -v $host 22; done

Problem solved: the hosts running ssh were found.

ssh starting remote processes:
Here's another one that may or may not be easy. With ssh, you can run a remote command (host stands for the remote machine):
ssh host "ls -l"
What if you want to start a remote process and want it to keep running - something like this:
ssh host "nohup java -jar example.jar &"

The problem with the above is that it often won't work: the remote service won't be left running. This mostly has to do with the remote process still being tied to the ssh session's terminal or input/output. Despite nohup saying that it is redirecting output to nohup.out, there's still a problem. Two easy solutions:
ssh host "nohup java -jar example.jar > ./output_file.out 2>&1 &"

This sends both standard output and standard error to output_file.out. The other option, of course, is to create a systemd unit (or init.d script) and use that.

Watching for file system events:
inotifywait watches file system objects for the events that you specify.
inotifywait -r -e access,modify /var/log
This watches /var/log recursively (-r) for access or modification events on its files, and exits after the first event.
Combine inotifywait with a while loop to keep reacting to events:
while inotifywait -e access /var/log; do
    # some shell commands here
done

The only downside to inotifywait is that its package, inotify-tools, often needs to be installed (yum/dnf install or apt install).