GraalVM demos: Graal Performance Examples for Java

The Graal compiler achieves excellent performance on modern workloads, such as Scala programs or Java code that makes heavy use of the Streams API. The examples below demonstrate this.

Prerequisites

Running the examples

Let us use a simple example based on the Streams API to demonstrate the performance of the Graal compiler. This example counts the number of upper-case characters in a body of text. To simulate a large load, the same sentence is processed just under 10 million times (the inner loop runs 9,999,999 iterations, which is why only nine progress lines are printed):

1. Save the following code snippet to a file named CountUppercase.java:

public class CountUppercase {
    static final int ITERATIONS = Math.max(Integer.getInteger("iterations", 1), 1);
    public static void main(String[] args) {
        String sentence = String.join(" ", args);
        for (int iter = 0; iter < ITERATIONS; iter++) {
            if (ITERATIONS != 1) System.out.println("-- iteration " + (iter + 1) + " --");
            long total = 0, start = System.currentTimeMillis(), last = start;
            for (int i = 1; i < 10_000_000; i++) {
                total += sentence.chars().filter(Character::isUpperCase).count();
                if (i % 1_000_000 == 0) {
                    long now = System.currentTimeMillis();
                    System.out.printf("%d (%d ms)%n", i / 1_000_000, now - last);
                    last = now;
                }
            }
            System.out.printf("total: %d (%d ms)%n", total, System.currentTimeMillis() - start);
        }
    }
}

2. Compile it and run as follows:

$ javac CountUppercase.java
$ java CountUppercase In 2017 I would like to run ALL languages in one VM.
1 (2078 ms)
2 (633 ms)
3 (394 ms)
4 (346 ms)
5 (218 ms)
6 (108 ms)
7 (106 ms)
8 (94 ms)
9 (96 ms)
total: 69999993 (4179 ms)

Because Graal is itself executed by the VM, it is first interpreted and only JIT-compiled once it becomes hot. This is why the first few timings shown above are much slower than the later ones. The warmup time depends on many factors, including how multi-threaded the application code is and how many compiler threads the VM uses. On a machine with fewer cores, warmup can take longer. If the performance profile of CountUppercase on your machine does not match the above, run it for more iterations by adding -Diterations=N immediately after java, for some N greater than 1.
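The -Diterations=N option works because CountUppercase reads a JVM system property through Integer.getInteger. The following minimal sketch isolates that lookup. The class name IterationsProperty is hypothetical and not part of the demo, and unlike the demo it re-reads the property on every call rather than once at class initialization.

```java
// Hypothetical helper, not part of the GraalVM demos.
final class IterationsProperty {

    // Same expression as in CountUppercase: read the "iterations" system
    // property, fall back to 1 if it is missing or not a number, and
    // clamp the result to at least 1.
    static int iterations() {
        return Math.max(Integer.getInteger("iterations", 1), 1);
    }

    public static void main(String[] args) {
        // Run with: java -Diterations=5 IterationsProperty
        System.out.println("iterations = " + iterations());
    }
}
```

The option has to come before the class name because anything after the class name is passed to main as a program argument, which CountUppercase would treat as part of the sentence.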

Note: We are currently developing a solution in which Graal is compiled ahead of time using SubstrateVM into a native library. In this mode, Graal will have no noticeable warmup time.

3. Add the -Dgraal.PrintCompilation=true option to see statistics for Graal compilations:

$ java -Dgraal.PrintCompilation=true CountUppercase In 2017 I would like to run ALL languages in one VM.

This option prints a line after each Graal compilation that shows the method compiled, time taken, bytecodes processed (including inlined methods), size of machine code produced, and amount of memory allocated during compilation.

4. To compare performance, use the -XX:-UseJVMCICompiler option to disable the Graal compiler so that the VM uses its native top tier compiler instead:

$ java -XX:-UseJVMCICompiler CountUppercase In 2017 I would like to run ALL languages in one VM.
1 (754 ms)
2 (627 ms)
3 (604 ms)
4 (609 ms)
5 (611 ms)
6 (622 ms)
7 (607 ms)
8 (610 ms)
9 (610 ms)
total: 69999993 (5018 ms)

The preceding example demonstrates the benefits of partial escape analysis (PEA) and advanced inlining, which combine to significantly reduce heap allocation. The results were obtained using GraalVM Enterprise Edition.
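To make the point about heap allocation more concrete: each call to sentence.chars().filter(...).count() nominally creates short-lived stream pipeline objects. When inlining and escape analysis succeed, the compiled code can behave much more like a plain loop over the characters. The sketch below shows that conceptual equivalence. The class and method names are hypothetical, and the loop illustrates the effect rather than the machine code Graal actually emits.

```java
// Illustrative sketch only; UppercaseEquivalence is a hypothetical name.
final class UppercaseEquivalence {

    // The stream-based form used by CountUppercase.
    static long viaStream(String text) {
        return text.chars().filter(Character::isUpperCase).count();
    }

    // A hand-written loop that computes the same value without
    // creating any stream objects.
    static long viaLoop(String text) {
        long count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isUpperCase(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sentence = "In 2017 I would like to run ALL languages in one VM.";
        System.out.println(viaStream(sentence)); // prints 7
        System.out.println(viaLoop(sentence));   // prints 7
    }
}
```

The value 7 matches the totals printed above: 9,999,999 iterations times 7 upper-case characters gives 69999993.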

The GraalVM Community Edition still has good performance compared to the native top tier compiler as shown below. You can simulate the Community Edition on the Enterprise Edition by adding the option -Dgraal.CompilerConfiguration=community.

Sunflow is an open source rendering engine. The following example is a simplified version of code at the core of the Sunflow engine. It performs calculations to blend various values for a point of light in a rendered scene.

1. Save the following code snippet to a file named Blender.java:

public class Blender {

    private static class Color {
        double r, g, b;

        private Color(double r, double g, double b) {
            this.r = r;
            this.g = g;
            this.b = b;
        }

        public static Color black() {
            return new Color(0, 0, 0);
        }

        public void add(Color other) {
            r += other.r;
            g += other.g;
            b += other.b;
        }

        public void add(double nr, double ng, double nb) {
            r += nr;
            g += ng;
            b += nb;
        }

        public void multiply(double factor) {
            r *= factor;
            g *= factor;
            b *= factor;
        }
    }

    private static final Color[][][] colors = new Color[100][100][100];

    public static void main(String[] args) {
        for (int j = 0; j < 10; j++) {
            long t = System.nanoTime();
            for (int i = 0; i < 100; i++) {
                // Note: j / 20 is integer division, so it evaluates to 0 for every j in this loop.
                initialize(new Color(j / 20, 0, 1));
            }
            long d = System.nanoTime() - t;
            System.out.println(d / 1_000_000 + " ms");
        }
    }

    private static void initialize(Color id) {
        for (int x = 0; x < colors.length; x++) {
            Color[][] plane = colors[x];
            for (int y = 0; y < plane.length; y++) {
                Color[] row = plane[y];
                for (int z = 0; z < row.length; z++) {
                    Color color = Color.black();
                    color.add(x / (double) colors.length, y / (double) row.length, z / (double) row.length);
                    color.add(id);
                    color.multiply(1 / 4d);
                    // PEA moves allocation of `color` to this point so above
                    // computation avoids memory accesses for its fields.
                    row[z] = color;
                }
            }
        }
    }
}

2. Compile it and run as follows:

$ javac Blender.java
$ java Blender
2676 ms
706 ms
612 ms
520 ms
537 ms
531 ms
579 ms
547 ms
557 ms
554 ms

3. Once again, use the -XX:-UseJVMCICompiler option to disable Graal:

$ java -XX:-UseJVMCICompiler Blender
1194 ms
916 ms
726 ms
727 ms
733 ms
748 ms
748 ms
735 ms
811 ms
752 ms

For fun, modify Blender.java to replace row[z] = color; with if (z == y && y == x) row[z] = color; and re-run to see what happens to the performance on Graal.

The improvement in this example comes from PEA moving the allocation of color in initialize down to the point where it is stored into colors (i.e., the point at which it escapes).
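The transformation described above can be pictured as a source-level rewrite. The following is a minimal sketch, assuming hypothetical names (PeaSketch, blendAsWritten, blendAfterPea, storeIfSelected) and a simplified Color class. It is a hand-written illustration of the idea, not the code Graal generates.

```java
// Illustrative sketch only; PeaSketch and its methods are hypothetical names.
final class PeaSketch {

    static final class Color {
        double r, g, b;

        Color(double r, double g, double b) {
            this.r = r;
            this.g = g;
            this.b = b;
        }
    }

    // Shape of the loop body as written in Blender.initialize:
    // allocate first, then update the object's fields.
    static Color blendAsWritten(Color id, double fx, double fy, double fz) {
        Color color = new Color(0, 0, 0);
        color.r += fx;   color.g += fy;   color.b += fz;         // add(double, double, double)
        color.r += id.r; color.g += id.g; color.b += id.b;       // add(Color)
        color.r *= 1 / 4d; color.g *= 1 / 4d; color.b *= 1 / 4d; // multiply(1 / 4d)
        return color;
    }

    // Conceptual shape after PEA: the field values live in local
    // variables, and the object is allocated only where it escapes.
    static Color blendAfterPea(Color id, double fx, double fy, double fz) {
        double r = 0, g = 0, b = 0;
        r += fx;   g += fy;   b += fz;
        r += id.r; g += id.g; b += id.b;
        r *= 1 / 4d; g *= 1 / 4d; b *= 1 / 4d;
        return new Color(r, g, b); // allocation at the escape point
    }

    // With a conditional store, the object only needs to exist on the
    // path where it is actually stored into the array.
    static void storeIfSelected(Color[] row, int z, boolean selected, Color id,
                                double fx, double fy, double fz) {
        if (selected) {
            row[z] = blendAfterPea(id, fx, fy, fz);
        }
    }

    public static void main(String[] args) {
        Color id = new Color(0, 0, 1);
        Color a = blendAsWritten(id, 0.5, 0.25, 0.75);
        Color b = blendAfterPea(id, 0.5, 0.25, 0.75);
        System.out.println(a.r == b.r && a.g == b.g && a.b == b.b); // prints true
    }
}
```

Both methods perform the same arithmetic in the same order, so they produce identical values. The difference is only in when the object exists. The storeIfSelected variant corresponds to the conditional store suggested above: one would expect the allocation to be needed only on the branch that performs the store.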