JVM's Boltzmann effects

This write-up discusses an aspect of the JVM's just-in-time compilation related to dynamic method dispatch.

Following are the results from a run of 5 measurement iterations of JMH (the Java Microbenchmark Harness), showing that the JVM could perform 69 million operations in the bi-morphic case versus only 62 million operations in the mega-morphic case. And no matter how many times the benchmark is run, the results stay consistent.

This benchmark was executed against a Java 8 runtime; similar results appear if the same code is run against the latest update of Java 10.

[Figure: JMH results for the bi-morphic and mega-morphic benchmark runs]

The hierarchy of classes involved in this example is shown here:

[Figure: class hierarchy of the types used in the example]

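The class names that appear later in the post suggest a hierarchy roughly like the following sketch; the supertype name Monster and the int return type are placeholders of mine, while the subclass names and the method name come from the post itself:

// Hypothetical supertype; the post does not show its actual name or shape.
abstract class Monster {
    abstract int showSomeAwesomeness();
}

class Beowulf extends Monster {
    @Override int showSomeAwesomeness() { return 1; }   // Beowulf-specific behaviour
}

class Centaur extends Monster {
    @Override int showSomeAwesomeness() { return 2; }   // Centaur-specific behaviour
}

class Polyphemus extends Monster {
    @Override int showSomeAwesomeness() { return 3; }   // Polyphemus-specific behaviour
}
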
The above results are from a run of the following benchmarking code, and we can see there is virtually no difference between the two methods being benchmarked. So what just happened?

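As a rough sketch, a JMH benchmark exercising this idea (reusing the Monster hierarchy sketched above) could look like the following; the method and field names are placeholders of mine, and the line numbers mentioned below refer to the original listing rather than to this sketch:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class DispatchBenchmark {

    // All three concrete types end up in this array.
    private Monster[] beasts;

    @Setup
    public void setup() {
        beasts = new Monster[] { new Beowulf(), new Centaur(), new Polyphemus() };
    }

    @Benchmark
    public int biMorphicAfterPeeling() {
        int sum = 0;
        for (Monster beast : beasts) {
            if (beast instanceof Beowulf) {
                sum += beast.showSomeAwesomeness();   // the receiver is known to be Beowulf here
            } else {
                // only Centaur or Polyphemus can reach this call site -> bi-morphic
                sum += beast.showSomeAwesomeness();
            }
        }
        return sum;
    }

    @Benchmark
    public int megaMorphic() {
        int sum = 0;
        for (Monster beast : beasts) {
            // three possible receiver types at this single call site -> mega-morphic
            sum += beast.showSomeAwesomeness();
        }
        return sum;
    }
}
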
In general, the JVM knows (via just-in-time compilation and runtime profiling) how many possible types exist for a polymorphic reference type. With this awareness, the JIT compiler can make some optimizations.

If the JIT sees that the application is using only one concrete type (e.g. only Beowulf or only Centaur) for assignments to the reference variable 'beast', then invocations of beast.showSomeAwesomeness() do not need a virtual function table dispatch.

A virtual function table dispatch (exactly what happens in C++, and don't be surprised to learn that the JVM in its running state is actually a C++ process) is responsible for invoking the correct method implementation on the correct concrete instance in polymorphic scenarios. Virtual lookup and dispatch have a cost which is generally negligible but becomes visible in precisely benchmarked scenarios.

When the JIT sees only one possible concrete type, it can cache the C++ pointers used for correct method dispatch right at the call site. This is called monomorphic dispatch. After this caching by the JIT, the call becomes very fast because no lookup is needed on future invocations.
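As an illustration (again using the placeholder Monster supertype from the sketch above), a call site that only ever sees Beowulf stays monomorphic and can be devirtualized, and even inlined, once it is hot:

static int warmUpMonomorphicCallSite() {
    int total = 0;
    for (int i = 0; i < 1_000_000; i++) {
        Monster beast = new Beowulf();         // only one concrete type is ever assigned
        total += beast.showSomeAwesomeness();  // monomorphic call site: no vtable lookup needed once cached
    }
    return total;
}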

If the JIT sees only two possible concrete types at a call site, it applies another optimization called bi-morphic dispatch, which works similarly to the monomorphic case by caching two pointers at the call site.

This is the case at line number 43 in the above example. Since the JIT knew the only possible concrete types in use would be Centaur and Polyphemus, it cached the method dispatch accordingly. This was possible because the code had already revealed Beowulf as one concrete type at line number 40, so only two possible types remained, and the JIT was aware enough to perform the bi-morphic optimization at the call site in the else block. This can be seen as peeling off (or revealing) a few of the possible types, leaving only two remaining.

This kind of trick is only possible when the number of possible types is known to be limited. If more types have to be presumed, mega-morphic dispatch happens, which is the slowest case because the JIT does no call-site caching at all.

Moreover, the caching the JIT does is not permanent; there is an undo behavior the JIT applies. For example, what if the reference variable beast is not method-local but comes from outside, where some other thread might change the assignment just before the invocations at lines 40, 43 or 57? Blindly using the cached pointers would then invoke the wrong code and lead to incorrect behavior. Therefore the runtime places a guard just before the invocation to validate the reference. If it sees that the guard has been invalidated, it un-caches the call site, which means the next invocation at that call site will use a virtual table lookup again.
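A hypothetical sketch of one way such an optimistic assumption can be invalidated in practice, namely a new receiver type showing up at a previously monomorphic call site (again using the placeholder classes from the sketches above):

public class DeoptSketch {

    static int exercise(Monster beast) {
        return beast.showSomeAwesomeness();    // call site whose receiver-type profile changes over time
    }

    public static void main(String[] args) {
        int total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += exercise(new Beowulf());  // only Beowulf seen so far: the JIT can optimize optimistically
        }
        total += exercise(new Centaur());      // a new type appears: the guard fails and the optimistic code is abandoned
        System.out.println(total);
    }
}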

This is only a small view of the many profile-guided optimizations the HotSpot VM performs at runtime. Other optimizations include escape analysis, inlining, lock elision, loop unrolling, etc. This kind of self-awareness is not possible for statically compiled language runtimes and is one of the strongest arguments in favor of Java and JVM-based languages (Kotlin, Scala, Java, Groovy and many others).
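For the curious, HotSpot can be asked to print its JIT decisions while a program or benchmark runs; an invocation along these lines (the jar name is only a placeholder) makes compilation and inlining activity visible:

java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -jar benchmarks.jar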

---------------------

Regarding the beasts used in this code, here is a self-colored portrait of Polyphemus (I only colored it, did not draw it). More details about Polyphemus can be found here.

[Image: self-colored portrait of Polyphemus]