
JVM's Boltzmann effects

This write-up discusses an aspect of the JVM's just-in-time compilation around dynamic method dispatch.

Following are the results from a run of 5 measurement iterations of JMH (the Java Microbenchmark Harness), showing that the JVM could perform 69 million operations in the bi-morphic case, compared to just 62 million operations in the mega-morphic case. And no matter how many times the benchmark is run, the results remain consistent.

This benchmark was executed against a Java 8 runtime; similar results appear if the same code is run against the latest update of Java 10.





The hierarchy of classes involved in this example is shown here:
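The original class diagram is not reproduced in this capture. A minimal sketch of the hierarchy, assuming a common base type named Beast with the three concrete subtypes mentioned in the text (Beowulf, Centaur, Polyphemus) and the method name showSomeAwesomeness() that appears later, might look like:

```java
// Hypothetical reconstruction -- the base-type name and return type are
// assumptions; the subtype and method names are taken from the text.
public class BeastHierarchy {
    static abstract class Beast {
        abstract String showSomeAwesomeness();
    }
    static class Beowulf extends Beast {
        @Override String showSomeAwesomeness() { return "I am Beowulf"; }
    }
    static class Centaur extends Beast {
        @Override String showSomeAwesomeness() { return "I am Centaur"; }
    }
    static class Polyphemus extends Beast {
        @Override String showSomeAwesomeness() { return "I am Polyphemus"; }
    }

    public static void main(String[] args) {
        Beast beast = new Centaur();  // dynamic dispatch selects the Centaur override
        System.out.println(beast.showSomeAwesomeness());
    }
}
```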

















The above results come from a run of the following benchmarking code, and we can see there is literally no difference between the two methods being benchmarked. So what just happened?
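The original JMH source is not reproduced in this capture. A plain-Java sketch of the pattern the text describes, with the JMH machinery stripped so the snippet is self-contained, is shown below; the names and the if/else "type peel" structure are assumptions based on the explanation that follows, not the author's exact code.

```java
// Hypothetical sketch -- structure and names assumed from the text.
public class DispatchSketch {
    static abstract class Beast { abstract long showSomeAwesomeness(); }
    static final class Beowulf extends Beast {
        @Override long showSomeAwesomeness() { return 1; }
    }
    static final class Centaur extends Beast {
        @Override long showSomeAwesomeness() { return 2; }
    }
    static final class Polyphemus extends Beast {
        @Override long showSomeAwesomeness() { return 3; }
    }

    static final Beast[] BEASTS = { new Beowulf(), new Centaur(), new Polyphemus() };

    // "Peeled" version: the instanceof check splits the types, so the call
    // in the else branch only ever observes Centaur or Polyphemus (bi-morphic).
    static long peeled(Beast beast) {
        if (beast instanceof Beowulf) {
            return beast.showSomeAwesomeness();   // monomorphic call site
        } else {
            return beast.showSomeAwesomeness();   // bi-morphic call site
        }
    }

    // Unsplit version: one call site observes all three types (mega-morphic).
    static long megamorphic(Beast beast) {
        return beast.showSomeAwesomeness();
    }

    public static void main(String[] args) {
        long a = 0, b = 0;
        for (int i = 0; i < 300_000; i++) {
            Beast beast = BEASTS[i % 3];
            a += peeled(beast);
            b += megamorphic(beast);
        }
        // Both paths compute identical results; only the dispatch profile differs.
        System.out.println(a == b);
    }
}
```

In a real JMH run, each of the two methods would be a separate @Benchmark; the point is that the bodies are equivalent and only the number of receiver types seen per call site changes.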








In general, the JVM learns (via the profiling that drives just-in-time compilation) how many concrete types actually flow through a polymorphic reference. With this awareness, the JIT compiler can apply some optimizations.

If the JIT sees that the application assigns only one concrete type (e.g. only Beowulf, or only Centaur) to the reference variable 'beast', then invocations of beast.showSomeAwesomeness() do not need a virtual function table dispatch.

A virtual function table dispatch (exactly what happens in C++, and don't be surprised to learn that a running JVM is itself a C++ process) is responsible for invoking the correct method implementation on the correct concrete instance in polymorphic scenarios. The virtual lookup and dispatch have a cost which is usually negligible but becomes visible in precisely benchmarked scenarios.

When the JIT sees only one possible concrete type at a call site, it can cache the pointer used for correct method dispatch right at that call site. This is called monomorphic dispatch, and once the JIT has cached it, the call becomes very fast because no lookup is needed on future invocations.
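As a sketch (reusing the Beast/Beowulf names from the example; the return value is made up), a call site that only ever observes a single receiver type is the kind the JIT can devirtualize:

```java
// Sketch only -- names assumed from the text, return value invented.
public class MonomorphicSketch {
    static abstract class Beast { abstract int showSomeAwesomeness(); }
    static final class Beowulf extends Beast {
        @Override int showSomeAwesomeness() { return 42; }
    }

    public static void main(String[] args) {
        Beast beast = new Beowulf();  // only one concrete type ever reaches the call site
        int sum = 0;
        for (int i = 0; i < 1_000; i++) {
            // After profiling, the JIT can devirtualize (and typically inline)
            // this call, guarded only by a cheap type check.
            sum += beast.showSomeAwesomeness();
        }
        System.out.println(sum);
    }
}
```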

If the JIT sees only two possible concrete types at a call site, it applies another optimization called bi-morphic dispatch, which works like the monomorphic case by caching two pointers at the call site.

This is the case at line number 43 in the above example. Since the JIT knew the only possible concrete types in use would be Centaur and Polyphemus, it cached the method dispatch accordingly. This was possible because the code had already singled out Beowulf as one concrete type at line number 40, so only two possible types remained, and the JIT was aware enough to apply the bi-morphic optimization at the call site in the else block. This can be seen as peeling off (or revealing) some of the possible types, leaving only two behind.

This kind of trick is only possible when the number of possible types is known to be limited. If more types are possible, a mega-morphic dispatch happens, which is the slowest case because the JIT does no call-site caching at all.

Moreover, the caching the JIT does is not everlasting; there is an undo mechanism. For example, what if the reference variable beast is not method-local but comes from outside, where another thread might change the assignment just before the invocations at lines 40, 43 or 57? Using the cached pointers would then invoke the wrong code. Therefore the runtime places a guard just before the invocation to validate the receiver. If it sees the guard has been invalidated, it un-caches the call site, which means the next invocation falls back to a virtual table lookup.
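Conceptually, the guard behaves like the hand-written analogy below. To be clear, this is not what the JIT actually emits (the real mechanism operates on compiled code and triggers deoptimization); it only illustrates the shape of a guarded, devirtualized call site:

```java
// Hand-written analogy only -- NOT actual JIT output.
public class GuardSketch {
    static abstract class Beast { abstract int showSomeAwesomeness(); }
    static final class Beowulf extends Beast {
        @Override int showSomeAwesomeness() { return 1; }
    }
    static final class Centaur extends Beast {
        @Override int showSomeAwesomeness() { return 2; }
    }

    static int guardedCall(Beast beast) {
        if (beast.getClass() == Beowulf.class) {            // the guard: cached type still valid?
            return ((Beowulf) beast).showSomeAwesomeness(); // fast, direct ("inlined") path
        }
        return beast.showSomeAwesomeness();                 // fallback: virtual dispatch again
    }

    public static void main(String[] args) {
        System.out.println(guardedCall(new Beowulf())); // fast path
        System.out.println(guardedCall(new Centaur())); // guard fails, fallback path
    }
}
```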

This is only a small view of the many profile-guided optimizations HotSpot VMs perform at runtime. Other optimizations include escape analysis, inlining, lock elision, loop unrolling, etc. This kind of runtime self-awareness is not possible for statically compiled language runtimes and is one of the strongest arguments in favor of Java and JVM-enabled languages (Kotlin, Scala, Java, Groovy and many others).

---------------------

Regarding the beasts used in this code, here is a self-colored portrait of Polyphemus (I only colored it, I did not draw it). More details about Polyphemus can be found here.






