A compiler is a software tool that translates (compiles) program source code into machine-executable instructions such as binary machine code. Along the way, compilers usually optimize the program so that it consumes fewer resources, such as CPU time and memory.
Typical compilers understand neither the semantics of a program’s operations nor their structure. In contrast, an inferential compiler uses program semantics to assess which data structures and computations would benefit from being restructured, and to distribute workloads across multiple processors.
Typically, compilers perform generic optimizations that are independent of the functionality of the program. For example, in certain cases, the compiler can understand that a variable is not used anywhere else in the program, so it can eliminate the computation associated with it. This optimization is called dead code elimination. Here’s an example of dead code elimination:
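A minimal Python sketch, with a hypothetical function body chosen only for illustration. The second function is written by hand to show the result an optimizing compiler could produce:

```python
def func(x):
    # Dead code: i is declared and assigned here but never read afterwards.
    i = x * 2
    return x + 1


def func_after_elimination(x):
    # Same observable behavior as func, with the unused computation removed.
    return x + 1
```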
The declaration and assignment of the variable i within func is dead code, as i is not used anywhere else, and this code can be eliminated by the compiler. This optimization does not require understanding the semantics of the program’s operations and its functionality in general. The compiler does not even need to know if the program is a web portal or an AI training algorithm.
Because typical compilers do not understand program semantics, they cannot perform consequential optimizations such as replacing data structures, restructuring operations, and parallelization. Such major program transformations are typically performed by high-performance computing experts and require significant manual effort. These experts usually rely on the Message Passing Interface (MPI), a standardized set of library functions for passing messages between processes to facilitate parallel computation.
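To give a sense of the manual bookkeeping involved, the sketch below simulates a block-distributed sum in plain Python. It does not use MPI: the "ranks" are simulated in a loop, and the names `block_bounds` and `simulated_parallel_sum` are hypothetical helpers introduced for illustration. In a real MPI program each rank would run as a separate process and the final combination would be a reduction call.

```python
def block_bounds(n_items, n_ranks, rank):
    """Return the [start, stop) slice of the data owned by one rank."""
    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def simulated_parallel_sum(data, n_ranks):
    """Sum data as if it were split across n_ranks processes."""
    partial_sums = []
    for rank in range(n_ranks):  # stands in for independent processes
        start, stop = block_bounds(len(data), n_ranks, rank)
        partial_sums.append(sum(data[start:stop]))  # local computation
    # Stands in for a reduction that combines the per-rank results.
    return sum(partial_sums)
```

Even for a simple sum, the programmer has to decide how the data is partitioned, handle sizes that do not divide evenly, and combine the partial results.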
An inferential compiler is a compiler that can understand the semantics of the program operations as well as their structure. It uses this understanding to infer fundamental properties of the program and optimize it. These properties range from variable data types to parallelization, and enable the inferential compiler to automate many of the manual steps taken by performance experts for optimization.
Automatic parallelization may be the most challenging transformation for a compiler. An inferential compiler for data analytics provides auto-parallelism by inferring which data structures and computations need to be distributed across processors and how, based on program semantics.
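The idea of inferring distribution from semantics can be illustrated with a toy model. This is not Bodo's actual algorithm: the operation kinds (`load`, `elementwise`, `reduce`), the rules attached to them, and the function `infer_distributions` are all assumptions made for this sketch. The point is only that knowing what an operation means is enough to decide whether its result is split across processors or replicated on each one.

```python
DISTRIBUTED = "distributed"  # each processor holds one chunk
REPLICATED = "replicated"    # every processor holds the full value


def infer_distributions(program):
    """Infer a distribution for each variable in a straight-line program.

    program is a list of (output, op, inputs) tuples.
    Returns (distributions, outputs_needing_communication).
    """
    dist = {}
    needs_comm = []
    for out, op, inputs in program:
        if op == "load":
            # Assumed rule: loaded datasets are split across processors.
            dist[out] = DISTRIBUTED
        elif op == "elementwise":
            # Assumed rule: the result stays distributed if any input is;
            # replicated scalars are simply available on every processor.
            is_dist = any(dist[v] == DISTRIBUTED for v in inputs)
            dist[out] = DISTRIBUTED if is_dist else REPLICATED
        elif op == "reduce":
            # A reduction yields one value that every processor needs.
            dist[out] = REPLICATED
            if dist[inputs[0]] == DISTRIBUTED:
                needs_comm.append(out)  # partial results must be combined
        else:
            raise ValueError(f"unknown operation: {op}")
    return dist, needs_comm


# Hypothetical program: load a column, sum it, subtract the sum from it.
program = [
    ("x", "load", []),
    ("total", "reduce", ["x"]),
    ("shifted", "elementwise", ["x", "total"]),
]
distributions, communication = infer_distributions(program)
```

In this toy model, only the reduction requires communication; the load and the element-wise step can run independently on each processor's chunk.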
Bodo’s inferential compiler first analyzes Python data analytics code to understand and optimize its operations (Optimization). It then decides how data structures and computations are distributed across processors (Parallelization). The next step transforms the program into a parallel version, which requires distributing data and computation and inserting the necessary MPI calls (Distributed Transform). Finally, Bodo uses LLVM to produce optimized, parallel machine code (LLVM).