Enabling Julia to target GPGPUs using Polly | Google Summer of Code 2017

I'm excited to be a part of GSoC 2017, which is a dream come true for two reasons: I get to contribute to open-source software, something I've been looking forward to for the past 8 years, and I will gain practical experience in compilers and scientific computing, two fields I'm deeply interested in.

Over the course of the next three months, I will be working towards making it easier to run Julia, a scientific computing language, on GPGPUs. The following is a short primer on the tools used in the project, to help set the context. You could also have a look at my GSoC proposal, which goes into greater detail.

A Primer

LLVM

The LLVM project provides a modular compiler infrastructure for analysing and optimising code. It sits at the optimisation stage of the compilation pipeline, between the initial parsing stages and the generation of machine code for the source program. Since many of the optimisations performed on programs are the same or similar across languages, LLVM can serve as a common component of many compilers. This is enabled by LLVM-IR, an intermediate representation to which the source language is converted and which LLVM's routines can understand and manipulate. These routines are known as "compiler passes" and come in two types:
  1. Analysis passes, which gather information characterising the IR.
  2. Transformation passes, which change the code to optimise it for some objective (e.g. size or run time) and may use metadata generated by the analysis passes.
LLVM is used as the optimisation stage of the Clang C/C++ compiler, the Glasgow Haskell Compiler and Julia.
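Julia makes this IR easy to inspect from the REPL, which is a handy way to see what LLVM's passes actually operate on. A minimal sketch (the function axpy is my own illustrative example, not code from the project):

    # Define a small function and print the LLVM-IR Julia generates for it.
    # (On Julia 0.7 and later, run `using InteractiveUtils` first.)
    axpy(a, x, y) = a * x + y

    @code_llvm axpy(2.0, 3.0, 4.0)
    # Prints IR containing, among other things, an fmul followed by an fadd
    # on doubles -- the representation LLVM's passes analyse and transform.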

Polly

Polly is an analysis and transformation pass in LLVM. It uses concepts from polyhedral compilation, in which programs are captured in an abstract mathematical representation, the polyhedral model, that makes it easy to analyse a program for properties such as locality and parallelism, and to exploit them by transforming this representation.
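To make the idea concrete, below is the kind of loop nest the polyhedral model captures well: all loop bounds and array subscripts are affine functions of the loop indices, so the iteration space forms a polyhedron that can be legally tiled, interchanged or parallelised. (A sketch of my own, not code from the project.)

    # Naive matrix multiply: a textbook static control part (SCoP).
    # The bounds (1:n) and subscripts (i, j, k) are affine, so the
    # polyhedral model can represent every iteration and reorder them safely.
    function matmul!(C, A, B)
        n = size(A, 1)
        for i in 1:n, j in 1:n
            acc = zero(eltype(C))
            for k in 1:n
                acc += A[i, k] * B[k, j]
            end
            C[i, j] = acc
        end
        return C
    end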

Polly can accelerate many BLAS kernels with almost no programming effort. In the form of Polly-ACC, it has also been able to automatically offload some kernels to the GPU with the help of ppcg, a source-to-source compiler that uses polyhedral compilation to generate CUDA or OpenCL code from C code. Polly-ACC hands the polyhedral model of the program to ppcg for GPU-specific optimisations and, if the optimisations prove profitable and successful, generates a kernel in GPU assembly.
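For C code, Polly-ACC can be tried straight from Clang. To the best of my knowledge the invocation looks like the following; treat the exact flag names as an assumption, since they may differ between Polly versions:

    # Ask LLVM to run Polly and target the GPU while compiling a C kernel.
    clang -O3 -mllvm -polly -mllvm -polly-target=gpu kernel.c -o kernel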

Julia

Julia is a modern high-performance programming language for numerical computing and computational science. Its runtime uses LLVM to optimise code: Julia code is converted into LLVM-IR, operated upon by LLVM's transformation passes, and lowered to machine code by LLVM's back-ends.
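The end of that pipeline is just as easy to observe from the REPL; a small sketch reusing the axpy example from above:

    axpy(a, x, y) = a * x + y

    @code_native axpy(2.0, 3.0, 4.0)
    # Prints the native assembly (e.g. x86-64) that LLVM's back-end emitted.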

Enabling Polyhedral Optimisations in Julia

During his GSoC '16 project of the same title, Matthias Reisinger enabled Julia to use Polly to optimise functions by providing the @polly macro, which marks the functions Polly should analyse. Here's an introductory post on his work. He has also ported the kernels of the PolyBench benchmark suite to Julia as PolyBench.jl.
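Usage is a one-line annotation on a function definition. A minimal sketch, assuming a Julia build configured with Polly support (USE_POLLY=1); the kernel itself is my own example:

    # @polly marks the function so that Polly runs over its LLVM-IR.
    # @inbounds removes bounds checks, which would otherwise stop Polly
    # from recognising the loop as a static control part.
    @polly function vsum(x)
        s = zero(eltype(x))
        @inbounds for i in 1:length(x)
            s += x[i]
        end
        return s
    end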

Extending Julia and Polly to target GPGPUs

Building on the foundations laid by Matthias, I'll be extending Julia to leverage Polly-ACC's GPGPU code-generation capabilities.

Before GSoC '17, I had successfully offloaded kernels written in Julia to the GPU using Polly-ACC (albeit through a hack) and observed speedups of up to 191x with very little effort required from the programmer. In the coming months, I intend to make these changes part of the Julia codebase and to ensure that as many compute kernels as possible benefit from them.

In the next post, I'll talk about the goals of my project and my progress towards each of them.
