Research Staff Member
Researcher focusing on special-purpose accelerators and optimizing compilers. I currently build compilation technology for next-generation GPU-based supercomputers using the LLVM compiler infrastructure. I proposed and implemented a new technique for mapping the fork-join parallelism of the OpenMP programming model onto the SIMT architecture of NVIDIA GPUs. Our LLVM-based compiler is freely available. My work has also been integrated into IBM's commercially supported XL compiler toolchain.

Previously, I worked on loop scheduling optimizations for the Active Memory Cube (AMC), a low-power near-memory processor. I extended swing modulo scheduling to the AMC's exposed-pipeline vector-VLIW core. More details are available in several papers [AMC] [COMPILER] [SCHEDULER] [CO-DESIGN].