Optimizing Data-Intensive Computations in Existing Libraries with Split Annotations
Data movement between main memory and the CPU is a major bottleneck in
parallel data-intensive applications. In response, researchers have proposed
using compilers and intermediate representations (IRs) that apply optimizations
such as loop fusion under existing high-level APIs like NumPy and
TensorFlow. Even though these techniques generally do not require changes to
user applications, they require intrusive changes to the library itself: often,
library developers must rewrite each function using a new IR. In this paper, we
propose a new technique called split annotations (SAs) that enables key data
movement optimizations over unmodified library functions. SAs only require
developers to annotate functions and implement an API that specifies how to
partition data in the library. The annotation and API describe how to enable
cross-function data pipelining and parallelization, while respecting each
function's correctness constraints. We implement a parallel runtime for SAs in
a system called Mozart. We show that Mozart can accelerate workloads in
libraries such as Intel MKL and Pandas by up to 15x, with no library
modifications. Mozart also provides performance gains competitive with
solutions that require rewriting libraries, and can sometimes outperform these
systems by up to 2x by leveraging existing hand-optimized code.
Comment: Appearing in SOSP 2019, Huntsville, ON, C
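
To make the idea of splitting and pipelining data across unmodified functions concrete, here is a minimal sketch in plain Python. It is not Mozart's API: the names `split`, `merge`, and `run_pipeline` are hypothetical, and `scale` and `shift` are stand-ins for library functions that are called as-is. Each chunk passes through every function before the next chunk is touched, so in a real system a chunk could stay in cache between calls, and chunks are processed in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, chunk_size):
    """Hypothetical split step: partition a list into contiguous chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def merge(pieces):
    """Hypothetical merge step: concatenate chunk results in order."""
    out = []
    for piece in pieces:
        out.extend(piece)
    return out


# Stand-ins for unmodified library functions (assumed elementwise,
# so running them chunk by chunk gives the same result as on the whole input).
def scale(xs, k):
    return [x * k for x in xs]


def shift(xs, b):
    return [x + b for x in xs]


def run_pipeline(data, stages, chunk_size=4, workers=2):
    """Run every stage on one chunk before moving on, with chunks in parallel."""
    def run_chunk(chunk):
        for fn, args in stages:
            chunk = fn(chunk, *args)
        return chunk

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so merge sees chunks in sequence.
        results = list(pool.map(run_chunk, split(data, chunk_size)))
    return merge(results)


data = list(range(10))
result = run_pipeline(data, [(scale, (2,)), (shift, (1,))])
```

The sketch only handles elementwise functions over lists. The abstract's point that the annotation must respect each function's correctness constraints corresponds here to the assumption that chunked execution is equivalent to whole-input execution, which does not hold for every function.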