UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models
The complexity of heterogeneous computing architectures, together with the
demand for productive and portable parallel application development, has
driven parallel programming models to become increasingly comprehensive and
complex. Making conventional compilation technologies and software
infrastructure parallelism-aware has become one of the main goals of recent
compiler development. In this paper, we propose the design of a unified
parallel intermediate representation (UPIR) for multiple parallel programming
models, enabling unified compiler transformations across those models. UPIR
specifies three commonly used parallelism patterns (SPMD, data, and task
parallelism), data attributes, explicit data movement and memory management,
and synchronization operations used in parallel programming. We demonstrate
UPIR via a prototype implementation in the ROSE compiler: unifying the IR for
both OpenMP and OpenACC in both C/C++ and Fortran, unifying the
transformation that lowers both OpenMP and OpenACC code to the LLVM runtime,
and exporting UPIR to an LLVM MLIR dialect.
LLOV: A Fast Static Data-Race Checker for OpenMP Programs
In the era of Exascale computing, writing efficient parallel programs is
indispensable, yet writing sound parallel programs is very difficult.
Specifying parallelism with frameworks such as OpenMP is relatively easy, but
data races in these programs are an important source of bugs. In this paper,
we propose LLOV, a fast, lightweight, language-agnostic, static data race
checker for OpenMP programs based on the LLVM compiler framework. We compare
LLOV with other state-of-the-art data race checkers on a variety of
well-established benchmarks. We show that the precision, accuracy, and F1
score of LLOV are comparable to those of other checkers while being orders of
magnitude faster. To the best of our knowledge, LLOV is the only tool among
state-of-the-art data race checkers that can verify a C/C++ or FORTRAN
program to be data race free.
Comment: Accepted in ACM TACO, August 202
Restoring the Broken Covenant Between Compilers and Deep Learning Accelerators
Deep learning accelerators address the computational demands of Deep Neural
Networks (DNNs), departing from the traditional Von Neumann execution model.
They leverage specialized hardware to align with the application domain's
structure. Compilers for these accelerators face distinct challenges compared
to those for general-purpose processors. These challenges include exposing and
managing more micro-architectural features, handling software-managed
scratchpads for on-chip storage, explicitly managing data movement, and matching DNN
layers with varying hardware capabilities. These complexities necessitate a new
approach to compiler design, as traditional compilers mainly focused on
generating fine-grained instruction sequences while abstracting
micro-architecture details. This paper introduces the Architecture Covenant
Graph (ACG), an abstract representation of an architecture's components and
their programmable capabilities. Letting the compiler target the ACG enables
compilation workflows that adapt to changes in accelerator design, reducing
the need for a complete compiler redevelopment. Codelets, which express DNN
operation functionality and evolve into execution mappings on the ACG, are
key to this process. The Covenant compiler efficiently targets diverse deep
learning accelerators, achieving 93.8% of the performance of
state-of-the-art, hand-tuned DNN layer implementations when compiling 14 DNN
layers from various models on two different architectures.