nelli: a lightweight frontend for MLIR
Multi-Level Intermediate Representation (MLIR) is a novel compiler
infrastructure that aims to provide modular and extensible components to
facilitate building domain-specific compilers. However, because MLIR models
programs at an intermediate level of abstraction, and most extant frontends are
at a very high level of abstraction, the semantics and mechanics of the
fundamental transformations available in MLIR are difficult to investigate and
employ in and of themselves. To address these challenges, we have developed
\texttt{nelli}, a lightweight, Python-embedded domain-specific language for
generating MLIR code. \texttt{nelli} leverages existing MLIR infrastructure to
develop Pythonic syntax and semantics for various MLIR features. We describe
\texttt{nelli}'s design goals, discuss key details of our implementation, and
demonstrate how \texttt{nelli} enables easily defining and lowering compute
kernels to diverse hardware platforms.
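Since \texttt{nelli} itself is not shown here, the following is a hypothetical, much-simplified sketch of the core idea (with invented names, not \texttt{nelli}'s real API): Python operator overloading records operations on symbolic values and emits MLIR-flavored textual IR.

```python
import inspect

class Value:
    """A symbolic SSA value: arithmetic on it records IR instead of computing."""
    def __init__(self, builder, name):
        self.builder = builder
        self.name = name
    def __add__(self, other):
        return self.builder.emit("arith.addf", self, other)
    def __mul__(self, other):
        return self.builder.emit("arith.mulf", self, other)

class Builder:
    """Accumulates MLIR-like text as Python expressions are evaluated."""
    def __init__(self):
        self.lines = []
        self.n = 0
    def arg(self, name):
        return Value(self, "%" + name)
    def emit(self, op, lhs, rhs):
        self.n += 1
        res = Value(self, f"%{self.n}")
        self.lines.append(f"{res.name} = {op} {lhs.name}, {rhs.name} : f32")
        return res

def trace(f):
    """Trace a Python callable into MLIR-flavored text by feeding it symbolic arguments."""
    b = Builder()
    args = [b.arg(p) for p in inspect.signature(f).parameters]
    result = f(*args)
    b.lines.append(f"return {result.name} : f32")
    return "\n".join(b.lines)

ir = trace(lambda a, b: a * b + a)
print(ir)
```

Running this prints three lines of IR-like text (a `mulf`, an `addf`, and a `return`), illustrating how an embedded DSL can give MLIR-level operations Pythonic surface syntax.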
Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging
Many graphics and vision problems can be expressed as non-linear least
squares optimizations of objective functions over visual data, such as images
and meshes. The mathematical descriptions of these functions are extremely
concise, but their implementation in real code is tedious, especially when
optimized for real-time performance on modern GPUs in interactive applications.
In this work, we propose a new language, Opt (available under
http://optlang.org), for writing these objective functions over image- or
graph-structured unknowns concisely and at a high level. Our compiler
automatically transforms these specifications into state-of-the-art GPU solvers
based on Gauss-Newton or Levenberg-Marquardt methods. Opt can generate
different variations of the solver, so users can easily explore tradeoffs in
numerical precision, matrix-free methods, and solver approaches. In our
results, we implement a variety of real-world graphics and vision applications.
Their energy functions are expressible in tens of lines of code, from which Opt
produces highly optimized GPU solver implementations. These solvers have
performance competitive with the best published hand-tuned, application-specific
GPU solvers, and are orders of magnitude faster than a general-purpose
auto-generated solver.
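As a plain-Python illustration of the solver core underlying systems like Opt (a minimal sketch, not Opt's generated GPU code), the following implements the basic Gauss-Newton iteration for a single unknown c in the model y = exp(c*x):

```python
import math

def gauss_newton(xs, ys, c0=0.0, iters=20):
    """Minimize sum_i (exp(c*x_i) - y_i)^2 over the scalar unknown c."""
    c = c0
    for _ in range(iters):
        # residuals r_i = f(x_i; c) - y_i and Jacobian entries dr_i/dc
        r = [math.exp(c * x) - y for x, y in zip(xs, ys)]
        J = [x * math.exp(c * x) for x in xs]
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        c -= Jtr / JtJ  # normal-equations step: (J^T J) dc = -J^T r
    return c

xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.5 * x) for x in xs]  # data generated with c = 0.5
c_hat = gauss_newton(xs, ys)
```

With many unknowns the scalar division becomes a linear solve of the normal equations, and adding a damping term to JtJ turns this into Levenberg-Marquardt, the other solver family the abstract mentions.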
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Deep neural networks (DNNs) are of critical use in different domains. To
accelerate DNN computation, tensor compilers are proposed to generate efficient
code on different domain-specific accelerators. Existing tensor compilers
mainly focus on optimizing computation efficiency. However, memory access is
becoming a key performance bottleneck because the computational performance of
accelerators is increasing much faster than memory performance. The lack of
a direct description of memory access and data dependence in current tensor
compilers' intermediate representations (IRs) makes it challenging to generate
memory-efficient code.
In this paper, we propose IntelliGen, a tensor compiler that can generate
high-performance code for memory-intensive operators by considering both
computation and data movement optimizations. IntelliGen represents a DNN program
using GIR, whose primitives indicate its computation, data movement, and
parallel strategies. These primitives are further composed into an
instruction-level dataflow graph, over which holistic optimizations search
different memory-access patterns and computation operations to generate
memory-efficient code on different hardware. We evaluate IntelliGen on NVIDIA
GPU, AMD GPU, and Cambricon MLU, showing speedups of up to 1.97x, 2.93x, and
16.91x (1.28x, 1.23x, and 2.31x on average), respectively, over the most
performant current frameworks.
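To illustrate the general idea of an IR with explicit data-movement primitives (a toy sketch with invented op names, far simpler than GIR), the pass below forwards values through store/load pairs on temporary buffers, eliminating a memory round trip that a purely compute-oriented IR could not even see:

```python
def fuse_through_temps(ops):
    """Drop store/load pairs on temporary buffers: if a value is stored
    to a temp and later loaded back, forward the value directly and
    delete both memory operations."""
    stored = {}  # temp buffer name -> value last stored into it
    subst = {}   # loaded value name -> forwarded value name
    out = []
    for op in ops:
        kind = op[0]
        if kind == "store" and op[1].startswith("tmp"):
            stored[op[1]] = subst.get(op[2], op[2])  # remember value, drop the store
        elif kind == "load" and op[2] in stored:
            subst[op[1]] = stored[op[2]]             # forward value, drop the load
        else:
            out.append(tuple(subst.get(x, x) for x in op))
    return out

# (op, destination, operands...): loads and stores are first-class nodes
ops = [
    ("load", "t0", "A"),
    ("load", "t1", "B"),
    ("add", "t2", "t0", "t1"),
    ("store", "tmp0", "t2"),   # round trip through an intermediate buffer
    ("load", "t3", "tmp0"),
    ("mul", "t4", "t3", "t0"),
    ("store", "OUT", "t4"),
]
fused = fuse_through_temps(ops)
```

After the pass, the program has one fewer store and one fewer load, and the multiply consumes `t2` directly; making memory traffic explicit in the graph is what enables this kind of holistic optimization.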
Enhancing R with Advanced Compilation Tools and Methods
I describe an approach to compiling common idioms in R code directly to
native machine code and illustrate it with several examples. Not only can this
yield significant performance gains, but it allows us to use new approaches to
computing in R. Importantly, the compilation requires no changes to R itself,
but is done entirely via R packages. This allows others to experiment with
different compilation strategies and even to define new domain-specific
languages within R. We use the Low-Level Virtual Machine (LLVM) compiler
toolkit to create the native code and perform sophisticated optimizations on
the code. By adopting this widely used software within R, we leverage its
ability to generate code for different platforms such as CPUs and GPUs, and
will continue to benefit from its ongoing development. This approach
potentially allows us to develop high-level R code that is also fast, that can
be compiled to work with different data representations and sources, and that
could even be run outside of R. The approach aims to both provide a compiler
for a limited subset of the R language and also to enable R programmers to
write other compilers. This is another approach to help us write high-level
descriptions of what we want to compute, not how.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the
Institute of Mathematical Statistics (http://www.imstat.org), DOI:
http://dx.doi.org/10.1214/13-STS462.
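As a language-agnostic sketch of the same idea, generating specialized code for a restricted idiom rather than interpreting it, the Python snippet below "compiles" an arithmetic expression over a loop variable into a specialized mapping function; Python's built-in compile() stands in for the LLVM IR generation the paper performs (this is an analogy, not the paper's R implementation):

```python
def compile_map(expr, var="x"):
    """'Compile' a restricted arithmetic expression string over `var`
    into a specialized list-mapping function, instead of re-interpreting
    the expression on every element."""
    src = f"def _mapper(xs):\n    return [({expr}) for {var} in xs]\n"
    ns = {}
    # Empty builtins: only plain arithmetic on the loop variable is allowed.
    exec(compile(src, "<generated>", "exec"), {"__builtins__": {}}, ns)
    return ns["_mapper"]

square_plus_one = compile_map("x * x + 1")
result = square_plus_one([1, 2, 3])
```

The same shape of pipeline, parse a restricted idiom, emit specialized code, hand it to a backend, is what the R packages described above do with LLVM producing native machine code.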
The Collection Virtual Machine: An Abstraction for Multi-Frontend Multi-Backend Data Analysis
Getting the best performance from the ever-increasing number of hardware
platforms has been a recurring challenge for data processing systems. In recent
years, the advent of data science with its increasingly numerous and complex
types of analytics has made this challenge even more difficult. In practice,
system designers are overwhelmed by the number of combinations and typically
implement only one analysis/platform combination, leading to repeated
implementation effort -- and a plethora of semi-compatible tools for data
scientists.
In this paper, we propose the "Collection Virtual Machine" (or CVM) -- an
extensible compiler framework designed to keep the specialization process of
data analytics systems tractable. It simultaneously captures the essence of a
wide range of low-level, hardware-specific implementation techniques and the
high-level operations of different types of analyses. At its core lies
a language for defining nested, collection-oriented intermediate
representations (IRs). Frontends produce programs in their IR flavors defined
in that language, which get optimized through a series of rewritings (possibly
changing the IR flavor multiple times) until the program is finally expressed
in an IR of platform-specific operators. While reducing the overall
implementation effort, this also improves the interoperability of both analyses
and hardware platforms. We have used CVM successfully to build specialized
backends for platforms as diverse as multi-core CPUs, RDMA clusters, and
serverless computing infrastructure in the cloud and expect similar results for
many more frontends and hardware platforms in the near future.
Comment: This paper is currently under review at DaMoN'2
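A hypothetical miniature of the rewriting pipeline described above (invented IR terms, not CVM's actual language): a high-level, collection-oriented IR flavor is lowered by rewrites into a "platform" flavor of loop operators, which is then executed.

```python
def lower(term):
    """One rewriting pass: change IR flavor by turning high-level
    collection operators into lower-level loop operators."""
    if isinstance(term, tuple) and term[0] == "map":
        _, fn, src = term
        return ("loop", fn, lower(src))
    if isinstance(term, tuple) and term[0] == "filter":
        _, pred, src = term
        return ("cond_loop", pred, lower(src))
    return term

def run(term, data):
    """Execute the lowered, platform-flavor IR."""
    if term == "input":
        return data
    kind = term[0]
    if kind == "loop":
        return [term[1](x) for x in run(term[2], data)]
    if kind == "cond_loop":
        return [x for x in run(term[2], data) if term[1](x)]
    raise ValueError(f"not lowered: {term}")

# High-level flavor: square the even elements of the input collection.
prog = ("map", lambda x: x * x, ("filter", lambda x: x % 2 == 0, "input"))
lowered = lower(prog)
out = run(lowered, [1, 2, 3, 4])
```

In CVM, a chain of such rewrites (possibly changing flavor several times) would end in platform-specific operators for, say, a multi-core CPU or an RDMA cluster; the toy here collapses that chain into a single lowering step.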