Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming allows us to easily implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code to implement the optimizations, instead of the complicated low-level code required by existing query compilation approaches. (c) The compilation overhead is low compared to the overall execution time, making our approach practical for compiling query engines.
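The generative-programming idea is easier to see in miniature. The sketch below illustrates, in Python rather than LegoBase's actual Scala, how a high-level operator description can be specialized into a single low-level C loop; the function and column names are hypothetical stand-ins, not LegoBase's API.

```python
# Minimal sketch of the source-to-source idea behind generative query
# compilation: a high-level operator pipeline is specialized into
# low-level C instead of being interpreted. All names are hypothetical;
# LegoBase itself is written in Scala and emits C via staging.

def compile_sum_filter(column: str, predicate_c_expr: str) -> str:
    """Emit specialized C for: SELECT SUM(column) WHERE predicate."""
    return f"""
double query(const double *{column}, long n) {{
    double acc = 0.0;                 /* aggregate register */
    for (long i = 0; i < n; i++) {{   /* fused scan+filter+sum loop */
        double v = {column}[i];
        if ({predicate_c_expr})       /* predicate inlined, no virtual calls */
            acc += v;
    }}
    return acc;
}}
"""

if __name__ == "__main__":
    # The generated code contains no operator objects or interpretation
    # overhead: the whole pipeline collapses into one tight C loop.
    print(compile_sum_filter("l_extendedprice", "v > 100.0"))
```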
deGoal a tool to embed dynamic code generators into applications
The processing applications now used in mobile and embedded platforms require both a fair amount of processing power and a high degree of flexibility, owing to the nature of the data they process. In this context, we propose a lightweight code generation technique that can perform data-dependent optimizations at run-time for processing kernels. In this paper we present the motivations for and usage of deGoal, a tool designed to build fast and portable binary code generators called compilettes.
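As a rough analogue of a compilette, the hypothetical Python sketch below generates a kernel specialized to a value known only at run time; deGoal emits portable binary code directly, whereas here Python's compile()/exec() merely stands in for the code generator, purely for illustration.

```python
# Hypothetical analogue of a deGoal "compilette": a tiny run-time code
# generator that specializes a kernel once the data is known.

def make_scale_kernel(factor: float):
    """Generate a kernel specialized for the run-time constant `factor`."""
    src = (
        "def kernel(xs):\n"
        f"    return [x * {factor} for x in xs]  # constant folded in\n"
    )
    namespace = {}
    exec(compile(src, "<compilette>", "exec"), namespace)
    return namespace["kernel"]

# The specialization happens at run time, when `factor` is known:
kernel = make_scale_kernel(2.5)
print(kernel([1.0, 2.0, 4.0]))  # [2.5, 5.0, 10.0]
```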
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
This paper describes PlinyCompute, a system for development of
high-performance, data-intensive, distributed computing tools and libraries. In
the large, PlinyCompute presents the programmer with a very high-level,
declarative interface, relying on automatic, relational-database style
optimization to figure out how to stage distributed computations. However, in
the small, PlinyCompute presents the capable systems programmer with a
persistent object data model and API (the "PC object model") and associated
memory management system that has been designed from the ground-up for high
performance, distributed, data-intensive computing. This contrasts with most
other Big Data systems, which are constructed on top of the Java Virtual
Machine (JVM), and hence must at least partially cede performance-critical
concerns such as memory management (including layout and de/allocation) and
virtual method/function dispatch to the JVM. This hybrid approach (declarative in the large, trusting the programmer's ability to use the PC object model efficiently in the small) results in a system that is ideal for the
development of reusable, data-intensive tools and libraries. Through extensive
benchmarking, we show that implementing complex object manipulation and non-trivial, library-style computations on top of PlinyCompute can result in a speedup of 2x to more than 50x compared to equivalent implementations on Spark.
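To make the contrast with JVM-managed objects concrete, the hypothetical sketch below packs records into a single flat buffer with an explicit layout, so allocation and layout stay under programmer control; the real PC object model is a C++ API, and all names here are invented.

```python
import struct

# Sketch of the idea behind a hand-controlled object model: records live
# in one flat buffer with an explicit layout, rather than as individually
# heap-allocated, GC-managed objects.

RECORD = struct.Struct("<qd")   # one record: int64 key, float64 value

class FlatStore:
    def __init__(self, capacity: int):
        # One allocation for all records: layout and lifetime are explicit.
        self.buf = bytearray(RECORD.size * capacity)
        self.count = 0

    def append(self, key: int, value: float) -> None:
        RECORD.pack_into(self.buf, self.count * RECORD.size, key, value)
        self.count += 1

    def get(self, i: int):
        return RECORD.unpack_from(self.buf, i * RECORD.size)

store = FlatStore(capacity=1024)
store.append(42, 3.14)
print(store.get(0))  # (42, 3.14)
```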
Sidekick compilation with xDSL
Traditionally, compiler researchers either conduct experiments within an
existing production compiler or develop their own prototype compiler; both
options come with trade-offs. On one hand, prototyping in a production compiler can be cumbersome, as such compilers are often optimized for program compilation speed at
the expense of software simplicity and development speed. On the other hand,
the transition from a prototype compiler to production requires significant
engineering work. To bridge this gap, we introduce the concept of sidekick
compiler frameworks, an approach that uses multiple frameworks that
interoperate with each other by leveraging textual interchange formats and
declarative descriptions of abstractions. Each such compiler framework is
specialized for specific use cases, such as performance or prototyping.
Abstractions are by design shared across frameworks, simplifying the transition
from prototyping to production. We demonstrate this idea with xDSL, a sidekick
for MLIR focused on prototyping and teaching. xDSL interoperates with MLIR
through a shared textual IR and the exchange of IRs through an IR Definition
Language. We evaluate the benefits of sidekick compiler frameworks on three use cases, showing how xDSL impacts development in each: teaching, DSL compilation, and rewrite-system prototyping. We also investigate the trade-offs
that xDSL offers, and demonstrate how we simplify the transition between
frameworks using the IRDL dialect. With sidekick compilation, we envision a
future in which engineers minimize the cost of development by choosing a
framework built for their immediate needs, and later transitioning to
production with minimal overhead.
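The interchange idea can be illustrated in miniature: the toy sketch below pairs a declarative operation description (in the spirit of IRDL) with a textual IR that round-trips through parsing and printing. The format and names are invented for the sketch and are not xDSL's actual API.

```python
from dataclasses import dataclass

# Toy illustration of interchange through a shared textual IR plus a
# declarative operation description, in the spirit of MLIR/xDSL/IRDL.

@dataclass
class OpDef:                 # declarative description of one operation
    name: str
    num_operands: int

DIALECT = {"toy.add": OpDef("toy.add", 2), "toy.const": OpDef("toy.const", 0)}

def parse(text: str):
    """Parse lines like `%0 = toy.add %a, %b` into (result, op, operands)."""
    ops = []
    for line in text.strip().splitlines():
        result, rhs = [s.strip() for s in line.split("=")]
        parts = rhs.split()
        opname, operands = parts[0], [p.strip(",") for p in parts[1:]]
        assert len(operands) == DIALECT[opname].num_operands  # IRDL-style check
        ops.append((result, opname, operands))
    return ops

def print_ir(ops) -> str:
    return "\n".join(f"{r} = {o} {', '.join(args)}".rstrip() for r, o, args in ops)

ir = "%0 = toy.const\n%1 = toy.add %0, %0"
assert print_ir(parse(ir)) == ir   # textual round-trip: the interchange point
```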
Enabling aggressive compiler optimization for the mobile environment
Aggressive code optimization in the mobile environment is a difficult endeavor. Billions of users rely on mobile devices for their daily computing tasks. Yet, they mostly run poorly optimized code, under-utilizing their already limited processing and energy resources. Existing optimization approaches, like iterative compilation, perform well in other domains but fall short in the mobile environment. They either rely on representative inputs that are hard to reconstruct, or expose users to slowdowns and crashes.
An ideal solution must be able to perform an optimization search by repeatedly evaluating different optimization decisions on the same input. That input should be representative of actual user usage without requiring developers to artificially create it. Finally, users should never be exposed to slow or crashing evaluations, a common side-effect of iterative compilation. This thesis presents a novel approach with all of the above in mind, bringing aggressive code optimization to the mobile environment.
With a transparent capture mechanism, real user inputs can be stored. This mechanism is infrequently invoked and remains unnoticeable to users. A single capture is enough to enable offline, input-driven code optimization. It supports C functions as well as code regions of interactive Android applications. It allows control over the timing and frequency of captures, bails out on imminent high-impact runtime events, and excludes some immutable data from captures.
A replay-based evaluation mechanism is able to repeatedly restore a captured input while changing the underlying code. For C programs, it employs compile- and link-time strategies to work consistently despite code transformations. For Android apps, a novel mechanism was developed that can replay using different code types: the original Android-compiled code, interpretation, and LLVM-generated code. Additionally, it works well even in the presence of memory-shuffling security mechanisms.
Capture and replay are fused into an iterative compilation system that uses offline, replay-based evaluations. Initially, real inputs are captured online, without noticeably affecting the users. For C and interactive apps, captures required on average 2ms and 15ms respectively. Then, an optimization search is performed by repeatedly replaying the inputs using different code transformations. As this happens offline, any crashing or erroneous executions do not affect the users. C programs became 29% faster using a random search, while interactive apps became 44% faster using a genetic algorithm and a novel Android backend based on LLVM. Finally, with crowd-sourcing, the offline evaluation effort was significantly accelerated. Specifically, for the user with the highest workload, the search was accelerated by a factor of 7.
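A minimal sketch of the offline search loop described above, assuming a hypothetical captured kernel (`kernel.c`) and recorded input (`replay_input.bin`); the flag set and random search below are illustrative, not the thesis's exact setup.

```python
import random
import subprocess
import time

# Offline, replay-based iterative compilation: each candidate flag set is
# evaluated by re-running the *same* captured input, so crashes or
# slowdowns never reach the user.

FLAGS = ["-O2", "-O3", "-funroll-loops", "-ftree-vectorize", "-ffast-math"]

def evaluate(flags):
    """Compile with `flags`, replay the captured input, return run time."""
    subprocess.run(["gcc", *flags, "kernel.c", "-o", "kernel"], check=True)
    start = time.perf_counter()
    try:
        result = subprocess.run(["./kernel", "replay_input.bin"], timeout=10)
    except subprocess.TimeoutExpired:
        return float("inf")           # too slow: discard this candidate
    if result.returncode != 0:        # crash: discard, user is unaffected
        return float("inf")
    return time.perf_counter() - start

best_flags, best_time = ["-O2"], evaluate(["-O2"])
for _ in range(50):                   # random search over flag subsets
    cand = [f for f in FLAGS if random.random() < 0.5] or ["-O2"]
    t = evaluate(cand)
    if t < best_time:
        best_flags, best_time = cand, t
print(best_flags, best_time)
```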
TDO-CIM: Transparent Detection and Offloading for Computation In-memory
Computation in-memory is a promising non-von Neumann approach that aims to eliminate data transfer to and from the memory subsystem. Although many architectures have been proposed, compiler support for such
architectures is still lagging behind. In this paper, we close this gap by
proposing an end-to-end compilation flow for in-memory computing based on the
LLVM compiler infrastructure. Starting from sequential code, our approach
automatically detects, optimizes, and offloads kernels suitable for in-memory
acceleration. We demonstrate our compiler tool-flow on the PolyBench/C
benchmark suite and evaluate the benefits of our proposed in-memory
architecture simulated in Gem5 by comparing it with a state-of-the-art von
Neumann architecture.
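A toy sketch of the offload decision such a flow must make; the arithmetic-intensity heuristic and threshold below are invented for illustration, since the real implementation is a set of LLVM passes.

```python
from dataclasses import dataclass

# Kernels with low arithmetic intensity (memory-bound) are the ones worth
# routing to an in-memory accelerator; compute-bound kernels stay on the CPU.

@dataclass
class Kernel:
    name: str
    flops: int          # arithmetic operations per invocation
    bytes_moved: int    # memory traffic per invocation

def should_offload(k: Kernel, threshold: float = 0.25) -> bool:
    """Offload when arithmetic intensity (flops/byte) is below threshold."""
    return k.flops / k.bytes_moved < threshold

for k in [Kernel("gemm", flops=2_000_000, bytes_moved=240_000),
          Kernel("vec_add", flops=1_000, bytes_moved=24_000)]:
    target = "CIM accelerator" if should_offload(k) else "host CPU"
    print(f"{k.name}: {target}")
```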