26,316 research outputs found
Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming allows to easily implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code for implementing the optimizations, instead of complicated
low-level code that is required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines
Why (and How) Networks Should Run Themselves
The proliferation of networked devices, systems, and applications that we
depend on every day makes managing networks more important than ever. The
increasing security, availability, and performance demands of these
applications suggest that these increasingly difficult network management
problems be solved in real time, across a complex web of interacting protocols
and systems. Alas, just as the importance of network management has increased,
the network has grown so complex that it is seemingly unmanageable. In this new
era, network management requires a fundamentally new approach. Instead of
optimizations based on closed-form analysis of individual protocols, network
operators need data-driven, machine-learning-based models of end-to-end and
application performance based on high-level policy goals and a holistic view of
the underlying components. Instead of anomaly detection algorithms that operate
on offline analysis of network traces, operators need classification and
detection algorithms that can make real-time, closed-loop decisions. Networks
should learn to drive themselves. This paper explores this concept, discussing
how we might attain this ambitious goal by more closely coupling measurement
with real-time control and by relying on learning for inference and prediction
about a networked application or system, as opposed to closed-form analysis of
individual protocols
ret2spec: Speculative Execution Using Return Stack Buffers
Speculative execution is an optimization technique that has been part of CPUs
for over a decade. It predicts the outcome and target of branch instructions to
avoid stalling the execution pipeline. However, until recently, the security
implications of speculative code execution have not been studied.
In this paper, we investigate a special type of branch predictor that is
responsible for predicting return addresses. To the best of our knowledge, we
are the first to study return address predictors and their consequences for the
security of modern software. In our work, we show how return stack buffers
(RSBs), the core unit of return address predictors, can be used to trigger
misspeculations. Based on this knowledge, we propose two new attack variants
using RSBs that give attackers similar capabilities as the documented Spectre
attacks. We show how local attackers can gain arbitrary speculative code
execution across processes, e.g., to leak passwords another user enters on a
shared system. Our evaluation showed that the recent Spectre countermeasures
deployed in operating systems can also cover such RSB-based cross-process
attacks. Yet we then demonstrate that attackers can trigger misspeculation in
JIT environments in order to leak arbitrary memory content of browser
processes. Reading outside the sandboxed memory region with JIT-compiled code
is still possible with 80\% accuracy on average.Comment: Updating to the cam-ready version and adding reference to the
original pape
Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library
Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches — CUDA and OpenCL — are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper
is built on top of the OpenCL standard and offers preimplemented recurring computation and communication patterns (skeletons) which greatly simplify programming for multiGPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between the system’s main memory and multiple GPUs. In this
paper, we focus on the specific support in SkelCL for systems with multiple GPUs and use a real-world application study from the area of medical imaging to demonstrate the reduced programming effort and competitive performance of SkelCL as compared to OpenCL and CUDA. Besides, we illustrate how SkelCL adapts to large-scale, distributed heterogeneous
systems in order to simplify their programming
Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
An accurate prediction of scheduling and execution of instruction streams is
a necessary prerequisite for predicting the in-core performance behavior of
throughput-bound loop kernels on out-of-order processor architectures. Such
predictions are an indispensable component of analytical performance models,
such as the Roofline and the Execution-Cache-Memory (ECM) model, and allow a
deep understanding of the performance-relevant interactions between hardware
architecture and loop code. We present the Open Source Architecture Code
Analyzer (OSACA), a static analysis tool for predicting the execution time of
sequential loops comprising x86 instructions under the assumption of an
infinite first-level cache and perfect out-of-order scheduling. We show the
process of building a machine model from available documentation and
semi-automatic benchmarking, and carry it out for the latest Intel Skylake and
AMD Zen micro-architectures. To validate the constructed models, we apply them
to several assembly kernels and compare runtime predictions with actual
measurements. Finally we give an outlook on how the method may be generalized
to new architectures.Comment: 11 pages, 4 figures, 7 table
- …