5,625 research outputs found
Exploiting cache locality at run-time
With the increasing gap between the speeds of the processor and memory system, memory access has become a major performance bottleneck in modern computer systems. Recently, Symmetric Multi-Processor (SMP) systems have emerged as a major class of high-performance platforms. Improving the memory performance of Parallel applications with dynamic memory-access patterns on Symmetric Multi-Processors (SMP) is a hard problem. The solution to this problem is critical to the successful use of the SMP systems because dynamic memory-access patterns occur in many real-world applications. This dissertation is aimed at solving this problem.;Based on a rigorous analysis of cache-locality optimization, we propose a memory-layout oriented run-time technique to exploit the cache locality of parallel loops. Our technique have been implemented in a run-time system. Using simulation and measurement, we have shown our run-time approach can achieve comparable performance with compiler optimizations for those regular applications, whose load balance and cache locality can be well optimized by tiling and other program transformations. However, our approach was shown to improve significantly the memory performance for applications with dynamic memory-access patterns. Such applications are usually hard to optimize with static compiler optimizations.;Several contributions are made in this dissertation. We present models to characterize the complexity and present a solution framework for optimizing cache locality. We present an effective estimation technique for memory-access patterns to support efficient locality optimizations and information integration. We present a memory-layout oriented run-time technique for locality optimization. We present efficient scheduling algorithms to trade off locality and load imbalance. We provide a detailed performance evaluation of the run-time technique
Deductive Optimization of Relational Data Storage
Optimizing the physical data storage and retrieval of data are two key
database management problems. In this paper, we propose a language that can
express a wide range of physical database layouts, going well beyond the row-
and column-based methods that are widely used in database management systems.
We use deductive synthesis to turn a high-level relational representation of a
database query into a highly optimized low-level implementation which operates
on a specialized layout of the dataset. We build a compiler for this language
and conduct experiments using a popular database benchmark, which shows that
the performance of these specialized queries is competitive with a
state-of-the-art in memory compiled database system
Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming allows to easily implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code for implementing the optimizations, instead of complicated
low-level code that is required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines
Benchmarking Memory Management Capabilities within ROOT-Sim
In parallel discrete event simulation techniques, the simulation model is partitioned into objects, concurrently executing events on different CPUs and/or multiple CPUCores. In such a context, run-time supports for logical time synchronization across the different simulation objects play a central role in determining the effectiveness of the specific parallel simulation environment. In this paper we present an experimental evaluation of the memory management capabilities offered by the ROme OpTimistic Simulator (ROOT-Sim). This is an open source parallel simulation environment transparently supporting optimistic synchronization via recoverability (based on incremental log/restore techniques) of any type of memory operation affecting the state of simulation objects, i.e., memory allocation, deallocation and update operations. The experimental study is based on a synthetic benchmark which mimics different read/write patterns inside the dynamic memory map associated with the state of simulation objects. This allows sensibility analysis of time and space effects due to the memory management subsystem while varying the type and the locality of the accesses associated with event processin
- …