Cache-conscious Splitting of MapReduce Tasks and its Application to Stencil Computations
Modern cluster systems are typically composed of nodes with multiple processing
units and memory hierarchies comprising multiple cache levels of various sizes. To leverage
the full potential of these architectures it is necessary to explore concepts such as
parallel programming and the layout of data onto the memory hierarchy. However, the
inherent complexity of these concepts and the heterogeneity of the target architectures
raise several challenges at the application development and performance portability levels,
respectively. Regarding parallel programming, several models and frameworks
are available, of which MapReduce [16] is one of the most popular. It was developed
at Google [16] for the parallel and distributed processing of large amounts of data in
large clusters of commodity machines. Although they are very powerful tools, the reference
MapReduce frameworks, such as Hadoop and Spark, do not leverage the characteristics
of the underlying memory hierarchy. This shortcoming is particularly noticeable in
computations that benefit from temporal locality, such as stencil computations.
In this context, the goal of this thesis is to improve the performance of MapReduce
computations that benefit from temporal locality. To that end we optimize the mapping
of MapReduce computations onto a machine’s cache memory hierarchy by applying cache-aware
tiling techniques. We prototyped our solution on top of the Hadoop MapReduce
framework, incorporating cache-awareness into the splitting stage.
To validate our solution and assess its benefits, we developed an API for expressing
stencil computations on top of the developed framework. The experimental results show
that, for a typical stencil computation, our solution delivers an average speed-up of 1.77
while reaching a peak speed-up of 3.2. These findings allow us to conclude that cache-aware
decomposition of MapReduce computations considerably boosts the execution of
this class of MapReduce computations.
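The tiling idea behind this abstract can be illustrated loosely. The sketch below (not from the thesis; the names and the fixed tile size are assumptions for illustration) applies a 3-point stencil sweep either over the whole array or tile by tile, so that each tile stays cache-resident while it is updated. In the thesis's setting the analogous decision is made at the MapReduce splitting stage, with split sizes derived from the cache hierarchy rather than hard-coded.

```python
def stencil_naive(a, steps):
    """Apply `steps` sweeps of a 3-point average over the whole array."""
    n = len(a)
    for _ in range(steps):
        b = a[:]
        for i in range(1, n - 1):
            b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
        a = b
    return a


def stencil_tiled(a, steps, tile=4):
    """Process each sweep tile by tile: same arithmetic, but the
    iteration order keeps a small working set (one tile plus its
    halo) hot in cache while it is updated."""
    n = len(a)
    for _ in range(steps):
        b = a[:]
        for start in range(1, n - 1, tile):
            end = min(start + tile, n - 1)
            for i in range(start, end):
                b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
        a = b
    return a
```

Both variants compute identical results; only the traversal order (and hence the cache behaviour) differs, which is what makes the split size a pure performance knob.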
Analytical Query Execution Optimized for all Layers of Modern Hardware
Analytical database queries are at the core of business intelligence and decision support. To analyze the vast amounts of data available today, query execution needs to be orders of magnitude faster. Hardware advances have made a profound impact on database design and implementation. The large main memory capacity allows queries to execute exclusively in memory and shifts the bottleneck from disk access to memory bandwidth. In the new setting, to optimize query performance, databases must be aware of an unprecedented multitude of complicated hardware features. This thesis focuses on the design and implementation of highly efficient database systems by optimizing analytical query execution for all layers of modern hardware. The hardware layers include the network across multiple machines, main memory and the NUMA interconnection across multiple processors, the multiple levels of caches across multiple processor cores, and the execution pipeline within each core. For the network layer, we introduce a distributed join algorithm that minimizes the network traffic. For the memory hierarchy, we describe partitioning variants aware of the dynamics of the CPU caches and the NUMA interconnection. To improve the memory access rate of linear scans, we optimize lightweight compression variants and evaluate their trade-offs. To accelerate query execution within the core pipeline, we introduce advanced SIMD vectorization techniques generalizable across multiple operators. We evaluate our algorithms and techniques on both mainstream hardware and on many-integrated-core platforms, and combine our techniques in a new query engine design that can better utilize the features of many-core CPUs. In the era of hardware becoming increasingly parallel and datasets consistently growing in size, this thesis can serve as a compass for developing hardware-conscious databases with truly high-performance analytical query execution.
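The vectorization idea mentioned for the core pipeline can be sketched in a data-parallel style. The fragment below (an illustration, not the thesis's implementation; the function name and NumPy stand-in for SIMD lanes are assumptions) evaluates a range predicate over all elements at once and aggregates the qualifying payloads, the branch-free pattern behind SIMD selection scans.

```python
import numpy as np


def select_sum(keys, payloads, lo, hi):
    """Branch-free selection + aggregation over columnar arrays:
    the comparison is evaluated for every element (every 'lane')
    at once, producing a mask instead of a per-row branch."""
    mask = (keys >= lo) & (keys <= hi)
    return int(payloads[mask].sum())
```

Avoiding the per-row branch matters because data-dependent branches in a selection scan are hard to predict; a mask-based scan keeps the pipeline full regardless of selectivity.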
No Bits Left Behind
One of the key tenets of database system design is making efficient
use of storage and memory resources. However, existing database
system implementations are actually extremely wasteful of such
resources; for example, most systems leave a great deal of empty
space in tuples, index pages, and data pages, and spend many
CPU cycles reading cold records from disk that are never used.
In this paper, we identify a number of such sources of waste, and
present a series of techniques that limit this waste (e.g., forcing
better memory locality for hot data and using empty space in index
pages to cache popular tuples) without substantially complicating
interfaces or system design. We show that these techniques
effectively reduce memory requirements for real scenarios from
the Wikipedia database (by up to 17.8×) while increasing query
performance (by up to 8×).
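One of the techniques named in the abstract, using empty space in index pages to cache popular tuples, can be sketched as follows. This is a toy illustration, not the paper's implementation: the class, slot accounting, and `fetch_from_heap` callback are all assumptions made for the example.

```python
class IndexNode:
    """An index node whose unused slot space is repurposed as a
    cache of hot tuples, so lookups on popular keys can skip the
    fetch from the base table."""

    def __init__(self, spare_slots=8):
        self.entries = {}           # key -> page id (normal index payload)
        self.hot_cache = {}         # key -> full tuple, kept in spare slots
        self.spare_slots = spare_slots

    def insert(self, key, page_id):
        self.entries[key] = page_id

    def lookup(self, key, fetch_from_heap):
        # Hit in the in-page tuple cache: no heap access needed.
        if key in self.hot_cache:
            return self.hot_cache[key]
        tup = fetch_from_heap(self.entries[key])
        # Fill spare space opportunistically while any remains.
        if len(self.hot_cache) < self.spare_slots:
            self.hot_cache[key] = tup
        return tup
```

The point of the design is that the cached tuples occupy space that would otherwise be wasted, so the optimization costs nothing in extra pages while turning repeated hot-key lookups into single-page reads.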