Towards co-designed optimizations in parallel frameworks: A MapReduce case study
The explosion of Big Data was followed by the proliferation of numerous
complex parallel software stacks whose aim is to tackle the challenges of data
deluge. A drawback of such a multi-layered hierarchical deployment is the
inability to maintain and delegate vital semantic information between layers in
the stack. Software abstractions increase the semantic distance between an
application and its generated code. However, parallel software frameworks
contain inherent semantic information that general purpose compilers are not
designed to exploit.
This paper presents a case study demonstrating how the specific semantic
information of the MapReduce paradigm can be exploited on multicore
architectures. MR4J has been implemented in Java and evaluated against
hand-optimized C and C++ equivalents. The initial observed results led to the
design of a semantically aware optimizer that runs automatically without
requiring modification to application code.
The optimizer speeds up the execution of MR4J by up to 2.0x.
The introduced optimization not only improves the performance of the generated
code during the map phase but also reduces the pressure on the garbage
collector. This demonstrates how semantic information can be harnessed without
sacrificing sound software engineering practices when using parallel software
frameworks.
Comment: 8 pages
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches, and finally the influential papers of the field.
Comment: version 5.0 (updated in September 2018). Preprint version of our
article accepted at ACM CSUR 2018 (42 pages). This survey will be updated
quarterly here (send me your newly published papers to be added in the
subsequent version). History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
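To make the two problems named in the abstract above concrete, here is a minimal sketch that enumerates both search spaces for a tiny pass list. The pass names are hypothetical placeholders, not flags of any real compiler, and the ordering space shown is a simplification: real phase-ordering spaces also allow repeated passes and sequences of varying length, so they are larger still.

```python
from itertools import combinations, permutations

# Hypothetical pass names, used only for illustration.
PASSES = ["inline", "unroll", "vectorize"]

def selections(passes):
    """Optimization selection: every subset of passes (on/off per pass)."""
    return [subset
            for size in range(len(passes) + 1)
            for subset in combinations(passes, size)]

def orderings(passes):
    """Phase-ordering, simplified: every order in which all passes run once."""
    return list(permutations(passes))

if __name__ == "__main__":
    print(len(selections(PASSES)), "possible selections")
    print(len(orderings(PASSES)), "possible orderings")
```

With n passes the selection space has 2^n entries and this simplified ordering space has n! entries, which is why exhaustive search quickly becomes impractical and learned heuristics are attractive.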
Garbage collection auto-tuning for Java MapReduce on Multi-Cores
MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. The success of the MapReduce pattern has led to a variety of implementations for different computational scenarios. In this paper we present MRJ, a MapReduce Java framework for multi-core architectures. We evaluate its scalability on a four-core, hyperthreaded Intel Core i7 processor, using a set of standard MapReduce benchmarks. We investigate the significant impact that Java runtime garbage collection has on the performance and scalability of MRJ. We propose the use of memory management auto-tuning techniques based on machine learning. With our auto-tuning approach, we are able to achieve MRJ performance within 10% of optimal on 75% of our benchmark tests.
Improving Memory Hierarchy Utilisation for Stencil Computations on Multicore Machines
Although modern supercomputers are composed of multicore machines, some
scientists still run legacy applications that were developed for single-core
clusters, where the memory hierarchy is dedicated to a single core. The main
objective of this paper is to propose and evaluate an algorithm that identifies
an efficient block size for MPI stencil computations on multicore machines.
Through an extensive experimental analysis, this work shows the benefits of
identifying block sizes that divide data among the various cores, and suggests
a methodology that exploits the memory hierarchy available in modern machines.
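As a rough illustration of what a block size means for a stencil computation, the sketch below applies one 4-neighbour Jacobi sweep to a 2D grid, first row by row and then tile by tile. It is a single-process sketch under stated assumptions: the function names and the `block` parameter are hypothetical, there is no MPI, and it does not reproduce the paper's algorithm for choosing the block size. It only shows the loop structure that a block size controls; the cache benefit of tiling would appear in a compiled language, not in plain Python.

```python
def jacobi_sweep(grid):
    """One 4-neighbour Jacobi sweep over the interior, row by row."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary cells are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out

def jacobi_sweep_blocked(grid, block):
    """Same sweep, visiting the interior in block x block tiles."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, block):          # tile origin, rows
        for jj in range(1, m - 1, block):      # tile origin, columns
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, m - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out

def make_grid(n, m):
    """Small deterministic test grid."""
    return [[float((i * 31 + j * 17) % 11) for j in range(m)] for i in range(n)]
```

Because each output cell reads only the input grid, the tiled sweep produces exactly the same values as the row-by-row sweep for any positive block size; only the traversal order, and therefore the memory access pattern, changes.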