Search CORE

511 research outputs found

Automatic Parallelization With Statistical Accuracy Bounds

Author: Kim Deokhwan
Misailovic Sasa
Rinard Martin
Publication venue
Publication date: 10/02/2010
Field of study

Traditional parallelizing compilers are designed to generate parallel programs that produce identical outputs as the original sequential program. The difficulty of performing the program analysis required to satisfy this goal and the restricted space of possible target parallel programs have both posed significant obstacles to the development of effective parallelizing compilers. The QuickStep compiler is instead designed to generate parallel programs that satisfy statistical accuracy guarantees. The freedom to generate parallel programs whose output may differ (within statistical accuracy bounds) from the output of the sequential program enables a dramatic simplification of the compiler and a significant expansion in the range of parallel programs that it can legally generate. QuickStep exploits this flexibility to take a fundamentally different approach from traditional parallelizing compilers. It applies a collection of transformations (loop parallelization, loop scheduling, synchronization introduction, and replication introduction) to generate a search space of parallel versions of the original sequential program. It then searches this space (prioritizing the parallelization of the most time-consuming loops in the application) to find a final parallelization that exhibits good parallel performance and satisfies the statistical accuracy guarantee. At each step in the search it performs a sequence of trial runs on representative inputs to examine the performance, accuracy, and memory accessing characteristics of the current generated parallel program. An analysis of these characteristics guides the steps the compiler takes as it explores the search space of parallel programs. Results from our benchmark set of applications show that QuickStep can automatically generate parallel programs with good performance and statistically accurate outputs. For two of the applications, the parallelization introduces noise into the output, but the noise remains within acceptable statistical bounds. The simplicity of the compilation strategy and the performance and statistical acceptability of the generated parallel programs demonstrate the advantages of the QuickStep approach

DSpace@MIT

Parallelizing Sequential Programs With Statistical Accuracy Tests

Author: Kim Deokhwan
Misailovic Sasa
Rinard Martin
Publication venue
Publication date: 05/08/2010
Field of study

We present QuickStep, a novel system for parallelizing sequential programs. QuickStep deploys a set of parallelization transformations that together induce a search space of candidate parallel programs. Given a sequential program, representative inputs, and an accuracy requirement, QuickStep uses performance measurements, profiling information, and statistical accuracy tests on the outputs of candidate parallel programs to guide its search for a parallelizationthat maximizes performance while preserving acceptable accuracy. When the search completes, QuickStep produces an interactive report that summarizes the applied parallelization transformations, performance, and accuracy results for the automatically generated candidate parallel programs. In our envisioned usage scenarios, the developer examines this report to evaluate the acceptability of the final parallelization and to obtain insight into how the original sequential program responds to different parallelization strategies. Itis also possible for the developer (or even a user of the program who has no software development expertise whatsoever) to simply use the best parallelization out of the box without examining the report or further investigating the parallelization. Results from our benchmark set of applications show that QuickStep can automatically generate accurate and efficient parallel programs---the automatically generated parallel versions of five of our six benchmark applications run between 5.0 and 7.7 times faster on 8 cores than the original sequential versions. Moreover, a comparison with the Intel icc compiler highlights how QuickStep can effectively parallelize applications with features (such as the use of modern object-oriented programming constructs or desirable parallelizations with infrequent but acceptable data races) that place them inherently beyond the reach of standard approaches

DSpace@MIT

khmer: Working with Big Data in Bioinformatics

Author: Brown C. Titus
McDonald Eric
Publication venue
Publication date: 09/03/2013
Field of study

We introduce design and optimization considerations for the 'khmer' package.Comment: Invited chapter for forthcoming book on Performance of Open Source Application

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

ASC: Automatically Scalable Computation

Author: Adams Ryan Prescott
Angelino Elaine
Appavoo Jonathan
Seltzer Margo I.
Waterland Amos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/10/2017
Field of study

We present an architecture designed to transparently and automatically scale the performance of sequential programs as a function of the hardware resources available. The architecture is predicated on a model of computation that views program execution as a walk through the enormous state space composed of the memory and registers of a single-threaded processor. Each instruction execution in this model moves the system from its current point in state space to a deterministic subsequent point. We can parallelize such execution by predictively partitioning the complete path and speculatively executing each partition in parallel. Accurately partitioning the path is a challenging prediction problem. We have implemented our system using a functional simulator that emulates the x86 instruction set, including a collection of state predictors and a mechanism for speculatively executing threads that explore potential states along the execution path. While the overhead of our simulation makes it impractical to measure speedup relative to native x86 execution, experiments on three benchmarks show scalability of up to a factor of 256 on a 1024 core machine when executing unmodified sequential programs.Engineering and Applied Science

Harvard University - DASH

A Survey on Compiler Autotuning using Machine Learning

Author: Ashouri Amir H.
Cavazos John
Killian William
Palermo Gianluca
Silvano Cristina
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/09/2018
Field of study

Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version) History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

The Hipeac Vision, 2010

Author: Cohen Albert
De Bosschere Koen
De Sutter Bjorn
Duranton Marc
Falsafi Babak
Gaydadjiev Georgi
Katevenis Manolis
Maebe Jonas
Munk Harm
Navarro Nacho
Ramirez Alex
Temam Olivier
Valero Matero
Yehia Sami
Publication venue: HiPEAC
Publication date: 01/01/2010
Field of study

Ghent University Academic Bibliography

Archivsystem Ask23

ASC

Author: Bishop Christopher M.
Blume Bill
Cesa-Bianchi Nicolo
Coates Adam
Daniel
Dasgupta A.
Dubey Pradeep K.
Gabbay Freddy
Greenlaw Raymond
Hertzberg Ben
Jaynes E.T.
Kennedy Ken
Lagarias Jeffrey C.
Michie Donald
Ritson C. G.
Sazeides Yiannakis
Vigoda Benjamin
Yang J.
Zhong Hongtao
Zilles Craig
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

“It Looks Like You’re Writing a Parallel Loop”:A Machine Learning Based Parallelization Assistant

Author: Castaneda Lozano Roberto
Cole Murray
Franke Bjoern
Maramzin Aleksandr
Vasiladiotis Christos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

Crossref

Edinburgh Research Explorer

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

Author: Cheung Alvin
Kemper Alfons
Palkar Shoumik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/06/2018
Field of study

MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed using a high-level intermediate language resembling the MapReduce paradigm and verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 48.2x faster compared to the original.Comment: 12 pages, additional 4 pages of references and appendi

arXiv.org e-Print Archive

Crossref