Search CORE

38 research outputs found

Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms

Author: Amarasinghe Saman
Ansel Jason
Chan Cy
Edelman Alan
Olszewski Marek
Wong Yee Lok
Publication venue
Publication date: 27/07/2010
Field of study

Approximating ideal program outputs is a common technique for solving computationally difficult problems, for adhering to processing or timing constraints, and for performance optimization in situations where perfect precision is not necessary. To this end, programmers often use approximation algorithms, iterative methods, data resampling, and other heuristics. However, programming such variable accuracy algorithms presents difficult challenges since the optimal algorithms and parameters may change with different accuracy requirements and usage environments. This problem is further compounded when multiple variable accuracy algorithms are nested together due to the complex way that accuracy requirements can propagate across algorithms and because of the resulting size of the set of allowable compositions. As a result, programmers often deal with this issue in an ad-hoc manner that can sometimes violate sound programming practices such as maintaining library abstractions. In this paper, we propose language extensions that expose trade-offs between time and accuracy to the compiler. The compiler performs fully automatic compile-time and install-time autotuning and analyses in order to construct optimized algorithms to achieve any given target accuracy. We present novel compiler techniques and a structured genetic tuning algorithm to search the space of candidate algorithms and accuracies in the presence of recursion and sub-calls to other variable accuracy code. These techniques benefit both the library writer, by providing an easy way to describe and search the parameter and algorithmic choice space, and the library user, by allowing high level specification of accuracy requirements which are then met automatically without the need for the user to understand any algorithm-specific parameters. Additionally, we present a new suite of benchmarks, written in our language, to examine the efficacy of our techniques. Our experimental results show that by relaxing accuracy requirements, we can easily obtain performance improvements ranging from 1.1x to orders of magnitude of speedup

CiteSeerX

DSpace@MIT

Crossref

eScholarship - University of California

Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms

Author: Amarasinghe Saman P.
Ansel Jason Andrew
Chan Cy
Edelman Alan
Olszewski Marek Krystyn
Wong Yee Lok
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2011
Field of study

Approximating ideal program outputs is a common technique for solving computationally difficult problems, for adhering to processing or timing constraints, and for performance optimization in situations where perfect precision is not necessary. To this end, programmers often use approximation algorithms, iterative methods, data resampling, and other heuristics. However, programming such variable accuracy algorithms presents difficult challenges since the optimal algorithms and parameters may change with different accuracy requirements and usage environments. This problem is further compounded when multiple variable accuracy algorithms are nested together due to the complex way that accuracy requirements can propagate across algorithms and because of the size of the set of allowable compositions. As a result, programmers often deal with this issue in an ad-hoc manner that can sometimes violate sound programming practices such as maintaining library abstractions. In this paper, we propose language extensions that expose trade-offs between time and accuracy to the compiler. The compiler performs fully automatic compile-time and installtime autotuning and analyses in order to construct optimized algorithms to achieve any given target accuracy. We present novel compiler techniques and a structured genetic tuning algorithm to search the space of candidate algorithms and accuracies in the presence of recursion and sub-calls to other variable accuracy code. These techniques benefit both the library writer, by providing an easy way to describe and search the parameter and algorithmic choice space, and the library user, by allowing high level specification of accuracy requirements which are then met automatically without the need for the user to understand any algorithm-specific parameters. Additionally, we present a new suite of benchmarks, written in our language, to examine the efficacy of our techniques. Our experimental results show that by relaxing accuracy requirements , we can easily obtain performance improvements ranging from 1.1× to orders of magnitude of speedup

CiteSeerX

DSpace@MIT

Crossref

Application-independent Autotuning for GPUs

Author: Dachsbacher C.
Karcher T.
Tichy W. F.
Tillmann M.
Publication venue: IOS-Press
Publication date: 01/01/2014
Field of study

KITopen

An Efficient Monte Carlo-based Probabilistic Time-Dependent Routing Calculation Targeting a Server-Side Car Navigation System

Author: Bispo Joao
Cardoso Joao M. P.
Gadioli Davide
Golasowski Martin
Martinovic Jan
Palermo Gianluca
Pinto Pedro
Silvano Cristina
Slaninova Katerina
Vitali Emanuele
Publication venue
Publication date: 01/01/2019
Field of study

Incorporating speed probability distribution to the computation of the route planning in car navigation systems guarantees more accurate and precise responses. In this paper, we propose a novel approach for dynamically selecting the number of samples used for the Monte Carlo simulation to solve the Probabilistic Time-Dependent Routing (PTDR) problem, thus improving the computation efficiency. The proposed method is used to determine in a proactive manner the number of simulations to be done to extract the travel-time estimation for each specific request while respecting an error threshold as output quality level. The methodology requires a reduced effort on the application development side. We adopted an aspect-oriented programming language (LARA) together with a flexible dynamic autotuning library (mARGOt) respectively to instrument the code and to take tuning decisions on the number of samples improving the execution efficiency. Experimental results demonstrate that the proposed adaptive approach saves a large fraction of simulations (between 36% and 81%) with respect to a static approach while considering different traffic situations, paths and error requirements. Given the negligible runtime overhead of the proposed approach, it results in an execution-time speedup between 1.5x and 5.1x. This speedup is reflected at infrastructure-level in terms of a reduction of around 36% of the computing resources needed to support the whole navigation pipeline

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

SiblingRivalry: Online Autotuning Through Local Competitions

Author: Amarasinghe Saman P.
Ansel Jason Andrew
Chan Cy
O'Reilly Una-May
Olszewski Marek Krystyn
Pacula Maciej
Wong Yee Lok
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2012
Field of study

Modern high performance libraries, such as ATLAS and FFTW, and programming languages, such as PetaBricks, have shown that autotuning computer programs can lead to significant speedups. However, autotuning can be burdensome to the deployment of a program, since the tuning process can take a long time and should be re-run whenever the program, microarchitecture, execution environment, or tool chain changes. Failure to re-autotune programs often leads to widespread use of sub-optimal algorithms. With the growth of cloud computing, where computations can run in environments with unknown load and migrate between different (possibly unknown) microarchitectures, the need for online autotuning has become increasingly important. We present SiblingRivalry, a new model for always-on online autotuning that allows parallel programs to continuously adapt and optimize themselves to their environment. In our system, requests are processed by dividing the available cores in half, and processing two identical requests in parallel on each half. Half of the cores are devoted to a known safe program configuration, while the other half are used for an experimental program configuration chosen by our self-adapting evolutionary algorithm. When the faster configuration completes, its results are returned, and the slower configuration is terminated. Over time, this constant experimentation allows programs to adapt to changing dynamic environments and often outperform the original algorithm that uses the entire system.United States. Dept. of Energy (DOE Award DE-SC0005288

DSpace@MIT

Crossref

eScholarship - University of California

Automatic Thread-Level Parallelization in the Chombo AMR Library

Author: Christen Matthias
Keen Noel
Ligocki Terry
Oliker Leonid
Shalf John
Van Straalen Brian
Williams Samuel
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 26/05/2011
Field of study

The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware?s architectural features. In this paper, we present an approach that automatically introduces thread level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in the ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation

Crossref

UNT Digital Library

ASAC: Automatic Sensitivity Analysis for Approximate Computing

Author: Chundong Wang
Pooja Roy
Rajarshi Ray
Weng-fai Wong
Publication venue
Publication date
Field of study

The approximation based programming paradigm is especially attractive for developing error-resilient applications, targeting low power embedded devices. It allows for program data to be computed and stored approximately for better energy efficiency. The duration of battery in the smartphones, tablets, etc. is generally more of a concern to users than an application’s accuracy or fidelity beyond certain acceptable quality of service. Therefore, relaxing accuracy to improve energy efficiency is an attractive tradeoff when permissible by the application’s domain. Recent works suggest source code annotations and type qualifiers to facilitate safe approximate computation and data manipulation. It requires rewriting of programs or the availability of source codes for annotations. This may not be feasible as real-world applications tend to be large, with source code that is not readily available. In this paper, we propose a novel sensitivity analysis that automatically generates annotations for programs for the purpose of approximate computing. Our framework, ASAC, extracts information about the sensitivity of the output with respect to program data. We show that the program output is sensitive to only a subset of program data that we deem critical, and hence must be precise. The rest of the data can be computed and stored approximately. We evaluated our analysis on a range of applications, and achieved a 86 % accuracy compared to manual annotations by programmers. We validated our analysis by showing that the applications are within the acceptable QoS threshold if we approximate the non-critical data

CiteSeerX

Autotuning Algorithmic Choice for Input Sensitivity

Author: Ansel J.
Frigo M.
Fursin G.
Karsai G.
Tapus C.
Vuduc R.
Whaley R. C.
Publication venue
Publication date: 23/06/2014
Field of study

Empirical autotuning is increasingly being used in many domains to achieve optimized performance in a variety of different execution environments. A daunting challenge faced by such autotuners is input sensitivity, where the best autotuned configuration may vary with different input sets. In this paper, we propose a two level solution that: first, clusters to find input sets that are similar in input feature space; then, uses an evolutionary autotuner to build an optimized program for each of these clusters; and, finally, builds an adaptive overhead aware classifier which assigns each input to a specific input optimized program. Our approach addresses the complex trade-off between using expensive features, to accurately characterize an input, and cheaper features, which can be computed with less overhead. Experimental results show that by adapting to different inputs one can obtain up to a 3x speedup over using a single configuration for all inputs

DSpace@MIT

Crossref