8,997 research outputs found

Automatic creation of tile size selection models using neural networks

Author: Yuki Tomofumi
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2010

2010 Spring. Includes bibliographic references (pages 54-59). Covers not scanned. Print version deaccessioned 2022. Tiling is a widely used loop transformation for exposing and exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is usually achieved by hand-crafting tile size selection (TSS) models that characterize the performance of the tiled program as a function of tile sizes. The best tile sizes are selected either by using the TSS model directly or by using the TSS model together with an empirical search. Hand-crafting accurate TSS models is hard, and adapting them to a different architecture or compiler, or even keeping them up to date as a single compiler evolves, is often just as hard. Instead of hand-crafting TSS models, can we automatically learn or create them? In this paper, we show that for a specific class of programs fairly accurate TSS models can be automatically created by using a combination of simple program features, synthetic kernels, and standard machine learning techniques. The automatic TSS model generation scheme can also be used directly for adapting the model and/or keeping it up to date. We evaluate our scheme on six different architecture-compiler combinations (chosen from three different architectures and four different compilers). The models learned by our method have consistently shown near-optimal performance (within 5% of the optimal on average) across the tested architecture-compiler combinations.
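
As a rough illustration of what "tile sizes" means in the abstract above, here is a minimal sketch of a tiled matrix multiplication in Python. It is not taken from the thesis: the function names and the `(ti, tk, tj)` tuple are assumptions made for this example. The tiled loop nest computes the same result as the untiled one; only the iteration order changes, and that order is what a TSS model or an empirical search would tune.

```python
def matmul_naive(a, b):
    """Reference triple loop, no tiling."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tiles):
    """Same computation, iterated tile by tile.

    tiles = (ti, tk, tj) is a hypothetical parameter naming the tile
    size along each loop dimension; it is the quantity a tile size
    selection model would try to pick well.
    """
    ti, tk, tj = tiles
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Outer loops walk over tile origins.
    for ii in range(0, n, ti):
        for kk in range(0, m, tk):
            for jj in range(0, p, tj):
                # Inner loops stay inside one tile (clamped at the edges).
                for i in range(ii, min(ii + ti, n)):
                    for k in range(kk, min(kk + tk, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tj, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

In a compiled language the choice of `tiles` changes cache behaviour and therefore run time; in pure Python the sketch only demonstrates that the transformation preserves the result.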

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Differentiating the multipoint Expected Improvement for optimal batch design

Authors: Chevalier Clément, Ginsbourger David, Marmin Sébastien
Publication date: 26/07/2018

This work deals with parallel optimization of expensive objective functions which are modeled as sample realizations of Gaussian processes. The study is formalized as a Bayesian optimization problem, or continuous multi-armed bandit problem, where a batch of q > 0 arms is pulled in parallel at each iteration. Several algorithms have been developed for choosing batches by trading off exploitation and exploration. As of today, the maximum Expected Improvement (EI) and Upper Confidence Bound (UCB) selection rules appear as the most prominent approaches for batch selection. Here, we build upon recent work on the multipoint Expected Improvement criterion, for which an analytic expansion relying on Tallis' formula was recently established. Since the computational burden of this selection rule is still an issue in applications, we derive a closed-form expression for the gradient of the multipoint Expected Improvement, which aims at facilitating its maximization using gradient-based ascent algorithms. Substantial computational savings are shown in application. In addition, our algorithms are tested numerically and compared to state-of-the-art UCB-based batch-sequential algorithms. Combining starting designs relying on UCB with gradient-based EI local optimization finally appears as a sound option for batch design in distributed Gaussian Process optimization.
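
To make the Expected Improvement criterion mentioned above concrete, the following is a minimal sketch, not the authors' method. It shows the standard closed-form single-point EI and a plain Monte Carlo estimate of the multipoint (batch) EI. The analytic expansion via Tallis' formula and the gradient derived in the paper are not reproduced. The function names, the maximization convention, and the use of a Cholesky factor as input are assumptions made for this example.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """Closed-form single-point EI, assuming we maximize the objective.

    mean, std: Gaussian process posterior mean and standard deviation
    at the candidate point; best: best objective value observed so far.
    """
    if std <= 0.0:
        return max(0.0, mean - best)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


def multipoint_ei_mc(means, chol, best, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[max(0, max_i Y_i - best)] for a batch.

    means: posterior means of the q batch points.
    chol:  lower-triangular Cholesky factor (list of rows) of the
           q x q posterior covariance, supplied by the caller.
    """
    rng = random.Random(seed)
    q = len(means)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Correlated Gaussian sample: y = means + chol @ z
        y = [means[i] + sum(chol[i][j] * z[j] for j in range(i + 1))
             for i in range(q)]
        total += max(0.0, max(y) - best)
    return total / n_samples
```

For a batch of size one the Monte Carlo estimate should agree with the closed form up to sampling noise, which gives a simple sanity check on either function.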

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Space exploration: The interstellar goal and Titan demonstration

Publication date: 01/11/1982

Automated interstellar space exploration is reviewed. The Titan demonstration mission is discussed. Remote sensing and automated modeling are considered. Nuclear electric propulsion, main orbiting spacecraft, lander/rover, subsatellites, atmospheric probes, powered air vehicles, and a surface science network comprise mission component concepts. Machine intelligence in space exploration is discussed.

NASA Technical Reports Server

UMDA/S: An Effective Iterative Compilation Algorithm for Parameter Search

Authors: Che Yonggang, Lu Pingjing, Wang Zhenghua
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012

The search process is critical for iterative compilation because the large size of the search space and the cost of evaluating the candidate implementations make it infeasible to find the true optimal value of the optimization parameter by brute force. Treating the search as a nonlinear global optimization problem, this paper introduces a new hybrid algorithm, UMDA/S (Univariate Marginal Distribution Algorithm with Nelder-Mead Simplex Search), which uses the structure of the optimization space and parameter dependency to find a near-optimal parameter. Elitist preservation, weighted estimation and mutation are proposed to improve the performance of UMDA/S. Experimental results show that UMDA/S locates better parameters than existing static methods and search algorithms.
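
The univariate marginal distribution idea named above can be sketched as follows. This is a plain UMDA over discrete parameters with elitist preservation only; the Nelder-Mead simplex step, weighted estimation, and mutation described in the abstract are not implemented. The function name, its parameters, and the toy cost function are hypothetical and chosen only for illustration.

```python
import math
import random


def umda_search(cost, domains, pop_size=30, n_select=10,
                generations=20, seed=0):
    """Minimize cost over the Cartesian product of discrete domains."""
    rng = random.Random(seed)
    # One independent categorical marginal per parameter, initially uniform.
    probs = [[1.0 / len(d)] * len(d) for d in domains]
    best, best_cost, history = None, float("inf"), []
    for _ in range(generations):
        population = [
            tuple(rng.choices(d, weights=p)[0] for d, p in zip(domains, probs))
            for _ in range(pop_size)
        ]
        if best is not None:
            population.append(best)  # elitist preservation
        population.sort(key=cost)
        top_cost = cost(population[0])
        if top_cost < best_cost:
            best, best_cost = population[0], top_cost
        history.append(best_cost)
        # Re-estimate each marginal from the selected candidates.
        selected = population[:n_select]
        for i, d in enumerate(domains):
            counts = [1.0] * len(d)  # add-one smoothing keeps every value reachable
            for cand in selected:
                counts[d.index(cand[i])] += 1.0
            total = sum(counts)
            probs[i] = [c / total for c in counts]
    return best, best_cost, history


# Hypothetical stand-in for "compile and time the program":
# pretend tile size 32 and unroll factor 4 are ideal.
def toy_cost(params):
    tile, unroll = params
    return abs(math.log2(tile) - 5) + abs(unroll - 4)


domains = [[4, 8, 16, 32, 64, 128], [1, 2, 4, 8]]
best, best_cost, history = umda_search(toy_cost, domains)
```

In iterative compilation the cost function would be an actual compile-and-run measurement, so the number of evaluations (`pop_size * generations`) is the budget to watch.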

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Automatic scheduling of image processing pipelines

Author: Sioutas Savvas
Publication venue: Technische Universiteit Eindhoven
Publication date: 18/12/2020

Pure OAI Repository

Analytical cost metrics: days of future past

Author: Prajapati Nirmal
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2019

2019 Summer. Includes bibliographical references. Future exascale high-performance computing (HPC) systems are expected to be increasingly heterogeneous, consisting of several multi-core CPUs and a large number of accelerators, special-purpose hardware that will increase the computing power of the system in a very energy-efficient way. Specialized, energy-efficient accelerators are also an important component in many diverse systems beyond HPC: gaming machines, general purpose workstations, tablets, phones and other media devices. With Moore's law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. This work builds analytical cost models for cost metrics such as time, energy, memory access, and silicon area. These models are used to predict the performance of applications, for performance tuning, and for chip design. The idea is to work with domain specific accelerators where analytical cost models can be accurately used for performance optimization. The performance optimization problems are formulated as mathematical optimization problems. This work explores the analytical cost modeling and mathematical optimization approach in a few ways. For stencil applications and GPU architectures, the analytical cost models are developed for execution time as well as energy. The models are used for performance tuning over existing architectures, and are coupled with silicon area models of GPU architectures to generate highly efficient architecture configurations. For matrix chain products, analytical closed form solutions for off-chip data movement are built and used to minimize the total data movement cost of a minimum op count tree.
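
The "minimum op count tree" in the last sentence above refers to the parenthesization of a matrix chain that minimizes scalar multiplications. A minimal sketch of the textbook dynamic program for that tree is given below. It does not model the off-chip data movement cost that the thesis analyzes, and the function name and output format are assumptions made for this example.

```python
def matrix_chain_order(dims):
    """Return (min scalar multiplications, parenthesization string).

    dims has length n + 1; matrix A_i has shape dims[i-1] x dims[i].
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    # Solve sub-chains in order of increasing length.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k

    def tree(i, j):
        # Rebuild the optimal tree from the recorded split points.
        if i == j:
            return f"A{i + 1}"
        k = split[i][j]
        return f"({tree(i, k)} {tree(k + 1, j)})"

    return cost[0][n - 1], tree(0, n - 1)


print(matrix_chain_order([10, 30, 5, 60]))  # (4500, '((A1 A2) A3)')
```

For the three matrices of shapes 10x30, 30x5 and 5x60, multiplying the first two first costs 1500 + 3000 = 4500 scalar multiplications, against 27000 for the other order.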

Mountain Scholar (Digital Collections of Colorado and Wyoming)