Search CORE

94,190 research outputs found

Fine-Grain Iterative Compilation for WCET Estimation

Author: Cullmann Christoph
Derrien Steven
Gebhard Gernot
Puaut Isabelle
Publication venue: OASIcs - OpenAccess Series in Informatics. 18th International Workshop on Worst-Case Execution Time Analysis (WCET 2018)
Publication date: 01/01/2018
Field of study

Compiler optimizations, although reducing the execution times of programs, raise issues in static WCET estimation techniques and tools. Flow facts, such as loop bounds, may not be automatically found by static WCET analysis tools after aggressive code optimizations. In this paper, we explore the use of iterative compilation (WCET-directed program optimization to explore the optimization space), with the objective to (i) allow flow facts to be automatically found and (ii) select optimizations that result in the lowest WCET estimates. We also explore to which extent code outlining helps, by allowing the selection of different optimization options for different code snippets of the application

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

HAL-Rennes 1

Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks

Author: Amarasinghe Saman
Carbin Michael
Mendis Charith
Renda Alex
Publication venue
Publication date: 30/05/2019
Field of study

Predicting the number of clock cycles a processor takes to execute a block of assembly instructions in steady state (the throughput) is important for both compiler designers and performance engineers. Building an analytical model to do so is especially complicated in modern x86-64 Complex Instruction Set Computer (CISC) machines with sophisticated processor microarchitectures in that it is tedious, error prone, and must be performed from scratch for each processor generation. In this paper we present Ithemal, the first tool which learns to predict the throughput of a set of instructions. Ithemal uses a hierarchical LSTM--based approach to predict throughput based on the opcodes and operands of instructions in a basic block. We show that Ithemal is more accurate than state-of-the-art hand-written tools currently used in compiler backends and static machine code analyzers. In particular, our model has less than half the error of state-of-the-art analytical models (LLVM's llvm-mca and Intel's IACA). Ithemal is also able to predict these throughput values just as fast as the aforementioned tools, and is easily ported across a variety of processor microarchitectures with minimal developer effort.Comment: Published at 36th International Conference on Machine Learning (ICML) 201

arXiv.org e-Print Archive

DSpace@MIT

Chaos in computer performance

Author: Cook M.
Daniel Gracia Pérez
Hugues Berry
Kantz H.
Kilian J.
Kulkarni P.
Moore G. E.
Olivier Temam
Stephenson M.
Wolfram S.
Publication venue: 'AIP Publishing'
Publication date: 14/12/2005
Field of study

Modern computer microprocessors are composed of hundreds of millions of transistors that interact through intricate protocols. Their performance during program execution may be highly variable and present aperiodic oscillations. In this paper, we apply current nonlinear time series analysis techniques to the performances of modern microprocessors during the execution of prototypical programs. Our results present pieces of evidence strongly supporting that the high variability of the performance dynamics during the execution of several programs display low-dimensional deterministic chaos, with sensitivity to initial conditions comparable to textbook models. Taken together, these results show that the instantaneous performances of modern microprocessors constitute a complex (or at least complicated) system and would benefit from analysis with modern tools of nonlinear and complexity science

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Recommended from our members

Automation of Determination of Optimal Intra-Compute Node Parallelism

Author: Brown James C.
Gómez-Iglesias Antonio
Publication venue
Publication date: 01/01/2016
Field of study

Maximizing the productivity of modern multicore and manycore chips requires optimizing parallelism at the compute node level. This is, however, a complex multi-step process. It is an iterative method requiring determining optimal degrees of parallel scalability and optimizing memory access behavior. Further, there are multiple cases to be considered, programs which use only MPI or OpenMP and hybrid (MPI +OpenMP) programs. This paper presents a set of three coordinated workﬂows for determining the optimal parallelism at the program level for MPI programs and at the loop level for hybrid (MPI+OpenMP) cases. The paper also details mostly automated implementations of these workﬂows using the PerfExpert infrastructure. Finally the paper presents case studies demonstrating both the applicability and the effectiveness of optimizing parallelism at the compute node level. The results shown in the paper will provide valuable information to further advance in the full automation of the workﬂows. The software implementing the parallelism scalability optimization is open source and available for download.Texas Advanced Computing Center (TACC)Computer Science

Texas ScholarWorks

Inferring Energy Bounds via Static Program Analysis and Evolutionary Modeling of Basic Blocks

Author: A. SERRANO
E Albert
J Navas
M Hermenegildo
M Méndez-Lojo
M. V. HERMENEGILDO
P Lopez-Garcia
Reinhard Wilhelm
S Lafond
Saumya K. Debray
Steve Kerrison
U Liqat
U. Liqat
Publication venue
Publication date: 22/09/2017
Field of study

The ever increasing number and complexity of energy-bound devices (such as the ones used in Internet of Things applications, smart phones, and mission critical systems) pose an important challenge on techniques to optimize their energy consumption and to verify that they will perform their function within the available energy budget. In this work we address this challenge from the software point of view and propose a novel parametric approach to estimating tight bounds on the energy consumed by program executions that are practical for their application to energy verification and optimization. Our approach divides a program into basic (branchless) blocks and estimates the maximal and minimal energy consumption for each block using an evolutionary algorithm. Then it combines the obtained values according to the program control flow, using static analysis, to infer functions that give both upper and lower bounds on the energy consumption of the whole program and its procedures as functions on input data sizes. We have tested our approach on (C-like) embedded programs running on the XMOS hardware platform. However, our method is general enough to be applied to other microprocessor architectures and programming languages. The bounds obtained by our prototype implementation can be tight while remaining on the safe side of budgets in practice, as shown by our experimental evaluation.Comment: Pre-proceedings paper presented at the 27th International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR 2017), Namur, Belgium, 10-12 October 2017 (arXiv:1708.07854). Improved version of the one presented at the HIP3ES 2016 workshop (v1): more experimental results (added benchmark to Table 1, added figure for new benchmark, added Table 3), improved Fig. 1, added Fig.

arXiv.org e-Print Archive

Crossref

Archivo Digital UPM

Statistical Reliability Estimation of Microprocessor-Based Systems

Author: Benso Alfredo
Bosio A.
Di Carlo Stefano
Di Natale G.
Politano Gianfranco Michele Maria
Savino Alessandro
Publication venue: IEEE Computer Society
Publication date: 01/01/2012
Field of study

What is the probability that the execution state of a given microprocessor running a given application is correct, in a certain working environment with a given soft-error rate? Trying to answer this question using fault injection can be very expensive and time consuming. This paper proposes the baseline for a new methodology, based on microprocessor error probability profiling, that aims at estimating fault injection results without the need of a typical fault injection setup. The proposed methodology is based on two main ideas: a one-time fault-injection analysis of the microprocessor architecture to characterize the probability of successful execution of each of its instructions in presence of a soft-error, and a static and very fast analysis of the control and data flow of the target software application to compute its probability of success. The presented work goes beyond the dependability evaluation problem; it also has the potential to become the backbone for new tools able to help engineers to choose the best hardware and software architecture to structurally maximize the probability of a correct execution of the target softwar

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino