94,190 research outputs found
Fine-Grain Iterative Compilation for WCET Estimation
Compiler optimizations, although reducing the execution times of programs, raise issues in static WCET estimation techniques and tools. Flow facts, such as loop bounds, may not be automatically found by static WCET analysis tools after aggressive code optimizations. In this paper, we explore the use of iterative compilation (WCET-directed program optimization to explore the optimization space), with the objective to (i) allow flow facts to be automatically found and (ii) select optimizations that result in the lowest WCET estimates. We also explore to which extent code outlining helps, by allowing the selection of different optimization options for different code snippets of the application
Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
Predicting the number of clock cycles a processor takes to execute a block of
assembly instructions in steady state (the throughput) is important for both
compiler designers and performance engineers. Building an analytical model to
do so is especially complicated in modern x86-64 Complex Instruction Set
Computer (CISC) machines with sophisticated processor microarchitectures in
that it is tedious, error prone, and must be performed from scratch for each
processor generation. In this paper we present Ithemal, the first tool which
learns to predict the throughput of a set of instructions. Ithemal uses a
hierarchical LSTM--based approach to predict throughput based on the opcodes
and operands of instructions in a basic block. We show that Ithemal is more
accurate than state-of-the-art hand-written tools currently used in compiler
backends and static machine code analyzers. In particular, our model has less
than half the error of state-of-the-art analytical models (LLVM's llvm-mca and
Intel's IACA). Ithemal is also able to predict these throughput values just as
fast as the aforementioned tools, and is easily ported across a variety of
processor microarchitectures with minimal developer effort.Comment: Published at 36th International Conference on Machine Learning (ICML)
201
Chaos in computer performance
Modern computer microprocessors are composed of hundreds of millions of
transistors that interact through intricate protocols. Their performance during
program execution may be highly variable and present aperiodic oscillations. In
this paper, we apply current nonlinear time series analysis techniques to the
performances of modern microprocessors during the execution of prototypical
programs. Our results present pieces of evidence strongly supporting that the
high variability of the performance dynamics during the execution of several
programs display low-dimensional deterministic chaos, with sensitivity to
initial conditions comparable to textbook models. Taken together, these results
show that the instantaneous performances of modern microprocessors constitute a
complex (or at least complicated) system and would benefit from analysis with
modern tools of nonlinear and complexity science
Recommended from our members
Automation of Determination of Optimal Intra-Compute Node Parallelism
Maximizing the productivity of modern multicore and manycore chips requires optimizing parallelism at the compute node level. This is, however, a complex multi-step process. It is an iterative method requiring determining optimal degrees of parallel scalability and optimizing memory access behavior. Further, there are multiple cases to be considered, programs which use only MPI or OpenMP and hybrid (MPI +OpenMP) programs. This paper presents a set of three coordinated workïŹows for determining the optimal parallelism at the program level for MPI programs and at the loop level for hybrid (MPI+OpenMP) cases. The paper also details mostly automated implementations of these workïŹows using the PerfExpert infrastructure. Finally the paper presents case studies demonstrating both the applicability and the effectiveness of optimizing parallelism at the compute node level. The results shown in the paper will provide valuable information to further advance in the full automation of the workïŹows. The software implementing the parallelism scalability optimization is open source and available for download.Texas Advanced Computing Center (TACC)Computer Science
Inferring Energy Bounds via Static Program Analysis and Evolutionary Modeling of Basic Blocks
The ever increasing number and complexity of energy-bound devices (such as
the ones used in Internet of Things applications, smart phones, and mission
critical systems) pose an important challenge on techniques to optimize their
energy consumption and to verify that they will perform their function within
the available energy budget. In this work we address this challenge from the
software point of view and propose a novel parametric approach to estimating
tight bounds on the energy consumed by program executions that are practical
for their application to energy verification and optimization. Our approach
divides a program into basic (branchless) blocks and estimates the maximal and
minimal energy consumption for each block using an evolutionary algorithm. Then
it combines the obtained values according to the program control flow, using
static analysis, to infer functions that give both upper and lower bounds on
the energy consumption of the whole program and its procedures as functions on
input data sizes. We have tested our approach on (C-like) embedded programs
running on the XMOS hardware platform. However, our method is general enough to
be applied to other microprocessor architectures and programming languages. The
bounds obtained by our prototype implementation can be tight while remaining on
the safe side of budgets in practice, as shown by our experimental evaluation.Comment: Pre-proceedings paper presented at the 27th International Symposium
on Logic-Based Program Synthesis and Transformation (LOPSTR 2017), Namur,
Belgium, 10-12 October 2017 (arXiv:1708.07854). Improved version of the one
presented at the HIP3ES 2016 workshop (v1): more experimental results (added
benchmark to Table 1, added figure for new benchmark, added Table 3),
improved Fig. 1, added Fig.
Statistical Reliability Estimation of Microprocessor-Based Systems
What is the probability that the execution state of a given microprocessor running a given application is correct, in a certain working environment with a given soft-error rate? Trying to answer this question using fault injection can be very expensive and time consuming. This paper proposes the baseline for a new methodology, based on microprocessor error probability profiling, that aims at estimating fault injection results without the need of a typical fault injection setup. The proposed methodology is based on two main ideas: a one-time fault-injection analysis of the microprocessor architecture to characterize the probability of successful execution of each of its instructions in presence of a soft-error, and a static and very fast analysis of the control and data flow of the target software application to compute its probability of success. The presented work goes beyond the dependability evaluation problem; it also has the potential to become the backbone for new tools able to help engineers to choose the best hardware and software architecture to structurally maximize the probability of a correct execution of the target softwar
- âŠ