Worst-Case Execution Time Analysis of Predicated Architectures
The time-predictable design of computer architectures for use in (hard) real-time systems is becoming increasingly important due to the growing complexity of modern processors. The design of predictable processor pipelines has recently received considerable attention; the goal is to find a good trade-off between predictability and computing power.
Branches and jumps are particularly problematic for high-performance processors. For one, branches are resolved late in the pipeline, which leads either to high branch penalties (pipeline flushes) or to complex software/hardware techniques (branch predictors). Another side effect of branches is that their control dependencies make it difficult to exploit instruction-level parallelism.
Predicated computer architectures allow a predicate to be attached to each instruction in a program. The instruction is then executed only when its predicate evaluates to true; otherwise it behaves like a simple nop instruction. Predicates can thus be used to convert control dependencies into data dependencies, which helps to address both of the aforementioned problems.
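The conversion of a control dependence into a data dependence can be illustrated with a tiny register machine in which every instruction carries a predicate register. This is a minimal sketch, not any specific ISA; the names `Instr`, `run`, and the register layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Instr:
    pred: str                     # predicate register guarding the instruction
    dst: str                      # destination register
    op: Callable[[dict], int]     # computes the new value from the registers

def run(program, regs):
    for ins in program:
        if regs[ins.pred]:        # predicate true -> execute
            regs[ins.dst] = ins.op(regs)
        # predicate false -> the instruction behaves like a nop
    return regs

# If-conversion of:  if a > b: m = a  else: m = b
# The branch becomes a compare that sets predicates p and np (= not p),
# followed by two predicated moves: a data dependence on p, no branch.
regs = {"a": 7, "b": 3, "m": 0, "p": False, "np": False, "true": True}
program = [
    Instr("true", "p",  lambda r: r["a"] > r["b"]),
    Instr("true", "np", lambda r: not r["p"]),
    Instr("p",    "m",  lambda r: r["a"]),   # runs only when a > b
    Instr("np",   "m",  lambda r: r["b"]),   # runs only when a <= b
]
run(program, regs)
print(regs["m"])   # -> 7
```

Both predicated moves occupy an issue slot regardless of the outcome, which is exactly what makes the timing easier to predict than a mispredictable branch.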
A downside of predicated instructions is that they complicate precise worst-case execution time (WCET) analysis. A predicated memory access, for instance, may or may not affect the processor's cache and must therefore be considered by the cache analysis. Predication potentially affects every analysis phase of a WCET analysis tool. We thus explore a preprocessing step that explicitly unfolds the control-flow graph, which allows us to apply standard analyses that are themselves not aware of predication.
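One way to picture such an unfolding, sketched below under the assumption that each predicated instruction simply becomes an explicit branch diamond ("if p then ins, else fall through"), so that analyses which only understand plain control flow apply unchanged. The node and edge representation here is hypothetical, not the tool's actual IR.

```python
def unfold(predicated_block):
    """Turn a list of (predicate, instruction) pairs into an explicit CFG.

    Each predicated instruction becomes a diamond: one edge on which the
    instruction executes and one on which it is skipped (the nop case).
    """
    nodes, edges = [], []
    prev = "entry"
    for i, (pred, ins) in enumerate(predicated_block):
        taken, join = f"do{i}", f"join{i}"
        nodes += [(taken, ins), (join, None)]
        edges += [
            (prev, taken, f"{pred} == true"),   # execute the instruction
            (prev, join,  f"{pred} == false"),  # skip: behaves like a nop
            (taken, join, None),                # rejoin the main path
        ]
        prev = join
    edges.append((prev, "exit", None))
    return nodes, edges

nodes, edges = unfold([("p", "load r1, [r2]"), ("np", "add r3, r3, 1")])
```

After unfolding, a standard cache analysis sees the predicated load as an ordinary conditional path and can join the "access" and "no access" cache states at the merge node.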
Profile-guided redundancy elimination
Program optimisations analyse and transform programs so that better performance can be achieved. Classical optimisations mainly use the static properties of a program to analyse its code, and must guarantee that the optimisations are correct for every possible combination of program and input data. This approach is overly conservative when a program exhibits the same runtime behaviour for most of its execution time. Profile-guided optimisations, on the other hand, use runtime profiling information to discover such common behaviours and thereby exploit optimisation opportunities that classical, non-profile-guided optimisations miss. Redundancy elimination is one of the most powerful optimisations in compilers. In this thesis, a new partial redundancy elimination (PRE) algorithm and a new partial dead code elimination (PDE) algorithm are proposed for a profile-guided redundancy elimination framework. During the design and implementation of the algorithms, we address three critical issues: optimality, feasibility and profitability.
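The core effect of PRE can be seen on a toy diamond-shaped CFG, shown below. This is an illustrative sketch, not the thesis's algorithm: `a + b` is computed on the left branch and again after the join, so the later computation is partially redundant; inserting the expression on the right branch makes it fully redundant and lets it be replaced by a temporary.

```python
before = {
    "left":  ["x = a + b"],
    "right": ["y = 0"],
    "join":  ["z = a + b"],   # partially redundant: available on the left path only
}

def pre(cfg, expr, temp):
    """Hoist expr into every branch of a two-way diamond, caching it in temp."""
    out = {block: list(stmts) for block, stmts in cfg.items()}
    for branch in ("left", "right"):
        stmts = out[branch]
        for i, s in enumerate(stmts):
            if s.endswith(expr):                 # reuse the existing computation
                var = s.split(" = ")[0]
                stmts[i] = f"{temp} = {expr}"
                stmts.insert(i + 1, f"{var} = {temp}")
                break
        else:
            stmts.append(f"{temp} = {expr}")     # speculative insertion
    # Every later computation of expr is now fully redundant.
    out["join"] = [s.split(" = ")[0] + f" = {temp}" if s.endswith(expr) else s
                   for s in out["join"]]
    return out

after = pre(before, "a + b", "t")
# after["join"] == ["z = t"]
```

The speculative insertion on the right branch adds work to that path, which is precisely where edge profiling matters: the transformation pays off only if the profile shows the join is reached often enough through the left (redundant) path.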
First, we prove that both our speculative PRE algorithm and our region-based PDE algorithm are optimal for given edge profiling information: no other code motion can further reduce the total number of dynamic occurrences of redundant expressions or dead code. Moreover, our speculative PRE algorithm is lifetime optimal, meaning that the lifetimes of the newly introduced temporary variables are minimised.
Second, we show that both algorithms are practical and can be implemented efficiently in production compilers. For the SPEC CPU2000 benchmarks, the average compilation overhead of our PRE algorithm is 3%, and that of our PDE algorithm is less than 2%. Moreover, edge profiling, rather than expensive path profiling, suffices to guarantee the optimality of the algorithms.
Finally, we demonstrate through a thorough performance evaluation that the proposed profile-guided redundancy elimination techniques yield speedups on real machines. To the best of our knowledge, this is the first performance evaluation of profile-guided redundancy elimination techniques on real machines.