Search CORE

16,306 research outputs found

Global predicate analysis and its application to register allocation

Author: California
David M Gillies
Dz-Thing Roy
Ju Hewlett-Packard
Publication venue: IEEE Computer Society
Publication date: 01/01/1996
Field of study

Abstract To fully utilize the wide machine resources in modern high-performance microprocessors it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction level parallelism by allowing instructions from dtgerent basic blocks to be converted to straight-line code guarded by boolean predicates. However; predicated execution also presents signijcant challenges to an optimizing compiler For example, in live range analysis, a predicated definition does not necessarily end the live range of a virtual register This paper describes techniques to analyze the relations among predicates in order to improve the precision and effectiveness of various compiler analysis and transformation phases in the presence of predicated code. Our predicate analysis operates globally to obtain relations among predicates. Moreover we analyze control flow and predication in a single unifiedframework. The result can be queried by subsequent optimization and analysis phases. Based on this framework, we extend a traditional method to a predicate-aware register allocator which takes global predicate relations into account. We have implemented the proposed algorithms to effectively reduce register pressure. Our experimental results show 24.6% of a large test suite obtain, on average, 20.71% better register allocation due to the algorithms presented in this paper

CiteSeerX

IR-Level Versus Machine-Level If-Conversion for Predicated Architectures

Author: Alexander Jordan
Andreas Krall
Nikolai Kim
Publication venue
Publication date: 01/01/2013
Field of study

If-conversion is a simple yet powerful optimization that converts control dependences into data dependences. It allows elimination of branches and increases available instruction level parallelism and thus overall performance. If-conversion can either be applied alone or in combination with other techniques that increase the size of scheduling regions. The presence of hardware support for predicated execution allows if-conversion to be broadly applied in a given program. This makes it necessary to guide the optimization using heuristic estimates regarding its potential benefit. Similar to other transformations in an optimizing compiler, if-conversion inherently su↵ers from phase ordering issues. Driven by these facts, we developed two algorithms for if-conversion targeting the TI TMS320C64x+ architecture within the LLVM framework. Each implementation targets a di↵erent level of code abstraction. While one targets the intermediate representation, the other addresses machine-level code. Both make use of an adapted set of estimation heuristics and prove to be successful in general, but each one exhibits di↵erent strengths and weaknesses. High-level if-conversion, applied before other control flow transformations, has more freedom to operate. But in contrast to its machine-level counterpart, which is more restricted, its estimations of runtime are less accurate. Our results from experimental evaluation show a mean speedup close to 14 % for both algorithms on a set of programs from the MiBench and DSPstone benchmark suites. We give a comparison of the implemented optimizations and discuss gained insights on the topics of ifconversion, phase ordering issues and profitability analysis

CiteSeerX

Crossref

Code scheduling for multiple instruction stream architectures

Author: C. Stephens
Gary Tyson
J. E. Smith
Matthew Farrens
R. L. Sites
T. Austin
W. Wulf
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Worst-Case Execution Time Analysis of Predicated Architectures

Author: Brandner Florian
Naji Amine
Publication venue: OASIcs - OpenAccess Series in Informatics. 17th International Workshop on Worst-Case Execution Time Analysis (WCET 2017)
Publication date: 01/01/2017
Field of study

The time-predictable design of computer architectures for the use in (hard) real-time systems is becoming more and more important, due to the increasing complexity of modern computer architectures. The design of predictable processor pipelines recently received considerable attention. The goal here is to find a trade-off between predictability and computing power. Branches and jumps are particularly problematic for high-performance processors. For one, branches are executed late in the pipeline. This either leads to high branch penalties (flushing) or complex software/hardware techniques (branch predictors). Another side-effect of branches is that they make it difficult to exploit instruction-level parallelism due to control dependencies. Predicated computer architectures allow to attach a predicate to the instructions in a program. An instruction is then only executed when the predicate evaluates to true and otherwise behaves like a simple nop instruction. Predicates can thus be used to convert control dependencies into data dependencies, which helps to address both of the aforementioned problems. A downside of predicated instructions is the precise worst-case execution time (WCET) analysis of programs making use of them. Predicated memory accesses, for instance, may or may not have an impact on the processor\u27s cache and thus need to be considered by the cache analysis. Predication potentially has an impact on all analysis phases of a WCET analysis tool. We thus explore a preprocessing step that explicitly unfolds the control-flow graph, which allows us to apply standard analyses that are themselves not aware of predication

Dagstuhl Research Online Publication Server

Swing modulo scheduling: a lifetime-sensitive approach

Author: Ayguadé Parra Eduard
González Colás Antonio María
Llosa Espuny José Francisco
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996
Field of study

This paper presents a novel software pipelining approach, which is called Swing Modulo Scheduling (SMS). It generates schedules that are near optimal in terms of initiation interval, register requirements and stage count. Swing Modulo Scheduling is an heuristic approach that has a low computational cost. The paper describes the technique and evaluates it for the Perfect Club benchmark suite. SMS is compared with other heuristic methods showing that it outperforms them in terms of the quality of the obtained schedules and compilation time. SMS is also compared with an integer linear programming approach that generates optimum schedules but with a huge computational cost, which makes it feasible only for very small loops. For a set of small loops, SMS obtained the optimum initiation interval in all the cases and its schedules required only 5% more registers and a 1% higher stage count than the optimumPeer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Future value based single assignment program representations and optimizations

Author: Ding Shuhan
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2012
Field of study

An optimizing compiler internal representation fundamentally affects the clarity, efficiency and feasibility of optimization algorithms employed by the compiler. Static Single Assignment (SSA) as a state-of-the-art program representation has great advantages though still can be improved. This dissertation explores the domain of single assignment beyond SSA, and presents two novel program representations: Future Gated Single Assignment (FGSA) and Recursive Future Predicated Form (RFPF). Both FGSA and RFPF embed control flow and data flow information, enabling efficient traversal program information and thus leading to better and simpler optimizations. We introduce future value concept, the designing base of both FGSA and RFPF, which permits a consumer instruction to be encountered before the producer of its source operand(s) in a control flow setting. We show that FGSA is efficiently computable by using a series T1/T2/TR transformation, yielding an expected linear time algorithm for combining together the construction of the pruned single assignment form and live analysis for both reducible and irreducible graphs. As a result, the approach results in an average reduction of 7.7%, with a maximum of 67% in the number of gating functions compared to the pruned SSA form on the SPEC2000 benchmark suite. We present a solid and near optimal framework to perform inverse transformation from single assignment programs. We demonstrate the importance of unrestricted code motion and present RFPF. We develop algorithms which enable instruction movement in acyclic, as well as cyclic regions, and show the ease to perform optimizations such as Partial Redundancy Elimination on RFPF

Michigan Technological University