10 research outputs found
Parallel Copy Elimination on Data Dependence Graphs
Register allocation regained much interest in recent years due to the development of decoupled strategies that split the problem into separate phases: spilling, register assignment, and copy elimination. Traditional approaches to copy elimination during register allocation are based on interference graphs and register coalescing. Variables are represented as nodes in a graph, which are coalesced, if they can be assigned the same register. However, decoupled approaches strive to avoid interference graphs and thus often resort to local recoloring. A common assumption of existing coalescing and recoloring approaches is that the original ordering of the instructions in the program is not changed. This work presents an extension of a local recoloring technique called Parallel Copy Motion. We perform code motion on data dependence graphs in order to eliminate useless copies and reorder instructions, while at the same time a valid register assignment is preserved. Our results show that even after traditional register allocation with coalescing our technique is able to eliminate an additional 3% (up to 9%) of the remaining copies and reduce the weighted costs of register copies by up to 25% for the SPECINT 2000 benchmarks. In comparison to Parallel Copy Motion, our technique removes 11% (up to 20%) more copies and up to 39% more of the copy costs
Parallel Copy Motion
International audienceRecent results on the static single assignment (SSA) form open promising directions for the design of new register allocation heuristics for just-in-time (JIT) compilation. In particular, heuris- tics based on tree scans with two decoupled phases, one for spilling, one for splitting/coloring/coalescing, seem good candidates for de- signing memory-friendly, fast, and competitive register allocators. Another class of register allocators, well-suited for JIT compilation, are those based on linear scans. Most of them perform coalesc- ing poorly but also do live-range splitting (mostly on control-flow edges) to avoid spilling. This leads to a large amount of register-to- register copies inside basic blocks but also, implicitly, on critical edges, i.e., edges that flow from a block with several successors to a block with several predecessors. This paper presents a new back-end optimization that we call parallel copy motion. The technique is to move copy instructions in a register-allocated code from a program point, possibly an edge, to another. In contrast with a classical scheduler that must preserve data dependences, our copy motion also permutes register assign- ments so that a copy can "traverse" all instructions of a basic block, except those with conflicting register constraints. Thus, parallel copies can be placed either where the scheduling has some empty slots (for multiple-issues architectures), or where fewer copies are necessary because some variables are dead at this point. Moreover, to the cost of some code compensations (namely, the reverse of the copy), a copy can also be moved out from a critical edge. This pro- vides a simple solution to avoid critical-edge splitting, especially useful when the compiler cannot split it, as it is the case for abnor- mal edges. This compensation technique also enables the schedul- ing/motion of the copy in the successor or predecessor basic block. Experiments with the SPECint benchmarks suite and our own benchmark suite show that we can now apply broadly an SSA-based register allocator: all procedures, even with abnormal edges, can be treated. Simple strategies for moving copies from edges and locally inside basic block show significant average improvements (4% for SPECint and 3% for our suite), with no degradation. It let us believe that the approach is promising, and not only for improving coalescing in fast register allocators
Allocation de registres découplée (basée sur la formulation SSA) : De la théorie à la pratique, faire face aux contraintes liées à la compilation juste à temps et aux processeurs embarqués
My thesis deals with register allocation. During this phase, the compiler has to assign variables of the source program, in an arbitrary big number, to actual registers of the processor, in a limited number k. Recent works, for instance the thesis of F. Bouchez and S. Hack, have shown that it is possible to split in two different decoupled step this phase: the spill - store the variables into memory to release registers - followed by the registers assignment. These works demonstrate the feasibility of this decoupling relying on a theoretic framework and some assumptions. In particular, it is sufficient to ensure after the spill step that the number of variables simultaneously live is below k.My thesis follows these works by showing how to apply this kind of approach when real-world constraints come in play: instructions encoding, ABI (application binary interface), register aliasing. Different approaches are proposed. They allow either to ignore these problems or to directly tackle them into the theoretic framework. The hypothesis of the models and the proposed solutions are evaluated and validated using a thorough experimental study with the compiler of STMicroelectronics. Finally, all these works have been done with the constraints of modern compilers in mind, the JIT (just-in-time) compilation, where the compilation time et the memory footprint of the compiler are key factors. We strive to offer solutions that cope with these criteria or improve the result until a given budget is reached. We, in particular, used the SSA (static single assignment) form to define algorithm like tree scan that generalizes linear scan based approaches proposed for JIT compilation.Ma thèse porte sur l’allocation de registres. Durant cette étape, le compilateur doit assigner les variables du code source, en nombre arbitrairement grand, aux registres physiques du processeur, en nombre limité k. Des travaux récents, notamment ceux des thèses de F. Bouchez et S. Hack, ont montré qu’il était possible de séparer de manière complètement découplée cette étape en deux phases : le vidage (spill) – stockage de variables en mémoire pour libérer des registres – suivi de l’assignation aux registres proprement dite. Ces travaux démontraient la faisabilité de ce découpage en s’appuyant sur un cadre théorique et certaines hypothèses simplificatrices. En particulier, il est suffisant de s’assurer qu’après le spill, le nombre de variables simultanément en vie est inférieur à k.Ma thèse fait suite à ces travaux en montrant comment appliquer ce type d’approche dans un cadre réaliste, en prenant en compte les contraintes liées à l’encodage des instructions, à l’ABI (application binary interface), aux bancs de registres avec aliasing. Différentes approches sont proposées qui permettent soit de s’affranchir des problèmes précités, soit de les prendre en compte directement dans le modèle théorique. Les hypothèses des modèles et les solutions proposées sont évaluées et validées par une étude expérimentale poussée dans le compilateur de STMicroelectronics. Enfin, l’ensemble de ces travaux a été réalisé avec, en ligne de mire, les contraintes de la compilation moderne, la compilation JIT (just-in-time), où rapidité et consommation mémoire du compilateur sont des facteurs déterminants. Nous nous sommes efforcés d’offrir des solutions satisfaisant ces critères ou améliorant les résultats attendus tant qu’un certain budget n’a pas été dépassé, exploitant en particulier la forme SSA (static single assignment) pour définir des algorithmes de type tree scan qui généralisent les approches de type linear scan, proposées pour le JIT
Copy Elimination on Data Dependence Graphs
International audienceRegister allocation recently regained much interest due to new decoupled strategies that split the problem into separate phases: spilling, register assignment, and copy elimination. A common assumption of existing copy elimination approaches is that the original ordering of the instructions in the program is not changed. This work presents an extension of a local recoloring technique called Parallel Copy Motion. We perform code motion on data dependence graphs in order to eliminate useless copies and reorder instructions, while at the same time a valid register assignment is preserved. Our results show that even after traditional register allocation with coalescing our technique is able to eliminate an additional 3% (up to 9%) of the remaining copies and reduce the weighted costs of register copies by up to 25% for the SPECINT 2000 benchmarks. In comparison to Parallel Copy Motion, our technique removes 11% (up to 20%) more copies and up to 39% more of the copy costs
Allocation de registres découplée (basée sur la formulation SSA) (De la théorie à la pratique, faire face aux contraintes liées à la compilation juste à temps et aux processeurs embarqués)
Ma thèse porte sur l allocation de registres. Durant cette étape, le compilateur doit assigner les variables du code source, en nombre arbitrairement grand, aux registres physiques du processeur, en nombre limité k. Des travaux récents, notamment ceux des thèses de F. Bouchez et S. Hack, ont montré qu il était possible de séparer de manière complètement découplée cette étape en deux phases : le vidage (spill) stockage de variables en mémoire pour libérer des registres suivi de l assignation aux registres proprement dite. Ces travaux démontraient la faisabilité de ce découpage en s appuyant sur un cadre théorique et certaines hypothèses simplificatrices. En particulier, il est suffisant de s assurer qu après le spill, le nombre de variables simultanément en vie est inférieur à k.Ma thèse fait suite à ces travaux en montrant comment appliquer ce type d approche dans un cadre réaliste, en prenant en compte les contraintes liées à l encodage des instructions, à l ABI (application binary interface), aux bancs de registres avec aliasing. Différentes approches sont proposées qui permettent soit de s affranchir des problèmes précités, soit de les prendre en compte directement dans le modèle théorique. Les hypothèses des modèles et les solutions proposées sont évaluées et validées par une étude expérimentale poussée dans le compilateur de STMicroelectronics. Enfin, l ensemble de ces travaux a été réalisé avec, en ligne de mire, les contraintes de la compilation moderne, la compilation JIT (just-in-time), où rapidité et consommation mémoire du compilateur sont des facteurs déterminants. Nous nous sommes efforcés d offrir des solutions satisfaisant ces critères ou améliorant les résultats attendus tant qu un certain budget n a pas été dépassé, exploitant en particulier la forme SSA (static single assignment) pour définir des algorithmes de type tree scan qui généralisent les approches de type linear scan, proposées pour le JIT.My thesis deals with register allocation. During this phase, the compiler has to assign variables of the source program, in an arbitrary big number, to actual registers of the processor, in a limited number k. Recent works, for instance the thesis of F. Bouchez and S. Hack, have shown that it is possible to split in two different decoupled step this phase: the spill - store the variables into memory to release registers - followed by the registers assignment. These works demonstrate the feasibility of this decoupling relying on a theoretic framework and some assumptions. In particular, it is sufficient to ensure after the spill step that the number of variables simultaneously live is below k.My thesis follows these works by showing how to apply this kind of approach when real-world constraints come in play: instructions encoding, ABI (application binary interface), register aliasing. Different approaches are proposed. They allow either to ignore these problems or to directly tackle them into the theoretic framework. The hypothesis of the models and the proposed solutions are evaluated and validated using a thorough experimental study with the compiler of STMicroelectronics. Finally, all these works have been done with the constraints of modern compilers in mind, the JIT (just-in-time) compilation, where the compilation time et the memory footprint of the compiler are key factors. We strive to offer solutions that cope with these criteria or improve the result until a given budget is reached. We, in particular, used the SSA (static single assignment) form to define algorithm like tree scan that generalizes linear scan based approaches proposed for JIT compilation.LYON-ENS Sciences (693872304) / SudocSudocFranceF
Elimination of parallel copies using code motion on data dependence graphs
International audienceRegister allocation regained much interest in recent years due to the development of decoupled strategies that split the problem into separate phases: spilling, register assignment, and copy elimination. Traditional approaches to copy elimination during register allocation are based on interference graphs and register coalescing. Variables are represented as nodes in a graph, which are coalesced, if they can be assigned the same register. However, decoupled approaches strive to avoid interference graphs and thus often resort to local recoloring. A common assumption of existing coalescing and recoloring approaches is that the original ordering of the instructions in the program is not changed. This work presents an extension of a local recoloring technique called Parallel Copy Motion. We perform code motion on data dependence graphs in order to eliminate useless copies and reorder instructions, while at the same time a valid register assignment is preserved. Our results show that even after traditional register allocation with coalescing our technique is able to eliminate an additional 3% (up to 9%) of the remaining copies and reduce the weighted costs of register copies by up to 25% for the SPECINT 2000 benchmarks. In comparison to Parallel Copy Motion, our technique removes 11% (up to 20%) more copies and up to 39% more of the copy costs
Studying Optimal Spilling in the Light of SSA
International audienceRecent developments in register allocation, mostly linked tostatic single assignment (SSA) form, have shown that it ispossible to decouple the problem in two successive phases: afirst spilling phase places load and store instructions so thatthe register pressure at all program points is small enough, asecond assignment and coalescing phase maps the remainingvariables to physical registers and reduces the number ofmove instructions among registers. This paper focuses onthe first phase, for which many open questions remain: inparticular, we study the notion of optimal spilling (what canbe expressed?) and the impact of SSA form (does it help?).To identify the important features for optimal spilling onload-store architectures, we develop a new integer linear pro-gramming formulation, more accurate and expressive thanprevious approaches. Among other features, we can expressSSA -functions, memory-to-memory copies, and the factthat a value can be stored simultaneously in a register andin memory. Based on this formulation, we present a thor-ough analysis of the results obtained for the SPECINT 2000and EEMBC 1.1 benchmarks, from which we draw, amongothers, the following conclusions: a) rematerialization is ex-tremely important, b) SSA complicates the formulation ofoptimal spilling, especially because of memory coalescingwhen the code is not in CSSA, c) micro-architectural fea-tures are significant and thus have to be accounted for, d)significant savings can be obtained in terms of static spillcosts, cache miss rates, and dynamic instruction counts
Parallel Copy Motion
International audienceRecent results on the static single assignment (SSA) form open promising directions for the design of new register allocation heuristics for just-in-time (JIT) compilation. In particular, heuris- tics based on tree scans with two decoupled phases, one for spilling, one for splitting/coloring/coalescing, seem good candidates for de- signing memory-friendly, fast, and competitive register allocators. Another class of register allocators, well-suited for JIT compilation, are those based on linear scans. Most of them perform coalesc- ing poorly but also do live-range splitting (mostly on control-flow edges) to avoid spilling. This leads to a large amount of register-to- register copies inside basic blocks but also, implicitly, on critical edges, i.e., edges that flow from a block with several successors to a block with several predecessors. This paper presents a new back-end optimization that we call parallel copy motion. The technique is to move copy instructions in a register-allocated code from a program point, possibly an edge, to another. In contrast with a classical scheduler that must preserve data dependences, our copy motion also permutes register assign- ments so that a copy can "traverse" all instructions of a basic block, except those with conflicting register constraints. Thus, parallel copies can be placed either where the scheduling has some empty slots (for multiple-issues architectures), or where fewer copies are necessary because some variables are dead at this point. Moreover, to the cost of some code compensations (namely, the reverse of the copy), a copy can also be moved out from a critical edge. This pro- vides a simple solution to avoid critical-edge splitting, especially useful when the compiler cannot split it, as it is the case for abnor- mal edges. This compensation technique also enables the schedul- ing/motion of the copy in the successor or predecessor basic block. Experiments with the SPECint benchmarks suite and our own benchmark suite show that we can now apply broadly an SSA-based register allocator: all procedures, even with abnormal edges, can be treated. Simple strategies for moving copies from edges and locally inside basic block show significant average improvements (4% for SPECint and 3% for our suite), with no degradation. It let us believe that the approach is promising, and not only for improving coalescing in fast register allocators