471 research outputs found
Survey on Combinatorial Register Allocation and Instruction Scheduling
Register allocation (mapping variables to processor registers or memory) and
instruction scheduling (reordering instructions to increase instruction-level
parallelism) are essential tasks for generating efficient assembly code in a
compiler. In the last three decades, combinatorial optimization has emerged as
an alternative to traditional, heuristic algorithms for these two tasks.
Combinatorial optimization approaches can deliver optimal solutions according
to a model, can precisely capture trade-offs between conflicting decisions, and
are more flexible at the expense of increased compilation time.
This paper provides an exhaustive literature review and a classification of
combinatorial optimization approaches to register allocation and instruction
scheduling, with a focus on the techniques that are most applied in this
context: integer programming, constraint programming, partitioned Boolean
quadratic programming, and enumeration. Researchers in compilers and
combinatorial optimization can benefit from identifying developments, trends,
and challenges in the area; compiler practitioners may discern opportunities
and grasp the potential benefit of applying combinatorial optimization
Split Latency Allocator: Process Variation-Aware Register Access Latency Boost in a Near-Threshold Graphics Processing Unit
Over the last decade, Graphics Processing Units (GPUs) have been used extensively in gaming consoles, mobile phones, workstations and data centers, as they have exhibited immense performance improvement over CPUs, in graphics intensive applications. Due to their highly parallel architecture, general purpose GPUs (GPGPUs) have gained the foreground in applications where large data blocks can be processed in parallel. However, the performance improvement is constrained by a large power consumption. Likewise, Near Threshold Computing (NTC) has emerged as an energy-efficient design paradigm. Hence, operating GPUs at NTC seems like a plausible solution to counteract the high energy consumption. This work investigates the challenges associated with NTC operation of GPUs and proposes a low-power GPU design, Split Latency Allocator, to sustain the performance of GPGPU applications
Trace-based Register Allocation in a JIT Compiler
State-of-the-art dynamic compilers often use global approaches, like Linear Scan or Graph Coloring, for register allocation. These algorithms consider the complete compilation unit for allocation, which increases the complexity of the implementation (e.g., support for lifetime holes in Linear Scan) and potentially also affects compilation time. We propose a novel non-global algorithm, which splits a compilation unit into traces based on profiling feedback and subsequently performs register allocation within each trace individually. Traces reduce the problem size to a single linear code segment, which simplifies the problem a register allocator needs to solve. Additionally, we can apply different register allocation algorithms to each trace. We show that this non-global approach can achieve results competitive to global register allocation.
We present an implementation of Trace Register Allocation based on the Graal VM and show an evaluation for common Java benchmarks. We demonstrate that performance of this non-global approach is within 3% (on AMD64) and 1% (on SPARC) of global Linear Scan register allocation.(VLID)247065
Register Loading via Linear Programming
We study the following optimization problem. The input is a number k and a directed graph with a specified âstart â vertex, each of whose vertices may have one âmemory bank requirementâ, an integer. There are k âregistersâ, labeled 1...k. A valid solution associates to the vertices with no bank requirement one or more âload instructions â L[b,j], for bank b and register j, such that every directed trail from the start vertex to some vertex with bank requirement c contains a vertex u that has been associated L[c,i] (for some register i †k) and no vertex following u in the trail has been associated an L[b,i], for any other bank b. The objective is to minimize the total number of associated load instructions. We give a k(k +1)-approximation algorithm based on linear programming rounding, with (k+1) being the best possible unless Vertex Cover has approximation 2âÇ« for Ç«> 0. We also present a O(klogn) approximation, with n being the number of vertices in the input directed graph. Based on the same linear program, another rounding method outputs a valid solution with objective at most 2k times the optimum for k registers, using 2kâ1 registers. This version of the paper corrects some minor errors that made it in the final Algorithmica paper.
Charting the Reform Path
A Review of Inequality and the Labor Market: The Case for Greater Competition. Edited by Sharon Block and Benjamin H. Harris
COMPARISON OF INSTRUCTION SCHEDULING AND REGISTER ALLOCATION FOR MIPS AND HPL-PD ARCHITECTURE FOR EXPLOITATION OF INSTRUCTION LEVEL PARALLELISM
The integrated approaches for instruction scheduling and register allocation have been promising area of research for
code generation and compiler optimization. In this paper we have proposed an integrated algorithm for instruction
scheduling and register allocation and implemented it for compiler optimization in machine description in trimaran
infrastructure for exploitation of Instruction level parallelism. Our implementation in trimaran infrastructure shows
that our scheduler reduces the number of active live ranges dealt with linear scan allocator. As a result only few spills
were needed and the quality of the code generated was improved. For our experiments we used 20 benchmarks
available with trimaran infrastructure for HPL-PD architecture. We compare some of these results with results
obtained by Haijing Tang et al (2013) performed by LLVM compiler on MIPS architecture. For our experimental work
we added machine description (MDES) targeted to HL-PD architecture. The implemented algorithm is based on
subgraph isomorphism. The input program is represented in the form of directed acyclic graph (DAG). The vertices of
the DAG represent the instructions, input and output operands of the program, while the edges represent dependencies
among the instructions
Network-on-Chip
Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoCâits research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems
- âŠ