471 research outputs found

    Survey on Combinatorial Register Allocation and Instruction Scheduling

    Full text link
    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization

    Split Latency Allocator: Process Variation-Aware Register Access Latency Boost in a Near-Threshold Graphics Processing Unit

    Get PDF
    Over the last decade, Graphics Processing Units (GPUs) have been used extensively in gaming consoles, mobile phones, workstations and data centers, as they have exhibited immense performance improvement over CPUs, in graphics intensive applications. Due to their highly parallel architecture, general purpose GPUs (GPGPUs) have gained the foreground in applications where large data blocks can be processed in parallel. However, the performance improvement is constrained by a large power consumption. Likewise, Near Threshold Computing (NTC) has emerged as an energy-efficient design paradigm. Hence, operating GPUs at NTC seems like a plausible solution to counteract the high energy consumption. This work investigates the challenges associated with NTC operation of GPUs and proposes a low-power GPU design, Split Latency Allocator, to sustain the performance of GPGPU applications

    Trace-based Register Allocation in a JIT Compiler

    Get PDF
    State-of-the-art dynamic compilers often use global approaches, like Linear Scan or Graph Coloring, for register allocation. These algorithms consider the complete compilation unit for allocation, which increases the complexity of the implementation (e.g., support for lifetime holes in Linear Scan) and potentially also affects compilation time. We propose a novel non-global algorithm, which splits a compilation unit into traces based on profiling feedback and subsequently performs register allocation within each trace individually. Traces reduce the problem size to a single linear code segment, which simplifies the problem a register allocator needs to solve. Additionally, we can apply different register allocation algorithms to each trace. We show that this non-global approach can achieve results competitive to global register allocation. We present an implementation of Trace Register Allocation based on the Graal VM and show an evaluation for common Java benchmarks. We demonstrate that performance of this non-global approach is within 3% (on AMD64) and 1% (on SPARC) of global Linear Scan register allocation.(VLID)247065

    Register Loading via Linear Programming

    Get PDF
    We study the following optimization problem. The input is a number k and a directed graph with a specified “start ” vertex, each of whose vertices may have one “memory bank requirement”, an integer. There are k “registers”, labeled 1...k. A valid solution associates to the vertices with no bank requirement one or more “load instructions ” L[b,j], for bank b and register j, such that every directed trail from the start vertex to some vertex with bank requirement c contains a vertex u that has been associated L[c,i] (for some register i ≀ k) and no vertex following u in the trail has been associated an L[b,i], for any other bank b. The objective is to minimize the total number of associated load instructions. We give a k(k +1)-approximation algorithm based on linear programming rounding, with (k+1) being the best possible unless Vertex Cover has approximation 2−ǫ for Ç«> 0. We also present a O(klogn) approximation, with n being the number of vertices in the input directed graph. Based on the same linear program, another rounding method outputs a valid solution with objective at most 2k times the optimum for k registers, using 2k−1 registers. This version of the paper corrects some minor errors that made it in the final Algorithmica paper.

    Charting the Reform Path

    Get PDF
    A Review of Inequality and the Labor Market: The Case for Greater Competition. Edited by Sharon Block and Benjamin H. Harris

    COMPARISON OF INSTRUCTION SCHEDULING AND REGISTER ALLOCATION FOR MIPS AND HPL-PD ARCHITECTURE FOR EXPLOITATION OF INSTRUCTION LEVEL PARALLELISM

    Get PDF
    The integrated approaches for instruction scheduling and register allocation have been promising area of research for code generation and compiler optimization. In this paper we have proposed an integrated algorithm for instruction scheduling and register allocation and implemented it for compiler optimization in machine description in trimaran infrastructure for exploitation of Instruction level parallelism. Our implementation in trimaran infrastructure shows that our scheduler reduces the number of active live ranges dealt with linear scan allocator. As a result only few spills were needed and the quality of the code generated was improved. For our experiments we used 20 benchmarks available with trimaran infrastructure for HPL-PD architecture. We compare some of these results with results obtained by Haijing Tang et al (2013) performed by LLVM compiler on MIPS architecture. For our experimental work we added machine description (MDES) targeted to HL-PD architecture. The implemented algorithm is based on subgraph isomorphism. The input program is represented in the form of directed acyclic graph (DAG). The vertices of the DAG represent the instructions, input and output operands of the program, while the edges represent dependencies among the instructions

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems
    • 

    corecore