Search CORE

7,972 research outputs found

Survey on Combinatorial Register Allocation and Instruction Scheduling

Author: Lozano Roberto Castañeda
Schulte Christian
Publication venue
Publication date: 01/01/2018
Field of study

Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

Author: Bell Steven Emberton
Cao Kaidi
Gao Mingyu
Ha Heonjae
Horowitz Mark
Kozyrakis Christos
Liu Qiaoyi
Nayak Ankita
Pu Jing
Raina Priyanka
Setter Jeff Ou
Yang Xuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/04/2020
Field of study

We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior accelerators. As long as proper loop blocking schemes are used, and the hardware can support mapping replicated loops, many different hardware dataflows yield similar energy efficiency with good performance. This is because the loop blocking can ensure that most data references stay on-chip with good locality and the processing units have high resource utilization. How resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.Comment: Published as a conference paper at ASPLOS 202

arXiv.org e-Print Archive

Crossref

Feasibility of Optimizations Requiring Bounded Treewidth in a Data Flow Centric Intermediate Representation

Author: Meyer Jan Christian
Sjømæling Nordgaard Sigve
Publication venue: NIKT Foundation
Publication date: 01/01/2020
Field of study

Data flow analyses are instrumental to effective compiler optimizations, and are typically implemented by extracting implicit data flow information from traversals of a control flow graph intermediate representation. The Regionalized Value State Dependence Graph is an alternative intermediate representation, which represents a program in terms of its data flow dependencies, leaving control flow implicit. Several analyses that enable compiler optimizations reduce to NP-Complete graph problems in general, but admit linear time solutions if the graph’s treewidth is limited. In this paper, we investigate the treewidth of application benchmarks and synthetic programs, in order to identify program features which cause the treewidth of its data flow graph to increase, and assess how they may appear in practical software. We find that increasing numbers of live variables cause unbounded growth in data flow graph treewidth, but this can ordinarily be remedied by modular program design, and monolithic programs that exceed a given bound can be efficiently detected using an approximate treewidth heuristic

BIBSYS: Open Journals Systems

NORA - Norwegian Open Research Archives

Hardware-software codesign in a high-level synthesis environment

Author: Visegrady Tamas L
Publication venue: University of New Hampshire Scholars\u27 Repository
Publication date: 01/01/1999
Field of study

Interfacing hardware-oriented high-level synthesis to software development is a computationally hard problem for which no general solution exists. Under special conditions, the hardware-software codesign (system-level synthesis) problem may be analyzed with traditional tools and efficient heuristics. This dissertation introduces a new alternative to the currently used heuristic methods. The new approach combines the results of top-down hardware development with existing basic hardware units (bottom-up libraries) and compiler generation tools. The optimization goal is to maximize operating frequency or minimize cost with reasonable tradeoffs in other properties. The dissertation research provides a unified approach to hardware-software codesign. The improvements over previously existing design methodologies are presented in the frame-work of an academic CAD environment (PIPE). This CAD environment implements a sufficient subset of functions of commercial microelectronics CAD packages. The results may be generalized for other general-purpose algorithms or environments. Reference benchmarks are used to validate the new approach. Most of the well-known benchmarks are based on discrete-time numerical simulations, digital filtering applications, and cryptography (an emerging field in benchmarking). As there is a need for high-performance applications, an additional requirement for this dissertation is to investigate pipelined hardware-software systems\u27 performance and design methods. The results demonstrate that the quality of existing heuristics does not change in the enhanced, hardware-software environment

UNH Scholars' Repository

An Efficient Algorithm for Computing Network Reliability in Small Treewidth

Author: Goharshady Amir Kafshdar
Mohammadi Fatemeh
Publication venue
Publication date: 19/03/2019
Field of study

We consider the classic problem of Network Reliability. A network is given together with a source vertex, one or more target vertices, and probabilities assigned to each of the edges. Each edge appears in the network with its associated probability and the problem is to determine the probability of having at least one source-to-target path. This problem is known to be NP-hard. We present a linear-time fixed-parameter algorithm based on a parameter called treewidth, which is a measure of tree-likeness of graphs. Network Reliability was already known to be solvable in polynomial time for bounded treewidth, but there were no concrete algorithms and the known methods used complicated structures and were not easy to implement. We provide a significantly simpler and more intuitive algorithm that is much easier to implement. We also report on an implementation of our algorithm and establish the applicability of our approach by providing experimental results on the graphs of subway and transit systems of several major cities, such as London and Tokyo. To the best of our knowledge, this is the first exact algorithm for Network Reliability that can scale to handle real-world instances of the problem.Comment: 14 page

arXiv.org e-Print Archive

Ghent University Academic Bibliography

IST Austria: PubRep (Institute of Science and Technology)

Hal-Diderot

Comparability graph coloring for optimizing utilization of stream register files in stream processors

Author: Golumbic Martin Charles
Jingling Xue
Li Wang
Owens John D.
Xuejun Yang
Ying Zhang
Yu Deng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Dataflow computers: a tutorial and survey

Author: Davis A.L.
Drongowski P. J.
Publication venue: University of Utah
Publication date: 01/01/1980
Field of study

Journal ArticleThe demand for very high performance computer has encouraged some researchers in the computer science field to consider alternatives to the conventional notions of program and computer organization. The dataflow computer is one attempt to form a new collection of consistent systems ideas to improve both computer performance and to alleviate the software design problems induced by the construction of highly concurrent programs

The University of Utah: J. Willard Marriott Digital Library