47 research outputs found

Survey on Combinatorial Register Allocation and Instruction Scheduling

Author: Lozano Roberto Castañeda
Schulte Christian
Publication venue
Publication date: 01/01/2018
Field of study

Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Custom Integrated Circuits

Author: Allen Jonathan
Aluru Narayana R.
Bergendahl Jason R.
Beskok Ali
Chandrakasan Anantha P.
Chou Michael T.
Decker Steven J.
Devadas Srinivas
Dynes Scott B. C.
Ehrlich Michael S.
Elfadel Ibrahim M.
Engels Daniel W.
Fallah Farzan
Frumkin Stanislav E.
Gealow Jeffrey C.
Hadjiyiannis George I.
Hanono Silvina Z.
Horn Berthold K. P.
Kamon Mattan
Korsmeyer F. Thomas
Lee Chang Ho
Lee Hae-Seung
Li Jing-Rebecca
Martin David A.
Masaki Ichiro
Massoud Yehia M.
Nastov Ogden J.
Newman J. Nicholas
Orlando Terry P.
Phillips Joel R.
Schmidt Martin A.
Senturia Stephen D.
Sodini Charles G.
Tausch Johannes
Terman Christopher J.
van der Zant Herre S. J.
Wang Chig-Chun
Wang Junfeng
White Jacob K.
Wyatt John L., Jr.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)
Publication date
Field of study

Contains table of contents for Part III, table of contents for Section 1 and reports on eleven research projects. IBM Corporation; MIT School of Engineering; National Science Foundation Grant MIP 94-23221; Defense Advanced Research Projects Agency/U.S. Army Intelligence Center Contract DABT63-94-C-0053; Mitsubishi Corporation; National Science Foundation Young Investigator Award Fellowship MIP 92-58376; Joint Industry Program on Offshore Structure Analysis; Analog Devices; Defense Advanced Research Projects Agency; Cadence Design Systems; MAFET Consortium; Consortium for Superconducting Electronics; National Defense Science and Engineering Graduate Fellowship; Digital Equipment Corporation; MIT Lincoln Laboratory; Semiconductor Research Corporation; Multiuniversity Research Initiative; National Science Foundatio

DSpace@MIT

Survey on Instruction Selection: An Extensive and Modern Literature Review

Author: Blindell Gabriel S. Hjort
Publication venue
Publication date: 01/01/2013
Field of study

Instruction selection is one of three optimisation problems involved in the code generator backend of a compiler. The instruction selector is responsible for transforming an input program from its target-independent representation into a target-specific form by making best use of the available machine instructions. Hence instruction selection is a crucial part of efficient code generation. Despite on-going research since the late 1960s, the last comprehensive survey on the field was written more than 30 years ago. As new approaches and techniques have appeared since its publication, this brings forth a need for a new, up-to-date review of the current body of literature. This report addresses that need by performing an extensive review and categorisation of existing research. The report therefore supersedes and extends the previous surveys, and also attempts to identify where future research should be directed. Comment: Major changes: - Merged simulation chapter with macro expansion chapter - Addressed misunderstandings of several approaches - Completely rewrote many parts of the chapters; strengthened the discussion of many approaches - Revised the drawing of all trees and graphs to put the root at the top instead of at the bottom - Added appendix for listing the approaches in a table See doc for more inf

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Code Generation for an Application-Specific VLIW Processor With Clustered, Addressable Register Files

Author: Bernard Christian
Charles Henri-Pierre
Cohen Albert
Fabre Christian
Llopard Ivan
Martin Jérôme
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/02/2013
Field of study

Modern compilers integrate recent advances in compiler construction, intermediate representations, algorithms and programming language front-ends. Yet code generation for application-specific architectures benefits only marginally from this trend, as most of the effort is oriented towards popular general-purpose architectures. Historically, non-orthogonal architectures have relied on custom compiler technologies, some retargetable, but largely decoupled from the evolution of mainstream tool flows. Very Long Instruction Word (VLIW) architectures have introduced a variety of interesting problems such as clusterization, packetization or bundling, instruction scheduling for exposed pipelines, long delay slots, software pipelining, etc. These have been addressed in the literature, with a focus on the exploitation of Instruction Level Parallelism (ILP). While these are well known solutions already embedded into existing compilers, they rely on common hardware functionalities that are expected to be present in a fairly large subset of VLIW architectures. This paper presents our work on a back-end compiler for Mephisto, a high performance low-power application-specific processor, based on LLVM. Mephisto is specialized enough to challenge established code generation solutions for VLIW and DSP processors, calling for an innovative compilation flow. Conversely, even though Mephisto might be seen as a somewhat exotic processor, its hardware characteristics such as addressable register files benefit from existing analyses and transformations in LLVM. We describe our model of the Mephisto architecture, the difficulties we encountered, and the associated compilation methods, some of them new and specific to Mephisto

INRIA a CCSD electronic archive server

HAL-CEA

From design space exploration to code generation : a constraint satisfaction approach for the architectural synthesis of digital VLSI circuits

Author: Timmer A.H.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1996
Field of study

Pure OAI Repository

Path splitting--a technique for improving data flow analysis

Author: Poletto Massimiliano Antonio
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1995
Field of study

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 83-87). By Massimiliano Antonio Poletto. M.Eng

DSpace@MIT

Rechnergestützter Entwurf / Produktion (Mikroelektronik

Author: Leupers Rainer
Publication venue
Publication date
Field of study

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Data Dependence Analysis of Assembly Code

Author: Amme Wolfram
Braun Peter
Thomasset François
Zehendner Eberhard
Publication venue: HAL CCSD
Publication date: 01/01/1999
Field of study

Determination of data dependences is a task typically performed with high-level language source code in today's optimizing and parallelizing compilers. Very little work has been done in the field of data dependence analysis on assembly language code, but this area will be of growing importance, e.g. for increasing ILP. A central element of a data dependence analysis in this case is a method for memory reference disambiguation which decides whether two memory operations may/must access the same memory location. In this paper we describe a new approach for determination of data dependences in assembly code. Our method is based on a sophisticated algorithm for symbolic value propagation, and it can derive value-based dependences between memory operations instead of just address-based dependences. We have integrated our method into the SALTO system for assembly language optimization. Experimental results show that our approach greatly improves the accuracy of the dependence analysis in many cases

INRIA a CCSD electronic archive server

Loop transformations for clustered VLIW architectures

Author: Qian Yi
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2002
Field of study

With increasing demands for performance by embedded systems, especially by digital signal processing (DSP) applications, embedded processors must increase available instruction-level parallelism (ILP) within significant constraints on power consumption and chip cost. Unfortunately, supporting a large amount of ILP on a processor while maintaining a single register file increases chip cost and potentially decreases overall performance due to increased cycle time. To address this problem, some modern embedded processors partition the register file into multiple low-ported register files, each directly connected with one or more functional units. These functional unit/register file groups are called clusters. Clustered VLIW (very long instruction word) architectures need extra copy operations or delays to transfer values among clusters. To take advantage of clustered architectures, the compiler must expose parallelism for maximal functional-unit utilization, and schedule instructions to reduce intercluster communication overhead. High-level loop transformations offer an excellent opportunity to enhance the abilities of low-level optimizers to generate code for clustered architectures. This dissertation investigates the effects of three loop transformations, i.e., loop fusion, loop unrolling, and unroll-and-jam, on clustered VLIW architectures. The objective is to achieve high performance with low communication overhead. This dissertation discusses the following techniques. Loop Fusion: This research examines the impact of loop fusion on clustered architectures. A metric based upon communication costs for guiding loop fusion is developed and tested on DSP benchmarks. Unroll-and-jam and Loop Unrolling: A new method that integrates a communication cost model with an integer-optimization problem is developed to determine unroll amounts for loop unrolling and unroll-and-jam automatically for a specific loop on a specific architecture.
These techniques have been implemented and tested using DSP benchmarks on simulated, clustered VLIW architectures and a real clustered, embedded processor, the TI TMS320C64X. The results show that the new techniques achieve an average speedup of 1.72-1.89 on five different clustered architectures

Michigan Technological University

Complete and Practical Universal Instruction Selection

Author: Blindell G. H.
Boender J.
Buchwald S.
Eckstein E.
Floch A.
Gebotys C. H.
Johnson N.
Land A. H.
Lattner C.
Lee C.
Lozano R. C.
Nethercote N.
Tanaka H.
Wilson T.
Živojnović V.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref