10 research outputs found

Generalized instruction selection using SSA

Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008

Crossref

Automated application-specific instruction set generation

Author: XU CE
Publication date: 09/02/2006

Master's (Master of Engineering)

ScholarBank@NUS

Survey on Instruction Selection: An Extensive and Modern Literature Review

Author: Blindell Gabriel S. Hjort
Publication date: 01/01/2013

Instruction selection is one of three optimisation problems involved in the code generator backend of a compiler. The instruction selector is responsible for transforming an input program from its target-independent representation into a target-specific form by making best use of the available machine instructions. Hence instruction selection is a crucial part of efficient code generation. Despite ongoing research since the late 1960s, the last comprehensive survey of the field was written more than 30 years ago. As new approaches and techniques have appeared since its publication, there is a need for a new, up-to-date review of the current body of literature. This report addresses that need by performing an extensive review and categorisation of existing research. The report therefore supersedes and extends the previous surveys, and also attempts to identify where future research should be directed.

Comment: Major changes:
- Merged simulation chapter with macro expansion chapter
- Addressed misunderstandings of several approaches
- Completely rewrote many parts of the chapters; strengthened the discussion of many approaches
- Revised the drawing of all trees and graphs to put the root at the top instead of at the bottom
- Added appendix for listing the approaches in a table

See doc for more inf
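As a rough illustration of what the abstract means by mapping a target-independent representation onto machine instructions, the sketch below covers an expression tree greedily with the largest matching pattern (often called maximal munch). It is not taken from the survey: the `Node` type, the `PATTERNS` table, and the instruction names (`madd`, `add`, `mul`, `mov`) are hypothetical, and a real selector would also weigh instruction costs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """IR expression node: an operator plus its child subtrees."""
    op: str
    kids: Tuple["Node", ...] = ()


# Hypothetical instruction patterns; "_" is a wildcard meaning
# "any value already computed into a register".
ANY = Node("_")
PATTERNS = [
    ("madd", Node("ADD", (Node("MUL", (ANY, ANY)), ANY))),  # fused multiply-add
    ("add", Node("ADD", (ANY, ANY))),
    ("mul", Node("MUL", (ANY, ANY))),
    ("mov", Node("REG")),
]


def size(p: Node) -> int:
    """Number of non-wildcard nodes a pattern covers."""
    return 0 if p.op == "_" else 1 + sum(size(k) for k in p.kids)


def match(p: Node, t: Node) -> Optional[List[Node]]:
    """Return the subtrees bound to wildcards, or None if the pattern does not fit."""
    if p.op == "_":
        return [t]
    if p.op != t.op or len(p.kids) != len(t.kids):
        return None
    bound: List[Node] = []
    for pk, tk in zip(p.kids, t.kids):
        sub = match(pk, tk)
        if sub is None:
            return None
        bound.extend(sub)
    return bound


def select(t: Node) -> List[str]:
    """Greedy top-down cover: take the largest matching pattern at each node."""
    best = None
    for name, pat in PATTERNS:
        bound = match(pat, t)
        if bound is not None and (best is None or size(pat) > size(best[1])):
            best = (name, pat, bound)
    if best is None:
        raise ValueError(f"no pattern covers {t.op}")
    name, _, bound = best
    code: List[str] = []
    for sub in bound:  # emit code for the operands first, then the instruction
        code.extend(select(sub))
    code.append(name)
    return code
```

For the tree `ADD(MUL(REG, REG), REG)` this sketch picks the single `madd` pattern, whereas `ADD(REG, MUL(REG, REG))` falls back to separate `mul` and `add` because the hypothetical pattern table does not model commutativity. Gaps like this are one reason greedy covering is not optimal in general.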

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Architectural and Compiler Mechanisms for Accelerating Single Thread Applications on Multicore Processors.

Author: Zhong Hongtao

Multicore systems have become the dominant mainstream computing platform. One of the biggest challenges going forward is how to efficiently utilize the ever increasing computational power provided by multicore systems. Applications with large amounts of explicit thread-level parallelism naturally scale performance with the number of cores. However, single-thread applications realize little to no gains from multicore systems. This work investigates architectural and compiler mechanisms to automatically accelerate single thread applications on multicore processors by efficiently exploiting three types of parallelism across multiple cores: instruction level parallelism (ILP), fine-grain thread level parallelism (TLP), and speculative loop level parallelism (LLP). A multicore architecture called Voltron is proposed to exploit different types of parallelism. Voltron can organize the cores for execution in either coupled or decoupled mode. In coupled mode, several in-order cores are coalesced to emulate a wide-issue VLIW processor. In decoupled mode, the cores execute a set of fine-grain communicating threads extracted by the compiler. By executing fine-grain threads in parallel, Voltron provides coarse-grained out-of-order execution capability using in-order cores. Architectural mechanisms for speculative execution of loop iterations are also supported under the decoupled mode. Voltron can dynamically switch between two modes with low overhead to exploit the best form of available parallelism. This dissertation also investigates compiler techniques to exploit different types of parallelism on the proposed architecture. First, this work proposes compiler techniques to manage multiple instruction streams to collectively function as a single logical stream on a conventional VLIW to exploit ILP. Second, this work studies compiler algorithms to extract fine-grain threads. 
Third, this dissertation proposes a series of systematic compiler transformations and a general code generation framework to expose hidden speculative LLP hindered by register and memory dependences in the code. These transformations collectively remove inter-iteration dependences that are caused by subsets of isolatable instructions, are unwindable, or occur infrequently. Experimental results show that proposed mechanisms can achieve speedups of 1.33 and 1.14 on 4 core machines by exploiting ILP and TLP respectively. The proposed transformations increase the DOALL loop coverage in applications from 27% to 61%, resulting in a speedup of 1.84 on 4 core systems.

Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/58419/1/hongtaoz_1.pd

Deep Blue Documents at the University of Michigan

Quantum Search Algorithms for Constraint Satisfaction and Optimization Problems Using Grover's Search and Quantum Walk Algorithms with Advanced Oracle Design

Author: Alasow Abdirahman Sheikh Hassan
Publication venue: PDXScholar
Publication date: 15/04/2024

The field of quantum computing has emerged as a powerful tool for solving and optimizing combinatorial optimization problems. For many real-world constraint satisfaction and optimization problems with many variables and possible solutions, the number of qubits required on scalable hardware is the bottleneck in the current generation of quantum computers. In this dissertation, we demonstrate advanced, scalable building blocks for quantum search algorithms, implemented in Grover's search algorithm and the quantum walk algorithm. The scalable building blocks are used to reduce the required number of qubits in the design. The proposed architecture effectively scales and optimizes the number of qubits needed, so that large problems can be solved with a limited number of qubits. Scaling and optimizing the number of qubits that a quantum algorithm design must accommodate is thus directly reflected in performance. Accuracy, that is, how accurately one can measure quantum states, is also a key performance metric. The search space of quantum search algorithms is traditionally created by using the Hadamard operator to create superposition. However, creating superpositions for problems that do not need all superposition states decreases the accuracy of the measured states. We present an efficient, user-controllable quantum circuit design that creates only the subspace superposition states needed for the search space. Using only the subspace states as superposition states of the search space increases the rate of correct solutions. In this dissertation, we present implementations of practical problems for Grover's search algorithm and the quantum walk algorithm in logic design, logic puzzles, and machine learning, such as SAT, MAX-SAT, XOR-SAT, and SAT-like problems in EDA, and mining frequent patterns for association rule mining.
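The claim that restricting the superposition to a subspace raises the rate of correct solutions can be checked numerically on a toy case. The sketch below is a classical state-vector simulation, not the dissertation's circuit design: the function name `grover_success` and its parameters are hypothetical, and it assumes every marked state lies inside the chosen subspace. The diffusion step reflects about the initial (subspace) superposition rather than the full Hadamard state.

```python
import math
from typing import Set


def grover_success(n_states: int, subspace: Set[int], marked: Set[int],
                   iterations: int) -> float:
    """Probability of measuring a marked state after the given Grover iterations.

    The initial state is a uniform superposition over `subspace` only
    (assumption: `marked` is a subset of `subspace`).
    """
    amp0 = 1.0 / math.sqrt(len(subspace))
    start = [amp0 if i in subspace else 0.0 for i in range(n_states)]
    state = list(start)
    for _ in range(iterations):
        # Oracle: flip the sign of the marked amplitudes.
        state = [-a if i in marked else a for i, a in enumerate(state)]
        # Diffusion: reflect about the start state, i.e. apply 2|s><s| - I.
        overlap = sum(s * a for s, a in zip(start, state))
        state = [2.0 * overlap * s - a for s, a in zip(start, state)]
    return sum(state[i] ** 2 for i in marked)


# Three qubits (8 basis states), one marked state, one Grover iteration.
full = grover_success(8, set(range(8)), {2}, 1)   # all 8 states in superposition
sub = grover_success(8, {0, 1, 2, 3}, {2}, 1)     # only 4 states are valid candidates
```

In this toy case one iteration over the full 8-state space gives a success probability of 0.78125, while the same iteration over the 4-state subspace gives 1.0, which is consistent with the effect the abstract describes.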

PDXScholar (Portland State University)

Customizing the Computation Capabilities of Microprocessors.

Author: Clark Nathan T.
Publication date: 01/01/2007

Designers of microprocessor-based systems must constantly improve performance and increase computational efficiency in their designs to create value. To this end, it is increasingly common to see computation accelerators in general-purpose processor designs. Computation accelerators collapse portions of an application's dataflow graph, reducing the critical path of computations, easing the burden on processor resources, and reducing energy consumption in systems. There are many problems associated with adding accelerators to microprocessors, though. Design of accelerators, architectural integration, and software support all present major challenges. This dissertation tackles these challenges in the context of accelerators targeting acyclic and cyclic patterns of computation. First, a technique to identify critical computation subgraphs within an application set is presented. This technique is hardware-cognizant and effectively generates a set of instruction set extensions given a domain of target applications. Next, several general-purpose accelerator structures are quantitatively designed using critical subgraph analysis for a broad application set. The next challenge is architectural integration of accelerators. Traditionally, software invokes accelerators by statically encoding new instructions into the application binary. This is incredibly costly, though, requiring many portions of hardware and software to be redesigned. This dissertation develops strategies to utilize accelerators, without changing the instruction set. In the proposed approach, the microarchitecture translates applications at run-time, replacing computation subgraphs with microcode to utilize accelerators. We explore the tradeoffs in performing difficult aspects of the translation at compile-time, while retaining run-time replacement. This culminates in a simple microarchitectural interface that supports a plug-and-play model for integrating accelerators into a pre-designed microprocessor. 
Software support is the last challenge in dealing with computation accelerators. The primary issue is difficulty in generating high-quality code utilizing accelerators. Hand-written assembly code is standard in industry, and if compiler support does exist, simple greedy algorithms are common. In this work, we investigate more thorough techniques for compiling for computation accelerators. Where greedy heuristics only explore one possible solution, the techniques in this dissertation explore the entire design space, when possible. Intelligent pruning methods ensure that compilation is both tractable and scalable.

Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/57633/2/ntclark_1.pd

Deep Blue Documents at the University of Michigan

Estudo e desenvolvimento de sistemas de geração de back-ends do processo de compilação (Study and development of back-end generation systems for the compilation process)

Author: Matos Paulo
Publication venue: Universidade do Minho
Publication date: 01/01/1999

The back-end of a compiler gathers a group of tasks whose implementation is directly dependent on the features of the processor for which machine code is to be generated. The fast evolution of the processor and microcontroller industry led this area of software development to invest heavily in researching means of responding quickly and well to the demand. It is within this context that the theme and the work carried out in this master's thesis emerged. The aim of this work is to synthesise what has already been done and to propose some solutions which, although not individually original, together point to alternatives to existing systems and advance the research on final-code generators and optimisers a little further. This work is extremely wide-ranging, covering all areas of the compilation process from semantic analysis to the generation of machine code. It also presents compiler models, representation of information, control- and data-flow analysis, local and global register allocation, instruction selection and the generation of selectors, code optimisation at several levels, etc. It is also worth noting that the development work resulted in the Back-End Development System, which, as the name indicates, is a software system to support the development of the back-end tasks of a compiler.

Biblioteca Digital do IPB

Estudo e desenvolvimento de sistemas de geração de back-ends do processo de compilação (Study and development of back-end generation systems for the compilation process)

Author: Matos Paulo
Publication venue: Universidade do Minho
Publication date: 01/01/1999

Publikationer från KTH

Biblioteca Digital do IPB

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Code Generation for Fixed-Point DSPs

Author: Araujo G., Malik S.

This paper examines the problem of code-generation for Digital Signal Processors (DSPs). We make two major contributions. First, for an important class of DSP architectures, we propose an optimal O(n) algorithm for the tasks of register allocation and instruction scheduling for expression trees. Optimality is guaranteed by sufficient conditions derived from a structural representation of the processor Instruction Set Architecture (ISA). Second, we develop heuristics for the case when basic blocks are Directed Acyclic Graphs (DAGs). © 1998 ACM. Volume 3, issue 2, pages 136-161.
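For readers unfamiliar with linear-time register allocation on expression trees, the sketch below shows the classic labeling idea (Ershov numbers, as used in Sethi-Ullman-style code generation) for a simple load/store machine. This is background only and is not the algorithm proposed in the paper above; the `Expr` type and `registers_needed` function are hypothetical names, and every leaf is assumed to be loaded into a register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Expr:
    """Binary expression tree node; leaves have no children."""
    op: str
    left: Optional["Expr"] = None
    right: Optional["Expr"] = None


def registers_needed(e: Expr) -> int:
    """Minimum registers to evaluate `e` without spilling, visiting each node once."""
    if e.left is None and e.right is None:
        return 1  # assumption: a leaf is loaded into one register
    if e.right is None:
        return registers_needed(e.left)  # unary operator reuses its operand's register
    l = registers_needed(e.left)
    r = registers_needed(e.right)
    # Evaluate the more demanding subtree first; a tie costs one extra register
    # to hold the first result while the second subtree is computed.
    return max(l, r) if l != r else l + 1


a, b, c, d = (Expr(x) for x in "abcd")
balanced = Expr("*", Expr("+", a, b), Expr("+", c, d))   # (a + b) * (c + d)
chain = Expr("+", a, Expr("+", b, Expr("+", c, d)))      # a + (b + (c + d))
```

Under these assumptions `balanced` needs 3 registers and `chain` needs 2. The labeling visits each node once, which is where the linear running time comes from.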

Repositorio da Producao Cientifica e Intelectual da Unicamp