519 research outputs found

    A mathematical formulation of the loop pipelining problem

    This paper presents a mathematical model for the loop pipelining problem that considers several parameters for optimization and supports any combination of resource and timing constraints. The unrolling degree of the loop is one of the variables explored by the model. By using Farey's series, an optimal exploration of the unrolling degree is performed and optimal solutions not considered by other methods are obtained. Finding an optimal schedule that minimizes resource and register requirements is solved by using an integer linear programming (ILP) model. A novel paradigm called branch and prune is proposed to efficiently converge towards the optimal schedule and prune the search tree for integer solutions, thus drastically reducing the running time. This is the first formulation that combines the unrolling degree of the loop with timing and resource constraints in a mathematical model that guarantees optimal solutions. Peer Reviewed. Postprint (author's final draft)
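As a rough illustration of the Farey's series mentioned in this abstract, the sketch below enumerates the Farey sequence of order n, that is, all reduced fractions in [0, 1] with denominator at most n, in increasing order. The function name `farey` is hypothetical, and the abstract does not say how the paper maps these fractions onto unrolling degrees, so this shows only the series itself, not the authors' method.

```python
from fractions import Fraction


def farey(n):
    """Return the Farey sequence of order n as a list of Fractions.

    Uses the standard next-term recurrence: given consecutive terms
    a/b and c/d, the following term is (k*c - a)/(k*d - b) with
    k = (n + b) // d.
    """
    if n < 1:
        raise ValueError("order must be at least 1")
    a, b, c, d = 0, 1, 1, n
    seq = [Fraction(a, b)]
    while c <= n:
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        seq.append(Fraction(a, b))
    return seq


if __name__ == "__main__":
    print(", ".join(str(f) for f in farey(4)))
```

Because every term is already in lowest terms and the terms come out in sorted order, a search that walks this list visits each distinct ratio with a bounded denominator exactly once.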

    Coarse-grained reconfigurable array architectures

    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    Comparison of high level design methodologies for algorithmic IPs : Bluespec and C-based synthesis

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (leaves 37-39). High-level hardware design of Digital Signal Processing algorithms is an important design problem for decreasing design time and allowing more algorithmic exploration. Bluespec is a Hardware Design Language (HDL) that allows designers to express intended microarchitecture through high-level constructs. C-based design tools directly generate hardware from algorithms expressed in C/C++. This research compares these two design methodologies in developing hardware for a Reed-Solomon decoding algorithm under area and performance metrics. This work illustrates that a C-based design flow may be effective in early stages of design development for fast prototyping. However, the Bluespec design flow produces hardware that is more customized for performance and resource constraints. This is because in later stages, designers need close control over the generated hardware structure, which HDLs like Bluespec provide but which is difficult to express under the constraints of sequential C semantics. By Abhinav Agarwal. S.M.

    Optimized FPGA Implementation of Model Predictive Control for Embedded Systems Using High-Level Synthesis Tool

    Model predictive control (MPC) is an optimization-based strategy for high-performance control that is attracting increasing interest. While MPC requires the online solution of an optimization problem, its ability to handle multivariable systems and constraints makes it a very powerful control strategy, especially for MPC of embedded systems, which have an ever-increasing amount of sensing and computation capabilities. We argue that the implementation of MPC on field programmable gate arrays (FPGAs) using automatic tools is nowadays possible, achieving cost-effective successful applications on fast or resource-constrained systems. The main burden for the implementation of MPC on FPGAs is the challenging design of the necessary algorithms. We outline an approach to achieve a software-supported optimized implementation of MPC on FPGAs using high-level synthesis tools and automatic code generation. The proposed strategy exploits the arithmetic operations necessary to solve optimization problems to tailor an FPGA design, which allows a tradeoff between energy, memory requirements, cost, and achievable speed. We show the capabilities and the simplicity of use of the proposed methodology on two different examples and illustrate its advantages over a microcontroller implementation

    Survey on Combinatorial Register Allocation and Instruction Scheduling

    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization

    Time-constrained loop pipelining

    This paper addresses the problem of Time-Constrained Loop Pipelining, i.e. given a fixed throughput, finding a schedule of a loop which minimizes resource requirements. We propose a methodology, called TCLP, based on dividing the problem into two simpler and independent tasks: retiming and scheduling. TCLP explores different sets of resources, searching for a maximum resource utilization. This reduces area requirements. After a minimum set of resources has been found, the execution throughput is increased and the number of registers required by the loop schedule is reduced. TCLP attempts to generate a schedule which minimizes cost in time and area (resources and registers). The results show that TCLP obtains optimal schedules in most cases. Peer Reviewed. Postprint (published version)
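To make the time-constrained setting concrete, the sketch below computes a simple lower bound on functional units once the throughput is fixed. It assumes a fixed initiation interval `ii` (cycles between successive iterations) and that each operation occupies one unit of its type for exactly one cycle; under those assumptions at least ceil(ops / ii) units of each type are needed. The names `min_units` and `op_counts` are hypothetical, and this is only the counting bound, not the TCLP retiming and scheduling procedure, which the abstract does not detail.

```python
def min_units(op_counts, ii):
    """Lower bound on functional units per operation type.

    op_counts: mapping from operation type to the number of such
               operations in one loop iteration.
    ii:        fixed initiation interval in cycles (assumed input).

    Assumes every operation occupies its unit for a single cycle, so
    one unit can serve at most `ii` operations per iteration.
    """
    if ii <= 0:
        raise ValueError("initiation interval must be positive")
    # -(-n // ii) is integer ceiling division.
    return {kind: -(-count // ii) for kind, count in op_counts.items()}


if __name__ == "__main__":
    # Hypothetical loop body: 5 additions and 3 multiplications.
    print(min_units({"add": 5, "mul": 3}, ii=2))
```

A schedule that meets this bound uses every unit in nearly every cycle, which matches the abstract's goal of maximum resource utilization; dependences in the loop may force more units than the bound suggests.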