80 research outputs found

    Obtaining performance and programmability using reconfigurable hardware for media processing

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2002. Includes bibliographical references (p. 127-132). Performance gain is an imperative requirement in the design of a reconfigurable computing system or in the development of a new application on such a system. However, such developments suffer from a long and difficult programming process, hard-to-predict performance gains, and a limited scope of applications. To address these problems, we need to understand the capabilities and limitations of reconfigurable hardware and its performance advantages and disadvantages, rethink reconfigurable system architectures, and develop new tools to explore its utility. We begin by examining performance contributors at the system level, identifying those that come from general-purpose components and those that come from dedicated components. We propose an architecture that integrates reconfigurable hardware within the general-purpose framework, so as to avoid or minimize dedicated hardware and organization for the sake of programmability. We analyze reconfigurable logic architectures and their performance limitations. This analysis leads to a theory that reconfigurable logic can never be clocked faster than a fixed-logic design based on the same fabrication technology. Although the clock speed is highly unpredictable, a quick upper-bound estimate can be obtained from a few parameters. We also analyze microprocessor architectures and establish an analytical performance model. We use this model to estimate performance bounds using very little information about task properties. These bounds help us detect potential memory-bound tasks. For a compute-bound task, we compare its performance upper bound with the upper bound on reconfigurable clock speed to further rule out unlikely speedup candidates. These performance estimates require very few parameters and can be obtained quickly without writing software or hardware code. They can be integrated with design tools as front-end tools to explore speedup opportunities without costly trials. We believe this will broaden the applicability of reconfigurable computing. By Ling-Pei Kung. Ph.D.
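The screening idea in this abstract can be illustrated with a short sketch. The thesis's own analytical model is not given in the abstract, so everything below is an assumption made for illustration: the roofline-style time bounds, the `Task` fields, and the parameter names (`peak_ops_per_s`, `mem_bytes_per_s`, `fpga_clock_hz_upper`, `fpga_ops_per_cycle`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    ops: float          # total arithmetic operations (assumed known)
    bytes_moved: float  # total bytes moved to or from memory (assumed known)


def cpu_time_bounds(task, peak_ops_per_s, mem_bytes_per_s):
    """Return (compute_time, memory_time) lower bounds in seconds."""
    compute_time = task.ops / peak_ops_per_s
    memory_time = task.bytes_moved / mem_bytes_per_s
    return compute_time, memory_time


def screen(task, peak_ops_per_s, mem_bytes_per_s,
           fpga_clock_hz_upper, fpga_ops_per_cycle):
    """Classify a task using only coarse, hypothetical parameters."""
    compute_time, memory_time = cpu_time_bounds(
        task, peak_ops_per_s, mem_bytes_per_s)
    # Step 1: a task limited by memory traffic is set aside first.
    if memory_time >= compute_time:
        return "memory-bound"
    # Step 2: best-case reconfigurable time at the clock-speed upper bound.
    fpga_time = task.ops / (fpga_clock_hz_upper * fpga_ops_per_cycle)
    if fpga_time < compute_time:
        return "speedup candidate"
    return "unlikely candidate"
```

A call such as `screen(Task(1e9, 1e6), 1e9, 1e9, 1e8, 64)` compares a best-case reconfigurable time against the processor's compute bound without writing any software or hardware code for the task itself.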

    Design and implementation of a multi-purpose cluster system NIU

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 209-221). By Boon Seong Ang. Ph.D.

    Method for the evaluation of an open-core microcontroller (Método para la evaluación de un microcontrolador de núcleo abierto)

    The verification stage plays a fundamental role in the design and implementation of microcontrollers. To verify a design properly, several functional verification techniques are used, such as designer-defined tests that check behavior in corner cases, simulation through testbenches, and the execution of long applications. The project proposed in this work aims to develop and implement a method for evaluating an open-core microcontroller by running tests directly on the hardware. The advantages of this approach are a much faster process than methods based on simulation and a lower memory requirement for the tests. An Ethernet IP Core has been integrated into the project to make the method independent of the operating system, the microprocessor architecture, and the design tool.

    Optimizing SIMD execution in HW/SW co-designed processors

    SIMD accelerators are ubiquitous in microprocessors from different computing domains. Their high compute power and hardware simplicity improve overall performance in an energy-efficient manner. Moreover, their replicated functional units and simple control mechanism make them amenable to scaling to higher vector lengths. However, code generation for these accelerators has been a challenge since their inception. Compilers generate vector code conservatively to ensure correctness. As a result, they lose significant vectorization opportunities and fail to extract the maximum benefit from SIMD accelerators. This thesis proposes to vectorize the program binary speculatively at runtime, in addition to static vectorization at compile time. Different environments provide the runtime profiling and optimization support required for dynamic vectorization; the two most prominent are: 1) Dynamic Binary Translators and Optimizers (DBTO) and 2) Hardware/Software (HW/SW) Co-designed Processors. The HW/SW co-designed environment provides several advantages over DBTOs, such as transparent incorporation of new hardware features, binary compatibility, etc. Therefore, we use the HW/SW co-designed environment to assess the potential of speculative dynamic vectorization. Furthermore, we analyze vector code generation for wider vector units and find that, even though SIMD accelerators are amenable to scaling from the hardware point of view, vector code generation at higher vector lengths is even more challenging. The two major factors impeding vectorization for wider SIMD units are: 1) reduced dynamic instruction stream coverage for vectorization and 2) a large number of permutation instructions. To solve the first problem, we propose Variable Length Vectorization, which iteratively vectorizes for multiple vector lengths to improve dynamic instruction stream coverage. Secondly, to reduce the number of permutation instructions, we propose Selective Writing, which selectively writes to different parts of a vector register and avoids permutations. Finally, we tackle the problem of leakage energy in SIMD accelerators. Since SIMD accelerators occupy a significant amount of real estate on the chip, they become the principal source of leakage if not utilized judiciously. Power gating is one of the most widely used techniques to reduce the leakage energy of functional units. However, power gating has its own energy and performance overhead. We propose to selectively devectorize the vector code when the higher SIMD lanes are used only intermittently. This selective devectorization keeps the higher SIMD lanes idle and power gated for the maximum duration, resulting in an overall reduction in leakage energy. Postprint (published version)
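The coverage argument behind Variable Length Vectorization can be illustrated with a toy calculation. This is only a sketch under strong assumptions, not the thesis's algorithm: each entry in `groups` is a hypothetical count of independent, isomorphic scalar operations, and the function greedily packs them at each vector length from widest to narrowest.

```python
def vectorized_coverage(groups, widths):
    """Count scalar operations covered when each group is packed greedily
    at the given vector lengths, trying the widest length first.

    groups: sizes of hypothetical groups of independent, isomorphic
            scalar operations.
    widths: vector lengths (in elements) the code generator may use.
    """
    covered = 0
    for size in groups:
        remaining = size
        for width in sorted(widths, reverse=True):
            full_vectors = remaining // width
            covered += full_vectors * width
            remaining -= full_vectors * width
        # Whatever is left in `remaining` stays scalar.
    return covered


groups = [8, 6, 3]                                  # 17 scalar operations
fixed = vectorized_coverage(groups, [8])            # single vector length
variable = vectorized_coverage(groups, [8, 4, 2])   # several vector lengths
print(fixed, variable, sum(groups))
```

In this toy input, a single length of 8 covers only the first group, while also trying lengths 4 and 2 picks up most of the operations in the smaller groups.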

    Reconfigurable Model for RISC Processors

    The instruction set of a processor is embodied in the particular micro-architecture representing the processor hardware. Verifying proper operation of the instruction set for a particular processor hardware implementation requires exhaustive testing to expose unknown dependencies and other elusive design flaws. This paper presents the research and development of a flexible micro-architectural model, written in SystemC, for a RISC processor based upon a user-defined configuration database; the RISC processor is based on an architecture assigned in the course Design of Computer Systems (DCS) offered at Rochester Institute of Technology (RIT). This model will be tested by a test bench written in SystemVerilog, using randomly generated instructions, and the results will be compared with various DCS student processors originally developed at the Register Transfer Level (RTL) in a Hardware Description Language (HDL) such as Verilog or VHDL. The test bench will provide stimulus, such as the system clock and random instructions, through a program memory attached to both the model and the RTL processor. The main goal of this work is to automate verification and validation of a diverse set of processors designed in RTL by using an appropriate configuration database and by comparing all states and signals of the processor under test against those of the model developed by the author. The test results will be compared and discussed.
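The comparison flow described in this abstract (random instructions fed to both a reference model and the processor under test, with state compared after each step) can be sketched in a few lines. The paper uses SystemC and SystemVerilog; the Python below is only an illustration, and the three-operation ISA, the 8-bit registers, and the names `reference_step`, `buggy_step`, and `first_mismatch` are all hypothetical.

```python
import random

# Hypothetical 8-bit ALU operations for a toy register machine.
OPS = {
    "ADD": lambda a, b: (a + b) & 0xFF,
    "SUB": lambda a, b: (a - b) & 0xFF,
    "AND": lambda a, b: a & b,
}


def random_program(rng, length, num_regs=4):
    """Generate `length` random (op, rd, rs1, rs2) instructions."""
    names = sorted(OPS)
    return [(rng.choice(names), rng.randrange(num_regs),
             rng.randrange(num_regs), rng.randrange(num_regs))
            for _ in range(length)]


def reference_step(regs, instr):
    """Reference model: apply one instruction, return the new state."""
    op, rd, rs1, rs2 = instr
    new_regs = list(regs)
    new_regs[rd] = OPS[op](regs[rs1], regs[rs2])
    return new_regs


def buggy_step(regs, instr):
    """Stand-in for a faulty design: SUB forgets to wrap to 8 bits."""
    op, rd, rs1, rs2 = instr
    new_regs = list(regs)
    if op == "SUB":
        new_regs[rd] = regs[rs1] - regs[rs2]
    else:
        new_regs[rd] = OPS[op](regs[rs1], regs[rs2])
    return new_regs


def first_mismatch(program, step_a, step_b, init):
    """Run both implementations in lockstep; return the index of the
    first instruction after which their states differ, or None."""
    state_a, state_b = list(init), list(init)
    for index, instr in enumerate(program):
        state_a = step_a(state_a, instr)
        state_b = step_b(state_b, instr)
        if state_a != state_b:
            return index
    return None


program = random_program(random.Random(0), 50)
print(first_mismatch(program, reference_step, buggy_step, [0, 1, 2, 3]))
```

Comparing the full state after every instruction, rather than only at the end of the program, points directly at the instruction that exposed the divergence.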

    ASC: A stream compiler for computing with FPGAs

    Published version