32 research outputs found

    Reconfigurable framework for high-bandwidth stream-oriented data processing

    Get PDF
    Designing a digital system that implements an assortment of specialized high performance algorithms can be costly. Considerable non-recurring engineering costs are required to develop an application specific integrated circuit (ASIC). Additionally, updating or adding features to a design requires the ASIC to be redesigned and refabricated. An alternative to using an ASIC is the field programmable gate array (FPGA). The modern FPGA\u27s ability to be partially reconfigured at runtime allows for the device to have the flexibility normally associated with a processor, while also being able to implement digital logic like in an ASIC. This capability allows for multiple digital functions to be loaded into the device at runtime only as needed. This thesis focuses on developing a reconfigurable framework that enables stream-oriented applications to make more effective use of FPGA resources and to manage partial reconfiguration operations across multiple tasks. This multichannel framework addresses several shortcomings of past research that evaluated various dynamic partial reconfiguration techniques using a color space conversion (CSC) engine. This framework allows for multiple different computations to be performed simultaneously, further improving throughput and flexibility of applications implemented within it. Performance of the system is evaluated by comparing its computational throughput to previous efforts using the CSC engine as well as the performance gained from the flexible scheduling that the framework allows. Implementations using the CSC engine show that performance can be improved up to 5 times faster than previous works, as a result of exploiting parallelism

    Reconfiguration of field programmable logic in embedded systems

    Get PDF

    Cycle-accurate multicore performance models on FPGAs

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 159-165).The goal of this project is to improve computer architecture by accelerating cycle-accurate performance modeling of multicore processors using FPGAs. Contributions include a distributed technique controlling simulation on a highly-parallel substrate, hardware design techniques to reduce development effort, and a specific framework for modeling shared-memory multicore processors paired with realistic On-Chip Networks.by Michael Pellauer.Ph.D

    Reconfigurable microarchitectures at the programmable logic interface

    Get PDF

    Automatic mapping of graphical programming applications to microelectronic technologies

    Get PDF
    Adaptive computing systems (ACSs) and application-specific integrated circuits (ASICs) can serve as flexible hardware accelerators for applications in domains such as image processing and digital signal processing. However, the mapping of applications onto ACSs and ASICs using the traditional methods can take months for a hardware engineer to develop and debug. In this dissertation, a new approach for automatic mapping of software applications onto ACSs and ASICs has been developed, implemented and validated. This dissertation presents the design flow of the software environment called CHAMPION, which is being developed at the University of Tennessee. This environment permits high-level design entry using the Cantata graphical programming software fromKRI. Using Cantata as the design entry, CHAMPION hides from the user the low-level details of the hardware architecture and the finer issues of application mapping onto the hardware. Validation of the CHAMPION environment was performed using multiple applications of moderate complexity. In one case, theapplication mapping time which required six weeks to perform manually took only six minutes for CHAMPION, yet comparable results were produced. Furthermore, the CHAMPION environment was constructed such that retargeting to a new adaptive computing system could be accomplished in just a few hours as opposed to weeks using manual methods. Thus, CHAMPION permits both ACSs and ASICs to be utilized by a wider audience and application development accomplished in less time

    FPGA acceleration of sequence analysis tools in bioinformatics

    Full text link
    Thesis (Ph.D.)--Boston UniversityWith advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. It is a complex highly-optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on FPGA. Utilizing our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved 12x speedup over a highly optimized multithread reference code. Multiple Sequence Alignment (MSA)--the extension of pairwise Sequence Alignment to multiple Sequences--is critical to solve many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of from 8Ox to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately many software heuristics do not fall into this category and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study is to introduce the pros and cons of these acceleration models for biosequence analysis tools

    Design and implementation of a multi-purpose cluster system NIU

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.Includes bibliographical references (p. 209-221).by Boon Seong Ang.Ph.D

    Automatic synthesis of reconfigurable instruction set accelerators

    Get PDF

    Vector support for multicore processors with major emphasis on configurable multiprocessors

    Get PDF
    It recently became increasingly difficult to build higher speed uniprocessor chips because of performance degradation and high power consumption. The quadratically increasing circuit complexity forbade the exploration of more instruction-level parallelism (JLP). To continue raising the performance, processor designers then focused on thread-level parallelism (TLP) to realize a new architecture design paradigm. Multicore processor design is the result of this trend. It has proven quite capable in performance increase and provides new opportunities in power management and system scalability. But current multicore processors do not provide powerful vector architecture support which could yield significant speedups for array operations while maintaining arealpower efficiency. This dissertation proposes and presents the realization of an FPGA-based prototype of a multicore architecture with a shared vector unit (MCwSV). FPGA stands for Filed-Programmable Gate Array. The idea is that rather than improving only scalar or TLP performance, some hardware budget could be used to realize a vector unit to greatly speedup applications abundant in data-level parallelism (DLP). To be realistic, limited by the parallelism in the application itself and by the compiler\u27s vectorizing abilities, most of the general-purpose programs can only be partially vectorized. Thus, for efficient resource usage, one vector unit should be shared by several scalar processors. This approach could also keep the overall budget within acceptable limits. We suggest that this type of vector-unit sharing be established in future multicore chips. The design, implementation and evaluation of an MCwSV system with two scalar processors and a shared vector unit are presented for FPGA prototyping. The MicroBlaze processor, which is a commercial IP (Intellectual Property) core from Xilinx, is used as the scalar processor; in the experiments the vector unit is connected to a pair of MicroBlaze processors through standard bus interfaces. The overall system is organized in a decoupled and multi-banked structure. This organization provides substantial system scalability and better vector performance. For a given area budget, benchmarks from several areas show that the MCwSV system can provide significant performance increase as compared to a multicore system without a vector unit. However, a MCwSV system with two MicroBlazes and a shared vector unit is not always an optimized system configuration for various applications with different percentages of vectorization. On the other hand, the MCwSV framework was designed for easy scalability to potentially incorporate various numbers of scalar/vector units and various function units. Also, the flexibility inherent to FPGAs can aid the task of matching target applications. These benefits can be taken into account to create optimized MCwSV systems for various applications. So the work eventually focused on building an architecture design framework incorporating performance and resource management for application-specific MCwSV (AS-MCwSV) systems. For embedded system design, resource usage, power consumption and execution latency are three metrics to be used in design tradeoffs. The product of these metrics is used here to choose the MCwSV system with the smallest value

    Design methodology addressing static/reconfigurable partitioning optimizing software defined radio (SDR) implementation through FPGA dynamic partial reconfiguration and rapid prototyping tools

    Get PDF
    The characteristics people request for communication devices become more and more demanding every day. And not only in those aspects dealing with communication speed, but also in such different characteristics as different communication standards compatibility, battery life, device size or price. Moreover, when this communication need is addressed by the industrial world, new characteristics such as reliability, robustness or time-to-market appear. In this context, Software Defined Radios (SDR) and evolutions such as Cognitive Radios or Intelligent Radios seem to be the technological answer that will satisfy all these requirements in a short and mid-term. Consequently, this PhD dissertation deals with the implementation of this type of communication system. Taking into account that there is no limitation neither in the implementation architecture nor in the target device, a novel framework for SDR implementation is proposed. This framework is made up of FPGAs, using dynamic partial reconfiguration, as target device and rapid prototyping tools as designing tool. Despite the benefits that this framework generates, there are also certain drawbacks that need to be analyzed and minimized to the extent possible. On this purpose, a SDR design methodology has been designed and tested. This methodology addresses the static/reconfigurable partitioning of the SDRs in order to optimize their implementation in the aforementioned framework. In order to verify the feasibility of both the design framework and the design methodology, several implementations have been carried out making use of them. A multi-standard modulator implementing WiFi, WiMAX and UMTS, a small-form-factor cognitive video transmission system and the implementation of several data coding functions over R3TOS, a hardware operating system developed by the University of Edinburgh, are these implementations.Las características que la gente exige a los dispositivos de comunicaciones son cada día más exigentes. Y no solo en los aspectos relacionados con la velocidad de comunicación, sino que también en diferentes características como la compatibilidad con diferentes estándares de comunicación, autonomía, tamaño o precio. Es más, cuando esta necesidad de comunicación se traslada al mundo industrial, aparecen nuevas características como fiabilidad, robustez o plazo de comercialización que también es necesario cubrir. En este contexto, las Radios Definidas por Software (SDR) y evoluciones como las Radios Cognitivas o Radios Inteligentes parecen la respuesta tecnológica que va a satisfacer estas necesidades a corto y medio plazo. Por ello, esta tesis doctoral aborda la implementación de este tipo de sistemas de comunicaciones. Teniendo en cuenta que no existe una limitación, ni en la arquitectura de implementación, ni en el tipo de dispositivo a usar, se propone un nuevo entrono de diseño formado por las FPGAs, haciendo uso de la reconfiguración parcial dinámica, y por las herramientas de prototipado rápido. A pesar de que este entorno de diseño ofrece varios beneficios, también genera algunos inconvenientes que es necesario analizar y minimizar en la medida de lo posible. Con este objetivo, se ha diseñado y verificado una metodología de diseño de SDRs. Esta metodología se encarga del particionado estático/reconfigurable de las SDRs para optimizar su implementación sobre el entrono de diseño antes comentado. Para verificar la viabilidad tanto del entorno, como de la metodología de diseño propuesta, se han realizado varias implementaciones que hacen uso de ambas cosas. Estas implementaciones son: un modulador multi-estándar que implementa WiFi, WiMAX y UMTS, un sistema cognitivo y compacto de transmisión de video y la implementación de varias funciones de codificación de datos sobre R3TOS, un sistema operativo hardware desarrollado por la Universidad de Edimburgo
    corecore