2,083 research outputs found

    Minimum maximum reconfiguration cost problem

    Get PDF
    This paper discusses the problem of minimizing the reconfiguration cost of some types of reconfigurable systems. A formal definition of the problem and a proof of its NP-completeness are provided. In addition, an Integer Linear Programming formulation is proposed. The proposed problem has been used for optimizing a design stage of Finite Virtual State Machines

    Reconfigurable Computing for Speech Recognition: Preliminary Findings

    Get PDF
    Continuous real-time speech recognition is a highly computationally-demanding task, but one which can take good advantage of a parallel processing system. To this end, we describe proposals for, and preliminary findings of, research in implementing in programmable logic the decoder part of a speech recognition system. Recognition via Viterbi decoding of Hidden Markov Models is outlined, along with details of current implementations, which aim to exploit properties of the algorithm that could make it well-suited for devices such as FPGAs. The question of how to deal with limited resources, by reconfiguration or otherwise, is also addressed

    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Full text link
    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increases data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers, to tap into the performance potential offered by specialized hardware architectures using HLS

    A Reconfigurable Pattern Matching Hardware Implementation Using On-Chip RAM-Based FSM

    Get PDF
    The use of synthesizable reconfigurable IP cores has increasingly become a trend in System on Chip (SoC) designs because of their flexibility and powerful functionality. The market introduction of multi-featured platform FPGAs equipped with embedded memory and processor blocks has further expanded the possibility of utilizing dynamic reconfiguration to improve overall system adaptability to meet varying product requirements. In this paper, a reconfigurable hardware implementation for pattern matching using Finite State machine (FSM) is proposed. The FSM design is RAMbased and is reconfigured on the fly through altering memory contents only. An embedded processor is used for orchestrating run time reconfiguration. Experimental results show that the system can reconfigure itself based on a new incoming pattern and perform the text search without the need of a host processor. Results also proved that each search iteration was executed in one clock cycle and the maximum achievable clock frequency is independent of search pattern length

    Stochastic Digital Circuits for Probabilistic Inference

    Get PDF
    We introduce combinational stochastic logic, an abstraction that generalizes deterministic digital circuit design (based on Boolean logic gates) to the probabilistic setting. We show how this logic can be combined with techniques from contemporary digital design to generate stateless and stateful circuits for exact and approximate sampling from a range of probability distributions. We focus on Markov chain Monte Carlo algorithms for Markov random fields, using massively parallel circuits. We implement these circuits on commodity reconfigurable logic and estimate the resulting performance in time, space and price. Using our approach, these simple and general algorithms could be affordably run for thousands of iterations on models with hundreds of thousands of variables in real time

    Analysis of the reconfiguration latency and energy overheads for a Xilinx Virtex-5 FPGA

    Get PDF
    In this paper we have evaluated the overhead and the tradeoffs of a set of components usually included in a system with run-time partial reconfiguration implemented on a Xilinx Virtex-5. Our analysis shows the benefits of including a scratchpad memory inside the reconfiguration controller in order to improve the efficiency of the reconfiguration process. We have designed a simple controller for this scratchpad that includes support for prefetching and caching in order to further reduce both the energy and latency overhead
    corecore