4,187 research outputs found

    A Reconfigurable Tile-Based Architecture to Compute FFT and FIR Functions in the Context of Software-Defined Radio

    Get PDF
    Software-defined radio (SDR) is the term used for flexible radio systems that can deal with multiple standards. For an efficient implementation, such systems require appropriate reconfigurable architectures. This paper targets the efficient implementation of the most computationally intensive kernels of two significantly different standards, viz. Bluetooth and HiperLAN/2, on the same reconfigurable hardware. These kernels are FIR filtering and FFT. The designed architecture is based on a two-dimensional arrangement of 17 tiles. Each tile contains a multiplier, an adder, local memory and multiplexers allowing flexible communication with the neighboring tiles. The tile-base data path is complemented with a global controller and various memories. The design has been implemented in SystemC and simulated extensively to prove equivalence with a reference all-software design. It has also been synthesized and turns out to outperform significantly other reconfigurable designs with respect to speed and area

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Full text link
    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

    Implementation of JPEG compression and motion estimation on FPGA hardware

    Full text link
    A hardware implementation of JPEG allows for real-time compression in data intensivve applications, such as high speed scanning, medical imaging and satellite image transmission. Implementation options include dedicated DSP or media processors, FPGA boards, and ASICs. Factors that affect the choice of platform selection involve cost, speed, memory, size, power consumption, and case of reconfiguration. The proposed hardware solution is based on a Very high speed integrated circuit Hardware Description Language (VHDL) implememtation of the codec with prefered realization using an FPGA board due to speed, cost and flexibility factors; The VHDL language is commonly used to model hardware impletations from a top down perspective. The VHDL code may be simulated to correct mistakes and subsequently synthesized into hardware using a synthesis tool, such as the xilinx ise suite. The same VHDL code may be synthesized into a number of sifferent hardware architetcures based on constraints given. For example speed was the major constraint when synthesizing the pipeline of jpeg encoding and decoding, while chip area and power consumption were primary constraints when synthesizing the on-die memory because of large area. Thus, there is a trade off between area and speed in logic synthesis

    A Framework for an Automated Compilation System for Reconfigurable Architectures

    Get PDF
    The advent of the Field Programmable Gate Array has allowed the implementation of runtime reconfigurable computer systems. These systems are capable of configuring their hardware to provide custom hardware support for software applications. Since these architectures can be reconfigured during operation, they are able to provide hardware support for a variety of applications, without removal from the system. The Air Force is currently investigating reconfigurable architectures for avionics and signal processing applications. This thesis investigates the problem of automating the application development process for reconfigurable architectures. The lack of automated development support is a major limiting factor in the use of these systems. This thesis creates a framework for a reconfigurable compiler, which automatically implements a single high level language specification as a reconfigurable hardware/software application. The major tasks in the process are examined, and possible methods for implementation are investigated. A prototype reconfigurable compiler has been developed to demonstrate the feasibility of important concepts, and to uncover additional areas of difficulty

    FPGA implementation of online finite-set model based predictive control for power electronics

    Get PDF
    Recently there has been an increase in the use of model based predictive control (MBPC) for power-electronic converters. MBPC allows fast and accurate control of multiple controlled variables for hybrid systems such as a power electronic converter and its load. The computational burden for this control scheme however is very high and often restrictive for a good implementation. This means that a suitable technology and design approach should be used. In this paper the implementation of finite-set MBPC (FS-MBPC) in field-programmable gate arrays (FPGAs) is discussed. The control is fully implemented in programmable digital logic by using a high-level design tool. This allows to obtain very good performances (both in control quality, speed and hardware utilization) and have a flexible, modular control configuration. The feasibility and performance of the FPGA implementation of FS-MBPC is discussed in this paper for a 4-level flying-capacitor converter (FCC). This is an interesting application as FS-MBPC allows the simultaneous control of the output current and the capacitor voltages, yet the high number of possible switch states results in a high computational load. The good performance is obtained by exploiting the FPGAā€™s strong points: parallelism and pipe-lining. In the application discussed in this paper the parallel processing for the three converter phases and a fully pipelined calculation of the prediction stage allow to realize an area-time efficient implementation

    Efficiency analysis methodology of FPGAs based on lost frequencies, area and cycles

    Get PDF
    We propose a methodology to study and to quantify efficiency and the impact of overheads on runtime performance. Most work on High-Performance Computing (HPC) for FPGAs only studies runtime performance or cost, while we are interested in how far we are from peak performance and, more importantly, why. The efficiency of runtime performance is defined with respect to the ideal computational runtime in absence of inefficiencies. The analysis of the difference between actual and ideal runtime reveals the overheads and bottlenecks. A formal approach is proposed to decompose the efficiency into three components: frequency, area and cycles. After quantification of the efficiencies, a detailed analysis has to reveal the reasons for the lost frequencies, lost area and lost cycles. We propose a taxonomy of possible causes and practical methods to identify and quantify the overheads. The proposed methodology is applied on a number of use cases to illustrate the methodology. We show the interaction between the three components of efficiency and show how bottlenecks are revealed
    • ā€¦
    corecore