42,198 research outputs found

    Predictive control using an FPGA with application to aircraft control

    Get PDF
    Alternative and more efficient computational methods can extend the applicability of MPC to systems with tight real-time requirements. This paper presents a “system-on-a-chip” MPC system, implemented on a field programmable gate array (FPGA), consisting of a sparse structure-exploiting primal dual interior point (PDIP) QP solver for MPC reference tracking and a fast gradient QP solver for steady-state target calculation. A parallel reduced precision iterative solver is used to accelerate the solution of the set of linear equations forming the computational bottleneck of the PDIP algorithm. A numerical study of the effect of reducing the number of iterations highlights the effectiveness of the approach. The system is demonstrated with an FPGA-inthe-loop testbench controlling a nonlinear simulation of a large airliner. This study considers many more manipulated inputs than any previous FPGA-based MPC implementation to date, yet the implementation comfortably fits into a mid-range FPGA, and the controller compares well in terms of solution quality and latency to state-of-the-art QP solvers running on a standard PC

    Maximizing CNN Accelerator Efficiency Through Resource Partitioning

    Full text link
    Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection of layers is computed. However, this approach leads to inefficient designs because the same processor structure is used to compute CNN layers of radically varying dimensions. We present a new CNN accelerator paradigm and an accompanying automated design methodology that partitions the available FPGA resources into multiple processors, each of which is tailored for a different subset of the CNN convolutional layers. Using the same FPGA resources as a single large processor, multiple smaller specialized processors increase computational efficiency and lead to a higher overall throughput. Our design methodology achieves 3.8x higher throughput than the state-of-the-art approach on evaluating the popular AlexNet CNN on a Xilinx Virtex-7 FPGA. For the more recent SqueezeNet and GoogLeNet, the speedups are 2.2x and 2.0x

    SPH Simulations with Reconfigurable Hardware Accelerator

    Full text link
    We present a novel approach to accelerate astrophysical hydrodynamical simulations. In astrophysical many-body simulations, GRAPE (GRAvity piPE) system has been widely used by many researchers. However, in the GRAPE systems, its function is completely fixed because specially developed LSI is used as a computing engine. Instead of using such LSI, we are developing a special purpose computing system using Field Programmable Gate Array (FPGA) chips as the computing engine. Together with our developed programming system, we have implemented computing pipelines for the Smoothed Particle Hydrodynamics (SPH) method on our PROGRAPE-3 system. The SPH pipelines running on PROGRAPE-3 system have the peak speed of 85 GFLOPS and in a realistic setup, the SPH calculation using one PROGRAPE-3 board is 5-10 times faster than the calculation on the host computer. Our results clearly shows for the first time that we can accelerate the speed of the SPH simulations of a simple astrophysical phenomena using considerable computing power offered by the hardware.Comment: 27 pages, 13 figures, submitted to PAS

    Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM

    Get PDF
    This article presents and evaluates pipelined architecture designs for an improved high-frequency Fast Fourier Transform (FFT) processor implemented on Field Programmable Gate Arrays (FPGA) for Multiple Input Multiple Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM). The architecture presented is a Mixed-Radix Multipath Delay Commutator. The presented parallel architecture utilizes fewer hardware resources compared to Radix-2 architecture, while maintaining simple control and butterfly structures inherent to Radix-2 implementations. The high-frequency design presented allows enhancing system throughput without requiring additional parallel data paths common in other current approaches, the presented design can process two and four independent data streams in parallel and is suitable for scaling to any power of two FFT size N. FPGA implementation of the architecture demonstrated significant resource efficiency and high-throughput in comparison to relevant current approaches within literature. The proposed architecture designs were realized with Xilinx System Generator (XSG) and evaluated on both Virtex-5 and Virtex-7 FPGA devices. Post place and route results demonstrated maximum frequency values over 400 MHz and 470 MHz for Virtex-5 and Virtex-7 FPGA devices respectively
    corecore