1,924 research outputs found

    A general framework for efficient FPGA implementation of matrix product

    Get PDF
    Original article can be found at: http://www.medjcn.com/ Copyright Softmotor LimitedHigh performance systems are required by the developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Filed-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for matrix algorithm cores generation is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, ranging in three different matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles while the second one is a parallel floating-point matrix multiplier designed for 3D affine transformations.Peer reviewe

    High Resolution Single-Chip Radix II FFT Processor for High- Tech Application

    Get PDF
    Electrical motors are vital components of many industrial processes and their operation failure leads losing in production line. Motor functionality and its behavior should be monitored to avoid production failure catastrophe. Hence, a high‐tech DSP processor is a significant method for electrical harmonic analysis that can be realized as embedded systems. This chapter introduces principal embedded design of novel high‐tech 1024‐point FFT processor architecture for high performance harmonic measurement techniques. In FFT processor algorithm pipelining and parallel implementation are incorporated in order to enhance the performance. The proposed FFT makes use of floating point to realize higher precision FFT. Since floating‐point architecture limits the maximum clock frequency and increases the power consumption, the chapter focuses on improving the speed, area, resolution and power consumption, as well as latency for the FFT. It illustrates very large‐scale integration (VLSI) implementation of the floating‐point parallel pipelined (FPP) 1024‐point Radix II FFT processor with applying novel architecture that makes use of only single butterfly incorporation of intelligent controller. The functionality of the conventional Radix II FFT was verified as embedded in FPGA prototyping. For area and power consumption, the proposed Radix II FPP‐FFT was optimized in ASIC under Silterra 0.18 ”m and Mimos 0.35 ”m technology libraries

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    Get PDF
    New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, and is based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms

    FPGA Implementation of Spectral Subtraction for In-Car Speech Enhancement and Recognition

    Get PDF
    The use of speech recognition in noisy environments requires the use of speech enhancement algorithms in order to improve recognition performance. Deploying these enhancement techniques requires significant engineering to ensure algorithms are realisable in electronic hardware. This paper describes the design decisions and process to port the popular spectral subtraction algorithm to a Virtex-4 field-programmable gate array (FPGA) device. Resource analysis shows the final design uses only 13% of the total available FPGA resources. Waveforms and spectrograms presented support the validity of the proposed FPGA design

    High-resolution wide-band Fast Fourier Transform spectrometers

    Full text link
    We describe the performance of our latest generations of sensitive wide-band high-resolution digital Fast Fourier Transform Spectrometer (FFTS). Their design, optimized for a wide range of radio astronomical applications, is presented. Developed for operation with the GREAT far infrared heterodyne spectrometer on-board SOFIA, the eXtended bandwidth FFTS (XFFTS) offers a high instantaneous bandwidth of 2.5 GHz with 88.5 kHz spectral resolution and has been in routine operation during SOFIA's Basic Science since July 2011. We discuss the advanced field programmable gate array (FPGA) signal processing pipeline, with an optimized multi-tap polyphase filter bank algorithm that provides a nearly loss-less time-to-frequency data conversion with significantly reduced frequency scallop and fast sidelobe fall-off. Our digital spectrometers have been proven to be extremely reliable and robust, even under the harsh environmental conditions of an airborne observatory, with Allan-variance stability times of several 1000 seconds. An enhancement of the present 2.5 GHz XFFTS will duplicate the number of spectral channels (64k), offering spectroscopy with even better resolution during Cycle 1 observations.Comment: Accepted for publication in A&A (SOFIA/GREAT special issue

    H-SIMD machine : configurable parallel computing for data-intensive applications

    Get PDF
    This dissertation presents a hierarchical single-instruction multiple-data (H-SLMD) configurable computing architecture to facilitate the efficient execution of data-intensive applications on field-programmable gate arrays (FPGAs). H-SIMD targets data-intensive applications for FPGA-based system designs. The H-SIMD machine is associated with a hierarchical instruction set architecture (HISA) which is developed for each application. The main objectives of this work are to facilitate ease of program development and high performance through ease of scheduling operations and overlapping communications with computations. The H-SIMD machine is composed of the host, FPGA and nano-processor layers. They execute host SIMD instructions (HSIs), FPGA SIMD instructions (FSIs) and nano-processor instructions (NPLs), respectively. A distinction between communication and computation instructions is intended for all the HISA layers. The H-SIMD machine also employs a memory switching scheme to bridge the omnipresent large bandwidth gaps in configurable systems. To showcase the proposed high-performance approach, the conditions to fully overlap communications with computations are investigated for important applications. The building blocks in the H-SLMD machine, such as high-performance and area-efficient register files, are presented in detail. The H-SLMD machine hierarchy is implemented on a host Dell workstation and the Annapolis Wildstar II FPGA board. Significant speedups have been achieved for matrix multiplication (MM), 2-dimensional discrete cosine transform (2D DCT) and 2-dimensional fast Fourier transform (2D FFT) which are used widely in science and engineering. In another FPGA-based programming paradigm, a high-level language (here ANSI C) can be used to program the FPGAs in a mode similar to that of the H-SIMD machine in terms of trying to minimize the effect of overheads. More specifically, a multi-threaded overlapping scheme is proposed to reduce as much as possible, or even completely hide, runtime FPGA reconfiguration overheads. Nevertheless, although the HLL-enabled reconfigurable machine allows software developers to customize FPGA functions easily, special architecture techniques are needed to achieve high-performance without significant penalty on area and clock frequency. Two important high-performance applications, matrix multiplication and image edge detection, are tested on the SRC-6 reconfigurable machine. The implemented algorithms are able to exploit the available data parallelism with independent functional units and application-specific cache support. Relevant performance and design tradeoffs are analyzed

    Applications for FPGA's on Nanosatellites

    Get PDF
    This thesis examines the feasibility of using a Field Programmable Gate Array (FPGA) based design on-board a CubeSat-sized nanosatellite. FPGAs are programmable logic devices that allow for the implementation of custom digital hardware on a single Integrated Circuit (IC). By using these FPGAs in spacecraft, more efficient processing can be done by moving the design onto hardware. A variety of different FPGA-based designs are looked at, including a Watchdog Timer (WDT), a Global Positioning System (GPS) receiver, and a camera interface

    An SoC Architecture for Real-Time Noise Cancellation System Using Variable Speech PDF Method

    Get PDF
    This paper presents the architecture and implementation of system-on-chip (SoC) for realtime noise cancellation system which exploits variable speech probability density function (PDF) and maximum a posteriori (MAP) estimation rule as noise cancelling algorithm. The hardware software co-design approach is employed to achieve real-time performance while considering ease of implementation and design flexibility. The software module utilizes LEON SPARC-v8 and FPU co-prosessor as processing unit. The AMBA based Hanning Filter and FFT/IFFT are utilized as processing accelerator modules to increase system performance. The FFT/IFFT module employs custom Radix-2^2 Single Delay Feedback (R2^2SDF). In order to deliver high data transfer rate between buffer and hardware accelerators, the DMA controller is incorporated. The overall system implementation utilizes 18,500 logic elements and consumes 21.87 kB of memory. The system takes only 0.69 ms latency which is appropriate for real-time application. An FPGA Altera DE2-70 is used for prototyping with both algorithms and the noise cancellation function have been verified
    • 

    corecore