1,924 research outputs found
A general framework for efficient FPGA implementation of matrix product
Original article can be found at: http://www.medjcn.com/ Copyright Softmotor Limited. High-performance systems are required by developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Field-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high-performance systems at an economical price. Given the importance and use of matrix algorithms in scientific computing applications, they seem ideal candidates to exploit the advantages offered by FPGAs. In this paper, a system for generating matrix algorithm cores is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, spanning three matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general-purpose or an application-specific core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles, while the second is a parallel floating-point matrix multiplier designed for 3D affine transformations. Peer reviewed
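As a rough illustration of the distributed arithmetic principle mentioned in the abstract above, the sketch below computes a fixed-coefficient inner product bit-serially from a lookup table instead of using multipliers. It is not the paper's core: the function names, the unsigned 8-bit input assumption, and the luma coefficients (scaled by 256 and rounded) are all illustrative assumptions.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of coeffs[k] for every bit k that is set in addr."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_dot(lut, xs, bits=8):
    """Inner product of fixed coefficients with unsigned `bits`-bit inputs.

    For each bit position b, bit b of every input forms a LUT address;
    the LUT output is shifted by b and accumulated (no multipliers needed).
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        acc += lut[addr] << b
    return acc


# Hypothetical fixed-point coefficients for one row of an RGB-to-luma
# conversion (approximately 0.299, 0.587, 0.114, each scaled by 256).
Y_COEFFS = (77, 150, 29)
Y_LUT = build_lut(Y_COEFFS)


def luma(r, g, b):
    """Luma of one 8-bit RGB pixel; the final shift undoes the 256 scaling."""
    return da_dot(Y_LUT, (r, g, b)) >> 8
```

A full colour space conversion would use one such table per output component, which is why the approach maps well onto FPGA lookup tables.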
High Resolution Single-Chip Radix II FFT Processor for High-Tech Application
Electrical motors are vital components of many industrial processes, and their failure leads to losses on the production line. Motor functionality and behavior should be monitored to avoid catastrophic production failures. Hence, a high-tech DSP processor, which can be realized as an embedded system, is a significant tool for electrical harmonic analysis. This chapter introduces the principal embedded design of a novel high-tech 1024-point FFT processor architecture for high-performance harmonic measurement techniques. Pipelining and parallel implementation are incorporated in the FFT processor algorithm in order to enhance performance. The proposed FFT makes use of floating point to realize a higher-precision FFT. Since floating-point architecture limits the maximum clock frequency and increases the power consumption, the chapter focuses on improving the speed, area, resolution and power consumption, as well as the latency, of the FFT. It illustrates a very large-scale integration (VLSI) implementation of the floating-point parallel pipelined (FPP) 1024-point Radix II FFT processor, applying a novel architecture that uses only a single butterfly together with an intelligent controller. The functionality of the conventional Radix II FFT was verified through embedded FPGA prototyping. For area and power consumption, the proposed Radix II FPP-FFT was optimized in ASIC under Silterra 0.18 µm and Mimos 0.35 µm technology libraries.
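For readers unfamiliar with the radix-2 structure referred to above, here is a minimal software sketch of an iterative decimation-in-time FFT in which every stage reuses one butterfly routine. It only illustrates the general algorithm; the function names are assumptions, and it says nothing about the chapter's actual hardware architecture or controller.

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 butterfly: combine two values using twiddle factor w."""
    t = w * b
    return a + t, a - t


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Reorder the input into bit-reversed order.
    out = [0j] * n
    for i, v in enumerate(x):
        out[bit_reverse(i, bits)] = complex(v)
    # log2(n) stages, each applying the same butterfly n/2 times.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                i, j = start + k, start + k + half
                out[i], out[j] = butterfly(out[i], out[j], w)
        size *= 2
    return out
```

A 1024-point transform in this form has 10 stages of 512 butterflies each, which gives a sense of how much work a single shared butterfly unit must schedule.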
REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS
New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, a hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such as matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.
FPGA Implementation of Spectral Subtraction for In-Car Speech Enhancement and Recognition
The use of speech recognition in noisy environments requires the use of speech enhancement algorithms in order to improve recognition performance. Deploying these enhancement techniques requires significant engineering to ensure algorithms are realisable in electronic hardware. This paper describes the design decisions and process to port the popular spectral subtraction algorithm to a Virtex-4 field-programmable gate array (FPGA) device. Resource analysis shows the final design uses only 13% of the total available FPGA resources. Waveforms and spectrograms presented support the validity of the proposed FPGA design
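The core step of basic magnitude spectral subtraction can be sketched as follows, assuming one frame's complex spectrum and a per-bin noise magnitude estimate are already available. The function name and the `over_subtraction` and `floor` parameters are illustrative assumptions, not details taken from the paper's FPGA design.

```python
import cmath


def spectral_subtract(frame_spectrum, noise_mag, over_subtraction=1.0, floor=0.0):
    """Subtract a noise magnitude estimate from each bin of one frame.

    The noisy phase is kept unchanged; magnitudes that would fall below
    `floor * |bin|` are clamped to that value.
    """
    cleaned = []
    for bin_value, noise in zip(frame_spectrum, noise_mag):
        mag = abs(bin_value)
        phase = cmath.phase(bin_value)
        new_mag = max(mag - over_subtraction * noise, floor * mag)
        cleaned.append(cmath.rect(new_mag, phase))
    return cleaned
```

In a complete system this step would sit between a forward FFT of each windowed frame and an inverse FFT with overlap-add, with the noise estimate typically updated during non-speech frames.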
High-resolution wide-band Fast Fourier Transform spectrometers
We describe the performance of our latest generations of sensitive wide-band
high-resolution digital Fast Fourier Transform Spectrometer (FFTS). Their
design, optimized for a wide range of radio astronomical applications, is
presented. Developed for operation with the GREAT far infrared heterodyne
spectrometer on-board SOFIA, the eXtended bandwidth FFTS (XFFTS) offers a high
instantaneous bandwidth of 2.5 GHz with 88.5 kHz spectral resolution and has
been in routine operation during SOFIA's Basic Science since July 2011. We
discuss the advanced field programmable gate array (FPGA) signal processing
pipeline, with an optimized multi-tap polyphase filter bank algorithm that
provides a nearly loss-less time-to-frequency data conversion with
significantly reduced frequency scallop and fast sidelobe fall-off. Our digital
spectrometers have been proven to be extremely reliable and robust, even under
the harsh environmental conditions of an airborne observatory, with
Allan-variance stability times of several thousand seconds. An enhancement of the
present 2.5 GHz XFFTS will double the number of spectral channels (64k),
offering spectroscopy with even better resolution during Cycle 1 observations.
Comment: Accepted for publication in A&A (SOFIA/GREAT special issue)
H-SIMD machine : configurable parallel computing for data-intensive applications
This dissertation presents a hierarchical single-instruction multiple-data (H-SIMD) configurable computing architecture to facilitate the efficient execution of data-intensive applications on field-programmable gate arrays (FPGAs). H-SIMD targets data-intensive applications for FPGA-based system designs. The H-SIMD machine is associated with a hierarchical instruction set architecture (HISA) which is developed for each application. The main objectives of this work are to facilitate ease of program development and high performance through ease of scheduling operations and overlapping communications with computations.
The H-SIMD machine is composed of the host, FPGA and nano-processor layers. They execute host SIMD instructions (HSIs), FPGA SIMD instructions (FSIs) and nano-processor instructions (NPIs), respectively. A distinction between communication and computation instructions is intended for all the HISA layers. The H-SIMD machine also employs a memory switching scheme to bridge the omnipresent large bandwidth gaps in configurable systems. To showcase the proposed high-performance approach, the conditions to fully overlap communications with computations are investigated for important applications. The building blocks in the H-SIMD machine, such as high-performance and area-efficient register files, are presented in detail. The H-SIMD machine hierarchy is implemented on a host Dell workstation and the Annapolis Wildstar II FPGA board. Significant speedups have been achieved for matrix multiplication (MM), 2-dimensional discrete cosine transform (2D DCT) and 2-dimensional fast Fourier transform (2D FFT), which are used widely in science and engineering.
In another FPGA-based programming paradigm, a high-level language (here ANSI C) can be used to program the FPGAs in a mode similar to that of the H-SIMD machine in terms of trying to minimize the effect of overheads. More specifically, a multi-threaded overlapping scheme is proposed to reduce as much as possible, or even completely hide, runtime FPGA reconfiguration overheads. Nevertheless, although the HLL-enabled reconfigurable machine allows software developers to customize FPGA functions easily, special architecture techniques are needed to achieve high performance without a significant penalty on area and clock frequency. Two important high-performance applications, matrix multiplication and image edge detection, are tested on the SRC-6 reconfigurable machine. The implemented algorithms are able to exploit the available data parallelism with independent functional units and application-specific cache support. Relevant performance and design tradeoffs are analyzed.
Applications for FPGA's on Nanosatellites
This thesis examines the feasibility of using a Field Programmable Gate Array (FPGA) based design on-board a CubeSat-sized nanosatellite. FPGAs are programmable logic devices that allow for the implementation of custom digital hardware on a single Integrated Circuit (IC). By using these FPGAs in spacecraft, more efficient processing can be done by moving the design onto hardware. A variety of different FPGA-based designs are looked at, including a Watchdog Timer (WDT), a Global Positioning System (GPS) receiver, and a camera interface
An SoC Architecture for Real-Time Noise Cancellation System Using Variable Speech PDF Method
This paper presents the architecture and implementation of a system-on-chip (SoC) for a real-time noise cancellation system which exploits a variable speech probability density function (PDF) and the maximum a posteriori (MAP) estimation rule as the noise cancelling algorithm. The hardware-software co-design approach is employed to achieve real-time performance while considering ease of implementation and design flexibility. The software module utilizes a LEON SPARC-v8 and an FPU co-processor as the processing unit. The AMBA-based Hanning Filter and FFT/IFFT are utilized as processing accelerator modules to increase system performance. The FFT/IFFT module employs a custom Radix-2^2 Single Delay Feedback (R2^2SDF) design. In order to deliver a high data transfer rate between the buffer and the hardware accelerators, a DMA controller is incorporated. The overall system implementation utilizes 18,500 logic elements and consumes 21.87 kB of memory. The system latency is only 0.69 ms, which is appropriate for real-time applications. An Altera DE2-70 FPGA is used for prototyping, and both the algorithms and the noise cancellation function have been verified.