1,448 research outputs found

    Design and implementation of DA FIR filter for bio-inspired computing architecture

    Get PDF
    This paper elucidates the system construct of DA-FIR filter optimized for design of distributed arithmetic (DA) finite impulse response (FIR) filter and is based on architecture with tightly coupled co-processor based data processing units. With a series of look-up-table (LUT) accesses in order to emulate multiply and accumulate operations the constructed DA based FIR filter is implemented on FPGA. The very high speed integrated circuit hardware description language (VHDL) is used implement the proposed filter and the design is verified using simulation. This paper discusses two optimization algorithms and resulting optimizations are incorporated into LUT layer and architecture extractions. The proposed method offers an optimized design in the form of offers average miminimizations of the number of LUT, reduction in populated slices and gate minimization for DA-finite impulse response filter. This research paves a direction towards development of bio inspired computing architectures developed without logically intensive operations, obtaining the desired specifications with respect to performance, timing, and reliability

    Design of a reusable distributed arithmetic filter and its application to the affine projection algorithm

    Get PDF
    Digital signal processing (DSP) is widely used in many applications spanning the spectrum from audio processing to image and video processing to radar and sonar processing. At the core of digital signal processing applications is the digital filter which are implemented in two ways, using either finite impulse response (FIR) filters or infinite impulse response (IIR) filters. The primary difference between FIR and IIR is that for FIR filters, the output is dependent only on the inputs, while for IIR filters the output is dependent on the inputs and the previous outputs. FIR filters also do not sur from stability issues stemming from the feedback of the output to the input that aect IIR filters. In this thesis, an architecture for FIR filtering based on distributed arithmetic is presented. The proposed architecture has the ability to implement large FIR filters using minimal hardware and at the same time is able to complete the FIR filtering operation in minimal amount of time and delay when compared to typical FIR filter implementations. The proposed architecture is then used to implement the fast affine projection adaptive algorithm, an algorithm that is typically used with large filter sizes. The fast affine projection algorithm has a high computational burden that limits the throughput, which in turn restricts the number of applications. However, using the proposed FIR filtering architecture, the limitations on throughput are removed. The implementation of the fast affine projection adaptive algorithm using distributed arithmetic is unique to this thesis. The constructed adaptive filter shares all the benefits of the proposed FIR filter: low hardware requirements, high speed, and minimal delay.Ph.D.Committee Chair: Anderson, Dr. David V.; Committee Member: Hasler, Dr. Paul E.; Committee Member: Mooney, Dr. Vincent J.; Committee Member: Taylor, Dr. David G.; Committee Member: Vuduc, Dr. Richar

    DESIGN OF HIGH PERFORMANCE MULTIPLIERLESS LINEAR PHASE FINITE IMPULSE RESPONSE FILTERS

    Get PDF
    This research work proposes the finite impulse response (FIR) filters design using distributed arithmetic architecture optimized for field programmable gate array. To implement computationally efficient, low power, high-speed FIR filter a two-dimensional fully pipelined structure is used. The FIR filter is dynamically reconfigured to realize low pass and high pass filter by changing the filter coefficients. The FIR filter is most fundamental components in digital signal processing for high-speed application. The aim of this research work is to design multiplier-less FIR filter for the requirements of low power and high speed various embedded applications.Â

    Multi-standard programmable baseband modulator for next generation wireless communication

    Full text link
    Considerable research has taken place in recent times in the area of parameterization of software defined radio (SDR) architecture. Parameterization decreases the size of the software to be downloaded and also limits the hardware reconfiguration time. The present paper is based on the design and development of a programmable baseband modulator that perform the QPSK modulation schemes and as well as its other three commonly used variants to satisfy the requirement of several established 2G and 3G wireless communication standards. The proposed design has been shown to be capable of operating at a maximum data rate of 77 Mbps on Xilinx Virtex 2-Pro University field programmable gate array (FPGA) board. The pulse shaping root raised cosine (RRC) filter has been implemented using distributed arithmetic (DA) technique in the present work in order to reduce the computational complexity, and to achieve appropriate power reduction and enhanced throughput. The designed multiplier-less programmable 32-tap FIR-based RRC filter has been found to withstand a peak inter-symbol interference (ISI) distortion of -41 dB

    Modified DA based FIR Filter in Multirate DSP systems on FPGA

    Get PDF
    Multirate systems are popular in DSP.Systems which employ multiple sampling rates in the processing of digital signals are called Multirate DSP systems, which are used in audio, video processing and communication systems. Multirate DSP systems that employ different chips for different frequency signal results in more area and power utilization. The setback can be avoided by implementing Multirate system, based on Distributed Arithmetic FIR filter. Using such systems, we can achieve computation efficiency and improve the system performance. Modified DA based FIR Filter using Multirate systems includes decimation, interpolation process implemented on FPGA with 53% less LUT utilization compared to existing Multirate system

    FPGA implementations for parallel multidimensional filtering algorithms

    Get PDF
    PhD ThesisOne and multi dimensional raw data collections introduce noise and artifacts, which need to be recovered from degradations by an automated filtering system before, further machine analysis. The need for automating wide-ranged filtering applications necessitates the design of generic filtering architectures, together with the development of multidimensional and extensive convolution operators. Consequently, the aim of this thesis is to investigate the problem of automated construction of a generic parallel filtering system. Serving this goal, performance-efficient FPGA implementation architectures are developed to realize parallel one/multi-dimensional filtering algorithms. The proposed generic architectures provide a mechanism for fast FPGA prototyping of high performance computations to obtain efficiently implemented performance indices of area, speed, dynamic power, throughput and computation rates, as a complete package. These parallel filtering algorithms and their automated generic architectures tackle the major bottlenecks and limitations of existing multiprocessor systems in wordlength, input data segmentation, boundary conditions as well as inter-processor communications, in order to support high data throughput real-time applications of low-power architectures using a Xilinx Virtex-6 FPGA board. For one-dimensional raw signal filtering case, mathematical model and architectural development of the generalized parallel 1-D filtering algorithms are presented using the 1-D block filtering method. Five generic architectures are implemented on a Virtex-6 ML605 board, evaluated and compared. A complete set of results on area, speed, power, throughput and computation rates are obtained and discussed as performance indices for the 1-D convolution architectures. A successful application of parallel 1-D cross-correlation is demonstrated. For two dimensional greyscale/colour image processing cases, new parallel 2-D/3-D filtering algorithms are presented and mathematically modelled using input decimation and output image reconstruction by interpolation. Ten generic architectures are implemented on the Virtex-6 ML605 board, evaluated and compared. Key results on area, speed, power, throughput and computation rate are obtained and discussed as performance indices for the 2-D convolution architectures. 2-D image reconfigurable processors are developed and implemented using single, dual and quad MAC FIR units. 3-D Colour image processors are devised to act as 3-D colour filtering engines. A 2-D cross-correlator parallel engine is successfully developed as a parallel 2-D matched filtering algorithm for locating any MRI slice within a MRI data stack library. Twelve 3-D MRI filtering operators are plugged in and adapted to be suitable for biomedical imaging, including 3-D edge operators and 3-D noise smoothing operators. Since three dimensional greyscale/colour volumetric image applications are computationally intensive, a new parallel 3-D/4-D filtering algorithm is presented and mathematically modelled using volumetric data image segmentation by decimation and output reconstruction by interpolation, after simultaneously and independently performing 3-D filtering. Eight generic architectures are developed and implemented on the Virtex-6 board, including 3-D spatial and FFT convolution architectures. Fourteen 3-D MRI filtering operators are plugged and adapted for this particular biomedical imaging application, including 3-D edge operators and 3-D noise smoothing operators. Three successful applications are presented in 4-D colour MRI (fMRI) filtering processors, k-space MRI volume data filter and 3-D cross-correlator.IRAQI Government

    A Comparative Performance of Discrete Wavelet Transform Implementations Using Multiplierless

    Get PDF
    Using discrete wavelet transform (DWT) in high-speed signal-processing applications imposes a high degree of care to hardware resource availability, latency, and power consumption. In this chapter, the design aspects and performance of multiplierless DWT is analyzed. We presented the two key multiplierless approaches, namely the distributed arithmetic algorithm (DAA) and the residue number system (RNS). We aim to estimate the performance requirements and hardware resources for each approach, allowing for the selection of proper algorithm and implementation of multi-level DAA- and RNS-based DWT. The design has been implemented and synthesized in Xilinx Virtex 6 ML605, taking advantage of Virtex 6’s embedded block RAMs (BRAMs)

    Multiplierless CSD techniques for high performance FPGA implementation of digital filters.

    Get PDF
    I leverage FastCSD to develop a new, high performance iterative multiplierless structure based on a novel real-time CSD recoding, so that more zero partial products are introduced. Up to 66.7% zero partial products occur compared to 50% in the traditional modified Booth's recoding. Also, this structure reduces the non-zero partial products to a minimum. As a result, the number of arithmetic operations in the carry-save structure is reduced. Thus, an overall speed-up, as well as low-power consumption can be achieved. Furthermore, because the proposed structure involves real time CSD recoding and does not require a fixed value for the multiplier input to be known a priori, the proposed multiplier can be applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters.My work is based on a dramatic new technique for converting between 2's complement and CSD number systems, and results in high-performance structures that are particularly effective for implementing adaptive systems in reconfigurable logic.My research focus is on two key ideas for improving DSP performance: (1) Develop new high performance, efficient shift-add techniques ("multiplierless") to implement the multiply-add operations without the need for a traditional multiplier structure. (2) There is a growing trend toward design prototyping and even production in FPGAs as opposed to dedicated DSP processors or ASICs; leverage this trend synergistically with the new multiplierless structures to improve performance.Implementation of digital signal processing (DSP) algorithms in hardware, such as field programmable gate arrays (FPGAs), requires a large number of multipliers. Fast, low area multiply-adds have become critical in modern commercial and military DSP applications. In many contemporary real-time DSP and multimedia applications, system performance is severely impacted by the limitations of currently available speed, energy efficiency, and area requirement of an onboard silicon multiplier.I also introduce a new multi-input Canonical Signed Digit (CSD) multiplier unit, which requires fewer shift/add/subtract operations and reduced CSD number conversion overhead compared to existing techniques. This results in reduced power consumption and area requirements in the hardware implementation of DSP algorithms. Furthermore, because all the products are produced simultaneously, the multiplication speed and thus the throughput are improved. The multi-input multiplier unit is applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters. The implementation cost of these digital filters can be further reduced by limiting the wordlength of the input signal with little or no sacrifice to the filter performance, which is confirmed by my simulation results. The proposed multiplier unit can also be applied to other DSP algorithms, such as digital filter banks or matrix and vector multiplications.Finally, the tradeoff between filter order and coefficient length in the design and implementation of high-performance filters in Field Programmable Gate Arrays (FPGAs) is discussed. Non-minimum order FIR filters are designed for implementation using Canonical Signed Digit (CSD) multiplierless implementation techniques. By increasing the filter order, the length of the coefficients can be decreased without reducing the filter performance. Thus, an overall hardware savings can be achieved.Adaptive system implementations require real-time conversion of coefficients to Canonical Signed Digit (CSD) or similar representations to benefit from multiplierless techniques for implementing filters. Multiplierless approaches are used to reduce the hardware and increase the throughput. This dissertation introduces the first non-iterative hardware algorithm to convert 2's complement numbers to their CSD representations (FastCSD) using a fixed number of shift and logic operations. As a result, the power consumption and area requirements required for hardware implementation of DSP algorithms in which the coefficients are not known a priori can be greatly reduced. Because all CSD digits are produced simultaneously, the conversion speed and thus the throughput are improved when compared to overlap-and-scan techniques such as Booth's recoding
    • …
    corecore