1,863 research outputs found

    High throughput spatial convolution filters on FPGAs

    Get PDF
    Digital signal processing (DSP) on field- programmable gate arrays (FPGAs) has long been appealing because of the inherent parallelism in these computations that can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, included additional hard blocks, such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility

    A low-power asynchronous VLSI FIR filter

    Get PDF
    An asynchronous FIR filter, based on a Single Bit-Plane architecture with a data-dependent, dynamic-logic implementation, is presented. Its energy consumption and sample computation delay are shown to correlate approximately linearly with the total number of ones in its coeflcient-set. The proposed architecture has the property that coefficients in a Sign-Magnitude representation can be handled at negligible overhead which, for typical filter coefficient-sets, is shown to offer significant benefits to both energy consumption and throughput. Transistor level simulations show energy consumption to be lower than in previously reported designs

    Design of Multistage Decimation Filters Using Cyclotomic Polynomials: Optimization and Design Issues

    Full text link
    This paper focuses on the design of multiplier-less decimation filters suitable for oversampled digital signals. The aim is twofold. On one hand, it proposes an optimization framework for the design of constituent decimation filters in a general multistage decimation architecture. The basic building blocks embedded in the proposed filters belong, for a simple reason, to the class of cyclotomic polynomials (CPs): the first 104 CPs have a z-transfer function whose coefficients are simply {-1,0,+1}. On the other hand, the paper provides a bunch of useful techniques, most of which stemming from some key properties of CPs, for designing the proposed filters in a variety of architectures. Both recursive and non-recursive architectures are discussed by focusing on a specific decimation filter obtained as a result of the optimization algorithm. Design guidelines are provided with the aim to simplify the design of the constituent decimation filters in the multistage chain.Comment: Submitted to CAS-I, July 07; 11 pages, 5 figures, 3 table

    On the Polyphase Decomposition for Design of Generalized Comb Decimation Filters

    Full text link
    Generalized comb filters (GCFs) are efficient anti-aliasing decimation filters with improved selectivity and quantization noise (QN) rejection performance around the so called folding bands with respect to classical comb filters. In this paper, we address the design of GCF filters by proposing an efficient partial polyphase architecture with the aim to reduce the data rate as much as possible after the Sigma-Delta A/D conversion. We propose a mathematical framework in order to completely characterize the dependence of the frequency response of GCFs on the quantization of the multipliers embedded in the proposed filter architecture. This analysis paves the way to the design of multiplier-less decimation architectures. We also derive the impulse response of a sample 3rd order GCF filter used as a reference scheme throughout the paper.Comment: Submitted to IEEE TCAS-I, February 2007; 11 double-column pages, 9 figures, 1 tabl

    Wordlength determination algorithms for hardware implementation of linear time invariant systems with prescribed output accuracy

    Get PDF
    This paper proposes two novel algorithms for optimizing the hardware resources in finite wordlength implementation of linear time invariant systems. The hardware complexity is measured by the exact internal wordlength used for each intermediate data. The first algorithm formulates the design problem as a constrained optimization, from which an analytic closed-form solution of the internal wordlengths subject to a prescribed output accuracy can be determined by the Lagrange multiplier method. The second algorithm is based on a discrete optimization method called the Marginal Analysis method, and it yields the desired wordlengths in integer values. Both approaches are found to be very effective and they are well-suited to large scale systems such as software radio receivers. Design examples show that the proposed algorithms offer better results and a lower design complexity than conventional methods. © 2005 IEEE.published_or_final_versio

    Design and implementation of DA FIR filter for bio-inspired computing architecture

    Get PDF
    This paper elucidates the system construct of DA-FIR filter optimized for design of distributed arithmetic (DA) finite impulse response (FIR) filter and is based on architecture with tightly coupled co-processor based data processing units. With a series of look-up-table (LUT) accesses in order to emulate multiply and accumulate operations the constructed DA based FIR filter is implemented on FPGA. The very high speed integrated circuit hardware description language (VHDL) is used implement the proposed filter and the design is verified using simulation. This paper discusses two optimization algorithms and resulting optimizations are incorporated into LUT layer and architecture extractions. The proposed method offers an optimized design in the form of offers average miminimizations of the number of LUT, reduction in populated slices and gate minimization for DA-finite impulse response filter. This research paves a direction towards development of bio inspired computing architectures developed without logically intensive operations, obtaining the desired specifications with respect to performance, timing, and reliability

    Evolvable hardware platform for fault-tolerant reconfigurable sensor electronics

    Get PDF

    Multiplierless CSD techniques for high performance FPGA implementation of digital filters.

    Get PDF
    I leverage FastCSD to develop a new, high performance iterative multiplierless structure based on a novel real-time CSD recoding, so that more zero partial products are introduced. Up to 66.7% zero partial products occur compared to 50% in the traditional modified Booth's recoding. Also, this structure reduces the non-zero partial products to a minimum. As a result, the number of arithmetic operations in the carry-save structure is reduced. Thus, an overall speed-up, as well as low-power consumption can be achieved. Furthermore, because the proposed structure involves real time CSD recoding and does not require a fixed value for the multiplier input to be known a priori, the proposed multiplier can be applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters.My work is based on a dramatic new technique for converting between 2's complement and CSD number systems, and results in high-performance structures that are particularly effective for implementing adaptive systems in reconfigurable logic.My research focus is on two key ideas for improving DSP performance: (1) Develop new high performance, efficient shift-add techniques ("multiplierless") to implement the multiply-add operations without the need for a traditional multiplier structure. (2) There is a growing trend toward design prototyping and even production in FPGAs as opposed to dedicated DSP processors or ASICs; leverage this trend synergistically with the new multiplierless structures to improve performance.Implementation of digital signal processing (DSP) algorithms in hardware, such as field programmable gate arrays (FPGAs), requires a large number of multipliers. Fast, low area multiply-adds have become critical in modern commercial and military DSP applications. In many contemporary real-time DSP and multimedia applications, system performance is severely impacted by the limitations of currently available speed, energy efficiency, and area requirement of an onboard silicon multiplier.I also introduce a new multi-input Canonical Signed Digit (CSD) multiplier unit, which requires fewer shift/add/subtract operations and reduced CSD number conversion overhead compared to existing techniques. This results in reduced power consumption and area requirements in the hardware implementation of DSP algorithms. Furthermore, because all the products are produced simultaneously, the multiplication speed and thus the throughput are improved. The multi-input multiplier unit is applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters. The implementation cost of these digital filters can be further reduced by limiting the wordlength of the input signal with little or no sacrifice to the filter performance, which is confirmed by my simulation results. The proposed multiplier unit can also be applied to other DSP algorithms, such as digital filter banks or matrix and vector multiplications.Finally, the tradeoff between filter order and coefficient length in the design and implementation of high-performance filters in Field Programmable Gate Arrays (FPGAs) is discussed. Non-minimum order FIR filters are designed for implementation using Canonical Signed Digit (CSD) multiplierless implementation techniques. By increasing the filter order, the length of the coefficients can be decreased without reducing the filter performance. Thus, an overall hardware savings can be achieved.Adaptive system implementations require real-time conversion of coefficients to Canonical Signed Digit (CSD) or similar representations to benefit from multiplierless techniques for implementing filters. Multiplierless approaches are used to reduce the hardware and increase the throughput. This dissertation introduces the first non-iterative hardware algorithm to convert 2's complement numbers to their CSD representations (FastCSD) using a fixed number of shift and logic operations. As a result, the power consumption and area requirements required for hardware implementation of DSP algorithms in which the coefficients are not known a priori can be greatly reduced. Because all CSD digits are produced simultaneously, the conversion speed and thus the throughput are improved when compared to overlap-and-scan techniques such as Booth's recoding
    • …
    corecore