86 research outputs found

    Multiplierless CSD techniques for high performance FPGA implementation of digital filters.

    Get PDF
    I leverage FastCSD to develop a new, high performance iterative multiplierless structure based on a novel real-time CSD recoding, so that more zero partial products are introduced. Up to 66.7% zero partial products occur compared to 50% in the traditional modified Booth's recoding. Also, this structure reduces the non-zero partial products to a minimum. As a result, the number of arithmetic operations in the carry-save structure is reduced. Thus, an overall speed-up, as well as low-power consumption can be achieved. Furthermore, because the proposed structure involves real time CSD recoding and does not require a fixed value for the multiplier input to be known a priori, the proposed multiplier can be applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters.My work is based on a dramatic new technique for converting between 2's complement and CSD number systems, and results in high-performance structures that are particularly effective for implementing adaptive systems in reconfigurable logic.My research focus is on two key ideas for improving DSP performance: (1) Develop new high performance, efficient shift-add techniques ("multiplierless") to implement the multiply-add operations without the need for a traditional multiplier structure. (2) There is a growing trend toward design prototyping and even production in FPGAs as opposed to dedicated DSP processors or ASICs; leverage this trend synergistically with the new multiplierless structures to improve performance.Implementation of digital signal processing (DSP) algorithms in hardware, such as field programmable gate arrays (FPGAs), requires a large number of multipliers. Fast, low area multiply-adds have become critical in modern commercial and military DSP applications. In many contemporary real-time DSP and multimedia applications, system performance is severely impacted by the limitations of currently available speed, energy efficiency, and area requirement of an onboard silicon multiplier.I also introduce a new multi-input Canonical Signed Digit (CSD) multiplier unit, which requires fewer shift/add/subtract operations and reduced CSD number conversion overhead compared to existing techniques. This results in reduced power consumption and area requirements in the hardware implementation of DSP algorithms. Furthermore, because all the products are produced simultaneously, the multiplication speed and thus the throughput are improved. The multi-input multiplier unit is applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters. The implementation cost of these digital filters can be further reduced by limiting the wordlength of the input signal with little or no sacrifice to the filter performance, which is confirmed by my simulation results. The proposed multiplier unit can also be applied to other DSP algorithms, such as digital filter banks or matrix and vector multiplications.Finally, the tradeoff between filter order and coefficient length in the design and implementation of high-performance filters in Field Programmable Gate Arrays (FPGAs) is discussed. Non-minimum order FIR filters are designed for implementation using Canonical Signed Digit (CSD) multiplierless implementation techniques. By increasing the filter order, the length of the coefficients can be decreased without reducing the filter performance. Thus, an overall hardware savings can be achieved.Adaptive system implementations require real-time conversion of coefficients to Canonical Signed Digit (CSD) or similar representations to benefit from multiplierless techniques for implementing filters. Multiplierless approaches are used to reduce the hardware and increase the throughput. This dissertation introduces the first non-iterative hardware algorithm to convert 2's complement numbers to their CSD representations (FastCSD) using a fixed number of shift and logic operations. As a result, the power consumption and area requirements required for hardware implementation of DSP algorithms in which the coefficients are not known a priori can be greatly reduced. Because all CSD digits are produced simultaneously, the conversion speed and thus the throughput are improved when compared to overlap-and-scan techniques such as Booth's recoding

    Evolutionary design of digital VLSI hardware

    Get PDF

    Linear-Phase FIR Digital Filter Design with Reduced Hardware Complexity using Extremal Optimization

    Get PDF
    Extremal Optimization is a recent method for solving hard optimization problems. It has been successfully applied on many optimization problems. Extremal optimization does not share the disadvantage of most of the other evolutionary algorithms, which is the tendency to converge into local minima. Design of finite word length FIR filters using deterministic techniques can guarantee optimality at the expense of exponential increase in computational complexity. Alternatively, Evolutionary Algorithms are capable of converging very fast to a minimum, but have higher chances of failure if the ratio of feasible solutions is very less in the search space. In this thesis, a set of feasible solutions are determined by linear programming. In the second step, Extremal Optimization is used to further refine these results. This strategy helps by reducing the search space for the EO algorithm and is able to find good solutions in much shorter time than the existing methods

    Linear-Phase FIR Digital Filter ‎Design with Reduced Hardware Complexity using Discrete Differential Evolution

    Get PDF
    Optimal design of xed coe cient nite word length linear phase FIR digital lters for custom ICs has been the focus of research in the past decade. With the ever increasing demands for high throughput and low power circuits, the need to design lters with reduced hardware complexity has become more crucial. Multiplierless lters provide substantial saving in hardware by using a shift add network to generate the lter coe cients. In this thesis, the multiplierless lter design problem is modeled as combinatorial optimization problem and is solved using a discrete Di erential Evolution algorithm. The Di erential Evolution algorithm\u27s population representation adapted for the nite word length lter design problem is developed and the mutation operator is rede ned for discrete valued parameters. Experiments show that the method is able to design lters up to a length of 300 taps with reduced hardware and shorter design times

    Joint Optimization of Low-power DCT Architecture and Effcient Quantization Technique for Embedded Image Compression

    Get PDF
    International audienceThe Discrete Cosine Transform (DCT)-based image com- pression is widely used in today's communication systems. Signi cant research devoted to this domain has demonstrated that the optical com- pression methods can o er a higher speed but su er from bad image quality and a growing complexity. To meet the challenges of higher im- age quality and high speed processing, in this chapter, we present a joint system for DCT-based image compression by combining a VLSI archi- tecture of the DCT algorithm and an e cient quantization technique. Our approach is, rstly, based on a new granularity method in order to take advantage of the adjacent pixel correlation of the input blocks and to improve the visual quality of the reconstructed image. Second, a new architecture based on the Canonical Signed Digit and a novel Common Subexpression Elimination technique is proposed to replace the constant multipliers. Finally, a recon gurable quantization method is presented to e ectively save the computational complexity. Experimental results obtained with a prototype based on FPGA implementation and com- parisons with existing works corroborate the validity of the proposed optimizations in terms of power reduction, speed increase, silicon area saving and PSNR improvement

    Synthesis of reconfigurable multiplier blocks: part I: fundamentals

    Get PDF
    Reconfigurable Multiplier Blocks (ReMB) offer significant area, delay and possibly power reduction in time multiplexed implementation of multiple constant multiplications. This paper and its companion paper (subtitled Part II- Algorithm) together present a systematic synthesis method for Single Input Single Output (SISO) and Single Input Multiple Output (SIMO) ReMB designs. This paper presents the necessary foundation and terminology needed for developing a systematic synthesis technique. The companion paper illustrates the synthesis method through examples. The method proposed achieves reduced logic-depth and area over standard multipliers / multiplier blocks

    A FRAMEWORK FOR OPTIMAL DESIGN OF LOW-POWER FIR FILTERS

    Get PDF
    Approximate Computing has emerged as a new low-power design approach for application domains characterized by intrinsic error resilience. Digital Signal Processing (DSP) is one such domain where outputs of acceptable quality can be produced even though the internal computations are carried out in an approximate manner. With the ever increasing need for data rates at lower power usage; the need for improved complexity reduction schemes for DSP systems continues. One of the most widely performed steps in DSP is FIR filtering. FIR filters are preferred due to their linea
    • …
    corecore