Search CORE

7 research outputs found

FIR filter optimization for video processing on FPGAs

Author
Publication venue: Springer
Publication date: 25/05/2013
Field of study

Multiplierless CSD techniques for high performance FPGA implementation of digital filters.

Author: Wang Yunhua.
Publication venue
Publication date: 01/01/2007
Field of study

I leverage FastCSD to develop a new, high performance iterative multiplierless structure based on a novel real-time CSD recoding, so that more zero partial products are introduced. Up to 66.7% zero partial products occur compared to 50% in the traditional modified Booth's recoding. Also, this structure reduces the non-zero partial products to a minimum. As a result, the number of arithmetic operations in the carry-save structure is reduced. Thus, an overall speed-up, as well as low-power consumption can be achieved. Furthermore, because the proposed structure involves real time CSD recoding and does not require a fixed value for the multiplier input to be known a priori, the proposed multiplier can be applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters.My work is based on a dramatic new technique for converting between 2's complement and CSD number systems, and results in high-performance structures that are particularly effective for implementing adaptive systems in reconfigurable logic.My research focus is on two key ideas for improving DSP performance: (1) Develop new high performance, efficient shift-add techniques ("multiplierless") to implement the multiply-add operations without the need for a traditional multiplier structure. (2) There is a growing trend toward design prototyping and even production in FPGAs as opposed to dedicated DSP processors or ASICs; leverage this trend synergistically with the new multiplierless structures to improve performance.Implementation of digital signal processing (DSP) algorithms in hardware, such as field programmable gate arrays (FPGAs), requires a large number of multipliers. Fast, low area multiply-adds have become critical in modern commercial and military DSP applications. In many contemporary real-time DSP and multimedia applications, system performance is severely impacted by the limitations of currently available speed, energy efficiency, and area requirement of an onboard silicon multiplier.I also introduce a new multi-input Canonical Signed Digit (CSD) multiplier unit, which requires fewer shift/add/subtract operations and reduced CSD number conversion overhead compared to existing techniques. This results in reduced power consumption and area requirements in the hardware implementation of DSP algorithms. Furthermore, because all the products are produced simultaneously, the multiplication speed and thus the throughput are improved. The multi-input multiplier unit is applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters. The implementation cost of these digital filters can be further reduced by limiting the wordlength of the input signal with little or no sacrifice to the filter performance, which is confirmed by my simulation results. The proposed multiplier unit can also be applied to other DSP algorithms, such as digital filter banks or matrix and vector multiplications.Finally, the tradeoff between filter order and coefficient length in the design and implementation of high-performance filters in Field Programmable Gate Arrays (FPGAs) is discussed. Non-minimum order FIR filters are designed for implementation using Canonical Signed Digit (CSD) multiplierless implementation techniques. By increasing the filter order, the length of the coefficients can be decreased without reducing the filter performance. Thus, an overall hardware savings can be achieved.Adaptive system implementations require real-time conversion of coefficients to Canonical Signed Digit (CSD) or similar representations to benefit from multiplierless techniques for implementing filters. Multiplierless approaches are used to reduce the hardware and increase the throughput. This dissertation introduces the first non-iterative hardware algorithm to convert 2's complement numbers to their CSD representations (FastCSD) using a fixed number of shift and logic operations. As a result, the power consumption and area requirements required for hardware implementation of DSP algorithms in which the coefficients are not known a priori can be greatly reduced. Because all CSD digits are produced simultaneously, the conversion speed and thus the throughput are improved when compared to overlap-and-scan techniques such as Booth's recoding

SHAREOK repository

Techniques for Efficient Implementation of FIR and Particle Filtering

Author
Publication venue: 'Linkoping University Electronic Press'
Publication date
Field of study

Crossref

Programmable architectures for the automated design of digital FIR filters using evolvable hardware

Author: Hounsell Benjamin Iain
Publication venue: The University of Edinburgh
Publication date: 01/01/2001
Field of study

Edinburgh Research Archive

Energy efficient hardware acceleration of multimedia processing tools

Author: Kinane Andrew
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 01/01/2006
Field of study

The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

Irish Universities

DCU Online Research Access Service

Design and analysis of short word length DSP systems for mobile communication

Author: Memon T
Publication venue: RMIT University
Publication date: 01/01/2012
Field of study

Recently, many general purpose DSP applications such as Least Mean Squares-Like single-bit adaptive filter algorithms have been developed using the Short Word Length (SWL) technique and have been shown to achieve similar performance as multi-bit systems. A key function in SWL systems is sigma delta modulation (ΣΔM) that operates at an over sampling ratio (OSR), in contrast to the Nyquist rate sampling typically used in conventional multi-bit systems. To date, the analysis of SWL (or single-bit) DSP systems has tended to be performed using high-level tools such as MATLAB, with little work reported relating to their hardware implementation, particularly in Field Programmable Gate Arrays (FPGAs). This thesis explores the hardware implementation of single-bit systems in FPGA using the design and implementation in VHDL of a single-bit ternary FIR-like filter as an illustrative example. The impact of varying OSR and bit-width of the SWL filter has been determined, and a comparison undertaken between the area-performance-power characteristics of the SWL FIR filter compared to its equivalent multi-bit filter. In these experiments, it was found that single-bit FIR-like filter consistently outperforms the multi-bit technique in terms of its area, performance and power except at the highest filter orders analysed in this work. At higher orders, the ΣΔ approach retains its power and performance advantages but exhibits slightly higher chip area. In the second stage of thesis, three encoding techniques called canonical signed digit (CSD), 2’s complement, and Redundant Binary Signed Digit (RBSD) were designed and investigated on the basis of area-performance in FPGA at varying OSR. Simulation results show that CSD encoding technique does not offer any significant improvement as compared to 2’s complement as in multi-bit domain. Whereas, RBSD occupies double the chip area than other two techniques and has poor performance. The stability of the single-bit FIR-like filter mainly depends upon IIR remodulator due to its recursive nature. Thus, we have investigated the stability IIR remodulator and propose a new model using linear analysis and root locus approach that takes into account the widely accepted second order sigma-delta modulator state variable upper bounds. Using proposed model we have found new feedback parameters limits that is a key parameter in single-bit IIR remodulator stability analysis. Further, an analysis of single-bit adaptive channel equalization in MATLAB has been performed, which is intended to support the design and development of efficient algorithm for single-bit channel equalization. A new mathematical model has been derived with all inputs, coefficients and outputs in single-bit domain. The model was simulated using narrowband signals in MATLAB and investigated on the basis of symbol error rate (SER), signal-to-noise ratio (SNR) and minimum mean squared error (MMSE). The results indicate that single-bit adaptive channel equalization is achievable with narrowband signals but that the harsh quantization noise has great impact in the convergence

RMIT Research Repository

Optimización de recursos hardware para la operación de convolución utilizada en el procesamiento digital de señales

Author: Moreno Moreno Carlos Diego
Publication venue: Universidad de Córdoba, Servicio de Publicaciones
Publication date: 01/01/2013
Field of study

Esta tesis presenta varias arquitecturas sobre la unidad MAC (multiplica–acumula) para la optimización de la operación de convolución, que es ampliamente utilizada en el procesamiento digital de señales, sobre varios dispositivos electrónicos de bajo coste. Básicamente esta optimización se centra en las FPGA de Xilinx Spartan 3 y Spartan 6, utilizando aritmética redundante, en particular la aritmética carry–save. Este tipo de aritmética no se suele utilizar en las FPGAs debido a que aumenta el área consumida, pero en esta investigación se ha demostrado experimentalmente que cuando el número de operaciones MAC a realizar es elevado, como es el caso de la convolución de dos señales, el uso de la aritmética CSA resulta eficiente, ya que disminuye significativamente los tiempos empleados, sin un aumento excesivo de los recursos utilizados de la FPGA. Por otro lado, también se han estudiado otros dispositivos electrónicos que suelen ser empleados en el procesamiento digital de señales, tales como DSP o GPP, realizando una comparación de los tiempos empleados de las FPGAs respecto a estos dispositivos.This Thesis presents several architectures of the multiply-accumulate unit (MAC) to optimize the convolution operation, which is widely used in digital signal processing, on several low-cost electronic devices. This optimization is mainly focused on Xilinx Spartan- 3 and Spartan-6 FPGAs, using redundant arithmetic, specifically the carry-save arithmetic (CSA). This type of arithmetic is not usually used on FPGAs since its high consumption of area resources, but this research shows that if the number of MAC operations developed is high, as the case of the convolution of two signals, the use of CSA arithmetic is efficient, since it decreases significantly the execution times without an excessive increase of the resources used in the FPGA. On the other hand, other electronic devices as DSP or GPP, usually used in digital signal processing, have been studied. A comparation of execution times on FPGAs and these devices has been included

Repositorio Institucional de la Universidad de Córdoba