34 research outputs found

    Multiplierless CSD techniques for high performance FPGA implementation of digital filters.

    Get PDF
    I leverage FastCSD to develop a new, high performance iterative multiplierless structure based on a novel real-time CSD recoding, so that more zero partial products are introduced. Up to 66.7% zero partial products occur compared to 50% in the traditional modified Booth's recoding. Also, this structure reduces the non-zero partial products to a minimum. As a result, the number of arithmetic operations in the carry-save structure is reduced. Thus, an overall speed-up, as well as low-power consumption can be achieved. Furthermore, because the proposed structure involves real time CSD recoding and does not require a fixed value for the multiplier input to be known a priori, the proposed multiplier can be applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters.My work is based on a dramatic new technique for converting between 2's complement and CSD number systems, and results in high-performance structures that are particularly effective for implementing adaptive systems in reconfigurable logic.My research focus is on two key ideas for improving DSP performance: (1) Develop new high performance, efficient shift-add techniques ("multiplierless") to implement the multiply-add operations without the need for a traditional multiplier structure. (2) There is a growing trend toward design prototyping and even production in FPGAs as opposed to dedicated DSP processors or ASICs; leverage this trend synergistically with the new multiplierless structures to improve performance.Implementation of digital signal processing (DSP) algorithms in hardware, such as field programmable gate arrays (FPGAs), requires a large number of multipliers. Fast, low area multiply-adds have become critical in modern commercial and military DSP applications. In many contemporary real-time DSP and multimedia applications, system performance is severely impacted by the limitations of currently available speed, energy efficiency, and area requirement of an onboard silicon multiplier.I also introduce a new multi-input Canonical Signed Digit (CSD) multiplier unit, which requires fewer shift/add/subtract operations and reduced CSD number conversion overhead compared to existing techniques. This results in reduced power consumption and area requirements in the hardware implementation of DSP algorithms. Furthermore, because all the products are produced simultaneously, the multiplication speed and thus the throughput are improved. The multi-input multiplier unit is applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters. The implementation cost of these digital filters can be further reduced by limiting the wordlength of the input signal with little or no sacrifice to the filter performance, which is confirmed by my simulation results. The proposed multiplier unit can also be applied to other DSP algorithms, such as digital filter banks or matrix and vector multiplications.Finally, the tradeoff between filter order and coefficient length in the design and implementation of high-performance filters in Field Programmable Gate Arrays (FPGAs) is discussed. Non-minimum order FIR filters are designed for implementation using Canonical Signed Digit (CSD) multiplierless implementation techniques. By increasing the filter order, the length of the coefficients can be decreased without reducing the filter performance. Thus, an overall hardware savings can be achieved.Adaptive system implementations require real-time conversion of coefficients to Canonical Signed Digit (CSD) or similar representations to benefit from multiplierless techniques for implementing filters. Multiplierless approaches are used to reduce the hardware and increase the throughput. This dissertation introduces the first non-iterative hardware algorithm to convert 2's complement numbers to their CSD representations (FastCSD) using a fixed number of shift and logic operations. As a result, the power consumption and area requirements required for hardware implementation of DSP algorithms in which the coefficients are not known a priori can be greatly reduced. Because all CSD digits are produced simultaneously, the conversion speed and thus the throughput are improved when compared to overlap-and-scan techniques such as Booth's recoding

    A novel 2D filter design methodology for heterogeneous devices

    No full text
    Published versio

    Design and implementation of digital wave filter adaptors

    Get PDF

    Application of Neural Networks with CSD Coefficients for Human Face Recognition

    Get PDF
    Face recognition is one of the most popular, reliable and widely used applications in real world. It is the main biometric used by humans in many security, law enforcement and commercial systems and high demand of this application attracts researchers from various fields such as image processing, pattern recognition, neural network and computer vision etc. In a Human Face Recognition Systems, we start with pre-processing of the data followed by feature extraction for dimensionality reduction and then classification. In this thesis, neural network classifier with CSD coefficients is used to make the area required for implementation of recognition system more efficient. The FPGA implementation of the proposed technique indicates almost 50% saving in the area required for face recognition application by using neural network classifier with CSD coefficients while the processing speed is improved in comparison to its binary counterpart. Extensive experimental results were conducted to show the utility of the proposed technique

    High sample-rate Givens rotations for recursive least squares

    Get PDF
    The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using standard arithmetic operators, add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic considered. New redundant multiplier architectures are presented enabling reductions in area of up to 25%, whilst maintaining low delay. A technique is presented enabling the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and O.35pm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application specific chip could offer up to 1,500 times the computation obtained from a single OP DSP chip

    Low Power Digital Filter Implementation in FPGA

    Get PDF
    Digital filters suitable for hearing aid application on low power perspective have been developed and implemented in FPGA in this dissertation. Hearing aids are primarily meant for improving hearing and speech comprehensions. Digital hearing aids score over their analog counterparts. This happens as digital hearing aids provide flexible gain besides facilitating feedback reduction and noise elimination. Recent advances in DSP and Microelectronics have led to the development of superior digital hearing aids. Many researchers have investigated several algorithms suitable for hearing aid application that demands low noise, feedback cancellation, echo cancellation, etc., however the toughest challenge is the implementation. Furthermore, the additional constraints are power and area. The device must consume as minimum power as possible to support extended battery life and should be as small as possible for increased portability. In this thesis we have made an attempt to investigate possible digital filter algorithms those are hardware configurable on low power view point. Suitability of decimation filter for hearing aid application is investigated. In this dissertation decimation filter is implemented using ‘Distributed Arithmetic’ approach.While designing this filter, it is observed that, comb-half band FIR-FIR filter design uses less hardware compared to the comb-FIR-FIR filter design. The power consumption is also less in case of comb-half band FIR-FIR filter design compared to the comb-FIR-FIR filter. This filter is implemented in Virtex-II pro board from Xilinx and the resource estimator from the system generator is used to estimate the resources. However ‘Distributed Arithmetic’ is highly serial in nature and its latency is high; power consumption found is not very low in this type of filter implementation. So we have proceeded for ‘Adaptive Hearing Aid’ using Booth-Wallace tree multiplier. This algorithm is also implemented in FPGA and power calculation of the whole system is done using Xilinx Xpower analyser. It is observed that power consumed by the hearing aid with Booth-Wallace tree multiplier is less than the hearing aid using Booth multiplier (about 25%). So we can conclude that the hearing aid using Booth-Wallace tree multiplier consumes less power comparatively. The above two approached are purely algorithmic approach. Next we proceed to combine circuit level VLSI design and with algorithmic approach for further possible reduction in power. A MAC based FDF-FIR filter (algorithm) that uses dual edge triggered latch (DET) (circuit) is used for hearing aid device. It is observed that DET based MAC FIR filter consumes less power than the traditional (single edge triggered, SET) one (about 41%). The proposed low power latch provides a power saving upto 65% in the FIR filter. This technique consumes less power compared to previous approaches that uses low power technique only at algorithmic abstraction level. The DET based MAC FIR filter is tested for real-time validation and it is observed that it works perfectly for various signals (speech, music, voice with music). The gain of the filter is tested and is found to be 27 dB (maximum) that matches with most of the hearing aid (manufacturer’s) specifications. Hence it can be concluded that FDF FIR digital filter in conjunction with low power latch is a strong candidate for hearing aid application

    Evolvable hardware platform for fault-tolerant reconfigurable sensor electronics

    Get PDF

    Techniques for Efficient Implementation of FIR and Particle Filtering

    Full text link

    An iterated tabu search algorithm for the design of fir filters

    Get PDF
    « RÉSUMÉ : Les systèmes modernes de télécommunication sans fils occupent une place majeure dans la société actuelle. Dans les dernières années, la complexité des outils qui en découlent n'a cessé d'augmenter car, en plus de prendre en charge les tâches basiques de communication vocale, ceux-ci doivent également supporter une quantité croissante de modules et d'applications parallèles (connexion internet, capture vidéo, guidage par satellite, etc.). En conséquence, l'évolution rapide subie par ces outils qui, dans la majorité des cas, sont alimentés par batteries, a singulièrement accru l'importance du rôle joué par la consommation énergétique, et a ainsi fait de l'efficacité énergétique et de l'informatique éco-responsable des caractéristiques essentielles dans les développements récents de la micro-éléctronique. Afin d'offrir une solution à ces problèmes énergétiques, une partie des recherches s'est focalisée sur la conception de filtres numériques efficaces. Les filtres numériques sont la pierre angulaire de tous les systèmes de traitement de signal numérique. Chaque filtre est implanté par un circuit intégré, qui, lui-même, est composé d'une liste d'éléments de base incluant des additionneurs, des multiplicateurs, des inverseurs, etc. La piste principale suivie par les chercheurs dans le but de réduire la quantité d'énergie consommée par les filtres numériques propose de remplacer les multiplicateurs dans les circuits par des éléments moins énergivores, tels que des additionneurs, des décaleurs et des inverseurs. L'objectif des méthodes introduites dans ce sens consiste généralement à remplacer les multiplicateurs tout en utilisant le moins d'additionneurs possible. En effet, en l'absence de multiplicateurs dans les circuits, les additionneurs deviennent l'élément le plus demandant en ressource énergétique. Dans les faits, la quantité d'additionneurs contenue dans un circuit sans multiplicateurs, aussi connue comme son coût en additionneurs, est communément utilisée afin d'estimer sa consommation énergétique. Nos travaux se concentrent sur la conception de filtres numériques sans multiplicateurs énergétiquement efficaces. Ils se décomposent en deux contributions majeures: un nouveau modèle de représentation efficace des circuits intégrés, et un algorithme innovateur destiné à la conception de filtres numériques efficaces. Dans un premier temps, notre modélisation des circuits sous la forme de graphes pondérés a l'avantage d'offrir une représentation concise des circuits intégrés, tout en annulant la symétrie présente dans les modèles de représentation actuels.Dans un second temps, notre métaheuristique, qui combine à la fois une recherche tabou et une recherche tabou itérée, offre un contrôle direct du niveau d'énergie consommée par le circuit qu'elle construit, en fixant la quantité d'additionneurs qu'il contient avant le démarrage du processus de conception. En outre, contrairement aux méthodes existantes, notre approche ne se réfère à aucune architecture spécifique afin de concevoir un circuit. Ce degré de liberté permet à notre méthode d'atteindre une optimisation plus globale de la structure du circuit en comparaison des autres méthodes et, ainsi, de posséder un contrôle plus précis de sa consommation énergétique. L'algorithme proposé est testé sur un jeu de données contenant plus de 700 filtres de complexité variée. Les résultats obtenus démontrent les performances élevées de notre approche car, en se basant sur le coût en additionneurs, dans plus de 99% des cas, notre méthode conçoit des filtres numériques avec un niveau de consommation énergétique total équivalent au niveau induit uniquement par l'architecture à laquelle les méthodes actuelles se réfèrent. En parallèle, notre méthode fournit également un meilleur contrôle de la longueur de mot interne dans les circuits, qui représente un autre aspect crucial de leur efficacité énergétique. La comparaison avec l'algorithme Heuristic cumulative benefit (Hcub) qui, à ce jour, est la méthode la plus performante montre que les filtres construits par notre algorithme utilisent 55% moins d'additionneurs que Hcub, tout en réduisant la taille de ces additionneurs de 33%. Ces améliorations sont obtenues au simple coût d'une augmentation de 17% du nombre de délais dans les circuits. Cependant, la consommation énergétique d'un délai étant de l'ordre de 20% de celle d'un additionneur, si l'on considère le nombre et la taille des additionneurs ainsi que la quantité de délais inclus dans nos circuits afin d'estimer leur consommation énergétique, on peut s'attendre à une économie globale de l'ordre de 65% en comparaison de la meilleure méthode actuelle.»----------«ABSTRACT : In today's modern society, we rely on wireless telecommunication devices that use applications and modules to perform many different tasks and are growing in their complexity day by day. Consequently, the fast evolution of these devices, which, most of the time, are battery-powered, drastically increased the importance of their energy consumption and made energy efficiency and green computing essential features of recent developments in microelectronics. To deal with the related issues, many researchers have focused their attention to designing energy-efficient digital filters, which are essential building blocks of all digital signal processing systems. Any digital filter is implemented by an integrated circuit composed by a list of basic elements, including adders, multipliers, shifts, etc. One of the paths that researchers have followed in order to decrease the amount of energy used by the integrated circuits was to replace the multipliers in the circuit structure with less energy-consuming elements such as adders, shifts and inverters. The goal of these methods is usually to perform the replacement of multipliers while using the least amount of adders, as, for multiplierless circuits, adders become the most energy-consuming elements. In fact, the quantity of adders contained in a multiplierless circuit, also known as its adder cost, is commonly used as an estimate of its power consumption. In our research we focus on energy-efficient multiplierless filters. Our work has two main contributions: a new model to efficiently represent integrated circuits, and an innovative algorithm to design efficient digital filters. On one hand, the main advantage of our new graph-based model is that it is able to represent any integrated circuit in a concise form, while avoiding symmetry in the representation. On the other hand, our metaheuristic, that combines both a tabu search and an iterated tabu search, offers a direct control of the level of energy consumed by the circuits it constructs, by fixing the number of adders that they contain. Besides, unlike other existing methods used for designing multiplierless filters, our approach does not refer to any specific architecture in the corresponding circuit structure. This degree of freedom allows our method to have a more globalized view on the optimization of circuit structure compared to the other methods, and thus, a better control on its power consumption. The proposed algorithm is tested on a benchmark containing more than 700 filters of different orders of complexity. The obtained results demonstrate the high accuracy of the proposed approach as, based on the adder cost estimation, in more than 99%99\% of the cases our method designs integrated circuits with a level of energy consumption equivalent to those implied only by the most accurate circuit architectures from which existing algorithms build their circuits, and absolutely no deviation from the desired filtering specifications. In parallel, our method also provides a better control of the internal wordlength in the circuits, which is another crucial point to improve the energy-efficiency. The comparison to the current state-of-the-art algorithm Heuristic cumulative benefit (Hcub) when designing all the benchmark filters shows that filters constructed with our algorithm are using 55% less adders than Hcub, while decreasing their size by 33%. This improvement can be reached at the cost of an increase of 17% in the number of delays in the circuits. However, by considering the number and the size of adders used in the circuit as well as the quantity of delays it contains as an estimate of the power consumed by the circuit, assuming that the energy consumption of a delay is in the order of 20% of the consumption of an adder, we can approximately expect an overall energy saving of 65% in our circuits compared to the best current method
    corecore