3 research outputs found

ASIC Implementation of Multiplexer Based DAA

Authors: B Sahayajenila, D Srimathi, M Malarvizhi, P Santhini
Publication date: 03/04/2020

ABSTRACT: In digital image processing (DIP), point, line, and edge detection are usually performed in software. The proposed architecture performs these operations in hardware using distributed arithmetic (DA). DA has been widely used to implement inner product computations with fixed inputs. Conventional ROM-based DA suffers from large ROM requirements. To reduce the memory requirements, adder-based DA uses a predefined structure for computation. However, both methods are suitable only if at least one input is constant. This project aims to implement a new DA architecture for point, line, and edge detection in DIP when both inputs are variable. The new architecture is termed multiplexer-based distributed arithmetic (MUX-based DA). It combines multiplexers and DA to compute inner products when both inputs are variable. In addition, it reduces the ROM requirement and the complexity of constructing an adder-based architecture for higher-order inputs. The performance of the proposed architecture is compared with ROM-based DA, adder-based DA, and a multiplier-based implementation. The MUX-based DA reduces power by up to 81% and needs 40% of the area compared with the multiplier-based implementation.
KEYWORDS: ROM-based DA, adder-based DA, multiplexer-based DA, Cadence 180 nm technology.
I. INTRODUCTION
Distributed arithmetic (DA) has been widely adopted for its computational efficiency in many digital signal processing applications. The most frequently used form of computation in digital signal processing is the sum of products, that is, dot-product or inner-product generation. DA is generally a bit-serial computation operation that forms the inner product of two vectors in one clock cycle.
Typical applications include the DCT, DFT (Discrete Fourier Transform), FIR (Finite Impulse Response) filters, and DHT (Discrete Hartley Transform), which can be found in mainstream multimedia standards and telecommunication protocols. The advantage of DA is its multiplier-free mechanization, which replaces multiplications with additions and therefore simplifies the hardware implementation. The idea behind conventional (ROM-based) DA is to replace multiplication operations by pre-computing all possible values and storing them in a ROM. Adder-based DA uses a fixed architecture, obtained by distributing the fixed operand, for inner product computation. The DA technique distributes arithmetic operations rather than lumping them as multipliers do. ROM-based DA decomposes the variable input of the inner product to the bit level to generate pre-computed data. It stores the pre-computed data in a ROM table, which makes it regular and efficient in silicon area in a VLSI implementation. However, as the size of the inner product increases, the ROM area grows exponentially and becomes impractically large, even with ROM partitioning. In contrast, adder-based DA decomposes the other operand of the inner product to the bit level, distributes the multiplication operation, and shares the common summation terms. Adder-based DA exploits the distribution of binary value patterns and can maximize hardware sharing in the implementation. Although adder-based DA requires less hardware area and a shorter computation cycle time than ROM-based DA, both existing methods require one input to be fixed, whereas the proposed MUX-based DA computes the result with both inputs variable, as a multiply-accumulate (MAC) unit does.
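The ROM-based scheme described above can be sketched in software. This is a minimal illustration, not the paper's design: it assumes unsigned inputs of a fixed bit width, and the names `build_rom` and `rom_da_inner_product` are hypothetical. The table holds every possible sum of the fixed coefficients, and one bit of each input per step forms the table address.

```python
def build_rom(coeffs):
    """Precompute, for every N-bit address, the sum of the selected coefficients.

    The table has 2**N entries, which is the exponential ROM growth
    mentioned in the text.
    """
    n = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << n)
    ]


def rom_da_inner_product(coeffs, xs, width=8):
    """Inner product of fixed `coeffs` with unsigned `width`-bit inputs `xs`.

    Assumption for this sketch: inputs are unsigned; signed inputs would
    need a correction step for the sign bit.
    """
    rom = build_rom(coeffs)
    acc = 0
    for bit in range(width):
        # Build the ROM address from bit `bit` of every input.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> bit) & 1) << i
        # Look up the partial sum and weight it by 2**bit (a shift in hardware).
        acc += rom[addr] << bit
    return acc


coeffs = [3, -1, 4, 2]
xs = [5, 7, 0, 255]
print(rom_da_inner_product(coeffs, xs))           # DA result
print(sum(c * x for c, x in zip(coeffs, xs)))     # direct reference
```

No multiplication appears in the DA path, only table lookups, shifts, and additions, at the cost of a 16-entry table for four coefficients.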
A direct implementation of the filter requires a large number of resources. Distributed arithmetic was introduced to reduce them by replacing multiplications with additions and shifts. The proposed DA algorithm uses multiplexers to eliminate the ROM and the complexity of constructing a fixed architecture for higher-order inputs. The proposed MUX based D

CiteSeerX

Matrices cellulaires reconfigurables en point flottant dédiées au traitement des signaux (Reconfigurable floating-point cellular arrays dedicated to signal processing)

Author: El Ghali Nabil
Publication date: 01/08/2011

ABSTRACT: Scalar processors are the most commonly used today for digital signal processing, in contrast with array processors, which nevertheless offer higher computation speed thanks to a parallel architecture that handles large amounts of data in real time. Many cellular array architectures exist. However, the vast majority are highly specialized for the computation of one or two signal processing functions, and only a few are reconfigurable to handle most signal processing functions. This thesis presents the architecture of an array processor built from complex computation cells named Universal Processing Modules (UPMs). This array processor may serve as an intellectual property (IP) block to be used in an FPGA for signal processing. The same UPM arrays are reconfigured to perform most digital signal processing (DSP) operations, including recursive and non-recursive adaptive filtering and spectral analysis functions. The processor can be reconfigured to compute transforms, adaptive filters, lattice filters, function generation, correlations, and recursive functions, all at high speed. For greater accuracy, the processor is designed to operate in floating-point arithmetic. To enable the computation of recursive functions, the UPM is built with a recursion control module. The UPM can also be cascaded indefinitely to increase the filtering order.
The software design of 2x2 and 6x4 UPM arrays, programmed in Verilog HDL, is simulated and tested with the same cells reconfigured to compute DSP algorithms such as adaptive filtering, spectral analysis, and recursive functions. The same array of cells is also simulated in MATLAB Simulink under different configurations. The processor is tested with all proposed reconfigurations and offers acceptable computing precision.

PolyPublie

Generic low power reconfigurable distributed arithmetic processor

Author: Liu Zhenyu
Publication venue: The University of Edinburgh
Publication date: 01/01/2009

Higher performance, lower cost, ever-smaller integrated circuit components, and higher chip packaging density are ongoing goals of the microelectronics and computer industries. As these goals are being achieved, however, power consumption and flexibility are increasingly becoming bottlenecks that need to be addressed with new technology in Very Large-Scale Integrated (VLSI) design. Modern systems require more energy to support the computational capability demanded by growing requirements, and these requirements drive changes in standards not only in audio and video broadcasting but also in communication, such as wireless connections and network protocols. High flexibility and low power consumption pull in opposite directions, but combining them in one system is the ultimate goal of designers. A generic domain-specific low-power reconfigurable processor for the distributed arithmetic algorithm is presented in this dissertation. This domain-specific reconfigurable processor offers high efficiency in terms of area, power, and delay, approaching the performance of an ASIC design while retaining the flexibility of programmable platforms. The architecture not only supports the typical distributed arithmetic algorithms found in most still picture compression and video conferencing standards, but can also implement other distributed arithmetic algorithms found in digital signal processing, telecommunication protocols, and automatic control. The processor includes a simple reconfigurable low-power control unit with good performance in area, power, and timing. The generic nature of this control unit makes it applicable to any small or medium-sized finite state machine, which can serve as a control unit to implement complex system behaviour and can be found in almost all engineering disciplines.
Furthermore, to map target applications efficiently onto the proposed architecture, a new algorithm is introduced that searches for the best set of common sharing terms and keeps the area and power consumption of the implementation low. The software implementation of this algorithm is presented; it can be used not only for the architecture proposed in this dissertation but also for any implementation of adder-based distributed arithmetic algorithms. In addition, several low-power design techniques are applied in the architecture, such as an unsymmetrical design style that includes unsymmetrical interconnection arrangement, unsymmetrical PTB selection, and unsymmetrical mapping of basic computing units. Together, these design techniques achieve extraordinary power savings, and it is believed that they can be extended to other low-power designs and architectures. The processor presented in this dissertation can be used to implement complex, high-performance distributed arithmetic algorithms for communication and image processing applications at low cost in area and power compared with traditional methods.
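As a rough illustration of the adder-based distributed arithmetic mentioned in this abstract, the sketch below decomposes fixed coefficients into bits and, for each bit position, adds only the inputs whose coefficient has that bit set. It is not the dissertation's algorithm: the function name `adder_da_inner_product` is hypothetical, unsigned coefficients are assumed, and the search for the best set of common sharing terms is not implemented.

```python
def adder_da_inner_product(coeffs, xs, coeff_width=8):
    """Inner product with fixed, unsigned `coeff_width`-bit coefficients.

    Assumption for this sketch: coefficients are non-negative integers
    that fit in `coeff_width` bits. Inputs `xs` may be any integers.
    """
    acc = 0
    for bit in range(coeff_width):
        # In hardware this sum is a fixed adder network, because the
        # coefficient bits (and so the selected inputs) never change.
        partial = sum(x for c, x in zip(coeffs, xs) if (c >> bit) & 1)
        # Weight the partial sum by 2**bit (a wired shift in hardware).
        acc += partial << bit
    return acc


coeffs = [3, 5, 6]   # binary 011, 101, 110
xs = [10, -2, 7]
print(adder_da_inner_product(coeffs, xs))        # adder-based result
print(sum(c * x for c, x in zip(coeffs, xs)))    # direct reference
```

When the same group of inputs is selected at more than one bit position, the corresponding sum could be computed once and reused, which is the kind of common-term sharing the abstract refers to.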

Edinburgh Research Archive