64,313 research outputs found

    Design and Analysis of 8x8 Wallace Tree Multiplier using GDI and CMOS Technology

    Full text link
    Multiplier is a small unit of an arithmetic circuit that is widely used in Digital filters, Digital Signal Processing, microprocessors and communication applications etc. In today's scenario compact and small digital devices are critical concern in the field of VLSI design, which should perform fast as well as low power consumption. Optimizing the delay, area and power of a multiplier is a major design issues, as area and speed are usually conflicting constraints. A Wallace tree multiplier is an improved version of tree base multiplier. The main aim of this paper is a reconfigurable 8x8 Wallace Tree multiplier using CMOS and GDI technology. This is efficient in power and regularity without increase in delay and area. The generation of partial products in parallel using AND gates. The addition of partial products is reducing using Wallace Tree which is divided into levels. Therefore there will be a certain reduction in the power consumption, since power is provided only to the level that is involved in computation and the remaining two levels remain off

    A suggestion for low-power current-sensing complementary pass-transistor logic interconnection

    Get PDF
    [[abstract]]In this paper, a new circuit interconnection scheme of the low-power current-sensing complementary pass-transistor logic (LCSCPTL) is proposed and analyzed. The proposed new circuit scheme using full-swing and non-full-swing output signals to control the NMOS pass transistor logic tree network. Due to the non-full-swing outputs and the current-sensing scheme, the new logic circuit scheme can improve the power dissipation and operation speed. The non-full-swing LCSCPTL is applied to the design of the parallel multiplier. The 4-2 compressors and the conditional carry selection scheme are used in this design to achieve regular layout and improve the operation speed. Moreover, the 1.2 V 8×8-bit parallel multiplier can be fabricated without changing the conventional 5 V CMOS process. The operation speed of the parallel multiplier is 32 ns for 1.2 V supply voltage.[[conferencedate]]19970609~19970612[[conferencelocation]]Hong Kon

    A new truncation algorithm of low hardware cost multiplier

    Get PDF
    Multiplier is one of the most inevitable arithmetic circuit in digital signal design. Multipliers dissipate high power and occupy significant amount of the die area. In this paper, a low-error architecture design of the pre-truncated parallel multiplier is presented. The coefficients word length has been truncated to reduce the multiplier size. This truncation scaled down the gate count and shortened the critical paths of partial product array. The statistical errors of the designed multiplier are calculated for different pre-truncate values and compared. The multiplier is implemented using Stratix III, FPGA device. The post fitting report is presented in this paper, which shows a saving of 36.9 % in resources usage, and a reduction of 17 % in propagation time delay

    CMOS design of a current-mode multiplier/divider circuit with applications to fuzzy controllers

    Get PDF
    Multiplier and divider circuits are usually required in the fields of analog signal processing and parallel-computing neural or fuzzy systems. In particular, this paper focuses on the hardware implementation of fuzzy controllers, where the divider circuit is usually the bottleneck. Multiplier/divider circuits can be implemented with a combination of A/D-D/A converters. An efficient design based on current-mode data converters is presented herein. Continuous-time algorithmic converters are chosen to reduce the control circuitry and to obtain a modular design based on a cascade of bit cells. Several circuit structures to implement these cells are presented and discussed. The one that is selected enables a better trade-off speed/power than others previously reported in the literature while maintaining a low area occupation. The resulting multiplier/divider circuit offers a low voltage operation, provides the division result in both analog and digital formats, and it is suitable for applications of low or middle resolution (up to 9 bits) like applications to fuzzy controllers. The analysis is illustrated with Hspice simulations and experimental results from a CMOS multiplier/divider prototype with 5-bit resolution. Experimental results from a CMOS current-mode fuzzy controller chip that contains the proposed design are also included

    Image Processing using Approximate Data-path Units

    Get PDF
    abstract: In this work, we present approximate adders and multipliers to reduce data-path complexity of specialized hardware for various image processing systems. These approximate circuits have a lower area, latency and power consumption compared to their accurate counterparts and produce fairly accurate results. We build upon the work on approximate adders and multipliers presented in [23] and [24]. First, we show how choice of algorithm and parallel adder design can be used to implement 2D Discrete Cosine Transform (DCT) algorithm with good performance but low area. Our implementation of the 2D DCT has comparable PSNR performance with respect to the algorithm presented in [23] with ~35-50% reduction in area. Next, we use the approximate 2x2 multiplier presented in [24] to implement parallel approximate multipliers. We demonstrate that if some of the 2x2 multipliers in the design of the parallel multiplier are accurate, the accuracy of the multiplier improves significantly, especially when two large numbers are multiplied. We choose Gaussian FIR Filter and Fast Fourier Transform (FFT) algorithms to illustrate the efficacy of our proposed approximate multiplier. We show that application of the proposed approximate multiplier improves the PSNR performance of 32x32 FFT implementation by 4.7 dB compared to the implementation using the approximate multiplier described in [24]. We also implement a state-of-the-art image enlargement algorithm, namely Segment Adaptive Gradient Angle (SAGA) [29], in hardware. The algorithm is mapped to pipelined hardware blocks and we synthesized the design using 90 nm technology. We show that a 64x64 image can be processed in 496.48 µs when clocked at 100 MHz. The average PSNR performance of our implementation using accurate parallel adders and multipliers is 31.33 dB and that using approximate parallel adders and multipliers is 30.86 dB, when evaluated against the original image. The PSNR performance of both designs is comparable to the performance of the double precision floating point MATLAB implementation of the algorithm.Dissertation/ThesisM.S. Computer Science 201

    Low-Power, Low-Cost, & High-Performance Digital Designs : Multi-bit Signed Multiplier design using 32nm CMOS Technology

    Get PDF
    Binary multipliers are ubiquitous in digital hardware. Digital multipliers along with the adders play a major role in computing, communicating, and controlling devices. Multipliers are used majorly in the areas of digital signal and image processing, central processing unit (CPU) of the computers, high-performance and parallel scientific computing, machine learning, physical layer design of the communication equipment, etc. The predominant presence and increasing demand for low-power, low-cost, and high-performance digital hardware led to this work of developing optimized multiplier designs. Two optimized designs are proposed in this work. One is an optimized 8 x 8 Booth multiplier architecture which is implemented using 32nm CMOS technology. Synthesis (pre-layout) and post-layout results show that the delay is reduced by 24.7% and 25.6% respectively, the area is reduced by 5.5% and 15% respectively, the power consumption is reduced by 21.5% and 26.6% respectively, and the area-delay-product is reduced by 28.8% and 36.8% respectively when compared to the performance results obtained for the state-of-the-art 8 x 8 Booth multiplier designed using 32nm CMOS technology with 1.05 V supply voltage at 500 MHz input frequency. Another is a novel radix-8 structure with 3-bit grouping to reduce the number of partial products along with the effective partial product reduction schemes for 8 x 8, 16 x 16, 32 x 32, and 64 x 64 signed multipliers. Comparing the performance results of the (synthesized, post-layout) designs of sizes 32 x 32, and 64 x 64 based on the simple novel radix-8 structure with the estimated performance measurements for the optimized Booth multiplier design presented in this work, reduction in delay by (2.64%, 0.47%) and (2.74%, 18.04%) respectively, and reduction in area-delay-product by (12.12%, -5.17%) and (17.82%, 12.91%) respectively can be observed. With the use of the higher radix structure, delay, area, and power consumption can be further reduced. Appropriate adder deployment, further exploring the optimized grouping or compression strategies, and applying more low-power design techniques such as power-gating, multi-Vt MOS transistor utilization, multi-VDD domain creation, etc., help, along with the higher radix structures, realizing the more efficient multiplier designs

    A single chip low power asynchronous implementation of an FFT algorithm for space applications

    Get PDF
    Journal ArticleA fully asynchronous _x000C_fixed point FFT processor is introduced for low power space applications. The architecture is based on an algorithm developed by Suter and Stevens specifi_x000C_cally for a low power implementation. The novelty of this architecture lies in its high localization of components and pipelining with no need to share a global memory. High throughput is attained using large numbers of small, local components working in parallel. A derivation of the algorithm from the discrete Fourier transform is presented followed by a discussion of circuit design parameters specifi_x000C_cally those relevant to space applications. A survey of this application specifi_x000C_c architecture is included with a detailed look at the design of the complex-valued Booth multiplier to demonstrate the design methodology of this project. Finally, simulation results based on layout extractions are presented and an outline for future work is given

    Radix-8 Booth Encoded Modulo Multiplier

    Get PDF
    Abstract To design an efficient integrated circuit in terms of area, power and speed, has become a challenging task in modern VLSI design field. The encryption and decryption of PKC algorithms are performed by repeated modulo multiplications these multiplications differ from those encountered in signal processing and general computing applications. The Residue Number System (RNS) has emerged as a promising alternative number representation for the design of faster and low power multipliers owing to its merit to distribute a long integer multiplication into several shorter and independent modulo multiplications. The multipliers are the essential elements of the digital signal processing such as filtering, convolution, transformations and Inner products. RNS has also been successfully employed to design fault tolerant digital circuits. The modulo multiplier is usually the noncritical data path among all modulo multipliers in such high-DR RNS multiplier. This timing slack can be exploited to reduce the system area and power consumption without compromising the system performance. With this precept, a family of radix-8 Booth encoded modulo multipliers, with delay adaptable to the RNS multiplier delay, is proposed. In this paper, the radix-8 Booth encoded modulo multipliers whose delay can be tuned to match the RNS delay. In the proposed multiplier, the hard multiple is implemented using small word-length ripple carry adders (RCAs) operating in parallel. The carry-out bits from the adders are not propagated but treated as partial product bits to be accumulated in the CSA tree. The delay of the modulo multiplier can be directly controlled by the word-length of the RCAs to equal the delay of the critical modulo multiplier of the RNS. By combining radix-8 Booth encoded modulo multiplier, CSA and prefix architecture of multiplier, for high speed and low-power is achieved

    Design-Space Exploration of Mixed-precision DNN Accelerators based on Sum-Together Multipliers

    Get PDF
    Mixed-precision quantization (MPQ) is gaining momentum in academia and industry as a way to improve the trade-off between accuracy and latency of Deep Neural Networks (DNNs) in edge applications. MPQ requires dedicated hardware to support different bit-widths. One approach uses Precision-Scalable MAC units (PSMACs) based on multipliers operating in Sum-Together (ST) mode. These can be configured to compute N = 1, 2, 4 multiplications/dot-products in parallel with operands at 16/N bits. We contribute to the State of the Art (SoA) in three directions: we compare for the first time the SoA ST multipliers architectures in performance, power and area; compared to previous work, we contribute to the portfolio of ST-based accelerators proposing three designs for the most common DNN algorithms: 2D-Convolution, Depth-wise Convolution and Fully-Connected; we show how these accelerators can be obtained with a High-Level Synthesis (HLS) flow. In particular, we perform a design-space exploration (DSE) in area, latency, power, varying many knobs, including PSMAC units parallelism, clock frequency and ST multipliers type. From the DSE on a 28-nm technology we observe that both at multiplier level and at accelerator level there is no one-fits-all solution for each possible scenario. Our findings allow accelerators’ designers to choose, out of a rich variety, the best combination of ST multiplier and HLS knobs depending on the target, either high performance, low area, or low power
    • …
    corecore