1,007 research outputs found

    Using carry-save adders in low-power multiplier blocks

    For a simple multiplier-block FIR filter design, we compare the effects on power consumption of direct versus transposed-direct forms, tree versus linear structures, and carry-save (CS) versus carry-ripple (CR) adders (for which multiplier-block algorithms have been designed). We find that tree structures offer power savings, as expected, as does transposition in general, though not in every case. Selective use of CS adders is shown to offer power savings provided that care is taken with their deployment. Our best result is with a direct-form CWCS hybrid. The need for new multiplier-block design algorithms is identified.
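    For readers unfamiliar with the two adder families being compared, the sketch below (an illustration, not the paper's design) contrasts a carry-ripple addition, whose delay grows with the length of the carry chain, with a carry-save 3:2 compression step, which defers carry propagation entirely. The word lengths and operand values are arbitrary assumptions.

```python
# Minimal sketch: carry-ripple addition versus a carry-save (3:2 compressor) stage.
# Not taken from the paper; widths and test values are illustrative assumptions.

def ripple_add(a: int, b: int, width: int = 16) -> int:
    """Add two words with an explicit bit-serial carry chain (O(width) delay)."""
    carry, result = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s = ai ^ bi ^ carry                        # full-adder sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))    # full-adder carry-out
        result |= s << i
    return result

def carry_save(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2-compress three operands into a sum word and a carry word.
    No carry crosses bit positions, so the delay is one full-adder level
    regardless of word length."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word

if __name__ == "__main__":
    s, c = carry_save(0x1234, 0x0F0F, 0x00FF)
    # One final carry-propagating add resolves the redundant (sum, carry) pair.
    assert ripple_add(s, c, width=20) == 0x1234 + 0x0F0F + 0x00FF
```

    The attraction of the CS form in multiplier blocks is that the redundant (sum, carry) pair can feed further additions before any carry chain is resolved.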

    VLSI design of high-speed adders for digital signal processing applications.


    ASIC implementations of the Viterbi Algorithm


    Studies on Implementation of ... High Throughput and Low Power Consumption

    In this thesis we discuss the design and implementation of frequency-selective digital filters with high throughput and low power consumption. The thesis includes proposed arithmetic transformations of lattice wave digital filters that aim at increasing the throughput and reducing the power consumption of the filter implementation. The thesis also includes two case studies where digital filters with high throughput and low power consumption are required. A method for obtaining high throughput as well as reduced power consumption of digital filters is arithmetic transformation of the filter structure. In this thesis, arithmetic transformations of first- and second-order Richards' allpass sections composed of symmetric two-port adaptors and implemented using carry-save arithmetic are proposed. Such filter sections can be used for implementation of lattice wave digital filters and bireciprocal lattice wave digital filters. The latter structures are efficient for implementation of interpolators and decimators by factors of two.
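    To make the building block concrete, the sketch below (a behavioural illustration, not the thesis implementation) models a first-order Richards' allpass section as a symmetric two-port adaptor with a single coefficient and one delay element. The coefficient value `alpha` and the test signal are arbitrary assumptions.

```python
# Minimal sketch of a first-order allpass section built from a symmetric two-port
# adaptor and one delay element. Coefficient and test signal are assumptions.

def allpass_first_order(x, alpha):
    """Richards'-type first-order allpass: H(z) = (z^-1 - alpha) / (1 - alpha*z^-1)."""
    y = []
    state = 0.0                      # wave stored in the delay element (port 2)
    for a1 in x:
        diff = state - a1            # single-multiplier input of the symmetric adaptor
        b1 = state + alpha * diff    # reflected wave at port 1 -> section output
        b2 = a1 + alpha * diff       # transmitted wave at port 2 -> into the delay
        state = b2
        y.append(b1)
    return y

if __name__ == "__main__":
    # An allpass section passes every frequency with unit magnitude; the energy of
    # its impulse response is therefore ~1, which a short truncation illustrates.
    h = allpass_first_order([1.0] + [0.0] * 63, alpha=0.375)
    print(sum(v * v for v in h))     # close to 1.0
```

    Two such sections in parallel branches, one of them preceded by a delay, form the bireciprocal lattice structure mentioned in the abstract.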

    A low-power transmission-gate-based 16-bit multiplier for digital hearing aids

    The most widespread 16-bit multiplier architectures are compared in terms of area occupation, dissipated energy, and EDP (energy-delay product) in view of low-power, low-voltage signal processing for digital hearing aids and similar applications. Transistor-level simulations including back-annotated wire parasitics confirm that the propagation of glitches along uneven and re-convergent paths results in large unproductive node activity. Because of their shorter full-adder chains, Wallace-tree multipliers indeed dissipate less energy than the carry-save (CSM) and other traditional array multipliers (6.0 ÎŒW/MHz versus 10.9 ÎŒW/MHz and more for 0.25 ÎŒm CMOS technology at 0.75 V). By combining the Wallace-tree architecture with transmission gates (TGs), a new approach is proposed to improve the energy efficiency further (3.1 ÎŒW/MHz), beyond recently published low-power architectures. Besides the reduction of the overall capacitance, minimum-sized transmission-gate full-adders act as RC low-pass filters that attenuate undesired switching. Finally, minimum-size TGs increase the Vdd-to-ground resistance, hence decreasing leakage dissipation (0.55 nW versus 0.84 nW in CSM and 0.94 nW in Wallace).
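    The energy argument rests on the shorter full-adder chains of the Wallace tree. The behavioural Python sketch below (not the transmission-gate circuit of the paper) shows the reduction scheme: partial-product bits are grouped by weight and compressed with 3:2 counters until only two rows remain, after which a single carry-propagate addition completes the product. The 16-bit width matches the paper; everything else is an illustrative assumption.

```python
# Minimal behavioural sketch of Wallace-tree partial-product reduction with
# 3:2 (full-adder) counters. Not the paper's circuit; width is an assumption.

def wallace_multiply(a: int, b: int, width: int = 16) -> int:
    # Single-bit partial products a_i * b_j, collected by weight i + j.
    cols = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            cols[i + j].append((a >> i) & (b >> j) & 1)

    # Compress until every column holds at most two bits (log-depth adder tree).
    while any(len(col) > 2 for col in cols):
        nxt = [[] for _ in range(2 * width)]
        for w, col in enumerate(cols):
            while len(col) >= 3:                          # 3:2 compressor
                x, y, z = col.pop(), col.pop(), col.pop()
                nxt[w].append(x ^ y ^ z)                  # sum stays at this weight
                nxt[w + 1].append((x & y) | (y & z) | (x & z))  # carry moves up
            nxt[w].extend(col)                            # 0-2 leftover bits pass through
        cols = nxt

    # Final carry-propagate addition of the two remaining rows.
    row0 = sum(col[0] << w for w, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(cols) if len(col) > 1)
    return row0 + row1

if __name__ == "__main__":
    assert wallace_multiply(0xBEEF, 0x1234) == 0xBEEF * 0x1234
```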

    Towards Verifying Nonlinear Integer Arithmetic

    We eliminate a key roadblock to efficient verification of nonlinear integer arithmetic using CDCL SAT solvers by showing how to construct short resolution proofs for many properties of the most widely used multiplier circuits. Such short proofs were conjectured not to exist. More precisely, we give polynomial, n^{O(1)}-size regular resolution proofs for arbitrary degree-2 identities on array, diagonal, and Booth multipliers, and quasipolynomial, n^{O(log n)}-size proofs for these identities on Wallace tree multipliers. (Comment: expanded and simplified, with improved results.)
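    For a concrete sense of what "an arbitrary degree-2 identity on a multiplier" means, the small sketch below (purely illustrative, unrelated to the paper's resolution-proof construction) checks one such identity, (x + y)^2 = x^2 + 2xy + y^2 over n-bit words, against a shift-and-add (array-style) multiplier by exhaustive enumeration at an assumed width of 4 bits. The paper's point is that such facts admit short resolution proofs, which brute-force checking does not provide.

```python
# Brute-force check of a degree-2 identity on a small array-style multiplier.
# Illustrative only; the bit width N = 4 is an assumption.

N = 4
MASK = (1 << N) - 1

def array_mult(a: int, b: int) -> int:
    """Shift-and-add (array-style) N-bit multiplier, result truncated to N bits."""
    acc = 0
    for i in range(N):
        if (b >> i) & 1:
            acc = (acc + (a << i)) & MASK
    return acc

if __name__ == "__main__":
    for x in range(1 << N):
        for y in range(1 << N):
            lhs = array_mult((x + y) & MASK, (x + y) & MASK)
            rhs = (array_mult(x, x) + 2 * array_mult(x, y) + array_mult(y, y)) & MASK
            assert lhs == rhs
    print("degree-2 identity holds for all", (1 << N) ** 2, "input pairs")
```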

    Residue Number Systems: a Survey


    Design of ALU and Cache Memory for an 8 bit ALU

    The design of an ALU and a cache memory for use in a high-performance processor was examined in this thesis. Advanced architectures employing increased parallelism were analyzed to minimize the number of execution cycles needed for 8-bit integer arithmetic operations. In addition to the arithmetic unit, an optimized SRAM memory cell was designed to be used as cache memory and as a fast look-up table. The ALU consists of stand-alone units for bit-parallel computation of basic integer arithmetic operations. Addition and subtraction were performed using Kogge-Stone parallel-prefix hardware operating at 330 MHz. A high-performance multiplier was built using a Radix-4 Modified Booth Encoder (MBE) and a Wallace-tree summation array. The multiplier requires a single clock cycle for 8-bit integer multiplication and operates at a maximum frequency of 100 MHz. Multiplicative division hardware was built for executing both integer division and square root. The division hardware computes 8-bit division and square root in 4 clock cycles. The multiplier forms the basic building block of all these functional units, making a high level of resource sharing feasible with this architecture. The optimal operating frequency for the arithmetic unit is 70 MHz. A 6T CMOS SRAM cell measuring 90 ÎŒmÂČ was designed using minimum-size transistors. The layout allows for horizontal overlap, resulting in an effective area of 76 ÎŒmÂČ for an 8x8 array. By substituting the equivalent bit-line capacitance of the P4 L1 cache, the memory was simulated to have a read time of 3.27 ns. An optimized set of test vectors was identified to enable high fault coverage without the need for any additional test circuitry. Sixteen test cases were identified that would toggle all the nodes and provide all possible inputs to the sub-units of the multiplier. A correlation-based semi-automatic method was investigated to facilitate test-case identification for large multipliers. This method of testability eliminates the performance and area overhead associated with conventional testability hardware. A bottom-up design methodology was employed for the design. The performance and area metrics are presented along with estimated power consumption. A set of Monte Carlo analyses was carried out to ensure the dependability of the design under process variations as well as fluctuations in operating conditions. The arithmetic unit was found to require a total die area of approximately 2 mmÂČ in a 0.35-micron process.
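    The adder mentioned above is a Kogge-Stone parallel-prefix design. The Python sketch below (a behavioural model, not the thesis circuit) shows the idea at the 8-bit width used in the thesis: generate and propagate signals are combined over log2(width) prefix levels, so all carries are available without an O(width) ripple chain. The operand values are assumptions.

```python
# Minimal behavioural sketch of a Kogge-Stone parallel-prefix adder.
# Width matches the thesis (8 bits); test operands are assumptions.

def kogge_stone_add(a: int, b: int, width: int = 8) -> int:
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]      # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]    # propagate

    dist = 1
    while dist < width:                                      # log2(width) prefix levels
        g_new, p_new = g[:], p[:]
        for i in range(dist, width):
            g_new[i] = g[i] | (p[i] & g[i - dist])           # group generate
            p_new[i] = p[i] & p[i - dist]                    # group propagate
        g, p = g_new, p_new
        dist <<= 1

    carries = [0] + g[:-1]                                   # carry into each bit position
    result = 0
    for i in range(width):
        result |= ((((a >> i) ^ (b >> i)) & 1) ^ carries[i]) << i
    return result                                            # carry-out discarded

if __name__ == "__main__":
    assert kogge_stone_add(0x5A, 0xC3, 8) == (0x5A + 0xC3) & 0xFF
```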

    ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION

    Modern central processing units (CPUs) employ arithmetic logic units (ALUs) that support statically defined precisions, often adhering to industry standards. Although CPU manufacturers highly optimize their ALUs, industry-standard precisions embody accuracy and performance compromises for general-purpose deployment. Hence, optimizing ALU precision holds great potential for improving speed and energy efficiency. Previous research on multiple-precision ALUs focused on predefined, static precisions; little prior work addressed ALU architectures with customized, dynamically defined precision. This dissertation presents approaches for developing dynamic-precision ALU architectures for both fixed-point and floating-point arithmetic to enable better performance, energy efficiency, and numeric accuracy. These new architectures enable dynamically defined precision, including support for vectorization, and prevent the performance and energy loss incurred by applying unnecessarily high precision to computations, which often happens with statically defined standard precisions. The new ALU architectures support different precisions through the use of configurable sub-blocks, with this dissertation including demonstration implementations of floating-point adder, multiplier, and fused multiply-add (FMA) circuits with 4-bit sub-blocks. For these circuits, the dynamic-precision ALU speed is nearly the same as that of traditional ALU approaches, although the dynamic-precision ALU is nearly twice as large.
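    The following behavioural sketch is a hypothetical illustration of the sub-block idea for the simplest case, integer addition: 4-bit sub-blocks are composed into either one wide operand or independent SIMD lanes by configuring whether the carry crosses each block boundary. The function name, flag layout, and operand values are assumptions; the dissertation's actual circuits (including the floating-point units) are not reproduced here.

```python
# Hypothetical sketch of a dynamic-precision adder built from 4-bit sub-blocks.
# A per-boundary `chain` flag either propagates the carry into the next sub-block
# (forming one wider operand) or cuts it (forming independent SIMD lanes).

BLOCK = 4  # sub-block width in bits, matching the 4-bit sub-blocks in the abstract

def dynamic_precision_add(a: int, b: int, chain: list, num_blocks: int) -> int:
    """chain[i] == True lets the carry cross the boundary between blocks i and i+1."""
    result, carry = 0, 0
    mask = (1 << BLOCK) - 1
    for i in range(num_blocks):
        ai = (a >> (i * BLOCK)) & mask
        bi = (b >> (i * BLOCK)) & mask
        s = ai + bi + carry
        result |= (s & mask) << (i * BLOCK)
        # The carry leaves the sub-block only if this boundary is configured as chained.
        carry = (s >> BLOCK) & 1 if (i < num_blocks - 1 and chain[i]) else 0
    return result

if __name__ == "__main__":
    # One 16-bit add: all three internal boundaries chained.
    assert dynamic_precision_add(0x1FFF, 0x0001, [True, True, True], 4) == 0x2000
    # Four independent 4-bit lanes: boundaries cut, each lane wraps on its own.
    assert dynamic_precision_add(0x1FFF, 0x0001, [False, False, False], 4) == 0x1FF0
```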
    • 
