Various high-speed techniques have been developed for multipliers, but with the increasing popularity of mobile computing, a recent goal has been to minimize power dissipation. A popular delay-reduction technique applied to adder circuits is polarity inversion of bits. Because this optimization reduces transistor count, it also has the potential to lower power dissipation, and it can be applied effectively to Wallace tree partial product reduction stages. We illustrate how this technique reduces power, interconnect capacitance, and chip area. Power reduction of up to 25% is achieved.
Keywords: Multiplier, low power, inverse polarity.
Introduction
Recently emergent portable applications involving DSP tasks require high-speed operation while minimizing energy so as to extend battery life. A major power-dissipating, high-delay block in such designs is the multiplier; several well-known techniques exist for multiplier delay reduction, and recently a large number of power reduction methods have been proposed [1]. We examine the potential of an existing delay optimization method, which has the dual advantage of also providing low power operation.
The technique which we term "inverse polarity" is applied to ripple adders in the following manner: two n-bit numbers are added by a ripple adder in which each bit position contributes a logic stage followed by an inverter stage (Fig. 1a). In terms of the number of logic stages, the delay is 2n. The inverse polarity technique attempts to remove the inverter stage by complementing the input bits at every other bit position. In this manner, an adder of the form in Fig. 1 is obtained.
In the following sections, we will show how inverse polarity circuits can be used in Wallace tree multipliers [2] to lower power dissipation. We describe inverse polarity circuitry and develop a heuristic for implementing a Wallace tree partial product (PP) reduction circuit using polarity inversion.
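The alternating-polarity ripple adder described above can be checked with a small bit-level model. The following is a minimal sketch, not the authors' circuit: the function names are hypothetical, and each stage is modeled simply as a full adder whose output inverters have been removed, so that it produces complemented sum and carry.

```python
def inverting_stage(a, b, c):
    """Full-adder stage without output inverters: returns (NOT sum, NOT carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return 1 - s, 1 - carry


def inverse_polarity_ripple_add(x, y, n):
    """Add two n-bit numbers using inverting stages only.

    Assumption of this sketch: input bits at odd positions are supplied
    complemented, matching the complemented carry arriving from the
    previous stage.
    """
    carry = 0          # carry-in to bit 0, true polarity
    result = 0
    for i in range(n):
        a = (x >> i) & 1
        b = (y >> i) & 1
        if i % 2 == 1:
            a, b = 1 - a, 1 - b   # complemented inputs at every other position
        s_out, carry = inverting_stage(a, b, carry)
        # Even positions deliver a complemented sum, odd positions a true sum.
        s = 1 - s_out if i % 2 == 0 else s_out
        result |= s << i
    # The final carry is complemented when the last stage index is even.
    carry_true = 1 - carry if n % 2 == 1 else carry
    return result | (carry_true << n)
```

The model relies on the fact that complementing all three inputs of a full adder complements both of its outputs, so polarity simply alternates from stage to stage and only the interpretation of each output changes.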
Circuit Design In Multipliers
Digital CMOS logic design consists of implementing a logic function while minimizing transistor count, subject to delay and power constraints. To this end, many CMOS logic gates are designed to drive large interconnect lines by using a two-stage structure: the first stage implements the logic, and the second stage consists of a buffer (i.e., an inverter) to drive output capacitance. In this manner, input transistors may be small while the buffer can be made large, resulting in lower total transistor size.
While the logic stage/buffer structure provides strong drive for large loads, multiplier circuits have the interesting characteristic that large net capacitances are not commonly encountered. With a few exceptions, most connections are 2-point nets, and proper placement can ensure that connected components are located fairly close together, resulting in short wires. Therefore, other than in the partial product generating circuitry and some of the final adder types, the advantage of a buffer structure is minimal. This suggests that removal of output inverters may realize lower power dissipation with minor effects on delay. If a logically equivalent implementation using fewer buffers can be assembled, the number of switching transistors and the overall power will be reduced. We apply this technique to Wallace tree multipliers to determine the resultant power savings.
Adder Designs
The fundamental building block in digital multipliers is the full adder, used in the partial product reduction phase to perform carry-save addition (and therefore sometimes called a carry-save adder, or CSA). The CSA takes three inputs and calculates two outputs, sum and carry. The most commonly used implementation (modified from [3]) is shown in Fig. 2. The popularity of this particular design comes from its frugal use of transistors in implementing both the carry function and the exclusive-or (sum) function: 28 transistors are used, hence the designation "28T" cell. Note that this circuit incorporates the logic stage/buffer structure, which is beneficial when driving large output capacitances.
The inverse polarity paradigm identifies bits as having one of two polarities: bits represent either the results of additions, i.e., sum and carry (positive polarity, POS), or their complements, $\overline{sum}$ and $\overline{carry}$ (negative polarity, NEG). Inverted polarity circuits require that an adder with POS inputs provide NEG outputs and vice versa. Therefore:
$$\mathrm{CSA}_{IP}(a, b, c) = (\overline{sum}, \overline{carry}), \qquad \mathrm{CSA}_{IP}(\bar{a}, \bar{b}, \bar{c}) = (sum, carry).$$
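Similarly for half adders:
$$\mathrm{HA}_{IP}(a, b) = (\overline{sum}, \overline{carry}), \qquad \mathrm{HA}_{IP}(\bar{a}, \bar{b}) = (sum, carry).$$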
The 28T implementation of the CSA can be transformed into a $\mathrm{CSA}_{IP}$ by simple removal of the inverters. This works because complementing the inputs of a CSA yields $\overline{sum}$ and $\overline{carry}$. The HA, on the other hand, cannot be so easily constructed: if the inputs to an HA are complemented, the results are not $\overline{sum}$ and $\overline{carry}$. Therefore, two versions of the $\mathrm{HA}_{IP}$ are required, one for POS inputs, which yields $\overline{sum}$ and $\overline{carry}$, and another for NEG inputs, which gives sum and carry (see Fig. 3).
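The difference between the CSA and the HA noted above can be confirmed by exhaustive truth-table checks. The sketch below is illustrative only: all function names are hypothetical, and the gate-level choices for the two half-adder variants (XNOR/NAND for POS inputs, XOR/NOR for NEG inputs) are one logically valid assumption, not necessarily the circuits of Fig. 3.

```python
from itertools import product


def full_adder(a, b, c):
    """Conventional CSA: (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a, b):
    """Conventional HA: (sum, carry)."""
    return a ^ b, a & b


def complementing_inputs_complements_outputs(fn, arity):
    """True if fn(NOT inputs) == NOT fn(inputs) for every input combination."""
    for bits in product((0, 1), repeat=arity):
        s, c = fn(*bits)
        s_inv, c_inv = fn(*(1 - bit for bit in bits))
        if (s_inv, c_inv) != (1 - s, 1 - c):
            return False
    return True


def ha_ip_pos(a, b):
    """Assumed HA for POS inputs: XNOR and NAND give (NOT sum, NOT carry)."""
    return 1 - (a ^ b), 1 - (a & b)


def ha_ip_neg(na, nb):
    """Assumed HA for NEG inputs (na = NOT a, nb = NOT b): XOR and NOR give (sum, carry)."""
    return na ^ nb, 1 - (na | nb)
```

Calling `complementing_inputs_complements_outputs(full_adder, 3)` returns `True`, while the same check on `half_adder` with arity 2 returns `False`, which is why a single inverter-free half adder cannot serve both polarities.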
Partial Product Reduction
Multiplication can be viewed as a series of shifts and adds of two numbers, the multiplicand and the multiplier. The multiplicand is shifted and added once for each non-zero bit of the multiplier. The resulting bits, called the partial product array (PPA), form a trapezoidal array where all bits in a column are added together using carry-save addition.
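As an illustration of the partial product array, the following minimal sketch builds the columns for an unsigned n-by-n multiplication. It assumes each partial product bit is the AND of one multiplicand bit and one multiplier bit; the function name is hypothetical.

```python
def partial_product_columns(multiplicand, multiplier, n):
    """Return the partial product array as a list of columns.

    Column k holds every bit of weight 2**k, i.e. all products
    multiplicand[i] AND multiplier[j] with i + j == k.
    """
    columns = [[] for _ in range(2 * n - 1)]
    for j in range(n):                      # one shifted row per multiplier bit
        for i in range(n):
            bit = ((multiplicand >> i) & 1) & ((multiplier >> j) & 1)
            columns[i + j].append(bit)
    return columns


def column_value(columns):
    """Weighted sum of all bits; equals the product of the two operands."""
    return sum(sum(col) << k for k, col in enumerate(columns))
```

For n = 4 the column heights are 1, 2, 3, 4, 3, 2, 1, and the weighted sum of all bits equals the product.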
A greedy heuristic was proposed in [4] to construct a partial product reduction tree while minimizing logic depth. In this method, a priority queue stores the bits for each column, ordered by the largest static delay time of the bit. The algorithm proceeds on a column-by-column basis, starting at the lowest bit position, where the earliest arriving bits are added using a CSA or HA; the resulting sum bit goes into the priority queue of the current column, and the carry bit is placed in the queue of the next column.
To use inverted polarity elements, we require that all the inputs to a gate be of the same polarity, either POS or NEG. In array multipliers, this can be achieved fairly easily, since each logic level of adders can be of opposite polarity to the previous one. In Wallace trees, however, some adders' inputs come from signals at different logic levels (see Fig. 4). To create equal-polarity inputs, one must track bit polarity, and in some cases inverters must be provided to complement input bits.
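A minimal sketch of this kind of greedy column reduction is given below. It is not the implementation of [4]: the unit CSA delay and the function name are assumptions, and for simplicity only CSAs are instantiated, since the rule for choosing an HA is not spelled out here.

```python
import heapq

CSA_DELAY = 1.0  # hypothetical unit delay through one CSA


def reduce_columns(columns):
    """Greedily reduce every column to at most two bits.

    columns[k] is a list of (arrival_time, value) pairs of weight 2**k.
    Returns the reduced columns, each stored as a heap ordered by arrival time.
    """
    heaps = [list(col) for col in columns]
    for h in heaps:
        heapq.heapify(h)
    k = 0
    while k < len(heaps):                   # lowest bit position first
        h = heaps[k]
        while len(h) > 2:
            # Take the three earliest arriving bits of this column.
            (t0, a), (t1, b), (t2, c) = (heapq.heappop(h) for _ in range(3))
            t_out = max(t0, t1, t2) + CSA_DELAY
            heapq.heappush(h, (t_out, a ^ b ^ c))            # sum stays in column k
            if k + 1 == len(heaps):
                heaps.append([])
            carry = (a & b) | (a & c) | (b & c)
            heapq.heappush(heaps[k + 1], (t_out, carry))     # carry moves to column k+1
        k += 1
    return heaps
```

Because every CSA consumes three bits and produces two, the total number of bits strictly decreases and the loop terminates with at most two bits per column, ready for the final adder.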
Inverse Polarity Algorithm
To assemble the inverse polarity multiplier, we provide two priority queues for each column, one for POS and one for NEG bits. Bits of the same polarity are selected from the queue with the lowest delay bit. When nearly all the bits in a column are consumed, bits from both queues may be mixed, using inverters to normalize all bits to the same polarity.
Since this greedy assembly algorithm uses the lowest delay bits each time it instantiates an adder, the procedure minimizes the growth of the maximum delay per column. The stopping condition is the only point where extra inverters are inserted, for the sole purpose of equalizing the input polarities of an adder. Once the partial product array has been reduced, two bits are present at each bit position. Note that the polarities of these bits may well be mixed, i.e., at any given column, the final two bits may be both POS, both NEG, or one POS and one NEG. At this point, a final adder is created to generate the final result.
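The two-queue assembly described above can be sketched as follows. This is a simplified model under stated assumptions, not the authors' tool: the delays and all names are hypothetical, only inverter-free CSAs are instantiated (no half adders), a queue is eligible only when it holds at least three bits, and at the stopping condition the minority-polarity bits among the three earliest are inverted.

```python
import heapq

POS, NEG = 0, 1
CSA_DELAY, INV_DELAY = 1.0, 0.5   # hypothetical unit delays


def csa_ip(a, b, c):
    """Inverter-free CSA on wire values: returns (NOT sum, NOT carry)."""
    return 1 - (a ^ b ^ c), 1 - ((a & b) | (a & c) | (b & c))


def reduce_inverse_polarity(columns):
    """columns[k]: list of (arrival_time, wire_value) bits, all POS.

    Returns (queues, inverter_count), where queues[k][p] is a heap of
    (arrival_time, wire_value) bits of polarity p in column k.
    """
    queues = []
    for col in columns:
        pos = list(col)
        heapq.heapify(pos)
        queues.append([pos, []])
    inverters = 0
    k = 0
    while k < len(queues):
        q = queues[k]
        while len(q[POS]) + len(q[NEG]) > 2:
            ready = [p for p in (POS, NEG) if len(q[p]) >= 3]
            if ready:
                # Use the eligible queue whose earliest bit has the lowest delay.
                pol = min(ready, key=lambda p: q[p][0])
                ins = [heapq.heappop(q[pol]) for _ in range(3)]
            else:
                # Stopping condition: pool both queues, take the three earliest
                # bits, and invert the minority so all inputs share one polarity.
                pool = sorted((t, w, p) for p in (POS, NEG) for t, w in q[p])
                chosen, rest = pool[:3], pool[3:]
                q[POS] = [(t, w) for t, w, p in rest if p == POS]
                q[NEG] = [(t, w) for t, w, p in rest if p == NEG]
                pol = POS if sum(p == POS for _, _, p in chosen) >= 2 else NEG
                ins = []
                for t, w, p in chosen:
                    if p != pol:
                        t, w = t + INV_DELAY, 1 - w
                        inverters += 1
                    ins.append((t, w))
            t_out = max(t for t, _ in ins) + CSA_DELAY
            s, c = csa_ip(*(w for _, w in ins))
            out = 1 - pol                      # outputs have the opposite polarity
            heapq.heappush(q[out], (t_out, s))
            if k + 1 == len(queues):
                queues.append([[], []])
            heapq.heappush(queues[k + 1][out], (t_out, c))
        k += 1
    return queues, inverters


def decode(queues):
    """Arithmetic value of the reduced array: NEG wires are complemented back."""
    return sum((w if p == POS else 1 - w) << k
               for k, q in enumerate(queues) for p in (POS, NEG) for _, w in q[p])
```

The model operates on wire values rather than logical values, so `decode` recovering the correct product confirms that the polarity bookkeeping is consistent. The returned inverter count shows how few inverters the stopping condition adds in this simplified setting.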
Results
A simple placement tool was written which creates a layout in two phases: 1) simulated annealing is used to group elements in the Wallace tree which have high connectivity, and 2) final adder blocks are arranged using a procedural placement technique. Estimates of the footprint required for each circuit element were calculated for the MOSIS HP 0.5 µm CMOS technology. Interconnect net length was calculated using a Steiner tree construction for multipoint nets, with Manhattan distances for each segment (0.165 fF/µm capacitance to ground). Delays are calculated using static timing analysis based on an HSPICE characterization of cells. Power is calculated using Star-sim from Avant! Corp., which allows fast power computation with high accuracy. Tables 1 and 2 compare conventional and inverse polarity multipliers with different final adders. In all cases, minimum size devices were used to minimize power consumption. Results clearly indicate a power advantage for inverse polarity multipliers. A more pronounced advantage is seen in larger multipliers with carry-select adders; these have the greatest number of adder circuits, so reduced transistor count is most beneficial in these cases. A potential source of parasitic power dissipation in inverse polarity circuits is increased short-circuit (totem pole) current due to more slowly falling/rising inputs, which result from the inverse polarity optimization. Detailed simulations found this effect to be insignificant.
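As a small worked example of the interconnect estimate, the sketch below computes the capacitance to ground of a 2-point net from its Manhattan length, using the 0.165 fF/µm figure quoted above. The function name and coordinates are hypothetical, and multipoint nets (which require a Steiner tree) are not covered.

```python
CAP_PER_UM_FF = 0.165  # fF per µm of wire, capacitance to ground (from the text)


def two_point_net_cap_ff(p, q):
    """Capacitance in fF of a 2-point net between placements p and q.

    p and q are (x, y) coordinates in µm; wire length is the Manhattan distance.
    """
    length_um = abs(p[0] - q[0]) + abs(p[1] - q[1])
    return CAP_PER_UM_FF * length_um
```

For instance, two cells placed at (0, 0) and (30, 10) are joined by 40 µm of wire, giving about 6.6 fF.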
