Abstract-Low power design of VLSI circuits has been identified as a vital technology in battery-powered portable electronic devices and in signal processing applications such as Digital Signal Processors (DSPs). The multiplier plays an important role in DSPs. Low power parallel multipliers need to be designed without degrading the performance of the processor. Bypassing is a widely used technique in DSPs when an input operand of the multiplier is zero. A Row based Bypassing Multiplier with a compressor at the final addition of the ripple carry adder (RCA) is designed with a focus on low power and high speed. The proposed bypassing multiplier with compressor shows higher performance and energy efficiency than the Kuo multiplier with a Carry Save Adder (CSA) at the final RCA.
I. Introduction
In modern VLSI systems, power is the most important parameter to optimize for low power applications such as Digital Signal Processors (DSPs) and portable devices. DSP is one of the core technologies for multimedia and mobile applications, and most DSP applications entail addition and multiplication operations. In particular, the multiplier is the critical arithmetic unit for many DSP applications, such as filtering, convolution, and the Fast Fourier Transform (FFT). Analysis of conventional DSP applications shows that the average proportion of zero inputs among multiplier operands is 73.8 percent. An important low power design approach is to shut down part of a circuit while it is not in operation. Power reduction in multipliers can be achieved using the bypassing technique in DSPs. The primary power reductions are obtained by turning off MOS components through multiplexers when the operands of the multiplier are zero [1] [2].
The major source of power dissipation in CMOS circuits is dynamic power dissipation. Dynamic power dissipation appears only when a CMOS gate switches from one stable state to another. In this paper we present a technique to minimize power dissipation in digital multipliers; within the dynamic component of the total power consumption, the focus is on switching activity. Many techniques have been proposed to reduce the switching activity of logic circuits [7] [8]. Bypassing is an extensively used technique for reducing dynamic power, which is the major part of the total power consumption.
Multiplication requires more computation time and higher circuit complexity than addition. Many other complex arithmetic operations, such as exponentiation, division, and multiplicative inversion, can be performed by applying multiplication repeatedly. Hence, it is important in a practical sense to develop fast multiplication algorithms for these complex arithmetic operations. Besides reducing power consumption, enhancing the speed of the bypassing multiplier (BM) has also been reported [2]. Our contribution goes a step further and improves the performance of the BM without increasing the power consumption.
The remainder of the paper is organized as follows. Section 2 discusses the bypassing technique. Section 3 discusses the row based bypassing array multiplier. The proposed row bypassing multiplier with compressor is discussed in Section 4. The results and discussion are given in Section 5. Finally, the conclusion is given in Section 6.
Sources of Power Consumption in CMOS
Power consumption in CMOS circuits can be reduced by using smaller multiplier designs, such as unsigned row and column bypassing multipliers. The sources of power consumption in CMOS circuits are given by
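The expression itself is not present in this text. As a hedged reference point only, a commonly used textbook decomposition of CMOS power (an assumption, not necessarily the form the authors intended) is shown below. The symbols $\alpha$ (switching activity), $C_L$ (load capacitance), $V_{DD}$ (supply voltage), $f$ (clock frequency), $I_{sc}$ (short-circuit current), and $I_{leak}$ (leakage current) are introduced here for illustration.

$$P_{total} = P_{dynamic} + P_{short\text{-}circuit} + P_{leakage} = \alpha C_L V_{DD}^2 f + I_{sc} V_{DD} + I_{leak} V_{DD}$$

In this form, bypassing targets the first term: disabling adders whose partial products are zero lowers the switching activity $\alpha$.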
II. Bypassing Techniques
The key idea of this design is based on the observation that most modern multipliers produce a large number of signal transitions while adding zero partial products. The design uses another approach to optimizing transition activity, namely hardware bypassing. Adding zero partial products generates a large number of signal transitions in the carry-adder array without affecting the result, so these additions are bypassed by disabling the adders.
Row Bypassing
For a low power row bypassing multiplier, the addition in the $j$th row can be disabled to reduce power dissipation if the bit $b_j$ of the multiplier is 0, i.e., all partial products $a_i b_j$, $0 \le i \le n-1$, are zero. As a result, the addition operations in the $j$th row of CSAs in Fig. 1 are bypassed, and the outputs from the $(j-1)$th row of CSAs are fed directly to the $(j+1)$th row of CSAs without affecting the multiplication result. In the design, each modified FA in the CSA array is fitted with three tri-state buffers and two 2-to-1 multiplexers, as shown in Fig. 2. The tri-state buffers, shown in Fig. 3, decide whether to disable the full adders according to the multiplier bit $b_j$. The two multiplexers, shown in Fig. 4, then select the correct outputs. Extra correcting circuits must be added to correct the multiplication result.
When the corresponding partial product is zero, the RBAC disables unnecessary transitions and bypasses the inputs to the outputs. Two multiplexers added at the outputs of the adder transmit the input carry bit and the input sum bit of the previous addition to the outputs.
The tri-state buffers placed at the inputs of the adder cells disable signal transitions in the bypassed adders, and the input carry bit and input sum bit are passed downwards.
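The row bypassing idea can be illustrated with a small behavioral sketch. It is not a gate-level model of the circuit in Figs. 1 to 4: the function name `row_bypass_multiply` and the `active_rows` counter are hypothetical, introduced only to show that rows with $b_j = 0$ contribute nothing to the product and can be skipped.

```python
def row_bypass_multiply(a, b, n=8):
    """Behavioral model of an n-bit unsigned row bypassing multiply.

    Returns (product, active_rows), where active_rows counts the adder
    rows that actually had to operate (rows whose multiplier bit is 1).
    """
    product = 0
    active_rows = 0
    for j in range(n):
        b_j = (b >> j) & 1
        if b_j == 0:
            # Row j is bypassed: every partial product a_i * b_j is zero,
            # so the running sum passes through unchanged.
            continue
        product += a << j  # row j adds the partial products a_i * b_j
        active_rows += 1
    return product, active_rows


if __name__ == "__main__":
    result, rows = row_bypass_multiply(13, 0b00010010)
    print(result, rows)
```

In this model a multiplier operand with many zero bits activates few rows, which is the situation in which the bypassing hardware saves switching activity.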
Column Bypassing
Instead of bypassing rows of full adders, columns of full adders in the multiplier design are bypassed. In this approach, the operations in a column can be disabled if the corresponding bit in the multiplicand is 0. This method has two advantages. First, it eliminates the extra correcting circuit. Second, the modified FA, shown in Fig. 6, is simpler than the one used in the row bypassing multiplier.
Theorem: When $a_j = 0$, the output of a column $j$ adder cell $FA_{i,j}$ can be specified as follows:
1. The output carry bit is 0.
2. The output sum bit is equal to the output sum bit of $FA_{i-1,j+1}$.
Proof: 1. Consider row 0. Note that, in row 0, there are only two bits to be added. Adder $FA_{0,j}$ carries out $a_j b_1 + a_{j+1} b_0$. If $a_j = 0$, then the output carry bit must be zero, and the output sum bit is equal to $a_{j+1} b_0$.
2. Assume that the theorem holds for row $i$. In row $i+1$, adder $FA_{i+1,j}$ adds the partial product $a_j b_{i+2}$, the output carry of $FA_{i,j}$, and the output sum of $FA_{i,j+1}$. With $a_j = 0$, the first two inputs are zero, so the output carry is zero and the output sum equals the output sum of $FA_{i,j+1}$. The theorem therefore holds for row $i+1$.
Two tri-state buffers are placed at two inputs of the full adder to disable its operation when $a_j$ is 0. The tri-state buffer is designed in TG-CMOS. The multiplexer is placed at the sum output of the full adder. The value of the sum is selected from either the bypassing value or the sum output of the full adder, according to the value of $a_j$. This bypassing cell needs neither a multiplexer for the carry output nor a tri-state buffer for the carry input of the full adder. Therefore, a significant portion of extra hardware is saved without degrading the performance. In addition, power consumption is reduced as a result of the reduced hardware activity.
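The theorem can be checked exhaustively on a small bit-level model. The sketch below assumes a standard unsigned carry-save array in which $FA_{i,j}$ adds $a_j b_{i+1}$, the sum of $FA_{i-1,j+1}$, and the carry of $FA_{i-1,j}$; this indexing is inferred from the row 0 case in the proof and may differ in detail from the authors' figures. The names `full_adder` and `array_multiply` are hypothetical.

```python
def full_adder(x, y, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout


def array_multiply(a, b, n=4):
    """Bit-level n x n unsigned carry-save array multiply.

    Returns (product, theorem_holds), where theorem_holds is True if every
    adder in a column with a_j = 0 produced carry 0 and simply forwarded
    the sum bit of FA(i-1, j+1).
    """
    abit = [(a >> k) & 1 for k in range(n)]
    bbit = [(b >> k) & 1 for k in range(n)]
    # "Row -1": the first partial-product row a_j * b_0, no carries yet.
    s_prev = [abit[j] & bbit[0] for j in range(n)]
    c_prev = [0] * (n - 1)
    product = s_prev[0]  # product bit 0 is a_0 * b_0
    theorem_holds = True
    for i in range(n - 1):  # adder rows 0 .. n-2
        s_cur = [0] * n
        c_cur = [0] * (n - 1)
        for j in range(n - 1):  # adder columns 0 .. n-2
            pp = abit[j] & bbit[i + 1]
            s_cur[j], c_cur[j] = full_adder(pp, s_prev[j + 1], c_prev[j])
            if abit[j] == 0:
                ok = c_cur[j] == 0 and s_cur[j] == s_prev[j + 1]
                theorem_holds = theorem_holds and ok
        # The leftmost position has no adder; its partial product passes down.
        s_cur[n - 1] = abit[n - 1] & bbit[i + 1]
        product |= s_cur[0] << (i + 1)  # product bit i+1 leaves the array
        s_prev, c_prev = s_cur, c_cur
    # Final adder: merge the remaining sum and carry bits of the last row.
    upper_sum = sum(s_prev[j] << (n - 1 + j) for j in range(1, n))
    upper_carry = sum(c_prev[j] << (n + j) for j in range(n - 1))
    return product + upper_sum + upper_carry, theorem_holds


if __name__ == "__main__":
    print(array_multiply(0b1010, 0b0111))
```

Because the carry of a bypassed column is always zero in this model, only the sum output needs a multiplexer, which matches the simpler bypassing cell described above.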
III. Bypassing Array Multiplier
In numerous computing and signal processing applications, the parallel multiplier has been a building block for many algorithms. The Carry-Save Array (CSA) multiplier is a straightforward implementation of vector multiplication. It consists of a partial product reduction tree, which is used to calculate partial products in carry-save redundant form, and a final chain adder to transform the redundant form into normal binary form. The functionality of the carry-save array multiplier is as follows. $X = (x_{n-1} \ldots x_0)$ and $Y = (y_{n-1} \ldots y_0)$ are fed into an array of FA cells. Each FA cell performs the multiplication $x_i \times y_j$ using an AND gate and then adds the result to the incoming sum and carry bits to produce an output sum and an output carry. All FA cells are appropriately connected (sums and carries) to perform the multiplication. The final adder is used to merge the sums and carries from the last row of the array, since in every row the carry bits are not immediately added but rather propagated to the row below. The column bypassing multiplier using CSA is shown in Fig. 8.
IV. Row based Bypassing Multiplier with CSA
The bypassing scheme is used for low power applications of the processor. The method disables the gates if the input operand of the multiplier is zero. To improve the speed of the multiplier, a design was proposed, shown in Fig. 9, in which a CSA architecture is used at the final addition of the RCA to shorten the delay of the multiplier [2]. For example, an 8x8 multiplication can be divided into two 8x4 bypassing multipliers based on RCA, as shown in Fig. 7. The partial sums and carry outputs from these two 8x4 multipliers can be computed simultaneously. Note that the final stage adders consist of RCA adders on both sides and CSA adders in the middle. In this configuration, parallelism can be established in the existing multiplier. Furthermore, the delay of the RCA multiplier can be shortened through this method.
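The arithmetic behind the 8x8 split can be sketched at word level. This is only an illustration of the partitioning identity, not of the RCA/CSA final-stage circuit in Fig. 9; the function name `split_multiply_8x8` is hypothetical.

```python
def split_multiply_8x8(a, b):
    """Compute an 8x8 unsigned product from two independent 8x4 blocks."""
    if not (0 <= a < 256 and 0 <= b < 256):
        raise ValueError("operands must be 8-bit unsigned values")
    # The two 8x4 blocks do not depend on each other, so in hardware
    # they can be evaluated simultaneously.
    low_block = a * (b & 0x0F)   # a times the low nibble of b
    high_block = a * (b >> 4)    # a times the high nibble of b
    # Final stage: the low 4 bits of the low block are already product bits;
    # the rest of the low block overlaps the high block and must be added.
    low_bits = low_block & 0x0F
    upper = (low_block >> 4) + high_block
    return (upper << 4) | low_bits


if __name__ == "__main__":
    print(split_multiply_8x8(200, 123))
```

Only the overlapping upper bits need a real addition, which is why the speed of the final-stage adder determines how much the split shortens the overall delay.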
Proposed Row based Bypassing Multiplier with Compressor
In this paper, the proposed multiplier shown in Fig. 10 adopts a parallel architecture to shorten the delay further than the multiplier shown in Fig. 9. The proposed multiplier places a compressor at the middle of the RCA to further accelerate the multiplier. For example, for an 8x8 multiplication, the two 8x4 partial product blocks with the bypassing method based on RCA with compressor are shown in Fig. 10. The partial sums and carry outputs from these two 8x4 multipliers can be computed simultaneously. Note that the final stage adders consist of RCA adders on both sides and compressors in the middle. With this configuration, parallelism can be established in the proposed multiplier and the delay can be reduced at the cost of some extra hardware.
Minimizing the number of resources required within a processor has a positive impact on its power performance. Furthermore, since the adder is one of the basic arithmetic units, any improvement in the performance of an adder has a major impact on the performance of a processor. Multi-operand adder structures are frequently used for the summation of partial products in multiplication, as in the Wallace and Dadda tree multipliers. They are also used in the implementation of arithmetic expressions arising from the conversion of constant multiplications into shifts and additions [14] [15] [16].
The 14-T full adder used to design the compressor is shown in Fig. 11. Its performance compared with other full adders [17] [18] is shown in Fig. 12. Compressors perform the simple operation of adding a larger number of bits at a time.
Compressor logic is based on the view of the full adder as a counter: a single-bit full adder can be considered a counter of 1's at its input bits. A compressor can be defined as a single-bit adder circuit that has four, five, six, or seven inputs and three outputs.
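As a hedged illustration of the counter view, the sketch below builds a seven-input, three-output counter of 1's from four full adders. The particular wiring and the name `counter_7_3` are assumptions made for this example; the compressor in the paper's figures may be structured differently.

```python
from itertools import product


def full_adder(x, y, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout


def counter_7_3(bits):
    """Count the 1's among seven input bits using four full adders.

    Returns (out2, out1, out0), the 3-bit binary count, MSB first.
    """
    x0, x1, x2, x3, x4, x5, x6 = bits
    s1, c1 = full_adder(x0, x1, x2)       # first group of three inputs
    s2, c2 = full_adder(x3, x4, x5)       # second group of three inputs
    out0, c3 = full_adder(s1, s2, x6)     # weight-1 bits give the LSB
    out1, out2 = full_adder(c1, c2, c3)   # weight-2 carries give the upper bits
    return out2, out1, out0


if __name__ == "__main__":
    # Exhaustive check over all 128 input patterns.
    for bits in product((0, 1), repeat=7):
        out2, out1, out0 = counter_7_3(bits)
        assert 4 * out2 + 2 * out1 + out0 == sum(bits)
    print("all 128 input patterns counted correctly")
```

Each full adder here is itself a three-input counter, so the larger counter is simply a small tree of the same element.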
The Wallace tree architecture supports fully parallel partial product reduction. The classic 3-input Wallace tree element is a carry-save adder which accepts 3-bit wide operands and produces a 2-bit wide result; i.e., the 3-2 compressor takes 3 inputs of the same weight and produces 2 outputs, a sum of weight 1 and a carry of weight 2. Given the nature of the 3-2 compressor, it is impossible to build a completely regular tree architecture.
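A word-level sketch of the 3-2 compression step is given below. It applies the full adder equations to every bit position at once; the function name `carry_save_add` is hypothetical, and the sketch says nothing about how the authors' transistor-level cell is built.

```python
def carry_save_add(x, y, z):
    """Reduce three unsigned operands to a (sum, carry) pair.

    Each bit position acts as an independent 3-2 compressor: the sum bit
    keeps weight 1 and the carry bit moves to weight 2 (shift left by one).
    No carry ripples between positions.
    """
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (x & z) | (y & z)) << 1
    return sum_vec, carry_vec


if __name__ == "__main__":
    s, c = carry_save_add(13, 7, 9)
    # A single carry-propagate addition finishes the job.
    print(s + c)
```

The redundant pair still needs one conventional addition at the end, which is the role of the final adder in the multipliers discussed above.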
V. Simulation Results and Discussions
The design implementation using the Tanner EDA tool is shown in Fig. 13. The performance evaluation of all the bypassing multipliers is done with Synopsys HSPICE for 180 nm technology with a supply voltage of 1.8 V. Table 1 shows the performance of the bypassing multipliers in terms of power, delay, energy delay product (EDP), and number of MOS components. The proposed RBM using compressors consumes slightly more power (13.36 mW) than [2] (13.3 mW) because of the one extra full adder, as shown in Fig. 14. Due to the extra hardware, the RBM consumes more power than the CBM.
The RBM Compressor design is faster (1.45 ns) than the RBM CSA (79.56 ns), as shown in Fig. 15. The vertical compression performed by the compressors, owing to their parallel nature, enhances the performance of the RBM Comp. The MOS component counts of the implemented designs are also given in Table 1. Although the proposed design requires one extra full adder, it is more energy efficient than the RBM and the RBM CSA. All multipliers are implemented with 14-T full adders; the proposed design therefore requires 14 additional transistors.
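For readers unfamiliar with the metric, the sketch below shows how energy and EDP follow from power and delay, using only the power and delay figures quoted in the text. It assumes the common definitions energy = power x delay and EDP = energy x delay; the derived values are illustrative and are not taken from Table 1, which may use different units or measurement conditions. The function name `energy_metrics` is hypothetical.

```python
def energy_metrics(power_w, delay_s):
    """Return (energy in joules, energy-delay product in joule-seconds)."""
    energy_j = power_w * delay_s
    edp_js = energy_j * delay_s
    return energy_j, edp_js


if __name__ == "__main__":
    # Power and delay values as quoted in the text.
    rbm_comp = energy_metrics(13.36e-3, 1.45e-9)   # proposed RBM with compressor
    rbm_csa = energy_metrics(13.3e-3, 79.56e-9)    # RBM with CSA [2]
    print("RBM Comp:", rbm_comp)
    print("RBM CSA :", rbm_csa)
```

Because delay enters the EDP quadratically, a small power penalty is easily outweighed by a large reduction in delay.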
VI. Conclusion
Low power designs are mandatory nowadays for DSPs and battery-powered portable electronic appliances. The arithmetic operations of DSPs must be performed at low power consumption without loss of performance. The focus of this paper is to accelerate the multiplier using the bypassing technique without increasing the power consumption. A column bypassing cell is used in the row bypassing technique to reduce the transistor count and thereby decrease the power consumption. In this paper a further step is taken to increase the speed of the multiplier: a compressor designed with full adders is placed at the middle of the final addition of the RCA. The RBM Comp multiplier consumes slightly more power in exchange for the speed enhancement, and it saves more energy at the cost of the area of one full adder.
