In this paper, we present the design of a low-power and efficient parallel multiplication unit. The architecture is based on the bypassing-based multiplier of Jin-Tai Yan. To improve on that architecture, the switching activity involved in generating the control signal is reduced, which in turn reduces power consumption. The area overhead is reduced by 14.3% and the switching power consumption by 20.4%, improving the efficiency of the multiplier with a 9.01% reduction in power-delay product (PDP).
INTRODUCTION
Low-power Digital Signal Processing (DSP) technology is now prominent in battery-powered mobile devices and many other applications, so new design techniques that reduce power consumption are essential. The multiplier is the critical arithmetic unit in many DSP applications, such as filtering, convolution, and the Fast Fourier Transform (FFT), and it consumes most of the power during DSP operations. Power-efficient multipliers are therefore very important for the design of low-power DSP systems. The power dissipation in digital CMOS circuits consists of a dynamic (switching) term, a short-circuit term, and a leakage term:

$$P = \alpha f_c C_L V_{DD}^2 + I_{SC} V_{DD} + I_{leakage} V_{DD}$$

where $\alpha$ is the switching probability, $f_c$ is the clock frequency, $C_L$ is the load capacitance, $V_{DD}$ is the supply voltage, $I_{SC}$ is the short-circuit current, and $I_{leakage}$ is the leakage current. The dynamic power consumption of the multiplier is much larger than the short-circuit and leakage contributions. It follows that if the switching activity of a given logic circuit is reduced without changing its function, the power consumption is reduced.
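The dependence of the dynamic term on the switching probability can be sketched numerically. The snippet below is only an illustration of the power equation: the function name `cmos_power` and all numeric values are assumptions chosen for the example, not data from this work.

```python
def cmos_power(alpha, f_clk, c_load, v_dd, i_sc, i_leak):
    """Return the (dynamic, short-circuit, leakage) power terms in watts."""
    dynamic = alpha * f_clk * c_load * v_dd ** 2   # alpha * f_c * C_L * V_DD^2
    short_circuit = i_sc * v_dd                    # I_SC * V_DD
    leakage = i_leak * v_dd                        # I_leakage * V_DD
    return dynamic, short_circuit, leakage


# Illustrative operating point (assumed values, not measurements).
common = dict(f_clk=100e6, c_load=1e-12, v_dd=1.8, i_sc=1e-6, i_leak=1e-7)
base = cmos_power(alpha=0.5, **common)
reduced = cmos_power(alpha=0.25, **common)

print("total power, alpha = 0.50:", sum(base))
print("total power, alpha = 0.25:", sum(reduced))
```

Halving the switching probability halves the dynamic term and leaves the other two terms unchanged, which is the effect that bypassing techniques aim to exploit.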
For the multiplication of two unsigned n-bit numbers, with multiplicand $A = a_{n-1} a_{n-2} \ldots a_0$ and multiplier $B = b_{n-1} b_{n-2} \ldots b_0$, the product $P = P_{2n-1} P_{2n-2} \ldots P_0$ can be represented as:

$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j 2^{i+j}$$
The parallel array multiplier structure is widely used to meet the high-performance demands of DSP applications, and the Braun design is a typical implementation of an array multiplier. The n×n Braun array multiplier [1] consists of (n-1) rows of carry-save adders (CSAs), each row containing (n-1) full adders (FAs), followed by an (n-1)-bit ripple-carry adder in the last row. The full adders in the first carry-save row can be replaced with half adders.
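The array structure described above can be checked with a small bit-level model. The following is a behavioral sketch, not the synthesized netlist: the function names and the cell indexing (adder i in row j adds the partial product a_i b_j) are assumptions made for illustration.

```python
def full_adder(x, y, z):
    """One-bit full adder: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def braun_multiply(a, b, n=4):
    """Bit-level model of an n x n Braun array multiplier (unsigned)."""
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> j) & 1 for j in range(n)]
    pp = [[abit[i] & bbit[j] for j in range(n)] for i in range(n)]  # pp[i][j] = a_i b_j

    result = pp[0][0]                                  # P0 needs no adder
    sum_in = [pp[i + 1][0] for i in range(n - 1)]      # inputs to the first CSA row
    carry_in = [0] * (n - 1)                           # zero carries: cells act as half adders

    # (n-1) carry-save rows, each with (n-1) adders.
    for j in range(1, n):
        sums, carries = [], []
        for i in range(n - 1):
            s, c = full_adder(pp[i][j], sum_in[i], carry_in[i])
            sums.append(s)
            carries.append(c)
        result |= sums[0] << j                         # product bit P_j
        sum_in = sums[1:] + [pp[n - 1][j]]             # shift sums, append a_{n-1} b_j
        carry_in = carries

    # Final (n-1)-bit ripple-carry adder.
    ripple = 0
    for k in range(n - 1):
        s, ripple = full_adder(sum_in[k], carry_in[k], ripple)
        result |= s << (n + k)
    result |= ripple << (2 * n - 1)                    # most significant product bit
    return result


print(braun_multiply(13, 11))
```

Because the carries into the first row are all zero, the model also shows why that row can be built from half adders.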
Much previous work has aimed to reduce the switching activity of the multiplier. Among these approaches, the bypassing scheme disables the operations in unneeded rows or columns to save switching power [2] [3] [4] [5] [6]. In row- and column-bypassing multipliers, additional tri-state buffers are used together with multiplexers (MUXs) to skip the full adder cells in the rows or columns that correspond to zero bits. In 2-D bypassing, row and column bypassing are performed simultaneously. These designs are discussed in the next section.
Based on the concept of the low-cost low-power bypassing-based multiplier of Jin-Tai Yan [2], [7], [8], a further modification is proposed. The modified circuit for generating the bypassing control signal decreases power dissipation and compares favorably with the referenced design, as discussed in a later section. This paper is organized as follows. The next section gives preliminary information on the different bypassing designs and previous work on low-power multipliers. The design of our multiplier is presented in Section 3, and experimental results on the performance of the various multipliers are shown in Section 4.
BYPASSING BASED DESIGNS
Dynamic power consumption can be reduced by bypassing when the multiplier or multiplicand input data contain many zeros. To perform this isolation, tri-state buffers can be used as near-ideal switches with small power consumption and propagation delay. To study the proposed design we consider row, column, and 2-D bypassing designs, in which adder cells are bypassed or disabled if the corresponding multiplicand or multiplier bit is 0. In a low-power row-bypassing multiplier, if the multiplier bit bj is 0, then all partial products aibj are zero, and thus all the addition operations in the j-th row can be bypassed to reduce power. Fig. 1 illustrates a 4x4 Braun multiplier with row bypassing.
As a result, the addition operations in the j-th row of CSAs (corresponding to the j-th bit of the multiplier) can be disabled, and the outputs from the previous (j-1)-th row are fed directly to the next (j+1)-th row of CSAs, giving the same multiplication result. In this multiplier design, each modified full adder is attached to three tri-state buffers and two 2-to-1 multiplexers. An extra correcting circuit is required to correct the final multiplication result, because the rightmost FA in each bypassed CSA row is also bypassed.

Figure 1. Row bypassing

For example, let b2 be 0 in Fig. 1. In this case, the CSA in the second row can be bypassed, and the outputs from the first row are fed directly to the third-row CSA. However, since the rightmost FA in the second row is disabled, it does not execute its addition, and thus the output is not correct. The extra correcting circuit remedies this problem.

Figure 2. Column bypassing
Instead of bypassing rows of full adders, another multiplier design bypasses columns of adders. In this approach, the operations in a column can be disabled if the corresponding bit in the multiplicand is 0. As in the previous technique, the column-bypassing multiplier uses additional tri-state buffers and MUXs to skip the FA cells in the columns of zero bits. As shown in Fig. 2, the column-bypassing multiplier has less logic than the row-bypassing multiplier because it does not need to bypass the carry bit; the circuit is simpler and requires less extra correcting circuitry, as explained below.
In the column-bypassing multiplier shown in Fig. 2, only two tri-state gates and one multiplexer are needed in a modified adder cell. If the multiplicand bit ai is zero, it can be verified that, for the FAs in the corresponding diagonal, both the carry bit from the upper-right FA and the partial product aibj are zero. As a result, the output carry bit of such an FA is 0, and the output sum bit is simply equal to the third input, which is the sum output of the upper FA [4]. Thus, the operation of the corresponding diagonal can be disabled, since all its outputs are known. No tri-state gate is needed for the carry input (Ci-1, j) from the upper-right FA. At the bottom of the CSA array, the carry outputs must be set to 0; otherwise, the corresponding FAs may not produce correct outputs, since their inputs are disabled. This is done by adding an AND gate at the outputs of the last-row CSA adders.
Figure 3. Row and column bypassing
In the 2-dimensional bypassing multiplier [6], the nullity of the partial products and of the multiplicand bit is detected to determine whether the FA cells in the corresponding row and column, respectively, are skipped. Fig. 3 shows the structure of the 2-D multiplier with row and column bypassing. The adder cells of the 2-dimensional bypassing multiplier need additional logic to resolve the conflict that arises when row bypassing and column bypassing occur simultaneously, and this additional logic incurs a large circuit overhead.
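As a rough way to compare the three bypassing schemes, one can count how many cells of the (n-1) x (n-1) CSA array remain active for given operands. The sketch below is a crude activity proxy under stated assumptions, not a power estimate: the cell indexing (cell (i, j) adds a_i b_j), the function names, and the neglect of tri-state buffers, multiplexers, and correcting circuits are all simplifications introduced here.

```python
from itertools import product


def active_cells(a, b, n=4):
    """Count CSA-array cells left active by each scheme for operands a and b."""
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> j) & 1 for j in range(n)]
    cells = [(i, j) for j in range(1, n) for i in range(n - 1)]
    return {
        "braun": len(cells),                              # no bypassing
        "row": sum(bbit[j] for _, j in cells),            # row j skipped when b_j = 0
        "column": sum(abit[i] for i, _ in cells),         # column i skipped when a_i = 0
        "2d": sum(abit[i] & bbit[j] for i, j in cells),   # skipped when a_i b_j = 0
    }


def average_active(n=4):
    """Average the active-cell counts over all pairs of n-bit operands."""
    totals = {"braun": 0, "row": 0, "column": 0, "2d": 0}
    for a, b in product(range(2 ** n), repeat=2):
        for scheme, count in active_cells(a, b, n).items():
            totals[scheme] += count
    return {scheme: total / 4 ** n for scheme, total in totals.items()}


print(average_active())
```

For uniformly distributed 4-bit operands this proxy gives an average of 9 active cells without bypassing, 4.5 with row or column bypassing, and 2.25 with 2-D bypassing. The count ignores the extra logic each scheme needs, which is exactly the overhead that the text identifies as the drawback of the 2-D design.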
Another recent design is the low-cost low-power (LCLP) bypassing-based design. It is based on the 2-dimensional bypassing feature: if the multiplicand bit ai or the multiplier bit bj is 0, the addition operations in the (i+1)-th column or the j-th row are bypassed. In this low-power bypassing-based multiplier, the addition operation in the (i+1, j)-th FA can be bypassed if the product aibj is equal to the carry bit ci,j-1; conversely, if the product aibj is not equal to ci,j-1, the addition in the (i+1, j)-th FA must be executed. Hence, the control signal for the bypassing condition can be obtained as the XOR of the product aibj and the carry bit ci,j-1. In terms of area, each simplified adder (A+1) in the CSA array is attached to only one tri-state buffer and two 2-to-1 multiplexers.
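The bypassing condition can be verified exhaustively at the level of a single cell. The sketch below compares an ordinary full adder with a simplified cell that either bypasses or performs an A+1 increment; the function names are hypothetical, and the carry produced in the increment case is derived here from the arithmetic (the carry of s + 1 equals s) rather than taken from a circuit diagram.

```python
def full_adder(s_in, p, c_in):
    """Reference full adder on the incoming sum bit, product bit, and carry bit."""
    total = s_in + p + c_in
    return total & 1, total >> 1


def simplified_cell(s_in, p, c_in):
    """Cell that increments only when the control signal p XOR c_in is 1."""
    execute = p ^ c_in                 # control signal for the bypassing condition
    if execute:
        return s_in ^ 1, s_in          # A+1: the sum toggles, the carry is the old sum bit
    return s_in, c_in                  # bypass: sum unchanged, previous carry forwarded


for s_in in (0, 1):
    for p in (0, 1):
        for c_in in (0, 1):
            print(s_in, p, c_in, full_adder(s_in, p, c_in), simplified_cell(s_in, p, c_in))
```

In all eight input combinations the two cells agree, which is why a full addition is unnecessary whenever the product bit equals the incoming carry.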
THE PROPOSED APPROACH
The main idea of our approach is based on the observation that the low-cost low-power bypassing-based design [2] consumes a large amount of power in producing the control signal for the bypassing condition [5]. Therefore, building on the two-dimensional bypassing feature, the aim is to reduce the unnecessary switching in the design of [2] while generating the control signal. The previous design uses an XOR gate to produce the control signal, which bypasses the (i+1, j)-th FA operation when the product aibj and the carry bit ci,j-1 are equal. The XOR output is simply ci,j-1 when the other input aibj is zero, and the negation of ci,j-1 when aibj is one. This control-signal function can therefore be replaced with the low-power Modified Block (MB) shown in Fig. 5, in which the previous carry is attached to tri-state buffers controlled by the product aibj of the multiplicand and multiplier bits.
Figure 5. Modified block
The previous carry, from the A+1 adder to the right, is sent to the MB. The (i+1, j)-th FA executes the A+1 addition only when the product aibj and the bit ci,j-1 differ, that is, when aibj is 1 and ci,j-1 is 0, or aibj is 0 and ci,j-1 is 1. Conversely, the (i+1, j)-th FA is not executed when the product aibj is equal to the bit ci,j-1. The (i+1, j)-th FA thus consists of the A+1 incremental adder and a low-cost low-power modified block, and the resulting carry bit ci+1,j can be obtained by bypassing the previous carry bit ci,j-1. The MB produces the control signal for the bypassing condition from the product aibj and the carry bit ci,j-1.
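The behavior expected of the MB can be sketched as two tri-state paths selected by the product bit. The internal arrangement below (one buffer passing the carry, one passing its complement) is an assumption made for illustration, since the text specifies only that the previous carry drives tri-state buffers controlled by aibj; the function names are hypothetical and a high-impedance output is modeled as `None`.

```python
def tristate(value, enable):
    """Tri-state buffer model: drives value when enabled, else high impedance (None)."""
    return value if enable else None


def modified_block(p, c_prev):
    """Assumed MB behavior: pass c_prev when p = 0, pass its complement when p = 1."""
    direct = tristate(c_prev, enable=(p == 0))
    inverted = tristate(c_prev ^ 1, enable=(p == 1))
    # Exactly one of the two buffers drives the shared output node.
    return direct if direct is not None else inverted


for p in (0, 1):
    for c_prev in (0, 1):
        print(p, c_prev, modified_block(p, c_prev), p ^ c_prev)
```

Under this assumption the block reproduces the XOR control signal for every input combination, while the product bit only steers the buffers instead of feeding a gate input.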
The first row of CSAs is allowed to increment every time, because its inputs are fed directly to the incremental adders without tri-state buffers. All FAs in the remaining rows consist of the low-cost A+1 incremental adder and an MB, which are used to save power. The proposed bypassing-based design with the modified block is illustrated in Fig. 6.
IMPLEMENTATION AND RESULTS
We evaluated the proposed approach on 4x4 multipliers. The circuits were described in Verilog HDL and synthesized with Cadence RTL Compiler using a 180 nm CMOS standard-cell technology library, and they are compared with the traditional and previous implementations of Braun's multiplier. Table 1 presents the post-synthesis results of the previous and proposed structures in terms of delay, area, and power for the 4x4 multiplier. The area is the total cell area of the design; the total power is the sum of dynamic power, internal power, net power and leakage power. The delay is the critical-path delay of the architecture.
Power consumption decreases owing to the modified block. Compared with the LCLP design [2], the efficiency of the proposed design improves. Relative to the low-cost low-power bypassing-based design [2], the total area of the proposed design decreases by 13.49%, while the worst-path delay increases by 14.9%. The minimization of logic also yields good low-power results: the proposed 4x4 architecture decreases power consumption by 20.4%. In addition, the results show that the proposed architecture reduces the power-delay product (PDP) by 9.01% compared with the low-cost low-power design [2], which makes the design preferable. In summary, we believe that the proposed design is a better alternative for use in low-power arithmetic architectures.
CONCLUSION
In this paper we have presented a modification of the low-cost low-power bypassing-based design [2] that reduces switching, with the structure synthesized in Cadence RTL Compiler using 180 nm technology. The proposed structure proves to be a simple solution for reducing the power consumption of the parallel multiplier. Although the previous design already uses bypassing for low power, the proposed unit with the modified block is found to consume less power and is thus more efficient in terms of PDP.
