Multiplication is one of the essential operations in Digital Signal Processing (DSP) applications such as the Fast Fourier Transform (FFT) and digital filters. Research continues on multipliers that consume less power, run faster, occupy less area, or combine these properties, so that they suit high-speed or low-power VLSI applications. The Braun multiplier is a parallel array multiplier used for unsigned multiplication. Its dynamic power can be reduced by bypassing techniques, and its delay can be reduced by replacing the ripple carry adder in the last stage with a fast adder such as a carry look-ahead adder or a Kogge-Stone adder. This paper presents a comparative study of different bypassing multipliers and their architectural modifications for 4*4, 8*8 and 16*16 bits. The delay is obtained with the Xilinx ISE 13.2 tool for different FPGAs (Spartan-3E, Virtex-4, Virtex-5 and Virtex-6 Lower Power), and the dynamic power and cell area reports are obtained using RTL Compiler from Cadence in 90 nm technology.
I. INTRODUCTION
Parallel array multipliers are widely used to meet the high-speed and low-power demands of DSP applications. One widely used parallel array multiplier is the Braun multiplier, also known as the carry-save array multiplier. Its architecture consists of AND gates and full adders. Such architectures are ultimately implemented as ASICs, but ASIC development is costly, so the designs must be verified thoroughly before implementation. FPGAs overcome this disadvantage by combining the speed and parallelism of hardware with the flexibility of software. Moreover, an ASIC is fixed to one particular design, whereas an FPGA can be reprogrammed.
In DSP applications, most of the power is consumed by the multipliers. Hence, low-power multipliers must be designed in order to reduce the power dissipation. The power dissipation in CMOS circuits is mainly due to static and dynamic power dissipation. The dynamic power dissipation is given by $P = \frac{1}{2} C V^2 f N$, where $P$ is the power dissipation, $C$ is the load capacitance, $V$ is the supply voltage, $f$ is the clock frequency and $N$ is the total number of switching activities in one clock cycle. Dynamic power is due to switching activity, so reducing the switching activity reduces the dynamic power. Many papers on low-power multiplier design have been published that reduce the switching activity [7] or reduce the power dissipation by bypassing techniques. In this paper, techniques to further reduce the delay and power are proposed by modifying the adders, since adders are among the major building blocks of multiplier designs. Compared with the conventional multipliers, the modified multipliers show improved performance in terms of delay and power.
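As a rough illustration of the dynamic power expression above, the following sketch evaluates $P = \frac{1}{2} C V^2 f N$ for two switching-activity counts. The function name and all numeric values (capacitance, voltage, frequency, switching counts) are assumptions chosen only for illustration and are not results from this paper.

```python
def dynamic_power(c_load, v_dd, f_clk, n_switch):
    """Dynamic power P = (1/2) * C * V^2 * f * N, in watts."""
    return 0.5 * c_load * v_dd ** 2 * f_clk * n_switch


# Illustrative values only (assumptions, not taken from the paper).
C_LOAD = 10e-15  # load capacitance: 10 fF
V_DD = 1.0       # supply voltage: 1.0 V
F_CLK = 100e6    # clock frequency: 100 MHz

# Same circuit parameters, with the switching activity halved in the second case.
p_full = dynamic_power(C_LOAD, V_DD, F_CLK, n_switch=200)
p_half = dynamic_power(C_LOAD, V_DD, F_CLK, n_switch=100)

print(f"N = 200: {p_full:.3e} W")
print(f"N = 100: {p_half:.3e} W")
print(f"ratio:   {p_half / p_full:.2f}")
```

Because $P$ is linear in $N$, halving the switching activity halves the dynamic power when $C$, $V$ and $f$ are unchanged, which is the motivation for the bypassing techniques.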
II. PREVIOUS WORK AND RELATED RESEARCH
The architecture of a 4*4 standard Braun multiplier is shown in Fig.: 1. In general, an n*n Braun multiplier requires $n(n-1)$ full adders and $n^2$ AND gates. A major disadvantage of the Braun multiplier is that the number of components grows quadratically with the number of bits, which makes large multipliers inefficient. The delay of the Braun multiplier depends on the delay of the full adders in the array and on the delay of the final adder in the last stage.
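The structure described above can be sketched as a bit-level software model. This is a minimal sketch, assuming the usual carry-save arrangement with n-1 rows of n-1 full adders followed by an (n-1)-bit ripple carry adder; the function names `full_adder` and `braun_multiply` are hypothetical, and the model is not the Verilog RTL used in this work.

```python
def full_adder(x, y, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout


def braun_multiply(a, b, n):
    """Bit-level model of an n*n Braun (carry-save array) multiplier.

    Returns (product, and_gate_count, full_adder_count).
    """
    # pp[j][i] = a_i AND b_j, one AND gate per partial-product bit.
    pp = [[((a >> i) & 1) & ((b >> j) & 1) for i in range(n)] for j in range(n)]
    and_gates = n * n
    full_adders = 0

    product_bits = [0] * (2 * n)
    product_bits[0] = pp[0][0]

    # Inputs to row 1: the row-0 partial products act as the "sum" inputs.
    sums = pp[0][1:]
    carries = [0] * (n - 1)

    # Carry-save rows j = 1 .. n-1, each with n-1 full adders.
    for j in range(1, n):
        row_s, row_c = [], []
        for i in range(n - 1):
            s, c = full_adder(pp[j][i], sums[i], carries[i])
            row_s.append(s)
            row_c.append(c)
            full_adders += 1
        product_bits[j] = row_s[0]
        # Adder i of the next row takes the sum of adder i+1;
        # the last adder takes the partial product a_(n-1) b_j.
        sums = row_s[1:] + [pp[j][n - 1]]
        carries = row_c

    # Last stage: ripple carry adder built from n-1 full adders.
    ripple = 0
    for i in range(n - 1):
        s, ripple = full_adder(sums[i], carries[i], ripple)
        product_bits[n + i] = s
        full_adders += 1
    product_bits[2 * n - 1] = ripple

    product = sum(bit << k for k, bit in enumerate(product_bits))
    return product, and_gates, full_adders


for width in (4, 8, 16):
    _, ands, fas = braun_multiply(0, 0, width)
    print(f"{width}*{width}: {ands} AND gates, {fas} full adders")
```

The component counts reported by the model follow $n^2$ and $n(n-1)$, so they grow quadratically with the operand width.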
The dynamic power can be reduced by using bypassing techniques. In the row bypassing multiplier [2], if the multiplier bit $b_j$ is zero, the addition operations in the j-th row are bypassed and the outputs of the (j-1)-th row are provided directly to the (j+1)-th row. This reduces the switching activity and hence the power. The Braun multiplier with row bypassing is illustrated in Fig.: 2. In the column bypassing based Braun multiplier [1], if the multiplicand bit $a_i$ is zero, the addition operations in the (i+1)-th column are bypassed. The column bypassing based Braun multiplier is illustrated in Fig.: 3. A multiplier in which the addition operations in either the j-th row or the (i+1)-th column can be bypassed is called a 2-dimensional bypassing based multiplier [3]. Here, to keep the output carry correct, neither row nor column bypassing can be performed when the bits $a_i$ and $b_j$ are both zero but the carry $c_{i,j-1}$ is 1, so extra bypassing circuitry is needed. This extra circuitry in turn reduces the achievable power saving.
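The row bypassing idea described above can be illustrated with a behavioral sketch. It works at word level in carry-save form and simply skips the addition for every row whose multiplier bit is zero; it does not model the multiplexers, tri-state buffers or carry-correction logic of the actual circuit in [2]. The function name `row_bypass_multiply` and the returned row counters are assumptions made for illustration.

```python
def row_bypass_multiply(a, b, n):
    """Behavioral model of row bypassing for an n*n unsigned multiplier.

    Rows j = 1 .. n-1 are skipped when b_j == 0.
    Returns (product, active_rows, bypassed_rows).
    """
    mask = (1 << (2 * n)) - 1

    # Row 0 needs only AND gates: the partial product a * b_0.
    sum_vec = a if (b & 1) else 0
    carry_vec = 0
    active_rows = 0
    bypassed_rows = 0

    for j in range(1, n):
        if not (b >> j) & 1:
            # b_j == 0: the previous row outputs pass through unchanged,
            # so no adder in this row switches.
            bypassed_rows += 1
            continue
        addend = a << j
        # Carry-save addition: three vectors in, sum and carry vectors out.
        new_sum = sum_vec ^ carry_vec ^ addend
        new_carry = ((sum_vec & carry_vec) | (sum_vec & addend)
                     | (carry_vec & addend)) << 1
        sum_vec, carry_vec = new_sum & mask, new_carry & mask
        active_rows += 1

    # Final adder merges the sum and carry vectors into the product.
    product = (sum_vec + carry_vec) & mask
    return product, active_rows, bypassed_rows


print(row_bypass_multiply(13, 5, 4))  # b = 0101: rows 1 and 3 are bypassed
```

In this model the number of bypassed rows equals the number of zero bits among $b_1 \dots b_{n-1}$, which is why the power saving of row bypassing depends on the multiplier operand.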
III. PROPOSED WORK AND RESULTS
In all the multipliers discussed above, the last stage is a ripple carry adder. Since the delay of the Braun multiplier depends on both the full adders in the array and the final adder in the last stage, the main drawback of this arrangement is that the ripple carry adder causes glitching and makes the delay of the multiplier high.
A ripple carry adder is a chain of full adders. The carry input of each full adder depends on the carry output of the previous one, so each full adder must wait until the previous full adder has produced its outputs. Hence, the delay of the ripple carry adder is high and grows further as the number of bits increases. The delay of the multiplier can be reduced by replacing the ripple carry adder with a fast adder such as a carry look-ahead adder or a Kogge-Stone adder. The modified row bypassing multipliers obtained by replacing the ripple carry adder with a carry look-ahead adder and with a Kogge-Stone adder are shown in the accompanying figures. The other modified bypassing multipliers are designed similarly.
The RTL code for all the designs and their architectural modifications is written in Verilog HDL. All the multiplier designs are simulated and synthesized in the Xilinx ISE 13.2 tool, and the delay is calculated for different FPGA devices so that a comparison can be made among them. The FPGA devices used for comparison are: Spartan-3E (xc3s500e-4-ft256), Virtex-4 (xc4vlx15-10-sf363), Virtex-5 (xc5vlx30-1-ff324) and Virtex-6 Lower Power (6vlx75tlff484-1l).
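The difference between the ripple carry adder and a parallel-prefix adder such as the Kogge-Stone adder can be sketched in software. The following is a minimal model under stated assumptions: the function names are hypothetical, and the returned stage counts (n carry stages for ripple carry, about $\log_2 n$ prefix levels for Kogge-Stone) are only a rough proxy for logic depth, not the delay values measured in this work.

```python
def ripple_carry_add(x, y, n):
    """Add two n-bit numbers one bit at a time.

    Returns (sum including carry-out, number of carry stages).
    """
    carry, result = 0, 0
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        result |= (xi ^ yi ^ carry) << i
        # The carry for bit i+1 is available only after bit i is done.
        carry = (xi & yi) | (carry & (xi ^ yi))
    return result | (carry << n), n


def kogge_stone_add(x, y, n):
    """Add two n-bit numbers with a Kogge-Stone parallel-prefix network.

    Returns (sum including carry-out, number of prefix levels).
    """
    mask = (1 << n) - 1
    g = x & y & mask      # bitwise generate signals
    p = (x ^ y) & mask    # bitwise propagate signals
    p0 = p                # kept for the final sum computation
    levels, dist = 0, 1
    while dist < n:
        # Combine each position with the group 'dist' bits to its right.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1
        levels += 1
    # After the prefix network, bit i of g is the carry out of bit i,
    # so the carry into bit i is bit i-1 of g.
    total = p0 ^ (g << 1)
    return total, levels


for width in (4, 8, 16):
    _, rca_stages = ripple_carry_add(0, 0, width)
    _, ksa_levels = kogge_stone_add(0, 0, width)
    print(f"{width}-bit: ripple stages = {rca_stages}, prefix levels = {ksa_levels}")
```

The prefix network computes all carries in a logarithmic number of levels at the cost of more logic per level, which is in line with the larger cell area reported for the Kogge-Stone adder.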
The maximum combinational path delays for 4*4, 8*8 and 16*16 bits, obtained using Xilinx ISE 13.2 for the different FPGA devices, are shown in Table: 1.
From the above results, it is observed that the Virtex-6 Lower Power FPGA shows the lowest maximum combinational path delay for the multiplier designs. The modification proposed in this paper, i.e., replacing the ripple carry adder in the last stage with a carry look-ahead adder or a Kogge-Stone adder, gives the minimum delay.
The glitching problem caused by the ripple carry adder can also be eliminated. These improvements become more noticeable as the number of bits increases.
All the multiplier designs are synthesized in 90 nm technology using the RTL Compiler tool from Cadence, from which the cell area and dynamic power reports are obtained. From the cell area reports, it is observed that the cell area is larger with the carry look-ahead adder and the Kogge-Stone adder than with the ripple carry adder. The Kogge-Stone adder has the largest area of the three adders.
From the dynamic power results, it is observed that the dynamic power is reduced for the bypassing based multipliers, which implies that the total power is also reduced. The dynamic power is higher for the two-dimensional bypassing multiplier because of the extra bypassing circuitry used in its design. The multipliers with a carry look-ahead adder or a Kogge-Stone adder in the last stage have higher dynamic power than those with a ripple carry adder.
IV. CONCLUSION
From the results obtained in Xilinx and Cadence, it can be concluded that if the multiplier is to be used for high-speed applications, a Kogge-Stone adder can be used in the multiplier design, but the area and the dynamic power increase. Using a carry look-ahead adder in the last stage of the multiplier designs reduces the delay significantly with only a slight increase in cell area and dynamic power. Thus, the carry look-ahead adder offers the best trade-off among area, delay and dynamic power. The Virtex-6 Lower Power FPGA showed the lowest maximum combinational path delay for the different multiplier designs compared to the other FPGA devices, namely Spartan-3E, Virtex-4 and Virtex-5.
V. FUTURE WORK
In this paper, the proposed work has been carried out for 4*4, 8*8 and 16*16 bit unsigned multipliers. The bypassing techniques and the architectural modifications can also be applied to signed array multiplier architectures.
