I. Introduction

Wallace tree multipliers
Compared with a simple array multiplier, the Wallace tree multiplier is considerably faster, which makes it a high-speed multiplier. Summing the partial-product bits in parallel using a tree of carry-save adders became generally known as the "Wallace tree". Two numbers are multiplied in a three-step process.

1. Formation of the bit products.
2. Reduction of the bit-product matrix into a two-row matrix by means of carry-save adders.
3. Summation of the remaining two rows using a fast carry look-ahead adder (CLA).

A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers. In more detail, its three steps are:
1. Multiply (that is, AND) each bit of one argument by each bit of the other, yielding n^2 results. The wires carry different weights according to the positions of the multiplied bits; for example, the wire carrying the result of a2 b3 has weight 32 (2^(2+3)).
2. Reduce the number of partial products to two using layers of full and half adders.
3. Group the wires into two numbers and add them with a conventional adder.
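The three steps can be sketched as a bit-level software model. This is a minimal sketch under stated assumptions, not the circuit designed in this work: the names `half_adder`, `full_adder` and `wallace_multiply` are illustrative, the operands are assumed to be unsigned n-bit integers, the reduction is applied column by column, and Python's built-in addition stands in for the final carry look-ahead adder.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def wallace_multiply(x, y, n):
    """Multiply two unsigned n-bit integers (illustrative model only)."""
    # Step 1: AND every bit pair. columns[w] holds the bits of weight 2**w.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append((x >> i) & (y >> j) & 1)

    # Step 2: apply layers of full and half adders until no column
    # holds more than two bits. Carries move to the next column.
    while any(len(col) > 2 for col in columns):
        reduced = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:
                s, c = full_adder(*col[k:k + 3])
                reduced[w].append(s)
                reduced[w + 1].append(c)
                k += 3
            if len(col) - k == 2:
                s, c = half_adder(*col[k:k + 2])
                reduced[w].append(s)
                reduced[w + 1].append(c)
            elif len(col) - k == 1:
                reduced[w].append(col[k])
        columns = reduced

    # Step 3: group the remaining bits into two numbers and add them.
    # Python's "+" is a stand-in for the conventional (e.g. CLA) adder.
    row_a = sum(col[0] << w for w, col in enumerate(columns) if len(col) > 0)
    row_b = sum(col[1] << w for w, col in enumerate(columns) if len(col) > 1)
    return row_a + row_b


if __name__ == "__main__":
    print(wallace_multiply(13, 11, 4) == 13 * 11)
```

Each adder preserves the weighted sum of the bits it consumes (a + b + c = sum + 2·carry), which is why the two final rows always add up to the product.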
1.1.1 Motivation
As systems-on-chip grow day by day, a large number of signal processing devices are being implemented on a single VLSI chip. These applications demand great computation capacity and consume a large amount of energy. While performance and area remain two major design issues, power consumption has become a critical concern in today's Very Large Scale Integration (VLSI) system design. The need for low-power VLSI systems has come into focus for two main reasons. First, with the constant growth of operating frequency and processing capacity per chip, large currents must be delivered and the heat due to large power consumption must be removed by proper cooling techniques. Second, portable electronic devices have limited battery life, so low-power design directly leads to prolonged operation time.
Multiplication is a fundamental operation in all signal processing algorithms. Multipliers have large area, long latency and consume considerable power, so low-power multiplier design is an important part of low-power VLSI system design. There has been extensive work on low-power multipliers at the technology, physical, circuit and logic levels. Because the multiplier is the slowest element in the system, the overall system performance is determined by the performance of the multiplier. It is also the most area-consuming element. Hence, optimizing the speed and area of the multiplier is a major design issue as well. However, area and speed are conflicting constraints. As a result, a whole spectrum of multipliers with different area-speed trade-offs has been designed, with parallel multipliers at one end of the spectrum and serial multipliers at the other. In between are digit-serial multipliers, which operate on single digits consisting of several bits.
DOI: 10.9790/2834-1202034954 www.iosrjournals.org
Digit-serial multipliers have moderate performance in both speed and area. They have been designed with complicated switching systems and/or irregularities in design. Radix-2^n multipliers, which operate on digits in parallel instead of on bits, bring the pipelining to the digit level and avoid most of the above problems. These structures are iterative and modular. Pipelining at the digit level brings the benefit of a constant operation speed irrespective of the size of the multiplier. The clock speed is determined only by the digit size, which is fixed before the design is implemented.
Power optimization
Power is the number of joules dissipated per unit of time, whereas energy is the total number of joules dissipated by a circuit. In digital CMOS design, the well-known power-delay product is commonly used to assess the merits of designs. Since power × delay = (energy/delay) × delay = energy, the power-delay product is really a measure of energy, which implies that delay is irrelevant to it.
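A small numerical illustration of this point follows. It is only a sketch: the function name `power_delay_product` and the power and delay figures are hypothetical and are not measurements from this work.

```python
import math


def power_delay_product(power_watts, delay_seconds):
    """Power-delay product in joules: (energy / delay) * delay = energy."""
    return power_watts * delay_seconds


# Hypothetical designs (illustrative values only).
fast_design = power_delay_product(2e-3, 1e-9)  # 2 mW, 1 ns delay
slow_design = power_delay_product(1e-3, 2e-9)  # 1 mW, 2 ns delay

# Both designs have the same power-delay product (energy per operation),
# even though one is twice as fast: the metric does not reward speed.
same_energy = math.isclose(fast_design, slow_design)

if __name__ == "__main__":
    print(fast_design, slow_design, same_energy)
```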
Low-power multiplier design
Multiplication has three steps: 1) partial product generation (PPG), 2) partial product reduction (PPR), and 3) carry-propagate addition (CPA).
There are sequential and combinational multiplier implementations. Only the combinational case is considered here, as the scale of integration is large enough to accept parallel multiplier implementations in digital VLSI systems. Multiplication algorithms differ in their approaches to PPG, PPR, and CPA. For PPG, radix-2 is the easiest, and radix-4 with the digit set {-2, -1, 0, 1, 2} is very popular. For PPR, two alternatives exist: a) reduction by rows, performed by an array of adders, and b) reduction by columns, performed by an array of counters. The final CPA lies on the critical path, so it requires a fast adder scheme.
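To make the radix-4 digit set concrete, the sketch below recodes a multiplier into digits from {-2, -1, 0, 1, 2}. The text does not name the recoding scheme; modified Booth recoding is assumed here as the usual way to obtain this digit set. The function names are illustrative, the multiplier is assumed to be an n-bit two's-complement value, and n is assumed to be even.

```python
def radix4_digits(y, n):
    """Recode an n-bit two's-complement integer y (n even) into radix-4
    digits in {-2, -1, 0, 1, 2}, least significant digit first.
    Assumed scheme: modified Booth recoding."""
    # Python's >> on negative ints is arithmetic, so "& 1" yields the
    # two's-complement bits directly.
    bits = [(y >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, n, 2):
        digits.append(bits[i] + prev - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits


def radix4_multiply(x, y, n):
    """Multiply x by y using one partial product per radix-4 digit."""
    # Each digit selects 0, +/-x or +/-2x, so only n/2 partial products
    # are generated instead of n.
    return sum(d * x * 4 ** k for k, d in enumerate(radix4_digits(y, n)))


if __name__ == "__main__":
    print(radix4_digits(7, 4), radix4_multiply(5, 7, 4))
```

The design benefit is that an n-bit multiplier produces n/2 partial products rather than n, at the cost of a slightly more complex generation step.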
II. Methodology
Design methodology of gate diffusion input technique
Basic gate diffusion input (GDI cell) functions
The Gate Diffusion Input (GDI) method is based on the use of a simple cell, shown in Fig 2.1. At first glance the basic cell resembles the standard CMOS inverter, but there are some important differences:
(1) The GDI cell has three inputs: G (common gate input of the NMOS and PMOS), P (input to the source/drain of the PMOS), and N (input to the source/drain of the NMOS).
(2) The bulks of the NMOS and PMOS are connected to N or P (respectively), so they can be arbitrarily biased, in contrast with a CMOS inverter.
A simple change of the input configuration of the basic GDI cell shown in Fig 2.1 yields six different Boolean functions.
Fig 2.1 Basic GDI Cell
1. When N=0, P=B and G=A, the output is D=A'B, which is function F1.
2. When N=B, P=1 and G=A, the output is D=A'+B, which is function F2.
3. When N=B, P=0 and G=A, the output is D=AB, which is the AND function.
4. When N=1, P=B and G=A, the output is D=A+B, which is the OR function.
5. When N=C, P=B and G=A, the output is D=A'B+AC, which is the MUX function.
6. When N=0, P=1 and G=A, the output is D=A', which is the NOT function.
Here A' denotes the complement of A.

When the N input is driven to a high logic level and the P input is at a low logic level, the diodes between the NMOS and PMOS bulks and the output are forward biased and there is a short between N and P, resulting in static power dissipation and Vout ~ 0.5 Vdd. This is a drawback for OR, AND and MUX implementations in regular CMOS. The effect can be reduced if the design is performed in floating-bulk SOI technologies, where a full GDI library can be implemented. In that case, floating-bulk effects have been considered.

The GDI cell structure has some important features that allow improvements in design complexity, transistor count and power dissipation. A deeper understanding of the basic cell's operation can be gained by analysing the GDI cell properties in different cases and configurations.
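The six input configurations can be checked against their Boolean functions with an idealised switch-level model. This is a sketch only: `gdi_cell` assumes the PMOS passes P when G=0 and the NMOS passes N when G=1, and it ignores threshold drops, bulk effects and the static-current case described above. The names `gdi_cell`, `CONFIGS` and `verify_configs` are illustrative.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Idealised GDI cell: output follows P when G=0 and N when G=1."""
    return n if g else p


# name -> (cell wired as in the list above, expected Boolean function)
CONFIGS = {
    "F1":  (lambda a, b, c: gdi_cell(g=a, p=b, n=0), lambda a, b, c: (1 - a) & b),
    "F2":  (lambda a, b, c: gdi_cell(g=a, p=1, n=b), lambda a, b, c: (1 - a) | b),
    "AND": (lambda a, b, c: gdi_cell(g=a, p=0, n=b), lambda a, b, c: a & b),
    "OR":  (lambda a, b, c: gdi_cell(g=a, p=b, n=1), lambda a, b, c: a | b),
    "MUX": (lambda a, b, c: gdi_cell(g=a, p=b, n=c),
            lambda a, b, c: ((1 - a) & b) | (a & c)),
    "NOT": (lambda a, b, c: gdi_cell(g=a, p=1, n=0), lambda a, b, c: 1 - a),
}


def verify_configs():
    """Compare each configuration with its function on every input."""
    return {
        name: all(cell(a, b, c) == expected(a, b, c)
                  for a, b, c in product((0, 1), repeat=3))
        for name, (cell, expected) in CONFIGS.items()
    }


if __name__ == "__main__":
    print(verify_configs())
```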
Designing of full adders
The conventional Wallace tree multiplier uses full adders in its reduction phases, so a conventional CMOS full adder circuit was designed first. The modified full adder uses the GDI logic style, which requires fewer MOSFETs.
Fig 2.2 Full Adder Block Diagram
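The text does not give the gate-level structure of the GDI full adder. As a hedged illustration, the sketch below uses one common multiplexer-based decomposition, in which every stage is a 2:1 selection of the kind the GDI MUX configuration provides. The helper `mux` is an idealised stand-in, and all names are assumptions made for this example.

```python
from itertools import product


def mux(sel, d0, d1):
    """Idealised 2:1 multiplexer: d0 when sel=0, d1 when sel=1."""
    return d1 if sel else d0


def mux_full_adder(a, b, cin):
    """Full adder built only from 2:1 selections and bit complements."""
    x = mux(a, b, 1 - b)          # a XOR b: pass b, or its complement
    s = mux(x, cin, 1 - cin)      # sum = (a XOR b) XOR cin
    cout = mux(x, a, cin)         # a == b: carry is a; otherwise carry is cin
    return s, cout


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        print(a, b, cin, mux_full_adder(a, b, cin))
```

The carry stage relies on the observation that when the two operand bits are equal the carry-out equals either of them, and when they differ the carry-in is propagated.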
Partial product generation
Partial product generation is the next step. Reducing the number of partial products reduces the complexity of the multiplier.
III. Design and Implementation
FUNCTIONAL BLOCK DIAGRAM OF WALLACE TREE MULTIPLIER
FIG 3.1 FUNCTIONAL BLOCK DIAGRAM OF THE WALLACE TREE MULTIPLIER
IV. Conclusion
From the power and area analysis of the Wallace tree multiplier with CMOS logic and with GDI logic, we found that the GDI implementation has lower power and area than the CMOS implementation. The primary goal was not only to reduce area efficiently but also to reduce power dissipation. Basic low-power cell structures, such as a two-input XOR gate, were designed using both the complementary CMOS logic style and the Gate Diffusion Input technique. The cell structures were designed in Tanner Tool with 90nm technology. Area and power were measured for both logic styles and compared with each other. All the circuits operate at supply voltages of 1.4V and 0.8V. With the Gate Diffusion Input technique, the circuit energies are conserved rather than dissipated as heat. Besides the power reduction, the GDI technique also reduces size: for example, an AND gate uses 6 transistors in the complementary CMOS style, whereas only 2 transistors are used in the GDI technique. Depending on the application and the system requirements, this approach can be used to reduce the power dissipation of a digital system. With the GDI technique, power savings of 40% to 60% can be reached.
