Abstract: In VLSI technology, power consumption and delay have become major problems in multipliers. To address these issues, we propose a new multiplier algorithm that combines numerical transformation with the shift-and-add technique. In this design, N × N bit multiplication is performed by successive approximation using an (N-1) × (N-1) bit multiplier, and the strength of the multiplication is reduced by a weight reduction technique. The performance of the N-bit multiplier built on successive approximation of an (N-1)-bit multiplier is compared with existing techniques and evaluated through simulation. The comparison highlights its speed advantage, which comes from a reduced number of components and interconnections. All the multipliers are coded in VHDL, simulated in ModelSim and synthesized with the EDA tool Xilinx_ISE 9.2i. A power analysis is also presented. The results show that the method is suited to higher bit widths and is an excellent choice for high-speed, low-area applications.
Introduction
The multiplier is a fundamental component of most modern digital signal processors (DSPs), and power analyses show that multipliers account for a large share of the power consumed in these chips. The computational speed depends on the critical path of the circuit. The array multiplier is one of the most familiar architectures, but it suffers from non-uniform path delays across its structure. Beyond the array multiplier, many authors have proposed other multiplication algorithms and architectures.
In [1], the author suggested improving the speed of the array multiplier by balancing the delays of the carry and sum outputs; delay elements were introduced to compensate for the delay imbalance in the structure. In [2], multiplication was done in two steps. First, the complex multiplication operations were decomposed into elementary shift-and-add operations. Second, a number transformation was applied to the original system to obtain equivalent architectures with different multiplier coefficients. In [3], the researchers proposed high-speed multiply-accumulate structures based on the Baugh-Wooley Algorithm (BWA), with the multipliers implemented using the Modified Booth Algorithm (MBA).
In [4], a bit-parallel GF(2^m) multiplier was proposed: the Mastrovito multiplier was modified, and sub-expression sharing was exploited in computing the product matrix to reduce complexity. In [5], a multiplier for digital image signal processing was designed using a separated multiplication technique to minimize power dissipation. Because 2-D image data has high spatial redundancy, the multiplication is divided into higher and lower parts; the products computed for the higher bits are stored in memory cells and reused whenever a cache hit occurs.
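The separated multiplication idea of [5] can be illustrated with a small behavioral sketch. It assumes, purely for illustration, that the multiplicand x is split at a hypothetical bit position `K`, that the high-part product is memoized with Python's `functools.lru_cache` as a stand-in for the memory cells, and that a repeated high part plays the role of a cache hit. The names `high_product` and `separated_multiply` are hypothetical; this is not the circuit of [5].

```python
from functools import lru_cache

K = 4  # hypothetical split point: the low part holds the K least significant bits of x


@lru_cache(maxsize=256)
def high_product(x_high: int, y: int) -> int:
    """Product of the high part of x with y; cached so repeated high parts hit."""
    return x_high * y


def separated_multiply(x: int, y: int) -> int:
    """Compute x * y as (x_high * y) << K plus x_low * y, reusing cached high products."""
    x_high = x >> K               # higher bits of x
    x_low = x & ((1 << K) - 1)    # lower K bits of x
    return (high_product(x_high, y) << K) + x_low * y
```

Because neighbouring pixels often share their high-order bits, consecutive calls such as `separated_multiply(200, 37)` and `separated_multiply(201, 37)` reuse the same cached high-part product (both operands have high part 12 when `K = 4`).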
In [6], a radix-4 modular multiplier was proposed for use in RSA (Rivest, Shamir and Adleman) cryptography. The Booth algorithm was modified, and a radix-4 cellular array multiplier was designed at both the bit level and the digit level. Low-power multipliers have also been investigated by minimizing the switching activity of partial products according to the effective dynamic range of the input data. Many studies [7], [8], [9], [10] have targeted low-power, low-area multiplier design, for example by modifying the conventional shift-and-add multiplier to reduce its energy consumption. Bypassing zero partial products during addition (bypass-zero) was implemented in [11], yielding an average power reduction of 30%.
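The radix-4 (modified) Booth recoding mentioned above halves the number of partial products by recoding the multiplier into digits in {-2, -1, 0, 1, 2}. The sketch below shows the textbook recoding for an n-bit two's-complement multiplier with n even; it is a generic illustration, not the specific modified algorithm of [6], and the function names are hypothetical.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier y (n even) into radix-4 Booth digits.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1] with y[-1] = 0, so every digit lies in -2..2.
    """
    assert n % 2 == 0 and -(1 << (n - 1)) <= y < (1 << (n - 1))
    digits = []
    prev = 0  # y[2i-1], the bit shared with the previous group
    for i in range(0, n, 2):
        b0 = (y >> i) & 1        # y[2i]; Python's >> sign-extends negative y
        b1 = (y >> (i + 1)) & 1  # y[2i+1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_radix4_multiply(x: int, y: int, n: int) -> int:
    """Sum the n/2 partial products digit * x weighted by 4**i.

    In hardware each partial product is 0, +-x or +-2x (a shift), so no full multiply is needed.
    """
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, n)))
```

For example, the 4-bit multiplier 7 recodes to the digits [-1, 2], i.e. 7 = -1 + 2·4, so only two partial products are summed instead of four.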
Many papers [12], [13], [14], [15] focused on the design of low-power array multipliers, using signal-flow optimization in the 3:2 adder array for linear partial-product reduction, a left-to-right leapfrog (LRLF) structure, and an upper/lower splitting structure. In [13], the authors discussed the spurious power suppression technique (SPST).
The remainder of the paper is organized as follows. In Section II, the existing methods and their problems are discussed. In Section III, the redundant calculation and the shift-and-add method are discussed. In Section IV, the proposed architecture with its different conditional modes is discussed. Finally, the results and concluding remarks are given in Section V.
Existing Methods
The main advantage of the array multiplier is its regular structure: it consists of identical cells, generates all partial products simultaneously and accumulates them at the same time. Its disadvantage is that it requires a large number of logic gates. Another important and widely used multiplier is the shift-and-add multiplier. Its main problem is high power dissipation caused by high switching activity. The major sources of switching activity are the shifting of the 'B' register, activity in the counter, activity in the adder, switching between '0' and 'A' in the multiplexer, activity in the multiplexer select, and the shifting of the partial product register.
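A minimal behavioral sketch of the conventional shift-and-add multiplier is given below in Python, purely for illustration (the designs in this paper are in VHDL). The function name `shift_add_multiply` and the `bypass_zero` flag are hypothetical. The flag models the bypass-zero idea of [11] in the simplest way, by skipping the adder when the current multiplier bit is 0. The returned `additions` count is only a rough proxy for adder activity, not a power estimate.

```python
def shift_add_multiply(a: int, b: int, n: int, bypass_zero: bool = False) -> tuple[int, int]:
    """Multiply unsigned n-bit integers a and b by shift-and-add.

    Returns (product, additions); `additions` counts adder operations and is
    only a rough proxy for adder switching activity.
    """
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit integers")
    product = 0
    additions = 0
    for i in range(n):
        bit = (b >> i) & 1                  # next bit of the 'B' (multiplier) register
        if bypass_zero and bit == 0:
            continue                        # bypass-zero: the adder is not used for a 0 bit
        partial = (a << i) if bit else 0    # multiplexer selects A (shifted) or 0
        product += partial                  # adder accumulates the partial product
        additions += 1
    return product, additions
```

Here A is shifted left instead of shifting the partial-product register right; both give the same product. For 13 × 10, the multiplier 10 = 1010b has two 1 bits, so bypass-zero halves the number of additions from four to two.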

Numerical Transformation
Numerical transformation is a technique for rewriting data in another, equivalent form. Common examples include number splitting, sub-expression sharing and constant multiplication. In [8], number splitting is used in constant multiplication for shift-and-add decomposition. Constant multiplication appears in applications such as matrix multiplication and FIR and IIR filters.
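To illustrate how number splitting turns a constant multiplication into shift-and-add operations, the sketch below splits the constant into its powers of two, e.g. 10 = 2^3 + 2^1, so 10x = (x << 3) + (x << 1). This plain binary split is chosen for illustration only; it is not necessarily the decomposition used in [8], which may use signed digits or shared sub-expressions to need fewer adders. The function names are hypothetical.

```python
def shift_add_terms(constant: int) -> list[int]:
    """Split a non-negative constant into the bit positions of its 1 bits."""
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def multiply_by_constant(x: int, constant: int) -> int:
    """Multiply x by a fixed constant using only shifts and additions."""
    return sum(x << i for i in shift_add_terms(constant))
```

Since the constant is known in advance, `shift_add_terms` would be evaluated once at design time, leaving only the shifts (wiring) and adders in hardware.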
In this paper, the product of two N-bit numbers is derived from an (N-1)-bit multiplication using a weight reduction technique. The strength of the multiplication is reduced by reducing the maximum weight, $2^{N-1}$, of each number. The numerical transform is used to perform this weight reduction, which lowers the computational complexity of the multiplier. The concept is explained through an example. Let X and Y be 4-bit numbers (N = 4). X can be represented as
$$X = X_{N-1}\,2^{N-1} + \sum_{i=0}^{N-2} X_i\,2^i = X_3\,2^3 + X_2\,2^2 + X_1\,2^1 + X_0\,2^0$$
and similarly
$$Y = Y_{N-1}\,2^{N-1} + \sum_{i=0}^{N-2} Y_i\,2^i = Y_3\,2^3 + Y_2\,2^2 + Y_1\,2^1 + Y_0\,2^0$$
Here, the weight of the MSB ($X_{N-1}$ and $Y_{N-1}$) has to be reduced to $2^{N-2}$. The weight reduction is achieved by deriving a redundant term (the "redundant") through the numerical transform. Weight reduction is done in three ways, and an algorithm is developed for each: Mode I, both numbers are greater than or equal to $2^{N-1}$; Mode II, both numbers are less than $2^{N-1}$; and Mode III, one number is greater than or equal to $2^{N-1}$ and the other is less than $2^{N-1}$. Considering these modes, four architectures are proposed. Each number is split around $2^{N-1}$ into two parts: the maximum weight $2^{N-1}$ and the redundant. The redundant may be positive or negative: if the number is greater than or equal to $2^{N-1}$, the redundant is positive (zero when the number equals $2^{N-1}$); if the number is less than $2^{N-1}$, the redundant is negative.
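The mode selection described above can be expressed as a small classification function. This is a sketch assuming unsigned N-bit operands; the function name `operand_mode` is hypothetical, and the returned strings simply label Modes I to III.

```python
def operand_mode(x: int, y: int, n: int) -> str:
    """Return 'I', 'II' or 'III' for a pair of unsigned n-bit operands."""
    half = 1 << (n - 1)  # maximum weight 2^(N-1)
    x_high, y_high = x >= half, y >= half
    if x_high and y_high:
        return "I"    # both operands >= 2^(N-1): both redundants non-negative
    if not (x_high or y_high):
        return "II"   # both operands < 2^(N-1): both redundants negative
    return "III"      # one operand on each side of 2^(N-1)
```

For N = 4, the pair (13, 6) falls in Mode III because 13 ≥ 8 while 6 < 8.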
For example, considering the numbers
The second term in (2) is known as the positive redundancy, while the redundant calculated in (3) is negative. The redundant is derived relative to the maximum weight $2^{N-1}$. If the number is less than $2^{N-1}$, the redundant is negative, and its magnitude is obtained by subtracting the number from $2^{N-1}$. If the number is greater than or equal to $2^{N-1}$, the redundant is positive and is obtained by subtracting $2^{N-1}$ from the number. By calculating the redundancy, the N-bit number is reduced to N-1 bits, and an (N-1)-bit multiplier is used to multiply the redundants. The redundant calculation and the shift-and-add method are discussed in Section III.
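Writing each operand as the maximum weight plus a signed redundant shows where the (N-1)-bit multiplier enters. The derivation below is a sketch of that algebra; the symbols $s_X$, $s_Y$ (sign of the redundant) and $a$, $b$ (its magnitude) are introduced here for illustration and do not appear in the original text. Only the product $ab$ needs a multiplier; the remaining terms are shifts, additions and subtractions.

$$
\begin{aligned}
X &= 2^{N-1} + s_X\,a, \qquad Y = 2^{N-1} + s_Y\,b, \qquad s_X, s_Y \in \{+1, -1\},\\
XY &= 2^{2N-2} + 2^{N-1}\,(s_X a + s_Y b) + s_X s_Y\,ab,\\
\text{Mode I } (s_X = s_Y = +1):\quad XY &= 2^{2N-2} + 2^{N-1}(a + b) + ab,\\
\text{Mode II } (s_X = s_Y = -1):\quad XY &= 2^{2N-2} - 2^{N-1}(a + b) + ab,\\
\text{Mode III } (s_X = +1,\ s_Y = -1):\quad XY &= 2^{2N-2} + 2^{N-1}(a - b) - ab.
\end{aligned}
$$

Mode III with the roles of X and Y swapped follows by symmetry. As a worked example with N = 4, X = 13 and Y = 6 give a = 5 (positive redundant) and b = 2 (negative redundant), so Mode III yields XY = 64 + 8(5 - 2) - 10 = 78 = 13 × 6.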
Proposed Architecture
In this section, the proposed algorithms are presented together with the four architectures for the different modes, and the results are proven theoretically. The three modes are discussed in detail below. The comparator outputs 1 if the number is greater than or equal to $2^{N-1}$. The conditional subtractor is designed so that, if the control signal is 0, its output is obtained by subtracting the number from $2^{N-1}$; otherwise, $2^{N-1}$ is subtracted from input A or B.
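A behavioral model of the comparator and conditional subtractor is sketched below in Python rather than VHDL, with hypothetical function names. It also exposes a boundary case: an input of 0 produces a redundant of $2^{N-1}$, which needs N bits rather than N-1, so a hardware implementation would have to treat a zero operand specially (this sketch does not).

```python
def comparator(value: int, n: int) -> int:
    """Return 1 when value >= 2^(N-1), else 0."""
    return 1 if value >= (1 << (n - 1)) else 0


def conditional_subtractor(value: int, control: int, n: int) -> int:
    """Control 0: 2^(N-1) - value; control 1: value - 2^(N-1)."""
    half = 1 << (n - 1)
    return half - value if control == 0 else value - half


def redundant(value: int, n: int) -> int:
    """Magnitude of the redundant, with the comparator driving the subtractor."""
    return conditional_subtractor(value, comparator(value, n), n)
```

For N = 4, `redundant(13, 4)` is 5 (13 = 8 + 5) and `redundant(6, 4)` is 2 (6 = 8 - 2); every nonzero 4-bit input yields a redundant that fits in 3 bits.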
The combined structure is shown in Fig. 4. With the combined structure, the product can be computed for operands in any mode. This structure is similar to the one shown in Fig. 3; here, the control signal that selects addition or subtraction is generated by a simple logic gate.
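One plausible reading of the combined structure is sketched below under explicit assumptions. The two comparator outputs are XORed to decide whether the (N-1)-bit product of the redundants is added or subtracted, and the cross term $2^{N-1}(s_X a + s_Y b)$ is formed with a shift. The XOR control is our assumption about the "simple logic gate", and `combined_multiply` is a hypothetical name. Python integers stand in for the (N-1)-bit multiplier, so the case of a zero operand, whose redundant $2^{N-1}$ does not fit in N-1 bits, is not treated specially.

```python
def combined_multiply(x: int, y: int, n: int) -> int:
    """Compute x * y for unsigned n-bit operands via redundants around 2^(N-1)."""
    half = 1 << (n - 1)
    cx = 1 if x >= half else 0          # comparator output for x
    cy = 1 if y >= half else 0          # comparator output for y
    a = x - half if cx else half - x    # conditional subtractor: |redundant of x|
    b = y - half if cy else half - y    # conditional subtractor: |redundant of y|
    sx = 1 if cx else -1                # sign of the redundant of x
    sy = 1 if cy else -1                # sign of the redundant of y
    ab = a * b                          # stands in for the (N-1)-bit multiplier
    subtract_ab = cx ^ cy               # assumed XOR control: subtract ab in Mode III
    cross = sx * a + sy * b             # signed sum of the redundants
    result = (1 << (2 * n - 2)) + (cross << (n - 1))  # 2^(2N-2) + 2^(N-1) * cross
    return result - ab if subtract_ab else result + ab
```

For example, `combined_multiply(13, 6, 4)` evaluates 64 + 24 - 10 = 78; the same function covers Modes I, II and III, selected only by the comparator outputs.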
The above structure can be simplified by eliminating the comparator. For N = 4, the comparator outputs 1 if the number is greater than or equal to 8 ($2^{N-1}$). Instead of a dedicated comparator, the MSB of the number can be used directly: for an unsigned N-bit number, MSB = 1 exactly when the number is greater than or equal to $2^{N-1}$.
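The comparator-free variant can be sketched as follows, assuming unsigned N-bit inputs; `redundant_from_msb` is a hypothetical name. A side effect of using the MSB is that when it is 1, value - $2^{N-1}$ is simply the low N-1 bits of the value, so that path needs no subtraction at all.

```python
def redundant_from_msb(value: int, n: int) -> tuple[int, int]:
    """Return (msb, |redundant|) for an unsigned n-bit value without a comparator."""
    half = 1 << (n - 1)
    msb = (value >> (n - 1)) & 1     # replaces the comparator output
    low_bits = value & (half - 1)    # value with its MSB dropped
    # MSB = 1: value - 2^(N-1) equals the low bits; MSB = 0: subtract from 2^(N-1)
    redundant = low_bits if msb else half - value
    return msb, redundant
```

For N = 4, 13 = 1101b gives (1, 5) by dropping the MSB, while 6 = 0110b gives (0, 2) through the subtraction 8 - 6.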
Results and Discussion
For comparison, we coded both the conventional methods and the proposed multiplier in VHDL for different bit widths and implemented them on a Xilinx Virtex FPGA. The power analysis of the gate-level structure and the delay calculations were carried out with the Xilinx ISE tool. The array multiplier is widely used because of its regular, linear structure, and it is advantageous for small bit widths.
Tables 1 and 2 report the results of the proposed multiplier for different bit widths. The multiplier is a basic building block in most applications, and the array and shift-and-add multipliers are the most widely used. In Table 2, the proposed multiplier is compared with the existing array multiplier and shift-and-add multiplier for the standard bit sizes 4, 8, 16 and 32. The results show that the proposed method is well suited to higher bit widths. For small bit widths, the proposed method offers no advantage: for the bit size 2, its delay is higher, but for higher bit widths the delay is reduced. Similarly, the number of LUTs also increases with the bit width. Overall, the proposed method reduces the area by up to 71% and the delay by up to 77%, and the lower delay translates into higher speed.
Conclusion and Future Work
In this paper, a multiplier based on numerical transformation, namely number splitting, is designed, and the algorithm is mapped to hardware. Based on the algorithmic concepts, four modules are proposed, and all of them are implemented on a Xilinx Virtex FPGA. Compared with existing designs, the proposed method occupies less area with lower delay. It is not advantageous for small bit widths, but its area and delay are considerably better at higher bit widths, so the method is suited to higher-order multiplication. As future work, the architecture can be modified to improve the speed further.
