Floating point multiplication is a crucial operation in high performance computing applications such as image processing and signal processing. Multiplication is also the most time and power consuming operation. This paper proposes an efficient method for IEEE 754 floating point multiplication which gives a better implementation in terms of delay and power. A combination of the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm (Vedic Mathematics) is used to implement an unsigned binary multiplier for mantissa multiplication. The multiplier is implemented using Verilog HDL, targeted on Spartan-3E and Virtex-4 FPGAs.
I. INTRODUCTION
Floating point multiplication units are an essential IP for modern multimedia and high performance computing applications such as graphics acceleration, signal processing and image processing. A lot of effort has been made over the past few decades to improve the performance of floating point computations. Floating point units are not only complex, but also require more area and hence consume more power than fixed point multipliers. The complexity of the floating point unit increases further as accuracy becomes a major issue. IEEE 754 [1] supports different floating point formats such as single precision, double precision and quadruple precision. But as the precision increases, multiplier area, delay and power increase drastically. In this paper, we present a new multiplication method which uses a combination of the Karatsuba and Urdhva-Tiryagbhyam (Vedic Mathematics) algorithms for multiplication. This combination not only reduces delay, but also reduces the percentage increase in hardware as compared to conventional methods.
The two most commonly used IEEE 754 formats are the single precision and double precision formats [1, 2]. Fig. 1 shows the different IEEE 754 floating point formats used commonly. The single precision format is 32 bits wide and the double precision format is 64 bits wide. The most significant bit is the sign bit. The exponent is a signed integer, which is stored as an unsigned value by adding a bias. In single precision format, the exponent is 8 bits wide and the bias is 127, i.e. the exponent has a range of -127 to +128. In double precision format, the exponent is 11 bits wide and the bias is 1023, i.e. the exponent has a range of -1023 to +1024. The mantissa or significand of the single precision format is 23 bits wide and that of the double precision format is 52 bits wide. The maximum value that can be represented using floating point format is .
And the minimum value that can be represented is .
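The field layout described above can be illustrated with a small software sketch. It is only an illustration, not part of the proposed hardware: the helper name `decode_single` is hypothetical, and Python's standard `struct` module is used to obtain the 32-bit single precision pattern of a value.

```python
import struct

def decode_single(x: float):
    """Split a value, stored as IEEE 754 single precision, into its three fields.

    Returns (sign, biased_exponent, fraction):
      sign            : 1 bit  (bit 31)
      biased_exponent : 8 bits (bits 30..23), actual exponent + 127
      fraction        : 23 bits (bits 22..0), mantissa without the hidden bit
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction

# -6.5 = -1.625 x 2^2, so the stored exponent is 2 + 127 = 129
sign, biased_exponent, fraction = decode_single(-6.5)
```

Subtracting the bias (127) from `biased_exponent` recovers the actual exponent of the value.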
II. FLOATING POINT MULTIPLIER DESIGN
A floating point number has four parts: sign, exponent, significand or mantissa, and the exponent base. A floating point number is represented in IEEE-754 format [1, 2] as $\pm\, s \times b^{e}$ or $\pm\,\text{significand} \times \text{base}^{\text{exponent}}$. The exponent base for binary format is 2. To multiply two floating point numbers $\pm\, s_1 \times b^{e_1}$ and $\pm\, s_2 \times b^{e_2}$, the significand or mantissa parts are multiplied to get the product mantissa and the exponents are added to get the product exponent, i.e. the product is $\pm\,(s_1 \times s_2) \times b^{e_1 + e_2}$. The hardware block diagram of the floating point multiplier is shown in Fig. 2. The important blocks in the implementation of the floating point multiplier [3] are described below.
A. Sign Calculation
The MSB of a floating point number represents the sign bit. The sign of the product is positive if both operands have the same sign and negative if they have opposite signs. So, to obtain the sign of the product, we use a simple XOR gate as the sign calculator.
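As a behavioural illustration of the sign calculator, the XOR of the two sign bits can be written as a one-line function. The name `product_sign` is hypothetical and the sketch only models the logic, not the gate-level design.

```python
def product_sign(sign_a: int, sign_b: int) -> int:
    """Sign bit of the product: 0 (positive) when the operand signs match,
    1 (negative) when they differ. This is a single XOR gate in hardware."""
    return (sign_a ^ sign_b) & 1
```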
B. Addition of Exponents
To get the product exponent, the input exponents are added together. Since we use a bias in the floating point format exponent, we need to subtract the bias from the sum of the exponents to get the actual exponent. The bias is 127 ($01111111_2$) for single precision and 1023 ($01111111111_2$) for double precision. In the proposed custom precision format also, a bias […]
The computational time of the mantissa multiplication operation is much more than that of the exponent addition. So a simple ripple carry adder and a ripple borrow subtractor are optimal for exponent addition.
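The bias handling can be sketched in software as follows. This is a minimal sketch assuming single precision by default; the names `product_exponent`, `BIAS_SINGLE` and `BIAS_DOUBLE` are hypothetical, and range checks for overflow or underflow are deliberately left out.

```python
BIAS_SINGLE = 127    # 8-bit exponent field
BIAS_DOUBLE = 1023   # 11-bit exponent field

def product_exponent(biased_exp_a: int, biased_exp_b: int, bias: int = BIAS_SINGLE) -> int:
    """Biased exponent of the product before normalization.

    Each stored exponent already contains the bias once, so their sum
    contains it twice; one bias is subtracted to restore the format.
    """
    return biased_exp_a + biased_exp_b - bias

# 2.0 (stored exponent 128) times 4.0 (stored exponent 129) gives 8.0 (stored exponent 130)
example = product_exponent(128, 129)
```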
C. Karatsuba-Urdhva Tiryagbhyam binary multiplier
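In floating point multiplication, the most complex part is the mantissa multiplication. The multiplication operation requires more time compared to addition, and as the number of bits increases, it consumes more area and time. In double precision format we need a 53x53 bit multiplier, and in single precision format we need a 24x24 bit multiplier. It requires much time to perform these operations, and this is the major contributor to the delay of the floating point multiplier. To make the multiplication operation faster, the proposed model uses a combination of the Karatsuba algorithm and the Urdhva Tiryagbhyam algorithm.
This method can be further optimized to reduce the amount of hardware. A more optimized hardware architecture [9, 10] is shown in Fig. 5. This model eliminates the need for three operand 7-bit adders, which reduces hardware and delay. The adders are connected in a ripple manner.
[Figure: hardware for the Urdhva Tiryagbhyam sutra, a chain of adders (ADDER 1, ADDER 2, ...) producing the product bits up to p7]
Since there are more than two operands in the additions, we can use carry save addition to implement the adders. This technique reduces the delay to a great extent compared to a ripple carry adder.
Karatsuba Algorithm for multiplication
The Karatsuba multiplication algorithm [11, 12] is suited for multiplying very large numbers. This method was proposed by Anatoli Karatsuba in 1962. It is a divide and conquer method, in which we divide the numbers into their Most Significant half and Least Significant half and then the multiplication is performed. The Karatsuba algorithm reduces the number of multipliers required by replacing multiplication operations with addition operations. Addition operations are faster than multiplications and hence the speed of the multiplier is increased. As the number of bits of the inputs increases, the Karatsuba algorithm becomes more efficient. This algorithm is optimal if the width of the inputs is more than 16 bits. The hardware architecture of the Karatsuba algorithm is shown in Fig. 6. The Karatsuba algorithm for two $n$-bit inputs $X$ and $Y$ can be explained as follows. Let $X_l$, $Y_l$ be the Most Significant halves and $X_r$, $Y_r$ the Least Significant halves of the inputs, so that
$$X = 2^{n/2} X_l + X_r, \qquad Y = 2^{n/2} Y_l + Y_r .$$
The product is then
$$XY = 2^{n} X_l Y_l + 2^{n/2}\,(X_l Y_r + X_r Y_l) + X_r Y_r . \tag{3}$$
The equation (3) can be re-written as
$$XY = 2^{n} X_l Y_l + 2^{n/2}\,\bigl[(X_l + X_r)(Y_l + Y_r) - X_l Y_l - X_r Y_r\bigr] + X_r Y_r ,$$
which needs three half-width multiplications instead of four.
The recurrence of the Karatsuba algorithm is $T(n) = 3\,T(n/2) + O(n)$, which gives $T(n) = O(n^{\log_2 3}) \approx O(n^{1.585})$.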
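The combination of the two algorithms can be sketched in software as below. This is a behavioural illustration only, not the Verilog design: the function names, the column-by-column formulation of the vertically-and-crosswise step, and the `threshold` width at which the recursion switches from Karatsuba to Urdhva Tiryagbhyam are all assumptions made for the example.

```python
def urdhva_multiply(a: int, b: int, width: int) -> int:
    """Vertically-and-crosswise multiplication of two unsigned width-bit integers.

    For every output column, the crosswise partial products a[i] & b[col - i]
    are summed together with the carry from the previous column.
    """
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = 0, 0
    for col in range(2 * width - 1):
        first = max(0, col - width + 1)
        last = min(col, width - 1)
        column_sum = carry + sum(a_bits[i] & b_bits[col - i]
                                 for i in range(first, last + 1))
        result |= (column_sum & 1) << col   # product bit of this column
        carry = column_sum >> 1             # remainder moves to the next column
    return result | (carry << (2 * width - 1))


def karatsuba_multiply(x: int, y: int, width: int, threshold: int = 8) -> int:
    """Karatsuba multiplication of two unsigned width-bit integers.

    Operands of at most `threshold` bits (an assumed value, at least 3) are
    handed to the Urdhva Tiryagbhyam multiplier; wider operands are split
    into most and least significant halves.
    """
    if width <= threshold:
        return urdhva_multiply(x, y, width)
    half = width // 2
    mask = (1 << half) - 1
    x_l, x_r = x >> half, x & mask
    y_l, y_r = y >> half, y & mask
    high_width = width - half
    high = karatsuba_multiply(x_l, y_l, high_width, threshold)
    low = karatsuba_multiply(x_r, y_r, half, threshold)
    # The sums of the halves can be one bit wider than the halves themselves.
    mid = karatsuba_multiply(x_l + x_r, y_l + y_r, high_width + 1, threshold)
    cross = mid - high - low                # equals x_l*y_r + x_r*y_l
    return (high << (2 * half)) + (cross << half) + low


# 24x24 bit mantissa product of 1.5 and 1.25 (hidden bit included)
mantissa_product = karatsuba_multiply(0xC00000, 0xA00000, 24)
```

Each recursion level performs three narrower multiplications and a few additions and subtractions, which is the trade described by the recurrence above.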
D. Normalization of the result
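Floating point representations have a hidden bit in the mantissa, which always has a value of 1. This bit is not stored in the memory, to save one bit. The leading one of the mantissa is considered to be the hidden bit, i.e. the 1 immediately to the left of the decimal point. Usually, normalization is done by shifting, so that the MSB of the mantissa becomes nonzero; in radix 2, nonzero means 1. The multiplication result is shifted until the leading one is at the immediate left of the decimal point. For each such shift operation of the result, the exponent is incremented by one. This is called normalization of the result. Since the value of the hidden bit is always 1, it is called the hidden 1.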
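The normalization step can be sketched for single precision as follows. The sketch assumes both input mantissas are normalized 24-bit values (hidden 1 included), so the 48-bit product needs at most one shift, and it truncates the low-order bits instead of rounding. The name `normalize_product` is hypothetical.

```python
def normalize_product(product: int, biased_exp: int, frac_bits: int = 23):
    """Normalize the product of two significands of the form 1.f.

    `product` has 2 * (frac_bits + 1) bits and represents a value in [1, 4).
    If its top bit is set the value is in [2, 4): shift once and increment
    the exponent. Returns (fraction, biased_exp), where the fraction has the
    hidden 1 removed and the extra low-order bits truncated.
    """
    top_bit = 2 * frac_bits + 1            # bit 47 for single precision
    if (product >> top_bit) & 1:
        product >>= 1
        biased_exp += 1
    fraction = (product >> frac_bits) & ((1 << frac_bits) - 1)
    return fraction, biased_exp

# 1.5 x 1.5 = 2.25 = 1.125 x 2^1
fraction, biased_exp = normalize_product(0xC00000 * 0xC00000, 127)
```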
E. Representation of exceptions
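Some of the numbers cannot be represented with a normalized significand. To represent those numbers, a special code is assigned to them. In the proposed design, we use output signals, namely Zero, Infinity, NaN and Denormal, to represent these exceptions. If the product has a biased exponent of 0 and a significand of 0, the result is taken as Zero (±0). If the product has a biased exponent of 255 and a significand of 0, the result is taken as Infinity (∞).
If the product has a biased exponent of 255 and a nonzero significand, the result is taken as NaN. Denormalized values, or Denormals, are numbers without a hidden 1 and with the smallest possible exponent. Denormals are used to represent certain small numbers that cannot be represented as normalized numbers. If the product has a biased exponent of 0 and a nonzero significand, then the result is represented as a Denormal. A Denormal is represented as $\pm\, 0.s \times 2^{-126}$, where $s$ is the significand.
III. IMPLEMENTATION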
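The main objective of this paper is to design and implement a floating point multiplier with efficient operation both in terms of delay and […]. Since mantissa multiplication is the most complex part of the floating point multiplier, we designed a multiplier which operates at high speed and whose increase in delay and area stays small with an increase in the number of bits. A floating point multiplier for the IEEE-754 standard format is implemented and tested. The multiplier unit is further optimized by replacing simple adders with efficient adders, including carry save adders. The design is simulated and synthesized using Xilinx synthesis tools, targeted on Spartan-3E and Virtex-4 FPGAs. The result for the Virtex-4 FPGA is given in […]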
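For testing such a design, a small software reference model of the data path in Section II can be handy. The sketch below is an assumption-laden illustration and not the Verilog implementation: it accepts only normalized single precision operands, uses Python's built-in integer multiply in place of the Karatsuba-Urdhva multiplier, truncates instead of rounding, and maps exponent overflow to Infinity and exponent underflow to Zero. All function names are hypothetical.

```python
import struct

def float_to_bits(x: float) -> int:
    """32-bit single precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits: int) -> float:
    """Value represented by a 32-bit single precision pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def classify(biased_exp: int, fraction: int) -> dict:
    """Exception flags derived from the exponent and fraction fields."""
    return {
        "Zero": biased_exp == 0 and fraction == 0,
        "Infinity": biased_exp == 255 and fraction == 0,
        "NaN": biased_exp == 255 and fraction != 0,
        "Denormal": biased_exp == 0 and fraction != 0,
    }

def multiply_single(a_bits: int, b_bits: int) -> int:
    """Multiply two normalized single precision operands given as bit patterns."""
    sign = ((a_bits >> 31) ^ (b_bits >> 31)) & 1          # sign calculation (XOR)
    exp = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - 127   # exponent addition
    mant_a = (a_bits & 0x7FFFFF) | 0x800000               # restore the hidden 1
    mant_b = (b_bits & 0x7FFFFF) | 0x800000
    product = mant_a * mant_b                             # 24x24 bit multiplication
    if product >> 47:                                     # normalization
        product >>= 1
        exp += 1
    fraction = (product >> 23) & 0x7FFFFF                 # truncation, no rounding
    if exp >= 255:                                        # assumed: overflow gives Infinity
        exp, fraction = 255, 0
    elif exp <= 0:                                        # assumed: underflow gives Zero
        exp, fraction = 0, 0
    return (sign << 31) | (exp << 23) | fraction

result = bits_to_float(multiply_single(float_to_bits(1.5), float_to_bits(-2.5)))
```

Comparing `multiply_single` against the simulated hardware output for the same operand pairs gives a quick functional check; results can differ from a rounding multiplier in the last bit because of the truncation.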
IV. CONCLUSION AND FUTURE WORK
This paper shows how to effectively reduce the percentage increase in delay and area of a floating point multiplier by using an efficient combination of the Karatsuba and Urdhva-Tiryagbhyam algorithms. The model can be further optimized in terms of delay by using pipelining methods, and the precision of the result can be increased by adding efficient truncation and rounding methods.
