Run-time reconfigurable multi-precision floating point multiplier design
  for high speed, low-power applications by Arish, S & Sharma, R. K.
 
Abstract: Floating point multiplication is a crucial operation in
many application domains, such as image processing and signal
processing. Every application, however, has different requirements:
some need high precision, while others need low power consumption
or low latency. The IEEE-754 format is not flexible enough to meet
these varying requirements, and its implementation is complex.
Optimal run-time reconfigurable hardware implementations may
therefore need custom floating point formats that do not necessarily
follow the IEEE-specified sizes. In this paper, we present a
run-time reconfigurable floating point multiplier implemented on
FPGA with a custom floating point format for different applications.
The multiplier has six modes of operation, selected according to the
accuracy or application requirement. Using an optimal design with
custom IPs (Intellectual Properties), a better implementation is
obtained by truncating the inputs before multiplication. A
combination of the Karatsuba algorithm and the Urdhva-Tiryagbhyam
algorithm (Vedic Mathematics) is used to implement the unsigned
binary multiplier, which further increases the efficiency of the
multiplier.
Keywords: FPGA, Run-time-reconfigurable, Variable-precision,
Floating point multiplier, Vedic mathematics, Urdhva-Tiryagbhyam,
Karatsuba
I. INTRODUCTION  
    Floating point multiplication units are essential Intellectual
Properties (IP) for modern multimedia and high performance
computing, such as graphics acceleration, signal processing and
image processing. Considerable effort has been made over the
past few decades to improve the performance of floating point
computations. Floating point units are not only complex, but
also require more area and hence consume more power than
fixed point multipliers. The complexity of the floating point
unit increases further as accuracy becomes a major issue. Even
a minute error in accuracy can have major consequences. Such
errors arise in floating point units mainly because of the
discrete nature of the IEEE-754 [1] floating point
representation, in which a fixed number of bits is used to
represent numbers. Scientific applications such as computational
geometry, climate modeling and computational physics have high
computational requirements and need extreme precision in
floating point calculations. This increased precision may not be
provided by the single precision or double precision formats,
which further increases the complexity of the unit. Other
applications, however, do not require high precision; even an
approximate value is sufficient for correct operation. For such
applications, a double precision or quadruple precision floating
point unit is a luxury: it wastes area and power and also
increases latency.
    For portable or wearable devices, in which the accuracy
requirement varies between applications and power consumption
is a very important factor, high precision floating point
multipliers are not a good option. In such cases a variable
precision multiplier is a good option, since it saves power and
time when the application does not need high precision. Several
such models exist, for example [2], [3] and [4]. Most of these
designs make use of already available IPs such as DSP (Digital
Signal Processing) units and 18x18 multiplier units. In this
paper, we present a power efficient design of a floating point
multiplier with different modes of accuracy selection. With
different precision modes, we can select the mode appropriate
for the application concerned. As the accuracy requirement
decreases, the width of the multiplier decreases, and hence so
do the power consumption and latency.
II. PROPOSED MODEL 
    The proposed model is a reconfigurable multi-precision
floating point multiplier which can be operated in six different
modes according to the accuracy requirement. It performs
floating point multiplication with different mantissa sizes
depending on the precision required. The basic unit is a
double-precision floating point unit; the size of the mantissa
is varied according to the precision selected. Fig. 1 shows the
floating point format used in the proposed model.
    The multiplier accepts two inputs, each 67 bits wide. Each
input is given in double-precision floating point format,
preceded by 3 mode select bits (bit 66 down to bit 64).
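The 67-bit operand layout described above can be illustrated with a small software sketch. The field widths follow the text (3 mode select bits ahead of a standard double-precision word with 1 sign bit, 11 exponent bits and 52 mantissa bits). The function names are hypothetical, and dropping low-order mantissa bits is only one plausible reading of how the mantissa size is varied; it is not taken from the hardware description.

```python
import struct

EXP_BITS, MANT_BITS = 11, 52   # double-precision field widths

def double_to_bits(value):
    """Return the 64-bit IEEE-754 pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def pack_operand(mode, value):
    """Prefix the 64-bit pattern with 3 mode select bits (bits 66..64)."""
    if not 0 <= mode < 8:
        raise ValueError("mode select needs 3 bits")
    return (mode << 64) | double_to_bits(value)

def unpack_operand(word):
    """Split a 67-bit operand into (mode, sign, exponent, mantissa)."""
    mode = (word >> 64) & 0b111
    sign = (word >> 63) & 1
    exponent = (word >> MANT_BITS) & ((1 << EXP_BITS) - 1)
    mantissa = word & ((1 << MANT_BITS) - 1)
    return mode, sign, exponent, mantissa

def truncate_mantissa(mantissa, width):
    """Assumption: keep the `width` most significant stored mantissa bits."""
    return mantissa >> (MANT_BITS - width)

word = pack_operand(0b001, -1.5)
mode, sign, exponent, mantissa = unpack_operand(word)
print(mode, sign, exponent, hex(truncate_mantissa(mantissa, 8)))
```

Python integers are unbounded, so the 67-bit word needs no special container here; in hardware the same fields would simply be bit slices of the input bus.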
 
R.K.Sharma 
School of VLSI Design and Embedded Systems 
National Institute of Technology Kurukshetra 
Kurukshetra, India 
rksharama@nitkkr.ac.in 
 
 
Arish S 
School of VLSI design and Embedded Systems 
National Institute of Technology Kurukshetra 
Kurukshetra, India 
arishsu@gmail.com 
 
The mode select bits of both inputs must be the same; otherwise a
mode select error signal is generated and execution is stopped.
The mode select bit combinations for the different modes are
shown in Table I.

  Mode select | Sign  | Exponent | Mantissa
  3 bits      | 1 bit | 11 bits  | 52 bits

Fig. 1 Floating point format used in the proposed model

TABLE I - Different modes

  Mode                 Mode select bits
  Mode 1 (Auto mode)   000
  Mode 2               001
  Mode 3               010
  Mode 4               011
  Mode 5               100
  Mode 6               101

The different modes in the proposed multi-precision multiplier
are the following.
Mode 1: Auto mode. The controller itself selects the optimum
mode by analyzing the inputs and then starts execution. The
optimum mode is selected by counting the number of zeroes after
a leading 1. If 6 or more zeroes follow a leading 1, the bits up
to that leading 1 are counted. If the number of bits up to that
leading 1 is less than 8, mode 2 (8-bit mantissa mode) is
selected; if it is less than 16, the 16-bit mantissa mode is
selected, and so on.
Mode 2: A custom precision format. It uses a basic
double-precision floating point multiplier with a mantissa size
of 8 bits.
Mode 3: A custom precision format. It uses a basic
double-precision floating point multiplier with a mantissa size
of 16 bits.
Mode 4: A custom precision format. It uses a basic
double-precision floating point multiplier with a mantissa size
of 23 bits.
Mode 5: A custom precision format. It uses a basic
double-precision floating point multiplier with a mantissa size
of 36 bits.
Mode 6: A fully fledged double-precision floating point
multiplier, which gives the highest accuracy at the cost of area
and power.
    The modes with fewer mantissa bits consume less power. These
modes are best suited for integer multiplication and for
applications where accuracy is not a big issue. Rounding of bits
is done before multiplication in every mode except mode 6, and
this reduces large variations in the results.
    A simple block diagram of the proposed model is shown in
Fig. 2. The custom precision formats with 8-bit and 16-bit
mantissas are best suited for integer multiplications where
fractional accuracy is not an issue. They can also be used for
low value fractional multiplications that require an integer
value as the result. Using 8-bit and 16-bit multipliers instead
of a fully fledged double precision floating point multiplier
can save a lot of power and can increase the speed.

Fig. 2 Block diagram of the proposed model

    The unsigned binary multiplier used for mantissa
multiplication is implemented using a combination of the
Karatsuba algorithm [4, 5] and the Urdhva-Tiryagbhyam [6]
algorithm, which gives better optimization in terms of speed
and area.
III. FLOATING POINT MULTIPLIER
    A floating point number is represented in IEEE-754 format
[1] as $\pm s \times b^{e}$, or
$\pm\,\mathit{significand} \times \mathit{base}^{\mathit{exponent}}$ [7].
To multiply two floating point numbers $\pm s_1 \times b^{e_1}$ and
$\pm s_2 \times b^{e_2}$, the significand or mantissa parts are
multiplied to get the product mantissa, and the exponents are
added to get the product exponent, i.e., the product is
$\pm (s_1 \times s_2) \times b^{(e_1 + e_2)}$. The hardware block
diagram of the floating point multiplier is shown in Fig. 3.
The important blocks in the implementation of the proposed
floating point multiplier are described below [8].
A. Sign Calculation
    The MSB of a floating point number represents the sign bit.
The sign of the product is positive if both numbers have the
same sign and negative if the numbers have opposite signs. So,
to obtain the sign of the product, a simple XOR gate can be
used as the sign calculator.
B. Addition of Exponents
    To get the product exponent, the input exponents are added
together. Since the floating point format uses a biased
exponent, the bias must be subtracted from the sum of the
exponents to get the actual exponent. The value of the bias is
$127_{10}$ ($01111111_2$) for the single precision format and
$1023_{10}$ ($01111111111_2$) for the double precision format.
In the proposed custom precision format also, a bias of 127 is
used. The computational time of the mantissa multiplication is
much greater than that of the exponent addition, so a simple
ripple carry adder and ripple borrow subtracter are sufficient
for exponent addition.
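The sign and exponent paths described in subsections A and B reduce to one XOR and one add-then-subtract. The sketch below models them in software under stated assumptions: the function names are hypothetical, the exponents are the stored (biased) field values, and overflow, underflow and the later normalization adjustment are deliberately left out.

```python
def product_sign(sign_a, sign_b):
    """Sign bit of the product: XOR of the two input sign bits."""
    return sign_a ^ sign_b

def product_exponent(exp_a, exp_b, bias):
    """Biased product exponent before normalization.

    Each stored exponent already includes the bias once, so the sum
    contains it twice and one copy has to be subtracted.
    """
    return exp_a + exp_b - bias

# 2.0 * 4.0 = 8.0 in single precision: stored exponents 128 and 129.
print(product_sign(0, 1), product_exponent(128, 129, bias=127))
```

In hardware the same arithmetic maps onto the ripple carry adder followed by the ripple borrow subtracter mentioned above.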
Fig. 3 Floating point multiplier

C. Karatsuba-Urdhva Tiryagbhyam binary multiplier
    In floating point multiplication, the most important and
complex part is the mantissa multiplication. Multiplication
requires more time than addition, and as the number of bits
increases it consumes more area and time. The double precision
format needs a 53x53 bit multiplier and the single precision
format needs a 24x24 bit multiplier. These operations take
considerable time and are the major contributor to the delay of
the floating point multiplier. To make the multiplication more
area efficient and faster, the proposed model uses a
combination of the Karatsuba algorithm and the Urdhva
Tiryagbhyam algorithm.
    The Karatsuba algorithm uses a divide and conquer approach
in which the inputs are broken down into a Most Significant
half and a Least Significant half, and this process continues
until the operands are 8 bits wide. The Karatsuba algorithm is
best suited for operands of higher bit length; at lower bit
lengths it is not as efficient. To eliminate this problem, the
Urdhva Tiryagbhyam algorithm is used at the lower stages. The
model of the Urdhva-Tiryagbhyam algorithm is shown in Fig. 4.
The Urdhva Tiryagbhyam algorithm is the best algorithm for
binary multiplication in terms of area and delay. But as the
number of bits increases, the delay also increases, because
the partial products are added in a ripple manner. For example,
4-bit multiplication requires 6 adders connected in a ripple
manner, 8-bit multiplication requires 14 adders, and so on.
Compensating for the delay causes an increase in area, so the
Urdhva Tiryagbhyam algorithm is not optimal when the number of
bits is large. Using the Karatsuba algorithm at the higher
stages and the Urdhva Tiryagbhyam algorithm at the lower stages
partly compensates for the limitations of both algorithms, and
hence the multiplier becomes more efficient. The circuit is
further optimized by using carry select and carry save adders
instead of ripple carry adders. This reduces the delay to a
great extent with a minimal increase in hardware. These two
algorithms are explained in detail in the sections below.
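The split between the two algorithms can be modelled in software as a recursion that halves the operands until they are at most 8 bits wide and then switches to a column-sum ("vertically and crosswise") base multiplier. This is a behavioural sketch only: the function names and the `base_width` parameter are hypothetical, the standard Karatsuba identity with three sub-products is assumed for the upper stages, and the carry select and carry save adders of the hardware are not modelled.

```python
def urdhva_multiply(a, b, width):
    """Vertically-and-crosswise product of two `width`-bit unsigned ints.

    Column k collects every bit product a_i & b_j with i + j == k; the
    column sums are then added with their weights 2**k.
    """
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    product = 0
    for k in range(2 * width - 1):
        column = sum(a_bits[i] & b_bits[k - i]
                     for i in range(width) if 0 <= k - i < width)
        product += column << k
    return product

def karatsuba_urdhva(x, y, width, base_width=8):
    """Karatsuba at the upper stages, Urdhva once operands are small."""
    if width <= base_width:
        return urdhva_multiply(x, y, width)
    half = width // 2                       # width of the low halves
    high = width - half                     # width of the high halves
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    hh = karatsuba_urdhva(x_hi, y_hi, high, base_width)
    ll = karatsuba_urdhva(x_lo, y_lo, half, base_width)
    # The sums of the halves can be one bit wider than the high half.
    mid = karatsuba_urdhva(x_hi + x_lo, y_hi + y_lo, high + 1, base_width)
    cross = mid - hh - ll                   # x_hi*y_lo + x_lo*y_hi
    return (hh << (2 * half)) + (cross << half) + ll

a, b = (1 << 53) - 1, (1 << 52) + 12345     # 53-bit mantissa-sized operands
print(karatsuba_urdhva(a, b, 53) == a * b)
```

Passing the operand width explicitly mirrors a hardware design, where every sub-multiplier has a fixed width; it also covers the 24-bit and 53-bit mantissa sizes, which are not powers of two.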
Urdhva Tiryagbhyam algorithm for multiplication
    Urdhva-Tiryagbhyam sutra is an ancient Vedic mathematics
method for multiplication [6]. It is a general formula
applicable to all cases of multiplication. The formula is very
short, consists of only one compound word, and means
'Vertically and crosswise'. In the Urdhva Tiryagbhyam
algorithm, the number of steps required for multiplication is
reduced and hence the speed of multiplication is increased.

Fig. 4 Karatsuba-Urdhva multiplier model

    An illustration of the steps for computing the product of
two 4-bit numbers is shown below [9, 10]. Let the two inputs be
a3a2a1a0 and b3b2b1b0, and let p7p6p5p4p3p2p1p0 be the product.
The temporary partial products are t0, t1, t2, ..., t6, and
they are obtained from the steps given below. The line notation
of the steps is shown in Fig. 5.

Fig. 5 Line notation of Urdhva Tiryagbhyam sutra

Step 1: t0 (1 bit) = a0b0
Step 2: t1 (2 bit) = a1b0 + a0b1
Step 3: t2 (2 bit) = a2b0 + a1b1 + a0b2
Step 4: t3 (3 bit) = a3b0 + a2b1 + a1b2 + a0b3
Step 5: t4 (2 bit) = a3b1 + a2b2 + a1b3
Step 6: t5 (2 bit) = a3b2 + a2b3
Step 7: t6 (1 bit) = a3b3

The product is obtained by adding s1, s2 and s3 as shown below,
where s1, s2 and s3 are the partial sums obtained.

s1 = t6 t5[0] t4[0] t3[0] t2[0] t1[0] t0
s2 = t5[1] t4[1] t3[1] t2[1] t1[1]
s3 = t3[2]

Product =       t6     t5[0]  t4[0]  t3[0]  t2[0]  t1[0]  t0   +
                t5[1]  t4[1]  t3[1]  t2[1]  t1[1]  0      0    +
                0      t3[2]  0      0      0      0      0
        = p7    p6     p5     p4     p3     p2     p1     p0

This method can be further optimized to reduce the amount of
hardware. A more optimized hardware architecture [11, 12] is
shown in Fig. 6. This model eliminates the need for a three
operand 7-bit adder and hence reduces hardware and delay. The
adders are connected in a ripple manner. The expressions for
the product bits are shown below, where MSB(ADDER k) denotes
the bits of the ADDER k sum above its LSB.

p0 = a0b0
p1 = LSB of (Sum(ADDER 1))
   = LSB of (a1b0 + a0b1)
p2 = LSB of (Sum(ADDER 2))
   = LSB of (MSB(ADDER 1) + a2b0 + a1b1 + a0b2)
p3 = LSB of (Sum(ADDER 3))
   = LSB of (MSB(ADDER 2) + a3b0 + a2b1 + a1b2 + a0b3)
p4 = LSB of (Sum(ADDER 4))
   = LSB of (MSB(ADDER 3) + a3b1 + a2b2 + a1b3)
p5 = LSB of (Sum(ADDER 5))
   = LSB of (MSB(ADDER 4) + a3b2 + a2b3)
p6 = LSB of (Sum(ADDER 6))
   = LSB of (MSB(ADDER 5) + a3b3)
p7 = Carry of ADDER 6

Fig. 6 Hardware architecture for 4x4 Urdhva Tiryagbhyam
multiplier.

Since there are more than two operands in adders 2 to 5, carry
save addition can be used to implement adders 2 to 5. This
technique reduces the delay to a great extent compared to the
ripple carry adder.
Karatsuba Algorithm for multiplication
    The Karatsuba multiplication algorithm [4, 5] is best
suited for multiplying very large numbers. The method was
published by Anatoli Karatsuba in 1962. It is a divide and
conquer method, in which the numbers are divided into their
Most Significant half and Least Significant half before the
multiplication is performed. The Karatsuba algorithm reduces
the number of multipliers required by replacing multiplication
operations with addition operations. Additions are faster than
multiplications, and hence the speed of the multiplier is
increased. As the number of input bits increases, the Karatsuba
algorithm becomes more efficient. The algorithm is optimal if
the width of the inputs is more than 16 bits. The hardware
architecture of the Karatsuba algorithm is shown in Fig. 7.

Fig. 7 Block diagram of Karatsuba multiplier

The Karatsuba algorithm for two inputs X and Y can be explained
as follows.
    $\mathrm{Product} = X \cdot Y$
X and Y can be written as
    $X = 2^{n/2} X_l + X_r$           (1)
    $Y = 2^{n/2} Y_l + Y_r$           (2)
where $X_l$, $Y_l$ and $X_r$, $Y_r$ are the Most Significant
halves and Least Significant halves of X and Y respectively,
and n is the number of bits.
Then,
    $X \cdot Y = (2^{n/2} X_l + X_r)(2^{n/2} Y_l + Y_r)$
    $\phantom{X \cdot Y} = 2^{n} X_l Y_l + 2^{n/2} (X_l Y_r + X_r Y_l) + X_r Y_r$           (3)

The second term in equation (3) can be optimized to reduce
the number of multiplication operations, i.e.,
    $X_l Y_r + X_r Y_l = (X_l + X_r)(Y_l + Y_r) - X_l Y_l - X_r Y_r$           (4)
Equation (3) can then be rewritten as
    $X \cdot Y = 2^{n} X_l Y_l + X_r Y_r + 2^{n/2} \left( (X_l + X_r)(Y_l + Y_r) - X_l Y_l - X_r Y_r \right)$           (5)

The recurrence of the Karatsuba algorithm is
    $T(n) = 3\,T(n/2) + O(n) = O(n^{\log_2 3}) \approx O(n^{1.585})$           (6)
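The effect of recurrence (6) on hardware cost can be illustrated by counting base multiplier blocks. The sketch below is an idealized count, not a synthesis result: it assumes the operand width is a power-of-two multiple of the 8-bit base width, ignores the extra carry bit of the $(X_l + X_r)(Y_l + Y_r)$ term, and uses a hypothetical function name.

```python
import math

def base_multiplier_count(width, base_width=8):
    """Count base-width multiplier blocks for one product.

    Returns (karatsuba, four_way): three sub-products per split for
    Karatsuba, four per split when all cross terms are multiplied.
    """
    if width <= base_width:
        return 1, 1
    karatsuba, four_way = base_multiplier_count(width // 2, base_width)
    return 3 * karatsuba, 4 * four_way

for width in (16, 32, 64):
    print(width, base_multiplier_count(width))

# The exponent in (6) is log2(3).
print(round(math.log2(3), 3))
```

Each halving step multiplies the Karatsuba count by 3 rather than 4, which is where the sub-quadratic growth in (6) comes from; the additions that replace the fourth multiplication account for the $O(n)$ term.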
D. Normalization of the result 
    Floating point representations have a hidden bit in the
mantissa, which always has the value 1 and is therefore not
stored in memory, saving one bit. The leading 1 of the
mantissa, i.e. the 1 immediately to the left of the binary
point, is taken as the hidden bit. Normalization is usually
done by shifting, so that the MSB of the mantissa becomes
nonzero; in radix 2, nonzero means 1. The binary point in the
mantissa multiplication result is shifted left if the leading 1
is not immediately to the left of the binary point, and for
each position the point is shifted left, the exponent value is
incremented by one. This is called normalization of the result.
Since the value of the hidden bit is always 1, it is called the
'hidden 1'.
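For two normalized inputs the significand product lies in the range [1, 4), so at most one position of adjustment is needed. The sketch below illustrates this case under stated assumptions: the significands are integers with the hidden 1 included and `frac_bits` fraction bits, the low-order product bits are simply truncated, and the function name is hypothetical.

```python
def normalize_product(sig_a, sig_b, exponent, frac_bits):
    """Multiply two significands and normalize the result.

    sig_a, sig_b: integers in [2**frac_bits, 2**(frac_bits + 1)),
    i.e. 1.f with the hidden 1 made explicit.
    Returns (fraction, exponent) with the hidden 1 removed again.
    """
    product = sig_a * sig_b            # has 2 * frac_bits fraction bits
    if product >> (2 * frac_bits + 1): # value >= 2.0: leading 1 one place too high
        product >>= 1                  # binary point moves one place left
        exponent += 1
    fraction = (product >> frac_bits) & ((1 << frac_bits) - 1)
    return fraction, exponent

# 1.100b * 1.100b = 10.01b (1.5 * 1.5 = 2.25) -> 1.001b with exponent + 1
print(normalize_product(0b1100, 0b1100, 0, frac_bits=3))
```

Moving the binary point one place to the left is implemented as a one-bit right shift of the integer product, which is why the code shifts right while the exponent is incremented.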
E. Representation of exceptions 
    Some numbers cannot be represented with a normalized
significand. To represent such numbers, special codes are
assigned to them. In the proposed model, four output signals,
namely Zero, Infinity, NaN (Not-a-Number) and Denormal, are
used to represent these exceptions. If the product has
$\mathit{exponent} + \mathit{bias} = 0$ and $\mathit{significand} = 0$,
the result is taken as Zero (±0). If the product has
$\mathit{exponent} + \mathit{bias} = 255$ and $\mathit{significand} = 0$,
the result is taken as Infinity (∞). If the product has
$\mathit{exponent} + \mathit{bias} = 255$ and $\mathit{significand} \neq 0$,
the result is taken as NaN. Denormalized values, or Denormals,
are numbers without a hidden 1 and with the smallest possible
exponent. Denormals are used to represent certain small
numbers that cannot be represented as normalized numbers. If
the product has $\mathit{exponent} + \mathit{bias} = 0$ and
$\mathit{significand} \neq 0$, the result is represented as
Denormal. A Denormal is represented as $\pm 0.s \times 2^{-126}$,
where s is the significand.
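The four exception flags amount to a small decision table on the biased exponent and the significand field. The sketch below encodes that table, using the value 255 from the text as the maximum biased exponent. The function name and the "Normal" label for the remaining case are illustrative assumptions, not signal names from the design.

```python
def classify_result(biased_exponent, significand, max_exponent=255):
    """Return the exception class of a product.

    biased_exponent: exponent + bias, as stored in the exponent field.
    significand: the stored significand (fraction) field.
    """
    if biased_exponent == 0:
        return "Zero" if significand == 0 else "Denormal"
    if biased_exponent == max_exponent:
        return "Infinity" if significand == 0 else "NaN"
    return "Normal"  # hypothetical label for every other case

for fields in ((0, 0), (0, 5), (255, 0), (255, 5), (130, 5)):
    print(fields, classify_result(*fields))
```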
 
IV. IMPLEMENTATION AND RESULTS
    The main objective of this work is to design and implement
a variable-precision floating point circuit that can
reconfigure itself according to the precision requirement,
operate at high speed irrespective of accuracy, and consume
less power where accuracy is not an issue. Since mantissa
multiplication is the most complex part of the floating point
multiplier, we designed a multiplier which operates at high
speed and whose delay and area increase only slightly as the
number of bits increases. The floating point multipliers of the
different modes, with the IEEE-754 standard format and with the
custom precision format, were implemented separately in Verilog
HDL and tested. The binary multiplier unit (Karatsuba-Urdhva)
was further optimized by replacing simple adders with efficient
adders such as carry select adders and carry save adders. The
proposed model was implemented, synthesized and simulated using
Xilinx Synthesis Tools (ISE 14.7), targeting the Virtex-4
family. The model operates in one selected mode at a time;
during operation, only the selected multiplier unit is in the
ON state and all other multiplier units are in the OFF state.
Hence, if a low precision mode is selected, the active area and
therefore the power consumption are lower. The summary of
results is given in Table II and Table III. Comparison with
various multiplier units is given in Tables IV, V, VI, VII and
VIII.
 
TABLE II - Performance analysis of Karatsuba-Urdhva
multipliers in the proposed model

                  8-bit        16-bit       24-bit       32-bit
                  multiplier   multiplier   multiplier   multiplier
  Slices          113          410          972          1389
  LUTs            120          451          1018         1545
  IOBs            33           65           97           129
  Delay (ns)      9.396        11.514       12.996       13.141
  f_max (MHz)     274.469      248.964      226.508      209.606
  Logic levels    14           22           31           39

TABLE III - Performance analysis of floating point units in the
proposed model

                  8-bit        16-bit       23-bit       Double
                  precision    precision    precision    precision
                  FP           FP           FP           FP
                  multiplier   multiplier   multiplier   multiplier
  Slices          157          475          977          3877
  LUTs            220          584          1073         4033
  IOBs            61           83           104          193
  Delay (ns)      12.254       14.577       16.392       18.966
  f_max (MHz)     264.767      240.955      226.508      173.952
 
V. CONCLUSION AND FUTURE WORK 
    This paper describes a method to effectively adjust the
delay and power consumption for different accuracy
requirements. It also shows how to reduce the percentage
increase in delay and area of a floating point multiplier as
the number of bits increases, by using an efficient combination
of the Karatsuba and Urdhva-Tiryagbhyam algorithms. The model
can be further optimized in terms of delay by using pipelining,
and the precision of the result can be increased by adding
efficient truncation and rounding methods.
 
REFERENCES 
 
[1] IEEE 754-2008, IEEE Standard for Floating-Point Arithmetic, 2008. 
[2] K. Manolopoulos, D. Reisis, V.A. Chouliaras, “An Efficient Multiple 
Precision Floating-Point Multiplier”, 18th IEEE International 
Conference on Electronics, Circuits and Systems (ICECS),  pp. 153-156, 
2011 
[3] Claudio Brunelli, Perttu Salmela, Jarmo Takala and Jari Nurmi , “A 
Flexible Multiplier for Media Processing”, IEEE workshop on Signal 
processing System Design and Implementation, pp. 70-74, 2005 
[4] N.Anane, H.Bessalah, M.Issad, K.Messaoudi, “Hardware 
implementation of Variable Precision Multiplication on FPGA”, 4th 
International Conference on Design & Technology of Integrated 
Systems in Nanoscale Era, pp. 77-81, 2009 
[5] Anand Mehta, C. B. Bidhul, Sajeevan Joseph, Jayakrishnan. P, 
“Implementation of Single Precision Floating Point Multiplier using 
Karatsuba Algorithm”, 2013 International Conference on Green 
Computing, Communication and Conservation of Energy (ICGCE), pp. 
254-256, 2013  
[6] Swami Sri Bharati Krsna Thirthaji Maharaja, “Vedic Mathematics”,
Motilal Banarasidass Indological Publishers and Booksellers, 1965
[7] Behrooz Parhami, “Computer Arithmetic”, Oxford University Press,
2000
[8] B. Jeevan , S. Narender , C.V. Krishna Reddy, K. Sivani, “A High Speed 
Binary Floating Point Multiplier Using Dadda Algorithm”, International 
Multi-Conference on Automation, Computing, Communication, Control 
and Compressed Sensing, pp. 455-460, 2013 
[9] Poornima M, Shivaraj Kumar Patil, Shivukumar , Shridhar K P , Sanjay 
H, “Implementation of Multiplier using Vedic Algorithm”, International 
Journal of Innovative Technology and Exploring Engineering (IJITEE), 
ISSN: 2278-3075, Volume-2, Issue-6, pp. 219-223, May 2013 
[10] R. Sridevi, Anirudh Palakurthi, Akhila Sadhula, Hafsa Mahreen, 
“Design of a High Speed Multiplier (Ancient Vedic Mathematics 
Approach)”, International Journal of Engineering Research (ISSN : 
2319-6890), Volume No.2, Issue No.3, pp : 183-186, July 2013 
[11] Harpreet Singh Dhillon, Abhijit Mitra, “A Reduced-Bit Multiplication 
Algorithm for Digital Arithmetic”, World Academy of Science, 
Engineering and Technology, Vol 19, pp. 719-724, 2008 
[12] Premananda B.S., Samarth S. Pai, Shashank B., Shashank S. Bhat, 
“Design and Implementation of 8-Bit Vedic Multiplier”, International 
Journal of Advanced Research in Electrical, Electronics and 
Instrumentation Engineering, Vol. 2, Issue 12, pp. 5877-5882, December 
2013 
[13] R. Sai Siva Teja, A. Madhusudhan, “FPGA Implementation of Low-
Area Floating Point Multiplier Using Vedic Mathematics”, International 
Journal of Emerging Technology and Advanced Engineering, ISSN 
2250-2459, Volume 3, Issue 12, pp. 362-366, December 2013. 
[14] Jagadeshwar Rao M, Sanjay Dubey, “A High Speed and Area Efficient 
Booth Recoded Wallace Tree Multiplier for fast Arithmetic Circuits”, 
2012 Asia Pacific Conference on Postgraduate Research in 
Microelectronics & Electronics (PRIMEASIA), pp. 220-223, 2012. 
[15] R.K. Bathija, R.S. Meena, S. Sarkar, Rajesh Sahu, “Low Power High 
Speed 16x16 bit Multiplier using Vedic Mathematics”, International 
Journal of Computer Applications (0975 – 8887), Volume 59– No.6, pp. 
41-44, December 2012 
[16] Anna Jain, Baisakhy Dash, Ajit Kumar Panda, Muchharla Suresh, 
“FPGA Design of a Fast 32-bit Floating Point Multiplier Unit”, 
International Conference on Devices, Circuits and Systems (ICDCS), pp. 
545-547, 2012  
 
 
TABLE IV - Delay comparison of various 8-bit multipliers with
proposed Karatsuba-Urdhva multiplier

                Ref [9]    Ref [12]   Ref [13]   Proposed
                                                 multiplier
  Width         8-bit      8-bit      8-bit      8-bit
  Delay (ns)    28.27      15.050     23.973     9.396
 
TABLE V - Delay comparison of various 16-bit multipliers with
proposed Karatsuba-Urdhva multiplier

                Ref [14]           Ref [15]   Proposed
                (Vedic multiplier)            multiplier
  Width         16-bit             16-bit     16-bit
  Delay (ns)    13.452             27.148     11.514
 
TABLE VI - Delay and area comparison of 24-bit multipliers
with proposed Karatsuba-Urdhva multiplier

                        Slices   LUTs   Delay (ns)
  Ref [16]              1306     2329   16.316
  Proposed multiplier   972      1018   12.996
 
TABLE VII - Delay and area comparison of 32-bit multipliers
with proposed Karatsuba-Urdhva multiplier

                                                   LUTs   Delay (ns)
  Ref [14] - Modified Booth multiplier (Radix-8)   2721   12.081
  Ref [14] - Modified Booth multiplier (Radix-16)  7161   11.564
  Ref [14]                                         2704   9.536
  Proposed multiplier                              1545   13.141
 
TABLE VIII - Delay and area comparison of SP floating point
multiplier with proposed SP FP multiplier

                        Slices   LUTs   Delay (ns)
  Ref [16]              1269     2270   18.783
  Ref [8]               1149     1146   --
  Proposed multiplier   976      1091   16.392
