Floating-Point Division Operator based on CORDIC Algorithm by Surapong, Pongyupinpanich & Samman, Faizal Arya
80 ECTI TRANSACTIONS ON COMPUTER AND INFORMATION TECHNOLOGY VOL.7, NO.1 May 2013
Floating-Point Division Operator based on
CORDIC Algorithm
Pongyupinpanich Surapong1 and Faizal Arya Samman2, Non-members
ABSTRACT
Design and evaluation of a CORDIC (COordinate Rotation DIgital Computer) algorithm for a floating-point division operation are presented in this paper. In general, division based on the CORDIC algorithm is limited in the range of inputs that the CORDIC machine can process while still converging properly and giving a precise division result. A hardware architecture of the CORDIC algorithm that can process broader input ranges, by using a pre-processing and a post-processing stage, is implemented and presented in this paper. The performance and the calculation error statistics are evaluated over extensive sets of test inputs. The results show that the CORDIC algorithm converges well and gives precise division results over broader input ranges. The proposed hardware architecture is modeled in VHDL and synthesized on a CMOS standard-cell technology and an FPGA device, achieving 1 GFlops on the CMOS technology and 210.812 MFlops on the FPGA device.
Keywords: Floating-Point Operators, Accelerator
Processor, Product-of-Sum, Sum-of-Product, 32-bit
IEEE Standard Single-Precision.
1. INTRODUCTION
In modern digital computer architecture, floating-point arithmetic units are important components for improving the performance of digital computers. Arithmetic components such as adder/subtractor, multiplier and divider units are basic operators in scientific computations, besides other elementary functions such as sine, cosine, logarithm and exponent. Compared to fixed-point arithmetic units, floating-point arithmetic units provide better accuracy and precision and cover larger data ranges, which makes them suitable for scientific computations in engineering applications.
Manuscript received on July 2, 2012.
1 The author is with Ramkhamhaeng University, Faculty of Engineering, Department of Computer Engineering, Ramkhamhaeng Road, Hua Mark, Bangkapi, Bangkok 10240, Thailand, E-mail: surapong@riees.org
2 The author is with Universitas Hasanuddin at Makassar, Faculty of Engineering, Department of Electrical Engineering, Jl. Perintis Kemerdekaan Km. 10, Tamalanrea, Makassar 90245, Indonesia, E-mail: faizalas@unhas.ac.id
Compared to other arithmetic operators, designing a divider unit requires special attention because of the complexity of the operation, especially with floating-point operands. This challenge has led many scientists and researchers to introduce new, efficient algorithms and methods for designing divider units [1].
Division operations are often used in scientific computations such as image and signal processing algorithms. Image convolution and Gaussian filtering are example computations [2] that require a divider operator, as presented in Equ. (1) and Equ. (2), respectively. Equ. (1) gives the output pixel of a 3 × 3 convolution mask window, where w_i is the weight of the i-th input image pixel p_i within the window.
$$P = \frac{\sum_{i=1}^{9} p_i w_i}{\sum_{i=1}^{9} w_i} \qquad (1)$$
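Equ. (1) is a weighted average, so every output pixel costs one division. The following minimal Python sketch assumes the nine pixels and weights are given in row-major order; the function name and the 1-2-1 smoothing mask are illustrative choices, not taken from [2].

```python
def window_output(pixels, weights):
    """Equ. (1): weighted mean of the 9 pixels of a 3x3 window (row-major order)."""
    if len(pixels) != 9 or len(weights) != 9:
        raise ValueError("a 3x3 window needs exactly 9 pixels and 9 weights")
    denominator = sum(weights)
    if denominator == 0:
        raise ZeroDivisionError("the mask weights sum to zero")
    numerator = sum(p * w for p, w in zip(pixels, weights))
    return numerator / denominator          # the division the text refers to


smoothing_mask = [1, 2, 1,
                  2, 4, 2,
                  1, 2, 1]
print(window_output(list(range(1, 10)), smoothing_mask))   # 5.0
```

In a sliding-window filter this division is repeated for every pixel position, which is why a fast divider matters for such workloads.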
Another important equation with a division operation is the 2D Gaussian distribution shown in Equ. (2), where x and y are the two-dimensional image coordinates and σ is the standard deviation of the distribution. Gaussian filtering is used to "blur" images and remove detail and noise [2].
$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (2)$$
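Divisions appear in Equ. (2) itself and again when a sampled Gaussian is turned into a filter mask. The sketch below, with a hypothetical helper name and an illustrative 3 × 3 mask (radius 1, σ = 1), samples Equ. (2) on an integer grid and then divides by the sum so that the weights add up to 1; this normalization step is a common practice assumed here, not something the text specifies.

```python
import math


def gaussian_kernel(radius, sigma):
    """Sample Equ. (2) on a (2*radius+1)^2 grid, then normalise the weights to sum to 1."""
    coords = range(-radius, radius + 1)
    scale = 1.0 / (2.0 * math.pi * sigma * sigma)                    # first division in Equ. (2)
    g = [[scale * math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))  # division in the exponent
          for x in coords] for y in coords]
    total = sum(sum(row) for row in g)
    return [[v / total for v in row] for row in g]                   # one more division per tap


for row in gaussian_kernel(1, 1.0):
    print(" ".join(f"{v:.4f}" for v in row))
```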
In adaptive digital signal processing applications, for instance, division operations are used in the normalized adaptive least-mean-square (LMS) algorithm presented in Equ. (5) to update the parameters of an adaptive filter. The filter output signal, Equ. (3), is compared to the desired system model output signal d, resulting in an error signal, Equ. (4). This error signal then drives the adaptive filter parameters in such a way that the parameters w_j finally become equal (almost equal in practice) to the system model parameters. The divisor in this case, a constant plus x(k − j)², is used to improve the stability of the adaptive parameter identification algorithm.
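$$y(k) = \sum_{j=1}^{N_{tap}} w_j(k)\, x(k - j), \quad \forall j \in \Omega \qquad (3)$$
$$e(k) = d(k) - y(k) \qquad (4)$$
$$w_j(k + 1) = w_j(k) + \frac{\mu\, e(k)\, x(k - j)}{\delta + x(k - j)^2}, \quad \forall j \in \Omega \qquad (5)$$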
2. STATE OF THE ART OF DIVISION METHODS
This section briefly describes state-of-the-art methods and algorithms for implementing the binary division operation.
1. Adder-Cell-based Method: Designing a division operator with the adder-cell-based method results in a very compact divider architecture. This method is classified as a non-iterative technique, in which the divider unit consists of half-adder and full-adder cells as well as other logic gates and supporting modules [3]. A binary divider that uses carry-save adder units is presented, for example, in [4].
2. Digit Recurrence Algorithm: In modern floating-point arithmetic units, the most common algorithm employed for the division function is a digit recurrence algorithm [5] [6] [7]. The algorithm uses shifting and subtraction as its fundamental operations. A combined floating-point square-root and division operation can also be implemented by using a subtractive SRT (Sweeney, Robertson and Tocher) algorithm [8], which is classified as a digit recurrence algorithm. The subtractive SRT algorithm can be extended with a radix-8 IDS (Interleaved Digit Set) algorithm to improve the performance of the traditional digit recurrence algorithm. Another variant of the digit-recurrence method is the Svoboda algorithm; a new Svoboda-Tung division algorithm is, for instance, proposed in [9].
3. Taylor's Series Expansion Algorithm: A Taylor's series expansion algorithm [10], for example, can be used to calculate the division operation by summing a sequential series expansion. However, the Taylor's series expansion algorithm is rarely used, and it performs the division computation slowly.
4. Goldschmidt's Algorithm: The basic idea behind Goldschmidt's algorithm is the iterative, parallel multiplication of the dividend and the divisor by updated factors in such a way that the final divisor is driven to one. Thus, the final dividend gives the quotient (the division result). Oberman [11], for example, proposes a floating-point divider and square-root unit for the AMD-K7 using Goldschmidt's algorithm. Goldschmidt's algorithm has been broadly used in many commercial microprocessors and is also known as division by multiplicative normalization or division by convergence [12]. The disadvantage of Goldschmidt's algorithm in terms of area overhead is the need for two independent parallel multiplications, and a multiplier requires a large logic area, especially when it is implemented in floating-point arithmetic.
5. Newton-Raphson Algorithm: The Newton-Raphson division algorithm is similar to Goldschmidt's algorithm. In the Newton-Raphson method, however, the iterative refinement is applied only to the reciprocal of the divisor, which converges after several iterations [13]. The Newton-Raphson division can be divided into three steps, i.e. the initial estimation of the divisor's reciprocal, the iterative refinement of the divisor's reciprocal, and the multiplication of the dividend by the final converged reciprocal. The work in [14], for example, presents a decimal floating-point divider using Newton-Raphson iteration, where an accurate piecewise linear approximation is used to obtain an initial estimate of the divisor's reciprocal.
6. CORDIC Algorithm: Besides the aforementioned methods, there is another powerful algorithm for implementing the divider unit, called the CORDIC (COordinate Rotation DIgital Computer) algorithm [15]. Like the digit-recurrence method, CORDIC is classified as an iterative method. The main strength of the CORDIC algorithm is its capability to implement several trigonometric functions [16], [15], phase and magnitude functions [17] and hyperbolic functions [18], as well as linear functions such as multiplication and division. With CORDIC algorithms, all of these functions can also be implemented in a single CORDIC hardware architecture [19]. Some basic standard and non-standard operators, such as sum-of-product and product-of-sum [20], which can be used to accelerate floating-point operations, can also be implemented with the CORDIC algorithm. The work in [21], for example, presents a flexible FPGA implementation of a parameterizable floating-point library for computing the sine, cosine or arctangent functions. The CORDIC algorithm can even be used in a field-programmable fuzzy logic controller circuit [22]. Moreover, the CORDIC algorithm can be used to implement unified frequency analysis or transformation functions such as the DFT (Discrete Fourier Transform), DHT (Discrete Hartley Transform), DCT (Discrete Cosine Transform) and DST (Discrete Sine Transform) [23].
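For contrast with the shift-and-add CORDIC iteration, the two multiplicative methods in items 4 and 5 can be sketched in their textbook form. This is a hypothetical illustration, not the designs of [11] or [14]: both operands are pre-scaled by the same power of two so that 0.5 ≤ d < 1, the initial reciprocal estimate is simply 1 (real designs use a lookup table or a piecewise linear approximation), and six iterations is an arbitrary choice that is ample here because the error is squared at every step.

```python
import math


def _normalize(n, d):
    """Scale both operands by the same power of two so that 0.5 <= d < 1."""
    if not (d > 0 and math.isfinite(d)):
        raise ValueError("this sketch assumes a positive, finite divisor")
    _, e = math.frexp(d)
    return math.ldexp(n, -e), math.ldexp(d, -e)


def newton_raphson_divide(n, d, iters=6):
    """Refine only the reciprocal r ~ 1/d, then multiply by the dividend."""
    n, d = _normalize(n, d)
    r = 1.0                       # crude initial estimate (assumption)
    for _ in range(iters):
        r = r * (2.0 - d * r)     # error 1 - d*r is squared each step
    return n * r                  # final multiplication step


def goldschmidt_divide(n, d, iters=6):
    """Multiply dividend and divisor by the same factor until the divisor reaches 1."""
    n, d = _normalize(n, d)
    for _ in range(iters):
        f = 2.0 - d               # factor that drives d toward 1
        n, d = n * f, d * f       # the two independent multiplications
    return n                      # the final dividend is the quotient


print(newton_raphson_divide(500.0, 180.0), goldschmidt_divide(500.0, 180.0))
```

Both loops are built around multipliers, which is the area cost the text contrasts with CORDIC's shifts and additions.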
There are two main reasons why the CORDIC algorithm is preferable for designing the floating-point division operator:
1. The benefit of the CORDIC algorithm: The CORDIC algorithm offers the ability to perform fundamental functions for science and engineering, low algorithmic complexity, and simplicity of VLSI implementation. The VLSI implementation is simple because it uses only shift and add operations. The area cost of the CORDIC algorithm is lower than that of the Newton-Raphson, Goldschmidt's and Taylor's series expansion algorithms, which require multiply and add operations. Compared to the digit-recurrence algorithm, the CORDIC algorithm is slightly simpler.
2. Design alternative: Few previous works deal with the design and investigation of a divider based on the CORDIC algorithm. Therefore, this paper proposes the algorithm, design and architecture of a floating-point divider based on the CORDIC algorithm, together with an analysis and investigation of its error characteristics.
However, the CORDIC algorithm suffers from long computational latency and a limited convergence range. These disadvantages can be alleviated by the techniques proposed in [15], where the convergence of the CORDIC algorithm is accelerated by duplicating and triplicating the micro (angle) rotations in each stage of the iterative CORDIC algorithm.
The rest of the paper is organized as follows. Section 3 shows the architecture of a CORDIC algorithm used especially for division operations and explains the main reasons why we propose a modification of the CORDIC algorithm. Section 4 describes the performance analysis of the CORDIC divider over a wide range of inputs, including its convergence and calculation errors. Section 5 presents the synthesis results of the CORDIC divider core, modeled in VHDL, using a 130-nm CMOS standard-cell technology and an FPGA device, and Section 6 concludes the paper.
3. CORDIC-BASED DIVIDER ARCHITECTURE
There are two basic reasons why a new CORDIC algorithm is proposed in this paper.
1. The CORDIC algorithm converges correctly only when the expected division results lie in the range −0.9647 ≤ Q = Y/X ≤ 2.9647 (based on our MATLAB simulation results). Results outside this range tend to saturate at incorrect values, as shown by the experimental result in Fig. 1.
2. We cannot tell whether a division result lies in this range until the division has been performed.
The divergence problem is illustrated in Fig. 1, which is obtained from a simple experiment. Two curves are presented in the figure, i.e. the ideal (MATLAB) output and the division results of the CORDIC algorithm. The horizontal axis is Θ and the vertical axis is Y/X, where Y = sin Θ and X = cos Θ, with Θ in radians. The simulation results show that the division operation cannot converge when the division result lies outside the range −0.9647 ≤ Y/X ≤ 2.9647.

Fig. 1: The CORDIC divergence problem (horizontal axis: Θ; vertical axis: Y/X; curves: Matlab and CORDIC).
Based on these facts, we propose a solution that broadens the range of inputs for which the precision and convergence of the CORDIC algorithm are guaranteed when it performs a floating-point division. First, we introduce the basics of the CORDIC algorithm as follows.
3.1 Basic Radix-2 CORDIC Division Algorithm
The basic radix-2 CORDIC iteration algorithm
can be written as follows [19].
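$$\begin{aligned} X_{i+1} &= X_i - m\,\sigma_i\,Y_i\,\alpha_{m,i} \\ Y_{i+1} &= Y_i + \sigma_i\,X_i\,\alpha_{m,i} \\ Z_{i+1} &= Z_i - \sigma_i\,\alpha_{m,i} \end{aligned} \qquad (6)$$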
The CORDIC divider function is obtained by configuring the CORDIC algorithm in its linear mode of operation, i.e. by setting m = 0 and α_{m,i} = 2^(−i) in Equ. (6). Hence, the CORDIC iteration for the division operator is given by Equ. (7).
$$\begin{aligned} X_{i+1} &= X_i \\ Y_{i+1} &= Y_i + \sigma_i\,X_i\,2^{-i} \\ Z_{i+1} &= Z_i - \sigma_i\,2^{-i} \end{aligned} \qquad (7)$$
Each σ_i ∈ {−1, +1} is chosen from the sign of Y_i, as shown in Equ. (8).

$$\sigma_i = \begin{cases} +1, & Y_i < 0 \\ -1, & Y_i \ge 0 \end{cases} \qquad (8)$$
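Equs. (7) and (8) translate almost line by line into the following Python sketch; the function name is hypothetical and a positive divisor is assumed. Because every step preserves Y_i + X·Z_i = Y_0, Z converges to Y_0/X exactly when Y is driven to zero. In this plain form the reachable quotients are limited to |Y/X| < 2, since the step weights 2^(−i) sum to 2; the bounds −0.9647 and 2.9647 quoted above come from the authors' own MATLAB experiment, whose set-up may differ from this sketch.

```python
def cordic_linear_divide(y0, x0, n_iter=32):
    """Linear-mode CORDIC of Equs. (7)-(8): returns Z_n, an approximation of y0/x0.

    Assumes x0 > 0. The invariant Y_i + x0 * Z_i == y0 holds at every step,
    so Z_n is accurate only if Y_n has been driven close to zero.
    """
    if x0 <= 0:
        raise ValueError("this sketch assumes a positive divisor")
    y, z = float(y0), 0.0
    for i in range(n_iter):
        sigma = 1.0 if y < 0 else -1.0    # Equ. (8)
        step = 2.0 ** -i                  # alpha_{0,i} = 2^-i, a shift in hardware
        y += sigma * x0 * step            # Equ. (7); X stays equal to x0
        z -= sigma * step
    return z


print(cordic_linear_divide(0.75, 1.5))    # close to 0.5
print(cordic_linear_divide(5.0, 1.0))     # saturates just below 2 instead of reaching 5
```

The second call reproduces, in miniature, the saturation behaviour that motivates the pre- and post-processing stages.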
3.2 The Proposed CORDIC Division Algorithm
Based on our experience, the domain in which the CORDIC inversion function converges well is given by Equ. (9). Written in the IEEE binary floating-point format, whose exponents are biased by 127, this domain becomes Equ. (10), while Equ. (11) and Equ. (12) give the corresponding mantissa and exponent ranges.

Fig. 2: The floating-point divider architecture (stages: Exponent Detection, Inversion (CORDIC Algorithm), Alignment and Multiplication; inputs: dividend Y and divisor X; output: Q).
$$0.5 \le X < 1.5 \qquad (9)$$
$$1.0 \times 2^{126} \le X < 1.5 \times 2^{127} \qquad (10)$$
$$1.0 \le M_X \le 1.5 \qquad (11)$$
$$126 \le E_X \le 127 \qquad (12)$$
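To see how a value maps onto Equs. (10)-(12), the single-precision fields can be read directly from the bit pattern. This Python sketch uses the standard struct module; the function names are hypothetical, and it checks only the exponent condition of Equ. (12), leaving the mantissa bound of Equ. (11) aside. The printed tuples are the fields of 180 and 500, the divisor and dividend of the worked example below.

```python
import struct


def float32_fields(v):
    """Return (sign, biased exponent, 1.M) of v rounded to IEEE-754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", v))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF          # biased by 127
    fraction = bits & 0x7FFFFF              # 23 stored mantissa bits
    if exponent in (0, 0xFF):
        raise ValueError("zero, subnormals, infinities and NaNs are not handled here")
    return sign, exponent, 1.0 + fraction / 2.0 ** 23


def exponent_in_domain(v):
    """Equ. (12): 126 <= E_X <= 127, i.e. 0.5 <= |v| < 2 before the bound of Equ. (11)."""
    return 126 <= float32_fields(v)[1] <= 127


print(float32_fields(180.0))    # (0, 134, 1.40625)
print(float32_fields(500.0))    # (0, 135, 1.953125)
print(exponent_in_domain(1.4), exponent_in_domain(180.0))   # True False
```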
The complete division operation Q = Y/X and its hardware architecture are proposed in this paper, where Q is the division result, Y is the dividend and X is the divisor. The hardware architecture is divided into four main stages, as presented in Fig. 2 and described as follows.
1. Divisor's Exponent Detection: In this first stage, a credit exponent E_C and a new exponent E_X′ for X are computed by using Alg. 1.
2. Divisor Inversion: In this second stage, the inverse of the new input divisor X_new = (−1)^(S_X) × M_X × 2^(E_X′) is computed by the CORDIC algorithm presented in Alg. 2 to obtain the variable Z.
3. Alignment: In this third stage, Z_A is computed from Z (obtained from Alg. 2) by aligning its exponent with the credit exponent E_C (obtained from Alg. 1). Using the standard floating-point representation, Z_A = (−1)^(S_Z) × 1.M_Z × 2^(E_Z − E_C), where S_Z = S_X.
4. Multiplication: Finally, the complete division result is Q = Y × Z_A.
Algorithm 1 [E_C, E_X′] = DivisorDetect(X), where X  1
1: Y_0 = 1, Z_0 = 0, X_0 = X, S_0 = −1 {Initialization}
2: E_X = exponent of the input divisor X
3: if E_X < 126 then
4:   E_C = 126 − E_X
5:   E_X′ = 126
6: else if E_X > 127 then
7:   E_C = E_X − 127
8:   E_X′ = 127
9: else
10:  E_C = 0
11:  E_X′ = E_X
12: end if
13: return E_C, E_X′

Algorithm 2 Z = Inverting(X, I), where X  1
1: Y_0 = 1, Z_0 = 0, X_0 = X_new, S_0 = −1 {Initialization}
2: for i = 0 to I − 1 do
3:   X_{i+1} = X_i
4:   Y_{i+1} = Y_i + (X_i × S_i × 2^(−i))
5:   Z_{i+1} = Z_i − (S_i × 2^(−i))
6:   if Y_{i+1} < 0 then
7:     S_{i+1} = +1
8:   else
9:     S_{i+1} = −1
10:  end if
11: end for
12: return Z_I

Example: For the decimal numbers X_d = 180 and Y_d = 500, the CORDIC divider should give Q_d = Y_d / X_d = 2.77778. The floating-point formats of the inputs are X_fp = 1.40625 × 2^134, where S_X = 0, M_X = 1.40625 and E_X = 134, and Y_fp = 1.953125 × 2^135, where S_Y = 0, M_Y = 1.953125 and E_Y = 135 (exponents biased by 127). The step-by-step computation using the proposed algorithm is as follows:
1. Divisor's Exponent Detection: Using Alg. 1, the credit exponent is E_C = E_X − 127 = 134 − 127 = 7, and the new exponent is E_X′ = 127.
2. Divisor Inversion: Using Alg. 2, the inversion result is Z = 1/X_new = 1/(1.40625 × 2^127) = 1.42222 × 2^126.
3. Alignment: In this third stage, Z_A = 1.42222 × 2^(E_Z − E_C) = 1.42222 × 2^(126 − 7) = 1.42222 × 2^119.
4. Multiplication: Finally, the division result is Q = Y × Z_A = 1.953125 × 1.42222 × 2^(135 + 119 − 127) = 2.77778 × 2^127, or, normalized according to the IEEE binary floating-point standard, Q = 1.38889 × 2^128, as expected.
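The four stages and the worked example can be mirrored in a short Python model. It is a sketch under stated assumptions rather than the authors' VHDL design: the function names are hypothetical, it computes in double precision instead of bit-accurate single precision, it reads the exponent with math.frexp, the default of 32 CORDIC iterations is an arbitrary choice, and it inverts |X| and restores the sign afterwards (S_Z = S_X), because the sign rule of Equ. (8) as used here assumes a positive divisor. One deliberate deviation: Algorithm 1 as printed returns a non-negative E_C in both branches, yet when the divisor's exponent is raised to 126 (E_X < 126) the reciprocal must be scaled by 2^(+E_C), not 2^(−E_C). The sketch therefore aligns with the signed difference E_X − E_X′, which equals E_C in the E_X > 127 branch used by the example.

```python
import math


def divisor_detect(e_x):
    """Algorithm 1 as printed: clamp a biased exponent into [126, 127] -> (E_C, E_X')."""
    if e_x < 126:
        return 126 - e_x, 126
    if e_x > 127:
        return e_x - 127, 127
    return 0, e_x


def inverting(x_new, n_iter):
    """Algorithm 2: linear-mode CORDIC with Y0 = 1, Z0 = 0, S0 = -1, so Z -> 1/x_new."""
    y, z, s = 1.0, 0.0, -1.0
    for i in range(n_iter):
        step = 2.0 ** -i                     # a shift by i bit positions in hardware
        y = y + x_new * s * step
        z = z - s * step
        s = 1.0 if y < 0 else -1.0           # sign rule of Equ. (8)
    return z


def cordic_divide(y, x, n_iter=32):
    """Hypothetical four-stage model: detect, invert, align, multiply."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("divisor must be finite and non-zero")
    m, e = math.frexp(abs(x))                # |x| = m * 2**e with 0.5 <= m < 1
    e_x = e + 126                            # biased exponent of |x| = (2m) * 2**(e - 1)
    _, e_x_new = divisor_detect(e_x)         # Stage 1 (signed shift used below instead of E_C)
    x_new = 2.0 * m * 2.0 ** (e_x_new - 127)     # clamped divisor, in [0.5, 2)
    z = inverting(x_new, n_iter)             # Stage 2: z ~ 1 / x_new
    shift = e_x - e_x_new                    # signed credit: +E_C or -E_C
    z_a = math.copysign(math.ldexp(z, -shift), x)   # Stage 3, with S_Z = S_X
    return y * z_a                           # Stage 4


print(divisor_detect(134))                   # (7, 127), as in step 1 of the example
print(round(inverting(1.40625, 32), 5))      # 0.71111, i.e. 1.42222 x 2^-1
print(round(cordic_divide(500.0, 180.0), 5))     # 2.77778
```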
4. PERFORMANCE ANALYSIS
The CORDIC division function has been evaluated by feeding input signals ranging from low to high magnitudes. Fig. 3 shows how the divider hardware unit produces the output Q by inverting the input X (Q = 1/X, with the dividend Y set to 1). The input X is incremented linearly from X = 0.1 to X = 33. The upper diagram of Fig. 3 compares the divider hardware output with the MATLAB output (the real calculated output), and the lower diagram shows the absolute calculation errors.
Fig. 3: Measurements of the CORDIC division with unit dividend (Q = 1/X) and the calculation errors obtained by testing 33 sets of inputs X (upper panel: CORDIC Output Signal, hardware output and Matlab output versus input signal X; lower panel: CORDIC Error Calculation Signal versus input signal X).
Fig. 4: Measurements of the CORDIC division with unit dividend (Q = 1/X) and the calculation errors from another test with 133 sets of input X (panels: CORDIC Output Signal, hardware output and Matlab output; CORDIC Error Calculation Signal; horizontal axis: input signal X).
Fig. 5: Evaluation of the CORDIC division operation (Q = Y/X) and the calculation errors obtained by incrementing both input operands (panels: Input Signal X, Y; CORDIC Hardware Output Signal; CORDIC Matlab Output Signal; CORDIC Error Calculation Signal; horizontal axis: set number of input signal (X, Y)).
Fig. 4 presents another simulation result with a different input range, from X = 100 to X = 133. The simulation results in Fig. 3 and Fig. 4 show the evaluation of the inversion step of the CORDIC-based division operation; the division operation itself is evaluated in the next figure.
Fig. 5 presents a simulation result of the division operation Q = Y/X, where the input signal X (the divisor) and the input signal Y (the dividend) are shown in the first and second rows of diagrams in the figure. Both input signals are incremented to evaluate the CORDIC division operation over different input ranges. The third and fourth rows show the calculation results of the floating-point CORDIC divider implemented as a VHDL model and the ideal MATLAB output, respectively. The bottom row shows the absolute calculation errors between the CORDIC hardware and the MATLAB output.
Other simulation results are presented in Fig. 6 and Fig. 7. In the simulation shown in Fig. 6, the input divisor X is ramped up while the input dividend Y is decremented, and about 4500 input pairs are presented. As shown in the figure, the division results of both the CORDIC hardware and the MATLAB simulation tend to decrease exponentially, and the absolute calculation errors also tend to decrease.

Table 1: Statistical errors of the CORDIC-based and traditional inverse calculation algorithms, where X and Y are varied from 0.1 to 500.
Iter.   Max.              Min.               Abs. Mean          Std. Deviation
8       3.9030            −15.6220           3.6760 × 10^-2     0.2999
16      0.0152            −0.0610            1.5841 × 10^-4     1.4705 × 10^-3
32      1.0 × 10^-4       −4.4680 × 10^-5    6.1798 × 10^-7     3.3635 × 10^-6
64      1.0 × 10^-4       −4.4680 × 10^-5    6.1798 × 10^-7     3.3635 × 10^-6
128     1.0694 × 10^-6    −8.0123 × 10^-7    1.1888 × 10^-7     1.9968 × 10^-7

Fig. 6: Evaluation of the CORDIC division operation (Q = Y/X) and the calculation errors obtained by incrementing the divisor X and decrementing the dividend Y (panels: Input Signal X, Y; CORDIC Hardware Output Signal; CORDIC Matlab Output Signal; CORDIC Error Calculation Signal; horizontal axis: set number of input signal (X, Y)).
Fig. 7 shows another simulation, where the input divisor X is decremented while the input dividend Y is incremented. As shown in the figure, the division result of the CORDIC hardware tends to increase exponentially, in accordance with the MATLAB simulation result. However, in this test scenario, the calculation errors tend to increase.

Fig. 7: Evaluation of the CORDIC division operation (Q = Y/X) and the calculation errors obtained by decrementing the divisor X and incrementing the dividend Y (panels: Input Signal X, Y; CORDIC Hardware Output Signal; CORDIC Matlab Output Signal; CORDIC Error Calculation Signal; horizontal axis: set number of input signal (X, Y)).
Table 1 shows the statistical analysis of the computational errors of the CORDIC hardware compared to the MATLAB simulation results. For each number of iterations, the maximum, minimum and absolute mean errors, as well as the standard deviation of the errors, are evaluated over 5000 sets of input samples. The calculation errors decrease as the number of iterations increases, although the results for 32 and 64 iterations are identical.
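Why more iterations help can be seen from an invariant of Algorithm 2: every step keeps Y_i + X_new·Z_i equal to 1, so after I iterations the inversion error is |Y_I|/X_new, which is bounded by 2^(−(I−1)) whenever X_new ≥ 0.5. The double-precision sketch below (hypothetical helper names) checks this bound over the mantissa range only; it models Algorithm 2, not the hardware, and it does not attempt to reproduce the Table 1 statistics, which compare the full divider with MATLAB over 5000 input pairs.

```python
def inverting(x_new, n_iter):
    """Algorithm 2 in double precision: Z -> 1/x_new for 0.5 <= x_new < 2."""
    y, z, s = 1.0, 0.0, -1.0
    for i in range(n_iter):
        step = 2.0 ** -i
        y += x_new * s * step
        z -= s * step
        s = 1.0 if y < 0 else -1.0
    return z


def max_inversion_error(n_iter, samples=2000):
    """Largest |Z - 1/x| over x evenly spaced in [0.5, 2)."""
    worst = 0.0
    for k in range(samples):
        x = 0.5 + 1.5 * k / samples
        worst = max(worst, abs(inverting(x, n_iter) - 1.0 / x))
    return worst


for n in (8, 16, 24):
    print(n, max_inversion_error(n), 2.0 ** -(n - 1))   # measured error vs. bound
```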
5. SILICON- AND FPGA-BASED SYNTHESIS RESULTS AND COMPARISON
Table 2 presents the synthesis result of the floating-point CORDIC divider using a 130-nm CMOS standard-cell technology library from Faraday Technology Corporation. The synthesis uses a target frequency of 500 MHz, resulting in a slack time of about 1.92 ns. The performance could be improved further by using a newer, faster CMOS standard-cell technology.
Table 3 presents the logic cell area of the components in the CORDIC hardware divider. The total area of the CORDIC divider is 40612 µm². Most of the logic area is occupied by the multiplier and alignment units (59.6%) and the inversion module (35.0%); the remaining 5% of the logic-cell area is occupied by combinational blocks. Since the CORDIC core performs two parallel floating-point operations, it achieves about 2 × 500 MHz = 1 GFlops (giga floating-point operations per second).
Table 4 shows the synthesis result of the floating-point CORDIC divider on an FPGA device from Xilinx Corporation. On the Virtex-2 FPGA device, the maximum data frequency of the CORDIC core is lower than in the synthesis on the CMOS standard-cell technology. The maximum performance of the CORDIC core on the Virtex-2 FPGA is about 2 × 105.406 MHz = 210.812 MFlops.
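The throughput figures above follow from multiplying the number of parallel floating-point operations by the clock frequency. The one-line model below makes that assumption explicit (each operation retires one result per clock cycle, as the text's calculation implies); the function name is hypothetical.

```python
def peak_throughput(parallel_ops, clock_hz):
    """Peak rate, assuming each parallel operation retires one result per clock cycle."""
    return parallel_ops * clock_hz


cmos = peak_throughput(2, 500e6)       # 130-nm CMOS at the 500 MHz target clock
fpga = peak_throughput(2, 105.406e6)   # Virtex-2 at its 105.406 MHz maximum clock
print(f"CMOS: {cmos / 1e9:.3f} GFlops, FPGA: {fpga / 1e6:.3f} MFlops")
# CMOS: 1.000 GFlops, FPGA: 210.812 MFlops
```

Sustained throughput would be lower if the pipeline cannot accept a new operand pair every cycle, so these numbers are upper bounds under that assumption.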
A brief survey of several floating-point division implementations in published articles is summarized in Table 5. It is difficult to compare designs and architectures to which different design methodologies, such as full-custom and standard-cell silicon technology, were applied. However, an architectural comparison of the iterative methods based on the number of basic floating-point operators and the computational latency can be used to evaluate performance and efficiency in top-level design. The proposed floating-point divider based on the CORDIC method is compared with Goldschmidt's [11] and the Taylor-series expansion [10] methods. The computational latency includes the time to form the initial approximation and the appropriate number of iterations. The latency of the proposed method is higher than that of the Taylor-series expansion method but lower than that of Goldschmidt's method. In terms of basic floating-point operators, the proposed CORDIC-based method uses #1FPMUL and #2FPADD/SUB, whereas the Taylor-series expansion method uses #2FPMUL and #1FPADD/SUB, and Goldschmidt's method uses #2FPMUL.

Table 2: Synthesis results using the 130-nm CMOS standard-cell technology library with a target frequency of 500 MHz.
Measurement                   Synthesis result
Total logic cell area         0.0406 mm²
Slack time (critical path)    1.92 ns
Switching power (1.32 V)      2.178 mW
Internal power (1.32 V)       5.994 mW

Table 3: Logic cell area of the components.
Component              Cell area (µm²)   Percentage (%)
Top Module             40612             100.0
Inversion Module       14264             35.0
Multiplier+Alignment   24222             59.6

Table 4: Synthesis result using the Virtex-2 FPGA device (2vp30ff896-7) from Xilinx Corporation.
Utilization                   Used               % of Total
Number of slices              1254 of 13696      9%
Number of MULT18X18 blocks    4 of 136           2%
Minimum delay                 9.487 ns
Maximum frequency             105.406 MHz
Table 5: Floating-point division comparison of published literature in single precision.
Methodology                     Latency   Basic FP operators      Speed     CMOS technology size
Goldschmidt's [11]              16/13     #2FPMUL                 2.3 GHz   65 nm
Taylor-series expansion [10]    12/5      #2FPMUL+#1FPADD/SUB     500 MHz   65 nm
The proposed CORDIC-based       14/8      #1FPMUL+#2FPADD/SUB     500 MHz   130 nm
FPMUL: Floating-Point MULtiplier; FPADD/SUB: Floating-Point ADDer/SUBtractor.
6. CONCLUSIONS AND FUTURE WORKS
A CORDIC core implementing a floating-point divider operation has been presented in this paper. The new algorithm is proposed to overcome the limited input domain over which the traditional CORDIC algorithm converges well when implementing the division operation. The CORDIC division is divided into four stages, i.e. divisor exponent detection, divisor inversion, inversion result alignment and multiplication. The results show that the proposed algorithm increases the input range over which the convergence of the CORDIC algorithm is guaranteed. In future work, the core will be integrated within a streaming processor [24]. Owing to the reconfigurability of the CORDIC core, implementing many trigonometric and logarithmic functions, including the division operation, on top of a single programmable/reconfigurable CORDIC core will help reduce the area of the processor's arithmetic units.
References
[1] B. Parhami, Computer Arithmetic: Algorithms and Hardware Design. Oxford University Press, New York, Oxford, 2000.
[2] D. Rao, S. Patil, N. Babu, and V. Muthukumar, "Implementation and Evaluation of Image Processing Algorithms on Reconfigurable Architecture using C-based Hardware Description Languages," International Journal of Theoretical and Applied Computer Sciences, vol. 1, no. 1, pp. 9–34, 2006.
[3] A. Beaumont-Smith and S. Samudrala, "Method and System of a Microprocessor Subtraction-Division Floating-Point Divider," Patent US 7,127,483 B2, Oct. 24, 2006.
[4] D. J. Desmonds, "Binary Divider with Carry Save Adders," Patent US 4,320,464, March 16, 1982.
[5] S. F. Oberman and M. J. Flynn, "Division Algorithms and Implementations," IEEE Trans. Computers, vol. 46, no. 8, pp. 833–854, Aug. 1997.
[6] J. Ebergen, I. Sutherland, and A. Chakraborty, "New Division Algorithms by Digit Recurrence," in Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, vol. 2, 2004, pp. 1849–1855.
[7] L. Montalvo, K. Parhi, and A. Guyot, "A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture," IEEE Trans. Computers, vol. 56, no. 6, pp. 727–739, June 2007.
[8] I. Rust and T. Noll, "A digit-set-interleaved radix-8 division/square root kernel for double-precision floating point," in IEEE International Symposium on System on Chip (SoC), 2010, pp. 150–153.
[9] L. Montalvo, K. Parhi, and A. Guyot, "New Svoboda-Tung Division," IEEE Trans. Computers, vol. 47, no. 9, pp. 1014–1020, Sep. 1997.
[10] T.-J. Kwon, J. Sondeen, and J. Draper, "Floating-point division and square root using a Taylor-series expansion algorithm," in the 50th Midwest Symposium on Circuits and Systems (MWSCAS 2007), 2007, pp. 305–308.
[11] S. F. Oberman, "Floating Point Division and Square Root Algorithms and Implementation in the AMD-K7 Microprocessor," in The 14th IEEE Symposium on Computer Arithmetic, 1999, pp. 106–115.
[12] M. J. Schulte, D. Tan, and C. L. Lemonds, "Floating-Point Division Algorithms for an x86 Microprocessor with a Rectangular Multiplier," in Proc. Int'l Conf. on Computer Design (ICCD), 2007, pp. 304–310.
[13] W. Gallagher and E. Swartzlander, "Fault-Tolerance Newton-Raphson and Goldschmidt Dividers using Time Shared TMR," IEEE Trans. Computers, vol. 49, no. 6, pp. 588–595, June 2000.
[14] L.-K. Wang and M. Schulte, "Processing Unit Having Decimal Floating-Point Divider Using Newton-Raphson Iteration," Patent US 7,467,174 B2, Dec. 16, 2008.
[15] P. Surapong, F. A. Samman, and M. Glesner, "Design and Analysis of Extension-Rotation CORDIC Algorithms based on Non-Redundant Method," International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 5, no. 1, pp. 65–84, March 2012.
[16] K. Maharatna, S. Banerjee, E. Grass, M. Krstic, and A. Troya, "Modified Virtually Scaling-Free Adaptive CORDIC Rotator Algorithm and Architecture," IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 11, pp. 1463–1474, Nov. 2005.
[17] F. A. Samman, P. Surapong, C. Spies, and M. Glesner, "Floating-point-based Hardware Accelerator of a Beam Phase-Magnitude Detector and Filter for a Beam Phase Control System in a Heavy-Ion Synchrotron Application," in Proc. Int'l Conf. on Accelerator and Large Experimental Physics Control Systems (ICALEPCS), 2011.
[18] H. Hahn, D. Timmermann, B. Hosticka, and B. Rix, "A Unified and Division-Free CORDIC Argument Reduction Method with Unlimited Convergence Domain Including Inverse Hyperbolic Functions," IEEE Trans. on Computers, vol. 43, no. 11, pp. 1339–1344, Nov. 1994.
[19] J. Zhou, Y. Dou, Y. Lei, and Y. Dong, "Hybrid-Mode Floating-Point FPGA CORDIC Co-processor," in the 4th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications (ARC'08), 2008, pp. 256–261.
[20] P. Surapong, F. Philipp, F. A. Samman, and M. Glesner, "Improvement of Standard and Non-Standard Floating-Point Operators," ECTI Trans. Computer and Information Technology, vol. 6, no. 1, pp. 9–22, May 2012.
[21] D. Munoz, D. Sanchez, C. Llanos, and M. Ayala-Rincon, "FPGA based floating-point library for CORDIC algorithms," in IEEE Programmable Logic Conference, 2010, pp. 55–60.
[22] T. Lund, M. Aguirre, and A. Torralba, "Making use of CORDIC and distributed arithmetic to produce a field-programmable fuzzy logic controller in an FPGA," in the 28th IEEE Annual Conf. of the Industrial Electronics Society (IECON'02), 2002, pp. 3205–3208.
[23] B. Das and S. Banerjee, "Unified CORDIC-based chip to realise DFT/DHT/DCT/DST," IEE Proceedings, Computers and Digital Techniques, vol. 149, no. 4, pp. 121–127, Nov. 2002.
[24] F. A. Samman, P. Surapong, and M. Glesner, "Reconfigurable streaming processor core with interconnected floating-point arithmetic units for multicore adaptive signal processing systems," in Proc. of the 6th International Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2011, pp. 1–6.
Pongyupinpanich Surapong was born in Prachinburi, Thailand. He received his Bachelor and Master of Engineering degrees in Electrical Engineering from King Mongkut's Institute of Technology Ladkrabang (KMITL), Thailand, in 1998 and 2002, respectively. He received his PhD degree in 2012 from Technische Universität Darmstadt, Germany. Currently, he is a lecturer and research fellow at the Department of Computer Engineering, Faculty of Engineering, Ramkhamhaeng University, in Bangkok, Thailand. His research interests include computer-aided VLSI design, hardware modeling, design optimization algorithms, circuit simulation, digital signal processing, and system-on-chip design, all in the context of field-programmable gate-array devices and VLSI technology.
Faizal Arya Samman was born in Makassar, Indonesia. He received his Bachelor of Engineering degree in Electrical Engineering from Universitas Gadjah Mada (UGM), Yogyakarta, in 1999 and his Master of Engineering degree from Institut Teknologi Bandung (ITB) in 2002. Since 2002 he has been a member of the research and teaching staff at Universitas Hasanuddin in Makassar, Indonesia. He received his PhD degree in 2010 from Technische Universität Darmstadt, Germany, with a scholarship award (2006–2010) from the Deutscher Akademischer Austauschdienst (DAAD, German Academic Exchange Service). From 2010 until 2012, he was a postdoctoral fellow in the research project of LOEWE-Zentrum AdRIA (Adaptronik-Research, Innovation, Application) within the research cooperation framework between Technische Universität Darmstadt and Fraunhofer Institut LBF in Darmstadt. He is now a lecturer and research fellow at the Department of Electrical Engineering, Faculty of Engineering, Universitas Hasanuddin, in Makassar, Indonesia. His research interests include network-on-chip (NoC) microarchitecture, NoC-based multiprocessor systems-on-chip, design and implementation of analog and digital electronic circuits for control system applications on FPGA/ASIC, as well as energy harvesting systems and wireless sensor networks.
