Title: Architecture for dual-mode quadruple precision floating point adder
Author(s): Jaiswal, MK; Bogaraju, SV; So, HKH
Citation: The 2015 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Montpellier, France, 8-10 July 2015. In Conference Proceedings, 2015, p. 249-254
Issued Date: 2015
URL: http://hdl.handle.net/10722/214074
Rights: IEEE Computer Society Annual Symposium on VLSI. Copyright © IEEE Computer Society.
Architecture for Dual-Mode Quadruple Precision Floating Point Adder
Manish Kumar Jaiswal, B. Sharat Chandra Varma, and Hayden K.-H So
Department of EEE, The University of Hong Kong, Hong Kong
Email: {manishkj, varma, hso}@eee.hku.hk
Abstract—This paper presents a configurable dual-mode
architecture for a floating point (F.P.) adder. The architecture
(named QPdDP) operates in dual mode: it computes either one
quadruple precision operation or two parallel (dual) double
precision operations. The architecture follows the standard
state-of-the-art flow for floating point addition. It supports
normal as well as sub-normal operands, along with exceptional
case handling. The key sub-components of the architecture are
re-designed and optimized for on-the-fly dual-mode processing,
which enables efficient resource sharing between the two
precisions. The data path is optimized for minimal multiplexing
overhead. The presented dual-mode architecture provides SIMD
support for double precision operands, along with high
(quadruple) precision support. The proposed architecture is
synthesized for a UMC 90 nm standard-cell ASIC technology.
Compared with the best available works in the literature, it
shows better design metrics in terms of area, period and
area × period, along with more computational support.
Keywords-Floating Point Addition, Configurable Architecture, Dual-Mode Arithmetic, ASIC, Digital Arithmetic.
I. INTRODUCTION
Floating point (FP) number systems [1], owing to their wide
dynamic range, are a common choice for a large set of
scientific, engineering and numerical processing computations.
The performance of these computations generally depends
heavily on the underlying floating point arithmetic unit.
Furthermore, double precision (DP) is no longer sufficient for
many applications, and the demand for higher precision
arithmetic is increasing in many application areas [2], [3].
Contemporary processing units meet high performance
requirements by using multiple single precision and double
precision arithmetic units. For example, the Synergistic
Processing Element (SPE) of the Cell-BE processor [4] contains
a vector array of 4 single precision units and an array of 2
double precision units. The ARM VFP co-processor (VFP9-S) [5]
provides a vector array of 16 single precision FP units and an
8-element double precision vector array. The same pattern can be
seen in the recent Intel Xeon Phi and NVIDIA Kepler GK110 [6].
In general, these computing systems contain separate units/arrays
for single precision and double precision computations. However,
a unified, configurable unit that supports double precision with
dual (two-parallel) single precision (DPdSP) arithmetic, or
quadruple precision (QP) with dual (two-parallel) double
precision (QPdDP) arithmetic, could save a large amount of
silicon area in such machines. With this in view, this paper
aims at the design of a configurable dual-mode floating point
adder/subtractor architecture with high precision support.
Several works [7], [8], [9] have proposed dual-mode adder
architectures. These works improve the hardware resource
utilization with multi-precision computational support.
However, the extra hardware and the un-optimized data path
and resource sharing lead to a large overhead in area and
period. Furthermore, they support only normal operands. The
dual-mode adder architectures of [7], [8] use a large number
of multiplexers (to support dual mode) at various levels of the
architecture, and have a less tuned data path for dual-mode
operation. Further, the use of extra resources (such as
additional adders/subtractors for exponents and mantissas,
relatively larger dual shifters, and extra mantissa normalizing
shifters for dual-mode support) makes their area and period
overheads larger. Some recent works [10], [11] have also
addressed dual-mode architectures, but for lower precision
(double precision with two-parallel single precision).
This paper proposes an architecture for dual-mode QPdDP
(quadruple precision with dual/two-parallel double precision)
adder/subtractor arithmetic. The computational sub-components
are designed for configurable dual-mode support. The data path
is tuned for better resource sharing and to minimize the
multiplexing circuitry. The proposed architecture provides full
support for normal as well as sub-normal operands, exceptional
case handling, and the round-to-nearest rounding method. Other
rounding methods can also be included easily. A pipelined
architecture is designed and synthesized using a 90 nm
standard-cell ASIC technology. The proposed architecture is
compared with the best available works in the literature.
II. PROPOSED ARCHITECTURE OF QUADRUPLE PRECISION / DUAL (TWO-PARALLEL) DOUBLE PRECISION (QPDDP) ADDER/SUBTRACTOR
The present dual-mode floating point adder architecture follows
the basic single-path algorithm for this computation. A floating
point arithmetic computation processes the sign, exponent and
mantissa parts of the operands separately, and later combines
them after normalization and rounding [1]. The standard formats
for floating point numbers are as follows:
SP: Sign (1-bit) | Exponent (8-bit) | Mantissa (23-bit)
DP: Sign (1-bit) | Exponent (11-bit) | Mantissa (52-bit)
QP: Sign (1-bit) | Exponent (15-bit) | Mantissa (112-bit)
2015 IEEE Computer Society Annual Symposium on VLSI
978-1-4799-8719-1/15 $31.00 © 2015 IEEE
DOI 10.1109/ISVLSI.2015.70
249
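As a small illustration of these layouts, the Python sketch below splits a raw bit pattern into sign, biased exponent and fraction for each format. The FORMATS table and helper names are ours, not the paper's; Python has no native quadruple precision type, so the QP case can only be exercised on a raw integer.

```python
import struct

# (exponent bits, fraction bits) of the three formats listed above
FORMATS = {"SP": (8, 23), "DP": (11, 52), "QP": (15, 112)}


def bias(fmt: str) -> int:
    """Exponent bias 2**(ew - 1) - 1."""
    ew, _ = FORMATS[fmt]
    return (1 << (ew - 1)) - 1


def fields(bits: int, fmt: str) -> tuple[int, int, int]:
    """Split a raw encoding into (sign, biased exponent, fraction)."""
    ew, mw = FORMATS[fmt]
    frac = bits & ((1 << mw) - 1)
    exp = (bits >> mw) & ((1 << ew) - 1)
    sign = (bits >> (ew + mw)) & 1
    return sign, exp, frac


def dp_bits(x: float) -> int:
    """Raw 64-bit encoding of a Python float (an IEEE 754 double)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


print(fields(dp_bits(-1.5), "DP"))  # (1, 1023, 2251799813685248), i.e. fraction = 2**51
print(bias("QP"))                   # 16383
```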
A basic state-of-the-art computational flow of the floating
point adder is shown in Algorithm 1. Here, steps 6-7 and step 22
are required for sub-normal processing. In this work, each step
of the flow is constructed to support dual-mode operation with
resource sharing and a tuned data path with minimal multiplexing
circuitry.
Algorithm 1 F.P. Adder Computational Flow [1]
1: (IN1, IN2) Input Operands;
2: Data Extraction & Exceptional Check-up:
3: {S1(Sign1), E1(Exponent1), M1(Mantissa1)} ← IN1
4: {S2, E2, M2} ← IN2
5: Check for INFINITY, NAN
6: Check for SUB-NORMALs
7: Update Exponents & Mantissas' MSB for SUB-NORMALs
8: COMPARE, SWAP & Dynamic Right SHIFT:
9: IN1_gt_IN2 ← {E1,M1} ≥ {E2,M2}
10: Large_E,M ← IN1_gt_IN2 ? E1,M1 : E2,M2
11: Small_E,M ← IN1_gt_IN2 ? E2,M2 : E1,M1
12: Right_Shift ← Large_E - Small_E
13: Small_M ← Small_M >> Right_Shift
14: Mantissa Computation:
15: OP ← ¬(S1 ⊕ S2) (OP = 1: equal signs, effective addition)
16: if OP == 1 then
17: Add_M ← Large_M + Small_M
18: else
19: Add_M ← Large_M - Small_M
20: Leading-One-Detection & Dynamic Left SHIFT:
21: Left_Shift ← LOD(Add_M)
22: Left_Shift ← Adjustment for SUB-NORMAL or Underflow
23: Add_M ← Add_M << Left_Shift
24: Normalization & Rounding:
25: Mantissa Normalization & Compute Rounding ULP based on Guard, Round & Sticky Bit
26: Add_M ← Add_M + ULP
27: Large_E ← Large_E + Add_M[MSB] - Left_Shift
28: Finalizing Output:
29: Update Exponent & Mantissa for Exceptional Cases
30: Determine Final Output
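To make the flow concrete, here is a minimal behavioral sketch of Algorithm 1 for the DP format, written on Python integers. It is our illustration, not the paper's RTL: the names fp_add, to_bits, from_bits and QNAN are ours, three guard/round/sticky (GRS) bits are appended below the LSB, rounding is round-to-nearest with ties to even, any invalid case returns one canonical quiet NaN, and the overflow branch performs the 1-bit normalization before the LOD instead of after it. The result can be checked against the host's IEEE 754 double addition.

```python
import struct

EW, MW = 11, 52                 # DP exponent and fraction widths
EMAX = (1 << EW) - 1            # all-ones exponent field: INFINITY / NaN
GRS = 3                         # guard, round and sticky bits below the LSB
W = MW + 1 + GRS                # working mantissa width; hidden bit at W - 1
QNAN = 0x7FF8000000000000       # canonical quiet NaN (our choice)


def to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def fp_add(a: int, b: int) -> int:
    # Steps 2-5: data extraction and INFINITY / NaN checks
    s1, e1, f1 = a >> 63, (a >> MW) & EMAX, a & ((1 << MW) - 1)
    s2, e2, f2 = b >> 63, (b >> MW) & EMAX, b & ((1 << MW) - 1)
    if (e1 == EMAX and f1) or (e2 == EMAX and f2) or (e1 == e2 == EMAX and s1 != s2):
        return QNAN
    if e1 == EMAX:
        return a
    if e2 == EMAX:
        return b
    # Steps 6-7: sub-normals get hidden bit 0 and exponent 1
    m1 = (((e1 != 0) << MW) | f1) << GRS
    m2 = (((e2 != 0) << MW) | f2) << GRS
    e1, e2 = max(e1, 1), max(e2, 1)
    # Steps 9-13: compare magnitudes, swap, align the small mantissa with sticky
    if (e1, m1) >= (e2, m2):
        s_l, e_l, m_l, e_s, m_s = s1, e1, m1, e2, m2
    else:
        s_l, e_l, m_l, e_s, m_s = s2, e2, m2, e1, m1
    d = e_l - e_s
    sticky = int((m_s & ((1 << d) - 1)) != 0)
    m_s = (m_s >> d) | sticky
    # Steps 15-19: OP = XNOR of the signs (equal signs: effective addition)
    m = m_l + m_s if s1 == s2 else m_l - m_s
    if m == 0:
        return (s1 & s2) << 63          # exact zero: -0 only for (-0) + (-0)
    e = e_l
    if m >> W:                          # carry out: 1-bit right normalization
        m = (m >> 1) | (m & 1)
        e += 1
    else:                               # Steps 21-23: LOD and dynamic left shift
        lz = W - m.bit_length()
        l_shift = min(lz, e - 1)        # Step 22: sub-normal / underflow limit
        m <<= l_shift
        e -= l_shift
    # Steps 25-27: round to nearest (ties to even) on guard/round/sticky
    g, r, st = (m >> 2) & 1, (m >> 1) & 1, m & 1
    m >>= GRS
    if g & (r | st | (m & 1)):
        m += 1
        if m >> (MW + 1):               # rounding carried into a new MSB
            m >>= 1
            e += 1
    # Steps 29-30: overflow to INFINITY; no hidden bit means a sub-normal output
    if e >= EMAX:
        return (s_l << 63) | (EMAX << MW)
    exp_field = e if m >> MW else 0
    return (s_l << 63) | (exp_field << MW) | (m & ((1 << MW) - 1))


print(from_bits(fp_add(to_bits(0.1), to_bits(0.2))))  # 0.30000000000000004
```

Three GRS bits are enough here because a left shift of more than one position only happens when the exponent difference is 0 or 1, in which case no bits were shifted out during alignment.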
The architecture of the proposed dual-mode QPdDP adder is shown
in Fig. 1. The input/output register format assumed for this
architecture is shown in Fig. 2. Each of the two 128-bit input
operands contains either one quadruple precision operand or two
double precision operands. Based on the mode control signal
(qp_dp), the dual-mode architecture switches to either quadruple
precision or dual (two-parallel) double precision computation
mode (qp_dp = 1 → QP mode, qp_dp = 0 → dual DP mode). All the
computational steps of the QPdDP dual-mode adder are discussed
in detail below.
The data extraction, sub-normal handling and exceptional handling
are shown in Fig. 3. Based on the precision format, the sign,
exponent and mantissa parts of the operands are extracted for
both quadruple precision and double precision.
[Fig. 1 block diagram: inputs in1[127:0], in2[127:0] and the mode signal qp_dp feed the Data Extraction & SubNormal Handler and the Comparator, followed by Swap (large sign, exponent, mantissa and OP, plus R_Shift_Amount), the Dual-Mode DRS, the Dual-Mode Add/Sub, Mantissa Sum & LOD_in, the Dual-Mode LOD, Left Shift Update (for subnormal, underflow), the Dual-Mode DLS, Normalization & Dual-Mode Rounding, Exponent Update (for subnormal, underflow, overflow, exceptional cases) and Final Processing. Naming: qp = Quadruple Precision, dp = Double Precision, qp_dp = Quadruple/Double mode; _s = Sign, _e = Exponent, _m = Mantissa, _op = Operation, _sn = SubNormal, _L = Large, _S = Small, _l_ = Left, _r_ = Right, -gt- = Greater than, -eq- = Equal to.]
Figure 1: QPdDP Adder Architecture.
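128-bit register, split into QP[127:64] / DP2 and QP[63:0] / DP1:
QP mode: sign [127] | exponent [126:112] (15-bit) | mantissa [111:0] (112-bit)
Dual DP mode: DP2 = sign [127] | exponent [126:116] (11-bit) | mantissa [115:64] (52-bit);
DP1 = sign [63] | exponent [62:52] (11-bit) | mantissa [51:0] (52-bit)
Figure 2: QPdDP Adder: Input / Output Register Format.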
    // Sub-normal detection: exponent field all zeros (_sn1/_sn2 per operand, _sn for both operands)
    dp1_sn1 = ~|in1[62:52];
    dp1_sn2 = ~|in2[62:52];
    dp1_sn  = dp1_sn1 & dp1_sn2;
    dp2_sn1 = ~|in1[126:116];
    dp2_sn2 = ~|in2[126:116];
    dp2_sn  = dp2_sn1 & dp2_sn2;
    qp_sn1  = ~|in1[115:112] & dp2_sn1;    // QP check reuses the DP-2 exponent check
    qp_sn2  = ~|in2[115:112] & dp2_sn2;
    qp_sn   = qp_sn1 & qp_sn2;
    // Exponents: a zero (sub-normal) exponent field is read as 1
    dp1_e1 = {in1[62:53], in1[52] | dp1_sn1};
    dp1_e2 = {in2[62:53], in2[52] | dp1_sn2};
    dp2_e1 = {in1[126:117], in1[116] | dp2_sn1};
    dp2_e2 = {in2[126:117], in2[116] | dp2_sn2};
    qp_e1  = {in1[126:113], in1[112] | qp_sn1};
    qp_e2  = {in2[126:113], in2[112] | qp_sn2};
    // Mantissas: hidden bit is 1 for normal and 0 for sub-normal operands
    qp_m1  = {~qp_sn1, in1[111:0]};
    qp_m2  = {~qp_sn2, in2[111:0]};
    dp1_m1 = {~dp1_sn1, in1[51:0]};
    dp1_m2 = {~dp1_sn2, in2[51:0]};
    dp2_m1 = {~dp2_sn1, in1[115:64]};
    dp2_m2 = {~dp2_sn2, in2[115:64]};
Figure 3: QPdDP Adder: Data Extraction and Subnormal Handler.
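The extraction rules of Fig. 3 can be mirrored in a few lines of Python. This sketch (names ours) works on one 128-bit operand, so its flags correspond to the per-operand _sn1/_sn2 signals; the both-operand _sn flags are simply the AND of two such results. pack_dp is a hypothetical helper that builds a dual-DP register in the Fig. 2 layout.

```python
import struct


def bits(x: int, hi: int, lo: int) -> int:
    """Verilog-style slice x[hi:lo]."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def extract(word: int) -> dict:
    """Fig. 3 fields of one 128-bit input operand, for both the QP and DP views."""
    dp1_sn = bits(word, 62, 52) == 0
    dp2_sn = bits(word, 126, 116) == 0
    qp_sn = bits(word, 115, 112) == 0 and dp2_sn   # reuses the DP-2 zero check
    return {
        "dp1_sn": dp1_sn, "dp2_sn": dp2_sn, "qp_sn": qp_sn,
        # a zero (sub-normal) exponent field is read as 1
        "dp1_e": bits(word, 62, 52) | dp1_sn,
        "dp2_e": bits(word, 126, 116) | dp2_sn,
        "qp_e": bits(word, 126, 112) | qp_sn,
        # hidden bit: 1 for normal, 0 for sub-normal operands
        "dp1_m": (int(not dp1_sn) << 52) | bits(word, 51, 0),
        "dp2_m": (int(not dp2_sn) << 52) | bits(word, 115, 64),
        "qp_m": (int(not qp_sn) << 112) | bits(word, 111, 0),
    }


def pack_dp(dp2: float, dp1: float) -> int:
    """Dual DP register of Fig. 2: DP2 in [127:64], DP1 in [63:0]."""
    def raw(x: float) -> int:
        return struct.unpack("<Q", struct.pack("<d", x))[0]
    return (raw(dp2) << 64) | raw(dp1)


f = extract(pack_dp(1.0, 5e-324))  # DP-2 normal, DP-1 sub-normal
g = extract(1 << 112)              # QP exponent field 1, DP-2 exponent field 0
```

The second example shows why the QP check needs the extra in[115:112] term: the DP-2 field is zero, yet the QP operand is normal.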
As shown in Fig. 2, the exponent field of the QP operand overlaps
with that of the second DP (DP-2) operand: the 11-bit DP-2
exponent in[126:116] forms the 11 MSBs of the 15-bit QP exponent
in[126:112].
    QP exponent   in[126:112]: xxxxxxxxxxx xxxx
    DP-2 exponent in[126:116]: xxxxxxxxxxx
This overlap is used to share the resources for the sub-normal,
infinity and NaN checks of the QP and DP-2 operands (the sub-normal
check is shown in Fig. 3; the infinity and NaN checks are handled
similarly). After these exceptional checks, the exponents and
mantissas are updated accordingly. Compared to a QP-only design,
this unit requires extra resources only for the first DP (DP-1)
operands.
    // Magnitude comparisons (sign bits 127 and 63 excluded in DP mode)
    dp1_in1_gt_in2 = (in1[62:0] > in2[62:0]);
    dp2_in1_eq_in2 = (in1[126:64] == in2[126:64]);
    dp2_in1_gt_in2 = (in1[126:64] > in2[126:64]);
    qp_in1_gt_in2  = dp2_in1_gt_in2 | (dp2_in1_eq_in2 &
                     ((in1[63] & ~in2[63]) | ((in1[63] ~^ in2[63]) & dp1_in1_gt_in2)));
[Diagram: "Compare DP-2" on QP[127:64] / DP2 and "Compare DP-1" on QP[63:0] / DP1; their outputs dp2_in1_gt_in2 and dp1_in1_gt_in2 are combined into qp_in1_gt_in2.]
Figure 4: QPdDP Adder: Comparator.
The dual-mode comparator unit of the QPdDP adder is shown in
Fig. 4. The comparator unit determines which operand is larger
and which one is smaller. This unit is shared among the QP and
both DP operands. It comprises two comparator units, one for each
DP operand pair, which generate the corresponding comparison
results. These DP results are further combined to form the QP
comparison. In terms of resources, this comparator unit requires
resources similar to those of a QP-only comparator, so there is
no area overhead in this unit.
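To see why two 64-bit comparisons suffice, the Python sketch below recombines the DP-2 and DP-1 results as in Fig. 4 and checks the outcome against a direct 127-bit magnitude comparison on a few sample words. The names are ours, and comparing magnitudes only (bits [126:64] and [62:0] for the DPs, with bit 63 acting as a QP mantissa bit in QP mode) is our reading of the figure.

```python
def bits(x: int, hi: int, lo: int) -> int:
    """Verilog-style slice x[hi:lo]."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def compare(in1: int, in2: int) -> dict:
    """Fig. 4: two DP magnitude comparators combined into the QP comparison."""
    dp1_gt = bits(in1, 62, 0) > bits(in2, 62, 0)
    dp2_eq = bits(in1, 126, 64) == bits(in2, 126, 64)
    dp2_gt = bits(in1, 126, 64) > bits(in2, 126, 64)
    b1, b2 = bits(in1, 63, 63), bits(in2, 63, 63)   # DP-1 sign = QP mantissa bit
    low_gt = (b1 == 1 and b2 == 0) or (b1 == b2 and dp1_gt)   # compare of [63:0]
    qp_gt = dp2_gt or (dp2_eq and low_gt)
    return {"dp1_gt": dp1_gt, "dp2_gt": dp2_gt, "qp_gt": qp_gt}


SAMPLES = [0, 1, (1 << 63) - 1, 1 << 63, 1 << 64, (1 << 127) | 5,
           (1 << 126) | (1 << 63), 0x7FFF << 112, 12345678901234567890123456789]
qp_ok = all(compare(a, b)["qp_gt"] == (bits(a, 126, 0) > bits(b, 126, 0))
            for a in SAMPLES for b in SAMPLES)
```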
The next computational unit in this architecture is the dual-mode
SWAP, which generates the large sign (effectively the output sign
bit), the small and large exponents, the small and large
mantissas, and the effective operations (to be performed between
the large and small mantissas). This unit is shown in Fig. 5.
Handling both DPs and the QP directly would need four 11-bit (for
both DP exponents), two 15-bit (for QP exponents), four 53-bit
(for both DP mantissas) and two 113-bit (for QP mantissa) SWAP
components. To minimize this swapping overhead, unified exponents,
mantissas and greater-than control signals are generated instead,
by multiplexing either the quadruple precision operands or both
double precision operands (as shown in Fig. 5). This is an
important step in the dual-mode QPdDP architectural flow, since it
enables a tuned data-path computation with reduced multiplexing
circuitry in the later stages. Using these unified exponents,
mantissas and greater-than control signals, the entire processing
requires only four 11-bit (for exponents) and four 64-bit (for
mantissas) SWAP circuits. Effectively, this is slightly more SWAP
hardware than a QP-only design needs (two 15-bit SWAPs for
exponents and two 113-bit SWAPs for mantissas), plus the extra
multiplexing circuitry needed to generate the unified signals; in
return, it facilitates the tuned data-path processing. Further,
among the extra LSB zeros appended in the mantissa multiplexing
(for m1 and m2), 3 bits are used for the Guard, Round and Sticky
bit computations in the rounding phase, and the remaining bits can
provide extended precision support to the operands.
    // Unified compare, exponents and mantissas (qp_dp = 1: QP mode)
    C1 = qp_dp ? qp_in1_gt_in2 : dp1_in1_gt_in2;
    C2 = qp_dp ? qp_in1_gt_in2 : dp2_in1_gt_in2;
    e1 = qp_dp ? {7'b0, qp_e1} : {dp2_e1, dp1_e1};
    e2 = qp_dp ? {7'b0, qp_e2} : {dp2_e2, dp1_e2};
    m1 = qp_dp ? {qp_m1, 15'b0} : {dp2_m1, 11'b0, dp1_m1, 11'b0};
    m2 = qp_dp ? {qp_m2, 15'b0} : {dp2_m2, 11'b0, dp1_m2, 11'b0};
    // Large / small exponents and mantissas
    e_L[10:0] = C1 ? e1[10:0] : e2[10:0];    e_L[21:11] = C2 ? e1[21:11] : e2[21:11];
    e_S[10:0] = C1 ? e2[10:0] : e1[10:0];    e_S[21:11] = C2 ? e2[21:11] : e1[21:11];
    m_L[63:0] = C1 ? m1[63:0] : m2[63:0];    m_L[127:64] = C2 ? m1[127:64] : m2[127:64];
    m_S[63:0] = C1 ? m2[63:0] : m1[63:0];    m_S[127:64] = C2 ? m2[127:64] : m1[127:64];
    dp1_Le = e_L[10:0];    dp2_Le = e_L[21:11];    qp_Le = e_L[14:0];
    // Large sign and effective operation (OP = 1: addition)
    dp1_Ls = dp1_in1_gt_in2 ? dp1_s1 : dp1_s2;
    dp2_Ls = dp2_in1_gt_in2 ? dp2_s1 : dp2_s2;
    qp_Ls  = qp_in1_gt_in2 ? qp_s1 : qp_s2;
    dp1_op = dp1_s1 ~^ dp1_s2;
    dp2_op = dp2_s1 ~^ dp2_s2;
    qp_op  = qp_s1 ~^ qp_s2;
    // Right shift amount
    shift       = e_L - e_S;
    qp_r_shift  = qp_dp ? shift[14:0] : 0;
    dp1_r_shift = ~qp_dp ? shift[10:0] : 0;
    dp2_r_shift = ~qp_dp ? shift[21:11] : 0;
Figure 5: QPdDP Adder: SWAP - Large Sign, Exponent, Mantissa and OPERATION; Right Shift Amount.
m_L contains the mantissa of either the larger QP operand or both
larger DP operands. Similarly, m_S contains the smaller mantissas,
e_L the larger exponents, and e_S the smaller exponents, either of
the QP operand or of both DP operands.
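The unified multiplexing and the per-half SWAP of Fig. 5 can be mimicked on Python integers. In this sketch (function names and tuple conventions are ours), the lower 11-bit exponent half and 64-bit mantissa half follow C1, and the upper halves follow C2. In QP mode C1 = C2 = qp_in1_gt_in2, so the same circuit swaps the whole 15-bit exponent and 128-bit mantissa.

```python
M11, M64 = (1 << 11) - 1, (1 << 64) - 1


def unify(qp_dp, qp=(0, 0), dp2=(0, 0), dp1=(0, 0)):
    """Unified (22-bit exponent, 128-bit mantissa) of one operand; tuples are (e, m)."""
    if qp_dp:
        return qp[0], qp[1] << 15                                    # {7'b0, qp_e}, {qp_m, 15'b0}
    return (dp2[0] << 11) | dp1[0], (dp2[1] << 75) | (dp1[1] << 11)  # {dp2_m, 11'b0, dp1_m, 11'b0}


def pick(c, a, b, mask):
    """One SWAP multiplexer restricted to the bits in mask."""
    return (a if c else b) & mask


def swap(c1, c2, op1, op2):
    """Lower halves follow C1, upper halves follow C2; returns (e_L, e_S, m_L, m_S)."""
    (e1, m1), (e2, m2) = op1, op2
    lo_e, hi_e, lo_m, hi_m = M11, M11 << 11, M64, M64 << 64
    e_L = pick(c1, e1, e2, lo_e) | pick(c2, e1, e2, hi_e)
    e_S = pick(c1, e2, e1, lo_e) | pick(c2, e2, e1, hi_e)
    m_L = pick(c1, m1, m2, lo_m) | pick(c2, m1, m2, hi_m)
    m_S = pick(c1, m2, m1, lo_m) | pick(c2, m2, m1, hi_m)
    return e_L, e_S, m_L, m_S


# Dual DP: in1 has the larger DP-2 (C2 = 1), in2 has the larger DP-1 (C1 = 0)
op1 = unify(False, dp2=(1023, (1 << 52) | 1), dp1=(1000, 1 << 52))
op2 = unify(False, dp2=(1020, 1 << 52), dp1=(1010, (1 << 52) | 3))
e_L, e_S, m_L, m_S = swap(False, True, op1, op2)
# QP: in1 larger, so C1 = C2 = 1
q_L, q_S, qm_L, qm_S = swap(True, True, unify(True, qp=(16383, 1 << 112)),
                            unify(True, qp=(16380, (1 << 112) | 9)))
```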
[Fig. 6 diagram: the 128-bit input in[127:0] holds either one QP mantissa or two DP mantissas {[63:0],[63:0]}. A first stage shifts right by 64 when qp[6] is set; six dual-mode stages follow, controlled by qp[5:0], dp2[5:0] and dp1[5:0]. Each one-stage unit shifts the upper block [127:64] and the lower block [63:0] by y = 2^x under qp[x] | dp2[x] and qp[x] | dp1[x], respectively; when qp_dp & qp[x], the bits [63+y:64] shifted out of the upper block fill the top of the lower block.]
Figure 6: QPdDP Dual Mode Dynamic Right Shifter (DRS).
[Fig. 7 diagram: two 64-bit Add/Sub units operate on the [63:0] and [127:64] blocks of m_L and m_S_shifted. The lower unit uses qp_op in QP mode and dp1_op otherwise; the upper unit uses qp_op or dp2_op and, in QP mode, takes the carry add_ml[64] from the lower unit. The 65-bit outputs are add_ml and add_mu.]
Figure 7: QPdDP Adder: Dual Mode Mantissa Addition/Subtraction.
Now, the small mantissa must be right-shifted by the difference
between the large and small exponents. The right shift amounts for
the small mantissas are determined by the component shown in
Fig. 5. In general, this requires two 11-bit subtractors for the
two double precision operands and one 15-bit subtractor for the
quadruple precision operand. However, because of the effective
multiplexing of operands in the SWAP section, only one 22-bit
subtractor is needed: subtracting the unified small exponent (e_S)
from the unified large exponent (e_L) produces the right shift
amount either for quadruple precision or for both double precision
operands. Compared to a QP-only design, the right shift amount
computation requires extra resources only for a 7-bit subtraction.
The other processing in this section consists of bit-wise
operations, which are done separately for all operands.
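A single 22-bit subtractor suffices because, after the SWAP, each DP lane of e_L is at least the corresponding lane of e_S, so no borrow crosses from bit 10 into bit 11; in QP mode the upper 7 bits of both unified exponents are zero. A Python sketch (function name ours):

```python
def right_shifts(qp_dp: bool, e_L: int, e_S: int) -> tuple[int, int, int]:
    """(qp_r_shift, dp2_r_shift, dp1_r_shift) from one 22-bit subtraction."""
    shift = (e_L - e_S) & ((1 << 22) - 1)   # the single 22-bit subtractor
    if qp_dp:
        return shift & 0x7FFF, 0, 0          # shift[14:0]
    return 0, shift >> 11, shift & 0x7FF    # shift[21:11], shift[10:0]


dp_case = right_shifts(False, (1023 << 11) | 1010, (1020 << 11) | 1000)
qp_case = right_shifts(True, 16383, 16300)
```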
For right shifting the small mantissas of the quadruple and both
double precision operands, a dual-mode dynamic right shifter (DRS)
is designed. The QPdDP dual-mode DRS, shown in Fig. 6, right-shifts
the small mantissa of either the QP operand or both DP operands.
Its initial stage, active only in QP mode, right-shifts the operand
by 64 bits when the corresponding QP shift bit is set. The
remaining six stages work in dual mode, either for the QP or for
both DP operands. Each dual-mode stage contains two shifters, one
for each 64-bit block, which right-shift their inputs according to
their own shift bit (from either the quadruple or the double
precision shift amount). Each stage also includes a multiplexer
that, depending on the mode of operation, selects for the lower
block either its own shifted output alone (DP mode) or that output
combined with the bits shifted out of the upper block (QP mode).
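The following Python model of the DRS is a sketch under our own assumptions: shift amounts arrive already limited to 7 bits (QP) and 6 bits (each DP); each block is controlled by qp[x] in QP mode and by dp2[x]/dp1[x] in DP mode, which matches the qp[x] | dp[x] gating of Fig. 6 when the unused amounts are zero; and sticky-bit collection is not modeled. It checks the staged structure against a plain 128-bit shift in QP mode and two independent 64-bit shifts in DP mode.

```python
M64 = (1 << 64) - 1


def drs(x: int, qp_dp: bool, qp: int = 0, dp2: int = 0, dp1: int = 0) -> int:
    """Dual-mode right shifter: QP shift qp (7-bit) or DP shifts dp2/dp1 (6-bit)."""
    if qp_dp and (qp >> 6) & 1:
        x >>= 64                                   # initial QP-only stage
    for k in range(6):                             # six dual-mode stages, y = 2**k
        y = 1 << k
        hi, lo = x >> 64, x & M64
        sh_hi = ((qp if qp_dp else dp2) >> k) & 1
        sh_lo = ((qp if qp_dp else dp1) >> k) & 1
        new_hi = hi >> y if sh_hi else hi
        new_lo = lo >> y if sh_lo else lo
        if qp_dp and sh_hi:
            # QP mode: bits shifted out of the upper block enter lo[63:64-y]
            new_lo |= (hi & ((1 << y) - 1)) << (64 - y)
        x = (new_hi << 64) | new_lo
    return x


X = 0xFEDC_BA98_7654_3210_0123_4567_89AB_CDEF
qp_ok = all(drs(X, True, qp=s) == X >> s for s in range(128))
dp_ok = all(drs(X, False, dp2=a, dp1=b) == ((((X >> 64) >> a) << 64) | ((X & M64) >> b))
            for a in range(64) for b in range(64))
```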
Following the right shift of the small mantissas, the core
operation of mantissa addition/subtraction is performed. The large
mantissas and the right-shifted small mantissas are added or
subtracted according to their effective operation. This
computation is performed in dual mode using two 64-bit integer
adder/subtractor units, which work individually for each double
precision operand and collectively for the quadruple precision
computation (as shown in Fig. 7). This unit generates the lower and
upper parts of the addition/subtraction result separately. This
component requires effectively the same resources as a QP-only
adder.
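A two's-complement model of Fig. 7 in Python (our sketch; op = 1 adds and op = 0 subtracts, matching the XNOR convention of Fig. 5). Each 64-bit unit subtracts as a + ~b + 1, and in QP mode the lower unit's carry add_ml[64] feeds the upper unit, so the pair behaves as one 128-bit adder/subtractor. For a subtraction, bit 64 of a result is the two's-complement carry rather than a mantissa overflow; the paper does not say how its RTL uses that bit, so the sketch leaves it untouched.

```python
M64 = (1 << 64) - 1


def dual_addsub(m_L: int, m_S: int, qp_dp: bool, qp_op=1, dp2_op=1, dp1_op=1):
    """Two 64-bit adder/subtractors (op = 1: add, op = 0: subtract); returns (add_mu, add_ml)."""
    op_lo = qp_op if qp_dp else dp1_op
    op_hi = qp_op if qp_dp else dp2_op
    a_lo, a_hi = m_L & M64, m_L >> 64
    b_lo, b_hi = m_S & M64, m_S >> 64
    # lower unit: a + b, or a + ~b + 1 (two's complement); 65-bit result
    add_ml = a_lo + (b_lo if op_lo else (~b_lo & M64)) + (0 if op_lo else 1)
    # upper unit: in QP mode its carry-in is add_ml[64]; in DP mode it is its own +1
    cin_hi = (add_ml >> 64) if qp_dp else (0 if op_hi else 1)
    add_mu = a_hi + (b_hi if op_hi else (~b_hi & M64)) + cin_hi
    return add_mu, add_ml


A = 0x0010_0000_0000_0001_8000_0000_0000_0000
B = 0x000F_FFFF_FFFF_FFFF_C000_0000_0000_0001
mu, ml = dual_addsub(A, B, True, qp_op=1)
qp_sum = (mu << 64) | (ml & M64)
mu, ml = dual_addsub(A, B, True, qp_op=0)
qp_diff = ((mu & M64) << 64) | (ml & M64)
mu, ml = dual_addsub(A, B, False, dp2_op=0, dp1_op=1)
dp2_diff, dp1_sum = mu & M64, ml
```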
The lower and upper mantissa addition/subtraction results
generated by the previous unit are combined in the “Mantissa SUM
and LOD_in” unit to provide the actual sum (either for the QP or
for both DPs), the mantissa overflow, and the input to the next
unit, the leading-one detector (LOD). This unit is shown in Fig. 8.
[Fig. 8 diagram: under qp_dp, the 65-bit results add_mu[64:0] and add_ml[64:0] are combined into the 128-bit sum add_m, the mantissa-overflow flag mant_ovf and the LOD input LOD_in.]
Figure 8: QPdDP Dual Mode Mantissa SUM and LOD_in.
[Fig. 9 diagram: LOD_in[127:64] and LOD_in[63:0] feed two LOD_64:6 units, which give dp2_shift[5:0] and dp1_shift[5:0] and are combined into the LOD_128:7 output qp_shift[6:0]. Each LOD_64:6 is built from two LOD_32:5 units on LOD_in[63:32] and LOD_in[31:0]: if the upper unit is valid (out_hv), out = {1'b0, out_h}, otherwise out = {1'b1, out_l}; out_valid combines the two valid flags.]
Figure 9: QPdDP Dual Mode Leading-One-Detector.
The mantissa sum then needs to be checked for underflow
(cancellation), which requires a leading-one detector (LOD)
followed by a dynamic left shifter for the mantissa. This
situation occurs when two very close mantissas are subtracted.
The LOD computes the left-shift amount. The dual-mode leading-one
detector for QPdDP processing is shown in Fig. 9. The input of the
LOD is either one QP LOD_in or two DP LOD_in values. The dual-mode
LOD is designed hierarchically and is composed of two 64-bit LODs.
Each 64-bit LOD individually provides the left-shift information
for one DP operand, and together they provide it for the QP
operand. It effectively requires the same resources as a QP-only
LOD.
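The hierarchical construction of Fig. 9 can be modeled recursively. Each level combines the leading-zero counts and valid flags of its two halves, selecting {1'b0, out_h} or {1'b1, out_l}. In the Python sketch below (names ours), dual_lod either combines the two 64-bit units into the QP count or returns each 64-bit count as a DP shift amount.

```python
def lod(x: int, n: int) -> tuple[int, bool]:
    """Hierarchical leading-one detector on an n-bit word (n a power of two).

    Returns (count, valid): count = number of leading zeros, valid = (x != 0).
    """
    if n == 1:
        return 0, x == 1
    half = n // 2
    out_h, hv = lod(x >> half, half)
    out_l, lv = lod(x & ((1 << half) - 1), half)
    if hv:
        return out_h, True          # {1'b0, out_h}
    return half + out_l, lv         # {1'b1, out_l}


def dual_lod(lod_in: int, qp_dp: bool) -> tuple[int, int, int]:
    """(qp_shift, dp2_shift, dp1_shift) from two shared 64-bit LODs."""
    up, up_v = lod(lod_in >> 64, 64)
    low, _ = lod(lod_in & ((1 << 64) - 1), 64)
    if qp_dp:
        return (up if up_v else 64 + low), 0, 0   # LOD_128:7
    return 0, up, low


V = [1, 1 << 63, 1 << 64, (1 << 127) | 1, 0xFF << 64, 12345]
qp_ok = all(dual_lod(v, True)[0] == 128 - v.bit_length() for v in V)
```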
The left shift amount generated by the LOD is then updated for
sub-normal input cases (both input operands sub-normal) and
underflow cases (left shift amount greater than or equal to the
corresponding large exponent). When both input operands are
sub-normal, the corresponding left shift is forced to zero; in the
underflow case, the corresponding left shift is set to the
corresponding large exponent decremented by one. For the exponent
decrements, one decrementer is shared between the QP and the first
DP, as done for the computation of the right shift amount. This is
possible because the required LSBs of e_L are shared between the
exponents of the QP and the first DP. These exponent decrements
require one 7-bit decrementer (shared between the QP and one DP)
and one 6-bit decrementer (for the other DP). All other
computations related to the left shift update are performed
separately for the QP and both DPs. When qp_dp is true, both DP
left shifts are set to zero; when qp_dp is false, the QP left
shift is forced to zero.
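The update rule is a three-way choice per lane. This Python sketch (function name ours) assumes, as in Fig. 3, that sub-normal exponents were already forced to 1, so the clamped value e_large - 1 is never negative and the resulting exponent e_large - shift never drops below 1.

```python
def update_left_shift(l_shift: int, e_large: int, both_sn: bool) -> int:
    """Left-shift update for one lane (QP or one DP)."""
    if both_sn:
        return 0                 # both inputs sub-normal: no normalizing shift
    if l_shift >= e_large:
        return e_large - 1       # underflow: stop at the minimum exponent
    return l_shift               # normal case: shift by the LOD result
```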
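[Fig. 10 diagram: a 1-bit left shifter on the 128-bit word x, built from 2:1 multiplexers controlled by qp_dp; in QP mode the whole word is shifted, while in DP mode the halves x[127:64] and x[63:0] are shifted independently (signals shown include x[127], x[63], x[64], x[127:65], x[126:64] and {x[62:0], 0}).]
Figure 10: QPdDP Dual Mode 1-Bit Left Shifter.
L: Rounding Position Bit; G: Guard Bit; R: Round Bit; S: Sticky Bit
ULP = (G & (R | S)) | (L & G & ~(R | S))
[Fig. 11 diagram: the ULP is computed separately as qp_ULP, dp2_ULP and dp1_ULP; two 64-bit adders add it to add_m[63:0] and add_m[127:64], producing add_ml_r and add_mu_r. Under qp_dp, the lower adder's carry-in is qp_ULP or dp1_ULP, and the upper adder's carry-in is add_ml_r[64] or dp2_ULP.]
Figure 11: QPdDP Dual Mode ULP Addition.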
The mantissa sum is then shifted left using a dual-mode dynamic
left shifter (DLS). The design concept of the dual-mode DLS is
similar to that of the dual-mode DRS, except for the shifting
direction. (The DLS architecture is not shown due to space
limitations.) The output of the dual-mode DLS then undergoes a
1-bit left shift (normalization) controlled by the mantissa
overflow of the mantissa addition. The dual-mode 1-bit left
shifter unit is shown in Fig. 10. It performs either a 1-bit left
shift for QP mode or 1-bit left shifts for both DPs, according to
their respective mantissa overflow. The resource requirement of
this unit is similar to that of a QP-only 1-bit shifter, except for
two additional 1-bit 2:1 multiplexers.
The output of the 1-bit left shifter is further processed for the
rounding computation and ULP addition (Fig. 11). In the present
work, the round-to-nearest method is included; other methods can
be included easily. The rounding ULP computations are based on the
LSB (rounding position) bit, Guard bit, Round bit and Sticky bit.
The ULP computation is required separately for the QP and each DP.
However, the ULP addition is shared among them, as shown in
Fig. 11.
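The ULP expression of Fig. 11 simplifies to G & (L | R | S), which is round-to-nearest with ties to even. The Python sketch below checks that equivalence exhaustively and models the shared ULP addition with two 64-bit adders. Treating bit 0 (and bit 64 in DP mode) of add_m as each result's rounding position L is our simplifying assumption; in the actual data path the rounding position sits above the appended zero bits.

```python
M64 = (1 << 64) - 1


def ulp(L: int, G: int, R: int, S: int) -> int:
    """Round-to-nearest ULP of Fig. 11 on single bits."""
    return (G & (R | S)) | (L & G & (1 - (R | S)))


def ulp_add(add_m: int, qp_dp: bool, qp_ulp=0, dp2_ulp=0, dp1_ulp=0):
    """Shared ULP addition with two 64-bit adders; returns (add_mu_r, add_ml_r)."""
    add_ml_r = (add_m & M64) + (qp_ulp if qp_dp else dp1_ulp)   # 65-bit result
    cin_hi = (add_ml_r >> 64) if qp_dp else dp2_ulp             # carry chains only in QP mode
    add_mu_r = (add_m >> 64) + cin_hi
    return add_mu_r, add_ml_r


ties_to_even = all(ulp(L, G, R, S) == G & (L | R | S)
                   for L in (0, 1) for G in (0, 1) for R in (0, 1) for S in (0, 1))
mu, ml = ulp_add((5 << 64) | M64, True, qp_ulp=1)
qp_rounded = (mu << 64) | (ml & M64)
dp_rounded = ulp_add((5 << 64) | M64, False, dp2_ulp=1, dp1_ulp=1)
```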
In parallel to the above mantissa processing, the exponents are
updated for mantissa overflow and underflow in the Exponent Update
unit. Here, the large exponents must be incremented by one or
decremented by the left shift amount
(Large_Exp + mant_ovf - Left_Shift). Since the large exponent
(e_L) contains either the large QP exponent or both large DP
exponents, this update is shared between the QP and DP-1 by sharing
a subtractor, as in the left shift update computation. In effect,
it requires a 15-bit shared subtractor and an 11-bit subtractor for
DP-2. Thus, compared to QP-only processing, the overhead is an
extra 11-bit subtractor for DP-2 processing and a 7-bit multiplexer
that selects the left shift amount for the shared subtractor.
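A Python sketch of this sharing (our own model; each lane's (mant_ovf, left_shift) pair is passed as a tuple, and bit-width truncation is not modeled). One subtractor serves e_L[14:0] for QP or e_L[10:0] for DP-1, with its left-shift input selected by qp_dp; a separate subtractor serves DP-2 in e_L[21:11].

```python
def exponent_update(qp_dp, e_L, qp=(0, 0), dp2=(0, 0), dp1=(0, 0)):
    """Large_Exp + mant_ovf - Left_Shift; returns (qp_e, dp2_e, dp1_e).

    Each lane tuple is (mant_ovf, left_shift); lanes unused in the current mode are 0.
    """
    # Shared subtractor: QP exponent e_L[14:0] or DP-1 exponent e_L[10:0]
    base = (e_L & 0x7FFF) if qp_dp else (e_L & 0x7FF)
    ovf, l_shift = qp if qp_dp else dp1          # multiplexed left-shift input
    shared = base + ovf - l_shift
    if qp_dp:
        return shared, 0, 0
    # Dedicated subtractor for DP-2, whose exponent sits in e_L[21:11]
    dp2_e = (e_L >> 11) + dp2[0] - dp2[1]
    return 0, dp2_e, shared


qp_result = exponent_update(True, 16383, qp=(1, 0))
dp_result = exponent_update(False, (1023 << 11) | 1010, dp2=(0, 3), dp1=(1, 0))
```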
Finally, the exponents and mantissas are updated for underflow,
overflow, sub-normal and exceptional cases to produce the final
output; each of these requires separate units for the QP and both
DPs. On overflow, the exponent is set to infinity (all ones) and the
mantissa to zero; in the underflow case, the exponent is set to
zero and the mantissa keeps its computed value. The computed signs,
exponents and mantissas of the quadruple precision and both double
precision results are finally multiplexed to produce the final
128-bit output, which contains either one QP output or two DP
outputs.
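The final updates and output multiplexing can be sketched as below (Python, names ours). The sketch covers only the overflow and underflow/sub-normal rules stated above, with a lane's missing leading one signalling an underflowed result; NaN and infinity inputs, detected by the exceptional checks, are not modeled.

```python
def finalize(sign, exp, frac, ew, mw, hidden):
    """Pack one lane: overflow -> signed INFINITY, no leading one -> exponent 0."""
    emax = (1 << ew) - 1
    if exp >= emax:
        exp, frac = emax, 0        # overflow: exponent all ones, mantissa zero
    elif not hidden:
        exp = 0                    # underflow / sub-normal: keep computed mantissa
    return (sign << (ew + mw)) | (exp << mw) | (frac & ((1 << mw) - 1))


def output_mux(qp_dp, qp_word, dp2_word, dp1_word):
    """Final 128-bit output: one QP result or {DP2, DP1} (Fig. 2 layout)."""
    return qp_word if qp_dp else ((dp2_word << 64) | dp1_word)


inf_dp = finalize(0, 2047, 123, 11, 52, True)
sub_dp = finalize(1, 1, 5, 11, 52, False)
inf_qp = finalize(0, 0x7FFF, 1, 15, 112, True)
out = output_mux(False, 0, inf_dp, sub_dp)
```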
III. IMPLEMENTATION RESULTS AND COMPARISONS
The proposed dual-mode QPdDP adder architecture is synthesized for
the UMC 90 nm standard-cell ASIC platform using Synopsys Design
Compiler. QP-only and DP-only adder architectures are also
designed (using a similar data-path computational flow) and
synthesized to measure the area and period overheads. These
architectures are designed with four pipeline stages (as shown in
Fig. 1). The implementation details are shown in Table I. The
architectures are synthesized for the best possible period. The
functionality of the proposed architecture is verified using
5 million random test cases in each mode, covering all possible
pairs of operand types (normal, sub-normal, exceptional cases).
The proposed dual-mode QPdDP adder architecture requires
approximately 17% more hardware resources and roughly 5.45% more
period than the QP-only adder. However, in comparison with a
combination of one QP-only unit and two DP-only units, the proposed
QPdDP adder requires approximately 35.86% less area
((QP + 2×DP - QPdDP)/(QP + 2×DP)).
Table II compares the dual-mode QPdDP architecture with previous
works. The comparisons are carried out in terms of the percentage
area overhead and period overhead over the corresponding QP-only
adder. Moreover, for a technology-independent comparison, gate
equivalents or scaled area, and the “Fan-Out-of-4” (FO4) delay are
used. A unified comparison of area × period is also performed.
A dual-mode QPdDP adder architecture with 3 and 6 pipeline stages,
using 250 nm technology, is presented by A. Akkas [7]. It requires
approximately 15% more area and roughly 8-14% more period than the
corresponding QP-only design. The proposed QPdDP architecture has
a similar area overhead with a smaller period overhead. Moreover,
the area × period of the proposed architecture is much smaller
than that of the QPdDP adder of [7]. Furthermore, the architectures
in [7] do not support sub-normal operand computation or
exceptional case handling.
A 110 nm dual-mode QPdDP adder with 3-stage and 5-stage pipelines
is proposed in [8]. These architectures provide no computational
support for sub-normal operands and no exceptional case handling.
For these architectures, the area overhead ranges between 27% and
35%, and the period overhead is approximately 10-18%. Compared to
this work, the proposed dual-mode QPdDP architecture has smaller
design overheads and a smaller area and area × period, and it has
a shorter period than the 3-stage design of [8] (the 5-stage design
of [8] reaches a shorter FO4 period, 17.81 versus 25.78).

Table I: ASIC Implementation Details
                  DP       QP       QPdDP
Latency           4        4        4
Area (μm²)        31863    76779    90116
Area (gates)      10621    25593    30038
Period (ns)       0.95     1.1      1.16
Period (FO4)      21.11    24.44    25.78
Power (mW)        7.26     12.87    16.93

Table II: Comparison of QPdDP Architecture with Related Work
                              [7] 250nm         [8] 110nm         Proposed 90nm
Latency                       3        6        3        5        4
Area OH (1)                   15.3%    14.01%   35.80%   27.31%   17.37%
Period OH (1)                 14.12%   8.71%    18.65%   10.11%   5.45%
Scaled Area (2)               -        -        239250   199723   90116
Gate Count (3)                26967    33702    -        -        30038
Period (FO4) (4)              65.28    35.92    28.9     17.81    25.78
Area × Period (10^6) (#1)     -        -        6.91     3.55     2.32
Area × Period (10^6) (#2)     1.76     1.21     -        -        0.77
(1) Area/Period OH = (QPdDP - QP) / QP
(2) In μm² at 90 nm = (Area at 110 nm) × (90/110)²
(3) Based on minimum-size inverter
(4) 1 FO4 (ns) ≈ (Tech. in μm) / 2
(#1) Scaled Area × Period (FO4); (#2) Gate Count × Period (FO4)
Thus, compared to previous works, the proposed dual-mode QPdDP
adder architecture has the smallest period overhead relative to
the QP-only adder, an area overhead well below that of [8] and
similar to that of [7], and more computational support. In terms
of the unified area × period metric, it shows an improvement of
approximately 50% on average (about 35% to 66% across the designs
in Table II).
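The technology-normalized entries of Table II can be recomputed from Table I and the footnotes. The Python sketch below (names ours) applies the FO4 rule of thumb of footnote 4 to the proposed design, forms both area × period metrics, and computes the reduction relative to each prior design, using only values printed in the tables.

```python
def fo4_ns(tech_um: float) -> float:
    """Rule of thumb from Table II, footnote 4: 1 FO4 (ns) ~ technology (um) / 2."""
    return tech_um / 2


period_fo4 = round(1.16 / fo4_ns(0.09), 2)         # 25.78 FO4
axp_scaled = round(90116 * period_fo4 / 1e6, 2)    # 2.32 (x10^6), metric #1
axp_gates = round(30038 * period_fo4 / 1e6, 2)     # 0.77 (x10^6), metric #2

# (prior design's area x period from Table II, ours in the same metric)
prior = {"[7] 3-stage": (1.76, axp_gates), "[7] 6-stage": (1.21, axp_gates),
         "[8] 3-stage": (6.91, axp_scaled), "[8] 5-stage": (3.55, axp_scaled)}
reduction = {k: round(100 * (1 - ours / theirs)) for k, (theirs, ours) in prior.items()}
print(period_fo4, axp_scaled, axp_gates, reduction)
```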
IV. CONCLUSIONS
A configurable architecture for dual-mode floating point adder
arithmetic is presented in this paper. The proposed dual-mode
QPdDP adder architecture provides normal and sub-normal
computational support and exceptional case handling. The data path
and sub-components of the architecture are constructed/re-designed
for on-the-fly dual-mode processing with minimal multiplexing. The
presented dual-mode QPdDP adder architecture needs approximately
17% more resources and 5.45% more period than the QP-only adder.
Compared with previous literature works, the proposed dual-mode
design has an area × period product that is approximately 50%
smaller on average, a smaller period overhead over the QP-only
adder, and an area overhead similar to or smaller than theirs. It
also provides more computational support than previous literature
works. Our future work aims at a tri-mode adder architecture which,
in addition to the proposed computation, can also be configured to
handle four-parallel single precision computation.
V. ACKNOWLEDGMENTS
This work is partly supported by “The University of Hong Kong”
grant (Project Code 201409176200), the “Research Grants Council”
of Hong Kong (Project ECS 720012E), and the “Croucher Innovation
Award” 2013.
REFERENCES
[1] “IEEE Standard for Floating-Point Arithmetic,” Tech. Rep., Aug. 2008.
[2] F. de Dinechin and G. Villard, “High precision numerical accuracy in physics research,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 559, no. 1, pp. 207–210, 2006.
[3] D. H. Bailey, R. Barrio, and J. M. Borwein, “High-precision computation: Mathematical physics and dynamics,” Applied Mathematics and Computation, vol. 218, no. 20, pp. 10106–10121, 2012.
[4] H.-J. Oh, S. Mueller, C. Jacobi, K. Tran, S. Cottier, B. Michael, H. Nishikawa, Y. Totsuka, T. Namatame, N. Yano, T. Machida, and S. H. Dhong, “A fully pipelined single-precision floating-point unit in the synergistic processor element of a cell processor,” Solid-State Circuits, IEEE Journal of, vol. 41, no. 4, pp. 759–771, 2006.
[5] NXP Semiconductors, “AN10902: Using the LPC32xx VFP,” in Application note, Feb 2010. [Online]. Available: www.nxp.com/documents/application_note/AN10902.pdf
[6] Nvidia, “NVIDIA’s Next Generation CUDA™ Compute Architecture: Kepler™ GK110,” in White Paper, 2014. [Online]. Available: www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
[7] A. Akkas, “Dual-Mode Quadruple Precision Floating-Point Adder,” Digital Systems Design, Euromicro Symposium on, vol. 0, pp. 211–220, 2006.
[8] A. Akkas, “Dual-mode floating-point adder architectures,” Journal of Systems Architecture, vol. 54, no. 12, pp. 1129–1142, Dec. 2008.
[9] M. Ozbilen and M. Gok, “A multi-precision floating-point adder,” in Research in Microelectronics and Electronics, 2008. PRIME 2008. Ph.D., 2008, pp. 117–120.
[10] M. Jaiswal, R. Cheung, M. Balakrishnan, and K. Paul, “Unified architecture for double/two-parallel single precision floating point adder,” Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 61, no. 7, pp. 521–525, July 2014.
[11] M. Jaiswal, R. Cheung, M. Balakrishnan, and K. Paul, “Configurable architecture for double/two-parallel single precision floating point division,” in VLSI (ISVLSI), 2014 IEEE Computer Society Annual Symposium on, July 2014, pp. 332–337.
