Complex Multiply-Add and Other Related Operators by Ercegovac, Milos & Muller, Jean-Michel
To cite this version:
Milos Ercegovac, Jean-Michel Muller. Complex Multiply-Add and Other Related Operators.
Franklin T. Luk. SPIE Conf. Advanced Signal Processing Algorithms, Architectures and Implementation XVII, 6697, Aug 2007, San Diego, United States. SPIE, 2007. <ensl-00167372>
HAL Id: ensl-00167372
https://hal-ens-lyon.archives-ouvertes.fr/ensl-00167372
Submitted on 20 Aug 2007
Complex Multiply-Add and Other Related Operators
Miloš D. Ercegovac (a) and Jean-Michel Muller (b)
(a) Computer Science Department, UCLA, Los Angeles, CA 90025, USA
(b) ENS-Lyon, France
ABSTRACT
In this work we present algorithms and schemes for computing several common arithmetic expressions defined in the complex domain as hardware-implemented operators. The operators include Complex Multiply-Add (CMA: $ab + c$), Complex Sum of Products (CSP: $ab + ce + f$), Complex Sum of Squares (CSS: $a^2 + b^2$), and Complex Integer Powers (CIPk: $x^2, x^3, \ldots, x^k$). The proposed approach is to map the expression to a system of linear equations, apply a complex-to-real transform, and compute the solutions to the linear system using a digit-by-digit, most-significant-digit-first recurrence method. The components of the solution vector correspond to the expressions being evaluated. The number of digit cycles is about m for m-digit precision. The basic modules are similar to left-to-right multipliers. The interconnections between the modules are digit-wide.
Keywords: Complex arithmetic, multiply-add, sum of products, sum of squares, integer powers
1. INTRODUCTION
With the exception of complex addition and multiplication [?,?,?] and complex online SVD [?], complex operations are typically not implemented in hardware. Recently, hardware-oriented methods for complex division and square root have been introduced [?,?].
In this paper we describe a new method for computing several common arithmetic expressions defined in the complex domain, suitable for hardware implementation as operators. The operators include Complex Multiply-Add (CMA: $ab + c$), Complex Sum of Products (CSP: $ab + ce + f$), Complex Sum of Squares (CSS: $a^2 + b^2$), and Complex Integer Powers (CIPk: $x^2, x^3, \ldots, x^k$). The variables and results are fixed-point complex numbers. The proposed approach is to map the expression to a system of linear equations, apply a complex-to-real transform, and compute the solutions to the linear system using a digit-by-digit, most-significant-digit-first method. The components of the solution vector correspond to the expressions being evaluated. The number of digit cycles is about m for m-digit precision. The basic modules are similar in complexity to left-to-right multipliers. The interconnections between the modules are digit-wide. The proposed method is a generalization of a polynomial evaluation method over the reals, introduced as the E-method [?,?] and recently discussed in [?]. This paper is based on the report [?], where the complex E-method is introduced and discussed in general terms.
The method uses the following approach: (i) an expression is mapped onto a system of linear equations, (ii) a transform is applied to change the complex domain to the real domain, and (iii) the system is solved in a digit-by-digit manner, most-significant digit first.
We first review the transform which allows the method to be used in the complex field C, as discussed in [?]. Then we show how to use the complex evaluation method (CE-method) in implementing the complex operators CMA, CSP, CSS, and CIP.
Further author information: (Send correspondence to M. Ercegovac, milos@cs.ucla.edu)
2. COMPLEX-REAL (CR) TRANSFORMS
Complex numbers can be represented by 2 × 2 real matrices of the form
$$
x + iy \;\longleftrightarrow\; \begin{pmatrix} x & -y \\ y & x \end{pmatrix} \tag{1}
$$
This correspondence is an isomorphism with respect to complex addition and multiplication, which are the operations used in the proposed method. Consequently, an m × n matrix of complex numbers can be represented as a 2m × 2n matrix of real numbers.
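As a quick sanity check of the correspondence (1), the sketch below maps two complex numbers to 2 × 2 real matrices and checks that matrix addition and multiplication reproduce complex addition and multiplication. It is plain Python; the helper names `cr`, `mat_add` and `mat_mul` are hypothetical, introduced only for this illustration, and dyadic sample values are used so that the floating-point comparisons are exact.

```python
def cr(z: complex) -> list[list[float]]:
    """2 x 2 real matrix [[x, -y], [y, x]] representing z = x + iy, as in (1)."""
    return [[z.real, -z.imag], [z.imag, z.real]]


def mat_add(p, q):
    """Entry-wise sum of two 2 x 2 matrices."""
    return [[p[i][j] + q[i][j] for j in range(2)] for i in range(2)]


def mat_mul(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


u, v = 0.5 - 0.25j, -0.125 + 0.75j   # dyadic values: float arithmetic is exact
assert mat_add(cr(u), cr(v)) == cr(u + v)
assert mat_mul(cr(u), cr(v)) == cr(u * v)
```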
For n × n complex matrices, considered in this paper, the transform from the complex domain to the real domain is as follows.
The CR transform of an n-dimensional complex linear system is a 2n-dimensional real linear system. For example, for n = 3, the CR transform maps the complex system
$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3}\\
a_{2,1} & a_{2,2} & a_{2,3}\\
a_{3,1} & a_{3,2} & a_{3,3}
\end{pmatrix}
\begin{pmatrix} s_1\\ s_2\\ s_3 \end{pmatrix}
=
\begin{pmatrix} t_1\\ t_2\\ t_3 \end{pmatrix}
\tag{2}
$$
onto the real system
$$
\begin{pmatrix}
a^r_{1,1} & -a^i_{1,1} & a^r_{1,2} & -a^i_{1,2} & a^r_{1,3} & -a^i_{1,3}\\
a^i_{1,1} & a^r_{1,1} & a^i_{1,2} & a^r_{1,2} & a^i_{1,3} & a^r_{1,3}\\
a^r_{2,1} & -a^i_{2,1} & a^r_{2,2} & -a^i_{2,2} & a^r_{2,3} & -a^i_{2,3}\\
a^i_{2,1} & a^r_{2,1} & a^i_{2,2} & a^r_{2,2} & a^i_{2,3} & a^r_{2,3}\\
a^r_{3,1} & -a^i_{3,1} & a^r_{3,2} & -a^i_{3,2} & a^r_{3,3} & -a^i_{3,3}\\
a^i_{3,1} & a^r_{3,1} & a^i_{3,2} & a^r_{3,2} & a^i_{3,3} & a^r_{3,3}
\end{pmatrix}
\begin{pmatrix} s^r_1\\ s^i_1\\ s^r_2\\ s^i_2\\ s^r_3\\ s^i_3 \end{pmatrix}
=
\begin{pmatrix} t^r_1\\ t^i_1\\ t^r_2\\ t^i_2\\ t^r_3\\ t^i_3 \end{pmatrix}
\tag{3}
$$
where $a_{j,k} = a^r_{j,k} + i\,a^i_{j,k}$, $s_j = s^r_j + i\,s^i_j$ and $t_j = t^r_j + i\,t^i_j$. These two linear systems are equivalent.
The real linear system (3) is obtained from the complex linear system (2) by replacing each complex element $a_{j,k}$ by the 2 × 2 matrix defined in (1), and each component $s_j$, $t_j$ by the pair of its real and imaginary parts. In the following sections we review a hardware-oriented method for solving such a system [?].
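The CR transform of (2) into (3) takes only a few lines of plain Python. The sketch below is illustrative: `cr_matrix`, `cr_vector` and `matvec` are hypothetical helper names, and the check only confirms, for one sample matrix and vector, that the real system reproduces the complex product $A s$ component by component, which is what makes the two systems equivalent.

```python
import math


def cr_matrix(a):
    """CR transform of an n x n complex matrix into a 2n x 2n real matrix:
    each entry a_jk becomes [[Re a_jk, -Im a_jk], [Im a_jk, Re a_jk]]."""
    n = len(a)
    m = [[0.0] * (2 * n) for _ in range(2 * n)]
    for j in range(n):
        for k in range(n):
            x, y = a[j][k].real, a[j][k].imag
            m[2 * j][2 * k], m[2 * j][2 * k + 1] = x, -y
            m[2 * j + 1][2 * k], m[2 * j + 1][2 * k + 1] = y, x
    return m


def cr_vector(s):
    """Interleave real and imaginary parts: [s1r, s1i, s2r, s2i, ...]."""
    return [part for z in s for part in (z.real, z.imag)]


def matvec(m, v):
    """Real matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


a = [[1, 0.25 - 0.5j, -0.125j],
     [0.5 + 0.5j, 1, 0.75],
     [-0.25, 0.125 + 0.25j, 1]]
s = [0.5 + 0.25j, -1 + 0.5j, 0.25 - 0.75j]
t = [sum(a[j][k] * s[k] for k in range(3)) for j in range(3)]   # complex A s

real_t = matvec(cr_matrix(a), cr_vector(s))                    # real form of (3)
assert all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(real_t, cr_vector(t)))
```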
3. REAL E-METHOD
For simplicity, we discuss here the radix-2 E-method; adaptation to higher radices is straightforward. The radix-2 method consists of solving the n-dimensional linear system
$$
A s = v
$$
using the following iteration on residuals:
$$
w^{(j)} = 2\left[ w^{(j-1)} - A\,d^{(j-1)} \right] \tag{4}
$$
with $w^{(0)} = [v_0, v_1, \ldots, v_n]^T$ and $d^{(j)} = [d^{(j)}_0, d^{(j)}_1, \ldots, d^{(j)}_n]^T$, where the digits $d^{(j)}_k$ are in $\{-1, 0, 1\}$. Define the number $D^{(j)}_k = d^{(0)}_k . d^{(1)}_k d^{(2)}_k \ldots d^{(j)}_k$ (the $d^{(j)}_k$ are the digits of a radix-2 signed-digit representation of $D^{(j)}_k$). By induction,
$$
w^{(j)} = 2^j \left[ w^{(0)} - A\,D^{(j-1)} \right]. \tag{5}
$$
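Using (5), one can show that if the residuals $|w^{(j)}_k|$ are bounded, then for all k, $D^{(j)}_k$ converges to $s_k$ as j goes to infinity: indeed, (5) gives $v - A D^{(j-1)} = 2^{-j} w^{(j)}$, which then tends to zero. At step j we must select the digits $d^{(j)}_k$ from the residuals $w^{(j)}_k$ such that the values $w^{(j+1)}_k$ remain bounded. The following selection function, proposed in [?] as a form of rounding, achieves such a choice:
$$
\mathrm{SEL}(x) =
\begin{cases}
\operatorname{sign}(x) \times \lfloor |x| + 1/2 \rfloor, & \text{if } |x| \le 1,\\
\operatorname{sign}(x) \times \lfloor |x| \rfloor, & \text{otherwise.}
\end{cases}
\tag{6}
$$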
To avoid carry-propagate addition in the recurrence, the selection function is applied to an estimate of the residual: $d^{(j)}_k = \mathrm{SEL}(\hat{w}^{(j)}_k)$, where $\hat{w}^{(j)}_k$ is a low-precision approximation to $w^{(j)}_k$.
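The real radix-2 E-method can be simulated in software to watch the recurrence (4) and the selection function (6) at work. The sketch below is a minimal floating-point model, not a hardware description, and its names (`sel`, `estimate`, `e_method`) are illustrative. It assumes a matrix with unit diagonal whose off-diagonal row sums are at most 3/16, and it models the low-precision estimate $\hat{w}$ by truncating each residual to 4 fractional bits, so that $|w - \hat{w}| < 1/16 = \Delta/2$ for $\Delta = 1/8$.

```python
import math


def sel(x: float) -> int:
    """Selection function (6): a rounding-like choice of a digit in {-1, 0, 1}."""
    if abs(x) <= 1:
        return int(math.copysign(math.floor(abs(x) + 0.5), x))
    return int(math.copysign(math.floor(abs(x)), x))


def estimate(w: float, frac_bits: int = 4) -> float:
    """Low-precision estimate of w: truncation to frac_bits fractional bits.
    The error is below 2**-frac_bits (1/16 = Delta/2 for Delta = 1/8)."""
    scale = 1 << frac_bits
    return math.floor(w * scale) / scale


def e_method(a, v, steps):
    """Radix-2 E-method for A s = v: w(0) = v, w(j) = 2 [w(j-1) - A d(j-1)].
    Returns D, the value of the signed digits d(0).d(1)...d(steps-1)."""
    n = len(v)
    w = list(v)
    sol = [0.0] * n
    for j in range(steps):
        d = [sel(estimate(wk)) for wk in w]                  # digit selection
        sol = [sk + dk * 2.0 ** (-j) for sk, dk in zip(sol, d)]
        w = [2 * (w[k] - sum(a[k][t] * d[t] for t in range(n)))
             for k in range(n)]                              # recurrence (4)
    return sol


A = [[1.0, -0.125], [0.0625, 1.0]]   # unit diagonal, off-diagonal sums <= 3/16
v = [0.75, -0.5]                     # right-hand side, |v_k| <= 3/2
s = e_method(A, v, 40)
```

After 40 steps, (5) gives $v - A D = 2^{-40} w^{(40)}$, so `s` agrees with the exact solution to roughly 40 bits.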
The iterations converge to the desired result if the residual vector $w^{(j)}$ is bounded. Let ξ, α and ∆ (with 0 ≤ ∆ < 1) be constants such that
1. the sum of the magnitudes of the off-diagonal elements in any row of matrix A is at most α (all matrices considered here have unit diagonal);
2. the magnitudes of the right-hand side elements are at most ξ;
3. $|w^{(j)}_{k,r} - \hat{w}^{(j)}_{k,r}| \le \Delta/2$ and $|w^{(j)}_{k,i} - \hat{w}^{(j)}_{k,i}| \le \Delta/2$.
Since $|d^{(j-1)}_{k,r} - \hat{w}^{(j-1)}_{k,r}| \le 1/2$ and $|d^{(j-1)}_{k,i} - \hat{w}^{(j-1)}_{k,i}| \le 1/2$, from (4) we find
$$
|w^{(j)}_{k,r}| \le 2\left( \frac{1}{2} + \frac{\Delta}{2} + \alpha \right) = 1 + \Delta + 2\alpha. \tag{7}
$$
The same bound holds for $|w^{(j)}_{k,i}|$. For this bound to be feasible, we must ensure that a suitable choice of $d^{(j)}_{k,r}$ and $d^{(j)}_{k,i}$ in $\{-1, 0, 1\}$ is possible. This requires that $|w^{(j)}_{k,r}|$ and $|w^{(j)}_{k,i}|$ be at most 3/2. Therefore,
$$
\Delta + 2\alpha \le \frac{1}{2}. \tag{8}
$$
Since $|w^{(0)}_{k,r}|$ and $|w^{(0)}_{k,i}|$ must also be at most 3/2, we get
$$
\xi \le \frac{3}{2}. \tag{9}
$$
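The operator sections below use $\Delta = 1/8$. Substituting this value into (7) and (8) gives the bound on α used there; the block below is only that substitution, written out.

```latex
\Delta = \tfrac{1}{8}
\;\Longrightarrow\;
2\alpha \le \tfrac{1}{2} - \tfrac{1}{8} = \tfrac{3}{8}
\;\Longrightarrow\;
\alpha \le \tfrac{3}{16},
\qquad
|w^{(j)}_{k,r}|,\ |w^{(j)}_{k,i}| \;\le\; 1 + \tfrac{1}{8} + 2\cdot\tfrac{3}{16} \;=\; \tfrac{3}{2}.
```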
In the complex E-method, the coefficient matrix A, the solution vector s and the right-hand side v are in the complex domain. After the CR transform, the residual vector is written as
$$
w^{(j)} = \left[ w^{(j)}_{0,r}, w^{(j)}_{0,i}, w^{(j)}_{1,r}, w^{(j)}_{1,i}, \ldots, w^{(j)}_{n,r}, w^{(j)}_{n,i} \right]
$$
and the digit vector $d^{(j)}$ as
$$
d^{(j)} = \left[ d^{(j)}_{0,r}, d^{(j)}_{0,i}, d^{(j)}_{1,r}, d^{(j)}_{1,i}, \ldots, d^{(j)}_{n,r}, d^{(j)}_{n,i} \right].
$$
The mapping of each complex operator onto a linear system, and the corresponding residual recurrences, are discussed in the following sections.
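Putting the CR transform and the residual recurrence together gives a software model of the complex E-method. The sketch below is illustrative (the names `complex_e_method` and `sel` are hypothetical, arithmetic is floating point, and the estimates are modeled by truncation to 4 fractional bits). It builds the 2n × 2n real matrix, keeps residuals and digits interleaved in the order $[w_{0,r}, w_{0,i}, w_{1,r}, w_{1,i}, \ldots]$ given above, and returns the complex solution. It assumes a complex matrix with unit diagonal whose off-diagonal entries satisfy $\sum_{k \ne j} (|a^r_{j,k}| + |a^i_{j,k}|) \le 3/16$ in every row.

```python
import math


def sel(x: float) -> int:
    """Selection function (6)."""
    if abs(x) <= 1:
        return int(math.copysign(math.floor(abs(x) + 0.5), x))
    return int(math.copysign(math.floor(abs(x)), x))


def complex_e_method(a, v, steps=40, frac_bits=4):
    """Solve A s = v over C via the CR transform plus the radix-2 recurrence.
    Residuals and digits are stored as [w0r, w0i, w1r, w1i, ...]."""
    n = len(v)
    m = [[0.0] * (2 * n) for _ in range(2 * n)]            # CR-transformed matrix
    for j in range(n):
        for k in range(n):
            x, y = a[j][k].real, a[j][k].imag
            m[2 * j][2 * k], m[2 * j][2 * k + 1] = x, -y
            m[2 * j + 1][2 * k], m[2 * j + 1][2 * k + 1] = y, x
    w = [part for z in v for part in (z.real, z.imag)]     # w(0) = v, interleaved
    acc = [0.0] * (2 * n)
    scale = 1 << frac_bits
    for j in range(steps):
        d = [sel(math.floor(wk * scale) / scale) for wk in w]   # digits from estimates
        acc = [sk + dk * 2.0 ** (-j) for sk, dk in zip(acc, d)]
        w = [2 * (w[r] - sum(m[r][c] * d[c] for c in range(2 * n)))
             for r in range(2 * n)]
    return [complex(acc[2 * k], acc[2 * k + 1]) for k in range(n)]


A = [[1, -0.0625 + 0.03125j], [0.0625j, 1]]
v = [0.5 - 0.25j, 0.75 + 0.5j]
s = complex_e_method(A, v)
```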
4. COMPLEX MULTIPLY-ADD OPERATOR (CMA)
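The operator computes $y = ab + c$. In the real domain, the mapping to a linear system is
$$
\begin{pmatrix} 1 & -a \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} s_0 \\ s_1 \end{pmatrix}
=
\begin{pmatrix} c \\ b \end{pmatrix}
$$
where the solution is obtained as $s_0 = y$. In the complex domain, the variables are $y = y^r + i y^i$, $a = a^r + i a^i$, $b = b^r + i b^i$, and $c = c^r + i c^i$. The mapping in this case is
$$
\begin{pmatrix}
1 & 0 & -a^r & a^i\\
0 & 1 & -a^i & -a^r\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} s^r_0\\ s^i_0\\ s^r_1\\ s^i_1 \end{pmatrix}
=
\begin{pmatrix} c^r\\ c^i\\ b^r\\ b^i \end{pmatrix}
$$
so that $s^r_0 = y^r$ and $s^i_0 = y^i$.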
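The CMA mapping can be checked numerically: with $s = (y^r, y^i, b^r, b^i)$ and $y = ab + c$, the 4 × 4 real system above should be satisfied exactly. The helper `cma_system` and the sample operands below are hypothetical, chosen only for illustration.

```python
import math


def cma_system(a: complex, b: complex, c: complex):
    """Real 4 x 4 matrix and right-hand side of the complex CMA mapping."""
    ar, ai = a.real, a.imag
    m = [[1, 0, -ar, ai],
         [0, 1, -ai, -ar],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
    rhs = [c.real, c.imag, b.real, b.imag]
    return m, rhs


a, b, c = 0.0625 - 0.03125j, 0.75 + 0.5j, -0.25 + 1.0j
y = a * b + c                                   # expected value of s0
m, rhs = cma_system(a, b, c)
s = [y.real, y.imag, b.real, b.imag]            # (s0r, s0i, s1r, s1i)
lhs = [sum(mij * sj for mij, sj in zip(row, s)) for row in m]
assert all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(lhs, rhs))
```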
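The residual recurrences are:
$$
\begin{aligned}
w^{(j)}_{0,r} &= 2\left[ w^{(j-1)}_{0,r} - d^{(j-1)}_{0,r} + a^r d^{(j-1)}_{1,r} - a^i d^{(j-1)}_{1,i} \right]\\
w^{(j)}_{0,i} &= 2\left[ w^{(j-1)}_{0,i} - d^{(j-1)}_{0,i} + a^i d^{(j-1)}_{1,r} + a^r d^{(j-1)}_{1,i} \right]\\
w^{(j)}_{1,r} &= 2\left[ w^{(j-1)}_{1,r} - d^{(j-1)}_{1,r} \right]\\
w^{(j)}_{1,i} &= 2\left[ w^{(j-1)}_{1,i} - d^{(j-1)}_{1,i} \right]
\end{aligned}
\tag{10}
$$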
with the initial conditions $w^{(0)}_{0,r} = c^r$, $w^{(0)}_{0,i} = c^i$, $w^{(0)}_{1,r} = b^r$, $w^{(0)}_{1,i} = b^i$.
Convergence requires that the following conditions be satisfied:
$$
|a^r| + |a^i| \le \alpha, \qquad |c^r|, |c^i|, |b^r|, |b^i| \le 3/2 \tag{11}
$$
From condition (8) with ∆ = 1/8, we get α ≤ 3/16. Therefore, $|a^r|, |a^i| \le 3/32$ ensures convergence of the algorithm. This range reduction of the input a can be achieved by scaling of the initial value [?].
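The recurrences (10), with the initial conditions above, can be simulated digit by digit. The sketch below is a floating-point software model, not the hardware datapath: `sel` implements (6), the estimate $\hat{w}$ is modeled by truncation to 4 fractional bits (so $|w - \hat{w}| < \Delta/2$ with $\Delta = 1/8$), and the function name `cma` and the sample operands are hypothetical, chosen to respect $|a^r|, |a^i| \le 3/32$ and $|b^r|, |b^i|, |c^r|, |c^i| \le 3/2$. The digits of $s_0$, produced most significant first, are accumulated into y.

```python
import math


def sel(x: float) -> int:
    """Selection function (6): digit in {-1, 0, 1}."""
    if abs(x) <= 1:
        return int(math.copysign(math.floor(abs(x) + 0.5), x))
    return int(math.copysign(math.floor(abs(x)), x))


def est(w: float, frac_bits: int = 4) -> float:
    """Truncated residual estimate (error below 2**-frac_bits)."""
    return math.floor(w * (1 << frac_bits)) / (1 << frac_bits)


def cma(a: complex, b: complex, c: complex, m: int = 32) -> complex:
    """Digit-serial y = ab + c via the residual recurrences (10)."""
    ar, ai = a.real, a.imag
    w0r, w0i, w1r, w1i = c.real, c.imag, b.real, b.imag    # initial conditions
    yr = yi = 0.0
    for j in range(m):
        d0r, d0i = sel(est(w0r)), sel(est(w0i))
        d1r, d1i = sel(est(w1r)), sel(est(w1i))
        yr += d0r * 2.0 ** (-j)              # digits of s0 = y, MSD first
        yi += d0i * 2.0 ** (-j)
        w0r, w0i, w1r, w1i = (
            2 * (w0r - d0r + ar * d1r - ai * d1i),
            2 * (w0i - d0i + ai * d1r + ar * d1i),
            2 * (w1r - d1r),
            2 * (w1i - d1i),
        )
    return complex(yr, yi)


y = cma(0.09375 - 0.0625j, 1.25 - 0.75j, -0.5 + 1.375j)   # close to ab + c
```

With m = 32 iterations the result agrees with ab + c to roughly 32 bits, in line with the "about m cycles for m digits" claim.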
A scheme for implementing the CMA operator is shown in Figure 1(a), and the corresponding elementary unit (EU0) is illustrated in Figure 1(b). The elementary units EU1 can be simplified, as discussed below. A bit-parallel bus transmits the value of a in broadcast mode, while the variables b and c are loaded in separate cycles. Note that the initialization cycles could be shorter than the iteration cycles.
[Figure 1 diagram. (a) Elementary units EU0r, EU0i, EU1r, EU1i; operand bus a, b, c; on-the-fly converters OFCr and OFCi; digit-serial and digit-parallel outputs y. (b) EU0r (EU0i similar): registers REG, multiple generators MG, multiplexers MUX, [4:2] adder, selection SEL.]
Figure 1. (a) Scheme for CMA operator. (b) Block diagram of elementary unit.
A block diagram of the elementary unit EU0 (real part only) for the CMA operator is shown in Figure 1(b). It uses the following modules:
• Registers (4)
• Multiple generators MG (2), producing {−1, 0, 1} × ar and {−1, 0, 1} × ai, and buffers
• Multiplexers MUX (2) for initializing the residual
• A [4:2] adder (the shaded MS part performs the indicated subtraction of the selected digit)
• Output digit selection SEL (a small table or a gate network)
The elementary unit EU1 can be greatly simplified. In the general radix-r case, this module is effectively a recoder from the digit set {0, 1, ..., r − 1} to the set {−a, ..., a}, where r/2 ≤ a ≤ r − 1. In the radix-2 case, this recoding is unnecessary: the module is a left-shift register.
The digit-serial outputs of the EUs can be converted into digit-parallel form using on-the-fly converters OFCr and OFCi, as indicated by the thick lines [?].
The cycle time (without interconnect delay between units), in terms of the full-adder delay t, is estimated as
$$
T_{EU-CMA} = t_{BUFF} + t_{MG} + t_{SEL} + t_{[4:2]} + t_{REG} \approx (0.4 + 0.3 + 1 + 1.3 + 0.9)\,t = 3.9\,t \tag{12}
$$
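The cost, again in terms of the area of a full adder $A_{FA}$, is estimated as
$$
\begin{aligned}
A_{EU-CMA}(m) &= A_{SEL} + 2A_{BUFF} + (m+2)\left[ 2A_{MG} + 2A_{MUX} + A_{[4:2]} + 4A_{REG} + A_{OFC} \right]\\
&\approx \left[ 5 + 2 \times 0.4 + (m+2)(4 \times 0.45 + 2.3 + 4 \times 0.6 + 2.1) \right] A_{FA}\\
&\approx (23 + 9m)\,A_{FA}
\end{aligned}
\tag{13}
$$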
The cost is estimated as the area occupied by the modules, using the area of a full adder $A_{FA}$ as the unit. The areas of the primitive modules are: register $A_{REG} = 0.6A_{FA}$; buffer $A_{BUFF} = 0.4A_{FA}$; MUX $A_{MUX} = 0.45A_{FA}$; multiple generator MG $A_{MG} = 0.45A_{FA}$; [4:2] adder $A_{[4:2]} = 2.3A_{FA}$; SEL $A_{SEL} = 5A_{FA}$; and on-the-fly converter $A_{OFC} = 2A_{MUX} + 2A_{REG} = 2.1A_{FA}$. The total cost of an m-bit CMA operator is
$$
A_{CMA}(m) = 2 \times A_{EU-CMA}(m) + 2 \times (m+2)A_{REG} \approx (50 + 20m)\,A_{FA}
$$
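The delay and area estimates (12), (13) and the CMA total are sums of the unit costs listed above, so they can be recomputed directly. The sketch below uses exactly the unit values quoted in the text; the dictionary keys and function names are illustrative. Before rounding, (13) evaluates to $23 + 8.6m$ and the CMA total to $48.4 + 18.4m$, which the text rounds to $23 + 9m$ and $50 + 20m$.

```python
# Unit delays (in full-adder delays t) and unit areas (in A_FA) from the text.
DELAY = {"BUFF": 0.4, "MG": 0.3, "SEL": 1.0, "ADD4_2": 1.3, "REG": 0.9}
AREA = {"REG": 0.6, "BUFF": 0.4, "MUX": 0.45, "MG": 0.45, "ADD4_2": 2.3, "SEL": 5.0}
AREA["OFC"] = 2 * AREA["MUX"] + 2 * AREA["REG"]        # on-the-fly converter


def t_eu_cma() -> float:
    """Cycle time (12), in units of t."""
    return (DELAY["BUFF"] + DELAY["MG"] + DELAY["SEL"]
            + DELAY["ADD4_2"] + DELAY["REG"])


def a_eu_cma(m: int) -> float:
    """Area (13) of one elementary unit for m-bit operands, unrounded."""
    per_digit = (2 * AREA["MG"] + 2 * AREA["MUX"] + AREA["ADD4_2"]
                 + 4 * AREA["REG"] + AREA["OFC"])
    return AREA["SEL"] + 2 * AREA["BUFF"] + (m + 2) * per_digit


def a_cma(m: int) -> float:
    """Total CMA area: two elementary units plus 2(m + 2) registers."""
    return 2 * a_eu_cma(m) + 2 * (m + 2) * AREA["REG"]


for m in (16, 32, 64):
    print(m, round(a_eu_cma(m), 2), round(a_cma(m), 2))
```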
5. COMPLEX SUM OF PRODUCTS OPERATOR (CSP)
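The operator computes $y = ab + ce + f$. In the real domain, the mapping to a linear system is
$$
\begin{pmatrix}
1 & -a & -c\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} s_0\\ s_1\\ s_2 \end{pmatrix}
=
\begin{pmatrix} f\\ b\\ e \end{pmatrix}
$$
and the solution is $s_0 = y$. In the complex domain, $a = a^r + i a^i$, $b = b^r + i b^i$, etc. The mapping in this case is
$$
\begin{pmatrix}
1 & 0 & -a^r & a^i & -c^r & c^i\\
0 & 1 & -a^i & -a^r & -c^i & -c^r\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} s^r_0\\ s^i_0\\ s^r_1\\ s^i_1\\ s^r_2\\ s^i_2 \end{pmatrix}
=
\begin{pmatrix} f^r\\ f^i\\ b^r\\ b^i\\ e^r\\ e^i \end{pmatrix}
$$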
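The residual recurrences are:
$$
\begin{aligned}
w^{(j)}_{0,r} &= 2\left[ w^{(j-1)}_{0,r} - d^{(j-1)}_{0,r} + a^r d^{(j-1)}_{1,r} - a^i d^{(j-1)}_{1,i} + c^r d^{(j-1)}_{2,r} - c^i d^{(j-1)}_{2,i} \right]\\
w^{(j)}_{0,i} &= 2\left[ w^{(j-1)}_{0,i} - d^{(j-1)}_{0,i} + a^i d^{(j-1)}_{1,r} + a^r d^{(j-1)}_{1,i} + c^i d^{(j-1)}_{2,r} + c^r d^{(j-1)}_{2,i} \right]\\
w^{(j)}_{1,r} &= 2\left[ w^{(j-1)}_{1,r} - d^{(j-1)}_{1,r} \right]\\
w^{(j)}_{1,i} &= 2\left[ w^{(j-1)}_{1,i} - d^{(j-1)}_{1,i} \right]\\
w^{(j)}_{2,r} &= 2\left[ w^{(j-1)}_{2,r} - d^{(j-1)}_{2,r} \right]\\
w^{(j)}_{2,i} &= 2\left[ w^{(j-1)}_{2,i} - d^{(j-1)}_{2,i} \right]
\end{aligned}
\tag{14}
$$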
with the initial conditions $w^{(0)}_{0,r} = f^r$, $w^{(0)}_{0,i} = f^i$, $w^{(0)}_{1,r} = b^r$, $w^{(0)}_{1,i} = b^i$, $w^{(0)}_{2,r} = e^r$, $w^{(0)}_{2,i} = e^i$.
Convergence requires that the following conditions be satisfied:
$$
|a^r| + |a^i| + |c^r| + |c^i| \le \alpha, \qquad |f^r|, |f^i|, |b^r|, |b^i|, |e^r|, |e^i| \le 3/2 \tag{15}
$$
From condition (8) with ∆ = 1/8, we get α ≤ 3/16, and $|a^r|, |a^i|, |c^r|, |c^i| \le 3/64$ ensures convergence of the algorithm. This range reduction can be achieved by scaling of the initial values [?].
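As for the CMA operator, the recurrences (14) can be modeled in software. The sketch below is a hypothetical floating-point simulation (the names `sel`, `est`, `csp` and the sample operands are illustrative) that keeps the six residual components in the order $w_{0,r}, w_{0,i}, w_{1,r}, w_{1,i}, w_{2,r}, w_{2,i}$, models the estimate by 4-bit truncation, and uses operands satisfying (15) with $|a^r|, |a^i|, |c^r|, |c^i| \le 3/64$.

```python
import math


def sel(x: float) -> int:
    """Selection function (6)."""
    if abs(x) <= 1:
        return int(math.copysign(math.floor(abs(x) + 0.5), x))
    return int(math.copysign(math.floor(abs(x)), x))


def est(w: float, frac_bits: int = 4) -> float:
    """Truncated residual estimate (error below 2**-frac_bits)."""
    return math.floor(w * (1 << frac_bits)) / (1 << frac_bits)


def csp(a: complex, b: complex, c: complex, e: complex, f: complex,
        m: int = 32) -> complex:
    """Digit-serial y = ab + ce + f via the residual recurrences (14)."""
    ar, ai, cr, ci = a.real, a.imag, c.real, c.imag
    # w = [w0r, w0i, w1r, w1i, w2r, w2i], initialized with f, b, e
    w = [f.real, f.imag, b.real, b.imag, e.real, e.imag]
    yr = yi = 0.0
    for j in range(m):
        d = [sel(est(x)) for x in w]
        yr += d[0] * 2.0 ** (-j)             # digits of s0 = y, MSD first
        yi += d[1] * 2.0 ** (-j)
        w = [2 * (w[0] - d[0] + ar * d[2] - ai * d[3] + cr * d[4] - ci * d[5]),
             2 * (w[1] - d[1] + ai * d[2] + ar * d[3] + ci * d[4] + cr * d[5]),
             2 * (w[2] - d[2]), 2 * (w[3] - d[3]),
             2 * (w[4] - d[4]), 2 * (w[5] - d[5])]
    return complex(yr, yi)
```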
A scheme for implementing the CSP operator is shown in Figure 2(a), and the corresponding elementary unit (EU) is illustrated in Figure 2(b). A bit-parallel bus transmits a and c, while the real and imaginary parts of f, b and e are loaded in separate cycles. Note that the initialization cycles could be shorter than the iteration cycles.
A block diagram of the elementary unit EU0r (real part only) for the CSP operator is shown in Figure 2(b).
The modules used are:
• Registers (6)
• Multiple generators MG (4), producing {−1, 0, 1} × a^r, {−1, 0, 1} × a^i, {−1, 0, 1} × c^r and {−1, 0, 1} × c^i, and buffers
• Multiplexers MUX (2) for initializing the residual
• A [6:2] adder (the shaded MS part performs the indicated subtraction of the selected digit)
• Output digit selection SEL (a small table or a gate network)
As in the CMA operator, the digit-serial outputs of the EU0 can be converted into digit-parallel form using
on-the-fly converters OFCr and OFCi. The cycle time, in terms of a full adder (complex gate) delay t, is
estimated as
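$$
T_{EU-CSP} = t_{BUFF} + t_{MG} + t_{SEL} + t_{[6:2]} + t_{REG} \approx (0.4 + 0.3 + 1 + 2.3 + 0.9)\,t = 4.9\,t \tag{16}
$$
The cost, again in terms of the area of a full adder $A_{FA}$, is estimated as
$$
\begin{aligned}
A_{EU-CSP}(m) &= A_{SEL} + 4A_{BUFF} + (m+2)\left[ 4A_{MG} + A_{MUX} + A_{[6:2]} + 6A_{REG} + 2A_{OFC} \right]\\
&\approx \left[ 5 + 4 \times 0.4 + (m+2)(5 \times 0.45 + 4.3 + 6 \times 0.6 + 2 \times 2.1) \right] A_{FA}\\
&\approx (35 + 14m)\,A_{FA}
\end{aligned}
\tag{17}
$$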
[Figure 2 diagram. (a) Elementary units EU0r/EU0i, EU1r/EU1i, EU2r/EU2i; operand bus a, b, c, e, f; on-the-fly converters OFCr and OFCi; digit-serial and digit-parallel outputs y. (b) EU0r (EU0i similar): registers REG, multiple generators MG for a^r, a^i, c^r, c^i, multiplexers MUX, [6:2] adder, selection SEL.]
Figure 2. (a) Overall scheme for CSP operator. (b) Block diagram of elementary unit.
As discussed in the previous section, the cost is estimated as the area occupied by the modules, using the area of a full adder $A_{FA}$ as the unit. The total cost of an m-bit CSP operator is
$$
A_{CSP}(m) = 2 \times A_{EU-CSP}(m) + 4 \times (m+2)A_{REG} \approx (70 + 30m)\,A_{FA}
$$
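The CSP estimates (16) and (17) can be recomputed in the same way. The sketch below takes $A_{[6:2]} = 4.3A_{FA}$ and $t_{[6:2]} = 2.3t$ as they appear in (16) and (17), since the [6:2] adder is not among the primitive modules listed for the CMA operator, and it counts one MUX per digit position, as (17) does; all names are illustrative. Before rounding, (17) evaluates to $35.3 + 14.35m$.

```python
AREA = {"REG": 0.6, "BUFF": 0.4, "MUX": 0.45, "MG": 0.45, "ADD6_2": 4.3, "SEL": 5.0}
AREA["OFC"] = 2 * AREA["MUX"] + 2 * AREA["REG"]          # = 2.1 A_FA
DELAY = {"BUFF": 0.4, "MG": 0.3, "SEL": 1.0, "ADD6_2": 2.3, "REG": 0.9}


def t_eu_csp() -> float:
    """Cycle time (16), in units of t: sum of the five stage delays."""
    return sum(DELAY.values())


def a_eu_csp(m: int) -> float:
    """Area (17) of one CSP elementary unit, unrounded."""
    per_digit = (4 * AREA["MG"] + AREA["MUX"] + AREA["ADD6_2"]
                 + 6 * AREA["REG"] + 2 * AREA["OFC"])
    return AREA["SEL"] + 4 * AREA["BUFF"] + (m + 2) * per_digit


def a_csp(m: int) -> float:
    """Total CSP area: two elementary units plus 4(m + 2) registers."""
    return 2 * a_eu_csp(m) + 4 * (m + 2) * AREA["REG"]
```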
6. COMPLEX SUM OF SQUARES OPERATOR (CSS)
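This operator computes $y = a^2 + b^2$. In the real domain, the mapping to a linear system is
$$
\begin{pmatrix}
1 & -a & -b\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} s_0\\ s_1\\ s_2 \end{pmatrix}
=
\begin{pmatrix} 0\\ a\\ b \end{pmatrix}
$$
and the solution is $s_0 = y$. In the complex domain, $a = a^r + i a^i$ and $b = b^r + i b^i$. The mapping in this case is
$$
\begin{pmatrix}
1 & 0 & -a^r & a^i & -b^r & b^i\\
0 & 1 & -a^i & -a^r & -b^i & -b^r\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} s^r_0\\ s^i_0\\ s^r_1\\ s^i_1\\ s^r_2\\ s^i_2 \end{pmatrix}
=
\begin{pmatrix} 0\\ 0\\ a^r\\ a^i\\ b^r\\ b^i \end{pmatrix}
$$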
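The residual recurrences are:
$$
\begin{aligned}
w^{(j)}_{0,r} &= 2\left[ w^{(j-1)}_{0,r} - d^{(j-1)}_{0,r} + a^r d^{(j-1)}_{1,r} - a^i d^{(j-1)}_{1,i} + b^r d^{(j-1)}_{2,r} - b^i d^{(j-1)}_{2,i} \right]\\
w^{(j)}_{0,i} &= 2\left[ w^{(j-1)}_{0,i} - d^{(j-1)}_{0,i} + a^i d^{(j-1)}_{1,r} + a^r d^{(j-1)}_{1,i} + b^i d^{(j-1)}_{2,r} + b^r d^{(j-1)}_{2,i} \right]\\
w^{(j)}_{1,r} &= 2\left[ w^{(j-1)}_{1,r} - d^{(j-1)}_{1,r} \right]\\
w^{(j)}_{1,i} &= 2\left[ w^{(j-1)}_{1,i} - d^{(j-1)}_{1,i} \right]\\
w^{(j)}_{2,r} &= 2\left[ w^{(j-1)}_{2,r} - d^{(j-1)}_{2,r} \right]\\
w^{(j)}_{2,i} &= 2\left[ w^{(j-1)}_{2,i} - d^{(j-1)}_{2,i} \right]
\end{aligned}
\tag{18}
$$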
with the initial conditions $w^{(0)}_{0,r} = 0$, $w^{(0)}_{0,i} = 0$, $w^{(0)}_{1,r} = a^r$, $w^{(0)}_{1,i} = a^i$, $w^{(0)}_{2,r} = b^r$, $w^{(0)}_{2,i} = b^i$.
Convergence requires that the following condition be satisfied:
$$
|a^r| + |a^i| + |b^r| + |b^i| \le \alpha \tag{19}
$$
From condition (8) with ∆ = 1/8, we get α ≤ 3/16, and $|a^r|, |a^i|, |b^r|, |b^i| \le 3/64$ ensures convergence of the algorithm. This range reduction can be achieved by scaling of the initial values [?].
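A floating-point model of the recurrences (18) differs from the CSP model only in its initial conditions and multiplier operands. In the sketch below, `css`, `sel` and `est` are illustrative names, the estimate is again modeled by 4-bit truncation, and the sample operands satisfy (19) with each part at most 3/64 in magnitude.

```python
import math


def sel(x: float) -> int:
    """Selection function (6)."""
    if abs(x) <= 1:
        return int(math.copysign(math.floor(abs(x) + 0.5), x))
    return int(math.copysign(math.floor(abs(x)), x))


def est(w: float, frac_bits: int = 4) -> float:
    """Truncated residual estimate (error below 2**-frac_bits)."""
    return math.floor(w * (1 << frac_bits)) / (1 << frac_bits)


def css(a: complex, b: complex, m: int = 32) -> complex:
    """Digit-serial y = a^2 + b^2 via the residual recurrences (18)."""
    ar, ai, br, bi = a.real, a.imag, b.real, b.imag
    w = [0.0, 0.0, ar, ai, br, bi]              # initial conditions
    yr = yi = 0.0
    for j in range(m):
        d = [sel(est(x)) for x in w]
        yr += d[0] * 2.0 ** (-j)                # digits of s0 = y, MSD first
        yi += d[1] * 2.0 ** (-j)
        w = [2 * (w[0] - d[0] + ar * d[2] - ai * d[3] + br * d[4] - bi * d[5]),
             2 * (w[1] - d[1] + ai * d[2] + ar * d[3] + bi * d[4] + br * d[5]),
             2 * (w[2] - d[2]), 2 * (w[3] - d[3]),
             2 * (w[4] - d[4]), 2 * (w[5] - d[5])]
    return complex(yr, yi)
```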
A scheme for implementing the CSS operator (overall scheme and elementary unit) is similar to that of Figure 2. Consequently, the delays and the cost are similar to those estimated for the CSP operator.
7. COMPLEX INTEGER POWERS OPERATOR (CIP)
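In the real domain, the operator computes consecutive integer powers of the argument x in parallel: $x^2, x^3, \ldots, x^k$. The corresponding linear system is
$$
\begin{pmatrix}
1 & -x & 0 & \cdots & 0 & 0\\
0 & 1 & -x & \cdots & 0 & 0\\
\vdots & & \ddots & \ddots & & \vdots\\
0 & 0 & \cdots & 0 & 1 & -x\\
0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} s_0\\ s_1\\ \vdots\\ s_{k-1}\\ s_k \end{pmatrix}
=
\begin{pmatrix} 0\\ 0\\ \vdots\\ 0\\ x \end{pmatrix}
$$
Solving from the last row upwards gives $s_k = x$ and $s_h = x\,s_{h+1}$, that is, $s_h = x^{k+1-h}$; the components $s_{k-1}, \ldots, s_1$ therefore hold the integer powers $x^2, \ldots, x^k$ (and $s_0 = x^{k+1}$).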
The mapping in the complex domain is shown next. The complex argument is $z = x + iy$.
$$
A =
\begin{pmatrix}
1 & 0 & -x & y & 0 & 0 & 0 & 0 & \cdots & 0\\
0 & 1 & -y & -x & 0 & 0 & 0 & 0 & \cdots & 0\\
0 & 0 & 1 & 0 & -x & y & 0 & 0 & \cdots & 0\\
0 & 0 & 0 & 1 & -y & -x & 0 & 0 & \cdots & 0\\
\vdots & & & & & \ddots & & & & \vdots\\
0 & 0 & \cdots & 0 & 0 & 0 & 1 & 0 & -x & y\\
0 & 0 & \cdots & 0 & 0 & 0 & 0 & 1 & -y & -x\\
0 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$
The components of the solution s of the linear system
$$
A \times \left( s^r_0, s^i_0, s^r_1, s^i_1, \ldots, s^r_{k-1}, s^i_{k-1}, s^r_k, s^i_k \right)^T = (0, 0, 0, 0, \ldots, 0, 0, x, y)^T \tag{20}
$$
are equal to the integer powers of z.
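Because A is upper bidiagonal in 2 × 2 blocks, (20) can be solved from the last block row upwards: $s_k = z$ and $s_h = z\,s_{h+1}$, so $s_h = z^{k+1-h}$. The sketch below (plain Python floats; `cip_system` and `powers_by_back_substitution` are hypothetical helper names) builds the real matrix of (20) for a given k and checks that the vector of these powers satisfies it.

```python
def cip_system(z: complex, k: int):
    """Real 2(k+1) x 2(k+1) matrix A and right-hand side of (20)."""
    x, y = z.real, z.imag
    n = 2 * (k + 1)
    a = [[0.0] * n for _ in range(n)]
    for h in range(k + 1):
        a[2 * h][2 * h] = a[2 * h + 1][2 * h + 1] = 1.0
        if h < k:                               # block -z coupling s_h to s_{h+1}
            a[2 * h][2 * h + 2], a[2 * h][2 * h + 3] = -x, y
            a[2 * h + 1][2 * h + 2], a[2 * h + 1][2 * h + 3] = -y, -x
    rhs = [0.0] * (n - 2) + [x, y]
    return a, rhs


def powers_by_back_substitution(z: complex, k: int):
    """s_k = z, then s_h = z * s_{h+1} for h = k-1, ..., 0."""
    s = [0j] * (k + 1)
    s[k] = z
    for h in range(k - 1, -1, -1):
        s[h] = z * s[h + 1]
    return s


z, k = 0.5 + 0.25j, 4
a, rhs = cip_system(z, k)
s = powers_by_back_substitution(z, k)
flat = [part for p in s for part in (p.real, p.imag)]      # interleaved as in (20)
lhs = [sum(aij * sj for aij, sj in zip(row, flat)) for row in a]
assert all(abs(p - q) < 1e-12 for p, q in zip(lhs, rhs))
```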
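The residual recurrences are, for h = 0, ..., k − 1,
$$
\begin{aligned}
w^{(j)}_{h,r} &= 2\left[ w^{(j-1)}_{h,r} - d^{(j-1)}_{h,r} + x\,d^{(j-1)}_{h+1,r} - y\,d^{(j-1)}_{h+1,i} \right]\\
w^{(j)}_{h,i} &= 2\left[ w^{(j-1)}_{h,i} - d^{(j-1)}_{h,i} + y\,d^{(j-1)}_{h+1,r} + x\,d^{(j-1)}_{h+1,i} \right]
\end{aligned}
\tag{21}
$$
and, for h = k,
$$
\begin{aligned}
w^{(j)}_{k,r} &= 2\left[ w^{(j-1)}_{k,r} - d^{(j-1)}_{k,r} \right]\\
w^{(j)}_{k,i} &= 2\left[ w^{(j-1)}_{k,i} - d^{(j-1)}_{k,i} \right]
\end{aligned}
$$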
with the initial conditions $w^{(0)}_{h,r} = 0$, $w^{(0)}_{h,i} = 0$ for h = 0, ..., k − 1, and $w^{(0)}_{k,r} = x$, $w^{(0)}_{k,i} = y$.
Convergence requires that the following condition be satisfied:
$$
|x| + |y| \le \alpha \tag{22}
$$
From condition (8) with ∆ = 1/8, we get α ≤ 3/16, and $|x|, |y| \le 3/32$ ensures convergence of the algorithm. This range reduction can be achieved by scaling of the initial values [?].
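The recurrences (21) can also be simulated to see all powers emerge in parallel, one digit per iteration for every component. The sketch below is a hypothetical floating-point model (`cip`, `sel` and `est` are illustrative names; the estimate is modeled by 4-bit truncation) using an argument with $|x|, |y| \le 3/32$; it returns the approximations of $s_0, \ldots, s_k$ of (20).

```python
import math


def sel(x: float) -> int:
    """Selection function (6)."""
    if abs(x) <= 1:
        return int(math.copysign(math.floor(abs(x) + 0.5), x))
    return int(math.copysign(math.floor(abs(x)), x))


def est(w: float, frac_bits: int = 4) -> float:
    """Truncated residual estimate (error below 2**-frac_bits)."""
    return math.floor(w * (1 << frac_bits)) / (1 << frac_bits)


def cip(z: complex, k: int, m: int = 32) -> list:
    """Digit-serial solution s_0..s_k of (20) via the recurrences (21)."""
    x, y = z.real, z.imag
    w = [0.0] * (2 * k) + [x, y]                 # initial conditions
    acc = [0.0] * (2 * (k + 1))
    for j in range(m):
        d = [sel(est(v)) for v in w]
        acc = [s + dv * 2.0 ** (-j) for s, dv in zip(acc, d)]
        new = []
        for h in range(k):                        # (21), h = 0..k-1
            dr, di = d[2 * h + 2], d[2 * h + 3]
            new.append(2 * (w[2 * h] - d[2 * h] + x * dr - y * di))
            new.append(2 * (w[2 * h + 1] - d[2 * h + 1] + y * dr + x * di))
        new += [2 * (w[2 * k] - d[2 * k]), 2 * (w[2 * k + 1] - d[2 * k + 1])]
        w = new
    return [complex(acc[2 * h], acc[2 * h + 1]) for h in range(k + 1)]


z = 0.09375 - 0.0625j            # |x|, |y| <= 3/32
p = cip(z, 4)                    # p[h] approximates z ** (5 - h)
```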
A scheme for implementing the CIP operator is shown in Figure 3. The corresponding elementary unit is similar to the EU of the CMA operator, illustrated in Figure 1(b), with the same cycle time and cost. A bit-parallel bus transmits the values x and y in broadcast mode, as discussed earlier. The total cost of an m-bit CIPk operator is
$$
A_{CIP_k}(m) = (k - 1) \times A_{EU-CMA}(m) + 2 \times (m+2)A_{REG}
$$
The same scheme, with different initialization, can be used to evaluate complex polynomials [?].
[Figure 3 diagram: elementary units EU0r/EU0i to EU3r/EU3i producing s^r_0, s^i_0, ..., s^r_3, s^i_3; operand bus x, y; on-the-fly converters OFCr and OFCi; digit-serial and digit-parallel outputs.]
Figure 3. Scheme for CIP operator.
8. SUMMARY
We presented a general method for computing several commonly used arithmetic expressions in the complex domain: multiply-add, sum of products, sum of squares, and integer powers. The method consists of mapping the operators onto diagonally dominant systems of complex linear equations, transforming the system from the complex to the real domain, and solving it using a digit-by-digit, most-significant-digit-first (MSDF) algorithm. The latency is roughly m cycles for m bits of precision, independent of the order of the resulting linear system. The cycle time is independent of m. We discussed the mapping of operators to linear systems, the transform from the complex to the real domain, the recurrences, and the convergence conditions. Implementations of the proposed operators are discussed at a high level, with estimates of the cost and cycle time. The method used here has also been applied to complex polynomials and rational functions [?,?].
REFERENCES
1. T. Aoki, H. Amada, and T. Higuchi. Real/complex reconfigurable arithmetic using redundant complex number systems. Proc. 13th IEEE Symposium on Computer Arithmetic, pp. 200-207, 1997.
2. M.D. Ercegovac. A general method for evaluation of functions and computation in a digital computer. PhD thesis, Dept. of Computer Science, University of Illinois, Urbana-Champaign, 1975.
3. M.D. Ercegovac. A General Hardware-oriented Method for Evaluation of Functions and Computations in a Digital Computer. IEEE Trans. Comp., C-26(7):667-680, 1977.
4. M.D. Ercegovac and T. Lang. Digital Arithmetic. Morgan Kaufmann Publishers, San Francisco, 2004.
5. M.D. Ercegovac and J.-M. Muller. Complex Division with Prescaling of Operands. IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 293-303, 2003.
6. M.D. Ercegovac and J.-M. Muller. Design of a complex divider. Proc. SPIE on Advanced Signal Processing Algorithms, Architectures, and Implementations XII, pp. 51-59, 2004.
7. M.D. Ercegovac and J.-M. Muller. Complex Square Root with Operand Prescaling. IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 293-303, 2004.
8. M.D. Ercegovac and J.-M. Muller. Solving Systems of Linear Equations in Complex Domain: Complex E-Method. LIP Report No. 2007-2, École Normale Supérieure de Lyon, France.
9. M.D. Ercegovac and J.-M. Muller. A Hardware-Oriented Method for Evaluating Complex Polynomials. IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2007.
10. R.D. McIlhenny. Complex Number On-line Arithmetic for Reconfigurable Hardware: Algorithms, Implementations, and Applications. Ph.D. Dissertation, Computer Science Department, University of California, 2002.
11. V. Oklobdzija, D. Villeger, and T. Soulas. An Integrated Multiplier for Complex Numbers. J. of VLSI Signal Processing, vol. 7, no. 3, pp. 213-222, May 1994.
12. B.W.Y. Wei, H. Du, and H. Chen. A Complex-Number Multiplier Using Radix-4 Digits. Proc. 12th IEEE Symposium on Computer Arithmetic, pp. 84-90, 1995.
