Adder Based Residue to Binary Number Converters for $(2^n - 1, 2^n, 2^n + 1)$ by Wang, Y. et al.
1772 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 7, JULY 2002
Adder Based Residue to Binary Number Converters
for $(2^n - 1, 2^n, 2^n + 1)$
Yuke Wang, Xiaoyu Song, Mostapha Aboulhamid, Member, IEEE, and Hong Shen
Abstract—Based on an algorithm derived from the New Chinese
Remainder Theorem I, we present three new residue-to-binary
converters for the residue number system $(2^n - 1, 2^n, 2^n + 1)$
designed using $2n$-bit or $n$-bit adders with improvements in
speed, area, or dynamic range compared with various previous
converters. The $2n$-bit adder-based converter is faster and requires
about half the hardware required by previous methods. For $n$-bit
adder-based implementations, one new converter is twice as fast as
the previous method using a similar amount of hardware, whereas
another new converter achieves improvement in either speed, area,
or dynamic range compared with previous converters.




I. INTRODUCTION
THERE has been interest in residue number system (RNS)
arithmetic as a basis for computational hardware since the
1950s [1], [2]. During the past decade, the RNS has received
considerable attention in arithmetic computation and signal
processing applications, such as fast Fourier transforms, digital
filtering, and image processing [2], [3]. The main reasons
for this interest are the inherent properties of RNS such as
parallelism, modularity, fault tolerance, and carry-free oper-
ations [3]. The technology advantages offered by VLSI have
added a new dimension to the implementation of RNS-based
architectures. Several high-speed VLSI special-purpose digital
signal processors have been successfully implemented.
The two most important issues for the residue arithmetic are
the choice of moduli sets and the conversion of the residue to
binary numbers. The residue number system based on the set of
moduli $(2^n - 1, 2^n, 2^n + 1)$ has gained popularity and is expected
to play an increasing role in RNS digital signal processing [5].
For general moduli sets, the residue to binary conversions are
traditionally based on the Chinese Remainder Theorem (CRT)
or mixed-radix conversion. Some new general conversion algo-
rithms called New Chinese Remainder Theorems have been re-
cently proposed with smaller size modulo operations [13], [14].
Manuscript received August 5, 1999; revised March 19, 2002. The associate
editor coordinating the review of this paper and approving it for publication was
Prof. Scott C. Douglas.
Y. Wang is with the Department of Computer Science, Erik Jonsson School of
Engineering and Computer Science, University of Texas at Dallas, Richardson,
TX 75083-0688 USA (e-mail: Yuke@utdallas.edu).
X. Song is with the Department of Electrical and Computer Engineering,
Portland State University, Portland, OR 97207-0751 USA.
M. Aboulhamid is with the Departement d’informatique et de Recherche
Operationnelle, Universite de Montreal, Montreal, QC H3C 3J7 Canada.
H. Shen is with the School of Information Science, Japan Advanced Institute
of Science and Technology, Tatsunokuchi, Ishikawa, Japan.
Publisher Item Identifier S 1053-587X(02)05635-0.
Several conversion methods for $(2^n - 1, 2^n, 2^n + 1)$ have
been reported [6]–[11], [15]–[18]. Early converters [17] for such
moduli sets use ROM, which can be limited by the size $n$. In
recent years, converters using $2n$-bit or $n$-bit adders have been
proposed. These converters are designed using special formulas
rather than the general CRT algorithm, and improvement in
terms of hardware complexity has been reported. Detailed
comparisons of all those converters are presented in Tables I and II.
In this paper, for the moduli set $(2^n - 1, 2^n, 2^n + 1)$, we
present new and uniform algorithms designed using the New
Chinese Remainder Theorems for the RNS to binary conversion.
Three different converters using either $2n$-bit or $n$-bit adders
are proposed. The $2n$-bit adder-based converter is faster and
requires about half the hardware required by the previous methods
[7]–[9]. For $n$-bit adder-based implementations, one new
converter is twice as fast as the previous method [6] using a similar
amount of hardware, whereas another new converter achieves
improvement in both speed and area. The amount of hardware
for the new converters is similar for the -bit adder-based con-
verter compared with the one in [9]. However, in [9], not the
entire dynamic range of numbers is used.
In the following, we first introduce background material and
derive the formulas; then, we show an example and propose
three different hardware implementations.
II. BACKGROUND
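For any two integers $a$ and $m > 0$, the residue $r = a \bmod m$ is
defined by $a = km + r$ for some integer $k$ such that $0 \le r < m$.
The residue can be written as $a \bmod m$ or $|a|_m$.
An RNS is defined in terms of a set of relatively prime moduli
$\{P_1, P_2, \ldots, P_n\}$, where $\gcd(P_i, P_j) = 1$ for $i \ne j$. A binary
number $X$ can be represented as $X = (x_1, x_2, \ldots, x_n)$, where
$x_i = |X|_{P_i}$, $0 \le x_i < P_i$. Such a representation is unique
for any integer $X \in [0, M - 1]$, $M = P_1 P_2 \cdots P_n$.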
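As a small illustration of the residue representation, the following sketch computes the residue tuple of an integer and checks that every integer in the dynamic range gets a distinct tuple. The helper names `to_residues` and `special_moduli` are hypothetical and chosen only for this example.

```python
def special_moduli(n):
    """Return the moduli set (2^n - 1, 2^n, 2^n + 1) for a given n."""
    return (2**n - 1, 2**n, 2**n + 1)


def to_residues(x, moduli):
    """Return the residue tuple (x mod P_i) for each modulus P_i."""
    return tuple(x % p for p in moduli)


moduli = special_moduli(3)            # (7, 8, 9)
print(to_residues(407, moduli))       # (1, 7, 2)

# Every integer in [0, M - 1], M = 7 * 8 * 9 = 504, has a unique tuple.
tuples = {to_residues(x, moduli) for x in range(504)}
print(len(tuples))                    # 504
```

The forward direction is a set of independent modulo operations; the converters discussed in this paper address the reverse direction.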
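For the RNS defined on the moduli set $(2^n - 1, 2^n, 2^n + 1)$, a
binary number $X$ can be represented as a tuple $(x_1, x_2, x_3)$,
where $x_1$ and $x_2$ are two $n$-bit numbers and $x_3$ is an
$(n + 1)$-bit number.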
1053-587X/02$17.00 © 2002 IEEE
TABLE I
PERFORMANCE COMPARISON OF 2n-BIT ADDER-BASED CONVERTERS
TABLE II
PERFORMANCE COMPARISON OF n-BIT ADDER-BASED CONVERTERS
To convert a residue number $(x_1, x_2, x_3)$ into its binary
number $X$, the CRT and mixed-radix conversion method are
traditionally used. We define $|a^{-1}|_m$ to be the multiplicative
inverse of $|a|_m$, i.e., $|a \cdot a^{-1}|_m = 1$.
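Chinese Remainder Theorem: The binary number $X$ is computed by
$X = \left| \sum_{i=1}^{n} N_i \, |N_i^{-1}|_{P_i} \, x_i \right|_M$, where
$M = \prod_{i=1}^{n} P_i$ is the product of the moduli, $N_i = M / P_i$,
and $|N_i^{-1}|_{P_i}$ is the multiplicative inverse of $|N_i|_{P_i}$.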
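For reference, the classical CRT can be sketched in a few lines. This is a plain software rendering of the textbook formula, not a hardware converter; the function name `crt` is an assumption of this example, and the modular inverse relies on Python 3.8+ `pow(a, -1, m)`.

```python
from math import prod


def crt(residues, moduli):
    """Classical CRT: X = | sum_i N_i * |N_i^{-1}|_{P_i} * x_i |_M."""
    big_m = prod(moduli)                 # M, the dynamic range
    total = 0
    for x_i, p_i in zip(residues, moduli):
        n_i = big_m // p_i               # N_i = M / P_i
        inv = pow(n_i, -1, p_i)          # multiplicative inverse of N_i mod P_i
        total += n_i * inv * x_i
    return total % big_m                 # the large modulo-M reduction


print(crt((1, 7, 2), (7, 8, 9)))         # 407
```

The final reduction is taken modulo the full product of the moduli, which is the operation the specialized converters try to shrink or remove.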
The CRT requires a modulo $M$ (large-valued) operation,
which is not very efficient. Therefore, the converters proposed
in [6]–[11], [15], [16], and [18] use specially designed algo-
rithms to remove the modulo operation or to reduce the size
of the modulo operation. For example, the converters in [6] and
[14] are based on the formula ,
and methods are required to compute the coefficients
and . In [7], [9], and [15], the converters are based on the
formula , and methods for computing
are needed in each paper. In [7], the number is calculated
as , where , , , and are -bit
numbers obtained from . On the other hand, the
third formula in [15] reduces the size of the modulo operation
from to at the expense that some part of the dynamic
range will not be useable.
Recently, some alternative general conversion algorithms [the
New Chinese Remainder Theorems (New CRT-I, II, and III)
[13], [14]] have been proposed, which reduce the size of the
modulo operation required by the CRT.
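New Chinese Remainder Theorem I (New CRT-I): Given
the residue number $(x_1, x_2, \ldots, x_n)$ with respect to the moduli
$\{P_1, P_2, \ldots, P_n\}$, the binary number $X$ can be computed by
$$X = \left| x_1 + k_1 P_1 (x_2 - x_1) + k_2 P_1 P_2 (x_3 - x_2) + \cdots + k_{n-1} P_1 P_2 \cdots P_{n-1} (x_n - x_{n-1}) \right|_{P_1 P_2 \cdots P_n} \quad (4)$$
which can be easily simplified as
$$X = x_1 + P_1 \left| k_1 (x_2 - x_1) + k_2 P_2 (x_3 - x_2) + \cdots + k_{n-1} P_2 \cdots P_{n-1} (x_n - x_{n-1}) \right|_{P_2 \cdots P_n} \quad (5)$$
where $|k_1 P_1|_{P_2 \cdots P_n} = 1$, $|k_2 P_1 P_2|_{P_3 \cdots P_n} = 1$, $\ldots$,
$|k_{n-1} P_1 P_2 \cdots P_{n-1}|_{P_n} = 1$.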
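Based on the New CRT-I, we have the following theorem for
$n = 3$.
Theorem 1: For a three moduli set $\{P_1, P_2, P_3\}$, the binary
number $X$ can be calculated as
$$X = x_1 + P_1 \left| k_1 (x_2 - x_1) + k_2 P_2 (x_3 - x_2) \right|_{P_2 P_3} \quad (6)$$
where $|k_1 P_1|_{P_2 P_3} = 1$, and $|k_2 P_1 P_2|_{P_3} = 1$.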
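The three-moduli form of the New CRT-I can be checked numerically. The sketch below assumes the form $X = x_1 + P_1 |k_1 (x_2 - x_1) + k_2 P_2 (x_3 - x_2)|_{P_2 P_3}$ with $|k_1 P_1|_{P_2 P_3} = 1$ and $|k_2 P_1 P_2|_{P_3} = 1$; the function name `new_crt1_three` is hypothetical.

```python
def new_crt1_three(x1, x2, x3, p1, p2, p3):
    """Three-moduli New CRT-I (assumed form, see lead-in)."""
    k1 = pow(p1, -1, p2 * p3)        # |k1 * P1| mod (P2 * P3) = 1
    k2 = pow(p1 * p2, -1, p3)        # |k2 * P1 * P2| mod P3 = 1
    inner = (k1 * (x2 - x1) + k2 * p2 * (x3 - x2)) % (p2 * p3)
    return x1 + p1 * inner           # no final modulo-M reduction needed


# 407 has residues 7, 2, 1 with respect to the moduli 8, 9, 7.
print(new_crt1_three(7, 2, 1, 8, 9, 7))   # 407
```

The only modulo operation is of size $P_2 P_3$ rather than $P_1 P_2 P_3$, which is the reduction the New CRT-I is meant to provide.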
In Section III, we apply (6) to the moduli set $(2^n - 1, 2^n, 2^n + 1)$
to design the residue to binary converters.
III. BASIC FORMULAS
The following Theorem 2 is a direct application of The-
orem 1.
Theorem 2: For the moduli set , the
number can be computed from by the formula
(7)
Proof: Using Theorem 1 and assuming that ,
, and , we have and
such that and
. Therefore, we have the first equation at the
bottom of the page.
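To make the specialization concrete, the sketch below applies the three-moduli New CRT-I to the special moduli set under an assumed ordering $P_1 = 2^n$, $P_2 = 2^n + 1$, $P_3 = 2^n - 1$. This ordering, the resulting constants $k_1 = 2^n$ and $k_2 = 2^{n-1}$, and the name `convert_special` are assumptions of this example and may differ from the notation of Theorem 2.

```python
def convert_special(x_pow2, x_plus, x_minus, n):
    """Residue-to-binary for moduli (2^n, 2^n + 1, 2^n - 1), n >= 2.

    x_pow2 = X mod 2^n, x_plus = X mod (2^n + 1), x_minus = X mod (2^n - 1).
    """
    k1 = 1 << n            # 2^n * 2^n = 2^(2n), which is 1 mod (2^(2n) - 1)
    k2 = 1 << (n - 1)      # 2^n * (2^n + 1) is 2 mod (2^n - 1); 2 * 2^(n-1) is 1
    modulus = (1 << (2 * n)) - 1       # P2 * P3 = 2^(2n) - 1
    y = (k1 * (x_plus - x_pow2)
         + k2 * ((1 << n) + 1) * (x_minus - x_plus)) % modulus
    return x_pow2 + (y << n)           # X = x + 2^n * Y


print(convert_special(7, 2, 1, 3))     # 407
```

Under this ordering, both constants are powers of two, so the multiplications reduce to shifts and the remaining work is an addition modulo $2^{2n} - 1$.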
Proposition 1: For any integers and , we have
.
Proof: Letting , we have that
and therefore, we have the following proposition.
Proposition 2: can be computed by (8.1)–(8.4), shown at
the bottom of the page, where and are the least signifi-
cant bits of and , respectively, and denotes the
XOR operation, i.e., XOR .
Proof: Let
in (7); then, we have , and
.
Since
, we denote ,
, and therefore, we have the last
equation at the bottom of the page, i.e., we have
Next, we present an example using the above formulas.
Example: Consider the example shown in [6]. Let
and a number 407, which can be rep-
resented as (1, 7, 2) in the moduli set (7, 8, 9).
Now, given (1, 7, 2) = (001, 111, 0010), we have the equation
at the bottom of the next page. Compared with the long calcu-




(8.4)
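The example of 407 with residues (001, 111, 0010) for the moduli (7, 8, 9) can be reproduced at the bit level. The sketch below is an illustration under assumptions: it uses a modulo $2^{2n} - 1$ expression derived from the three-moduli New CRT-I rather than the exact equations (8.1)–(8.4), and `decode_bitstrings` is a hypothetical name.

```python
def decode_bitstrings(b1, b2, b3):
    """Decode residues given as bit strings for moduli (2^n - 1, 2^n, 2^n + 1).

    b1 and b2 are n-bit strings, b3 is an (n + 1)-bit string.
    Returns the 3n-bit result as a bit string.
    """
    n = len(b2)
    x1, x2, x3 = int(b1, 2), int(b2, 2), int(b3, 2)
    modulus = (1 << (2 * n)) - 1
    y = ((1 << n) * (x3 - x2)
         + (1 << (n - 1)) * ((1 << n) + 1) * (x1 - x3)) % modulus
    y_bits = format(y, f"0{2 * n}b")     # 2n most significant bits
    return y_bits + b2                   # x2 supplies the n least significant bits


bits = decode_bitstrings("001", "111", "0010")
print(bits)            # 110010111
print(int(bits, 2))    # 407
```

Because the residue modulo $2^n$ is copied unchanged into the low-order bits, the only arithmetic is the computation of the $2n$ high-order bits.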
Fig. 1. Compute A using n FAs.
IV. NEW CONVERTERS
In Section III, we presented the necessary formulas for
residue to binary conversion. In this section, we propose new
converters using $2n$-bit or $n$-bit adders based on the formulas
(8.1)–(8.4).
A. Basic Operations to Compute $A$ and $B$
We have to compute the numbers
and in
order to obtain the values of
and
Let .
If , then ,
; , which implies that
, . Therefore, we have
(9)







is shown in Fig. 1(a) and (b). Fig. 1(b) shows the block dia-
gram of the unit. It consists of FAs, two MUXs, one XOR
gate, and inverters. The delay of this unit is the delay
of the FA plus the delay of an inverter and the delay of a MUX.
The circuit produces two numbers and
. We denote
and , and then, .
Fig. 2. Compute B using n FAs.
Fig. 3. 2n-bit adder-based converters.
Next, we perform the addition
using FAs. The signal is connected to the carry-in bit
of the full adder at the last FA in Fig. 2(a). Fig. 2(b) shows the
block diagram of the unit. It consists of inverters, one HA,
and FAs. The delay of this unit (nFA2) is the delay of a full
adder plus the delay of an inverter. The circuit produces two
numbers ,
. We denote
and , and then,
.
Therefore, , as defined in (8.2), now becomes
i.e.,
(11)
where , , , and are all -bit numbers; is a
one-bit number.
The addition in (11) can be done in many different ways using
$2n$-bit or $n$-bit adders. These different implementations will be
shown in the following.
B. $2n$-Bit Adder-Based Converter—Converter I
In the following, we present the new Converter I implementing
the addition in (11) using a $2n$-bit adder.
where , and
are two
-bit numbers, and is a one-bit number.
In Fig. 3(a), the units nFA1 and nFA2, which are used to produce
, , , and , are connected to a $2n$-bit 1's complement
adder. The $2n$-bit adder produces the value , which
forms the MSBs of the number , whereas forms the
LSBs of .
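The behavior of a 1's complement adder (an adder with end-around carry) can be modeled in software. The sketch below is an illustrative model, not the circuit of Fig. 3(a); the names `eac_add` and `canonical` are hypothetical, and folding the all-ones pattern to zero is an assumption of this example.

```python
def eac_add(a, b, width):
    """Add two width-bit numbers with end-around carry.

    The result is congruent to a + b modulo 2^width - 1. The all-ones
    pattern can appear and stands for zero (the second zero of 1's complement).
    """
    mask = (1 << width) - 1
    s = a + b
    return (s & mask) + (s >> width)     # feed the carry-out back into bit 0


def canonical(value, width):
    """Map the all-ones pattern to 0 so the result lies in [0, 2^width - 2]."""
    return 0 if value == (1 << width) - 1 else value


print(eac_add(60, 10, 6))                 # 7, since 70 mod 63 = 7
print(canonical(eac_add(62, 1, 6), 6))    # 0, since 63 mod 63 = 0
```

A single end-around carry is enough: when the first addition overflows, adding the carry back cannot overflow again for operands that fit in `width` bits.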
The hardware required in the new Converter I shown
in Fig. 3(a) is as follows: FAs, one HA, two MUXs,
one XOR gate, inverters, and one $2n$-bit 1s complement
adder. The delay of the converter is the sum
of the delay of the FA , the delay of an inverter ,
the delay of MUX , and the delay of the $2n$-bit
1s complement adder [7], i.e.,
.
Fig. 4. Converter II—using four n-bit adders.
In the literature, one of the best converters using $2n$-bit adders
is presented in [7]. In order to compare the performance, we
show the main components used in the converter proposed in
[7] as Fig. 3(b). The delay in [7] is
. For simplicity reasons, we only compare one version
of the implementation in [7]. The second implementation has
the same result. From the side-by-side comparison, it is easy to
see that we save one $2n$-bit CSA with end around carry (EAC).
A detailed comparison with the other related converters is
summarized in Table I, where the data for [8], [9], and [11] are from
[7, Table I]. In summary, Converter I is the best converter using
$2n$-bit adders, using about half of the hardware used in [7]. The
reason for such improvement is that the converters in [8], [9],
[11], and [18] use the formula , where
, , , and are bit numbers obtained from ,
whereas the new Converter I is derived based on the New Chi-
nese Remainder Theorem I and is computed by
, which reduces the four-
number operation into two numbers.
C. $n$-Bit Adder-Based Converters—Converter II and III
The addition in (11) can also be done by $n$-bit adders. In this
section, we propose two such converters. The performance is to
be compared with the performance of the converters in [6], [15],
and [18], which use $n$-bit adders as well. Since we can only
generate $n$-bit numbers using $n$-bit adders, we therefore obtain
the value in the form , where and are
both $n$-bit binary numbers.
Recall that ,
where , , , and are all -bit numbers; is a
one bit number. Using an -bit adder, we can add and
together with , which generates a sum and a carry .
Similarly, we can add and using an -bit adder, which
generates a sum and a carry . Since the addition is modulo $2^{2n} - 1$
addition, the carry represents a number that should
be added to the number . For the case where
the carry is 0, the sum is the value . For the case where
the carry is 0, the sum is the value such that
. However, when the carries and are not 0, the
value and must be modified to obtain the correct value
of and . In the following, we propose Converter II and
III for the operation. Compared with Converter II, Converter III
achieves faster speed while using more hardware.
Converter II: In Fig. 4, we use two carry look ahead (CLA)
adders to perform the operation and in
parallel. The results are denoted as and with carry
and , respectively. If , we have
and . Similarly, two CLAs are used to
perform and , whereas the results are
denoted as and with carry and . If ,
we have and .
The selector module selects the correct carry and the correct
sum for the number and . The function of the selector is
described in the following.
If and , then
Else if , then
If ,
else
Else if , then
If ,
else
Therefore, the carry if or (
and ) or ( and ),
i.e., . Similarly,
.
The selector implements these two functions. Note here that
the selector does not introduce any extra delay since CLAs are
used, and the carries , , , and are generated during
the carry-generation phase of the CLAs and are available for
evaluation to the selector while the CLAs perform the summa-
tion.
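The idea of computing both carry-in cases in parallel and then selecting the consistent pair can be sketched in software. This is a generic carry-select arrangement for addition modulo $2^{2n} - 1$ built from four $n$-bit additions; it is not the selector logic of Fig. 4, it ignores the extra one-bit operand, and the names `add_n` and `add_mod_via_halves` are hypothetical.

```python
def add_n(a, b, cin, n):
    """n-bit adder model: return (sum mod 2^n, carry-out)."""
    t = a + b + cin
    return t & ((1 << n) - 1), t >> n


def add_mod_via_halves(a, b, n):
    """Add two 2n-bit numbers modulo 2^(2n) - 1 using four n-bit additions.

    The all-ones result can occur and stands for zero.
    """
    mask = (1 << n) - 1
    a_lo, a_hi = a & mask, a >> n
    b_lo, b_hi = b & mask, b >> n
    # Four additions that could run in parallel: each half with carry-in 0 and 1.
    lo = [add_n(a_lo, b_lo, c, n) for c in (0, 1)]
    hi = [add_n(a_hi, b_hi, c, n) for c in (0, 1)]
    # Selection: the end-around carry is the carry out of the upper half
    # when the lower half is added without carry-in.
    eac = hi[lo[0][1]][1]
    s_lo, c_lo = lo[eac]          # lower half receives the end-around carry
    s_hi, _ = hi[c_lo]            # upper half receives the lower half's carry
    return (s_hi << n) | s_lo


print(add_mod_via_halves(60, 10, 3))    # 7, since 70 mod 63 = 7
```

Only carry bits drive the selection, which is why a selector of this kind can work from carries that are available before the sums are complete.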
The hardware required in Fig. 4 includes FAs, one HA,
MUXs, one XOR gate, inverters (including four
inverters for the selector), two AND gates for the selector, and
four $n$-bit CLAs. The delay of the converter is
.
Converter III: Considering the fact that
and , we can replace the CLA2 and CLA4 in
Fig. 4 by other combinational circuits that perform the operation
and . Fig. 5 shows such a
converter.
Fig. 5. Converter III—using 2 n-bit adders.
The circuit plus1 performs the function of adding 1 to an $n$-bit
input number. Consider ,
. We have the
following equations, which imply that the circuit plus1 requires
XOR gates and AND gates plus 1 inverter.
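A plus-1 circuit of this kind can be modeled as a chain of XOR and AND operations with a single inverter on the least significant bit. The sketch below is one common incrementer structure, written as an illustration; the gate arrangement and the name `plus1` are assumptions and are not taken from Fig. 5.

```python
def plus1(bits):
    """Add 1 to a number given as a list of bits, least significant bit first.

    Returns (sum_bits, carry_out).
    """
    s = [1 - bits[0]]            # inverter: adding 1 flips bit 0
    carry = bits[0]              # a carry propagates only if bit 0 was 1
    for a in bits[1:]:
        s.append(a ^ carry)      # XOR gate produces the sum bit
        carry = a & carry        # AND gate extends the carry chain
    return s, carry


print(plus1([1, 1, 0]))          # ([0, 0, 1], 0): 3 + 1 = 4
print(plus1([1, 1, 1]))          # ([0, 0, 0], 1): 7 + 1 = 8
```

Compared with a full carry look ahead adder, the incrementer needs no second operand, which is the source of the hardware saving when it replaces an adder whose other input is a constant 1.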
The hardware required in Fig. 5 includes FAs,
MUXs, XOR gate, inverters (including
four inverters for the selector and two for the plus-1 circuit),
AND gates for the selector and the plus-1 circuit, one
HA, and two $n$-bit CLAs. The delay of the converter is
.
To make a clear comparison, Fig. 6 shows the main
components of the converter proposed in [6]. No detailed
implementation is given for each module in [6]. We evaluate the
performance based on [4]. Recently, the results in [4] were also
used to evaluate the performance in [7]. Modules M1 and M2
require two CLAs and one CSA, where all are $n$-bit adders,
one XOR for generating C1, and inverters for 2s complement
operation. M3 and M4 require two additional CPAs and
inverters for 2s complement operation. Module M6 uses
nine AND gates, one OR gate, eight inverters, and one XOR gate.
M5 uses bit memory to store the value. The delay is
.
Two more recent converters using $n$-bit adders have also been
proposed in [15] and [18]. The one in [18] is based on the
approach in [7] and, therefore, has high hardware cost, whereas
the one in [15] has a hardware cost similar to that of the new
converters proposed here. However, the converter in [15] has some unused
dynamic range.
Table II summarizes the comparison of the two converters
proposed in this paper as well as the converter in [6], [15], and
[18], where CII stands for Converter II, whereas CIII stands for
Converter III.
Assume , , ,
, ; then the delay of Converter II
is . The delay
of Converter III is
. The delay of the converter in [6] is
. The
delay of Converter II is almost half of the delay of the converter
in [6].
Fig. 6. Converter proposed in [6].
Assume the straightforward implementation of the CLA,
which consists of a carry look-ahead unit and a summation
unit, which, in total, require AND gates,
XOR gates, and OR gates. The hardware requirement in
[6] is even higher than the hardware required in Converter III,
whereas its delay is longer.
V. CONCLUSION
Three different residue-to-binary converters for the special
moduli set $(2^n - 1, 2^n, 2^n + 1)$ have been presented in this paper.
Compared with various previously proposed converters, the new
converters proposed here have better performance in terms of
speed and area. The new converters are designed based on the
recently introduced New Chinese Remainder Theorems. It is ex-





erees, which have improved the quality of the manuscript.
REFERENCES
[1] H. L. Garner, “The residue number system,” IRE Trans. Electron.
Comput., vol. EC-8, pp. 140–147, June 1959.
[2] N. Szabo and R. Tanaka, Residue Arithmetic and its Applications to
Computer Technology. New York: McGraw-Hill, 1967.
[3] M. A. Soderstrand et al., Ed., Residue Number System Arithmetic:
Modern Applications in Digital Signal Processing. New York: IEEE,
1986.
[4] Y. Hwang, Computer Arithmetic Principles, Architecture, and De-
sign. New York: Wiley, 1979.
[5] A. Ashur, M. K. Ibrahim, and A. Aggoun, “Novel RNS structures for
the moduli set $(2^n - 1, 2^n, 2^n + 1)$ and their application to digital filter
implementation,” Signal Process., vol. 46, pp. 331–343, 1995.
[6] D. Gallaher, F. Petry, and P. Srinivasan, “The digital parallel method
for fast RNS to weighted number system conversion for specific moduli
$(2^k - 1, 2^k, 2^k + 1)$,” IEEE Trans. Circuits Syst. II, vol. 44, pp. 53–57,
Jan. 1997.
[7] S. Piestrak, “A high-speed realization of a residue to binary number
system converter,” IEEE Trans. Circuits Syst. II, vol. 42, Oct. 1995.
[8] K. Ibrahim and S. Saloum, “An efficient residue to binary converter de-
sign,” IEEE Trans. Circuits Syst., vol. 35, pp. 1156–1158, Sept. 1988.
[9] S. Andraos and H. Ahmad, “A new efficient memoryless residue to
binary converter,” IEEE Trans. Circuits Syst., vol. 35, pp. 1441–1444, Nov.
1988.
[10] F. Taylor and A. S. Ramnarynan, “An efficient residue-to-decimal con-
verter,” IEEE Trans. Circuits Syst., vol. CAS-28, Dec. 1981.
[11] A. Dhurkadas, “An efficient residue to binary converter design,” IEEE
Trans. Circuits Syst., vol. 37, pp. 849–850, June 1990.
[12] Y. Wang and M. Abd-el-Barr, “A new algorithm for RNS decoding,”
IEEE Trans. Circuits Syst. I, vol. 43, pp. 998–1001, Dec. 1996.
[13] Y. Wang, “Residue-to-binary converters based on new Chinese remainder
theorems,” IEEE Trans. Circuits Syst. II, pp. 197–206, Mar. 2000.
[14] Y. Wang, “New Chinese remainder theorems,” Proc. 32nd Asilomar Conf.
Signals, Syst., Comput., vol. 1, pp. 165–171, 1998.
[15] R. Conway and J. Nelson, “Fast converter for 3 moduli RNS using new
property of CRT,” IEEE Trans. Comput., vol. 48, pp. 852–860, Aug.
1999.
[16] B. Vinnakota and V. V. B. Rao, “Fast conversion techniques for
binary-residue number systems,” IEEE Trans. Circuits Syst. I, vol. 41,
pp. 927–929, Dec. 1994.
[17] W.J. Jenkins, “Techniques for residue-to-analog conversion for residue-
encoded digital filters,” IEEE Trans. Circuits Syst., vol. CAS-25, pp.
555–562, July 1978.
[18] M. Bhardwaj, A. B. Premkumar, and T. Srikanthan, “Breaking the 2n-bit
carry propagation barrier in residue to binary conversion for the
$(2^n - 1, 2^n, 2^n + 1)$ module set,” IEEE Trans. Circuits Syst. II, vol. 45, pp.
998–1002, Sept. 1998.
Yuke Wang received the B.Sc. degree from the University
of Science and Technology of China, Hefei, in
1989 and the M.Sc. and Ph.D. degrees from the Uni-
versity of Saskatchewan, Saskatoon, SK, Canada, in
1992 and 1996, respectively.
He has held faculty positions at Concordia Uni-
versity, Montreal, QC, Canada, and Florida Atlantic
University, Boca Raton. Currently, he is an Assistant
Professor with the Computer Science Department,
University of Texas at Dallas, Richardson. He has
also held Visiting Assistant Professor positions
with the University of Minnesota, Minneapolis, the University of Maryland,
College Park, and the University of California, Berkeley. His research interests
include VLSI design of circuits and systems for DSP and communications,
computer-aided design, and computer architectures. From 1996 to 2001, he
published about 60 papers, about 20 of which appeared
in IEEE/ACM Transactions. He is an Editor of Applied Signal Processing.
Dr. Wang is currently an Associate Editor of the IEEE TRANSACTIONS ON
CIRCUITS AND SYSTEMS, PART II and of the IEEE TRANSACTIONS ON VLSI
SYSTEMS.
Xiaoyu Song received the M.S. and Ph.D. degrees in
computer engineering from University of Pisa, Pisa,
Italy, in 1987 and 1992, respectively.
From 1992 to 1997, he was a Faculty Member
with the University of Montreal, Montreal, QC,
Canada. He was a senior member of consulting
staff at Cadence, San Jose, CA. He is currently
an Associate Professor with the Department of
Electrical and Computer Engineering, Portland State
University, Portland, OR. His research interests
include IC and VLSI circuit design, testing and
verification, systems on a chip, and synthesis. He serves on the Editorial Board
of VLSI Design: An International Journal of Custom-Chip Design, Simulation,
and Testing.
Dr. Song is an editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS
and the IEEE TRANSACTIONS ON VLSI SYSTEMS. He has served on many
program committees, such as the IEEE International Conference on Quality of
Electronics and the ACM International Conference on System-Level Interconnect
Prediction.
Mostapha Aboulhamid (M’82) received the Ph.D.
degree from the University of Montreal, Montreal,
QC, Canada, in 1985 and the engineering degree
from the University of Grenoble, Grenoble, France,
in 1974.
He is currently a Professor at the University of
Montreal. His research interests are in hardware/software
modeling and synthesis, paradigms for design
reuse, and hardware description languages.
Hong Shen received the B.Eng. degree from Beijing
University of Science and Technology, Beijing,
China, the M.Eng. degree from the University of
Science and Technology of China, Hefei, and the
Ph.Lic. and Ph.D. degrees from Abo Academi
University, Abo, Finland, all in computer science.
He is currently a Professor of computer science
with the Graduate School of Information Science,
Japan Advanced Institute of Science and Tech-
nology, Tatsunokuchi, Ishikawa. Previously, he
was a Professor with Griffith University, Brisbane,
Australia. He has published over 140 technical papers on algorithms, parallel
and distributed computing, interconnection networks, parallel databases and
data mining, multimedia systems, and networking. He has served as an editor
of Parallel and Distributed Computing Practice, Associate Editor of the Inter-
national Journal of Parallel and Distributed Systems and Networks, editorial
board member of Parallel Algorithms and Applications, the International
Journal of Computer Mathematics, and the Journal of Supercomputing.
Dr. Shen has been the chair/committee member of various international con-
ferences.