Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues by Premkumar, A. B. et al.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133
Improved Memoryless RNS Forward Converter
Based on the Periodicity of Residues
A. B. Premkumar, Senior Member, IEEE, E. L. Ang, and Edmund M.-K. Lai, Senior Member, IEEE
Abstract—The residue number system (RNS) is suitable for DSP
architectures because of its ability to perform fast carry-free arithmetic. However, this advantage is overshadowed by the complexity involved in the conversion of numbers between binary and RNS representations. Although the reverse conversion (RNS to binary) is more complex, the forward transformation is not simple either. Most forward converters make use of look-up tables (memory). Recently, a memoryless forward converter architecture for arbitrary moduli sets was proposed by Premkumar in 2002. In this paper, we present an extension to that architecture which results in 44% less hardware for parallel conversion and achieves 43% improvement in speed for serial conversions. It makes use of the periodicity properties of residues obtained using modular exponentiation.
Index Terms—Forward and reverse converters, periodicity
property, processing elements, residue number system (RNS).
I. INTRODUCTION
ARITHMETIC operations based on residue number systems (RNS) can be carried out without intermediate carry digits. However, intermodular operation and conversion between number systems are awkward and have prevented the widespread use of RNS. Converting a number from a binary representation to its RNS equivalent is known as forward conversion while the inverse operation is called reverse conversion. Even though reverse conversion is generally more complex, forward conversion for arbitrary moduli sets is not simple either.
For special moduli sets of the type, forward converters
require only modular adders and therefore can be easily implemented. However, forward conversion for arbitrary moduli sets is memory intensive. There are three main approaches for forward conversion. The first approach involves precomputing all possible values that the conversion requires and storing these values in memory [2], [3]. The second approach involves using efficient arithmetic units called processing elements (PE) along with some memory. In both cases, the memory size requirement increases as the dynamic range increases. The third approach is memoryless in that it involves only combinatorial logic in the design. A framework for memoryless forward conversion has recently been introduced [1].
In this paper, we present an improvement to the memoryless architecture in [1]. The improved design makes use of the cyclic properties of residues to reduce the hardware requirement. This paper is organized as follows. Residue numbers and some early forward converters are introduced in Section II. In Section III, the converter proposed in [1] is briefly presented. Periodicity properties of residues are discussed in Section IV. An improved converter which combines the periodicity property with the converter proposed in [1] is then formulated in Section V. The advantages of the new converter are discussed in Section VI and comparisons between converters are made.
Manuscript received February 17, 2004; revised April 17, 2005. This paper was recommended by Associate Editor S. Rosing.
The authors are with the School of Computer Engineering, Nanyang Technological University, Singapore.
Digital Object Identifier 10.1109/TCSII.2005.857090
II. RNS AND ROM-BASED FORWARD CONVERSION
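Any $n$-bit nonnegative integer $X$ in the range $0 \le X \le 2^{n}-1$ can be represented in the weighted binary system as $X = \sum_{i=0}^{n-1} b_i 2^{i}$, where $b_i \in \{0, 1\}$. In RNS, $X$ is represented by a set of residues as $(x_1, x_2, \ldots, x_k)$, where $x_j = X \bmod m_j$. In this system, the set $\{m_1, m_2, \ldots, m_k\}$ constitutes the moduli set [4]. The range of the RNS representation is given by $M = \prod_{j=1}^{k} m_j$.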
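As a minimal sketch of the definitions above, the following Python functions map an integer to its residues and compute the range of the representation. The moduli set (3, 5, 7) and the function names are hypothetical choices made only for illustration; they are not taken from the paper.

```python
from math import gcd, prod

def to_rns(x, moduli):
    """Forward conversion: the residue of x with respect to each modulus."""
    return tuple(x % m for m in moduli)

def dynamic_range(moduli):
    """Product of the moduli, assuming they are pairwise coprime."""
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError("moduli must be pairwise coprime")
    return prod(moduli)

MODULI = (3, 5, 7)            # hypothetical example set
print(to_rns(52, MODULI))     # (1, 2, 3)
print(dynamic_range(MODULI))  # 105
```

The software `%` operator hides exactly the work that a hardware forward converter has to perform, which is why the rest of the paper concentrates on how that reduction is realized.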
Converting a number from its binary representation to an
RNS representation is performed by the forward converter. We
shall briefly describe some earlier converters as well as a more
recent one [1] on which the present work is based.
In Alia and Martinelli’s method [2], the residue corresponding to the $i$th bit of an $n$-bit binary number with respect to a modulus $m$ is determined and stored in a register. Two such registers storing the residues of adjacent bits are combined in a PE. For a given $n$-bit number, depending on the value of the corresponding bit, either the register content or a 0 is output. The outputs of two adjacent PEs are combined using a modular adder. This first level of computation requires modular adders. The outputs from several adders are combined in a second level of adders and this process is continued until the final residue is obtained. A modification to this method was proposed by the same authors [3], in which the $n$-bit word is partitioned into smaller words. The residues corresponding to each group of bits are stored in a ROM and are then accessed and added by modular adders. Capocelli and Giancarlo [5] suggested storing the residue corresponding to the first bit in a group of bits, then doubling this residue modulo $m$ to evaluate the residue of the next power of 2 in that group. This process can be pipelined and the residues corresponding to only one group of bits need to be stored. Hence, this results in more efficient use of ROM.
Piestrak, in [6] and subsequently in [7], [8], proposed residue generators based on the periodic properties of residues. His residue converters are based on half and full periods. They make use of two different periods, namely, the period of odd moduli and the half period of odd moduli. The period exists for all such moduli, while the half period exists only for some odd moduli. Both these periods are calculated using recursion, and a table of periods and half periods for various moduli is generated. Using properties associated with periodicity and half periodicity, Piestrak proposed four architectures that are suitable for residue generators with respect to odd moduli. Ananda Mohan [9] proposed partitioning the given $n$-bit number based on the cyclic property of $2^{i} \bmod m$. Using the fact that $2^{i}, 2^{i+p}, 2^{i+2p}, \ldots$ have the same residues due to the periodicity $p$, fields of $p$ bits are added. The width of the result is confined to $p$ bits by adding the carry bit resulting from the addition to the result. The residue is then determined using techniques given in [2], [3]. This results in similar PEs being used in the transformation algorithm. Ananda Mohan later used these principles to propose residue converters using ROM.
Although minor architectural differences exist, the converters
discussed so far invariably use ROM. In the next section, a
more recent work on forward converters that does not require
the use of memory is considered.
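The bit-level arithmetic shared by the converters surveyed above (add the residue of a power of 2 whenever the corresponding bit is set, and obtain the residue of the next power of 2 by doubling modulo the modulus) can be sketched in software as follows. This only illustrates the arithmetic; it does not model the ROM or PE organization of any cited design, and the function name is hypothetical.

```python
def residue_by_doubling(x, m):
    """Compute x mod m by accumulating the residues of the powers of 2 present in x."""
    acc = 0
    power = 1 % m                      # residue of 2**0
    while x:
        if x & 1:
            acc = (acc + power) % m    # modular addition of this bit's residue
        power = (2 * power) % m        # residue of the next power of 2
        x >>= 1
    return acc

print(residue_by_doubling(421185084, 23))  # 22
```

The test value 421 185 084 and modulus 23 are the ones used later in Example 3.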
III. FORWARD CONVERSION BASED ON
MODULAR EXPONENTIATION
In the context of the present work, modular exponentiation refers to evaluating $|2^{x}|_{m}$, the residue of a power of 2, for some modulus $m$. Modular exponentiation synthesizes the residue of a power of 2 with respect to a modulus using only combinatorial logic circuits. Hence, it does not need any memory or PE in residue computations. For the sake of completeness, we shall briefly introduce modular exponentiation by means of an example. A complete treatment of modular exponentiation can be found in [1].
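Example 1: Determine the residue of $|2^{x}|_{13}$, where $x = x_3 x_2 x_1 x_0$ is a 4-bit exponent and
$|2^{x}|_{13} = \left| 2^{8x_3}\, 2^{4x_2}\, 2^{2x_1}\, 2^{x_0} \right|_{13}$ (1)
In the above, the $x_1$ and $x_0$ bits are assigned to multiplex the functions synthesized using the $x_3$ and $x_2$ bits. In synthesizing the functions, four different cases have to be considered.
1) Case 1: If $x_1 = x_0 = 0$, (1) reduces to 1, 3, 9, and 1 for $x_3 x_2 = 00$, $01$, $10$, and $11$, respectively. A function can be synthesized for different combinations of $x_3$ and $x_2$ as shown in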
(2)
When the function is evaluated for different combinations of
and with , we have .
2) Case 2: If and , (1) yields
for different combinations of and . A second function
can be obtained as follows:
or (3)
3) Case 3: Similarly, function can be obtained for
and
or (4)
4) Case 4: Function is obtained when
or (5)
In all of the above functions, $\oplus$ denotes the bitwise exclusive-or logical operation. These functions use the positional weights of the binary number system in arriving at the values for different combinations of the exponent bits. Hence, they can be written as follows:
(6)
or (7)
or (8)
or (9)
Forward conversion from binary to residues can be accomplished by determining the residue of each nonzero power-of-2 term of the given binary number.
The conversion can either be performed on each nonzero bit sequentially or on all bits simultaneously in parallel. Trade-offs can be made between hardware cost and delay by using a combination of serial and parallel methods. Hardware complexity for the parallel implementation mainly arises from the multiplexers, since the gates used to generate the minterms of the functions can be shared among different exponents within the same modulus as well as among different moduli.
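The multiplexing structure of Example 1 can be mimicked in software. The sketch below assumes that the modulus of Example 1 is 13; this is an inference from the Case 1 values 1, 3, 9, 1, which coincide with $2^{0}, 2^{4}, 2^{8}, 2^{12}$ modulo 13. The lookup lists stand in for the synthesized logic functions, whose exact gate-level form is not reproduced here, and all names are hypothetical.

```python
M = 13  # assumed modulus, inferred from the Case 1 values 1, 3, 9, 1

def case_table(x1, x0):
    """Residues of 2**(8*x3 + 4*x2 + 2*x1 + x0) mod M for x3x2 = 00, 01, 10, 11."""
    return [pow(2, 8 * x3 + 4 * x2 + 2 * x1 + x0, M)
            for x3 in (0, 1) for x2 in (0, 1)]

# One table (one "function") per setting of the multiplexing bits (x1, x0).
TABLES = {(x1, x0): case_table(x1, x0) for x1 in (0, 1) for x0 in (0, 1)}

def mod_exp(x):
    """Select a function with (x1, x0), then evaluate it at (x3, x2)."""
    x3, x2, x1, x0 = (x >> 3) & 1, (x >> 2) & 1, (x >> 1) & 1, x & 1
    return TABLES[(x1, x0)][2 * x3 + x2]

print(TABLES[(0, 0)])  # [1, 3, 9, 1]
```

In hardware, the two low-order exponent bits would drive the select lines of a 4-to-1 multiplexer, while the two high-order bits feed the logic that produces each function.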
IV. PERIODICITY PROPERTIES OF RESIDUES
An interesting property of determining the residue of $2^{x}$ is the cyclic nature of its residues. Some observations are made about the periodicity properties of these residues.
1) For certain odd moduli $m$, there exists a period given by $m-1$ after which the residues repeat. Consider
. The residues of for different are
The periodicity of the residues is
. Whenever the period is $m-1$, it is referred to as the basic period.
2) There exists, for certain other odd moduli $m$, a period that is shorter than the basic period after which the residues repeat.
3) In the case of even moduli other than those that belong to the $2^{n}$ family, a short period exists after an initial subset of residues [10], [11]. These three properties are illustrated in Example 2.
Example 2: Consider . The residues of
for different are given by the set .
The residues repeat after . In this case, there exists a pe-
riod that is shorter than the basic period, namely, 6. Now, con-
sider . The corresponding residue set in this case is
. It is observed that there is a short
period of 4 after the first residue, 1.
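The three kinds of behavior described in this section can be observed numerically. The following sketch returns the length of the initial non-repeating part and the period of the sequence $2^{x} \bmod m$. The moduli 19 and 23 appear elsewhere in the paper; modulus 10 is a hypothetical even example chosen here for illustration, and the function name is an assumption.

```python
def residue_cycle(m):
    """Return (preperiod, period) of the sequence 2**x mod m for x = 0, 1, 2, ..."""
    seen = {}                  # residue -> first exponent at which it appeared
    r, x = 1 % m, 0
    while r not in seen:
        seen[r] = x
        r = (2 * r) % m
        x += 1
    return seen[r], x - seen[r]

for m in (19, 23, 10):
    print(m, residue_cycle(m))
# 19 (0, 18)   basic period m - 1
# 23 (0, 11)   period shorter than m - 1
# 10 (1, 4)    period of 4 after one initial residue
```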
V. FORWARD CONVERSION BASED ON
PERIODICITY PROPERTIES
In this section, we propose an improved forward converter
using modular exponentiation and periodicity property of the
residues discussed in the previous section. The proposed con-
verter differs from other converters in that it does not use any
PEs but is based on the converter proposed in [1].
In the case of odd moduli, the residues have either a basic period or a short period. The given binary number is partitioned based on either of these periods. Without loss of generality, either the basic or the short period can be assumed to be $p$. Hence, $2^{i}, 2^{i+p}, 2^{i+2p}, \ldots$ will all have the same residues. The partitioned fields are added using $p$-bit adders. The length of the result is confined to $p$ bits by adding the carry bit back to the result. This is illustrated by the following example.
Example 3: Consider the following 32-bit number:
0001 1001 0001 1010 1100 0110 0011 1100
The decimal equivalent of this number is 421 185 084 and its residue with respect to modulus 23 is 22. From [9, Table II], the periodicity of the residues is 11. Therefore, the number is partitioned into three 11-bit fields and added as shown below. A zero has been appended at the MSB position to make the number 33 bits to facilitate easy partitioning into 11-bit fields.
  00001100100
  01101011000
  11000111100
  -----------
  00111111001
In the above, the top row corresponds to the most significant 11-bit field of the given binary number while the third row corresponds to the least significant 11-bit field. The last row is the result obtained by adding the three rows. In the addition, the carry of 1 is added to the LSB of the result so that the length of the final result is 11 bits. Forward conversion is now performed on the result using modular exponentiation on only 11 bits as opposed to all 32 bits as proposed in [1]. The residue computation of each power of 2 can be applied sequentially or in parallel. This is discussed in the next section along with a comparison between [1], [9] and the proposed converter.
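The partition-and-add step of Example 3 can be checked with a short sketch. It folds the number into fields whose width equals the period and wraps any carry back into the low-order bits; the function name is hypothetical, and the final `% 23` merely stands in for the modular exponentiation stage that would follow in the converter.

```python
def fold_fields(x, p):
    """Add the p-bit fields of x, adding any carry back so the result fits in p bits."""
    mask = (1 << p) - 1
    total = 0
    while x:
        total += x & mask
        x >>= p
    while total > mask:                        # end-around carry
        total = (total & mask) + (total >> p)
    return total

X = 421185084                  # the number of Example 3
folded = fold_fields(X, 11)    # the period for modulus 23 is 11
print(folded, folded % 23)     # 505 22
```

Folding preserves the residue because $2^{11} \equiv 1 \pmod{23}$, so every field carries the same weight modulo 23 and a carry out of bit 10 is equivalent to adding 1.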
It should be noted that even moduli frequently occur in the use
of residue number applications. One such moduli set is
. Hence, there is a need for forward conversion
for even moduli. Residue computation in this case can be per-
formed using the following theorem.
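Theorem 1: Let $m$ be an even integer that is not a power of 2. It can be expressed as $m = 2^{s} q$, where $s$ is a positive integer and $q$ is an odd number. The residue of an integer $X$ modulo $m$ can be expressed as
$|X|_{m} = 2^{s} \left| \left\lfloor X / 2^{s} \right\rfloor \right|_{q} + |X|_{2^{s}}$ (10)
Proof: $X = Qm + r$, where $Q$ is the quotient obtained by dividing $X$ by $m$ and $r$ is the residue. Let $m$ be even and composite, not belonging to the $2^{n}$ set. Let $m = 2^{s} q$. Then
$X = Q\, 2^{s} q + r$ (11)
$X$ can also be written as
$X = Q_1 2^{s} + r_1$ (12)
where $Q_1$ is the quotient and $r_1$ is the residue obtained by dividing $X$ by $2^{s}$. In this case
$r_1 = |X|_{2^{s}}$ (13)
However, $Q_1$ can be written as
$Q_1 = Q_2 q + r_2$ (14)
where $Q_2$ is the quotient and $r_2$ is the remainder obtained by dividing $Q_1$ by $q$. But $Q_1 = \lfloor X / 2^{s} \rfloor$ from (12). Since $r_2$ is the residue of $Q_1$ modulo $q$, $r_2 = \left| \lfloor X / 2^{s} \rfloor \right|_{q}$. We know that $r_2 \le q - 1$ and $r_1 \le 2^{s} - 1$, so that $2^{s} r_2 + r_1 \le m - 1$. From (12), (13), and (14)
$X = Q_2\, 2^{s} q + 2^{s} r_2 + |X|_{2^{s}}$ (15)
Comparing (11) with (15), we have
$r = |X|_{m} = 2^{s} \left| \left\lfloor X / 2^{s} \right\rfloor \right|_{q} + |X|_{2^{s}}$ (16)
Note that $|X|_{2^{s}}$ can be obtained by simply considering the least significant $s$ bits of $X$. The first term in (10) is obtained from the most significant $n - s$ bits of $X$, where $X$ is an $n$-bit number. Evaluation of $\left| \lfloor X / 2^{s} \rfloor \right|_{q}$ follows the procedure for odd modulus since $q$ is odd. This is illustrated by the following example.
Fig. 1. Serial implementation of the binary forms for Example 1.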
Example 4: To determine the residue of the same 32-bit number with respect to 24, the modulus is written as a composite number as $24 = 2^{3} \times 3$. The residue with respect to $2^{3}$ is simply the three LSBs and is therefore 4. The remaining 29 bits are partitioned into 2-bit fields since the periodicity of 3 is 2 [9, Table II]. The 2-bit fields are then added and the result is confined to 2 bits by adding the carry back to the result. The modular exponentiation is then performed and the residue is found to be 1. Using (10), the final residue works out to be $2^{3} \times 1 + 4 = 12$.
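The decomposition used in Example 4 can be verified numerically. The sketch below splits an even modulus into a power of 2 and an odd factor, takes the low-order bits directly, and reduces the remaining bits modulo the odd factor. The function name is hypothetical, and the built-in `%` on the high part stands in for the odd-modulus procedure of the converter.

```python
def residue_even_modulus(x, m):
    """x mod m for even m that is not a power of 2, using m = 2**s * q with q odd."""
    s, q = 0, m
    while q % 2 == 0:
        q //= 2
        s += 1
    if s == 0 or q == 1:
        raise ValueError("m must be even and not a power of 2")
    low = x & ((1 << s) - 1)     # least significant s bits
    high = (x >> s) % q          # residue of the remaining bits modulo the odd factor
    return (high << s) + low

print(residue_even_modulus(421185084, 24))  # 12
```

For the number of Example 3, the three low-order bits give 4 and the remaining 29 bits leave a residue of 1 modulo 3, which reproduces the value 12 quoted in Example 4.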
VI. COMPARATIVE ANALYSIS
Most digital signal processing (DSP) algorithms can be conve-
niently executed using a wordlength of 32 bits. A suitable moduli
set would be . Architectures using
modular arithmetic need to implement circuits that generate functions such as those given by (6) to (9). Note that there are several terms common to functions in these equations. Consequently, the circuit implementing these functions can be derived
from commonly occurring terms. The common terms can even be
shared among different moduli in the given set. In the implemen-
tation, the number of multiplexers required depends on whether
serial or parallel technique is employed. The number of gates
required for both serial and parallel techniques for our proposed
converter based on this moduli set is estimated and compared with that proposed in [1] and [9]. Figs. 1 and 2 show examples of the serial and parallel implementations of the proposed method. These figures are specific for Example 1. Reference [7, Table IV] discusses the number of gates that are required and delay issues with regard to specific moduli sets that do not use memory. The other entries in [7, Table IV] discuss comparison for arbitrary moduli sets. However, these converters use ROM and hence are not considered for comparison here. Since the proposed forward converter is for arbitrary moduli sets and does not use memory, its complexity has been compared with that in [1], [9]. Further, hardware and speed are the two parameters used in the comparison, and both are evaluated for modulus 19 only since this modulus has the longest period, namely, 18.
Fig. 2. Parallel implementation of the binary forms for Example 1.
A. Serial Method
For comparison with [1], a worst-case scenario in which all bits in the given number are nonzero is considered. In Tables I and II, the second column explains how the conversion is accomplished. The exponent 5 in column 2 indicates that there are 32 bits in the binary number. Out of the 5 exponent bits, the three MSBs are used to generate the functions. The two LSBs are used in multiplexing these functions.
These are clearly shown in the case of a 4-bit exponent in Example 1. Since each term in the given number is accessed serially and residue computation is performed taking one bit at a time, a single multiplexer is sufficient. Hence, for a 32-bit number, the residue computation is repeated a maximum of 32 times. Each computation requires 5 clock cycles since there are 5 bits in each residue and each bit in the residue is output sequentially. A total of 132 NAND gates is required for generating the functions. All 32 5-bit residues are added using a single modular adder to determine the final residue.
The proposed converter requires the given number to be partitioned into fields based on the periodicity and then added. In the chosen set of moduli, a maximum period of 18 occurs for modulus 19 [9]. Residue computation is performed 18 times and the 5-bit residues are added using a single modulo 19 adder. Each modulo 19 adder uses 5 full adders (FAs), 5 D flip-flops (FFs), two sets of five 2-to-1 multiplexers, and 5 inverters. The serial method uses one 4-to-1 multiplexer, which is built using three 2-to-1 multiplexers. Hence, the total number of 2-to-1 multiplexers required is 13. Each 2-to-1 multiplexer requires 9 NAND gates and hence a total of $13 \times 9 = 117$ NAND gates is required for multiplexers. The converter in [1] also requires a modulo 19 adder and a 4-to-1 multiplexer. Hence, the number of NAND gates required in that method is the same as that used in the proposed method. The converter in [9] uses memory access for computing residues and, for the chosen modulus 19, needs only 19 lookups or ROM locations. This is based on the assumption that a residue is read from the ROM each time an access is made. The addition time using carry-propagate adders is 10 full-adder delays, and this is common to all three methods. Table I compares the different architectures.
The proposed method is faster by about 43%. However, if we consider a more moderate distribution of ones and zeros in the given number, the increase in speed may not be as high as 43%, but it is still better than what is achieved in [1]. Since the number of gates in both methods is more or less the same, no comparison between the two methods in terms of hardware or area is made.
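The quoted speed-up follows directly from the worst-case cycle counts given above. The sketch below assumes, as the text does, 5 clock cycles per residue computation, and counts only those cycles (the adder delay, which is common to the methods, is left out). The function name is hypothetical.

```python
def serial_cycles(num_residue_computations, residue_bits=5):
    """Clock cycles when each residue bit is output sequentially."""
    return num_residue_computations * residue_bits

baseline = serial_cycles(32)   # [1]: one computation per bit of the 32-bit word
proposed = serial_cycles(18)   # proposed: one per bit of the 18-bit folded field
improvement = 100 * (baseline - proposed) / baseline
print(baseline, proposed, improvement)  # 160 90 43.75
```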
B. Parallel Method
In this method, residue computation is performed on all bits simultaneously and all the required terms are accessed in parallel. The residues that are output are added using a tree structure of adders. This increases the speed of conversion but at the expense of an increased number of multiplexers and adders. In the proposed converter, residue computation is performed only on 18 bits due to the periodicity of 18 for modulus 19. This requires 18 sets, each containing five 4-to-1 multiplexers. In all, for obtaining the residues of the powers of 2 alone, $18 \times 5 \times 3 = 270$ 2-to-1 multiplexers are needed. For each 5-bit adder, two 5-bit 2-to-1 multiplexers are needed and so for 18 adders, one needs $18 \times 2 \times 5 = 180$ 2-to-1 multiplexers. Hence, a total of $270 + 180 = 450$ 2-to-1 multiplexers are needed. Therefore, the total hardware requirement is 85 FAs, 450 2-to-1 multiplexers, 85 D latches and 85 inverters in addition to the 132 gates that are required to generate the logic functions.
TABLE I
SERIAL METHOD: COMPARISON BETWEEN [1], [9] AND PROPOSED CONVERTER
TABLE II
PARALLEL METHOD: COMPARISON BETWEEN [1], [9] AND PROPOSED CONVERTER
In [1], the conversion is accomplished using 160 4-to-1 multiplexers (equivalently, 480 2-to-1 multiplexers) and 31 modulo 19 adders connected in a tree structure. Therefore, the area requirement is that used by 155 FAs, 155 D latches, 155 inverters and 790 2-to-1 multiplexers. The method proposed in [9] uses 19 memory locations for storing the residues and accesses these residues in a sequential manner. All three schemes require 5 levels in the adder tree; as such, the adder timing is the same and is therefore not included in the comparison.
As far as the hardware is concerned, the proposed architecture uses about 44% less than that used in [1] and 66% less than that used in [9]. In the latter case, the ROM is assumed to be made up of D flip-flops. Table II compares the performance of the different methods in terms of hardware.
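The multiplexer totals quoted for the parallel method can be reproduced from the per-unit figures in the text: three 2-to-1 multiplexers per 4-to-1 multiplexer, five 4-to-1 multiplexers per bit position, and ten 2-to-1 multiplexers per modulo 19 adder. The sketch below is only a bookkeeping aid under those stated figures; the names are hypothetical.

```python
MUX2_PER_MUX4 = 3                  # a 4-to-1 multiplexer built from three 2-to-1 multiplexers
RESIDUE_BITS = 5                   # width of a residue modulo 19
MUX2_PER_ADDER = 2 * RESIDUE_BITS  # two 5-bit 2-to-1 multiplexers per adder

def mux2_count(num_bits, num_adders):
    """Total 2-to-1 multiplexers: selection network plus modular adders."""
    select = num_bits * RESIDUE_BITS * MUX2_PER_MUX4
    adders = num_adders * MUX2_PER_ADDER
    return select + adders

proposed = mux2_count(18, 18)   # 270 + 180
baseline = mux2_count(32, 31)   # 480 + 310
print(proposed, baseline)       # 450 790
```

With these totals the multiplexer count drops by roughly 43%, and the full-adder count (85 versus 155) by roughly 45%, which brackets the figure of about 44% quoted in the text.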
In our comparison in Table I, we have assumed a worst case in which all bits in the given number are nonzero. However, if we consider a case in which 50% of the bits are nonzero, then the total number of clock cycles for the method proposed in [1] will be $16 \times 5$. Since the distribution of 1s and 0s is assumed to be 50%, it can be assumed that there are nine 1s in the 18-bit field. Hence, in the case of the proposed method the number of cycles required is $9 \times 5$, a similar increase in speed as in the case of all nonzero bits.
The gains in speed and hardware do not necessarily imply that these could be achieved irrespective of the moduli set chosen. The proposed method depends on the periodicity of the residues and hence is very much dependent on the choice of the moduli set. Care therefore needs to be exercised in choosing a moduli set that has low periodicity in the residues.
VII. CONCLUSION
Forward conversions in RNS have traditionally been implemented using lookup tables. Modifications to memory-based systems mainly involved using PEs. Although the use of PEs reduces memory requirements, it has not completely eliminated the use of memory. Furthermore, memory-based converters require reprogramming for different moduli sets. In this paper, we have presented an extension to memoryless binary-to-RNS converters that makes them less hardware intensive. The method makes use of the periodicity properties of residues obtained using modular exponentiation. As a result, the new converter requires 44% less hardware for parallel conversion and achieves 43% improvement in speed for serial conversions.
ACKNOWLEDGMENT
The authors are indebted to Dr. P. V. Ananda Mohan for his valuable comments and clarifications during the course of this work.
REFERENCES
[1] A. B. Premkumar, “A formal framework for conversion from binary to residue numbers,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 2, pp. 135–144, Feb. 2002.
[2] G. Alia and E. Martinelli, “A VLSI algorithm for direct and reverse conversion from weighted binary number to residue number system,” IEEE Trans. Circuits Syst., vol. CAS-31, no. 12, pp. 1425–1431, Dec. 1984.
[3] G. Alia and E. Martinelli, “VLSI binary-residue converters for pipelined processing,” Comput. J., vol. 33, no. 5, pp. 473–475, 1990.
[4] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. New York: McGraw-Hill, 1967.
[5] R. M. Capocelli and R. Giancarlo, “Efficient VLSI networks for converting an integer from binary system to residue number system and vice versa,” IEEE Trans. Circuits Syst., vol. 35, no. 11, pp. 1425–1431, Nov. 1988.
[6] S. Piestrak, “Design of residue generators and multioperand modular adders using carry save adders,” in Proc. 10th IEEE Symp. Comput. Arith., Jun. 1991, pp. 100–107.
[7] S. Piestrak, “Design of residue generators and multioperand modular adders using carry save adders,” IEEE Trans. Comput., vol. 43, no. 1, pp. 68–77, Jan. 1994.
[8] S. Piestrak, “Design of squarers modulo A with low-level pipelining,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 1, pp. 31–41, Jan. 2002.
[9] P. V. A. Mohan, “Novel design for binary to RNS converters,” in Proc. IEEE Int. Symp. Circuits Syst., 1994, pp. 357–360.
[10] P. V. A. Mohan, Residue Number Systems: Algorithms and Architectures. Norwell, MA: Kluwer, 2002.
[11] P. V. A. Mohan, “Efficient design of binary to RNS converters,” J. Circuits Syst. Comput., vol. 9, no. 3/4, pp. 145–154, 1999.
Massey Research Online: http://hdl.handle.net/10179/9618
