








     
 
Kuczborski, W., Attikiouzel, Y. and Crebbin, G. (1994) 
Decomposition of logic networks with emphasis on signed digit 
arithmetic systems. IEE Proceedings - Circuits, Devices and 



















Decomposition of logic networks with emphasis on
signed digit arithmetic systems




Indexing terms: Decomposition of logic networks, Logic networks, Signed digit arithmetic 
Abstract: This paper describes an attempt to 
combine advantages of the signed digit number 
representation, applied at the word-level, and the 
residue number system applied at the digit-level, 
to achieve arithmetic decomposition of high-radix 
systems. Also introduced is a new decomposition 
algorithm for multiple-output Boolean functions 
based on partition products. Analysis of the pro- 
posed new method of arithmetic decomposition, 
when compared to an approach based on the 
theory of digit sets, reveals a more efficient use of 
data storage plus a higher degree of structural uni- 
formity. The practical importance of the proposed 
method has been tested on a number of designs 
for the field programmable gate arrays. Compari- 
son with a commercially available CAD system 
indicates a significant reduction in implementation 
complexity. 
1 Introduction 
Interest in the arithmetic decomposition of high-radix 
systems and the Boolean decomposition for multiple- 
output functions was prompted by the needs of real-time 
grey-scale morphological processors. The enormous 
hardware complexity of processors applying the principles
of umbra transform [4] or threshold decomposition [11]
directed attention towards the direct
implementation of morphological transformations. The 
direct approach demands a fast implementation of two 
conflicting operations: (1) addition and maximum search 
(dilation); (2) subtraction and minimum search (erosion). 
A conflict results from the characteristic of conventional 
arithmetic systems, which require opposite directions of 
digit processing. 
Image data throughputs of the order of 10^7 pixels per
second and a high degree of design regularity can be 
achieved by applying the principles of digit-level systolic 
arrays [7]. However, the conflict between addition and 
magnitude comparison introduces loops into the depend- 
ence graph, and the whole problem becomes non- 
computable. The solution adopted is to replace the 
© IEE, 1994
Paper 1109G (C2), first received 22nd February and in revised form
20th December 1993
W. Kuczborski is at Edith Cowan University, Department of Computer
and Communication Engineering, Joondalup WA6065, Australia
Y. Attikiouzel and G. Crebbin are with The University of Western Australia,
Centre for Intelligent Information Processing Systems, Nedlands
WA6009, Australia
IEE Proc.-Circuits Devices Syst., Vol. 141, No. 4, August 1994
conventional arithmetic system by the signed digit 
number representation (SDNR). The SDNR eliminates 
loops in the dependence graph by allowing a uniform 
direction of addition/subtraction and magnitude comparison,
from the most to the least significant digit.
Moreover, carry propagation is restricted to a single 
digit. 
SDNR systems are more difficult to implement than 
conventional arithmetic circuits. Boolean functions, 
which define the SDNR systems, are more complex and 
have a larger number of independent variables. Appar- 
ently some new methods of logic synthesis are required 
for a wider application of this non-conventional data rep- 
resentation. In most cases, especially for systems of 
radices higher than two, a complex logic network must 
be replaced by a cluster of simpler ones. 
Carter and Robertson [2] applied the theory of digit
sets to replace high-radix modules by simpler modules of 
radix two. An alternative approach is proposed here, 
based on a combination of the SDNR, applied at the 
word level, and the residue number system (RNS), 
applied at the digit level. Such a combination simplifies 
the complexity of basic modules and reduces the number 
of inputs to these modules. If the selected implementation 
technology, e.g. programmable logic arrays (PLAs), gate 
arrays, or field programmable gate arrays (FPGAs), 
requires a further reduction of the input number, then 
another design step is applied - the decomposition of 
the multivalued Boolean functions [3]. A new decompo- 
sition algorithm is presented here, based on partition 
products. The algorithm has been tested on FPGAs from 
Xilinx which allow five Boolean variables per logic 
module. 
Being aware of the speed and density limitations of the 
technology, when compared with full custom or semi- 
custom designs, the choice was influenced by two 
strengths of FPGAs: (1) fast and zero-cost modifications 
of the prototypes, a significant virtue for the unconven- 
tional data representation employed; (2) the very 
complex SDNR functions can be efficiently implemented 
by look-up tables of the Xilinx FPGAs. 
2 Principles of the signed digit representation
(SDNR) and the residue number system (RNS)
2.1 SDNR
The SDNR [1] has become an attractive alternative for
dedicated VLSI systems such as morphological processors
[6], CORDIC processors [13], and IIR filters [9].
The SDNR is a redundant data format which requires
additional memory space. However, the redundancy
reduces carry propagation to at most a single digit. The
SDNR has a number of important advantages: parallel 
processing of all digits, modularity, variable operand 
lengths, regular logic structures, and local connections.
SDNR, unlike RNS, is a positional number system: the value
of a number is the sum of its digits $a_i$, each weighted by a
power of the radix $r$, with the digit index starting at $i = 0$.
The digits $a_i$ can be negative, zero, or positive. For symmetric
digit sets $\{-a, -a + 1, \ldots, -1, 0, 1, \ldots, a - 1, a\}$,
the allowed range of $a$ becomes
$$a_{\min} = r/2 + 1$$
(minimal redundant set), and
$$a_{\max} = r - 1$$
(maximal redundant set), where $r$ is the radix. A sensible
choice of carry values is within the range -1 to 1. The
threshold value $t$, which determines carry generation,
must be within the range:
$$1 \le r - a \le t \le a - 1$$
For the addition SUM = X + Y, the sum digit at the ith
position is a function of only four arguments, $X_i$, $Y_i$,
$X_{i+1}$, $Y_{i+1}$, and can be calculated in two stages:
Stage 1:
  if $X_i + Y_i > t$ then $C_{i-1} = 1$
  elseif $X_i + Y_i < -t$ then $C_{i-1} = -1$
  else $C_{i-1} = 0$
  $S_i = X_i + Y_i - r\,C_{i-1}$
Stage 2:
  $SUM_i = S_i + C_i$
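The two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a number is held as a list of signed digits with the most significant digit first, so the carry generated at a position is absorbed by its left neighbour; the function names `sdnr_add` and `sdnr_value` are hypothetical and do not come from the paper. The operands are those of the radix-10 example used later (r = 10, t = 8).

```python
def sdnr_add(x, y, r, t):
    """Add two signed-digit numbers (equal-length lists, most significant digit first)."""
    assert len(x) == len(y)
    carries = []   # carries[k]: carry generated at position k
    interim = []   # interim[k]: intermediate sum digit at position k
    # Stage 1: every position is processed independently of all others.
    for xi, yi in zip(x, y):
        s = xi + yi
        c = 1 if s > t else -1 if s < -t else 0
        carries.append(c)
        interim.append(s - r * c)
    # Stage 2: each digit absorbs the carry generated by its right neighbour.
    incoming = carries[1:] + [0]
    total = [s + c for s, c in zip(interim, incoming)]
    # The carry out of the leading position becomes a new leading digit.
    return [carries[0]] + total


def sdnr_value(digits, r):
    """Integer value of a signed-digit list (most significant digit first)."""
    value = 0
    for d in digits:
        value = value * r + d
    return value


x = [-9, 3, -7, 3]   # -8767
y = [1, 8, -6, 4]    # +1744
total = sdnr_add(x, y, r=10, t=8)
print(total, sdnr_value(total, 10))   # [0, -7, 0, -3, 7] -7023
```

Because stage 2 only adds a carry of magnitude at most one to a digit whose magnitude is at most t, no second carry can arise, which is the single-digit carry propagation described above.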
2.2 RNS 
The RNS [5, 12, 14] is defined by a set of relatively prime
moduli
$$\{p_1, p_2, \ldots, p_n\}$$
The dynamic range of the numbers is defined by the
product of all moduli:
$$0 \;\ldots\; (p_1 \times p_2 \times \cdots \times p_n) - 1$$
The use of the RNS for coding the SDNR digits requires
a range which includes negative numbers as well:
$$-(p_1 \times p_2 \times \cdots \times p_n)/2 \;\ldots\; (p_1 \times p_2 \times \cdots \times p_n)/2 - 1$$
A numerical value N can be converted into its n-digit
RNS equivalent
$$N = (XP1, XP2, \ldots, XPn)$$
by modulo operations:
$$XP1 = N \bmod p_1, \quad XP2 = N \bmod p_2, \quad \ldots, \quad XPn = N \bmod p_n$$
The main advantage of the residue arithmetic lies in its 
parallelism - additions, subtractions, and multiplica- 
tions can be executed at  each digit independently. This 
parallelism was utilised in the first stage of logic synthesis 
- arithmetic decomposition (next section). 
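A minimal Python sketch of the conversion and of the digit-parallel addition follows. The helper names (`to_rns`, `rns_add`, `from_rns_signed`) are hypothetical, and the decoder uses a plain exhaustive search over the symmetric range rather than any particular conversion circuit; it is meant only to make the definitions concrete.

```python
from math import prod


def to_rns(n, moduli):
    """Residues of n for each modulus (Python's % is non-negative for positive moduli)."""
    return tuple(n % p for p in moduli)


def rns_add(u, v, moduli):
    """Carry-free addition: each residue position is handled independently."""
    return tuple((a + b) % p for a, b, p in zip(u, v, moduli))


def from_rns_signed(residues, moduli):
    """Recover the value in the symmetric range by exhaustive search."""
    m = prod(moduli)
    for n in range(-(m // 2), m - m // 2):
        if to_rns(n, moduli) == tuple(residues):
            return n
    raise ValueError("residues do not match the given moduli")


moduli = (4, 7)
s = rns_add(to_rns(3, moduli), to_rns(8, moduli), moduli)
print(to_rns(-9, moduli), s, from_rns_signed(s, moduli))   # (3, 5) (3, 4) 11
```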
Despite its parallelism, the RNS is not an efficient
representation of image data in the morphological processor.
The reason for this is the strongly non-linear character of 
mathematical morphology, requiring constant magnitude 
comparisons. According to Winograd’s lower bound 
theorem [14], the speed potential of the RNS can be fully 
utilised only if the number of additions significantly out- 
numbers the number of magnitude comparisons. 
However, the usual difficulties with RNS magnitude com- 
parisons or sign detections can be avoided if the applica- 
tion of the RNS is restricted to SDNR digit level. 
3 Arithmetic decomposition based on the SDNR
and the RNS
As mentioned above, the SDNR system is used to represent
data of the grey-scale morphological processor.
Although the signed digit representation demands additional
storage space for signal/image data, memory
requirements can be reduced for systems with higher
radices. Higher radices have other advantages too. They
reduce the interconnection complexity, which is an
important consideration in view of the fact that interconnections
occupy approximately 70% of silicon area of the
VLSI devices. Finally, high-radix systems may be faster:
for example, multiplication will be completed after
fewer algorithmic steps.
The logic synthesis procedure is executed in two consecutive
steps. The first step, based on a combination of
SDNR and RNS, reduces the number of inputs to each
module by a factor of two. The SDNR, applied at the
word level, allows easy magnitude comparisons and sign
detections. The RNS, applied at the digit level, decomposes
each SDNR digit into simpler networks. Since the
dynamic range of RNS values is very small and is determined
by the SDNR digit set, the RNS conversions and
sign detections are handled by unified and simple logic
circuits. It is seen in the next section that the SDNR/RNS
decomposition method reduces storage requirements,
when compared to an alternative method based on the
theory of digit sets [2].
An important decision for designers of an RNS system
is the choice of relatively prime moduli. Assuming the
system to be implemented using conventional digital circuits
and restricting the maximum number of inputs in
each module to six, then, the choice of moduli will
depend on the required set of SDNR digits. Table 1
Table 1: Choice of moduli for various radices
Bits/SDNR   Moduli    Dynamic range   a                  Maximum radix for minimum
digit       p1, p2    p1 × p2         (p1 × p2 ≥ 2a+1)   redundant SDNR, 2a - 1
3           8         8               3                  5
4           2, 7      14              6                  11
5           4, 7      28              13                 25
6           7, 8      56              27                 53
specifies the choices of moduli for various radices of the 
arithmetic system. The moduli guarantee maximum 
dynamic ranges of digit sets for three, four, five and six 
bits per SDNR digit. It is assumed that all digit sets are 
symmetric. 
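The entries of Table 1 follow from the moduli alone, which the short sketch below makes explicit. It assumes that each residue is stored in the minimum number of bits and that the largest digit a is the largest integer with 2a + 1 not exceeding the product of the moduli; the function name `table1_row` is hypothetical.

```python
from math import prod


def table1_row(moduli):
    """Return (bits per digit, dynamic range, largest digit a, maximum radix 2a - 1)."""
    n = prod(moduli)
    bits = sum((p - 1).bit_length() for p in moduli)  # bits needed for residues 0..p-1
    a = (n - 1) // 2                                  # largest a with 2a + 1 <= n
    return bits, n, a, 2 * a - 1


for moduli in [(8,), (2, 7), (4, 7), (7, 8)]:
    print(moduli, table1_row(moduli))
# (8,) (3, 8, 3, 5)
# (2, 7) (4, 14, 6, 11)
# (4, 7) (5, 28, 13, 25)
# (7, 8) (6, 56, 27, 53)
```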
The principle of the arithmetic SDNR/RNS decom- 
position is based on RNS parallelism, which allows inde- 
pendent handling of the RNS digits because of carry-free 
arithmetic. 
Assume it is required to add two SDNR digits of a 
radix-10 system. To represent the maximum redundant 
digit set of { -9, . . . , 9}, the sign-and-magnitude (or two’s 
complement) code would require five bits. Then, a single- 
digit adder would require 10 inputs - too many for an 
efficient implementation of the circuit using FPGAs or 
similar technology. However, if the sign-and-magnitude 
code is replaced by the RNS, no module will have more 
than six inputs. 
Table 2 specifies the RNS representation of the digit
set {-9, ..., 9} for moduli p1 = 4 and p2 = 7. The single
line indicates intermediate sums which require a correction
by -10 and generate carry = 1; the double line
indicates a correction by +10 and carry = -1.
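The residue pairs for this digit set, and for the intermediate sums that two such digits can produce, can be regenerated with a few lines of Python. The sketch is illustrative only; the names `digit_table` and `ambiguous` are hypothetical.

```python
MODULI = (4, 7)   # p1, p2


def rns(n):
    return tuple(n % p for p in MODULI)


# Residue pair of every digit in {-9, ..., 9}; all 19 pairs are distinct.
digit_table = {d: rns(d) for d in range(-9, 10)}
for d, residues in digit_table.items():
    print(f"{d:3d} -> {residues}")

# Intermediate sums of two digits lie in -18..18; some share a residue pair.
shared = {}
for s in range(-18, 19):
    shared.setdefault(rns(s), []).append(s)
ambiguous = {pair: sums for pair, sums in shared.items() if len(sums) > 1}
print(ambiguous[(1, 3)])   # [-11, 17]
```

Since the dynamic range is 28, two sums collide exactly when they differ by 28, which happens only for sums of magnitude 10 to 18.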
Table 2: RNS representation of the digit set {-9, ..., 9} for
moduli p1 = 4, p2 = 7










Table 3 clarifies SDNR addition for r = 10, a = 9 and
t = 8. The equivalent SDNR/RNS addition operates
independently on each module to calculate a non-corrected
intermediate sum. If the intermediate sum generates
$C_{i-1} \ne 0$, a correction by ±10 is required. Finally,
$SUM_i$ is calculated by adding carries.
This example of the SDNR/RNS addition reveals that
a proper correction of intermediate sums and a carry
generation require identification of four regions of the
non-corrected intermediate sums:
region I:   $C_{i-1} = 0$    ($-8 \le S_i \le 8$)
region II:  $C_{i-1} = 1$    ($S_i = 9$)
region III: $C_{i-1} = \pm 1$  ($10 \le S_i \le 18$ or $-18 \le S_i \le -10$)
region IV:  $C_{i-1} = -1$   ($S_i = -9$)
Identification of these regions demands a magnitude test of
non-corrected sums. Additionally, in the case of region
III, where positive and negative values have identical
RNS representations, it is required to identify the signs of 
both input arguments. The structure given in Fig. 1 
implements all necessary operations. It should be stressed 
that no module requires more than six inputs despite the 
circuit's suitability for a wide range of radices, between 3 
and 53. 
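The behaviour of a single digit position can be sketched as follows for the radix-10 example. This is a software illustration under assumptions, not a description of the circuit of Fig. 1: the magnitude test is modelled by a lookup table from residue pairs to candidate sums, the sign of an input digit is looked up from its residues, and the names `add_digit`, `CANDIDATES` and `DIGIT_SIGN` are hypothetical.

```python
MODULI = (4, 7)      # p1, p2
R, A, T = 10, 9, 8   # radix, largest digit, carry threshold


def rns(n):
    return tuple(n % p for p in MODULI)


def rns_add(u, v):
    return tuple((a + b) % p for a, b, p in zip(u, v, MODULI))


# Sign of each input digit, recoverable from its (unique) residue pair.
DIGIT_SIGN = {rns(d): (d > 0) - (d < 0) for d in range(-A, A + 1)}

# Candidate non-corrected sums for every residue pair (one or two values).
CANDIDATES = {}
for s in range(-2 * A, 2 * A + 1):
    CANDIDATES.setdefault(rns(s), []).append(s)


def add_digit(xr, yr):
    """Return (carry, non-corrected residues, corrected residues) for one position."""
    sr = rns_add(xr, yr)
    cands = CANDIDATES[sr]
    if len(cands) == 1:
        s = cands[0]                      # regions I, II and IV
    else:
        # Region III: a sum of magnitude >= 10 needs two inputs of equal sign,
        # so the sign of one input selects the right candidate.
        s = max(cands) if DIGIT_SIGN[xr] > 0 else min(cands)
    carry = 1 if s > T else -1 if s < -T else 0
    corrected = rns_add(sr, rns(-R * carry))
    return carry, sr, corrected


print(add_digit(rns(3), rns(8)))     # (1, (3, 4), (1, 1))
print(add_digit(rns(-7), rns(-6)))   # (-1, (3, 1), (1, 4))
```

The two calls reproduce the second and third digit positions of the Table 3 example.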
It is even possible to simplify the unified circuit of
Fig. 1 if sufficient redundancy of module selection eliminates
the ambiguous region III. We must make sure that
the condition
$$n \ge 4a + 1$$
is true, where n is the product of RNS moduli (the 
dynamic range) and a is the greatest element of the 
SDNR digit set. Fig. 2 shows the simplified structure of 








Fig. 1 The unified structure of the SDNR/RNS adder (1st
argument = (XP1, XP2); 2nd argument = (YP1, YP2); non-corrected
intermediate sum = (SP1, SP2); corrected intermediate sum = (SP1',
SP2'); final sum = (SUMP1, SUMP2))
Fig. 2 SDNR/RNS adder for disjoint sets
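The disjointness condition n ≥ 4a + 1 can be checked numerically. The sketch below (hypothetical helper names) compares the closed-form bound with a brute-force search for residue collisions among all possible intermediate sums -2a, ..., 2a.

```python
from math import prod


def largest_unambiguous_digit(moduli):
    """Largest a with n >= 4a + 1, where n is the product of the moduli."""
    return (prod(moduli) - 1) // 4


def has_collisions(moduli, a):
    """True if two intermediate sums in -2a..2a share a residue tuple."""
    seen = set()
    for s in range(-2 * a, 2 * a + 1):
        key = tuple(s % p for p in moduli)
        if key in seen:
            return True
        seen.add(key)
    return False


print(largest_unambiguous_digit((4, 7)), has_collisions((4, 7), 9))   # 6 True
print(largest_unambiguous_digit((7, 8)), has_collisions((7, 8), 13))  # 13 False
```

For the moduli 4 and 7 the bound gives a = 6, so the radix-10 maximally redundant set with a = 9 still has an ambiguous region III and needs the unified structure.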
4 SDNR/RNS arithmetic decomposition versus 
arithmetic decomposition based on the theory 
of digit sets 
The importance of new design methods for the high-radix 
systems was appreciated by Carter and Robertson [2] 
Table 3: SDNR addition for r = 10, a = 9 and t = 8
                   SDNR addition           SDNR/RNS addition
                    -9  3 -7  3  (-8767)     (3,5) (3,3) (1,0) (3,3)
                  +  1  8 -6  4  (+1744)   + (1,1) (0,1) (2,1) (0,4)
non-corrected S                              (0,6) (3,4) (3,1) (3,0)
correction                                   (0,0) (2,4) (2,3) (0,0)
S                   -8  1 -3  7              (0,6) (1,1) (1,4) (3,0)
carries              0  1 -1  0              (0,0) (1,1) (3,6) (0,0)
who developed a unified design technique based on the 
theory of digit sets. The theory applies the concept of the 
digit set and, not unlike our method, is suitable for 
redundant SDNR systems. 
A digit set <δ^a> is defined by two parameters: δ, the
diminished cardinality (that is, the number of digits less
one) and a, the offset (that is, the magnitude of the smallest
digit).
For example, the conventional digit set of a radix-10
system is represented by <9^0> and the radix-10 maximum
redundant SDNR digit set is defined by <18^9>.
The decomposition process replaces a digit set of a 
high diminished cardinality by weighted sums of digit 
sets of lower diminished cardinalities (ternary and binary 
sets). 
For example, 
<18^9> → 8<1'> + 4<1'> + 2<2'> + <2'>
An important characteristic of any decomposition 
method is storage requirements. The above decomposed 
digit set requires 1 + 1 + 2 + 2 = 6 bits (two bits/ternary 
set, one bit/binary set).
The decomposition method described here reduces 
storage requirements for the same digit set to 5 bits 
(moduli p1 = 4 and p2 = 7 are to be selected).
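The storage figures can be turned into a small calculation. The sketch assumes minimum-width binary coding of each residue, whole digits packed into a 32-bit word, and a largest representable magnitude reached when every digit equals a; these are illustrative assumptions and the function names are hypothetical.

```python
def rns_bits(moduli):
    """Bits per SDNR digit when each residue uses its minimum width."""
    return sum((p - 1).bit_length() for p in moduli)


def word_capacity(bits_per_digit, word_bits, r, a):
    """Digits per word and largest magnitude with all digits equal to a."""
    digits = word_bits // bits_per_digit
    return digits, a * (r ** digits - 1) // (r - 1)


digit_set_bits = 1 + 1 + 2 + 2          # decomposition into binary and ternary sets
print(rns_bits((4, 7)), digit_set_bits)  # 5 6
print(word_capacity(5, 32, 10, 9))       # (6, 999999)
print(word_capacity(6, 32, 10, 9))       # (5, 99999)
```

Under these assumptions, saving one bit per digit buys one extra radix-10 digit in a 32-bit word.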
Assuming 32-bit words for a signal/image processing 
system, then Fig. 3 compares the dynamic ranges for 




Fig. 3 Dynamic ranges of SDNR/RNS and digit set systems for 32-bit words
radices between 3 (smallest radix for the SDNR) and 53 
(the largest radix for six bits per SDNR digit). For each 
radix an optimum digit set is selected (between minimum 
and maximum redundancy) which provides a maximum 
dynamic range. The figure clearly shows that the SDNR/ 
RNS decomposition method allows a more efficient use 
of data storage. The additional advantage of our 
approach is a higher degree of structural uniformity for a 
wide range of radices. The structure of Fig. 1 can be 
applied to any radix between 3 and 53. On the other 
hand, the set theoretical approach, although very inter- 
esting from a mathematical point of view, requires differ- 
ent structures for different radices. 
5 Theory of decomposition of Boolean functions
The purpose of the second stage of logic synthesis,
Boolean decomposition, is the replacement of a complex
logic network by a number of simpler ones. In the case of 
the FPGAs, the primary objective is the elimination of 
networks with more than five inputs using a minimum 
number of configurable logic blocks (CLBs). Each CLB 
can implement any function of five variables or two func- 
tions of four variables. For functions of more than five 
variables, there are two optimisation criteria: minimal 
number of required CLBs and minimal number of logic 
levels. The complexity of decomposed functions is irrele- 
vant, since functions are implemented via look-up tables. 
The above optimisation criteria require new synthesis 
algorithms which return efficient designs. The very 
general method of decomposition based on Shannon's 
expansion theorem is not very useful because it requires a 
large number of CLBs. For example, a function of seven 
variables would require as many as seven CLBs. A less 
general but more efficient approach is simple disjoint 
decomposition [3]. It expresses the Boolean function 
F(X) as G(H(Y), Z) where Y, the set of bound variables,
and Z, the set of free variables, are disjoint sets and the 
union of Y and Z equals X. The problem here is that 
only a tiny percentage of all possible functions meets the 
above requirement; for instance, 0.00046% in the case of 
five arguments [10]. However, many functions applied in
practice, especially functions which are not fully defined 
or sequential functions with optimum state assignments, 
can be decomposed in this way. 
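Whether a given choice of bound variables admits a simple disjoint decomposition can be tested with the classical column-multiplicity criterion: arrange the truth table with one column per combination of bound variables and count the distinct columns. The sketch below is a generic illustration of that test, not the algorithm of this paper; the function names and the sample function are hypothetical.

```python
from itertools import product


def column_multiplicity(f, n, bound):
    """Number of distinct columns of the decomposition chart of f.

    f takes a list of n bits; `bound` lists the indices of the bound variables Y.
    """
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for y_bits in product((0, 1), repeat=len(bound)):
        column = []
        for z_bits in product((0, 1), repeat=len(free)):
            x = [0] * n
            for i, b in zip(bound, y_bits):
                x[i] = b
            for i, b in zip(free, z_bits):
                x[i] = b
            column.append(f(x))
        columns.add(tuple(column))
    return len(columns)


def example(x):
    # Hypothetical five-variable function used only for illustration.
    return ((x[0] ^ x[1] ^ x[2]) & x[3]) | x[4]


for bound in [(0, 1, 2), (0, 3)]:
    m = column_multiplicity(example, 5, bound)
    print(bound, m, (m - 1).bit_length())   # multiplicity and outputs needed for H
# (0, 1, 2) 2 1
# (0, 3) 3 2
```

A multiplicity of at most two means that a single-output H suffices; larger values need more outputs from H, which is the quantity h used in the following sections.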
The less restrictive simple non-disjoint decomposition 
allows common elements in the bound and free variables: 
F(X)  = G(H(C, Y), C, Z), where C is a set of common 
variables (Fig. 4). 
Fig. 4 Simple non-disjoint decomposition
Other possibilities include multiple disjoint, iterative 
disjoint, and complex disjoint decompositions. 
Conditions for all types of decompositions have been 
well documented, although there is a shortage of efficient
implementation algorithms. The spectral technique
(Walsh transform) [10] is an attempt to avoid time-consuming
searches, although its complexity has prevented
its practical implementation. Another approach
has been proposed in Reference 8. It uses symbolic parti- 
tion description (SPD) for multiple-output functions. 
Although the decomposition procedure is very interesting 
from a mathematical point of view, the final stage of the 
procedure requires time-consuming merge-and-check 
operations on partitions. 
6 Boolean decomposition based on partition 
products 
The task of the decomposition algorithm is to find a 
simple disjoint or non-disjoint decomposition of a 
multiple-output function. The objective is to replace a 
logic network with too many arguments by an intercon- 
nected pair of networks with a reduced number of inputs. 
The algorithm is explained with a specific example. Then, 
the general pseudo-code is given, followed by a compari- 
son between the algorithm and a commercially available 
CAD system. 
Consider the three-output function $S_i$, representing the
first stage of a radix-4 SDNR adder:
  if $x_i + y_i > 2$ then $c_{i-1} = 1$
  else if $x_i + y_i < -2$ then $c_{i-1} = -1$
  else $c_{i-1} = 0$
  $s_i = x_i + y_i - 4 \times c_{i-1}$
Table 4 specifies all possible function values. Letters A,
..., E represent function vectors 000, ..., 110. It is
assumed that the adder uses the maximum redundant set
of digits {-3, -2, -1, 0, 1, 2, 3} and each digit is represented
by the three-bit sign-and-magnitude code.
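The function table can be regenerated from the adder definition. The sketch assumes a sign-and-magnitude code with the sign in the leading bit and the pattern 100 left unused, which is the coding that the tabulated values are consistent with; the helper names are hypothetical.

```python
def decode(bits):
    """Three-bit sign-and-magnitude value; 100 is treated as unused."""
    sign, magnitude = bits >> 2, bits & 3
    if sign and magnitude == 0:
        return None
    return -magnitude if sign else magnitude


def encode(value):
    return (4 | -value) if value < 0 else value


def stage1(x, y, r=4, t=2):
    """Intermediate sum and carry of the first adder stage."""
    s = x + y
    c = 1 if s > t else -1 if s < -t else 0
    return s - r * c, c


LETTER = {0b000: "A", 0b001: "B", 0b010: "C", 0b101: "D", 0b110: "E"}

rows = []
for xb in range(8):
    for yb in range(8):
        x, y = decode(xb), decode(yb)
        if x is None or y is None:
            continue                      # not fully defined
        s, _ = stage1(x, y)
        code = encode(s)
        rows.append((format(xb, "03b"), format(yb, "03b"),
                     format(code, "03b"), LETTER[code]))

print(len(rows), rows[3])   # 49 ('000', '011', '101', 'D')
```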
For any valid permutation of common, free, and 
bound arguments, the algorithm calculates the number of 
required outputs of the network H in Fig. 4. If the total 
Table 4: Function values for radix-4 SDNR adder
x_i  y_i   s_i
000  000   000 A
000  001   001 B
000  010   010 C
000  011   101 D
000  101   101 D
000  110   110 E
000  111   001 B
001  000   001 B
001  001   010 C
001  010   101 D
001  011   000 A
001  101   000 A
001  110   101 D
001  111   110 E
010  000   010 C
010  001   101 D
010  010   000 A
010  011   001 B
010  101   001 B
010  110   000 A
010  111   101 D
011  000   101 D
011  001   000 A
011  010   001 B
011  011   010 C
011  101   010 C
011  110   001 B
011  111   000 A
101  000   101 D
101  001   000 A
101  010   001 B
101  011   010 C
101  101   110 E
101  110   001 B
101  111   000 A
110  000   110 E
110  001   101 D
110  010   000 A
110  011   001 B
110  101   001 B
110  110   000 A
110  111   101 D
111  000   001 B
111  001   110 E
111  010   101 D
111  011   000 A
111  101   000 A
111  110   101 D
111  111   110 E
number of inputs of the network G does not exceed a
preset maximum, then the algorithm accepts the permutation
and, in the case of FPGAs, calculates the number
of required CLBs. 
If the selected permutation contains common 
argument(s), the algorithm creates a function table for 
each combination of common argument(s), otherwise a 
single function table is generated. For example, the set 
combinations
C = {3}, Z = {4, 6}, Y = {1, 2, 5}
which correspond to the network with common input 3, free
inputs 4 and 6, bound inputs 1, 2 and 5, and a yet unknown
number h of internal outputs, lead to the following tables:
C: 0 (i.e. x3 = 0)
     Y:  0 1 2 3 4 5 6 7
Z: 0     A C C A . . E A
   1     B D D B . . D B
   2     . E . A . . . A
   3     D B B D . . B D
Partition products:
0, 3, 7; 1, 2; 6
0, 3, 7; 1, 2; 6
0, 3, 7; 1, 2; 6
C: 1 (i.e. x3 = 1)
     Y:  0 1 2 3 4 5 6 7
Z: 0     B D D B D B B D
   1     C A A C A C E A
   2     . D . B . B . D
   3     A E C A E A A E
Partition products:
0, 3, 5; 6; 1, 2, 4, 7
0, 3, 5; 1, 2, 4, 7; 6
0, 3, 5; 2; 1, 4, 7; 6
Each column of the table(s) represents a unique combination
of bound variables (one, two and five for our
specific example). Rows of the function table represent
combinations of free arguments (4, 6). The next step
creates partitions for each line using function values as
equivalence criteria. For example, the first line of the
lower table is represented by the partition
0, 3, 5, 6; 1, 2, 4, 7
In the case of rows which are not fully defined, don’t-cares
will be added to every partition block. For example,
the third row of the lower table is represented by the
pseudo-partition
1, 7, 0, 2, 4, 6; 3, 5, 0, 2, 4, 6
Finally, the algorithm calculates the product of all parti- 
tions and pseudo-partitions. After each product calcu- 
lation, all redundant blocks caused by don’t-cares are 
[...] products simultaneously, as the partition product is an
associative operation.
This algorithm is at least one order of magnitude
faster than the XACT-CAD system and significantly [...]
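The partition-product step of this section can be sketched as follows for the two function tables of the example. The sketch rests on two assumptions that are interpretations rather than statements from the paper: a block is treated as redundant when it is a proper subset of another block, and columns that are undefined in every row are dropped from the final result. All function names are hypothetical.

```python
def row_partition(row):
    """Blocks of column indices with equal values; don't-care columns ('.') join every block."""
    dont_care = {i for i, v in enumerate(row) if v == "."}
    blocks = {}
    for i, v in enumerate(row):
        if v != ".":
            blocks.setdefault(v, set()).add(i)
    return [frozenset(b | dont_care) for b in blocks.values()]


def partition_product(p, q):
    """Pairwise intersections, with blocks contained in another block removed."""
    cand = {a & b for a in p for b in q if a & b}
    return [b for b in cand if not any(b < other for other in cand)]


def table_partition(rows):
    """Product of the (pseudo-)partitions of all rows of one function table."""
    always_dc = {i for i in range(len(rows[0])) if all(r[i] == "." for r in rows)}
    result = row_partition(rows[0])
    for r in rows[1:]:
        result = partition_product(result, row_partition(r))
    blocks = {b - always_dc for b in result}
    return sorted(sorted(b) for b in blocks if b)


upper = ["ACCA..EA", "BDDB..DB", ".E.A...A", "DBBD..BD"]   # x3 = 0
lower = ["BDDBDBBD", "CAACACEA", ".D.B.B.D", "AECAEAAE"]   # x3 = 1
print(table_partition(upper))   # [[0, 3, 7], [1, 2], [6]]
print(table_partition(lower))   # [[0, 3, 5], [1, 4, 7], [2], [6]]
```

Neither table needs more than four blocks, so two internal outputs of H are enough to distinguish them for this choice of common, free and bound arguments.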
7 Design examples 
A number of experiments indicated that the XACT-CAD systems could not efficiently handle the implementation of 
SDNR modules with radices higher than two. In contrast, a decomposition of the modules based on the unified 
structure of the SDNR/RNS adder (see Fig. 1) and the new decomposition algorithm leads to highly efficient hardware 
implementations. To illustrate the potential of the new approach, the complex case of a radix-53 SDNR/RNS is
presented. As was mentioned in Section 3, high-radix arithmetic has a number of functional and technological advant- 
ages. Most importantly, the freedom of radix choice offered by the new decomposition method may dramatically widen 
the data dynamic range (see Fig. 3). 
According to Table 1 the (52”) digit set may apply moduli p1 = 7 and p2 = 8. All modules of the SDNR/RNS
adder which have more than five inputs must be decomposed by the algorithm from the previous section. For instance, 
the modulo-8 adder (the ‘+mod p2’ block of Fig. 1) is a network of six Boolean arguments XP21, XP22, XP23, YP21, 
R $ ~ - ~ 2 $ ~ R  adder 5 17 8 
Rad~x-4SDNRmax 9 6 36 18 
(S1,82.83.C1, CZ) 
(d1~dZ*d3~m1~m2*m3) 
YP22, YP23 which encode values 0, ..., 7 of XP2 and YP2. The algorithm found that the following segregation of
arguments reduces hardware requirements to just three CLBs:
XP21, XP22, YP21, YP22:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
XP23, YP23: 0            A  C  E  G  C  E  G  A  E  G  A  C  G  A  C  E
            1            B  D  F  H  D  F  H  B  F  H  B  D  H  B  D  F
            2            B  D  F  H  D  F  H  B  F  H  B  D  H  B  D  F
            3            C  E  G  A  E  G  A  C  G  A  C  E  A  C  E  G
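The table for the modulo-8 adder is fully defined, so the product of its row partitions simply groups the columns that are identical in every row. The sketch below recomputes that grouping; the bit weights (4 for XP21 and YP21, 2 for XP22 and YP22, 1 for XP23 and YP23) are inferred from the tabulated values, and the function name is hypothetical.

```python
def mod8_sum(col, row):
    """Sum modulo 8 for column index (XP21 XP22 YP21 YP22) and row index (XP23 YP23)."""
    xp21, xp22 = (col >> 3) & 1, (col >> 2) & 1
    yp21, yp22 = (col >> 1) & 1, col & 1
    xp23, yp23 = (row >> 1) & 1, row & 1
    return (4 * xp21 + 2 * xp22 + xp23 + 4 * yp21 + 2 * yp22 + yp23) % 8


# Columns with the same values in all four rows fall into the same block.
groups = {}
for col in range(16):
    signature = tuple(mod8_sum(col, row) for row in range(4))
    groups.setdefault(signature, []).append(col)

blocks = list(groups.values())
h = (len(blocks) - 1).bit_length()   # internal outputs needed to name a block
print(blocks, h)
# [[0, 7, 10, 13], [1, 4, 11, 14], [2, 5, 8, 15], [3, 6, 9, 12]] 2
```

With h = 2, the bound network is a pair of four-input functions and the remaining network has four inputs (two internal signals plus XP23 and YP23) and three outputs, which is consistent with the three CLBs quoted above if each CLB holds two four-input functions.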
Symbols A, ..., H represent values 0, ..., 7 of the non-corrected
intermediate sum SP21, SP22, SP23. The
product of all partitions has only four blocks:
(0, 7, 10, 13; 1, 4, 11, 14; 2, 5, 8, 15; 3, 6, 9, 12)
and, as a result, only two internal outputs (h = 2) are
required (network inputs XP23, YP23 and XP21, XP22, YP21,
YP22; outputs SP21, SP22, SP23).
Repeating the decomposition algorithm for all modules
with more than five inputs leads to a very efficient
mapping of the radix-53 SDNR/RNS adder with just 25
CLBs (Fig. 5).
Fig. 5 Radix-53 SDNR/RNS adder
The significance of the SDNR/RNS data representation
concept is not restricted to linear systems. The magnitude
comparison, required by morphological
transformations, median filters, and other non-linear
methods, can be based on a subtractor and a sign detector.
The sign of an SDNR/RNS difference is determined
by the sign of the most significant non-zero digit [1].
Again, this algorithm leads to an efficient mapping of the
network. For example, only two CLBs are required for a
sign detector of a comparator with the dynamic range of
+27 27 21 27,, = +233,280.
8 Conclusions
The two-stage decomposition process leads to a unified
structure, suitable for a wide range of linear and non-linear
systems. It is applicable to high-radix systems
which could not be designed using conventional methods.
A comparison of the first decomposition stage, based 
on SDNR/RNS data representation with a set theoretical 
approach, indicates a significant reduction of memory 
requirements. Another benefit is a higher degree of struc- 
tural uniformity for a wide range of radices. 
An additional synthesis stage, which uses a new algo- 
rithm for simple Boolean decompositions, allows a 
further reduction in module complexity and an efficient 
hardware implementation based on conventional tech- 
nologies, such as PLAs, gate arrays, and FPGAs. Both 
stages lead to a reduction in hardware complexities by 
factors of 2-3 when compared to  designs generated by a 
complex, commercially available CAD system. 
Although the research results presented here have been 
inspired by our work on morphological processors, their 
significance carries over to other linear and non-linear 
signal and image processing systems, such as FFT, rank- 
order filters, and encryption devices. 
9 References 
1 AVIZIENIS, A.: ‘Signed-digit representations for fast parallel arith- 
metic’, IRE Transactions on Electronic Computers, 1961, pp. 389- 
400 
2 CARTER, T.M., and ROBERTSON, J.E.: ‘The set theory of arith- 
metic decomposition’, IEEE Trans. Comput., 1990, 39, (8), pp. 993-
1005 
3 CURTIS, H.A.: ‘A new approach to the design of switching circuits’ 
(D. Van Nostrand, Princeton, 1962) 
4 GIARDINA, C.R., and DOUGHERTY, E.R.: ‘Morphological 
methods in image and signal processing’ (Prentice-Hall, Englewood
Cliffs, 1988) 
5 JULLIEN, G.A., BIRD, P.D., CARR, J.T., TAHERI, M., and
MILLER, W.C.: ‘An efficient bit-level systolic cell design for finite
ring digital signal processing applications’, J. VLSI Signal Process.,
1989, 1, (3), pp. 189-207
6 KUCZBORSKI, W., ATTIKIOUZEL, Y., and CREBBIN, G.: 
‘Video rate morphological processor based on a redundant number 
representation’, in B.G. BATCHELOR, M.J.W. CHEN, and F.W. 
WALTZ (Eds.): ‘Machine vision, architectures, integration, and 
applications’ (SPIE, Bellingham, 1992), pp. 249-260 
7 KUNG, S.Y.: ‘VLSI array processors’ (Prentice Hall, Englewood 
Cliffs, 1988) 
8 LUBA, T., JASINSKI, K., and KRASNIEWSKI, A.: ‘Combining 
serial decomposition with topological partitioning for effective 
multi-level PLA implementations’. in P. MICHEL, and G. 
SAUCER (Eds.): ‘Logic and architecture synthesis’ (North-Holland, 
Amsterdam, 1991), pp. 243-252 
9 MCNALLY, O.C., McCANNY, J.V., and WOODS, R.F.: ‘A 40 
Megasample IIR filter chip’, in M. VALERO, S.Y. KUNG, T.
LANG, and J.A.B. FORTES (Eds.): ‘Special purpose architectures’ 
(IEEE Computer Society Press, 1991), pp. 416-430 
10 POSWIG, J.: ‘Disjoint decomposition of Boolean functions’, IEE
Proc. E, 1991, 138, (1), pp. 48-56
11 YEONG-CHYANG SHIH, F., and MITCHELL, O.R.: ‘Threshold
decomposition of gray-scale morphology into binary morphology’,
IEEE Trans. Pattern Anal. Mach. Intell., 1989, 11, (1), pp. 31-42
12 SODERSTRAND, M.A., JENKINS, W.K., JULLIEN, G.A., and
TAYLOR, F.J.: ‘Residue number systems arithmetic: modern applications
in digital signal processing’ (IEEE Press, New York, 1986)
13 TAKAGI, N., ASADA, T., and YAJIMA, S.: ‘Redundant CORDIC
methods with a constant scale factor for sine and cosine computation’,
IEEE Trans. Comput., 1991, 40, (9), pp. 989-994
14 TAYLOR, F.J.: ‘Residue arithmetic: a tutorial with examples’, Computer,
1984, pp. 50-62
15 TORNG, H.C.: ‘Switching circuits - theory and logic design’ 
(Addison-Wesley, Reading, 1972) 
