Efficient modular arithmetic units for low power cryptographic applications by Modugu, Rajashekhar Reddy
Scholars' Mine 
Masters Theses Student Theses and Dissertations 
Fall 2010 
Efficient modular arithmetic units for low power cryptographic 
applications 
Rajashekhar Reddy Modugu 
Follow this and additional works at: https://scholarsmine.mst.edu/masters_theses 
 Part of the Computer Engineering Commons 
Department: 
Recommended Citation 
Modugu, Rajashekhar Reddy, "Efficient modular arithmetic units for low power cryptographic applications" 
(2010). Masters Theses. 6645. 
https://scholarsmine.mst.edu/masters_theses/6645 
This thesis is brought to you by Scholars' Mine, a service of the Missouri S&T Library and Learning Resources. This 
work is protected by U. S. Copyright Law. Unauthorized use including reproduction for redistribution requires the 
permission of the copyright holder. For more information, please contact scholarsmine@mst.edu. 
EFFICIENT MODULAR ARITHMETIC UNITS FOR LOW POWER 
CRYPTOGRAPHIC APPLICATIONS 
by 
RAJASHEKHAR REDDY MODUGU 
A THESIS 
Presented to the Faculty of the Graduate School of the 
MISSOURI UNIVERSITY OF SCIENCE AND TECHNOLOGY 
In Partial Fulfillment of the Requirements for the Degree 
MASTER OF SCIENCE IN COMPUTER ENGINEERING 
2010 
Approved by: 
Dr. Minsu Choi, Advisor 
Dr. Yiyu Shi 
Dr. Sabra Sedighsarvestani 
© 2010 
Rajashekhar Reddy Modugu 
All Rights Reserved 
111 
PUBLICATION THESIS OPTION 
This thesis consists of following articles that have been published as follows, and 
the papers were formatted in the style used by the university. 
The first paper presented in pages 02-15 entitled "A FAST LOW-POWER 
MODULO 2N+1 MULTIPLIER DESIGN" was published in the proceedings ofthe IEEE 
IMTC(Instrumentation and Measurement Technology Conference), 2009. 
The second paper presented in pages 17-28 entitled "AN EFFICIENT IDEA 
CRYPTO-HARDWARE USING NOVEL MODULAR ARITHMETIC 
COMPONENTS" was published in the proceedings of the IEEE IMTC(Instrumentation 
and Measurement Technology Conference), 2010. 
The third paper presented in pages 31-55 entitled "Efficient On-line Self-
Checking Modulo 2n+ 1 Multiplier Design" was accepted for Special Issue of IEEE 
Transactions on Instrumentation and Measurement, 2010. 
IV 
ABSTRACT 
The demand for high security in energy constrained devices such as mobiles and 
PDAs is growing rapidly. This leads to the need for efficient design of cryptographic 
algorithms which offer data integrity, authentication, non-repudiation and confidentiality 
of the encrypted data and communication channels. The public key cryptography is an 
ideal choice for data integrity, authentication and non-repudiation whereas the private key 
cryptography ensures the confidentiality of the data transmitted. The latter has an 
extremely high encryption speed but it has certain limitations which make it unsuitable 
for use in certain applications. Numerous public key cryptographic algorithms are 
available in the literature which comprise modular arithmetic modules such as modular 
addition, multiplication, inversion and exponentiation. Recently, numerous cryptographic 
algorithms have been proposed based on modular arithmetic which are scalable, do word 
based operations and efficient in various aspects. 
The modular arithmetic modules play a crucial role in the overall performance of 
the cryptographic processor. Hence, better results can be obtained by designing efficient 
arithmetic modules such as modular addition, multiplication, exponentiation and 
squarmg. 
This thesis is organized into three papers, describes the efficient implementation 
of modular arithmetic units, application of these modules in International Data 
Encryption Algorithm (IDEA). Second paper describes the IDEA algorithm 
implementation using the existing techniques and using the proposed efficient modular 
units. The third paper describes the fault tolerant design of a modular unit which has 
online self-checking capability. 
v 
ACKNOWLEDGMENTS 
It gives me great pleasure to thank all the people who have supported me and 
made this thesis possible. I would like to thank Dr. Minsu Choi, who has been a great ad-
visor throughout my Master's program and it has been a pleasure working with him. He 
has been a continuous source of motivation and has helped me develop my skills. 
I would like to express my sincere gratitude to Dr. Yiyu Shi and Dr. Sahra 
Sedighsarvestani who are serving as committee members. I would like to thank Dr. 
Beetner, Dr. McCracken, Dr. Al-Assadi and Dr. Ali Hurson who have taught me 
excellent courses during my Master's program. I am grateful to the department secretary, 
Mrs. Regina Kohout who has guided me through departmental obstacles and paperwork. I 
give sincere thanks to Jun Wu for associating with me in research and publications. 
I would like to also thank my friends Vikram Surendra, Murali Bottu, Dr. 
Bharath, Dr. Suhas, Arun Sharma, Komal and many more friends who have supported me 
continuously throughout my Degree program and provided me a refreshing environment. 
Most importantly, I would like to thank my parents, Krishna Reddy, Dhana Lakshmi, my 
brother, Sathish Reddy whose continuous support made this degree possible. 
Vl 
TABLE OF CONTENTS 
Page 
PUBLICATION THESIS OPTION .................................................................................. iii 
ABSTRACT ...................................................................................................................... .iv 
ACKNOWLEDGMENTS ................................................................................................... v 
LIST OF ILLUSTRA TIONS ............................................................................................. .ix 
LIST OF TABLES ............................................................................................................... x 
SECTION 
1. INTRODUCTION .................................................................................................. 1 
PAPER 
I. A FAST LOW -POWER MODULO 2N+ 1 MULTIPLIER DESIGN ...................... 2 
Abstract ....................................................................................... 2 
1. INTRODUCTION ............................................................................................... 3 
2. COMPRESSORS ................................................................................................. 4 
2.1. MUX VS. XOR ............................................................................................ 4 
2.2. DESCRIPTION OF COMPRESSORS ........................................................ 5 
3. ALGORITHM FOR IMPLEMENTATION OF MODULO 2N+1 
MULTIPLIER ..................................................................................................... 7 
4. PROPOSED IMPLEMENTATION OF THE MOD 2N+ 1 MUL TIPLIER ....... 1 0 
4.1. PARTIAL PRODUCTS GENERATION ................................................... 10 
4.2. PARTIAL PRODUCTS REDUCTION ..................................................... 10 
4.3. FINAL STAGE ADDITION ...................................................................... 11 
5. SIMLA TION AND RESULTS ......................................................................... 13 
5.1. SIMULATION ENVIRONMENT ............................................................. 13 
5 .2. SIMULATION RESULTS ......................................................................... 13 
6. CONCLUSIONS ............................................................................................... 15 
7. REFERENCES .................................................................................................. 16 
II. AN EFFICIENT IDEA CRYPTO-HARDW ARE USING NOVEL 
MODULAR ARITHMETIC COMPONENTS ................................................... 17 
Abstract ....................... · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · .................. 17 
vii 
1. INTRODUCTION ............................................................................................ 18 
2. COMPRESSORS .............................................................................................. 20 
2.1. MDX VS. XOR .......................................................................................... 20 
2.2. DESCRIPTION OF COMPRESSORS ...................................................... 21 
3. HARDWARE IMPLEMENTATION OF THE MOD 2N+1 ............................. 23 
3.1. PARTIAL PRODUCTS GENERATION ................................................... 23 
3.2. PARTIAL PRODUCTS REDUCTION ..................................................... 23 
3.3. FINAL STATE ADDITION ...................................................................... 24 
4. NOVEL IMPLEMENTATION OF INTERNATIONAL DATA 
ENCRYPTION ALGORITHM (IDEA) USING MODULO 2N+ 1 ................... 26 
5. SIMULATION AND RESULTS ...................................................................... 28 
5.1. SIMULATION ENVIRONMENT ............................................................. 28 
5.2. SIMULATION RESULTS ......................................................................... 28 
6. CONCLUSION ................................................................................................. 30 
7. REFERENCES .................................................................................................. 31 
III. EFFICIENT ON-LINE SELF-CHECKING MODULO 2N+ 1 MULTIPLIER 
DESIGN ................................................................................................................. 33 
Abstract ...................................................................................... 33 
1. INTRODUCTION ............................................................................................. 34 
2. PRELIMINARIES AND REVIEWS ................................................................ 37 
2.1. MDX VS. XOR .......................................................................................... 37 
2.2. DESCRIPTION OF COMPRESSORS ...................................................... 37 
2.3. SPARSE TREE ADDER BASED INVERTED END AROUND 
CARRY ADDER ....................................................................................... 41 
2.4. MODULO 2N + 1 MUL TIPLIER .............................................................. .46 
3. PROPOSED SELF-CHECKING MODULO 2N + 1 MULTIPLIER 
DESIGN ............................................................................................................. 48 
3.1. SELF-CHECKING MULTIPLIERS USING RESIDUE CODES ............ .48 
3.2. SELF-CHECKING MODULO 2N + 1 MULTIPLIERS USING 
RESIDUE CODES .................................................................................... 52 
4. pARAMETRIC COMPARISON ...................................................................... 58 
4.1. UNIT-GATE MODEL ANALYSIS ........................................................... 58 
4.2. EXPERIMENTAL RESULTS ................................................................... 60 
viii 
5. CONCLUSIONS ............................................................................................... 62 
6. REFERENCES .................................................................................................. 63 
VITA ............................................................................................................................ 67 
ix 
LIST OF ILLUSTRATIONS 
Figure Page 
PAPER I 
1. CMOS implementation of2-input (a) XOR (b) MUX ............................................... .4 
2. 5:2 compressors; (a) Existing design (b) New design ................................................. 6 
3. Initial partial product matrix ........................................................................................ 8 
4. Modified partial product matrix ................................................................................... 8 
5. Final n x n partial product matrix .............................................................................. 9 
6. Inverted EAC adder implemented using sparse tree structure ................................... 12 
7. Proposed implementation ofthe mod 216+1 multiplier using efficient compressor .. 12 
8. (a) Power (b) Delay comparisons of existing and proposed multipliers .................... 14 
PAPER II 
1. CMOS implementation of2-input (a) XOR (b) MUX .............................................. 20 
2. 5:2 compressors; (a) Existing design (b) New design ............................................... 22 
3. Proposed implementation ofthe mod 216+1 multiplier using efficient compressor .. 25 
4. Datapath of IDEA cipher with 4 pipeline stages ....................................................... 27 
PAPER III 
1. CMOS implementation of2-input (a) XOR (b) MUX ............................................... 38 
2. Block diagram of(a) 5:2 compressor (b) 7:2 compressor .......................................... 39 
3. 5:2 compressors; (a) Existing design (b) New design ............................................... .40 
4. Proposed MUX-based design of7:2 compressor ...................................................... .41 
5. (a) 16-bit Sparse tree based Inverted EAC adder (b) 4-bit conditional sum 
generator ..................................................................................................................... 45 
6. Hardware implementation ofthe modulo 216 + 1 multiplier ..................................... .47 
7. A block diagram of the multiplier with residue code check ...................................... .49 
8. A block diagram ofthe residue code checker ............................................................. 51 
9. A block diagram of the self-checking modulo 2n+ 1 multiplier ................................. 54 
10. Modulo generator with check base 24 - 1 for input width=16 .................................... 56 
11. Modulo generator with check base 24 + 1 for input width::::::I6 ................................... 57 
Table 
PAPER II 
LIST OF TABLES 
1. COMPARISON OF THE PERFORMANCE MEASUREMENTS FOR 
X 
Page 
IDEA CIPHER ...................................................................................................... 29 
PAPER III 
1. AREA AND DELAY COMPARISON OF MODULO 2N + 1 
MULTIPLIERS WITH AND WITHOUT SELF-CHECKING 
PROPERTY USING UNIT-GATE MODEL ANALYSIS ................................... 59 
2. EXPERIMENTAL RESULTS SHOWING THE AREA AND 
DELAY COMPARISON OF MODULO 2N + 1 MULTIPLIERS WITH 
AND WITHOUT SELF-CHECKING PROPERTY ............................................. 60 
1. INTRODUCTION 
Cryptography has become an integral part of most of the security applications and 
low-power embedded applications. The ability to secure and the performance of the 
cryptographic algorithm are the major factors that decide the overall efficiency of the 
system. Very secured cryptographic algorithms have been designed and it is really 
difficult to crack these algorithms. The performance of these algorithms can be improved 
by designing very efficient hardware models. This thesis focuses on the novel hardware 
implementation of a cryptographic algorithm. Experiments are performed to check the 
overall performance. The fault tolerant design of this multiplier is analyzed and 
implemented using residue codes. 
This thesis is comprised of three research publications. The first paper describes 
the efficient implementation of modulo 2n+ 1 multiplier which is the basic building block 
of International Data Encryption Algorithm (IDEA). For the efficient implementation of 
modulo multiplier, efficient compressors and a newly designed sparse tree adder is used. 
The second paper describes the design of hardware implementation of the IDEA 
cipher using novel modulo 2n+ 1 multipliers. It shown that the proposed modulo 2n+ 1 
multiplier improves the performance of the various cryptographic algorithms used in 
secure communication systems of networked instrumentation and distributed 
measurement systems. Efficient compressors and sparse tree based inverted end around 
carry adders are used to reduce the delay and complexity of the multiplier. Simulations 
are performed on the known implementation and the proposed implementation. 
The third paper describes the online self-checking model of the modulo 2n+ 1 
multiplier based on residue codes is presented. The self-checking multiplier is secured 
against faults affecting a single gate at a time and produce an error at the gate output, 
which may propagate through the subsequent gates and generate an error at the output of 
the modulo multiplier. These self-checking modulo multipliers are analyzed using unit-
gate model and compared with the modulo multipliers without self-checking property. 
These models are designed for different values of the input length and simulated to get 
the experimental results. 
PAPER 
I. A FAST LOW-POWER MODULO 2N+l MULTIPLIER DESIGN 
Rajashekhar Modugu1 and Minsu Choi1 
N ohpill Park2 
1 Department of Electrical & Computer Engineering 
Missouri University of Science and Technology 
Rolla, MO 65409, USA 
{ rrmt4b, choim} @mst.edu 
2Department of Computer Science 
Oklahoma State University 
Stillwater, OK 74078-1053, USA 
npark@cs.okstate.edu 
2 
Abstract- Modulo 2n+ 1 multipliers are the basic building blocks in many security 
applications such as International Data Encryption Algorithm in cryptography. In this 
paper, a fast low-power hardware implementation of modulo 2n+ 1 multiplier is proposed. 
Novel hardware implementations for a previously proposed algorithm are shown by using 
the efficient compressors and modulo carry look-ahead adders as the basic building 
blocks. The modulo carry lookahead adder uses the sparse-tree adder technique. The 
resulting implementations are compared both qualitatively and quantitatively, in standard 
CMOS cell technology, with the existing implementations. The results show that the 
proposed implementation is faster and consume less power than similar hardware 
implementations making them a viable option for efficient designs. 
Index Terms- Compressors; International Data Encryption Algorithm (IDEA); modulo 
multiplier; sparse-tree adder; 
3 
1. INTRODUCTION 
In the recent years, the number of internet and wireless communication nodes has 
grown rapidly, which involves the transmission of data over channels. The confidentiality 
and security requirements are becoming more and more important to protect the data 
transmitted and received. Various cryptographic systems have been studied and 
implemented to encode and to decode the data. International Data Encryption Algorithm 
(IDEA) [1] is one of the most reliable cryptographic algorithms used for transmission of 
the data. The ability to perform fast encoding and decoding operations is then still a 
major issue for the implementation of IDEA, particularly from a hardware point of view. 
The three major operations that decide the delay and the overall performance of 
IDEA cipher are modulo 2" addition, bitwise-XOR and modulo 2"+ 1 multiplication. As 
the first two operations take less time and are easy to implement, improving the delay and 
power efficiency of the modulo 2"+ 1 multiplication operation leads to significant increase 
in the performance of the entire IDEA cipher. Although numerous algorithms [2-5] of the 
modulo 2"+ 1 multiplier were proposed, hardware implementation of the same needs 
considerable effort. More recently, Vergos and Efstathiou [5] proposed an efficient 
algorithm for computing modulo 2"+ 1 multiplication, in which the partial product 
reduction block, which contributes most to the overall delay, is designed using full 
adders. Hence, the demand for efficient implementation of the partial product reduction 
block is continuously increasing. 
In this paper, a new hardware implementation of the partial product reduction 
module of the mod 2"+ 1 multiplier is proposed. And also, the final stage addition module 
is redesigned using more efficient carry lookahead adder technique. The resulting 
hardware implementation is faster and consumes less power than existing ones. 
The paper is organized as follows; Section II introduces multiplexor-based 
compressors. In Section Ill, the algorithm for computation of modulo 2"+ 1 multiplication 
is given. Section IV discusses the proposed implementation of the multiplier, with 
implementation example for the input word length, n=16, which is used in IDEA cipher. 
A comparison of this implementation to a recently proposed implementation is made in 
Section V. Conclusions are drawn in Section VI. 
4 
2. COMPRESSORS 
2.1. MUX VS. XOR 
Existing CMOS designs of 2-1 MUX and 2-input XOR are shown in Fig.l. 
According to [6], the CMOS implementation of MUX, it performs better in terms of 
power and delay compared to XOR. Suppose, X andY are inputs to the XOR gate, the 
output is XY + XY. The same XOR can be implemented using MUX with inputs X, X and 
select bit Y. The efficient implementation of compressors [7] is achieved by using both 




Fig.l CMOS implementation of2-input (a) XOR (b) MUX 
5 
2.2. DESCRIPTION OF COMPRESSORS 
A (p,2) compressor with p inputs Xl,X2 .. Xp and two output bits Sum and Carry 
along with carry input bits and carry output bits is governed by the equation: 
L:=l xi+ L:=l (Cin)i =Sum+ 2(Carry + L:=l (Cout)i) 
Efficient design of the existing XOR-based 5:2 compressor [8-9], which takes 5 
inputs and 2 carry inputs, is shown in Fig. 2a. The critical path delay of this compressor is 
4Ll-XOR (delay denoted by.!\). 
The newly designed compressors use multiplexers m place of XOR gates, 
resulting in high speed arithmetic. Also, as shown in Fig. 1 in all the existing CMOS 
implementations of the XOR and MUX gates both the output and its complement are 
available but the designs of compressors available in literature do not use these outputs 
efficiently. In the CMOS implementation of the MUX if both the select bit and its 
complement are generated in the previous stage then its output is generated with much 
less delay because the switching of the transistor is already completed. And also if both 
the select bit and its complement are generated in the previous stage then the additional 
stage of the inverter is eliminated which reduces the overall delay in the critical path. The 
new MUX-based design of 5:2 compressor [7] is shown in Fig. 2b, the delay of which is 
L1- XOR + 3..1 - MUX. CGEN block used in Fig. 2 can be obtained from the equation 




Fig. 2 5:2 compressors; (a) Existing design (b) New design 
7 
3. ALGORITHM FOR IMPLEMENTATION OF MODULO 2N+l MULTIPLIER 
The algorithm for computation of X · Y mod 2° + 1 is described below. From the 
architectural characteristic comparisons [5], the algorithm presented in [5] is considered 
as the best existing algorithm for the computation of X· Y mod 2° + 1 in the literature. 
Hence, this algorithm is used for the proposed implementation of the modulo multiplier. 
According to the algorithm it takes two n+ 1 bit unsigned numbers as inputs and gives one 
n+1 bit output. The proposed implementation can be adapted to IDEA cipher [1-2], in 
which the mod zn + 1 multiplication module takes two n-bit inputs and gives one n-bit 
output, by assigning the most significant bits of the inputs zeros and neglecting the most 
significant bit of the output. 
Let IAI 8 denote the residue of A modulo B. Let X and Y be two inputs 
represented as X= X 0 X 0 _ 1 ... x0 andY= YnYn-1 ... y0 where the most significant bits x0 
and Yn are '1' only when the inputs are 2° and 2°, respectively. X· Y mod 2° + 1 can be 
represented as follows: 
The n x n partial product matrix is derived from the initial partial product matrix 
in Fig. 3, based on several observations. This n X n partial product matrix is shown in 
Fig. 5. First observation is, the initial partial product matrix can be divided into four 
groups A, B, C and D in which the terms in only one group can be different from '0'. 
Groups A, B, D and C are different from '0', if inputs (X, Y) are in the form of (OZ, OZ), 
(1Z, OZ), (OZ, 1Z) and (10 .. 0,10 .. 0), respectively (here 'Z' is a 16-bit vector). Hence the 
four groups can be integrated into a single group by performing logical OR operation 
(denoted by v) instead of adding the bits arithmetically. Logical OR operation is 
performed on the terms of the groups B, D and A in the columns with weight 2° up to 
zzn-z and on the two terms of the groups B and D with weight 220- 1 (the ORed terms of 
the groups B and D are represented by qi, where qi = Pn,iVPi,n ). Since l220- 1lzn+1 = 
zn-1 + 1, the term with weight220- 1' qn-1> can be substituted by two terms qn-1 in the 
columns with weight zn-1 and 1, respectively, and ORed with any term of the group A 
8 
there. Moreover, since l22"1zn+1 = 1, the term Pn,ncan be ORed with Po,o· The modified 
partial product matrix is shown in Fig. 4. 
22" 22n-1 2 2n-2 .... 2"+2 2"+1 2" 2"-1 2"-2 22 21 20 
n-1,0 Pn-2,0 Pz.o P1,o Po,o 
Pn-2,1 Pn-3,1 P1,1 Po,1 
Pn-2,2 Pn-3,2 Pn-4,2 Po,2 
P4,n-1 P:s,n-1 P2,n-1 P1,n-1 Po,n-2 
P:s,n-1 P2,n·1 P1,n·1 Po,n-1 A 
Pn-2,n P2,n P1,n Po,n ID 
Fig. 3 Initial partial product matrix 
22n-2 2"+1 2" 
I 2"-1 2"·2 22 21 20 I 
I 
1 Pn-1,oYqn-1 Pn-2,0 P2.o P1.o Po,oV Pn,nY qn-1 
Pn·1,1Vqo : Pn-2,1 Pn-3,1 P1,1 Po,1 




Ps,n-2 P2,n-2 I P1,n-2 Po,n-2 
Pn-1,n-1 vqn-2 •••• P2,n-1 P1,n·1 
I 
Po,n-1 I 
Fig. 4 Modified partial product matrix 
Second observation is repositioning of the partial product terms in the modified 
partial product matrix, with weight greater than zn-t based on the following equation: 
lszil = 1-szlilnl = lczn + 1- s)zlilnl 2n+1 zn+1 zn+1 
= lszliln + znzlilnl 
zn+1 (1) 
Equation ( 1) shows that the repositioning of each bit results in a correction factor 
of znzliln. In the first partial product vector, there is only one such bit and in the second 
9 
partial product vector 2 bits need to be repositioned and so on. Hence the correction 
factor for the entire partial product matrix would be: 
COR1 = 2n(2(1 + 2 + 22 +····· +2n-2)- (n- 1)) 
(2) 
The n x n partial product matrix along with the equation (2) results in n+ 1 
operands. These partial product terms can be reduced into two final summands Sum array 
and Carry array using a Carry Save Adder (CSA) tree. Suppose the carry out bit at ith 
stage of CSA is ci with weight 2n, this carryout can be reduced into: 
lci2nbn+1 = 1-cdzn+l = 12n + c;bn+l 
Therefore the carry output bits at the most significant bit position of each stage 
can be used as carry input bits of the next stage. In an n-1 stage CSA in [5] produces n-1 
such carry out bits. Hence there will be a second correction factor (3). And the overall 
correction factor using this algorithm is: 
(3) 
The final correction factor will be the sum of COR1 and COR2. The constant '3' 
in equation (4) will be the final partial product. 
COR= COR1 + COR2 = l2n(n- 1) + 2n(2n- n- 1)lzn+1 
(4) 
2""1 2""2 211-3 22 21 20 
PPo = Pn-1,0 V qn-1 Pn-2,0 Pn-3,0 P2.o P1.o Po.o V qn-1 V Pn.n 
PP1 = Pn-2,1 Pn-3.1 Pn-4,1 P1.1 Po.1 Pn-1,1 vqo 
PP2= Pn-3,1 Pn-4,2 Pn-5,2 Po.2 Pn-1,2 V q1 Pn-2.2 
PPn-2 = P1.n-2 Po.n-2 Pn-1,n-2 V qn-3 P4.n-2 P3,n-2 P2.n-2 
PPn-1 = Po.n-1 Pn-1,n-1 V qn-2 Pn-2,n-1 P3,n-1 P2.n-1 P1.n-1 
Fig. 5 Final n X n partial product matrix 
10 
4. PROPOSED IMPLEMENTATION OF THE MOD 2N+ 1 MULTIPLIER 
The proposed implementation of the modulo multiplier consists of three modules. 
First module is to generate partial products, second module is to reduce the partial 
products to two final operands and the last module is to add the Sum and Carry operands 
from partial products reduction to get the final result. 
4.1. PARTIAL PRODUCTS GENERATION 
From the above n x n partial product matrix (shown in Fig. 5), it is possible to 
observe that the partial product generation requires AND, OR and NOT gates. The most 
complex function of partial product generation module is Pn-l,n-l V qn-z, where Pt,j = 
atbj and qi = Pn,tVPi,n· 
4.2. PARTIAL PRODUCTS REDUCTION 
This is the most important module which largely determines the critical path delay 
and the overall performance of the multiplier. Hence this module needs to be designed so 
as to get minimum delay and consume less power. 
The existing implementations [4-5, 11] use full adders (FA) and half adders (HA) 
to construct this module. The series of full adders in any column can be replaced by the 
novel compressors that take the same number of inputs. In the proposed implementation 
use of suggested compressors is done which not only reduces the delay and power 
consumption but also the area of the circuit. For a modulo 216+1 multiplier in IDEA 
cipher the FA implementation requires fifteen full adders in series in any column, these 
fifteen full adders can be replaced by two 7:2 compressors, one 5:2 compressor and two 
3:2 compressors. 
Computation of the correction factor COR for compressor implementation 
involves computing only COR2, because CORl is obtained based on repositioning of the 
11 
partial product term, which is same for both implementations. The correction factor COR2 
computation for FA implementation which has n- 1 stages of additions is shown in [5]. 
And the COR2 computation for the proposed multiplier implementation using the 
compressors also yields the same result. Since, any (p, 2) compressor can be primarily 
designed using (p - 2) F As which give p - 2 carry outs with 2n weight. Hence, the overall 
correction factor COR computation for FA implementation and compressor 
implementation yield the same result i.e, 3 as shown in (4). 
4.3. FINAL STAGE ADDITION 
The partial product reduction module gives one n-bit carry vector and one n-bit 
sum vector which need to be added in the final stage addition module. Very efficient 
parallel prefix adders are designed to do this operation [2]. 
Suppose S and C are sum and carry vectors produced after the partial product 
reduction section. As it is shown in the work of Zimmerman [2] that: 
IS+ C + llzn+1 = IS+ C + Coutl 2n (5) 
The constant '1' in the above equation can be obtained from the final partial 
product term COR ( 4) which is the constant '3'. Hence the new final partial product is the 
constant '2'. The equation (5) can be implemented using an inverted End-Around-Carry 
adder [2, 4-5]. Even though the propagation delay of this adder is in the order of logn, it 
has a drawback of high interconnect complexity and high fan-out. This can be overcome 
by sparse tree adder [ 1 0] based on the prefix network logic. The sparse tree adder takes 
the carry for every four bits instead of taking it at every stage and using a carry select 
block for selecting the final carry after the prefix network. This sparse tree adder was 
proven to be much more efficient in terms of both delay and power when compared with 
the existing prefix tree based adders [12]. Hence this sparse tree can be used to design 
Inverted-End-Around-Carry adder. The newly designed Inverted-End-Around-Carry 
adder using sparse tree adder structure is shown in Fig. 6. The proposed implementation 
of the modulo 216 + 1 multiplier for IDEA cipher is shown in Fig. 7. 
12 
C15 C11 C7 C3 
Fig. 6 Inverted EAC adder implemented using sparse tree structure 
PPo-e,taPP,. .. ,,., PPo.e.t3 PP1-a, t:t PP04.2 PP,.u PPo.a, 1 PP, ... ,, 
Fig. 7 Proposed implementation of the mod 216+ 1 multiplier using efficient compressor 
13 
5. SIMLATION AND RESULTS 
5.1. SIMULATION ENVIRONMENT 
All the simulations have been carried out using Mentor Graphics design suite. The 
calculation of power and delay are carried out using the Eldo simulation tool. The power 
and delay calculations are done using the O.l8f.1 CMOS technology. The simulations are 
performed at 1.8V with all inputs fed at a frequency of 100 MHz. 
5.2. SIMULATION RESULTS 
The proposed and existing implementations of the modulo 2n+ 1 multipliers for 
different values of n (8, 16, 32) are done with CMOS tsmc018 technology. The power 
and delay comparisons with the existing implementations are given in Fig. 8a and Fig. 
8b, respectively. For n=16 case which is used in the IDEA cipher, the power and delay 















• Existing • Proposed 
0.47 2.44 1.86 
8 N value 16 
(a) 
• Existing • Proposed 













In conclusion, an efficient implementation of modulo 2"+ 1 multiplier is proposed. 
The use of novel compressors in place of full adders resulted in considerable savings in 
terms of delay and power. The final stage adder is redesigned using sparse tree adder 
which has less interconnection complexity. Simulations have been performed on the 
proposed implementation and on the existing implementation. The proposed 
implementation is proven to perform better than the existing one in every aspect, (i.e, 















Zimmermann, R., Curiger, A., Bonnenberg, H., Kaeslin, H., Felber, N., and 
Fichtner, W., "A 177 Mb/s VLSI implementation of the international data 
encryption algorithm," IEEE J. Solid-State Circuits, 1994, 29, (3), pp. 303-307. 
Zimmermann, R., "Efficient VLSI implementation of modulo (2"±1) addition and 
multiplication," IEEE trans. Comput, 2002, 51, pp. 1389-1399. 
Sousa, L., and Chaves, R., "A universal architecture for designing efficient modulo 
2"+ 1 multipliers," IEEE Trans. Circuits Syst. I, 2005, 52, pp. 1166--1178. 
Efstathiou, C., and Vergos, H.T., Dimitrakopoulos, G, and Nikolos, D.: "Efficient 
diminished-! modulo 2"+ 1 multipliers," IEEE Trans. Comput, 2005, 54, pp. 491-
496. 
Vergos, H.T., and Efstathiou, C., "Design of efficient modulo 2"+ 1 multipliers," lET 
Comput. Digit. Tech, 2007, 1, (1), pp. 49-57. 
Zimmermann, R., and Fichtner, W., "Low-power logic styles: CMOS versus pass-
transistor logic," IEEE J Solid- State Circuits, vol. 32, pp. 1079-1090, July 1997 
Veeramachaneni, S., Avinash, L., Rajashekhar, M., and Srinivas, M.B., "Efficient 
Modulo (2k±l) Binary to Residue Converters," System-on-Chip for Real-Time 
Applications, The 6th International Workshop on Dec. 2006 pp.195 - 200. 
Chang, C.H., Gu, J., and Zhang, M., "Ultra low-voltage lowpower CMOS 4-2 and 
5-2 compressors for fast arithmetic circuits," IEEE J. Circuits and Systems I, 
Volume: 51, Issue: 10 pp: 1985- 1997,2004. 
Rouholamini, M., Kavehie, 0., Mirbaha, A.P., Jasbi, S.J., and Navi, K., "A New 
Design for 7:2 Compressors," Computer Systems and Applications, 2007. AICCSA 
'07. IEEE/ACS International Conference on 13-16 May 2007 Page(s):474- 478. 
Mathew, S., Anders, M., Krishnamurthy, R.K., and Borkar, S., "A 4-GHz 130-nm 
address generation unit with 32-bit sparse-tree adder core," In IEEE Journal of 
Solid-State Circuits, Volume 38, Issue 5, May 2003 Page(s):689- 695. 
Zhongde, W., Graham, A., and William, C., "An efficient tree architecture for 
modulo 2n+ 1 multiplication," VLSI Signal Processing 14(3): 241-248 (1996). 
Kogge, P., and Stone, H.S., "A parallel algorithm for the efficient solution of a 
general class of recurrence equations," IEEE Trans. Comput, vol. C-22, pp. 786-
793, Aug 1973. 
II. AN EFFICIENT IDEA CRYPTO-HARDW ARE USING NOVEL 
MODULAR ARITHMETIC COMPONENTS 
Rajashekhar Modugu and Minsu Choi 
Department of Electrical & Computer Engineering 
Missouri University of Science and Technology 
Rolla, MO 65409, USA 
{ rrmt4b, choim} @mst.edu 
17 
Abstract- The cryptographic algorithms such as International Data Encryption 
Algorithm (IDEA) have found various applications in secure transmission of the data in 
networked instrumentation and distributed measurement systems. Modulo zn+ 1 multiplier 
and squarer play a pivotal role in the implementation of these algorithms. In this paper, an 
efficient hardware design of the IDEA using novel modulo zn+ 1 multiplier and squarer as 
the basic modules is proposed for faster, smaller and low-power IDEA circuits. Novel 
hardware implementation of the modulo zn+ 1 multiplier is shown by using the efficient 
compressors and sparse tree based inverted end around carry adders is given. The novel 
modules are applied on IDEA algorithm and the resulting implementation is compared 
both qualitatively and quantitatively with the IDEA implementation using the existing 
multiplier implementations. The measurement results show that the proposed 
implementation is faster and smaller and also consume less power than similar hardware 
implementations making it a viable option for efficient hardware designs. 
Index Terms- Modulo zn + 1 multiplier; International Data Encryption Algorithm 
(IDEA); Sparse-tree adder; Power/area/speed measurement; 
18 
1. INTRODUCTION 
The demand for high security in communications channels, networked 
instrumentation and distributed measurement systems is ever growing rapidly. The 
confidentiality and security requirements are becoming more and more important to 
protect the data transmitted and received. This leads to the need for efficient design of 
cryptographic algorithms which offer data integrity, authentication, non-repudiation and 
confidentiality of the encrypted data across the communication channels. Various 
cryptographic algorithms have been studied and implemented to ensure security of these 
systems. In this paper, modulo 2n+ 1 multiplier has been much focus as it has found its 
important role in IDEA algorithm. For example, the three major operations that decide 
the overall performance and delay of the IDEA [1, 4, 15] are modulo 216 addition, 
bitwise- XOR and modulo 216+ 1 multiplication and the GF(2n) Montgomery 
multiplication and modular exponentiation can be implemented using repeated 
multiplication and squaring of the vectors. Among these operations, improving the delay 
and power efficiency of the modulo 2n+ 1 multiplication operation leads to significant 
increase in the performance of the entire cryptographic cipher. 
Numerous hardware implementations of the IDEA algorithm are proposed in the 
literature using different modulo 216+1 multiplier architectures. The IDEA algorithm has 
been implemented in software [3] on Intel Pentium II 445 MHz with encryption rate of 
23:53 Mb/Sec. Later, IDEA was realized on hardware chip by curiger et.all [1] with 
encryption rates up to 177 Mb/sec. By using a bit-serial implementation [4], which 
enables the IDEA to be fully, pipelined the encryption rates reached 500Mb/sec with 125 
MHz clock rate. The efficiency of the IDEA cipher can still be improved if efficient basic 
modules such as modulo multipliers and adders are used. The efficient implementation of 
the modulo 2n+ 1 multiplier based on novel compressors and sparse tree based inverted 
end around carry adders is presented in [7]. Even though the architecture of the modulo 
multiplier is very efficiently proposed in [6], the hardware implementation and 
optimization are considerably improved in [7]. This is resulted by replacing the full adder 
arrays with the novel compressors and the final stage adder with the sparse tree based 
inverted end around carry adder. 
19 
The paper is organized as follows; Section II introduces multiplexor-based 
compressors. In Section III, the hardware implementation of modulo 2" + 1 multiplier is 
given. Section IV discusses the proposed implementation of the IDEA cipher which uses 
modulo 2 16+ 1 multiplier. A comparison of this implementation to a recently proposed 
implementation is made in Section V. Conclusions are drawn in Section VI. 
20 
2. COMPRESSORS 
2.1. MUX VS. XOR 
Existing CMOS designs of 2-1 MUX and 2-input XOR are shown in Fig.l. 
According to [6], the CMOS implementation of MUX, it performs better in terms of 
power and delay compared to XOR. Suppose, X andY are inputs to the XOR gate, the 
output is XY + XY. The same XOR can be implemented using MUX with inputs X, X 
and select bit Y. The efficient implementation of compressors [7] is achieved by using 




Fig.l CMOS implementation of2-input (a) XOR (b) MUX 
21 
2.2. DESCRIPTION OF COMPRESSORS 
A (p, 2) compressor with p inputs Xl,XZ . . Xp and two output bits Sum and Carry 
along with carry input bits and carry output bits is governed by the equation: 
L~=1 xi+ L~=1 (Cin)i =Sum+ 2(Carry + L~=1 (Cout)D 
Efficient design of the existing XOR-based 5:2 compressor [8-9], which takes 5 
inputs and 2 carry inputs, is shown in Fig. 2a. The critical path delay of this compressor is 
4.1-XOR (delay denoted by~). 
The newly designed compressors use multiplexers in place of XOR gates, 
resulting in high speed arithmetic. Also, as shown in Fig. 1 in all the existing CMOS 
implementations of the XOR and MUX gates both the output and its complement are 
available but the designs of compressors available in literature do not use these outputs 
efficiently. In the CMOS implementation of the MUX if both the select bit and its 
complement are generated in the previous stage then its output is generated with much 
less delay because the switching of the transistor is already completed. And also if both 
the select bit and its complement are generated in the previous stage then the additional 
stage of the inverter is eliminated which reduces the overall delay in the critical path. The 
new MUX-based design of 5:2 compressor [7] is shown in Fig. 2b, the delay of which 
is .1 - XOR + 3.1 - MUX. CGEN block used in Fig. 2 can be obtained from the equation 




Fig.2 5:2 compressors; (a) Existing design (b) New design 
23 
3. HARDWARE IMPLEMENTATION OF THE MOD 2N+l 
The hardware implementation of the modulo multiplier consists of three modules. 
First module is to generate partial products, second module is to reduce the partial 
products to two final operands and the last module is to add the Sum and Carry operands 
from partial products reduction to get the final result. 
3.1. PARTIAL PRODUCTS GENERATION 
The n X n partial products matrix is obtained from the n+ 1-bit input vectors. This 
partial product matrix is generated after repositioning the bits of the initial partial product 
matrix based on several observations presented in [6]. The partial products bits can be 
computed from AND, OR and NOT gates. The most complex function of partial product 
generation module is Pn-l,n-1 V qn-Z• where Pi,j = aibj and qi = Pn,iVPi,n· 
3.2. PARTIAL PRODUCTS REDUCTION 
This is the most important module which largely determines the critical path delay 
and the overall performance of the multiplier. Hence this module needs to be designed so 
as to get minimum delay and consume less power. 
The implementations from the literature [5, 6, and 13] use full adders (FA) and 
half adders (HA) to construct this module. The series of full adders in any column can be 
replaced by the novel compressors that take the same number of inputs. In the proposed 
implementation use of suggested compressors is done which not only reduces the delay 
and power consumption but also the area of the circuit. For a modulo 216 + 1 multiplier in 
IDEA cipher the Carry Save Adder (CSA) array implementation using Full Adders 
requires fifteen full adders in series in any column, these fifteen full adders can be 
replaced by two 7:2 compressors, one 5:2 compressor and two 3:2 compressors. 
24 
Correction factor computation is an important step while generating the partial 
products matrix. The full adder implementation [6] and the compressor based 
implementations [7] result in the same value. Because of the space constraints, 
computation of the correction factor COR for full adder implementation [6] is not given 
in this paper. COR computation for compressor implementation involves computing only 
COR2, because COR1 is obtained based on repositioning of the partial product term, 
which is same for both implementations. The correction factor COR2 computation for FA 
implementation which has n-1 stages of additions is shown in [6]. And the COR2 
computation for the proposed multiplier implementation using the compressors also 
yields the same result. Since, any (p, 2) compressor can be primarily designed using (p-2) 
F As which give p-2 carry outs with 2n weight. Hence, the overall correction factor COR 
computation for CSA array FA implementation and compressor implementation yield the 
same result i.e, 3 as shown in [5]. 
3.3. FINAL STATE ADDITION 
The partial product reduction module gives one n-bit carry vector and one n-bit 
sum vector which need to be added in the final stage addition module. Very efficient 
parallel prefix adders are designed to do this operation [2]. 
Suppose S and C are sum and carry vectors produced after the partial product 
reduction section. As it is shown in the work of Zimmerman [2] that: 
IS+ C + llzn+l = IS+ C + Coutl 2n (1) 
The equation (1) can be implemented using an inverted End- Around-Carry adder 
[2, 5, and 6]. Even though the propagation delay of this adder is in the order of log2n, it 
has a drawback of high interconnect complexity and high fan-out. This can be overcome 
by sparse tree adder [12, 16] based on the prefix network logic. The sparse tree adder 
generates the carry for every four bits instead of generating it at every stage and using a 
carry select block for selecting the final carry after the prefix network. This sparse tree 
adder was proven to be much more efficient in terms of both delay and power when 
25 
compared with the existing prefix tree based adders [ 14] . Hence this sparse tree can be 
used to design Inverted-End-Around-Carry adder. 
The newly designed Inverted-End-Around-Carry adder using sparse tree adder 
structure is used in the final stage addition of the modulo multiplier is shown in Fig. 3. 
The proposed implementation of the modulo 216+ 1 multiplier for IDEA cipher is shown 
in Fig. 3 and R16R1s ... .. R2R1R0 represents the final product of the modulo 2 16+ 1 multiplier. 
Fig. 3 Proposed implementation ofthe mod 216+1 multiplier using efficient compressor 
26 
4. NOVEL IMPLEMENTATION OF INTERNATIONAL DATA ENCRYPTION 
ALGORITHM (IDEA) USING MODULO 2N+l 
The modulo 2n+ 1 computation is an integral part of the International Data 
Encryption Algorithm (IDEA) where n = 16 [1, 4, 15]. Three major operations that 
decide the overall delay and performance of IDEA cipher are: 
1) modulo 216 addition, 
2) bitwise-XOR and 3) modulo 216 + 1 multiplication. 
As the first two operations take less time and are easy to implement, the delay and 
power efficiency of the entire IDEA cipher depends significantly on the modulo 216+1 
multiplication operation. Hence, the IDEA cipher is implemented using the proposed 
modulo multiplier and compared with the existing implementations. To encrypt a data 
block using IDEA cipher, the data should be processed through three modulo 
multiplication operations in a single round and the manipulated data again should pass 
through seven such rounds iteratively and a final output transformation to produce the 
final encrypted output. The IDEA cipher takes 64-bit input data and produces a 64-bit 
cipher text with a 128-bit key. The encryption and decryption algorithms in IDEA are 
almost identical except they utilize two different sets of sub key generated by the same 
key with different processes. 
The IDEA encryption and decryption processes consist of eight rounds of data 
manipulation using sub keys and a final output transformation stage. In this cipher, all the 
operations are carried out on 16-bit sub-blocks. In the encryption process, the input data 
block of 64-bits is divided into 4 sub blocks of 16-bits each (X., X2, X3, X4). 52 sub 
keys for the encryption process are generated from the original 128-bit key by shifting a 
part of it. Out of the 52 sub keys, six different sub keys (i.e. z<r>l ;z<r>2 ;z<r>3,z<r>4,z<r>5 and 
z<r> 6, where r is the round number) are used for each round and the remaining 4 sub keys 
are used in the final output transformation stage. The 16-bit outputs at each round are 
represented as y<r>t. y<r>2, y<r\ y<r>4 and W.,W2,W3,W4 are the outputs of the final output 
stage transformation. The 52 subkeys used for the decryption process are obtained using a 
different algorithm [ 17]. As shown in Fig. 4, the critical path consists of three modulo 
216+1 multiplication operations, two modulo 216 addition operations and two 16-bit XOR 
27 
operations in each round. In the final output transformation stage, critical path consists of 
a single modulo 216+ 1 multiplication operation. The throughput of the IDEA cipher can 
be improved, if the delay of the modulo 216+ 1 multiplication operation is reduced in the 
pipelined implementation of the IDEA cipher. Fig. 4 shows the datapath of encryption 
process of the IDEA cipher and datapath of a single round with 4 pipeline stages with the 





y (1) y (1) 
. 
2 Seven more rounds 3 







Pipelined datapath of a single round of IDEA cipher 
Modulo 218+1 multiplication of two 16-bit 
blocks, here the value ·o· is considered as 218 
Modulo 218 addition of two 16-bit blocl<s 
16-bit XOR operation on 16-bit blcoks 
Fig. 4 Datapath of IDEA cipher with 4 pipeline stages 
28 
5. SIMULATION AND RESULTS 
This particular design of the IDEA cipher with four pipeline stages using novel 
modulo 2n+ 1 multipliers is used to analyze and compare with the well-known IDEA 
cipher implementations. The use of the novel modulo multiplier improves the throughput 
and performance of the IDEA cipher significantly. The hardware implementation of the 
modulo multiplier consists of three modules. First module is to generate partial products, 
second module is to reduce the partial products to two final operands and the last module 
is to add the Sum and Carry operands from partial products reduction to get the final 
result. 
5.1. SIMULATION ENVIRONMENT 
All the simulations hav~ been carried out using Mentor Graphics design suite. The 
IDEA cipher design is specifi~d using verilog HDL and the multiplier descriptions are 
mapped on a 0.18 1m CMOS standard cell library using Leonardo Spectrum synthesis 
tool from Mentor Graphics. The design is optimized for high speed performance. Netlists 
generated from synthesis tool fU'C passed on to standard route and place tool, the layouts 
are iteratively generated to get the circuits with minimum area. The calculation of power 
and delay are carried out using the Eldo simulation tool. The simulations are performed at 
1.8V with all inputs fed at a frequency of 25 MHz. 
5.2. SIMULATION RESUL 'l'S 
The IDEA cipher is implemented using both the proposed multiplier and the 
multipliers presented in [6]. Various performance measurements for the IDEA cipher 
using both the proposed multiplier and the existing multiplier are parametrically obtained 
and listed in Table 1. 
TABLE 1. COMPARISON OF THE PERFORMANCE MEASUREMENTS FOR 
IDEA CIPHER 
Performance Using proposed Using the %Reduction 
Measurement Multipliers multipliers in [6] 
Encryption Rate 460.25 412.15 11.25 Mb/sec 
Critical Path 4.372 5.168 15.4 delay 
Area of the cipher 




Hardware implementation of the IDEA cipher using novel modulo 2n+ 1 
multipliers is presented in this paper. It is shown that the proposed modulo 2n+ 1 
multiplier improves the performance of the various cryptographic algorithms used in 
secure communication systems of networked instrumentation and distributed 
measurement systems. Efficient compressors and sparse tree based inverted end around 
carry adders are used to reduce the delay and complexity of the multiplier. Simulations 
are performed on the known implementation and the proposed implementation. The 
presented implementation is proven to perform better than the existing one in various 
aspects, (i.e, throughput and critical path delay). 
31 
7. REFERENCES 
[1] Zimmermann, R., Curiger, A., Sonnenberg, H., Kaeslin, H., Felber, N., and 
Fichtner, W., "A 177 Mb/s VLSI implementation of the international data 
encryption algorithm," IEEE J. Solid-State Circuits, 1994, 29, (3), pp. 303-307 
[2] Zimmerman, R., "Efficient VLSI implementation of modulo 2"±1 addition and 
multiplication," IEEE trans. Comput, 2002,51, pp. 1389-1399. 
[3] Cheung, K.H., Tsoi, P.H.W., Leong, A., and Leong, M.P., "Tradeoffs in parallel and 
serial implementations of the international data encryption algorithm IDEA," 
Lecture Notes in Computer Science, vol. 2162, pp. 333-, 2001. 
[4] Cheung, K.H., Tsoi, P.H.W., Leong, A., and Leong, M.P., "A bit-serial 
implementation of the international data encryption algorithm IDEA," 2000 IEEE 
Symposium on Field-Programmable Custom Computing Machines, pp. 122 -131, 
2000. 
[5] Efstathiou, C., Vergos, H.T., Dimitrakopoulos, G., and Nikolos, D., "Efficient 
diminished-! modulo 2"+1 multipliers," IEEE Trans. Comput, 2005, 54, pp. 491-
496. 
[6] Vergos, H.T., and Efstathiou, C., "Design of efficient modulo 2"+ 1 multipliers," lET 
Comput. Digit. Tech, 2007, 1, (1), pp. 49-57. 
[7] Rajashekar, M., Nohpill, P., and Minsu, C., "A Fast Low-Power Modulo 2"+ 1 
Multiplier Design," 2009 IEEE International Instrumentation and Measurement 
Technology Conference, pp.951-956, May 2009. 
[8] Zimmermann, R., and Fichtner, W., "Low-power logic styles: CMOS versus pass-
transistor logic," IEEE J. Solid- State Circuits, vol. 32, pp. 1079-1090, July 1997. 
[9] Veeramachaneni, S., Avinash, L., Rajashekhar, M., and Srinivas, M.B., "Efficient 
Modulo 2k±l Binary to Residue Converters System-on-Chip for Real-Time 
Applications," The 6th International Workshop on Dec. 2006 pp.l95- 200. 
[10] Chang, C.H., Gu, J., and Zhang, M., "Ultra low-voltage lowpower CMOS 4-2 and 
5-2 compressors for fast arithmetic circuits," IEEE J. Circuits and Systems I, 
Volume: 51, Issue: 10 pp: 1985- 1997,2004. 
[11] Rouholamini, M., Kavehie, 0., Mirbaha, A.P., Jasbi, S.J., and Navi, K., "A New 
Design for 7:2 Compressors," Computer Systems and Applications, 2007. AICCSA 
'07. IEEE/ACS International Conference on 13-16 May 2007 Page(s):474- 478. 
[12] Mathew, S., Anders, M., Krishnamurthy, R.K., and Borkar, S., "A 4-GHz 130-nm 
address generation unit with 32-bit sparse-tree adder core," In IEEE Journal of 
Solid-State Circuits, Volume 38, Issue 5, May 2003 Page(s):689- 695. 
[13] Zhongde, W., Graham, A., Jullien, B., and William, C., "An efficient tree 







Kogge, P., and Stone, H.S., "A parallel algorithm for the efficient solution of a 
general class of recurrence equations," IEEE Trans. Comput, vol. C-22, pp. 786-
793,Aug 1973. 
Curiger, C., "VINCI: VLSI Implementation of the New Secret-keyBlock Cipher 
IDEA," Proc. ofthe Custom Integrated Circuits Conference, San Diego, USA, May 
1993. 
Yan, S., Dongyu, Z., Minxuan, Z., and Shaoqing, L., "High Performance Low-
Power Sparse-Tree Binary Adders," 8th International Conference on Solid state and 
Integrated Circuit Technology, ICSICT 2006. 
Yi-Jung, C., Dyi-Rong, D., and Yunghsian, S.H., "Improved Modulo (2"+1) 
Multiplier for IDEA," J. lnf. Sci. Eng. 23(3): 911-923 (2007). 
III. EFFICIENT ON-LINE SELF-CHECKING MODULO 2N+l 
MULTIPLIER DESIGN 
Rajashekhar Modugu and Minsu Choi 
Department of Electrical & Computer Engineering 
Missouri University of Science and Technology 
Rolla, MO 65409, USA 
{ mnt4b, choim} @mst.edu 
33 
Abstract- Modulo 2n+ 1 multiplier is one of the critical components in the area of data 
security applications such as International Data Encryption Algorithm (IDEA), digital 
signal processing and fault tolerant systems that demand high reliability and fault 
tolerance. Transient faults caused by electrical noise or external interference are resulting 
in soft errors which should be detected online. The effectiveness of the residue codes in 
the self-checking implementation of the modulo multipliers has been rarely explored. In 
this paper, an efficient hardware implementation of the self-checking modulo 2n+ 1 
multiplier is proposed based on the residue codes. Different check bases in the form 2c -1 
or 2c + 1 ( c € N) are selected for various values of the input operands. In the 
implementation of the modulo generators and modulo multipliers, novel multiplexor-
based compressors are applied for efficient modulo 2n + 1 multipliers with less area and 
lower power consumption. In the final addition stage of the modulo multipliers and 
modulo generators, efficient sparse tree based inverted end around carry adders are used. 
The proposed model is capable of detecting errors caused by faults on a single gate at a 
time. The experimental results show that the proposed self-checking modulo 2n+ 1 
multipliers have less area overhead and low performance penalty. 
34 
1. INTRODUCTION 
In the recent years, the number of internet and wireless communication nodes has 
grown rapidly, which involves the transmission of data over channels. The confidentiality 
and security requirements are becoming more and more important to protect the data 
transmitted and received; consequently secured communication of the data is given the 
utmost priority. Various cryptographic systems have been studied and implemented to 
ensure the security of these systems. International Data Encryption Algorithm (IDEA) is 
one of the most reliable cryptographic algorithms used for transmission of the data [ 10, 
11, 12]. In the hardware implementation of IDEA, the three equation major operations 
that decide the delay and the overall performance of IDEA cipher are modulo 2" addition, 
bitwise-XOR and modulo 2" + 1 multiplication. The performance of data path of the 
IDEA cipher significantly depends on the modulo 2" + 1 multiplication module. Apart 
from this, the modulo 2" + 1 module has found applications in Fermat number transform 
computation [13], digital signal processing [1] and fault tolerant design of ad-hoc 
networks[2]. Hence, the efficient and fault secured design ofthe modulo 2" + 1 multiplier 
is highly desired. The protection against errors is necessary in security applications such 
as IDEA for reliability. In VLSI systems transient faults can be detected by built in online 
fault detection circuits and self-checking circuits have this property [25, 33, 34]. 
Especially, transient faults caused by internal noise or external interference are not 
tolerable in this high speed computing world, these faults should be detected online. The 
self-checking designs detect the errors immediately as they occur and the output can be 
corrected by repeating the last operation. Hence, it is highly desirable to design efficient 
algorithms and methods that can detect the errors on-line, which may prevent any harm 
caused by the faults. Designing efficient self-checking circuits with less area overhead 
and performance penalty has been an important challenge in the area of fault tolerant 
applications. In the recent years, various self-checking circuits [14, 15, 16, 17] using 
different coding schemes such as parity prediction, arithmetic codes are presented, to 
check the functionality of the circuits. Self-checking arithmetic circuits using residue 
codes are reported in some of the industry applications [28, 29]. Parity code schemes for 
memory systems and register files may achieve fault secure property at low hardware 
35 
cost; sometimes they fail to achieve the fault secure property in arithmetic circuits [20, 
22]. In the self-checking arithmetic circuits using parity prediction scheme [24] detects 
errors only at the output of the circuit; however a fault in the intermediate signal gets 
propagated to other output signals and remains undetected. Even though the parity codes 
result in less area overhead in self-checking circuits, arithmetic circuits may sometimes 
produce multiple output errors which are not detectable by the parity codes. Especially in 
case of multipliers the circuit overhead is in the range of 40% to 50% when parity 
prediction codes are used [19]. Therefore, in the multipliers with large input operand 
width, the overall area overhead is adversely affected with parity prediction schemes. In 
the literature, self-checking implementations of the arithmetic circuits such as adders and 
multipliers have been proposed [30, 31, and 32]. These self-checking circuits use parity 
prediction schemes and arithmetic code schemes in the design by trading off with the 
hardware overhead, performance penalty and fault secureness. Few of the self-checking 
circuits are described as follows. In [23], self-checking adder circuits based on arithmetic 
codes were proposed [26, 27]. These checkers using arithmetic codes suffered from 
hardware complexity and overhead. An alternative method to design self-checking adders 
[24] is by using parity prediction schemes. However, this approach detects errors only at 
the outputs and a fault in the carry gets propagated and remains undetected leading to 
unreliability. A different technique is used in [14], to design a self-checking carry-select 
adder. The adders are totally self-checking for both permanent and transient stuck-at-
faults. A self-checking multiplier is proposed in [25], based on parity prediction scheme. 
The multiplier consists of AND matrix, Carry Save Adder and a final sum-bit duplicated 
adder. Single stuck-at-faults in the combinational logic and all even or odd errors in one 
of the duplicated output registers are detected. In [17], a self-checking code-disjoint 
booth multiplier based on linear Carry Save Addition is designed. This implementation 
can detect all the single input faults, single stuck-at-faults and all errors in the output 
register. In the work of [17, 40], it is described that the transient faults in the circuits 
create soft errors in the output latches of the combinational circuit when: 
• An output is related to the faulty sub circuit with respect to the input(logical 
condition). 
36 
• A pulse, altered by the faults, has a significant pulse width and amplitude 
(electrical condition). 
• A pulse, resulting from the faults, arriving at the clock transition (latching 
window). 
Because of the numerous masking effects, these transient faults result in single bit 
errors. Hence, circuits which can detect single input faults, single stuck-at-faults or 
multiple output faults are of usual interest. In this paper, a new hardware implementation 
of the self-checking modulo 2n + 1 multiplier based on residue codes is proposed. In the 
proposed implementation several techniques are used to come out with an efficient self-
checking modulo 2n+ 1 multiplier and they are listed below: 
• Efficient compressors are employed in the multiplier design and modulo 
generators design to reduce the overhead. 
• The residue code circuits with the check bases of the form 2k -1 and 2k + 1 are 
efficiently designed using compressors and sparse tree based adders. 
The resulting self-checking circuit has area overhead in the range of 20% to 45% 
for different values of n. 
The paper is organized as follows; Section II introduces multiplexor-based 
compressors, which are used in the self-checking multiplier design [5]. Section III 
discusses the proposed implementation of the self-checking modulo 2n + 1 multiplier, and 
the efficient implementation of the modulo generators with check bases 2k -1 and 2k + 1 
is given. Experimental results showing the area overhead and performance penalty of the 
resulting self-checking circuits are given in IV. Conclusions are drawn in Section V. 
37 
2. PRELIMINARIES AND REVIEWS 
2.1. MUX VS. XOR 
Multiplexor (MUX) is one of the logic gates used extensively in the digital 
design, which is very useful in efficient design of arithmetic and logic circuits. According 
to the CMOS implementation ofMUX [6], it performs better in terms of power and delay 
compared to exclusive-OR (XOR). Suppose, X and Y are inputs to the XOR gate, the 
output is XY + XY. The same XOR can be implemented using MUX with inputs X, X and 
select bit Y. Efficient compressors have been designed using MUX and reported in [7]. In 
the proposed compressors [7], both output and its complement of these gates are used. 
This also reduces the total number of garbage outputs. Existing CMOS designs of 2: 1 
MUX and 2-input XOR are shown in Fig. 1 for comparison. 
2.2. DESCRIPTION OF COMPRESSORS 
A (p:2) compressor has p inputs X1,X2 ..... Xp-I,Xp and two output bits (i.e., Sum 
bit and Carry bit) along with carry input bits and carry output bits. Its functionality can be 
represented by the following equation: 





Fig. I CMOS implementation of2-input (a) XOR (b) MUX 
For example, a (5, 2) compressor takes 5 inputs and 2 carry inputs and generate a 
Sum and Carry bit along with two carry out bits. Block diagrams of 5:2 and 7:2 
compressors are shown in Fig. 2. Efficient designs of the existing XOR-based 7:2, 5:2 
and 4:2 compressors [8, 9]have critical path delays of 6~(XOR), 4~(XOR) and 3~(XOR) 
(delay denoted by ~). respectively [8]. 
39 










Fig. 2 Block diagram of(a) 5:2 compressor (b) 7:2 compressor 
The newly proposed efficient compressors [7] use multiplexers in place of XOR 
gates, resulting in high speed arithmetic due to reduced gate delays. Also as shown in Fig. 
1, in all the existing CMOS implementations of the XOR and MUX gates both the output 
and their complements are available but the designs of compressors available in literature 
do not use these outputs efficiently. In CMOS implementation of the MUX if both the 
select bit and its complement are generated in the previous stage then its output can be 
generated with much less delay because the switching of the transistor is already 
completed. And also if both the select bit and its complement are generated in the 
previous stage then the additional stage of the inverter can be eliminated which reduces 
the overall delay in the critical path. The existing XOR based and proposed MUX-based 
designs of a 5:2 compressor are shown in Fig. 3, the delays of which are !::.. (XOR) +3L\ 
(MUX) and 4L\ (XOR). These compressors are primitive blocks of the proposed self-
40 
checking modulo multipliers. The proposed MUX-based design of the 7:2 compressor is 
shown in Fig. 4. CGEN block used in Fig. 3, Fig. 4 can be obtained from the equation 
Cout1 = (x1 + x2) · x3 + x1 · x2. 
(a) 
(b) 
Fig. 3 5:2 compressors; (a) Existing design (b) New design 
Carry 
Fig. 4 Proposed MUX-based design of7:2 compressor 
2.3. SPARSE TREE ADDER BASED INVERTED END AROUND CARRY 
ADDER 
41 
In binary addition operation, the critical path is determined by the carry 
computation module. Among various formulations to design carry computation module, 
parallel prefix formulation [ 41] is delay effective and has regular structure suitable for 
efficient hardware implementation. The binary addition of two numbers using a parallel 
prefix network is done as follows. Let A = an-lan-2 .... atao and B = bn-tbn-2 .... btbo be two 
weighted input operands to the network. The generate bit (g;) and propagate bit (p1) are 
42 
defined as g; = a; AND b; and p; = a; OR b;, and these generate bits can be associated 
using the prefix operator ± as follows: 
(g;, Pt) o{g;.t. Pt-t) = {g;+p;.g;.t. p;.p;.t) = {g;:;-t. Pt:t-t) where + is the logical OR 
operator and. is the logical AND operator. 
The carry outs ( C;) for all the bit positions can be obtained from the group 
generate (G; == C;) where (G;, P;) = {g;, p;) o (g;.t, p;.t) o .... {g1, Pt) o {go, po). 
The function of End Around Carry (EAC) adder is to feed back the carry out of 
the addition and add it to the least significant bit of the sum vector. Similarly, in inverted 
End Around Carry adders the carry out is inverted and fed back to the least significant bit 
of the sum vector. The parallel prefix network based Inverted EAC adder [42] achieves 
the addition of the input operands by recirculating the generate and the propagate bits at 
each existing level in log2n stages. Let c•; (G*;) be the carry at bit position i in the 
inverted EAC, this can be related to G; as follows: 
(Gj, Pt) = { (Gn-1> Pn-1) fori= -1 . 
(Gi, Pi) 0 (Gn-1:1+1, Pn-1:1+1) for n - 2 ~ 1 > 0 
(2) 
In the above equation (Gt, P,.) == (G, P), 
(Gn-1:i+l> Pn-1:i+1) 
= (gn-1• Pn-1) o (gn-2• Pn-2) ··· ··· .o (gi+z, Pi+2) o (gi+v Pi+1) 
In some cases it is not possible to compute (G*;, P*;) in log2n stages, then in 
these cases the equations in (2) are transformed into the equivalent ones as shown in Eq. 
(4) by using the following property [42]: 
Suppose that (Gx, P") = (g, p) o (G,P) and (GY, PY) = ( (p, g) o (G, P)) 
G" == g + p. G = g + p. G = g. (p + G) 
= (g. p) + gG = p + gG (3) 
Therefore (f = G" and in (2) pY is computed asp. P. 
43 
To implement the parallel prefix computation efficiently, these transformations 
have to be appliedj number of times recursively on (Gi, PD o (G0 _ 1:1+V Pn-1:1+1) using 
the following relation: 
{ n, ifi > ~-1 1 . +. 2 n- -I J= n . . n 
- If I < --1 2' - 2 
(4) 
The new carry outs can be computed using the following equation: 
(G~ R*) = { (G0 _ 11 P0 _ 1 ) fori = -1 
I' 1 ( ) • (P., Gl) 0 Gn-1:1+1, Pn-1:~+ 1 for n - 2 > I > 0 
(5) 
Hence, the transformations used above to achieve the parallel prefix computation 
in log2n stages result in more number of carry merge cells and thereby adding more 
number of interstage wires. Parallel prefix adders suffer from excessive inter-stage wiring 
complexity and large number of cells, and these factors make parallel prefix based adders 
inefficient choices for VLSI implementations. Therefore, a novel sparse tree based EAC 
and inverted EAC adders are used as the primitive blocks in this work. 
In sparse tree based inverted EAC adders [3, 4], instead of calculating the carry 
term G*; for each and every bit position, every J<!h (K = 4, 8 ... )carry is computed. The 
value of K is chosen based on the sparseness of the tree, generally for 16 bit and 32-bit 
adders K is chosen as 4 [36, 3 7, 3 8]. The higher value of K results in higher value of non-
critical path delay compared to critical path delay of O(log2n) which should not be the 
case. The proposed implementation of the sparse tree based Inverted End-Around-Carry 
Adder (IEAC) is explained below clearly for 16-bit operands. For a 16-bit sparse IEAC 
with sparseness factor (i.e, K) equal to 4, the carries are computed for bit positions -1, 3, 
7 and 11. Here, bit position -1 corresponds to the inverted carry out ((G15, P15)) of the 
bit position 15. The carry out equations for the 16-bit sparse tree IEAC are as follows: 
c:1 = ( G1s• P1s) = (g1S• P1s) o (g14, P14) o •·• ••• o (gv P1) o (go. Po) 
c; = (G3, P3) 0 (G15:4• p15:4) = (g3, P3) O•••••O (go. Po) 0 (g15• P1s) 0 •••••• •0 (g4, P4) 
= (P3, G3) o (G1s:4• P1s:4) 
c; = (G7,P7) ° CGts:a•P1s:a) = (g7.P7) O•••••O (go. Po) 0 (g15•P1s) 0 •••••• •0 (ga.Pa) 
= (P7, G7) o (Gts:a• P1s:a) 
c;t = (Gll, P11) o (Gts:t2• P1s:12) 
= (g11,p11) o·····o Coo. Po) o CBts•Pts) o ••· .... o (Btz•P12) 
= (P11, G11) o (Gts:t2• P1s:12) 
44 
Figure 5 shows the finalized 16-bit sparse tree Inverted EAC adder. From Fig. 5, 
observe that all the carry outs are computed in log2n stages with less number of carry 
merge cells and reduced inter-stage wiring intensity [36]. The implementation of the 
sparse tree based EAC is similar to IEAC shown in Fig. 5, except the carry is not 
inverted. 
I r-~------~~--~==~==4=-b=it==~==~~-;1---4--b-it__J: 
: Conditional : Conditional I 
Sum Sum I 
: Generator : Generator I 
11_-!._f •( 11_-!._f l 
S11 :s S7:4 
~ (G,,.ij',.. J'..) 










(G.,. s·. P.. ) 
(G,1 , P, 1 ) o (G," , ~:s ) 
'1'1;'\ '' Carry Merge 
45 
Fig. 5 (a) 16-bit sparse tree based Inverted EAC adder (b) 4-bit conditional sum generator 
46 
The Conditional Sum Generator (CSG) shown in Fig. 11-C is implemented using 
ripple carry adder logic, two separate rails are run to calculate the carries c·i+l, c·;+2, c·i+3 
and c•;+4 assuming the input carry c•; as 0 and 1. Four 2:1 multiplexers using the carry 
c•; from sparse tree network as 1-in-4 select line generate the final sum vector. The 
conditional sum generator is shown in Fig. 5 (b). The final sum is generated in log2n 
stages in IEAC sparse tree adder with less number of cells and less inter-stage wiring. 
Hence, this approach results in low power and smaller area while providing better 
performance. 
2.4. MODULO 2N + 1 MULTIPLIER 
Modulo 2° + 1 multiplier is extensively used in many digital signal processors and 
cryptographic applications. As 2° + 1 is an n+ 1 bit number, the input operands can be of 
n+ 1 bits. A brief explanation of the algorithm and implementation of the modulo 2°+ 1 
multiplier [3, 4] is given below. 
Let lA I 8 denote the residue of A modulo B. Let X and Y be two inputs 
represented as X = XnXn- 1 ••• x0 andY = YnYn- 1 ... Yo where the most significant bits Xn 
and Yn are '1' only when the inputs are 2n and 2n, respectively. X · Y mod 2n + 1 can be 
represented as follows: 
P = IX· Ylzn+1 = ll:f:oXt2i · LJ=OYj2jlzn+1 = ILf:o (L~- Pt,j2t+j)l (6) 
J-0 zn+1 
where p;J = a; AND b1 
From (6), observe that it results in an (2n + 1) X (n + 1) partial products matrix. 
This matrix is modified into an n x n partial products matrix based on several 
assumptions [43]. The conversion of the (2n + 1) x (n + 1) partial products matrix into 
n x n partial products matrix results in a correction factor of 3. The n x n partial 
products matrix is reduced into one sum vector and one carry vector. A part 2 of the total 
correction factor 3 is added to the n x n partial products matrix and the other part 1 is 
used in the final stage addition. In the reduction of the partial products, novel 
47 
compressors are used instead of full adders in each column of the carry save adder 
network. This selection of the compressors is based on the input width. For a particular 
input width, several compressor networks are possible. The best possible compressor 
network consists of compressors with high order such as 7:2 and 5:2 compressors. The 
sum and carry vectors generated by the partial products reduction module have to be 
added in the final stage addition module. A part of the correction factor 1 left behind is 
used in the final stage addition to take advantage of the following equation. 
(7) 
From (7), observe that the inverted carry out of the addition of Sum and Carry 
vectors has to be fed back. Hence, adding a constant ' 1' to the Sum and Carry vectors 
results in Inverted End Around Carry (EAC) modulo 2n addition which has regular VLSI 
implementation. This inverted EAC is efficiently designed using the sparse tree adder 
network, which has less interstage wiring and less cell density. A novel modulo 2 16+ 1 
multiplier, which uses efficient compressors in the partial products reduction module is 
shown in Fig. 6. 
Fig. 6 Hardware implementation ofthe modulo 2 16 + 1 multiplier 
48 
3. PROPOSED SELF -CHECKING MODULO 2N + 1 MULTIPLIER DESIGN 
For any self-checking circuit to detect online-errors, the circuit has to comply 
with a set of properties. These properties are described below [17]. 
• Code disjointness: If each non-code input is mapped to a non-code output 
word then the circuit is called code-disjoint. 
• Fault secure: A circuit is called fault secure if for any fault in the fault set 
there is no input code word that causes the circuit to generate incorrect code 
word. 
• Self-testing: For all faults in the fault set, there is at least one input code word 
that generates an output which is not a code word. 
• Self-checking: If the circuit obeys both the self-testing and fault secure 
properties then it is called totally self-checking. 
If input registers are used in the implementation then code-disjointness ensures 
that the faults in the input registers are detectable. Using the fault secure property, for an 
input fault the circuit either generates a correct output or detects the fault. In the self-
testing circuits faults can be detected by applying an input vector. 
In this section, the analysis of the self-checking multipliers using residue codes, 
which includes fixing a fault model and selection of the check bases is given. The 
analysis of the integer multipliers is extended to the proposed implementation of the 
modulo 2n + 1 multipliers. 
3.1. SELF -CHECKING MULTIPLIERS USING RESIDUE CODES 
The self-checking two's complement multiplier is clearly studied for a fixed fault 
model and appropriate check bases are designed for the same in [15]. The brief 
explanation of the self-checking two's complement multiplier [15] is given below. Let 
the inputs and outputs of self-checking multiplier circuit S are encoded using the error-
detecting code 1(0) and F be the fault set used for the circuit S. The fault secureness of 
the circuit S affected by the fault set F can be achieved by selecting the code space 0 
49 
such that any error in the output can be detected by a residue code check. The definition 
of the fault-secureness with respect to a fault set F is as follows. Let f e F and sr denote 
the circuitS affected by fault f. For a particular input i e I the output of S(Sf) is given 
by (i) [ st (i)]. Then, S is fault-secure for fault set F 0 for any input i € I and for any 
fault f E F, S(i) =;:. Sf (i) implies Sf (i) ~ 0. The basic block diagram of the multiplier 
with residue code check is shown in Fig. 7. 
X Y /(x)modb I(y)modb 
r··------------~------------~---1 
l .... ~-------~-------------~---1 
p,l(p) = I(x'fJI(y) l(p)modb 
Fig. 7 A block diagram of the multiplier with residue code check 
In the above figure, M represents the multiplier and M' represents the modulo 
multiplier with residue check base b E N. I(x) mod b, /(y)mod band /(p)mod b 
represent the residues of the inputs and output with check base b. The efficient 
implementation of the modulo generators is given in following sections. These modulo 
generators use novel compressors and sparse tree adder based end around carry adders. 
The error pf in the multiplier output is detected in and only if 
[pf,J(p)mod p] ~ 0 
~ I(p)mod b =;:. I(pf)mod b 
~ II(p) -J(pf)lmod b =;:. 0 
50 
To simplify the analysis, let us assume a fault model for the multiplier with 
residue codes. For the fault set F, assume fault caused by a primitive block in the 
multiplier circuit S [ 15]. The single fault is caused by the primitive blocks such as half 
adders or full adders. This fault results in change in the functional behavior of the 
primitive blocks thus producing an erroneous output. Let E(M) denote the set of the 
absolute errors in the output of the multiplier circuit M caused by a single fault in the 
primitive blocks. Then, E(M) can be represented as follows: 
E(M) = {II(p)- /(pf)l f fault caused by a primitive block in M 
p = M(x,y), 
pf = Mf(x,y)}/0 
The multiplier circuit S is fault secure to the fault set F +-+ 
'Ve E E(M): e mod b =F 0 (8) 
Hence, to achieve low hardware costs minimum value of a check base is selected 
which satisfies the above result. The selection of the check bases in the residue code 
based circuits is an important task. In selecting the check bases, first the absolute error set 
E(M) caused by the single faults have to be characterized. For different set of faults in 
various primitive blocks, the circuit functions differently. In this paper, the fault model 
consists of faults in a single primitive block. 
The multiplier circuit S design so far is only fault-secure i.e, only testable fault 
from the fault set F will be detected. In some cases, untestable faults from the fault set 
may affect the fault-secure property of the circuit. Hence, to achieve the fault-secureness 
for all the faults, the multiplier circuit has to be self-testing too. A brief explanation of the 
self-testing property [15] is given below. 
Let S be the self-checking multiplier circuit, I be the input code space and 0 be 
the output code space. Let the set of inputs subjected to the circuit in the fault free case be 
N, these inputs are given to the circuit as normal inputs, N c /.Then the circuitS is called 
self-testing when: 
• For a fault/E F, there exists ani EN such that d (i) fl. 0. 
• The circuit S is self-testing for fault set F +-+ S is self-testing for every fault 
from the set F. 
51 
For the multiplier circuit S with residue code check, in most applications N = I. 
Hence, the input set for both M and M' can be subjected to arbitrary values. Thus, if M 
and M' are realized without internal redundancy then the fault-secureness implies that the 
circuit S is self-testable. A basic configuration of the multiplier with residue code checker 
is shown in Fig. 8. In Fig.8, modMult b computes the residue of the multiplier output i.e, 
l(p) mod b. Dual Rail Check compares the output from modMult b and output of the 
inverter and asserts the error signal if the two inputs are not complementary to each other. 
with the above residue and dual rail checkers the self-testing property many not be 
always achieved. This guarantee the self-testing property of the multiplier circuit, the 
structure of the checkers has to be properly selected. 
p l(p)modb 
error 
Fig. 8 A block diagram of the residue code checker 
The check base selection of the residue codes depends on the faults in the fault set 
F and the resulting absolute error in the output vector of the multiplier. Hence, before 
selecting the check base, the E(M) set has to be fixed. As only one fault in a single 
52 
primitive block such as half adder or full adder is assumed, corresponding E(M) can be 
computed and proper check base is selected using (8). For the two's complement 
multiplier studied in [15], the check base b is fixed as 3 for most of the input cases. This 
result is obtained by making a particular assumption, otherwise check base 7 is used. 
Suppose for ann-bit multiplier, the partial products reduction module is designed using 
half adders and full adders. A single fault in any one of these primitive blocks may result 
in an error in the sum or carry outputs. If the fault causes an error set E(M) given by: 
E(M) c { azi Ia E [1: 3], i E [0: 2n - 2]} u {5. zzn-3 } (9) 
The check base b achieves the fault secureness for the given multiplier if 
bEN\ {y2ily E [1: 5], i E {O,N}} (10) 
where N denotes the set of all natural numbers. 
Hence, the smallest check base that satisfies (1 0) and to achieve the fault 
secureness for the given multiplier is 7. Since, modulo 3 generator is very popular, the 
check base 3 is used based on an assumption. In the partial products reduction module, 
half adders and full adders are assumed such that error values of the form ±3 do not 
occur. Assuming this error analysis, it can be shown that under the restricted fault model 
the error set Ere(M) can be represented as: 
EreCM) c {2ili E [0: 2n- 1]} U {3. zzn-z} (11) 
From (11) there exists only one error which cannot be detected by modulo 3 
check base. This error value occurs for only one input combination which is /(x) = I(y) = 
-2n-l and a fault on the outputs ofthe primitive block with weight 22n-2• This case is highly 
improbable and hence is neglected. 
3.2. SELF-CHECKING MODULO 2N + 1 MULTIPLIERS USING RESIDUE 
CODES 
In this section, a fault secured implementation of the modulo 2° + 1 multiplier 
using residue codes is given. The fault model for this self-checking multiplier includes 
faults affecting a single gate. And consequently, the same fault may propagate through 
the subsequent gates and generate errors at the multiplier outputs. Hence, to achieve fault 
53 
secureness these errors must be detected by the residue codes [19]. As shown in Fig. 3, 
the fault model includes any fault affecting the gates that generate the outputs sum and 
carry outs. 
From the implementation ofthe modulo 2°+ 1 multiplier [3], it consists of partial 
products generation module, partial products reduction module and final stage addition 
module. The partial products generation consists of basic logic gates and the partial 
products reduction module is implemented using compressors of different order (3:2, 4:2 
and 5:2 compressors etc). The final stage addition module is designed using sparse tree 
based inverted end around carry adder. For ordinary integer multipliers, the error in the 
arithmetic value of the output caused by the faults are well studied in the literature [21]. 
In residue code based self-checking multipliers [15, 19, 21, and 23], to detect an error the 
arithmetic difference should not be divisible by the check base of the residue codes. 
The block diagram of the self-checking modulo 2°+ 1 multiplier based on residue 
arithmetic codes [19] is shown in Fig. 9 (i.e, in the modulo A generator block is either 2k-
1 or 2k+ 1 ). From the figure, observe that the self-checking modulo 2°+ 1 multiplier 
consists of a modulo multiplier, modulo generators for the input operands followed by a 
modulo multiplier and modulo generators (with check bases of the form 2c - 1 or 2c + 1) 
for each of the modulo multiplier outputs. In the final stage, an arithmetic code checker to 
check the output of the modulo multiplier against its check part. In this case, a dual-rail 
checker is used. 
Input operand 1 Input Operand 2 
I I l ~r---~l 
r---'----~ l 










Modulo 2k+1 multiplier 
for the check parts 
Fig. 9 A block diagram of the self-checking modulo 2n+ 1 multiplier 
54 
As shown in Fig. 9, the modulo generator calculates input modulo 2c -1 or 2c + 1. 
The check base selection for arithmetic circuits such as adders and multipliers is 
presented in the literature [15, 21]. The hardware implementation and structure of the 
modulo multiplier is similar to integer multipliers [3]. 
In the array multipliers implementation reported in [35], the partial products 
reduction module consists of full adders and half adders which generates Sum and Carry 
outputs. A fault on the set of the gates that generate these Sum and Carry signals cause an 
output error [19, 21]. An error on the sum signal gives an arithmetic value of±2i (where i 
is the weight of the error signal), similarly an error on the Carry signal gives an 
arithmetic value of ±2.i. Hence, errors produced on both of these signals give an 
55 
arithmetic value of±3.2i. To detect the error caused by these faults, the check base ofthe 
residue codes should be selected such that the arithmetic difference should not be 
divisible by the check base. The final stage adder of the array multiplier is generally 
implemented using parallel prefix adders. Various algorithms are proposed to select the 
check bases for these fast parallel prefix adders. Unlike the ripple carry adders, the carry 
computation problem is logarithmic rather than linear. Hence, the error propagation is 
also different and causes various output errors. A brief description on the check base 
selection discussed in [19, 21] is presented below. The arithmetic value of the errors in 
the outputs is determined based on a couple of facts. 
• Faults on the signals with divergent degree higher than 1 result in errors with 
arithmetic value ±2i. 
• Consider faults on a random signal Wi which can be propagated to the carry Cj. 
The error may propagate to the other carry signals Cj G > i) which structurally 
depend on Ci subsequently on Wj. In the actual implementation, not all the 
carries Cj G > i) structurally depend on Wj. Hence, two kinds of errors are 
possible, in the first case all the carries Cj G > i) may depend on ci resulting in 
an error with final arithmetic value ±2i+ 1. In the second case, only a subset of 
the carries Cj G > i) may structurally depend on ci producing an output error 
given below: 
• +[ao(zi+l + zi+2 ..... +zi+k) + al(zil+l + zi1+2 ..... +2il+k1) + 
a2(zi2+1 + zt2+2 ..... +2i2+k2) + ... + CZm(zim+l + zim+2 ..... +zim+km)] 
where a0 , a 1 .••.. am E [0,1] and im + km < n, n is the width of the operands 
used in the adder. 
To achieve the fault secureness of the self-checking modulo 2n+ 1 multiplier, the 
resulting arithmetic value of the output errors caused by the faults must not be divisible 
by the check base. Hence, the smallest odd integer is chosen which does not divide the 
arithmetic value of the errors in the output. The check bases are that best suit for this 
operation are of the type (2c -112c+ 1, c E N) and these check bases result in efficient 
residue code computation [21, 15]. 
In the modulo 2n + 1 multipliers, the partial products reduction module is 
designed using compressors which are similar to full adders in operation. A fault in set of 
56 
the gates of the compressors generate an error with arithmetic value ±K. i (K is a 
constant). The final stage adder is designed using sparse tree based inverted end around 
carry adder. The operation of the sparse tree adder is same as parallel prefix adders, 
except the carries are computed at every 4th or 8th bit. Hence, the error analysis gives 
same output error as proven in [21]. Hence, to design efficient modulo generators, the 
check base of the form (2c -112c+ 1, c E N) is chosen. Efficient implementation of the 
modulo generators with check bases 24-1 and 24 + 1 for an input operand of width 16 are 
shown in Fig. 1 0 and Fig. 11. These modulo generators use novel sparse tree based end 
around carry adder and inverted end around carry adder, respectively. In the novel 
designs of the modulo generators, full adders are replaced by the efficient compressors. 
Compressor Compressor Compressor 
Sparse tree based end around carry adder 
s, s, 










Fig. 11 Modulo generator with check base 24 + 1 for input width= 16 
57 
Thus, the proposed self-checking modulo 2n + 1 multiplier based on residue codes 
efficiently detects all errors caused by the faults on a single gate at a time. The efficient 
use of the compressors in the modulo generators and modulo multipliers result in good 
savings in terms of area overhead and delay. The self-checking two's complement 
multiplier is clearly studied for a fixed fault model and appropriate check bases are 
designed for the same in [15]. 
58 
4. PARAMETRIC COMPARISON 
The efficient self-checking modulo 2" + 1 multiplier is obtained from the efficient 
use of the novel compressors in the modulo multipliers and modulo generators. In this 
section, the self-checking modulo 2" + 1 multipliers are compared against the modulo 2" 
+ 1 multipliers without self-checking property. The comparisons are carried out using the 
unit-gate model proposed by Tyagi (39] and also experimental results are compared. The 
hardware overhead in the proposed implementation of the modulo 2" + 1 multiplier is 
caused by the modulo generators, dual-rail checker the modulo multiplier which is used 
to generate the check part for the dual-rail checker to check against the actual result of the 
modulo 2" + 1 multiplier. The performance penalty of the multiplier is caused by the dual 
rail checker and the modulo generators. If the combined delay of the modulo generators 
and dual rail checker are more than the delay of the modulo 2" + 1 multiplier, this pays 
penalty for the performance of the multiplier. For different values of the input operands 
of the modulo 2" + 1 multiplier, the modulo generators have different check bases to 
achieve the full fault secureness. The residue code check bases of the form 2c -1 and 2c + 
1, for different values of the input operands are selected and corresponding modulo 
generators are designed. 
4.1. UNIT -GATE MODEL ANALYSIS 
The modulo multipliers and the modulo generators contribute largely to the 
overall area and delay of the multiplier. In the unit-gate model presented by Tyagi [39], 
each 2-input monotonic gate is considered as a single gate equivalent for both the area 
and delay comparisons, and the 2-input XOR gate and 2:1 MUX are considered as two 
gate equivalents (area and delay). The area and delay terms of the modulo multiplier with 
and without self-checking property are shown below. 
AMWS = AMG + AMm + ADRC 
TMWS = TMG + TMm + TDRC 
AMw = AMM 
59 
In the above equations AMws, T Mws denote, respectively the area and delay of the 
modulo multiplier with self-checking property. AMWs and T MWS are obtained by summing 
the areas and delays of the Modulo generators (AMa,TMa), modulo multipliers (AMm,TMm) 
and dual rail checker(ADRc,TDRc). AMW, TMW are the area and delay of the modulo 
multiplier without self-checking property, they are nothing but the area and delay of the 
ordinary modulo 2n + 1 multiplier denoted by AMM and TMM, respectively. The unit-gate 
areas and delays of these multipliers are computed and tabulated in the below table. 
TABLE 1. AREAAND DELAY COMPARISON OF MODULO 2N + 1 
MULTIPLIERS WITH AND WITHOUT SELF -CHECKING PROPERTY USING 
UNIT-GATE MODEL ANALYSIS 
%Area 
% 
n AMw AMws overhead TMw TMws 
Performance 
penalty 
8 553 4454 43 30 33 10 
16 1968 2676 36 52 56 7.7 
32 7825 10251 31 94 98 4.25 
64 25637 30508 19 168 173 2.98 
60 
4.2. EXPERIMENTAL RESULTS 
Even though the unit gate model gives delay and area comparisons in terms of 
gate counts, the standard cell based implementation of the proposed compressor based 
multiplier gives much more accurate delay and area estimations. The proposed self-
checking modulo multipliers for various values of input length are specified in Verilog 
Hardware Description Language (HDL). The multiplier descriptions are mapped on a 
0 .18Jlm CMOS standard cell library using Leonardo Spectrum synthesis tool from 
Mentor Graphics. The design is optimized for high speed performance. Netlists generated 
from synthesis tool are passed on to standard route and place tool, the layouts are 
iteratively generated to get the circuits with minimum area. 
TABLE 2. EXPERIMENTAL RESULTS SHOWING THE AREA AND DELAY 




n At(pm2) Az(pm2) overhead Tt Tz Performan
ce 
penalty 
8 3072 4454 45 1.982 2.121 7 
16 10933 14540 33 4.216 4.427 5 
32 43472 56078 29 8.542 8.884 4 
64 1608847 194624 21 15.315 15.621 2 
61 
In Table. 2 A 1 and A2 represent the area of the modulo 2n+ 1 multiplier without the 
self-checking property and with the self-checking property, respectively. Similarly, T1 
and T2 represent the delay of the modulo 2n + 1 multiplier without the self-checking 
property and with self-checking property, respectively. 
62 
5. CONCLUSIONS 
In this paper, a new self-checking modulo 2n+ 1 multiplier based on residue codes 
is proposed and validated. In the proposed implementation, the self-checking modulo 
multiplier consists of modulo generators with check bases of the form 2c -1 or 2c+ 1 ( c E 
N), modulo multipliers and self-checking dual rail checkers. All the modulo components 
such as modulo generators, modulo multipliers are efficiently designed using novel 
compressors. The final stage addition modules of the modulo multipliers and modulo 
generators are efficiently designed using sparse tree based inverted end around carry 
adders. The self-checking multiplier is secured against faults affecting a single gate at a 
time and produce an error at the gate output, which may propagate through the 
subsequent gates and generate an error at the output of the modulo multiplier. These 
selfchecking modulo multipliers are analyzed using unit-gate model and compared with 
the modulo multipliers without self-checking property. These models are designed for 
different values of the input length and simulated to get the experimental results. The 
results show that the proposed self-checking multiplier results in 20% to 45% area 
















Soderstrand, M.A., Jenkins, W.K., Jullien, G.A., and Taylor, F.A., "Modern 
Applications of Residue Number System Arithmetic to Digital Signal Processing," 
New York: IEEE Press, 1986. 
Beckmann, P.E., and Musicus, B.R., "Fast fault-tolerant digital convolution using a 
polynomial residue number system," IEEE Transactions on Signal Processing, vol. 
41, issue 7, pp. 2300-2313. 
Rajashekar, M., Nohpill, P., and Minsu, C., "A Fast Low-Power Modulo 2"+1 
Multiplier Design," 2009 IEEE International Instrumentation and Measurement 
Technology Conference, pp.951-956, May 2009. 
Rajashekar, M., Nohpill, P., Yong-Bin, K., and Minsu, C., "Efficient On-line Self-
Checking Modulo 2"+ 1 Multiplier Design," submitted to the 24th IEEE 
International Symposium on Defect and Fault Tolerance in VLSI Systems, October 
2009. 
Zimmermann, R., and Fichtner, W., "Low-power logic styles: CMOS versus pass-
transistor logic," IEEE J. Solid-State Circuits, vol. 32, pp. 1079-1090, July 1997. 
Veeramachaneni, S., Avinash, L., Rajashekhar, M., and Srinivas, M.B., "Efficient 
Modulo 2"±1 Binary to Residue Converters," The 6th International Workshop on 
System-on-Chip for Real-Time Applications, pp.l95-200, Dec. 2006. 
Chang, C.H., Gu, J., and Zhang, M., "Ultra low-voltage lowpower CMOS 4-2 and 
5-2 compressors for fast arithmetic circuits," IEEE J. Circuits and Systems I, 
Volume: 51, Issue: 10 pp: 1985- 1997, 2004. 
Rouholamini, M., Kavehie, 0., Mirbaha, A.P., Jasbi, S.J., and Navi, K., "A New 
Design for 7:2 Compressors," Computer Systems and Applications, 2007. AICCSA 
'07. IEEE/ACS International Conference on 13-16 May 2007, pp:474- 478. 
Curiger, C., "VINCI: VLSI Implementation of the New Secret-keyBlock Cipher 
IDEA," Proc. of the Custom Integrated Circuits Conference, San Diego, USA, 
May 1993 
Zimmermann, R., Curiger, A., Sonnenberg, H., Kaeslin, H., Felber, N., and 
Fichtner, W., "A 177 Mb/s VLSI implementation of the international data 
encryption algorithm," IEEE J. Solid-State Circuits, 1994, 29, (3), pp. 303-307 
Sklavos, N., and Koufopavlou, 0., "Asynchronous low power VLSI 
implementation of the International Data Encryption Algorithm," Electronics, 
Circuits and Systems, 2001. ICECS 2001. The 8th IEEE International Conference 
on, vol.3, no, pp.1425-1428 vo1.3, 2001. 
Leibowitz, L.M., "A simplified binary arithmetic for the Fermat number transform," 
IEEE Trans. Acoust. Speech Signal Process, 1976, 24, pp. 35659. 
Vasudevan, D.P., and Lala, P.K., "A technique for modular design of self-checking 
carry-select adder," Defect and Fault Tolerance in VLSI Systems, 2005. DFT 2005. 















Sparmann, U., and Reddy, S.M., "On the effectiveness of residue code checking for 
parallel two's complement multipliers," Fault-Tolerant Computing, 1994. FTCS-24. 
Digest of Papers, Twenty-Fourth International Symposium on, vol, no, pp.219-228, 
15-17 Jun 1994. 
Marienfeld, D., Sogomonyan, E.S., Ocheretnij, V., and Gossel, M., "New Self-
checking Output-Duplicated Booth Multiplier with High Fault Coverage for Soft 
Errors," Test Symposium, 2005. Proceedings. 14th Asian, vol, no, pp.76-81, 21-21 
Dec. 2005. 
Hunger, M., and Marienfeld, D., "New Self-Checking Booth Multipliers," 
International Journal of Applied Mathematics and Computer Science Vol. 18, No. 3, 
2008. 
Goessel, M., and Graf, F., "Error Detection Circuits," McGraw-Hill, London, 1993. 
Noufal, LA., and Nicolaidis, M., "A CAD framework for generating self-checking 
multipliers based on residue codes," Design, Automation and Test in Europe 
Conference and Exhibition 1999. Proceedings, vol, no, pp.122-129, 1999. 
Garcia, O.N., and Rao, T.R.N., "On the method of checking logical operations," 
Proc. 2nd Annual Princeton Conf. Inform. Sci. Sys, pp. 89-95, 1968. 
Sparmann, U., "On the check base selection problem for fast adders," VLSI Test 
Symposium, 1993. Digest of Papers, Eleventh Annual 1993 IEEE, vol, no, pp.62-
65, 6-8 Apr 1993. 
Sellers, F.F., Hsiao, M.Y., and Beamson, L.W., "Error Detecting Logic for Digital 
Computers," New-York, Me GRA WHILL 1968. 
Peterson, W. W. W., "On checking an adder," IBM J .Res. Develop, Vol.2, pp.166-168, 
Apr. 1958. 
Fujiwara, E., and Haruta, K., "Fault-tolerant Arithmetic Logic Unit Using Parity 
Based Codes," Transactions ofthe IECE of Japan, pp.653-660, October 1981. 
Marienfeld, D., Sogomonyan, E.S., Ocheretnij, V., and Gossel, M., "A new self-
checking multiplier by use of a code-disjoint sum-bit duplicated adder," Test 
Symposium, Proceedings. Ninth IEEE European, vol, no, pp. 30-35, 23-26 Mady 
2004. ETS 2004. Proceedings. 
Sayers, I.L., and Kinniment, D.J., "Low-cost residue codes and their application to 
self-checking VLSI systems," Computers and Digital Techniques, lEE Proceedings 
E, vol.132, no.4, pp.197-202, July 1985. 
Debany, W.H., Macera, A.R., Daskiewich, D.E., Gorniak, M.J., Kwiat, K.A., and 
Dussault, H.B., "Effective concurrent test for a parallel-input multiplier using 
modulo 3," VLSI Test Symposium, 1992. '1Oth Anniversary. Design, Test and 
Application: ASICs and Systems-on-a-Chip', Digest of Papers, 1992 IEEE, vol, no, 
















Heckelman, R., and Bhavsar, D., "Self-testing VLSI," Solid-State Circuits 
Conference. Digest of Technical Papers. 1981 IEEE International, vol.XXIV, no, 
pp. 174-175, Feb 1981. 
Siegel T.J ., and Veracca, R.J ., "Design and performance of the IBM Enterprise 
System/9000 Type 9121 vector facility," IBM J. Res. Develop, vol. 35, pp. 367-
381, May 1991. 
Nicolaidis, M., "Efficient implementations of self-checking adders and ALUs," 
Fault-Tolerant Computing, 1993. FTCS-23. Digest of Papers, The Twenty-Third 
International Symposium on, vol, no, pp.586-595, 22-24 Jun 1993. 
Kundu, S., and Reddy, S.M., "Embedded totally self-checking checkers: a practical 
design," Design and Test of Computers, IEEE, vol. 7, no.4, pp.5-12, Aug 1990 
Xrysovalantis, K., Dimitris, N., Foukarakis, G., and Gnardellis, T., "New efficient 
totally self-checking Berger code checkers," Integration 28(1): 101-118 (1999). 
Kumar, K., and Lala, P.K.,"On-line Detection of Faults in Carry-Select Adders," 
lTC, pp.912, International Test Conference 2003 (ITC'03), 2003. 
Vasudevan, D.P., Lala, P.K., and Parkerson, J.P., "Self-Checking Carry-Select Adder 
Design Based on Two-Rail Encoding," Circuits and Systems 1: Regular Papers, 
IEEE Transactions on, vol.54, no.12, pp.2696-2705, Dec. 2007. 
Behrooz, P., "Computer Arithmetic: Algorithms and Hardware Designs," Oxford 
University Press, New York, 2000. 
Mathew, S., Anders, M., Krishnamurthy, R.K., and Borkar, S., "A 4-GHz 130-nm 
address generation unit with 32-bit sparse-tree adder core," In IEEE Journal of 
Solid-State Circuits, Volume 38, Issue 5, May 2003 Page(s):689 - 695. 
Grad, J., and Stine, J.E., "A Multi-Mode Low-Energy Binary Adder," Fortieth 
Asilomar Conference on Signals, Systems and Computers, Oct. 29 2006-Nov. 1 
2006 pp. 2065-2068. 
Yan, S., Dongyu, Z., Minxuan, Z., and Shaoqing, L., "High Performance Low-
Power Sparse-Tree Binary Adders," Solid-State and Integrated Circuit Technology, 
2006. ICSICT '06. 8th International Conference on, vol, no, pp.1649-1651, 23-26 
Oct 2006. 
Tyagi, A., "A reduced-area scheme for carry-select adders," IEEE Trans. Comput, 
1993,42, (10), pp. 1163170. 
Shivakumar, P., Keckler, S., Kistler, M., Burger, D., and Alvisi, L., "Modeling the 
effect of technology trends on the soft error rate of combinatorial logic," 
Proceedings of the International Conference on Dependable Systems and 
Networks, pp. 389-398, 2002. 
Kogge, P., and Stone, H.S., "A parallel algorithm for the efficient solution of a 
general class of recurrence equations," IEEE Trans. Comput, vol. C-22, pp. 786-
793,Aug 1973. 
66 
[41] Vergos, H.T., Efstathiou, C., and Nikolos, D., "Diminished-one modulo 2°+1 adder 
design," Computers, IEEE Transactions on, vol.51, no.12, pp. 1389-1399, Dec 
2002. 
[42] Vergos, H.T., Efstathiou, C., "Design of efficient modulo 2°+1 multipliers," lET 
Comput. Digit. Tech, 2007, 1, (1), pp. 49-57. 
67 
VITA 
Rajashekhar Reddy Modugu, was born on January 19, 1987 in Bethavole, Andhra 
Pradesh, India. Raja completed his school education in Bharathi Vidya Mandir, Kodada, 
India. He did his intermediate education at Vignan Junior College, Vadlamudi, India. He 
completed his Bachelor of Technology (B. Tech) in Electronics and Communication 
Engineering from International Institute of Information Technology, Hyderabad, India in 
May 2008. He started his Master of Science program in Electrical and Computer 
Engineering at Missouri University of Science and Technology in August 2008. He 
graduated in December 2010. He has published several conference proceedings and IEEE 
Journal articles. 
