Emerging Design Methodology And Its Implementation Through Rns And Qca by Dajani, Omar
Wayne State University
DigitalCommons@WayneState
Wayne State University Dissertations
1-1-2013
Emerging Design Methodology And Its
Implementation Through Rns And Qca
Omar Dajani
Wayne State University,
Follow this and additional works at: http://digitalcommons.wayne.edu/oa_dissertations
This Open Access Dissertation is brought to you for free and open access by DigitalCommons@WayneState. It has been accepted for inclusion in
Wayne State University Dissertations by an authorized administrator of DigitalCommons@WayneState.
Recommended Citation
Dajani, Omar, "Emerging Design Methodology And Its Implementation Through Rns And Qca" (2013). Wayne State University
Dissertations. Paper 646.
  
EMERGING DESIGN METHODOLOGY AND ITS IMPLEMENTATION 
THROUGH RNS AND QCA 
by 
OMAR DAJANI 
DISSERTATION 
Submitted to the Graduate School 
of Wayne State University, 
Detroit, Michigan 
in partial fulfillment of the requirements 
for the degree of 
DOCTOR OF PHILOSOPHY 
2013  
                                                      MAJOR: ELECTRICAL ENGINEERING 
               Approved by:  
 
                                                                           
       Advisor                                             Date 
                                                                                       
                                                                                       
                                                                                       
                                                                                       
                                                                                       
  
© COPYRIGHT BY 
OMAR DAJANI 
2013 
All Rights Reserved 
 
 
DEDICATION 
To my family 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ACKNOWLEDGMENTS 
 
I sincerely thank my research advisor, Prof. Harpreet Singh, for guiding me through this 
research project and for his encouragement and advice throughout the course of 
this work. Dr. Singh’s dedication and flexibility are what made it possible for me to 
succeed. I would also like to thank Dr. Pepe Siy for his help during the initial phase of my PhD 
research.  
I would like to thank my dissertation committee, Dr. Mohamad Berri, Dr. Feng 
Lin and Dr. Le Yi Wang, for their helpful comments and encouragement.  
Many thanks to my colleagues for their assistance with the FPGA, Multisim and Cadence 
tools, which helped me in my research. 
Finally, I would like to thank my family for their support and encouragement. 
TABLE OF CONTENTS 
Dedication
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Introduction to RNS
1.2 Residue Number System Representation
1.3 Residue Dynamic Range
1.4 Quantum-Dot Cellular Automata (QCA)
1.5 Problem Statement
1.6 Thesis Organization
Chapter 2 Literature Review
2.1 Introduction
2.2 RNS Arithmetic Operations
2.3 Conversion from RNS to Binary
2.3.1 Chinese Remainder Theorem Conversion
2.3.2 Mixed Radix Conversion
2.4 Residue Number System Sign Detection
2.5 Scaling in Residue Number System
2.6 RNS Fast Processing Applications
2.7 Quantum-Dot Cellular Automata (QCA)
2.8 QCA Clocking
2.9 Conclusion
Chapter 3 Novel Parallel-Prefix Structure Binary to Residue Number System Conversion Method
3.1 Introduction
3.2 Residue Number System
3.3 New Novel Conversion Method from Binary to Residue Representation
3.4 Illustrative Example
3.5 Implementation Selection
3.6 Comparison Selection to Previous Work
3.7 Conclusion
Chapter 4 Simplified RNS Scaling Algorithm
4.1 Introduction
4.2 Division remainder zero
4.3 Scaling
4.4 General Division
4.5 New Base Extension Algorithm
4.6 New Scaling Algorithm
4.7 Conclusion
Chapter 5 VLSI Implementation of Residue Adder and Subtractor
5.1 Introduction
5.2 Residue Adder and Subtractor
5.3 VLSI Implementation
5.4 Conclusion
Chapter 6 Novel Quantum Boolean Circuits Construction by Using XOR-AND Reduction Method
6.1 Introduction
6.2 QCA Background Material
6.3 Novel QCA Extraction
6.4 Conclusion
Chapter 7 Implementation of Generalized Pipeline Cellular Array Using Quantum-Dot Cellular Automata
7.1 Introduction
7.2 QCA Pipeline Array
7.3 QCA Pipeline Implementation
7.4 Conclusion
Chapter 8 Conclusion and Future Work
8.1 Introduction
8.2 Summary of Work
8.3 Recommended Future Work
Appendix
References
Abstract
Autobiographical Statement
LIST OF TABLES 
 Table (1.1): Residue Digits for Unsigned Numbers
 Table (1.2): Residue Digits for Signed Numbers
 Table (1.3): Majority Function M(A,B,C)
 Table (2.1): Equivalent QCA Expressions for Major Boolean Gates
 Table (3.1): Shows Comparison Among the Three Designs Implementation
 Table (3.2): Comparison Among Behrooa, Alia and the new design
 Table (5.1): Total RNS Adder and Subtractor Delay Time (PS)
 Table (5.2): RNS Adder and Subtractor Design Parameter
 Table (6.1): The Chart for Deriving XOR Equivalent Function “f1”
 Table (6.2): The Chart for Deriving XOR Equivalent Function “f2”
 Table (7.1): Boolean Functions and Their Equivalent QCA
 Table (7.2): QCA Pipeline Array Arithmetic Summary Operations
 Table (7.3): Subtrahend at Different Levels for Square Rooting
 Table (7.4): QCA Performance Comparison Between the Two Designs
 Table (7.5): CMOS Performance Comparison Between the Two Designs
 Table (7.7): Design Delay Time (PS)
 Table (7.8): Complexity Fuzzy Results for Pipeline Array Cells
LIST OF FIGURES 
 Figure (1.1): Block Diagram that Shows the Research Logical Flow
 Figure (2.1a): QCA Cells and Binary Encoding
 Figure (2.1b): QCA Majority Gate
 Figure (2.1c): QCA Inverter Gate
 Figure (2.1d): QCA Wire Types
 Figure (2.2a): QCA Clock Phases
 Figure (2.2b): Four QCA Interdot Barrier State
 Figure (3.1): Two Bits (b1 & b0) Binary to RNS Conversion
 Figure (3.2): Prefix Logic Operation and Their Implementation
 Figure (3.3): Three Bits (b2, b1, b0) Binary to RNS Conversion
 Figure (3.4): Four Bits Binary to RNS System
 Figure (3.5): Six Bits Binary to Residue Number System Conversion
 Figure (3.6): Prefix Structure of 8 Bits Binary to RNS
 Figure (3.7a): Eight Bits Binary to Residue Number System Conversion
 Figure (3.7b): Example for signal propagation
 Figure (3.8): Prefix Structure of 10 Bits binary to RNS
 Figure (4.1): Shows the Fermat’s Implementation for Finding Coefficient “c”
 Figure (4.2): Shows Extended Euclid Algorithm Implementation
 Figure (4.3): Shows Implementation of Finding Coefficient “aij”
 Figure (5.1): RNS Adder and Subtractor
 Figure (5.2): Waveform of RNS Adder and Subtractor
 Figure (5.3): RNS Adder and Subtractor Layout
 Figure (5.4): Encounter Part of RNS Adder and Subtractor
 Figure (5.5): Virtuoso Part of RNS Adder and Subtractor
 Figure (6.1a): QCA Cells and Binary Encoding
 Figure (6.1b): QCA Majority Gate
 Figure (6.1c): QCA Inverter Gate
 Figure (6.1d): QCA Wire Types
 Figure (6.2a): QCA Clock Phases
 Figure (6.2b): Four QCA Interdot Barrier State
 Figure (6.3): XOR-AND Function Extraction Methodology
 Figure (6.4): Majority Tree for Function “f1”
 Figure (6.5): Majority Gates Schematic for Function “f1”
 Figure (7.1): QCA Cells and Binary Encoding
 Figure (7.2): QCA Wire Types
 Figure (7.3): QCA Majority Gate
 Figure (7.4): QCA Inverter Gate
 Figure (7.5): QCA Clock Phases
 Figure (7.6): Block Diagram for Pipeline Array
 Figure (7.7a): Arithmetic Cell
 Figure (7.7b): Control Logic Cell
 Figure (7.8): QCA Arithmetic Cell and Control Cell Arithmetic Cell
 Figure (7.9): QCA High Speed Arithmetic Cell
 Figure (7.10): QCADesigner Layout for Arithmetic Cell Unit
 Figure (7.11): QCADesigner Layout for Control Cell Unit
 Figure (7.12): Simulation for Control Cell Unit
 Figure (7.12): Simulation for Arithmetic Cell Unit
 Figure (7.13): QCADesigner Layout for Arithmetic Cell Unit
 Figure (7.14): Simulation for High Speed Arithmetic Cell Unit
 Figure (7.15): Waveform of Pipeline Squaring Output Result
 Figure (7.16): Waveform of Pipeline Square Rooting Output Result
 Figure (7.17): Multisim Implementation of Arithmetic Cell
 Figure (7.18): Multisim Implementation of Control Cell
 Figure (7.19): Multisim Implementation of High Speed Arithmetic Cell
 Figure (7.20): Arithmetic cell FPGA packaging
 Figure (7.21): Control cell FPGA packaging
 Figure (7.22): High speed arithmetic cell FPGA packaging
 Figure (7.23): Arithmetic cell FPGA schematic layout
 Figure (7.24): High Speed arithmetic cell FPGA schematic layout
 Figure (7.25): Control cell FPGA schematic layout
 Figure (7.26): Encounter Part of QCA Pipeline Array
 Figure (7.27): Virtuoso Part of Padding
 Figure (7.28): Virtuoso Part of Pipeline Array
 Figure (7.29): Hardware Complexity Fuzzy Concept
 
CHAPTER 1 
INTRODUCTION 
1.1 Introduction to RNS 
In the last decade, the Residue Number System (RNS) has received increased attention 
due to its ability to support high-speed concurrent arithmetic in applications such as the Fast 
Fourier Transform (FFT), image processing and digital filters, which exploit the efficiency of 
RNS addition and multiplication. In spite of its effectiveness, RNS has 
remained more of an academic challenge and has had very little impact on practical applications, due 
to the complexity involved in the conversion process, magnitude comparison, overflow 
detection, sign detection, parity detection, scaling and division.  
Advances in Very Large Scale Integration (VLSI) technology and the 
demand for parallel computation have enabled researchers to consider RNS as an 
alternative approach to high-speed concurrent arithmetic [10], [11].  
RNS is an unweighted number representation system. The difference between 
RNS and fixed-radix systems is that no fixed base is used in the representation of RNS 
numbers. RNS is based on modular arithmetic; it is a carry-free system that 
performs addition, subtraction and multiplication as parallel operations.  
 
1.2 Residue Number System Representation 
For any given set of relatively prime moduli $(m_1, m_2, m_3, \ldots, m_n)$, the residue 
representation of a binary number X is $(x_1, x_2, x_3, \ldots, x_n)$, where X is defined by the n 
equations   
$X = m_i q_i + x_i$, $i = 1, 2, \ldots, n$          (1.1) 
Here $x_i$ is the least nonnegative remainder of the division of X by $m_i$, so that $0 \le x_i < m_i$, and 
$q_i = \lfloor X / m_i \rfloor$ is the integer quotient.  
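As a minimal sketch of equation (1.1), the residue digits and quotients can be computed directly with integer division. The function names `to_rns` and `quotients` and the sample value 23 are illustrative assumptions, not part of the dissertation.

```python
def to_rns(x, moduli):
    """Return the residue digits (x_1, ..., x_n) of a nonnegative integer x."""
    return tuple(x % m for m in moduli)


def quotients(x, moduli):
    """Return the integer quotients q_i = floor(x / m_i)."""
    return tuple(x // m for m in moduli)


moduli = (2, 3, 5)          # pairwise relatively prime, chosen for illustration
x = 23
residues = to_rns(x, moduli)
qs = quotients(x, moduli)

# Equation (1.1) must hold for every modulus: X = m_i * q_i + x_i
assert all(m * q + r == x for m, q, r in zip(moduli, qs, residues))
print(residues)  # (1, 2, 3)
```

Each residue digit depends only on its own modulus, which is what allows the digits to be processed in parallel.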
1.3 Residue Dynamic Range 
The residue representation of a number is unique for any integer $X \in [0, M-1]$, where M 
is called the dynamic range: 
$M = \prod_{i=1}^{n} m_i$          (1.2)   
For signed numbers, one has to distinguish two cases. 
Case 1: 
The product M is an even number. This occurs if one modulus is even, and the 
range is defined as  
$X \in \left[ -\frac{M}{2}, \frac{M}{2} - 1 \right]$          (1.3) 
and all numbers whose unsigned representation in $[0, M-1]$ satisfies 
$X \ge \frac{M}{2}$          (1.4) 
are negative numbers. 
Case 2:  
The product M is an odd number. This occurs if all moduli are odd, and the 
range is defined as  
$X \in \left[ -\frac{M-1}{2}, \frac{M-1}{2} \right]$          (1.5) 
and all numbers whose unsigned representation in $[0, M-1]$ satisfies 
$X \ge \frac{M-1}{2} + 1$          (1.6) 
are negative numbers. 
The RNS representation of a negative number $-X$ is 
$(-X)_{10} \xrightarrow{RNS} \left( |m_1 - x_1|_{m_1}, |m_2 - x_2|_{m_2}, \ldots, |m_n - x_n|_{m_n} \right)_{m_1, m_2, \ldots, m_n}$          (1.7) 
To illustrate the residue representation, consider the three-moduli set (2, 3, 5), for which 
M = 30. The positive numbers from 0 to M-1 and their RNS representations are 
listed in Table 1.1. The signed numbers in the range [-15, 14] and their 
RNS representations are listed in Table 1.2. 
 
 
 
 
 
 
 
 
TABLE 1.1 
RESIDUE DIGITS FOR UNSIGNED NUMBERS (0 ≤ X < 30) 

          Residue digits              Residue digits
          (modulo)                    (modulo)
Integer   2    3    5       Integer   2    3    5
          x1   x2   x3                x1   x2   x3
0         0    0    0       15        1    0    0
1         1    1    1       16        0    1    1
2         0    2    2       17        1    2    2
3         1    0    3       18        0    0    3
4         0    1    4       19        1    1    4
5         1    2    0       20        0    2    0
6         0    0    1       21        1    0    1
7         1    1    2       22        0    1    2
8         0    2    3       23        1    2    3
9         1    0    4       24        0    0    4
10        0    1    0       25        1    1    0
11        1    2    1       26        0    2    1
12        0    0    2       27        1    0    2
13        1    1    3       28        0    1    3
14        0    2    4       29        1    2    4

 
TABLE 1.2 
RESIDUE DIGITS FOR SIGNED NUMBERS (-15, 14) 

          Residue digits              Residue digits
          (modulo)                    (modulo)
Integer   2    3    5       Integer   2    3    5
          x1   x2   x3                x1   x2   x3
0         0    0    0       -1        1    2    4
1         1    1    1       -2        0    1    3
2         0    2    2       -3        1    0    2
3         1    0    3       -4        0    2    1
4         0    1    4       -5        1    1    0
5         1    2    0       -6        0    0    4
6         0    0    1       -7        1    2    3
7         1    1    2       -8        0    1    2
8         0    2    3       -9        1    0    1
9         1    0    4       -10       0    2    0
10        0    1    0       -11       1    1    4
11        1    2    1       -12       0    0    3
12        0    0    2       -13       1    2    2
13        1    1    3       -14       0    1    1
14        0    2    4       -15       1    0    0
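
The signed encoding of equation (1.7) and the two sign cases of Section 1.3 can be sketched as follows. The helper names (`rns_negate`, `signed_value`) are assumptions made for illustration; the single threshold `(M + 1) // 2` is a compact way of covering both the even and the odd case.

```python
def to_rns(x, moduli):
    """Residue digits of x; Python's % already returns a value in [0, m)."""
    return tuple(x % m for m in moduli)


def rns_negate(residues, moduli):
    """Additive inverse per equation (1.7): digit i becomes |m_i - x_i| mod m_i."""
    return tuple((m - r) % m for r, m in zip(residues, moduli))


def signed_value(u, M):
    """Map an unsigned value u in [0, M-1] to the signed range of Section 1.3."""
    threshold = (M + 1) // 2      # M/2 for even M, (M-1)/2 + 1 for odd M
    return u - M if u >= threshold else u


moduli = (2, 3, 5)
M = 30
# Negating the digits of 7 gives the table entry for -7.
assert rns_negate(to_rns(7, moduli), moduli) == to_rns(-7, moduli) == (1, 2, 3)
assert signed_value(23, M) == -7     # (1, 2, 3) is also the code of 23
assert signed_value(14, M) == 14
```

The same digit pattern therefore stands for 23 in the unsigned reading of Table 1.1 and for -7 in the signed reading of Table 1.2.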
 
1.4 Quantum-Dot Cellular Automata (QCA) 
During the past decade, Quantum-Dot Cellular Automata (QCA) has demonstrated the 
ability to implement both combinational and sequential logic devices [76]-[82]. Unlike 
conventional Boolean AND-OR-NOT based circuits, the fundamental logic device in 
QCA Boolean networks is the majority gate, which implements the Boolean function  
M(A,B,C) = AB+AC+BC          (1.8)
 
 
TABLE 1.3 
MAJORITY FUNCTION M(A,B,C) 
A B C M(A,B,C) 
0 0 0 0 
0 0 1 0 
0 1 0 0 
0 1 1 1 
1 0 0 0 
1 0 1 1 
1 1 0 1 
1 1 1 1 
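
The truth table of the majority function in equation (1.8) can be checked with a few lines of code. This is only an illustrative sketch; the function name `majority` is an assumption.

```python
from itertools import product


def majority(a, b, c):
    """M(A,B,C) = AB + AC + BC for inputs in {0, 1}."""
    return (a & b) | (a & c) | (b & c)


# Enumerate all eight input combinations, in the same order as the truth table.
for a, b, c in product((0, 1), repeat=3):
    # The output is 1 exactly when at least two inputs are 1.
    assert majority(a, b, c) == int(a + b + c >= 2)
    print(a, b, c, majority(a, b, c))
```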
  
      By combining these majority gates with NOT gates, any combinational or sequential 
logic device can be constructed from QCA cells [76]-[82]. Logic design for QCA is 
more involved than conventional Boolean design. Traditional Boolean 
reduction methods such as Karnaugh maps produce simplified Boolean expressions; 
however, converting these forms to QCA majority logic is not a simple process, due to the 
complexity of multilevel majority gates. In Chapter 2, we present a literature review 
and background material for RNS and QCA. 
 
1.5 Problem Statement  
The residue number system is a robust parallel system that has received attention due to its 
ability to support high-speed concurrent arithmetic operations such as addition, 
subtraction and multiplication at the modular level. The system suffers from 
weaknesses in the conversion process, scaling, division, overflow detection and 
magnitude comparison. In this dissertation, we propose new techniques to address 
the conversion and scaling issues. These techniques have been proved mathematically 
and verified through VLSI simulation. 
Quantum-dot cellular automata is one of the new fields of nanotechnology. Due to its 
ultra-low power consumption and small size, QCA has the potential to succeed CMOS 
technology. Quantum-dot cellular automata uses logic devices that differ from 
conventional Boolean logic devices. Converting Boolean circuits to QCA logic is not a 
simple process, due to the complexity of QCA, and existing Boolean reduction methods do not 
work with QCA logic. In this thesis, we propose a new QCA construction reduction 
method that utilizes the VLSI techniques that were used in the RNS work. Fig. 1.1 shows a 
block diagram that explains the flow of our research. 
 
 
 
 
 
 
 
 
 
 
 
 
Fig. 1.1.  Block diagram that shows the research logical flow 
[Figure not reproduced. Block labels: RNS; Size, Speed; VLSI; Digital Logic; Nano Logic; Quantum-Dot Cellular Automata (QCA); Carbon Nanotubes (CNT); Single Electron Transistor (SET)] 
 
1.6 Thesis Organization  
This thesis contains eight chapters. Chapter 1 is the introduction. Chapter 2 
reviews the literature on the residue number system and quantum-dot cellular automata. 
Chapter 3 presents the new binary to residue number system conversion 
method. Chapter 4 presents the new scaling methods. The VLSI implementation of the RNS adder and subtractor 
is presented in Chapter 5. Chapter 6 presents the new quantum 
Boolean circuit construction reduction methodology. Chapter 7 presents the QCA 
implementation of the pipeline array, and Chapter 8 summarizes the thesis 
work and discusses future research. The dissertation also includes an appendix that documents the 
updated FPGA, Cadence Encounter and Virtuoso procedures. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
CHAPTER 2 
LITERATURE REVIEW 
2.1 Introduction 
The residue number system has attracted researchers due to its advantages in 
modular arithmetic operations such as addition, subtraction and multiplication: RNS 
can add, subtract or multiply without waiting for carry 
propagation, as weighted number systems require. RNS has also shown 
significant efficiency in implementing the Discrete Fourier Transform and digital filters. 
In addition, quantum-dot cellular automata has shown the ability to implement both 
combinational and sequential logic devices. It is one of the emerging 
nanotechnologies, and it has the potential to succeed CMOS technology due 
to its ultra-low power consumption and small size. This chapter reviews the RNS and QCA 
topics that are relevant to the dissertation work. Section 2.2 introduces the RNS addition, 
subtraction and multiplication operations. Sections 2.3, 2.4, 2.5 and 2.6 give an 
overview of conversion from RNS to binary, sign detection, RNS scaling and RNS fast 
processing applications. Section 2.7 reviews QCA topics. 
 
2.2 RNS Arithmetic Operations 
      The Residue Number System is an unweighted system with carry-free and borrow-free 
arithmetic operations. Addition, subtraction and multiplication are carried out on each 
residue digit concurrently and independently, which simplifies support for parallel 
high-speed computation. 
       Addition is accomplished by adding the corresponding small residue digits, each 
reduced modulo its own modulus. The following equation expresses the RNS addition operation: 
$(a + b)_{10} \Leftrightarrow |a_m + b_m|_m$          (2.1) 
where a and b are integers and $a_m$, $b_m$ are their residues modulo m. For example, the addition $(10 + 8)_{10}$ 
using the moduli set (2, 3, 5) is illustrated below. 

  $(10)_{10} \xrightarrow{RNS} (0, 1, 0)_{2,3,5}$
+ $(8)_{10} \xrightarrow{RNS} (0, 2, 3)_{2,3,5}$
= $(0, 0, 3)_{2,3,5}$

$(0, 0, 3)_{2,3,5} \xrightarrow{RNS^{-1}} (18)_{10}$
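
A minimal sketch of digit-wise RNS addition, under the assumption that operands are encoded with a helper `to_rns` (an illustrative name, not from the dissertation). Each digit is reduced modulo its own modulus, so the digit for modulus 3 in the sum 10 + 8 is |1 + 2| mod 3 = 0.

```python
def to_rns(x, moduli):
    """Residue digits of x for the given moduli."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli):
    """Add two RNS numbers digit by digit; no carry crosses between digits."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


moduli = (2, 3, 5)
a = to_rns(10, moduli)        # (0, 1, 0)
b = to_rns(8, moduli)         # (0, 2, 3)
total = rns_add(a, b, moduli)

assert total == to_rns(18, moduli)
print(total)  # (0, 0, 3)
```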
 
 
The subtraction operation is performed like the addition operation, using the 
additive inverse of the subtrahend (equation 1.7). 
For example, the subtraction $(18 - 10)_{10}$ using the moduli set (2, 3, 5) is 
illustrated below.  
$(-10)_{10} \xrightarrow{RNS} (0, 2, 0)_{2,3,5}$ 
and  

  $(18)_{10} \xrightarrow{RNS} (0, 0, 3)_{2,3,5}$
+ $(-10)_{10} \xrightarrow{RNS} (0, 2, 0)_{2,3,5}$
= $(0, 2, 3)_{2,3,5}$

$(0, 2, 3)_{2,3,5} \xrightarrow{RNS^{-1}} (8)_{10}$

Multiplication is accomplished in a manner similar to addition and subtraction, as 
follows. 
$(a * b)_{10} \Leftrightarrow |a_m * b_m|_m$          (2.2) 
For example, the multiplication $(3 * 9)_{10}$ using the moduli set (2, 3, 5) is illustrated below. 

  $(3)_{10} \xrightarrow{RNS} (1, 0, 3)_{2,3,5}$
* $(9)_{10} \xrightarrow{RNS} (1, 0, 4)_{2,3,5}$
= $(1, 0, 2)_{2,3,5}$

$(1, 0, 2)_{2,3,5} \xrightarrow{RNS^{-1}} (27)_{10}$
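
Subtraction through the additive inverse and digit-wise multiplication can be sketched in the same way. The function names below are illustrative assumptions.

```python
def to_rns(x, moduli):
    """Residue digits of x for the given moduli."""
    return tuple(x % m for m in moduli)


def rns_negate(a, moduli):
    """Additive inverse: digit i becomes |m_i - a_i| mod m_i (equation 1.7)."""
    return tuple((m - x) % m for x, m in zip(a, moduli))


def rns_sub(a, b, moduli):
    """a - b computed as a + (-b), digit by digit."""
    neg_b = rns_negate(b, moduli)
    return tuple((x + y) % m for x, y, m in zip(a, neg_b, moduli))


def rns_mul(a, b, moduli):
    """Digit-wise product, each digit reduced modulo its own modulus."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


moduli = (2, 3, 5)
assert rns_negate(to_rns(10, moduli), moduli) == (0, 2, 0)
assert rns_sub(to_rns(18, moduli), to_rns(10, moduli), moduli) == to_rns(8, moduli)
assert rns_mul(to_rns(3, moduli), to_rns(9, moduli), moduli) == to_rns(27, moduli)
print(rns_mul(to_rns(3, moduli), to_rns(9, moduli), moduli))  # (1, 0, 2)
```

These results are valid only while the true result stays inside the dynamic range; 27 < 30 holds here.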
 
 
 
2.3 Conversion from RNS to Binary  
       Several methods are available for converting residue numbers to binary. Most of these 
methods are based on two techniques: the Chinese Remainder Theorem (CRT) 
and Mixed Radix Conversion (MRC) [9]. 
 
2.3.1 Chinese Remainder Theorem Conversion  
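      The Chinese Remainder Theorem is a basic conversion method. The problem 
associated with the CRT approach is that it requires modulo-M adders. 
Definition 
For any given set of relatively prime moduli $(m_1, m_2, m_3, \ldots, m_n)$, the residue 
representation of a binary number X is $(x_1, x_2, x_3, \ldots, x_n)$. The number X can be represented 
as 
$X = \left| \sum_{i=1}^{n} \hat{m}_i \left| \frac{x_i}{\hat{m}_i} \right|_{m_i} \right|_M$ 
where $0 \le x_i < m_i$, $M = \prod_{i=1}^{n} m_i$, $\hat{m}_i = \frac{M}{m_i}$, and 
$\left| \frac{1}{\hat{m}_i} \right|_{m_i} = \left| \hat{m}_i^{-1} \right|_{m_i}$ is the multiplicative inverse of $\hat{m}_i$ modulo $m_i$. 
For a prime modulus $m_i$ it can be computed as 
$\left| \frac{1}{\hat{m}_i} \right|_{m_i} = \left| \hat{m}_i^{\,m_i - 2} \right|_{m_i}$ 
To illustrate how the method works, consider the following example.  
Example: 
For the moduli set (5, 7, 11), find the decimal number whose residue representation is  
(4, 5, 7), with M = 385. 

Solution:  
$\hat{m}_1 = \frac{385}{5} = 77$,   $\hat{m}_2 = \frac{385}{7} = 55$,   $\hat{m}_3 = \frac{385}{11} = 35$ 
$\left| \frac{1}{\hat{m}_1} \right|_5 = \left| \frac{1}{77} \right|_5 = 3$,   $\left| \frac{1}{\hat{m}_2} \right|_7 = \left| \frac{1}{55} \right|_7 = 6$,   $\left| \frac{1}{\hat{m}_3} \right|_{11} = \left| \frac{1}{35} \right|_{11} = 6$ 
$\left| \frac{x_1}{\hat{m}_1} \right|_{m_1} = |4 * 3|_5 = 2$,   $\left| \frac{x_2}{\hat{m}_2} \right|_{m_2} = |5 * 6|_7 = 2$,   $\left| \frac{x_3}{\hat{m}_3} \right|_{m_3} = |7 * 6|_{11} = 9$ 
$X = \left| \sum_{i=1}^{3} \hat{m}_i \left| \frac{x_i}{\hat{m}_i} \right|_{m_i} \right|_M = |77 * 2 + 55 * 2 + 35 * 9|_{385} = |579|_{385} = 194$ 
$(4, 5, 7)_{5,7,11} \xrightarrow{RNS^{-1}} (194)_{10}$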
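
The CRT reconstruction above can be sketched in a few lines. This is an illustrative software version, not a hardware design: the name `crt_to_int` is an assumption, and the modular inverse is obtained with Python's built-in `pow(base, -1, mod)` (available from Python 3.8) rather than with the power formula.

```python
from math import prod


def crt_to_int(residues, moduli):
    """Rebuild X in [0, M-1] from its residues using the CRT sum."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        m_hat = M // m_i                   # m^_i = M / m_i
        inv = pow(m_hat, -1, m_i)          # multiplicative inverse of m^_i mod m_i
        total += m_hat * ((x_i * inv) % m_i)
    return total % M                       # the final modulo-M reduction


print(crt_to_int((4, 5, 7), (5, 7, 11)))  # 194
```

The final `% M` is the wide modulo-M reduction that makes CRT costly in hardware.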
 
 
2.3.2 Mixed Radix Conversion  
Converting RNS back to the binary number system is a challenging process, and one 
of the methods that can be used for this conversion is the mixed radix 
number system [9]. The problem associated with MRC lies in the complexity of finding the 
mixed radix digits. 
Definition 
An integer X may be expressed in mixed radix form as 
$X = a_1 + a_2 m_1 + a_3 m_1 m_2 + \cdots + a_n \prod_{i=1}^{n-1} m_i$         (2.3) 
where the mixed radix digits are determined sequentially in the following manner, starting 
with $a_1$. 
All terms in equation (2.3) except the first are multiples of $m_1$; consequently  
$a_1 = |X|_{m_1} = x_1$ 
To obtain $a_2$:  
 Subtract:   $X - a_1$ 
 Divide by $m_1$:   $\frac{X - a_1}{m_1}$ 
 Then take mod $m_2$:   $a_2 = \left| \frac{X - a_1}{m_1} \right|_{m_2}$ 
By successively subtracting and dividing in residue notation, all mixed radix digits 
can be obtained, which gives the general mixed radix form 
$X = a_1 + a_2 m_1 + a_3 m_1 m_2 + \cdots + a_n \prod_{i=1}^{n-1} m_i$ 
The following example illustrates how this method works. 
Example: 
Find the decimal number that is represented by (1, 0, 1) for the moduli set (2, 5, 7). 
Solution: 
Moduli set                        2    5    7
RNS                               1    0    1     a1 = 1
Subtract a1                       0    4    0     using $|x_j - a_1|_{m_j} = |x_j + m_j - a_1|_{m_j}$
Multiply by $|1/2|_{m_i}$              2    0     a2 = 2
Subtract a2                            0    5
Multiply by $|1/5|_7 = 3$                   1     a3 = 1
X = a1 + a2*m1 + a3*m1*m2  =  1 + 2*2 + 1*2*5 = 15 
$(1, 0, 1)_{2,5,7} \xrightarrow{RNS^{-1}} (15)_{10}$
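
The subtract-and-divide procedure of equation (2.3) can be sketched as below. The function names are illustrative assumptions, and division by a modulus is carried out as multiplication by its modular inverse via `pow(m, -1, mod)` (Python 3.8 or later). As a check, the residues of 15 for the moduli (2, 5, 7) yield the mixed radix digits (1, 2, 1).

```python
def mixed_radix_digits(residues, moduli):
    """Return the mixed radix digits a_1..a_n by repeated subtract and divide."""
    r = list(residues)
    n = len(moduli)
    digits = []
    for i in range(n):
        a_i = r[i]                           # current digit is the leading residue
        digits.append(a_i)
        for j in range(i + 1, n):
            inv = pow(moduli[i], -1, moduli[j])          # |1/m_i| mod m_j
            r[j] = ((r[j] - a_i) * inv) % moduli[j]      # subtract, then divide
    return digits


def mixed_radix_to_int(digits, moduli):
    """Evaluate a_1 + a_2*m_1 + a_3*m_1*m_2 + ..."""
    x, weight = 0, 1
    for a, m in zip(digits, moduli):
        x += a * weight
        weight *= m
    return x


moduli = (2, 5, 7)
residues = tuple(15 % m for m in moduli)
digits = mixed_radix_digits(residues, moduli)
print(digits, mixed_radix_to_int(digits, moduli))  # [1, 2, 1] 15
```

Unlike CRT, every step works modulo a single small modulus, at the cost of being sequential.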
 
 
2.4 Residue Number System Sign Detection 
      In the residue number system, sign detection is a relatively difficult operation compared to 
weighted systems. The sign in RNS is a function of every digit and is closely related to 
magnitude determination.  
       Section 1.3 showed that in case 1 all numbers in the range [0, M/2 - 1] are positive and 
all numbers in the range [M/2, M-1] are negative, and that in case 2 all numbers in the range [0, (M-1)/2] 
are positive and all numbers in the range [(M+1)/2, M-1] are negative. A sign 
detection algorithm based on the mixed radix process and case 1 is found in [9]. 
Definition  
Select the last modulus in the mixed radix conversion, $m_n$, to be even. Then it is easily 
seen that $0 \le a_n \le \frac{m_n}{2} - 1$ implies that $|X|_M$ falls into the interval $[0, \frac{M}{2} - 1]$ and therefore 
can be considered positive. Conversely, $\frac{m_n}{2} \le a_n \le m_n - 1$ implies that $|X|_M$ falls into the 
interval $[\frac{M}{2}, M - 1]$ and therefore can be considered negative [3]. 
Example:  
Find the sign of (1, 2, 3) for the moduli set (2, 3, 5). 
 Solution: 
The dynamic range is M = 30; numbers in the range [0, 14] are positive and numbers in the 
range [15, 29] are negative.  
First reorder the moduli to place the even modulus 2 at the end.  
Moduli set                        5    3    2
RNS                               3    2    1     a1 = 3
Subtract a1                       0    2    0
Multiply by $|1/5|_{m_i}$              1    0     a2 = 1
Subtract a2                            0    1
Multiply by $|1/3|_2 = 1$                   1     a3 = 1
a3 = 1 implies that the number (1, 2, 3) falls into the interval [15, 29]. It follows that (1, 2, 3) is 
negative. 
X = a1 + a2*m1 + a3*m1*m2  =  3 + 1*5 + 1*5*3 = 23, which represents 23 - 30 = -7. 
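
A sketch of this sign test follows, assuming exactly one even modulus in the set. The names `mixed_radix_digits` and `is_negative` are illustrative. The even modulus is moved to the last position and the other moduli keep their order; only the last mixed radix digit decides the sign, so the order of the odd moduli does not affect the result.

```python
def mixed_radix_digits(residues, moduli):
    """Mixed radix digits a_1..a_n by repeated subtract and divide."""
    r = list(residues)
    n = len(moduli)
    digits = []
    for i in range(n):
        digits.append(r[i])
        for j in range(i + 1, n):
            inv = pow(moduli[i], -1, moduli[j])       # Python 3.8+
            r[j] = ((r[j] - digits[i]) * inv) % moduli[j]
    return digits


def is_negative(residues, moduli):
    """Sign detection for even M: negative when a_n >= m_n / 2, m_n even."""
    even = [i for i, m in enumerate(moduli) if m % 2 == 0]
    if len(even) != 1:
        raise ValueError("this sketch assumes exactly one even modulus")
    order = [i for i in range(len(moduli)) if i != even[0]] + even
    m = [moduli[i] for i in order]
    r = [residues[i] for i in order]
    a_n = mixed_radix_digits(r, m)[-1]
    return a_n >= m[-1] // 2


print(is_negative((1, 2, 3), (2, 3, 5)))  # True  (code of 23, i.e. -7)
print(is_negative((1, 1, 2), (2, 3, 5)))  # False (code of 7)
```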
 
2.5 Scaling in Residue Number System 
       Scaling is a comparatively difficult RNS operation, yet it is essential 
in several signal processing algorithms. In the binary system, the scaling constant is usually a 
power of 2. Many scaling techniques are reported in the RNS literature [12]. RNS scaling 
by a constant Q is defined as 
$Y = \left\lfloor \frac{X}{Q} \right\rfloor$          (2.4) 
$Y = \frac{X - |X|_Q}{Q}$          (2.5) 
and in RNS representation  
$|Y|_{m_i} = \left| \frac{X - |X|_Q}{Q} \right|_{m_i}$         (2.6) 
If the divisor Q is a modulus or a product of first powers of moduli, the multiplicative 
inverse property can be used to simplify the division by Q: 
$|Y|_{m_i} = \left| \left( X - |X|_Q \right) Q^{-1} \right|_{m_i}$         (2.7) 
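
The following sketch scales an RNS number by one of its own moduli, Q = m_k. For every other modulus the division is done by multiplying with the inverse of Q. The digit for modulus Q itself cannot be obtained that way because Q has no inverse modulo Q; here it is recovered by a plain CRT over the remaining moduli, which is a software shortcut assumed for illustration and not the base extension algorithm of this dissertation. The shortcut works because Y = floor(X/Q) is always smaller than M/Q. All function names are assumptions.

```python
from math import prod


def crt_to_int(residues, moduli):
    """Rebuild an integer from residues with the CRT sum."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        m_hat = M // m_i
        total += m_hat * ((x_i * pow(m_hat, -1, m_i)) % m_i)   # Python 3.8+
    return total % M


def scale_by_modulus(residues, moduli, k):
    """Return the RNS digits of floor(X / Q) for Q = moduli[k]."""
    Q = moduli[k]
    r_Q = residues[k]                        # |X|_Q is simply the k-th digit
    others = [i for i in range(len(moduli)) if i != k]
    scaled = {}
    for i in others:
        m = moduli[i]
        # (X - |X|_Q) * Q^{-1} mod m_i, valid because gcd(Q, m_i) = 1
        scaled[i] = ((residues[i] - r_Q) * pow(Q, -1, m)) % m
    # Recover the missing digit: Y < M/Q, so the other digits determine Y.
    y = crt_to_int([scaled[i] for i in others], [moduli[i] for i in others])
    scaled[k] = y % Q
    return tuple(scaled[i] for i in range(len(moduli)))


# 23 scaled by 5 is floor(23 / 5) = 4, whose digits are (0, 1, 4).
print(scale_by_modulus((1, 2, 3), (2, 3, 5), 2))  # (0, 1, 4)
```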
2.6  RNS Fast Processing Applications 
 
       The advantages of the residue number system are discussed in several publications and 
books [9]. Carry-free computation and simplified, fast addition and multiplication, which 
lend themselves to parallel architectures, are among the most important. Potential 
applications for RNS processors include fast DSP applications, adaptive array processing, 
Kalman filtering, Fast Fourier Transforms, and image processing for communications, 
surveillance, and intelligence systems. 
 
2.7 Quantum-Dot Cellular Automata (QCA) 
 
      In this section, we present background material on QCA that will be helpful for 
understanding the QCA topics. 
   QCA Cell: A quantum cell can be viewed as a set of four charge containers, or dots, 
positioned at the corners of a square, as shown in Fig. 2.1a. Each QCA cell contains two 
mobile electrons that can move to any quantum dot through electron tunneling. Thus 
there are two equivalent electron arrangements, with polarization P = +1 (logic 1) and P = -1 
(logic 0). 

Fig. 2.1a. QCA cells and binary encoding  

   QCA Majority Gate: The basic QCA logic element is the majority gate, shown in Fig. 
2.1b. It produces an output of one if the majority of its inputs are one. The classical AND and 
OR gates can be realized with a majority gate by fixing one of the three inputs to 0 or 1, 
respectively, as follows: 
M(A,B,0) = AB               (2.8a)
M(A,B,1) = A+B              (2.8b)
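
Equations (2.8a) and (2.8b) can be verified exhaustively. The sketch below is illustrative only; the function names are assumptions.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority function M(A,B,C) = AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


def qca_and(a, b):
    """AND realized as a majority gate with the third input fixed to 0."""
    return majority(a, b, 0)


def qca_or(a, b):
    """OR realized as a majority gate with the third input fixed to 1."""
    return majority(a, b, 1)


for a, b in product((0, 1), repeat=2):
    assert qca_and(a, b) == (a & b)
    assert qca_or(a, b) == (a | b)
print("M(A,B,0) = AB and M(A,B,1) = A+B hold for all inputs")
```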
 
 
 
Fig. 2.1b. QCA majority gate 
 
  QCA Inverter: The QCA cell layout of an inverter is shown in Fig. 2.1c. The polarization 
of the output QCA cell “out” is the opposite of that of the input QCA cell “in”. 
 
 
 
Fig.  2.1c. QCA inverter gate 
 
 
  QCA Wire: There are two types of QCA wire: normal (also called 90°) and diagonal 
(also called 45°). Fig. 2.1d shows the two QCA wire types polarized to logic one. 
 
 
Fig. 2.1d. QCA wire types 
 
2.8  QCA Clocking 
       QCA cells use a four-phase clocking scheme, with clocks named clock 1, clock 2, clock 3 and clock 4, as 
shown in Fig. 2.2a. Every clock is 90° out of phase from its previous clock, and each clock 
has four states: switch, hold, release, and relax [77]. In the switch state, QCA cells 
start to polarize. In the hold state, the cells retain their polarization, and during the release and relax 
states, the QCA cells are unpolarized, as shown in Fig. 2.2b.  
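
The relation between the four clocks can be pictured with a small idealized model, in which time advances in quarter periods and each clock lags the previous one by one quarter period (90°). The function name `clock_state`, the discrete time step and the choice of clock 1 starting in the switch state are assumptions made for illustration; real QCA clock signals vary continuously.

```python
STATES = ("switch", "hold", "release", "relax")


def clock_state(clock, step):
    """State of clock 1..4 at a given quarter-period step (idealized model)."""
    # Clock k lags clock 1 by (k - 1) quarter periods.
    return STATES[(step - (clock - 1)) % 4]


for step in range(4):
    print(step, [clock_state(k, step) for k in (1, 2, 3, 4)])
# Step 1 shows clock 1 in "hold" while clock 2 is in "switch": cells under
# clock 2 polarize while the cells under clock 1 hold their values.
```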
 
Fig. 2.2a. QCA clock phases; each clock lagging its prior by 90° 
 
 
Fig. 2.2b. Four QCA interdot barrier states 
 
Table 2.1 shows the equivalent QCA expressions for major Boolean gates. 
 
 
TABLE 2.1 
 
MAJORITY EXPRESSIONS AND DIAGRAMS FOR MAJOR BOOLEAN GATES 
 
 
 
 
 
2.9  Conclusion 
The residue number system is a robust parallel system that supports high-speed 
concurrent arithmetic such as addition, subtraction and multiplication at the 
modular level. However, RNS has weaknesses in the conversion process, scaling, 
division, overflow detection and magnitude comparison. 
Quantum-dot cellular automata (QCA) is an emerging nanotechnology that is 
considered a future alternative to CMOS technology because of its ultra-low power 
consumption and small size. QCA builds circuits from logic devices other than Boolean logic 
devices. Converting Boolean circuits to QCA circuits is not a simple process because of the 
complexity of QCA, and existing Boolean reduction methods do not work with QCA 
logic. In this thesis, we present a new binary to residue number system conversion method, a new RNS scaling 
methodology, an RNS adder and subtractor implementation, a new QCA construction 
reduction method and a QCA pipeline array implementation. 
 
CHAPTER 3 
 
NOVEL PARALLEL-PREFIX STRUCTURE BINARY TO RESIDUE NUMBER SYSTEM 
CONVERSION METHOD 
3.1. Introduction 
  In the last decade, the Residue Number System (RNS) has received increased attention due to its 
ability to support high-speed concurrent arithmetic applications [1-3] such as the Fast Fourier 
Transform (FFT), image processing and digital filters, which exploit the efficiency of RNS arithmetic in 
addition and multiplication. Advances in Very Large Scale Integration (VLSI) technology 
and the demand for parallel computation have led researchers to consider RNS as an alternative 
approach to high-speed concurrent arithmetic. 
 Several methods for binary to RNS conversion are found in the literature. Alia and Martinelli [4] 
proposed a method for binary to residue conversion based on powers of 2. A modification of this 
method was proposed by Capocelli and Giancarlo [5]. Mohan [6] proposed a similar 
method, with the difference that his method is based on the cyclic property of powers of 2 modulo 
the moduli. Parhami [7] proposed table lookup schemes for binary to residue conversion.  
 In this chapter, we present a novel binary to residue number system conversion method. The 
chapter is organized as follows. Section two explains the RNS. In section three, we 
present the new binary to RNS conversion algorithm. Sections four and five give an illustrative 
example and implementation selection techniques. Section six compares the new 
method with previous work. The conclusion is in section seven.   
 
 
 
 
 
3.1. Residue Number System  
       Any n-bit nonnegative integer X in the range $0 \le X \le 2^n - 1$ is represented in the binary 
number system as
$$ X = 2^{n-1} b_{n-1} + \dots + 2^2 b_2 + 2 b_1 + b_0 = \sum_{j=0}^{n-1} 2^j b_j $$
where $b_j \in \{0, 1\}$. 
In RNS, X is represented by k residue digits $x_i$ as $X = \{x_1, x_2, x_3, \dots, x_k\}$, where $x_i = X \bmod m_i$ 
and $m_i$ belongs to a set of relatively prime moduli, $m_i \in \{m_1, m_2, m_3, \dots, m_k\}$ [9]. If the 
moduli are pairwise relatively prime, there is a unique RNS representation for each integer in the range
$$ 0 \le X < \prod_{i=1}^{k} m_i $$
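
As a minimal sketch of this representation, the following computes residue tuples and checks, by brute force, that every integer in the range has a distinct tuple. The moduli (3, 5, 7) and the helper names are chosen here only for illustration.

```python
from math import gcd, prod


def to_rns(x, moduli):
    """Residue digits x_i = x mod m_i for each modulus."""
    return tuple(x % m for m in moduli)


def pairwise_coprime(moduli):
    """True when every pair of moduli has gcd 1."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(moduli)
               for b in moduli[i + 1:])


moduli = (3, 5, 7)          # illustrative moduli set
M = prod(moduli)            # size of the range, 105 here

# Every integer in [0, M) maps to a different residue tuple.
codes = {to_rns(x, moduli) for x in range(M)}
print(pairwise_coprime(moduli), len(codes) == M)
print(to_rns(16, (2, 3, 5)))
```

The last line prints the residues of 16 over the moduli (2, 3, 5), the value used in the base extension examples of Chapter 4.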
3.2. Novel Conversion Method from Binary to Residue Representation 
       As shown above, an integer X can be represented in the binary system as
$$ X = 2^{n-1} b_{n-1} + \dots + 2^2 b_2 + 2 b_1 + b_0 = \sum_{j=0}^{n-1} 2^j b_j $$
and the RNS representation of X is
$$ |X|_m = \left| \sum_{j=0}^{n-1} 2^j b_j \right|_m = \left| \sum_{j=0}^{n-1} |2^j|_m \, b_j \right|_m \quad \text{for } m > 2 \tag{3.1} $$
Let $M_{A_1 A_0} = (A_1, A_0)[Y_0, Y_1, Y_2, Y_3]$ denote a 2-bit multiplexer, where the two control bits $(A_1, A_0)$ 
select which of the inputs $(Y_0, Y_1, Y_2, Y_3)$ is output.
Lemma 3.1:   For any pair of bits $b_j$ and $b_i$ with $j, i \ge 0$,
$$ \left| |2^j|_m b_j + |2^i|_m b_i \right|_m = |X_{ji}|_m $$
can be implemented using a 2-bit multiplexer:
$$ M_{ji} = (b_j, b_i)\left[\,0,\ |2^i|_m,\ |2^j|_m,\ \left| |2^j|_m + |2^i|_m \right|_m \right] \tag{3.2} $$
where the control bits $(A_1, A_0)$ equal $(b_j, b_i)$.
Proof: 
Rewrite $\left| |2^j|_m b_j + |2^i|_m b_i \right|_m$ as
$$ (0 \cdot \bar{b}_j \cdot \bar{b}_i) + (|2^i|_m \cdot \bar{b}_j \cdot b_i) + (|2^j|_m \cdot b_j \cdot \bar{b}_i) + \left(\left| |2^j|_m + |2^i|_m \right|_m \cdot b_j \cdot b_i\right) $$
This is equivalent to a 2-bit multiplexer $M_{ji}$ with control bits $(A_1, A_0)$ equal to $(b_j, b_i)$. Fig. 3.1 shows the 
implementation of equation (3.2) with $b_j = b_1$ and $b_i = b_0$.


Fig. 3.1. Two bits (b1 & b0) binary to residue number system conversion 
This pre-processing operator $M_{ji}$ is represented in the acyclic graph as the node shown in Fig. 3.2a, where all 
the inputs are pre-calculated constants.   
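
A minimal software model of the pre-processing node of Lemma 3.1 is sketched below. The multiplexer is modeled as a list lookup, and the function name `pair_node` is hypothetical; the four table entries are the pre-calculated constants of equation (3.2).

```python
def pair_node(b_j, b_i, j, i, m):
    """Model of M_ji: a 4-input multiplexer with constant inputs.

    The control word (A1, A0) = (b_j, b_i) selects one of
    0, |2^i|_m, |2^j|_m, ||2^j|_m + |2^i|_m|_m.
    """
    w_j = pow(2, j, m)                  # |2^j|_m, pre-calculated
    w_i = pow(2, i, m)                  # |2^i|_m, pre-calculated
    inputs = [0, w_i, w_j, (w_j + w_i) % m]
    select = (b_j << 1) | b_i           # control bits as an index
    return inputs[select]


# Compare with direct evaluation for bits b1, b0 and modulus 7.
for b1 in (0, 1):
    for b0 in (0, 1):
        direct = (2 * b1 + b0) % 7
        print(b1, b0, pair_node(b1, b0, 1, 0, 7), direct)
```

No adder is exercised at run time in this node: the only sum, the fourth table entry, is a constant that depends on j, i and m alone.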
 
 
 
 
 
 
Fig. 3.2. Prefix logic operation and their implementation 
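In a three-bit system, let $M_{A_2 A_1 A_0} = (A_2, A_1, A_0)[Y_0, Y_1, Y_2, Y_3, Y_4, Y_5, Y_6, Y_7]$ denote a 3-bit 
multiplexer, where the three control bits $(A_2, A_1, A_0)$ select which of the inputs $(Y_0, Y_1, \dots, Y_7)$ is 
output. 
Lemma 3.2:   For any three bits $b_k$, $b_j$ and $b_i$ with $k, j, i \ge 0$,
$$ \left| |2^k|_m b_k + |2^j|_m b_j + |2^i|_m b_i \right|_m = |X_{kji}|_m $$
can be implemented using a 3-bit multiplexer:
$$ M_{kji} = (b_k, b_j, b_i)\big[\,0,\ |2^i|_m,\ |2^j|_m,\ \left||2^j|_m + |2^i|_m\right|_m,\ |2^k|_m,\ \left||2^k|_m + |2^i|_m\right|_m,\ \left||2^k|_m + |2^j|_m\right|_m,\ \left||2^k|_m + |2^j|_m + |2^i|_m\right|_m \big] \tag{3.3} $$
where the control bits $(A_2, A_1, A_0)$ equal $(b_k, b_j, b_i)$.
Proof: 
Rewrite $\left| |2^k|_m b_k + |2^j|_m b_j + |2^i|_m b_i \right|_m$ as
$$ \begin{aligned}
& (0 \cdot \bar{b}_k \cdot \bar{b}_j \cdot \bar{b}_i) + (|2^i|_m \cdot \bar{b}_k \cdot \bar{b}_j \cdot b_i) + (|2^j|_m \cdot \bar{b}_k \cdot b_j \cdot \bar{b}_i) \\
& + \left(\left||2^j|_m + |2^i|_m\right|_m \cdot \bar{b}_k \cdot b_j \cdot b_i\right) + (|2^k|_m \cdot b_k \cdot \bar{b}_j \cdot \bar{b}_i) \\
& + \left(\left||2^k|_m + |2^i|_m\right|_m \cdot b_k \cdot \bar{b}_j \cdot b_i\right) + \left(\left||2^k|_m + |2^j|_m\right|_m \cdot b_k \cdot b_j \cdot \bar{b}_i\right) \\
& + \left(\left||2^k|_m + |2^j|_m + |2^i|_m\right|_m \cdot b_k \cdot b_j \cdot b_i\right)
\end{aligned} $$
The above expression is equivalent to a 3-bit multiplexer with $b_k$, $b_j$ and $b_i$ as selection control inputs. Fig. 3.3 
shows the implementation of equation (3.3) with $b_k = b_2$, $b_j = b_1$ and $b_i = b_0$.

Fig. 3.3.  Three bits (b2, b1, b0) binary to residue number system conversion 
This pre-processing operator $M_{kji}$ is represented in the acyclic graph as the node shown in Fig. 3.2b, where 
all the inputs are pre-calculated constants. 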
 
 
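Theorem 3.1:   For any two pairs of bits $(b_l, b_k)$ and $(b_j, b_i)$ with $j, i, l, k \ge 0$, the 
expression
$$ \left| |2^l|_m b_l + |2^k|_m b_k + |2^j|_m b_j + |2^i|_m b_i \right|_m = |X_{lkji}|_m $$
can be implemented using a 2-bit multiplexer:
$$ M_{lkji} = ((b_l + b_k), (b_j + b_i))[\,0,\ M_{ji},\ M_{lk},\ M_{lk} + M_{ji}\,] \tag{3.4} $$
where the control bits $(A_1, A_0)$ equal $(b_l + b_k,\ b_j + b_i)$. Between control bits, "+" denotes the logical OR of the two bits, so each control bit is 1 whenever either bit of its pair is set.
Proof: 
$$ |X_{lkji}|_m = \left| |2^l|_m b_l + |2^k|_m b_k + |2^j|_m b_j + |2^i|_m b_i \right|_m $$
$$ |X_{lkji}|_m = |M_{lk} + M_{ji}|_m \tag{3.5} $$
where $M_{lk} = \left| |2^l|_m b_l + |2^k|_m b_k \right|_m$ and $M_{ji} = \left| |2^j|_m b_j + |2^i|_m b_i \right|_m$ by Lemma 3.1.
Let $b_{lk} = (b_l + b_k)$ and $b_{ji} = (b_j + b_i)$.
Rewrite equation (3.5) as
$$ (0 \cdot \bar{b}_{lk} \cdot \bar{b}_{ji}) + (M_{ji} \cdot \bar{b}_{lk} \cdot b_{ji}) + (M_{lk} \cdot b_{lk} \cdot \bar{b}_{ji}) + ((M_{lk} + M_{ji}) \cdot b_{lk} \cdot b_{ji}) $$
This is equivalent to a 2-bit multiplexer $M_{lkji}$ with control bits $(A_1, A_0)$ equal to $(b_l + b_k,\ b_j + b_i)$. Fig. 
3.4 shows the implementation for the two pairs of bits (b3, b2) & (b1, b0).

Fig. 3.4. Four bits binary to RNS 
$$ |X_{3..0}|_m = \left| |2^3|_m b_3 + |2^2|_m b_2 + |2|_m b_1 + b_0 \right|_m $$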
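
The combining step of Theorem 3.1 can be modeled in the same way. This is a sketch under the assumption that each control bit is the logical OR of its pair of bits and that the fourth multiplexer input is produced by a mod-m adder; the function names are illustrative only.

```python
def pair_node(b_hi, b_lo, hi, lo, m):
    """Pre-processing node of Lemma 3.1 for bits at positions hi and lo."""
    w_hi, w_lo = pow(2, hi, m), pow(2, lo, m)
    return [0, w_lo, w_hi, (w_hi + w_lo) % m][(b_hi << 1) | b_lo]


def combine(m_lk, b_lk, m_ji, b_ji, m):
    """Node M_lkji: select among 0, M_ji, M_lk and (M_lk + M_ji) mod m.

    b_lk and b_ji are the OR-ed control bits of the two pairs.
    """
    inputs = [0, m_ji, m_lk, (m_lk + m_ji) % m]   # last entry: mod adder
    return inputs[(b_lk << 1) | b_ji]


def four_bit_residue(b3, b2, b1, b0, m):
    """|X_{3..0}|_m built from two pair nodes and one combining node."""
    m32 = pair_node(b3, b2, 3, 2, m)
    m10 = pair_node(b1, b0, 1, 0, m)
    return combine(m32, b3 | b2, m10, b1 | b0, m)


print(four_bit_residue(1, 1, 0, 1, 7))   # bits 1101, i.e. 13 mod 7
```

The final line evaluates the bit pattern 1101 (decimal 13) modulo 7.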
 
Lemma 3.3: 
Combining two pairs of bits $(b_l, b_k)$ and $(b_j, b_i)$ requires one 2-bit multiplexer and one 2-input mod 
adder. The delay time is
$$ \tau_{Total} = \tau_{mux_2} + \tau_{modadder_2} $$
Proof: 
Equation (3.4) and Fig. 3.4 show that $((b_l + b_k), (b_j + b_i))[\,0,\ M_{ji},\ M_{lk},\ M_{lk} + M_{ji}\,]$ 
is equivalent to one 2-bit multiplexer and one 2-input mod adder, and the delay time is equal to 
$\tau_{mux_2} + \tau_{modadder_2}$.
Fig. 3.2c shows the acyclic graph node for $M_{lkji}$, where $M_{lk}$ and $M_{ji}$ are inputs. 
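Lemma 3.4:  
The parallel prefix operator, written $\circ$ here, has the following properties: 
1) Commutative
$$ M_{lk} \circ M_{ji} = M_{ji} \circ M_{lk} $$
2) Associative
$$ M_{lk} \circ (M_{hg} \circ M_{ji}) = (M_{lk} \circ M_{hg}) \circ M_{ji} $$
Proof:
$$ \begin{aligned}
M_{lk} \circ M_{ji} &= M_{lkji} \\
&= ((b_l + b_k), (b_j + b_i))[\,0,\ M_{ji},\ M_{lk},\ M_{lk} + M_{ji}\,] \\
&= \left| |2^l|_m b_l + |2^k|_m b_k + |2^j|_m b_j + |2^i|_m b_i \right|_m
\end{aligned} \tag{3.6} $$
$$ \begin{aligned}
M_{ji} \circ M_{lk} &= M_{jilk} \\
&= ((b_j + b_i), (b_l + b_k))[\,0,\ M_{lk},\ M_{ji},\ M_{ji} + M_{lk}\,] \\
&= \left| |2^j|_m b_j + |2^i|_m b_i + |2^l|_m b_l + |2^k|_m b_k \right|_m
\end{aligned} \tag{3.7} $$
Expressions (3.6) and (3.7) are the same by the commutative property of "+", hence the operator is 
commutative.
$$ \begin{aligned}
M_{lk} \circ (M_{hg} \circ M_{ji}) &= M_{lk} \circ M_{hgji} \\
&= M_{lkhgji} \\
&= \left| |2^l|_m b_l + |2^k|_m b_k + |2^h|_m b_h + |2^g|_m b_g + |2^j|_m b_j + |2^i|_m b_i \right|_m
\end{aligned} \tag{3.8} $$
$$ \begin{aligned}
(M_{lk} \circ M_{hg}) \circ M_{ji} &= M_{lkhg} \circ M_{ji} \\
&= M_{lkhgji} \\
&= \left| |2^l|_m b_l + |2^k|_m b_k + |2^h|_m b_h + |2^g|_m b_g + |2^j|_m b_j + |2^i|_m b_i \right|_m
\end{aligned} \tag{3.9} $$
Expressions (3.8) and (3.9) are the same by the associative property of "+", hence the operator is 
associative.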
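Theorem 3.2:   For any three pairs of bits $(b_l, b_k)$, 
$(b_j, b_i)$ and $(b_h, b_g)$ with $l, k, j, i, h, g \ge 0$, the expression
$$ \left| |2^l|_m b_l + |2^k|_m b_k + |2^j|_m b_j + |2^i|_m b_i + |2^h|_m b_h + |2^g|_m b_g \right|_m = |X_{lkjihg}|_m $$
can be implemented using a 3-bit multiplexer:
$$ M_{lkjihg} = ((b_l + b_k), (b_j + b_i), (b_h + b_g))[\,0,\ M_{hg},\ M_{ji},\ M_{ji} + M_{hg},\ M_{lk},\ M_{lk} + M_{hg},\ M_{lk} + M_{ji},\ M_{lk} + M_{ji} + M_{hg}\,] \tag{3.10} $$
where the control bits $(A_2, A_1, A_0)$ equal $(b_l + b_k,\ b_j + b_i,\ b_h + b_g)$, with "+" between control bits denoting logical OR.
Proof:
$$ |X_{lkjihg}|_m = \left| |2^l|_m b_l + |2^k|_m b_k + |2^j|_m b_j + |2^i|_m b_i + |2^h|_m b_h + |2^g|_m b_g \right|_m $$
$$ |X_{lkjihg}|_m = |M_{lk} + M_{ji} + M_{hg}|_m \tag{3.11} $$
where $M_{lk} = \left| |2^l|_m b_l + |2^k|_m b_k \right|_m$, $M_{ji} = \left| |2^j|_m b_j + |2^i|_m b_i \right|_m$ and 
$M_{hg} = \left| |2^h|_m b_h + |2^g|_m b_g \right|_m$ by Lemma 3.1.
Let $b_{lk} = (b_l + b_k)$, $b_{ji} = (b_j + b_i)$ and $b_{hg} = (b_h + b_g)$.
Rewrite equation (3.11) as
$$ \begin{aligned}
& (0 \cdot \bar{b}_{lk} \cdot \bar{b}_{ji} \cdot \bar{b}_{hg}) + (M_{hg} \cdot \bar{b}_{lk} \cdot \bar{b}_{ji} \cdot b_{hg}) + (M_{ji} \cdot \bar{b}_{lk} \cdot b_{ji} \cdot \bar{b}_{hg}) \\
& + ((M_{ji} + M_{hg}) \cdot \bar{b}_{lk} \cdot b_{ji} \cdot b_{hg}) + (M_{lk} \cdot b_{lk} \cdot \bar{b}_{ji} \cdot \bar{b}_{hg}) \\
& + ((M_{lk} + M_{hg}) \cdot b_{lk} \cdot \bar{b}_{ji} \cdot b_{hg}) + ((M_{lk} + M_{ji}) \cdot b_{lk} \cdot b_{ji} \cdot \bar{b}_{hg}) \\
& + ((M_{lk} + M_{ji} + M_{hg}) \cdot b_{lk} \cdot b_{ji} \cdot b_{hg})
\end{aligned} $$
This is equivalent to a 3-bit multiplexer $M_{lkjihg}$ with control bits $(A_2, A_1, A_0)$ equal to $(b_l + b_k,\ b_j + b_i,\ b_h + b_g)$. 
Fig. 3.5 shows the implementation for three pairs of bits (b5, b4), (b3, b2) & (b1, b0).


Fig. 3.5.  Six bits binary to residue number system conversion 
$$ |X_{5..0}|_m = \left| |2^5|_m b_5 + |2^4|_m b_4 + |2^3|_m b_3 + |2^2|_m b_2 + |2|_m b_1 + b_0 \right|_m $$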
Lemma 3.5: 
Combining three pairs of bits $(b_l, b_k)$, $(b_j, b_i)$ and $(b_h, b_g)$ requires one 3-bit multiplexer, 
three 2-input mod adders and one 3-input mod adder. The delay time equals
$$ \tau_{Total} = \tau_{mux_3} + \tau_{modadder_3} = \tau_{mux_3} + 2\tau_{modadder_2} $$
Proof: 
Equation (3.10) and Fig. 3.5 show that $((b_l + b_k), (b_j + b_i), (b_h + b_g))[\,0,\ M_{hg},\ M_{ji},\ M_{ji} + M_{hg},\ M_{lk},\ M_{lk} + M_{hg},\ M_{lk} + M_{ji},\ M_{lk} + M_{ji} + M_{hg}\,]$ 
is equivalent to one 3-bit 
multiplexer, three 2-input mod adders and one 3-input mod adder, and the delay time is equal to 
$\tau_{mux_3} + 2\tau_{modadder_2}$.
Fig. 3.2d shows the acyclic graph node for $M_{lkhgji}$, where $M_{lk}$, $M_{hg}$ and $M_{ji}$ are inputs.   
Lemma 3.6: 
The parallel prefix operator, written $\circ$ here, has the following properties:   
1) Commutative
$$ M_{lkjihg} \circ M_{tsrqpo} = M_{tsrqpo} \circ M_{lkjihg} $$
2) Associative
$$ M_{lkjihg} \circ (M_{tsrqpo} \circ M_{zyxwvu}) = (M_{lkjihg} \circ M_{tsrqpo}) \circ M_{zyxwvu} $$
Proof: 
The proof is similar to that of Lemma 3.4. 
 
 
 
 
 
 
 
 
 
 
 
3.3. Illustrative Example 
          In this section, we illustrate how Theorem 3.1, Theorem 3.2, Lemma 3.1 and Lemma 
3.2 can be combined to design a binary to residue converter. Fig. 3.6 shows how $|X|_m$ for n = 8 is 
computed.  
In the first layer, the pre-processing operator groups each consecutive pair of bits, 
(b7, b6), (b5, b4), (b3, b2) and (b1, b0), creating nodes M76, M54, M32 and M10. In the second layer, the 
parallel prefix operator combines consecutive M nodes, (M76, M54) and (M32, M10), forming 
nodes M7..4 and M3..0. In the last layer, the parallel prefix operator combines the last two M nodes 
(M7..4, M3..0), forming node $M_{7..0} = |X_{7..0}|_m$. Fig. 3.7a shows the actual hardware 
implementation.  

Fig. 3.6.  Prefix structure of 8 bits binary to RNS 
The total delay time for this example is calculated by counting the delay introduced by the operator in 
each layer, using Lemma 3.3, as follows: 
Layer 1:  delay time is $\tau_{mux_2}$ (the pre-processing operator does not require an adder) 
Layer 2:  delay time is $\tau_{mux_2} + \tau_{modadder_2}$ 
Layer 3:  delay time is $\tau_{mux_2} + \tau_{modadder_2}$ 
The total delay is the sum of all layer delay times:
$$ \tau_{Total} = 3\tau_{mux_2} + 2\tau_{modadder_2} $$
To show that the hardware works, the signal propagation for the binary number 
$|X|_7 = |(01101110)_2|_7 = |(110)_{10}|_7 = 5$ is illustrated in Fig. 3.7b. Similarly, the reader can try any bit 
pattern in Fig. 3.7b to check the validity of the design. For example, for 
$|X|_7 = |(11111111)_2|_7 = |(255)_{10}|_7 = 3$, each multiplexer select line is 3 and the selected outputs 
are shown in parentheses. 
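
The layered structure described in this section can be checked in software. The sketch below is a behavioral model only, not the hardware of Fig. 3.7a: it uses 2-bit multiplexer nodes in every layer, assumes an even number of input bits, and carries an unpaired node forward unchanged. The function name `prefix_residue` is hypothetical.

```python
def prefix_residue(x, n_bits, m):
    """Residue of an n_bits-wide integer x modulo m via a tree of mux nodes.

    Each node is a pair (value, select), where select is the OR of all
    bits covered by the node.
    """
    if n_bits % 2:
        raise ValueError("this sketch expects an even number of bits")
    bits = [(x >> k) & 1 for k in range(n_bits)]

    # Layer 1: pre-processing nodes with constant inputs (no adder).
    nodes = []
    for k in range(0, n_bits, 2):
        lo, hi = bits[k], bits[k + 1]
        w_lo, w_hi = pow(2, k, m), pow(2, k + 1, m)
        table = [0, w_lo, w_hi, (w_hi + w_lo) % m]
        nodes.append((table[(hi << 1) | lo], hi | lo))

    # Later layers: combine neighbouring nodes (one mux and one mod adder).
    while len(nodes) > 1:
        merged = []
        for k in range(0, len(nodes) - 1, 2):
            (v_lo, s_lo), (v_hi, s_hi) = nodes[k], nodes[k + 1]
            table = [0, v_lo, v_hi, (v_hi + v_lo) % m]
            merged.append((table[(s_hi << 1) | s_lo], s_hi | s_lo))
        if len(nodes) % 2:              # odd node count: pass the last one on
            merged.append(nodes[-1])
        nodes = merged
    return nodes[0][0]


print(prefix_residue(0b01101110, 8, 7))   # bit pattern 01101110 (decimal 110)
print(prefix_residue(0b11111111, 8, 7))   # bit pattern 11111111 (decimal 255)
```

For eight bits the loop runs twice after the first layer, matching the three layers counted in the delay calculation above.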
 
 
 
 
 
 
Fig. 3.7a.  Eight bits binary to residue number system conversion 
$$ |X_{7..0}|_m = \left| |2^7|_m b_7 + |2^6|_m b_6 + |2^5|_m b_5 + |2^4|_m b_4 + |2^3|_m b_3 + |2^2|_m b_2 + |2|_m b_1 + b_0 \right|_m $$


Fig. 3.7b.  Example of signal propagation for $|(01101110)_2|_7$ and $|(11111111)_2|_7$
 
 
 
 
 
 
3.4. Implementation Selection 
       There are several possible binary to RNS implementations using a combination of 2-bit and 
3-bit multiplexers. Fig. 3.8 shows three different implementations (design 1, design 2 and design 3) 
of a 10-bit binary to residue conversion system.  
To simplify the comparison, the following reasonable assumptions are made:
$$ \tau_{mux_3} = \tau_{mux_2}; \qquad \tau_{modadder_3} = 2\tau_{modadder_2} $$
Design 1 uses nine 2-bit multiplexers and four 2-input mod adders, with 
Layer 1:  delay time is $\tau_{mux_2}$ 
Layer 2:  delay time is $\tau_{mux_2} + \tau_{modadder_2}$ 
Layer 3:  delay time is $\tau_{mux_2} + \tau_{modadder_2}$ 
Layer 4:  delay time is $\tau_{mux_2} + \tau_{modadder_2}$ 
The total delay is the sum of all layer delay times:
$$ \tau_{Total} = 4\tau_{mux_2} + 3\tau_{modadder_2} $$
 
 
 
 
 
 
Fig. 3.8.  Prefix structure of 10 bits binary to RNS   
$$ |X_{9..0}|_m = \left| |2^9|_m b_9 + |2^8|_m b_8 + |2^7|_m b_7 + |2^6|_m b_6 + |2^5|_m b_5 + |2^4|_m b_4 + |2^3|_m b_3 + |2^2|_m b_2 + |2|_m b_1 + b_0 \right|_m $$
Design 2 uses four 3-bit multiplexers, one 2-bit multiplexer, four 2-input mod adders and one 
3-input mod adder, with 
Layer 1:  delay time is $\tau_{mux_3}$ 
Layer 2:  delay time is $\tau_{mux_3} + 2\tau_{modadder_2}$ 
Layer 3:  delay time is $\tau_{mux_2} + \tau_{modadder_2}$ 
$$ \tau_{Total} = \tau_{mux_2} + 2\tau_{mux_3} + 3\tau_{modadder_2} = 3\tau_{mux_2} + 3\tau_{modadder_2} $$
Design 3 uses three 3-bit multiplexers, three 2-bit multiplexers and three 2-input mod adders, with 
Layer 1:  delay time is $\tau_{mux_3}$ 
Layer 2:  delay time is $\tau_{mux_2} + \tau_{modadder_2}$ 
Layer 3:  delay time is $\tau_{mux_2} + \tau_{modadder_2}$ 
$$ \tau_{Total} = 2\tau_{mux_2} + \tau_{mux_3} + 2\tau_{modadder_2} = 3\tau_{mux_2} + 2\tau_{modadder_2} $$
Table 3.1 shows that design 3 uses less hardware and is faster than the other designs. 
TABLE 3.1 
COMPARISON AMONG THE THREE DESIGN IMPLEMENTATIONS 
                                                       Hardware count
Design #   Time delay                                  Mux2   Mux3   Mod add2   Mod add3
1          $4\tau_{mux_2} + 3\tau_{modadder_2}$        9      0      4          0
2          $3\tau_{mux_2} + 3\tau_{modadder_2}$        1      4      4          1
3          $3\tau_{mux_2} + 2\tau_{modadder_2}$        3      3      3          0
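
The delay totals in Table 3.1 follow mechanically from the per-layer delays and the two stated assumptions. The sketch below recomputes them as pairs (number of 2-bit multiplexer delays, number of 2-input mod adder delays); the dictionary layout and the names `UNIT`, `DESIGNS` and `total_delay` are illustrative choices, not part of the original design flow.

```python
# Assumptions from the text: tau_mux3 = tau_mux2 and
# tau_modadder3 = 2 * tau_modadder2.
# Each element maps to (mux2 delays, modadder2 delays).
UNIT = {"mux2": (1, 0), "mux3": (1, 0), "add2": (0, 1), "add3": (0, 2)}

# Critical-path elements of each layer, as listed for the three designs.
# Layer 2 of design 2 is written as mux3 + 2 * modadder2 in the text,
# which is encoded here as one 3-input mod adder.
DESIGNS = {
    1: [["mux2"], ["mux2", "add2"], ["mux2", "add2"], ["mux2", "add2"]],
    2: [["mux3"], ["mux3", "add3"], ["mux2", "add2"]],
    3: [["mux3"], ["mux2", "add2"], ["mux2", "add2"]],
}


def total_delay(layers):
    """Sum the layer delays, returned as (n_mux2, n_modadder2)."""
    n_mux = sum(UNIT[e][0] for layer in layers for e in layer)
    n_add = sum(UNIT[e][1] for layer in layers for e in layer)
    return n_mux, n_add


for design, layers in DESIGNS.items():
    n_mux, n_add = total_delay(layers)
    print(f"design {design}: {n_mux} tau_mux2 + {n_add} tau_modadder2")
```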
 
 
 
 
 
 
 
3.5. Comparison to Previous Work   
        The new method has hardware advantages over competing converters. In 
1984, Alia and Martinelli [3] published a binary to RNS conversion design based on powers of 2 mod 
$m_i$. The design uses processing elements (PEs), and each PE is associated with two registers. 
These registers are serially loaded with $|2^i|_m$ and $|2^{i+1}|_m$, respectively. The two outputs are added in a 
modular adder. Thus, at the first level, n/2 PEs are required. The number of stages in this method is 
$[\log_2 n]$. After successive transformation and addition, the residue result is available. Capocelli and 
Giancarlo [4] suggested the use of t PEs, where $t = n / \log_2 n$, each PE computing the residue 
corresponding to a k-bit binary word, where $k = \log_2 n$. The residue $2^{kt} \bmod m_i$ is serially fed to the $\hat{k}$-th PE 
($\hat{k} = 0, 1, 2, \dots, t-1$). Based on these initial residues, the residues corresponding to the next (k-1) 
powers are computed by first doubling and then weighting according to the input bits in each PE. 
The partial residues of the k-bit words computed over the t parallel PEs are then added to yield the final 
residue. Mohan [5] proposed a similar method, with the difference that X is divided into t 
sections based on the cyclic property of $2^j \bmod m_i$. Using the fact that $2^j$, $2^{j+l_o}$ and $2^{j+2l_o}$ have the 
same residues because of the periodicity with period $l_o$, the $l_o$-bit sections are first added. The width of the result is 
confined to $l_o$ bits by adding the carry bit resulting from the previous addition to the LSB of the result. The 
residue results are then determined using the methods given in [3]. 
 Complexity calculation is very important for any design development. The 
complexity of a design can be reduced by adjusting the design flow before 
implementation. Table 3.2 shows a hardware comparison among our design, Parhami [7] and Alia [4]. 
Various criteria can be used to measure hardware complexity: number of gates, 
number of I/O, delay time, fan-in / fan-out, area / size, power dissipation and rank of the design matrix. 
The McCabe metric and Halstead’s software science are two common software complexity 
measures. The McCabe metric determines code complexity based on the number of control paths created by 
the code, as follows: 
v = e - n + 2p 
where e is the number of edges in a program flow graph, n is the number of nodes and p is the number 
of connected components. Halstead introduced software science in order to measure properties of 
programs. Halstead’s program volume is defined to be  
v = (N1+N2) log2(µ1 +µ2) 
where µ1 is the number of distinct operators, µ2 is the number of distinct operands, N1 is the total number of 
operators and N2 is the total number of operands. 
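
Both measures are simple to evaluate once the counts are known. The following sketch implements the two formulas as written above; the function names and the sample counts are hypothetical and serve only to show the arithmetic.

```python
from math import log2


def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe metric v = e - n + 2p for a program flow graph."""
    return edges - nodes + 2 * components


def halstead_volume(total_operators, total_operands,
                    distinct_operators, distinct_operands):
    """Halstead program volume v = (N1 + N2) * log2(mu1 + mu2)."""
    length = total_operators + total_operands
    vocabulary = distinct_operators + distinct_operands
    return length * log2(vocabulary)


# Hypothetical counts, for illustration only.
print(cyclomatic_complexity(edges=9, nodes=8, components=1))
print(halstead_volume(5, 3, 2, 2))
```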
TABLE 3.2 
COMPARISON AMONG PARHAMI [7], ALIA [4] AND THE NEW DESIGN 
 
3.6. Conclusion 
     In this chapter, we presented a novel binary to residue conversion method that eliminates 
the need for the processing elements (PEs) used by the competing converter designs discussed above. The new 
design does not use table lookup as in Behrooz Parhami [6]. The method 
is based on multiplexers, which makes it practical and suitable for VLSI implementation.  
 
 
 
CHAPTER 4 
 
SIMPLIFIED RNS SCALING ALGORITHM 
 
4.1 Introduction 
 
Recent advances in computer architecture and VLSI technology have brought about a 
resurgence of interest in RNS-based digital systems. RNS has a significant advantage in 
modular arithmetic operations such as addition, subtraction and multiplication, since 
these can be performed without waiting for the carry propagation 
required by weighted number systems. Non-modular operations such as sign detection, 
division and conversion present a challenge to researchers, and much research has been done to 
address these issues [9]. In this chapter we present new algorithms for residue number system 
scaling that use a simplified base extension process [9].  
4.2 Division remainder zero 
Division remainder zero is a special, simple case of division where the divisor is 
relatively prime to the moduli and the dividend is a multiple of the divisor. The operation is 
accomplished by multiplying the dividend by the multiplicative inverse of the divisor:
$$ \left| \frac{b}{a} \right|_{m_i} = \left| b \, |a^{-1}|_{m_i} \right|_{m_i} \quad \text{for all } m_i \text{ in the moduli set } (m_1, m_2, \dots, m_n) $$
if and only if a divides b without remainder and a and $m_i$ are relatively prime for all i, where $|a^{-1}|_{m_i}$ is the multiplicative inverse of a over $m_i$.
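
A minimal sketch of the division remainder zero operation follows. It relies on Python's built-in `pow(a, -1, m)` (available from Python 3.8) for the multiplicative inverse; the function name `exact_divide_rns` and the sample values are illustrative. The caller is assumed to guarantee that a divides b exactly.

```python
from math import gcd


def exact_divide_rns(b_residues, a, moduli):
    """Residues of b / a, assuming a divides b and gcd(a, m_i) = 1 for all i."""
    result = []
    for x, m in zip(b_residues, moduli):
        if gcd(a, m) != 1:
            raise ValueError(f"divisor {a} is not coprime to modulus {m}")
        inverse = pow(a, -1, m)          # |a^-1|_m
        result.append((x * inverse) % m)
    return tuple(result)


moduli = (3, 5, 7)
b = 84                                   # 84 / 4 = 21 without remainder
b_residues = tuple(b % m for m in moduli)
print(exact_divide_rns(b_residues, 4, moduli))
print(tuple(21 % m for m in moduli))     # residues of the true quotient
```

Each channel performs one multiplication by a constant that depends only on the divisor and the modulus, so the n channels remain independent.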
 
 
4.2 Scaling 
Scaling is a restricted division operation where the divisor is one of the moduli or a product 
of moduli [9]. Several algorithms have been presented for scaling. The common idea of these 
algorithms is to break the scaling process into two operations: a division remainder zero 
operation and a base extension operation [9]. Assume that a number X is the dividend and 
the number Y is the divisor over the moduli set $(m_1, m_2, \dots, m_n)$. The result of dividing X by Y can 
be expressed as follows:
$$ X = \left\lfloor \frac{X}{Y} \right\rfloor Y + |X|_Y $$
$$ X = QY + R \tag{4.1} $$
where $Q = \left\lfloor \frac{X}{Y} \right\rfloor$ is the integer quotient of X over Y and $R = |X|_Y$ is the least positive 
integer remainder. The objective of scaling is to find the quotient Q for restricted values of Y.
$$ Q = \left\lfloor \frac{X}{Y} \right\rfloor = \frac{X - |X|_Y}{Y} $$
$$ \left| \left\lfloor \frac{X}{Y} \right\rfloor \right|_{m_i} = \left| \frac{X - |X|_Y}{Y} \right|_{m_i} \quad \text{for all } i \text{ where } (m_i, Y) = 1 $$
$$ q_i = \left| \left\lfloor \frac{X}{Y} \right\rfloor \right|_{m_i} = \left| (X - |X|_Y) \times |Y^{-1}|_{m_i} \right|_{m_i}, \quad \text{where } Q = (q_1, q_2, \dots, q_n) \tag{4.2} $$
Equation (4.2) is a division remainder zero operation and can be used to get the residue digits $q_i$ 
for all i where $(m_i, Y) = 1$. For the remaining digits, where $(m_i, Y) \ne 1$, a base extension algorithm is 
needed to find all residues of the quotient Q. 
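
Equation (4.2) can be sketched for the common case where the divisor Y is one of the moduli, so that $|X|_Y$ is simply the corresponding residue digit. The function name `scale_by_modulus` is hypothetical, and the digit for the modulus Y itself is deliberately left as `None`, since the text obtains it by base extension.

```python
def scale_by_modulus(x_residues, moduli, index):
    """Residue digits of Q = floor(X / Y) for Y = moduli[index].

    Digits for moduli coprime to Y follow equation (4.2); the digit for
    Y itself is returned as None because it requires base extension.
    """
    y = moduli[index]
    remainder = x_residues[index]            # |X|_Y
    digits = []
    for k, (x, m) in enumerate(zip(x_residues, moduli)):
        if k == index:
            digits.append(None)
        else:
            digits.append(((x - remainder) * pow(y, -1, m)) % m)
    return tuple(digits)


moduli = (3, 5, 7)
X = 50
x_residues = tuple(X % m for m in moduli)    # (2, 0, 1)
print(scale_by_modulus(x_residues, moduli, 0))
print(tuple((X // 3) % m for m in moduli))   # residues of floor(50 / 3) = 16
```

Comparing the two printed tuples shows agreement in every position except the one that belongs to the divisor.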
 
4.3 General Division 
General division is the operation where the divisor does not fit the restrictions 
of the division remainder zero or scaling operations [9]. General division algorithms can be 
divided into two categories, multiplicative and subtractive. Most multiplicative algorithms 
first compute the reciprocal of the divisor and then multiply the reciprocal by the dividend. 
Subtractive algorithms subtract multiples of the divisor until the difference is less 
than the divisor [9]. The algorithm presented in [9] is attractive because of its simplicity. It 
converts the general division operation into an iterative scaling operation and uses a lookup table 
to identify the candidates for the scaling operations. The disadvantage of this algorithm, along 
with many similar division algorithms, is that the scaling operation is slow because of 
complicated calculations based either on Galois fields or on basic MRC. 
 
 
 
 
4.4 New Base Extension Algorithm 
The RNS base extension problem is the problem of finding the residue digits of a number over 
a moduli set that is an extension of the original set.  
Let $(m_1, m_2, \dots, m_n, m_{n+1}, m_{n+2}, \dots, m_{n+k})$ be relatively prime moduli. The base extension problem 
is finding the residues $|X|_{m_{n+1}}, |X|_{m_{n+2}}, \dots, |X|_{m_{n+k}}$ given the residues $|X|_{m_1}, |X|_{m_2}, \dots, |X|_{m_n}$ and $0 \le X < M = \prod_{i=1}^{n} m_i$.

The algorithm described in [9] is based on MRC conversion: it assigns a 
variable to the residue $|X|_{m_{n+1}}$ and then performs the MRC conversion on the 
new moduli set, which ends with a linear equation of the MRC coefficient as a function of 
$|X|_{m_{n+1}}$. This is a lengthy operation. An alternative method based on 
CRT was also presented [9]. The advantage of CRT is that it is faster than MRC; however, it requires large modular 
adders because of the need to perform mod M operations.  
Lemma 4.1:  The general solution of the linear Diophantine equation
$$ \alpha u + \beta v = \gamma \tag{4.3} $$
where α, β and γ are given integers and integer solutions u, v are desired, is
$$ u = u^* + \frac{\beta k}{\gcd(\alpha, \beta)} \quad \text{and} \quad v = v^* - \frac{\alpha k}{\gcd(\alpha, \beta)} $$
where k is any integer and $u^*$, $v^*$ are any particular solution. A solution exists if 
and only if γ is a multiple of gcd(α, β). Equation (4.3) is equivalent to the following RNS 
equation:
$$ m_i q_i + x_i = m_j q_j + x_j, \quad i \ne j \tag{4.4} $$
$$ \gcd(\alpha, \beta) = \gcd(m_i, m_j) = 1 $$
Comparing equations (4.3) and (4.4), equation (4.3) can be written in matrix notation as
$$ m_i a_{ij} - m_j a_{ji} = a_{1j} - a_{1i} \tag{4.5} $$
where $a_{ij} = q_i$ and $a_{ji} = q_j$ are the unknowns 
and $a_{1j} = x_j$ and $a_{1i} = x_i$ are given integers.  
The general solution of equation (4.5) is
$$ a_{ij} = c \times (a_{1j} - a_{1i}) + m_j \times k \tag{4.6} $$
where k and c are integers. Reducing (4.5) modulo $m_j$ shows that c is the multiplicative inverse of $m_i$ modulo $m_j$; it can be obtained using the Extended Euclidean Algorithm or 
Fermat’s theorem. Each coefficient $a_{ij}$ is one element of the matrix A, which is needed to 
solve (4.6):
$$ A = \begin{bmatrix}
x_1 & x_2 & x_3 & \dots & x_n \\
 & a_{22} & a_{23} & \dots & a_{2n} \\
 & & a_{33} & \dots & a_{3n} \\
 & & & \ddots & \vdots \\
 & & & & a_{nn}
\end{bmatrix} $$
where
$$ a_{ij} = \begin{cases}
x_j & i = 1,\ 1 \le j \le n \\
c_{ij} \times (a_{(i-1)j} - a_{(i-1)(i-1)}) + m_j \times k & 2 \le i, j \le n,\ j \ge i \\
0 & \text{otherwise (no need to find them)}
\end{cases} \tag{4.7} $$
The smallest positive integer solution for each $a_{ij}$ can be obtained by iterating equation 
(4.7), and the diagonal coefficients $a_{ii}$ are equivalent to the mixed radix 
digits in MRC conversion:
$$ X = a_{11} + a_{22} \times m_1 + a_{33} \times m_1 \times m_2 + \dots + a_{nn} \prod_{i=1}^{n-1} m_i \tag{4.8} $$
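
The row-by-row construction of the matrix A can be sketched as follows. This is an illustrative model, not the author's implementation: it takes the coefficient of row i, column j to be the multiplicative inverse of $m_{i-1}$ modulo $m_j$ (consistent with the worked example later in this chapter), obtains it with Python's `pow(m, -1, mj)`, and uses the `%` operator to pick the smallest non-negative solution instead of iterating over k.

```python
def mixed_radix_digits(residues, moduli):
    """Diagonal a_11, a_22, ..., a_nn of the matrix A of equation (4.7)."""
    n = len(moduli)
    row = list(residues)                 # row 1: a_1j = x_j
    digits = [row[0]]                    # a_11
    for i in range(1, n):                # build the next row from the current one
        diag = row[i - 1]                # a_(i-1)(i-1) in the 1-based notation
        m_prev = moduli[i - 1]
        new_row = row[:]
        for j in range(i, n):
            c = pow(m_prev, -1, moduli[j])            # coefficient c
            new_row[j] = (c * (row[j] - diag)) % moduli[j]
        row = new_row
        digits.append(row[i])
    return digits


def from_mixed_radix(digits, moduli):
    """Evaluate equation (4.8)."""
    value, weight = 0, 1
    for d, m in zip(digits, moduli):
        value += d * weight
        weight *= m
    return value


moduli = (5, 7, 11)
residues = tuple(300 % m for m in moduli)     # (0, 6, 3)
digits = mixed_radix_digits(residues, moduli)
print(digits, from_mixed_radix(digits, moduli))
```

All entries of one row depend only on the previous row, so the inner loop over j could run in parallel, which is the source of the parallelism claimed for the method.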
 
 
 
 
 
 
 
Example:  
Given the residue representation X = (0, 1, 1) with modulo set (m1, m2, m3) = (2, 3, 5), extend the 
base to m4=7, i.e. find |X|7 
 
Solution: 
Convert X to its decimal value using an RNS to decimal conversion algorithm 
X = (0, 1, 1) = 16 
|X|7=|16|7 = 2 
The new representation of X over the new modulo set (2, 3, 5, 7) is (0, 1, 1, 2) 
Example:  
Given the residue representation X = (0, 1, 1) for the base with moduli (m1, m2, m3) = (2, 3, 5), 
extend the base to m4=7 and m5=11, i.e. find |X|7 and |X|11 
Solution: 
Convert X to its decimal value using an RNS to decimal conversion algorithm 
X = (0, 1, 1) = 16 
|X|7=|16|7 = 2 
|X|11=|16|11 = 5 
The new representation of X over the new moduli set (2, 3, 5, 7, 11) is (0, 1, 1, 2, 5) 
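
The two examples can be reproduced with a short sketch. For the "RNS to decimal conversion algorithm" step it uses a plain CRT reconstruction, which is one possible choice and not the Diophantine method proposed in this chapter; the function names are hypothetical.

```python
from math import prod


def crt_reconstruct(residues, moduli):
    """Integer in [0, M) with the given residues (Chinese Remainder Theorem)."""
    M = prod(moduli)
    total = 0
    for x, m in zip(residues, moduli):
        M_i = M // m
        total += x * M_i * pow(M_i, -1, m)   # pow(..., -1, m): inverse mod m
    return total % M


def base_extend(residues, moduli, new_moduli):
    """Append the residues of X over the additional moduli."""
    x = crt_reconstruct(residues, moduli)
    return tuple(residues) + tuple(x % m for m in new_moduli)


print(crt_reconstruct((0, 1, 1), (2, 3, 5)))
print(base_extend((0, 1, 1), (2, 3, 5), (7,)))
print(base_extend((0, 1, 1), (2, 3, 5), (7, 11)))
```

The reconstruction step works modulo M, which is the large modular addition the text identifies as the drawback of CRT-based base extension.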
4.5 New Scaling Algorithm 
The following algorithms can be used to find the coefficient “c” in equation (4.7). 
Fermat’s implementation for finding coefficient “c” is as follows: 

        %----------------------------------------------------%
        % Finding coefficient c using Fermat's method.
        % mj is the modulus to be inverted and mk is the prime
        % modulus of the current column, so v = mj^(mk-2) mod mk
        % is the multiplicative inverse of mj modulo mk.
        % power() is exact only while mj^(mk-2) stays below 2^53.
        %----------------------------------------------------%
        v = mod(power(mj, mk-2), mk);
        v1 = mk - v;
        if v <= v1
            c = v;                % positive representative
        else
            c = -v1;              % negative representative, smaller in magnitude
        end

Fig.  4.1. Fermat’s implementation for finding coefficient “c” 
Extended Euclid Algorithm Implementation 

        %----------------------------------------------------------------%
        % Finding coefficient “c” using the Extended Euclid Algorithm
        %----------------------------------------------------------------%
        u1 = 1;  u2 = mj;  v1 = 0;  v2 = mk;
        while v2 ~= 0
            q = floor(u2/v2);                 % integer quotient
            t1 = u1 - v1*q;  t2 = u2 - v2*q;
            u1 = v1;         u2 = v2;
            v1 = t1;         v2 = t2;
        end
        c = u1;                               % c*mj is congruent to gcd(mj, mk) modulo mk

Fig.  4.2. Extended Euclid algorithm implementation 

        %----------------------------------------------------------------%
        % Finding the smallest positive integer “aij” solutions for
        % Diophantine equations
        %----------------------------------------------------------------%
        a(j,k) = c*(a(j-1,k) - a(j-1,j-1));
        n = 1;
        if a(j,k) < 0                     % case 1, e.g. -6 + 11n
            while (a(j,k) + n*mk) < 0
                n = n + 1;
            end
            a(j,k) = a(j,k) + n*mk;
        elseif a(j,k) >= mk               % case 2, e.g. 30 - 11n
            while (a(j,k) - n*mk) >= 0
                n = n + 1;
            end
            a(j,k) = a(j,k) - (n-1)*mk;
        end                               % case 3: 0 <= a(j,k) < mk, already reduced

Fig. 4.3.  Implementation of finding coefficient (aij) 
         
 
 
 
 
Example: 
Find the RNS representation of the number 300 using the moduli set (5, 7, 11), and find the mixed radix 
digits using the Diophantine method. 
Solution: 
$(300)_{10} \xrightarrow{RNS} (0, 6, 3)_{5,7,11}$ 
From equation (4.5) 
a11 = x1 = 0 
a12 = x2 = 6 
a13 = x3 = 3 
a22 = c22 x [a12 - a11] + m2 x k 
and from the Extended Euclidean Algorithm or Fermat’s theorem, c22 = 3  
a22 = 3 x [6 – 0] + 7k = 18 + 7k 
at k = -2, a22 = 4 is the smallest positive integer solution 
a23 = c23 x [a13 - a11] + m3 x k 
c23 = -2 
a23 = -2 x [3 – 0] + 11k = -6 + 11k 
at k = 1, a23 = -6 + 11 = 5 is the smallest positive integer solution 
a33 = c33 x [a23 - a22] + m3 x k 
c33 = -3 
a33 = -3 x [5 – 4] + 11k = -3 + 11k 
at k = 1, a33 = 8 is the smallest positive integer solution 
$$ A = \begin{bmatrix} 0 & 6 & 3 \\ & 4 & 5 \\ & & 8 \end{bmatrix} $$
From equation (4.8) 
X = a11 + a22 x m1 + a33 x m1 x m2  
X = 0 + 4 x 5 + 8 x 5 x 7 = 300 
The Diophantine equation can also be used to solve RNS scaling by a power of two. The new method is 
based on the division remainder zero theorem [9] and can be applied to any set of prime moduli (m1, 
m2, m3, ... , mn). 
Theorem 4.1 (Division Remainder Zero)
$$ \left| \frac{X}{s} \right|_M = \left| X s^{-1} \right|_M = \left( |x_1 s^{-1}|_{m_1},\ |x_2 s^{-1}|_{m_2},\ \dots,\ |x_n s^{-1}|_{m_n} \right) $$
only if s divides X without a remainder and gcd(s, mi) = 1. 
Theorem 4.2: 
For any set of prime moduli (m1, m2, m3, ... , mn):  
Case A: X is an even number (“scaling without a remainder”)
$$ \left| \frac{X}{2} \right|_M = (y_1, y_2, \dots, y_n, R) $$
where
$$ y_i = \left| x_i \, |2^{m_i - 2}|_{m_i} \right|_{m_i} $$
for all odd moduli, and R is the scaling result for m = 2, found in the following manner: 
 • Set R = 0  
 • Find the mixed radix digits (a11, a22, …, ann, D) for (y1, y2, …, yn, 0) using the Diophantine 
RNS to binary conversion method  
 • If D = 0 then $\left| \frac{X}{2} \right|_M \Leftrightarrow (y_1, y_2, \dots, y_n, 0)$ 
 • If D = 1 then $\left| \frac{X}{2} \right|_M \Leftrightarrow (y_1, y_2, \dots, y_n, 1)$ 
Case B: X is an odd number (“rounding to nearest integer”) 
Replace X with (X+1):
$$ \left| \frac{X+1}{2} \right|_M \Leftrightarrow (y_1, y_2, \dots, y_n, R) $$
where
$$ y_i = \left| (x_i + 1) \, |2^{m_i - 2}|_{m_i} \right|_{m_i} $$
for all odd moduli, and 
R is the scaling result for m = 2, which is always equal to one. 
Example: 
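For the moduli set (2, 3, 5), determine the residue representation of 8/2, 6/2 and 5/2.  
Solution: 
Let m1 = 3, m2 = 5, m3 = 2  
a) $(8)_{10} \xrightarrow{RNS} (2, 3, 0)_{3,5,2}$
$$ y_1 = \left| 2 \times |2^{3-2}|_3 \right|_3 = 1 $$
$$ y_2 = \left| 3 \times |2^{5-2}|_5 \right|_5 = 4 $$
Set R = 0 for m3 = 2 
(y1, y2, R) = (1, 4, 0)    
Mixed radix digit “a33” = 0   →  R = 0
$$ \left( \frac{8}{2} \right)_{10} \xrightarrow{RNS} (1, 4, 0)_{3,5,2} \xrightarrow{RNS^{-1}} 4 $$
b) $(6)_{10} \xrightarrow{RNS} (0, 1, 0)_{3,5,2}$
$$ y_1 = \left| 0 \times |2^{3-2}|_3 \right|_3 = 0 $$
$$ y_2 = \left| 1 \times |2^{5-2}|_5 \right|_5 = 3 $$
Set R = 0 for m3 = 2 
(y1, y2, R) = (0, 3, 0)    
Mixed radix digit “a33” = 1   →  R = 1
$$ \left( \frac{6}{2} \right)_{10} \xrightarrow{RNS} (0, 3, 1)_{3,5,2} \xrightarrow{RNS^{-1}} 3 $$
c) $(5)_{10} \xrightarrow{RNS} (2, 0, 1)_{3,5,2}$
$$ y_1 = \left| (2 + 1) \times |2^{3-2}|_3 \right|_3 = |(2 + 1) \times 2|_3 = 0 $$
$$ y_2 = \left| (0 + 1) \times |2^{5-2}|_5 \right|_5 = |(0 + 1) \times 3|_5 = 3 $$
and R = 1 for m3 = 2, as shown in Case B 
(y1, y2, R) = (0, 3, 1)
$$ \left( \frac{5}{2} \right)_{10} \xrightarrow{RNS} (0, 3, 1)_{3,5,2} \xrightarrow{RNS^{-1}} 3 $$
(“rounding to nearest integer”)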
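
The three cases of the example can be reproduced with the sketch below. It is a model under stated assumptions rather than the author's implementation: the odd moduli are prime, the residue for modulus 2 is listed last, the most significant mixed radix digit D is computed with a standard in-place mixed radix conversion, and, for simplicity, R is taken from D for both even and odd X. The names `last_mixed_radix_digit` and `halve` are hypothetical.

```python
def last_mixed_radix_digit(residues, moduli):
    """Most significant mixed radix digit (the digit D of the text)."""
    row = list(residues)
    for i in range(1, len(moduli)):
        for j in range(i, len(moduli)):
            inv = pow(moduli[i - 1], -1, moduli[j])
            row[j] = ((row[j] - row[i - 1]) * inv) % moduli[j]
    return row[-1]


def halve(x_residues, odd_moduli):
    """Residues of X/2 (X even) or (X+1)/2 (X odd).

    x_residues lists the residues for odd_moduli followed by X mod 2.
    """
    *xs, parity = x_residues
    if parity == 1:                                   # Case B: use X + 1
        xs = [(x + 1) % m for x, m in zip(xs, odd_moduli)]
    # y_i = |x_i * 2^(m_i - 2)|_{m_i}; 2^(m-2) is the inverse of 2 for prime m
    ys = [(x * pow(2, m - 2, m)) % m for x, m in zip(xs, odd_moduli)]
    d = last_mixed_radix_digit(ys + [0], list(odd_moduli) + [2])
    return tuple(ys) + (d,)                           # R = D


odd_moduli = (3, 5)
for X in (8, 6, 5):
    residues = (X % 3, X % 5, X % 2)
    print(X, residues, halve(residues, odd_moduli))
```

The sketch does not handle the overflow that occurs when X + 1 reaches the full range M.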
4.6 Conclusion 
The Diophantine RNS parallel algorithm provides an alternative method of finding mixed radix 
digits with a high degree of parallelism. The algorithm has advantages over the MRC and 
CRT methods, since it avoids the use of modulo computations and multiplicative inverses. 
 
 
 
 
CHAPTER 5 
 
VLSI IMPLEMENTATION OF RESIDUE ADDER AND SUBTRACTOR 
 
5.1 Introduction 
In the last decade, RNS arithmetic has become an attractive design option [1-3] for real-time 
application fields such as signal processing, image processing and computer graphics. The merit 
of RNS arithmetic lies in its capability of performing addition, subtraction and 
multiplication without carry propagation. RNS arithmetic can also be 
designed and fabricated using VLSI techniques. These characteristics 
make RNS well suited to digital signal processor hardware.  
In RNS, large numbers are represented by an n-tuple of smaller numbers that are 
independent of each other, where n is the number of moduli in the moduli set. Hence, 
operations can be done on these smaller numbers rather than on the original number. 
Furthermore, because the numbers in the n-tuple are independent of each other, the operations can 
be done in parallel. The RNS is defined by a set of moduli (m1, m2, … , mn) that are pairwise 
relatively prime positive integers. It can be shown that there is a unique representation for each 
number in the range $0 \le X < M$, where $M = \prod_{i=1}^{n} m_i$ is called the dynamic range [9]. Each 
integer X can be represented by an n-tuple of residues X = (x1, x2, …  , xn), 
where $x_i = X \bmod m_i$, also written $x_i = |X|_{m_i}$. The RNS represents any number within 
the range [0, M) for unsigned numbers or $\left[ -\frac{M}{2},\ \frac{M}{2} - 1 \right]$ for signed numbers. In RNS, the 
binary operations {+, −, ×} are defined as follows: if $Z = A \circ B$, where $\circ$ is one of these operations, then 
$(z_1, z_2, \dots, z_n) = (a_1, a_2, \dots, a_n) \circ (b_1, b_2, \dots, b_n)$, where $z_i = (a_i \circ b_i) \bmod m_i$. Each residue digit can be computed 
independently of the others, allowing fast data processing in n parallel independent channels. 
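
The channel-wise definition can be illustrated with a few lines of code. The helper names and the moduli (3, 5, 7) are chosen here only for illustration; the operands are kept small enough that the results stay inside the dynamic range.

```python
import operator


def to_rns(x, moduli):
    """Residue tuple of x."""
    return tuple(x % m for m in moduli)


def rns_apply(op, a_residues, b_residues, moduli):
    """z_i = (a_i op b_i) mod m_i, computed independently per channel."""
    return tuple(op(a, b) % m
                 for a, b, m in zip(a_residues, b_residues, moduli))


moduli = (3, 5, 7)                       # dynamic range M = 105
A, B = to_rns(10, moduli), to_rns(9, moduli)

print(rns_apply(operator.add, A, B, moduli), to_rns(10 + 9, moduli))
print(rns_apply(operator.sub, A, B, moduli), to_rns(10 - 9, moduli))
print(rns_apply(operator.mul, A, B, moduli), to_rns(10 * 9, moduli))
```

Each printed pair shows the channel-wise result next to the residues of the ordinary integer result.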
VLSI implementations of adders and subtractors have been realized by many researchers. 
Bayoumi [65] used three approaches (look-up table, binary adder and hybrid 
implementation). Banerji [67] and Piestrak [68] have also contributed to the 
implementation of binary adders using VLSI techniques. In this chapter, we implement 
an RNS adder and subtractor. We used Cadence (Virtuoso and Encounter) for the layout 
and Xilinx for the simulation results of the design. 
5.2 Residue Adder and Subtractor 
If we have two integers A and B modulo m, then their addition and subtraction are 
expressed as $|A \pm B|_m$. The addition and subtraction operations are 
described by equations (5.1) and (5.2), respectively.
$$ |A + B|_m = \begin{cases} A + B & \text{if } A + B < m \\ A + B - m & \text{if } A + B \ge m \end{cases} \tag{5.1} $$
$$ |A - B|_m = \begin{cases} A - B & \text{if } A - B \ge 0 \\ A - B + m & \text{if } A - B < 0 \end{cases} \tag{5.2} $$
Fig. 5.1 shows the implementation of the RNS adder/subtractor based on the above equations. 
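
Equations (5.1) and (5.2) translate directly into a compare-and-correct step. The sketch below is a behavioral model with hypothetical function names, valid for operands already reduced to the range 0 to m-1; it does not model the gate-level structure of Fig. 5.1.

```python
def mod_add(a, b, m):
    """|A + B|_m for 0 <= a, b < m, following equation (5.1)."""
    s = a + b
    return s if s < m else s - m


def mod_sub(a, b, m):
    """|A - B|_m for 0 <= a, b < m, following equation (5.2)."""
    d = a - b
    return d if d >= 0 else d + m


def mod_add_sub(a, b, m, c0):
    """Select line c0: 0 selects addition, 1 selects subtraction."""
    return mod_sub(a, b, m) if c0 else mod_add(a, b, m)


m = 7
print(mod_add_sub(5, 4, m, 0))   # addition channel
print(mod_add_sub(2, 6, m, 1))   # subtraction channel
```

Because the operands are already reduced, a single conditional correction by m is sufficient, which is why the circuit needs only two adder stages and a selection multiplexer.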
 
 
 
 
Fig. 5.1. RNS adder and subtractor 
The select line C0 decides whether the operation is addition or subtraction. When C0 is 0, the RNS 
operation is addition. When C0 is 1, the RNS operation is subtraction. For an n-bit RNS adder the 
select line C0 is 0 and the circuit simply performs as an adder. The inputs to the adder are A, B and m. Fig. 5.1 
shows all the components used in the RNS adder and subtractor: two full adders, 
four 2x1 multiplexers and one OR gate. As already mentioned, C0 is zero for the adder; 
therefore inputs A, B and C0 are fed to the first full adder, which yields the carry-out Can 
and the sum Sa. In the next step, m, the complement of C0 (which is 1) and the sum Sa of the previous full adder go to the next full adder 
in the hierarchy, which gives the sum Sb and the carry-out Cbn. Then Can and Cbn are added and sent to 
the third 2x1 multiplexer along with the signal Can. The output of the third 2x1 multiplexer is used 
as the control signal of the final 2x1 multiplexer, which generates the final output Sum based on 
equation (5.1). For the subtractor the scenario is the same and equation (5.2) is implemented, but the 
control signal C0 in this case is 1.  
 
 
 
 
 
5.3 VLSI Implementation 
In this section, we implement the RNS adder and subtractor using XILINX and the Cadence tools 
(Virtuoso and Encounter). XILINX is used for simulation, and Encounter and Virtuoso are used for 
the layout. 
a) XILINX is a platform in which the schematic can be generated from the Verilog code of the 
design. At the same time, the behavioral simulation can be used to test whether the resulting 
schematic or design is correct. Once we had created the Verilog code for the adder and subtractor, we 
used the Xilinx tools to synthesize the schematic. Finally, to verify the design, we 
built a testbench and ran the simulation to check for the desired output. Fig. 5.2 shows the 
result for the RNS adder and subtractor.  
 
Fig. 5.2. Waveform of RNS adder and subtractor 
 
 
 
 
Fig. 5.3. RNS adder and subtractor layout 
b) Encounter is a tool with which we can generate a layout of our design. Ambit Buildgates is 
used to generate the netlist, and the netlist is then loaded into Encounter. Further, by assigning 
floor plans and appropriate layers and by nano routing, we obtain the design in .gds and .def formats. 
We can also check the design for flaws in terms of connectivity, density, etc. Fig. 
5.4 shows the Encounter part of our design. 
 
Fig. 5.4. Encounter part of RNS adder and subtractor 
 
 
 
 
TABLE 5.1 
TOTAL RNS ADDER AND SUBTRACTOR DELAY TIME (PS) 
 
 
TABLE 5.2  
RNS ADDER AND SUBTRACTOR DESIGN PARAMETER 
 
 
 
c) Virtuoso is a tool through which we can prepare our layout for chip fabrication. Virtuoso can be 
used in two ways: one is to build the design from basic building blocks at the 
transistor level (pmos, nmos, etc.), and the other is to import the design from Encounter 
with the help of certain technology libraries. In our work, we used the latter method: 
we imported the design from Encounter and padframed it in order to send it to 
MOSIS for chip fabrication. The complete layout along with the connections is shown in fig. 5.5 
below. 
 
 
 
 
Fig. 5.5. Virtuoso part of RNS adder and subtractor 
5.4 Conclusion 
We have discussed the VLSI approach for the realization of an RNS adder and subtractor. 
XILINX was used for the behavioral simulation of the design, with which we 
verified that our design performs as desired. Cadence Encounter was used to build the 
layout of the design. The design layout, along with the technology libraries and layers, was exported to 
Virtuoso. Furthermore, padframing was performed in order to send the design to MOSIS for 
fabrication.  
 
 
 
 
 
 
 
CHAPTER 6 
NOVEL QUANTUM BOOLEAN CIRCUITS CONSTRUCTION BY USING  
XOR-AND REDUCTION METHOD 
6.1 Introduction 
 One of the new fields of nanotechnology is quantum-dot cellular automata (QCA), which provides an 
alternative to CMOS architectures. Recent research studies [76-82] show that the 
advantages of QCA technology are smaller circuit size, faster switching speed, and lower 
power consumption. 
  During the past decade, quantum-dot cellular automata have demonstrated the ability to 
implement both combinational and sequential logic devices. Unlike conventional Boolean 
AND-OR-NOT based circuits, the fundamental logic device in QCA Boolean networks is the majority 
gate, which implements the Boolean function:  
M(A,B,C) = AB+AC+BC        (6.1) 
By combining these QCA majority gates with NOT gates, any combinational or sequential logic 
device can be constructed from QCA cells. QCA Boolean logic synthesis is more 
involved than conventional Boolean logic synthesis. The traditional Boolean logic reduction methods such as 
Karnaugh maps produce simplified Boolean expressions. However, converting these forms to 
QCA Boolean logic is not a simple process due to the complexity of multilevel majority gates. R. Zhang 
[19] proposed thirteen standard functions to represent all three-variable Boolean functions, which can 
be used to produce simplified majority logic. Chin-Yung [13] used the tabulation method to 
simplify Boolean logic functions and to produce simplified QCA logic. In this chapter, we 
present a novel methodology for multilevel majority logic synthesis. Our methodology takes as 
its input a Boolean circuit, generates a simplified XOR-AND equivalent circuit and outputs an 
equivalent majority gate circuit. 
6.2 QCA Background Material  
      QCA Cell: A quantum cell can be viewed as a set of four charge containers, or dots, 
positioned at the corners of a square as shown in fig. 6.1a. Each QCA cell contains two mobile 
electrons that can move to any quantum dot through electron tunneling. Because of Coulomb repulsion, the two 
electrons settle in diagonally opposite dots. Thus, there are two 
equivalent logic polarizations, P = +1 (logic 1) and P = -1 (logic 0). 
   QCA Majority Gate: The basic QCA logic element is the majority gate shown in fig. 6.1b. It 
produces an output of one if the majority of its inputs are one. The classical AND and OR gates can be 
realized with a majority gate by fixing one of the three inputs to 0 or 1 respectively, as follows: 
M(A,B,0) = AB             (6.2a) 
M(A,B,1) = A+B            (6.2b) 
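Equations (6.1), (6.2a) and (6.2b) can be checked exhaustively with a few lines of Python. The function name `M` mirrors the notation of the text; modeling logic values as the integers 0 and 1 is an assumption of this sketch.

```python
from itertools import product

def M(a, b, c):
    """Three-input majority vote, equation (6.1)."""
    return (a & b) | (a & c) | (b & c)

for a, b, c in product((0, 1), repeat=3):
    # The output is 1 exactly when at least two inputs are 1.
    assert M(a, b, c) == (1 if a + b + c >= 2 else 0)

for a, b in product((0, 1), repeat=2):
    assert M(a, b, 0) == (a & b)   # (6.2a): third input tied to 0 gives AND
    assert M(a, b, 1) == (a | b)   # (6.2b): third input tied to 1 gives OR
```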
  QCA Inverter: The QCA cell layout of an inverter is shown in fig. 6.1c. The polarization of the 
output QCA cell “out” is opposite to that of the input QCA cell “in”. 
  QCA Wire: There are two types of QCA wires: normal (also called 90°) and diagonal (also 
called 45°). Fig. 6.1d shows the two QCA wire types polarized to logic one. 
 
 
 
 
Fig. 6.1a. QCA cells and binary encoding  
 
 
Fig. 6.1b. QCA majority gate 
 
 
 
 
 
 
Fig. 6.1c. QCA inverter gate 
 
 
Fig. 6.1d. QCA wire types 
   QCA Clocking: QCA cells use a four-phase clocking scheme, namely clock 1, clock 2, clock 3 and clock 
4, as shown in fig. 6.2a. Every clock is 90° out of phase from its previous clock, and each clock 
has four states, namely switch, hold, release, and relax [40]. In the switch state, QCA cells start to become 
polarized. In the hold state, cells retain their polarization. During the release and relax 
states, QCA cells are unpolarized, as shown in fig. 6.2b.  
 
 
 
 
Fig 6.2a. QCA clock phases; each clock lagging its prior by 90° 
 
 
Fig. 6.2b. Four QCA interdot barrier states 
 
 
 
 
 
 
6.3 Novel QCA Extraction  
XOR algebra can be used very effectively to yield gate-minimum results that are not possible with 
conventional mapping methods. Our novel QCA extraction procedure takes as its input a 
Boolean network, generates a simplified XOR-AND equivalent network and outputs an equivalent 
majority gate network, as shown in fig 6.3. 
 
Fig 6.3. XOR-AND function extraction methodology 
 
 
 
We will use the following two Boolean examples to illustrate the QCA extraction 
methodology. 
Example: Generate the equivalent QCA circuit for $f_1(a,b,c,d) = \sum(2,3,5,7,8,12,13,14)$ 
Solution: 
Step1:  Draw up the minimization chart and list all minterms in the first column [10]. Refer to 
table 6.1 for details of the construction. 
TABLE 6.1 
THE CHART FOR DERIVING XOR EQUIVALENT FUNCTION “F1” 
Step2: List all possible products of variables in the first row. Start with the constant 1 and the single variables, then all possible pairs of 
variables, then all triples of variables, and so on up to the column for the product of all the variables. 
Step3: For each minterm, fill 1’s in every column that contains all the unprimed variables of that minterm, as 
shown in table 6.1. 
Step4: To get the final XOR-AND function, cross out all columns that have an even number of 
1s in them. The XOR-AND function for this example is  
     $f_1 = a \oplus c \oplus ad \oplus bc \oplus bd \oplus acd$  
Step5: Utilizing the XOR property $\bar{x} = x \oplus 1$, the above function can be simplified to  
   $f_1 = a \oplus \bar{b}c \oplus bd \oplus a\bar{c}d$  
Step6: Construct a majority gate tree as shown in fig. 6.4 and then replace each node with its 
equivalent majority-gate XOR and AND gates as shown in fig. 6.5.  
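The chart construction of Steps 1 to 4 can be sketched in Python as below. This is an illustrative reading of the procedure, not the author's implementation: the function name `xor_and_form` is hypothetical, variables are assumed to be ordered with `a` as the most significant bit, and a column is assumed to receive a 1 from a minterm when it contains all of that minterm's unprimed variables.

```python
from itertools import combinations

def xor_and_form(minterms, names):
    """Return the product terms of the columns that hold an odd number of 1s."""
    n = len(names)
    kept = []
    for size in range(n + 1):
        for cols in combinations(range(n), size):
            # Column mask: one bit per variable appearing in the product term.
            col = sum(1 << (n - 1 - i) for i in cols)
            # A minterm marks this column when none of its 1 bits
            # (unprimed variables) lies outside the column's variables.
            ones = sum(1 for m in minterms if m & ~col == 0)
            if ones % 2 == 1:  # columns with an even count are crossed out
                kept.append("".join(names[i] for i in cols) or "1")
    return kept

f1_terms = xor_and_form([2, 3, 5, 7, 8, 12, 13, 14], "abcd")
print(" ^ ".join(f1_terms))  # a ^ c ^ ad ^ bc ^ bd ^ acd
```

The same function applied to the minterm list (0, 2, 3, 4, 7) over `"abc"` yields the terms 1, c, ab, bc and abc.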
 
 
Fig. 6.4. Majority tree for function f1 
 
Lemma 6.1: If x and y are two binary inputs, then $xy = M(x,y,0)$ 
Proof: 
By equation (6.1)        
$M(x,y,0) = xy + 0 + 0 = xy$ 
Lemma 6.2: If x and y are two binary inputs, then $x \oplus y = M(M(\bar{x},y,0), M(x,\bar{y},0), 1)$ 
Proof: 
$M(M(\bar{x},y,0), M(x,\bar{y},0), 1) = M((\bar{x}y+0+0), (x\bar{y}+0+0), 1)$ 
$= (\bar{x}y)(x\bar{y}) + \bar{x}y + x\bar{y}$ 
$= \bar{x}y + x\bar{y} = x \oplus y$ 
Lemma 6.3: If x and y are two binary inputs, then $\overline{M(x,y,0)} = M(\bar{x},\bar{y},1)$ 
Proof: 
$M(x,y,0) = xy$ 
$\overline{M(x,y,0)} = \overline{xy} = \bar{x} + \bar{y}$ 
By equation (6.2b) 
$\bar{x} + \bar{y} = M(\bar{x},\bar{y},1)$ 
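The three lemmas can be confirmed by enumerating all input combinations. The sketch below assumes the complements fall where Boolean algebra requires them (one complemented input in each AND term of Lemma 6.2, both inputs complemented in Lemma 6.3); the helper names `inv` and `xor_maj` are hypothetical.

```python
from itertools import product

def M(a, b, c):
    """Three-input majority gate."""
    return (a & b) | (a & c) | (b & c)

def inv(x):
    """Inverter on a 0/1 value."""
    return 1 - x

def xor_maj(x, y):
    """XOR built from three majority gates and two inverters (Lemma 6.2)."""
    return M(M(inv(x), y, 0), M(x, inv(y), 0), 1)

for x, y in product((0, 1), repeat=2):
    assert M(x, y, 0) == (x & y)                      # Lemma 6.1
    assert xor_maj(x, y) == (x ^ y)                   # Lemma 6.2
    assert inv(M(x, y, 0)) == M(inv(x), inv(y), 1)    # Lemma 6.3
```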
 
Fig. 6.5. Majority gates schematic for function f1 
 
 
 
 
 
 
 
 
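Step7: The majority tree can also be used to construct the QCA expression for any node. The QCA 
expressions for n1 and n2 are as follows. 
By Lemma 6.1  
$a = M(a,1,0)$ 
$\bar{b}c = M(\bar{b},c,0)$ 
$bd = M(b,d,0)$ 
$a\bar{c}d = M(M(a,\bar{c},0),d,0)$ 
By Lemma 6.2 and Lemma 6.3  
$n_1 = a \oplus \bar{b}c$ 
$n_1 = M(M(\bar{a},\bar{b}c,0),\, M(a,\overline{\bar{b}c},0),\, 1)$ 
$n_1 = M(M(\overline{M(a,1,0)},M(\bar{b},c,0),0),\, M(M(a,1,0),\overline{M(\bar{b},c,0)},0),\, 1)$ 
$n_1 = M(M(M(\bar{a},0,1),M(\bar{b},c,0),0),\, M(M(a,1,0),M(b,\bar{c},1),0),\, 1)$ 
And  
$n_2 = bd \oplus a\bar{c}d$ 
$n_2 = M(M(\overline{bd},a\bar{c}d,0),\, M(bd,\overline{a\bar{c}d},0),\, 1)$ 
$n_2 = M(M(\overline{M(b,d,0)},M(M(a,\bar{c},0),d,0),0),\, M(M(b,d,0),\overline{M(M(a,\bar{c},0),d,0)},0),\, 1)$ 
$n_2 = M(M(M(\bar{b},\bar{d},1),M(M(a,\bar{c},0),d,0),0),\, M(M(b,d,0),M(\overline{M(a,\bar{c},0)},\bar{d},1),0),\, 1)$ 
$n_2 = M(M(M(\bar{b},\bar{d},1),M(M(a,\bar{c},0),d,0),0),\, M(M(b,d,0),M(M(\bar{a},c,1),\bar{d},1),0),\, 1)$ 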
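As a sanity check, the majority-gate network can be evaluated on all sixteen input combinations and compared with the original minterm list of f1. The sketch assumes the node functions n1 = a ⊕ b'c and n2 = bd ⊕ ac'd (a prime denotes a complement) and that the root of the tree XORs n1 and n2; the helper names are hypothetical.

```python
def M(a, b, c):
    """Three-input majority gate."""
    return (a & b) | (a & c) | (b & c)

def inv(x):
    return 1 - x

def XOR(x, y):
    """XOR from majority gates, following Lemma 6.2."""
    return M(M(inv(x), y, 0), M(x, inv(y), 0), 1)

def f1(a, b, c, d):
    n1 = XOR(M(a, 1, 0), M(inv(b), c, 0))             # a xor b'c
    n2 = XOR(M(b, d, 0), M(M(a, inv(c), 0), d, 0))    # bd xor ac'd
    return XOR(n1, n2)

# Enumerate inputs with a as the most significant bit.
on_set = [i for i in range(16)
          if f1((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)]
assert on_set == [2, 3, 5, 7, 8, 12, 13, 14]
```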
 
Example: Generate the equivalent QCA logic expression for $f_2(a,b,c) = \sum(0,2,3,4,7)$. 
Table 6.2 shows the minimization chart for function f2. For each minterm in the minimization 
chart, 1s are filled in every column that contains all of its unprimed variables [10].  
TABLE 6.2  
MINIMIZATION CHART FOR FUNCTION F2 
 
The XOR-AND function for this example is  
$f_2 = 1 \oplus c \oplus ab \oplus bc \oplus abc$ 
Utilizing the XOR property $\bar{x} = x \oplus 1$, the above function can be simplified to  
$f_2 = \bar{c} \oplus ab \oplus \bar{a}bc$ 
Fig. 6.6 shows the majority gate tree for function f2, which helps to construct the majority gate 
layout by replacing each node with its equivalent majority-gate XOR and AND gates. 
 
 
 
 
Fig. 6.6. Majority gate tree for function f2 
QCA logic expression for function f2 is  
)1),0,,(),0,,((2 abcMabcMMf =  
)1),0,)0,,(,(
),0),0,,(,((2
baMcM
baMcMMf =
 
)1),0),1,,(,(
),0),0,,(,((2
baMcM
baMcMMf =
 
6.4 Conclusion 
We presented a systematic QCA logic construction method. Our novel method takes a Boolean 
function as its input, generates a simplified XOR-AND equivalent circuit and outputs an 
equivalent QCA logic circuit. With this method, we were able to simplify the Boolean 
functions and reduce the number of majority gates with the help of XOR-AND reduction techniques, 
and then map the Boolean functions to QCA logic.   
 
CHAPTER 7 
IMPLEMENTATION OF GENERALIZED PIPELINE CELLULAR ARRAY 
USING QUANTUM-DOT CELLULAR AUTOMATA 
7.1 Introduction 
During the last decade, Quantum-Dot Cellular Automata (QCA) has attracted a lot of attention due to 
its extremely small size and its ultralow power consumption as compared to CMOS technology. It has 
been demonstrated that QCA has the ability to implement both combinational and sequential logic devices 
[76]–[83].  
The fundamental unit of a QCA circuit is the quantum cell, which typically contains four quantum dots 
placed near the corners of the cell, where free electrons can reside. Quantum cells have two distinct stable 
polarizations, as shown in fig. 7.1. These states allow the cell to represent binary data. 
 
 
 
 
 
Fig. 7.1. Quantum cells with polarity -1 & polarity +1  
QCA Binary wires are the simplest QCA structures and consist of a series of quantum cells in close 
proximity to each other. The cells interact through Coulombic interactions with each other as shown in fig. 
7.2. Binary wires can also be constructed by orienting the dots in each cell at a 45 degree angle from the 
standard cell. This allows binary wires to cross in the same plane or layer without interacting with each 
other. 
 
There are two logic gates that make up the fundamental set of logic in QCA: majority gate and inverter. 
By carefully arranging the location of QCA cells, one can create a majority logic gate, which is capable of 
functioning as either an AND or an OR gate. 
 
Fig. 7.2. QCA binary wire arrangements 
The QCA majority gate takes three inputs and outputs the value that occurs most frequently among them:  
$M(A,B,C) = AB + AC + BC$              (7.1) 
The majority gate can also be used to create AND and OR gates. If one input is held at 1, the majority gate 
functions as a standard 2-input OR gate. If one input is held at 0, the majority gate functions as a 2-input 
AND gate. Fig. 7.3 shows standard QCA majority gate construction. 
 
 
Fig. 7.3. QCA majority voter gate 
 
 
QCA inverter gate has a single input and output. It simply returns the opposite of the value that was put in 
as shown in fig. 7.4. 
 
 
Fig. 7.4. QCA inverter gate 
QCA has a four-phase clocking mechanism. The sequence of states in this scheme is the switch, 
hold, release and relax states [76]. In the switch state, QCA cells start to become polarized. In the hold state, 
the cells retain their polarization. During the release and relax states, QCA cells are unpolarized, as shown in 
fig. 7.5. 
 
Fig. 7.5.  QCA four-phase clocking mechanism 
 
 
 
TABLE 7.1 
BOOLEAN FUNCTIONS AND THEIR EQUIVALENT QCA EXP. 
Boolean Function        Majority Diagram / Expression
$AB$                    $M(A,B,0)$
$A+B$                   $M(A,B,1)$
$\overline{AB}$         $M(\bar{A},\bar{B},1)$
$\overline{A+B}$        $M(\bar{A},\bar{B},0)$
$A \oplus B$            $M(M(A,\bar{B},0),M(\bar{A},B,0),1)$
$A \odot B$             $M(M(A,B,0),M(\bar{A},\bar{B},0),1)$
 
 
 
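Lemma 7.1: If x and y are two binary inputs, then $xy = M(x,y,0)$ 
Proof: 
By equation (7.1)        
$M(x,y,0) = xy + 0 + 0 = xy$ 
Lemma 7.2: If x and y are two binary inputs, then $x \oplus y = M(M(\bar{x},y,0), M(x,\bar{y},0), 1)$ 
Proof: 
$M(M(\bar{x},y,0), M(x,\bar{y},0), 1) = M((\bar{x}y+0+0), (x\bar{y}+0+0), 1)$ 
$= (\bar{x}y)(x\bar{y}) + \bar{x}y + x\bar{y}$ 
$= \bar{x}y + x\bar{y} = x \oplus y$ 
Lemma 7.3: If x and y are two binary inputs, then $\overline{M(x,y,0)} = M(\bar{x},\bar{y},1)$ 
Proof: 
$M(x,y,0) = xy$ 
$\overline{M(x,y,0)} = \overline{xy} = \bar{x} + \bar{y}$ 
By equation (6.2b) 
$\bar{x} + \bar{y} = M(\bar{x},\bar{y},1)$ 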
The use of generalized cellular pipeline arrays for various arithmetic operations has shown 
considerable promise in optical computer architecture because of the obvious advantages of improved 
speed and reduced cost and size. A cellular pipeline array consists of regular interconnections of 
selected logic sub-circuits called cells or processing elements (PEs). The basic approach is to keep the 
number of I/O terminals of the cellular array module to a minimum and to supply control parameters as inputs to 
the arithmetic cells. 
 
Different pipeline array designs have appeared in the literature [88]-[92]. Singh [91] presented a 
generalized cellular array that can perform all of the basic arithmetic operations such as multiplication, 
division, squaring, and square rooting, and that exploits the concept of pipelining. The basic idea of the pipeline array 
is that the arithmetic operations are grouped together in a single array with some additional control logic that 
selects the required arithmetic operation. Grouping these arrays (processing element units) 
provides a single array network that can perform fast arithmetic processing. In this chapter, 
we implement the pipeline array using QCA design methods and compare the design with [92].  
 
7.2 QCA Pipeline Array  
The generalized QCA pipeline array can perform all the basic arithmetic operations such as 
multiplication, division, squaring, and square rooting. The electronic implementation of a generalized 
pipeline array is adopted from an existing architecture [91]. Fig. 7.6 shows a block diagram for cellular 
pipeline array. The array consists of processing elements (PEs) with each PE communicating with its 
neighbours in the array either directly or through latches. The arithmetic cells marked as A are controlled 
1-bit adders. The cells marked as C are control cells that specify the type of arithmetic operations to be 
performed by the arithmetic cells. The cells marked as M are used for multiplication. The cells marked as S 
are used for squaring and square rooting. 
Fig. 7.7a shows a block diagram of an arithmetic cell, where lines A, B, and C are operand inputs, and 
lines X and F are control signals. The control unit specifies the type of operation to be performed in each 
PE. Fig. 7.7b shows a block diagram of the control unit, where P is an input, Fi is the output, and X and C0 are 
inputs that also pass through the PE to adjacent cells. 
 
 
Fig. 7.6.  Block diagram for pipeline array [91] 
 
 
Fig. 7.7a. Arithmetic cell 
 
 
Fig. 7.7b. Control logic cell 
 
The arithmetic cell is capable of performing the following Boolean operations: 
$S = [A \oplus (B \oplus X) \oplus C_1] F_i + A \bar{F_i}$ 
$C_0 = (B \oplus X)(A + C_1) + A C_1$ 
$D = C(B + F_i)$ 
$E = (B + C)(B + F_i)$ 
And the Boolean expression for the control cell is: 
$F_i = C_0 X + P_i \bar{X}$ 
Using Lemmas 7.1, 7.2 and 7.3, the above Boolean equations can be written in the following QCA format:  
$S = M(M(n_3, F_i, 0), M(A, \bar{F_i}, 0), 1)$ 
$C_0 = M(M(M(A, C_1, 1), n_1, 0), M(A, C_1, 0), 1)$ 
$D = M(C, M(B, F_i, 1), 0)$ 
$E = M(M(B, C, 1), M(B, F_i, 1), 0)$ 
$F_i = M(M(C_0, X, 0), M(P_i, \bar{X}, 0), 1)$ 
where 
 $n_3 = M(M(\bar{n_1}, n_2, 0), M(n_1, \bar{n_2}, 0), 1)$ 
 $n_2 = M(M(\bar{A}, C_1, 0), M(A, \bar{C_1}, 0), 1)$ 
 $n_1 = M(M(\bar{B}, X, 0), M(B, \bar{X}, 0), 1)$ 
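The sum, carry and control equations can be modeled with majority gates and checked against their Boolean forms, as in the sketch below. It rests on stated assumptions: S is taken to pass A through unchanged when Fi = 0, the carry is taken to use n1 = B ⊕ X, and the control cell is taken to select C0 when X = 1 and P otherwise. The outputs D and E are omitted, and the function names are hypothetical.

```python
from itertools import product

def M(a, b, c):
    """Three-input majority gate."""
    return (a & b) | (a & c) | (b & c)

def inv(x):
    return 1 - x

def XOR(x, y):
    """XOR from majority gates (Lemma 7.2)."""
    return M(M(inv(x), y, 0), M(x, inv(y), 0), 1)

def arithmetic_cell(A, B, C1, X, F):
    """Return (S, C0) of the controlled 1-bit adder/subtractor."""
    n1 = XOR(B, X)          # B is complemented when X = 1
    n2 = XOR(A, C1)
    n3 = XOR(n1, n2)
    S = M(M(n3, F, 0), M(A, inv(F), 0), 1)
    C0 = M(M(M(A, C1, 1), n1, 0), M(A, C1, 0), 1)
    return S, C0

def control_cell(C0, P, X):
    """Return F: the carry when X = 1, the operand bit P when X = 0."""
    return M(M(C0, X, 0), M(P, inv(X), 0), 1)

for A, B, C1, X, F in product((0, 1), repeat=5):
    S, C0 = arithmetic_cell(A, B, C1, X, F)
    assert S == ((A ^ B ^ X ^ C1) if F else A)
    assert C0 == (((B ^ X) & (A | C1)) | (A & C1))
    if X == 0 and F == 1:
        assert S + 2 * C0 == A + B + C1   # plain full adder

for C0, P, X in product((0, 1), repeat=3):
    assert control_cell(C0, P, X) == (C0 if X else P)
```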
Table 7.2 explains the required control signals for each arithmetic operation. When X = 0, the arithmetic 
cell acts as an adder, and when X = 1 it acts as a subtractor. The sum and carry outputs are S and C0 respectively. The 
operands are applied at inputs A and B. The most significant bits of the inputs are A1 and B1. The most 
significant bit of the sum is S1. The array is capable of finding the square root of a ten-bit binary number 
A1-10 when the control inputs P1-5 are made zero and X is made 1. The B and C inputs to the first level are given 
as 00, 01, 10, 10, 10, 10, and 10, as shown in fig. 7.6. To find the square root of a number, the number is applied 
across A, and then 01 is subtracted from the two most significant bits of A. If the remainder is positive, 
then the value of F1 is 1; otherwise, it is 0. If F1 is 0, the original value is kept for the next subtraction. 
Table 7.3 shows the value of the subtrahend for each succeeding stage.  
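The subtract-and-restore procedure described above matches the textbook digit-by-digit (restoring) square root, which can be sketched as follows. This is a generic software sketch, not the array's cell-level behavior: the subtrahend rule "root bits so far followed by 01" is the textbook rule and is only assumed to correspond to Table 7.3, and a zero remainder is treated as non-negative (F = 1).

```python
def restoring_sqrt(a, bits=10):
    """Digit-by-digit square root of an unsigned integer of even bit width."""
    root = 0   # result bits F1, F2, ... collected so far
    rem = 0    # running partial remainder
    for shift in range(bits - 2, -1, -2):
        rem = (rem << 2) | ((a >> shift) & 0b11)   # bring down the next bit pair
        trial = (root << 2) | 0b01                 # subtrahend: root so far, then 01
        if rem >= trial:                           # remainder not negative: F = 1
            rem -= trial
            root = (root << 1) | 1
        else:                                      # negative: keep the old value, F = 0
            root = root << 1
    return root, rem

# Exhaustive check over all ten-bit inputs.
for a in range(1 << 10):
    r, rem = restoring_sqrt(a)
    assert r * r + rem == a and r * r <= a < (r + 1) * (r + 1)
```

A ten-bit operand yields a five-bit root, one result bit per stage, which is consistent with the five control outputs of the array.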
 
 
 
 
 
 
 
 
 
TABLE 7.2 
 QCA PIPELINE ARRAY ARITHMETIC SUMMARY OPERATIONS 
Function    Description of pipeline array operations [7] 
+           X=0, Fi=1; $S = A \oplus B \oplus C_1$ 
−           X=1, Fi=1; $S = A \oplus \bar{B} \oplus C_1$ 
×           X=0, A=0, B=C; right shift and add method is used; multiplicand in B; multiplier in P 
÷           X=1, B=C, P=0; right shift and subtract method is used; dividend in A; divisor in B 
( )²        X=0, A=0; B = 2’s comp of “10”, C = “10”; operand in P 
√           X=1, P=0; B = 2’s comp of “10”, C = “10”; operand in A 
 
 
 
TABLE 7.3 
 SUBTRAHEND AT DIFFERENT LEVELS FOR SQUARE ROOTING 
 
The array is also capable of taking the square of a 5-bit number. To find the square of a number, the number is 
applied across Pi with X = 0; the arithmetic cells then act as adders and the control cells transfer Pi to Fi, 
resulting in the square of the number. The array can also be used to multiply a three-bit number B1-3 by a five-bit 
number P1-5 when the control bit X and the A inputs are made zero. The array can divide a seven-bit number A1-7 
by a four-bit number B1-4, giving a four-bit quotient and a four-bit remainder. For this case, the control 
input X is made 1, and the P inputs are made zero. As in the multiplication operation, the C inputs are 
kept the same as the B inputs. The array requires n(n + 2) arithmetic cells and n control cells. The delay of 
an arithmetic operation depends on the delay in processing the last level, which uses 2n+1 arithmetic units, 
and it is given by [91] 
$\tau_{delay} = n\tau_a + \tau_c + \tau_l$ 
where $\tau_a$, $\tau_c$, and $\tau_l$ are the delays in the arithmetic cell, control cell, and latch circuit, respectively. Fig. 7.8 
shows the QCA arithmetic and control cell unit layouts. 
 
 
 
 
Fig. 7.8. (a) QCA arithmetic cell and (b) control cell 
Agrawal [92] proposed a high-speed multifunction array for multiplication, division and square-root 
operations. The QCA equations for Agrawal’s arithmetic cell are given by:   
$S_i = M(M(\bar{a_i}, n_3, 0), M(a_i, \bar{n_3}, 0), 1)$ 
$c_{i+1} = M(n_6, n_5, 1)$ 
$e_{i+1} = M(n_6, n_7, 1)$ 
$G_{i+1} = M(n_9, e_i, 0)$ 
$P_{i+1} = M(n_9, e_i, 1)$ 
$g_{i-1} = M(n_{10}, d_i, 0)$ 
$h_{i-1} = M(n_{11}, b_i, 1)$ 
where 
$n_1 = M(M(\bar{b_i}, x, 0), M(b_i, \bar{x}, 0), 1)$ 
$n_2 = M(r_j, n_1, 1)$ 
$n_3 = M(M(\bar{c_i}, n_2, 0), M(c_i, \bar{n_2}, 0), 1)$ 
$n_4 = M(a_i, c_i, 1)$ 
$n_5 = M(n_4, n_2, 0)$ 
$n_6 = M(a_i, c_i, 0)$ 
$n_7 = M(n_6, n_1, 0)$ 
$n_8 = M(M(\bar{c_i}, n_1, 0), M(c_i, \bar{n_1}, 0), 1)$ 
$n_9 = M(M(\bar{a_i}, n_8, 0), M(a_i, \bar{n_8}, 0), 1)$ 
$n_{10} = M(r_j, b_i, 1)$ 
$n_{11} = M(r_j, d_i, 0)$ 
Fig. 7.9 shows the QCA layout of the high speed arithmetic cell unit. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Fig. 7.9 QCA high speed arithmetic cell 
 
7.3 QCA Pipeline Implementation  
QCADesigner [86] is used to create the QCA pipeline array and to verify the functionality of the design. 
The tool provides two simulation engines: the bistable engine and the coherence vector engine. QCA cells are 
assumed to have a height of 18 nm and a width of 18 nm, while the quantum dots have a diameter of 5 nm. 
This follows the same assumptions as given in [87], and the coherence vector engine has been used for the 
simulations. Fig. 7.10 and fig. 7.11 show the QCADesigner layouts for the arithmetic cell and control cell 
respectively. The layout is labeled to indicate the majority gate inputs as well as the outputs. Fig.  7.12 and 
fig. 7.13 show the QCADesigner simulation results. 
 
 
 
 
 
Fig. 7.10. QCADesigner layout for arithmetic cell unit  
 
 
 
Fig. 7.11. QCADesigner layout for control cell unit. 
 
 
Fig. 7.12. Simulation for control cell unit 
 
 
Fig. 7.12. Simulation for arithmetic cell unit 
 
Fig. 7.13 and fig. 7.14 show the QCADesigner layout for the Agrawal’s arithmetic cell and simulation 
results respectively. 
 
 
 
Fig. 7.13. QCADesigner layout for high speed arithmetic cell unit 
 
 
 
Fig. 7.14. Simulation for high speed arithmetic cell unit 
 
Table 7.4 shows a performance comparison between the two QCA designs, showing that the arithmetic and 
control cells have a simpler structure than Agrawal’s arithmetic cell. 
 
TABLE 7.4 
QCA PERFORMANCE COMPARISON BETWEEN THE TWO DESIGNS 
 
 
A different modeling approach has been used to simulate the 10-bit QCA pipeline array. We created a 
behavioral Verilog model of the majority gate and used it as the building block for creating majority-based AND, OR, 
NOT and XOR QCA gates. Then we used the Cadence NCLaunch simulation tool to test our design. Fig. 
7.15 and fig. 7.16 show the squaring and square rooting outputs respectively. 
 
Fig. 7.15.  Waveform of pipeline squaring output result 
 
 
 
Fig. 7.16. Waveform of pipeline square rooting output result 
Fig. 7.17, fig 7.18 and fig. 7.19 show Multisim implementation for arithmetic cell, control cell and high 
speed arithmetic cell. 
 
Fig. 7.17. Multisim implementation of arithmetic cell 
 
 
Fig. 7.18. Multisim implementation of control cell 
 
Fig.7.19 Multisim implementation of high speed arithmetic cell 
 
Fig. 7.20, fig 7.21, fig. 7.22, fig. 7.23, fig 7.24 and fig. 7.25 show FPGA implementation for arithmetic 
cell, control cell and high speed arithmetic cell. 
 
Fig 7.20. Arithmetic cell FPGA packaging 
 
Fig 7.21. Control cell FPGA packaging 
 
Fig 7.22. High speed arithmetic cell FPGA packaging 
 
Fig 7.23. Arithmetic cell FPGA schematic layout 
 
 
Fig 7.24. High Speed arithmetic cell FPGA schematic layout 
 
Fig 7.25. Control cell FPGA schematic layout 
 
 
 
TABLE 7.6 
 
  QCA PERFORMANCE COMPARISON BETWEEN THE TWO DESIGNS 
 
 
TABLE 7.7 
 
DELAY TIME (PS) 
 
 
Table 7.6 and 7.7 show a performance comparison between the two designs. The results show that 
the control cell uses four gates, four I/O pins and seven nets in total. The 
arithmetic cell uses fifteen gates, ten I/O pins and twenty-one nets. Meanwhile, the 
high speed arithmetic cell uses twenty gates, fourteen I/O pins and twenty-seven nets. The 
allocated covered area and chip floor plan aspect ratio for the high speed arithmetic cell were the highest 
because it has more gates than the arithmetic cell. The tables also show that the maximum delay time 
of the arithmetic cell is smaller than that of the high speed arithmetic cell. These results show that the arithmetic cell is 
less complex in hardware and has a smaller delay time than the high speed arithmetic cell, since the high speed 
arithmetic cell requires two half adders in series and needs more processing time. 
We used Cadence Encounter to generate a CMOS equivalent layout of our QCA pipeline array design. 
Ambit Buildgates is used to generate the netlist. Encounter is used to assign floor plans and appropriate layers, 
perform nano routing, and obtain the design in .gds and .def formats. In Encounter, we also checked the design for flaws 
in terms of connectivity, density, etc. Fig. 7.26 shows the Encounter part of our design. 
 
Fig. 7.26. Encounter part of QCA pipeline array 
 
 
Fig.7.27. Virtuoso part of padding pipeline array 
For chip fabrication, we used Virtuoso: we imported the design from Encounter and 
padframed it in order to send it to MOSIS for chip fabrication. The complete layout along with the 
connections is shown in fig. 7.28 below. The complexity of digital circuits does not appear to be well defined, because 
many parameters can be used to measure it, such as the number of gates, the number of 
I/O pins, delay, and area/size. In the present work, we suggest a fuzzy complexity system. This concept needs to be 
validated, and more work remains to be done. Fig 7.29 shows the hardware complexity fuzzy concept.   
TABLE 7.7 
COMPLEXITY FUZZY RESULTS FOR PIPELINE ARRAY CELLS 
 
 
 
Fig.7.28. Virtuoso part of pipeline array 
 
Fig.7.29. Hardware complexity fuzzy concept  
7.4 Conclusion 
We demonstrated that arithmetic cells can be successfully implemented using QCA cells. These arrays 
can perform multiplication, division, squaring and square rooting. All the different modes of operation are 
controlled by a single control line. The QCADesigner tool set is used to simulate both designs. We also used a 
different VLSI approach to simulate the 10-bit QCA pipeline array: we created behavioral Verilog models for 
the design. The Cadence NCLaunch simulation tool is used to simulate the pipeline array and verify that our 
design performs as desired. Cadence Encounter is used to generate a CMOS equivalent layout of our 
QCA pipeline array design. Our design layout is exported to Virtuoso. Furthermore, padframing is 
performed in order to send the design to MOSIS for fabrication.  
 
CHAPTER 8 
CONCLUSION AND FUTURE WORK 
8.1 Introduction 
The Residue Number System (RNS) has received increased attention due to its ability to 
support high speed concurrent arithmetic applications such as the Fast Fourier Transform 
(FFT), image processing and digital filters. In spite of its effectiveness, conversion, 
sign detection, base extension, scaling and division are complex operations with current 
methods. In this dissertation, we have addressed conversion, scaling and the implementation 
of an RNS adder and subtractor.  
Quantum-Dot Cellular Automata (QCA) is attracting a lot of attention due to its 
extremely small feature size and ultralow power consumption compared to CMOS 
technology. This new type of nanotechnology uses different logic devices to design 
circuits. The traditional Boolean logic reduction methods such as Karnaugh maps 
produce a simplified Boolean expression. However, converting these forms to QCA 
Boolean logic is not a simple process due to the complexity of multilevel majority gates. In this 
dissertation, we presented a novel methodology for generating QCA Boolean circuits 
from multi-output Boolean circuits and an implementation of a QCA pipeline array. 
8.2 Summary of Work 
Detailed research has been conducted on the residue number system and quantum-dot 
cellular automata. Specific solutions have been developed for 
pressing issues. The following gives an executive summary of the contributions and 
results of this research. 
 
• A simplified algorithm for conversion from binary to the residue number 
system was introduced. The algorithm requires less hardware than 
existing algorithms. It utilizes parallel-prefix techniques with 
multiplexers and modulo adders as the main building blocks, without the use of 
lookup tables, which makes it practical and suitable for VLSI implementation. 
In contrast, existing methods such as Behrooa [7] use table lookup schemes for 
binary to residue conversion, and Alia [4] uses processing elements (PEs) with 
complex hardware.  
• A new scaling algorithm based on mixed radix conversion was presented. The 
algorithm utilizes a simplified base extension process that works on a smaller 
modulus. It provides an alternative method of finding the mixed radix digits with a 
high degree of parallelism. The algorithm has advantages over CRT methods 
since it avoids the use of modulo computations and of the multiplicative inverse 
operation. 
• An efficient VLSI approach for the implementation of an RNS adder and subtractor 
was introduced. XILINX was used for the behavioral simulation of the design, 
with which we verified that our design performs as 
desired. Cadence Encounter was used to build the layout of the design. The design 
layout along with the technology libraries was exported to Virtuoso. Furthermore, 
padframing was performed in order to send the design to MOSIS for fabrication.  
 
 
• A novel methodology for generating QCA Boolean circuits from multi-output 
Boolean circuits was developed. Our methodology takes as its input a Boolean 
circuit, generates a simplified XOR-AND equivalent circuit and outputs an 
equivalent majority gate circuit. 
• An efficient approach for the implementation of a generalized pipeline cellular array 
using quantum-dot cellular automata cells was presented. The QCA pipeline array 
can perform all basic operations such as multiplication, division, squaring and 
square rooting. The different modes of operation are controlled by a single control 
line. We created behavioral Verilog models for the design. The Cadence NCLaunch 
simulation tool is used to simulate the pipeline array and verify that our design 
performs as desired. Cadence Encounter is used to generate a CMOS equivalent 
layout of our QCA pipeline array design. Our design layout is exported to 
Virtuoso. Furthermore, padframing is performed in order to send the design to 
MOSIS for fabrication. 
8.3 Recommended Future Work 
The results of this research are promising and the following are recommended 
research topics that can be done as a continuation of this work.  
• Further research can be done on investigating new techniques for generating 
Carbon Nanotube (CNT) Boolean circuits. Carbon nanotubes have many 
different structures, differing in length, thickness, and number of layers. 
Although they are formed from essentially the same material sheet, their 
electrical characteristics differ depending on these variations. Because of these 
properties, they act either as metals or as semiconductors. Nanotube-based 
transistors, also known as Carbon Nanotube Field-Effect Transistors (CNTFETs), 
are capable of digital switching using a single electron. However, one major 
obstacle for this emerging technology has been the lack of methods for 
mapping existing multi-output Boolean logic circuits to carbon nanotube logic 
circuits. Creating new methods for generating carbon nanotube logic circuits 
is another area of research.  
• Modifying the QCADesigner program to include the QCA reduction algorithms that
were presented in this dissertation. QCADesigner is a design and simulation tool
for quantum-dot cellular automata that was developed by the University of British
Columbia. QCADesigner is open source and can be downloaded from the University
of British Columbia. The tool is still under development and is provided free of
cost to the research community.
• Further research can also be done on creating new methods for generating
Boolean circuits for Single Electron Transistor (SET) circuits. An SET operates
by injecting a single electron into, or ejecting it from, a dot of silicon,
thereby producing a change in electric potential. That change must overcome
thermal agitation, which makes a sufficiently small dot essential for SET
operation. The single electron transistor is thus a type of switching device that
uses controlled electron tunneling to amplify current. Usually, an SET is made
from two tunnel junctions that share a common electrode. A tunnel junction
consists of two pieces of metal separated by a very thin insulator. The only way
for electrons in one of the metal electrodes to reach the other is to tunnel
through the insulator. One major obstacle for this emerging technology has been
the lack of techniques for creating SET circuits.
• Implementing a five-input majority QCA gate. The basic elements in QCA are
majority and inverter gates. As a result, using a majority gate with more inputs
in a QCA circuit can reduce cell count, latency and complexity. Furthermore,
implementing a seven-input majority gate could also simplify and optimize QCA
designs. Creating new VLSI techniques that support five- and seven-input majority
gates, and implementing different existing circuits such as a full adder with
these QCA gates, can be another research area.
• Investigating new fabrication techniques for QCA logic devices is another
research area. Fabrication of QCA is still a challenging, ongoing research area.
The challenges include the development of new manufacturing methodologies for QCA
circuit fabrication and the implementation of defect-free, fault-tolerant
circuits. These new techniques require the development of new CAD methods and
tools to support simulation.
• Implementing a residue arithmetic logic unit that can perform all modulo
operations. RNS is an unweighted number representation system. It is based on
modular arithmetic operations and is carry-free. Creating a modular logic unit
that can be used in applications such as the Fast Fourier Transform (FFT), image
processing and digital filters can simplify RNS design and application
implementation.
• Investigating new residue number system techniques to simplify RNS magnitude
comparison, overflow detection, sign detection, parity detection and division. In
spite of its effectiveness, RNS has remained more of an academic challenge because
of the complexity involved in these operations. They remain open areas of
research.
• Validating and improving the fuzzy complexity concept is another research area
that can be investigated.
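The carry-free property of RNS noted in the items above, and the reason magnitude-dependent operations are harder, can be illustrated with a short Python sketch. The moduli set (7, 8, 9), i.e. (2^n - 1, 2^n, 2^n + 1) with n = 3, and all function names are assumptions chosen for illustration; this is not the hardware design of this dissertation. Addition, subtraction and multiplication act independently on each residue channel, while recovering the weighted value uses the Chinese Remainder Theorem.

```python
from math import prod

# Assumed example moduli set (2^n - 1, 2^n, 2^n + 1) with n = 3.
# The moduli are pairwise coprime; the dynamic range is 7 * 8 * 9 = 504.
MODULI = (7, 8, 9)


def to_rns(x, moduli=MODULI):
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in moduli)


def _channelwise(a, b, op, moduli):
    # Each channel is processed independently: no carry crosses channels.
    return tuple(op(x, y) % m for x, y, m in zip(a, b, moduli))


def rns_add(a, b, moduli=MODULI):
    return _channelwise(a, b, lambda x, y: x + y, moduli)


def rns_sub(a, b, moduli=MODULI):
    return _channelwise(a, b, lambda x, y: x - y, moduli)


def rns_mul(a, b, moduli=MODULI):
    return _channelwise(a, b, lambda x, y: x * y, moduli)


def from_rns(residues, moduli=MODULI):
    """Reverse conversion using the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        # pow(M_i, -1, m_i) is the modular inverse (Python 3.8+).
        total += r_i * M_i * pow(M_i, -1, m_i)
    return total % M
```

For example, `to_rns(100)` is `(2, 4, 1)` and `to_rns(200)` is `(4, 0, 2)`; their channel-wise sum `(6, 4, 3)` converts back to 300. Results are only meaningful while they stay within the dynamic range of 504. Note that the residue tuples themselves do not reveal which operand is larger, which is why comparison, sign detection and overflow detection usually require some form of reverse or mixed-radix conversion.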
 
APPENDIX 
Setting up your Working Environment: 
1. Login to your Unix machine. 
– Use your WSU access ID and password. 
 
2. Double click on the “??????'s Home” folder on your desktop. 
– (“??????” should be your AccessID). 
 
If you have already completed steps 3 to 10, go to step 11.
3. Click “View” and check “Show Hidden Files”. 
 
4. Scroll down to find the .cshrc file. 
– The file is currently Read Only. 
– Right click on the file and choose “Properties”. 
– Go to the “Permissions” tab and check “Owner ->Write”. 
– Click “Close”. 
– Now the file can be edited. 
 
5. Right click on the file and choose “Open with Text Editor”. 
– This will open the .cshrc file in the text editor. 
 
6. Add these two lines to the file: 
 source /opt/cds/class/cds_setup 
 source /opt/cds/class/setup_files/vhdl/.vhdl_setup 
– Save and close the editor. 
 
7. Open a new terminal (right-click on the desktop and choose “Open Terminal”) and
type the commands:
   – cd $HOME 
   – source .cshrc 
 
 
8. Create a new directory named cadence under your home directory.
– mkdir cadence

9. Create a vhdl directory under the cadence directory.
– mkdir cadence/vhdl

10. Execute the following commands:
– cd cadence/vhdl
– cp $NCVHDL/cds.lib $CDSVHDL
– cp $NCVHDL/hdl.var $CDSVHDL
 
11. Open the cadence/vhdl/cds.lib file and add the following line:
 
DEFINE NCSU_TechLib_ami06 /opt/cds/class/local/lib/oa/NCSU_TechLib_ami06 
 
The cadence/vhdl/cds.lib file will then look like this:
  
include $CDS_INST_DIR/tools/inca/files/cds.lib 
#DEFINE ieee /opt/cds/ldv/tools/inca/files/IEEE 
#DEFINE std /opt/cds/ldv/tools/inca/files/STD 
DEFINE vhdl ~/cadence/vhdl 
DEFINE NCSU_TechLib_ami06 /opt/cds/class/local/lib/oa/NCSU_TechLib_ami06 
 
12. Encounter setup: type these commands in your terminal:
      – cd $CDSVHDL 
– mkdir fe 
– cd fe 
– cp $DSMSE/ece753.conf ece753.conf 
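Step 11 above can also be scripted so that the DEFINE line is added only once, even if the setup is repeated. The following is a minimal Python sketch; the helper names `add_define` and `update_cds_lib` are hypothetical, and the path in the usage note assumes the directory layout created in steps 8 to 10.

```python
from pathlib import Path

# The line that step 11 adds to cds.lib (taken from the procedure above).
DEFINE_LINE = (
    "DEFINE NCSU_TechLib_ami06 "
    "/opt/cds/class/local/lib/oa/NCSU_TechLib_ami06"
)


def add_define(text: str, line: str = DEFINE_LINE) -> str:
    """Return the cds.lib text with `line` appended, unless already present."""
    if line in (existing.strip() for existing in text.splitlines()):
        return text
    # Make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    return text + separator + line + "\n"


def update_cds_lib(path: Path, line: str = DEFINE_LINE) -> bool:
    """Rewrite the file at `path` if needed; return True when it was changed."""
    original = path.read_text() if path.exists() else ""
    updated = add_define(original, line)
    if updated != original:
        path.write_text(updated)
        return True
    return False
```

Under these assumptions it would be called as `update_cds_lib(Path.home() / "cadence" / "vhdl" / "cds.lib")` after step 10 has copied cds.lib into place.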
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
PART 1: ENCOUNTER 
The original version of this procedure was contributed by Dr. Singh's VLSI Lab; this is an update
reflecting the current procedure:
1) Open Unix terminal and type encounter 
2) Click on File (Top Left) -> Import Design   The following screen should open. 
 
3) Click on Load…  
Navigate to cadence/vhdl/fe  and select  ece753.conf  file 
 
4)  
 
 
5) Click on … (top right of the screen above)  
6) Double Click the designated .verilog file which was created using BGX commands  
 
 
7) Press OK after selecting the desired .verilog file 
 
 
 
 
9) Click on Floorplan -> Specify Floorplan -> Ok (Just Press Ok, no need to change anything) 
10)   Click on Power -> Power Planning -> Add Ring 
 
11)  
  
 
 
12) Click Route -> Special Route -> Ok (The blue line should appear now) 
13)    Click Place -> Place Jtag -> OK 
14)    Click Place -> Place Standard Cells -> OK 
15)    Click Route -> Nanoroute -> Route -> Ok 
16)    After step 15, a figure similar to the one below should appear.
 
 
 
17) Click Place -> Physical Cell -> Add Filler 
 
 
 
18) Select and Add ‘FILL’ (Right Side) and Click Close 
 
 
 
 
 
 
 
 
 
 
 
 
19) FILL should appear in the Cell Name(s) as shown below. Then Click OK 
 
 
 
 
 
21) Click Verify-> Verify Connectivity -> Ok (There is no need to change anything, just press OK) 
 
22) Click File -> Save -> GDS/OASIS
(There are two Save entries, one at the top and one at the bottom. Click the one at the bottom.)
23) Enter the desired Verilog name and give it the extension .gds2
24) Check the Structure Name (Do Not Edit the Already filled in Name) 
 
25) Click File -> Save -> DEF 
 
26)   In the Encounter terminal, type the following commands:
a. To determine the area occupied, use the command:
==>queryPlaceDensity

b. To determine the number of instances, use:
==>selectInst *
==>llength [dbGet selected]

c. To calculate delay, use:
==>report_timing -from input_pin_name -to output_pin_name -unconstrained

d. To view the schematic from the GUI, click Tools -> Schematic Viewer

e. To get the aspect ratio, use the command:
==>dbHeadAspectRatio

f. To get the coordinates of a selected box, use:
  selectInst instance_name
  dbGet selected.box

g. To get the x and y dimensions of a particular cell, use these commands after importing the design:
  set a [dbGetInstByName instance_name]
  set b [dbInstCell $a]
  dbCellDim $b

h. To get the voltage for the specified cell, use:
  set a [dbGetCellByName cell_name]
  dbCellVoltage $a
 
  
 
 
 
PART 2: VIRTUOSO 
1) Open Virtuoso 
2) Close all the screens except LIBRARY MANAGER and the LOG window 
 
 
 
 
 
 
 
 
 
 
3) Click File->New->Create Library (The Library Window, not the Log window) 
 
4) Click OK. 
 
 
 
 
5) Click File -> Import -> Stream (The LOG window, not the Library manager) 
 
6) Click on Options then click on Geometry 
 
7) Select Snap To Grid 
 
 
 
 
8) Click Layers -> Load File 
 
 
9) Find the streamOut.map file in your cadence folder. (You might have to look around more) 
 
 
 
 
 
 
 
 
 
 
 
 
After finding the file, select it, click Open, and then click OK to
return to the Virtuoso(R) XStream In screen.
 
 
 
10) Click on … 
 
11) Find, Select and Open the .gds2 file that was saved in PART 1, STEP 23. 
 
12) Select the Destination Library (The new library created in step 3) 
13) Select the NCSU_TechLib_ami06 for your Technology Library Attachment 
 
 
 
 
 
14) Click Translate. (A warning Log file will open, Just press No) 
15) Click File->Import->DEF 
 
 
16) Find the .def file in the cadence folder 
 
 
 
 
 
 
 
– Select the Target Library Name (the same one you created before).
– Fill in the Target Cell Name as shown (add ‘cellname’ at the end).
– Fill in the Target View Name as shown (just add ‘layout’).
– Finally, click on … (top right corner).
– Browse back to your cadence folder and find the .def file. Click ../ to
navigate back. You might have to look around.
– Press OK once you find the .def file.
 
 
 
17) Click File->New->Cell View 
 
 
18)  
 
 
19) Click Create and draw rectangle 
 
 
 
 
20) Click Create->Add Pins 
     
 
 
 
 
 
 
 
21) Manually insert your ‘mygnd’ and ‘myvdd’ as inputs in the symbol as shown below.

– Add the pin names, with a space between them.
– Make sure the Direction is correct (input/output).
– Then go to the black screen and click to place the pins. As you place each pin, the tool
should automatically move on to the next entered pin.
– The red end of each pin should point outward.
– Do the same for your outputs.
 
 
 
Click this to save symbol and then close 
 
 
 
22) Click File->New->Cell View 
 
 
23) 
 
 
 
 
 
 
 
 
– Fill in Cell (just add ‘Pad’ at the end).
– Fill in View (just add ‘schematic’).
– Select Type (schematic).
– Press OK. A blank black screen should open again.
– Press I on the keyboard. (The figure on the left should pop up.)
– Click Browse and browse for the symbol you created.
 
 
 
23) Find, Select and Place the symbol to the black screen. (after selecting the symbol, simply move 
cursor to black screen, the symbol should appear automatically) 
 
 
24) Press I on the keyboard again and Click Browse and look for ami06_padframe (on the left) 
 
 
 
 
 
 
 
 
 
 
 
 
 
In this part, you will add pads to your circuit.
In the middle column, ‘Cell’, select padinc and place it on the black screen. padinc is the
input pad corresponding to your input pins. Place one pad for each input.
Do the same for the outputs using padout.
Do the same for myvdd and mygnd using padvdd and padgnd, respectively.
Press W on the keyboard to wire the pads as shown in the picture below.
 
 
 
 
 
 
 
 
 
25) Click this to save schematic with padding and close. 
 
 
 
26) Click File->New->Cell View 
 
 
28) Click File->Launch XL (A screen with the placed pads should open on the left) 
29) Click this (bottom left corner) 
 
 
– Fill in the same Cell name as the one before.
– Fill in View as ‘layout’.
– Select Type as ‘layout’.
– Press OK. A black screen should appear again.
 
 
 
30) This screen should pop up. Deselect everything under Generate except Instances 
31) Edit Width and Height to 3.6 
 
 
Like this 
 
 
Press OK 
32) Type placepads in the LOG window and press enter 
 
 
 
 
 
33) Rectangular blocks in red should appear on the screen at this point. 
34) Press shift+f together to turn the red figure into 
 
 
 
NOTE: each block ONLY MOVES IN ONE DIRECTION. So move it left/right first then move it 
top/bottom or the other way around. 
 
To rotate block, select block and then click VIEW->Rotate and then click anywhere on the screen to 
initiate the rotate. 
 
SAVE THIS FILE. YOU ARE DONE 
 
FPGA Implementation Procedure Update: 
Note: The following procedure was done on Xilinx Version 13. You may see variations between 
different versions. However, the general procedure for FPGA implementation remains the same, 
which is: 
 
1) Create a project, add existing source. 
2) Synthesize code by double clicking on View RTL Schematic. 
3) Assign Package Pins, under ‘User Constraints’. 
4) Generate programming file to generate .bit file 
5) Configure the device: select Boundary Scan, select the .bit file for your FPGA
device, and download the program to the board.
6) Testing code on the board.  
 
The below is a very quick and general overview of the entire process 
Procedure for FPGA Implementation: 
1) Create a new project 
File -> New Project 
2) The following screen will appear. Make sure you specify the correct family name, device
name, package name and preferred language.
 
 
3) Click Next. Xilinx will show you the project summary. Verify it and click Finish.
4) Add an existing Verilog file. 
Right click Design window and select ‘Add Source’. 
 
 
 
 
5) Xilinx will then prompt you to select Verilog file. Add it. 
6) Double click on View ‘RTL Schematic’ under Synthesize – XST 
 
 
 
7) Under ‘User Constraints’, double click ‘I/O Planning - Pre-Synthesis’.
 
 
 
 
8) You will then see the following screen. Please note: give Xilinx some time to open the
‘Plan Ahead’ screen.
9) Then assign pins as you like in ‘I/O Ports’ wizard. 
 
 
10) Save the design. 
 
 
11) Minimize Plan Ahead window and go back to Xilinx. 
12) Double click on ‘Generate Programming File’ as shown below. In this step, Xilinx will 
create a file with extension .bit 
 
 
 
13) Next, expand ‘Configure Target Device’ and double click on iMPACT.  
 
 
 
14) You will see the following screen. Select Boundary Scan.  
 
 
 
15) Right click when prompted and select ‘Initialize Chain’. 
 
 
16) You should then see the following two devices.
 
 
 
The device that we are interested in is XC3S200. You will be prompted to select the .bit file
that was generated earlier. Select that .bit file.
 
 
Note: Underneath the device name, you should now see the name of the bit file you 
selected earlier. 
 
Xilinx will then ask you to add a .bit file to the second device as well. We are not 
interested in this, so click BYPASS for this one. 
 
 
 
17) Then select FPGA Device when prompted as shown below. 
Click Apply -> OK
 
 
 
18) Then right click on the screen and select program 
 
 
 
Now you are ready to download the program to the board. 
 
19) After successful completion, you should see the message shown below: 
 
 
 
20) Now, go ahead and test your code on the board using the pins you assigned earlier. 
 
 
 
 
REFERENCES 
[1] M.A. Bayoumi, G.A Jullien, W.C Miller, " A look-up table VLSI design methodology for RNS 
structures used in DSP applications," IEEE Trans. Circuits and Syst., vol.34, no. 6, pp 604–616, 
Jun. 1987. 
[2] J. Bajard, and L. Imbert "A full RNS implementation of RSA" , IEEE Trans. on Comp., vol. 53, 
no. 6, pp. 769-774 , June 2004. 
[3] K. Konstantinides and V. Bhaskaran, “Monolithic architectures for image processing and 
compression,” IEEE Computer Graphics and Applications, vol. 12, no. 6, pp. 75–86, Nov. 1992. 
[4] G. Alia and E. Martinelli, "A VLSI algorithm for direct and reverse conversion from weighted 
binary number to residue number system," IEEE Trans. Circuits and Syst., vol. 31, no. 12, pp. 
1033–1039, Dec. 1984. 
[5] R. M. Capocelli and R. Giancarlo, "Efficient VLSI networks for converting an integer from 
binary system to residue number system and vice versa," IEEE Trans. Circuits and Syst., vol. 35, 
no.11, pp. 1425–1431, Nov. 1988. 
[6] A. Mohan, “Novel design for binary to RNS converters,”  IEEE Int. Symp conf. Circuits and 
Systems, vol. 2,  no. 2, pp. 357–360, June 1994. 
[7] P. Behrooa, "Optimal table-lookup schemes for binary-to-residue and residue-to-binary 
conversions," IEEE Signals, Systems and Computers Conf., vol. 1, pp 812-816, Nov. 1993. 
[8] Mohamed. Akkal and Pepe Siy, "A new mixed radix conversion algorithm,"  Journal of 
Systems Architecture, vol. 5, no. 9, pp. 577-586, Sept.  2007. 
[9] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer 
Technology, New York: McGraw Hill, 1967. 
[10] D. N. Warren-Smith, Introduction to Digital Circuit Theory: A Monograph on Digital Circuit 
Theory from the Beginning, Digital Logic Systems, 2nd ed., 2006, ISBN: 978-095818941-5 
 
[11] J. Timler and C. S. Lent, “Power gain and dissipation in quantum-dot cellular automata,” 
Journal Appl. Phys., vol. 91, no. 2, pp. 823-831, Jan. 2002. 
[12] F. Barsi and M. Cristina Pinotti, “Fast base extension and precise scaling in RNS for look-up 
table implementations”, IEEE Trans. on Signal Processing, vol. 43, no. 10, pp. 2427-2430, Oct. 
1995. 
[13] Chin-Yung, Shiou-An Wang and Sy-Yen, “Quantum boolean circuits construction using 
tabulation method,” 4th IEEE Nanotechnol. Conf.,  vol. 1,pp. 596-598, Aug. 2004. 
[14] Yuke Wang, "Residue-to-binary converters based on new Chinese remainder theorems," IEEE 
Trans. Circuits and Syst II, vol. 47, no. 3, pp. 197-205, March 2000. 
[15] Yuke Wang, Xiaoyu Song and Mostapha Aboulhamid, "A new algorithm for RNS magnitude 
comparison based on new Chinese remainder theorems," IEEE Canadian Conf. on Elec. and Comp., 
vol. 1, pp. 571-576, May 1999. 
[16] M. Anatha Shenoy and Ramdas Kumaresan, "A fast and accurate RNS scaling technique for 
high speed signal processing," IEEE Trans. on Acoustics, Speech and Signal Processing, 
vol. 37, no. 6, pp. 929-937, June 1989. 
[17] D. Banerji, "On the use of residue arithmetic for computation," IEEE Trans. Comp., vol.C-23, 
no. 12, Dec. 1974.  
[18] G. Dimauro, S. Impedovo, and G. Pirlo, "A new technique for fast number comparison in 
residue number system", IEEE Trans. on Comp., vol. 42, no. 5, pp. 608-612, May 1993. 
[19]  R. Zhang, K. Walus, W. Wang and G. A. Jullien, “A method of majority logic reduction for 
quantum cellular automata,” IEEE Trans. Nanotechnol., vol. 3, no. 4, pp. 443–450, Dec. 2004. 
[20] C. Efstathiou, D. Nikolos and J. Kalamatianos, "Area-time efficient modulo 2^n-1 adder 
design," IEEE Trans. on Circuits and Systems-II, vol. 41, no. 7, pp. 463-467, July 1994. 
 
 
[21] M. D. Ercegovac and T. Lang, "Simple radix-4 division with operands scaling," IEEE 
Trans.on  Comp., vol. 39, no. 9, pp.1204-1208, Sep. 1997. 
[22] A. Hiasat, "New designs for a sign detector and a residue to binary converter," IEEE Proc. on 
Circuits, Devices and Systems, vol. 140, no. 4,  pp. 247-252, August 1993. 
[23] H. S. Miller and R. O. Winder, “Majority-logic synthesis by geometric methods,” IEEE Trans. 
Elec. Comp., vol. EC-11, no. 1, pp. 89–90, Feb. 1962. 
[24] K. M. Ibrahim and S. N. Saloum, "An efficient residue to binary converter design," IEEE 
Trans. on Circuits and Sys., vol.35, no. 9. pp.1156-1158, Sept. 1988. 
[25] G. A. Jullien, "Residue number scaling and other operations using ROM arrays," IEEE Trans. 
on Comp., vol. 27, no. 4, pp. 325-337, April 1978. 
[26] F. Miyata, “Realization of arbitrary logical functions using majority elements,” IEEE Trans. 
Electronic. Comput., vol. EC-12, no. 3, pp. 183–191, Jun. 1963.  
[27] M. Lu and J. S. Chinag, "A novel division algorithm for the residue number system," IEEE 
Trans. on Comp., vol.41, No. 8, pp. 1026-1032, August 1992. 
[28] Roy D. Merrill, "Improving digital computer performance using residue number theory", IEEE 
Trans. Electronic Comp., vol. EC-13, no. 2, pp. 93-101, April 1964. 
[29] S. B. Akers, “On the algebraic manipulation of majority logic,” IEEE Trans. Electronic. 
Comp., vol. EC-10, no. 4, pp. 779–779, April 1961. 
[30] L. TAI, and F. Chen, "Overflow detection in a redundant residue number system," IEEE 
Proceedings, Part E, Computers and Digital Techniques, vol. 131,  no. 3, pp. 97-98, May 1984. 
[31] N. Takagi, H. Yasura, and S. Yajima, "High-speed VLSI multiplication algorithm with a 
redundant binary addition tree," IEEE Transaction on comp, vol. C-34, no. 9, pp. 789-796, Sept. 
1985. 
[32] S. Talahmeh, and P. Siy, "Arithmetic division in RNS using galois field GF(p)," Computer 
Math. with Appl., vol. 39, no. 5,  pp. 227-238, March 2000.  
 
[33] C. S. Lent and P. D. Tougaw, “A device architecture for computing with quantum dots,” Proc. 
IEEE, vol. 85, no. 4, pp. 541–557, Apr. 1997. 
[34] J. C. Majithia, "A pipeline array for square-root extraction," IEEE Electron. Lett., vol. 9, no.1,  
pp. 4-5, Jan. 1973. 
[35] A.B. Premkumar, "An RNS to binary converter in (2^n-1), 2^n, (2^n+1) moduli set", IEEE Trans. 
Circuits and Syst, vol. 39, no. 11, pp 480-482, July 1992. 
[36] A.P. Shenoy and R. Kumaresan, "Residue to binary conversion for RNS arithmetic using only 
modular look-up tables", IEEE Trans. Circuits and Syst., vol.35, no.9. pp 1158-1162, Sept 1988. 
[37] P. Bernardson, "Fast memoryless, over 64 bits, residue-to binary converter," IEEE Trans. 
Circuits and Syst., vol. 32, no.3, pp 298-300, Mar. 1985. 
[38] M. Hitz and E. Kaltofen, “Integer division in residue number systems,” IEEE Trans. Comp., 
vol. 44, no. 8, pp. 983-989, Aug. 1995. 
[39] H. Cho and E. Swartzlander, "Adder design and analyses for quantum-dot cellular automata," 
IEEE Trans on Nanotechnology, vol. 6, no. 3, pp. 374-383, May 2007. 
[40] P. D. Tougaw and C. S. Lent, “Logical devices implemented using quantum cellular 
automata,”  Journal Appl. Phys., vol. 75, no. 3, pp. 1818–1825, Feb. 1994. 
[41] T. Oya, T. Asai, T. Fukui and Y. Amemiya, “A majority-logic device using an irreversible 
single-electron box,” IEEE Trans. Nanotechnol., vol. 2, no. 1, pp. 15–22, Mar. 2003. 
[42] Y. Fu and M. Willander, “Modelling and design of quantum dot cellular automata,” Journal 
Appl. Phys., vol. 83, no. 6, pp. 3186-3191, March 1997. 
[43] A. Gin, S. Williams, H. Meng, and P. D. Tougaw, “Hierarchical design of quantum cellular 
automata,” Journal. Appl. Phys., vol. 85, no. 7, pp.3713–3720, Apr. 1999. 
 
[44] S. Perri and P. Corsonello, “New methodology for the design of efficient binary addition 
circuits in QCA,” IEEE Trans. Nanotechnol., vol. 11, no. 6, pp. 1192–1200, Nov. 2012. 
 
 
[45] S . Andraos and H. Ahmed, “A new efficient memoryless residue to binary converter,” IEEE 
Trans. Circuits and Syst., vol.35,  no. 11, pp. 1441-1444,  Nov. 1988. 
[46] J. C. Lusth and D. J. Jackson, “Graph theoretic approach to quantum cellular design and 
analysis” Journal. Appl. Phys., vol. 79, no. 4, pp. 2097 - 2102, Feb. 1996. 
[47] Rui Zhang, P. Gupta, and N. K. Jha, "Threshold network synthesis and optimization and its 
application to nanotechnologies," IEEE Trans. on Comp.-Aided Design of Integrated Circuits and 
Systems, vol. 24, no. 1, pp. 107-118, Jan. 2005. 
[48] Rui Zhang, P. Gupta and N. K. Jha, "Majority and minority network synthesis with 
application to QCA-, SET- and TPL-based nanotechnologies," IEEE Trans. on Comp.-Aided 
Design of Integrated Circuits and Systems,  vol. 26,  no. 7, pp. 1233 - 1245, July  2007. 
[49] T. V. Vu, “Efficient implementations of the Chinese remainder theorem for sign detection and 
residue decoding,” IEEE Trans. Comp., vol. 34, no. 7,  pp. 646-651, July 1985. 
[50] M. B. Tahoori, J. Huang, M. Momenzadeh and F. Lombardi, “Testing of quantum cellular 
automata,” IEEE Trans. Nanotechnol., vol. 3, no.4, pp. 432–442, Dec. 2004. 
[51] P. Goel, “An implicit enumeration algorithm to generate tests for combinational logic 
circuits,” IEEE Trans. Comp., vol. 30, no. 3, pp. 215–222, Mar. 1981. 
[52] A. Chaudhary, D. Z. Chen, X. S. Hu, M. T. Niemier, R. Ravichandran and K. Whitton, 
“Fabricatable interconnect and molecular QCA circuits,” IEEE Trans. Comput.-Aided Des. Integr. 
Circuits Syst., vol. 26, no. 11, pp. 1978–1991, Nov. 2007.  
[53] S. Roy and V. Beiu, “Majority multiplexing-economical redundant fault-tolerant designs for 
nanoarchitectures,” IEEE Trans. Nanotechnol., vol. 4, no. 4, pp. 441–451, Jul. 2005. 
[54] W. Ibrahim, V. Beiu, and M. Sulieman, “On the reliability of majority gates full adders,” 
IEEE Trans. Nanotechnol., vol. 7, no. 1, pp. 56–67, Jan. 2008. 
 
[55] S. Srivastava and S. Bhanja, “Hierarchical probabilistic macromodeling for QCA circuits,” 
IEEE Trans. Comput., vol. 56, no. 2, pp. 174–190, Feb. 2007. 
[56] S. Bhanja and S. Sarkar, “Probabilistic modeling of QCA circuits using bayesian networks,” 
IEEE Trans. Nanotechnol., vol. 5, no. 6, pp. 657–670, Nov. 2006. 
[57] K. Kim, K. Wu and R. Karri, “The robust QCA adder designs using  composable QCA 
building blocks,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 1, pp. 176–
183, Jan. 2007. 
[58] J. Janulis, P. Tougaw, S. Henderson, and E. Johnson, “Serial bit-stream analysis using 
quantum-dot cellular automata,” IEEE Trans on Nanotechnology, vol. 3, no.1, pp. 158-164, March 
2004. 
[59] V. Vankamamidi, M. Ottavi, and F. Lombardi, "A line-based parallel memory for QCA 
implementation," IEEE Trans. Nanotechnol., vol. 4, no. 6, pp. 690-698, Nov. 2005. 
[60] Richard F. Tinder, “Multilevel logic minimization using K-map XOR patterns,” IEEE Trans. 
on Educ., vol. 38, no. 5, pp. 370 - 375, Nov. 1995. 
[61] P. D. Tougaw and C. S. Lent, “Dynamic behavior of quantum cellular automata,” Journal 
Appl. Phys., vol. 80, no. 8, pp. 4722–4736, Oct.  1996.  
[62] C. S. Lent and B. Isaksen, "Clocked molecular quantum-dot cellular automata," IEEE Trans. 
Electron Devices, vol. 50, no. 9, pp. 1890-1896, Sep. 2003.  
[63] E. S. Mandell and M. Khatun, “Quasi-adiabatic clocking of quantum-dot cellular automata,” 
Journal Appl. Phys., vol. 94, no. 6, pp. 4116–4121, Sep. 2003 
[64] Seminario JM, Derosa PA, Cordova LE and Bozard BH, “A molecular device operating at 
terahertz frequencies: theoretical simulations,” IEEE Trans. on Nanotechnology , vol. 3, no. 1, 
pp.215–218, March 2004. 
[65] M. A.Bayoumi, G. A. Jullien and W. C. Miller, “A VLSI implementation of residue adders,” 
IEEE Trans. Circuits and Syst., vol. 34, no.3, pp. 284-288, Mar. 1987. 
 
[66] F. J. Taylor, “A single modulus complex ALU for signal processing,” IEEE Trans. Acoust., 
Speech Signal Processing, vol. 33, no.5, pp. 1302 – 1315,  Oct. 1985. 
[67] D. K.Banerji, “A novel implementation method for addition and subtraction in residue number 
systems,” IEEE Trans. Comput., vol. C-23, no.1, pp. 106-109, Jan. 1974 
[68] S. J. Piestrack, “Design of residue generators and multioperand adders modulo 3 built of 
multioutput threshold circuits,”  Proc. IEEE Computers and Digital Techniques, vol. 141,  no. 2, 
pp. 129 - 134 , March 1994. 
[69] C. H. Huang, “A fully parallel mixed-radix conversion algorithm for residue number 
applications”, IEEE Transactions on Computers, vol. 32, no. 4, pp. 398 – 402,  April 1983. 
[70] M. Akkal and P. Siy, “A new mixed radix conversion algorithm MRC-II” , Journal of System 
Architecture, vol. 53, no. 9, pp. 577-586, May. 2007. 
[71] M. Becherer, G. Csaba, W. Porod, R. Emling, P. Lugli, D. Schmitt-Landsiedel, ”Magnetic 
ordering of focused-ion-beam structured cobalt-platinum dots for field-coupled computing”, IEEE 
Trans. on Nanotechnology, vol.7, no.3, pp. 316-320, May 2008. 
[72] B. Qiaa and H. E. Ruda, “Evolution of a two-dimensional quantum cellular neural network 
driven by an external field,” Journal Appl. Phys., vol. 85, no. 5, pp. 2952-2961, March 1999.  
[73] M. Akkal and P. Siy “Optimum RNS sign detection algorithm using MRC-II with special 
moduli set,”  Journal of System Architecture. vol 54, no. 10, pp. 911-918. Oct. 2008. 
[75] G. C. Cardarilli, “RNS-to-binary conversion for efficient VLSI implementation,” IEEE 
Circuits and Systems I, vol. 46, no. 6, pp. 2427 – 2430, Oct. 1995. 
[76] V. Pudi and K. Sridharan, “Low complexity design of ripple carry and Brent–Kung adders in 
QCA,” IEEE Trans. Nanotechnol., vol. 11, no. 1, pp. 105–119, Jan. 2012.  
[77] M. Governale, M. Macucci, G. Iannaccone, C. Ungarelli and J. Martorell, “Modeling and 
manufacturability assessment of bistable quantum-dot cells,” Journal Appl. Phys., vol. 85, no. 5, 
pp. 2962-2971, Mar. 1999.  
 
[78] K. Walus, R. A. Budiman and G. A. Jullien, “Split current quantum-dot cellular automata 
modeling and simulation,” IEEE Trans. Nano., vol. 3, no. 2, pp. 249-255, June 2004.  
[79] G. Toth and C. S. Lent “Role of correlation in the operation of quantum-dot cellular 
automata,” Journal Appl. Phys., vol. 89, no. 12, pp. 7943-7953, June 2001. 
[80] J. Timler and C. S. Lent, “Power gain and dissipation in quantum-dot cellular automata,” 
Journal Appl. Phys., vol. 91, no. 2, pp. 823-831, Jan. 2002. 
[81] I. Amlani, “Experimental demonstration of a leadless quantum-dot cellular automata cell,” 
Appl. Phys. Lett., vol. 77, no. 5, pp. 738–740,  July 2000. 
[82] R. Brayton, G. Hachtel and A. Sangiovanni-Vincentelli, “Multilevel logic synthesis,” Proc. 
IEEE, vol. 78, no. 2, pp. 264–300, Feb. 1990. 
[83] K. Walus, “High level exploration of quantum dot automata (QCA),” Conf. Signal, Syst. and 
Comp., vol. 1, pp. 30-33,  Nov. 2004.  
[84] T. Sasao and P. Besslich, “On the complexity of Mod-2-sum PLA’s,” IEEE Trans. Comput., 
vol. 39, no. 2,  pp. 262–265, Feb. 1990. 
[85] H. Cho and E. E. Swartzlander, “Adder and multiplier designs in quantum-dot cellular 
automata,” IEEE Trans. Comput., vol. 58, no. 6, pp. 721–727, Jun. 2009. 
[86] K.Walus, T. Dysart, G. Jullien and R. Budiman, “QCADesigner: A rapid design and 
simulation tool for quantum-dot cellular automata,” IEEE Trans. Nanotechnol., vol. 3, no. 1, pp. 
26–29, Mar. 2004. 
[87] K. Walus, G. Schulhof and G. Jullien, “Implementation of a simulation engine for clocked 
molecular QCA,” Proc. IEEE Can. Conf. Electr. Comput. Eng., vol. 1, pp. 2128–2131, May 2006. 
[88] T. Yatagai, "Cellular logic arithmetic for optical computers", Applied Optics, vol. 25, no. 10, 
pp. 1571-1577, May 1986. 
[89] G. Strucke, "Parallel architecture for digital optical computer" Applied Optics, vol. 28, no. 2, 
pp. 363-370, Jan 1989.  
 
[90] Dharma P. Agrawal and T. R. N. Rao, “On multiple operand addition of signed binary 
numbers,". IEEE Trans. Comp., vol. C-27, no. 11, pp. 1068-1070, Nov. 1978. 
[91] A.K Kamal, H. Singh and D. P. Agrawal, "A generalized pipeline array," IEEE Trans. Comp., 
vol. 23, no. 5, pp. 533-536, May 1974. 
[92] Dharma P. Agrawal, "High-speed Arithmetic arrays," IEEE Trans. Comp., vol. C-28,  no. 3, 
pp. 215-224, March 1979. 
[93] R. Golshan and J. S. Bedi, "Reversible nonlinear interface optical computing," Opt. Eng., vol. 
28, no. 6, pp. 683-686, June 1989. 
[94] M. Dagenais and W.F Sharfin "Optical switching of semiconductor laser amplifiers,” Appl. 
Phys. B., vol.  B46, no. 1, pp. 35-41, May 1988. 
[95] J. Tanda and Y Ichioka  "Modular Components for an array logic system",  Applied Optics,  
vol. 26,  no. 18,  pp.3954 -3960, Oct. 1987 
[96] J. C. Majithia and R. Kitai, "A cellular array for the nonrestoring extraction of square roots," 
IEEE Trans. Comput. (Corresp.), vol. C-20, pp. 1617-1618, Dec 1971. 
[97] H. H. Majithaia and R. Kitai, “Cellular array for nonrestoring square root extraction," 
Electron. Lett., vol. 6, pp. 66-87, 1970. 
[98] J. C. Hoffman, B. Lacaze and P. Csillag, "Iterative logical network for parallel multiplication," 
Electron. Lett., vol. 4, pp. 178-179, May 1968. 
[99] K. J. Dean, "Binary division using a data-dependent iterative array," Electron. Lett., vol. 4, pp. 
283-284, July 1968. 
[100] R. C. Devries and M. H. Chao, "Fully iterative array for extracting square roots," Electron. 
Lett., vol. 6, pp. 255-256, Apr. 1970. 
[101] A. Gex, "Multiplier-divider cellular array," Electron. Lett., vol. 7, pp. 442-444, July 1971. 
[102] J. C. Majithia, "Cellular array for extraction of squares and square roots of binary numbers," 
IEEE Trans. Comput. (Short Notes), vol. C-21, pp. 1023-1024, Sept. 1972. 
[103] D. P. Agrawal and H. Singh, "Iterative array for square-root, division, and multiplication," 
8th Ann. Conv. Comput. Soc. of India, Mar. 26, 1973. 
[104] J. C. Majithia, "A pipeline array for square-root extraction," Electron. Lett., vol. 9, pp. 4-5, 
Jan. 1973. 
[105] R. Ranjan,  H. Singh, A. Awasthi, W. Smuda and G. R. Gerhart, "Finite state modeling of 
Mobile robots for complexity determination,” Proc. SPIE, Unmanned Systems Technology VIII, 
vol. 6230,  May 2006. 
[106] Wei Wang, M. Swamy, M. Ahmed and Yuke Wang, “A high speed residue-to-binary 
converter and a scheme for its VLSI implementation,” IEEE Trans. Comput., vol. 6, no. 1, pp. 
330-333, July 1999. 
[107] B. Taskin, “Improving line-based QCA memory cell design through dual phase clocking,” 
IEEE Trans. Very Large Scale Integration (VLSI), vol. 16, no. 12, pp. 1648-1656, Dec. 2008. 
[108] Yuke Yang and Mustafa Abd-El-Bar, “A new algorithm for RNS Decoding,” IEEE 
Transactions in Circuits and Systems-I: Fundamental Theory and Applications, vol. 43, no. 12, Dec. 
1996. 
[109] W. K. Jenkins and B. J. Leon, "The use of residue number systems in the design of finite 
impulse response digital filters," IEEE Trans. Circuits Syst., vol. 24, no. 4, pp. 191-201, Apr. 1977.   
 
 
ABSTRACT  
EMERGING DESIGN METHODOLOGY AND ITS IMPLEMENTATION  
THROUGH RNS AND QCA 
by 
OMAR DAJANI 
August 2013  
Advisor: Dr. Harpreet Singh 
Major: Electrical Engineering  
Degree: Doctor of Philosophy 
 
Digital logic technology has changed dramatically, from integrated circuits to 
Very Large Scale Integrated (VLSI) circuits and on to nanotechnology logic circuits. 
Research has focused on increasing the speed and reducing the size of circuit designs. 
The Residue Number System (RNS) architecture can support high-speed concurrent 
arithmetic applications. To reduce size, Quantum-Dot Cellular Automata (QCA) has 
become an active nanotechnology research field and has received a lot of attention 
within the engineering community due to its small size and ultralow power consumption. 
In the last decade, the residue number system has received increased attention due to its 
ability to support high-speed concurrent arithmetic applications such as the Fast Fourier 
Transform (FFT), image processing and digital filters, which exploit the efficiency of RNS 
addition and multiplication. In spite of its effectiveness, RNS has remained 
more an academic challenge and has had very little impact on practical applications due to the 
complexity involved in the conversion process, magnitude comparison, overflow 
detection, sign detection, parity detection, scaling and division. Advances in 
very large scale integration technology and the demand for parallel computation have 
enabled researchers to consider RNS as an alternative approach to high-speed concurrent 
arithmetic. A novel parallel-prefix binary-to-residue conversion method and a novel 
RNS scaling method are presented in this thesis. 
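As a minimal illustration of why RNS addition and multiplication are efficient while conversion is the costly step, the following sketch works on a small example. The moduli set (3, 5, 7) and all function names are assumptions chosen for illustration; the reverse conversion uses the textbook Chinese Remainder Theorem, not the parallel-prefix conversion or scaling methods developed in this thesis.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; dynamic range is 3 * 5 * 7 = 105.
MODULI = (3, 5, 7)

def to_rns(x, moduli=MODULI):
    """Forward (binary-to-residue) conversion: one residue per modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; no carry passes between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication; each channel is independent."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Reverse conversion by the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        # pow(M_i, -1, m_i) is the multiplicative inverse of M_i modulo m_i.
        total += r_i * M_i * pow(M_i, -1, m_i)
    return total % M
```

Each residue channel is computed independently, so the channels can run concurrently. Results are valid only while they stay inside the dynamic range (below 105 here), which is one reason overflow detection and scaling are hard in RNS.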
Quantum-dot cellular automata has become an active nanotechnology research 
field and has received a lot of attention within the engineering community due to its 
extremely small feature size and ultralow power consumption compared to CMOS 
technology. A novel methodology for generating QCA Boolean circuits from multi-output 
Boolean circuits is presented. Our methodology takes a Boolean circuit as its input, 
generates a simplified XOR-AND equivalent circuit and outputs an equivalent majority gate 
circuit. 
During the past decade, quantum-dot cellular automata have shown the ability to 
implement both combinational and sequential logic devices. Unlike conventional Boolean 
AND-OR-NOT based circuits, the fundamental logical device in QCA Boolean networks 
is the majority gate. By combining these majority gates with NOT gates, any combinational or 
sequential logical device can be constructed from QCA cells. We present an 
implementation of a generalized pipeline cellular array using quantum-dot cellular 
automata cells. The proposed QCA pipeline array can perform all basic operations such 
as multiplication, division, squaring and square rooting. The different modes of operation 
are controlled by a single control line. 
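The claim that majority gates together with NOT gates suffice for any logic function can be checked on a small scale. The sketch below models the three-input majority gate as a Boolean function and derives AND, OR and XOR from it. The function names are hypothetical, and the model is purely logical: it does not simulate QCA cells, clocking, or the pipeline array itself.

```python
from itertools import product

def majority(a, b, c):
    """Three-input majority vote: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)

def qca_not(a):
    """Inverter on a single bit (0 or 1)."""
    return 1 - a

def qca_and(a, b):
    """AND is a majority gate with one input fixed to 0."""
    return majority(a, b, 0)

def qca_or(a, b):
    """OR is a majority gate with one input fixed to 1."""
    return majority(a, b, 1)

def qca_xor(a, b):
    """XOR built only from majority gates and inverters."""
    return qca_or(qca_and(a, qca_not(b)), qca_and(qca_not(a), b))

def truth_table(gate):
    """Outputs of a two-input gate for inputs 00, 01, 10, 11."""
    return [gate(a, b) for a, b in product((0, 1), repeat=2)]
```

Fixing one majority input to a constant gives AND or OR. Since AND, OR and NOT form a complete set, any Boolean function follows from majority gates and inverters.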
 
 
AUTOBIOGRAPHICAL STATEMENT  
Omar Dajani 
He received his BS degree in Computer Engineering from Jordan University of 
Science and Technology, Jordan, in 1995 and his MS degree in Electrical and Computer 
Engineering from University of Detroit Mercy, Detroit, MI, in 1996. He is currently 
working as an electrical system engineer for Ford Motor Company and pursuing a Ph.D. 
degree in Electrical Engineering at Wayne State University, Detroit, MI. His research 
interests are in residue number systems, nanotechnology, parallel processing and VLSI. 
  
PUBLICATIONS: 
 
[1] O. Dajani and P. Siy, “Novel parallel-prefix structure binary to residue number 
system conversion method,” CSC, Journal of Int. Logic and Comp., vol. 3, no. 1, 
pp. 1-13, Oct. 2012. 
[2] O. Dajani and P. Siy, “Simplified RNS scaling algorithm and division algorithm,” 
CSC, Journal of Int. Logic and Comp., submitted April 2011 and accepted Aug. 2012. 
[3] O. Dajani, G. Bawa and H. Singh, “VLSI implementation of residue adder and 
subtractor,” Int'l Conf. Frontiers on Comp. Sci. and Comp. Eng. (FECS'12), vol. 1, pp. 
604-607, July 2012. 
[4] O. Dajani and H. Singh, “Quantum boolean circuit construction methodology using 
XOR-AND reduction technique,” IEEE SEM Fall Conf., Nov. 2012. 
[5] O. Dajani, H. Singh and D. P. Agrawal, “Implementation of generalized pipeline 
cellular array using quantum-dot cellular automata,” IEEE Trans. Nanotechnol., submitted 
March 2013. 
 
