University of Windsor

Scholarship at UWindsor
Electronic Theses and Dissertations

Theses, Dissertations, and Major Papers

2005

Novel arithmetic implementations using cellular neural network
arrays.
Youssef Ibrahim
University of Windsor

Follow this and additional works at: https://scholar.uwindsor.ca/etd

Recommended Citation
Ibrahim, Youssef, "Novel arithmetic implementations using cellular neural network arrays." (2005).
Electronic Theses and Dissertations. 2878.
https://scholar.uwindsor.ca/etd/2878

This online database contains the full-text of PhD dissertations and Masters’ theses of University of Windsor
students from 1954 forward. These documents are made available for personal study and research purposes only,
in accordance with the Canadian Copyright Act and the Creative Commons license—CC BY-NC-ND (Attribution,
Non-Commercial, No Derivative Works). Under this license, works must always be attributed to the copyright holder
(original author), cannot be used for any commercial purposes, and may not be altered. Any other use would
require the permission of the copyright holder. Students may inquire about withdrawing their dissertation and/or
thesis from this database. For additional inquiries, please contact the repository administrator via email
(scholarship@uwindsor.ca) or by telephone at 519-253-3000 ext. 3208.

INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films
the text directly from the original or copy submitted. Thus, some thesis and
dissertation copies are in typewriter face, while others may be from any type of
computer printer.

The quality of this reproduction is dependent upon the quality of the
copy submitted. Broken or indistinct print, colored or poor quality illustrations
and photographs, print bleedthrough, substandard margins, and improper
alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript
and there are missing pages, these will be noted. Also, if unauthorized
copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by
sectioning the original, beginning at the upper left-hand corner and continuing
from left to right in equal sections with small overlaps.

ProQuest Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.


Novel Arithmetic Implementations Using Cellular
Neural Network Arrays

by

Youssef Ibrahim

A Dissertation
Submitted to the Faculty of Graduate Studies and Research through the
Department of Electrical and Computer Engineering in Partial Fulfillment
of the Requirements for the Degree of Doctor of Philosophy at the
University of Windsor

Windsor, Ontario, Canada
2005


Library and Archives Canada
Published Heritage Branch
395 Wellington Street
Ottawa ON K1A 0N4
Canada

ISBN: 0-494-09698-5

NOTICE:
The author has granted a non-exclusive license allowing Library
and Archives Canada to reproduce, publish, archive, preserve, conserve,
communicate to the public by telecommunication or on the Internet,
loan, distribute and sell theses worldwide, for commercial or
non-commercial purposes, in microform, paper, electronic and/or any
other formats.

The author retains copyright ownership and moral rights in
this thesis. Neither the thesis nor substantial extracts from it
may be printed or otherwise reproduced without the author's
permission.

In compliance with the Canadian Privacy Act some supporting
forms may have been removed from this thesis.

While these forms may be included in the document page count,
their removal does not represent any loss of content from the
thesis.

© 2005 Youssef Ibrahim

All Rights Reserved. No part of this document may be
reproduced, stored or otherwise retained in a retrieval
system or transmitted in any form, on any medium or by any
means without the prior written permission of the author


Abstract

The primary goal of this research is to explore the use of arrays of analog self-synchronized
cells - the cellular neural network (CNN) paradigm - in the implementation of novel
digital arithmetic architectures. In exploring this paradigm we also discover that the
implementation of these CNN arrays produces very low system noise; that is, noise
generated by the rapid switching of current through power supply die connections -
so-called di/dt noise. With the migration to sub-100-nanometer process technology, signal
integrity is becoming a critical issue when integrating analog and digital components onto
the same chip, and so the CNN architectural paradigm offers a potential solution to this
problem. A typical example is the replacement of conventional digital circuitry adjacent to
sensitive bio-sensors in a SoC Bio-Platform. The focus of this research is therefore to
discover novel approaches to building low-noise digital arithmetic circuits using analog
cellular neural networks, essentially implementing asynchronous digital logic but with the
same circuit components as used in analog circuit design.

We begin our exploration by improving upon previous research into CNN binary
arithmetic arrays. The second phase of our research introduces a logical extension of the
binary arithmetic method to implement binary signed-digit (BSD) arithmetic. To this end,
a new class of CNNs that has three stable states is introduced and is used to implement
arithmetic circuits that use binary inputs and outputs but internally use the BSD number
representation. Finally, we develop CNN arrays for a 2-dimensional number representation
(the Double-base Number System - DBNS). A novel adder architecture is described
in detail; it performs the addition and also reduces the representation for further
processing, and the design incorporates an innovative self-programmable array. Extensive
simulations have shown that our new architectures can reduce system noise by almost 70 dB
and crosstalk by more than 23 dB over standard digital implementations.


To:

My parents,
my wife, Eman,
and my daughter, Lauren


Acknowledgments

I would like to acknowledge all the people who assisted and supported me over the years
of my graduate study at the University of Windsor. Although this thesis prominently bears
my name, it would not have been possible without their contribution.
I would like to express my gratitude to my advisors, Dr. G.A. Jullien and Dr. W.C. Miller
for providing guidance and support. Aside from keeping me on track, they have also
allowed me the freedom to pursue research at my own pace. I would like to extend thanks
to the members of my committee, Dr. N. Yazdi, Dr. A. Ngom, Dr. C. Chen, and Prof. M.
Ahmadi whose insightful comments and helpful suggestions have improved this thesis. I
am also grateful to Dr. R. Muscedere who helped me use system-level CAD tools.
I am indebted to the RCIM lab manager Mr. Till Kuendiger who was always there whenever
I needed help. I appreciate my colleagues in the RCIM research group, my friends in
all other research groups, and the entire Electrical and Computer Engineering staff. They
were a constant source of encouragement and optimism over the years.
I would like to acknowledge financial support from the Natural Sciences and Engineering
Research Council of Canada, the Micronet Network of Centres of Excellence, and Gennum
Corporation. Furthermore, I am obliged to CMC Microsystems (formerly The Canadian
Microelectronics Corporation) for providing design tools, workstations and
fabrication services.
Finally, I would like to thank my parents, Salah and Nadia, for their never-fading love and
encouragement. I am also grateful to my surrogate parents, Larry and Doreen Cheshire,
and to my surrogate parents-in-law, Joseph and Betty Tome, for their love and assistance
during my stay in Windsor. I am indebted to my church family at Campbell Baptist
Church for their moral support throughout this research project. I am very grateful to my
wife, Eman, and my daughter, Lauren, for their love, inspiration, and understanding
through the very difficult moments in my studies.


Table of Contents

Abstract
Dedication
Acknowledgments
List of Figures
List of Tables
List of Symbols
List of Abbreviations

Chapter 1  Introduction
    1.1  Low-noise - motivation
    1.2  Substrate Noise
    1.3  CNN-based Arithmetic Circuits - Rationale
    1.4  Existing CNN-based Arithmetic Circuits
    1.5  CNN-based Arithmetic Circuits Design Goals
    1.6  Thesis Overview
    1.7  Thesis Organization

Chapter 2  Cellular Neural Networks: An Overview
    2.1  CNN History and Applications
    2.2  CNN Structures
    2.3  CNN Cell Architecture
    2.4  CNN Dynamics
        2.4.1  Modes of Operation
        2.4.2  Network Convergence
    2.5  Summary

Chapter 3  Binary Arithmetic Using CNNs
    3.1  The Binary Number System: Overview
        3.1.1  Definition
        3.1.2  Binary Addition
    3.2  Designing a 1-bit Binary Full Adder Using CNN (CNNBFA)
        3.2.1  CNNBFA Templates Design
        3.2.2  CNNBFA CMOS Basic Building Blocks
        3.2.3  CNNBFA CMOS Implementation
        3.2.4  CNNBFA Hspice Simulation
    3.3  CNNBFA Design Scalability
        3.3.1  A 32-bit CNN-based Binary Adder
        3.3.2  Impact of the CNN-based Binary Adder on Substrate Noise
    3.4  CNNBFA Design Compatibility
        3.4.1  A 32x32-bit CNN-based Binary Multiplier
        3.4.2  Impact of the CNN-based Binary Multiplier on Substrate Noise
    3.5  Summary of CNN-based Binary Arithmetic

Chapter 4  Binary Signed-Digit Arithmetic Using CNNs
    4.1  Introduction
    4.2  The Binary Signed-Digit Number System: Overview
        4.2.1  Definition
        4.2.2  BSD Addition
    4.3  Designing a 1-digit BSD Full Adder Using CNN (CNNBSDFA)
        4.3.1  A 3-State CNN Cell
        4.3.2  CNNBSDFA Templates Design
        4.3.3  CNNBSDFA Hspice Simulation
    4.4  CNNBSDFA Design Scalability
        4.4.1  A 32-digit CNN-based BSD Adder
        4.4.2  Impact of the CNN-based BSD Adder on Substrate Noise
    4.5  CNNBSDFA Design Compatibility
        4.5.1  A 32x32-digit CNN-based BSD Multiplier
        4.5.2  Impact of the CNN-based BSD Multiplier on Substrate Noise
    4.6  Summary of CNN-based BSD Arithmetic

Chapter 5  Double-Base Number System Arithmetic Using CNNs
    5.1  Introduction
    5.2  The Double-Base Number System: Overview
        5.2.1  Definition
        5.2.2  DBNS Addition
        5.2.3  Reduction to Addition-Ready Representation
    5.3  Designing a 1-bit DBNS Adder Unit Using CNN (CNNDBNSAU)
        5.3.1  CNNDBNSAU Templates Design
        5.3.2  Dealing with Special Cases of DBNS-maps
        5.3.3  CNNDBNSAU CMOS Implementation
        5.3.4  CNNDBNSAU Hspice Simulation
    5.4  CNNDBNSAU Design Scalability
        5.4.1  A 20x20 CNN-based DBNS Adder
        5.4.2  Constraints on the CNN-based DBNS Adder to be Self-Programmable
        5.4.3  Impact of the CNN-based DBNS Adder on Substrate Noise
    5.5  Summary of CNN-based DBNS Arithmetic

Chapter 6  Conclusions
    6.1  Summary and Contributions
    6.2  Conclusions
    6.3  Suggestions for Future Work

References

Vita Auctoris


List of Figures

Figure 1.1   Simplified block diagram of a bio-sensor SoC chip
Figure 1.2   Reducing system noise in the bio-sensor with smooth analog transitions
Figure 1.3   Lumped model of the substrate coupling
Figure 1.4   Hspice simulation of a standard digital inverter
Figure 1.5   Hspice simulation of di/dt for a digital inverter and a CNN cell
Figure 1.6   Hspice simulation of dv/dt for a digital inverter and a CNN cell
Figure 1.7   Hspice simulation of a CNN cell output voltage for different time constants
Figure 1.8   The flat binary adder
Figure 1.9   MATLAB simulation of the recursive binary adder
Figure 2.1   Examples of rectangular and hexagonal CNN grids with neighborhood of size 1. Light grey cells belong to the neighborhood of the dark grey cell.
Figure 2.2   CNN cell activation function
Figure 2.3   A block diagram representation of a CNN cell
Figure 2.4   Schematic of an electrical implementation of a CNN cell
Figure 3.1   An example of binary addition
Figure 3.2   Block diagram of a binary adder: (a) 1-bit full adder, (b) n-bit binary adder
Figure 3.3   Representation of the CNN-based 1-bit full adder: (a) CNN grid, (b) block diagram
Figure 3.4   Schematic of a current-mode summing node
Figure 3.5   Schematic of current sources: (a) nMOS current source, (b) pMOS current source
Figure 3.6   Schematic of simple current mirrors: (a) nMOS current mirror, (b) pMOS current mirror
Figure 3.7   Schematic of a subtractor
Figure 3.8   Schematic of absolute function
Figure 3.9   Schematic of basic CNN cell
Figure 3.10  Schematic of the CNNBFA sum cell with connections to neighbors


Figure 3.11  Schematic of the CNNBFA carry cell with connections to neighbors
Figure 3.12  Hspice simulation of the CNNBFA
Figure 3.13  Block diagram of an n-bit CNN-based binary adder
Figure 3.14  Hspice simulation of an 8-bit section of the 32-bit CNN-based binary adder
Figure 3.15  Switching noise of the CNN-based and standard digital 32-bit binary adders
Figure 3.16  Cross talk noise of the CNN-based and standard digital 32-bit binary adders
Figure 3.17  Block diagram of a carry-save tree multiplier
Figure 3.18  Hspice simulation of a section of the 32x32-bit CNN-based binary multiplier
Figure 3.19  Switching noise of the CNN-based and standard digital 32x32-bit binary multipliers
Figure 4.1   An example of BSD addition
Figure 4.2   Block diagram of a BSD adder: (a) 1-digit BSD adder, (b) n-digit BSD adder
Figure 4.3   The required CNN cell activation function
Figure 4.4   Schematic of the 3-input median extractor
Figure 4.5   Transfer characteristics of the 3-input median extractor
Figure 4.6   Schematic of the 3-state circuit
Figure 4.7   Transfer characteristics of the 3-state CNN cell
Figure 4.8   Representation of the CNN-based 1-digit SD adder: (a) CNN grid, (b) block diagram
Figure 4.9   Hspice simulation of the CNNBSDFA
Figure 4.10  Block diagram of an n-digit CNN-based BSD adder
Figure 4.11  Hspice simulation of a section of the 32-digit CNN-based BSD adder
Figure 4.12  Switching noise of the CNN-based 32-digit BSD adder and 32-bit standard digital binary adder
Figure 4.13  Cross talk of the CNN-based 32-digit BSD adder and 32-bit standard digital binary adder
Figure 4.14  Block diagram of the BSD multiplier
Figure 4.15  Hspice simulation of an 8-digit section of the 32x32-digit CNN-based BSD multiplier. The output is in BSD representation.
Figure 4.16  Hspice simulation of an 8-digit section of the 32x32-digit CNN-based BSD multiplier. The output is in binary representation.
Figure 4.17  Switching noise of the CNN-based BSD and standard digital 32x32-bit binary multipliers
Figure 5.1   Different representations of 108 in the DBNS
Figure 5.2   Graphical representation of the overlaying rule: (a) initial map, (b) final map
Figure 5.3   Addition in DBNS: (a) X, (b) Y, (c) map obtained by overlaying, (d) Z after applying the overlaying rule


Figure 5.4   Graphical representation of the reduction rule: (a) initial map, (b) final map
Figure 5.5   Non-zero digit reduction of Z: (a) initial map, (b) intermediate map, (c) final map
Figure 5.6   An example of simultaneous reductions: (a) initial map, (b) and (c) correct solutions, (d) and (e) wrong output
Figure 5.7   Two possible simultaneous applications of Eqn. (5.4) to the same cell
Figure 5.8   An example of the order of reduction: (a) initial map, (b) j in ascending order, (c) j in descending order
Figure 5.9   Schematic of the reduction rule
Figure 5.10  Connection to participating cells
Figure 5.11  A situation where the row reduction rule can be applied to four different groups of cells
Figure 5.12  Connection between groups of cells
Figure 5.13  CNNDBNS adder cell schematic
Figure 5.14  Hspice simulation of the CNNDBNSAU: (a) only the row reduction rule is applicable, (b) both the row and overlaying reduction rules are applicable
Figure 5.15  Schematic of a 4x4 section of the CNN-based DBNS adder
Figure 5.16  An example of addition using the CNN-DBNS adder: (a) X, (b) Y, (c) Overlaying X and Y, (d) Z after time T
Figure 5.17  Non-zero digit reduction of Z. Starting from left to right, each map is obtained from the previous map after a time T.
Figure 5.18  Hspice simulation of a 4x4 section of the CNN-based DBNS adder
Figure 5.19  An example of a reduction rule followed by a reduction rule: (a) initial map, (b) map after the first reduction rule, (c) map after the second reduction rule
Figure 5.20  An example of a carry propagation rule followed by a reduction rule: (a) initial map, (b) map after applying the overlaying rule, (c) map after applying the row reduction rule
Figure 5.21  Switching noise of the CNN-based 20x20 DBNS adder and 32-bit standard digital binary adder
Figure 5.22  Cross talk of the CNN-based 20x20 DBNS adder and 32-bit standard digital binary adder
Figure 6.1   Switching noise of different adders vs. adder size
Figure 6.2   Switching noise of different multipliers vs. multiplier size
Figure 6.3   Cross talk of different adder designs


List of Tables

Table 3.1    Truth table of a 1-bit binary full adder
Table 4.1    Truth table of BSD addition (* represents don't care)
Table 5.1    Truth table of DBNS adder unit
Table 6.1    Summary of design specifications


List of Symbols

Δ            Delta operator
ρ            Substrate resistivity
ε            Substrate permittivity
α            Self-feedback current coefficient
1̄            Integer digit -1
2̄            Integer digit -2
π(i)         Permutation i
x̄_i          Two's complement of digit at position i of operand X
r            Number of full adder levels in binary reduction tree
μ            Micron
A            Output feedback template
A_ij;kl      Element kl of the output feedback template A for the CNN cell C(i,j)
B            Input control template
C            Carry out of an addition
C(i,j)       CNN cell at position (i,j)
C_eff        Effective parasitic capacitance
c_i          Digit at position i of the carry in/out C
C_x          Input capacitance of a CNN cell
I            Bias of a CNN cell
In_i         Input i to median extractor


i_sub        Substrate current
I_unit       Unit current
i_DD(t)      Instantaneous supply current
I_x(i,j)     DBNS representation of operand X
I_y          Output current of a CNN cell
I_y(i,j)     DBNS representation of operand Y
I_z(i,j)     DBNS representation of sum out Z
k            Average number of nonzero digits
L            Transistor length
L            Digit set of the binary signed digit
L_eff        Effective parasitic inductance
m            Milli
m            Meter
M            Number of rows in a CNN grid
m_i          Scaling factor of a current mirror
n            Nano
n            Number of digits in a number
N            Number of columns in a CNN grid
N(i,j)       Neighborhood of CNN cell C(i,j)
O_minus      Negative output current of reduction rule
O_plus       Positive output current of reduction rule
p            Pico
P            Product of multiplication
P_ij         Partial product of digit i of operand X and digit j of operand Y
P_j          Product of operand X and digit at position j of operand Y
q(r,t)       Transient charge density
r            Radius of the CNN neighborhood
R_i          Resistor i
R_x          Input resistance of a CNN cell
R_y          Output resistance of a CNN cell
S            Sum out of an addition
s_i          Digit at position i of the sum out S
T            Time constant of a CNN cell
T_delay      Maximum delay of DBNS adder


t_i          Transfer digit at position i
u_ij         Input to CNN cell C(i,j)
V(r,t)       Transient voltage vector
V_bias       Bias voltage
V_DD         Power supply voltage
V_eff        Effective voltage on chip
V_in         Input voltage
V_out        Output voltage
V_ref        Reference voltage
V_state      State voltage of a CNN cell
W            Transistor width
w_i          Intermediate sum at position i
X            An operand to an arithmetic operation
x_i          Digit at position i of the operand X
x_i,j        Digit in position (i,j) in a DBNS number
x_ij         State voltage of CNN cell C(i,j)
Y            An operand to an arithmetic operation
y_i          Digit at position i of the operand Y
y_ij         Output voltage of CNN cell C(i,j)
Z            Instantaneous sum out
z_i          Digit at position i of the instantaneous sum out Z


List of Abbreviations

2D           Two dimensional
A/D          Analog-to-digital
ARDBNR       Addition-ready double-base number representation
BC           Before Christ
BSD          Binary signed-digit
BSDNS        Binary signed-digit number system
CDBNS        Canonic double-base number system
CD-ROM       Compact disk-Read only memory
CMOS         Complementary metal-oxide semiconductor
CNN          Cellular neural network
CNNBFA       CNN-based binary full adder
CNNBSDFA     CNN-based signed-digit full adder
CNNDBNSAU    CNN-based double-base adder unit
CNN-UM       Cellular neural network universal machine
CSD          Canonic signed-digit
CT-CNN       Continuous time CNN
DBNS         Double-base number system
DC           Direct current
DSN          Digital switching noise
DSP          Digital signal processing
DT-CNN       Discrete time CNN
ECC          Elliptic curve cryptography
FA           Full adder

FIR

Finite impulse response

FLP

Floating-point

IC

Integrated circuit

IDBNS

Index calculus double-base number system

IEEE

Institute of Electrical and Electronics Engineers

IIR

Infinite impulse response

JSF

Joint sparse form

LNS

Logarithmic number system

LSB

Least significant bit

MAC

Multiply-accumulate

MAF

Multiplication-add fused

MCM

Multiple constant multiplication

MDLNS

Multidimensional logarithmic number system

MSB

Most significant bit

NCDBNR

Near-canonic double-base number representation

NN

Neural network

PS

Porous silicon

RF

Radio frequency

SD

Signed-digit

SDFA

Signed-digit full adder

SDNS

Signed-digit number system

SoC

System-on-a-chip

SOI

Silicon-on-insulator

SSN

Simultaneous switching noise

TFSOI

Thin-film silicon-on-insulator

VCCS

Voltage controlled current source

VLSI

Very large scale integrated (or integration)

Chapter 1
Introduction

The objective of this research project is to develop novel arithmetic
circuit structures using arrays of analog networks and the cellular
neural network paradigm. The motivation behind this work
combines an exploration of how this novel concept applies to a
variety of arithmetic techniques with a more practical investigation
into the use of these networks for implementing arithmetic circuits
that produce very low di/dt noise, as explained in Section 1.1. The
digital switching noise and cross talk problems are defined in
Section 1.2. A review of the research literature that addresses the
noise problem is also presented in that section. Reasons to use the
CNN paradigm are discussed in Section 1.3 and the state-of-the-art
CNN-based binary adders are presented in Section 1.4. The design
goals of the arithmetic circuits are introduced in Section 1.5. The
thesis is outlined in Section 1.6 and Section 1.7.

1.1

Low-noise - motivation

The migration to sub 100 nanometer process technologies and the
advances in fabrication processes have allowed the packing of ever
increasing complex functionality into fewer chips on circuit boards
by the use of massive integration of circuitry onto single silicon die
using system-on-a-chip (SoC) tools and technologies. Despite the
impressive reductions in area and cost, signal integrity remains a critical issue when
integrating analog and digital components onto the same chip; for example, digital
circuitry adjacent to sensitive bio-sensors in a SoC Bio-Platform, as depicted in Figure
1.1. One of the parasitic effects that adversely influences signal integrity is digital
switching noise (DSN) produced by the supply current drawn by fast switching digital
components. This noise propagates across the common Si substrate and can easily corrupt
sensitive analog signals. Noise can also couple to the substrate capacitively, a
phenomenon known as crosstalk. The noise problem is amplified as operating frequencies
increase and feature sizes decrease, and it is primarily responsible for otherwise inexplicable
design failures and poor yields of mixed-signal SoC designs [1].

Figure 1.1 Simplified block diagram of a bio-sensor SoC chip (blocks: sensor control and power, sensor array, conditioning circuitry, A/D, and digital processor).

The work presented in this thesis couples different digital number representations with
low precision simple current-mode analog components in a novel way that combines the
computational capability of analog circuits and noise immunity of digital components. In
essence, we are building digital arithmetic circuits but using analog components to replace
uncontrolled digital transitions (produced by the digital processor of Figure 1.1) with
smooth analog transitions as illustrated in Figure 1.2. Digital arithmetic is converted into
a problem of processing 2-D binary/ternary images, and novel CNN structures are
designed to manipulate these images to perform the required arithmetic task.

Figure 1.2 Reducing system noise in the bio-sensor with smooth analog transitions (the digital processor of Figure 1.1 is replaced by an analog implementation).

1.2

Substrate Noise

Digital switching noise is one of the major sources of trouble in a typical mixed-signal
VLSI circuit design. When many static gates change state together, they draw a large
cumulative current from the power supply. Due to the self-inductance of the off-chip
bonding wires and package pins and the on-chip parasitic inductance inherent to the power
supply rails, as shown in Figure 1.3, the fast current surges result in voltage fluctuations
in the power distribution network [2]. The effective supply voltage on chip is given by the
following equation:

$$ V_{eff} = V_{DD} - L_{eff}\,\frac{di}{dt} \qquad (1.1) $$

The second term on the right-hand side of Eqn. (1.1) is referred to as digital switching
noise (DSN), simultaneous switching noise (SSN), inductive (L di/dt) noise, or ΔI noise. A
fraction of this noise is invariably injected into the substrate.

Figure 1.3 Lumped model of the substrate coupling.

The presence of parasitic capacitance between the transistors and the silicon substrate
contributes significantly to the problem. When digital circuits switch, they inject current
into the substrate via these capacitances. The amount of injected current is directly
proportional to the slew rate of the switching voltage and the lumped parasitic capacitance
according to Eqn. (1.2); this will be referred to as “cross-talk noise”.

$$ i = C\,\frac{dv}{dt} \qquad (1.2) $$

Therefore, substrate noise increases as the operating frequency increases. Moreover, the
scaling down of feature sizes increases the total capacitance associated with the internal
circuitry [3]. With the number of transistors on a chip expected to reach over 600 million
by 2009 [4], the amount of injected noise increases dramatically.

The common substrate on which both digital and analog circuits are embedded serves as a
resistor network which can be modelled using a simplified form of Maxwell’s equations [1]:

$$ \frac{1}{\rho}\nabla^2 V(r,t) + \varepsilon\,\frac{\partial}{\partial t}\left(\nabla^2 V(r,t)\right) = \frac{\partial}{\partial t}\,q(r,t) \qquad (1.3) $$

where ρ is the resistivity and ε is the permittivity of the uniformly doped semiconductor,
V(r, t) is the transient voltage vector, and ∂q(r, t)/∂t is the rate of generation of charge per
unit volume at location r = (x, y, z) on the substrate. Consequently, voltage variations
around the injection points propagate through the substrate, and potential gradients arise due
to the resistive nature of the substrate. Assuming a 3-D semi-infinite substrate that extends to
infinity in all but one of the six spatial directions, the solution to Eqn. (1.3) in the Laplace
domain for the voltage V2 at any point on the substrate, due to a current I injected into
the substrate a distance r away, is:

$$ V_2(s) = \frac{\rho\, I(s)}{2\pi r\,(\rho\,\varepsilon\, s + 1)} \qquad (1.4) $$

The time-variant substrate voltages are sensed by MOSFETs through the body effect and
transferred to signal paths in consequence of current fluctuations or gain mismatches in
analog circuits. On the other hand, the sub-threshold current increase due to the body bias
change may seriously degrade digital signal integrity and thus cause dynamic operation
failures [5]. On-chip DSN can also create delay uncertainty since the power supply level
temporally changes the local drive current [6]. Furthermore, logic malfunctions may be
created and excess power may be dissipated due to faulty switching if the power supply
fluctuations are sufficiently large [7][8]. Predicting how and when this will happen is a
difficult problem, since it is highly dependent on the specific layout and process
technology used. Therefore, it is crucial in today’s sub 100 nanometer technology to
reduce the peak value of the dynamic current provided by the supply source, iDD, which is
proportional to the carrier injection into the substrate [9]. Moreover, by monitoring the
instantaneous power supply current, designers can determine a time window for the
worst-case substrate current injection [1]. For example, Figure 1.4 shows the Hspice simulation
of a standard digital inverter in 0.35 µm CMOS technology during one period of
excitation. The simulation also illustrates the instantaneous power supply current drawn
by the inverter, digital switching noise, and cross talk noise.

Figure 1.4 Hspice simulation of a standard digital inverter, showing the input, output, supply current, L di/dt, and C dv/dt waveforms versus time.

The rapidly growing SoC market presents an urgent need for highly effective solutions for
the substrate noise problem. Research in this area can be broadly classified into three main
categories: process fabrication techniques, physical design and layout techniques, and
innovative digital circuit design techniques.

Process fabrication techniques: The goal of process technologists, regarding the noise
problem, is to prevent the substrate from working as a noise-coupling path by increasing
its resistance, ultimately to infinity. They employ a number of expensive techniques to
control the substrate noise problem. One technique employs a deep trench (through-the-wafer)
of porous Si (PS) to provide radio frequency (RF) isolation in Si between noise
generating and noise sensing circuits [10]. Traditional guard-rings [11][12] have very
limited effectiveness in suppressing the underlying substrate noise due to the fact that they
are very shallow structures on the wafer surface. However, a Faraday cage consisting of a
ring of high-aspect ratio substrate vias encircling noisy or sensitive circuits results in
improved performance [13]. A popular technique involves creating a deep N-well
structure where active devices are insulated from the substrate by a buried implant layer
[12]. Experimental results indicated that an improvement of 25-30 dB can be achieved by
applying a relatively low-fluence proton bombardment on the isolation-intended region
[14]. The expensive option based on silicon-on-insulator (SOI) wafers assures full DC
isolation; however, it fails to maintain its advantage in the high frequency AC regime [15].
Still another technique is to deploy more costly thin-film silicon-on-insulator (TFSOI)
technology where p+ substrate contact rings are used to improve the cross-talk isolation
[16] [17]. These techniques can help improve the intrinsic noise immunity of SoC devices,
but have limited effectiveness at elevated frequencies. Thus, although the use of one or
more of these fabrication techniques can reduce substrate noise, process remedies alone
are insufficient to ensure a design's immunity to substrate noise coupling [18].

Physical design and layout techniques. Physical design and layout techniques
concerning noise immunity primarily aim to reduce the parasitic inductance associated
with the power supply network and package pins, minimize the parasitic capacitance
between transistors and the substrate, and attenuate noise coupling from one area on a chip
to another. It is common practice among analog designers to use separate supplies for
digital and analog sections of the chip to isolate the sensitive analog components from
noise introduced on the digital supplies [19]. The same technique is useful to isolate
different sensitive blocks. Dividing a chip into sections with different substrate grounds
will mitigate noise coupling [20]. Researchers have also found that using multiple digital
and analog pins can achieve the largest noise reduction. Decreasing the value of the
inductance of the bonding wire widens the bottleneck which reduces ground noise [2].
Proper choice of substrate contact geometry and placement plays a major role in substrate
noise distribution [21] and a careful design of power line geometry and supply network
distribution can greatly reduce parasitic inductance [22]. While relative placement of the
logic and analog blocks affects the amount of noise coupling [19], analog layout
techniques such as mirror symmetry and common-centroid geometries increase the noise
immunity of analog circuitry [9]. Adding a dedicated backplane substrate contact can
substantially drain injected noise [23]. A first and excellent experimental study on the
impact of physical design on substrate coupling noise is presented in [24] and a wealth of
industry examples to highlight isolation impacts of technology can be found in [25].

Digital design techniques. Regardless of measures taken to minimize noise coupling
from digital sections to analog sections on a chip, digital circuitry can still produce
significant transient noise. For example, the L di/dt noise is estimated to reach 0.35 volt
in 0.1 µm CMOS technology with 1.2 volt power supply [26]. This peak noise seriously
degrades signal integrity and can easily cause dynamic operation failure. Therefore
minimizing on-chip noise is an important element in the effort to improve a design's noise
immunity in high performance mixed-signal integrated circuits. Some techniques have
been reported that can reduce the effect of DSN. Adding decoupling capacitance can
reduce the amount of noise created by supplying local charge for nearby switching and
thus lowering the peak current drawn across the package inductance [27]. Building a
simple RC filter can leak out DSN with selected frequency roll-off [28]. A negative
feedback loop can also be formed by sampling the noise and re-injecting it into the
substrate with inverted phase. This inverted noise can reduce the substrate noise for low
frequency operations [29]. Using divided switches with current control can also reduce
switching noise by controlling the current slope [30]. A number of low-switching-noise
digital CMOS families have also been reported: current steering logic [31], folded source-coupled
logic [32], NMOS current-balanced logic [33], and cellular neural networks
[34] [35]. However, static power consumption is the main penalty of such structures.
Furthermore, some actions at the system level can be taken to minimize switching
activities, for instance alternative architectural allocation and scheduling [9], reducing
switching activity by pin swapping [36], and the right choice of the clocking scheme [37].

1.3

CNN-based Arithmetic Circuits - Rationale

Among all of the above methods, the use of cellular neural networks is quite interesting
for several reasons:

• they use analog circuit blocks with inherently lower system noise,
• the asynchronous nature of the CNN arrays provides additional noise reduction,
• the noise reduction is independent of the traditional noise reduction methods (e.g., guard
rings) and thus can be used in combination with them, and
• the regular structure of the arrays and the locality of connections make them an excellent
choice for VLSI implementation.

CNN arrays inherently reduce system noise because their current-mode structures operate
with almost constant supply current, thus reducing variation in supply current and, hence,
switching noise. Moreover, the nodes are built with analog building blocks and effectively
provide controlled slewing. In digital logic, by contrast, the outputs of logic gates switch
rapidly between logic states, and this switching rate is independent of the clock
rate of the input logic signals. To illustrate this idea, Hspice simulations of the instantaneous
values of L di/dt and C dv/dt from a standard CMOS static digital inverter are compared
to those from a CNN cell (which can be used as an inverter by forcing its operation in the
saturation mode and using the negative output of the cell) in Figure 1.5 and Figure 1.6,
respectively. A parasitic inductance of 1 nH is used to calculate the DSN while a parasitic
capacitance of 2 pF is used in Eqn. (1.2). The CNN circuit significantly suppresses
the noise in both cases. The advantage of using CNN arrays becomes greater as the circuit
size increases.
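
To give a feel for the scale of the two noise terms, the following minimal Python sketch evaluates the peak L di/dt and C dv/dt values for an idealized linear ramp, using the 1 nH and 2 pF parasitics quoted above. The ramp shape, the 1 mA current swing, the 3.3 V voltage swing, and the 0.1 ns and 2 ns edge times are illustrative assumptions and are not taken from the Hspice simulations.

```python
# Illustrative sketch: peak L*di/dt and C*dv/dt for an idealized linear ramp.
# Only the 1 nH and 2 pF parasitics come from the text; every other number
# below is a hypothetical value chosen for illustration.

L_PARASITIC = 1e-9    # parasitic inductance, 1 nH
C_PARASITIC = 2e-12   # parasitic capacitance, 2 pF


def ramp(t, t_edge, swing):
    """Linear transition from 0 to `swing` over `t_edge` seconds, then flat."""
    if t <= 0.0:
        return 0.0
    if t >= t_edge:
        return swing
    return swing * t / t_edge


def peak_slope(samples, dt):
    """Largest absolute finite-difference slope of a uniformly sampled waveform."""
    return max(abs(b - a) / dt for a, b in zip(samples, samples[1:]))


def peak_noise(parasitic, t_edge, swing, dt=1e-12, t_end=10e-9):
    """Peak of parasitic * d(signal)/dt for the ramp defined above."""
    n = int(round(t_end / dt))
    samples = [ramp(k * dt, t_edge, swing) for k in range(n + 1)]
    return parasitic * peak_slope(samples, dt)


if __name__ == "__main__":
    for label, t_edge in (("fast edge (0.1 ns)", 0.1e-9), ("slow edge (2 ns)", 2e-9)):
        dsn = peak_noise(L_PARASITIC, t_edge, swing=1e-3)    # volts
        xtalk = peak_noise(C_PARASITIC, t_edge, swing=3.3)   # amperes
        print(f"{label}: L di/dt = {dsn * 1e3:.2f} mV, C dv/dt = {xtalk * 1e3:.2f} mA")
```

Under these assumptions both peaks scale inversely with the edge time, so an edge that is 20 times slower produces 20 times less peak noise for the same swing.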

Figure 1.5 Hspice simulation of di/dt for a digital inverter and a CNN cell (horizontal axis: time in ns).

Figure 1.6 Hspice simulation of dv/dt for a digital inverter and a CNN cell (traces: Digital and CNN; horizontal axis: time in ns).

In addition, CNN arrays permit a direct trade-off between speed and cross talk (see
Section 2.3 for details). Increasing the integration time constant decreases the slope of the
output voltage and, hence, the cross talk. Figure 1.7 shows Hspice simulations of the
output voltage of a CNN cell for different time constants.
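
The trade-off can be illustrated with a first-order model. The sketch below assumes, as a simplification that is not taken from the thesis, that the cell output settles like a single-pole RC response v(t) = V(1 - exp(-t/τ)). Its steepest slope is then V/τ, so the peak C dv/dt cross talk falls in proportion to τ while the settling time grows in proportion to τ. The 2 pF capacitance is the value quoted earlier in this section; the 3.3 V swing and the time constants are hypothetical.

```python
import math

C_PARASITIC = 2e-12  # parasitic capacitance (2 pF), as quoted earlier in this section


def output_voltage(t, v_final, tau):
    """Single-pole step response v(t) = v_final * (1 - exp(-t / tau))."""
    return v_final * (1.0 - math.exp(-t / tau))


def peak_crosstalk(v_final, tau, c_par=C_PARASITIC):
    """Peak of C * dv/dt; the slope is steepest at t = 0, where it equals v_final / tau."""
    return c_par * v_final / tau


def settling_time(tau, fraction=0.9):
    """Time for the response to reach `fraction` of its final value."""
    return -tau * math.log(1.0 - fraction)


if __name__ == "__main__":
    v_final = 3.3  # hypothetical output swing in volts
    for tau in (0.5e-9, 1e-9, 2e-9):  # hypothetical time constants in seconds
        print(f"tau = {tau * 1e9:.1f} ns: "
              f"peak C dv/dt = {peak_crosstalk(v_final, tau) * 1e3:.2f} mA, "
              f"90% settling time = {settling_time(tau) * 1e9:.2f} ns")
```

In this model, doubling the time constant halves the peak injected current and doubles the settling time, which is the speed versus cross talk trade-off in its simplest form.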

Figure 1.7 Hspice simulation of a CNN cell output voltage for different time constants (curves ordered in the direction of increasing RC; horizontal axis: time in ns).

1.4

Existing CNN-based Arithmetic Circuits

Two CNN-based binary adders, the flat adder and the recursive adder, were previously
introduced in [34] and [35], respectively. In the flat structure, the addition of two N-bit
binary numbers $A = a_N a_{N-1} \ldots a_2 a_1$ and $B = b_N b_{N-1} \ldots b_2 b_1$ is performed through
successive conversion of the given addition operation to another equivalent addition using
the rules:

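$$ c_i = \begin{cases} 0, & i = 1 \\ a_{i-1}\, b_{i-1}, & 2 \le i \le N + 1 \end{cases} \qquad (1.5) $$

$$ s_i = a_i \oplus b_i \qquad (1.6) $$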

As a result of each conversion step, one digit of the result is obtained and, in the worst
case, the complete result is obtained in N + 1 steps. Implementing this algorithm in CNN
required assigning a row of N + 1 cells for each original operand. Considering that in the
worst case N + 1 steps are needed to complete the addition, the network consisted of
2(N + 1) rows. However, as addition proceeds, more digits of the first operand become
zero and more digits of the second operand attain final value. Therefore, cells
corresponding to these digits are not required in the CNN implementation. The optimized
network structure consists of N(N + 1) cells, as shown in Figure 1.8 for the addition of
two 4-bit numbers. Two major drawbacks of this design are the huge silicon area required,
which grows as $O(N^2)$, and the consequently large power consumption.
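
The conversion procedure can be sketched in software. The following Python function is a minimal sketch which assumes that each conversion step replaces the operand pair by its bitwise carries (shifted one position toward the MSB) and its bitwise sums. This is one reading of the rules above that is consistent with the behaviour described in the text; it models only the arithmetic, not the CNN dynamics or the templates of [34]. The name `flat_add` is hypothetical.

```python
def flat_add(a, b, n_bits):
    """Add two n_bits-wide unsigned integers by successive conversion.

    Each step maps the pair (first operand, second operand) to
    (bitwise carries shifted left by one, bitwise sums), which leaves the
    total unchanged. The loop stops when the first operand is all zeros.
    Returns the sum and the number of conversion steps that were needed.
    """
    limit = 1 << n_bits
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in n_bits")
    first, second = a, b
    steps = 0
    while first != 0:
        first, second = (first & second) << 1, first ^ second
        steps += 1
    return second, steps


if __name__ == "__main__":
    total, steps = flat_add(0b1111, 0b0001, 4)
    print(f"sum = {total:05b}, conversion steps = {steps}")
```

After k steps the k least significant digits of the first operand are zero and the corresponding digits of the second operand are final, so at most N + 1 steps are needed. Adding 15 and 1 as 4-bit operands, for example, takes the full N + 1 = 5 steps because a carry ripples through every position.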

Figure 1.8 The flat binary adder.

To save on silicon area, the author implemented a recursive adder in [35] that requires four
rows of N + 1 cells. In this structure, the information is allowed to flow from the first two
rows back to the last two rows. Unlike the flat structure, which performs quite robustly for
a wide range of template values, the template values here must be chosen carefully to avoid
divergence due to a potential race problem. The structure also lacks speed and stability
due to the recursive operation, as shown in the simulation of Figure 1.9. As addition
proceeds, the height of the generated carries gradually falls. This indicates that the adder
will eventually diverge from the correct sum outputs for large operands. To alleviate this
problem, the author suggested applying a positive velocity vector across the array from the
LSB to the MSB. This was implemented by decreasing the value of the self-feedback
factor a by 0.05 per bit position. This fine-tuning method can work in MATLAB
simulation, where variables can be decremented by virtually any small value. However, its
physical realization is almost impossible because the analog designer is restricted by the
constraints imposed by the technology being used.

Figure 1.9 MATLAB simulation of the recursive binary adder (horizontal axis: time in ns).

Our research work builds on the previous work introduced in [34] and [35] by providing a
novel systematic paradigm for implementing digital arithmetic circuits using analog CNN.
The design methodology presented in the following chapters takes into consideration the
pitfalls of the previous designs and ensures convergence of the network while optimizing
its speed, silicon area, and power consumption.

1.5

CNN-based Arithmetic Circuits Design Goals

There are three major design goals that need to be fulfilled for a practical and successful
design of a CNN-based arithmetic circuit: convergence, scalability, and compatibility.
The first design goal corresponds to the continuous output feedback nature of the CNN
while the other two design goals come from the wide spectrum of applications using
arithmetic circuits and a wealth of arithmetic circuit designs available in the literature.

1. Convergence: This requirement ensures that the developed CNN-based arithmetic
circuits, after the transient period, will always approach one of the stable equilibrium points, as
will be discussed in Section 2.4.2.
2. Scalability: The precision required of arithmetic circuits varies by function. Consider
multiplication as an example. At the low end, 8-bit words are used, as is the case in image
compression algorithms, or 16 bits in more precise DSP tasks. At the high end, the
IEEE double-precision floating-point format uses a 53-bit significand within a 64-bit word.
The scalability requirement ensures that arithmetic circuits of arbitrary size can be
developed to meet the needs of specific applications.
3. Compatibility: The enormous collection of arithmetic circuit designs available in the
literature places a stringent demand on new designs to provide backward compatibility.
Instead of re-inventing the wheel, this requirement guarantees that the developed
CNN-based arithmetic circuits can be used as embedded components in existing, more
complex circuit structures without the need to re-design the whole circuit.

1.6

Thesis Overview

This research work explores the implementation of arithmetic circuits using arrays of
analog circuits and the CNN computing paradigm, and also addresses mixed-signal
applications where the presence of digital switching noise is a major problem. We thus
describe a general technique for building low-noise digital arithmetic circuits using analog
cellular neural networks, essentially implementing asynchronous digital logic with analog
circuits. Each node in our asynchronous architectures uses controlled current sources
driving into capacitors, providing both low current and voltage time derivatives (∂i/∂t
and ∂v/∂t) and, as a result, reducing both instantaneous and time-averaged system and
cross talk noise. In our approach, nonlinear templates are employed to perform the
required arithmetic task without decomposing the arithmetic operation into primitive
linear templates. Utilizing nonlinear templates facilitates performing medium complexity
arithmetic operations with three major advantages. 1) Considerable reduction in
processing time; since the arithmetic task is performed using one nonlinear template, the
time needed to load/unload different templates with their inputs and initial conditions is
eliminated. 2) Simplification of the circuit; this is because the control logic traditionally
required to control template operations is no longer needed. 3) Decrease in power
consumption; this result comes straightforwardly from the fact that power consumption is
directly related to processing time. Therefore, reducing processing time translates into
lower power consumption. Moreover, removing the control logic from the circuit structure
reduces power consumption further by the amount needed by the control logic.

To demonstrate the effectiveness of our methodology, we have designed and simulated
CNN arrays for arithmetic operations using binary, binary signed-digit, and double-base
number systems. First, we re-defined the arithmetic task in the given number system using
continuous functions which are mapped into nonlinear templates. We then designed and
simulated a CMOS circuit implementation of the arithmetic operation. We have
demonstrated that the designed structures, regardless of the number system being used, are
quite modular, which enables the accurate evaluation of the performance of larger
networks. We have also presented other novel contributions, including the introduction of
a new class of CNN featuring a 3-state transfer characteristic and an innovative
self-programmable array using a novel feedback connection between groups of cells. We have
analyzed the performance of the designed circuits in terms of power
consumption, delay, and area. Finally, we have illustrated the efficiency of our designs in
suppressing noise by comparing them to standard digital implementations.

1.7

Thesis Organization

The thesis is organized as follows. Chapter 2 provides the basic theory of CNN arrays
required to understand the work presented in subsequent chapters. Chapter 3 to Chapter 5
introduce general procedures to develop arithmetic circuits for the three number systems
described in Section 1.6. Each chapter analyzes the corresponding arithmetic operations,
defines the required templates, and provides Hspice simulations to demonstrate
convergence of the designed circuits. We also present the designs of multi-bit adders and
multipliers to illustrate the scalability and compatibility of each algorithm in more useful
arithmetic tasks. Each chapter also examines the impact of the corresponding design on
system noise using extensive Hspice simulations. Finally, Chapter 6 summarizes the work
and provides a detailed comparison of the performance of each design in terms of noise,
area, delay, and power consumption. Chapter 6 also presents the final conclusions.

Chapter 2
Cellular Neural Networks: An Overview

Cellular Neural Network (CNN) arrays represent a massively
parallel asynchronous computing paradigm that is a hybrid of
Cellular Automata (CA) and Artificial Neural Networks (ANNs).
CNN arrays take advantage of both worlds: their local
connectedness makes the arrays well suited for VLSI
implementation, and, similar to ANNs, they provide a natural
parallel processing paradigm. The CNN regular grid-like structure
also makes it a good candidate for online solutions of systems of
first order non-linear differential equations. CNNs represent an
analog nonlinear dynamic system operating in continuous or
discrete time. When considered as a system, a CNN is characterized
by the fact that information is directly exchanged only between
neighboring neurons. This characteristic does not, however, prevent
global processing: cells that are not in
the immediate neighborhood have an indirect effect because of the
propagation effects of the dynamics of the network. By exploiting
locality of connections, electronic IC and optical or electro-optical
implementations become feasible, even for large nets, which is the
main advantage of CNNs over ANNs. Since the research work
presented in this thesis is built on CNN arrays, the purpose of this
chapter is to acquaint the reader with the basic CNN theory needed for
subsequent chapters. A brief history of CNNs and the scope of
applications is given in Section 2.1. The spatial layout and restrictions on connections
between neighboring cells are described in Section 2.2. In Section 2.3, the mathematical
equations defining the behavior of a CNN cell are reviewed and a general circuit
architecture of a CNN is given. The dynamics of CNNs are discussed in Section 2.4, and
modes of operation and notes on CNN stability are also presented.
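
The locality property can be made concrete with the commonly used r-neighborhood definition, in which cell C(i,j) is directly connected to every cell whose row and column indices each differ from its own by at most r. The sketch below assumes this definition and 1-based indices on an M x N grid. The function names and the grid size in the example are hypothetical; the definition adopted in this thesis is the one given in Section 2.2.

```python
def neighborhood(i, j, r, rows, cols):
    """Return the cells C(k, l) with max(|k - i|, |l - j|) <= r on a rows x cols grid.

    Indices are 1-based and the cell C(i, j) itself is included. Cells that
    would fall outside the grid are dropped, so edge cells have fewer neighbors.
    """
    if not (1 <= i <= rows and 1 <= j <= cols):
        raise ValueError("cell index outside the grid")
    return [(k, l)
            for k in range(max(1, i - r), min(rows, i + r) + 1)
            for l in range(max(1, j - r), min(cols, j + r) + 1)]


def hops_between(cell_a, cell_b, r):
    """Minimum number of neighbor-to-neighbor hops for cell_a to influence cell_b."""
    distance = max(abs(cell_a[0] - cell_b[0]), abs(cell_a[1] - cell_b[1]))
    return -(-distance // r)  # ceiling division


if __name__ == "__main__":
    print(len(neighborhood(3, 3, 1, 5, 5)), "cells in the r = 1 neighborhood of C(3,3)")
    print(hops_between((1, 1), (5, 5), 1), "hops from C(1,1) to C(5,5) with r = 1")
```

Cells outside the neighborhood are reached only indirectly: the hop count above is a lower bound on how many neighbor-to-neighbor interactions are needed before one cell can affect another.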

2.1

CNN History and Applications

Since their introduction in 1988 by Chua and Yang [38][39], Cellular Neural Networks
have attracted considerable attention. They are well suited for image processing
applications, because of their two-dimensional structure and local interconnections, which
are typical characteristics of many image processing algorithms [40]-[42]. Enormous
advances have been made by many researchers in this field [43]. While software
prototypes prove the potential of CNN [44]-[46], a great deal of research has also been
reported in hardware implementations which can be used for real-time applications. These
implementations include transconductance-mode based processing elements [47],
switched-current signal processing elements [48], discrete-time implementations [49] [50],
current-mode implementation [50][51] and more [52]-[54]. The first CNN realizations
were designed to perform one specific function in image processing or classification, such
as edge detection [40], connected component detection [55], noise removal [39], or hole
filling [56]. More complex image processing functions are also reported including image
and video compression [57]-[60], image rotation [61]-[62], nonlinear image filtering
[63]-[66], image enhancement [67]-[69], image restoration and reconstruction [70]-[72], image
segmentation [73]-[75], pattern matching and classification [76]-[78], and character/face
recognition [79]-[81]. CNN arrays have also been applied to a wide variety of important
tasks in robot navigation [82]-[84], motion detection and estimation [85]-[88], defect
inspection [89]-[92], satellite communication and secure transmission systems [93]-[97],
analysis of brain electrical activity in epilepsy [98]-[100], cryptography [101], bionic
eyeglasses [102]-[103], and solving partial differential equations and optimization
problems [104]-[107], just to mention a few. More recently, researchers investigated
programmable CNNs to provide flexibility in implementing analog parallel array
processors [53],[108]. In other work, the slope and the threshold of the activation function
have also been made tunable [109]-[110]. Because of the thresholded activation function at
the output of the CNN structure, many image processing applications have traditionally
been based on black and white images even though CNN arrays, by their nature, are
analog, continuous processing systems. However, grey/color based applications have
recently started to emerge [61], [66], [111]-[113]. From a hardware point of view, the
local connectivity of the CNN array lends itself to practical VLSI implementations. The
addition of logic functions results in a programmable analog/logic array computer capable
of performing algorithms that combine the strengths of analog template processing and
logic operations [41],[114]. In fact, CNN arrays set the platform for a new algorithmic
style based on the spatio-temporal properties of the array. The key elementary instruction
is a spatio-temporal transient generated by a two dimensional nonlinear dynamic
processor array. This basic instruction resembles the typical convolutional operator used
in image processing applications.

2.2 CNN Structures

The CNN is intrinsically defined spatially; generally only 1- or 2-dimensional space is
considered, so that the CNN can be realized physically. The most common types of CNN
can be characterized as a 2-D planar array of dynamic cells (neurons) with rectangular,
triangular or hexagonal geometry. Any cell on the i-th row and j-th column, C(i,j), is
connected only to cells within a small neighborhood, denoted as N(i,j). For hardware
implementations, and due to the wiring complexity involved, most often neighborhoods
are of radius 1; although, for software simulations, a radius of 3 or more has been reported.
As an example, Figure 2.1 shows two different CNN grids with a neighborhood of radius
1. Note that in this figure, each cell in the rectangular grid is connected to the inputs and
outputs of 9 cells, including to itself, while cells in the hexagonal grid are connected to the
inputs and outputs of 7 cells for the same radius.
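
The cell counts quoted above can be checked with a short sketch. It assumes that a radius-r neighborhood on a rectangular grid contains every cell whose row and column offsets are both at most r, and that on a hexagonal grid it contains every cell reachable in at most r steps. The function names are illustrative and do not come from the thesis.

```python
def rect_neighborhood(i, j, r=1):
    """Cells C(k,l) within radius r of C(i,j) on a rectangular grid.

    Assumes the neighborhood is the (2r+1) x (2r+1) square centered on
    C(i,j), including the cell itself.
    """
    return [(k, l)
            for k in range(i - r, i + r + 1)
            for l in range(j - r, j + r + 1)]


def hex_neighborhood_size(r=1):
    """Number of cells within r steps of a cell on a hexagonal grid.

    Ring k around the center holds 6*k cells, so the total is
    1 + 6*(1 + 2 + ... + r) = 1 + 3*r*(r + 1), center included.
    """
    return 1 + 3 * r * (r + 1)


print(len(rect_neighborhood(5, 5, r=1)))  # rectangular grid, radius 1
print(hex_neighborhood_size(1))           # hexagonal grid, radius 1
```

For radius 1 this gives 9 and 7 cells, matching the counts stated for Figure 2.1.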

Figure 2.1 Examples of rectangular and hexagonal CNN grids with neighborhood
of radius 1. Light grey cells belong to the neighborhood of the dark grey cell.

In a CNN, cells may be all identical or they can belong to a few different types as is the
case for biological neurons. The interconnection strengths or connection weights are
usually spatially invariant. However, more than one connection network may also be
present, with a different neighborhood size to permit short range interactions and
subsystem connections. To ensure that the cells on the perimeter of the CNN grid achieve
proper convergence, dummy border cells (hatched cells in Figure 2.1) are added on the
border of the processing array to simulate interaction with imaginary cells outside the
CNN grid. The size of the dummy border depends on the neighborhood radius. For
example, for a rectangular array with a neighborhood of radius 1, the width of the dummy
border would be 1 cell as depicted in Figure 2.1. A dummy cell outputs a constant voltage
that a properly converged computing cell would produce if it were in its place. A dummy
cell would also receive an input signal voltage as if it were a member of the array.
Therefore, a cell on the perimeter of the CNN array uses the input signal voltage and
dynamic output voltage of neighboring cells as well as the static output and signal
voltages of the dummy cells to arrive at the proper final state. The border cells are treated
as members of the array for initialization purposes and template implementation, but are
not considered in the final state analysis.
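
In a software model, the dummy border can be sketched as constant padding around the processing array. The sketch below is only illustrative: the function name and the padding value of 0.0 are assumptions, whereas in the scheme described above each dummy cell holds the constant voltage that a properly converged computing cell would produce in its place.

```python
def add_dummy_border(grid, r=1, value=0.0):
    """Return a copy of `grid` surrounded by r rings of dummy cells.

    `grid` is a list of equal-length rows. `value` is a placeholder for
    the constant output of a dummy cell (an assumption for illustration).
    """
    width = len(grid[0]) + 2 * r
    border_row = [value] * width
    padded = [list(border_row) for _ in range(r)]          # top rings
    for row in grid:
        padded.append([value] * r + list(row) + [value] * r)
    padded.extend(list(border_row) for _ in range(r))      # bottom rings
    return padded


padded = add_dummy_border([[1, 2], [3, 4]], r=1, value=0)
for row in padded:
    print(row)
```

With a neighborhood of radius 1 the border is one cell wide, so a 2x2 array becomes a 4x4 array whose interior is the original data.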

All cells in the CNN operate in parallel and when one computing cell is allocated to each
pixel in the 2D input signal, the CNN achieves very high signal processing speeds.
Although the cells are only locally connected, the network is able to perform global
operations on the 2-D inputs. This is possible because the missing global connections are
replaced by a time-multiplexing of the connections and the time-propagation of the
information through the network from cell to cell.

2.3 CNN Cell Architecture

CNN cells are multiple-input, single-output nonlinear processors that consist of linear and
nonlinear circuit elements. Each cell is characterized by an internal state variable $x_{ij}$ that
is bounded for all time t > 0. Every cell also has a constant external input $u_{ij}$ and an output $y_{ij}$.
The evolution of the state of cell C(i,j) is described by the first-order nonlinear
differential equation:

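$$C_x \frac{dx_{ij}(t)}{dt} = -\frac{1}{R_x} x_{ij}(t) + \sum_{c \in N_{ij}} A_c\, y_c(t) + \sum_{c \in N_{ij}} B_c\, u_c + I \qquad (2.1)$$

and output function:

$$y_{ij}(t) = f(x_{ij}(t)) = \frac{1}{2}\left( |x_{ij}(t) + 1| - |x_{ij}(t) - 1| \right) \qquad (2.2)$$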

where I is a local value called the bias, and $N_{ij}$ is the r-neighborhood of the cell C(i,j),
which contains all cells within a radius r. The output nonlinearity f(·) is a piecewise
linear function; f is linear in the unit range [-1,1], and outside the unit range the output
saturates to +1 for positive state values and to -1 for negative state values, as shown in
Figure 2.2. $A_c$ and $B_c$ are two generic parametric functionals. The $A_c$ template
connections represent the inter-cell connection weights and provide an output feedback
mechanism. The $B_c$ template connections in turn represent connections to the input and
serve as an input control mechanism. Specific entry values of the bias term and the
feedback and control templates are application dependent and, most often, are identical for
all cells (so-called cloning templates). The constant bias, I, and the cloning templates
determine the transient behavior of the cellular nonlinear network.
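
As a minimal sketch of the cell model, the output function of Eqn. (2.2) and the right-hand side of the state equation can be written as below. It assumes linear templates given as flat lists of weights paired with neighbor outputs and inputs; the function names, argument names and default values of $R_x$ and $C_x$ are illustrative, not taken from the thesis.

```python
def activation(x):
    """Piecewise-linear output function of Eqn. (2.2).

    Linear for -1 <= x <= 1, saturating at +1 and -1 outside that range.
    """
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def state_derivative(x, y_nbrs, u_nbrs, a_tmpl, b_tmpl, bias,
                     r_x=1.0, c_x=1.0):
    """dx/dt of one cell according to the state equation.

    y_nbrs / u_nbrs: outputs and inputs of the cells in the neighborhood
    (the cell itself included); a_tmpl / b_tmpl: matching feedback and
    control weights. r_x and c_x are assumed normalized values.
    """
    feedback = sum(a * y for a, y in zip(a_tmpl, y_nbrs))
    control = sum(b * u for b, u in zip(b_tmpl, u_nbrs))
    return (-x / r_x + feedback + control + bias) / c_x


print(activation(0.5), activation(3.0), activation(-2.0))
print(state_derivative(0.0, [1.0], [1.0], [2.0], [3.0], bias=-1.0))
```

A numerical integrator (for example forward Euler) can call `state_derivative` for every cell at each time step and then update the outputs with `activation`.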

Figure 2.2 CNN cell activation function.

The solution to this system of equations is the classical exponential function of a first-
order system. The maximum convergence rate of a cell is determined by the integration
time constant $C_x R_x$. Therefore the speed of the CNN array can be controlled by adjusting
this value. This property is crucial in controlling the cross talk (see Section 1.3).
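
As a worked illustration, assume that the total current injected into a cell by the templates and the bias is a constant, written here as $I_{tot}$ (a symbol introduced only for this example). The state equation then reduces to a linear first-order equation whose solution is the exponential mentioned above:

$$C_x \frac{dx_{ij}(t)}{dt} = -\frac{1}{R_x} x_{ij}(t) + I_{tot} \quad\Longrightarrow\quad x_{ij}(t) = R_x I_{tot} + \left( x_{ij}(0) - R_x I_{tot} \right) e^{-t/(R_x C_x)}$$

Under this assumption the state approaches $R_x I_{tot}$ with time constant $R_x C_x$, so reducing $R_x C_x$ shortens the settling time proportionally.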

In Figure 2.3, a block diagram that implements Eqn. (2.1) is represented. The cell sums
the incoming signal from the neighbors, itself, and the constant bias of the I template and
integrates them to compute its internal state. The cell also sends two signals to each of its
neighbors: one signal is its output multiplied by a weight from the $A_c$ template; the second
signal is its input multiplied by a weight from the $B_c$ template.

[Block labels: feedback matrix, control matrix, activation function, initial state $x_{ij}(0)$, state $x_{ij}$, output $y_{ij}$.]
Figure 2.3 A block diagram representation of a CNN cell.

The schematic of an electrical implementation of a CNN cell is shown in Figure 2.4. The
linear capacitor $C_x$ and the linear resistor $R_x$ constitute a lossy current integrator. The
output $y_{ij}$ is generated by a nonlinear voltage controlled current source (VCCS) $I_y$ across
the output resistor $R_y$. The VCCS $I_y$ is controlled by the state voltage $x_{ij}$. The linear VCCSs,
$A_c y_c$ and $B_c u_c$, generate current signals that are sent to the states of the neighboring cells
(and itself). The I template is realized by the independent current source I.

Figure 2.4 Schematic of an electrical implementation of a CNN cell.

2.4 CNN Dynamics

Each cell in a CNN is a non-linear dynamic system capable of processing continuous
signals in either continuous-time or discrete-time modes. Continuous-time CNNs (CT-CNNs) are
nonlinear dynamic systems described by differential equations. Discrete-time CNNs (DT-CNNs),
with advantages similar to CT-CNNs in terms of local activity etc., are described by
nonlinear finite difference equations [115]. In most cases, the network is non-Markovian;
i.e., the future internal state depends also on the past history of the system [116]. In the
special case of a time-variant CNN, all the templates, neighborhoods, and parameters can
also be a function of time. These complex dynamic phenomena of CNN arrays have been
studied by many researchers [117]-[119].

2.4.1 Modes of Operation
The CNN can operate in two modes: input-driven mode and autonomous mode.

In input-driven mode, the signal to be processed is applied to the inputs of the CNN cells
and the states of the CNN cells are set to the initial conditions. After the transient has settled, the
results of the computation can be taken from the steady state equilibrium values and the
outputs of the cells.

In autonomous mode, the $B_c$ template is set to zero and the inputs of the CNN are
not used. The signal to be processed is applied as the initial conditions of the network. The
result of the processing operation is contained in the steady state equilibrium values and
the outputs of the network.

2.4.2 Network Convergence
It is well known that the stability of CNN arrays is a critical characteristic for most
applications [120]-[123]. The stability criterion requires that the state of each cell be
bounded for all time t > 0 and that, after the transient has settled down, a cellular neural
network always approaches one of its stable equilibrium points. This last fact is
relevant because it implies that the circuit will not diverge or oscillate. The rate of
convergence is determined by many factors including the amount of current that is
flowing into the cell, the capacitor $C_x$, and the effective resistance $R_x$ in parallel with $C_x$.
The stability of CNN networks is analyzed in [38] and [124] by defining a Lyapunov
function which expresses the generalized energy present in the system. The properties of
this function imply that the states of the network will always evolve towards a constant
DC equilibrium value. Moreover, if the self-feedback term a of the feedback template satisfies the
condition $a > 1/R_x$, each cell state settles at a stable equilibrium point with a magnitude
greater than 1. This means that the outputs of the network will be binary because of the
nonlinear activation function. This characteristic makes the CNN very attractive for some
pattern extracting applications, such as edge detection and connected component
detection, where a binary output image is acceptable. In such cases, the issues of linearity,
precision, and offsets of the output values are not relevant because the state variables are
not of critical importance [51],[114]. However, in some cases, a CNN with linear
continuous observable outputs is required, for example, to build a real time control system
or to obtain an output image with multiple grey or color levels. Since the cell outputs are
limited by the activation function, state variables can be used as continuous outputs [111].
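
The effect of the self-feedback condition can be illustrated numerically with a deliberately simplified model: a single isolated cell with self-feedback a, no input and zero bias, integrated with the forward Euler method. The normalized values $R_x = C_x = 1$, the step size and the function names are assumptions chosen for illustration; this is not the simulation setup used in the thesis.

```python
def activation(x):
    """Piecewise-linear output function of Eqn. (2.2)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def settle(x0, a, r_x=1.0, c_x=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of C_x dx/dt = -x/R_x + a*f(x).

    Models one cell whose only template entry is the self-feedback a
    (no neighbors, no input, zero bias).
    """
    x = x0
    for _ in range(steps):
        x += dt * (-x / r_x + a * activation(x)) / c_x
    return x


# a > 1/R_x: the state leaves the linear region and settles at +/- a*R_x.
print(settle(0.1, a=2.0), settle(-0.1, a=2.0))
# a < 1/R_x: the state decays toward zero and the output is not binary.
print(settle(0.5, a=0.5))
```

In this model, with a = 2 the state settles at a magnitude of about 2, so the output saturates at +1 or -1 depending on the sign of the initial state, whereas with a = 0.5 the state stays inside the linear range.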

2.5 Summary

In this chapter, the basic theory of cellular neural networks is presented. A brief history of
CNN and the range of applications available in the literature are also given. The most
common types of CNN can be defined spatially as a 2-D planar array of dynamic cells
with rectangular, triangular or hexagonal geometry. Dummy border cells are usually
added on the border of the processing array to ensure that the cells on the perimeter of the
CNN grid achieve proper convergence. The behavior of the CNN cell is expressed as a
first-order differential equation and a functional schematic with continuous feedback is
analyzed. The dynamics of the nonlinear systems are discussed and differences between
input-driven mode and autonomous mode are explained. The stability criterion of these
continuous feedback systems requires that the network should always evolve towards a
constant DC equilibrium value.

Chapter 3
Binary Arithmetic Using CNNs

This chapter presents an intriguing technique for building binary
arithmetic circuits using analog cellular neural networks. The
developed circuits also exhibit very low system noise because of
the smooth dynamics of the interconnected nonlinear analog cells.
Moreover, cross-talk can be controlled by adjusting the $R_x C_x$
integration time constant of the cell, as discussed in Section 1.3.
The new designs reduce the peak system noise by up to 57 dB and
cross talk by about 20 dB when compared to traditional CMOS
digital counterparts, developed in the same 0.35 µm CMOS
technology. The chapter is organized as follows. In Section 3.1, the
binary number system is briefly reviewed and the binary addition
algorithm is explained. A systematic procedure to implement a
1-bit binary full adder using an array of analog CNN cells is
introduced in Section 3.2. The binary addition algorithm is first
examined. The sum and carry functions are defined using new
continuous functions. This facilitates implementing the sum and
carry functions using simple basic analog circuits, which is
discussed subsequently. CMOS implementation of the full adder is
then presented. The convergence of the full adder for all possible
inputs is shown using comprehensive Hspice simulations. This last
property guarantees the stability of larger arithmetic networks. The
scalability of the full adder is addressed by designing a 32-bit
CNN-based binary adder in Section 3.3. The compatibility of the full adder design with
existing complex structures is discussed in Section 3.4. The design of a 32x32-bit binary
multiplier is presented to demonstrate that the new full adder can be embedded in existing
architectures without modifying the original circuit. Noise performance of the new
designs is also addressed. The chapter is summarized in Section 3.5.

3.1 The Binary Number System: Overview

3.1.1 Definition
The binary number system is a weighted number system in which any algebraic value X
can be represented by an n-bit vector as:

$$X = \sum_{i=0}^{n-1} x_i 2^i \qquad (3.1)$$

where $x_i \in \{0,1\}$ and the algebraic value X is bounded by $0 \le X \le 2^n - 1$.
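
The weighted representation of Eqn. (3.1) can be sketched in a few lines. The function names and the convention of storing the least significant bit first (so that list index i holds $x_i$) are choices made for this example only.

```python
def bits_to_value(bits):
    """Evaluate Eqn. (3.1): X = sum of x_i * 2**i.

    bits[i] holds x_i, i.e. the list is least significant bit first.
    """
    return sum(x << i for i, x in enumerate(bits))


def value_to_bits(value, n):
    """Return the n-bit vector [x_0, ..., x_(n-1)] representing `value`."""
    if not 0 <= value <= 2 ** n - 1:
        raise ValueError("value does not fit in n bits")
    return [(value >> i) & 1 for i in range(n)]


print(bits_to_value([0, 0, 1, 1, 0]))  # bit pattern 01100
print(value_to_bits(23, 5))            # bit pattern 10111, LSB first
```

The range check in `value_to_bits` mirrors the bound on X for an n-bit vector.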

3.1.2 Binary Addition
Binary addition is the most basic function of binary arithmetic because other binary
arithmetic operations can be decomposed into primitive operations performed using
binary addition. For example, binary subtraction can be calculated by adding the minuend
to the 2's complement of the subtrahend and multiplication can be obtained by adding the
multiplicand to itself a number of times equal to the multiplier. As with decimal addition,
when the result of adding the binary bits at position i reaches or exceeds the value of the binary radix
(2), a carry out is produced and added to the next place i+1 as a carry in. This process is
shown in the binary addition example of Figure 3.1. The example shows the addition of
two 5-bit numbers X=12 and Y=23.
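
Worked column by column, the addition of X = 12 and Y = 23 proceeds as follows. This layout is reconstructed from the arithmetic itself; the row labels are chosen for this illustration, and the carry row lists the carry entering each bit position.

$$\begin{array}{r|cccccc}
\text{weight} & 2^5 & 2^4 & 2^3 & 2^2 & 2^1 & 2^0 \\ \hline
\text{carry in} & 1 & 1 & 1 & 0 & 0 & 0 \\
X = 12 & 0 & 0 & 1 & 1 & 0 & 0 \\
Y = 23 & 0 & 1 & 0 & 1 & 1 & 1 \\ \hline
X + Y = 35 & 1 & 0 & 0 & 0 & 1 & 1
\end{array}$$

The first carry is generated at weight $2^2$, where 1 + 1 = 2 leaves a sum bit of 0 and passes a carry to weight $2^3$. The carry then ripples through weights $2^3$ and $2^4$ and becomes the sixth bit of the result.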

Figure 3.1 An example of binary addition.

The addition process described above can be made modular by using a functional unit
called a full adder (FA). The FA shown in Figure 3.2-a accepts three binary inputs $x_i$, $y_i$,
and $c_i$, and generates two binary outputs $s_i$ and $c_{i+1}$. The operation of the FA can be
described using the following equation:

$$x_i + y_i + c_i = 2c_{i+1} + s_i \qquad (3.2)$$

The solution to Eqn. (3.2) is given by:

$$s_i = (x_i + y_i + c_i) \bmod 2$$
$$c_{i+1} = \lfloor (x_i + y_i + c_i)/2 \rfloor \qquad (3.3)$$

where $\lfloor a \rfloor$ represents the largest integer value such that $\lfloor a \rfloor \le a$.

Since each FA exchanges binary values with its immediate neighbor full adders only, one
can create any arbitrary size n-bit parallel binary adder by cascading n full adders together
as shown in Figure 3.2-b.
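
The full adder relations and the cascade can be sketched behaviorally as follows. This is a plain digital model intended only to illustrate Eqns. (3.2) and (3.3); the function names and the least-significant-bit-first list convention are assumptions of the example.

```python
def full_adder(x, y, c):
    """Eqn. (3.3): return (s_i, c_(i+1)) for binary inputs x_i, y_i, c_i."""
    total = x + y + c
    return total % 2, total // 2


def ripple_add(x_bits, y_bits):
    """Cascade one full adder per bit position.

    Bit lists are least significant bit first; the carry out of position i
    is the carry in of position i+1. The final carry is appended as the
    most significant bit of the result.
    """
    carry, s_bits = 0, []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)
        s_bits.append(s)
    return s_bits + [carry]


# 12 + 23 with 5-bit operands, least significant bit first.
print(ripple_add([0, 0, 1, 1, 0], [1, 1, 1, 0, 1]))
```

Each call to `full_adder` satisfies Eqn. (3.2), which is why the cascade reproduces ordinary integer addition for any operand width.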

Figure 3.2 Block diagram of a binary adder: (a) 1-bit full adder, (b) n-bit binary
adder.

3.2 Designing a 1-bit Binary Full Adder Using CNN (CNNBFA)
In the CNN paradigm, developing a certain function or operation means designing
template connections, both output feedback template and input control template, as well as
the bias to each cell so that, when applying these templates to the CNN network, the
output of the network corresponds to the desired function applied to the input. Binary
arithmetic can be implemented on CNN arrays by decomposing the arithmetic operation
into a set of primitive Boolean functions. Each Boolean function can be mapped into a set
of linear templates as described by Galias [126]. These linear templates would be applied
in sequence, each for a finite time, to obtain the final output of the arithmetic operation.
Another method to implement binary addition using recursive nonlinear templates is
described in [34]. However, the network proposed is sensitive to template values and
diverges for operands larger than 8 bits. In this section, a novel method to implement
binary addition using nonlinear templates is introduced. The algorithm guarantees stability
of the operation and the scalability of the CNN network to perform addition of operands of
arbitrary size is discussed in Section 3.3. Binary multiplication using this technique is
addressed in Section 3.4.

3.2.1 CNNBFA Templates Design
The objective here is to describe the sum and carry outputs of the binary addition
operation described by Eqn. (3.3) as continuous (analog) functions of the values of the
input bits $x_i$, $y_i$, and $c_i$. Thereupon, one can use the new continuous functions as the
templates connecting cells in a CNN network. The binary sum function given in Eqn. (3.3)
can be re-written using the XOR logic function as:

$$s_i = x_i \oplus y_i \oplus c_i \qquad (3.4)$$

The logic XOR function can be defined in the continuous analog domain as the absolute value of
the difference between the two inputs:

$$x_i \oplus y_i = |x_i - y_i| \qquad (3.5)$$

Then the binary sum function can be directly mapped into the continuous analog domain
by substituting Eqn. (3.5) into Eqn. (3.4) twice, yielding:

$$s_i = \left|\, |x_i - y_i| - c_i \,\right| \qquad (3.6)$$
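
The mapping can be verified exhaustively with a short sketch: for every combination of binary inputs, the nested absolute-difference expression of Eqn. (3.6) should agree with the three-input XOR of Eqn. (3.4). The function name is chosen for this example.

```python
from itertools import product


def analog_sum(x, y, c):
    """Continuous sum function of Eqn. (3.6): | |x - y| - c |."""
    return abs(abs(x - y) - c)


def matches_xor_on_binary_inputs():
    """Check Eqn. (3.6) against x XOR y XOR c for all eight input patterns."""
    return all(analog_sum(x, y, c) == (x ^ y ^ c)
               for x, y, c in product((0, 1), repeat=3))


print(matches_xor_on_binary_inputs())
print(analog_sum(0.5, 0.0, 0.0))  # the function is also defined between 0 and 1
```

Unlike the Boolean XOR, `analog_sum` is defined for intermediate input values, which is the property that allows it to be used as a template on analog cell signals.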

Now consider a CNN implementation of the 1-bit binary full adder shown in Figure 3.2-a.
Each of the output variables, $s_i$ and $c_{i+1}$, can be implemented using one CNN cell. The
input variables, $x_i$, $y_i$, and $c_i$, can be applied as input signals to the CNN network. The
input signals must be binary voltages for design compatibility with standard digital
circuits. However, employing current-mode circuits can substantially reduce circuit
complexity and interconnects. For example, adding two signals in current-mode can be
performed with a wired sum without active devices. A current-mode CNN cell can act as
an input buffer that accepts voltage inputs and produces current signals for internal
processing. Observe from Figure 3.2-b that the carry output from weight $2^i$ is the same as
the carry input to weight $2^{i+1}$ and there is no need to use a CNN cell for the carry input.
The final CNN grid is shown in Figure 3.3-a. With this mapping in mind, the template
connections for the binary sum function of Eqn. (3.6) can be written as:

$$A_{ij;kl}(y_{kl}(t)) = P \cdot \left|\, |y_{i+2,j}(t) - y_{i+3,j}(t)| - y_{i+1,j}(t) \,\right| \qquad (3.7)$$

where P is a constant chosen to speed up the transition.

The binary carry function of Eqn. (3.3) can be mapped into the continuous analog domain
using a simple summing node where all the currents from the involved cells are summed
together. The thresholded output nonlinearity of the CNN cell can be utilized to
implement the floor function. However, this requires forcing the CNN cell state voltage to
settle at a value outside the linear range [-1,1]. This can be achieved by subtracting a unit
current instead of dividing by two in Eqn. (3.3). The new equation describing template
connections to the carry cell can then be written as:

(3.8)

Figure 3.3 Representation of the CNN-based 1-bit full adder: (a) CNN grid, (b)
block diagram.

3.2.2 CNNBFA CMOS Basic Building Blocks
The CNNBFA shown in Figure 3.3 consists of four CNN cells that are connected together
using the templates described by Eqn. (3.7) and Eqn. (3.8). The templates can be
synthesized using several primitive current-mode components. Then the CNNBFA can be
constructed by connecting the four basic CNN cells using these primitive current-mode
circuits. The basic building blocks of the CNNBFA are:

Wired-sum: The most attractive feature of current-mode logic circuits is that arithmetic
summation, including polarity, of analog currents can be performed by means of a simple
wired connection without active devices. From Kirchhoff's current law, the current I in
Figure 3.4 is given by:

$$I = I_1 + I_2 \qquad (3.9)$$

This property reduces interconnection complexity and the goal in the design of the
current-mode circuits is to maximize the usage of this operation so that the resulting
arithmetic circuits become quite simple.

Figure 3.4 Schematic of a current-mode summing node.

Current source: A current source delivers one given level of current (unit current).
Current sources are implemented using a pMOS or an nMOS transistor as shown in Figure 3.5.
The current level is adjusted by the transistor ratio W/L and the gate reference voltage. In
the event that different current levels are required in other sections of the circuit, current
mirrors may be adopted as will be discussed next.

Figure 3.5 Schematic of current sources: (a) nMOS current source, (b) pMOS
current source.

Current mirror: Current mirrors constitute the main interconnections between cells and
the speed of the circuit depends largely on the speed of the current mirrors. They are used
for signal distribution by generating replicas of the input current. A replica can be scaled
by using an appropriate output and input W/L ratio. Current mirrors are also used for
inverting the current direction. The simple nMOS and pMOS current mirrors shown in
Figure 3.6 are used in the design of the CNNBFA. The nMOS current mirror is used for
producing replicas while the pMOS type is used for scaling and inverting direction. The
replicas are given by:

$$I_i = m_i I \qquad (3.10)$$

where $m_i$ is the scaling factor and is determined mainly by the W/L ratio of the input and
output transistors.

Figure 3.6 Schematic of simple current mirrors: (a) nMOS current mirror, (b) pMOS
current mirror.

Subtractor: Subtraction can be viewed mathematically as addition of the minuend and
the negative of the subtrahend; $I = I_1 + (-I_2)$. The realization of a subtractor can be
achieved using a wired-sum and a simple current mirror (to invert the direction of the
subtrahend) as shown in Figure 3.7.

Figure 3.7 Schematic of a subtractor.

Absolute function: For correct operation of a uni-directional current-mode circuit, the
input current should flow in a certain direction. The current output of a subtractor depends
on the values of the two inputs. If the minuend is less than the subtrahend, the output
current will be negative; i.e., flows in the opposite direction. The design of the CNNBFA
uses the absolute function presented in [127], and depicted in Figure 3.8 for convenience,
to force the current into the correct direction. The current output of the absolute function is
given by:

(3.11)

Figure 3.8 Schematic of absolute function.

Activation function: The central building block in CNN circuits is the cell and the core
of the cell is the activation function. The state-to-output converter with a current output
presented in [128] is used in the design of the CNNBFA. Note in this implementation,
shown again in Figure 3.9, that one side of the differential pair is used for the
implementation of the nonlinear activation function and the other side is used for signal
distribution to the neighbor cells. The output of the circuit is given by Eqn. (2.2).

Figure 3.9 Schematic of basic CNN cell.

3.2.3 CNNBFA CMOS Implementation
The CNNBFA consists of four CNN cells as shown in Figure 3.3-a. The two input rows
function as input buffers to convert binary input voltages into currents for internal
processing by the other two rows. The template connections for the sum function in Eqn.
(3.7) can be realized using basic building blocks presented in the previous section. The
complete CNN sum cell schematic with connections to neighbor cells is shown in Figure
3.10.

Figure 3.10 Schematic of the CNNBFA sum cell with connections to neighbors.

The operation of the circuit can be directly described using Eqn. (3.7) as follows. The
input signal representing $y_i$ is subtracted from the input signal representing $x_i$ using the
current subtractor circuit of M1-M2. The absolute value of the output of the subtraction is
obtained using the absolute function generator of M3-M5 and mirrored using the current
mirror of M6-M7. This last operation inverts the direction of the signal so that the second
subtraction operation in Eqn. (3.7) reduces to merely addition of the output of the current
mirror M6-M7 and the input signal representing $c_i$. The absolute value of the sum signal is first
obtained using the second absolute function generator of M8-M10 and then mirrored
using the current mirror of M11-M12. The direction of the output current of the current
mirror M11-M12 forces the output of the CNN cell to take one of the two binary values 0
and 1.

The synthesis of the carry function given in Eqn. (3.8) is simpler than the sum function
because it only includes wired addition of current signals. Therefore, there is no need to
use absolute function generators. The carry output is also expected to settle down to the
final value before the sum output because the number of devices in the critical path is smaller.
The carry cell with template connections to neighbor cells is shown in Figure 3.11.

Figure 3.11 Schematic of the CNNBFA carry cell with connections to neighbors.

The operation of the circuit comes directly from the carry function of Eqn. (3.8). The three
input signals $x_i$, $y_i$, and $c_i$ are summed together. The sum of the signals is mirrored using
the current mirror M1-M2. A unit current, implemented using an nMOS current source, is
subtracted from the sum of the three input signals. The rest of the current is drained by
transistor M3 and mirrored again using the current mirror M5-M6 so that the output of the
CNN cell takes one of the two binary values 0 or 1.
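
The carry operation described above can be checked with a small behavioral model. It rests on one explicit assumption that is not stated as a formula in the text: the output stage conducts in one direction only and saturates at one unit of current, which is modeled here by clamping the drive to the range [0, 1]. The function names are illustrative and the model does not represent the transistor-level circuit.

```python
from itertools import product


def carry_drive(x, y, c, unit=1.0):
    """Wired sum of the three input currents minus one unit current."""
    return x + y + c - unit


def carry_output(x, y, c):
    """Assumed uni-directional, saturating output: clamp the drive to [0, 1]."""
    return min(1.0, max(0.0, carry_drive(x, y, c)))


def matches_floor_carry():
    """Compare against c_(i+1) = floor((x + y + c) / 2) for all binary inputs."""
    return all(carry_output(x, y, c) == (x + y + c) // 2
               for x, y, c in product((0, 1), repeat=3))


print(matches_floor_carry())
```

Under this assumption, subtracting one unit and saturating gives the same result as dividing by two and taking the floor for all eight binary input patterns.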

3.2.4 CNNBFA Hspice Simulation
The CNNBFA shown in Figure 3.3 has three binary inputs ($x_i$, $y_i$, and $c_i$) and two binary
outputs ($s_i$ and $c_{i+1}$). To test the design for correct network operation, a truth table of the
functionality of a 1-bit binary full adder is constructed as shown in Table 3.1. The truth
table is divided into four sections according to the number of logic “1s” in the input
patterns. A test circuit is designed using Hspice and parameters from 0.35 µm CMOS
process technology. The inputs from the truth table are applied as a test bench to the
CNNBFA and the output signals are probed for plotting. The worst case delay of the
outputs of Hspice simulations of each section in the truth table is plotted in Figure 3.12.
The squares on the left side of the graph represent the input signals while the squares on
the right side represent the output signals. Logic “1” is represented by a filled square and
logic “0” by an empty square. The delay for the sum and carry signals was measured as the
time it takes the output signal to rise from 10% to 90% of its final value. The worst case
B inary A rith m etic U sing C N N s

D esig n in g a 1-bit B inary Full A d d e r U sing C N N (C N N B F A )

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

36

U niversity o f W indsor

delay was measured as 2.2,5ns and 1.65ns for the sum and carry signals respectively.
Hspice simulations also show that the network is stable and the outputs monotonically
approach their correct steady state values. This also guarantees that multi-bit adders,
discussed in the next section, will also converge since the CNNBFA units are connected to
each other through the carry signal which is stable in itself.

Table 3.1 Truth table of a 1-bit binary full adder.

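x_i   y_i   c_i   |   s_i   c_{i+1}
------------------+----------------
 0     0     0    |    0      0
------------------+----------------
 0     0     1    |    1      0
 0     1     0    |    1      0
 1     0     0    |    1      0
------------------+----------------
 0     1     1    |    0      1
 1     0     1    |    0      1
 1     1     0    |    0      1
------------------+----------------
 1     1     1    |    1      1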

Figure 3.12 Hspice simulation of the CNNBFA.

3.3 CNNBFA Design Scalability

The CNNBFA can be used as an enabling building block to design binary adders with
arbitrary sizes. An $n$-bit binary adder can be obtained by cascading $n$ CNNBFA units as
shown in Figure 3.13. Similar to the traditional 1-bit digital full adder, the carry output
$c_{i+1}$ from the CNNBFA in position $i$ is connected to the carry input of the CNNBFA in
position $i+1$. The number of CNN cells required by the $n$-bit binary adder is $4n$ because
each CNNBFA uses four CNN cells.
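The cascading scheme can be sketched at the logic level as follows. This is a purely behavioural illustration, not a model of the analog CNN cells: `full_adder` stands in for one CNNBFA unit, and all function names are hypothetical.

```python
CELLS_PER_CNNBFA = 4  # from the text: each CNNBFA uses four CNN cells


def full_adder(x, y, c):
    """Logic-level stand-in for one CNNBFA unit: returns (sum, carry out)."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out


def ripple_add(x_bits, y_bits, c_in=0):
    """Cascade one unit per bit position; bit lists are least significant bit first."""
    s_bits, c = [], c_in
    for x, y in zip(x_bits, y_bits):
        s, c = full_adder(x, y, c)   # carry out of position i feeds position i+1
        s_bits.append(s)
    return s_bits, c


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


def cnn_cell_count(n):
    """Number of CNN cells in an n-bit adder built from CNNBFA units."""
    return CELLS_PER_CNNBFA * n


s_bits, c_out = ripple_add(to_bits(200, 8), to_bits(100, 8))
print(from_bits(s_bits), c_out, cnn_cell_count(32))
```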

Figure 3.13 Block diagram of an $n$-bit CNN-based binary adder.

3.3.1 A 32-bit CNN-based Binary Adder
To prove the scalability of the CNNBFA, a 32-bit CNN-based binary adder is designed.
Hspice simulations using random patterns of input signals were carried out and the outputs
of the designed adder were verified. It is known, however, that the sum and carry outputs
at position $i$ depend on the carry input from position $i-1$, which in turn depends on the carry
input from position $i-2$, etc. This means that the worst case for the CNN network
convergence (and maximum delay) would be to add two binary numbers that force the
carry to propagate all the way from the least significant bit to the most significant bit.
Hspice simulations that reflect this situation are shown in Figure 3.14. The figure shows
the addition process of a small part of the 32-bit adder to make the figure readable. The
corresponding operands are X=01111111 and Y=00000001. The simulations demonstrate
that the 32-bit CNN-based binary adder always converges to the correct sum outputs even
for maximum carry propagation.
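The carry chain forced by these two operands can be traced with a short logic-level sketch. The function name `carry_chain` is hypothetical and the model ignores all analog settling behaviour; it only shows which positions produce a carry.

```python
def carry_chain(x, y, n):
    """Carry out of each bit position (least significant first) for x + y."""
    carries, c = [], 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        c = (xi & yi) | (xi & c) | (yi & c)   # carry out of position i
        carries.append(c)
    return carries


x, y = 0b01111111, 0b00000001
print(carry_chain(x, y, 8))      # carry out of positions 0..7
print(format(x + y, "08b"))      # resulting 8-bit sum
```

The carry generated at the least significant position is passed through every following position up to bit 6, so the most significant sum bit cannot settle until the whole chain has resolved.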

Figure 3.14 Hspice simulation of an 8-bit section of the 32-bit CNN-based binary
adder (traces include the MSB carry and the MSB sum; horizontal axis: time in ns).

3.3.2 Impact of the CNN-based Binary Adder on Substrate Noise
The amount of switching noise is calculated as the product of the effective parasitic
inductance and the rate of change of the instantaneous power supply current (i.e., the
current drawn from the power supply by the circuit under test). The parasitic inductance is
mainly determined by the specific layout of the circuit and the process technology used to
implement the circuit. In order to make suitable comparisons, we make the assumption
that the parasitic inductance is identical for both the CNN and CMOS circuits and so the
instantaneous noise voltages are given by Eqn. (3.12):

$$v_{CNN} = L \, \frac{\delta i_{CNN}}{\delta t}, \qquad v_{CMOS} = L \, \frac{\delta i_{CMOS}}{\delta t} \tag{3.12}$$

If we use a measure of noise power as the power dissipated in a given resistor, R, then we
can compute the ratio of this noise power for both the CNN and CMOS noise "sources".
Expressing this in decibels (dB) we find the effective power ratio is given by Eqn. (3.13):

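$$
\left. \frac{P_{CNN}}{P_{CMOS}} \right|_{dB}
= 20 \log_{10} \left( \frac{\delta i_{CNN} / \delta t}{\delta i_{CMOS} / \delta t} \right) \, \text{dB}
= \left( 20 \log_{10} (\delta i_{CNN} / \delta t) \right) - \left( 20 \log_{10} (\delta i_{CMOS} / \delta t) \right)
\tag{3.13}
$$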

In order to better demonstrate the noise differences as a function of time, we will provide
separate graphs of the two computations shown in the lower expression of Eqn. (3.13).
The effective power ratio will be understood to be the difference between the two graphs.
As an aside, we note that although most calculations of noise are statistically based (where
there is often an assumption of stationarity), in our case we are very interested in the time
domain behaviour of what is non-stationary noise, since the switching noise may be
responsible for circuit errors at specific times (such as the incorrect latching of values in
registers at high switching noise events such as clock rise and fall). By taking worst case
time domain values we can make judgments on the relative merits of CNN circuits versus
CMOS circuits at these specific worst-case times.
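As an illustration of how the two terms of Eqn. (3.13) could be evaluated from sampled supply-current waveforms, the following sketch uses a simple finite difference for the rate of change of current. The function names, the `floor` guard against a zero slope, and the sample waveforms are all assumptions made for this example; the waveforms are invented numbers, not simulation results.

```python
import math


def slope(samples, dt):
    """Finite-difference estimate of di/dt from uniformly sampled current."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]


def level_db(samples, dt, floor=1e-30):
    """One term of Eqn. (3.13): 20*log10(|di/dt|) at each time step.

    `floor` is an assumption that keeps log10 defined when the current is flat.
    """
    return [20.0 * math.log10(max(abs(d), floor)) for d in slope(samples, dt)]


def ratio_db(i_cnn, i_cmos, dt):
    """Effective power ratio: difference between the two dB curves."""
    return [a - b for a, b in zip(level_db(i_cnn, dt), level_db(i_cmos, dt))]


# Invented waveforms for illustration only (amperes, sampled every 1 ns).
dt = 1e-9
i_cnn = [0.0, 1e-6, 2e-6]
i_cmos = [0.0, 1e-3, 2e-3]
print(ratio_db(i_cnn, i_cmos, dt))
```

Because the inductance is assumed identical for both circuits, it cancels in the ratio and does not appear in the code.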

In this section, the switching noise of the designed CNN-based 32-bit binary adder is
compared to the switching noise of a 32-bit standard digital binary adder by monitoring
the instantaneous power supply current [1]. Both adders are designed using the same
0.35 µm CMOS process technology and are operated at the same speed, the speed
determined by the slower CNN-based adder. Several Hspice simulations were performed
on random operands and the switching noise of each simulation was recorded. The worst
case switching noise of the CNN-based binary adder is plotted against the worst case
switching noise of the standard digital binary adder in Figure 3.15.

Notice from the figure that the switching noise of the CNN-based adder is less than that of
the standard digital adder at all times during the addition process. This is because the
CNN-based adder tends to smooth out the transitions while the standard digital adder
changes states abruptly. The worst-case scenario for both adders is when the binary inputs
(representing the operands) are first applied. During the first few nanoseconds, switching
noise increases drastically and then drops gradually as the addition proceeds and more bits
settle to the final value. In these simulations, the CNN-based binary adder achieves an
improvement of 50 dB in switching noise over the standard digital binary adder.

Figure 3.15 Switching noise of the CNN-based and standard digital 32-bit binary
adders (horizontal axis: time in ns).

The presence of parasitic capacitance between the switching transistors and the silicon
substrate also increases the substrate noise. The amount of noise injected into the common
substrate is given by Eqn. (1.2) as the product of the lumped parasitic capacitance and the
rate of change of the switching node voltage. The parasitic capacitance is also a function
of the specific circuit layout and the process technology used to fabricate the circuit. The
cross talk of the CNN-based 32-bit binary adder is plotted against that of the standard
digital 32-bit binary adder in Figure 3.16. The CNN-based adder suppresses cross talk by
about 20 dB relative to the corresponding digital adder.

Figure 3.16 Cross talk noise of the CNN-based and standard digital 32-bit binary
adders (traces: Digital and CNN; horizontal axis: time in ns).

3.4 CNNBFA Design Compatibility

The developed CNNBFA has standard inputs (operands and carry in) and standard outputs
(sum and carry out). This property facilitates using the CNNBFA in more complex circuit
structures without the need to change the design of the existing circuit. As an example,
consider designing a standard $n \times n$-bit carry-save tree multiplier. The structure of such a
multiplier is shown in Figure 3.17.

The multiplier uses a carry-save reduction tree of binary full adders to reduce the $n$ binary
numbers of partial products into two binary numbers. The number of levels of the binary
full adders used in the reduction tree is $O(\log n)$. The final stage of the multiplier is a
binary adder that adds together the two binary numbers produced by the carry-save
reduction tree. This specific application demonstrates that the CNNBFA can be used as
an embedded component in the carry-save reduction tree. It also illustrates that the CNN-based
multi-bit binary adder designed in Section 3.3 can also be embedded in the final
stage of the carry-save tree multiplier.
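The reduction principle can be sketched at the logic level as follows. The text does not specify how the full adders are arranged inside the tree, so the grouping of rows in threes used here (a Wallace-style arrangement) is an assumption, as are all function names. Each call to `compress_3_to_2` corresponds to one row of independent full adders operating without carry propagation.

```python
def compress_3_to_2(a, b, c):
    """Bitwise full adders: three operands in, sum word and carry word out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1   # carries move one position left
    return s, carry


def carry_save_multiply(x, y, n):
    """Multiply x by an n-bit y; returns (product, number of reduction levels)."""
    # One partial product per bit of y (zero rows are kept so that n rows exist).
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        reduced = []
        for k in range(0, full, 3):
            reduced.extend(compress_3_to_2(rows[k], rows[k + 1], rows[k + 2]))
        reduced.extend(rows[full:])      # rows left over pass to the next level
        rows = reduced
        levels += 1
    return sum(rows), levels             # final adder combines the last two rows


product, levels = carry_save_multiply(0xDEADBEEF, 0x12345678, 32)
print(hex(product), levels)
```

Each level shrinks the number of rows by roughly a factor of 3/2, which is why the number of levels grows logarithmically with $n$.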

Figure 3.17 Block diagram of a carry-save tree multiplier (operands X and Y, a tree of
full adders (FA), a final adder, and product P).

3.4.1 A 32x32-bit CNN-based Binary Multiplier
Using the CNNBFA and the CNN-based multi-bit adder, a 32x32-bit carry-save binary
multiplier was developed. Extensive Hspice simulations were performed using random
32-bit binary operands. The multiplier outputs of one of the simulations are shown in
Figure 3.18 where, for the sake of clarity, only the first 16 bits are shown. The CNN-based
binary multiplier converged for all simulations and all outputs monotonically
approached their final state values.

Figure 3.18 Hspice simulation of a section of the 32x32-bit CNN-based binary
multiplier (output bits versus time in ns).

3.4.2 Impact of the CNN-based Binary Multiplier on Substrate Noise
The $n \times n$-bit carry-save binary multiplier presented in Section 3.4 consists mainly of 1-bit binary full adders, connected in a carry-save reduction tree, and a multi-bit binary
adder in the final stage. Since the CNN-based addition process improves switching noise,
as discussed in Section 3.3.2, one can conjecture that the CNN-based multiplier, as a
collection of CNN-based addition processes, will also improve switching noise. The
32x32-bit CNN-based carry-save binary multiplier developed in the previous section has
binary inputs and outputs and the multiplication process itself is carried out using parallel
CNN-based addition. To compare switching noise, a 32x32-bit standard digital binary
multiplier was designed using the carry-save structure of Figure 3.17. Numerous Hspice
simulations were performed on random operands and switching noise of each simulation
was recorded. The worst case switching noise of the CNN-based binary multiplier is
plotted against the worst case switching noise of the standard digital binary multiplier in
Figure 3.19.

Figure 3.19 Switching noise of the CNN-based and standard digital 32x32-bit
binary multipliers (horizontal axis: time in ns).

The CNN-based binary multiplier achieves 57dB improvement in switching noise over the
standard digital binary multiplier. This measurement was taken as the difference between
the maximum values of the switching noise of the CNN-based multiplier and the digital
multiplier during one multiplication process. As in the case of binary addition, the
maximum values of switching noise occur at the beginning of the multiplication process
when the binary operands are first applied. Then switching noise decreases as the
multiplication proceeds toward the final product. The CNN-based multiplier smooths out
the transitions over a longer period of time which helps reduce peak switching noise.

3.5 Summary of CNN-based Binary Arithmetic

In this chapter, a general procedure to perform binary arithmetic using analog CNNs was
developed. New equations were derived to perform the sum and carry functions in binary
addition. The new equations were defined as continuous functions to facilitate mapping
them into the analog domain. The equations were synthesized using simple analog circuits
including: summing nodes, current mirrors, and absolute function generators. Following
the general procedure, a 1-bit CNN-based binary full adder (CNNBFA) was developed.
The CNNBFA uses the derived sum and carry continuous functions to describe template
connections to neighboring cells. Hspice simulations of the CNNBFA proved that the
CNN network will converge for all possible binary inputs.

Similar to the standard digital 1-bit full adder, the CNNBFA accepts binary inputs
(operands and carry in) and produces binary outputs (sum and carry out). This property
gives circuit designers the ability to use the CNNBFA as an enabling building block to
develop multi-bit CNN-based binary adders with arbitrary sizes. It also provides circuit
designers with a full adder unit that can be embedded in existing, more complex, circuit
architectures without the need to re-design the circuit blocks.

To illustrate the scalability of the CNNBFA, a 32-bit CNN-based binary adder was
developed by cascading 32 CNNBFA units. The CNN-based adder converged for all test
patterns applied to it. The smooth transitions of the CNN nodes achieved an improvement of
50 dB in switching noise and 20 dB in cross talk over a standard digital adder operating at the
same speed. A 32x32-bit CNN-based carry-save binary multiplier was also developed to
demonstrate the compatibility of the CNNBFA with standard circuit designs. Extensive
Hspice simulations show that the CNN-based multiplier improves switching noise by
57 dB compared to a 32x32-bit carry-save binary multiplier implemented using standard
digital logic. All circuits were developed using the same 0.35 µm CMOS process
technology.

Chapter 4
Binary Signed-Digit Arithmetic Using CNNs

A novel methodology for building CNN binary signed-digit (BSD)
arithmetic circuits using analog cellular neural networks is
described in this chapter. The work extends the concepts developed
in Chapter 3 to introduce an original application of CNNs to binary
signed digit arithmetic. In these architectures, the signed-digit
radix-2 number representation with symmetrical digit set $\{\bar{1}, 0, 1\}$
is coupled with low-precision bi-directional current-mode analog
components in a novel way that combines the computational
capability of analog circuits and noise immunity of digital
components. The structures use a new class of current-mode CNN
that has three stable states to match the three values of the digit set.
Although switching noise is the primary concern, the designs
incorporate all the advantages of signed-digit arithmetic such as
reduced circuit complexity and reduced routing area. The chapter is
organized as follows. In Section 4.2, an overview of the binary
signed-digit number system is given and the addition algorithm is
explained. In Section 4.3, a practical technique to implement a BSD
adder unit in the CNN framework is presented. First, a new class of
CNN featuring a fundamental 3-state cell is introduced. This
facilitates mapping the 3-valued number system naturally into the
new class of CNN. Subsequently, the BSD addition algorithm is
analyzed and new functions that govern connections to neighbor
cells are defined. The design of a BSD adder unit is then presented and convergence is
illustrated using Hspice simulations. The designs of a 32-digit CNN-based BSD adder and
a 32x32-digit CNN-based BSD multiplier are presented in Section 4.4 and Section 4.5
respectively. The impact of the new designs on DSN and cross talk is also examined. A
summary of the work done in BSD is given in Section 4.6.

4.1 Introduction

The design of high speed adders and multipliers has always been a challenging topic in
computer arithmetic. The signed digit number system (SDNS) is a redundant number
system that can be employed to further enhance the performance of the LNS computation
[129], floating-point (FLP) multiplication-add fused (MAF) operation [130][131],
complex multiplication [132], trigonometric function calculation [133], division [134],
square rooting [135], online multiply-accumulate (MAC) operation [136], residue
arithmetic [137][138], and integer multiplication [139]. The representation that has fewest
nonzero digits is known as the canonic signed-digit (CSD) representation. It was shown in
[140] that, on average, CSD uses 33% fewer nonzero digits than the binary number. This
property justifies its adoption in high speed digital signal processing applications [141].
Canonic signed digit IIR/FIR filter coefficients result in a much smaller number of
nonzero digits [142]-[144]. By combining subexpressions occurring often in coefficients,
the CSD representation can be used in design automation algorithms leading to quality
solutions to Multiple Constant Multiplication (MCM) problems and efficient digital filter
design [145]-[148]. Several types of programmable filter architectures have also been
developed and implemented using CSD [149]-[151]. Solinas and Proos introduced one
kind of SD representation with minimum joint weight, named the Joint Sparse Form (JSF)
[152] [153]. Such a representation is useful in simplifying the circuits for the
implementation of elliptic curve cryptosystems (ECC) [154]. The binary signed-digit
representation (BSD) can be used in adder circuits to limit carry propagations to one
position to the left by eliminating the dependency of the carry output function on the carry
input signal [155]-[157]. With limited carry propagation, operations can be performed in
parallel for all digits of two arbitrary size numbers [137],[157][158]. This means that
addition (and subtraction) of two BSD numbers can be performed in a constant time
independent of the length of the operands. Consequently, fast computations can be
performed in a parallel system. On the other hand, in a conventional ripple-carry adder the
worst-case propagation delay is proportional to the size of the adder. This is because the
carry may propagate from the least significant bit to the most significant bit. This
advantage of BSDNS becomes more noteworthy in applications requiring arithmetic
operations with large operand sizes [159]. Another important property of the BSDNS is
that individual digits carry their own sign and separate sign information is not necessary.
If joined with bi-directional current-mode circuits, this property can potentially reduce the
amount of interconnect and routing complexities since multivalued signals convey more
information than binary signals, thus requiring fewer interconnects to transmit a
similar bandwidth of information [160]-[162]. Efficient sign detection of the operands can
further improve the performance of applications [163][164]. The choice of BSDNS in this
work is made because a sufficient noise margin can be obtained using radix-2 SD
arithmetic circuits compared to higher radix SD arithmetic circuits. Also, using high-radix
number representations requires complex analog components such as threshold detectors
and comparators whose complexity approaches that of A/D converters. In addition, using
analog CNNs permits a direct trade-off between circuit speed and power consumption,
and therefore presents a design alternative to the pure digital circuit with fast-clocked
pipelined registers with high instantaneous power consumption. Intuitively, reducing the
instantaneous power consumption reduces switching noise, which is the ultimate goal of
this research initiative. Obviously, analog circuits are not expected to equal the efficiency
of power consumption of standard digital logic solutions [165], but in noise sensitive
applications this should be an acceptable trade-off.

4.2 The Binary Signed-Digit Number System: Overview
4.2.1 Definition
The binary signed-digit number system, first introduced by Avizienis [157], is a weighted
number system in which any algebraic value X can be represented by an n-digit vector as:

$$X = \sum_{i=0}^{n-1} x_i 2^i \tag{4.1}$$

where each digit $x_i$ can assume one of the values in the symmetrical digit set
$L = \{\bar{1}, 0, 1\}$, and $\bar{1} = -1$.

From the definition above, BSD is a ternary number where each digit $x_i$ carries its own
sign, and there is no extra sign bit assigned to the number as a whole. An $n$-digit binary
signed-digit number $X = [x_{n-1}, x_{n-2}, \ldots, x_1, x_0]$ has the value:

$$X = x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_1 2^1 + x_0 2^0 \tag{4.2}$$

where $X$ is bounded by $-(2^n - 1) \le X \le (2^n - 1)$ and the sign of $X$ is the sign of the most
significant non-zero digit.

This sign symmetry is advantageous for arithmetic operations in that:

1. The representation for $-X$ of a BSD number $X$ can be obtained directly by changing the
signs of all digits in $X$. For example, using primes to denote complementation, we have
$(\bar{1})' = 1$, $1' = \bar{1}$, and $0' = 0$.

2. Various signed arithmetic operations can be performed without special conversion
techniques.
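The weighted value of a digit vector and the digit-wise negation of the first property can be illustrated with a short sketch. The function names are hypothetical; digits are given most significant first and written as the integers -1, 0, and 1.

```python
def bsd_value(digits):
    """Value of a BSD digit vector, most significant digit first."""
    value = 0
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError("BSD digits must be -1, 0, or 1")
        value = 2 * value + d        # radix-2 weighting
    return value


def bsd_negate(digits):
    """Negate a BSD number by changing the sign of every digit."""
    return [-d for d in digits]


x = [0, -1, 0, 1, 1]
print(bsd_value(x), bsd_value(bsd_negate(x)))
```

No separate sign handling is needed: negating every digit negates the value, because each digit contributes linearly to the weighted sum.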
The BSDNS is called a redundant number system because a given number may have more
than one signed-digit representation. For example, the integer 3 can be represented in
BSDNS by any of the 4-digit vectors: $0011$, $010\bar{1}$, $01\bar{1}1$, $1\bar{1}0\bar{1}$, $1\bar{1}\bar{1}1$. This property is
valid except for the value 0 which has a unique representation; all digits equal to zero. The
inherent redundancy of the BSD number representation allows limited carry addition and
(by changing all the digit signs in the subtrahend) limited borrow subtraction. This
facilitates totally parallel operations, with an $O(1)$ time complexity of addition and
subtraction of any length operands (i.e., independent of the word length, $n$).
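The redundancy can be checked by brute force for short word lengths. The sketch below enumerates every 4-digit vector over the digit set and keeps those with a given value; the function name is hypothetical and digits are listed most significant first.

```python
from itertools import product


def bsd_representations(value, n):
    """All n-digit BSD vectors (most significant digit first) equal to `value`."""
    matches = []
    for digits in product((-1, 0, 1), repeat=n):
        weighted = sum(d * 2 ** (n - 1 - k) for k, d in enumerate(digits))
        if weighted == value:
            matches.append(digits)
    return matches


for rep in bsd_representations(3, 4):
    print(rep)
print(bsd_representations(0, 4))
```

The enumeration yields five vectors for the integer 3 and a single all-zero vector for 0.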

4.2.2 BSD Addition
The objective of using the BSD addition algorithm is to reduce the addition time by
reducing the length of the maximum carry propagation chain. The goal is to eliminate the
carry propagation altogether. Given two BSD numbers, $X = [x_{n-1}, x_{n-2}, \ldots, x_1, x_0]$ and
$Y = [y_{n-1}, y_{n-2}, \ldots, y_1, y_0]$, where $X$ and $Y$ are described by Eqn. (4.1) and $x_i, y_i \in L$,
the addition of $X$ and $Y$ can be performed in parallel in three successive steps [165]. First,
each digit $x_i$ is added linearly to the corresponding digit $y_i$ to form the instantaneous sum
digit $z_i$. Second, the instantaneous sum digit $z_i$ is used to form an intermediate sum $w_i$ and
a transfer digit $t_i$. Finally, the sum digit $s_i$ is obtained by linearly adding the intermediate
sum digit $w_i$ and the previous transfer digit $t_{i-1}$. That is, the transfer digit acts as a form of
carry to the next position. These three steps can be summarized in the following set of
equations:

$$z_i = x_i + y_i \tag{4.3}$$

$$2 t_i + w_i = z_i \tag{4.4}$$

$$s_i = w_i + t_{i-1} \tag{4.5}$$

where $z_i \in \{\bar{2}, \bar{1}, 0, 1, 2\}$, $w_i \in \{\bar{1}, 0, 1\}$, and $t_i \in \{\bar{1}, 0, 1\}$ are the linear sum, the
intermediate sum, and the transfer digit, respectively.

To achieve parallel addition without carry propagation in BSD arithmetic, the last step in
the algorithm should be performed without producing a carry. This means that the final sum
digit $s_i$ has to be retained within the digit set $L$. This can be obtained by imposing
restrictions on values of $w_i$ in Eqn. (4.4). A problem arises only if $z_i = \bar{1}$ and $t_{i-1}$ is
negative or $z_i = 1$ and $t_{i-1}$ is positive. In this case, Eqn. (4.5) will generate a final sum
digit $s_i = \pm 2$ ($s_i \notin L$). To solve this problem, the restrictions on values of $w_i$ in Eqn.
(4.4) are such that $w_i \in \{0, 1\}$ when $z_{i-1} < 0$ and $w_i \in \{\bar{1}, 0\}$ when $z_{i-1} \ge 0$. These
restrictions on $w_i$ are summarized in Eqn. (4.6):

$$
t_i =
\begin{cases}
1 & (z_i = 2) \;\text{OR}\; ((z_i = 1) \;\text{AND}\; (z_{i-1} \ge 0)) \\
0 & (z_i = 0) \;\text{OR}\; ((z_i = 1) \;\text{AND}\; (z_{i-1} < 0)) \;\text{OR}\; ((z_i = \bar{1}) \;\text{AND}\; (z_{i-1} \ge 0)) \\
\bar{1} & (z_i = \bar{2}) \;\text{OR}\; ((z_i = \bar{1}) \;\text{AND}\; (z_{i-1} < 0))
\end{cases}
\tag{4.6}
$$

From Eqn. (4.6), the final sum $s_i$ is determined by $z_i$ and $z_{i-1}$ independent of the other
linear sum inputs. Therefore, the carry propagation is always limited to one position to the
left. This property of the BSDNS allows fast parallel operation, and the addition time is
independent of the length of operands $n$, as discussed earlier. An example of this BSD
addition process is shown in Figure 4.1. The example illustrates the addition of two BSD
numbers $X = -5$ and $Y = -19$.

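$$
\begin{array}{c|rrrrrr}
\text{weight} & 2^5 & 2^4 & 2^3 & 2^2 & 2^1 & 2^0 \\
\hline
X &         & 0       & \bar{1} & 0       & 1       & 1 \\
Y &         & \bar{1} & \bar{1} & 1       & 0       & 1 \\
\hline
Z &         & \bar{1} & \bar{2} & 1       & 1       & 2 \\
W &         & 1       & 0       & \bar{1} & \bar{1} & 0 \\
T & \bar{1} & \bar{1} & 1       & 1       & 1       &   \\
\hline
S & \bar{1} & 0       & 1       & 0       & 0       & 0
\end{array}
$$

Figure 4.1 An example of BSD addition.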
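The three steps of Eqns. (4.3)-(4.5) can be traced in software with the following sketch, applied to the operands of Figure 4.1. The transfer rule in `transfer_digit` follows one reading of Eqn. (4.6) in which $z_{i-1} = 0$ is grouped with the non-negative case; that grouping, the assumption that no digit exists below position 0, and all function names are choices made for this illustration. Digit lists are least significant digit first.

```python
def transfer_digit(z, z_prev):
    """Transfer digit t_i chosen from z_i and z_{i-1} (one reading of Eqn. (4.6))."""
    if z == 2 or (z == 1 and z_prev >= 0):
        return 1
    if z == -2 or (z == -1 and z_prev < 0):
        return -1
    return 0


def bsd_add(x, y):
    """Add two equal-length BSD digit lists; returns n+1 digits, LSD first."""
    n = len(x)
    z = [a + b for a, b in zip(x, y)]                                    # Eqn. (4.3)
    t = [transfer_digit(z[i], z[i - 1] if i else 0) for i in range(n)]   # Eqn. (4.6)
    w = [z[i] - 2 * t[i] for i in range(n)]                              # Eqn. (4.4)
    s = [w[i] + (t[i - 1] if i else 0) for i in range(n)]                # Eqn. (4.5)
    return s + [t[n - 1]]            # last transfer becomes the top digit


def bsd_value_lsd_first(digits):
    return sum(d * 2 ** i for i, d in enumerate(digits))


x = [1, 1, 0, -1, 0]      # X = -5, least significant digit first
y = [1, 0, 1, -1, -1]     # Y = -19
s = bsd_add(x, y)
print(s, bsd_value_lsd_first(s))
```

Every list comprehension in `bsd_add` uses only positions $i$ and $i-1$, so all digit positions could be evaluated at the same time; no value depends on a chain that grows with the operand length.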

Similar to binary addition, BSD addition can be made quite modular because the final sum
at position $i$ depends only on the inputs at position $i$ and $i-1$. Therefore, given the set of
equations describing BSD addition (Eqn. (4.3)-Eqn. (4.5)) and the restrictions on the
transfer digit summarized in Eqn. (4.6), one can create a functional BSD adder unit that
accepts four inputs ($x_i$, $y_i$, $z_{i-1}$, and $t_{i-1}$) and generates three outputs ($w_i$, $t_i$, and $s_i$). However, the
linear additions of Eqn. (4.3) and Eqn. (4.5) can be realized in bi-directional current-mode
circuits using summing nodes without active devices. A block diagram of the bi-directional
current-mode BSD adder unit is shown in Figure 4.2-a where the block
marked SDFA represents the restrictions on the transfer digit $t_i$ of Eqn. (4.6). Now,
designing a multi-digit BSD adder is made easy. To design an $n$-digit BSD adder, one can
simply connect $n$ BSD adder units as shown in Figure 4.2-b.

Figure 4.2 Block diagram of a BSD adder: (a) 1-digit BSD adder, (b) $n$-digit BSD
adder.

It is worth mentioning that subtraction in the BSD number system is performed by a
similar procedure, noting that subtraction is essentially addition of the minuend and the
negative of the subtrahend:

$$X - Y = X + (-Y) \tag{4.7}$$

Therefore, subtraction can be performed by changing the sign of all the digits of the
subtrahend and adding the two operands together.

4.3 Designing a 1-digit BSD Full Adder Using CNN (CNNBSDFA)
As was mentioned in Section 2.1, the nonlinear controlled current source in a CNN cell
provides two stable states as shown in Figure 2.2. This property makes CNN a natural
choice for applications characterized by 2-D binary outputs. The BSDNS, on the other
hand, is a ternary number system and it is extremely challenging to naturally map BSD
addition into the CNN framework. Although CNN arrays have been successfully used to
process images with many grey/color levels, the BSDNS is different because it requires
three distinctive stable states with sufficient noise margins between the states. This
restriction guarantees stable and correct arithmetic operations. The two stable states of the
traditional CNN can be used to represent the digits $\bar{1}$ and 1 in the symmetrical digit set
$\{\bar{1}, 0, 1\}$. Coding the digit 0 requires creating a third stable state at the center of the linear
range of the original activation function. The ideal required 3-state transfer function is
illustrated in Figure 4.3.

Figure 4.3 The required CNN cell activation function.

4.3.1 A 3-State CNN Cell
Current-mode circuits are analog in nature. There is no naturally available stable state
because the currents flowing can take on any value. This kind of logic circuit is non-restoring,
and it is often necessary to introduce some correcting circuits that will quantify
the amount of current at any stage. In this section, a novel current-mode circuit design is
introduced that provides three stable states using continuous feedback signals. The design
utilizes a 3-input median selector which orders the input signals based on the
instantaneous magnitude and then finds the value in the middle of the sorted list. In this
section a brief analysis of the median extractor is given. A detailed analysis can be found
in [166].

Given $n$ real input values $In_i$, $i = 1$ to $n$, there is always a permutation $\pi(i)$ of the indices
such that the outputs, defined as $Out_i = In_{\pi(i)}$, are sorted by value:

$$Out_1 \ge Out_2 \ge \cdots \ge Out_n \tag{4.8}$$

If the number of input signals is odd, $n = 2m + 1$, the median value is defined as the
value in the middle of the sorted list:

$$Out_{m+1} = \mathrm{med}(In_1, In_2, \ldots, In_{2m+1}) \tag{4.9}$$

such that the same number of input values are greater than or equal to the median as the
number of input values less than or equal to the median. The 3-input median extractor,
shown in Figure 4.4, is implemented with the pMOS series transistors M21-M31, M22-M32,
and M23-M33. These three groups of transistors provide bias for the matched
pMOS transistors M11-M13 and the bleeding transistor M41. The equilibrium condition
requires that at least one of these bias paths be "on"; therefore, any two of the nodes $A_1$-$A_3$
have a low potential.

Assuming, without loss of generality, that $V_{in1} < V_{in2} < V_{in3}$, the corresponding drain
currents will be $I_1 < I_2 < I_3$. In equilibrium, node A1 will be saturated high, node A3 will
be saturated low, and $I_0 = I_2$ to maintain a suitable value for the node A2 potential. In this
case, the M22-M32 path provides the common bias for M11-M13 and M40-M41. It is
important to note that M3 and M11 are in the linear region. Therefore, even in the case of
a large input voltage difference $V_{in2} \ll V_{in3}$, the $I_3$ drain current will not become dominant
and "steal" all the bias current $I_{bias}$. The DC characteristic curve of the 3-input median
circuit of Figure 4.4 is shown in Figure 4.5 where $V_{in2}$ and $V_{in3}$ are held constant at 1 and
2 volts respectively while $V_{in1}$ is swept from 0 to 3 volts. The median transfer
characteristic has a unit gain in the range between $V_{in2}$ and $V_{in3}$ (1 V to 2 V in the graph).

Figure 4.4 Schematic of the 3-input median extractor.

Figure 4.5 Transfer characteristics of the 3-input median extractor (horizontal axis: $V_{in1}$ in V).

Now, consider changing the function of the diode-connected transistor M0 to subtract a
mirror of the input current, produced by $V_{in1}$, in a feedback loop as shown in Figure 4.6.
The gain will be zero in the range between $V_{in2}$ and $V_{in3}$. The gain outside this range will
be an inverted ratio of the input voltages. The equilibrium equation of this circuit can be
written as:

(4.10) [equation not legible in the source reproduction]

Figure 4.6 Schematic of the 3-state circuit.

This circuit has three different operating zones that depend on the input voltage $V_{in1}$.
Assuming, without loss of generality, that $V_{in2} < V_{in3}$, the circuit operation can be described
by Eqn. (4.11).

(4.11) [equation not legible in the source reproduction]

The width of the null gain range is determined by the input voltages $V_{in2}$, $V_{in3}$, $V_{ref}$ and the
ratios of the resistors. The slope of the curve is determined by the feedback resistor ratio.
To compensate for the negative slope, the negative output of the CNN cell is used instead.
An Hspice simulation of the 3-state CNN cell using this solution is shown in Figure 4.7.

Figure 4.7 Transfer characteristics of the 3-state CNN cell (horizontal axis: $V_{in1}$ in V).

4.3.2 CNNBSDFA Templates Design
The restrictions on the transfer digit $t_i$ in Eqn. (4.6) can be broken into several analog
primitive functions that can then be processed using the 3-state CNN cells described in the
previous section. Consider the CNN structure shown in Figure 4.8-a to implement the
SDFA of Figure 4.2. This structure consists of four 3-state CNN cells. The nonlinearity of
the CNN cell output in column position $i = 4$ can be used to restrict the value of the
instantaneous sum $z_i$ to one of the digits of the BSDNS. A new intermediate analog signal
can be defined for the CNN cell in column position $i = 3$ as:

$$A_{ij;kl}(y_{kl}(t)) = P \cdot (y_{i+1,j} + y_{i+1,j+1}) \qquad (4.12)$$

where the input signals $y_{i+1,j}$ and $y_{i+1,j+1}$ represent the thresholded outputs of the
instantaneous sum at row positions $i$ and $i-1$ respectively. Template connections for the transfer
digit $t_i$ can then be expressed as a continuous function of the outputs of neighbor CNN
cells:

$$A_{ij;kl}(y_{kl}(t)) = P \cdot (y_{i+2,j} + y_{i+2,j+1} - y_{i+1,j}) \qquad (4.13)$$

Given the value of the transfer digit $t_i$ in Eqn. (4.13), one can use Eqn. (4.4) to solve for
the intermediate sum $w_i$. Template connections to the intermediate sum $w_i$ can be directly
mapped as:

$$A_{ij;kl}(y_{kl}(t)) = P \cdot (y_{i+3,j} - 2 y_{i+1,j}) \qquad (4.14)$$

A block diagram of the CNN implementation of the BSDFA is shown in Figure 4.8-b.

Figure 4.8 Representation of the CNN-based 1-digit SD adder: (a) CNN grid, (b) block diagram.

4.3.3 CNNBSDFA Hspice Simulation
The inputs to the CNNBSDFA, shown in Figure 4.8-b, $z_i$ and $z_{i-1}$, can take any value from
the digit set $\{\bar{2}, \bar{1}, 0, 1, 2\}$ while the outputs, $t_i$ and $w_i$, can only take values from the BSD
digit set $\{\bar{1}, 0, 1\}$. However, for correct BSD addition operation, the value of the transfer
digit $t_i$ is governed by the rules given in Eqn. (4.6). These rules determine the value of $t_i$
based on the values of both $z_i$ and $z_{i-1}$. The value of the intermediate sum digit $w_i$ can then
be obtained by substituting the values of $z_i$ and $t_i$ in Eqn. (4.4). The functional operation of
the CNNBSDFA in Figure 4.8-b can be summarized in the truth table of Table 4.1.
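
The 25 input combinations can be enumerated in software as a sanity check. The Python sketch below is only an illustration: Eqn. (4.6) is not reproduced in this section, so the transfer rule coded here is an assumption (a commonly used BSD rule in which the sign of $z_{i-1}$ decides how $z_i = \pm 1$ is split), and the function name `bsd_transfer_and_sum` is hypothetical. It relies only on the relation $z_i = 2 t_i + w_i$.

```python
def bsd_transfer_and_sum(z_i, z_prev):
    """Return (t_i, w_i) for one digit position, with z_i = 2*t_i + w_i.

    Assumed rule (not copied from Eqn. (4.6)): z_prev indicates which transfer
    may arrive from the lower position, and w_i is chosen so that
    w_i + t_{i-1} stays inside {-1, 0, 1}.
    """
    if z_i == 2:
        return 1, 0
    if z_i == -2:
        return -1, 0
    if z_i == 1:
        return (1, -1) if z_prev >= 1 else (0, 1)
    if z_i == -1:
        return (-1, 1) if z_prev <= -1 else (0, -1)
    return 0, 0


digits = (-2, -1, 0, 1, 2)
table = {(z, zp): bsd_transfer_and_sum(z, zp) for z in digits for zp in digits}
```

Under this assumed rule the rows with $z_i \in \{\bar{2}, 0, 2\}$ do not depend on $z_{i-1}$, which corresponds to the don't-care entries mentioned for Table 4.1.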

Table 4.1 Truth table of BSD addition (* represents don’t care).

[Table body not legible in the source reproduction; its columns are $z_i$, $z_{i-1}$, $t_i$ and $w_i$.]

The truth table was used as a guide to test the CNNBSDFA for correct operation. All 25
sets of inputs (including the don't cares) were applied in sequence and the outputs were
probed for plotting. The outputs of each of the Hspice simulations are shown in Figure
4.9. The maximum delay for the output signal to reach 90% of its final value was
measured as 6.42 ns. This delay is comparable to the delay of a voltage-mode BSD adder
reported using 0.18 µm CMOS technology in [159] and one-eighth the delay reported
using a negative differential resistance method in [161]. Although the delay is larger than
that of the corresponding CNN-based 1-bit binary adder, a multi-digit BSD
adder is much faster than a binary adder of the same size. This is because the delay is
constant for the case of the BSD adder. On the other hand, the delay of the binary adder
increases linearly with the size of the operands. It is clear from the simulations that the
CNNBSDFA converges to the correct output values for every possible input set. This
property guarantees the convergence of more complex circuits as will be shown next.

Figure 4.9 Hspice simulation of the CNNBSDFA.

4.4 CNNBSDFA Design Scalability

The CNNBSDFA can be used in a similar way to the traditional SDFA in Figure 4.2-b to
develop multi-digit BSD adders of arbitrary sizes. The operation is also similar: the
operands at position $i$ are summed linearly using a summing node to form $z_i$. The
instantaneous sums $z_i$ and $z_{i-1}$ are used to develop the transfer digit $t_i$ and the intermediate
sum $w_i$ in the CNNBSDFA. The transfer digit $t_{i-1}$ is linearly summed to the intermediate
sum $w_i$ using a summing node to form the sum signal $s_i$. The connections between
CNNBSDFA units to construct an n-digit CNN-based BSD adder are shown in Figure
4.10. The complexity of the adder is $O(n)$ because the adder uses n units of the
CNNBSDFA.
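
The data flow of the n-digit adder can be sketched behaviourally in Python. This is a functional model under stated assumptions, not the CNN circuit: the per-digit rule in `digit_unit` is an assumed BSD transfer rule (Eqn. (4.6) is not reproduced here), digit lists are stored least significant digit first, and all function names are hypothetical.

```python
import random


def digit_unit(z_i, z_prev):
    """Assumed transfer rule; returns (t_i, w_i) with z_i = 2*t_i + w_i."""
    if abs(z_i) == 2:
        return z_i // 2, 0
    if z_i == 1:
        return (1, -1) if z_prev >= 1 else (0, 1)
    if z_i == -1:
        return (-1, 1) if z_prev <= -1 else (0, -1)
    return 0, 0


def bsd_add(x, y):
    """Add two BSD numbers given as equal-length digit lists (LSD first)."""
    n = len(x)
    z = [a + b for a, b in zip(x, y)]                      # summing nodes
    units = [digit_unit(z[i], z[i - 1] if i else 0) for i in range(n)]
    t = [u[0] for u in units]
    w = [u[1] for u in units]
    # s_i = w_i + t_{i-1}; the last transfer digit becomes digit n of the sum.
    return [w[i] + (t[i - 1] if i else 0) for i in range(n)] + [t[n - 1]]


def bsd_value(digits):
    """Integer value of a BSD digit list (LSD first)."""
    return sum(d * 2**i for i, d in enumerate(digits))


rng = random.Random(0)
x = [rng.choice((-1, 0, 1)) for _ in range(32)]
y = [rng.choice((-1, 0, 1)) for _ in range(32)]
s = bsd_add(x, y)
```

In this model each sum digit depends only on the operand digits at positions $i$, $i-1$ and $i-2$, which is the reason the addition time does not grow with the word length.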

Figure 4.10 Block diagram of an n-digit CNN-based BSD adder.

4.4.1 A 32-digit CNN-based BSD Adder
To illustrate the scalability of the CNNBSDFA, a 32-digit CNN-based BSD adder was
developed. The adder is constructed by cascading 32 units of the CNNBSDFA and
connecting each CNNBSDFA to its immediate neighbors as shown in Figure 4.10. The
inputs $z_{i-1}$ and $t_{i-1}$ to the first CNNBSDFA are set to zero and the first sum digit $s_0$ is equal
to the value of the intermediate sum digit $w_0$. A random number generator was used to
generate test patterns for the adder where each operand is bound by
$-(2^{32} - 1) \le X, Y \le (2^{32} - 1)$. The adder was simulated using Hspice and the outputs of
each simulation were verified against the correct sum values of the test pattern. A small
part of one of the simulations is shown in Figure 4.11. In this simulation, the
instantaneous sum input pattern is 2l202122lIo!ll20010211222 and the
corresponding sum output is TOOlOOllOlOllOOOlTllOOllO. Here logic "1" is
mapped as +1.8 V, logic "0" as +1 V, while logic "-1" is mapped as +0.2 V. The 32-digit
adder converged for all applied test patterns. This result is not surprising since, from
Section 4.3.3, the 1-digit CNNBSDFA converges for all possible inputs.

Figure 4.11 Hspice simulation of a section of the 32-digit CNN-based BSD adder (horizontal axis: time in ns).

4.4.2 Impact of the CNN-based BSD Adder on Substrate Noise
Switching noise was recorded for each of the Hspice simulations performed in the
previous section. The worst case switching noise is plotted in Figure 4.12 against the
worst case switching noise for a 32-bit standard digital adder. Both adders operate at the
same speed, the speed determined by the CNN-based BSD adder. The CNN-based BSD
adder reduces switching noise by 61 dB over the 32-bit traditional digital binary adder.
This is an improvement of 11 dB over the CNN-based binary adder presented in Chapter 3.

Figure 4.12 Switching noise of the CNN-based 32-digit BSD adder and 32-bit standard digital binary adder (curves: CNN BSD and digital binary; horizontal axis: time in ns).

Cross talk is plotted in Figure 4.13 for both adders. The CNN-based BSD adder reduces
cross talk by more than 23dB compared to that of the digital adder. This is an
improvement of more than 3dB over the corresponding CNN-based binary architecture.

Figure 4.13 Cross talk of the CNN-based 32-digit BSD adder and 32-bit standard digital binary adder (curves: digital and CNN; horizontal axis: time in ns).

4.5 CNNBSDFA Design Compatibility

To the circuit designer, the CNNBSDFA unit has the same functionality and input and
output pins as the traditional SDFA unit (compare Figure 4.2-b and Figure 4.10). This
property allows the circuit designer to replace SDFA units in existing complex circuits
with CNNBSDFA units and obtain the same functionality of the complex structure, but
with the added advantage of reduced switching noise and cross talk. To illustrate this
concept, consider the BSD multiplier design by Kawahito et al. [167] for four-input
addition of partial products in the first level of the binary tree. This structure is chosen
because it speeds up the multiplication and reduces the number of full-adder modules and
interconnections. The algorithm is repeated here for convenience.

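Since the inputs are two's complement binary number representations, the multiplicand
$X = [x_{m-1}, x_{m-2}, \dots, x_0]$ and the multiplier $Y = [y_{n-1}, y_{n-2}, \dots, y_0]$ are
expressed as:

$$X = -x_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} x_i 2^i \qquad (4.15)$$

$$Y = -y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j \qquad (4.16)$$

where n and m are even integers. The following expression of X is obtained by using the
digit $\bar{x}_i = 1 - x_i$ and substituting $x_i = 1 - \bar{x}_i$ into Eqn. (4.15):

$$X = \bar{x}_{m-1} 2^{m-1} - \sum_{i=0}^{m-2} \bar{x}_i 2^i - 1 \qquad (4.17)$$

The product $P_j$ of X and an arbitrary digit $y_j$ of Y is given by: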

$$P_j = y_j \cdot X = \sum_{i} p_{i,j} 2^i \qquad (4.18)$$

where $p_{i,j}$ is a partial product. When j is even, Eqn. (4.15) is used for the generation of
product $P_j$, and when j is odd, Eqn. (4.17) is used. Thus, the partial products $p_{i,j} \in \{0, 1\}$
when j is even and the partial products $p_{i,j} \in \{\bar{1}, 0\}$ when j is odd except for the most
significant digits. Therefore, the linear sum $z_{i,j}$ of four successive partial products:

$$z_{i,j} = p_{i,4j} + p_{i-1,4j+1} + p_{i-2,4j+2} + p_{i-3,4j+3} \qquad (4.19)$$

belongs to the set $\{\bar{2}, \bar{1}, 0, 1, 2\}$, because two of the four terms lie in $\{0, 1\}$ and the
other two lie in $\{\bar{1}, 0\}$. This means that four product operands, $P_{4j}$,
$P_{4j+1}$, $P_{4j+2}$, $P_{4j+3}$, can be added in parallel by using the CNNBSDFA adder designed
in Section 4.3.
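
The two ways of writing a two's complement operand, and the digit-set argument, can be checked numerically. The Python sketch below assumes the standard two's complement expansion for Eqn. (4.15) and the form obtained from it by substituting $x_i = 1 - \bar{x}_i$ for Eqn. (4.17); the function names are hypothetical and bit lists are stored least significant bit first.

```python
from itertools import product


def value_eq_4_15(bits):
    """Two's complement value of bits = [x_0, ..., x_{m-1}] (LSB first)."""
    m = len(bits)
    return -bits[m - 1] * 2**(m - 1) + sum(bits[i] * 2**i for i in range(m - 1))


def value_eq_4_17(bits):
    """Same value written with the complemented digits xbar_i = 1 - x_i."""
    m = len(bits)
    xbar = [1 - b for b in bits]
    return xbar[m - 1] * 2**(m - 1) - sum(xbar[i] * 2**i for i in range(m - 1)) - 1


m = 8
all_match = all(value_eq_4_15(list(b)) == value_eq_4_17(list(b))
                for b in product((0, 1), repeat=m))

# Digit-set check for Eqn. (4.19): two terms from {0, 1} (even j)
# and two terms from {-1, 0} (odd j).
z_values = {a + b + c + d
            for a in (0, 1) for b in (-1, 0) for c in (0, 1) for d in (-1, 0)}
```

Both expressions agree for every 8-bit pattern, and the four-term sum never leaves the five-valued digit set accepted by the CNNBSDFA inputs.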
Figure 4.14 Block diagram of the BSD multiplier.

Accordingly, the binary tree was modified as shown in the 8x8-digit multiplier structure in
Figure 4.14. Since every four partial product operands are added in parallel, the number
of operands is reduced to one fourth at the first level of the binary tree. Consequently, the
number of SD full adder levels is reduced to

$$T = \lceil \log_2(n/4) \rceil \qquad (4.20)$$

where $\lceil a \rceil$ denotes the smallest integer such that $\lceil a \rceil \ge a$.
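
As a quick numerical illustration of Eqn. (4.20), the sketch below evaluates the number of SD full adder levels for a few operand counts. The function name `sd_adder_levels` is hypothetical, and integer arithmetic is used in place of a floating-point logarithm to avoid rounding issues.

```python
def sd_adder_levels(n):
    """T = ceil(log2(n / 4)) for n partial-product operands (Eqn. 4.20)."""
    groups = -(-n // 4)                # ceil(n / 4): operands left after the four-input level
    return (groups - 1).bit_length()   # ceil(log2(groups)) for groups >= 1


levels = {n: sd_adder_levels(n) for n in (8, 16, 32, 64)}
```

For n = 32 this gives three levels, in agreement with the delay quoted for the 32x32-digit multiplier.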

4.5.1 A 32x32-digit CNN-based BSD Multiplier
In this section, a 32x32 multiplier that has the same structure as Figure 4.14 is designed.
Similar to the binary multiplier of Figure 3.17, the BSD multiplier consists of three main
parts: a partial product generator matrix that forms the 32x32 partial products, a binary
tree that reduces the partial products into two operands in parallel, and an adder that adds
the final two operands. However, the multiplier uses binary inputs and outputs and
internally uses BSD redundant binary numbers with the digit set $\{\bar{1}, 0, 1\}$. A BSD signal
between tree levels is represented as a bi-directional current on a single wire. From Eqn.
(4.20), the 32x32-digit multiplier has a delay of three levels of full adders. The
bi-directional wired summations are fully used, greatly reducing the number of active
devices and the complexity of the interconnections. The final stage of the multiplier is a
BSD-to-binary converter where the 64-digit BSD number is converted into a 65-bit binary
number. The conversion is performed by a two-input binary adder such as the CNN-based
binary adder presented in Section 3.3.1. The Hspice simulation of a small section of the
multiplier output in BSD representation is shown in Figure 4.15. The operands are
X = 10101011 and Y = 10001010 and the BSD product is P = l l i l l 100011 10010.
The multiplier output in binary representation is shown in Figure 4.16 where
P = 101110000101110.
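
The binary product quoted above can be reproduced with a few lines of Python, reading both operands as unsigned integers. The sketch also shows one way a BSD word can be converted to binary with a single two-operand operation, by subtracting the word formed by the negative digits from the word formed by the positive digits. The helper `bsd_to_int` and the digit list `p_bsd` are hypothetical; `p_bsd` is just one valid BSD encoding of the product and is not read from Figure 4.15.

```python
def bsd_to_int(digits_msb_first):
    """Convert BSD digits (-1/0/1, most significant first) to an integer by
    subtracting the negative-digit word from the positive-digit word."""
    positive = int("".join("1" if d == 1 else "0" for d in digits_msb_first), 2)
    negative = int("".join("1" if d == -1 else "0" for d in digits_msb_first), 2)
    return positive - negative


x = int("10101011", 2)
y = int("10001010", 2)
p = x * y
p_bits = format(p, "b")

# Hypothetical BSD encoding of the same product (one of many possible).
p_bsd = [1, -1, 1, -1, 1, 1, 0, 0, 0, 1, -1, 1, 0, 0, -1, 0]
```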

Figure 4.15 Hspice simulation of an 8-digit section of the 32x32-digit CNN-based BSD multiplier. The output is in BSD representation (horizontal axis: time in ns).

Figure 4.16 Hspice simulation of an 8-digit section of the 32x32-digit CNN-based BSD multiplier. The output is in binary representation (horizontal axis: time in ns).

4.5.2 Impact of the CNN-based BSD Multiplier on Substrate Noise
It has been shown in Section 4.4.2 that the CNN-based BSD adder improves switching
noise by 11 dB over the CNN-based binary adder presented in Section 3.3.2. Therefore,
one can conclude that the CNN-based BSD multiplier will also show switching noise
improvement over the CNN-based binary multiplier. In addition, the BSD multiplier
structure used in Figure 4.14 employs fewer full adder units in the reduction tree than the
binary multiplier structure of Figure 3.17. This property will certainly show an advantage
in switching noise of the BSD multiplier. Hspice simulation of switching noise of the
CNN-based 32x32-digit BSD multiplier developed in the previous section and the
corresponding 32x32-bit standard binary multiplier is shown in Figure 4.17. Both
multipliers are designed using the same 0.35 µm CMOS process technology.

The CNN-based BSD multiplier achieved 69.2dB improvement in switching noise over
the traditional digital multiplier. This measurement was taken as the difference between
the maximum values of switching noise of the CNN-based multiplier and the digital
multiplier during one multiplication process. The CNN-based BSD multiplier also
improves switching noise by 12.2dB over the corresponding CNN-based binary
multiplier. This result is not surprising since, as was mentioned above, the BSD multiplier
utilizes fewer full adder units.

Figure 4.17 Switching noise of the CNN-based BSD and standard digital 32x32-bit binary multipliers (curves: CNN BSD and digital binary; horizontal axis: time in ns).

4.6 Summary of CNN-based BSD Arithmetic

A general methodology to implement BSD arithmetic using the analog CNN paradigm is
presented in this chapter. The greatest challenge was to discover a way to represent the
3-valued BSD number system naturally in the CNN framework. This led to the design of a
new class of CNN characterized by a fundamental 3-state CNN cell. The 3-state CNN cell
was used to design a fundamental 1-digit BSD adder unit. Addition in BSDNS requires
some restrictions on values of the transfer digit to ensure that the final sum will always be
in BSD representation. These restrictions were defined as primitive analog functions to
facilitate implementing them in the analog CNN framework. The rest of the BSD addition
algorithm is implemented using summing nodes without active devices. Hspice
simulations show that the 1-digit BSD adder converges for all possible inputs to the
adder. This property guarantees convergence of any complex circuit that uses the 1-digit
BSD adder as an embedded block.

To demonstrate the scalability of the 1-digit BSD adder, a 32-digit BSD adder was
designed by cascading 32 units of the 1-digit BSD adder. The BSD adder is tested using
Hspice for different random inputs. Switching noise and cross talk are also simulated and
compared to that of a 32-bit traditional binary adder operating at the same speed. The use
of bi-directional current summing nodes contributed to the reduction of switching noise by
a large degree. Although the delay of the 1-digit BSD adder is larger than that of the
CNN-based 1-bit full adder, the multi-digit BSD adder is much faster than the same size
multi-bit binary adder. The key difference is that the delay for a BSD adder is always constant
regardless of the number of digits being added. On the other hand, in the binary case, the
delay increases linearly with the size of the operands.

A CNN-based 32x32-digit BSD multiplier is also designed to illustrate the compatibility
of the 1-digit BSD adder with existing structures. The key feature of the multiplier is the
addition of four operands in the first level of the reduction tree. This feature reduces the
number of BSD full adders by half in the first level of the reduction tree. It also reduces
the interconnections considerably and renders the circuit more compact. Moreover,
reducing the number of full adders has a desirable effect on switching noise since the
instantaneous supply current is reduced. Hspice simulations show that the CNN-based
BSD multiplier reduces switching noise to unprecedented levels. Not surprisingly, the
switching noise of the BSD multiplier is 12.2dB lower than that of the corresponding
CNN-based binary multiplier.

Chapter 5
Double-Base Number System Arithmetic Using CNNs

This chapter introduces a new architecture and circuitry for
implementing addition and non-zero digit reduction for the
double-base number system (DBNS), a recently introduced highly
redundant number system with a 2-dimensional representation.
This chapter builds on previous work to implement non-zero digit
reduction using an analog cellular neural network approach, which
naturally maps the 2-D DBNS representation to a 2-D analog CNN
architecture. The new design exploits some of the properties of the
DBNS to provide limited-carry addition with reduced complexity.
The chapter is organized as follows. In Section 5.2, the DBNS and
its graphical representation as 2-D maps are briefly reviewed. The
addition algorithm and non-zero digit reduction to addition-ready
representation are also explained. In Section 5.3, a general
methodology to implement a DBNS adder unit using the CNN
paradigm is presented. First, mapping DBNS addition and non-zero
digit reduction algorithms into CNN image manipulation is
discussed. The problems associated with non-zero digit
reduction are addressed and, subsequently, a novel and efficient
non-zero digit reduction technique is introduced. Finally, the CMOS
design of a CNN-based DBNS adder unit is presented which uses
simple current-mode circuits and linear templates without hardware
overhead. The design of a 20x20 CNN-based DBNS adder is
presented in Section 5.4. In addition, Hspice simulation results that show the effectiveness
of the design are also presented and the advantage of reducing system-noise is
demonstrated. A summary of the work may be found in Section 5.5.

5.1 Introduction

In the two previous chapters we have used both the standard binary representation and
also the redundant signed digit representation to implement arithmetic using CNN arrays.
For binary representations, the carry propagation delay in the addition operation accounts
for the largest portion of the delay. The signed digit representation is redundant, but has
the advantage of reducing the carry propagation to a fixed delay, independent of the word
length. More information on classical redundant number representations can be found in
[168]. Of course, the goal is to achieve both high speed and regularity of layout; as a
result, making the arithmetic units suitable for very-large-scale-integration (VLSI)
implementation. The double-base number system (DBNS), introduced by Dimitrov et al.
[171], has some interesting properties related to reducing the carry propagation, and the
DBNS has similar properties to the classical logarithmic number system (LNS) if an index
calculus is used [171][172]. The DBNS provides more degrees of freedom than the LNS
and promises efficient arithmetic implementation over the LNS for applications such as
modular exponentiation in cryptography [173]. The index calculus double-base number
system (IDBNS) has already been used to reduce hardware complexity in digital signal
processing [174][175]. The number system has been extended to more than 2 bases and a
logarithmic version of this extension, which is referred to as the multidimensional
logarithmic number system (MDLNS), has also proved useful for implementing digital
filters [172] [176]. The canonic version of the DBNS promises carry-free addition
operations by exploiting the redundancy in the representation. This property of the number
system is useful not only for addition but also for multiplication. Without carry
propagation, operations can be performed in parallel for all digits of a number.
Consequently, the computation can be completed much faster in massively parallel
systems.

The use of CNN architectures for implementing DBNS arithmetic is not as tenuous as it at
first seems. The use of CNN arrays for implementing image morphology operations is
well established [43] and, as will be shown, the manipulation of DBNS 2-D
representations is quite similar to such image processing operations. For example,
addition operations consist of simple overlays of number "images" followed by a
reduction of the number system to a form suitable for further additions (addition-ready
form). In [34], an initial attempt to implement non-zero digit reduction to an
addition-ready form using the CNN paradigm was reported. However, this implementation has the
disadvantage of considerable hardware overhead due to the requirement for hysteresis in the
CNN cells and also the necessity to use discrete digital logic circuits to control the
switching of templates during the computation, which increases the potential for injected
noise. In the following sections, a new method is presented for implementing digital
arithmetic in the DBNS that eliminates these disadvantages. The new circuit uses a novel
self-programmable CNN architecture where the switching of templates is performed based
on the state values of the cells. After performing the addition of two addition-ready
representation maps, the CNN reduces the sum back to an addition-ready representation
using a simple reduction rule. This results in the most efficient implementation reported
for multiple additions in the DBNS.

5.2 The Double-Base Number System: Overview

5.2.1 Definition
The double-base number system is a weighted number system that uses two bases and
allows 0 and 1 as its digits. The discussion here will be limited to the original definition of
the DBNS with orthogonal bases 2 and 3. This assumption will not affect the
generalization of the techniques presented below.
Any integer X can be represented in the DBNS as in Eqn. (5.1) [171]:

$$X = \sum_{i,j} x_{i,j} 2^i 3^j \qquad (5.1)$$

where digits $x_{i,j} \in \{0, 1\}$. It can be seen that the DBNS reduces to the binary number
system for $j = 0$ and to the ternary number system for $i = 0$.

The DBNS can be represented graphically using a 2-dimensional grid by using the base 2
as the x-axis and the base 3 as the y-axis. Consider a 2-D grid with $\lceil \log_2 X \rceil$ columns and
$\lceil \log_3 X \rceil$ rows where every cell $(i,j)$ corresponds to the 2-integer value $2^i 3^j$. Any arbitrary
integer $X < 2^k 3^m$ can be represented as a sum of 2-integers, which appear in the first
$k + \lceil \log_2 3^m \rceil$ columns and $m + \lceil \log_3 2^k \rceil$ rows. For example, the integer number 108
can be represented in the DBNS in different ways as: 108, 72+36, 54+36+18,
54+24+18+12, 36+27+24+18+3. The geometric interpretation of the integer number 108
into DBNS-maps is shown in Figure 5.1. DBNS-maps are obtained by representing each
non-zero term in Eqn. (5.1) as a black pixel, referred to as an active cell, and each zero
term as a white pixel.
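
The five representations of 108 listed above can be checked, and turned into active-cell coordinates, with a short Python sketch. The helper names are hypothetical, and the cell convention assumed here is that cell $(i,j)$ carries the weight $2^i 3^j$.

```python
def is_two_integer(v):
    """True if v = 2**i * 3**j for some non-negative integers i and j."""
    if v < 1:
        return False
    for base in (2, 3):
        while v % base == 0:
            v //= base
    return v == 1


def to_cell(v):
    """Return the grid cell (i, j) with v = 2**i * 3**j (v must be a 2-integer)."""
    i = j = 0
    while v % 2 == 0:
        v //= 2
        i += 1
    while v % 3 == 0:
        v //= 3
        j += 1
    return i, j


representations = [
    [108], [72, 36], [54, 36, 18], [54, 24, 18, 12], [36, 27, 24, 18, 3],
]
maps = [sorted(to_cell(v) for v in terms) for terms in representations]
```

Each entry of `maps` lists the active cells of one DBNS-map; the single-term representation 108 = 2^2 * 3^3 occupies only the cell (2, 3).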

Figure 5.1 Different representations of 108 in the DBNS.

It is obvious from the above example that the DBNS is a highly redundant number system
where, in general, each integer X can have several representations. It is also clear that the
DBNS-map allows extremely sparse representations of the integers. It is this property of
sparseness that promises fast arithmetic algorithms with reduced complexity. The
representation of an arbitrary integer in the DBNS-map, using the minimum number of
active cells, is called the canonic double-base number representation (CDBNR). This
canonic form is not, in general, unique.
The average number of nonzero digits required to represent any integer, X, in the form of
Eqn. (5.1) has a complexity with respect to the number given by Eqn. (5.2) [174]:

$$O\left(\frac{\log X}{\log \log X}\right) \qquad (5.2)$$

The complexity is a weak function of X so one expects that the implementations of
arithmetic operations will be quite efficient using the canonic form of the DBNS. The
major drawback with using the canonic form is that it is a hard problem to compute the
minimal representation. However, a near canonic form, referred to as the addition-ready
double base number representation (ARDBNR), can be relatively easily computed with a
simple greedy algorithm while retaining most of the efficiencies of computing with the
canonic form [172]. In the following discussion, the ARDBNR form is always used prior
to implementing any addition operation.
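
A greedy decomposition of the kind mentioned above can be sketched as follows: repeatedly subtract the largest 2-integer that does not exceed the remainder. This is a minimal illustration with hypothetical function names; it is not claimed to be the exact procedure of [172], and its output is not guaranteed to satisfy every condition of the ARDBNR definition.

```python
def largest_two_integer(x):
    """Largest value 2**i * 3**j not exceeding x (x >= 1), with its exponents."""
    best = (1, 0, 0)
    pow3, j = 1, 0
    while pow3 <= x:
        v, i = pow3, 0
        while v * 2 <= x:          # grow the power of 2 as far as possible
            v *= 2
            i += 1
        if v > best[0]:
            best = (v, i, j)
        pow3 *= 3
        j += 1
    return best


def greedy_dbns(x):
    """Greedy decomposition of a positive integer into distinct 2-integers."""
    cells = []
    while x > 0:
        v, i, j = largest_two_integer(x)
        cells.append((i, j))       # active cell with weight 2**i * 3**j
        x -= v
    return cells


cells_91 = greedy_dbns(91)         # 81 + 9 + 1
cells_23 = greedy_dbns(23)         # 18 + 4 + 1
```

Because twice the chosen term always exceeds the current remainder, each step at least halves the remainder and no cell is selected twice.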

5.2.2 DBNS Addition
Let $I_x(i,j)$ and $I_y(i,j)$ be the DBNS maps of two integers X and Y, represented in the
ARDBNR. It is known from the definition of the ARDBNR [174] that if both X and Y
have an active cell in the position $2^i 3^j$, then the cells in the position $2^{i+1} 3^j$ are non-active
in both maps. Therefore the image $I_z(i,j)$ of the DBNS map of the number Z = X + Y can
be computed by overlaying the DBNS maps corresponding to X and Y. Any collisions
(black squares coincident) are resolved using the following identity:

$$2^i 3^j + 2^i 3^j = 2^{i+1} 3^j \qquad (5.3)$$

which can be represented graphically as shown in Figure 5.2. This process is similar to
the traditional carry propagation in the binary number system and the two terms
overlaying rule and carry propagation rule will be used interchangeably. By definition,
D ouble-B ase N u m b e r System A rith m etic U sing C N N sT h e D ouble-B ase N u m b e r System : O v erv iew

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

75

U niversity o f W in d so r

the ARDBNR has no adjacent active cells in a row; this means that the carry propagation
in DBNS addition is limited to one cell position to the right.

Figure 5.2 Graphical representation of the overlaying rule: (a) initial map, (b)
final map.
An example of the addition process described above is shown in Figure 5.3. The image
$I_z(i,j)$ of the DBNS map of the number Z = X + Y is obtained by overlaying the two
ARDBNR maps $I_x(i,j)$ and $I_y(i,j)$. Note that there is a collision at position (2,1) of the
overlaid image in Figure 5.3-c. The collision is resolved by applying Eqn. (5.3). The two
operands, X = 91 and Y = 23, are represented in the ARDBNR, but, as generally
expected, the sum, Z, is not in the ARDBNR form. In order to prepare the final sum for
another addition, the DBNS map of Z needs to be mapped into an ARDBNR form as will
be discussed in the following section.
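The overlaying step can be sketched in a few lines. The sketch below is illustrative only: a map is modelled as a set of active cells (i, j) with weight 3^i * 2^j, the helper names are hypothetical, and the two operand maps are an assumed decomposition of 91 and 23, chosen so that the only collision falls at position (2,1); the exact maps of Figure 5.3 are not recoverable here.

```python
from collections import Counter


def cell_value(i, j):
    """Weight of cell (i, j): row i is the power of 3, column j the power of 2."""
    return (3 ** i) * (2 ** j)


def map_value(cells):
    """Integer represented by a set of active cells."""
    return sum(cell_value(i, j) for i, j in cells)


def overlay_add(x_cells, y_cells):
    """Overlay two addition-ready maps and resolve collisions with Eqn. (5.3)."""
    counts = Counter(x_cells)
    counts.update(y_cells)
    z = set()
    for (i, j), n in counts.items():
        if n == 2:
            # Addition-ready inputs guarantee that the cell to the right is free.
            assert (i, j + 1) not in counts, "inputs are not addition-ready"
            z.add((i, j + 1))
        else:
            z.add((i, j))
    return z


# Assumed operand maps (not taken from the figure).
X = {(2, 3), (2, 1), (0, 0)}   # 72 + 18 + 1 = 91
Y = {(2, 1), (1, 0), (0, 1)}   # 18 + 3 + 2 = 23
Z = overlay_add(X, Y)          # represents 114, but is not addition-ready
```

With these assumed maps, Z contains adjacent active cells in rows 0 and 2, which is why a further reduction step is needed before another addition.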

Figure 5.3 Addition in DBNS: (a) X, (b) Y, (c) map obtained by overlaying, (d) Z
after applying the overlaying rule.

5.2.3 Reduction to Addition-Ready Representation
In many applications, the complexity of the computational algorithms depends on the
average number of zeros representing the data; the greater the number of zeros the more
efficient the algorithm [165][179]. This means that one should seek the smallest number
of 2-integers (a CDBNR) to represent the operands before performing any arithmetic
operation. Finding the canonical form for a DBNS representation is computationally
complex; however, an efficient greedy algorithm has been introduced in [174] that can
produce a near-canonic representation (NCDBNR) with a logarithmic complexity. The
NCDBNR produced by the greedy algorithm is close enough to the canonic form to
implement efficient arithmetic algorithms. In this section, it will be shown that using a
very simple reduction rule one can prepare arbitrary DBNS-maps for a limited-carry
addition process; it is only necessary to transform the map into an addition-ready DBNR
(ARDBNR) form that has no consecutive active cells lying in a row. Thus, when adding
two DBNS representations, carries from any overlapping active cells will have a non-active
cell to the right which can hold the doubled weight of the carry (see Section 5.2.2).
It can also be seen that using ARDBNR representations eliminates carry ripple, similar to
the limitation on carry ripple available by using signed digit redundant binary
representations [180]. We state the following definition:
Definition 5.1 We will call two adjacent active cells in positions (i,j) and (i,j+1) an
active group in position (i,j).
The reduction rule that transforms an arbitrary DBNS-map into an ARDBNR map is to
replace any active group (two consecutive active cells lying in a row) with a single active
cell using the following identity:

$$2^j 3^i + 2^{j+1} 3^i = 2^j 3^{i+1} \qquad (5.4)$$

which has the geometric representation shown in Figure 5.4.

Figure 5.4 Graphical representation of the reduction rule: (a) initial map, (b) final
map.
To perform non-zero digit reduction on any DBNS-map to obtain an ARDBNR-map, one
needs to eliminate all adjacent active cells lying in a row in the DBNS-map. Therefore,
Eqn. (5.4) needs to be applied successively to replace any active group in position (i,j)
with one active cell in position (i+1,j). If the cell in position (i+1,j) is already active, the
carry propagation rule described by Eqn. (5.3) is applied so that the cell becomes non-active
and the cell in position (i+1,j+1) becomes active. The non-zero digit reduction
process of the image $I_z(i,j)$ obtained in the addition example of Section 5.2.2 is shown in
Figure 5.5.
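The reduction procedure described above can be sketched as a sequential loop. This is only a software illustration of the two identities, not the analog network developed later: the function names are hypothetical, cells are (i, j) pairs with weight 3^i * 2^j, and the starting map Z is an assumed representation of 114 rather than the one drawn in Figure 5.5.

```python
def activate(cells, i, j):
    """Activate cell (i, j); a collision carries one column to the right (Eqn. 5.3)."""
    while (i, j) in cells:
        cells.remove((i, j))
        j += 1
    cells.add((i, j))


def find_group(cells):
    """Return the active group with the largest (i, j), or None if there is none."""
    for i, j in sorted(cells, reverse=True):
        if (i, j + 1) in cells:
            return i, j
    return None


def reduce_to_ardbnr(cells):
    """Apply Eqn. (5.4) until no two active cells are adjacent in a row."""
    cells = set(cells)
    while (group := find_group(cells)) is not None:
        i, j = group
        cells -= {(i, j), (i, j + 1)}   # clear the active group
        activate(cells, i + 1, j)       # one cell of three times the weight
    return cells


def map_value(cells):
    """Integer represented by a set of active cells."""
    return sum((3 ** i) * (2 ** j) for i, j in cells)


Z = {(2, 3), (2, 2), (1, 0), (0, 1), (0, 0)}   # assumed map of 114
R = reduce_to_ardbnr(Z)                        # {(3, 2), (1, 1)}: 108 + 6
```

Each pass removes at least one active cell, so the loop terminates, and both identities preserve the represented value.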

Figure 5.5 Non-zero digit reduction of Z: (a) initial map, (b) intermediate map, (c)
final map.
The maximum size of the ARDBNR map obtained using the overlaying and row reduction
rules can be calculated using the following theorem:

Theorem 5.1 Any M x N DBNR-map can be transformed into an ARDBNR-map
that is (M+2) x N pixels at most.

Proof. If there is an active group in position (M,j), in the original DBNR-map or as a
result of addition-ready transformation, the application of the reduction rule to that group
will generate an active cell in position (M+1,j). In the special case where an additional
active cell exists in position (M,j-1), the application of the reduction rule to an active
group in position (M-1,j) will generate an active cell in position (M,j). This new active cell
together with the active cell (M,j-1) can be replaced by an active cell in position (M+1,j-1)
which in turn, with the active cell in position (M+1,j), can be replaced by an active cell in
position (M+2,j-1). This completes the proof.

5.3 Designing a 1-bit DBNS Adder Unit Using CNN
(CNNDBNSAU)
The final sum output in DBNS addition as described in Section 5.2 is obtained in two
steps: performing addition of the two operands represented in ARDBNR-maps using the
overlaying rule given by Eqn. (5.3) and transforming the sum output to an ARDBNR-map
using both the overlaying rule and the row reduction rule given by Eqn. (5.4). In the CNN
framework, this requires the design of two templates: A template for the overlaying rule
and another template for the row reduction rule. The operation of the templates is usually
handled using an external control unit that is programmed to load different templates in
specific order for certain periods of time [41], [181][182]. Another method was reported
in [34] that used discrete logic gates attached to each cell to control template operation. In
the following sections, a novel self-programmable analog CNN array that performs DBNS
addition as well as reduction to addition-ready representation is presented. The network
switches between the overlaying rule and the row reduction rule based on the output
voltages of the involved cells without the need to use external control logic.

5.3.1 CNNDBNSAU Templates Design
Consider first designing the template for the row reduction rule described by Eqn. (5.4).
Examining the graphical representation of the reduction rule in Figure 5.4 reveals that the
cell at position (i+1,j) should be activated if and only if the two cells at positions (i,j) and
(i,j+1) are active. The operation is similar to the two-input logical AND function with the
output voltages of the cells (i,j) and (i,j+1) representing the inputs to the AND function and
the output voltage of the cell (i+1,j) representing the output of the AND function.
Therefore, using a superscript notation in order to separate prior and post values on the
image (array) cell nodes, the output voltage of the cell (i+1,j) can be described using the
equivalent logic equation on the digits in the DBNS representation:

$$I_z^{\mathrm{post}}(i+1,j) = I_z^{\mathrm{prior}}(i,j) \wedge I_z^{\mathrm{prior}}(i,j+1) \qquad (5.5)$$

Using the CNN notation, the continuous function of the output voltage of cell (i,j) can be
written as:

$$A_{ij;kl}(y_{kl}(t)) = P \cdot (y_{i-1,j} + y_{i-1,j+1} - 1) \qquad (5.6)$$

The row reduction rule replaces the active group at position (i,j) with one active cell at
position (i+1,j). Therefore, once the cell at position (i+1,j) is activated, the cells at
positions (i,j) and (i,j+1) should be de-activated. The output voltages of the input cells to
the reduction rule can then be described using the following logic equations:

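$$I_z^{\mathrm{post}}(i,j) = I_z^{\mathrm{prior}}(i,j) \oplus \left( I_z^{\mathrm{prior}}(i,j) \wedge I_z^{\mathrm{prior}}(i,j+1) \right) \qquad (5.7)$$

$$I_z^{\mathrm{post}}(i,j+1) = I_z^{\mathrm{prior}}(i,j+1) \oplus \left( I_z^{\mathrm{prior}}(i,j) \wedge I_z^{\mathrm{prior}}(i,j+1) \right) \qquad (5.8)$$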

A simple and cost-effective way to map the logic equations given by Eqn. (5.7) and Eqn.
(5.8) to CNN continuous functions is to use a negative mirror of the current given by Eqn.
(5.6) (that is used to activate cell (i+1,j)) to de-activate the two input cells (i,j) and (i,j+1).
This method ensures that the current given by Eqn. (5.6) will set cells (i,j) and (i,j+1)
non-active only if a reduction rule is detected. Otherwise, the two input cells will keep their
original state.
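The logic behind the row reduction template can be checked with a small truth table. The sketch below is an illustration only: it models cell states as the bits 0 and 1, assumes the target cell (i+1,j) is initially non-active, and uses a hypothetical function name. It confirms that the AND/XOR description preserves the represented value, measured in units of the weight of cell (i,j).

```python
from itertools import product


def row_reduction(a, b):
    """a, b: prior states of cells (i, j) and (i, j+1).

    Returns the post states of cells (i, j), (i, j+1) and (i+1, j),
    assuming cell (i+1, j) was non-active beforehand.
    """
    fire = a & b                      # AND of the two input cells (Eqn. 5.5)
    return a ^ fire, b ^ fire, fire   # input cells cleared only when the rule fires


# Full truth table over the four input combinations.
table = {(a, b): row_reduction(a, b) for a, b in product((0, 1), repeat=2)}

# Relative weights of cells (i, j), (i, j+1) and (i+1, j) are 1, 2 and 3.
conserved = all(
    a + 2 * b == pa + 2 * pb + 3 * pc
    for (a, b), (pa, pb, pc) in table.items()
)
```

Only the input (1, 1) changes any state, which matches the statement that the input cells keep their original state unless a reduction is detected.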
The overlaying rule given by Eqn. (5.3) can be realized in a similar way. Notice, however,
from the graphical representation in Figure 5.2 that one of the two input cells is on a third
dimension. This can be realized as a 3-D CNN architecture, or by applying one of the
operands as initial conditions to a 2-D CNN architecture and applying the second operand
as inputs to the network. Using the second approach, the overlaying rule can be described
using the following logic equation on the digits of the operands:
$$I_z(i,j+1) = I_x(i,j) \wedge I_y(i,j) \qquad (5.9)$$

which can be written in continuous form as:
$$A_{ij;kl}(y_{kl}(t)) = P \cdot (y_{i,j-1} + u_{i,j-1} - 1) \qquad (5.10)$$

As in the case of the row reduction rule, the two input cells to the overlaying rule should
be de-activated when cell (i,j+1) is activated. The logic function that describes this
condition can be written as:
$$I_z(i,j) = I_x(i,j) \oplus I_y(i,j) \qquad (5.11)$$

Here one can also use a negative mirror of the output current given by Eqn. (5.10) to
de-activate the input cells.

5.3.2 Dealing with Special Cases of DBNS-maps
The previous discussion about the reduction of a DBNS-map into an ARDBNR-map (see
Section 5.2.3) dealt only with one application of the reduction rule in the map. Even in the
case of parallel applications of the reduction rules across the DBNS-map, the initial
assumption was that the groups of active cells that participate in the reduction rules are far
apart with no interaction (and interference) among the active cells. However, if we now
include the possibility of such interactions, then if an active cell participates in more than
one reduction rule at the same time, the network might become unstable and, even if this
does not happen, may settle to a wrong output. Figure 5.6 shows a special case where an
active cell in position (i,j) can participate in two different occurrences of the row
reduction rule given by Eqn. (5.4) at the same time. Cell (i,j) can participate with the cell
(i,j+1) to activate cell (i+1,j) as shown in Figure 5.6-b. Another possibility is that cell (i,j)
and cell (i,j-1) are replaced by cell (i+1,j-1) as shown in Figure 5.6-c. Both solutions are
correct even though they give different DBNS-maps. This is due to the redundancy
inherent in the DBNS. Now assume that cell (i,j) participated in the two occurrences at the
same time. This means that both cell (i+1,j) and cell (i+1,j-1) will become active as
shown in Figure 5.6-d. Now one can apply the row reduction rule again on cells (i+1,j)
and (i+1,j-1) and replace them with cell (i+2,j-1) as shown in Figure 5.6-e. This final
value is not correct and clearly it is important to prevent the simultaneous participation of
a cell in more than one network activity.
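A quick numeric check makes the error concrete. The sketch below assumes three adjacent active cells (i,j-1), (i,j) and (i,j+1) and picks i = 0, j = 1 arbitrarily; the variable names are hypothetical. The two valid reductions preserve the value, whereas letting cell (i,j) take part in both reductions at once does not.

```python
def weight(i, j):
    """Weight of cell (i, j): 3**i * 2**j."""
    return (3 ** i) * (2 ** j)


i, j = 0, 1  # arbitrary position, chosen only for illustration

# Three adjacent active cells in row i.
initial = weight(i, j - 1) + weight(i, j) + weight(i, j + 1)

# Reduce (i,j),(i,j+1) -> (i+1,j); cell (i,j-1) is left untouched.
option_b = weight(i + 1, j) + weight(i, j - 1)

# Reduce (i,j-1),(i,j) -> (i+1,j-1); cell (i,j+1) is left untouched.
option_c = weight(i + 1, j - 1) + weight(i, j + 1)

# Both reductions fire at once, then (i+1,j-1),(i+1,j) -> (i+2,j-1).
faulty = weight(i + 2, j - 1)
```

Here `initial`, `option_b` and `option_c` all equal 7, while `faulty` equals 9: cell (i,j) has effectively been counted twice.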

Figure 5.6 An example of simultaneous reductions: (a) initial map, (b) and (c)
correct solutions, (d) and (e) wrong output.
There are two other special cases where the application of a reduction rule on an active
group can affect another active cell that is participating in a different reduction rule. These
two cases are illustrated in Figure 5.7. In Figure 5.7-a, the application of the reduction
rule to the active group in position (i-1,j) will try to activate the cell in position (i,j).
However, since cell (i,j) is already active, it will try to participate in the application of the
overlaying rule. Moreover, cell (i,j) can participate in the application of the row reduction
rule with the cell in position (i,j-1).
Figure 5.7-b shows a similar case where the application of the row reduction rule to the
active group (i-1,j) will invoke the application of the carry propagation rule on cell (i,j).
Also, cell (i,j) can participate in the application of the row reduction rule with the cell in
position (i,j+1).

Figure 5.7 Two possible simultaneous applications of Eqn. (5.4) to the same cell.
It is clear that for correct network operation, the simultaneous participation of an active
cell in more than one reduction rule should be prevented. The above situations can be
summarized in the following two restrictions:
1. Any active cell (i,j) cannot, at any moment of time, participate in more than one
reduction rule.
2. The application of a reduction rule on any group of active cells cannot affect, at any
moment of time, an active cell that is a candidate for another reduction rule.
The goal of applying the reduction rules is to generate an ARDBNR map. Therefore,
giving preference for a candidate cell (i,j), which is violating the ARDBNR rule, to
participate in a certain reduction rule over another should lead to a sparser
representation. Since applying the row reduction rule replaces two active cells in row i
with an active cell in row i+1 and applying the overlaying rule replaces two overlaying
active cells in column j with an active cell in column j+1, it becomes natural to try to clear
cells in the last row and last column first. The following theorem proves that using this
method results in a DBNS-map with fewer active cells.
Theorem 5.2 Given any DBNR-map, applying the reduction rules to active groups
in positions (i,j) in descending order of i and j results in a DBNR-map with fewer active
cells.

Proof. Let us consider the order of scanning rows first. For any M x N DBNR-map, if
there is an active group in position (i,j) ($0 \le i < M$), there is also the possibility that
another active group will exist in position (i+1,j). The active group in position (i,j) cannot
be chosen first because this will violate restriction 2. In this case, cell (i+1,j), a member of
the active group in position (i+1,j), will be affected by the application of the reduction rule
to the active group in position (i,j). For the case i = M, there will always be an empty row
M+1 according to Theorem 5.1. Therefore the reduction rule must be applied to active
groups in every row i starting from i = M in descending order of i.
Now let us look at the order of scanning columns. Assume the DBNR-map has k adjacent
active cells in a row. We consider two cases:
Case 1: k is even. Since the application of the reduction rule takes two active cells at a
time, all the k active cells will eventually participate in a reduction rule and the order is not
critical.
Case 2: k is odd. In this case, k-1 active cells will participate in (k-1)/2 applications of
the reduction rule. The remaining active cell might be able to participate later if an active
group in row i-1 generates an active cell adjacent to it. The remaining active cell will be in
position (i,j) where $0 \le j \le N-1$ if we apply the reduction rule in descending order of j,
and $1 \le j \le N$ if the application were in ascending order of j. For $1 \le j \le N-1$, any active
group in positions (i-1,j-1), (i-1,j), or (i-1,j+1) can force the remaining active cell to
participate in a reduction rule. For the case j = N, only an active group in position (i-1,j-1) is
allowed. For the case j = 0, active groups in positions (i-1,j) and (i-1,j+1) are allowed, which
gives an additional 33% possibility for the remaining active cell to participate in a
reduction rule over the case j = N. Since the application of a reduction rule results in fewer
active cells (two active cells are replaced by a single active cell), this completes the proof.
Figure 5.8 illustrates Case 2 where k is odd. The rows were scanned for candidates in
descending order of i while the columns were scanned in ascending order of j in Figure
5.8-b and in descending order of j in Figure 5.8-c. Note that Figure 5.8-b cannot be
further processed because the final DBNS-map contains no consecutive active cells in one
row.
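The effect of the column scanning order can be tried out on a small hand-picked map. The sketch below is illustrative only: the function names and the starting map (three adjacent cells in row 1 plus an active group in row 0, representing 24) are assumptions, not the map of Figure 5.8, and one example is not a substitute for the argument in the proof.

```python
def activate(cells, i, j):
    """Activate cell (i, j); a collision carries one column to the right (Eqn. 5.3)."""
    while (i, j) in cells:
        cells.remove((i, j))
        j += 1
    cells.add((i, j))


def reduce_map(cells, descending_j=True):
    """Reduce row by row (largest i first), scanning columns in the chosen order."""
    cells = set(cells)
    while True:
        groups = [(i, j) for (i, j) in cells if (i, j + 1) in cells]
        if not groups:
            return cells
        top = max(i for i, _ in groups)              # descending order of i
        cols = [j for i, j in groups if i == top]
        j = max(cols) if descending_j else min(cols)
        cells -= {(top, j), (top, j + 1)}            # Eqn. (5.4)
        activate(cells, top + 1, j)


def map_value(cells):
    """Integer represented by a set of active cells."""
    return sum((3 ** i) * (2 ** j) for i, j in cells)


start = {(1, 0), (1, 1), (1, 2), (0, 0), (0, 1)}     # 3 + 6 + 12 + 1 + 2 = 24
desc = reduce_map(start, descending_j=True)          # {(2, 1), (1, 1)}
asc = reduce_map(start, descending_j=False)          # {(2, 0), (1, 2), (1, 0)}
```

For this particular map, descending order of j leaves the odd cell in column 0, where the cell generated from row 0 collides with it, giving two active cells instead of three.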

Figure 5.8 An example of the order of reduction: (a) initial map, (b) j in ascending
order, (c) j in descending order.
The following theorem calculates the time required to transform any DBNR-map to an
ARDBNR-map using the order of reductions described above.

Theorem 5.3 Any M x N DBNR-map can be transformed to an ARDBNR-map in
time less than or equal to $(M + \lceil N/2 \rceil) \cdot T$, where T is the time needed to perform a single
reduction.

Proof: Assume for simplicity that N is even. For any row i in the DBNR-map, any active
group at position (i,0) ((i,1) if N is odd) will not participate in a reduction rule unless all
adjacent active groups in the same row have already participated. If all cells in row i are
active, completing the reductions requires time equal to $(N/2) \cdot T$ ($(\lceil N/2 \rceil - 1) \cdot T$ if N
is odd) since the reduction rule takes two active cells at a time. Also, for any column j in
the DBNR-map, the active group in position (0,j) will not participate unless all other
adjacent active groups in the same column have already participated. If the map has active
groups in positions (i,j) $\forall i$, completing the reductions requires time equal to $M \cdot T$. The
worst case scenario would be a map with active groups in positions (i,0) $\forall i$ ((i,1) if N is
odd) and also all cells in row M active. The time required to complete the reductions will
then be $(M + N/2) \cdot T$ ($(M + \lceil N/2 \rceil - 1) \cdot T$ if N is odd). For the special case where N
is odd and the map has an additional active cell in position (1,0), the application of the
reduction rule to the active group (0,1) will generate an active cell in position (1,1). This
last active cell together with the active cell (1,0) form a new active group that needs one
more time constant T to be reduced. This completes the proof.

5.3.3 CNNDBNSAU CMOS Implementation
The row reduction rule can be realized using the current-mode circuit shown in Figure
5.9. This circuit can detect the occurrence of an active group. The input current from cells
(i,j) and (i,j+1) is mirrored by transistor M2. A unit current is subtracted by the current
source $I_{sink}$ and the rest of the current is drained by transistor M3. The circuit outputs high
current if both inputs from cells (i,j) and (i,j+1) are high and zero current otherwise. When
a reduction rule is detected at cells (i,j) and (i,j+1), a positive current is used to activate
cell (i+1,j) through the current mirror M6-M7. At the same time, a negative feedback
current is used to deactivate the two input cells (i,j) and (i,j+1) through the current mirror
M3-M5. Figure 5.10 shows how the feedback template is connected to the cells
participating in the row reduction rule of Eqn. (5.4). As was mentioned in Section 5.3.1,
the functions of the row reduction rule and the overlaying rule are essentially the same:
replace two active cells with one active cell. The only difference is that the row reduction
rule is mapped in a 2-D architecture while the overlaying rule takes on a third dimension.

Therefore, the same circuit in Figure 5.9 can be used to implement the carry propagation
rule of Eqn. (5.3) with appropriate changes in the input/output connections.

Figure 5.9 Schematic of the reduction rule.

Figure 5.10 Connection to participating cells.
The circuit shown in Figure 5.9 does not prevent a cell from participating in more than
one reduction rule at the same time, and neither does it enforce the order of applying the
reduction rules to adjacent active cells, as described in Section 5.3.2. Consider again the
situation depicted in Figure 5.11. Four groups of active cells that can take part in the row
reduction rule described by Eqn. (5.4) can be identified. These four groups are outlined
and marked as G1, G2, G3, and G4.

Figure 5.11 A situation where the row reduction rule can be applied to four
different groups of cells.
If the order of applying the reduction rules, as described in Section 5.3.2, can be enforced,
that will take care of the problem of a cell participating in more than one reduction rule at
the same time. From Section 5.3.2, the row reduction rule should not be applied to group
G4 unless it is not applicable to groups G1, G2, or G3 respectively. This restriction can be
realized using a negative feedback mirror of the output of groups G1, G2, and G3 ($O_{minus}$
in Figure 5.9) to the input of group G4 as shown in Figure 5.12. This will ensure that
none of the other groups will be active during the application of row reduction rule to
group G4.

Figure 5.12 Connection between groups of cells.
The circuit schematic for a DBNS adder cell in position (i,j) with the feedback templates
representing Eqn. (5.6) and Eqn. (5.10) is shown in Figure 5.13. It is important to stress
that the operation of the CNN-DBNS adder is radically different from that of the CNN
universal machine (CNN-UM), designed in [41] and implemented in [181], or the work
presented in [182]. While the first two use a stored program to control the use of templates
and the last one uses temporal superimposition of signals, the CNN-DBNS adder
architecture is self-programmable in the sense that the network switches, as needed,
between the two templates described by the row reduction rule of Eqn. (5.4) and the
overlaying rule of Eqn. (5.3) based on the state voltages of the involved cells.

Figure 5.13 CNNDBNS adder cell schematic.

5.3.4 CNNDBNSAU Hspice Simulation
The application of the row reduction rule to the two active cells at positions (i,j) and
(i,j+1) will force the cell at position (i+1,j) to change to the active state. However, if the
cell at position (i+1,j) was active already, the overlaying rule will take place at the doubled
weight cell (i+1,j) and will activate the cell at position (i+1,j+1). Therefore, the
neighborhood of the CNN network required is of radius 1. The truth table in Table 5.1
summarizes all possible conditions of the cell (i+1,j) and neighbor cells before and after the
processing of the DBNS map. Notice that of the 8 possible initial conditions, the
final DBNS map will differ from the initial map (CNN cells change states) in only the
last two entries of the truth table. These two cases correspond to the application of the
overlaying or row reduction rules described above. Hspice simulations of these two cases
are shown in Figure 5.14. Figure 5.14-a shows the Hspice simulation of the DBNS adder
cells where only the row reduction rule is applicable while Figure 5.14-b shows the
Hspice simulation of the DBNS adder cells when both the overlaying and row reduction
rule are applicable.
Table 5.1 Truth table of DBNS adder unit.

        Prior state          |              Post state
(i,j)  (i,j+1)  (i+1,j)      | (i,j)  (i,j+1)  (i+1,j)  (i+1,j+1)
  0       0        0         |   0       0        0         0
  0       0        1         |   0       0        1         0
  0       1        0         |   0       1        0         0
  0       1        1         |   0       1        1         0
  1       0        0         |   1       0        0         0
  1       0        1         |   1       0        1         0
  1       1        0         |   0       0        1         0
  1       1        1         |   0       0        0         1

The worst case delay for the network to settle after applying the overlaying or row
reduction rules is measured as 5.1 ns. This delay is 60% less than the delay reported in [34]
using discrete logic gates to control template operation.

Figure 5.14 Hspice simulation of the CNNDBNSAU: (a) only the row reduction
rule is applicable, (b) both the row and overlaying reduction rules are applicable.

5.4 CNNDBNSAU Design Scalability

The CNN implementation of any M x N DBNS adder consists of a 2-dimensional grid of
CNNDBNS adder units with each adder unit corresponding to an entry in the DBNS-map.
At the beginning of the addition process, the ARDBNR-map representing X is loaded as
the initial conditions on the nodes of the array and the ARDBNR-map representing Y is
applied as inputs for a time unit T equal to the CNN cell time constant. The adder units are
connected to the input Y using Eqn. (5.3). After a time period T, the input Y has no effect
and the adder units use Eqn. (5.3) and Eqn. (5.4) to connect to each other to reduce the
sum, Z, to ARDBNR.

5.4.1 A 20x20 CNN-based DBNS Adder
A 20x20 CNN-based DBNS adder is developed using a 20x20 grid of CNNDBNSAU.
This DBNS adder has the same dynamic range as a 32-bit binary adder [171]. The
CNNDBNSAU are connected to each other using the outputs from the overlaying and row
reduction rules presented in Section 5.3.3. The CNN-based DBNS adder has a very
regular structure as shown in the schematic of a 4x4 section in Figure 5.15. A number of
Hspice simulations, using parameters from a 0.35 µm CMOS technology process, were
performed using random DBNS operands represented in the addition-ready form. To save
on space, the simulation of only a small 4x4 section of the DBNS adder is presented here
using the two DBNS-maps shown in Figure 5.16. These maps are not in the CDBNR form
but they comply with the definition of the ARDBNR. The maps were chosen to illustrate
the worst case situation where the sum output has to go through successions of
applications of the overlaying and row reduction rules to be represented in an ARDBNR
form. The example also illustrates the maximum delay where the row reduction rule has to
be applied to each row.

VS— O5
2 SC

a

a
UFRC
URC ■m
a - LC
-•
© 106 • a
-a 101 co
—-•
RC a
■ - - a - - a 102 a
105 - a - - a - - a
- a -- a - ■ 103 °
L04 -a- ♦ - -a

1 2
•

>

as

"
•

■
•
■

■
------o r vs------

C/2

-a
UFRC
URC •
a
a
LC
106
101 CO RC • - ---- a
102 m
105 ■ - • »
104 a • • a
103
S/3 C/3
> 0

3
UFRC

V. (A
2 a2
UFRC
URC - • -LC
o
106
101 co
RC - • —
- 102 m
105
104 ■ ♦
103 °
C/3 C/3
> SC
31 ■ ;
■ ■
C/3

a
•
»a»-

2
UFRC
LC
101
102
103

C/3

2:

URC

g
106
CO RC
I
105
104
°
C/3 C/1
> oc

3

-a
URC • a
106 ■ - IOI co
RC a - ■ - -■ 101
102 m
105 a - a- a 102
104 ■ ♦ - ■
103 D
103
> £

LC

2 2

■
■■■
■-

m:
ih tA
2 sc
UFRC =URC
LC g
106

C/3

§

*

■

■

■

■ . .
■
RC ■ • •
•
105 ■
104 -a • -

to
a
°

23:

i.

24: *

■ vrsn
2 SC
- ■ UFRC
URC
- -■ LC
© 106
- a 101 co
RC
102 m
105
- ■ 103 °
104
> s

-----= 2
-a - UFRC
URC
-■
a - LC
© 106
•■ - - -- B-- 101 co
RC
- ■ - a- -■ - 102 m 105
a a - • • 103 °
104
C/3 C/3
> SC

32 * ■
•
• •
■ ■
cr. c/3
2 sc
-a •
UFRC
URC
•a-— ... B LC
o 106
• — ■ - 101 co
RC
■
a- 102 m
105
■
■ 103 °
104

3 3 : : . . .
a a
"17: c/3
2 OS
a - UFRC = URC
-a • LC g
106
a- 101 cn
RC
a - 102 m 105
a 103 °
104
C/3 (A
> SC

>

-■

a
-a
a
a
a

SC

*

C/3 C/3
3 §
UFRC
URC •
a - LC
o 106 ■
a - 101 co
RC ■
■ 102 ffl 105 ■
- a- 103 °
104 - a
VS

11

3 s

UFRC
URC ■
LC yo 106 ■
101 CO RC •
z
102 0
105 - a
Q
104 a
103

•-

RS

a- -

■a
-a
-a
-a
a-

RS

a

VS

-------------------3 §
a- UFRC
URC
© 106
■ - LC
■ - 101 CO RC
a- 102 CO 105
■ 103
104
C/3 W
> sc

V5 Jl
2 sc
• -a- UFRC = URC
. .. . m . LC
o
106
... . -B. 101 to
RC
' a- a 102 m 105
a • a 103 °
104
C/3 C/3
> SC

34 * *
a a
IA iti
3 2
--B
— a - UFRC
URC
-a- — a - LC §
106
— -a- 101 co
RC
-a
a - 102 1
105
a
a - 103 °
104
C/3 c/3
> SC

a
-a
a
-a
a

-a
■a
-a
-a
-a

Figure 5.15 Schematic of a 4x4 section of the CNN-based DBNS adder.

Figure 5.16 An example of addition using the CNN-DBNS adder: (a) X, (b) Y, (c)
Overlaying X and Y, (d) Z after time T.

A step-by-step application of the reduction rules to transform the DBNS-map of the sum
output in Figure 5.16-d to the addition-ready representation is shown in Figure 5.17. In this
figure, the grey color represents a cell participating in a reduction rule at that particular
instant of time. The output of the Hspice simulation of the addition process and the non-zero
digit reduction to ARDBNR is shown in Figure 5.18. In the worst case, any two M x N
ARDBNR-maps can be added and the sum converted to an ARDBNR-map in a time delay
given by:

$$T_{delay} = \left( M + \lceil N/2 \rceil \right) \cdot T \qquad (5.12)$$

where $\lceil a \rceil$ denotes the smallest integer such that $\lceil a \rceil \ge a$.
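
As a quick numerical check of Eqn. (5.12), the sketch below evaluates the worst-case delay,
reading the formula as T_delay = (M + ceil(N/2)) * T. The function name and its arguments
are illustrative assumptions, and T is left as an abstract time step rather than a measured
value.

```python
import math


def worst_case_delay(m, n, t_step=1.0):
    """Worst-case delay of Eqn. (5.12), read as (M + ceil(N/2)) * T.

    m, n   -- dimensions of the M x N ARDBNR-maps
    t_step -- the time step T (1.0 gives the delay in multiples of T)
    """
    return (m + math.ceil(n / 2)) * t_step


# A 4x4 section needs at most 6 steps; a 20x20 map needs at most 30 steps.
print(worst_case_delay(4, 4))    # 6.0
print(worst_case_delay(20, 20))  # 30.0
```

Under this reading the delay grows linearly with the map dimensions, not with the number
of cells.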

Figure 5.17 Non-zero digit reduction of Z. Starting from left to right, each map is
obtained from the previous map after a time T.


Figure 5.18 Hspice simulation of a 4x4 section of the CNN-based DBNS adder.

5.4.2 Constraints on the CNN-based DBNS Adder to be Self-Programmable
The negative output current of the circuit in Figure 5.9 can be described using the
following equation:

$$I_{Ominus} = m_1 \left( I_n - I_{sink} \right) \qquad (5.13)$$

where $m_1$ is the ratio of the current mirror M3-M5 and $I_n$ is the sum of the output currents
from the cells in positions (i, j) and (i, j+1). The positive output current is given by the
following equation:

$$I_{Oplus} = m_2 m_3 \left( I_n - I_{sink} \right) \qquad (5.14)$$

where $m_2$ is the ratio of the current mirror M3-M4 and $m_3$ is the ratio of the current mirror
M6-M7. For simplicity of discussion, we will refer to the multiplying factor ($m_1$ in Eqn.
(5.13) and $m_2 m_3$ in Eqn. (5.14)) as the mirror ratio m.

The choice of the mirror ratio m and the value of the current source $I_{sink}$ play a crucial
role in the performance of the CNN network. An improper choice of either of these
parameters might cause instability or force the cell output to settle to an incorrect stable
point. There are two critical network transitions: a row reduction rule followed by a row
reduction rule, and a carry propagation rule followed by a row reduction rule.
An example of a row reduction rule followed by a row reduction rule is depicted in Figure
5.19.

Figure 5.19 An example of a reduction rule followed by a reduction rule: (a) initial
map, (b) map after the first reduction rule, (c) map after the second reduction
rule.
Examining Figure 5.19-a, one can see that there is only one group of active cells that can
participate in the row reduction rule of Eqn. (5.4). The active cells in positions (i, j) and
(i, j+1) will be replaced by an active cell in position (i+1, j) as shown in Figure 5.19-b.
This means that the cells in positions (i, j) and (i, j+1) will switch from logic 1 to logic 0
while the cell in position (i+1, j) will switch from logic 0 to logic 1. However, once the
transition starts and the cell in location (i+1, j) generates an output current, the cells at
positions (i+1, j) and (i+1, j+1) will constitute another candidate group of active cells for
the row reduction rule. Consequently, while the row reduction rule for the active group in
position (i, j) tries to force the state voltage of the cell in position (i+1, j) to rise, the row
reduction rule of the active group in position (i+1, j) tries to force the state voltage of the
same cell to fall, which leads to an undetermined final state. The solution to this problem is to
ensure that the row reduction rule (for the active group in position (i+1, j) in this case) will
not start unless the sum of the output currents is more than a certain large threshold. This
can easily be implemented by setting the current sink in Figure 5.9 to the threshold value
and then using a high mirror ratio to adjust the level of the current flowing into transistor
M5 or transistor M7 to the unit current.
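
The threshold idea can be illustrated with a small numerical sketch. It assumes that
Eqns. (5.13) and (5.14) both have the form output = m * (input sum - sink current), and
it adds a clamp at zero, which is an assumption of this sketch (the mirror is taken to
conduct nothing while the input sum is below the sink current). The normalized unit
current, the sink value, and both function names are hypothetical.

```python
def mirror_output(i_sum, i_sink, m):
    """Output current m * (i_sum - i_sink), clamped at zero.

    The clamp is an assumption of this sketch: the mirror is taken to
    conduct nothing while the summed input is below the sink current.
    """
    return m * max(i_sum - i_sink, 0.0)


def mirror_ratio_for_unit_output(i_unit, i_sink):
    """Mirror ratio that maps two fully active cells (2 * i_unit) to one unit current."""
    headroom = 2.0 * i_unit - i_sink
    if headroom <= 0:
        raise ValueError("the sink current must stay below 2 * i_unit")
    return i_unit / headroom


I_UNIT = 1.0    # normalized unit current (hypothetical value)
I_SINK = 1.75   # threshold placed close to 2 * I_UNIT (hypothetical value)
m = mirror_ratio_for_unit_output(I_UNIT, I_SINK)
print(m)  # 4.0

# One settled cell plus a neighbor that has only started to rise: no output.
print(mirror_output(1.0 + 0.4, I_SINK, m))  # 0.0
# Two settled cells: the rule fires and delivers one unit current.
print(mirror_output(2.0, I_SINK, m))        # 1.0
```

The closer the sink current is placed to twice the unit current, the later the second rule
can start, and the higher the mirror ratio needed to restore a full unit current at the output.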
The case where the carry propagation rule is followed by the row reduction rule is similar.
It is known from the definition of ARDBNR that there will always be a non-active cell to
the right of the doubled-weight cell as depicted in Figure 5.20. Therefore, the application
of the carry propagation rule is straightforward. However, once the transition starts and the
cell in position (i, j) generates an output current, the row reduction rule of Eqn. (5.4) will
be applicable to the cells (i, j) and (i, j+1). This case reduces to the above case, where the
application of one rule forces the state voltage of the cell to rise while the application of
the other rule forces the state voltage of the cell to fall. The solution is the one already
implemented for the above case: ensuring that the reduction rule will not
start unless the sum of the output currents is more than a certain large threshold.

Figure 5.20 An example of a carry propagation rule followed by a reduction rule:
(a) initial map, (b) map after applying the overlaying rule, (c) map after applying
the row reduction rule.
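
The interplay of the two rules can be emulated in software. The sketch below is a
sequential illustration only, not a model of the analog network dynamics. It assumes that
cell (i, j) carries the weight 3^i * 2^j, which is consistent with the index pattern in the
text: cells (i, j) and (i, j+1) combine into (i+1, j) because 3^i 2^j + 3^i 2^(j+1) =
3^(i+1) 2^j, and a doubled cell moves one position to the right because 2 * 3^i 2^j =
3^i 2^(j+1). The function names and the order in which the rules are tried are choices
made for this sketch.

```python
from collections import Counter


def dbns_value(cells):
    """Value of a DBNS map given as (i, j) cells, assuming weight 3**i * 2**j."""
    return sum(3**i * 2**j for i, j in cells)


def dbns_add(x_cells, y_cells):
    """Overlay two maps, then reduce until no doubled or row-adjacent cells remain."""
    counts = Counter(x_cells)
    counts.update(y_cells)  # overlaying: cells active in both maps get weight 2
    while True:
        # Carry propagation: a doubled cell becomes one cell to its right.
        doubled = sorted(c for c, k in counts.items() if k >= 2)
        if doubled:
            i, j = doubled[0]
            counts[(i, j)] -= 2
            counts[(i, j + 1)] += 1
            continue
        # Row reduction: (i, j) and (i, j+1) become (i+1, j).
        pairs = sorted((i, j) for (i, j), k in counts.items()
                       if k >= 1 and counts.get((i, j + 1), 0) >= 1)
        if pairs:
            i, j = pairs[0]
            counts[(i, j)] -= 1
            counts[(i, j + 1)] -= 1
            counts[(i + 1, j)] += 1
            continue
        return {c for c, k in counts.items() if k >= 1}


x = {(0, 0), (0, 2)}  # 1 + 4 = 5
y = {(0, 0), (1, 0)}  # 1 + 3 = 4
z = dbns_add(x, y)
print(z, dbns_value(z))  # {(2, 0)} 9
```

Each rule application preserves the represented value and removes one active unit from
the map, so the loop always terminates. In this example a carry is followed by two row
reductions, the same kind of chained transition discussed above.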

5.4.3 Impact of the CNN-based DBNS Adder on Substrate Noise
To get an estimate of the amount of switching noise and cross talk reduction, the CNN-based
adder is compared to a 32-bit standard digital binary adder since, to the knowledge
of the authors, there is no published DBNS design that uses standard logic gates. The 32-bit
digital binary adder has the same dynamic range as the 20x20 DBNS adder and is
designed using standard library cells in the same 0.35 µm CMOS technology. Hspice
simulations with different random inputs were performed on the adders and switching noise was
recorded. Switching noise for the CNN-based DBNS adder is plotted in Figure 5.21
against switching noise for the digital design. In all Hspice simulations, the CNN-based
DBNS adder showed lower switching noise, with an average improvement of 49 dB
over the digital adder.

[Plot: curves "CNN DBNS" and "Digital binary"; horizontal axis: time (ns)]

Figure 5.21 Switching noise of the CNN-based 20x20 DBNS adder and 32-bit
standard digital binary adder.
Cross talk is also simulated using Hspice and plotted in Figure 5.22. The CNN-based
DBNS adder reduced cross talk by more than 20dB compared to that of the digital adder.

[Plot: curves "Digital" and "CNN"; horizontal axis: time (ns)]

Figure 5.22 Cross talk of the CNN-based 20x20 DBNS adder and 32-bit standard
digital binary adder.
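
To give a feel for the size of the reported improvements, the sketch below converts a
dB figure into a linear ratio. This section does not state whether the figures are amplitude
ratios (20 log10) or power ratios (10 log10), so both conventions are shown. The helper
name and the `kind` argument are illustrative assumptions.

```python
def db_to_ratio(db, kind="amplitude"):
    """Convert a dB figure to a linear ratio.

    kind="amplitude" uses 20*log10 (voltage or current ratios);
    kind="power" uses 10*log10 (power ratios).
    """
    if kind == "amplitude":
        divisor = 20.0
    elif kind == "power":
        divisor = 10.0
    else:
        raise ValueError("kind must be 'amplitude' or 'power'")
    return 10.0 ** (db / divisor)


for db in (20, 49):
    print(db, "dB:",
          round(db_to_ratio(db, "amplitude"), 1), "x amplitude,",
          round(db_to_ratio(db, "power"), 1), "x power")
```

Read as an amplitude ratio, 49 dB corresponds to a factor of roughly 280, and 20 dB to a
factor of 10.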

5.5 Summary of CNN-based DBNS Arithmetic

This chapter has presented a practical methodology for implementing DBNS arithmetic
using analog CNN arrays. An interesting property of the DBNS is that numbers can be
represented naturally as 2-D grids. This property facilitates loading DBNS numbers either
as 2-D initial conditions or as 2-D inputs into CNN architectures. Therefore, arithmetic
operations in the double-base number system become a problem of CNN image
morphology. Subsequently, the challenge is to design proper CNN templates that perform
the required arithmetic task.
Addition in DBNS is reduced to two consecutive image manipulation steps: finding the
immediate sum of the operands' maps and transforming the immediate sum into the
addition-ready representation for further processing. These two steps are implemented using the
overlaying and row reduction rules. First, the row reduction rule is synthesized using a
simple current-mode circuit. A key advantage of this circuit is that it can be used to
perform the overlaying rule as well. The only difference is where the inputs come from
and where the outputs go. A CNN-based DBNS adder unit is then designed by employing
the overlaying and row reduction circuits as template connections to neighbor cells. The
functionality of the CNNDBNSAU is verified using extensive Hspice simulations.

The scalability of the CNNDBNSAU is illustrated by designing a 20x20 DBNS adder.
Special cases of the DBNS-maps are identified and restrictions are defined to control the
application of the reduction rules. The restrictions are developed using analog feedback
connections between groups of cells. This property renders the network
self-programmable in the sense that the adder switches between the overlaying and row
reduction rules based on the outputs of the involved cells. Hspice simulations show that
the CNN-based DBNS adder achieves a 49 dB improvement in switching noise and more
than a 20 dB reduction in cross talk over a standard digital adder that possesses the same
dynamic range and operates at the same speed.

Chapter 6
Conclusions

6.1 Summary and Contributions

This thesis reports a rather intriguing research project involving the development of
analog cellular neural networks for implementing arithmetic, with special applications in
mixed-signal systems that require the protection of very sensitive analog signals. Noise
issues represent a new domain and will become important as integrated circuits, for
example, start to include the packaging of bio-sensors in wireless devices for applications
in the health sciences. The novelty associated with this work is based on the use of arrays
of non-linear analog circuits to perform digital processing. Current-mode CNN arrays work
with the supply current being almost constant, providing minimum instantaneous supply
current variations. This property drastically reduces switching noise. In addition, CNN
arrays provide smooth output transitions with an RC mechanism to control the slew rate.
This feature allows circuit designers to reduce cross talk at the expense of circuit speed.
The major contributions in this thesis focus on the development of practical methodologies
to implement arithmetic operations using analog CNN arrays. The research covered
arithmetic operations using three different number systems: the binary number system, the
binary signed-digit number system, and the double-base number system. CNN realizations
for each number system uncover certain advantages and disadvantages that will be
summarized below. Circuit designers then have to weigh the pros and cons and decide which
specific number system will best suit their application.

Binary number system arithmetic: To tackle the noise issue, the research commenced
in the arithmetic domain, focusing on the binary number system. The main challenge was
to transform binary arithmetic into 2-D image morphology to be processed using analog
CNN arrays. This included defining new continuous functions representing the sum and
carry functions, synthesizing the new functions using simple current-mode circuits such as
summing nodes and current mirrors, and designing network neighborhood and template
connections between CNN cells. A 1-bit full adder was then designed using four CNN
cells and tested to converge for all possible inputs. This property is of paramount
importance because it guarantees the stability of larger networks. The 1-bit full adder
constitutes a basic building block that was used to develop more complex circuits such as
a multi-bit adder and a multiplier. The designed binary circuits have the following
advantages:

• Exhibit low switching noise. Hspice simulations show that the circuits improve
switching noise by 57 dB compared to standard digital designs operating at the same
speed.
• Offer low cross talk. Based on Hspice simulations, cross talk diminishes by more than
20 dB compared to the digital counterpart.
• Provide standard binary inputs and outputs. This property facilitates using the new
circuits as black boxes in more complex structures with no changes to the circuits being
designed.
Nevertheless, the developed circuits have some disadvantages that are common to
current-mode circuits, as listed below:

• Consume more power. Since these current-mode structures are driven by current, they
will always consume power even when the circuits are not processing any data.
• Have longer critical paths. This is a feature of current-driven circuits when optimized for
low power consumption. The small current requires more time to charge/discharge the
load capacitances.
Binary signed-digit number system: The exceptional results regarding noise
suppression obtained from the CNN-based binary designs drove the extension of the
design methodology to a redundant number system, which provides the benefits of reduced
delay and interconnects. The binary signed-digit representation was chosen because it
offers superior noise margins compared to higher radix systems. Nonetheless, the main
challenge was to discover a means to represent the 3-valued BSD number system naturally
in the CNN framework. This led to the design of a new class of CNN characterized by a
fundamental 3-state activation function. The addition algorithm of BSD was analyzed and
decomposed into several steps. Four 3-state CNN cells, together with new equations that
define the BSD addition algorithm, were used to develop a 1-digit BSD full adder. As was
the case with the binary circuits, the 1-digit BSD full adder was tested to converge for all
possible inputs. This 1-digit BSD full adder forms a key element used in building a
multi-digit adder and a multiplier. The multiplier features four-input addition of the partial
products in the first level of the binary tree. This property reduces the number of full
adders needed and, hence, the instantaneous supply current. The developed BSD circuits have
several advantages:

• Exhibit very low switching noise. The use of bi-directional current-mode summing
nodes reduced switching noise to unprecedented levels. The BSD circuits achieved
almost 70 dB improvement in switching noise over standard digital circuits. This is an
improvement of more than 12 dB over the corresponding CNN-based binary circuits.
• Offer reduced cross talk. Cross talk is reduced by more than 23 dB relative to the digital
circuits. This is an improvement of more than 3 dB over the CNN-based binary design.
• Allow constant-delay operation regardless of the word length. This is the main
advantage of using the binary signed-digit number representation. In contrast, the
delay in the binary system increases linearly with the length of the operands.
Consequently, the advantage of using BSD becomes prominent as operand sizes increase.
• Provide standard binary inputs and outputs even though they internally use the BSD
representation. This property facilitates using the new circuits as embedded components in
existing structures without any changes to the circuits.
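
The constant-delay property listed above can be illustrated with a textbook carry-free
radix-2 signed-digit addition. This is a generic software sketch, not the CNN equations
developed in this thesis: each position produces a transfer digit and an interim sum from
its own digits and those of the next lower position only, so no carry chain forms. Digit
lists are least-significant first, and the function names are illustrative.

```python
def bsd_value(digits):
    """Value of a BSD number; digits[0] is least significant, each in {-1, 0, 1}."""
    return sum(d * 2**k for k, d in enumerate(digits))


def bsd_add(x, y):
    """Carry-free addition of two equal-length BSD digit lists.

    Position k uses only the digits at positions k and k-1, so every output
    digit depends on a fixed-size window of the inputs.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("operands must have the same length")
    transfer = [0] * (n + 1)  # transfer[k] is the transfer digit entering position k
    interim = [0] * n
    for k in range(n):
        p = x[k] + y[k]
        # If both lower digits are non-negative, the incoming transfer is 0 or +1;
        # otherwise it is 0 or -1. The interim sum is chosen to absorb it.
        lower_nonneg = k == 0 or (x[k - 1] >= 0 and y[k - 1] >= 0)
        if p == 2:
            t, w = 1, 0
        elif p == 1:
            t, w = (1, -1) if lower_nonneg else (0, 1)
        elif p == 0:
            t, w = 0, 0
        elif p == -1:
            t, w = (0, -1) if lower_nonneg else (-1, 1)
        else:  # p == -2
            t, w = -1, 0
        transfer[k + 1] = t
        interim[k] = w
    return [interim[k] + transfer[k] for k in range(n)] + [transfer[n]]


x = [1, 0, 1]  # 1 + 4 = 5
y = [1, 1, 0]  # 1 + 2 = 3
s = bsd_add(x, y)
print(s, bsd_value(s))  # [0, 0, 0, 1] 8
```

The loop here is sequential only because it is written in software. No iteration uses a
result of an earlier one, so all positions could be evaluated at the same time.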

The developed BSD circuits also have some disadvantages, such as:

• Increased power consumption. These structures work with three levels of current
compared to two levels in the case of binary circuits. Therefore, to keep the noise margin
the same as for the binary circuits, the current provided by the current sources has to be
doubled.
• More transistors and larger silicon area. The BSD algorithm is more complex
than the binary algorithm. Realizing the restrictions on the transfer digit requires
complex circuits that use a large number of transistors, which usually translates into larger
silicon area.
Double-base number system: A fascinating feature of the DBNS is that numbers can be
represented naturally as 2-D grids. This property facilitates mapping DBNS numbers into
CNN architectures. The DBNS is also a relatively new number system and no practical
implementations had previously been published when this research work was started. This
piqued our interest in developing a practical methodology to convert arithmetic operations
in the DBNS into a problem of CNN image morphology. In order to take full advantage of
the limited-carry property promised by the DBNS, operands have to be represented in
addition-ready maps. These addition-ready representations can be obtained by applying
two reduction rules to the original DBNS-maps: the overlaying rule and the row reduction
rule. As an initial step toward the goal, an innovative circuit design to implement the row
reduction rule was developed. The key feature of this circuit is that it can also be used to
implement the overlaying rule using appropriate inputs and outputs. Special cases of
DBNR-maps were also identified and a unique feedback connection between groups of
cells was defined to deal with them. This method guaranteed correct network operation as
well as obtaining a DBNS-map with the smallest possible number of active cells. The use
of feedback between groups of cells renders the CNN-based DBNS adder
self-programmable, where the network decides which reduction rule to use next based on
the outputs of the involved cells. The main advantages of the DBNS circuit can be
summarized as follows:

• Exhibits low switching noise. Since the adder incorporates a pure analog control
method that governs the operation of the reduction rules, the network shares this
common feature of the binary and BSD designs. A 20x20 DBNS adder achieved a 49 dB
improvement in switching noise over the corresponding traditional digital adder.
• Offers low cross talk. The smooth transitions of the CNN nodes reduce cross talk by
more than 20 dB compared to that of the fast digital nodes in a digital implementation.
• Promises carry-free addition if the canonic representation is used. Even so, since the
canonic representation is difficult to obtain, the addition-ready representation can be
used to provide limited-carry addition.
The main disadvantage of the DBNS design when compared to digital designs is power
consumption. This seems to be the bottleneck that faces analog designers, and little can
apparently be done to improve it, particularly when speed has to be maximized.

A final point to be made for the DBNS representation is that it appears to be useful for
cryptographic applications [174] because of its sparseness. One important property of
crypto hardware systems is their resistance to side-channel attacks. These attacks can take several
forms in terms of hardware implementations, including measuring circuit power
consumption changes as certain crypto calculations are being performed. A crypto system
based on CNN arrays promises to be more resistant to such attacks than a computational
processor based on standard logic implementations, because of the very low noise levels
(power consumption changes) inherent in the CNN array operation.

A summary of the design specifications of the novel CNNBFA, CNNBSDFA, and
CNNDBNSAU designs is shown in Table 6.1. The table also shows design specifications
for the state-of-the-art CNN-based 1-bit binary full adders, 1-digit BSD adders, and the
CNN-based DBNS structure that performs reduction to addition-ready representation. The new
CNNBFA consumes about 43% of the power required by the recursive structure reported
in [35] and much less than the power required by the flat structure reported
in [34], where the power increases with $O(n^2)$. The speed of the new CNNBFA is about 8
times the speed of the recursive structure and about 19 times the speed of the flat structure.
The new CNNBFA also requires less than half the transistors of existing structures, and
this maps into smaller silicon area. In addition, the new CNNBFA is scalable and
compatible with well-known circuit designs.

The delay of the new CNNBSDFA is comparable to the 5 ns delay of the voltage-mode
signed-digit adder circuit reported in 0.18 µm CMOS technology [159]. In addition, the
CNNBSDFA uses 27% fewer transistors than the voltage-mode design. When compared
to the recently introduced negative-differential-resistance (NDR) signed-digit adder
[161][162], the CNNBSDFA is 8 times faster and consumes 82% less power. At first
glance, the BSD adder appears to be slower than the binary adder. However, an operand
size of just two digits is enough to produce a similar worst-case propagation delay
in a ripple-carry adder built with binary full adder cells. This indicates that the BSD can
provide significant speedup of addition for multi-digit adders, since addition of operands
of any length can be accomplished in the time required for a 2-bit binary addition.
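
The two-digit crossover can be checked with a short calculation. The sketch below takes
the per-cell delays as read from Table 6.1 (3.22 ns for the new CNNBFA and 6.12 ns for
the new CNN BSD adder) and assumes, for illustration only, that a ripple-carry adder's
worst-case delay is the per-bit delay times the number of bits with no other overhead. The
function and constant names are hypothetical.

```python
CNNBFA_DELAY_NS = 3.22    # 1-bit CNN binary full adder, as read from Table 6.1
CNNBSDFA_DELAY_NS = 6.12  # 1-digit CNN BSD full adder, as read from Table 6.1


def ripple_carry_delay_ns(n_bits, per_bit_ns=CNNBFA_DELAY_NS):
    """Worst-case ripple-carry delay, assuming it grows linearly with the bit count."""
    return n_bits * per_bit_ns


def bsd_delay_ns(n_digits, per_digit_ns=CNNBSDFA_DELAY_NS):
    """BSD addition delay, assumed independent of the operand length."""
    return per_digit_ns


def crossover_bits():
    """Smallest operand size at which the ripple-carry delay reaches the BSD delay."""
    n = 1
    while ripple_carry_delay_ns(n) < bsd_delay_ns(n):
        n += 1
    return n


print(crossover_bits())  # 2
for n in (1, 2, 8, 32):
    print(n, round(ripple_carry_delay_ns(n), 2), bsd_delay_ns(n))
```

Under these assumptions the binary ripple-carry adder is faster only for a single bit. From
two bits onward the constant BSD delay is the smaller of the two.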

The new CNNDBNSAU design performs both addition and reduction to AR
representation while the existing structure performs reduction only [34]. The new
CNNDBNSAU design is more than two times faster and uses fewer transistors, which
means smaller silicon area and less power consumption. In addition, the reduction rule
equations and operations are synthesized using simple current mirrors without hysteresis
or digital logic, which leads to improved switching noise.

Table 6.1 Summary of design specifications.

NS      Design           Transistor count   Active area (µm²)   Delay (ns)   Power (µW)
Binary  Flat [34]        ≈108x(1+N/4)       -                   ≈60          279.5x(1+N/4)
Binary  Recursive [35]   ≈101+              -                   ≈25          ≈392
Binary  New CNNBFA       43                 21.83               3.22         169.98
BSD     Digital [159]    140                -                   5            -
BSD     NDR [161]        17                 -                   47.89        2300
BSD     New CNN BSD      102                72.44               6.12         412.51
DBNS    DBNS [34]        ≈78+               -                   13           -
DBNS    New CNN DBNS     56                 28.35               5.1          197.11


The increased power and silicon area of the CNNBSDFA design, compared to the
CNNBFA and CNNDBNSAU, can be justified by its ultra-low switching noise
performance for very sensitive applications where reducing switching noise is of crucial
importance. The CNN-based BSD design reduces switching noise to unprecedented levels
as can be seen in Figure 6.1 and Figure 6.2. Figure 6.1 shows a switching noise
comparison for different adder designs as a function of adder size while Figure 6.2 shows a
switching noise comparison for different multiplier designs as a function of multiplier
size. It is clear that the BSD design offers the lowest switching noise for all operand sizes
in both cases.

[Plot: curves "CNN BSD", "CNN binary", "CNN DBNS", and "Digital binary"; horizontal axis: Number of bits]

Figure 6.1 Switching noise of different adders vs. adder size.

[Plot: curves "CNN BSD", "CNN binary", and "Digital binary"; horizontal axis: Number of bits]

Figure 6.2 Switching noise of different multipliers vs. multiplier size.

The CNN-based BSD design also offers reduced cross talk compared to all other designs
as can be seen in Figure 6.3.

Figure 6.3 Cross talk of different adder designs.

6.2 Conclusions

In this research work, a practical methodology to develop ultra-low noise arithmetic
circuits using analog cellular neural networks has been presented. The technique has been
used to design arithmetic circuits for three different number systems: the binary number
system, the binary signed-digit number system, and the double-base number system. First,
for each number system, the addition algorithm has been re-defined using continuous
functions that can be realized as CNN templates. Next, the templates have been utilized to
develop a 1-bit adder unit that converges for all possible inputs. This property guarantees
the stability of larger networks that use the adder unit as an embedded component. For the
BSD number system, a new class of CNN featuring a 3-state activation function has been
developed. This enables mapping the 3-valued BSD number system naturally into the 3
states of the new CNN cell. The DBNS adder unit uses a novel synthesis of the reduction
rules to perform addition as well as reduction to addition-ready representation for further
processing. Using the different adder units as enabling building blocks, complex circuit
structures such as multi-bit adders and large multipliers have been developed. The
CNN-based BSD arithmetic circuits are the first to be published in the literature. The DBNS
design employs a novel self-programmable structure that switches between different
templates based on the output voltages of the involved cells. This renders the DBNS adder
circuit the first complete CNN-based adder circuit in the literature. The BSD and DBNS
designs also incorporate the traditional advantages of such number systems, including
reduced delay and interconnects for large operands. Moreover, all the CNN structures
developed also exhibit the advantage of the ultra-low noise property. Hspice simulations,
using parameters from 0.35 µm CMOS technology, show that switching noise and cross
talk have been reduced by up to 70 dB and more than 23 dB, respectively, when compared
to traditional digital circuits operating at the same speed.

6.3 Suggestions for Future Work

The following are some directions for future research:

1. When we compare the noise performance of the designed circuits and the digital
implementations, we assume that all circuits have equivalent parasitic inductance and
capacitance. This is not always true because the parasitic elements depend greatly on the
specific layout of the circuit. A more accurate comparison can be performed by
fabricating the noise source (the design being tested) and a noise sensing element (e.g., a
sensitive amplifier) on the same chip a certain distance apart. Then random inputs can
be applied to the noise source and the effect on the noise sensing element can be
measured using accurate testing equipment.
2. In Chapter 5, we discussed addition in the double-base number system using the CNN
paradigm. We also developed an adder that performs addition as well as reduction to
the addition-ready representation. However, we assumed that the inputs are in the
addition-ready representation. For a practical implementation that is compatible with the
digital convention, the adder should accept binary inputs and produce binary outputs.
Therefore, the implementation of binary to double-base and double-base to binary
converters needs to be addressed.
3. We did not discuss the implementation of multiplication in Chapter 5. This is because
the traditional method of implementing a multiplier (using shift and add) seems
inefficient when considering the required silicon area. A thorough study needs to be done in
this area.
4. All the measurements presented in this thesis are based on Hspice simulation at room
temperature. For precise performance evaluation, second-order effects (such as thermal
noise and device mismatch) need to be taken into account.

REFERENCES

[ 1] "Substrate noise analysis o f mixed-signal ICs,” Cadence Design Systems, 2001.
[2]

P. Larsson, “Power supply noise in future IC's: A crystal ball reading,” in Proc.
IEEE Custom Integrated Circuits Conf., pp. 467-474,1999.

[3]

P. E. Gronowski et al., “High-performance microprocessor design,” IEEE J. SolidState Circuits, vol. SC-33, no. 5, pp. 676-686, May 1998.

[4]

International Technology Roadmap for Semiconductors. http://www.itrs.net/Common/2004Update/2004Update.htm

[5] M. Nagata and A. Iwata, “Substrate crosstalk analysis in mixed signal CMOS inte
grated circuits,” Proc. Asia and South Pacific - Design Automation Conf., ASPDAC 2000, pp. 623-629,2000.
[6] S. R. Vemuru, “Effects of simultaneous switching noise on the tapered buffer
design,” IEEE Trans, on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 3,
pp. 290-300, September 1997.
[7] J. J. Becerra and E. G. Friedman, Analog Design Issues in Digital VLSI Circuits and
Systems. Norwell, Massachusetts, Kluwer Academic Publishers, 1997.
[8] E. G. Friedman, High Performance Clock Distribution Networks, Kluwer Academic
Publishers, 1997.
[9] X. Aragones, J. L. Gonzalez and A. Rubio, Analysis and Solutions for Switching
Noise Coupling in Mixed-Signal ICs, Kluwer Academic Publishers, 1999.
[10] Han-Su Kim, Jenkins, K.A., and Ya-Hong Xie, “Effective crosstalk isolation
through p+ Si substrates with semi-insulating porous Si,” IEEE Electron Device Let
ters, vo. 23, no. 3, pp.160-162,2002.
[11] D. K. Su, M. J. Loinaz, S. Masui, and B. A. Wooley, “Experimental results and mod
elling techniques for substrate noise in mixed-signal integrated circuits,” IEEE J.
Solid State Circuits, 28,420,1993.
[12] K. Joardar, “Substrate crosstalk in BiCMOS mixed mode integrated circuits,” SolidState Electron, vol. 39, pp. 511-516, April 1996.
[13] J. H. Wu, J. Scholvin, J. A. del Alamo, and K. A. Jenkins, “A Faraday cage isolation
structure for substrate crosstalk suppression,” IEEE Trans. Microwave Wireless
Compon. Lett., vol. 11, pp. 410-412, 2001.
[14] Liao, C.P., Juang, K.C., Huang, T.H., Duh, D.S., Yang, T.T., and Liu, M.N., “A new
isolation technology for mixed-mode and general mixed-technology SOC chips,”
Semiconductor Manufacturing Technology Workshop, pp. 124-132, 2000.
[15] J. P. Raskin, A. Viviani, D. Flandre, and J. P. Colinge, “Substrate cross-talk reduction
using SOI technology,” IEEE Trans. Electron Devices, vol. 44, pp. 2252-2261,
1997.
[16] M. Kumar, Y. Tan, and J. K. O. Sin, “A simple, high performance complementary
TFSOI BiCMOS technology with excellent cross-talk isolation and high-Q inductors
for low power wireless applications,” in Proc. IEEE Int. SOI Conf., Wakefield, MA,
pp. 142-143, Oct. 2000.
[17] Kumar, M., Tan, Y., and Sin, J.K.O., “Excellent cross-talk isolation, high-Q inductors,
and reduced self-heating in a TFSOI technology for system-on-a-chip applications,”
IEEE Trans. Electron Devices, vol. 49, no. 4, pp. 584-589, 2002.
[18] “High integration brings noise to a new level,” Technology Update, Wireless System
Design, November 1999.
[19] Y. Tsividis, Mixed Analog-Digital VLSI Design and Technology, McGraw-Hill,
1995.
[20] A. Cathelin, D. Saias, D. Belot, Y. Leclercq, and F. J. R. Clement, “Substrate parasitic
extraction for RF integrated circuits,” Design Automation & Test in Europe,
poster 4C-2, March 2002.
[21] R.M. Secareanu, S. Warner, S. Seabridge, C. Burke, T.E. Watrobski, C. Morton, W.
Staub, T. Tellier, and E.G. Friedman, “Placement of substrate contacts to alleviate
substrate noise in epi and non-epi technologies,” Proc. of the 43rd IEEE Midwest
Symposium on Circuits and Systems, vol. 3, pp. 1314-1318, 2000.
[22] L.-R. Zheng and H. Tenhunen, “Effective power and ground distribution scheme for
deep submicron high speed VLSI circuits,” Proc. of the 1999 IEEE International
Symposium on Circuits and Systems, ISCAS '99, vol. 1, pp. 537-540, 1999.
[23] T. Blalack, “Design techniques to reduce substrate noise,” Analog Circuit Design:
Volt Electronics; Mixed-Mode Systems; Low-Noise and RF Power Amplifiers for
Telecommunication, pp. 193-217, J. Huijsing-Editor, Kluwer, 1999.
[24] R.M. Secareanu, S. Warner, S. Seabridge, C. Burke, T.E. Watrobski, C. Morton, W.
Staub, T. Tellier, and E.G. Friedman, “Physical design to improve the noise immunity
of digital circuits in a mixed-signal smart-power system,” Proc. of the 2000
IEEE International Symposium on Circuits and Systems, ISCAS 2000, vol. 4, pp.
277-280, 2000.
[25] T. Blalack, Y. Leclercq, and C.P. Yue, “On-chip RF isolation techniques,” Proc. of
the Bipolar/BiCMOS Circuits and Technology Meeting, pp. 205-211, 2002.
[26] K.T. Tang and E.G. Friedman, “On-chip delta-I noise in the power distribution networks
of high speed CMOS integrated circuits,” Proc. 13th Annual IEEE International
ASIC/SOC Conf., pp. 53-57, 2000.
[27] L. Connell, N. Hollenbeck, M. Bushman, D. McCarthy, S. Bergstedt, R. Cieslak, J.
Caldwell, “A CMOS broadband tuner IC,” IEEE International Solid-State Circuits
Conf., vol. 45, pp. 400-401, February 2002.
[28] L. Forbes, B. Ficq and S. Savage, “Resonant forward-biased guard-ring diodes for
suppression of substrate noise in mixed-mode CMOS circuits,” Electronics Letters,
vol. 31, no. 9, pp. 720-721, April 1995.
[29] T. Liu, J.D. Carothers, and W.T. Holman, “A negative feedback based substrate coupling
noise reduction method,” Proc. Twelfth Annual IEEE International ASIC/SOC
Conf., pp. 49-53, 1999.
[30] S. Sakiyama, J. Kajiwara, M. Kinoshita, K. Satomi, K. Ohtani, and A. Matsuzawa,
“An on-chip high-efficiency and low-noise DC/DC converter using divided switches
with current control technique,” IEEE International Solid-State Circuits Conf.,
ISSCC '99, Digest of Technical Papers, pp. 156-157, 1999.
[31] H-T. Ng and D.J. Allstot, “CMOS current steering logic for low-voltage mixed-signal
integrated circuit,” IEEE Trans. VLSI Systems, 5, pp. 301-308, Sept. 1997.
[32] D.J. Allstot, S-H. Chee and M. Shrivastawa, “Folded source-coupled logic vs.
CMOS static logic for low-noise mixed-signal ICs,” IEEE Trans. Circuits and Systems I,
40, pp. 553-563, Sept. 1993.
[33] E. Albuquerque, J. Fernandes and M. Silva, “NMOS current-balanced logic,” Electronics
Letters, 32, pp. 997-998, May 1996.
[34] S. Sadeghi-Emamchaie, Novel Cellular Neural Networks for Low-noise Digital
Arithmetic Architectures, Ph.D. Thesis, University of Windsor, 1999.
[35] S. Sadeghi-Emamchaie, G. A. Jullien, and W. C. Miller, “Very low-noise (switching
free) CNN-based adder,” SPIE Proc. Advanced Signal Processing Algorithms,
Architectures, and Implementations IX, 3807, (7), 1999.
[36] A.J. Acosta, P. Parra, J. Juan, M. Valencia, and R. Jimenez, “Reduction of switching
noise in low-power CMOS digital circuits by gate-level optimization,” International
Workshop on Logic and Synthesis, pp. 101-106, June 2001.
[37] A.J. Acosta, R. Jimenez, J. Juan, M. J. Bellido, and M. Valencia, “Influence of
clocking strategies on the design of low switching-noise digital and mixed-signal
VLSI circuits,” in 10th PATMOS, pp. 316-326, Sept. 2000.
[38] L. O. Chua and L. Yang, “Cellular neural networks: Theory,” IEEE Trans. Circuits
& Systems, vol. CAS-35, pp. 1257-1272, Oct. 1988.
[39] L. O. Chua and L. Yang, “Cellular neural networks: Applications,” IEEE Trans. Circuits
& Systems, vol. CAS-35, pp. 1273-1289, Oct. 1988.
[40] L. O. Chua and T. Roska, “The CNN paradigm,” IEEE Trans. Circuits & Systems,
vol. CAS-40, pp. 147-155, March 1993.
[41] T. Roska and L. O. Chua, “The CNN universal machine: An analogic array computer,”
IEEE Trans. Circuits & Systems, vol. CAS-40, pp. 163-172, March 1993.
[42] T. Roska and A. Rodriguez-Vazquez (Editors), Towards the Visual Microprocessor,
John Wiley & Sons Ltd., 2000.
[43] L.O. Chua and T. Roska, Cellular Neural Networks and Visual Computing: Foundations
and Applications, Cambridge University Press, 2002, ISBN 0 521 65247 2.
[44] R. Kunz, R. Tetzlaff and D. Wolf, “SCNN: A universal simulator for cellular neural
networks,” Proc. IEEE CNNA 96, Sevilla, pp. 255-260, 1996.
[45] L. Bertucco and N& G., “A multi-layer cellular neural network simulator for image
processing applications”, Proc. the Third Int. ICSC Symposia on Intelligent Industrial
Automation IN99 and Soft Computing SOCO'99, pp. 147-151, 1999.
[46] G. De Sandre and A. Premoli, “Piecewise-exponential approximation for fast time-domain
simulation of 2-D cellular neural networks,” IEEE Trans. Circuits and Systems II:
Express Briefs, vol. 51, no. 8, pp. 400-405, 2004.
[47] T. Kacprzak and K. Slot, “Multiple-input OTA based circuit for cellular neural network
implementation in VLSI CMOS technology,” Proc. IEEE CNNA-92, pp. 157-162, 1992.
[48] S. Espejo, A. Rodriguez-Vazquez, R. Dominguez-Castro, and J. L. Huertas,
“Switched-current techniques for image processing cellular neural networks in MOS
VLSI,” Proc. IEEE Int. Symp. Circuits & Systems, pp. 1537-1540, 1992.
[49] M. Sindhwani, T. Srikanthan, and K.V. Asari, “VLSI efficient discrete-time cellular
neural network processor,” IEE Proc. Circuits, Devices and Systems, vol. 149, no. 3,
pp. 167-171, 2002.
[50] A. Rodriguez-Vazquez, S. Espejo, R. Dominguez-Castro, J. L. Huertas, and E.
Sanchez-Sinencio, “Current-mode techniques for the implementation of continuous
and discrete-time cellular neural networks,” IEEE Trans. Circuits and Systems II,
vol. 40, no. 3, pp. 132-146, 1993.
[51] J. E. Varrientos, E. Sanchez-Sinencio, and J. Ramirez-Angulo, “A current-mode cellular
neural network implementation,” IEEE Trans. Circuits & Systems, Pt. II, vol.
CAS-40, pp. 147-155, March 1993.
[52] M. Gilli, F. Corinto, and P.P. Civalleri, “Design and synthesis methods for cellular
neural networks,” Proc. Int. Joint Conf. on Neural Networks, vol. 2, pp. 1486-1491,
2003.
[53] P. Kinget and M. Steyaert, “A programmable analog cellular neural network CMOS
chip for high speed image processing,” IEEE Journal of Solid-State Circuits, vol. 30,
no. 3, pp. 235-243, March 1995.
[54] Wen-Cheng Yen, Rong-Jian Chen, and Jui-Lin Lai, “Design of MIN/MAX cellular
neural networks (MMCNNS) in CMOS technology,” Proc. of the 2002 7th IEEE
International Workshop on Cellular Neural Networks and Their Applications,
CNNA '02, pp. 339-346, 2002.
[55] T. Matsumoto, L. O. Chua, and H. Suzuki, “CNN cloning template: Connected component
detector,” IEEE Trans. Circuits and Systems, vol. 37, no. 5, pp. 633-635,
May 1990.
[56] T. Matsumoto, L. O. Chua, and R. Furukawa, “CNN cloning template: Hole-filler,”
IEEE Trans. Circuits and Systems, vol. 37, no. 5, pp. 635-638, May 1990.
[57] H. Aomori, T. Otake, N. Takahashi, and M. Tanaka, “Lossless image coding based
on lifting wavelet using discrete-time cellular neural network with multi-templates,”
Proc. the 2004 Int. Symposium on Circuits and Systems, ISCAS '04, vol. 3, pp. III-101-4, 2004.
[58] D. Feiden and R. Tetzlaff, “Binary image coding using cellular neural networks,”
Proc. the Int. Joint Conf. on Neural Networks, vol. 2, pp. 1149-1152, 2003.
[59] M. Tanaka, Y. Tanji, M. Onishi, and N. Takahashi, “Lossless image compression and
reconstruction by cellular neural networks,” Proc. the 2000 6th IEEE Int. Workshop
on Cellular Neural Networks and Their Applications, CNNA '00, pp. 57-62, 2000.
[60] Chang Wen Chen, Lulin Chen, and Jiebo Luo, “A cellular neural network for clustering-based
adaptive quantization in sub-band video compression,” IEEE Trans.
Circuits and Systems for Video Technology, vol. 6, no. 6, pp. 688-692, 1996.
[61] G. Costantini, D. Casali, and R. Perfetti, “Cellular neural network template for
rotation of grey-scale images,” Electronics Letters, vol. 39, no. 25, pp. 1803-1805,
2003.
[62] Q. Gao, P. Messmer, and G.S. Moschytz, “Binary image rotation using cellular neural
networks,” IEEE Int. Symp. on Circuits and Systems, ISCAS '02, vol. 3, pp. III-113 - III-116, 2002.
[63] J. Kowalski, “0.8 µm CMOS implementation of weighted-order statistic image filter
based on cellular neural network architecture,” IEEE Trans. Neural Networks, vol.
14, no. 5, pp. 1366-1374, 2003.
[64] V.V. Khryasshyov, A.L. Priorov, E.Yu. Sautov, and E.A. Sokolenko, “Digital
image filtration on cellular neural network,” IEEE Int. Conf. on Artificial Intelligence
Systems, ICAIS '02, pp. 248-251, 2002.
[65] J. Kowalski and T. Kacprzak, “Cellular neural network based weighted median filter
for real time image processing,” Proc. Int. Conf. on Image Processing, vol. 1, pp.
545-548, 2001.
[66] M.A.J. Moran and J.A.F. Munoz, “Applications of cellular neural networks (CNN)
to grey scale image filtering,” Ninth Int. Conf. on Artificial Neural Networks,
ICANN '99, vol. 1, pp. 449-454, 1999.
[67] P. Ecimovic and J. Wu, “Delay-driven contrast enhancement using a cellular neural
network with state-dependent delay,” Proc. 7th IEEE Int. Workshop on Cellular
Neural Networks and Their Applications, CNNA '02, pp. 202-208, 2002.
[68] T. Hammadou and A. Bouzerdoum, “Novel image enhancement technique using
shunting inhibitory cellular neural networks,” IEEE Trans. Consumer Electronics,
vol. 47, no. 4, pp. 934-940, 2001.
[69] M. Brendel and T. Roska, “Adaptive image sensing and enhancement using the
adaptive cellular neural network Universal Machine,” Proc. 6th IEEE Int. Workshop
on Cellular Neural Networks and Their Applications, CNNA '00, pp. 93-98, 2000.
[70] S. Itakura, Y. Tanji, T. Otake, and M. Tanaka, “Progressive image reconstruction via
cellular neural networks,” IEEE Int. Symp. on Circuits and Systems, ISCAS '02, vol.
1, pp. I-233 - I-236, 2002.
[71] M.E. Celebi and C. Guzelis, “Image restoration using cellular neural network,” Electronics
Letters, vol. 33, no. 1, pp. 43-45, 1997.
[72] P.A. Stubberud and A.R. Stubberud, “Text image restoration using cellular neural
networks,” Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS '97, vol. 1, pp.
749-752, 1997.
[73] P.R. Giaccone, D. Tsaptsinos, and G.A. Jones, “Foreground-background segmentation
by cellular neural networks,” Proc. 15th Int. Conf. on Pattern Recognition, vol.
2, pp. 438-441, 2000.
[74] D.L. Vilarino, D. Cabello, M. Balsi, and V.M. Brea, “Image segmentation based on
active contours using discrete time cellular neural networks,” Proc. Fifth IEEE Int.
Workshop on Cellular Neural Networks and Their Applications, pp. 331-336, 1998.
[75] T. Sziranyi and J. Zerubia, “Markov random field image segmentation using cellular
neural network,” IEEE Trans. Circuits and Systems I: Fundamental Theory and
Applications, vol. 44, no. 1, pp. 86-89, 1997.
[76] Chin-Teng Lin, Shi-An Chen, Chao-Hui Huang, and Jen-Feng Chung, “Cellular
neural networks and PCA neural networks based rotation/scale invariant texture
classification,” Proc. IEEE Int. Joint Conf. on Neural Networks, vol. 1, pp. 153-158,
2004.
[77] R. Schonmeyer, D.F. Feiden, and R. Tetzlaff, “On-chip template training for pattern
matching by cellular neural network universal machines (CNN-UM),” Proc. Int.
Symp. on Circuits and Systems, ISCAS '03, vol. 3, pp. III-514 - III-517, 2003.
[78] G. Grassi and E. Di Sciascio, “A new learning algorithm for pattern classification
using cellular neural networks,” IEEE Int. Symp. on Circuits and Systems, ISCAS
'01, vol. 3, pp. 652-655, 2001.
[79] N. Stanic, M. Potrebic, D. Durdevic, D. Dujkovic, and P. Kostic, “Character recognition
using a cellular neural network,” 2002 6th Seminar on Neural Network Applications
in Electrical Engineering, NEUREL '02, pp. 135-138, 2002.
[80] V. Tavsanoglu and E. Saatci, “Feature extraction for character recognition using
Gabor-type filters implemented by cellular neural networks,” Proc. 6th IEEE Int.
Workshop on Cellular Neural Networks and Their Applications, CNNA '00, pp. 63-68, 2000.
[81] R.H. Tsai, B.J. Sheu, M.Y. Wang, and S.H. Jen, “Two-dimensional cellular neural
networks for pre-processing in face recognition and digital library search,” Proc.
IEEE Int. Symp. on Circuits and Systems, ISCAS '97, vol. 1, pp. 733-736, 1997.
[82] D. Feiden and R. Tetzlaff, “Obstacle detection in planar worlds using cellular neural
networks,” Proc. 7th IEEE Int. Workshop on Cellular Neural Networks and Their
Applications, CNNA '02, pp. 383-390, 2002.
[83] A. Zanela and S. Taraglio, “A cellular neural network stereo vision system for
autonomous robot navigation,” Proc. 6th IEEE Int. Workshop on Cellular Neural
Networks and Their Applications, CNNA '00, pp. 117-122, 2000.
[84] M. Kanaya and M. Tanaka, “Robot multi-driving controls by cellular neural networks,”
Proc. Third IEEE Int. Workshop on Cellular Neural Networks and their
Applications, CNNA '94, pp. 481-486, 1994.
[85] G. Grassi and L.A. Grieco, “Object-oriented image analysis via analog CNN algorithms
- part I: Motion estimation,” Proc. 7th IEEE Int. Workshop on Cellular Neural
Networks and their Applications, CNNA '02, 2002.
[86] M. Balsi, “Regularization-based continuous-time motion detection by single-layer
cellular neural networks,” Proc. 6th IEEE Int. Workshop on Cellular Neural Networks
and their Applications, CNNA '00, pp. 135-140, 2000.
[87] M.G. Milanova, A.C. Campilho, and M.V. Correia, “Cellular neural networks for
motion estimation,” Proc. 15th Int. Conf. on Pattern Recognition, vol. 3, pp. 819-822, 2000.
[88] K. Kondo, H. Morishita, Y. Konishi, and H. Ishigaki, “Design of two-stage cellular
neural network filter for detecting particular moving objects,” Proc. Fifth Int. Symp.
on Signal Processing and Its Applications, ISSPA '99, vol. 2, pp. 665-668, 1999.
[89] V. Preciado, D. Guinea, R. Montufar, and J. Vicente, “Real-time inspection of metal
laminates by means of CNNs,” Proc. SPIE, vol. 4301, no. 39, pp. 260-270, 2001.
[90] D. Guinea, A. Gordaliza, J. Vicente, and M. C. Garcia-Alegre, “CNN based visual
processing for industrial inspection,” Proc. SPIE, vol. 3966, no. 45, pp. 315-322,
2000.
[91] C.L. Chang and C.T. Lin, “CNN-based defect inspection in images with regular pattern,”
Proc. 16th Eur. Conf. Circuit Theory and Design, ECCTD '03, pp. 1221-1224,
2003.
[92] R. Perfetti and L. Terzoli, “Analogic CNN algorithms for textile applications,” Int. J.
Circuit Theory Applications, no. 28, pp. 77-85, 2000.
[93] R. Fantacci, R. Gubellini, D. Tarchi, and T. Pecorella, “DiffServ on-board satellite
switching based on cellular neural networks,” IEEE Int. Conf. on Communications,
vol. 7, pp. 3953-3957, 2004.
[94] R. Fantacci, M. Forti, M. Marini, and L. Pancani, “Cellular neural network approach
to a class of communication problems,” IEEE Trans. Circuits Syst. I, vol. 46, pp.
1457-1467, 1999.
[95] A. Fancsali and J. Levendovszky, “CNN based real-time call admission control in
packet-switched networks,” in Proc. Polish-Czech-Hungarian Workshop on Circuit
Theory, Signal Processing, and Telecommunication Networks, Budapest, Hungary,
2001.
[96] J. Levendovszky and A. Fancsali, “Real-time call admission control for packet-switched
networking by cellular neural networks,” IEEE Trans. Circuits and Systems I:
Regular Papers, vol. 51, no. 6, pp. 1172-1183, 2004.
[97] Zhang Yifeng and He Zhengya, “A secure communication scheme based on cellular
neural network,” IEEE Int. Conf. on Intelligent Processing Systems, ICIPS '97, vol.
1, pp. 521-524, 1997.
[98] F. Gollas, C. Niederhofer, and R. Tetzlaff, “Prediction of brain electrical activity in
epilepsy using a higher-dimensional prediction algorithm for discrete time cellular
neural networks (DTCNN),” Proc. Int. Symp. on Circuits and Systems, ISCAS '04,
vol. 5, pp. V-720 - V-723, May 2004.
[99] M. Laiho, A. Paasio, A. Kananen, and K. Halonen, “A mixed mode polynomial-type
CNN for analyzing brain electrical activity in epilepsy,” in International Journal of
Circuit Theory and Applications, pp. 165-180, 2002.
[100] R. Tetzlaff, C. Niederhofer, and P. Fischer, “Feature extraction in epilepsy using a
cellular neural network based device - first results,” Proc. Int. Symp. on Circuits and
Systems, ISCAS '03, vol. 3, pp. III-850 - III-853, 2003.
[101] J. Vandewalle, B. Preneel, and M. Csapodi, “Data security issues, cryptographic
protection methods, and the use of cellular neural networks and cellular automata,”
Proc. Fifth IEEE Int. Workshop on Cellular Neural Networks and Their Applications,
pp. 39-44, 1998.
[102] F. Werblin, A. Jacobs, and J. Teeters, “The computational eye,” IEEE Spectrum,
pp. 30-37, May 1996.
[103] F.S. Werblin and A. Jacobs, “The cellular neural network as a retinal camera for
visual prosthesis,” Int. Conf. on Neural Networks, vol. 4, pp. 2327-2332, 1997.
[104] I. Krstic, B. Reljin, and P. Kostic, “Cellular neural network to model and solve direct
non-linear problems of steady-state heat transfer,” Int. Conf. on Trends in Communications,
EUROCON '01, vol. 2, pp. 420-423, 2001.
[105] T. Nakaguchi, K. Omiya, and M. Tanaka, “Hysteresis cellular neural networks for
solving combinatorial optimization problems,” Proc. 7th IEEE Int. Workshop on
Cellular Neural Networks and Their Applications, CNNA '02, pp. 539-546, 2002.
[106] N.K. Al-Ani and T. Kacprzak, “Application of time-varying cellular neural network
for optimal solutions,” Proc. 6th IEEE Int. Workshop on Cellular Neural Networks
and Their Applications, CNNA '00, pp. 235-240, 2000.
[107] A. Rasmussen and M.E. Zaghloul, “CMOS analog implementation of cellular neural
network to solve partial differential equations with a microelectromechanical thermal
interface,” Proc. 40th Midwest Symp. on Circuits and Systems, vol. 2, pp.
1326-1329, 1997.
[108] M. Leong, P. Vasconcelos, J.R. Fernandes, and L. Sousa, “A programmable cellular
neural network circuit,” 17th Symp. on Integrated Circuits and Systems Design
SBCCI 2004, pp. 186-191, 2004.
[109] B.J. Sheu, S.H. Bang, and W.C. Fang, “Optimal solutions of selected cellular neural
network applications by the hardware annealing method,” Proc. IEEE CNNA-94,
pp. 279-284, 1994.
[110] B.J. Sheu, S.H. Bang, and W.C. Fang, “Analog VLSI design of cellular neural networks
with annealing ability,” Proc. IEEE CNNA-94, pp. 387-391, 1994.
[111] C.C. Lee and J. Pineda de Gyvez, “Color image processing in a cellular neural network
environment,” IEEE Trans. Neural Networks, vol. 7, no. 5, pp. 1086-1098,
Sept. 1996.
[112] L. Wang, J. Gyvez, and E. Sanchez-Sinencio, “Time multiplexed color image processing
based on a CNN with cell-state outputs,” IEEE Trans. VLSI Systems, vol. 6,
no. 2, pp. 314-322, June 1998.
[113] H.N. Cheung, A. Bouzerdoum, and W. Newland, “Properties of shunting inhibitory
cellular neural networks for color image enhancement,” Proc. 6th Int. Conf. on Neural
Information Processing, ICONIP '99, vol. 3, pp. 1219-1223, 1999.
[114] L.O. Chua and T. Roska, “The CNN Universal Machine part 1: The architecture,”
Int. Workshop on Cellular Neural Networks and their Applications (CNNA), pp. 1-10, 1992.
[115] M. Gilli, T. Roska, L.O. Chua and P.P. Civalleri, “CNN dynamics represents a
broader class than PDEs,” Int. Journal of Bifurcation and Chaos, vol. 12, pp. 2051-2068, 2002.
[116] M. Gilli, P. Checco, and F. Corinto, “A spectral technique for the analysis of nonlinear
dynamic arrays,” IEEE Eleventh Int. Workshop on Nonlinear Dynamics of Electronics
Systems, pp. 85-88, May 2003.
[117] M. Di Marco, M. Forti, and A. Tesi, “Rich dynamics in weakly-coupled full-range
cellular neural networks,” Proc. Int. Symp. on Circuits and Systems, ISCAS '04,
vol. 3, pp. III-41-4, May 2004.
[118] M. Gilli and F. Corinto, “On dynamic behavior of weakly connected cellular neural
networks,” Proc. Int. Symp. on Circuits and Systems, ISCAS '04, vol. 5, pp. V-489 - V-492, May 2004.
[119] M. Biey, M. Gilli and P. Checco, “Complex dynamic phenomena in space-invariant
cellular neural networks,” IEEE Trans. Circuits and Systems I: Fundamental Theory
and Applications, vol. 49, no. 3, pp. 340-345, March 2002.
[120] Zhigang Zeng, Jun Wang, and Xiaoxin Liao, “Stability analysis of delayed cellular
neural networks described using cloning templates,” IEEE Trans. Circuits and Systems I:
Regular Papers, vol. 51, no. 11, pp. 2313-2324, Nov. 2004.
[121] H. Jiang, Z. Li, and Z. Teng, “Boundedness and stability for non autonomous cellular
neural networks with delay,” Phys. Lett. A, vol. 306, pp. 313-325, 2003.
[122] V. Singh, “Robust stability of cellular neural networks with delay: linear matrix inequality
approach,” IEE Proc. Control Theory and Applications, vol. 151, no. 1, pp.
125-129, 2004.
[123] H. Hara, N. Takahashi, and T. Nishi, “Necessary and sufficient conditions for one
dimensional discrete-time binary cellular neural network with non symmetric connections
to be stable,” The 47th Midwest Symp. on Circuits and Systems, MWSCAS
'04, vol. 1, pp. I-389 - I-392, 2004.
[124] Xuemei Li, Lihong Huang, and Jianhong Wu, “A new method of Lyapunov functionals
for delayed cellular neural networks,” IEEE Trans. Circuits and Systems I:
Regular Papers, vol. 51, no. 11, pp. 2263-2270, Nov. 2004.
[125] J. J. Yeboah Jr., Y. Ibrahim, G. A. Jullien, J. W. Haslett, “Ultra low-noise arithmetic
using cellular neural networks,” Micronet R & D Wkshp, pp. 105-106, 2004.
[126] Z. Galias, “Designing cellular neural networks for the evaluation of local Boolean
functions,” IEEE Trans. Circuits and Systems-II, vol. 40, pp. 219-223, 1993.
[127] K. Bult and H. Wallinga, “A class of analog CMOS circuits based on the square-law
characteristic of a MOS transistor in saturation,” IEEE J. Solid-State Circuits, vol.
22, no. 3, pp. 357-365, March 1995.
[128] P. Kinget and M. Steyaert, Analog VLSI Integration of Massive Parallel Processing
Systems, Kluwer Academic Publishers, 1997.
[129] Chichyang Chen and Rui-Lin Chen, “Performance-improved computation of very
large word-length LNS addition/subtraction using signed-digit arithmetic,” Proc.
IEEE Int. Conf. on Application-Specific Systems, Architectures, and Processors, pp.
337-347, 2003.
[130] C. Chen, L.A. Chen, and J.R. Cheng, “Architectural design of a fast floating-point
multiplication-add fused unit using signed-digit addition,” IEE Proc. Computers and
Digital Techniques, vol. 149, no. 4, pp. 113-120, 2002.
[131] Wen-Chang Yeh and Chein-Wei Jen, “A high performance carry-save to signed
digit recoder for fused addition-multiplication,” Proc. IEEE Int. Conf. on Acoustics,
Speech, and Signal Processing, ICASSP '00, vol. 6, pp. 3259-3262, 2000.
[132] K. W. Shin and Bang-Sup Song, “A complex multiplier architecture based on redundant
binary arithmetic,” 1997 IEEE ISCS, Jun. 1997.
[133] A.K. Cherri and M. Hamad, “Algorithms for optoelectronics implementation of trigonometric
functions based on modified signed-digit numbers,” The 14th Int. Conf.
on Microelectronics, ICM '02, pp. 109-113, 2002.
[134] M.D. Ercegovac and T. Lang, “Simple radix-4 division with operands scaling,”
IEEE Trans. Computers, vol. 39, no. 9, pp. 1,204-1,208, Sept. 1990.
[135] L. Ciminiera and P. Montuschi, “High radix square rooting,” IEEE Trans. Computers,
vol. 39, no. 10, pp. 1,220-1,231, Oct. 1990.
[136] W.G. Natter and B. Nowrouzian, “A novel algorithm for signed-digit online multiply-accumulate
operation and its purely signed-binary hardware implementation,”
Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS '00, vol. 5, pp. 329-332,
2000.
[137] M. Kameyama, T. Seikibe, and T. Higuchi, “Highly parallel residue arithmetic chip
based on multiple-valued bidirectional current-mode logic,” IEEE J. Solid State Circuits,
vol. 24, pp. 1,404-1,411, Oct. 1989.
[138] Shugang Wei and K. Shimizu, “Residue arithmetic circuits using a signed-digit
number representation,” Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS '00,
vol. 1, pp. 24-27, 2000.
[139] A.G. Dempster and M.D. Macleod, “Using all signed-digit representations to design
single integer multipliers using subexpression elimination,” Proc. Int. Symp. on Circuits
and Systems, ISCAS '04, vol. 3, pp. III-165-8, 2004.
[140] Wu Huapeng and M.A. Hasan, “Closed-form expression for the average weight of
signed-digit representations,” IEEE Trans. Computers, vol. 48, no. 8, pp. 848-851,
1999.
[141] D. Lo Iacono and M. Ronchi, “Binary canonic signed digit multiplier for high-speed
digital signal processing,” The 47th Midwest Symposium on Circuits and Systems,
MWSCAS '04, vol. 2, pp. II-205-8, 2004.
[142] Li Liang, M. Ahmadi, M. Sid-Ahmed, and K. Wallus, “Design of canonical signed
digit IIR filters using genetic algorithm,” The Thirty-Seventh Asilomar Conf. on
Signals, Systems & Computers, vol. 2, pp. 2043-2047, 2003.
[143] M. Tanaka and A. Nishihara, “Design of signal word decomposed filters with
canonical-signed digit coefficients,” Proc. TENCON 2000, vol. 1, pp. 482-486,
2000.
[144] Y.M. Hasan, L.J. Karam, M. Falkinburg, A. Helwig, and M. Ronning, “Canonic
signed digit Chebyshev FIR filter design,” IEEE Signal Processing Letters, vol. 8,
no. 6, pp. 167-169, 2001.
[145] M. Martinez-Peiro, E. I. Boemo, and L. Wanhammar, “Design of high-speed multiplierless
filters using a nonrecursive signed common subexpression algorithm,”
IEEE Trans. Circuit Syst., vol. 49, no. 3, pp. 196-203, 2002.
[146] Fei Xu, Chip-Hong Chang, and Ching-Chuen Jong, “HWP: a new insight into
canonical signed digit,” Proc. Int. Symp. on Circuits and Systems, ISCAS '04, vol. 5,
pp. V-201-204, 2004.
[147] In-Cheol Park and Hyeong-Ju Kang, “Digital filter synthesis based on an algorithm
to generate all minimal signed digit representations,” IEEE Trans. CAD of ICs, vol.
12, no. 12, pp. 1525-1529, 2002.
[148] A.G. Dempster and M.D. Macleod, “Digital filter design using subexpression elimination
and all signed-digit representations,” Proc. Int. Symp. on Circuits and Systems,
ISCAS '04, vol. 3, pp. III-169-72, 2004.
[149] M. Kosunen and K. Halonen, “A programmable FIR filter using serial-in-time multiplication
and canonic signed digit coefficients,” The 7th IEEE Int. Conf. on Electronics,
Circuits and Systems, ICECS '00, vol. 1, pp. 563-566, 2000.
[150] K. Khoo, A. Kwentus, and A. Jr. Willson, “Programmable FIR digital filter using
CSD coefficients,” IEEE JSSC, vol. 31, no. 6, pp. 869-874, 1996.
[151] W. Oh and Y. Lee, “Implementation of programmable multiplierless FIR filters
with powers-of-two coefficients,” IEEE Trans. CAS-II, vol. 42, no. 8, pp. 553-556,
1995.
[152] J. A. Solinas, “Low-weight binary representations for pairs of integers,” Technical
Report CORR 2001-41, Center for Applied Cryptographic Research, University of
Waterloo, Canada, 2001.
[153] J. Proos, “Joint sparse forms and generating zero columns when combing,” Technical
Report CORR 2003-23, Center for Applied Cryptographic Research, University
of Waterloo, Canada, 2003.
[154]R. Katti, and Xiaoyu Ruan, “Left-to-right binary signed-digit recoding for elliptic
curve cryptography,” Proc. Int. Symp. on Circuits and Systems, ISCAS '04. vol. 2,
pp. II-365-8,2004!
[155]N. Takagi, "Multiple-valued-digit number representations in arithmetic circuit algo
rithms,” in Proc. 32th IEEE Int. Symp. Multiple-Valued Logic, pp. 224-235. May
2002 .

[156] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford,
U.K., and New York: Oxford Univ. Press, 2000.
[157] A. Azivienis, "Signed-digit number representations for fast parallel arithmetic”, in
IRE Trans. Elect. Comp., E C -10, pp. 389-400,1961.
[158] I. Choo and R.G. Deshmukh, "A novel fast parallel signed-digit hybrid multiplica
tion scheme for digital systems,” Canadian Conf. on Electrical and Computer Engi
neering, vol. 2, pp. 630-635, 2000.
[159] H. Fukuda, “Signed digit CMOS (SD-CMOS) logic circuits with static operation,”
Proc. 34th Int. Symp. on Multiple-Valued Logic, ISMVL '04, pp. 128-34,2004.
[160]T. Hanyu, A. Mochizuki, and M. Kameyama, “Multiple-valued dynamic sourcecoupled logic,” in Proc. 33th Int. Symp. Multi-Valued Logic, pp. 207-212, May
2003.
[161] A. F. Gonzalez, M. Bhattacharya, S. Kulkarni, and P. Mazumder, “Standard CMOS
implementation of a multiple-valued logic signed-digit adder based on negative
differential-resistance devices,” Proc. 30th IEEE Int. Symp. on Multiple-Valued Logic,
ISMVL '00, pp. 323-328, 2000.
[162] A. F. Gonzalez and P. Mazumder, “Multiple-valued signed digit adder using negative
differential resistance devices,” IEEE Trans. on Computers, vol. 47, no. 9,
pp. 947-959, 1998.
[163] T. Lang and J. D. Bruguera, “Multilevel reverse-carry computation for comparison
and for sign and overflow detection in addition,” Proc. Int'l Conf. Computer Design,
ICCD '99, pp. 73-79, 1999.
[164] T. Srikanthan, S. K. Lam, and Mishra Suman, “Area-time efficient sign detection
technique for binary signed-digit number system,” IEEE Trans. on Computers, vol.
53, no. 1, pp. 69-72, 2004.
[165] K. Navi, A. Kazeminejad, and D. Etiemble, “Performance of CMOS current mode full
adders,” Proc. 24th Int'l Symp. Multiple Valued Logic, pp. 27-34, 1994.
[166] I. Opris and G. Kovacs, “Analogue median circuit,” Electronics Letters, 30, 17, pp.
1369-1370, 1994.
[167] S. Kawahito, M. Kameyama, and T. Higuchi, “Multiple-valued radix-2 signed-digit
arithmetic circuits for high-performance VLSI systems,” IEEE J. Solid State Circuits,
25, 1, pp. 125-131, 1990.
[168] A. A. Avizienis, A Study of Redundant Number Representations for Parallel Digital
Computers, Ph.D. Thesis, University of Illinois, 1960.
[169] Y. Ibrahim, G. A. Jullien, and W. C. Miller, “Ultra low noise signed digit arithmetic
using cellular neural networks,” Proc. 4th Int'l Wkshp, pp. 136-142, IWSOC 2004.
[170] Y. Ibrahim, G. A. Jullien, and W. C. Miller, “Arithmetic implementation techniques
using analog cellular neural networks,” 17th IMACS World Congress on Scientific
Computation, Modelling and Applied Mathematics, IMACS '05, 2005.
[171] V. S. Dimitrov, S. Sadeghi-Emamchaie, G. A. Jullien, and W. C. Miller, “A near canonic
double-base number system with applications in DSP,” SPIE Conference on Signal
Processing Algorithms, vol. 2846, pp. 14-25, 1996.
[172] V. S. Dimitrov and G. A. Jullien, “Loading the bases: a new number representation
with applications,” invited article in IEEE Circuits and Systems Magazine, 2nd
Quarter, pp. 6-23, 2003.
[173] V. S. Dimitrov, G. A. Jullien, and W. C. Miller, “An algorithm for modular
exponentiation,” Information Processing Letters, vol. 36, no. 5, pp. 155-159, May 1998.
[174] V. S. Dimitrov, G. A. Jullien, and W. C. Miller, “Theory and applications of the double
base number system,” IEEE Trans. on Computers, vol. 48, no. 10, pp. 1098-1106,
Oct. 1999.
[175] G. A. Jullien, V. S. Dimitrov, B. Li, W. C. Miller, A. Lee, and M. Ahmadi, “A
hybrid DBNS processor for DSP computation,” Proc. Int. IEEE Symp. Circuits and
Systems, Orlando, FL, vol. 1, pp. 5-8, 1999.
[176] V. S. Dimitrov, J. Eskritt, L. Imbert, G. A. Jullien, and W. C. Miller, “The use of the
multi-dimensional logarithmic number system in DSP applications,” Proc. 15th
IEEE Symp. on Computer Arithmetic, June, pp. 247-254, 2001.
[177] Y. Ibrahim, G. A. Jullien, W. C. Miller, and V. S. Dimitrov, “DBNS arithmetic using
cellular neural networks,” IEEE Int'l Symp. on Circuits and Systems, ISCAS '05, pp.
3914-3917, 2005.
[178] Y. Ibrahim, G. A. Jullien, and W. C. Miller, “Double-base arithmetic using cellular
neural networks,” submitted to the IEEE Journal on Circuits and Systems, 2005.
[179] P. Kornerup, “Computer arithmetic: exploiting redundancy in number
representations,” ASAP95, Strasbourg.

[180] M. Ercegovac and T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2003,
ISBN 1-55860-798-6.
[181] S. Espejo, R. Dominguez-Castro, G. Linan, and A. Rodriguez-Vazquez, “A 64x64
CNN universal chip with analog and digital I/O,” in Proc. 5th IEEE Int. Conf.
Electronics, Circuits and Systems, ICECS '98, Lisbon, Portugal, pp. 203-206, 1998.
[182] Hyongsuk Kim, T. Roska, H. Son, and I. Petras, “Analog addition/subtraction on the
CNN-UM chip with short-time superimposition of input signals,” IEEE Trans.
Circuits Syst. I, vol. 50, no. 3, pp. 429-432, Mar. 2003.

Vita Auctoris

Youssef Ibrahim received the B.Sc. degree in Computer & Control
Engineering from Ain Shams University, Egypt, in 1993, the M.Eng.
degree in Computer Engineering from Cairo University, Egypt, in 1999,
and the Ph.D. degree in Electrical & Computer Engineering from the
University of Windsor, Ontario, Canada, in 2005.

In 1994, he joined the Electronics Research Institute, Egypt, as a Research Assistant. He
became a Research Associate in 1999 at the same institute. From 1995 to 2000, he held an
instructor position at the American University in Cairo.

His research interests are computer arithmetic, digital signal/image processing, and
circuits and systems theory and design. In particular, he is interested in practical and
theoretical aspects of computation structures, including parallel systems such as cellular
neural networks, with a special emphasis on the reduction of system noise. He has published
several papers in these areas.

Dr. Ibrahim is the recipient of many awards and scholarships including the 2003 Ontario
Graduate Scholarship, Ontario, Canada, the 1999 Award of Distinction in Higher Studies,
Egypt, and the 1989 Ministry of Education Medal, Egypt.


