Mixed-Signal VLSI Implementation of CVNS Artificial Neural Networks by Zamanlooy, Babak
This online database contains the full-text of PhD dissertations and Masters’ theses of University of Windsor students from 1954 forward. These
documents are made available for personal study and research purposes only, in accordance with the Canadian Copyright Act and the Creative
Commons license—CC BY-NC-ND (Attribution, Non-Commercial, No Derivative Works). Under this license, works must always be attributed to the
copyright holder (original author), cannot be used for any commercial purposes, and may not be altered. Any other use would require the permission of
the copyright holder. Students may inquire about withdrawing their dissertation and/or thesis from this database. For additional inquiries, please
contact the repository administrator via email (scholarship@uwindsor.ca) or by telephone at 519-253-3000 ext. 3208.
Recommended Citation
Zamanlooy, Babak, "Mixed-Signal VLSI Implementation of CVNS Artificial Neural Networks" (2014). Electronic Theses and
Dissertations. Paper 5097.
Mixed-Signal VLSI Implementation of CVNS
Artificial Neural Networks
by
Babak Zamanlooy
A Dissertation
Submitted to the Faculty of Graduate Studies
through the Department of Electrical and Computer Engineering
in Partial Fulfillment of the Requirements for
the Degree of Doctor of Philosophy
at the University of Windsor
Windsor, Ontario, Canada
2014
© 2014 Babak Zamanlooy
All Rights Reserved. No part of this document may be reproduced, stored or otherwise retained in
a retrieval system or transmitted in any form, on any medium by any means without prior written
permission of the author.
Mixed-Signal VLSI Implementation of CVNS Artificial Neural Networks
by
Babak Zamanlooy
APPROVED BY:
A. Y. Yi, External Examiner
University of Michigan-Dearborn
J. Urbanic
Industrial and Manufacturing Systems Engineering
M. Ahmadi
Electrical and Computer Engineering
H. Wu
Electrical and Computer Engineering
M. Mirhassani, Advisor
Electrical and Computer Engineering
May 15, 2014
Declaration of Co-Authorship / Previous
Publication
Declaration of Co-Authorship
I hereby declare that this dissertation incorporates material that is the result of research conducted
under the supervision of my supervisor, Dr. M. Mirhassani. Results related to this research are
reported in Chapters 2 through 6.
I am aware of the University of Windsor’s Senate Policy on Authorship and I certify that I
have properly acknowledged the contributions of the other researchers to my dissertation, and I
have obtained written permission from my co-author to include the aforementioned materials in my
dissertation.
I certify that, with the above qualification, this dissertation, and the research to which it refers,
is the product of my own work.
Declaration of Previous Publication
This dissertation includes three original papers that have been previously published/submitted for
publication to peer reviewed journals. It also includes two papers that are in the process of submis-
sion to peer reviewed journals. These original papers are as follows:
Dissertation chapter, publication title/full citation, and publication status:
Chapter 2: B. Zamanlooy and M. Mirhassani, “CVNS-based sigmoid function evaluation for precise neurochips.” Publication status: To be submitted.
Chapter 3: B. Zamanlooy and M. Mirhassani, “CVNS synapse multiplier for robust neurochips with on-chip learning,” submitted to IEEE Trans. VLSI Syst., April 2014, Manuscript ID: TVLSI-00143-2014. Publication status: Submitted.
Chapter 4: B. Zamanlooy and M. Mirhassani, “Mixed-Signal VLSI Neural Network Based on Continuous Valued Number System.” Publication status: To be submitted.
Chapter 5: B. Zamanlooy and M. Mirhassani, “Area-efficient robust Madaline based on continuous valued number system,” Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2014.03.029, 2014. Publication status: Accepted for publication.
Chapter 6: B. Zamanlooy and M. Mirhassani, “Efficient VLSI implementation of neural networks with hyperbolic tangent activation function,” IEEE Trans. VLSI Syst., vol. 22, no. 1, pp. 39–48, January 2014. Publication status: Published.
I certify that I have obtained a written permission from the copyright owner(s) to include the
above published material(s) in my dissertation. I certify that the above material describes work
completed during my registration as graduate student at the University of Windsor.
I declare that, to the best of my knowledge, my dissertation does not infringe upon anyone's
copyright nor violate any proprietary rights and that any ideas, techniques, quotations, or any other
material from the work of other people included in my dissertation, published or otherwise, are fully
acknowledged in accordance with the standard referencing practices. Furthermore, to the extent that
I have included copyrighted material that surpasses the bounds of fair dealing within the meaning
of the Canada Copyright Act, I certify that I have obtained a written permission from the copyright
owner(s) to include such material(s) in my dissertation.
I declare that this is a true copy of my dissertation, including any final revisions, as approved by
my dissertation committee and the Graduate Studies office, and that this dissertation has not been
submitted for a higher degree to any other University or Institution.
Abstract
In this work, a mixed-signal implementation of a Continuous Valued Number System (CVNS) neural
network is proposed. The proposed network resolves the limited signal processing precision issue
present in mixed-signal neural networks. This is realized by the CVNS addition, CVNS multiplication
and CVNS sigmoid function evaluation algorithms proposed in this dissertation. The
proposed algorithms provide accurate results in a low-resolution environment.
In addition, an area-efficient low sensitivity CVNS Madaline is proposed. The proposed Madaline
is more robust to input and weight errors when compared to the previously developed structures.
Moreover, its area consumption is lower.
Furthermore, a new approximation scheme for hyperbolic tangent activation function is proposed.
Using the proposed approximation scheme results in efficient implementation of digital ASIC neural
networks in terms of area, delay and power consumption.
Dedication
To my family.
Acknowledgments
There are several people who deserve my sincere thanks for their generous contributions to this
project. I would first like to express my sincere gratitude and appreciation to Dr. Mitra Mirhassani
for her invaluable guidance and constant support throughout the course of this work.
In addition to my advisor, I would like to thank the rest of my dissertation committee: Dr. Majid
Ahmadi and Dr. Huapeng Wu from the electrical and computer engineering department, Dr. Jill
Urbanic from the industrial and manufacturing systems engineering department, and Dr. Alex (Ya
Sha) Yi for their participation in my seminars, reviewing my dissertation, and their constructive
comments.
Also, I would like to thank Dr. Roberto Muscedere for his assistance regarding the VLSI CAD
tools and facilities used during the course of the project.
Finally, my deepest gratitude goes to my family for their love, support, and encouragement.
Contents
Declaration of Co-Authorship / Previous Publication
Abstract
Dedication
Acknowledgments
List of Figures
List of Tables
List of Abbreviations
1 Introduction
  1.1 References
2 CVNS-Based Sigmoid Function Evaluation for Precise Neurochips
  2.1 CVNS Addition in a Low Resolution Environment
  2.2 Selection of Number of Input and Output CVNS Digits
    2.2.1 Selection of Number of Input CVNS Digits
    2.2.2 Selection of Number of Output CVNS Digits
  2.3 Proposed CVNS Sigmoid Function Evaluation Scheme
  2.4 VLSI Implementation of the CVNS Sigmoid Function Evaluation
    2.4.1 Input Range Decoder
    2.4.2 Current Generator
    2.4.3 Sigmoid Approximation
    2.4.4 Output Assignment Block
  2.5 Post-layout Simulation and Comparisons
  2.6 Conclusion
  Appendix: Proof of the Proposed CVNS Function Evaluation Scheme
    CVNS Function Evaluation in the Input Range x ≥ 5
    CVNS Function Evaluation in the Input Range 2.375 ≤ x < 5
    CVNS Function Evaluation in the Input Range 1 ≤ x < 2.375
    CVNS Function Evaluation in the Input Range 0 ≤ x < 1
    CVNS Function Evaluation for Negative Input Values
  2.7 References
3 CVNS Synapse Multiplier for Robust Neurochips with On-Chip Learning
  3.1 Proposed CVNS Multiplication Algorithm in Low Resolution Environment
  3.2 VLSI Implementation of the CVNS Synapse Multiplier for Neurochips with On-Chip Learning
    3.2.1 VLSI implementation of mod 16 µA
    3.2.2 VLSI implementation of ⌊Cm−4 / 16 µA⌋
    3.2.3 VLSI implementation of the CVNS synapse multiplier
  3.3 Post-Layout Simulation
  3.4 Comparison with Previously developed CVNS multiplication algorithm
  3.5 Conclusion
  3.6 References
4 Mixed-Signal VLSI Neural Network Based on Continuous Valued Number System
  4.1 VLSI Implementation of the CVNS Neural Network
    4.1.1 Input to CVNS Converter
    4.1.2 Hidden Adaline
    4.1.3 Output Adaline
    4.1.4 Output to Binary Converter
  4.2 Simulation Results
  4.3 Conclusion
  4.4 References
5 Area-Efficient Robust Madaline Based on Continuous Valued Number System
  5.1 Introduction
  5.2 Continuous Valued Number System (CVNS)
  5.3 Noise to Signal Ratio of Previous Adalines
  5.4 Mathematical Analysis of Madaline Structures
    5.4.1 Lumped Structure
    5.4.2 Distributed Structure
    5.4.3 CVNS-RE Structure
    5.4.4 CVNS-DNN and CVNS-FDNN Structures
  5.5 Proposed Architecture
  5.6 Comparison of the Proposed Madaline Structure with Previous Architectures
    5.6.1 Comparison of the Proposed Madaline Structure with Previous Structures in the Linear Region of Stochastic Gain Function
    5.6.2 Comparison of the Proposed Madaline Structure with Previous Structures in the Nonlinear Region of Stochastic Gain Function
  5.7 VLSI Implementation and Comparisons
  5.8 Conclusion
  5.9 References
6 Efficient VLSI Implementation of Neural Networks with Hyperbolic Tangent Activation Function
  6.1 Introduction
  6.2 Proposed Approximation Scheme
    6.2.1 Output Approximation in the Pass Region
    6.2.2 Output Approximation in the Processing Region
    6.2.3 Output Approximation in the Saturation Region
  6.3 Selection of Number of Input and Output Bits
    6.3.1 Selection of Number of Input Bits
    6.3.2 Selection of Number of Output Bits
  6.4 Determining the Boundaries for Different Regions
    6.4.1 Pass Region
    6.4.2 Saturation Region
    6.4.3 Processing Region
  6.5 Proposed Structure
    6.5.1 Hyperbolic Tangent Approximation
    6.5.2 Output Assignment
  6.6 Hardware Implementation of the Hyperbolic Tangent Function and Comparison with Existing Structures
  6.7 Neural Network Implementation Using the Proposed Structure for Hyperbolic Tangent Activation Function
  6.8 Conclusion
  6.9 References
7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future work
VITA AUCTORIS
List of Figures
2.1 The block diagram of Adaline
2.2 The block diagram of the proposed CVNS-based sigmoid function evaluation structure
2.3 VLSI implementation of the current reference circuit used in the current generator block
2.4 VLSI implementation of mod 16 µA operation
2.5 VLSI implementation of the output assignment unit
2.6 The simulation results of the proposed CVNS-based sigmoid activation function
2.7 Monte Carlo simulation result of the analog neuron developed in [27] for its maximum output
2.8 Monte Carlo simulation result of the 15 µA current
3.1 VLSI implementation of mod 16 µA operation
3.2 VLSI implementation of 1 µA × ⌊Cm−4 / 16 µA⌋ operation
3.3 Block diagram of the VLSI implementation of ((z))−17
3.4 Block diagram of the VLSI implementation of ((z))−13
3.5 VLSI implementation of ((z))−9, ((z))−5, ((z))−1, ((z))3
3.6 Layout of the proposed CVNS synapse multiplier
3.7 Post-layout simulation results of the proposed CVNS synapse multiplier
3.8 NSR of the CVNS Adaline using the proposed multiplication algorithm versus the previously developed multiplication algorithm
4.1 The block diagram of the 2-2-1 CVNS network realizing the XOR function
4.2 VLSI implementation of the binary input to CVNS conversion, bi × 8 µA × 2^(i−m) and 1 µA × (Cam−ϕ cmp 16 µA)
4.3 VLSI implementation of the first layer multiplier
4.4 Block diagram of the CVNS adder
4.5 VLSI implementation of the output to binary converter
4.6 Layout of the implemented network
4.7 Layout of the chip sent for fabrication
4.8 Post-layout simulation results of the proposed CVNS network
5.1 (a) Ideal stochastic gain function and its approximation (b) Approximation error of the stochastic gain function
5.2 NSR flow diagram of an Adaline
5.3 Madaline general configuration
5.4 Block diagram of the proposed Adaline structure
5.5 (a) NSR flow diagram of the proposed Adaline (b) NSR flow diagram of the proposed CVNS-distributed Madaline
5.6 Block diagram of the proposed CVNS-distributed Madaline
5.7 Normalized Neuron×NSR improvement of different architectures compared to distributed structure
5.8 Block diagram of the 8-bit CVNS multiplier
5.9 VLSI implementation of the mod 16 µA operation
5.10 Simulation results of the mod 16 µA circuit
5.11 (a) VLSI implementation of the neuron (b) Output versus input of the implemented neuron
5.12 VLSI implementation of the RE unit
5.13 Simulation results of the RE circuit
5.14 Layout of the proposed Adaline
6.1 Different regions of hyperbolic tangent function
6.2 Hyperbolic tangent function derivative
6.3 Block diagram of the proposed structure
6.4 (a) approximation error (b) quantization error (c) total error (d) ideal and approximated output
6.5 Hardware implementation of the considered example
6.6 Three layer Madaline general configuration
6.7 Block diagram of the implemented network
6.8 Optical input patterns and their related class
List of Tables
2.1 Input range decoder truth table
2.2 Current values generated by the current generator block
2.3 Input values and their corresponding input and output CVNS digits
2.4 Comparison of different structures
3.1 Transistor sizes of the mod 16 µA circuit
3.2 Parameters of the structure shown in Fig. 3.5 for different output CVNS digits
3.3 Synapse multiplier input values and their corresponding output
4.1 Required resolution of different arithmetic units
4.2 Area, delay and power consumption of the implemented network
5.1 NSR of the previous Adalines
5.2 Total number of neurons required for different Madaline structures and their normalized value with respect to lumped structure
5.3 NSR of different structures and their normalized value with respect to lumped structure in linear region of stochastic gain function
5.4 Normalized neuron×NSR of different structures in linear region of stochastic gain function
5.5 NSR of different structures and their normalized value with respect to lumped structure in nonlinear region of stochastic gain function
5.6 Normalized neuron×NSR of different structures in nonlinear region of stochastic gain function
5.7 Transistor sizes of the mod 16 µA circuit
5.8 Transistor sizes of the neuron circuit
5.9 Transistor sizes of the RE circuit
5.10 Area consumption of different cells required for implementation of different structures
5.11 Total number of different cells required for implementation of Adalines used in different structures
5.12 Area consumption of different Adaline structures
5.13 Area consumption, NSR and area×NSR of different Madaline structures investigated in this case study
6.1 Output value for different input ranges and sub-ranges
6.2 Input range decoder
6.3 Bit-level mapping for the input range 1 < x < 2
6.4 Bit-level mapping for the input range 0.5 < x < 1
6.5 Comparison of different structures for ε = 0.04
6.6 Comparison of different structures for ε = 0.02
6.7 Comparison of network implementation
List of Abbreviations
Adaline Adaptive Linear Neuron.
ADP Area×Delay×Power.
ASIC Application Specific Integrated Circuit.
CMOS Complementary Metal-Oxide-Semiconductor.
CVNS Continuous Valued Number System.
DNN Distributed Neural Network.
FDNN Fully Distributed Neural Network.
FPGA Field Programmable Gate Array.
IEEE Institute of Electrical and Electronics Engineers.
LUT LookUp Table.
Madaline Multiple Adaline.
NSR Noise-to-Signal-Ratio.
PLAN PieceWise Linear Approximation of a Nonlinear Function.
PWL PieceWise Linear.
RALUT Range Addressable Lookup Table.
RE Reverse Evolution.
TSMC Taiwan Semiconductor Manufacturing Company.
VLSI Very Large Scale Integration.
XOR Exclusive OR.
Chapter 1
Introduction
VLSI implementation of neural networks has been exploited in various applications. Examples
include pattern recognition [1], test of analog circuits [2], real-time surface discrimination [3] and
smart sensing [4].
VLSI implementation methods of neural networks may be classified as analog, digital or mixed-
signal. In the analog neural networks, both weight storage and processing are carried out using analog
circuits. In the digital implementation of a neural network, both weight storage and processing are
conducted in the digital domain.
When implemented with analog circuits, neural networks typically achieve higher energy efficiency
and require less area and fewer interconnections than equivalent
digital implementations. However, the resolution of analog implementations is lower, and their design
and test are more challenging.
The third implementation method, mixed-signal, exploits digital registers for weight storage and
analog circuits for signal processing. This method profits from the ease of weight storage in digital
registers while capitalizing on the advantages of the analog method.
One of the main barriers to using the mixed-signal method is the limited precision of the analog
signal, which is bounded by the precision of the analog circuits used. The precision of the analog
circuits is referred to as the environment resolution in the rest of this dissertation.
To use the benefits of analog circuits while maintaining high accuracy, the Continuous
Valued Number System (CVNS) can be exploited. The CVNS is an analog number system. In this
number system, a real number is represented by a set of analog digits [5]. The information represented
by each CVNS analog digit is the same as the environment resolution. However, collectively a set of
digits can increase the analog processing precision. This makes the CVNS a suitable candidate for
implementing high precision analog and mixed-signal circuits [6–8].
The basic building blocks of an Adaline are an adder, a multiplier, and a nonlinear activation function.
To exploit the CVNS features for implementing high precision mixed-signal neural networks, CVNS
addition, CVNS sigmoid function evaluation and CVNS multiplication algorithms are proposed in
this dissertation.
The proposed CVNS addition algorithm makes CVNS addition in a low-resolution environment
feasible. Moreover, it provides the result in CVNS format as well as binary format. This
feature is exploited in the development of the proposed CVNS sigmoid function evaluation.
The proposed CVNS sigmoid function evaluation is based on piecewise linear approximation
and provides a high precision output. Furthermore, the proposed sigmoid function evaluation method
requires fewer output digits for the same maximum approximation error when compared
to the state of the art. This feature, in combination with the use of current-mode mixed-signal circuits,
results in an optimal ASIC implementation of the sigmoid function.
A new CVNS multiplication algorithm that provides accurate results in a low-resolution environment
is also proposed. In the proposed CVNS multiplication algorithm, the multiplier is in binary format while the
multiplicand is in the CVNS format. This makes the proposed multiplication algorithm suitable for
mixed-signal neural network implementations. Using the proposed multiplication algorithm, VLSI
implementation of a 16×8 CVNS synapse multiplier is realized.
Using the proposed CVNS algorithms, a 2-2-1 mixed-signal CVNS network structure is proposed.
The proposed structure realizes the two-input XOR function. The CVNS features have been used
to address the limited signal processing precision issue in mixed-signal networks. As a result, the
proposed network meets the signal processing resolution requirements of neural networks. The
implemented network has been sent for fabrication through Canadian Microelectronics Corporation (CMC).
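As an illustration of what a 2-2-1 topology computes, the following Python sketch evaluates the two-input XOR function with two hidden sigmoid neurons and one output neuron. The weights and biases are hand-picked, hypothetical values chosen only to make the example work. They are not the weights of the implemented network, and the sketch uses floating-point arithmetic rather than CVNS digits.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical parameters (not from the dissertation).
# Each tuple is (weight_1, weight_2, bias).
HIDDEN = [
    (20.0, 20.0, -10.0),   # hidden neuron 1 behaves like OR
    (-20.0, -20.0, 30.0),  # hidden neuron 2 behaves like NAND
]
OUTPUT = (20.0, 20.0, -30.0)  # output neuron behaves like AND


def adaline(inputs, params):
    """Weighted sum of the inputs plus bias, followed by the sigmoid."""
    *weights, bias = params
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_2_2_1(x1, x2):
    """Forward pass of a 2-2-1 network: two inputs, two hidden neurons, one output."""
    hidden = [adaline((x1, x2), p) for p in HIDDEN]
    return adaline(hidden, OUTPUT)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(xor_2_2_1(a, b), 4))
```

With these parameters the output is close to 1 when exactly one input is 1 and close to 0 otherwise, so rounding the output recovers the XOR truth table.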
Attaining a low Noise-to-Signal Ratio (NSR) is another major concern in VLSI implementation
of neural networks. The NSR is an indicator of the effect of input and weight errors on the network
output. A network with lower NSR is more robust against input and weight errors.
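The effect described above can be sketched numerically. The following Python example is a toy Monte Carlo estimate under stated assumptions, not the analysis used in this dissertation: it assumes that the NSR is the variance of the output error divided by the variance of the error-free output, it models only the weighted sum of a neuron (no activation function), and the function name, input and weight distributions, and error model are all hypothetical.

```python
import random
import statistics


def estimate_nsr(sigma, n_inputs=8, trials=20000, seed=0):
    """Estimate var(output error) / var(ideal output) for a weighted sum.

    Assumptions of this sketch: inputs and weights are uniform in [-1, 1],
    and each input and weight carries an additive Gaussian error with
    standard deviation `sigma`.
    """
    rng = random.Random(seed)
    ideal, errors = [], []
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        w = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        y = sum(wi * xi for wi, xi in zip(w, x))
        y_noisy = sum(
            (wi + rng.gauss(0.0, sigma)) * (xi + rng.gauss(0.0, sigma))
            for wi, xi in zip(w, x)
        )
        ideal.append(y)
        errors.append(y_noisy - y)
    return statistics.pvariance(errors) / statistics.pvariance(ideal)


if __name__ == "__main__":
    for sigma in (0.01, 0.05):
        print(sigma, estimate_nsr(sigma))
```

In this toy model the estimated ratio grows roughly with the square of the error standard deviation, which matches the intuition that a structure with a lower NSR tolerates larger input and weight errors for the same output quality.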
A new area-efficient robust mixed-signal CVNS Madaline is proposed in this work. The proposed
architecture stores the weights in digital registers while the processing is carried out using CVNS.
Using digital registers for weight storage eliminates the need for the complex analog memory units
exploited in previous CVNS neural network structures [8–10]. Moreover, the proposed network
improves upon these structures in terms of both the NSR and the number of neurons required for a specific NSR.
In the final part of this work, efficient VLSI implementation of digital neural networks is studied.
Digital neural networks are easier to design and test than analog ones. Moreover, they can provide higher resolution
when compared to the analog implementations. However, implementation of nonlinear activation
functions in digital networks is challenging.
An approximation method for digital implementation of the hyperbolic tangent activation function is
proposed in this work. The approximation is based on a mathematical analysis that takes the
maximum allowable approximation error as a design parameter. VLSI implementation of the
hyperbolic tangent activation function for maximum approximation errors of ε = 0.02 and ε = 0.04
is realized. Using the proposed approximation method, a 4-3-2 digital network is implemented. The
implemented network is capable of recognizing six different input patterns. Post-layout simulation
results show that the proposed structure results in an efficient neural network VLSI implementation
in terms of area, delay and power consumption.
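To show how a maximum allowable error can act as a design parameter, the following Python sketch computes two input boundaries for a given error ε. It assumes, as a guess based on the region names used in Chapter 6, that the pass region is where the input itself approximates tanh(x) within ε and that the saturation region is where the constant 1 approximates tanh(x) within ε. The function names and these definitions are hypothetical and may differ from the boundaries derived in Chapter 6.

```python
import math


def saturation_boundary(eps):
    """Smallest x >= 0 for which 1 - tanh(x) <= eps (assumed definition)."""
    return math.atanh(1.0 - eps)


def pass_boundary(eps, tol=1e-12):
    """Largest x >= 0 for which x - tanh(x) <= eps (assumed definition).

    x - tanh(x) increases monotonically for x >= 0, so bisection applies.
    """
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - math.tanh(mid) <= eps:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for eps in (0.02, 0.04):
        print(eps, pass_boundary(eps), saturation_boundary(eps))
```

Under these assumptions a larger ε widens both the pass region and the saturation region, leaving a narrower range of inputs that needs actual processing.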
All circuits in this work are designed, simulated, and laid out in 0.18 µm TSMC CMOS
technology using a power supply voltage of 1.8 V.
The next chapters are organized as follows. The proposed CVNS addition algorithm, CVNS
sigmoid function evaluation and its VLSI implementation are provided in Chapter 2. The pro-
posed CVNS multiplication algorithm and VLSI implementation of a 16×8 synapse multiplier are
discussed in Chapter 3. The proposed 2-2-1 mixed-signal CVNS network structure and its VLSI im-
plementation are explained in Chapter 4. The proposed area-efficient robust Madaline is discussed
in Chapter 5. Efficient VLSI implementation of digital neural networks with hyperbolic tangent
activation function is explained in Chapter 6. Finally, conclusions are drawn in Chapter 7.
1.1 References
[1] B. Zamanlooy and M. Mirhassani, “Efficient VLSI implementation of neural networks with hy-
perbolic tangent activation function,” IEEE Trans. VLSI Syst., vol. 22, no. 1, pp. 39–48, January
2014.
[2] D. Maliuk, H.-G. Stratigopoulos, and Y. Makris, “An analog VLSI multilayer perceptron and
its application towards built-in self-test in analog circuits,” in 2010 IEEE 16th International
On-Line Testing Symposium (IOLTS) , Jul. 2010, pp. 71–76.
[3] L. Gatet, H. Tap-Beteille, and M. Lescure, “Analog neural network implementation for a real-
time surface classification application,” IEEE Sensors J., vol. 8, no. 8, pp. 1413–1421, Aug.
2008.
[4] G. Zatorre, N. Medrano, M. Sanz, B. Calvo, P. Martinez, and S. Celma, “Designing adaptive
conditioning electronics for smart sensing,” IEEE Sensors J., vol. 10, no. 4, pp. 831–838, Apr.
2010.
[5] A. Saed, M. Ahmadi, and G. Jullien, “A number system with continuous valued digits and
modulo arithmetic,” IEEE Trans. Comput., vol. 51, no. 11, pp. 1294–1305, Nov. 2002.
[6] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “CVNS-based storage and refreshing
scheme for a multi-valued dynamic memory,” IEEE Trans. VLSI Syst., vol. 19, no. 8, pp. 1517–
1521, Aug. 2011.
[7] ——, “16-level CVNS memory with fast ADC,” Electronics Letters, vol. 45, no. 16, pp. 822–824,
2009.
[8] ——,“A prototype CVNS distributed neural network using synapse-neuron modules,” IEEE
Trans. Circuits Syst. I, vol. 59, no. 7, pp. 1482–1490, July 2012.
[9] ——, “Resistive-type CVNS distributed neural networks with improved noise-to-signal ratio,”
IEEE Trans. Circuits Syst. II, vol. 57, no. 10, pp. 793–797, Oct. 2010.
[10] M. Mirhassani, M. Ahmadi, and G. Jullien, “Robust low-sensitivity adaline neuron based on
continuous valued number system,” Analog Integrated Circuits and Signal Processing, vol. 56,
pp. 223–231, 2008.
Chapter 2
CVNS-Based Sigmoid Function
Evaluation for Precise Neurochips
Hardware implementation of neural networks has been used in a wide range of analog and digital
signal processing applications [1–5].
Although different activation functions can be used [6], the sigmoid and hyperbolic tangent
functions are widely used for networks trained by the back-propagation algorithm.
The sigmoid function, S(x), is an S-shaped function whose output lies in the (0,1) range. It is
evaluated using the following equation:
$$S(x) = \frac{1}{1 + e^{-x}} \tag{2.1}$$
The exponentiation and division terms present make the hardware implementation a challenging
task. Approximation methods [7–22] are used in order to overcome the problems associated with
the direct realization of the sigmoid activation function.
Neurochips with on-chip learning need the sigmoid activation function to have an input range of
(-8,8), with at least 8-bit output precision and with a maximum approximation error of 0.02 [23–26].
Digital neurons can provide the high resolution required in the neural network. The approxi-
mation methods used for digital sigmoid function evaluation may be classified as PieceWise Lin-
ear (PWL) approximation [7–13], piecewise nonlinear approximation [14–17], LookUp Table (LUT)
[18], bit-level mapping [19] and hybrid methods [20–22].
2. CVNS-BASED SIGMOID FUNCTION EVALUATION FOR PRECISE NEUROCHIPS
Generally, in PWL methods, the input range is divided into segments and a linear approximation
is used in each segment [7–13]. PWL-based hardware implementations in [7–10] require several
multipliers, which results in high area consumption and delay. In multiplierless structures [11–13],
the linear approximation coefficients are powers of two. Therefore, the multipliers are replaced with
shift registers, which decreases the area and delay significantly.
Piecewise nonlinear approximation [14–17] is similar to the PWL-based methods. The main
difference is that a nonlinear approximation is used in each segment. This method also requires
several multipliers which tend to have high area consumption and delay.
In the LUT-based method [18], the input range is divided into equal sub-ranges and the output
corresponding to each sub-range is stored in the LUT. Generally, the amount of memory
required for the LUT-based method increases exponentially as the maximum approximation error
decreases. Considering the low approximation error required for neurochips with on-chip learning,
this method is impractical for such an area-limited application.
In [19], the bit-level mapping method is implemented using purely combinational circuits. How-
ever, its input and output resolution is less than the resolution required for neurochips with on-chip
learning.
Hybrid methods [20–22] exploit a combination of previously mentioned methods to implement
the hyperbolic tangent activation function, which is slightly different from the sigmoid function. The
structures developed in [20] and [21] use a combination of PWL and LUT-based methods while the
structure developed in [22] is based on the PWL in combination with bit-level mapping. However,
as far as we know, there is no recently developed hybrid structure for sigmoid activation function
implementation.
The digital neuron methods described above provide the resolution required at the cost of more
area and power consumption. Analog neurons have lower area consumption [27–30] when compared
to the digital neurons. However, their precision is limited. Therefore, analog neurons in general
cannot meet these requirements. To use the advantages of analog circuits while keeping the accuracy
high, alternative arithmetic can be employed.
The Continuous Valued Number System (CVNS) [31] is a candidate for such application. The
CVNS is an analog number system with multiple analog digits which is suitable for implementing high
precision analog and mixed-signal circuits [32–34]. The focus of this paper is ASIC implementation
of the sigmoid function for neurochips with on-chip learning.
In this paper, a new CVNS addition algorithm is proposed. The proposed algorithm is used for
the development of an efficient CVNS-based sigmoid function evaluation. The proposed function
evaluation method is based on the PWL approximation. Furthermore, the ASIC implementation of
the proposed structure in TSMC 0.18µm technology is carried out. The proposed structure provides
an 8-bit input and output resolution and has a low maximum approximation error. Moreover, it is
area and delay efficient.
This paper is organized as follows. A new CVNS addition algorithm for low resolution environ-
ment is introduced in section 2.1. The mathematical analysis for arithmetic setup of CVNS digits is
provided in section 2.2. The proposed CVNS-based sigmoid function evaluation scheme is discussed
in section 2.3. The detailed proof of the proposed CVNS-based sigmoid function evaluation scheme
is provided in the Appendix. The proposed CVNS-based sigmoid function evaluation structure and
its VLSI implementation are explained in section 2.4. Post-layout simulation and comparison with
existing structures is carried out in section 2.5. Finally, conclusions are drawn in section 2.6.
2.1 CVNS Addition in a Low Resolution Environment
The absolute value of a real number X using fixed-point binary number system format can be shown
as follows:
$$X = \sum_{i=-N_f}^{N_i-1} x_i \times 2^{i} \tag{2.2}$$
where $N_i$ and $N_f$ are the numbers of integer and fractional digits, and $x_i$ is a binary digit.
The radix-2 CVNS digits of X can be obtained as follows [31]:
$$((X))_m = \left( X \times 2^{-m} \right) \bmod 2 \tag{2.3}$$
where m is the index of the CVNS digit, with $-N_f \le m \le N_i$, and mod 2 is a continuous
modular reduction operation.
A CVNS digit ((X))m in a limited resolution environment can be truncated to the following
equation [34]:
$$((X))_m = \sum_{i=m-\varphi+1}^{m} x_i \times 2^{i-m} \tag{2.4}$$
where ϕ is the environment resolution.
Example: Finding the CVNS digit set of X = 1010.1111 for a limited environment resolution of
four.
Using (2.4), X is mapped to the CVNS digit set ((X)) = {1.25, 0.625, 1.375, 0.875 | 1.875, 1.75, 1.5, 1}.
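The mapping in (2.4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cvns_digits` and the string-based input format are hypothetical, and binary digits below index −Nf are taken to be zero.

```python
def cvns_digits(bits_int: str, bits_frac: str, phi: int = 4) -> dict[int, float]:
    """Truncated radix-2 CVNS digits of a fixed-point binary number, eq. (2.4)."""
    n_i, n_f = len(bits_int), len(bits_frac)
    # x[i] is the binary digit of weight 2**i; missing indices count as zero.
    x = {n_i - 1 - k: int(b) for k, b in enumerate(bits_int + bits_frac)}
    digits = {}
    for m in range(n_i - 1, -n_f - 1, -1):
        # Each digit collects phi consecutive binary digits, x_m down to x_(m-phi+1).
        digits[m] = sum(x.get(i, 0) * 2.0 ** (i - m)
                        for i in range(m - phi + 1, m + 1))
    return digits


digits = cvns_digits("1010", "1111")
for m, d in digits.items():
    print(f"((X))_{m} = {d}")
```

Every value involved is a dyadic fraction, so the floating-point arithmetic here is exact.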
The radix-2 CVNS digits can be converted back to binary digits using the following equation:
$$x_m = \begin{cases} 1 & ((X))_m \ge 1 \\ 0 & ((X))_m < 1 \end{cases} \tag{2.5}$$
Example: Finding the binary digits of ((X)) = {1.5, 1.125, 0.25, 0.625, 1.25, 0.5, 1, 0}
Using (2.5), the CVNS digit set is mapped to the binary number x=11001010.
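The thresholding in (2.5) is simple enough to check directly. The helper name `cvns_to_binary` below is hypothetical and the sketch assumes the digits are listed from the highest index to the lowest.

```python
def cvns_to_binary(digits: list[float]) -> str:
    """Recover one binary digit per radix-2 CVNS digit by thresholding at 1, eq. (2.5)."""
    return "".join("1" if d >= 1 else "0" for d in digits)


bits = cvns_to_binary([1.5, 1.125, 0.25, 0.625, 1.25, 0.5, 1, 0])
print(bits)  # prints 11001010
```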
According to (2.3), the CVNS digits ((Z))m representing the addition of two numbers, X and
Y , in the CVNS format can be written in the following form:
$$((Z))_m = \left( (X + Y) \times 2^{-m} \right) \bmod 2 \tag{2.6}$$
Using (2.2), (2.6) can be written in the following form:
$$((Z))_m = \left( \sum_{i=-N_f}^{N_i-1} (x_i + y_i) \times 2^{i-m} \right) \bmod 2 \tag{2.7}$$
The relation in (2.7) can be written as summation of two terms in the following form:
$$((Z))_m = \left( \sum_{i=-N_f}^{m} (x_i + y_i) \times 2^{i-m} \right) \bmod 2 + \left( \sum_{i=m+1}^{N_i-1} (x_i + y_i) \times 2^{i-m} \right) \bmod 2 \tag{2.8}$$
Considering that all the terms in the second summation are even values and are greater than
or equal to two, applying a mod2 operation on those terms results in zero. Therefore, (2.8) can be
written in the following form:
$$((Z))_m = \left( \sum_{i=-N_f}^{m} (x_i + y_i) \times 2^{i-m} \right) \bmod 2 \tag{2.9}$$
To satisfy the limitations of the implementation environment, the method of truncation [34] can be
applied. $((Z))_m$ can be written as a sum of multiple summations, each of which has at most ϕ
terms. Therefore, $((Z))_m$ is rewritten in the following form:
$$((Z))_m = \left( \sum_{i=m-\varphi+1}^{m} (x_i + y_i) \times 2^{i-m} + 2^{-\varphi} \sum_{i=m-2\varphi+1}^{m-\varphi} (x_i + y_i) \times 2^{i-(m-\varphi)} + \dots + 2^{-n_m\varphi} \sum_{i=-N_f}^{m-n_m\varphi} (x_i + y_i) \times 2^{i-(m-n_m\varphi)} \right) \bmod 2 \tag{2.10}$$
The value of $n_m$ can be found based on the following condition:
$$m - n_m\varphi \le -N_f + \varphi - 1 \tag{2.11}$$
which results in the following equation:
$$n_m = \left\lceil \frac{m + N_f - \varphi + 1}{\varphi} \right\rceil \tag{2.12}$$
where ⌈·⌉ is the ceiling function.
One of the most fundamental properties of the CVNS is that the digits have information overlap
with each other. This means that a higher index digit can be constructed partially from lower index
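digits. Repeating the same process, a lower index digit of the summation, $((Z))_{m-\varphi}$, can be
written as follows:
$$((Z))_{m-\varphi} = C_{m-\varphi} \bmod 2 = \left( \sum_{i=m-2\varphi+1}^{m-\varphi} (x_i + y_i) \times 2^{i-(m-\varphi)} + \dots + 2^{-(n_m-1)\varphi} \sum_{i=-N_f}^{m-n_m\varphi} (x_i + y_i) \times 2^{i-(m-n_m\varphi)} \right) \bmod 2 \tag{2.13}$$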
Using (2.13), (2.10) can be modified as follows:
$$((Z))_m = \left( \sum_{i=m-\varphi+1}^{m} (x_i + y_i) \times 2^{i-m} + 2^{-\varphi} C_{m-\varphi} \right) \bmod 2 \tag{2.14}$$
To satisfy the requirement of the limited resolution environment, only the terms contributed by
$((Z))_{m-\varphi}$ that are greater than or equal to $2^{-(\varphi-1)}$ are considered. Therefore,
by applying the principle of the truncation method [34], (2.14) can be written in the following form:
$$((Z))_m = \left[ ((X))_m + ((Y))_m + 2^{-(\varphi-1)} \left\lfloor \frac{C_{m-\varphi}}{2} \right\rfloor \right] \bmod 2 \tag{2.15}$$
where ⌊·⌋ is the floor function and $((X))_m$ and $((Y))_m$ are based on (2.4).
Equation (2.15) shows that the CVNS addition in a low-resolution environment is feasible. This
feature is used in the development of the proposed CVNS function evaluation in the next sections.
Considering the fact that the CVNS digits share information, a full CVNS digit set is not required.
The remaining digits can always be obtained from the reduced digit set.
Any higher index digit can provide information of up to ϕ digits as follows:
$$((Z))_{m-1} = 2\left[ ((Z))_m \bmod 1 \right] + \left\lfloor 2^{-(\varphi-1)} \times ((Z))_{m-(\varphi+1)} \right\rfloor \tag{2.16}$$
Since $((Z))_m < 2$ and using (2.5), the term $((Z))_m \bmod 1$ can be modified as follows:
$$((Z))_m \bmod 1 = ((Z))_m - \left\lfloor ((Z))_m \right\rfloor \tag{2.17}$$
where $\lfloor ((Z))_m \rfloor = z_m$ and $z_m$ is the m-th binary digit corresponding to the CVNS digit $((Z))_m$.
Thus, the CVNS addition algorithm provides the addition result in CVNS format as well as
binary format. To clarify the proposed CVNS-based addition algorithm, an example is provided.
Example: Finding the addition result of two CVNS numbers $((X)) = \{((X))_3 = 1.25, ((X))_{-1} = 1\}$
and $((Y)) = \{((Y))_3 = 0.5, ((Y))_{-1} = 1.5\}$.
The addition result is obtained in CVNS format. The addition result, denoted by ((Z)), consists of
the digit set $((Z)) = \{((Z))_3, ((Z))_2, ((Z))_1, ((Z))_0, ((Z))_{-1}, ((Z))_{-2}, ((Z))_{-3}, ((Z))_{-4}\}$.
The CVNS digits $((Z))_3$ and $((Z))_{-1}$ are obtained first as previously explained, and are equal
to 1.875 and 0.5. Afterwards, the remaining CVNS digits are generated. The result is
((Z)) = {1.875, 1.75, 1.625, 1.25, 0.5, 1, 0, 0}. Furthermore, the proposed CVNS addition
algorithm provides the addition result in binary format as well, which is Z = 1111.0100.
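The worked example can be reproduced with a short Python sketch. It is an illustration rather than the circuit-level algorithm. The helper names are hypothetical, and the carry ⌊C/2⌋ from a lower digit is applied to the next stored digit at weight 2^−(ϕ−1), which is the weight that yields ((Z))3 = 1.875 and Z = 1111.0100 for this example. The binary digits are peeled off each stored digit by repeated flooring and doubling, in the spirit of (2.5) and (2.17), and the full digit set is then rebuilt with (2.4).

```python
import math

PHI = 4  # environment resolution: binary digits of information per CVNS digit


def cvns_add(x: dict[int, float], y: dict[int, float], phi: int = PHI) -> dict[int, float]:
    """Add two reduced CVNS digit sets, lowest index first, propagating the carry."""
    z, carry = {}, 0
    for m in sorted(x):
        c = x[m] + y[m] + carry * 2.0 ** -(phi - 1)
        z[m] = c % 2               # continuous mod-2 reduction
        carry = math.floor(c / 2)  # overflow passed on to the next stored digit
    return z


def digit_to_bits(d: float, phi: int = PHI) -> str:
    """Extract the phi binary digits carried by one CVNS digit."""
    bits = ""
    for _ in range(phi):
        b = math.floor(d)          # thresholding at 1
        bits += str(b)
        d = 2 * (d - b)            # shift the remaining fraction up by one position
    return bits


def all_digits(bits: str, phi: int = PHI) -> list[float]:
    """Rebuild every CVNS digit from the binary string using eq. (2.4)."""
    b = [int(c) for c in bits if c != "."]
    return [sum(b[k + j] * 2.0 ** -j for j in range(phi) if k + j < len(b))
            for k in range(len(b))]


z = cvns_add({3: 1.25, -1: 1.0}, {3: 0.5, -1: 1.5})
z_bits = digit_to_bits(z[3]) + "." + digit_to_bits(z[-1])
print(z[3], z[-1], z_bits)
print(all_digits(z_bits))
```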
The basic element of a neural network is the Adaline. The block diagram of an Adaline is shown in
Fig. 2.1. As can be seen from Fig. 2.1, the addition results are the inputs to the nonlinear activation
function. Therefore, the inputs to the CVNS-based sigmoid function are available both in binary
and CVNS format. This is exploited in the proposed function evaluation method.
2.2 Selection of Number of Input and Output CVNS Digits
To develop a CVNS sigmoid function evaluation, the optimum number of input and output CVNS
digits should be determined. In this section, a mathematical analysis is presented which addresses
this need.
Since each CVNS digit has the information equal to ϕ bits, the number of required input and
output CVNS digits depends on the range of representation, maximum approximation error, and
the environment resolution.
2.2.1 Selection of Number of Input CVNS Digits
According to (2.2), the maximum value representable by a number in binary format is $2^{N_i}$. Therefore,
to cover the input range, it is required to have:
$$2^{N_i} \ge r \tag{2.18}$$
where r is the input range.
In comparison, the number of fractional bits is determined by the maximum approximation
error. It should be noted that $S(x_1)$ can be used as the approximation of the sigmoid over the
input interval between two consecutive points $x_1$ and $x_2$, with an error lower than the maximum
approximation error, provided that the following condition is satisfied:
$$S(x_2) - S(x_1) \le \epsilon \tag{2.19}$$
where S(x) is defined in (2.1) and ε is the maximum allowable approximation error.
Figure 2.1: The block diagram of Adaline
The sigmoid change between two consecutive inputs is proportional to the sigmoid derivative.
The maximum derivative of the sigmoid function occurs in the region close to the origin. In this region,
the output of the sigmoid activation function can be approximated using the first-order Taylor series as follows:
$$S(x) \approx \frac{1}{2} + \frac{x}{4}, \quad x \to 0 \tag{2.20}$$
Based on (2.20), the change in the sigmoid output is approximately equal to the change in its input
divided by four in this region and therefore, (2.19) can be simplified as follows:
$$\frac{x_2 - x_1}{4} \le \epsilon \tag{2.21}$$
The number of bits used for representing the fractional part of the input determines the difference
between two consecutive points in the input, which is equal to $2^{-N_f}$. The relation between $N_f$ and
the maximum approximation error can be obtained as follows:
$$2^{-N_f} \le 4\epsilon \tag{2.22}$$
Using (2.18) and (2.22), the number of input bits, $N_{in}$, required for representation of a signed input
is as follows:
$$N_{in} = N_i + N_f + 1 = \left\lceil \frac{\ln r}{\ln 2} \right\rceil + \left\lceil \frac{-\ln 4\epsilon}{\ln 2} \right\rceil + 1 \tag{2.23}$$
To satisfy the requirements of neurochips with on-chip learning [23–26], the value of Ni is equal to 3 for an
input range of (-8,8). The number of fractional bits, Nf, for a maximum error of 0.02 is equal to 4.
Therefore, the total number of input bits for the CVNS activation function is 8.
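The bit counts quoted above follow directly from (2.18), (2.22) and (2.23). A minimal Python check, with the hypothetical helper name `input_bits`, is:

```python
import math


def input_bits(r: float, eps: float) -> tuple[int, int, int]:
    """Return (N_i, N_f, N_in) for input range r and maximum error eps."""
    n_i = math.ceil(math.log2(r))          # smallest N_i with 2**N_i >= r, eq. (2.18)
    n_f = math.ceil(-math.log2(4 * eps))   # smallest N_f with 2**-N_f <= 4*eps, eq. (2.22)
    return n_i, n_f, n_i + n_f + 1         # one extra bit for the sign, eq. (2.23)


print(input_bits(8, 0.02))  # prints (3, 4, 8)
```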
2.2.2 Selection of Number of Output CVNS Digits
The output of the sigmoid function is in the range of (0,1) and can be shown in the following form:
$$y = \sum_{k=-N_{out}}^{-1} y_k \times 2^{k} \tag{2.24}$$
where $N_{out}$ is the number of bits required to represent the output.
Based on the setup discussed in section 2.1, each CVNS digit contains the information corre-
sponding to ϕ bits. Therefore, a reduced number of digits is required for the output of CVNS
sigmoid function. This can be obtained as follows:
$$N_{outc} = \frac{N_{out}}{\varphi} \tag{2.25}$$
where $N_{outc}$ is the number of output digits in CVNS format.
An 8-bit output resolution is required for neurochips with on-chip learning [23]. The simulation
results of [32–34] show that implementation of CVNS in TSMC 0.18 µm and 90 nm with ϕ = 4 is
feasible. Since the same technology is used in this paper, the same value of ϕ is used. Therefore,
$N_{out} = 8$ and, with ϕ = 4, $N_{outc} = 2$. Accordingly, the CVNS digit set $((y)) = \{((y))_{-1}, ((y))_{-5}\}$ is used
for representing the sigmoid activation function output. It is worth noting that $((y))_{-1}$ contains
the information of $y_{-1}$ to $y_{-4}$ while $((y))_{-5}$ contains the information of $y_{-5}$ to $y_{-8}$.
2.3 Proposed CVNS Sigmoid Function Evaluation Scheme
In this section, the proposed CVNS sigmoid function evaluation is discussed. The number of input
and output digits of the proposed CVNS function are determined based on the analysis performed
in section 2.2. Also, the input to the CVNS sigmoid function coming from the adder is available in
both binary and CVNS format.
Since multiplierless PWL-based methods [11–13] are the most efficient structures, a similar ap-
proximation methodology is adopted in CVNS. The method developed in [13] has a lower number of
input regions and a constant number of bit shifts in each region. This leads to a less complex CVNS
sigmoid function. Therefore, the approximation developed in [13] is used for the CVNS sigmoid
function evaluation.
The proposed CVNS-based function evaluation scheme is explained and proved in detail in the
Appendix. Based on the arithmetic setup conducted in section 2.2, the CVNS digits ((y))−1 and
((y))−5 can represent the output of CVNS sigmoid function. The output CVNS digits in each region
are summarized as shown in the following equation:
$$((y)) = \begin{cases}
((y))_{-1_1} = 1.875, \quad ((y))_{-5_1} = 1.875 & x \ge 5 \\
((y))_{-1_2} = \dfrac{((x))_2 \;\mathrm{cmp}\; 0.75}{8} + 1.75, \quad ((y))_{-5_2} = \left( ((x))_0 + 1 \right) \bmod 2 & 2.375 \le x < 5 \\
((y))_{-1_3} = ((x))_2 + 1.25, \quad ((y))_{-5_3} = ((x))_{-2} & 1 \le x < 2.375 \\
((y))_{-1_4} = ((x))_1 + 1, \quad ((y))_{-5_4} = ((x))_{-3} & 0 \le x < 1 \\
((y))_{-1_x} = 1.875 - ((y))_{-1_{|x|}}, \quad ((y))_{-5_x} = 1.875 - ((y))_{-5_{|x|}} & x < 0
\end{cases} \tag{2.26}$$
It should be noted that $((y))_{i_j}$ is used as an indicator of the output CVNS digits in each region,
where i and j represent the CVNS digit and the region index, respectively. Also, $((x))_2 \;\mathrm{cmp}\; 0.75$ is
a comparator operator and is evaluated as follows:
$$((x))_2 \;\mathrm{cmp}\; 0.75 = \begin{cases} 1 & ((x))_2 \ge 0.75 \\ 0 & ((x))_2 < 0.75 \end{cases} \tag{2.27}$$
The mathematical derivation provides the output CVNS digits ((y))−1 and ((y))−5 in all four
different input regions as well as negative input values.
The main arithmetic operation performed in (2.26) is addition, which considering the analog
nature of CVNS digits in combination with current-mode realization, results in an efficient VLSI
implementation of the proposed CVNS-based sigmoid activation function. Moreover, as the inputs
to the CVNS-based sigmoid function unit are available in both CVNS and binary format, there is
no need for an input to CVNS format conversion.
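As a numerical illustration of (2.26), the following Python sketch evaluates the two output digits for non-negative inputs on the 4-bit fractional grid and compares the result with the exact sigmoid. Several points are assumptions made for the sketch and not statements from the text: the function names are hypothetical, the input is passed as an integer k with x = k/16, negative inputs are left out, and the output value is reassembled as y = ((y))−1/2 + ((y))−5/32, which follows from (2.24) when the two digits carry y−1 to y−4 and y−5 to y−8. The error it prints depends on these choices and need not coincide with the figure reported in Table 2.4.

```python
import math

PHI = 4  # environment resolution


def digit(bits: dict[int, int], m: int) -> float:
    """Truncated CVNS digit ((x))_m built from binary digits, eq. (2.4)."""
    return sum(bits.get(i, 0) * 2.0 ** (i - m) for i in range(m - PHI + 1, m + 1))


def cvns_sigmoid(k: int) -> float:
    """Evaluate (2.26) for the non-negative input x = k/16, with 0 <= k < 128."""
    x = k / 16
    bits = {i: (k >> (i + 4)) & 1 for i in range(-4, 3)}  # x_2 down to x_-4
    if x >= 5:
        y1, y5 = 1.875, 1.875
    elif x >= 2.375:
        y1 = (1 if digit(bits, 2) >= 0.75 else 0) / 8 + 1.75
        y5 = (digit(bits, 0) + 1) % 2
    elif x >= 1:
        y1, y5 = digit(bits, 2) + 1.25, digit(bits, -2)
    else:
        y1, y5 = digit(bits, 1) + 1, digit(bits, -3)
    # Assumed reassembly: ((y))_-1 has weight 2**-1 and ((y))_-5 has weight 2**-5.
    return y1 / 2 + y5 / 32


max_err = max(abs(cvns_sigmoid(k) - 1 / (1 + math.exp(-k / 16))) for k in range(128))
print(f"max |error| for 0 <= x < 8 on the input grid: {max_err:.4f}")
```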
2.4 VLSI Implementation of the CVNS Sigmoid Function
Evaluation
In this section, the proposed structure and the VLSI implementation of the CVNS-based sigmoid
function are discussed. Since addition is easily performed in current-mode circuits, the VLSI
implementation of the proposed CVNS function evaluation is carried out using current-mode circuits.
The block diagram of the proposed CVNS-based sigmoid function evaluation structure is shown
in Fig. 2.2. It is composed of four main units including the input range decoder, current generator,
sigmoid approximation and output assignment.
The input range decoder detects the input range while the current generator provides the required
signals for the sigmoid approximation and output assignment units. Since there are four input
regions, the sigmoid approximation unit is constituted of four sub-units. It should be noted that the
sub-units are enabled by the en signal, which is activated by the input range decoder output signals,
r1 to r4.
The output assignment unit assigns the output based on the input sign. If the input is positive,
it passes its input without change. Otherwise, a subtraction is performed. Operation of each block
is explained in detail next.
2.4.1 Input Range Decoder
Since there are four different input ranges, four signals r1, r2, r3 and r4 are used to decode the input
range. The truth table of the input range decoder is shown in Table 2.1.
In each region, only the associated signal becomes active. These signals can be generated based
on the binary inputs to the CVNS sigmoid function structure. Using the binary inputs, the r1 to r4
signals can be generated based on the following logic equations.
$$r_1 = \overline{x_2 \wedge (x_1 \vee x_0)} \tag{2.28}$$
$$r_2 = \overline{r_1 \wedge (x_2 \vee (x_1 \wedge (x_0 \vee (x_{-1} \vee x_{-2} \vee (x_{-3} \wedge x_{-4})))))} \tag{2.29}$$
$$r_3 = \overline{r_1 \wedge r_2 \wedge (x_0 \vee x_1 \vee x_2)} \tag{2.30}$$
$$r_4 = \overline{r_1 \wedge r_2 \wedge r_3} \tag{2.31}$$
Figure 2.2: The block diagram of the proposed CVNS-based sigmoid function evaluation structure
Table 2.1: Input range decoder truth table
Input range      r1  r2  r3  r4
x ≥ 5            0   1   1   1
2.375 ≤ x < 5    1   0   1   1
1 ≤ x < 2.375    1   1   0   1
0 ≤ x < 1        1   1   1   0
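The behaviour listed in Table 2.1 can be sketched at the behavioural level in Python. The function name `range_decoder` is hypothetical, and the sketch works from the numeric range boundaries rather than from the gate-level expressions. Table 2.1 lists the selected signal as 0, so the signals are treated as active-low here.

```python
def range_decoder(x: float) -> tuple[int, ...]:
    """Active-low (r1, r2, r3, r4) for a non-negative input, following Table 2.1."""
    in_region = (x >= 5, 2.375 <= x < 5, 1 <= x < 2.375, 0 <= x < 1)
    return tuple(0 if hit else 1 for hit in in_region)


for x in (7.9375, 3.625, 1.875, 0.875):
    print(x, range_decoder(x))
```

Exactly one of the four signals is 0 for any input in [0, 8), which is the property the sub-unit enable logic relies on.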
In the implemented hardware, a current of 8 µA represents the value 1. Therefore, all the constant values are
multiplied by 8 µA. Thus, (2.26) can be written in the following form:
$$((y)) = \begin{cases}
((y))_{-1_1} = 15\,\mu A, \quad ((y))_{-5_1} = 15\,\mu A & r_1 = 0 \\
((y))_{-1_2} = \left( ((x))_2 \;\mathrm{cmp}\; 6\,\mu A + 14 \right) \mu A, \quad ((y))_{-5_2} = \left( ((x))_0 + 8\,\mu A \right) \bmod 16\,\mu A & r_2 = 0 \\
((y))_{-1_3} = ((x))_2 + 10\,\mu A, \quad ((y))_{-5_3} = ((x))_{-2} & r_3 = 0 \\
((y))_{-1_4} = ((x))_1 + 8\,\mu A, \quad ((y))_{-5_4} = ((x))_{-3} & r_4 = 0 \\
((y))_{-1_x} = 15\,\mu A - ((y))_{-1_{|x|}}, \quad ((y))_{-5_x} = 15\,\mu A - ((y))_{-5_{|x|}} & x_3 = 1
\end{cases} \tag{2.32}$$
2.4.2 Current Generator
To evaluate the sigmoid function using the relations developed in (2.32), various constant current
values are required.
The main building block of the current generator unit is shown in Fig. 2.3. Transistors M1 to
M3 generate a 5 µA reference current. By proper sizing of the transistors, an 11 µA current is copied
to the transistors M4 to M13. The 11 µA current copied to the transistors M4 and M5 is used for
generation of the current signal required for the output assignment unit. The current is copied to
the transistors M6 to M13 provided that the corresponding ri signal is active. Therefore, depending
on the input range, the respective 11 µA current will be generated. Hence, the current copied to
the transistors M6 to M13 is exploited to generate the constant current signals required for the four
sub-units of the sigmoid approximation unit. Generation of different current values is conducted
using basic current mirrors. The current values generated by the current generator block for different
input ranges are summarized in Table 2.2. It should be noted that the 7.5 µA generated in all input
ranges is used by the output assignment unit.
Figure 2.3: VLSI implementation of the current reference circuit used in the current generator block
2.4.3 Sigmoid Approximation
In this section, the input processing sub-units for different input sub-ranges are explained.
Output Approximation for Input Range r1
In the input range r1, both output CVNS digits have a constant value of 15 µA. The 15 µA is
generated by the current generator block and is passed to the output provided that the r1 signal is
active.
Output Approximation for Input Range r2
The input processing circuit for the input range r2 is implemented based on (2.32).
Table 2.2: Current values generated by the current generator block
Input range      Generated current
x ≥ 5            7.5 µA, 15 µA
2.375 ≤ x < 5    1 µA, 7.5 µA, 8 µA, 14 µA, 16 µA
1 ≤ x < 2.375    7.5 µA, 10 µA
0 ≤ x < 1        7.5 µA, 8 µA
To evaluate $((x))_2 \;\mathrm{cmp}\; 6\,\mu A$, the 1 µA current generated by the current generator block is passed to
the output of this sub-unit provided that $((x))_2$ is greater than or equal to 6 µA. Within this input
range, $((x))_2 \ge 6\,\mu A$ holds whenever x0 or x2 is one, so the condition can be expressed by the
logic expression x2 ∨ x0 and implemented using an OR gate. Therefore, $((x))_2 \;\mathrm{cmp}\; 6\,\mu A$ is
implemented by a transistor receiving the 1 µA from the current generator block and controlled by the
x2 ∨ x0 signal.
The $((y))_{-1_2}$ output has two terms whose sum produces the output. Therefore, the
output of the $((x))_2 \;\mathrm{cmp}\; 6\,\mu A$ evaluation is wired with the 14 µA current generated by the current
generator block.
To generate the $((y))_{-5_2}$ CVNS output digit, the 8 µA current generated by the current generator
block is wired with the $((x))_0$ input and the mod 16 µA operation is applied. Using the basic
properties of the mod 2 operation, the mod 2 block is designed as follows:
$$\left( ((x))_0 + 8\,\mu A \right) \bmod 16\,\mu A = \begin{cases} ((x))_0 + 8\,\mu A & ((x))_0 < 8\,\mu A \\ ((x))_0 - 8\,\mu A & ((x))_0 \ge 8\,\mu A \end{cases} \tag{2.33}$$
The VLSI implementation of the mod 16 µA operation is shown in Fig. 2.4. The current
subtractor block is enabled by the signal mod2_en applied to the transistor M1. Considering that
$((x))_0 \ge 8\,\mu A$ is equivalent to x0 = 1, the mod2_en signal is generated using the following logic
expression:
$$mod2\_en = r_2 \wedge x_0 \tag{2.34}$$
When the mod2_en signal becomes active, the 16 µA current generated by the current generator
block is subtracted from the current through the transistor M3, which is equal to the input current to
the mod 16 µA block. Then, the subtracted current is delivered to the transistor M4 and copied
to the transistor M5.
Figure 2.4: VLSI implementation of mod 16 µA operation
Output Approximation for Input Range r3
The $((y))_{-1_3}$ output CVNS digit in the input range r3 is generated by wiring the 10 µA current
generated by the current generator block with the input $((x))_2$.
The $((y))_{-5_3}$ digit is equal to the input $((x))_{-2}$ provided that the input is in the input range r3. This
is implemented using a transistor as a switch controlled by the r3 output of the input range decoder,
which turns on when the input is in the input range r3.
Output Approximation for Input Range r4
The $((y))_{-1_4}$ output CVNS digit in the input range r4 is generated by wiring the 8 µA current
generated by the current generator block with the input $((x))_1$. The $((y))_{-5_4}$ digit is equal to the input
$((x))_{-3}$ provided that the input is within the input range r4. This is implemented using a transistor as a
switch controlled by the r4 output of the input range decoder.
2.4.4 Output Assignment Block
The output assignment block assigns the output based on the input sign. If the input is positive, the
output of the sigmoid approximation unit is directly passed to the output; otherwise, it is subtracted
from 15 µA. The output assignment unit circuit is shown in Fig. 2.5. The transistor M6 acts as a
switch which turns on provided that the input sign is positive, passing the input to the output.
Figure 2.5: VLSI implementation of the output assignment unit
It should be noted that the input sign may be checked through the x3 input bit. If the input sign
is negative, the 7.5 µA current generated by the current generator block is copied to the transistor
M5. Considering that the W/L ratio of the transistor M3 is twice that of the transistor M5, 15 µA will be
copied to the transistor M3. The transistors M1 to M3 act as a current subtractor, subtracting the
input from 15 µA. The subtracted current is copied to the transistor M4, generating the output.
2.5 Post-layout Simulation and Comparisons
The proposed structure exploits the CVNS features which make the implementation of high precision
analog circuits feasible. Moreover, using CVNS, the proposed structure requires a lower number of
output digits, which decreases the area consumption. Furthermore, current-mode circuits are used to
realize the CVNS sigmoid function. Considering that the main arithmetic operation in the proposed
CVNS-based sigmoid activation function is addition, the low area and high speed addition offered by
current-mode circuits results in lower area consumption and higher speed.
Figure 2.6: The simulation results of the proposed CVNS-based sigmoid activation function
The performance of the VLSI implementation of the proposed CVNS sigmoid function evaluation
structure is verified by conducting post-layout simulations for various inputs within the input range
of (-8,8).
Fig. 2.6 shows the post-layout simulation results for four input values of 0.875, 1.875, 3.625 and
7.9375, which lie in the input ranges r4, r3, r2 and r1, respectively. These inputs are applied
with intervals of 10 ns to the sigmoid function evaluation circuit. The CVNS values corresponding
to these four inputs are shown in Table 2.3. Using (2.32), the expected output for each input can be
calculated. The relation between ((y))−1 and ((y))−5 with input CVNS digits and the expected value
of ((y))−1 and ((y))−5 for each input is summarized in Table 2.3. The post-layout simulation results
shown in Fig. 2.6 are in agreement with the mathematical derivations presented in Table 2.3.
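The expected output currents for the four test inputs can be recomputed from (2.32) with a short Python sketch. The function names are hypothetical, the 8 µA unit current is taken from the text, and the input digits are generated with (2.4) rather than read from a table.

```python
UNIT_UA = 8  # current that represents the value 1, in µA
PHI = 4      # environment resolution


def digit_ua(k: int, m: int) -> float:
    """CVNS digit ((x))_m of the input x = k/16, expressed as a current in µA."""
    def bit(i: int) -> int:
        return (k >> (i + 4)) & 1 if -4 <= i <= 3 else 0
    return UNIT_UA * sum(bit(i) * 2.0 ** (i - m) for i in range(m - PHI + 1, m + 1))


def expected_outputs_ua(x: float) -> tuple[float, float]:
    """Expected (((y))_-1, ((y))_-5) in µA for a non-negative input, eq. (2.32)."""
    k = round(x * 16)
    if x >= 5:
        return 15.0, 15.0
    if x >= 2.375:
        cmp_ua = 1.0 if digit_ua(k, 2) >= 6 else 0.0   # ((x))_2 cmp 6 µA
        return cmp_ua + 14, (digit_ua(k, 0) + 8) % 16  # mod 16 µA wrap-around
    if x >= 1:
        return digit_ua(k, 2) + 10, digit_ua(k, -2)
    return digit_ua(k, 1) + 8, digit_ua(k, -3)


for x in (0.875, 1.875, 3.625, 7.9375):
    print(x, expected_outputs_ua(x))
```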
To compare the efficiency of the proposed structure with the state of the art, the structures developed
in [11–13], [15, 16] and [19] are coded using the Verilog hardware description language and
implemented in TSMC 0.18µm technology. To the best of our knowledge, the multiplierless PWL-
based methods developed in [12] and [13] are the best previously developed solutions to the ASIC
implementation of sigmoid activation function for neurochips with on-chip learning.
It should be noted that since the VLSI implementation of the proposed structure is conducted
using analog circuits, the proposed structure is laid out manually while the layout of the digital
structures is generated automatically. The post-layout area, delay and power consumption results
are summarized in Table 2.4.
Table 2.3: Input values and their corresponding input and output CVNS digits (all digit values in µA)
x       ((x))3  ((x))2  ((x))1  ((x))0  ((x))−1  ((x))−2  ((x))−3  ((x))−4  ((y))−1                                ((y))−5
0.875   0       1       3       7       14       12       8        0        ((x))1 + 8 µA = 11 µA                  ((x))−3 = 8 µA
1.875   1       3       7       15      14       12       8        0        ((x))2 + 10 µA = 13 µA                 ((x))−2 = 12 µA
3.625   3       7       14      13      10       4        8        0        (((x))2 cmp 6 µA + 14 µA) = 15 µA      (((x))0 + 8 µA) mod 16 µA = 5 µA
7.9375  7       15      15      15      15       14       9        8        15 µA                                  15 µA
Table 2.4: Comparison of different structures
Structure          Input range  Nin  Nout  Maximum error  Area (µm²)   Delay (ns)  Power (µW)  ADP (µm² × ns × µW)
Bharkhada [16]     (-8,8)       17   16    0.0001         352 336.86   64.87       4 109.30    93 922 539 300.22
Zhang [15]         (-4,4)       14   14    0.0216         5 416.22     6.66        866.34      31 250 638.31
Tommiska [19]      (-8,8)       7    7     0.0708         1 404.95     1.88        368.82      974 166.47
Vassiliadis [11]   (-4,4)       14   10    0.0240         1 804.32     2.07        330.30      1 233 651.48
Alippi [12]        (-8,8)       12   8     0.0189         1 745.96     2.58        518.94      2 337 605.09
PLAN [13]          (-8,8)       9    8     0.0189         1 347.23     3.71        268.20      1 340 523.49
Proposed           (-8,8)       16   2¹    0.0189         965.37       2.53        192.63      470 476.83
Comparisons are made with the state of the art, where the goal is to satisfy the requirements of
neural networks with on-chip learning as required by [23–26]. These include the input range, output
resolution and maximum approximation error. Since all structures have been re-implemented, area,
delay, and power reports are also provided for a complete comparison.
¹ Nout for the proposed structure indicates the number of output CVNS digits as given by (2.25).
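The ADP column of Table 2.4 can be cross-checked with a few lines of Python, assuming that ADP denotes the plain product of area, delay and power, as the column unit µm² × ns × µW suggests. The dictionary below simply restates the area, delay and power entries of the table.

```python
# (area in µm², delay in ns, power in µW) as listed in Table 2.4
table_2_4 = {
    "Bharkhada [16]":   (352336.86, 64.87, 4109.30),
    "Zhang [15]":       (5416.22, 6.66, 866.34),
    "Tommiska [19]":    (1404.95, 1.88, 368.82),
    "Vassiliadis [11]": (1804.32, 2.07, 330.30),
    "Alippi [12]":      (1745.96, 2.58, 518.94),
    "PLAN [13]":        (1347.23, 3.71, 268.20),
    "Proposed":         (965.37, 2.53, 192.63),
}

# Area-delay-power product for each structure (assumed definition of ADP)
adp = {name: area * delay * power for name, (area, delay, power) in table_2_4.items()}

for name, value in sorted(adp.items(), key=lambda item: item[1]):
    print(f"{name:18s} ADP = {value:,.2f}")

best = min(adp, key=adp.get)
print("lowest ADP:", best)
```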
The Nin and Nout columns in Table 2.4 indicate the number of input and output digits of each
sigmoid function evaluation structure. For the proposed work, there are 8 binary and 8 CVNS input
digits. Following the mathematical setup in section 2.2, the proposed structure requires only two
output CVNS digits.
As may be noted from Table 2.4, different methods use different numbers of input and output
digits. For example, the number of input and output bits in [19] is lower than the minimum number
of bits required for on-chip neural networks [23]. All of the other structures satisfy the requirement
of the minimum number of input and output digits.
The structure developed in [16] meets the requirements of on-chip neural networks in terms of
input range, output resolution and maximum error. This work is based on a third order nonlinear
approximation and is more appropriate for the FPGA platform. This structure has the lowest
maximum error at the cost of extremely high area, delay and power consumption. The reported
delays are the critical path delay of the corresponding structures.
The structure reported in [15] performs the sigmoid function approximation based on a second-
order nonlinear approximation. This method does not meet the input range requirement. Moreover,
its area and delay are the second highest among the reported works.
The work by Tommiska [19] exploits the bit-level mapping method and has the
lowest delay. However, it fails to satisfy the resolution requirement. In addition, it has the highest
approximation error among all structures.
The structure reported in [11] has a delay of 2.07 ns. However, its input range is limited to
(-4,4) and its maximum approximation error (0.0240) is higher than that of the PWL-based
structures in [12] and [13] (0.0189).
Both works reported in [12] and [13] are based on multiplierless PWL method and provide the
same maximum approximation error.
The proposed CVNS structure satisfies all the requirements of the application including max-
imum error, number of digits and input range. The maximum number of arithmetic operations
for the proposed structure occurs for the inputs in the input range r2, which includes addition,
mod2 operation and the output assignment. The proposed structure has the lowest area and power
consumption. The delay of the developed structure is 2.53 ns.
The power measurement is performed through post-layout simulation for all structures reported
in Table 2.4. The input rate for the proposed structure and the structures developed in [11–13]
and [19] is set to a common speed at which all of these structures can operate; the input period
for these structures is set to 3.71 ns. Since the structures developed in [15] and [16] have higher
delay, their input periods are set to 6.66 ns and 64.87 ns respectively. A uniform random bit stream
of 10,000 inputs is applied to all these circuits to measure the power consumption.
The proposed addition algorithm provides inputs to the proposed structure both in CVNS and
binary. Therefore, no binary to CVNS conversion is required provided that the whole network is
implemented based on the CVNS arithmetic. The output of the proposed activation function is the
input to the multiplier of the next layer in the network. Considering that the input to the CVNS
multiplier is in the CVNS format [31], no conversion is required. Therefore, the area, delay and power
consumption of the proposed structure are reported in Table 2.4 without any conversion units. The
input-to-CVNS conversion can be carried out using a current steering digital to analog converter,
while the output-to-binary conversion can be conducted using the current comparator developed
in [36]. As an example, the area consumption of the 4-3-2 network developed in [32] is 385 320 µm²,
while the area consumption of the input-to-CVNS and output-to-binary conversion circuits is 1245.22
µm² and 140.8 µm² respectively. Therefore, the area overhead of the input and output conversion
(about 0.36 % of the area of the whole network) is negligible.
Considering that the area, delay and power consumption are all important in ASIC implementation,
Area×Delay×Power (ADP) is defined as a performance metric. The ADP values of the different
structures are summarized in Table 2.4. The proposed CVNS sigmoid function evaluation structure
has the lowest ADP: about 2.8 times lower than that of PLAN [13], the best previously developed
structure that meets the input range and resolution requirements, and about 2.1 times lower than
that of [19], which does not meet the resolution requirement.
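As a quick cross-check of the ADP column, the metric can be recomputed from the area, delay and power values listed in Table 2.4. The following is a minimal sketch; the dictionary layout and the function name `adp` are illustrative choices, not part of the original work.

```python
# Area (um^2), delay (ns) and power (uW) copied from Table 2.4.
TABLE_2_4 = {
    "Bharkhada [16]": (352336.86, 64.87, 4109.30),
    "Zhang [15]": (5416.22, 6.66, 866.34),
    "Tommiska [19]": (1404.95, 1.88, 368.82),
    "Vassiliadis [11]": (1804.32, 2.07, 330.30),
    "Alippi [12]": (1745.96, 2.58, 518.94),
    "PLAN [13]": (1347.23, 3.71, 268.20),
    "Proposed": (965.37, 2.53, 192.63),
}


def adp(area, delay, power):
    """Area x Delay x Power product in um^2 x ns x uW."""
    return area * delay * power


# Structures ordered from lowest (best) to highest ADP.
ranking = sorted(TABLE_2_4, key=lambda name: adp(*TABLE_2_4[name]))

for name in ranking:
    ratio = adp(*TABLE_2_4[name]) / adp(*TABLE_2_4["Proposed"])
    print(f"{name:18s} ADP = {adp(*TABLE_2_4[name]):18.2f}  ({ratio:.2f}x proposed)")
```

The printed ratios make the relative ADP of each structure with respect to the proposed one explicit.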
Moreover, comparisons are performed between the proposed structure and two analog neurons
developed in [27] and [28]. Custom layout and circuit design are carried out in TSMC 0.18µm
technology. Both structures implement the activation function with six transistors. The neuron
in [27] is a resistive-type distributed neuron which implements the activation function through
transistor nonlinear properties. Several neurons are required for each layer of the network depending
on the network size. For each unit, the area, delay and power consumption are equal to 62.32
µm², 0.162 ns and 152.1 µW respectively. The implementation of [28] shows an area consumption,
delay and power consumption of 122.52 µm², 0.39 ns and 205.2 nW respectively. Although these
neurons are more area-efficient than the proposed CVNS sigmoid, process variation affects their
performance. As an example, a Monte Carlo simulation of the neuron developed in [27] is
conducted for its maximum output. The simulation result is shown in Fig. 2.7. The process variation
causes error in the output, which in turn makes the realization of a precise sigmoid activation function
using analog neurons impractical.
Figure 2.7: Monte Carlo simulation result of the analog neuron developed in [27] for its maximum
output
Since the VLSI implementation of the proposed structure is carried out using mixed-signal circuits,
process variation affects the proposed structure as well. The accuracy of the proposed structure
depends on the accuracy of the constant current values generated by the current generator block.
Since all of these constant currents are generated from the same current reference, the highest
output current level of the proposed structure has the highest variation. The highest output current
level of the proposed structure is equal to 15 µA. Therefore, a Monte Carlo simulation of the 15 µA
output current with 10,000 runs is conducted. The result is shown in Fig. 2.8. Since the difference
between two consecutive output levels of the proposed CVNS sigmoid function evaluation is equal
to 1 µA, all of the output current levels between 14.5 µA and 15.5 µA represent the same output.
On this basis, the Monte Carlo simulation results show a yield of 47.44 % for the proposed CVNS
sigmoid activation function structure. Therefore, the proposed structure makes the use of analog
circuits for implementation of the precise sigmoid activation function required for neurochips with
on-chip learning possible.
Figure 2.8: Monte Carlo simulation result of the 15 µA current
2.6 Conclusion
A new CVNS-based sigmoid activation function evaluation scheme for neurochips with on-chip
learning is proposed in this paper. The proposed function evaluation scheme exploits the PWL
approximation method and is based on a mathematical derivation using the CVNS features. Moreover,
based on the maximum approximation error, the number of input and output CVNS digits required
for VLSI implementation of the proposed sigmoid function evaluation method is determined.
To realize the proposed CVNS-based sigmoid function evaluation scheme, a new CVNS-based
structure is proposed. The proposed structure exploits the mixed-signal current-mode circuits,
which efficiently implement the addition arithmetic operation. In addition, the proposed CVNS-
based sigmoid function evaluation requires a lower number of output digits when compared to the
state of the art. The implementation results in TSMC 0.18µm technology show that the proposed
structure compares favorably to the state of the art.
Appendix
Proof of the Proposed CVNS Function Evaluation Scheme
The proposed CVNS function evaluation is based on the PLAN method developed in [13]. The
approximation developed in [13] is as follows:
\[
y =
\begin{cases}
1 & x \ge 5 \\
0.03125 \times x + 0.84375 & 2.375 \le x < 5 \\
0.125 \times x + 0.625 & 1 \le x < 2.375 \\
0.25 \times x + 0.5 & 0 \le x < 1 \\
y_x = 1 - y_{|x|} & x < 0
\end{cases}
\tag{2.35}
\]
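The piecewise definition in (2.35) can be sketched in software to check its maximum deviation from the exact sigmoid. This is a minimal illustration only; the function names and the sampling grid (steps of 1/1024 over the interval from -8 to 8) are assumptions made here, not part of the original work.

```python
import math


def plan_sigmoid(x: float) -> float:
    """PLAN piecewise-linear sigmoid approximation, following (2.35)."""
    a = abs(x)
    if a >= 5:
        y = 1.0
    elif a >= 2.375:
        y = 0.03125 * a + 0.84375
    elif a >= 1:
        y = 0.125 * a + 0.625
    else:
        y = 0.25 * a + 0.5
    # Negative inputs reuse the result for |x|: y_x = 1 - y_|x|.
    return y if x >= 0 else 1.0 - y


def sigmoid(x: float) -> float:
    """Exact logistic sigmoid used as the reference."""
    return 1.0 / (1.0 + math.exp(-x))


def max_abs_error(lo: float = -8.0, hi: float = 8.0, steps: int = 16384) -> float:
    """Largest |PLAN - sigmoid| on a uniform grid (grid choice is an assumption)."""
    grid = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return max(abs(plan_sigmoid(x) - sigmoid(x)) for x in grid)


print(f"maximum approximation error on the grid: {max_abs_error():.4f}")
```

The value obtained this way can be compared with the maximum error listed for PLAN [13] in Table 2.4.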
where x and y are the real numbers which represent the input and output of the sigmoid activation
function.
Using (2.2) and (2.4) along with Ni = 3 and Nf = 4 obtained in section 2.2, the input x in terms
of its corresponding CVNS digits can be written in the following form:
\[
x = 2^{3} \left( ((x))_{3} + \frac{((x))_{-1}}{2^{4}} \right)
\tag{2.36}
\]
The output CVNS digits can be generated based on the following equation [31]:
\[
((y))_{j} = \left( y \times 2^{-j} \right) \bmod 2
\tag{2.37}
\]
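Equation (2.37) can be illustrated with exact rational arithmetic. The sketch below is hypothetical: the function name `cvns_digit` is chosen here, and the optional truncation to steps of 2^-(phi-1) is only an assumed software model of the limited environment resolution.

```python
from fractions import Fraction
from math import floor


def cvns_digit(y, j, phi=None):
    """CVNS digit ((y))_j = (y * 2**-j) mod 2, following (2.37).

    If phi is given, the digit is truncated to multiples of 2**-(phi-1),
    an assumed model of an environment resolution of phi bits.
    """
    digit = (Fraction(y) * Fraction(2) ** (-j)) % 2
    if phi is not None:
        step = Fraction(1, 2 ** (phi - 1))
        digit = floor(digit / step) * step
    return digit


# Output "1" represented with 8 fractional bits, all set: y = 1 - 2**-8.
y_one = Fraction(255, 256)
print(cvns_digit(y_one, -1, phi=4), cvns_digit(y_one, -5, phi=4))
```

With an 8-bit all-ones output, both digits evaluate to 15/8, which is the value 1.875 that appears for the region x ≥ 5.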
Before going through the mathematical derivation of the CVNS-based sigmoid function evalua-
tion, three basic properties that are going to be used several times are reviewed. These properties
will be referred as basic properties one, two and three in the rest of this appendix.
1) According to (2.4), the minimum value representable by each CVNS digit is equal to $2^{-(\phi-1)}$.
Therefore, assuming a reliable environment resolution of four, values less than $2^{-3}$ are not
considered in CVNS-based function evaluation. Since an adequate number of CVNS digits are used to
represent the output, this has no effect on the accuracy of the proposed function evaluation method.
This can be shown in the following form:
\[
i \le -4 \Rightarrow 2^{i} x_{j} = 0
\tag{2.38}
\]
where $i$ and $j$ are integer values and $x_j$ can assume values 0 or 1.
2) A basic property of the mod 2 operation is that if its input is an even value greater than one,
the outcome of this continuous modular reduction operation is zero. This can be shown in the
following form:
\[
i \ge 1 \Rightarrow \left( 2^{i} x_{j} \right) \bmod 2 = 0
\tag{2.39}
\]
3) Another basic property of the mod 2 operation is exploited too. If the input to the mod 2 function
is less than 2, the mod 2 function passes the input to the output without change. This can be shown
in the following form:
\[
x < 2 \Rightarrow x \bmod 2 = x
\tag{2.40}
\]
According to (2.35), there are four different input regions. In addition, the output for negative
input values can be determined using the output for the absolute value of the input. Therefore, using
basic properties one, two and three along with the CVNS arithmetic features, the output CVNS digits
$((y))_{-1}$ and $((y))_{-5}$ are derived in the following sections for the four input regions as well as
for negative input values. It should be noted that $((y))_{i,j}$ is used
as an indicator of the output CVNS digits in each region, where $i$ and $j$ represent the CVNS digit
and region index respectively.
CVNS Function Evaluation in the Input Range x ≥ 5
In this region, named r1, y has a constant value of 1. Therefore, the output CVNS digits $((y))_{-1,1}$
and $((y))_{-5,1}$ are equal to each other and, according to (2.4), are calculated in the following form:
\[
((y))_{-1,1} = \sum_{k=-4}^{-1} 2^{k+1} = ((y))_{-5,1} = \sum_{k=-8}^{-5} 2^{k+5} = 1.875
\tag{2.41}
\]
CVNS Function Evaluation in the Input Range 2.375 ≤ x < 5
In this region, named r2, using (2.35), (2.36) and (2.37), the output CVNS digit $((y))_{-1,2}$ can be
written in the following form:
\[
((y))_{-1,2} = \frac{((x))_{3}}{2} + \frac{((x))_{-1}}{32} + 1.6875
\tag{2.42}
\]
Using basic property one, (2.42) can be simplified as follows:
\[
((y))_{-1,2} = \frac{((x))_{3}}{2} + 1.6875
\tag{2.43}
\]
According to (2.4) and (2.43), the $((y))_{-1,2}$ output CVNS digit may be 1.75 or 1.875. To investigate
this, the input region r2 is divided into three sub-ranges. The first one is $2.375 \le x < 3$. In this
sub-range, $((x))_3 = 0.25$. Therefore, using basic property one and (2.42), we have $((y))_{-1,2} = 1.75$.
In the sub-range $3 \le x < 4$, we have $((x))_3 = 0.375$. Therefore, using basic property one and (2.42),
we have $((y))_{-1,2} = 1.875$. Following the same procedure and considering that in the sub-range
$4 \le x < 5$, $((x))_3 = 0.5$, we have $((y))_{-1,2} = 1.875$. Therefore, in this region, $((y))_{-1,2}$ can be
written in the following form:
\[
((y))_{-1,2} = \frac{((x))_{3} \;\mathrm{cmp}\; 0.375}{8} + 1.75
\tag{2.44}
\]
where the cmp function is defined as follows:
\[
a \;\mathrm{cmp}\; b =
\begin{cases}
1 & a \ge b \\
0 & a < b
\end{cases}
\tag{2.45}
\]
According to (2.4) and (2.45), it can be easily proven that the condition $((x))_3 \;\mathrm{cmp}\; 0.375$ is
exactly the same as $((x))_2 \;\mathrm{cmp}\; 0.75$. Therefore, (2.44) is simplified as follows:
\[
((y))_{-1,2} = \frac{((x))_{2} \;\mathrm{cmp}\; 0.75}{8} + 1.75
\tag{2.46}
\]
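The equivalence between the two comparison conditions, and between (2.43) truncated to the environment resolution and (2.46), can be checked exhaustively over the region r2. The sketch below assumes 4 fractional input bits, a resolution of four bits per CVNS digit, and truncation as the model of basic property one; all function names are illustrative.

```python
from fractions import Fraction
from math import floor

PHI = 4  # assumed environment resolution (bits per CVNS digit)


def bits_of(x16):
    """Binary digits x_-4 .. x_3 of x = x16 / 16 (x16 is a non-negative integer)."""
    return {i: (x16 >> (i + 4)) & 1 for i in range(-4, 4)}


def cvns_input_digit(bits, m):
    """((x))_m = sum over i = m-PHI+1 .. m of x_i * 2**(i-m)."""
    return sum(Fraction(bits.get(i, 0)) * Fraction(2) ** (i - m)
               for i in range(m - PHI + 1, m + 1))


def cmp_ge(a, b):
    """The cmp function of (2.45)."""
    return 1 if a >= b else 0


def digit_2_43_truncated(bits):
    """(2.43), truncated to multiples of 1/8 (assumed model of property one)."""
    raw = cvns_input_digit(bits, 3) / 2 + Fraction(27, 16)
    return Fraction(floor(raw * 8), 8)


def digit_2_46(bits):
    """(2.46): comparison on ((x))_2 plus the constant 1.75."""
    return Fraction(cmp_ge(cvns_input_digit(bits, 2), Fraction(3, 4)), 8) + Fraction(7, 4)


R2_GRID = range(38, 80)  # x = k/16 covering 2.375 <= x < 5

mismatches = [k for k in R2_GRID
              if digit_2_43_truncated(bits_of(k)) != digit_2_46(bits_of(k))]
print("mismatching inputs:", mismatches)
```

An empty mismatch list indicates that, on this grid, the comparator form produces the same digit as the truncated linear form.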
As will be proved in the following section, $((x))_2$ is used to determine the $((y))_{-1,3}$ output CVNS
digit in the region $1 \le x < 2.375$ as well. Using the same input signal to generate the $((y))_{-1}$ output
CVNS digit in both regions may result in lower area consumption of the VLSI implementation of
the proposed CVNS-based sigmoid function evaluation method.
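Using (2.35), (2.36) and (2.37), the output CVNS digit $((y))_{-5,2}$ in this region can be written in
the following form:
\[
\begin{aligned}
((y))_{-5,2} &= \left( 8((x))_{3} + \frac{((x))_{-1}}{2} + 27 \right) \bmod 2 \\
&= \left( 8x_{3} + 4x_{2} + 2x_{1} + x_{0} + \frac{x_{-1}}{2} + \frac{x_{-2}}{4} + \frac{x_{-3}}{8} + \frac{x_{-4}}{16} + 1 \right) \bmod 2
\end{aligned}
\tag{2.47}
\]
Using basic properties one and two, (2.47) can be written in the following form:
\[
((y))_{-5,2} = \left( x_{0} + \frac{x_{-1}}{2} + \frac{x_{-2}}{4} + \frac{x_{-3}}{8} + 1 \right) \bmod 2
\tag{2.48}
\]
According to (2.4), (2.48) can be written in the following form:
\[
((y))_{-5,2} = \left( ((x))_{0} + 1 \right) \bmod 2
\tag{2.49}
\]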
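The reduction from (2.47) to (2.49) can be verified numerically for every input in r2. The following sketch assumes inputs with 4 fractional bits and models basic property one as truncation to multiples of 1/8; the function names are hypothetical.

```python
from fractions import Fraction
from math import floor

STEP = Fraction(1, 8)  # 2**-(phi-1) for an assumed resolution phi = 4


def truncate(value):
    """Drop everything below 2**-3 (assumed model of basic property one)."""
    return floor(value / STEP) * STEP


def digit_direct(x):
    """((y))_-5 from y = 0.03125*x + 0.84375 and (2.37), then truncated."""
    y = Fraction(1, 32) * x + Fraction(27, 32)
    return truncate((y * 32) % 2)


def digit_2_49(x):
    """(2.49): (((x))_0 + 1) mod 2, with ((x))_0 = x_0 + x_-1/2 + x_-2/4 + x_-3/8."""
    x_digit_0 = truncate(x % 2)
    return (x_digit_0 + 1) % 2


# All inputs x = k/16 with 2.375 <= x < 5.
r2_inputs = [Fraction(k, 16) for k in range(38, 80)]
agree = all(digit_direct(x) == digit_2_49(x) for x in r2_inputs)
print("(2.49) matches the direct evaluation on r2:", agree)
```

The check uses exact fractions, so any disagreement would indicate a genuine mismatch rather than rounding noise.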
CVNS Function Evaluation in the Input Range 1 ≤ x < 2.375
In this region, named r3, using (2.35), (2.36) and (2.37), the output CVNS digit $((y))_{-1,3}$ can be
written in the following form:
\[
((y))_{-1,3} = \left( 2((x))_{3} + \frac{((x))_{-1}}{8} + 1.25 \right) \bmod 2
\tag{2.50}
\]
Using basic property one, the term $((x))_{-1}/8$ in (2.50) is replaced with $\lfloor ((x))_{-1} \rfloor / 8$,
where $\lfloor \cdot \rfloor$ is the floor function. Therefore, (2.50) can be written in the following form:
\[
((y))_{-1,3} = \left( 2((x))_{3} + \frac{\lfloor ((x))_{-1} \rfloor}{8} + 1.25 \right) \bmod 2
\tag{2.51}
\]
Moreover, considering that $\lfloor ((x))_{-1} \rfloor = x_{-1}$ and using basic property two, (2.51) can be written
in the following form:
\[
((y))_{-1,3} = \left( 2x_{3} + x_{2} + \frac{x_{1}}{2} + \frac{x_{0}}{4} + \frac{x_{-1}}{8} + 1.25 \right) \bmod 2 = \left( ((x))_{2} + 1.25 \right) \bmod 2
\tag{2.52}
\]
It can be easily proven that $((x))_2 \le 0.5$ in this region, so that $((x))_2 + 1.25 < 2$. Therefore,
using basic property three, (2.52) is simplified as follows:
\[
((y))_{-1,3} = ((x))_{2} + 1.25
\tag{2.53}
\]
Using (2.35), (2.36) and (2.37), the $((y))_{-5,3}$ output CVNS digit in this region can be written in
the following form:
\[
((y))_{-5,3} = \left( 32((x))_{3} + 2((x))_{-1} + 20 \right) \bmod 2
\tag{2.54}
\]
Using (2.4) and the basic properties two and three, (2.54) can be written in the following form:
\[
((y))_{-5,3} = \left( 2((x))_{-1} \right) \bmod 2 = \left( 2x_{-1} + x_{-2} + \frac{x_{-3}}{2} + \frac{x_{-4}}{4} \right) \bmod 2 = ((x))_{-2}
\tag{2.55}
\]
CVNS Function Evaluation in the Input Range 0 ≤ x < 1
In this region, named r4, using (2.35), (2.36) and (2.37), the $((y))_{-1,4}$ output CVNS digit can be
written in the following form:
\[
((y))_{-1,4} = \left( 4((x))_{3} + \frac{((x))_{-1}}{4} + 1 \right) \bmod 2
\tag{2.56}
\]
Since all input binary digits $x_0$ to $x_3$ are equal to 0 in the input range $0 \le x < 1$, we have
$((x))_3 = 0$. Therefore, using (2.4) and considering that each CVNS digit spans over only 4 bits, (2.56)
can be written in the following form:
\[
((y))_{-1,4} = \left( \frac{((x))_{-1}}{4} + 1 \right) \bmod 2
\tag{2.57}
\]
According to (2.4), the term $((x))_{-1}/4$ can be written in the following form:
\[
\frac{((x))_{-1}}{4} = \left( \frac{x_{-1}}{4} + \frac{x_{-2}}{8} + \frac{x_{-3}}{16} + \frac{x_{-4}}{32} \right) \bmod 2
\tag{2.58}
\]
Considering basic property one, it can be simplified in the following form:
\[
\frac{((x))_{-1}}{4} = \left( \frac{x_{-1}}{4} + \frac{x_{-2}}{8} \right) \bmod 2
\tag{2.59}
\]
Since the input value x is between zero and one, it is clear from (2.4) that $x_0$ and $x_1$ are zero
as well. Therefore, $((x))_1 = \left( \frac{x_{-1}}{4} + \frac{x_{-2}}{8} \right) \bmod 2$, and the $((y))_{-1,4}$
output CVNS digit can be written in the following form:
\[
((y))_{-1,4} = \left( ((x))_{1} + 1 \right) \bmod 2
\tag{2.60}
\]
Since processing the signal $((x))_{-1}/4$ requires a division, replacing it with $((x))_1$ simplifies the signal
processing, which in turn may result in a more efficient VLSI implementation.
In addition, since $x_0$ and $x_1$ are zero, $((x))_1$ is less than one. Therefore, using basic property
three, (2.60) is modified as follows:
\[
((y))_{-1,4} = ((x))_{1} + 1
\tag{2.61}
\]
According to (2.35), (2.36) and (2.37), the other output CVNS digit in this region, $((y))_{-5,4}$, can
be written in the following form:
\[
((y))_{-5,4} = \left( 64((x))_{3} + 4((x))_{-1} + 16 \right) \bmod 2
\tag{2.62}
\]
Using basic properties two, three and (2.4), (2.62) can be written in the following form:
\[
((y))_{-5,4} = \left( 4((x))_{-1} \right) \bmod 2 = \left( 4x_{-1} + 2x_{-2} + x_{-3} + \frac{x_{-4}}{2} \right) \bmod 2 = ((x))_{-3}
\tag{2.63}
\]
CVNS Function Evaluation for Negative Input Values
According to (2.35), to evaluate the outputs for negative input values, the output for the absolute value
of the input is calculated and subtracted from 1 to generate the output. Considering that the number
of output bits is equal to 8, this can be shown in the following form:
\[
y_{x} = \sum_{k=-8}^{-1} (1 - y_{k})\, 2^{k}
\tag{2.64}
\]
where $y_x$ is the output for negative inputs and $y_k$ are the binary digits representing the output
corresponding to the absolute value of the input.
Using (2.37), the $((y))_{-1}$ output CVNS digit can be calculated as follows:
\[
((y))_{-1,x} = \sum_{k=-8}^{-1} \left( (1 - y_{k})\, 2^{k+1} \right) \bmod 2
\tag{2.65}
\]
Using basic property one, (2.65) can be written in the following form:
\[
((y))_{-1,x} = \sum_{k=-4}^{-1} (1 - y_{k})\, 2^{k+1}
\tag{2.66}
\]
which can be simplified in the following form:
\[
((y))_{-1,x} = 1.875 - \sum_{k=-4}^{-1} y_{k}\, 2^{k+1}
\tag{2.67}
\]
According to (2.4), (2.67) can be simplified as follows:
\[
((y))_{-1,x} = 1.875 - ((y))_{-1,|x|}
\tag{2.68}
\]
Repeating the same steps for $((y))_{-5,x}$, it can be obtained using the following equation:
\[
((y))_{-5,x} = 1.875 - ((y))_{-5,|x|}
\tag{2.69}
\]
Therefore, the output CVNS digits for the negative input values can be calculated by evaluating
the output CVNS digits for the absolute value of the input and subtracting them from 1.875.
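The relation between complementing the eight output bits, as in (2.64), and subtracting each CVNS digit from 1.875, as in (2.68) and (2.69), can be confirmed over all 256 bit patterns. This is a small sketch with illustrative names; the 4-bits-per-digit grouping follows the resolution assumed throughout this appendix.

```python
from fractions import Fraction
from itertools import product


def output_digits(bits8):
    """Return (((y))_-1, ((y))_-5) for bits8 = (y_-1, y_-2, ..., y_-8).

    Each CVNS digit spans 4 binary digits with weights 1, 1/2, 1/4, 1/8.
    """
    def digit(group):
        return sum(Fraction(bit, 2 ** k) for k, bit in enumerate(group))
    return digit(bits8[:4]), digit(bits8[4:])


def complement(bits8):
    """Bitwise complement (1 - y_k), as used in (2.64)."""
    return tuple(1 - bit for bit in bits8)


MAX_DIGIT = Fraction(15, 8)  # 1.875

holds = all(
    output_digits(complement(bits)) == tuple(MAX_DIGIT - d for d in output_digits(bits))
    for bits in product((0, 1), repeat=8)
)
print("digit-wise subtraction from 1.875 matches bit complement:", holds)
```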
2.7 References
[1] B. Zamanlooy and M. Mirhassani, “Efficient hardware implementation of threshold neural net-
works,” in New Circuits and Systems Conference (NEWCAS), 2012 IEEE 10th International,
June 2012, pp. 1 –4.
[2] H.-Y. Hsieh and K.-T. Tang, “VLSI implementation of a bio-inspired olfactory spiking neural
network,” IEEE Trans. Neural Netw., vol. 23, no. 7, pp. 1065–1073, July 2012.
[3] P. Hafliger, “Adaptive WTA with an analog VLSI neuromorphic learning chip,” IEEE Trans.
Neural Netw., vol. 18, no. 2, pp. 551–572, March 2007.
[4] S. Still, K. Hepp, and R. Douglas, “Neuromorphic walking gait control,” IEEE Trans. Neural
Netw., vol. 17, no. 2, pp. 496–508, March 2006.
[5] S. Jung and S. su Kim, “Hardware implementation of a real-time neural network controller with
a DSP and an FPGA for nonlinear systems,” IEEE Trans. Ind. Electron., vol. 54, no. 1, pp.
265 –271, Feb. 2007.
[6] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional
neural networks,” in Advances in Neural Information Processing Systems 25, P. Bartlett,
F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 2012, pp. 1106–1114.
[7] D. Myers and R. Hutchinson, “Efficient implementation of piecewise linear activation function
for digital VLSI neural networks,” Electronics Letters, vol. 25, no. 24, pp. 1662 –1663, Nov.
1989.
[8] M. Al-Nsour and H. Abdel-Aty-Zohdy, “Implementation of programmable digital sigmoid func-
tion circuit for neuro-computing,” in Circuits and Systems, 1998. Proceedings. 1998 Midwest
Symposium on, Aug 1998, pp. 571 –574.
[9] M. Bajger and A. Omondi, “Low-error, high-speed approximation of the sigmoid function for
large FPGA implementations,” Journal of Signal Processing Systems, vol. 52, pp. 137–151,
2008.
[10] A. Armato, L. Fanucci, E. Scilingo, and D. D. Rossi, “Low-error digital hardware implemen-
tation of artificial neuron activation functions and their derivative,” Microprocessors and Mi-
crosystems, vol. 35, no. 6, pp. 557 – 567, 2011.
[11] S. Vassiliadis, M. Zhang, and J. Delgado-Frias, “Elementary function generators for neural-
network emulators,” IEEE Trans. Neural Netw., vol. 11, no. 6, pp. 1438 – 1449, Nov. 2000.
[12] C. Alippi and G. Storti-Gajani, “Simple approximation of sigmoidal functions: realistic design of
digital neural networks capable of learning,” in Circuits and Systems, 1991., IEEE International
Symposium on, June 1991, pp. 1505–1508 vol.3.
[13] H. Amin, K. Curtis, and B. Hayes-Gill, “Piecewise linear approximation applied to nonlinear
function of a neural network,” Circuits, Devices and Systems, IEE Proceedings -, vol. 144, no. 6,
pp. 313 –317, Dec 1997.
[14] H. Kwan, “Simple sigmoid-like activation function suitable for digital hardware implementa-
tion,” Electronics Letters, vol. 28, no. 15, pp. 1379 –1380, July 1992.
[15] M. Zhang, S. Vassiliadis, and J. Delgado-Frias, “Sigmoid generators for neural computing using
piecewise approximations,” IEEE Trans. Comput., vol. 45, no. 9, pp. 1045–1049, Sep. 1996.
[16] B. Bharkhada, J. Hauser, and C. Purdy, “Efficient FPGA implementation of a generic function
approximator and its application to neural net computation,” in Circuits and Systems, 2003
IEEE 46th Midwest Symposium on, vol. 2, Dec. 2003, pp. 843–846.
[17] K. Sammut and S. Jones, “Implementing nonlinear activation functions in neural network em-
ulators,” Electronics Letters, vol. 27, no. 12, pp. 1037 –1038, June 1991.
[18] K. Leboeuf, A. Namin, R. Muscedere, H. Wu, and M. Ahmadi, “High speed VLSI implemen-
tation of the hyperbolic tangent sigmoid function,” in Convergence and Hybrid Information
Technology, 2008. ICCIT ’08. Third International Conference on, vol. 1, Nov. 2008, pp. 1070
–1073.
[19] M. Tommiska, “Efficient digital implementation of the sigmoid function for reprogrammable
logic,” Computers and Digital Techniques, IEE Proceedings -, vol. 150, no. 6, pp. 403 – 411,
Nov. 2003.
[20] A. Namin, K. Leboeuf, R. Muscedere, H. Wu, and M. Ahmadi, “Efficient hardware implemen-
tation of the hyperbolic tangent sigmoid function,” in Circuits and Systems, 2009. ISCAS 2009.
IEEE International Symposium on, May 2009, pp. 2117 –2120.
[21] P. Meher, “An optimized lookup-table for the evaluation of sigmoid function for artificial neural
networks,” in VLSI System on Chip Conference (VLSI-SoC), 2010 18th IEEE/IFIP, Sept. 2010,
pp. 91 –95.
[22] B. Zamanlooy and M. Mirhassani, “Efficient VLSI implementation of neural networks with
hyperbolic tangent activation function,” IEEE Trans. VLSI Syst., vol. 22, no. 1, pp. 39–48,
January 2014.
[23] J. Holt and T. Baker, “Back propagation simulations using limited precision calculations,” in
Neural Networks, 1991., IJCNN-91-Seattle International Joint Conference on, vol. ii, Jul 1991,
pp. 121 –126 vol.2.
[24] P. Murtagh and A. Tsoi, “Implementation issues of sigmoid function and its derivative for VLSI
digital neural networks,” Computers and Digital Techniques, IEE Proceedings E, vol. 139, no. 3,
pp. 207 – 214, May 1992.
[25] J. Holi and J.-N. Hwang, “Finite precision error analysis of neural network hardware imple-
mentations,” IEEE Trans. Comput., vol. 42, no. 3, pp. 281 –290, Mar 1993.
[26] K. Basterretxea, J. Tarela, I. del Campo, and G. Bosque, “An experimental study on nonlinear
function computation for neural/fuzzy hardware design,” IEEE Trans. Neural Netw., vol. 18,
no. 1, pp. 266 –283, Jan. 2007.
[27] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “Analog implementation of a novel
resistive-type sigmoidal neuron,” IEEE Trans. VLSI Syst., vol. 20, no. 4, pp. 750–754, April
2012.
[28] M. Carrasco-Robles and L. Serrano, “A novel minimum-size activation function and its deriva-
tive,” IEEE Trans. Circuits Syst. II, vol. 56, no. 4, pp. 280 –284, April 2009.
[29] L. Gatet, H. Tap-Beteille, and M. Lescure, “Analog neural network implementation for a real-
time surface classification application,” IEEE Sensors J., vol. 8, no. 8, pp. 1413–1421, Aug.
2008.
[30] H. Djahanshahi, M. Ahmadi, G. Jullien, and W. Miller, “Sensitivity study and improvements
on a nonlinear resistive-type neuron circuit,” Circuits, Devices and Systems, IEE Proceedings
-, vol. 147, no. 4, pp. 237 –242, Aug 2000.
[31] A. Saed, M. Ahmadi, and G. Jullien, “A number system with continuous valued digits and
modulo arithmetic,” IEEE Trans. Comput., vol. 51, no. 11, pp. 1294–1305, Nov. 2002.
[32] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “A prototype CVNS distributed neural
network using synapse-neuron modules,” IEEE Trans. Circuits Syst. I, vol. 59, no. 7, pp. 1482
–1490, July 2012.
[33] ——, “CVNS-based storage and refreshing scheme for a multi-valued dynamic memory,” IEEE
Trans. VLSI Syst., vol. 19, no. 8, pp. 1517 –1521, Aug. 2011.
[34] ——, “16-level CVNS memory with fast ADC,” Electronics Letters, vol. 45, no. 16, pp. 822
–824, 2009.
[35] M. Mirhassani, M. Ahmadi, and G. Jullien, “Robust low-sensitivity Adaline neuron based on
continuous valued number system,” Analog Integrated Circuits and Signal Processing, vol. 56,
pp. 223–231, 2008.
[36] D. Freitas and K. Current, “CMOS current comparator circuit,” Electronics Letters, vol. 19,
no. 17, pp. 695–697, 1983.
Chapter 3
CVNS Synapse Multiplier for Robust
Neurochips with On-Chip Learning
Software implementations of neural networks may represent the inputs and weights of the network
with a high precision while as a result of the limited word length accessible in hardware, the inputs
and weights in hardware implementations of neural networks are represented with a finite precision
[1–5].
Representing the inputs and weights with a finite precision in hardware implementations of
neural networks degrades the network output response. This makes the sensitivity of neural network
hardware structures to the input and weight errors an important issue [6–13]. The studies show that
the Noise-to-Signal-Ratio (NSR) can be used to analyze the network output sensitivity to input and
weight errors. A network with lower NSR is more robust against the input and weight errors.
The Continuous Valued Number System (CVNS) is a mixed-signal number system which has
previously been exploited to develop neural network structures with lower sensitivity to input and
weight errors [14,15].
The main arithmetic operations in a neural network include multiplication, addition and a nonlinear
activation function. Although a CVNS multiplication algorithm is developed in [16], its resolution
is limited by the precision of the analog circuits used for its implementation. The precision of the
analog circuits is referred to as the environment resolution in the rest of this paper.
In [17], the effect of a low-resolution environment on the NSR of the CVNS Adaline developed in [15]
using the previously developed CVNS multiplication algorithm is studied. The study shows that
the NSR of the CVNS Adaline increases for applications requiring a multiplication resolution higher
than the environment resolution.
According to the simulations performed in [18] and [19] for on-chip neurochips, a 16-bit synaptic
weight storage resolution is required, while an 8-bit resolution is needed to represent the activation
function output. Since in a neural network the output of the activation function is
multiplied by the weights, a synapse multiplier with a resolution of 16×8 bits is required for VLSI
implementation of neurochips with on-chip learning.
The results of [20–22] reveal that the environment resolution for implementation of CVNS in
TSMC CMOS 0.18µm and 90nm is 4 bits.
The CVNS multiplication algorithm developed in [16] is based on the assumption that the en-
vironment resolution is as high as the resolution required for multiplication. Since the resolution
requirement for multipliers of neurochips with on-chip learning is higher than the environment res-
olution, using the previously developed multiplication algorithm for on-chip neurochips becomes
impractical.
In this paper, a new CVNS multiplication algorithm for a low-resolution environment is proposed.
The proposed algorithm provides accurate results in the low-resolution environment. Moreover, using
the proposed multiplication algorithm, the VLSI implementation of a CVNS synapse multiplier for
on-chip neurochips is realized.
The effect of the proposed multiplication algorithm on the NSR of the CVNS Adaline developed
in [15] is studied in this paper. The study shows that the proposed multiplication algorithm provides
lower NSR. This results in more error tolerant neural network structures.
The rest of this paper is organized as follows. A new CVNS multiplication algorithm for low-
resolution environment is introduced in section 3.1. Afterwards, the VLSI Implementation of the
CVNS synapse multiplier for neurochips with on-chip learning is explained in section 3.2. Post-
layout simulation is carried out in section 3.3. Comparison of the proposed CVNS multiplication
algorithm versus previously developed algorithm is conducted in section 3.4. Finally, conclusions
are drawn in section 3.5.
3.1 Proposed CVNS Multiplication Algorithm in Low Reso-
lution Environment
In this section, a new CVNS multiplication algorithm is proposed. The proposed algorithm pro-
vides accurate results in a low-resolution environment. Therefore, it can provide the multiplication
resolution required by on-chip neurochips.
The absolute value of two multiplication operands X and Y using fixed-point number system
with a radix of B can be shown as follows:
\[
X = \sum_{i=-N_{fx}}^{N_{ix}-1} x_{i} \times B^{i}
\tag{3.1}
\]
\[
Y = \sum_{i=-N_{fy}}^{N_{iy}-1} y_{i} \times B^{i}
\tag{3.2}
\]
where Nix and Nfx are the number of integer and fractional digits of the multiplication operand
X respectively. Niy and Nfy are the number of integer and fractional digits of the multiplication
operand Y respectively. xi and yi are the digits.
The CVNS digit ((z))m representing the multiplication result of two operands, X and Y , can be
written in the following form [16]:
\[
((z))_{m} = ((X \times Y))_{m} = \left( X \times Y \times B^{-m} \right) \bmod B
\tag{3.3}
\]
Replacing X and Y with their fixed-point representation using (3.1) and (3.2), (3.3) can be
written in the following form:
\[
((z))_{m} = \left( \sum_{i=-N_{fy}}^{N_{iy}-1} y_{i} \times B^{i} \sum_{j=-N_{fx}}^{N_{ix}-1} x_{j} \times B^{j} \times B^{-m} \right) \bmod B
\tag{3.4}
\]
One of the basic properties of the mod B operation is that, if it is applied to values which are a
multiple of B and greater than one, the outcome of this continuous modular reduction operation
will be zero. This can be shown in the following form:
\[
k \ge 1 \Rightarrow \left( B^{k} x_{j} \right) \bmod B = 0
\tag{3.5}
\]
Using this basic property, (3.4) can be modified in the following form:
\[
((z))_{m} = \left( \sum_{i=-N_{fy}}^{N_{iy}-1} y_{i} \sum_{j=-N_{fx}}^{m-i} x_{j} \times B^{j-(m-i)} \right) \bmod B
\tag{3.6}
\]
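The step from (3.4) to (3.6) drops every partial product whose weight is a positive power of B, since such terms vanish under mod B. A minimal sketch with exact rational arithmetic can confirm that the truncated double sum gives the same digit as the direct definition (3.3). The digit dictionaries, function names and example operands below are illustrative assumptions.

```python
from fractions import Fraction


def value(digits, B=2):
    """Fixed-point value of a {position: digit} dictionary, as in (3.1) and (3.2)."""
    return sum(Fraction(d) * Fraction(B) ** i for i, d in digits.items())


def digit_direct(x_digits, y_digits, m, B=2):
    """((z))_m = (X * Y * B**-m) mod B, following (3.3)."""
    return (value(x_digits, B) * value(y_digits, B) * Fraction(B) ** (-m)) % B


def digit_truncated(x_digits, y_digits, m, B=2):
    """((z))_m from (3.6): only terms with j <= m - i are kept."""
    total = Fraction(0)
    for i, y_i in y_digits.items():
        for j, x_j in x_digits.items():
            if j <= m - i:
                total += y_i * x_j * Fraction(B) ** (j - (m - i))
    return total % B


# Hypothetical operands: X = 0.1011 (binary), Y = 101.01 (binary).
X_DIGITS = {-1: 1, -2: 0, -3: 1, -4: 1}
Y_DIGITS = {2: 1, 1: 0, 0: 1, -1: 0, -2: 1}

for m in range(-6, 3):
    print(m, digit_direct(X_DIGITS, Y_DIGITS, m), digit_truncated(X_DIGITS, Y_DIGITS, m))
```

The same comparison can be repeated for other radices by passing a different `B`, as long as the digits stay integers.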
The CVNS digit ((x))m representing the input X in an environment with resolution of ϕ is as
follows [15]:
\[
((x))_{m} = \sum_{i=m-\phi+1}^{m} x_{i} \times B^{i-m}
\tag{3.7}
\]
Equation (3.7) can be exploited to represent the ((z))m in terms of the CVNS digits ((x))m which
represent the multiplication operand X in the CVNS format. In order to do this, (3.6) is modified
in the following form:
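\[
\begin{aligned}
((z))_{m} = \Biggl( \sum_{i=-N_{fy}}^{N_{iy}-1} y_{i} \Biggl( &\sum_{j=m-i-\phi+1}^{m-i} x_{j} \times B^{j-(m-i)} + B^{-\phi} \sum_{j=m-i-2\phi+1}^{m-i-\phi} x_{j} \times B^{j-(m-i+\phi)} \\
&+ \dots + B^{-(n_{i}+1)\phi} \sum_{j=-N_{fx}}^{m-i-n_{i}\phi} x_{j} \times B^{j-(m-i+n_{i}\phi)} \Biggr) \Biggr) \bmod B
\end{aligned}
\tag{3.8}
\]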
Since each CVNS digit $((x))_m$ has $\phi$ terms, the value of $n_i$ is found based on the following
condition:
\[
m - i - n_{i}\phi + N_{fx} \le \phi - 1
\tag{3.9}
\]
Therefore, $n_i$ can be calculated using the following equation:
\[
n_{i} = \left\lceil \frac{m + (N_{fx} + 1) - (i + \phi)}{\phi} \right\rceil
\tag{3.10}
\]
in which $\lceil \cdot \rceil$ is the ceiling function.
Using (3.7) and (3.8), ((z))m is written as follows:
\[
((z))_{m} = \left( \sum_{i=-N_{fy}}^{N_{iy}-1} y_{i} \left( ((x))_{m-i} + B^{-\phi} ((x))_{m-i+\phi} + \dots + B^{-(n_{i}+1)\phi} ((x))_{m-i+n_{i}\phi} \right) \right) \bmod B
\tag{3.11}
\]
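Repeating the same procedure from (3.3) to (3.11) for $((z))_{m-\phi}$, $((z))_{m-\phi}$ can be calculated in
the following form:
\[
\begin{aligned}
((z))_{m-\phi} = C_{m-\phi} \bmod B = \Biggl( \sum_{i=-N_{fy}}^{N_{iy}-1} y_{i} \Bigl( &((x))_{m-i+\phi} + B^{-\phi} ((x))_{m-i+2\phi} + \dots \\
&+ B^{-(n_{i}+1)\phi} ((x))_{m-i+n_{i}\phi} \Bigr) \Biggr) \bmod B
\end{aligned}
\tag{3.12}
\]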
Using (3.8) and (3.12), we can write:
\[
((z))_{m} = \left( \sum_{i=-N_{fy}}^{N_{iy}-1} y_{i} ((x))_{m-i} + B^{-\phi} C_{m-\phi} \right) \bmod B
\tag{3.13}
\]
According to (3.7), the term $B^{-(\phi-1)}$ is the term with the lowest resolution present in the CVNS
digits. Therefore, only terms of at least $B^{-(\phi-1)}$ should be considered. Thus, (3.13) can be
written in the following form:
\[
((z))_{m} = \left( \sum_{i=-N_{fy}}^{N_{iy}-1} y_{i} ((x))_{m-i} + B^{-(\phi-1)} \left\lfloor \frac{C_{m-\phi}}{B} \right\rfloor \right) \bmod B
\tag{3.14}
\]
where $\lfloor \cdot \rfloor$ is the floor function.
In (3.14), the CVNS output $((z))_m$ and the CVNS input $((x))_{m-i}$ have a resolution of $\phi$. Therefore,
the proposed algorithm is compatible with the environment resolution and provides accurate results
in the low-resolution environment. Moreover, the multiplication operands X and Y are represented
by the analog CVNS digits $((x))_{m-i}$ and the digits $y_i$ respectively. Therefore, the proposed
multiplication algorithm accepts the inputs X and Y in the analog and digital format respectively. Thus,
it may be a suitable candidate for VLSI implementation of high resolution mixed-signal synapse
multipliers using low-resolution mixed-signal circuits.
3.2 VLSI Implementation of the CVNS Synapse Multiplier
for Neurochips with On-Chip Learning
In this section, the VLSI implementation of the CVNS synapse multiplier for neurochips with on-chip
learning, based on the proposed CVNS multiplication algorithm, is discussed. The VLSI implementation
is realized using TSMC CMOS 0.18 µm technology.
A synapse multiplier with a resolution of 16×8 bits is required for neurochips with on-chip
learning [18,19]. Therefore, a CVNS synapse multiplier with a resolution of 16 by 8 bits is considered
for implementation.
The synaptic weight is stored in digital registers. The output of the activation function,
which is the other input to the multiplier, is in the CVNS format. Therefore, to use (3.14) for
implementation of the synapse multiplier, the weights are denoted by y while the other input is
denoted by the CVNS digits ((x)).
As shown in [18], a synaptic weight range of (-8,8) is adequate. Therefore, 3 integer bits are
required to represent the weight range. Moreover, since the weights are signed variables, 1 bit is
required for the sign. The 12 remaining bits are used to represent the fractional part of the
weights.
Weights are stored in the 2's complement format. To obtain the multiplication result correctly in
the 2's complement format, sign bit extension is used. For the 16 by 8 bit multiplication,
the sign bit is extended by 8 bits to produce the correct result. Considering the sign extension, both
Niy and Nfy parameters in (3.14) are equal to 12.
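As a rough illustration of the weight format described above, the following Python sketch encodes a weight as a 16-bit 2's complement word with 12 fractional bits and extends the sign bit by 8 positions, giving the bits yi for i = -12 to 11. The function names and the dictionary representation of the bits are illustrative assumptions and are not taken from the implemented circuit.

```python
FRAC_BITS = 12   # fractional bits of the weight (Nfy)
WORD_BITS = 16   # stored width: 1 sign + 3 integer + 12 fractional bits
EXT_BITS = 8     # number of positions the sign bit is extended by


def encode_weight(w):
    """Quantize w in [-8, 8) to a 16-bit two's-complement word (hypothetical helper)."""
    if not -8.0 <= w < 8.0:
        raise ValueError("weight outside the representable range")
    q = min(int(round(w * (1 << FRAC_BITS))), (1 << (WORD_BITS - 1)) - 1)
    return q & ((1 << WORD_BITS) - 1)


def weight_bits(word):
    """Return {i: y_i} for i = -12..11, with y_4..y_11 copied from the sign bit y_3."""
    top = WORD_BITS - FRAC_BITS                      # first index above the sign bit
    bits = {i: (word >> (i + FRAC_BITS)) & 1 for i in range(-FRAC_BITS, top)}
    for i in range(top, top + EXT_BITS):             # sign extension
        bits[i] = bits[top - 1]
    return bits


if __name__ == "__main__":
    word = encode_weight(-1 / 4096)                  # smallest negative step
    print(hex(word), sorted(weight_bits(word).items()))
```

With this indexing, y3 is the sign bit and the extended bits y4 to y11 simply repeat it, so the 24 bits cover i = -12 to 11.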
The output of the sigmoid activation function is always positive and less than one. Therefore,
all of the 8 bits representing the output of the sigmoid activation function are fractional digits. This
results in Nix = 0 and Nfx = 8. According to (3.7), the CVNS digit set {((x))2, ((x))1, ..., ((x))−8} can
represent the output of the sigmoid activation function in the CVNS format. Moreover, the radix of
CVNS is considered to be two. This provides the most efficient radix for conversion between binary
and CVNS. According to simulations performed in [20–22], the environment resolution for CVNS is
four bits. Therefore, assuming an environment resolution of 4 and a radix of 2, (3.14) can be written
in the following form:
\[
((z))_m = \left( \sum_{i=-12}^{11} y_i\,((x))_{m-i} + 2^{-3} \left\lfloor \frac{C_{m-4}}{2} \right\rfloor \right) \bmod 2 \tag{3.15}
\]
In the equation above, due to the sign extension, y4 to y11 are equal to each other and have the
same value as y3 which is the sign bit of the synaptic weight.
The multiplication result of a 16×8 multiplier has a resolution of 24 bits. Since the ranges
of the inputs to the multiplier are (-8,8) and (0,1), the output range of the multiplier is (-8,8).
Therefore, 3 bits are required to represent the integer part of the multiplication result while 1 bit
represents the sign. The remaining 20 bits represent the fractional part of the multiplication
result. Since the environment resolution is four, each CVNS digit includes the information of four
bits. Therefore, six CVNS digits can represent the 24-bit multiplication result. Thus, the CVNS
digit set ((z)) = {((z))3, ((z))−1, ((z))−5, ((z))−9, ((z))−13, ((z))−17} represents the output of the CVNS
multiplier.
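Current-mode circuits are used to realize (3.15). In the VLSI implementation of (3.15), a current
of 8 µA represents the value 1. Therefore, (3.15) can be written in the following form:
\[
((z))_m = \left( \sum_{i=-12}^{11} y_i\,((x))_{m-i} + 1\,\mu\mathrm{A} \times \left\lfloor \frac{C_{m-4}}{16\,\mu\mathrm{A}} \right\rfloor \right) \bmod 16\,\mu\mathrm{A} \tag{3.16}
\]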
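The digit-by-digit evaluation in (3.16) can be sketched in Python as a behavioral model. The sketch assumes that Cm is the argument of the modulo operation for digit m, including the carry term (consistent with (3.12), where ((z))m−ϕ = Cm−ϕ mod B), and that all currents are ideal integers in µA. The function and variable names are illustrative and do not describe the circuit itself.

```python
PHI = 4                                   # environment resolution (bits per CVNS digit)
FULL_SCALE_UA = 16                        # modulus in uA (radix 2, 8 uA represents 1)
Z_INDICES = (-17, -13, -9, -5, -1, 3)     # output digits, least significant first


def x_digit_ua(x_bits, m):
    """Truncated radix-2 CVNS digit ((x))_m in uA; x_bits maps j -> x_j."""
    return sum(x_bits.get(j, 0) << (j - m + PHI - 1)
               for j in range(m - PHI + 1, m + 1))


def cvns_multiply(y_word, x_word):
    """Behavioral model of (3.16).

    y_word: 16-bit two's-complement weight with 12 fractional bits.
    x_word: 8-bit unsigned fraction (bits x_-1 .. x_-8).
    Returns the digits ((z))_3, ((z))_-1, ..., ((z))_-17 in uA.
    """
    y = {i: (y_word >> (i + 12)) & 1 for i in range(-12, 4)}
    for i in range(4, 12):                # sign extension: y_4..y_11 equal y_3
        y[i] = y[3]
    x = {j: (x_word >> (j + 8)) & 1 for j in range(-8, 0)}

    digits = []
    c_prev = 0                            # no digit exists below ((z))_-17
    for m in Z_INDICES:
        # Digits ((x))_k outside k = -8..2 are zero, so summing over all i
        # reproduces the per-digit summation limits.
        c = sum(y[i] * x_digit_ua(x, m - i) for i in range(-12, 12))
        c += c_prev // FULL_SCALE_UA      # 1 uA for every full 16 uA in C_{m-4}
        digits.append(c % FULL_SCALE_UA)
        c_prev = c
    return digits[::-1]


if __name__ == "__main__":
    print(cvns_multiply(0x2AAA, 0x0F))    # [0, 2, 7, 15, 15, 6]
```

For the weight word 0010 1010 1010 1010 and the input 0.00001111 (binary), the model returns the digits 0, 2, 7, 15, 15 and 6 µA, which are the 4-bit groups of the ordinary 24-bit product.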
To implement (3.16), its basic building blocks are realized first. Since addition in current-mode
circuits is performed simply by wiring together the nodes carrying the signals, the main building
blocks required for implementation of (3.16) are the mod 16 µA operation and the ⌊Cm−4 / 16 µA⌋
operation. Therefore, the VLSI implementation of these blocks is discussed next.
3.2.1 VLSI implementation of mod 16 µA
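The mod 16 µA operation can be written in the following form:
\[
x \bmod 16\,\mu\mathrm{A} =
\begin{cases}
x, & x < 16\,\mu\mathrm{A} \\
x - 16\,\mu\mathrm{A}, & x \ge 16\,\mu\mathrm{A}
\end{cases}
\tag{3.17}
\]
where x is the input to the mod 16 µA block.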
The mod 16 µA operation circuit is shown in Fig. 3.1. The circuit is composed of five main
sections: the input current mirror, the current comparator, the current subtractor, the
inverter chain and the output current mirror. Before going through the different sections of the
circuit, it should be noted that (3.17) can be rewritten in the following form:
\[
x \bmod 16\,\mu\mathrm{A} =
\begin{cases}
\left( \frac{x}{2} \right) \times 2, & \frac{x}{2} < 8\,\mu\mathrm{A} \\
\left( \frac{x}{2} - 8\,\mu\mathrm{A} \right) \times 2, & \frac{x}{2} \ge 8\,\mu\mathrm{A}
\end{cases}
\tag{3.18}
\]
According to (3.18), the current comparator and subtractor sections should compare x/2 with
8 µA and subtract 8 µA from x/2. This halves the current in these sections, which reduces the
power consumption of the mod 16 µA block.
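A minimal behavioral sketch of (3.18) in Python is given below. It models ideal currents only and assumes an input below 32 µA, since (3.17) subtracts 16 µA at most once. The function name and the returned comparator flag are illustrative assumptions.

```python
REF_UA = 8.0   # reference current compared against half of the input


def mod16_behavioral(x_ua):
    """Ideal model of (3.18); intended for 0 <= x_ua < 32 uA."""
    half = x_ua / 2.0                 # input mirror copies x/2
    above = half >= REF_UA            # comparator decision
    if above:
        half -= REF_UA                # subtract the 8 uA reference
    return 2.0 * half, above          # output mirror doubles the current


if __name__ == "__main__":
    for x in (5, 16, 23):
        print(x, mod16_behavioral(x))
```

Working on x/2 instead of x leaves the result unchanged because the output mirror restores the factor of two.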
The input current mirror is a low-voltage cascode current mirror which precisely copies the input
current divided by two to the current comparator and the current subtractor sections. The
transistors M1 and M2 generate the bias voltage required for the current mirror while the transistors
M3 to M8 form a cascode current mirror. The W/L ratio of the transistors M5 to M8 is half that of
the transistors M3 and M4. Therefore, the input current divided by two will be copied to the current
comparator and subtractor sections.
The current comparator is based on the structure developed in [23] and compares the input
current with a reference current of 8 µA. The 8 µA reference current is generated by the transistors
M9 to M11. The generated reference current is copied to the transistors M12 and M13. Output of
the comparator is connected to the inverter chain which provides a rail to rail output. The current
comparator along with the inverter chain generates the CmpN and CmpP signals equal to 0 and 1.8
volt provided that the input current to the circuit is greater than 16 µA.
The two transistors M16 and M17 act as a transmission gate which turns on provided that the
input current is greater than 16 µA. This allows the 8 µA reference current to flow through these
transistors. Therefore, when the input is greater than 16 µA, the 8 µA reference current is subtracted
from the input current divided by two and flows through the transistors M18 and M19. Otherwise,
half of the input current flows through the transistors M18 and M19.
Figure 3.1: VLSI implementation of mod16µA operation
To copy the current through the transistors M18 and M19 to the output precisely, a current mirror
with high output resistance is required. The transistors M18 to M21 form a Wilson current mirror.
Table 3.1: Transistor sizes of the mod 16 µA circuit
Transistor   W/L (µm/µm)    Transistor   W/L (µm/µm)
M1           0.22/0.18      M15          2.77/0.18
M2           0.22/0.18      M16          1/0.5
M3           0.44/0.18      M17          1/0.5
M4           0.22/0.18      M18          2.5/0.18
M5           0.22/0.18      M19          2.5/0.18
M6           0.22/0.18      M20          5/0.18
M7           0.22/0.18      M21          5/0.18
M8           0.22/0.18      M22          0.9/0.18
M9           2.77/0.18      M23          0.22/0.18
M10          2.77/0.18      M24          0.9/0.18
M11          2.77/0.18      M25          0.22/0.18
M12          2.77/0.18      M26          0.9/0.18
M13          2.77/0.18      M27          0.22/0.18
M14          2.77/0.18
The Wilson current mirror provides a high output resistance. Moreover, using this current mirror,
the voltage at node S of Fig. 3.1 is VDD−(VDS19+VGS18). By proper sizing of the transistors M18
and M19, this voltage keeps the transistors M7, M8, M14 and M15 in saturation. The W/L ratio of
the transistors M20 and M21 is twice that of the transistors M18 and M19.
Therefore, the current through the transistors M18 and M19 is doubled and copied to the output of
the mod 16 µA circuit. The transistor sizes of the mod 16 µA circuit are shown in Table 3.1.
3.2.2 VLSI implementation of ⌊Cm−4 / 16 µA⌋
According to (3.12), ((z))m−4 is obtained by applying the mod 16 µA operation to Cm−4. The
mod 16 µA circuit implemented in the previous section is applied to the summation of two input
currents. Therefore, the mod 16 µA operation is applied to each summation term of Cm−4. The CmpN
output signal of the inverter chain in the mod 16 µA circuit indicates that the input current
to this block is greater than 16 µA. Therefore, the number of CmpN signals equal to 1.8 V in the
mod 16 µA circuits used to evaluate ((z))m−4 is equal to ⌊Cm−4 / 16 µA⌋. Thus, each CmpN signal
generated by the ((z))m−4 evaluation circuit should generate a 1 µA current which is used to
evaluate ((z))m.
The circuit used to implement 1 µA × ⌊Cm−4 / 16 µA⌋ is shown in Fig. 3.2. The transistors M1 to M3
generate a current of 1 µA. This current is copied to the transistors M4 to M21 using the cascode
current mirror provided that the corresponding cmpi signal is active. The cmp0 to cmp5 inputs are
connected to the CmpN outputs of the mod 16 µA circuits used to evaluate ((z))m−4. Therefore,
this circuit implements the 1 µA × ⌊Cm−4 / 16 µA⌋ term required for evaluation of ((z))m. All of
the transistors have the same W/L ratio of 0.3 µm / 0.18 µm. The implemented circuit has a maximum
of 6 inputs. Therefore, based on the number of CmpN signals generated in the evaluation of
((z))m−4, the appropriate number of 1 µA × ⌊Cm−4 / 16 µA⌋ blocks should be used to evaluate ((z))m.
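The statement that the number of active comparator outputs equals ⌊Cm−4 / 16 µA⌋ can be checked with a small Python model. The sketch assumes ideal mod 16 µA blocks and a simple pairwise reduction of the input currents, each below 16 µA. The exact wiring of the blocks in the figures may differ, so the reduction order here is only an assumption.

```python
def mod16(x_ua):
    """Ideal mod 16 uA block: returns (output current, comparator flag)."""
    if not 0 <= x_ua < 32:
        raise ValueError("block input must stay below 32 uA")
    return (x_ua - 16, 1) if x_ua >= 16 else (x_ua, 0)


def reduce_currents(currents_ua):
    """Pairwise reduction through mod 16 uA blocks.

    Each input must be below 16 uA. Returns (digit in uA, number of flags set).
    """
    level = list(currents_ua)
    if not level:
        return 0, 0
    flags = 0
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level), 2):
            out, flag = mod16(sum(level[k:k + 2]))   # wired sum of up to two currents
            flags += flag
            nxt.append(out)
        level = nxt
    return level[0], flags


if __name__ == "__main__":
    digit, flags = reduce_currents([14, 8])          # C = 22 uA
    print(digit, flags)                              # 6 uA left, one flag
```

Every flag corresponds to one subtraction of 16 µA, and the final output is below 16 µA, so the flag count is the integer part of C / 16 µA regardless of how the currents are paired.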
3.2.3 VLSI implementation of the CVNS synapse multiplier
Using the basic building blocks designed in the previous sections, VLSI implementation of the CVNS
multiplier is conducted. According to (3.16), the CVNS digits representing the multiplication result
in the CVNS format can be shown in the following form:
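\[
\begin{aligned}
((z))_{3} &= \left( \sum_{i=1}^{11} y_i\,((x))_{3-i} + 1\,\mu\mathrm{A} \times \left\lfloor \frac{C_{-1}}{16\,\mu\mathrm{A}} \right\rfloor \right) \bmod 16\,\mu\mathrm{A} \\
((z))_{-1} &= \left( \sum_{i=-3}^{7} y_i\,((x))_{-1-i} + 1\,\mu\mathrm{A} \times \left\lfloor \frac{C_{-5}}{16\,\mu\mathrm{A}} \right\rfloor \right) \bmod 16\,\mu\mathrm{A} \\
((z))_{-5} &= \left( \sum_{i=-7}^{3} y_i\,((x))_{-5-i} + 1\,\mu\mathrm{A} \times \left\lfloor \frac{C_{-9}}{16\,\mu\mathrm{A}} \right\rfloor \right) \bmod 16\,\mu\mathrm{A} \\
((z))_{-9} &= \left( \sum_{i=-11}^{-1} y_i\,((x))_{-9-i} + 1\,\mu\mathrm{A} \times \left\lfloor \frac{C_{-13}}{16\,\mu\mathrm{A}} \right\rfloor \right) \bmod 16\,\mu\mathrm{A} \\
((z))_{-13} &= \left( \sum_{i=-12}^{-5} y_i\,((x))_{-13-i} + 1\,\mu\mathrm{A} \times \left\lfloor \frac{C_{-17}}{16\,\mu\mathrm{A}} \right\rfloor \right) \bmod 16\,\mu\mathrm{A} \\
((z))_{-17} &= \left( \sum_{i=-12}^{-9} y_i\,((x))_{-17-i} \right) \bmod 16\,\mu\mathrm{A}
\end{aligned}
\tag{3.19}
\]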
VLSI implementation of different CVNS digits representing the multiplication result is discussed
next.
Figure 3.2: VLSI implementation of the 1 µA × ⌊Cm−4 / 16 µA⌋ operation
VLSI implementation of ((z))−17
The block diagram of the circuit used for implementation of ((z))−17 is shown in Fig. 3.3. As may be
noted from Fig. 3.3, the input currents ((x))−5 to ((x))−8 are applied to the transistors M1 to M4.
These transistors act as switches which turn on provided that the corresponding yi input to their
gate is high. This in turn implements the yi((x))−17−i terms required to calculate ((z))−17.
After passing the input transistors, the input currents are summed at the input nodes of the
mod 16 µA blocks and the mod 16 µA operation is applied to them. The outputs of the mod 16 µA
blocks which receive the input currents are wired together to perform the addition. The addition
result goes through another mod 16 µA block. This generates the ((z))−17 output CVNS digit. The
three signals cmp_{−17,0} to cmp_{−17,2} generated by the mod 16 µA blocks are used as inputs to the
1 µA × ⌊C−17 / 16 µA⌋ circuit used to evaluate ((z))−13.
VLSI implementation of ((z))−13
The block diagram of the circuit used for implementation of ((z))−13 is shown in Fig. 3.4. The
operation principle of the circuit is the same as that of the circuit used for evaluation of
((z))−17. As can be seen from Fig. 3.4, the signals cmp_{−17,0} to cmp_{−17,2} generated by the
((z))−17 evaluation block are used as inputs to the 1 µA × ⌊C−17 / 16 µA⌋ block. Since this block
has 6 inputs and only three inputs are used, the remaining inputs are disabled by connecting them
to ground.
Figure 3.3: Block diagram of the VLSI implementation of ((z))−17
VLSI implementation of ((z))−9, ((z))−5, ((z))−1 and ((z))3
All of these CVNS digits can be implemented using the same structure. The structure used for
evaluation of ((z))−9, ((z))−5, ((z))−1 and ((z))3 is shown in Fig. 3.5. The structure operation is
similar to the ones used for evaluation of ((z))−17 and ((z))−13.
To evaluate the CVNS digits ((z))−9, ((z))−5, ((z))−1 and ((z))3, the proper inputs are applied to the
structure shown in Fig. 3.5. The CVNS inputs and the digital weights are denoted by ((x))i and
yj respectively. The outputs of the mod 16 µA blocks are shown as cmp_{k,0} to cmp_{k,10}. The inputs
to the 1 µA × ⌊Cm−4 / 16 µA⌋ blocks are shown as cmp_{l,0} to cmp_{l,11}. These are the CmpN outputs
of the mod 16 µA blocks used to evaluate the ((z))m−4 output CVNS digit. The output of this structure
is denoted as ((z))m. The parameters of the structure shown in Fig. 3.5 for different output CVNS
digits are summarized in Table 3.2.
46
3. CVNS SYNAPSE MULTIPLIER FOR ROBUST NEUROCHIPS WITH ON-CHIP LEARNING
Figure 3.4: Block diagram of the VLSI implementation of ((z))−13
Using the circuits explained in the previous sections, the proposed CVNS synapse multiplier is
laid out. The layout of the proposed CVNS synapse multiplier is shown in Fig. 3.6. The layout has
an area of 14953.67 µm².
3.3 Post-Layout Simulation
In this section, post-layout simulation of the proposed CVNS synapse multiplier is conducted. Per-
formance of the VLSI implementation of the proposed CVNS synapse multiplier is verified by con-
ducting post-layout simulations for various inputs.
Figure 3.5: VLSI implementation of ((z))−9, ((z))−5, ((z))−1, ((z))3
Fig. 3.7 shows the post-layout simulation results for four different input values. These inputs
are applied to the CVNS multiplier at intervals of 0.5 µs. The digital weights y and the CVNS
inputs ((x)) corresponding to these four inputs are shown in Table 3.3. Using (3.19), the expected
output for each input can be calculated. The relation between the output CVNS digits and the inputs
to the CVNS multiplier is summarized in Table 3.3. The post-layout simulation results shown in
Fig. 3.7 are in good agreement with the mathematical derivations presented in Table 3.3.
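As a cross-check of this kind of table, the expected output digits can also be obtained from an ordinary integer product. The Python sketch below assumes that the weight word y is listed most significant bit first (y3 down to y−12), that the ((x)) digits are listed from ((x))2 down to ((x))−8, and that 8 µA represents 1. The function names are illustrative. It is a reference model only and does not follow the carry structure of (3.19).

```python
def expected_digits_ua(y_word, x_word):
    """Digits ((z))_3 .. ((z))_-17 in uA from a plain two's-complement product."""
    y_signed = y_word - (1 << 16) if y_word & 0x8000 else y_word
    product = (y_signed * x_word) & 0xFFFFFF       # 24-bit two's-complement result
    # A 4-bit group b3 b2 b1 b0 gives the truncated digit b3 + b2/2 + b1/4 + b0/8,
    # which at 8 uA per unit equals the integer value of the group in uA.
    return [(product >> shift) & 0xF for shift in (20, 16, 12, 8, 4, 0)]


def x_word_from_digits(x_digits_ua):
    """Recover the 8-bit input word from ((x))_2 .. ((x))_-8 given in uA.

    The leading term of the truncated digit ((x))_j is x_j, so a digit of at
    least 8 uA means x_j = 1. Only j = -1 .. -8 carry input bits.
    """
    word = 0
    for digit in x_digits_ua[3:]:
        word = (word << 1) | (1 if digit >= 8 else 0)
    return word


if __name__ == "__main__":
    x = x_word_from_digits([0, 0, 0, 0, 1, 3, 7, 15, 14, 12, 8])
    print(expected_digits_ua(0x2AAA, x))           # [0, 2, 7, 15, 15, 6]
```

For the inputs y = 0010 1010 1010 1010 and ((x)) = {0, 0, 0, 0, 1, 3, 7, 15, 14, 12, 8} µA, this reference model gives {0, 2, 7, 15, 15, 6} µA.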
Figure 3.6: Layout of the proposed CVNS synapse multiplier
Table 3.2: Parameters of the structure shown in Fig. 3.5 for different output CVNS digits
Output CVNS digit    i     j      k     l      m
((z))3               2     1      3     -1     3
((z))−1              0     -3     -1    -5     -1
((z))−5              2     -7     -5    -9     -5
((z))−9              2     -11    -9    -13    -9
3.4 Comparison with Previously Developed CVNS Multiplication Algorithm
In this section, effect of the proposed CVNS multiplication algorithm on the NSR of a CVNS
Adaline is discussed. Moreover, comparison of the NSR of a CVNS Adaline using the proposed
CVNS multiplication algorithm versus previously developed multiplication algorithm is carried out.
The previously developed multiplication algorithm is limited by the environment resolution.
In [17], the effect of this limitation on the NSR of a CVNS Adaline is studied. The study shows that
the limitation of the previously developed multiplication algorithm increases the NSR of the Adaline.
This in turn degrades the output response.
Table 3.3: Synapse multiplier input values and their corresponding output (CVNS digits in µA)
y                      ((x))                                       ((z))
1111 1111 1111 1111    {1, 3, 6, 12, 8, 1, 3, 6, 12, 8, 0}         {15, 15, 15, 15, 3, 10}
0010 1010 1010 1010    {0, 0, 0, 0, 1, 3, 7, 15, 14, 12, 8}        {0, 2, 7, 15, 15, 6}
0111 1111 0000 0000    {1, 2, 5, 10, 5, 10, 5, 10, 4, 8, 0}        {5, 4, 5, 6, 0, 0}
1101 0101 0101 0101    {1, 3, 7, 15, 14, 12, 8, 0, 0, 0, 0}        {14, 7, 15, 15, 11, 0}
Figure 3.7: Post-layout simulation results of the proposed CVNS synapse multiplier
The NSR of the CVNS Adaline developed in [15] in a low-resolution environment using the
previously developed CVNS multiplication algorithm can be shown in the following form [17]:
\[
\mathrm{NSR}_l = g\left( \frac{\sigma_x \sigma_w}{B^{DD}} \sqrt{N} \right) \times \left( \frac{\sigma_{\Delta x}^2}{\sigma_x^2} + \frac{\sigma_{\Delta w}^2}{\sigma_w^2} \right) \tag{3.20}
\]
where g is the stochastic gain function. The stochastic gain is a function of (σxσw / B^{DD})√N. N is
the number of inputs, and σx and σw are the standard deviations of the inputs and weights
respectively. ∆x and ∆w are the input and weight errors respectively, and DD is obtained in the
following form:
\[
DD + 1 = \left\lceil \frac{D + 1}{\phi} \right\rceil \tag{3.21}
\]
where D + 1 is the number of CVNS digits used.
As shown in the previous sections, the proposed CVNS multiplication algorithm eliminates the
effect of low-resolution environment on the multiplication result. Therefore, the low-resolution en-
vironment has no effect on the NSR of CVNS Adaline when the proposed CVNS synapse multiplier
is exploited. Thus, the NSR of a CVNS Adaline using the proposed multiplication algorithm is in
the following form [17]:
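\[
\mathrm{NSR}_p = g\left( \frac{\sigma_x \sigma_w}{B^{D}} \sqrt{N} \right) \times \left( \frac{\sigma_{\Delta x}^2}{\sigma_x^2} + \frac{\sigma_{\Delta w}^2}{\sigma_w^2} \right) \tag{3.22}
\]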
The stochastic gain function, g(x), for x ≤ 1, is approximately equal to 1, while for x > 1 can
be estimated as follows [24]:
g(x) = 0.5 + 0.53× x (3.23)
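According to (3.23), for x > 1 the stochastic gain grows linearly with its input. Therefore, g(x)
can be approximated as proportional to x, and (3.20) and (3.22) can be written in the following form:
\[
\mathrm{NSR}_l \propto \frac{\sigma_x \sigma_w}{B^{DD}} \sqrt{N} \times \left( \frac{\sigma_{\Delta x}^2}{\sigma_x^2} + \frac{\sigma_{\Delta w}^2}{\sigma_w^2} \right) \tag{3.24}
\]
\[
\mathrm{NSR}_p \propto \frac{\sigma_x \sigma_w}{B^{D}} \sqrt{N} \times \left( \frac{\sigma_{\Delta x}^2}{\sigma_x^2} + \frac{\sigma_{\Delta w}^2}{\sigma_w^2} \right) \tag{3.25}
\]
Dividing (3.25) by (3.24) results in the following equation:
\[
\frac{\mathrm{NSR}_p}{\mathrm{NSR}_l} \propto B^{DD-D} \tag{3.26}
\]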
Considering that D is always greater than DD, the NSR of the CVNS Adaline using the proposed
algorithm will always be lower. This results in an Adaline with lower sensitivity to input and weight
errors. Therefore, the proposed CVNS multiplication algorithm can be used to design synapse
multiplier for robust neurochips with on-chip learning. To illustrate this, a case study is provided.
Figure 3.8: NSR of the CVNS Adaline using the proposed multiplication algorithm versus the
previously developed multiplication algorithm
Case Study: A CVNS Adaline with 16-bit weight storage resolution is considered. Inputs and
weights of the Adaline are uniformly distributed in the range of (-8,8). The input and weight
variance is equal to σx² = σw² = 16²/12. The radix of the CVNS is considered to be two. The number
of CVNS digits used in the CVNS-DNN Adaline, D + 1, is equal to four. The environment resolution
is four. Therefore, DD + 1 is equal to one.
NSR of the CVNS Adaline for number of inputs in the range of [2,10] using the previous and
proposed algorithm is calculated. This is conducted using (3.20) and (3.22). The results are shown
in Fig. 3.8. As may be noted from Fig. 3.8, the proposed multiplication algorithm results in
an Adaline with lower NSR. Therefore, the proposed algorithm provides neural network structures
which are more input and weight error tolerant.
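The ratio of the two NSR values in the case study can be reproduced approximately with a few lines of Python. The sketch assumes that the argument of the stochastic gain is σxσw√N divided by B^D (or B^DD), and it uses the piecewise estimate of g from (3.23). The common error term of (3.20) and (3.22) cancels in the ratio, so no error variances are needed. All names are illustrative.

```python
import math


def stochastic_gain(x):
    """g(x) per (3.23): about 1 for x <= 1, 0.5 + 0.53 x otherwise."""
    return 1.0 if x <= 1.0 else 0.5 + 0.53 * x


def nsr_ratio(n_inputs, sigma_x, sigma_w, radix, d, dd):
    """NSR_p / NSR_l; the shared error term of (3.20) and (3.22) cancels."""
    scale = sigma_x * sigma_w * math.sqrt(n_inputs)   # assumed placement of sqrt(N)
    return stochastic_gain(scale / radix ** d) / stochastic_gain(scale / radix ** dd)


if __name__ == "__main__":
    sigma = math.sqrt(16 ** 2 / 12)       # uniform distribution on (-8, 8)
    for n in range(2, 11):
        # Case-study values: B = 2, D + 1 = 4, DD + 1 = 1
        print(n, nsr_ratio(n, sigma, sigma, radix=2, d=3, dd=0))
```

Under these assumptions the ratio stays below one over the whole range of N from 2 to 10, in line with the trend reported for Fig. 3.8.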
3.5 Conclusion
A new CVNS multiplication algorithm is proposed in this paper. The proposed algorithm provides
accurate results in the low-resolution environment. Moreover, the VLSI implementation of a CVNS
synapse multiplier for neurochips with on-chip learning is realized. The post-layout simulations
of the implemented CVNS synapse multiplier confirm its performance. The NSR of a CVNS Adaline
using the proposed CVNS multiplication algorithm is compared with that of a CVNS Adaline
using the previously developed multiplication algorithm. The comparison shows that
the proposed CVNS multiplication algorithm provides a lower NSR. Therefore, the proposed CVNS
multiplication algorithm provides more robust neural network structures.
3.6 References
[1] B. Zamanlooy and M. Mirhassani, “Efficient hardware implementation of threshold neural net-
works,” in New Circuits and Systems Conference (NEWCAS), 2012 IEEE 10th International,
June 2012, pp. 1–4.
[2] A. Tisan, M. Cirstea, S. Oniga, and A. Buchman, “Artificial olfaction system with hardware
on-chip learning neural networks,” in 2010 12th International Conference on Optimization of
Electrical and Electronic Equipment (OPTIM), May 2010, pp. 884–889.
[3] G. Zatorre, N. Medrano, M. Sanz, B. Calvo, P. Martinez, and S. Celma, “Designing adaptive
conditioning electronics for smart sensing,” IEEE Sensors J., vol. 10, no. 4, pp. 831–838, Apr.
2010.
[4] R. Jiménez, M. Sánchez-Raya, J. Gómez-Galán, J. Flores, J. Dueñas, and I. Martel, “Implementation
of a neural network for digital pulse shape analysis on a FPGA for on-line identification of
heavy ions,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators,
Spectrometers, Detectors and Associated Equipment, vol. 674, pp. 99–104, 2012.
[5] S. Jung and S. su Kim, “Hardware implementation of a real-time neural network controller with
a DSP and an FPGA for nonlinear systems,” IEEE Trans. Ind. Electron., vol. 54, no. 1,
pp. 265–271, Feb. 2007.
[6] M. Stevenson, R. Winter, and B. Widrow, “Sensitivity of feedforward neural networks to weight
errors,” IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 71–80, Mar. 1990.
[7] Y. Xie and M. Jabri, “Analysis of the effects of quantization in multilayer neural networks using
a statistical model,” IEEE Trans. Neural Netw., vol. 3, no. 2, pp. 334–338, Mar. 1992.
[8] C. Alippi, V. Piuri, and M. Sami, “Sensitivity to errors in artificial neural networks: a behavioral
approach,” in IEEE International Symposium on Circuits and Systems (ISCAS), vol. 6, 1994,
pp. 459–462.
[9] S. Piche, “The selection of weight accuracies for Madalines,” IEEE Trans. Neural Netw., vol. 6,
no. 2, pp. 432–445, Mar. 1995.
[10] C. Alippi and L. Briozzo, “Accuracy vs. precision in digital VLSI architectures for signal pro-
cessing,” IEEE Trans. Comput., vol. 47, no. 4, pp. 472–477, Apr. 1998.
[11] X. Zeng and D. Yeung, “Sensitivity analysis of multilayer perceptron to input and weight
perturbations,” IEEE Trans. Neural Netw., vol. 12, no. 6, pp. 1358–1366, Nov. 2001.
[12] D. Yeung and X. Sun, “Using function approximation to analyze the sensitivity of MLP with
antisymmetric squashing activation function,” IEEE Trans. Neural Netw., vol. 13, no. 1, pp.
34–44, Jan. 2002.
[13] S.-S. Yang, C.-L. Ho, and S. Siu, “Computing and analyzing the sensitivity of MLP due to the
errors of the i.i.d. inputs and weights based on CLT,” IEEE Trans. Neural Netw., vol. 21, no. 12,
pp. 1882–1891, Dec. 2010.
[14] M. Mirhassani, M. Ahmadi, and G. Jullien, “Robust low-sensitivity Adaline neuron based on
continuous valued number system,” Analog Integrated Circuits and Signal Processing, vol. 56,
pp. 223–231, 2008.
[15] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “Resistive-type CVNS distributed neural
networks with improved noise-to-signal ratio,” IEEE Trans. Circuits Syst. II, vol. 57, no. 10, pp.
793–797, Oct. 2010.
[16] A. Saed, M. Ahmadi, and G. Jullien, “A number system with continuous valued digits and
modulo arithmetic,” IEEE Trans. Comput., vol. 51, no. 11, pp. 1294–1305, Nov. 2002.
[17] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “A study on resistive-type truncated
CVNS distributed neural networks,” in Circuits and Systems (ISCAS), 2011 IEEE International
Symposium on, 2011, pp. 2685–2688.
[18] K. Asanović and N. Morgan, “Experimental determination of precision requirements for back-
propagation training of artificial neural networks,” in 2nd International Conference on Micro-
electronics for Neural Network, 1991, pp. 9–15.
[19] J. Holt and T. Baker, “Back propagation simulations using limited precision calculations,” in
Neural Networks, 1991., IJCNN-91-Seattle International Joint Conference on, vol. 2, Jul. 1991,
pp. 121–126.
[20] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “A prototype CVNS distributed neural
network using synapse-neuron modules,” IEEE Trans. Circuits Syst. I, vol. 59, no. 7, pp. 1482–
1490, Jul. 2012.
[21] ——, “CVNS-based storage and refreshing scheme for a multi-valued dynamic memory,” IEEE
Trans. VLSI Syst., vol. 19, no. 8, pp. 1517–1521, Aug. 2011.
[22] ——, “16-level CVNS memory with fast ADC,” Electronics Letters, vol. 45, no. 16,
pp. 822–824, 2009.
[23] D. Freitas and K. Current, “CMOS current comparator circuit,” Electronics Letters, vol. 19,
no. 17, pp. 695–697, 1983.
[24] H. Djahanshahi, M. Ahmadi, G. Jullien, and W. Miller, “Quantization noise improvement in a
hybrid distributed-neuron ANN architecture,” IEEE Trans. Circuits Syst. II, vol. 48, no. 9,
pp. 842–846, Sep. 2001.
Chapter 4
Mixed-Signal VLSI Neural Network
Based on Continuous Valued Number
System
ASIC implementation of neural networks has been exploited in various applications. Examples
include pattern recognition [1], real-time surface discrimination [2], and smart sensing [3].
The ASIC implementation methods of neural networks may be categorized as analog, digital or
mixed-signal. In the analog neural networks, both weight storage and processing are conducted using
analog circuits. When implemented by analog circuits, neural networks typically possess a higher
energy efficiency, and require less area, in comparison with their identical digital implementation.
However, the capacitor-based weight storage in analog designs requires refreshing and is susceptible
to process and power supply variations [4].
In the digital implementation of a neural network, both weight storage and processing are carried
out in the digital domain. The third implementation method, mixed-signal, utilizes digital registers
for weight storage and analog circuits for signal processing. This method benefits from the ease
of weight storage in digital registers while capitalizing on the advantages of analog domain such
as compact addition and nonlinear neuron. Therefore, the mixed-signal method may be a suitable
candidate for the ASIC implementation of neural networks.
One of the main obstacles in using the mixed-signal method is the limited precision of the analog
signal, which is bounded by the precision of the analog circuits used. The precision of the analog
circuits is referred to as the environment resolution in the rest of this paper.
To use the benefits of the analog circuits while keeping the accuracy high, the Continuous Valued
Number System (CVNS) can be utilized. The information represented by each CVNS analog digit
is the same as the environment resolution. However, collectively a set of digits can increase the
precision of analog processing. This makes the CVNS appropriate for implementing high precision
analog and mixed-signal circuits [5–7].
The results of [5–7] show that implementation of the CVNS in TSMC 0.18µm and 90 nm with 4-
bit resolution for each analog digit is viable. Since the TSMC 0.18µm technology has been exploited
in this paper, the same setting is used.
In this paper, a mixed-signal 2-2-1 CVNS neural network structure is proposed. The proposed
structure realizes the XOR function. The weights of the network are stored in the digital registers
with 16-bit resolution. The signal processing of the proposed structure is based on the CVNS
arithmetic. Using the CVNS arithmetic, the resolution requirements of neural networks are satisfied.
The proposed network is designed and laid out in 0.18µm technology. The performance of the
network is confirmed by the post-layout simulations.
This paper is organized as follows. The proposed structure for the 2-2-1 CVNS network and its
VLSI implementation are explained in section 4.1. Post-layout simulation results are discussed in
section 4.2. Finally, conclusions are drawn in section 4.3.
4.1 VLSI Implementation of the CVNS Neural Network
In this section, the proposed structure and its VLSI implementation are discussed. The proposed
structure is a 2-2-1 network. Weights are stored in the digital registers while the signal processing
is based on the CVNS arithmetic.
The block diagram of the proposed structure is shown in Fig. 4.1. It is composed of four main
units including the input to CVNS converter, the hidden Adaline, the output Adaline, and the
output to binary converter. The proposed network realizes the two input XOR function.
The input to CVNS converter transforms the binary input to CVNS. The hidden Adaline receives
the input in the CVNS format. The CVNS input is multiplied by the weights stored in the digital
registers. The weights are stored with a 16-bit resolution. This makes the proposed structure
compatible with the weight storage resolution required by neural networks [8]. Since the weight and
input resolutions are 16 bits and 1 bit respectively, a CVNS multiplier with a resolution of 16×1 is
required for the hidden Adalines.
Figure 4.1: The block diagram of the 2-2-1 CVNS network realizing the XOR function
The bias stored in the digital registers is converted to the CVNS using the bias to CVNS converter
block. Then, the outputs of the CVNS multipliers and the bias to CVNS converter block are added
together by the CVNS adder. Since the resolution of the hidden layer multipliers is equal to 16×1
and the bias resolution is 16 bits, the required resolution of the hidden layer CVNS adder is 18 bits.
The output of the CVNS adder passes through the CVNS sigmoid function. The CVNS sigmoid
function is based on the structure developed in [9]. This structure provides an output with 8-bits
resolution. This in turn generates the output of the hidden Adaline.
The output Adaline is similar to the hidden Adaline. The difference is that the inputs to the
CVNS multipliers in this layer are the outputs of the CVNS sigmoid functions of the hidden Adalines.
Since the output resolution of the CVNS sigmoid function is 8-bits [9], CVNS multipliers with a
resolution of 16×8 are required for this layer. The output Adaline CVNS adder performs the
addition operation on the outputs of the two 16×8 CVNS multipliers and the bias to CVNS converter.
Therefore, it requires a resolution of 26 bits. The output to binary converter transforms the output
of the output Adaline to binary format.
Table 4.1: Required resolution of different arithmetic units
Arithmetic Unit            Required Resolution (bits)
Hidden layer multiplier    16×1
Hidden layer adder         18
Output layer multiplier    16×8
Output layer adder         26
CVNS sigmoid function      8
The resolution requirements of different arithmetic units in the proposed CVNS network are
summarized in Table 4.1. It is worth noting that the required resolution for various arithmetic
units is higher than the environment resolution. However, the CVNS arithmetic used for the VLSI
implementation makes the signal processing with the desired resolution feasible.
VLSI implementation of the proposed structure is realized using current-mode circuits. Generally,
current-mode circuits provide lower power consumption, higher speed and have the ability of working
with lower power supply voltages. Moreover, some arithmetic operations such as addition can be
easily realized. The VLSI implementation of each block is discussed in detail next.
4.1.1 Input to CVNS Converter
The absolute value of a real number X using fixed-point binary number system format can be shown
as follows:
\[
X = \sum_{i=-N_f}^{N_i-1} x_i \times 2^{i} \tag{4.1}
\]
where Ni and Nf are the number of integer and fractional digits, while xi is a binary digit.
A radix-2 CVNS digit ((X))m representing X can be truncated to the following equation [7]:
\[
((X))_m = \sum_{i=m-\phi+1}^{m} x_i \times 2^{i-m} \tag{4.2}
\]
where ϕ is the environment resolution. As previously discussed, the environment resolution for the
CVNS in 0.18µm technology is equal to four.
Using (4.2) with an environment resolution of four, the CVNS digit set

$$((X)) = \{((X))_3, ((X))_2, ((X))_1, ((X))_0\} = \left\{ \frac{x_0}{8}, \frac{x_0}{4}, \frac{x_0}{2}, x_0 \right\}$$

can represent the one-bit input to the XOR network in CVNS format.
In the VLSI implementation of the network, a current of 8 µA represents logic one. Therefore, the
VLSI implementation of the input to CVNS converter should generate the CVNS digit set
((X)) = {1 µA, 2 µA, 4 µA, 8 µA} when the input is one. Otherwise, all of the CVNS digits
representing the input are zero.
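The digit set above can be checked numerically. The following is a minimal sketch, assuming an ideal (error-free) converter; the function names `cvns_digit` and `input_to_cvns_currents` are illustrative and do not come from the original design.

```python
PHI = 4          # environment resolution stated in the text
UNIT_UA = 8.0    # current in µA that represents logic one, as stated in the text


def cvns_digit(bits, m, phi=PHI):
    """Truncated radix-2 CVNS digit ((X))_m of (4.2).

    `bits` maps a bit position i to the binary digit x_i; missing positions count as 0.
    """
    return sum(bits.get(i, 0) * 2.0 ** (i - m) for i in range(m - phi + 1, m + 1))


def input_to_cvns_currents(x0):
    """Ideal digit currents in µA for a one-bit input, ordered ((X))_3 ... ((X))_0."""
    bits = {0: x0}  # the network input has a single bit, x_0
    return [UNIT_UA * cvns_digit(bits, m) for m in (3, 2, 1, 0)]


print(input_to_cvns_currents(1))
print(input_to_cvns_currents(0))
```

For an input of one, the sketch reproduces the currents 1 µA, 2 µA, 4 µA and 8 µA listed in the text.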
The circuit configuration of the binary input to CVNS conversion for each CVNS digit ((X))i is
shown in Fig. 4.2. The input of the circuit is equal to X while the output represents ((X))i.
In the circuit shown in Fig. 4.2, the transistor M1 acts as a switch which turns on provided that
the input is one. By proper sizing of the transistors M2 and M3, the current corresponding to the
CVNS digit ((X))i is generated. The two transistors M3 and M4 act as a current mirror which copies
the current generated by the transistors M2 and M3 to the output.
4.1.2 Hidden Adaline
The two hidden Adalines shown in Fig. 4.1 are each composed of four main signal processing units:
the CVNS multiplier, the bias to CVNS converter, the CVNS adder and the CVNS sigmoid function.
Each block is discussed in turn below.
Multiplier
The CVNS inputs to the hidden Adaline are multiplied by the weights stored in the digital registers.
The CVNS multiplier in the hidden Adaline performs this multiplication.
Multiplication of the digital weights by the input in the CVNS format can be carried out using
the following equation [10]:

$$((Y))_m = \left( \sum_{i=-N_{fw}}^{N_{iw}-1} w_i\,((X))_{m-i} + 2^{-(\phi-1)} \left\lfloor \frac{C_{m-\phi}}{2} \right\rfloor \right) \bmod 2 \tag{4.3}$$

where $N_{iw}$ and $N_{fw}$ are the number of integer and fractional bits representing the weights,
respectively, $w_i$ are the binary digits of the weight, and $((X))_m$ and $((Y))_m$ are the CVNS digits
representing the input and the multiplication result, respectively. $C_{m-\phi}$ can be obtained in the
following form [10]:

$$C_{m-\phi} = \sum_{i=-N_{fw}}^{N_{iw}-1} w_i\,((X))_{m-\phi-i} \tag{4.4}$$
A resolution of 16 bits with a range of (-8,8) is required for weight and bias storage [8]. Four
integer bits are required to cover the (-8,8) range. The remaining 12 bits are the fractional bits.
Figure 4.2: VLSI implementation of the binary input to CVNS conversion, $b_i \times 8\,\mu\mathrm{A} \times 2^{i-m}$, and $1\,\mu\mathrm{A} \times \left( C^{a}_{m-\phi} \;\mathrm{cmp}\; 16\,\mu\mathrm{A} \right)$
Since the weights and the input have a resolution of 16 bits and 1 bit, respectively, the output
resolution of the CVNS multiplier is 16 bits. Moreover, the same number of integer and fractional bits
used to represent the weights can represent the multiplication result. Therefore, using (4.1) and (4.2)
with an environment resolution of four, the CVNS digit set
$((Y)) = \{((Y))_3, ((Y))_{-1}, ((Y))_{-5}, ((Y))_{-9}\}$
can represent the output of the hidden Adaline CVNS multiplier. The CVNS digit $((Y))_3$ carries
the information of the integer bits, while the CVNS digits $((Y))_{-1}$, $((Y))_{-5}$ and $((Y))_{-9}$
carry the information of the fractional bits.
The maximum of (4.4) occurs when all the $w_i$ bits are one and the CVNS digits $((X))_m$ are at
their maximum value. For the hidden Adaline CVNS multiplier, it can be easily proven that
$C_{m-\phi}$ is always less than two, so the floor term in (4.3) vanishes. Therefore, (4.3) can be
modified in the following form:

$$((Y))_m = \left( \sum_{i=-N_{fw}}^{N_{iw}-1} w_i\,((X))_{m-i} \right) \bmod 2 \tag{4.5}$$

The maximum of $\sum_{i=-N_{fw}}^{N_{iw}-1} w_i\,((X))_{m-i}$ occurs when all the $w_i$ bits are one and
the CVNS digits $((X))_{m-i}$ are at their maximum value. Therefore,
$\sum_{i=-N_{fw}}^{N_{iw}-1} w_i\,((X))_{m-i} < 2$. If the input to the mod 2 function is less than 2,
the function passes the input to the output without change. Thus, (4.5) can be written as follows:

$$((Y))_m = \sum_{i=-N_{fw}}^{N_{iw}-1} w_i\,((X))_{m-i} \tag{4.6}$$

Figure 4.3: VLSI implementation of the first layer multiplier
Considering that the input CVNS digit set $((X))$ includes only the CVNS digits $((X))_0$ to $((X))_3$,
(4.6) can be simplified in the following form:

$$((Y))_m = \sum_{i=m-3}^{m} w_i\,((X))_{m-i} \tag{4.7}$$
The VLSI implementation of (4.7) is shown in Fig. 4.3. The $w_i\,((X))_{m-i}$ terms are realized
using transistors which receive the digits $((X))_{m-i}$ as their inputs and are controlled by the
$w_i$ bits. Since current-mode circuits are used, the summation is implemented simply by wiring
the transistor outputs together.
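The behaviour of (4.7) can be illustrated with a short behavioural model. This is a sketch under stated assumptions: ideal currents, an unsigned weight magnitude (sign handling is not modelled), and hypothetical helper names (`weight_bits`, `input_digits`, `multiply`) that are not part of the original design.

```python
def weight_bits(magnitude, n_int=4, n_frac=12):
    """Bits w_i of an unsigned fixed-point magnitude, keyed by position i (-n_frac .. n_int-1)."""
    code = int(round(magnitude * 2 ** n_frac))
    if not 0 <= code < 2 ** (n_int + n_frac):
        raise ValueError("magnitude outside the representable range")
    return {i: (code >> (i + n_frac)) & 1 for i in range(-n_frac, n_int)}


def input_digits(x0, unit_ua=8.0):
    """One-bit input as ideal CVNS digit currents ((X))_0 .. ((X))_3 in µA."""
    return {j: unit_ua * x0 * 2.0 ** (-j) for j in range(4)}


def multiply(w, x_digits, digit_positions=(3, -1, -5, -9)):
    """Evaluate (4.7): ((Y))_m = sum over i = m-3 .. m of w_i * ((X))_{m-i}."""
    return {m: sum(w.get(i, 0) * x_digits[m - i] for i in range(m - 3, m + 1))
            for m in digit_positions}


# Example weight 5.8125 = 0101.1101 0000 0000 in binary (chosen only for illustration).
w = weight_bits(5.8125)
print(multiply(w, input_digits(1)))  # digit currents in µA
print(multiply(w, input_digits(0)))  # all digits are zero when the input is zero
```

Because the input has a single bit, each output digit is the corresponding four-bit group of the weight scaled to current, gated by the input.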
Bias to CVNS Converter
The bias is stored in the digital registers with the same resolution and variation range as the weights.
Since the CVNS adder accepts its inputs in the CVNS format, the bias value stored in the digital
registers must be converted to CVNS.
The 16-bit bias can be represented by four CVNS digits. Since the bias variation range is (-8,8),
it is represented with 4 integer bits and 12 fractional bits. Therefore, using (4.2), the CVNS digit
set $((B)) = \{((B))_3, ((B))_{-1}, ((B))_{-5}, ((B))_{-9}\}$ can represent the bias in the CVNS format.
The digit $((B))_3$ carries the information of the four integer bits, while $((B))_{-1}$, $((B))_{-5}$
and $((B))_{-9}$ carry the information of the twelve fractional bits.
Keeping in mind that 8 µA represents logic one in the implemented hardware, and using (4.2),
the CVNS digits representing the bias can be written in the following form:

$$((B))_m = \sum_{i=m-3}^{m} b_i \times 8\,\mu\mathrm{A} \times 2^{i-m} \tag{4.8}$$
According to (4.8), the bits $b_m$, $b_{m-1}$, $b_{m-2}$ and $b_{m-3}$ should generate 8 µA, 4 µA,
2 µA and 1 µA, respectively, provided that the corresponding bias bit is one.
The VLSI implementation of the $b_i \times 8\,\mu\mathrm{A} \times 2^{i-m}$ terms uses the same circuit
configuration shown in Fig. 4.2. To implement these terms, $b_i$ is applied to the input of the
circuit, while the output is $b_i \times 8\,\mu\mathrm{A} \times 2^{i-m}$. By proper sizing of the
transistors M2 and M3, each bias bit $b_i$ generates a current equal to
$8\,\mu\mathrm{A} \times 2^{i-m}$ provided that $b_i$ is one. The summation in (4.8) is implemented
by wiring together the outputs of the circuits realizing the
$b_i \times 8\,\mu\mathrm{A} \times 2^{i-m}$ terms.
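As a worked illustration of (4.8), the sketch below converts a bias magnitude into the four digit currents. It assumes ideal current sources and an unsigned magnitude; the function name `bias_to_cvns_currents` and the example value 2.5 are hypothetical.

```python
def bias_to_cvns_currents(magnitude, n_int=4, n_frac=12, unit_ua=8.0):
    """Ideal digit currents in µA for ((B))_3, ((B))_-1, ((B))_-5, ((B))_-9, following (4.8)."""
    code = int(round(magnitude * 2 ** n_frac))
    if not 0 <= code < 2 ** (n_int + n_frac):
        raise ValueError("magnitude outside the representable range")

    def bit(i):
        # Bias bit b_i, where i = -n_frac is the least significant bit.
        return (code >> (i + n_frac)) & 1

    digits = {}
    for m in (3, -1, -5, -9):
        # Each digit sums four binary-weighted currents: 8, 4, 2 and 1 µA.
        digits[m] = sum(bit(i) * unit_ua * 2.0 ** (i - m) for i in range(m - 3, m + 1))
    return digits


# 2.5 = 0010.1000 0000 0000 in binary: b_1 and b_-1 are set.
print(bias_to_cvns_currents(2.5))
```

In this example $b_1$ contributes 2 µA to $((B))_3$ and $b_{-1}$ contributes 8 µA to $((B))_{-1}$, matching the binary weighting described above.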
CVNS Adder
The CVNS adder is designed based on the CVNS addition algorithm developed in [9]. Addition of
two CVNS operands is carried out by the addition of their corresponding CVNS digits. Addition
of two CVNS digits $((Y_1))_m$ and $((Y_2))_m$ in a low-resolution environment is conducted based on the
following equation [9]:

$$((Z))_m = \left[ ((Y_1))_m + ((Y_2))_m + 1\,\mu\mathrm{A} \times \left( C^{a}_{m-\phi} \;\mathrm{cmp}\; 16\,\mu\mathrm{A} \right) \right] \bmod 16\,\mu\mathrm{A} \tag{4.9}$$

where $((Z))_m$ is the CVNS digit representing the addition result. $C^{a}_{m-\phi}$ can be obtained using the
following equation [9]:

$$C^{a}_{m-\phi} = ((Y_1))_{m-\phi} + ((Y_2))_{m-\phi} \tag{4.10}$$

The term $C^{a}_{m-\phi} \;\mathrm{cmp}\; 16\,\mu\mathrm{A}$ can be written in the following form [9]:

$$C^{a}_{m-\phi} \;\mathrm{cmp}\; 16\,\mu\mathrm{A} = \begin{cases} 1 & C^{a}_{m-\phi} \ge 16\,\mu\mathrm{A} \\ 0 & C^{a}_{m-\phi} < 16\,\mu\mathrm{A} \end{cases} \tag{4.11}$$
The block diagram of the CVNS adder for two CVNS digits $((Y_1))_m$ and $((Y_2))_m$ is shown
in Fig. 4.4. The two CVNS digits $((Y_1))_m$ and $((Y_2))_m$, along with the output of the
$1\,\mu\mathrm{A} \times \left( C^{a}_{m-\phi} \;\mathrm{cmp}\; 16\,\mu\mathrm{A} \right)$ block, are wired
together. Since current-mode circuits are used, this realizes the summation present in (4.9).
Afterwards, the mod 16 µA operation is applied, which generates the CVNS digit representing the
addition result.

Figure 4.4: Block diagram of the CVNS adder

The VLSI implementation of $1\,\mu\mathrm{A} \times \left( C^{a}_{m-\phi} \;\mathrm{cmp}\; 16\,\mu\mathrm{A} \right)$
is accomplished using the circuit configuration shown in Fig. 4.2. To realize this term, the input
of the circuit is $C^{a}_{m-\phi} \;\mathrm{cmp}\; 16\,\mu\mathrm{A}$, while the output is
$1\,\mu\mathrm{A} \times \left( C^{a}_{m-\phi} \;\mathrm{cmp}\; 16\,\mu\mathrm{A} \right)$. The transistor M1
acts as a switch which turns on provided that $C^{a}_{m-\phi}$ is greater than or equal to 16 µA,
in agreement with (4.11). Transistors M2 and M3 are sized to generate the 1 µA current, which is
copied to the output through the current mirror formed by the transistors M3 and M4. The VLSI
implementation of the mod 16 µA operation is explained in detail in [10]. The same circuit is used here.
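The digit-level addition of (4.9) to (4.11) can be sketched behaviourally as follows. The model follows the equations literally with ideal currents, so the carry term for digit $m$ depends only on the digit pair at position $m-\phi$; the function name `cvns_add` and the operand values are illustrative assumptions.

```python
def cvns_add(y1, y2, phi=4, full_scale_ua=16.0, carry_ua=1.0):
    """Digit-wise CVNS addition following (4.9)-(4.11); digit currents are in µA.

    `y1` and `y2` map a digit position m to the digit current ((Y))_m.
    """
    z = {}
    for m in y1:
        lower = m - phi
        # (4.10): the carry information comes from the digit pair phi positions below;
        # a position that does not exist contributes no current.
        c = y1.get(lower, 0.0) + y2.get(lower, 0.0)
        carry = carry_ua if c >= full_scale_ua else 0.0      # (4.11)
        z[m] = (y1[m] + y2[m] + carry) % full_scale_ua       # (4.9)
    return z


# 2.5625 + 1.5 = 4.0625, written as digit currents for positions 3, -1, -5 and -9.
y1 = {3: 2.0, -1: 9.0, -5: 0.0, -9: 0.0}
y2 = {3: 1.0, -1: 8.0, -5: 0.0, -9: 0.0}
print(cvns_add(y1, y2))
```

In the example, the digits at position -1 sum to 17 µA, so the mod operation leaves 1 µA there and the comparator injects 1 µA into the digit at position 3.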
CVNS Sigmoid Function
A CVNS sigmoid function evaluation structure is developed in [9]. The same structure is used in
the proposed network. The CVNS sigmoid function receives its input from the CVNS adder and
provides its output in the CVNS format. The output resolution of the CVNS sigmoid function is
8 bits.
4.1.3 Output Adaline
The output Adaline is similar to the hidden Adaline. However, the resolution of the CVNS multiplier
implemented in the output Adaline is different. Since the weight storage resolution and the CVNS
sigmoid function output resolution are 16 bits and 8 bits, respectively, a CVNS multiplier with a
resolution of 16×8 bits is needed. A CVNS multiplier with the required resolution of
16×8 is developed in [10]. The same multiplier is used in the output Adaline of the proposed CVNS
network.
Figure 4.5: VLSI implementation of the output to binary converter
4.1.4 Output to Binary Converter
The output to binary converter transforms the CVNS output to binary format. The radix-2 CVNS
digits can be converted back to the binary digits using the following equation [9]:

$$x_m = \begin{cases} 1 & ((X))_m \ge 8\,\mu\mathrm{A} \\ 0 & ((X))_m < 8\,\mu\mathrm{A} \end{cases} \tag{4.12}$$
The CVNS sigmoid function developed in [9] provides the output in the CVNS format using two
CVNS digits. Since the network output is only one bit, only the CVNS digit that carries the more
significant bits is converted back to binary.
To perform the comparison, the current comparator developed in [11] is used. The VLSI
implementation of the output to binary converter is shown in Fig. 4.5. The transistors M1 and M2
act as a current mirror which copies the input current to the current comparator. The transistors M3
and M4 generate an 8 µA current. This current is copied to the transistor M5 through the current
mirror formed by the transistors M4 and M5. The current comparator composed of the transistors
M2 and M5 compares the input current with the 8 µA generated by the transistors M3 and M4. The
inverter chain converts the output of the current comparator to a rail-to-rail output.

Figure 4.6: Layout of the implemented network
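The decision rule of (4.12) amounts to a single threshold comparison. The following minimal sketch models it with an ideal comparator; the function name `cvns_digit_to_bit` is a hypothetical label, and comparator offset or delay is not modelled.

```python
def cvns_digit_to_bit(digit_ua, threshold_ua=8.0):
    """Recover the binary digit x_m from a CVNS digit current in µA, following (4.12)."""
    return 1 if digit_ua >= threshold_ua else 0


for current in (0.0, 7.9, 8.0, 12.5):
    print(current, cvns_digit_to_bit(current))
```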
4.2 Simulation Results
Using the circuits discussed in the previous sections, the proposed network is laid out.
The layout of the implemented network is shown in Fig. 4.6. The layout has an area of
113031.79 µm². The designed layout has been sent for fabrication. The layout of the chip sent for
fabrication is shown in Fig. 4.7.
The developed network is trained off-line. Weights are calculated in MATLAB. Then, the calcu-
lated weights are loaded into the digital registers to conduct the simulations.
The post-layout simulation results are shown in Fig. 4.8. Two inputs with different periods
are applied to cover all possible combinations. As may be noted from Fig. 4.8, the output exactly
matches the expected output for different combinations of inputs.
The post-layout simulation results show a worst-case propagation delay of 99.68 ns. To measure
the power consumption, a uniform random bit stream of 1000 inputs with an input period of
99.68 ns is applied to the network. The simulation result shows a power consumption of 17.59 mW.
Figure 4.7: Layout of the chip sent for fabrication
Figure 4.8: Post-layout simulation results of the proposed CVNS network
Area, delay and power consumption of the implemented network are summarized in Table 4.2.
Table 4.2: Area, delay and power consumption of the implemented network
Area (µm²)      Delay (ns)      Power consumption (mW)
113031.79       99.68           17.59
4.3 Conclusion
A 2-2-1 mixed-signal CVNS neural network structure is proposed in this paper. Using the CVNS
arithmetic, the problem of limited resolution of analog signal processing in mixed-signal neural
networks is resolved. This opens a path to design mixed-signal structures which satisfy the signal
processing resolution requirements of neural networks. The proposed 2-2-1 network is designed, laid
out, and post-layout simulated in 0.18µm CMOS technology. The post-layout simulations confirm
the operation of the proposed network.
4.4 References
[1] B. Zamanlooy and M. Mirhassani, “Efficient VLSI implementation of neural networks with
hyperbolic tangent activation function,” IEEE Trans. VLSI Syst., vol. 22, no. 1, pp. 39–48, Jan. 2014.
[2] L. Gatet, H. Tap-Beteille, and M. Lescure, “Real-time surface discrimination using an analog
neural network implemented in a phase-shift laser rangefinder,” IEEE Sensors J., vol. 7, no. 10,
pp. 1381–1387, Oct. 2007.
[3] G. Zatorre, N. Medrano, M. Sanz, B. Calvo, P. Martinez, and S. Celma, “Designing adaptive
conditioning electronics for smart sensing,” IEEE Sensors J., vol. 10, no. 4, pp. 831–838, Apr. 2010.
[4] M. Valle, “Analog VLSI implementation of artificial neural networks with supervised on-chip
learning,” Analog Integrated Circuits and Signal Processing, vol. 33, pp. 263–287, 2002.
[5] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “A prototype CVNS distributed neural
network using synapse-neuron modules,” IEEE Trans. Circuits Syst. I, vol. 59, no. 7,
pp. 1482–1490, July 2012.
[6] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “CVNS-based storage and refreshing scheme
for a multi-valued dynamic memory,” IEEE Trans. VLSI Syst., vol. 19, no. 8, pp. 1517–1521, Aug. 2011.
[7] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “16-level CVNS memory with fast ADC,”
Electronics Letters, vol. 45, no. 16, pp. 822–824, 2009.
[8] K. Asanovic and N. Morgan, “Experimental determination of precision requirements for
back-propagation training of artificial neural networks,” in 2nd International Conference on
Microelectronics for Neural Network, 1991, pp. 9–15.
[9] B. Zamanlooy and M. Mirhassani, “CVNS-Based sigmoid function evaluation for precise
neurochips,” submitted to IEEE Trans. Neural Netw.
[10] B. Zamanlooy and M. Mirhassani, “CVNS synapse multiplier for robust neurochips with on-chip
learning,” submitted to IEEE Trans. VLSI Syst.
[11] D. Freitas and K. Current, “CMOS current comparator circuit,” Electronics Letters, vol. 19,
no. 17, pp. 695–697, 1983.
Chapter 5
Area-Efficient Robust Madaline Based on
Continuous Valued Number System
5.1 Introduction
Since the advent of artificial neural networks, software and hardware implementation methods have
been used for their realization. The hardware implementation methods have been exploited in
applications that require real-time and energy-efficient processing [1–5].
Neural network hardware implementation methods may be classified as analog, digital or mixed-signal.
In the analog implementation, both weight storage and processing are in the analog domain.
When implemented by analog circuits, neural networks typically achieve higher energy efficiency,
need fewer interconnections, and require less area than their equivalent digital
implementations. However, the capacitor-based weight storage methods require a refresh signal and
are sensitive to process and power supply variations [6]. In the digital implementation of a neural
network, both weight storage and processing are done in the digital domain. The third
implementation method, mixed-signal, uses digital registers for weight storage and analog circuits
for signal processing. This method exploits the ease of weight storage in digital registers while
capitalizing on the advantages of the analog domain, such as compact addition and nonlinear neuron
implementation.
The basic theory behind the neural network can be essentially described as a series of weights
that, when applied to distinct inputs, provide the appropriate corresponding output. Due to the
limited word length available in hardware implementations of neural networks, inputs and weights
are represented with finite precision, which degrades the output response. Therefore, the robustness to
input and weight errors becomes a key issue in neural network architectures. To quantify this issue,
the sensitivity of neural networks to input and weight imprecisions has been studied extensively
[7–14].
The sensitivity analysis can be classified as one of two approaches: the geometrical approach or
the statistical approach. The geometrical method uses a hypersphere or a hyper-rectangle model to
analyze the output sensitivity, while the statistical method determines the sensitivity by calculating
the Noise-to-Signal Ratio (NSR). The NSR is defined as the ratio of the variance of output error
to the variance of ideal output. Therefore, a network with a lower NSR is more tolerant to input
and weight errors. In this paper, the statistical approach is utilized to study the effects of input and
synaptic weight perturbations on the output.
Alternative number systems can be exploited to design low-NSR neural network architectures.
The Continuous Valued Number System (CVNS) is a mixed-signal number system whose
analog digits share information. This feature enables multiple error correction in a digit set [15] and
makes this number system a candidate for implementing analog/mixed-signal neural networks with
a low NSR [16–18].
A CVNS neural network is presented in [16]. This structure is analog, in which the synaptic
weights are stored in the CVNS format and processing is accomplished using CVNS arithmetic. To
decrease the output error, the output of the Adaline is generated based on the Reverse Evolution
(RE) process [15]. This architecture is referred to as CVNS-RE in the rest of this paper.
Using the information redundancy present in the CVNS and the distributed neuron structure,
two architectures named CVNS-DNN and CVNS-FDNN were developed in [17]. Similar to the
CVNS-RE architecture, both of these structures store the weights in the CVNS format while the
processing is carried out using the CVNS arithmetic.
The previously developed CVNS neural network architectures [16–18] store synaptic weights in
analog memories which require a refresh signal and are sensitive to process and power supply
variations. Moreover, the NSR improvement of the previous architectures is achieved at the cost of
extra neurons, which results in area overhead and increased power consumption. Although NSR is
generally a suitable measure of the effect of input and weight errors on the network output,
the efficiency of different architectures cannot be measured by this quantity alone. A better
indicator of the efficiency of a network model is the product of the total number of neurons in the
network and the network NSR, which provides a better estimate of the area and power
consumption needed for a specific NSR level.
To investigate the efficiency of various Madaline architectures based on the previous CVNS
Adalines, mathematical analysis of NSR and neuron×NSR of these architectures is required. It is worth
noting that only the NSR of CVNS Adalines has been considered previously. The mathematical
analysis performed in this paper allows the efficiency evaluation of different neural network
architectures.
In this paper, a new mixed-signal Adaline is proposed. The proposed architecture stores the
weights in digital registers while the arithmetic is based on the CVNS number system. Using digital
registers to store weights eliminates the need for a refresh signal and provides a process and power
supply variation tolerant storage mechanism, eliminating the problems caused by the analog weight
storage methods used in the previous CVNS structures [16–18].
In the previously developed CVNS networks, the total number of analog memories required
for each Adaline is proportional to the number of CVNS digits representing the synaptic weights.
However, the total number of registers in the proposed CVNS Adaline is independent of the number
of CVNS digits. This in turn results in reduced weight storage elements which leads to lower area
overhead and lower power consumption. Furthermore, the RE process is used to decrease the error
in the CVNS digits, which improves the NSR.
Using the proposed Adaline, a Madaline named CVNS-distributed is proposed. The CVNS-distributed
Madaline uses the proposed CVNS Adaline at its output layer. The NSR and the
neuron×NSR of the proposed Madaline are formulated and compared with conventional lumped and
distributed structures as well as previous CVNS structures. This analysis shows that the proposed
architecture is more immune to input and weight errors while simultaneously requiring a lower number of
neurons for a specific NSR, resulting in an architecture with a lower area overhead and lower power
consumption. In this research, a three-layer Madaline is implemented in TSMC 0.18 µm CMOS.
The implementation results confirm the improvement offered by the proposed Madaline in terms of NSR
and the area required for a specific NSR.
The rest of this paper is organized as follows. In the next section, CVNS is briefly introduced. The
NSR and the stochastic model proposed by Piche [10] are briefly discussed in section 5.3. Moreover,
the NSR of previous Adalines is summarized in that section. The mathematical derivation of the
total number of neurons and NSR of a Madaline, based on the previous structures, is provided
in section 5.4, where the related equations are developed. The proposed Adaline and Madaline
structures are explained in section 5.5. Comparisons with existing structures are carried out in section
5.6. The VLSI implementation of a three-layer Madaline and comparisons with previous structures
are presented in section 5.7. Finally, conclusions are drawn in section 5.8.
5.2 Continuous Valued Number System (CVNS)
The absolute value of a real number $x$ using fixed-point number representation with a radix of $B$
can be written as follows:

$$x = \sum_{i=-N_f}^{N_i-1} x_i \times B^{i} \tag{5.1}$$

where $N_i$ and $N_f$ are the number of integer and fractional digits, while $x_i$ is the digit.
CVNS is a number system with continuous valued digits. CVNS analog digits representing the
real number $x$ can be generated as follows [15]:

$$((x))_i = (x \times B^{-i}) \bmod B \tag{5.2}$$

where $((x))_i$ is the CVNS digit and $-N_f \le i \le N_i-1$. An ensemble of CVNS digits
$((x)) = \{((x))_{N_i-1}, \ldots, ((x))_0, ((x))_{-1}, \ldots, ((x))_{-N_f}\}$ represents the input $x$ in CVNS format.
Example: Finding the CVNS digit set of $x = 81.92$ with a radix of 10
According to (5.1), the number of integer digits, $N_i$, and the number of fractional digits, $N_f$,
required to represent $x = 81.92$ are both equal to two. Therefore, the CVNS digit set
$((x)) = \{((x))_1, ((x))_0, ((x))_{-1}, ((x))_{-2}\}$ can represent $x = 81.92$ in CVNS format.
To obtain the CVNS digits, (5.2) is used. This results in the CVNS digit set
$((x)) = \{8.192, 1.92, 9.2, 2\}$.
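The example can be reproduced with a few lines of code. This is a minimal sketch of (5.2) that uses exact rational arithmetic to avoid floating-point rounding; the function name `cvns_digits` is an illustrative choice.

```python
from fractions import Fraction


def cvns_digits(x, radix, n_int, n_frac):
    """CVNS digits ((x))_i of (5.2), ordered from i = n_int - 1 down to i = -n_frac."""
    x = Fraction(x)
    return [(x * Fraction(radix) ** (-i)) % radix
            for i in range(n_int - 1, -n_frac - 1, -1)]


digits = cvns_digits("81.92", radix=10, n_int=2, n_frac=2)
print([float(d) for d in digits])
```

Each digit repeats the lower-order part of the one above it (8.192 contains 1.92, which in turn determines 9.2 and 2), which is the information sharing discussed next.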
As can be seen, the CVNS digits share information. The information sharing between CVNS
digits allows for digit-level error detection, error correction and accurate arithmetic based on
analog and mixed-signal circuits. The digit-level error correction is carried out through the Reverse
Evolution (RE) process [15].
5.3 Noise to Signal Ratio of Previous Adalines
Piche [10] presented a stochastic model for the NSR of a lumped Adaline. In the general structure
of lumped neural networks, inputs are multiplied by weights in the synapse, summed at the node,
and passed through a nonlinear function in the neuron [19].
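The stochastic model of the lumped Adaline presented in [10] is based on the input NSR,
$\sigma^2_{\Delta x}/\sigma^2_x$, and the weight NSR, $\sigma^2_{\Delta w}/\sigma^2_w$, and is as follows:

$$\mathrm{NSR} = g(\sqrt{N}\sigma_x\sigma_w) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right) \tag{5.3}$$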
Figure 5.1: (a) Ideal stochastic gain function and its approximation; (b) approximation error of the
stochastic gain function
The Adaline NSR, $\sigma^2_{\Delta y}/\sigma^2_y$, is the ratio of the variance of output error to the
variance of ideal output, and depends on the stochastic gain function, $g$. This gain function is a
function of $\sqrt{N}\sigma_x\sigma_w$, where $N$ is the number of inputs, and $\sigma_x$ and $\sigma_w$
are the standard deviations of inputs and weights, respectively.
Assuming $x = \sqrt{N}\sigma_x\sigma_w$, the stochastic gain function, $g(x)$, can be estimated as follows:

$$g(x) = \begin{cases} 1 & x \le 1 \\ 0.5 + 0.53 \times x & x > 1 \end{cases} \tag{5.4}$$
The stochastic gain function and its approximation are shown in Fig. 5.1a, while the approximation
error is shown in Fig. 5.1b, which indicates a maximum approximation error of 15.5%. According to (5.4),
the stochastic gain is a linear function of its input for inputs greater than one, while for inputs
less than or equal to one it has a constant value of one.
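The approximation in (5.4) and the lumped Adaline NSR of (5.3) can be evaluated directly. The sketch below is illustrative only: the function names and the numerical inputs (16 inputs, unit standard deviations, 1% input and weight NSR) are assumptions chosen for the example, not values from this work.

```python
import math


def stochastic_gain(x):
    """Piecewise approximation of the stochastic gain function, following (5.4)."""
    return 1.0 if x <= 1.0 else 0.5 + 0.53 * x


def lumped_adaline_nsr(n_inputs, sigma_x, sigma_w, nsr_x, nsr_w):
    """Output NSR of a lumped Adaline, following (5.3).

    nsr_x and nsr_w are the input and weight noise-to-signal ratios.
    """
    gain = stochastic_gain(math.sqrt(n_inputs) * sigma_x * sigma_w)
    return gain * (nsr_x + nsr_w)


# Hypothetical operating point: sqrt(16) * 1 * 1 = 4, which lies in the linear region.
print(stochastic_gain(0.5))
print(lumped_adaline_nsr(16, 1.0, 1.0, 0.01, 0.01))
```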
The stochastic model for the NSR of an Adaline can be represented by a flow diagram as shown in
Fig. 5.2 [10]. In this flow diagram, the input and weight errors, $\sigma^2_{\Delta x}/\sigma^2_x$ and
$\sigma^2_{\Delta w}/\sigma^2_w$, are summed at a node and then multiplied by the stochastic gain,
resulting in $\sigma^2_{\Delta y}/\sigma^2_y$, which represents the NSR of the Adaline.
Figure 5.2: NSR flow diagram of an Adaline

The stochastic model developed by Piche [10] is used in [16, 17, 19] to analyze the NSR of the
developed Adalines. The NSR analyses performed in these papers are summarized in Table 5.1.

Table 5.1: NSR of the previous Adalines
Structure          NSR
Lumped [10]        $g(\sqrt{N}\sigma_x\sigma_w) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right)$
Distributed [19]   $g\left(\frac{\sigma_x\sigma_w}{\sqrt{N}}\right) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right)$
CVNS-RE [16]       $g\left(\frac{\sigma_x\sigma_w}{\sqrt{N}}\right) \times B^{-2D} \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right)$
CVNS-DNN [17]      $g\left(\frac{\sigma_x\sigma_w}{B^{D}\sqrt{N}}\right) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right)$
CVNS-FDNN [17]     $g\left(\frac{\sigma_x\sigma_w}{B^{D(D+1)}\sqrt{N}}\right) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right)$

As shown in Table 5.1, the NSR of the Adalines developed in [16, 17, 19] is similar to the NSR of
the lumped Adaline [10], with the difference that they have a modified stochastic gain function. In
the distributed structure [19], the stochastic gain function is changed to
$g\left(\frac{\sigma_x\sigma_w}{\sqrt{N}}\right)$, while the stochastic gain of the CVNS-RE structure [16]
is equal to $g\left(\frac{\sigma_x\sigma_w}{\sqrt{N}}\right)$ multiplied by $B^{-2D}$.
It should be noted that $B$ is the radix of the CVNS digits, while $D + 1$ CVNS digits are used to
store the synaptic weights. Moreover, the $B^{-2D}$ factor is indicative of the RE process [15] used in
this structure. The CVNS-DNN and CVNS-FDNN structures developed in [17] have stochastic
gain functions equal to $g\left(\frac{\sigma_x\sigma_w}{B^{D}\sqrt{N}}\right)$ and
$g\left(\frac{\sigma_x\sigma_w}{B^{D(D+1)}\sqrt{N}}\right)$, respectively. A comparison of these structures
in terms of NSR and neuron×NSR will be performed in section 5.6.
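The entries of Table 5.1 can be compared numerically with a small helper. This is a sketch under stated assumptions: the gain uses the approximation (5.4), the CVNS-FDNN scaling is read from the table as $B^{D(D+1)}$, and the function name, structure labels and example values are hypothetical.

```python
import math


def g(x):
    """Approximate stochastic gain function of (5.4)."""
    return 1.0 if x <= 1.0 else 0.5 + 0.53 * x


def adaline_nsr(structure, n, sigma_x, sigma_w, err, radix=2, d=0):
    """Adaline NSR for the structures of Table 5.1.

    err is the sum of the input NSR and the weight NSR; D + 1 = d + 1 CVNS digits
    store each weight.
    """
    s = sigma_x * sigma_w
    root_n = math.sqrt(n)
    if structure == "lumped":
        gain = g(root_n * s)
    elif structure == "distributed":
        gain = g(s / root_n)
    elif structure == "cvns-re":
        gain = g(s / root_n) * radix ** (-2 * d)
    elif structure == "cvns-dnn":
        gain = g(s / (radix ** d * root_n))
    elif structure == "cvns-fdnn":
        # Exponent D(D+1) as read from Table 5.1 (an interpretation of the table entry).
        gain = g(s / (radix ** (d * (d + 1)) * root_n))
    else:
        raise ValueError(f"unknown structure: {structure}")
    return gain * err


for name in ("lumped", "distributed", "cvns-re", "cvns-dnn", "cvns-fdnn"):
    print(name, adaline_nsr(name, n=16, sigma_x=1.0, sigma_w=1.0, err=0.02, radix=2, d=3))
```

For this hypothetical operating point the distributed, CVNS-DNN and CVNS-FDNN gains all saturate at one, so their differences only appear when the argument of $g$ exceeds one.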
5.4 Mathematical Analysis of Madaline Structures
The Adaline is the basic element of a neural network in which the inputs are multiplied by weights,
summed at the node, and passed through a nonlinear activation function. The Madaline is a multilayer
extension of the Adaline. In a Madaline, the output of every Adaline is used as an input to all
Adalines in the next layer [20]. The general configuration of a Madaline is shown in Fig. 5.3. The
Madaline has $L + 1$ layers with $N_0$ inputs, and $N_i$ Adalines in each layer $i$.
Figure 5.3: Madaline general configuration
In this section, the total number of neurons and the NSR of several Madalines are derived
mathematically. The NSR of a multilayer network is affected by the errors in the
inputs to each layer, as well as errors in the synapse weights. These errors and variations are caused
by various sources in the network; for example, circuit nonlinearity can create errors at the output
of the neurons, which leads to errors in the input values of the next layer. However, the main source
of error in synapse weights is the inevitable quantization error present in all weight storage methods.
The mathematical analysis of the NSR of different architectures performed in this section provides
information regarding the sensitivity of different networks to input and weight errors. Moreover, the
derivation of the total number of neurons, along with the NSR analysis of different
networks conducted in this section, makes feasible the efficiency evaluation of different structures
in terms of neuron×NSR performed in section 5.6.
5.4.1 Lumped Structure
In the lumped structure, each Adaline is composed of one neuron to generate the output. Therefore,
the total number of neurons for a lumped Madaline with $L + 1$ layers is as follows:

$$N_{LM} = \sum_{i=1}^{L} N_i \tag{5.5}$$
where $N_i$ is the number of Adalines in layer $i$; $i = 0$ denotes the input layer, which has no neurons,
while $i = 1$ and $i = L$ denote the first hidden layer and the output layer, respectively.
The NSR of a Madaline can be derived by cascading the NSR of the Adalines of each layer. It should
be noted that $\sigma^2_{\Delta w}/\sigma^2_w$ is the weight storage quantization error while
$\sigma^2_{\Delta x}/\sigma^2_x$ is the Adaline input error.
Considering that in a Madaline all weights are stored with the same resolution, the quantization
error of weight storage for all layers is the same. The inputs to every Adaline, except the Adalines in
the first hidden layer, are the outputs of the previous layer Adalines. Considering that all Adalines
of a Madaline are the same, and assuming the same input error for Adalines in the first hidden layer,
the input error to all Adalines in a Madaline will be the same. Therefore,
$\sigma^2_{\Delta w}/\sigma^2_w$ and $\sigma^2_{\Delta x}/\sigma^2_x$ are the same for all layers
of a Madaline. Thus, the NSR of the lumped Madaline structure can be written as follows:

$$\mathrm{NSR}_{LM} = \left( \sum_{j=1}^{L} \prod_{k=j}^{L} g_k \right) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right) \tag{5.6}$$

where $g_k = g(\sqrt{N_{k-1}}\,\sigma_x\sigma_w)$ is the stochastic gain function of layer $k$, in which
$N_{k-1}$ is the number of Adalines in layer $k-1$ [10].
According to (5.6), the NSR of a Madaline increases as the number of inputs and network layers
increases, provided that the stochastic gain function is in its linear region. The stochastic gain
function of the output layer is present in all product terms of a Madaline NSR. Therefore, the
output layer of a network has the greatest impact on the NSR of a Madaline.
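The neuron count (5.5) and the cascaded NSR (5.6) can be sketched as follows. The gain uses the approximation (5.4); the function names and the example network (layer sizes 4-4-1, unit standard deviations, a combined input and weight NSR of 0.02) are hypothetical values chosen for illustration.

```python
import math


def g(x):
    """Approximate stochastic gain function of (5.4)."""
    return 1.0 if x <= 1.0 else 0.5 + 0.53 * x


def cascade_nsr(layer_gains, err):
    """(sum over j of the product of g_k for k = j .. L) * err, as in (5.6)."""
    total = sum(math.prod(layer_gains[j:]) for j in range(len(layer_gains)))
    return total * err


def lumped_madaline(layer_sizes, sigma_x, sigma_w, err):
    """Neuron count (5.5) and NSR (5.6); layer_sizes = [N_0, N_1, ..., N_L]."""
    neurons = sum(layer_sizes[1:])
    gains = [g(math.sqrt(layer_sizes[k - 1]) * sigma_x * sigma_w)
             for k in range(1, len(layer_sizes))]
    return neurons, cascade_nsr(gains, err)


print(lumped_madaline([4, 4, 1], sigma_x=1.0, sigma_w=1.0, err=0.02))
```

Because every product in the sum ends with the gain of layer $L$, scaling the output layer gain scales the whole NSR, which is the observation made above about the output layer.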
5.4.2 Distributed Structure
In the distributed structure [19], each Adaline is divided into sub-neurons, the number of which is
equal to the number of inputs. Therefore, the total number of sub-neurons used in each layer of
the distributed Madaline is equal to the number of Adalines in that layer multiplied by the number
of Adalines in the previous layer. The total number of neurons in the distributed Madaline can be
written in the following form:

$$N_{DM} = \sum_{i=1}^{L} (N_i \times N_{i-1}) \tag{5.7}$$

Using the NSR of the distributed Adaline, the NSR of a distributed Madaline can be written in
the following form:

$$\mathrm{NSR}_{DM} = \left( \sum_{j=1}^{L} \prod_{k=j}^{L} g'_k \right) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right) \tag{5.8}$$

where $g'_k = g\left(\frac{\sigma_x\sigma_w}{\sqrt{N_{k-1}}}\right)$.
According to (5.8), using the distributed structure with one neuron for each input synapse results
in a self-scaling property that controls the Madaline NSR when the number of inputs to the Madaline,
as well as the number of Adalines in different layers, increases.
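The self-scaling property can be seen by evaluating (5.7) and (5.8) for two network sizes. This is an illustrative sketch using the approximation (5.4); the function name and the layer sizes are hypothetical.

```python
import math


def g(x):
    """Approximate stochastic gain function of (5.4)."""
    return 1.0 if x <= 1.0 else 0.5 + 0.53 * x


def distributed_madaline(layer_sizes, sigma_x, sigma_w, err):
    """Neuron count (5.7) and NSR (5.8); layer_sizes = [N_0, N_1, ..., N_L]."""
    n_layers = len(layer_sizes)
    neurons = sum(layer_sizes[i] * layer_sizes[i - 1] for i in range(1, n_layers))
    gains = [g(sigma_x * sigma_w / math.sqrt(layer_sizes[k - 1]))
             for k in range(1, n_layers)]
    nsr = sum(math.prod(gains[j:]) for j in range(len(gains))) * err
    return neurons, nsr


# The NSR stays the same as the layers grow, while the neuron count rises quickly.
print(distributed_madaline([4, 4, 1], 1.0, 1.0, 0.02))
print(distributed_madaline([64, 64, 1], 1.0, 1.0, 0.02))
```

In this example the argument of $g$ shrinks as the layers widen, so the gain stays at one; the price is the growth of the sub-neuron count given by (5.7).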
5.4.3 CVNS-RE Structure
In the CVNS-RE structure [16], each Adaline is divided into sub-neurons, and the number of
sub-neurons is equal to the number of Adaline inputs multiplied by the number of CVNS digits
used for each synaptic weight. Therefore, the total number of sub-neurons used in all layers of the
CVNS-RE Madaline, using $D + 1$ CVNS digits for each weight, is as follows:

$$N_{CREM} = \sum_{i=1}^{L} (N_i \times N_{i-1} \times (D + 1)) \tag{5.9}$$

Using the NSR of the CVNS-RE Adaline, the NSR of the CVNS-RE Madaline is as follows:

$$\mathrm{NSR}_{CREM} = \left( \sum_{j=1}^{L} \prod_{k=j}^{L} \left( g'_k \times B^{-2D} \right) \right) \times \left( \frac{\sigma^2_{\Delta x}}{\sigma^2_x} + \frac{\sigma^2_{\Delta w}}{\sigma^2_w} \right) \tag{5.10}$$
According to (5.10), the information redundancy present in the CVNS digits, along with the RE
process, reduces the effect of the stochastic gain function of different layers, which consequently
improves the Madaline NSR.
Since the RE method is used for each layer, and for each Adaline, the network delay is increased
considerably. This limits the application of this type of network to off-chip training schemes.
5.4.4 CVNS-DNN and CVNS-FDNN Structures
In the CVNS-DNN Adaline, the number of sub-neurons is equal to the number of Adaline inputs.
Therefore, the total number of neurons in each layer is equal to the number of Adalines in the current
layer multiplied by the number of Adalines in the previous layer. The total number of sub-neurons,
similar to a distributed network, is equal to:
\[ N_{CDM} = \sum_{i=1}^{L} \left( N_i \times N_{i-1} \right) \qquad (5.11) \]
Using the NSR of the CVNS-DNN Adaline, the NSR of a CVNS-DNN Madaline can be written
in the following form:
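\[ \mathrm{NSR}_{CDM} = \sum_{j=1}^{L} \prod_{k=j}^{L} g''_k \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right) \qquad (5.12) \]
where we have $g''_k = g\left( \frac{\sigma_x \sigma_w}{B^{D} \sqrt{N_{k-1}}} \right)$.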
According to (5.12), the developed CVNS-DNN structure decreases the stochastic gain function
of different layers, provided that the stochastic gain function is in its linear region. This results in
the reduction of Madaline NSR, making the Madaline more tolerant to input and weight errors.
In the CVNS-FDNN Adaline, the number of sub-neurons is equal to the number of its inputs
multiplied by the number of CVNS digits used for synapse weight storage. The total number of
sub-neurons in this structure can be written in the following form:
\[ N_{CFDM} = \sum_{i=1}^{L} \left( N_i \times N_{i-1} \times (D + 1) \right) \qquad (5.13) \]
Using the NSR of the CVNS-FDNN Adaline, the NSR of the CVNS-FDNN Madaline can be
written in the following form:
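\[ \mathrm{NSR}_{CFDM} = \sum_{j=1}^{L} \prod_{k=j}^{L} g'''_k \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right) \qquad (5.14) \]
where we have $g'''_k = g\left( \frac{\sigma_x \sigma_w}{B^{D} (D+1) \sqrt{N_{k-1}}} \right)$.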
According to (5.12) and (5.14), the stochastic gain function reduction of different layers in the
CVNS-FDNN Madaline is a function of $B^{D}(D+1)$, while in the CVNS-DNN Madaline, it is a function
of $B^{D}$. Therefore, the CVNS-FDNN Madaline provides a lower NSR in the linear region of the
stochastic gain function and is more immune to input and weight errors.
The NSR analysis of different Madalines, along with mathematical derivation of the total number
of neurons, provides the platform to carry out the mathematical comparison of different Madalines
in terms of NSR and neuron×NSR conducted in section 5.6.
5.5 Proposed Architecture
All of the previous CVNS neural networks store the synaptic weights in analog memories, which
require a refresh signal and are hence sensitive to process and power supply variations. The new CVNS
Madaline structure proposed in this section overcomes this limitation.
In the proposed CVNS Madaline structure, synaptic weights are stored in digital registers, providing
reliable and low-complexity storage compared to the analog memory required for previous
architectures. The arithmetic and signal processing are based on CVNS arithmetic. The efficiency of
the proposed network in terms of NSR and neuron×NSR will be presented in section 5.6.
According to (5.4), the Adaline stochastic gain function is a linear function of its inputs, provided
that the input to the stochastic gain function is greater than one. Therefore, an increase in the
number of inputs, input variation, and weight variation, in the linear region of the stochastic gain
function, results in an increase in the stochastic gain. Considering the NSR of the distributed
Adaline, shown in Table 5.1, using distributed neurons reduces the stochastic gain in the linear
region, and improves the NSR of the network.
The proposed Adaline is shown in Fig. 5.4. Here, distributed sub-neurons are used, and weights
are stored in digital registers. The inputs to the Adaline are in the CVNS format, represented by
$D + 1$ CVNS digits. The number of bits representing each weight in the network is denoted by $N_w$.
Additionally, the RE process is applied to the CVNS digits representing the Adaline output.
Inputs to the network are multiplied by the weights using the CVNS multiplier [15]. The weights
are stored in the registers and denoted by $w$, while the second input to the CVNS multiplier, $((y))_i$,
is in CVNS format. The multiplication result, $((z))_i$, is in CVNS format and is as follows:
\[ ((z))_i = \sum_{k=0}^{i} \left( w_k \times ((y))_{i-k} \right) \bmod 2 \qquad (5.15) \]
According to (5.15), each of the output digits, $((z))_i$, exploits the $w_0$ to $w_i$ bits of the weight
storage registers corresponding to that input. Therefore, different CVNS digits representing the
same input to the Adaline share the same weight registers for CVNS multiplication. In previously
developed CVNS structures, the number of analog memory units required for each input to the
Adaline is equal to the number of CVNS digits representing the weights corresponding to that
input. The reduction in the number of storage medium elements in the proposed structure results
in lower area and reduced power consumption.
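The digit-wise product in (5.15) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the multiplier of [15]: the function name `cvns_multiply_digit` is hypothetical, the modulo is read as applying to the accumulated sum, and the sample weight bits and CVNS digits are invented for the example.

```python
def cvns_multiply_digit(w_bits, y_digits, i, radix=2.0):
    """Digit i of the product following (5.15).

    w_bits   : weight bits w_0..w_{Nw-1} (each 0 or 1)
    y_digits : CVNS digits ((y))_0..((y))_D, each a real value in [0, radix)
    Assumption: the modulo is applied to the accumulated sum of partial products.
    """
    total = sum(w_bits[k] * y_digits[i - k] for k in range(i + 1))
    return total % radix

# Hypothetical operands, chosen so that all arithmetic is exact in floating point.
w_bits = [1, 0, 1]
y_digits = [0.5, 1.0, 1.5]
z = [cvns_multiply_digit(w_bits, y_digits, i) for i in range(3)]
```

Digit `i` only reads `w_bits[0]` through `w_bits[i]`, which is why all CVNS digits of one input can share a single weight register.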
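Since the RE process is applied to the output, the error of the output is decreased by a factor of
$B^{-D}$, decreasing the NSR by $B^{-2D}$. Therefore, the NSR of the proposed Adaline can be written in
the following form:
\[ \mathrm{NSR}_{PA} = g\left( \frac{\sigma_x \sigma_w}{\sqrt{N}} \right) \times B^{-2D} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right) \qquad (5.16) \]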
According to (5.16), the information redundancy of the CVNS digits along with the RE process
improves the NSR of the proposed Adaline. The distributed nature of the sub-neurons controls the
NSR of the proposed Adaline when the number of inputs increases.
The NSR flow diagram of the proposed Adaline is shown in Fig. 5.5a. Here, the summation of
input and weight errors is multiplied by $g' \times B^{-2D}$, where $g'$ is equal to $g\left( \frac{\sigma_x \sigma_w}{\sqrt{N}} \right)$.
The block diagram of the proposed CVNS-distributed Madaline structure for a three-layer network
with N inputs, N hidden Adalines, and one output Adaline, is shown in Fig. 5.6. The hidden
layer of the neural network, shown in Fig. 5.6, is based on the distributed structure, while the output
layer uses the proposed Adaline. Using the proposed Adaline at the output layer improves the NSR,
while the distributed structure used in the hidden layers maintains a low total number of neurons.
Figure 5.4: Block diagram of the proposed Adaline structure
Figure 5.5: (a) NSR flow diagram of the proposed Adaline (b) NSR flow diagram of the proposed
CVNS-distributed Madaline
Figure 5.6: Block diagram of the proposed CVNS-distributed Madaline
To generate the input CVNS digits for the output layer, a CVNS generator block is required.
The CVNS digit generation is done based on (5.2). The number of sub-neurons in the layers using
the distributed structure can be calculated using (5.7). The only difference is that, in the proposed
Madaline, only $L-1$ layers use the distributed structure. The number of sub-neurons of each Adaline
in the output layer is equal to the number of Adalines in the previous layer multiplied
by the number of CVNS digits used to represent the inputs to that Adaline. Therefore, the total
number of neurons in the proposed Madaline can be written in the following form:
\[ N_{PM} = \left( \sum_{i=1}^{L-1} \left( N_i \times N_{i-1} \right) \right) + \left( N_L \times N_{L-1} \times (D_L + 1) \right) \qquad (5.17) \]
where $D_L + 1$ is the number of CVNS digits used to represent the inputs to the output layer.
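The neuron counts in (5.7) and (5.17) can be compared for an arbitrary layer configuration. The sketch below is illustrative: the function names, the list-based layer description `[N_0, ..., N_L]`, and the sample sizes are assumptions made for the example.

```python
def distributed_neurons(layer_sizes):
    """(5.7): sum of N_i * N_{i-1} over layers i = 1..L, with layer_sizes = [N_0, ..., N_L]."""
    return sum(layer_sizes[i] * layer_sizes[i - 1] for i in range(1, len(layer_sizes)))

def proposed_neurons(layer_sizes, output_digits):
    """(5.17): layers 1..L-1 are distributed; the output layer uses output_digits = D_L + 1."""
    hidden = sum(layer_sizes[i] * layer_sizes[i - 1] for i in range(1, len(layer_sizes) - 1))
    output = layer_sizes[-1] * layer_sizes[-2] * output_digits
    return hidden + output

# Hypothetical network: 4 inputs, 4 hidden Adalines, 1 output Adaline (L = 2).
sizes = [4, 4, 1]
n_distributed = distributed_neurons(sizes)   # 4*4 + 1*4
n_proposed = proposed_neurons(sizes, 3)      # 4*4 + 1*4*3
```

Only the output layer is scaled by the number of CVNS digits, so the overhead over the distributed structure stays small when the output layer has few Adalines.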
Using the NSR flow diagram of the proposed Adaline in combination with the NSR flow diagram
of the distributed structure, the NSR flow diagram of the proposed CVNS-distributed Madaline is
shown in Fig. 5.5b.
Based on the NSR flow diagram of the proposed Madaline, the NSR of the proposed Madaline
can be written in the following form:
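\[ \mathrm{NSR}_{PM} = B^{-2D_L} \times \left( \sum_{j=1}^{L} \prod_{k=j}^{L} g'_k \right) \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right) \qquad (5.18) \]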
According to (5.18), since the information redundancy provided by the CVNS digits in the output
layer decreases all product terms of the stochastic gain functions by a factor of $B^{-2D_L}$, the proposed
structure can exploit the CVNS information redundancy more efficiently. This will be proven in the
next section.
5.6 Comparison of the Proposed Madaline Structure with
Previous Architectures
In this section, a comparison between the proposed Madaline structure and the previous structures in
terms of NSR and neuron×NSR is performed. The comparison is based on two assumptions. First,
it is assumed that the total number of CVNS digits used in the proposed and previous architectures
is the same. It should be noted that all previous CVNS neural network architectures use CVNS
digits in all layers, while the proposed Madaline network uses CVNS digits only in the output layer.
Therefore, to have the same total number of CVNS digits in all structures, the number of CVNS
digits used in the output layer of the proposed network is obtained as follows:
\[ D_L = (D + 1) \times L \qquad (5.19) \]
Table 5.2: Total number of neurons required for different Madaline structures and their normalized
value with respect to lumped structure
Madaline Structure    Total Number of Neurons                                   Normalized Total Number of Neurons
Lumped                $N \times L$                                              $1$
Distributed           $N^{2} \times L$                                          $N$
CVNS-RE               $N^{2} \times (D + 1) \times L$                           $N \times (D + 1)$
CVNS-DNN              $N^{2} \times L$                                          $N$
CVNS-FDNN             $N^{2} \times (D + 1) \times L$                           $N \times (D + 1)$
Proposed Madaline     $N^{2} \times (L - 1) + N^{2} \times L \times (D + 1)$    $N \times (1 + (D + 1) - L^{-1})$
where $D_L$ is the number of CVNS digits used to represent the inputs to the output layer of the
proposed Madaline, and $D + 1$ is the number of CVNS digits used for weight storage in each layer of
the previous CVNS architectures. It should be noted that the desired number of input CVNS digits
to the output layer of the proposed Madaline can be generated using the CVNS generator blocks
shown in Fig. 5.6.
Moreover, it is assumed that the number of inputs, as well as the number of Adalines in each
layer of all Madalines, is equal to $N$. Based on this assumption, the total number of neurons,
or sub-neurons, required for the lumped, distributed, CVNS-RE, CVNS-DNN, CVNS-FDNN and
proposed CVNS-distributed structures found through (5.5), (5.7), (5.9), (5.11), (5.13) and (5.17)
can be simplified as shown in Table 5.2. It should be noted that $L + 1$ is the number of Madaline
layers, while $D + 1$ is the number of CVNS digits used for weight storage in each layer of
the previous CVNS architectures. The normalized values of the total number of neurons with
respect to the lumped structure are also summarized in this table. The normalized total numbers of
neurons are used to evaluate the efficiency of different architectures in terms of neuron×NSR.
According to Table 5.2, the CVNS-DNN architecture has the lowest total number of sub-neurons
among the previous CVNS architectures, while the CVNS-RE and CVNS-FDNN architectures
require the same number. The proposed CVNS-distributed Madaline requires a higher number of
neurons compared to the previous CVNS architectures. The robustness of a network is measured in
terms of NSR, while its efficiency can be evaluated in terms of neuron×NSR.
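The entries of Table 5.2 can be reproduced numerically. The following Python sketch is a convenience for checking the table, not part of the original analysis; the function name `normalized_neuron_counts` and the sample values of N, L and D are assumptions.

```python
def normalized_neuron_counts(N, L, D):
    """Total neuron counts of Table 5.2, normalized by the lumped count N * L."""
    lumped = N * L
    totals = {
        "Lumped": N * L,
        "Distributed": N**2 * L,
        "CVNS-RE": N**2 * (D + 1) * L,
        "CVNS-DNN": N**2 * L,
        "CVNS-FDNN": N**2 * (D + 1) * L,
        "Proposed": N**2 * (L - 1) + N**2 * L * (D + 1),
    }
    return {name: total / lumped for name, total in totals.items()}

# Hypothetical parameters: N = 5 Adalines per layer, L = 2, D = 3.
counts = normalized_neuron_counts(N=5, L=2, D=3)
```

For these values the proposed structure evaluates to N × (1 + (D + 1) − 1/L) = 22.5, slightly above the N × (D + 1) = 20 of the CVNS-RE and CVNS-FDNN structures.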
Based on (5.4), the stochastic gain has two regions: linear and nonlinear. According to the
simulations performed in [22], a synaptic weight range of [-8,8] is required while according to [23]
the inputs should be normalized between [0,1] or [-1,1]. Assuming uniform distribution for input and
weight variables, the stochastic gain function of the lumped structure will always be in the linear
region, while the stochastic gain of the other previous structures, as well as the proposed structure,
depending on the number of inputs, may be in the linear or nonlinear region. Therefore, the
mathematical comparison of the proposed structure with the previous structures will be performed in
each region. The first region is where the input to the stochastic gain function is greater than two. The
stochastic gain in this region is a linear function of its inputs. The second region is where the input
to the stochastic gain function is less than two, where the stochastic gain function shows a nonlinear
behavior. The mathematical discussion conducted in this section proves that the proposed Madaline
has a better NSR and neuron×NSR in both the linear and nonlinear regions of the stochastic gain
function.
5.6.1 Comparison of the Proposed Madaline Structure with Previous
Structures in the Linear Region of Stochastic Gain Function
According to (5.4), the stochastic gain function for inputs greater than two is a linear function of
its input. Therefore, it can be written:
\[ g(x) \propto x \qquad (5.20) \]
Using (5.20), the NSR of the lumped, distributed, CVNS-RE, CVNS-DNN, CVNS-FDNN and
proposed CVNS-distributed Madaline structures found through (5.6), (5.8), (5.10), (5.12), (5.14)
and (5.18) can be written in the following form:
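\[ \mathrm{NSR}_{LM} \propto \sum_{j=1}^{L} \left( \sqrt{N} \sigma_x \sigma_w \right)^{j} \qquad (5.21) \]
\[ \mathrm{NSR}_{DM} \propto \sum_{j=1}^{L} \left( \frac{\sigma_x \sigma_w}{\sqrt{N}} \right)^{j} \qquad (5.22) \]
\[ \mathrm{NSR}_{CREM} \propto \sum_{j=1}^{L} \left( \frac{\sigma_x \sigma_w}{\sqrt{N} \times B^{2D}} \right)^{j} \qquad (5.23) \]
\[ \mathrm{NSR}_{CDM} \propto \sum_{j=1}^{L} \left( \frac{\sigma_x \sigma_w}{\sqrt{N} \times B^{D}} \right)^{j} \qquad (5.24) \]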
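Table 5.3: NSR of different structures and their normalized value with respect to lumped structure
in linear region of stochastic gain function
Structure            NSR                                                                                                                                                                                        Normalized NSR
Lumped [19]          $\propto (\sqrt{N} \sigma_x \sigma_w)^{L} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$                                $1$
Distributed [19]     $\propto \left( \frac{\sigma_x \sigma_w}{\sqrt{N}} \right)^{L} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$            $N^{-L}$
CVNS-RE [16]         $\propto B^{-2DL} \times \left( \frac{\sigma_x \sigma_w}{\sqrt{N}} \right)^{L} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$    $N^{-L} \times B^{-2DL}$
CVNS-DNN [17]        $\propto B^{-DL} \times \left( \frac{\sigma_x \sigma_w}{\sqrt{N}} \right)^{L} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$     $N^{-L} \times B^{-DL}$
CVNS-FDNN [17]       $\propto (B^{D} \times D)^{-L} \left( \frac{\sigma_x \sigma_w}{\sqrt{N}} \right)^{L} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$    $N^{-L} \times (B^{D} \times D)^{-L}$
Proposed Madaline    $\propto B^{-2((D+1)L-1)} \times \left( \frac{\sigma_x \sigma_w}{\sqrt{N}} \right)^{L} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$    $N^{-L} \times B^{-2((D+1)L-1)}$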
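\[ \mathrm{NSR}_{CFDM} \propto \sum_{j=1}^{L} \left( \frac{\sigma_x \sigma_w}{\sqrt{N} \times B^{D} \times D} \right)^{j} \qquad (5.25) \]
\[ \mathrm{NSR}_{PM} \propto B^{-2((D+1)L-1)} \times \sum_{j=1}^{L} \left( \frac{\sigma_x \sigma_w}{\sqrt{N}} \right)^{j} \qquad (5.26) \]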
The above equations are geometric series which can be written in the following general form:
\[ \sum_{j=1}^{L} \alpha^{j} = \frac{\alpha - \alpha^{L+1}}{1 - \alpha} \qquad (5.27) \]
The above summation term can be approximated as $\alpha^{L}$ provided that $\alpha^{L} \gg 1$. Since the
input to the stochastic gain in this region is greater than two, we have $\alpha > 2$. Considering that a
Madaline has at least three layers, (5.27) can be approximated as $\alpha^{L}$. Using this approximation,
the NSR of different structures presented in (5.21) to (5.26) can be summarized as shown in Table
5.3. The normalized values of the NSR for various structures with respect to the lumped structure
are shown in this Table as well. Comparing the normalized NSR of different architectures indicates
that the proposed structure has the lowest NSR compared to the previous architectures. Therefore,
the synapse weight errors have less effect on the network output which indicates that the proposed
structure is more robust to weight errors present in the hardware implementation.
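The quality of the approximation of (5.27) by its dominant term can be checked numerically. The Python sketch below is illustrative; the function name `geometric_sum` and the sample values of α and L are assumptions.

```python
def geometric_sum(alpha, L):
    """Closed form of sum_{j=1}^{L} alpha**j from (5.27); requires alpha != 1."""
    return (alpha - alpha ** (L + 1)) / (1 - alpha)

# Hypothetical values in the linear region (alpha > 2).
alpha, L = 4.0, 3
exact = sum(alpha ** j for j in range(1, L + 1))   # 4 + 16 + 64
ratio = geometric_sum(alpha, L) / alpha ** L       # exact sum relative to alpha**L
```

Algebraically the ratio equals α/(α − 1) × (1 − α^(−L)), which stays below 2 for every α > 2, so replacing the sum by its largest term changes the result by a bounded constant factor only.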
Using the normalized values of the total number of neurons and the NSR of different structures
in Table 5.2 and Table 5.3, the results of the normalized neuron×NSR of all structures are provided
in Table 5.4.
Table 5.4: Normalized neuron×NSR of different structures in linear region of stochastic gain function
Structure            Normalized Neuron×NSR
Lumped [19]          $1$
Distributed [19]     $N^{1-L}$
CVNS-RE [16]         $N^{1-L} \times (D + 1) \times B^{-2DL}$
CVNS-DNN [17]        $N^{1-L} \times B^{-DL}$
CVNS-FDNN [17]       $N^{1-L} \times (D + 1) \times (B^{D} \times D)^{-L}$
Proposed Madaline    $N^{1-L} \times (1 + (D + 1) - L^{-1}) \times B^{-2DL} \times B^{-2(L-1)}$
According to Table 5.4, the CVNS-DNN architecture has a better performance than the dis-
tributed structure. To compare CVNS-FDNN and CVNS-DNN structures, the normalized neuron×NSR
of the CVNS-FDNN architecture is divided by neuron×NSR of the CVNS-DNN structure resulting
in the following equation:
\[ (D + 1) \times D^{-L} \qquad (5.28) \]
Since a Madaline has at least three layers, the CVNS-FDNN structure has a better neuron×NSR.
Dividing the normalized neuron×NSR of the CVNS-RE architecture by that of the CVNS-FDNN
architecture, we obtain the following equation:
\[ \left( B^{-D} \times D \right)^{L} < 1 \qquad (5.29) \]
Therefore, the CVNS-RE structure has a better neuron×NSR than that of the CVNS-FDNN.
This in turn means that the CVNS-RE structure has the best neuron×NSR among all of the previ-
ously developed structures.
To compare the efficiency of the proposed architecture with CVNS-RE structure, the normalized
neuron×NSR of the proposed architecture is divided by the normalized neuron×NSR of the CVNS-
RE structure which results in the following equation:
\[ \left( (D + 1)^{-1} + 1 - ((D + 1) \times L)^{-1} \right) \times B^{-2(L-1)} \qquad (5.30) \]
The maximum of (5.30) occurs when B, D and L are at their minimum values. A Madaline
has at least three layers, while the minimum radix and number of CVNS digits is equal to two.
Therefore, (5.30) is always less than one and the proposed CVNS-distributed architecture has a better
performance compared to the CVNS-RE structure. Consequently, the proposed CVNS-distributed
architecture achieves the best neuron×NSR of all structures. The proposed architecture can provide
the same accuracy as previous architectures by utilizing a lower total number of sub-neurons. In
other words, the proposed network obtains the same accuracy with a lower area overhead and reduced
power consumption.
Figure 5.7: Normalized Neuron×NSR improvement of different architectures compared to distributed
structure
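The bound on (5.30) can be checked numerically over a range of parameters. The Python sketch below is illustrative only; the function name `ratio_5_30` and the search ranges for B, D and L are assumptions, with all three parameters starting at two as stated in the text.

```python
def ratio_5_30(B, D, L):
    """Neuron x NSR of the proposed Madaline divided by that of CVNS-RE, per (5.30)."""
    return (1 / (D + 1) + 1 - 1 / ((D + 1) * L)) * B ** (-2 * (L - 1))

# Hypothetical search grid: radix 2..5, D = 2..8, L = 2..6.
worst = max(
    ratio_5_30(B, D, L)
    for B in range(2, 6)
    for D in range(2, 9)
    for L in range(2, 7)
)
```

On this grid the largest value is reached at B = D = L = 2, where the ratio is 7/24, so the expression stays well below one.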
The normalized neuron×NSR improvement for different architectures compared to the dis-
tributed structure is shown in Fig. 5.7. All architectures have a better neuron×NSR compared
to the distributed structure, while clearly, the normalized neuron×NSR of the distributed structure
is better than that of the lumped structure. Therefore, all structures are more efficient than the
lumped structure.
5.6.2 Comparison of the Proposed Madaline Structure with Previous
Structures in the Nonlinear Region of Stochastic Gain Function
Inputs to the stochastic gain function that are less than two result in a nonlinear behavior. To
compare different structures in this region, the minimum NSR and neuron×NSR of the distributed,
CVNS-RE, CVNS-DNN and CVNS-FDNN structures are compared with the maximum NSR and
neuron×NSR of the proposed Madaline.
As discussed previously, the stochastic gain function of the lumped structure is always in the
linear region. Therefore, the NSR and neuron×NSR of the lumped structure derived in the previous
section will be used in this section as well. According to (5.8), (5.10), (5.12) and (5.14), the minimum
NSR of the distributed, CVNS-RE, CVNS-DNN and CVNS-FDNN structures occurs when the
stochastic gain is at its minimum. The minimum value of the stochastic gain is equal to one.
Therefore, the minimum NSR of these structures can be written in the following form:
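\[ \mathrm{NSR}_{DM} = \mathrm{NSR}_{CDM} = \mathrm{NSR}_{CFDM} = L \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right) \qquad (5.31) \]
\[ \mathrm{NSR}_{CREM} = \sum_{i=1}^{L} \left( B^{-2D} \right)^{i} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right) \qquad (5.32) \]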
According to (5.18), the maximum NSR of the proposed structure occurs when the stochastic
gain is at its maximum. Considering that the maximum stochastic gain in this region is equal to
1.56, the upper bound of the NSR of the proposed structure can be written in the following form:
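\[ \mathrm{NSR}_{PM} = B^{-2((D+1)L-1)} \times \sum_{i=1}^{L} (1.56)^{i} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right) \qquad (5.33) \]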
The NSR of different structures presented in (5.31) to (5.33) can be summarized as shown in
Table 5.5. The normalized values of the NSR for various structures, with respect to the lumped
structure, are shown in this Table as well.
Dividing the NSR of the proposed Madaline by the distributed, CVNS-DNN and CVNS-FDNN
structures results in the following equation:
\[ \left( L^{-1} \times B^{-2((D+1)L-1)} \times \sum_{i=1}^{L} (1.56)^{i} \right) < 1 \qquad (5.34) \]
Thus, the NSR of the proposed structure in this region is lower than those of the distributed,
CVNS-DNN and CVNS-FDNN structures.
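To compare the NSR of the proposed structure with that of the CVNS-RE structure, the NSR of the
proposed structure is divided by that of the CVNS-RE structure, which results in the following equation:
\[ B^{-2((D+1)L-1)} \times \sum_{i=1}^{L} (1.56)^{i} \times \left( \sum_{i=1}^{L} \left( B^{-2D} \right)^{i} \right)^{-1} \qquad (5.35) \]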
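The maximum of (5.35) occurs when B, D and L are at their minimum values. A Madaline
has at least three layers, while the minimum radix and number of CVNS digits is equal to two.
Therefore, (5.35) is always less than one, and the proposed structure has a better NSR compared
to the CVNS-RE structure. Moreover, considering the synaptic weight range of [-8,8] and input
range of [0,1] or [-1,1], with a uniform distribution, the normalized NSR of the proposed structure is
always lower than one. Consequently, the NSR of the proposed structure is lower than all previous
structures in the nonlinear region of the stochastic gain function.
Table 5.5: NSR of different structures and their normalized value with respect to lumped structure
in nonlinear region of stochastic gain function
Structure           NSR                                                                                                                                                                       Normalized NSR
Lumped [19]         $(\sqrt{N} \sigma_x \sigma_w)^{L} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$                         $1$
Distributed [19]    $L \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$                                                        $N^{-\frac{L}{2}} \times L \times (\sigma_x \sigma_w)^{-L}$
CVNS-RE [16]        $\sum_{i=1}^{L} (B^{-2D})^{i} \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$                                    $N^{-\frac{L}{2}} \times \sum_{i=1}^{L} (B^{-2D})^{i} \times (\sigma_x \sigma_w)^{-L}$
CVNS-DNN [17]       $L \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$                                                        $N^{-\frac{L}{2}} \times L \times (\sigma_x \sigma_w)^{-L}$
CVNS-FDNN [17]      $L \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$                                                        $N^{-\frac{L}{2}} \times L \times (\sigma_x \sigma_w)^{-L}$
Proposed            $B^{-2((D+1)L-1)} \times \sum_{i=1}^{L} (1.56)^{i} \times \left( \frac{\sigma_{\Delta x}^{2}}{\sigma_{x}^{2}} + \frac{\sigma_{\Delta w}^{2}}{\sigma_{w}^{2}} \right)$         $N^{-\frac{L}{2}} \times B^{-2((D+1)L-1)} \times \sum_{i=1}^{L} (1.56)^{i} \times (\sigma_x \sigma_w)^{-L}$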
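The two NSR ratios (5.34) and (5.35) can be evaluated at the smallest parameter values, where the text states that they are largest. The Python sketch below is illustrative; the function names are assumptions, and the value 1.56 is the maximum stochastic gain in the nonlinear region quoted above.

```python
def gain_sum(L, g_max=1.56):
    """Sum of g_max**i for i = 1..L, the upper bound used in (5.33)."""
    return sum(g_max ** i for i in range(1, L + 1))

def ratio_5_34(B, D, L):
    """Proposed NSR bound divided by the distributed / CVNS-DNN / CVNS-FDNN minimum NSR."""
    return gain_sum(L) * B ** (-2 * ((D + 1) * L - 1)) / L

def ratio_5_35(B, D, L):
    """Proposed NSR bound divided by the CVNS-RE minimum NSR."""
    re_sum = sum((B ** (-2 * D)) ** i for i in range(1, L + 1))
    return gain_sum(L) * B ** (-2 * ((D + 1) * L - 1)) / re_sum

r34 = ratio_5_34(2, 2, 2)
r35 = ratio_5_35(2, 2, 2)
```

With B = D = L = 2 both ratios are far below one (about 0.002 and 0.059, respectively), in line with the conclusion drawn from (5.34) and (5.35).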
Using the normalized values of the total number of neurons and the NSR of different structures in
Table 5.2 and Table 5.5, the results of the normalized neuron×NSR of all structures in the nonlinear
region of the stochastic gain function are provided in Table 5.6.
To compare the efficiency of the proposed structure with the distributed and CVNS-DNN
structures, the neuron×NSR of the proposed structure is divided by the neuron×NSR of the CVNS-DNN
architecture, which results in the following equation:
\[ L^{-1} \times \sum_{i=1}^{L} (1.56)^{i} \times (1 + (D + 1) - L^{-1}) \times B^{-2((D+1)L-1)} \qquad (5.36) \]
The maximum of (5.36) occurs when B, D and L are at their minimum value. Therefore,
(5.36) is always less than one. Consequently, the proposed structure is more efficient compared
to the distributed and CVNS-DNN structures. Considering that the neuron×NSR of the CVNS-
FDNN architecture is higher than that of the distributed and CVNS-DNN structures, the proposed
structure is more efficient compared to the CVNS-FDNN architecture as well.
Table 5.6: Normalized neuron×NSR of different structures in nonlinear region of stochastic gain
function
Structure           Normalized Neuron×NSR
Lumped [19]         $1$
Distributed [19]    $N^{1-\frac{L}{2}} \times L \times (\sigma_x \sigma_w)^{-L}$
CVNS-RE [16]        $N^{1-\frac{L}{2}} \times (D + 1) \times \sum_{i=1}^{L} (B^{-2D})^{i} \times (\sigma_x \sigma_w)^{-L}$
CVNS-DNN [17]       $N^{1-\frac{L}{2}} \times L \times (\sigma_x \sigma_w)^{-L}$
CVNS-FDNN [17]      $N^{1-\frac{L}{2}} \times (D + 1) \times L \times (\sigma_x \sigma_w)^{-L}$
Proposed            $N^{1-\frac{L}{2}} \times \sum_{i=1}^{L} (1.56)^{i} \times (1 + (D + 1) - L^{-1}) \times B^{-2((D+1)L-1)} \times (\sigma_x \sigma_w)^{-L}$
To compare the efficiency of the proposed structure with CVNS-RE, the neuron×NSR of the proposed
structure is divided by the neuron×NSR of the CVNS-RE structure, which results in the following equation:
\[ \sum_{i=1}^{L} (1.56)^{i} \times \left( (D + 1)^{-1} + 1 - ((D + 1) \times L)^{-1} \right) \times B^{-2((D+1)L-1)} \times \left( \sum_{i=1}^{L} \left( B^{-2D} \right)^{i} \right)^{-1} \qquad (5.37) \]
The maximum of (5.37) occurs when B, D and L are at their minimum. Therefore, (5.37) is always
less than one. This in turn means that the proposed structure is more efficient than the CVNS-RE
structure.
The maximum value of neuron×NSR of the proposed structure occurs when B, D and L are at
their minimum. Considering synaptic weight range of [-8,8] and input range of [0,1] or [-1,1], with
a uniform distribution, neuron×NSR of the proposed structure is always less than one. Therefore,
the proposed structure is more efficient compared to the lumped structure. Thus, the proposed
structure is the most efficient structure compared to the previous structures in the nonlinear region
of the stochastic gain function.
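The claim that the normalized neuron×NSR of the proposed structure stays below one can be spot-checked with the last row of Table 5.6. The Python sketch below is illustrative; the function name and the chosen operating point (N = 5, B = D = L = 2, inputs uniform on [-1,1], weights uniform on [-8,8]) are assumptions made for the example.

```python
from math import sqrt

def proposed_neuron_nsr_nonlinear(N, L, D, B, sigma_x, sigma_w, g_max=1.56):
    """Upper bound of the proposed row of Table 5.6 (normalized to the lumped structure)."""
    gain_sum = sum(g_max ** i for i in range(1, L + 1))
    neuron_factor = 1 + (D + 1) - 1 / L
    redundancy = B ** (-2 * ((D + 1) * L - 1))
    return (N ** (1 - L / 2) * gain_sum * neuron_factor * redundancy
            * (sigma_x * sigma_w) ** (-L))

# Standard deviations of uniform variables on [-1, 1] and [-8, 8].
value = proposed_neuron_nsr_nonlinear(
    N=5, L=2, D=2, B=2, sigma_x=sqrt(1 / 3), sigma_w=sqrt(64 / 3)
)
```

For this operating point the bound evaluates to roughly 0.0019, so the proposed structure is far more efficient than the lumped reference.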
5.7 VLSI Implementation and Comparisons
In this section, the VLSI implementation of a three-layer Madaline with $N = 5$ Adalines in each
layer is considered. The network uses 8 bits for weight storage, and its inputs and weights are
uniformly distributed in the ranges $[-1, 1]$ and $[-8, 8]$, respectively. The input and weight
variances are $\sigma_x^{2} = \frac{1}{3}$ and $\sigma_w^{2} = \frac{8^{2}}{3}$, respectively.
The radix of the CVNS is considered to be two, which provides the most efficient radix for conversion
between binary and CVNS. The number of CVNS digits used in each layer of the previous CVNS
structures, D, is equal to three. Since the Madaline has three layers, L is equal to two. Therefore,
using (5.19), the number of CVNS digits used in the output layer of the proposed CVNS-distributed
Madaline, DL, is equal to six.
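The variances quoted above follow from the variance of a uniform distribution, (b − a)²/12. The Python sketch below reproduces them and evaluates the argument σxσw/√N of the distributed stochastic gain for N = 5; the function and variable names are assumptions made for the example.

```python
from math import sqrt

def uniform_variance(low, high):
    """Variance of a uniform random variable on [low, high]."""
    return (high - low) ** 2 / 12

var_x = uniform_variance(-1, 1)   # inputs on [-1, 1]  -> 1/3
var_w = uniform_variance(-8, 8)   # weights on [-8, 8] -> 64/3

# Argument of the stochastic gain for a distributed Adaline with N = 5 inputs.
gain_input = sqrt(var_x * var_w) / sqrt(5)
```

Here `gain_input` is about 1.19, which is below two, so for N = 5 the distributed sub-neurons operate in the region that section 5.6 labels nonlinear.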
The neuron×NSR calculation was used in the previous sections to compare the efficiency of
different structures, while the area×NSR calculation is used in this section. The neuron×NSR
made the system-level efficiency evaluation of the different structures feasible, while the network
area×NSR provides more information regarding the area consumption, corresponding to a specific
NSR, at the circuit-level. Therefore, to measure the VLSI implementation efficiency of different
Madaline structures, their area consumption and NSR were considered.
The basic building blocks of an Adaline are the adder, multiplier, neuron and the weight storage
medium. The proposed and the previous structures are implemented by current-mode circuits.
Addition in the current-mode circuits is easily performed through wiring the nodes carrying the
signals. Thus, the addition overhead in all structures can be neglected.
The multiplier used in the lumped and distributed Adaline [19] is a Multiplying Digital to Analog
Converter (MDAC), while the proposed Adaline and the previous CVNS structures require a CVNS
multiplier.
The lumped, distributed and the proposed Adalines exploit digital registers as the weight storage
medium, while all previous CVNS structures require analog memory. The digital registers can be
implemented using the TSMC 0.18µm CMOS standard cell library.
The VLSI implementation of the CVNS multiplier is realized based on (5.15). The block
diagram of an 8-bit CVNS multiplier is shown in Fig. 5.8. The CVNS multiplier is implemented
using current-mode circuits. In the implemented circuit, a current of 8 µA represents 1. As can be seen in
Fig. 5.8, the input currents $((x))_0$ to $((x))_7$ are applied to the transistors M1 to M8. These transistors
act as switches which turn on, provided that the corresponding $w_i$ input to their gate is high. This
in turn implements the $w_i((x))_{7-i}$ terms required to calculate the output $((z))$. Transistors with a
$\frac{W}{L}$ of $\frac{0.22\,\mu\mathrm{m}}{0.18\,\mu\mathrm{m}}$ are used for implementing these switches. The input currents, after passing the
input transistors, are summed at the input nodes of the mod 16 µA blocks. The outputs of the
mod 16 µA blocks, which receive the input currents, are wired together to perform the addition.
The addition result passes through two more stages of the mod 16 µA blocks. This in turn generates
the multiplication result.
Figure 5.8: Block diagram of the 8-bit CVNS multiplier
The main building block of the CVNS multiplier is the mod 16 µA block. The mod 16 µA operation
can be written in the following form:
\[ x \bmod 16\,\mu\mathrm{A} = \begin{cases} x, & x < 16\,\mu\mathrm{A} \\ x - 16\,\mu\mathrm{A}, & x \geq 16\,\mu\mathrm{A} \end{cases} \qquad (5.38) \]
where x is the input to the mod 16 µA function.
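A behavioral model of (5.38) is a single conditional subtraction. The Python sketch below is illustrative and models only the ideal transfer function, not the circuit; the function name `mod_16ua` and the convention of passing currents as numbers in microamperes are assumptions.

```python
def mod_16ua(x_ua):
    """Ideal transfer function of (5.38); x_ua is the input current in microamperes."""
    # A single subtraction is performed, exactly as written in (5.38).
    return x_ua if x_ua < 16.0 else x_ua - 16.0

samples = [mod_16ua(x) for x in (5.0, 16.0, 20.0)]
```

Because only one subtraction is performed, an input of 32 µA or more would leave a result above 16 µA, which may be one reason several mod 16 µA stages are cascaded in the multiplier.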
The mod 16 µA operation circuit is shown in Fig. 5.9. The transistor sizes of the mod 16 µA
circuit are shown in Table 5.7. It should be noted that the smallest actual size of $\frac{W}{L}$ used for VLSI
implementation of the Madaline is equal to $\frac{0.22\,\mu\mathrm{m}}{0.18\,\mu\mathrm{m}}$.
The circuit is composed of five main sections including an input current mirror, a current com-
parator, a current subtractor, an inverter chain and an output current mirror.
Figure 5.9: VLSI implementation of the mod16µA operation
According to (5.38), the current comparator and subtractor sections should compare the input
current x with 16 µA and subtract 16 µA from x, provided that the input is greater than 16 µA.
The input current mirror, which is composed of the transistors M1 and M2, copies the input current
to the current comparator and the current subtractor sections. The current comparator is based on
the structure developed in [24] and compares the input current with a reference current of 16 µA.
This 16 µA is generated by the transistors M4 and M5 and copied to the transistor M6. The output
of the comparator is connected to the inverter chain which provides a rail-to-rail output. The current
comparator, along with the inverter chain, generates the CmpN and CmpP signals equal to 0 V and
1.8 V, respectively, provided that the input current to the mod 16 µA circuit is greater than 16 µA. The two
transistors M8 and M9, which act as a transmission gate, turn on provided that the input current
is greater than 16 µA. This allows the reference current generated by the transistors M4 and M5
to be copied to the transistor M7. Considering that the current through the transistor M3 is the same
as the input current, the 16 µA reference current will be subtracted from the input current which
flows through the transistor M10. This is only applicable if the input current is greater than 16 µA.
Otherwise, the input current flows through the transistor M10. The current through the transistor
M10 is copied to the transistor M11, which generates the output of the mod 16 µA block. The
simulation results of four different input values are shown in Fig. 5.10 which confirm the proper
operation of the mod 16 µA circuit.
Table 5.7: Transistor sizes of the mod16µA circuit
Transistor   W/L (µm/µm)   Transistor   W/L (µm/µm)   Transistor   W/L (µm/µm)
M1           1/0.5         M7           0.66/0.5      M13          0.22/0.18
M2           1/0.5         M8           1/0.18        M14          0.9/0.18
M3           1/0.5         M9           1/0.18        M15          0.22/0.18
M4           0.65/0.18     M10          1/0.5         M16          0.9/0.18
M5           0.66/0.5      M11          1/0.5         M17          0.22/0.18
M6           0.66/0.5      M12          0.9/0.18
Figure 5.10: Simulation results of the mod16µA circuit
VLSI implementation of the neuron is shown in Fig. 5.11a. The implemented neuron is based on
the structure developed in [25], which realizes the sigmoid activation function. The output versus
input of the implemented neuron is shown in Fig. 5.11b. Furthermore, the transistor sizes of the
implemented neuron are shown in Table 5.8.
Figure 5.11: (a) VLSI implementation of the neuron (b) Output versus input of the implemented
neuron
Table 5.8: Transistor sizes of the neuron circuit
Transistor   W/L (µm/µm)   Transistor   W/L (µm/µm)
M1           1.5/0.5       M4           0.8/0.5
M2           0.25/0.5      M5           4/0.5
M3           4/0.5         M6           0.8/0.5
The last building block of the proposed Adaline is the RE unit. This process reduces the error
in the CVNS digits and can be shown in the following form [15]:
\[ ((x))_{nc} = \left\lfloor ((x))_n - \frac{((x))_{n-1}}{2} \right\rfloor + \frac{((x))_{n-1}}{2} \qquad (5.39) \]
where $((x))_n$ and $((x))_{n-1}$ are two adjacent CVNS digits and $((x))_{nc}$ is the corrected CVNS digit.
Since the radix used for the implementation is two, $((x))_n - \frac{((x))_{n-1}}{2}$ will always be less than two.
Therefore, $\left\lfloor ((x))_n - \frac{((x))_{n-1}}{2} \right\rfloor$ can be written in the following form:
\[ \left\lfloor ((x))_n - \frac{((x))_{n-1}}{2} \right\rfloor = \begin{cases} 0, & ((x))_n - \frac{((x))_{n-1}}{2} < 1 \\ 1, & ((x))_n - \frac{((x))_{n-1}}{2} \geq 1 \end{cases} \qquad (5.40) \]
Figure 5.12: VLSI implementation of the RE unit
Considering that a current of 8 µA represents one, (5.40) can be rewritten in the following form:
\[ \left\lfloor ((x))_n - \frac{((x))_{n-1}}{2} \right\rfloor = \begin{cases} 0, & ((x))_n - \frac{((x))_{n-1}}{2} < 8\,\mu\mathrm{A} \\ 8\,\mu\mathrm{A}, & ((x))_n - \frac{((x))_{n-1}}{2} \geq 8\,\mu\mathrm{A} \end{cases} \qquad (5.41) \]
VLSI implementation of the RE unit is shown in Fig. 5.12. The transistor sizes of the implemented circuit are shown in Table 5.9. The implemented circuit is composed of four main sections: a current subtractor, a current comparator, an inverter chain and an output generation stage.
Table 5.9: Transistor sizes of the RE circuit
Transistor   W/L (µm/µm)   Transistor   W/L (µm/µm)   Transistor   W/L (µm/µm)
M1           1/0.5         M9           1/0.5         M17          0.3/0.5
M2           0.5/0.5       M10          1/0.5         M18          0.3/0.5
M3           1/0.5         M11          1/0.5         M19          1/0.5
M4           1/0.5         M12          0.6/0.2       M20          1/0.5
M5           1/0.5         M13          0.25/0.2
M6           1/0.5         M14          0.6/0.2
M7           1/0.5         M15          0.25/0.2
M8           1/0.5         M16          1/0.5
The transistors M1 and M2 form a current mirror which copies the input $((x))_{n-1}$ to the current subtractor circuit. Considering that the W/L of the transistor M2 is half that of the transistor M1, $\frac{((x))_{n-1}}{2}$ is copied to the current subtractor circuit. The transistors M4 and M5 constitute a current mirror which copies the input $((x))_n$ to the current subtractor circuit. The transistors M5 to M7 act as a current subtractor that produces $((x))_n - \frac{((x))_{n-1}}{2}$, which is required for the calculation of (5.41). This difference is used as the input to the current comparator circuit formed by the transistors M8 to M11, which compares it with a current reference of 8 µA generated by the transistors M8 and M9. The output of the current comparator is connected to the inverter chain constituted by the transistors M12 to M15. The inverter chain generates a rail-to-rail voltage, which indicates whether $((x))_n - \frac{((x))_{n-1}}{2}$ is greater than 8 µA or not. The output of the inverter chain is connected to the CmpN input of the transistor M19. The transistors M16 to M19 generate an 8 µA current, provided that the CmpN input to the transistor M19 is high. This in turn implements the $\left\lfloor ((x))_n - \frac{((x))_{n-1}}{2} \right\rfloor$ term required for implementation of the RE unit. The size of the transistor M20 is the same as that of the transistor M3. Therefore, the current $\frac{((x))_{n-1}}{2}$ is copied to this transistor. The outputs of the transistors M19 and M20 are wired together, which performs the addition. Hence, $((x))_n^{c} = \left\lfloor ((x))_n - \frac{((x))_{n-1}}{2} \right\rfloor + \frac{((x))_{n-1}}{2}$ is generated at the output of the RE circuit. The simulation results of four different inputs to the RE circuit are shown in Fig. 5.13. These results confirm the proper operation of the implemented RE circuit.
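The RE operation of (5.39), with the 8 µA threshold of (5.41), can be illustrated with a small behavioural model. The sketch below is not the transistor-level circuit; it only mirrors the four steps described in the text (halving mirror, subtraction, comparison, wired summation). The function name `re_correct` and the sample current values are hypothetical and chosen for illustration only.

```python
def re_correct(x_n, x_nm1, one=8.0):
    """Behavioural sketch of the RE unit, following (5.39) and (5.41).

    x_n, x_nm1: currents of two adjacent CVNS digits, in microamperes.
    one:        current level that represents one (8 uA in the text).
    """
    half_prev = x_nm1 / 2.0                    # mirror with halved W/L
    diff = x_n - half_prev                     # current subtractor
    floor_term = one if diff >= one else 0.0   # comparator and switched source
    return floor_term + half_prev              # wired summation at the output


if __name__ == "__main__":
    # Hypothetical digit pairs: (x_n, x_{n-1}) in microamperes.
    for x_n, x_nm1 in [(3.0, 6.0), (9.0, 2.0), (9.3, 2.0), (12.5, 9.0)]:
        print(x_n, x_nm1, re_correct(x_n, x_nm1))
```

In this model an error-free digit is returned unchanged, while a digit that carries a small additive error is pulled back to the value implied by its lower neighbour, which is the behaviour the RE process is meant to provide.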
Figure 5.13: Simulation results of the RE circuit
Using the CVNS multiplier, the neuron, the RE and the digital register circuits, the proposed Adaline is laid out. The layout of the proposed Adaline is shown in Fig. 5.14.
Figure 5.14: Layout of the proposed Adaline
The area consumption of various cells required for different structures is summarized in Table
5.10. As can be seen from Table 5.10, the area requirement of digital register is drastically lower than
that of the analog memory. This may result in area reduction of the proposed Madaline compared
to all of the previous CVNS structures.
Table 5.10: Area consumption of different cells required for implementation of different structures
Cell Type                    Area (µm²)
8-bit MDAC Multiplier [21]   529
8-bit CVNS Multiplier        407.73
Neuron [18]                  62.04
Digital Register             53.22
Analog Memory [18]           713.18
The area consumption of different structures is calculated by the summation of the area con-
sumption of the different cells required for the implementation of that structure. Therefore, the
total number of different cells required for the previous Adalines, as well as the proposed Adaline,
is determined.
The total number of multipliers required for the lumped and distributed Adalines is equal to the number of inputs to the Adaline. However, the number of multipliers required for the previous CVNS Adalines, as well as the proposed Adaline, is equal to the product of the number of inputs and the number of CVNS digits.
In terms of the number of required neurons, the lumped Adaline requires one neuron, while the distributed and CVNS-DNN Adalines require one neuron for each input. On the other hand, the CVNS-RE, CVNS-FDNN and the proposed Adaline require one neuron for each multiplier. Thus, the total number of neurons in these Adalines is the same as the number of multipliers.
Regarding the weight storage, all previous Adalines require a weight storage medium as an input to each multiplier. Therefore, the number of weight storage elements is the product of the total number of multipliers and the precision of the weights. In the proposed Adaline, all multipliers corresponding to different CVNS digits representing the same input share the same weight storage element. Thus, the total number of weight storage elements in the proposed Adaline is equal to the product of the number of inputs and the precision of the weights. The total number of cells required for different Adalines is summarized in Table 5.11. Here, $N_{i-1}$ and $N_w$ are the number of inputs to the Adaline and the precision of the weight storage medium, respectively, while $D+1$ and $D_L+1$ are the number of CVNS digits in the corresponding Adalines.
Table 5.11: Total number of different cells required for implementation of Adalines used in different structures
Adaline Type       Multiplier (type, number)      Neuron                  Weight Storage Medium (type, number)
Lumped [19]        MDAC, N_{i-1}                  1                       register, N_{i-1} × N_w
Distributed [19]   MDAC, N_{i-1}                  N_{i-1}                 register, N_{i-1} × N_w
CVNS-RE [16]       CVNS, N_{i-1} × (D + 1)        N_{i-1} × (D + 1)       analog, N_{i-1} × (D + 1) × N_w
CVNS-DNN [17]      CVNS, N_{i-1} × (D + 1)        N_{i-1}                 analog, N_{i-1} × (D + 1) × N_w
CVNS-FDNN [17]     CVNS, N_{i-1} × (D + 1)        N_{i-1} × (D + 1)       analog, N_{i-1} × (D + 1) × N_w
Proposed Adaline   CVNS, N_{i-1} × (D_L + 1)      N_{i-1} × (D_L + 1)     register, N_{i-1} × N_w
As can be seen from Table 5.11, the number of weight storage elements in the proposed Adaline is
independent of the number of CVNS digits. Therefore, the number of weight storage elements in the
proposed Madaline will be independent of the number of CVNS digits as well. This, in combination
with the low area consumption requirement of digital registers, may result in significant reduction
in the area consumption of the proposed Madaline.
It should be noted that for the Madalines implemented here, $N_{i-1}$, $N_w$ and $D+1$ are equal to 5, 8 and 3, while $D_L+1$ is equal to 6. Therefore, the proposed Adaline requires 5 × 6 = 30 CVNS multipliers, while the previous CVNS Adalines require 5 × 3 = 15 CVNS multipliers. The lumped and distributed Adalines require 5 multipliers.
Concerning the total number of neurons, the proposed Adaline requires 30 neurons, while the
CVNS-RE and CVNS-FDNN Adalines each require 15 neurons. The CVNS-DNN Adaline, similar
to the distributed Adaline, requires 5 neurons, and the lumped Adaline requires 1 neuron.
With regard to the weight storage elements, the proposed Adaline, as well as the lumped and distributed Adalines, requires 40 digital registers, while all of the previous CVNS Adalines require 120 analog memory units. Therefore, the proposed Adaline requires more multipliers and neurons than the previous CVNS Adalines, while requiring fewer weight storage elements.
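The cell counts quoted above follow directly from the expressions in Table 5.11. As a quick check, the sketch below evaluates those expressions for the values used in this case study. The names `CellCount` and `cell_counts` and the string labels for the Adaline types are illustrative choices, not part of the original work; `digits` stands for $D+1$ (previous CVNS Adalines) or $D_L+1$ (proposed Adaline).

```python
from dataclasses import dataclass


@dataclass
class CellCount:
    multipliers: int
    neurons: int
    weight_cells: int


def cell_counts(kind, n_in, n_w, digits=1):
    """Cell counts of one Adaline, using the expressions of Table 5.11.

    n_in:   number of inputs, N_{i-1}
    n_w:    precision of the weights, N_w
    digits: number of CVNS digits (ignored for lumped and distributed)
    """
    if kind == "lumped":
        return CellCount(n_in, 1, n_in * n_w)
    if kind == "distributed":
        return CellCount(n_in, n_in, n_in * n_w)
    if kind in ("cvns-re", "cvns-fdnn"):
        return CellCount(n_in * digits, n_in * digits, n_in * digits * n_w)
    if kind == "cvns-dnn":
        return CellCount(n_in * digits, n_in, n_in * digits * n_w)
    if kind == "proposed":
        # All digits of one input share a single stored weight.
        return CellCount(n_in * digits, n_in * digits, n_in * n_w)
    raise ValueError(f"unknown Adaline type: {kind}")


if __name__ == "__main__":
    print(cell_counts("cvns-re", 5, 8, digits=3))
    print(cell_counts("proposed", 5, 8, digits=6))
```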
Using the number of different cells required for each Adaline, along with their area consumption summarized in Table 5.10, the area consumption of different Adaline structures is estimated in Table 5.12. The results summarized in this table show that the area requirement of the proposed Adaline is higher than that of the lumped and distributed structures, while it shows an 82% improvement when compared to the CVNS-DNN structure, which is the most area-efficient structure among all of the previous CVNS Adalines.
Table 5.12: Area consumption of different Adaline structures
Adaline Structure   Area (µm²)
Lumped              4,835.84
Distributed         5,084.00
CVNS-RE             92,628.15
CVNS-DNN            92,007.75
CVNS-FDNN           92,628.15
Proposed Adaline    16,221.90
To estimate the area consumption of different Madalines using the information provided in Table 5.12, the total number of Adalines required for each Madaline should be determined. In the three-layer Madaline implemented here, 5 Adalines are used in both the hidden layer and the output layer. The previously developed structures use 10 Adalines of the same type, while the proposed CVNS-distributed structure uses 5 distributed Adalines in the hidden layer and 5 proposed CVNS distributed Adalines in the output layer. Considering the number of Adalines used in different structures, and using the area consumption of different Adalines summarized in Table 5.12, the area consumption of different Madaline structures is calculated and summarized in Table 5.13. Equations (5.6), (5.8), (5.10), (5.12), (5.14) and (5.18) are used to calculate the NSR of the various Madaline structures. The results are summarized in Table 5.13. Furthermore, the area×NSR is listed for comparison.
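The area estimates can be reproduced by summing cell areas, as described in the text. The following sketch uses the cell areas of Table 5.10 and the cell counts stated above for this case study; the dictionary layout and the function names `adaline_area` and `madaline_area` are illustrative assumptions rather than anything defined in the original work.

```python
# Cell areas in um^2, copied from Table 5.10.
AREA = {"mdac": 529.0, "cvns": 407.73, "neuron": 62.04,
        "register": 53.22, "analog": 713.18}

# (multiplier type, multipliers, neurons, storage type, storage cells)
# for 5 inputs, 8-bit weights, 3 CVNS digits (previous CVNS Adalines)
# and 6 CVNS digits (proposed Adaline), as stated in the text.
ADALINES = {
    "lumped":      ("mdac", 5, 1, "register", 40),
    "distributed": ("mdac", 5, 5, "register", 40),
    "cvns-re":     ("cvns", 15, 15, "analog", 120),
    "cvns-dnn":    ("cvns", 15, 5, "analog", 120),
    "cvns-fdnn":   ("cvns", 15, 15, "analog", 120),
    "proposed":    ("cvns", 30, 30, "register", 40),
}


def adaline_area(name):
    """Area of one Adaline as the sum of its cell areas (um^2)."""
    mult, n_mult, n_neuron, store, n_store = ADALINES[name]
    return (n_mult * AREA[mult] + n_neuron * AREA["neuron"]
            + n_store * AREA[store])


def madaline_area(name, n_hidden=5, n_output=5):
    """Area of the three-layer Madaline of the case study (um^2).

    The proposed structure mixes distributed Adalines (hidden layer)
    with proposed CVNS Adalines (output layer); all other structures
    use the same Adaline type in both layers.
    """
    if name == "proposed":
        return (n_hidden * adaline_area("distributed")
                + n_output * adaline_area("proposed"))
    return (n_hidden + n_output) * adaline_area(name)


if __name__ == "__main__":
    for name in ADALINES:
        print(f"{name:12s} {adaline_area(name):12.2f} {madaline_area(name):12.1f}")
```

Comparing `adaline_area("proposed")` with `adaline_area("cvns-dnn")`, and the corresponding Madaline values, gives the relative area savings quoted in the text.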
The results summarized in Table 5.13 show that the area requirement of the proposed structure is drastically lower than that of the previous CVNS structures. Specifically, it demonstrates an 88% improvement when compared to the CVNS-DNN, which is the most area-efficient of the previous CVNS Madalines. The reduction in the area consumption of the proposed Madaline structure is a result of the decrease in the number of weight storage elements, together with the use of digital weight storage in the proposed architecture. In terms of the NSR and area×NSR, the CVNS-RE structure is the most efficient among all of the previously developed structures. Compared to the CVNS-RE structure, the proposed architecture provides 30.14 dB improvement in terms of NSR, while in terms of area×NSR, the proposed structure shows 47.74 dB improvement. Therefore, the proposed structure is more tolerant to input and weight errors, and requires less area for a specific NSR compared to the previous CVNS structures.
Table 5.13: Area consumption, NSR and area×NSR of different Madaline structures investigated in this case study
Madaline Structure   Number of Adalines   Area (µm²)   NSR (dB)   Area×NSR (dB)
Lumped               10                   48,358.4     -71.69     21.99
Distributed          10                   50,840.0     -88.67     5.44
CVNS-RE              10                   926,281.5    -118.74    0.59
CVNS-DNN             10                   920,077.5    -90.31     28.96
CVNS-FDNN            10                   926,281.5    -90.31     29.02
Proposed Madaline    10                   106,529.5    -148.88    -48.33
5.8 Conclusion
A new mixed-signal CVNS Adaline structure is proposed in this paper. This structure stores the
weights in digital registers while the arithmetic is based on the CVNS. In addition, the RE process
is used to decrease the error in the CVNS digits, which improves the NSR. Storing the weights
in digital registers provides a reliable and low complexity storage mechanism, at the same time
eliminating the need for the complex analog memory units, which are sensitive to process and power
supply variations, required for the implementation of the previous CVNS architectures. Moreover,
the proposed structure requires fewer weight storage elements than the previous
CVNS structures, which in turn results in a lower area overhead and reduced power consumption.
Combining the proposed Adaline with the distributed structure, the CVNS-distributed Madaline
structure is proposed in this paper. The mathematical analysis of the NSR and neuron×NSR of the
proposed Madaline structure is performed, and the comparison with the previous architectures is
conducted. The results show that the proposed Madaline structure compares favorably to all previous
architectures in terms of NSR and neuron×NSR. Furthermore, to have a circuit-level analysis, a
three-layer Madaline is implemented. The implementation results show that the proposed structure
improves upon the previous structures in terms of the NSR and the area consumption required for
a specific NSR.
The lower NSR provided by the proposed network decreases the effect of unavoidable weight
quantization error present in the hardware implementation, leading to a more robust architecture.
The increased efficiency of the proposed network is an indicator of a lower total number of neurons
required for a specific NSR. This in turn results in a robust neural network with a lower area overhead
and reduced power consumption.
5.9 References
[1] B. Zamanlooy and M. Mirhassani, “Efficient hardware implementation of threshold neural networks,” in New Circuits and Systems Conference (NEWCAS), 2012 IEEE 10th International, Jun. 2012, pp. 1–4.
[2] J. J. Martínez, J. Garrigós, J. Toledo, and J. M. Ferrández, “An efficient and expandable hardware implementation of multilayer cellular neural networks,” Neurocomputing, vol. 114, pp. 54–62, 2013.
[3] G. Zatorre, N. Medrano, M. Sanz, B. Calvo, P. Martinez, and S. Celma, “Designing adaptive
conditioning electronics for smart sensing,” IEEE Sensors J., vol. 10, no. 4, pp. 831–838, 2010.
[4] L. Gatet, H. Tap-Beteille, and M. Lescure, “Real-time surface discrimination using an analog
neural network implemented in a phase-shift laser rangefinder,” IEEE Sensors J., vol. 7, no. 10,
pp. 1381–1387, 2007.
[5] A. Basu, S. Shuo, H. Zhou, M. H. Lim, and G.-B. Huang, “Silicon spiking neurons for hardware
implementation of extreme learning machines,” Neurocomputing, vol. 102, no. 0, pp. 125 – 134,
2013.
[6] M. Valle, “Analog VLSI implementation of artificial neural networks with supervised on-chip
learning,” Analog Integr. Circuits Signal Process., vol. 33, no. 3, pp. 263–287, Dec. 2002.
[7] M. Stevenson, R. Winter, and B. Widrow, “Sensitivity of feedforward neural networks to weight
errors,” IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 71–80, 1990.
[8] Y. Xie and M. Jabri, “Analysis of the effects of quantization in multilayer neural networks using a statistical model,” IEEE Trans. Neural Netw., vol. 3, no. 2, pp. 334–338, Mar. 1992.
[9] C. Alippi, V. Piuri, and M. Sami, “Sensitivity to errors in artificial neural networks: a behavioral approach,” in IEEE International Symposium on Circuits and Systems (ISCAS), vol. 6, May-2 Jun. 1994, pp. 459–462.
[10] S. Piche, “The selection of weight accuracies for Madalines,” IEEE Trans. Neural Netw., vol. 6, no. 2, pp. 432–445, 1995.
[11] C. Alippi and L. Briozzo, “Accuracy vs. precision in digital VLSI architectures for signal processing,” IEEE Trans. Comput., vol. 47, no. 4, pp. 472–477, Apr. 1998.
[12] X. Zeng and D. Yeung, “Sensitivity analysis of multilayer perceptron to input and weight perturbations,” IEEE Trans. Neural Netw., vol. 12, no. 6, pp. 1358–1366, Nov. 2001.
[13] D. Yeung and X. Sun, “Using function approximation to analyze the sensitivity of MLP with antisymmetric squashing activation function,” IEEE Trans. Neural Netw., vol. 13, no. 1, pp. 34–44, Jan. 2002.
[14] S.-S. Yang, C.-L. Ho, and S. Siu, “Computing and analyzing the sensitivity of MLP due to the errors of the i.i.d. inputs and weights based on CLT,” IEEE Trans. Neural Netw., vol. 21, no. 12, pp. 1882–1891, Dec. 2010.
[15] A. Saed, M. Ahmadi, and G. Jullien, “A number system with continuous valued digits and modulo arithmetic,” IEEE Trans. Comput., vol. 51, no. 11, pp. 1294–1305, Nov. 2002.
[16] M. Mirhassani, M. Ahmadi, and G. Jullien, “Robust low-sensitivity Adaline neuron based on
continuous valued number system,” Analog Integrated Circuits and Signal Processing, vol. 56,
pp. 223–231, 2008.
[17] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “Resistive-type CVNS distributed neural networks with improved noise-to-signal ratio,” IEEE Trans. Circuits Syst. II, vol. 57, no. 10, pp. 793–797, Oct. 2010.
[18] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “A prototype CVNS distributed neural network using synapse-neuron modules,” IEEE Trans. Circuits Syst. I, vol. PP, no. 99, pp. 1–9, 2012.
[19] H. Djahanshahi, M. Ahmadi, G. Jullien, and W. Miller, “Quantization noise improvement in a hybrid distributed-neuron ANN architecture,” IEEE Trans. Circuits Syst. II, vol. 48, no. 9, pp. 842–846, Sep. 2001.
[20] B. Widrow and M. Lehr, “30 years of adaptive neural networks: perceptron, Madaline, and backpropagation,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1415–1442, 1990.
[21] M. Mirhassani, M. Ahmadi, and W. Miller, “A new mixed-signal feed-forward neural network with on-chip learning,” in Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on, vol. 3, Jul. 2004, pp. 1729–1734.
[22] K. Asanović and N. Morgan, “Experimental determination of precision requirements for back-propagation training of artificial neural networks,” in 2nd International Conference on Microelectronics for Neural Network, 1991, pp. 9–15.
[23] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2011.
[24] D. Freitas and K. Current, “CMOS current comparator circuit,” Electronics Letters, vol. 19,
no. 17, pp. 695–697, 1983.
[25] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “Analog implementation of a novel
resistive-type sigmoidal neuron,” IEEE Trans. VLSI Syst., vol. 20, no. 4, pp. 750–754, April
2012.
Chapter 6
Efficient VLSI Implementation of Neural
Networks with Hyperbolic Tangent
Activation Function
6.1 Introduction
Neural networks have a wide range of applications in analog and digital signal processing. Hardware
implementation of neural networks has been used in applications such as pattern recognition [1],
optical character recognition [2], test of analog circuits [3], real-time surface discrimination [4],
smart sensing [5] and identification of heavy ions [6].
The main building blocks needed for hardware implementation of neural networks are the multiplier, the adder and the nonlinear activation function. A lot of research has been done on digital implementation of multipliers and adders, which can be readily used, leaving the nonlinear activation function as the most complex building block.
To implement the neuron, various nonlinear activation functions such as threshold, sigmoid and hyperbolic tangent can be used. Hyperbolic tangent and sigmoid are mostly used because their differentiable nature makes them compatible with the backpropagation algorithm. Both activation functions have an S-shaped curve, while their output ranges are different. Because of the exponentiation and division terms present in the sigmoid and hyperbolic tangent activation functions, it is hard to realize the hardware implementation of these functions directly.
To solve the implementation problem, approximation methods are generally applied. These
methods are based on Piecewise Linear approximation (PWL), piecewise nonlinear approximation,
Lookup Table (LUT), bit-level mapping and hybrid methods. Generally, in PWL approximation
methods, the function is divided into segments and linear approximation is used in each segment.
This method is used in [7], [8] and [9] for the hyperbolic tangent and sigmoid function implemen-
tation. Another PWL approximation method is introduced in [10] and [11]. Unlike other PWL
methods, the developed method is not based on input domain segmentation and exploits lattice
algebra based Centred Recursive Interpolation (CRI) algorithm.
The piecewise nonlinear approximation is similar to the PWL method, with the difference that a nonlinear approximation is used in each segment. This method is used in [12] to approximate the sigmoid function, and scheme 4 of [9] is proposed for approximating both the sigmoid and the hyperbolic tangent.
In the LUT-based methods, the input range is divided into equal sub-ranges and each sub-range is approximated by a value stored in the LUT. This method is used in [13] to implement the hyperbolic tangent.
The bit-level mapping method approximates the output based on a direct bit-level mapping of the input. This method can be implemented using purely combinational circuits and is used in [14] to implement the sigmoid function.
Hybrid methods use a combination of the aforementioned methods. Examples include [15] and
[16] which have used a combination of PWL and LUT methods for hyperbolic tangent activation
function implementation.
The approximation error present in all methods affects the neural network performance. The
study performed in [17] and [18] shows that the nonlinear activation function implementation with
higher accuracy improves the learning and generalization capabilities of neural networks. However,
implementations with higher accuracy require more silicon area and decrease the network operation
speed. Therefore, having nonlinear activation function hardware structures with lower area and
higher speed for a specified accuracy becomes a key issue.
In general, PWL and piecewise nonlinear approximation methods need multiplications, while the LUT and bit-level mapping methods use no multipliers. An exception is the CRI-based method, which is a PWL approximation that requires no multipliers. However, it requires a large number of registers [10]. The choice of approximation method depends on the target implementation technology. The hardware implementation of neural networks is mostly done in FPGA or ASIC. Because current FPGAs provide a large number of multipliers, PWL and piecewise nonlinear approximation based methods are an appropriate selection for FPGA implementation. However, due to the high area requirements and delay of the multipliers in ASIC implementation, LUTs and bit-level mapping are more suitable.
The focus of this paper is on ASIC implementation of hyperbolic tangent function. Hyperbolic
tangent has an output range of [-1,1] and is defined as follows:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{6.1}$$
In the method proposed by Lin and Wang [8], PWL is used to approximate the first derivative of the hyperbolic tangent. Then, the first derivative approximation is integrated to obtain the hyperbolic tangent function. Leboeuf et al. [13] have proposed a new LUT-based structure for the hyperbolic tangent activation function. The proposed structure is based on the Range Addressable Lookup Table (RALUT) [19]. A hybrid architecture is proposed by Namin et al. [15], which uses a simple PWL approximation in combination with a RALUT. Another hybrid structure is proposed by Meher [16], which is based on a linear approximation in combination with a LUT. The values stored in the LUT are determined by the proposed boundary selection method.
In this paper a new hybrid architecture which is based on linear approximation in combina-
tion with bit-level mapping is proposed. The proposed architecture takes into account maximum
allowable error as the design parameter.
The proposed approximation scheme divides the input range into three different regions and uses a different strategy in each region. A mathematical analysis of the proposed approximation scheme in each region is provided.
The mathematical analysis shows that the proposed scheme requires fewer output bits for the same maximum error compared to the previous architectures. The hardware implementation of the proposed structure is realized in CMOS 0.18 µm to show the efficiency of the proposed structure in terms of area, delay and the product of area and delay compared to the previous architectures.
The proposed structure is used for implementing a 4-3-2 network in CMOS 0.18 µm. Post layout
simulation results show that the proposed structure results in a neural network implementation with
lower area, delay and power.
The rest of this paper is organized as follows. In the next section, the proposed approximation scheme is discussed. The mathematical analysis for selection of the minimum number of input and output bits is provided in section 6.3. The domain boundaries of different regions are found in section 6.4. The proposed structure based on the mathematical analysis is explained in section 6.5. Hardware implementation of the hyperbolic tangent function and comparison with existing structures are presented in section 6.6. Neural network implementation using the proposed structure is discussed in section 6.7. Finally, conclusions are drawn in section 6.8.
6.2 Proposed Approximation Scheme
In this section, mathematical analysis of approximation scheme used for hardware implementation of
hyperbolic tangent function is provided. The mathematical analysis in this section and the following
sections uses the basic properties of hyperbolic tangent function.
Hyperbolic tangent is an odd function.
tanh(−x) = − tanh(x) (6.2)
Using this property, only the absolute value of input is processed and the input sign is directly
passed to the output.
The Taylor series expansion of hyperbolic tangent is as follows:
$$\tanh(x) = x - \frac{x^{3}}{3} + \frac{2x^{5}}{15} - \frac{17x^{7}}{315} + \dots \tag{6.3}$$
For small values of x the higher order terms become small and can be ignored. Therefore, the hyperbolic tangent passes the small input values to output.
$$\lim_{x \to 0} \tanh(x) = x \tag{6.4}$$
The output variation for large values of input is low.
$$\lim_{x \to \infty} \frac{d \tanh(x)}{dx} = 0 \tag{6.5}$$
Considering the last two properties, the input range is divided into three regions. Region I, in which the output is approximately equal to the input, is named the pass region. Region III, in which the output variation is low, is named the saturation region. Region II covers the rest of the input range and is named the processing region. The determination of the boundary of each region is discussed in section 6.4. The different regions of the hyperbolic tangent function are shown in Fig. 6.1.
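The three properties that motivate the region split can be checked numerically. The sketch below is only an illustration using the standard-library `math.tanh`; the helper names `pass_error`, `saturation_error` and `tanh_slope` and the sample inputs are hypothetical, and the actual region boundaries are those derived in section 6.4, not the values used here.

```python
import math


def pass_error(x):
    """Error made when the input is passed to the output, |x - tanh(x)|."""
    return abs(x - math.tanh(x))


def saturation_error(x):
    """Distance between tanh(x) and its limiting value of 1."""
    return 1.0 - math.tanh(x)


def tanh_slope(x):
    """Derivative of tanh, 1 - tanh(x)^2, which tends to 0 for large x."""
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    for x in (0.05, 0.1, 0.25, 1.0, 2.0, 4.0):
        print(f"x={x:5.2f}  pass_error={pass_error(x):.6f}  "
              f"saturation_error={saturation_error(x):.6f}  "
              f"slope={tanh_slope(x):.6f}")
```

For small inputs the pass error is bounded by the first neglected Taylor term $x^3/3$ of (6.3), while for large inputs both the slope and the saturation error become small.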
6.2.1 Output Approximation in the Pass Region
Figure 6.1: Different regions of hyperbolic tangent function
The input and output of the hyperbolic tangent function are represented in signed-magnitude notation. Therefore, considering the first basic property of the hyperbolic tangent function discussed in the previous section, the input sign bit is passed directly to the output sign bit and only the absolute value of the input is processed. The absolute value of the input in binary format can be represented by the following equation:
$$x = \sum_{k=-N_f}^{N_i-1} x_k \times 2^{k} = \sum_{k=0}^{N_i-1} x_k \times 2^{k} + \sum_{k=-N_f}^{-1} x_k \times 2^{k} \tag{6.6}$$
where $N_i$ and $N_f$ are the number of bits for the integer and fractional parts of the input, and $x_k$ is a binary digit that can assume the values 0 or 1.
In the pass region, output is approximated by passing the input to the output which means that
a linear approximation is used in this region. The inputs in the pass region include the values close
to the origin which are represented by fractional part of the input.
The absolute value of hyperbolic tangent function output is in the range of [0,1] and can be
shown as follows:
$$\tanh(x) = \sum_{k=-N_{out}}^{-1} y_k \times 2^{k} = \sum_{k=-N_f}^{-1} y_k \times 2^{k} + \sum_{k=-N_{out}}^{-(N_f+1)} y_k \times 2^{k} \tag{6.7}$$
where $N_{out}$ is the number of bits used for representation of the absolute value of the output and $y_k$ can assume one of the values 0 or 1.
Using (6.6) and (6.7), yk is obtained as follows:
$$y_k = \begin{cases} x_k, & -N_f \le k \le -1 \\ 0, & -N_{out} \le k < -N_f \end{cases} \tag{6.8}$$
Based on (6.8), the fractional part of input is shifted to left by Nout−Nf bits and then is passed
to the output.
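The bit mapping of (6.8) can be sketched as an integer shift. In the snippet below the fractional part of the input is held as an unsigned integer of $N_f$ bits and the output as an unsigned integer of $N_{out}$ bits; the function name `pass_region_output` and the example word lengths are assumptions made for illustration.

```python
def pass_region_output(x_frac_bits, n_f, n_out):
    """Map the N_f fractional input bits to an N_out-bit output word, cf. (6.8).

    x_frac_bits: integer holding the fractional bits of |x|, so that
                 |x| = x_frac_bits * 2**-n_f (the integer part is zero).
    Returns the integer y such that the output value is y * 2**-n_out.
    """
    if n_out < n_f:
        raise ValueError("n_out must be at least n_f")
    if not 0 <= x_frac_bits < (1 << n_f):
        raise ValueError("x_frac_bits does not fit in n_f bits")
    # Bits k = -1 .. -n_f are copied, bits k < -n_f are set to zero.
    return x_frac_bits << (n_out - n_f)


if __name__ == "__main__":
    n_f, n_out = 5, 7           # example word lengths (assumed)
    x_bits = 0b00101            # |x| = 5/32
    y_bits = pass_region_output(x_bits, n_f, n_out)
    print(x_bits * 2.0 ** -n_f, y_bits * 2.0 ** -n_out)
```

Because the extra output bits are zero, the numerical value of the output equals that of the input, which is the linear (pass-through) approximation used in this region.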
6.2.2 Output Approximation in the Processing Region
Before describing the proposed approximation scheme in this region, a new parameter named $N_{one}$ is introduced. This parameter indicates the position of the first occurrence of a one in the binary input when scanned from the left. Therefore, based on this parameter, the input range can be written as follows:
$$2^{N_{one}} \le x < 2^{N_{one}+1} \tag{6.9}$$
This input range is divided into equal sub-ranges. The number of these sub-ranges, $N$, is given by the following equation:
$$N = 2^{(N_{one}+N_f-i)}, \qquad 0 \le i \le N_{one}+N_f$$
Based on the value of $N$, the sub-ranges within the input range are as follows:
$$2^{N_{one}}\left(1 + \frac{j}{N}\right) \le x < 2^{N_{one}}\left(1 + \frac{j+1}{N}\right), \qquad 0 \le j < N \tag{6.10}$$
To have an approximation value close to all outputs corresponding to an input sub-range, the average value of the outputs is used as the approximation value:
$$\frac{\displaystyle\sum_{k=0}^{2^{N_{one}+N_f}/N} \tanh\left(2^{N_{one}}\left(1 + \frac{j}{N}\right) + k \times 2^{-N_f}\right)}{2^{N_{one}+N_f}/N} \tag{6.11}$$
The total number of sub-ranges is found based on the fact that the difference between all values inside a sub-range and the approximation value found using (6.11) should be less than the maximum allowable approximation error in this region, which results in the following equation:
$$\left| \tanh(x) - \frac{\displaystyle\sum_{k=0}^{2^{N_{one}+N_f}/N} \tanh\left(2^{N_{one}}\left(1 + \frac{j}{N}\right) + k \times 2^{-N_f}\right)}{2^{N_{one}+N_f}/N} \right| < \varepsilon_a, \qquad 2^{N_{one}}\left(1 + \frac{j}{N}\right) \le x < 2^{N_{one}}\left(1 + \frac{j+1}{N}\right), \qquad 0 \le j < N \tag{6.12}$$
where $\varepsilon_a$ is the maximum allowable approximation error in the processing region. $\varepsilon_a$ depends on the maximum allowable error described in section 6.3.
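The sub-range selection described by (6.10) to (6.12) can be sketched as a small search over $N$. The code below is an illustrative sketch, not the author's implementation: it averages over the $2^{N_{one}+N_f}/N$ input grid points that fall inside each sub-range (an assumption about the summation limits), and the function names and the example parameters are hypothetical.

```python
import math


def subrange_average(n_one, n_f, n_sub, j):
    """Average of tanh over the input grid points of sub-range j, cf. (6.11)."""
    points = 2 ** (n_one + n_f) // n_sub      # grid points per sub-range
    start = 2.0 ** n_one * (1 + j / n_sub)    # lower edge of sub-range j
    step = 2.0 ** -n_f                        # input resolution
    return sum(math.tanh(start + k * step) for k in range(points)) / points


def max_subrange_error(n_one, n_f, n_sub):
    """Largest |tanh(x) - average| over all grid points, cf. (6.12)."""
    points = 2 ** (n_one + n_f) // n_sub
    step = 2.0 ** -n_f
    worst = 0.0
    for j in range(n_sub):
        avg = subrange_average(n_one, n_f, n_sub, j)
        start = 2.0 ** n_one * (1 + j / n_sub)
        for k in range(points):
            worst = max(worst, abs(math.tanh(start + k * step) - avg))
    return worst


def min_subranges(n_one, n_f, eps_a):
    """Smallest N = 2**(n_one + n_f - i) whose error stays below eps_a."""
    total = n_one + n_f
    for i in range(total, -1, -1):            # N = 1, 2, 4, ..., 2**total
        n_sub = 2 ** (total - i)
        if max_subrange_error(n_one, n_f, n_sub) < eps_a:
            return n_sub
    return None                               # only reached if eps_a <= 0


if __name__ == "__main__":
    # Example: inputs in [1, 2) with 5 fractional bits (assumed values).
    n = min_subranges(n_one=0, n_f=5, eps_a=0.02)
    print(n, max_subrange_error(0, 5, n))
```

Searching from the coarsest division upwards returns the smallest number of sub-ranges that meets the error budget, which corresponds to the fewest stored output values for that input range.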
6.2.3 Output Approximation in the Saturation Region
The hyperbolic tangent function reaches its maximum value in the saturation region, while at the same time output variation in this region is low. Therefore, all output values in this region are approximated by the maximum value representable by the output bits. Using (6.7), this value is equal to $1 - 2^{-N_{out}}$.
6.3 Selection of Number of Input and Output Bits
In this section, a mathematical analysis is presented which determines the minimum number of input and output bits required for hardware implementation of the proposed approximation scheme.
6.3.1 Selection of Number of Input Bits
The representation of the absolute value of the input in binary format is given by (6.6). The number of bits needed for the integer part depends on the input range. Therefore, to cover the range it is required to have:
$$2^{N_i} \ge r_i \tag{6.13}$$
where $r_i$ is the input range. Using (6.13), $N_i$ can be written in the following form:
$$N_i \ge \left\lceil \frac{\ln r_i}{\ln 2} \right\rceil \tag{6.14}$$
in which $\lceil \cdot \rceil$ is the ceiling function, which rounds its input up to the next integer.
In comparison, the number of bits used for the fractional part is determined by the maximum allowable error. It should be noted that the input region between two consecutive points $x_1$ and $x_2$ can be approximated as $\tanh(x_1)$ with an error lower than the maximum allowable error, provided that the following equation is satisfied:
$$\tanh(x_2) - \tanh(x_1) \le \varepsilon \tag{6.15}$$
where $\varepsilon$ is the maximum allowable error.
The change of the hyperbolic tangent between two consecutive inputs is proportional to the hyperbolic tangent derivative shown in Fig. 6.2. Therefore, the maximum change of the hyperbolic tangent function between two consecutive points occurs in the region which is close to the origin. Based on (6.4), the hyperbolic tangent output is approximately equal to its input in this region and therefore (6.15) can be simplified as follows:
$$x_2 - x_1 \le \varepsilon \tag{6.16}$$
Figure 6.2: Hyperbolic tangent function derivative
The difference between two consecutive points in the input is determined by the number of bits used for representing the fractional part of the input and is equal to $2^{-N_f}$. Thus, (6.16) can be written as follows:
$$2^{-N_f} \le \varepsilon \tag{6.17}$$
which results in the following equation:
$$N_f \ge \left\lceil \frac{-\ln \varepsilon}{\ln 2} \right\rceil \tag{6.18}$$
Using (6.14) and (6.18), it can be written:
$$N_{inp} = \left\lceil \frac{\ln r_i}{\ln 2} \right\rceil + \left\lceil \frac{-\ln \varepsilon}{\ln 2} \right\rceil \tag{6.19}$$
where $N_{inp}$ is the minimum number of input bits required for representation of the absolute value of the input.
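Equations (6.14), (6.18) and (6.19) translate directly into a few lines of code. The sketch below uses `math.log2`, which is equivalent to the ratio of natural logarithms in the equations; the function names and the example values of the input range and the maximum allowable error are assumptions chosen for illustration.

```python
import math


def integer_bits(r_i):
    """Minimum number of integer bits N_i for an input range r_i, cf. (6.14)."""
    return math.ceil(math.log2(r_i))


def fractional_bits(eps):
    """Minimum number of fractional bits N_f for a maximum error eps, cf. (6.18)."""
    return math.ceil(-math.log2(eps))


def input_bits(r_i, eps):
    """Minimum number of bits N_inp for the absolute value of the input, cf. (6.19)."""
    return integer_bits(r_i) + fractional_bits(eps)


if __name__ == "__main__":
    r_i, eps = 8, 0.04          # example range and error (assumed)
    print(integer_bits(r_i), fractional_bits(eps), input_bits(r_i, eps))
```

The sign bit of the signed-magnitude representation is not included in this count, since (6.19) refers to the absolute value of the input only.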
6.3.2 Selection of Number of Output Bits
As previously discussed, the hyperbolic tangent function is divided into three regions: the pass, processing and saturation regions. The number of bits required for output representation in these three regions, assuming a maximum allowable error of $\varepsilon$, depends on the properties of each region, as discussed below.
Pass Region
In the pass region, the input is passed to the output. The pass region is where the inputs are close to the origin. These points are represented by the fractional part of the input. Therefore, the number of output bits required in this region satisfies:
$$N_{out} \ge N_f \tag{6.20}$$
Processing Region
The output error in the processing region is composed of two elements. The first is the approximation error, while the second is the quantization error of representing the approximated output. The total error caused by these sources should not exceed the maximum allowable error:
$$\epsilon_a + \epsilon_q \le \epsilon \tag{6.21}$$
where $\epsilon_a$ is the maximum allowable approximation error and $\epsilon_q$ is the maximum quantization error of representing the approximated output.
The quantization error, $\epsilon_q$, is determined by the number of bits used for output representation. If rounding is used to quantize the output, the maximum quantization error equals half of the Least Significant Bit (LSB), that is, $2^{-(N_{out}+1)}$. Therefore, the maximum allowable approximation error can be obtained as follows:
$$\epsilon_a = \epsilon - 2^{-(N_{out}+1)} \tag{6.22}$$
It should be noted that the maximum allowable approximation error found through (6.22) was used in (6.12) in order to find the number of sub-ranges inside the processing region. Hence, a change in the number of output bits changes the number of sub-ranges in this region.
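The error budget in (6.21) and (6.22) can be sketched in a few lines of Python. The function name `approximation_error_budget` is hypothetical, and the guard against a non-positive budget is an added assumption: it reflects that no approximation is possible when half an LSB already reaches the allowed error.

```python
def approximation_error_budget(eps, n_out):
    """Maximum allowable approximation error from (6.22), assuming rounding."""
    eps_q = 2.0 ** -(n_out + 1)      # half an LSB of an n_out-bit fraction
    eps_a = eps - eps_q              # what remains for the approximation itself
    if eps_a <= 0:
        raise ValueError("n_out too small: quantization error alone reaches eps")
    return eps_a

if __name__ == "__main__":
    print(approximation_error_budget(0.04, 5))
```

With an allowed error of 0.04 and five output bits, the budget is 0.04 minus 2^-6, or 0.024375, which is the value rounded to 0.024 in the design example of Section 6.5.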
Saturation Region
The approximated value of the output in this region is equal to $1 - 2^{-N_{out}}$. This represents the maximum value of the hyperbolic tangent function with an error which is less than the maximum allowable error. This results in the following equation:
$$\left| 2^{-N_{out}} \right| \le \epsilon \tag{6.23}$$
which can be written in the following form:
$$N_{out} \ge \left\lceil \frac{-\ln \epsilon}{\ln 2} \right\rceil \tag{6.24}$$
By comparing (6.24) and (6.18), the minimum number of bits needed for output representation in the saturation region is equal to $N_f$.
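As a worked instance of (6.23) and (6.24), take $\epsilon = 0.04$, the value used in the design example of Section 6.5. Five output bits satisfy the saturation condition, while four do not:

```latex
2^{-5} = 0.03125 \le 0.04, \qquad 2^{-4} = 0.0625 > 0.04,
\qquad\text{so}\qquad
N_{out} \ge \left\lceil \frac{-\ln 0.04}{\ln 2} \right\rceil = \lceil 4.64 \rceil = 5 .
```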
The minimum number of bits needed for output representation in the pass and saturation regions is equal to $N_f$, while the processing region imposes no additional condition. Therefore, the minimum number of bits needed to represent the absolute value of the hyperbolic tangent function using the proposed approximation scheme is equal to $N_f$.
The number of output bits required in the proposed approximation scheme is lower than the number of input bits, whereas all previously developed architectures use the same number of input and output bits. This reduction in the number of output bits may result in a more efficient hardware implementation, which is investigated in the following sections.
6.4 Determining the Boundaries for Different Regions
In this section, the boundaries of each region are determined using the maximum allowable error as the design parameter.
6.4.1 Pass Region
Based on the Taylor series expansion of the hyperbolic tangent function, higher order terms can be ignored for small values of $x$. Therefore, the first three terms shown below are sufficient to represent the hyperbolic tangent function.
$$\tanh(x) \approx x - \frac{x^3}{3} + \frac{2x^5}{15} \tag{6.25}$$
Since in the pass region the input is passed to the output, the boundary of the pass region, $x_{pa}$, can be found using the following equation:
$$\left| \frac{x_{pa}^3}{3} - \frac{2x_{pa}^5}{15} \right| \le \epsilon \tag{6.26}$$
The $x_{pa}$ obtained should be rounded to the nearest lower value representable by the input bits. Therefore, the quantized value of $x_{pa}$ can be written in the following form:
$$x_{paq} = \frac{\left\lfloor x_{pa} \times \dfrac{2^{N_{inp}}}{r_i} \right\rfloor}{\dfrac{2^{N_{inp}}}{r_i}} \tag{6.27}$$
where $\lfloor \cdot \rfloor$ is the floor function and $0 \le x \le x_{paq}$ is considered as the pass region.
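Equations (6.26) and (6.27) do not give the pass boundary in closed form, so one way to evaluate it is numerically. The sketch below uses bisection, which is an assumption made here for illustration (the source does not state how the boundary was solved); the function name `pass_boundary` is likewise hypothetical. It assumes the allowed error is below 0.2, so that the root lies in the interval from 0 to 1, where the truncation error is increasing.

```python
import math

def pass_boundary(eps, n_inp, r_i):
    """Return (x_pa, x_paq): solve (6.26) with equality, then quantize down as in (6.27)."""
    def truncation_error(x):
        # Magnitude of the Taylor terms dropped when tanh(x) is replaced by x
        return x**3 / 3 - 2 * x**5 / 15

    lo, hi = 0.0, 1.0
    if truncation_error(hi) <= eps:
        raise ValueError("eps too large for the bracket [0, 1]")
    for _ in range(60):                      # bisection on an increasing function
        mid = (lo + hi) / 2
        if truncation_error(mid) <= eps:
            lo = mid
        else:
            hi = mid
    x_pa = lo
    scale = 2**n_inp / r_i                   # input codes per unit of x
    x_paq = math.floor(x_pa * scale) / scale # round down to a representable value
    return x_pa, x_paq

if __name__ == "__main__":
    print(pass_boundary(0.04, 8, 8))
```

For an allowed error of 0.04, eight input bits and an input range of 8, the unquantized boundary falls slightly above 0.51 and is quantized down to 0.5, the pass boundary used in the design example of Section 6.5.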
6.4.2 Saturation Region
The starting point of the saturation region is where the difference between the hyperbolic tangent function and its approximation becomes equal to the maximum allowable error. Therefore, the starting point of the saturation region, $x_s$, is found as follows:
$$x_s = \tanh^{-1}\left(1 - 2^{-N_{out}} - \epsilon\right) = \frac{1}{2} \ln\left( \frac{2 - 2^{-N_{out}} - \epsilon}{2^{-N_{out}} + \epsilon} \right) \tag{6.28}$$
The $x_s$ obtained should be rounded to the nearest higher value representable by the input bits. Therefore, the quantized value of $x_s$ can be written in the following form:
$$x_{sq} = \frac{\left\lceil x_s \times \dfrac{2^{N_{inp}}}{r_i} \right\rceil}{\dfrac{2^{N_{inp}}}{r_i}} \tag{6.29}$$
where $x \ge x_{sq}$ is considered as the saturation region.
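The saturation boundary of (6.28) and (6.29) can be evaluated directly with the inverse hyperbolic tangent from the Python standard library. The function name `saturation_boundary` is an assumption introduced for this sketch.

```python
import math

def saturation_boundary(eps, n_out, n_inp, r_i):
    """Return (x_s, x_sq) from (6.28) and (6.29)."""
    y = 1 - 2.0**-n_out - eps              # tanh value at which saturation may start
    x_s = math.atanh(y)                    # same as 0.5 * ln((1 + y) / (1 - y))
    scale = 2**n_inp / r_i                 # input codes per unit of x
    x_sq = math.ceil(x_s * scale) / scale  # round up to a representable value
    return x_s, x_sq

if __name__ == "__main__":
    print(saturation_boundary(0.04, 5, 8, 8))
```

With an allowed error of 0.04, five output bits, eight input bits and an input range of 8, the unquantized boundary is close to 1.649 and rounds up to 1.65625, the value reported in the design example of Section 6.5.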
6.4.3 Processing Region
The region between the pass and saturation regions is considered as the processing region, which can be shown as follows:
$$x_{paq} < x_{pr} < x_{sq} \tag{6.30}$$
where $x_{pr}$ is an input in the processing region.
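Taken together, (6.27), (6.29) and (6.30) partition the input magnitude into three regions. A minimal sketch of that partition follows; the function name `classify_region` is hypothetical, and working on the absolute value is an assumption based on the signed-magnitude representation used elsewhere in the chapter.

```python
def classify_region(x, x_paq, x_sq):
    """Classify an input by magnitude into the pass, processing or saturation region."""
    ax = abs(x)                  # the sign is carried separately (signed magnitude)
    if ax <= x_paq:
        return "pass"            # 0 <= |x| <= x_paq
    if ax >= x_sq:
        return "saturation"      # |x| >= x_sq
    return "processing"          # x_paq < |x| < x_sq, as in (6.30)

if __name__ == "__main__":
    for x in (0.25, 1.0, -3.0):
        print(x, classify_region(x, 0.5, 1.65625))
```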
6.5 Proposed Structure
Block diagram of the proposed structure is shown in Fig. 6.3. The hardware is composed of two
main blocks including hyperbolic tangent approximation and output assignment.
6.5.1 Hyperbolic Tangent Approximation
This block is composed of three main blocks to approximate the hyperbolic tangent function in all
three regions including saturation, processing and pass region. General arithmetic operations in
each region can be described as follows:
Pass Region
In this region fractional part of input is passed to the output. Based on (6.8), a shift to left by
Nout −Nf bits before passing the input to output is required.
Figure 6.3: Block diagram of the proposed structure
Processing Region
For inputs in the processing region, a bit-level input mapping is required. The number of bit-level mapping blocks required is equal to the number of input ranges in this region. For each input range in the processing region, the $\log_2 N$ bits after the $N_{one}$ bit of the input should be mapped to the output bits using the bit-level mapping. Using the $\log_2 N$ bits after the $N_{one}$ bit covers all sub-ranges. The number of sub-ranges ($N$) is calculated using (6.12), while the output of each sub-range is found using (6.11). The bit-level mapping can be implemented using a combinational circuit.
Saturation Region Approximation
In this region hyperbolic tangent function is approximated by the maximum value representable by
output bits, and can be realized by setting all output bits to one.
6.5.2 Output Assignment
The input range decoder detects $N_{one}$, introduced previously, which identifies the input range and therefore the region of operation. Depending on the detected input range, a multiplexer selects the appropriate output value.
To illustrate the proposed approximation scheme and structure, an example is presented. The
example shows different steps of the design procedure.
Example: Design procedure for $\epsilon = 0.04$ considering an input range of (-8,8)
1) Determining the number of input and output bits: using (6.19) we have $N_{inp} = 8$. The minimum number of output bits required is equal to $N_f$. Using (6.18) we have $N_{out} = 5$.
2) Determining the boundaries of pass, processing and saturation regions: Using (6.26), (6.27), (6.28), (6.29) and (6.30), the pass, saturation and processing region boundaries are found to be $x_{paq} = 0.5$, $x_{sq} = 1.65625$ and $0.5 < x_{pr} < 1.65625$.
3) Output assignment in pass region: In this region, the fractional part of the input is shifted to the left by $N_{out} - N_f$ bits and passed to the output. Therefore, no shift is required in this example and the fractional part of the input is directly passed to the output.
4) Output assignment in saturation region: In the saturation region, the output value is equal to $1 - 2^{-5}$ or 0.96875.
5) Output assignment in processing region: First, the maximum allowable approximation error is found using (6.22), which is approximately equal to 0.024. Then, the number of sub-ranges, $N$, is found using (6.12). Finally, the sub-ranges are found using (6.10) and the appropriate value of each sub-range is assigned using (6.11).
Table 6.1 summarizes these values for different input ranges and sub-ranges.
Also, the approximation error, quantization error and total error for the considered case are shown in Figs. 6.4a to 6.4c. The approximated output and ideal output are shown in Fig. 6.4d.
6) Designing the proposed structure: As can be seen from Table 6.1, there are five different input ranges in the considered example. The input range is detected by the input range decoder.
The input range decoder detects the input range using $N_{one}$. Table 6.2 shows the input range decoder truth table, which can be implemented using a fully combinational circuit.
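The truth table of Table 6.2 behaves like a priority encoder on the four most significant magnitude bits. The Python sketch below mirrors that behavior in software for illustration; it is not the Verilog used in this work. The helper names `to_code` and `decode_input_range`, the 8-bit code layout (three integer bits followed by five fractional bits) and the use of truncation when forming the code are assumptions of the sketch.

```python
def to_code(x):
    """Quantize |x| to an 8-bit code with 5 fractional bits (truncation assumed)."""
    return min(int(abs(x) * 32), 255)

def decode_input_range(code):
    """Return the input range label of Table 6.2 for an 8-bit magnitude code."""
    x2 = (code >> 7) & 1     # weight 4
    x1 = (code >> 6) & 1     # weight 2
    x0 = (code >> 5) & 1     # weight 1
    xm1 = (code >> 4) & 1    # weight 0.5
    if x2:
        return "r1"          # 4 <= x < 8
    if x1:
        return "r2"          # 2 <= x < 4
    if x0:
        return "r3"          # 1 <= x < 2
    if xm1:
        return "r4"          # 0.5 <= x < 1
    return "r5"              # 0 <= x < 0.5

if __name__ == "__main__":
    for x in (5.0, 3.0, 1.5, 0.75, 0.25):
        print(x, decode_input_range(to_code(x)))
```

Only the position of the first one in the code matters, so the don't-care entries of Table 6.2 correspond to the early returns above.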
The input range 0 < x < 0.5 ($r_5$) is in the pass region. Considering that $N_{out}$ is equal to $N_f$, no shift is required and the hyperbolic tangent is approximated by passing the fractional part of the input to the output directly.
The input ranges 0.5 < x < 1 and 1 < x < 2 ($r_4$ and $r_3$) are in the processing region. Therefore, a bit-level mapping is required for the hyperbolic tangent approximation in these input ranges.
Table 6.1: Output value for different input ranges and sub-ranges
Input Range          Input Sub-Range        Output Value
x_i1     x_i2        x_s1       x_s2
4        8           -          -           0.96875
2        4           -          -           0.96875
1        2           1.875      2           0.96875
                     1.75       1.875       0.96875
                     1.625      1.75        0.9375
                     1.5        1.625       0.90625
                     1.375      1.5         0.90625
                     1.25       1.375       0.875
                     1.125      1.25        0.84375
                     1          1.125       0.78125
0.5      1           0.9375     1           0.75
                     0.875      0.9375      0.71875
                     0.8125     0.875       0.6875
                     0.75       0.8125      0.65625
                     0.6875     0.75        0.625
                     0.625      0.6875      0.5625
                     0.5625     0.625       0.53125
                     0.5        0.5625      0.5
0        0.5         -          -           Input
In the input range 1 < x < 2, $N_{one}$ is equal to 1 and the number of sub-ranges ($N$) is equal to 8. Therefore, a bit-level mapping on the $\log_2 8 = 3$ bits after the $N_{one}$ bit of the input is required to generate the output. Table 6.3 shows the required bit-level mapping.
Figure 6.4: (a) approximation error (b) quantization error (c) total error (d) ideal and approximated output
For the input range 0.5 < x < 1, $N_{one}$ is equal to 0 and the number of sub-ranges ($N$) is equal to 8. Therefore, a bit-level mapping on the $\log_2 8 = 3$ bits after the $N_{one}$ bit of the input is required to generate the output. Table 6.4 shows the required bit-level mapping.
These tables can be implemented using a purely combinational circuit.
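In software, the two mapping tables reduce to lookups indexed by the three bits that follow the leading one. The sketch below copies the output codes from Tables 6.3 and 6.4; the function name `processing_output` and the 8-bit code layout with five fractional bits are assumptions of this illustration, which stands in for the combinational logic rather than reproducing it.

```python
# Output codes in units of 2^-5, copied row by row from Tables 6.3 and 6.4
R3_MAP = [0b11001, 0b11011, 0b11100, 0b11101, 0b11101, 0b11110, 0b11111, 0b11111]
R4_MAP = [0b10000, 0b10001, 0b10010, 0b10100, 0b10101, 0b10110, 0b10111, 0b11000]

def processing_output(code):
    """Map an 8-bit magnitude code (5 fractional bits) in 0.5 <= x < 2 to the output value."""
    if not 16 <= code < 64:
        raise ValueError("code is outside the input ranges 0.5 <= x < 2")
    if code >= 32:                       # leading one has weight 1: range 1 <= x < 2
        index = (code >> 2) & 0b111      # the three bits after the leading one
        y = R3_MAP[index]
    else:                                # leading one has weight 0.5: range 0.5 <= x < 1
        index = (code >> 1) & 0b111
        y = R4_MAP[index]
    return y / 32

if __name__ == "__main__":
    print(processing_output(int(1.3 * 32)), processing_output(int(0.7 * 32)))
```

For example, an input of 1.3 falls in the sub-range from 1.25 to 1.375 and maps to 0.875, while 0.7 falls in the sub-range from 0.6875 to 0.75 and maps to 0.625, as listed in Table 6.1.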
In the input ranges 2 < x < 4 and 4 < x < 8 ($r_2$ and $r_1$), the input is in the saturation region and the hyperbolic tangent is approximated by setting all output bits to 1.
The multiplexer assigns the output value using the input range decoder. For each input range, the multiplexer transfers the approximation in that range to the output.
The hardware implementation of the considered example is shown in Fig. 6.5. It should be noted that the bit-level mapping needed for input ranges $r_3$ and $r_4$ is implemented using Tables 6.3 and 6.4, while the input range decoder is implemented based on Table 6.2.
Table 6.2: Input range decoder
Input Range   x_i1   x_i2   x_2   x_1   x_0   x_-1
r1            4      8      1     X     X     X
r2            2      4      0     1     X     X
r3            1      2      0     0     1     X
r4            0.5    1      0     0     0     1
r5            0      0.5    0     0     0     0
6.6 Hardware Implementation of the Hyperbolic Tangent Function and Comparison with Existing Structures
The proposed structure in the previous section is implemented with maximum allowable errors of
0.02 and 0.04. The proposed structure is coded using Verilog hardware description language and
synthesized by Synopsys Design Compiler using TSMC 0.18 µm library.
Meher [16] synthesized his proposed structure using a TSMC 90 nm library, while the synthesis of all other previously developed architectures was done using a 0.18 µm library. To have a fair comparison between all architectures, we have coded the design of Meher [16] in Verilog and synthesized it using the TSMC 0.18 µm library. It should be noted that signed-magnitude notation is used for input and output representation.
The comparison of different structures for $\epsilon = 0.04$ and $\epsilon = 0.02$ is summarized in Tables 6.5 and 6.6. These tables include the input range, number of input bits, number of output bits, maximum error after design, and the synthesis results, which are area and delay. The maximum error after design is evaluated for $10^6$ points uniformly distributed in the input range [12]. Also, to compare the number of cells required for different designs, the gate count, which is the design area normalized with respect to the area of a two-input NAND gate, is included. Moreover, considering that both area and delay are important in hardware design, the area×delay product is also included in these tables.
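The maximum-error evaluation described above can be reproduced in software with a behavioral model of the design example. The sketch below is an illustration under stated assumptions, not the Verilog implementation: the names `tanh_approx` and `max_error` are hypothetical, the input is truncated to eight magnitude bits with five fractional bits, the sign is handled separately, and the mapping codes are copied from Tables 6.3 and 6.4.

```python
import math

R3_MAP = [25, 27, 28, 29, 29, 30, 31, 31]   # Table 6.3 outputs, in units of 2^-5
R4_MAP = [16, 17, 18, 20, 21, 22, 23, 24]   # Table 6.4 outputs, in units of 2^-5

def tanh_approx(x):
    """Behavioral model of the design example (max error 0.04, input range (-8, 8))."""
    sign = -1.0 if x < 0 else 1.0
    code = min(int(abs(x) * 32), 255)        # 8-bit magnitude, truncation assumed
    if code >= 64:                           # ranges r1 and r2: saturation
        y = 31                               # all five output bits set to one
    elif code >= 32:                         # range r3: bit-level mapping
        y = R3_MAP[(code >> 2) & 7]
    elif code >= 16:                         # range r4: bit-level mapping
        y = R4_MAP[(code >> 1) & 7]
    else:                                    # range r5: pass the fractional bits
        y = code
    return sign * y / 32

def max_error(n_points=10**6, lo=-8.0, hi=8.0):
    """Largest absolute error over n_points uniformly spaced inputs."""
    step = (hi - lo) / (n_points - 1)
    return max(abs(math.tanh(lo + k * step) - tanh_approx(lo + k * step))
               for k in range(n_points))

if __name__ == "__main__":
    print(max_error())
```

Under these assumptions the largest deviation occurs just above an input magnitude of 0.5, where the output is 0.5 while the hyperbolic tangent is about 0.462, so the model stays below the allowed error of 0.04 and is consistent with the maximum error reported for the proposed structure in Table 6.5.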
The method used by Lin and Wang [8] is based on PWL approximation, which requires multiplication and therefore has a high area requirement and delay.
Figure 6.5: Hardware implementation of the considered example
The other three previously proposed architectures use LUTs. In [13], two LUT-based structures are proposed to implement the hyperbolic tangent function. In the first structure, 512 and 1024 points are stored in the LUT for errors of $\epsilon = 0.04$ and $\epsilon = 0.02$, respectively. The second structure is based on RALUT. Using RALUT, the number of stored points for errors of $\epsilon = 0.04$ and $\epsilon = 0.02$ is reduced to 61 and 127, respectively. The reduction in the number of stored points reduces the area consumption.
In the architecture proposed by Namin et al. [15], a simple PWL approximation in combination with RALUT is used. The RALUT stores the difference between the PWL approximation and the hyperbolic tangent function, which results in a reduction in the number of stored points compared to the RALUT used in [13]. This reduction lowers the required area, but the subtraction present in that architecture increases its delay.
Meher [16] has proposed an optimized LUT-based architecture. The proposed architecture is based on linear approximation in combination with LUT and requires 7 and 15 stored points for errors of $\epsilon = 0.04$ and $\epsilon = 0.02$, respectively. The reduction in the number of stored points reduces the area required for hardware implementation.
Table 6.3: Bit-level mapping for the input range 1 < x < 2
x_0   x_-1  x_-2    y_-1  y_-2  y_-3  y_-4  y_-5
0     0     0       1     1     0     0     1
0     0     1       1     1     0     1     1
0     1     0       1     1     1     0     0
0     1     1       1     1     1     0     1
1     0     0       1     1     1     0     1
1     0     1       1     1     1     1     0
1     1     0       1     1     1     1     1
1     1     1       1     1     1     1     1
Table 6.4: Bit-level mapping for the input range 0.5 < x < 1
x_-1  x_-2  x_-3    y_-1  y_-2  y_-3  y_-4  y_-5
0     0     0       1     0     0     0     0
0     0     1       1     0     0     0     1
0     1     0       1     0     0     1     0
0     1     1       1     0     1     0     0
1     0     0       1     0     1     0     1
1     0     1       1     0     1     1     0
1     1     0       1     0     1     1     1
1     1     1       1     1     0     0     0
Our proposed structure is based on a linear approximation in combination with bit-level mapping, which removes the need to store points and can be implemented using a purely combinational circuit. Also, the number of output bits required in the proposed structure is lower than in all previously proposed architectures, which reduces the area consumption.
Table 6.5: Comparison of different structures for ε = 0.04
Structure            Input Range  N_inp  N_out  Maximum Error  Area (µm²)  Gate Count  Delay (ns)  Area × Delay (µm²×ns)
Scheme-1 [8]         (-8,8)       24     24     0.0430         32069.83    3214        903         2.896 × 10^7
LUT [13]             (-8,8)       8      8      0.0365         9045.94     907         2.15        1.944 × 10^4
RALUT [13]           (-8,8)       8      8      0.0357         7090.40     711         1.85        1.311 × 10^4
Hybrid [15]          (-8,8)       8      8      0.0361         3646.83     366         2.31        8.424 × 10^3
Optimized LUT [16]   (-8,8)       8      8      0.0401         954.67      96          2.09        1.995 × 10^3
Proposed             (-8,8)       8      5      0.0378         695.22      70          0.95        6.604 × 10^2
In addition, the simple input range decoding method, which only uses the position of the first occurrence of a one in the binary input, in combination with bit-level mapping provides a high-speed structure. This is confirmed by the synthesis results in Tables 6.5 and 6.6: for $\epsilon = 0.04$ the proposed structure has the lowest area, delay and area×delay of all compared structures, and for $\epsilon = 0.02$ it has the lowest area and area×delay while matching the lowest delay among the previous structures.
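To make the comparison concrete, the relative savings against the closest competitor, the optimized LUT of [16], can be computed from the area and delay entries of Tables 6.5 and 6.6. The helper names below are hypothetical; the numbers are copied from the tables.

```python
# (area in um^2, delay in ns), copied from Tables 6.5 and 6.6
TABLE_6_5 = {"Optimized LUT [16]": (954.67, 2.09), "Proposed": (695.22, 0.95)}
TABLE_6_6 = {"Optimized LUT [16]": (1603.32, 2.82), "Proposed": (1280.66, 2.12)}

def reduction_percent(reference, proposed):
    """Percentage by which `proposed` is lower than `reference`."""
    return 100.0 * (1.0 - proposed / reference)

def compare(table):
    """Return percentage reductions in area, delay and area x delay."""
    ref_area, ref_delay = table["Optimized LUT [16]"]
    area, delay = table["Proposed"]
    return {
        "area": reduction_percent(ref_area, area),
        "delay": reduction_percent(ref_delay, delay),
        "area_delay": reduction_percent(ref_area * ref_delay, area * delay),
    }

if __name__ == "__main__":
    for label, table in (("eps = 0.04", TABLE_6_5), ("eps = 0.02", TABLE_6_6)):
        print(label, {k: round(v, 1) for k, v in compare(table).items()})
```

From these entries, the proposed structure needs roughly 27% less area and 55% less delay than the optimized LUT at an allowed error of 0.04, and roughly 20% less area and 25% less delay at an allowed error of 0.02.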
6.7 Neural Network Implementation Using the Proposed Structure for Hyperbolic Tangent Activation Function
General configuration of a Madaline is shown in Fig. 6.6. The Madaline has L + 1 layers with N0
inputs, and Ni Adalines in each layer i. Multiplication and addition are the arithmetic operations
required in neural network implementation.
The output of the activation function should be multiplied by the weights. Therefore, the multiplier size, $S_m$, in the hidden layers of the neural network can be written in the following form:
$$S_m = N_w \times N_{out} \tag{6.31}$$
where $N_w$ is the number of bits used for synaptic weight storage, while $N_{out}$ is the number of output bits of the activation function.
Table 6.6: Comparison of different structures for ε = 0.02
Structure            Input Range  N_inp  N_out  Maximum Error  Area (µm²)  Gate Count  Delay (ns)  Area × Delay (µm²×ns)
Scheme-1 [8]         (-8,8)       24     24     0.0220         83559.17    8374        1293        1.080 × 10^8
LUT [13]             (-8,8)       9      9      0.0180         17864.24    1791        2.45        4.377 × 10^4
RALUT [13]           (-8,8)       9      9      0.0178         11871.53    1190        2.12        2.517 × 10^4
Hybrid [15]          (-8,8)       9      9      0.0189         5130.78     515         2.80        1.437 × 10^4
Optimized LUT [16]   (-8,8)       9      9      0.0205         1603.32     161         2.82        4.521 × 10^3
Proposed             (-8,8)       9      6      0.0196         1280.66     129         2.12        2.714 × 10^3
The multiplication results of synapses connected to an Adaline in the next layer should be added
before passing through the activation function of that layer. Considering (6.31), the number of
output bits of multipliers is equal to Nw + Nout. Therefore the size of adders, Sa, required for
addition of multiplication results of Ni synapses between layer i and i + 1 can be written in the
following form:
$$S_a = (N_w + N_{out}) \times N_i \tag{6.32}$$
Therefore, the size of the multipliers and adders in the hidden layers of the neural network depends on the number of output bits of the activation function.
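The effect of the activation output width on the arithmetic units can be illustrated by evaluating (6.31) and (6.32) for two output widths. In the sketch below the function names are hypothetical, and the weight width of 8 bits and the 3 synapses per neuron are assumed values chosen only for illustration; they are not taken from the implemented network.

```python
def multiplier_size(n_w, n_out):
    """Multiplier size S_m from (6.31)."""
    return n_w * n_out

def adder_size(n_w, n_out, n_i):
    """Adder size S_a from (6.32) for n_i synapses feeding one neuron."""
    return (n_w + n_out) * n_i

if __name__ == "__main__":
    n_w, n_i = 8, 3                      # assumed values, for illustration only
    for n_out in (8, 5):                 # 8 output bits versus 5 output bits
        print(n_out, multiplier_size(n_w, n_out), adder_size(n_w, n_out, n_i))
```

With these assumed values, reducing the activation output from 8 to 5 bits lowers the multiplier size measure from 64 to 40 and the adder size measure from 48 to 39.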
For a specific maximum allowable error, the proposed structure requires fewer output bits than the previously developed architectures. Therefore, the bit width of the multipliers and adders in the hidden layers of a network using the proposed structure as its activation function is lower. Multipliers and adders with lower bit width have lower area, delay and power consumption. Therefore, using the proposed structure results in an efficient VLSI implementation of neural networks with hyperbolic tangent activation function.
To evaluate the efficiency of the proposed structure, it is used to implement a 4-3-2 network for an optical template matching application. The general neural network block diagram is shown in Fig. 6.7. It is capable of recognizing six different input patterns and classifying them into four different classes. The optical input patterns and their related classes are shown in Fig. 6.8.
Figure 6.6: Three layer Madaline general configuration
Figure 6.7: Block diagram of the implemented network
The network is coded using the Verilog hardware description language and synthesized by Synopsys Design Compiler using the TSMC 0.18 µm library. The signal processing in the implemented network is based on fixed-point arithmetic, and the proposed structure implemented in the previous section, with maximum allowable errors of $\epsilon = 0.02$ and $\epsilon = 0.04$, is used as the activation function. The network training is done off-chip and the calculated weights are stored in registers inside the chip.
The same network is also coded and synthesized using the structure proposed by Meher [16] with maximum allowable errors of $\epsilon = 0.02$ and $\epsilon = 0.04$.
Post-layout simulation results show that the networks implemented using the proposed structure and the one proposed by Meher [16] perform well for both $\epsilon = 0.02$ and $\epsilon = 0.04$.
The post-layout simulation results of the implemented network are summarized in Table 6.7, which shows that implementing the network using the proposed structure results in an efficient VLSI implementation in terms of area, delay and power for both maximum allowable errors of $\epsilon = 0.02$ and $\epsilon = 0.04$.
Figure 6.8: Optical input patterns and their related class
6.8 Conclusion
A new approximation scheme for the hyperbolic tangent is proposed in this chapter. The proposed approximation scheme is based on a mathematical analysis considering the maximum allowable error as the design parameter.
Based on the proposed approximation scheme, a hybrid architecture for hardware implementation
of hyperbolic tangent activation function is presented. The synthesis results show that the proposed
structure compares favorably to the previously developed architectures in terms of area, delay and
area × delay.
The proposed structure requires fewer output bits for the same maximum allowable error compared to the previously developed architectures. The reduction in the number of activation function output bits results in multipliers and adders with lower bit width, which in turn reduces the area, power and delay in VLSI implementation of neural networks. The proposed structure is used for implementing a 4-3-2 network which is capable of recognizing six different input patterns. Post-layout simulation results show that the proposed structure results in an efficient neural network VLSI implementation in terms of area, delay and power.
Table 6.7: Comparison of network implementation
Activation Function   Maximum Error  Area (µm²)  Gate Count  Delay (ns)  Power (mW)
Proposed              0.04           121284.94   12154       7.72        8.98
Optimized LUT [16]    0.04           156957.42   15729       11.22       11.21
Proposed              0.02           144451.59   14476       8.68        9.34
Optimized LUT [16]    0.02           181124.80   18151       11.81       11.93
6.9 References
[1] V. Koosh and R. Goodman, “Analog VLSI neural network with digital perturbative learning,”
IEEE Trans. Circuits Syst. II, vol. 49, no. 5, pp. 359–368, May 2002.
[2] D. Kim, H. Kim, H. Kim, G. Han, and D. Chung, “A SIMD neural network processor for image
processing,” in Advances in Neural Networks, vol. 3497, pp. 815–815, 2005.
[3] D. Maliuk, H.-G. Stratigopoulos, and Y. Makris, “An analog VLSI multilayer perceptron and
its application towards built-in self-test in analog circuits,” in 2010 IEEE 16th International
On-Line Testing Symposium (IOLTS) , Jul. 2010, pp. 71–76.
[4] L. Gatet, H. Tap-Beteille, and M. Lescure, “Real-time surface discrimination using an analog
neural network implemented in a phase-shift laser rangefinder,” IEEE Sensors J., vol. 7, no. 10,
pp. 1381–1387, Oct. 2007.
[5] G. Zatorre, N. Medrano, M. Sanz, B. Calvo, P. Martinez, and S. Celma, “Designing adaptive
conditioning electronics for smart sensing,” IEEE Sensors J., vol. 10, no. 4, pp. 831–838, Apr.
2010.
[6] R. Jimenez, M. Sanchez-Raya, J. Gomez-Galan, J. Flores, J. Duenas, and I. Martel,
“Implementation of a neural network for digital pulse shape analysis on a FPGA for on-line
identification of heavy ions,” Nuclear Instruments and Methods in Physics Research Section A,
vol. 674, pp. 99–104, 2012.
[7] A. Armato, L. Fanucci, E. Scilingo, and D. D. Rossi, “Low-error digital hardware
implementation of artificial neuron activation functions and their derivative,” Microprocessors
and Microsystems, vol. 35, no. 6, pp. 557–567, 2011.
[8] C. W. Lin and J. S. Wang, “A digital circuit design of hyperbolic tangent sigmoid function for
neural networks,” in IEEE International Symposium on Circuits and Systems (ISCAS), May
2008, pp. 856–859.
[9] S. Vassiliadis, M. Zhang, and J. Delgado-Frias, “Elementary function generators for neural-
network emulators,” IEEE Trans. Neural Netw., vol. 11, no. 6, pp. 1438–1449, Nov. 2000.
[10] K. Basterretxea, J. Tarela, and I. del Campo, “Digital design of sigmoid approximator for
artificial neural networks,” Electronics Letters, vol. 38, no. 1, pp. 35–37, Jan. 2002.
[11] K. Basterretxea, J. Tarela, and I. del Campo, “Approximation of sigmoid function and the
derivative for hardware implementation of artificial neurons,” IEE Proceedings - Circuits,
Devices and Systems, vol. 151, no. 1, pp. 18–24, Feb. 2004.
[12] M. Zhang, S. Vassiliadis, and J. Delgado-Frias, “Sigmoid generators for neural computing using
piecewise approximations,” IEEE Trans. Comput., vol. 45, no. 9, pp. 1045–1049, Sep. 1996.
[13] K. Leboeuf, A. Namin, R. Muscedere, H. Wu, and M. Ahmadi, “High speed VLSI
implementation of the hyperbolic tangent sigmoid function,” in Third International Conference on
Convergence and Hybrid Information Technology (ICCIT), Nov. 2008, pp. 1070–1073.
[14] M. Tommiska, “Efficient digital implementation of the sigmoid function for reprogrammable
logic,” IEE Proceedings - Computers and Digital Techniques, vol. 150, no. 6, pp. 403–411, Nov.
2003.
[15] A. Namin, K. Leboeuf, R. Muscedere, H. Wu, and M. Ahmadi, “Efficient hardware
implementation of the hyperbolic tangent sigmoid function,” in IEEE International Symposium on Circuits
and Systems (ISCAS), May 2009, pp. 2117–2120.
[16] P. Meher, “An optimized lookup-table for the evaluation of sigmoid function for artificial neural
networks,” in 18th IEEE/IFIP VLSI System on Chip Conference (VLSI-SoC), Sep. 2010, pp.
91–95.
[17] K. Basterretxea, J. Tarela, I. del Campo, and G. Bosque, “An experimental study on nonlinear
function computation for neural/fuzzy hardware design,” IEEE Trans. Neural Netw., vol. 18,
no. 1, pp. 266–283, Jan. 2007.
[18] K. Basterretxea, “Recursive sigmoidal neurons for adaptive accuracy neural network
implementations,” in Adaptive Hardware and Systems (AHS), 2012 NASA/ESA Conference on, June
2012, pp. 152–158.
[19] R. Muscedere, V. Dimitrov, G. Jullien, and W. Miller, “Efficient techniques for
binary-to-multidigit multidimensional logarithmic number system conversion using range-addressable
look-up tables,” IEEE Trans. Comput., vol. 54, no. 3, pp. 257–271, Mar. 2005.
Chapter 7
Conclusions and Future Work
7.1 Conclusions
CVNS addition, CVNS sigmoid function evaluation and CVNS multiplication algorithms for low-resolution environments are introduced. These algorithms make the implementation of high-resolution mixed-signal neural networks in a low-resolution environment feasible.
The proposed CVNS function evaluation method exploits the PWL approximation scheme and is
based on a mathematical derivation using the CVNS characteristics. A new CVNS-based structure
for realization of the proposed CVNS sigmoid function evaluation scheme is proposed. The proposed
structure uses the mixed-signal current-mode circuits. The implementation results show that the
proposed structure compares favorably to the state of the art.
The proposed CVNS multiplication algorithm provides accurate results in a low-resolution environment. Moreover, a VLSI implementation of a 16×8 CVNS synapse multiplier is realized. The post-layout simulations of the implemented CVNS synapse multiplier confirm its performance.
Using the proposed CVNS algorithms, a 2-2-1 mixed-signal CVNS neural network structure is implemented. In the implemented network, weights are stored in digital registers with 16-bit resolution. The signal processing of the network is carried out using CVNS arithmetic. The implemented network realizes the two-input XOR function. Using the CVNS features, the limited analog signal processing resolution issue present in mixed-signal neural networks is resolved. This opens a path to designing mixed-signal structures which meet the signal processing resolution requirements of neural networks. The implemented network is designed, laid out, and post-layout simulated in 0.18 µm CMOS technology.
An area-efficient robust CVNS Adaline is also proposed. This structure stores the weights in
digital registers while the arithmetic is based on the CVNS. In addition, the Reverse Evolution (RE)
process is exploited to decrease the error in the CVNS digits, which improves the NSR. Storing the
weights in digital registers eliminates the need for the complex analog memory units, required for
the implementation of the previous CVNS architectures. Furthermore, the proposed structure needs fewer weight storage elements than the previous CVNS structures. This in turn results in a lower area overhead and reduced power consumption.
Combining the proposed Adaline with the distributed architecture, the CVNS-distributed Madaline structure is introduced. The mathematical analysis shows that the proposed Madaline structure compares favorably to all previous architectures in terms of NSR and the number of neurons required for a specific NSR. In addition, to have a circuit-level analysis, a three-layer Madaline is implemented. The implementation results confirm that the proposed structure improves upon the previous structures in terms of the NSR and the area consumption required for a specific NSR. This in turn leads to a robust neural network with a lower area overhead and reduced power consumption.
A new approximation scheme for digital implementation of the hyperbolic tangent is proposed in the final part of this work. The proposed approximation scheme is based on a mathematical analysis considering the maximum allowable approximation error as the design parameter. Based on the proposed
approximation scheme, a hybrid structure for VLSI implementation of hyperbolic tangent activation
function is presented. The synthesis results show that the proposed architecture compares favorably
to the previously developed structures in terms of area, delay and area × delay.
The proposed structure requires fewer output bits for the same maximum allowable approximation error compared to the previously developed architectures. The reduction in the number of activation function output bits leads to multipliers and adders with lower bit width. This in turn lowers the area, power and delay. The proposed structure is used for implementing a 4-3-2 network. The implemented network is capable of recognizing six different input patterns. Post-layout simulation results show that the proposed structure results in an efficient VLSI implementation of digital neural networks in terms of area, delay and power consumption.
7.2 Future work
The proposed mixed-signal CVNS structure satisfies the resolution requirements of neural networks
with on-chip learning. Therefore, it can be exploited to develop a network with on-chip learning
capability. To realize this, additional modules are required. These modules should implement the learning algorithm on the chip. For example, for the back-propagation algorithm, the derivative of the activation function could be designed similarly to the structure proposed for the CVNS sigmoid activation function. In addition, multipliers and adders are required to update the weights and biases during learning. These units can be realized using the CVNS multiplier and the CVNS adder structures proposed in this dissertation. Moreover, we suggest applying low-power design techniques to the whole system to decrease the network power consumption.
VITA AUCTORIS
Babak Zamanlooy received the B.S. degree from the K. N. Toosi University of Technology, Tehran,
Iran, and the M.S. degree from the Iran University of Science and Technology, Tehran, in 2004 and
2006, respectively, both in electrical engineering. He is currently pursuing the Ph.D. degree in
electrical engineering with the University of Windsor, Windsor, ON, Canada. His current research
interests include analog, digital, and mixed-signal integrated circuit design and VLSI implementation
of neural networks.