University of Windsor

Scholarship at UWindsor
Electronic Theses and Dissertations

2018

Mixed-Signal Neural Network Implementation
with Programmable Neuron
Bahar Youssefi

Follow this and additional works at: https://scholar.uwindsor.ca/etd
This online database contains the full-text of PhD dissertations and Masters’ theses of University of Windsor students from 1954 forward. These
documents are made available for personal study and research purposes only, in accordance with the Canadian Copyright Act and the Creative
Commons license—CC BY-NC-ND (Attribution, Non-Commercial, No Derivative Works). Under this license, works must always be attributed to the
copyright holder (original author), cannot be used for any commercial purposes, and may not be altered. Any other use would require the permission of
the copyright holder. Students may inquire about withdrawing their dissertation and/or thesis from this database. For additional inquiries, please
contact the repository administrator via email (scholarship@uwindsor.ca) or by telephone at 519-253-3000 ext. 3208.

Mixed-Signal Neural Network Implementation
with Programmable Neuron

by

Bahar Youssefi

A Dissertation
Submitted to the Faculty of Graduate Studies through the
Department of Electrical and Computer Engineering in Partial Fulfillment
of the Requirements for the Degree of Doctor of Philosophy at the
University of Windsor

Windsor, Ontario, Canada
2018

© 2018 Bahar Youssefi

All Rights Reserved. No part of this document may be reproduced, stored or otherwise
retained in a retrieval system or transmitted in any form, on any medium by any means
without prior written permission of the author.

Mixed-Signal Neural Network Implementation with Programmable Neuron
by
Bahar Youssefi

APPROVED BY:

F. Mohammadi, External Examiner
Ryerson University

A. Jaekel
School of Computer Science

K. Tepe
Department of Electrical and Computer Engineering

H. Wu
Department of Electrical and Computer Engineering

J. Wu, Co-Advisor
Department of Electrical and Computer Engineering

M. Mirhassani, Advisor
Department of Electrical and Computer Engineering

November 16, 2017

Declaration of Co-authorship / Previous Publication

I. Co-authorship
I hereby declare that this thesis incorporates material that is the result of joint research, as
follows:
Chapter 2 of the thesis was co-authored with A.J. Leigh, as an outstanding scholar under
the supervision of Dr. M. Mirhassani. A.J. Leigh contributed to the layout design
and editing of the manuscript. Chapter 5 of this thesis was co-authored with S. Abdollahi,
as a research associate under the supervision of Dr. M. Mirhassani. S. Abdollahi provided
feedback on refinement of ideas.
In all cases, the key ideas, primary contributions, designs, schematics, block diagrams,
data analysis, interpretation, and writing were performed by the author.
I am aware of the University of Windsor Senate Policy on Authorship and I certify that
I have properly acknowledged the contribution of other researchers to my thesis, and have
obtained written permission from each of the co-author(s) to include the above material(s)
in my thesis.

I certify that, with the above qualification, this thesis, and the research to which it refers,
is the product of my own work.

II. Previous Publication
This thesis includes 5 original papers that have been previously published/submitted for
publication in peer reviewed journals, as follows:
Thesis chapter, publication title, and publication status:

Chapter 2: Bahar Youssefi, Alexander J. Leigh, Mitra Mirhassani, and Jonathan Wu, “Tunable Neuron PWL Approximation Based on the Minimum Operator,” IEEE Transactions on Circuits and Systems II: Express Briefs. Publication status: Minor revisions requested.

Chapter 3: Bahar Youssefi, Mitra Mirhassani, and Jonathan Wu, “Efficient Mixed-Signal Synapse Multipliers for Multi-Layer Feed-Forward Neural Networks,” IEEE International Midwest Symposium on Circuits and Systems, pp. 814-817, Oct. 2016. Publication status: Published.

Chapter 4: Bahar Youssefi, Mitra Mirhassani, and Jonathan Wu, “Hardware Realization of Mixed-Signal Neural Networks with Modular Synapse-Neuron Arrays,” IEEE International Symposium on Circuits and Systems, 2018. Publication status: Submitted.

Chapter 5: Bahar Youssefi, Siamak Abdollahi, Mitra Mirhassani, and Jonathan Wu, “Nonlinear Dynamics of Single Sigmoid Neural Network,” IEEE Transactions on Neural Networks and Learning Systems. Publication status: Submitted.

Chapter 6: Bahar Youssefi, Mitra Mirhassani, and Jonathan Wu, “A Current-Mode Mixed-Signal Approach to Realize the Distributed-Arithmetic-Based FIR Filters,” IEEE Access Journal, The Institution of Engineering and Technology (IET). Publication status: Under revision for resubmission.

I certify that I have obtained a written permission from the copyright owner(s) to include
the above published material(s) in my thesis. I certify that the above material describes
work completed during my registration as a graduate student at the University of Windsor.

III. General
I declare that, to the best of my knowledge, my thesis does not infringe upon anyone's copyright nor violate any proprietary rights and that any ideas, techniques, quotations, or any other material from the work of other people included in my thesis, published or otherwise, are fully acknowledged in accordance with the standard referencing practices. Furthermore, to the extent that I have included copyrighted material that surpasses the bounds of fair dealing within the meaning of the Canada Copyright Act, I certify that I have obtained a written permission from the copyright owner(s) to include such material(s) in my thesis.
I declare that this is a true copy of my thesis, including any final revisions, as approved
by my thesis committee and the Graduate Studies office, and that this thesis has not been
submitted for a higher degree to any other University or Institution.

Abstract

This thesis introduces mixed-signal implementations of the building blocks of an artificial neural network, namely the neuron and the synaptic multiplier. It also investigates the nonlinear dynamic behavior of a single artificial neuron and presents a Distributed Arithmetic (DA)-based Finite Impulse Response (FIR) filter. All the introduced structures are designed and custom laid out.
A novel VLSI implementation of a reconfigurable neuron is proposed, based on the minimum operator and utilizing the winner-take-all circuit. The neuron estimates the sigmoid-shaped activation function using the piece-wise linear approximation method and achieves adaptability by taking advantage of the body effect of PMOS transistors. The structure covers a variety of activation functions, such as rectified linear, hard-limit, and sigmoid functions of different precision, with the aim of improving the generalization ability of neural networks.
An area- and power-efficient synaptic multiplier is proposed that combines digital gates and weighted current mirrors. A 4-3-2 neural network containing the modular synapse-neuron building blocks is successfully tested for pattern recognition. The proposed artificial neural network addresses area efficiency in view of the inevitable growth in the size of current networks.

Moreover, the nonlinear behavior of a single sigmoidal neuron is investigated to discuss its oscillatory behavior and its possible applications in future generations of oscillators.
The proposed FIR filter, which is based on distributed arithmetic, is designed for efficient VLSI implementation. There is a trade-off between the computational efficiency of DA-based processing and the area efficiency of multiply-and-accumulate (MAC)-based processing. The proposed FIR filter reduces the area required for a DA-based filter by employing a mixed-signal approach. An 8-bit 16-tap FIR filter is designed and successfully tested as a BPF and an LPF at 10 MHz and 48 kHz, respectively.

Acknowledgments

I wish to express my most sincere gratitude to my supervisor Dr. Mitra Mirhassani who
has been more than an advisor to me. During the past years, she was my mentor and the
source of motivation. I would like to thank my co-advisor Dr. Jonathan Wu for his constant
support throughout the course of this work.
In addition to my advisors, I would like to thank my committee members, Dr. Kemal
Tepe, Dr. Huapeng Wu, Dr. Arunita Jaekel, and Dr. Farah Mohammadi, for their
constructive comments and feedback.
I would also like to thank my friends and colleagues Parham H. Namin, Babak Zamanlooy, Iman Taha, and Alexander Leigh for their support and all my colleagues in the ECE
department, ASM and RCIM labs.
Finally, my deepest gratitude goes to my husband, Siamak Abdollahi, and my dearest
parents for their unconditional love, support, and encouragement.

Contents

Declaration of Co-authorship / Previous Publication
Abstract
Acknowledgments
List of Figures
List of Tables
List of Abbreviations

1 Introduction
  1.1 Outline of the Dissertation and List of the Contributions
  References

2 Reconfigurable Neuron PWL Approximation Based on the Minimum Operator
  2.1 Introduction
  2.2 PWL approximation for a non-Monotonic function
  2.3 The proposed reconfigurable structure of the neuron
  2.4 The Reconfigurability and the simulation results
  2.5 Conclusion
  References

3 Mixed-Signal Synapse Multipliers for Feed-Forward Neural Networks
  3.1 Introduction
  3.2 Neural Network Configurations
  3.3 Building block’s components
    3.3.1 Neuron
    3.3.2 Mixed-Signal Multiplier
  3.4 Conclusion
  References

4 Hardware Realization Of Mixed-Signal Neural Networks
  4.1 Introduction
  4.2 Self-adjustable distributed Neuron
  4.3 distributed neural network
    4.3.1 Synaptic Multiplier
  4.4 Pattern recognition
  4.5 Conclusion
  References

5 Dynamic Behavior Of A Single Sigmoidal Neuron: Stable To Period Doubling
  5.1 Introduction
  5.2 Background and Theory
  5.3 Stability Analysis of the single neuron structure
  5.4 Simulation Results
  5.5 Conclusion
  References

6 Low-Power Mixed-Signal Implementation of the DA-based FIR Filter
  6.1 Introduction
  6.2 Distributed Arithmetic
  6.3 Proposed Current-Mode Distributed Arithmetic Structure
  6.4 Proposed Filter Implementation
    6.4.1 DAC
    6.4.2 Current-Mode Delay Cell
  6.5 Results Discussion
  6.6 Conclusion
  References

7 Conclusions and Future Works
  7.1 Summary of Contributions
  7.2 Suggested Future Work

VITA AUCTORIS

List of Figures

2.1 The sigmoid function $\frac{19}{1+e^{-0.1x}}$ and the corresponding 5-piece PWL approximation.

2.2 The reconfigurable neuron schematic.

2.3 (a) The output voltages VA (dashed line) and VB (solid line) for VG of 700 mV, 730 mV, and 800 mV. (b) Ideal sigmoid function, PWL approximation achieved from least fit method, and the simulation result of the neuron output function (solid line).

2.4 The deviation error for the 5-piece PWL approximation of $\frac{K}{1+e^{-0.1x}}$ for K = 19, 10, and 5.

2.5 The PWL neuron outputs that show the reconfigurable sigmoid functions for VG = 710 mV and the linear functions for VG = 250 mV. Different shades of the sigmoid and the linear functions are shown for VS of 2 V, 2.3 V, 2.4 V, and 2.5 V.

2.6 2-bit voltage DAC corner analysis result for different conditions ff, ss, fs, and sf.

2.7 Post-layout simulation results for different conditions of ff, ss, fs, and sf for the temperatures of -55 °C, 27 °C, and 125 °C.

2.8 The corner and temperature post-layout analysis for different conditions of ff, ss, sf, and fs at temperatures of 27 °C, -55 °C, and 125 °C showing the deviation from the activation function at 27 °C.

3.1 System-level configuration of the proposed mixed-signal neural network.

3.2 General configuration of the mixed-signal neural network [7].

3.3 Non-linear neuron activation function which approximates the sigmoid function.

3.4 Multiplying the two less significant and the two most significant bits of Y with the analog value of the input X.

3.5 The proposed modular mixed-signal multiplier to be used in distributed feed-forward neural network.

3.6 MA/MB output result for Y=1111 for corner analysis fast-fast (ff), slow-slow (ss), slow-fast (sf) and fast-slow (fs) to show the process variation effect.

3.7 Multiplication results for Y=0010, 0111, 1010, 1011, and 1100. Ideal and simulation results are shown with dashed and solid lines respectively.

4.1 The resistive-type neuron circuit modified to a robust current-mode structure. The synaptic multiplier’s output current is applied to the neuron as Iin via terminal T1. The output current is shaped by a self-adjustable sigmoid function.

4.2 The variation of the neuron’s activation functions for the input ranges of (−60 µA, 60 µA) shown by the solid line, (−100 µA, 100 µA) shown by dotted line, and (−200 µA, 200 µA) shown by dashed line.

4.3 The 1000 runs Monte Carlo simulation results of the current-mode neuron’s activation function.

4.4 The system level configuration of a 4-3-2 distributed neural network. Wij and Cij are the digital synaptic weights corresponding to the second and third layers respectively. I1 to I4 are the input currents representing the input patterns.

4.5 The modular signed multiplying DAC that performs as the synapse [12]. T1 is connected to the terminal with the same name in Fig. 4.1 to build the synapse-neuron module.

4.6 The corner analysis simulation results of tt (Typical NMOS Typical PMOS), ff (Fast NMOS Fast PMOS), fs (Fast NMOS Slow PMOS), sf (Slow NMOS Fast PMOS), and ss (Slow NMOS Slow PMOS) that show the process variation effect on the multiplication performance for the input current of 1 µA that is multiplied by a digital weight that varies from -11111 to 11111.

4.7 The structure of the current comparators that are connected to the O1 and O2 terminals of Fig. 4.4.

4.8 The input templates that are used to test the functionality of the neuron.

4.9 Simulation results of the 4-3-2 distributed neural network to prove the pattern recognition capability.

4.10 The 4-3-2 mixed-signal network layout.

4.11 Simulation results of the 4-3-2 distributed neural network to show the sensitivity regarding three critical paths.

5.1 A single sigmoidal neuron in a feedback configuration.

5.2 The stationary solution for the arbitrary values of µ = 2, β = 3, and x0 = 0.55.

5.3 The bifurcation map of the structure shown in Fig. 5.2 when x0 = 0.55 and y0 = 0.001728.

5.4 Stationary solution of y0 for various β and µ values.

5.5 Stability map achieved from the system eigenvalues which show the two possible behaviours, stable and period doubling, for the system.

5.6 Time domain behavior of the system for two different arbitrary sets of µ and β. (a) shows the oscillatory behavior for µ = 6 and β = 3. (b) shows the stable behavior for µ = 0.3 and β = 0.15.

6.1 The proposed DA architecture for a 16-tap 8-bit mixed-signal FIR filter. xij is the jth bit of the ith input Xi. ICi is the ith filter coefficient and y[n] is the final output current. (a) The complete current-mode DA architecture. (b) The structure at the high state of the first N − 1 clock cycles. (c) The structure at the low state of the first N − 1 clock cycles. (d) The structure at the Nth clock cycle when the operation is done.

6.2 Multiplying stage of the proposed mixed-signal filter.

6.3 DA-based delay/division stage of the proposed current-mode mixed-signal filter.

6.4 Overall conceptual operating waveforms of the proposed filter. The notations show the signal level at the specific clock cycle.

6.5 The 5-bit DAC structure [13].

6.6 The 5-bit DAC output current of ICi (solid line), the exact error calculated from Error(µA) = ICi − Iideal shown by dashed line, the error percentage calculated from $\frac{100 \cdot (I_{Ci} - I_{ideal})}{I_{ideal}}$ shown by dash-dotted line.

6.7 The family plot of the DAC output current vs. the analog equivalent of the digital input achieved from 500 runs of Monte Carlo simulations.

6.8 The input and output currents of two cascaded delay cells and a current divider for four random input currents of 4.98 µA, 20.02 µA, 60 µA, and 99.98 µA.

6.9 Corner analysis of tt (Typical NMOS Typical PMOS), ff (Fast NMOS Fast PMOS), fs (Fast NMOS Slow PMOS), ss (Slow NMOS Slow PMOS), and sf (Slow NMOS Fast PMOS) of the error percentage that occurs in the feedback branch's output current.

6.10 The 1000 runs Monte Carlo simulation family plot of the second delay cell output current for four random input currents of 100 µA, 58 µA, 24 µA, and 6 µA.

6.11 The frequency and phase responses of the DA-based BPF and LPF.

6.12 The layout of the 8-bit 16-tap mixed-signal filter based on DA.

List of Tables

2.1 The neuron schematic transistors dimensions.

2.2 The intersection points of $(x_1, y_1)$ and $(x_2, y_2)$, the slopes, and the y-intercepts of the 5-piece PWL approximation.

3.1 Simulation and ideal multiplication result for Y=1111 at different input levels and the measured error.

3.2 Simulation and ideal multiplication result for Y=1111 at different input levels and the measured error.

4.1 Sizes of the transistors of the multiplier.

4.2 The comparison table of the proposed 4-3-2 distributed NN and other similar structures.

6.1 DAC transistors dimensions.

6.2 Filter’s Coefficients.

6.3 Comparison of the proposed filter with recent published filters.

List of Abbreviations

ADC     Analog to Digital Converter.
ANN     Artificial Neural Network.
BPF     Band-Pass Filter.
CMOS    Complementary Metal Oxide Semiconductor.
DA      Distributed Arithmetic.
ff      Fast NMOS Fast PMOS.
FIR     Finite Impulse Response.
fs      Fast NMOS Slow PMOS.
LPF     Low-Pass Filter.
LTA     Loser-Take-All.
LTI     Linear Time-Invariant.
MAC     Multiply-Accumulate.
MDAC    Multiplying DAC.
NMOS    Negative Metal Oxide Semiconductor.
NN      Neural Network.
PMOS    Positive Metal Oxide Semiconductor.
PWL     Piece-wise Linear.
ROC     Region Of Convergence.
sf      Slow NMOS Fast PMOS.
ss      Slow NMOS Slow PMOS.
tt      Typical NMOS Typical PMOS.
VLSI    Very Large-Scale Integration.
WTA     Winner-Take-All.

Chapter 1
Introduction

This chapter presents a brief overview of the mixed-signal approach to different signal processing building blocks, especially neural network implementations. It briefly reviews diverse types of artificial neurons and the importance of utilizing an adjustable neuron. The possible nonlinear behavior of the neural network is also explored briefly.
The mixed-signal approach integrates both analog and digital elements in a single silicon chip [1, 2]. Mixed-signal ICs have drawn considerable attention because of two trends in the design industry [3]. First, transistor dimensions have been scaled down to deep-submicron levels, allowing millions of transistors and complex systems to be integrated into a single die. Second, multifaceted, complex systems need to put digital signal processors and analog circuitry together on a single chip by using digital-to-analog converters (DACs) and analog-to-digital converters (ADCs).
These two trends in the IC industry, the upsurge in the number of transistors that can be packed into a single die and the growing complexity of electronic systems, make mixed-signal circuit design an in-demand approach in the IC market [3]. Another attraction of the mixed-signal approach is the ability to partially avoid the disadvantages of both analog and digital implementations while blending their advantages. Addition, subtraction, and division by a constant are examples of operations that can be done effortlessly in the analog domain [4].
Multipliers, which are essential building blocks of filters and neural networks, can be built by using multiplying DACs (MDACs) [5, 6, 7], which perform the multiplication in the analog domain under digital control. By utilizing MDACs as multipliers, some mixed-signal circuits can work with fewer DACs or ADCs, or avoid them altogether. Fewer DACs and ADCs mean a reduction in area and power consumption.
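As a behavioral illustration of the MDAC idea described above, the following minimal Python sketch scales an analog input current by an unsigned digital code. It assumes ideal binary-weighted current branches; the function name `mdac_multiply`, the MSB-first bit order, and the normalization by 2^n are illustrative assumptions, not the circuit proposed in this thesis.

```python
def mdac_multiply(i_in_ua, weight_bits):
    """Scale an analog input current (in microamps) by an unsigned digital code.

    weight_bits is a list of 0/1 values, most significant bit first (assumed order).
    Each set bit switches in an ideal binary-weighted copy of the input current,
    and the copies add on a shared output node.
    """
    n = len(weight_bits)
    i_out = 0.0
    for k, bit in enumerate(weight_bits):
        # Branch k mirrors the input with gain 2^(n-1-k) / 2^n.
        branch_gain = 2 ** (n - 1 - k) / 2 ** n
        i_out += bit * branch_gain * i_in_ua
    return i_out


# A 16 uA input multiplied by the code 1010 (decimal 10 out of 16).
print(mdac_multiply(16.0, [1, 0, 1, 0]))
```

In this idealized model the digital word only selects which current branches conduct, so no separate DAC or ADC is needed around the multiplication.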
Another multiplication method that can benefit from mixed-signal implementation is Distributed Arithmetic (DA) [8]. Distributed arithmetic is a bit-serial computational method that performs the inner product in a different fashion than multiply-accumulate (MAC) operations [9]. Croisier et al. [10] introduced the DA concept; the method was then used for the digital implementation of FIR filters [11]. In the DA approach, the number of clock cycles needed to compute the inner product is fixed and depends on the resolution of the input data. This approach has been utilized in image coding [12], filter implementation [8, 13], vector quantization [14], and the discrete cosine transform [15]. Compared to MAC operations, DA is more efficient regarding computations and mechanizations; the advantage is more visible when the system needs to deal with a long input vector [8, 9]. It should be noted that previous DA structures replaced the multipliers with large memories, shift registers, and adders, which increased the area and power consumption. The proposed mixed-signal implementation is beneficial because it eliminates the adders, subtractors, and dividers. The details of the proposed architecture are presented in Chapter 6.
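The bit-serial principle behind DA can be sketched in a few lines of Python. This is a generic software illustration under simplifying assumptions (unsigned integer inputs, floating-point coefficients, and the hypothetical function name `da_inner_product`); it is not the current-mode architecture of Chapter 6.

```python
def da_inner_product(coeffs, inputs, n_bits):
    """Compute sum(c * x) bit-serially, one input bit plane per clock cycle.

    inputs are assumed to be unsigned integers that fit in n_bits.
    """
    acc = 0.0
    for j in range(n_bits):  # the cycle count depends only on the input resolution
        # Add up the coefficients whose input word has bit j set.
        partial = sum(c for c, x in zip(coeffs, inputs) if (x >> j) & 1)
        # Weight the partial sum by the significance of bit j.
        acc += partial * (1 << j)
    return acc


coeffs = [0.5, -1.0, 2.0]
inputs = [3, 5, 6]
print(da_inner_product(coeffs, inputs, n_bits=3))
print(sum(c * x for c, x in zip(coeffs, inputs)))  # direct MAC result for comparison
```

The loop runs `n_bits` times regardless of the vector length, which is why the benefit over MAC grows with the number of taps.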
The artificial neural network (ANN) is another computational method that can benefit from mixed-signal implementation. ANNs can be trained to provide solutions to different types of problems for which analytical solutions do not exist or are hard to calculate [16], such as pattern recognition [17], memories [18, 19], nonlinear signal prediction, time-series prediction [20, 25], and action recognition [26].
ANNs have been drawing attention due to their generalization ability, which leads to better prediction performance [20]; therefore, a solution that improves their generalization ability is critical.
Considering the capability of neural networks in solving unknown problems [16], they can be used in specific applications such as wearable sensors that continuously monitor patients' conditions [21] or wireless sensor networks (WSNs) that employ a network of several sensors to monitor environmental conditions [22].
These real-life applications give a view of the design characteristics that should be considered in neural network implementations. First, the design should be able to perform parallel computation to follow the network principles [23]. Area and power consumption are issues that deserve attention in portable and battery-powered devices such as wearable sensors and WSNs. Accuracy is less of a concern in neural network implementations, since the effect of inaccurate elements can be compensated during the training of the network [23, 24]. In general, analog implementations perform parallel processing and provide better area and power efficiency than digital realizations, but show lower accuracy. The mixed-signal approach can benefit from the parallel computation and the area and power efficiency of analog design while showing higher accuracy than a purely analog implementation.
The two basic building blocks of an ANN are the neuron and the synapse. In the synapse, the synaptic weight is multiplied by an input; the result then passes through the neuron, which shapes the synapse output according to its activation function. In analog implementations [27, 28], synaptic weights, processing, and the neuron’s activation function are implemented by analog circuits, usually providing higher efficiency compared to digital implementations [6, 29].
Analog implementations can realize the highly parallel nature of biological neural networks; however, they are not as accurate as digital realizations. The inaccuracy of analog implementations can be compensated by increasing the number of neurons [6]. A mixed-signal implementation can improve the accuracy compared to an analog implementation while still benefiting from the advantages of analog circuits. A modular multiplying DAC can adequately perform as a synapse module in which the digital synaptic weights are stored in shift registers [29].
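The synapse-neuron arrangement described above can be sketched as a small feed-forward pass in Python. The sketch assumes signed integer weight codes with a 5-bit magnitude (scaled by 1/32), a tanh activation, and a 4-3-2 layer layout; the weight values and the function names `synapse`, `layer`, and `forward_4_3_2` are hypothetical and chosen only for illustration.

```python
import math


def synapse(x, w_code, n_bits=5):
    """Multiply an analog input by a signed digital weight code (assumed scaling)."""
    return x * w_code / (2 ** n_bits)


def layer(inputs, weight_codes):
    """Each neuron sums its synapse outputs and applies the activation function."""
    outputs = []
    for row in weight_codes:
        s = sum(synapse(x, w) for x, w in zip(inputs, row))
        outputs.append(math.tanh(s))  # tanh stands in for the hardware activation
    return outputs


def forward_4_3_2(inputs, w_hidden, w_out):
    """Four inputs, three hidden neurons, two output neurons."""
    return layer(layer(inputs, w_hidden), w_out)


# Hypothetical weight codes in the range -31..31.
W_HIDDEN = [[12, -7, 31, 0], [-20, 5, 9, 14], [3, 3, -31, 18]]
W_OUT = [[25, -10, 6], [-4, 30, -17]]
print(forward_4_3_2([1.0, 0.0, 1.0, 0.0], W_HIDDEN, W_OUT))
```

Because every synapse product in a layer is independent of the others, the inner sums map naturally onto parallel hardware branches.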
Another challenge in ANN implementation is the realization of the neuron activation function, which can be sigmoid, hyperbolic tangent, hard-limit, poslin (positive linear), or linear. Area, power consumption, and accuracy are the criteria considered in the implementation. Reconfigurability of the neuron’s activation function is another specification; it gives the neuron the ability to change the shape of its activation function post-fabrication. A programmable (reconfigurable) neuron can primarily be used in the multiresolution learning paradigm, which has been proposed as a method that significantly improves the generalization of ANNs [20, 25]. The multiresolution method adjusts the activation functions to the resolution required, which means starting with coarse activation functions and increasing the resolution as learning proceeds [20, 25]. In this fashion, an adaptable analog implementation of a neuron activation function would be beneficial for analog and mixed-signal applications that aim to improve the generalization property.
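To make the idea of an adjustable activation function concrete, the following Python sketch compares a scaled sigmoid with a 5-piece piece-wise linear (PWL) approximation: two saturated end pieces and three sloped pieces. It interpolates between hand-picked breakpoints, which is an assumption made for brevity and not the fitting method or circuit of Chapter 2; the breakpoint values and function names are hypothetical.

```python
import math


def sigmoid(x, k=1.0, slope=1.0):
    """Scaled sigmoid k / (1 + exp(-slope * x))."""
    return k / (1.0 + math.exp(-slope * x))


def pwl_sigmoid(x, breakpoints, k=1.0, slope=1.0):
    """Interpolate the sigmoid linearly between sorted breakpoints; saturate outside."""
    if x <= breakpoints[0]:
        return sigmoid(breakpoints[0], k, slope)
    if x >= breakpoints[-1]:
        return sigmoid(breakpoints[-1], k, slope)
    for x0, x1 in zip(breakpoints, breakpoints[1:]):
        if x0 <= x <= x1:
            y0, y1 = sigmoid(x0, k, slope), sigmoid(x1, k, slope)
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Four breakpoints give five pieces (hypothetical values).
BREAKPOINTS = [-40.0, -15.0, 15.0, 40.0]
for x in (-50.0, -20.0, 0.0, 20.0, 50.0):
    print(x, sigmoid(x, k=19.0, slope=0.1), pwl_sigmoid(x, BREAKPOINTS, k=19.0, slope=0.1))
```

Changing `slope` or `k` reshapes both curves, which mirrors in software how a reconfigurable neuron could move from a coarse to a finer activation function during multiresolution learning.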
To gain deeper knowledge of how a neural network performs, it helps greatly to know the behavior of a single neuron. In this thesis, the nonlinear dynamic behavior of a single sigmoidal neuron with a feedback synaptic weight is investigated and possible applications are proposed.
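The kind of behavior at stake can be illustrated with a generic discrete-time map in which a sigmoid neuron feeds its output back to its input through a weight. The Python sketch below is an assumption-laden toy model (the update rule, the parameter names `weight` and `bias`, and the chosen values are hypothetical) and is not the model analysed in Chapter 5.

```python
import math


def iterate_neuron(weight, bias, y0=0.4, steps=200):
    """Iterate y[n+1] = sigmoid(weight * y[n] + bias) and return the trajectory."""
    y = y0
    trajectory = [y]
    for _ in range(steps):
        y = 1.0 / (1.0 + math.exp(-(weight * y + bias)))
        trajectory.append(y)
    return trajectory


# Weak negative feedback: the output settles to a fixed point.
stable = iterate_neuron(weight=-2.0, bias=1.0)
# Strong negative feedback: the output alternates between two values.
oscillating = iterate_neuron(weight=-10.0, bias=5.0)
print(stable[-3:])
print(oscillating[-4:])
```

In this toy map the fixed point is at 0.5 for both parameter sets, and the slope of the map there is weight/4; once its magnitude exceeds one, the fixed point loses stability and a period-two oscillation appears.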
The analog and mixed-signal research lab at the University of Windsor has focused on pattern recognition and, as a special case, movement recognition using neural networks. This thesis is motivated by the basic blocks used in pattern recognition and the challenges that arise in mixed-signal implementations.

1.1 Outline of the Dissertation and List of the Contributions

• In Chapter 2, A very large-scale integration (VLSI) prototype of a reconfigurable
neuron is proposed and realized for the first time. The programmable neuron can
be used in analog and mixed-signal networks. The activation function of the neuron
can be accustomed off-chip or on-chip by a 2-bit voltage digital to analog converter
(DAC) to provide the hard-limit, linear, and variable slope sigmoid functions. Since
the proposed neuron is able to provide adjustable precision, it would be invaluable
for neural network applications such as signal prediction which use multi-resolution
learning paradigm to increase the efficiency of the system by improving the generalization ability of the network.
• In Chapter 3, an area and power-efficient synaptic multiplier is realized in TSMC
CMOS 0.18µm technology. The mixed-signal MDAC is highly modular making it
suitable to be used to multiply digital synaptic weights and the analog inputs. The
structure reduced the dimensions of the required transistors and decrease the need for
weighted current mirrors compared to conventional MDACs.
• In Chapter 4, a 4-3-2 mixed-signal neural network is employed for pattern recognition application and a series of patterns are tested successfully. The network building
blocks are the proposed neuron introduced in Chapter 2 and the proposed synaptic
multiplier presented in Chapter 3.
• In Chapter 5, it is shown for the first time that a single sigmoid neuron with a feedback synaptic weight exhibits oscillation behavior, making it the simplest system that can realize a neural oscillator. The frequency of oscillation depends only on the propagation delay of the system, which is promising for reducing the dependency of VLSI implementations on process and fabrication variations.
• In Chapter 6, the distributed arithmetic principle is used to implement a mixed-signal FIR filter without the need to use a lookup table. DACs with current-mode outputs are utilized to perform the multiplication between the digital inputs and the analog coefficients. The current-mode multiplication eliminates the required adders and dividers and, consequently, reduces the required area and power consumption. Two 16-tap 8-bit FIR filters (a BPF and an LPF) are realized by using the proposed architecture.
• Lastly, Chapter 7 highlights the contributions of the research and introduces possible future work.


References
[1] M. Burns and G. W. Gordon, An introduction to mixed-signal IC test and measurement, New York: Oxford University Press, vol. 2001, Mar. 2001.
[2] S. Davidson, “An Introduction to Mixed-Signal IC Test Measurement [Book Review],” IEEE Design Test, vol. 30, no. 3, pp. 94–96, Jun. 2013.
[3] B. Kaminska, K. Arabi, I. Bell, P. Goteti, J. K. Huertas, B. Kim, A. Rueda, and M. Soma, “Analog and mixed-signal benchmark circuits-first release,” in Proceedings International Test Conference 1997, IEEE, pp. 183–190, Nov. 1997.
[4] B. Razavi, Design of analog CMOS integrated circuits, Boston, MA: McGraw-Hill, 2001.
[5] E. I. El-Masry, H. K. Yang, and M. A. Yakout, “Implementations of artificial neural networks using current-mode pulse width modulation technique,” IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 532–548, May 1997.
[6] H. Djahanshahi, A robust hybrid VLSI neural network architecture for a smart optical sensor, University of Windsor, 1999.
[7] H. Djahanshahi, M. Ahmadi, G. A. Jullien, and W. C. Miller, “Design and VLSI implementation of a unified synapse-neuron architecture,” in Proceedings of the 6th Great Lakes Symposium on VLSI, pp. 228–233, Mar. 1996.
[8] E. Özalevli, W. Huang, P. E. Hasler, and D. V. Anderson, “A Reconfigurable Mixed-Signal VLSI Implementation of Distributed Arithmetic Used for Finite-Impulse Response Filtering,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 55, no. 3, pp. 510–521, Mar. 2008.
[9] S. A. White, “Applications of distributed arithmetic to digital signal processing: A tutorial review,” IEEE ASSP Magazine, vol. 6, no. 3, pp. 4–9, Jul. 1989.
[10] A. Croisier, D. J. Esteban, M. E. Levilion, and V. Rizo, “Digital Filter for PCM Encoded Signals,” U.S. Patent 3 777 130, Dec. 1973.


[11] A. Peled and B. Liu, “A new hardware realization of digital filters,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 22, no. 6, pp. 456–462, Jun. 1974.
[12] S. N. Merchant and B. V. Rao, “Distributed arithmetic architecture for image coding,” Fourth IEEE Region 10 International Conference, vol. 55, no. 3, pp. 74–77, Nov. 1989.
[13] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, “LMS adaptive filters using distributed arithmetic for high throughput,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, no. 7, pp. 1327–1337, Jul. 2005.
[14] H. Q. Cao and W. Li, “VLSI implementation of vector quantization using distributed arithmetic,” IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World, vol. 2, pp. 668–671, May 1996.
[15] M. T. Sun, T. C. Chen, and A. M. Gotlieb, “VLSI implementation of a 16x16 discrete cosine transform,” IEEE Transactions on Circuits and Systems, vol. 36, no. 6, pp. 610–617, Jun. 1989.
[16] J. Heikkonen, J. Lampinen, and A. M. Gotlieb, “Building industrial applications with neural networks,” in Proceedings of the European Symposium on Intelligent Techniques, pp. 3–4, 1999.
[17] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, Jan. 2015.
[18] J. A. Anderson, “A simple neural network generating an interactive memory,” Mathematical Biosciences, vol. 14, no. 3-4, pp. 197–220, Aug. 1972.
[19] T. Kohonen, “Correlation matrix memories,” IEEE Transactions on Computers, vol. 100, no. 4, pp. 353–359, Apr. 1972.
[20] Y. Liang and X. Liang, “Improving signal prediction performance of neural networks through multiresolution learning approach,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 36, no. 2, pp. 341–352, Apr. 2006.
[21] S. Rhee, B. H. Yang, and H. H. Asada, “Artifact-resistant power-efficient design of finger-ring plethysmographic sensors,” IEEE Transactions on Biomedical Engineering, vol. 48, no. 7, pp. 795–805, Jul. 2001.
[22] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “Wireless sensor networks: a survey,” Computer Networks, vol. 38, no. 4, pp. 393–422, Mar. 2002.
[23] V. Kakkar, “Comparative study on analog and digital neural networks,” IJCSNS International Journal of Computer Science and Network Security, vol. 9, no. 7, pp. 14–21, Jul. 2009.

[24] J. Van der Spiegel, P. Mueller, D. Blackman, P. Chance, C. Donham, R. Etienne-Cummings, and P. Kinget, “An Analog Neural Computer with Modular Architecture for Real-Time Dynamic Computations,” IEEE Journal of Solid-State Circuits, vol. 27, no. 1, pp. 82–92, Jan. 1992.
[25] Y. Liang and E. W. Page, “Multiresolution learning paradigm and signal prediction,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2858–2864, Nov. 1997.
[26] Y. Du, W. Wang, and L. Wang, “Hierarchical recurrent neural network for skeleton based action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110–1118, 2015.
[27] D. Anguita and A. Boni, “Neural network learning for analog VLSI implementations of support vector machines: a survey,” Neurocomputing, vol. 55, no. 1, pp. 265–283, Sep. 2003.
[28] J. Cosp, J. Madrenas, and D. Fernández, “Design and basic blocks of a neuromorphic VLSI analogue vision system,” Neurocomputing, vol. 69, no. 16, pp. 1962–1970, Oct. 2006.
[29] B. Zamanlooy and M. Mirhassani, “Mixed-signal VLSI neural network based on Continuous Valued Number System,” IEEE Transactions on Circuits and Systems, vol. 221, pp. 15–23, Jan. 2017.


Chapter 2
Reconfigurable Neuron PWL
Approximation Based on the Minimum
Operator

2.1

Introduction

Hardware implementation of neural networks relies on efficient implementation of neurons [1, 2, 3] and their synapses [4]. A challenge in the implementation and application of neural networks is to enhance their generalization capability, which leads to better prediction performance. The multiresolution learning paradigm is a relatively new approach that significantly improves the generalization ability of a neural network [5]. The method works by tuning the neurons' activation function during the training process; consequently, it requires a neuron with an adaptable transfer function. Although the multiresolution learning process has been proposed before, its hardware implementation faces challenges in creating adjustable neurons.
In this chapter, a universal and programmable analog neuron is proposed that can change the shape of its transfer function on demand without the need to redesign the neuron circuit. There are several very large-scale integration (VLSI) implementations and applications for various activation functions, such as the sigmoid [1], hard-limit [2], and linear [3] activation functions. However, to the best of the author's knowledge, a reconfigurable analog structure that can provide different transfer functions on demand without redesigning the neuron circuit has not been proposed. The proposed neuron can generate various sigmoid functions as well as hard-limit and linear activation functions.
The proposed neuron can be distributed in the network, where each node is composed of several sub-neurons. This type of neuron has been shown to improve the network performance [6]. Moreover, the distributed sub-neurons in that work scale over the input range, which prevents the neuron from becoming a band-limiting or a low-gain linear function. However, the sub-neuron activity there is not the result of training; rather, the range of input values causes the scaling effect. In contrast, the proposed neuron can generate the desired functions based on the multiresolution training algorithm and under the control of the designer.
The desired function of the proposed neuron is generated by choosing the minimum operator based on the piecewise linear (PWL) approximation method. The PWL approximation of an exponential function utilizing a Winner-Take-All (WTA) circuit was presented for the first time in [7]. However, that method cannot provide a solution for estimating non-monotonic functions.
The architecture proposed in this chapter works by choosing the minimum basis function (operator) to provide the PWL approximation of the desired neuron activation function. Therefore, it can suitably be called a Loser-Take-All (LTA) structure. The design method is suitable for the PWL approximation of different non-monotonic functions; in this chapter, the proposed method is used to estimate a tunable sigmoid function. The proposed neuron can change shape and is tunable to form different sigmoid functions as well as linear ones.

2.2

PWL approximation for a non-Monotonic function

The sigmoid function, defined by $\frac{K}{1+e^{-ax}}$, is the base of the PWL approximation in this chapter. It should be mentioned that the proposed method can be used to estimate other non-monotonic functions as well.
The general approach to PWL approximation is to divide the input range into several subintervals and to define a linear basis function that fits the curve in each subinterval. The WTA-based approach suggested in [7] uses the fact that the exponential function increases monotonically to approximate the function $e^{ax}$ in a positive interval $[0, X_N]$. However, most neuron activation functions, including the hard-limit, hyperbola, poslin, and sigmoid functions, are defined in the interval $[-X_N, X_N]$ and are not monotonic [8].
In the proposed approach, the first step is to find the best estimation, which fits the curve with the minimum error. The least-squares fit is a curve-fitting method that is used for a wide variety of functions and is a routine approach for optimized approximation [9].
Fig. 2.1 shows a case study of the sigmoid function $\frac{19}{1+e^{-0.1x}}$ and its 5-piece PWL approximation obtained by applying the least-squares curve-fitting method. In each subinterval, the minimum operator is the one which fits the original function. Consequently, a Loser-Take-All (LTA) circuit that chooses, in each subinterval, the minimum among all the basis functions can successfully approximate the function. In the positive interval, the output can be described as follows:

$$\frac{K}{1+e^{-ax}} \approx y_{apx} = \min[m_1 x + b_1,\; m_2 x + b_2,\; \ldots,\; m_n x + b_n] \qquad (2.1)$$

in which $m_n$ and $b_n$ are the slope and the y-intercept of each basis function, respectively. Each basis function can be built by a current mirror with the dimension ratio $m_n$ set with respect to the dimensions of the input transistor, $W_{in}/L_{in}$, as follows:

$$m_n = \frac{W_n/L_n}{W_{in}/L_{in}} = K\,\frac{e^{-ax_{n-1}} - e^{-ax_n}}{(x_n - x_{n-1})(1+e^{-ax_n})(1+e^{-ax_{n-1}})} \qquad (2.2)$$

Figure 2.1: The sigmoid function $\frac{19}{1+e^{-0.1x}}$ and the corresponding 5-piece PWL approximation.

The y-intercept is a DC current offset that is added to the output of the corresponding
current mirror.
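The minimum-operator construction of Eq. (2.1) can be illustrated with a short numerical sketch. It is not the thesis implementation: the thesis obtains the basis functions from a least-squares fit, whereas this sketch assumes the simpler chord form (lines through the sigmoid at the breakpoints), and the breakpoints 0, 12.4, and 37.2 are taken from Table 2.2. The function names are hypothetical. Because the sigmoid is concave for x ≥ 0, the minimum over the extended chords reproduces the piecewise interpolant.

```python
import math

K, A = 19.0, 0.1                     # case-study amplitude and slope parameter
BREAKPOINTS = [0.0, 12.4, 37.2]      # x0, x1, x2 (assumed, from Table 2.2)


def sigmoid(x, k=K, a=A):
    """Target activation function K / (1 + exp(-a x))."""
    return k / (1.0 + math.exp(-a * x))


def chord_basis(x_lo, x_hi):
    """Slope and y-intercept of the line through the sigmoid at x_lo and x_hi."""
    m = (sigmoid(x_hi) - sigmoid(x_lo)) / (x_hi - x_lo)
    b = sigmoid(x_lo) - m * x_lo
    return m, b


def pwl_min(x):
    """Eq. (2.1): the output is the minimum over all basis functions."""
    bases = [chord_basis(BREAKPOINTS[i], BREAKPOINTS[i + 1]) for i in range(2)]
    bases.append((0.0, K))           # third basis function: flat line at K
    return min(m * x + b for m, b in bases)


for x in BREAKPOINTS:
    print(f"x = {x:5.1f}  sigmoid = {sigmoid(x):7.3f}  PWL = {pwl_min(x):7.3f}")
```

With chord-based basis functions the approximation touches the sigmoid at every breakpoint and saturates at K for large inputs; a least-squares fit trades this exactness at the breakpoints for a lower error spread over each subinterval.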
It should be noted that this function is symmetric with respect to the point $x_0$. By means of this symmetry, the function in the negative interval, (−65 µA, 0), is generated from the function in the positive interval, (0, 65 µA), by subtracting $f(x)$ from a constant current, such that $g(x) = I_C - f(x)$. The constant current $I_C$ is equal to the upper horizontal asymptote of the sigmoid function, as shown in Fig. 2.1. $f(x)$ and $g(x)$ define the sigmoid function in the positive and the negative intervals, respectively. As shown in Fig. 2.1, the minimum operator between the basis functions 1, 2, and 3 gives the PWL estimation in the positive interval. The dimension ratios of the corresponding current mirrors which generate these basis functions are as follows:

$$m_1 = K\,\frac{e^{-ax_0} - e^{-ax_1}}{(x_1 - x_0)(1+e^{-ax_0})(1+e^{-ax_1})} = K\,\frac{1 - e^{-a\Delta x}}{(2\Delta x)(1+e^{-a\Delta x})} \qquad (2.3)$$

$$m_2 = K\,\frac{e^{-ax_1} - e^{-ax_2}}{(x_2 - x_1)(1+e^{-ax_1})(1+e^{-ax_2})} = K\,\frac{e^{-a\Delta x}(1 - e^{-ap\Delta x})}{(p\Delta x)(1+e^{-a\Delta x})(1+e^{-a(p+1)\Delta x})} \qquad (2.4)$$

$$m_3 = 0 \qquad (2.5)$$

in which $x_1 - x_0 = \Delta x$ and $x_2 - x_1 = p\Delta x$.

Figure 2.2: The reconfigurable neuron schematic.

Figure 2.3: (a) The output voltages $V_A$ (dashed line) and $V_B$ (solid line) versus $I_{in}$ for $V_G$ of 700 mV, 730 mV, and 800 mV. (b) The ideal sigmoid function, the PWL approximation obtained from the least-squares fit method, and the simulation result of the neuron output function (solid line), plotted as $I_{out}$ versus $I_{in}$.


The DC offsets of the current mirrors are obtained as follows:

$$b_1 = \frac{K}{2}, \qquad b_2 = \frac{K}{p}\left(p + e^{-a\Delta x}\left((p+1)e^{-ap\Delta x} - 1\right)\right), \qquad b_3 = K \qquad (2.6)$$

To simplify the structure, the second basis function is generated from the combination of the first and the third ones such that $m_2 = \alpha(m_1 + m_3)$ and $b_2 = \beta(b_1 + b_3)$. By substituting these relations into Equations (2.3) to (2.6), $\alpha$ and $\beta$ are obtained as follows:

$$\alpha = \frac{2e^{-a\Delta x}(1 - e^{-ap\Delta x})}{p(1 - e^{-a\Delta x})(1 + e^{-a(1+p)\Delta x})}, \qquad \beta = \frac{3p}{2\left(p + e^{-a\Delta x}\left((p+1)e^{-ap\Delta x} - 1\right)\right)} \qquad (2.7)$$

In this section, the structure and the design method of the current-mode neuron activation function $\frac{19}{1+e^{-0.1x}}$ in the input range of [−65 µA, 65 µA] are presented. Obtaining the second basis function by averaging the first and the third ones not only leads to a simpler structure but also keeps the activation function smooth. Any mismatch or process variation that may arise in the first or the third current mirror cells during fabrication would similarly affect the result of the average and thus would eliminate
discontinuity in the PWL approximation result.
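As a quick numerical check of the slope relation, the sketch below compares the closed-form $\alpha$ of Eq. (2.7) with the ratio $m_2/m_1$ computed directly from the chord slopes of Eqs. (2.3) and (2.4); since $m_3 = 0$, the two should coincide. The values $\Delta x = 12.4$ and $p = 2$ are assumptions inferred from the breakpoints in Table 2.2, and the function names are hypothetical.

```python
import math


def chord_slope(k, a, x_lo, x_hi):
    """Slope of the line through K/(1+exp(-a x)) at x_lo and x_hi."""
    def f(x):
        return k / (1.0 + math.exp(-a * x))
    return (f(x_hi) - f(x_lo)) / (x_hi - x_lo)


def alpha_closed_form(a, dx, p):
    """Closed-form alpha from Eq. (2.7)."""
    e1 = math.exp(-a * dx)
    numerator = 2.0 * e1 * (1.0 - math.exp(-a * p * dx))
    denominator = p * (1.0 - e1) * (1.0 + math.exp(-a * (1.0 + p) * dx))
    return numerator / denominator


K, A = 19.0, 0.1
DX, P = 12.4, 2.0                    # assumed: x1 = 12.4, x2 = 37.2 = (1 + p) * dx

m1 = chord_slope(K, A, 0.0, DX)                  # Eq. (2.3) with x0 = 0
m2 = chord_slope(K, A, DX, (1.0 + P) * DX)       # Eq. (2.4)
alpha = alpha_closed_form(A, DX, P)

print(f"m1 = {m1:.4f}, m2 = {m2:.4f}")
print(f"m2 / m1 = {m2 / m1:.6f}, alpha (2.7) = {alpha:.6f}")
```

Note that $K$ cancels in the ratio, so $\alpha$ depends only on $a$, $\Delta x$, and $p$.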

2.3

The proposed reconfigurable structure of the neuron

Fig. 2.2 shows the structure of the proposed reconfigurable neuron with the sigmoid activation function. A two-section WTA (shown in the dashed box) is employed to compare the mirrored currents corresponding to the first basis function, $I_1 = m_1 I_{in} + b_1$, and the third one, $I_3 = b_3$. Due to the nature of the WTA, the output voltage $V_A$ goes high only if $I_1 > I_3$. In the case that $I_1 < I_3$, $V_A$ goes low while $V_B$ goes high. The voltages $V_A$ and $V_B$ are used to control the switches which let the minimum operator pass through the output transistor, $M_{13}$. NOT gates are used as push-pull amplifiers to generate $V_A$ and $V_B$ levels that are able to reach the rail values of 0 and $V_{dd}$.
In conventional WTA circuits, $M_3$ and $M_6$ work in the saturation region, and the voltages $V_A$ and $V_B$ take complementary high and low values: even a small difference between the two currents reduces the drain voltage of the transistor with the lower current, and through the feedback loop this reduction continues until $V_{o2}$ becomes almost 0 and $V_{o1}$ becomes 1 [10]. When $M_3$ and $M_6$ work in the triode region, an overlap region is generated in which $V_A$ and $V_B$ can be low at the same time, due to the stronger dependence of the current on the drain voltage. The bias voltage $V_G$ determines the range of input currents over which the overlap occurs. Two cascaded NOT gates are used to push $V_A$/$V_B$ to zero or pull them to $V_{dd}$. In this way, the dependence of these voltages on the effective threshold voltages of the NOT gates is negligible. Fig. 2.3(a) shows the variation of the overlap range for different values of $V_G$. As shown in this figure, the overlap gap widens when a lower $V_G$ is applied to the circuit.
The signals $V_A$, $V_B$, and $V_A \cdot V_B$ are used to control the switches which allow one of the basis functions to pass through to the output. In the positive input range, Sign is 1, which closes $S_1$; in this case, the output current is equal to $I_o$, as shown in Fig. 2.2. In the negative input range, $S_2$, $S_3$, and $S_4$, which are also controlled by the Sign signal, are closed, and the
output current would be equal to Iout = b3 − Io .
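The selection logic described above can be summarized in a purely behavioral sketch. It is not a circuit model: the slopes and intercepts are the Table 2.2 values, and the edges of the overlap window (where the second basis function passes) are assumed to sit at the Table 2.2 intersection points, whereas in the circuit they are set by $V_G$. The names `neuron_output`, `X1`, and `X2` are hypothetical.

```python
# Behavioral sketch of the basis-function selection and the sign handling.
B1 = (0.47, 9.555)      # (slope, intercept) of basis function 1, Table 2.2
B2 = (0.145, 13.6)      # basis function 2
B3 = (0.0, 19.0)        # basis function 3 (constant current b3)
X1, X2 = 12.4, 37.2     # assumed edges of the overlap window, in uA


def neuron_output(i_in):
    """Return I_out (uA) for an input current i_in (uA)."""
    sign = i_in >= 0                 # Sign = 1 for the positive input range
    x = abs(i_in)
    if x < X1:
        m, b = B1                    # first basis function is selected
    elif x <= X2:
        m, b = B2                    # overlap region: second basis function
    else:
        m, b = B3                    # third basis function (saturation)
    i_o = m * x + b
    # Positive range: I_out = I_o; negative range: I_out = b3 - I_o.
    return i_o if sign else B3[1] - i_o


for i_in in (-60.0, -20.0, -5.0, 5.0, 20.0, 60.0):
    print(f"I_in = {i_in:6.1f} uA -> I_out = {neuron_output(i_in):7.3f} uA")
```

By construction, the outputs for $+I_{in}$ and $-I_{in}$ always add up to $b_3$, which is the symmetry the subtraction stage relies on.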
Fig. 2.3(b) illustrates the simulation result for $V_G$ of 710 mV, the ideal sigmoid function, and the PWL approximation from Fig. 2.1. The voltage $V_G$ is chosen so that it provides the subintervals most similar to those of the sigmoid approximation shown in Fig. 2.1. The transistor dimensions and ratios are selected according to Equations (2.3) to (2.7) and are summarized in Table 2.1.
The accuracy of this method can be shown by using the standard deviation, which is defined as $E = \frac{K}{1+e^{-ax}} - y_{apx}$. In the least-squares method, the subintervals are not equal and are chosen to provide the best fit to the curve. If the first subinterval $(x_1 - x_0)$ is chosen as the reference and is taken to be equal to $\Delta x$, the subinterval $(x_n - x_{n-1})$ can be written as $p\Delta x$, while $x_{n-1} - x_0 = q\Delta x$. In this case, the standard deviation, $E$, for the subinterval $(x_{n-1}, x_n)$ is obtained as follows:

$$\frac{E}{K} = \frac{1}{1+e^{-ax}} - \frac{x\,e^{-aq\Delta x}(1 - e^{-ap\Delta x}) + p\Delta x + q\Delta x\,e^{-aq\Delta x}(e^{-ap\Delta x} - 1) + p\Delta x\,e^{-a(q+p)\Delta x}}{p\Delta x\,(1+e^{-aq\Delta x})(1+e^{-a(p+q)\Delta x})} \qquad (2.8)$$

As shown in the above equation, the standard deviation depends on the subinterval, $a$, and $K$. The maximum standard deviation occurs at the two ends of the subinterval, $x_n$ and/or $x_{n-1}$. Consequently, it is simplified to:

$$\frac{E_{max}}{K} = \frac{2q\Delta x\,e^{-aq\Delta x}}{p\Delta x\,(1+e^{-aq\Delta x})(1+e^{-a(p+q)\Delta x})} \qquad (2.9)$$

Table 2.1: The neuron schematic transistor dimensions.
W_in/L_in = 7.5/1              W_2/L_2 = W_3/L_3 = 4/1
W_1/L_1 = W_8/L_8 = 3.5/1      W_4/L_4 = W_5/L_5 = 4/1
W_9/L_9 = 1/1                  W_6/L_6 = W_7/L_7 = 0.25/0.18

Table 2.2: The intersection points of (x1, y1) and (x2, y2), the slopes, and the y-intercepts of the 5-piece PWL approximation.
Basis function   Intersection points (x, y)          Slope    y-intercept
1                (-12.4, 3.698), (12.4, 15.4)        0.47     9.555
2                (12.4, 15.4), (37.2, 19.01)         0.145    13.6
3                (37.2, 19), (37.2, 18.99)           0        19
4                (-12.4, 3.698), (-37.2, 0.094)      0.145    5.5
5                (-37.2, 0.094), (-62, 0.112)        0        0.0112

The intersection points ($x_0$ to $x_n$) of the PWL approximation of the sigmoid function $\frac{K}{1+e^{-ax}}$ depend on the number of basis functions and on $a$. Accordingly, for a 5-piece PWL approximation of $\frac{K}{1+e^{-0.1x}}$, the deviation error depends only on $K$. Fig. 2.4 shows the deviation error for three different values of $K$.

Figure 2.4: The deviation error for the 5-piece PWL approximation of $\frac{K}{1+e^{-0.1x}}$ for $K$ = 19, 10, and 5.

The intersection points, the slopes, and the
y-intercept of each basis function are shown in Table 2.2.
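The dependence of the deviation error on $K$ can be explored numerically with the sketch below. It evaluates $E = \frac{K}{1+e^{-ax}} - y_{apx}$ over the positive input range using the positive-side basis functions of Table 2.2. One assumption is made that the text does not state explicitly: for $K \neq 19$, the slopes and intercepts are taken to scale linearly with $K$. Under that assumption the error scales linearly with $K$ as well. The function names are hypothetical.

```python
import math

A = 0.1
# Positive-side basis functions (slope, intercept) from Table 2.2, for K = 19.
BASES_K19 = [(0.47, 9.555), (0.145, 13.6), (0.0, 19.0)]


def deviation(x, k):
    """E = K/(1+exp(-a x)) - y_apx, with y_apx from the minimum operator."""
    scale = k / 19.0    # assumption: basis functions scale linearly with K
    y_apx = min(scale * (m * x + b) for m, b in BASES_K19)
    return k / (1.0 + math.exp(-A * x)) - y_apx


def max_abs_deviation(k, x_max=65.0, steps=1300):
    """Largest |E| on a uniform grid over [0, x_max]."""
    return max(abs(deviation(i * x_max / steps, k)) for i in range(steps + 1))


for k in (19.0, 10.0, 5.0):
    e_max = max_abs_deviation(k)
    print(f"K = {k:4.1f}: max |E| = {e_max:.4f}, max |E| / K = {e_max / k:.5f}")
```

The normalized value max |E| / K is the same for every $K$ in this sketch, which is the sense in which the error "depends only on $K$" once $a$ and the number of pieces are fixed.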

2.4

The Reconfigurability and the simulation results

The main advantage of the proposed neuron over previously proposed structures is that it can be controlled off-chip or on-chip to generate a wide variety of transfer functions based on the requirements and applications. The form and slope of the neuron transfer function can be adjusted by externally changing voltages and by programming the neuron during training when it is used in a chip-in-the-loop or online configuration. The proposed structure is implemented in CMOS 0.18 µm technology and uses a power supply of 2.5 V. The area and power of the structure are measured as 94.4 µm² and 0.92 mW, respectively.
The reconfigurability of the neuron is realized by controlling the substrate voltage, $V_S$, of the PMOS transistors, as shown in Fig. 2.2. The voltage difference between the substrate and the source of the PMOS transistors, $V_{dd} - V_S$, changes the threshold voltage, $V_{th}$, of the corresponding transistors due to the body effect. The variation of the substrate-source voltage affects the drain current as a function of $V_{th}$. Consequently, the current ratios between the transistors of the three basis functions and the input transistor vary with the substrate voltage.

Figure 2.5: The PWL neuron outputs ($I_{out}$ versus $I_{in}$) that show the reconfigurable sigmoid functions for $V_G$ = 710 mV and the linear functions for $V_G$ = 250 mV. Different shades of the sigmoid and the linear functions are shown for $V_S$ of 2 V, 2.3 V, 2.4 V, and 2.5 V.

Figure 2.6: 2-bit voltage DAC corner analysis result for the conditions ff, ss, fs, and sf.
Fig. 2.5 shows the post-layout simulation results of the neuron transfer function for different values of the substrate voltage $V_S$, while the bias voltage $V_G$ is 250 mV for the linear functions and 710 mV for the variable sigmoid functions. As shown in this figure, a lower $V_S$ results in a higher slope for the first and the second basis functions. At $V_G$ = 710 mV and $V_S$ = 2 V, the slopes change to the point that the sigmoid function ultimately reshapes into a hard-limit function. That means the parameter $a$, which controls the shape of $\frac{1}{1+e^{-ax}}$, can be controlled off-chip via the substrate voltage. For $V_S$ = 2 V, 2.3 V, 2.4 V, and 2.5 V, $a$ is 1.2, 0.3, 0.2, and 0.1, respectively, as shown in Fig. 2.5.
When the voltage $V_G$ is lower than 300 mV, both $V_A$ and $V_B$ go high. Consequently, only the second basis function can pass through to the output, which generates a linear activation function. The linear transfer functions shown in Fig. 2.5 are generated at $V_G$ of 250 mV and $V_S$ of 2 V, 2.3 V, 2.4 V, and 2.5 V. The highest slope corresponds to $V_S$ of 2 V, as expected.
The substrate voltage of the mentioned PMOS transistors is controlled by a 2-bit voltage digital-to-analog converter (DAC), which is shown in the dashed-dotted box in Fig. 2.2. The output voltage of this block is $V_S$ and depends on which of the transistors are on at the time. If and only if $x_1x_0$ is 00, the transistor $M_{18}$ turns off and presents a high impedance at the output node, providing the output voltage of $V_S = V_{dd}$ = 2.5 V. When $x_1x_0$ is equal to 01, 10, and 11, transistors $M_{16-18}$ turn on, respectively, to provide the corresponding $V_S$ of 2 V, 2.3 V, and 2.4 V. The output voltage of the voltage DAC for the different values of $x_1x_0$ is shown in Fig. 2.6.
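The programming path from the 2-bit code to the neuron shape can be written down as a small lookup sketch. The code-to-$V_S$ and $V_S$-to-$a$ pairs are the nominal values quoted in the text; treating the neuron as an ideal sigmoid with amplitude 19 µA (the case-study value) is an assumption made only for illustration, and the names `DAC_VS`, `SLOPE_A`, and `activation` are hypothetical.

```python
import math

# Nominal DAC output V_S (V) for each 2-bit code x1x0, as quoted in the text.
DAC_VS = {0b00: 2.5, 0b01: 2.0, 0b10: 2.3, 0b11: 2.4}

# Sigmoid slope parameter a realized at each V_S (values reported for Fig. 2.5).
SLOPE_A = {2.5: 0.1, 2.4: 0.2, 2.3: 0.3, 2.0: 1.2}


def activation(i_in, code, k=19.0):
    """Idealized neuron output for input current i_in (uA) and DAC code."""
    a = SLOPE_A[DAC_VS[code]]
    return k / (1.0 + math.exp(-a * i_in))


for code in (0b00, 0b11, 0b10, 0b01):
    v_s = DAC_VS[code]
    print(f"x1x0 = {code:02b}: V_S = {v_s} V, a = {SLOPE_A[v_s]}, "
          f"I_out(10 uA) = {activation(10.0, code):.2f} uA")
```

Code 01 selects the steepest curve, which is the setting the text associates with the hard-limit shape.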
The corner analysis for the conditions ff (Fast NMOS Fast PMOS), ss (Slow NMOS Slow PMOS), fs (Fast NMOS Slow PMOS), and sf (Slow NMOS Fast PMOS) is performed, and its result is also shown in Fig. 2.6 to investigate the effect of process variation on the output. The maximum output fluctuation of the voltage DAC happens for $x_1x_0$ = 01, where $V_S$ changes from 1.93 V at the ff condition to 2.07 V at the fs condition. In this case, $V_S$ is supposed to be equal to 2 V, which corresponds to the hard-limit neuron shape, and the fluctuation does not affect the neuron shape significantly. Moreover, the hard-limit neuron shape is considered the coarse tuning when used in the multiresolution learning paradigm [5], meaning that small variations in the transfer function can be compensated during the fine-tuning stages. The dimensions of transistors $M_{14-18}$ are 0.25/2, 1.25/2, 1.25/2, 4.2/2, and 3.8/2, respectively.

The impact of process and temperature variations on the neuron function is investigated by performing the corner analysis for the neuron transfer function. The post-layout simulations are shown in Fig. 2.7 for $a$ = 0.1 and $a$ = 1.2 at $V_G$ = 800 mV, and for the linear function at $V_G$ = 200 mV, at the temperatures of 27 °C, −55 °C, and 125 °C. The deviation from the tt (Typical NMOS Typical PMOS) condition at the room temperature of 27 °C is presented in Fig. 2.8. As shown in this figure, the maximum deviation occurs at the ff condition for both −55 °C and 125 °C.

Figure 2.7: Post-layout simulation results for the conditions ff, ss, fs, and sf at the temperatures of −55 °C, 27 °C, and 125 °C.

Figure 2.8: The corner and temperature post-layout analysis for the conditions ff, ss, sf, and fs at temperatures of 27 °C, −55 °C, and 125 °C, showing the deviation from the activation function at 27 °C.
Similar to other analog structures, the proposed neuron is susceptible to mismatch. However, the parametric analysis simulation results show that the circuit works well with a 10% mismatch between transistors. It should be noted that the proposed neuron, when used in an on-chip or a chip-in-the-loop configuration, can enjoy some flexibility because the adaptation of the synapses during the training is based on the actual physical characteristics of the fabricated neuron. Moreover, the mechanism for adjusting the neuron transfer function allows more tolerance to transistor mismatches while the network is being trained.

2.5

Conclusion

A novel reconfigurable neuron is proposed in this chapter for use in analog and mixed-signal neural networks with different requirements. The shape of the neuron transfer function can be changed on demand to provide different shades of sigmoid, hard-limit, and linear activation functions. The structure is based on the piecewise linear approximation of the desired transfer function, and the controllability is obtained by adjusting the substrate voltage of the PMOS transistors. A 2-bit voltage DAC is used to adjust the shape of the neuron on-chip or off-chip. The proposed structure is a valuable building block for analog or mixed-signal networks that use the multiresolution learning process.


References
[1] C. H. Tsai, Y. T. Chih, W. H. Wong, and C. Y. Lee, “A Hardware-Efficient Sigmoid Function With Adjustable Precision for a Neural Network System,” IEEE Trans. Circuits Syst. II, vol. 62, no. 11, pp. 1073–1077, Nov. 2015.
[2] Q. Liu and J. Wang, “Finite-Time Convergent Recurrent Neural Network with a Hard-Limiting Activation Function for Constrained Optimization with Piecewise-Linear Objective Functions,” IEEE Trans. Neural Netw., vol. 22, no. 4, pp. 601–613, Mar. 2001.
[3] T. Qiu, X. Wen, and F. Zhao, “Adaptive-Linear-Neuron-Based Dead-Time Effects Compensation Scheme for PMSM Drives,” IEEE Trans. Power Electron., vol. 31, no. 3, pp. 2530–2538, Mar. 2016.
[4] Y. Zhang, Y. Li, X. Wang, and E. G. Friedman, “Synaptic Characteristics of Ag/AgInSbTe/Ta-Based Memristor for Pattern Recognition Applications,” IEEE Trans. Electron. Dev., vol. 64, no. 4, pp. 1806–1811, Apr. 2017.
[5] Y. Liang and X. Liang, “Improving signal prediction performance of neural networks through multiresolution learning approach,” IEEE Trans. Syst., Man, Cybern. B, vol. 36, no. 2, pp. 341–352, Apr. 2006.
[6] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, “A Prototype CVNS Distributed Neural Network Using Synapse-Neuron Modules,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 7, pp. 1482–1490, 2012.
[7] D. Moro-Frias, C. A. De La Cruz-Blas, and M. T. Sanz-Pascual, “PWL Current-Mode CMOS Exponential Circuit Based on Maximum Operator,” IEEE Antennas Wireless Propag. Lett., vol. 5, no. 1, pp. 450–453, Dec. 2006.
[8] H. B. Demuth, M. H. Beale, O. De Jesús, and M. T. Hagan, Neural network design, 1996.
[9] B. Yang and C. A. Balanis, “Least square method to optimize the coefficients of complex finite-difference space stencils,” IEEE Trans. Signal Processing, vol. 45, no. 11, pp. 2858–2864, Nov. 1997.
[10] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead, “Winner-take-all networks of O(n) complexity,” Advances in Neural Information Processing Systems, pp. 703-711, 198


Chapter 3
Mixed-Signal Synapse Multipliers for
Feed-Forward Neural Networks

3.1

Introduction

In Analog Neural Networks (Analog NNs) [1, 2, 3], neurons can be realized with simple and elegant non-linear analog circuits with only a few transistors. Moreover, the addition of values can be performed by simple nodal summation of currents, as long as the summed current can drive the circuit of the next stage. However, the accuracy of analog circuits has always been a limiting factor for the realization of large multi-layer Analog NNs. A multi-layer network requires storing a large number of synaptic values. In analog circuits, these values are typically stored on capacitors, where they may change due to leakage currents; hence, periodic refreshing is required. The issue of storage has been shown to limit the size and complexity of such networks.
The mixed-signal approach has been shown to be an intriguing choice for neural network implementations [4, 5, 6, 7]. In such systems, the advantages of both the analog and digital [9, 10, 11, 8] domains are gathered in one place in order to overcome the design challenges and to achieve smaller area, lower power consumption, higher speed, and smoother activation function realization.
One of the most efficient approaches to implementing the synapse in mixed-signal circuitry is based on the Multiplying Digital-to-Analog Converter (MDAC), which is used to multiply the synapse value by the neuron input. Conventional MDACs work based on the weighted summation of currents, which means that weighted current mirrors are required in the network. Moreover, in each layer of the network with N neurons, $N^2$ MDAC units are required. Optimizing the size of the multiplier would therefore significantly affect the feasible size of the network and hence its performance.
In this chapter, a programmable mixed-signal MDAC multiplier is proposed for use in feed-forward neural networks. The proposed structure is modular and easy to adapt to different network configurations, while the area is reduced by using digital gates to ease the multiplication and to avoid large transistors. Moreover, synaptic weights are stored in registers, which eliminates the need for capacitors and refreshing circuitry.

3.2

Neural Network Configurations

In this section, the general configuration of one layer of the mixed-signal neural network
is presented. There are three main building blocks for the mixed-signal implementation of
neural networks: programmable MDACs for synapse multipliers, adders, and non-linear
neurons that create an integrated synapse-neuron building block.
In the proposed architecture, the multiplication of the synaptic weights by the network inputs is performed by the MDAC: the synaptic weights are stored in digital registers and multiplied by the analog inputs. The result of each multiplier passes through an S-shaped neuron and is then added to the multiplication results coming from the other blocks.

Figure 3.1: System-level configuration of the proposed mixed-signal neural network.
Fig. 3.1 shows the block diagram of a sample 2-2-1 network. As can be seen in this figure, the outputs of n building blocks are connected in parallel to form a neuron. The number of MDACs in each layer is equal to the number of inputs to that layer.
The digital registers store the values of the synaptic weights and are programmed according to the network training. The weights are denoted by Ymn in this figure, where m and n index the corresponding neuron and input of each layer, respectively.
Since the circuit design is based on current-mode operation, addition in the network is performed by summing currents. Neurons are resistive non-linear functions that are distributed in the network.
The proposed network is trained off-line: weights and network parameters are calculated off chip and downloaded later into the weight registers. However, the network can easily be adapted for on-line training by adding extra hardware for the weight adjustment calculations.


Figure 3.2: General configuration of the mixed-signal neural network [7]. (Circuit schematic; labels: M1 to M4, Bias1, Bias2, Iin, Vout.)

3.3 Building block’s components

The neuron and its simulation results are presented in this section, followed by the proposed multiplier structure, which plays an important role in the reliability and accuracy of the network.

3.3.1 Neuron

Neurons for this network are resistive-type and distributed in order to increase the signal-to-noise ratio of the network [7, 5]. The neuron transfer function self-adjusts, preventing the saturation of neurons when the total number of inputs increases. The neuron uses the fundamental nonlinearity in the V-I characteristics of MOS transistors to approximate a sigmoid-like function. Fig. 3.2 shows the resistive-type neuron. The 6-transistor design [7] is biased to operate in both the triode and saturation regions and closely approximates the original sigmoid function. The simulated transfer function of the neuron is shown in Fig. 3.3.


Figure 3.3: Non-linear neuron activation function which approximates the sigmoid function.

Figure 3.4: Multiplying the two less significant and the two most significant bits of Y with the analog value of the input X. (Diagram labels: blocks MA and MB; weight bits y3 y2 y1 y0; X = x0 + 2x1 + 2^2 x2 + 2^3 x3; M = MA + 4MB.)

3.3.2 Mixed-Signal Multiplier

In its most general form, multiplication between two binary values (X and Y) can be performed as follows:

$$M = \sum_{i=0}^{3} x_i 2^i \cdot \sum_{j=0}^{3} y_j 2^j \qquad (3.1)$$


Figure 3.5: The proposed modular mixed-signal multiplier to be used in distributed feed-forward neural network. (Circuit schematic; labels: blocks MA and MB, transistors M1 to M7, switches S0A, S1A, S2A, S0B, S1B, S2B driven by weight bits y0 to y3, input X, Iin, Iout; annotated sizes W/L=4.5, Wp/Lp=5.6, with 2W/L and 4Wp/Lp devices.)
where x_i and y_j are the ith and jth bits of X and Y, respectively. In mixed-signal multiplication, one of the numbers (X) is an analog value. In the first layer, the synapse receives an analog input and multiplies it by a digital weight. The multiplication result passes through the S-shaped nonlinear neuron and is then added to the multiplication results coming from the other branches in the first layer.
In conventional MDACs, multiplication relies on weighted current mirrors. That means that if the size of the first transistor in a weighted current mirror is W/L, transistors of twice, four times, and eight times W/L are required to perform the digital-to-analog multiplication and conversion. The proposed modular multiplier reduces the size of the transistors significantly by introducing a new method of multiplication. In the proposed approach, the analog input is multiplied by two bits of the digital weight (Y) at a time. Fig. 3.4 illustrates the concept: the two least significant and the two most significant bits of the weight (Y) are multiplied by the analog input X separately. Based on this separation, the multiplication result from (3.1) can be rewritten as M = MA + 4MB, where:
$$M_A = (y_0 + 2y_1) \cdot (x_0 + 2x_1 + 2^2 x_2 + 2^3 x_3) \qquad (3.2)$$

$$M_B = (y_2 + 2y_3) \cdot (x_0 + 2x_1 + 2^2 x_2 + 2^3 x_3) \qquad (3.3)$$

As can be seen in equation (3.2), the output of the MA block can be 0, X, 2X, or 3X, depending on the values of y0 and y1. In this method, combinations of y0 and y1 are used to generate control signals that let 0, X, 2X, or their sum (3X) pass through to the output. The MB block has the same structure as MA, but its control signals are generated by combinations of y2 and y3.
Fig. 3.5 shows the modular architecture for a 4-bit by 4-bit equivalent mixed-signal multiplier. In this figure, SiA and SjB are control signals generated by the pair y0 and y1 and the pair y2 and y3, respectively. S0A (S0B) lets the input X pass through M1 (M3) if y1y0 (y3y2) = 01. By the same logic, twice the value of X passes through M2 (M4) when y1y0 (y3y2) = 10. When y1y0 (y3y2) = 11, paths 1 and 2 are both enabled, and the sum of the M1 (M3) and M2 (M4) currents passes through M5 (M6), which is the output of the MA (MB) block.
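The two-bit partitioning described above can be sketched as a short behavioral model in Python. This is only an illustration of the arithmetic, not of the circuit: the function names `two_bit_block` and `partitioned_multiply` are hypothetical, and the analog input X is represented by an ideal floating-point current.

```python
def two_bit_block(x, low_bit, high_bit):
    """Behavioral model of one MA/MB-style block.

    Passes 0, X, 2X, or 3X depending on the two weight bits,
    mirroring the X path and the 2X path described in the text.
    """
    out = 0.0
    if low_bit:       # path 1 enabled: contributes X
        out += x
    if high_bit:      # path 2 enabled: contributes 2X
        out += 2 * x
    return out


def partitioned_multiply(x, y):
    """Multiply an analog value x by an unsigned 4-bit weight y (0..15).

    The weight is split into its two least significant bits (block MA)
    and its two most significant bits (block MB); M = MA + 4*MB.
    """
    if not 0 <= y <= 15:
        raise ValueError("y must be a 4-bit unsigned weight")
    y0, y1, y2, y3 = ((y >> k) & 1 for k in range(4))
    ma = two_bit_block(x, y0, y1)
    mb = two_bit_block(x, y2, y3)
    return ma + 4 * mb
```

For example, with X = 5 and Y = 1011 the model gives MA = 15 and MB = 10, so M = 15 + 4 × 10 = 55, the same as 5 × 11.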
To confirm the validity and accuracy of the operation, the simulation result and the ideal expected multiplication result for Y=1111 are compared in Table 3.1. For Y=1111, MA and MB reach their maximum values, which fully loads the multiplier. In this case, the error and the power consumption are at their maximum levels. MA and MB also have the same value because of the modularity of the design. As can be seen in this table, the maximum error is 0.24µA and occurs for analog inputs of 8µA and 9µA. The maximum error percentage is 1.3%, for an input of 4µA. The corner analysis in Fig. 3.6 shows the result under process variations.

Figure 3.6: MA/MB output result for Y=1111 for corner analysis fast-fast (ff), slow-slow (ss), slow-fast (sf) and fast-slow (fs) to show the process variation effect. (Legend: tt, ff, ss, sf, fs. Axes: input current (X), 0 to 15; multiplication result (MA/MB), 0 to 50.)

Figure 3.7: Multiplication results for Y=0010, 0111, 1010, 1011, and 1100. Ideal and simulation results are shown with dashed and solid lines respectively. (Axes: input current (X), 0 to 15; multiplication result, 0 to 150.)

The multiplication results for five more digital weights, together with the ideal multiplication results, are shown in Fig. 3.7 for an input range of 0 to 15µA. The maximum error of 0.24µA is about one-fourth of the multiplier accuracy of 1µA. That means the error is not only within the acceptable range, but the accuracy could also be increased to 5 bits.
Table 3.1: Simulation and ideal multiplication result for Y=1111 at different input levels and the measured error.

| Input X (µA) | Ideal result (Y=1111) | Simulation result (µA) | Error (µA) | Error (%) |
|---|---|---|---|---|
| 1 | 3 = 000011 | 2.97 | -0.032 | 1 |
| 2 | 6 = 000110 | 6.05 | 0.051 | 0.8 |
| 3 | 9 = 001001 | 9.12 | 0.116 | 1.2 |
| 4 | 12 = 001100 | 12.16 | 0.165 | 1.3 |
| 5 | 15 = 001111 | 15.19 | 0.191 | 1.2 |
| 6 | 18 = 010010 | 18.21 | 0.216 | 1.1 |
| 7 | 21 = 010101 | 21.23 | 0.237 | 1.1 |
| 8 | 24 = 011000 | 24.24 | 0.240 | 0.9 |
| 9 | 27 = 011011 | 27.24 | 0.240 | 0.7 |
| 10 | 30 = 011110 | 30.23 | 0.230 | 0.7 |
| 11 | 33 = 100001 | 33.23 | 0.230 | 0.6 |
| 12 | 36 = 100100 | 36.19 | 0.190 | 0.5 |
| 13 | 39 = 100111 | 39.17 | 0.170 | 0.4 |
| 14 | 42 = 101010 | 42.12 | 0.120 | 0.2 |
| 15 | 45 = 101101 | 45.07 | 0.070 | 0.1 |
Here, the LSB is taken to be 1µA at the output, so the range of 1µA to 15µA is equivalent to four bits. Because the error is less than 0.5µA (half an LSB), the structure can be extended to higher resolution. Every two-bit increase in weight resolution requires one additional MA block in the circuitry.
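The half-LSB argument can be checked against the reported data with a short script. The sketch below is only a consistency check: the simulated currents are copied from Table 3.1 (two decimal places), the ideal block output for Y=1111 is taken as 3X as listed in that table, and the function name `error_report` is hypothetical.

```python
# Simulated MA/MB output currents (µA) for Y=1111 and X = 1..15 µA,
# copied from Table 3.1.
SIMULATED_UA = [2.97, 6.05, 9.12, 12.16, 15.19, 18.21, 21.23, 24.24,
                27.24, 30.23, 33.23, 36.19, 39.17, 42.12, 45.07]
LSB_UA = 1.0  # output LSB assumed in the text


def error_report(simulated, lsb=LSB_UA):
    """Return (worst absolute error, inputs where it occurs, within half an LSB?)."""
    errors = []
    for x, sim in enumerate(simulated, start=1):
        ideal = 3 * x                          # ideal block output for Y=1111
        errors.append((x, round(sim - ideal, 2)))
    worst = max(abs(err) for _, err in errors)
    worst_inputs = [x for x, err in errors if abs(err) == worst]
    return worst, worst_inputs, worst < lsb / 2


worst, worst_inputs, within_half_lsb = error_report(SIMULATED_UA)
print(worst, worst_inputs, within_half_lsb)
```

With the tabulated values, the largest deviation is 0.24µA at inputs of 8µA and 9µA, which stays below half of the assumed 1µA LSB.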
The current-based structure of this multiplier eliminates the need for extra storage to hold multiplication results, which yields a large saving in area. Moreover, the combination of digital gates and analog circuitry significantly reduces the total area and static power consumption compared with a conventional MDAC.
Three recent conventional mixed-signal multipliers are compared with the proposed modular multiplier in Table 3.2. A dash indicates that the value was not reported in the original paper.

Table 3.2: Comparison of the proposed modular multiplier with recent mixed-signal multipliers.

| Parameter | Proposed |
|---|---|
| Technology (µm) | 0.18 |
| Power supply (V) | 1.8 |
| Chip area (µm²) | 244 |
| Power (mW) | 0.23 |
| Largest transistor (µm/µm) | 4.5/1.4 |

Values listed for the compared designs: [12]: 0.35, 775, 125/0.6; [13]: 0.09, 1.2, 406, -; [14]: 0.35, 3.3, 30.73, -.

The area of this multiplier is 244µm² and the maximum measured power consumption is 0.23mW. The small area of this multiplier makes it an excellent choice for deep-learning neural networks. In addition, this highly modular and scalable VLSI architecture can be unified with the neuron structure, which increases the number of synapses per die area and allows it to be used in different distributed neural network applications.

3.4 Conclusion

A modular mixed-signal multiplier architecture is implemented in 0.18µm CMOS for multi-layer neural network applications. The multiplier receives analog inputs, linearly multiplies them by digital weights stored in registers, and delivers the result as a current. This structure reduces area and static power consumption by using a new multiplication technique that multiplies every two bits of the weight separately by the whole analog value. The area and the power consumption at the maximum input level are 244µm² and 0.23mW, respectively, with a measured output current error of less than 0.5µA. The area and power efficiency of this structure, together with its modularity, make it an easy and excellent choice for neural network design, especially for multi-layer networks in which these criteria are in high demand. Corner analysis results confirm the robustness of this structure.


References
[1] C. Lu and B.-X. Shi and L. Chen, “An On-Chip BP Learning Neural Network with Ideal
Neuron Characteristics and Learning Rate Adaptation,” Analog Integrated Circuits and
Signal Processing, Vol. 31, pp. 55–62, 2002.
[2] V. F. Koosh and R. M. Goodman, “Analog VLSI neural network with digital perturbative learning,” IEEE Transaction on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 49, pp. 359–368, 2002.
[3] L. Gatet and H. Tap-Beteille and M. Lescure, “Real-Time Surface Discrimination Using an Analog Neural Network Implemented in a Phase-Shift Laser Rangefinder,” IEEE
Journal on Sensors, Vol. 7, pp. 1381–1387, 2007.
[4] G. Zatorre-Navarro and N. Medrano-Marques and S. Celma-Pueyo, “Analysis and Simulation of a Mixed-Mode Neuron Architecture for Sensor Conditioning”, IEEE Transactions on Neural Networks, Vol. 17, pp. 1332–1335, 2006.
[5] H. Djahanshahi and M. Ahmadi and G. A. Jullien and W. C. Miller, “Quantization noise
improvement in a hybrid distributed-neuron ANN architecture,” IEEE Transactions on
Circuits and Systems II: Analog and Digital Signal Processing, Vol. 48, pp. 842–846,
2001.
[6] M. Mirhassani and M. Ahmadi and G. Jullien, “Robust low-sensitivity Adaline neuron based on Continuous Valued Number System,” Analog Integrated Circuits and Signal Processing, Vol. 56, pp. 223–231, 2008.
[7] G. Khodabandehloo and M. Mirhassani, and M. Ahmadi, “Resistive-Type CVNS Distributed Neural Networks With Improved Noise-to-Signal Ratio,” IEEE Transactions
on Circuits and Systems II: Express Briefs, Vol. 57, pp. 793–797, 2009.
[8] B. Zamanlooy and M. Mirhassani, “Efficient VLSI implementation of neural networks
with hyperbolic tangent activation function,” IEEE Trans. VLSI Syst., vol. 22, no. 1, pp.
39–48, January 2014.


[9] S. Bettola and V. Piuri, “High performance fault-tolerant digital neural networks,”
IEEE Transaction on Computers, Vol.5, No.23, pp. 230–233, 1997.
[10] D. Zhang and M.I. Elmasry, “VLSI compressor design with applications to digital
neural networks,” IEEE Transaction on Very Large Scale Integration (VLSI) Systems,
Vol.47, No. 3, pp. 1085–1091, 2006.
[11] K. Basterretxea and J.M. Tarela and I. del Campo, “Approximation of sigmoid function and the derivative for hardware implementation of artificial neurons,” IEE Proceedings on Circuits, Devices and Systems, vol.151, Issue 1, pp.18–24, 2004.
[12] Z. Gafsi, N. Hassen, M. Mhiri and K. Besbes, “A New Efficient Silicon Area MDAC
Synapse,” American Journal of Applied Sciences, vol.4(6), pp.378–385, 2007.
[13] G. Khodabandehloo, M. Mirhassani and M. Ahmadi, “16-level CVNS memory with
fast ADC,” IEE Electronics Letters, vol.45, No. 16, 2009.
[14] Y. Su, N. Ning, and Q. Yu, “A novel 2.5bit SHA-less MDAC design for 10bit 100Ms pipeline ADC,” ICCP 2011 Proceedings, 2011.


Chapter 4
Hardware Realization Of Mixed-Signal
Neural Networks

4.1 Introduction

Artificial neural networks (ANN) are popular adaptive trainable systems that are employed in a wide range of applications, from the prediction of nonlinear time series [1] and financial data forecasting [2] to pattern recognition [3]. However, VLSI implementation of these systems is challenging because of the complications of implementing a large, fully parallel system, especially when low complexity is required. Flexibility, area, power efficiency, and reliability are some of the most significant challenges to overcome in the hardware implementation of such systems. Moreover, as the size and complexity of the network grow, its training becomes more difficult and takes longer to complete because of the increased number of parameters.
In terms of hardware realization, a mixed-signal implementation approach was chosen to address the above-mentioned issues [3, 4]. This approach uses the advantages of analog circuits, such as small, well-designed neurons and a current-mode structure that simplifies calculations [7, 6, 5]. At the same time, the mixed-signal structure avoids the drawbacks typically associated with analog structures, such as lower accuracy compared with digital implementations and the large capacitors that analog circuits require to store analog weights. This storage adds design complexity and limits the size of the implementable neural network.
In the proposed structure, neurons are divided into sub-neurons, which is effective in reducing the effect of quantization noise in the circuit [8]. Moreover, the neurons used in this work are effective in increasing the generalization capacity of the network. Several methods proposed in the literature attempt to improve network performance and generalization capability. These include reduction of weight parameters through weight sharing [9, 10] and multi-resolution learning [11]. However, these methods are tailored for software simulations of neural networks and are difficult to implement in hardware.
It should be noted that sigmoid neurons become ineffective when the input values to a neuron increase, which forces the neuron to act more like a threshold neuron than a non-linear sigmoid neuron. The neurons used in this chapter, however, are able to self-scale their non-linear gain and do not need to be redesigned. They are also simple and suitable for hardware realization of neural networks.
In this chapter, a modular synapse-neuron building block is introduced based on a mixed-signal synapse and a distributed neuron. Owing to the current-mode operation, addition, subtraction, and division are performed in the most area-efficient way. The synaptic multiplier uses AND gates and weighted current mirrors instead of the weighted summation of currents used in conventional DACs. The synaptic weights are stored in digital registers; consequently, storage capacitors are avoided. The design is laid out in the TSMC CMOS 0.18µm process, and simulation results for a 4-input pattern recognition task are provided to demonstrate the performance of the design.

Figure 4.1: The resistive-type neuron circuit modified to a robust current-mode structure. The synaptic multiplier’s output current is applied to the neuron as Iin via terminal T1. The output current is shaped by a self-adjustable sigmoid function. (Circuit schematic; labels: M11 to M17, Bias1, Iin, Iout, T1.)

4.2 Self-adjustable distributed Neuron

In a distributed neural network, neurons with small areas are desirable, since there is a sub-neuron for each synaptic multiplier, forming a synapse-neuron module. An area-efficient resistive-type neuron was introduced in [13]; however, it was sensitive to mismatch and process variations. Here, the resistive-type neuron has been improved to generate an output current Iout from the input current Iin to produce the transfer function. Moreover, that neuron required current-to-voltage conversions, additions, and divisions, which are eliminated here. The modified circuit of the sub-neuron is presented in Fig. 4.1.
The gates of M14 and M15 should be set at Vdd/2 to keep the transfer function symmetric. The diode-connected transistors are used for biasing and are sized so that they provide the required biasing current. The voltage of Bias1 is 250mV to keep M18 ON even when the gate voltage is close to zero. Fig. 4.2 shows the post-layout simulation results of the transfer function of the improved neuron for three input current ranges: (−65µA, 65µA), (−100µA, 100µA), and (−200µA, 200µA).


Figure 4.2: The variation of the neuron’s activation functions for the input ranges of
(−60µA, 60µA) shown by the solid line,(−100µA, 100µA) shown by dotted line, and
(−200µA, 200µA) shown by dashed line.

Figure 4.3: The 1000 runs Monte Carlo simulation results of the current-mode neuron’s
activation function.


As shown in Fig. 4.2, the neuron is adjustable and changes its non-linear gain region depending on the input current. This means that the neuron’s transfer function tunes itself so that it enters the saturation region at larger input currents when the input range is larger. Because the values of the weights might increase during training, a neuron with a fixed transfer function behaves similarly to a threshold neuron for larger input values. This has been shown to create difficulties during the training phase and has caused some networks to rely more on linear-style neurons. The proposed neuron adjusts its transfer function to the input values without any changes to the circuit.
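The difference between a fixed transfer function and a range-adjusted one can be illustrated numerically. The sketch below is a purely illustrative software model and does not describe the transistor-level circuit: it uses `tanh` as a stand-in for the sigmoid-shaped characteristic, and the 20µA fixed scale, the factor of 3, the 0.99 saturation threshold, and all function names are assumptions chosen for the example.

```python
import math


def fixed_gain_neuron(i_in, scale_ua=20.0):
    """Sigmoid-like stand-in with a fixed gain (scale_ua is an assumed value)."""
    return math.tanh(i_in / scale_ua)


def range_scaled_neuron(i_in, input_range_ua):
    """Stand-in whose gain region stretches with the input range.

    The factor 3 is an arbitrary choice that places saturation near
    the edges of the range.
    """
    return math.tanh(3.0 * i_in / input_range_ua)


def saturated_fraction(neuron, input_range_ua, threshold=0.99, steps=201):
    """Fraction of a symmetric input sweep where |output| exceeds threshold."""
    count = 0
    for k in range(steps):
        i_in = input_range_ua * (2.0 * k / (steps - 1) - 1.0)
        if abs(neuron(i_in)) > threshold:
            count += 1
    return count / steps


for r in (65.0, 100.0, 200.0):
    fixed = saturated_fraction(fixed_gain_neuron, r)
    scaled = saturated_fraction(lambda i, r=r: range_scaled_neuron(i, r), r)
    print(f"range ±{r:.0f} µA: fixed {fixed:.2f}, range-scaled {scaled:.2f}")
```

In this toy model the fixed-gain function spends a growing share of the sweep in saturation as the input range widens, which is the threshold-like behavior described above, while the range-scaled function keeps the same share for every range.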
The robustness of the proposed structure is shown by Monte Carlo analysis. Fig. 4.3
illustrates the 1000 runs of post-layout Monte Carlo analysis results considering the process
variation and the mismatch of circuits parameters.
In the following section, the neuron is used in a full network in a modular format. The
network is only for a proof of concept, to test the circuit operation. However, the neuron and
the modular synapse-neuron module can be used for hardware implementation of various
network sizes.

4.3 Distributed neural network

In this section, the system level of a distributed neural network structure is introduced and the unified current-mode synapse-neuron circuits are presented.
Fig. 4.4 shows the 4-3-2 configuration of the distributed feed-forward neural network. The dashed box represents the modular block that contains a mixed-signal synaptic multiplier and a sigmoidal neuron. As shown in this figure, the synapse-neuron blocks and digital registers are the only parts needed to form a multi-layer current-mode neural network. The jth programmable synaptic weight corresponding to the ith neuron in the second layer is denoted by Wij and is multiplied by the current-form inputs I1 to I4. The inputs of the hidden layer, Ia to Ic, are multiplied by the synaptic weights corresponding to the third layer, denoted by Cij. The biases related to each multiplier are denoted by bij.
The circuits of the synapse-neuron block are discussed in the following subsections.

4.3.1 Synaptic Multiplier

A conventional synaptic multiplier, also called a multiplying digital-to-analog converter (MDAC), principally works based on weighted current mirrors, which leads to the use of large transistors, especially for the most significant bit [12].
In this chapter, a small-area, power-efficient multiplier is used as the synapse. The 5-bit multiplier is designed by adding a sign bit to the multiplier that the authors proposed in [12]. The synaptic multiplier separates the two least significant and the two most significant bits of the synaptic weight and performs the signed multiplication by combining simple digital gates (AND and NAND) with weighted currents and applying the sign bit b4 at the end of the structure. This method results in two identical modules, shown as dashed boxes 1 and 2 in Fig. 4.5, which diminishes the mismatch effects of the transistors. Moreover, the size of the largest transistors is reduced significantly.

Figure 4.4: The system level configuration of a 4-3-2 distributed neural network. Wij and Cij are the digital synaptic weights corresponding to the second and third layers respectively. I1 to I4 are the input currents representing the input patterns. (Block diagram; labels: inputs I1 to I4, hidden-layer currents Ia to Ic, weights W11 to W34 and C11 to C23, biases bij, outputs O1 and O2.)

Figure 4.5: The modular signed multiplying DAC that performs as the synapse [12]. T1 is connected to the terminal with the same name in Fig. 4.1 to build the synapse-neuron module. (Circuit schematic; labels: modules 1 and 2, transistors M0 to M10, switches SW10 to SW12 and SW20 to SW22 driven by weight bits b0 to b3, sign bit b4, Iin, Iout, T1.)
Fig. 4.5 shows the circuitry of the synaptic multiplier. Here, the input currents I1 to I4 shown in Fig. 4.4 are denoted by Iin, which is multiplied by the two least significant bits of the synaptic weight in module 1 and by the two most significant bits in module 2. The sum of the output current of module 1 and four times the output current of module 2 is the product of the input current and the four magnitude bits of the synaptic weight. The direction of Iout is determined by the sign bit b4: it is positive for a 0 and negative for a 1. The maximum power consumption and the dimensions of a single multiplier are 328µW and 103.25µm × 36.2µm, respectively. The dimensions of the transistors used in this neural network are listed in Table 4.1.

Figure 4.6: The corner analysis simulation results of tt (Typical NMOS Typical PMOS), ff (Fast NMOS Fast PMOS), fs (Fast NMOS Slow PMOS), sf (Slow NMOS Fast PMOS), and ss (Slow NMOS Slow PMOS) that show the process variation effect on the multiplication performance for an input current of 1µA multiplied by a digital weight that varies from -15 (11111) to +15 (01111). (Axes: synaptic weights; multiplication results (µA).)

Table 4.1: Sizes of the transistors of the multiplier.

| Transistors | Size (W/L) |
|---|---|
| M0, M1, M2, M3 | 4.5/1.4 |
| M4, M5, M6 | 1/1 |
| M7 | 4/1 |
| M8, M9 | 2.5/1 |
| M10, M11 | 6/1 |

Fig. 4.6 presents the corner analysis results for an input of 1µA multiplied by the 5-bit weight swept from -15 (11111) to +15 (01111). The corner analysis shows the effect of process variation on the multiplier’s performance. The maximum deviation from the ideal multiplication result occurs for the ff (Fast NMOS Fast PMOS) corner and is equal to 0.7µA, which is less than the input current; the accuracy is therefore correctly taken to be 1µA. Via terminal T1, the output current of this multiplier passes to the neuron presented in Section 4.2.
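The signed multiplication can be summarized with a small behavioral model. This sketch is an idealized illustration, not the circuit: the function name `signed_synapse_current` is hypothetical, the weight is given as a 5-character string in the order b4 b3 b2 b1 b0, and currents are ideal floating-point values in µA.

```python
def signed_synapse_current(i_in_ua, weight_bits):
    """Ideal output current of the 5-bit sign-magnitude synapse.

    weight_bits: string 'b4b3b2b1b0', where b4 is the sign bit
    (0 = positive output current, 1 = negative output current).
    """
    if len(weight_bits) != 5 or set(weight_bits) - {"0", "1"}:
        raise ValueError("weight_bits must be five characters of 0/1")
    b4, b3, b2, b1, b0 = (int(c) for c in weight_bits)
    module1 = i_in_ua * (b0 + 2 * b1)   # two least significant bits
    module2 = i_in_ua * (b2 + 2 * b3)   # two most significant magnitude bits
    magnitude = module1 + 4 * module2   # module 2 is weighted by four
    return -magnitude if b4 else magnitude


# Sweep endpoints used in the corner analysis, for a 1 µA input.
print(signed_synapse_current(1.0, "11111"))  # -15.0
print(signed_synapse_current(1.0, "01111"))  # 15.0
```

The ideal values produced by such a model are what the corner-analysis curves are compared against; the reported worst-case deviation of 0.7µA is measured relative to them.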

4.4 Pattern recognition

In this section, the performance of the 4-3-2 feed-forward neural network shown in Fig. 4.4 is discussed. The network is built with the synapse-neuron block presented in the previous section to show the feasibility of the design. This network is used to classify the pattern templates shown in Fig. 4.8.
Here, the network is trained offline. However, the design can be modified for chip-in-the-loop training. The 5-bit weights and biases that were calculated offline in MATLAB are as follows:


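$$W_{ij} = \begin{bmatrix} 10011 & 11000 & 01011 & 10111 \\ 11010 & 00110 & 00001 & 11110 \\ 00110 & 01111 & 00010 & 11001 \end{bmatrix}$$

$$C_{ij} = \begin{bmatrix} 11001 & 01101 & 00001 \\ 01010 & 01010 & 11001 \end{bmatrix}$$

$$b_{i1} = \begin{bmatrix} 00000 \\ 11000 \\ 00100 \end{bmatrix} \qquad b_{2j} = \begin{bmatrix} 00000 \\ 10010 \end{bmatrix}$$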
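For readers who prefer signed integers, the 5-bit weight codes can be decoded with a few lines of Python. This is a convenience sketch that assumes the sign-magnitude convention described for the multiplier (leading bit b4 is the sign, so 11111 is -15 and 01111 is +15); the helper name `decode_weight` is hypothetical, and only the Wij and Cij matrices are decoded here.

```python
def decode_weight(bits):
    """Decode a 5-bit sign-magnitude code 'b4b3b2b1b0' into a signed integer."""
    if len(bits) != 5 or set(bits) - {"0", "1"}:
        raise ValueError("expected five characters of 0/1")
    magnitude = int(bits[1:], 2)          # b3..b0 as an unsigned value
    return -magnitude if bits[0] == "1" else magnitude


W_CODES = [["10011", "11000", "01011", "10111"],
           ["11010", "00110", "00001", "11110"],
           ["00110", "01111", "00010", "11001"]]

C_CODES = [["11001", "01101", "00001"],
           ["01010", "01010", "11001"]]

W = [[decode_weight(code) for code in row] for row in W_CODES]
C = [[decode_weight(code) for code in row] for row in C_CODES]

print(W)  # [[-3, -8, 11, -7], [-10, 6, 1, -14], [6, 15, 2, -9]]
print(C)  # [[-9, 13, 1], [10, 10, -9]]
```

Each decoded value is the factor by which the corresponding synapse scales its input current, with the sign setting the direction of the output current.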
The output currents of the neural network are compared with a reference current at the terminals O1 and O2, and each comparator outputs a voltage of 0 (1) when the current is lower (higher) than the reference. The current comparator structure used in the proposed neural network is shown in Fig. 4.7.
Fig. 4.9 shows the input currents applied to the network and the output voltages Out1 and Out2. As seen in this figure, the outputs are as expected for every valid combination of the inputs.
The layout of the neural network is presented in Fig. 4.10. The dimensions and the average power consumption are 318.950µm × 446.150µm and 0.93mW, respectively. The corner analysis results support the notion that the network is not affected by process variation, owing to the tunable reference currents.
A comparison of three 4-3-2 neural networks is shown in Table 4.2. The table compares the areas, power consumptions, and numbers of tested templates.
Fig. 4.11 shows the sensitivity and robustness of the network performance with respect to the most critical paths in the design. The output currents are shown for a 5% variation in the input current (dotted lines), 5% changes in the width and length of transistor M6 in the multipliers of the input layer (dashed lines), and 5% variations in the width and length of transistor M18 of the second layer (solid lines). As shown in the figure, the outputs Out1 and Out2 remain the same while these variations are applied.

Table 4.2: The comparison table of the proposed 4-3-2 distributed NN and other similar structures.

| Parameter | Proposed |
|---|---|
| Technology (µm) | 0.18 |
| Chip area (µm²) | 142299.5 |
| Average power (mW) | 0.93 |
| Power per synapse (mW) | 0.33 |
| Number of tested templates | 6 |

Values listed for the compared designs: [13]: 0.18, 385320, 5, 6; [8]: 1.2, 3.65, 4.

Figure 4.7: The structure of the current comparators that are connected to the O1 and O2 terminals of Fig. 4.4. (Circuit schematic; labels: Iref, Iout1, O1, Out1.)

4.5 Conclusion

A mixed-signal distributed neural network designed for a pattern recognition application is implemented in TSMC CMOS 0.18µm. The network, which uses a power-efficient synaptic multiplier, consumes low power and occupies a small area. The 5-bit digital synaptic weights are applied to the synapse, where they are multiplied by the analog input currents. Following the multiplication, the resulting current passes through the neuron, which applies a sigmoid-shaped transfer function to deliver the output currents. The area of the network is 142299.5µm². The average power consumption is 0.93mW.

Figure 4.8: The input templates that are used to test the functionality of the neuron.

| Input equivalent bits | Output bits |
|---|---|
| 0101 | 10 |
| 0011 | 01 |
| 1010 | 01 |
| 1100 | 10 |
| 1001 | 00 |
| 0110 | 11 |

Figure 4.9: Simulation results of the 4-3-2 distributed neural network to prove the pattern recognition capability.


Figure 4.10: The 4-3-2 mixed-signal network layout.

Figure 4.11: Simulation results of the 4-3-2 distributed neural network to show the sensitivity regarding three critical paths.


References
[1] Liang Y and Liang X, “Improving signal prediction performance of neural networks
through multiresolution learning approach,” IEEE Trans. Syst., Man, Cybern. B, vol.
36, no. 2, pp. 341–352, Apr. 2006.
[2] Reid D, Hussain A, and Tawfik H, “spiking neural network for financial data prediction,” Proc. IJCNN, pp. 1–10, Aug. 2013.
[3] Calitoiu D, Oommen B, and Nussbaum D, “Desynchronizing a chaotic pattern recognition neural network to model inaccurate perception,” IEEE Trans. Syst., Man, Cybern.
B, vol. 37, no. 3, pp. 692–704, Jun. 2007.
[4] Luo C, Ying Z, Zhu X, Chen L, “A Mixed-Signal Spiking Neuromorphic Architecture for Scalable Neural Network,” International Conference on Intelligent HumanMachine Systems and Cybernetics (IHMSC),vol. 1, pp. 179–182, Aug. 2017.
[5] Gatet L, Tap-Beteille H, Lescure M, “Real-Time Surface Discrimination Using an Analog Neural Network Implemented in a Phase-Shift Laser Rangefinder,” IEEE Journal on Sensors, Vol. 7, pp. 1381–1387, 2007.
[6] Koosh V, Goodman R, “Analog VLSI neural network with digital perturbative learning,” IEEE Transaction on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 49, pp. 359–368, 2002.
[7] Lu C, Shi B, Chen L, “An On-Chip BP Learning Neural Network with Ideal Neuron
Characteristics and Learning Rate Adaptation,” Analog Integrated Circuits and Signal
Processing, Vol. 31, pp. 55–62, 2002.
[8] Djahanshahi H, Ahmadi M, Jullien G, Miller W,“Quantization Noise Improvement in
a Distributed Neuron Architecture,” Proc. of 40th Midwest Symposium on Circuits and
Systems, Vol. 2, pp. 1282–1285, Aug. 1997.
[9] K. J. Lang, A. H. Waibel, “A time-Delay Neural Network Architecture for Isolated
Word recognition,” Neural Network Journal, Vol. 3, pp. 23–43, 1990.


[10] E. A. Wan, “Time Series Prediction by Using a Connection Network with Internal Delay Lines,” Time Series Prediction: Forecasting the Future and Understanding the Past, pp. 195–218, 1190.
[11] Y. Liang, X. Liang, “Improving Signal Prediction Performance of Neural Networks Through Multiresolution Learning Approach,” IEEE Transaction on Systems, Man, and Cybernetics, Vol. 36, No. 2, pp. 341–352, 2006.
[12] Bahar Youssefi, Mitra Mirhassani, Jonathan Wu,“Efficient Mixed-Signal Synapse
Multipliers for Multi-Layer Feed-Forward Neural Networks,”2016 IEEE 59th International Midwest Symposium on Circuits and Systems, 822–825, 2016.
[13] Khodabandehloo, Golnar, Mitra Mirhassani, and Majid Ahmadi, “A prototype CVNS
distributed neural network using synapse-neuron modules,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol.59, no. 7, pp. 1482-1490, 2012.

Chapter 5
Dynamic Behavior Of A Single Sigmoidal
Neuron: Stable To Period Doubling

5.1 Introduction

Neural networks are increasingly popular in signal processing and computation because their ability to learn suits applications such as nonlinear signal prediction, time-series approximation, pattern recognition, and medical applications [1, 2, 3].
Complex neural network systems consist of anywhere from a few to a large number of neurons, whose dynamic behavior determines the performance of the system. Consequently, a deep understanding of the dynamic behavior of a single neuron plays a key role in understanding the nature of neural networks and in developing approaches to different problems.
Neuron activation functions fall into two main categories, artificial and spiking, both of which are popular and are advancing in parallel to expand the applications of neural networks. Artificial neuron activation functions such as the hyperbolic tangent, sigmoid, and poslin usually vary from -1 to 1 [4], and various implementation methods have been proposed so far [5, 6, 7]. On the other hand, spiking neurons are closer to the biological neuron model because of their oscillatory behavior, which also makes them more difficult to realize. The oscillatory behavior of spiking neurons can be described by a dynamic system based on a set of two coupled differential equations [8, 9]. Neural oscillation has also been discussed through the computational model proposed by Freeman, which shows the oscillatory behavior of interconnected groups of neurons [10, 11, 12]. The oscillatory behavior of coupled artificial neurons has also been discussed [13].
In this chapter, we investigate the possible oscillatory behavior of a single sigmoid neuron, which is much easier to realize than a spiking neuron or Freeman’s model. The oscillatory behavior of such a simple structure may open a way towards realizing neural oscillation without implementing the coupled differential equations.
We analyze the dynamic behavior of a single neuron using bifurcation and stability maps to investigate the oscillatory behavior of the sigmoidal neuron.

5.2 Background and Theory

In this section, the theory of the dynamic behavior of a single artificial neuron in a feedback configuration is discussed. The sigmoidal neuron with a synaptic weight of β is shown in Fig. 5.1. The output value y is updated in the discrete time domain. At the instant n, part of the output signal is sampled and added to the input x0 through the synaptic weight β. The sum x0 + βy[n] passes through the sigmoidal activation function to generate the output at the instant n + 1. The local map of the sigmoid activation function is:

\[ f(x) = \frac{1}{1 + e^{-\mu x}} \tag{5.1} \]

in which the neuron gain, µ, is a positive number that determines the maximal slope of the sigmoid function. It should be noted that µ is a variable and affects the dynamic behavior of the system significantly; this effect is discussed in detail later. Based on the signal flow between the input and the output, the dynamic behavior of the system is described as follows:

\[ y[n+1] = \frac{1}{1 + e^{-\mu (x_0 + \beta y[n])}} \tag{5.2} \]

in which n, n + 1, n + 2, . . . are discrete time instants that represent the propagation delay ∆T experienced by the system, such that consecutive instants are separated by ∆T.
For a given constant value of the input x0, equation (5.2) describes the transient behavior of the output. From the perspective of dynamic behavior, the system is a one-dimensional map f(y[n]) if µ, β, and x0 are constant. Therefore, the dynamics of the system are explained from the iteration-map point of view on the value range of the neuron, I = [0, 1] [13].
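The iteration in equation (5.2) can be sketched numerically as follows. This is a minimal illustration rather than the simulation used in this chapter: the function names `sigmoid` and `iterate_neuron`, the initial value y[0] = 0.5, and the number of steps are assumptions chosen for the example.

```python
import math

def sigmoid(x, mu):
    """Sigmoid activation of equation (5.1) with gain mu."""
    return 1.0 / (1.0 + math.exp(-mu * x))

def iterate_neuron(x0, beta, mu, y_init=0.5, steps=200):
    """Iterate y[n+1] = f(x0 + beta*y[n]) and return the whole trajectory."""
    trajectory = [y_init]
    for _ in range(steps):
        trajectory.append(sigmoid(x0 + beta * trajectory[-1], mu))
    return trajectory

# Hypothetical parameter set: constant input with a weak feedback weight.
traj = iterate_neuron(x0=0.5, beta=0.15, mu=0.3)
print(traj[-3:])  # the last samples coincide once the transient has died out
```

Each list element corresponds to one propagation delay ∆T, so the index of the list plays the role of the discrete time n.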
Due to the nonlinear activation function of the system, we expect to observe a broad spectrum of nonlinear behavior, which is described by the iteration map.
There are two main methods to study the dynamic behavior of nonlinear systems governed by an iteration map: the linearization method and Lyapunov stability analysis [14]. In this chapter, we use the linearization method for the local stability assessment. The linearization method is much less complex than Lyapunov analysis. Furthermore, the mathematical tools developed for studying the dynamic behavior of linear time-invariant (LTI) systems can be employed in this approach.
The essential requirement for using the linearization method is that the system has a stationary point for any arbitrary value of the parameters. Based on the Brouwer fixed-point theorem [15], if the local map f is continuous for any values of x0, β, and µ while the state space I is a closed interval of R, then the system always has a stationary solution. The existence of the stationary solution allows us to use the linearization method to investigate the stability of the system. The stationary solution of the single neuron configuration is displayed in Fig. 5.2 for arbitrary values of x0, β, and µ.

Figure 5.1: A single sigmoidal neuron in a feedback configuration.
After linearization, the next step is to assess the stability of the LTI system by studying the locus of its eigenvalues in the Z-plane relative to the region of convergence (ROC).
The linearization of this system, which is performed by first-order perturbation stability analysis, and the locus of the eigenvalues of the sigmoidal system are discussed in detail in the following section.

5.3 Stability Analysis of the Single Neuron Structure

First-order perturbation stability analysis is a well-known method to study the dynamic behavior of nonlinear systems [16] by linearizing the iterative map about a fixed point. In this approach, the complex nonlinear system is approximated by using an exact solution y0 of a related but easier system and applying a small perturbation term ε to the fixed point.
That is, the solution to the complex system is approximated by the combination of the exact solution at a fixed point and the small perturbation term. Depending on the conditions and the parameter values of the system, the fixed point can be asymptotically stable, stable, or unstable.

The fixed point of equation (5.2) is obtained for the state variable y when it is time-independent, such that:

\[ y[n+1] = y[n] = y_0 \tag{5.3} \]

Figure 5.2: The stationary solution for the arbitrary values of µ = 2, β = 3, and x0 = 0.55. (Axes: G versus y.)
Figure 5.3: The bifurcation map of the structure shown in Fig. 5.2 when x0 = 0.55 and y0 = 0.001728. (Vertical axis: y; the period-doubling region is marked.)
Figure 5.4: Stationary solution of y0 for various β and µ values.
Figure 5.5: Stability map obtained from the system eigenvalues, which shows the two possible behaviours of the system, stable and period doubling.

In this chapter, equation (5.3) is solved numerically by graphing to find the stationary point, due to the analytical complexity of the system. In this approach, equation (5.3) is substituted into equation (5.2) to define the objective function G as follows:

\[ G(y_0) = y_0 \left(1 + e^{-\mu (x_0 + \beta y_0)}\right) - 1 \tag{5.4} \]

The stationary solution y0 is the point where the magnitude of the objective function reaches its global minimum, |G(y0)| ≈ 0, as shown in Fig. 5.2.
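As an alternative to reading the stationary point off a graph, the root of the objective function can be located by bisection. The sketch below is only an illustration under stated assumptions: the function names `G` and `stationary_point` and the tolerance are hypothetical, and the method relies on G(0) = −1 < 0 and G(1) = e^{−µ(x0+β)} > 0, so that a root is always bracketed on [0, 1]. When several stationary points exist, bisection returns only one of them.

```python
import math

def G(y, x0, beta, mu):
    """Objective function of equation (5.4); it vanishes at a stationary point."""
    return y * (1.0 + math.exp(-mu * (x0 + beta * y))) - 1.0

def stationary_point(x0, beta, mu, tol=1e-12):
    """Bisection on [0, 1], where G(0) < 0 and G(1) > 0 bracket a root."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid, x0, beta, mu) < 0.0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return 0.5 * (lo + hi)

# Parameter values quoted in the caption of Fig. 5.2.
y0 = stationary_point(x0=0.55, beta=3.0, mu=2.0)
print(y0, G(y0, 0.55, 3.0, 2.0))
```

The returned value can be checked by substituting it back into equation (5.2), which should reproduce the same number.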
To study the dynamic behavior, we apply the first-order perturbation term ε to the stationary solution y0 as follows:

\[ y[n] = y_0 + \varepsilon[n] \tag{5.5} \]

By substituting equation (5.5) into the dynamic description of the system given by equation (5.2) for two consecutive time instants, equations (5.6) and (5.7) are obtained:

\[ y_0 + \varepsilon[n+1] = \frac{1}{1 + e^{-\mu (x_0 + \beta (y_0 + \varepsilon[n]))}} \tag{5.6} \]

\[ y_0 + \varepsilon[n] = \frac{1}{1 + e^{-\mu (x_0 + \beta (y_0 + \varepsilon[n-1]))}} \tag{5.7} \]

If we note that ε is much smaller than y0, then the following equation is obtained by subtracting (5.7) from (5.6):

\[ \varepsilon[n+1] - \varepsilon[n] = \frac{e^{-\mu (x_0 + \beta y_0)} \left(e^{\mu\beta\varepsilon[n]} - e^{\mu\beta\varepsilon[n-1]}\right)}{\left(1 + e^{-\mu (x_0 + \beta y_0)}\right)^2} \tag{5.8} \]

in which the higher-order perturbation terms are ignored. The equation is approximated by using the first-order Maclaurin expansions of e^{µβε[n]} and e^{µβε[n−1]} as follows:

\[ \varepsilon[n+1] - \varepsilon[n] = \frac{e^{-\mu (x_0 + \beta y_0)} \left(1 + \mu\beta\varepsilon[n] - 1 - \mu\beta\varepsilon[n-1]\right)}{\left(1 + e^{-\mu (x_0 + \beta y_0)}\right)^2} \tag{5.9} \]

which is summarized as:

\[ \varepsilon[n+1] - \varepsilon[n] = \frac{e^{-\mu (x_0 + \beta y_0)} (\mu\beta) \left(\varepsilon[n] - \varepsilon[n-1]\right)}{\left(1 + e^{-\mu (x_0 + \beta y_0)}\right)^2} \tag{5.10} \]

With the substitutions ε[n + 1] − ε[n] = δ[n + 1] and ε[n] − ε[n − 1] = δ[n], and keeping the first-order perturbation terms, the following equation is obtained:

\[ \delta[n+1] = \frac{\mu\beta\, e^{-\mu (x_0 + \beta y_0)}}{\left(1 + e^{-\mu (x_0 + \beta y_0)}\right)^2}\, \delta[n] \tag{5.11} \]

The perturbative elements at the time instants n + 1, n, n − 1, n − 2, . . . are related as

\[ \delta[n+1] = Z\,\delta[n] = Z^2\,\delta[n-1] = Z^3\,\delta[n-2] \tag{5.12} \]

in which Z represents the eigenvalue of the system in Z-space. The location of Z relative to the ROC determines the different dynamic behaviors of the neural network. For a system with several eigenvalues, the system is stable if and only if all the eigenvalues are within the convergence region. The system becomes unstable even if only one of the eigenvalues is outside the ROC. The unstable behavior of the system for different eigenvalues can be summarized as follows:




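\[
\begin{cases}
\mathrm{Re}\{Z\} \ge 1,\ \mathrm{Im}\{Z\} = 0, & \text{Bistable} \\
\mathrm{Re}\{Z\} \le -1,\ \mathrm{Im}\{Z\} = 0, & \text{Period doubling} \\
|Z| \ge 1,\ \mathrm{Im}\{Z\} \ne 0, & \text{Self-pulsation}
\end{cases}
\tag{5.13}
\]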
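The classification above can be sketched in a few lines of code. The sketch rests on several assumptions that are not part of the original analysis: the eigenvalue is taken as the slope of the map (5.2) at the stationary point, which can be written as Z = µβ y0 (1 − y0) because e^{−u}/(1 + e^{−u})² = y0 (1 − y0) at the fixed point; the self-pulsation branch is taken to be a complex eigenvalue on or outside the unit circle; and the names `eigenvalue` and `classify` are hypothetical. Under this sign convention Z ≤ −1 requires µβ < 0, so a negative feedback weight is used in the period-doubling example.

```python
def eigenvalue(y0, beta, mu):
    """Slope of the map (5.2) at the stationary point y0 (assumed convention)."""
    return mu * beta * y0 * (1.0 - y0)

def classify(z, tol=1e-9):
    """Map an eigenvalue to one of the regimes discussed around (5.13)."""
    z = complex(z)
    if abs(z) < 1.0:
        return "stable"              # inside the unit circle
    if abs(z.imag) > tol:
        return "self-pulsation"      # complex eigenvalue outside the unit circle
    return "bistable" if z.real > 0.0 else "period doubling"

# Approximate stationary points for x0 = 0.5 (hypothetical example values).
print(classify(eigenvalue(0.2329, beta=-3.0, mu=6.0)))   # eigenvalue near -3.2
print(classify(eigenvalue(0.5435, beta=0.15, mu=0.3)))   # eigenvalue near 0.01
```

For a single neuron the eigenvalue is real, so the complex branch is included only to mirror the three cases listed in the text.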

The system is bistable when there are two stable equilibrium states, and the system can relax into either of them [17] depending on the values of x0, µ, and β. The region of period doubling is explained from the bifurcation-map point of view. Bifurcation occurs when a change in a system parameter causes a qualitative change in the output. The bifurcation map for the parameters of Fig. 5.2 is shown in Fig. 5.3.
In the period-doubling regime, two possible outputs occur for a single value of the parameter µ. Both states are equally likely to be excited. The transition from one state to the other does not show hysteresis, in contrast to what occurs in bistability. Consequently, the output oscillates between these two states with a period of twice the propagation delay of the original system, ∆T.
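A bifurcation map of this kind can be approximated by sweeping µ, discarding the transient, and recording the distinct values that the output keeps visiting. The following is a minimal sketch and not the procedure used to produce Fig. 5.3: the names `step` and `attractor`, the transient length, and the rounding precision are assumptions. A negative feedback weight (β = −3) is also assumed, since with the map written as in equation (5.2) an increasing map cannot alternate between two values.

```python
import math

def step(y, x0, beta, mu):
    """One iteration of equation (5.2)."""
    return 1.0 / (1.0 + math.exp(-mu * (x0 + beta * y)))

def attractor(x0, beta, mu, transient=2000, keep=8, digits=6):
    """Distinct steady-state output values after the transient is discarded."""
    y = 0.5
    for _ in range(transient):
        y = step(y, x0, beta, mu)
    values = set()
    for _ in range(keep):
        y = step(y, x0, beta, mu)
        values.add(round(y, digits))
    return sorted(values)

# One value per gain indicates a stable output; two values indicate period doubling.
for mu in (1.0, 2.0, 4.0, 8.0):
    print(mu, attractor(x0=0.55, beta=-3.0, mu=mu))
```

Plotting the returned values against µ gives one branch in the stable region and two branches after the bifurcation point.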
To understand how period doubling works, the Euler representation of the eigenvalue is employed as Z = re^{jω∆T}, where r is the magnitude of the eigenvalue and ω is the angular frequency of oscillation. To have period-doubling (also known as Ikeda) instability, re^{jω∆T} should be less than −1 [18], leading to e^{jω∆T} = e^{jπ}. The phase shift can be rewritten as 2πf_osc∆T = π; therefore, the oscillation period is T_osc = 2∆T. This type of oscillation was observed by Ikeda et al. in the nonlinear optical ring resonator [18].
Self-pulsation represents another form of oscillatory behavior of the system, where the oscillation occurs for a constant DC input. It should be noted that the system does not need a forced oscillation from an input that has harmonic elements. The frequency of oscillation depends strongly on system parameters such as µ and β, which causes phase noise and jitter if it is used in local-oscillator-based systems.

Figure 5.6: Time-domain behavior of the system for two different arbitrary sets of µ and β (y versus discrete time). (a) shows the oscillatory behavior for µ = 6 and β = 3. (b) shows the stable behavior for µ = 0.3 and β = 0.15.
Among these three unstable behaviors, period doubling is promising because of its possible application as a local oscillator, owing to its jitter-free oscillation. In the period-doubling regime, the frequency depends only on the propagation delay of the system, while in self-pulsation it depends on all the parameters of the system: µ, β, and x0.
The bistable regime has possible applications in flip-flops and latches, or in any other devices, such as memories, that need to store binary data.

5.4 Simulation Results

In this section, we present simulation results to show the dynamic behavior of the sigmoidal neuron structure.
According to the discussion in Section 5.3, the first step in analyzing the dynamic behavior is to find the stationary solution. For an arbitrary value of x0, say 0.5, the stationary solution is a function of µ and β, as shown in Fig. 5.4.
Since y0 is a function of the design parameters µ and β, the corresponding eigenvalue location can be determined from equation (5.11). From the locations of the eigenvalues, which represent the dynamic behavior of the system, the stability phase map can be drawn as a function of the design parameters, as shown in Fig. 5.5. According to the stability phase map, the system can be either stable or in the period-doubling region.
The time-domain behavior of the system is shown in Fig. 5.6. As shown in Fig. 5.6(b), the system rests at the steady state after a short transient. In the period-doubling region, the system oscillates between two constant values, both of which depend on y0.
The main advantage of the oscillatory behavior shown in Fig. 5.6(a) is that the frequency of oscillation depends only on the propagation delay of the system and can be considered constant unless a delay is intentionally introduced into the system for frequency tuning purposes.
The analysis also determines the region in which the neuron shows oscillatory behavior. Therefore, the parameters can be chosen to increase the tolerance of the system to process variations. The frequency of oscillation does not depend on µ, β, or x0 as long as the system is kept in the period-doubling region, which minimizes the effect of process and fabrication variations on the oscillation frequency.

5.5 Conclusion

In this chapter, the nonlinear behavior of a single sigmoidal neuron with a feedback synaptic weight is discussed. The analysis, as well as the bifurcation and phase stability maps, shows that there are only two possible behaviors for the system: stable and period doubling. The system oscillates with a period of twice the propagation delay in the period-doubling region. The oscillation frequency does not depend on any system parameter other than the propagation delay, which suggests promising applications in VLSI implementations of oscillatory systems by reducing the dependency on fabrication variations. The proposed structure is the simplest neural oscillation structure proposed so far.

References
[1] Y. Liang and X. Liang, “Improving signal prediction performance of neural networks
through multiresolution learning approach,” IEEE Trans. Syst., Man, Cybern. B, vol.
36, no. 2, pp. 341–352, Apr. 2006.
[2] L. Ngo and J. H. Han, “Multi-level deep neural network for efficient segmentation of
blood vessels in fundus images,” Electronics Letters, vol. 53, no. 16, pp. 1096–1098,
Jun. 2017.
[3] P. Shi, F. Li, L. Wu, and C. C. Lim, “Neural network-based passive filtering for delayed
neutral-type semi-Markovian jump systems,” IEEE Transactions on Neural Networks
and Learning Systems, vol. 28, no. 9, pp. 2101–2114, Sep. 2017.
[4] H. B. Demuth, M. H. Beale, O. De Jess, M. T. Hagan, “Neuron Model and Network Architectures,” Neural Network Design, 2nd edition, Boston, PWS Publishing Co. 1996.
[5] C. H. Tsai, Y. T. Chih, W. H. Wong, and C. Y. Lee, “A Hardware-Efficient Sigmoid Function With Adjustable Precision for a Neural Network System,” IEEE Trans. Circuits
Syst. II, vol. 62, no. 11, pp. 1073–1077, Nov. 2015.
[6] Q. Liu and J. Wang, “Finite-Time Convergent Recurrent Neural Network with a Hard-Limiting Activation Function for Constrained Optimization with Piecewise-Linear Objective Functions,” IEEE Trans. Neural Netw., vol. 22, no. 4, pp. 601–613, Mar. 2001.
[7] T. Qiu, X. Wen, and F. Zhao, “Adaptive-Linear-Neuron-Based Dead-Time Effects
Compensation Scheme for PMSM Drives,” IEEE Trans. Power Electron., vol. 31, no.3,
pp. 2530–2538, Mar. 2016.
[8] X. Wu, V. Saxena, K. Zhu, and S. A. Balagopal, “A CMOS spiking neuron for brain-inspired
neural networks with resistive synapses and in situ learning,” IEEE Transactions on
Circuits and Systems II: Express Briefs, vol. 62, no. 11, pp. 1088–1092, Nov. 2015.
[9] G. Indiveri and S. Fusi, “Spike-based learning in VLSI networks of integrate-and-fire
neurons,” in IEEE Int. Symp. Circuits Systems (ISCAS 2007), pp. 3371–3374, May 2007.
[10] W. Freeman, Mass Action in the Nervous System. New York: Academic,
1975.
[11] G. N. Borisyuk, R. M. Borisyuk, A. B. Kirillov, V. I. Kryukov, and W. Singer, “Modeling
of oscillatory activity of neuron assemblies of the visual cortex,” in Neural Networks,
IJCNN International Joint Conference, pp. 431–434, Jun. 1990.
[12] D. Xu and J. C. Principe, “Dynamical analysis of neural oscillators in an olfactory cortex
model,” IEEE Transactions on Neural Networks, vol. 15, no. 5, pp. 1053–1062, Sep.
2004.
[13] X. Wang, “Period-doublings to chaos in a simple neural network: An analytical
proof,” Complex Systems, vol. 5, no. 4, pp. 425–441, 1991.
[14] H. K. Khalil, Nonlinear Systems. New Jersey: Prentice-Hall, 1996.
[15] D. Gale, “The game of Hex and the Brouwer fixed-point theorem,” The American
Mathematical Monthly, vol. 86, no. 10, pp. 818–827, Dec.1979.
[16] P. Kokotovic, H. K. Khalil, and J. O’Reilly, Singular Perturbation Methods in Control:
Analysis and Design. Society for Industrial and Applied Mathematics, 1999.
[17] Y. Jia and J. R. Li, “Steady-state analysis of a bistable system with additive and
multiplicative noises,” Physical Review E, vol. 53, no. 6, p. 5786, Jun. 1996.
[18] K. Ikeda, “Multiple-valued stationary state and its instability of the transmitted light by a ring cavity system,” Optics Communications, vol. 30, no. 2, pp. 257–
261, Aug. 1979.

Chapter 6
Low-Power Mixed-Signal Implementation
of the DA-based FIR Filter

6.1 Introduction

For portable devices used for dynamic signal processing, stability, low power, and area efficiency are ever-present needs. Meeting these needs means reducing the system size and off-chip communication while increasing battery life, which are the main design concerns in portable devices. FIR filters are usually used at the early stage of dynamic signal processing applications due to their stability. Since the filter is one of the largest components in the system, it is vital that it be designed to be low-power and area-efficient.
In digital signal processing, the inner product, which is the essence of many processing functions including FIR filters, is typically realized with multiply-accumulate (MAC) operations. Although MAC units can be easily programmed, they negatively affect the throughput of the filter, especially for high-order filters. Lower throughput means that the computation needs a higher clock rate, which increases the power consumption. In fact, the computation time and the number of required MAC operations increase linearly with the length of the input vector and the filter order, respectively. Consequently, implementing a real-time, low-power filter becomes a challenging task as the order increases [1, 2, 3, 4].
Distributed Arithmetic (DA) [3, 4, 5, 6, 7] is an efficient alternative for decreasing the power consumption in real-time applications. In this method, multipliers are replaced by adders and shift registers, and multiplication is performed in a fixed number of clock cycles while the coefficients are stored on the chip. The fixed-cycle operation makes the structure computationally efficient, especially when the input length is large. In digital signal processing, despite its computational efficiency, the DA approach is not area-efficient compared to MAC because of the considerable number of memory units that it must use. This problem can be eased by a mixed-signal implementation of DA multiplication [7, 8] or by switched-current techniques [3, 9, 10, 11], with limitations on power and speed [8].
The mixed-signal approach proposed in this chapter addresses the large area of DA-based structures by focusing on the processing stage rather than on the analog storage. To demonstrate the efficiency of our approach, we implemented a 16-tap 8-bit adaptable current-mode FIR filter with low area and power consumption. An LPF and a BPF are realized at different sampling frequencies to demonstrate the efficiency of the proposed structure.
In this chapter, a new structure for the current-mode mixed-signal FIR filter is proposed and implemented based on DA. The proposed structure is low-power and area-efficient, taking advantage of both the current-mode and DA-based approaches.

Figure 6.1: The proposed DA architecture for a 16-tap 8-bit mixed-signal FIR filter. xij is the jth bit of the ith input Xi, ICi is the ith filter coefficient, and y[n] is the final output current. (a) The complete current-mode DA architecture. (b) The structure at the high state of the first N − 1 clock cycles. (c) The structure at the low state of the first N − 1 clock cycles. (d) The structure at the Nth clock cycle, when the operation is done.

6.2 Distributed Arithmetic

The DA concept [12] is used to calculate the inner product of two vectors in a bit-serial mode, in which the output Y[n] is generated by adding the delayed and weighted samples of the digital input X[n].
An inner product that represents an FIR filter is computed as follows [4, 7]:

\[ Y[n] = \sum_{i=0}^{M-1} IC_i \cdot X[n-i] \tag{6.1} \]

where ICi and M denote the filter coefficients and the number of taps, respectively.
To obtain the DA formulation, X is represented in 2’s-complement format. Assuming that X[n] is an N-bit word, it can be represented by separating its sign bit in the following form:

\[ X[n-i] = -x_{i0} + \sum_{j=1}^{N-1} x_{ij}\, 2^{-j} \tag{6.2} \]

in which xi0 is the most significant bit of the ith component of the X[n] vector, indicating the sign, and xij is the jth bit of the ith component.
By substituting X[n − i] from (6.2) into (6.1), Y[n] can be written as follows [3]:

\[ Y[n] = -\sum_{i=0}^{M-1} IC_i\, x_{i0} + \sum_{j=1}^{N-1} 2^{-j} \sum_{i=0}^{M-1} x_{ij}\, IC_i \tag{6.3} \]

N − 1 clock cycles of division and feedback are required to realize the second term of equation (6.3) using shift registers, multipliers, and adders [3, 4, 7].
At the Nth clock cycle, the last input is subtracted from the feedback to generate the first term of (6.3). At the (N + 1)th clock cycle, the system resets to set the feedback to zero and get ready for the next operation.
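The equivalence between the direct inner product (6.1) and the bit-plane form (6.3) can be checked with exact rational arithmetic. The sketch below is illustrative only: the function names, the integer coefficients, and the sample values are hypothetical, each N-bit input is assumed to be a two's-complement integer k interpreted as the fraction k / 2^(N−1), and the sums run over all M taps as in (6.1).

```python
from fractions import Fraction

def bits_msb_first(k, n_bits):
    """Two's-complement bits of integer k; index 0 is the sign bit (MSB)."""
    u = k % (1 << n_bits)
    return [(u >> (n_bits - 1 - j)) & 1 for j in range(n_bits)]

def fir_direct(coeffs, samples, n_bits):
    """Equation (6.1) with each sample interpreted as k / 2^(N-1)."""
    scale = 1 << (n_bits - 1)
    return sum(Fraction(c) * Fraction(k, scale) for c, k in zip(coeffs, samples))

def fir_da(coeffs, samples, n_bits):
    """Equation (6.3): negated sign-bit plane plus bit planes weighted by 2^-j."""
    planes = [bits_msb_first(k, n_bits) for k in samples]
    sign_term = sum(Fraction(c) * b[0] for c, b in zip(coeffs, planes))
    rest = sum(
        Fraction(1, 1 << j) * sum(Fraction(c) * b[j] for c, b in zip(coeffs, planes))
        for j in range(1, n_bits)
    )
    return -sign_term + rest

coeffs = [3, -1, 4, 2]            # hypothetical 4-tap coefficients
samples = [-128, 127, -5, 64]     # hypothetical 8-bit two's-complement inputs
print(fir_direct(coeffs, samples, 8), fir_da(coeffs, samples, 8))
```

Both functions return the same rational number, which reflects that (6.3) is only a reordering of the sums in (6.1).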
That means an N-bit serial-input, M-tap filter based on the DA concept requires a fixed number of clock cycles, N, to perform the filtering operation. It should be noted that the same filter, when MAC-based, needs M MAC units in typical DSP approaches.
The DA-based approach offers lower filter power consumption, especially for higher-order filters. The mixed-signal structure proposed in the following section offers even more efficiency by simplifying the computations and by a hardware realization that does not need large memories.

6.3 Proposed Current-Mode Distributed Arithmetic Structure

The proposed mixed-signal DA architecture is composed of the following main components: sixteen (the number of taps) 8-bit (the number of bits of the digital input) digital shift registers to introduce the digital inputs to the filter, eight 5-bit shift registers to store the multiplicands, eight 5-bit digital-to-analog converters (DACs), and two current delay/divider cells.
It should be noted that the number of multiplicand storage registers and DACs is reduced from sixteen to eight because of the symmetry of the filter coefficients.
The configuration of the proposed DA structure is shown in Fig. 6.1(a). As shown in this figure, the two’s-complement inputs, X0 to X15, are serially fed to the system through the 8-bit shift registers. At the jth clock cycle, the least significant bit of the ith input (xij) is multiplied by the current-form multiplicand ICi, which is the output of the ith DAC. The current-mode multiplication results are added to each other at the connection node SUM. The filtering computation is performed in N = 8 clock cycles. Figures 6.1(b) to (d) show the operation of the delay and division section in various clock cycles and present the filtering process in more detail, as follows.
Fig. 6.1(b) displays the operation of the delay and division section during the high state of the first N − 1 = 7 clock cycles. In this period, S1 and S2 are closed and open, respectively, and the output current Iout feeds back to the first current delay cell, D1. The outputs of D1 and D2 pass the current through if and only if their inputs are disconnected first. Consequently, during the high state of the first N − 1 = 7 clock cycles, D1 stores the current while it is disconnected from D2. Simultaneously, the current stored in D2 in the previous clock cycle is added to the current coming from the node SUM, Is.
Each of D1 and D2 stores the current for half a clock cycle and releases it in the second half. Fig. 6.1(c) presents what happens at the low state of the first N − 1 = 7 clock cycles, where the current stored in D1 is divided by two and fed to D2.

Figure 6.2: Multiplying stage of the proposed mixed-signal filter. (Labeled blocks: input shift registers SHR0 to SHR15, DACs with sign bits, weighted binary switches, summing node SUM, and compensating current Ib.)
Figure 6.3: DA-based delay/division stage of the proposed current-mode mixed-signal filter. (Labeled blocks: division by two, first delay cell D1, second delay cell D2, comparator, control, and output block.)
Figure 6.4: Overall conceptual operating waveforms of the proposed filter. The notations show the signal level at the specific clock cycle. (Signals shown over clock cycles 1 to 9: CLK, RESET, Is, Iin = If + Is, ID1, ID2 = If, Control, and Iout.)
Equation (6.3) is fully generated at the Nth clock cycle by subtracting Is from the feedback current If, as depicted in Fig. 6.1(d). At this time, Is and If are equal to the first and the second terms of equation (6.3), respectively. The output current of this step is the final result of the filter.
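The cycle-by-cycle behavior described above (add the new bit-plane current to the feedback, halve the result, and subtract the sign-bit plane in the last cycle) can be written as a short recurrence. This is a behavioral sketch only, with ideal arithmetic in place of the current-mode cells; the function name `da_recurrence` and the example values are hypothetical, and the bit planes are assumed to arrive least significant bit first.

```python
def da_recurrence(plane_sums):
    """Behavioral model of the delay/division loop.

    plane_sums[j] is the summed current Is for bit plane j, where j = 0 is the
    sign-bit plane. Planes N-1 down to 1 are accumulated with a halving feedback;
    in the last cycle the sign-bit plane is subtracted from the feedback.
    """
    n_bits = len(plane_sums)
    feedback = 0.0
    for j in range(n_bits - 1, 0, -1):        # cycles 1 .. N-1, LSB plane first
        feedback = (feedback + plane_sums[j]) / 2.0
    return feedback - plane_sums[0]           # cycle N: If - Is

# Hypothetical 4-bit example: -1 + 2/2 + 0/4 + 4/8
print(da_recurrence([1.0, 2.0, 0.0, 4.0]))
```

After the loop, plane j has been halved j times, which reproduces the 2^{-j} weights of equation (6.3) without any explicit multiplier.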
The CLK signal controls the speed of the operation by controlling the input shift registers and delay cells. The RESET signal opens the switch S1 at the (N − 1)th clock cycle to cut off the feedback branch and clear the data at the end of each N-cycle stream.
In the proposed structure, the inverses of some of the control signals, such as CLK and RESET, are also shown. This does not mean that the inverted CLK and RESET are actually generated as control signals; rather, CLK and RESET control PMOS switches instead of NMOS ones.

6.4 Proposed Filter Implementation

In the previous section, the basis of the DA-based FIR filter was discussed. In this section,
the novel implementation of a 16-tap 8-bit FIR filter based on the DA is proposed.
Fig. 6.2 and Fig. 6.3 show the proposed filter structure. Fig. 6.2 presents the multiplying
stage at the output of that connects to the processing stage circuitry shown in Fig. 6.3.
Filter coefficients, ICi , are signed values that are fed to the circuit in the current form

76

6. LOW-POWER MIXED-SIGNAL IMPLEMENTATION OF THE DA-BASED FIR FILTER

through 5-bit shift registers and DACs as presented in Fig. 6.2. A negative sign means a
change in the direction of the coefficient current.
Current-mode multiplication between the filter coefficient (ICi) and the jth bit of the ith
input (xij) is performed through the weighted binary switches illustrated in Fig. 6.2. The input
xij closes the switch and lets ICi flow in the case of a 1, or opens the switch and cuts off the
current in the event of a 0. The multiplied currents are added together by connecting them at
the node SUM. The addition takes the coefficients' signs into account. That means the direction
of the addition result, Is, can be leftward or rightward at the node SUM depending on the values
of xij in the current clock cycle.
The addition result, Is, reaches its most negative value when all weighted binary switches
related to positive ICi are open and all switches related to negative ICi are closed. In this
case, Is equals the sum of the negative coefficients and flows in the leftward direction. It
should be noted that the leftward direction is unwanted because it is not compatible with the
following delay/division stage.
To avoid this problem and keep Is from becoming negative under any condition, a compensating
current Ib is added to the coefficient currents at the node SUM. The value of Ib is equal to the
absolute value of the sum of all negative coefficients; consequently, Is is guaranteed not to
be negative and is ready to go to the delay/division stage.
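The role of the compensating current can be illustrated with a minimal numerical sketch. The coefficient values below are hypothetical and chosen only for illustration, and the function names are not part of the design; the sketch simply checks that adding Ib, computed as described above, keeps the summed current from going negative for every possible bit pattern.

```python
from itertools import product

def compensating_current(coeffs_uA):
    """I_b: absolute value of the sum of all negative coefficient currents."""
    return -sum(c for c in coeffs_uA if c < 0)

def summed_current(coeffs_uA, bits):
    """I_s at node SUM for one bit plane: a 1 bit lets its coefficient current flow."""
    return sum(c for c, b in zip(coeffs_uA, bits) if b)

# Hypothetical signed coefficient currents in uA (not taken from the filter design).
coeffs = [-1.0, 2.0, 5.0, -8.0, -13.0, 15.0]
i_b = compensating_current(coeffs)

# Worst case: only the switches of the negative coefficients are closed.
worst_bits = [1 if c < 0 else 0 for c in coeffs]
worst_is = summed_current(coeffs, worst_bits)

# With I_b added, the current entering the next stage is never negative.
min_total = min(summed_current(coeffs, bits) + i_b
                for bits in product((0, 1), repeat=len(coeffs)))
print(i_b, worst_is, min_total)
```

In the worst case the sum is exactly cancelled by Ib, so the compensated current reaches zero but does not change direction.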
The current mirrors generating Ib pass current if and only if the corresponding sign
bit is 1. In this way, Ib has the minimum value required to keep Is positive, which
minimizes the power consumption when the filter is programmed with different
coefficients. It should be noted that adding the constant Ib to the coefficients does
not affect the filter performance, as proven in the following.
The following equation is derived from equation (6.3) by accounting for the presence
of the constant current Ib:

$$Y = -\left(\sum_{i=0}^{M-1} I_{Ci}\,x_{i0} + I_b\right) + \sum_{j=1}^{N-1} 2^{-j}\left(\sum_{i=1}^{M-1} x_{ij}\,I_{Ci} + I_b\right) \qquad (6.4)$$

Equation (6.4) can be written as:

$$Y = -I_b - \sum_{i=0}^{M-1} I_{Ci}\,x_{i0} + \sum_{j=1}^{N-1} 2^{-j}\sum_{i=1}^{M-1} x_{ij}\,I_{Ci} + I_b\sum_{j=1}^{N-1} 2^{-j} \qquad (6.5)$$

The geometric series $\sum_{j=1}^{N-1} 2^{-j}$ converges absolutely to 1 for an infinite series. For a
16-tap filter, the mentioned geometric series converges to 1 with an error of 0.003%, which
is negligible in the filter performance. Consequently, (6.5) can be rewritten as:

$$Y = -I_b - \sum_{i=0}^{M-1} I_{Ci}\,x_{i0} + \sum_{j=1}^{N-1} 2^{-j}\sum_{i=1}^{M-1} x_{ij}\,I_{Ci} + I_b \qquad (6.6)$$

in which −Ib and Ib cancel each other out, which makes equations (6.3) and (6.6)
equal.
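As a quick numerical check of the geometric-series argument, the sketch below evaluates the partial sum of 2^-j and its distance from the limit of 1. The function names are illustrative only. An upper limit of 16 in the sum reproduces the 0.003% figure quoted above.

```python
def partial_sum(n_upper):
    """Sum of 2^-j for j = 1 .. n_upper - 1, as in equation (6.5)."""
    return sum(2.0 ** -j for j in range(1, n_upper))

def residual_percent(n_upper):
    """Distance of the partial sum from its limit of 1, in percent."""
    return 100.0 * (1.0 - partial_sum(n_upper))

for n in (8, 16):
    print(n, partial_sum(n), residual_percent(n))
```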
Going back to the filter's structure, the positive Is enters the delay/division stage through
the transistor Ms at every clock cycle, as depicted in Fig. 6.2.
A case study of a random Is generated by random digital inputs is shown alongside the clock
and other relevant operating waveforms in Fig. 6.4. At every clock cycle, Is is added to the
delayed feedback current If, which has an initial value of zero as shown in Fig. 6.4. The
summation result, Iin = Is + If, is delayed and divided by two in each clock cycle for the
first seven clock cycles, while the RESET switch is closed, to generate If for the next clock
cycle, such that If(T) = Iin(T − 1)/2. The division by two is achieved by a current mirror
in the feedback branch, in which the width of M2 is half that of M1. The delay and division
steps take one clock cycle to generate the feedback current, If, which is added to the new
Is at the next rising edge of the clock. It should be noted that If is zero at the beginning of
every 8-clock-cycle stream because of the RESET switch.
At the 8th clock cycle, the switches related to RESET are closed, which changes the direction of
Is through the transistor M9 and lets subtraction happen instead of addition, such
that Iout = Iin(T8) = If(T8) − Is(T8).
The direction of the output current, Iout, can be positive or negative depending on which
of Is or If is higher. To take the direction of Iout into consideration, a current-mode
comparator (see the dashed box in Fig. 6.3) produces the Control signal, which goes low
in the case of If < Is. In this case, the positive Iout flows through transistor M3. When Iout is
negative, the Control signal goes high and M4 provides the path for the negative current
to flow.
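The accumulate, halve, and subtract sequence described above can be sketched behaviourally as follows. This is an idealized model under stated assumptions: the bit planes are assumed to arrive least-significant first with the sign-bit plane last, the per-cycle Is values are hypothetical, and circuit nonidealities are ignored. The second function gives the equivalent weighted sum so the two can be compared.

```python
def feedback_accumulate(is_per_cycle):
    """Idealized delay/division loop; Is values are ordered as fed (sign plane last)."""
    i_f = 0.0  # the feedback current starts from zero in every stream
    for i_s in is_per_cycle[:-1]:
        i_in = i_s + i_f      # summation at the input of the feedback branch
        i_f = i_in / 2.0      # one clock cycle of delay plus division by two
    return i_f - is_per_cycle[-1]  # last cycle: subtraction instead of addition

def weighted_sum(is_per_cycle):
    """Closed form: the plane fed at cycle t (0-based) carries weight 2^-(n-1-t)."""
    n = len(is_per_cycle)
    partial = sum(i_s * 2.0 ** -(n - 1 - t)
                  for t, i_s in enumerate(is_per_cycle[:-1]))
    return partial - is_per_cycle[-1]

# Hypothetical Is values in uA for one 8-cycle stream.
stream = [4.0, 10.0, 0.0, 6.0, 2.0, 8.0, 12.0, 5.0]
i_out = feedback_accumulate(stream)
print(i_out, weighted_sum(stream))
```

Both functions return the same value, which shows that repeated halving in the feedback branch applies the binary weights without a separate divider per bit.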
In the following subsections, the DAC and delay/division stage performance are discussed.

6.4.1 DAC

To generate the coefficient currents ICi from the digital coefficients Yi, a group of
eight 5-bit DACs is utilized. The configuration of the DAC employed in the proposed
filter structure is shown in Fig. 6.5. This design is a modified architecture of the MDAC
proposed in [13], which combines AND gates and weighted current
mirrors to reduce the area and static power consumption compared to conventional
MDACs [13].
As shown in the dash-dotted box in Fig. 6.5, a reference current Iref is fed to the input
of the DAC. Here, Iref is 1µA, equivalent to the one-bit weighted current. The reference
current of 1µA is then multiplied by the digital coefficient value Y = y4 y3 y2 y1 y0 through
the AND gates and weighted current mirrors. The direction of ICi is determined by the
sign bit y4 and is leftward in the case of y4 = 1 or rightward in the event of y4 = 0.
As shown in Fig. 6.5, if all the bits of Y are zero, the output of the DAC is
zero because all the switches SW10−22 are open. However, there would still be power
dissipation due to the reference current generator. To eliminate this power consumption,
the transistor M12 is turned on by the OR of the bits of Y. Therefore, M12 is
on, letting Iref flow, if and only if at least one of the bits of Y is 1.
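A behavioural sketch of the ideal DAC transfer function may help here. It assumes a sign-magnitude reading of the 5-bit code (y4 as sign, y3..y0 as magnitude), which is consistent with the examples in this chapter (00101 giving 5µA, and 11111 to 01111 spanning −15µA to 15µA); the function names are illustrative and mismatch is not modeled.

```python
I_REF_UA = 1.0  # reference current, equal to the one-bit weighted current

def dac_current_uA(code):
    """Ideal output current for a 5-bit code 'y4y3y2y1y0' (assumed sign-magnitude)."""
    sign = -1.0 if code[0] == "1" else 1.0
    magnitude = int(code[1:], 2)
    return sign * magnitude * I_REF_UA

def reference_enabled(code):
    """OR of all bits of Y: the reference branch conducts only if some bit is 1."""
    return "1" in code

for code in ("00000", "00101", "11000", "01111", "11111"):
    print(code, dac_current_uA(code), reference_enabled(code))
```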

Figure 6.5: The 5-bit DAC structure [13]

Figure 6.6: The 5-bit DAC output current ICi versus the digital input (solid line); the exact
error, calculated from Error(µA) = ICi − Iideal, is shown by the dashed line, and the error
percentage, calculated from 100·(ICi − Iideal)/Iideal, is shown by the dash-dotted line.

Figure 6.7: The family plot of the DAC output current vs. the analog equivalent of the
digital input achieved from 500 runs of Monte Carlo simulations.

Figure 6.8: The input and output currents of two cascaded delay cells and a current divider
for four random input currents of 4.98µA, 20.02µA, 60µA, and 99.98µA, shown in panels (a) to (d).

Figure 6.9: Corner analysis of tt (Typical NMOS Typical PMOS), ff (Fast NMOS Fast
PMOS), fs (Fast NMOS Slow PMOS), ss (Slow NMOS Slow PMOS), and sf (Slow NMOS
Fast PMOS) of the error percentage that occurs in the feedback branch's output current
(error in % versus Iin in µA).
Table 6.1: DAC transistors dimensions.

Transistors           Dimensions     Transistors           Dimensions
M0, M1, M2, M3        1/3            M8, M9                2.5/1
M4, M5, M6            4.5/3          M10, M11              6/1
M7                    6              M12, M13, M14, M15    3/2.5

The transistor M12 turns on in the triode region, while the diode-connected transistors
M13−15 are in the saturation region. The transistor M15 has the same size as M12 and is
added to the design to further decrease the VGS of M12, which reduces the reference
current to the desired value of 1µA without increasing the sizes of M14 and M16. The
dimensions of the DAC transistors are shown in Table 6.1.
Fig. 6.6 shows the output of the DAC obtained from the post-layout simulation. As shown in this figure, the output current ICi varies from −15µA to 15µA as the coefficients change from 11111 to 01111. The exact error is measured by subtracting the expected output Iideal from the measured value of ICi, such that
Error(µA) = ICi − Iideal. The exact error is very small compared to the output of the DAC,
ICi, and cannot be presented properly in the same figure (see the dashed line in Fig. 6.6).
Consequently, the error percentage (dash-dotted line) is calculated from
100·(ICi − Iideal)/Iideal to provide more readable error measurements.
The effect of component mismatch and process variation on the accuracy of the
designed DAC is shown by 500 runs of Monte Carlo simulations, the results of which are
illustrated in Fig. 6.7.
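The two error measures can be written out directly. The sketch below applies them to one entry of Table 6.2 (code 00101, ideal 5µA, measured 5.04µA); the function names are illustrative. Note that the percentage form divides by Iideal, so it is undefined for the all-zero code.

```python
def dac_error_uA(measured_uA, ideal_uA):
    """Exact error: Error(uA) = I_Ci - I_ideal."""
    return measured_uA - ideal_uA

def dac_error_percent(measured_uA, ideal_uA):
    """Error percentage: 100 * (I_Ci - I_ideal) / I_ideal (ideal must be nonzero)."""
    return 100.0 * (measured_uA - ideal_uA) / ideal_uA

# Entry taken from Table 6.2: DAC(in) = 00101, DAC(out) = 5.04 uA.
print(dac_error_uA(5.04, 5.0), dac_error_percent(5.04, 5.0))
```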
The DAC output error causes nonideality in the filter performance, which is modeled in
the filter equation as follows:

$$Y[n] = -\sum_{i=0}^{M-1} (I_{Ci} + \delta_i)\,x_{i0} + \sum_{j=1}^{N-1} 2^{-j}\sum_{i=1}^{M-1} x_{ij}\,(I_{Ci} + \delta_i) \qquad (6.7)$$

in which δi is the error corresponding to the coefficient ICi. The error is obtained as
the difference between equations (6.7) and (6.3) as follows:

$$E = -\sum_{i=0}^{M-1} \delta_i\,x_{i0} + \sum_{j=1}^{N-1} 2^{-j}\sum_{i=1}^{M-1} \delta_i\,x_{ij} \qquad (6.8)$$

The error value varies from the minimum of $-\delta_0 + (-\delta_1 + \delta_1) + (-\delta_2 + \delta_2) + \cdots + (-\delta_{M-1} + \delta_{M-1}) = -\delta_0$,
in the case that all bits are 1, to the maximum of $\sum_{i=0}^{M-1} \delta_i$. The minimum error
can be reduced to zero if all bits are 1 except x00.
The error in the DAC outputs ICi can introduce ripple in the pass-band, reduce
the pass-band width of the filter, and decrease the stop-band attenuation [7]. Keeping the
coefficient error within the current range equivalent to one bit reduces these unwanted
effects. In our design, the current range equivalent to one bit is 1µA, and according to the results
shown in Fig. 6.5 and Fig. 6.6 the error introduced by the DAC is less than 1µA.
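Equation (6.8) can be evaluated numerically for a given bit pattern. The sketch below is a direct transcription of that equation with its index ranges; the per-coefficient errors are hypothetical values, and the function name is illustrative. With every bit set to 1, the result lands close to −δ0; the small remainder comes from the finite sum of 2^-j falling slightly short of 1.

```python
def coefficient_error(deltas, bits):
    """Evaluate equation (6.8). bits[i][j] is bit j of input i; j = 0 is the sign bit."""
    m = len(deltas)
    n = len(bits[0])
    sign_term = -sum(deltas[i] * bits[i][0] for i in range(m))
    weighted = sum(
        2.0 ** -j * sum(deltas[i] * bits[i][j] for i in range(1, m))
        for j in range(1, n)
    )
    return sign_term + weighted

deltas = [0.04, 0.10, 0.26]           # hypothetical per-coefficient DAC errors in uA
all_ones = [[1] * 8 for _ in deltas]  # every bit of every 8-bit input is 1
e_all_ones = coefficient_error(deltas, all_ones)
print(e_all_ones)
```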

6.4.2 Current-Mode Delay Cell

As explained in section 6.3, the delay/division steps need to be repeated 8 times to
generate the output of the filter. That means the accuracy of the two cascaded delay cells
D1 and D2 plays a key role in the filter performance. As shown in Fig. 6.3, the cascaded
delay cells D1 and D2 work with the controlling signals CLK and $\overline{CLK}$ respectively
and are connected to each other by a current mirror.
Although all the inverse controlling signals could be generated by placing a NOT gate in
the path of the corresponding controlling signal, here we use PMOS switches controlled by
the original signals instead of switches controlled by the inverted signals, to avoid the
delays that NOT gates introduce into the signal paths and to take advantage of better time
matching.
The delay cell (consider D1) works based on the transistor M5 and the capacitors. The
transistor M5 needs to be able to store the input current Iin at the high level of the CLK
signal and release it at the low level. At the high level of the clock cycle, switches
Sa, Sb, and Sc are closed and Iin charges the capacitors C1−3. When the capacitors are fully
charged, the current flowing through them is zero (sampling stage). That means the whole
Iin passes through M5 and M6. The cascode transistor M6, working in the saturation region,
is utilized to reduce the channel length modulation effect and to keep the Vds of M5 constant
while it is disconnected from the input current at the low level of CLK (hold stage).
At the low level of the clock, M8 is on because of the capacitors connected to its gate,
keeping the current of M5 at the same value it had at the high level of the clock. In fact,
the source follower M8 is utilized to prevent changes in the output current from affecting the
sampled current.
In the sampling stage, the switch Sb is closed slightly earlier than Sc, which itself is
closed slightly earlier than Sa, to reduce the switches' charge injection using feedthrough
techniques. Otherwise, the charge injection changes the voltage stored in C1 by ∆V, which
consequently changes the stored current by gm ∆V [14, 15].
Considering this technique, the values of the capacitors are set according to the following
rule [14]:

$$C_1 = C_2 = 10\,C_3 \qquad (6.9)$$

Figure 6.10: The 1000-run Monte Carlo simulation family plot of the second delay cell
output current for four random input currents of 100µA, 58µA, 24µA, and 6µA.

Here, the capacitor sizes are chosen to be C1 = C2 = 100fF and C3 = 10fF. The biasing
voltages Vbias1 and Vbias2 are 1.4V and 560mV respectively, to keep M5 biased in the
saturation region.
Fig. 6.8 displays the post-layout simulation results of the two cascaded delay cells
D1 and D2 and the current divider for four sample input currents of 91.55µA, 45.09µA,
19.55µA, and 4.38µA in CMOS 0.18µm technology. Each of the delay cells holds the
current for half of the clock cycle. The output current of the first delay cell, ID1, is equal
to $\frac{1}{2} I_{in}(T - 1/2)$ and is introduced as the input current to D2. The feedback branch output
current is shown as If = ID2.

Figure 6.11: The frequency and phase responses of the DA-based BPF and LPF: (a) magnitude
vs. frequency (MHz), (b) magnitude vs. frequency (KHz), (c) phase vs. frequency (MHz),
(d) phase vs. frequency (KHz), with the process corners tt, ss, sf, fs, and ff in the legends.
The ideal output current of the feedback branch is expected to be equal to half of
the input current one clock cycle earlier. Consequently, the error percentage is calculated
as follows:

$$E = \frac{I_{D2}(T-1) - I_{in}(T)/2}{I_{in}(T)/2} \qquad (6.10)$$

Fig. 6.9 provides the error percentage for input currents ranging from 0 to 100µA to
evaluate the performance of the filter's feedback branch. Using the mentioned formula, the
error percentages for Fig. 6.8 (a), (b), (c), and (d) are calculated as 1.38%, 0.56%, 0.8%, and
1.12%.
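The error measure of equation (6.10) compares the feedback branch output with half of the corresponding input current. A minimal sketch, expressed as a percentage, is given below; the current values are hypothetical and are not simulation results from this work.

```python
def feedback_error_percent(i_d2_uA, i_in_uA):
    """Relative deviation of the feedback output from the ideal I_in / 2, in percent."""
    ideal = i_in_uA / 2.0
    return 100.0 * (i_d2_uA - ideal) / ideal

# Hypothetical pair: 60 uA into the branch, 30.24 uA observed at the output of D2.
err = feedback_error_percent(30.24, 60.0)
print(err)
```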
The corner analysis results are presented in Fig. 6.9 to estimate the effect of the variation
of fabrication parameters on the feedback branch performance. As shown in this figure, the
maximum error percentage is 2.3%, which occurs for the ff (Fast NMOS Fast PMOS) and fs
(Fast NMOS Slow PMOS) corners at the input current of 1µA and for the ss (Slow
NMOS Slow PMOS) corner at the input current of 100µA. For most values of the input current,
the error percentage is less than 1% considering the variation of fabrication parameters.
Fig. 6.10 shows the 1000-run Monte Carlo family plot of the output current of the second delay cell for four random input currents of 100µA, 58µA, 24µA, and 6µA to represent
the effect of mismatch and process variation on the performance of the feedback branch.
The error introduced by the feedback branch affects only the second part of equation
(6.3). The error can be modeled as a constant error to which another constant error is added
at every clock cycle, as follows:

$$\begin{aligned}
Y = {} & -(I_{C0}x_{00} + I_{C1}x_{10} + \cdots + I_{C(M-1)}x_{(M-1)0}) \\
& + 2^{-1}(x_{11}I_{C1} + x_{21}I_{C2} + \cdots + x_{(M-1)1}I_{C(M-1)} + \delta_1) \\
& + 2^{-2}(x_{12}I_{C1} + x_{22}I_{C2} + \cdots + x_{(M-1)2}I_{C(M-1)} + \delta_1 2^{-1} + \delta_2) + \cdots \\
& + 2^{-(N-1)}(x_{1(N-1)}I_{C1} + x_{2(N-1)}I_{C2} + \cdots + x_{(M-1)(N-1)}I_{C(M-1)} \\
& \qquad + \delta_1 2^{-(N-2)} + \delta_2 2^{-(N-3)} + \cdots + \delta_{N-1})
\end{aligned} \qquad (6.11)$$

in which δj is the error that comes from the jth feedback in the filter. The deviation of
equation (6.11) from equation (6.3) is the error introduced by the feedback branch,
calculated by:

$$E = 2^{-1}\delta_1 + 2^{-2}(\delta_1 2^{-1} + \delta_2) + \cdots + 2^{-(N-1)}(\delta_1 2^{-(N-2)} + \delta_2 2^{-(N-3)} + \cdots + \delta_{N-1}) \qquad (6.12)$$

which can be rewritten as:

$$E = (2^{-1}\delta_1 + 2^{-2}\delta_2 + \cdots + 2^{-(N-1)}\delta_{N-1})\,(1 + 2^{-2} + 2^{-4} + \cdots + 2^{-(N-2)}) \qquad (6.13)$$

By substituting (6.13) in (6.11) and using the geometric series $\sum_{j=1}^{N-2} 4^{-j}$, (6.11) is
rewritten as:

$$Y = -\sum_{i=0}^{M-1} I_{Ci}\,x_{i0} + \sum_{j=1}^{N-1} 2^{-j}\left(\delta_j\,\frac{1-\left(\frac{1}{4}\right)^{N-2+1}}{1-\frac{1}{4}} + \sum_{i=1}^{M-1} x_{ij}\,I_{Ci}\right) \qquad (6.14)$$

Based on (6.13) and (6.14), the maximum error in the output of an N-bit filter caused by
the error introduced by the analog circuits of the feedback branch is calculated considering
δ1 = δ2 = · · · = δN−1 = δmax and using the geometric series $\sum_{j=0}^{N-1} 2^{-j}$ and $\sum_{j=1}^{\frac{N-2}{2}} 4^{-j}$ as
follows:

$$E = \delta_{max}\left(\frac{1-\left(\frac{1}{2}\right)^{N}}{1-\frac{1}{2}} - 1\right)\left(\frac{1-\left(\frac{1}{4}\right)^{\frac{N-4}{2}}}{1-\frac{1}{4}}\right) \qquad (6.15)$$

Assuming the error that is added to the feedback branch at every clock cycle is equal to
the maximum possible error, the maximum total error generated in the feedback branch is
0.69µA, which occurs for Iin = 100µA. For an 8-bit filter, the maximum output error due
to the feedback branch error is 0.69µA · 1.24 = 0.86µA. The error would be 0.92µA for
an infinite number of bits.
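The multiplier applied to δmax can be checked numerically. The sketch below evaluates the two bracketed factors of equation (6.15) for N = 8 and their limit for a large number of bits; the function name is illustrative. It reproduces the factor of 1.24 and the 0.86µA and 0.92µA values quoted above.

```python
def feedback_error_gain(n_bits):
    """Product of the two geometric-series factors multiplying delta_max in (6.15)."""
    binary_factor = (1 - 0.5 ** n_bits) / (1 - 0.5) - 1
    quaternary_factor = (1 - 0.25 ** ((n_bits - 4) / 2)) / (1 - 0.25)
    return binary_factor * quaternary_factor

DELTA_MAX_UA = 0.69  # maximum feedback branch error stated in the text

gain_8bit = feedback_error_gain(8)
gain_limit = 1.0 * (1.0 / (1 - 0.25))  # both series taken to their infinite limits
print(gain_8bit, DELTA_MAX_UA * gain_8bit, DELTA_MAX_UA * gain_limit)
```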

6.5 Results Discussion

In this section, the results of the proposed 16-tap 8-bit FIR filter implemented in CMOS
0.18µm technology are presented. The DACs provide external access to the filter coefficients for a reconfigurable structure. A band-pass and a low-pass FIR filter are implemented
to show the adjustability of the proposed architecture.
For the BPF, the sampling frequency fs is 10MHz; therefore, based on our discussion in sub-section 6.4.2, the CLK period is set to 100ns.
The LPF is designed with fs of 48KHz, while for both filters, the input precision and
number of taps are eight and sixteen respectively.
Table 6.2 lists the coefficients of both filters. The ideal coefficients are obtained
from MATLAB for the 16-tap filters with the defined pass-band and stop-band. To convert
these coefficients into (mapped) currents within the range of the utilized
DACs, they are all multiplied by a constant to keep the coefficients consistent. The
inputs of the DACs (DAC(in)) are chosen from the mapped current values as
the closest 5-bit number that can generate a similar mapped current value. For instance,
the DAC input that provides the closest value to the mapped current of 4.8µA is 00101,
which generates a current of 5µA. DAC(out) denotes the measured output currents of
the DACs while they are all connected to each other at the node SUM.

Table 6.2: Filter's Coefficients.

BPF                                                LPF
ideal      mapped (µA)  DAC(in)  DAC(out) (µA)     ideal      mapped (µA)  DAC(in)  DAC(out) (µA)
-0.02424   -1.29        10001    -1.02             -0.04843   -1.74        10010    -2.06
+0.02885   1.53         00010    2.11              0.03164    1.14         00001    1.01
+0.02091   1.11         00001    1.02              0.06631    2.38         00010    2.09
+0.02114   11.26        01011    11.34             0.01611    0.58         00001    1.02
+0.09029   4.8          00101    5.04              -0.07617   -2.74        10011    -2.98
-0.16071   -8.56        11000    -8.26             -0.04178   -1.5         10001    -1.02
-0.24275   -12.93       11101    -13.21            0.18376    6.61         00111    7.15
+0.28154   15.01        01111    15.1              0.41718    14.99        01111    15.13
+0.28154   15.01        01111    15.1              0.41718    14.99        01111    15.13
-0.24275   -12.93       11101    -13.21            0.18376    6.61         00111    7.15
-0.16071   -8.56        11000    -8.26             -0.04178   -1.5         10001    -1.02
+0.09029   4.8          00101    5.04              -0.07617   -2.74        10011    -2.98
+0.02114   11.26        01011    11.34             0.01610    0.58         00001    1.02
+0.02091   1.11         00001    1.02              0.06630    2.38         00010    2.09
+0.02885   1.53         00010    2.05              0.03164    1.14         00001    1.01
-0.02424   -1.29        10001    -1.02             -0.04843   -1.74        10010    -2.06
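The choice of a DAC input from a mapped current can be sketched as a nearest-level quantization. This is only an illustration under stated assumptions: the code is read as sign-magnitude with a 1µA step, plain nearest-integer rounding is assumed, and the function name is hypothetical. The selection used for Table 6.2 may have followed additional criteria, so this sketch need not reproduce every entry of the table.

```python
def nearest_dac_code(mapped_uA, i_ref_uA=1.0, magnitude_bits=4):
    """Return the 5-bit sign-magnitude code closest to a mapped current (assumed rule)."""
    max_level = 2 ** magnitude_bits - 1
    # Python's round() resolves exact .5 ties to the even integer.
    level = min(round(abs(mapped_uA) / i_ref_uA), max_level)
    sign_bit = "1" if mapped_uA < 0 else "0"
    return sign_bit + format(level, "0{}b".format(magnitude_bits))

# Example from the text: a mapped current of 4.8 uA is served by code 00101 (5 uA).
for mapped in (4.8, -12.93, 15.01):
    print(mapped, nearest_dac_code(mapped))
```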
The digital random numbers are serially introduced to the circuit through the sixteen
8-bit shift registers as the input of the filters. At each clock cycle, the input bits move
forward to complete the filtering cycle. The magnitude and phase responses of the BPF
and the LPF are illustrated in Fig. 6.11. The number of data points collected to
obtain these frequency responses is 2184. Process variation is considered and represented in
these simulations by performing corner analysis. The phase responses of these symmetrical
filters are shown to be linear within the pass-band. The solid lines in Fig. 6.11(c) and (d)
represent the ideal phase and the dotted lines show the post-layout simulation results.
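The linear-phase property of a symmetric coefficient set can be checked with a short sketch. It decodes the LPF DAC(in) codes of Table 6.2 as sign-magnitude integers (an assumption consistent with the examples in this chapter), evaluates the ideal FIR frequency response, and removes the constant group delay of (L − 1)/2 samples; what remains is purely real, which is what linear phase means. The function names are illustrative and the model ignores all circuit nonidealities.

```python
import cmath

def decode(code):
    """Assumed sign-magnitude reading of a 5-bit DAC code."""
    sign = -1 if code[0] == "1" else 1
    return sign * int(code[1:], 2)

# LPF DAC(in) codes as listed in Table 6.2.
LPF_CODES = ["10010", "00001", "00010", "00001", "10011", "10001", "00111", "01111",
             "01111", "00111", "10001", "10011", "00001", "00010", "00001", "10010"]
taps = [decode(c) for c in LPF_CODES]

def frequency_response(h, omega):
    """H(e^{j*omega}) = sum over k of h[k] * exp(-j*omega*k)."""
    return sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))

def delay_compensated(h, omega):
    """Response with the constant delay of (len(h) - 1) / 2 samples removed."""
    centre = (len(h) - 1) / 2.0
    return frequency_response(h, omega) * cmath.exp(1j * omega * centre)

# For symmetric taps the compensated response has a vanishing imaginary part.
worst_imag = max(abs(delay_compensated(taps, 0.1 * k).imag) for k in range(32))
print(taps == taps[::-1], frequency_response(taps, 0.0).real, worst_imag)
```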
The layout of the proposed design is shown in Fig. 6.12, and the area of the DA architecture including shift registers and DACs is 0.071mm2. The maximum power consumption
is measured as 2.2mW. Table 6.3 provides a summary of the performance of the proposed
implementation and a comparison between different implementations of FIR filters in terms of the
number of taps, sampling frequency, power dissipation, supply voltage, and technology
node.

Figure 6.12: The layout of the 8-bit 16-tap mixed-signal filter based on DA.

Table 6.3: Comparison of the proposed filter with recently published filters

Filter     # taps  Technology  Power        Sampling   Supply       Area
                   node        consumption  frequency  voltage (V)  (mm2)
Proposed   16      0.18µm      2.2mW        10MHz      1.8          0.071
[7]        16      0.5µm       16mW         50kHz      5            1.125
[9]        4       0.8µm       n/a          1MHz       2            1.3
[16]       5       0.18µm      3.6542mW     n/a        5            n/a
[17]       6       90nm        4.35mW       n/a        1            0.239
[18]       4       0.18µm      4.1mW        n/a        1.8          0.52

The proposed filter's power consumption is significantly lower than that of similarly implemented FIR filters. This difference becomes more significant when one takes into
consideration that increasing the order of the filter would increase the power consumption.
Moreover, a high sampling frequency increases the power consumption of the digital sections of
these circuits, such as the shift registers and switching transistors. It should be mentioned that
the sampling frequency is not reported for some of these works. The cut-off frequency,
instead, is reported to be 13.5MHz and 10MHz in [17, 18] respectively, and for [16]
a bandwidth of 40MHz is reported. As can be seen in Table 6.3, the proposed structure's
area and power efficiency make it an excellent choice for portable devices, where these
two criteria matter the most.

6.6 Conclusion

A current-mode mixed-signal implementation of a distributed arithmetic-based FIR filter
is proposed in this chapter. The proposed structure is utilized to implement a band-pass
and a low-pass filter to demonstrate its tunability. Sixteen-tap 8-bit filters are realized at different
sampling frequencies in 0.18µm CMOS technology, and the magnitude and phase responses
are obtained considering process variations. Avoiding current-to-voltage
converters, adders, and dividers results in a low-power, area-efficient structure that is an
excellent choice for portable devices. The sampling frequencies for the BPF and the LPF are
10MHz and 48KHz respectively. The area and the maximum power consumption of
the proposed structure are 0.071mm2 and 2.2mW respectively, which are significantly lower
than those of similar works considering the number of taps, the number of input data bits,
and the sampling frequency.

References
[1] S. Zohar, “New Hardware Realizations of Non-recursive Digital Filters,” IEEE Trans.
Comput., vol. C-22, no. 4, pp. 328-338, Apr. 1973.
[2] A. Peled and B. Liu, “A New Hardware Realization of Digital Filters,” IEEE Trans.
Acoust., Speech, Signal Process., vol. 22, no. 6, pp. 456-462, Dec. 1974.
[3] P. Sirisuk, A. Worapishet, S. Chanyavilas, and K. Dejhan, “Implementation of
switched-current FIR filter using distributed arithmetic technique: Exploitation of digital concept in analogue domain,” IEEE International Symposium on Communication
Technologies, vol. 1, pp. 143-148, Oct. 2004.
[4] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, “LMS adaptive filters
using distributed arithmetic for high throughput,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 52, no. 7, pp. 1327-1337, Jul. 2005.
[5] S. N. Merchant and B. V. Rao, “Distributed arithmetic architecture for image coding,” in
Proc. IEEE Int. Conf. TENCON89, pp. 74-77, Nov. 1989.
[6] S. A. White, “Applications of distributed arithmetic to digital signal processing: A tutorial review,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 1, pp. 4-19, Jan.
1989.
[7] E. Özalevli, W. Huang, P. E. Hasler, and D. E. Anderson, “A Reconfigurable Mixed-Signal VLSI Implementation of Distributed Arithmetic Used for Finite-Impulse Response Filtering,” IEEE Trans. Circuits Syst. I, vol. 55, no. 2, pp. 510-521, Mar. 2008.
[8] P. K. Sharma, M. T. Khan, and S. R. Ahamed, “An alternative approach to design reconfigurable mixed signal VLSI DA based FIR filter,” in IEEE Technology Symposium
(TechSym), pp. 284-288, Sep. 2016.
[9] F. A. Farag, C. Galup-Montoro, and M. C. Schneider, “Digitally Programmable
Switched-Current FIR Filter for Low-Voltage Applications,” IEEE J. Solid-State Circuits, vol. 35, no. 4, pp. 637-641, Apr. 2000.
[10] A. Worapishet, R. Sitdhikorn, A. Spencer, and J. B. Hughes, “A Multirate Switched-Current Filter Using Class-AB Cascoded Memory,” IEEE Trans. Circuits Syst. II, vol.
53, no. 11, pp. 1323-1327, Nov. 2006.
[11] R. Wilcock, B. M. Al-Hashimi, and P. Wilson, “Integrated high bandwidth wave elliptic lowpass switched-current filter in digital CMOS technology,” Electron. Lett., vol.
41, no. 5, pp. 222-223, Mar. 2005.
[12] A. Croisier, D. Esteban, M. Levilion, and V. Riso, “Digital filter for PCM encoded
signals,” U.S. Patent 3 777 130, Dec. 1973. IEEE J. Solid-State Circuits, No. 1, pp.
27-33, Feb. 1988.
[13] B. Youssefi, M. Mirhassani, J. Wu, “Efficient Mixed-Signal Synapse Multipliers for
Multi-Layer Feed-Forward Neural Networks,” IEEE International Midwest Symposium
on Circuits and Systems, pp. 814-817, Oct. 2016.
[14] S. J. Daubert, D. Vallancourt, Y. P. Tsividis, “Current copier cells,” Electron. Lett.,
vol. 24, no. 25, pp. 1560-1562, Dec. 1988.
[15] G. Wegmann and E. A. Vittoz, “Basic principles of accurate dynamic current mirrors,”
IEE Proceedings G-Circuits, Devices and Systems, No. 2, pp. 95-100, Apr. 1990.
[16] S. Arvind Rathod and S. Yellampalli, “Design of Fifth Order Elliptic Filter With
Single-Opamp Resonator,” IEEE ICAECC, pp. 1-6, Oct. 2014.
[17] M. S. Oskooei, N. Masoumi, M. Kamarei, and H. Sjoland, “A CMOS 4.35-mW +22-dBm IIP3 Continuously Tunable Channel Select Filter for WLAN/WiMAX Receivers,”
IEEE J. Solid-State Circuits, vol. 46, no. 6, Jun. 2011.
[18] S. D'Amico, M. Conta, and A. Bashirotto, “A 4.1-mW 10-MHz fourth-order source-follower based continuous-time filter with 79-dB DR,” IEEE J. Solid-State Circuits,
vol. 41, no. 12, pp. 2713-2719, Dec. 2006.

Chapter 7
Conclusions and Future Works

7.1 Summary of Contributions

This dissertation mainly focuses on the mixed-signal design of a fully parallel artificial neural network. First, the advantages of mixed-signal circuit design were explored,
then the two composing building blocks, synapse and neuron, were introduced.
Lastly, the generalization ability of neural networks, a widespread problem in NNs,
was discussed.
In Chapter 2, we proposed the VLSI implementation of a programmable neuron to address the generalization issue of ANNs. The proposed structure provides different maximal
slopes of sigmoid and linear functions. The mentioned various activation functions can be
chosen on-chip or off-chip by a 2-bit voltage DAC.
The programmability was achieved by using body effect via controlling the substrate
voltage of PMOS transistors. The post-layout simulations, Monte Carlo, and corner analysis were performed to confirm the robustness of the design. To the best of authors knowl-

96

7. CONCLUSIONS AND FUTURE WORKS

edge, the proposed architecture is the first analog VLSI implementation that can provide
different shapes of activation function post-fabrication.
In Chapter 3, a mixed-signal synaptic multiplier was proposed. The structure works
based on the weighted current mirror in combination with AND gates. Using this structure, we addressed the area and power efficiency by avoiding the largest-size transistor in
the conventional multiplying DACs and cutting off the currents when the operation is not
needed. To reduce the mismatch effect in weighted current mode, we designed the structure
with two similar building blocks and avoided size differences between transistors.
In Chapter 4, the proposed synaptic DAC and a current-mode area-efficient neuron
were used together as a synapse-neuron building block of a feed-forward 4-3-2 ANN. A
series of patterns were successfully recognized with this structure. The area was measured as
142299µm2, which is one of the smallest reported areas of the synaptic multipliers. The
average power was measured as 0.93mW, which is much lower compared to the state-of-the-art
designs.
In Chapter 5, the nonlinear dynamic behavior of a single neuron with the sigmoid activation function and a feedback synaptic weight was investigated. We were interested in the
possible oscillatory behavior of this structure to be used for neural oscillation applications.
The linearization method was used to investigate the dynamic behavior of the structure
by linearizing the function around a fixed point to assess the local stability. Different dynamic behaviors of this system which were achieved based on the locus of Z in ROC were
investigated.
In Chapter 6, a novel mixed-signal structure of a DA-based FIR filter was proposed.
The structure is current-mode and employs DACs with current-mode outputs to perform the
multiplication between the digital inputs and the analog coefficients. Two 16-tap 8-bit filters
were implemented: one is a BPF with a sampling frequency of 10MHz and the other is
an LPF with a sampling frequency of 48KHz. The area is at least 5 times smaller than that of
similar works.

7.2 Suggested Future Work

The multiresolution learning paradigm can be further studied to investigate
its effect on the generalization ability as well as the learning speed. The learning speed
of the neural network discussed in Chapter 4 can be improved by controlling the activation
function of each neuron. Also, an improvement in pattern recognition is expected from
using the programmable neuron suggested in Chapter 2.
It is also highly suggested to test the neural network with distributed neuron-synapse
blocks by assigning a different activation function to each layer. That is possible if we use
the adjustable neuron proposed in Chapter 2. In this way, we can most probably address
saturation and overfitting of the weights better than with non-distributed circuits.
The nonlinear dynamic behavior of a single neuron with the sigmoid activation function,
which is investigated in Chapter 5, suggests that it could be used in oscillator and
spiking neuron applications. The suggested structure can be implemented in both analog
and digital circuitry. Based on the nature of this work, the implemented oscillator would
be robust with very low jitter. Also, the analog implementation of this structure is highly
recommended to realize the spiking neuron. This implementation would most probably
be more robust and area-efficient by avoiding the capacitors that have been used so far in
spiking neuron implementations.

VITA AUCTORIS

NAME: Bahar Youssefi
PLACE OF BIRTH: Tehran, Iran
YEAR OF BIRTH: 1984
EDUCATION:
University of Isfahan, B.Sc., Isfahan, Iran, 2008
Tarbiat Modares University, M.Sc., Tehran, Iran, 2011
University of Windsor, Ph.D. Windsor, ON, 2018


