Can deep-sub-micron device noise be used as the basis for probabilistic neural computation? by Hamid, Nor Hisham
Nor Hisham Hamid 
A thesis submitted for the degree of Doctor of Philosophy. 




This thesis explores the potential of probabilistic neural architectures for computation with future nanoscale Metal-Oxide-Semiconductor Field Effect Transistors (MOSFETs). In particular, the performance of a Continuous Restricted Boltzmann Machine (CRBM) implemented with generated noise of Random Telegraph Signal (RTS) and 1/f form has been studied with reference to the 'typical' Gaussian implementation. In this study, a time domain RTS based noise analysis capability has been developed for future nanoscale MOSFETs. It represents the effect of nanoscale MOSFET noise on circuit implementations, in particular on the synaptic analogue multiplier, which is subsequently used to implement the stochastic behaviour of the CRBM. The results of this thesis indicate little degradation in performance from that of the typical Gaussian CRBM. In simulation experiments, the CRBM with nanoscale MOSFET noise is able to reconstruct training data, although it takes longer to converge to equilibrium. The results in this thesis do not prove that nanoscale MOSFET noise can be exploited for probabilistic computation in all contexts and with all data. However, they indicate, for the first time, that nanoscale MOSFET noise has the potential to be used in hardware implementations of probabilistic neural computation. This thesis thus introduces a methodology for a form of technology-downstreaming and highlights the potential of probabilistic architectures for computation with future nanoscale MOSFETs.
Declaration of originality 
I declare that this thesis has been completed by myself and that, except where indicated to 
the contrary, the research documented in this thesis is entirely my own. 
Nor Hisham Hamid 
Acknowledgements 
First of all, I would like to thank God, the Almighty, for having made everything possible by 
giving me the strength and courage to pursue this PhD. 
I would like to thank my supervisors Prof. Alan F. Murray and Dr. Martin Reekie for their advice, guidance, and encouragement throughout the course of this PhD. I am particularly indebted to Prof. Alan Murray, whose expertise, understanding, dedication and patience have greatly contributed toward the completion of this thesis.
I would also like to thank my family for their inspiration, love, moral and financial support, and prayers over these past four years. In particular, I am forever indebted to my wife Mazni; without her love, encouragement, patience, and understanding, I would never have completed this thesis.
I greatly appreciate Dr. David Laurenson and Prof. Steve McLaughlin (IDCOM) of the University of Edinburgh, and Prof. Asen Asenov, Dr. Scott Roy, Dr. Jeremy Watling, and all members of the Device Modelling Group of the University of Glasgow, for their technical support and expertise toward the understanding and implementation of the noisy nanoscale MOSFET model.
Very special thanks to Dr. Thomas Koickal, Tong Boon, Mark, and Katherine, who carefully proofread my papers and thesis and politely pointed out mistakes. I would also like to thank all SEE computing staff, in particular David Stewart, for their dedication in ensuring first class IT support. Finally, I would like to thank all whose direct and indirect support helped me complete this thesis.
I acknowledge the Universiti Teknologi Petronas for providing scholarships to pursue this 
PhD work. 
I dedicate this thesis to my wife, my son Haziq, my daughters Aqeela and Faqeeha. 
Contents
Declaration of originality
Acknowledgements
Contents
List of figures
List of tables
1 Introduction
1.1 Motivation
1.2 Contribution to knowledge
1.3 Chapter layout
2 Low Frequency Noise in Nanoscale MOSFETs
2.1 Introduction
2.2 Random Telegraph Signal (RTS) Noise
2.2.1 Origin of traps
2.2.2 RTS noise amplitude
2.2.3 RTS average capture and average emission time
2.2.4 Discussion
2.3 Flicker (1/f) Noise
2.3.1 1/f noise model
2.3.2 Discussion
2.4 Summary
3 Modelling 'Noisy' MOSFETs
3.1 Introduction
3.2 Methodology Overview
3.3 Generating 1/f Noise
3.3.1 Methodology: 1/f noise
3.3.2 Implementation: 1/f noise
3.4 Generating RTS Noise
3.4.1 Assumptions: RTS noise
3.4.2 RTS Amplitude
3.4.3 RTS mean capture and emission time
3.4.4 Methodology: RTS noise
3.4.5 Implementation: RTS noise
3.5 Results and Comparison: 1/f and RTS Noise
3.6 Discussion: Noise Modeling
3.7 Summary
4 Noisy Circuit Implementation
4.1 Noisy 2-Quadrant Multiplier
4.1.1 Circuit Description
4.1.2 Circuit Implementation
4.1.3 Simulation Results and Discussion
4.2 Noisy 4-Quadrant Multiplier
4.2.1 Circuit Description
4.2.2 Circuit Implementation
4.2.3 Simulation Results and Discussion
4.3 Summary
5 Probabilistic Neural Computation
5.1 Introduction
5.2 Continuous Restricted Boltzmann Machine (CRBM)
5.2.1 General architecture
5.2.2 A continuous stochastic neuron
5.2.3 CRBM training
5.2.4 CRBM in VLSI
5.3 Summary
6 Noise in the CRBM
6.1 Introduction
6.2 Zero-mean Gaussian noise in CRBM
6.3 Non-zero mean Gaussian noise in CRBM
6.4 Non-Gaussian 'Pseudo-RTS' noise in the CRBM
6.5 CRBM with noise in Multiplier
6.6 Summary
7 CRBM with Nanoscale MOSFET Noise
7.1 Methodology
7.1.1 Noise data in look-up table
7.1.2 Data look-up
7.2 Modelling data with non-symmetric distribution
7.2.1 4-traps
7.2.2 10-trap and 100-trap RTS in a CRBM
7.3 Summary
8 Summary and Conclusion
8.1 Summary
8.2 Conclusion
8.3 Future work
A MOSFET Channel Voltage Approximation
A.1 Approximating the Quasi-Fermi Potential, V(y)
A.2 Approximating Surface Potential, ψs
B Publication list
References
List of figures
1.1 The flowchart illustrating the chapter flow in this thesis
2.1 Sample of (a) single trap RTS and (b) flicker noise
2.2 An illustration of the traps and charges in Si-SiO2 structures. (Adapted from [1])
2.3 RTS noise amplitude variation with gate VGS and drain VDS voltages for three different trap depths (xt = 0.11, 0.40, 0.80 nm) located in the middle of the gate (yt = 16 nm) for an implementation based on 35 nm NMOS modelled in [2]
2.4 Average capture τc and average emission τe vs gate VGS and drain VDS voltages for a trap located at xt = 0.11 nm and yt = 16 nm for an implementation based on 35 nm NMOS modelled in [2]
3.1 (a) Standard n-channel MOS drain-source current IDS for a given VGS voltage. (b) Noisy n-channel MOS drain current for a given VGS voltage
3.2 Noisy MOSFET components. The noiseless MOSFET is a standard MOSFET having a typical I-V characteristic corresponding to the technology used. The noise source m(t) generates noise data with the specific physical characteristics of the low frequency noise it models
3.3 Illustration of 1/f power spectral density S(f)
3.4 (a) Single trap RTS noise power spectral density, S(f), generated using Eq.(2.5). (b) Time domain RTS noise generated from power spectral density S(f) using Sum-of-Sinusoids techniques
3.5 (a) Single trap RTS noise. (b) Multi(3)-trap RTS noise
3.6 I-V characteristic for (a) 0.35 µm (L = 0.35 µm, W = 1 µm), and (b) 35 nm (L = 35 nm, W = 0.1 µm) 'noiseless' MOSFET
3.7 I-V characteristic for 1/f based noisy (a) 0.35 µm (L = 0.35 µm, W = 1 µm), and (b) 35 nm (L = 35 nm, W = 0.1 µm) NMOS
3.8 I-V characteristics for RTS based noisy (a) 0.35 µm (with L = 0.35 µm, W = 1 µm, trap depth xt = 1.1 nm and lateral location yt = 160 nm) and (b) 35 nm (L = 35 nm, W = 0.1 µm, trap depth xt = 0.11 nm and lateral location yt = 16 nm) MOSFETs. The values σ0, ΔEB, ΔECT and trap VGS were selected from Table 3.1 corresponding to trap depth xt = 1.1 nm
3.9 Time domain noise generated by (a) 0.35 µm and (b) 35 nm noisy MOSFET simulated at static bias conditions (VDS = 1.5 V and VGS = 1 V) for 1 ms
3.10 The drain-source current noise (ΔIDS) for a 35 nm (L = 35 nm, W = 0.1 µm) single trap RTS based noisy MOSFET biased at VGS = 1.0 V and VDS = 0.6 V, with the DC value of the drain current (IDS = 84 µA) removed. (Note that a longer simulation time (1 second) was needed in order to capture more RTS noise.) Based on the fixed bias conditions, ΔIDS, τc, and τe in the RTS based noisy MOSFET were generated to be 2.62e-6 A, 2.918e-3 s, and 1.279e-2 s, respectively
3.11 The power spectral density (PSD), S(f), corresponding to the time domain noises shown in Fig. 3.9. The dash-dot lines are 1/f PSDs generated using Eq.(2.7)
3.12 The power spectral density (PSD), S(f), of a single trap RTS plotted in Fig. 3.10. The dashed line is the calculated PSD based on Eq.(2.5)
3.13 (a) Normalised (ΔID/ID) RTS amplitude for three different trap depths xt = 0.11 nm, 0.4 nm, 0.8 nm with trap lateral position (measured from source) fixed at yt = 16 nm. (b) Normalised (ΔID/ID) RTS amplitude for three different trap lateral positions (measured from source) yt = 5 nm, 16 nm, 30 nm with trap depth fixed at xt = 0.11 nm. The simulation was based upon parameters corresponding to xt = 1.1 nm in Table 3.1
3.14 Mean capture and emission times for three different trap depths with fixed yt = 16 nm: (a) xt = 0.11 nm, (b) xt = 0.4 nm, (c) xt = 0.8 nm
3.15 Mean capture and emission times for three different trap lateral locations with fixed xt = 0.11 nm: (a) yt = 5 nm, (b) yt = 16 nm, (c) yt = 30 nm
3.16 RTS noise generated by noisy MOSFET implemented with (a) 3 traps and (b) 10 traps. Trap parameters were based on the values in Table 3.1 corresponding to xt = 1.1 nm. VDS and VGS were set to 0.6 V and 1 V respectively
3.17 Power spectral density of 1-trap, 3-trap, and 10-trap RTS noise. The dashed line indicates the 1/f noise PSD
4.1 2-quadrant Chible multiplier
4.2 35 nm CMOS technology 2-quadrant multiplier output current Iout
4.3 35 nm CMOS technology 2-quadrant multiplier output current with M1, M4, and M5 replaced with (a) 1/f and (b) single trap RTS based noisy n-MOSFETs
4.4 (a) 1/f and (b) single trap RTS based noisy MOSFET transient plot with VDD, …, Vref set to 1.5 V, 1 V, 0.6 V, and 0.75 V, respectively. Note that a longer simulation time was necessary for RTS noise in order to capture more noise data
4.5 Power spectral density generated from the time domain noise data in Fig. 4.4
4.6 The modified Chible 4-quadrant multiplier adopted from [3]. (a) One computing cell of the modified Chible multiplier (b) the full 4-quadrant multiplier circuit composed of two computing cells
4.7 35 nm CMOS technology 4-quadrant multiplier output current Iout
4.8 The output current Iout for (a) 4-trap, (b) 10-trap, and (c) 100-trap 4-quadrant noisy multiplier simulated for 250 ms with 1 µs time step
4.9 The power spectral densities (PSDs) of the time domain noise data shown in Fig. 4.8, generated using the periodogram function in Matlab
4.10 4-trap noisy synaptic multiplier output noise (20000 data points) for (a) V1 = 0.3 V & V2 = 0.65 V (b) V1 = 0.6 V & V2 = 0.7 V (c) V1 = 1.05 V & V2 = 0.8 V
4.11 Cumulative normalised noisy synaptic multiplier output noise amplitude implemented with (a) 4-trap (b) 10-trap (c) 100-trap
5.1 CRBM network with 3 visible neurons and 4 hidden neurons
5.2 Continuous valued CRBM neuron
5.3 One-step Gibbs sampling
5.4 Typical CRBM neuron circuit implementation
6.1 (a) Training data. (b) 20-step reconstruction of the CRBM with zero mean Gaussian noise
6.2 (a) Weight vector for bias neuron (b) Weight vector for hidden neurons
6.3 Evolution of the noise control parameters of (a) visible neurons and (b) hidden neurons with training epoch
6.4 20-step reconstruction of the CRBM injected with non-zero mean noise (mean = 0.8), trained for 5000 epochs
6.5 20-step reconstruction by the CRBM injected with non-zero mean (mean = 2) Gaussian noise (a) after 500 training epochs (b) after 30000 epochs (c) after 40000 epochs
6.6 Evolution of {a} for (a) visible and (b) hidden neurons, and (c) evolution of hidden-visible weights during training. w01 and w02 in (c) indicate the bias hidden neurons' weights to the visible neurons
6.7 (a) Non-symmetric distribution training data. (b) 20-step reconstruction of the CRBM injected with non-zero mean (mean = 2) Gaussian noise after 10000 epochs
6.8 Artificially generated non-Gaussian noise
6.9 20-step reconstruction of CRBM injected with non-Gaussian noise after 10000 epochs for (a) symmetric distribution (b) non-symmetric distribution training data
6.10 CRBM neurons with localised noise in synaptic multipliers
6.11 20-step reconstruction with zero-mean Gaussian noise injected into every synaptic multiplier with noise variance set to (a) 0.1 (b) 0.05 (c) 0.01
7.1 CRBM neuron with noisy synaptic multiplication
7.2 An illustration of how noisy synaptic multiplication is implemented in Matlab
7.3 Example of data extracted from the LUT. The values for V1 and V2 are mapped to wij and sj respectively
7.4 (a) Non-symmetric distribution training data (b) 20-step reconstruction after 30000 epochs
7.5 CRBM with 4-trap implementation: evolution of the weights {wij} during 30000 training epochs, where i denotes a visible neuron, j a hidden neuron, and index 0 represents bias neurons
7.6 (a) Learnt weight vector after 30000 epochs and (b) evolution of noise control parameters {a} for hidden neurons during training
7.7 20-step reconstruction of CRBM implemented with (a) 10-trap and (b) 100-trap noisy synaptic multiplier output noise after 30000 epochs
A.1 MOSFET cross-section for saturation operation with channel length modulation. The pinch-off point is defined when VDS = VGS - Vth and denoted by VDSsat
List of tables
3.1 Fitting parameters σ0, ΔEB, ΔECT for seven room temperature traps observed in 0.4 µm² n-channel MOSFETs with corresponding estimated trap depth xt [4, 5]
Chapter 1
Introduction
This thesis explores the prospects of using future nanoscale MOSFETs to implement probabilistic computation in hardware, based on modelled future nanoscale MOSFETs and a specific probabilistic neural computation architecture. The word 'future' refers to MOSFETs which are not yet available, although their performance can be predicted from current research efforts, while the word 'nanoscale' refers to MOSFETs with a physical gate length of less than 10 nm (sub-10 nm). The motivation for this research is described in Sec. 1.1, and the contribution to knowledge is clarified in Sec. 1.2. Finally, the structure of this thesis is described in Sec. 1.3.
1.1 Motivation 
The success of metal-oxide-semiconductor field-effect transistors (MOSFETs) as the basic building block of most digital and analogue very large scale integrated (VLSI) circuits is predicted to continue for some years. Currently, 90 nm (with a physical gate length of 50 nm) is the state-of-the-art MOSFET process technology, and it is projected that sub-10 nm physical gate length MOSFETs will be available by 2018 [6]. The drive toward miniaturisation is led by the promise of improved circuit performance, reduced chip sizes, and the potential of higher levels of integration. However, as MOSFET dimensions continue to shrink, considerable challenges arise in the form of uncertainty in device performance and reliability [7, 8].
One of the contributing factors towards this uncertainty in performance and reliability is the increase in low frequency drain current noise. Drain current noise in MOSFETs is predicted to increase as the channel length shrinks [9-12]. Random Telegraph Signal (RTS) noise and 1/f noise are the primary forms of low frequency noise predicted in future nanoscale MOSFETs. In current technology these are minimised or suppressed through design and/or additional fabrication steps. As MOSFET dimensions continue to shrink, their presence will become increasingly significant. Recent studies show low frequency drain current noise amplitudes in excess of 60% in Deep Sub-Micrometer (DSM) MOSFETs¹ [13]. As MOSFETs continue to scale, the low frequency drain current noise in nanoscale MOSFETs² is expected to become a serious issue [14], severely limiting functionality and performance and compromising reliability. A conventional solution would avoid or minimise nanoscale MOSFET noise through additional fabrication processes. For large MOSFETs, fabrication processes may be controlled to reduce noise [6]. However, in a nanoscale fabrication process, precise process control may be impossible, and the performance gain of nanoscale MOSFETs may not justify the enormous fabrication cost. Furthermore, many of the sources of noise and unreliability in DSM MOSFETs are fundamental and will not yield to improved or more careful processing. Solutions based on alternative architectural paradigms, in which the unreliable behaviour of these nanoscale MOSFETs is tolerated or even made useful, therefore become very attractive. For example, the architectures proposed in [15-17] use redundant circuits for error correction to deal with this uncertainty. Other approaches are adaptive (neural networks and probabilistic computing): they adapt to the errors introduced by nanoscale MOSFET noise so that the system still reaches a known (trained) outcome or stays within an acceptable error margin [18-20]. These fault tolerant architectural approaches provide reliability via redundancy, at the expense of circuit area and speed. An unconventional architectural approach that allows for stochasticity, or that even exploits nanoscale MOSFET noise, is therefore an exciting alternative.
Solving the nanoscale MOSFET noise issue requires something of a paradigm shift, in which noise is viewed as a necessary element of useful computation. This differs from the approach reported in [16-18], where artificial neural networks are used because their inherent hardware redundancy, error tolerance and self-organisation offer a very effective means of counteracting the inherent weaknesses of nanoscale MOSFETs. Naturally, it would be unwise to claim that the proposed approach will solve all conventional computing problems. Rather, it provides an opportunity for new computation architectures, such as a probabilistic neural architecture, to be implemented in hardware more efficiently, extending the applicability of nanoscale MOSFETs to specific real-world applications.

¹ Deep Sub-Micrometer (DSM) refers to MOSFETs with a physical gate length of less than 100 nm but greater than 10 nm.
² Low frequency noise in nanoscale MOSFETs will henceforth be referred to as nanoscale MOSFET noise.
The Continuous Restricted Boltzmann Machine (CRBM) aims to deal with real-time data in a noisy environment, potentially in a complex multi-sensory micro-system implementation such as a Lab-On-Chip system [21, 22], coincidentally an area where nanoscale technology may be highly desirable for reasons of size. Such data often encode biological or chemical information of relatively low bandwidth. Probabilistic neural systems are arguably well positioned to address nanoscale MOSFET noise, as they use stochasticity to extract and classify important features in real-world data. This project explores the use of nanoscale MOSFET noise in a probabilistic neural architecture which has shown great potential for realising intelligent embedded systems [21-24].
This work must therefore build a bridge between future nanoscale MOSFET physics and probabilistic neural computation, in order to determine whether intrinsic low frequency drain current noise in future nanoscale MOSFETs is useful for probabilistic computation. If unreliable nanoscale MOSFETs can be shown to be useful in such an application, the technological and economic consequences of their practical implementation may be extremely significant. Furthermore, the methodology developed for this study has more generic usefulness, as will be discussed in this thesis.
1.2 Contribution to knowledge 
This project sets out to explore the suggestion that 
Low frequency drain current noise in future nanoscale MOSFETs can underpin useful prob-
abilistic computation. 
In examining this hypothesis, the project will develop new methods to link DSM device physics, through compact circuit models, to behavioural-level simulations of a relatively well-understood probabilistic paradigm.
The Continuous Restricted Boltzmann Machine (CRBM) has been chosen as an experimental platform. While both nanoscale MOSFET noise [10-12, 25-29] and the CRBM [3, 21-24, 30-32] have been the subject of extensive research, the use of nanoscale MOSFET noise for computation in the CRBM has not been studied, and it is hoped that this project will point the way towards hardware implementations of nano-embedded intelligent systems.
To achieve the objective of this project, temporal fluctuations of nanoscale MOSFET noise must be incorporated into the CRBM. Unfortunately, nanoscale devices are at least a decade away from everyday reality [6]. To pursue this project, temporal fluctuations in nanoscale MOSFETs will therefore be simulated, based upon theoretical compact models of nanoscale MOSFETs extracted from atomistic simulation [25-27, 33]. Current simulators cannot support time domain noise analysis based on nanoscale MOSFET noise characteristics (RTS). Therefore, in order to pursue the main objective of this project, a time domain RTS noise based simulation capability must be developed.
1.3 Chapter layout 
This thesis can be separated into two relatively independent parts, A and B, as illustrated by Fig. 1.1. Part A deals with nanoscale MOSFET noise, while Part B discusses the chosen probabilistic neural model, the CRBM. Chapter 7 brings these together, linking nanoscale MOSFET characteristics and probabilistic neural computation towards one common goal: useful computation. The chapters are:
• Chapter 2 reviews low frequency drain current noise in nanoscale MOSFETs: RTS and 1/f noise.
• Chapter 3 discusses the modelling of a noisy MOSFET to provide a time domain noise analysis capability.
• Chapter 4 analyses the time domain output noise of noisy analogue multiplier implementations, based on the capability developed in Chapter 3.
• Chapter 5 reviews the CRBM algorithm and architecture.
• Chapter 6 analyses the effect of noise on CRBM performance.
• Chapter 7 presents the CRBM implementation with nanoscale MOSFET noise and explores its performance.
• Chapter 8 summarises the contributions of this research and outlines future work.
[Figure 1.1: flowchart of the thesis chapters, starting from the question "Can DSM device noise be …" and leading through "Noise in the CRBM" to Chapter 7, "CRBM with Nanoscale MOSFET Noise". DSM: Deep-Sub-Micrometer; CRBM: Continuous Restricted Boltzmann Machine.]
Figure 1.1: The flowchart illustrating the chapter flow in this thesis.
Chapter 2 
Low Frequency Noise in Nanoscale 
MOSFETs 
2.1 Introduction 
Low frequency noise becomes a dominant limiting factor in the practical use of MOSFETs in 
a circuit implementation as the devices enter nanoscale dimensions. It sets a lower limit to the 
level of signal that can be reliably processed by the circuit. Excessive low frequency noise 
could lead to serious performance and functionality limitations. Therefore, low frequency 
noise in MOSFETs has been studied extensively in the last few decades [4, 9, 11-14, 27, 34, 
35]. 
Random Telegraph Signal (RTS) noise and 1/f noise are the two forms of low frequency noise that are predicted to dominate in future nanoscale MOSFETs [9, 11, 12, 14]. In current technology their effect is insignificant, and in most cases it is minimised or suppressed through either design or additional fabrication processes [6]. As MOSFET dimensions continue to shrink, this noise is predicted to become increasingly significant. Recent studies have shown low frequency drain current noise amplitudes in excess of 60% in Deep-Sub-Micrometer (DSM) MOSFETs [13].
Fig. 2.1 shows an example of simulated time domain 1-trap RTS and 1/f noise for a 35 nm gate-length NMOS transistor, based on a noisy MOSFET model developed in [2]. In reality, 1/f noise is generally more relevant to larger MOSFETs (area > 5-10 µm² [11]), while RTS noise becomes dominant as MOSFETs shrink to nanometre scale [4]. It is widely accepted that the superposition of multiple RTS noise sources gives rise to 1/f noise. Understanding the microscopic origin of RTS noise therefore contributes to understanding the origin of 1/f noise.
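The superposition argument can be checked numerically with a short sketch. It is not the noise model used in this thesis; it assumes, in the spirit of the McWhorter picture, that every trap contributes a Lorentzian spectrum of equal weight, τ/(1 + (2πfτ)²), and that the trap time constants τ are spread log-uniformly over several decades. The function name `lorentzian_sum_psd` and all numerical values are illustrative assumptions.

```python
import numpy as np


def lorentzian_sum_psd(f, taus):
    """Sum of single-trap Lorentzian spectra, one per time constant in `taus`.

    Assumption (illustrative, not from the thesis): each trap contributes
    tau / (1 + (2*pi*f*tau)**2) with equal weight.
    """
    f = np.asarray(f, dtype=float)[:, None]        # shape (n_freq, 1)
    taus = np.asarray(taus, dtype=float)[None, :]  # shape (1, n_traps)
    return np.sum(taus / (1.0 + (2.0 * np.pi * f * taus) ** 2), axis=1)


# Assumed trap population: 400 time constants spread log-uniformly from 1 us to 100 s.
taus = np.logspace(-6, 2, 400)

# Evaluate in a band lying well inside the range of corner frequencies 1/(2*pi*tau).
f = np.logspace(0, 3, 50)
psd = lorentzian_sum_psd(f, taus)

# Slope of log10(S) against log10(f); a 1/f spectrum gives a slope close to -1.
slope = np.polyfit(np.log10(f), np.log10(psd), 1)[0]
print(f"fitted log-log slope: {slope:.3f}")
```

Evaluating the same sum below or above the whole spread of corner frequencies instead gives the flat or 1/f² tails of an individual Lorentzian, which is why a 1/f spectrum requires many traps with widely distributed time constants.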
[Figure 2.1: simulated time domain traces of (a) single trap RTS and (b) flicker noise; horizontal axis: Time (s).]
Figure 2.1: Sample of (a) single trap RTS and (b) flicker noise.
2.2 Random Telegraph Signal (RTS) Noise 
Recent studies show that the low frequency performance of nanoscale MOSFETs is dominated by RTS noise [11, 13]. RTS noise arises from the capture and emission (trapping and detrapping) of hot electrons in the channel by traps (defects) at the Si-SiO2 interface, causing discretised drain current fluctuations, as seen in Fig. 2.1(a) [2].
RTS noise is characterised by three parameters: the average amplitude of the fluctuation ΔID, the mean capture time τc, and the mean emission time τe. All these parameters vary over wide ranges with device size, temperature, and bias conditions, and models describing these dependencies have been developed [4, 27, 28, 36].
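The role of the three parameters can be illustrated with a short simulation. The sketch below is not the RTS generator developed in this thesis; it assumes the trap behaves as a two-state Markov process with exponentially distributed dwell times (mean τc while empty, mean τe while filled) and that a filled trap lowers the drain current by ΔID. The function name `single_trap_rts` and the parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np


def single_trap_rts(tau_c, tau_e, delta_i, dt, n_steps, seed=0):
    """Sample the drain-current fluctuation of one trap on a uniform time grid.

    Assumptions (illustrative, not the thesis's generator): the trap is a
    two-state Markov process; tau_c is the mean time the trap stays empty
    before capturing an electron, tau_e the mean time it stays filled before
    emitting it; a filled trap lowers the drain current by delta_i.
    Returns Delta I_D(t): 0 while the trap is empty, -delta_i while filled.
    """
    rng = np.random.default_rng(seed)
    rate_c, rate_e = 1.0 / tau_c, 1.0 / tau_e   # capture and emission rates
    total = rate_c + rate_e
    decay = 1.0 - np.exp(-total * dt)
    # Exact one-step transition probabilities of a two-state Markov process.
    p_capture = (rate_c / total) * decay        # empty  -> filled within dt
    p_emit = (rate_e / total) * decay           # filled -> empty  within dt

    u = rng.random(n_steps)
    filled = np.empty(n_steps, dtype=bool)
    # Start from the stationary distribution: P(filled) = tau_e / (tau_c + tau_e).
    state = rng.random() < tau_e / (tau_c + tau_e)
    for k in range(n_steps):
        filled[k] = state
        if state:
            state = u[k] >= p_emit      # stays filled unless an emission occurs
        else:
            state = u[k] < p_capture    # becomes filled if a capture occurs
    return np.where(filled, -delta_i, 0.0)


# Illustrative values only: tau_c = 3 ms, tau_e = 13 ms, delta_i = 2.6 uA, 4 s at 10 us steps.
trace = single_trap_rts(3e-3, 1.3e-2, 2.6e-6, dt=1e-5, n_steps=400_000, seed=1)
print("fraction of time filled:", np.mean(trace < 0))
```

Over a long trace the fraction of time the trap is filled should approach τe/(τc + τe), about 0.81 for these values, which gives a quick sanity check on any generated RTS.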
2.2.1 Origin of traps 
There are four kinds of defects or traps commonly associated with an Si-SiO2 interface: mobile ions, fixed charges, interfacial traps, and induced charges [1]. These defects/traps, illustrated in Fig. 2.2, are briefly described below, based on the detailed descriptions in [1].
• Mobile ions, typically Na+ and K+, lying within the SiO2, are introduced through contamination during the fabrication process. They move around or redistribute under bias-temperature stressing, producing instability in the MOSFET's characteristics.
• Fixed charges exist due to excess ionic silicon that has broken away during the oxidising reaction at the Si-SiO2 interface, which explains why fixed charges are located in the SiO2. Unlike mobile ions, these fixed charges are consistent for a given set of fabrication conditions.
• Interfacial traps, which are situated at the Si-SiO2 interface, are believed to arise from unsatisfied chemical bonds, so-called "dangling bonds", at the surface of the Si during thermal formation of the SiO2 layer. The interfacial traps introduce energy levels in the forbidden band gap at the Si-SiO2 interface and remain fixed in energy relative to the conduction band and valence band energies.
• Induced charges are introduced into the SiO2 by ionising radiation or hot carrier stress. The induced charges may be positively or negatively charged. They influence the MOSFET's characteristics by increasing or reducing the threshold voltage.
[Figure 2.2: schematic of an Si-SiO2 structure showing induced charges and fixed charges in the SiO2 and interfacial traps at the Si-SiO2 interface.]
Figure 2.2: An illustration of the traps and charges in Si-SiO2 structures. (Adapted from [1]).
Interface traps and induced charges, unlike mobile ions and fixed charges, are readily influenced by the bias conditions of the MOSFET. If they are within the tunnelling distance (<2 nm) of the hot electrons, they can be charged or discharged, creating fluctuations in the MOSFET's characteristics (i.e. the drain current I_DS) in the form of RTS.
2.2.2 RTS noise amplitude 
The discretised drain current noise in a nanoscale MOSFET is the combined effect of carrier number fluctuations and carrier mobility fluctuations [9,37]. The normalised amplitude of this discrete fluctuation is described by the following general relation:

$$\frac{\Delta I_D}{I_D} = \frac{\Delta N}{N} \pm \frac{\Delta\mu}{\mu} \qquad (2.1)$$

where N is the number of channel carriers per unit area and μ is the carrier mobility. The term Δμ/μ in Eq.(2.1) describes the effect of mobility fluctuations caused by Coulombic scattering from the charged traps [4,37-39]. The sign (±) indicates the electronic state of the trap after capturing an electron [37], i.e. charged (+) or neutral (−) [36]. A trap that is charged after capturing an electron increases the scattering effect, which in turn increases the noise amplitude ΔI_D [37]. If a trap becomes neutral after capturing an electron (i.e. it is charged when empty), the Coulombic scattering becomes weaker, reducing the noise amplitude ΔI_D [37]. As MOSFETs enter nanoscale dimensions, the carrier number fluctuations (ΔN/N) become dominant [9], so the term ±Δμ/μ in Eq.(2.1) can be dropped. The remaining term ΔN/N describes the carrier number fluctuations caused by the capture and emission of electrons by the trap. When a trap captures an electron from the channel, the effective drain-to-source current drops; when the trapped electron is released into the channel, the effective drain-to-source current increases. The normalised amplitude ΔI_D/I_D, dominated by ΔN/N, can therefore be described by [40,41]:
$$\frac{\Delta I_D}{I_D} = \alpha\,\frac{g_m}{I_D}\,\frac{q}{WLC_{ox}}\left(1 - \frac{x_t}{t_{ox}}\right) \qquad (2.2)$$
where g_m is the channel transconductance, q is the electronic charge, W is the channel width, L is the channel length, C_ox is the gate oxide capacitance, t_ox is the gate oxide thickness, and x_t is the trap depth measured from the Si-SiO2 interface. α in Eq.(2.2) is a semi-empirical parameter used to account for the wide variation of RTS amplitude [40] caused by the short channel effect in MOSFETs operating in weak inversion [41]; α lies in the range 0.1 to 100 [42].

Figure 2.3: RTS noise amplitude variation with gate V_GS and drain V_DS voltages for three different trap depths (x_t = 0.11, 0.40, 0.80 nm) located in the middle of the gate (y_t = 16 nm), for an implementation based on the 35 nm NMOS modelled in [2].
The dependence of typical RTS noise amplitudes on bias is shown in Fig.2.3. The noise amplitudes peak at low gate voltages (weak inversion) and decrease at higher gate voltages. Recent studies have shown that RTS noise amplitudes vary by some 40% in weak inversion compared to 5% in strong inversion [27]. In addition, it has been reported that shallower traps (small x_t) produce a larger RTS noise amplitude [37], which is in agreement with Eq.(2.2).
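To make the depth dependence in Eq.(2.2) concrete, the short Python sketch below evaluates the normalised amplitude for a single trap. It is illustrative only: the function name, the default α = 1, the use of C_ox = 3.9ε_0/t_ox, and the example value g_m/I_D ≈ 30 V^-1 (roughly 1/(1.3 kT/q) in weak inversion) are assumptions, not values taken from this work. The geometry (W = 0.1 μm, L = 35 nm, t_ox = 0.88 nm) is that of the 35 nm NMOS used later in Chapter 3.

```python
import math

Q = 1.602176634e-19               # electronic charge (C)
EPS_OX = 3.9 * 8.8541878128e-12   # SiO2 permittivity (F/m), assuming relative permittivity 3.9

def rts_relative_amplitude(gm_over_id, W, L, c_ox, x_t, t_ox, alpha=1.0):
    """Normalised RTS amplitude dI_D/I_D from Eq.(2.2).

    gm_over_id : transconductance efficiency g_m/I_D (1/V)
    W, L       : channel width and length (m)
    c_ox       : gate oxide capacitance per unit area (F/m^2)
    x_t, t_ox  : trap depth and oxide thickness (m)
    alpha      : semi-empirical factor (0.1 to 100); 1.0 is a hypothetical default
    """
    if not 0.0 <= x_t <= t_ox:
        raise ValueError("trap depth must lie inside the gate oxide")
    return alpha * gm_over_id * Q / (W * L * c_ox) * (1.0 - x_t / t_ox)

# Example: 35 nm NMOS geometry with an assumed weak-inversion g_m/I_D of 30 /V.
t_ox = 0.88e-9
c_ox = EPS_OX / t_ox
for x_t in (0.11e-9, 0.40e-9, 0.80e-9):   # the trap depths of Fig.2.3
    amp = rts_relative_amplitude(30.0, 0.1e-6, 35e-9, c_ox, x_t, t_ox)
    print(f"x_t = {x_t * 1e9:.2f} nm -> dI_D/I_D = {100 * amp:.2f} %")
```

The amplitude falls linearly with x_t and vanishes at x_t = t_ox, consistent with the observation that shallower traps give larger amplitudes.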
2.2.3 RTS average capture and average emission time 
The capture and emission of electrons cause fluctuations in channel conductance, which in turn cause the drain current to fluctuate. Capture and emission are stochastic events obeying Poisson statistics, and are normally described by their average values τ_c and τ_e, respectively [36]. τ_c represents the mean time that a trap is empty before capturing an electron, and τ_e represents the mean time before a trapped electron is released. Eq.(2.3) and Eq.(2.4) [4] describe the basic empirical model for τ_c and τ_e that is used in this work.
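$$\tau_c = \frac{\exp\left(\Delta E_B / kT\right)}{\sigma_0 v_{th} n} \qquad (2.3)$$

$$\tau_e = \frac{\exp\left[(\Delta E_B + \Delta E_{CT}) / kT\right]}{g\,\sigma_0 v_{th} n} \qquad (2.4)$$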
In Eq.(2.3) and Eq.(2.4), σ_0 is the trap cross-section pre-factor, ΔE_B is the barrier energy for capture (also known as the activation energy for capture), ΔE_CT is the trap binding energy, ΔE_B + ΔE_CT is the emission activation energy, v_th is the average thermal velocity, n is the channel electron concentration, k is the Boltzmann constant, T is the absolute temperature, and g is the degeneracy factor, which is normally set to 1 [4].
The capture and emission times are governed by thermally activated processes and therefore decrease as temperature increases, as described by both Eq.(2.3) and Eq.(2.4). For a fixed operating temperature, the capture and emission times are affected by bias conditions through σ_0, n and ΔE_CT. It has been reported that σ_0 and ΔE_CT depend strongly and positively on gate voltage V_GS, while ΔE_B depends on neither V_GS nor V_DS [4,5,28]. The dependence of the channel electron concentration n on drain voltage V_DS is strongly influenced by the position along the channel at which the trap is located. For an n-MOSFET, n near the source is not affected by V_DS, while n near the drain shows a strong inverse relationship with V_DS [2]. An example of the variation of mean capture and emission times with bias conditions, for a trap near the Si-SiO2 interface (x_t = 0.11 nm) located in the middle of the channel (y_t = 16 nm), is shown in Fig.2.4.
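The following Python sketch evaluates Eq.(2.3) and Eq.(2.4) for one set of trap parameters. The values of σ_0, ΔE_B, ΔE_CT and T are borrowed from one row of Table 3.1 (Chapter 3) purely for illustration; the thermal velocity v_th = 10^7 cm/s and electron concentration n = 10^18 cm^-3 are assumed round numbers, and the function names are hypothetical.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant (eV/K)

def mean_capture_time(sigma0, v_th, n, dE_B, T):
    """Eq.(2.3): tau_c = exp(dE_B/kT) / (sigma0 v_th n).

    sigma0 in cm^2, v_th in cm/s, n in cm^-3, energies in eV -> seconds.
    """
    return math.exp(dE_B / (K_B_EV * T)) / (sigma0 * v_th * n)

def mean_emission_time(sigma0, v_th, n, dE_B, dE_CT, T, g=1.0):
    """Eq.(2.4): tau_e = exp((dE_B + dE_CT)/kT) / (g sigma0 v_th n)."""
    return math.exp((dE_B + dE_CT) / (K_B_EV * T)) / (g * sigma0 * v_th * n)

# Trap parameters from one row of Table 3.1; v_th and n are assumed values.
sigma0, dE_B, dE_CT, T = 2.04e-15, 0.593, 0.108, 300.0
v_th, n = 1.0e7, 1.0e18
tau_c = mean_capture_time(sigma0, v_th, n, dE_B, T)
tau_e = mean_emission_time(sigma0, v_th, n, dE_B, dE_CT, T)
print(f"tau_c = {tau_c:.3e} s, tau_e = {tau_e:.3e} s")
```

Because both times share the prefactor σ_0 v_th n, the electron concentration n scales τ_c and τ_e equally, while their ratio τ_e/τ_c = exp(ΔE_CT/kT)/g depends only on the trap binding energy, temperature and degeneracy factor.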
2.2.4 Discussion 
The capture and emission (trapping and detrapping) of channel hot electrons by a trap (defect) at the Si-SiO2 interface causes discretised drain current fluctuations with an amplitude exhibiting a non-Gaussian distribution [11]. The power spectral density (PSD) of single-trap RTS noise exhibits a Lorentzian spectrum, described by [4,28,29]:

$$S(f) = \frac{4(\Delta I)^2}{(\tau_c + \tau_e)\left[\left(\frac{1}{\tau_c} + \frac{1}{\tau_e}\right)^2 + (2\pi f)^2\right]} \qquad (2.5)$$

Figure 2.4: Average capture time τ_c and average emission time τ_e vs gate V_GS and drain V_DS voltages for a trap located at x_t = 0.11 nm and y_t = 16 nm, for an implementation based on the 35 nm NMOS modelled in [2].
The number of discrete levels of drain current clearly depends on the number of traps. In practical MOSFETs of 1.0 × 0.15 μm^2 dimensions, the interface trap density is usually of the order of 10^10 eV^-1 cm^-2 [43], of which several could be active (i.e. energy

$$S_I(f) = \sum_{k=1}^{N_{traps}} \frac{4(\Delta I)_k^2}{(\tau_c + \tau_e)_k\left[\left(\frac{1}{\tau_c} + \frac{1}{\tau_e}\right)_k^2 + (2\pi f)^2\right]} \qquad (2.6)$$

where N_traps represents the total number of active traps at the Si-SiO2 interface and S_I(f) is the current noise power spectral density summed over all traps k = 1, 2, 3, ..., N_traps. For a large number of active traps N_traps, uniformly distributed (both throughout the oxide and in energy [44]) with a wide distribution of time constants (τ_c and τ_e), the superposition of Lorentzian spectra in Eq.(2.6) gives rise to an S_I(f) with a 1/f form [4,12].
This provides a physical justification for the observation that the superposition of several RTS noise sources gives rise to 1/f noise in MOSFETs.
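The superposition argument can be checked numerically. The Python sketch below sums the single-trap Lorentzians of Eq.(2.5), as in Eq.(2.6), over a hypothetical trap population and estimates the log-log slope of S_I(f) between 100 Hz and 10 kHz. The population is an assumption chosen for illustration: equal amplitudes, τ_c = τ_e for each trap, and time constants distributed log-uniformly between 10^-7 s and 10^-1 s, which stands in for the "wide distribution of time constants" above.

```python
import math
import random

def lorentzian(f, dI, tau_c, tau_e):
    """Single-trap RTS power spectral density, Eq.(2.5)."""
    tau_sum = tau_c + tau_e
    corner = 1.0 / tau_c + 1.0 / tau_e
    return 4.0 * dI ** 2 / (tau_sum * (corner ** 2 + (2.0 * math.pi * f) ** 2))

def total_psd(f, traps):
    """Eq.(2.6): sum of the Lorentzians of all active traps."""
    return sum(lorentzian(f, dI, tc, te) for dI, tc, te in traps)

rng = random.Random(1)
traps = []
for _ in range(2000):
    tau = 10 ** rng.uniform(-7, -1)     # log-uniform time constant (assumed)
    traps.append((1e-9, tau, tau))      # equal amplitudes, tau_c = tau_e (assumed)

f1, f2 = 1e2, 1e4
slope = math.log10(total_psd(f2, traps) / total_psd(f1, traps)) / math.log10(f2 / f1)
print(f"log-log slope between {f1:g} Hz and {f2:g} Hz: {slope:.2f}")
```

Each Lorentzian is flat below its corner frequency and falls as 1/f^2 above it; with corner frequencies spread evenly on a log scale, the sum falls close to 1/f over the band, so the printed slope is near -1.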
2.3 Flicker (1/f) Noise
The origin of 1/f noise in MOSFETs has been extensively studied for more than a decade [9,14,44-46]. It is now generally agreed that 1/f noise in MOSFETs is associated with both carrier number fluctuations and correlated carrier mobility fluctuations [4,47]. Carrier number fluctuations come from the random trapping and detrapping of free carriers in the oxide traps near the Si-SiO2 interface, where the trapped carriers limit the mobility of the free carriers near the interface by Coulombic scattering [4].
2.3.1 1/f noise model
In practice, MOSFET drain current 1/f noise is commonly described by its Power Spectral Density (PSD). A typical MOSFET drain current 1/f noise PSD has the following form [48]:
$$S_I(f) = \frac{K_f\, I_{DS}^{a_f}}{C_{ox} L_{eff}^2\, f^{e_f}} \qquad (2.7)$$
where K_f and a_f are process-dependent constants that may vary from sample to sample. The noise exponent a_f is typically between 0.5 and 2 [49]. The 1/f noise coefficient K_f was claimed in [47] to be bias-dependent, but in common practice [48] K_f is considered constant. The frequency exponent e_f is a bias-dependent parameter with a typical value between 0.7 and 1.2 [50]. C_ox is the gate oxide capacitance and L_eff is the effective channel length of the MOSFET.
A unified model of a MOSFET's drain current 1/f noise PSD incorporating both carrier number fluctuations and mobility fluctuations has been proposed by Hung [45]. The widely adopted analytical expression of the unified model in strong inversion is given by [48]:
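$$S_I(f) = \frac{kTq^2 I_{DS}\mu_{eff}}{\gamma f^{e_f} L_{eff}^2 C_{ox}}\left[A\ln\frac{N_0 + N^*}{N_L + N^*} + B(N_0 - N_L) + \frac{C}{2}\left(N_0^2 - N_L^2\right)\right] + \frac{\Delta L_{clm}}{W_{eff} L_{eff}^2}\,\frac{kT I_{DS}^2}{\gamma f^{e_f}}\,\frac{A + BN_L + CN_L^2}{\left(N_L + N^*\right)^2} \qquad (2.8)$$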
$$qN_0 = C_{ox}(V_{GS} - V_{th}) \qquad (2.9)$$

$$qN_L = C_{ox}(V_{GS} - V_{th} - V_{DS}) \qquad (2.10)$$
and N* = (kT/q^2)(C_ox + C_d + C_it), where C_d and C_it are the depletion-layer and interface-trap capacitances, respectively. γ is the attenuation coefficient of the electron wave function in the oxide, with a typical value of 10^8 cm^-1 [48]. A, B and C are technology-dependent model parameters. ΔL_clm is the reduction in electrical channel length due to channel length modulation. N_0 and N_L are the charge densities at the source and drain ends of the channel, respectively.
2.3.2 Discussion 
Eq.(2.7) is a simple model of the MOSFET's drain current 1/f noise PSD, used largely for 'long channel' MOSFETs [47]. Nevertheless, it correctly explains the general frequency and size dependence of 1/f noise in any MOSFET. In practice, Eq.(2.7) is much easier to understand and implement, especially for an initial approximation of a MOSFET's low frequency noise characteristics.
The carrier trapping-detrapping approach to 1/f noise provides a better physical explanation of 1/f noise in MOSFETs. Eq.(2.8) is a complex model based on this approach and is considered to be a 'complete' model, capable of expressing 1/f noise in all MOSFET operating regions in strong inversion. In the absence of technology-dependent model parameters and charge density estimates, however, Eq.(2.8) may not be immediately useful for describing 'accurate' low frequency noise in MOSFETs. In this case, Eq.(2.7) provides a reasonable approximation of Eq.(2.8).
2.4 Summary 
This chapter has drawn together results from the extensive research and literature on Deep-Sub-Micrometer (DSM) noise modelling. It has been established that an approach based on the capture and emission of electrons by traps in the vicinity of the Si-SiO2 interface provides a clear and simple explanation of the dominant low frequency noise sources (RTS and 1/f) in nanoscale MOSFETs, accounting for both carrier number fluctuations and carrier mobility fluctuations. In general, low frequency noise is inversely proportional to MOSFET gate area. For a small MOSFET, RTS noise dominates at low frequencies, producing a discretised drain current. For large MOSFETs, the superposition of RTS noise gives rise to 1/f noise.
Chapter 3 
Modelling 'Noisy' MOSFETs 
This thesis aims to demonstrate the principle that nanoscale MOSFET low frequency noise 
can underpin useful probabilistic behaviour. The low frequency noise is used as a source of 
probabilistic behaviour in a modelled nanoscale silicon 'neuron' that adapts to the natural 
variability in input data. For this purpose, a noisy MOSFET model that accurately represents 
the temporal fluctuation of a real nanoscale MOSFET is required. This chapter describes the 
method used to model the noisy MOSFET. The capability to represent the dominant nanoscale MOSFET low frequency noise sources (RTS and 1/f) will be demonstrated and verified. The noisy
MOSFET model will be used to implement a key circuit for the silicon 'neuron' in Chapter 
4. 
3.1 Introduction 
The aim of this study is to develop a SPICE-based behavioural model of a 'noisy' MOSFET, preserving the underlying, essential MOSFET transfer characteristics, as illustrated in Fig.3.1.
A credible, computationally-simple noisy MOSFET model is critical to this work. At this 
concept-proving stage, however, simplicity and simulation speed are more important than 
great accuracy. It is not, therefore, the objective of this study to develop a complex, arbitrarily 
accurate noisy DSM MOSFET model. Rather, the focus is to develop a model that produces 
a correct form of noise behaviour, valid in all operating regimes, and is easy to implement in 
a readily-available circuit simulator. 
The work described in this chapter focuses on two low frequency MOSFET noise models: 1/f noise and RTS noise. 1/f noise in general is more relevant to larger-area MOSFETs (> 5-10 μm^2 [11]), while as the MOSFET shrinks to nanometer scale, RTS noise becomes dominant [11,36]. Current models predict that the superposition of RTS noise produces 1/f noise, and that the transition from a 1/f-dominated regime to one of true RTS noise is gradual, not a first-order 'phase transition' [4,11,45].

Figure 3.1: (a) Standard n-channel MOS drain-source current I_DS for a given V_GS voltage. (b) Noisy n-channel MOS drain current for a given V_GS voltage. (Horizontal axes: V_DS, arbitrary units.)
3.2 Methodology Overview 
The noisy MOSFET was modelled by augmenting a conventional noiseless MOSFET with a noise source n(t) (Fig.3.2). In this study, the initial MOSFET model was that of a 0.35 μm AMS CMOS technology. However, the overall and final implementation of the noisy MOSFET was based upon a 35 nm gate length MOSFET for the 90 nm CMOS technology node, extracted from atomistic simulation [25,26,33].
The noiseless MOSFET is used to generate the I_DS response of the noisy MOSFET at a particular time instant t for the given bias conditions (V_GS and V_DS). As the bias conditions change in time, the noisy MOSFET drain current changes correspondingly. The methodology uses the static I_DS model in transient analysis. This is valid because I_DS reaches steady state within picoseconds [51], while the low frequency noise fluctuates on a timescale of microseconds.

Figure 3.2: Noisy MOSFET components (gate G, source S, noise source). The noiseless MOSFET is a standard MOSFET having a typical I-V characteristic corresponding to the technology used. The noise source n(t) generates noise data with the specific physical characteristics of the low frequency noise it models.
The noise source n(t) represents the 1/f or RTS behaviour of the noisy MOSFET. Time-domain models of 1/f noise are calculated from the 1/f Power Spectral Density (PSD) (described in Sec.2.3.1) generated for each bias point of the MOSFET. For RTS noise, time-domain models are developed using RTS parameters (amplitude and time statistics) calculated at each bias point of the MOSFET. Details are given in Sections 3.3 and 3.4.
To ensure that the methodology used for generating noise does indeed produce time-domain 
noise with the intended spectral characteristics, the PSD of the generated time-domain noise 
is calculated using a PSD extraction algorithm. 
In general, the DC drain-source current I_DS generated at a given time t is used by the noise source n(t) to generate one noise datum. The noise data generated in the time domain have the correct form of the noise behaviour that the noise source n(t) is intended to model. If bias conditions change in time, the noise source n(t) generates non-stationary noise data corresponding to the changing DC response of the noiseless MOSFET.
The noisy MOSFET model is used as a 'standard' component in circuits. It is essential that 
the method of generating noisy drain current does not otherwise disrupt the use of the circuit 
simulation under transient analysis. Therefore, a high-level analogue behavioural language, 
called Verilog-A, is used to model the noisy MOSFET. Implementation using Verilog-A allows full control of the noisy MOSFET behaviour, in addition to implementation flexibility for future modification.
3.3 Generating 1/f Noise
The objective of generating time-domain 1/f noise is somewhat different from that of most 
other work [52, 53]. Although the method chosen has been used to develop commercial 
transient noise analysis [53], this is not the aim of this work. The aim is to model a noisy 
MOSFET in a circuit implementation of a particular computational architecture, in order to 
explore the effects of noise on its performance. 
3.3.1 Methodology: 1/f noise
The critical part of modelling the noisy MOSFET is the implementation of the 1/f noise source n(t). The 1/f noise source should exhibit the correct form of noise characteristic corresponding to the MOSFET technology and bias conditions.
In this work, 1/f noise was generated using the Sum-of-Sinusoids technique adapted from [52]. Although other approaches are possible, this technique was judged the most suitable for a Verilog-A implementation. The technique generates the time-domain 1/f noise data n(t) by summing a fixed number N_f of random-phase sinusoids in a specified frequency band [52]:
$$n(t) = \sum_{i=1}^{N_f} a_i(t)\sin\left(2\pi f_i t + \varphi_i\right) \qquad (3.1)$$

where a_i is the magnitude, f_i is the frequency and φ_i is the random phase defining the i-th sinusoid.
N_f represents the number of sinusoids used to approximate the noise datum at a given time t. For a fixed-frequency-band (band-limited) PSD S(f) (Eq.(2.7)), N_f depends on the division of the frequency band into frequency steps Δf. A smaller Δf means more sinusoids, a better approximation of n(t), and a longer transient record before the generated waveform repeats. Depending on the shape of S(f), the frequency band may be divided linearly or logarithmically.
For each frequency interval f_i to f_i + Δf of a given PSD S(f), shown in Fig.3.3, the magnitude a_i is approximated as:

$$a_i = \sqrt{2\int_{f_i}^{f_i + \Delta f} S(f)\,df} \qquad (3.2)$$
During simulation, as bias conditions change, S(f) changes accordingly. For 1/f noise, the PSD depends on the drain-source current I_DS (S(f) ∝ I_DS^a_f), as described by Eq.(2.7) in Sec.2.3.1. Consequently, according to Eq.(3.2), the magnitudes a_i also change. The values of a_i are updated at each time step to cater for changes in bias conditions. Changes in bias conditions therefore affect the generated 1/f noise magnitude and spectral characteristics through Eq.(3.2) and Eq.(3.1). This allows the modelled noise to be non-stationary when bias conditions change with time, at the expense of lengthy simulations.
f_i represents the frequency of the i-th sinusoid. φ_i is a random phase angle, uniformly distributed between 0 and 2π; each sinusoid has its own φ_i value, which is held fixed across t.
3.3.2 Implementation: 1/f noise
The initial n-channel MOSFET (NMOS) used to establish our methodology was based upon a 0.35 μm AMS CMOS model. Gate length L, width W and oxide thickness T_ox were set to 0.35 μm, 1 μm and 7.7 nm, respectively. The 1/f noise coefficient K_f, exponent a_f and frequency exponent e_f were set to 2.81e-27 A·F, 1.4 and 1, respectively. These values were taken from the AMS CMOS technology file used in simulation. In order to generate statistically significant noise amplitudes useful for this work, K_f was arbitrarily increased to 2.81e-23 A·F.
Figure 3.3: Illustration of the 1/f power spectral density S(f) (horizontal axis: frequency f (Hz)), showing a frequency interval f_i to f_i + Δf above the lower band limit f_min.
The DSM implementation was based upon a 35 nm gate length NMOS model developed using atomistic simulation, with L = 35 nm, W = 0.1 μm and T_ox = 0.88 nm. The flicker noise parameters were not available for this 35 nm NMOS model, so in this study the values used for the 0.35 μm CMOS technology were re-used for the 35 nm NMOS implementation.
These values and this approach are not acceptable for a thorough exploration of noise in 35 nm MOSFETs. However, as explained in Sec.3.3, they are more than adequate for the aims of this study, capturing as they do the most important characteristics of DSM noise at circuit level.
Using these parameters, the 1/f noise source n(t) was implemented based on the method in Sec.3.3.1. 1/f noise is thought to be important below 10 kHz [9,46], so we chose to generate 1/f noise between 100 Hz (f_min) and 10 kHz (f_max). For given bias conditions (i.e. I_DS), frequency band and MOSFET parameters, a corresponding PSD can then be generated from Eq.(2.7).
This PSD is then used to generate the amplitudes of the sinusoids in Eq.(3.1) using Eq.(3.2). The frequency band was initially divided linearly with Δf set to 100 Hz. Therefore, for this frequency band and frequency step, there are (10 kHz − 100 Hz)/100 Hz = 99 terms in Eq.(3.1).
It should be noted that this calculation is performed to generate one noise datum at a particular time instant t; the noise datum ΔI_DS(t) is then added to produce I_DS(t) + ΔI_DS(t). At the next simulation time step, the process is repeated.
The implementation steps for generating time-domain 1/f noise are summarised as follows:
At t = 0 (initialisation):
• Set the lowest frequency f_min and the highest frequency f_max.
• Set the frequency step Δf for dividing the band-limited PSD S(f).
• Calculate the number of sinusoids: N_f = (f_max − f_min)/Δf.
• Generate N_f random phase angles φ_i using a random number generator.
At each simulation time step t:
1. Probe the noiseless MOSFET drain-source current I_DS at time t.
2. Use I_DS to generate the band-limited (f_min to f_max) PSD S(f) using Eq.(2.7).
3. Approximate the magnitude a_i for each frequency interval f_i to f_i + Δf using Eq.(3.2).
4. Calculate the noise datum n(t) using Eq.(3.1).
5. Add the noise datum n(t) to I_DS to produce the noisy I_DS.
6. Repeat steps 1-5 until the end of the simulation time.
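The procedure above can be sketched in Python as follows. This is not the Verilog-A implementation used in this work; it is a minimal sketch under stated assumptions: C_ox is derived from T_ox = 7.7 nm with ε_ox = 3.9ε_0, the integral in Eq.(3.2) is evaluated with a midpoint rule, each sinusoid uses the lower edge f_i of its interval, and the class name `SumOfSinusoidsNoise` and the 50 μA bias are hypothetical. The K_f, a_f, e_f and L values are those quoted in Sec.3.3.2.

```python
import math
import random

EPS_OX = 3.9 * 8.8541878128e-12   # SiO2 permittivity (F/m), assumed
C_OX = EPS_OX / 7.7e-9            # 0.35 um process, T_ox = 7.7 nm (F/m^2)

def flicker_psd(f, i_ds, k_f=2.81e-23, a_f=1.4, e_f=1.0, c_ox=C_OX, l_eff=0.35e-6):
    """Eq.(2.7): S(f) = K_f I_DS^a_f / (C_ox L_eff^2 f^e_f)."""
    return k_f * i_ds ** a_f / (c_ox * l_eff ** 2 * f ** e_f)

class SumOfSinusoidsNoise:
    """Hypothetical helper mirroring the initialisation and per-step loop above."""

    def __init__(self, f_min=100.0, f_max=10e3, df=100.0, seed=0):
        rng = random.Random(seed)
        self.df = df
        self.n_f = round((f_max - f_min) / df)                  # N_f = (f_max - f_min)/df
        self.freqs = [f_min + i * df for i in range(self.n_f)]  # lower edge f_i of each interval
        # phi_i drawn once at t = 0 and held fixed for the whole simulation
        self.phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(self.n_f)]

    def magnitudes(self, i_ds, sub=16):
        """Eq.(3.2): a_i = sqrt(2 * integral of S(f) over [f_i, f_i + df]), midpoint rule."""
        h = self.df / sub
        mags = []
        for f_i in self.freqs:
            area = h * sum(flicker_psd(f_i + (k + 0.5) * h, i_ds) for k in range(sub))
            mags.append(math.sqrt(2.0 * area))
        return mags

    def sample(self, t, i_ds):
        """Eq.(3.1): one noise datum n(t) for the bias point I_DS(t)."""
        mags = self.magnitudes(i_ds)  # recomputed every step so the noise tracks the bias
        return sum(a * math.sin(2.0 * math.pi * f * t + p)
                   for a, f, p in zip(mags, self.freqs, self.phases))

# Usage: add one noise datum to the (assumed) noiseless drain current at each step.
src = SumOfSinusoidsNoise()
dt = 10e-6
i_ds_trace = [50e-6] * 200          # hypothetical constant bias, I_DS = 50 uA
noisy = [i + src.sample(k * dt, i) for k, i in enumerate(i_ds_trace)]
```

Since each sinusoid of amplitude a_i carries power a_i^2/2, the factor of 2 in Eq.(3.2) makes the total power of n(t) equal the integral of S(f) over the band. Recomputing the magnitudes at every step is what lets the noise follow a time-varying I_DS, at the cost of simulation time.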
3.4 Generating RTS Noise 
RTS noise in MOSFETs has been accepted and studied for some time [4,37,54,55]. There is broad agreement that the physical origin of RTS noise is the trapping and detrapping of electrons by traps located at or near the Si-SiO2 interface [28,43], and that RTS noise causes discretised drain current fluctuations. However, a 'standard' and generally accepted model for RTS noise that is valid for all operating regimes does not yet exist [28].
The number of discrete values of I_DS depends on the number of traps [38]. Most RTS noise models are based upon single-trap capture and emission activity [4,28], and are restricted to room temperature operation in the ohmic regime in strong inversion [4,28,43].
In this work, we assembled a broader RTS model based upon previously published work [4,28,36,37,41,55-57], and made some pragmatic assumptions to enable the models to be used in our circuit and target architecture.
3.4.1 Assumptions: RTS noise 
The RTS noise is generated based upon existing models, which were developed and extended with the following assumptions and restrictions:
• Only electron traps are considered, and the noisy MOSFET modelling was limited to n-type devices at this stage, as very few models of hole traps have been reported.
• All traps are considered neutral when empty. Attractive and neutral traps have larger cross-sections (in the range of 10^-14 to 10^-12 cm^2 and 10^-18 to 10^-14 cm^2, respectively) than repulsive (negative) traps, which have a cross-section smaller than 10^-18 cm^2 and a concomitantly low capture probability [58]. Neutral traps produce larger RTS amplitudes than attractive traps and therefore represent the normally observable, and hence analysable, RTS noise [37].
• Active traps reside in the oxide adjacent to the Si-SiO2 interface, at depths within the tunnelling distance (2 nm) of any hot electrons, limited by the gate oxide thickness t_ox.
• Adjacent traps are at least 2 nm apart. This is important to ensure that traps are electrostatically isolated [34], so that there is no interaction, such as tunnelling, between traps [4].
• The capture and emission of an electron by a single trap are mutually exclusive events. Only one electron can be trapped by each trap at any particular time. It was reported in [4] that RTS noise due to multi-electron trapping by a single trap can occur for traps located in the Si rather than at the Si-SiO2 interface.
• No Coulombic scattering effect (channel blocking) is modelled. Coulombic scattering would cause trapped electrons to act as a mid-filter that repels the further capture of electrons. This effect is most prominent in strong inversion and in very weak inversion [4].
• Capture and emission times are assumed not to be affected by electron temperature. However, according to [4], applying a drain-source voltage can raise the electron temperature above the underlying lattice temperature, giving rise to temperature-dependent capture and emission, as is evident from Eq.(2.3) and Eq.(2.4). By ignoring the effects of electron temperature, this assumption introduces inaccuracy into the capture and emission time approximation, especially at high bias.
• The RTS amplitude is not affected by the lateral trap position along the channel region.
With these assumptions, a suitably accurate form of RTS noise can be generated for low bias conditions and typical temperatures, while at high bias conditions and correspondingly high temperatures the accuracy will be compromised. It is acknowledged again that RTS noise thus generated may not be a completely accurate representation of real RTS noise in MOSFETs. As stated in Sec.3.1, that is not the main aim of this study.
3.4.2 RTS Amplitude 
When a trap captures an electron from the channel, the effective drain-to-source current drops. When the trapped electron is released into the channel, the effective drain-to-source current increases. The normalised amplitude (ΔI_D/I_D) of the current fluctuation between the capture and emission of an electron by the trap is described by Eq.(2.2) in Sec.2.2.2 [40,41].
Eq.(2.2) assumes that the RTS noise amplitude does not depend on the position of the trap along the channel region. This assumption is known not to be wholly accurate [27,57]: the RTS amplitude has been shown to peak when the trap is located at the centre of the channel region, and to reduce for locations toward the source or drain [27,57]. However, for simplicity, this study assumes that Eq.(2.2) is sufficient to describe an RTS amplitude that depends solely on trap depth x_t. It is acknowledged that refining Eq.(2.2) to include the effect of trap position along the channel on noise amplitude is crucial for future models. From Eq.(2.2), the deeper the trap lies in the oxide, measured from the Si-SiO2 interface, the smaller the amplitude. The maximum amplitude occurs when the trap lies at the interface, x_t = 0, where Eq.(2.2) reduces to:
$$\frac{\Delta I_D}{I_D} = \alpha\,\frac{g_m}{I_D}\,\frac{q}{WLC_{ox}} \qquad (3.3)$$
RTS amplitude varies by some 40% in weak inversion compared to 5% in strong inversion [27]. This suggests that RTS noise will depend strongly on the bias point of the MOSFET. In Eq.(2.2), the bias dependence of RTS amplitude is modelled through the channel transconductance g_m.
Transconductance is generally calculated as g_m = ∂I_DS/∂V_GS. In this study, g_m is calculated based on the inversion layer approximation [59], in order to cater for weak inversion operation of the MOSFET, in which significant RTS amplitudes have been found. g_m can be approximated as [59]:

$$g_m = \frac{I_{DS}}{nkT/q} \qquad (3.4)$$

for weak inversion, and

$$g_m = \sqrt{\frac{2\mu C_{ox}}{n}\,\frac{W}{L}\,I_{DS}\,(1 + \lambda V_{DS})} \qquad (3.5)$$

for strong inversion, where λ is the channel length modulation parameter and n here denotes the slope factor (not the channel electron concentration of Eq.(2.3)).
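The two approximations can be compared with a small Python sketch of Eq.(3.4) and Eq.(3.5). The slope factor n = 1.3, T = 300 K and the example device values in the usage line are assumptions for illustration, and the function names are hypothetical; the transition to velocity saturation is not modelled here.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q = 1.602176634e-19     # electronic charge (C)

def gm_weak_inversion(i_ds, n_slope=1.3, T=300.0):
    """Eq.(3.4): g_m = I_DS / (n kT/q)."""
    return i_ds / (n_slope * K_B * T / Q)

def gm_strong_inversion(i_ds, mu, c_ox, W, L, v_ds, lam=0.0, n_slope=1.3):
    """Eq.(3.5): g_m = sqrt((2 mu C_ox / n) (W/L) I_DS (1 + lambda V_DS)), SI units."""
    return math.sqrt(2.0 * mu * c_ox / n_slope * (W / L) * i_ds * (1.0 + lam * v_ds))

# Illustrative values only: mu = 0.03 m^2/Vs, C_ox = 0.039 F/m^2, W/L = 0.1 um / 35 nm.
print(gm_weak_inversion(1e-7),
      gm_strong_inversion(50e-6, 0.03, 0.039, 0.1e-6, 35e-9, 0.5, lam=0.1))
```

In weak inversion g_m/I_D is constant, whereas in strong inversion g_m grows only as the square root of I_DS, so g_m/I_D, and hence the amplitude in Eq.(2.2), falls as the device moves into strong inversion. This matches the trend reported above.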
For high drain voltage, g_m is calculated using the velocity saturation approximation [59]:

$$g_m = WC_{ox}v_{sat} \qquad (3.6)$$

where v_sat is the carrier saturation velocity.






. - 2n 
kT
_) , 	 (3.7) 
( 
	
2n L 	q 
and the transition from strong inversion to velocity saturation is approximated by the transition current I_DSSV [59]:
'DSSV 
8nWLC - V sat  
(3.8)
- 
3.4.3 RTS mean capture and emission time 
The capture and emission of electrons cause fluctuations in the channel conductance, which in turn cause the drain current to fluctuate. τ_c represents the mean time that a trap is empty before capturing an electron, and τ_e represents the mean time before a trapped electron is released. τ_c and τ_e are described by Eq.(2.3) and Eq.(2.4), respectively, in Sec.2.2.3.
The trap cross-section pre-factor σ_0, activation energy for capture ΔE_B, and trap binding energy ΔE_CT are parameters specific to each trap at a given trap location, bias condition and temperature [4,28]. σ_0 and ΔE_CT depend strongly and positively on gate voltage V_GS, while ΔE_B depends on neither V_GS nor V_DS [4,5,28]. No explicit relation describing the dependence of σ_0 and ΔE_CT on bias and/or temperature has been reported. In most cases, their values were extracted from studies of the temperature and bias dependence of RTS noise, in which they were obtained as fitting parameters [4,5,28].
The channel electron concentration n is a further important parameter in Eq.(2.3) and Eq.(2.4). In linear operation (assuming a constant electron concentration along the channel), n is typically described by [4,28]:

$$n = \frac{I_{DS} L_{eff}}{q\mu V_{DS} t_{inv} W_{eff}} \qquad (3.9)$$
where μ is the electron mobility, t_inv is the inversion layer thickness, and W_eff and L_eff are the effective channel width and length, respectively. In saturation, Eq.(3.9) is no longer applicable, as the electron concentration can no longer be approximated as constant along the channel.
A general description of the electron concentration at a specific location in the channel is given by [58]:

$$n(x, y) = \frac{n_i^2}{N_a}\exp\left[\frac{q\left(\psi(x) - V(y)\right)}{kT}\right] \qquad (3.10)$$
where x is the depth measured from the Si-SiO2 interface, y is the lateral location measured from the source, n_i is the intrinsic carrier concentration, ψ(x) is the band-bending potential at depth x, V(y) is the quasi-Fermi potential at lateral location y, and N_a is the substrate doping concentration. Eq.(3.10) is too complex to be used here, as it would involve numerical methods to solve for ψ(x) [58]. In this work, the charge sheet approximation was assumed, within which Eq.(3.10) becomes:
$$n(0, y) = \frac{n_i^2}{N_a}\exp\left[\frac{q\left(\psi(0) - V(y)\right)}{kT}\right] \qquad (3.11)$$
where ψ(0) = ψ_s is the surface potential. A description of the surface potential ψ_s and the quasi-Fermi potential V(y) can be found in Appendix A.
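A minimal Python sketch of Eq.(3.11) is given below. The doping N_a = 10^18 cm^-3, intrinsic concentration n_i = 10^10 cm^-3, T = 300 K and the surface potential ψ_s = 2φ_F + 0.1 V are assumed illustrative values (the thesis obtains ψ_s and V(y) as described in Appendix A), and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q = 1.602176634e-19     # electronic charge (C)

def surface_electron_concentration(psi_s, v_y, n_i=1.0e10, n_a=1.0e18, T=300.0):
    """Eq.(3.11): n(0, y) = (n_i^2/N_a) exp[q(psi_s - V(y))/kT], in cm^-3."""
    return (n_i ** 2 / n_a) * math.exp(Q * (psi_s - v_y) / (K_B * T))

# Assumed doping and intrinsic concentration; phi_f is the bulk Fermi potential.
T = 300.0
phi_f = K_B * T / Q * math.log(1.0e18 / 1.0e10)
psi_s = 2.0 * phi_f + 0.1            # hypothetical surface potential in strong inversion
for v_y in (0.0, 0.05, 0.1):         # quasi-Fermi potential rising from source toward drain
    print(f"V(y) = {v_y:.2f} V -> n = {surface_electron_concentration(psi_s, v_y):.3e} cm^-3")
```

At ψ_s = 2φ_F and V(y) = 0 the expression returns n = N_a, the usual strong-inversion onset condition. A rising quasi-Fermi potential toward the drain lowers n exponentially, which is consistent with n near the drain decreasing as V_DS increases.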




The average thermal velocity v_th is

$$v_{th} = \sqrt{\frac{8kT}{\pi m^*}} \qquad (3.12)$$

where m* is the effective mass of an electron.
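For completeness, Eq.(3.12) and the linear-region concentration of Eq.(3.9) can be evaluated as in the Python sketch below. Results are in cm-based units so that they can be used directly with σ_0 in cm^2. The effective-mass ratio (defaulting to the free-electron mass), the bias point and the device values in the example are assumptions, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant (J/K)
Q = 1.602176634e-19       # electronic charge (C)
M_0 = 9.1093837015e-31    # free-electron mass (kg)

def thermal_velocity(T=300.0, m_ratio=1.0):
    """Eq.(3.12): v_th = sqrt(8kT / (pi m*)), returned in cm/s; m* = m_ratio * m_0 (assumed)."""
    return 100.0 * math.sqrt(8.0 * K_B * T / (math.pi * m_ratio * M_0))

def linear_channel_concentration(i_ds, l_eff, v_ds, mu, t_inv, w_eff):
    """Eq.(3.9): n = I_DS L_eff / (q mu V_DS t_inv W_eff); cm-based inputs give cm^-3."""
    return i_ds * l_eff / (Q * mu * v_ds * t_inv * w_eff)

# Hypothetical linear-region bias point for the 35 nm device (lengths in cm).
n = linear_channel_concentration(i_ds=5e-6, l_eff=35e-7, v_ds=0.05,
                                 mu=300.0, t_inv=1e-7, w_eff=1e-5)
print(f"v_th = {thermal_velocity():.3e} cm/s, n = {n:.3e} cm^-3")
```

The product v_th·n calculated in this way supplies the σ_0 v_th n denominator of Eq.(2.3) and Eq.(2.4).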
Figure 3.4: (a) Single-trap RTS noise Power Spectral Density S(f) generated using Eq.(2.5). (b) Time-domain RTS noise generated from the Power Spectral Density S(f) using the Sum-of-Sinusoids technique.
3.4.4 Methodology: RTS noise 
The method used to generate 1/f noise is not applicable to RTS noise, as it cannot generate the discrete noise levels that characterise RTS noise. Fig.3.4(b) shows RTS noise generated from the single-trap RTS noise PSD in Fig.3.4(a), described by Eq.(2.5). The inability of the method described in Sec.3.3.1 to generate the correct time-domain form of RTS noise is clear.
An alternative method was therefore developed, which generates RTS amplitude and mean time statistics using the Monte Carlo simulation method.
The lifetimes of a filled trap (low I_DS) and of an empty trap (high I_DS) obey Poisson statistics [4,11,28], i.e. they are exponentially distributed. The probability density that an empty trap captures an electron at time t is [4]:

$$p_c(t) = \frac{1}{\tau_c}\exp\left(-\frac{t}{\tau_c}\right) \qquad (3.13)$$

and the probability density that a full trap emits an electron at time t is:

$$p_e(t) = \frac{1}{\tau_e}\exp\left(-\frac{t}{\tau_e}\right) \qquad (3.14)$$

where p_c(t) and p_e(t) are normalised probability densities:

$$\int_0^{\infty} p_{c/e}(t)\,dt = 1 \qquad (3.15)$$

and p_c/e(t) represents either p_c(t) or p_e(t). Based on a Monte Carlo method [60], Eq.(3.15) is generalised to:
PtT,.an 
/ 	pc1e (t)dt = P(ttran). 	 (3.16) Jo 
where P(tTran) is the probability of capture or emission by the transition time tTran,  having 
values from O(tTran = 0) to 1 (tTran = cc). Performing the integration results in: 
$$ 1 - \exp\left(-\frac{t_{Tran}}{\bar{\tau}_{c/e}}\right) = P(t_{Tran}). \qquad (3.17) $$
Eq.(3.17) can be rearranged algebraically to obtain
$$ \exp\left(-\frac{t_{Tran}}{\bar{\tau}_{c/e}}\right) = 1 - P(t_{Tran}). \qquad (3.18) $$
Taking the natural logarithm of both sides of Eq.(3.18) and rearranging gives
$$ t_{Tran} = -\bar{\tau}_{c/e} \ln\left(1 - P(t_{Tran})\right). \qquad (3.19) $$
If a random number generator is used to generate the probability P(t_Tran), Eq.(3.19) can be used to determine the corresponding transition time t_Tran. Moreover, since P(t_Tran) is uniformly distributed between 0 and 1, so is 1 − P(t_Tran), and Eq.(3.19) can be rewritten as
$$ t_{Tran} = -\bar{\tau}_{c/e} \ln\left(P(t_{Tran})\right). \qquad (3.20) $$
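Eq.(3.20) is the standard inverse-transform method for sampling an exponential distribution. A minimal Python sketch, assuming Python's `random` module as the uniform generator; the function name is hypothetical and the mean capture time is borrowed from the Fig.3.10 caption purely as an example value:

```python
import math
import random


def sample_transition_time(mean_time: float, rng: random.Random) -> float:
    """Draw one capture or emission dwell time, following Eq.(3.20).

    mean_time is the mean capture time (trap empty) or the mean emission
    time (trap full), in seconds.
    """
    # rng.random() lies in [0, 1); 1 - rng.random() lies in (0, 1], so the
    # logarithm is always defined while P stays uniformly distributed.
    p = 1.0 - rng.random()
    return -mean_time * math.log(p)


if __name__ == "__main__":
    rng = random.Random(42)
    tau_c = 2.918e-3  # mean capture time quoted for Fig.3.10 (s)
    samples = [sample_transition_time(tau_c, rng) for _ in range(100_000)]
    mean = sum(samples) / len(samples)
    print(f"sample mean = {mean:.3e} s, target = {tau_c:.3e} s")
```

The sample mean of many draws should approach the requested mean time, which is a quick sanity check that the sampled dwell times follow Eqs.(3.13) and (3.14).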
x_t (nm) | Temp (K) | V_GS (V) | σ_0 (cm²) | ΔE_B (eV) | ΔE_CT (eV)
--- | --- | --- | --- | --- | ---
1.1 | 310 | 3 | 2.10e-19 | 0.411 | 0.076
1.3 | 320 | 0.86 | 8.40e-20 | 0.186 | 0.218
1.4 | 330 | 1.45 | 5.80e-15 | 0.645 | 0.102
0.9 | 270 | 3.67 | 3.80e-15 | 0.480 | 0.078
1 | 300 | 1.9 | 2.04e-15 | 0.593 | 0.108
1.2 | 300 | 0.86 | 7.00e-20 | 0.300 | 0.750
1.25 | 300 | 1.25 | 8.90e-18 | 0.320 | 0.060
Table 3.1: Fitting parameters σ_0, ΔE_B and ΔE_CT for seven room-temperature traps observed in 0.4 μm² n-channel MOSFETs, with the corresponding estimated trap depth x_t [4, 5].
Eq.(3.20) can be used to determine the transition time t_Tran. It is assumed that the capture and emission of an electron by a trap are mutually exclusive events: when a trap is empty, the only possible transition is capture, and vice versa. Therefore, the generated transition probability P(t_Tran) applies to either electron capture or emission, depending on the current state of the trap.
Fig.3.5(a) and Fig.3.5(b) are examples of single-trap and multi(3)-trap RTS noise generated using the amplitude given by Eq.(2.2) and the transition time generated using Eq.(3.20).
3.4.5 Implementation: RTS noise
In this initial study, the single trap in each noisy MOSFET channel is placed arbitrarily at an appropriate lateral position and depth between source and drain. The trap depth x_t is selected from Table 3.1; these are experimental values [4,5,28]. In order to use the trap depth x_t in the 35 nm MOSFET model, the values were scaled down by a factor of 10 to cater for the thin gate oxide (~7-8).
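Selecting and scaling the Table 3.1 parameters is simple bookkeeping, sketched below with the table held as data. The nearest-depth lookup helper and all names are hypothetical (the text selects rows directly); the factor of 10 is the depth scaling stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrapParams:
    """One row of Table 3.1 (measured on 0.4 um^2 n-channel MOSFETs)."""
    x_t_nm: float      # estimated trap depth (nm)
    temp_k: float      # temperature (K)
    v_gs: float        # reference gate voltage (V)
    sigma0_cm2: float  # capture cross-section pre-factor (cm^2)
    de_b_ev: float     # Delta E_B (eV)
    de_ct_ev: float    # Delta E_CT (eV)


TABLE_3_1 = [
    TrapParams(1.1, 310, 3.0, 2.10e-19, 0.411, 0.076),
    TrapParams(1.3, 320, 0.86, 8.40e-20, 0.186, 0.218),
    TrapParams(1.4, 330, 1.45, 5.80e-15, 0.645, 0.102),
    TrapParams(0.9, 270, 3.67, 3.80e-15, 0.480, 0.078),
    TrapParams(1.0, 300, 1.9, 2.04e-15, 0.593, 0.108),
    TrapParams(1.2, 300, 0.86, 7.00e-20, 0.300, 0.750),
    TrapParams(1.25, 300, 1.25, 8.90e-18, 0.320, 0.060),
]

DEPTH_SCALE = 10.0  # depth scaling for the thin-oxide 35 nm model (Sec.3.4.5)


def select_trap(depth_nm: float) -> TrapParams:
    """Hypothetical helper: the Table 3.1 row whose depth is closest to depth_nm."""
    return min(TABLE_3_1, key=lambda row: abs(row.x_t_nm - depth_nm))


def depth_for_35nm_model(row: TrapParams) -> float:
    """Trap depth (nm) after the factor-of-10 scaling for the 35 nm device."""
    return row.x_t_nm / DEPTH_SCALE


if __name__ == "__main__":
    row = select_trap(1.1)  # the trap depth quoted for Fig.3.8
    print(row, depth_for_35nm_model(row))
```

Selecting the 1.1 nm row yields the 0.11 nm depth quoted for the 35 nm device in Fig.3.8.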
Depending on bias conditions, the transconductance g_m was calculated according to Eq.(3.4), Eq.(3.5), or Eq.(3.6). Using the calculated g_m and the MOSFET parameters W, L, C_ox and t_ox, the RTS noise amplitude ΔI_D for a trap located at (x_t, y_t) was generated based on Eq.(2.2). At the same time, the mean capture time τ̄_c and mean emission time τ̄_e were determined according to Eq.(2.3) and Eq.(2.4), respectively. σ_0, ΔE_B and ΔE_CT are selected from Table 3.1 for the appropriate value of the trap depth x_t. It is assumed that ΔE_B is a fixed value for all bias conditions. This is a reasonable assumption, as ΔE_B has shown no dependence on bias [4].
Figure 3.5: (a) Single-trap RTS noise. (b) Multi(3)-trap RTS noise.
On the other hand, ΔE_CT was reported to depend on both the gate voltage V_GS and the drain voltage V_DS [4]. This dependence is described by [4]:
$$ \delta(\Delta E_{CT}) = q\left(\delta V_{GS} - \delta\psi_s\right)\frac{x_t}{t_{ox}} + q\,\delta\psi_s, \qquad (3.21) $$
where δ(ΔE_CT) and δV_GS are calculated with reference to the values (ΔE_CT and V_GS) obtained from Table 3.1 for the selected trap depth x_t. The reference V_GS from Table 3.1 was scaled (V_GS × …/3.5) to cater for the lower operating voltage of the 35 nm implementation. δψ_s is the change in surface potential with reference to the initial ψ_s, calculated for a given trap lateral location y_t and drain voltage V_DS.
Similarly, σ_0 depends strongly on the gate voltage V_GS but, to the best of our knowledge, no explicit mathematical model of the relationship has yet been reported. For simplicity, σ_0 was set to be a fixed parameter for all bias conditions. It is acknowledged that this assumption leads to a less accurate model; however, for the purpose of this work, it is sufficient. The channel electron concentration n is calculated from Eq.(3.11). The quasi-Fermi potential V(y) and the surface potential ψ_s were calculated for the trap location (x_t, y_t) from Eqs.(A.1), (A.2), (A.4) and Eq.(A.6), respectively, in Appendix A.
The mean capture and emission times were used to determine the transition time from Eq.(3.20). As stated in the assumptions, a trap is either filled or empty at any particular time instant t. The initial state (filled or empty) is random. Once the trap state is determined, the transition probability P(t_Tran) is generated and used in Eq.(3.20) to determine the transition time t_Tran. If the simulation time t coincides with the transition time t_Tran, a transition takes place: if an empty trap is filled, ΔI_D is subtracted from the drain current I_D, and vice versa.
The implementation steps for generating time-domain RTS noise are summarised as follows:
1. Select the number of traps.
2. Assign to each trap the depth x_t, the lateral position y_t, and the corresponding fitting parameters (σ_0, ΔE_B, ΔE_CT), based on the values in Table 3.1.
3. Determine the current time t:
• If t = 0 (initialisation):
 - Probe the noiseless MOSFET drain-to-source current I_DS.
 - Randomly generate the initial state (empty or full) of the trap(s).
• If t ≠ 0 (ongoing):
 - Determine the last state (empty or full) of the trap(s).
 - Determine the previous effective drain-to-source current I_DS.
4. Calculate the transconductance g_m using Eq.(3.4), Eq.(3.5), or Eq.(3.6), depending on the operating regime of the noiseless MOSFET.
5. For each trap:
 - Calculate the RTS amplitude ΔI_DS using the transconductance g_m and Eq.(2.2).
 - Calculate the electron concentration n using Eq.(3.11).
 - Calculate the effective trap binding energy ΔE_CT(eff) by subtracting Eq.(3.21) from the initial ΔE_CT (Table 3.1), i.e. ΔE_CT(eff) = ΔE_CT − δ(ΔE_CT).
 - Calculate the mean capture time τ̄_c and mean emission time τ̄_e using the parameters in Table 3.1 and Eq.(2.3) and Eq.(2.4), respectively.
 - Calculate the capture p_c(t) and emission p_e(t) probabilities using the calculated τ̄_c and τ̄_e, and Eq.(3.13) and Eq.(3.14), respectively.
 - Approximate the transition time t_Tran using the calculated capture p_c(t) and emission p_e(t) probabilities and Eq.(3.20).
6. Calculate the effective I_DS at the current time t:
• If the current time t equals the transition time t_Tran, or t = 0 (initialisation)¹:
 (a) If the trap(s) was (were) empty, subtract ΔI_DS of the empty trap(s) from I_DS and set the trap(s) to full.
 (b) If the trap(s) was (were) full, add ΔI_DS of the full trap(s) to I_DS and set the trap(s) to empty.
• Else, go to step 7.
7. Repeat steps 1-6 until the end of the simulation time.
¹The initial (t = 0) RTS amplitude ΔI_DS of each trap was calculated based on the noiseless I_DS (i.e. an empty trap). Therefore, the initial noisy I_DS is not accurate if the randomly generated initial state of the trap(s) was (were) not empty.
Figure 3.6: I-V characteristics for (a) 0.35 μm (L = 0.35 μm, W = 1 μm) and (b) 35 nm (L = 35 nm, W = 0.1 μm) 'noiseless' MOSFETs.
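The steps above can be condensed into a short event-driven simulation. This is a simplified sketch, not the Verilog-A implementation: it assumes fixed bias, so each trap's ΔI_DS, τ̄_c and τ̄_e are set once rather than re-evaluated at every step, and the class and function names are hypothetical. Rather than adding or subtracting ΔI_DS at each transition, it recomputes I_DS from the noiseless value and the current trap states, which also sidesteps the initial-state inaccuracy noted in the footnote to step 6.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Trap:
    delta_i: float            # RTS amplitude Delta I_DS (A); fixed because bias is fixed
    tau_c: float              # mean capture time (s): mean dwell time while empty
    tau_e: float              # mean emission time (s): mean dwell time while full
    full: bool = False
    next_switch: float = 0.0  # absolute time of the next transition (s)


def dwell_time(mean: float, rng: random.Random) -> float:
    """Exponential dwell time via Eq.(3.20)."""
    return -mean * math.log(1.0 - rng.random())


def simulate_rts(i_ds0: float, traps: list[Trap], t_end: float, dt: float,
                 seed: int = 0) -> list[float]:
    """Noisy drain current sampled every dt seconds up to t_end.

    i_ds0 is the noiseless I_DS; every full trap lowers it by its delta_i.
    """
    rng = random.Random(seed)
    for trap in traps:                          # t = 0: random initial state
        trap.full = rng.random() < 0.5
        trap.next_switch = dwell_time(trap.tau_e if trap.full else trap.tau_c, rng)

    samples = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        for trap in traps:                      # apply every transition due by time t
            while trap.next_switch <= t:
                trap.full = not trap.full       # capture (empty -> full) or emission
                mean = trap.tau_e if trap.full else trap.tau_c
                trap.next_switch += dwell_time(mean, rng)
        # Recompute I_DS from the trap states instead of accumulating steps
        samples.append(i_ds0 - sum(tr.delta_i for tr in traps if tr.full))
    return samples


if __name__ == "__main__":
    # Single trap with the values quoted for Fig.3.10 (noiseless I_DS = 84 uA)
    trap = Trap(delta_i=2.62e-6, tau_c=2.918e-3, tau_e=1.279e-2)
    current = simulate_rts(84e-6, [trap], t_end=1.0, dt=1e-5)
    full_fraction = sum(1 for i in current if i < 84e-6) / len(current)
    print(f"fraction of time full = {full_fraction:.2f}")
```

For long runs, the fraction of samples in the lowered state should approach τ̄_e/(τ̄_c + τ̄_e), about 0.81 for these values; a 1 s run only contains a few tens of capture/emission cycles, so some scatter around that figure is expected.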
3.5 Results and Comparison: 1/f and RTS Noise
To explore and visualise the noise signals, the noisy MOSFET was simulated via transient analysis with the gate and drain biased between 0 and 1.5 Volts, while the source and body were fixed at 0 Volts. All analyses were performed at room temperature (27 °C).
Initially, the noiseless I-V characteristic was determined by setting the amplitude of the noise source n(t) to zero. The drain voltage V_DS was swept from 0 to 1.5 Volts in 1 second for each gate voltage V_GS. Fig.3.6 shows the I-V characteristics of the noiseless 0.35 μm and 35 nm NMOS transistors, implemented using an AMS CMOS technology and an atomistic-based CMOS technology, respectively. The I-V characteristics of both the 0.35 μm and the 35 nm NMOS clearly have the correct form. The implementation of the noisy MOSFETs has not changed the underlying I-V characteristics of the 0.35 μm and 35 nm NMOS.
Figure 3.7: I-V characteristics for 1/f-based noisy (a) 0.35 μm (L = 0.35 μm, W = 1 μm) and (b) 35 nm (L = 35 nm, W = 0.1 μm) NMOS.
The I-V characteristics of the 1/f-based and RTS-based noisy 0.35 μm and 35 nm NMOS transistors are shown in Figs.3.7 and 3.8, respectively. The impact of device downscaling is evident: the noise amplitude generated by the noisy 35 nm NMOS is almost double that generated by the noisy 0.35 μm NMOS. This observation becomes more prominent in Fig.3.9.
It is vital to confirm that the generated noise has the correct spectral form given in Eq.(2.7) and Eq.(2.5). The noisy MOSFET was therefore simulated for a fixed combination of drain (V_DS) and gate (V_GS) voltages (i.e. a fixed I_DS). Fig.3.9 shows the time-domain noise for the 1/f-based noisy 0.35 μm and 35 nm MOSFETs, while Fig.3.10 shows the RTS noise in the 35 nm MOSFET. The time-domain noise has the correct form. The corresponding PSDs (extracted using the periodogram function in Matlab) are shown in Fig.3.11 and Fig.3.12, respectively. The PSDs of the generated time-domain noise show the correct 1/f and RTS characteristics and match the PSDs predicted by Eq.(2.7) and Eq.(2.5).
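The analytic reference curve for the RTS comparison can be evaluated directly. This sketch assumes that Eq.(2.5) takes the standard Machlup Lorentzian form for a single trap, S(f) = 4ΔI²/[(τ̄_c + τ̄_e)((1/τ̄_c + 1/τ̄_e)² + (2πf)²)]; the function names are hypothetical and the numbers are those quoted in the Fig.3.10 caption.

```python
import math


def rts_psd(f: float, delta_i: float, tau_c: float, tau_e: float) -> float:
    """One-sided single-trap RTS PSD in A^2/Hz (Machlup Lorentzian form)."""
    rate = 1.0 / tau_c + 1.0 / tau_e
    return 4.0 * delta_i**2 / ((tau_c + tau_e) * (rate**2 + (2.0 * math.pi * f) ** 2))


def corner_frequency(tau_c: float, tau_e: float) -> float:
    """Frequency (Hz) at which the PSD falls to half its low-frequency plateau."""
    return (1.0 / tau_c + 1.0 / tau_e) / (2.0 * math.pi)


if __name__ == "__main__":
    # Values quoted for the single-trap RTS of Fig.3.10
    d_i, tau_c, tau_e = 2.62e-6, 2.918e-3, 1.279e-2
    f_c = corner_frequency(tau_c, tau_e)
    print(f"corner frequency ~ {f_c:.1f} Hz")
    for f in (1.0, f_c, 10 * f_c, 100 * f_c):
        print(f"S({f:9.1f} Hz) = {rts_psd(f, d_i, tau_c, tau_e):.3e} A^2/Hz")
```

For these values the corner frequency is about 67 Hz: the PSD is flat below it and falls as 1/f² above it, which is the shape a periodogram of the simulated RTS should reproduce.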
In order to understand how the trap position affects the noise generated in RTS-based noisy MOSFETs, a simulation was set up to analyse further the dependence of the RTS amplitude, the mean capture time τ̄_c and the mean emission time τ̄_e on the trap location (x_t, y_t). The dependence of the RTS amplitude on trap depth and on lateral location is shown in Figs.3.13(a) and (b), respectively.











Figure 3.8: I-V characteristics for RTS-based noisy (a) 0.35 μm (L = 0.35 μm, W = 1 μm, trap depth x_t = 1.1 nm and lateral location y_t = 160 nm) and (b) 35 nm (L = 35 nm, W = 0.1 μm, trap depth x_t = 0.11 nm and lateral location y_t = 16 nm) MOSFETs. The values σ_0, ΔE_B, ΔE_CT and trap V_GS were selected from Table 3.1 corresponding to trap depth x_t = 1.1 nm.
Figure 3.9: Time-domain noise generated by (a) 0.35 μm and (b) 35 nm noisy MOSFETs simulated at static bias conditions (V_DS = 1.5 V and V_GS = 1 V) for 1 ms.




Figure 3.10: The drain-source current noise (ΔI_DS) for a 35 nm (L = 35 nm, W = 0.1 μm) single-trap RTS-based noisy MOSFET biased at V_GS = 1.0 V and V_DS = 0.6 V. The DC value of the drain current (I_DS = 84 μA) has been removed. (Note that a longer simulation time (1 second) was needed in order to capture more RTS events.) Under these fixed bias conditions, ΔI_DS, τ̄_c and τ̄_e in the RTS-based noisy MOSFET were generated to be 2.62e-6 A, 2.918e-3 s and 1.279e-2 s, respectively.
Figure 3.11: The Power Spectral Density (PSD), S(f), corresponding to the time-domain noise in Fig.3.9.














Figure 3.12: The Power Spectral Density (PSD), S(f) of a single trap RTS plotted in 
Fig.3.10. The dashed-line is the calculated PSD based on Eq.(2.5). 
amplitude. RTS noise amplitude shows a strong gate voltage (V_GS) dependence, peaking in weak inversion. These results are consistent with the characteristics reported in [27, 41, 57].
From the same simulation, the mean capture and emission times are plotted in Fig.3.14 and Fig.3.15. Trap depth has a significant effect on the gate-voltage dependence of the mean emission time, while the mean capture time appears unaffected. In weak inversion (low V_GS), deeper traps are unable to retain electrons; however, as V_GS increases, the retention time becomes longer. Trap lateral position influences the dependence of the mean capture time τ̄_c and mean emission time τ̄_e on the drain voltage (V_DS): τ̄_c and τ̄_e for a trap located near the source appear to be independent of V_DS, while the dependence becomes very obvious for a trap located near the drain. This effect corresponds to the influence of drain bias on the channel electron concentration described by Eq.(3.11).
Finally, multi-trap RTS noise was studied in the same context. 3-trap and 10-trap RTS noise were injected into MOSFETs, with the traps differing in depth (x_t) and lateral position (y_t). Figs.3.16(a) and (b) show 3-trap and 10-trap RTS noise simulated for 1 s. The number of discrete noise levels increases with the number of traps. However, due to the limited simulation time, the full spread of current levels (e.g. 8 levels are expected for 3-trap RTS) is not seen. Using the Matlab periodogram function, the PSDs of the 3-trap and 10-trap RTS noise were extracted and plotted in Fig.3.17. The PSDs approach a 1/f PSD as the number of traps increases.
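The expected number of discrete levels follows from enumerating which traps are full. A minimal sketch (function name hypothetical): N traps with distinct amplitudes give up to 2^N levels, and levels merge whenever two different subsets of amplitudes sum to the same value.

```python
from itertools import combinations


def rts_levels(amplitudes: list[float], ndigits: int = 15) -> list[float]:
    """Distinct current offsets (A) reachable when any subset of traps is full.

    Each full trap lowers I_DS by its own amplitude; rounding merges offsets
    that differ only by floating-point error.
    """
    offsets = set()
    for k in range(len(amplitudes) + 1):
        for subset in combinations(amplitudes, k):
            offsets.add(round(-sum(subset), ndigits))
    return sorted(offsets)


if __name__ == "__main__":
    print(len(rts_levels([1.0e-6, 2.5e-6, 4.0e-6])))  # three distinct amplitudes
    print(len(rts_levels([1.0e-6, 2.0e-6, 3.0e-6])))  # 1 + 2 == 3 merges two levels
```

The two calls return 8 and 7 levels, respectively; the first matches the 8 levels expected for 3-trap RTS when every state is eventually visited.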
3.6 Discussion: Noise Modelling
Both 1/f and RTS noisy MOSFET models have been implemented in Verilog-A. The methodology used to generate 1/f noise is based on the sum-of-sinusoids approximation [52,53], while RTS noise is generated from the noise parameters (amplitude and time statistics) using Monte Carlo simulation [57]. It was found that while the sum-of-sinusoids technique is simple and easy to implement, extensive computing power is required to generate a better representation of 1/f noise. The technique used to generate RTS noise is more intuitive but becomes increasingly complex.
Figure 3.13: (a) Normalised (ΔI_D/I_D) RTS amplitude for three different trap depths x_t = 0.1 nm, 0.4 nm and 0.8 nm, with the trap lateral position (measured from the source) fixed at y_t = 16 nm. (b) Normalised (ΔI_D/I_D) RTS amplitude for three different trap lateral positions (measured from the source) y_t = 5 nm, 16 nm and 30 nm, with the trap depth fixed at x_t = 0.11 nm. The simulation was based upon parameters corresponding to x_t = 1.1 nm in Table 3.1.
Figure 3.14: Mean capture and emission times for three different trap depths with fixed y_t = 16 nm: (a) x_t = 0.1 nm, (b) x_t = 0.4 nm, (c) x_t = 0.8 nm.
Figure 3.15: Mean capture and emission times for three different trap lateral locations with fixed x_t = 0.11 nm: (a) y_t = 5 nm, (b) y_t = 16 nm, (c) y_t = 30 nm.
Figure 3.16: RTS noise generated by a noisy MOSFET implemented with (a) 3 traps and (b) 10 traps. Trap parameters were based on the values in Table 3.1.
Figure 3.17: Power Spectral Density of 1-trap, 3-trap and 10-trap RTS noise. The dashed line indicates the 1/f noise PSD.
It has been shown so far that the techniques used are capable of generating the correct noise form (1/f or RTS). It has been confirmed that the PSDs of the generated time-domain noise match the PSDs predicted by Eq.(2.7) and Eq.(2.5). It has also been noted that superposition of RTS noise produces 1/f noise, in agreement with findings reported in [4, 11, 45]. Multi-trap (3 and 10 traps) RTS noisy MOSFETs were implemented. The time-domain plots of these multi-trap noisy MOSFETs (Fig.3.16) show multi-level switching, as expected. Fig.3.17 compares the PSDs generated from these time-domain RTS noise signals. Using a power-series non-linear least-squares fitting function (in Matlab), the 10-trap RTS noise was found to fit well to a theoretical 1/f^1.2 PSD. This suggests that as the number of switching levels increases (i.e. as the number of traps increases), the generated noise becomes more like 1/f noise.
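Why superposed RTS noise tends towards 1/f can be illustrated with the classical McWhorter-style argument, an assumption brought in here rather than a result of this thesis: if trap time constants are spread uniformly on a logarithmic scale, the sum of their Lorentzians falls as 1/f between the extreme corner frequencies. The sketch models each trap by an equal-weight Lorentzian τ/(1 + (2πfτ)²), a simplification of the single-trap PSD, and estimates the exponent by ordinary least squares on log10 S against log10 f, which is simpler than the non-linear fit used in Matlab above. The ensemble and all names are hypothetical.

```python
import math


def lorentzian_sum(f: float, taus: list[float]) -> float:
    """Sum of equal-weight Lorentzians tau / (1 + (2*pi*f*tau)^2)."""
    w = 2.0 * math.pi * f
    return sum(tau / (1.0 + (w * tau) ** 2) for tau in taus)


def loglog_slope(freqs: list[float], psd: list[float]) -> float:
    """Ordinary least-squares slope of log10(PSD) versus log10(f)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(s) for s in psd]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical ensemble: 10 time constants per decade from 1 us to 1 s
taus = [10.0 ** (-6 + k / 10) for k in range(61)]
# Fit window well inside the range of corner frequencies: 10 Hz to 1 kHz
freqs = [10.0 ** (1 + k / 20) for k in range(41)]
slope = loglog_slope(freqs, [lorentzian_sum(f, taus) for f in freqs])
print(f"S(f) ~ 1/f^{-slope:.2f}")
```

The fit window is kept well inside the range of corner frequencies because, outside it, the sum flattens (below the lowest corner) or falls as 1/f² (above the highest corner).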
3.7 Summary 
Low frequency noise will have dramatic effects on future nanoscale MOSFETs and circuits. 
A noisy MOSFET has been modelled to emulate the predicted noisy behaviour of future 
nanoscale MOSFETs. Both 1/f and Random Telegraph Signal (RTS) noise have been stud-
ied. A pragmatic approach has been taken to include the effect of this form of noise in 
MOSFETs, such that the modelled noise can be included in circuit simulations. In effect, a 
methodology has been put in place that builds a modelling bridge between atomistic models 
of DSM devices (and the physical devices upon which they were based), through look-up 
table models of the noise in such devices, to circuit models of noisy DSM devices and cir-
cuits that can be simulated in reasonable time and using conventional analogue simulators. 
The methodology opens up the opportunity to explore and investigate the effect of noise in 
nanoscale MOSFET circuits. 
Chapter 4 
Noisy Circuit Implementation 
In this chapter, the effects of nanoscale MOSFET noise on circuit performance are explored. 
For the reason that will be discussed in Chapter 6, the implementation of the noisy circuit 
focuses on an analogue multiplier as the benchmark architecture. A 2-quadrant analogue 
multiplier described in [61] will be used in the implementation. With a simple modification 
[3], a stable 4-quadrant multiplier will be implemented. The noisy 2-quadrant and 4-quadrant 
analogue multipliers will be implemented by replacing the key MOSFETs with the noisy 
MOSFET models developed in Chapter 3. The performance of these noisy analogue multi-
pliers will be presented and discussed. 
4.1 Noisy 2-Quadrant Multiplier 
Analogue multipliers are used extensively in almost all forms of neural architecture, repre-
senting, primarily, the effect of synaptic gating. Many forms of multiplier have been used 
[61-64]; however, simple circuit form and a current-mode output are essential for reducing 
power- and area- consumption and introducing scaling flexibility in massively-parallel neural 
architectures [32]. 
The 2-quadrant multiplier discussed in [61] offers a wide input range and a simple design [61], without requiring additional biasing circuitry [3]. These attributes are desirable for the hardware implementation of neural architectures and are therefore the basis of our choice in this chapter.
4.1.1 Circuit Description 
Fig.4.1 shows a 2-quadrant Chible multiplier configured as a synaptic (weight) multiplier. The weight voltage V_w determines I_w, which is then multiplied by V_in, referenced to V_ref, to produce I_out.
Figure 4.1: 2-quadrant Chible multiplier.
All MOSFETs are in strong inversion and in saturation, so the output current I_out is given by (neglecting channel length modulation) [61]:
$$ I_{out} = \sqrt{\frac{\beta_n \beta_{4,5}}{2}}\left(V_w - nV_{TH2} - V_{TH1}\right)\left(V_{in} - V_{ref}\right), \qquad (4.1) $$
where β_{4,5} is the transfer parameter of M4 and M5, and β_n is given by:
$$ \frac{1}{\beta_n} = \frac{1}{\beta_1} + \frac{n}{\beta_2}. \qquad (4.2) $$
β_1 and β_2 are the transfer parameters of M1 and M2, respectively, defined as β_1 = μ_1 C_ox (W/L)_1 and β_2 = μ_2 C_ox (W/L)_2. V_TH1 and V_TH2 are the threshold voltages of M1 and M2, respectively, and n is the slope factor, usually smaller than 2, which tends to 1 for very large values of gate voltage [61]. The condition V_w > nV_TH2 + V_TH1 must be satisfied.
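Eq.(4.1) can be checked numerically. A minimal sketch, taking Eq.(4.1) in the form I_out = sqrt(β_n β_{4,5}/2)(V_w − nV_TH2 − V_TH1)(V_in − V_ref); the function name and all device parameter values are hypothetical, and returning zero outside the validity condition is an assumption of the sketch, not a property of the circuit.

```python
import math


def chible_2q_iout(v_w: float, v_in: float, v_ref: float, beta_n: float,
                   beta_45: float, v_th1: float, v_th2: float, n: float) -> float:
    """Ideal 2-quadrant multiplier output current (A) from Eq.(4.1)."""
    overdrive = v_w - n * v_th2 - v_th1
    if overdrive <= 0.0:
        # Eq.(4.1) only holds for V_w > n*V_TH2 + V_TH1; below that the
        # weight branch is taken to carry no current (a modelling assumption).
        return 0.0
    return math.sqrt(beta_n * beta_45 / 2.0) * overdrive * (v_in - v_ref)


if __name__ == "__main__":
    # Hypothetical device parameters, chosen only for illustration
    params = dict(beta_n=100e-6, beta_45=200e-6, v_th1=0.3, v_th2=0.3, n=1.2)
    for v_in in (0.70, 0.75, 0.80):
        i_out = chible_2q_iout(1.5, v_in, 0.75, **params)
        print(f"V_in = {v_in:.2f} V -> I_out = {i_out * 1e6:+.2f} uA")
```

The output is linear in V_in − V_ref and changes sign at V_in = V_ref, while V_w only scales it and must stay above threshold; this is what restricts the circuit to two quadrants.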
4.1.2 Circuit Implementation 
The 2-quadrant multiplier was implemented in an artificially-modelled 35 nm CMOS technology. MOSFETs M1, M4 and M5 are modelled as noisy MOSFETs, to introduce noise into the signal path without compromising the performance of the multiplier. The current mirrors remain noise-free. This caveat is reasonable, as M2, M3, M6 and M7 are made large to provide good transfer characteristics, and large devices are much less susceptible to RTS noise.
4.1.3 Simulation Results and Discussion 
The 2-quadrant multiplier shown in Fig.4.1 has been simulated using the SPECTRE simulator with BSIM3v3 version 3.1 models. V_DD was set to 1.5 V and V_ref to 0.75 V. Using transient analysis, the circuit was simulated by sweeping the weight voltage V_w from 0 V to 1.5 V for each input voltage V_in, which was varied between 0.5 V and 0.9 V with a 0.025 V step size. The output current I_out without injected noise is shown in Fig.4.2 and matches the behaviour predicted by Eq.(4.1). However, a slight deviation can be observed at V_in = V_ref, where Eq.(4.1) predicts I_out = 0. This deviation is attributed to the effect of channel length modulation [49,59] in M6-M7. A remedy is to use a cascode current mirror in place of M6-M7 and to increase the gate length of the current-mirror MOSFETs.
Fig.4.3(a) and Fig.4.3(b) show the outputs of a 1/f noisy multiplier and an RTS noisy multiplier, implemented using 1/f and single-trap RTS noisy MOSFET models, respectively. In both Fig.4.3(a) and Fig.4.3(b), the multiplier output noise depends on both V_w and V_in, with significant noise amplitude observed for V_w = 1.5 V and V_in = 0.7 V.
In order to analyse the output current noise characteristics, the noisy 2-quadrant multiplier was simulated using fixed-bias transient analysis. Fig.4.4(a) and Fig.4.4(b) show the noisy output current I_out for implementations using 1/f-based and single-trap RTS-based noisy MOSFET models, respectively. The noise amplitudes of the 1/f- and RTS-based implementations are not directly comparable, however, as the parameters used to generate 1/f noise were artificially scaled to produce statistically significant noise amplitudes. The output current noise shown in Fig.4.4(a) and Fig.4.4(b) exhibits the time-domain noise characteristics expected from each implementation, which are confirmed by the PSD plots shown in Fig.4.5.
Figure 4.2: 35 nm CMOS technology 2-quadrant multiplier output current I_out.
For the RTS case, the output noise shows 6-7 discrete levels, corresponding to the activity of 3 traps injected into the multiplier through M1, M4 and M5. Ideally, 8 (2^3) discrete levels are expected; however, within the given simulation time, not all of them may be observed. The PSDs (Fig.4.5) show that the output current noise inherits the characteristic noise behaviour of the noisy MOSFETs used.
4.2 Noisy 4-Quadrant Multiplier 
Chible [61] suggested that a 4-quadrant multiplier can be implemented by augmenting the 2-quadrant multiplier discussed in Sec.4.1. However, a major drawback of this suggestion is that the circuit's reference zero depends on the threshold voltage and is therefore process-dependent [3]. This is evident from Eq.(4.1), in which I_out vanishes at V_w = nV_TH2 + V_TH1. The lack of a unique reference zero discourages the precise mapping of parameter values between the hardware implementation








Figure 4.3: 35 nm CMOS technology 2-quadrant multiplier output current with M1, M4, …
Figure 4.4: (a) 1/f and (b) single-trap RTS-based noisy MOSFET transient plots with V_DD, V_w, V_in and V_ref set to 1.5 V, 1 V, 0.6 V and 0.75 V, respectively.
Figure 4.5: Power spectral density generated from the time-domain noise data in Fig.4.4.
Chible multiplier has been implemented in [3] with two identical computing cells, each of which corresponds to the Chible multiplier proposed in [61]. In this implementation, the reference zero can be set externally by the inputs W_ref and S_ref. Owing to its simple architecture and reliable performance, the modified Chible multiplier (Fig.4.6(b)) is adopted for this project. However, the computing-cell architecture of the modified Chible multiplier is changed slightly here to cater for an NMOS-based implementation, as the noisy MOSFET model developed in Chapter 3 is NMOS. The computing cell is shown in Fig.4.6(a).
4.2.1 Circuit Description
A detailed description of the modified Chible multiplier circuit can be found in [3,32]. The change made to the computing cell's architecture in this project does not change the overall multiplier behaviour. From [3,32], the output current I_out is described as:
$$ I_{out} = \begin{cases} K_N (W_i - W_{ref})(S_i - S_{ref}), & \text{if } W_i > W_{ref} \\ K_P (W_i - W_{ref})(S_i - S_{ref}), & \text{if } W_i < W_{ref} \end{cases} \qquad (4.3) $$
Figure 4.6: The modified Chible 4-quadrant multiplier adopted from [3]. (a) One computing cell of the modified Chible multiplier. (b) The full 4-quadrant multiplier circuit composed of two computing cells.
where K_N and K_P are constants that depend on the sizes of the differential pairs M4-M5 and M6-M7, respectively.
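Unlike Eq.(4.1), both factors in Eq.(4.3) are referenced to externally set zeros, so the product can take either sign in all four quadrants. A minimal sketch, assuming Eq.(4.3) reads I_out = K_N(W_i − W_ref)(S_i − S_ref) for W_i > W_ref and K_P(W_i − W_ref)(S_i − S_ref) for W_i < W_ref, with positive K_N and K_P; the numerical values of K_N and K_P and the function name are hypothetical, while the 0.75 V references are those used in Sec.4.2.3.

```python
def chible_4q_iout(w_i: float, s_i: float, w_ref: float, s_ref: float,
                   k_n: float, k_p: float) -> float:
    """Four-quadrant multiplier output current from Eq.(4.3)."""
    # Above the weight reference the K_N branch applies, below it the K_P branch;
    # at W_i = W_ref both branches give zero, so the choice there is immaterial.
    k = k_n if w_i > w_ref else k_p
    return k * (w_i - w_ref) * (s_i - s_ref)


if __name__ == "__main__":
    W_REF = S_REF = 0.75          # reference voltages used in Sec.4.2.3 (V)
    K_N, K_P = 40e-6, 25e-6       # hypothetical constants (A/V^2)
    for w in (0.0, 0.75, 1.5):
        row = [chible_4q_iout(w, s, W_REF, S_REF, K_N, K_P) for s in (0.55, 0.75, 0.95)]
        print(f"W = {w:.2f} V:", ", ".join(f"{i * 1e6:+.2f} uA" for i in row))
```

Because K_N and K_P generally differ, the gain seen by S_i is asymmetric about W_ref, which is worth keeping in mind when mapping trained weights onto the hardware.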
4.2.2 Circuit Implementation 
The 4-quadrant multiplier was implemented in an artificially-modelled 35 nm CMOS technology. M1, M4, M5, M6 and M7 in both cell A and cell B are modelled as noisy MOSFETs, to introduce noise into the signal path without compromising the performance of the multiplier, while the current mirrors remain noise-free. The effect of different numbers of traps was explored by implementing 4-quadrant multipliers based on 1-trap and 10-trap noisy MOSFETs, giving a total of 10 and 100 active traps, respectively. In addition, a 4-trap 4-quadrant multiplier (i.e. only 4 active traps in the full 4-quadrant multiplier) was implemented to explore the effect of a small number of traps on the output noise.
4.2.3 Simulation Results and Discussion 
The 4-quadrant multiplier has been simulated using the SPECTRE simulator with BSIM3v3 version 3.1 models. V_DD was set to 1.5 V, W_ref to 0.75 V and S_ref to 0.75 V. Using transient analysis, the circuit was simulated by sweeping the weight voltage W_i from 0 V to 1.5 V for each input voltage S_i, which was varied between 0.55 V and 0.95 V with a 0.05 V step size. The output current I_out without injected noise is shown in Fig.4.7 and matches the behaviour predicted by Eq.(4.3).
Figure 4.7: 35 nm CMOS technology 4-quadrant multiplier output current I_out.
To investigate the time-domain noise characteristics, the noisy 4-quadrant multiplier was simulated using fixed-bias transient analysis. For this analysis, W_i is set to values within the range [0, 1.5] V with a 0.15 V step size, while S_i is set to values within the range [0.55, 0.95] V with a 0.05 V step size. For each combination of W_i and S_i, the multiplier was simulated for 250 ms with a 1 μs time step. In order to observe significant trap activity (capture and emission) within this 'short' simulation time, the values of the trap cross-section pre-factor σ_0 in the mean capture time (Eq.(2.3)) and mean emission time (Eq.(2.4)) models described in Chapter 3 were increased arbitrarily. An example of the time-domain output noise of the noisy 4-quadrant multiplier
is shown in Fig.4.8. The 4-trap implementation (Fig.4.8(a)) produces distinct levels of noise amplitude, while the 10-trap and 100-trap implementations (Fig.4.8(b) and Fig.4.8(c), respectively) produce a 'continuous' range of noise amplitudes, as expected from the discussion in Sect.2.2.4. The corresponding PSD plots shown in Fig.4.9 confirm the earlier finding that a large number of traps produces noise closer to the 1/f characteristic. The peculiar up-turn observed at high frequency for the 100-trap implementation is attributed to aliasing during PSD generation.
Figs.4.10(a-c) show the output noise amplitude for three different input combinations of the 4-trap 4-quadrant multiplier. These results are typical examples of the output noise amplitude variation caused by changing bias conditions.
Combining the normalised noise amplitude data for all input combinations produces the distribution shown in Fig.4.11(a). To see how the noise amplitudes compare with Gaussian noise, a histogram-fit curve of Gaussian noise with u = 0.1 generated









Figure 4.8: The output current I_out for (a) 4-trap, (b) 10-trap and (c) 100-trap 4-quadrant noisy multipliers simulated for 250 ms with a 1 μs time step.









Figure 4.9: The power spectral densities (PSDs) of the time-domain noise data shown in Fig.4.8, generated using the periodogram function in Matlab.
mentation (Fig.4.1 1(b) and (c)). It is important to note that noise amplitudes are grouped as 
16 levels only¹ for all RTS implementations (4, 10 and 100 traps), in order to visualise and compare the output noise of the different implementations as histograms. In general, the noise amplitude distribution follows a Gaussian distribution and, as expected, the 100-trap output noise amplitude distribution is almost Gaussian. This is consistent with the central limit theorem of mathematical statistics, which states that the superposition of many independent random contributions tends towards a Gaussian distribution [65]. Based on the discussion in Sect.2.2.4, the large number of traps (i.e. 100 traps) also gives rise to 1/f noise.
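The trend towards a Gaussian amplitude distribution can be quantified with a simple model, an assumption rather than the circuit simulation used here: N independent traps of equal amplitude, each full with probability p = τ̄_e/(τ̄_c + τ̄_e), the long-run fraction of time a two-state trap spends full. The number of full traps is then binomial, and its excess kurtosis (zero for a Gaussian) is (1 − 6p(1 − p))/(Np(1 − p)), which shrinks as 1/N. The function name is hypothetical.

```python
def full_trap_statistics(n_traps: int, p_full: float = 0.5) -> tuple[float, float, float]:
    """Mean, variance and excess kurtosis of the number of full traps.

    Assumes n_traps independent traps of equal amplitude, each full with
    probability p_full, so the count is binomially distributed.
    """
    q = 1.0 - p_full
    mean = n_traps * p_full
    var = n_traps * p_full * q
    excess_kurtosis = (1.0 - 6.0 * p_full * q) / var   # 0 for a Gaussian
    return mean, var, excess_kurtosis


if __name__ == "__main__":
    for n in (4, 10, 100):                 # trap counts used in Sec.4.2
        print(n, full_trap_statistics(n)[2])
```

For p = 0.5 the excess kurtosis is −2/N, i.e. −0.5, −0.2 and −0.02 for the 4-, 10- and 100-trap cases, consistent with the 100-trap distribution being almost Gaussian. Real traps have unequal amplitudes, which this model ignores.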
¹A 10-trap implementation can produce 1024 (2^10) possible levels, whilst a 100-trap implementation can produce 1.27 × 10^30 (2^100) possible levels.
Figure 4.11: Cumulative normalised noisy synaptic multiplier output noise amplitude, implemented with (a) 4 traps, (b) 10 traps and (c) 100 traps.
4.3 Summary
Noisy 2-quadrant and 4-quadrant multipliers have been implemented and simulated using noisy MOSFET models placed at strategic locations, to introduce noise into the products without degrading the original performance of the multiplier itself. Simulation results indicate that the multipliers retain their original performance, although the output becomes noisy. Small numbers of traps produce distinct levels of output noise amplitude while, as the number of traps becomes large, the levels become 'continuous', approximating a 1/f characteristic.
Chapter 5 
Probabilistic Neural Computation 
5.1 Introduction 
Probabilistic (Stochastic) neural computation uses stochasticity to extract and classify im-
portant features in real-world data, in applications such as sensory fusion and classification 
[21-23,31,32]. 
The fundamental element of probabilistic neural computation is the stochastic neuron, which sums its inputs to determine the probability of its output state. This probabilistic relationship gives a probabilistic neural system the ability to model the natural variability of real data. In addition, the probabilistic relationship enhances the system's fault tolerance [3, 24, 32, 66], an important capability for the implementation of robust and reliable systems using future nanoscale MOSFETs.
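A minimal sketch of a generic noise-driven stochastic neuron of the kind described above: the weighted input sum plus zero-mean Gaussian noise, passed through a bounded sigmoid. Gaussian noise here stands in for the 'typical' Gaussian implementation; in this work the noise would instead originate from nanoscale MOSFET noise. All parameter names and values (sigma, a, lo, hi) are illustrative assumptions, not the CRBM equations, which are introduced in Sec.5.2.

```python
import math
import random


def stochastic_neuron(inputs: list[float], weights: list[float], sigma: float,
                      rng: random.Random, a: float = 1.0,
                      lo: float = -1.0, hi: float = 1.0) -> float:
    """One sample from a noise-injected sigmoid neuron (illustrative only)."""
    x = sum(w * s for w, s in zip(weights, inputs))
    x += sigma * rng.gauss(0.0, 1.0)          # injected zero-mean Gaussian noise
    z = max(-60.0, min(60.0, a * x))          # clamp to avoid math.exp overflow
    return lo + (hi - lo) / (1.0 + math.exp(-z))


if __name__ == "__main__":
    rng = random.Random(1)
    # The same input presented five times: the noise makes each output different
    outs = [stochastic_neuron([0.5, -0.2], [0.8, 0.3], sigma=0.5, rng=rng) for _ in range(5)]
    print([round(o, 3) for o in outs])
```

With sigma = 0 the neuron is deterministic; increasing sigma spreads its outputs, which is the noise-controlled probabilistic behaviour the following sections build on.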
The ability to both adapt to and tolerate noise makes probabilistic neural computation at-
tractive for VLSI implementation. However, few probabilistic neural models are hardware-
amenable, and even fewer are capable of modelling continuous-valued (analogue) data. The 
Diffusion Network, for example, has been shown to be able to model analogue data [67,68], but a hardware implementation of this model is impractical because of its plethora of recurrent connections [3]. On the other hand, the PoE/RBM¹ hardware [3,69] has limited ability to model continuous-valued data [3]. The Continuous Restricted Boltzmann Machine, with a simple and unsupervised training algorithm, has been shown to be able to model continuous-valued data [21-24] and is amenable to VLSI implementation [3, 23, 31, 32]. For the purpose of this
study, therefore, the CRBM is chosen to serve as a well-developed and well-understood ex-
perimental platform for the investigation of probabilistic neural computation using nanoscale 
MOSFET noise. The results have, however, wider implications and the methodology devel-
oped is generic. 
¹PoE/RBM is an abbreviation for Product of Experts in the Restricted Boltzmann Machine form.
5.2 Continuous Restricted Boltzmann Machine (CRBM) 
The Continuous Restricted Boltzmann Machine (CRBM) is a probabilistic neural architecture that can model analogue data. It is adapted ('trained') by a simple, unsupervised algorithm based on minimising contrastive divergence [3,23,24,32]. The CRBM is based on Hinton's Product of Experts [70] in Restricted Boltzmann Machine form (PoE/RBM). It comprises continuous stochastic neurons analogous to those of Diffusion Networks, with limited ('restricted') interconnectivity [24,70]. The CRBM has been shown to be amenable to VLSI implementation and to be potentially useful both as a robust classifier and as a "novelty detector" [3,30,31].
The probabilistic behaviour of the CRBM is introduced by its continuous stochastic neurons, whose stochasticity is driven by noise injected at their inputs. The noise inputs cause the neurons to have continuous-valued, probabilistic outputs. Noise-induced stochasticity allows the CRBM to develop diverse neuron behaviours (binary, continuous or deterministic), giving a modelling flexibility that is advantageous with real data. Experiments with artificially generated data and with real biomedical [24] and chemical [21,22] data show that the CRBM can model continuous data successfully with a simple, reliable training algorithm.
5.2.1 General architecture 
The CRBM has one visible and one hidden layer, with only interlayer connections. Fig.5.1 shows an example of a CRBM with 3 visible neurons and 4 hidden neurons. The dark-grey circles ($v_0$ and $h_0$) represent two bias (permanently 'ON') neurons whose outputs are fixed at 1. The weight connecting a bias neuron to another neuron therefore acts as that neuron's threshold. The interlayer connections (between visible and hidden neurons) are bidirectional and symmetric ($w_{ij} = w_{ji}$). The vector $\mathbf{w}^{(i)}$ denotes the weight vector of hidden neuron $i$.
Figure 5.1: CRBM network with 3 visible neurons and 4 hidden neurons.

The restricted architecture enhances the distinctive functions of neurons in the visible and hidden layers [71]. Visible neurons pass data to, and receive data from, the 'outside world' (interface). Normally, the number of visible neurons corresponds to the number of dimensions of the data the CRBM must model.

Hidden neurons serve a different function. Each hidden neuron represents an 'expert' whose weight vector encodes a particular feature of the input data. The number of hidden neurons therefore depends on the complexity of the features the CRBM needs to model. Too few hidden neurons may result in poor modelling capability. However, increasing the number of hidden neurons (at the cost of more computation and a larger network) does not necessarily improve that capability [3,24]. For each given case, the number of hidden neurons is chosen empirically to maximise modelling ability while minimising the number of free parameters [24].
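The layer structure and the role of the bias units can be made concrete with a small data structure. The sketch below reflects an assumed representation, not the thesis's Matlab code:

- all interlayer weights, including the bias connections, sit in one matrix;
- row 0 belongs to the bias unit $v_0$ and column 0 to the bias unit $h_0$;
- the same matrix is used in both directions, reflecting the symmetry $w_{ij} = w_{ji}$.

Column $j$ of the matrix is then the weight vector of hidden neuron $j$. The function names are hypothetical.

```python
import numpy as np


def init_crbm_weights(n_visible, n_hidden, rng, scale=0.1):
    """Weights of a CRBM with n_visible data-carrying visible units and
    n_hidden hidden units, plus one bias unit in each layer.

    W[i, j] is the symmetric weight w_ij = w_ji between visible unit i and
    hidden unit j; row 0 is v0 (bias) and column 0 is h0 (bias)."""
    W = scale * rng.standard_normal((n_visible + 1, n_hidden + 1))
    W[0, 0] = 0.0          # assumed: the two bias units are not connected
    return W


def with_bias(states):
    """Prepend the permanently 'ON' bias unit, whose output is fixed at 1."""
    return np.concatenate(([1.0], np.asarray(states, dtype=float)))


rng = np.random.default_rng(0)
W = init_crbm_weights(n_visible=2, n_hidden=3, rng=rng)   # 3 x 4, as in Fig.5.1
v = with_bias([0.2, -0.4])
hidden_input = v @ W               # sum_i w_ij v_i for every hidden unit j
h = with_bias([0.5, -0.1, 0.9])
visible_input = W @ h              # the same weights used in the reverse direction
weight_vector_h3 = W[:, 3]         # weight vector of hidden neuron h3
```

Because $v_0$ is fixed at 1, the term `W[0, j]` enters every hidden input unchanged. This is exactly the threshold role described for the bias neurons.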
5.2.2 A continuous stochastic neuron 
The CRBM employs continuous-valued stochastic neurons (Fig.5.2). Let $s_i$ be the output of neuron $i$, which receives inputs from neurons with states $\{s_j\}$ through a weight matrix $\{w_{ij}\}$. The behaviour of neuron $i$ is [32]:

$$s_i = \varphi_i\Big(\sum_j w_{ij} s_j + n_i\Big) \qquad (5.1)$$

with

$$\varphi_i(x) = \theta_L + (\theta_H - \theta_L)\,\frac{1}{1 + \exp(-a_i x)} \qquad (5.2)$$






Figure 5.2: Continuous-valued CRBM neuron.
where $\varphi_i(\cdot)$ is a sigmoid function with asymptotes at $\theta_H$ and $\theta_L$. The parameter $a_i$ controls the slope of $\varphi_i(\cdot)$, and thus the nature of the neuron's stochastic behaviour [32]. If $a_i$ is high, the neuron is essentially a binary-stochastic 'decision-maker'. If $a_i$ is low, the neuron is more or less deterministic (i.e. comparable to a Multi-Layer Perceptron unit). Between those extremes, the neuron is able to model the noise and variability present in all real data. In the 'perfect' CRBM, $n_i$ is zero-mean Gaussian noise:

$$n_i = \sigma \cdot N_i(0,1), \qquad p(n_i) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{n_i^2}{2\sigma^2}\right) \qquad (5.3)$$

where $\sigma$ represents a 'noise'-scaling constant and $N_i(0,1)$ represents a Gaussian variable ('noise') with zero mean and unit variance.
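Eqs.(5.1) to (5.3) can be read as a three-step procedure:

1. form the weighted sum of the inputs;
2. add scaled Gaussian noise;
3. squash the result with the sigmoid.

The Python sketch below assumes asymptotes $\theta_L = -1$ and $\theta_H = 1$. These match the [-1, 1] range of the data plotted later, but the thesis does not state them here. The sketch shows how a large slope $a_i$ pushes the noisy output towards the asymptotes, giving near-binary behaviour. The function names are hypothetical.

```python
import numpy as np


def phi(x, a, theta_l=-1.0, theta_h=1.0):
    """Eq.(5.2): sigmoid with asymptotes theta_l, theta_h and slope a."""
    return theta_l + (theta_h - theta_l) / (1.0 + np.exp(-a * x))


def crbm_neuron(weights, inputs, a, sigma, rng):
    """Eq.(5.1) with the 'perfect' noise of Eq.(5.3): n_i = sigma * N(0, 1)."""
    x = np.dot(weights, inputs) + sigma * rng.standard_normal()
    return phi(x, a)


rng = np.random.default_rng(0)
w, s = np.zeros(3), np.array([1.0, 0.3, -0.6])   # zero weights: noise alone drives the output
for a in (0.5, 20.0):
    out = np.array([crbm_neuron(w, s, a, sigma=0.5, rng=rng) for _ in range(2000)])
    print(f"a = {a}: fraction of outputs beyond +/-0.9 = {np.mean(np.abs(out) > 0.9):.2f}")
```

With the same noise level, the high-slope neuron spends most of its time near the asymptotes. The low-slope neuron's outputs stay close to the middle of the range.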
5.2.3 CRBM training 
The CRBM is trained by adapting both the weight $\{w_{ij}\}$ and noise-control $\{a_i\}$ parameters, minimising contrastive divergence (MCD) between the training data and the one-step Gibbs-sampled data [72]. During training, the visible neurons are clamped with training data to produce $\{v_i\}$. The hidden neuron states $\{h_j\}$ are then sampled according to Eq.(5.1). One-step Gibbs-sampled data are derived by repeating this procedure, so that the visible and hidden neurons are sampled once more to produce $\{\hat{v}_i\}$ and $\{\hat{h}_j\}$. Fig.5.3 illustrates the one-step Gibbs sampling of a CRBM with two visible and three hidden neurons.

Figure 5.3: One-step Gibbs sampling, panels (a), (b) and (c).

The weights $\{w_{ij}\}$ and noise-control parameters $\{a_i\}$ of the CRBM are updated according to the following simplified² MCD training rules [3,32]:
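$$\Delta w_{ij} = \eta_w\left(\langle v_i h_j \rangle - \langle \hat{v}_i \hat{h}_j \rangle\right) \qquad (5.4)$$

$$\Delta a_i = \frac{\eta_a}{a_i^2}\left(\langle s_i^2 \rangle - \langle \hat{s}_i^2 \rangle\right) \qquad (5.5)$$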
where $v_i$ and $h_j$ refer to the states of visible neuron $i$ and hidden neuron $j$ respectively, and $s_i$ represents both $v_i$ and $h_j$. $\eta_w$ and $\eta_a$ are constants defining the learning rates of $w_{ij}$ and $a_i$ respectively. The brackets $\langle\cdot\rangle$ in Eq.(5.4) and Eq.(5.5) denote the expectation value over all training data.
The learning rates for the visible noise-control parameters ($\eta_{av}$), the hidden noise-control parameters ($\eta_{ah}$) and the weights ($\eta_w$) are determined empirically, depending on the complexity of the training data. Typically, $\eta_{av}$ is set roughly an order of magnitude (or more) larger than $\eta_{ah}$ and $\eta_w$. This encourages faster adaptation of the visible-layer $\{a_i\}$ than of the weights $w_{ij}$ and the hidden noise-control parameters $\{a_j\}$, so that the visible layer can model the detail of the training data distribution [3,21,22].
²The training algorithms are simplified to enable a CRBM implementation with only multiplication and addition/subtraction [23,30].
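The training procedure described in this section can be summarised in code. The NumPy sketch below performs one simplified MCD update, in the form of Eqs.(5.4) and (5.5), on a batch of data:

1. clamp the data on the visible layer;
2. sample the hidden layer;
3. perform the one-step Gibbs reconstruction;
4. update the weights and noise-control parameters from the difference of the two correlation terms.

In Eq.(5.5), $\eta_a$ is taken as $\eta_{av}$ for the visible layer and $\eta_{ah}$ for the hidden layer. The weight-matrix layout (row and column 0 for the bias units), the ±1 asymptotes and all function names are assumptions of this sketch rather than details taken from the thesis.

```python
import numpy as np


def phi(x, a):
    """Eq.(5.2) with assumed asymptotes theta_L = -1, theta_H = 1."""
    return -1.0 + 2.0 / (1.0 + np.exp(-a * x))


def sample_hidden(v, W, a_h, sigma, rng):
    """Sample hidden states for a batch v of shape (n_data, n_visible + 1)."""
    x = v @ W + sigma * rng.standard_normal((v.shape[0], W.shape[1]))
    h = phi(x, a_h)
    h[:, 0] = 1.0                      # bias unit h0 stays 'ON'
    return h


def sample_visible(h, W, a_v, sigma, rng):
    """Sample visible states for a batch h of shape (n_data, n_hidden + 1)."""
    x = h @ W.T + sigma * rng.standard_normal((h.shape[0], W.shape[0]))
    v = phi(x, a_v)
    v[:, 0] = 1.0                      # bias unit v0 stays 'ON'
    return v


def mcd_update(v, W, a_v, a_h, eta_w, eta_av, eta_ah, sigma, rng):
    """One simplified MCD step; returns updated copies of W, a_v and a_h."""
    n = v.shape[0]
    h = sample_hidden(v, W, a_h, sigma, rng)           # data-driven phase
    v_hat = sample_visible(h, W, a_v, sigma, rng)      # one-step Gibbs sample
    h_hat = sample_hidden(v_hat, W, a_h, sigma, rng)
    # Eq.(5.4): <v_i h_j> - <v^_i h^_j>, averaged over the training batch
    W_new = W + eta_w * (v.T @ h - v_hat.T @ h_hat) / n
    # Eq.(5.5): (eta_a / a_i^2) (<s_i^2> - <s^_i^2>) for each layer
    a_v_new = a_v + eta_av / a_v**2 * ((v**2).mean(0) - (v_hat**2).mean(0))
    a_h_new = a_h + eta_ah / a_h**2 * ((h**2).mean(0) - (h_hat**2).mean(0))
    return W_new, a_v_new, a_h_new


rng = np.random.default_rng(0)
data = np.column_stack([np.ones(50), rng.uniform(-0.5, 0.5, (50, 2))])
W = 0.1 * rng.standard_normal((3, 4))
a_v, a_h = np.ones(3), np.ones(4)
for epoch in range(100):
    W, a_v, a_h = mcd_update(data, W, a_v, a_h, 0.3, 10.0, 1.0, 0.1, rng)
```

The update uses only products and differences of neuron states, apart from the $1/a_i^2$ scaling. This is consistent with the footnoted aim of an implementation built from multiplication and addition/subtraction. A 20-step reconstruction, as used in Chapter 6, amounts to alternating `sample_hidden` and `sample_visible` twenty times from random initial states.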









Figure 5.4: Typical CRBM neuron circuit implementation. 
5.2.4 CRBM in VLSI 
A full CRBM system implemented in VLSI can reconstruct a variety of continuous data distributions [3]. Its major improvement over previous probabilistic neural hardware implementations [66,73,74] is the use and training of the continuous stochastic neuron. As this element of the CRBM also incorporates the noise inputs, it deserves a detailed description.
The CRBM's neuron circuit [3] is shown in Fig.5.4. The outputs of the four-quadrant analogue multipliers are summed into a current $I_{sum}$ representing $\sum_j w_{ij} s_j$. This current is then thresholded and transformed by the sigmoid-function block to produce the output voltage $V_{s_i}$. The probabilistic behaviour of the CRBM's neuron is driven by injecting a noise current $I_{n_i}$ into $I_{sum}$. The resulting current has a stochastic behaviour controlled by the input voltage $V_{a_i}$, as depicted in Fig.5.4.
In the VLSI CRBM, noise is generated on-chip and injected directly into the CRBM neuron. 
However, [3] found that, unsurprisingly, noise generators can interfere with the analogue ref-
erences, introducing extra computational errors. In addition, in large CRBM networks, the 
implementation of multiple, uncorrelated noise sources on- or off-chip becomes impractical, 
as unwanted correlations, reliability, area and power problems increase. One solution is to localise fluctuations, introducing them only into the neurons and not into the deterministic training circuit. This points to the use of intrinsic MOSFET noise to replace externally generated noise.
5.3 Summary 
The CRBM has demonstrated a promising modelling ability and a hardware amenability suitable for realising intelligent embedded systems. The modelling ability is attributed to the noise-induced, continuous-valued probabilistic behaviour of the CRBM neuron. The hardware amenability comes from the simple training algorithm employed by the CRBM.
The incorporation of artificially generated noise is the distinctive feature of the CRBM. Noise is added to the deterministic signal in the CRBM neuron to produce a probabilistic output, which is used to develop diverse stochastic behaviour in the CRBM.
While a full CRBM system implemented in VLSI can reconstruct a variety of continuous data 
distributions, the problems associated with on- or off-chip noise generation limit the robust-
ness and flexibility of the CRBM VLSI system [3]. This leads to the idea of using intrinsic 
MOSFET noise to introduce noise localisation into each neuron, which would potentially 
alleviate the problems. 
Chapter 6 
Noise in the CRBM 
6.1 Introduction 
The 'Perfect CRBM' neurons use zero-mean Gaussian noise, injected from an external source into the pre-sigmoid sum of synaptic products, as shown in Fig.5.2. This artificially generated noise (Eq.(5.3)) causes the neurons to have continuous-valued, probabilistic outputs. The injected noise variance (and hence the effective maximum noise magnitude) is controlled by the global noise-scaling constant $\sigma$. A small $\sigma$ leads to over-fitting, and the CRBM system becomes near-deterministic. A large $\sigma$ results in complete loss of modelling capability, because the injected noise dominates [32]. The optimal value of $\sigma$ depends upon the data distribution to be modelled. A useful rule of thumb, drawn from several CRBM-modelling projects, is to set $\sigma$ for visible neurons close to the standard deviation of the data to be modelled, and $\sigma$ for hidden neurons to intermediate values between 0.4 and 0.6 [21,22,24].
In a VLSI implementation, the noise generator circuit must deliver uncorrelated noise. Otherwise, the correlation will be 'detected' by the training rule and will subsequently introduce training errors [3,66]. An efficient on-chip implementation of a multiple uncorrelated noise generator circuit has been reported in [3,66], but this approach becomes infeasible as the network size increases.
The externally generated Gaussian noise could, in principle, be replaced by intrinsic nanoscale 
MOSFET noise. Intrinsic nanoscale MOSFET noise is unlikely to be Gaussian with a mean 
of zero. It is therefore crucial to investigate and to understand how the CRBM's performance 
is affected by non-ideal noise characteristics. Additionally, the possible locations in the neu-
ron architecture for the MOSFET noise source(s) have been investigated. 
[Figure 6.1, panels (a) and (b): horizontal axis $s_1$ (-1 to 1).]
Figure 6.1: (a) Training data. (b) 20-step reconstruction of the CRBM with zero-mean Gaussian noise.
6.2 Zero-mean Gaussian noise in CRBM 
As a benchmark, a (Matlab) CRBM network (Fig.5.1) with zero-mean Gaussian noise is used to model two well-separated clusters of 200 data points, shown in Fig.6.1(a). Artificially generated zero-mean Gaussian noise is injected into the pre-sigmoid input of each neuron. The CRBM is trained for 5000 epochs with $\eta_w = 0.3$, $\eta_{av} = 10$ for visible neurons, $\eta_{ah} = 1$ for hidden neurons, and $\sigma = 0.1$. Fig.6.1(b) shows the reconstruction of the trained CRBM by Gibbs sampling from 200 random initial data points for 20 steps¹.

To reveal the contribution of each hidden neuron to the distribution of the reconstructed data, the hidden neurons' weight vectors are projected into the visible neurons' state space. This is done by passing $\{\mathbf{w}^{(j)}\}$ through the sigmoid function $\varphi(\cdot)$ in Eq.(5.1), i.e. $\mathbf{r}^{(j)} = \varphi(\mathbf{w}^{(j)})$, as shown in Fig.6.2. The projected weight vectors are treated as continuous-valued outputs of visible neurons. Fig.6.2 shows that the hidden bias unit $h_0$ and hidden neurons $h_1$ and $h_2$ do not contribute to the reconstruction. In contrast, the weight vector $\mathbf{r}^{(3)}$ suggests that hidden neuron $h_3$ encodes the (symmetric) positions of the training data clusters.
¹The reconstruction of the trained CRBM network by Gibbs sampling from any given number of random initial data points for 20 steps will be referred to as 20-step reconstruction, unless clearly stated otherwise.









Figure 6.2: (a) Weight vector for bias neuron (b) Weight vector for hidden neurons. 
In this model, the 'scatter' in the data is entirely attributable to the injected noise, not to any contribution from hidden neurons $h_1$ and $h_2$.
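The weight-vector projection $\mathbf{r}^{(j)} = \varphi(\mathbf{w}^{(j)})$ used for Fig.6.2 can be sketched as follows. The sketch makes three assumptions:

- the weights are stored in a matrix whose row 0 and column 0 hold the bias units;
- the sigmoid has asymptotes of ±1;
- each component of $\mathbf{w}^{(j)}$ passes through the sigmoid of the corresponding visible neuron, with that neuron's $a_i$.

The thesis text does not say which slope is used, and the function name is hypothetical.

```python
import numpy as np


def project_weight_vectors(W, a_v):
    """Return r^(j) = phi(w^(j)) for every hidden unit j (including h0).

    W:   (n_visible + 1, n_hidden + 1) weights, row 0 = visible bias v0.
    a_v: (n_visible + 1,) visible noise-control parameters.
    Row j of the result is a point in the visible state space."""
    w_to_visible = W[1:, :]                    # drop the v0 row
    slopes = a_v[1:, None]                     # one slope per visible neuron
    r = -1.0 + 2.0 / (1.0 + np.exp(-slopes * w_to_visible))
    return r.T


W = np.array([[0.0, 0.1, -0.2, 0.3],
              [0.4, 0.0, 0.0, 2.0],
              [-0.4, 0.0, 0.0, -2.0]])
print(project_weight_vectors(W, a_v=np.array([1.0, 1.0, 1.0])))
```

In this toy matrix, the hidden neuron in column 1 has zero weights and projects onto the origin, so it contributes nothing to the reconstruction. The neuron in column 3 projects towards a corner of the state space, in the same way that $\mathbf{r}^{(3)}$ marks the cluster positions in Fig.6.2.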
Fig.6.3(a) and Fig.6.3(b) show the evolution of $\{a_i\}$ during training for the visible and hidden neurons, respectively. Fig.6.3(a) shows $\{a_i\}$ evolving in a form of "autonomous annealing" [24,32]: the noise level in the visible neurons is gradually reduced, making them near-deterministic. Fig.6.3(b) shows similar behaviour for hidden neurons $h_1$ and $h_2$, while hidden neuron $h_3$ clearly performs a different function. The large $a_3$ of hidden neuron $h_3$ suggests that this neuron behaves in a near-binary fashion, allowing the two clusters to be modelled. For this simple training data, hidden neurons $h_0$ to $h_2$ can be removed without affecting the reconstruction quality.
6.3 Non-zero mean Gaussian noise in CRBM 
Intrinsic nanoscale MOSFET noise may not be Gaussian with a mean of zero [4,39]. It is 
therefore crucial to investigate and to understand how the CRBM's performance is affected 
by this form of non-ideal noise characteristic. 
Before MOSFET noise is injected, the CRBM's ability to 'cope' with non-ideal noise will be demonstrated in a pure software model. With non-zero mean Gaussian noise, Eq.(5.1) becomes:

$$s_i = \varphi_i\Big(\sum_j w_{ij} s_j + n_{x,i}\Big) \qquad (6.1)$$

where $n_{x,i}$ represents a non-zero mean Gaussian noise source with mean $\bar{n}_x$, whose distribution is

$$P(n_{x,i}) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(n_{x,i} - \bar{n}_x)^2}{2\sigma^2}\right) \qquad (6.2)$$

[Figure 6.3: horizontal axis 'Training epoch' (0 to 5000).]
Figure 6.3: Evolution of the noise-control parameters of (a) visible neurons and (b) hidden neurons with training epoch.
A CRBM (Fig.5.1) with non-zero mean noise was trained to model the data shown in Fig.6.1(a). The CRBM is trained with $\eta_w = 0.3$, $\eta_{av} = 10$ for visible neurons, $\eta_{ah} = 1$ for hidden neurons, $\sigma = 0.1$, and mean $\bar{n}_x = 0.8$. Fig.6.4 shows the 20-step reconstruction after 5000 training epochs, obtained by Gibbs sampling from 200 random initial data points. The simulation is then repeated with $\bar{n}_x = 2$. After 100000 epochs, the CRBM achieves a good 20-step reconstruction. These results indicate that the CRBM's modelling capability remains good, although longer training times are required as the mean of the noise increases.
Figure 6.4: 20-step reconstruction of the CRBM injected with non-zero mean ($\bar{n}_x = 0.8$) Gaussian noise, trained for 5000 epochs.

To investigate how the CRBM responds to the injection of non-zero mean noise, Figs.6.5(a-c) show the CRBM's 20-step reconstructions after 500, 30000 and 40000 training epochs, respectively. Fig.6.6(a-c) shows the evolution during training of the visible and hidden neurons' $\{a_i\}$ and of the hidden-visible weights.

At the initial stage of training, the CRBM performs a crude approximation to the training data, as shown in Fig.6.5(a). Fig.6.6(a) shows large $\{a_i\}$ values for the visible neurons up to 2000 epochs, which inhibit the ability of the weights $\{w_{ij}\}$ to model the correct distribution [24]. As training continues, the $\{a_i\}$ values for the visible neurons are reduced, allowing the weights (Fig.6.6(c)) to adapt to the training data. Fig.6.5(b) shows that the CRBM has learnt the correct cluster separation but is still biased towards the bottom-left corner. After 40000 epochs, the CRBM has modelled the training data well (Fig.6.5(c)). The weight evolution plot (Fig.6.6(c)) demonstrates that the weights $\{w_{ij}\}$ adapt to model the training data. In particular, the hidden-to-visible bias weights ($w_{10}$ and $w_{20}$) adapt to compensate for the large non-zero mean.

As training progresses, hidden neuron $h_1$ becomes near-binary (Fig.6.6(b)), allowing the two clusters to be modelled clearly. The other hidden neurons encode the variance of the clusters in different directions. These results suggest that the large mean does not affect the CRBM's ultimate performance; the CRBM simply takes longer to converge.
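A one-line calculation shows why a bias weight can absorb the noise mean. Use Eqs.(6.1) and (6.2), and note that the hidden bias unit $h_0$ has a fixed output of 1. For fixed hidden states, the expected pre-sigmoid input of visible neuron $i$ is then

$$\mathbb{E}\Big[\sum_{j} w_{ij} h_j + n_{x,i}\Big] = w_{i0} + \sum_{j \ge 1} w_{ij} h_j + \bar{n}_x .$$

Let $w_{i0}^{\ast}$ denote the value the bias weight would take with zero-mean noise, and suppose the bias weight settles at $w_{i0} = w_{i0}^{\ast} - \bar{n}_x$. The expectation then equals that of the zero-mean CRBM, while the fluctuation about it is still governed by $\sigma$:

$$\big(w_{i0}^{\ast} - \bar{n}_x\big) + \sum_{j \ge 1} w_{ij} h_j + n_{x,i} = w_{i0}^{\ast} + \sum_{j \ge 1} w_{ij} h_j + \big(n_{x,i} - \bar{n}_x\big), \qquad n_{x,i} - \bar{n}_x \sim \mathcal{N}(0, \sigma^2).$$

The same argument applies to the hidden neurons through the weights $w_{0j}$ from the visible bias unit $v_0$. This is only a sketch of why compensation is possible. It is consistent with the bias-weight behaviour described above, but it does not by itself predict how long the adaptation takes.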
Figure 6.5: 20-step reconstruction by the CRBM injected with non-zero mean ($\bar{n}_x = 2$) Gaussian noise (a) after 500 training epochs, (b) after 30000 epochs and (c) after 40000 epochs.

Figure 6.6: $\{a_i\}$ for (a) visible and (b) hidden neurons, and (c) hidden-visible weight evolution during training (horizontal axis 'Training epochs', 0 to 100000). $w_{01}$ and $w_{02}$ in (c) indicate the weights from the hidden bias neuron to the visible neurons.

Finally, the CRBM with non-zero mean noise was used to model the non-symmetric training data shown in Fig.6.7(a). This training data is particularly interesting because it tests two abilities: regenerating a non-symmetric distribution, and modelling the probability distribution of the training data correctly. For this experiment, both $\eta_{ah}$ and $\eta_{av}$ were set to 7 to accelerate training, with $\eta_w = 0.3$ and $\bar{n}_x = 2$. After 10000 training epochs, the CRBM reconstructed data points by Gibbs sampling from 200 random initial data for 20 steps, as shown in Fig.6.7(b). These results indicate that the CRBM is able to adapt to the non-zero mean noise and generate data with the correct position and probability. When trained with 50 non-symmetric training data points drawn from a distribution similar to that in Fig.6.7(a), the CRBM with non-zero mean noise was still able to generate the correct data position and distribution.
6.4 Non-Gaussian 'Pseudo-RTS' noise in the CRBM 
The CRBM system has shown a good modelling ability by adapting its 'internal' noise (Gaus-
sian distribution) to model an input data distribution. 
Fig.6.8 shows the amplitude histogram of an artificial non-Gaussian noise source generated in Matlab to mimic RTS noise. This noise is characterised by the separation $x$ and the distribution variance $\sigma$. It is a very crude representation of real nanoscale MOSFET noise, but it gives a rough indication of whether nanoscale MOSFET noise would work in a CRBM. Chapter 7 discusses a CRBM implementation with real nanoscale MOSFET noise.
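A noise source with the two-peaked amplitude histogram described above can be generated in a few lines. The thesis does not give its Matlab generator, so the sketch below is a hypothetical stand-in. It assumes:

- the two peaks sit at 0 and $x$, as for a trap that is either empty or filled;
- each peak is equally likely;
- each peak has a Gaussian spread $\sigma$.

It reproduces only the amplitude histogram, not the time correlation of a real RTS.

```python
import numpy as np


def pseudo_rts_noise(size, x, sigma, rng, p_high=0.5):
    """Hypothetical pseudo-RTS amplitudes: a mixture of N(0, sigma^2) and
    N(x, sigma^2), choosing the upper peak with probability p_high."""
    level = np.where(rng.random(size) < p_high, x, 0.0)
    return level + sigma * rng.standard_normal(size)


rng = np.random.default_rng(0)
n = pseudo_rts_noise(100_000, x=2.0, sigma=0.1, rng=rng)
counts, edges = np.histogram(n, bins=50)      # two well-separated peaks
print(f"mean = {n.mean():.3f}")
```

Under these assumptions, the source has mean $x/2$ as well as a non-Gaussian shape. The slow convergence seen with non-zero mean Gaussian noise in Sec.6.3 is therefore also to be expected here.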
A CRBM injected with non-Gaussian noise ($x = 2$ and $\sigma = 0.1$) is trained to model the two data sets shown in Fig.6.1(a) and Fig.6.7(a), separately. Based on the non-zero mean Gaussian CRBM results, the separation $x$ is expected to cause slow convergence. The learning rates in this experiment are therefore set to $\eta_w = 0.3$, $\eta_{av} = 7$ and $\eta_{ah} = 7$. The CRBM is trained for 30000 epochs, and Figs.6.9(a) and (b) show the 20-step reconstructions of the trained CRBM. In general, the CRBM models the data cluster separation and distribution correctly. However, it also generates the paired-cluster distributions shown in Figs.6.9(a) and (b). These peculiar reconstruction results are attributed to the non-Gaussian noise.
Figure 6.7: (a) Non-symmetric distribution training data. (b) 20-step reconstruction of the CRBM injected with non-zero mean ($\bar{n}_x = 2$) Gaussian noise after 10000 epochs.

Figure 6.8: Artificially generated non-Gaussian noise (amplitude histogram; amplitude in arbitrary units).

A simple experiment showed that reducing $x$ to zero makes the additional clusters disappear, and that the clusters become more prominent as $x$ increases. These analyses confirm that the non-Gaussian noise distribution causes the extra clusters, and that the clusters are therefore unavoidable.
6.5 CRBM with noise in Multiplier 
It was decided to localise noise in the synaptic multipliers, as shown in Fig.5.2. The multiplier is the most-repeated circuit in most architectures of this form, and is thus the most likely candidate for the use of extremely small (nanoscale) devices. In future work, this analysis will be extended to other elements of this architecture and to alternative neural architectures.
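Because addition and multiplication are both linear operations, the noise term in Eq.(5.1) can be distributed across the synaptic products. Eq.(5.1) can thus be re-arranged to form Eq.(6.3):

$$s_i = \varphi_i\Big(\sum_j \big(w_{ij} s_j + n_{ij}\big)\Big) \qquad (6.3)$$

where $n_{ij}$ is the noise associated with the synaptic multiplier connecting neuron $j$ to neuron $i$.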
Figure 6.9: 20-step reconstruction of the CRBM injected with non-Gaussian noise after 10000 epochs, for (a) symmetric distribution and (b) non-symmetric distribution training data.

Figure 6.10: CRBM neurons with localised noise in the synaptic multipliers.

To demonstrate that the CRBM is not affected by this change, artificially generated zero-mean Gaussian noise is injected into the synaptic multipliers of a CRBM network with two visible and three hidden units, as shown in Fig.6.10. The CRBM is trained on the data shown in Fig.6.1(a) with $\eta_w = 0.3$, $\eta_{av} = 10$ for visible neurons, $\eta_{ah} = 1$ for hidden neurons, and $\sigma = 0.1$. Fig.6.11(a) shows the 20-step reconstruction after 5000 training epochs. The distribution is rather dispersed, indicating that the overall noise level is too high. The experiment is repeated with smaller values of $\sigma$, 0.05 and 0.01, and the results are shown in Fig.6.11(b) and Fig.6.11(c), respectively. $\sigma = 0.05$ gives the best reconstruction, while the CRBM with $\sigma = 0.01$ shows only a rough reconstruction of the training data. With further training (10000 epochs), the CRBM with $\sigma = 0.01$ performs similarly to the CRBM with $\sigma = 0.05$. For a perfect CRBM, by contrast, noise with $\sigma = 0.01$ is unable to produce the correct reconstruction. These results indicate that injecting noise into the synaptic multipliers allows smaller noise magnitudes to be used, owing to the superposition effect as many synapses add their inputs to the receiving neuron's activity.
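The superposition argument can be checked with a short simulation. Assume the per-synapse noise terms $n_{ij}$ of Eq.(6.3) are independent, zero-mean and of equal standard deviation $\sigma$. Their sum at a neuron with $N$ synaptic inputs then has standard deviation $\sigma\sqrt{N}$, so a smaller per-synapse $\sigma$ yields the same total noise as a larger single injected source. The NumPy sketch below estimates this. It does not model the correlation that a shared noise source would introduce; fully correlated terms would instead add up to $N\sigma$.

```python
import numpy as np


def effective_noise_std(n_synapses, sigma, trials, rng):
    """Empirical standard deviation of the summed synaptic noise
    sum_j n_ij, with n_ij independent and N(0, sigma^2)."""
    per_synapse = sigma * rng.standard_normal((trials, n_synapses))
    return per_synapse.sum(axis=1).std()


rng = np.random.default_rng(0)
for n_syn in (1, 3, 4):
    est = effective_noise_std(n_syn, sigma=0.05, trials=200_000, rng=rng)
    print(n_syn, round(est, 4), round(0.05 * np.sqrt(n_syn), 4))
```

The estimate tracks $\sigma\sqrt{N}$ closely, which is the sense in which each synapse can carry less noise than a single neuron-level source.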
Figure 6.11: 20-step reconstruction with zero-mean Gaussian noise injected into every synaptic multiplier, with the noise variance set to (a) 0.1, (b) 0.05 and (c) 0.01.

6.6 Summary
The ultimate goal of this study is to show that intrinsic low-frequency nanoscale MOSFET noise can be used to implement a CRBM system. As this noise may not be Gaussian with a mean of zero, it is important to investigate how non-ideal noise characteristics affect the CRBM's performance. As the first step towards this goal, the CRBM system's performance has been evaluated with artificially generated non-Gaussian noise.
The results in Sec.6.4 show that the CRBM can display degraded, but usable, modelling performance when non-Gaussian noise sources with non-zero means are injected into the CRBM synaptic multipliers. The reconstruction results do not match the (essentially Gaussian) training data as faithfully as those obtained with Gaussian noise, but the CRBM still captures the correct separation and distribution. These results suggest that nanoscale MOSFET noise has the potential to be used in a CRBM implementation, as long as the noise magnitude is relatively well matched to the distribution of the data.
The following chapter presents and discusses the implementation of a CRBM system with 
intrinsic low frequency nanoscale MOSFET noise. 
Chapter 7 
CRBM with Nanoscale MOSFET Noise 
The experimental results discussed in Chapter 6 suggest that intrinsic MOSFET noise may 
indeed be able to replace externally-generated Gaussian noise. It is therefore the aim of 
this chapter to explore the potential of intrinsic nanoscale MOSFET noise to produce useful 
probabilistic behaviour of CRBM neurons. 
The chapter first describes the methodology used to inject the noise data generated by the noisy 4-quadrant multiplier of Chapter 4 into the synaptic multiplications of the CRBM neurons. The CRBM system is then trained to model continuous data sampled from a non-symmetric distribution. Finally, the chapter explores how well the system regenerates this continuous data distribution through its noise-induced probabilistic behaviour.
This chapter provides the necessary linkage between nanoscale device physics and proba-
bilistic neural computation from which a preliminary conclusion can be drawn as to whether 
intrinsic nanoscale MOSFET noise can be used in probabilistic computation. 
7.1 Methodology 
The temporal current fluctuations associated with nanoscale MOSFET noise are incorporated into the CRBM through the noisy synaptic multiplications shown in Fig.7.1. A full hardware implementation of a nanoscale-based CRBM system is not yet possible. Instead, the simulated temporal fluctuation of a noisy synaptic multiplier output is incorporated, drawing on values stored in a look-up table (LUT). The LUT represents both the multiplier's functionality and the noise associated with its deep-sub-micron (DSM) MOSFETs. The synaptic multiplications are implemented using the noisy multipliers discussed in Chapter 4.
Figure 7.1: CRBM neuron with noisy synaptic multiplication.

The general method for incorporating noise data into CRBM neurons is described by Fig.7.2. The product of $\{w_{ij}\}$ and $\{s_j\}$ is added to the corresponding noise datum extracted from a LUT that stores time-domain noise data of a noisy synaptic multiplier output. Ideally, each multiplier would have its own LUT holding its unique output noise data.

For a network with 3 visible and 4 hidden neurons, 3 multipliers are required for the synaptic multiplication of each hidden neuron, and 4 multipliers for each visible neuron. The ideal case therefore needs 12 multipliers, and thus 12 LUTs of time-domain noise data. Implementing this number of LUTs in Matlab is not practical, as it requires very large memory usage and a consequently unacceptably long simulation time. For the purpose of this study, therefore, only one LUT is used for the noisy synaptic multiplication of all CRBM neurons. To differentiate between multipliers, the start point in the time-domain noise data is shifted at random for each multiplier.

It is acknowledged that, as a result, the noise data injected into different synaptic multiplications can be correlated. This is acceptable for the purpose of this study, because the main investigation concerns the effect of the form of the DSM noise on CRBM performance. This limitation is not believed to cast any doubt on the thesis conclusions.
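The single-LUT compromise can be sketched as follows. Each synaptic multiplier $(i, j)$ is given its own random start row in one shared noise trace. Its noise at epoch or step $n$ is read $n - 1$ rows further on. The helper names, the wrap-around at the end of the trace, and the use of Python rather than Matlab are assumptions made for illustration. Two multipliers whose start rows differ by $k$ read the same sequence shifted by $k$ samples, which is the source of the correlation acknowledged above.

```python
import numpy as np


def assign_start_rows(n_visible, n_hidden, trace_len, rng):
    """One random start row per synaptic multiplier (i, j) in the shared trace."""
    return rng.integers(0, trace_len, size=(n_visible, n_hidden))


def shared_trace_noise(trace, start_rows, i, j, n):
    """Noise for multiplier (i, j) at epoch/step n (counted from 1).
    Indices wrap around at the end of the trace (an assumption)."""
    return trace[(start_rows[i, j] + n - 1) % len(trace)]


rng = np.random.default_rng(0)
trace = rng.standard_normal(10_000)        # stands in for one stored noise record
starts = assign_start_rows(3, 4, len(trace), rng)
noise_now = shared_trace_noise(trace, starts, i=1, j=2, n=5)
```

Storing one trace plus a small table of start rows replaces twelve separate noise records, which is the memory saving that motivates the compromise.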
7.1.1 Noise data in look-up table 
The noisy multiplier was designed with 35nm atomistic-based CMOS technology operating from a 1.5V supply (see Chapter 4). The multiplier inputs are the weight voltage $V_w$ and the state input voltage $V_{in}$, confined to [0, 1.5] (V) and [0.55, 0.95] (V) respectively. The noisy multiplier output current fluctuates in the time domain, with noise characteristics inherited from the noisy MOSFETs used in key locations. For the purpose of this work, these time-domain noise data are stored in a LUT, indexed according to $V_w$ and $V_{in}$.

Owing to limited data storage and computation bottlenecks, time-domain noise data can be stored for only a limited number of $V_w \times V_{in}$ combinations. A compromise must therefore be made between accuracy and computation time in representing output noise in Matlab. For this study, the LUT stores multiplier output noise data for $V_w$ = [0:0.15:1.5] (V) and $V_{in}$ = [0.55:0.05:0.95] (V). For values that fall between the stored values, the closest-neighbour approximation is adopted.

Figure 7.2: An illustration of how noisy synaptic multiplication is implemented in Matlab. [Flowchart: Start; locate the normalised noise datum $I_n$ for $w_{ij}$ and $s_j$ in the look-up table; add it to the product to give $(s_j \times w_{ij}) + I_n$; End.]

Parameter          VLSI (hardware)     Software (Matlab)
$s_j$              [0.55, 0.95] (V)    [-1, 1]
$w_{ij}$           [0.0, 1.5] (V)      [-1.5, 1.5]
Noise amplitude    [-1, 1] (pA)        [-1, 1]
Table 7.1: Mapping noisy multiplier input bias voltage (hardware) to Matlab (software).
This study incorporates synaptic multiplier output noise generated in SPECTRE into the Matlab implementation. A LUT method is adopted, in which the stored noise data are 'searched' and 'extracted' according to index parameters based on the weight ($V_w$) and state ($V_{in}$) input voltages. To simplify the Matlab implementation, the values in the LUT are normalised. The input voltages $V_w$ and $V_{in}$ are normalised to the Matlab parameters $\{w_{ij}\}$ and $\{s_j\}$, respectively. The corresponding multiplier output noise current, which ranges between -1 pA and 1 pA, is normalised to values from -1 to 1 in the LUT. Table 7.1 summarises the mapping of parameter values between the noisy multiplier hardware implementation and the software representation.
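Table 7.1 gives only the end points of each range, so the mapping between hardware and software values is assumed here to be linear. Under that assumption, the normalisation can be written as three small functions. The stored voltage grids then map onto evenly spaced software grids, with steps of 0.3 in $w_{ij}$ and 0.25 in $s_j$. The function and constant names are hypothetical.

```python
import numpy as np

# Stored LUT grids, equivalent to Matlab's 0:0.15:1.5 and 0.55:0.05:0.95
V_W_GRID = np.linspace(0.0, 1.5, 11)
V_IN_GRID = np.linspace(0.55, 0.95, 9)


def weight_from_volts(v_w):
    """Map V_w in [0, 1.5] V linearly onto w_ij in [-1.5, 1.5]."""
    return 2.0 * (np.asarray(v_w) - 0.75)


def state_from_volts(v_in):
    """Map V_in in [0.55, 0.95] V linearly onto s_j in [-1, 1]."""
    return (np.asarray(v_in) - 0.75) / 0.2


def normalise_noise_current(i_amps):
    """Map output noise current in [-1, 1] pA onto [-1, 1]."""
    return np.asarray(i_amps) / 1e-12


print(weight_from_volts(V_W_GRID))
print(state_from_volts(V_IN_GRID))
```

Working in normalised units means the CRBM code never handles volts or picoamps directly. Only the LUT construction needs to know the hardware ranges.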
7.1.2 Data look-up 
The flowchart in Fig.7.2 describes how noise data from a LUT are incorporated into the synaptic multiplication $\{w_{ij} \times s_j\}$ to produce a probabilistic synaptic product. The flow is as follows:

1. Determine whether it is the start of a session, i.e. of training (epoch = 1) or of reconstruction (step = 1):
   Condition 1: If epoch = 1 or step = 1 (start of a session), randomly generate the start point $t_{ij} = 1$, corresponding to the shaded row in the LUT shown in Fig.7.3. The indices $i$ and $j$ identify a synaptic multiplier.
   Condition 2: If epoch ≠ 1 or step ≠ 1 (ongoing session), get the current epoch or step, which corresponds to the row number in the LUT, i.e. $t_{ij} = n$, where $n$ is the current epoch or step.
2. Get the current $\{w_{ij}\}$ and $\{s_j\}$ values.
3. Locate the corresponding noise datum for $\{w_{ij} \times s_j\}$ in the selected row (the 'black-shaded' location in the selected row in Fig.7.3).
4. Add the noise datum to $\{w_{ij} \times s_j\}$ to produce the noisy synaptic product.
5. Repeat from step 1 if it is not the end of the simulation.
Condition 1 introduces a degree of 'randomness' into the noise data linked to each multiplier, which reduces the correlation between the noise data 'generated' for each synaptic multiplication. Another shortcoming of the methodology is that it assumes the noise amplitude changes instantaneously (through an instantaneous change in trap occupancy) when the bias conditions change.
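Steps 1 to 4 of the flow above can be sketched for a single multiplier. The sketch assumes:

- the LUT is a three-dimensional array indexed by time row, weight grid point and state grid point;
- the LUT is already normalised as in Table 7.1;
- the start row drawn in Condition 1 is passed in, and rows wrap at the end of the table.

The closest-neighbour approximation is implemented as an arg-min over each grid. The function names are hypothetical, and the LUT in the usage lines is filled with placeholder random values rather than SPECTRE data.

```python
import numpy as np

W_GRID = np.linspace(-1.5, 1.5, 11)   # normalised V_w grid points
S_GRID = np.linspace(-1.0, 1.0, 9)    # normalised V_in grid points


def nearest_index(grid, value):
    """Closest-neighbour approximation for values between stored points."""
    return int(np.argmin(np.abs(grid - value)))


def noisy_product(lut, start_row, n, w, s):
    """Noisy synaptic product w*s + noise for epoch/step n (counted from 1).

    lut has shape (n_rows, len(W_GRID), len(S_GRID)); start_row is the
    random row chosen at the start of the session (Condition 1)."""
    row = (start_row + n - 1) % lut.shape[0]              # Condition 2
    noise = lut[row, nearest_index(W_GRID, w), nearest_index(S_GRID, s)]
    return w * s + noise                                   # step 4


rng = np.random.default_rng(0)
lut = rng.uniform(-1.0, 1.0, size=(1000, len(W_GRID), len(S_GRID)))  # placeholder data
start = int(rng.integers(0, lut.shape[0]))                 # Condition 1
products = [noisy_product(lut, start, n, w=0.4, s=-0.7) for n in range(1, 6)]
```

The noise added to $w_{ij} s_j$ jumps as soon as $w_{ij}$ or $s_j$ moves to a different grid cell. This is the instantaneous-change assumption noted above as a shortcoming of the method.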
7.2 Modelling data with non-symmetric distribution 
The modelling capability of a CRBM system with nanoscale MOSFET noise is explored with a network of 3 visible and 4 hidden neurons. The CRBM is trained to model two well-separated clusters of 200 data points with the non-symmetric distribution (one circular-Gaussian and one elliptic-Gaussian) shown in Fig.7.4(a). This training data is used because it tests the CRBM's capability to regenerate the non-symmetric clusters with the correct probability distribution. However, the total number of training data points is reduced to 50 to keep simulation times tolerable¹. The simulation results in Sec.6.3 show that reducing the training data to 50 points does not affect the ability of the CRBM to generate the correct data position and distribution.

Figure 7.3: Example of data extracted from the LUT. The values for $V_w$ and $V_{in}$ are mapped to $w_{ij}$ and $s_j$ respectively. [Panel labels: $\Delta I_D(t)$ and $\Delta I(V_w \times V_{in})$.]
7.2.1 4-trap RTS in a CRBM
The CRBM with 4-trap RTS noise data has been trained with η_w = 0.3, η_a = 10 for the visible neurons and η_a = 1 for the hidden neurons, for 30000 epochs. After training, the CRBM generated the 20-step reconstruction shown in Fig.7.4(b) by Gibbs sampling from 200 random initial data points. Although this CRBM is unable to reconstruct an exact match to the training data, the general reconstruction characteristics are encouraging.
¹ On a Pentium 4 PC workstation with 512MB cache, it takes twenty hours to train the CRBM with 200 training data points for 10000 epochs, whereas 50 training data points take about 4 hours to complete 10000 epochs.
Figure 7.4: (a) Non-symmetric distribution training data; (b) 20-step reconstruction after 30000 epochs.
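The 20-step reconstruction by Gibbs sampling can be sketched as below. The sketch assumes the standard CRBM neuron of Chen and Murray [23,24], s_j = φ_j(Σ_i w_ij s_i + σ·N_j(0,1)) with a sigmoid φ_j whose slope is set by the noise-control parameter a_j and whose outputs lie in (−1, 1). It uses Gaussian input noise as in the 'typical' Gaussian CRBM; in the noisy-multiplier version studied here, the noise would instead come from the LUT for every product w_ij × s_j. The function names, weights, σ and the (−1, 1) output range are illustrative assumptions, and no bias units are modelled.

```python
import numpy as np


def phi(x, a, lo=-1.0, hi=1.0):
    """Sigmoid with noise-control (slope) parameter a, bounded by (lo, hi)."""
    return lo + (hi - lo) / (1.0 + np.exp(-a * x))


def gibbs_reconstruct(v0, W, a_v, a_h, n_steps, sigma, rng):
    """Run n_steps of alternating Gibbs sampling in a CRBM-style network.

    v0: (n_samples, n_v) initial visible states; W: (n_v, n_h) weights;
    a_v, a_h: noise-control parameters of the visible and hidden neurons.
    Zero-mean Gaussian noise of std sigma is added to every neuron's
    summed input.
    """
    v = np.asarray(v0, dtype=float)
    for _ in range(n_steps):
        h_in = v @ W + sigma * rng.standard_normal((v.shape[0], W.shape[1]))
        h = phi(h_in, a_h)
        v_in = h @ W.T + sigma * rng.standard_normal(v.shape)
        v = phi(v_in, a_v)
    return v


rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((3, 4))   # hypothetical, untrained weights
v0 = rng.uniform(-1.0, 1.0, (200, 3))   # 200 random initial data points
v20 = gibbs_reconstruct(v0, W, a_v=np.ones(3), a_h=np.ones(4),
                        n_steps=20, sigma=0.2, rng=rng)
```

With trained weights and noise-control parameters, the rows of `v20` would play the role of the reconstruction points plotted in Fig.7.4(b).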
Fig.7.5 shows the evolution of the weights {w_ij} during training. During the initial 20000 epochs, the weights fluctuate significantly, indicating that the system is performing a crude, noise-mediated approximation to the training data. In comparison, the weight evolution of the Gaussian CRBM (Fig.6.6(c)) follows a less noisy trend. Fig.7.5 shows that most {w_ij} settle after 20000 training epochs, indicating that the training process reaches equilibrium at that point. Fig.7.5 thus demonstrates the CRBM system's adaptability and its attempt to minimise contrastive divergence despite significant noise variation during training. Fig.7.6(a) shows the final (learnt) weight vectors of the hidden neurons projected onto the state space of the visible neurons after 30000 epochs, highlighting the contribution of each hidden neuron to the data reconstruction. The large noise-control parameter a (→ 3.5) of hidden neuron h3 (Fig.7.6(b)) indicates the neuron's 'near-binary' behaviour. The 'near-binary' hidden neuron h3 models the cluster separation, while the other hidden neurons, with smaller {a_j}, encode the variance (distribution) of each cluster. In other words, hidden neuron h3 decides which cluster each reconstruction point should belong to: the elliptic cluster or the circular cluster.
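Why a large noise-control parameter makes a neuron 'near-binary' can be seen numerically. This sketch assumes the same CRBM-style sigmoid with outputs in (−1, 1) and feeds it hypothetical standard-normal inputs; it reports the fraction of outputs lying close to the rails (|s| > 0.9, an arbitrary threshold) for a small slope and for a ≈ 3.5, the value reached by hidden neuron h3.

```python
import numpy as np


def phi(x, a):
    """CRBM-style sigmoid with output range (-1, 1) and slope parameter a."""
    return -1.0 + 2.0 / (1.0 + np.exp(-a * x))


def near_binary_fraction(a, x, threshold=0.9):
    """Fraction of outputs whose magnitude exceeds threshold (close to +/-1)."""
    return float(np.mean(np.abs(phi(x, a)) > threshold))


rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)   # hypothetical summed neuron inputs
for a in (0.5, 1.0, 3.5):
    print(f"a = {a}: fraction with |s| > 0.9 = {near_binary_fraction(a, x):.3f}")
```

Since φ(x) = tanh(a·x/2) here, |s| > 0.9 requires |x| > 2.94/a, so for unit-variance inputs roughly 40% of outputs sit near ±1 when a = 3.5, against essentially none when a = 0.5.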
7.2.2 10-trap and 100-trap RTS in a CRBM 
A CRBM has been implemented with more 'continuous' RTS noise based on 10-trap and 100-trap RTS models. The term 'continuous' refers to the less obviously quantised noise amplitudes in the presence of a larger number of traps [2].
Figure 7.5: Evolution of the weights {w_ij} of the CRBM with 4-trap implementation over 30000 training epochs (panels (a) to (d); horizontal axis: training epoch).
Figure 7.6: (a) Learnt weight vectors after 30000 epochs and (b) evolution of the noise-control parameters {a_j} for the hidden neurons during training.
Figure 7.7: 20-step reconstruction of the CRBM implemented with (a) 10-trap and (b) 100-trap noisy synaptic multiplier output noise after 30000 epochs.
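A multi-trap RTS can be pictured as a sum of independent two-state traps. The sketch below is a simplified discrete-time illustration, not the thesis's noisy MOSFET model: on each time step of length dt an empty trap captures a carrier with probability dt/τ_c and a filled trap emits with probability dt/τ_e (a first-order approximation that assumes dt ≪ τ_c, τ_e). The function name and the binary-weighted amplitudes are hypothetical; they are chosen so that every occupancy pattern gives a different level, making the 2^N level count for N traps visible.

```python
import numpy as np


def multi_trap_rts(amplitudes, tau_c, tau_e, n_steps, dt, rng):
    """Sum of independent two-state RTS traps on a discrete time grid.

    amplitudes: current step contributed by each trap when it is occupied;
    tau_c, tau_e: mean capture and emission times of each trap.
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    tau_c = np.asarray(tau_c, dtype=float)
    tau_e = np.asarray(tau_e, dtype=float)
    p_capture, p_emit = dt / tau_c, dt / tau_e   # per-step switching probabilities
    # Start each trap in its stationary occupancy tau_e / (tau_c + tau_e).
    occupied = rng.random(amplitudes.size) < tau_e / (tau_c + tau_e)
    signal = np.empty(n_steps)
    for t in range(n_steps):
        u = rng.random(amplitudes.size)
        # An occupied trap stays occupied unless it emits; an empty trap
        # becomes occupied if it captures.
        occupied = np.where(occupied, u >= p_emit, u < p_capture)
        signal[t] = amplitudes @ occupied.astype(float)
    return signal


rng = np.random.default_rng(0)
x4 = multi_trap_rts([1.0, 2.0, 4.0, 8.0], tau_c=[20.0] * 4, tau_e=[20.0] * 4,
                    n_steps=20_000, dt=1.0, rng=rng)
print(np.unique(x4).size)   # number of distinct current levels observed
```

With 10 or 100 traps of unequal amplitude, the same construction yields up to 2^10 or 2^100 levels, which is why the resulting noise looks far less quantised.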
In this experiment, 10-trap and 100-trap synaptic multiplier output noise data were used to implement a CRBM. The CRBM was trained with η_w = 0.3, η_a = 10 for the visible neurons and η_a = 1 for the hidden neurons, for 30000 epochs. After training, the CRBM generated the 20-step reconstructions shown in Fig.7.7(a) and (b). Both CRBM implementations (10-trap based and 100-trap based) modelled the data well and reconstructed continuous-valued data points with results comparable to those of a Gaussian CRBM. These promising results indicate that intrinsic nanoscale MOSFET noise can, indeed, be used to implement a CRBM, albeit with slower convergence during learning.
7.3 Summary 
The objective of this work was to explore whether relatively accurately modelled intrinsic nanoscale MOSFET noise can, in principle, be used to underpin useful probabilistic behaviour in a stochastic computing architecture, in this study the CRBM. As a first step, this was done by localising the fluctuations to the synaptic multipliers, which are the dominant element in many neural architectures. The main challenge of this study was linking the synaptic multiplier output noise generated in the SPECTRE circuit simulator to the CRBM model developed and simulated in Matlab. The linking is done through a Look-Up-Table (LUT) that stores a synaptic multiplier's time-domain output noise.
The results in this chapter demonstrate that intrinsic nanoscale MOSFET noise can, indeed, be used to implement probabilistic computation in the CRBM, whilst retaining a modelling ability comparable to that of a 'perfect' Gaussian CRBM. RTS noise takes the form of distinct discrete levels (i.e. 16 levels for a 4-trap RTS implementation), and the CRBM results in Sec.6.4 would seem to suggest that discretised reconstructions would occur, owing to the quantised probability of the noise levels. This would be the case if the noise were taken from fixed MOSFET bias conditions. In reality, however, bias conditions change as a natural feature of multiplier activity. Thus, dynamic bias conditions enable a CRBM to reconstruct a continuous data distribution from discretised noise. In a software CRBM, the noise levels in the visible and hidden neurons can be tuned to an optimum level to enhance learning [23,24]. Intrinsic nanoscale MOSFET noise, however, cannot be tuned, and it is therefore not surprising that a CRBM with intrinsic nanoscale MOSFET noise is more difficult to train than a Gaussian software implementation.
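The effect of dynamic bias conditions on the discreteness of RTS noise can be illustrated with a toy calculation. Purely for illustration, this sketch draws the occupancy of four hypothetical traps independently at each step and assumes that the trap amplitudes scale linearly with a bias-dependent factor that changes with the multiplier's operating point; neither assumption comes from the thesis's device model. With a fixed bias the output can take at most 2^4 = 16 values, whereas the bias-scaled output is effectively continuous.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 5000
amplitudes = np.array([1.0, 2.0, 4.0, 8.0])        # hypothetical 4-trap amplitudes
# Trap occupancies drawn independently per step, purely for illustration.
occupancy = (rng.random((n_steps, 4)) < 0.5).astype(float)

# Fixed bias: every step reuses the same amplitudes, so at most 2**4 levels.
fixed_bias = occupancy @ amplitudes

# Dynamic bias: a hypothetical bias-dependent factor, changing with the
# multiplier's operating point at every step, rescales the amplitudes.
bias_factor = rng.uniform(0.5, 1.5, size=n_steps)
dynamic_bias = bias_factor * fixed_bias

print(np.unique(fixed_bias).size, np.unique(dynamic_bias).size)
```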
As shown in [2], a larger number of traps produces noise amplitudes closer to a Gaussian noise distribution, which is in turn closer to the noise in a 'perfect' CRBM. It was therefore expected that the 10-trap and 100-trap based CRBM implementations would perform better than a 4-trap implementation, owing to the more 'continuous' distribution of noise amplitude. The experiments confirm this expectation.
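The trend towards Gaussian amplitudes with more traps can be quantified in an idealised case. Assuming N identical, independent traps of equal amplitude, each occupied with probability p (an idealisation; real traps differ in amplitude and their occupancy depends on bias), the summed occupancy is binomial, and its excess kurtosis (zero for a Gaussian) equals (1 − 6p(1 − p))/(N·p(1 − p)). The hypothetical helper below computes it directly from the probability mass function.

```python
from math import comb


def rts_sum_excess_kurtosis(n_traps, p=0.5):
    """Excess kurtosis of the summed occupancy of n_traps identical,
    independent traps, each occupied with probability p (binomial)."""
    pmf = [comb(n_traps, k) * p**k * (1 - p) ** (n_traps - k)
           for k in range(n_traps + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    m4 = sum((k - mean) ** 4 * q for k, q in enumerate(pmf))
    return m4 / var**2 - 3.0


for n in (4, 10, 100):
    print(n, round(rts_sum_excess_kurtosis(n), 4))
```

For p = 0.5 the result is −2/N, so the departure from Gaussian shape shrinks from −0.5 for 4 traps to −0.02 for 100 traps.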
Chapter 8 
Summary and Conclusion 
8.1 Summary 
RTS noise is expected to become the dominant source of uncertainty in future nanoscale MOSFETs. It will cause significant drain-current fluctuations that limit the performance and functionality of a circuit. Several alternative architectural paradigms have been proposed in which the unreliable performance of these nanoscale MOSFETs can be tolerated (fault tolerance) or made useful (adaptive) [15-20]. This thesis investigates the use of DSM noise in an architecture that actually requires it. A 'demonstrator' architecture has been identified: the Continuous Restricted Boltzmann Machine (CRBM).
In the original, 'pure' CRBM, stochastic behaviour is introduced through the injection of Gaussian noise from an external source [3,23,24,32]. In a hardware implementation, this requires additional circuitry and thus more silicon area [3]. In addition, the noise from external sources was found to introduce additional computational error into the system, making it less robust [3]. In this thesis, the noise source is localised in the synaptic multiplications of each stochastic neuron. An initial analysis with artificially generated 'pseudo-RTS' noise injected into a CRBM's synaptic multipliers shows that the CRBM retains a good modelling capability. This result supports the potential of using nanoscale MOSFET noise in a CRBM system. The temporal current fluctuations associated with nanoscale MOSFET noise were incorporated into the CRBM through the implementation of noisy synaptic multiplications.
Unfortunately, MOSFETs exhibiting statistically significant intrinsic noise are not yet available. Therefore, a noisy MOSFET model has been developed to emulate the predicted noisy behaviour of future nanoscale MOSFETs. Both 1/f and RTS noise have been studied, and a pragmatic approach has been taken to include the effect of these forms of noise in MOSFETs, such that the modelled noise can be included in circuit simulations. At this concept-proving stage, a computationally simple model is more important than detailed accuracy. The noisy MOSFET model has been shown to produce the correct form of noise behaviour, to be valid in all operating regimes, and to be implementable in a circuit simulator.
A noisy analogue multiplier circuit has been implemented by replacing the key MOSFETs 
in the circuit with the noisy MOSFET models. This introduces noise to the signal path, 
producing a noisy output without compromising the underlying performance of the multiplier. 
Simulation results indicate that the multiplier retains its original performance, although its output becomes noisy.
The noisy multiplications are used to replace the normal synaptic multiplication process of the 
stochastic CRBM neurons. A systematic approach that embeds the noisy synaptic multiplication in the CRBM has been implemented, drawing on values stored in a look-up table (LUT)
that represents both the multiplier's functionality and the noise associated with the nanoscale 
MOSFETs. This approach provides the necessary linkage between nanoscale device physics 
and probabilistic computation. Subsequently, the nanoscale-based CRBM system has demon-
strated the ability to model data with simple, yet non-trivial distributions. 
8.2 Conclusion 
This thesis examined the suggestion that 
"Low frequency drain current noise in future nanoscale MOSFETs can underpin useful prob-
abilistic computation." 
In this research, a novel methodology has been developed that builds a modelling bridge from atomistic models of nanoscale devices (and the physical devices upon which they were based), through look-up table models of the noise in such devices, to circuit models of the noisy nanoscale devices and circuits that can be simulated in reasonable time using a conventional analogue simulator. Although we have chosen to study a probabilistic architecture, the methodology is generic and could be used to explore the temporal effects of nanoscale device noise on conventional computational paradigms, significantly in advance of the availability of integrated circuits (ICs) with true nanoscale devices. Understanding how future noisy nanoscale devices may affect circuit performance may pave the way to new design paradigms, some of which may exploit the noise for reliable system implementation, as explored in this thesis.
From the results presented, this research has demonstrated that intrinsic MOSFET noise can potentially be useful for probabilistic computation in future hardware implementations. Clearly, these results, taken from a single probabilistic architecture (the CRBM) with accurate but limited models of nanoscale RTS noise, are insufficient to claim that the proposed approach will solve all conventional computing problems. Even with this caveat, however, these results indicate that a principled method can be developed to study the effects of nanoscale MOSFET noise in computational architectures, and early indications are that architectures can be developed that 'make use of' such noise in their operation.
While the outcomes of this study are encouraging, there are several limitations which must 
be acknowledged. 
Simplified nanoscale MOSFET noise models: the models were developed based on 
extensive assumptions listed in Sec.3.4.1 and fitting parameters (Table 3.1) adopted 
from large MOSFETs. Although it has been shown that the simplified models are able 
to produce the predicted noise characteristics in future nanoscale MOSFETs, detailed 
accuracy in the generated noise cannot be guaranteed. 
Representation inaccuracy from using a LUT: since a full hardware implementation of the CRBM with nanoscale MOSFETs is not yet possible, a LUT was used to link noise data to the CRBM software implementation. Due to limited storage space and computational resources, only limited data can be represented in the LUT. Any value not stored in the LUT is approximated, which may introduce inaccuracies. In addition, since only one LUT was used to represent noise data from several noise sources (multipliers), correlation may be introduced between them.
Only one probabilistic architecture was studied: the CRBM was used as a 'demonstrator' architecture and has shown encouraging results. Unfortunately, it is not possible to generalise the findings of this thesis to other architectures. Moreover, the CRBM system has not been studied exhaustively with other variations of data. Therefore, there is not yet enough evidence to comment on the performance integrity of the CRBM with nanoscale MOSFET noise.
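The thesis does not specify how values that are not stored in the LUT are approximated. As one hypothetical option, shown here only to make the source of this inaccuracy concrete, the sketch bilinearly interpolates a table defined on a (weight, input) grid; nearest-grid-point lookup is another common choice. The function name and the stand-in table are assumptions.

```python
import numpy as np


def bilinear_lookup(table, w_grid, s_grid, w, s):
    """Bilinearly interpolate table[k, l] (defined on w_grid x s_grid) at (w, s).

    Grids must be strictly increasing; queries are clipped to the grid range.
    """
    w = np.clip(w, w_grid[0], w_grid[-1])
    s = np.clip(s, s_grid[0], s_grid[-1])
    # Index of the lower grid cell corner for each query.
    k = np.clip(np.searchsorted(w_grid, w) - 1, 0, len(w_grid) - 2)
    l = np.clip(np.searchsorted(s_grid, s) - 1, 0, len(s_grid) - 2)
    tw = (w - w_grid[k]) / (w_grid[k + 1] - w_grid[k])
    ts = (s - s_grid[l]) / (s_grid[l + 1] - s_grid[l])
    return ((1 - tw) * (1 - ts) * table[k, l] + tw * (1 - ts) * table[k + 1, l]
            + (1 - tw) * ts * table[k, l + 1] + tw * ts * table[k + 1, l + 1])


w_grid = np.linspace(-1.0, 1.0, 5)
s_grid = np.linspace(-1.0, 1.0, 5)
table = np.add.outer(w_grid, 2.0 * s_grid)   # stand-in table: f(w, s) = w + 2 s
print(bilinear_lookup(table, w_grid, s_grid, 0.3, -0.7))
```

Bilinear interpolation is exact for this linear stand-in table; for real, rapidly fluctuating noise data the interpolation error would depend on the grid spacing, which is the trade-off against LUT storage noted above.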
8.3 Future work 
The work presented in this thesis is based on a simplified nanoscale MOSFET noise model. 
While the noise generated conformed to the predicted noise behaviour in future MOSFETs, 
real nanoscale MOSFET noise mechanisms will be more complex [4]. Once nanoscale MOSFETs with significant noise become available, it will be important to repeat the work presented in this research with real noise data.
The further development of time domain noise analysis based on the nanoscale MOSFET 
model as a standard capability in a commercially available simulator is recommended for 
future nanoscale circuit analysis. This capability will allow better understanding of circuit 
performance in the presence of significant temporal fluctuation of device performance. Once this capability becomes available, it would be useful to implement a full hardware simulation of a CRBM system. This would eliminate the approximation errors introduced when mapping noise data into the software implementation, giving a closer representation of a real nanoscale hardware implementation.
Since a promising modelling ability has been demonstrated, it is recommended that the capability of the nanoscale-based CRBM system to model more complex and realistic data, e.g. bio-medical data, be explored further. This is necessary to establish whether a nanoscale-based CRBM system can be exploited in all contexts and with all data in real-world applications.
In this research, the CRBM system is a 'demonstrator' architecture for studying the feasibility of using nanoscale MOSFET noise for probabilistic computation. The CRBM was chosen mainly for its intrinsically interesting probabilistic behaviour, which has been shown to be suitable for hardware implementation [3]. Furthermore, the CRBM system was developed locally, and thus expertise on this system was readily available. In order to generalise the findings of this research, it is recommended that the work be repeated with a different probabilistic architecture, such as the Diffusion Network [67, 68].
Appendix A 
MOSFET Channel Voltage Approximation
The mean capture time τ_c and mean emission time τ_e are described by Eq.(2.3) and Eq.(2.4), respectively. The bias dependency of τ_c and τ_e is reflected through the approximation of the electron concentration n at the given trap location using Eq.(3.11). Using Eq.(3.11), the effective electron concentration n at a specific location in the channel is determined by approximating the effective surface potential φ_s and the quasi-Fermi potential V(y) at that location.
A.1 Approximating the Quasi-Fermi Potential, V(y) 
The quasi-Fermi potential V(y) describes the effective voltage at a given lateral position along the channel and depends strongly on the bias conditions.
In linear operation, the quasi-Fermi potential V(y) can be described by [58]:
$$V(y) = \frac{G}{m} - \sqrt{\left(\frac{G}{m}\right)^{2} - 2\,\frac{G}{m}\,\frac{y}{L}\,V_{DS} + \frac{y}{L}\,V_{DS}^{2}} \tag{A.1}$$
where G represents V_GS − V_th, y is the lateral location measured from the source, L is the channel length, and m is the body-effect coefficient.
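Eq.(A.1) can be checked numerically. At y = L the square-root argument becomes (G/m − V_DS)², so, provided V_DS < G/m (an assumption of this sketch), the potential rises monotonically from V(0) = 0 at the source to V(L) = V_DS at the drain. The function name and the parameter values (50 nm channel, G = 0.5 V, m = 1.2, V_DS = 0.1 V) are hypothetical.

```python
import numpy as np


def quasi_fermi_linear(y, L, G, m, v_ds):
    """Quasi-Fermi potential V(y) in linear operation, Eq.(A.1).

    y: lateral position(s) from the source; L: channel length;
    G = V_GS - V_th; m: body-effect coefficient; v_ds: drain-source voltage.
    """
    r = np.asarray(y, dtype=float) / L
    g = G / m
    arg = g**2 - 2.0 * g * r * v_ds + r * v_ds**2
    return g - np.sqrt(np.maximum(arg, 0.0))   # clip tiny negative round-off


L = 50e-9                                     # hypothetical 50 nm channel
y = np.linspace(0.0, L, 6)
V = quasi_fermi_linear(y, L, G=0.5, m=1.2, v_ds=0.1)
print(np.round(V, 4))
```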
In order to describe the quasi-Fermi potential V(y) in saturation operation (V_DS > V_GS − V_th), the MOSFET channel has to be divided into two sections, denoted by y^L and y^S, as seen in Fig.A.1.
The section between the source end and the pinch-off point is denoted by the y^L axis. Along this axis, V(y^L) behaves as in linear operation and can be described by a slight modification of (A.1):
$$V(y^{L}) = \frac{G}{m} - \sqrt{\left(\frac{G}{m}\right)^{2} - 2\,\frac{G}{m}\,\frac{y^{L}}{L-\Delta L}\,V_{DSsat} + \frac{y^{L}}{L-\Delta L}\,V_{DSsat}^{2}} \tag{A.2}$$
where ΔL refers to the amount of channel length modulation by the drain voltage and is defined by:
$$\Delta L = l\,\ln\left[\frac{V_{DS}-V_{DSsat}}{l\,E_{sat}} + \sqrt{1 + \left(\frac{V_{DS}-V_{DSsat}}{l\,E_{sat}}\right)^{2}}\,\right] \tag{A.3}$$
In (A.2), y^L is measured between 0 and L − ΔL. l is a characteristic length, which depends on the junction depth X_j, and E_sat is the lateral field at the saturation point, which depends on the saturation velocity v_sat and the effective mobility μ_eff. The values of X_j, v_sat and μ_eff can be extracted from the technology process file.
The section between the pinch-off point and the drain, denoted by the y^S axis, exhibits the effect of channel length modulation: the pinch-off point moves towards the source end, exposing a region (ΔL) depleted of inversion charge. In this region, the carriers reach the saturation velocity (v_sat). The quasi-Fermi potential V(y^S) in this region is defined by:
$$V(y^{S}) = V_{DSsat} + l\,E_{sat}\,\sinh\left(\frac{y^{S}}{l}\right) \tag{A.4}$$
where y^S spans the region between L − ΔL and L and is measured from the pinch-off point (0 ≤ y^S ≤ ΔL), so that V(y^S) rises from V_DSsat at the pinch-off point to V_DS at the drain.
Figure A.1: MOSFET cross-section for saturation operation with channel length modulation. The pinch-off point is defined by V_DS = V_GS − V_th and denoted by V_DSsat.
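The saturation-region pieces, Eqs.(A.2) to (A.4), fit together continuously: V(y^L) reaches V_DSsat at the pinch-off point, and V(y^S) climbs from V_DSsat to exactly V_DS at the drain because sinh(ΔL/l) = (V_DS − V_DSsat)/(l·E_sat) by Eq.(A.3). The sketch below checks this numerically. It assumes m = 1 so that V_DSsat = V_GS − V_th, as stated in Fig.A.1, and because the expressions for l and E_sat are not reproduced here, it takes hypothetical values l = 10 nm and E_sat = 10^7 V/m together with a 100 nm channel.

```python
import numpy as np


def channel_potential_saturation(L, G, v_ds, l, e_sat, n=50):
    """Quasi-Fermi potential along the channel in saturation, Eqs.(A.2)-(A.4).

    Assumes m = 1, so V_DSsat = G = V_GS - V_th (as in Fig. A.1).
    Returns (dL, V on the y^L grid, V on the y^S grid).
    """
    v_dssat = G
    x = (v_ds - v_dssat) / (l * e_sat)
    dL = l * np.log(x + np.sqrt(1.0 + x**2))            # Eq.(A.3)
    L_p = L - dL                                         # source-to-pinch-off length
    r = np.linspace(0.0, L_p, n) / L_p                   # y^L / (L - dL)
    arg = G**2 - 2.0 * G * r * v_dssat + r * v_dssat**2
    v_L = G - np.sqrt(np.maximum(arg, 0.0))              # Eq.(A.2) with m = 1
    y_S = np.linspace(0.0, dL, n)                        # measured from pinch-off
    v_S = v_dssat + l * e_sat * np.sinh(y_S / l)         # Eq.(A.4)
    return dL, v_L, v_S


# Hypothetical values: 100 nm channel, l = 10 nm, E_sat = 1e7 V/m.
dL, v_L, v_S = channel_potential_saturation(L=100e-9, G=0.5, v_ds=1.0,
                                            l=10e-9, e_sat=1e7)
print(f"dL = {dL * 1e9:.1f} nm, V at pinch-off = {v_L[-1]:.3f} V, "
      f"V at drain = {v_S[-1]:.3f} V")
```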
A.2 Approximating the Surface Potential, φ_s
In BSIM, the MOSFET characteristics are modelled using a threshold-voltage-based approach in which the surface potential φ_s is approximated as a fixed quantity, described by [48]:
$$\phi_s = \frac{2kT}{q}\ln\left(\frac{N_{ch}}{n_i}\right) \tag{A.5}$$
where N_ch is the channel doping concentration, a process-dependent parameter, and n_i is the intrinsic carrier concentration. While this approximation is acceptable for 'long' devices, the continuous downscaling of MOSFETs towards nanoscale dimensions calls for a more accurate approximation of the surface potential φ_s, such as that suggested in [75], which explicitly includes the effect of the bias conditions applied to the MOSFET. In [75], the surface potential is approximated as:
f+Ut x In XS2 - f + 
Ut 	) 
(A.6) 
where XS is given by 
VGB - VFB - I 
mx 
- 	 (Swi - I) 
(mx 	) x+ 
L 4ut  ] 
(A.7) 
V_FB is the flat-band voltage, U_t is the thermal voltage defined by U_t = kT/q, φ_F is the bulk Fermi potential defined by U_t ln(N_ch/n_i), and m is the body factor defined earlier. The transition of φ_s from weak inversion to strong inversion is approximated by f, which is defined as:
$$f = \frac{1}{2}\left[2\phi_F + V(y) + \phi_{swi} - \sqrt{\left(\phi_{swi} - 2\phi_F - V(y)\right)^{2} + 4\epsilon^{2}}\,\right] \tag{A.8}$$
where ε is a smoothing factor conveniently fixed at 2 × 10⁻².
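The fixed BSIM surface potential of Eq.(A.5) depends only on doping and temperature, with no bias argument at all, which is why a bias-aware approximation such as [75] is preferred here. The sketch below evaluates Eq.(A.5) for two doping levels; the value n_i = 1.0 × 10^10 cm⁻³ is an assumed, approximate room-temperature figure for silicon, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q = 1.602176634e-19    # elementary charge, C


def bsim_surface_potential(n_ch, n_i=1.0e10, temp=300.0):
    """Fixed surface potential of Eq.(A.5): phi_s = 2 (kT/q) ln(N_ch / n_i).

    n_ch and n_i must be in the same units (cm^-3 here); n_i = 1e10 cm^-3
    is an assumed, approximate room-temperature value for silicon.
    """
    u_t = K_B * temp / Q          # thermal voltage kT/q, about 25.9 mV at 300 K
    return 2.0 * u_t * math.log(n_ch / n_i)


for n_ch in (1e17, 1e18):
    print(f"N_ch = {n_ch:.0e} cm^-3 -> phi_s = {bsim_surface_potential(n_ch):.3f} V")
```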
Appendix B 
Publication list 
This appendix lists the papers produced as the contributions of this PhD research: 
N. H. Hamid, A. F. Murray, D. Laurenson, S. Roy, and B. Cheng, "Probabilistic computing 
with future deep submicrometer devices: a modelling approach," in Proc. IEEE ISCAS '05, 
Kobe, Japan, May 23-26, 2005, In Press. 
N. H. Hamid, A. F. Murray, and S. Roy, "Noisy MOSFETs for time domain circuit implementation: A modelling approach," submitted to IEEE Trans. on Circuits and Systems I.
N. H. Hamid, A. F. Murray, and T. B. Tang, "Probabilistic neural computing with future nanoscale MOSFETs," submitted to IEEE Trans. Neural Networks.
References 
[1] R. F. Pierret, ed., Field Effect Devices, vol. IV of Modular Series on Solid State Devices, 2nd Edition. New York, USA: Addison-Wesley Publishing Company, 1990.
[2] N. H. Hamid, A. F. Murray, and S. Roy, "Noisy MOSFETs for time domain circuit implementation: A modelling approach," submitted to IEEE Trans. on Circuits and Systems I.
[3] H. Chen, Continuous-value probabilistic neural computation in VLSI. PhD thesis, University of Edinburgh.
[4] M. J. Kirton and M. J. Uren, "Noise in solid-state microstructures: a new perspective on individual defects, interface states and LF (1/f) noise," Adv. Phys., vol. 38, p. 367, 1989.
[5] N. V. Amarasinghe and Z. C. Butler, "Extraction of oxide trap properties using temperature dependence of random telegraph signals in submicron metal-oxide-semiconductor field effect transistors," J. Appl. Phys., vol. 89, pp. 5526-5532, 2001.
[6] "International technology roadmap for semiconductors, 2004 edition," 2004.
[7] H. S. P. Wong, D. J. Frank, P. L. Solomon, C. H. J. Wann, and J. J. Welser, "Nanoscale CMOS," in Proceedings of the IEEE, pp. 537-570, 1999.
[8] K. Likharev, "Electronics below 10nm," in Nano and Giga Challenges in Microelectronics (J. G. et al., ed.), (Amsterdam), pp. 27-68, Elsevier, 2003.
[9] M. H. Tsai and T. P. Ma, "The impact of device scaling on the current fluctuations in MOSFET's," IEEE Trans. Elect. Devices, vol. 41, pp. 2061-2068, Nov. 1994.
[10] M. Valenza, A. Hoffmann, D. Sodini, A. Laigle, F. Martinez, and D. Rigaud, "Overview of the impact of downscaling technology on 1/f noise in p-MOSFETs to 90nm," IEE Proc. Circuits, Devices, and Systems, vol. 151, pp. 102-110, April 2004.
[11] G. Ghibaudo and T. Boutchacha, "Electrical noise and RTS fluctuation in advanced CMOS devices," Microelectronics Reliability, vol. 42, pp. 573-582, 2002.
[12] H. Wong, "Low-frequency noise study in electron devices: review and update," Microelectronics Reliability, vol. 43, pp. 585-599, 2003.
[13] H. M. Bu, Y. Shi, X. L. Yuan, Y. D. Zheng, S. H. Gu, H. Majima, H. Ishicuro, and T. Hiramoto, "Impact of the device scaling on low frequency noise in n-MOSFETs," Appl. Phys. A, vol. 71, pp. 133-136, 2000.
[14] C. Hu, G. P. Li, E. Worley, and J. White, "Consideration of low-frequency noise in MOSFET's for analog performance," IEEE Elect. Devices Lett., vol. 17, p. 552, Dec. 1996.
[15] A. DeHon, "Array-based architecture for FET-based, nanoscale electronics," IEEE Trans. on Nanotechnology, vol. 2, pp. 23-32, 2003.
[16] S. C. Goldstein and M. Budiu, "NanoFabrics: Spatial computing using molecular electronics," in The 28th Annual Int. Symp. on Computer Architecture.
[17] J. Han and P. Jonker, "A system architecture solution for unreliable nanoelectronics devices," IEEE Trans. on Nanotechnology, vol. 1, pp. 201-208, Dec. 2002.
[18] S. Folling, O. Turel, and K. Likharev, "Single-electron latching switches as nanoscale synapses," in Proc. of Int. Joint Conf. on Neural Networks, pp. 216-221, 2001.
[19] R. I. Bahar, J. Mundy, and J. Chen, "A probabilistic-based design methodology for nanoscale computation," in Int. Conf. on CAD, (San Jose, CA), Nov. 2003.
[20] K. Nepal, R. I. Bahar, J. Mundy, W. R. Patterson, and A. Zaslavsky, "Designing logic circuit for probabilistic computation in the presence of noise," in Design Automation Conference, 2005.
[21] T. B. Tang, H. Chen, and A. F. Murray, "Adaptive stochastic classifier for noisy pH-ISFET measurements," in Proceedings of the International Conference on Artificial Neural Networks (ICANN2003), pp. 638-645, 2003.
[22] T. B. Tang, H. Chen, and A. F. Murray, "Adaptive, integrated sensor processing to compensate for drift and uncertainty: a stochastic 'neural' approach," in IEE Proc. on Nanobiotechnology, vol. 151, 2004.
[23] H. Chen and A. F. Murray, "A continuous restricted Boltzmann machine with a hardware amenable learning algorithm," in Proceedings of the International Conference on Artificial Neural Networks (ICANN2002), pp. 358-363, 2002.
[24] H. Chen and A. F. Murray, "A continuous restricted Boltzmann machine with an implementable training algorithm," in IEE Proc. of Vision, Image, and Signal Processing, vol. 150, 2003.
[25] S. Roy, B. Cheng, G. Roy, and A. Asenov, "A methodology for introducing 'atomistic' parameter fluctuations into compact device models for circuit simulation," J. Comp. Elec., vol. 2, pp. 427-431, 2003.
[26] A. Asenov, A. R. Brown, J. H. Davies, and S. Saini, "Hierarchical approach to 'atomistic' 3-D MOSFET simulation," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, pp. 1558-1565, Nov. 1999.
[27] A. Asenov, R. Balasubramaniam, A. R. Brown, and J. H. Davies, "RTS amplitudes in decananometer MOSFETs: 3-D simulation study," IEEE Trans. Elect. Devices, vol. 50, pp. 839-845, March 2003.
[28] N. V. Amarasinghe, Z. C. Butler, A. Zlotnicka, and F. Wang, "Model for Random Telegraph Signals in sub-micron MOSFETs," Solid State Electronics, vol. 47, pp. 1443-1449, 2003.
[29] A. P. van der Wel, E. A. M. Klumperink, L. K. J. Vandamme, and B. Nauta, "Modeling Random Telegraph Noise under switched bias conditions using Cyclostationary RTS Noise," IEEE Trans. Elect. Devices, vol. 50, pp. 1378-1384, May 2003.
[30] H. Chen, P. Fleury, and A. F. Murray, "Minimizing contrastive divergence in noisy, mixed-mode VLSI neurons," in Advances in Neural Information Processing Systems (NIPS2003), vol. 17, (Vancouver, Canada), 2003.
[31] H. Chen, P. Fleury, T. B. Tang, and A. F. Murray, "Adaptive noisy neural computation in mixed-mode VLSI," in Proceedings of the Seventh International Conference on Cognitive and Neural Systems, p. 68, May 2003.
[32] H. Chen, P. Fleury, and A. F. Murray, "Unsupervised probabilistic neural computation in mixed-mode VLSI," in Smart Adaptive System on Silicon (M. Valle, ed.), Springer, Oct. 2004.
[33] B. J. Cheng, S. Roy, G. Roy, and A. Asenov, "Integrating atomistic, intrinsic parameter fluctuations into compact model circuit analysis," in 33rd Conference of European Solid-State Device Research (ESSDERC03), pp. 437-440, 2003.
[34] M. Schulz, "Electrical characterization of the SiO2-Si system," Microelectronic Engineering, vol. 40, pp. 113-130, 1998.
[35] M. J. Uren, D. J. Day, and M. J. Kirton, "1/f and random telegraph noise in silicon metal-oxide-semiconductor field-effect transistor," Appl. Phys. Lett., vol. 47, pp. 1195-1197, Dec 1985.
[36] E. Simoen and C. Claeys, "Random Telegraph Signal: a local probe for single point defect studies in solid-state devices," Material Science and Engineering, vol. B91-92, pp. 136-143, 2002.
[37] Z. Shi, J. P. Mièville, and M. Dutoit, "Random Telegraph Signals in Deep Submicron n-MOSFET's," IEEE Trans. Elect. Devices, vol. 41, pp. 1161-1168, July 1994.
[38] N. V. Amarasinghe and Z. C. Butler, "Complex Random Telegraph Signals in 0.06 µm² MDD n-MOSFETs," Solid State Electronics, vol. 44, pp. 1013-1019, 2000.
[39] E. Simoen, B. Dierickx, C. L. Claeys, and G. J. Declerck, "Explaining the amplitude of RTS noise in submicrometer MOSFET's," IEEE Trans. Elect. Devices, vol. 39, pp. 423-429, Feb. 1992.
[40] E. Simoen, B. Dierickx, and C. Claeys, "Hot-carrier degradation of the Random Telegraph Signal amplitude in submicrometer Si MOSTs," Appl. Phys. A: Solids and Surfaces, vol. 57, pp. 283-289, 1993.
[41] O. R. D. Buisson, G. Ghibaudo, and J. Brini, "Model for drain current RTS amplitude in small-area MOS transistors," Solid-State Electronics, vol. 35, pp. 1273-1276, 1992.
[42] E. Simoen and C. Claeys, "Substrate bias effect on the random telegraph signal parameters in submicrometer silicon p-metal-oxide-semiconductor transistors," Appl. Phys., vol. 77, pp. 910-914, Jan 1995.
[43] N. V. Amarasinghe, Z. C. Butler, and P. Vasina, "Characterization of oxide traps in 0.15 µm² MOSFETs using Random Telegraph Signal," Microelectronics Reliability, vol. 40, pp. 1875-1881, 2000.
[44] J. H. Scofield, N. Borland, and D. M. Fleetwood, "Reconciliation of different gate-voltage dependencies of 1/f noise in n-MOS and p-MOS transistors," IEEE Trans. Elect. Devices, vol. 41, pp. 1946-1952, Nov 1994.
[45] K. K. Hung, P. K. Ko, C. Hu, and Y. C. Cheng, "A unified model for the flicker noise in metal-oxide-semiconductor field effect transistors," IEEE Trans. Elect. Devices, vol. 37, p. 654, 1990.
[46] J. Chang, A. A. Abidi, and C. R. Viswanathan, "Flicker noise in CMOS transistor from subthreshold to strong inversion at various temperatures," IEEE Trans. Elect. Devices, vol. 41, pp. 1965-1971, Nov 1994.
[47] Y. Nemirovsky, I. Brouk, and C. G. Jakobson, "1/f noise in CMOS transistor for analog applications," IEEE Trans. Elect. Devices, vol. 48, pp. 921-927, 2001.
[48] "BSIM3v3.2.2 MOSFET Model: User's Manual," 2004.
[49] P. E. Allen and D. R. Holberg, eds., CMOS Analog Circuit Design. 2nd Edition, Oxford, UK: Oxford University Press, 2002.
[50] D. Xie, M. Cheng, and L. Forbes, "SPICE models for flicker noise in n-MOSFETs from subthreshold to strong inversion," IEEE Trans. Computer-Aided Design of Integ. Circuits and Syst., vol. 19, pp. 1293-1303, Nov 2000.
[51] D. E. Hocevar, P. Yang, T. N. Trick, and B. D. Epler, "Transient sensitivity computation for MOSFET circuits," IEEE Trans. on Computer-Aided Design, vol. CAD-4, pp. 609-620, Oct. 1985.
[52] P. Bolcato and R. Poujois, "A new approach for noise simulation in transient analysis," in Proc. IEEE Int. Symp. Circuits Syst., pp. 887-890, May 1992.
[53] Mentor Graphics, ELDO (Demo Version) User's Manual, 2003.
[54] M. J. Uren, M. J. Kirton, and S. Collins, "Anomalous telegraph noise in small-area silicon metal-oxide-semiconductor field-effect transistors," Physical Review B, vol. 37, pp. 37-41, May 1988.
[55] K. K. Hung, P. K. Ko, C. Hu, and Y. C. Cheng, "Random Telegraph Noise of Deep-Submicrometer MOSFET's," IEEE Trans. Elect. Devices, vol. 11, pp. 90-92, Feb. 1990.
[56] S. T. Martin, G. P. Li, E. Worley, and J. White, "The gate bias and geometry dependence of random telegraph signal amplitudes," IEEE Elect. Devices Lett., vol. 18, pp. 444-446, Sep 1997.
[57] A. Lee, A. R. Brown, A. Asenov, and S. Roy, "Random telegraph signal noise simulation of decanano MOSFETs subject to atomic scale structure variation," Superlattices and Microstructure, vol. 34, pp. 293-300, 2003.
[58] Y. Taur and T. H. Ning, eds., Fundamentals of Modern VLSI Devices. Cambridge, UK: Cambridge University Press, 1998.
[59] K. R. Laker and W. M. C. Sansen, eds., Design of Analog Integrated Circuits and Systems. International Edition, Singapore: McGraw-Hill Inc., 1994.
[60] C. Jacoboni and L. Reggiani, "The Monte Carlo method for the solution of charge transport in semiconductors with applications to covalent materials," Review of Modern Physics, vol. 55, pp. 645-705, July 1983.
[61] H. Chible, "Analog circuit for synapse neural networks VLSI implementation," in The 7th IEEE Int. Conf. on Electronics, Circuits and Systems (ICECS 2000), vol. 2, pp. 1004-1007, May 2000.
[62] G. Colli and F. Montecchi, "Low voltage low power CMOS four-quadrant analog multiplier," IEEE Int. Symp. on Circuits and Systems, vol. 1, pp. 496-499, 1996.
[63] D. Coue and G. Wilson, "A four-quadrant subthreshold mode multiplier for analog neural-network application," IEEE Trans. on Neural Networks, vol. 7, pp. 1212-1219, 1996.
[64] N. Saxena and J. Clark, "A four-quadrant CMOS analog multiplier for analog neural-network," IEEE Journal of Solid-State Circuits, vol. 29, pp. 746-749, 1994.
[65] A. Ambrozy, ed., Electronic Noise. New York: McGraw-Hill Inc., 1982.
[66] A. Astaras, Pulse-stream binary stochastic hardware for neural computation: The Helmholtz machine. PhD thesis, University of Edinburgh.
[67] J. R. Movellan and J. L. McClelland, "Learning continuous probability distributions with symmetric diffusion network," Cognitive Science, vol. 17, pp. 463-496, 1993.
[68] J. R. Movellan, "A learning theorem for networks at detailed stochastic equilibrium," Neural Computation, vol. 10, pp. 1157-1178, 1998.
[69] P. Fleury, H. Chen, and A. Murray, "On-chip contrastive divergence learning in analogue VLSI," in Proceedings of the International Joint Conference on Neural Networks (IJCNN'04), (Budapest, Hungary), 25-29, 2004.
[70] G. E. Hinton, "Product of experts," in Proceedings of the Ninth International Conference on Artificial Neural Networks (ICANN'99), (Edinburgh, Scotland), 1-6, 1999.
[71] P. Smolensky, "Information processing in dynamical systems: Foundation of harmony theory," in Parallel Distributed Processing: Exploration in the Microstructure of Cognition, vol. 1.
[72] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, pp. 1771-1800, 2002.
[73] J. Alspector, B. Gupta, and R. B. Allen, "Performance of a stochastic learning microchip," in Advances in Neural Information Processing Systems (NIPS88), vol. 1.
[74] A. Jayakumar and J. Alspector, "A cascadable neural network chip set with on-chip learning using noise and gain annealing," in Proceedings of IEEE Custom Integrated Circuits Conference, pp. 19.5.1-19.5.4.
[75] R. van Langevelde and F. M. Klaassen, "An explicit surface-potential-based MOSFET model for circuit simulation," Solid-State Elect., vol. 44, pp. 409-418, 2000.
