Bio-inspired electronics for micropower vision processing by Constandinou, Timothy
Bio-inspired Electronics For
Micropower Vision Processing
by
Timothy Constandinou
October 2005
A thesis submitted for
the degree of Doctor of Philosophy of the University of London
Department of Electrical and Electronic Engineering
Imperial College of Science, Technology and Medicine
University of London
Contents
1 Introduction 1
1.1 Motivation Neurobiology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3.1 Biologically Inspired Electronics . . . . . . . . . . . . . . . . . . . . 2
1.3.2 Modern Vision Processing Technology . . . . . . . . . . . . . . . . . 2
1.3.3 A Distributed Architecture for Centroid Detection . . . . . . . . . . 3
1.3.4 Photodiodes in Modern Deep Sub-Micron CMOS Technology . . . . 3
1.3.5 ORASIS: A Micropower Centroiding Vision Processor . . . . . . . . 4
2 Biologically Inspired Electronics 6
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Neural Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1 Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Synapses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.3 Neural Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Neural Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.1 Neural Visual Streams . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.2 Retinal Data Representation . . . . . . . . . . . . . . . . . . . . . . 13
2.3.3 Quantised Data/time vs. Continuous Data/time . . . . . . . . . . . 14
2.3.4 Spike Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Hybrid Computation for Improved Computational Eﬃciency . . . . . . . . 16
2.4.1 Analogue versus Digital Signal Processing . . . . . . . . . . . . . . . 16
2.4.2 Linear operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.3 Non-linear operations . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.4 Hybrid System Organisation . . . . . . . . . . . . . . . . . . . . . . 21
2.5 The Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.1 Weak Inversion Technology . . . . . . . . . . . . . . . . . . . . . . . 23
2.5.2 Asynchronous Technology . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3 Modern Vision Processing Technology 38
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Imager Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.1 CCD Imagers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.2 CMOS Imagers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.3 CCD vs. CMOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 Vision Processing Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.1 The Modular Approach . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.2 The Distributed Approach . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.3 The Computation-on-Readout Approach . . . . . . . . . . . . . . . . 48
3.4 Centroid Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.1 Sequential Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4.2 Distributed Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4 A Distributed Algorithm for Centroid Detection 62
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.2 The Bio-pulsating Contour Reduction Algorithm . . . . . . . . . . . . . . . 63
4.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2.3 Analytical Formalisation . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.4 Algorithmic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2.6 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2.7 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2.8 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.3 A Bio-inspired Paradigm for Parallel Processing . . . . . . . . . . . . . . . . 88
4.3.1 The Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.3.2 The Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.3.3 Neurobiological analogy . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.4 Emerging technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5 Photodiodes in Modern Deep Sub-Micron CMOS Technology 96
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.2 Photodiode Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2.1 Silicon-based Phototransduction . . . . . . . . . . . . . . . . . . . . 97
5.2.2 The P-N Junction Photodiode . . . . . . . . . . . . . . . . . . . . . 98
5.2.3 Photodiode Eﬃciency . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3 Photodiode Characterisation . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3.1 Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3.2 Device Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3.3 Device Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.4 Photodiode Results Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.4.1 Functional Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.4.2 Impact of Technology Scaling on Photodiode Performance . . . . . . 126
5.4.3 Photodiode Design Recommendations . . . . . . . . . . . . . . . . . 126
5.5 Interface Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.5.1 Continuous-time Pixel . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.5.2 Active Pixel Sensor (APS) . . . . . . . . . . . . . . . . . . . . . . . . 128
5.5.3 Spiking Pixel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.6 An Adaptive-ON/OFF Spiking Photoreceptor . . . . . . . . . . . . . . . . . 129
5.6.1 A bio-inspired encoding scheme . . . . . . . . . . . . . . . . . . . . . 130
5.6.2 Photodiode Implementation . . . . . . . . . . . . . . . . . . . . . . . 131
5.6.3 System Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.6.4 Circuit Topology and analysis . . . . . . . . . . . . . . . . . . . . . . 134
5.6.5 Circuit Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.6.6 Simulated and Measured Results . . . . . . . . . . . . . . . . . . . . 136
5.6.7 Spike Interval Encoding . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.6.8 Contribution to Related Work . . . . . . . . . . . . . . . . . . . . . 144
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6 ORASIS: A Micropower Centroiding Vision Processor 152
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6.2 System Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.2.1 Pixel Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.2.2 Global Signal Distribution . . . . . . . . . . . . . . . . . . . . . . . . 155
6.2.3 System Input/Output . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.2.4 Pixel Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.3 Distributed Analogue Signal Processing (ASP) Core . . . . . . . . . . . . . 158
6.3.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.3.2 Averaging and Comparison . . . . . . . . . . . . . . . . . . . . . . . 159
6.3.3 Edge Detecting and Contour Discrimination . . . . . . . . . . . . . . 164
6.3.4 Bias Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6.3.5 Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6.4 Distributed Asynchronous Binary Processing (ABP) Core . . . . . . . . . . 172
6.4.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.4.2 State Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.4.3 State Reset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.4.4 State Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.4.5 Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6.5 Address Event Representation (AER) . . . . . . . . . . . . . . . . . . . . . 180
6.5.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.5.2 Sender Neuron Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.5.3 Column/Row Latch . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.5.4 Arbiter Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.5.5 Address Encoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.6 Fabricated prototypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6.6.1 ORASIS-P1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6.6.2 ORASIS-P2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.7 System Results (Simulated) . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
6.7.1 ASP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
6.7.2 ABP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
6.7.3 AER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6.7.4 Overall System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
6.8 System Results (Measured) . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
6.8.1 Test Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
6.8.2 System Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.8.3 Power Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
6.8.4 Processing Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.8.5 AER Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
7 Conclusion 216
7.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
7.2 Recommendations for Future Work . . . . . . . . . . . . . . . . . . . . . . . 217
7.2.1 System Optimisation, Enhancement and Development . . . . . . . . 217
7.2.2 Hybrid Distributed Algorithm Design and Implementation . . . . . . 218
A Algorithm Simulation Source Code 220
B AER Communication Source Code (Firmware) 238
C System Simulation Schematics 242
D Testboard Hardware 248
E Optoelectronic Measurement Setup 255
F Publications 258
List of Figures
2.1 Neural architecture - Horizontal cells from a rabbit's retina representing the
intricate web-like neural interconnectivity (left) and typical representation of
a neuron highlighting data ﬂow (right) . . . . . . . . . . . . . . . . . . . . . 8
2.2 The synapse, a neurobiological mechanism forming the contact sites that
facilitate interneuronal connections for transmission and processing of neural
information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 The hierarchical organisation of the nervous system, from the highest level,
behavioural systems (on the left), to the lowest level, genes (on the right) . 10
2.4 The human visual pathway - cellular representation of the retina (top) and
various parts of the brain associated with visual processing (bottom) . . . . 12
2.5 Signal representation in the primate retina, illustrating typical electrical re-
sponse of various neuron types to spot illumination. . . . . . . . . . . . . . 13
2.6 Classiﬁcation of the various data representation techniques in standard mi-
croelectronic technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.7 The relative cost of computation using analogue or digital signal processing.
The crossover point at which the advantage passes between analogue and digital lies
between 50dB and 72dB SNR (8 to 12 bits resolution), depending on application
and circuit topology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.8 Hybrid processing architectures with a single data conversion stage. (a) con-
ventional analogue front-end with digital processor and output (b) hybrid
analogue/digital processing platform with digital output (c) hybrid analogue
and sampled data processing platform with digital output and (d) hybrid
analogue/spike-domain processor with asynchronous digital output . . . . . 22
2.9 Evolving complexity of common MOSFET simulation models; the two plots
illustrating the trend for the BSIM and EKV models. . . . . . . . . . . . . 24
2.10 Eﬀect of CMOS technology scaling on threshold voltage mismatch . . . . . 29
3.1 A typical CCD imager (full-frame based) architecture. . . . . . . . . . . . . 42
3.2 A typical CMOS imager architecture . . . . . . . . . . . . . . . . . . . . . . 43
3.3 A conventional real-time image processing platform . . . . . . . . . . . . . . 45
3.4 Real-time distributed-processing vision chip architecture . . . . . . . . . . . 46
3.5 Object centroid computation by row/column summation followed by one di-
mension mean point calculation. Binary (left) and analogue (right) represen-
tation schemes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.6 Object centroid computation by winner-take-all network (competition illus-
trated only in one dimension). Using basic resistive grid (left) for single
centroid, dynamic switching network (middle) for saliency/object segmenta-
tion and interrogation window (right) for centroid tracking . . . . . . . . . 55
4.1 Example analysis scenarios where extraction of object centroid and/or size
could provide useful information in (from left to right): (a) Pharmaceutical
drug production (b) Reliability (leak) detection of bubbles in ﬂuids and (c)
Microscopic cellular population analysis . . . . . . . . . . . . . . . . . . . . 63
4.2 Computer simulation results of the bio-pulsating contour reduction algo-
rithm, illustrating continuous-time image processing functions (top row) and
snapshots taken at regular time intervals at the propagation delay of the
processing (bottom 3 rows) . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3 Front-end continuous-time image-processing functionality. . . . . . . . . . . 70
4.4 Local connectivity required for binary signals (state, reset and centre) to and
from every pixel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5 Proposed cellular architecture for object-based processing illustrating organ-
isation and connectivity of functional blocks within a quad-pixel arrangement. 74
4.6 The ORASIS simulator: screenshot of the developed software simulator for
the bio-pulsating contour reduction algorithm. . . . . . . . . . . . . . . . . 76
4.7 Acceptable noise margins for error-free binary edge detection and thresholding. 79
4.8 Simulated results demonstrating the algorithm's inherent tolerance to erroneous
(or incomplete) binary feature extraction. Shown are the responses to:
(a) perfect binary extraction, (b,d) incomplete thresholding and (c,e)
incomplete edge detection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.9 Statistical simulations to demonstrate robustness to array non-uniformities.
Algorithmic response to additive spatial Gaussian noise representing ﬁxed
pattern noise (top) and random speckle noise representing array gain/sensitivity
non-uniformity (bottom). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.10 Statistical simulations to demonstrate robustness to array phototransduc-
tion/ampliﬁcation defects. Algorithmic response to additive salt and pepper
noise representing pixels with permanent low/high response. . . . . . . . . . 87
4.11 The generalised distributed processing architecture; a one dimensional array
illustrating the various functional layers and interconnectivity. Increased
performance and/or added functionality (e.g. localised gain control, noise-shaping,
oversampling, etc.) could be realised through closed-loop conversion
mechanisms (dotted arrows), although these would be purely synthetic as
opposed to biologically inspired. . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.1 The basic P-I-N diode under zero external bias illustrating: (a) cross-section
and (b) energy band diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 The basic P-N junction diode under zero external bias illustrating: (a) cross-
section (b) n/p concentration proﬁle (c) space charge density (d) electric ﬁeld
and (e) internal potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.3 Cross section of a typical modern deep submicron CMOS technology showing
the stacked metal layers with insulating dielectrics and optical transmission
path (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.4 Basic geometric dimensions (design and technology/bias deﬁned) for a single
junction device representing the surface (left) and cross-section (right) views. 105
5.5 Various single-junction photodiode structures fabricated and tested in 0.18μm
CMOS. Illustrated are the surface and cross-section views of the following
structures: (a) n++/p-substrate (b) n++ rings/p-substrate (c) n++/p-well
(d) n++ rings/p-well (e) n++ strips/p-well (f) n-well/p-substrate (g) n-well
strips/p-substrate (h) n-well grid/p-substrate (i) n-well mesh/p-substrate
and (j) p++/n-well. All devices are sized 30μm × 30μm and illustrations
are not to scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.6 Various multi-junction photodiode structures fabricated and tested in 0.18μm
CMOS. Illustrated are the surface and cross-section views of the following
structures: (a) t-well/n-well/p-substrate (b) t-well grid/n-well/p-substrate
(c) n++/t-well/n-well/p-substrate and (d) p++/n-well/p-substrate. All devices
are sized 30μm × 30μm and illustrations are not to scale. . . . . . . . . . 111
5.7 Microphotographs of photodiode structures fabricated and tested in 0.18μm
CMOS. Illustrated are: (a) n++/p-well (b) n++ rings/p-well (c) n++ strips/p-
well (d) n-well/p-substrate (e) n-well strips/p-substrate (f) n-well grid/p-
substrate (g) n-well mesh/p-substrate, (h) p++/n-well, (i) p++/n-well grid,
(j) t-well/n-well/p-substrate (k) t-well grid/n-well/p-substrate and (l) n++/t-
well/n-well/p-substrate. All devices are sized 30μm× 30μm. . . . . . . . . 111
5.8 Measured IV characteristics of various test (photodiode) structures (using
calibrated light source: λ=550nm, Pmax=0.45mW/cm2). Shown are the
characteristics for: (a) n++/p-substrate (b) n++ rings/p-substrate (c) n++/p-
well (d) n++ rings/p-well (e) n++ strips/p-well and (f) n-well/p-substrate. 113
5.9 Measured IV characteristics of various test (photodiode) structures (using
calibrated light source: λ=550nm, Pmax=0.45mW/cm2). Shown are the
characteristics for: (a) n-well strips/p-substrate (b) n-well grid/p-substrate
(c) n-well mesh/p-substrate (d) p++/n-well and (e) p++ strips/n-well. . . 114
5.10 Measured responsivities of various test (single junction photodiode) struc-
tures (using calibrated light source: λ=550nm, Pmax=0.45mW/cm2). . . . 116
5.11 Measured spectral photoresponse of single junction photodiode structures
(using controlled light source: 350nm < λ < 750nm). . . . . . . . . . . . . . 117
5.12 Measured spectral responsivity of single junction photodiode structures (us-
ing controlled light source: 350nm < λ < 750nm). . . . . . . . . . . . . . . 118
5.13 Measured spectral quantum eﬃciency of single junction photodiode struc-
tures (using controlled light source: 350nm < λ < 750nm). . . . . . . . . . 118
5.14 Measured spectral photoresponse of multiple junction photodiode structures
(using controlled light source: 350nm < λ < 750nm). Devices tested are:
t-well/n-well/p-sub (left) and n++/t-well/n-well/p-sub (right). . . . . . . . 119
5.15 Measured spectral responsivity of multiple junction photodiode structures
(using controlled light source: 350nm < λ < 750nm). Devices tested are:
t-well/n-well/p-sub (left) and n++/t-well/n-well/p-sub (right). . . . . . . . 120
5.16 Measured spectral quantum eﬃciency of multiple junction photodiode struc-
tures (using controlled light source: 350nm < λ < 750nm). Devices tested
are: t-well/n-well/p-sub (left) and n++/t-well/n-well/p-sub (right). . . . . 121
5.17 Measured spectral quantum eﬃciency (normalised) of spectrally selective sin-
gle junction photodiode structures (using controlled light source: 350nm <
λ < 750nm). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.18 Measured spectral quantum eﬃciency (normalised) of multiple-junction pho-
todiode structures (using controlled light source: 350nm < λ < 750nm).
Devices tested are: t-well/n-well/p-sub (left) and n++/t-well/n-well/p-sub
(right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.19 Measured spectral quantum eﬃciency (normalised) of spectrally unselective
single junction photodiode structures (using controlled light source: 350nm <
λ < 750nm). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.20 Measured and simulated (based on developed photoresponse model) spectral
quantum eﬃciency comparison for: (a) n++/p-well, (b) n++ rings/p-well,
(c) n-well/p-substrate and (d) n-well strips/p-substrate. . . . . . . . . . . . 123
5.21 Scanning electron microscope (SEM) images of the ORASIS-P2 surface; il-
lustrating the passivation layer/air interface proﬁle. Cross-section through a
photodiode region shown in lower-right image. . . . . . . . . . . . . . . . . . 125
5.22 Various photodiode interface topologies. Shown are: (a) logarithmic sensor
using single MOS diode (b) logarithmic sensor using two series MOS diodes
(c) adaptive photoreceptor (d) active pixel sensor (APS) and (e) spiking
photoreceptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.23 Measured photodiode characteristics. The photodiode response remains linear
down to below 50pW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.24 Core algorithm per photoreceptor group includes: phototransduction, ON/OFF
opponency, spike generation, variable spike interval encoding and input se-
lection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.25 Circuit schematic of the adaptive-ON/OFF spiking photoreceptor block, operated
oﬀ a 1.8V core supply (implemented in 0.18μm CMOS). Illustrated is
the basic scheme for generating an adaptive-ON/OFF spiking output for a
single photodiode input. Shown at the bottom-left is the implementation
of the thresholding comparator based on a scaled cascade of current-limited
digital inverters. Unless stated all devices have aspect ratio (4/3)lmin for
NMOS and (10/3)lmin for PMOS, with lmin being the technology minimum
feature size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.26 Physical layout of the adaptive-ON/OFF spiking photoreceptor block, imple-
mented in UMC 0.18μm Mixed-mode CMOS. By area, the photodiode has
a 52% ﬁll factor, the threshold detectors occupy 18%, current mirrors 16%
and asynchronous digital logic 14%. . . . . . . . . . . . . . . . . . . . . . . 137
5.27 Simulated transient analysis for the individual ON and OFF channel spike
generators. The waveforms shown, from top to bottom, are: (a) photocurrent (b)
ON channel charging response (c) ON channel spike output (d) OFF channel
charging response (e) OFF channel spike output. . . . . . . . . . . . . . . . 138
5.28 Simulated transient analysis for the combined ON/OFF channel spike generator.
The waveforms shown, from top to bottom, are: (a) photocurrent (b)
competing ON/OFF charging response (c) spike output (d) ON/OFF chan-
nel selection (e) combined spike and ON/OFF channel encoded output. . . 139
5.29 Simulated transient analysis illustrating the power consumption proﬁle. The
waveforms shown, from top to bottom, are: (a) photocurrent (b) combined spike
and ON/OFF channel encoded output (c) current consumption and (d) in-
tegrated current consumption. . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.30 Measured photo-response results for the adaptive-ON/OFF spiking photore-
ceptor circuit. Illustrated is the spike rate to incident light power relationship
for various bias current levels. The action to shift the ON/OFF transition
point can be clearly seen. The light intensity incident on the chip is the
equivalent of a well lit room. . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.31 Measured bias current tuning results for the adaptive-ON/OFF spiking photoreceptor
circuit. Shown (from top to bottom) are: (a) the incident light
power ON/OFF crossover point versus bias current and (b) the spike rate
versus bias current for dark current, i.e. zero incident light power. . . . . . 142
5.32 Circuit schematic of the selective output encoder/controller block, operated
oﬀ a 1.8V core supply (implemented in 0.18μm CMOS). Illustrated is the spike
counting circuitry for generating outputs of reconﬁgurable dynamic range, in
addition to the photodiode multiplexing control for selection between single
and multi-pixel operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.1 The proposed ORASIS System architecture. Illustrated are the three main
components: pixel processing array, address event representation readout
and current/supply/control distribution tree. The dotted lines represent the
“stretch” marks, i.e. how the system can be scaled to a larger size array. . . 154
6.2 The proposed ORASIS Pixel architecture. Illustrated are the four main
components: sensor (photodiode), Analogue Signal Processor (ASP), Asyn-
chronous Binary Processor (ABP) and Address Event Representation (AER)
neuron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.3 Symbol representation of the analogue signal processing (ASP) core. Illus-
trated is the external connectivity of analogue signals, i.e. with other cells,
showing the requirements for cellular tessellation. This excludes global con-
trol signals and bias point connections. Nodes have been abbreviated for
clarity as follows: vp=vphoto, ip=iphoto, ve=vedge. . . . . . . . . . . . . . 160
6.4 Schematic diagram of the in-pixel analogue signal processing (ASP) organ-
isation. Illustrated is the internal connectivity emphasising the signal ﬂow
path between the various blocks. . . . . . . . . . . . . . . . . . . . . . . . . 161
6.5 Schematic diagram of the photodiode interface circuit including current-mode
narrow- and wide-ﬁeld averaging/smoothing and comparison. . . . . . . . . 161
6.6 Schematic representation of the wide-ﬁeld (column) averaging mechanism.
Shown for a three-row example. . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.7 Schematic diagram of the tunable discrete edge detector circuit. Details
of the current generation scheme (Isource and Isink) and implementation of
thresholding inverters (X1 and X2) are provided later. . . . . . . . . . . . . 164
6.8 Simulation results of the edge detector circuit illustrating the discrete de-
tection at varying light intensities. Results are for: Isource = 1nA, Isink =
600pA, with 1pA ≤ Iphoto1, Iphoto2 ≤ 10nA. Shown (from top to bottom) are:
(a) Id1, (b) Id2, (c) Vedge1, (d) Vedge2, (e) Vout and (f) Ivdd . . . . . . . . . . 168
6.9 Simulation results of the edge detector circuit illustrating the tunable sensi-
tivity. Results are for: Isource = 1nA, 500pA ≤ Isink ≤ 1nA, Iphoto1 = 300pA,
with 1pA ≤ Iphoto2 ≤ 10nA. Shown (from top to bottom) are: (a) Id1, (b)
Id2, (c) Vedge1, (d) Vedge2, (e) Vout and (f) Ivdd . . . . . . . . . . . . . . . . . 169
6.10 Monte Carlo simulation results for the edge detector illustrating variability of
edge detection window to process variation and mismatch. For Isource = 2nA,
Isink = 1.5nA, Iphoto1 = 300pA, 1pA ≤ Iphoto2 ≤ 10nA, statistical simulation
of N = 1979 runs results in: μlower = 176.009pA, σlower = 32.190pA, μupper =
521.234pA, σupper = 98.071pA . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.11 Schematic diagram of the contour discrimination combinational logic. . . . 171
6.12 Schematic diagram of the in-pixel current distribution circuit, providing bias
currents to the edge detecting and timing delay blocks. . . . . . . . . . . . 171
6.13 Schematic diagram of the current-limiting thresholding inverter; used for 1-
bit conversion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.14 Monte Carlo simulation results for the thresholding inverter illustrating variability
of threshold voltage to process variation and mismatch. μ = 275.322mV,
σ = 29.028mV, N = 2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.15 Symbol representation of the in-pixel asynchronous binary processing block.
Illustrated is the external connectivity of discrete signals, i.e. with other
cells, showing the requirements for cellular tessellation. This excludes global
control signals and output signals. Nodes have been abbreviated for clarity
as follows: S=STATE, R=RESET, C=CENTRE. . . . . . . . . . . . . . . . 174
6.16 Schematic diagram of the Asynchronous Binary Processing (ABP) organisa-
tion. Illustrated is the internal connectivity emphasising the signal ﬂow path
between the various blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.17 Schematic diagram of the state set logic facilitating the forward asynchronous
signal propagation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.18 Schematic diagram of the state reset logic facilitating the reverse signal (back)
propagation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.19 Schematic diagram of the state memory logic for centroid determination and
state deﬁnition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6.20 Schematic diagram of the current-controlled delay circuit for creating an
asynchronous discrete delay. . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6.21 Measured delay cell performance for diﬀerent bias currents (Idelay), illustrat-
ing statistical variation over a batch of ten samples. . . . . . . . . . . . . . 179
6.22 Address-Event-Representation (AER) architecture for a 4x4 array. Illus-
trated are all required sub-blocks for an AER sending device, excluding pull-
up and pull-down biases for shared line drivers. . . . . . . . . . . . . . . . . 181
6.23 Schematic diagram of the sender neuron circuit facilitating the pixel hand-
shake with the AER row/column latches. . . . . . . . . . . . . . . . . . . . 182
6.24 Schematic diagram of the row and column latch circuit; locking a pixel’s
address upon arbitration until being successfully transmitted oﬀ-chip. . . . 183
6.25 Schematic diagram of the arbiter circuit, interconnected hierarchically to
synthesise the arbitration trees for row and column selection. . . . . . . . . 184
6.26 Schematic diagram of the address encoder circuit, shown for an eight input, 3-bit output example.
6.27 The ORASIS-P1 test chip layout (top), microphotograph (top right) and basic floorplan (bottom); implemented in UMC 0.18μm 1P6M mixed-mode CMOS, accessed through Europractice (IMEC). The die size is 1.525mm x 1.525mm (excluding scribe line). Metal layers 5 and 6 have been excluded for clarity.
6.28 The ORASIS-P2 chip layout (top), microphotograph (top right) and basic floorplan (bottom); implemented in UMC 0.18μm 1P6M mixed-mode CMOS, accessed through Europractice (IMEC). The die size is 5.0mm x 5.0mm (excluding scribe line). Metal layers 5 and 6 have been excluded for clarity.
6.29 The bottom-right array corner layout (top) and floorplan (bottom), illustrating the implementation of the current distribution scheme and array-side address-event circuitry. Metal layer 6 has been excluded for clarity.
6.30 The ORASIS-P2 regular cell layout (top) and floorplan (bottom). The cell size is 85μm×85μm with 30μm×30μm active photodiode area, giving a 12.5% surface fill factor. Metal layers 5 and 6 have been excluded for clarity.
6.31 Transient analysis simulation results for a 9×9 ABP core illustrating bio-pulsating action for a single object image. Results shown are taken across the central row (Y=5) for: (a) state propagation, (b) reset back-propagation and (c) current consumption.
6.32 Transient analysis simulation results for a 12×12 AER sending architecture illustrating arbitration for 24 colliding events. Results shown are: (a) the AER bus output/handshake and (b) current consumption.
6.33 Transient analysis simulation results for a 12×12 complete array for a single circular object input (8 pixel diameter). Results shown are: (a) the AER bus output/handshake; event at position (6,6) and (b) current consumption.
6.34 Transient analysis simulation results for a 12×12 complete system (including bias and signal distribution) for a single circular object input (8 pixel diameter). Results shown are: (a) the AER bus output/handshake; event at position (6,6) and (b) current consumption.
6.35 Test images with single uniform objects, with pixel grid overlaid including measured centroid position and size.
6.36 Test images with single non-uniform objects, with pixel grid overlaid including measured centroid position and size.
6.37 Test images with multiple uniform objects, with pixel grid overlaid including measured centroid positions and sizes.
6.38 Pseudo-dithering providing increased centroid position accuracy through successive averaging.
6.39 Pseudo-dithering providing increased object size accuracy through successive averaging.
6.40 Measured supply current levels illustrating the effect of tuning main bias current (feeding edge detectors and discrete delays) on system power consumption.
6.41 Measured supply current levels illustrating the effect of illumination level on system power consumption for various tuning bias current levels (controlling the edge detecting threshold).
6.42 Effect of channel length and bulk (reverse) bias on static leakage (off) current (at Vds = 1.8V, Vgs = 0V) for (a) NMOS and (b) PMOS devices.
6.43 Dependence of process time on bias current, given for input images including objects of maximum size of 3, 4, 5, 6 and 8 pixel radius.
6.44 Measured address-event bus capacity (bandwidth) by varying pull-up/pull-down bias currents.
C.1 Schematic diagram of the simulated 16×16 ASP array.
C.2 Schematic diagram of the simulated 9×9 ABP array.
C.3 Schematic diagram of the simulated 12×12 AER architecture.
C.4 Schematic diagram of the simulated 12×12 array.
C.5 Schematic diagram of the simulated system including a 12×12 distributed processing array.
D.1 Schematic diagram of the ORASIS-P1 platform for sub-circuit test and photodiode characterisation.
D.2 Photograph of the ORASIS-P1 platform for sub-circuit test and photodiode characterisation.
D.3 Schematic diagram of the ORASIS-P2 platform for photodiode characterisation.
D.4 Photograph of the ORASIS-P2 platform for photodiode characterisation.
D.5 Schematic diagram of the ORASIS-P2 platform for system test and validation.
D.6 Photograph of the ORASIS-P2 platform for system test and validation.
E.1 Diagram of equipment setup for photodiode characterisation.
E.2 Calibrated (measured) light source intensity characteristics. Illustrated are: (a) intensity profile and (b) spectral transmission (normalised to maximum value).
List of Tables
2.1 Comparison of Neural and Electronic Computational Paradigms
2.2 A qualitative comparison of linear computations implemented using different signal representation techniques.
2.3 A qualitative comparison of non-linear computations implemented using different signal representation techniques.
3.1 Comparison of CCD and CMOS imager technologies
3.2 Comparative review of centroid detecting vision chips.
4.1 Split of computational load for various processing functions
4.2 Example image analysis (red blood cells) for statistical spread in object and background intensity levels. This is used to determine the edge (Eoffset) and threshold (Toffset) levels for optimum robustness.
5.1 Design parameters (process defined and geometric) for the various test (photodiode) structures.
5.2 Measured electrical characteristics for the various test (photodiode) structures. Light source is calibrated at: Plight=0.45mW/cm², λ=550nm.
6.1 Tessellating cellular interconnectivity; in total, each cell has 190 connections with adjacent cells.
6.2 Simulation results for a 16×16 ASP core indicating average power consumption levels for typical stimuli through the different process corners.
6.3 Comparison between expected (based on constituent ASP, ABP, AER embedded arrays) and simulated power consumption for a combined 12×12 array.
6.4 ORASIS-P2 system properties and performance summary
To my parents and Marianna!
Acknowledgements
First and foremost, I would like to acknowledge my supervisor Chris Toumazou. He has
always been a constant source of inspiration, support and encouragement. Chris has taught
me that imagination, intuition and common sense are more important than formal educa-
tion. He has always motivated me to look at the bigger picture whilst guiding me to remain
focused.
I would also like to thank Julius Georgiou; a friend, colleague and brilliant engineer. It
was Julio who ﬁrst proposed the idea of doing a PhD to me. Over the years we have worked
closely together and I have been fortunate to have his feedback and insight time and time
again.
From the University of Oslo (UIO), I want to thank Tor Sverre Lande (Bassen) who
introduced me to the world of neuromorphic electronics. His constant enthusiasm in non-
conventional, elegant solutions and strong belief in my work had encouraged me to pursue
research in this field. Also from UIO, I wish to acknowledge Philipp Häfliger, for providing
me complete access to his address event libraries.
I want to thank from the Physics & Bioengineering departments Patrick Degenaar and
Dylan Banks for many invaluable discussions and for providing me unrestricted access to
taking optoelectronic measurements in the Blackett Laboratory. Patrick initiated the idea
of the ON/OFF spiking photoreceptor and I have enjoyed developing it with him.
From the Circuits & Systems group, I would like to thank both past and present mem-
bers; providing a friendly yet intellectually rich environment. I thank Christos Papavas-
siliou, Solon Despotopoulos, Ganesh Kathiresan, Emmanuel Drakakis, Kostis Michelakis,
Kritsapon (Chy) Leelavattananon, Tony Vilches, Rohit Arora, Okundu Omeni and Mo-
hammed Semati for introducing and welcoming me to the group, David Yates, Calvin
Sim and Wim Melis for enduring the past three years alongside me and Michalis Frangos,
Francesco Cannillo, Leila Shepherd, Pantelakis Georgiou, Andreas Katsiamis, Soﬁa Vatti,
Themistoklis Prodromakis, Chun Lee, Aleksandra Rankov, Phil Corbishley, Xu Min and
the list goes on.. for continuing to provide a wonderful work environment. Furthermore, I
wish to thank the group administrators, Wiesia Hsissen and Angela Bishop for their con-
stant support and warmth they bring to the group. From outside the group, I thank Alex
Charalambides and Samia Antoun for taking countless coﬀee breaks with me when work
got too much.
I have been very fortunate to receive both ﬁnancial support and access to technical
resources from the team at Toumaz Technology Ltd, in particular I want to thank Keith
Errey and Alison Burdett.
Last and most importantly, I would like to express my gratitude to my family. I thank
my parents Gregory and Niki, my sister Marianna and my grandparents Timothy and
Yianoulla for their continuing support and encouragement. I ﬁnally thank my partner
Katy for enduring the past three years and providing me with constant love, support and
excitement.
Abstract
Vision processing is a topic traditionally associated with neurobiology; known to en-
code, process and interpret visual data most eﬀectively. For example, the human retina;
an exquisite sheet of neurobiological wetware, is amongst the most powerful and eﬃcient
vision processors known to mankind. With improving integrated technologies, this has
generated considerable research interest in the microelectronics community in a quest to
develop eﬀective, eﬃcient and robust vision processing hardware with real-time capability.
This thesis describes the design of a novel biologically-inspired hybrid analogue/digital
vision chip ORASIS1 for centroiding, sizing and counting of enclosed objects. This chip is
the ﬁrst two-dimensional silicon retina capable of centroiding and sizing multiple objects2
in true parallel fashion. Based on a novel distributed architecture, this system achieves
ultra-fast and ultra-low power operation in comparison to conventional techniques.
Although speciﬁcally applied to centroid detection, the generalised architecture in fact
presents a new biologically-inspired processing paradigm entitled: distributed asynchronous
mixed-signal logic processing. This is applicable to vision and sensory processing appli-
cations in general that require processing of large numbers of parallel inputs, normally
presenting a computational bottleneck.
Apart from the distributed architecture, the specific centroiding algorithm and the vision
chip, other original contributions include: an ultra-low power tunable edge-detection circuit,
an adjustable threshold local/global smoothing network and an ON/OFF-adaptive spiking
photoreceptor circuit.
Finally, a concise yet comprehensive overview of photodiode design methodology is pro-
vided for standard CMOS technologies. This aims to form a basic reference from an en-
gineering perspective, bridging together theory with measured results. Furthermore, an
approximate photodiode expression is presented, aiming to provide vision chip designers
with a basic tool for pre-fabrication calculations.
1 ORASIS is taken from the Greek word όραση (orasi), meaning vision.
2Such as biological cells under a microscope, pharmaceutical drugs under production line inspection, etc.
Chapter 1
Introduction
1.1 Motivation Neurobiology
Biology makes excellent use of resources to solve a given task. For example, the human
retina; an exquisite sheet of neural tissue, is made up of a layered network of millions
of poorly replicated yet statistically identical primitives to provide the brain with a
well-conditioned neural image of what we see. Beyond the retina, the brain employs
much the same strategy, using billions of poorly defined primitives to achieve
the most computationally demanding and perceptive tasks. Face recognition, real-time
navigation control, object segmentation, depth perception, saliency detection are a few tasks
that we routinely perform and take for granted; yet even our most advanced computational
hardware is incapable of achieving acceptable let alone comparable results. Biology is
eﬃcient, robust, adaptable, real-time, eﬀective, scalable and reliable.
For the above reasons, any engineer would do extremely well to learn from nature.
In electronics, developing systems based on neurobiological hierarchy, organisation [1], rep-
resentation and/or structure could result in improved eﬃciency, robustness, adaptability,
responsivity, eﬀectiveness and reliability.
1.2 Research Objectives
This research aims to explore biologically-inspired electronics [2] by developing a
specific application that employs a hybrid strategy. The aim is to combine biologically-inspired
methods with conventional techniques to achieve the best of both worlds. Consequently,
this work is targeted towards providing some insight into answering the following
questions:
• In which aspects is neurobiology superior to modern microelectronic technologies and
vice versa?
• What can we learn from biology that is efficiently implementable in standard
microelectronic technologies and can provide substantial advantage over conventional
techniques?
1.3 Overview
This section provides a brief, single-paragraph introduction to the
material covered in each of the following chapters.
1.3.1 Biologically Inspired Electronics
This chapter introduces the term “biologically inspired electronics” and identifies
appropriate technologies for implementing such systems. Initially, fundamentals of
neurobiology such as structure, organisation and hierarchy are described, leading to the
notion of biologically inspired representation. This approach is then extrapolated to
microelectronics, in particular emphasising process-efficient representation. The notion
of hybrid computation is discussed and the relative merits and flaws concerning
resource-efficient computation are compared for common operations. Finally, modes of device
operation in current microelectronic technologies and associated design techniques related to
reliable and efficient implementation in this representation space are discussed and reviewed.
1.3.2 Modern Vision Processing Technology
This chapter begins by comparing modern imaging and vision acquisition process
technologies. This is followed by an extension to a typical processing platform, identifying
key features and limitations. Distributed techniques resembling biologically-inspired systems
are then introduced and outlined as a plausible solution to the limitations of modern
systems. Finally, an extensive vision chip review is presented for all centroid processing
systems reported to date, covering both those employing traditional computational techniques
and those embedded in distributed hardware.
1.3.3 A Distributed Architecture for Centroid Detection
This chapter presents a novel distributed algorithm for centroiding and sizing of regular
objects. Through a true parallel processing approach, traditional information flow bottlenecks
have been overcome and high computational efficiency is achievable. Furthermore, the
presented scheme, based on a biologically-inspired organisation, demonstrates great robustness
to fabrication non-idealities and ill-conditioned sensor data. This has been experimentally
verified by testing against both non-uniform input data and pixel processing element mismatch.
Finally, the presented architecture is extended to form a generic distributed array processing
paradigm. The basic concept is described and related to a biological vision system, with
some example implementable algorithms being outlined.
1.3.4 Photodiodes in Modern Deep Sub-Micron CMOS Technology
During the past few years, an increased interest in CMOS technology for imaging appli-
cations; principally due to the progress of the Active Pixel Sensor (APS) approach, has
prompted much work on photodiode modelling. Although much work has been published
on modelling phototransduction, reliable modelling in standard CMOS technologies has not
been widely accessible. This is partly due to corporate interests, and partly due to estab-
lished semiconductor physics being applicable to CMOS phototransduction. This chapter
begins by presenting a uniﬁed phototransduction theory speciﬁcally for PN junction photo-
diodes within standard CMOS technologies. A series of test devices are then fabricated in
a modern technology and all the measured results are presented. These are used to validate
the developed model and outline a set of generic design rules for good photodiode design
within a standard CMOS technology. Finally, a biologically-inspired photoreceptor circuit
is presented; using an ON/OFF spiking scheme to provide adaptable, tradable dynamic
range, spatial and temporal resolution. This circuit is designed to be part of a dynamically
reconﬁgurable foveating silicon retina, details of which are not included.
1.3.5 ORASIS: A Micropower Centroiding Vision Processor
The ﬁnal chapter presents the complete system implementing the distributed algorithm
(described in Chapter 4) and photosensing elements (described in Chapter 5) into custom
integrated hardware. This system, named ORASIS; presents the ﬁrst distributed vision
processor (or silicon retina) capable of multiple centroid detection and sizing; implemented
in a standard CMOS technology. This chapter describes the architecture, hierarchy, inter-
connectivity and speciﬁc circuit blocks also including simulated and measured results.
References
[1] G. M. Shepherd (ed.), The Synaptic Organization of the Brain. Oxford University Press,
1998.
[2] C. A. Mead, Analog VLSI and Neural Systems. Addison-Wesley, 1989.
Chapter 2
Biologically Inspired Electronics
2.1 Introduction
Although modern microelectronic technologies have surpassed our expectations in virtually
all areas, there still remains a vast application space of computational problems either too
challenging or complex to be solved with conventional means. These applications often
require the transformation of data across the boundary between the real (analogue) world
and the digital world. The problem arises, whenever a system is sampling and acting on
real-world data, for example in any recognition or identiﬁcation task. Traditional processing
techniques ﬁnd it very challenging and computationally demanding to identify and process
complex structures and relationships in vast quantities of ill-conditioned data (low precision,
ambiguous and noisy) [1].
Although great progress has been made in hardware processing techniques (e.g. DSP,
FPGA), in both computational power and eﬃciency, the solution to complex recognition
tasks still continues to elude us. Furthermore, neither artiﬁcial intelligence, artiﬁcial neural
networks nor fuzzy logic has provided us with an eﬀective and robust solution. However,
biological organisms routinely accomplish complex visual tasks such as object recognition
and target tracking. For example, a common houseﬂy; with a brain the size of a grain of rice
can outperform our modern multiple gigahertz processors in real-time obstacle avoidance
in ﬂight navigation in addition to countless other perception tasks.
Thus, the Neuromorphic community has emerged aiming to provide a design methodology
for tackling such problems. Using hybrid, distributed processing architectures based
on simple primitives inspired by biology, modern microelectronic technology is progressing
one step closer to finding a workable solution to these problems.

Computation      Neural                     Electronic (von Neumann based)   Ref.
Representation   Analogue, spike domain¹    Digital                          [3]
Processing       Parallel                   Serial                           [3]
Power            Low                        High                             [4]
Speed            Low, High²                 High                             [4]
Learning         Adaptive                   Preset                           [5]
Precision        Fuzzy                      Accurate                         [5]
Sensitivity      Redundant                  Fault sensitive                  [6]
Organisation     Monolithic                 Modular                          [7]
Connectivity     Very high fan-out          Sparsely connected               [7]
Geometry         3 dimensions³              2 dimensions                     [6]

¹ Intracellular representation is continuous whereas intercellular is discrete.
² Dependent on the particular process and level of abstraction, e.g. low speed operation at
neuronal level, however extremely high speed at cortical level.
³ Often classed as 2.5 dimensions, as neural tissue typically has a layered structure that
folds, occupying a 3 dimensional space.

Table 2.1: Comparison of Neural and Electronic Computational Paradigms
2.2 Neural Organisation [2]
Neurobiology has a massively diﬀerent organisation to that of any conventional electronic
system. This vast diﬀerence is highly evident when comparing various features of the two
systems (see Table 2.1).
This section aims to provide a basic overview of the neurobiological organisation, starting
from the fundamental processing element and building block: the biological neuron. The
mechanism of synaptic processes, providing intra- and inter-neuronal communication and
interaction, will then be discussed. Having established these key primitives, the neural
hierarchical organisation will be outlined, linking together genes, molecules, synapses,
neurons, neural networks and beyond.

[Figure 2.1 labels (right panel): Dendrites, Nucleus, Soma (body), Axon Hillock, Axon (under
myelin), Myelin Sheath, Schwann Cell, Node of Ranvier, Axon Terminals, Button (Feet),
Synapses; DATA IN, DATA OUT.]
Figure 2.1: Neural architecture - Horizontal cells from a rabbit’s retina representing the
intricate web-like neural interconnectivity (left) and typical representation of a neuron
highlighting data flow (right)
2.2.1 Neurons [8]
The brain is a collection of about 10 billion interconnected neurons. A neuron is a cell (see
Fig. 2.1) which uses biochemical reactions to receive, process and transmit information.
Each neuron’s dendritic tree is connected to up to thousands of neighbouring neurons.
When any of these neurons ﬁre, a positive or negative charge is received by one of the den-
drites. The strengths of all the received charges are added together through the processes
of spatial and temporal summation. The aggregate input is then passed to the soma (cell
body). The soma and the enclosed nucleus don’t play a signiﬁcant role in the processing of
incoming and outgoing data. Their primary function is to perform the continuous mainte-
nance required to keep the neuron functional. The part of the soma that does concern itself
with the signal is the axon hillock (and the internal surface membrane.) If the aggregate
input is greater than the axon hillock’s threshold value, then the neuron “ﬁres” and an
output signal (known as the action potential) is transmitted down the axon. The strength
of the output is constant, regardless of whether the input was just above the threshold, or
say a hundred times as great; however, the time to spike is affected. The output strength is
unaffected by the many divisions in the axon; it reaches each terminal button with the same
intensity it had at the axon hillock. This uniformity is critical in an analogue device such
as a brain where small errors can snowball, and where error correction is more difficult than
in a digital system. It is this action of the neuron firing when a threshold value is reached
which introduces the gain that enables the processing in the neurobiological system.

[Figure 2.2 labels: Axon of transmitting neuron, Myelin, Synaptic vesicle, Neurotransmitter
molecules, Synaptic cleft, Plasma membrane, Acetylcholine receptors, Dendrite of receiving
neuron.]
Figure 2.2: The synapse, a neurobiological mechanism forming the contact sites that
facilitate interneuronal connections for transmission and processing of neural information.
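
The threshold behaviour described in this subsection can be illustrated with a minimal integrate-and-fire sketch. This is a simplification under stated assumptions, not a model taken from this work: the function name, the leak factor, the threshold and the input values are all hypothetical, and the spatial and temporal summation in the dendritic tree is reduced to a leaky running sum.

```python
def integrate_and_fire(inputs, threshold=1.0, leak=0.9):
    """Return the time steps at which a simplified neuron fires.

    inputs: summed synaptic input per time step
            (positive = excitatory, negative = inhibitory).
    threshold, leak: hypothetical parameters chosen for illustration only.
    """
    potential = 0.0
    spike_times = []
    for t, x in enumerate(inputs):
        # Temporal summation: the previous potential decays, new input is added.
        potential = potential * leak + x
        if potential >= threshold:
            # Every spike is identical; only its timing carries information.
            spike_times.append(t)
            potential = 0.0
    return spike_times


weak = integrate_and_fire([0.3] * 20)    # input slightly above what is needed to fire
strong = integrate_and_fire([3.0] * 20)  # input far above threshold
print(weak[0], strong[0])
```

In this sketch the stronger input does not produce a larger output, only an earlier first spike and a higher spike rate, which mirrors the constant output strength and input-dependent time to spike noted in the text.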
2.2.2 Synapses [7]
Each terminal button is connected to other neurons across a small gap called a synapse
(see Fig. 2.2). The physical and neurobiological properties of each synapse determine the
strength and polarity of the new input signal. This is where the nervous system is the most
ﬂexible. Changing the constitution of various neurotransmitter chemicals can increase or
decrease the amount of stimulation that the ﬁring axon imparts on the neighbouring den-
drite. Altering the neurotransmitters can also change whether the stimulation is excitatory
or inhibitory. It is this dynamic nature of the synapses that provides the neurobiological
architecture with the ability to adapt and “learn”. This results in an extremely robust
system architecture, being considerably immune to component degradation and failure.
[Figure 2.3 labels, left to right: Behavioural Systems, Neural Networks, Local Circuits,
Neurons, Dendritic Trees, Microcircuits, Synapses, Molecules, Genes. Axis annotations:
Increasing complexity, Decreasing complexity.]
Figure 2.3: The hierarchical organisation of the nervous system, from the highest level;
behavioural systems (on the left), to the lowest level; genes (on the right)
2.2.3 Neural Hierarchy [7]
Neurobiology uses an intricate hierarchical organisation, with spatial and temporal scales
spanning several orders of magnitude to produce a certain behaviour in a given organism.
This begins at the genetic level, with genes interacting with the environment to deﬁne the
basic protein constitution in the diﬀerent regions of the nervous system. These molecular
components form the various parts of the diﬀerent brain cells and furthermore, the synaptic
organisation is facilitated through the genetic blueprint. The smallest clusters of
interconnected synapses then form local units often referred to as microcircuits [9]. These are
grouped to form dendritic subunits [10] within the dendritic trees of individual neurons. A
single neuron may contain several such dendritic subunits. The next level in the hierarchy
consists of local circuits [11], being groups of interconnected neurons of similar types. Such
neuronal groups are then arranged into larger neuronal networks including interregional
pathways, columns, laminae and topographic maps involving multiple regions in the brain
that mediate speciﬁc types of behaviour. A basic representation of this intricate hierarchy
is illustrated in Fig. 2.3.
2.3 Neural Representation
To provide an insight into neurobiological data representation, this section begins by dis-
cussing how visual information is segregated and transmitted in parallel visual pathways.
This is then extended to the cellular level, in particular with reference to the retina, the
underlying inspiration being the perfect use of resources.
This philosophy is then applied to microelectronics. Much work has already gone into
trying to realise neurobiological systems in silicon technologies [12] [13]; however, the
two-dimensional geometry with limited interconnection capacity has proved no match for
three-dimensional neurobiological wetware [4]. Consequently, a different approach is taken:
to use the general organisation and representation in biology to help optimise our designs
for our available resources.
2.3.1 Neural Visual Streams [2]
Anatomical studies show that neurons in the visual pathway (see Fig. 2.4) are segregated
into several visual streams. The functional role of the visual streams must be inferred from
the anatomical properties along with the way neurons in these separate streams respond to
light stimulation.
Diﬀerent visual streams each have a unique role; encoding speciﬁc features extracted
from the image, then being relayed to diﬀerent parts of the brain and central nervous
system.
The most important information represented by visual pathways is image contrast rather
than absolute light level. Visual contrast is the ratio of localised light level to average
image intensity. To represent the image contrast, neurons in the visual pathway change
sensitivity to compensate for changes in the mean illumination level. This process, called
visual adaptation, allows the biological visual system to represent scenes of extremely high
total dynamic range without compromising ﬁne details.
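
As a worked illustration of the contrast definition above (local light level divided by average image intensity), the following sketch shows why a contrast representation is insensitive to the mean illumination level. It is an assumption-laden example rather than anything taken from this work: the function name and the pixel values are hypothetical, and the average is taken globally over the whole image for simplicity.

```python
def contrast_image(intensities):
    """Divide each pixel by the mean intensity of the whole image.

    intensities: list of rows of non-negative light levels (arbitrary units).
    A global mean is an illustrative simplification.
    """
    flat = [v for row in intensities for v in row]
    mean = sum(flat) / len(flat)
    if mean == 0:
        raise ValueError("mean intensity is zero; contrast is undefined")
    return [[v / mean for v in row] for row in intensities]


dim = [[1.0, 2.0], [3.0, 2.0]]
# The same scene under illumination a thousand times brighter.
bright = [[v * 1000.0 for v in row] for row in dim]

print(contrast_image(dim))
print(contrast_image(bright))
```

Both calls give the same contrast values even though the absolute light levels differ by three orders of magnitude, which is the property that visual adaptation exploits.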
Contrast is supplied via two complementary visual streams up to the primary visual
cortex. One of these represents contrast information varying slowly over space but rapidly
over time, whilst the other varies rapidly over space but slowly over time. These can each
have their own purpose; for example, one stream can relay ﬁne detail for object recognition
tasks, whilst the other provides transient information for saliency and attention processing.
Beyond the primary visual cortex, behavioural and electrophysiological measurements
suggest that image contrast is represented within separate visual streams that each specialise
in coding the information within a certain range of spatial frequencies and orientations. This
multi-resolution representation is qualitatively consistent with measurements of
receptive-field properties in the primary visual cortex. Multi-resolution image representations have
become a standard tool in computational applications, including image compression, seg-
mentation and analysis.
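
The computational use of multi-resolution representations mentioned above can be sketched with a simple image pyramid. This is only an illustrative stand-in: plain 2×2 block averaging is substituted here for the band-pass, orientation-selective coding attributed to the visual streams, and the function names and test image are hypothetical.

```python
def downsample_2x(image):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def build_pyramid(image, levels):
    """Return a list of images from finest to coarsest resolution."""
    pyramid = [image]
    for _ in range(levels - 1):
        current = pyramid[-1]
        if len(current) < 2 or len(current[0]) < 2:
            break  # cannot be reduced any further
        pyramid.append(downsample_2x(current))
    return pyramid


image = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [8, 8, 12, 12],
    [8, 8, 12, 12],
]
for level in build_pyramid(image, levels=3):
    print(level)
```

Each level keeps only the coarser spatial structure of the one before it, so an algorithm can work on the range of spatial frequencies that suits its task.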
[Figure 2.4 labels (top, retina): LIGHT; DATA FLOW; Back of Eye; Retinal Ganglion Cells,
Amacrine Cells, Bipolar Cells, Horizontal Cells, Photoreceptor Cells (Rods), Photoreceptor
Cells (Cones); Optic Fibre Layer (OFL), Ganglion Cell Layer (GCL), Inner Plexiform Layer
(IPL), Inner Nuclear Layer (INL), Outer Plexiform Layer (OPL), Outer Nuclear Layer (ONL).
Labels (bottom, brain): DATA FLOW; Retina, Optic Nerve, Lateral Geniculate Nucleus, V1:
Primary Visual Cortex, V2, V3, V3A, V4v, V5, V7, V8, LO.]
Figure 2.4: The human visual pathway - cellular representation of the retina (top) and
various parts of the brain associated with visual processing (bottom)
[Figure 2.5 panels: Photoreceptor, Horizontal Cell, Bipolar ON-Cell, Bipolar OFF-Cell,
Amacrine Cell, Ganglion ON-Cell, Ganglion OFF-Cell.]
Figure 2.5: Signal representation in the primate retina, illustrating typical electrical
response of various neuron types to spot illumination.
2.3.2 Retinal Data Representation
A common paradigm is the association of neural activity with spiking and action potentials.
Although fundamentally correct, it is commonly overlooked that a great deal of neural
processing is in fact continuous in both time and value. Here, it is not the spike timing
which encodes the data but rather the synaptic biochemistry at the various neural interfaces.
A prime example is the mammalian retina; consisting of over 75 discrete neuron types
classed into ﬁve main groups. Of these, the only group to transmit data as action potentials
are the ganglion cells. The other neuron groups (photoreceptors, horizontal, bipolar and
amacrine cells) encode, process and share data as graded potentials (see Fig. 2.5) Transduc-
tion proteins and ion channels are optimised for sensitivity, speed, gain and noise. At each
interface, the diﬀerent representations serve to achieve optimum performance for a given
function. It is this remarkable organisation that ensures both exceptional performance and
great computational eﬃciency. A good review on retinal circuit optimisation can be found
in reference [14].
2.3.3 Quantised Data/time vs. Continuous Data/time
Information exists in a three-dimensional medium, with data encoded in time, intensity and
space. Various techniques of data representation utilise this space in different ways. Since
spatial content can be included in all electronic representations, i.e. it is not fundamental
to any one of them, it will only be considered in the next subsection on spike-domain coding.
For example, in electronics, analogue circuits represent data as continuous voltages
and currents, varying both in time and intensity. On the other hand, conventional digital
electronics use clocks to synchronise activity; data is therefore represented as discrete
voltages, quantised in time in addition to value.
Sampled-data techniques such as switched-capacitor (SC) [15] and switched-current
(SI) [16] use a clock to sample continuously varying signals and are therefore discrete
in time but continuous in amplitude. Such techniques are widely used in signal processing
of continuous (analogue) signals, for example in implementing filters for oversampling data
converters.
Exploring this two-dimensional space (time and amplitude, see Fig. 2.6) for the encoding of
data, the only remaining representation not yet covered is continuous-time, discrete-data. This
is in fact the principal representation of biology, with spiking neurons conveying no data
in the shape or amplitude of the action potential, but rather in its timing. This encoding
can readily be achieved using asynchronous digital technology, although this is not widely
used in system-level design due to complexity in synthesis.
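As a compact restatement of the classification above, the following Python sketch maps the
two binary choices (continuous or discrete time, continuous or discrete data) onto the
representation names used in the text. The function name classify_representation is a
hypothetical helper introduced here purely for illustration.

```python
def classify_representation(continuous_time: bool, continuous_data: bool) -> str:
    """Return the representation class for a given (time, data) combination.

    The four entries follow the classification described in the text:
    time is either continuous or clock-sampled, data is either continuous
    in amplitude or quantised.
    """
    table = {
        (True, True): "analogue",
        (False, True): "sampled data (switched-capacitor, switched-current)",
        (False, False): "synchronous digital",
        (True, False): "asynchronous digital / spike domain",
    }
    return table[(continuous_time, continuous_data)]


# Enumerate all four quadrants of the time/data plane.
for ct in (True, False):
    for cd in (True, False):
        print(f"continuous_time={ct}, continuous_data={cd}: "
              f"{classify_representation(ct, cd)}")
```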
2.3.4 Spike Domain
The broad category of continuous-time spike coding is often referred to as the spike domain.
It has the property that all processing is event driven, i.e. the data directly triggers
the processing rather than some external stimulus or signal, as is the case in the majority
of integrated systems.
Spike domain coding is often only associated with the frequency of spike occurrence,
referred to as rate coding. However, this alone cannot account for the high temporal
performance achieved in neurobiology, referred to as temporal hyper-acuity. For example,
echo-locating bats have been reported [17] to be able to discriminate echo delays from 10
to 50 nanoseconds. If the minimum interval between spikes is of the order of milliseconds,
some coding scheme other than rate coding must be responsible for this phenomenon.
Figure 2.6: Classification of the various data representation techniques in standard
microelectronic technologies.
[Axes: time (continuous or discrete) against data (continuous or discrete). Analogue:
continuous time, continuous data. Sampled data techniques: discrete time, continuous data.
Asynchronous digital and neurobiology / spike domain: continuous time, discrete data.
Synchronous digital: discrete time, discrete data.]
Several different schemes [18] have been proposed, utilising both temporal and spatial
properties, to interpret how data is actually encoded in biology. The most popular
are listed below:
• Rate coding (temporal): The most popular idea is that the data is represented by the
density (or rate) of spikes (or pulses). Experiments that measured the response of V1
cells in the primary visual cortex [19] [20] found that increasing the contrast input at
the eye had the effect of increasing the rate at which the neurons fired. The data is
decoded by periodically counting the number of spikes in a set sampling window. This is
good for slowly changing stimuli, or where long-distance transmission is required, as
this technique inherently averages out random noise.
• Population average coding (spatial): In this scheme, the data is encoded as a normalised
spatial average. By periodically counting the number of neurons firing within
a short window and normalising to the population size, the population average is
determined. This scheme has good temporal properties and is therefore sensitive to
rapid changes.
• Time of arrival (spatiotemporal): This coding normally conveys data concerning an
external event, with most of the information contained in the first few spikes.
• Phase coding (spatiotemporal): In this scheme, the data is encoded as the phase
difference between different spike trains (or pulse streams).
• Synchrony (spatiotemporal): In this scheme the data is encoded in a pattern of spikes
produced by a set of neurons; for example, the pattern may be one of simultaneous or
correlated firing.
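The first two schemes in the list above can be sketched in a few lines of Python. This is a
minimal illustration, assuming spikes are supplied as lists of time stamps in seconds; the
function names rate_decode and population_decode are hypothetical and not taken from any
of the cited works.

```python
from typing import Sequence


def rate_decode(spike_times: Sequence[float], t_start: float, window: float) -> float:
    """Rate coding: count spikes in [t_start, t_start + window) and divide by the window.

    Returns the mean firing rate in spikes per second over the sampling window.
    """
    count = sum(1 for t in spike_times if t_start <= t < t_start + window)
    return count / window


def population_decode(trains: Sequence[Sequence[float]], t_start: float,
                      window: float) -> float:
    """Population average coding: fraction of neurons firing within a short window.

    Each entry of `trains` is the spike-time list of one neuron; the count of
    active neurons is normalised to the population size.
    """
    active = sum(
        1 for train in trains
        if any(t_start <= t < t_start + window for t in train)
    )
    return active / len(trains)


# Illustrative data only: one neuron observed for 50 ms, and a population of four.
single_neuron = [0.001, 0.012, 0.020, 0.031, 0.045, 0.120]
population = [[0.001], [0.200], [0.003, 0.004], []]

print("rate estimate (spikes/s):", rate_decode(single_neuron, 0.0, 0.05))
print("population average:", population_decode(population, 0.0, 0.01))
```

The rate decoder needs a window long enough to collect several spikes, which is why it suits
slowly changing stimuli, whereas the population decoder can use a much shorter window because
it averages over neurons rather than over time.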
2.4 Hybrid Computation for Improved Computational Efficiency
Having illustrated how biology uses a sparse set of signal representation forms at different
levels, it follows that the same strategy should be applied in microelectronics. This
section starts by comparing different signal representation and processing modalities in
standard CMOS technologies. This is followed by a qualitative comparison of selected linear
and non-linear mathematical operations implemented in different signal and data
representation techniques. Finally, various methods of combining these techniques in a
realistic system-level organisation are outlined.
2.4.1 Analogue versus Digital Signal Processing
A key debate in the low power electronics community is whether analogue or digital signal
processing can be more computationally eﬃcient.
Much work [21] [1] [22] [23] [4] has already gone into this by considering factors such
as signal-to-noise ratio (SNR), power consumption, silicon area, channel utilisation, design
time, etc. The general conclusions are:
• Analogue processing can be far more computationally efficient than digital signal
processing. This is due to the rich mathematical content in the physics of the
devices in comparison to the primitive nature of a digital device (a switch). It follows
that to achieve similar functionality with digital logic, many more devices need to
be used; in fact this can be several orders of magnitude more devices. Moreover, at
high activities this results in significantly higher power consumption. This is because
digital logic dissipates power both through continuous subthreshold "leakage" current
(static power) and during switching (dynamic power), whereas analogue devices only have a
continuous current supply (static power).
• Digital processing is immune to noise and cumulative offsets. The continuous nature of
analogue signals means they cannot be restored at each stage as discrete signals can.
Consequently any noise or circuit-introduced offset accumulates through cascading
and can ultimately deteriorate the signal in complex analogue systems. This reduces
the accuracy and dynamic range of such a system for a given power budget. If device
geometries are increased and more power is dissipated, analogue systems can be made
to perform to higher accuracies; however, the computational efficiency of digital
systems then tends to be superior.
• Quantifying these benefits, it can be shown that the cost (silicon area and power
consumption) of analogue computation is exponential with respect to SNR, whereas
the cost of digital computation is linear. In addition, the starting overhead (at
low SNR) of analogue is low, whereas that of digital is high. This sets a trend where the
benefits of each method can be divided using SNR alone (see Fig. 2.7 [22, 4]). For
lower SNRs, analogue techniques can have an area and power advantage of many orders of
magnitude, whereas for higher precision computation digital techniques have the cost
advantage.
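The crossover described in the last point can be illustrated numerically. The Python sketch
below is only a toy model under stated assumptions: the analogue cost is taken as
proportional to 2^(2b) for b equivalent bits (i.e. proportional to the SNR power ratio), the
digital cost as a fixed overhead plus a term linear in b, and the constants a0, d0 and d1 are
arbitrary values chosen so that the crossover falls within the range of roughly 8 to 12 bits
quoted for Fig. 2.7. None of these numbers come from the cited references.

```python
def analogue_cost(bits: float, a0: float = 1.0) -> float:
    """Assumed analogue cost: low overhead, exponential in equivalent bits."""
    return a0 * 2.0 ** (2.0 * bits)


def digital_cost(bits: float, d0: float = 1.0e6, d1: float = 1.0e4) -> float:
    """Assumed digital cost: high fixed overhead d0, then linear in equivalent bits."""
    return d0 + d1 * bits


def crossover_bits(lo: float = 1.0, hi: float = 18.0, tol: float = 1e-6) -> float:
    """Find where the two cost curves meet by bisection on their difference.

    Assumes analogue is cheaper at `lo` and digital is cheaper at `hi`.
    """
    def diff(b: float) -> float:
        return analogue_cost(b) - digital_cost(b)

    if diff(lo) > 0 or diff(hi) < 0:
        raise ValueError("no crossover bracketed in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0:
            lo = mid   # analogue still cheaper: crossover lies above mid
        else:
            hi = mid   # digital already cheaper: crossover lies below mid
    return 0.5 * (lo + hi)


print(f"crossover at about {crossover_bits():.2f} equivalent bits")
```

Changing d0 moves the crossover only slowly, because the analogue curve is so steep; this is
consistent with the crossover being confined to a fairly narrow band of resolutions.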
These conclusions are the result of deriving mathematical expressions to quantify com-
putational cost that are based on the fundamental limits of each technique. Although these
provide the ultimate theoretical performance of each representation technique, they do not
consider implementation issues, with circuit design and wafer processing being far from
ideal. Following are qualitative comparisons of various microelectronic representations in
performing common computational tasks; implementation issues being considered.
2.4.2 Linear operations
The most common mathematical computations are in fact linear operations. These include
addition, subtraction, multiplication, division and so on. Implementing these computations
in different ways can prove hugely beneficial. For example, to add two currents, only a
single wire is needed (by Kirchhoff's Current Law), whereas an 8-bit digital implementation
would require 8 full-adder stages, comprising a total of at least 228 transistors. Similarly, a
multiplication can be achieved using a Gilbert (translinear) multiplier circuit [24] employing
only 8 transistors. Here the equivalent digital solution would be an 8-bit array multiplier
requiring over 2000 transistors. In these examples silicon area can be saved
using analogue techniques; however, as always in electronic design, the various trade-offs
need to be considered. A qualitative comparison of the most popular techniques used for
linear arithmetic computation is given in Table 2.2.
Figure 2.7: The relative cost of computation using analogue or digital signal processing.
The crossover between analogue and digital having the advantage lies between 50dB and
72dB SNR (8 to 12 bits resolution) depending on application and circuit topology.
[Axes: log(area, power) against signal-to-noise ratio (equivalent bits resolution, 0 to 18).
Curves: analogue, digital. Regions: analogue advantage, digital advantage, with a crossover
region depending on application.]
For these comparisons, sampled-data techniques have been combined with their respective
continuous-time counterparts, as both are based on the same underlying
circuit theory. To substantiate this, Furth et al. [23] have shown that continuous-time and
sampled-data techniques follow similar SNR to power consumption relationships.
2.4.3 Non-linear operations
In most complex processing tasks, the underlying computation tends to be non-linear. This
may comprise an array or bank of linear functions to achieve the overall non-linear
behaviour. A qualitative comparison, as previously presented for linear operations, has
been formulated for selected common non-linear functions and is shown in Table 2.3.
Table 2.2: A qualitative comparison of linear computations implemented using different
signal representation techniques.

Signal representation     Topology                        Silicon area  Power      Accuracy   Noise      Speed      Ref.

Addition, Subtraction, Summation
Current-mode analogue ¹   current addition (KCL)          best          best       good       excellent  good       [25]
Voltage-mode analogue ²   charge domain (switched-cap)    good          good       good       excellent  good       [26]
Spike domain ³            logic OR                        good          good       excellent  good       excellent  [27]
Digital ⁴                 parallel counter, ripple adder  fair          good       excellent  excellent  excellent  [28]

Multiplication, Division
Current-mode analogue     Gilbert multiplier              excellent     excellent  excellent  good       fair       [24]
Voltage-mode analogue     flipped voltage followers       good          good       good       good       fair       [29]
Spike domain              pulse-mode neuron               good          good       poor       good       good       [30]
Digital                   array, tree multiplier          poor          fair       excellent  excellent  excellent  [31]

Scaling
Current-mode analogue     scaled current mirror           excellent     excellent  good       fair       good       -
Voltage-mode analogue     operational amplifier           good          fair       good       good       good       -
Spike domain              shift register counter          good          good       fair       excellent  good       [32]
Digital                   barrel shift and accumulate     fair          good       excellent  excellent  excellent  [33]

¹ Provide maximum resource efficiency (area and power) [4].
² Provide good all-round performance.
³ Provide good noise immunity, robustness and resource efficiency [27].
⁴ Provide highest speed operation and precision [4].
Table 2.3: A qualitative comparison of non-linear computations implemented using different
signal representation techniques.

Signal representation     Topology                        Silicon area  Power      Noise      Accuracy   Speed      Ref.

Comparison, Thresholding ¹
Current-mode analogue     current comparator              excellent     excellent  good       fair       fair       [25]
Voltage-mode analogue     operational amplifier           good          good       good       good       good       -
Spike domain              integrate, logic AND            excellent     good       excellent  fair       excellent  [34]
Digital                   subtractor                      fair          fair       good       excellent  excellent  [33]

Exponential, Logarithm, Square, Root ²
Current-mode analogue     translinear circuits            excellent     excellent  good       good       fair       [35]
Voltage-mode analogue     non-linear V to I               good          fair       good       good       fair       [36]
Spike domain              non-linear neuron               good          good       good       good       good       [34]
Digital                   root/division algorithm         fair          fair       excellent  excellent  good       [37]

Filtering, Integration, Differentiation, Fourier Transform ³
Current-mode analogue     log domain                      good          excellent  good       excellent  excellent  [38]
Voltage-mode analogue     charge domain (switched-cap)    good          excellent  fair       good       good       [26]
Spike domain              lossy integrate & fire neuron   good          good       good       fair       good       [27]
Digital                   IIR/FIR filters, FFT            poor          fair       good       excellent  good       [39]

¹ The direct comparison of continuous signals makes analogue comparators the most easily
implementable, whereas digital comparison techniques are typically implemented using
subtraction-driven combinational logic.
² Analogue realisations are based on translinear techniques or exploitation of non-linear
component response, whereas digital implementations require either ROM-based lookup tables
or synthesis of custom arithmetic-logic-unit (ALU) type hardware.
³ Digital implementation provides better reconfigurability, stability to drift/temperature
and low frequency operation.
2.4.4 Hybrid System Organisation
The ultimate goal of using a hybrid approach is to exploit different representation strategies
throughout a system, ideally concocting a cocktail of circuit topologies to achieve optimum
performance for a given processing task. Unfortunately, we are unable to simply mix and
match circuit blocks to form a complete system. Conversion techniques must be implemented
whenever the signal representation changes; however, these impose power constraints on
the overall system.
Most modern applications typically require both analogue and digital techniques to work
alongside one another as the bare minimum. Since the real world is analogue, any system
requiring a sensor interface requires analogue electronics. On the other hand, as most
control systems and communication protocols are digital, any system requiring external
interface capability requires digital electronics.
This paradigm itself dictates that a minimum of one data converter is required.
Therefore, in order to best utilise resources, it would be best to use this data conversion to
our advantage by making it the main conversion stage within the system. Using the previously
mentioned signal representation techniques, there exist several architectures that fulfil
these criteria (see Fig. 2.8).
2.5 The Technology
For such hybrid processing architectures, CMOS technology is ideally suited. Being inherently
a digital process, it allows discrete data to be represented in all forms, whether
clocked digital, asynchronous or spike domain. Furthermore, through its wide use in
mixed-signal systems, circuit elements well characterised for analogue operation are provided.
This section deals with some important aspects of low power CMOS design. For
continuous-signal (analogue) design, the weak inversion operating region is discussed
with emphasis on reliable simulation, noise modelling and device mismatch. Similarly,
for discrete-signal design, asynchronous digital implementation issues are discussed with
emphasis on delay modelling, trip-point matching and power reduction techniques.
Figure 2.8: Hybrid processing architectures with a single data conversion stage: (a)
conventional analogue front-end with digital processor and output, (b) hybrid analogue/digital
processing platform with digital output, (c) hybrid analogue and sampled data processing
platform with digital output and (d) hybrid analogue/spike-domain processor with
asynchronous digital output.
[Block diagrams not reproduced. Each chain runs from real-world signals through a sensor and
interface to a digital output, with a single converter (analogue to digital in (a) to (c),
analogue to spike in (d)) separating the analogue domain from the digital or spike domain.
Panels (a) to (c) are clocked; panel (d) uses a handshake.]
2.5.1 Weak Inversion Technology
It is widely accepted that for low power analogue design, the MOS transistor is most
efficiently¹ used in the weak inversion region. Furthermore, the exponential gate-source
voltage to drain-current relationship makes translinear circuit theory applicable in CMOS,
as in bipolar technology. This makes the realisation of powerful mathematical operations
extremely cost effective, both in computational efficiency and silicon area.
The basic model for MOS operation in the weak inversion region [40] is given by:
\[ I_{DS} = I_0 \left(\frac{kT}{q}\right)^2 e^{q\kappa V_G / kT} \left( e^{qV_S/kT} - e^{qV_D/kT} \right) + G_D V_{DS} \tag{2.1} \]
where $I_0$ is the pre-exponential constant, $V_G$ is the gate voltage, $V_S$ is the source
voltage, $V_D$ is the drain voltage (all relative to substrate), $\kappa$ is the body-effect
coefficient ($\kappa = \partial\psi_S/\partial V_G$, where $\psi_S$ is the surface potential),
$k$ is Boltzmann's constant, $T$ is temperature, $q$ is the electronic charge and
$G_D = \partial I/\partial V_D$ is the coefficient for the channel length modulation (Early)
effect, which differs from that for strong inversion.
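The efficiency claim for weak inversion can be made concrete. Differentiating the exponential
gate dependence of (2.1) with respect to the gate voltage, and neglecting the Early-effect
term, gives a transconductance-to-current ratio of q*kappa/(kT), independent of the current
level. The Python sketch below evaluates this; the value kappa = 0.7 is an assumed typical
figure used only for illustration, and the function names are hypothetical.

```python
K_BOLTZMANN = 1.380649e-23    # Boltzmann's constant, J/K
Q_ELECTRON = 1.602176634e-19  # electronic charge, C


def thermal_voltage(temp_k: float) -> float:
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def gm_over_i_weak_inversion(kappa: float, temp_k: float = 300.0) -> float:
    """gm/I in weak inversion, in 1/V.

    Follows from the exp(q*kappa*V_G / kT) gate dependence of the drain
    current, with the Early-effect term neglected (an assumption).
    """
    return kappa / thermal_voltage(temp_k)


ratio = gm_over_i_weak_inversion(kappa=0.7)  # kappa = 0.7 is an assumed value
print(f"kT/q at 300 K: {thermal_voltage(300.0) * 1e3:.2f} mV")
print(f"gm/I in weak inversion: {ratio:.1f} per volt")
```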
This section shall continue by discussing some of the issues that can be particularly
detrimental to circuit performance when using devices operating in weak inversion.
Simulation Models
There exist several different MOSFET simulation models. Virtually all are valid in strong
inversion; however, many of these are simply extrapolated to cover weak inversion operation
and therefore provide inaccurate results. A careful review of various simulation models with
emphasis on validity and continuity from strong to weak inversion can be found in reference
[41].
The two most popular models are the BSIM (V3+) and EKV (V2+) families, both
physics-based models that are accurate and continuous throughout all operating regions. The
BSIM model, an empirically adjusted SPICE model, has been adopted (as BSIM3v3) as the
industry standard CMOS simulation model. On the other hand, the EKV model, built
on fundamental charge-based physics, has been designed for and dedicated to low power
circuit simulation. Most modern simulators are compatible with both models, the most
widespread versions being BSIM3v3.2 and EKV2.6. In comparison, the BSIM model
is more complicated, requiring many more parameters than EKV, which has only 18 intrinsic
parameters (see Fig. 2.9). Furthermore, the EKV model provides a single expression for
drain current valid in all modes of operation. For these reasons, the EKV model is often
used as an intuitive tool; it is useful for fast simulation and can be very helpful in
understanding device behaviour.
¹ In weak inversion the gm/I ratio is at a maximum; this ratio is a measure of how efficiently
a transistor uses current to generate transconductance.
Figure 2.9: Evolving complexity of common MOSFET simulation models; the two plots
illustrating the trend for the BSIM and EKV models.
[Axes: number of model parameters (logarithmic, 1 to 1000) against year (1960 to 2010).
Series: including L, W, P scaling; without scaling. Models plotted: LEVEL1, LEVEL2, LEVEL3,
BSIM, BSIM2, BSIM3v1, BSIM3v2, BSIM3v3, BSIM4, BSIM4v4, HSP28, PCIM, MM9, MM11v2, SP,
HiSIM 1.2.0, EKV, EKV2.6 and (early) EKV3.0.]
Noise in Weak Inversion
Ultra-low power weak inversion circuits imply low current and/or voltage levels and are
therefore more susceptible to the effects of noise. Hence a good understanding of noise
in weak inversion MOS transistors is most useful. Traditional independent noise
mechanisms include thermal, shot, flicker, recombination and burst noise [42].
White Noise is the term given to noise with a flat power spectrum, the most common
being thermal or Johnson noise. An expression for this has been derived for subthreshold
MOS transistors by considering the weakly inverted transistor channel to be composed of
a series of resistors [43]. Furthermore, Sarpeshkar et al. have developed a single expression
for white noise by unifying the processes of thermal and shot noise for weak inversion MOS
transistors [44]:
\[ I_{white}^2 \approx 2 q I_{sat} \Delta f \tag{2.2} \]
with $q$ being the electronic charge, $I_{sat}$ being the DC current level and $\Delta f$
being the bandwidth.
Pink Noise describes those noise sources with a power spectrum inversely proportional
to frequency, in particular flicker noise [45]:
\[ I_{flicker}^2 \approx \frac{K I_{sat}^{p} \Delta f}{W L} \, \frac{1}{f} \tag{2.3} \]
with $K$ and $p$ being process dependent constants, $W$ and $L$ being the device width and
length and $f$ being the frequency.
By equating (2.2) and (2.3), it can be concluded that flicker noise dominates for
$f < K I_{sat}^{p-1} / (2 q W L)$ in weak inversion MOS transistors. Furthermore, measured
results of flicker noise in weak inversion transistors [46] [47] tend to suggest that PMOS
devices in general are quieter than NMOS devices. This is because different noise mechanisms
exist for NMOS and PMOS devices. Flicker noise in NMOS devices is thought to follow carrier
density fluctuation, whereas in PMOS devices it is mobility fluctuation and the strong gate
bias that are responsible.
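The frequency below which flicker noise dominates follows from setting the white and flicker
noise expressions equal. The Python sketch below computes this corner frequency per unit
bandwidth. The process constants K and p, the bias current and the device dimensions are
arbitrary placeholder values chosen only to exercise the formulas; real values must be taken
from process characterisation data.

```python
import math

Q_ELECTRON = 1.602176634e-19  # electronic charge, C


def white_noise_psd(i_sat: float) -> float:
    """White noise power per unit bandwidth, 2*q*I_sat (A^2/Hz), as in (2.2)."""
    return 2.0 * Q_ELECTRON * i_sat


def flicker_noise_psd(i_sat: float, f: float, k: float, p: float,
                      w: float, l: float) -> float:
    """Flicker noise power per unit bandwidth, K*I_sat^p / (W*L*f), as in (2.3)."""
    return k * i_sat ** p / (w * l * f)


def corner_frequency(i_sat: float, k: float, p: float, w: float, l: float) -> float:
    """Frequency at which the flicker and white noise powers are equal."""
    return k * i_sat ** (p - 1.0) / (2.0 * Q_ELECTRON * w * l)


# Placeholder values (assumptions): 1 nA bias, 10 um x 1 um device, p = 2, K = 1e-18.
I_SAT, K_PROC, P_EXP, W, L = 1e-9, 1e-18, 2.0, 10e-6, 1e-6

f_c = corner_frequency(I_SAT, K_PROC, P_EXP, W, L)
print(f"corner frequency: {f_c:.1f} Hz")
print("flicker dominates at f_c/10:",
      flicker_noise_psd(I_SAT, f_c / 10, K_PROC, P_EXP, W, L) > white_noise_psd(I_SAT))
```

Note that the corner frequency scales inversely with gate area, so enlarging a device reduces
the band over which flicker noise dominates.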
MOSFET Matching in Weak Inversion
In CMOS fabrication, there exist two types of wafer processing error to consider. Global
variation accounts for the total variation in the absolute value of a component over a wafer
or a batch. On the other hand, local variation reflects the relative variation in a component
value with reference to an adjacent component on the same chip.
Absolute value variations are usually provided by characterising the process at key points,
often referred to as process corners. In contrast, relative value variations are modelled
statistically, often provided through Monte Carlo models using Gaussian spreads to represent
the fluctuating parameters. It is these relative variations, often referred to as device
mismatch, that pose a challenging task to the analogue designer. Furthermore, it is circuits
operating in the weak inversion region [48] that are most susceptible to such errors.
Measurements of weak inversion MOS mismatch [48, 49, 50, 51] have identified three
main factors affecting device mismatch.
• The edge effect causes variations in drain current and is dependent on device position
with respect to surrounding structures. For example, in an array of identical transistors,
those transistors located at the perimeter will exhibit this edge effect, typically
5-15% for NMOS and 20-50% for PMOS. This can be attributed to various factors.
Uneven poly-silicon etch rates at such edge conditions can alter the effective device
size, the etch rate being affected by the etch area. Doping variations resulting
from off-perpendicular ion implantation and diffusion from nearby structures (e.g.
wells) also contribute to this edge effect.
• The striation effect (sometimes classed as quasi-deterministic) manifests itself as a
sinusoidal spatial variation in drain current. The amplitude of this variation is typically
30% of the average drain current and the spatial period varies slowly from 100-300μm.
This is thought to be due to the direction of gas flow in the implantation
chambers during wafer processing.
• Random variations manifest themselves as short distance fluctuations in threshold
voltage and drain current. Such effects are often attributed to phenomena such as
gate oxide non-uniformities (granularity, trapped charge, thickness), uneven dopant
distribution and local effective mobility fluctuations. It is assumed that these physical
properties are independent random variables and that the correlation distance of the
statistical disturbance is small compared to the active device area. These assumptions
lead to characterising the normalised device property distribution with a spatial
zero-mean Gaussian distribution function.
Threshold Mismatch: The standard deviation of short distance transistor threshold
voltage matching is given by:
\[ \sigma(\Delta V_{T0}) = \frac{A_{VT0}}{\sqrt{W L}} \tag{2.4} \]
with $A_{VT0}$ being a device-specific constant and $W L$ being the active device area (width
multiplied by length); therefore the larger the area, the better the matching, due to the
averaging of short distance variations.
The dependence of $A_{VT0}$ on physical properties is given by:
\[ A_{VT0} = \frac{q \, t_{ox} \sqrt{2 N t_{dl}}}{\epsilon_{ox}} \tag{2.5} \]
with $N$ being the concentration of active doping atoms in the depletion layer, $t_{ox}$ and
$t_{dl}$ being the oxide and depletion layer thicknesses respectively and $\epsilon_{ox}$
being the oxide permittivity.
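The two threshold mismatch expressions can be evaluated directly. The Python sketch below
assumes that N is a volume concentration (per cubic metre), which is what makes the units of
the area constant come out as volt-metres, and uses illustrative values for oxide thickness,
doping and depletion depth that are not taken from any particular process.

```python
import math

Q_ELECTRON = 1.602176634e-19  # electronic charge, C
EPS_0 = 8.8541878128e-12      # vacuum permittivity, F/m
EPS_R_SIO2 = 3.9              # relative permittivity of silicon dioxide


def a_vt0(t_ox: float, n_dop: float, t_dl: float,
          eps_ox: float = EPS_R_SIO2 * EPS_0) -> float:
    """Threshold mismatch area constant in V*m: q*t_ox*sqrt(2*N*t_dl)/eps_ox.

    t_ox and t_dl in metres, n_dop in atoms per cubic metre (assumed units).
    """
    return Q_ELECTRON * t_ox * math.sqrt(2.0 * n_dop * t_dl) / eps_ox


def sigma_vt0(a_vt0_vm: float, w: float, l: float) -> float:
    """Standard deviation of threshold mismatch in volts: A_VT0 / sqrt(W*L)."""
    return a_vt0_vm / math.sqrt(w * l)


# Illustrative values (assumptions): 5 nm oxide, 1e23 m^-3 doping, 100 nm depletion depth.
a = a_vt0(t_ox=5e-9, n_dop=1e23, t_dl=100e-9)
print(f"A_VT0: {a * 1e9:.2f} mV.um")   # 1 V*m equals 1e9 mV*um
for side in (1e-6, 2e-6, 4e-6):
    print(f"W = L = {side * 1e6:.0f} um: sigma = {sigma_vt0(a, side, side) * 1e3:.2f} mV")
```

Doubling both width and length quadruples the area and therefore halves the standard
deviation, which is the averaging effect described in the text.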
Current Mismatch: The device current mismatch is dependent on the threshold
voltage mismatch in addition to the current factor mismatch. A common expression [52]
for drain current mismatch is given by:
\[ \sigma\left(\frac{\Delta I_D}{I_D}\right) = \sqrt{\frac{4\sigma^2(V_{T0})}{(V_{GS} - V_{T0})^2} + \frac{\sigma^2(\beta)}{\beta^2}} \tag{2.6} \]
with $(V_{GS} - V_{T0})$ being the overdrive voltage and $\beta$ being the current factor.
This expression is normally reduced to a basic area dependence, as in the case of threshold
mismatch:
\[ \sigma\left(\frac{\Delta I_D}{I_D}\right) = \frac{A_{IDx}}{\sqrt{W L}} \tag{2.7} \]
with $A_{IDx}$ being a device-specific constant, normally quoted in foundry documentation for
various values of overdrive voltage.
The current mismatch therefore increases with higher transconductance for a given current,
because the threshold term in (2.6) is weighted by $2/(V_{GS} - V_{T0})$, which is the
$g_m/I_D$ ratio in strong inversion. This has a devastating effect on weak inversion
circuits, which are known for having maximum transconductance
for a given current. At the circuit level, it is therefore advantageous to bias any transistors
operating in weak inversion at a constant $I_{DS}$ rather than a constant $V_{GS}$. Biasing
at a constant current and thinking in terms of current-domain signals is the essence of the
current-mode approach in circuit design.
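The sensitivity of current mismatch to transconductance can be illustrated numerically. The
Python sketch below evaluates the strong inversion expression and a generalised form in which
the factor 2/(VGS - VT0) is replaced by an arbitrary gm/ID ratio. The generalisation, the
weak inversion value of about 27 per volt (kappa of 0.7 at 300 K) and the mismatch figures
are all assumptions made for illustration only.

```python
import math


def sigma_id_strong(sigma_vt: float, v_ov: float, sigma_beta_rel: float) -> float:
    """Relative drain current mismatch in strong inversion, as in (2.6).

    sigma_vt: threshold mismatch (V); v_ov: overdrive VGS - VT0 (V);
    sigma_beta_rel: relative current factor mismatch, sigma(beta)/beta.
    """
    return math.sqrt(4.0 * sigma_vt ** 2 / v_ov ** 2 + sigma_beta_rel ** 2)


def sigma_id_from_gm(sigma_vt: float, gm_over_id: float, sigma_beta_rel: float) -> float:
    """Same expression with 2/(VGS - VT0) replaced by a general gm/ID (1/V)."""
    return math.sqrt((gm_over_id * sigma_vt) ** 2 + sigma_beta_rel ** 2)


SIGMA_VT = 5e-3     # assumed threshold mismatch, 5 mV
SIGMA_BETA = 0.01   # assumed current factor mismatch, 1 %

strong = sigma_id_strong(SIGMA_VT, v_ov=0.5, sigma_beta_rel=SIGMA_BETA)
weak = sigma_id_from_gm(SIGMA_VT, gm_over_id=27.0, sigma_beta_rel=SIGMA_BETA)
print(f"strong inversion (0.5 V overdrive): {strong * 100:.1f} %")
print(f"weak inversion (gm/ID = 27 per volt): {weak * 100:.1f} %")
```

With these placeholder numbers the same threshold mismatch produces a several times larger
current error in weak inversion, which is why constant-current biasing is preferred there.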
Mismatch Reduction Techniques: Generally these sources of mismatch can be
greatly reduced through careful layout techniques. By using symmetric, regular device
placements together with use of dummy devices, the edge eﬀect can be completely elimi-
nated. Long-range gradients and striation eﬀects can be reduced using common-centroid
layout strategies. Furthermore, random variations can be reduced through increased device
sizing. A comprehensive review on careful layout techniques for improved matching is given
in reference [53].
Technology Scaling generally has a beneficial effect on device mismatch. With decreasing
feature sizes the gate oxide thickness decreases and the oxide quality
improves. As the threshold voltage mismatch is directly proportional to the gate oxide
thickness (see Eqn. 2.5), threshold matching therefore improves. On the other hand, the
current factor mismatch remains more or less constant, hence any improvement in current
mismatch is due to better threshold voltage matching.
By collating statistical mismatch data (confidential) over a wide variety of CMOS
technologies, the dependence of the threshold voltage area constant (mismatch) on minimum
technology feature size is established. This trend is illustrated for NMOS and PMOS devices
separately in Fig. 2.10. An interesting observation is that although traditionally NMOS
devices match better than PMOS, in deep submicron technologies this becomes reversed.
This is explained not by PMOS matching improving for deep submicron technologies
but rather by NMOS matching deteriorating. This is due to the increased doping
levels required to maintain acceptable depletion widths in deep submicron. For feature sizes
larger than 0.25μm it was sufficient to produce NMOS devices using a relatively low doped
p-substrate. However, for sub-250nm technologies it is required to increase basic substrate
doping by placing NMOS devices in p-wells, similarly to the way PMOS devices have been
traditionally placed in n-wells.
2.5.2 Asynchronous Technology
The clocked synchronous paradigm is currently the dominant design methodology for digital
systems. While this approach has proved hugely successful over many decades, limitations
and drawbacks do exist. For example, distributing high-speed clocks with precision over
a large chip area is complex, and the clock tree itself dissipates a significant
proportion of the total power consumption. Also, clocking an entire system does not make
the most efficient use of resources, since circuits not contributing to a particular process
still burden the power supply. Furthermore, in mixed-signal systems, digital clocks often
couple into sensitive analogue circuitry, resulting in degraded performance. These problems
become more severe as device sizes continue to shrink and as clock frequencies continue to
rise.
Figure 2.10: Effect of CMOS technology scaling on threshold voltage mismatch.
[Axes: threshold voltage area dependence AVT0 (mVμm) against technology minimum feature
size (μm, 0 to 1.0). Series: NMOS device, PMOS device.]
Asynchronous design offers an alternative to the clocked system methodology. This
overcomes the above limitations by dispensing with the clock and using self-timed signalling
to control the sequencing of computations in the system. Such circuits have the potential
for very low power consumption, as only the parts of the circuit that are utilised at any
time have switching activity and thus consume power.
In order to utilise such techniques effectively, a good understanding of transistor-level
logic implementation is required. This section discusses some key issues in logic design,
particularly focusing on power dissipation and delay modelling.
Power Consumption
CMOS logic is typically classed as static logic, ideally consuming no power for no activity.
In reality however, power dissipation can be attributed to three main sources [33]. These
are:
• Static Dissipation refers to all the sources that draw constant current from the power
supply. In static CMOS logic, this is due to reverse bias leakage between diffusion
regions and the substrate in addition to subthreshold conduction.
$$P_{\text{static}} = (I_{rb} + I_{sub}) V_{DD} \qquad (2.8)$$
with $I_{rb}$ being the reverse bias leakage, $I_{sub}$ being the subthreshold (diffusion) current
and $V_{DD}$ being the supply voltage.
• Dynamic Dissipation refers to power used that is directly related to activity, i.e. the
more activity, the more dynamic power. This is due to the charging and discharging
of load capacitances during switching. The expression for dynamic switching power
is given by:
$$P_{\text{dynamic}} = \frac{C_L V_{DD}^2}{t_p} \qquad (2.9)$$
with CL being the load capacitance and tp being the minimum switching interval.
• Short-Circuit Dissipation is a form of dynamic dissipation; this occurs during transitions
when both NMOS and PMOS devices are conducting and in saturation and
a direct route exists for current to flow between the power rails. The expression for
short-circuit dissipation is given by [33]:
$$P_{\text{short-circuit}} = \frac{\beta}{12} (V_{DD} - 2V_T)^3 \frac{t_{edge}}{t_p} \qquad (2.10)$$
with $\beta$ being the transconductance coefficient, $V_T$ being the threshold voltage and $t_{edge}$
being the edge rise or fall time (assuming that the rising and falling edges are the
same).
The total power dissipation can therefore be determined by considering the sum of
equations (2.8), (2.9) and (2.10). However, in complex circuit design, it is impractical
to evaluate the above at each individual logic node. Since for any well designed digital
logic, the dynamic dissipation will be the predominant factor, a simple approximation can
be made. By lumping together all the capacitance driven by gate outputs, the following
expression is formed:
$$P_{\text{approx}} = \frac{\delta C_{total} V_{DD}^2}{t_p} \qquad (2.11)$$
with δ being the percentage activity and Ctotal being the total load capacitance.
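Equations (2.8) to (2.11) can be evaluated numerically to see how the three contributions compare. The following is a minimal sketch only: the function names and every device value (supply, threshold, capacitance, currents, timings) are hypothetical and chosen for illustration, and the activity δ is taken as a fraction between 0 and 1 rather than a percentage.

```python
def static_power(i_rb, i_sub, v_dd):
    """Equation (2.8): constant leakage and subthreshold current times supply voltage."""
    return (i_rb + i_sub) * v_dd


def dynamic_power(c_load, v_dd, t_p):
    """Equation (2.9): C_L * V_DD^2 drawn from the supply once per switching interval t_p."""
    return c_load * v_dd ** 2 / t_p


def short_circuit_power(beta, v_dd, v_t, t_edge, t_p):
    """Equation (2.10): both devices conduct for roughly t_edge out of every t_p."""
    return beta / 12.0 * (v_dd - 2.0 * v_t) ** 3 * t_edge / t_p


def approx_power(activity, c_total, v_dd, t_p):
    """Equation (2.11): lumped estimate; activity is a fraction between 0 and 1."""
    return activity * c_total * v_dd ** 2 / t_p


# Hypothetical operating point, chosen only for illustration.
V_DD, V_T = 3.3, 0.7           # volts
C_L, T_P = 50e-15, 100e-9      # farads, seconds
p_static = static_power(1e-12, 10e-12, V_DD)
p_dynamic = dynamic_power(C_L, V_DD, T_P)
p_short = short_circuit_power(100e-6, V_DD, V_T, 1e-9, T_P)
print(f"static {p_static:.3e} W, dynamic {p_dynamic:.3e} W, short-circuit {p_short:.3e} W")
```

At this assumed operating point the dynamic term is the largest of the three, which is the condition under which the lumped approximation (2.11) is intended to be used. Because of the square-law dependence, halving the supply voltage reduces the dynamic term by a factor of four.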
Delay Modelling
In asynchronous circuit design, a crucial tool in creating reliable self-timed circuits is delay
modelling. As no clock exists to synchronise and condition signals, the focus is on building
well balanced circuits to provide glitch-free operation. In order to achieve this, a strong
understanding of switching characteristics is paramount.
Inverter Timing Analysis
The switching speed of any CMOS gate is limited by the time taken to charge and dis-
charge the load capacitance. In order to establish this, it is useful to analyse the constituent
timings within a simple inverter.
The fall-time of a CMOS inverter is dictated by the NMOS switching characteristic.
This is deﬁned as the time taken for a waveform to fall from 90% to 10% of its steady-state
value. The analytical expression [33] is given by:
$$t_f = \frac{2C_L}{\beta_n V_{DD}(1-n)} \left[ \frac{(n-0.1)}{(1-n)} + \frac{\ln(19-20n)}{2} \right] \approx \frac{k C_L}{\beta_n V_{DD}} \qquad (2.12)$$
with βn being the transconductance coeﬃcient for a NMOS device, n being the NMOS
turn-ON point (n = vtn/VDD with vtn being the NMOS threshold voltage) and k being the
switching strength parameter; lumping together all the n terms. The assumption is then
made that the k parameter is the same for NMOS and PMOS devices.
Similarly, the rise-time of a CMOS inverter is dictated by the PMOS switching charac-
teristic. This is deﬁned as the time taken for a waveform to rise from 10% to 90% of its
steady-state value. The analytical expression [33] is given by:
$$t_r = \frac{2C_L}{\beta_p V_{DD}(1-p)} \left[ \frac{(p-0.1)}{(1-p)} + \frac{\ln(19-20p)}{2} \right] \approx \frac{k C_L}{\beta_p V_{DD}} \qquad (2.13)$$
with $\beta_p$ being the transconductance coefficient for a PMOS device, $p$ being the PMOS
turn-ON point ($p = v_{tp}/V_{DD}$ with $v_{tp}$ being the PMOS threshold voltage) and $k$ being the
switching strength constant.
Thus for equally sized N- and P-MOS transistors, with βn = 2βp (due to diﬀerent carrier
mobilities,) the rise and fall times are scaled proportionally, i.e. tf = tr/2. For well balanced
inverters it is therefore necessary to design the PMOS with increased aspect ratio.
The delay-time is then deﬁned as the time diﬀerence between input transition (50%)
and the 50% output level. This can be approximated as half the rise or fall time, depending
on whether the transition is rising or falling.
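The timing expressions (2.12) and (2.13) share the same form, so a single function can serve both. The sketch below is illustrative only: the function names and device values are hypothetical, the threshold magnitude is assumed equal for both device types, and the delay is taken as half the edge time as stated above.

```python
import math


def switching_constant(turn_on_point):
    """The lumped constant k from (2.12)/(2.13) for a turn-ON point n (or p)."""
    n = turn_on_point
    if not 0.0 <= n < 0.95:
        raise ValueError("turn-ON point must satisfy 0 <= n < 0.95")
    bracket = (n - 0.1) / (1.0 - n) + 0.5 * math.log(19.0 - 20.0 * n)
    return 2.0 / (1.0 - n) * bracket


def switching_time(c_load, beta, v_dd, v_threshold):
    """Fall time (NMOS beta) or rise time (PMOS beta) of an inverter output."""
    k = switching_constant(abs(v_threshold) / v_dd)
    return k * c_load / (beta * v_dd)


def propagation_delay(edge_time):
    """Delay approximated as half the relevant rise or fall time."""
    return edge_time / 2.0


# Hypothetical equally sized devices with beta_n = 2 * beta_p.
C_L, V_DD, V_TH = 50e-15, 3.3, 0.66
t_f = switching_time(C_L, 100e-6, V_DD, V_TH)
t_r = switching_time(C_L, 50e-6, V_DD, V_TH)
print(f"k = {switching_constant(V_TH / V_DD):.3f}, t_f = {t_f:.3e} s, t_r = {t_r:.3e} s")
```

With these assumed values the rise time comes out at twice the fall time, which is the imbalance that widening the PMOS device is meant to correct.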
Gate Delays
CMOS logic delays are often approximated by reducing a gate to an equivalent inverter.
For example, a 3-input NOR gate would reduce to three series PMOS devices for determining
the rise time and a single NMOS device for determining the maximum fall time.
In general, the fall time is $a\,t_f$ for a stack of $a$ NMOS transistors in series and the rise time
is $b\,t_r$ for a stack of $b$ PMOS transistors in series, where $t_f$ and $t_r$ are the single-device
fall and rise times. Similarly, the minimum fall time is $t_f/x$ for $x$
NMOS transistors in parallel and the minimum rise time is $t_r/y$ for $y$ PMOS transistors in
parallel. The maximum fall and rise times for parallel devices are obtained when only a
single device is contributing to the logic action. Such timings are invaluable for designing
circuits that have critical-delay paths, to avoid glitches and hazards. For other methods of
determining delay approximations see references [54] and [55].
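The series and parallel scaling rules can be collected into a small helper. This is a sketch under the equivalent-inverter approximation described above; the function names are hypothetical, and the NAND case is included only as the complementary arrangement implied by the same rules.

```python
def nor_gate_times(t_f, t_r, inputs):
    """NOR gate: PMOS devices in series (pull-up), NMOS devices in parallel (pull-down)."""
    return {
        "rise": inputs * t_r,       # series PMOS stack
        "fall_max": t_f,            # only one NMOS device conducting
        "fall_min": t_f / inputs,   # all NMOS devices conducting together
    }


def nand_gate_times(t_f, t_r, inputs):
    """NAND gate: NMOS devices in series (pull-down), PMOS devices in parallel (pull-up)."""
    return {
        "fall": inputs * t_f,       # series NMOS stack
        "rise_max": t_r,            # only one PMOS device conducting
        "rise_min": t_r / inputs,   # all PMOS devices conducting together
    }


# Single-device times in arbitrary units, with t_r = 2 * t_f as for equally sized devices.
nor3 = nor_gate_times(t_f=1.0, t_r=2.0, inputs=3)
nand2 = nand_gate_times(t_f=1.0, t_r=2.0, inputs=2)
print(nor3)
print(nand2)
```

For the 3-input NOR example, the rise time is three times the single-device rise time, while the fall time lies between one third of and the full single-device fall time depending on how many inputs are active.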
2.6 Summary
In this chapter neurobiology has been reviewed from an engineering perspective, examining
organisational and representation techniques. Extrapolating to microelectronics, these
design principles have been used to determine methods suitable for realising electronic
systems based on biology, the motivation being that this will lead to the development of effective,
efficient and robust perceptive systems. It has been established that hybrid structures are
a requisite in modern sensor processing applications and that representation-based system-level
design techniques can assist in improving computational efficiency. Weak inversion analogue
and asynchronous digital electronics have been identified as the most suitable techniques
for biologically-inspired circuit design. Specific implementation issues have been discussed
and analysed with a view to a low power and robust design methodology. For weak inversion operation,
noise and matching analysis suggest that PMOS devices perform better than NMOS in deep
submicron technologies. Furthermore, an understanding of device mismatch provides
the know-how for designing reliably manufacturable subthreshold circuits. For asynchronous
logic design, delay modelling and power dissipation analysis provide an intuitive design
methodology for realising reliable self-timed and well-balanced circuits with minimal power
losses.
References
[1] C. A. Mead, Analog VLSI and Neural Systems. Addison-Wesley, 1989.
[2] B. A. Wandell, Foundations of Vision. Sinauer Associates, 1995.
[3] J. G. Nicholls, A. R. Martin and B. G. Wallace, From Neuron to Brain, ch. Analysis
of signals in the nervous system. Sinauer Associates, 1992.
[4] R. Sarpeshkar, Eﬃcient Precise Computation with Noisy Components: Extrapolation
from an Electronic Cochlea to the Brain. PhD thesis, California Institute of Technology,
Pasadena, California, 1997.
[5] W. Gerstner, W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations,
Plasticity. Cambridge University Press, 2002.
[6] L. W. Swanson, Brain Architecture: Understanding the Basic Plan. Oxford University
Press, 2002.
[7] G. M. Shepherd (ed.), The Synaptic Organization of the Brain. Oxford University
Press, 1998.
[8] J. G. Nicholls, A. R. Martin and B. G. Wallace, From Neuron to Brain. Sinauer
Associates, 1992.
[9] G. M. Shepherd, “Microcircuits in the nervous system,” Scientiﬁc American, vol. 238,
no. 2, pp. 93–103, 1978.
[10] C. Koch, T. Poggio and V. Torre, “Nonlinear interactions in a dendritic tree: Localization,
timing and the role of Information Processing,” Proceedings of the National
Academy of Sciences USA, vol. 80, no. 9, pp. 2799–2802, 1983.
[11] P. Rakic, “Local Circuit Neurons,” Neuroscience Research Program Bulletin, vol. 13,
no. 3, pp. 295–416, 1975.
[12] K. A. Boahen, “A Retinomorphic Chip with Parallel Pathways: Encoding ON, OFF,
INCREASING, and DECREASING Visual Signals,” Kluwer Analog Integrated Circuits
and Signal Processing, vol. 30, no. 2, pp. 121–135, 2002.
[13] K. A. Zaghloul and K. A. Boahen, “Optic Nerve Signals in a Neuromorphic Chip
I: Outer and Inner Retina Models,” IEEE Transactions on Biomedical Engineering,
vol. 51, no. 4, pp. 657–666, 2004.
[14] P. Sterling, “How retinal circuits optimize the transfer of visual information,” The
Visual Neurosciences, by L. M. Chalupa and J. S. Werner, eds., pp. 234–259, 2004.
[15] R. Gregorian and G. C. Temes, Analog MOS Integrated Circuits for Signal Processing.
John Wiley & Sons, 1986.
[16] C. Toumazou, J. B. C. Hughes and N. C. Battersby, eds., Switched-currents: An Ana-
logue Technique for Digital Technology. IEE Publishing, 1993.
[17] C. F. Moss and J. A. Simmons, “Acoustic image representation of a point target in the
bat Eptesicus fuscus: Evidence for sensitivity to echo phase in bat sonar,” Journal of
the Acoustical Society of America, vol. 93, pp. 1553–1562, 1993.
[18] P. A. Cariani, “Temporal coding of sensory information in the brain,” Acoustical Science
and Technology: The Acoustical Society of Japan, vol. 22, no. 2, pp. 77–84, 2001.
[19] D. H. Hubel, “Receptive Fields of Single Neurons in the Cat’s Striate Cortex,” Journal
of Physiology (London), vol. 148, pp. 574–591, 1959.
[20] V. B. Mountcastle, “Modality and Topographic Properties of Single Neurons of Cat’s
Somatosensory Cortex,” Journal of Neurophysiology, vol. 20, pp. 408–434, 1957.
[21] B. J. Hosticka, “Performance Comparison of Analog and Digital Circuits,” Proceedings
of the IEEE, vol. 73, no. 1, pp. 25–29, 1985.
[22] E. A. Vittoz, “Future of analog in the VLSI environment,” Proceedings of the IEEE
International Symposium on Circuits and Systems, vol. 2, pp. 1372–1375, 1990.
[23] P. M. Furth and A. G. Andreou, “Bit-energy comparison of discrete and continuous
signal representations at the circuit level,” 4th Workshop on Physics and Computation,
1996.
[24] B. Gilbert, “A precise four-quadrant multiplier with subnanosecond response,” IEEE
Journal of Solid-State Circuits, vol. SC-3, p. 365, 1968.
[25] C. Toumazou, F. J. Lidgey and D. G. Haigh, Analog IC Design: The Current-Mode
Approach. London: Peter Peregrinus, 1990.
[26] P. E. Allen, E. S. Sinencio and E. Sanchez-Sinencio, Switched Capacitor Circuits.
Kluwer Academic Publishers, 1984.
[27] A. F. Murray, D. Del Corso and L. Tarassenko, “Pulse-stream VLSI neural networks
mixing analog and digital techniques,” IEEE Transactions on Neural Networks, vol. 2,
no. 2, pp. 193–204, 1991.
[28] D. J. Kinniment, J. D. Garside and B. Gao, “A comparison of power consumption in
some CMOS adder circuits,” Proceedings of PATMOS, C. Piguet and W. Nebel, eds.,
pp. 119–132, 1995.
[29] J. Ramirez-Angulo, S. Thoutam, A. Lopez-Martin, R. J. Carvajal, “Low-voltage CMOS
analog four quadrant multiplier based on ﬂipped voltage followers,” Proceedings of
IEEE International Symposium on Circuits and Systems, vol. 1, pp. 681–684, 2004.
[30] H. Hikawa, “A new digital pulse-mode neuron with adjustable activation function,”
IEEE Transactions on Neural Networks, vol. 14, no. 1, pp. 236–242, 2003.
[31] J. H. Satyanarayana and K. K. Parhi, “A theoretical approach to estimation of bounds
on power consumption in digital multipliers,” IEEE Transactions on Circuits and Sys-
tems II: Analog and Digital Signal Processing, vol. 44, no. 6, pp. 473–481, 1997.
[32] H. Hikawa, “Frequency-based multilayer neural network with on-chip learning and
enhanced neuron characteristics,” IEEE Transactions on Neural Networks, vol. 10,
no. 3, pp. 545–553, 1999.
[33] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems
Perspective. Addison-Wesley, 1993.
[34] A. F. Murray, “Pulse arithmetic in VLSI neural networks,” IEEE Micro, vol. 9, no. 6,
pp. 64–74, 1989.
[35] B. Gilbert, Analog IC Design: The Current-Mode Approach, ch. Current-mode circuits
from a translinear viewpoint: A tutorial. Peter Peregrinus Ltd., 1990.
[36] D. Quoc-Hoang Duong, N. Trung-Kien Nguyen and L. Sang-Gug, “Ultra low-voltage
low-power exponential voltage-mode circuit with tunable output range,” Proceedings
of the IEEE International Symposium on Circuits and Systems, vol. 2, pp. 729–732,
2004.
[37] M. D. Ercegovac and T. Lang, Division and Square Root: Digit-Recurrence Algorithms
and Implementations. Kluwer Academic Publisher, 1994.
[38] E. M. Drakakis, A. J. Payne and C. Toumazou, “Log-domain ﬁltering and the Bernoulli
cell,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Appli-
cations, vol. 46, no. 5, pp. 559–571, 1999.
[39] J. Proakis, D. Manolakis and D. G. Manolakis, Digital Signal Processing: Principles,
Algorithms and Applications. Pearson US Imports & PHIPEs, 1995.
[40] M. D. Godfrey, “CMOS device modeling for subthreshold circuits,” IEEE Transac-
tions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 39, no. 8,
pp. 532–539, 1992.
[41] J. Georgiou, Micropower Electronics for Neural Prosthetics. PhD thesis, Imperial Col-
lege of Science, Technology and Medicine, University of London, 2002.
[42] W. M. Leach, “Fundamentals of low-noise analog circuit design,” Proceedings of the
IEEE, vol. 82, pp. 1515–1538, 1994.
[43] C. Enz, High Precision CMOS Micropower Ampliﬁers. PhD thesis, Ecole Polytechnique
Federale de Lausanne, Switzerland, 1990.
[44] R. Sarpeshkar, T. Delbruck, and C. A. Mead, “White noise in MOS transistors and
resistors,” IEEE Circuits and Devices Magazine, vol. 9, no. 6, pp. 23–29, 1993.
[45] C. Schutte and P. Rademeyer, “Subthreshold 1/f noise measurements in MOS transis-
tors aimed at optimizing focal plane array signal processing,” Kluwer Analog Integrated
Circuits and Signal Processing, vol. 2, pp. 171–177, 1992.
[46] T. Delbruck, Investigations of visual transduction and motion processing. PhD thesis,
California Institute of Technology, Pasadena, California, 1993.
[47] J. Chang, A. A. Abidi and C. R. Viswanathan, “Flicker noise in CMOS transistors
from subthreshold to strong inversion at various temperatures,” IEEE Transactions on
Electron Devices, vol. 41, no. 11, pp. 1965–1971, 1994.
[48] A. Pavasovic, Subthreshold region MOSFET mismatch analysis and modeling for analog
VLSI systems. PhD thesis, Johns Hopkins University, Baltimore MD, 1990.
[49] A. Pavasovic, A. G. Andreou, and C. R. Westgate, “Characterization of Subthreshold
MOS Mismatch in Transistors for VLSI Systems,” Kluwer Journal of VLSI Signal
Processing, vol. 8, pp. 75–85, 1994.
[50] F. Forti and M. E. Wright, “Measurement of MOS current mismatch in the weak
inversion region,” IEEE Journal on Solid State Circuits, vol. 29, no. 2, pp. 138–142,
1994.
[51] N. Kumar, P. O. Pouliquen and A. G. Andreou, “Device mismatch limitations on per-
formance of a Hamming distance classiﬁer,” IEEE International Workshop on Defect
and Fault Tolerance in VLSI Systems, pp. 327–334, 1993.
[52] M. Pelgrom, A. Duinmaijer and A. Welbers, “Matching Properties of MOS transistors,”
IEEE Journal on Solid State Circuits, vol. 24, pp. 1433–1440, 1989.
[53] R. A. Hastings, The Art of Analog Layout. Prentice Hall, 2000.
[54] T. Sakurai and A. R. Newton, “Delay analysis of series-connected MOSFET circuits,”
IEEE Journal of Solid State Circuits, vol. 26, no. 2, pp. 122–131, 1991.
[55] S. Dhar and M. A. Franklin, “Optimum buﬀer circuits for driving long uniform lines,”
IEEE Journal of Solid State Circuits, vol. 26, no. 1, pp. 32–40, 1991.
Chapter 3
Modern Vision Processing
Technology
3.1 Introduction
Computer and machine vision are disciplines that have matured substantially in the past two
decades. Computer vision consists of image processing (image in, image out), image analysis
(image in, measurement out) and image understanding (image in, high-level description
out). Machine vision is the application of computer vision to enable a system to detect or
extract and subsequently act upon a certain visual feature in a control or analysis task.
Machine vision technology is becoming an increasingly important and in some cases in-
dispensable tool, for example in manufacturing automation. Applications appear in many
disciplines and industries, including: semiconductor, electronics, pharmaceuticals, packag-
ing, medical devices, biomedical, space, automotive, security, surveillance and consumer
goods. Machine vision systems oﬀer a non-contact means of inspecting and identifying
parts, accurately measuring dimensions, or guiding robots, instruments or other machines
during positioning, navigation or stabilising operations.
Although such techniques continue to enjoy much success, effective realisation of perceptive
vision processing is a function that continues to elude us. Visual perception is a
high level process often associated with the human visual system. This is because biological
vision is undoubtedly superior to, and unrivalled by, any synthetic artifact in perceptive
processing. For example, through our visual perception we immediately perceive a room as
containing windows, furniture and objects without delaying to process or analyse the image.
Such neurobiological visual perception is facilitated by filtering, extracting and converting
the incident image through a chain of sub-processes of increasing abstraction and complexity,
starting at the ocular optics and retina. The retina preconditions, amplifies, compresses and
extracts parallel visual streams from the incident image. This set of neural images conveys
various features via the Lateral Geniculate Nucleus (LGN) to the primary visual cortex for
more specific feature extraction (e.g. orientation selectivity). As information is subsequently
conveyed to higher neural layers, the amount (or quantity) of data reduces, whereas the
perceptive level (or quality) increases.
Conventional computer vision processing systems are designed to operate under well-controlled
conditions, for example with uniform lighting and well-defined targets. Visual
perception functions, however, are not influenced by such factors. For example, an image
recognition task should not be affected by lighting, orientation or size. It would therefore
be useful to understand and use the underlying biological principles to produce effective
and robust perceptive vision processors.
This chapter begins by reviewing standard silicon process technologies suitable for imaging,
image processing and vision processing. Emphasis is given to identifying a technology
suitable for implementing biologically inspired processing by integrating phototransduction
devices with electronics. Sequential and distributed processing architectures are then examined,
identifying their merits and limitations. Finally, existing techniques (both software and
distributed hardware) for centre-of-mass processing and target tracking are reviewed and
discussed, and vision chips developed in this field are compared. This task (centroid processing)
is considered a high-level task, for it requires image conditioning and pre-processing,
object segmentation and beyond, and is therefore classed as a perceptive vision process.
3.2 Imager Technology
George Smith and Willard Boyle invented the Charge-Coupled Device (CCD) [1] at Bell
Labs in the late 1960s. They were attempting to create a new kind of semiconductor
memory for computers. By 1970, the Bell Labs researchers had built the CCD into the
world’s first solid state video camera. Until the invention of the CMOS Active Pixel
Sensor (APS) by Eric Fossum in the early 1990s, solid-state imaging systems relied
on CCD technology for the image sensor component.
For the past decade, the CMOS APS has attempted to address some of the weaknesses
of CCD technology. CCDs rely on perfect charge transfer, which makes the technology
intrinsically radiation “soft” in comparison to CMOS and therefore less straightforward to
use at low temperatures. Furthermore, the difficulty of integrating on-chip electronics
makes high frame-rate imagers and low power operation extremely challenging for CCD
technology. Also, since a CCD imager requires a specialist process whereas standard CMOS
technology is by far the most common and highest yielding process, it follows that CMOS
imagers would boast a significant cost advantage.
This section is dedicated to these two solid-state imager technologies. By outlining the
principles of operation and comparing state-of-the-art CCD and CMOS image sensors, the
strengths and weaknesses of each method are established and stated.
3.2.1 CCD Imagers
CCD technology uses the inherent photoresponsiveness of silicon to generate electron-hole
pairs on photon absorption. Impurity implants patterned into the silicon together with a
suitable voltage bias conﬁne the photo-generated electrons to discrete packets. These charge
packets are then transferred to the conversion hardware using various strategies.
There typically exist three main architectures for CCD-based imagers: full-frame, frame
transfer and interline. Other techniques (not discussed) include Frame-Interline Transfer,
Accordion, Charge Injection and MOS XY Addressable.
• Full-Frame (FF): These typically have the simplest architecture and are therefore
the easiest to design, fabricate and operate. In these, the total area of the CCD
(100% fill factor) is available for detecting incoming photons during the exposure
period. During the readout phase, charge is shifted sequentially across the array,
therefore necessitating a mechanical shutter to maintain image integrity (i.e. to prevent
smearing). This architecture consists of a parallel CCD shift register, a serial
CCD shift register and a charge conversion output amplifier. The simplicity of the
FF design yields CCD imagers with the highest resolutions and densities.
• Frame Transfer (FT): This architecture is very similar to the FF technique, differing
in that it uses two CCD arrays; one photosensitive and one used
exclusively for storage. The idea is to quickly shift the captured scene from the photosensitive
region to the storage array. Readout off-chip from the storage register is then
performed similarly to the FF device, whilst the next image is being formed on the
sensing array. The advantage is that continuous (shutterless) operation is achievable,
resulting in faster frame rates. This is however at the expense of image quality (due
to smearing), reduced resolution and higher cost.
• Interline (IL): These have been devised to address the shortcomings of FT devices.
This is achieved by separating the photo-detecting and readout functions by forming
isolated photosensitive regions interweaved with lines of light-shielded parallel readout
CCDs. After integrating a scene, the signal collected in every pixel is shifted, all in
parallel, to the charge storage parallel CCD. Readout and output is then performed
similarly to FF and FT devices. This architecture signiﬁcantly reduces smearing and
increases frame-rate. However, the increased pixel complexity and reduced ﬁll factor
typically result in higher unit costs and lower sensitivities.
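The full-frame readout sequence, and the reason it needs a shutter, can be mimicked with a toy model. This is a simplified sketch, not a device model: the function name, the convention that row 0 sits next to the serial register, and the way light is added once per row time are all assumptions made for illustration.

```python
def full_frame_readout(charge, light_per_row_time=None):
    """Read a full-frame CCD: parallel shift one row, then clock it out serially.

    charge: list of rows of charge packets; row 0 is adjacent to the serial register.
    light_per_row_time: optional charge added to every well site while one row is
    being clocked out (models an open shutter during readout).
    """
    if not charge:
        return []
    rows, cols = len(charge), len(charge[0])
    wells = [list(row) for row in charge]
    samples = []
    for _ in range(rows):
        serial = wells[0]                        # parallel shift into the serial register
        wells = wells[1:] + [[0] * cols]         # remaining rows move one step closer
        for packet in serial:                    # serial shift to the output amplifier
            samples.append(packet)
        if light_per_row_time is not None:       # wells keep integrating if unshuttered
            for r in range(rows):
                for c in range(cols):
                    wells[r][c] += light_per_row_time[r][c]
    return samples


image = [[1, 2], [3, 4]]
print(full_frame_readout(image))
print(full_frame_readout(image, light_per_row_time=[[1, 1], [1, 1]]))
```

With the shutter closed the samples arrive in raster order and match the stored image. With light still falling on the array, rows read later pick up extra charge on their way to the serial register, which is the smearing that the mechanical shutter prevents.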
CCD technology has several general advantages and disadvantages. Using a single out-
put ampliﬁer to convert and amplify the charge means excellent global image uniformity.
Furthermore, a single output ampliﬁer can be optimised for low-noise operation and com-
bined with a low-noise substrate, CCD imagers boast good dynamic range. On the other
hand, as the image is formed through accumulating charge in adjacent wells, an “overﬁll”
would result in charge spilling out to adjacent pockets, referred to as blooming. It is tech-
nically feasible but not economic to use the CCD manufacturing process to integrate other
image sensor requisites, such as the clock drivers, timing logic, and signal processing on
the same chip as the photodetectors. These are normally put on separate chips so CCD
cameras contain multiple chips. The complete CCD imager architecture (FF organisation)
is illustrated in Figure 3.1.
3.2.2 CMOS Imagers
In the past, CMOS imagers based on the Passive Pixel Sensor (PPS) architecture suﬀered
from poor image uniformity/dynamic range and limited output bandwidth (due to high
readout capacitance). A solution to this problem would be to introduce local gain within
the pixel to buﬀer the signal. However, in doing so the active photo-sensor area to total pixel
size ratio (surface ﬁll factor) would be reduced, consequently also reducing the responsivity
(incident light power to photocurrent ratio.)
In the past, as minimum transistor feature sizes were multiple-micron, an active sensor
[Figure 3.1 block diagram. Labels: Charge-Coupled Device (CCD) Image Sensor; Electron to Voltage Converter; Off-chip Acquisition Control and Signal Conditioning; Bias Generators; Clock Drivers; Oscillator; Clock & Timing Generator; Amplifier; ADC; Line Driver; Frame Out.]
Figure 3.1: A typical CCD imager (full-frame based) architecture.
approach would render a prohibitively low fill factor, resulting in an unusable pixel format.
Through CMOS technology scaling, the Active Pixel Sensor (APS) [2] was recently made
feasible as the ratio of electronic device area to photodetecting element area has been significantly
reduced. The APS approach is to include an in-pixel amplifier performing the charge-to-voltage
conversion inside each and every pixel. Using a column and row selection matrix,
the individual pixel voltages are scanned off the array and buffered by the column amplifiers.
A drawback of the APS approach is that non-uniformities in the in-pixel amplifiers
give rise to the so-called Fixed Pattern Noise (FPN). However, a major benefit of using
CMOS technology is the ability to integrate photodetecting elements with electronics on
the same chip. Consequently, advanced signal conditioning techniques can be implemented
to dramatically reduce the FPN (e.g. correlated double sampling) and to further process the
image. The complete CMOS imager architecture is illustrated in Figure 3.2.
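The idea behind correlated double sampling can be illustrated with a deliberately simplified sketch. It assumes that each pixel's non-uniformity is a fixed additive offset present in both the reset sample and the signal sample; gain mismatch and temporal noise are ignored, and all names and voltages are hypothetical.

```python
def read_pixel(photo_signal, offset):
    """Return (reset_sample, signal_sample) in volts for one pixel.

    The per-pixel amplifier offset appears identically in both samples.
    """
    reset_sample = offset
    signal_sample = offset + photo_signal
    return reset_sample, signal_sample


def correlated_double_sample(reset_sample, signal_sample):
    """Subtract the reset sample so that the common offset cancels."""
    return signal_sample - reset_sample


offsets = [0.05, -0.03, 0.10]   # hypothetical fixed offsets of three pixels (V)
photo_signal = 0.20             # identical illumination on every pixel (V)

raw = [read_pixel(photo_signal, off)[1] for off in offsets]
corrected = [correlated_double_sample(*read_pixel(photo_signal, off)) for off in offsets]
print("raw:      ", raw)
print("corrected:", corrected)
```

Under uniform illumination the raw outputs differ from pixel to pixel, whereas the differenced outputs agree, which is the sense in which the offset component of the fixed pattern is removed.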
3.2.3 CCD vs. CMOS
Comparing CCD and CMOS imager technologies [3] [4], several key differences can be found.
Table 3.1 makes such a comparison across various performance criteria including dynamic
range, responsivity, uniformity, speed, power and cost [3]. Considering these advantages
and disadvantages of CCD and CMOS imagers, it is evident that these will continue to co-exist,
having complementary application spaces:
• CCD image sensors oﬀer superior image quality and ﬂexibility at the expense of system
size, cost and power. This technology remains the most suitable for high-end imaging
[Figure 3.2 block diagram. Labels: Complementary Metal Oxide Semiconductor (CMOS) Image Sensor; pixel array with in-pixel amplifiers (A); Row access; Column Amplifiers; Column Multiplexer; Bias Generators; Clock Drivers; Oscillator; Clock & Timing Generator; Amplifier; ADC; Line Driver; Frame Out.]
Figure 3.2: A typical CMOS imager architecture
applications. Such examples include: high deﬁnition video and still cameras and most
high quality industrial, scientiﬁc and technical applications.
• CMOS imagers oﬀer superior integration, fabrication cost, power dissipation and sys-
tem size at the expense of image quality. This technology is the most suitable for
high-volume, space- and power-constrained applications with relaxed image quality
requirements. Such examples include: security, surveillance, biomedical, biometric,
automotive, healthcare and machine vision applications.
• Recent trends in CMOS imager technology have shown potential for APS systems
applied to high deﬁnition applications traditionally catered for exclusively by CCD
systems [5, 6].
3.3 Vision Processing Techniques
Image and vision processing has traditionally involved a modular organisation, consisting of
a stand-alone camera, computer interface and PC. Recent developments in processing hardware
have enabled embedded processors to substitute for the traditional computing platform.
Although this presents a more compact, power efficient system, the underlying principle of
organisation remains the same: very much a sequential von Neumann based architecture.
This section outlines the strengths and weaknesses of this approach and continues by
introducing the distributed processing approach, recently made feasible by advances in
microelectronic technology, in particular concerning submicron CMOS.
| Attribute | Comparison | Advantage |
|---|---|---|
| Responsivity | Both CCD and CMOS have similar responsivities (as both can now include in-pixel gain). | Same |
| Dynamic Range | CCD achieves superior dynamic range due to lower noise (in substrate and amplifiers). | CCD |
| Fill Factor | CCD approaches 100% fill factor whereas CMOS typically achieves 20-50%; however, this can be greatly improved by using micro-lenses. | CCD |
| Pixel Size | Full-frame CCDs can have the minimum pixel pitch, due to no in-pixel circuitry. | CCD |
| Uniformity | CCD has excellent image uniformity due to a single output amplifier whereas in CMOS, FPN degrades uniformity, hence additional electronics are required for compensation. | CCD |
| Shuttering | Interline CCDs have the ability to shutter arbitrarily whereas CMOS requires additional electronics. | CCD |
| Speed | CMOS is capable of much higher speed of operation than CCD due to the integration of all functions on the same chip (less capacitance). | CMOS |
| Windowing | On-chip electronics in CMOS can produce control signals for windowing functionality. | CMOS |
| Anti-blooming | CMOS has inherent immunity to blooming whereas reducing it in CCD requires alteration to the standard CCD fabrication process. | CMOS |
| Anti-smearing | CMOS has inherent immunity to smearing whereas this places design constraints in CCD. | CMOS |
| Biasing & Clocking | CMOS devices operate from a single voltage bias and clock level, generated on-chip, whereas CCDs require several higher voltage biases. | CMOS |
| System Size | Electronics integration on a single chip in CMOS results in reduced system size compared to CCD. | CMOS |
| Power | CMOS technology is inherently lower power than CCD. Optimal modular design in CCD for high-speed systems can provide good power efficiency. | CMOS |
| Cost | Lower total cost in CMOS technology due to wider availability of CMOS fabrication and single-chip implementation. | CMOS |
Table 3.1: Comparison of CCD and CMOS imager technologies
[Figure 3.3 block diagram. Labels: CCD or CMOS Image Sensor; Image (parallel data); Frame Scan; High BW (Serial data); Memory; Random Access; High BW (Serial data); Digital Signal Processor (DSP); Processed Output; Low BW.]
Figure 3.3: A conventional real-time image processing platform
3.3.1 The Modular Approach
This conventional approach uses a standard CCD or CMOS camera (image sensor includ-
ing all internal image conditioning and pre-processing hardware) for image capture, subse-
quently transmitting the image to be stored in memory. The processor, either an embedded
DSP/FPGA or a conventional PC platform then executes an image processing algorithm,
either randomly or sequentially accessing the stored image within memory. This process is
typically repeated for each frame in static image processing. If transient image properties
are to be processed then a history of previous frames needs to be stored in memory. The
great merit of this technique is reconfigurability and versatility, for the processing algorithm
is defined in software and is therefore reprogrammable. This modular architecture for
implementing real-time image processing is illustrated in Figure 3.3.
Generally with modern processing hardware, “real-time” processing at standard scan
rates (e.g. 25 Hz) is achievable for many tasks, at the expense of power consumption and
system size. This poses a major issue in portable battery-operated machine vision applications
having stringent power and size constraints. Furthermore, the sequential processing and
communication nature of this technique renders it unusable for high frame rate applications.
For example, for an 8-bit VGA resolution (640×480 pixel) image at 1000 Frames Per Second
(FPS) refresh, the imager alone would require a 2.46 Gbps communication bandwidth
(640 × 480 pixels × 8 bits × 1000 FPS). It is here that the bottleneck of the sequential
von Neumann based computational paradigm presents itself.
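The bandwidth figure quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `raw_bandwidth_bps` is hypothetical, and the calculation assumes uncompressed data with no blanking or protocol overhead.

```python
def raw_bandwidth_bps(width, height, bits_per_pixel, frames_per_second):
    """Raw (uncompressed) imager data rate in bits per second.

    Assumption: no blanking intervals, compression or protocol overhead.
    """
    return width * height * bits_per_pixel * frames_per_second


# 8-bit VGA image at 1000 FPS, as in the example in the text
high_speed = raw_bandwidth_bps(640, 480, 8, 1000)
# The same imager at a standard 25 Hz scan rate, for comparison
standard = raw_bandwidth_bps(640, 480, 8, 25)

print(f"1000 FPS: {high_speed / 1e9:.2f} Gbps")
print(f"25 FPS:   {standard / 1e6:.2f} Mbps")
```

The factor of 40 between the two results is simply the ratio of the frame rates, which is why a serial link that is comfortable at standard scan rates becomes the limiting element at high frame rates.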
iCount: A Modular Real-time Image Processing Example
A commercial system implemented using such a platform is iCount by Safehouse Technology
Ltd [7]. The product brochure provides the following description:
[Figure 3.4 block diagram. A 3×3 array of identical cells, each labelled "Pixel, Distributed Processing", surrounded by: Column Output Encoder; Row Output Encoder; Output Control Logic; Current & Voltage Bias Generation; System Control & Tuning; Off-array Processing; Processed Output (low BW).]
Figure 3.4: Real-time distributed-processing vision chip architecture
“iCount unobtrusively provides counting statistics for people and vehicles moving be-
tween user-deﬁned areas. The system uses analogue or digital cameras, runs on standard
PC hardware and integrates easily with existing IT infrastructures.”
In this application, the modular approach provides a good, usable solution; for there are
no power or size constraints and reconﬁgurability is necessary to set up the various system
parameters. For example, in a particular scene object size and shape, boundary crossings,
forbidden regions and lighting conditions need to be set.
3.3.2 The Distributed Approach
Distributed or focal-plane vision processing has mainly evolved and been developed within
the Bio-inspired Electronics community in the past 15 years, since Carver Mead introduced
the notion of Neuromorphic Electronics [8]. This is because distributed vision
processing shares at least two fundamental properties with neurobiology: massive parallelism
and construction from basic, identical processing elements.
A distributed processing architecture (see Fig. 3.4) uses electronics embedded within the
photo-detection array, within each pixel, performing local, parallel and distributed processing.
Every row and column shares a common output bus for extraction of processed data.
Since the output is high-level processed data, the output buses are event- or data-driven,
delivering an event or value only when the local processing elements flag that a useful result
is waiting. This greatly reduces the communication bandwidth, so a simple asynchronous
handshake is the preferred method for off-chip communication. Some such vision
systems contain an off-array processing core for some post-processing of the sequentially
extracted data. Distributed processing vision architectures are often called vision chips
because all the required functionality, including bias and reference generators,
clocks¹, control and tuning logic, is contained on a single chip. This has been made
possible through use of modern CMOS technology.
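The bandwidth saving of an event-driven output bus can be illustrated with a small sketch. Everything here is an assumption made for illustration: the names `address_events`, `event_bits` and `frame_bits` are hypothetical, each event is assumed to carry only a binary-encoded row and column address, and handshake overhead is ignored.

```python
def address_events(flags):
    """Return the (row, column) address of every pixel that flags a result."""
    return [(r, c)
            for r, row in enumerate(flags)
            for c, flag in enumerate(row)
            if flag]


def event_bits(flags):
    """Bits needed to send one binary-encoded address per flagged pixel."""
    rows, cols = len(flags), len(flags[0])
    # (n - 1).bit_length() is the number of bits needed to address n lines
    bits_per_event = (rows - 1).bit_length() + (cols - 1).bit_length()
    return len(address_events(flags)) * bits_per_event


def frame_bits(flags, bits_per_pixel=8):
    """Bits needed to scan out the whole array as a conventional frame."""
    return len(flags) * len(flags[0]) * bits_per_pixel


# 4x4 array in which only two pixels have a result waiting
flags = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
print(address_events(flags), event_bits(flags), frame_bits(flags))
```

In this toy case two events cost 8 bits, against 128 bits for a full 8-bit frame scan; the saving grows with array size as long as the number of flagged pixels stays small.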
The distributed processing paradigm overcomes the bottleneck presented in sequential
processing architectures by employing massive parallelism of low speed processing
elements². Furthermore, by optimising these elements using hybrid (analogue, digital and
spike-domain) circuit topologies, very high computational efficiency is achievable
in real-time vision processors. Until recently [9, 10], this has been at the expense of
development time and reconfigurability. As vision chips are application specific and hardware
based, they generally require more time to design and, once fabricated, are typically dedicated
to a specific algorithm or processing task.
ACE16K: A Distributed Real-time Vision Processing Example
A commercial system implemented using such a platform is the ACE16K [11] by Anafocus
Ltd. The product brochure provides the following description:
“A digitally-controlled analog array processor designed for fast image processing appli-
cations. Its revolutionary processing capabilities rely on a combined spatial distribution of
sensing, processing and storage on an array of identical programmable units.”
This product is a generic high frame-rate (surpassing 1000FPS) vision processing system
targeted at applications such as textile fault detection, rail inspection from high-speed
trains, detection of debris particles in oil ﬂow, high speed inspection during production,
etc. The system uses a distributed Cellular Neural Network (CNN) architecture to provide
the computational power required to process the vast amount of image data. However, the
main limitations of this approach are the relatively large pixel size and prohibitively high
power consumption, excluding it from use in applications with stringent power constraints
or those requiring high resolution images.
¹ Use of clocks is normally avoided (for noise and power reasons) unless absolutely critical.
² An important advantage of this technique is that the processing time is independent of array size.
3.3.3 The Computation-on-Readout Approach
Although the distributed approach is ideal for fast and eﬃcient early vision processing,
the inclusion of processing circuitry within the pixels prevents such systems from acquiring
high-resolution images. These space constraints are eliminated if the processing is
performed serially during read-out using pixel-block-parallel-addressing. Various kernels
can be programmed in the processing unit of the imager and convolution is performed on
readout with several kernels in parallel. Functionally, the image itself serves as an analog
memory because the image dynamics occur at much slower speed than the image process-
ing being performed. The beneﬁts of this approach are: (1) small pixel size allowing for
high-resolution imaging, (2) a single processor unit is used throughout the entire retina and
(3) programmability does not impact the imaging array density. The space constraints are
then transformed into temporal restrictions because the scanning clock speed and response
time of the processing circuits must scale with the size of the array. This approach is often
referred to as Computation-on-Readout (COR). This architecture is in fact very similar to
the APS, differing only in that it includes a processing core, which controls the row/column
selection and facilitates the COR.
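The COR principle described above can be sketched in software as follows. This is a minimal illustration, not a model of any of the cited chips: the function name `readout_with_kernels` is hypothetical, the block size is fixed at 3×3, and the kernels are applied without flipping (strictly a correlation, which is equivalent to convolution for symmetric kernels).

```python
def readout_with_kernels(image, kernels):
    """Scan the array block by block and apply every kernel on readout.

    The image is only read, never modified: it acts as the memory while a
    single shared processing unit serves the whole array. One output map
    is produced per kernel.
    """
    rows, cols = len(image), len(image[0])
    outputs = [[[0] * (cols - 2) for _ in range(rows - 2)] for _ in kernels]
    for r in range(rows - 2):            # row selection
        for c in range(cols - 2):        # column selection
            # Address a 3x3 block of pixels in parallel
            block = [image[r + dr][c:c + 3] for dr in range(3)]
            # Apply all programmed kernels to the same block
            for k, kernel in enumerate(kernels):
                outputs[k][r][c] = sum(kernel[i][j] * block[i][j]
                                       for i in range(3) for j in range(3))
    return outputs


horizontal_gradient = [[-1, 0, 1]] * 3
vertical_gradient = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
step_image = [[0, 0, 1, 1] for _ in range(4)]   # vertical step edge
maps = readout_with_kernels(step_image, [horizontal_gradient, vertical_gradient])
print(maps[0], maps[1])
```

The nested loops make the temporal restriction visible: the number of block reads grows with the array size, so the scanning clock and the processing unit must speed up accordingly.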
Dallaire et al. [12] have applied the COR organisation by implementing a multiresolution
edge detection algorithm on a hexagonal pixel array. Another similar system based
on COR was used by Mallik et al. [13], implementing a temporal change threshold detector.
A more generic approach was taken by Gruev et al. [14, 10], attempting to realise a
pseudo-general-purpose vision chip for spatiotemporal filtering in which the size and
configuration of the convolution kernels are programmable.
3.4 Centroid Detection
Visual position tracking and centroid detection have been tasks traditionally associated
with military applications. However these same tasks are fundamental to the more generic
ﬁeld of image recognition. Traditional image processing techniques eﬀectively condition,
ﬁlter and process image data but still normally output a matrix of pixels, constituting an
image. For perceptive vision applications it is paramount to cluster together pixels in a
region of interest and provide a single entity for this. This task is often referred to as object
segmentation. Having performed this, it is useful to describe the object using a centroid co-
ordinate describing the object position and a single magnitude providing a measure for the
object size. Having such high-level processed data available can beneﬁt countless systems
through a broad range of disciplines.
Machine vision for autonomous navigation, automation of security camera tasks, image
stabilisation for medical applications and biochemical cellular migration/population analysis
are some applications that could beneﬁt from advances in such processing techniques. For
mobile platforms including autonomous systems and handheld devices, minimising power
consumption is of the utmost importance. It is therefore beneficial to include low-power
front-end electronics to perform this saliency or region-of-interest detection to alleviate
other processing tasks by applying attention only where it is most useful.
For applications that include the centroid computation as part of a feedback loop, high
speed and low latency are most crucial. Latency is an especially important issue for image
stabilisation and feedback systems with mechanical actuators. They demand quick response
or risk becoming oscillatory or simply ineffective. For example, in microsurgery, tremors
from the surgeon's hand in addition to tremors from the subject can result in hazardous
situations. Here, optically tracking the surgeon's instrument and objects in the operating field,
coupled with mechanical compensation of surgical instruments, could result in jitter-free
surgery.
3.4.1 Sequential Processing
As already stated, virtually all processing architectures today are based on the von Neu-
mann sequential processing paradigm [15]. With the advent of reconﬁgurable processing
hardware including ﬁeld-programmable gate arrays (FPGA) and digital signal processors
(DSP), it has been made possible to produce autonomous embedded systems. Furthermore,
the reconﬁgurability has extended traditionally computer-bound software algorithms to be
used in real-time processing hardware. The computational eﬃciency of a state-of-the-art
hardware platform is more or less ﬁxed; usually quoted in mW or μW per MIPS (Million
Instructions Per Second). Subsequently, it is the software algorithm making good use of
hardware resources which determines the system performance.
This subsection will outline some common software techniques suitable for object cen-
troid computation.
Centre-of-Mass Computation
Calculating the centre of mass (COM) or centroid of an object is a relatively (computationally)
simple and efficient task. Consider the image to be a matrix I of intensities that
contains both an object and a background. Equation 3.1 gives the centroid calculation for
a single axis.
\[
C_x = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} (x_i \cdot I_{ij})}{\sum_{i=1}^{n} \sum_{j=1}^{m} I_{ij}} \qquad (3.1)
\]
where x_i is the coordinate of a pixel on the x-axis and I_ij is the intensity of that pixel. This
equation assumes that the intensities of the object have a higher numerical value than the
background. The computational expense (related to image size) of a single axis centroid
calculation is given by: I_x ∝ 2xy, where I_x is the number of processor instructions and x
and y are the image dimensions.
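A direct software rendering of Equation 3.1, extended to both axes, is sketched below. The function name `centre_of_mass` is hypothetical, and zero-based pixel indices are assumed as the coordinates, so the result is expressed in pixel units from the top-left corner.

```python
def centre_of_mass(image):
    """Intensity-weighted centroid (Cx, Cy) of a 2-D image.

    `image` is a list of rows; the column index is taken as x and the
    row index as y (an assumption, zero-based).
    """
    total = 0.0
    weighted_x = 0.0
    weighted_y = 0.0
    for y, row in enumerate(image):
        for x, intensity in enumerate(row):
            total += intensity               # denominator of Equation 3.1
            weighted_x += x * intensity      # numerator for the x-axis
            weighted_y += y * intensity      # numerator for the y-axis
    if total == 0:
        raise ValueError("image contains no object (all intensities are zero)")
    return weighted_x / total, weighted_y / total


image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(centre_of_mass(image))
```

Each pixel contributes one accumulation for the denominator and one multiply-accumulate per axis, which is consistent with the workload growing in proportion to the product of the image dimensions.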
Although this approach is valid for asymmetric objects, the method is especially sus-
ceptible to changes in object shape and orientation between successive images when used
in target tracking. Furthermore, this method alone cannot handle multiple target track-
ing/centroiding.
Object Segmentation
In order to facilitate Multi-Target Tracking (MTT), eﬀective object segmentation becomes a
fundamental requirement. A thresholding function to produce a binary template combined
with an object indexing technique is a basic method for achieving this. However, in real-
world images containing noisy undeﬁned targets a more advanced technique is required for
robust object segmentation.
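The basic method mentioned above (thresholding followed by object indexing) can be sketched as follows. The indexing step is implemented here as 4-connected component labelling by breadth-first flood fill, which is one common choice and an assumption on our part; the function name `segment_and_centroid` is hypothetical.

```python
from collections import deque


def segment_and_centroid(image, threshold):
    """Threshold the image, index each connected object, and report the
    size (pixel count) and binary centroid of every object found."""
    rows, cols = len(image), len(image[0])
    binary = [[1 if value > threshold else 0 for value in row] for row in image]
    labels = [[0] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not labels[r][c]:
                label = len(objects) + 1
                labels[r][c] = label
                queue = deque([(r, c)])
                size = sum_x = sum_y = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    sum_x += x
                    sum_y += y
                    # Visit the four direct neighbours (4-connectivity)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = label
                            queue.append((ny, nx))
                objects.append({"label": label, "size": size,
                                "centroid": (sum_x / size, sum_y / size)})
    return objects


scene = [
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 8, 0],
    [0, 0, 0, 0, 8, 0],
]
for obj in segment_and_centroid(scene, threshold=4):
    print(obj)
```

On clean data such as this example the two objects are separated correctly; with noisy, poorly defined targets a single global threshold tends to merge or fragment objects, which motivates the more advanced technique described next.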
The Active Contour (Snake) Algorithm [16]
The active contour method provides a reliable method of object selection within images
containing “real-world” data. An active contour (snake) is a deformable contour that moves
under a variety of local image constraints and object-model constraints. The representation
of a snake is: v(s) = (x(s), y(s)), where s runs from 0 to 1 over the perimeter of the
snake. The snake is controlled by minimising a function which converts high-level contour
information like curvature and discontinuities and low-level image information like edges,
gradients and terminations into energies. The energy functional is given by:
\[
E = \int_0^1 \frac{1}{2}\left[\alpha |x'(s)|^2 + \beta |x''(s)|^2\right] + E_{\text{ext}}(x(s)) \, ds \qquad (3.2)
\]
where α and β are weighting parameters that control the snake's tension and rigidity,
respectively, and x′(s) and x′′(s) denote the first and second derivatives of the contour position
x(s) (the snake v(s) introduced above) with respect
to s. The external energy function Eext is derived from the image so that it takes on its
smaller values at the features of interest, such as boundaries. Given a grey-level image
I(x, y), viewed as a function of continuous position variables (x, y), a typical external en-
ergy designed to lead an active contour toward a discrete edge is: Eext(x, y) = −|∇I(x, y)|2,
where ∇ is the gradient operator. The energy function is minimised by solving the Euler
equation:
\[
\alpha x''(s) - \beta x''''(s) - \nabla E_{\text{ext}} = 0 \qquad (3.3)
\]
On formation of the active contour boundary, the area enclosed can be delegated to a
centre-of-mass calculation function to compute the centroid of that particular object.
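To make the energy terms concrete, the sketch below evaluates a discretised version of the snake energy on a pixel grid. It is an illustration under stated assumptions rather than the algorithm of [16]: the function names are hypothetical, the gradient is approximated by central differences clamped at the image border, the contour is a closed list of integer (x, y) points with unit spacing in s, and no minimisation is performed.

```python
def external_energy(image):
    """E_ext(x, y) = -|grad I(x, y)|^2, using clamped central differences."""
    rows, cols = len(image), len(image[0])
    energy = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            gx = (image[y][min(x + 1, cols - 1)] - image[y][max(x - 1, 0)]) / 2.0
            gy = (image[min(y + 1, rows - 1)][x] - image[max(y - 1, 0)][x]) / 2.0
            energy[y][x] = -(gx * gx + gy * gy)
    return energy


def snake_energy(points, alpha, beta, e_ext):
    """Discrete energy of a closed contour given as a list of (x, y) points."""
    n = len(points)
    total = 0.0
    for i in range(n):
        px, py = points[i - 1]            # previous point (wraps around)
        cx, cy = points[i]
        nx, ny = points[(i + 1) % n]      # next point (wraps around)
        first = (nx - cx) ** 2 + (ny - cy) ** 2                       # |x'|^2
        second = (nx - 2 * cx + px) ** 2 + (ny - 2 * cy + py) ** 2    # |x''|^2
        total += 0.5 * (alpha * first + beta * second) + e_ext[cy][cx]
    return total


step_image = [[0, 0, 10, 10] for _ in range(4)]   # vertical step edge
e_ext = external_energy(step_image)
contour = [(1, 1), (2, 1), (2, 2), (1, 2)]        # small square on the edge
print(e_ext[1], snake_energy(contour, alpha=1.0, beta=0.0, e_ext=e_ext))
```

The external energy is most negative along the step edge, so a contour lying on the edge has a lower total energy than one placed in a flat region, which is what drives the snake towards boundaries during minimisation.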
The active contour method provides an eﬀective boundary detection technique suitable
for object segmentation. The resulting contour is smooth and continuous, and adapts readily
to deforming objects (changing shape, orientation and/or size). On the down side, the snake
algorithm is computationally intensive and requires external initialisation processes which
are able to position a snake close enough to the desired solution. Furthermore, the image
data is required to be suﬃciently smooth so that the snake does not remain blind to the
desired solution and to maintain numerical stability in the iterative computations.
Other techniques
It has already been established that a software solution to multiple target tracking in real-
world data needs to incorporate several algorithms. To perform object segmentation the
active contour method has been outlined, whereas for the centroid calculation the centre-
of-mass computation has been mentioned. Other techniques include:
• For centroid calculation: object contour averaging, intensity centroiding [17], mul-
tithreshold centroiding [18] and Gaussian ﬁt estimation [19].
• For object segmentation: binary thresholding, fuzzy [20], alpha map [21], particle
ﬁlter [22], distribution matching [23] and gradient vector diﬀusion+region merging
[24].
Limitations
Although sequential processing is extremely powerful in processing oﬀ-line data, this tech-
nique presents a computational bottleneck in real-time centroid processing; especially in the
case of multiple-target tracking. The sheer volume of image data and complexity of object
segmentation algorithms result in a computational workload that would render an embedded
processing platform too power intensive for real-time and high frame-rate operation.
3.4.2 Distributed Processing
To overcome the computational limitations of traditional image processing techniques, a
parallel processing approach could be explored. Hybrid focal-plane electronics combined
with photodetection elements have the potential to achieve computationally eﬃcient dis-
tributed centroid processing.
A review of previously developed vision chips demonstrates both the feasibility and potential
of distributed architectures for centroid processing. High speed operation, good
robustness and low power consumption have already been reported in a number of such
systems. A quantitative comparison of several centroiding vision chips
in key performance criteria is provided in Table 3.2. The underlying principles of operation
can be divided into three main categories:
Column/Row Summation
The most common approach to object centroid computation is illustrated in Fig. 3.5. The
basic principle is to sum the pixels in all rows and columns to the edge of the array and
| Year | Ref. | Tech (μm) | Die size (mm²) | Array size | Pixel side (λ) | Pixel format | Fill factor (%) | Multiple objects | Object sizing | Accuracy (pixels) | Speed (KFPS) | Power (mW) |
|------|------|-----------|----------------|------------|----------------|--------------|-----------------|------------------|---------------|-------------------|--------------|------------|
| 2005 | [25] | 0.18 | 25.00 | 48×48 | 944 | Analogue, Binary¹ | 12.5 | yes | yes | 1³ | 2 | 0.24 |
| 2004 | [26] | 0.18 | 4.83 | 64×64 | 179 | Binary³, Averaging | 27 | no | no | 0.3 | 0.3 | 41 |
| 2004 | [27] | 0.5 | - | 43×43 | 58 | APS⁴ | 49 | yes⁵ | no | 0.2-0.4² | - | - |
| 2004 | [28] | 0.7 | 18.00 | 5×5 | 29 | Analogue³,⁴ | - | no | no | - | 2.4-4.8 | - |
| 2004 | [29] | 0.8 | 6.25 | 20×20 | 36 | Analogue³ | 12 | no | no | 0.013 | 3 | 15 |
| 2003 | [30] | 0.18 | 6.00 | 80×80 | 923 | Binary³ | 51 | no | no | 0.1 | 1 | 30 |
| 2003 | [31] | 0.5 | 49.00 | 64×64 | 41 | Binary⁶ | 18.8 | 18 max.⁵ | no | 1³ | 1 | 112 |
| 2003 | [32] | 0.6 | 20.25 | 11×11 | 17 | Binary⁶ | 6.7 | yes⁵ | no | 0.1 | 10 | - |
| 2003 | [33] | 0.8 | 8.00 | 20×20 | 36 | Binary³ | 12 | no | no | 0.1 | 3 | 15 |
| 2002 | [34] | 0.5 | 2.25 | 24×24 | 133 | Analogue⁴ | 19 | no | no | - | 1 | - |
| 2002 | [35] | 0.6 | 2.88 | 60×36 | 76 | Analogue³, APS⁷ | 12, 16⁷ | no | no | 1³ | 3.68 | 2.6 |
| 1999 | [36] | 0.5 | 20.25 | 43×43 | 57 | Binary³ | 23 | no | no | - | - | - |
| 1999 | [37] | 1.2 | 4.00 | 40×1 | 188 | Analogue⁹ | 5 | no | no | - | 8⁸ | - |
| 1999 | [38] | 1.2 | 22.09 | 25×24 | 266 | Analogue⁹ | 3 | no | no | - | - | 19.9 |
| 1999 | [39] | 2.0 | 4.93 | 23×1 | 298 | Analogue⁹ | - | no | no | - | - | 5.4 |
| 1998 | [40] | 2.0 | 6.00 | 24×24 | 62 | Analogue⁹ | 30 | no | no | 3 | 7⁸ | 0.25¹⁰ |
| 1998 | [41] | 1.2 | 4.00 | 12×10 | 183 | Analogue³ | 11 | no | no | 1³ | - | 5 |
| 1998 | [42] | 3.0 | - | 25×1 | - | Neuron³ | - | no | no | 1.5 | - | 10¹¹ |

¹ Centroiding by distributed spatiotemporal algorithm; presented in this thesis.
² Sub-pixel accuracy possible by post-processing.
³ Centroiding by column and row summation (centre-of-mass).
⁴ Computation-on-readout (not in-pixel).
⁵ Multiple centroiding by sequential windowing and scanning.
⁶ Centroiding by thresholding, windowing and searching.
⁷ Includes interweaved APS imager combined within centroid (and motion detecting) array.
⁸ Tracking speed, i.e. stimulus moving at pixels/sec.
⁹ Centroiding by Winner-Take-All (WTA) network.
¹⁰ Only static power dissipation is reported.
¹¹ Excluding image acquisition, i.e. this system includes only centroiding hardware.

Table 3.2: Comparative review of centroid detecting vision chips.
[Figure 3.5 values. Binary (left): column sums 0, 2, 4, 4, 3, 3, 2, 0; row sums 0, 5, 6, 5, 2, 0. Analogue (right): column sums 0, 0.70, 2.20, 2.80, 2.65, 2.10, 0.35, 0; row sums 0, 3.35, 4.25, 2.95, 0.25, 0.]
Figure 3.5: Object centroid computation by row/column summation followed by one dimension
mean point calculation. Binary (left) and analogue (right) representation schemes.
use one-dimensional mean point calculation to determine the X and Y centre of mass. This
mean point calculation can be implemented either using analogue techniques, for example
using a basic vector-matrix multiplier or with a mean cumulative computation function
implemented with digital logic.
The summation has been executed upon either binary values (employing a thresholding
function) [26] [30] [33] [36], or directly on continuous value (analogue) data [28] [34] [35] [29]
[24]. Since the analogue summation considers the fractional values of the object edge pixels,
centroid computation with sub-pixel accuracy is possible using this technique. The discrete
summation, however, provides an increased degree of immunity to noise and/or fuzzy data.
Although this column/row summation technique can compute an object centroid with
precision, speed and relatively simple hardware, this technique remains limited to computing
single object centroids.
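The one-dimensional mean point calculation can be sketched in a few lines, here applied to the column and row sums given in Figure 3.5. The function name `mean_point` is hypothetical and zero-based indices are assumed for the pixel positions; the hardware implementations cited above perform the equivalent operation with analogue or digital circuits.

```python
def mean_point(projection):
    """Weighted mean index of a 1-D projection (row sums or column sums)."""
    total = sum(projection)
    if total == 0:
        raise ValueError("projection is empty (no object present)")
    return sum(index * value for index, value in enumerate(projection)) / total


# Column and row sums from Figure 3.5
binary_columns = [0, 2, 4, 4, 3, 3, 2, 0]
binary_rows = [0, 5, 6, 5, 2, 0]
analogue_columns = [0, 0.70, 2.20, 2.80, 2.65, 2.10, 0.35, 0]
analogue_rows = [0, 3.35, 4.25, 2.95, 0.25, 0]

print("binary:   x = %.3f, y = %.3f"
      % (mean_point(binary_columns), mean_point(binary_rows)))
print("analogue: x = %.3f, y = %.3f"
      % (mean_point(analogue_columns), mean_point(analogue_rows)))
```

The binary and analogue results differ by a fraction of a pixel because the analogue sums weight the partially covered edge pixels by their fractional values, which is the source of the sub-pixel accuracy noted above.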
Windowing and Search
Another eﬀective technique, more widely used in general target tracking is using windowing
and search operations to locate objects and then to locally compute the centroid. Systems
based on such a strategy employ an initial search algorithm to locate diﬀerent points of
interest. This has been done by using maximum local point detectors on a resistive network
and then generating a search window by propagating outwards [32] or by target tracking;
using an initial target template (lock) to deﬁne the initial search windows [31]. On deﬁni-
tion of these tracking windows, the centroid extraction is normally facilitated through the
Figure 3.6: Object centroid computation by winner-take-all network (competition illus-
trated only in one dimension). Using basic resistive grid (left) for single centroid, dy-
namic switching network (middle) for saliency/object segmentation and interrogation win-
dow (right) for centroid tracking
column/row summation method.
The great advantage of such schemes is the ability to track and centroid multiple
targets. However, this is at the expense of increased complexity, either with intricate in-
pixel digital processing or oﬀ-chip programmable logic device (PLD) to facilitate the digital
computation. Furthermore this introduces the requirement for a clock and control strategy
which can lead to degradation of signal-to-noise ratio (SNR) in addition to increased power
consumption.
Winner-take-all (WTA)
A third approach to centroid detection is to employ analogue processing for object seg-
mentation and/or saliency followed by a winner-take-all network on a resistive grid. Such
systems have either operated on the resistive network directly [40] or included dynamic
switch networks [38] for object segmentation or spatial derivative circuits [39] [37] for di-
rection sensing. These various techniques are illustrated in Fig. 3.6.
As this is a fundamentally analogue solution, reduced circuit complexity makes implementation
straightforward. The voltage-smoothing resistive grid inherently removes noise
and thus increases robustness. Furthermore, by combining WTA networks with windowing
functions, multiple object centroid determination is possible [40].
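The noise-rejecting behaviour of a resistive grid followed by a winner-take-all stage can be illustrated with a one-dimensional software analogy. This is a simplified sketch, not a circuit model: the resistive grid is approximated by a few steps of discrete diffusion between neighbouring nodes, the WTA stage is an argmax, and the names `resistive_smooth` and `winner_take_all` and the parameter values are assumptions.

```python
def resistive_smooth(values, coupling=0.25, iterations=3):
    """Approximate a 1-D resistive grid by repeated neighbour averaging.

    `coupling` sets how strongly each node is pulled towards its
    neighbours; it should not exceed 0.5 for this update to stay stable.
    """
    nodes = [float(v) for v in values]
    for _ in range(iterations):
        updated = nodes[:]
        for i in range(len(nodes)):
            left = nodes[i - 1] if i > 0 else nodes[i]
            right = nodes[i + 1] if i < len(nodes) - 1 else nodes[i]
            updated[i] = nodes[i] + coupling * (left + right - 2 * nodes[i])
        nodes = updated
    return nodes


def winner_take_all(values):
    """Index of the single largest node (the 'winner')."""
    return max(range(len(values)), key=lambda i: values[i])


# An isolated noise spike at index 1 and a broader object over indices 5 to 9
signal = [0, 5, 0, 0, 0, 4, 4, 4, 4, 4, 0, 0]
raw_winner = winner_take_all(signal)
smoothed_winner = winner_take_all(resistive_smooth(signal))
print(raw_winner, smoothed_winner)
```

Without smoothing the single-pixel spike wins the competition; after smoothing, the spike is attenuated and the winner moves to the middle of the broader object, which is the behaviour relied upon for centroid localisation.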
3.5 Summary
This chapter began by reviewing CCD and CMOS technologies. It has been established
that these technologies will continue to co-exist as complementary rather than one
superseding the other. CCDs will continue to provide high-quality, high resolution imagers
for top-end applications, whereas CMOS offers in-pixel integration
with electronics and low power/high speed operation. This makes CMOS technology
ideal for implementation of biologically inspired electronics (mentioned previously), based
on distributed algorithms and architectures.
Conventional software techniques have extensively tackled single target tracking and
centroid detection. The algorithm involved is both computationally eﬃcient and hardware
implementable. Moreover, many distributed techniques based on the same centre-of-gravity
calculation have also proved successful in implementing robust, accurate and power eﬃcient
vision chips.
Centroiding vision chips, however, have failed to deliver true distributed multi-target
tracking. Although a few systems have reported multiple object capability, their effectiveness
and efficiency have yet to be demonstrated. On the other hand, software techniques have
matured a variety of object segmentation algorithms suitable for multi-target tracking.
However, the huge computational expense of implementing these limits their use to offline
or low frame-rate processing. Furthermore, such techniques do not scale efficiently: for
example, the array (pixel grid) size may only be scaled up until the processing
platform is operating at full capacity.
Another important feature vision chips have not yet delivered is object size and/or
shape estimation. This, coupled with centroid location, would provide very powerful high-level
processed data widely applicable in vision processing. For example, object size could
be used as a screening parameter, limiting the centroid co-ordinates only to objects within
a certain size window. This would be useful in microscopic cellular population analysis
for counting of biological cells and classification by size. Such a technique could provide
cell-type ratios, for example between red and white blood cells, which have a distinct size
difference.
References
[1] W. S. Boyle and G. E. Smith, “Charge coupled semiconductor devices,” Bell System
Technical Journal, vol. 49, pp. 587–593, 1970.
[2] E. R. Fossum, “Active Pixel Sensors (APS) - Are CCDs Dinosaurs?,” Proceedings of
SPIE, vol. 1900, pp. 2–14, 1992.
[3] D. Litwiller, “CCD vs. CMOS: Facts and Fiction,” Photonics Spectra, pp. 154–158,
2001.
[4] N. Blanc, “CCD versus CMOS has CCD imaging come to an end?,” Photogrammetric
Week ’01, by Fritsch/Spiller (eds.), pp. 131–137, 2001.
[5] L. Kozlowski, G. Rossi, L. Blanquart, R. Marchesini, Y. Huang, G. Chow and J.
Richardson, “A Progressive 1920x1080 Imaging SoC for HDTV Cameras,” Proceedings
of IEEE International Solid-state Circuits Conference, vol. 19.7, 2005.
[6] M. Mase, S. Kawahito, M. Sasaki and Y. Wakamori, “A 19.5b DR CMOS Image
Sensor with 12b Column-Parallel Cyclic A/D Converters,” Proceedings of IEEE International
Solid-state Circuits Conference, 2005.
[7] Safehouse Technology Limited Website, http://www.safehouse.com.au, 2004.
[8] C. A. Mead, “Neuromorphic Electronic Systems,” Proceedings of the IEEE, vol. 78,
no. 10, pp. 1629–1636, 1990.
[9] P. Dudek and P. J. Hicks, “A General-Purpose Processor-per-Pixel Analog SIMD Vision
Chip,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, no. 1,
pp. 13–52, 2005.
[10] V. Gruev and R. Etienne-Cummings, “A pipelined temporal diﬀerence imager,” IEEE
Journal of Solid-State Circuits, vol. 39, no. 3, pp. 538–543, 2004.
[11] A. Rodriguez-Vazquez, G. Linan-Cembrano, L. Carranza, E. Roca-Moreno, R.
Carmona-Galan, F. Jimenez-Garrido, R. Dominguez-Castro and S. E. Meana,
“ACE16k: The Third Generation of Mixed-Signal SIMD-CNN ACE Chips Towards
VSoCs,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 5,
pp. 851–863, 2004.
[12] S. Dallaire, M. Tremblay, and D. Poussart, “Mixed-signal VLSI architecture for real-
time computer vision,” Real-Time Imaging, vol. 3, no. 5, pp. 307–317, 1997.
[13] U. Mallik, M. Clapp, E. Choi, G. Cauwenberghs and R. Etienne-Cummings, “Temporal
Change Threshold Detection Imager,” Proceedings of IEEE International Solid-state
Circuits Conference, vol. 19.9, pp. 23–25, 2005.
[14] V. Gruev and R. Etienne-Cummings, “Implementation of Steerable Spatiotemporal
Image Filters on the Focal Plane,” IEEE Transactions on Circuits and Systems I:
Analog and Digital Signal Processing, vol. 49, no. 4, pp. 538–543, 2002.
[15] J. Von Neumann and A. W. Burks, Theory of self-reproducing automata. University of
Illinois Press, 1966.
[16] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” Interna-
tional Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, 1987.
[17] M. Singh, B. S. Chauhan and N. K. Sharma, “VLSI Architecture of Centroid Tracking
Algorithms for Video Tracker,” Proceedings of the 17th IEEE International Conference
on VLSI Design, pp. 697–700, 2004.
[18] J. Shah, “Applications and Implementations of Centroiding using CMOS Image Sen-
sors,” Master’s thesis, University of Waterloo, Ontario, Canada, 2002.
[19] P. J. Pietraski and Z. Zojceski, “Search Fast Centroid Estimation Algorithm for High-
Rate Detectors Based on a Two-Point Gaussian Fit,” IEEE Transactions on Nuclear
Science, vol. 47, no. 4, pp. 1510–1515, 2000.
[20] B. M. Carvalho, G. T. Herman and Y. T. Kong, “Simultaneous Fuzzy Segmentation
of Multiple Objects,” Electronic Notes in Discrete Mathematics, by A. Del Lungo, V.
Di Gesù and A. Kuba, eds., 2003.
[21] Y. Altunbasak, R. Oten and R. J. P. de Figueiredo, “Simultaneous object segmenta-
tion, multiple object tracking and alpha map generation,” Proceedings of the IEEE
International Conference on Image Processing, vol. 1, pp. 69–72, 1997.
[22] Z. Khan, T. Balch and F. Dellaert, “Eﬃcient particle ﬁlter-based tracking of multiple
interacting targets using an mrf-based motion model,” Proceedings of the IEEE/RSJ
International Conference on Intelligent Robots and Systems, vol. 1, pp. 254–259, 2003.
[23] D. Freedman, R. J. Radke, Y. Jeong, T. Zhang and G. T. Y. Chen, “Model-Based
Multi-Object Segmentation via Distribution Matching,” Proceedings of the IEEE
Workshop on Articulated and Nonrigid Motion, p. 11, 2004.
[24] Z. Yu and C. Bajaj, “Image segmentation using gradient vector diﬀusion and region
merging,” Proceedings of the IEEE International Conference on Pattern Recognition,
vol. 2, pp. 941–944, 2002.
[25] T. Constandinou, Bio-inspired Electronics for Micropower Vision Processing. PhD
thesis, Imperial College of Science, Technology and Medicine, University of London,
2005.
[26] C. Thomas, R. Hornsey and K. Yip, “CMOS imager design for fast centroid read-
out,” Proceedings of the Canadian Conference on Electrical and Computer Engineering,
vol. 4, pp. 2315–2318, 2004.
[27] A. Fish, D. Akselrod and O. Yadid-Pecht, “High Precision Image Centroid Computa-
tion via an Adaptive K-Winner-Take-all Circuit in Conjunction with a Dynamic Ele-
ment Matching Algorithm for Star Tracking Applications,” Kluwer Analog Integrated
Circuits and Signal Processing, vol. 39, pp. 251–256, 2004.
[28] B. H. Pio, B. Hayes-Gill, M. Clark, M. G. Somekh, C. W. See, S. Morgan and A. Ng,
“Integration of a Photodiode Array and Centroid Processing on a single CMOS Chip
for a Real-time Shack-Hartmann Wavefront Sensor,” IEEE Sensors Journal, vol. 4,
no. 6, pp. 787–794, 2004.
[29] N. Massari, L. Gonzo, M. Gottardi and A. Simoni, “A Fast CMOS Optical Position
Sensor with High subpixel Resolution,” IEEE Transactions on Instrumentation and
Measurement, vol. 53, no. 1, pp. 116–123, 2004.
[30] R. D. Burns, J. Shah, C. Hong, S. Pepic, J. S. Lee, R. I. Hornsey and P. Thomas,
“Object Location and Centroiding Techniques with CMOS Active Pixel Sensors,” IEEE
Transactions on Electron Devices, vol. 50, no. 12, pp. 2369–2377, 2003.
[31] T. Komuro, I. Ishii, M. Ishikawa and A. Yoshida, “A Digital Vision Chip Specialized for
High-Speed Target Tracking,” IEEE Transactions on Electron Devices, vol. 50, no. 1,
pp. 191–199, 2003.
[32] J. Akita, A. Watanabe, O. Tooyama, M. Miyama, M. Yoshimoto, “An Image Sen-
sor with Fast Objects’ Position Extraction Function,” IEEE Transactions on Electron
Devices, vol. 50, no. 1, pp. 184–190, 2003.
[33] N. Viarani, N. Massari, L. Gonzo, M. Gottardi, D. Stoppa and A. Simoni, “A fast and
low power CMOS sensor for optical tracking,” Proceedings of the IEEE International
Symposium on Circuits and Systems, vol. 4, pp. 796–799, 2003.
[34] R. A. Blum, C. S. Wilson, P. E. Hasler and S. P. DeWeerth, “A CMOS imager with
real-time frame diﬀerencing and centroid computation,” Proceedings of the IEEE In-
ternational Symposium on Circuits and Systems, vol. 3, pp. 329–332, 2002.
[35] M. A. Clapp and R. Etienne-Cummings, “A Dual Pixel-type Array for Imaging and
Motion Centroid Localization,” IEEE Sensors Journal, vol. 2, no. 6, pp. 529–548, 2002.
[36] G. Erten and S. Hagopian, “Integrated image sensor processor with on-chip centroid-
ing,” Proceedings of the IEEE Midwest Symposium on Circuits and Systems, vol. 1,
pp. 262–265, 1999.
[37] G. Indiveri, “Neuromorphic analog VLSI sensor for visual tracking: circuits and appli-
cation examples,” IEEE Transactions on Circuits and Systems II: Analog and Digital
Signal Processing, vol. 46, no. 11, pp. 1337–1347, 1999.
[38] C. S. Wilson, T. G. Morris and S. P. DeWeerth, “A two-dimensional, object-based
analog VLSI visual attention system,” Proceedings of the 20th Conference on Advanced
Research in VLSI, pp. 291–308, 1999.
[39] T. Horiuchi and E. Niebur, “Conjunction search using a 1-D, analog VLSI-based, atten-
tional search/tracking chip,” Proceedings of the 20th Conference on Advanced Research
in VLSI, pp. 276–290, 1998.
[40] V. Brajovic and T. Kanade, “Computational sensor for visual tracking with attention,”
IEEE Journal of Solid-State Circuits, vol. 33, no. 8, pp. 1199–1207, 1998.
[41] R. Etienne-Cummings, V. Gruev and M. Abdel-Ghani, “VLSI Implementation of Mo-
tion Centroid Localization for Autonomous Navigation,” Advances in Neural Informa-
tion Processing Systems, vol. 10, pp. 685–691, 1998.
[42] N. M. Yu, T. Shibata and T. Ohmi, “A Real-Time Center-of-Mass Tracker Circuit Im-
plemented by Neuron MOS Technology,” IEEE Transactions on Circuits and Systems
II: Analog and Digital Signal Processing, vol. 45, no. 4, pp. 495–503, 1998.
Chapter 4
A Distributed Algorithm for
Centroid Detection
4.1 Introduction
This chapter introduces a novel scheme suitable for object-based centroid computation based
on a distributed processing architecture. Although several of the features have been biolog-
ically inspired, the algorithm is fundamentally synthetic. By using this hybrid approach,
a realistically hardware implementable system can be developed beneﬁting from increased
computational eﬃciency provided by the bio-inspired analogue processing elements. The
reduced power consumption enables realisation of mobile diagnosis devices which would
otherwise be technically unachievable.
This chapter begins by describing qualitatively a speciﬁc distributed algorithm suitable
for hardware implementation. A formalisation is then provided by describing the vari-
ous parallel processing functions using mathematical and/or logical expressions. It is then
outlined how this would be implemented in hardware, from a top-level functional perspective. Subsequently, software simulation results provide details of the computational workload, robustness and accuracy. Finally, the algorithm is architecturally generalised to a parallel-processing platform, and other similarly-based distributed algorithms are proposed.
A Distributed Algorithm for Centroid Detection 63
Figure 4.1: Example analysis scenarios where extraction of object centroid and/or size could
provide useful information in (from left to right): (a) Pharmaceutical drug production (b)
Reliability (leak) detection of bubbles in ﬂuids and (c) Microscopic cellular population
analysis
4.2 The Bio-pulsating Contour Reduction Algorithm
An algorithm is proposed for distributed centroid processing and sizing of simple objects
[1]. Circular blob-like objects with uniform texture and an intensity diﬀering from the
background level can be segmented and their size and position (centroid) determined by
means of a distributed binary algorithm. Some example images this can be applied to are
shown in Fig. 4.1.
Possible applications for such perceptive vision processing span many disciplines: from machine vision applications such as production line inspection and reliability detection, through security (asset tracking), surveillance (counting persons passing a boundary), space (star tracking, lunar mapping and vehicle navigation) and military (target tracking), to biomedical analysis (cellular, microbial and neural).
4.2.1 Overview
This algorithm uses an edge-detection technique to form the contours and trigger the data-
driven processing. On detection of an object boundary, the initial state for the signal ﬂow
is set. By propagating an inward ﬁll, the contour can be reduced until this converges to
the centre. The central point is detected by utilising spatiotemporal integration; i.e. a
summation of the cells set within the receptive ﬁeld within a certain time window. On
centroid detection, the object is reset and output transmitted, thus realising an inward
pulsating action. The frequency of pulsation determines the size, i.e. radius of this object.
Fig. 4.2 illustrates this interaction graphically through computer simulations (described
later in detail).
4.2.2 Method
Threshold Detection
Objects are defined as regions in the image with narrow-field (2x2 average with adjacent cells¹) local-average intensity either below (or above) the average level of the input image. This average can be implemented either as a true global average or as a wide-field local average centred on the pixel being tested.
Edge and Contour Detection
The edges are detected (in continuous-time) by computing the diﬀerence in adjacent cel-
lular intensities. In a quad-grid connectivity scheme (i.e. using square cells), each pixel is
compared with its four neighbours. Subsequently contours are formed if a continuous edge
is determined; assuming that a contour is deﬁned as a cell corner adjacent to at least two
detected edges.
State Setting
The pulsating action is initially triggered by setting a cell's (binary) state on detection of a contour, i.e. when it lies on a continuous edge. The reduction is then facilitated by each pixel checking whether any of its neighbouring cells have been set, in addition to the object criterion (threshold for that cell) being satisfied. The rate of this contour (cellular state) reduction is defined by an artificial propagation delay introduced in this event path.
Centroid Detection
In parallel with this contour reduction, each pixel checks whether it satisfies the centre criterion. This is defined as the condition in which the surrounding cells (but not the directly adjacent ones, for reliability) are set and the central pixel remains unset. Such a condition immediately flags a centroid-detection signal, which transmits the pixel co-ordinates off-array and subsequently off-chip, and then issues a localised reset signal.

¹ A cell refers to the minimally-repetitive processing element (pixel circuitry) used for hardware tessellation.

Figure 4.2: Computer simulation results of the bio-pulsating contour reduction algorithm, illustrating continuous-time image processing functions (top row: input image, average (smoothed), contour detection, threshold detection) and snapshots taken at regular time intervals at the propagation delay of the processing (bottom 3 rows, t = 0 to 14).
State Resetting
This reset signal is then back-propagated outwards in a recursive manner similar to the
contour reduction with the absence of the artiﬁcial delay. This delay is not required in
both paths, as the pulsating (or reset) period can be deﬁned by including a delay in either
the forward or back-propagating path. More importantly a swift back-propagating reset
is required to avoid ﬂagging multiple centroids. The resetting action therefore acts by
suppressing neighbouring cells from detecting centroids, thus realising a winner-takes-all
(WTA) type functionality.
An unusual and important feature of this method is the absence of any pre-deﬁned syn-
chronisation signal, for example, a clock. The only synchronisation is obtained through the
data-driven object reset scheme, but on a local rather than a global basis. This, in combination with the artificial delay time-constant, defines the processing time, since the (asynchronous) CMOS digital logic operates with propagation delays in the order of nanoseconds.
4.2.3 Analytical Formalisation
Image Processing Functions
The distributed spike-domain processing is driven by two speciﬁc binary signals generated
within each cell; the THRESHOLD (Th) and CONTOUR (Co) inputs. These serve to
initiate and steer the signal propagation correctly.
The THRESHOLD input is used to facilitate the object segmentation by deﬁning the
valid area the signal can propagate within. Therefore this has the task of ensuring the ﬁll
is propagating inwards and NOT outwards. This input for a particular cell is generated by
comparing its intensity, with the average background (outside object boundaries) intensity.
For object and background intensities that signiﬁcantly diﬀer it is therefore valid to use the
average image intensity as the global threshold point.
As the centroid processing occurs at the pixel corners, a local average of all pixels adjacent to that processing node is required to convey a valid intensity at that point. This has the additional advantage of smoothing the image, thus helping to reduce noise. This local averaging (narrow-field) function can be expressed as given in Eqn. 4.1. Further noise suppression can be achieved by implementing a larger averaging field, e.g. using 9- or 16-pixel squares.
$$
Av_{narrow} = \frac{1}{4} \sum_{i=x}^{x+1} \sum_{j=y}^{y+1} I_{i,j} \tag{4.1}
$$
Similarly, to calculate the global average intensity the summations extend through the whole array, as expressed in Eqn. 4.2.

$$
Av_{global} = \frac{1}{xy} \sum_{i=1}^{x} \sum_{j=1}^{y} I_{i,j} \tag{4.2}
$$

Where: (x, y) are the array dimensions.
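As an illustration of Eqns. 4.1 and 4.2, the following Python/NumPy sketch computes the narrow-field (2x2) average at every pixel corner and the global average of an image. The function names and the `img[row, column]` array layout are assumptions made for this example; they are not taken from the simulator listed in Appendix A.

```python
import numpy as np

def narrow_average(img):
    """Eqn. 4.1: mean of the four pixels sharing each corner.

    For an H x W image the result has shape (H-1, W-1), one value per
    interior pixel corner.
    """
    img = np.asarray(img, dtype=float)
    return 0.25 * (img[:-1, :-1] + img[1:, :-1] + img[:-1, 1:] + img[1:, 1:])

def global_average(img):
    """Eqn. 4.2: mean intensity over the whole array."""
    return float(np.mean(img))

# Usage on a small 3x3 ramp image.
img = np.arange(9).reshape(3, 3)
av_narrow = narrow_average(img)
av_global = global_average(img)
```

Because each output value is a mean of four neighbouring pixels, the same operation also acts as the smoothing step mentioned above.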
However, for images with varying background intensity, for example a gradient due to lighting conditions, it is favourable to perform a wide-field local average. This should cover a large enough area to extend beyond a single object in order to capture the background intensity. An easily hardware-implementable scheme would be to combine all the narrow-field local averages calculated in a cell's column and/or row to calculate the wide-field local average. This would provide a “global” average unique to every pixel as expressed in Eqn. 4.3.
$$
Av_{wide} = \frac{1}{2x} \sum_{i=1}^{x} Av_{narrow}(i,j) + \frac{1}{2y} \sum_{j=1}^{y} Av_{narrow}(i,j) \tag{4.3}
$$
Comparing the wide-field average to the narrow-field average constitutes the required thresholding function, given in Eqn. 4.4.

$$
Th = v_{dd} \cdot H\big(Av_{narrow}(x,y) - Av_{wide}(x,y) + K\big) \tag{4.4}
$$

Where: $v_{dd}$ is the supply voltage, $H(\cdot)$ is the Heaviside function, $Av_{narrow}(x,y)$ and $Av_{wide}(x,y)$ are the narrow- and wide-field local averages centred on the (x, y) pixel and K is a tolerance adjustment constant, included to avoid erroneous thresholding due to noise in images with low contrast.
To increase versatility, the threshold function can be adapted such that the narrow- and wide-field averages are interchangeable, enabling the system to select objects of either higher or lower intensity (relative to the background).
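The wide-field average and thresholding step (Eqns. 4.3 and 4.4) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `select_darker` flag (which swaps the two averages as described above) and the convention H(0) = 0 are choices made for this example only.

```python
import numpy as np

def wide_average(av_narrow):
    """Eqn. 4.3: half the mean along the cell's row plus half the mean along its column."""
    row_mean = av_narrow.mean(axis=1, keepdims=True)
    col_mean = av_narrow.mean(axis=0, keepdims=True)
    return 0.5 * (row_mean + col_mean)  # broadcasts to the shape of av_narrow

def threshold(av_narrow, av_wide, k=0.0, vdd=1.0, select_darker=False):
    """Eqn. 4.4, with the two averages optionally interchanged.

    Assumption: the Heaviside function returns 0 when its argument is exactly 0.
    """
    diff = av_wide - av_narrow if select_darker else av_narrow - av_wide
    return vdd * np.heaviside(diff + k, 0.0)

# Usage: one bright cell on a uniform background.
av_narrow = np.array([[1.0, 1.0, 1.0],
                      [1.0, 5.0, 1.0],
                      [1.0, 1.0, 1.0]])
th = threshold(av_narrow, wide_average(av_narrow))
```

In this example only the bright central cell exceeds its own wide-field average, so it is the only cell for which THRESHOLD is asserted.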
In addition to requiring the THRESHOLD (Th) input to deﬁne the object areas, a CON-
TOUR (Co) input is also necessary to initiate the signalling from object boundaries. This
can be generated by using an edge detection function followed by some basic post-processing
to ensure reliability. To facilitate the edge detection, every pixel must be compared to its
four adjacent neighbours and if the diﬀerential surpasses a certain threshold, an edge has
been detected. On a tessellating basis, each pixel needs to be compared with only two neigh-
bours, for convenience the pixels below and to the right. The corresponding expressions are
given in Eqns. 4.5 and 4.6.
$$
E_{vertical} = v_{dd} \cdot H\big(|\ln(I_{x,y}) - \ln(I_{x,y+1})| - E_{threshold}\big) \tag{4.5}
$$

$$
E_{horizontal} = v_{dd} \cdot H\big(|\ln(I_{x,y}) - \ln(I_{x+1,y})| - E_{threshold}\big) \tag{4.6}
$$
Where: $v_{dd}$ is the supply voltage, $H(\cdot)$ is the Heaviside function, $I_{x,y}$ is the intensity of the (x, y) pixel in the array and $E_{threshold}$ is the pre-defined threshold for flagging an edge condition. Rather than comparing the intensity values directly, their natural logarithms are compared, since this increases the dynamic range of the edge detection.
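The logarithmic edge test of Eqns. 4.5 and 4.6 can be sketched in Python/NumPy as below. The function name, the `img[y, x]` indexing (so that y + 1 is the pixel below and x + 1 the pixel to the right) and the convention H(0) = 0 are assumptions made for this example.

```python
import numpy as np

def detect_edges(img, e_threshold, vdd=1.0):
    """Return (E_vertical, E_horizontal) for an image indexed as img[y, x]."""
    log_i = np.log(np.asarray(img, dtype=float))  # assumes strictly positive intensities
    # Eqn. 4.5: compare each pixel with the pixel below it (index y + 1).
    e_vertical = vdd * np.heaviside(
        np.abs(log_i[:-1, :] - log_i[1:, :]) - e_threshold, 0.0)
    # Eqn. 4.6: compare each pixel with the pixel to its right (index x + 1).
    e_horizontal = vdd * np.heaviside(
        np.abs(log_i[:, :-1] - log_i[:, 1:]) - e_threshold, 0.0)
    return e_vertical, e_horizontal

# Usage: a step between the second and third columns.
img = np.array([[1.0, 1.0, 10.0],
                [1.0, 1.0, 10.0]])
ev, eh = detect_edges(img, e_threshold=1.0)
# Scaling the illumination by 100 leaves the detected edges unchanged.
ev_scaled, eh_scaled = detect_edges(100.0 * img, e_threshold=1.0)
```

Since a difference of logarithms depends only on the intensity ratio, the same edges are flagged when the whole image is scaled, which is the dynamic-range benefit noted above.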
On determining how many edges surround a speciﬁc cell, a contour is deﬁned as a
continuous edge, i.e. when a cell is adjacent to two edges and lies on the outside of an
object. This can be deﬁned as a boolean expression, given in Eqn. 4.7.
Co = (Th) · (A ·B · C) + (A ·B ·D) + (A · C ·D) + (B · C ·D) (4.7)
Where: A, B, C and D are the four edge inputs and Th is the threshold status.
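A minimal sketch of the contour test is given below. It follows the prose definition (a cell corner adjacent to at least two detected edges, gated by the threshold status) rather than transcribing Eqn. 4.7 term by term, so it should be read as an assumption about the intended behaviour. The function name and the `gate` argument, whose polarity (inside or outside the object) is left to the caller, are hypothetical.

```python
import numpy as np

def contour(gate, a, b, c, d):
    """Flag a contour where the gate is true and at least two of the four edges are set.

    gate, a, b, c and d are boolean arrays (or lists) of identical shape.
    """
    edges = np.stack([np.asarray(e, dtype=bool) for e in (a, b, c, d)])
    return np.asarray(gate, dtype=bool) & (edges.sum(axis=0) >= 2)

# Usage: three cells with 2, 1 and 4 detected edges; the gate is off for the third.
co = contour(gate=[True, True, False],
             a=[True, True, True],
             b=[True, False, True],
             c=[False, False, True],
             d=[False, False, True])
```

Only the first cell is flagged: the second has a single edge and the third is blocked by the gate.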
Centroid Detection Functions
Having used the continuous-time analogue processing (See Fig. 4.3) to generate two binary
signals: CONTOUR and THRESHOLD, in turn these are used to feed, control and regulate
the asynchronous logical neuronal network.
Each cell requires three bits of static memory to store its current STATE, RESET and CENTRE activity. These are updated asynchronously depending on the cell's CONTOUR and THRESHOLD inputs, its current STATE, RESET and CENTRE values and the current STATE, RESET and CENTRE values of surrounding cells. The minimum required cellular interconnectivity for implementing this algorithm is illustrated in Fig. 4.4. This particular connectivity is required for the following reasons:
• Each pixel receives four state inputs (from directly adjacent cells) to facilitate the
inward signal propagation.
• Each pixel receives four reset inputs (from directly adjacent cells) to facilitate the
back-propagation signal, i.e. the local reset.
• Each pixel receives eight centre inputs (from directly and diagonally adjacent cells)
to ensure multiple centroids are not detected. This realises a form of inhibition on
centroid detection; preventing neighbouring cells from also registering a centre.
• Each pixel receives four state inputs (from indirectly adjacent cells, i.e. two pixels apart in each direction) to determine the surround status used in centroid determination. The immediately adjacent cells are not used for this purpose as this could pose a reliability issue for objects with an uneven aspect ratio, i.e. those that are not perfectly circular.
Therefore, the internal functionality of a (binary) processing element (cell) can be de-
scribed using state diagrams or boolean expressions. These can be used to deﬁne the
conditions for setting and resetting the various memories (RS ﬂip-ﬂop based).
The Set STATE (SSet) condition is deﬁned in Eqn. 4.8.
Figure 4.3: Front-end continuous-time image-processing functionality. (Blocks shown: input stimulus and intensity profile; narrow-field and wide-field averaging (smoothing); compare and threshold; edge detect. Outputs: contours and thresholds.)
Figure 4.4: Local connectivity required for binary signals (state, reset and centre) to and from every pixel. (Labels: S, C, R for the central cell; S1 to S4 and R1 to R4 for the directly adjacent cells; C1 to C8 for the directly and diagonally adjacent cells; S12, S22, S32 and S42 for the cells two pixels away.)
$$
S_{Set}(t+\tau_s) = Co(t) + Th(t) \cdot \big(S_1(t) + S_2(t) + S_3(t) + S_4(t)\big) \tag{4.8}
$$

Where: $\tau_s$ is the delay time-constant which defines the propagation rate, $Co(t)$ is the current CONTOUR status, $Th(t)$ is the current THRESHOLD status and $S_1(t)$, $S_2(t)$, $S_3(t)$, $S_4(t)$ are the STATE variables of the directly adjacent cells.
The Reset STATE (SReset) and Set RESET (RSet) conditions are identical, i.e. when a cell's RESET memory is set, its STATE memory resets. This occurs either on CENTRE being reset, or on RESET back-propagating (from adjacent cells) during a local reset. This condition is defined in Eqn. 4.9.

$$
R_{Set}(t+\delta t) = S_{Reset}(t+\delta t)
$$

$$
S_{Reset}(t+\delta t) = C(t-\delta t) \cdot \overline{C(t)} + S(t) \cdot \big(R_1(t) + R_2(t) + R_3(t) + R_4(t)\big) \tag{4.9}
$$

Where: $\delta t$ represents the propagation delay of the combinational logic, $S(t)$ is the cell's current STATE value, $C(t)$ is the current CENTRE status and $R_1(t)$, $R_2(t)$, $R_3(t)$, $R_4(t)$ are the RESET variables of the directly adjacent cells.
The RESET memory is configured to self-reset, i.e. operate as a monostable, by feeding the current RESET status through a small delay to the reset input. This ensures all setup and hold times are respected when issuing a back-propagating RESET signal. This condition is defined in Eqn. 4.10.

$$
R_{Reset}(t+\tau_r) = R_{Set}(t) \tag{4.10}
$$

Where: $\tau_r$ is the Set-to-Reset delay time-constant.
The CENTRE memory is set on detection of an OFF-centre, ON-surround condition, i.e. when the surrounding pixels have STATE set and the centre pixel has STATE not set. Furthermore, lateral inhibition (Eqn. 4.11) prevents a cell flagging CENTRE if an adjacent cell's CENTRE status is set. This condition is defined in Eqn. 4.12.

$$
C_{Inhibit}(t+\delta t) = C_1(t) + C_2(t) + C_3(t) + C_4(t) + C_5(t) + C_6(t) + C_7(t) + C_8(t) \tag{4.11}
$$

Where: $C_1(t)$ to $C_8(t)$ are the CENTRE status of the 4 directly and 4 diagonally adjacent cells.

$$
C_{Set}(t+\delta t) = \overline{C_{Inhibit}} \cdot \overline{S(t)} \cdot S_{12}(t) \cdot S_{22}(t) \cdot S_{32}(t) \cdot S_{42}(t) \tag{4.12}
$$

Where: $S_{12}(t)$, $S_{22}(t)$, $S_{32}(t)$ and $S_{42}(t)$ are the STATE values of the cells two to the left, two above, two to the right and two below respectively.
Finally the CENTRE memory is reset when the contour reduction reaches the centre cell. This condition is defined in Eqn. 4.13.

$$
C_{Reset}(t+\delta t) = S(t) \cdot S_1(t) \cdot S_2(t) \cdot S_3(t) \cdot S_4(t) \tag{4.13}
$$

Where: $S_1(t)$, $S_2(t)$, $S_3(t)$ and $S_4(t)$ are the STATE values of the directly adjacent cells.
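The inward fill and centre test can be illustrated with a simplified, synchronous Python/NumPy sketch. It covers only the set-STATE rule (Eqn. 4.8) and the OFF-centre, ON-surround test described above; the reset path and lateral inhibition are omitted. It assumes that all cells update together once per delay period, that THRESHOLD is true inside the object, and that the contour seed is the ring of object cells touching the background. All function names are hypothetical.

```python
import numpy as np

def neighbours(a, d):
    """Return the cells d steps up, down, left and right of every cell (False beyond the array)."""
    h, w = a.shape
    p = np.pad(a, d)  # constant padding, i.e. False for boolean arrays
    return (p[0:h, d:d + w], p[2 * d:2 * d + h, d:d + w],
            p[d:d + h, 0:w], p[d:d + h, 2 * d:2 * d + w])

def fill_until_centre(obj, max_periods=1000):
    """Run the inward fill on a boolean object mask until a centre is flagged.

    Returns (delay periods elapsed, list of flagged (row, col) cells),
    or None if nothing is flagged within max_periods.
    """
    up, down, left, right = neighbours(obj, 1)
    # Assumed contour seed: object cells with at least one non-object neighbour.
    contour = obj & ~(up & down & left & right)
    state = np.zeros_like(obj)
    for period in range(max_periods):
        s2 = neighbours(state, 2)
        # Centre test: own STATE unset, STATE set in the four cells two pixels away.
        centre = ~state & s2[0] & s2[1] & s2[2] & s2[3]
        if centre.any():
            return period, [(int(r), int(c)) for r, c in np.argwhere(centre)]
        s1 = neighbours(state, 1)
        # Set STATE (Eqn. 4.8): contour seed, or object cell with a set neighbour.
        state = state | contour | (obj & (s1[0] | s1[1] | s1[2] | s1[3]))
    return None

# Usage: a disc of radius 4 centred on cell (6, 6) of a 13x13 array.
yy, xx = np.mgrid[0:13, 0:13]
disc = (yy - 6) ** 2 + (xx - 6) ** 2 <= 16
result = fill_until_centre(disc)  # (3, [(6, 6)])
```

For this disc the centre is flagged at the geometric centre after three delay periods; larger objects need more periods, which is the size-to-timing relationship the pulsation relies on.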
4.2.4 Algorithmic Features
• Asynchronous: The distributed nature of this algorithm makes it readily imple-
mentable using asynchronous digital logic. A tuneable (artiﬁcial) propagation de-
lay included in the input-enclosed feedback scheme provides a means of regulating
and controlling the rate of pulsing, when operated in closed-loop mode. Furthermore
such architectures are directly compatible with asynchronous oﬀ-chip communication
protocols such as the address-event representation (AER) scheme.
• Parallel Processing: Being of distributed nature, each cell contains identical processing
elements of relatively low speciﬁcations. This massive parallelism will be shown (later
in Chapter 6) to result in exceptional system performance; including high speed and
computational eﬃciency.
• Real-time: The parallel, distributed nature of the algorithm results in a very fast processing time, making real-time and very high frame-rate² processing feasible.
• Event-driven (spike domain): Image-detail driven processing makes the computational workload dependent on image content. This means that when there are no objects (in the focal plane) to centroid, the computational burden is much reduced.
• Scalable: Identical processing elements working in parallel means scaling up to an
increased array size will not produce a computational bottleneck. Power consumption
is therefore directly proportional to the number of processing elements (pixels).
• Robustness: Parallel processing provides an inherent tolerance to various non-idealities. For example, the effects of fabrication defects, process variations and ill-conditioned data are reduced through relative representation and processing redundancy (discussed later).
• Hardware implementable: This distributed algorithm truly lends itself to hardware implementation in standard CMOS technologies. Combining weak inversion analogue front-end signal conditioning and pre-processing together with distributed asynchronous digital logic for neuronal-like networks makes ultra low power CMOS implementation plausible.

² Continuous-time asynchronous operation does not sample or refresh the image array at regular time intervals as in the case of a conventional imaging chip.

Figure 4.5: Proposed cellular architecture for object-based processing illustrating organisation and connectivity of functional blocks within a quad-pixel arrangement. (Analogue signal processing blocks: PIXEL, EDGE detection, NARROW averaging, WIDE averaging, THRESHOLD detector, CONTOUR detector. Asynchronous digital processing blocks: STATE signalling, RESET signalling, CENTRE detection, OUTPUT handshake. Array level: row and column averages, row and column address readout, and the state, reset, centre bus to/from adjacent cells.)
4.2.5 Implementation
In order to facilitate the contour computation, the processing must occur at the pixel
corners, as illustrated in the pixel-cell architecture shown in Fig. 4.5.
The various functional (cellular-level) blocks, all continuous-time topologies, to be implemented in a standard CMOS technology are given below:
• Light detection: Integrated silicon pn junction (CMOS) photodiode (see Chapter 5 for further details).
• Contour detection: Diﬀerential-input, single (discrete) output (thresholding) edge-
detectors feeding combinational logic for contour detection.
• Threshold detection: Narrow-ﬁeld averaging for input image smoothing and wide-ﬁeld
(or global) averaging for object detection; using current-mode techniques for linear
computations.
• Local resetting: reconﬁgurable (dynamic) switch network (logical), regulated by thresh-
old detector for object segmentation, to provide localised (object-constrained) reset-
ting by back-propagation.
• Neuromorphic logic: performing delay-and-propagate computation for signal ﬂow and
centre-surround-like computation for centroid determination.
• Memory: Single-bit static memories for storage of each cell's signal, reset and centroid states (3 bits per cell).
4.2.6 Simulation
Being both distributed and asynchronous, this algorithm is somewhat complex to model, and its exact behaviour is virtually impossible to simulate using conventional software techniques. However, by making some simplifying assumptions a “frame-based” representation can be derived and simulated.
The basic assumption is that all delay elements are perfectly matched and therefore all pixels in a frame can be exhaustively processed sequentially, for each delay “period”. The sequence of this pixel processing can be implemented either in a scan fashion or, for more realistic functionality, in a random pattern. The simulator was developed using Borland Delphi V4 and its GUI is shown in Fig. 4.6. A full listing of the source code is provided in Appendix A.
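The difference between scan and random processing order can be sketched in Python as follows. This is not the Delphi simulator of Appendix A: the function names, the in-place update (each cell sees the results of cells already visited in the same period) and the toy update rule are all assumptions chosen to show why the visiting order matters.

```python
import random

def process_period(state, update_cell, order="scan", rng=None):
    """Visit every cell once per delay period, updating the array in place."""
    rows, cols = len(state), len(state[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if order == "random":
        (rng or random).shuffle(cells)  # visit cells in a random pattern
    for r, c in cells:
        state[r][c] = update_cell(state, r, c)
    return state

def spread_right(state, r, c):
    """Toy rule: a cell becomes set if it, or its left neighbour, is set."""
    return state[r][c] or (c > 0 and state[r][c - 1])

# Usage: one set cell at the left end of a five-cell row.
scanned = process_period([[True, False, False, False, False]], spread_right, "scan")
shuffled = process_period([[True, False, False, False, False]], spread_right,
                          "random", rng=random.Random(0))
```

With a left-to-right scan the set state sweeps across the whole row within a single period, an artefact of the visiting order; a random order avoids this systematic directional bias.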
Figure 4.6: The ORASIS simulator: screenshot of the developed software simulator for the
bio-pulsating contour reduction algorithm.
Eﬀective Computation
In order to estimate the effective computation of this distributed algorithm, the frame-based equivalent algorithm needs to be evaluated. Although many of the functions are static, i.e. they must be executed each frame for each pixel, there also exist dynamic functions, for example the recursive reset cycle. The static computation is dependent only on the array dimensions, whereas the dynamic computation is largely dependent on the input data.
For a pixel array of dimensions (x, y), with n objects (a ∈ [1, n]) of radius $r_a$ pixels each, the computational load required in processing each frame for the main functions is specified in Table 4.1. This frame-based algorithm could be optimised by combining some of the nested loop functions, thus reducing the computational workload by approximately 5-10%. On the other hand, memory access operations have not been included in determining the computational workload of the algorithm; as these are extensive, it is estimated that they would increase the computational burden by at least 50-100% [2] [3] [4].
For a frame capture and process time of t μs, the complete computational load becomes:

$$
Computation_{(total)} = \frac{1 + 8y + 37xy + 6\sum_{a=1}^{n}(\pi r_a^2)}{t \times 10^{-6}} \tag{4.14}
$$
| Function | Instructions¹ | Computational Load |
|---|---|---|
| Local Averaging | loops², 4 ADD, 1 DIV | y(1 + 5x) |
| Global Averaging | loops², 1 ADD, 1 DIV | 1 + y(1 + 2x) |
| Edge Detection | loops², 4 SUB, 4 COND | y(1 + 9x) |
| Contour Detection | loops², 4 COND | y(1 + 5x) |
| Threshold Detection | loops², 1 COND | y(1 + 2x) |
| State Definition | loops², 6 COND | y(1 + 7x) |
| Centre Definition | loops², 5 COND | y(1 + 5x) |
| Reset Definition | loops², 1 COND and per iteration³, 5 COND | $y(1 + 2x) + 6\sum_{a=1}^{n}(\pi r_a^2)$ |

¹ INC = increment, COND = condition, ADD = addition / summation, SUB = subtraction, DIV = division.
² Double nested loops require y(1+x) INC/COND instructions.
³ Multiple iterations due to recursive operation.

Table 4.1: Split of computational load for various processing functions
Estimating that the image comprises n objects of average radius $r_{av}$ pixels:

$$
Computation_{(total)} \approx \frac{y(37x + 8) + 6n(\pi r_{av}^2)}{t \times 10^{-6}} \tag{4.15}
$$
For computer simulation parameters, with a 100x100 array at 50 fps with 25 objects of average 8-pixel radius:

$$
Computation_{(software)} = 50\big(100(37(100) + 8) + 6(25)(\pi 8^2)\big) = 20.05\ \mathrm{MIPS}
$$

For hardware operation parameters (see Chapter 6 for details), with a 48x48 array at 2000 fps with 5 objects of average 5-pixel radius:

$$
Computation_{(hardware)} = 2000\big(48(37(48) + 8) + 6(5)(\pi 5^2)\big) = 175.21\ \mathrm{MIPS}
$$
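The load estimate of Eqn. 4.14 is straightforward to evaluate numerically. The Python sketch below uses a hypothetical function name and takes the frame rate in frames per second (the reciprocal of the frame time in seconds). Evaluated as written, the software case gives about 20.05 MIPS; the hardware case evaluates to roughly 175.98 MIPS, and the 175.21 MIPS figure quoted above appears to correspond to leaving out the 8y term.

```python
import math

def computation_mips(x, y, radii, fps):
    """Eqn. 4.14 in millions of instructions per second.

    x, y  : array dimensions in pixels
    radii : iterable with the radius (in pixels) of each object
    fps   : frames per second, i.e. 1 / (t * 1e-6) for a frame time of t microseconds
    """
    ops_per_frame = 1 + 8 * y + 37 * x * y + 6 * sum(math.pi * r ** 2 for r in radii)
    return ops_per_frame * fps / 1e6

software = computation_mips(100, 100, [8] * 25, fps=50)
hardware = computation_mips(48, 48, [5] * 5, fps=2000)
```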
The advantage of implementing this algorithm in custom hardware using asynchronous
techniques, is that more functions can be made dynamic and therefore data-driven. This re-
sults in a reduced average computational burden and can therefore contribute to substantial
reductions in power consumption.
In contrast, a state-of-the-art digital signal processor (DSP), for example the Texas Instruments TMS320C5000 series, is quoted [5] to consume 0.25 mW/MIPS. Executing the above-mentioned (hardware) scenario on such a system (excluding imager consumption) would therefore consume at the very least 175.21 × 250 μW = 43.8 mW. Furthermore, the maximum capacity of such power-efficient platforms is limited to 400-600 MIPS. Scaling this algorithm to a larger array size would therefore be at the expense of frame rate, making the realisation of a high-resolution, high-frame-rate system unfeasible using existing DSP technology.
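The power figure and the frame-rate ceiling implied by a fixed MIPS budget can be checked with a short Python sketch. The function names are hypothetical, and the 512x512 array is an illustrative size chosen for this example, not one taken from the text.

```python
import math

def ops_per_frame(x, y, radii):
    """Instructions per frame according to Eqn. 4.14."""
    return 1 + 8 * y + 37 * x * y + 6 * sum(math.pi * r ** 2 for r in radii)

def dsp_power_mw(mips, mw_per_mips=0.25):
    """Power in mW for a DSP consuming mw_per_mips per MIPS."""
    return mips * mw_per_mips

def max_frame_rate(x, y, radii, capacity_mips):
    """Highest frame rate (fps) a processor of the given MIPS capacity could sustain."""
    return capacity_mips * 1e6 / ops_per_frame(x, y, radii)

power = dsp_power_mw(175.21)                     # the hardware scenario quoted above
fps_512 = max_frame_rate(512, 512, [], 600)      # empty 512x512 scene, 600 MIPS budget
```

Even with no objects in the scene, a 600 MIPS budget supports only about 62 frames per second at 512x512 under this estimate, which illustrates the trade-off between array size and frame rate.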
4.2.7 Robustness
The robustness of an algorithm provides an indication of its reliability against hardware fabrication non-idealities, in addition to how immune it is to ill-conditioned or noisy data. Processing non-idealities refers to hardware fabrication defects (e.g. unreliable via connections) in addition to process variations and device mismatches. In terms of this algorithm, process variations would affect the following:
• Component mismatch: This is by far the most critical expected source of errors. Component mismatch would directly affect the photodiode array, causing non-uniformities in offset (referred to as fixed pattern noise) and sensitivity (referred to as speckle noise). Furthermore, non-uniformities in gain elements (transistors) would increase the degradation due to sensitivity mismatch. Beyond the imaging, the edge detection and thresholding electronics would also be affected by component mismatches, resulting in uneven feature extraction.
• Processing Defects: Fabrication defects causing unreliable connections, could result in
various signals being incorrectly conveyed from one cell to its neighbour. Such errors
could cause a cell to ﬂag an erroneous result or cause a break in the propagation cycle.
In order to analyse and evaluate the robustness of this algorithm, the input image
data can be adjusted to include array non-uniformities such as ﬁxed-pattern noise, pixel
sensitivity and feature detection ﬂuctuations. This can be modelled and/or simulated using
various techniques.
Figure 4.7: Acceptable noise margins for error-free binary edge detection and thresholding. (Panels: edge detection and threshold detection. Each panel shows the object and background intensity levels with their local intensity variation, the contrast ratio, the edge detection level or the threshold level relative to the global average, and the tolerable noise and array non-uniformities.)
Analytical Robustness
This involves considering the intensity proﬁles and error tolerances of the input images and
therefore determining the noise margin or signal-to-noise ratio for error-free binary feature
extraction. Subsequently for a given image type, the optimum settings (threshold and edge
levels) can be determined for maximum binary robustness and furthermore the suitability
of the algorithm for diﬀerent image types can be analysed. Finally, the relationship between
distorted binary detection and erroneous centroid/sizing determination can be discussed.
Binary Feature Extraction Robustness
For edge detection (as deﬁned in Eqns. 4.5 and 4.6) to be reliable in an input image consisting
of (relatively) dark objects on a light background, the conditions speciﬁed in Eqns. 4.16 (for
edge-detection) and 4.17 (for no edge-detection) need to be satisﬁed (see Fig. 4.7).
$$
(I_{bg} - I_{obj}) - E_{offset} > I_{margin} + I_{noise} \tag{4.16}
$$

$$
E_{offset} > I_{margin} + I_{noise} \tag{4.17}
$$
Where: (Ibg − Iobj) is the contrast diﬀerence of the objects to background level, Eoffset
is the minimum edge detection level, Imargin is the maximum object (and/or background)
intensity variation and Inoise represents the maximum tolerable level of intensity variations
(non-uniformities) in the array.
Assuming Imargin(background) = Imargin(object) and that Inoise is increased to the maximum allowable level such that Eqns. 4.16 and 4.17 become equalities, the optimum setting for Eoffset can be specified for a given image type as expressed in Eqn. 4.18.

$$
E_{offset} = \frac{1}{2}(I_{bg} - I_{obj}) \tag{4.18}
$$
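The step from Eqns. 4.16 and 4.17 to Eqn. 4.18 can be written out explicitly by equating the two right-hand sides. The numerical line uses the background and object means listed for sample RB01 in Table 4.2 and is included only as an illustration.

```latex
\begin{align*}
(I_{bg} - I_{obj}) - E_{offset} &= I_{margin} + I_{noise} \\
E_{offset} &= I_{margin} + I_{noise} \\
\Rightarrow\quad (I_{bg} - I_{obj}) - E_{offset} &= E_{offset}
\quad\Rightarrow\quad E_{offset} = \tfrac{1}{2}(I_{bg} - I_{obj}) \\
\text{RB01:}\quad E_{offset} &= \tfrac{1}{2}(0.850 - 0.624) = 0.113
\end{align*}
```

At this setting the largest tolerable non-uniformity follows directly from the second equality as $I_{noise} = E_{offset} - I_{margin}$.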
However, the actual robustness to noise that causes erroneous edges to be detected (Eqn. 4.17) is in fact increased by the post-edge-detection CONTOUR logic. This logic effectively screens out any noise triggering erroneous edges within object boundaries by only detecting edges outside object regions. Furthermore, by performing the thresholding operation on the narrow-field local averaged data, the object segmentation is also reliable against noise-induced errors.
For threshold detection (as deﬁned in Eqn. 4.4) to be reliable in an input image consisting
of (relatively) dark objects on a light background, the conditions speciﬁed in Eqns. 4.19 (for
thresholding) and 4.20 (for no thresholding) need to be satisﬁed (see Fig. 4.7).
$$
(I_{av} - I_{obj}) - T_{offset} > I_{margin} + I_{noise} \tag{4.19}
$$

$$
(I_{bg} - I_{av}) + T_{offset} > I_{margin} + I_{noise} \tag{4.20}
$$
Where: (Iav−Iobj) is the margin from average intensity to object intensity level, (Ibg−Iav) is
the margin from background intensity to average intensity level, Toffset deﬁnes the threshold
detection oﬀset from average intensity, Imargin is the maximum object (and/or background)
intensity variation and Inoise represents the maximum tolerable level of intensity variations
(non-uniformities) in the array.
Assuming Imargin(background) = Imargin(object) and that Inoise is increased to the maximum allowable level such that Eqns. 4.19 and 4.20 become equalities, the optimum setting for Toffset can be specified for a given image type as expressed in Eqn. 4.21. Depending on image content, Toffset may take a positive or negative value.

$$
T_{offset} = \frac{1}{2}(2I_{av} - I_{obj} - I_{bg}) \tag{4.21}
$$
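As with the edge level, Eqn. 4.21 follows from equating the left-hand sides of Eqns. 4.19 and 4.20 once both are taken as equalities. The numerical line uses the global, object and background means listed for sample RB01 in Table 4.2, purely as an illustration of a negative offset.

```latex
\begin{align*}
(I_{av} - I_{obj}) - T_{offset} &= (I_{bg} - I_{av}) + T_{offset} \\
2\,T_{offset} &= (I_{av} - I_{obj}) - (I_{bg} - I_{av}) = 2I_{av} - I_{obj} - I_{bg} \\
T_{offset} &= \tfrac{1}{2}(2I_{av} - I_{obj} - I_{bg}) \\
\text{RB01:}\quad T_{offset} &= \tfrac{1}{2}(2 \times 0.728 - 0.624 - 0.850) = -0.009
\end{align*}
```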
Typical numerical values for the tolerances and intensity levels described above can be established by analysing sample images. By performing edge-detection and thresholding operations on sample red-blood-cell images (such as those shown previously in Figs. 4.1 and 4.3), the resulting binary masks can be used to obtain statistical image content data, which is extremely useful in determining the algorithmic robustness to that image type. Such an example analysis is provided in Table 4.2.

| | Sample | RB00¹ | RB01 | RB02 | RB03 | RB04 | RB05 |
|---|---|---|---|---|---|---|---|
| Global | Mean (x̄) | 0.645 | 0.728 | 0.675 | 0.836 | 0.722 | 0.897 |
| Background | Mean (x̄) | 0.794 | 0.850 | 0.843 | 0.980 | 0.971 | 0.990 |
| | Std (σ) | 0.000¹ | 0.023 | 0.029 | 0.030 | 0.044 | 0.015 |
| Object | Mean (x̄) | 0.409 | 0.624 | 0.497 | 0.698 | 0.440 | 0.741 |
| | Std (σ) | 0.000¹ | 0.037 | 0.037 | 0.032 | 0.059 | 0.037 |
| | Cover (%) | 38.52 | 53.76 | 48.50 | 51.11 | 46.85 | 37.32 |
| | Count (N) | 56 | 380 | 291 | 52 | 46 | 41 |
| Edge | Eoffset | 0.193 | 0.113 | 0.089 | 0.069 | 0.141 | 0.078 |
| | Inoise | 0.193 | 0.039 | 0.014 | 0.006 | 0.023 | 0.005 |
| | SNR (dB) | 10.48² | 25.53 | 33.66 | 42.88 | 29.94 | 45.08 |
| Threshold | Toffset | 0.044 | -0.009 | 0.005 | -0.003 | 0.017 | 0.031 |
| | Inoise | 0.197 | 0.039 | 0.104 | 0.074 | 0.164 | 0.082 |
| | SNR (dB) | 10.30² | 25.42 | 16.25 | 21.06 | 12.87 | 20.78 |

¹ Test image with ideal object and background uniformity.
² Indicates minimum possible SNR for a given contrast ratio.

Table 4.2: Example image analysis (red blood cells) for statistical spread in object and background intensity levels. This is used to determine the edge (Eoffset) and threshold (Toffset) levels for optimum robustness.
This data is extracted by thresholding each image at its global average level and
clustering together all pixels within the background and object regions. The statistical
properties of each group can then be computed and the specific feature variations
compared across images. The data shows that the robustness of the
binary operations (edge detection and thresholding) is largely determined by the variance of
the object and background intensities, in addition to array non-uniformities (considered
as static noise). For input images with relatively high variance, the binary segmentation
technique offers poor immunity to array non-uniformities (for example, only 5-10% noise
margin for the red-blood-cell images), whereas for images with relatively low variance it
can provide a noise margin of up to 30-40%.
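As a quick check on Eqn. 4.21, the threshold offsets listed in Table 4.2 can be recomputed from the tabulated global, object and background means. The short Python sketch below does this. The dictionary layout and the function name `t_offset` are illustrative choices rather than part of the original analysis, and the first sample (the ideal test image) is labelled RB00 here. Small residual differences are to be expected from the three-decimal rounding of the table.

```python
# Recompute T_offset (Eqn. 4.21) from the mean intensities listed in Table 4.2.
# The data structure and function name are illustrative assumptions.

def t_offset(i_av, i_obj, i_bg):
    """T_offset = (2*I_av - I_obj - I_bg) / 2, as in Eqn. 4.21."""
    return 0.5 * (2.0 * i_av - i_obj - i_bg)

# sample: (global mean, object mean, background mean, tabulated T_offset)
TABLE_4_2 = {
    "RB00": (0.645, 0.409, 0.794, 0.044),
    "RB01": (0.728, 0.624, 0.850, -0.009),
    "RB02": (0.675, 0.497, 0.843, 0.005),
    "RB03": (0.836, 0.698, 0.980, -0.003),
    "RB04": (0.722, 0.440, 0.971, 0.017),
    "RB05": (0.897, 0.741, 0.990, 0.031),
}

for name, (i_av, i_obj, i_bg, tabulated) in TABLE_4_2.items():
    computed = t_offset(i_av, i_obj, i_bg)
    print(f"{name}: computed {computed:+.4f}, tabulated {tabulated:+.3f}")
```

The sign of the result indicates whether the optimum threshold sits above or below the global average for that image, which is why both positive and negative entries appear in the table.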
Post-Binary Feature Extraction Robustness
The previous section has concentrated on optimising the feature extraction robustness based
on the implemented binary operations and image content. However, a major objective in
implementing distributed algorithms based on a biologically-inspired paradigm is to capture
the inherent robustness, defect-immunity and tolerance to ill-conditioned data found in nature.
Accordingly, this algorithm is shown to increase robustness substantially beyond the expected
analytical feature extraction limit.
This feature can be attributed to the parallel distributed processing, which forms multiple
data flow paths, coupled with data redundancy and compression. In the context of this
algorithm, the data flow is initiated at the object contours and an inward fill is facilitated.
This has the effect of compacting the amount of data being processed, since the “ring” of
cells being processed at any time shrinks with this inward propagation. Furthermore, this
shrinking ring realises a many-to-one mapping and thus introduces massive data redundancy.
For example, if an edge cell does not register a contour, or an internal cell does
not register a threshold, the data propagation path simply tends to route itself around the
erroneous data. This action is illustrated in Fig. 4.8. Similarly, this algorithm makes
defective circuitry redundant, provided it is not incorrectly reporting an initiation condition,
e.g. flagging a contour condition.
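The routing-around behaviour described above can be illustrated with a toy model. The sketch below is not the thesis algorithm or its circuit implementation: it is a simple "grassfire"-style, multi-source breadth-first fill over a 4-connected grid, with the cells reached last taken as the centre estimate. The function names, the 7x7 test object and the choice of which edge cell fails are all assumptions made for illustration.

```python
from collections import deque

def inward_fill(obj, sources):
    """Multi-source breadth-first fill over object cells (4-connected).
    Returns {cell: arrival step} for every cell the wave reaches."""
    arrival = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (nr, nc) in obj and (nr, nc) not in arrival:
                arrival[(nr, nc)] = arrival[(r, c)] + 1
                queue.append((nr, nc))
    return arrival

def last_reached(arrival):
    """Cells reached last: the toy model's centre estimate."""
    t_max = max(arrival.values())
    return sorted(cell for cell, t in arrival.items() if t == t_max)

# A 7x7 square object; cells on its boundary act as contour (edge) cells.
obj = {(r, c) for r in range(1, 8) for c in range(1, 8)}
edges = [(r, c) for (r, c) in sorted(obj) if r in (1, 7) or c in (1, 7)]

ideal = inward_fill(obj, edges)
# Simulate one edge cell that fails to register the contour.
faulty = inward_fill(obj, [e for e in edges if e != (1, 4)])

print(last_reached(ideal))
print(last_reached(faulty))
print(len(faulty) == len(obj))
```

In this toy case the fill still reaches every object cell when one initiating cell is missing, because neighbouring paths cover the gap. The set of last-reached cells still contains the true centre but is no longer a single cell, which loosely mirrors the "reduced accuracy" cases of Fig. 4.8.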
Due to the anticipated complexity of modelling this behaviour analytically, an alternative
technique for examining this added robustness is statistical algorithmic simulation.
[Figure 4.8 panels: (a) error-free operation; (b) tolerance to incomplete thresholding (error-free results); (c) tolerance to incomplete edge detection (error-free results); (d) tolerance to incomplete thresholding (reduced accuracy in results); (e) tolerance to incomplete edge detection (reduced accuracy in results). Legend: no thresholding, thresholding, edge detection, centre detection, signal propagation.]
Figure 4.8: Simulated results demonstrating the algorithm's inherent tolerance to erroneous
(or incomplete) binary feature extraction. Shown are the responses to: (a) perfect binary
extraction, (b,d) incomplete thresholding and (c,e) incomplete edge detection.
Simulated Statistical Robustness
This involves experimentally testing (simulating) the algorithm for various input images
against the various expected sources of error (previously mentioned). To facilitate this, the
input image data is pre-processed to include various levels of pixel-array non-uniformities
and the algorithm is simulated to determine the outcome for each case. By normalising the
results to the image content, this process can be repeated many times, collating statistical
simulation results (in total 18,000 simulation runs are performed). The detailed statistical
procedure that is taken is as follows:
• Input image: In total ten diﬀerent input images have been used with diﬀerent sizes
and frequencies of circular objects. The images were chosen to have ﬂat object and
background texture in order to demonstrate the algorithmic response to array non-
uniformities rather than to non-idealities in image feature uniformity.
• Contrast ratio: Each image is then altered to three set illumination levels, i.e.
object-to-background intensity (contrast) ratios of 10:1, 4:1 and 3:2. Subsequently, the
average image intensity is adjusted to 50% of the maximum illumination level so as to
capture fairly the effects of both positive and negative non-uniformities. Image contrast ratio
is the most crucial parameter when determining algorithmic robustness; therefore, for
fair comparison, only images of identical contrast ratio should be aggregated.
• Noise types: The sample images are subjected to three types of noise: Gaussian,
speckle and salt and pepper noise. These three types have been specifically chosen for their
resemblance to various array non-uniformities in hardware fabrication. In particular,
(1) additive (zero mean) Gaussian noise is used to simulate photoreceptor offsets
(fixed pattern noise), (2) speckle noise represents gain and sensitivity non-uniformities
of phototransduction and amplification elements and (3) salt and pepper noise is used
to model dead or defective photoreceptors or in-pixel circuits with permanent low or
high response.
• Noise power: Each type of noise is generated for twenty preset levels of noise power to
cover 100% intensity spread, i.e. +/- 50% about the zero-noise intensity. It is these
noise levels that shall be used as a reference to algorithmic robustness.
• Multiple simulation: In order to generate a suﬃcient amount of statistical data and
achieve a conclusive trend, each simulation run is repeated using ten diﬀerent noise
sets.
• Algorithm settings: For each image/contrast ratio, a single set of edge and object
threshold levels is chosen based on the optimum dynamic range criteria expressed in
Eqns. 4.18 and 4.21.
• Post simulation analysis: The generated results are averaged (for repeated runs), nor-
malised to image content, i.e. object density (for diﬀerent images) and then averaged
again. The resulting data is then appropriately scaled to report the error in average
object count and size for the three predeﬁned contrast ratios.
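The procedure above can be sketched in a few lines of Python to show how the 18,000 runs arise and what the three noise models look like per pixel. This is only an outline under stated assumptions: the spacing of the twenty noise levels, the use of the level as a standard deviation (Gaussian, speckle) or as a fraction of affected pixels (salt and pepper), and all names such as `add_noise` are hypothetical and are not taken from the original simulation code.

```python
import itertools
import random

IMAGES = range(10)                       # ten input images
CONTRASTS = ("10:1", "4:1", "3:2")       # object-to-background ratios
NOISE_TYPES = ("gaussian", "speckle", "salt_pepper")
LEVELS = [0.5 * (k + 1) / 20 for k in range(20)]   # assumed spacing, up to +/-50%
REPEATS = range(10)                      # ten noise sets per configuration

def add_noise(pixels, kind, level, rng):
    """Return a noisy copy of `pixels` (intensities in [0, 1])."""
    out = []
    for p in pixels:
        if kind == "gaussian":           # additive offset (fixed pattern noise)
            p = p + rng.gauss(0.0, level)
        elif kind == "speckle":          # multiplicative gain/sensitivity error
            p = p * (1.0 + rng.gauss(0.0, level))
        elif kind == "salt_pepper":      # stuck-low or stuck-high pixel
            if rng.random() < level:
                p = rng.choice((0.0, 1.0))
        else:
            raise ValueError(kind)
        out.append(min(1.0, max(0.0, p)))  # clip to the valid intensity range
    return out

runs = list(itertools.product(IMAGES, CONTRASTS, NOISE_TYPES, LEVELS, REPEATS))
print(len(runs))   # 10 * 3 * 3 * 20 * 10 = 18000

rng = random.Random(0)
flat = [0.5] * 8   # a flat patch at 50% of maximum intensity
print(add_noise(flat, "salt_pepper", 0.25, rng))
```

Each tuple in `runs` corresponds to one simulation of the algorithm on a pre-processed image; the averaging and normalisation steps would then be applied over the repeat and image axes.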
These analysed results are illustrated in Fig. 4.9 for Gaussian and speckle noise and
Fig. 4.10 for salt and pepper noise. The good robustness of this algorithm is apparent
from these results. At no point does the algorithm cease to operate or “crash”,
although the accuracy becomes degraded with increased image noise. For typical fixed
pattern noise levels [6] and medium contrast selectivity, an acceptable 2-5% inaccuracy can
be observed, whereas for images with high contrast ratios, higher accuracy can be expected.
The algorithm also proves to be robust to defective pixel outputs, as indicated in the salt
and pepper noise results.
4.2.8 Accuracy
Due to the binary nature of this centroiding/sizing algorithm, the centroid accuracy for
regular objects, e.g. circular objects, is limited to one pixel resolution. Similarly, radial
precision can also be calculated to single pixel accuracy. For irregular objects, however, this
centroid accuracy deteriorates. This algorithm is intended to provide a good estimate of
centroid position and size, not a mathematically accurate centre-of-mass computation. Where
more precise object-centre determination is required, a centre-of-mass/centre-of-gravity
calculation can compute the position to sub-pixel accuracy.
In general, perceptive vision processing tasks need not be highly accurate, since they
are primarily used to direct the attention of additional parallel processes that resolve
the task to higher precision. For example, attention/saliency selection tasks typically have
poor spatial resolution but good temporal resolution, and are succeeded by
processes of high spatial resolution.
Figure 4.9: Statistical simulations to demonstrate robustness to array non-uniformities.
Algorithmic response to additive spatial Gaussian noise representing fixed pattern noise (top)
and random speckle noise representing array gain/sensitivity non-uniformity (bottom).
Figure 4.10: Statistical simulations to demonstrate robustness to array
phototransduction/amplification defects. Algorithmic response to additive salt and pepper noise
representing pixels with permanent low/high response.
4.3 A Bio-inspired Paradigm for Parallel Processing
By taking this specific (centroiding) distributed algorithm and analysing its connectivity
and representation, a general architectural description can be derived. This method
provides a formalisation for the hardware implementation of intricate hybrid³ structures based on
common software techniques. Such techniques include recursion, erosion, dilation and
backtracking, which can only be implemented sequentially using conventional techniques.
Distributed processing can offer significant performance advantages in computational
efficiency, robustness, speed and processing capacity.
4.3.1 The Concept
The underlying concept behind this computational paradigm is to combine two different
processing arrays: a front-end for signal acquisition and binary feature extraction, and
a back-end for binary spatiotemporal processing, with the extracted binary feature template
forming the cross-connection between them. By dissociating the binary signal processing from
the continuous signal conditioning in this way, advanced binary algorithms can be implemented
within a true parallel architecture.
4.3.2 The Architecture
This distributed architecture can be implemented in several integrated sensor acquisition
and processing applications requiring either one or two dimensional arrays.
This architecture can therefore be divided into the following five layers:
• Sensor acquisition layer (direct connectivity): This constitutes the sensor, i.e. the input
transducer including associated signal conditioning circuitry. For example, in an auditory
system this could include a microphone and a pre-amplification stage. This layer
could also include an off-array signal acquisition structure, for example interfacing to
a microelectrode array.
• Signal processing layer (direct and lateral connectivity): This includes continuous
value spatiotemporal filtering, including averaging, differentiating, integrating,
normalising, etc., in preparation for binary extraction.
• Binary extraction layer (direct connectivity): Involves some thresholding functionality,
i.e. a tuneable one-bit converter. This binary feature extraction produces digital
signals (from analogue inputs) for driving the asynchronous logic array.
• Binary processing layer (direct and lateral connectivity): Forms a distributed
asynchronous logic network, with parallel inputs, for binary signal processing and
subsequently high-level feature extraction. Advanced algorithms can include feedback to
the signal processing layer to achieve added adaptivity, tunability or accuracy.
• Array supervisory layer (lateral connectivity): Provides global inputs and control
to all cells and handshakes processed information off-array. Communication with
a conventional processor can provide start data or search functionality.
This generalised distributed processing architecture is illustrated in Fig. 4.11.
³ Hybrid refers to a system using multiple forms of data representation for parallel processing.
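To make the layering concrete, the following Python sketch emulates a one-dimensional array passing through the five layers in turn. It is a sequential software stand-in for what the architecture performs in parallel, asynchronous hardware. Every function name and the particular operations chosen for each layer (mean subtraction, a fixed threshold, an XOR between neighbours) are hypothetical examples, not the circuits described in this thesis.

```python
def sensor_layer(scene):
    # Direct connectivity: one transducer reading per cell (identity here).
    return list(scene)

def signal_processing_layer(x):
    # Lateral connectivity: every cell subtracts the shared array average.
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def binary_extraction_layer(x, threshold=0.0):
    # Direct connectivity: a tuneable one-bit converter per cell.
    return [1 if v > threshold else 0 for v in x]

def binary_processing_layer(bits):
    # Lateral logic: flag a cell whose bit differs from its right-hand neighbour.
    return [bits[i] ^ bits[i + 1] for i in range(len(bits) - 1)] + [0]

def supervisory_layer(flags):
    # Global read-out: report the indices of flagged cells off-array.
    return [i for i, f in enumerate(flags) if f]

def run_array(scene, threshold=0.0):
    x = sensor_layer(scene)
    y = signal_processing_layer(x)
    bits = binary_extraction_layer(y, threshold)
    flags = binary_processing_layer(bits)
    return supervisory_layer(flags)

print(run_array([0.2, 0.2, 0.8, 0.8, 0.8, 0.2]))   # [1, 4]
```

The point of the separation is visible even in this small example: only the first two stages handle continuous values, while everything after the one-bit conversion is pure logic and could be swapped for a different binary algorithm without touching the front-end.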
4.3.3 Neurobiological analogy
The general architecture presented in fact shares several similarities to neurobiology and in
particular the visual system. The following analogies can be made for each of the above
mentioned layers:
• Sensor acquisition layer → Retinal photoreceptor layer: These neurons generate
electrical signals on absorption of photons of light and, through local amplification,
produce inputs suitable for the subsequent signal processing.
• Signal processing layer → Retinal bipolar, horizontal and amacrine layers: These bio-
logical cells form receptive ﬁelds producing parallel spatiotemporally-processed visual
streams for subsequent image-feature extraction.
• Binary extraction layer → Retinal ganglion cell layer: Through “integrate-and-ﬁre”
functionality this neural layer aggregates signal amplitude, converting to the temporal
domain as discrete pulse-frequency data, subsequently transmitted to the brain.
• Binary processing layer → Primary visual cortex (V1): V1 cells are believed to
have orientation-selective functionality, using localised interactions that facilitate signal
propagation [7] in a similar manner to that intended in this binary processing layer.
• Array supervisory layer → Higher order cortical processes: Although structurally
different, these provide high-order perceptive functionality such as saliency and
attention selection, very much performing a supervisory or control role over lower-order
processes.
[Figure 4.11 diagram: five identical cells side by side, each comprising a Sensor, Analogue Signal Processing, 1-bit Converters, Spike Domain Processing and a Control and Array Access Unit; the rows are labelled Sensor Acquisition, Signal Processing, Binary Extraction, Binary Processing and Array Supervision.]
Figure 4.11: The generalised distributed processing architecture; a one dimensional array
illustrating the various functional layers and interconnectivity. Increased performance and/or
added functionality (e.g. localised gain control, noise-shaping, oversampling, etc.) could be
realised through closed loop conversion mechanisms (dotted arrows), although these would
be purely synthetic as opposed to biologically-inspired.
4.3.4 Implementation
This architecture is best suited for implementation in CMOS technology. Possessing pho-
totransductive (photodiodes) or mechanical acoustic (MEMS) elements, CMOS technology
can oﬀer sensor integration; critical for such distributed processing platforms. Furthermore,
implementing this architecture using MOSFET devices operated in the weak inversion re-
gion, coupled with asynchronous digital logic provides the ideal combination for power eﬃ-
cient realisation. Power eﬃciency is extremely important when tessellating several thousand
(or more) cells on a single silicon substrate.
Application
A reconfigurable array processor would perhaps be the most suitable system based on
this architecture. Since logic is easily made reconfigurable and front-end binary feature
extraction functions can be made general, an FPGA-like vision processor could be useful
for a variety of applications. On the other hand, a reconfigurable system would be far
from universally applicable. For well defined tasks with a specific algorithm, a custom
implementation would achieve the optimum performance. The difficulty therefore lies in
developing the distributed algorithm to be implemented, as such algorithms are generally
less intuitive to formulate.
4.4 Emerging technologies
A number of leading research establishments and semiconductor foundries are currently
investing in and developing next generation 3D CMOS technologies [8] [9]. These are expected to
provide multiple stacked substrates with interconnecting vias, in a similar manner to the metal
layers in current processes. Such a technology would have a huge impact on the imaging and
vision chip community. For example, the exposed substrate layer could be specifically devoted
and tailored to imaging, with low substrate doping and near 100% fill factor. Subsequently,
the “underground” layers could contain the processing electronics in a true 3D retinomorphic
arrangement [10].
Although this emerging technology is expected to revolutionise the semiconductor
industry as a whole, it will only be truly beneficial to a very small proportion of it. For example,
most high-performance digital processing systems, which are currently limited by power
dissipation constraints, will not be assisted by advances in such technologies. Looking
ahead, it would therefore be advisable to develop those circuits and systems that will benefit
from such future technologies. It is expected that systems based on distributed architectures,
particularly those for vision applications, will be amongst those that benefit the most from
these emerging technologies.
4.5 Summary
This chapter introduces the concept of the bio-pulsating contour reduction algorithm. This
is a parallel, distributed algorithm performing asynchronous object recognition, breaking
the bottleneck of the traditional, sequential von Neumann computational paradigm. The
globally asynchronous scheme is regulated by employing data-generated local synchronisation,
increasing computational efficiency and improving the signal-to-noise ratio. By
incorporating the processing in the front end, the bandwidth requirements have been reduced
by at least four orders of magnitude.
Through developing a software equivalent technique, the computational complexity has
been estimated and by comparison to state-of-the-art DSP technology a target computa-
tional benchmark has been determined. Furthermore, it has been established that con-
ventional techniques are preferable for oﬀ-line processing applications, providing superior
accuracy. On the other hand, distributed techniques are more suited to realtime, high-speed
and high-resolution processing.
Analysis and simulation of algorithmic robustness have identified two crucial factors:
image content and array non-uniformities. Analysis has determined robustness to be highly
dependent on image type and content. Subsequently, a method for defining the suitability of
the algorithm to a certain image type has been outlined. Furthermore, statistical simulations
have tested the algorithm against different types and levels of spatial noise and have shown
very good robustness to array non-uniformities.
Finally a distributed processing platform has been outlined; based on the bio-pulsating
contour reduction algorithm. This extends the core architecture by generalising the vari-
ous sub-blocks, realising a platform suited to hardware implementation of a wide-range of
distributed algorithms.
References
[1] T. G. Constandinou, T. S. Lande and C. Toumazou, “Bio-pulsating architecture for
object-based processing in next generation vision systems,” IEE Electronics Letters,
vol. 30, no. 16, pp. 1169–1170, 2003.
[2] H. DeMan, F. Catthoor, G. Goosens, J. Vanhoof, J. Van Meerbergen, S. Note and J.
Huisken, “Architecture-driven synthesis techniques for VLSI implementation of DSP
algorithms,” Proceedings of the IEEE, vol. 78, no. 2, pp. 319–335, 1990.
[3] J. M. Rabaey and M. Pedram, Low Power Design Methodologies. Kluwer Academic
Publishers, 1995.
[4] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele and A. Vandecap-
pelle, Custom Memory Management Methodology - Exploration of Memory Organisa-
tion for Embedded Multimedia System Design. Kluwer Academic Publishers, 1998.
[5] Texas Instruments, “TMS320C5000 Power Eﬃcient Digital Signal Processors (DSP’s)
Overview,” http://dspvillage.ti.com/docs/allproducttree.jhtml?pageId=C5, 2005.
[6] S. Kavadias, B. Dierickx, D. Scheﬀer, A. Alaerts, D. Uwaerts and J. Bogaerts, “A
Logarithmic Response CMOS Image Sensor with On-Chip Calibration,” IEEE Journal
of Solid-state Circuits, vol. 35, no. 8, pp. 1146–1152, 2000.
[7] Z. Li, “Visual segmentation by contextual inﬂuences via intracortical interactions in pri-
mary visual cortex,” Network: Computation in Neural Systems, vol. 10, no. 2, pp. 187–
212, 1999.
[8] J. Baliga, “Chips go vertical [3D IC interconnection],” IEEE Spectrum, vol. 41, no. 3,
pp. 43–47, 2004.
[9] R. Islam, C. Brubaker, P. Lindner and C. Schaefer, “Wafer level packaging and 3D
interconnect for IC technology,” IEEE/SEMI Conference and Workshop on Advanced
Semiconductor Manufacturing, pp. 212–217, 2002.
[10] J. Georgiou and A. G. Andreou, “A mixed analog/digital asynchronous processor for
network models of cortical computation,” 9th International Conference on Cognitive
and Neural Systems, 2005.
Chapter 5
Photodiodes in Modern Deep
Sub-Micron CMOS Technology
5.1 Introduction
A major objective in implementing any vision processing system is monolithic integration
of imaging elements with processing circuits. Traditionally, the limiting factor has been
large area overhead for circuit implementation as past CMOS technologies (i.e. of larger
feature sizes) have generally yielded either low-resolution systems or systems of limited
in-pixel processing complexity. In modern technologies (i.e. sub-350nm minimum feature
size), however, this trend has been reversed: although the in-pixel circuits scale in size
and can include higher complexity, the photodetection elements tend not to scale so
favourably [1, 2]. A challenge in developing any vision processing system in modern
deep submicron CMOS technology is therefore to achieve photodetection elements of performance
comparable with those available in past technologies.
This chapter reviews silicon-based phototransduction in CMOS technology speciﬁcally
concerning p-n photodiode implementation, developing an analytical model intended for
deep submicron technologies. A range of devices fabricated in a given technology
(0.18μm CMOS) is then presented and their measured results are discussed and analysed,
particularly with regard to issues related to deep submicron technology. Several key factors are then
identiﬁed and a generic set of design rules outlined for photodiode optimisation. Finally, dif-
ferent interface topologies are reviewed leading to a novel bio-inspired spiking photoreceptor
with a scheme for adaptable/tradable dynamic range, spatial and temporal resolution.
[Figure 5.1 diagram: (a) cross-section with P-type, intrinsic and N-type regions; (b) band diagram labelled ECp, EFp, EVp, ECn, EFn, EVn, Eg and qφbi.]
Figure 5.1: The basic P-I-N diode under zero external bias illustrating: (a) cross-section
and (b) energy band diagram.
5.2 Photodiode Modelling
Much work has already gone into modelling and analysing silicon-based photodiode behav-
iour [3, 4, 5, 6, 7, 8, 9, 10]. This section aims to develop a comprehensive yet concise model;
useful in pre-fabrication design and optimisation of CMOS based P-N junction photodiodes,
particularly targeted towards implementation in deep submicron technologies.
5.2.1 Silicon-based Phototransduction
Traditionally, silicon-based photodiodes have been implemented using
p-type/intrinsic/n-type (P-I-N) structures. Phototransduction occurs when incident photon energy
is absorbed within the intrinsic region, generating an electron-hole pair that is separated and
collected, resulting in a detectable photocurrent. This process can be described by its energy
band diagram, shown in Fig. 5.1.
Incident light radiation of photon energy (hν) greater than Eg (Eg(Si) = 1.1eV)
will result in the excitation of an electron from the valence band (Ev) to the conduction band
(Ec). This process manifests itself as the generation of an electron-hole pair. As the
transition consumes only Eg = 1.1eV, any excess photon energy is dissipated thermally.
As absorption requires the excitation of valence band electrons of a certain energy,
and these are finite in number per unit volume, it depends on the thickness of the
semiconductor. Given an incident photon flux Φ penetrating a thickness Δx and undergoing a
change in photon flux ΔΦ, the absorption coefficient is defined as:
$$\alpha = -\frac{\Delta\Phi}{\Phi\,\Delta x} \tag{5.1}$$
Where α is the absorption coefficient, Φ = (λ/hc)(Popt/A) is the photon flux, Popt is the
incident optical power, A is the cross-section area and the negative sign represents an
attenuation. Integrating Eqn. 5.1 for illumination of constant wavelength gives the
Beer-Lambert law, which describes the transmitted photon flux as decreasing exponentially with depth:
$$\Phi(x) = \Phi_0 e^{-\alpha x} \tag{5.2}$$
Where Φ0 is the initial photon flux (i.e. at x = 0) and x is the depth. Alternatively, expressing
the absorption coefficient as a function of wavelength yields the relationship:
$$\alpha \approx \frac{2\omega n_i}{c} = \frac{4\pi n_i}{\lambda} \tag{5.3}$$
Where ω is the angular frequency and λ the wavelength of the incident radiation, and ni is the
imaginary part of the material's refractive index. This expression reveals a significant
relationship: the longer the wavelength, the deeper it can penetrate into a given material.
Furthermore, the penetration depth, derived from Eqn. 5.2, is defined as the depth at which
light of a given wavelength has been attenuated by 63%, i.e. to 1/e of its original value.
This is expressed as: y = α^{-1}.
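The relationship between Eqn. 5.2 and the penetration depth can be checked numerically. The sketch below evaluates the Beer-Lambert law at a depth of one penetration depth. The absorption coefficient used is an arbitrary illustrative value, not a measured figure for silicon at any particular wavelength, and the function names are assumptions.

```python
import math

def photon_flux(phi0, alpha, x):
    """Beer-Lambert law (Eqn. 5.2): flux remaining at depth x."""
    return phi0 * math.exp(-alpha * x)

def penetration_depth(alpha):
    """Depth at which the flux has fallen to 1/e of its surface value."""
    return 1.0 / alpha

alpha = 1.0e4   # assumed absorption coefficient in 1/cm (illustrative only)
phi0 = 1.0      # normalised surface flux
y = penetration_depth(alpha)                      # in cm
remaining = photon_flux(phi0, alpha, y) / phi0    # fraction left at depth y

print(f"penetration depth = {y * 1e4:.2f} um")
print(f"flux remaining = {remaining:.3f}, absorbed = {1 - remaining:.3f}")
```

At x = 1/α roughly 37% of the flux remains and 63% has been absorbed, which is the attenuation figure quoted in the text.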
5.2.2 The P-N Junction Photodiode
As standard deep-submicron CMOS technologies typically do not include process steps for
creating defined intrinsic layers for fabricating P-I-N photodiode structures, parasitic P-N
junctions are generally used for phototransduction. At the P-N junction, majority carrier
diffusion gives rise to a depletion layer that has pseudo-intrinsic properties and is
therefore suitable for photo-absorption, generating electron-hole pairs. This diffusion
is due to the majority carrier concentration gradient, with holes diffusing from the p-type
region to the n-type region and electrons vice versa. As a result, the region near the P-N
junction is depleted of majority carriers, hence the term depletion region (Fig. 5.2a,b).
Depletion Layer [3]
As the space charge region (depletion layer) has no free charge carriers, the depleted n-type
region has a net positive charge and the depleted p-type region has a net negative charge
(Fig. 5.2c). This gives rise to an internal junction “built-in” voltage, expressed in Eqn. 5.4.
$$\phi_{bi} = \frac{kT}{q}\ln\left(\frac{N_A N_D}{n_i^2}\right) \tag{5.4}$$
Where φbi is the junction built-in voltage, kT/q (= φt) is the thermal voltage, NA and ND are
the acceptor and donor impurity atom concentrations and ni is the intrinsic carrier
concentration.
Once this built-in potential has been determined (Fig. 5.2e), the width of the depletion
region can be calculated.
$$x_w = \sqrt{\frac{2\varepsilon_0\varepsilon_{r(Si)}}{q}\left(\frac{1}{N_A}+\frac{1}{N_D}\right)\left(\phi_{bi}-v_b\right)} \tag{5.5}$$
Where xw is the depletion width, ε0 is the permittivity of vacuum, εr(Si) is the relative
permittivity of silicon and vb is the applied bias. Consequently, the depletion region can
be made to increase in width by applying a reverse bias (vb < 0); the higher the reverse bias,
the wider the depletion region.
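Eqns. 5.4 and 5.5 are straightforward to evaluate numerically, which gives a feel for the magnitudes involved. The Python sketch below does so for an n+/p-substrate junction. The doping levels, the intrinsic carrier concentration and the relative permittivity of silicon are assumed, textbook-style values chosen for illustration and are not taken from the process used in this work.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
Q = 1.602176634e-19       # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
EPS_R_SI = 11.7           # relative permittivity of silicon (typical value)
N_I = 1.0e16              # intrinsic carrier concentration, m^-3 (assumed, ~300 K)

def built_in_voltage(n_a, n_d, temp=300.0):
    """Eqn. 5.4: phi_bi = (kT/q) ln(N_A N_D / n_i^2); concentrations in m^-3."""
    return (K_B * temp / Q) * math.log(n_a * n_d / N_I ** 2)

def depletion_width(n_a, n_d, v_b=0.0, temp=300.0):
    """Eqn. 5.5: depletion width in metres; v_b < 0 for reverse bias."""
    phi_bi = built_in_voltage(n_a, n_d, temp)
    return math.sqrt(2.0 * EPS0 * EPS_R_SI / Q
                     * (1.0 / n_a + 1.0 / n_d) * (phi_bi - v_b))

# Assumed n+/p-substrate doping levels (illustrative, not from the thesis).
N_A = 1.0e21   # m^-3 (1e15 cm^-3)
N_D = 1.0e26   # m^-3 (1e20 cm^-3)

print(f"phi_bi = {built_in_voltage(N_A, N_D):.3f} V")
for v_b in (0.0, -1.0, -1.8):
    width_um = depletion_width(N_A, N_D, v_b) * 1e6
    print(f"v_b = {v_b:+.1f} V -> x_w = {width_um:.2f} um")
```

Because 1/NA dominates 1/ND for a heavily doped n+ region, the depletion width is set almost entirely by the lightly doped side, and it grows with the square root of the total junction potential as the reverse bias is increased.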
Furthermore, this depletion region gives rise to an internal electric field (Fig. 5.2d) due
to the static charge differential. This is crucial in separating photo-generated electron-hole
pairs and thus collecting the photocurrent. The magnitude of this internal electric field at
the abrupt junction interface is expressed in Eqn. 5.6.
$$E_0 = -\frac{qN_D x_{wn}}{\varepsilon_0\varepsilon_r} = -\frac{qN_A x_{wp}}{\varepsilon_0\varepsilon_r} \tag{5.6}$$
Where E0 is the maximum internal electric field and xwn and xwp are the depletion region
widths in the n- and p-type regions respectively.
[Figure 5.2 diagram: (a) cross-section with P-type, depletion and N-type regions; (b) carrier concentrations n, p with doping levels NA and ND; (c) space charge density ρ(x) with levels qNA and qND; (d) electric field E(x); (e) potential φ(x); the x-axis is marked at -xwp and xwn.]
Figure 5.2: The basic P-N junction diode under zero external bias illustrating: (a) cross-section,
(b) n/p concentration profile, (c) space charge density, (d) electric field and (e)
internal potential.
Photocurrent Density [11]
The electron-hole pair generation rate can be directly derived from the Beer-Lambert law
(Eqn. 5.2), giving:
$$G(x) = -\frac{d\Phi(x)}{dx} = \alpha\Phi_0 e^{-\alpha x} \tag{5.7}$$
The photocurrent density (Jphoto) consists of two components: the drift current (Jdrift),
due to carriers generated within the depletion region, and the diffusion current (Jdiffusion),
due to carriers generated outside the depletion region. Assuming an initial photon flux (Φ0) at
the edge of the depletion region, the expression for the drift current component becomes:
$$J_{drift} = -q\int_0^{x_d} G(x)\,dx = q\left(\Phi(x_d)-\Phi_0\right) = -q\Phi_0\left(1-e^{-\alpha x_d}\right) \tag{5.8}$$
Where xd is the depletion layer width along the axis of the incident light radiation. For a
vertical junction this would be equal to the depletion layer width (i.e. xd = xw), whereas
for a lateral junction this would be equal to the junction depth (i.e. xd = xz).
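For completeness, the intermediate step in Eqn. 5.8 can be written out explicitly by substituting the generation rate of Eqn. 5.7 into the integral. The two limiting cases that follow are standard consequences of the closed form and are added here only as an aid to intuition.

```latex
\begin{aligned}
J_{drift} &= -q\int_0^{x_d} \alpha\Phi_0 e^{-\alpha x}\,dx
           = -q\Phi_0\left[-e^{-\alpha x}\right]_0^{x_d}
           = -q\Phi_0\left(1 - e^{-\alpha x_d}\right) \\[4pt]
\alpha x_d \ll 1 &: \quad J_{drift} \approx -q\Phi_0\,\alpha x_d
   \quad \text{(thin depletion layer, current proportional to its width)} \\
\alpha x_d \gg 1 &: \quad J_{drift} \approx -q\Phi_0
   \quad \text{(all incident photons absorbed within the depletion layer)}
\end{aligned}
```

The first limit shows why a wider depletion region (for example under reverse bias) increases the drift photocurrent, particularly at longer wavelengths where α is small.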
The diffusion current consists of two sub-components: the reverse diffusion current
density under dark conditions (Jdark) and the photogenerated contribution from the substrate
(Jdiff,vert and Jdiff,horiz for vertical and lateral diffusion). The expression for dark current
density can be derived from:
$$J_{dark} = J_s = \frac{qD_n n_{p0}}{L_n} + \frac{qD_p p_{n0}}{L_p} \tag{5.9}$$
Where Js is the saturation current density, $L_n = \sqrt{D_n\tau_n}$ and $L_p = \sqrt{D_p\tau_p}$ are the electron
and hole diffusion lengths, Dn = φtμn and Dp = φtμp are the electron and hole diffusion
constants, μn and μp are the electron and hole mobilities and np0 and pn0 are the equilibrium
minority carrier densities in the p- and n-type regions respectively.
Assuming that either the n- or p-type region is much more heavily doped, i.e. in the
case of a diode to the p-substrate (n+/p-), ND ≫ NA, and that Vr > 3φt, where Vr is the
reverse bias, Eqn. 5.9 can be reduced to:
$$J_{dark} \approx q\left(\sqrt{\frac{D_n}{\tau_n}}\,\frac{n_i^2}{N_A} + \frac{n_i x_w}{\tau_g}\right) \tag{5.10}$$
Where τg is the generation lifetime [3].
The photogenerated diffusion contribution from the substrate can be expressed as
one-dimensional diffusion equations for vertical and horizontal junctions, given in Eqns. 5.11 and
5.12 respectively.
$$J_{diff,vert} = -q\Phi_0\left(\frac{\alpha L_p}{1+\alpha L_p}\right)e^{-\alpha x_w} \tag{5.11}$$
$$J_{diff,horiz} = -q\Phi_0\left(1-\frac{x_{xpi}-x_x}{2L_p}\right) \tag{5.12}$$
Where Lp is the minority carrier diffusion length in the substrate, xxpi is the total
photodiode width (x-pitch), xx is the junction width (lateral) and xj is the junction depth. The
geometric design parameters are illustrated in Fig. 5.4.
The total photocurrent can therefore be expressed separately for the vertical (Eqn. 5.13)
and horizontal (Eqn. 5.14) junctions, each consisting of its drift and diffusion components:
$$I_{photo,vert} = x_x x_y\left(J_{drift,vert}+J_{diff,vert}+J_{dark}\right) \tag{5.13}$$
$$I_{photo,horiz} = 2x_w\left(x_x+x_y\right)J_{drift,horiz} + \left(x_{xpi}x_{ypi}-x_x x_y\right)J_{diff,horiz} + 2x_j\left(x_x+x_y\right)J_{dark} \tag{5.14}$$
Where xypi is the total photodiode length (y-pitch).
These expressions do not account for the fact that charge carriers generated near the
semiconductor surface are likely to recombine due to surface eﬀects. As a result, these
expressions are likely to evaluate higher current densities for short wavelength radiation
than real devices are expected to measure.
5.2.3 Photodiode Eﬃciency
Having deﬁned the photocurrent density for a P-N junction region, the photodiode eﬃciency
can be determined by including the device geometric and physical design parameters.
The two main factors aﬀecting the overall photodiode (or quantum) eﬃciency are the
optical and dimensional eﬃciencies. The optical interface will generally attenuate and
redistribute the incident light radiation, whereas the internal silicon P-N junction structure
can be designed to optimally absorb and detect the remaining photon ﬂux.
Optical Transmission [12]
A crucial factor in determining a photodiode's overall efficiency is the transmission through
its optical interface. In CMOS technologies this generally includes a metal mask on the top
metal layer, shielding non-photodetecting elements from incident light radiation, and a
transparent passivation/field-oxide layer acting as the air/coating/substrate interface. A
typical profile of a modern CMOS technology is illustrated in Fig. 5.3.
Consider a substrate of refractive index nS coated with a dielectric layer of refractive
index nC and thickness t, where light of wavelength λ strikes the coating surface from
the air (n0 = 1). The amplitude reflection coefficients of the air/coating (SiO2) and
coating (SiO2)/substrate (Si) interfaces are given in Eqns. 5.15 and 5.16.
$$r_{0C} = \frac{n_C - n_0}{n_C + n_0} \qquad (5.15)$$

$$r_{CS} = \frac{n_S - n_C}{n_S + n_C} \qquad (5.16)$$
Where r0C is the complex amplitude reﬂection coeﬃcient for the air/coating interface and
rCS is the complex amplitude reﬂection coeﬃcient for the coating/substrate interface.
The propagation of the incident light within the coating introduces a phase shift,
described by:

$$\phi_C = 4\pi \frac{t}{\lambda_0} n_C \qquad (5.17)$$
Where φC is the optical phase shift, λ0 is the incident radiation wavelength in vacuum and
t is the coating thickness.
[Figure 5.3 (image not reproduced). Recoverable labels: poly 2 kÅ; metal 1 4.8 kÅ; metals 2 to 5
5.8 kÅ; metal 6 8.6 kÅ / 20.6 kÅ; SiO2 inter-metal dielectric layers labelled 4.1 and 3.5;
PSG 5 kÅ (3.9); PE-Si3N4 7 kÅ (7.5); MMC metal; coating thickness t.]
Figure 5.3: Cross section of a typical modern deep submicron CMOS technology showing
the stacked metal layers with insulating dielectrics and optical transmission path (right).

[Figure 5.4 (image not reproduced). Dimension labels: Ln, Lp, xw, xz, xx, xy, xxpi, xypi.]
Figure 5.4: Basic geometric dimensions (design and technology/bias defined) for a single
junction device representing the surface (left) and cross-section (right) views.
Assuming that all the materials are lossless and that the substrate is semi-inﬁnite, the
reﬂection coeﬃcient can be expressed as [13, 8]:
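$$r = \left| \frac{r_{0C} + r_{CS} e^{i\phi_C}}{1 + r_{0C} r_{CS} e^{i\phi_C}} \right|^2 = 1 - \left( \frac{\left( 1 - r_{0C}^2 \right) \left( 1 - r_{CS}^2 \right)}{1 + r_{0C}^2 r_{CS}^2 + 2 r_{0C} r_{CS} \cos\phi_C} \right) \qquad (5.18)$$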
Where r is the effective reflectance of the air/coating/substrate optical interface. Subsequently
the optical efficiency (ηo) is defined as the transmission, i.e. ηo = T = 1 − r.
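The single-layer interference model of Eqns. 5.15 to 5.18 can be sketched numerically as follows. This is a minimal illustration rather than the simulation code used for the results in this chapter; the function names are hypothetical, and the refractive indices and coating thickness in the usage lines are assumed round numbers, not process data. Both forms of Eqn. 5.18 are evaluated so that they can be checked against each other.

```python
import cmath
import math


def interface_coefficients(n_c, n_s, n_0=1.0):
    """Amplitude reflection coefficients of Eqns. 5.15 and 5.16."""
    r_0c = (n_c - n_0) / (n_c + n_0)
    r_cs = (n_s - n_c) / (n_s + n_c)
    return r_0c, r_cs


def reflectance(n_c, n_s, t, wavelength, n_0=1.0):
    """Effective reflectance r from the complex form of Eqn. 5.18."""
    r_0c, r_cs = interface_coefficients(n_c, n_s, n_0)
    phi_c = 4.0 * math.pi * t * n_c / wavelength  # phase shift, Eqn. 5.17
    phase = cmath.exp(1j * phi_c)
    return abs((r_0c + r_cs * phase) / (1.0 + r_0c * r_cs * phase)) ** 2


def reflectance_closed_form(n_c, n_s, t, wavelength, n_0=1.0):
    """Same quantity from the real-valued right-hand side of Eqn. 5.18."""
    r_0c, r_cs = interface_coefficients(n_c, n_s, n_0)
    phi_c = 4.0 * math.pi * t * n_c / wavelength
    numerator = (1.0 - r_0c ** 2) * (1.0 - r_cs ** 2)
    denominator = 1.0 + (r_0c * r_cs) ** 2 + 2.0 * r_0c * r_cs * math.cos(phi_c)
    return 1.0 - numerator / denominator


def optical_efficiency(n_c, n_s, t, wavelength, n_0=1.0):
    """Optical efficiency (transmission) as one minus the effective reflectance."""
    return 1.0 - reflectance(n_c, n_s, t, wavelength, n_0)


if __name__ == "__main__":
    # Assumed illustrative values: oxide-like coating on a silicon-like substrate.
    N_COATING, N_SUBSTRATE, THICKNESS = 1.46, 4.0, 1.0e-6
    for wavelength_nm in (450, 550, 650):
        eta_o = optical_efficiency(N_COATING, N_SUBSTRATE, THICKNESS, wavelength_nm * 1e-9)
        print(f"{wavelength_nm} nm: optical efficiency = {eta_o:.3f}")
```

Thickness and wavelength only need to share a length unit. With zero coating thickness the expression reduces to the bare air/substrate reflectance, which is a convenient sanity check.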
Geometric Substrate Utilisation
The second factor affecting overall photodiode efficiency, for a given illumination area, is the
geometric substrate utilisation, which depends on the depletion layer and diffusion region
volume, depth and exposed surface.
Any three dimensional p-n photodiode structure can be modelled as two parallel devices;
one representing the vertical junctions and one representing the horizontal (lateral)
junctions. The basic geometric dimensions for a single junction device are illustrated in
Fig. 5.4.
Vertical pn-junction efficiency can be determined from the derived photocurrent expression
(Eqn. 5.13) by including the absorption effect for a junction beneath the surface.
Similarly, horizontal pn-junction efficiency can be determined based on the derived lateral
photocurrent expression (Eqn. 5.14) assuming the remaining substrate region (i.e. outside
the junction area but still within the photodiode “allocation”) can contribute to the lateral
diffusion current.
External Quantum Eﬃciency
The external quantum efficiency defines the number of electron-hole pairs contributing
towards the photocurrent for every incident photon. The external quantum efficiencies
have been determined separately for each surface of the pn-junction and are then combined
to give the resultant photocurrent.
The external quantum eﬃciency for a vertical pn-junction (Eqn. 5.19) is the classic
expression often used in photodiode designs with relatively low edge eﬀect (high vertical to
lateral ratio).
$$\eta_{vert} = (1 - R)\,\zeta \left( 1 - \frac{e^{-\alpha x_w}}{1 + \alpha L} \right) e^{-\alpha x_z} \qquad (5.19)$$
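Where ηvert is the external and ζ is the internal quantum efficiency for the vertical surface
of a pn-junction, 1 − R is the optical efficiency (R being the effective reflectance of the
optical interface, Eqn. 5.18), L is the minority carrier diffusion length and xz is the
junction depth.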
The lateral external quantum eﬃciency is deﬁned using separate expressions for the
lateral drift (Eqn. 5.20) [11] and diﬀusion (Eqn. 5.21) contributions.
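$$\eta_{horiz,drift} = (1 - R)\,\zeta \left( 1 - \frac{e^{-\alpha (x_z + x_w)}}{1 + \alpha L} \right) \qquad (5.20)$$

$$\eta_{horiz,diff} = (1 - R)\,\zeta \left( 1 - \frac{x_{xpi} - x_x}{2L} \right) \qquad (5.21)$$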
Where ηhoriz,drift and ηhoriz,diff are the external (drift and diﬀusion) and ζ is the internal
quantum eﬃciency for the horizontal (lateral) surface of a pn-junction.
These quantum eﬃciency expressions can then be combined by considering the pro-
portion of incident photon ﬂux on each junction depletion (or diﬀusion) region, yielding
Eqn. 5.22.
$$\eta_{eff} = \left( \frac{(x_x x_y)\, \eta_{vert}}{x_{xpi} x_{ypi}} \right) + \left( \frac{2 x_w (x_x + x_y)\, \eta_{horiz,drift}}{x_{xpi} x_{ypi}} \right) + \left( \frac{(x_{xpi} x_{ypi} - x_x x_y)\, \eta_{horiz,diff}}{x_{xpi} x_{ypi}} \right) \qquad (5.22)$$
Where xx is the junction width, xy is the junction length, and xw is the depletion width.
This assumes that all exposed substrate area is utilised either by vertical or horizontal pn
junction depletion layer.
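As a rough sketch of how Eqns. 5.19 to 5.22 fit together, the area-weighted combination can be written as below. The function and argument names are hypothetical, the exponentials are taken in their decaying form (negative exponents), and `transmission` stands for the optical efficiency 1 − R. All lengths must share one unit, with the absorption coefficient `alpha` in the inverse of that unit.

```python
import math


def eta_vert(alpha, x_w, x_z, diff_length, transmission=1.0, zeta=1.0):
    """Vertical-junction external quantum efficiency (Eqn. 5.19)."""
    collected = 1.0 - math.exp(-alpha * x_w) / (1.0 + alpha * diff_length)
    return transmission * zeta * collected * math.exp(-alpha * x_z)


def eta_horiz_drift(alpha, x_w, x_z, diff_length, transmission=1.0, zeta=1.0):
    """Lateral drift contribution (Eqn. 5.20)."""
    collected = 1.0 - math.exp(-alpha * (x_z + x_w)) / (1.0 + alpha * diff_length)
    return transmission * zeta * collected


def eta_horiz_diff(x_xpi, x_x, diff_length, transmission=1.0, zeta=1.0):
    """Lateral diffusion contribution (Eqn. 5.21)."""
    return transmission * zeta * (1.0 - (x_xpi - x_x) / (2.0 * diff_length))


def eta_eff(x_x, x_y, x_xpi, x_ypi, x_w, e_vert, e_drift, e_diff):
    """Area-weighted combination of the three efficiencies (Eqn. 5.22)."""
    pitch_area = x_xpi * x_ypi
    vertical = x_x * x_y * e_vert
    lateral_drift = 2.0 * x_w * (x_x + x_y) * e_drift
    lateral_diff = (pitch_area - x_x * x_y) * e_diff
    return (vertical + lateral_drift + lateral_diff) / pitch_area
```

When the junction fills the whole pitch and the depletion width is neglected, only the vertical term remains, which matches the classic single-junction case described in the text.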
Photoresponse
Subsequently, the device responsivity provides an expression for the photoresponse per unit
irradiance by including the electronic charge and photon energy.
$$R = \frac{\eta_{eff} \cdot q}{h\nu} = \frac{\eta_{eff} \cdot \lambda}{1.24} \qquad (5.23)$$
Where R is the responsivity and λ is the incident radiation wavelength (in μm).
$$I_\phi = R \cdot Q_\lambda + J_{dark} \left[ x_x x_y + 2 x_z (x_x + x_y) \right] \qquad (5.24)$$
Where Iφ is the photocurrent generated by incident (monochromatic) radiation of Qλ irra-
diance (W/m2) and Jdark is the dark current density deﬁned in Eqn. 5.10.
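The responsivity relation of Eqn. 5.23 is simple enough to check numerically. The sketch below evaluates it and also inverts it to estimate a quantum efficiency from a measured photocurrent. The helper names are hypothetical, and the inversion assumes the photocurrent is divided by the optical power falling on a stated illuminated area, which is an assumption about how a measurement would be normalised rather than a procedure given here.

```python
def responsivity(eta_eff, wavelength_um):
    """Responsivity in A/W from Eqn. 5.23, with the wavelength in micrometres."""
    return eta_eff * wavelength_um / 1.24


def quantum_efficiency(photocurrent, irradiance, area, wavelength_um):
    """Invert Eqn. 5.23 for a measured photocurrent.

    photocurrent in A, irradiance in W/m^2, area in m^2 (assumed illuminated area).
    """
    measured_responsivity = photocurrent / (irradiance * area)
    return measured_responsivity * 1.24 / wavelength_um


if __name__ == "__main__":
    # Unity quantum efficiency at 550 nm as a reference point.
    print(f"R(eta = 1, 550 nm) = {responsivity(1.0, 0.55):.4f} A/W")
```

Note that 0.45mW/cm2 corresponds to 4.5 W/m2 when using SI units for the irradiance argument.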
The photodiode capacitance can be reduced to the junction depletion layer capacitance
including both base and sidewall junctions; expressed in Eqn. 5.25.
$$C_j = \frac{\varepsilon \left[ x_x x_y + 2 x_z (x_x + x_y) \right]}{x_w} \qquad (5.25)$$
Where Cj is the reverse bias junction capacitance. As previously mentioned, the depletion
width can be modulated by reverse bias (Eqn. 5.5), thus the higher the reverse bias, the
wider the depletion region and consequently the smaller the junction capacitance.
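The bias dependence described above can be illustrated with a short sketch. Since Eqn. 5.5 is not reproduced in this section, the textbook abrupt-junction depletion width is assumed in its place. The intrinsic carrier density, the temperature and the junction dimensions in the usage lines are assumed illustrative values, and doping is assumed to be quoted in cm^-3 (converted to m^-3 by a factor of 10^6).

```python
import math

Q = 1.602176634e-19                 # electronic charge (C)
K_B = 1.380649e-23                  # Boltzmann constant (J/K)
EPS_SI = 11.7 * 8.8541878128e-12    # permittivity of silicon (F/m)
N_I = 1.0e16                        # assumed intrinsic carrier density near 300 K (m^-3)


def built_in_potential(n_a, n_d, temperature=300.0):
    """Built-in potential of an abrupt junction (doping in m^-3)."""
    return (K_B * temperature / Q) * math.log(n_a * n_d / N_I ** 2)


def depletion_width(n_a, n_d, v_reverse, temperature=300.0):
    """Abrupt-junction depletion width in metres (assumed stand-in for Eqn. 5.5)."""
    v_total = built_in_potential(n_a, n_d, temperature) + v_reverse
    return math.sqrt(2.0 * EPS_SI / Q * (1.0 / n_a + 1.0 / n_d) * v_total)


def junction_capacitance(x_x, x_y, x_z, x_w):
    """Base plus sidewall depletion capacitance (Eqn. 5.25), all lengths in metres."""
    area = x_x * x_y + 2.0 * x_z * (x_x + x_y)
    return EPS_SI * area / x_w


if __name__ == "__main__":
    # n-well/p-substrate doping from Table 5.1, assuming units of cm^-3.
    N_A, N_D = 8e14 * 1e6, 2e17 * 1e6
    for v_r in (0.0, 0.9, 1.8):
        w = depletion_width(N_A, N_D, v_r)
        c = junction_capacitance(26e-6, 26e-6, 1.8e-6, w)  # assumed 26 um square junction
        print(f"Vr = {v_r:.1f} V: xw = {w * 1e6:.2f} um, Cj = {c * 1e15:.1f} fF")
```

The design trade-off is visible directly: the capacitance scales as the inverse of the depletion width, which itself grows only with the square root of the total junction potential.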
5.3 Photodiode Characterisation
5.3.1 Technology
The test technology for the photodiode characterisation is UMC 0.18μm 1P6M MM/RF
CMOS [14].
This process has the following standard features, which the photodiode designer can exploit
to engineer optimally performing devices:
• Substrate processing: non-epitaxial p-substrate with masks for n-well, p-well, t-well
(within n-well) and n++/p++ diffusion formation. Doping profile available on request
(crucial for determining depletion region).
• Passivation: High density plasma (HDP), phosphosilicate glass (PSG), plasma
enhanced silicon nitride (PESIN).
• Interconnect: Pitch 0.42μm for poly, 0.48μm for metal 1, 0.58μm for metal 2 to 5,
2.2μm for metal 6. Thickness 2.0kÅ for poly, 4.8kÅ for metal 1, 5.8kÅ for metal 2 to
5, 20.6kÅ for metal 6.
5.3.2 Device Design
A total of fourteen (14) different test (photodiode) devices have been designed and fabricated
in the technology described in Section 5.3.1. These can be grouped into
three categories with the following objectives:
• Single junction topologies (4 devices): To test single vertical pn-junction
structures and thus validate the vertical junction model. Furthermore, the effect of
substrate doping and reverse bias can be examined without the influence of side-wall
effects. These devices have the side-walls (optically) shielded to ensure that the measured
photoresponse depends only on the vertical junction efficiency. Dark current,
however, will still be influenced by the side-wall areas contributing diffusion current.
• Multiple paralleled (lateral) junction topologies (7 devices): To test multiple vertical
and lateral pn-junction structures and thus empirically determine relative vertical
and horizontal junction efficiencies. Furthermore, by testing different well/substrate
collection schemes, the influence of diffusion current and recombination effects can be
observed.
• Multiple stacked (vertical) junction topologies (3 devices): To test spectral response
due to selective absorption based on photon energy, i.e. wavelength, at different
junction depths. This will verify whether such structures are feasible in standard
CMOS deep-submicron technologies. It is important to note that these devices are
not intended for use as phototransistors and will be tested such that all junctions
remain reverse biased.
All these fabricated device topologies are illustrated (diagrammatically) in Figs. 5.5
(single junction devices) and 5.6 (multiple junction devices). Furthermore, actual chip
microphotographs of these structures are provided in Fig. 5.7. The design parameters for
these presented photodiode structures are provided in Table 5.1.
5.3.3 Device Measurements
Details of the equipment setup and measurement procedure for electrical and optoelectronic
characterisation are provided in Appendix E.
As all devices are built in the same process, they all have a similar optical efficiency; any
differences are therefore due to the different semiconductor structures rather than to
optical transmission.
Electrical Response
Measured electrical characteristics yielding a series of current-voltage relationships (per
device) under different irradiance levels are illustrated in Figs. 5.8 and 5.9. The presented
measurements are for a maximum irradiance of 0.45mW/cm2 at λ = 550nm. Furthermore,
dark current characteristics are included to quantify the dynamic range relative to the
tested optical signal power; the responsivity can subsequently be determined (discussed next).
Furthermore, a summary of extracted electrical characteristics is provided in Table 5.2.
Measured dark current values are higher than expected partly due to equipment limitations,
i.e. the minimum reliably measurable current is of the order of 100fA (for the equipment
used). Other parameters extracted are open-circuit voltage, short-circuit current, ﬁll factor
[Figure 5.5 (image not reproduced): panels (a) to (j). Legend: N++, P++, N-Well, P-Well (P+),
P-Substrate (P-), T-Well (P+).]
Figure 5.5: Various single-junction photodiode structures fabricated and tested in 0.18μm
CMOS. Illustrated are the surface and cross-section views of the following structures: (a)
n++/p-substrate (b) n++ rings/p-substrate (c) n++/p-well (d) n++ rings/p-well (e) n++
strips/p-well (f) n-well/p-substrate (g) n-well strips/p-substrate (h) n-well grid/p-substrate
(i) n-well mesh/p-substrate and (j) p++/n-well. All devices are sized 30μm × 30μm and
illustrations not to scale.

[Figure 5.6 (image not reproduced): panels (a) to (d). Legend: N++, P++, N-Well, P-Well,
P-Substrate, T-Well (P+).]
Figure 5.6: Various multi-junction photodiode structures fabricated and tested in 0.18μm
CMOS. Illustrated are the surface and cross-section views of the following structures:
(a) t-well/n-well/p-substrate (b) t-well grid/n-well/p-substrate (c) n++/t-well/n-well/p-substrate
and (d) p++/n-well/p-substrate. All devices are sized 30μm × 30μm and
illustrations not to scale.

[Figure 5.7 (image not reproduced): panels (a) to (l).]
Figure 5.7: Microphotographs of photodiode structures fabricated and tested in 0.18μm
CMOS. Illustrated are: (a) n++/p-well (b) n++ rings/p-well (c) n++ strips/p-well
(d) n-well/p-substrate (e) n-well strips/p-substrate (f) n-well grid/p-substrate (g) n-well
mesh/p-substrate, (h) p++/n-well, (i) p++/n-well grid, (j) t-well/n-well/p-substrate (k)
t-well grid/n-well/p-substrate and (l) n++/t-well/n-well/p-substrate. All devices are sized
30μm × 30μm.
| Junction structure | Active area (1) | Active perimeter (1) | Junction doping (2) | Junction depth |
|---|---|---|---|---|
| Single junction devices (single isolated terminal, shared substrate) | | | | |
| n++/p-sub | 673.4μm2 | - | 10^19 / 8×10^14 | 0.18μm |
| n++ rings/p-sub | 71.3μm2 | 300.0μm | 10^19 / 8×10^14 | 0.18μm |
| n++/p-well | 673.4μm2 | - | 10^19 / 10^17 | 0.18μm |
| n++ rings/p-well | 71.3μm2 | - | 10^19 / 10^17 | 0.18μm |
| n++ strips/p-well | 91.4μm2 | 381.5μm | 10^19 / 10^17 | 0.18μm |
| n-well/p-sub | 673.4μm2 | - | 2×10^17 / 8×10^14 | 1.80μm |
| n-well strips/p-sub | 322.8μm2 | 274.7μm | 2×10^17 / 8×10^14 | 1.80μm |
| n-well grid/p-sub | 411.4μm2 | 384.0μm | 2×10^17 / 8×10^14 | 1.80μm |
| n-well mesh/p-sub | 324.7μm2 | 921.6μm | 2×10^17 / 8×10^14 | 1.80μm |
| p++/n-well | 509.8μm2 | 92.0μm | 10^19 / 2×10^17 | 0.20μm |
| p++ strips/n-well | 381.0μm2 | 789.1μm | 10^19 / 2×10^17 | 0.20μm |
| Double junction devices (two isolated terminals, shared substrate) | | | | |
| (t-well/n-well/p-sub) device | | | | |
| t-well/n-well | 446.8μm2 | 83.7μm | 6×10^17 / 2×10^17 | 1.20μm |
| n-well/p-sub | 578.3μm2 | 95.9μm | 2×10^17 / 8×10^14 | 1.80μm |
| (t-well grid/n-well/p-sub) device | | | | |
| t-well grid/n-well | 431.3μm2 | 384.0μm | 6×10^17 / 2×10^17 | 1.20μm |
| Triple junction devices (three isolated terminals, shared substrate) | | | | |
| (n++/t-well/n-well/p-sub) device | | | | |
| n++/t-well | 329.5μm2 | 71.9μm | 10^19 / 6×10^17 | 0.18μm |
| t-well/n-well | 432.0μm2 | 81.0μm | 6×10^17 / 2×10^17 | 1.20μm |
| n-well/p-sub | 567.3μm2 | 91.9μm | 2×10^17 / 8×10^14 | 1.80μm |

(1) Including only optically exposed active junction area/perimeter.
(2) Assuming abrupt junction; valid due to high junction doping differential.

Table 5.1: Design parameters (process defined and geometric) for the various test
(photodiode) structures.
[Figure 5.8 (image not reproduced): panels (a) to (f).]
Figure 5.8: Measured IV characteristics of various test (photodiode) structures (using
calibrated light source: λ=550nm, Pmax=0.45mW/cm2). Shown are the characteristics for:
(a) n++/p-substrate (b) n++ rings/p-substrate (c) n++/p-well (d) n++ rings/p-well (e)
n++ strips/p-well and (f) n-well/p-substrate.

[Figure 5.9 (image not reproduced): panels (a) to (e).]
Figure 5.9: Measured IV characteristics of various test (photodiode) structures (using
calibrated light source: λ=550nm, Pmax=0.45mW/cm2). Shown are the characteristics for:
(a) n-well strips/p-substrate (b) n-well grid/p-substrate (c) n-well mesh/p-substrate (d)
p++/n-well and (e) p++ strips/n-well.
| Junction structure | Open-circuit voltage | Short-circuit current | Fill factor | Shunt resistance (1) | Dark current (2)(3) | Dynamic range (3)(4) |
|---|---|---|---|---|---|---|
| n++/p-sub | 0.284V | 211pA | 64.52% | 75GΩ | 440fA | 53.6dB |
| n++ rings/p-sub | 0.317V | 428pA | 65.05% | 50GΩ | 440fA | 59.8dB |
| n++/p-well | 0.148V | 29.98pA | 25.81% | 4.4GΩ | 28.9fA | 60.3dB |
| n++ rings/p-well | 0.385V | 180.3pA | 69.35% | 13GΩ | 49.9fA | 71.2dB |
| n++ strips/p-well | 0.356V | 42pA | 55.17% | 40GΩ | 580fA | 37.2dB |
| n-well/p-sub | 0.295V | 270pA | 60.65% | 12GΩ | 680fA | 52.0dB |
| n-well strips/p-sub | 0.336V | 1.30nA | 67.84% | 21GΩ | 130fA | 80.0dB |
| n-well grid/p-sub | 0.341V | 1.27nA | 65.60% | 12GΩ | 130fA | 67.0dB |
| n-well mesh/p-sub | 0.345V | 1.11nA | 63.85% | 20GΩ | 570fA | 65.8dB |
| p++/n-well | 0.277V | 120pA | 57.66% | 60GΩ | 840fA | 43.1dB |
| p++ strips/n-well | 0.311V | 544pA | 66.16% | 33GΩ | 380fA | 63.1dB |

(1) Measured over 0 ≤ Vbias ≤ 50mV
(2) Dark current measured at: Vbias = 0V
(3) Limited by resolution of current measurement: 10^-13 A
(4) Dynamic range for given light power density: 0.45mW/cm2

Table 5.2: Measured electrical characteristics for the various test (photodiode) structures.
Light source is calibrated at: Plight=0.45mW/cm2, λ=550nm.
and shunt resistance. The open-circuit voltage Voc is identified as the voltage across the
illuminated device at zero current. This is useful when the device is to be used in
photovoltaic mode. Similarly, the short-circuit current Isc is the current through the
illuminated device when the voltage across it is zero. This gives a measure of the device
responsivity and subsequently its quantum efficiency. The fill factor (not to be confused
with surface area fill factor) of the device is defined as the ratio of the maximum power
of the device to the product of the open-circuit voltage and short-circuit current.
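The extraction of these figures of merit from an IV sweep can be sketched as follows. The function names are hypothetical and the sweep format (voltages ascending from 0 V, currents taken as positive when delivered by the illuminated device) is an assumed convention. The dynamic range is assumed here to be 20·log10 of the ratio of short-circuit current to dark current, which is consistent with, for example, the 53.6dB entry for the n++/p-sub device in Table 5.2.

```python
import math


def dynamic_range_db(i_short_circuit, i_dark):
    """Dynamic range in dB, assumed as 20*log10(Isc / Idark)."""
    return 20.0 * math.log10(i_short_circuit / i_dark)


def fill_factor(voltages, currents):
    """Fill factor Pmax / (Voc * Isc) from an illuminated IV sweep.

    voltages: ascending, starting at 0 V.
    currents: delivered current, positive at 0 V and falling through zero at Voc.
    """
    if voltages[0] != 0.0 or currents[0] <= 0.0:
        raise ValueError("sweep must start at 0 V with a positive short-circuit current")
    i_sc = currents[0]
    p_max = 0.0
    for k in range(1, len(voltages)):
        v, i = voltages[k], currents[k]
        if i <= 0.0:
            # Linear interpolation of the zero crossing gives the open-circuit voltage.
            v_prev, i_prev = voltages[k - 1], currents[k - 1]
            v_oc = v_prev + (v - v_prev) * i_prev / (i_prev - i)
            return p_max / (v_oc * i_sc)
        p_max = max(p_max, v * i)
    raise ValueError("sweep does not reach the open-circuit voltage")


if __name__ == "__main__":
    # Synthetic ideal-diode curve (assumed model) matched to Isc = 211 pA, Voc = 0.284 V.
    V_T, I_SC, V_OC = 0.02585, 211e-12, 0.284
    i_0 = I_SC / (math.exp(V_OC / V_T) - 1.0)
    volts = [k * 1e-3 for k in range(301)]
    amps = [I_SC - i_0 * (math.exp(v / V_T) - 1.0) for v in volts]
    print(f"fill factor of ideal curve: {fill_factor(volts, amps):.3f}")
    print(f"dynamic range: {dynamic_range_db(211e-12, 440e-15):.1f} dB")
```

An ideal diode curve gives a somewhat higher fill factor than the measured values in Table 5.2, as would be expected for a model that omits series and shunt resistance.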
Light Intensity Response
For a maximum irradiance of 0.45mW/cm2 at λ = 550nm (under zero bias), the measured
responsivity (photocurrent versus irradiance) is illustrated in Fig. 5.10. This has been
extracted from the IV curves presented previously.
Figure 5.10: Measured responsivities of various test (single junction photodiode) structures
(using calibrated light source: λ=550nm, Pmax=0.45mW/cm2).
The results show excellent linearity over 2-3 orders of magnitude, although the
photodiodes' complete linear range is expected to be 4-5 orders of magnitude. The
responsivity has not been measured over a wider range because of the limited set of
Neutral Density (ND) filters within the light source attenuator (see Appendix E). Specifically,
there was no intermediate filter between ND2.04 and ND5.07, where the latter results in
photocurrent levels in the region of 10-500fA, at the low-end current measurement resolution
of the test equipment setup.
Spectral Response
Spectral characterisation is conducted at a calibrated irradiance for incident light of
wavelength λ = 350nm to λ = 750nm in Δλ = 5nm increments.
• Spectral efficiency (single junction devices): Measured results for spectral photoresponse,
responsivity and absolute external quantum efficiency are presented in Figs. 5.11,
5.12 and 5.13 respectively. Generally the devices utilising minimally doped semiconductor
junctions tend to be the most efficient, i.e. those devices collecting onto the substrate.
Furthermore, devices with increased lateral junction area show significantly improved
device efficiency. Devices with single (vertical) junctions are the least efficient, however
they have other favourable properties (discussed below).

Figure 5.11: Measured spectral photoresponse of single junction photodiode structures
(using controlled light source: 350nm < λ < 750nm).

• Spectral eﬃciency (multiple junction devices): Measured results for spectral pho-
toresponse, responsivity and absolute external quantum eﬃciency are presented in
Figs. 5.14, 5.15 and 5.16 respectively. Generally the deeper the junction, the higher
the responsivity as also observed in the single junction devices. Consequently long
wavelength response tends to exhibit higher eﬃciencies than shorter wavelengths.
• Spectrally selective devices (single junction): The normalised quantum eﬃciency re-
sults for devices that show spectral selectivity are presented in Fig. 5.17. Generally the
devices fabricated with single vertical junctions are the most spectrally selective. The
only exceptions are multi-junction (lateral) devices with relatively shallow junctions
to a relatively highly doped bulk.
• Spectrally selective devices (multiple junction): The normalised quantum efficiency
results for multi-junction devices that show spectral selectivity are presented in Fig. 5.18.
Both devices tested demonstrate good spectral selectivity. Generally as expected,
shallow junctions (< 0.5μm) are selective to short wavelength light, i.e. blue, whereas
deeper junctions (> 1.5μm) are selective to long wavelength light, i.e. red. The actual
spectral selectivity is expected to be better than the presented results, as the
measurements were taken separately for each junction. Therefore, stray electron-hole
pairs generated at other junction interfaces may contribute to neighbouring junction
spectral performance. This would have the effect of observing reduced spectral
responsivity.

Figure 5.12: Measured spectral responsivity of single junction photodiode structures (using
controlled light source: 350nm < λ < 750nm).

Figure 5.13: Measured spectral quantum efficiency of single junction photodiode structures
(using controlled light source: 350nm < λ < 750nm).

Figure 5.14: Measured spectral photoresponse of multiple junction photodiode structures
(using controlled light source: 350nm < λ < 750nm). Devices tested are: t-well/n-well/p-sub
(left) and n++/t-well/n-well/p-sub (right).

• Spectrally insensitive devices: The normalised quantum eﬃciency results for devices
that do not show spectral selectivity are presented in Fig. 5.19. Generally deep devices
with relatively high sidewall to base ratio are spectrally insensitive. The corresponding
normalised quantum eﬃciency curves show similar spectral proﬁles for a number of
such devices, exhibiting an almost ﬂat response from 500nm to 700nm allowing for
optical interference eﬀects.
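The depth dependence of spectral selectivity noted in the list above follows from exponential (Beer-Lambert) absorption, and can be illustrated with a short sketch. The absorption coefficients below are rough, order-of-magnitude values for crystalline silicon at room temperature and are assumptions for illustration only; they are not fitted to the devices measured here, and the function name is hypothetical.

```python
import math

# Assumed approximate absorption coefficients of silicon, in 1/um (illustrative only).
ALPHA_PER_UM = {"blue (450nm)": 2.5, "green (550nm)": 0.7, "red (650nm)": 0.28}


def fraction_absorbed(alpha, z_top, z_bottom=math.inf):
    """Fraction of the transmitted photon flux absorbed between two depths (in um)."""
    return math.exp(-alpha * z_top) - math.exp(-alpha * z_bottom)


if __name__ == "__main__":
    for colour, alpha in ALPHA_PER_UM.items():
        shallow = fraction_absorbed(alpha, 0.0, 0.5)   # region above 0.5 um
        deep = fraction_absorbed(alpha, 1.5)           # region below 1.5 um
        print(f"{colour}: shallow {shallow:.2f}, deep {deep:.2f}")
```

Under these assumed coefficients most blue light is absorbed within the first 0.5μm while most red light penetrates beyond 1.5μm, which is the qualitative behaviour the stacked junction devices rely on.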
Figure 5.15: Measured spectral responsivity of multiple junction photodiode structures
(using controlled light source: 350nm < λ < 750nm). Devices tested are: t-well/n-well/p-sub
(left) and n++/t-well/n-well/p-sub (right).

Figure 5.16: Measured spectral quantum efficiency of multiple junction photodiode structures
(using controlled light source: 350nm < λ < 750nm). Devices tested are: t-well/n-well/p-sub
(left) and n++/t-well/n-well/p-sub (right).

Figure 5.17: Measured spectral quantum efficiency (normalised) of spectrally selective single
junction photodiode structures (using controlled light source: 350nm < λ < 750nm).

Figure 5.18: Measured spectral quantum efficiency (normalised) of multiple-junction
photodiode structures (using controlled light source: 350nm < λ < 750nm). Devices tested
are: t-well/n-well/p-sub (left) and n++/t-well/n-well/p-sub (right).

Figure 5.19: Measured spectral quantum efficiency (normalised) of spectrally unselective
single junction photodiode structures (using controlled light source: 350nm < λ < 750nm).

[Figure 5.20 (image not reproduced): panels (a) to (d).]
Figure 5.20: Measured and simulated (based on developed photoresponse model) spectral
quantum efficiency comparison for: (a) n++/p-well, (b) n++ rings/p-well, (c) n-well/p-substrate
and (d) n-well strips/p-substrate.
5.4 Photodiode Results Discussion
5.4.1 Functional Analysis
A functional analysis of experimental data has been performed using the presented device
measurements and simulation results, based on the developed photodiode model. The com-
parisons between measured and simulated results for spectral (external) quantum eﬃciency
are illustrated in Fig. 5.20.
The results compare measured and simulated results by initially considering only the
drift current contribution (in the simulation model) and subsequently also including the
diffusion contribution to provide a more realistic model. Of the four devices tested
(n++/p-well, n++ rings/p-well, n-well/p-substrate and n-well strips/p-substrate), in
three cases the diffusion current makes up a substantial proportion of the
total photocurrent (up to 50% or more). For the n-well/p-substrate device (Fig. 5.20c),
the diffusion current component constitutes only around 10-15% of the total photoresponse,
and is accentuated for longer wavelength light. This can be explained by the fact that this
device has been designed to have no (optically) exposed sidewall region, thus there is no
lateral diffusion current contributing to the photocurrent (although it contributes to the
dark current). The only diffusion current therefore occurs at the vertical junction,
principally below the junction in the substrate. As this is an n-well junction, the region
contributing diffusion current would be located over 2μm beneath the semiconductor/coating
(silicon dioxide) interface. At this depth, only light of longer wavelength (beyond 550nm)
would be absorbed.
Concerning spectral selectivity, the simulated results generally conform to the measured
data, following a similar trend. The only exception is the n++ rings/p-well junction device
(Fig. 5.20b), where the measured photoresponse is only approximately 40% of the expected
value for short wavelengths (350nm-450nm). However, above 450nm the measured and
simulated results conform reasonably well. This discrepancy can be attributed to
recombination just below the semiconductor/coating (silicon dioxide) interface due to surface
effects. The reason this appears accentuated is that this device is designed with very
shallow junctions and a very high lateral to vertical area ratio; thus virtually all
photoresponse is expected to be due to lateral diffusion near the surface. One might expect the
n++/p-well device (Fig. 5.20a) to suffer from similar effects, however this is not the case
as it contains a single junction with no (optically) exposed lateral junctions. Subsequently
most of the diffusion current is below the junction into the well and therefore has no direct
route to the semiconductor surface.
The measured and simulated spectral profiles reveal that the optical interference effect is
more intricate than expected. The model includes a single air/coating/substrate interface
and determines the interference effect due to the coating thickness, with the coating
treated as a single layer. However, in practice the optical coupling between air and substrate
involves many embedded dielectric layers, as is evident from the interconnect cross section
(Fig. 5.3). Subsequently, although primary effects have been successfully modelled, the
measured results tend to suggest that there are additional secondary processes further
degrading the incident light radiation. These are caused by internal absorption, reflections,
refraction and thus scattering within the layered coating; created during the planarisation
process.
Furthermore, the actual fabricated device surface profile substantially deviates from the
process data over the photodiode regions. This is due to these regions violating metal fill
constraints required for uniform planarisation. As a result, the surface becomes slightly
indented with pit-like features at the photodiode openings. This can be clearly seen from
the electron microscope images, shown in Fig. 5.21. From these images the indentation is
measured to be 2.5μm.

Figure 5.21: Scanning electron microscope (SEM) images of the ORASIS-P2 surface;
illustrating the passivation layer/air interface profile. Cross-section through a photodiode
region shown in lower-right image.
5.4.2 Impact of Technology Scaling on Photodiode Performance
As CMOS technology inevitably progresses and scales, there are many detrimental effects on
its use in applications requiring photodetection, and in particular
imaging. Technology scaling affects imaging devices in the following areas:
• Sensitivity: Higher doping concentrations lead to reduced mobility and carrier life-
times. As a result the diﬀusion length is reduced thus reducing the diﬀusion current
contribution to the overall photoresponse. Furthermore, higher doping leads to re-
duction in depletion layer width; thus also decreasing the drift current contribution
to the overall photoresponse.
• Dark Current: Reduced diﬀusion length results in increased dark current density thus
further degrading the signal to noise ratio (SNR).
• Dynamic Range: Technology scaling results in reduced gate oxide thickness and thus
also reduced power supply voltage. For Active Pixel Sensor (APS) applications, this
means a reduction in maximum signal level, thus further limiting the overall dynamic
range.
• Spectral Response: Shallow drain/source diffusions provide structures capable of
absorbing short wavelength light. As a result, spectral sensitivity tends to shift to shorter
wavelengths with technology scaling.
• Optical Interface: Increasing interconnect layers means more oxide surfaces and in-
creased thickness, therefore more internal interfaces thus increased interference eﬀects.
Reﬂection, refraction, diﬀraction and absorption cause scattering and attenuation
leading to reduced device eﬃciencies and increased inter-pixel cross-talk.
• Junction Capacitance: Increased doping results in reduced depletion region widths and
therefore increased junction capacitance.
5.4.3 Photodiode Design Recommendations
• For high device efficiencies:
– Design junction structures using minimally doped structures, typically substrate/well
diodes.
– Use several small parallel-connected structures, i.e. to achieve a high lateral to
vertical area ratio.
– Distribute substrate contacts throughout multiple junction devices to maximise
collection efficiency.
• For increased optical eﬃciency:
– Select technology and/or option with minimum required metal layers.
– Select technology with anti-reﬂective coating (ARC) or post-process.
• For colour selectivity use single diode structures (vertical junction), optically shielding
the sidewalls (lateral junctions).
• Position metal interconnects near device perimeter to minimise shadowing and/or
diﬀraction eﬀects.
• Use high reverse bias for minimal capacitance.
• To minimise cross-talk:
– Use deep guard ring, typically a biased well if area permits, otherwise maximise
substrate contacts around the perimeter.
– Include perimeters throughout all metal layers connecting via interconnects,
forming a cage structure.
5.5 Interface Techniques
Interfacing to a photodiode is perhaps the most critical and important circuit within a vision
chip. Selecting the correct topology is crucial to overall system performance and success.
The most popular techniques are illustrated in Fig. 5.22.
5.5.1 Continuous-time Pixel
The simplest and most popular circuit (throughout the vision chip community) for converting
a photocurrent into a voltage is the logarithmic sensor (see Fig. 5.22a,b), which uses one
or two stacked diode-connected MOS devices [15] biased in weak inversion by the
photocurrent itself.
[Figure 5.22 (image not reproduced): panels (a) to (e). Signal labels: Vout, Vbias, Reset, Row.]
Figure 5.22: Various photodiode interface topologies. Shown are: (a) logarithmic sensor
using single MOS diode (b) logarithmic sensor using two series MOS diodes (c) adaptive
photoreceptor (d) active pixel sensor (APS) and (e) spiking photoreceptor.
• Advantages: Small size (silicon area) and wide dynamic range.
• Disadvantages: High sensitivity to device mismatch and slow response in low light
conditions.
The continuous-time logarithmic sensor has been widely used throughout the vision
chip community. A popular variant of the continuous-time logarithmic pixel is the
adaptive photoreceptor circuit (see Fig. 5.22c) [16]. This incorporates a logarithmic
photoreceptor topology with a temporally adapting bias, thus altering its operating point
over time and achieving an impressive dynamic range.
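The logarithmic compression that gives this pixel its wide dynamic range can be sketched with the idealised weak-inversion relation I = I0·exp(V/(n·UT)) for a diode-connected MOS device. This relation, the slope factor n, the scale current I0 and the neglect of the body effect in the stacked case are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def thermal_voltage(temperature=300.0):
    """Thermal voltage kT/q in volts."""
    return 1.380649e-23 * temperature / 1.602176634e-19


def log_pixel_drop(i_photo, n_diodes=1, slope_factor=1.3, i_0=1e-15, temperature=300.0):
    """Voltage across n_diodes stacked diode-connected MOS devices in weak inversion.

    Idealised: every device carries the photocurrent and the body effect is ignored.
    slope_factor and i_0 are assumed, process-dependent parameters.
    """
    u_t = thermal_voltage(temperature)
    return n_diodes * slope_factor * u_t * math.log(i_photo / i_0)


if __name__ == "__main__":
    for i_photo in (1e-12, 1e-11, 1e-10, 1e-9):
        single = log_pixel_drop(i_photo)
        double = log_pixel_drop(i_photo, n_diodes=2)
        print(f"{i_photo:.0e} A: one diode {single * 1e3:.0f} mV, two diodes {double * 1e3:.0f} mV")
```

Each decade of photocurrent changes the output by a fixed voltage step, so several decades of light intensity fit within a small voltage swing; stacking a second device doubles that step in this idealised model. The scale current I0 varies from device to device, which is one way to see the mismatch sensitivity listed among the disadvantages.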
5.5.2 Active Pixel Sensor (APS)
The photodiode interface topology used in all CMOS imaging systems is the APS organ-
isation (see Fig. 5.22d) [17]. The photocurrent is used to charge up a parasitic MOS ca-
pacitance that is periodically sampled and reset. This technique has several advantages:
linear transfer, controllable dynamic range and low sensitivity to device mismatch. The
main disadvantages are: the dynamic range cannot be set locally and it requires a clock.
• Advantages: Linear transfer characteristic, controllable dynamic range and low sensi-
tivity to device mismatch.
• Disadvantages: Global clock required and local adjustment of dynamic range has not
been achievable until recently [18, 19].
5.5.3 Spiking Pixel
An alternative technique is to output the result as a frequency (or pulse/spike rate). The
implementation is similar to the APS, except that the photocircuit resets itself (see
Fig. 5.22e) [20, 21]. By monitoring the integrating node by means of a comparator, the
reset switch is activated once a set threshold has been surpassed.
• Advantages: Information is encoded temporally (continuous time, discrete signal) and is
therefore robust to device mismatch and noise pickup. The technique is asynchronous and
requires no clock.
• Disadvantages: This method intrinsically has a slow response in low light conditions.
Recent developments to overcome this use a time-varying threshold [22] or ON/OFF
encoding [23, 24]. Furthermore, a spiking output generally requires higher communication
bandwidth; however, recent work has attempted to address this by using single spike
coding.
5.6 An Adaptive-ON/OFF Spiking Photoreceptor
This section presents a spiking photoreceptor circuit [24] which is intended for use in
adaptable foveating vision chips. The ultimate aim is to realise an imaging device which can
electronically split its photosensor array into peripheral and foveal regions in a similar
fashion to the human eye. Specialisation of the visual field would allow for high spatial or high
temporal resolution imaging with the possibility of high dynamic range. To this end a pulse
frequency modulated spiking photoreceptor has been developed which is capable of providing
high dynamic range with power consumption similar to that of the animal retina. The circuit is
based on the ON/OFF opponency algorithm used by the human eye to maintain high frequency
response at low light levels, while maintaining low power operation. The photosensor
surface fill factor is kept to a maximum and the power consumption is kept to a minimum.
This section discusses the algorithm, its implementation and simulated/measured results
describing its response and power consumption.
5.6.1 A bio-inspired encoding scheme
The eye has around 100 million photoreceptors with an intensity detection range from
starlight to bright sunlight, and a typical video rate of 25 Hz. This remarkable capability is
rooted in the rhodopsin photocascade [25] in the rods and cones which are used to convert
incoming photons into electrical information.
Inorganic silicon photodiodes are capable of up to ﬁve orders of magnitude of dynamic
range. Usually however, only an 8-bit dynamic range per channel is implemented due to
the relatively high power consumption of higher bit-rate signal conditioning circuits. This
can under- or over-saturate scenes with large variations in image intensity. Early work by
Delbrück and Mead [16] led to an adaptive photoreceptor which could detect the contrast
regardless of overall light intensity. This structure along with edge detection algorithms
[26, 27] can be used to extract the salient information from the scene, at the expense of the
non-salient information.
It is however possible to use a spike rate encoding algorithm similar to that in ani-
mal vision such as the human eye [28, 29]. By changing from voltage or current space to
frequency space, it is possible to achieve wide dynamic ranges at lower power consump-
tion. Furthermore, such a scheme can provide adaptive functionality, trading between high
dynamic range and frequency response by means of adaptive spike counting.
The major drawback to any integrating system is the low frequency response for low
light intensities. Here again we can learn from nature by implementing complementary
ON and OFF channels [23, 30]. Using ON/OFF opponency, where ON-cells spike at high
frequency at high light levels, and OFF-cells spike at high frequency at low light levels,
there will always be an adequate frequency response, even at low light levels. Therefore
either the ON- or OFF-cell will always provide a high ﬁring rate and thus a fast frequency
response. A winner-takes-all type of circuit can be used to remove redundancy in the output
information stream.
A naive estimate of the total spiking rate, before compression, would be given by:
$$f_{spike} = DR \times N_{photoreceptors} \times f_{refresh} \qquad (5.26)$$
where fspike is the output spiking rate, DR is the dynamic range, Nphotoreceptors is the total
number of photoreceptors and frefresh is the refresh rate.
Therefore, assuming a dynamic range of 7 decades, 100×10^6 photoreceptors, and a refresh
rate of 25Hz, the output spiking frequency would be an enormous 2.5×10^16 spikes per second.
Even with the 5 million or so axons constituting the optic nerve, leading to the visual cortex,
this would be unattainable.
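As a quick numerical check of Eqn. 5.26, the estimate can be reproduced in a few lines of Python. This is only an illustrative sketch: the function name and the conversion from decades to a linear number of levels are assumptions made here, not part of the original analysis.

```python
def naive_spike_rate(dynamic_range_decades, n_photoreceptors, refresh_hz):
    """Eqn. 5.26: total spikes per second before any compression."""
    dynamic_range = 10 ** dynamic_range_decades  # decades -> number of levels
    return dynamic_range * n_photoreceptors * refresh_hz


# Values quoted in the text: 7 decades, 100 million photoreceptors, 25 Hz.
rate = naive_spike_rate(7, 100e6, 25.0)

# Share per axon, using the axon count quoted in the text.
per_axon = rate / 5e6

print(f"total: {rate:.2e} spikes/s, per axon: {per_axon:.2e} spikes/s")
```

Dividing by the quoted axon count shows that each axon would still have to carry several billion spikes per second, which illustrates why the retina must compress the information stream.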
The retina therefore carries out various algorithms to sort salient information from
non-salient information, and to compress the information stream to the visual cortex. This
processing includes spatiotemporal ﬁlters [27], colour opponency and motion sensitivity.
The retina also specialises into the fovea and peripheral vision. The fovea contains a high
concentration of cones and is scanned across the ﬁeld of view to build up a high deﬁnition
image, whereas the peripheral vision concentrates on passing on fast motion information and
is important for object ﬁxation. The dynamic sensitivity in intensity is compressed using
the iris as a light intensity modulator. The division into regions of fast motion sensitivity
and high spatial sensitivity has been very successful in evolutionary terms. Most vertebrates
perform this type of processing to get round bandwidth constraints in sending visual signals
to the visual cortex. The fast temporal resolution is important for danger awareness and
reaction time, while the spatial resolution allows greater understanding of one's environment.
Therefore, in developing such a biologically-inspired system, a technique is required for
dynamically changing the output of an imaging device from high temporal but low spatial
resolutions to low temporal but high spatial resolutions.
5.6.2 Photodiode Implementation
The measured current/intensity characteristics for the photodiode to be used can be seen
in Fig. 5.23. As previously seen, the psub/nwell or psub/n+ structures are generally the most
efficient. Internal quantum efficiency in these structures tends to be high. However, external
quantum efficiencies tend to be lower due to fill factor constraints combined with coupling
losses due to the dielectric layers and the surface morphology of the CMOS chip around the
photodiode.
The response is linear for an incident power from 50pW to greater than 100nW. The
minimum detectable light is set by the dark current which is determined by the bias and the
fixed pattern noise. While increasing bias increases the frequency response and quantum
efficiency, it also increases the dark current by a much greater factor. In our configuration
the photodiode is reverse biased by 1.5V, leading to a dark current of 4.8fAμm⁻². Fixed
pattern noise is also a limiting factor as it can reach 1% (between pixels at 100μm proximity)
on the amplification stage but, as will be discussed later, our asynchronous spiking regime
allows for some variance. The photodiode internal quantum efficiency is 76% at 530nm
wavelength.
Figure 5.23: Measured photodiode characteristics. The photodiode response is linear down
to 50pW.
The current characteristics from this photodiode were used in the circuit simulation
for the spike generator. For the purposes of the circuit simulation we have used a current
variation of 1pA to 10nA, corresponding to 25nWcm⁻² to 2.6mWcm⁻². These five decades
of intensity variation correspond to the difference between starlight and a well lit room. The
photocurrents on a seven pixel photoreceptor group can be added to give better dynamic
range in dark conditions. This can be seen in the system algorithm given in Fig. 5.24.
[Figure 5.24: block diagram. Stage labels: Photoreceptor Group; ON-OFF Opponency; Integrate, Fire & Reset; Winner Take All; Count & Select. Other labels: Threshold, Select, Reset, OFF/ON, PTM, PFM.]
Figure 5.24: Core algorithm per photoreceptor group includes: phototransduction,
ON/OFF opponency, spike generation, variable spike interval encoding and input selec-
tion.
5.6.3 System Algorithm
In previous neuromorphic vision chips pixel sizes have tended to be around 100μm x 100μm
with a photodiode fill factor of around 10% [31]. This has tended to work against
creating imaging chips with high pixel densities. High pixel density therefore requires
eﬀective processing that does not take up large areas of the imaging array.
The basic system algorithm can be seen in Fig. 5.24. A single spike encoder is shared
between seven photodiodes, thus increasing the eﬀective ﬁll factor. The spike encoder
can take inputs from individual or all of the photodiodes using a switched arrangement.
This provides the system the ability to select between high spatial and high temporal
resolution, and decreases the silicon surface area required. The spike encoder takes the
selected photocurrent(s) and produces complementary increasing and decreasing currents
to provide the ON and OFF channels, i.e. low photocurrents create high OFF-currents
and vice versa. The two channels then compete by integrating their currents into voltages
through capacitance. This voltage is released in the form of a spike once a trigger threshold
has been surpassed and the charge collected is reset. To reduce redundancy only the ﬁrst
spike, whether ON or OFF is released and both channels are reset. A complementary output
is sent to indicate an ON or OFF spike. Hysteresis is added to stop the circuit oscillating
between on and oﬀ when the light intensity is close to the threshold between light and dark
channels. This relatively high spiking response is then compressed by means of a counter.
On overﬂow, a spike interval encoded (SIE) output is signalled, resetting the counter and
selecting the next photodiode in sequence. The counting method enables reconfigurability
and tradability between dynamic range and temporal response by means of selecting the
overflow at different stages.
[Figure 5.25: circuit schematic. Block labels: Phototransduction; ON/OFF Selection; Spike Generation; Thresholding; ON/OFF Competition; Resetting; Hysteresis. Devices Q1-Q12; signals iphoto, ibias, ion, ioff, von, voff, vton, vtoff, vspike, vonoff, voutA; integrating capacitances Cpon, Cpoff.]
Figure 5.25: Circuit schematic of the adaptive-ON/OFF spiking photoreceptor block, operated
off a 1.8V core supply (implemented in 0.18μm CMOS). Illustrated is the basic scheme
for generating an adaptive-ON/OFF spiking output for a single photodiode input. Shown
at the bottom-left is the implementation of the thresholding comparator based on a scaled
cascade of current-limited digital inverters. Unless stated all devices have aspect ratio
(4/3)lmin for NMOS and (10/3)lmin for PMOS, with lmin being the technology minimum
feature size.
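The competition between the ON and OFF channels and the subsequent spike counting can be illustrated with a small behavioural model. The following Python sketch is an idealisation written for this purpose, not the fabricated circuit: hysteresis, comparator delay and the monostable pulse width are ignored, the function names are hypothetical, and the default component values (1nA bias, 7.2fF node capacitance, 270mV threshold) are those quoted in Section 5.6.4.

```python
def on_off_currents(i_photo, i_bias):
    """ON current follows the photocurrent; OFF current is the remainder (Eqn. 5.28)."""
    i_on = i_photo
    i_off = max(i_bias - i_photo, 0.0)  # the difference cannot go negative
    return i_on, i_off


def time_to_threshold(current, c_node, v_th):
    """Time for a constant current to charge c_node from reset up to v_th."""
    return c_node * v_th / current if current > 0 else float("inf")


def first_spike(i_photo, i_bias=1e-9, c_node=7.2e-15, v_th=0.27):
    """Winner-take-all: the first channel to reach threshold fires, then both reset."""
    i_on, i_off = on_off_currents(i_photo, i_bias)
    t_on = time_to_threshold(i_on, c_node, v_th)
    t_off = time_to_threshold(i_off, c_node, v_th)
    return ("ON", t_on) if t_on <= t_off else ("OFF", t_off)


def sie_interval(i_photo, count=64, **circuit):
    """Spike interval encoding: one output event per `count` raw spikes."""
    channel, t_spike = first_spike(i_photo, **circuit)
    return channel, count * t_spike


for i_photo in (1e-12, 100e-12, 500e-12, 900e-12):
    channel, interval = sie_interval(i_photo)
    print(f"{i_photo:.0e} A -> {channel:3s} event every {interval * 1e6:.1f} us")
```

In this idealised model the slowest response occurs where the two currents are equal, so one of the channels always integrates at least half the bias current.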
5.6.4 Circuit Topology and analysis
The circuit schematic is given in Fig. 5.25. The pn photodiode is reverse-biased by the
diode-connected PMOS device Q1, which forms a simple current mirror with devices Q2 and
Q3. To ensure good linearity, devices Q1-Q5 are sized such that they operate in weak
inversion over the full input current (photocurrent) range. The photodiode reverse-bias
voltage is therefore given by:
$$V_{photo} = V_{GS1} = n V_T \ln(I_{photo}/I_0) \qquad (5.27)$$
where n is the slope factor, VT is the thermal voltage (= kT/q) and I0 is the device
pre-exponential current. For photocurrents of 100fA to 1nA, the photodiode reverse-bias is:
1.47V ≤ Vphoto ≤ 1.78V for Vdd = 1.80V. Subsequently Iphoto is mirrored by devices Q2 and
Q3 to source the ON-current (Ion) and ON-difference-current (Ion). The ON-difference-current
is generated by means of Kirchhoff's current law (KCL), i.e. by injecting the copied
ON-current (ID3) into a current sink (ID6), the difference can be determined. By sourcing
this difference through a PMOS current mirror, the OFF-current (Ioff = ID5) is generated.
$$I_{off} = I_{D5} \approx I_{bias} - I_{photo} \qquad (5.28)$$
The ON and OFF currents are then used to create an increasing voltage, by integrating
them into the parasitic capacitance of their respective nodes. These capacitances
are given by:
$$C_{pon} = C_{DB2} + C_{GD2} + 3(C_{DB7} + C_{GD7}) + C_{DB11} + C_{GD11} \approx 1.3(C_{DB2} + C_{GD2}) \qquad (5.29)$$
$$C_{poff} = C_{DB5} + C_{GD5} + 3(C_{DB9} + C_{GD9}) + C_{DB12} + C_{GD12} \approx 1.3(C_{DB5} + C_{GD5}) \qquad (5.30)$$
where the predominant capacitance is due to the current sourcing devices (Q2, Q5) and the
reset switches (Q11, Q12). Selecting reduced widths for these devices can therefore
reduce this capacitance for higher speed operation. How far these widths can be reduced
is limited by device matching for Q2, Q5 and by RDS(on) for Q11, Q12. For the designed
values, i.e. W/L(Q2,Q5) = 3μm/10μm and W/L(Q11,Q12) = 1μm/1μm, this node capacitance
is 7.2fF. Therefore, the maximum spiking rate is given by:
$$f_{max} = \frac{I_{bias}}{C_{poff} V_{threshold}} \qquad (5.31)$$
where Ibias = Iphoto(max) is the maximum ON or OFF current and Vthreshold is the
comparator threshold voltage. The threshold comparators (shown in Fig. 5.25) are based on a
high-gain digital inverter cascade. By limiting and scaling the maximum current source to
each stage, a successively steeper edge is obtained and power consumption is minimized.
For a three-stage cascade of minimum channel length devices, the optimum (for power
consumption) first stage bias current is 2nA with a current ratio of 1:6:12 between stages. This
gives a comparator threshold voltage of 270mV on a 1.8V supply. Therefore, using
Eqn. 5.31, the maximum frequency of operation for a 1nA maximum photocurrent is:
fmax = 1nA/(7.2fF × 270mV) = 514.4kHz.
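The arithmetic behind this figure, and its sensitivity to the node capacitance discussed above, can be checked with a short Python sketch. The function name and the alternative capacitance values swept below are illustrative assumptions; only the 1nA, 7.2fF and 270mV values come from the text.

```python
def max_spike_rate(i_max, c_node, v_threshold):
    """Eqn. 5.31: f_max = I / (C * V_threshold)."""
    return i_max / (c_node * v_threshold)


# Design values quoted in the text.
f_max = max_spike_rate(i_max=1e-9, c_node=7.2e-15, v_threshold=0.270)
print(f"f_max = {f_max / 1e3:.1f} kHz")

# Hypothetical sweep: halving or doubling the node capacitance.
for c_ff in (3.6, 7.2, 14.4):
    f = max_spike_rate(1e-9, c_ff * 1e-15, 0.270)
    print(f"C = {c_ff:4.1f} fF -> {f / 1e3:7.1f} kHz")
```

Since the rate is inversely proportional to the node capacitance, any reduction in the widths of Q2, Q5, Q11 and Q12 translates directly into a higher maximum spiking rate, subject to the matching and on-resistance limits noted above.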
By threshold detecting the integrating nodes Von and Voff , the ﬁrst channel (ON or
OFF) to reach threshold is collected through a logic OR operation. This provides the Vspike
output. A minimum spike width is then secured by using a digital monostable based on a
self-resetting RS ﬂip-ﬂop with inverter cascade. This is required to ensure a minimum pulse
width is asserted on the reset switches to reliably discharge the integrating nodes each
integration period. An additional output is provided to specify whether the response is ON or
OFF by using an additional RS ﬂip-ﬂop to determine which channel is dominant. Hysteretic
feedback is provided to the threshold comparators by slightly increasing the threshold volt-
age by 5mV providing an approximately 10% lag on channel selection changeover to prevent
rapid channel toggling when the ON and OFF responses are comparable. The two outputs
Vspike and VON/OFF are also combined to provide a single spike polarity encoded (SPE)
signal VSPE by using a logic XOR operation.
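The output logic described above reduces to two Boolean operations, sketched below in Python. The signal polarity is an assumption made for illustration (here a true ON/OFF state inverts the spike), and the function and argument names are hypothetical.

```python
def output_logic(on_fired, off_fired, on_off_state):
    """Combine comparator outputs into Vspike and the polarity-encoded VSPE.

    on_fired, off_fired: comparator outputs of the two integrating nodes.
    on_off_state: output of the channel-selection flip-flop (polarity assumed).
    """
    v_spike = on_fired or off_fired   # logic OR: whichever channel fires first
    v_spe = v_spike != on_off_state   # logic XOR of Vspike and VON/OFF
    return v_spike, v_spe


# With the state low the output idles low and pulses high;
# with the state high it idles high and pulses low.
for state in (False, True):
    idle = output_logic(False, False, state)[1]
    pulse = output_logic(True, False, state)[1]
    print(f"state={state}: idle={idle}, during spike={pulse}")
```

The XOR therefore places both the spike timing and the channel identity on a single wire, at the cost of a level change whenever the dominant channel switches.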
5.6.5 Circuit Implementation
The adaptive-ON/OFF spiking photoreceptor circuit has been designed, implemented and
fabricated in a standard 0.18μm CMOS process. The single receptor layout is shown in
Fig. 5.26, the total silicon area being 880μm2. This would lead to a ﬁll factor of 52% with
30μm x 30μm photodiodes. However, it is possible to share the spike generator circuit
amongst multiple photodiodes to increase the ﬁll factor and/or photodiode density. In our
conﬁguration we envisage sharing the photoreceptor between a local hexagonal neighbour-
hood containing seven photodiodes, as given by Fig. 5.24.
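The benefit of sharing one spike generator between several photodiodes can be estimated with a simple area ratio. The Python sketch below assumes that the quoted 880μm² is the circuit area excluding the photodiode and that sharing adds no routing or switch overhead; both are simplifications, and the function name is hypothetical.

```python
def fill_factor(n_photodiodes, diode_side_um=30.0, circuit_area_um2=880.0):
    """Photosensitive area divided by total (photodiode + circuit) area."""
    diode_area = n_photodiodes * diode_side_um ** 2
    return diode_area / (diode_area + circuit_area_um2)


for n in (1, 7):
    print(f"{n} photodiode(s): fill factor = {fill_factor(n):.1%}")
```

Under these assumptions a single 30μm x 30μm photodiode gives a fill factor of roughly one half, in line with the quoted figure, while a seven-photodiode group approaches 90%.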
5.6.6 Simulated and Measured Results
The circuit was simulated using the Cadence Spectre (5.0.33) simulator with BSIM 3v3.2
models for the MOS devices combined with a photodiode model derived from the measured
parameters, shown in Fig. 5.23. The simulation results for the individual ON and OFF
channels are shown in Fig. 5.27. The slow responses can be seen for the ON channel at low
light intensities and the OFF channel at high light intensities. The simulation results of the
competing ON/OFF channel spike generator are shown in Fig. 5.28. The response shows
good variation between light and dark over many orders of magnitude and the final output
shows good distinction between ON and OFF channels. The hysteresis at the transitions
can also be clearly seen.
Figure 5.26: Physical layout of the adaptive-ON/OFF spiking photoreceptor block,
implemented in UMC 0.18μm Mixed-mode CMOS. By area, the photodiode has a 52% fill factor,
the threshold detectors occupy 18%, current mirrors 16% and asynchronous digital logic
14%.
The spike interval at the maximum ﬁring rate is 2μs, corresponding to over 1MHz
in response when considering that the data is eﬀectively compressed by half. 500kHz is
suﬃcient to provide a 16-bit dynamic range at 5Hz refresh on a single pixel. As mentioned
previously, this maximum ﬁring rate is limited by the combined parasitic capacitance at the
integrating nodes as described by expressions 5.29, 5.30 and 5.31.
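The trade-off between dynamic range and refresh rate for a counted spike train can be made explicit with a short Python sketch. It assumes an idealised linear counter in which the number of distinguishable levels equals the number of spikes per refresh period; the function names are hypothetical.

```python
import math


def dynamic_range_bits(spike_rate_hz, refresh_hz):
    """Bits resolvable when counting spikes over one refresh period."""
    return math.log2(spike_rate_hz / refresh_hz)


def refresh_for_bits(spike_rate_hz, bits):
    """Highest refresh rate that still allows 2**bits counts per period."""
    return spike_rate_hz / 2 ** bits


bits_at_5hz = dynamic_range_bits(500e3, 5.0)
print(f"500 kHz at 5 Hz refresh: {bits_at_5hz:.1f} bits")

for bits in (8, 12, 16):
    print(f"{bits:2d} bits -> up to {refresh_for_bits(500e3, bits):8.1f} Hz refresh")
```

Each additional bit of dynamic range halves the achievable refresh rate, which is the trade that the adaptive spike counting exploits.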
The power dissipation of this circuit has two sources: the continuous current flow in the
current mirrors (the static power) and the digital switching (the dynamic power). The total
current consumption is illustrated in Fig. 5.29. The quiescent (or static)
current consumption is approximately 3nA (dependent on bias, i.e. Istatic ≈ 2Ibias). The
dynamic current consumption is 370μA per 1.5ns spike in a 3μs window. Thus the energy
consumption per spike is 500fJ. Given the competition between the ON and OFF channels
the minimum frequency the circuit will operate at is 500Hz. In this regime the quiescent
power consumption is 5nW compared to 125pW for the spiking. However for most of the
operation at 5kHz to 500kHz it is the quiescent power consumption which will dominate.
Thus, averaging this quiescent power over the pulse train gives 20.5pJ of energy per spike,
which is comparable to the bit-energy of 2-20pJ/bit for the blow fly retina [25]. An 8-bit
output therefore has a power equivalent of 5nW per pixel.
Figure 5.27: Simulated transient analysis for the individual ON and OFF channel spike
generators. The waveforms shown, from top to bottom, are: (a) photocurrent (b) ON channel
charging response (c) ON channel spike output (d) OFF channel charging response (e) OFF
channel spike output.
Figure 5.28: Simulated transient analysis for the combined ON/OFF channel spike generator.
The waveforms shown, from top to bottom, are: (a) photocurrent (b) competing ON/OFF
charging response (c) spike output (d) ON/OFF channel selection (e) combined spike and
ON/OFF channel encoded output.
Figure 5.29: Simulated transient analysis illustrating the power consumption profile. The
waveforms shown, from top to bottom, are: (a) photocurrent (b) combined spike and ON/OFF
channel encoded output (c) current consumption and (d) integrated current consumption.
The fabricated adaptive-ON/OFF spiking photoreceptor circuit operates as expected.
The light intensity controlled frequency modulation can be clearly seen in the measured
results given in Fig. 5.30.
This relationship between light intensity and spiking rate has been measured for various
values of bias current, ranging from 1pA to 5nA. The response indicates there are two linear
regions of operation, the boundary condition being at a 300pA bias current. This in fact
agrees with the trend shown in the measured photo response of the individual photodiode,
shown previously in Fig. 5.23. Furthermore, the ON/OFF response is illustrated by a
positive or negative gradient in this relationship, i.e. the changeover points at the corners
of the graphs. It is observed that this ON/OFF changeover point can be tuned by adjusting
the bias current as would be expected. In reality only bias currents in the range 500pA
to 5nA would be used to utilise the ON/OFF compression most eﬃciently, i.e. ideally the
changeover point should be tuned to lie in the centre of input light intensity range. A high-
frequency OFF response can be traded with bias current and therefore power consumption.
Figure 5.30: Measured photo-response results for the adaptive-ON/OFF spiking photore-
ceptor circuit. Illustrated is the spike rate to incident light power relationship for various
bias current levels. The action to shift the ON/OFF transition point can be clearly seen.
The light intensity incident on the chip is the equivalent of a well lit room.
Figure 5.31: Measured bias current tuning results for the adaptive-ON/OFF spiking pho-
toreceptor circuit. Shown (from top to bottom) is: (a) the incident light power ON/OFF
crossover point versus bias current and (b) the spike rate versus bias current for dark cur-
rent, i.e. zero incident light power.
The ON/OFF changeover intensity relationship with bias current is given in Fig. 5.31(a).
The observed deviation from a linear fit has two causes in the measurement procedure.
Firstly, due to hysteresis, an increasing intensity changeover point would be different to a
decreasing intensity changeover and this has not been taken into account in the measured results.
Furthermore, due to the limited number of ND ﬁlters (12) used in deﬁning the intensity
variation, the changeover is measured to occur within a range rather than at an absolute
value. Subsequently as the changeover for a set bias is deﬁned by a range of two ND ﬁlter
values, the error margin is quite substantial. Thus to measure a more accurate relationship
either more ND ﬁlters are required, or an alternative method for intensity variation.
The dark current spiking rate versus bias current relationship is given in Fig. 5.31(b).
As expected, this gives a perfect linear relationship for bias currents in the range of 100fA to
10nA. Furthermore, the fact that only an OFF response is measured for bias currents down
to 100fA tends to suggest that the dark current of the photodiode biased in this circuit is
below 100fA. 100fA is much better than would be expected on the basis of the photocurrent
IV curve and allows for operation in excess of 13-bits.
[Figure 5.32: circuit schematic. Block labels: Spike Counting (8-bit cascade); Count Selection; Photodiode Multiplexing; 3-Bit to Octal converter (inputs A0-A2, outputs X1-X8); Photodiode selection switch control. Flip-flops U1-U11; signals vspike, vonoff, vpulse, voutB, CtrlA, CtrlB.]
Figure 5.32: Circuit schematic of the selective output encoder/controller block, operated off
a 1.8V core supply (implemented in 0.18μm CMOS). Illustrated is the spike counting
circuitry for generating outputs of reconfigurable dynamic range, in addition to the photodiode
multiplexing control for selection between single and multi-pixel operation.
5.6.7 Spike Interval Encoding
The asynchronous output of the protocol would make it in the first instance very simple
to address and connect to an FPGA, DSP, Address Event Protocols [32], or other digital
logic. The maximum output frequency of spikes, given by Eqn. 5.31, would be 0.5MHz.
For ten sets of spike generators it would be 5MHz (70-140 photodiodes) and for a hundred
sets of spike generators it would be 50MHz (700-1400 photodiodes). Clearly, to achieve the
equivalent of a 1 megapixel camera the maximum output frequency would reach 35-71GHz!
A 16-bit parallel bus could bring this down to a few GHz, but even this is too much for a
0.18μm process. Power consumption and switching noise would also become undesirably high.
Some form of information compression is therefore required. In the eye, each spike can
convey multiple bits of information, which together with redundancy can be used to make
the spiking rate more sparse [33].
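The scaling of the aggregate output rate with array size can be reproduced with the following Python sketch. It assumes, as the figures above imply, that each spike generator serves between 7 and 14 photodiodes and fires at the 0.5MHz maximum; the function name and the even split across bus lines are illustrative assumptions.

```python
def aggregate_rate(n_pixels, pixels_per_generator, f_max_hz=0.5e6):
    """Worst-case total spike rate if every generator fires at f_max."""
    n_generators = n_pixels / pixels_per_generator
    return n_generators * f_max_hz


megapixel = 1e6
low = aggregate_rate(megapixel, 14)   # 14 photodiodes per generator
high = aggregate_rate(megapixel, 7)   # 7 photodiodes per generator
print(f"1 Mpixel: {low / 1e9:.1f} to {high / 1e9:.1f} GHz aggregate")

# Spreading the events evenly over a 16-bit parallel bus.
bus_width = 16
print(f"per bus line: {low / bus_width / 1e9:.1f} to {high / bus_width / 1e9:.1f} GHz")
```

Even when divided across sixteen lines the per-line rate remains in the GHz range, which motivates the compression scheme that follows.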
The challenge is therefore to compress the data by a factor of a few decades without
losing the fundamental asynchronous nature of the signal and allowing for addressing. The
simplest practical way of implementing compression is to use spike interval encoded
modulation, where each photodiode's spike train is accumulated into a single spike whose temporal
width is modulated according to the spike frequency. The advantage here is multi-fold. Firstly,
a single temporally wide pulse introduces less switching noise than trains of spikes. Secondly,
the power consumption is highest on the rise and fall of the spikes, so fewer transitions
mean lower power. As the spiking circuit is shared between multiple photodiodes, a further
advantage is that the negative pulse interval between spikes now indicates the switching to
a new photodiode. Opposite polarity between ON and OFF spikes can be used to indicate
their state.
This spike interval encoding can be implemented using a standard asynchronous ripple
counter (see Fig. 5.32) based on a cascade of JK-type ﬂip-ﬂops (U1-8). Subsequently,
the output can be selected, using the CtrlA signal, from either Q(U6) or Q(U8) using a 1-bit
multiplexer, i.e. divide by 64 or 256. This output encodes the compressed data and is
additionally used to: (1) reset the ripple counter and (2) provide an input to the photodiode
selection control circuit. This consists of a 3-bit ripple counter with an octal output for
controlling the photodiode selection switches. As there are only seven photodiodes for
sequential selection, the 8th output is used to reset this counter. The CtrlB signal is
used to foveate the local photodiode cell by overriding the multiplexer by conﬁguring the
photodiodes in parallel for increased temporal resolution.
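The counting and selection behaviour described above can be summarised in a small behavioural model. The Python sketch below is an abstraction rather than a gate-level description: the class and attribute names are hypothetical, the 8th octal output resetting the selection counter is modelled as a modulo-7 wrap, and CtrlA and CtrlB are represented by the `divide_by` and `foveate` arguments.

```python
class OutputEncoder:
    """Behavioural model of the spike counter and photodiode selector."""

    def __init__(self, divide_by=64, n_photodiodes=7, foveate=False):
        if divide_by not in (64, 256):
            raise ValueError("CtrlA selects divide-by-64 or divide-by-256 only")
        self.divide_by = divide_by          # CtrlA: tap at Q(U6) or Q(U8)
        self.n_photodiodes = n_photodiodes
        self.foveate = foveate              # CtrlB: all photodiodes in parallel
        self.count = 0                      # ripple counter state
        self.selected = 0                   # index of the active photodiode

    def on_spike(self):
        """Process one raw spike; return True when an output event is emitted."""
        self.count += 1
        if self.count < self.divide_by:
            return False
        self.count = 0                      # the output resets the ripple counter
        if not self.foveate:
            # Advance the selection; wrapping after the seventh photodiode
            # stands in for the 8th octal output resetting the 3-bit counter.
            self.selected = (self.selected + 1) % self.n_photodiodes
        return True

    def active_photodiodes(self):
        """Photodiodes currently connected to the spike generator."""
        if self.foveate:
            return list(range(self.n_photodiodes))
        return [self.selected]


encoder = OutputEncoder(divide_by=64)
events = sum(encoder.on_spike() for _ in range(64 * 7))
print(f"{events} output events, selection back at photodiode {encoder.selected}")
```

Choosing the larger divide ratio gives more counts per output event and hence more dynamic range, at the cost of a proportionally slower update of each photodiode.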
5.6.8 Contribution to Related Work
This section highlights how the presented work complements related work developed towards
similar technical goals. The contribution in this work is shown to be two-fold; both on a
proposed system architecture (implementation still ongoing) and on the front-end photore-
ceptor circuit. At each of these levels, a brief summary on state-of-the-art related research
is given followed by a rationale on how the presented work diﬀers.
Foveating Vision Chips
Considerable work has already gone into developing vision chips based on the foveal or-
ganisation of the animal retina. Early work by Wodnicki et al. [34, 35] and Sandini et
al. [36, 37, 38] produced polar arrays of increasing resolution towards the centre. More
recent work by Etienne-Cummings et al. [39] has combined foveation with visual smooth
pursuit tracking, acquisition saccadic control and centroiding. Azadmehr et al. [40] have
produced a system with a central (static) imaging array surrounded by a temporal response
(dynamic) border for controlling a pan-tilt system to track motion on the foveal region.
The proposed system takes an alternative approach: to implement a homogeneous
reconfigurable array, such that the foveal region can be adjusted and moved dynamically. This
provides the ability to control both the size and position of the foveal region electronically,
without need of mechanical actuators. Such a scheme is intended to achieve a much swifter
pseudo-saccadic response than an electromechanical saccade.
Spiking Photoreceptors
Spiking photoreceptor circuits have evolved since the basic concept was introduced [20, 21],
although not originally aimed at biologically-inspired vision chips. Along the same lines, Bermak et
al. [41, 42] continued to develop a number of Pulse-Width-Modulated (PWM) and Pulse-
Time-Modulated (PTM) based imagers. Another approach, due to the close resemblance to
neurobiology, inspired Kramer et al. [23, 30] to use a spiking scheme to encode a temporally
changing response with separate ON-increasing and OFF-increasing channels. Other devel-
opments in spiking photoreceptors included the use of current feedback to reduce energy
per bit [43, 44] and using adaptive reference thresholds to achieve object segmentation [45].
To further reduce power consumption time-to-ﬁrst-spike (TTFS) encoding was applied to
reduce redundant spiking [46, 22, 47].
As previously mentioned (Section 5.5.3), a fundamental limitation of a spike-encoding
scheme is its slow response at low light levels. Qi et al. [46] addressed this issue by using a
time-varying threshold, i.e. an exponential sawtooth-like signal. Although this technique
proved to be successful in response-time, the benefits of being asynchronous and having a
linear response were lost. This work has therefore proposed an alternative scheme
to achieve fast response whilst maintaining linear response and asynchronicity, i.e. by
using an adaptable-ON/OFF encoded scheme. Furthermore, the ability to trade dynamic
range with temporal and spatial resolution is made possible by combining this with the
proposed foveating architecture.
5.7 Summary
This chapter presents a uniﬁed model for a pn-junction photodiode implemented in CMOS
technology based on the underlying semiconductor physics. Measured results of fabricated
devices have validated the quantum eﬃciency expressions developed. Furthermore through
design, fabrication and veriﬁcation, several devices have been characterised in a deep sub-
micron process. Subsequently, some basic design rules for implementing pn-junction pho-
todiodes in deep submicron technologies have been outlined.
Finally, a biologically-inspired scheme to obtain optical information from vision chips has
been presented. The technique uses an ultra-low-power (20pJ per spike) spiking photoreceptor
to output intensity information from a set of photodiodes. The scheme uses spike-interval
coding to encode the information asynchronously and therefore aims to reduce coupled
switching noise when distributed throughout a system.
References
[1] H.-S. Wong, “Technology and device scaling consideration for CMOS imagers,” IEEE
Transactions on Electron Devices, vol. 43, no. 12, pp. 2131–2142, 1996.
[2] T. Lulé, S. Benthien, H. Keller, F. Mütze, P. Rieve, K. Siebel, M. Sommer and M.
Böhm, “Sensitivity of CMOS based imagers and scaling perspectives,” IEEE Transactions
on Electron Devices, vol. 47, no. 11, pp. 2110–2122, 2000.
[3] S. M. Sze, Physics of Semiconductor Devices. Wiley, 1981.
[4] J. Geist and H. Baltes, “High accuracy modeling of photodiode quantum eﬃciency,”
Applied Optics, vol. 28, no. 18, pp. 3929–3939, 1989.
[5] C. Kittel, Introduction to Solid State Physics. Wiley, 1995.
[6] W. B. Leigh, Devices for Optoelectronics (Optical Engineering). Marcel Dekker, 1996.
[7] J. Singh, Electronic and Optoelectronic Properties of Semiconductor Structures. Cam-
bridge University Press, 2003.
[8] J. S. Lee, R. I. Hornsey and D. Renshaw, “Analysis of CMOS Photodiodes. I. Quantum
eﬃciency,” IEEE Transactions on Electron Devices, vol. 50, no. 5, pp. 1233–1238, 2003.
[9] J. S. Lee, R. I. Hornsey and D. Renshaw, “Analysis of CMOS Photodiodes. II. Lateral
photoresponse,” IEEE Transactions on Electron Devices, vol. 50, no. 5, pp. 1239–1245,
2003.
[10] O. Yadid-Pecht and R. Etienne-Cummings, eds., CMOS Imagers: From Phototrans-
duction to Image Processing. Kluwer Academic Publishers, 2004.
[11] S.-C. Liu, J. Kramer, G. Indiveri, T. Delbrück and R. Douglas, Analog VLSI: Circuits and Principles.
The MIT Press, 2002.
[12] A. Haapalinna, P. Karha and E. Ikonen, “Spectral Reﬂectance of Silicon Photodiodes,”
Applied Optics, vol. 37, no. 4, pp. 729–732, 1998.
[13] L. Polerecky, “Theoretical background for the measurements of refractive index and
thickness of a thin dielectric layer,” Internal Dublin City University Report, 1999.
[14] United Microelectronic Corporation (UMC), 0.18um Mixed Mode/RFCMOS Technol-
ogy 1.8V/3.3V 1P6M Electrical Design Rule (with Metal/Metal Capacitor Module),
1.2p2 ed., 2003.
[15] M. Mahowald, VLSI Analogs of Neuronal Visual Processing: A Synthesis of Form and
Function. PhD thesis, California Institute of Technology, Pasadena, California, 1992.
[16] T. Delbrück and C. A. Mead, “Analog VLSI phototransduction by continuous-time,
adaptive, logarithmic photoreceptor circuits,” Vision Chips: Implementing vision al-
gorithms with analog VLSI circuits, by C. Koch and H. Li eds., pp. 139–161, 1995.
[17] S. Mendis, S. E. Kemeny and E. R. Fossum, “CMOS Active Pixel Image Sensor,” IEEE
Transactions on Electron Devices, vol. 41, no. 3, pp. 452–453, 1994.
[18] S. Decker, D. McGrath, K. Brehmer and C. G. Sodini, “A 256x256 CMOS imaging ar-
ray with wide dynamic range pixels and column-parallel digital output,” IEEE Journal
of Solid-State Circuits, vol. 33, no. 12, pp. 2081–2091, 1998.
[19] O. Yadid-Pecht and A. Belenky, “Autoscaling CMOS APS with customized increase
of dynamic range,” IEEE International Solid-State Circuits Conference, pp. 100–101,
2001.
[20] W. Yang, “Image sensor array with threshold voltage detectors and charged storage
capacitors.” US Patent Number 5,214,274, 1993.
[21] W. Yang, “A Wide-Dynamic-Range, Low-Power Photosensor Array,” Proceedings of
IEEE International Solid-state Circuits Conference, pp. 230–231, 1994.
[22] L. Qiang and J. G. Harris, “A time-based CMOS image sensor,” Proceedings of the
IEEE International Symposium on Circuits and Systems, vol. 4, pp. 840–843, 2004.
[23] J. Kramer, “An on/oﬀ transient imager with event-driven, asynchronous read-out,”
IEEE International Symposium on Circuits and Systems, vol. 2, pp. 165–168, 2002.
[24] T. G. Constandinou, P. Degenaar, D. Bradley and C. Toumazou, “An on/oﬀ spiking
photoreceptor for adaptive ultrafast/ultrawide dynamic range vision chips,” Proceed-
ings of the IEEE Workshop on Biomedical Circuits and Systems, vol. S1, pp. 6–9,
2004.
[25] P. Abshire and A. Andreou, “Capacity and energy cost of information in biological and
silicon photoreceptors,” Proceedings of the IEEE, vol. 89, no. 7, pp. 1052–1064,
2001.
[26] C. A. Mead and M. Ismail (eds.), Analog VLSI implementation of neural systems.
Kluwer Academic Publishers, 1989.
[27] K. A. Boahen, “A Retinomorphic Chip with Parallel Pathways: Encoding ON, OFF,
INCREASING, and DECREASING Visual Signals,” Kluwer Analog Integrated Circuits
and Signal Processing, vol. 30, no. 2, pp. 121–135, 2002.
[28] R. G. Smith, N. K. Dhingra, Y. H. Kao, and P. Sterling, “How eﬃciently a ganglion
cell codes the visual signal,” Proceedings of the 23rd Annual EMBS International Con-
ference, pp. 663–665, 2001.
[29] T. Delbruck and S. Liu, “A silicon early visual system as a model animal,” Vision
Research, vol. 44, no. 17, pp. 2083–2089, 2004.
[30] P. Lichtsteiner, T. Delbrück and J. Kramer, “Improved ON/OFF temporally
differentiating address-event imager,” Proceedings of the 11th IEEE International Conference
on Electronics, Circuits and Systems (ICECS), pp. 211–214, 2004.
[31] A. Moini, ed., Vision Chips. Kluwer Academic Publishers, 1999.
[32] T. Y. W. Choi, B. E. Shi and K. A. Boahen, “An ON-OFF Orientation Selective Ad-
dress Event Representation Image Transceiver Chip,” IEEE Transactions on Circuits
and Systems I: Regular papers, vol. 51, no. 2, pp. 342–353, 2004.
[33] D. K. Warland, P. A. Reinagel and M. Meister, “Decoding Visual Information from
a population of retinal ganglion cells,” Journal of Neurophysiology, vol. 78, no. 5,
pp. 2336–2350, 1997.
[34] R. Wodnicki, G. W. Roberts and M. D. Levine, “A foveated image sensor in standard
CMOS technology,” Proceedings of the IEEE Custom Integrated Circuits Conference,
pp. 357–360, 1995.
[35] R. Wodnicki, G. W. Roberts and M. D. Levine, “A log-polar image sensor fabricated in
a standard 1.2um ASIC CMOS process,” IEEE Journal of Solid-State Circuits, vol. 32,
no. 8, pp. 1274–1277, 1997.
[36] G. Sandini, P. Questa, D. Scheﬀer, B. Diericks and A. Mannucci, “A retina-like CMOS
sensor and its applications,” Proceedings of IEEE Sensor Array and Multichannel Sig-
nal Processing Workshop, pp. 514–519, 2000.
[37] G. Sandini, J. Santos-Victor, T. Paidia and F. Berton, “OMNIVIEWS: direct omnidi-
rectional imaging based on a retina-like sensor,” Proceedings of IEEE Sensors, vol. 1,
pp. 27–30, 2002.
[38] A. Bernardino, J. Santos-Victor and G. Sandini, “Foveated active tracking with re-
dundant 2D motion parameters,” Robotics and Autonomous Systems, vol. 3, no. 4,
pp. 205–221, 2002.
[39] R. Etienne-Cummings, J. Van der Spiegel, P. Mueller and Z. Mao-Zhu, “A foveated sil-
icon retina for two-dimensional tracking,” IEEE Transactions on Circuits and Systems
II: Analog and Digital Signal Processing, vol. 47, no. 6, pp. 504–517, 2000.
[40] M. Azadmehr, J. P. Abrahamsen and P. Haﬂiger, “A Foveated AER Imager Chip,”
IEEE International Symposium on Circuits and Systems, pp. 2751–2754, 2005.
[41] A. Bermak, “A CMOS imager with PFM/PWM based analog-to-digital converter,”
IEEE International Symposium on Circuits and Systems, vol. 4, pp. 53–56, 2002.
[42] A. Kitchen, A. Bermak and A. Bouzerdoum, “PWM digital pixel sensor based on asyn-
chronous self-resetting scheme,” IEEE Electron Device Letters, vol. 25, no. 7, pp. 471–
473, 2004.
[43] E. Culurciello and R. Etienne-Cummings, “Second generation of high dynamic range,
arbitrated digital imager,” IEEE Journal of Solid-State Circuits, vol. 38, no. 2, pp. 281–
294, 2003.
[44] E. Culurciello and R. Etienne-Cummings, “Second generation of high dynamic range,
arbitrated digital imager,” Proceedings of the IEEE International Symposium on Cir-
cuits and Systems, vol. 4, pp. 828–831, 2004.
[45] W. P. Lee, C. T. Hsu, C. Y. Tsoi, A. Bermak and K. N. Leung, “Synchronization
analysis in spiking pixel architecture-hardware implementation and mismatch analy-
sis,” IEEE Region 10 Conference, vol. D, pp. 274–277, 2004.
[46] Q. Xin, G. Xiaochuan and J. G. Harris, “A time-to-ﬁrst spike CMOS imager,” Proceed-
ings of the IEEE International Symposium on Circuits and Systems, vol. 4, pp. 824–827,
2004.
[47] A. Bermak and C. Shoushun, “A Low Power CMOS Imager based on Time-to-First-
Spike encoding and Fair AER,” Proceedings of the IEEE International Symposium on
Circuits and Systems, vol. 4, pp. 5306–5309, 2005.
Chapter 6
ORASIS: A Micropower
Centroiding Vision Processor
6.1 Introduction
This chapter presents a computationally-efficient vision processing chip for multi-object
centroiding and sizing. The system, named ORASIS, comprises a 48×48 pixel array that
combines photodetection with distributed processing. It directly implements the
bio-pulsating contour reduction algorithm (Chapter 4) using hybrid cellular topologies that
combine weak-inversion analogue and asynchronous digital circuit techniques (Chapter 2).
The presented system provides the following additional functionality to previous work
developed in this area (reviewed in Chapter 3).
1. Object centroiding: This system is the first developed (to date) capable of
simultaneous centroid detection of an unlimited¹ number of objects.
2. Object sizing: This system is the first developed (to date) capable of
parallel size determination of an unlimited¹ number of objects.
3. Object counting: This system is the first developed (to date) capable of parallel object
counting.
4. Input versatility: The system can be configured to operate with a wide
variety of input image types. The tunable image processing parameters are:
edge detection threshold, object/background threshold sense and threshold
offset.
5. Ultra-low power consumption: The developed system achieves relatively high
computational efficiency in comparison to traditional techniques.
¹ There is no constraint on the maximum number of objects that can be
processed in parallel. The only limiting factor is the address event communication capacity.
This chapter begins with the top-level system architecture, describing how the various
blocks are hierarchically arranged and interconnected, and how they can be scaled. The
cellular (tessellating) organisation follows, outlining the functional sub-blocks that
implement the algorithm. These sub-blocks are then each described in detail, including
circuit schematics with accompanying results. The fabricated prototypes are then discussed,
with a brief overview of their structure and contents. Finally, system-level results, both
simulated and measured, conclude the system description.
6.2 System Organisation
The system architecture [1], shown in Fig. 6.1, is based on a direct implementation of the
bio-pulsating contour reduction algorithm described previously in Chapter 4.
6.2.1 Pixel Array
The pixel array can be subdivided into four “corners”, each of dimensions (x/2, y/2), where
(x, y) is the array size (48×48). These sub-blocks are interconnected in the same way that
pixel elements are interconnected internally, with the exception of power supplies, bias
currents and control signals. Furthermore, each “corner” is synthesised using a unique set
of pixel types, depending on location. For example, the top-left corner block contains
left-edge, top-left-corner, top-edge and standard pixel types. In total there are nine
different pixel types: four edge pixels, four corner pixels and one standard (centre) pixel.
These pixel types differ only in their terminating edge configurations; for example, a
corner cell has two terminating edges, whereas an edge cell has only one. As will be seen
later, the I/O connections need to be terminated correctly for useful and error-free
operation.
[Figure 6.1 block labels: Master Control and Current Reference; Current/Supply/Control Distribution (sub-blocks A and B); Pixel Processing Array; Row and column control latches, encoder and arbitration tree; Asynchronous Handshake; Row/column selection; Column Readout; Row Readout; Address Event Representation (AER) Output; Global Control and Tuning]
Figure 6.1: The proposed ORASIS System architecture. Illustrated are the three
main components: pixel processing array, address event representation readout and cur-
rent/supply/control distribution tree. The dotted lines represent the “stretch” marks, i.e.
how the system can be scaled to a larger size array.
6.2.2 Global Signal Distribution
Due to the large number of in-pixel processors, a current distribution scheme is needed for
copying bias currents, together with a hierarchical fanout for distributing the digital
(control) signals.
For the given size of pixel array (48x48), the following four-level distribution tree/fanout
is proposed for current distribution:
• Corner (1 to 4): Four initial master bias currents are generated and used to supply
the four corner headers of the pixel array, i.e. one header in each corner block (shown
in Fig. 6.1: sub-block A).
• Row (1 to 24): Each master reference set is then used to make (y/2) copies feeding
every row header in its corner (shown in Fig. 6.1: sub-block B).
• Column (1 to 24): Each row header in turn is used to make (x/2) copies feeding every
pixel in its row.
• Pixel (1 to 4): Within each pixel these bias currents are locally combined and copied
further (discussed later).
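As a quick check on the hierarchy listed above, the fanout at each level can be multiplied out to confirm that the three copying levels ahead of the pixel reach every cell of the 48×48 array. This is a minimal illustrative sketch; the function names are hypothetical and are not taken from the design.

```python
# Hypothetical helper names; only the array size and fanout ratios come from the text.
ARRAY_X, ARRAY_Y = 48, 48


def fanout_levels(x, y):
    """Return (level name, copies made per parent) for each distribution level."""
    return [
        ("corner", 4),       # four master bias currents, one per corner block
        ("row", y // 2),     # (y/2) row headers fed in each corner
        ("column", x // 2),  # (x/2) pixels fed by each row header
    ]


def pixels_reached(x, y):
    """Multiply the fanout of every level to get the number of pixels supplied."""
    total = 1
    for _, copies in fanout_levels(x, y):
        total *= copies
    return total


print(pixels_reached(ARRAY_X, ARRAY_Y))  # 4 * 24 * 24 = 2304 = 48 * 48
```

The fourth (in-pixel) level is omitted because it copies currents within a cell rather than adding cells.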
Current-mode vs. Voltage-mode Current Distribution
In a current-mode scheme, at each stage of the current-distribution chain, the currents are
copied locally using devices in close proximity (and thus well matched) and subsequently
distributed along separate metal lines. This increases device count and metal area usage in
comparison to a voltage-mode current distribution scheme.
A voltage-mode scheme distributes a bias voltage over a large area to set device input
voltages, which in turn create the bias currents. However, the error contribution is
two-fold: systematic error in addition to increased mismatch. With a distributed bias
voltage, metal line resistance over a relatively large distance results in a linear voltage
gradient. When used to generate bias currents, this translates into a non-linear current
gradient through the device transconductance. Furthermore, mismatch increases because of
the large device separation, over which process variation gradients come into effect.
Therefore current-mode current distribution is preferred over voltage-mode distribution
for large area distribution to improve current matching at the expense of silicon area.
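The systematic error of the voltage-mode scheme can be illustrated numerically. The sketch below assumes the bias devices operate in weak inversion, so that I = I0·exp(Vgs/(n·φt)), and applies a linear voltage drop along a line of 48 cells. The slope factor and the 10 mV total drop are illustrative assumptions, not values from the design.

```python
import math

PHI_T = 0.0259   # thermal voltage at room temperature (V)
N_SLOPE = 1.5    # assumed subthreshold slope factor (illustrative)


def bias_current_ratio(delta_v, n=N_SLOPE, phi_t=PHI_T):
    """Generated bias current relative to nominal for a gate-voltage error
    delta_v, using the weak-inversion law I = I0 * exp(Vgs / (n * phi_t))."""
    return math.exp(delta_v / (n * phi_t))


# Assumed linear gradient: 0 to 10 mV of drop spread across 48 cells.
drops = [0.010 * k / 47 for k in range(48)]
ratios = [bias_current_ratio(-dv) for dv in drops]

# Equal voltage steps produce unequal current steps (a non-linear gradient).
first_step = ratios[0] - ratios[1]
last_step = ratios[-2] - ratios[-1]
print(ratios[-1], first_step, last_step)
```

Under these assumptions the far end of the line receives noticeably less current than the near end, which is the systematic error that local current copying avoids.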
Digital Fanout (Control Distribution)
The control signal fanout is arranged using the same tree hierarchy as the current
distribution. The digital signals are buffered at each level by means of a quad inverter
cascade of increasing dimensions. By designing the buffers' output transistors to be
relatively large, they can drive many relatively small input transistors in subsequent
levels, therefore ensuring fast and reliable operation.
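The text specifies a cascade of four inverters of increasing size but not the taper between them. A geometric taper is a common choice and is assumed in the sketch below, which sizes each stage for a level that drives 24 identical loads. The function name and the normalisation to a unit first stage are hypothetical.

```python
def tapered_stage_sizes(total_fanout, stages=4):
    """Relative drive strength of each inverter in a geometrically tapered
    chain, with the first stage normalised to 1 (assumed taper, not from the
    design)."""
    ratio = total_fanout ** (1.0 / stages)
    return [ratio ** k for k in range(stages)]


sizes = tapered_stage_sizes(24)  # one level of the tree drives 24 loads
stage_ratio = sizes[1] / sizes[0]
print(sizes, stage_ratio)
```

With an even number of stages the buffered signal keeps the polarity of its input.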
System Scalability
This proposed architecture allows for a certain degree of scalability (perhaps tenfold).
However, to scale to much larger array sizes, for example a 1 megapixel array, the
distribution hierarchy would have to be modified by increasing the number of
levels/duplication stages. The pixel circuitry and address-event hardware are, however,
fully scalable.
6.2.3 System Input/Output
The various I/O signals used to control and tune the system and subsequently communicate
processed data oﬀ-chip are deﬁned below:
• GLOBAL RESET: Initialises the state of the pixel array by resetting all the
distributed memory contents.
• LOCAL RESET: Defines whether localised resetting is enabled (realising a pulsating
action) or the system operates in single-shot mode, initiated through a global reset signal.
• THRES MODE: Defines whether background intensity is above (or below) object
intensity.
• OUTPUT SEL[1:0]: Selects the signal to be routed to the AER output: Centre, State or
Reset.
• GLOBAL AVERAGE: Selects whether the wide-field local average is computed as a
column average or global average.
• IGLOBAL: Adjusts the global average level by sinking or sourcing a correction current
to this node when GLOBAL AVERAGE is asserted.
• IBIAS: Provides edge detector bias current and deﬁnes artiﬁcial propagation delay
constant.
• ITUNE: Tunes edge detector sensitivity, i.e. threshold of ﬂagging edge detection.
• CHIP REQ: Chip request for oﬀ-chip communication; used to signal asynchronous
handshake to receiving device on having data ready for transmission.
• CHIP ACK: Chip acknowledge for oﬀ-chip communication; used to acknowledge suc-
cessful transmission of data from receiving device.
• X[5:0]: The X-coordinate of address being transmitted.
• Y[5:0]: The Y-coordinate of address being transmitted.
6.2.4 Pixel Organisation
The basic pixel organisation, illustrated in Fig. 6.2, is a direct implementation of the
bio-pulsating contour reduction algorithm described in Chapter 4.
The photodiode is a reverse-biased n-well/p-substrate junction (discussed in Chapter 5),
of dimensions 30 μm × 30 μm. This feeds the in-pixel analogue signal processing (ASP) core,
which smooths, averages, compares and thresholds as described earlier. The ASP generates
two signals, CONTOUR and THRESHOLD, which in turn feed the asynchronous binary processing
(ABP) core. The ABP core performs the contour reduction through asynchronous signal
propagation, flagging the centres on detection. On centroid detection, the output neuron
negotiates with the Address Event Representation (AER) core for a timing slot for
off-chip communication.
The following three sections describe the structure and make-up of these blocks (ASP,
ABP and AER), the fundamental circuit theory, circuit operation and simulated / measured
results.
[Figure 6.2 block labels: Photodiode; Analogue Signal Processor (ASP); Asynchronous Binary Processor (ABP); Output Neuron. External connections: adjacent cells (photodiodes); adjacent cells (local-, global-av, edges); adjacent cells (state, reset, centre); bias and tuning currents; output selection; chip control (threshold mode, averaging mode, global reset and local reset); row handshake; column handshake]
Figure 6.2: The proposed ORASIS Pixel architecture. Illustrated are the four main com-
ponents: sensor (photodiode), Analogue Signal Processor (ASP), Asynchronous Binary
Processor (ABP) and Address Event Representation (AER) neuron.
6.3 Distributed Analogue Signal Processing (ASP) Core
The purpose of using a distributed ASP core for feature extraction is to avoid
analogue-to-digital converters (ADCs) and thereby reduce power consumption (discussed more
rigorously in Chapter 2). Instead, analogue processing is used to reduce the computation to
a series of comparisons, for which simple comparators, i.e. 1-bit converters, can provide a
discrete, asynchronous output.
This section describes the distributed architecture and circuits for extracting the re-
quired CONTOUR and THRESHOLD signals from a matrix of photocurrents.
6.3.1 Architecture
External
The extracellular interconnectivity is illustrated in Fig. 6.3 and previously in Fig. 4.5. Each
pixel-cell requires 16 connections with adjacent cells to achieve the required computational
functionality (front-end image processing):
• Narrow-field local averaging (6 connections): Each cell receives three iphoto_in current
inputs from adjacent photodiodes (lower, right and lower-right) and in turn
transmits its iphoto_out current to three adjacent cells (left, upper and upper-left).
• Wide-field local averaging (2 connections): As column averaging is used, every cell
connects to a column averaging node shared with the upper and lower cells.
• Edge detection (4 connections): Every pixel-cell receives two vphoto_in inputs from
adjacent photodiodes (lower and right) and in turn transmits its vphoto_out
value to two adjacent cells (left and upper).
• Contour detection (4 connections): As each cell compares its photo-intensity with
that of the cells directly below and to the right, two edges are computed per pixel.
Therefore, to process all edges adjacent to a photodiode, two edge inputs (from left and
upper cells) and two edge outputs (to right and lower cells) are required.
Internal
The internal architecture of the ASP core is illustrated in Fig. 6.4.
The ASP core consists of three main (functional) blocks for: (1) averaging and com-
parison, (2) edge detection and (3) contour discrimination, in addition to some support
circuitry.
A cell's photo-voltage (Vphoto), being a log-compression of its photocurrent, is generated
at the front-end in the averaging and comparison block. This block computes the narrow-field
and wide-field local averages, then compares and thresholds them to determine whether a
given pixel is above or below the average intensity level (Vthreshold). In parallel, the
cell's photo-voltage is compared with those of neighbouring cells to determine whether it
lies on an edge. The four edge signals (to adjacent photodiodes) are then used in
conjunction with the THRESHOLD signal to determine whether a cell satisfies the CONTOUR
condition.
6.3.2 Averaging and Comparison
The schematic diagram for the ASP block responsible for photodetection, averaging and
comparison is given in Fig. 6.5.
[Figure 6.3 labels: Analogue Signal Processing (ASP) Core and Asynchronous Binary Processing (ABP) Core of cell (x,y), with ports icolumn, ip_in1 to ip_in3, ip_out1 to ip_out3, vp_in1, vp_in2, vp_out, ve_in1, ve_in2, ve_out1 and ve_out2 connecting to the corresponding ports of the neighbouring cells at (x±1, y), (x, y±1), (x−1, y−1) and (x+1, y+1)]
Figure 6.3: Symbol representation of the analogue signal processing (ASP) core. Illus-
trated is the external connectivity of analogue signals, i.e. with other cells, showing the
requirements for cellular tessellation. This excludes global control signals and bias point
connections. Nodes have been abbreviated for clarity as follows: vp=vphoto, ip=iphoto,
ve=vedge.
[Figure 6.4 block labels: PHOTODETECTION, SMOOTHING, COMPARISON; EDGE DETECTOR (HORIZ); EDGE DETECTOR (VERT); CONTOUR DETECTOR; BIAS GEN. Signals: Icolumn, MODE, Iphoto_in1 to Iphoto_in3, Iphoto_out1 to Iphoto_out3, Vphoto_in1, Vphoto_in2, Vphoto_out, Vedge_in1, Vedge_in2, Vedge_out1, Vedge_out2, Vthreshold, Vb, Itune, Ibias, Idelay, CONTOUR, THRESHOLD]
Figure 6.4: Schematic diagram of the in-pixel analogue signal processing (ASP) organisa-
tion. Illustrated is the internal connectivity emphasising the signal ﬂow path between the
various blocks.
[Figure 6.5 labels: photodiode D1 (30x30) and devices Q1 to Q13, arranged in four stages: Photodetection and log compression; Narrow-field local averaging; Wide-field local (column) averaging; Comparison and thresholding. Nodes: Vphoto, Vlocal, Vcolumn, Vthreshold. Currents: Iphoto, Iphoto_in1 to Iphoto_in3, Iphoto_out1 to Iphoto_out3, Ilocal, Icolumn (from all column cells), Min(Ilocal, Icolumn). Control: MODE]
Figure 6.5: Schematic diagram of the photodiode interface circuit including current-mode
narrow- and wide-ﬁeld averaging/smoothing and comparison.
Phototransduction: The pn-junction photodiode (D1) is reverse-biased through two stacked
diode-connected PMOS devices (Q1, Q2) connected to Vdd. The photocurrent range for the given
device under the expected light levels is from 100 fA (dark current) to 5 nA. Over this
current range, devices Q1 and Q2 operate in the weak inversion region, so the applied
reverse bias varies logarithmically with the photocurrent [2, 3], as expressed in Eqn. 6.1.
$$V_{photo} = V_{dd} - (V_{gs1} + V_{gs2}) \approx V_{dd} - 2 n \phi_t \ln\left(\frac{I_{photo}}{I_0}\right) \tag{6.1}$$
Where: n is the subthreshold slope factor, φt is the thermal voltage, Iphoto is the
photocurrent and I0 is the pre-exponential current.
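To give a feel for the log compression in Eqn. 6.1, the sketch below evaluates the photo-voltage over the quoted 100 fA to 5 nA range. The supply voltage, slope factor and pre-exponential current are illustrative assumptions, not values from the design; the swing per decade depends only on n and φt.

```python
import math

PHI_T = 0.0259  # thermal voltage kT/q at room temperature (V)


def v_photo(i_photo, vdd=3.3, n=1.5, i0=1e-15):
    """Eqn. 6.1: Vphoto ~= Vdd - 2*n*phi_t*ln(Iphoto/I0).

    vdd, n and i0 are assumed values chosen only for illustration.
    """
    return vdd - 2.0 * n * PHI_T * math.log(i_photo / i0)


# Voltage swing per decade of photocurrent (independent of vdd and i0).
swing_per_decade = v_photo(1e-12) - v_photo(1e-11)

# Total swing across the quoted photocurrent range, 100 fA to 5 nA.
total_swing = v_photo(100e-15) - v_photo(5e-9)
print(swing_per_decade, total_swing)
```

Under these assumptions, almost five decades of photocurrent are compressed into less than a volt of swing at the Vphoto node.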
Narrow-field local averaging: The node between the stacked devices is also used
to form a current mirror with devices Q3-Q6, providing scaled, copied currents for
current-mode averaging. Devices Q3-Q5 source copied photocurrents (Iphoto_out1,2,3) to
adjacent cells, and device Q6 receives and sums the copied photocurrents from adjacent
cells to form a four-pixel average current, such that
Ilocal = (Iphoto + Iphoto_out1 + Iphoto_out2 + Iphoto_out3)/2.
Wide-field local averaging: The wide-field local average is implemented using a
column averaging technique. The narrow-field smoothed currents (Ilocal) of all cells in a
column are copied (through current mirror Q7, Q8) and summed on a shared column node.
Normalisation is then achieved by copying this summed current using a distributed 1:1
current mirror per cell. This has the effect of forming an X:1 scaled mirror, with X being
the number of cells attached to the column (see Fig. 6.6).
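The two averaging stages can be mimicked behaviourally on a small array of photocurrents. The sketch below is an idealised model rather than a circuit simulation: it assumes row indices increase downwards, that neighbours beyond the array boundary contribute no current, and that the 1/2 mirror scaling of the expression above applies. The function names are hypothetical.

```python
def narrow_field(photo, scale=0.5):
    """Sum each cell's photocurrent with those of its lower, right and
    lower-right neighbours, scaled by the assumed mirror ratio."""
    rows, cols = len(photo), len(photo[0])

    def at(r, c):
        # Assumption: cells beyond the array edge contribute no current.
        return photo[r][c] if r < rows and c < cols else 0.0

    return [[scale * (at(r, c) + at(r + 1, c) + at(r, c + 1) + at(r + 1, c + 1))
             for c in range(cols)] for r in range(rows)]


def column_average(local):
    """Sum Ilocal on each shared column node and divide by the number of
    attached cells, modelling the distributed X:1 mirror."""
    rows = len(local)
    return [sum(row[c] for row in local) / rows for c in range(len(local[0]))]


def above_column_average(local, col_avg):
    """Idealised comparison of each cell's Ilocal with its column average."""
    return [[local[r][c] > col_avg[c] for c in range(len(col_avg))]
            for r in range(len(local))]


photo = [[1.0, 1.0, 1.0],
         [1.0, 5.0, 1.0],
         [1.0, 1.0, 1.0]]  # arbitrary units, one bright pixel
local = narrow_field(photo)
col_avg = column_average(local)
mask = above_column_average(local, col_avg)
print(local, col_avg, mask)
```

In the fabricated array the boundary behaviour is set by the dedicated edge and corner pixel types, so the zero-current termination used here is only a stand-in.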
Current-mode comparison: The narrow-field local (cellular) average is then compared
to the wide-field local (column) average by means of a basic current comparator formed by
an opposing source/sink transistor pair [4] (Fig. 6.5: Q10, Q12). If the device bias points
are similar, then both devices will be in saturation and the output voltage (Vthreshold) is
given by Eqn. 6.2.
$$V_{threshold} = \frac{I_{column}\,(1 + \lambda_p V_{dd})}{I_{local}\,(1 + \lambda_n) + I_{column}\lambda_p} \tag{6.2}$$
Where: λn and λp are the linear channel length modulation (Early) factors for NMOS and
PMOS devices respectively. If the input currents are exactly equal, this
simplifies to:
$$V_{threshold} = \frac{1 + \lambda_p V_{dd}}{1 + (\lambda_n + \lambda_p)} \tag{6.3}$$
[Figure 6.6 labels: input currents ilocal1, ilocal2 and ilocal3; each cell output is 1/3(ilocal1 + ilocal2 + ilocal3)]
Figure 6.6: Schematic representation of the wide-field (column) averaging mechanism, shown
for a three-row example.
However, if the source/sink bias points (saturation currents) are different, the device
with the higher bias point will be forced out of saturation, i.e. it will enter the ohmic
region. For example, in Fig. 6.5, if Icolumn > Ilocal, the source device (Q12) will operate in
the linear region, and the output voltage (Vthreshold) will swing upwards towards Vdd. This
behaviour is described by Eqn. 6.4 (for Ilocal < Icolumn) and Eqn. 6.5 (for Ilocal > Icolumn).
$$V_{threshold} = \frac{I_{column}\,(1 + \lambda_p V_{dd})}{I_{column}\lambda_p + I_{local}\phi_t} \tag{6.4}$$
$$V_{threshold} = \frac{I_{local}V_{dd} - I_{column}\phi_t}{I_{local} + I_{column}\lambda_n\phi_t} \tag{6.5}$$
Although the generated threshold voltage (Vthreshold) describes the current comparison
discretely for a substantial differential current, smaller current differences cause
Vthreshold to remain between Vss and Vdd. Such a signal is undesirable for driving CMOS
static logic, since a non-discrete input can give rise to large “short-circuit” currents.
Therefore a thresholding buffer is required to “square up” this signal to reliable discrete
levels (discussed later).
[Figure 6.7 labels: differential pair Q1, Q2 (9/8) with inputs Vphoto1 and Vphoto2; current mirror Q3, Q4, Q5 (3/10); bias Vb; currents Isink and Isource, with 2·Isink > Isource; edge nodes Vedge1 and Vedge2; gates X1, X2 and X3; Logic OR; output Vout]
Figure 6.7: Schematic diagram of the tunable discrete edge detector circuit. Details of the
current generation scheme (Isource and Isink) and implementation of thresholding inverters
(X1 and X2) are provided later.
Threshold mode option: To provide versatility across a wider range of image types, a
threshold mode option selects between a light object on a dark background and a dark
object on a light background. To achieve this, the threshold detection offset must be
shifted (inverted) to provide the correct margin for adequate noise rejection and
robustness to process variation. This is implemented by altering the effective device
aspect ratios, switching in additional devices in parallel with the sinking device
(Fig. 6.5: Q9, Q13). Furthermore, the discrete output must be inverted, which is easily
achieved using an XOR gate (see Fig. 6.4).
Global averaging option: This current-mode averaging/thresholding scheme is easily
extended to provide a global averaging option with an input for threshold correction. This
is achieved by switching all the column averaging nodes (Vcolumn) so that they are shorted
together, realising a single global average. If an external current is then sourced to or
sunk from this node, the global average threshold is adjusted higher or lower.
6.3.3 Edge Detecting and Contour Discrimination
The schematic given in Fig. 6.7 illustrates the edge detection circuitry [5] [6] inserted be-
tween every pair of adjacent pixels.
Edge Detection: Principle of Operation
The diode-connected devices (previously illustrated in Fig. 6.5) provide the logarithmically
compressed voltage inputs (Vphoto1 and Vphoto2). Two diode-connected devices are used to
ensure the current sourcing device (not shown) remains in saturation for small
photocurrents. The differential voltage (Vphoto1, Vphoto2) is applied to the PMOS
differential pair (Q1 and Q2) sourced by the current Isource. The differential pair tail
currents are sunk via the current mirror (Q3, Q4 and Q5), which is controlled by the sink
current Isink. The operation is as follows:
• Isource is selected such that it generates suﬃcient transconductance and therefore
gain to ensure reliable operation for the minimum response time required (limited
by response of photodiodes). Too high a value would result in a useless increase in
power consumption (for this is static power dissipation) with no increase in system
performance. Furthermore, both the bias currents and device sizes are selected such
that all the devices remain (as much as possible) within a single region of operation;
for the expected current levels, in weak inversion. This is to avoid any asymmetric
behaviour due to some devices having a signiﬁcant drift-current inﬂuence, i.e. entering
moderate inversion.
• Isink is adjusted to lie between Isource/2 and Isource and sets the allowed tolerance
before an edge is flagged. It sets the gate-source voltages of devices Q4 and Q5, which in
turn determine the maximum current that can be sunk from the drains of Q4 and Q5 (Id4max
and Id5max respectively). Assuming devices Q1 and Q2 are ideally matched, this circuit
operates in one of two states:
1. (Vphoto1 = Vphoto2): Since Isource/2 < Isink < Isource, Id1 < Id4max, forcing
device Q4 into the ohmic region. This in turn causes Vedge1 to sit barely above
ground; similarly, Q5, Id2 and Vedge2 behave in the same way. As Vedge1 and Vedge2
are both low, Vout outputs high, indicating there is no edge.
2. (Vphoto1 ≠ Vphoto2): For example, if Vphoto1 < Vphoto2 such that Id1 = Id4max,
then device Q4 is in saturation and Vedge1 rises to just below Vdd. However,
Id2 < Id5max, so device Q5 is still in the ohmic region, keeping Vedge2 low. This
results in Vout outputting low, indicating there is an edge.
Edge Detection: Circuit Analysis
Assuming devices Q1 and Q2 are operating in saturation, the differential output current
is given by Eqn. 6.6.
$$I_{d1} - I_{d2} = I_{source} \tanh\left(\frac{V_{photo1} - V_{photo2}}{2 n \phi_t}\right) \tag{6.6}$$
Where: n accounts for the charge effect due to the substrate (also referred to as the slope
factor or subthreshold constant) and φt is the thermal voltage (φt = kT/q = 25.9 mV at room
temperature).
This can be split to provide the single-ended tail currents described in Eqns. 6.7 and
6.8.
$$I_{d1} = \frac{1}{2} I_{source} \left[1 + \tanh\left(\frac{V_{photo1} - V_{photo2}}{2 n \phi_t}\right)\right] \tag{6.7}$$
$$I_{d2} = \frac{1}{2} I_{source} \left[1 - \tanh\left(\frac{V_{photo1} - V_{photo2}}{2 n \phi_t}\right)\right] \tag{6.8}$$
From Eqn. 6.6, the large-signal (6.9) and small-signal (6.10) transconductance of the
differential pair can be derived [7].
$$G_M = \frac{d(I_1 - I_2)}{d(V_1 - V_2)} = \frac{I_{source}}{2 n \phi_t} \, \mathrm{sech}^2\left(\frac{V_{photo1} - V_{photo2}}{2 n \phi_t}\right) \tag{6.9}$$
$$g_m = \frac{I_{source}}{2 n \phi_t} \tag{6.10}$$
Furthermore, Eqn. 6.9 can be used to express the range of values for which the circuit
will flag a detected edge.
$$I_{sink} - \left[G_M \left(V_{error} + |V_{photo1} - V_{photo2}|\right)\right] < 0 \tag{6.11}$$
Where: Verror is the error term expressing the total mismatch error in the differential pair
as an input-referred voltage. This expression links directly to those developed previously
for algorithm-based edge detection robustness in Eqns. 4.16 and 4.17.
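The two-state behaviour can be sketched as an idealised model built from Eqns. 6.1 and 6.6. Substituting Eqn. 6.1 into Eqn. 6.6 gives Vphoto1 − Vphoto2 = 2nφt·ln(Iphoto2/Iphoto1), so the tanh argument reduces to ln(Iphoto2/Iphoto1). This assumes the same slope factor n applies to the diode stack and the differential pair, and that mismatch is ignored (Verror = 0). The default bias values are those quoted for the simulation of Fig. 6.8; the function names are hypothetical.

```python
import math


def branch_currents(i_photo1, i_photo2, i_source):
    """Branch currents of the differential pair from Eqns. 6.1 and 6.6.

    Assumes a shared slope factor n, so the tanh argument of Eqn. 6.6
    reduces to ln(Iphoto2 / Iphoto1).
    """
    diff = i_source * math.tanh(math.log(i_photo2 / i_photo1))
    return 0.5 * (i_source + diff), 0.5 * (i_source - diff)


def edge_detected(i_photo1, i_photo2, i_source=1e-9, i_sink=600e-12):
    """An edge is flagged when either branch current exceeds what the
    sink devices can absorb (ideal matching assumed)."""
    id1, id2 = branch_currents(i_photo1, i_photo2, i_source)
    return id1 > i_sink or id2 > i_sink


def edge_ratio_threshold(i_source=1e-9, i_sink=600e-12):
    """Photocurrent ratio at which a branch current just reaches Isink."""
    return math.sqrt(i_sink / (i_source - i_sink))


print(edge_detected(1e-10, 1.1e-10), edge_detected(1e-10, 1.3e-10))
print(edge_ratio_threshold())
```

In this idealised model the detection window depends only on the ratio of the two photocurrents and on Isink/Isource, not on the absolute light level, and the result is symmetric in the two inputs, so the sign convention of Eqn. 6.6 does not affect it.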
Although Eqn. 6.11 yields at least one discrete edge signal (Vedge1, Vedge2) for a moderate
differential input voltage (Vphoto1 − Vphoto2), smaller variations result in graded output levels.
This is due to the differential pair having a finite gain and thus operating as a transconductor
rather than a comparator. As a result, devices Q1-Q5 are in saturation and the edge output
voltages are given by Eqns. 6.12 and 6.13.
$$V_{edge1} = \frac{1}{\lambda_n} \frac{I_{sink}}{I_{source}} \left(1 + e^{\frac{V_{photo2} - V_{photo1}}{n\phi_t}} - \frac{I_{source}}{I_{sink}}\right) \tag{6.12}$$
$$V_{edge2} = \frac{1}{\lambda_n} \frac{I_{sink}}{I_{source}} \left(1 + e^{\frac{V_{photo1} - V_{photo2}}{n\phi_t}} - \frac{I_{source}}{I_{sink}}\right) \tag{6.13}$$
Where: λn is the linear channel length modulation (Early) factor for the sinking devices,
assuming Vds > 3φt (for devices Q3-Q5). This expression ignores any current source
non-idealities (Early effect).
As discussed earlier (in threshold detection), such intermediate (non-discrete) voltage
levels are undesirable for driving CMOS logic. Therefore thresholding buffers are inserted
between the edge signals (Vedge1 and Vedge2) and the logic NOR gate.
Edge Detection: Results
The intended edge detection functionality is illustrated in the simulation results given in
Figs. 6.8 and 6.9. Figure 6.8 shows the operation of the circuit for set bias currents (Isource
and Isink), but with varying photocurrents (Iphoto1 and Iphoto2). Results are given for ten
different Iphoto1 levels spanning over three orders of magnitude, with Iphoto2 swept over the
entire range (X-axis). The “window” of edge detection is seen (Fig. 6.8(e)) to exhibit excellent
linearity throughout the tested photo-intensity range. Figure 6.9 shows the operation
of the circuit for a set bias current Isource and photocurrent Iphoto1 with varying bias current
Isink, with Iphoto2 swept over the entire range (X-axis). This demonstrates the tunability of
the edge detection window using a single current-mode input.
Furthermore, the current consumption profile for a 1nA bias is illustrated in the simulated
results (Figs. 6.8(f) and 6.9(f)). This shows an average 3.5nA total current consumption
(excluding photocurrents) over all operating regions, with a peak consumption of 7-8nA
at the onset of edge detection. Therefore the total average power consumption per edge
detector block is 6.5nW. The peaking mentioned previously is due to the current-limiting
operation in the thresholding inverters for intermediate input voltages (discussed later).
Figure 6.8: Simulation results of the edge detector circuit illustrating the discrete detection
at varying light intensities. Results are for: Isource = 1nA, Isink = 600pA, with 1pA ≤
Iphoto1, Iphoto2 ≤ 10nA. Shown (from top to bottom) are: (a) Id1, (b) Id2, (c) Vedge1, (d)
Vedge2, (e) Vout and (f) Ivdd
Figure 6.9: Simulation results of the edge detector circuit illustrating the tunable sensitivity.
Results are for: Isource = 1nA, 500pA ≤ Isink ≤ 1nA, Iphoto1 = 300pA, with 1pA ≤ Iphoto2 ≤
10nA. Shown (from top to bottom) are: (a) Id1, (b) Id2, (c) Vedge1, (d) Vedge2, (e) Vout and
(f) Ivdd
Figure 6.10: Monte Carlo simulation results for the edge detector illustrating variability of
edge detection window to process variation and mismatch. For Isource = 2nA, Isink = 1.5nA,
Iphoto1 = 300pA, 1pA ≤ Iphoto2 ≤ 10nA, statistical simulation of N = 1979 runs results in:
μlower = 176.009pA, σlower = 32.190pA, μupper = 521.234pA, σupper = 98.071pA
Statistical simulation results show acceptable variability in edge detection threshold
with process variation and device mismatch. The histogram given in Fig. 6.10 shows the
full range (±3σ) variation of current level to be 75-275pA for lower edge threshold and
325-850pA for upper edge threshold, for the given example. Although for the chosen device
sizing, the statistical spread is not optimal (i.e. the design criteria are for power and area
optimisation), the observed non-overlap band between upper and lower edge threshold levels
indicates good yield and robust operation.
Contour Discrimination
The schematic given in Fig. 6.11 illustrates the contour discrimination logic present in
every cell. This takes its (edge) inputs from the two internal (to that cell) edge detectors
(right and bottom edges), the cell to the left (left edge) and the cell above (upper edge).
Furthermore, the threshold input eﬀectively inhibits static (spatial) noise from aﬀecting the
signal propagation within objects (discussed previously in Chapter 4).
Figure 6.11: Schematic diagram of the contour discrimination combinational logic.
Figure 6.12: Schematic diagram of the in-pixel current distribution circuit, providing bias
currents to the edge detecting and timing delay blocks.
6.3.4 Bias Distribution
Each pixel receives two individual bias currents: Ibias and Itune. These are used to generate
all accurate² in-pixel bias currents through current summation of copied currents (see
Fig. 6.12). This generates Isource1 = Isource2 = 2Ibias, Isink1 = Isink2 = Ibias + Itune and
Idelay = Ibias/2.
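The current summation above can be written out as a short calculation. The following Python sketch is illustrative only: the function name is hypothetical, ideal (error-free) current copies are assumed, and the example values Ibias = 1nA and Itune = 250pA are borrowed from the test conditions listed in Table 6.2.

```python
def in_pixel_bias_currents(i_bias, i_tune):
    """Return the in-pixel bias currents (in amperes) derived from Ibias and Itune.

    Ideal current copies are assumed; mismatch in the mirrors is ignored.
    """
    return {
        "Isource1": 2.0 * i_bias,
        "Isource2": 2.0 * i_bias,
        "Isink1": i_bias + i_tune,
        "Isink2": i_bias + i_tune,
        "Idelay": 0.5 * i_bias,
    }


currents = in_pixel_bias_currents(i_bias=1e-9, i_tune=250e-12)
for name, value in currents.items():
    print(f"{name:9s} = {value * 1e9:.3f} nA")
```

Because Itune only enters the sink currents, it adjusts the edge detection window without disturbing the source or delay currents.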
6.3.5 Thresholding
To achieve the power-efficient binary extraction functionality previously mentioned, a
thresholding inverter has been strategically inserted in the edge detection and local/global
averaging/comparison functions. This has the task of converting an analogue voltage to a discrete
level (1-bit conversion) with minimum power consumption. The implemented circuit solution
is illustrated in Fig. 6.13.
² Refers to relatively accurate currents (±5%), in comparison to voltage-mode distributed
currents (±25%).
Figure 6.13: Schematic diagram of the current-limiting thresholding inverter, used for 1-bit
conversion.
The basic concept is to threshold the signal using a three-stage cascade of logic inverters.
Since the “short-circuit” current component would be prohibitively high in implementing this
directly, the inverters' power supply (Vdd) is connected through a current-sourcing
device. This has the effect of limiting the short-circuit current at each stage. By
scaling this limiting current, a successively squarer signal is obtained in successive stages.
Furthermore, as the circuit topologies demand a low threshold detection point (≈ Vdd/6),
a scaled PMOS current (limiting) mirror can help achieve this low threshold point. For a
1nA initial stage bias, the optimum (regarding power consumption) current-limit ratio is
1:6:16.
Statistical simulation results show acceptable variability in threshold voltage with process
variation and device mismatch. The histogram given in Fig. 6.14 shows the full range (±3σ)
variation of threshold voltage to be 200-350mV. Although this represents a signiﬁcant 8.3%
variation over the power supply range, the target input signals (Vedge1, Vedge2 and Vthreshold)
exhibit a steep roll-oﬀ in this region and therefore some variation in threshold level will
translate to negligible output referred error.
Figure 6.14: Monte Carlo simulation results for the thresholding inverter illustrating
variability of threshold voltage to process variation and mismatch. μ = 275.322mV, σ = 29.028mV,
N = 2000
6.4 Distributed Asynchronous Binary Processing (ABP) Core
Having extracted the important features by means of the ASP core, the image data has
been reduced to two binary bits per pixel: CONTOUR and THRESHOLD. Subsequently,
using these inputs to feed and synchronise a distributed, asynchronous binary network,
computationally-efficient spatiotemporal processing can be achieved on another level.
This section describes the distributed architecture and combinational circuits for
facilitating the object segmentation and centroid extraction from a matrix of CONTOUR and
THRESHOLD binary inputs.
6.4.1 Architecture
External
The extracellular interconnectivity is illustrated in Fig. 6.15 and previously in 4.4. Each
pixel-cell requires 52 connections with adjacent cells to achieve the required computational
functionality (segmentation and centroiding):
• STATE (12 connections): Each cell receives eight STATE inputs from adjacent cells
(four from directly adjacent and four from a three-cell proximity, i.e. three pixels to
the right, three pixels above, etc) and consequently transmits its STATE contents to
its four directly adjacent cells.
• RESET (8 connections): Each cell receives four RESET inputs from directly adjacent
cells and transmits its RESET status back to them.
• CENTRE (16 connections): Each cell receives eight CENTRE inputs from adjacent
cells (four directly adjacent and four diagonally adjacent) and transmits its CENTRE
status back to them.
• Signal feed-through (16 connections): Provides two feed-through connections in each
direction (i.e. two leftwards, two upwards, etc) for three-cell proximity state signalling.
Figure 6.15: Symbol representation of the in-pixel asynchronous binary processing block.
Illustrated is the external connectivity of discrete signals, i.e. with other cells, showing
the requirements for cellular tessellation. This excludes global control signals and output
signals. Nodes have been abbreviated for clarity as follows: S=STATE, R=RESET,
C=CENTRE.
Figure 6.16: Schematic diagram of the Asynchronous Binary Processing (ABP) organisation.
Illustrated is the internal connectivity emphasising the signal flow path between the
various blocks.
Internal
The internal architecture of the ABP core is illustrated in Fig. 6.16.
The ABP core consists of three main (functional) blocks for: (1) state setting (inward
propagation), (2) state resetting (back-propagation) and (3) state and centroid storage, in
addition to some support circuitry.
The state is initially set if a contour is defined (initiation) or if a neighbouring cell
signals a state (propagation). A preset delay is added in the propagation path to limit the
signalling speed. In parallel, the centroid detection checks for a centre-surround condition
(i.e. if surrounding cells are set and the centre cell is unset). On centroid detection a reset
signal is back-propagated through all set cells, until it is blocked by a contour object limit.
Furthermore, a centroid detection will trigger an off-chip address-event, discussed in detail
in Section 6.5.
Figure 6.17: Schematic diagram of the state set logic facilitating the forward asynchronous
signal propagation.
6.4.2 State Set
The complete combinational logic required to facilitate the state set functionality is
illustrated in Fig. 6.17. This block generates the SET, SURROUND and RESET_INHIBIT
signals:
• The SET signal is asserted if an adjacent cell has its state set in addition to the
THRESHOLD signal being asserted. Alternatively, a SET signal can be generated by
a CONTOUR signal provided the SET_INHIBIT condition does not block this. The
function of the SET_INHIBIT signal is to avoid oscillation between SET and RESET
on completion of a local (object-based) reset cycle.
• The SURROUND signal is generated to assist the centroid detection. It is asserted
if the surrounding cells received (three-pixel proximity) have their states set.
• The RESET_INHIBIT signal is generated if any adjacent cells do not have their states
set. The purpose of this is described in the following section.
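The three conditions listed above can be summarised as Boolean expressions. The following Python sketch is a behavioural reading of the bullet list only, not a gate-level model of the schematic: the function and argument names are hypothetical, signal polarities are abstracted (True means the condition is active), and it is assumed that all four three-pixel-proximity states must be set for SURROUND.

```python
def state_set_logic(s_adjacent, s_three_away, contour, threshold, set_inhibit):
    """Behavioural sketch of the state set block.

    s_adjacent   -- STATE of the four directly adjacent cells (S1..S4)
    s_three_away -- STATE of the four cells at three-pixel proximity (3S1..3S4)
    set_inhibit  -- True while the inhibit condition is active (polarity abstracted)
    """
    propagate = any(s_adjacent) and threshold   # a neighbour is set and THRESHOLD is asserted
    initiate = contour and not set_inhibit      # CONTOUR starts a state unless inhibited
    return {
        "SET": bool(propagate or initiate),
        "SURROUND": all(s_three_away),          # assumption: all four must be set
        "RESET_INHIBIT": not all(s_adjacent),   # some adjacent cell is not yet set
    }


# One adjacent cell is set inside a thresholded region: the state propagates.
print(state_set_logic([True, False, False, False], [False] * 4,
                      contour=False, threshold=True, set_inhibit=False))
```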
Figure 6.18: Schematic diagram of the state reset logic facilitating the reverse signal (back)
propagation. The schematic is annotated with τ=20ns.
6.4.3 State Reset
The complete combinational logic required to facilitate the state reset functionality is
illustrated in Fig. 6.18. This block generates the RESET and SET_INHIBIT signals:
• The RESET signal is asserted if the LOCAL_RESET mode is enabled and the cell
signals a CENTRE (logic X3 and X5). A RESET_INHIBIT signal delays the RESET
signal being generated until the inward propagation reaches the central cell, to provide
a continuous back-propagation path. Alternatively, a RESET signal can be generated
if the STATE is set and any adjacent cell back-propagates a RESET (logic X4 and
X5). On generating a RESET signal, a monostable is triggered (logic X6, X8-10) to
produce a RESET pulse of sufficient duration to reliably back-propagate to adjacent cells.
• The SET_INHIBIT signal (active low) is asserted if any adjacent cells are resetting.
This ensures the back-propagation has reliably terminated before a forward
propagation can commence.
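The reset conditions described above can likewise be sketched as Boolean expressions. This Python fragment is a simplified behavioural reading, with hypothetical names: it assumes RESET_INHIBIT simply gates the centre-initiated reset, it abstracts signal polarity (True means active, although the text notes SET_INHIBIT is active low in the circuit), and it does not model the monostable pulse stretching.

```python
def state_reset_logic(centre, state, local_reset, reset_inhibit, r_adjacent):
    """Behavioural sketch of the state reset block.

    r_adjacent -- RESET status of the four directly adjacent cells (R1..R4)
    """
    # Reset initiated at the centroid, held off while RESET_INHIBIT is active (assumed gating).
    initiate = local_reset and centre and not reset_inhibit
    # Reset back-propagated from a neighbour through a cell whose STATE is set.
    propagate = state and any(r_adjacent)
    return {
        "RESET": bool(initiate or propagate),
        "SET_INHIBIT": any(r_adjacent),  # active while any adjacent cell is resetting
    }


# A detected centre starts the back-propagation once the inhibit has cleared.
print(state_reset_logic(centre=True, state=False, local_reset=True,
                        reset_inhibit=False, r_adjacent=[False] * 4))
```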
6.4.4 State Memory
The complete combinational logic required to facilitate the state memory functionality is
illustrated in Fig. 6.19. This block completes the asynchronous state machine and provides
the STATE and CENTRE signals:
• The STATE signal is latched high on assertion of a SET input and conversely it is
latched low on assertion of a RESET input.
• The CENTRE signal is asserted when a high SURROUND signal is received in addition
to the cell's STATE being low. However, this is inhibited if any neighbouring cell
(directly adjacent or diagonal) flags a centre. This ensures a single centre is detected
in the duration between the CENTRE signal being asserted and the RESET signal
being issued.
Figure 6.19: Schematic diagram of the state memory logic for centroid determination and
state definition.
Figure 6.20: Schematic diagram of the current-controlled delay circuit for creating an
asynchronous discrete delay. The integrating capacitor C1 is labelled 68.5fF in the schematic.
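The STATE latch and CENTRE condition described in the list above can be sketched as a small stateful model. This Python fragment is illustrative only: the class and argument names are hypothetical, RESET is assumed to dominate if SET and RESET are asserted together (the text does not specify a priority), and the asynchronous circuit is approximated by evaluating CENTRE after each latch update.

```python
class StateMemory:
    """Behavioural sketch of the state memory block (STATE latch plus CENTRE logic)."""

    def __init__(self):
        self.state = False  # STATE output of the set/reset latch

    def update(self, set_, reset, surround, c_neighbours):
        """Apply SET/RESET, then evaluate CENTRE.

        c_neighbours -- CENTRE flags of the eight neighbouring cells (C1..C8)
        """
        if reset:            # assumption: RESET dominates a simultaneous SET
            self.state = False
        elif set_:
            self.state = True
        # CENTRE: surround is set, own state is low, and no neighbour already flags a centre.
        centre = surround and not self.state and not any(c_neighbours)
        return self.state, centre


cell = StateMemory()
print(cell.update(set_=False, reset=False, surround=True, c_neighbours=[False] * 8))
print(cell.update(set_=True, reset=False, surround=True, c_neighbours=[False] * 8))
```

The first call models an unset cell whose surround is complete, so it flags a centre; once the cell itself is set, the centre condition no longer holds.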
6.4.5 Delay
As mentioned previously, an artiﬁcial delay is inserted in the SET signal path; between
the STATE SET logic and the internal STATE MEMORY cell (See Fig. 6.16). The circuit
implementation of this delay cell is given in Fig. 6.20.
Figure 6.21: Measured delay cell performance for different bias currents (Idelay = 125pA,
200pA, 400pA, 500pA and 800pA), illustrating statistical variation over a batch of ten
samples. Axes: delay (μs) against count.
The delay circuit has a binary input (IN), a binary output (OUT) and two current inputs
(Ilimit and Idelay). The Ilimit current defines the current limit on the thresholding inverter
circuit, as described previously in Fig. 6.13. When the input is high, the Idelay current is
switched (by device Q1) into a capacitive, integrating node, Vdelay. A metal-insulator-metal
(MIM) capacitor (C1) is used to ensure good linearity and matching. Subsequently, Vdelay
is connected to a three-stage NMOS current-limited thresholding inverter, whose threshold
sits approximately 300mV below Vdd. Therefore for a fixed current, the time delay is given
by:
$$\tau_{delay} = \frac{Q_c}{I_{delay}} = \frac{C \cdot V_{threshold}}{I_{delay}} \tag{6.14}$$
Where: τdelay is the time delay, Qc is the charge stored on the capacitor and Vthreshold is
the threshold voltage of the inverter cascade. For example, an Idelay of 1nA would result in
a delay of 102μs.
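Eqn. 6.14 can be checked numerically. The following Python sketch uses the 68.5fF capacitor value labelled in Fig. 6.20 and a threshold 300mV below the supply; the 1.8V supply is an assumption (the nominal value for a 0.18μm process), and the function name is hypothetical. Under these assumptions, a 1nA delay current gives close to the 102μs quoted above.

```python
C1 = 68.5e-15              # integrating MIM capacitor in farads (label in Fig. 6.20)
V_DD = 1.8                 # supply voltage in volts: assumed nominal value
V_THRESHOLD = V_DD - 0.3   # inverter threshold, approximately 300mV below Vdd


def delay_seconds(i_delay, c=C1, v_threshold=V_THRESHOLD):
    """Time for a constant current i_delay to charge c up to v_threshold (Eqn. 6.14)."""
    return c * v_threshold / i_delay


# Bias currents of the measured panels in Fig. 6.21, plus the 1nA example.
for i_pa in (125, 200, 400, 500, 800, 1000):
    tau = delay_seconds(i_pa * 1e-12)
    print(f"Idelay = {i_pa:4d} pA -> delay = {tau * 1e6:6.1f} us")
```

The delay scales inversely with Idelay, so halving the bias current doubles the propagation delay.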
Figure 6.21 presents measured results for the current delay circuit block over a batch
of ten fabricated dies. Although a 10-20% spread can be observed, it has to be taken
into account that these dies were collected from different wafers and therefore represent
different process corners. Corner simulations demonstrate similar variations in performance.
Furthermore, Monte Carlo mismatch simulations suggest that similar delay cells fabricated
on the same die, in close proximity, will match in performance to within a 5-10% spread.
6.5 Address Event Representation (AER)
As the described system is both asynchronous and data-driven in nature, it is ideally suited
to an event-driven output. One such protocol is the Address Event Representation [3], used
extensively in the vision chip arena. The principle behind this data-transmission mechanism
is that each pixel has a unique identifier (i.e. its co-ordinate) and when a pixel registers an
event this identifier is asserted onto a digital bus. The data is then communicated off-chip
by means of an asynchronous handshake.
This section describes the specific AER architecture [8] adopted and the accompanying
blocks implemented for off-chip communication.
6.5.1 Architecture
The specific AER architecture implemented is given in Fig. 6.22, illustrated for a 4x4 array.
Each pixel in the array has a sender neuron that latches a pixel's state on an event until
the data has been communicated off-chip. The sender neuron initially sends an arbitration
request to the row (Y) arbitration tree. The role of the arbitration tree is to select a single
output in the event of multiple inputs. On selection of a particular row, the row header is
latched and subsequently the competition passes to the column arbitration tree. A similar
process then occurs from the sender neurons to the column arbitration tree and back to the
column headers until a single column has been latched. On selection of both a row and
column, the chip sends a bus request signal off-chip to the receiving device. The address
is read off the bus and then a bus acknowledge signal is relayed back to reset the row and
column latches, which consequently reset the sender neuron state. This selection/arbitration
process is then repeated for all events waiting to be transmitted.
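The row-then-column selection sequence described above can be sketched in software. This Python fragment is a simplified, sequential illustration with hypothetical names: the hardware arbiters resolve contention asynchronously, whereas here the lowest pending row and then the lowest pending column in that row are chosen, which is purely an assumption made to keep the example deterministic.

```python
def aer_readout(events):
    """Yield (x, y) addresses one at a time from a collection of pending pixel events."""
    pending = set(events)
    while pending:
        # Row (Y) arbitration: a single row with pending events is selected and latched.
        row = min(y for _, y in pending)
        # Column (X) arbitration among the sender neurons of the latched row.
        col = min(x for x, y in pending if y == row)
        # Address is placed on the bus; bus request / acknowledge handshake takes place.
        yield (col, row)
        # The acknowledge resets the latches and the sender neuron of the serviced pixel.
        pending.discard((col, row))


print(list(aer_readout({(2, 1), (0, 3), (3, 1)})))  # [(2, 1), (3, 1), (0, 3)]
```

Each event is transmitted exactly once, and the loop ends when no sender neuron remains latched.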
6.5.2 Sender Neuron Circuit
The circuit implementation of the sender neuron block is shown in Fig. 6.23. On a pixel
signalling an event, the edge-triggered RS flip-flop (X2) is latched. Subsequently, devices
Q1-Q3 are used to divert the output signal flow to negotiate row and then column arbitration.
On successful off-chip communication both a row and column reset (Y_PIXEL-RESET
and X_PIXEL-RESET) will be received, thus resetting the flip-flop.
The X_ARB.REQ and Y_ARB.REQ lines require shared pull-up biasing to avoid floating
nodes during low activities.
Figure 6.22: Address-Event-Representation (AER) architecture for a 4x4 array. Illustrated
are all required sub-blocks for an AER sending device, excluding pull-up and pull-down
biases for shared line drivers.
Figure 6.23: Schematic diagram of the sender neuron circuit facilitating the pixel handshake
with the AER row/column latches.
6.5.3 Column/Row Latch
The circuit implementation of the column/row latch block is illustrated in Fig. 6.24. On a
pixel signalling an event, the ARB.REQ signal is inverted and passed to the arbitration tree.
On arbitration, the ARB.ACK of a single row/column latch is asserted, which latches the
RS flip-flop (X1, X2). Consequently, the BUS.REQ signal is asserted to alert the receiving
(off-chip) device that an event is waiting to be read. In the case of a two-dimensional
arbitration tree, as used in ORASIS, the row and column BUS.REQ signals are AND’ed to
produce a single chip request. On successful off-chip communication, the receiving device
relays a BUS.ACK signal that resets the selected row and column latches and issues the
PIXEL-RESET signals to reset the sender neuron within the sending pixel.
The BUS.REQ line requires a shared pull-down bias to avoid a floating node during
low activities.
Figure 6.24: Schematic diagram of the row and column latch circuit, locking a pixel’s
address upon arbitration until it has been successfully transmitted off-chip.
6.5.4 Arbiter Circuit
The circuit implementation of the arbiter block, used to synthesise the row and column
arbitration trees, is illustrated in Fig. 6.25. The arbitration tree has the task of selecting one
of many requests, facilitated through a binary tree hierarchy. The arbiter cell operates on a
single input pair, i.e. by selecting one of two outputs, resolving contention by using a
high-gain positive feedback element. The arbiter operates as follows:
• If ARB1.REQ and/or ARB2.REQ is asserted, ARB.REQ is asserted to a higher
arbitration level (logic OR operation).
• If either ARB1.REQ or ARB2.REQ is asserted, the RS flip-flop (X2, X3) is steered
accordingly and, on the arbiter receiving an ARB.ACK signal from a higher level, it
signals an ARBX.ACK to the requesting branch.
• If neither ARB1.REQ nor ARB2.REQ is asserted, the RS flip-flop enters an
undefined state; however, this does not affect the operation since it will not pass an
ARB.REQ signal to a higher level.
• If both ARB1.REQ and ARB2.REQ are asserted, the RS flip-flop selects the last
asserted and, on receiving an ARB.ACK signal from a higher level, it signals an
ARBX.ACK to the selected branch.
6.5.5 Address Encoder
The address encoder circuit is based on a wired-OR topology, shown in Fig. 6.26. This
implementation is both reliable and effective for encoding the output, as the arbitration
tree can only select a single output. Therefore each output can be hard-wired to assert the
required digital representation on the AER bus.
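The wired-OR encoding principle can be illustrated with a short sketch for the eight-input, 3-bit case. This Python fragment is an assumption-laden illustration rather than a model of the transistor-level circuit: the function name is hypothetical, the acknowledge lines are indexed from zero, and the mapping of line k to the binary code of k is assumed.

```python
def wired_or_encode(ack_lines, n_bits=3):
    """Encode acknowledge lines onto a bus where each line is hard-wired to its address bits.

    Returns a list of bus bits, least significant first (bus[0] corresponds to BUS_BIT0).
    """
    bus = [0] * n_bits  # each bus line idles low (pull-down bias)
    for index, asserted in enumerate(ack_lines):
        if asserted:
            for bit in range(n_bits):
                if (index >> bit) & 1:
                    bus[bit] = 1  # the asserted line drives every bus bit it is wired to
    return bus


# Only line 5 (binary 101) is asserted, so BUS_BIT0 and BUS_BIT2 go high.
print(wired_or_encode([False] * 5 + [True] + [False] * 2))  # [1, 0, 1]
```

The encoding is only unambiguous because the arbitration tree guarantees that a single line is asserted at a time; two simultaneous lines would OR their codes together.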
Figure 6.25: Schematic diagram of the arbiter circuit, interconnected hierarchically to
synthesise the arbitration trees for row and column selection.
One pull-down bias is required per bus line (bit), to prevent the bus output from ﬂoating
during low activities.
6.6 Fabricated prototypes
The proposed system was developed and fabricated in two stages: ORASIS-P1, the test chip,
and ORASIS-P2, the full system (48x48 array). These integrated circuits are implemented
in a standard, commercially available CMOS process: UMC 0.18μm single-poly, six-metal
layer, triple-well (MM/RF) technology.
6.6.1 ORASIS-P1
The first fabricated circuit was developed to demonstrate the feasibility of the proposed
distributed analogue signal processing (ASP) and asynchronous binary processing (ABP)
architectures. A 4x8 element ASP and 1x15 element ABP were separately implemented
to verify the expected operation of the two custom (distributed) processing cores.
Furthermore, several test structures and prototype circuits were included for validation and
characterisation. The layout of this test chip (ORASIS-P1) is illustrated in Fig. 6.27.
Details of the test platform used to verify this IC are given in Appendix D.
Figure 6.26: Schematic diagram of the address encoder circuit, shown for an eight input,
3-bit output example.
Figure 6.27: The ORASIS-P1 test chip layout (top), microphotograph (top right) and
basic floorplan (bottom); implemented in UMC 0.18μm 1P6M mixed-mode CMOS, accessed
through Europractice (IMEC). The die size is 1.525mm x 1.525mm (excluding scribe line).
Metal layers 5 and 6 have been excluded for clarity.
Floorplan blocks: 4x8 ASP pixel array (for contour and threshold feature extraction); MOS
devices (for subthreshold characterisation); 1D ABP (centroid algorithm); 2x 6-bit
multiplexers; current reference test circuits; test photodiodes; misc. circuits; bondpad, ESD
protection and power ring.
6.6.2 ORASIS-P2
The complete system (ORASIS-P2) layout is given in Fig. 6.28.
The design uses 84 physical bondpads (17 per side) uniformly distributed (at 200μm
spacing) with 75μm×65μm passivation opening for bonding within a J-leaded chip carrier
package (PLCC84). The padring is constructed from a standard cell library provided by
Virtual Silicon for 60μm inline pads. As the design uses the RF top metal (20kÅ) option, the
five-metal-layer (logic process) cell library was used to avoid design rule violations. The
padring is split at the left and right edges, with the top part being the analogue section and
the bottom part being the digital section.
The master current references are on the top edge, using off-chip resistors for wide
tunability and fine adjustment to compensate for process variation. Four master currents per
reference are sourced to the array corners to drive the current distribution network (see
Fig. 6.29). The buffers illustrated provide the digital fanout for the globally distributed
control inputs.
The address event representation hardware is situated at the bottom (column control)
and right (row control) of the array. The pixel array implemented is a 48x48 matrix using
a tessellation of nine diﬀerent cell types (edge, corner and regular) for correct termination
to provide good array utilisation. The regular cell implementation (layout) is given in
Fig. 6.30.
The pixel floorplan is arranged such that the ASP (approximately top 65% area) is
separate from the ABP (approximately bottom 35% area). The distributed data “bus” is
routed in metal layers 1 and 3 vertically along the left pixel edge and metal layers 2 and 4
horizontally along the bottom pixel edge (see Table 6.1). Metal layer 5 is used for current
and power supply distribution (horizontal), from the left and right current distribution with
a break after the central pixel (X24) column. Metal layer 6 is used as a ground plane and a
light-blocking screen, to minimise photo-absorption in the substrate, apart from at photodiode
openings.
Figure 6.28: The ORASIS-P2 chip layout (top), microphotograph (top right) and basic
floorplan (bottom); implemented in UMC 0.18μm 1P6M mixed-mode CMOS, accessed through
Europractice (IMEC). The die size is 5.0mm x 5.0mm (excluding scribe line). Metal layers
5 and 6 have been excluded for clarity.
Floorplan blocks: pixel array (48x48); current distribution; column Address Event
Representation (AER) circuits; row Address Event Representation (AER) circuits; master
current reference circuits; test circuits; bondpad, ESD protection and power ring.
Figure 6.29: The bottom-right array corner layout (top) and floorplan (bottom), illustrating
the implementation of the current distribution scheme and array-side address-event
circuitry. Metal layer 6 has been excluded for clarity.
Floorplan blocks: pixel array; row bias and control distribution; corner bias and control
distribution; column AER latch and address encoder; row AER latch and address encoder; row
arbitration tree; column arbitration tree; current copiers; buffers.
Figure 6.30: The ORASIS-P2 regular cell layout (top) and floorplan (bottom). The cell
size is 85μm×85μm with 30μm×30μm active photodiode area, giving a 12.5% surface fill
factor. Metal layers 5 and 6 have been excluded for clarity.
Floorplan labels: ASP; ABP; Photodiode; Averaging, Comparison, and Thresholding; Vertical
Edge Detection; Horizontal Edge Detection; Delay; Bias Distribution; State Reset; Sender
Neuron; Contour; State Memory; State Set Multiplexer; Exclusive-OR; Distributed “Bus”.
    Metal 1             Metal 2             Metal 3             Metal 4             Metal 5
    (top to bottom)     (left to right)     (top to bottom)     (left to right)     (left to right)
    ↑         ↓         ←         →         ↑         ↓         ←         →         ←         →
1   R1        R         IOUT1     ILOC      C1        C         VOUT      VIN1      IB[23:0]  IB[23:0]
2   R         R3        ITHRU     ILOC      C         C3        EOUT1     EIN1      IT[23:0]  IT[23:0]
3   3S1       2S1       R4        R         S         S3        C4        C         VDDA      VDDA
4   2S1       S1        R         R2        S3        2S3       C         C2        VSSA      VSSA
5   S1        S         C5        C1        2S3       3S3       C7        C3        VSS       VSS
6   RESX      RESX      C1        C6        REQX      REQX      C3        C8        VDD       VDD
7   IOUT3     ITHRU     3S4       2S4       EOUT2     EIN2      S         S2        OUT1      OUT1
8   IOUT2     ILOC      2S4       S4        VOUT      VIN2      S2        2S2       OUT0      OUT0
9   ICOL      ICOL      S4        S         -         -         2S2       3S2       MODE      MODE
10  -         -         RESY      RESY      -         -         -         -         RESL      RESL
11  -         -         ACKY      ACKY      -         -         REQY      REQY      RESG      RESG
Table 6.1: Tessellating cellular interconnectivity; in total, each cell has 190 connections with
adjacent cells.
6.7 System Results (Simulated)
System veriﬁcation has been divided into four sections; partly to reduce simulator load
and partly to obtain comprehensive and standalone results at each stage. The ﬁrst three
sections deal with scaled-down ASP, ABP and AER architectures individually, with the
ﬁnal section presenting the overall system results. All the test schematics used are provided
in Appendix C.
6.7.1 ASP
The ASP core was verified by simulating a 16×16 array with a static single-object image
(photocurrent array) hardwired under different configurations. Each system state is tested
through all process corners to verify robustness to process variations. Furthermore, the
total ASP power consumption is measured under each test condition, as given in Table 6.2.
Corner                          tt         ss         ff         snfp       fnsp
State 1: VTHRES MODE = 0, VGLOBAL AV = 1.8, Iphoto = 300p, IphotoObj = 50p,
         IphotoAverage = 228p, Ibias = 1n, Itune = 250p, Iglobal = 0
Analogue                        4345nW     4349nW     4340nW     4343nW     4349nW
Core digital                    1771nW     4018nW     1547nW     1736nW     2272nW
Total ASP power (16×16)         6116nW     8367nW     5887nW     6079nW     6621nW
Average ASP power (per cell)    23.89nW    32.68nW    23.00nW    23.75nW    25.86nW
State 2: VTHRES MODE = 1.8, VGLOBAL AV = 1.8, Iphoto = 100p, IphotoObj = 300p,
         IphotoAverage = 153p, Ibias = 1n, Itune = 250p, Iglobal = 0
Analogue                        4081nW     4077nW     4082nW     4079nW     4082nW
Core digital                    1646nW     1463nW     3382nW     1650nW     1998nW
Total ASP power (16×16)         5727nW     5540nW     7464nW     5729nW     6080nW
Average ASP power (per cell)    22.37nW    21.64nW    29.16nW    22.38nW    23.75nW
State 3: VTHRES MODE = 1.8, VGLOBAL AV = 1.8, Iphoto = 1n, IphotoObj = 100p,
         IphotoAverage = 761p, Ibias = 1n, Itune = 500p, Iglobal = 0
Analogue                        6356nW     6340nW     6374nW     6354nW     6358nW
Core digital                    1777nW     1551nW     4016nW     1745nW     2272nW
Total ASP power (16×16)         8133nW     7840nW     10390nW    8099nW     8629nW
Average ASP power (per cell)    31.77nW    30.82nW    40.58nW    31.64nW    33.71nW
Table 6.2: Simulation results for a 16×16 ASP core indicating average power consumption
levels for typical stimuli through the different process corners.

The analogue power consumption contribution can be calculated relatively accurately because all biasing is current-input, i.e. current-mode. The average cellular (analogue) power consumption is therefore expressed in Eq. 6.15. As expected, and as confirmed by the corner simulation results, this analogue contribution is nearly constant through the different process corners.

P_{ASP(ana)} \approx V_{dda} \cdot (7.5 I_{bias} + 3 I_{tune} + 7 I_{photoAverage})    (6.15)
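As a rough cross-check, Eq. 6.15 can be evaluated for the three test states and compared with the simulated tt-corner analogue figures of Table 6.2. The sketch below assumes Vdda equals the 1.8V core supply; the function and variable names are illustrative and do not come from the thesis.

```python
# Eq. 6.15 evaluated for the three ASP test states of Table 6.2.
V_DDA = 1.8          # analogue supply (V); assumed equal to the 1.8V core supply
CELLS = 16 * 16      # size of the simulated ASP array


def asp_analogue_power(i_bias, i_tune, i_photo_avg, v_dda=V_DDA):
    """Per-cell analogue power (W) according to Eq. 6.15."""
    return v_dda * (7.5 * i_bias + 3 * i_tune + 7 * i_photo_avg)


# (Ibias, Itune, IphotoAverage) in amperes, followed by the simulated
# tt-corner analogue power of the whole array in watts (Table 6.2).
states = {
    "State 1": ((1e-9, 250e-12, 228e-12), 4345e-9),
    "State 2": ((1e-9, 250e-12, 153e-12), 4081e-9),
    "State 3": ((1e-9, 500e-12, 761e-12), 6356e-9),
}

estimates = {}
for name, (currents, simulated) in states.items():
    estimated = CELLS * asp_analogue_power(*currents)
    estimates[name] = estimated
    deviation = (estimated / simulated - 1) * 100
    print(f"{name}: Eq. 6.15 gives {estimated * 1e9:.0f}nW, "
          f"simulated {simulated * 1e9:.0f}nW ({deviation:+.1f}%)")
```

Under these assumptions the closed-form estimate lands within a few percent of each simulated value, which is consistent with the approximate sign in Eq. 6.15.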
However, the core digital power consumption is observed to vary substantially with process variation. This is because the core digital supply also feeds the current-limiting threshold detectors, which experience a shift in region of operation. The core digital supply current is therefore independent of average photocurrent level and ASP configuration. The maximum limit of digital (core) current consumption within the ASP core is expressed in Eq. 6.16. This is based on the maximum simulated current consumption for typical values of internal mismatch within three current-limiting threshold detectors.

P_{ASP(dig)(max)} \approx V_{dd} \cdot (10 I_{limit})    (6.16)

Where: I_{limit} is the first-stage current limit set in the thresholding inverter.
As the presented ASP architecture (with the exception of bias distribution) requires no off-array circuits, the total power consumption can be reduced to a cellular average. Based on the simulated ASP results and measured photodiode responsivity, this average is expected to be in the range 15-40nW per cell (pixel). Scaled to a megapixel array, this would therefore give 15-40mW total power consumption for both phototransduction and front-end binary feature extraction.
6.7.2 ABP
The ABP core was verified by simulating a 9×9 array with a static single-object image hardwired, by providing the ASP outputs (contour and threshold) as a matrix of distributed inputs within the ABP array. For Idelay = 1.3nA, the transient behaviour is illustrated in the simulation results given in Fig. 6.31.

Figure 6.31: Transient analysis simulation results for a 9×9 ABP core illustrating bio-pulsating action for a single object image. Results shown are taken across the central row (Y=5) for: (a) state propagation, (b) reset back-propagation and (c) current consumption.

These results illustrate the algorithm's bio-pulsating action distributed through internal (array) memory for a preset circular object. The cellular STATE can be observed to fill inwards (Fig. 6.31(a)). On convergence to a centroid cell, the CENTRE signal is flagged, causing a RESET back-propagation (Fig. 6.31(b)). This sequence then reinitialises and thus, for a static input, the process repeats in a periodic manner.
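The inward filling and convergence on a centroid cell can be mimicked in software. The sketch below is only a loose, synchronous analogue: it assumes a 4-neighbour wavefront that strips one boundary layer per step, whereas the chip operates asynchronously and follows the algorithm defined in chapter 4, which may differ in detail. The function name, the zero-based coordinates and the object definition are illustrative assumptions.

```python
def peel_to_centroid(active):
    """Strip boundary cells layer by layer.

    Returns the mean position of the last surviving layer (the centroid
    estimate) and the number of layers removed (a crude size estimate).
    """
    if not active:
        raise ValueError("object must contain at least one cell")
    remaining = set(active)
    last = set(remaining)
    layers = 0
    while remaining:
        last = set(remaining)
        # A cell is on the boundary if any 4-neighbour is outside the object.
        boundary = {
            (x, y) for (x, y) in remaining
            if any(n not in remaining
                   for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))
        }
        remaining -= boundary
        layers += 1
    cx = sum(x for x, _ in last) / len(last)
    cy = sum(y for _, y in last) / len(last)
    return (cx, cy), layers


# 9x9 array (zero-based indices) with a circular object of diameter 7.
SIZE, CENTRE, RADIUS = 9, 4, 3.5
obj = {(x, y) for x in range(SIZE) for y in range(SIZE)
       if (x - CENTRE) ** 2 + (y - CENTRE) ** 2 <= RADIUS ** 2}

centroid, layers = peel_to_centroid(obj)
print(f"object cells: {len(obj)}, centroid: {centroid}, layers: {layers}")
```

For this symmetric input the sketch converges on the central cell, which corresponds to row Y=5 in the one-based numbering of Fig. 6.31. The layer count grows with object radius, which is in line with the remark that object size is returned alongside the centroid.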
The average power consumption determined in this simulation is 470nW, providing 1739 processed (centroid) results per second. In a 9×9 array with a circular object of diameter 7 pixels, the number of active pixels is πr² = 38.5. Assuming static power dissipation in non-active cells to be negligible (reviewed later), the ABP consumption per active pixel is therefore 470nW/38.5 = 12.21nW. Furthermore, as the delay constant scales linearly with Idelay, both the activity and power consumption also scale linearly.
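The per-active-pixel figure follows directly from the object area. A minimal sketch of that arithmetic is given below; the variable names are illustrative, and the rounding of the 4-pixel-radius case to a whole number of cells is an assumption made to match the active-cell count used later in Table 6.3.

```python
import math

P_ABP_TOTAL = 470e-9    # W, average simulated ABP power for the 9x9 array
OBJECT_DIAMETER = 7     # pixels, circular test object

# Active pixels are taken as the area of the circular object.
active_pixels = math.pi * (OBJECT_DIAMETER / 2) ** 2
power_per_active_pixel = P_ABP_TOTAL / active_pixels

# The same area rule applied to an object of 4 pixel radius.
active_pixels_r4 = round(math.pi * 4 ** 2)

print(f"active pixels: {active_pixels:.1f}")
print(f"ABP power per active pixel: {power_per_active_pixel * 1e9:.2f}nW")
print(f"active pixels for a radius of 4: {active_pixels_r4}")
```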
6.7.3 AER
The AER architecture was tested using a 12×12 sender neuron array with 12-input row/column latches, encoders and arbitration trees. Multiple sender neurons selected at random positions were programmed to output colliding events to test robust arbitration and bus selection. The colliding events have been arranged in two phases: the sender neurons at positions (2,9) (6,9) (5,10) (6,10) (9,10) signal events at t=50μs, and the sender neurons at positions (4,2) (8,2) (1,3) (5,3) (11,4) (12,4) (4,7) (8,7) (1,8) (5,8) (11,8) (12,8) (9,9) (12,9) (1,10) (2,10) (2,11) (6,11) (9,11) output events at t=80μs. The system REQ and ACK signals facilitating the off-chip handshake are connected directly through a 3μs delay, representing the receiving device. The results are given in Fig. 6.32.

The current consumption profile (Fig. 6.32(b)) suggests that a unit energy is required per event in addition to a static dissipation proportional to the number of rows and columns. This is expressed in Eq. 6.17.

P_{aer} \approx V_{dd} \cdot (N_{rows} + N_{columns}) [I_{static} + (I_{eventAv} \cdot N_{events} \cdot t_{eventAv})]    (6.17)

Where: I_{static} is the static current (per row/column header overhead), N_{rows} and N_{columns} are the number of rows and columns respectively, I_{eventAv} is the average event current, N_{events} is the number of events (per second) and t_{eventAv} is the average event time.
Figure 6.32: Transient analysis simulation results for a 12×12 AER sending architecture illustrating arbitration for 24 colliding events. Results shown are: (a) the AER bus output/handshake and (b) current consumption.

The value of Istatic is proportional to the bias current used in the address encoder pull-down, bus request pull-up, etc. Thus a higher bias can increase the throughput of the address-event bus at the expense of static dissipation. From the simulated results presented in Fig. 6.32, typical AER consumption values (for Ibias = 100nA) are: Istatic = 11.16nA and IeventAv = 225.7nA (determined from EeventAv = 27.08pJ). For example, a 12×12 array outputting 10K events per second would have a total AER consumption of 752.9nW.
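The worked figure of 752.9nW can be reproduced as sketched below. The sketch assumes one particular reading of Eq. 6.17: the row/column header count scales only the static term, and each event adds the quoted energy EeventAv = Vdd · IeventAv · teventAv. That reading is an assumption, adopted here because it matches the quoted total; the function and variable names are illustrative.

```python
V_DD = 1.8               # V, core supply
I_STATIC = 11.16e-9      # A, static current per row/column header
I_EVENT_AV = 225.7e-9    # A, average event current
E_EVENT_AV = 27.08e-12   # J, average energy per address event


def aer_power(n_rows, n_cols, events_per_second):
    """AER power (W): static header dissipation plus per-event energy."""
    static = V_DD * (n_rows + n_cols) * I_STATIC
    dynamic = events_per_second * E_EVENT_AV
    return static + dynamic


# Average event time implied by the quoted energy and current.
t_event_av = E_EVENT_AV / (V_DD * I_EVENT_AV)

total = aer_power(12, 12, 10_000)
print(f"implied average event time: {t_event_av * 1e6:.1f}us")
print(f"AER power for 12x12 at 10K events/s: {total * 1e9:.1f}nW")
```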
6.7.4 Overall System
The overall system was simulated and therefore veriﬁed in two stages. Initially a complete
distributed array is simulated with ideal current and voltage sources to conﬁrm correct
array processing functionality and thus validate the cellular processing element. The second
stage involves hierarchically arranging the array to include the current bias and control
distribution tree. Thus the second stage simulation is intended to truly represent a scaled
ﬁnal system validation.
Complete Array
The combined operation of the distributed ASP and ABP cores with the AER oﬀ-chip
communication is veriﬁed by testing a complete 12×12 array. The simulation results are
given in Fig. 6.33.
The total array power consumption is consistent with the expected level, derived by appropriately scaling the constituent component requirements. This comparison is given in Table 6.3. The small difference between the estimated (4350.2nW) and simulated (4816.4nW) results can be attributed to the static power dissipation in the ABP core (this was previously assumed negligible). This therefore represents a static dissipation within the ABP core of 3.24nW/cell.
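The estimate and the residual attributed to ABP static dissipation can be retraced from the per-component figures listed in Table 6.3. The following is a minimal sketch of that bookkeeping; the variable names are illustrative, and the values are copied from the table and its footnotes.

```python
CELLS = 12 * 12

# Per-component contributions in nW, using the figures of Table 6.3.
asp_static = 23.89 * CELLS           # ASP: static power in every cell
abp_dynamic = 12.21 * 50             # ABP: 50 active cells (4 pixel radius object)
aer_static = 11.16 * (12 + 12)       # AER: one term per row and column header
aer_dynamic = 48.75e-12 * 650 * 1e9  # AER: 48.75pJ per event, about 650 events/s

estimated = asp_static + abp_dynamic + aer_static + aer_dynamic
simulated = 4816.4                   # nW, complete 12x12 array simulation

# Residual spread over all cells, read as ABP static dissipation per cell.
abp_static_per_cell = (simulated - estimated) / CELLS

print(f"estimated array power: {estimated:.1f}nW")
print(f"unaccounted power per cell: {abp_static_per_cell:.2f}nW")
```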
Complete System
Having verified correct operation in the distributed array, a scaled-down system mock is tested to validate correct hierarchy and to determine the power consumption overhead in global bias distribution. A 12×12 array is arranged in exactly the same hierarchy (i.e. system/array/corner/row/cell) as the final chip (48×48 array). The simulation results are given in Fig. 6.34.

As expected, the core power (vdd) consumption is in line with that shown previously, i.e. as in the 12×12 array simulated results. The power consumption due to global bias distribution is observable as the difference in analogue power (vdda) consumption between the simulated array and system results. Furthermore, it is apparent that the static (leakage) dissipation is substantial and in fact is the main source of power consumption within the ABP core, even at high activities.

Figure 6.33: Transient analysis simulation results for a 12×12 complete array for a single circular object input (8 pixel diameter). Results shown are: (a) the AER bus output/handshake; event at position (6,6) and (b) current consumption.

Figure 6.34: Transient analysis simulation results for a 12×12 complete system (including bias and signal distribution) for a single circular object input (8 pixel diameter). Results shown are: (a) the AER bus output/handshake; event at position (6,6) and (b) current consumption.

                    Cellular Power         Active Cells         Distributed Power
Component           Static     Dynamic     Static    Dynamic    Static      Dynamic     Total
ASP                 23.89nW    -           144       -          3440nW      -           3440nW
ABP                 -          12.21nW     -         50¹        -           610.5nW¹    610.5nW
AER                 11.16nW    48.75pJ²    24³       1⁴         267.8nW³    31.68nW⁴    340.9nW
Array (estimated)   -          -           -         -          3708nW      642.2nW     4350.2nW
Array (simulated)   -          -           -         -          -           -           4816.4nW
¹ Assuming an input image including a circular object of 4 pixel radius.
² Energy required per address event output.
³ Static dissipation is per row and column header.
⁴ Assuming each active cell to be a centroid; generating approximately 650 events per second.
Table 6.3: Comparison between expected (based on constituent ASP, ABP, AER embedded
arrays) and simulated power consumption for a combined 12×12 array.
6.8 System Results (Measured)
6.8.1 Test Method
A custom testboard has been developed for verifying system functionality (full schematic provided in Appendix D). The approach taken is to have a dedicated microcontroller (Microchip PIC18LF4620) facilitating the address-event handshake and subsequently storing the address-event data into internal memory until filled, then streaming it out to a PC via a standard UART (RS232) interface. The drawback of this approach is that the test chip is only tested in short bursts and therefore the output data (although processed in real time) is only available off-line. This is due to the limited bandwidth of the UART (a maximum of 115.2kbps). The source code for the address-event handshake and sampling has also been included in Appendix B.
Figure 6.35: Test images with single uniform objects, with pixel grid overlaid including measured centroid position and size. Panels: (a) Test Image 1, (b) Test Image 2, (c) Test Image 3; each grid spans (0,0) to (47,47). Annotated measurements: (24,40) r=7; (34,40) r=10; (25,23) r=6.
Figure 6.36: Test images with single non-uniform objects, with pixel grid overlaid including measured centroid position and size. Panels: (a) Test Image 4, (b) Test Image 5, (c) Test Image 6; each grid spans (0,0) to (47,47). Annotated measurements: (32,33) r=8; (34,36) r=6; (34,36) r=7.
Figure 6.37: Test images with multiple uniform objects, with pixel grid overlaid including measured centroid positions and sizes. Panels: (a) Test Image 7, (b) Test Image 8, (c) Test Image 9; each grid spans (0,0) to (47,47). Annotated measurements: (10,9) r=7; (31,32) r=14; (13,13) r=7; (13,37) r=7; (23,37) r=6; (24,23) r=6; (39,39) r=7; (9,39) r=7; (9,9) r=7; (39,8) r=7.
For image acquisition, a 2/3” format CCTV lens (Pentax C1614A) is mounted at a fixed distance (16mm fixed focal length) above the bare silicon surface. A thin-film transistor (TFT) liquid crystal display (LCD) is then used to produce the image, positioned approximately 40cm perpendicular to the focal plane of the ORASIS P2 chip. The region on the TFT display focused onto the photodiode array is then determined through power consumption measurements. Initially, a narrow white rectangular region incident inside the photodiode array is extended along both the X and Y axes until no further increase in current consumption is measurable. At this point the entire array has been illuminated and therefore the array boundaries have been established. Furthermore, previously characterised test photodiode devices are used to calibrate the incident light intensity.
6.8.2 System Functionality
This setup is used to confirm system functionality within the intended design specifications. Sample images projected onto the ORASIS P2 chip and the corresponding measurements are presented in Figs. 6.35, 6.36 and 6.37. These illustrate both single and multiple object detection, centroiding and sizing. Typically the measured centroid and size are within the actual object boundaries, i.e. the system tends to underestimate rather than overestimate. Furthermore, uneven objects are successfully detected but with inaccurate centroid and position estimates, again within the actual object boundaries (see Fig. 6.36b,c). However, overlapping objects are detected as a single uneven object (see Fig. 6.36a).
Accuracy
The accuracy, as expected, is at best³ limited to single pixel resolution for centroid position and object radius.

³ Best performance is achieved in images with high contrast ratio (i.e. dynamic range) and relatively high incident light intensity.

An interesting observation has been a small random deviation (±2 pixels) in object centroid location, resulting in a similar deviation in object size. At first glance this fluctuation was passed off as an error; however, on closer examination it has been shown to provide sub-pixel accuracy (through successive averaging), having a pseudo-dithering effect. This can be explained by an edge effect caused by an imperfectly focused image or a graded object boundary. The static (spatial) fixed-pattern noise (FPN), coupled with the (temporal) flicker noise within the edge detector blocks, then provides this statistically-biased dithering effect. As a result, this provides a mechanism for trading centroid processing time against centroid position accuracy, illustrated through trends in the measured data shown in Figs. 6.38 and 6.39.

Figure 6.38: Pseudo-dithering providing increased centroid position accuracy through successive averaging.

This suggests that less than 1% error is achievable for centroid position and radius measurement by using 10-12 events per result. On this 48×48 pixel array, this corresponds to approximately half pixel accuracy.
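The trade between events per result and accuracy can be illustrated with a small numerical experiment. This is a sketch under stated assumptions rather than a model of the chip: the random deviation is taken as uniform within ±2 pixels, each reading is rounded to a whole pixel, and the true position (24.3), the trial count and all names are hypothetical.

```python
import math
import random


def quantised_reading(true_pos, rng, spread=2.0):
    """One whole-pixel reading with a random deviation of up to +/-spread."""
    return math.floor(true_pos + rng.uniform(-spread, spread) + 0.5)


def rms_error(true_pos, events_per_result, trials, rng):
    """RMS error of results formed by averaging several readings."""
    total = 0.0
    for _ in range(trials):
        mean = sum(quantised_reading(true_pos, rng)
                   for _ in range(events_per_result)) / events_per_result
        total += (mean - true_pos) ** 2
    return math.sqrt(total / trials)


rng = random.Random(0)   # fixed seed so the experiment is repeatable
TRUE_POS = 24.3          # hypothetical sub-pixel centroid position

rms_1 = rms_error(TRUE_POS, 1, 2000, rng)
rms_12 = rms_error(TRUE_POS, 12, 2000, rng)
print(f"RMS error with 1 event per result:   {rms_1:.2f} pixels")
print(f"RMS error with 12 events per result: {rms_12:.2f} pixels")
```

In this toy model the deviation acts as dither, so the averaged result approaches the sub-pixel position even though every individual reading is a whole pixel. The error shrinks roughly with the square root of the number of events averaged.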
6.8.3 Power Consumption
The measured power consumption levels are generally in line with the previously presented
simulated results. The measured results partition the total power consumption into the
following sources:
Figure 6.39: Pseudo-dithering providing increased object size accuracy through successive
averaging.
Analogue Consumption
This represents the analogue power consumption within the distributed array, including the photocurrents, local and global averaging and threshold/edge detection circuitry. Measured ASP consumption is within 5% of the simulated results. This can be attributed to the fact that all the ASP circuits are biased using current-mode techniques, i.e. all inputs are currents. As expected, the ASP power is largely dependent on bias currents and incident light intensity, typically being in the range 15-50μW, as illustrated in Figs. 6.40 and 6.41.
Digital Consumption (Leakage)
This represents the subthreshold leakage current within the digital core distributed throughout the array. This was insufficiently considered at design time and has been shown to be a main source of power consumption within the ABP core. Power consumption due to this leakage is in the order of 40-60μW.

For such applications with relaxed bandwidth requirements, static leakage can be massively reduced either by increasing the device channel length moderately or by applying a reverse bias on the bulk/source junction. This effect is illustrated in Fig. 6.42. It can therefore be deduced that using NMOS devices of channel length 400nm and PMOS devices of channel length 250nm can reduce static leakage tenfold in comparison to using minimum feature length (180nm) devices.

Figure 6.40: Measured supply current levels illustrating the effect of tuning main bias current (feeding edge detectors and discrete delays) on system power consumption.

Figure 6.41: Measured supply current levels illustrating the effect of illumination level on system power consumption for various tuning bias current levels (controlling the edge detecting threshold).
Digital Consumption (Static)
This represents the static current supply to the digital core distributed throughout the array. This is virtually all due to “digital” short-circuit current caused by incomplete thresholding, i.e. logic gates with non-perfectly discrete inputs. The exact amount of static dissipation is dependent on the configuration of the edge detection circuitry, i.e. bias current levels and input light intensity. This dependence is clearly illustrated in Figs. 6.40 and 6.41.
ABP static consumption could be expected to account for up to 80% of the total system power requirements in certain configurations. However, a significantly lower level of static dissipation has been measured than expected from the simulated results. The reason for this is that the fixed-pattern noise provides a random offset to the edge detector inputs. Consequently, this inherently biases the differential edge detector output to always have an offset. As the simulations have considered only images of uniform background intensity, this would represent the maximum static dissipation, i.e. when both inputs to a logic gate are not discrete and floating.
Digital Consumption (Dynamic)
This represents power consumption directly related to the distributed binary signal propa-
gation and therefore proportional to the activity. In addition the address-event bus activity
inﬂuences this level. For typical activities, this has been measured to represent only a
10-15% portion of the total system power consumption. Therefore, no substantial power
saving can be achieved by operating the device at a reduced duty cycle.
Other
Power consumption of the I/O cells (obtained from a standard cell library [9]) has not been included, as these have been characterised by the vendor and this consumption is largely dependent on the external circuitry interfacing to the chip, i.e. input capacitances.
Figure 6.42: Effect of channel length and bulk (reverse) bias on static leakage (off) current (at Vds = 1.8V, Vgs = 0V) for (a) NMOS and (b) PMOS devices. Channel lengths shown in each panel: 180nm, 200nm, 250nm, 300nm, 400nm and 500nm.

Figure 6.43: Dependence of process time on bias current, given for input images including objects of maximum size of 3, 4, 5, 6 and 8 pixel radius. Axis label: delay bias current (nA).
6.8.4 Processing Time
Although the asynchronous nature of this distributed system produces temporally unsyn-
chronised events between diﬀerent objects (due to the local resetting), the algorithm can
be run in a “single-shot” mode, and a clock applied to the global reset input. Using this
technique a true high frame rate processor can be realised, the limiting factor being the
maximum size of detectable object, i.e. the maximum propagation delay. This can in fact
be tuned as the internal propagation delay is controlled by a bias current. This relationship
between bias current and process time/frame rate is illustrated in Fig. 6.43.
6.8.5 AER Bandwidth
As the address-event bus bias is tuneable, the dependence of bandwidth on bias current has been measured, as illustrated in Fig. 6.44. Since the bus bandwidth requirement for this application is relatively moderate, high bandwidth and channel utilisation are generally not an issue. However, for colliding events, the arbitration scheme introduces some latency and, if this is comparable to the internal propagation delay, it could somewhat distort the information extracted. A high address-event bandwidth is therefore favourable to achieve a low latency for maintaining temporal resolution.

Figure 6.44: Measured address-event bus capacity (bandwidth) by varying pull-up/pull-down bias currents. Plot annotation: nominal capacity.
6.9 Summary
In this chapter a vision processing chip has been presented for object size and centre detection. It is the first system reporting multiple (unlimited) object centroid processing capability. Furthermore, the developed system demonstrates high computational efficiency, implementing a computationally intensive algorithm with micropower consumption. Although the developed system includes only a 48×48 cell array, with a cellular power budget of a few tens of nanowatts, scaling to a megapixel array would only require a few tens of milliwatts. Also, the fabricated system has been shown to utilise fixed-pattern noise favourably (as in neurobiology), both reducing power consumption and increasing accuracy through successive sampling. The achieved system specification is summarised in Table 6.4.
At a component level, novel contributions include a discrete edge detector topology, a locally/globally-averaging threshold detector network and an asynchronous spatiotemporal bio-pulsating core. At an architectural level the contribution is a dedicated vision processor capable of delivering thousands of processed centroids per object every second at ultra-low power levels; at present, unachievable by conventional means.
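The scaling argument is simple proportionality, sketched below. The per-pixel ASP and ABP figures are the measured values listed in Table 6.4, and the 15-40nW range is the simulated ASP cellular average quoted in Section 6.7.1; the variable names are illustrative.

```python
PIXEL_POWER_ASP = 23.04e-9   # W per pixel, Table 6.4
PIXEL_POWER_ABP = 73.44e-9   # W per pixel, Table 6.4
ARRAY_CELLS = 48 * 48        # fabricated array
MEGAPIXEL = 1_000_000

# Array power obtained by scaling the per-pixel total to 48x48 cells.
pixel_total = PIXEL_POWER_ASP + PIXEL_POWER_ABP
array_power = pixel_total * ARRAY_CELLS

# Simulated ASP cellular range (15-40nW) scaled to a megapixel array.
megapixel_low = 15e-9 * MEGAPIXEL
megapixel_high = 40e-9 * MEGAPIXEL

print(f"per-pixel total: {pixel_total * 1e9:.2f}nW")
print(f"48x48 array:     {array_power * 1e6:.2f}uW")
print(f"megapixel ASP:   {megapixel_low * 1e3:.0f}-{megapixel_high * 1e3:.0f}mW")
```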
Technology                            UMC 0.18μm MM/RF 1P6M CMOS
Supply voltage                        1.8V core (3.3V I/O)
Bias current range                    50nA to 2μA (for Iaer)
                                      250pA to 10nA (for Ibias)
                                      50pA to 2nA (for Itune)
Photosensitivity                      6 decades, from 100nW/cm² to 100mW/cm²
Responsivity                          0.18A/Wcm² (for blue light @ λ = 480nm)
                                      0.28A/Wcm² (for green light @ λ = 550nm)
                                      0.32A/Wcm² (for red light @ λ = 650nm)
Pixel Level
Pixel size                            85μm × 85μm
Surface fill factor                   12.46%
Pixel device count                    277
Pixel power                           23.04nW (ASP)
                                      73.44nW (ABP)
                                      96.48nW (total)
System Level
Die dimensions                        5mm × 5mm
Array size                            48 × 48 pixels
System device count                   745,200
System power                          222.28μW (array)
                                      20.3μW (other)
                                      243.6μW (total)
Accuracy (centroid and radius)        ±1 pixel¹
Equivalent image process time         0.5ms (maximum)²
Address-event bandwidth               0.61 MEPS³ (at Iaer=1μA)
Equivalent computational efficiency   1.38μW per MIPS²
¹ Using successive sampling the accuracy can be increased to ±0.5 pixel.
² For a test image (with average incident power density of 6μW/cm²) consisting of
  5 objects of 10 pixel diameter (at Ibias=2.5nA, Itune=250pA).
³ MEPS = Million Events Per Second
Table 6.4: ORASIS-P2 system properties and performance summary
References
[1] T. G. Constandinou, J. Georgiou and C. Toumazou, “Towards a Bio-inspired Mixed-
signal Retinal Processor,” Proceedings of the IEEE International Symposium on Circuits
and Systems, vol. 5, pp. 493–496, 2004.
[2] C. A. Mead, Analog VLSI and Neural Systems. Addison-Wesley, 1989.
[3] M. Mahowald, VLSI Analogs of Neuronal Visual Processing: A Synthesis of Form and
Function. PhD thesis, California Institute of Technology, Pasadena, California, 1992.
[4] C. Toumazou, F. J. Lidgey and D. G. Haigh, Analog IC Design: The Current-Mode
Approach. London: Peter Peregrinus, 1990.
[5] T. G. Constandinou, J. Georgiou and C. Toumazou, “A Nanopower Tuneable Edge
Detection Circuit,” Proceedings of the IEEE International Symposium on Circuits and
Systems, vol. 1, pp. 449–452, 2004.
[6] T. G. Constandinou, J. Georgiou and C. Toumazou, “Nano-power mixed-signal tunable
edge-detection circuit for pixel-level processing in next generation vision systems,” IEE
Electronics Letters, vol. 39, no. 25, pp. 1774–1775, 2004.
[7] J. Georgiou, Micropower Electronics for Neural Prosthetics. PhD thesis, Imperial College
of Science, Technology and Medicine, University of London, 2002.
[8] P. Häfliger, A Spike-based Learning Rule and its Implementation in Analog Hardware.
PhD thesis, ETH Zürich, Switzerland, 2000.
[9] “L180 60um in-line i/o library,” UMCL18U350T2, Virtual Silicon Corp, 2002.
Chapter 7
Conclusion
This thesis explores and develops biologically-inspired vision processing using distributed
hybrid electronics in CMOS technology.
Chapter 2 introduces neurobiology through system organisation and neural primitives
to data representation and spike coding, in particular in reference to the vision system.
The notions of biologically-inspired representation and hybrid computation have then been
examined in the context of microelectronic integration. Related design and implementation
issues for bio-inspired computation have then been outlined speciﬁcally in reference to weak
inversion analogue and asynchronous binary (or spike domain) computation.
Chapter 3 reviews current state-of-the-art silicon-based imaging technologies and dis-
cusses their suitability for integration with processing hardware. This leads to a direct
comparison of sequential and distributed topologies for vision processing in custom hard-
ware. A speciﬁc vision processing function, centroid detection (and object segmentation)
has then been targeted and a comprehensive review of research and development in that
arena is given.
7.1 Contributions
Chapter 4 presents a novel distributed algorithm for parallel centroid detection with in-
herent object segmentation and sizing functionality. It describes the functionality both
qualitatively and analytically and provides both experimental veriﬁcation and an intuitive
reasoning behind the high robustness and inherent tolerance to ill-conditioned data and
process variations. A pixel architecture for hardware implementation is proposed and an
equivalent software algorithm is coded. Subsequently, the computational load is estimated
and power consumption ﬁgures are reported by considering benchmark computational eﬃ-
ciencies for state-of-the-art processing hardware. Finally, a generic distributed processing
paradigm for hardware implementation is outlined based on the underlying principles of
the presented algorithm. The versatility of this array processing platform is reinforced by
outlining two speciﬁc distributed algorithms directly implementable using this technique.
Chapter 5 reviews silicon-based photodiode modelling speciﬁcally related to CMOS pn-
junction devices. Furthermore, discussed implementation issues and design techniques for
deep submicron technologies are consolidated with measured data from fabricated devices.
A review of common photodiode interface topologies is then followed by a novel front-end
spiking photoreceptor circuit, with adaptive selection of ON/OFF-encoded channels. The
topology is intended for use in adaptable foveating vision chips, where spatial and temporal
resolution can be dynamically reconﬁgured locally.
Chapter 6 describes a novel vision chip implementing the bio-pulsating contour reduction
algorithm described previously in chapter 4. This device is the ﬁrst silicon retina reported
capable of parallel centroiding of unlimited objects and returning object size in addition to
centroid position. The presented system implements a retinocortically-inspired organisa-
tion employing an asynchronous binary algorithm combined with continuous time feature
extraction. The developed architecture, organisation and circuit topologies are described in
detail including both simulated and measured results to validate the theory. The presented
system advances vision chip development into exploring new distributed architectures based
on hybrid pixels, whilst maintaining micropower consumption and good system stability.
7.2 Recommendations for Future Work
Future developments based on material described in this thesis are proposed in the following
areas:
7.2.1 System Optimisation, Enhancement and Development
Although the system has been designed with low power consumption in mind, it could
yet further be optimised, also in accuracy, speed and silicon area. At a circuit level, the
current comparators could be redesigned for lower sensitivity to process variation, the global
averaging to include both row and column aggregates and the edge detector to be tunable
through a single bias. Furthermore, the asynchronous logic can be reduced through custom
device-level logic minimisation [1] and static dissipation be reduced by increasing device
length moderately.
At a functional level, the edge and contour detection could be improved by implementing
a thresholding difference-of-Gaussian function. Alternatively, an adaptive photoreceptor
topology [2] could be used to dynamically bias the edge detector block to provide a local
automatic gain control and thus improve SNR. Furthermore, the threshold detection could
be massively improved by optimising the averaging/smoothing functions.
At a system level, the algorithm could be modiﬁed to return the aspect ratio of detectable
objects, i.e. the W/L. Other techniques for object segmentation could be used to provide
versatility to a larger range of object types/input images. For example, colour segmentation
[3] could provide a condition for object segmentation where intensity alone fails.
At a technological level, this system is ideally implementable in one of the upcoming
3D CMOS technologies [4] [5], slicing the distributed architecture to several layers, thus
increasing ﬁll factor whilst massively reducing both the footprint and interconnectivity re-
quirements. A compact cellular footprint could then open the door to a megapixel resolution
vision processor.
7.2.2 Hybrid Distributed Algorithm Design and Implementation
There is great scope to continue work on hybrid distributed algorithms as initiated in this
thesis. Dissociating the front-end feature extraction from the higher-level back-end algorithm
in a distributed fashion, as described, paves the way for implementing countless
computationally demanding algorithms. Moreover, towards the end of chapter 4, two specific
examples of such hardware-implementable algorithms have been proposed.
Ultimately, the proposed hybrid distributed architecture could be extended to imple-
ment an FPGA-like vision processor with a reconﬁgurable back-end for providing a generic
platform for custom binary algorithm implementation. This would provide the perfect
complement to the front-end reconfigurability already developed within the Cellular Neural
Network (CNN) community [6] [7] [8].
References
[1] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems
Perspective. Addison-Wesley, 1993.
[2] T. Delbrück and C. A. Mead, “Analog VLSI phototransduction by continuous-time, adaptive,
logarithmic photoreceptor circuits,” Vision Chips: Implementing vision algorithms
with analog VLSI circuits, by C. Koch and H. Li eds., pp. 139–161, 1995.
[3] R. Merrill, “Color separation in an active pixel cell imaging array using a triple-well
structure.” US Patent Number 5,965,875, 1999.
[4] R. Islam, C. Brubaker, P. Lindner and C. Schaefer, “Wafer level packaging and 3D
interconnect for IC technology,” IEEE/SEMI Conference and Workshop on Advanced
Semiconductor Manufacturing, pp. 212–217, 2002.
[5] J. Baliga, “Chips go vertical [3D IC interconnection],” IEEE Spectrum, vol. 41, no. 3,
pp. 43–47, 2004.
[6] T. Roska and A. Rodríguez-Vázquez, eds., Towards the Visual Microprocessor: VLSI
Design and the Use of Cellular Neural Network Universal Machines. Wiley, 2001.
[7] G. L. Cembrano, A. Rodriguez-Vazquez, R. C. Galan, F. Jimenez-Garrido, S. Espejo and
R. Dominguez-Castro, “A 1000 FPS at 128x128 Vision Processor With 8-Bit Digitized
I/O,” IEEE Journal of Solid-State Circuits, vol. 39, no. 2, pp. 1044–1055, 2004.
[8] A. Rodriguez-Vazquez, G. Linan-Cembrano, L. Carranza, E. Roca-Moreno, R. Carmona-
Galan, F. Jimenez-Garrido, R. Dominguez-Castro and S. E. Meana, “ACE16k: The
Third Generation of Mixed-Signal SIMD-CNN ACE Chips Towards VSoCs,” IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 5, pp. 851–863,
2004.
Appendix A
Algorithm Simulation Source Code
{******************************************************************************
Program : ORASIS Simulator
Module : MAIN.PAS
Date : See File Timestamp
Author : Timothy G Constandinou
Company : Imperial College London
*****************************************************************************}
unit main;
interface
uses
Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
ExtCtrls, StdCtrls, Buttons, ExtDlgs, Menus, ComCtrls, Grids, Math;
const
xgrid = 250;
ygrid = 250;
maxresets = 100000;
maxsize = 27;
defaultfpn = 10;
defaultdefects = 10;
defaultedge = 128;
defaultthreshold = 128;
defaultbgcolour = 0;
type
Tmainform = class(TForm)
Image1: TImage; Image2: TImage; Image3: TImage; Image4: TImage;
Image5: TImage; Image6: TImage; Image7: TImage; Label1: TLabel;
Label2: TLabel; Label3: TLabel; Label4: TLabel; Label5: TLabel;
Label6: TLabel; Label8: TLabel; OpenPictureDialog1: TOpenPictureDialog;
MainMenu1: TMainMenu; FreqGrid: TStringGrid; N1: TMenuItem; N2: TMenuItem;
N3: TMenuItem; File1: TMenuItem; Open1: TMenuItem; Exit1: TMenuItem;
Mode: TMenuItem; Scan1: TMenuItem; Help1: TMenuItem; About1: TMenuItem;
N01fps: TMenuItem; N1fps: TMenuItem; N5fps: TMenuItem; N10fps: TMenuItem;
MaxRefresh1: TMenuItem; SaveImages1: TMenuItem; Monochrome1: TMenuItem;
SingleFrame1: TMenuItem; Random1: TMenuItem; NoiseBox1: TGroupBox;
GroupBox1: TGroupBox; ThresholdsBox: TGroupBox; AddFlatButton: TButton;
AddGaussianButton: TButton; AddSpeckleButton: TButton; Label10: TLabel;
Label11: TLabel; FPN: TEdit; Defects: TEdit; AddSaltButton: TButton;
AddPepperButton: TButton; AddSaltPepperButton: TButton; Label7: TLabel;
ResetAllButton: TButton; AutoThresholdButton: TButton; Label9: TLabel;
SetThresholdsButton: TButton; GlobalThresholdLabel: TLabel;
EdgeThresholdLabel: TLabel; GlobalThreshold: TTrackBar; Label12: TLabel;
EdgeThreshold: TTrackBar; ResultDesc: TEdit; AppendFileButton: TButton;
procedure FormCreate(Sender: TObject);
procedure IdleHandler(Sender: TObject; var Done: Boolean);
procedure Exit1Click(Sender: TObject);
procedure Open1Click(Sender: TObject);
procedure GlobalThresholdChange(Sender: TObject);
procedure EdgeThresholdChange(Sender: TObject);
procedure AutoThresholdButtonClick(Sender: TObject);
procedure AddFlatButtonClick(Sender: TObject);
procedure ResetAllButtonClick(Sender: TObject);
procedure About1Click(Sender: TObject);
procedure AddSaltButtonClick(Sender: TObject);
procedure Scan1Click(Sender: TObject);
procedure Random1Click(Sender: TObject);
procedure SetThresholdsButtonClick(Sender: TObject);
procedure MaxRefresh1Click(Sender: TObject);
procedure N10fpsClick(Sender: TObject);
procedure N5fpsClick(Sender: TObject);
procedure N1fpsClick(Sender: TObject);
procedure N01fpsClick(Sender: TObject);
procedure SingleFrame1Click(Sender: TObject);
procedure Image1Click(Sender: TObject);
procedure SaveImages1Click(Sender: TObject);
procedure Monochrome1Click(Sender: TObject);
procedure AddPepperButtonClick(Sender: TObject);
procedure AddSaltPepperButtonClick(Sender: TObject);
procedure AddSpeckleButtonClick(Sender: TObject);
procedure AddGaussianButtonClick(Sender: TObject);
procedure AppendFileButtonClick(Sender: TObject);
private
{ Private declarations }
public
{ Public declarations }
end;
TPixelElement = record Pixel : integer; Status : boolean; end;
TSizeElement = record Count, Size : integer; end;
TPixelArray = array[1..xgrid, 1..ygrid] of TPixelElement;
TSizeArray = array[1..xgrid, 1..ygrid] of TSizeElement;
TReset = record x, y : array[1..maxresets] of integer; n : integer; end;
var
mainform : TMainForm;
OutFile : textfile;
SizeArray : TSizeArray;
Centre, Reset : TReset;
DoResetAll : Boolean;
CurrGen : integer;
OrigArray, PixelArray, PixelArray2 : TPixelArray;
BgCol, FgCol1, FgCol2, FgCol3, FgCol4, FgCol5: TColor;
implementation
uses about; {$R *.DFM}
{ GENERIC FUNCTIONS }
procedure Mul(var a : integer; b : double);
begin a := round(a * b); end;
procedure Delay(msecs:integer);
var
FirstTickCount:longint;
begin
FirstTickCount:=GetTickCount;
repeat
Application.ProcessMessages; {allowing access to other controls, etc.}
until ((GetTickCount-FirstTickCount) >= longint(msecs));
end;
function Convert24bitTo8bitGrey(incolor : integer) : integer;
begin
result := 0;
{ >= so that a channel value sitting exactly on a byte boundary is counted }
while (incolor >= 65536) do
begin incolor := incolor - 65536; inc(Result); end;
while (incolor >= 256) do
begin incolor := incolor - 256; inc(Result); end;
Result := (Result + incolor) div 3;
end;
function Convert8bitGreyTo24bit(incolor : integer) : integer;
begin result := (incolor * 65536) + (incolor * 256) + incolor; end;
procedure EmptySizeArray(var TempArray : TSizeArray);
var
x, y : integer;
begin
for y := 1 to YGrid do
for x := 1 to XGrid do
begin
TempArray[x,y].count := 0;
TempArray[x,y].size := 0;
end;
end;
procedure EmptyPixelArray(var TempArray : TPixelArray);
var
x, y : integer;
begin
for y := 1 to YGrid do
for x := 1 to XGrid do
begin
TempArray[x,y].Pixel := 0;
TempArray[x,y].Status := FALSE;
end;
end;
function GetFPS() : integer;
begin
Result := 0;
if mainform.N10fps.checked then Result := 100;
if mainform.N5fps.checked then Result := 200;
if mainform.N1fps.checked then Result := 1000;
if mainform.N01fps.checked then Result := 10000;
end;
{ PROCESSING PROCEDURES }
procedure ImageToArray();
var
x, y : integer;
begin
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
PixelArray[x*2-1,y*2-1].Pixel :=
Convert24bitTo8bitGrey(mainform.Image1.Canvas.Pixels[x,y]);
end;
function isEdge(x1, y1, x2, y2 : integer) : boolean;
begin
Result := FALSE;
if (abs(PixelArray[x1, y1].Pixel - PixelArray[x2, y2].Pixel) >
mainform.EdgeThreshold.Position) then Result := TRUE;
end;
procedure CalcEdges();
var
x, y : integer;
begin
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
begin
PixelArray[x*2, y*2-1].Status := IsEdge(x*2-1, y*2-1, x*2+1, y*2-1);
PixelArray[x*2-1, y*2].Status := IsEdge(x*2-1, y*2-1, x*2-1, y*2+1);
end;
end;
procedure CalcEdgeNodes(var TempPixelArray:TPixelArray);
var
x, y, temp : integer;
begin
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
begin
temp := 0;
if TempPixelArray[x*2-1, y*2].Status then inc(Temp);
if TempPixelArray[x*2+1, y*2].Status then inc(Temp);
if TempPixelArray[x*2, y*2-1].Status then inc(Temp);
if TempPixelArray[x*2, y*2+1].Status then inc(Temp);
if (temp = 2) and (PixelArray[x*2, y*2].Pixel >
mainform.GlobalThreshold.Position) then
TempPixelArray[x*2, y*2].Status := TRUE;
end;
end;
procedure CalcAverages();
var
x, y, i, j, av : integer;
begin
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
begin
av := 0;
for j := 1 to 4 do for i := 1 to 4 do
inc(av, PixelArray[x*2+(i*2)-5, y*2+(j*2)-5].Pixel);
PixelArray[x*2, y*2].Pixel := av div 16;
end;
end;
procedure CalcGlobalAverage();
var
x, y, temp : integer;
begin
temp := 0;
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
temp := temp + PixelArray[x*2-1, y*2-1].Pixel;
temp := temp div (mainform.Image1.height * mainform.Image1.width);
mainform.GlobalThreshold.Position := temp;
end;
procedure ResetPixel(x, y : integer);
begin
inc(reset.n);
reset.x[reset.n] := x;
reset.y[reset.n] := y;
PixelArray[x,y].Status := FALSE;
PixelArray2[x,y].Status := FALSE;
if (x>2) and PixelArray2[x-2,y].Status then ResetPixel(x-2,y);
if (y>2) and PixelArray2[x,y-2].Status then ResetPixel(x,y-2);
if (x+2<(2*mainform.image1.width)) and PixelArray2[x+2,y].Status then
ResetPixel(x+2,y);
if (y+2<(2*mainform.image1.height)) and PixelArray2[x,y+2].Status then
ResetPixel(x,y+2);
end;
procedure CalcStaticNoise(NoiseType : string);
var
x, y, randlimit : integer;
randspread, temp : double;
begin
randomize;
randlimit := round(2.55 * strtofloat(mainform.FPN.text));
randspread := (strtofloat(mainform.FPN.text) / 100);
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
begin
if (NoiseType='flat') then
begin
temp := randlimit / 2 - random(randlimit);
inc(PixelArray[x*2-1,y*2-1].Pixel, round(temp));
end;
if (NoiseType='gaussian') then
begin
temp := RandG(0, (randlimit / 6));
inc(PixelArray[x*2-1,y*2-1].Pixel, round(temp));
end;
if (NoiseType='speckle') then
begin
temp := 1 + randspread * (random - 0.5);
mul(PixelArray[x*2-1,y*2-1].Pixel, temp);
end;
if (PixelArray[x*2-1,y*2-1].Pixel > 255) then
PixelArray[x*2-1,y*2-1].Pixel := 255;
if (PixelArray[x*2-1,y*2-1].Pixel < 0) then
PixelArray[x*2-1,y*2-1].Pixel := 0;
end;
Mainform.ResultDesc.Text := Mainform.ResultDesc.Text + NoiseType + '=' +
mainform.FPN.text + ', ';
end;
procedure CalcSaltPepperNoise(NoiseType : string);
var
n, x, y, defects : integer;
begin
randomize;
defects := round(strtofloat(mainform.defects.text));
for n := 1 to defects do
begin
{ random(n) returns 0..n-1; offset by 1 to stay inside the 1-based pixel grid }
x := 1 + random(mainform.Image1.width);
y := 1 + random(mainform.Image1.height);
if (NoiseType='salt') then PixelArray[x*2-1,y*2-1].Pixel := 255;
if (NoiseType='pepper') then PixelArray[x*2-1,y*2-1].Pixel := 0;
end;
Mainform.ResultDesc.Text := Mainform.ResultDesc.Text + NoiseType + '=' +
inttostr(defects) + ', ';
end;
procedure CalcNextGen;
var
x, y : integer;
begin
centre.n := 0; reset.n := 0; PixelArray2 := PixelArray;
for y := 2 to mainform.Image1.height-1 do
for x := 2 to mainform.Image1.width-1 do
begin
if (PixelArray[x*2, y*2].Pixel < mainform.GlobalThreshold.Position)
and not(PixelArray[x*2, y*2].Status) then
begin
if PixelArray[(x-1)*2, y*2].Status or
PixelArray[(x+1)*2, y*2].Status or
PixelArray[x*2, (y-1)*2].Status or
PixelArray[x*2, (y+1)*2].Status then
PixelArray2[x*2, y*2].Status := TRUE;
if (PixelArray[(x-3)*2, y*2].Status and
PixelArray[(x+3)*2, y*2].Status) and
(PixelArray[x*2, (y-3)*2].Status and
PixelArray[x*2, (y+3)*2].Status) then
begin
centre.n := centre.n + 1;
centre.x[centre.n] := x * 2;
centre.y[centre.n] := y * 2;
ResetPixel(x*2, y*2);
CalcEdgeNodes(PixelArray2);
end;
end;
end;
PixelArray := PixelArray2;
end;
procedure CalcNextGenRandom;
var
x, y, count, maxcount : integer;
TempArray : TPixelArray;
begin
centre.n := 0; reset.n := 0; count := 0; PixelArray2 := PixelArray;
maxcount := (mainform.Image1.width - 2) * (mainform.Image1.height - 2);
EmptyPixelArray(TempArray);
while (count < maxcount) do
begin
x := 1 + random(mainform.Image1.width - 1);
y := 1 + random(mainform.Image1.height - 1);
if not(TempArray[x,y].Status) then
begin
TempArray[x,y].Status := TRUE;
inc(count);
if (PixelArray[x*2, y*2].Pixel < mainform.GlobalThreshold.Position)
and not(PixelArray[x*2, y*2].Status) then
begin
if PixelArray[(x-1)*2, y*2].Status or
PixelArray[(x+1)*2, y*2].Status or
PixelArray[x*2, (y-1)*2].Status or
PixelArray[x*2, (y+1)*2].Status then
PixelArray2[x*2, y*2].Status := TRUE;
if (PixelArray[(x-3)*2, y*2].Status and
PixelArray[(x+3)*2, y*2].Status) and
(PixelArray[x*2, (y-3)*2].Status and
PixelArray[x*2, (y+3)*2].Status) then
begin
inc(centre.n);
centre.x[centre.n] := x * 2;
centre.y[centre.n] := y * 2;
ResetPixel(x * 2, y * 2);
CalcEdgeNodes(PixelArray2);
end;
end;
end;
end;
PixelArray := PixelArray2;
end;
{ DISPLAY PROCEDURES }
procedure DisplayImage();
var
x, y : integer;
begin
mainform.Image1.Canvas.Brush.Color := BgCol;
mainform.Image1.Canvas.FillRect
(Rect(0,0,mainform.Image1.Width, mainform.Image1.Height));
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
mainform.Image1.Canvas.Pixels[x,y] :=
Convert8bitGreyTo24bit(PixelArray[x*2-1, y*2-1].Pixel);
end;
procedure DisplayAverages();
var
x, y : integer;
begin
mainform.Image2.Canvas.Brush.Color := BgCol;
mainform.Image2.Canvas.FillRect
(Rect(0,0,mainform.Image2.Width, mainform.Image2.Height));
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
mainform.Image2.Canvas.Pixels[x,y] :=
Convert8bitGreyTo24bit(PixelArray[x*2, y*2].Pixel);
end;
procedure DisplayThreshold();
var
x, y : integer;
begin
mainform.Image3.Canvas.Brush.Color := BgCol;
mainform.Image3.Canvas.FillRect
(Rect(0,0,mainform.Image3.Width, mainform.Image3.Height));
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
if (PixelArray[x*2, y*2].Pixel<mainform.GlobalThreshold.Position) then
mainform.Image3.Canvas.Pixels[x,y] := FgCol1;
end;
procedure DisplayEdges();
var
x, y : integer;
begin
mainform.Image4.Canvas.Brush.Color := BgCol;
mainform.Image4.Canvas.FillRect
(Rect(0,0,mainform.Image4.Width, mainform.Image4.Height));
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
if PixelArray[x*2, y*2].Status then
mainform.Image4.Canvas.Pixels[x,y] := FgCol2
end;
procedure DisplayNextGen();
var
x, y : integer;
begin
mainform.Image5.Canvas.Brush.Color := BgCol;
mainform.Image5.Canvas.FillRect
(Rect(0,0,mainform.Image5.Width, mainform.Image5.Height));
for y := 1 to mainform.Image1.height do
for x := 1 to mainform.Image1.width do
if PixelArray[x*2, y*2].Status then
mainform.Image5.Canvas.Pixels[x,y] := FgCol3;
if mainform.Saveimages1.Checked and (currgen = 1) then
begin
mainform.Image2.Picture.SaveToFile('smoothed.bmp');
mainform.Image3.Picture.SaveToFile('threshold.bmp');
mainform.Image4.Picture.SaveToFile('contour.bmp');
end;
if mainform.Saveimages1.Checked and (currgen < 100) then
mainform.Image5.Picture.SaveToFile('firing'+inttostr(currgen)+'.bmp');
end;
procedure DisplayResets();
var
n : integer;
begin
mainform.Image6.Canvas.Brush.Color := BgCol;
mainform.Image6.Canvas.FillRect
(Rect(0,0,mainform.Image6.Width, mainform.Image6.Height));
for n := 1 to reset.n do
mainform.Image6.Canvas.Pixels[(reset.x[n] div 2),
(reset.y[n] div 2)] := FgCol4;
for n := 1 to centre.n do
begin
mainform.Image6.Canvas.Pixels[(centre.x[n] div 2),
(centre.y[n] div 2)] := FgCol5;
mainform.Image6.Canvas.Pixels[(centre.x[n] div 2)-1,
(centre.y[n] div 2)] := FgCol5;
mainform.Image6.Canvas.Pixels[(centre.x[n] div 2)+1,
(centre.y[n] div 2)] := FgCol5;
mainform.Image6.Canvas.Pixels[(centre.x[n] div 2),
(centre.y[n] div 2)-1] := FgCol5;
mainform.Image6.Canvas.Pixels[(centre.x[n] div 2),
(centre.y[n] div 2)+1] := FgCol5;
end;
end;
procedure DisplayCentres();
var
n : integer;
begin
mainform.Image7.Canvas.Brush.Color := BgCol;
mainform.Image7.Canvas.FillRect
(Rect(0,0,mainform.Image7.Width, mainform.Image7.Height));
for n := 1 to Centre.n do
mainform.Image7.Canvas.Pixels[(centre.x[n] div 2),
(centre.y[n] div 2)] := FgCol5;
end;
procedure DisplaySizes();
var
x, y : integer;
count : array[1..maxsize] of integer;
begin
if mainform.Scan1.checked then
begin
for y := 1 to Ygrid do
for x := 1 to XGrid do
inc(SizeArray[x,y].Count);
for x := 1 to centre.n do
begin
SizeArray[centre.x[x], centre.y[x]].Size :=
SizeArray[centre.x[x], centre.y[x]].Count;
SizeArray[centre.x[x], centre.y[x]].Count := 0;
end;
for x := 1 to maxsize do count[x] := 0;
for y := 1 to Ygrid do
for x := 1 to XGrid do
if (SizeArray[x, y].Count>maxsize) then SizeArray[x, y].Size := 0
else
if (SizeArray[x, y].Size>0) then inc(Count[SizeArray[x, y].Size]);
end
else
for x := 1 to maxsize do count[x] := 0;
y := 0;
for x := 1 to maxsize do
begin
y := y + count[x];
if (mainform.FreqGrid.Cells[x, 1] <> inttostr(count[x])) then
mainform.FreqGrid.Cells[x, 1] := inttostr(count[x]);
end;
If (mainform.FreqGrid.Cells[maxsize + 1, 1] <> inttostr(y)) then
mainform.FreqGrid.Cells[maxsize + 1, 1] := inttostr(y);
end;
{ FILE OUTPUT PROCEDURES }
procedure OutputHeader();
begin
write(outfile, DateTimeToStr(Now)+': ');
write(outfile, 'filename="'+mainform.OpenPictureDialog1.FileName+'", ');
write(outfile, 'edge='+inttostr(Mainform.EdgeThreshold.Position)+', ');
write(outfile, 'threshold='+inttostr(Mainform.GlobalThreshold.Position));
writeln(outfile, ', '+mainform.ResultDesc.Text);
end;
procedure OutputResults();
var
n : integer;
begin
for n := 1 to maxsize do
write(outfile, mainform.FreqGrid.Cells[n, 1]+' ');
writeln(outfile, mainform.FreqGrid.Cells[maxsize+1, 1]);
end;
procedure OutputFile();
begin
assignfile(outfile, 'results.txt');
if fileexists('results.txt') then append(outfile) else rewrite(outfile);
OutputHeader(); OutputResults();
closefile(outfile);
end;
{ INITIALISATION AND CLEARUP PROCEDURES }
procedure InitFreqGrid();
var
n : integer;
begin
mainform.FreqGrid.Cells[0,0] := 'Size';
mainform.FreqGrid.Cells[0,1] := 'Count';
mainform.FreqGrid.Cells[(maxsize+1),0] := 'Total';
for n := 1 to maxsize do mainform.FreqGrid.Cells[n, 0] := inttostr(n);
end;
procedure ClearImages();
begin
mainform.Image1.Canvas.Brush.Color := BgCol;
mainform.Image1.Canvas.FillRect
(Rect(0,0,mainform.Image1.Width, mainform.Image1.Height));
mainform.Image1.refresh;
mainform.Image2.Canvas.Brush.Color := BgCol;
mainform.Image2.Canvas.FillRect
(Rect(0,0,mainform.Image2.Width, mainform.Image2.Height));
mainform.Image2.refresh;
mainform.Image3.Canvas.Brush.Color := BgCol;
mainform.Image3.Canvas.FillRect
(Rect(0,0,mainform.Image3.Width, mainform.Image3.Height));
mainform.Image3.refresh;
mainform.Image4.Canvas.Brush.Color := BgCol;
mainform.Image4.Canvas.FillRect
(Rect(0,0,mainform.Image4.Width, mainform.Image4.Height));
mainform.Image4.refresh;
mainform.Image5.Canvas.Brush.Color := BgCol;
mainform.Image5.Canvas.FillRect
(Rect(0,0,mainform.Image5.Width, mainform.Image5.Height));
mainform.Image5.refresh;
mainform.Image6.Canvas.Brush.Color := BgCol;
mainform.Image6.Canvas.FillRect
(Rect(0,0,mainform.Image6.Width, mainform.Image6.Height));
mainform.Image6.refresh;
mainform.Image7.Canvas.Brush.Color := BgCol;
mainform.Image7.Canvas.FillRect
(Rect(0,0,mainform.Image7.Width, mainform.Image7.Height));
mainform.Image7.refresh;
end;
procedure ResetAll(CalcGlobalThreshold : boolean);
begin
CurrGen := 0;
EmptySizeArray(SizeArray); EmptyPixelArray(PixelArray);
ImageToArray(); CalcAverages();
if CalcGlobalThreshold then CalcGlobalAverage();
CalcEdges(); CalcEdgeNodes(PixelArray);
DisplayImage(); Delay(50); DisplayAverages(); Delay(50);
DisplayThreshold(); Delay(50); DisplayEdges(); Delay(50);
DisplayNextGen(); Delay(50); DisplayResets(); Delay(50);
DisplayCentres(); Delay(50); DisplaySizes();
end;
{ TRACKBAR EVENT HANDLERS }
procedure Tmainform.GlobalThresholdChange(Sender: TObject);
begin
if (GlobalThresholdLabel.Caption <> inttostr(GlobalThreshold.Position)) then
begin
GlobalThresholdLabel.Caption := inttostr(GlobalThreshold.Position);
ResetAll(FALSE);
end;
end;
procedure Tmainform.EdgeThresholdChange(Sender: TObject);
begin
if (EdgeThresholdLabel.Caption <> inttostr(EdgeThreshold.Position)) then
begin
EdgeThresholdLabel.Caption := inttostr(EdgeThreshold.Position);
ResetAll(FALSE);
end;
end;
{ BUTTON EVENT HANDLERS }
procedure Tmainform.ResetAllButtonClick(Sender: TObject);
begin
DoResetAll := TRUE;
ClearImages();
// mainform.GlobalThreshold.Position := DefaultThreshold;
// mainform.EdgeThreshold.Position := DefaultEdge;
mainform.ResultDesc.Text := '';
PixelArray := OrigArray;
DisplayImage; ResetAll(TRUE);
DoResetAll := FALSE;
end;
procedure Tmainform.AutoThresholdButtonClick(Sender: TObject);
begin ResetAll(TRUE); end;
procedure Tmainform.SetThresholdsButtonClick(Sender: TObject);
begin ResetAll(FALSE); end;
procedure Tmainform.AddFlatButtonClick(Sender: TObject);
begin CalcStaticNoise('flat'); DisplayImage(); ResetAll(TRUE); end;
procedure Tmainform.AddGaussianButtonClick(Sender: TObject);
begin CalcStaticNoise('gaussian'); DisplayImage(); ResetAll(TRUE); end;
procedure Tmainform.AddSpeckleButtonClick(Sender: TObject);
begin CalcStaticNoise('speckle'); DisplayImage(); ResetAll(TRUE); end;
procedure Tmainform.AddSaltButtonClick(Sender: TObject);
begin CalcSaltPepperNoise('salt'); DisplayImage(); ResetAll(TRUE); end;
procedure Tmainform.AddPepperButtonClick(Sender: TObject);
begin CalcSaltPepperNoise('pepper'); DisplayImage(); ResetAll(TRUE); end;
procedure Tmainform.AddSaltPepperButtonClick(Sender: TObject);
begin
CalcSaltPepperNoise('salt'); DisplayImage();
CalcSaltPepperNoise('pepper'); DisplayImage(); ResetAll(TRUE);
end;
procedure Tmainform.AppendFileButtonClick(Sender: TObject);
begin OutputFile(); end;
{ MENU EVENT HANDLERS }
procedure Tmainform.Open1Click(Sender: TObject);
begin
if OpenPictureDialog1.Execute then
begin
Image1.Picture.LoadFromFile(OpenPictureDialog1.Filename);
ResetAll(TRUE); OrigArray := PixelArray;
end;
end;
procedure Tmainform.Exit1Click(Sender: TObject);
begin Application.Terminate; end;
procedure Tmainform.MaxRefresh1Click(Sender: TObject);
begin mainform.MaxRefresh1.Checked := TRUE; end;
procedure Tmainform.N10fpsClick(Sender: TObject);
begin mainform.N10fps.Checked := TRUE; end;
procedure Tmainform.N5fpsClick(Sender: TObject);
begin mainform.N5fps.Checked := TRUE; end;
procedure Tmainform.N1fpsClick(Sender: TObject);
begin mainform.N1fps.Checked := TRUE; end;
procedure Tmainform.N01fpsClick(Sender: TObject);
begin mainform.N01fps.Checked := TRUE; end;
procedure Tmainform.SingleFrame1Click(Sender: TObject);
begin mainform.SingleFrame1.Checked := TRUE; end;
procedure Tmainform.Scan1Click(Sender: TObject);
begin mainform.Scan1.Checked := True; ResetAll(False); end;
procedure Tmainform.Random1Click(Sender: TObject);
begin mainform.Random1.Checked := True; ResetAll(False); end;
procedure Tmainform.About1Click(Sender: TObject);
begin aboutform.show; end;
procedure Tmainform.SaveImages1Click(Sender: TObject);
begin mainform.Saveimages1.Checked:=not(mainform.Saveimages1.Checked); end;
procedure Tmainform.Monochrome1Click(Sender: TObject);
begin
mainform.Monochrome1.Checked:=not(mainform.Monochrome1.Checked);
if mainform.Monochrome1.Checked then
begin
BgCol := $00FFFFFF; FgCol1 := $00000000; FgCol2 := $00000000;
FgCol3 := $00000000; FgCol4 := $00999999; FgCol5 := $00000000;
end
else
begin
BgCol := $00000000; FgCol1 := $00FF0000; FgCol2 := $0000FF00;
FgCol3 := $000000FF; FgCol4 := $00666666; FgCol5 := $00FFFFFF;
end;
ResetAll(TRUE);
end;
{ APPLICATION EVENT HANDLERS }
procedure Tmainform.IdleHandler(Sender: TObject; var Done: Boolean);
begin
if not(DoResetAll) and not(mainform.SingleFrame1.Checked) then
begin
Delay(GetFPS());
if mainform.scan1.checked then CalcNextGen() else CalcNextGenRandom();
DisplayNextGen(); DisplayResets(); DisplayCentres(); DisplaySizes();
inc(CurrGen);
end;
end;
procedure Tmainform.Image1Click(Sender: TObject);
begin
if mainform.SingleFrame1.Checked then
begin
if mainform.scan1.checked then CalcNextGen() else CalcNextGenRandom();
DisplayNextGen(); DisplayResets(); DisplayCentres(); DisplaySizes();
end;
end;
procedure Tmainform.FormCreate(Sender: TObject);
begin
BgCol := $00000000; FgCol1 := $00FF0000; FgCol2 := $0000FF00;
FgCol3 := $000000FF; FgCol4 := $00666666; FgCol5 := $00FFFFFF;
DoResetAll := TRUE;
Application.OnIdle := IdleHandler;
InitFreqGrid();
mainform.GlobalThreshold.Position := defaultThreshold;
mainform.EdgeThreshold.Position := defaultEdge;
mainform.fpn.text := inttostr(defaultfpn);
mainform.defects.text := inttostr(defaultdefects);
ResetAll(TRUE);
OrigArray := PixelArray;
DoResetAll := FALSE;
end;
end.
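The simulator listing stores the image on an interleaved grid: pixel intensities sit at odd/odd coordinates, and the cell lying between two neighbouring pixels holds the edge flag for that pair (see ImageToArray, isEdge and CalcEdges). The following is a minimal C sketch of that layout and edge test, not a port of the simulator. The grid size, the names cell and calc_edges, and the boundary guards on the last row and column are assumptions made for illustration.

```c
#include <stdlib.h>

#define W 4             /* illustrative image width (assumption)  */
#define H 4             /* illustrative image height (assumption) */
#define GW (2 * W + 2)  /* interleaved grid, 1-based as in the listing */
#define GH (2 * H + 2)

/* One grid cell: an intensity and a binary status flag. */
typedef struct { int pixel; int status; } cell;

/* Pixels live at (2x-1, 2y-1). The cell at (2x, 2y-1) flags an edge between
   pixel x and its right-hand neighbour; the cell at (2x-1, 2y) flags an edge
   between pixel y and the neighbour below. An edge is declared when the
   absolute intensity difference exceeds the threshold. */
static void calc_edges(cell g[GW][GH], int threshold)
{
    for (int y = 1; y <= H; y++) {
        for (int x = 1; x <= W; x++) {
            int p = g[2 * x - 1][2 * y - 1].pixel;
            if (x < W)  /* guard added here: no neighbour past the last column */
                g[2 * x][2 * y - 1].status =
                    abs(p - g[2 * x + 1][2 * y - 1].pixel) > threshold;
            if (y < H)  /* guard added here: no neighbour past the last row */
                g[2 * x - 1][2 * y].status =
                    abs(p - g[2 * x - 1][2 * y + 1].pixel) > threshold;
        }
    }
}
```

With a step image (two dark columns followed by two bright ones) and a threshold between the two levels, only the cells separating the second and third pixel columns are flagged. The remaining even/even cells are left free for the averaged intensity and the firing state, as in the listing.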
Appendix B
AER Communication Source Code
(Firmware)
/* **************************************************************************
Program : ORASIS AER Readout Firmware (for PIC18F4620) in Microchip C18
Module : main.c
Date : See File Timestamp
Author : Timothy G Constandinou
Company : Imperial College London
************************************************************************** */
#include "p18f4620.h" /* for TRISB and PORTB declarations */
#include <delays.h>
#include <portb.h>
#include <adc.h>
#include <usart.h>
#include <stdio.h>
#pragma config OSC=HS,BOREN=OFF,WDT=OFF,MCLRE=ON,LPT1OSC=OFF,PBADEN=OFF,LVP=OFF,DEBUG=OFF,XINST=OFF
#define max_events 750
#pragma udata big1
char x[max_events+1];
#pragma udata big2
char y[max_events+1];
#pragma udata big3
unsigned int t[max_events+1];
#pragma udata
void initialise(void)
{
ClosePORTB();
CloseADC();
ADCON1 = 0b1111;
TRISA = 0b11111111;
TRISB = 0b01111111;
TRISC = 0b10001110;
TRISD = 0b00000000;
TRISE = 0b00000000;
}
void display_led(int led, int mode)
{
if (led==1) PORTCbits.RC4=mode;
if (led==2) PORTCbits.RC5=mode;
}
void display(int value) // To display a value (under 1023) on LED display
{
unsigned char value2=0;
while (value>255) {value-=256; value2++;}
PORTD=value;
PORTE=4+value2;
}
void orasis_reset(int mode) // To assert a GLOBAL_RESET signal on ORASIS
{
PORTCbits.RC0=mode; // Set reset
PORTBbits.RB7=1; // Set acknowledge high (active low)
}
void orasis_flush_aer(void) // Ensure AER registers are empty before starting
{
unsigned int n=0, m=0;
display_led(1,0); display_led(2,0);
while (n<50000) // If no chip request for 50K ops then flushed
{
if(PORTBbits.RB6==0) // If chip request then acknowledges
{
PORTBbits.RB7=0; while(PORTBbits.RB6==0);
PORTBbits.RB7=1; m++; n=0;
}
n++;
}
if (m>0)
fprintf(_H_USART, "ORASIS_P2 AER Output: Flushed %d events...\n\r", m);
display(m); Delay10KTCYx(500); // Displays number of events flushed
}
void orasis_sample_aer(void) // Runs chip, samples AER and outputs to RS232.
{
unsigned int n=0, event=0, time=0; // n must start at 0 for the clearing loop below
char *x_ptr = &x[0];
char *y_ptr = &y[0];
unsigned int *t_ptr = &t[0]; // pointer type matches the declaration of t[]
display_led(1,1); display_led(2,1);
while(n<=max_events)t[n++]=0;
while ((time<62500)&&(event<max_events)) // Samples until buffer is full
{ // or no activity for 500ms.
if(PORTBbits.RB6==0) // If chip request then acknowledges
{
x[++event]=PORTA&0b00111111;
y[event]=PORTB&0b00111111;
t[event]=time;
PORTBbits.RB7=0; while(PORTBbits.RB6==0); PORTBbits.RB7=1;
time+=3;
}
time++;
}
display_led(1,0); display_led(2,0); display(event);
if (event>0)
{
fprintf(_H_USART,
"ORASIS_P2 AER Output: Streaming %d events...\n\r", event);
for(n=1;n<=event;n++)
fprintf(_H_USART, "Event %u at t=%u: %u,%u\n\r", n, t[n], x[n], y[n]);
fprintf(_H_USART, "\n\r");
}
Delay10KTCYx(500);
}
void main (void)
{
int value;
unsigned char ctrl1,ctrl2,ctrl3;
initialise(); orasis_reset(1);
OpenUSART(USART_TX_INT_OFF & USART_RX_INT_OFF & USART_ASYNCH_MODE &
USART_EIGHT_BIT & USART_CONT_RX & USART_BRGH_HIGH, 10);
display_led(1,0); display_led(2,0);
for (value=0;value<=1000;value++)
{
if (((value/16)%2)==0) {display_led(1,0);display_led(2,1);}
if (((value/16)%2)==1) {display_led(1,1);display_led(2,0);}
Delay10KTCYx(1); display(value);
}
orasis_flush_aer(); orasis_reset(0); orasis_sample_aer();
orasis_reset(1);
CloseUSART();
}
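The firmware above services each address event with a request/acknowledge handshake: when the chip pulls the request line (RB6) low, the x and y addresses are read from the lower six bits of PORTA and PORTB, acknowledge (RB7) is driven low until the request is released, and the event is time-stamped. The host-side sketch below replays that sampling rule over a recorded trace of port values. It is an illustration only: the aer_trace and aer_event types and the function aer_decode are hypothetical and do not appear in the firmware, and the sample index stands in for the firmware's loop counter as the timestamp.

```c
#include <stddef.h>
#include <stdint.h>

#define ADDR_MASK 0x3Fu  /* six address bits per axis, as masked in the firmware */
#define REQ_BIT   0x40u  /* PORTB bit 6: chip request, active low */

/* One decoded address event (hypothetical type). */
typedef struct { uint8_t x, y; uint16_t t; } aer_event;

/* Recorded port samples, one pair per time step (hypothetical type). */
typedef struct {
    const uint8_t *porta;
    const uint8_t *portb;
    size_t len;
} aer_trace;

/* Emit one event per request: an event is latched on the first sample at
   which the request line is low, and nothing further is latched until the
   line has been released, mirroring the wait-for-release in the handshake.
   Returns the number of events written to out. */
size_t aer_decode(const aer_trace *tr, aer_event *out, size_t max_out)
{
    size_t n = 0;
    int in_request = 0;

    for (size_t i = 0; i < tr->len && n < max_out; i++) {
        int req_low = (tr->portb[i] & REQ_BIT) == 0;
        if (req_low && !in_request) {
            out[n].x = (uint8_t)(tr->porta[i] & ADDR_MASK);
            out[n].y = (uint8_t)(tr->portb[i] & ADDR_MASK);
            out[n].t = (uint16_t)i;
            n++;
        }
        in_request = req_low;
    }
    return n;
}
```

Keeping the decode rule separate from the port access makes it possible to check the address masking and the one-event-per-request behaviour on a desktop machine before running on the PIC.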
Appendix C
System Simulation Schematics
Figure C.1: Schematic diagram of the simulated 16×16 ASP array.
Figure C.2: Schematic diagram of the simulated 9×9 ABP array.
Figure C.3: Schematic diagram of the simulated 12×12 AER architecture.
Figure C.4: Schematic diagram of the simulated 12×12 array.
Figure C.5: Schematic diagram of the simulated system including a 12×12 distributed
processing array.
Appendix D
Testboard Hardware
5 5
4 4
3 3
2 2
1 1
D
D
C
C
B
B
A
A
ST NU
O
M
SNEL
STNU
O
M
BCP
1
draobts et
1
P-
SI
S
A
R
O
1
1
40 02, 22
hcra
M ,y ad no
M
eltiT
v e
R
reb
m u
Ntn e
muco
D
e zi
S
tee h
S
: eta
D
fo
v9 -
niv+
ni v-
v +
v9 -
-9vps
+9vps
A_v5+
v-
D_v5+
v9+
spv9+
v9 -
s pv9-
v9+
v 9+
spv5+
HI
I
HOL
HI
I
H
I
H
HIHI
LO
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
A_
D
N
G
A_
D
N
G
A_
D
N
G
A_
D
N
G
D_
D
N
G
D_
D
N
G
A_
D
N
G
D _
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
A_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G
D_
D
N
G D_
D
N
G D _
D
N
G D _
D
N
G
D_
D
N
G
A_
D
N
G
A_
D
N
G
A_
D
N
G
A_
D
N
G
A _
D
N
G
A_
D
N
G
A_
D
N
G
A_
D
N
G
A_
D
N
G
A_
D
N
G
A_
D
N
G
D _
D
N
G
D_
D
N
G
A_
D
N
G
HI HI
7
KL
K
NIL 1 2
2
W
S
4-
PI
D
W
S
C
02
R
5
PI
S
K0 01
12
3
4
5
43PT
9
U
67F61
CI
P
8
10
11 21 31 41 51 61 71 81
19
1
9
2 3 4 5 6 7 12
22 32 42 52 62 72 82
20
GND
OSC2/CLKOUT
I
K
C1T/
O
S
O1T/0
C
R
2
P
C
C/I
S
O 1T/1
C
R
1
P
C
C/ 2
C
R
L
C
S/
K
C
S/3
C
R
A
D
S/I
D
S/4
C
R
O
D
S /5
C
R
K
C/
XT/6
C
R
T
D/
X
R/7
C
R
GND
P
P
V/
RL
C
M
OSC1/CLKIN
0
N
A/0
A
R
1
N
A/1
A
R
2
N
A/ 2
A
R
F
E
R
V/3
N
A/3
A
R
I
K
C
OT/4
A
R
S
S/ 4
N
A/ 5
A
R
T
NI/
O
B
R
1
B
R
2
B
R
M
G
P/3
B
R
4
B
R
5
B
R
C
G
P/6
B
R
D
G
P/7
B
R
VDD
K033
5
R
71PT
71
C
Fp33
4
D
D
EL
1
U
0 22
OT/90 97
ML
2
1
3
NI
V
GND
T
U
O
V
11
U
B
B/206
A
P
O
32
7 4
6 51
+-
V+ V-
T
U
O
2T1T
C
11
R
5
PI
S
K001
12
3
4
5
7PT
21PT
3PT
93PT
42 PT
5
KL
K
NI L 12
2P T
1
P-
SI
S
A
R
O
6
U
1 p-p ihc
1
2
3
4 5 6 8 9 01 11 21 31 41 51 61 71 8102 12
22
23
24
52 62 72 82 92 03 23
33 34
3536
73 83 93 04 14 24 4454 64 74 84
VAS
VDD
VAD
1T
U
O
Figure D.1: Schematic diagram of the ORASIS-P1 platform for sub-circuit test and photodiode
characterisation.
Figure D.2: Photograph of the ORASIS-P1 platform for sub-circuit test and photodiode
characterisation.
[Schematic sheet title block: ORASIS-P2A photodiode testboard, dated Tuesday, February 08, 2005.]
Figure D.3: Schematic diagram of the ORASIS-P2 platform for photodiode characterisation.
Figure D.4: Photograph of the ORASIS-P2 platform for photodiode characterisation.
[Schematic sheet title block: ORASIS-P2 test board, dated Wednesday, June 01, 2005.]
Figure D.5: Schematic diagram of the ORASIS-P2 platform for system test and validation.
Figure D.6: Photograph of the ORASIS-P2 platform for system test and validation.
Appendix E
Optoelectronic Measurement Setup
[Figure E.1 block labels: Keithley 236 Source-Measure Unit (SMU); computer with GPIB interface; pulse generator; device chamber; tungsten halogen light source; monochromator; filter wheel; 400nm filter; focal optics; device mount.]
Figure E.1: Diagram of equipment setup for photodiode characterisation.
The equipment setup¹ for measuring the electrical (I-V characteristics) and optical (spectral
response, responsivity, quantum efficiency, etc.) properties of the photodiodes is illustrated in Fig. E.1.
The tungsten halogen light source is UV filtered to remove sub-400nm wavelengths
and so avoid 2λ (second-order diffraction) interference. The filtered source and the
monochromator are characterised through the test optics with a calibrated photodiode
(Newport 818-UV#3310), which gives the total incident light power at the device mount
for each monochromator (wavelength) setting. In addition, a single small-area photodiode
is scanned across the incident light spot to determine its intensity profile. This
characterisation is illustrated in Fig. E.2.
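One way the two calibration results might be combined is sketched below: the total spot power from the calibrated photodiode is apportioned to a test device in proportion to the scanned intensity profile. This is only an illustration. The function name, the uniform scan grid and the description of the device as a set of grid cells are assumptions and are not taken from the measurement procedure itself.

```python
def power_on_device(total_power_w, profile, device_cells):
    """Estimate the optical power (W) falling on a device.

    total_power_w: total spot power measured with the calibrated photodiode
                   (assumed to capture the whole spot).
    profile:       2-D list of relative intensities from the scanned
                   small-area photodiode (assumed uniform grid; any scale).
    device_cells:  iterable of (row, col) grid cells covered by the device
                   (hypothetical way of describing its position and area).
    """
    total_weight = sum(sum(row) for row in profile)
    if total_weight <= 0:
        raise ValueError("intensity profile must contain positive values")
    # Fraction of the spot's energy that lands on the device cells.
    device_weight = sum(profile[r][c] for r, c in device_cells)
    return total_power_w * device_weight / total_weight
```

With a 3 x 3 profile whose weights sum to 6 and a device covering a single cell of weight 2, one third of the total power is attributed to the device.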
For each test device, the following measurements are taken:
• Electrical characterisation: At a set wavelength and calibrated light intensity, the
Source-Measure Unit (SMU) sweeps the device bias and measures the corresponding
photocurrent. The sweep is repeated for all Neutral Density (ND) settings (0, 0.15, 0.30, 0.41, 0.45,
0.77, 0.87, 1.10, 1.34, 1.47, 2.04, 5.06) and under dark (no input) conditions.
• Spectral characterisation: At each wavelength, the photoresponse is measured for several
values of reverse bias. From this spectral photoresponse data, the responsivity, and
hence the external quantum efficiency, can be determined.
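The step from photoresponse to responsivity and external quantum efficiency can be sketched with the standard relations R = I_ph / P_opt and EQE = R·h·c / (q·λ), together with the usual definition of ND filter transmission, T = 10^(-ND). The function names below are hypothetical, and the dark-current subtraction is an assumption about how the photocurrent is extracted, not a statement of the procedure actually used.

```python
PLANCK_H = 6.62607015e-34      # Planck constant, J s
LIGHT_C = 2.99792458e8         # speed of light in vacuum, m/s
ELECTRON_Q = 1.602176634e-19   # elementary charge, C


def nd_transmission(optical_density):
    """Fraction of light passed by a neutral density filter: T = 10^(-ND)."""
    return 10.0 ** (-optical_density)


def responsivity(measured_current_a, dark_current_a, optical_power_w):
    """Responsivity in A/W; dark current is subtracted (assumed step)."""
    if optical_power_w <= 0:
        raise ValueError("optical power must be positive")
    return (measured_current_a - dark_current_a) / optical_power_w


def external_quantum_efficiency(responsivity_a_per_w, wavelength_m):
    """Collected electrons per incident photon: EQE = R*h*c / (q*lambda)."""
    return responsivity_a_per_w * PLANCK_H * LIGHT_C / (ELECTRON_Q * wavelength_m)
```

As an illustration, a responsivity of 0.25 A/W at 550nm corresponds to an external quantum efficiency of roughly 0.56.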
¹ Optoelectronic measurements were taken at facilities provided by the Blackett Laboratory,
Department of Physics, Imperial College London.
Figure E.2: Calibrated (measured) light source intensity characteristics. Illustrated are: (a)
intensity profile and (b) spectral transmission (normalised to maximum value).
Appendix F
Publications
• T. G. Constandinou and C. Toumazou, “A Micropower Centroiding Vision Processor,”
IEEE Journal of Solid State Circuits, submitted, 2005.
• T. G. Constandinou, P. Degenaar and C. Toumazou, “An ON-OFF spiking photoreceptor
for adaptive ultrafast/ultrawide dynamic range vision chips,” Proceedings of
the IEEE Workshop on Biomedical Circuits and Systems, S1/6-9, 2004.
• T. G. Constandinou, J. Georgiou and C. Toumazou, “A Nanopower Tuneable Edge
Detection Circuit,” Proceedings of the 2004 IEEE Symposium on Circuits and Systems,
Vol. 1, pp. 449–452, 2004.
• T. G. Constandinou, J. Georgiou and C. Toumazou, “Towards a Bio-inspired Mixed-signal
Retinal Processor,” Proceedings of the 2004 IEEE Symposium on Circuits and
Systems, Vol. 5, pp. 493–496, 2004.
• T. G. Constandinou and C. Toumazou, “Neuromorphic electronics for real-time biomedical
image processing,” INE The Neuromorphic Engineer, Vol. 1, No. 1, pp. 7,
2004.
• T. G. Constandinou, J. Georgiou and C. Toumazou, “Nano-power mixed-signal tunable
edge-detection circuit for pixel-level processing in next generation vision systems,”
IEE Electronics Letters, Vol. 39, No. 25, pp. 1376-1377, 2003.
• T. G. Constandinou, T. S. Lande and C. Toumazou, “Bio-pulsating architecture for
object-based processing in next generation vision systems,” IEE Electronics Letters,
Vol. 39, No. 16, pp. 1169-1170, 2003.
• T. G. Constandinou, J. Georgiou and C. Toumazou, “An Auto-input-offset Removing
Floating Gate Pseudo-differential Transconductor,” Proceedings of the 2003 IEEE
Symposium on Circuits and Systems, Vol. 1, pp. 169-172, 2003.
