On-chip spatial image processing with CMOS active pixel sensors by Hong, Canaan Sungkuk
On-Chip Spatial Image Processing 
with CMOS Active Pixel Sensors 
BY 
Canaan Sungkuk Hong 
A thesis 
presented to the University of Waterloo 
in fulfillment of the 
thesis requirement for the degree of 
Doctor of Philosophy 
in 
Electrical and Computer Engineering 
Waterloo, Ontario, Canada, 2001 
© Canaan S. Hong 2001 
The author has granted a non-exclusive licence allowing the 
National Library of Canada to 
reproduce, loan, distribute or sell 
copies of this thesis in microform, 
paper or electronic formats. 
The author retains ownership of the 
copyright in this thesis. Neither the 
thesis nor substantial extracts from it 
may be printed or otherwise 
reproduced without the author's 
permission. 
The University of Waterloo requires the signatures of all persons using or photocopying this 
thesis. Please sign below, and give address and date. 
ABSTRACT 
Output images from image sensors are often not optimal for display or further 
processing, mainly because of noise, blurriness and poor contrast. To mitigate these 
problems, image processors typically accompany the image sensors as a part of the whole 
camera system. Typically, two separate chips for sensing and processing are integrated onto 
the same printed circuit board, connected by printed wires. The integration of image sensors 
and processing circuits on a single monolithic chip, called smart sensing, is done to obtain 
better performance from sensors and make the sensing and processing system more compact. 
It has become a popular idea. The integration of image acquisition and processing on the 
same focal plane has potential advantages through low fabrication cost, low power, compact 
size, and fast processing frequency. Noise and cross-talk can also be reduced through 
monolithic connections instead of off-chip wires, which are the only transfer medium 
between two separate chips. 
In this thesis, we propose system-level architectures and design methodology for integrating 
image processing with CMOS active pixel sensors on a single chip. Conventional approaches 
to the integration, categorized by circuit density of processing elements, are not sufficient to 
achieve an optimal design in terms of power, speed, cost, and processing frequency. This thesis 
observes the nature of image processing algorithms and categorizes them in order to find an 
adequate design architecture for real time smart sensing. The algorithms can be divided in 
terms of signal type, operational domain, and regions of operation. We narrow these down 
into analog/low bit digital operation in the spatial domain, and then subdivide the algorithms into 
point, local, and global operational regions. For each region of operation, we look at 
examples of processing algorithms and then subdivide them again according to on-chip 
implementation methodology. Here, we propose system-level architecture and on-chip design 
methodology for these categorized algorithms. 
Four prototype chips were designed and fabricated in this thesis for the demonstration of 
smart sensing: one is a multi-camera system, which is the inspiration for the smart sensing 
research, and the other three are demonstration imagers for each region of operation: point, 
local and global. These prototype chips are 64x64 photodiode arrays with on-chip image 
processing, fabricated in standard 0.35 µm CMOS technology with a 3.3 V power supply. Each 
chip contains different functional processing and operates with different performance. We have 
successfully tested the chips and measured their performance and characteristics. 
This thesis reports implementation architectures and design methodologies of on-chip 
processing with image sensors, and their analysis along with operational performance and 
experimental results. These implementations demonstrate the advantages of the single chip 
solution and contribute as a milestone so designers and researchers can have a better 
understanding of smart sensing. 
ACKNOWLEDGEMENTS 
I would like to acknowledge my supervisor, Prof. Richard Hornsey, for his constant support, 
encouragement, motivation, research direction, and belief in me throughout the duration of 
this research. 
I am also deeply thankful to Dr. Paul Thomas at Topaz Technology Inc., for his support and 
encouragement in the early research projects. 
I would also like to thank all my colleagues at the University of Waterloo for their valuable 
discussions, suggestions, and support in helping me become familiar with many of the 
hardships associated with being a graduate student. 
I am grateful for the research support and funding from Natural Science and Engineering 
Council of Canada and the Center for Research in Earth and Space Technology. 
I would also like to thank the Canadian Microelectronics Corporation for permission to 
access their processing technology and fabrication of chips. 
Finally, my profound gratitude goes to my parents, my wife Seungeun, and my son Joseph, 
for their patience, understanding, unconditional love and support. 
TABLE OF CONTENTS 
Chapter I 
1. Introduction 
Chapter II 
2. Basic Operation and Structure of CMOS Image Sensors 
2.1. Solid-State Image Sensors 
2.2. History of Image Sensors at Visible Spectrum 
2.3. CCD and CIS for Smart Sensors 
2.4. Fundamentals of CMOS Image Sensors 
2.4.1. Optical Absorption and Photo-Generation 
2.4.2. Photon Collection (Quantum Efficiency) 
2.4.3. CMOS Photodetectors 
2.4.4. Active Buffer in Pixel 
2.4.5. Operation of Active Pixel Sensor with Photodiode 
2.4.6. Readout Control 
2.4.6. Sample and Hold (S/H) 
2.4.7. Basic Structure of CIS APS array 
2.5. Future Research Focuses of CMOS Image Sensor 
Chapter III 
3. MOSAIC Multi-Camera Imager System with CMOS Image Sensors 
3.1. Introduction 
3.2. Single Chip versus Multi-chip Systems 
3.3. Previous MOSAIC Implementations 
3.4. Design of MOSAIC 
3.4.1. Integrated Bus Interface with CMOS Image Sensor 
3.4.2. Circuits and Layouts 
3.4.3. Demonstration and Tests 
3.5. Conclusions for MOSAIC: Single Chip Camera Modules 
Chapter IV 
4. Spatial Image Processing Integrated with CMOS Image Sensor 
4.1. Introduction 
4.2. Smart Sensors (Vision Chips): Why Smart Sensors? 
4.3. On-chip Early Image Processing: What on Smart Sensors? 
4.4. Architectures for On-chip Processing Integration: How to Implement Smart Sensors? 
4.4.1. Previous Work 
4.4.2. Types of Hardware Implementation 
4.4.3. Design Issues of Hardware Implementation 
4.4.4. Types of Image Processing Algorithms 
Chapter V 
5. Point Operation 
5.1. Introduction 
5.2. Comparisons between On-chip Implementations for Point Operation 
5.3. Design of In-pixel Contrast Stretching 
5.3.1. Introduction 
5.3.2. Intensity Transformation Function 
5.3.3. Previous Work on Pixel Level Processing 
5.3.4. Designs of CMOS Active Pixel Sensor with In-pixel Intensity Transformer 
5.3.5. Tests and Performances 
5.3.6. Summary and Conclusions 
Chapter VI 
6. Local Operation 
6.1. Introduction 
6.1.1. Smoothing Filters 
6.1.2. Sharpening Filters 
6.1.3. Derivative Filters (edge detection) 
6.2. Proposed Structure for Local Operation 
6.2.1. Implementations of 3x3 Local Mask Filters 
6.2.2. Implementation of Bigger Masks than 3x3 
6.3. Spatial On-Chip Binary Image Processing 
6.3.1. Fundamental Operation in Binary Image Processing 
6.3.2. Previous Works on Binary Image Processing 
6.3.3. Design of CMOS Active Pixel Sensor with On-Chip Binary Image Processing 
6.3.4. Tests and Performance 
6.3.5. Summary and Conclusions 
Chapter VII 
7. Global Operation 
7.1. Introduction 
7.2. Structure of Global Processing 
7.3. 2-D Object Positioning System (OPS) 
7.3.1. Chip Design 
7.3.2. Demonstration and Tests 
7.3.3. Summary and Conclusions 
Chapter VIII 
8. Summary and Conclusions 
Appendix A: Inverted Logarithmic Pixel Sensors with Current Readout 
A.1. Introduction 
A.2. Inverted Logarithmic Pixel Sensors 
A.3. Testing and Measurements 
A.4. Conclusions 
Appendix B: Basic Procedures for Image Capture Test 
Appendix C: Image Sensor Characteristics 
C.1. Basic Measurements 
C.2. Imager Characteristics Extraction and Calculation 
C.3. Image Sensor Characteristics 
References 
LIST OF FIGURES 
Figure 2.1. Solid-state image sensors over wide spectral range 
Figure 2.2. History of MOS, CCD and CMOS image sensors 
Figure 2.3. Absorption coefficient and penetration depth of silicon at different 
wavelengths of incidental light 
Figure 2.4. Photo-generation and collection of photon-generated electron-hole pairs 
Figure 2.5. Drift and minority diffusion in collection of photogenerated charge 
Figure 2.6. CMOS photodetectors 
Figure 2.7. Active pixel sensor with photodiode and active buffer 
Figure 2.8. Active buffers in CMOS APS with photodiode 
Figure 2.9. Cross sectional view of the photodiode 
Figure 2.10. Schematic view of photodiode pixel sensor with parasitic capacitance 
Figure 2.11. Configuration of a source follower as a gate buffer and current source 
Figure 2.12. Shift register, using flipflops 
Figure 2.13. Shift register with two inverters in each processing element 
Figure 2.14. Typical sample and hold circuit 
Figure 2.15. Sample and hold with PMOS source follower 
Figure 2.16. An advanced sample and hold 
Figure 2.17. Schematic of CMOS APS 
Figure 2.18. A simplified timing control 
Figure 2.19. Overall structure of CMOS APS array 
Figure 3.1. MOSAIC multi-camera system with a central controller 
Figure 3.2. Single chip and multi-chip system for MOSAIC system 
Figure 3.3. Systematic connection of MOSAIC imager 
Figure 3.4. Chip photo of MOSAIC chip 
Figure 3.5. Array structure of ideal MOSAIC image sensor with an integrated 
bus interface 
Figure 3.6. Active Pixel Sensor with photodiode and active buffer in integration mode 
Figure 3.7. Schematic of S/H 
Figure 3.8. Shift register is implemented for readout circuitry, using two 
inverters and switches 
Figure 3.9. Readout circuitry is integrated with switches enabled by Bus Grant signal 
Figure 3.10. Test board with MOSAIC chip and lens mounted 
Figure 3.11. Characteristics of single image sensor 
Figure 3.12. Photosensitivity of single chip of MOSAIC 
Figure 3.13. Dark current measurement in single chip of MOSAIC 
Figure 3.14. Images with different Vbiasn 
Figure 3.15. Images with different S/H Vbiasp 
Figure 3.16. Images with different sampling rate 
Figure 3.17. Testing setups for three MOSAIC chips' connection 
Figure 3.18. Panorama images captured by the MOSAIC system 
Figure 3.19. Test results of MOSAIC imager 
Figure 4.1. Optical image system in human 
Figure 4.2. General machine vision/image processing operational stages of image 
analysis 
Figure 4.3. Structures of focal plane implementations with image sensors 
Figure 4.4. Number of transistors per pixel as a function of process technology 
Figure 4.5. Fill factor for different number of transistors in a pixel 
Figure 4.6. Maximum processing time available for the processing element for 
different sizes of array 
Figure 4.7. Total power consumption only for image sensor array 
Figure 4.8. Total power consumption (not including image acquisition) of the 
different array size for different processing levels 
Figure 4.9. Image operation divided by regions of operation: point operation, 
local operation and global operation 
Figure 5.1. Image processing of image negative 
Figure 5.2. Contrast stretching technique 
Figure 5.3. Image compression 
Figure 5.4. Gray level slicing 
Figure 5.5. Gray-level intensity transformation function for contrast enhancement 
Figure 5.6. Original image of Matlab simulations on intensity transformer with its 
histogram and intensity transformation function 
Figure 5.7. Matlab simulations on intensity transformer (mapping function) with 
contrast stretching technique 
Figure 5.8. Matlab simulations on intensity transformer (mapping function) with 
brightness adjustment technique 
Figure 5.9. Matlab simulations on intensity transformer (mapping function) 
with gamma correction technique 
Figure 5.10. Die photograph of the prototype chip. The total area is 16 mm² 
Figure 5.11. Schematics of common source follower consisting of a transformer with 
enhanced mode NMOS active load 
Figure 5.12. Voltage response of a common source amplifier with enhanced mode 
NMOS active load 
Figure 5.13. Response of a common source amplifier with voltage output of 
photodiode as its input 
Figure 5.14. Schematic of intensity transformer 
Figure 5.15. HSPICE simulations on an intensity transformer 
Figure 5.16. Overall structure of the chip 
Figure 5.17. Schematics of main components of intensity transformer chip 
Figure 5.18. Photoresponse of in-pixel intensity transformer 
Figure 5.19. Sample images captured in real time by the chip in normal mode 
Figure 5.20. Pattern noise can be reduced by subtracting white background 
image from the raw image 
Figure 5.21. Characteristics of single chip 
Figure 5.22. Sample images and histograms of three output modes 
Figure 5.23. Original images captured in normal mode with different illuminations 
Figure 5.24. Effects of biasing voltage (Vbiasp) 
Figure 5.25. Effects of reference voltage (Vref) 
Figure 5.26. Mismatches in three output modes 
Figure 6.1. Matlab simulations on smoothing filters 
Figure 6.2. Matlab simulations on sharpening filters 
Figure 6.3. Matlab simulations on edge detection filters 
Figure 6.4. Local masks with different sizes 
Figure 6.5. Local masks with different connectivity 
Figure 6.6. Pixel processing for 3x3 local mask operation 
Figure 6.7. Column processing for 3x3 local mask operation 
Figure 6.8. Chip processing for 3x3 local mask operation 
Figure 6.9. Hybrid method (column + chip processing) for 3x3 local mask operation 
Figure 6.10. Frame memory processing for 3x3 local mask operation 
Figure 6.11. Concept of pipelined local masking 
Figure 6.12. Basic structure of pipelined implementation for large local masks 
Figure 6.13. Binary Image Processing with various functionalities 
Figure 6.14. Die photograph of the prototype chip. The total area is 3.2 x 3.2 mm² 
Figure 6.15. Overall structure of Binary Image Processing 
Figure 6.16. Schematic of major components in on-chip binary image processing 
Figure 6.17. Detailed structure of On-chip Binary Image Processor 
Figure 6.18. Schematic of Voltage Comparator 
Figure 6.19. Logic design and schematics of the switches 
Figure 6.20. Real time images captured by the chip in normal mode operation 
Figure 6.21. Effects of frame rate in normal mode operation 
Figure 6.22. Defect of white spot in normal mode 
Figure 6.23. Removing the defects 
Figure 6.24. Sample images of on-chip binary image processing 
Figure 6.25. Demonstrations of binary image processing 
Figure 6.26. Binary image processing from the shape of the objects 
Figure 6.27. The effects of reference voltage 
Figure 6.28. Connectivity 
Figure 7.1. Transfer function of different types of low pass filters 
Figure 7.2. Transfer function of ideal high pass filter 
Figure 7.3. Transfer functions of high frequency emphasis filters 
Figure 7.4. Structure of 2-D Object Positioning System and its basic operation 
Figure 7.5. Structure of global OR gate 
Figure 7.6. Overall structure of CIS array with object positioning systems 
Figure 7.7. Die photo of Object Positioning Chip 
Figure 7.8. Schematic of a pixel for 2-D Object Positioning System 
Figure 7.9. Schematics of a pixel and event detection latch 
Figure 7.10. Sample images of the 2D object positioning chip 
Figure 7.11. When multiple balls exist in the input image 
Figure 7.12. Test results of 2-D OPS imager 
Figure A.1. Structures of logarithmic pixel sensors 
Figure A.2. Simulated effect of lithographic deviation on a regular logarithmic 
pixel sensor 
Figure A.3. Simulated effect of lithographic deviation on an inverted logarithmic 
pixel sensor 
Figure A.4. Schematic view of the sensor structure 
Figure A.5. Structures of photodiode used for the inverted logarithmic pixel sensors 
Figure A.6. Variation of the photoresponse of the inverted logarithmic pixel with 
number of load transistors 
Figure A.7. Sample image captured by inverted logarithmic pixel sensors 
Figure A.8. Photograph of the image sensor die. Total die area is 16 mm² 
Figure A.9. Variation of rms pattern noise with illumination 
Figure A.10. Effect of image sensor VDD on image quality 
Figure A.11. Effect of transresistance amplifier reference voltage on image quality 
Figure A.12. Effect of data sampling rate on image quality 
LIST OF TABLES 
Table 1. Major differences in process between CCDs and CISs 
Table 2. Present status of CMOS image sensors 
Table 3. General descriptions and comparisons on hardware implementation structures, 
with their advantages and disadvantages 
Table 4. Numerical comparisons of hardware implementation structures for an MxN 
array with S frames/second 
Table 5. General descriptions and comparisons of point operation implementations, 
for different types of the point operation 
Table 6. Single chip characteristics in normal mode and contrast mode 
Table 7. General descriptions and comparisons of local operation implementations, 
for different size of local masks 
Table 8. Characteristics of single chip 
Table 9. General descriptions and comparisons of global operation, for different 
operation domain 
Table 10. Characteristics of chip tests 
Table 11. Summary of on-chip implementation methodology for image processing 
algorithms 
Chapter I 
1. Introduction 
There are many kinds of electronic camera available on today's market with various 
applications such as document and film scanning, video imaging, still-image capture, 
machine vision, infrared and x-ray imaging, astronomy and microscopy. Despite the wide 
variety of applications, all digital cameras have the same basic functional components, which 
consist of optical collection of photons (e.g. a lens), wavelength discrimination of photons 
(e.g. filters), detector (e.g. solid state sensors), timing, control and drive electronics for the 
sensors, signal processing electronics for correlated double sampling, colour processing, 
analog-to-digital conversion and interface electronics [1]. 
A core component of an electronic camera is the solid-state image sensor that converts light 
into electrical form and, further, may process and convert it into an appropriate signal (e.g. 
digital signal). For many years, silicon based image sensors have been extensively 
investigated since silicon has good light absorption characteristics over the visible spectrum 
and has a mature technology in its processes and VLSI circuits. Over the visible spectral 
range, there are two main silicon-based image sensor technologies, Charge Coupled Devices 
(CCDs) and CMOS Image Sensors (CISs). Although these technologies use the same silicon 
as substrate, they are quite distinct in their photo-characteristics and functional operation. 
CCDs have been the dominant technology for electronic image sensors for several decades 
due to their low dark current, high photosensitivity, low fixed pattern noise, small pixel size 
and structure. However, in the last decade, CMOS image sensors have gained attention from 
many researchers and industries due to their low power, low fabrication cost, compatibility 
with VLSI integration, and radiation hardness. Many researchers are attracted by their low 
power, low weight and radiation hardness for deep-space applications. Custom markets are 
interested in CISs for their low fabrication cost and the compatibility of VLSI circuits with 
image sensors. 
This thesis focuses on the VLSI compatibility of CISs and, more particularly, on the integration 
of image processing algorithms on the same focal plane with CISs, so-called smart sensors or 
vision chips. This thesis discusses why the integration of the smart sensors is advantageous, 
what should exist on the smart sensors, and how to integrate image processing 
algorithms with CISs (i.e. how to implement the smart sensors). The thesis includes 
recommendations on system-level architectures, applications and limitations of the 
implementation of smart sensors, which are categorized by the nature of image processing 
algorithms. 
The main contributions and objectives of this thesis are summarized as follows: (i) to give 
milestones of designs for integration of image processors with CMOS image sensors, where 
designers and engineers can start their initial implementations, and to give a better 
understanding of the integration so that designers and researchers have guidance for improving the 
implementation techniques for smart sensors, (ii) to determine the feasibility of the 
integration of image processors with image sensors in standard CMOS 0.35 µm technology, 
(iii) to demonstrate scalability of design with technology, (iv) to forecast possible design and 
implementation issues of the integration in advance, and lastly (v) to suggest future research 
directions for smart sensor implementations. 
In Chapter 2, a brief description of solid-state image sensors and their applications is outlined and 
the history of developments in CCD and CIS is reported. The advantages and disadvantages 
in functional operation and processes of CCD and CIS are compared. Then, the basic 
operation of CIS is discussed along with their basic functional components and structural 
layout. The future expectations and applications of CIS are also included. 
In Chapter 3, a concept of MOSAIC (Matrix of Semi-Autonomous Imaging Cameras) is 
proposed for large field of view. The definition and applications of the MOSAIC system are 
also discussed in this chapter. A simple MOSAIC chip with CIS array and bus interface was 
designed and fabricated for a demonstration of the multi-camera concept. The detailed 
designs of the MOSAIC chip and its test results are explained. The conclusions and 
suggestions for the MOSAIC concept are also discussed at the end of the chapter. 
Based on the conclusions of the MOSAIC chapter, the main focus of the rest of the research 
is on effective integration architectures for image processing algorithms with CIS. In Chapter 
4, the background of image processing integration with CIS is outlined, including why, what 
and how to employ smart sensors with CIS. This includes previous implementations of 
processing integration with CIS and their relations to this thesis. It explains the sequence of 
image processing analysis and its relation with smart sensors by discussing the structural 
implementation of focal plane integration with CIS. For effective integration architectures, 
we categorize the types of image processing algorithms in terms of signals, domains and 
spatial regions of operation. 
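The three spatial regions of operation can be illustrated in software. The following is a minimal sketch only, assuming 8-bit grayscale images stored as nested lists; the function names, the test image, and the choice of a mean-intensity calculation as the global example are illustrative assumptions and do not represent the on-chip circuits described in later chapters.

```python
# Illustrative sketch only: hypothetical helper functions, not the
# circuits or algorithms implemented on the prototype chips.

def point_op(img, f):
    """Point operation: each output pixel depends on one input pixel."""
    return [[f(p) for p in row] for row in img]

def local_op(img, mask):
    """Local operation: each output pixel depends on a 3x3 neighbourhood.
    Border pixels are copied unchanged (a simplifying assumption)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                mask[j][i] * img[y + j - 1][x + i - 1]
                for j in range(3) for i in range(3)
            )
    return out

def global_op(img):
    """Global operation: the result depends on every pixel in the frame.
    Here the example is the mean intensity of the whole image."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)

image = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]

negative = point_op(image, lambda p: 255 - p)   # image negative (8-bit assumed)
box = [[1 / 9] * 3 for _ in range(3)]           # 3x3 averaging (smoothing) mask
smoothed = local_op(image, box)
mean_level = global_op(image)
```

The amount of data each output needs grows from one pixel (point), to a small neighbourhood (local), to the whole frame (global), which is the property the categorization relies on when matching algorithms to pixel, column, chip or frame memory processing.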
In Chapter 5, the advantages and disadvantages of integration architecture of image 
processing algorithms for point operations are investigated. The definition of a point 
operation is discussed along with examples of this operation. The merits and drawbacks of 
point operation in different implementation structures of pixel, column, chip and frame 
memory processing are compared. The optimal architectures for the integration are also 
proposed according to general characteristics of sensor applications. A CIS chip with in-pixel 
contrast stretching, also known as an intensity mapping function, was designed and 
fabricated as a demonstration of point operation at the pixel level. This chapter includes the 
detailed design and fabrication of the chip and its test results. 
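As a software illustration of contrast stretching as an intensity mapping function, the sketch below uses a piecewise-linear mapping. It is a minimal example under stated assumptions: 8-bit intensities, and hypothetical window limits `low` and `high`. The chip itself performs the transformation in the pixel circuit, so this code only illustrates the mapping idea, not the implemented design.

```python
def contrast_stretch(p, low=64, high=192, max_level=255):
    """Map intensities in [low, high] linearly onto [0, max_level].
    Values at or below `low` clip to 0; values at or above `high`
    clip to max_level. The window limits are hypothetical."""
    if p <= low:
        return 0
    if p >= high:
        return max_level
    # Integer arithmetic keeps the result an exact 8-bit level.
    return (p - low) * max_level // (high - low)

row = [32, 64, 128, 192, 224]
stretched = [contrast_stretch(p) for p in row]
print(stretched)  # [0, 0, 127, 255, 255]
```

Because each output depends only on the corresponding input pixel, the mapping is a point operation and needs no neighbouring pixel data.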
Chapter 6 investigates the architecture of image processing integration for local operation. 
The definitions of local operation are discussed along with advantages and disadvantages of 
this technique. Local operational image processing algorithms are divided into 3x3 and 
larger spatial mask implementations according to size of the local mask. Local operation in 
pixel, column, chip and frame memory processing are compared for implementing smart 
sensors, leading to the optimal system-level architectures according to the size of the local 
mask. A CIS chip with on-chip binary image processing was designed and fabricated as a 
demonstration of 3x3 local operation at the column level. The detailed designs of the local 
operation chip and its test results are also included. 
In Chapter 7, the architecture of image processing integration for global operations is 
investigated in terms of operational domain, namely frequency and spatial domains. A 
definition of global operation is discussed with examples of such operations. In this chapter, 
global operation at pixel, chip and frame memory processing levels are compared, listing their 
merits and drawbacks and, thereby, possible implementations are proposed according to the 
operational domain. A CIS chip with an object positioning system was designed and 
fabricated as a demonstration of global operation. The detailed designs of this chip and its 
test results are included, along with the discussion of its optimization. 
Chapter 8 summarizes the work of the research and presents the conclusions derived from 
this research along with directions for further work. 
Appendix A contains design and test results of a chip with inverted logarithmic pixel sensors. 
An inverted logarithmic pixel sensor is a modified pixel structure that has advantages of low 
pattern noise and continuous current readout over conventional logarithmic sensors. This 
appendix also discusses the potential advantages and disadvantages of current mode operations, 
and their applications. The detailed concept and design of the pixel sensor are discussed and 
sample images of the sensor array are demonstrated with their advantages and disadvantages. 
Appendix B discusses the basic procedure for image acquisition in the image sensor test. It 
describes how to test the image acquisition of the image sensor chip for the first time. 
Appendix C explains image sensor characteristics in the image sensor test. It discusses basic 
measurement methods, calculations of optical characteristics and makes comparisons with 
commercial sensors. 
Chapter II 
2. Basic Operation and Structure of CMOS 
Image Sensors 
2.1. Solid-State Image Sensors 
Solid-state image sensors are integrated circuits (usually silicon-based) that contain a number 
of photosensitive sensors, typically in a 2-dimensional or 1-dimensional array, for the purpose 
of converting an optical image projected onto the device to an electrical output (usually a 
voltage or current). Compared to conventional camera film, solid-state image sensors are 
computer friendly, whereas film needs a scanner in order to input images to computers. In 
addition, solid-state image sensors save time because they do not require the developing time 
that film inevitably requires, which makes real time operation possible. 
As seen in Figure 2.1, there are many kinds of solid-state image sensors (not only silicon-based 
devices) with very different characteristics over a wide spectral range. Devices 
may have sensitivity to wavelengths from the γ-ray spectrum to the radio frequency spectrum. 
Yet most commercial interest in electronic image sensors resides in the visible spectral 
range, simply because most applications are for the visible spectrum. This thesis focuses on 
visible imaging. 
Figure 2.1. Solid-state image sensors over a wide spectral range [86]. 
2.2. History of Image Sensors at Visible Spectrum 
For the visible spectral range, charge-coupled devices (CCDs) and complementary metal oxide 
semiconductor (CMOS) active pixel sensors (APSs) are currently the dominant technologies for 
image sensors. A brief history of the solid-state image sensors for CCDs and CISs (Figure 
2.2) is well described by Fossum [1] and can be summarized as follows. 
At the beginning stage of solid-state image sensor development, there was a form of MOS 
image sensor before CMOS APS and before CCD. In the 1960s there were numerous 
groups working on solid-state image sensors with varying degrees of success using NMOS, 
PMOS, and bipolar processes. In 1963, Morrison reported a structure of computational 
sensor that allowed determination of a light spot's position using the photoconductivity effect 
[2]. In 1964, IBM reported the scanistor, which used an array of n-p-n junctions addressed 
through a resistive network to produce an output pulse proportional to the local incident light 
intensity [3]. In 1966, Westinghouse reported a 50x50 element monolithic array of 
phototransistors [4]. Since none of these sensors performed any intentional integration of the 
optical signal, their sensitivity was low and they often required some form of signal 
amplification. In 1967, Weckler from Fairchild suggested operating p-n junctions in a photon 
flux-integrating mode [5]. A 100x100 element array of photodiodes was reported in 1968 [6]. 
Weckler later called the device a reticon and formed Reticon to commercialize the sensor. In 
1968, Noble reported the first MOS active pixel sensor [7]. Noble discussed a charge 
integration amplifier for readout, similar to that used later by others. Here, the first use of a 
MOS source-follower transistor in the pixel for readout buffering was reported. 
Figure 2.2. History of MOS, CCD and CMOS image sensors. 
In 1970, when the CCD was first reported [8], its relatively low Fixed Pattern Noise (FPN: 
pattern noise in a dark room) was one of the major reasons for its adoption over the many other 
forms of solid-state image sensors. The smaller pixel size afforded by the simplicity of the 
CCD pixel also contributed to its embrace by industry, which continued until MOS image 
sensors were resurrected in the late 1980s. While a large effort was made for the 
development of the CCD in the 1970s and 1980s, MOS image sensors were only periodically 
investigated and compared unfavorably to CCDs with respect to the above performance 
criteria [9]. 
In the late 1970s and early 1980s Hitachi and Matsushita continued the development of MOS 
image sensors [10], [11] for camcorder-type applications, which focused on high-speed operation 
with relatively low resolution. In 1982, NHK successfully integrated timing 
control with passive pixel sensors. Temporal noise in MOS sensors started to lag behind the 
noise achieved in CCDs. By 1985, Hitachi combined the MOS sensor with a CCD horizontal 
shift register [12]. However, perhaps due to residual temporal noise, especially important in 
low light conditions, Hitachi later abandoned its MOS approach to sensors. 
In the early 1990s, the University of Edinburgh (later forming VLSI Vision Ltd., VVL) created 
highly functional single-chip imaging systems where low cost was the main factor. In 1990, 
VVL reported an integrated Passive Pixel Sensor (PPS) array [13]. However, due to large 
capacitive column bus loads, the use of PPS was limited to small to medium array sizes and 
slow to medium readout speed. By comparison with CCDs, noise and mismatch effects 
limited the quality. However, low power operation and integration demonstrated viability of 
single chip cameras and integrated sensor-processors. Although, in 1968, Noble 
demonstrated the first MOS buffer amplifier in a pixel, relatively little active pixel sensor 
(APS) research was carried out for another 10 years, and it took 20 years for major interest to 
be renewed when the NASA JPL group began research on low noise APS in 1992 [14]. CMOS 
based image sensors offer the potential to integrate a significant amount of VLSI electronics 
on-chip and reduce component and packaging cost. Around 1995, after a successful 
demonstration of low noise CMOS APS, CMOS image sensors took off due to their easy 
integration with VLSI circuits, low power consumption, low fabrication cost, and radiation 
hardness. Recently, commercial products using CMOS image sensors have become available 
and increasingly popular, including PC cameras, cellular phone cameras, PDAs, toys, etc. 
2.3. CCD and CIS for Smart Sensors 
Through the 1970s and 1980s, CCD technology was strong and it still survives in digital 
camera and camcorder markets, simply because it outperforms any other solid-state image 
sensors in the visible spectrum. The good image quality of CCD is based mainly on low 
noise and low dark current. CCD has a low noise level, typically less than 50 noise electrons. 
FPN (Fixed Pattern Noise) of the CCD is less than 1% Vpp of its saturation level with a good 
PRNU (Photo-Response Non-Uniformity) of 1 - 10% Vpp. In addition, very low dark 
current, typically less than 10 is achieved by this technology. The CCD process 
itself is optimized for optical detection and therefore, the optical absorption and quantum 
efficiency outperform those of CIS. Since CCDs can share the same area for optical detection and 
charge transfer, they do not require any special transistors to transfer photon-generated 
charges, resulting in a high fill factor. However, due to the detection and transfer mechanism, 
CCD is limited to serial scanning with complicated driving and interfacing. CCD is also a 
specialized technology that is relatively expensive and therefore, many companies cannot 
afford their own fabrication laboratory. Besides, because CCD is not easily compatible with 
logic, on-chip ADC and other on-chip processing circuitry seldom exist on a focal plane 
with image sensors, but rather exist in separate chips. 
For the main theme of this thesis (integration of smart sensors), integration feasibility of 
technologies is of interest. Here, focusing on smart sensors, CCD and CIS are compared. 
With respect to smart sensor implementations, the comparisons of CCD and CIS are very 
well described and summarized in "Vision Chips" [15] by Alireza Moini. The following 
comparisons are adapted from this reference, emphasizing CMOS compatibility for smart 
sensors. Although CCD has good image quality, CCD is rarely used for smart sensors, 
mainly due to VLSI incompatibility with logic and memory. Other major drawbacks of CCD 
with respect to CMOS are as follows: 
Input Control Clock: A large number of clocks are required in order to trigger all 
pixels in the imager array. At least two clock phases (or more) are required to read out all 
the pixels. 
VLSI integration: CCD is optimized for charge transfer (deep diffusions, thick gate 
oxide, etc.) and it is therefore difficult to develop logic and memory with the 
technology. CCD is hard to integrate with CMOS. Table 1 shows major differences in 
their processes. From the table, it is quite obvious why these two technologies are 
rarely integrated together. Even if they were to be integrated, the integration cost 
would be very high. 

Parameters | CCD | CMOS 
Gate Oxide Thickness 
Well depth | P-well depth > 2.5 µm | Well depth ~ 0.5 µm 
Channel Stop Depth 
Channel Depth 
Source/Drain Implants 
50 Å 
≥ 10 V | 3.3 V 
≤ 1 µm 
~ 0.8 µm 
~ 0.1 µm 
Several poly-Si and inter-poly dielectrics needed | Digital process has 1 poly, analog has 2 polys 

Fabrication cost: Since CCD technology requires a specialized process, its 
fabrication cost is very high compared to very standardized CMOS technology. 
Power consumption: CCD typically requires a high voltage supply to clock the large 
capacitive gates of the CCD array. Therefore, CCD consumes a large amount of power. 
There have been attempts to integrate CCD and CMOS logic [21]. However, due to 
incompatibility of the two technologies, these attempts were not generally successful. Even if 
these two technologies are successfully integrated, they never achieve both CCD-like image 
quality and CMOS-like flexible logic. In fact, the optimization for one degrades the 
performance of the other. Besides, the integration of CCD and CMOS often requires over 30 
masks, which is not cost effective. 
In order to effectively implement processing components with the image sensors, designers 
need a technology beyond CCD, in order to increase functionalities of the smart sensor even 
if this means sacrificing image quality of the image sensors. Although CMOS technology has 
been and remains the dominant technology in almost all VLSI design areas, CMOS image 
sensors did not take off in imaging device fields until the mid 1990s. After a demonstration 
of the active pixel sensors of CMOS image sensors, they gained attention from researchers 
and industries because the CMOS technology offers the following advantages [15]. 
Mature technology: CMOS processes have been available for a long period of time. 
CMOS processes are well developed and well established. Many engineers and 
researchers have characterized and optimized the technology. 
Design resources: Many design libraries for circuit and logic are supported by 
various research groups and industries. A large number of circuits and layouts are 
already built in. Designers can save time and effort in simulation and custom layouts. 
Accessibility: There are many fabrication facilities around the world, which are 
willing to fabricate prototype designs at low prices. Engineers and researchers are 
now able to fabricate their designs without having their own fabrication facilities. 
Fabrication cost: Because CMOS process is standardized, the fabrication of CMOS 
designs is very cheap, compared to other process technologies. 
Power consumption: As CMOS technology scales down, the downscaling of the 
power supply follows a similar trend, resulting in lower power consumption. In fact, 
CMOS technology is optimized for low power. 
Compatibility with VLSI circuits: Since CMOS technology is already optimized for 
logic and memory, it is easy to integrate VLSI circuits with CMOS image sensors. 
Radiation hardness: CMOS image sensor technology is more hardened against 
radiation defects than CCD technology. Therefore, the CMOS technology is often 
used for aerospace applications. 
For smart sensors, CMOS becomes a good candidate for the image sensing and integration of 
processing logic. However, there are a number of disadvantages when CMOS technology is 
implemented, particularly for CMOS active pixel sensors. According to [15], the major 
disadvantages for implementing smart sensors are as follows: 
Analog circuit: CMOS technology is typically developed for digital logic and 
memory. It is not well characterized and not optimized for analog circuits. 
However, some leading edge technologies like RF CMOS are drawing attention to 
this analog characterization. 
Photodetectors: Because image sensing is a relatively new field for CMOS 
standard process technology, the photodetector structures are not well characterized. 
Even in recent years, although many companies have optimized their fabrication 
processes or sometimes modified them from the standard ones for CMOS 
image sensors, the characteristics of photodetectors still need to be assured by the 
designers. It is the designers' responsibility to assure that the photodetectors function 
as desired. 
Second order effects: In CMOS process technology, especially for logic and memory, 
some second order device characteristics, such as subthreshold operation, are usually 
ignored or paid less attention. However, sometimes these second order effects play 
critical roles in image sensing designs, affecting conversion gain, pattern noise, etc. 
Therefore, it is sometimes difficult to optimize these image sensing 
behaviours in CMOS technology. 
Vt and Lithographic Mismatches: Mismatch in CMOS devices is relatively high, 
which jeopardizes the image quality in CMOS active pixel sensors. Mismatch in 
CMOS devices often leads to high spatial noise or pattern noise in CMOS 
active pixel sensors, which becomes one of the main challenges in CMOS image sensor 
array design. 
CIS suffers from relatively poor image quality compared to CCD. However, as the CMOS 
technology matures, along with its optical characteristics in specialized processes, its 
attraction and quality expectations rise. For smart sensors, where proper balance 
between image quality and processing circuitry is important, CMOS will be the most suitable 
technology in the future. As the image quality of the CMOS active pixel sensors improves, it 
will be exciting to see what smart sensors (beyond only image sensors) become in the next 
few decades. 
2.4. Fundamentals of CMOS Image Sensors 
2.4.1. Optical Absorption and Photo-Generation 
Photon detection happens through the excitation of a bound electron to an unbound state. The 
energy of a photon can be transferred to an electron in the valence band of a semiconductor. 
Then, if the photon energy is larger than the bandgap energy $E_g$, the electron in the valence 
band is brought to the conduction band. This is how the photon is absorbed in a 
semiconductor material and how an electron-hole pair is generated. Photons with energy 
smaller than $E_g$, however, cannot be absorbed and thus, the semiconductor is transparent for 
light with wavelengths longer than $\lambda_c = h c_0 / E_g$ (where $\lambda_c$ is the cutoff wavelength, $h$ is Planck's 
constant and $c_0$ is the velocity of light in vacuum). For example, for Si, $E_g$ = 1.12 eV and $\lambda_c$ 
is 1.11 µm, whereas for Ge $E_g$ = 0.66 eV and the corresponding $\lambda_c$ = 1.87 µm. 
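The cutoff relation above can be checked numerically. The following is a minimal sketch, assuming rounded CODATA constants; the function names `cutoff_wavelength_um` and `is_absorbed` are hypothetical and introduced only for illustration.

```python
# Physical constants (rounded CODATA values).
PLANCK_EV_S = 4.135667696e-15   # Planck's constant h in eV*s
LIGHT_SPEED_M_S = 2.99792458e8  # velocity of light in vacuum c0 in m/s


def cutoff_wavelength_um(bandgap_ev: float) -> float:
    """Return the cutoff wavelength lambda_c = h*c0/E_g in micrometres."""
    if bandgap_ev <= 0:
        raise ValueError("bandgap energy must be positive")
    return PLANCK_EV_S * LIGHT_SPEED_M_S / bandgap_ev * 1e6


def is_absorbed(wavelength_um: float, bandgap_ev: float) -> bool:
    """A photon can be absorbed only if its wavelength does not exceed lambda_c."""
    return wavelength_um <= cutoff_wavelength_um(bandgap_ev)


if __name__ == "__main__":
    # Bandgap values taken from the text (Si and Ge).
    for name, e_g in (("Si", 1.12), ("Ge", 0.66)):
        print(f"{name}: lambda_c = {cutoff_wavelength_um(e_g):.2f} um")
```

With these constants the Ge result rounds to a value slightly above the 1.87 µm quoted in the text; the small difference comes from rounding and does not change the argument.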
Figure 2.3. Absorption coefficient (cm^-1) and penetration depth (µm) of silicon 
at different wavelengths (µm) of incident light. 
The optical absorption coefficient $\alpha$ plays an important role in photodetectors. The 
absorption coefficient, $\alpha$, indicates what fraction of light a given material absorbs at a given 
wavelength. Therefore, the absorption of photons in a photodetector, to produce electron-hole 
pairs and thus a photocurrent, depends on the absorption coefficient $\alpha$ for the given 
wavelength of the light in the semiconductor. The absorption coefficient also determines the 
penetration depth ($1/\alpha$) of the light in the semiconductor material according to the Lambert-Beer 
law: 
$$I(y) = I_0 \exp(-\alpha y)$$ 
Here, $I_0$ is the light intensity at the surface and $y$ is the depth under the surface. The 
penetration depth of the light is the depth at which the light intensity has fallen to $1/e$ (about 37%) of 
the surface light intensity $I_0$, that is, where about 63% of the light has been absorbed. Its 
relation with the absorption coefficient is shown in 
Figure 2.3. Absorption coefficients strongly depend on the wavelength of the light. The slope 
of the onset of absorption depends on the type of band-band transition. This slope 
is large for direct band-band transitions, as found in GaAs, InP, Ge and In0.53Ga0.47As, because 
these materials have a higher probability for electrons to transfer from the valence band to the 
conduction band with less energy, compared to indirect transition materials [85]. For Si, Ge 
and the wide bandgap material 6H-SiC with indirect band-band transitions, the slope of the onset 
of absorption is relatively small. However, silicon detectors are appropriate for the visible 
and near infrared spectral range. The absorption coefficient of Si is one to two orders of 
magnitude lower than that of the direct semiconductors in the visible spectral range. 
Therefore, a much thicker absorption zone is needed than for the direct semiconductors. This 
is a reason why amorphous silicon can have much thinner films for sensing than silicon 
materials. However, silicon is economically the most important semiconductor and thus 
silicon-based imaging devices and integrated circuits are popular in spite of the non-optimum 
optical absorption. 
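The link between absorption coefficient, penetration depth and absorbed fraction can be illustrated with a short sketch of the Lambert-Beer law. The absorption coefficients below are assumed, order-of-magnitude values chosen only for illustration (they are not read from Figure 2.3), and the function names are hypothetical.

```python
import math


def intensity_at_depth(i0: float, alpha_per_cm: float, depth_um: float) -> float:
    """Lambert-Beer law: I(y) = I0 * exp(-alpha * y), with y given in micrometres."""
    depth_cm = depth_um * 1e-4
    return i0 * math.exp(-alpha_per_cm * depth_cm)


def penetration_depth_um(alpha_per_cm: float) -> float:
    """Depth 1/alpha at which the intensity has dropped to I0/e, in micrometres."""
    return 1.0 / alpha_per_cm * 1e4


def absorbed_fraction(alpha_per_cm: float, thickness_um: float) -> float:
    """Fraction of the surface intensity absorbed within a layer of given thickness."""
    return 1.0 - intensity_at_depth(1.0, alpha_per_cm, thickness_um)


if __name__ == "__main__":
    # ASSUMED absorption coefficients (cm^-1), for illustration only.
    assumed_alpha = {"short wavelength": 5.0e4, "long wavelength": 1.0e3}
    for label, alpha in assumed_alpha.items():
        depth = penetration_depth_um(alpha)
        frac = absorbed_fraction(alpha, 1.0)
        print(f"{label}: 1/alpha = {depth:.2f} um, absorbed in first 1 um = {frac:.1%}")
```

A smaller absorption coefficient gives a larger penetration depth, which is why a material with weak absorption needs a thicker absorption zone to collect the same fraction of light.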
2.4.2. Photon Collection (Quantum Efficiency) 
We have seen how photons penetrate through materials and how these materials absorb 
the incoming photons according to their bandgap energy. Here we will look at how the carriers 
generated by these absorbed photons are collected and transferred in silicon-based materials. 
All carriers that are photo-generated (generated by absorbed photons) in drift regions (also 
called depletion regions or space-charge regions) contribute to the photocurrent. In other 
words, all electron-hole pairs generated in depletion regions are collected by the internal 
electric field (recombination can be neglected due to the fast drift speed). All the carriers 
photogenerated outside of the depletion region are collected by diffusion rather than the drift 
mechanism. In the highly doped region (1) of Figure 2.4 and Figure 2.5, the carrier lifetime is 
reduced significantly due to the high doping density, resulting in a high recombination rate. 
This considerably reduces the ratio of collected electrons to incident photons, also known as 
quantum efficiency (QE), for short wavelengths, because a large portion of the short 
wavelength light is absorbed in region (1). 
Light with long wavelengths penetrates deep into the silicon, and the carriers it generates 
diffuse in all directions, not 
only towards the depletion region; overall QE is reduced due to this lower collection 
efficiency. Since minority carrier diffusion in conventional semiconductor materials is much 
slower than carrier drift, collection of photogenerated carriers in region (2) is much 
slower than that in the depletion region. Therefore, the recombination of photogenerated 
carriers in N+ (region 1) and P (region 2), due to the relatively slow diffusion speed, reduces 
the quantum efficiency. In the high dynamic case, carriers being photogenerated in region (1) 





Figure 2.4. Photo-generation and collection of photon-generated electron-hole pairs 






Figure 2.5. Drift and minority diffusion in collection of photogenerated charge. 
region before the light intensity is reduced again. The dynamical quantum efficiency, 
therefore, depends on the frequency or data rate. The higher both of these are, the smaller the 
dynamical quantum efficiency becomes [85]. 
In addition, the recombination of photogenerated carriers in region (2) can still reduce the 
quantum efficiency. The recombination of photogenerated carriers in region (l), however, is 
not so important for long wavelengths due to the large penetration depth and the relatively 
small portion of carriers photogenerated in region (1). 
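Since quantum efficiency is defined above as the ratio of collected electrons to incident photons, it can be estimated from a measured photocurrent and the incident optical power. The sketch below assumes monochromatic illumination and rounded CODATA constants; the function names and the example operating point are hypothetical.

```python
PLANCK_J_S = 6.62607015e-34         # Planck's constant in J*s
LIGHT_SPEED_M_S = 2.99792458e8      # velocity of light in vacuum in m/s
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge in C


def photon_rate_per_s(optical_power_w: float, wavelength_um: float) -> float:
    """Number of incident photons per second for monochromatic light."""
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_um * 1e-6)
    return optical_power_w / photon_energy_j


def quantum_efficiency(photocurrent_a: float, optical_power_w: float,
                       wavelength_um: float) -> float:
    """QE = collected electrons per second / incident photons per second."""
    if optical_power_w <= 0:
        raise ValueError("optical power must be positive")
    electrons_per_s = photocurrent_a / ELECTRON_CHARGE_C
    return electrons_per_s / photon_rate_per_s(optical_power_w, wavelength_um)


if __name__ == "__main__":
    # ASSUMED operating point: 1 nW of green light and 0.2 nA of photocurrent.
    qe = quantum_efficiency(0.2e-9, 1e-9, 0.55)
    print(f"QE = {qe:.2f}")
```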
2.4.3. CMOS Photodetectors 
Based on the fundamental mechanisms in the absorption and collection of photogenerated 
electron-hole pairs, we continue our investigation on different forms of CMOS 
photodetectors. The detailed descriptions and comparisons of major CMOS photodetectors 
are well arranged in Fossum's paper [1]. The following comparisons are adapted from 
Fossum's paper, with the addition of another significant photodetector structure, the pinned 
photodiode. Figure 2.6 shows the main photodetector types of CMOS image sensors. These can 
be divided mainly into two types: passive pixel sensors (PPS) and active pixel sensors (APS). 
The PPS consists of a photodiode and a select transistor. A charge integration amplifier 
(CIA) readout circuit is located at the bottom of the column bus to keep the voltage on the 
column bus constant. For a given pixel size, it has the highest design fill factor because it 
has only one transistor for the readout. QE (quantum efficiency) can be quite high due to the 
large fill factor and the absence of an overlying layer of polysilicon as found in CCDs. The 
major problems of the passive pixel structure are its readout speed and noise level due to the 
large capacitive load. Since the large bus is directly connected to each pixel while it is read 
out, the RC time constant is very high and therefore, the readout speed is slow. In addition, 
due to the large capacitive load, a passive pixel's readout noise is typically high, of the 
order of 250 electrons rms, compared to commercial CCDs with less than 10 electrons rms of 
read noise. Therefore, the passive pixel does not scale well to larger array sizes or faster pixel 
readout rates. 
When the passive pixel sensor was introduced by Weckler in 1967 [5], the problems of the 
passive pixel were quickly realized and a sensor with an active amplifier within each pixel 
improved performance compared to passive pixels by using the voltage buffer (source 
follower) within a pixel. Typically, the pixels have a fill factor of 20-30% [1]. Due to the 
loss in fill factor, the photon-generated signal is reduced. However, the reduced capacitance 
in each pixel leads to a lower read noise level of the array, and therefore the dynamic range and 
SNR increase. The main types of active pixel sensors can be subdivided further into 
photodiode, photogate and pinned photodiode (see Figure 2.6). 

CMOS Passive Pixel Sensor (PPS) 
Maximized fill factor 
Smaller pixel size as technology scales 
1 transistor, 2 lines 
High yield due to its simplicity 
High QE due to few overlaying devices 
Slow readout and high noise due to high bus capacitance 

Photodiode CMOS APS 
Pixel consists of a floating reverse biased p-n junction 
3 transistors, 4 lines per pixel 
Sense node and integration node are the same 
Noise and full-well trade against each other 
Moderately high Quantum Efficiency (QE) 

Photogate CMOS APS 
Pixel consists of a MOS capacitor coupled to a floating reverse biased p-n junction 
5 transistors, 6 lines per pixel 
Sense node and integration node are separate 
Low noise, small full-well 
Low QE 
Difficult to implement in advanced sub-micron process 

Pinned Photodiode CMOS APS 
Pixel consists of pinned diode (p+-n-p) 
4 transistors, 5 lines per pixel 
Sense node and integration node are separate 
Low noise, very small full-well 
QE lower than that of PD 
Difficult to implement in advanced sub-micron process 

Figure 2.6. CMOS photodetectors. 
Photodiode APS: The pixel array has on-chip timing, control, correlated double sampling and 
fixed pattern noise (FPN) suppression circuitry. It has three transistors in each pixel with a 
typical pixel pitch of 15x the minimum feature size of the technology [1]. The photodiode APS has 
higher QE than the photogate pixels (Figure 2.6) because there is no overlying polysilicon, 
which is required for the photogate. The output photodiode signal is supposedly independent of 
detector size because a decrease in detector size is compensated by an increase in conversion 
gain with less pixel capacitance. However, peripheral capacitances from the perimeters of the 
detector increase the total capacitance of the sensing node and thus decrease the conversion 
gain. Despite the reduction of the capacitance in the pixels, read noise is limited by the 
reset noise on the photodiode since correlated double sampling is not truly correlated without 
frame memory. As the pixel size scales down, photosensitivity decreases and the reset noise 
scales as $C^{1/2}$, where $C$ is the photodiode capacitance. Therefore, a tradeoff can be made in 
designing pixel fill factor (photodiode area), dynamic range, Signal-to-Noise Ratio (SNR) 
and conversion gain (µV/e-). 
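The tradeoff between conversion gain and reset noise can be made concrete with a small sketch. It assumes the standard expressions for a capacitive sense node, namely a conversion gain of q/C and a kTC reset noise charge of sqrt(kTC); these textbook formulas and the capacitance values are assumptions for illustration, not measurements from this work.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23     # Boltzmann constant in J/K
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge in C


def conversion_gain_uv_per_e(capacitance_f: float) -> float:
    """Conversion gain q/C of the sense node, in microvolts per electron."""
    return ELECTRON_CHARGE_C / capacitance_f * 1e6


def reset_noise_electrons(capacitance_f: float, temperature_k: float = 300.0) -> float:
    """kTC reset noise sqrt(k*T*C)/q, expressed in electrons rms."""
    noise_charge_c = math.sqrt(BOLTZMANN_J_PER_K * temperature_k * capacitance_f)
    return noise_charge_c / ELECTRON_CHARGE_C


if __name__ == "__main__":
    # ASSUMED sense-node capacitances, chosen only to show the trend.
    for c_ff in (5.0, 10.0, 20.0, 40.0):
        c = c_ff * 1e-15
        print(f"C = {c_ff:5.1f} fF: gain = {conversion_gain_uv_per_e(c):6.2f} uV/e-, "
              f"reset noise = {reset_noise_electrons(c):5.1f} e- rms")
```

Quadrupling the capacitance halves neither quantity equally: the conversion gain drops by a factor of four while the reset noise in electrons only doubles, which is the square-root scaling referred to in the text.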
Photogate APS: The basic idea of the photogate pixel comes from the CCD. While photon-generated 
charge is integrated under a photogate with a high potential well, the output floating 
node is reset and the corresponding voltage is read out to one of the sample-and-hold (S/H) 
circuits in the CDS. When the 
integration is done, the charge is transferred to the output floating node by pulsing the signal on 
the photogate. Then the corresponding voltage from the integrated charge is read by the 
source follower to the second S/H of the CDS. The CDS outputs the difference between the 
reset voltage level and the photo-voltage level. The correlated double sampling can suppress 
reset noise, 1/f noise, and FPN due to $V_t$ and lithographic variations in the array. Therefore, 
the main noise of the photogate is photon shot noise, which cannot be suppressed by any means. 
The photogate has a pixel pitch typically equal to 20x the minimum feature size of the technology 
due to the five transistors in each pixel. The floating diffusion is typically made with 
a small capacitance of the order of 10 fF, yielding a conversion gain of 10-20 µV/e- and 2 e- 
reset noise. However, due to the overlying polysilicon, there is a reduction in quantum 
efficiency, particularly in the blue. Nevertheless, the reduction of the noise level increases the total 
dynamic range and SNR. 
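The way correlated double sampling removes terms common to both samples can be sketched with a toy model. This is an idealized illustration, not the circuit of this work: the pixel offset and reset noise amplitudes are assumed values, and the model ignores shot noise and any noise added between the two samples.

```python
import random


def sample_pixel(offset_v: float, reset_noise_v: float, signal_v: float):
    """Return (reset level, signal level) read from the same floating node.

    Both samples share the pixel offset and the reset noise sample, because the
    node is reset once and the charge is transferred onto it afterwards.
    """
    reset_level = offset_v + reset_noise_v
    signal_level = reset_level - signal_v
    return reset_level, signal_level


def cds(reset_level: float, signal_level: float) -> float:
    """Correlated double sampling: difference of the two stored levels."""
    return reset_level - signal_level


def simulate_row(signals_v, seed: int = 0):
    """Apply CDS to a row of pixels with random offsets and reset noise."""
    rng = random.Random(seed)
    outputs = []
    for signal in signals_v:
        offset = rng.gauss(0.0, 0.02)        # ASSUMED 20 mV rms pixel offset (FPN)
        reset_noise = rng.gauss(0.0, 0.001)  # ASSUMED 1 mV rms reset noise
        reset_level, signal_level = sample_pixel(offset, reset_noise, signal)
        outputs.append(cds(reset_level, signal_level))
    return outputs


if __name__ == "__main__":
    true_signals = [0.10, 0.25, 0.40]
    for true, out in zip(true_signals, simulate_row(true_signals)):
        print(f"signal = {true:.3f} V, CDS output = {out:.3f} V")
```

Because the offset and the reset noise appear in both stored levels, they cancel in the difference. If the two samples came from different reset events, the reset noise would no longer cancel.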
Pinned photodiode APS: The pixel consists of a pinned diode (p+-n-p), where the photon 
collection area is moved away from the surface in order to reduce surface defect noise such 
as dark current. Photon-generated charge is integrated under the pinned diode and transferred 
to the output floating diffusion for the readout. Similar to the photogate, the sense node and 
integration node are separated so as to optimize the noise. However, the main difference from 
the photogate is that the potential well for the charge collection is generated by a buried 
intrinsic layer (or n type layer) instead of the pulsed gate voltage in the photogate. Each pixel has 
four transistors and five control lines, resulting in a fill factor that is higher than the photogate's 
but lower than the photodiode's. In addition, due to the small photon collection area of the pinned diode, 
it has a very small full well for photon-generated charge collection with lower QE, compared 
to the photodiode. 
2.4.4. Active Buffer in Pixel 
A definite difference between active pixel sensors and passive pixel sensors is the inclusion 
of an active buffer in the pixel. The passive pixel sensors suffer from low data rate and high 
readout noise due to the large capacitive loads that are directly connected to the photodetection 
area. In each active pixel, an active buffer is placed, connecting it to the column bus line, as 
seen in Figure 2.7. By adding the buffer, the charge integration area of the pixel is isolated 
from the column bus, and instead connected to the gate of the active buffer, whose 
capacitance is much smaller than that of the bus line. The smaller capacitance of the 
integration and conversion node of the pixel allows a faster data rate and a lower readout 
noise. Types of active buffers are source follower, unity gain amplifier and others, as shown 
in Figure 2.8. 
Figure 2.8. Active buffers in CMOS APS with photodiode: (a) NMOS source 
follower (b) Unity gain amplifier. 
Source Follower: The source follower, typically an NMOS source follower, is a common choice 
for APS arrays because of its simplicity and small number of transistors. Source followers, 
however, suffer from lithographical mismatches and $V_t$ deviations, resulting in significant 
pattern noise in the image sensor array. 
Unity Gain Amplifier (UGA): It has feedback between input and output, remaining at a 
steady gain of 1 despite the lithographical and $V_t$ mismatches. However, due to the 
complexity of the circuit and the relatively large number of transistors for the OPAMP, the UGA 
cannot find a practical fit in a pixel. Instead, the UGA is located per column, where the 
implementation area is flexible in the vertical direction. Photon Vision Systems Inc. 
produced a clever way to implement a UGA per column with CMOS image sensors, the so-called 
Active Column Sensor (ACS), claiming reduced FPN of less than 0.1% [16]. 
Others: There are many different kinds of active buffers implemented with CMOS image 
sensors, such as adaptive pixel sensors, pixels with feedback for low FPN and pixels with 
current amplifiers. These pixels are for special uses with various applications, different from 
those of standard voltage buffers. In addition, the complexity of the circuit and the number of 
transistors are often so large that they cannot be easily implemented in pixels for practical 
applications. 
2.4.5. Operation of Active Pixel Sensor with Photodiode 
We have come to understand the basic structures of active pixel sensors and their operation. Here, 
a more detailed mathematical analysis of these operations, particularly for the photodiode, is 
illustrated. There are three stages of the operation in photodiode with integration mode: (1) 
photocurrent generation, (2) photocurrent integration and conversion and (3) photo-voltage 
readout [87]. The mathematical analysis is based on these stages of the operation. 
First, photocurrent generation in a vertical n-p photodiode consists of drift current and 
d i h i o n  cment. This is written in Equation 2.4.2. 
Jtot = Jdrirt + Jdin Equation 2.4.2 
Under the assumptions that the n-layer (Figure 2.9) is thin enough to cause negligible
absorption, that thermal generation (dark current) can be ignored and that all the incoming
light is absorbed ($\eta = 1$, 100% quantum efficiency), the optical generation rate can be written
as
$G(x) = I_0 \alpha \exp(-\alpha x)$    Equation 2.4.3
where $I_0$ is the light intensity at the surface and $\alpha$ is the absorption coefficient. The drift current
is therefore,
where $W$ is the width of the depletion layer and $x$ is the depth from the surface. For $x > W$ in
the p-type, a diffusion equation can be written as
Figure 2.9. Cross-sectional view of the photodiode.
where $D_n$ is the diffusion coefficient for electrons, $\tau_n$ is the minority carrier lifetime, and $p_{n0}$ is
the equilibrium minority carrier concentration. With the boundary conditions for the above
equation of
$p_n = p_{n0}$ @ $x = \infty$
P n = P n o @ ~ = o  




Therefore, the current density of diffusion is given by Equation 2.4.8,
and so the total current density of the photocurrent is
Therefore, the total current density of the photocurrent is linearly proportional to the incident light
intensity, as shown in Equation 2.4.9.
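Because the intermediate expressions are not reproduced in this text, the following Python sketch assumes the standard textbook forms of the drift and diffusion terms for an abrupt junction with a thin top layer, and it drops the intensity-independent term. The function name and all parameter values are illustrative assumptions, not results of this work. It only illustrates that, under those assumptions, the total photocurrent density scales linearly with the incident flux.

```python
import math

Q = 1.602e-19  # electron charge (C)


def total_photocurrent_density(i0, alpha, w, l_n):
    """Magnitude of J_tot = J_drift + J_diff in A/cm^2 (assumed textbook forms).

    i0    : photon flux at the surface (photons / cm^2 / s)
    alpha : absorption coefficient (1/cm)
    w     : depletion layer width (cm)
    l_n   : minority carrier diffusion length, sqrt(D_n * tau_n) (cm)
    """
    # Carriers generated inside the depletion layer (0 <= x <= W) drift across it.
    j_drift = Q * i0 * (1.0 - math.exp(-alpha * w))
    # Carriers generated deeper than W must diffuse back to the junction.
    j_diff = Q * i0 * (alpha * l_n / (1.0 + alpha * l_n)) * math.exp(-alpha * w)
    return j_drift + j_diff


# Hypothetical values: doubling the flux doubles the current density.
j1 = total_photocurrent_density(i0=1e12, alpha=1e4, w=1e-4, l_n=1e-2)
j2 = total_photocurrent_density(i0=2e12, alpha=1e4, w=1e-4, l_n=1e-2)
print(j2 / j1)
```

The two terms sum to q·I0·[1 − exp(−αW)/(1 + αL_n)], which makes the proportionality to I0 explicit.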
The second stage of the APS photodiode is the charge integration mode. After the photodiode
is reset, the capacitor (Figure 2.10) is discharged by the photocurrent. Therefore, the output
voltage of the photodiode is a function of time after the photodiode has been reset. Since the
photodiode is isolated, the current in the capacitor must be equal and opposite to the
photocurrent (ignoring leakage currents). Hence, the photocurrent can be expressed as
For an n+p photodiode, the capacitance is
where $A$ is the diode area, $\varepsilon_{si}$ is the dielectric constant of silicon and $N_a$ is the acceptor
concentration in the substrate.






Figure 2.10. Schematic view of photodiode pixel sensor with associated capacitance.
where Vo is the diode built in voltage, and V,, is the reset reverse bias. 
Equation 2.4.13 
Interestingly, this expression includes a term of $A$, the photodiode area. However, this is
cancelled out because $i_{photo} \propto I_0 A$, where $I_0$ is the incident flux of photons. Therefore, the
collected voltage is independent of the diode area for a given photon flux. In reality, due to
the peripheral capacitance of the photodiode and other sources of capacitance not
proportional to area, the diode area does have some degree of impact on the total capacitance
and thus the output voltage. If we calculate V(t) as a function of time with practical
parameters, the voltage drop is almost linear for short times, which is the linearity we want.
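The area cancellation and the near-linear discharge can be checked numerically. The sketch below assumes the usual one-sided abrupt-junction capacitance, C(V) = A·sqrt(q·ε_si·N_a / (2(V0 + V))), and integrates i_photo = −C(V)·dV/dt in closed form. The function names, the clamp at zero bias and every parameter value are assumptions chosen for illustration. Peripheral capacitance is ignored, as in the idealized argument above.

```python
import math

Q = 1.602e-19               # electron charge (C)
EPS_SI = 11.7 * 8.854e-14   # permittivity of silicon (F/cm)


def junction_capacitance(v, area, n_a, v0):
    """Assumed one-sided abrupt junction capacitance (F) at reverse bias v (V)."""
    return area * math.sqrt(Q * EPS_SI * n_a / (2.0 * (v0 + v)))


def photodiode_voltage(t, j_photo, v_reset, n_a, v0, area=1.0e-6):
    """Diode voltage (V) at time t (s) after reset to v_reset.

    j_photo is the photocurrent density (A/cm^2), so i_photo = j_photo * area.
    Solving i_photo = -C(V) dV/dt gives
        sqrt(v0 + V) = sqrt(v0 + v_reset) - i_photo * t / (2 * area * k).
    """
    k = math.sqrt(Q * EPS_SI * n_a / 2.0)   # C(V) = area * k / sqrt(v0 + V)
    root = math.sqrt(v0 + v_reset) - (j_photo * area) * t / (2.0 * area * k)
    root = max(root, math.sqrt(v0))         # simplification: stop at V = 0
    return root * root - v0


# Hypothetical operating point: the area argument does not change the result.
for area in (1.0e-6, 4.0e-6):
    print(area, photodiode_voltage(0.03, 1e-8, 2.0, 1e16, 0.7, area=area))
```

For short times the quadratic term in t is small, so V(t) falls almost linearly with slope −i_photo / C(V_reset).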
The last stage of the APS photodiode is the integrated voltage readout through the active
buffer. Provided that $V_{out} > V_{bias} - V_{TL}$, transistor L (Figure 2.11) is in saturation and can be idealized
by a current source, $I$. Then the source follower in the active buffer can be restructured. For
transistor M,
$I = K [V_{GS} - V_{TM}]^2 = K [V_{diode} - V_{out} - V_{TM}]^2$    Equation 2.4.14
where $K = \frac{1}{2} \mu C_{ox} (W/L)$.




Figure 2.11. Configuration of a source follower as a gate buffer and current source.
Rearranging gives
$V_{out} = V_{diode} - V_{TM} - \sqrt{I/K}$    Equation 2.4.15
where $I$ is the current through the current source. The maximum possible $V_{out} = V_{diode} - V_{TM}$
or, including the reset voltage,
$V_{out} < V_{DD} - (V_{TM} + V_{TR})$    Equation 2.4.16
Therefore, the maximum practical output swing is
$V_{bias} - V_{TL} < V_{out} < V_{DD} - (V_{TM} + V_{TR})$    Equation 2.4.17
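As a numerical illustration of Equations 2.4.14 and 2.4.17, the following Python sketch evaluates the source follower output and the swing limits. The function names and all voltages, currents and K values are hypothetical, and the body effect on the threshold voltages is ignored.

```python
import math


def source_follower_vout(v_diode, i_bias, k, v_tm):
    """V_out solved from I = K (V_diode - V_out - V_TM)^2 (Equation 2.4.14)."""
    return v_diode - v_tm - math.sqrt(i_bias / k)


def output_swing(v_dd, v_bias, v_tl, v_tm, v_tr):
    """(lower, upper) limits of V_out from Equation 2.4.17."""
    return v_bias - v_tl, v_dd - (v_tm + v_tr)


# Hypothetical values: 3.3 V supply, 0.6 V thresholds, 1.0 V bias.
low, high = output_swing(v_dd=3.3, v_bias=1.0, v_tl=0.6, v_tm=0.6, v_tr=0.6)
v_out = source_follower_vout(v_diode=2.0, i_bias=10e-6, k=100e-6, v_tm=0.6)
print(low, high, v_out)
```

With these assumed numbers the usable swing is roughly 0.4 V to 2.1 V, and the output sits one threshold plus sqrt(I/K) below the diode voltage.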
In addition to the active pixel structure, another essential component of a CMOS APS array is the
readout control circuit, which controls the image readout sequence of the array. The two main
structures of the readout control circuit are the decoder and the shift register (SR). The decoder
can be used for true random access readout control because the sequence of the outputs can
be selected by the input of the decoder. With RAM, the sequence of the inputs (and therefore the
outputs) can be programmed in advance.
In contrast, shift registers cannot be programmed for random access readout because
shift registers produce only sequential outputs from the first element to the last one. Shift
registers (SRs) are relatively easy to implement, use fewer transistors than decoders, and
are easier to expand. This thesis uses two designs of SR: a flip-flop structure (FF SR) and a
two-inverter structure (INV SR). The flip-flop shift register consists of flip-flops (FFs) in
series, connected from one flip-flop's output to the input of the next one (Figure 2.12). The FF
SR transfers its content to the next stage on each input clock pulse. Since layouts of various FFs
can be found easily in the design libraries of design packages, the design and implementation of
the FF SR is relatively simple. However, because the pre-built FFs have fixed dimensions (~
23 μm in our case), it becomes harder to fit the design into a narrower column width as the
pixel size gets smaller. Therefore, a custom design of an FF is required eventually. Another
SR structure consists of two inverters in each processing element (Figure 2.13), which hold
the input pulse and transfer it through control clocks. The INV SR needs two control clocks
and thus its input control becomes harder than for the FF SR. However, due to its small
number of transistors, the INV SR can fit into a column width easily. With 0.35 μm CMOS
technology, we were able to design an INV SR with a 7 μm pitch.
Figure 2.12. Shift register using flip-flops.
Figure 2.13. Shift register with two inverters in each processing element.
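The difference between the two readout control structures can be sketched behaviourally. The Python model below is a logic-level illustration only, with hypothetical function names; it does not model the flip-flop or two-inverter circuits themselves. A shift register walks a single start pulse through its stages in fixed order, whereas a decoder can assert any select line directly from an address.

```python
def shift_register_scan(n_stages):
    """Yield the select lines of an n-stage shift register, one tuple per clock."""
    state = [0] * n_stages
    data_in = 1                          # single start pulse on the first clock
    for _ in range(n_stages):
        state = [data_in] + state[:-1]   # each stage takes its predecessor's value
        data_in = 0
        yield tuple(state)


def decoder_select(address, n_lines):
    """One-hot select lines of a decoder: any line can be chosen at any time."""
    return tuple(1 if i == address else 0 for i in range(n_lines))


print(list(shift_register_scan(4)))
# [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
print(decoder_select(2, 4))
# (0, 0, 1, 0)
```

The shift register needs only a start pulse and clocks, which is why it is cheaper in transistors, but its output order cannot be changed.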
2.4.6. Sample and Hold (S/H)
At some point, unless the array outputs the data in parallel with the same number of channels
as columns, an imager array transfers its images to a serial output. Typically whole rows are
dumped into storage buffers and then transferred one by one in series to the output. Hence,
the array needs storage for the analog image data until all the data of one row are transmitted
out. The storage is referred to as a sample and hold (S/H). A standard S/H is shown in Figure
2.14. The S/H for CMOS APS typically uses a PMOS source follower (PMOS SF) (as shown in
Figure 2.15) as an active buffer because the PMOS SF can compensate for the Vt drop from
the NMOS SF in CMOS APS.
Although the Vt of NMOS is different from that of PMOS, the PMOS SF does the level shifting,
positioning the output voltage at approximately the same voltage as the photon-sensing node.
Figure 2.14. Typical sample and hold circuit.
Figure 2.15. Sample and hold with PMOS source follower. used in a




Figure 2.16. An advanced sample and hold.
An advanced S/H is illustrated in Figure 2.16, with an anti-feedthrough dummy switch
and unity gain amplifier. Capacitive feedthrough of the clock happens due to the presence of
a capacitive voltage divider between the gate-drain (source) capacitance and the load capacitance when
the original switch is off. By placing a dummy switch after the switch, half the channel
charge is injected toward the dummy switch, matching the charge that would be in the
capacitive voltage divider. However, feedthrough is significant only when the capacitor in the S/H is
relatively small and becomes comparable to the gate-drain oxide parasitic capacitance. In
addition to a dummy switch, a unity gain amplifier can be used. The UGA does not do level
shifting like the PMOS SF. With a constant unity gain, it can reduce column pattern noise
caused by Vt and lithographical mismatches in the column circuits.
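The capacitive divider argument can be made concrete with a short Python sketch. It assumes the simplest model, in which the clock step couples onto the hold node through the gate-drain overlap capacitance only. The function name and the capacitance values are illustrative assumptions.

```python
def clock_feedthrough(delta_v_clock, c_overlap, c_hold):
    """Voltage step on the hold capacitor from a clock edge (capacitive divider)."""
    return delta_v_clock * c_overlap / (c_overlap + c_hold)


# Hypothetical values: 3.3 V clock swing, 1 fF overlap capacitance.
print(clock_feedthrough(3.3, 1e-15, 1e-12))    # 1 pF hold capacitor
print(clock_feedthrough(3.3, 1e-15, 10e-15))   # 10 fF hold capacitor
```

With a 1 pF hold capacitor the step is a few millivolts, while with a 10 fF capacitor it grows to about 0.3 V, which is why the dummy switch matters mainly for small hold capacitors.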
2.4.7. Basic Structure of CIS APS array 
We have seen the basic components of the CMOS APS. In order to construct a complete CMOS
APS array, we need to put them in proper order and in their proper locations. Here, a simple
photodiode APS in integration mode is taken as an example (Figure 2.17). Each pixel
consists of a photodiode, a reset transistor, a row select transistor and a source follower
without bias transistor. The rest of the circuits are located in the column. Since the reset
transistor and the row select transistor use an NMOS switch, an active high shift register is
used for the reset and row readout controller. Since a PMOS switch is used for the buffer in the S/H, an
active low shift register is used for the column readout controller.
First, the sense/integration (or floating diffusion) node is reset to VDD-VT and after an
integration time (up to one frame readout time), the row select is turned on, dumping the
image voltage to the S/H. Since the row select is turned on while the reset for the pixel is
turned off and the reset for another pixel may be turned on, two separate SRs should run
concurrently with different input pulses, but the same clocks. Once the S/H stores concurrent
row images, the column select is turned on one by one until all the columns are read out, as
shown in the simplified timing control of Figure 2.18. The sample switch is typically needed
because while the column images are read out, row select is still on, dumping image voltages
to the S/H continuously.
Therefore, because of the readout time difference between the first column and the last one,
the column images will not be concurrent values, potentially causing artifacts. In order to
prevent this artifact, the sample switch is activated after the row select is on. Once one row is
read out, the next row follows the same procedure and this procedure is repeated until all the
values of the array are read out. In order to use the operational time effectively, the reset
switch is typically turned on for a short period of time right after the row select is turned off.
By doing so, the photodiodes in the row are in integration mode, discharging the floating
diffusion by photocurrent while the rest of the rows in the array are read out. Figure 2.19
shows the core structure of the CMOS APS array. The reset and row select shift registers are
located on both sides of the sensor array. The column select shift register is located at the
bottom of the array, connected to the output buffers in the S/H. The bias transistors are
placed away from the pixels, in the columns, for high fill factor, so this bias transistor bank is
located right below the sensor array and the S/H with output buffer is placed below it. The
timing and control can be on the same focal plane with the sensor array or off the chip.
Figure 2.17. Schematic of CMOS APS, including photodiode, active buffer, S/H
and output buffer.
[Figure 2.18: timing diagram with traces Reset, Row Select m, Sample m, Col. Select 1,
Col. Select 2, Col. Select n, Row Select m+1, Sample m+1.]
Figure 2.19. Overall structure of CMOS APS array.
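The readout order described above can be written out as an event sequence. The Python sketch below is a simplified behavioural model with hypothetical event names; it ignores pulse widths and clock phases and only captures the ordering: select a row, sample it into the S/H, scan the columns, deselect the row, then reset it so that it integrates while the other rows are read.

```python
def readout_sequence(n_rows, n_cols):
    """Yield (event, row, column) tuples for one frame of row-by-row readout."""
    for row in range(n_rows):
        yield ("row_select_on", row, None)    # pixel voltages driven onto the columns
        yield ("sample", row, None)           # S/H freezes the whole row at once
        for col in range(n_cols):
            yield ("col_select", row, col)    # serial readout of the held values
        yield ("row_select_off", row, None)
        yield ("reset", row, None)            # row starts its next integration


for event in readout_sequence(n_rows=2, n_cols=3):
    print(event)
```

Each row therefore costs n_cols column-select steps plus four control events in this model, and the integration time of a row is at most one frame readout time.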
2.5. Future Research Focuses of CMOS Image Sensor 
Using standard CMOS technology, various image sensor arrays have already been
demonstrated by numerous research groups including NASA's Jet Propulsion Laboratory,
Lucent Technologies, MEC, VLSI Vision Ltd., IBM, Hyundai and many other companies, as
shown in Table 2.
10 bits
10 bits
40 FPS 8 bits
500 FPS 8 bits
30 FPS 50 pA/cm²
25 FPS 2 nA/cm² 10 bits
60 FPS 8 bits
3 nA/cm² 9 bits
0.25 nA/cm²
10 bits
Table 2. Present status of CMOS image sensors from several companies [86].
The next generation of CMOS imaging technology is expected to develop in two directions.
The first effort is for highly miniaturized, low-power, high quality imaging systems. Such
imaging systems are driven by performance, not cost. This effort is led by the U.S. Jet
Propulsion Laboratory (JPL) for next-generation deep-space exploration. The CMOS image
sensor is a suitable technology because of its relative radiation hardness for space
applications. In addition, because CMOS consumes low power, the weight of the battery can
be drastically reduced. However, since CMOS APS still suffers from high dark current and
noise, leading to relatively poor image quality, this high performance research typically
focuses on low noise, high image quality, low power, high speed and high resolution.
The second effort is to create highly functional single-chip imaging systems where low cost,
and not performance, is the driving factor. Although CCD technology is highly optimized for
image sensing applications, its cost will probably not be significantly reduced in the future
and applications will require multiple chip systems. For many researchers, the advantage in
developing CMOS imaging technology is the complete, low-cost integration of the image sensor
with analog-to-digital converters, driving and control circuitry, and sophisticated interfaces -
all convenient for addressing the technical challenges posed by digital imaging applications.
In addition, the integration of image processors with CMOS image sensors remains an
attractive opportunity, with recent great successes in various applications, such as digital still
cameras, video cellphones, surveillance, medicine and dentistry, aerospace, machine vision
and the automobile industry.
In CMOS image sensor technology, these two research directions are not win-or-lose
situations; rather, they are two distinct future research fields. Despite the aggressive
development of CMOS image sensor performance, there are debates whether CCDs will
defend their position as the dominant image sensor technology in the future and never give
up their mainstream market to the CMOS counterpart. However, CMOS image sensors will
find their places in imaging systems and applications, for example, in space applications
and in portable devices like videophones and PDAs. In the long term, the ability to integrate
complete CMOS imaging systems on a single chip will be one of the driving focuses in
developing the next generation of multimedia imaging systems.
Chapter III 
3. MOSAIC Multi-Camera Imager System with 
CMOS Image Sensors 
3.1. Introduction 
As a part of the future expectations of CMOS image sensors, a method of achieving high
resolution over a wide field of view is investigated. An integrated smart sensor, MOSAIC
(Matrix of Semi-Autonomous Imaging Cameras), for large field of view is proposed in this
thesis.
A MOSAIC imager design is described for a distributed sensor consisting of $10^2$ - $10^3$
identical detection modules linked by a serial bus to a central controller, seen in Figure 3.1.
Since smaller single chips are used in the MOSAIC imager, relatively high yield,
high resolution and low cost can be achieved. The MOSAIC concept can be applied to
various applications such as airborne remote sensing, the filling of the focal plane of a large
telescope, monitoring of the sky for meteors, monitoring of ships at sea, inter-satellite data
sharing, and perimeter surveillance.
Figure 3.1. MOSAIC multi-camera system with a central controller.
One of the focuses of the present MOSAIC system is the development of an efficient
communication mechanism, achieved by integrating the CMOS image sensor and bus
interface module on the same chip. The integrated bus interface module increases
performance of the bus connections by a zero-wait state design that does not require
operation time for address overhead. MOSAIC imagers increase the field of view
cost-effectively by connecting single-chip cameras in a coordinated manner
equivalent to a large array of sensors. Components that would conventionally have been in
separate chips can be integrated on the same focal plane by using CMOS image sensors. Here,
a MOSAIC imaging system is constructed using CIS connected through a bus line (called the
image-bus) which shares common input controls and output(s), and enables additional
cameras to be inserted with little system modification. The MOSAIC system consumes
relatively low power by employing intelligent power control techniques. However, the
bandwidth of the bus is still expected to limit the number of camera modules that can be
connected in the MOSAIC array. Hence, signal-processing components, such as data
reduction and encoding, will be needed on-chip in order to achieve high readout speeds
(these will be addressed in Chapters 5, 6, 7). Basic modules for a single-chip camera are
proposed for efficient data transfer and power control in the MOSAIC imager.
In this thesis, the MOSAIC smart image chip, corresponding to the scheme described above,
is implemented using a CMOS 0.35 μm double poly technology with a 3.3 V power supply.
The implementation demonstrates the advantages of the single chip solution for the MOSAIC
imager in terms of area, power, speed, and fabrication cost. The thesis describes the design
and performance results of the chip, along with their background algorithms. In addition, the
design of the intelligent bus interface and the architecture of the system are addressed.
3.2. Single Chip versus Multi-chip Systems
Large-format and MOSAIC imagers for astronomical, surveillance and other applications
require high spatial resolution, coverage of a large area, effective cost and an efficient image
update rate. One solution for large format applications is a single monolithic chip, made with
either a large array of pixels or an array of large-sized pixels. A large pixel (optical) area,
Figure 3.2 (a), leads to low resolution that is often not desirable, while increasing the number
of pixels in the array, Figure 3.2 (b), leads to a high complexity of circuits and consequently
a high noise floor. In addition, large single chips have relatively low yield, resulting in a high
fabrication cost. Another solution for large-format image sensor applications is a
MOSAIC system containing many individual sensor chips, as shown in Figure 3.2 (c).
3.3. Previous MOSAIC Implementations
There have been several attempts to implement the MOSAIC concept in image acquisition
applications. This thesis takes three examples where the MOSAIC concept has been applied:
machine vision [17], astronomical telescope [18] and medical tele-pathology [19].
There are several previous designs for machine vision such as the DRIFT bus and the Improved
Integrated Smart Sensor (I²S²) bus [17]. These are efficient, high performance bus structures
in machine vision. The buses are used for communication between image processors and
memory modules or other peripheral modules, not as direct connections between image
sensor modules. These bus structures focus more on communications between image sensors
and peripheral devices, compared to our MOSAIC system where communication among
image sensor chips is emphasized. Also, because the bus connection and its handling
modules are located separately from the image acquisition modules, the system fabrication
cost will be relatively high.
Figure 3.2. Single chip and multi-chip system for MOSAIC system: (a) array of large-sized
pixels, (b) array of large number of pixels, (c) array of chipxels (MOSAIC).
Secondly, there is an example of MOSAIC concepts used in an astronomical telescope, called the
NOAO Mosaic Data Handling System [18]. The system takes data from a mosaic of CCDs
and decodes, records, archives, displays, and processes the data. The NOAO Mosaic CCD
Camera consists of 8 CCDs producing an 8K x 8K format. Unlike CMOS cameras, CCD
cameras do not contain significant combinational logic, hence communication between the
components is handled through a software intensive facility, called a message bus. Also, the
use of multiple CCDs requires that data be read out simultaneously from all CCDs, hence the
raw data is interleaved as it arrives from the detector and must be "unscrambled" before
being written to disk or displayed. Therefore, a powerful computer system and efficient
software are required to be able to handle such large formats in the data handling system.
Telemedicine and tele-pathology, delivering medical diagnoses and health care to distant
patients, is another MOSAIC concept implementation [19]. This technology covers the entire
view of the patient site with several frames of images, and automatically composes a wide
field of view and high-resolution image of the patient from these frames by using computer
techniques for generating digital image mosaics. The patient image capturing equipment
consists of several high-resolution video cameras, and their connections are made through
an ISDN network or communication satellites. Therefore, the system may require a higher
communication cost because of the greater amount of transmitted information,
communication network and computer power. It also emphasizes image interpolation in
software, rather than an efficient data transfer mechanism in hardware.
The previous implementations of the MOSAIC concept are shown above to be rather
complicated and to require intensive integration of expensive software. Often, post-processing
mechanisms are required to produce a suitable image quality. These works also need several
different functional modules in physically separated forms: camera, processing components,
interface modules and bus connections. Therefore, the manufacturing cost is relatively high.
In addition, the previous systems focus on problems of software-based image alignment
rather than implementation of connection in their image acquisition system because the
cameras are not perfectly aligned and have gaps between them, requiring interpolation,
image combination and dithering.
A simple and cost effective implementation is suggested in this thesis. A single chip solution
of the MOSAIC system integrated with low-level hardware pre-processing units is proposed to
improve its communication, cost, speed and computing power. The suggested
implementation of integrated bus interface, called a "chipxel (chip + pixel)", is more focused
on the low-level hardware design with effective fabrication cost and simple systematic
connections. Consequently, the chipxel emphasizes the method of connecting the multi-cameras
efficiently, rather than how to interpolate the images from the ordinary cameras in
software. The chipxel is unique, compared to the previous works, in implementing the MOSAIC
concept as a single chip solution. Since the optimization of image sensor connections is
emphasized in the single bus line, the integrated image camera with processing and bus
interface units is proposed here for MOSAIC applications, with considerations for speed,
fabrication cost and complexity of the design.
3.4. Design of MOSAIC 
3.4.1. Integrated Bus Interface with CMOS Image Sensor
The systematic connections of MOSAIC imager systems can be divided into three different
categories as shown in Figure 3.3: multiple inputs to the controller with one output from each
camera, one input to the controller through a hub connecting multiple cameras, and one input
to the controller connecting multiple cameras through a bus line. In a controller with
multiple inputs, Figure 3.3 (a), the output of each camera is connected to a controller and the
controller arbitrates the incoming outputs of the cameras and multiplexes/encodes them into one
data stream. This connection potentially suffers from high fabrication cost and slow frame
rate because the controller needs a multiplexer/encoder to combine the multiple streams of data
into one stream for further processing. In addition, as the number of cameras in the system
increases, the complexity of the controller will increase. When more cameras are added into
the system, the controller has to be redesigned to create more channels for the additional
cameras and the multiplexer/encoder should be implemented with the new channels.
Therefore, the system is less flexible to the inclusion of additional cameras.
In the second architecture, Figure 3.3 (b), the multiplexer/encoder which exists in the
controller of the first system is now separated from the controller and replaced with a hub,
connecting multiple cameras and streaming one output to the controller. However,
because there are a limited number of channels from cameras that a hub can take, the
fabrication cost and complexity are again relatively high. Whenever additional cameras are
connected to the system, extra hubs are required. Now, intelligent chipxel cameras, each
unit with an integrated bus interface, are proposed here. The multiplexer/encoder is taken
away from the controller and integrated into each camera. The output data from the
distributed cameras are streamed into the controller by the integrated bus interface through a
common bus line, as shown in Figure 3.3 (c). Therefore, it is easy to integrate additional
cameras into the system with little modification to the central controller or to the connections.
In addition, when the MOSAIC system needs independent processing such as event detection,
a bus interface is a complementary component in each camera because each camera should
be capable of indicating when it detects events and when it needs the bus line. The integrated
bus interface therefore significantly increases the flexibility of the system; it requires neither
many communication lines nor an expensive hub. In addition, because the signal does not go
through many units, the noise level is relatively low and the communication speed is
relatively high. In summary, the integrated bus interface in each camera has the advantages
of low fabrication cost and high flexibility over the other systems.
Figure 3.3. Systematic connection of mosaic imager can be categorized into (a)
multiple outputs from cameras to controller, (b) multiple outputs from cameras and
single input through hub to the controller and (c) single input to the controller with
integrated bus interface in cameras.
3.4.2. Circuits and layouts 
A standard CMOS image sensor array was implemented with the chipxel to demonstrate
continuous data transfer in the MOSAIC imager system. A MOSAIC chipxel, whose photo is
shown in Figure 3.4, was designed and fabricated with 0.35 μm double poly technology with a
3.3 V power supply. The structure in Figure 3.5 includes an image sensor array with pixel
readout circuitry, shift registers, sample/hold and bus interface. All the components except
for the bus interface are used widely in CMOS image sensor designs. Here, a bus interface
was integrated with the CMOS image sensor array for the MOSAIC imager connections.





Figure 3.5. Array structure of ideal MOSAIC image sensor with an integrated bus
interface.
Each pixel in the CMOS image sensor array consists of a photodetector and readout circuitry,
seen in Figure 3.6. The photodetector uses an n+p photodiode, one of the simplest sensor
structures in CMOS image sensor technology. A simple source follower is used for the
readout circuitry of the pixel, which blocks capacitive loading from the column line of the array.
The maximum output voltage of the array, due to the voltage drop of the source follower, is
Vt lower than the actual photo-generated voltage, unless further processing occurs (where Vt
is the threshold voltage of the source-follower transistor). The second part of the system is
for the generation of input control signals. The generation of readout input signals in general
can be performed by a shift register, taking less area with a simple design structure, for reset,
row and column readout controls in this design. In addition, because the size of the shift register
can easily be aligned with each column of the array, the shift register is more suitable for
column structure based implementation than the decoder (see Figure 2.4.6). The shift register
in the chipxel uses two inverters and two switches with two control clock signals, shown in
Figure 3.8.
The sample/hold (S/H) is a storage place for the image to be transferred to the outside. Since Vt
is lost by the source follower in the pixel readout, the sample/hold uses a PMOS source
follower, shown in Figure 3.7, so that the lost Vt voltage can be recovered by the Vt rise of the
PMOS. Because the source follower in the sample/hold is the off-chip driver, where a large
loading exists, the sample/hold should use a large MOSFET in its source follower. As the
size of the driver increases, the driving power of the driver increases, thus speeding up the
signal readout on the large external loading. However, the large size also causes larger power
consumption. Therefore, an appropriate size of the driver is determined in a given design
specification for both speed and power consumption such that the product of the speed and
power is at minimum.
There are two mainstream bus interface schemes available for the chipxel chip: independent
request and grant (RG) and daisy chain methods. In the independent RG method, a chip sends a bus request
signal to the controller whenever it needs to transfer data, using its own designated control
lines, which is similar to the star configuration in network theory. Therefore, it needs many
control lines and the complexity of the design will be high. In contrast, the daisy chain
method enables the chip to send its image to the controller whenever it receives the bus grant
signal through the daisy chain connection. Hence, the daisy chain method is relatively slow,
but the design is simple and the overall fabrication cost is low.
Figure 3.6. Active Pixel Sensor with
Figure 3.7. Schematic of S/H. A
simple S/H is implemented with
Figure 3.8. Shift register is implemented for readout circuitry, using two
inverters and switches.
Figure 3.9. Readout circuitry is integrated with switches enabled by Bus Grant
signal. The integrated bus interface passes the grant signal, gated by AND
function to the sample-and-hold switches.
In addition, the daisy chain does not use any time for address and overhead, which we call
the "zero-wait state", because the images captured by each camera are displayed in sequence.
Therefore, the daisy chain method is chosen for the prototype of the MOSAIC chip.
Whenever the Bus Grant (BG) signal comes to a chip, the chip holds the BG signal, enabling
the column shift registers and sending out the image, as in Figure 3.9. After the chip transfers
its frame of image, the BG signal is released to the next chip. The BG signal is generated once
by the controller and circulated through the daisy chain until all the images of the
system are transferred.
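The daisy chain behaviour can be summarized with a small behavioural model. The Python sketch below is an illustration under simplifying assumptions, with a hypothetical function name and made-up pixel values; it models only the passing of the grant and the resulting bus stream, not the gating circuit of Figure 3.9.

```python
def daisy_chain_readout(frames):
    """Return (grant_order, bus_stream) for chips wired in daisy chain order.

    frames: one list of pixel values per chip, in the order the chips are chained.
    """
    grant_order = []
    bus_stream = []
    for chip_index, frame in enumerate(frames):   # BG moves from chip to chip
        grant_order.append(chip_index)            # this chip now holds the bus grant
        bus_stream.extend(frame)                  # it drives its whole frame, no address words
        # Leaving the loop body corresponds to releasing BG to the next chip.
    return grant_order, bus_stream


order, stream = daisy_chain_readout([[11, 12], [21, 22], [31, 32]])
print(order)   # [0, 1, 2]
print(stream)  # [11, 12, 21, 22, 31, 32]
```

The stream length equals the total number of pixels, which is the sense in which no bus time is spent on addresses; the controller identifies each chip's data purely by its position in the sequence.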
3.4.3. Demonstration and Tests
The first test was to capture an image of the best possible quality with the chip, verifying test
board connections, control input patterns, image display software setup and, most
importantly, the design of the chip. The basic procedures of the test for capturing images are
discussed in Appendix B. Figure 3.10 shows the testing setup for the single chip; the testing
board contains the chip, lens, lens mount, wire connections for power supplies and biases,
and ribbon cables for control input patterns. The input patterns are generated by software
called "GageBit" from Gage Applied Inc. Pulse waveforms are manually drawn in the
software and, with an appropriate clock rate, it outputs a sequence of control patterns. Also, a
digital oscilloscope called "CompuScope" from the same company is used for the data
acquisition and image display. With a LabVIEW interface, the CompuScope can be
programmed to display the image signal as an intensity graph where the signals are
displayed as pictures in real time.
Figure 3.10. Test board with MOSAIC chip and lens mounted. Power supplies and
bias voltage lines are shown on the left side of the board. The ribbon cable for control
input patterns is connected to the right side of the board.
Figure 3.11. Above: A raw image of Audrey Hepburn, captured by a single image-bus chip.
Right: summary of the chip characteristics:
0.35 µm CMOS, double poly
3.3 V
1.91 mm x 1.91 mm
64 x 64
10 µm x 10 µm
46 %
24 frames/sec.
1.46 mW at 5 frames/sec.
0.57547 V/lux·sec (with 1 W/m² ~ 70 lux)
33 mV/(µW/cm²)
16 mV rms (1.3 % of sat.)
1.2 V
0.3 V/sec
1.05 µV/e-
The characteristics of the single chip, including image sensors, are summarized in Figure
3.11, along with an example of an individual raw image. The characteristics of the chip are
either measured directly or calculated from the measurements. According to these measurements,
the technology for this chip was not optimal for image sensors. Commercially available CIS
chips typically achieve 5-10 µV/e- for their conversion efficiency. However, our chip has an
estimated conversion efficiency of 1.05 µV/e-, which is a relatively low result. This estimated
conversion gain is calculated from the measurements of the image sensor chip. The detailed
calculations and measurements are discussed in Appendix C. Such a low photosensitivity
leads to long integration times and high dark signal, thus degrading image quality. Since the
technology is optimized for logic and memory, but not for image sensors, the chip is not
expected to display high performance.
Briefly, the characterization of the chip rests on three essential
measurements: (1) Measure and save image files at a fixed wavelength and a fixed frame
rate (sampling rate) while changing the illumination (light power or intensity) from 0 until
the output voltage is saturated, or perform an equivalent test by changing the integration time. In
addition, the wavelength and frame rate can be varied. This measurement can be directly
used for extraction of photosensitivity, PRNU and saturation level. Also, it can be used for
Figure 3.12. Photosensitivity of a single MOSAIC chip, plotted against light power (µW/cm²) at a
sampling rate of 50 kHz. The saturation level of 1.2 V is shown in this diagram at 50 kHz
(12 frames/sec).
Figure 3.13. Dark current measurement in a single MOSAIC chip, plotted against integration
time (ms).
calculation of conversion efficiency. Figure 3.12 shows the photosensitivity of the MOSAIC
chip. From the slope of the graph, the photosensitivity is calculated to be about 0.57547
V/lux·second, which is slightly lower than that of commercial CIS chips. (2) Measure and save
image files at fixed illumination (light power) and a fixed frame rate while changing the wavelength
of the incident light. The illumination and frame rate can be varied. This measurement can be
used for spectral response. (3) Measure and save image files in a dark room while changing
integration time (sampling rate). This measurement can be used directly for FPN and dark
current. Figure 3.13 shows the dark current measurement of the MOSAIC chip. With this
chip, there are three variables controlled by users: Vbiasp, Vbiasn (see Figure 2.17) and
sampling rate (the clock rate of the control patterns). Vbiasp and Vbiasn are bias voltages that do not
have direct effects on the output images, but only shift the saturation level of the output voltages.
However, when Vbiasp and Vbiasn are out of their operational range, the top level of the
saturation range hits the VDD of 3.3 V, and then the output images are degraded. Figure 3.14
and Figure 3.15 illustrate this phenomenon. The output image of Audrey Hepburn is not
affected much between Vbiasn = 0.4 V and 0.6 V. However, when Vbiasn reaches about
0.65 V in Figure 3.14, the output image shows some degradation. Similarly, Figure 3.15
shows no effect on the images between Vbiasp = 2.45 V and 2.70 V. However, when
Vbiasp becomes 2.75 V, some degradation appears on the image.
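As a cross-check of the photosensitivity figure quoted above, the following sketch converts the 33 mV/(µW/cm²) entry of Figure 3.11 into V/lux·second. It rests on assumptions that are not stated explicitly in the text: that this entry is the slope of Figure 3.12, that the "64 x 64" entry is the pixel array size, and that the integration time equals the readout time of one frame at the 50 kHz sampling rate.

```python
# Values quoted in this section and in the Figure 3.11 characteristics.
SLOPE_V_PER_UW_CM2 = 33e-3   # ASSUMPTION: slope of Figure 3.12, V per (uW/cm^2)
LUX_PER_W_M2 = 70.0          # conversion quoted in Figure 3.11 (1 W/m^2 ~ 70 lux)
SAMPLING_RATE_HZ = 50e3      # sampling rate used for Figure 3.12
ROWS, COLS = 64, 64          # ASSUMPTION: "64 x 64" is the pixel array size

# 1 uW/cm^2 = 1e-6 W / 1e-4 m^2 = 0.01 W/m^2, i.e. about 0.7 lux.
lux_per_uw_cm2 = 0.01 * LUX_PER_W_M2

# ASSUMPTION: integration time equals the readout time of one full frame.
integration_time_s = ROWS * COLS / SAMPLING_RATE_HZ

photosensitivity = SLOPE_V_PER_UW_CM2 / lux_per_uw_cm2 / integration_time_s
print(f"{photosensitivity:.5f} V/(lux*s)")
```

Under these assumptions the result is about 0.57547 V/lux·second, which agrees with the value quoted in the text.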
The sampling rate or data rate is directly related to the clock rate of the input control patterns and to
the sensor integration time. It is also related to power consumption and output voltage swing
of the chip. Due to the direct relation between sampling rate and integration time, the sampling
rate affects the quality of the output images. As the sampling rate increases, the integration time
decreases because the faster sampling rate reduces the readout time of one image frame
(typically, the maximum integration time of the image sensors is the readout time of one
image frame), and thus the photon integration time of the image sensors gets smaller. As seen in
Figure 3.16, as the sampling rate gets higher, some degradation appears on the output image.
When the sampling rate is over 100 kHz, the output image is hardly recognizable. The sampling
rate is limited to such low values mainly by the poor photosensitivity of the image
sensors; a longer integration time is needed to produce good image quality with poor
photosensitivity.
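The relation between sampling rate, frame rate and maximum integration time can be tabulated with a few lines of code. This is a sketch under the assumption that the array holds 64 x 64 pixels (the "64 x 64" entry of Figure 3.11) and that one pixel is read out per sample; the function name frame_timing is introduced here for illustration only.

```python
ROWS, COLS = 64, 64  # ASSUMPTION: array size taken from the "64 x 64" entry


def frame_timing(sampling_rate_hz: float, pixels: int = ROWS * COLS):
    """Return (frames per second, frame readout time in ms).

    The readout time of one frame is taken as the upper bound on the
    integration time, as stated in the text.
    """
    frame_time_s = pixels / sampling_rate_hz
    return 1.0 / frame_time_s, frame_time_s * 1e3


for rate_khz in (20, 50, 100, 200):
    fps, t_ms = frame_timing(rate_khz * 1e3)
    print(f"{rate_khz:>3} kHz: {fps:5.1f} frames/s, integration <= {t_ms:6.2f} ms")
```

With these assumptions, 50 kHz gives about 12 frames/sec, consistent with the caption of Figure 3.12, and doubling the sampling rate halves the time available for photon integration.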
Figure 3.14. Images with different Vbiasn (panels: Vbiasn = 0.4 V, 0.45 V, 0.5 V, 0.55 V, 0.6 V
and 0.65 V). They should be the same unless the output reaches its saturation level. These
sample images are captured under the same setup of Vbiasp = 2.55 V and sampling
rate = 20 kHz, but at different Vbiasn.
Figure 3.15. Images with different S/H Vbiasp (panels: Vbiasp = 2.45 V, 2.50 V, 2.55 V, 2.60 V,
2.65 V, 2.70 V, 2.75 V and 2.77 V). They should be the same unless the output reaches its
saturation level. These sample images are captured under the same setup of Vbiasn = 0.5 V
and sampling rate = 20 kHz, but at different Vbiasp.
Figure 3.16. Images with different sampling rates (panels: sampling rate = 20 kHz, 50 kHz,
100 kHz and 200 kHz). These sample images are captured under the same setup of
Vbiasp = 2.55 V and Vbiasn = 0.5 V, but at different sampling rates or data rates.
Figure 3.17. Testing setup for the connection of three MOSAIC chips. The three
independent cameras are connected together through a common bus line.
To demonstrate multiple image capture, three independent cameras are connected together
through a common bus line, as illustrated in Figure 3.17. Each camera captures its input
image and transfers its image signals to the controller in sequence through the daisy chain.
After the signals are transmitted to the controller, the frame grabber and display module are
programmed to capture and display three different images as one panorama, as shown in
Figure 3.18. The integrated bus interface operates successfully for multiple images in real
time mode.
As the number of chips in the system increases (up to four cameras in our experiments),
power consumption and time delay are carefully measured. The power is measured in the
dark, rather than under illumination, because the power consumption can be affected by the
images that the chips capture. For single chip operation, the chip consumes 1 mW
nominally. Interestingly, as the number of chips in the system increases, the power
consumption does not increment by the power consumed by a single chip. Rather, for each
additional chip, the power increases by about 20% of the single chip power, as shown in
Figure 3.19(a). When a chip does not have the bus grant signal, its bus interface disables the
shift registers, preventing current from flowing through the PMOS transistors in the S/H.
Since a large portion of the power, about 70-80% of the total [20], is consumed by the PMOS
transistors in the S/H, the disabling mechanism serves as a power control method that saves
system power. The overall power consumption of the system can be reduced by such
power control methods, especially when a large number of chips are connected.
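The measured power trend can be written as a simple linear model. This is a minimal sketch that assumes the roughly 20% increment per additional chip holds exactly and uses the nominal 1 mW single chip power from the text; the function name system_power_mw is hypothetical.

```python
def system_power_mw(n_chips: int, single_chip_mw: float = 1.0,
                    increment: float = 0.20) -> float:
    """Estimated total power for n_chips sharing the bus (linear model).

    Only the chip holding the bus grant has its shift registers enabled,
    so each extra chip is modelled as adding `increment` of one chip's power.
    """
    if n_chips < 1:
        raise ValueError("need at least one chip")
    return single_chip_mw * (1.0 + increment * (n_chips - 1))


for n in range(1, 5):
    always_on = n * 1.0  # what n chips would draw if none were disabled
    print(f"{n} chips: {system_power_mw(n):.2f} mW (vs {always_on:.2f} mW always on)")
```

For the four-camera case measured here, the model gives about 1.6 mW instead of 4 mW, which illustrates why the disabling mechanism matters more as the chip count grows.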
Figure 3.18. Panorama images captured by the MOSAIC system. Three single chip
cameras of the mosaic imager are linked together through a common bus line. This is
a still image, a part of video images captured in real time mode. These sensors do not
include pattern noise correction.
Figure 3.19. Test results of mosaic imager: (a) nominal current and (b) relative time delay,
each plotted against the number of camera chips. As the number of chips increases in the
mosaic system, the power and time delay are measured.
In order to measure the relative time delays with different numbers of chips, the minimum
charging/discharging time of a fixed pixel is consistently measured with the same
background image. As the number of chips on the bus line increases, the minimum time
delay of charging/discharging also increases, as shown in Figure 3.19(b). Similar to the
power consumption, the RC time delay does not increase by the full time delay of the
single chip system for each additional chip. When the time delay of the single chip system is normalized,
each additional chip adds only about a 7.5% increment. Since the loading of
the bus line is mainly caused by the bus line itself, probe contacts and external connections, the
extra loading of the additional chips is relatively small. However, it is evident that as the
number of chips in the MOSAIC system increases, the output loading on the system increases,
thus slowing down the image transfer speed. Especially for a large field of view, when a
large number of chips are connected, the inevitable heavy loading on the MOSAIC imager
will be a primary implementation issue.
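To see how the 7.5% increment accumulates, the delay can be modelled as growing linearly with the chip count. This is an extrapolation, not a measured result: the experiments cover up to four chips only, and the function names relative_delay and chips_for_slowdown are introduced here for illustration.

```python
import math


def relative_delay(n_chips: int, increment: float = 0.075) -> float:
    """Charging/discharging delay relative to the single chip system."""
    if n_chips < 1:
        raise ValueError("need at least one chip")
    return 1.0 + increment * (n_chips - 1)


def chips_for_slowdown(factor: float, increment: float = 0.075) -> int:
    """Smallest chip count whose modelled delay reaches `factor`.

    ASSUMPTION: the linear trend continues beyond the measured range.
    """
    return 1 + math.ceil((factor - 1.0) / increment)


print(relative_delay(4))        # delay for the four-camera setup
print(chips_for_slowdown(2.0))  # chips needed before the delay doubles
```

Under this linear assumption the delay would double at around fifteen chips, which gives a rough sense of when bus loading becomes the dominant issue for a large field of view.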
In order to enhance the frame update rate in the MOSAIC system, six different methods can
be proposed. Firstly, multiple output channels will increase the frame update rate. Instead of
one output channel, the output data can be transmitted through several different channels in
parallel. One shift register (or decoder) can be placed per output channel, dividing the array
into blocks by column. Secondly, large drivers increase the frame update rate. The output
driving power in our CMOS photodiode array is generated from the source follower in the
S/H, where a PMOS source follower is used. The larger the transistor size of the driver is, the
more current (driving power) the driver has. Thirdly, a shorter RC charging/discharging
range would be used for output transmission, similar to that used in random access memory.
Since the voltage swing is small, the time for charging/discharging is reduced, allowing a
faster update rate. However, such a small voltage swing potentially suffers from high noise,
especially from off-chip connections. Therefore, as a fourth method, digital signal transmission
is proposed for noise immunity. Even with a small voltage swing, the digital transmission of output is
relatively immune to noise compared to its analog counterpart. The digital transmission does
not necessarily increase the frame update rate; instead, it protects the output transmission
from noise sources. In addition, an efficient bus arbitration algorithm (a bus interface that
arbitrates the bus ownership so that at a given time, only one module which is connected to
the bus has control of the bus) can enhance the frame update rate. There are many
different bus arbitration methods, each suitable for particular applications and systems, so
choosing a proper bus arbitration can increase the speed. Lastly, data reduction strategies are
of great importance for high speed. Since large volumes of output data slow down the frame
rate, a reduction of the data transmitted from on-chip to off-chip will increase the frame
speed. The data or image could be compressed after the acquisition of the image.
Alternatively, objects or events of interest in the image can be extracted and encoded. Either
data compression or data extraction will reduce the amount of output data, thus increasing
the frame update rate.
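Two of these methods, multiple output channels and data reduction, affect the frame rate in a way that is easy to estimate. The sketch below is an idealised model that ignores control overhead and bus loading; the function name frame_rate and the parameter data_fraction are assumptions introduced here, and the 4096-pixel frame corresponds to an assumed 64 x 64 array.

```python
def frame_rate(pixels: int, pixel_rate_hz: float, channels: int = 1,
               data_fraction: float = 1.0) -> float:
    """Frames per second for an idealised readout.

    channels      -- number of output channels reading column blocks in parallel
    data_fraction -- share of the pixel data that actually leaves the chip
                     after on-chip compression or feature extraction
    """
    if channels < 1 or not 0.0 < data_fraction <= 1.0:
        raise ValueError("channels >= 1 and 0 < data_fraction <= 1 required")
    values_per_channel = pixels * data_fraction / channels
    return pixel_rate_hz / values_per_channel


base = frame_rate(4096, 50e3)
print(f"1 channel, all data:     {base:.1f} frames/s")
print(f"4 channels, all data:    {frame_rate(4096, 50e3, channels=4):.1f} frames/s")
print(f"1 channel, 25% of data:  {frame_rate(4096, 50e3, data_fraction=0.25):.1f} frames/s")
```

In this model, four parallel channels and a fourfold data reduction give the same speed-up, which is why on-chip data reduction is attractive when adding output pads is costly.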
3.5. Conclusions for MOSAIC: Single Chip Camera Modules 
The integrated bus interface module increases the performance of the bus connections by
providing proper structure and arbitration methods. In this thesis, the integrated bus interface
demonstrates its effectiveness in terms of fabrication cost and flexibility of operation. Since a
common bus line is used for image transfer to the controller, the number of connection
lines is reduced. Also, the bus arbitration is managed in each camera, so the system is very
flexible for additional cameras. Moreover, by an intelligent power control method of the
system, low power operation can be achieved. However, even with an efficient on-chip bus
interface, large data flow and slow frame update rates are still potential design issues for
systems with large numbers of camera modules, due to the output loading on the bus line.
Therefore, it is concluded that a high frame update rate is necessary
for further implementations of the MOSAIC system. A smart sensor with on-chip processing is
of great importance as an additional technique to increase the frame rate of the MOSAIC.
Chapter IV 
4. Spatial Image Processing Integrated with
CMOS Image Sensor
4.1. Introduction 
Solid-state image sensor technology is based on the inherent photoconversion properties of
semiconductors, combined with the advanced silicon processing technology driven by the VLSI
industry to achieve high performance and reasonable cost. As mentioned previously, the focus of
future CMOS image sensor technology is expected to be in two research areas: cost and
performance. Performance refers to the good image quality produced by image sensors
with low temporal and spatial noise, low dark current and high dynamic range. Cost
refers to the integration of processing components with image sensors for automated controls
and enhanced functionality. The integration of processing circuits on the same focal plane
with CMOS image sensors will reduce overall fabrication cost, mainly by saving wafer area for
pads and power supplies. Cost can also be saved on packages, circuit boards, wire
connections and assembly.
The main reason for high integration of CMOS image sensors is the process compatibility
between circuits and image sensors. While CMOS technology requires relatively
thin gate oxide, shallow well depths and low power supplies, CCD requires
relatively thick gate oxide, deep wells, deep channel depth, and high power supplies.
Obviously, it is difficult to integrate the two technologies due to these significant differences
in their process steps. Essentially, a full-featured combination would require almost all the
stages from both processes, which probably means over 30 mask processing steps.
There have been some efforts to combine the good image quality of CCD and the logic of CMOS
technology. The reduced yield and increased costs have not made a combined CMOS/CCD
process viable. The combined process is neither standard CMOS nor standard CCD, and so
requires high development expense, and the frequent result is that neither part will work
particularly well. Several processes have been reported which claim to preserve the quality of
each technology [21] [91] [92]. However, despite the demonstrated feasibility of CMOS/CCD
hybrids, the idea has not yet taken off, possibly because few places have access to both sets of
fabrication facilities and the design experience [1].
CMOS image sensors use the same technology as CMOS logic/memory processes, and
therefore, expensive extra process steps are not needed. Also, the process for CMOS image
sensors can be enhanced, mainly by increasing the depth of the epitaxial layer, which is
a matter of wafer selection rather than of process steps. Therefore, the fabrication of CMOS
image sensors with processing circuits such as on-chip ADC, logic, memory, and even
processing elements is relatively simple and cheap, without much loss in optical performance.
In this thesis, the interest in integrating image processing with a CMOS image sensor was
initiated by the MOSAIC system for large field of view, particularly with reference to data
reduction mechanisms. Therefore, the remainder of the thesis is based on system-level
architecture and design methodology issues, trying to answer the following questions:
• Why do we want to integrate vision algorithms (image processing algorithms) with
image sensors (CMOS image sensors in this thesis)?
• What algorithms and processing components should we put with the sensors?
• How will we integrate these processing algorithms?
• What structures are best for which image processing algorithms?
The first two questions are answered in this chapter and the last two questions are answered
in the next few chapters, leading to the basis of the main concept of the thesis.
4.2. Smart Sensors (Vision Chips): Why Smart Sensors?
Here, we are trying to answer why we want to integrate image processing (vision) algorithms
with image sensors, or to implement smart sensors. Smart sensors are compared with
camera-plus-processor systems to determine their advantages and disadvantages.
The integration of image sensors and processing circuits on a single chip, for obtaining better
performance from sensors and processors, or for making the sensing and processing system
more compact, is not a new idea. There are various reports on on-chip signal processing
elements with CMOS image sensors, such as correlated double sampling (CDS), delta-difference
sampling (DDS), programmable amplification, multiresolution imaging, dynamic
range enhancement, and on-chip clock generation. These circuits perform signal
processing to improve the performance of the CMOS image sensors, but not to increase
the functionality of the imager chip.
A smart sensor is well defined in "Vision Chips" by Moini [15]. Moini states that "the smart
sensors refer to those devices in which the sensors and circuits co-exist, and their relationship
with each other and with higher-level processing layers goes beyond the meaning of
transmission. Smart sensors are information sensors, not transducers and signal processing
elements". In this thesis, the meaning of smart sensor is further narrowed down to the devices
in which image sensors and image processing circuits (beyond signal processing) co-exist,
and they interact with each other in order to increase the functionality of the imager chip.
Traditional photodetectors often require further signal and image processing after the image
acquisition to increase the quality of imaging in terms of noise, resolution and speed. In contrast,
in smart sensors the main interest is the functionality of processing or quality of processing.
The important qualities of processing in smart sensors are the contents of the outputs from
the smart sensors, the algorithms integrated with the sensors, and the applications the smart sensors
are targeted for. Sometimes, some imaging characteristics, such as resolution, frame rate and
power, could be sacrificed to enhance the functionality of processing.
When compared to a vision processing system consisting of a camera and a digital/analog
processor, a smart sensor provides many advantages. Although the main advantages are to
reduce bandwidth and subsequent stages of computations, there are many other advantages,
well described in [15]. These are the major reasons why smart sensors are better than a
combination of a camera and a processor in separate chips.
• Processing speed: The processing speed of smart sensors is faster than that of a
combination of image sensor and processor. In the image sensor and processor
combination, the information transfer occurs in series between the image sensors
and the processors, while in a smart sensor data between different layers of processing
can be processed and transferred in parallel.
• Single chip integration: A single chip implementation of smart sensors contains
image acquisition and low- and high-level analog/digital image processing circuits on the
same focal plane. For example, a tiny chip can do the equivalent work of a
camera-processor system.
• Adaptation: In many smart sensors, photocircuits can be located up front with the
photodetectors for local and global adaptation capabilities that further enhance their
dynamic range. Conventional cameras at best have global automatic gain control with
offset at the end of the output data channel in the chip.
• Power dissipation: Smart sensors often use analog circuits that operate in the
sub-threshold region. In addition, a large portion of the total power spent in image sensors
is due to output drivers for the heavy output loadings of bonding wires, pads at high
frequency and off-chip interconnections. By placing image sensors and processors
together without separate packaging, the design of the large drivers is avoidable, which
reduces the power consumption in operation.
• Size and Cost: Single chip implementation of image sensors and a processor can
reduce the system size dramatically, mainly saving wafer area for pads and power
supplies. The compact size of the chip is directly related to the fabrication cost.
Therefore, the integration of processing circuits on the same focal plane with the
image sensors will reduce overall fabrication cost.
Although designing single-chip smart sensors is an attractive idea, it faces several limitations
and disadvantages:
• Processing reliability: Processing circuits of smart sensors often use unconventional
analog circuits which are not well characterized and understood in many technologies.
Therefore, the processing circuits have low precision in their operation, which is
affected by many uncontrollable factors. As a result, if the smart sensor does not
account for these inaccuracies, the processing reliability is severely affected.
• Custom designs: Unconventional analog circuits are often used in the implementation of
smart sensors. Therefore, circuits from design libraries cannot be used, and many new
analog circuits have to be developed from scratch. As a result, smart sensors are
always full custom designed, which is known to be time consuming and error-prone.
• Programmability: Many smart sensors are not general-purpose devices, and are
typically not programmable to perform different vision tasks. They are rather
application specific designs. This lack of programmability is undesirable, especially
during the development of a vision system when various simulations are required.
However, it is not necessarily a serious drawback of smart sensors because many
applications of smart sensors are for particular tasks with limited programmability.
Even with these disadvantages of the integration, smart sensors are still attractive, mainly
because of their effective cost, size and speed with various on-chip functionalities. Simply put, there
are benefits when a camera and a computer system are converted into a thumbnail sized
camera chip.
4.3. On-chip Early Image Processing: What on Smart Sensors?
The basis of the smart sensor concept is that analog VLSI systems with low precision are
sufficient for implementing many low-level vision (image processing) algorithms, often for
application-specific tasks. Conventionally, smart sensors are not general-purpose devices;
everything in a smart sensor is specifically designed for the targeted application. Yet, in this
thesis, we do not wish to limit implementations to application-specific tasks, but to allow for
general-purpose applications such as DSP-like image processors with programmability. The
idea is based on the fact that some early level image processing in general-purpose
chips is commonly shared by many image processors and does not require
programmability in its operation. As shown in Figure 4.1, human eyes, not associated with
the brain, perform basic image operations in a human such as image filtering, brightness
Figure 4.1. Optical image system in human: low level processing such as brightness
adaptation and image filtering can be done at eye level, without much interaction with brain.
adaptation, edge extraction and motion detection [22]. These early level image processing
algorithms, from the point of view of on-chip implementation, are rather pre-determined and
fixed, and their low precision can be compensated later by back-end processing. Here, we
will investigate what early image processing algorithms can be integrated on smart sensors as
a part of early vision sequences and we will discuss their merits and the issues that designers
should consider in advance.
General image processing consists of several image analysis processing steps, as shown in
Figure 4.2: image acquisition, preprocessing, segmentation, representation and description,
and recognition and interpretation. The order of this image analysis can vary for different
applications, and stages of the processes can be omitted. In image processing, image
acquisition is used to capture raw images from the input scene, through the use of video
cameras, scanners and, in the case of smart sensors, solid-state arrays.
The preprocessing stage performs initial processing that makes the primary task of the
image analysis easier. Preprocessing is a stage where the requirements are typically obvious
and straightforward, such as removing artifacts from images or eliminating image
Figure 4.2. General machine vision/image processing operational stages
of image analysis.
information unnecessary for the application. It includes basic algebraic operations such as
image averaging and subtraction, feature enhancements, contrast stretching, bit slicing, and
data reduction of image information. It is mainly subdivided into three different operations:
image enhancement, image restoration and image compression. Image enhancement
processes an image so that the result is more suitable for a specific application. For example,
image smoothing and sharpening filters improve the image quality of raw input images. Image
restoration is a process that attempts to reconstruct or recover a degraded image with prior
knowledge of the degradation phenomenon. Image restoration is quite similar to image
enhancement, but one big difference is the prior knowledge of the degradation. Due to this
prior knowledge, the recovery of damaged images is relatively easier.
Lastly, image compression is another form of data reduction between raw input images and
encoded output images. Image compression is a highly recommended preprocessing
operation, particularly for high volume communications like multimedia applications.
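As a concrete instance of the image smoothing mentioned above, the following sketch applies a 3x3 mean filter to a small gray-level image held as nested lists. It is a software illustration only and not a circuit described in this thesis; the function name smooth3x3 and the choice to copy border pixels unchanged are assumptions made for brevity.

```python
def smooth3x3(image):
    """3x3 mean (smoothing) filter; border pixels are copied unchanged."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Average the pixel with its eight neighbours.
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = sum(window) / 9.0
    return out


# A flat background with one noisy pixel at row 1, column 1.
noisy = [[10, 10, 10, 10],
         [10, 100, 10, 10],
         [10, 10, 10, 10]]
smoothed = smooth3x3(noisy)
print(smoothed[1])
```

The isolated bright pixel is spread over its neighbourhood and reduced from 100 to 20, which is the noise-suppressing behaviour expected of a smoothing filter.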
At the third stage of image processing, image segmentation is important in many
computer vision and image processing applications. The goal of image segmentation is to
find regions that represent objects or meaningful parts of objects. Segmentation
subdivides an image into its constituent parts or objects. It should stop when the objects of
interest in an application have been isolated [74]. Image segmentation generally follows two
methods of detection: detection of discontinuity and detection of similarity. In the first
category, the approach is to partition an image by abrupt changes in gray level. The principal
areas of interest within this category are the detection of isolated points and the detection of
lines and edges in an image. The approaches in the second category, which is detection of
similarity, are based on thresholding, region growing, and region splitting and merging.
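The two segmentation categories can be contrasted on a tiny example. The sketch below uses thresholding for the similarity approach and a horizontal gray-level difference for the discontinuity approach; both functions and their parameter names are illustrative assumptions, and practical edge detectors use more elaborate operators.

```python
def threshold(image, level):
    """Similarity-based segmentation: 1 where the pixel is >= level, else 0."""
    return [[1 if p >= level else 0 for p in row] for row in image]


def horizontal_edges(image, min_step):
    """Discontinuity-based segmentation: mark abrupt gray-level changes
    between horizontally adjacent pixels (one fewer column than the input)."""
    return [[1 if abs(row[c + 1] - row[c]) >= min_step else 0
             for c in range(len(row) - 1)]
            for row in image]


# Dark region on the left, bright region on the right.
image = [[10, 12, 200, 205],
         [11, 13, 198, 202]]
regions = threshold(image, 128)
edges = horizontal_edges(image, 50)
print(regions)
print(edges)
```

Thresholding labels every pixel of the bright region, whereas the difference operator marks only the boundary between the two regions, so the two methods produce complementary descriptions of the same scene.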
At the next level of processing, the resultant data of segmented pixels are usually
represented and described in a form suitable for further computer processing. Representation
and description is an image processing operation that follows image segmentation.
Basically, representing a region involves two choices: representation of the region in terms of
its external characteristics (its boundary), and representation in terms of its internal
characteristics (the pixels comprising the region). Therefore, this stage of the processing
refines images or image information into a form more adequate for high-level image processing.
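The two representation choices can be illustrated on a binary region mask. This is a minimal sketch in which the helper names region_pixels and boundary_pixels are hypothetical, and a boundary pixel is taken, as an assumption, to be a region pixel with at least one 4-connected neighbour outside the region.

```python
def region_pixels(mask):
    """Internal representation: every (row, col) that belongs to the region."""
    return [(r, c) for r, row in enumerate(mask)
            for c, v in enumerate(row) if v]


def boundary_pixels(mask):
    """External representation: region pixels with at least one
    4-connected neighbour outside the region (or outside the image)."""
    rows, cols = len(mask), len(mask[0])

    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [(r, c) for r, c in region_pixels(mask)
            if not all(inside(r + dr, c + dc) for dr, dc in neighbours)]


mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
print(len(region_pixels(mask)), "region pixels")
print(len(boundary_pixels(mask)), "boundary pixels")
```

For this 3x3 square the internal representation lists nine pixels and the boundary lists eight, so the boundary form becomes the more compact description as regions grow larger.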
At the last stage of image processing, recognition and interpretation is the process of
understanding patterns: this is the stage where the patterns found by the earlier
processing are understood. It therefore requires large computational power as well as
large memory.
We have seen the general stages of image processing and image analysis popularly used
in machine vision. Not all of these stages are necessary for every image analysis task;
which ones are needed depends on the application. The order of the stages can be
changed and some of the stages can be omitted for particular applications. For instance, edge
detection with a CMOS image sensor uses images captured by the CMOS image sensor and
performs image segmentation on the image, skipping image enhancement or filtering.
Based on the processing stages of the image analysis, on-chip image processing with CMOS
image sensors is focused on here. Ideally, on-chip image processing contains all the
processing stages of image analysis. However, it is not possible or necessary to design and
integrate all the processing circuits of the operation on a single chip. In order to understand
clearly what image processing operations are needed and how much image processing is
necessary for smart sensors, understanding and classification of these image analysis
stages are highly recommended. After all, choosing an appropriate algorithm for less power,
less area and faster speed is essential for integration with the CMOS image
sensor.
Although few, if any, of the vision chips are general-purpose [93] and many vision chips are
not programmable to perform different vision tasks, there are primary image processing tasks
needed for many applications. For example, image processing beyond image enhancement
and some segmentation requires large computational power and memory to store the data.
It is also application-oriented processing. However, image enhancement and filtering
are essential for many other image processing operations. Therefore, image enhancement and
filtering should be included in the early level image processing commonly
shared by general-purpose image processors.
In summary, on-chip image processing with CMOS image sensors is expected to follow in 
these two implementation directions: application specific operation, and primary tasks for 
general-purpose processing such as image enhancement and filtering. Image enhancement, 
filtering, and sometimes image segmentation, can also be applied to performance 
improvements of image sensors, which are commonly shared by general-purpose image
processors. However, application-specific on-chip image processing is likely to be the
dominant use for CMOS image sensors, because of the wide variety of applications and the 
large number of different design choices for the integration. 
4.4. Architectures for On-chip Processing Integration: How to Implement Smart 
Sensors? 
Now, we will investigate efficient architectures for implementing on-chip image processing 
with CMOS image sensors. In the next few sections, we will first look into the structures 
available for any signal processing integration on a single chip with image sensors. Then, we 
explore the nature of image processing algorithms in terms of image signals, processing
domain and operational region. 
We have seen vision algorithms of on-chip image processing with CMOS image sensors such 
as image enhancement, segmentation, feature extraction and pattern classification. These 
algorithms are frequently used in software-based operation, where structural implementation 
in hardware is not considered. Here, the main research interest focuses on how to integrate 
image processing (vision) algorithms with CIS or how to implement smart sensors in 
hardware, in terms of its system-level architectures and design methodologies. Here, we will 
first look at previous designs and implementations, focusing on their design structures and 
methodologies. 
4.4.1. Previous Work
There have been many reports involving the sensing and image processing on a single silicon 
chip, such as smoothing, edge detection, stereo processing, contrast enhancement, motion 
detection, video compression, discrete cosine transform and neural networks. These works
represent considerable effort, and some include revolutionary ideas. However, because they
are application-specific designs, their architectural and circuit-level designs are often
application oriented and do not have general applicability.
Several researchers have reported implementations that combine image processing with image
sensors. The first successful attempt to perform a low-level image algorithm, convolution by
a Gaussian filter, on a chip was carried out at Lincoln Laboratory in 1984, based on control
of the charge transfer mechanism [24]. Soon after this, updated and more powerful
versions of this algorithm and circuit were presented [25] [26]. After the initial attempts, the
detailed design and implementation of a CCD-based image processor, performing
two-dimensional filtering with programmable 8-bit digital spatial filters, followed [27].
This system represents a hybrid analog-digital architecture. Derived from the original
implementations, a more effective parallel-pipelined architecture for on-chip processing was
described in 1991 [28]. It was implemented for an edge detection algorithm and a
boundary-preserving image filter. A radial geometry, called the log retina, was introduced in
the early 1980s. This retina, based on the logarithmic mapping between the retina and cortex
in mammals, consists of concentric circles, each containing image sensors, with the pixel size
of the imager increasing linearly with eccentricity. The central part of the imager has a constant
resolution. Such an imager architecture has a number of advantages, such as emphasizing the
central part of the image and providing certain invariances for pattern recognition and motion
processing.
Other examples of image processing demonstrated in CMOS image sensors include motion
detection, spatial local filters, multiresolution, video compression, and neuron MOSFETs.
These on-chip image processing implementations are systematically designed for specific
applications, but do not provide an overview of their limitations and
implementation boundaries. The overview article by Fossum in 1989 [29] provides a
comprehensive treatment of solid-state imagers using analog CCD circuitry. Low, medium,
and high density detector arrays are discussed in terms of their implementation architectures,
and a pipeline-vector-pixel processor is described. The potential of on-chip read/write
analog frame memory for image transformation and frame-to-frame processing is also addressed.
However, partitioning architectures by circuit density (number of transistors per unit
area for a processing element) alone is not sufficient to provide a detailed and general
partition, because circuit density is not the only design specification that designers should
account for.
In this thesis, more generalized partitions for the architectural implementation of on-chip
image processing with CMOS image sensors are proposed. The partition includes not only
the circuit density, but also the nature of the image processing algorithms and the applications
for their focal plane integration with the sensors. We will look into the existing architectures of
focal plane integration and their feasibility with CMOS image sensors. We will also explore
the nature of image processing algorithms, including the operation of the algorithms and their
feasibility for imager focal plane implementation.
4.4.2. Types of Hardware Implementation
General architectures for signal processing, not necessarily image processing, on a single
chip with the image sensors are examined. It should be noted that this is a general
implementation structure for any signal processing for image sensors, such as on-chip ADC,
CDS and amplification. The basic components of a CMOS imager array, such as photodiodes,
shift registers, S/H and output buffers, are assumed to be independent of the implementation
structures. Architectures of focal plane integration are mainly divided into four different
processing structures: pixel, column, chip and frame memory processing. The location of the
signal-processing unit, also known as a Processing Element (PE), is the dividing factor
of these implementation structures, as shown in Figure 4.3.
Pixel processing consists of one processing element (PE) per image sensor pixel, as shown
in Figure 4.3 (a). Each pixel typically consists of a photodetector, an active buffer and a
signal-processing element. Pixel-level processing promises many significant advantages,
including high SNR and low power, as well as the ability to adapt image capture and processing
to different environments by processing during light integration. However, widespread use
of the design has been blocked by severe limitations on pixel size, low fill factor and the
restricted number of transistors in the PE.
In column-level processing, shown in Figure 4.3 (b), a PE is located at every column of
the imager array. Since images of the array are read row by row, the whole row is dumped
into the S/H concurrently and then transferred to the output in series, pixel by pixel. With this
typical readout mechanism of a CMOS image sensor array, column processing offers the
advantages of parallel processing, which permits low frequency processing and thus low power
consumption. Compared to pixel processing, the pixel suffers less from low fill factor
because the PE is moved out to the column, which increases the photosensitivity of the sensor.
Although there is a restriction on implementation area, particularly column width, the
implementation is relatively flexible because of the freedom in the vertical direction of the
columns. Still, due to the narrow column width, particularly as the pixel size shrinks,
designers do not have full flexibility in the area of the processing circuits.
Chip-level processing is one of the obvious integration methods due to its conceptual
simplicity and flexibility of design area. The PE is located at the serial output channel at the
end of the chip, as shown in Figure 4.3 (c). There are fewer restrictions on the implementation
area of the PE, leading to a high fill factor of the pixel and a more flexible design. However,
the operational speed of the PE becomes the bottleneck of the processing speed of the chip,
and therefore a fast PE is essential. The high speed of the PE potentially results in
high design complexity and high power consumption of the chip. Therefore, many
designers try to avoid using this structure unless the chip requires high design complexity.
Another structure of the implementation is frame memory processing. As shown in Figure
4.3 (d), a memory array with the same number of elements as the sensor is located below the
imager array. Typically, the image memory is analog frame memory, which requires less
design complexity, area, and processing time [30]. However, this structure consumes a
large area and large power, and has a high fabrication cost. In addition, the processed images
reach the output with a latency of one frame. Structures other than frame memory face
difficulty in implementing temporal storage. The frame memory is the most suitable structure
for iterative operation and frame operation, which are critical for some image processing
algorithms. As a summary, Table 3 lists the general descriptions and comparisons of
the hardware on-chip implementations with their advantages and disadvantages.

Figure 4.3. Structures of focal plane implementations with image sensors: (a) pixel
processing, (b) column processing, (c) chip processing, (d) frame memory processing.

Table 3. General descriptions and comparisons of hardware implementation structures,
with their advantages and disadvantages.

Pixel processing
  Advantages: slow processing, thus low power; low processing frequency; easy
    implementation of global and local adaptation; minimized parasitic effects
  Disadvantages: low fill factor; restricted size of PE; limited number of transistors
    in PE; limited programmability and precision; poor uniformity of PE; dark current
    and cross-talk

Column processing
  Advantages: flexible implementation in vertical directions; semi-parallel processing;
    low processing frequency, thus low power; high fill factor; less non-uniformity than
    pixel level implementation
  Disadvantages: restricted area of column width; limited size of mask (3x3); higher
    mismatch than chip structure; higher power than pixel structure; low uniformity of
    PEs in columns

Chip processing
  Advantages: small chip area; no limitations on PE design area; high fill factor;
    high uniformity
  Disadvantages: fast PE (high speed) is required; high complexity of PE; high power;
    no parallel processing; chip speed dependency

Frame memory processing
  Advantages: flexible operation; high fill factor; image storage
  Disadvantages: large chip area; latency of a frame; medium power; high fabrication
    cost; signal degradation in memory
4.4.3. Design Issues of Hardware Implementation
No single implementation structure is optimal for every implementation; the choice
is application-dependent, with one optimal structure for a given
application and specification. Here, we suggest specifications and design issues that should
be accounted for when deciding on an on-chip hardware implementation structure
for a given image processing application. These design issues include fill factor, processing
time, power, design area, speed, uniformity, dark current and cross-talk.
Fill Factor: Since, in the column, chip and frame memory level structures,
processing elements are separated from pixels in the array, circuit density is not a
limiting factor. However, circuit density plays an important role in pixel level
structures because it is inversely proportional to the fill factor, which is closely related to
the photosensitivity of the image sensors. Therefore, it is important to choose a
simple processing element with reasonable precision in the pixel processing structures.
However, as technology scales down, the number of transistors that can be
implemented in a pixel increases rapidly, according to the estimation of Figure 4.4.
[Plot: Transistors per pixel vs. Technology. Horizontal axis: Technology (um), from 0.35 down to 0.05.]
Figure 4.4. Number of transistors per pixel as a function of process technology. These
estimates are based on [33]. This figure plots the estimated number of transistors per
pixel with minimum transistor size (typically for digital) as technology scales, assuming
a 5 µm pixel with a constant fill factor of 30%.
[Plot: Fill Factor vs Number of Transistors.]
Figure 4.5. Fill factor for different numbers of transistors in a pixel with different
process technologies. The plot is estimated from Figure 4.4.
Figure 4.5 shows the relation between the fill factor and the number of transistors in a
pixel, which predicts the number of transistors achievable with a reasonable fill factor for a
given process technology. As technology scales down, there is more space available for
processing circuitry in a pixel, which encourages pixel level implementation.
Processing Time: Each implementation structure has a different processing time
requirement for its processing element, ranging from the integration time to the data sampling
period. The processing time is directly related to the power consumption of the components
and is typically associated with the design complexity. As a longer processing time is allowed
for a processing element, the complexity of the element decreases because the circuit
has a looser speed requirement. When an MxN array operates at S frames/second, each
structure has a different maximum processing time. With chip level structures,
the processing element must finish within one sampling (data) period, which here
is equal to 1/(S*M*N) seconds. In the column level structures, the maximum
processing time is equal to 1/(S*M) seconds, which is N times longer than in the chip level
structure. Meanwhile, the pixel level structures have a maximum processing time of 1/S
seconds. The frame memory structures have the same
maximum processing time as pixel level processing, yet with a necessary latency of
one image frame. An example of the comparisons for an MxN array with S
frames/second is shown in Table 4. Also, Figure 4.6 shows the maximum processing time
of a processing element for different sizes of array. As the size of the array increases,
the difference in the processing time for the different processing levels is clearly
illustrated in Figure 4.6; the maximum processing time for pixel level and frame
memory implementations remains constant, but those for column and chip level
structures decrease rapidly. No matter what size the array format has, the pixel level
implementation always gives a constant and relatively long processing time, while the
time requirements for the column and chip levels get tighter as the
array size increases.

Table 4. Numerical comparisons of hardware implementation structures for an MxN array
with S frames/second. Values in [] are based on a 1000x1000 array operating at 30
frames/second.

Structure               | Max. processing time | Power per PE                    | Total power
Chip-based              | 1/(S*M*N) [33 ns]    | ∝ (S*M*N)^2 C^2 [9x10^14 C^2]   | ∝ 1*(S*M*N)^2 C^2 [9x10^14 C^2]
Column-based            | 1/(S*M) [33 µs]      | ∝ (S*M)^2 C^2 [9x10^8 C^2]      | ∝ N*(S*M)^2 C^2 [9x10^11 C^2]
Pixel and Frame Memory  | 1/S [33 ms]          | ∝ S^2 C^2 [9x10^2 C^2]          | ∝ M*N*S^2 C^2 [9x10^8 C^2]

Figure 4.6. Maximum processing time available for the processing element for different
sizes of array, assuming a 30 frames/second frame rate for the image sensor arrays.
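The processing-time budgets described above can be checked numerically. The following is a minimal Python sketch, assuming M is the number of rows and N the number of columns; the function name `max_processing_time` and its dictionary keys are illustrative choices, not part of the original design.

```python
def max_processing_time(rows, cols, frame_rate):
    """Return the maximum time (in seconds) available to one processing element.

    rows (M), cols (N) and frame_rate (S) follow the notation of the text.
    """
    frame_time = 1.0 / frame_rate
    return {
        "chip": frame_time / (rows * cols),  # 1/(S*M*N): one PE handles every pixel
        "column": frame_time / rows,         # 1/(S*M): one PE per column, one row at a time
        "pixel": frame_time,                 # 1/S: one PE per pixel
        "frame_memory": frame_time,          # 1/S, with one frame of latency
    }


budget = max_processing_time(rows=1000, cols=1000, frame_rate=30)
for name, seconds in budget.items():
    print(f"{name:>12}: {seconds:.3e} s")
```

For the 1000x1000 array at 30 frames/second used in the text, the sketch gives budgets of roughly 33 ns, 33 µs and 33 ms for the chip, column and pixel levels respectively.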
Power: The power consumption of the processing elements is directly related to the
maximum processing frequency. Unlike their digital cousins, whose typical power
consumption is linearly proportional to the operating frequency, analog circuits
follow:
Power ∝ (Capacitance * frequency)^a
where a is around 1.5 to 2.
With chip level structures, the power consumption of each processing element is
proportional to (C*S*M*N)^a. With column level structures, it is proportional to
(C*S*M)^a, and with pixel level and frame memory level structures it is proportional to
(C*S)^a. The total power
consumption is the product of the power of each element and the total number of
elements in the chip. Therefore, the total power consumption of the pixel level
and the frame memory structures is proportional to (C*S)^a*M*N. It should be noted
that the calculation is based only on the processing element, not including image
acquisition. Typically the power consumption of image acquisition is proportional to
the product of (number of pixels)^a and (number of columns)^a, as shown in Figure 4.7.
Therefore, as the array size increases, the total power consumption of the chip
increases drastically due to both the processing elements and image acquisition. The total
power of the column level processing structure is proportional to (C*S*M)^a*N. The chip level
structure has the same total power as that of one processing element because it has
only one processing element in the chip. Figure 4.8 shows the total power of the
system (not including image acquisition) for different sizes of the array and for the
different processing levels. As the size of the array increases, the total power
consumption increases drastically because of the non-linear relationship with the
array size. It is clear that the pixel and column level implementations save power
as the array size increases, compared to the chip level implementation.

[Plot: Power Consumption of Image Acquisition. Horizontal axis: Format (k-pixels).]
Figure 4.7. Power consumption (excluding processing element) for different array
sizes. This power consumption is only for image acquisition.

Figure 4.8. Power consumption (excluding image acquisition) for different array sizes
and different processing levels, assuming a 30 frames/second frame rate for the image
sensor arrays.
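The scaling argument above can be sketched in a few lines of Python. This is only an illustration of the proportionalities stated in the text, in arbitrary units; the function name `relative_total_power`, the default capacitance of 1.0 and the default exponent a = 2 are assumptions made for the example.

```python
def relative_total_power(rows, cols, frame_rate, capacitance=1.0, a=2.0):
    """Relative total processing-element power per structure (arbitrary units).

    Each element consumes (C * f)^a, where f is its processing frequency,
    and the total is that value times the number of elements.
    """
    s, m, n = frame_rate, rows, cols
    per_element = {
        "chip": (capacitance * s * m * n) ** a,
        "column": (capacitance * s * m) ** a,
        "pixel": (capacitance * s) ** a,  # also applies to frame memory
    }
    element_count = {"chip": 1, "column": n, "pixel": m * n}
    return {name: per_element[name] * element_count[name] for name in per_element}


power = relative_total_power(rows=1000, cols=1000, frame_rate=30)
for name, value in power.items():
    print(f"{name:>6}: {value:.1e} (relative)")
```

With a = 2 and a 1000x1000 array at 30 frames/second, the relative totals are 9x10^14 (chip), 9x10^11 (column) and 9x10^8 (pixel), which is the ordering the text describes.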
Design Area: The total design area of the chip is an important issue because it is
closely related to the fabrication cost. The frame memory consumes the largest design
area because of the separate storage for a frame of image in the chip, whereas the chip
level structure typically consumes the least area, with one relatively big and complex
processing element. The column level structure places one long, narrow processing
element per column below the imager array, with only a
slight increase in the chip size. The pixel level structure, under the assumption that
the same size of processing element is used as in all other structures, has the second largest
area consumption, following the frame memory structure. Yet the processing
elements in the pixel level structure can be relatively small because of the long processing
time available, and they must be small, since otherwise the photosensitive area of the pixel is
drastically reduced. The typical processing element in the pixel level
structure is therefore small, resulting in a relatively small increase in the chip size.
Speed dependency: The speed of the imager chip is determined by the slowest
component in the data path (the bottleneck of the output channel). In most cases, the
output amplifiers are the bottleneck in the output data path because of the heavy
output loads. Because, in the pixel, column and frame memory level structures, the
processing elements have a relatively longer processing time than the data output period,
the output amplifiers are more likely to be the bottleneck of the chip speed. In
contrast, because the processing element in the chip level structure must have the
same processing speed as the output data rate, the output amplifier might not be the
bottleneck of the chip speed. Instead, the processing element becomes the bottleneck.
Therefore, the design of high-speed processing elements with reasonable power
consumption becomes critical in the chip level structures.
Uniformity: As processing elements are spread all over the image sensor array,
uniformity of the processing elements becomes an important design issue, especially for
pixel level and frame memory implementations. Just as FPN is a critical design factor for
regular image sensor arrays, uniformity will be an important factor for smart
sensors. Even for column level implementations, uniformity cannot be neglected
because of the non-uniformity across the columns. However, the chip level structure
will not suffer from non-uniformity of processing elements. As technology scales
down, uniformity is expected to increase due to the reduction of the body effect
coefficient (γ) [86].
Dark Current and Crosstalk: As with uniformity, dark current and crosstalk
will be greater for pixel and frame memory implementations than for column and chip level
implementations. However, these can be reduced by careful circuit design, such as
guard rings and separate power supplies, and by advanced process technology with low
dark current.
4.4.4. Types of Image Processing Algorithms
Conventional approaches to hardware implementation of on-chip image processing are
organized by the density of the circuit [29]. In addition to circuit density, designers
should consider the nature of the image processing (vision) algorithms for the on-chip
implementations. Often, for on-chip image processing (smart sensors), the nature of the
vision (image processing) algorithms is overshadowed by the circuit density, mainly due to
the reduction of fill factor and the resulting reduction of photosensitivity and resolution.
However, it is sometimes necessary and reasonable to sacrifice fill factor to gain operational
performance for given vision algorithms. After all, both the circuit density and the nature of
the processing algorithm should be considered when integrating smart sensors. Here, we will
investigate and discuss the nature of the image processing (vision) algorithms that can be
integrated on smart sensors. The nature of image processing algorithms can be categorized in
terms of signal type, processing domain and operational region.
A. Signal Types: Analog vs. Digital Processing Elements
Broadly, any signals can be divided into analog and digital, including the image signals. The 
smart sensors focus on analog VLSI implementations even though hardware implementation 
Although the integration of image sensing and analog processing has proven to be very
attractive, it also has some limitations. These limitations of analog circuits in smart sensors
are well described and well argued [77]. The limitations are as follows:
Programmability (Flexibility): Analog circuits are designed to perform very
specific tasks, unlike digital computers (and DSPs), which can be programmed to
perform any logical or numerical operation. On the other hand, for many applications
where only specific tasks are of interest, excessive and expensive digital
computers (and DSPs) with good programmability are not needed. Even for
high-level processing, a combination of a smart sensor without high programmability and a
DSP is recommended, because the smart sensor can remove many stages of (time- and
power-consuming) computation in the algorithm processing, which would
otherwise have been computed by the digital computers. Nevertheless, digital computers and
DSPs are preferable for developing and evaluating new image processing (vision)
algorithms.
Precision: Analog circuits often suffer from fabrication inhomogeneities, offset
currents, lithographic mismatches and other factors that lower the precision.
Therefore, analog smart sensors will have lower precision than their digital cousins.
Typically, analog circuits have only 7 to 8 bits of precision, whereas their digital
counterparts have 12 to 16 bits. However, biological systems such as the human vision system
process data with at most 100 gray levels, which 7 bits can cover.
Yet with such low accuracy, humans obtain amazing performance.
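The bit count quoted for 100 gray levels can be verified with a one-line calculation. The helper name `bits_required` below is hypothetical and used only for this illustration.

```python
def bits_required(levels):
    """Smallest number of bits that can encode `levels` distinct gray levels."""
    # (levels - 1).bit_length() is an exact integer form of ceil(log2(levels)).
    return (levels - 1).bit_length()


human_vision_bits = bits_required(100)  # 2**6 = 64 < 100 <= 128 = 2**7
print(human_vision_bits)
```

Six bits provide only 64 levels, so 7 bits is the smallest word length that covers 100 levels, which is within the 7 to 8 bits typical of analog circuits.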
B. Operational Domain: Frequency vs. Spatial Domain
Often, image processing algorithms transfer the processing domain of the input image from
spatial to frequency for easier manipulation and calculation. The foundation of frequency
domain techniques is the convolution theorem. Many image processing algorithms,
especially localized image processing operations, use convolution in the spatial domain, which
is transformed to multiplication in the frequency domain by the Fourier transform, where
the multiplication is relatively easier to manipulate and implement than convolution.
Operations in the frequency domain are more effective and easier to understand. However,
image processing in the frequency domain requires Fourier transform elements, which
are typically complex circuit designs. In particular, on-chip image processing in the frequency
domain must contain an ADC, a Fourier transform and a digital processor along with the CMOS
image sensors, resulting in a large area and high design complexity. This is one of the reasons
why general-purpose on-chip digital image processors integrated with image sensors are still rare.
Rather, the processing domain of on-chip image processing is restricted to the spatial domain
because of its relative ease of implementation and because it avoids the expensive Fourier
transform. In this thesis, therefore, the focus of the implementation rests on analog on-chip
image processing in the spatial domain.
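The convolution theorem mentioned above can be illustrated with a small numerical check. The sketch below is a one-dimensional simplification using a naive discrete Fourier transform and circular convolution, written only to show the equivalence; the function names and sample values are assumptions for the example, and a real image would use the two-dimensional transform.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]


def circular_convolve(x, h):
    """Circular convolution computed directly in the spatial domain."""
    n = len(x)
    return [sum(x[j] * h[(k - j) % n] for j in range(n)) for k in range(n)]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5, 0.0, 0.0]  # two-tap averaging filter

spatial = circular_convolve(signal, kernel)
product = [a * b for a, b in zip(dft(signal), dft(kernel))]
frequency = [value.real for value in idft(product)]

print(spatial)
print([round(v, 6) for v in frequency])
```

Both paths give the same filtered sequence, but the frequency-domain path needs a forward and an inverse transform, which is the hardware cost the text refers to.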
C. Operational Region: Point, Local and Global Operation 
Now, the image processing algorithms are separated in terms of the interconnectivity of
neighboring pixels. The interconnectivity (region of operation) in the spatial domain plays an
important role in the implementation of on-chip image processing because the connection
routing to the neighboring pixels is sometimes more crucial than the circuit density of the
processing element. Therefore, the implementation of smart sensors should consider the
connectivity of the neighbors.
Image processing techniques can be separated, by connectivity to the neighboring pixels,
into point operation, local operation and global operation, as shown in
Figure 4.9. Point operation is an image processing method that is based only on the intensity
of single pixels. It modifies the gray level of a pixel independently of the nature of its
neighbors; each pixel is modified according to a particular equation that is not dependent on
other pixel values. In local operation, each pixel is modified according to the values of the
pixel's neighbors (typically using convolution masks). Spatial filters typically use local
operation with convolution masks. Global operation is a type of image processing where all the
pixel values in the image are taken into consideration for the determination of the final value.
Spatial domain processing methods include all three types, but all the frequency domain
operations, by nature of the frequency (and sequence) transforms, are global operations. Of
course, a frequency domain operation can become a local operation, based only on a local
neighborhood, by performing the transform on small image blocks instead of the entire image.
However, this is a special case, since the frequency domain operation needs the Fourier
transform, which is already considered a global operation.
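The three regions of operation can be contrasted with a short Python sketch. The specific operations chosen here (a negative as the point operation, a 3x3 mean as the local operation, and a min/max stretch as the global operation) and all function names are illustrative assumptions, not the thesis's implementations.

```python
def point_operation(image, transfer):
    """Point operation: each output pixel depends only on the same input pixel."""
    return [[transfer(p) for p in row] for row in image]


def local_mean_3x3(image):
    """Local operation: each output pixel is the mean of its 3x3 neighborhood.

    Indices are clamped at the borders (an assumption made for this sketch).
    """
    rows, cols = len(image), len(image[0])

    def clamp(i, limit):
        return min(max(i, 0), limit - 1)

    return [
        [
            sum(image[clamp(y + dy, rows)][clamp(x + dx, cols)]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def global_stretch(image, max_level=255):
    """Global operation: every output pixel depends on the min and max of the whole image.

    Assumes the image is not constant, so that hi > lo.
    """
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    return [[(p - lo) * max_level / (hi - lo) for p in row] for row in image]


image = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
negative = point_operation(image, lambda p: 255 - p)
smoothed = local_mean_3x3(image)
stretched = global_stretch(image)
```

The data each function reads mirrors the routing requirement in hardware: none for the point operation, nearest neighbors for the local operation, and the entire array for the global operation.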
In the following chapters, the nature of the image processing algorithms is investigated in
terms of their interconnectivity to neighboring pixels, and corresponding structures for
implementing the algorithms are proposed. Effective architectures for on-chip image
processing with CMOS image sensors will be studied, particularly analog image processing
in the spatial domain. Furthermore, the nature of on-chip image processing and its architectural
implementations will be investigated in terms of the operational regions (interconnectivity)
described above. The vision algorithms under the same interconnectivity are subdivided by
implementation design and functional operation, and an effective architecture is proposed
for each subdivided algorithm. The characteristics of each image processing operation and
the suitable architectural implementations for its integration will be
investigated in detail in terms of interconnectivity.

(a) Point operation (b) Local operation (c) Global operation
Figure 4.9. Image operation divided by regions of operation: point operation,
local operation and global operation.
Chapter V 
5. Point Operation 
5.1. Introduction 
Among the simplest of all image enhancement techniques, some fairly straightforward, yet
powerful, processing approaches can be formulated with light intensity (gray level)
transformations alone. Because enhancement at any point in an image depends only on the
gray level at that point, the techniques in this category are often referred to as point
operations. The final output value is spatially independent of other pixel values and
depends only on the pixel value, typically the gray level, at that point.
With respect to on-chip integration with image sensors, point operations offer a number of
advantages, such as parallel processing during integration, real time operation, slow
processing elements, low power consumption, simplicity of design, and small silicon area. In
addition, because point operation is feasible for pixel-level implementations, high SNR,
low power and concurrent adaptive processing can be easily achieved with the pixel level
implementation. However, there is a limitation on the number of transistors inside the pixel
due to the restricted size of a pixel with a reasonable fill factor (see Figure 4.5). Point
operations are still low-level image processing, and thus it is assumed that further signal and
image processing stages can acquire the image output and process it.
In order to understand the nature of point operation and to find the relationship between
algorithms and system-level architecture, we will look into the major algorithms of point
operation and divide these operations by similarity of functional processing. These point
operations are categorized by their operational nature into three major groups: concurrent
pixel processing (intensity transformation), histogram processing, and inter-frame
processing. Examples of the point operation algorithms will be described shortly. These
examples include the major algorithms for each operation, but do not contain all the possible
algorithms in the category.
A. Concurrent Pixel Processing
Concurrent processing is an image-processing algorithm that operates on only a particular
pixel value, independent of any other pixels. Not only is it spatially independent of other
pixel values, it is also temporally independent of its own pixel value, which means the
present value of a pixel is not affected by the previous or future values of the pixel.
Because all the processing in these operations modifies/transfers the light intensity of the input
image, keeping a constant relationship between inputs and outputs for the whole array, this
process is also called intensity transformation. The output value of a pixel is determined by
the input value of the pixel, according to the intensity transfer (response) function, S = T(r).
The operation may be processed concurrently during light integration. Examples of the
point operation algorithms, well described in [74][75], include:
Image negatives: The technique is to reverse the order of light intensity values, from black
to white.
(a) Intensity transfer (response) function (b) Original image (c) Image negative, simulated with LviewPro
Figure 5.1. Image processing of image negative is to reverse the order from black to
white. Intensity response and software (Lview Pro) simulated sample images are
illustrated.
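As a minimal sketch of the image negative, the transfer function can be written as S = (L - 1) - r for an image with L gray levels. The function name `negative` and the default of 256 levels are assumptions for the example.

```python
def negative(r, levels=256):
    """Image negative: reverse the order of gray levels, so black maps to white."""
    return (levels - 1) - r


row = [0, 64, 128, 255]
negative_row = [negative(r) for r in row]
print(negative_row)
```

Applying the function twice returns the original value, which reflects that the negative only reverses the order of the levels and loses no information.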
Contrast Stretching: Poor illumination environments and settings often cause low contrast
images. The resulting narrowly distributed pixel values of low-contrast images can be
expanded into a wide intensity distribution, increasing the dynamic range and thus the contrast
of the images. One example of a typical contrast stretching transformation is shown in Figure
5.2. By increasing the slope of the intensity transfer function where a large portion of the
pixel value distribution is located, the contrast of the input image can be increased.
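One common way to realize such a transfer function is a piecewise-linear curve through two control points. The sketch below assumes control points (r1, s1) and (r2, s2) with 0 < r1 < r2 < max_level; the function name, the parameter names and the example values are illustrative and are not taken from Figure 5.2.

```python
def contrast_stretch(r, r1, s1, r2, s2, max_level=255):
    """Piecewise-linear contrast stretching through (r1, s1) and (r2, s2).

    Assumes 0 < r1 < r2 < max_level. The middle segment has a slope
    greater than 1 when (s2 - s1) > (r2 - r1).
    """
    if r < r1:
        return s1 * r / r1
    if r <= r2:
        return s1 + (s2 - s1) * (r - r1) / (r2 - r1)
    return s2 + (max_level - s2) * (r - r2) / (max_level - r2)


# Stretch the input range 80..170 over the output range 30..220.
stretched_values = [contrast_stretch(r, 80, 30, 170, 220) for r in (0, 80, 125, 170, 255)]
print(stretched_values)
```

With these example control points the middle segment has a slope of 190/90, about 2.1, so gray levels between 80 and 170 are spread over a wider output range while the darkest and brightest levels are compressed.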
(a) Intensity transfer (response) function (b) Original image (c) Contrast stretching, simulated with LviewPro
Figure 5.2. Contrast stretching technique stretches the intensity response line so that
the slope of the response line in the region of interest gets steeper and the appearance of
interest in an image is emphasized with a higher contrast.
Compression of dynamic range: With a given range of output pixel values, a transfer
function can accept a wider range of input pixel values, thus increasing the dynamic range of
the light intensity. An effective way to compress the dynamic range of pixel values is to
perform the logarithmic intensity transformation with the following transfer function:
S = c log(1 + |r|),
where c is a scaling constant.
(a) Intensity transfer (response) function (b) Original image (c) Image compression, simulated with LviewPro
Figure 5.3. With a given range of output pixel values, output values can have a wider
range of input pixel values by compression of the pixel values.
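The logarithmic transfer function can be sketched as follows. The choice of the scaling constant c, which here maps the largest input onto the largest output level, is an assumption made for the example; the text only states that c is a scaling constant.

```python
import math


def log_transform(r, max_input=255, max_output=255):
    """Logarithmic intensity transformation S = c * log(1 + |r|).

    c is chosen (as an assumption) so that max_input maps to max_output.
    """
    c = max_output / math.log(1 + max_input)
    return c * math.log(1 + abs(r))


compressed = [log_transform(r) for r in (0, 15, 255)]
print([round(s, 1) for s in compressed])
```

With this choice of c, an input of 15 already maps to half of the output range, which shows how dark input values are expanded while bright values are compressed.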
Gray level slicing: This is often used to highlight a specific range of light intensity in an
image. One technique is to assign a high value to all gray levels in the range of interest and a
low value or unity value to all other gray levels, as shown in Figure 5.4.
Figure 5.4. Gray level slicing is a technique highlighting a specific range of gray
levels in an image by displaying high values for the region of interest and low values for
all other gray levels.
Bit-plane slicing: Instead of highlighting intensity ranges, highlighting specific bits
might be desired in order to discriminate the contribution of individual bits to the total image
appearance. In 8-bit images, only the five highest-order bits contain visually significant data.
The other bit planes contribute to more subtle details in the image [74]. Depending on what
data is emphasized, an individual bit can be selected and highlighted with bit-plane slicing.
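Both slicing operations can be sketched in a few lines of Python. The function names, the flat list representation of an image, and the default high and low output values are assumptions chosen for illustration.

```python
from typing import List


def gray_level_slice(pixels: List[int], lo: int, hi: int,
                     high_value: int = 255, low_value: int = 0) -> List[int]:
    """Return high_value for gray levels in [lo, hi] and low_value elsewhere."""
    return [high_value if lo <= p <= hi else low_value for p in pixels]


def bit_plane(pixels: List[int], bit: int) -> List[int]:
    """Extract one bit plane as 0/1 values (bit 0 is the least significant bit)."""
    return [(p >> bit) & 1 for p in pixels]


if __name__ == "__main__":
    image = [10, 100, 129, 200]
    print(gray_level_slice(image, 90, 150))
    print(bit_plane(image, 7))  # most significant bit of an 8-bit image
```

Passing the original pixel value instead of a constant `low_value` would give the variant that preserves the background, at the cost of one more branch per pixel.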
B. Histogram Processing 
The second type of point operation is histogram processing. Histogram processing techniques
modify the output image by modifying the histogram of its gray levels
through the transformation function. The gray-level histogram of an image is the distribution
of the gray levels in the image. In general, an image whose histogram has a small spread has low
contrast, and one whose histogram has a wide spread has high contrast; an image with its histogram
clustered at the low end of the range is dark, while a histogram with values clustered at the
high end of the range corresponds to a bright image [75]. Histogram processing can vary
from simple mapping functions, which can stretch, shrink (compress), or slide the histogram,
to more complicated algorithms that require detailed analysis of its probability density
functions such as histogram equalization and histogram specification.
Histogram equalization (linearization): Histogram equalization is a popular technique for
improving the appearance of a poor image. It is similar to a histogram stretch but it generally
produces more effective results. This technique is based on obtaining a uniform
histogram, where the histogram of the resultant image is as flat as possible. The theoretical
basis for histogram equalization involves probability theory, where the histogram is treated
as the probability distribution of the gray levels [74].
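A minimal software sketch of histogram equalization is given below. It uses the standard textbook formulation (a lookup table built from the cumulative histogram), which is assumed here for illustration; the function name and the integer rounding rule are choices of this example.

```python
from typing import List


def equalize(pixels: List[int], levels: int = 256) -> List[int]:
    """Equalize gray levels by mapping each pixel through the scaled cumulative histogram."""
    # Step 1: histogram of the gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Step 2: cumulative histogram (an unnormalized cumulative distribution).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Step 3: lookup table scaling the cumulative count to the output range
    # (integer floor division is an assumption of this sketch).
    n = len(pixels)
    lut = [(levels - 1) * c // n for c in cdf]
    return [lut[p] for p in pixels]


if __name__ == "__main__":
    # A narrow cluster of levels is spread over most of the output range.
    print(equalize([100, 100, 101, 102]))
```

The two stages visible in the code, accumulating the histogram and generating the mapping table from it, are the reason the operation needs access to all pixel values before any pixel can be transformed.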
Histogram specification: Since histogram equalization is capable of generating only one
result (an approximation to a uniform histogram), it is not an interactive image enhancement
method. Histogram specification instead specifies a particular histogram shape,
highlighting certain gray-level ranges in an image. Because it has the flexibility of selecting
certain gray-level ranges, it can generate a more visually appealing image and
can be superior to histogram equalization.
C. Inter-frame Processing
The third type of point operation is inter-frame processing, where the intensity level of a pixel
is modified independently in space, but not in time. In order to calculate the final values of
the pixels, the processing needs multiple frames of images, with at least two frames of input
images, containing time dependency. Examples of the processing include:
Image subtraction: The difference between two images is computed as the difference
between all pairs of corresponding pixels from the two images. Image subtraction is
often used in motion detection, radiography, feature extraction and background subtraction.
Image averaging (multi-image averaging): Under the assumptions that the noise is
uncorrelated and has zero average value, averaging multiple images reduces the noise of the
image by a factor of sqrt(N), where N is the number of frames. By storing pixel values of previous
images in a frame memory, the average values of several images can be computed with a lower noise
level.
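The two inter-frame operations can be sketched as follows. The frame representation (flat lists of pixel values) and the synthetic Gaussian noise used in the demonstration are assumptions of this example; the signed difference is used for subtraction, and an application may instead take its absolute value.

```python
import random
import statistics
from typing import List


def subtract_frames(current: List[float], previous: List[float]) -> List[float]:
    """Pixel-wise signed difference between two frames of equal size."""
    return [c - p for c, p in zip(current, previous)]


def average_frames(frames: List[List[float]]) -> List[float]:
    """Pixel-wise mean over N frames of equal size."""
    n = len(frames)
    return [sum(values) / n for values in zip(*frames)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demonstration is repeatable
    # Constant scene at level 100 with zero-mean, uncorrelated noise (sigma = 8).
    frames = [[100.0 + rng.gauss(0.0, 8.0) for _ in range(2000)] for _ in range(16)]
    single = statistics.pstdev(frames[0])
    averaged = statistics.pstdev(average_frames(frames))
    # With N = 16 the ratio is expected to be close to sqrt(16) = 4.
    print(round(single / averaged, 2))
```

Both functions need the previous frame(s) to be available when the current frame arrives, which is the storage requirement discussed for inter-frame processing.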
5.2. Comparisons between On-chip Implementations for Point Operation
We have divided point operations into three different types in terms of their processing 
characteristics: concurrent pixel processing (intensity transformation), histogram processing 
and inter-frame processing. When these point operations are integrated with CMOS image
sensors on a single chip, these characteristics of the processing should be taken into
consideration for system-level architecture and circuit designs of on-chip processing
integration. Here, we study on-chip implementations of the point operation. The three types
of point operation are investigated at different implementation levels of on-chip processing:
pixel, column, chip and frame memory. General system-level architectures are discussed and
different integration methodologies for each type are compared.
Concurrent Pixel Processing
Concurrent processing (intensity transformation) can comprise an intensity transformer
(linear/non-linear amplifier) with controllability in pixel, column, chip and frame memory
processing. Concurrent processing, compared to histogram processing and inter-frame
processing, has a wide choice of implementations. Although the concept of the design seems
to be simple, the actual design of an amplifier with good controllability or programmability
is not straightforward.
Concurrent processing, integrated at pixel level, provides a number of advantages: parallel 
processing, processing during integration, high SNR, low frequency processing, low power, 
and adaptation of image signals and processing. Its main attraction is parallel processing 
during integration. Parallel processing during integration gives great flexibility of operation 
as well as local and global adaptation. Parallel processing permits more time for processing 
because typically integration time for input image (light) is much longer than the processing 
time. This slow processing frequency results in low power consumption, particularly for 
analog-intensive designs, because the power in analog processing is typically proportional to 
capacitance and the operational frequency squared. However, because a pixel requires a
reasonable fill factor for good photosensitivity, only a small portion of the pixel area is
reserved for processing circuits. Therefore, implementations at pixel level have severe
limitations on pixel size and the number of transistors that can be practically used in a pixel.
Concurrent processing at the column level has more freedom in design area than at pixel
level. Column level implementations are still restricted by column width, but typically
not in the vertical direction. Therefore, more flexible circuit designs can be implemented and
more programmability can be added. Column level implementations maintain parallel
processing, thus resulting in low processing frequency and low power consumption (but
higher than pixel level implementations).
Concurrent processing at the chip level, where an intensity transformer is located at the final 
serial output channel, consumes the smallest area and has the highest flexibility in circuit 
design and control. However, this requires a fast processing speed with a high bandwidth, 
and often results in high power consumption. 
Concurrent processing with frame memory (typically analog memory) locates all the
processing circuits apart from the image sensor array (e.g. below the image sensor array) and
results in a higher fill factor. However, because concurrent processing is independent of
spatial and temporal differences in the pixels (e.g. it does not need multiple frames of
images), analog memory becomes an impractical implementation, often incurring high power,
large area, and high fabrication cost. Therefore, analog memory implementation for
concurrent processing is not recommended unless special applications are needed.
From the above comparisons, pixel and column level implementations are recommended for 
system-level architectures and circuit designs in concurrent processing. Particularly because 
the point operation does not have any interconnections to the neighboring pixels, pixel level 
implementation is strongly recommended if a small number of transistors can implement the
necessary operation. Pixel and column level implementations, with the benefit of parallel
processing, can save power and have flexible designs in processing circuits. 
The implementation of histogram processing consists of a histogram generator at chip level
and intensity transformers with pixel, column, chip or analog memory implementations.
Because the intensity transfer function of histogram processing is generated according to the
histograms of input images, the histogram generator becomes an important component,
located at a common output channel to collect all the pixel values. Therefore, histogram
generation is, strictly speaking, a global operation, but it is closely related to point operation.
The histogram generator constitutes the major difference from simple
concurrent processing, in which the intensity transformation is
predetermined or manually programmed: histogram processing uses an intensity transformer
programmed from data derived by the histogram generator. Histogram processing has
many similarities to concurrent processing due to its intensity transformer, which can be
implemented at pixel, column, chip or frame memory levels. Therefore, histogram processing
has the same architectural design benefits and drawbacks as concurrent processing.
The last type of point operation, inter-frame processing, needs the present pixel values and the
pixel values of the previous frame at the same time. This processing is independent in
space, but not in time. Inter-frame processing, by the nature of the operation, has
correlations with pixel values in time. Therefore, it requires storage (typically analog storage)
of pixel values for at least one frame interval. During the integration of a frame, the
previous frame should be stored until the present image is captured and the necessary
operations are completed on these frames. Because the design of frame memory at
column and chip level faces severe implementation difficulty, pixel level and analog
frame memory structures are recommended for inter-frame processing. However, for
pixel processing, the storage (typically capacitance for analog memory) easily takes a large
portion of the pixel area, reducing its fill factor and thus the photodetector photosensitivity.
An analog memory structure has a large storage area without affecting the fill factor of the
photosensitive pixels. However, this structure may have high power consumption, high
fabrication cost and, most likely, high design complexity. Therefore, the choice of
implementation for inter-frame processing depends on the applications and user-defined
specifications. If the specifications focus on low power and low fabrication cost, the pixel
level implementation is recommended. If functionality and programmability of the chip are to
be emphasized, the analog memory structure is proposed as the basis of the
implementation. As a summary, the general descriptions and comparisons of point operation
implementations are given in Table 5.








... (high SNR, low power, adaptation, ...
Flexible design in vertical directions, but still limited by column width
High flexibility on design area and functionality of PE, but high speed requirement, high ...
... but sacrificed with area, power, speed
Since information is extracted from all pixels, each pixel ...
Each pixel should have its own memory in pixel, thus limited by pixel size
Not feasible because each pixel should be ...
... capture, a frame is stored outside, not limited by pixel size
Table 5. General descriptions and comparisons of point operation implementations for
different types of the point operation.
5.3. Design of In-pixel Contrast Stretching
5.3.1. Introduction
Low-contrast images can result from poor illumination, lack of dynamic range in the imaging
sensors, or even the wrong setting of a lens aperture during image acquisition. Since contrast
plays a critical role in the overall quality of images, it is necessary to ensure that the output image
contains the appropriate contrast. In this thesis, as a part of image enhancement methods,
especially for portable devices such as video cellphones, PDAs and toys, a simple but
effective design of image contrast enhancement is investigated and designed. Since this smart
sensor is intended to be embedded in portable devices, the design focuses on low power and
lightweight battery operation as well as the effective operation of contrast enhancement.
The idea behind contrast enhancement is to increase the dynamic range of the gray levels in
the image being processed and to obtain a widely spread image histogram.
Contrast enhancement techniques range from simple
contrast stretching to complex histogram equalization. Contrast stretching is a simple, yet
powerful contrast enhancement technique, which will be dealt with in this thesis. This
technique is used to produce an image of higher contrast than the original with a transfer
function like the one in Figure 5.5. The output ranges for inputs r below m, shown in Figure
5.5, are compressed, making this output range smaller than the original. The output ranges
around m are stretched, which makes the output ranges larger than the original. The output
ranges above m are compressed like the ranges below m. Interestingly, in an extreme case of
contrast stretching, the transformation function produces a two-level (binary) image.
Figure 5.5. Gray-level intensity transformation function for contrast
enhancement (axis label: Input Image (r)).
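The behavior of such a transfer function can be illustrated with a short Python sketch. The smooth form s = 1 / (1 + (m / r)^E) used here is one commonly cited textbook contrast stretching function and is an assumption of this example, not the function plotted in Figure 5.5; the parameter names `m` and `E` and the normalized range [0, 1] are likewise illustrative.

```python
def stretch(r: float, m: float = 0.5, E: float = 4.0) -> float:
    """Contrast stretching of a normalized gray level r in [0, 1] around the midpoint m.

    Assumed form: s = 1 / (1 + (m / r)**E). Larger E gives a steeper slope at m.
    """
    if r <= 0.0:
        return 0.0  # limit of the expression as r approaches 0
    return 1.0 / (1.0 + (m / r) ** E)


if __name__ == "__main__":
    for r in (0.25, 0.45, 0.50, 0.55, 0.75):
        print(r, round(stretch(r), 3))
    # A very large exponent approaches a two-level (binary) mapping.
    print(round(stretch(0.49, E=200.0), 3), round(stretch(0.51, E=200.0), 3))
```

Levels below m are pushed toward 0, levels above m toward 1, and the slope at m exceeds unity, which matches the compress, stretch, compress behavior described for Figure 5.5.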
Histogram equalization is a more complex and powerful contrast enhancement technique
than contrast stretching. Histogram equalization modifies the histogram of an image
to make the histogram as flat as possible, by mapping each pixel value through the cumulative
distribution function derived from the histogram of the image. In terms of contrast enhancement, this
technique increases the dynamic range of the image, which has a considerable effect on the
appearance of the image.
Histogram equalization needs two main computations: cumulative density calculation
and transformation function generation.
A common component of these contrast enhancement techniques is the transfer function.
Indeed, the transformation function is a common processing component in almost every point
operation. The transformation function, also called the gray-level mapping function, is
typically linear (nonlinear equations can be modeled by piecewise linear models) and maps
the original gray-level values to other specified values. Now, we will study how we can use
these intensity mapping functions for contrast enhancement operations.
5.3.2. Intensity Transformation Function
In the previous section, we have briefly discussed the operation of intensity transfer functions.
The detailed operation of the function will be discussed in this section. Image processing
functions in the spatial domain may be expressed as
$g(x, y) = T[f(x, y)]$          Equation 5.3.1
where f(x, y) is the input image, g(x, y) is the processed image, and T is an operation on f. In
the intensity computation, the transformation function takes the simplest form of
$s = T(r)$          Equation 5.3.2
where, for simplicity in notation, r and s are variables denoting the gray level of f(x, y) and
g(x, y) at any point (x, y). This simple transformation function becomes the basis of point
operation in image processing. The mapping function not only represents the relation
between input and output images, but also has a considerable effect on the appearance of an
image through three main operations: contrast adjustment, brightness adjustment and gamma
adjustment. Here, the operation of the intensity transformer, along with its simulated effect on
the appearance of images, will be discussed.
The contrast of an image is adjusted by changing the slope of the mapping function. Figure
5.7 illustrates how the slope of the mapping function affects the appearance of an image as
well as its histogram. The original image has a linear relation of unity slope between the input
intensity and output intensity, as shown in Figure 5.6. As the slope of the mapping function
gets steeper, the contrast of the image gets higher, as shown in Figure 5.7. Eventually, when the
slope becomes significantly high, the appearance of the image becomes more like a binary
image, shown in Figure 5.7 (c). As the slope gets steeper, the histogram of the image gains
more spread in its distribution, which indicates higher contrast. As a general rule,
a histogram with a narrow shape indicates little dynamic range and thus corresponds to an
image having low contrast. A histogram with a significant spread corresponds to an image
with high contrast.
The brightness of an image can also be adjusted by the minimum or maximum value of the
output intensity in the transformation function. For example, in Figure 5.8, as the minimum
value of the outputs in the mapping function increases, the appearance of the image gets
brighter, becoming an almost white image in Figure 5.8 (c). The obvious observation from
the histogram is that the minimum value of the histogram distribution increases as the
minimum value of the mapping function increases.
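The contrast and brightness adjustments can be sketched as a clipped linear mapping. This is a software illustration only, under the assumptions that contrast is set by the slope about a pivot gray level and brightness by the output floor `s_min`; the parameter names and the 8-bit range are hypothetical and are not taken from the Matlab simulations.

```python
def linear_map(r: float, slope: float = 1.0, s_min: float = 0.0,
               pivot: float = 128.0, s_max: float = 255.0) -> float:
    """Clipped linear gray-level mapping.

    slope  -- contrast control: a steeper line spreads the histogram.
    s_min  -- brightness control: raising the output floor brightens the image.
    pivot  -- input level left unchanged by the slope (assumed mid-gray).
    """
    s = slope * (r - pivot) + pivot
    return max(s_min, min(s_max, s))


if __name__ == "__main__":
    levels = [100, 120, 140, 160]
    print([linear_map(r) for r in levels])                          # unity slope
    print([linear_map(r, slope=4.0) for r in levels])               # higher contrast
    print([linear_map(r, slope=4.0, s_min=64.0) for r in levels])   # raised floor
```

As the slope grows, more inputs saturate at `s_min` or `s_max`, so the output approaches the binary image described for Figure 5.7 (c).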
Figure 5.6. Original image for Matlab simulations on intensity transformer,
showing histogram and intensity transformation function.
Figure 5.7. Matlab simulations on intensity transformer (mapping function)
showing contrast stretching technique: (a) Contrast 1, (b) Contrast 2, (c) Contrast 3. As the
slope of the linear response gets steeper, the distribution of the histogram spreads out,
gaining higher contrast.
Figure 5.8. Matlab simulations on intensity transformer (mapping function)
showing brightness adjustment technique: (a) Contrast 3 and Vref1, (b) Contrast 3 and Vref2,
(c) Contrast 3 and Vref3. As the minimum value of the transformation response increases,
the minimum value of the histogram increases, gaining higher brightness.
The last operation of the mapping function is to adjust the gamma function of an image,
shown in Figure 5.9. The gamma adjustment compensates for the non-linear behavior of many of
the elements in the image-transmission chain. The relationship of the gamma correction can be
expressed in the form:
$s = c\,r^{a}$ in $s = T(r)$,          Equation 5.3.3
where c and a are constants and the exponent a (referred to as the gamma of the device) takes a
value between 0.5 and 3. To make sure that the perceived gray scale in the displayed images
is correct (to compensate for the non-linearity of the components in processing), it is usually
necessary to insert a gamma correction. The figure obtained by multiplying all the device
gammas from the camera through to the display (but not including the eye) is known as the
system gamma. If the conditions of viewing at the scene and at the display are the same (they
often are not), the system gamma needs to be unity.
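The relation between device gammas, system gamma and the required correction can be illustrated with a small Python sketch. The function names and the normalized intensity range are assumptions of this example, and the single display gamma of 2.5 in the demonstration is a hypothetical value chosen within the 0.5 to 3 range mentioned above.

```python
from typing import Iterable


def gamma_map(r: float, gamma: float, c: float = 1.0) -> float:
    """Power-law mapping s = c * r**gamma for a normalized level r in [0, 1]."""
    return c * r ** gamma


def system_gamma(device_gammas: Iterable[float]) -> float:
    """Product of the gammas of all devices in the chain."""
    product = 1.0
    for g in device_gammas:
        product *= g
    return product


def correction_gamma(device_gammas: Iterable[float], target: float = 1.0) -> float:
    """Gamma of the extra stage needed so that the overall system gamma equals target."""
    return target / system_gamma(device_gammas)


if __name__ == "__main__":
    display = 2.5                          # hypothetical display gamma
    fix = correction_gamma([display])      # correction stage inserted before the display
    r = 0.3
    shown = gamma_map(gamma_map(r, fix), display)
    print(fix, round(shown, 6))            # the corrected chain reproduces the input level
```

Because power-law stages compose by multiplying exponents, a single correction stage is enough to bring any fixed chain to unity system gamma.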
The design of an intensity transformer with contrast, brightness and gamma adjustment, 
requires a good programmability and a reasonable precision. Particularly, for contrast 
enhancement applications, because the dynamic range plays an important role in contrast, a 
design of the intensity transformer with high dynamic range is an essential design
requirement. Also, it requires a high precision design, which is frequently a drawback of analog
circuits.
Often, the intensity transformer can be easily realized with a global gain amplifier with an
offset, for which circuit designs are already well established. However, these designs are
typically optimized for high precision and high speed at the cost of large size. Therefore, they are
not suitable for low power operation in portable devices.
Therefore, this thesis proposes the implementation of an intensity transformer at pixel level,
which typically consumes lower power. Yet, for pixel level implementations, the designs of
processing elements require a high fill factor for the photosensitive area, and thus they need to
have a small number of transistors, which is one of the main design challenges in this thesis.
In addition, good programmability is often important for the design of an intensity
transformer, which is very difficult to achieve with in-pixel implementation. Therefore, in
this thesis, the design focuses on an in-pixel intensity mapping function with a small number
of transistors, reasonable programmability, and low power for the portable devices
of interest here.
Figure 5.9. Matlab simulations on intensity transformer (mapping function)
showing gamma correction technique: (a) Gamma = 0.5, (b) Gamma = 1.
Many researchers are attracted by the potentially outstanding performance of in-pixel processing
operation. The general performance and applications of pixel level processing are described
with structural implementations of ADCs on CMOS image sensors, in [31] focusing on pixel
level and in [32] on column level. Another relevant work concerns the size limitation of the
pixel: how small the pixel should be in image sensors [33].
One of the well known works on pixel level processing is a floating point pixel-level ADC
implemented by the Stanford ISL group [34]. Using the same design concept and circuit
designs, the work also demonstrates a new way to increase the dynamic range of the image
sensors. Another research project on pixel processing is on-sensor image compression [35].
This proposes a novel integration of image compression and sensing on the same focal plane.
The proposed image compression technique uses conditional replenishment, which detects
and encodes only moving areas. While the overall architecture and circuit designs are not
directly related to pixel processing designs, the conditional replenishment implementation in
analog at the pixel level is interesting research. Other examples of pixel-level processing
demonstrated include motion detection [36], individual pixel reset [37], pattern matching
[38], and fingerprint detection [41]. Continuous improvement in pixel processing
performance and functionality is expected. However, these are application-specific designs.
An interesting in-pixel processing for general-purpose applications can be found in [93].
Here, we will investigate generalized system-level architecture and design methods for point
operation, by demonstrating the design and manipulation of an on-chip light intensity transformer.
5.3.4. Designs of CMOS Active Pixel Sensor with In-pixel Intensity Transformer
We have designed and fabricated a prototype chip comprising a 64 x 64 array of in-pixel
intensity transformer circuits with photodiode pixels, in standard 0.35 µm CMOS technology
with a 3.3 V power supply. A die photograph is shown in Figure 5.10. Each pixel is 30 µm
square including the in-pixel light intensity transformation circuit and it has a fill factor of
66%. The main objectives of this chip are (i) to demonstrate the feasibility of point operation
with CMOS image sensors, (ii) to demonstrate the scalability of in-pixel processing
integration with 0.35 µm technology, (iii) to achieve in-pixel processing with low power and
real-time operation, and (iv) to address limitations and future directions of in-pixel
processing with CMOS image sensors. The main challenges of this chip are to design a
simple circuit with a small number of transistors in a restricted pixel area, and to achieve
reasonable precision of the circuit.
Figure 5.10. Die photograph of the prototype contrast stretching chip. The total
area is 16 mm².
Figure 5.11. Schematic of common source follower consisting of
a transformer with enhanced-mode NMOS active load.
The main component of the chip is the pixel-based intensity transformer, whose circuit
schematic is shown in Figure 5.11. The basis of the circuit is a CMOS common-source
amplifier with the source connected to an input control voltage instead of ground, and an active
load instead of a passive load. The transfer function of the common source amplifier is
shown in Figure 5.12. The transfer characteristic displays three well-defined regions. In
region I of Figure 5.12, the driving transistor M1 is off, since Vin < Vref + Vt. Nevertheless,
M2 is in the saturation region and is conducting a negligible current, thus the voltage across
M2 is equal to Vt, and hence the output voltage is VDD - Vt. In region II, M1 is conducting
and is operating in saturation, and the transfer curve in region II is linear, which is useful for
the amplifier operation. Finally, in region III, M1 leaves the saturation region and enters the
triode region and the curve flattens out.
Figure 5.12. Voltage response of a common source amplifier (axis label: Input Voltage).
Figure 5.13. Response of a common source amplifier with voltage
output of photodiode as its input.
The analytical derivation of the equation describing the transfer curve is shown below. The
derivation is done under the assumption that both devices (M1 and M2) have infinite output
resistance (that is, horizontal characteristic lines) in saturation. Furthermore, the two devices
are assumed to have equal threshold voltages, Vt, but different values of K (K1 and K2).
When M1 is in saturation we have
$I_{D1} = K_1 (V_{GS1} - V_t)^2$          Equation 5.3.4
Since ID = ID1 = ID2 and VGS1 = Vin - Vref, this equation can be rewritten
$I_{D1} = K_1 (V_{in} - V_{ref} - V_t)^2$          Equation 5.3.5
The operation of M2 is described by
$I_{D2} = K_2 (V_{GS2} - V_t)^2$          Equation 5.3.6
Since VGS2 = VDD - Vout, the equation can be rewritten
$I_{D2} = K_2 (V_{DD} - V_{out} - V_t)^2$          Equation 5.3.7
Combining Eqs. (5.3.5) and (5.3.7) and with some simple manipulation we obtain
$V_{out} = (V_{DD} - V_t) - \sqrt{K_1 / K_2}\,(V_{in} - V_{ref} - V_t)$          Equation 5.3.8
which is a linear equation between Vout and Vin. This is the equation of the straight-line
portion of the transfer characteristic (region II) of Figure 5.12. This particular design of
in-pixel intensity transformer has controllability on two operations: contrast and brightness.
Gamma adjustment cannot be achieved with this design without making the design of the
transformer more complex, taking too much area in the pixel. In addition, because gamma
correction is typically located at the end of the processing stages in order to compensate for the
non-linearity of the components, it loses its value if it is placed at the front stage of image capture.
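The three regions of the transfer characteristic can be explored numerically with the sketch below. It assumes the square-law model with equal thresholds and infinite output resistance, so that in region II the output is Vout = (VDD - Vt) - sqrt(K1/K2) (Vin - Vref - Vt). The numerical values of Vt and K1/K2 are hypothetical, and region III is only detected, not solved.

```python
import math
from typing import Optional, Tuple


def transfer(v_in: float, v_ref: float, v_dd: float = 3.3, v_t: float = 0.7,
             k1: float = 4.0, k2: float = 1.0) -> Tuple[Optional[float], str]:
    """Return (v_out, region) for a common-source stage with an enhancement load.

    v_t, k1 and k2 are illustrative values, not parameters of the fabricated chip.
    """
    gain = math.sqrt(k1 / k2)  # magnitude of the region II slope
    if v_in < v_ref + v_t:
        # Region I: M1 is off, the output sits one threshold below the supply.
        return v_dd - v_t, "I"
    v_out = (v_dd - v_t) - gain * (v_in - v_ref - v_t)
    if v_out < v_in - v_t:
        # Region III: M1 has left saturation (V_DS1 < V_GS1 - V_t), so the
        # straight-line expression no longer applies; not modeled in this sketch.
        return None, "III"
    return v_out, "II"


if __name__ == "__main__":
    for v_ref in (0.0, 0.5):
        print(v_ref, [transfer(v / 10.0, v_ref) for v in range(5, 20, 3)])
```

Raising `v_ref` shifts the point where region II begins, while the ratio `k1 / k2` sets the slope, which corresponds to the brightness and contrast controls of the transformer.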
In Figure 5.13, the relationship between the photodiode input and the transformer is shown.
The floating diffusion of the photodiode is placed at the input to the transformer. After the reset
time, when a pixel is reset to VDD - Vt, the voltage at the floating diffusion
(photodiode node) decrements due to the photo-generated leakage current, and the
transformer is off until the photodiode voltage becomes comparable to Vref + Vt (VFD = Vref
+ Vt). When the photodiode voltage becomes larger than Vref + Vt, the transformer is in the
linear region with a gain of sqrt(K1/K2) until the driving transistor gets into the triode region
around VDD. In this analysis, the slope of the response function, which is equivalent to the
contrast of the image, can be changed by the resistance of the active load. Also, the minimum
voltage value, which is equivalent to the brightness, can be changed by Vref (the control voltage
connected to the source). With this property of the common source amplifier, we are
able to design a simple intensity transformer with a small number of transistors.
In this particular design of the intensity transformer at the pixel level, shown in Figure 5.14, a
PMOS active load is used instead of an enhancement-mode active load for programmability and
output swing. For contrast adjustment, the slope of the transformer should be controllable by
an input signal. The enhancement-mode NMOS load cannot be programmed, always having a
fixed slope determined by the physical dimensions of the transistors. Using a PMOS active
load with its gate controlled by an input bias voltage allows different slopes according to the
bias voltage. In addition, the enhancement-mode load has an output voltage range from Vref
to VDD - Vt because VGS should be greater than Vt in order for the transistor M2 to stay on.
The PMOS active load has an output range from Vref to VDD, gaining Vt over that of the
NMOS load. HSPICE simulations of the transformer with PMOS load are shown in Figure
5.15. The simulation results demonstrate good behavioral performance and good
controllability: Vbias for contrast adjustment and Vref for brightness adjustment.
Figure 5.14. Schematic of intensity transformer implemented in design of the chip.
The transformer consists of a common source amplifier with PMOS active load.
Figure 5.15. HSPICE simulations on an intensity transformer with a PMOS active load
with (a) different biasing voltages (Vbiasp) and (b) different reference voltages (Vref).
Plotted traces: output voltage and node voltage versus time.
A standard source follower is placed right beside the transformer for normal mode image
capturing (see Figure 5.17), so the prototype chip has three different outputs: normal, contrast
and binary channels, shown in the overall structure of the array in Figure 5.16. This structure is similar
to the standard structure of a CMOS image sensor array: image sensors, shift registers, bias
bank and S/Hs. Figure 5.16 shows 64x64 image sensors, reset and row shift registers, and
readout components for normal, contrast and binary mode at the bottom of the array.
Schematics of the major components are shown in Figure 5.17. The CMOS image sensors use a
photodiode with n diffusion structure for its simplicity in layout. For image capture in
normal mode, a standard active buffer with source follower is used in every pixel. S/Hs for
the normal mode also use double poly capacitors with PMOS output buffers for the level
shifting, as explained in Chapter 2. Different from standard CMOS image sensor techniques,
the contrast stretched mode and binary mode use a common source amplifier structure for the
intensity transfer function. In addition, S/Hs for these modes use NMOS output drivers
instead of PMOS, because the intensity transformer does not have any voltage drops, unlike the
Vt drop in the source follower of normal mode, and therefore, the output driver does not need
to compensate for any voltage drops. The output swing range of the transformer is from Vref
to VDD, and when PMOS output buffers are used, the output of the driver goes from Vref +
Vt to VDD with a range of VDD - Vref - Vt. When NMOS output buffers are used, the output
goes from Vref - Vt to VDD - Vt with a range of VDD - Vref. Therefore, the NMOS output
drivers have an output range larger, by Vt, compared to the PMOS drivers. However, the choice
of NMOS or PMOS drivers does not matter in binary mode because of the Vt loss in both
drivers. The reset and row selects are generated by active high shift registers with
two inverters. The column selects are generated by active low shift registers.
5.3.5. Tests and Performances 
Testing of the prototype chip covers both individual pixel test structures and the whole
image sensor array. The tests on the individual pixel test structure verify the performance
of the intensity transformer with photodiodes, and the tests on the image sensor array
demonstrate the effects of the transformer on the appearance of images.
Signal Responses of Individual Intensity Transformer
The first test is based on the signal response of individual pixel test structures with
photodiodes. There are three variables affecting the response of the transformer: light
intensity, biasing voltage (Vbiasp) and reference voltage (Vref). The light intensity is the
actual input to the transformer. Different light intensities affect the slope of the
decrement at the photodiode node, and thus change the slope of the linear region of the
transformer as well as the intercept of the off region and linear region. Figure 5.18 (a)-II
shows the output of the binary mode (output of the inverter) and Figure 5.18 (a)-I shows the
output response of the contrast stretched mode. As the light intensity increases at a fixed
Vbiasp and Vref, the slope of the contrast mode response, Figure 5.18 (a)-I, gets steeper. The
intercept of the off region and linear region also occurs earlier. The early start of the linear
region with a steeper slope switches the binary response faster. The highest intensity has the
fastest switch-to-high, shown in Figure 5.18 (a)-II.
Figure 5.16. Overall structure of the chip, consisting of CMOS image sensor
array and readout control circuits. The chip has three different output modes:
normal, contrast and binary mode.
Figure 5.17. Schematics of main components in intensity transformer chip. It
contains readout buffers and S/Hs for three output modes.
With different Vbiasp, we observe the changes in the slope of the linear region; the more
current (the smaller Vbiasp) goes through the PMOS transistor, the steeper the slope becomes,
shown in Figure 5.18 (b)-I. As discussed in the HSPICE simulations of the previous section,
the Vbiasp changes the slope of the transformer, thus changing the contrast appearance of an
image. The steeper slope of the linear region (the faster response) typically generates an
image with a higher contrast, which will be demonstrated in the next sections. The steeper
slope of the response also leads to the faster switching of the outputs of the binary mode,
shown in Figure 5.18 (b)-II.
As Vref increases, the minimum output voltage of the response increases, shown in Figure
5.18 (c)-I, because the minimum value is theoretically equal to Vref. The changes of the
minimum voltage directly affect the brightness of the image. Since the higher Vref turns the
driving transistor to the linear region faster, the output of the binary mode switches faster
with the higher Vref in Figure 5.18 (c)-II.
The variations in the photoresponse of an individual pixel with light intensity, biasing
voltage, and reference voltage have been tested and confirm the circuit operation.
These test results demonstrate that the response of the pixel to different light intensities and
control voltages allows good control of intensity transformation. The tests on the individual
pixel test structures verify operational performance for individual intensity transformers with
a photoreceptor, which are analytically understood with the HSPICE simulations. These tests
are well matched to the simulations and encourage further tests of their effects on images.
(a) Varying Light Intensity (c) Varying Vref (Brightness)
Figure 5.18. Photoresponse from the "stretch" output (top row) and inverter output
(bottom row) of a pixel with in-pixel contrast stretch at various light intensities, bias
voltages and reference voltages.
Image Capture in Normal Mode with Characteristics of Image Sensors
The first and the most important test of the image sensor array is to capture an image in real
time mode. Here, we are able to demonstrate operation of image capture successfully. Some
sample images are illustrated in Figure 5.19. As expected, the quality of images captured by
the chip is not high, because the chip process technology we used is optimized for logic and
memory rather than for image sensors. However, the subtraction of a white
background image enhances the quality of the captured images by reducing fixed pattern
noise. Figure 5.20 (a) is a raw image and Figure 5.20 (b) is a fixed pattern noise subtracted
image. There are some noticeable differences in their image quality; the processed image is
cleaner and has a higher contrast.
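The background subtraction described above can be sketched in a few lines. This is a minimal illustration, not the processing actually used for Figure 5.20: the function name `subtract_fixed_pattern` is hypothetical, and re-centering the white frame on its mean (so that overall brightness is preserved) is an assumption that the text does not state.

```python
def subtract_fixed_pattern(raw, white):
    """Remove per-pixel offsets estimated from a white background frame.

    raw, white: images as lists of rows with identical dimensions.
    Assumption: the deviation of each white-frame pixel from the frame
    mean is taken as that pixel's fixed pattern offset.
    """
    flat = [value for row in white for value in row]
    white_mean = sum(flat) / len(flat)
    return [
        [r - (w - white_mean) for r, w in zip(raw_row, white_row)]
        for raw_row, white_row in zip(raw, white)
    ]


# A uniform scene (value 100) seen through per-pixel offsets of 0, +10, -10, 0.
white_frame = [[200, 210], [190, 200]]
raw_frame = [[100, 110], [90, 100]]
corrected = subtract_fixed_pattern(raw_frame, white_frame)
```

In this toy case the corrected frame is uniform again, which is the effect the raw and processed images are meant to show.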
The characteristics of the single chip, including image sensors, are summarized in Table 6.
The conversion efficiency of the chip is only 0.1 µV/e- (refer to Appendix C), which is very
small compared to commercially available CIS chips (typically 5-10 µV/e- for their
conversion efficiency). Some of this unexpectedly poor performance may be because of the
cross-talk between normal mode operation and contrast/binary mode operation, and the
increased capacitance of the photodiode node. The global photoresponse of the prototype
array in each of the three modes is presented in Figure 5.21 (a) for uniform illumination at a
wavelength of 540 nm. In a uniform dark room, the dark current is measured with varying
integration time (sampling rate), as shown in Figure 5.21 (b). The characteristics of the chip
are measured and calculated, based on the measurements and many other tests. The
characteristics chart of the chip shows photoresponses including photosensitivity, spatial
pattern noise, temporal noise, SNR and dark current. The physical parameters, including
pixel size, fill factor and chip size, are also included.
The chip consumes 14.85 mW of power (typical power consumption of commercial CMOS
image sensors is around 50 - 100 mW for VGA format). This figure includes concurrent operation of
normal, contrast stretched and binary modes in real time operation at 24 frames/second. When
contrast and binary modes are turned off and power with only normal mode on is measured,
the power consumption is around 6 mW,
Figure 5.19. Sample images captured in real time by the chip in normal mode.
(a) Raw image (b) Processed image
Figure 5.20. Pattern noise can be reduced by subtracting white background
image from the raw image.




Format of Array
Fill Factor
Vdd
Output format
0.35 µm CMOS technology with double poly and 3 metal layers
Signal to Noise Ratio
Dark Signal
3.3 x 1.84 = 6.072 mW at 24 frame rate
0.1 µV/e-
3.3 x (4.5 - 1.84) = 8.778 mW
1.38 V
50 mV (3.6% of saturation level)
1 5 0 - 2 0 0 1 ~ ~
0.03 V/sec
1 1.34 1 mV 1 (uWlcmL)
for range of 5 to 6.5 µW/cm²
5.42 (Av_contrast/Av_normal) in the range of 5 to 6.5 µW/cm²
Table 6. Single chip characteristics in normal mode and contrast mode.
(a) Photoresponses of three modes (horizontal axis: Light Power (µW/cm²); series: Normal, Contrast, Binary)
(b) Dark Signal (horizontal axis: Integration Time (Second))
Figure 5.21. Characteristics of single chip. (a) photoresponse of three output modes
and (b) dark signal measurements.
Normal Mode, Contrast Stretched Mode, Binary Mode
Figure 5.22. Sample images and histograms of three output modes.
which leads to the calculation that power consumption in contrast and binary modes is about
8 mW. This higher power is due to the large capacitance loads of PMOS, connected to the
column lines. However, by extracting the active load transistors out of the array and inserting
row transistors between, the processing power could be reduced because the CS amplifier is
only on when the pixels are read out.
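The power split can be checked against the arithmetic listed in Table 6. The lines below are a worked restatement, assuming a 3.3 V supply and reading the factors 4.5 and 1.84 as supply currents in mA (an interpretation; the table does not label them, but it is the only reading consistent with results in mW):

$$P_{\text{total}} = 3.3\ \text{V} \times 4.5\ \text{mA} = 14.85\ \text{mW}$$
$$P_{\text{normal}} = 3.3\ \text{V} \times 1.84\ \text{mA} = 6.072\ \text{mW}$$
$$P_{\text{contrast+binary}} = 3.3\ \text{V} \times (4.5 - 1.84)\ \text{mA} = 8.778\ \text{mW}$$

The two partial figures sum to the measured total, and they correspond to the rounded values of about 6 mW and 8 mW quoted in the text.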
Intensity Transformer in Contrast Stretched Mode and Binary Mode
Images captured by the prototype sensor in the three operational modes are compared in
Figure 5.22, along with calculated histograms showing the distribution of pixel values in the
image. A normal mode image with poor contrast is shown with its narrowly distributed
histogram. With appropriate values for Vbiasp and Vref, the contrast stretched mode
enhances the contrast by spreading out the histogram distribution. The binary
mode converts the grayscale image to a one-bit binary image and therefore the histogram
contains only the two values of black and white. Two sets of three output modes are shown in
Figure 5.23. Original images captured in normal mode with different illumination
(approximately, (a) is under 170 lux and (b) under 130 lux). The image of (a) is captured
under a brighter illumination than the image of (b), showing that the overall distribution of the
histogram shifts to the right (brighter grayscale).
Figure 5.22. The first set of images has a better contrast in terms of histogram distribution
than the second set. With different combinations of Vbiasp and Vref applied to each of the
images, the contrast stretched modes of these images have approximately the same histogram
distributions, and thus the same contrast. The binary modes of the images always have the two
values of grayscale, with different numbers of black and white pixels.
The contrast stretched mode enables the modulation of the contrast of the image. As shown
in Figure 5.24, biasing voltage (Vbiasp) changes the distribution of the histogram of the
original image in Figure 5.23, retaining the original maximum and minimum values of the
histogram. As Vbiasp decreases, the distribution of the histogram is spread flat.
The decrement of Vbiasp to the PMOS transistor allows more current to flow through the
driving transistor of the transformer and thus, the slope of the linear region response becomes
steeper. However, because the original image of Figure 5.23 has a widely spread
histogram distribution, the effects of Vbiasp on the appearance of the image are not well
demonstrated. Instead, observations on the distribution of the histogram give a good
illustration of the effects of the biasing voltage on the contrast of the image. The contrast
stretched mode enables modulation of not only the contrast of the image, but also its
brightness. The Vbiasp changes the distribution of the histogram, retaining the original
maximum and minimum values of the histogram. In contrast, Vref does not change the basic
distribution of the histogram, but modulates the minimum value, thereby changing the
brightness. Figure 5.25 shows different images captured with different reference voltages. As
the Vref increases, the minimum pixel value of the array increases, and thus the minimum
value in the distribution of the histogram increases. The noticeable changes in the brightness
of the image are shown, along with the different Vref values. In addition, the increment of the
minimum value of the histogram is observed in the brighter images with a higher Vref.
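The roles of the two control voltages can be illustrated with a purely behavioral model. The sketch below is not the circuit equation of the transformer: it assumes an idealized piecewise-linear response (off region, linear region, saturation), and the names `transform`, `gain`, `v_on` and `v_max` are hypothetical. The slope `gain` stands in for the effect of Vbiasp, and `v_ref` sets the output floor as the text describes for Vref.

```python
def transform(v_in, gain, v_ref, v_on=0.0, v_max=3.3):
    """Idealized intensity transform: the output floor is v_ref, the slope
    is `gain` once the input passes v_on, and the output clips at v_max."""
    v_out = v_ref + gain * (v_in - v_on)
    return max(v_ref, min(v_max, v_out))


def histogram(values, bins, lo, hi):
    """Count values into equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for value in values:
        index = min(bins - 1, max(0, int((value - lo) / width)))
        counts[index] += 1
    return counts


pixels = [0.40, 0.45, 0.50, 0.55, 0.60]          # narrow input distribution
low_gain = [transform(v, 1.0, 1.0) for v in pixels]
high_gain = [transform(v, 3.0, 1.0) for v in pixels]
brighter = [transform(v, 1.0, 1.5) for v in pixels]

spread_low = max(low_gain) - min(low_gain)
spread_high = max(high_gain) - min(high_gain)
hist_low = histogram(low_gain, 8, 0.0, 3.3)
hist_high = histogram(high_gain, 8, 0.0, 3.3)
```

In this model a steeper slope widens the output distribution (more occupied histogram bins), while a higher `v_ref` shifts the minimum upward without changing the spread.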
One of the reasons for the degraded appearance in the contrast stretched mode and binary 
mode is the Vt and lithographical mismatch in the transistors of the transformer. Under the 
same input voltages of Vbiasp and Vref, the physical sizes of the transistors determine the 
output voltage of the transformer. Particularly, when the driving transistor in the intensity 
transformer is in linear region, the mismatch affects the output most, due to different gains of 
Panels: Vbiasp = 3.054 V, 3.007 V, 2.952 V, 2.900 V, 2.847 V, 2.800 V, 2.751 V
Figure 5.24. Effects of biasing voltage (Vbiasp).
Vbiasp increases the contrast of images: while the
maximum and minimum of the histogram remain the
same, the distribution spreads out.
Panels: Vref = 1.35 V, 1.40 V, 1.45 V, 1.50 V, 1.55 V, 1.60 V
Figure 5.25. Effects of reference voltage (Vref). Vref increases the brightness of the image;
while the distribution of the histogram remains the same, the minimum value increases as Vref
increases.
the transformer in the linear region. The pattern noise, due to the mismatch, is amplified by
the gain with which the image signals are amplified. Figure 5.26 shows the
measurements of the image sensor array in the prototype chip.
As light power (light intensity) increases, the pattern noise of outputs in the normal mode
increases slightly. Unlike the normal mode, the pattern noise of the contrast stretched
mode and binary mode remains at a roughly constant value until the light power reaches
around 5 µW/cm², where the driving transistor of the transformer goes into the saturation region.
In the linear region of the transformer, due to the gain, the mismatch (pattern noise) gets
higher. As the light power increases, Vgs of the driving transistor increases, hence the gain
increases along with the pattern noise.
When the light power becomes around 6 µW/cm², the pattern noises of contrast stretched
mode and binary mode reach their peak values of 0.93 V and 1.508 V respectively. This is
when the driving transistor goes into its triode region. As Vgs increases further, the transistor
(horizontal axis: Light Power (µW/cm²); series: Normal, Contrast, Binary)
Figure 5.26. Mismatches in three output modes. Due to the lithographical mismatches,
appearances of images in contrast stretched mode and binary mode are affected by
pattern noise.
goes deeper into the triode region, and the amplification gets smaller. The pattern noise
decreases with the decreased amplification. Thus, it is concluded that the effects of Vt and
lithographic mismatches on contrast-stretched and binary mode images become more
significant than for normal mode operation.
5.3.6. Summary and Conclusions
In summary, this chapter describes the design of an in-pixel intensity transformer and its
analysis, along with operational performance and experimental results. A simple intensity
transformer is designed with controllability (programmability) of contrast and brightness.
Each pixel for the in-pixel mapping function is designed with 3 transistors and demonstrated
successfully. The intensity transformer, with a common source amplifier structure, is so simple
that pixel-level processing implementation is possible and feasible for on-chip integration
with CMOS image sensors. Also, the design with a small number of transistors encourages
the in-pixel integration.
The full dynamic range of the allowed voltage swing between VDD and ground is still not used in
the transformer, due to the necessary Vt difference for switching the driving transistor.
During testing, we experienced some degradation of images in normal mode when all
three output modes (normal, contrast and binary mode) are turned on concurrently. This is
attributed to cross-talk due to the short physical distance between the source follower and the
transformer in the same pixel. In addition, switching in binary mode causes spikes in the
contrast stretched mode for a short period of time. However, the effects on the appearance of
the image are rarely noticed. Also, the effects of Vt and lithographic mismatches become
important, especially for contrast stretched and binary mode images. Thus, it becomes
necessary to have an on-chip pattern noise reduction mechanism (e.g. using a feedback
system). Also, the precision of our analog intensity transformer is questionable. Much more
effort is needed to improve the precision of this analog implementation (or to design a
new implementation of higher precision) in order to use it practically.
Although the intensity transformer has reasonable controllability by altering the biasing
voltage and the reference voltage, it does not have DSP-like full programmability for contrast
and brightness adjustments. However, with tradeoffs in programmability and precision, the
in-pixel intensity transformer is able to achieve low power and real time operation with pixel
processing, which is well suited to portable and wearable devices. Therefore, this
design of the intensity transformer is for low-level image processing applications where low
power and pixel level programmability, for further automated contrast optimization, are
emphasized.
With design and fabrication of the in-pixel intensity transformer chip, the main purposes of
this study, to explore the feasibility of on-chip in-pixel integration, and to gain a better
understanding of the design issues needed for high quality contrast enhancement and
automated contrast optimization, are successfully achieved. The main issues of designing
in-pixel processing with CMOS APS are circuit density of processing elements (that easily
erodes photosensitive area and thus reduces fill factor) and massive interconnections between
neighboring pixels. In point operation, where the massive interconnections are not required,
in-pixel processing is the best design methodology as long as the
processing element does not take too much space in the pixel. In this particular design of
in-pixel light intensity transformer with 0.35 µm technology, the performance of the chip was
not optimal, due to poor photosensitivity and low precision of the processing circuit. As the
technology scales down, retaining the same minimum size of pixel (4-5 µm), the space available
for the in-pixel processing circuit will increase (see Figure 4.5) and thus higher precision of the
circuit will be achievable. Therefore, point operation in pixel processing is a promising
research area for the near future.
Chapter VI 
6. Local Operation 
6.1. Introduction 
Local operation is also called mask operation, where each pixel is modified according to the
values of the pixel's neighbors (typically using convolution masks). Local operation is
spatially dependent on other pixels around the processed pixel: the final value of the
processed pixel is affected by its neighboring pixels in the finite sized masks. The basic
approach of the operation, convolution, is to sum products of the mask coefficients and the
intensities of the pixels under the mask, at a specific location in the image. Denoting the gray
levels of pixels under the mask (3 x 3 mask in this example) at any location by z1, z2, ..., z9,
and the corresponding mask coefficients by w1, w2, ..., w9, the response of a linear mask is
$$R = w_1 z_1 + w_2 z_2 + \cdots + w_9 z_9 = \sum_{i=1}^{9} w_i z_i$$
The gray level of the pixel located at (x, y) is replaced by R if the center of the mask is at
location (x, y) in the image. This computation is repeated as the mask is moved to the next
pixel location in the image until all the pixels in the array are covered. Linear spatial filters
are defined such that the final pixel value, R, can be computed as a weighted sum over the
convolution mask (non-linear filters cannot be implemented in this way). In the above case, a
3x3 local mask was taken as an example for the convolution mask. However, the size of the
convolution mask is not restricted to 3x3, but can be expanded to 5x5, 7x7, 9x9, and larger,
depending on what precision the final value is required to have.
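The mask operation described in this section can be sketched in software as follows. This is a minimal illustration in plain Python, not the on-chip implementation: the function name `apply_mask_3x3` is hypothetical, the mask is applied without flipping (a sum of products of coefficients and the pixels under the mask, as in the text), and copying border pixels unchanged is an assumption because the text does not specify border handling.

```python
def apply_mask_3x3(image, mask):
    """Replace each interior pixel by the weighted sum of its 3x3 neighborhood.

    image: list of rows of gray levels; mask: 3x3 list of coefficients.
    Border pixels are copied unchanged (an assumption for this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            response = 0
            for j in range(3):
                for i in range(3):
                    response += mask[j][i] * image[y + j - 1][x + i - 1]
            out[y][x] = response
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box_sum = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

unchanged = apply_mask_3x3(image, identity)
summed = apply_mask_3x3(image, box_sum)
```

Changing only the nine coefficients turns the same loop into a smoothing, sharpening or edge detection filter, which is why the mask size rather than the filter type drives the architecture.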
In terms of on-chip integration with image sensors, local operations provide the advantage of
real time operation in image acquisition and processing, such as implementations of many
practical linear spatial image filters and image enhancement algorithms. In addition, because
the local operation is feasible for column structure implementations, low frequency
processing is enabled and thus low power consumption is expected. However, since the local
operations are based on a technique where local memory stores pixel values of the
neighbours and processes them concurrently, implementation of the operation must contain
some type of storage, potentially requiring a large design area. Applications of local
operation typically use an iterative technique for advanced image enhancement algorithms,
which cannot practically be implemented on-chip. Moreover, in the case of column
structure implementations, local operation still has a limitation on design area because of the
restricted column width, even with flexible design area in the vertical direction. Therefore,
in order to overcome these limitations, careful designs and system plans are required for the
on-chip implementations.
In order to understand the nature of local operation and to find a relationship between
algorithms and architectural on-chip implementations, we will look into the main local
operation algorithms, grouped according to similarity of functional processing. With many
different local operations in image processing algorithms, these local operations are
categorized into three major groups: smoothing filters, sharpening filters and edge detection
filters. Examples of the local operation algorithms are described in [74], [75], [76], and
summarized as follows.
6.1.1. Smoothing Filters
Smoothing filters (Figure 6.1) are used for blurring and noise reduction. Blurring removes
small details from an image and bridges small gaps and holes in lines or curves, and is often
used in preprocessing stages prior to object extraction and segmentation. In addition, blurring can
reduce spatial noise by smearing pattern noise in an image. Noise reduction can be
accomplished by blurring with a linear filter and also by nonlinear filtering. Smoothing filters
consist of four main types, namely order filters, mean filters, order/mean filters and adaptive
(a) Original Image (b) Average (Arithmetic Mean) Filter
(c) Median Filter (d) Adaptive Filter
Figure 6.1. Matlab simulations on smoothing filters. The image of the flower is
added and degraded with Gaussian noise. The size of local mask is 3 x 3.
filters. Each type of filter has its own characteristics and applications. The detailed
description of each filter is omitted since it is outside the scope of this thesis.
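Although the filters are not described in detail, the difference between a mean filter (linear) and a median filter (an order filter, nonlinear) is easy to show on a small example. The sketch below is an illustration under stated assumptions: the helper names `neighborhood`, `smooth` and `mean` are hypothetical, a 3x3 mask is used, and border pixels are left unchanged.

```python
from statistics import median


def neighborhood(image, y, x):
    """Return the nine gray levels under a 3x3 mask centered at (y, x)."""
    return [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def mean(values):
    return sum(values) / len(values)


def smooth(image, reducer):
    """Apply `reducer` (e.g. mean or median) to every interior 3x3 window."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            out[y][x] = reducer(neighborhood(image, y, x))
    return out


# A flat image with one noisy pixel in the middle.
noisy = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 100, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
mean_out = smooth(noisy, mean)
median_out = smooth(noisy, median)
```

The mean filter spreads the outlier over its neighbors (the center becomes 20), whereas the median filter removes it entirely, which is why order filters are often preferred for impulse-like noise.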
6.1.2. Sharpening Filters
The second type of local operation is the sharpening filter. Image sharpening deals with
enhancing detail information in an image, as shown in Figure 6.2. Because the high spatial
frequency components of the image typically contain the detail information of the image, the
sharpening filters should have some form of high-pass filtering. The detail information
includes edges and boundaries of objects, which correspond to image features that are
spatially small. This information is visually important because it outlines objects and features,
thus increasing the contrast of the image. The sharpening filters are again subdivided into
two groups: high pass and high boost filters.
A highpass (sharpening) spatial filter contains positive coefficients near its centered pixel,
and negative coefficients in the outer peripheral pixels of the local mask. However, because
the negative coefficients in the mask remain strongly present in the final image output, this
high-pass filtering for image enhancement typically requires an extra step of post-processing, such as
histogram equalization, to display an acceptable image. High boost filters are more advanced
than the highpass filters. With highpass filtering, edges and high spatial frequency
variations in the image are enhanced, but a large portion of the visual information of the
image is lost because the filter attenuates low spatial frequency components even though they
are important for the appearance of the final image. The high boost filter solves this problem
by adding a low frequency offset to the filter function.
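One common textbook form of these two masks is sketched below. It is an assumption that this form matches the masks used for Figure 6.2: the high-pass mask has 8 at the center and -1 elsewhere, and the high-boost mask replaces the center by 9A - 1 for a boost factor A (a hypothetical parameter name `a` in the code), both scaled by 1/9.

```python
def high_pass_mask():
    """3x3 high-pass mask: positive center, negative periphery, sum zero."""
    return [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]


def high_boost_mask(a):
    """3x3 high-boost mask with boost factor a; a = 1 gives the high-pass mask."""
    mask = high_pass_mask()
    mask[1][1] = 9 * a - 1
    return mask


def mask_response(patch, mask):
    """Weighted sum of a 3x3 patch, scaled by 1/9."""
    total = sum(mask[j][i] * patch[j][i] for j in range(3) for i in range(3))
    return total / 9


flat_patch = [[50, 50, 50], [50, 50, 50], [50, 50, 50]]

flat_high_pass = mask_response(flat_patch, high_pass_mask())
flat_high_boost = mask_response(flat_patch, high_boost_mask(1.5))
```

On a flat region the high-pass response is zero, so the low frequency content is lost, while the high-boost response keeps (A - 1) times the original level, which is the low frequency offset mentioned above.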
(a) Original image before the sharpening (b) Sharpened image
(c) Spatial coefficients response (d) Frequency response
Figure 6.2. Matlab simulations on sharpening filters. The size of local mask is
3 x 3.
6.1.3. Derivative Filters (edge detection)
Opposite to integration, which is analogous to averaging or smoothing, differentiation can be
expected to sharpen an image extremely, leaving only boundary lines and edges of the
objects. This is an extreme case of high pass filters. The most common methods of
differentiation in image processing applications are the first difference, gradient and Laplacian
operators, whose Matlab simulated images are shown in Figure 6.3. The difference
filter is the simplest form of differentiation, subtracting adjacent pixels from the
centered pixel in different directions. The gradient filters represent the gradients of the
neighboring pixels (image differentiation) in the form of matrices. Such gradient approaches
and their mask implementations are represented with various methods: Roberts, Prewitt,
Sobel, Kirsch and Robinson. The Laplacian is another differentiation method for edge detection.
The Laplacian of an image is a second-order derivative of a 2D function, which enhances
abrupt changes and edges in the image.
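Two of the masks named above can be written out explicitly. The sketch below uses the standard Sobel masks and the 4-neighbor Laplacian mask; whether these exact coefficients were used for Figure 6.3 is an assumption, and combining the two Sobel responses as a sum of absolute values is one common approximation of the gradient magnitude.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]   # second-order derivative


def mask_response(patch, mask):
    """Sum of products of mask coefficients and the 3x3 patch under the mask."""
    return sum(mask[j][i] * patch[j][i] for j in range(3) for i in range(3))


def sobel_magnitude(patch):
    """Approximate gradient magnitude as |Gx| + |Gy|."""
    return abs(mask_response(patch, SOBEL_X)) + abs(mask_response(patch, SOBEL_Y))


flat_patch = [[40, 40, 40], [40, 40, 40], [40, 40, 40]]
vertical_edge = [[0, 0, 100], [0, 0, 100], [0, 0, 100]]

edge_strength = sobel_magnitude(vertical_edge)
edge_laplacian = mask_response(vertical_edge, LAPLACIAN)
```

Both masks give zero on the flat patch and a large response at the edge, so only boundaries survive in the output image.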
6.2. Proposed Structure for Local Operation
In the previous section, local operations are categorized by processing characteristic into
three different types: smoothing filters, sharpening filters and edge detection filters. The
operation of these processes is based on a local (typically convolution) mask. The difference
between these local filters is the different coefficient values used for the mask and the
different sizes of the mask. Depending on the coefficients of the convolution mask, the
operation can be smoothing filters, sharpening filters or even edge detection filters. Therefore,
in terms of on-chip implementation architecture, it is convenient to divide the local operation
by the size of the local masks: 3x3 local mask (the smallest mask size and the simplest for
on-chip implementation) and bigger than 3x3 mask.
First, it is helpful to have a good understanding of the types of local mask in terms of size and
connectivity. The local masks can have different sizes such as 3x3, 5x5, 7x7 and so on, of
which the center pixel is the processed pixel being affected by the neighbors in the mask.
Because the processed pixel is at the center of the mask, the mask dimensions are odd
numbers. The masks do not have to be square, but they are typically squares because of the
simplicity of the design and the operation. Figure 6.4 shows different sizes of the local mask
with the shaded pixel at the center. Typically, as the size of the mask increases, the effect of the
(a) Original image (b) Roberts (c) Sobel (d) Prewitt (e) Laplacian
Figure 6.3. Matlab simulations on edge detection filters. The different edge
detection algorithm vectors produce different effects on appearance of an image.
The size of local mask is 3x3.
Figure 6.4. Local masks with different sizes.
(a) 8-connectivity (b) 4-connectivity in cross (c) 4-connectivity in diagonal
Figure 6.5. Local masks with different connectivity.
processing on the image is more apparent. As a matter of fact, the image quality becomes
better with the larger masks under a given operation. However, due to the limited design area,
long processing time and complexity of the design, the implementation of the large masks is
often impractical.
Connectivity of the local mask refers to the way in which the central pixel is connected to its
neighboring pixels. The centered pixel, in a 3x3 mask, has eight possible neighbors: two
horizontal neighbors, two vertical neighbors, and four diagonal neighbors. As shown in
Figure 6.5, we can define three different connectivities: (1) 8-connectivity, (2) 4-connectivity-cross
and (3) 4-connectivity-diagonal. Similar to the size of the mask, as the connectivity in
the mask increases, the effect of the processing becomes more apparent and the image quality
improves.
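The three connectivity types can be expressed as sets of neighbor offsets around the centered pixel. The sketch below is only an illustration; the dictionary `CONNECTIVITY`, its keys, and the function `neighbors` are hypothetical names.

```python
# (row offset, column offset) of each neighbor relative to the centered pixel.
CONNECTIVITY = {
    "8": [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)],
    "4-cross": [(-1, 0), (0, -1), (0, 1), (1, 0)],
    "4-diagonal": [(-1, -1), (-1, 1), (1, -1), (1, 1)],
}


def neighbors(y, x, rows, cols, kind):
    """List the in-bounds neighbors of pixel (y, x) for a connectivity type."""
    result = []
    for dy, dx in CONNECTIVITY[kind]:
        ny, nx = y + dy, x + dx
        if 0 <= ny < rows and 0 <= nx < cols:
            result.append((ny, nx))
    return result


center_8 = neighbors(1, 1, 3, 3, "8")
corner_cross = neighbors(0, 0, 3, 3, "4-cross")
```

The cross and diagonal sets together make up the 8-connected set, so each wire saved by choosing 4-connectivity is a neighbor that no longer contributes to the processed pixel.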
When these local operations are integrated with CMOS image sensors on a single chip, these
characteristics of the processing mask (size of the mask and interconnectivity) should be
taken into consideration for system-level architecture and circuit designs because these
characteristics are directly related to design complexity and chip area. Here, we study
on-chip implementations for the local operation. The implementations of different sized local
masks (3x3 masks and larger masks) are investigated at different implementation levels of
on-chip processing: pixel, column, chip and frame memory levels. General structural
implementations are discussed and different architectural integrations for each operational
type are compared with their merits and drawbacks.
6.2.1. Implementations of 3x3 Local Mask Filters
The implementation of 3x3 local masks can be performed at the pixel, column, chip and
frame memory levels. Because 3x3 local operations are the smallest possible masks, their
implementation is relatively easy and there is a relatively large choice of architectural
implementations. However, due to interconnections between neighboring pixels and to
complexity of processing elements, there are many challenges and difficulties in design and
implementation of even such a simple mask. Here, it is assumed that these local masks have
full connectivity to every neighboring pixel, giving 8-connectivity in a 3x3 mask.
Image Sensor Array
Figure 6.6. Pixel processing for 3x3 local mask operation.
First, implementation of a 3x3 local mask with in-pixel processing structures is an attractive
design where each pixel has a photodetector and a processing element, connected to its
neighboring pixels in the array, shown in Figure 6.6. The connections to the neighboring
pixels are defined by the local mask. In the case of 8 connected neighbors, a photodetector has
eight outputs to the processing elements of its neighboring pixels. In addition, a processing
element of a pixel has eight inputs from photodetectors of its neighboring pixels. Therefore,
an obvious disadvantage of pixel level implementation of a local mask is the heavy
interconnection among pixels and processing elements. Due to the interconnections, the pixel
loses fill factor. Not only the interconnections, but also the processing element and the
storage take area in a pixel, thus the photosensitive area is further reduced (but microlenses can
overcome the loss of fill factor). In cases when the storage is an analog memory, the leakage
(charge retention time) of the memory should be considered carefully to ensure that there is
not much voltage drop in the memory. The larger the memory (e.g. capacitor) is, the longer the
holding time is. However, a large memory typically reduces the fill factor. Also, there should
be a shield for the storage to block any incident light. Another disadvantage comes from the
readout mechanism. Due to the concurrent processing on neighboring pixels, progressive
scanning techniques of conventional CIS arrays will not work, unless each pixel has its own
memory, which would increase pixel size significantly. Therefore, a new design of the peripheral
readout component may be required in the pixel processing implementation.
If intelligent connections among pixels and simple processing elements are developed, the
pixel level implementation is very attractive due to parallel processing, of which advantages
Image Sensor Array
Figure 6.7. Column processing for 3x3 local mask operation.
include low frequency processing, low power consumption and adaptation to the local 
environment. However, since the pixel level implementation still faces severe limitations on 
pixel size, the feasibility for given applications should be carefully examined and planned. 
The second method of 3x3 local mask implementation is based on column level processing
structures with local memory. At the bottom of the imager array, three sets of linear arrays
with local storage and processing elements are placed, with the same number of columns
as the image array, as shown in Figure 6.7. Since the processing elements are separated from
the pixels, a progressive scanning technique of a conventional CIS array can be used here. In
the progressive scanning method, when pixel values of the image sensor array are read out
row by row, pixel values of one row are dumped into the first row of the processing array and
stored until the next image data come. When the next pixel values come, the previous values of
the processing element array are shifted to the second row and then the third row. This
repeats until all the rows of the image go through the processing array. Each time a new row
is dumped, the operation of the local mask should be done before the transfer to the next row.
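The row-shifting scheme described above behaves like a three-row line buffer. The following sketch models it in software under stated assumptions: the function name `column_level_filter` is hypothetical, a `deque` of length three stands in for the three linear arrays of local storage, and only interior columns are processed.

```python
from collections import deque


def column_level_filter(rows, mask):
    """Yield one filtered row each time the three-row buffer holds valid data.

    rows: iterable of sensor rows in readout order; mask: 3x3 coefficients.
    """
    buffer = deque(maxlen=3)           # three linear arrays of local storage
    for row in rows:
        buffer.appendleft(list(row))   # new row enters; older rows shift down
        if len(buffer) < 3:
            continue                   # wait until three rows are stored
        window = (buffer[2], buffer[1], buffer[0])   # oldest row on top
        yield [
            sum(mask[j][i] * window[j][x + i - 1] for j in range(3) for i in range(3))
            for x in range(1, len(row) - 1)
        ]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box_sum = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
filtered = list(column_level_filter(image, box_sum))
```

Each output row needs only the three stored rows, so the sensor can be scanned progressively and every column can be processed in parallel at the row rate.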
The column structure implementation of local operation offers a number of advantages such 
as column-parallel processing, flexibility of implementation in vertical direction, low 
frequency processing and low power consumption. Because the column level processing 
Image Sensor Array, Processing Element
Figure 6.8. Chip processing for 3x3 local mask operation.
structure has added space for its design of processing elements in the vertical direction, the
restrictions on design area and the number of transistors are relaxed, compared to the pixel
level processing implementations. However, there are still limitations on the column width
and thus, a careful design and implementation of processing elements is recommended.
Among the choices of implementations, the simplest method of structural implementation
is chip level implementation, where a processing element is located at the end of the output
channel in the image sensor array, as shown in Figure 6.8. Because this method does not
have any limitations of pixel size or column width, it has freedom of design area and
therefore, it is feasible to use circuits of high complexity and functionality. Even with a
complex design of processing elements, the chip processing implementation is expected to
have the smallest design area. However, it requires a very fast processing frequency, equal to
the image data rate of the imager array (~10-100 MHz). High-speed readout typically causes
high power consumption that is not desired in many applications, and increases the design
complexity of the processing element to protect it from noise and crosstalk. Similar to pixel
level implementations, a new scanning method other than the progressive or interleaved
technique is desired for the chip level implementation because of the concurrent readout and
processing operation on neighboring pixels. Also, the readout must be non-destructive because we
need to reuse the pixel values.
Figure 6.9. Hybrid method (column + chip processing) for 3x3 local mask operation.
A modified structure of chip level implementation is shown in Figure 6.9. Using the
progressive scanning method of the column level structure, when the three rows of the local
storage array contain valid image data from the image sensor array, the image data of all three
rows of storage are shifted in series to the processing element. As the image data come
from the local storage array, the processing element operates on the image at very high speed
(same as the data output rate or output sampling rate). The method still operates at very high
speed with high power consumption, but a simple progressive scanning method can be used,
with a trade-off of a larger area for the local storage array.
The last option for on-chip implementation is the frame memory level structure, shown in Figure
6.10. All the pixel values of the image sensor array are shifted to the frame memory once the
photodetectors integrate incoming light and capture an image. Each pixel of the frame
memory consists of storage and a processing element, very similar to the pixel processing
implementation except that there are no photodetectors in the frame memory. Therefore, the
pixels of the image sensor array do not lose any fill factor for the processing elements and
storage. With the gain of fill factor in the photodetector pixel, the overall chip size increases
Figure 6.10. Frame memory processing for 3x3 local mask operation.
with frame memory and interface circuits, thus increasing fabrication cost. Because images
captured by the image sensor array always go to the frame memory for the local processing,
the output images of the chip experience a latency of one image frame: the present output of
the chip was captured one frame before. Also, because the sensor array is not centered in the
package, special care is needed to align the lens with the package. Despite the high
fabrication cost and complexity of memory design, the implementation with the frame
memory has the potential advantages of parallel processing, low processing power
consumption and flexibility of processing circuit design.
We have seen different architectural implementations for local operation with 3x3 local
masks. Each type of implementation has its own advantages and disadvantages. After careful
investigation of these implementations, we recommend the column level structure as the
implementation method for 3x3 local mask operation because of its column-parallel processing,
low power consumption, and feasibility of implementation. Currently, pixel-level
implementation is less feasible, from a practical point of view, due to its extensive
interconnections and the severe increase of pixel size caused by the processing element and storage.
However, when CMOS technology scales down further, in-pixel processing may become
a practical implementation in the near future. Chip level processing and frame memory
implementations are less attractive designs due to their probable high power consumption
and complexity of circuit design.
6.2.2. Implementation of Bigger Masks than 3x3
The on-chip implementation of masks bigger than 3x3 such as 5x5 (24 interconnects), 7x7
(48 interconnects), 9x9 (80 interconnects) and even larger, is very difficult. Simply because
of the mask size and the large number of routings, the implementation of these masks
requires extreme caution on interconnection routings between neighboring pixels. Pixel,
column and frame memory implementations, where, in some ways, limitations on the design
area exist, are not suitable for these masks. Even chip processing implementation is not a
good choice because of its complicated scanning method. Therefore, there should be some
modifications to these implementation methods in order to accomplish the design of the
larger masks.
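The interconnect counts quoted above follow from the fact that, under an N x N mask, the pixel of interest must reach its N x N - 1 neighbors. A small sketch of that arithmetic (the function name is illustrative only):

```python
def neighbor_interconnects(n):
    """Number of neighbor connections for one pixel under an n x n mask (n odd)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("mask size must be a positive odd integer")
    return n * n - 1

for n in (3, 5, 7, 9):
    print(f"{n}x{n} mask: {neighbor_interconnects(n)} interconnects")
```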
One possible design for the larger masks is a hybrid method, combining column and chip
level implementations, introduced in the previous section (see Figure 6.9). Because a
processing element is located at chip level (one processing element per chip), there are no
limitations on design space for the processing elements. In addition, because three or
more linear arrays of local storage (the number of linear arrays is equal to the mask size) are
used, a conventional progressive scanning technique can be applied for the readout, reducing
the design complexity of the peripheral readout circuits. However, since the processing
is done at chip level, high processing speed (equivalent to the pixel rate) at the processing
element is still required. This processing speed eventually determines the data output rate.
The high processing speed also consumes high power, which is the main trade-off of the chip
processing implementation.
Another implementation method for the larger masks is a pipelined structure. The pipelined
structure is based on the concept that some 2-dimensional matrices (N x N) can be represented
as the product of two linear arrays (a linear (N x 1) array and a linear (1 x N) array) if the
matrix is separable, as shown in Figure 6.11. Computing with linear arrays
is easier than computing with 2-dimensional arrays; Figure 6.12 shows a 1x3 linear
array computation done with a pipelined structure. The saving may seem
trivial for small masks, but when the mask size gets bigger than 5x5, this method is highly effective.
Figure 6.11. Concept of pipelined local masking. A 2-dimensional matrix can be
realized by the product of two 1-dimensional (linear) arrays.
Figure 6.12. Basic structure of pipelined implementation for large local masks.
(Diagram labels: from the image sensor array; local storage + processing element;
local storage; final output storage.)
With the pipelined computation, column level implementation is possible for the larger
masks, hence allowing a slow processing frequency and low power consumption.
Because a whole row of the array is computed at the same time, N different linear arrays are
needed to compute the product of one linear array, as shown in Figure 6.12. After this operation,
another similar computation should be done for the horizontal linear array in order to complete
the 2-dimensional matrix. Therefore, the circuit design for the computations gets complex
and consumes a large area. In addition, the coefficients of the product matrix of two linear
arrays are correlated, and thus any change to a coefficient of a linear array affects
other coefficients in the product matrix. Because the coefficients of a general local mask should not be
correlated, but independent from each other, the pipelined structure limits the
contents of the coefficients: the matrix must be separable for this pipelined structure.
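The separability requirement can be illustrated with a short sketch. It is an illustration under stated assumptions, not the pipelined circuit itself: the function names are hypothetical, the image border is zero padded, and the masks are applied without flipping. It checks that a vertical 1-D pass followed by a horizontal 1-D pass gives the same result as the full 2-D mask built from the product of the two linear arrays.

```python
def outer(col, row):
    """Build the N x N mask as the product of an (N x 1) and a (1 x N) array."""
    return [[a * b for b in row] for a in col]

def correlate_cols(image, col_kernel):
    """Vertical 1-D pass with zero padding."""
    k = len(col_kernel) // 2
    h, w = len(image), len(image[0])
    return [[sum(col_kernel[i] * image[r + i - k][c]
                 for i in range(len(col_kernel)) if 0 <= r + i - k < h)
             for c in range(w)] for r in range(h)]

def correlate_rows(image, row_kernel):
    """Horizontal 1-D pass with zero padding."""
    k = len(row_kernel) // 2
    h, w = len(image), len(image[0])
    return [[sum(row_kernel[j] * image[r][c + j - k]
                 for j in range(len(row_kernel)) if 0 <= c + j - k < w)
             for c in range(w)] for r in range(h)]

def correlate_2d(image, mask):
    """Direct 2-D pass with zero padding, for comparison."""
    k = len(mask) // 2
    h, w = len(image), len(image[0])
    return [[sum(mask[i][j] * image[r + i - k][c + j - k]
                 for i in range(len(mask)) for j in range(len(mask))
                 if 0 <= r + i - k < h and 0 <= c + j - k < w)
             for c in range(w)] for r in range(h)]

if __name__ == "__main__":
    image = [[1, 2, 3, 4, 5],
             [6, 7, 8, 9, 10],
             [11, 12, 13, 14, 15],
             [16, 17, 18, 19, 20]]
    col, row = [1, 2, 1], [1, 0, -1]   # example separable mask (assumed values)
    two_pass = correlate_rows(correlate_cols(image, col), row)
    direct = correlate_2d(image, outer(col, row))
    print(two_pass == direct)
    n = 5
    print("multiplications per pixel:", n * n, "direct vs", 2 * n, "two-pass")
```

Changing one entry of `col` or `row` changes a whole row or column of the product mask, which is the coefficient correlation described above.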
Although the pipelined structure can take advantage of column processing implementation,
providing column parallel processing and low power, its complexity of computations,
difficulty in input controls, and large design area limit the use for practical designs. Therefore,
for the design of the large convolution masks (larger than 3x3 mask), the hybrid design with
column and chip level implementation is highly recommended for its relatively simple design
and easy operational control, at the price of high power. However, these designs do not have
flexibility on the mask size; the mask size is predetermined and fixed before the chip
fabrication. Also, iterative operations cannot be implemented on chip, thus limiting its
applications to low level preprocessing. The general description and comparison of local
operation are summarized in Table 7.
6.3. Spatial On-Chip Binary Image Processing
6.3.1. Fundamental Operation in Binary Image Processing
Binary image processing is of special interest, since an image in binary format can be
processed with very fast logical (Boolean) operators. In a gray-level image, each pixel is represented by
several bits. In a binary image, only one bit is assigned to each pixel (B = 1), implying two
possible gray-level values, 0 and 1. These values might indicate the absence or presence of
some image property in an associated gray-level image, where 1 indicates the presence of the
property at that coordinate in the image, and 0 otherwise. This image property commonly






Pixel level
    3x3 masks: Large number of neighbor connections and processing element area
    sacrifice fill factor and pixel size.
    Larger than 3x3 masks: Impractical to implement, due to a large number of
    interconnections to neighboring pixels.
Column level
    3x3 masks: High flexibility of implementation with low power.
    Larger than 3x3 masks: Complex implementation due to a large number of
    connections. If implemented, a special architecture like the pipelined structure is desired.
Chip level
    3x3 masks: High speed and power. Combined structure of column row storage and
    chip level PE with local storage, or chip level PE with a special image scanning method.
    Larger than 3x3 masks: High speed and power. Combined structure of column row
    storage and chip level PE with local storage, or chip level PE with a special image
    scanning method.
Frame memory
    3x3 masks: High fill factor because neighbor connections and local storage are
    outside of the sensor array, but too much degradation in area, power and speed.
    Larger than 3x3 masks: High fill factor because neighbor connections and local
    storage are outside of the sensor array, but difficulty remains in interconnections
    between pixels.
Table 7. General descriptions and comparisons of local operation implementations for
different sizes of local masks.
absence of certain objects might be indicated. Often, a binary image has been obtained by
extracting information from a gray-level image, such as object location, object boundaries, or
the presence of some image property. In this thesis, we restrict binary image
processing to binary images obtained from the brightness (light intensity at a pixel) for
further operations.
A much broader and more powerful class of binary image processing operation is binary
image morphology, also called morphological image processing. Morphology relates to the
structure or form of objects. Morphological filtering simplifies a binary image to assist the
search for objects of interest. This is done by smoothing out object outlines, filling small
holes, eliminating small projections, and using other similar techniques. Even though our
focus in morphological image processing is on binary images, the concepts
can be extended to gray-level images [42-52].
The two principal morphological operations are dilation and erosion. Dilation expands
objects, thus potentially filling in small holes and connecting disjoint objects. Erosion shrinks
objects by etching away (eroding) their boundaries. These operations can be varied for an
application by the proper selection of the structuring element, which defines the neighbors in
the operations, and thus determines exactly how the objects will be dilated or eroded. The
neighborhood for a dilation or erosion operation can be of arbitrary shape and size (it can be
4-connected or 8-connected, or even with different sizes of the structuring elements of 3x3,
5x5 or larger). A structuring element is a matrix consisting of only 0's and 1's. The center
pixel in the structuring element represents the pixel of interest, while the elements in the
matrix that are on (= 1) define the neighborhood.
The dilation and erosion processes are performed by laying the structuring element on the
image and sliding it across the image in a manner similar to convolution. The difference
between dilation and erosion is the operation performed. The algorithms of these processes
are as follows.
For dilation, if the center of the structuring element coincides with a '1' in the image,
or if any pixel in the input pixel's neighborhood (defined by '1' in the structuring
element) is on ('1'), the output pixel is on. Otherwise, the output pixel is off ('0').
For erosion, if the center of the structuring element coincides with a '1' in the image
and if all pixels in the input pixel's neighborhood are on ('1'), the output pixel is on.
Otherwise, the output pixel is off ('0').
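The two rules above can be written as a short software sketch. This is an illustration only: the function names are hypothetical, and the treatment of pixels outside the image (taken as off for dilation and on for erosion, so that the border does not change the result) is an assumption of the example.

```python
def _neighborhood(image, se, r, c, outside):
    """Yield image values under the '1' entries of the structuring element centered at (r, c)."""
    k = len(se) // 2
    h, w = len(image), len(image[0])
    for i, se_row in enumerate(se):
        for j, on in enumerate(se_row):
            if on:
                rr, cc = r + i - k, c + j - k
                yield image[rr][cc] if 0 <= rr < h and 0 <= cc < w else outside

def dilate(image, se):
    # Output is on if the center pixel or any selected neighbor is on.
    return [[1 if image[r][c] or any(_neighborhood(image, se, r, c, 0)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]

def erode(image, se):
    # Output is on only if the center pixel and all selected neighbors are on.
    return [[1 if image[r][c] and all(_neighborhood(image, se, r, c, 1)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]

if __name__ == "__main__":
    se = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]       # 8-connected 3x3 structuring element
    image = [[0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 1, 1, 0],
             [0, 0, 0, 0, 0]]
    for row in erode(image, se):
        print(row)
    for row in dilate(image, se):
        print(row)
```

For the 3x3 block in this example, erosion leaves only the center pixel on, while dilation grows the block by one pixel in every direction.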
With a dilation operation, all the '1' pixels in the original image will be retained, any
boundaries will be expanded, and small holes will be filled. The erosion process is similar to
dilation, but it turns pixels to '0', not '1'. All the boundaries of the objects are etched away,
and some small objects disappear from the original image. A simulated example is shown in
Figure 6.13. After an original gray-level image of stars (Figure 6.13 (a)) is converted to a
binary image (Figure 6.13 (b)), erosion and dilation operations are applied to the binary
image. In Figure 6.13 (c), the erosion makes the white spots of the stars smaller and makes
relatively small stars disappear from the original image of Figure 6.13 (a). Meanwhile,
the dilation of Figure 6.13 (d) makes all the stars larger than the original size.
There are many other types of morphological operation in addition to dilation and erosion.
However, many of these operations are just modified forms of dilation or erosion, or
combinations of dilation and erosion. The most useful operations for morphological filtering
are called opening and closing. Opening consists of an erosion operation followed by dilation
with the same structuring element. It can be used to eliminate all pixels in regions that are too
small to contain the structuring element while keeping large objects the same size, as shown in
Figure 6.13 (e). A related operation, closing, is the reverse of the opening, consisting of
dilation followed by erosion. It can be used to fill in holes and small gaps.
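Opening and closing can be sketched as compositions of the two basic operations. The helper names below are hypothetical, the structuring element is assumed to include its center, and pixels outside the image are assumed to be off for dilation and on for erosion.

```python
def _apply(img, se, combine, outside):
    """Combine the pixels selected by the structuring element with all() or any()."""
    k = len(se) // 2
    h, w = len(img), len(img[0])

    def value(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else outside

    return [[int(combine(value(r + i - k, c + j - k)
                         for i in range(len(se)) for j in range(len(se[0])) if se[i][j]))
             for c in range(w)] for r in range(h)]

def erode(img, se):
    return _apply(img, se, all, 1)

def dilate(img, se):
    return _apply(img, se, any, 0)

def opening(img, se):
    # Erosion followed by dilation with the same structuring element.
    return dilate(erode(img, se), se)

def closing(img, se):
    # Dilation followed by erosion with the same structuring element.
    return erode(dilate(img, se), se)

if __name__ == "__main__":
    se = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    # A 3x3 object plus one isolated pixel at (5, 5).
    img = [[0] * 7 for _ in range(7)]
    for r in range(1, 4):
        for c in range(1, 4):
            img[r][c] = 1
    img[5][5] = 1
    for row in opening(img, se):
        print(row)
```

In this example the opening removes the isolated pixel, which is too small to contain the structuring element, and restores the 3x3 object to its original size.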
Another interesting and useful operation is to determine the perimeter pixels of the objects in
a binary image. This perimeter detection is quite similar to edge detection in gray-level
images, but with simpler computations. Processing similar to dilation or erosion is
performed for the perimeter detection: we lay the structuring element on the image and slide
it across the image. The difference is in the operation with the structuring element. As shown
in Figure 6.13 (f), a pixel is considered a perimeter pixel if it satisfies both of these criteria:
It is an on (=1) pixel
One (or more) of the pixels in its neighborhood is off (=0)
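The two criteria can be sketched directly in software. The function name is hypothetical, and pixels outside the image are assumed to be on, so the image border alone does not create perimeter pixels.

```python
def perimeter(img, se):
    """Mark on-pixels that have at least one off-pixel in their neighborhood."""
    k = len(se) // 2
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1:
                continue  # criterion 1: the pixel itself must be on
            for i in range(len(se)):
                for j in range(len(se[0])):
                    rr, cc = r + i - k, c + j - k
                    inside = 0 <= rr < h and 0 <= cc < w
                    if se[i][j] and inside and img[rr][cc] == 0:
                        out[r][c] = 1  # criterion 2: a selected neighbor is off
    return out

if __name__ == "__main__":
    se = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    img = [[0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 1, 1, 1, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0]]
    for row in perimeter(img, se):
        print(row)
```

For the 3x3 block in this example, the eight outer pixels are marked and the interior pixel is not.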
(c) Erosion (d) Dilation
Figure 6.13. Binary image processing with various functionalities of erosion,
dilation, opening and perimeter detection.
At first glance, perimeter detection may seem trivial, since the perimeter points can be simply
defined as the transition from 1 to 0 (and vice versa). However, perimeter detection is quite
useful and powerful, particularly for image segmentation and pattern recognition.
Because of Boolean operators and the simplicity of their circuit design, on-chip
implementation of binary image processing is relatively straightforward. Here, we
implement on-chip binary image processing with CIS as a demonstration of on-chip local
operation. Although binary image processing is different from gray-level image processing, it
has many similarities in operation, but with much less complicated computations.
As a column processing implementation was proposed for local operation in the previous
section, a column processing structure is implemented for the binary image processing. In
addition to the parallel processing and low power of column processing implementations,
binary processing offers a number of other advantages.
The processing element is relatively simple compared to other analog processing
circuits that need high levels of complexity in their design.
The algorithm is powerful enough to be applicable to many low-level processing
applications.
There is no need for a high-accuracy ADC.
The local storage is relatively simple.
Here, we designed and implemented on-chip binary image processing with CIS to investigate 
the feasibility of the column structure implementation for local operation and its performance. 
6.3.2. Previous Works on Binary Image Processing
The morphological analysis of black-and-white images was initiated by Georges Matheron in
the late 1960s. His early work is described in the 1975 publication "Random Sets and
Integral Geometry" [41]. Since 1975, the use of the fundamental morphological operations,
absent of any significant statistical interpretation, has found a fast-growing field of
applications. The development of binary and morphological image processing algorithms
accelerated. These algorithms include noise reduction [48] [49], image sharpening [47],
edge detection [44], image compression [57] and many other morphological filters
[42] [43] [45] [46] [50] [51]. Large numbers of image processing software packages and
hardware peripherals now include morphological operations such as dilation and erosion.
Hardware implementations of morphological processors include not only the basic operations of
dilation and erosion, but also more complicated image processing on binary images [52-57].
In addition to the binary image processors, there have been attempts to integrate binary
image processing with image sensors, aiming for real-time operation of image capturing and
processing. On-chip binary image processing has a variety of applications such as motion
detection and analysis [59] [61], fingerprint sensing [60], and skeletonization [62]. On-chip
binary processors with CMOS image sensors were also implemented for high
programmability and flexibility of operation [58][63]. These on-chip binary processors are
based on pixel processing implementations, which contain a photodetector and a binary
processing element in the same pixel. Due to the high density of processing
circuits in the pixel, only small arrays, less than 32x32, were implemented and
therefore, the applications are restricted to low resolutions.
Because binary image processing uses a structuring element, which indicates relations
between the center pixel and its neighboring pixels, the column processing structure is a good
fit for the implementation of binary image processing. Some previous studies have focused
on the implementation of image processing, not only binary image processing but also
general image signal processing, in column processing structures [64-70]. The basic
concepts of hybrid methods are also discussed: the pipelined structure [71] as well as the
combined structure of column and chip processing [72]. However, these are not for on-chip
binary image processing.
Here, we designed and fabricated on-chip binary image processing with CMOS APS in a
column processing implementation.
6.3.3. Design of CMOS Active Pixel Sensor with On-Chip Binary Image Processing
We have designed and fabricated a prototype chip comprising a 64 x 64 array in standard
0.35 µm CMOS technology with a 3.3 V power supply. A die photograph is shown in Figure
6.14. Each pixel is 30 µm square with an n+/p photodiode, and it has a fill factor of 82%. The
main objectives of this chip are (i) to explore the feasibility of local operation integrated with
CMOS image sensors, (ii) to demonstrate the scalability of column processing
Figure 6.14. Die photograph of the prototype binary image processing chip. The
total area is 3.2 x 3.2 mm².
implementation with 0.35 µm technology, where processing elements are fit to the column
pitch of the image sensor, (iii) to demonstrate on-chip binary image processing in real time
mode, with low power consumption, (iv) to demonstrate the feasibility of high resolution
implementation, and (v) to address the benefits and future research direction of on-chip local
processing with CMOS APS.
The chip has one analog output and four different 1 bit digital outputs. The analog output is
for raw images captured in the normal mode operation, without any signal modifications. The
four 1 bit digital signals consist of: binary image, erosion, opening and perimeter. The
overall operational structure of the chip is shown in Figure 6.15. Since the binary image
processing is performed by column-based processing components, the compact design of the
processing circuits is easily found at the bottom of the chip (see the dark portion at the
bottom of Figure 6.14). The chip consists of two main portions: one for normal mode
operation at the top of the photodiode array, and the other for binary image processing at the
bottom of the array. The normal mode of the chip follows the standard operation of CIS: the
image is captured by the photodiodes in integration mode, the image data is transferred in
parallel through source followers to the S/H's by row select shift registers, and then
transmitted out in series by output buffers. Since the basic operation and designs of photodiodes,
shift registers and S/H's are discussed in Chapter 2, the description of these components
is omitted here.
In contrast, the binary image processing, whose overall schematic is shown in Figure 6.16,
consists of voltage comparators, local latches, processing elements, column storage and
column readout circuits (shift registers). A more detailed structure of the chip is shown in
Figure 6.17. Once the image is captured by the same photodiode as used for the normal mode,
it is buffered and stored in the S/H for the voltage comparators, the schematic of which is
shown in Figure 6.18. The voltage comparator compares the image with the reference voltage
to generate 1 bit binary signals (0 or 1), which are stored in the local latches and shifted row
by row. Since the CIS array reads out the image data row by row, the shifting rate of the
local latches should be the same as the clock rate of the row shift register for the CMOS
imager array. This also means that all the necessary processing should be done within one
cycle of this clock. In this particular design of binary image processing, a 3x3 structuring
element (local mask) is used to define the connectivity of neighboring pixels.
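The comparator and latch stage can be modeled in a few lines. This is a behavioral sketch only: the function names, the example voltages, and the polarity (a voltage above the reference produces a '1') are assumptions of the example and are not taken from the circuit of Figure 6.18.

```python
def binarize_row(row_voltages, v_ref):
    """One comparator per column: output 1 where the sampled voltage exceeds v_ref."""
    return [1 if v > v_ref else 0 for v in row_voltages]

def shift_latches(latches, new_row):
    """Shift the three latch rows by one and load the new binary row at the front."""
    return [new_row, latches[0], latches[1]]

if __name__ == "__main__":
    v_ref = 1.0                                   # reference voltage (assumed value)
    frame = [[0.5, 1.2, 2.0],
             [1.1, 0.9, 0.3],
             [2.2, 2.1, 0.1]]
    latches = [[0, 0, 0]] * 3                     # three linear arrays of local latches
    for row in frame:                             # one row per row-clock cycle
        latches = shift_latches(latches, binarize_row(row, v_ref))
        print(latches)
```

Each loop iteration corresponds to one cycle of the row clock, within which the processing elements would have to finish their operation on the three latched rows.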
Figure 6.15. Overall operational structure of binary image processing.
Figure 6.16. Schematic of major components in on-chip binary image processing.
Figure 6.17. Detailed structure of on-chip binary image processor with CMOS
image sensor array.
Figure 6.18. Schematic of voltage comparator [88].
After the voltage comparator, there are three linear arrays of the local latches with the same
number of columns as the imager array, followed by an array of processing elements. The
circuit design of the processing element depends on which operation is implemented, such as
erosion, dilation and perimeter detection. Since the operation of opening is based on
dilation after erosion, there is another set of local latches and processing elements after
the first erosion processing array, which takes input images from the erosion and computes
the dilation operation on the eroded image, as shown in Figure 6.17.
Each output of the binary operations of binarization, erosion, perimeter detection and opening
needs its own output readout storage (column storage in Figure 6.17) for the serial data-out
because different binary operations transmit the outputs independently through physically
separate channels. Therefore, there are five different column storage elements in the chip,
including the normal mode operation. Despite the different column storages, only
two column readout controls (column shift registers) are used in the chip: one for the normal mode and
the other for binary operation.
The algorithms of erosion, dilation and perimeter detection are implemented with logic
(Boolean) gates. The algorithm of erosion is implemented with an AND logic gate, shown in
Figure 6.19 (a). Due to the neighborhood selection of the structuring element, the processing
logic should be able to discriminate the output value of the processing element. In a case where
no processing elements are selected by the structuring element, the default values of the
Figure 6.19. Logic design and schematics of the switches: (a) Logic gates for erosion,
(b) Switch for erosion and perimeter detection, (c) Logic gates for perimeter detection,
(d) Switch for perimeter detection, (e) Logic gates for dilation, (f) Switch for dilation.
inputs to the AND gate should be '1', thus leading to a special design of the switch, shown in
Figure 6.19 (b). The switch selects the incoming processing element if the corresponding
coefficient of the structuring element is high at a trigger of the PEClk. Otherwise, the output
retains its default value of '1'. The design of the perimeter detection is similar to the erosion.
The difference is in the logic gate for the neighboring pixels: for the erosion, an AND gate is
used and for the perimeter detection, a NAND gate is used for the selection of the neighboring
pixels, shown in Figure 6.19 (c). Also, since the default value of the switch is the same as
that of the erosion, the same design of the switch is used for the perimeter detection.
The operation of the dilation is significantly different from the erosion and perimeter
detection due to the OR logic operation. With a similar structural design, but different logic
gates, the dilation consists of two OR gates and 8 different switches, shown in Figure 6.19
(d). Due to the different default value of the switch, the switch for the dilation is redesigned
with some modifications from that of the erosion and perimeter detection, shown in Figure
6.19 (e).
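The gate-level behavior described above can be summarized in a behavioral sketch. It is a software model under stated assumptions, not the schematic of Figure 6.19: the function names are hypothetical, the PEClk timing is not modeled, and the center pixel is assumed to bypass the eight neighbor switches.

```python
NEIGHBORS = [(i, j) for i in range(3) for j in range(3) if (i, j) != (1, 1)]

def switched_inputs(window, se, default):
    """Model of the eight switches: pass the latched pixel when the structuring
    element coefficient is high, otherwise hold the default value."""
    return [window[i][j] if se[i][j] else default for i, j in NEIGHBORS]

def erosion_pe(window, se):
    # AND logic; unselected inputs default to 1 so they cannot force the output low.
    return int(window[1][1] == 1 and all(switched_inputs(window, se, 1)))

def perimeter_pe(window, se):
    # Center pixel on AND NAND of the selected neighbors (same default of 1).
    return int(window[1][1] == 1 and not all(switched_inputs(window, se, 1)))

def dilation_pe(window, se):
    # OR logic; unselected inputs default to 0 so they cannot force the output high.
    return int(window[1][1] == 1 or any(switched_inputs(window, se, 0)))

if __name__ == "__main__":
    se_full = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    se_cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
    window = [[0, 1, 1],
              [1, 1, 1],
              [1, 1, 1]]          # one corner pixel is off
    for name, se in (("8-connected", se_full), ("4-connected", se_cross)):
        print(name, erosion_pe(window, se), perimeter_pe(window, se), dilation_pe(window, se))
```

The example window shows why the default values matter: with the 4-connected structuring element the off corner is not selected, so it must not affect the erosion or perimeter outputs.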
6.3.4. Test and Performance
Since the tests on the imager characteristics were dealt with frequently in the previous
chapter, the detailed descriptions of the optical characteristics of this chip are not repeated.
Only basic performance parameters such as power consumption, frame rate and physical parameters
are discussed here. Instead, the tests focus on the performance of the binary operations and their
effects on the appearance of the images.
Single Chip Characteristics and Normal Mode Operation
Here, we are able to verify successful image capture with some sample
images, illustrated in Figure 6.20. As expected, the quality of the images captured by the chip
is not high, partially due to the fact that the chip process technology is not optimized for
image sensors, but instead for logic and memory. However, the subtraction of the white
background image taken at the same illumination as the captured images (see Figure 6.20
(c)) enhances the quality of the captured images by reducing pattern noise. Figure 6.20 (a) is
a raw image and Figure 6.20 (b) is a pattern noise subtracted image. There are some
noticeable differences in their image quality: the processed image is cleaner and has a higher
Figure 6.20. Real time images captured by the chip in normal mode operation. (a)
Raw image, (b) Processed image after the subtraction of white background from the
raw image, (c) White background image.
Technology: 0.35 µm CMOS
Power supply: 3.3 V
Format of array: 64 x 64
Fill factor: 82%
Maximum frame rate: 100 KHz (24 frames/s)
Power: 3.05 mA x 3.3 V = 10.065 mW at 50 KHz sampling rate
Light lux: Room light (150 - 200 lux)
Table 8. Characteristics of single chip.
contrast. The characteristics of a single chip are summarized in Table 8, including basic
characteristics of the chip.
The power consumption of the chip is about 10 mW at a frame rate of 12 frames/sec. This
includes both the normal operation and binary image processing with 4 different outputs. The
pixel size is 30.8 x 30.8 µm², which is relatively large. This is due to the interconnections and
the processing elements in the columns. Since pre-built digital components such as flip-flops
and logic gates are used from a standard library, the minimum size of the design area (~20
µm) cannot be changed. Custom layouts for these components will, however, optimize the
column width and thus reduce the pixel size. In addition, the large pixel size with the large
fill factor of the chip is necessary to increase the photosensitivity of the photodetectors,
already degraded by the poor optical characteristics of the process technology.
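The power and frame rate figures quoted above and in Table 8 can be cross-checked with simple arithmetic. The sketch below assumes one pixel is read per sampling clock on the 64 x 64 array with no blanking overhead; the function names are illustrative only.

```python
ROWS, COLS = 64, 64   # array format of the prototype chip

def frame_rate(data_rate_hz, rows=ROWS, cols=COLS):
    """Frames per second when one pixel is read per sampling clock (assumed)."""
    return data_rate_hz / (rows * cols)

def power_mw(current_ma, supply_v):
    """Static power estimate: supply current (mA) times supply voltage (V)."""
    return current_ma * supply_v

if __name__ == "__main__":
    print(f"50 KHz  -> {frame_rate(50_000):.1f} frames/s")
    print(f"100 KHz -> {frame_rate(100_000):.1f} frames/s")
    print(f"power   -> {power_mw(3.05, 3.3):.3f} mW")
```

Under these assumptions, a 50 KHz sampling rate corresponds to roughly 12 frames/s and 100 KHz to roughly 24 frames/s, consistent with the values reported for the chip.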
This poor photosensitivity also affects the frame rate of the chip. As shown in Figure 6.21, as
the frame rate increases, the quality of the captured images degrades rapidly. When the
data rate reaches around 100 KHz, it is noticeable that the image has become
degraded with pattern noise and poor contrast. Therefore, the binary image processing
typically operates at data rates of around 20 KHz and 50 KHz, which is relatively low compared to
commercial high performance chips with a data rate around 20 - 40 MHz.
In normal mode, there is a defect on the image sensor array. Even under uniform illumination,
the image displays a white half circle at the top of the image, shown in Figure 6.22 (a). This
white half circle also appears in the binary image, which has been filtered with a threshold.
Figure 6.22 (b) shows an image of edges in the binary image, illustrating half-circle
boundaries at the top of the image. This seems to be due to the cross talk and noise, which are
generated by the normal mode readout circuits located at the top. It can be verified by
observing a binary image in edge detection mode while the normal mode is turned off. When
the normal mode is turned off, the edges (boundaries) of the half-circle cannot be found, as
shown in Figure 6.23. It is concluded that the readout peripheral circuits of the normal mode at the
top of the array generate the unwanted defects in the normal operation. Reducing
the cross-talk (by putting a ground ring between the array and the readout circuits) is
expected to eliminate this defect.
Figure 6.21. Effects of frame rate in normal mode operation: (a) 10 KHz, (c) 50 KHz,
(d) 100 KHz.
Figure 6.22. When both normal mode and binary operation are on, a defect of white
spot can be found at the top of the image in both images: (a) Normal mode, (b) Edge
detection mode of a binary image.
Figure 6.23. In processing only mode, the defect in Figure 6.22 disappears
regardless of Vref: (a) Normal mode, (b) Edge detection mode of a binary image.
Operation of On-Chip Binary Image Processing
On-chip binary image processing consists of four different operations: thresholding
(binarization), erosion, dilation (later combined with erosion, it becomes an operation of
opening), and perimeter detection. Figure 6.24 shows sample images of these binary
processing operations, captured by the chip in real time operation mode. Because the outputs could not
be displayed at the same time with our testing equipment (we have only two probes), some of
the images have time differences, although the chip outputs all images in parallel.
A good demonstration of the binary processing is also shown in Figure 6.25. After capturing 
the image, the raw image of Figure 6.25 (a) is sent to the voltage comparator to generate the 
binary image, Figure 6.25 (c). Through the first set of linear arrays for the local storage and 
processing elements, the operations of erosion and perimeter detection are performed on the 
binary images. Through the operation of erosion, boundaries of objects are etched away. 
Large white spots become smaller and some small spots disappear from the image, as shown in 
Figure 6.25 (d). Another operation with the same local latches is the perimeter detection. In 
this particular chip design, the perimeter detection is applied to the binary images (see 
Figure 6.25 (e)), not to the images after the erosion. The last operation of the binary image 
processing is the opening, which eliminates all pixels in regions that are too small to 
contain the structuring element. As shown in Figure 6.25 (f), the small white spots of the 
binary image, Figure 6.25 (c), disappear after the opening, but the original shapes are 
maintained for large spots. This process can be used in object discrimination and spatial 
noise reduction. 
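The operations described above can be sketched in software on a small test image. This is only an illustrative model, assuming a full 3x3 structuring element, zero padding at the array edges, and a perimeter defined as the binary pixels that erosion removes; the function names and the test image are hypothetical and are not taken from the chip design.

```python
# Sketch of the binary operations on a small test image (full 3x3 structuring element).
# Function names and the test image are illustrative, not taken from the chip design.

def window(img, r, c):
    """Return the 3x3 neighbourhood of (r, c); pixels outside the array count as 0."""
    h, w = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def erode(img):
    # A pixel survives only if its whole 3x3 neighbourhood is '1'.
    return [[int(all(window(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]

def dilate(img):
    # A pixel is set if any pixel in its 3x3 neighbourhood is '1'.
    return [[int(any(window(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]

def opening(img):
    # Opening = erosion followed by dilation.
    return dilate(erode(img))

def perimeter(img):
    # Assumed definition: pixels of the binary image that erosion removes.
    eroded = erode(img)
    return [[p & (1 - e) for p, e in zip(prow, erow)]
            for prow, erow in zip(img, eroded)]

# 7x7 binary image: one 3x3 white spot and one isolated white pixel.
binary = [[0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        binary[r][c] = 1
binary[5][5] = 1

opened = opening(binary)
print(sum(map(sum, erode(binary))))   # white pixels left after erosion
print(opened[5][5], opened[2][2])     # isolated pixel versus centre of the large spot
```

In this sketch the isolated pixel is too small to contain the structuring element, so it is removed by the opening, while the 3x3 spot is restored to its original shape.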
The performance of the binary image processing integrated on the chip should be 
independent of the shape of the objects in the input image. Figure 6.26 shows the operation of 
the chip on input objects of different shapes (circle, triangle and rectangle), demonstrating 
that the chip operates independently of the shape of the objects. 
Interestingly, and perhaps obviously, the binary image processing is very dependent on the 
conversion from a raw image to a binary image. The conversion is accomplished by the 
voltage comparator; when the input voltage is lower than the reference voltage (bright image), 
the output is '1', otherwise, the output is '0'. As noted, choosing a proper reference voltage is 
(a) Raw image (b) Processed image 
(c) Binary image (d) Erosion 
(e) Perimeters (f) Opening 
Figure 6.24. Sample images of CMOS active pixel sensor with on-chip binary image 
processing. 
(a) Raw image (b) Processed image 
(c) Binary image (d) Erosion 
Figure 6.25. Demonstrations of binary image processing. All the images are 
captured by the prototype chip in real time mode. 
(b) Perimeters 
(c) Binary image 
(d) Erosion 
Figure 6.26. Independent operation of binary image processing from the shape of 
the objects. 
quite an important process for the binary image processing. This process is also called 
'thresholding' in the image processing field. Although there have been many studies and 
demonstrations [73] [74] [75] [94], thresholding is not an obvious and straightforward 
subject. Figure 6.27 illustrates the effects of different comparator reference voltages 
on the binary operation. It demonstrates the importance of the reference voltage 
to the binary image processing. For example, the reference voltage of 1.56 V gives the best 
results on binary images and their operation, for this particular input image and under a 
particular illumination (environment). However, this reference voltage does not give the best 
results all the time; rather, the most appropriate voltage should be chosen carefully for 
different environments and input images. When the reference voltage is low compared to the 
average pixel values of the input image, the output image mainly consists of '0', 
corresponding to a black image, where objects cannot be recognized and boundaries of the 
objects are meaningless, as shown in Figure 6.27 (b). In contrast, as the reference voltage 
increases, the gray levels of more pixels pass the reference voltage, producing '1' as 
their outputs in the binary image. The actual shape of the face becomes more 
recognizable and the boundaries of the object become more reasonable. When the reference 
voltage gets too high, most of the gray levels pass the reference voltage, generating an 
almost white image (see Figure 6.27 (h)) as the binary output image. Also, the boundaries of 
the object become meaningless once again. 
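The dependence on the reference voltage can be illustrated with a small sweep. This is a sketch only: it uses the comparator convention stated above ('1' when the pixel voltage is below the reference), and the pixel voltages are made-up values chosen for illustration, not measurements from the chip.

```python
# Comparator model: '1' when the pixel voltage is below the reference voltage.
# The pixel voltages below are made-up illustrative values, not measurements.

def binarize(pixel_voltages, vref):
    return [[1 if v < vref else 0 for v in row] for row in pixel_voltages]

def white_fraction(binary):
    total = sum(len(row) for row in binary)
    return sum(map(sum, binary)) / total

pixel_voltages = [
    [1.25, 1.35, 1.45, 1.65],
    [1.30, 1.40, 1.55, 1.65],
    [1.35, 1.45, 1.58, 1.62],
]

# Same reference voltages as the panels of Figure 6.27.
sweep = [1.20, 1.30, 1.40, 1.50, 1.56, 1.60, 1.70]
fractions = [white_fraction(binarize(pixel_voltages, v)) for v in sweep]
for vref, frac in zip(sweep, fractions):
    print(f"Vref = {vref:.2f} V -> {frac:.0%} white")
```

With these values the lowest reference gives an all-black image and the highest an all-white one; a useful reference lies somewhere in between and depends on the scene.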
Another interesting test is based on the structuring element that defines the effect of the 
neighboring pixels on the final output value of the pixel. With the 3x3 structuring element in 
this chip, the coefficients ('0' or '1') of the structuring element are controllable externally, 
which means that the connectivity of the neighboring pixels can be selected. Here, several 
different connectivities (structuring elements) are explored. Figure 6.28 shows a 
demonstration of structuring elements with different connectivities in the binary 
operation. A structuring element of 3x3 local mask (see Figure 6.28 (b)) is applied to an 
original image of a triangle, Figure 6.28 (a). With structuring elements of different 
connectivity, each operation of binary image processing (perimeter detection, thresholding, 
erosion and opening) is applied to the original triangle image. The output images of the binary 
operations are shown in Figure 6.28. 
Raw image, Binary image, Processed image, Erosion 
(b) Vref = 1.20 V (c) Vref = 1.30 V (d) Vref = 1.40 V (e) Vref = 1.50 V 
(f) Vref = 1.56 V (g) Vref = 1.60 V (h) Vref = 1.70 V 
Figure 6.27. The effects of reference voltage 
(as Vref changes, at Vc = 0.55 V, the optimal voltage for best image quality). 
(a) Original input image 
(c) 8-connected neighboring pixels 
(d) Cross 4-connected neighboring pixels 
Cross 4-connected neighboring pixels for erosion and diagonal 4-connected neighboring 
pixels for dilation (For Erosion, For Dilation) 
Figure 6.28. Connectivity: effects of different neighboring pixels of structuring 
element on binary image processing operation. 
Interestingly, even with 8-connected and 4-connected neighboring pixels in the structuring 
elements, the images of the binary operation are not greatly affected. This is because the size 
of the structuring element is too small to generate significant impact on the output images. 
The size of the structuring element, rather than its coefficients, is therefore what affects 
the output binary images considerably. Since the 3x3 structuring element in this chip is 
relatively small, the effects of changes in its coefficients are negligible. When the size of 
the structuring element becomes larger (5x5, 7x7 or more), the selection of the coefficients 
will influence the appearance of the output image. 
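The effect of the coefficient selection can be tried out with a software erosion whose 3x3 mask is passed in as a parameter. This is a minimal sketch with hypothetical names and synthetic test shapes; it assumes that a coefficient of '1' means the corresponding neighbor must be white for the pixel to survive.

```python
# Erosion with a selectable 3x3 structuring element (coefficients 0 or 1).
# Names and test shapes are illustrative; a '1' coefficient means that
# neighbour must be white for the centre pixel to survive.

EIGHT_CONNECTED = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
CROSS_4_CONNECTED = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]

def erode(img, element):
    h, w = len(img), len(img[0])

    def keep(r, c):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if element[dr + 1][dc + 1]:
                    rr, cc = r + dr, c + dc
                    # Pixels outside the array are treated as black.
                    if not (0 <= rr < h and 0 <= cc < w and img[rr][cc]):
                        return 0
        return 1

    return [[keep(r, c) for c in range(w)] for r in range(h)]

def count(img):
    return sum(map(sum, img))

# A 5x5 square and a small diamond, both inside a 7x7 array.
square = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
diamond = [[1 if abs(r - 3) + abs(c - 3) <= 2 else 0 for c in range(7)] for r in range(7)]

for name, shape in (("square", square), ("diamond", diamond)):
    print(name,
          count(erode(shape, EIGHT_CONNECTED)),
          count(erode(shape, CROSS_4_CONNECTED)))
```

For the square the two masks give the same eroded image, and for the diamond they differ only by the four pixels next to the center, which matches the observation that a 3x3 element leaves little room for the coefficients to matter.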
6.3.5. Summary and Conclusions 
In this section, we have described a design for a CMOS active pixel sensor with on-chip binary 
image processing and its analysis, along with operational performance and experimental 
results. We have explored the feasibility of local operation integrated with CMOS image 
sensors, concluding that column processing architecture is the best fit provided the 
interconnections to neighboring pixels are not excessive. As a demonstration, the operations 
of binary image processing (global thresholding, erosion, dilation and perimeter detection) 
are integrated on a single chip with a CMOS image sensor array. The on-chip real-time 
operation allows image capturing and image processing in parallel, thus permitting low 
frequency processing circuits and reducing power consumption. The binary operation, with 
one PE implemented per column (also called column processing structure), is designed with 
digital storage and logic gates, and its real-time operation with low power consumption has 
been demonstrated successfully. 
In this particular design of on-chip binary image processing, each processing element is fitted 
into a 30 µm column width, which is larger than that of the average image sensor (< 10 µm). 
However, custom layout of digital latches and logic will reduce area and optimize the 
processing power consumption. In addition, as the technology scales down, the size of the 
processing element can shrink and more metal layers can help reduce the area of the pixel 
interconnections. 
Due to a design mistake in the image sensor array, some defects are observed in the image in 
the normal mode operation. This is attributed to cross-talk between the image sensor array 
and the readout circuits, which can be eliminated by putting a guard ring around the sensors. In 
addition, the layouts are not particularly optimized for spatial or temporal noise. Layout 
optimization will help in the noise reduction of the images. It will also reduce the 
degradation seen in the image when both the normal mode and the binary processing mode 
are on. 
The design of the prototype chip is a good demonstration of one possible implementation 
structure for on-chip image processing. However, it does have limitations in terms of 
programmability. Many operations of programmable binary image processing require 
repeated or iterative processing on the images at various stages of the processing. For best 
results, the images have to be fed back to the same operation over and over again, or to 
different operations. In contrast, our on-chip binary processing takes the input image straight 
from the image sensor array, and thus it is not able to do repeated operations on the input 
image. For repeated computations, a number of processing components need to be 
designed on the data path, each independently performing its function each time. However, the 
design of the repeated operation is a trade-off between the complexity of the design, power 
and area. 
The design of on-chip binary image processing, therefore, is for low level processing 
applications where low power consumption and design cost are emphasized. This 
demonstration of the chip is intended to prove functionality and feasibility, and to serve as a 
guide to future research directions. Primary obstacles for on-chip local processing 
implementations are the design complexity of processing elements and the large 
number of interconnections between neighboring pixels. With the 0.35 µm technology, it is 
not impossible, but very difficult, to implement local masks larger than 3x3. As the 
technology scales down and more metal layers are available, this restriction on the mask size 
will be loosened and 5x5 local masks will be easily feasible in the future. In order to get 
effective results from some of the local operations, at least 5x5 local masks (or larger) should 
be applied even with the tradeoff of area, power and design complexity. However, instead of 
voltage operation in these elements, current mode operations are expected to reduce the 
design complexity. Also, current mode operation can reduce the required processing time by 
eliminating the charging and discharging of the capacitive nodes [93]. Low 
power operation is achievable with this current mode because the dynamic range of the 
output is set by current, not voltage, and is therefore less affected by the low supply voltages 
expected in more advanced CMOS technologies. A more detailed example of current mode 
processing is illustrated in Appendix A, where a modified pixel structure of an inverted 
logarithmic pixel sensor is introduced. 
Chapter VII 
7. Global Operation 
7.1. Introduction 
Approaches to image processing fall into two broad categories in terms of operational 
domain: the spatial and frequency domains. The spatial domain refers to the image plane 
itself. Approaches in this category are based on direct manipulation of pixels in an image. 
Frequency domain processing techniques are based on modifying the spatial Fourier 
transform of an image. An image in the spatial domain is converted into the frequency domain 
by the Fourier transform. Since convolution in the spatial domain is equivalent to 
multiplication in the frequency domain, applying a linear system to the image becomes 
multiplication of the image transform by a filter transfer function in the frequency domain. The 
resultant image in the frequency domain is converted to the spatial domain by taking the 
inverse transform. Many basic ideas of smoothing, sharpening or edge detection filters arise 
from concepts directly related to the Fourier transform because these filters attenuate or 
intensify only portions of the frequency components [74] [75]. 
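The equivalence between spatial convolution and frequency-domain multiplication can be checked numerically. The sketch below is an illustration under simplifying assumptions: it uses a 1-D signal and a direct discrete Fourier transform instead of a 2-D image, circular (wrap-around) convolution, and hypothetical names and test data.

```python
import cmath

# 1-D illustration of the convolution theorem with a direct DFT.
# The signal, kernel and function names are illustrative assumptions.

def dft(x, inverse=False):
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def circular_convolve(x, h):
    # Spatial-domain convolution with wrap-around indexing.
    n = len(x)
    return [sum(x[k] * h[(m - k) % n] for k in range(n)) for m in range(n)]

signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
kernel = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # two-tap averaging filter

spatial = circular_convolve(signal, kernel)

# Frequency domain: transform, multiply point by point, transform back.
product = [a * b for a, b in zip(dft(signal), dft(kernel))]
via_frequency = [v.real for v in dft(product, inverse=True)]

print(max(abs(a - b) for a, b in zip(spatial, via_frequency)))
```

The printed difference is at the level of floating-point rounding, showing that both routes produce the same filtered signal.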
Frequency domain operations, by the very nature of the frequency transforms, are global 
operations, where all the pixel values in the image are taken into consideration at once. Of 
course, a frequency domain operation can become a local (or mask) operation, based on a local 
neighborhood, by performing the transform on small image blocks instead of the entire image. 
In contrast, spatial domain processing methods include all three types of operation: point, 
local and global. Global operation in the spatial domain leads to very difficult 
(a) Ideal Low Pass Filter (b) Exponential Low Pass Filter (c) Gaussian Low Pass Filter 
Figure 7.1. Transfer function of different types of low pass filters. H (u, v) is the 
transfer function and D(u, v) is the distance from the origin. 
design issues and implementation methods, due to the connections required to all the pixels 
in the image. Therefore, global operations are implemented more like local operations 
by restricting their neighborhood to a localized area. For global operation, 
frequency domain methods are preferred to spatial domain methods because manipulation in 
the frequency domain is relatively easier and more powerful, at least in software 
implementations. There are plenty of examples of frequency domain processing, which 
are well established and well documented [74] [75]. 
In the frequency domain, there are generally three types of image enhancement filters: 
smoothing filters, sharpening filters and homomorphic filters. The smoothing filters in 
the frequency domain are similar to the smoothing filters in the spatial domain. In fact, the 
basic idea of their operation is identical: edges and sharp transitions of an image, which 
contribute significantly to the high-frequency content of the transform, are smoothed/blurred 
out by attenuating a specified range of high-frequency components in the transform of a given 
image. 
An obvious smoothing filter is the ideal low pass filter (see Figure 7.1 (a)), which attenuates 
or eliminates the high-frequency content of an image. The ideal low pass filter is 
theoretically desirable, but in practice there are no filters that match the ideal operation 
in hardware. The practical implementations of the low pass filters include the exponential low 
pass filter (see Figure 7.1 (b)) and the Gaussian low pass filter (see Figure 7.1 (c)). 
In contrast, sharpening operates in the opposite way to smoothing. Because edges and abrupt 
transitions in an image are associated with high-frequency components of the transform, 
sharpening is achieved by attenuating or eliminating the low-frequency components without 
disturbing high-frequency components of the transform. Sharpening filters are subdivided 
into high pass filters and high frequency emphasis filters. The high pass filter is exactly 
opposite to the low pass filter, and is also expressed as (1 - low pass filter). Its transfer 
function is shown in Figure 7.2. The high frequency emphasis filters, also known as 
homomorphic filters, use the characteristic that the illumination component of an image is 
typically associated with slow spatial changes, while the reflectance is associated with abrupt 
transitions, that is, with the high-frequency components of the Fourier transform of the 
logarithm of an image [74]. This control requires specification of a filter function H (u, v) 
that affects the low- and high-frequency components of the Fourier transform in different ways, 
as shown in Figure 7.3. Examples of the high frequency emphasis filters include generalized 
unsharp masking, inverse blur model, difference of Gaussians (DOG), Laplacian of Gaussian 
(LOG) and modulated Gaussian (Gabor) filters [74] [75]. 
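The relation (1 - low pass filter) and the difference of Gaussians can be sketched in the same way. The Gaussian form and the two widths used below are assumptions made for illustration; the text does not specify them.

```python
import math

# High pass response built as (1 - low pass), and a difference of Gaussians.
# The Gaussian form and the sigma values are illustrative assumptions.

def gaussian_lowpass(d, sigma):
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def highpass_from_lowpass(lowpass_value):
    return 1.0 - lowpass_value

def difference_of_gaussians(d, sigma_wide, sigma_narrow):
    # Wide Gaussian minus narrow Gaussian: zero at the origin, positive at
    # mid frequencies, and decaying again at high frequencies.
    return gaussian_lowpass(d, sigma_wide) - gaussian_lowpass(d, sigma_narrow)

for d in (0.0, 10.0, 200.0):
    print(d,
          round(highpass_from_lowpass(gaussian_lowpass(d, 10.0)), 3),
          round(difference_of_gaussians(d, 20.0, 5.0), 3))
```

The high pass response removes the zero-frequency (average brightness) term entirely, while the difference of Gaussians behaves as a band pass response.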
Figure 7.2. Transfer function of ideal high pass 
filter, which is used as a sharpening filter. 
(a) Difference of Gaussians (b) Laplacian of Gaussians 
(c) Modulated Gaussian (Gabor) filter 
Figure 7.3. Transfer functions of high frequency emphasis filters. 
In practice, small spatial masks are used considerably more frequently than the Fourier 
transform because of their simplicity of implementation and speed of operation. From the 
perspective of on-chip implementation, the spatial masks are often of more interest than the 
frequency operations. In the spatial domain, global operations can be divided into two 
categories: one is a global operation whose output is an image defined from the input 
image; the other is one whose output is information extracted from the 
input image. A resistive Gaussian filter [78] is an example of a global operation with an 
image output in the spatial domain. Although the strong influence of the neighborhood is 
limited to 5 or 6 pixels, depending on the resistive values, the resistive network connects all 
the pixels into a system, and a pixel value is affected by all other pixels. An example of an 
information-extracting global operation is the histogram operation. In order to generate the 
histogram of the input image, the operation needs all the pixel values of the array, but does 
not modify them and does not generate a new output image; the output is information about 
the input image. This output can be used to modify the output image as discussed in Ch. 5. 
Practically, the on-chip implementation of the spatial global operation with image output is 
not possible due to the heavy interconnections, unless each pixel is interactive like the 
network of the Gaussian filter, where one pixel stores other pixel values and this propagates 
through the entire array. However, even the resistive Gaussian filter does not give a 
practical realization. In contrast, the global operation with information output is relatively 
easier to implement because information of all the pixels can be collected through a common 
data channel. 
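A histogram is a simple example of a global operation with information output: every pixel is visited once, as if streamed through a common data channel, and the image itself is left unchanged. The sketch below uses a hypothetical 2-bit test image.

```python
# Histogram as a global operation with information output.
# The test image and the number of gray levels are illustrative.

def histogram(image, levels):
    counts = [0] * levels
    for row in image:            # pixels arrive one by one, as on a shared channel
        for value in row:
            counts[value] += 1   # accumulate information; the image is not modified
    return counts

image = [
    [0, 1, 1, 3],
    [2, 1, 0, 3],
    [3, 3, 1, 0],
]

hist = histogram(image, 4)
print(hist)
```

The result has one entry per gray level rather than one per pixel, and its entries always sum to the number of pixels in the array.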
7.2. Structure of Global Processing 
When it comes to the implementation of on-chip image processing, particularly for global 
operation, the system level architecture becomes an important step of the design process. 
Since we have considered global operation in terms of the frequency and spatial domains, we 
start by examining structural implementation by looking at the possible methods for these 
domains. In the frequency domain, one of the most essential components would be a Fourier 
transformer that converts from the spatial domain to the frequency domain. As before, it can 
be integrated at pixel, column or chip levels. The design of the transformer, however, takes a 
significant number of transistors, which is not appealing for a pixel level implementation. It 
is more reasonable to implement the transformer at every column of the array or at the output 
channel(s) of the chip. After the Fourier transformation, the image in the frequency domain 
would be manipulated by image processing algorithms. The manipulation is based, by the 
nature of the global operation, on the contents of all the pixel values in the array. One 
possible (the most adequate) method is to use an analog memory to hold and process the 
contents of all the pixel values. The analog memory implementation often requires high 
design complexity and precision. Also, the design area, which is roughly proportional to 
the fabrication cost and power, becomes large. Therefore, it is difficult to integrate image 
sensors, Fourier transformer and image processing elements on a single chip, even with 
low-level image processing. 
In the spatial domain, implementation of global operation is relatively easier than in the 
frequency domain, simply because no Fourier transformer is needed. Spatial 
domain processing for image output can be implemented with pixel level integration and 
analog memory processing. Because it is practically impossible to have one pixel connected 
to all other pixels with one designated channel between the pixels, the connection should 
have a characteristic of propagation similar to a resistor network. So, pixels which are not 
directly connected together may affect each other indirectly, because of the propagation 
effects of the grid connections. In pixel processing integration, therefore, each pixel should 
have this characteristic of holding and propagating, where, after the pixel captures an image 
signal with its photodetector, global processing occurs over all pixels of the array in parallel. 
The pixel still holds its own image signal, and processes and propagates all other pixel values. 
The main advantage of this implementation is parallel processing where low power 
consumption can easily be achieved. However, this implementation will suffer from severe 
reduction of fill factor in the pixel. Because of the nature of hold/propagate, the complexity 
of the circuit design is high, typically involving a large number of transistors, and sometimes, 
the use of passive elements such as capacitors. 
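The propagation behavior of a resistor network can be illustrated with a simple 1-D model. This is a sketch under stated assumptions, not a model of any circuit in this work: each node is tied to its own input through a conductance g_in and to its nearest neighbors through a conductance g_n, and the steady state is found by repeated local averaging. The names and values are hypothetical.

```python
# 1-D resistive line: each node connects to its input (g_in) and to its
# nearest neighbours (g_n). Names and conductance values are illustrative.

def resistive_line(inputs, g_in=1.0, g_n=1.0, iterations=200):
    n = len(inputs)
    v = [0.0] * n
    for _ in range(iterations):
        new_v = []
        for i in range(n):
            neighbours = [v[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Kirchhoff's current law at node i, solved for the node voltage.
            total = g_in * inputs[i] + g_n * sum(neighbours)
            new_v.append(total / (g_in + g_n * len(neighbours)))
        v = new_v
    return v

# A single bright pixel in the middle of an otherwise dark line.
impulse = [0.0] * 11
impulse[5] = 1.0
response = resistive_line(impulse)
print([round(x, 4) for x in response])
```

Although each node is wired only to its nearest neighbors, the single input reaches every node of the line with a weight that decays with distance, which is the indirect, propagated influence described above.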
In order to avoid the severe reduction of fill factor, analog memory implementation may be 
applied for global operation. By placing processing elements outside the photodetector pixels, 
the entire area of the pixel can be occupied by a photodetector and a readout buffer. As 
trade-offs for this gain in fill factor, the chip area, power and speed will be sacrificed. Also, 
there is always a time latency of one image frame because, after the image capture, the image 
frame is stored and processed in the analog memory, instead of being output directly. Similar 
to pixel processing integration, the global connections between pixels should be made using 
the method of propagation. Otherwise, it is practically difficult to implement, especially for 
large format arrays. 
Global operation with information output can be implemented with chip processing, where 
the processing element is located at the end of the common output channel, collecting 
information from the pixels and generating the final output. In order to collect information 
from all the pixels, the pixel data should go through a common processing element. Therefore, 
the processing element requires high-speed operation, at least equivalent to the pixel rate, 
causing high design complexity, high power and a potentially high digital noise level. 
The general description and comparisons are summarized in Table 9, in terms of their 
operation domains. However, there are no easy ways to integrate global operation with image 
sensors on the same focal plane. The general implementation methods for global operation 
are neither practical nor feasible due to the heavy interconnections between pixels, and due to 
the circuit complexity. Instead, the implementations for global operations should be rather 
reduce feasibility of implementation except for special 
of all pixels is not feasible 
Concurrent readout 
reduce feasibility of practical designs. 
High fill factor but degradations on chip area, power and speed 
Information output 
Impractical design due to presence of a common data channel 
Impractical design due to presence of a common data channel 
Most suitable structure, but modified structures combined with pixel and column 
processing are more practical 
Impractical design due to presence of a common data channel 
Frequency Domain 
Fourier transformer at column or chip level, and global interconnections are done at 
pixel level and analog frame memory structure. Typically, the implementation of on-chip 
Fourier transformer is complex 
Table 9. General descriptions and comparisons of global operation for different 
operation domains. 
Here, as a demonstration of global operation, we report a 2-D object positioning system with 
partially global connections, along with its implementation and performance. This 
implementation is a semi-global operation with information output. It is used to examine 
the feasibility of on-chip global implementation with information output, the design 
issues involved in the implementation, and its suitability for different applications such as 
motion analysis and object extraction. 
7.3. 2-D Object Positioning System (OPS) 
The 2-D OPS encodes 2-D information into two sets of 1-D information. Figure 7.4 
illustrates the basic operation of the OPS: whenever objects are detected, the pixels 
containing signals above a threshold send flags to the column and row simultaneously. In the 
OPS array, each pixel has a photodetector and an in-pixel voltage comparator. Whenever the 
input light level is higher than the threshold, the in-pixel comparator raises a flag on its 
corresponding row and column. Each row and column has a global logic function over its 
pixels, as shown in Figure 7.5: the shared line is at '1' only when all pixels in its 
row/column are over the threshold, and the inverter at the end of the line makes the overall 
function a NAND, as described under Chip Design. Hence, dark objects are detected 
against a lighter background by the presence of a '0' on a row or column line. For example, as 
shown in Figure 7.4, pixels corresponding to a circle send flags 
to their corresponding columns and rows, and thus the final image captured becomes a square. 
Although some information is lost, the system enables straightforward determination of the 
presence or absence of an object, as well as its size and/or orientation. Multiple objects can 
also be characterized. Moreover, simple combinational logic on the latches can be used to 
apply an object size threshold, which is otherwise difficult to achieve. 
The OPS does not require scanning readout but provides a true simultaneous readout, making 
the frame rate independent of the scanning time of each pixel. Conventionally the frame rate 
of the array, especially for large arrays, depends on the scanning time of individual pixels 
because the image signal from each pixel has to be transmitted one by one. Because of its fast 
frame rate, the OPS can be used in motion detection as well as many other dynamic image 
acquisition applications. 
In addition to the fast readout time, the OPS reduces the image data from N^2 to 2N, where N 
represents the number of rows and columns. A dual channel is used for the output, vertical 
and horizontal outputs, increasing the total output rate. With the fast readout time, data 
Decoded output image 
Figure 7.4. Structure of 2-D Object Positioning System and its basic operation. With 
only two input control signals (Reset and Select), simultaneous outputs are converted into 
two sets of linear data from the 2-D array plane. The two sets of data can be further 
processed and displayed in a 2-D plane. A circle in the original plane is interpreted 
as a rectangle in the display. 
Global NAND 
Figure 7.5. Structure of global connections. 
reduction and dual output channels, the high frame update rate is a focus of the OPS. In 
addition, the OPS uses digital signal transmission, and thus it is relatively immune to noise 
during transmission. Applications of such a threshold-based system include industrial web 
inspection, earth observation from space, robotic vision, and other applications where object 
detection with high speed is the primary goal. 
The main objectives of this prototype chip are (i) to explore the feasibility of global operation 
integrated with CMOS APS, (ii) to demonstrate high-speed operation of the OPS and (iii) to 
address limitations and future research directions of the global operation. 
7.3.1. Chip Design 
The structure of the OPS is quite similar to that of a standard CMOS image sensor array. The 
chip, whose die photo is shown in Figure 7.7, consists of photodiode pixels, in-pixel 
Column Shift Register 
Figure 7.6. Overall structure of the array with object positioning system. 
Figure 7.7. Die photo of Object Positioning Chip. 
Reset, Output 
Figure 7.8. Schematic of a pixel for 2-D Object Positioning System. It consists of 
photodiode and in-pixel comparator. The in-pixel comparator is composed of a 
common source amplifier and an inverter. The bias transistor and inverter are 
located outside the pixel. 
OPS is shown in Figure 7.6. It has a dual output channel where each channel transmits the data 
from each of the vertical and horizontal lines. The pixel has the same p-n junction photodiode, 
but it uses an in-pixel comparator instead of a source follower buffer for the pixel readout. 
An in-pixel comparator should be a simple structure using the fewest transistors possible in 
order to maintain a high fill factor. The in-pixel comparator uses a common source (CS) 
amplifier with an inverter at the end of the data line to enhance switching activity, as shown 
in Figure 7.8. Because the inverter and bias transistor can be located outside of the pixel, only 
two transistors are needed in a pixel. Vref and Vbiasp affect the speed and threshold voltage 
of the switching. Because the output of the pixel is read out vertically as well as horizontally 
to the line latches, each pixel has in-pixel comparators for each line. When the pixel detects 
an over-threshold signal, it sends a flag to both lines simultaneously. Since every pixel in the 
same line (column or row) is connected together, the values are read out to the lines 
simultaneously from every pixel in the same output line. If, during the time when the output 
value of the in-pixel comparator is sent to the outside, the light intensity is higher than the 
threshold, an output of '1' is transmitted, otherwise, '0'. Hence, whenever the pixel detects 
that the light intensity is over the threshold, the in-pixel comparator triggers the flag to the 
output line. Initially, the output of the data line is set to ground. When the light intensity is 
Figure 7.9. Schematics of a pixel and event detection latch in 2-D Object 
Positioning System. 
is switched off and the PMOS bias transistor lets the output node charge up to VDD. Since all 
the pixels in the same data line are linked together, if any comparator along the line is 
switched on, the line remains switched on. It is an AND logic function (refer to Figure 7.5). 
At the bottom of each data line, there is a skewed inverter before the latch. The inverter 
enhances the switching sharpness and speed. In the common source amplifier, Vref 
determines the lowest voltage level of the output voltage on the data line. Vref should be 
recognized as '0' by the inverter even when it is over Vdd/2. A skewed inverter was 
carefully simulated, and the optimal size ratio of its transistors was determined by increasing 
the size of the PMOS in the inverter. By adding an inverter, the overall logic function 
becomes a NAND gate; the output is '0' only when all the pixels along the line have high light 
intensity. Otherwise, the output is '1'. Hence, this system detects dark objects on a white 
background. 
In order to read out data from the array to the serial output, the data is multiplexed out after 
being stored in the latches in vertical and horizontal lines. Shift registers send enable signals 
to the latch multiplexer and transmit the data one by one to the serial output channel. The 
latch uses a simple digital component, either a flip-flop or an inverter based design. In our 
design, a flip-flop design is used for simplicity. 
7.3.2. Demonstration and Tests 
The OPS chip was fabricated in 0.35 µm CMOS technology with a 3.3 V power supply, and
has been demonstrated successfully. Imager characteristics of the chip are shown in Table 10.
When a circle is shown to the sensor array, the in-pixel comparators first digitize the shape.
The outputs of all the in-pixel comparators in a line are then NAND gated into one output
per line. Therefore, the shape of the object becomes a square, as shown in Figure 7.10. All the
shapes of the objects are encoded into squares or rectangles by the array NAND gates. This
mechanism hides some of the information that originally exists in the objects. However, some of
the critical information, such as the position and size of objects, is preserved and encoded into a
smaller amount of data at relatively high speed. By the nature of the operation, when two or
more objects exist in the field of view, false objects are created in the overlapping area of
the objects. When two different circles exist on the white background, the output image
contains two original squares and two extra squares which are falsely created in the











Characteristics of Chip
2880.5 x 2880.5 µm²
64 H x 64 V (4096) pixels
23.8 µm x 23.8 µm
72%
0.35 µm CMOS
24 frames/second at 290 lux
3.3 V
20 mA - 8 mA (conversion chip) = 12 mA at 24 frames/sec
66 mW - 26.4 mW (conversion chip) = 39.6 mW at 24 frames/sec
1-bit digital output
68 PGA
Table 10. Characteristics of chip tests.
(a) Original images presented to sensor
(b) Encoded images of the objects reconstructed from sensor output
Figure 7.10. Sample images from the 2-D object positioning chip. Input shapes
are encoded into squares or rectangles in the final output images.
(a) Original Images
(b) Reconstructed Image of the Shapes
Figure 7.11. When multiple objects exist in the input image, there are defects
(counterpart objects) in the output image.
(a) Array Vout (%) vs. light power (µW/cm²)
(b) Vbiasp vs. uniformity (curves: light power when all pixels are white; light power when all pixels are black)
Figure 7.12. Test results of the 2-D OPS imager. (a) With different Vbiasp, the responses
of outputs are drawn. (b) Non-uniformity of the OPS imager can be measured.
overlapping area of the original ones, as shown in Figure 7.11.
Figure 7.12 illustrates the relationship between Vbiasp and array uniformity. Here, no pattern
noise reduction was implemented. Figure 7.12 (a) indicates that not all pixels switch at the
same scene illumination intensity, due to a combination of pattern noise in the sensor and
non-uniformity in the comparators. In Figure 7.12 (b), the upper line represents the light
power at which all the pixels are high, and the lower line the light power at which all the
pixels are low. In the light power gap between the two lines, white and black spots co-exist
due to the non-uniform response of the pixels as well as the in-pixel comparators. The gap
between the two lines represents how much light power difference should exist for the objects
to be recognized correctly. The minimum difference in light power is consistent at different
biasing voltages. Therefore, it is necessary to remove this non-uniformity before image
processing for segmentation, object recognition, model fitting, etc.
7.3.3. Summary and Conclusions
Here, we have seen an example of an on-chip implementation of a spatial global operation
integrated with a CMOS image sensor. The 2-D Object Positioning System extracts the
coordinates of objects of interest by detecting a property of the image (in this particular case,
the property is the light intensity). The 2-D OPS encodes 2-D information into two sets of 1-D
information. The basic operation of the OPS is as follows: whenever objects are detected,
the pixels containing signals above a threshold send flags to the column and row
simultaneously. For example, pixels corresponding to a circle (or any other object) send
flags to their corresponding columns and rows, and thus the final image captured becomes a
square (or a rectangle). By encoding 2-D information, the OPS enhances the speed of the I/O
interface, which is often a bottleneck of the processing speed, especially for vision
applications. In the case of an N x N pixel array, the reduction ratio will be (N x N) / 2N = N/2,
which is significant when the array is large. In addition, the OPS chip operates in real-time
mode, which is advantageous in its operation and applications.
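The row/column encoding and its side effects can be illustrated with a minimal software sketch. It is not the chip's circuit: the function names and the 8 x 8 test pattern are hypothetical, and a line flag is simply taken to be 1 when any pixel on that line is flagged.

```python
def encode_projections(image):
    """Collapse a binary image (list of rows of 0/1) into row and column flags.

    A line flag is 1 when any pixel on that line is flagged, which mimics
    the shared data lines in software (an assumption of this sketch).
    """
    rows = [int(any(r)) for r in image]
    cols = [int(any(c)) for c in zip(*image)]
    return rows, cols


def reconstruct(rows, cols):
    """Rebuild the image implied by the two 1-D flag vectors."""
    return [[r & c for c in cols] for r in rows]


def reduction_ratio(n):
    """Bits in an n x n frame divided by the 2n line flags, i.e. n / 2."""
    return (n * n) / (2 * n)


# Hypothetical test pattern: two separate 2 x 2 objects on an 8 x 8 array.
N = 8
image = [[0] * N for _ in range(N)]
for y, x in [(1, 1), (1, 2), (2, 1), (2, 2), (5, 5), (5, 6), (6, 5), (6, 6)]:
    image[y][x] = 1

rows, cols = encode_projections(image)
decoded = reconstruct(rows, cols)

# Pixels set in the decoded image but not in the original are false objects.
ghost_pixels = sum(
    d - o for d_row, o_row in zip(decoded, image) for d, o in zip(d_row, o_row)
)
```

In this pattern the decoded image contains the two original squares plus two false squares where the flagged rows of one object cross the flagged columns of the other, which is the artifact described for Figure 7.11.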
However, the operational speed of these prototype sensors was not as high as was originally
expected. This is mainly due to the poor photosensitivity of the photodetectors. Although the rest
of this chip, other than the photodetectors, can run at high speed, the photodetectors need a
long integration time, which becomes the bottleneck of system speed. With optimization of
optical performance and noise level, the positioning system can achieve high-speed
operation.
In the particular case of motion detection, the concept of Object Positioning can be useful
due to its high speed and data reduction, even though some of the information about the object of
interest is lost during the processing and some artifacts occur when multiple objects are
located in the same field of view. The main concern in the on-chip implementation of the
concept is that the in-pixel comparators suffer from operational mismatch due to Vt and
lithographic variations. A better comparator design can be achieved at the cost of
circuit complexity and design area, which would reduce the photosensitivity of the pixel.
As the technology scales down, the scalability of the in-pixel comparator is high.
In conclusion, the implementation of true global operation in the spatial domain is not an
easy task, mainly because of its requirements for extensive interconnections, large
computational power and high design complexity. Rather than a general implementation of
the operation, the approach should be application specific and operational algorithm
dependent. Unless the application requires global (or partially global) interconnections,
the on-chip implementation of the global operation in the spatial domain is not recommended.
Chapter VIII
8. Summary and Conclusions 
Raw output images from CMOS sensors are not likely to be optimal for display or further
processing, mainly because of noise, blurriness and poor contrast. In order to minimize these
degradations, image enhancement and processing mechanisms (meaning circuit level designs
apart from device level modifications on photoreceptors) are necessary because the device
level modifications often meet baseline limitations of the standard process technology.
When real-time image acquisition and processing are desired, the integration of image
processing (vision) algorithms with image sensors has many advantages. In this thesis,
integration of image processing and image sensors is presented with a concept of smart
sensors. The integration of vision algorithms and image sensors is an attractive research field,
which can provide low fabrication cost, low power consumption and fast processing for
various applications. In addition, analog/mixed signal image processing adds the
advantages of compact size and fast continuous mode to the integration benefits of smart
sensors.
This thesis discussed two main concepts: the MOSAIC imager and smart sensors. The MOSAIC
concept was proposed to achieve a large field of view and a high scene update rate. The
MOSAIC array is described as a distributed sensor consisting of 10^2 - 10^3 identical
detection modules linked by a serial bus to a central controller. The main challenges of the
MOSAIC imagers are large data flow and slow frame rate. The design of the MOSAIC
system focuses on enhancement of the frame rate by a single-chip solution (i.e. integrating
CMOS image sensors and bus interface modules on the same focal plane). Custom bus
interface modules increased the performance of the bus connections through an efficient
zero-wait-state design, at effective cost. Therefore a MOSAIC imager comprising many single-chip
modules is capable of covering a larger field of view ( l d to l d or more) than the
conventional single-chip camera system, with an enhanced data update rate. Also, a smart
sensor with critical information extraction was proposed here as an alternative solution to the
MOSAIC imagers. The on-chip processing of the smart sensors extracts the information at
the front end of the imagers, reducing data flow and thus increasing the field of view and/or
update rate.
In the second part of this thesis, integration architectures and design methodologies were
investigated for application to analog VLSI implementations for smart sensors. The basic
concept of the integration architecture comes from the idea that, for the integration of image
processing with CMOS image sensors, vision algorithms and application specifications
should be considered, in addition to the selection of an appropriate processing circuit.
Conventionally, the integration methodology focuses on reducing the circuit density of the
processing elements integrated with image sensors. This thesis argues that not only is the circuit
density important, but the processing algorithms are sometimes more crucial.
Hence, various vision (image processing) algorithms were investigated systematically
according to interconnectivity with neighboring pixels (the region of operation). The vision
algorithms were partitioned into three major groups: point, local and global operation. These
algorithms were further sub-divided by functionality, size of local masks and operational
domain. For each sub-partitioned algorithm, different implementation architectures were
proposed and compared in terms of design area, speed, processing time, power and pixel fill
factor.
The proposed general guideline is summarized in Table 11, from which system level architectural
designers and circuit engineers can start their implementations for smart sensors
according to the algorithms they try to implement. However, designers should consider their
applications and design specifications cautiously, and should make proper modifications to
individual design components, in order to make their implementations less error prone.
[Table 11 body not recoverable from the source. Surviving labels: Point Operation (Concurrent Operations, Histogram Operations, Interframe Operations); Pixel Level. Legend: (X) Highly recommended.]
Table 11. Summary of on-chip implementation methodology for image processing algorithms.
Prototype chips for each major group of vision algorithms were designed and fabricated
in 0.35 µm CMOS technology, for the demonstration of on-chip implementation of
algorithms. Three prototype chips were implemented: an in-pixel intensity transformer for point
operation, on-chip binary image processing for local operation, and an object positioning system
for global operation. These prototype chips were tested and demonstrated successfully.
It is concluded in this thesis that on-chip image processing with image sensors will offer
benefits of low fabrication cost, low power consumption, fast processing frequency and
parallel processing. Since each vision algorithm has its own applications and design
specifications, it is dangerous to predetermine an optimal design architecture for every vision
algorithm. However, in general, the pixel and column structures appear to be the best choice
for typical image processing algorithms such as point operation and local operation.
The implementation of global operation is not recommended in the spatial domain because of the
heavy interconnections and computational power requirements. Typically, the
implementations of the global operation in the spatial domain should be modified and
adapted for application-specific environments.
Since CMOS image sensors use a standard process technology, modifications of the image
sensing process cannot be achieved easily and optimization of the image sensing properties
will not be as good as for CCD. Although many microelectronic process companies such as
TSMC, UMC and Tower Semiconductor offer specialized processes for CMOS image
sensors, CCD is still superior to CIS in terms of image quality. Typically, in order for CIS to
obtain image quality equivalent to CCD, special processes are needed, which require
modifications of the standard process. A specialized process means expensive fabrication,
which is contrary to the low cost concept of CIS. Therefore, even for the image quality
enhancement of CIS, circuit level improvements such as image processing circuits are
preferred to process level improvements in the CIS technology. The circuit level
improvements are beneficial not only for image quality, but also for low cost VLSI
integration. Sometimes, the VLSI integration of CIS is emphasized more than image
quality enhancement, for such applications as portable image devices, machine vision,
surveillance and industrial inspection. Therefore, in the future CMOS image sensors will
find their own applications where low cost and highly functional image sensing is the driving
force, even with relatively low image quality.
Appendix A: Inverted Logarithmic Pixel
Sensors with Current Readout
A.1. Introduction 
Current readout active pixel sensors are inherently advantageous in terms of readout speed
because the fixed output line voltage at the input of the transresistance amplifier prevents
charge-discharge phenomena [79]. Another benefit of current readout is current mode processing,
which is relatively compact in size and simple in its operation [80]. One drawback of
active pixel sensors with current mode is the lack of design resources. Because current mode
processing circuitry has been studied relatively little, most implementations will have
to be custom designs.
This appendix reports a CMOS active pixel sensor structure for a logarithmic pixel with
continuous current readout. Because the design is distinct from the main theme of the thesis,
it is located in this appendix. Here the arrangement of the photodiode and load found in a
conventional logarithmic pixel is reversed. The inverted logarithmic pixel sensor reduces
fixed pattern noise and eliminates the dependence of the output voltage swing on the column
load, simplifying both its operation and structural design. We include a detailed design of
the inverted logarithmic pixel sensor and its analysis, along with operational performance
and experimental results.
A.2. Inverted Logarithmic Pixel Sensors 
We report a continuous current readout logarithmic active pixel in which the conventional
arrangement of the photodiode and load is reversed. As shown in Fig.A.1(a), a conventional
logarithmic pixel employs a photodiode to generate a photocurrent and one or more
MOSFETs operating in subthreshold to act as a load. The voltage dropped across the load is
dependent on $\ln(i_{photo})$ due to this subthreshold operation. Such a configuration has
advantages of continuous operation, thereby enabling temporal as well as spatial random
access, and wide dynamic range (~6 orders of magnitude of illumination).
Figure A.1. Structures of logarithmic pixel sensors: (a) conventional log pixel, (b)
current readout with PMOS buffer, (c) inverted log pixel.
Disadvantages include high fixed pattern noise, low contrast due to a small voltage swing
(typically 200 mV for the entire 6-order range of illumination), and relatively poor response at
low illumination [81] [82]. The complement of the current readout technique of this pixel
structure is also possible, where the load and photodiode positions are the same as in the
conventional logarithmic pixel, but a PMOS buffer transistor is used (see Fig.A.1(b)). The
voltage generated across the load by the photocurrent appears as $V_{gs}$ of the PMOS buffer
transistor (M1 in Fig.A.1(b)). As the light intensity increases, the $V_{gs}$ of the PMOS transistor
increases, generating an output current equal to $K(V_{gs} - V_T)^2$. However, PMOS
transistors are known to have higher lithographical mismatch than NMOS [89] [90], so this
structure is expected to display higher fixed pattern noise. That said, as technology
develops and more attention is paid to every level of the process, the mismatch of PMOS
transistors is no longer necessarily worse than that of NMOS transistors; it is
process dependent.
Another contribution to the low voltage swing of the conventional logarithmic pixel is a
trade-off in the choice of column bias; to maximise $V_{out}$, a low V ~ U U is required, but a high
$V_{out}$ reduces $V_{gs}$ for M1. In our design (see Fig.A.1(c)), the positions of the photodiode and
load are reversed, so the voltage generated across the load appears directly as $V_{gs}$. Because
PMOS transistors consume more area than NMOS transistors owing to the implementation of wells,
the use of PMOS transistors is often avoided.
Moreover, the response of the pixel is now dependent on local, as opposed to global,
matching of MOSFET characteristics. A larger than average local W/L (caused by
Figure A.2. Simulated effect of lithographic deviation on a regular logarithmic pixel
sensor. As w is varied with a fixed l = 0.35 µm, the output current of the driving
transistor, M1, changes significantly. At Iph = 10 pA, variation of w leads to
approximately 30 µA (~130 % of output swing) of output current, while the output
swing between Iph = 0 and 30 pA is only 23 µA.
Figure A.3. Simulated effect of lithographic deviation on an inverted logarithmic
pixel sensor. Only a small variation of output current is caused by the
lithographical deviation of W/L: about 20 % of output swing.
lithographic deviation) means that, while the photocurrent generates less voltage across the
load, β for M1 will be increased in partial compensation. In contrast, a higher than average
W/L in the conventional logarithmic pixel (see Fig.A.1(a)) leads to an increase of the M1 $V_{gs}$,
which is compounded by the increased W/L of M1 itself. Simulations of the effects of the
lithographic deviation, shown in Fig.A.2 and A.3, illustrate the fixed pattern noise suppression of
the inverted logarithmic pixel sensors. Variations of W/L of transistors in the conventional
logarithmic pixel sensor produce a large variation of output current, about 142% of the
output swing (from Iph = 0 to 30 pA), compared with only 18% in the inverted
logarithmic pixel sensor. Hence, this inverted logarithmic pixel is expected to display
reduced pattern noise and a larger output swing than conventional logarithmic pixels, while
maintaining continuous readout and wide optical dynamic range.
Figure A.4. Schematic view of the sensor structure. The output current is
converted to a voltage by an external transresistance amplifier. The use of a single
(a) Single junction photodiode with floating diffusion at n-type side
(b) Double junction photodiode with floating diffusion at p-type side
Figure A.5. Structures of photodiode used for the inverted logarithmic
pixel sensors.
A.3. Testing and Measurements 
Here, this pixel structure has been implemented with a current-mode readout (Fig.A.4),
where the column load is replaced by an off-chip transresistance amplifier. Since the locations of
the photodiode and loads are reversed, the layout of the photodiode (shown in Fig.A.5(b)) is different
from the normal one (Fig.A.5(a)). Current readout has well-known advantages of reduced
column charging/discharging, low noise, and ease of analog signal processing. However, it is
not normally implemented in integrating pixels owing to the difficulty of on-chip pattern
noise correction; this is of lesser concern here because continuous pixels typically require
off-chip pattern noise correction. In our case, reading out $i_{out}$ also serves to decompress the
logarithmic dependence of $V_{out}$ on $i_{photo}$, since now $i_{out} \propto (V_{gs})^2 \propto [\ln(i_{photo})]^2$. In normal
operation Vref = 0 V.
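The decompression effect can be made concrete with a small numerical sketch. It assumes a logarithmic load voltage feeding a square-law buffer above threshold; every parameter value below is hypothetical and chosen only for illustration, not taken from the fabricated chip.

```python
import math

# All parameter values are hypothetical, chosen only for illustration.
N_LOAD = 3          # number of stacked load transistors
SLOPE = 1.5         # assumed subthreshold slope factor
V_THERMAL = 0.0259  # thermal voltage kT/q near room temperature, in volts
I_0 = 1e-15         # assumed subthreshold scale current, in amperes
K = 50e-6           # assumed square-law gain factor, in A/V^2
V_T = 0.6           # assumed threshold voltage, in volts


def load_voltage(i_photo):
    """Voltage across the load, logarithmic in the photocurrent."""
    return N_LOAD * SLOPE * V_THERMAL * math.log(i_photo / I_0)


def output_current(v_gs):
    """Square-law buffer current K (Vgs - VT)^2, zero below threshold."""
    overdrive = v_gs - V_T
    return K * overdrive ** 2 if overdrive > 0 else 0.0


# Compare the relative swing over six decades of photocurrent.
v_low, v_high = load_voltage(1e-12), load_voltage(1e-6)
i_low, i_high = output_current(v_low), output_current(v_high)
voltage_ratio = v_high / v_low
current_ratio = i_high / i_low
```

Under these assumptions the load voltage changes by only a small factor over the six decades, while the output current changes by a considerably larger factor, which is the sense in which current readout decompresses the logarithmic response.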
Photoresponse characteristics for single pixels with various numbers of load transistors are
shown in Fig.A.6. A consequence of the structure shown in Fig.A.1(c) is that M1 operates in
sub-threshold at low light intensities. Now $i_{out} \propto e^{V_{gs}}$ and $V_{gs} \propto \ln(i_{photo})$, giving a region
where $i_{out} \propto i_{photo}$. At higher illumination, the $[\ln(i_{photo})]^2$ variation is observed. Fig. A.7(a)
Figure A.6. Variation of the photoresponse of the inverted logarithmic pixel with
number of load transistors. (Horizontal axis: input light, 0 to 200 lux.)
shows a sample unprocessed image captured on a 64 x 64 array of 30 µm pixels with 3 load
transistors, implemented in a standard 0.35 µm CMOS process (see also Fig.A.8), whose chip
testing is summarized in Table A.1. While the image can clearly be seen (in contrast to many
conventional logarithmic pixel sensors), pattern noise is still significant. Some of this is due
to the fabrication process, which gives a high fixed pattern noise even for integrating mode
sensors (~1.3% of saturation). An image corrected by subtraction of a background reference
is shown in Fig.A.7(b). To obtain the best results, this reference image is a white image captured
at the same average illumination as the original, indicating the presence of photo-response
non-uniformity (PRNU). The PRNU for the sensor is plotted in Fig.A.9; this is to be
compared with conventional logarithmic pixels where PRNU is typically ~50% of the mean.
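The reference subtraction can be sketched in a few lines. This is a minimal illustration, not the processing chain used for Fig.A.7: the function name is hypothetical, frames are plain lists of rows, and adding back the reference mean (to keep the overall brightness) is an assumption of this sketch.

```python
def subtract_reference(raw, reference):
    """Subtract a white reference frame from a raw frame, pixel by pixel.

    The reference mean is added back so the corrected frame keeps a
    comparable brightness level (an assumption of this sketch).
    """
    pixels = [v for row in reference for v in row]
    ref_mean = sum(pixels) / len(pixels)
    return [
        [r - w + ref_mean for r, w in zip(raw_row, ref_row)]
        for raw_row, ref_row in zip(raw, reference)
    ]


# Hypothetical 2 x 2 example: a flat scene seen through fixed pixel offsets.
offsets = [[1.0, -2.0], [0.0, 3.0]]
raw = [[10.0 + o for o in row] for row in offsets]
reference = [[20.0 + o for o in row] for row in offsets]
corrected = subtract_reference(raw, reference)
```

Plain subtraction removes only the offset component of the pattern noise. A gain-type non-uniformity changes with signal level, which is consistent with the observation above that the reference works best when captured at the same average illumination as the image.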
To illustrate the advantages of current-mode readout for low voltage operation, the supply
voltage has been reduced from its standard value of 3.3 V (Fig.A.10); the sensor works well
down to VDD = 2.5 V, but degrades rapidly thereafter. Variation of images with Vref is shown
(a) Unprocessed image
(b) White background pattern noise (illumination pattern noise)
(c) Pattern noise subtracted image
Figure A.7. (a) Raw image captured under room light of approximately 200 lux. (b)
White background image. (c) Image corrected by subtraction of a white image.
Figure A.8. Photograph of the image sensor die. Total die 
area is 16 mm2. 
Chip size 
Pixel size 
Format of array 
Fill factor 











200 kHz (Sampling Rate)
3.30 mA x 3.3 V = 10.89 mW at 50 kHz
sampling rate
180 lux
Table A.1. Electrical and optical characteristics of the Inverted Logarithmic
Sensor chip.
Figure A.9. Variation of rms pattern noise with illumination. In the absence of a
well-defined saturation signal, pattern noise is expressed as a percentage of the
mean output voltage at each point.
(a) Vdd = 3.3 V
(b) Vdd = 2.5 V
Figure A.10. Effect of image sensor VDD on image quality. VDD is nominally 3.3 V
for this technology.
(b) Vin = 0.2 V
(c) Vin = 0.3 V
(d) Vin = 0.4 V
(e) Vin = 0.5 V
(f) Vin = 0.7 V
Figure A.11. Effect of transresistance amplifier reference voltage on image
quality.
(a) Sampling Rate = 10 kHz
(c) Sampling Rate = 50 kHz
(d) Sampling Rate = 100 kHz
(e) Sampling Rate = 200 kHz
Figure A.12. Effect of data sampling rate on image quality. Because current
readout does not have charging/discharging phenomena, it can achieve a high
frame rate.
in Fig.A.11, illustrating the relative independence of the pixel operation from the column voltage,
and hence its insensitivity to the input resistance of any subsequent image processing stages.
There are no charging/discharging phenomena in current readout mode, which eliminates the RC
time constant. Thus, the inverted log pixel sensor with current readout can have a high frame rate.
Maximum data rates (directly related to frame rates) of active pixel sensors in integration
mode typically depend on the RC time constant of the S/H and its output drivers. In contrast, the
inverted log pixel sensor does not have any S/H circuits or output drivers, and so experiences no
time delay in data readout. However, due to the slow time response of logarithmic transistors in
the subthreshold region, images are degraded at high data rates. Fig.A.12 shows
images captured at different data sampling rates. With a particular processing
technology, the typical maximum data rate of APSs in integration mode is around 50 kHz, while
the inverted log pixel does not experience any image degradation at 100 kHz,
giving a higher frame rate.
A.4. Conclusions
The reversed arrangement of the photodiode and load causes the voltage generated across the
load by the photocurrent to appear directly as the gate-source voltage of the in-pixel buffer
transistor. This configuration eliminates the dependence of the voltage swing on the column
load. Pattern noise is also reduced compared with conventional logarithmic pixels because global
variations of threshold voltage are less significant. In addition, the readout technique of the
pixel sensor demonstrates reduced signal compression, improved output swing and increased
frame rate. However, the independence from the load is not as effective as expected: images are
degraded when Vref is higher than 0.7 V. Although pattern noise is reduced as expected, there
are still noticeable degradations due to pattern noise, and thus further processing of the
images is required.
Appendix B: Basic Procedures for Image 
Capture Test 
The very first step in the CMOS image sensor test is to capture an image of the best quality
with the chip, verifying test board connections, control input patterns, image display software
setup and, most importantly, the design of the chip. Here is the procedure of the image-capturing
test.
1. Wiggling Test 
Without a lens, as a light source is turned on and off (or a light source is swirled in front of
the image sensor chip) in a dark room, the output voltage of the chip should go up and down
and the images in the display should become bright and dark, indicating that the chip is capable
of sensing the incident light. Generally, the wiggling test verifies the input control patterns,
the image display system and the test board connections. It does not give a full verification of
the setup, but gives a basic setup to start the test with.
2. Flash Light Circle Test 
With an approximate setup of a lens (focal length adjustments and alignment to the chip), as
a light source (e.g. a flash light) is turned on in front of the image chip in a dark room, the
output image should contain a white circle. If no circle can be captured, the bias
voltages can be changed until an appropriate shape of the circle is captured. If no circle
can be obtained with many different bias voltages, check the input patterns, the image
display system and the test board connections. This step helps to find the appropriate bias
voltages for the chip.
3. Final Image Capture 
With the same setups (input patterns, display system, bias voltages and test board), a
stationary object is placed in front of the chip with appropriate illumination. As the focal
distance and the alignment of the lens to the chip are changed, the chip should capture the
stationary object. The first image might be blurred, but an appropriate adjustment of the lens
will sharpen the image and give an image of the best quality with the chip.
Appendix C: Image Sensor Characteristics 
C.1. Basic Measurements 
• Measure the frame rate (sampling rate) at which the best image quality for a given image
is obtained.
• Measure the power consumption (nominal current flowing from Vdd) for various
images at that frame rate.
• Measure and save image files at a fixed wavelength and a fixed frame rate while
changing the illumination (light power or intensity, but lux is preferred) from 0
until the output voltage is saturated. In addition, the wavelength and frame rate can be
varied for another test. This measurement can be used directly for photosensitivity,
PRNU and saturation level. Also, it can be used for calculation of conversion efficiency.
• The above measurement assumes that there is enough settling time
between the different illuminations (due to light temperature). After the
light intensity of the light source is changed, it fluctuates until it settles at a
constant value. Sometimes it is difficult to avoid the light temperature effects. In that case,
the integration time is changed instead of the light intensity. Measure and save
image files (> 100 files at each Tint) at a fixed light illumination and a fixed
wavelength (typically 540 nm, green light) while changing the integration time. For
each image file, the mean and variance are calculated.
• Measure and save image files at fixed illumination (light power, W/m², is preferred)
and a fixed frame rate while changing the wavelength of the incoming light using a
monochromator. The illumination and frame rate can be varied for another
measurement. This measurement can be used for spectral response.
• Measure and save image files in a dark room while changing the integration time (sampling
rate). This measurement can be used directly for FPN and dark current. To
measure temporal noise, save enough image files (> 100 files) under a carefully controlled
environment, because temporal noise is very sensitive to any environmental
change.
C.2. Imager Characteristics Extraction and Calculation 
Fill factor [%]
Photo-sensitive area / total pixel area
Frame rate [frames/second]
The maximum frame rate (sampling rate)
Power [mW]
Measure power (current from the Vdd node of the power supply) when the imager captures
a normal image (background of the lab) and in the dark room
Photosensitivity [V/(lux*sec)] & Linearity
(1) As different light intensities (lux is preferred to light power, W/m²) shine on the sensor
array, measure the output voltage at a given frame rate (sampling rate)
[Plot: output voltage vs. light intensity. Slope = photosensitivity * Tint; fluctuation of slope = linearity.]
[Plot: output voltage vs. Tint (integration time). Slope = photosensitivity * light intensity.]
Quantum Efficiency
With the previous measurements of saturation level and conversion gain, the QE can be
calculated as follows
QE = Saturation Level / (light power * Effective photodiode area * reflection loss * Tint * &)
Conversion Efficiency
(1) After measuring the output voltage (or delta V) at a given light intensity (light power)
when the output voltage is saturated, calculate the total number of electrons generated by the
light intensity with assumed QE, optical reflection, fill factor and photodiode capacitance.
Then the conversion efficiency is the saturation level divided by the total number of
photon-generated electrons.
Total # of photon-generated electrons (ne) = (P x Aeff x QE x R x Tint) / Ephoton
Where
P = light power per unit area
Aeff = photo-sensitive area (approximately pixel area x fill factor)
QE = Quantum efficiency (about 40% - 60%)
R = Optical reflection (about 60 - 70%)
Tint = integration time
Ephoton = Energy per photon (Eph = hf; in eV, Eph = 1.24 / wavelength in µm = 1240 / wavelength in nm)
Conversion Efficiency = Saturation level / ne
(2) Calculate the mean and the variance of the images captured at a fixed Tint. The 
conversion efficiency is simply 
g, = variance / mean 
This calculation is repeated for different Tint. The same resultant value should be obtained
for the different Tint.
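Both conversion efficiency methods reduce to short calculations, sketched below. The function names and the operating point (0.1 W/m² at 540 nm, 4e-10 m² effective area, QE of 0.5, R of 0.65, 10 ms integration, 1 V swing) are hypothetical. For method (2) the samples are assumed to have the dark offset already removed.

```python
from statistics import mean, pvariance

PLANCK = 6.62607015e-34     # Planck constant, in J*s
LIGHT_SPEED = 2.99792458e8  # speed of light, in m/s


def photon_energy(wavelength_nm):
    """Energy per photon in joules, E = h*c / wavelength."""
    return PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9)


def photo_electrons(power, a_eff, qe, r, t_int, wavelength_nm):
    """ne = (P x Aeff x QE x R x Tint) / Ephoton, with P in W/m^2, Aeff in m^2."""
    return power * a_eff * qe * r * t_int / photon_energy(wavelength_nm)


def conversion_efficiency(saturation_level, n_e):
    """Method (1): saturation level divided by photon-generated electrons."""
    return saturation_level / n_e


def variance_over_mean(samples):
    """Method (2): variance / mean of the signal samples taken at one Tint."""
    return pvariance(samples) / mean(samples)


# Hypothetical operating point, for illustration only.
n_e = photo_electrons(0.1, 4e-10, 0.5, 0.65, 0.01, 540.0)
volts_per_electron = conversion_efficiency(1.0, n_e)
```

Method (2) relies on the signal variance being dominated by photon shot noise, for which the variance of the electron count equals its mean; that is why the ratio should come out the same at every Tint.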
Spectral Response 
Measure output voltage (or delta V = Vout in dark - Vout at light) at different wavelengths
(different wavelength filters in the monochromator) at a fixed light illumination
[Plot: output voltage vs. wavelength (400-700 nm, visible light spectrum)]
Saturation Level 
(1) It is the maximum output voltage swing. As the light power increases at a given frame
rate (sampling rate), the maximum delta V will saturate to a minimum voltage value. The
saturation level = Highest output voltage - Lowest output voltage.
[Plot: voltage (V) vs. light intensity (light power)]
(2) As Tint changes, plot the mean output values of the images along with the integration
time.
[Plot: mean output value vs. Tint (integration time)]
Dark Current 
Measure and save the whole frame (whole image) in the dark room while changing the integration
time (sampling rate).
[Plot: average value of whole image vs. integration time (sampling rate)]
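One way to read a dark signal rate off this plot is a least-squares line through the frame averages. The sketch below assumes that approach; the function name and the sample values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical dark-frame averages (V) at four integration times (s).
t_int = [0.01, 0.02, 0.04, 0.08]
dark_mean = [0.105, 0.110, 0.120, 0.140]
dark_rate, offset = fit_line(t_int, dark_mean)  # V/s and V
```

The slope is the dark signal rate in volts per second. Converting it to a current would additionally require the effective capacitance of the sensing node.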
Fixed Pattern Noise (FPN) 
Measure and Save the whole fiame (whole image) in the dark room and calculate the 
variancelstandard deviation of the image file in Vpp or Vrms or % (Vnns 1 saturation level) 
PRNU (Photo-Response Non-Uniformity) 
(1) Measure and save the whole frame (whole image) at different light power (intensity) and 
at a fixed frame rate and at a fixed wavelength. Then calculate the variance/standard 
deviation of the image file in Vrms or Vpp or % (Vrms / mean) 
(2) Measure and save the image files at different Tint and at a fixed light illumination and at 
a fixed wavelength. Then calculate the variance/standard deviation of the image files in Vpp, 
Vrms and %. 
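The FPN and PRNU figures described above reduce to the same spatial statistics with a different normalization. The following is a minimal sketch, assuming each frame is available as a flat list of pixel voltages; the function names are hypothetical. Note that a single frame also contains temporal noise, so averaging several frames before computing the spread may be preferable in practice.

```python
import statistics


def nonuniformity(frame_v, reference_v):
    """Return (Vrms, Vpp, percent), where percent = 100 * Vrms / reference."""
    v_rms = statistics.pstdev(frame_v)       # standard deviation over all pixels
    v_pp = max(frame_v) - min(frame_v)       # peak-to-peak spread
    return v_rms, v_pp, 100.0 * v_rms / reference_v


def fpn(dark_frame_v, saturation_v):
    """FPN of a dark frame, with % expressed relative to the saturation level."""
    return nonuniformity(dark_frame_v, saturation_v)


def prnu(lit_frame_v):
    """PRNU of an illuminated frame, with % expressed relative to the mean level."""
    return nonuniformity(lit_frame_v, statistics.fmean(lit_frame_v))


if __name__ == "__main__":
    frame = [1.0, 1.0, 3.0, 3.0]  # made-up pixel voltages
    print("FPN  (Vrms, Vpp, %):", fpn(frame, 2.0))
    print("PRNU (Vrms, Vpp, %):", prnu(frame))
```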
Temporal Noise 
Measure the consecutive samples of the output voltage for one pixel in the array with 
different Tint (typically the value measured in the dark room is used for SNR and DR). The 
number of samples should be large (> 100) and a carefully controlled environment is required 
because the temporal noise measurement is very sensitive to the environment. 
Signal to Noise Ratio (S/N or SNR) 
Calculated as Saturation level / output temporal noise in the dark 
Dynamic Range 
Calculated as Saturation intensity / Temporal noise equivalent intensity. It should be the same or 
nearly the same as S/N if the photoresponsivity is linear (output voltage difference is 
proportional to input light intensity). 
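The temporal noise, S/N and dynamic range definitions above can be expressed as a short sketch. The names are hypothetical, the samples are assumed to be consecutive output voltages of one pixel, and the conversion to decibels with 20*log10 is a common convention added here as an assumption rather than something stated in the text.

```python
import math
import statistics


def temporal_noise_rms(samples_v):
    """RMS deviation of consecutive samples of one pixel (use > 100 samples)."""
    return statistics.stdev(samples_v)


def snr(saturation_v, dark_noise_rms_v):
    """S/N = saturation level / output temporal noise in the dark."""
    return saturation_v / dark_noise_rms_v


def dynamic_range(saturation_intensity, noise_equivalent_intensity):
    """DR = saturation intensity / temporal noise equivalent intensity."""
    return saturation_intensity / noise_equivalent_intensity


def to_db(ratio):
    """Express a voltage or intensity ratio in decibels (assumed convention)."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    ratio = snr(1.0, 0.001)  # 1 V saturation, 1 mV rms noise (made-up values)
    print("S/N ratio:", ratio, "in dB:", to_db(ratio))
```

For a linear photoresponse, `snr` and `dynamic_range` should return nearly the same ratio, which gives a quick sanity check on the measured data.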
Photon Shot Noise, kTC noise 
They either cannot be measured separately or are difficult to measure separately. They are 
included as part of the temporal noise or readout noise. 
Effective Capacitance 
Measured from a test structure of the photodiode. 
Measured from calculation with light power, QE, photosensitive area (fill factor), optical 
reflection and output voltage. The total # of photons generated by the light intensity is 
calculated and then degradation by QE, photosensitive area and optical reflection is applied 
to the total # of photons. Then the effective capacitance = charge of photon-generated 
electrons / output voltage (delta V) 
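The calculation route can be illustrated as follows. This is a sketch with hypothetical names: the number of photon-generated electrons is assumed to have been estimated already (light power, QE, photosensitive area and optical reflection applied), and the second function uses the standard relation that the charge-to-voltage gain of a capacitance C is q / C.

```python
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge, coulombs


def effective_capacitance(n_e, delta_v):
    """Effective capacitance (F) = charge of photon-generated electrons / delta V."""
    return ELECTRON_CHARGE_C * n_e / delta_v


def gain_uv_per_electron(capacitance_f):
    """Voltage step per electron on the given capacitance, in microvolts."""
    return ELECTRON_CHARGE_C / capacitance_f * 1e6


if __name__ == "__main__":
    c_eff = effective_capacitance(1.0e5, 1.0)  # made-up: 1e5 electrons give 1 V
    print("effective capacitance (F):", c_eff)
    print("gain (uV/e-):", gain_uv_per_electron(c_eff))
```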
APPENDIX C 
C3. Image Sensor Characteristics 







Chip Size | mm x mm | Physical measure 
Pixel Size | µm x µm | Physical measure 
Format of Array | Physical measure 
Fill Factor | % | Physical measure 
Max. Pixel rate (megapixels/sec) | 40 MHz (analog), 100 MHz (digital) 






Single serial port 
From the frame time 
(e.g. 33 milliseconds) 
to less than one row 
read time (e.g. a few 
microseconds) 
30 - 100 mW for 
VGA format (640 x 
480) 






Fastest frame rate with 
reasonable image quality 
Power at the nominal frame 
rate 
Illumination 
Lux or Flux 
lux or W/m2 | Illumination in environment 
1 W/m2 = 70 lx (visible white 
light) to 180 lx (visible + 
NIR) 
V / (lx*sec) 
Or 
Output voltage [V/s] vs. 
input light power [W/m2] 
(V/s)/(W/m2) 
Output voltage [V/s] vs. 











Temporal Noise (NJ 
Signal to Noise 
Ratio (S/N, SNR) 
Dark Signal 
Vrms or Vpp 
or % 
Vrms or Vpp 
or % 
Vpp or Vrms 
or % 
Average output current (or 
QE) in unit area, at 
wavelengths under a light 
power [W/m2], OR 
Ratio between photo current 
and light power for a given 
wavelength. 
Output voltage per unit of 
input signal charge 
Variance / Mean of output 
Max output voltage in dark - 
Min output voltage in bright 
Vrms at a given wavelength 
and illumination 
% = Vrms / mean level 
or Vpp / mean level 
Static spread of (dark) 
voltages of all pixels of array 
Vpp / saturation level 
RMS of consecutive samples 
of the output voltage for one 
pixel 
Output signal voltage range / 
output signal noise in the 
dark 
Saturation intensity / Noise 
equivalent intensity 
If input light is linear with 
output signal, SNR = DR 
Signal voltage drop in the 
dark, due to dark current 
(Output voltage in dark - 
output voltage) / Integration 
time 
3 - 35 µV/e- (output 
referred) 
500 mV (Vdd = 3.3 
V) - 2 V (Vdd = 5 V) 
1-10 % Vpp or 
0.8-2 % Vrms 
of mean level 
1 mV - 50 mV Vrms 
1 mV - 30 mV rms 
< 1 % Vpp 
< 200 µV rms 









current in the dark per pixel, 
or normalized per unit area 
# of generated electrons / # 
of "impinging photons" 
Or 
QE = SR x hv / q 
Photo charge / Output 
voltage 
All the noise measured and 
input referred 
20% (photogate) - 
40% (photodiode) 
15 e- (photogate) - 50 
e- (photodiode) 
[l] Eric R. Fossum, "CMOS Image Sensors: Electronic Camera-On-A-Chip", IEEE 
Transactions on Electron Devices. Vo1.44, No. 10, pp. 1689-98,Oct.97 
[2] S. Morrison. "A new type of photosensitive junction device", Solid-State Electron, Vo1.5, 
pp.485-94, 1963 
[3] J. Horton, R. Mazza, and H. Dym, "The scanistor-A solid-state image scanner", in Proc. 
IEEE, Vo1.52, pp. 15 13-28, 1964 
[4] M.A. Schuster and G. Stull, "A monolithic mosaic of photon sensors for solid state 
imaging applications", IEEE tram. Electron Devices, Vol.ED- 13, pp.907- 12, 1966 
[SI G.P. Weckler, "Operation of p-n junction photodetectors in a photon flux integration 
mode", IEEE J. Solid-State Circuits, Vol-SC-2, pp.65-73, 1967 
[6] R. Dyck and G. Weckler, "Integrated arrays of silicon photodetectors for image sensing", 
IEEE Trans. Electron Devices, Vol.ED- 1 5, pp. 196-20 1, 1968 
[7] P. Noble, "Self-scanned silicon image detector arrays", IEEE Trans. Electron Devices, 
Vol.ED- 1 5, pp.202-209, 1 968 
[8] W.S. Boyle and G.E. Smith, "Charge-coupled semiconductor devices", Bell Syst. Tech. J. 
Vo1.49, pp.587-93, 1970 
[9] R. Melen, "The tradoff in monolithic image sensors: MOS versus CCD,  Electron., 
Vo1.46, pp. 106- 1 1 1, May 1973 
[IO] S. Ohba et al., "MOS area sensor: Part II-Low noise MOS area sensor with anti- 
blooming photodiodes", IEEE Trans. Electron Devices, Vol.ED-27, pp. 1682-7, Aug. 1980 
[I l ]  K. Senda, S. Terakawa, Y. Hiroshima, and T. Kuoii, "Analysis of charge-priming 
transfer efficiency in CPD image sensors", IEEE Trans. Electron Devices, Vo1.ED-3 1, pp. 
pp. 1324-8, Sept. 1984 
[12] H. Ando et al., "Design consideration and performance of a new MOS imaging device?', 
IEEE Trans. Consumer Electronics, Vol.ED-32, pp. 1484-9, May 1985 
[13] D. Renshaw, P. Denyer, G. Wang and M. Lu, "ASIC image sensors", IEEE Int. 
Symposium of Circuits and Systems, pp.30384 1,1990 
[14] S. Mendis, S. Kemeny, and E. Fossum, "CMOS active pixel image sensor", IEEE Trans. 
Electron Devices, Vol.ED-4 1, pp.452-3, 1994 
[15] Alireza Moini, "Vision chips or seeing silicon", Technical Report, Centre for High 
Performance Integrated Technologies and Systems, The University of Adelaide, March 1997, 
(www.eleceng.adelaide.ed~au/GroupdGAAS/B~geye/vi~i~n~hipdinde~.html). 
[16] J.J. Zarnowski, M. Pace, M. Joyner " 1.5 FET-per-pixel standard CMOS active column 
sensor", Proceedings of SPIE, Vo1.3649-27 
[17] M.N. Al-Awa, et al. "A Real Time Vision Architecture ushg a Dynamically 
Reconfigurable Fast Bus', Image Processing and Its Applications, pp.470-4, Jdy 1995 
1181 Doug Tody, T h e  Data Handling System for the NOAO Mosaic", Astronomical Data 
Analysis Software and Systems VI, ASP C o d  Series 125, pp.45 1-4 
Cl91 Qinglian Guo, et al, "Generation of High-Quality Images for Tele-medicine and Tele- 
pathology Efforts", Proceedings of the 2 0 ~  Annual International Conférence of the IEEE 
Engineering in Medicine and Biology Society, v01.20, N0.3, pp. 1288-9 1, 1998 
[20] Suneira Mendis, "CMOS Active Pixel Image Sensors with On-Chip Analog-to-Digital 
Conversion", Ph.D. thesis at Columbia University, 1995 
(211 P. Lee, et al. "An active pixel sensor fabricated using CMOSICCD process technology", 
1995 IEEE Workshop on CCDs and Advanced Image Sensors, Dana Point, CA 
[22] C.Y. Wu and C-Chiq "A new structure of the 2-D silicon retina", IEEE Journal of 
Solid-State Circuits, Vo1.30, No.8, August 1995 
[23] E. Vittoz, "Analog VLSI signal processing: M y ,  where and how?" Analog Integrated 
Circuits and Signal Processing, Vol. 6, pp. 27-44, 1994 
[24] GR. Nudd, et al. "A charge-coupled device image processor for smart sensor 
applications", SPIE Proc. Vol 155, pp. 15-22, 1978 
[25] T. Knight, "Design of an integrated optical sensor with on-chip processing", PhD thesis, 
Dept. of Electrical Engineering and Cornputer Science, MIT, Cambridge, Mass., 1 98 3 
1261 C.L. Keast and C.G. Sodini, "A CCDKMOS-based imager with integrated focal plane 
signal processing", IEEE Journal of Solid State Circuits, Vol. 28, No. 4, pp.43 1-7, 1993 
[27] A.M. Chiang, M.L. Chuang, "A CCD programmable image processor and its neural 
network applications", IEEE Journal of Solid State Circuits, Vol. 26, No. 12, pp. 1894-1901, 
1991 
[28] W. Yang, "Analog CCD processors for image filtering", SPIE Proc., Vol. 1473, pp. 114- 
27 
1291 E.R. Fossum, "Architectures for focal plane processing", Optical Eng., Vo1.28, No.8, pp. 
866-87 1, 1989 
[30] 2. Zhou, B. Pain, E. Fossum, "Frame-transfer CMOS Active Pixel Sensor with pixel 
binning," IEEE Trans. On Electron Devices, Vol. ED-44, pp. 1764-8, 1997 
[3 11 Abbas El Gammai, "Pixel Level Processing - Why, What and Why?' SPIE, Vol. 3650, 
pp.2 - 13, 1999 
1321 Bedabrata Pain, "Approaches and analysis for on-focal-plane analog-to-digital 
conversion", SPIE, Vol. 2226, 1 994 
[33] T. Chen, P. Catrysse, A. El Gamal, and B. Wandell, "How Smail Shouid Pixel Size Be?" 
SPIE, Vol. 3965, San Jose, CA, January 2000 
[34] David X. D. Yang, "A 640 x 512 CMOS Image Sensor with Ultrawide Dynamic Range 
Floating-Point Pixel-Level ADC", IEEE Journal of Solid-state Circuits, Vol. 34, No. 12, pp. 
1821 -34, 1999 
[35] Kiyohani Aizawa, "Computational Image Sensor for On Sensor Compression", IEEE 
Transactions on Electron Devices, Vol. 44, No. 10, pp. 1724 - 30, 1997 
D63 Miguel Arias-Estrada, bbComputational Motion Sensors for Autoguided Vehicles", 30" 
ISATA Conference on Robotics, Motion and Machine Vision in Automotive Industry, pp. 
101 - 8, 1997 
[37] Orly Yadid-Pecht, "CMOS Active Pixel Sensor Star Tracker with Regional Electronic 
Shutter", IEEE Journal of Solid-state Circuits, Vol. 32, No. 2, pp.285 - 8, 1997 
[38] Makoto Nagata "A Minimum Distance Search Circuit using Dual-Line PWM Signai 
Processing and Charge Packet Counting Techniques", ISSCC97 (97 International Solid State 
Circuits Conference), pp. 42 -3, 1997 
1391 M.J.M. Pelgrom, A.C.J. Duinmaiger, A.P.G. Welbers, "Matching properties of MOS 
transistors", IEEE Journal of Solid-state Circuits, Vol. 24, pp. 1433-40, 1989 
[40] R. Forchheimer, A. Astrom, Wear-sensor image processing: a new paradigm", IEEE 
Transactions on Image Processing, Volume: 3 Issue: 6, pp.736 -746, Nov. 1994 
[41] S. Jung, R. Thewes, T. Scheiter; Goser, K.F.; Weber, W."A low-power and high- 
performance CMOS fingerprint sensing and encoding architecture", IEEE Journal of Solid- 
State Circuits, Volume: 34 Issue: 7, pp. 978 -984, July 1999 
[42] G. Matheron, "Elements pour une Theone des Milieux Poreuz," Masson, Paris, 1967 
REFERENCES 2 16 
[43] P. Maragos, R.W. Schafer. "Morphological filters. Part 1: Their set-theoretic analysis 
and relations to linear shift-invariant filters. Part II: Their relations to median order-statistic, 
and stack filters", IEEE Trans. Acoust. Speech Signai Processing Vol. 35, pp. 1 153-84, 1987 
[44] J.S.J. Lee, R.M. Haralick, L.G. Shapiro, "Morphologie edge detection", IEEE Trans. 
Robotics Automation, RA-3, pp. 142-56, 1987 
[45] P. Maragos, R.W. Schafer, bbMorphological systems for multidimensional signal 
processing", Proc. IEEE Vol. 78, pp. 690-7 10, 1990 
1461 L.F.C. Pessoa, P. Maragos, "MRL-filters: A general class of nonlinear systems and their 
optimal design for image processing", IEEE trans. image Processing, Vol. 7, pp.966-7 8, 
1998 
[47] J.G.M. Schavemaker, M.H. Reinders, R. Van den Boomgaard, "Image sharpening by 
morphological filtering", IEEE Workshop on Nonlinear Signal & Image Processing 
MacKinac Island, Michigan, Sept. 1997 
1481 D. Schonfeld, J.Goutsaias, "Optimal morphological pattern restoration fiom noisy 
binary images", IEEE Trans. Pattern Analysis Machine Intelligence Vol. 1 3, pp. 14-29, 199 1 
[49] N.D. Sidiropoulos, J.S .Baras, C.A. Berenstein, bbOptimal filtering of digital binary 
images corrupted by unionhtersection noise", IEEE Trans. Image Processing, Vol. 3, pp. 
382403, 1994 
[SOI H. J.A.M. Heijmans, C. Ronse. "Annular Filters for Binary Images", IEEE Transactions 
on image processing, Vol. 8. No 1 0. pp. 133040, 1999 
[5 11 E. Oron, A. Kumar, Y. Bar-Shalom, "Precision tracking with segmentation for imaging 
sensors", IEEE Transactions on Aerospace and Electronic Systems, Volume: 29 Issue: 3, 
pp.977 -987, July 1993 
[52] T.G. Morris, S.P. DeWeerth, bbAnalogue VLSI morphological image processing circuit", 
Electronic letters, Vol. 3 1, No. 23, pp 1998-9, 1995 
[53] R.K. Krishnamurthy, R. Sridhar, "A CMOS wave-pipelined image processor for real- 
tirne morphology", Cornputer Design: 1995 IEEE International Conference on VLSI in 
Computers and Processors, 1995. ICCD '95. Proceedings, pp.638 -643, 1995 
[54] E. O'Rourke, J.B. Foley, "Specification, design and implementation of a digital binary 
image processing ASIC', IEE Colloquium on Applications Specific Integrated Circuits for 
Digital Signal Processing, pp.8/ 1 -8/5, 1993 
[55] M.T. Rigby, G.J. Awcock, VLSI  design methodologies for application specific binary 
sensors", Sixth International Conference on Image Processing and Its Applications, 1997, 
Volume: 1, pp. 166 - 170, 1997 
REFERENCES 217 
[56] J.D. Legat, P. De Muelenaere, " A  high peiformance SIMD processor for binary image 
processing", Proceedings of the IEEE Custom htegrated Cucuits Conference, pp. 17.4/1 - 
17.4/4, 1990 
[57] M. Schwanenberg, M. Traber, M. Scholies, R Schuffiy, "A VLSI chip for wavelet 
image compression* IEEE international Symposium on Circuits and S ystems, 1 999. ISCAS 
'99. Proceedings of the 1999, Volume: 4, pp.27 1 -274, 1 999 
[Sa] R. Dominguez-Castro, S.Espejo, A. Rodrigues-Vazques, R. A. Carmona, P.Foldesy, A. 
Sarandy, P. Szolgay, T. Sziranyi, T. Roska, "A 0.8 p CMOS Two-Dimensional 
Programmable Mixed-Signal Focal-Plane Array Processor with On-Chip Binary Imaging and 
Instructions Storage", IEEE Journal of Solid-State Circuits, Vol. 32, No. 7, pp. 1013-26, 1997 
[59] T. Nenika, T. Fujita, M. Ikeda, K. Asada, "A binary image sensor with flexible motion 
vector detection using block matching method", Asia and South Pacific Design Automation 
Conference, 2000. Proceedings of the ASP-DAC 2000, pp.2 1 -22,2000 
[60] S. Jung, R. Thewes, T. Scheiter, K.F. Goser, W. Weber, "A low-power and high- 
performance CMOS f inge rp~ t  sensing and encoding architecture", IEEE Journal of Solid- 
State Circuits, Volume: 34 Issue: 7, pp. 978 -984, July 1999 
[61] L. Zheng, K. Aizawa, M. Hatori, "Implementation of a 2D motion vector detection on 
image sensor focal plane," IEEE Intemational Symposium on Circuits and Systems, 1999. 
ISCAS '99. Proceedings of the 1999, Volume: 5, pp. 156 -1 59, 1999 
[62] N. Bourbakis, N. Steffensen, B. Saha, "Design of an array processor for parallel 
skeletonization of images", IEEE Transactions on Circuits and Systems II: Analog and 
Digital Signal Processing, Volume: 44 Issue: 4, pp.284 -298, Apnl 1997 
[63] W.C. Fang, T. Shaw, J. Yu, J, B. Lau, Y.C. Lin, "Paralle1 morphological image 
processing with an opto-electronic VLSI array processor", 1993 IEEE International 
Conference on Acoustics, Speech, and Signal Processing, 1993. ICASSP-93, Volume: 1, pp. 
409 4 12, 1993 
[64] W. Yang, "A charge coupled device architecture for on focal plane image signal 
processing", 1989 International Symposium on VLSI Techology, Systems and Applications, 
1989, Proceedings of Technical Papers, pp.266 -270, 1989 
[65] S. Kawahito, D. Handoko, Y. Tadokoro, "A CMOS image sensor with motion vector 
estimator for low-power image compression", Instrumentation and Measurement Technology 
Conference, 1999. IMTC/99. Proceedings of the 16th IEEE , Volume: 1, pp.65 -70, 1999 
[66] T. Moms, E. Fletcher, C. Afghahi, S. Issa, K. Comolly, I.C. Korta, "A column-based 
processing array for high-speed digital image processing", 20th Anniversary Conference on 
Advanced Research in VLSI, 1999. Proceedings, pp.42 -56,1999 
REFERENCES 218 
[67] K. Chen, A. Astrom, P.E. Danielsson, "PASIC: a smart seasor for cornputer vision ": 
Pattern Recognition, 1990. Proceedings, 1 ûth International Conference on, Volume: ii, 
pp.286 -29 1,1990 
[68] J-C. Gealow, C.G. Sodini, "A pixel-parallel image processor using logic pitch-matched 
to dynamic memory", IEEE Journal of Solid-State Circuits, Volume: 34 Issue: 6, pp.83 1 -839, 
June 1999 
[69] Y. Ni, J. Guan, "A 256x256 Pixel Smart CMOS Image Sensor for Line-Based Stereo 
Vision Applications", IEEE Journal of Solid sate circuits, Vol. 35, No. 7, pp. 1055-6 1, 2000 
[70] 2. Zhou, B. Pain, RI Panicacci, B. Mansoorian, J. Nakamura, E.R. Fossum, "On-focal- 
plane ADC: Recent progress at PL", SPIE. Vol. 2745, pp. 1 1 1 - 122 
[71] A, Simoni, G. Torelli, F. Maloberti, A. Sarton, M. Gottardi, L. Gonzo, "256x256 Pixel 
CMOS digital camera for cornputer vision with 32 algorithrnic ADCs on board", IEE. 
Processing of Circuits Devices Systems, Vol. 146, No. 4, pp. 184- 190, 1999 
[72] S. Kawahito, M. Yoshida, M. Sasaki, K. Umehara, D. Miyazaki, Y. Tadokoro, K. 
Murata, S. Doushou, A. Matsuzawa, "A CMOS image Sensor with Analog Two-dimensional 
DCT-Based Compression Circuits for One-Chip Cameras", IEEE Journal of Solid-state 
Circuits, Vo1.32, No. 1 2, pp.2O30-4 1, 1997 
[73] C.K. Chow, T. Kaneko, "'Automatic Boundary Detection of the Lefi Ventricle nom 
Cineangiograms", Computer and Biomed. Res. Vol. 5, pp. 388-40 1 
[74] Gonzalez and Woods, Digital Image Processing, Addison Wesley, 1993 
[75] S.E. Umbaugh, Computer Vision and Image Processing, Prentice Hall PTR, 1999 
[76] A. Bovik, Handbook of Image & Video Processing, Acadernic Press, 2000 
[77] C. Koch, H. Li, "Vision Chips Implementing Vision Algorithm with Analog VLSI 
circuits", IEEE Computer Society Press, 1995 
[78] H. Kobayashi, L. White, A.A. Abidi, "An active resistor network for gaussian filtering 
of images," IEEE Journal of Solid-State Circuits, Vol. 26, No. 5, pp.737-748, May 199 1 
[79] J-Nakamura, B.Pain, E.R. Fossum. "On-Focal-Plane Signal Processing for Current- 
Mode Active Pixel Sensors", IEEE Transactions on Electron Devices, Vol, 44, No. 10, 
pp. 1 747-57, 1997 
[80] L.G. Mcllrath, et. al. "'Design and Aaalysis of a 5 1 2x768 Cment-Mediated Active Pixel 
Array Image Sensors", IEEE Transactions on Electron Devices, Vol. 44, No. 10, pp. 1706- 1, 
1997 
REFERENCES 219 
[8 11 D. Scheffer, B. Dierickx, G. Meynants, "Random Addressable 2048 x 2048 Active Pixel 
Sensor", Transactions on Electron Devices, Vol. 44, No 10, pp. 1 7 16-20, 1997 
[82] S.Kavadias, et.al. "A Logarithmic Response CMOS Image Sensor with On-Chip 
Calibration", IEEE Journal of Solid-state Circuits, Vol. 35, No.8, pp. 1 146-52,2000 
[83] F. Pardo, et-al "Space-Variant Nonorthogonal Structure CMOS Image Sensor Design", 
IEEE Journal of Solid-state Circuits, V01.33, No.6, pp.842-9, 1998 
[84] J. Couiombe, M. Sawan, C. Wang, "Variable resolution CMOS m e n t  mode active 
pixel sensor", The 2000 IEEE Intemational Symposium on Circuits and Systems, 2000. 
Proceedings. ISCAS 2000 Geneva, VoIume: 2, pp. 293 -296,2000 
[85] H. Simmermann, Integrated Silicon Opto-electronics, Springer Series in Photonics, 2000 
[86] B. Pain, CMOS Digital Image Sensors, Short Course (SCl93) at Photonics West, SPIE 
conference, 200 1 
[87] R. Hornsey, Design and Fabrication of Integrated Image Sensors, ICR Short Course 
Notes, 1998 
[88] R.J. Baker, H.W. Li, D.E. Boyce, CMOS Circuit Design, Layout, and Simulation, IEEE 
Press, 1998 
[89] M. J.M. Pelgrom, A J. Duinmaijer, A.P .G. Welbers, "'Matching Properties of MOS 
Transistors", IEEE Journal of Solid-State Circuits, Vol. 24, No. 5, pp.1433-40, 1989 
[90] J. Bastos, M. Steyaert, R. Roovers, P.Kinget, W. Sansen, B. Graindourse, A. Pergoot, Er. 
Janssens, "Mismatch characterization of small size MOS transistors", Prof IEEE 1995 Int. 
Conference on Microelectronic Test Stnicutres, Vol. 8, pp.27 1-6, 1995 
[91] C.L. Keast, C.G. Sodini, "A CCDKMOS process for integrated image acquisition and 
early vision signal processing", Proc. SPIE Charge Coupled Devices and Solid State Spatial 
Sensors, Vol. 1242, pp. 152-6 1, 1990 
[92] C.L. Keast, C.G. Sodini, "A CCDKMOS based imager with integrated focal plane 
processing", IEEE Journal of Solid State Circuits, Vol. 28, No. 4, pp. 43 1-7, 1993 
[93] P. Dudeck, P.J. Hicks, "A CMOS General-Purpose Sampled-Data Anaiog Processing 
Elernent", IEEE Transactions on Circuits and System-II: Analog and Digital Signal 
Processing, Vol. 47, No. 5, pp467-73, May 2000 
[94] S. Anderson, W.H. Bruce, P.B. Denyer, D. Renshaw, G. Wang, "A Single Chip Sensor 
& Image Processor for Fingerprint Verification", IEEE 199 1 Custom Integrated Circuits 
Conference, pp. f 2.1.1-4, 199 1 
