# University of Massachusetts Amherst ScholarWorks@UMass Amherst

July 2016

# Development of a Portable CMOS Time-Domain Fluorescence Lifetime Imager

Hongtao Wang, University of Massachusetts Amherst

#### **Recommended Citation**

Wang, Hongtao, "Development of a Portable CMOS Time-Domain Fluorescence Lifetime Imager" (2016). *Doctoral Dissertations*. 670. https://doi.org/10.7275/8391959.0 https://scholarworks.umass.edu/dissertations\_2/670

# DEVELOPMENT OF A PORTABLE CMOS TIME-DOMAIN FLUORESCENCE LIFETIME IMAGER

A Dissertation Presented

by

HONGTAO WANG

Submitted to the Graduate School of the

University of Massachusetts Amherst in partial fulfillment

of the requirements for the degree of

DOCTOR OF PHILOSOPHY

May 2016

Electrical and Computer Engineering

© Copyright by Hongtao Wang 2016

All Rights Reserved

# DEVELOPMENT OF A PORTABLE CMOS TIME-DOMAIN FLUORESCENCE LIFETIME IMAGER

A Dissertation Presented

by

## HONGTAO WANG

Approved as to style and content by:

Christopher D. Salthouse, Chair

Robert W. Jackson, Member

Deepak K. Ganesan, Member

C.V. Hollot, Department Head, Department of Electrical and Computer Engineering

# DEDICATION

To my parents.

## ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to Professor Christopher Salthouse. Working in the Biomedical Electronics Laboratory directed by Professor Salthouse was one of the greatest experiences of my life. He spent a significant amount of time teaching me engineering skills hands-on. He offered me many ideas for solving challenging research problems. He provided me with funding so that I could focus on implementing creative solutions. In my "painful" days in the lab testing circuits and debugging code, it was he who appreciated my small progress and kept encouraging me to work things out persistently and independently. His kind personality made me feel that he was not a stern mentor, but a friend who cared about my career and life in every way.

I owe my sincere gratitude to Professor Robert Jackson. He was a defense committee member for both my M.S. and Ph.D., my teaching assistant advisor for Electronics I and II, and the lecturer of my "Analog IC Design" course. Back in 2011, he recommended that I join the Biomedical Electronics Laboratory. I express my sincere appreciation to him for his recognition and encouragement.

I also owe my sincere gratitude to Professor Deepak Ganesan. He provided valuable funding and lab resources for developing a low-power imager tape-out. I would like to thank him for his generous support and for the time he spent serving as a member of my Ph.D. committee.

I thank my labmates Dr. Akshaya Shanmugam and Mrs. Shuo Li for their kind help over the past 5 years. I would also like to extend my gratitude to Mr. Addison Mayberry, who helped me build instruments to test the low-power imager IC. I hope all of them will build successful careers and lead happy lives.

Finally, I owe the deepest gratitude to my beloved parents. They gave me the strongest emotional support, which helped me overcome many difficult periods throughout my Ph.D. They will forever be the most important people in my life.

## ABSTRACT

# DEVELOPMENT OF A PORTABLE CMOS TIME-DOMAIN FLUORESCENCE LIFETIME IMAGER

MAY 2016

HONGTAO WANG

B.E., XIDIAN UNIVERSITY

M.S.E.C.E., UNIVERSITY OF MASSACHUSETTS, AMHERST

Ph.D., UNIVERSITY OF MASSACHUSETTS, AMHERST

Directed by: Professor Christopher D. Salthouse

Modern laboratory equipment for measuring the excited-state lifetime of fluorophores usually includes an expensive picosecond pulsed-laser excitation source, a fragile photomultiplier tube, and a large instrument body for the optics. A portable and robust device that measures fluorescence lifetime on the nanosecond scale would be highly attractive to chemists and biologists.

This dissertation reports the development of a portable LED time-domain fluorimeter from an all-solid-state discrete-component prototype to its advanced CMOS integrated circuit implementation. The motivation of the research is to develop a multiplexed fluorimeter for point-of-care diagnosis. Instruments developed by this novel method have higher fill factor, are more portable, and are fabricated at lower cost.

# TABLE OF CONTENTS

| Section |
|---------|
| ACKNOWLEDGEMENTS |
| ABSTRACT |
| LIST OF TABLES |
| LIST OF FIGURES |
| CHAPTER |
| 1. INTRODUCTION |
| 2. ALL-SOLID-STATE TIME-DOMAIN FLUORIMETER DISCRETE-COMPONENT PROTOTYPE |
| 2.1 Discussion on existing fluorescence lifetime sensing method |
| 2.2 Time-gated photon integration sensing method |
| 2.3 Hardware architecture and circuit implementations |
| 2.3.1 System block diagram |
| 2.3.2 Ring oscillator |
| 2.3.3 Delay circuit |
| 2.3.4 Excitation circuit |
| 2.3.5 Sensing circuit |
| 2.4 Experimental results |
| 2.4.1 Fluorimeter prototype overview |
| 2.4.2 Characterization result |
| 2.4.3 Improvement on the detection limit |
| 3. ACTIVE PIXEL SENSOR CHARACTERIZATION UNDER TSMC 0.35 µm HIGH VOLTAGE TECHNOLOGY |
| 3.1 Motivation to characterize sensor on the targeting technology |
| 3.2 CMOS active pixel sensor in TSMC 0.35 µm high voltage technology |
| 3.3 Sensor characterization method |
| 3.4 Experimental results |
| 3.4.1 LED light power characterization |
| 3.4.2 Sensitivity and linearity of the CMOS 3-T APS with PMOS FET reset |
| 3.4.3 Sensitivity and linearity of the CMOS 3-T APS with NMOS FET reset |
| 3.4.4 Reset speed |
| 3.4.5 Summary |
| 4. MIXED-SIGNAL BUILDING BLOCKS OF CMOS TIME-DOMAIN FLUORIMETER IC |
| 4.1 Design consideration |
| 4.2 System architecture of the Imager IC |
| 4.3 Individual functional block design and verification |
| 4.3.1 Gating controller |
| 4.3.1.1 16-bit pre-scalar design and the measurement |
| 4.3.1.2 Dual-channel pulse generator design and measurement |
| 4.3.1.3 Delay line design and measurement |
| 4.3.2 Switched-capacitor programmable-gain amplifier (SC-PGA) design and measurement |
| 4.3.2.1 Circuit design and closed-loop gain analysis |
| 4.3.2.2 Noise analysis |
| 4.3.2.3 Layout and post-layout simulation |
| 4.3.2.4 Post-silicon measurement results and discussion |
| 4.3.3 10-bit column-parallel overlapping-subrange SAR ADC (CPOSSAR ADC) |
| 4.3.3.1 CPOSSAR ADC architecture |
| 4.3.3.2 Overlapping subrange technique |
| 4.3.3.3 ADC I/O |
| 4.3.3.4 10-bit CPOSSAR ADC measurement |
| 4.3.4 Timing controller |
| 4.3.4.1 Top-level control logic |
| 4.3.4.2 Lower-level control logic |
| 4.3.4.3 Design and testing on timing controller |
| 4.3.4.4 Full custom design of the top and lower-level controllers |
| 4.3.4.4.1 Pulse synchronizer |
| 4.3.4.4.2 SC-PGA sampling timer |
| 4.3.4.4.3 8-bit programmable timer |
| 4.3.4.4.4 CDS Completed/Data Load |
| 4.3.4.4.5 SC-PGA amplification timer |
| 4.3.4.4.6 8-bit pre-scalar |
| 4.3.4.4.7 ADC state register |
| 4.3.4.4.8 ADC clock synchronizer |
| 4.3.4.4.9 SAR reset controller |
| 4.3.4.4.10 Coarse control |
| 4.3.4.4.11 MSB load |
| 4.3.4.4.12 Fine control and shift enable |
| 4.3.4.4.13 ADC data load |
| 5. FLUORESCENCE LIFETIME IMAGER CAMERA MODULE INTEGRATION |
| 5.1 Chip controller hardware |
| 5.1.1 PIC32 microcontroller |
| 5.1.2 Xilinx Cool Runner II CPLD |
| 5.1.3 Programmable voltage and current reference |
| 5.1.4 Excitation circuit |
| 5.1.5 Power supply |
| 5.1.6 CMOS fluorimeter integration |
| 5.2 Testing of portable CMOS time-domain fluorescence lifetime imager |
| 5.2.1 Measuring 405 nm excitation pulse decay time |
| 5.2.2 Measurement on commercial fluorophores by a single pixel |
| 5.2.3 Single pixel measurement detection limit |
| 5.2.4 Fluorescein lifetime measured with well spaced pixels |
| 6. CONCLUSION |
| BIBLIOGRAPHY |

# LIST OF TABLES

| Table |
|-------|
| 1. The comparison of various types of fluorimeters |
| 2. 16-bit pre-scalar circuit post-simulation data and the post-silicon measurement data |
| 3. Summary of post-layout simulated gains and measured gains for all types of gain settings |

# **LIST OF FIGURES**

| Figure |
|--------|
| 1. The illustration of fluorescence in energy level diagram (a) and time domain (b) |
| 2. The system architecture of time-correlated single-photon counter (TCSPC) |
| 3. The system architecture of the phase-modulation fluorimeter (a) and its time-domain response (b) |
| 4. The illustration of integration by part fluorescence lifetime sensing mechanism |
| 5. The behavior model of the time-domain solid-state fluorimeter discrete-component prototype |
| 6. The block diagram of the discrete-component based fluorimeter prototype |
| 7. The schematic of the pulsing circuit [66] |
| 8. The analysis on ring oscillator |
| 9. The schematic of the high speed LED driver circuit [66] |
| 10. The schematic of the sensing circuit [66] |
| 11. The fabricated printed circuit boards of the discrete-component fluorimeter prototype [66] |
| 12. The assembled discrete-components based fluorimeter prototype [66] |
| 13. The measurement result of 5 µM/L 'lucifer yellow' dissolved in PBS [66] |
| 14. The measurement results for 3 commercial fluorophores [66] |
| 15. The detection limit measurement of the discrete-component fluorimeter prototype [66] |
| 16. The fabricated PCBs of the second version discrete-component fluorimeter |
| 17. The assembled second version discrete-component fluorimeter |
| 18. The proportional-integral temperature control circuit used to control the instrument temperature |
| 19. The detection limit measurement result of the second version discrete component fluorimeter |
| 20. The schematic of a CMOS 3-T active pixel sensor |
| 21. A typical 3-T CMOS active pixel sensor layout |
| 22. The cross-sectional view of photodiode fabricated on p-type substrate (a) and on n-well (b) |
| 23. The layout of the 3-T CMOS APS testing array |
| 24. The schematic of the CMOS 3-T APS array testing board (a) and board connection (b) |
| 25. The assembled CMOS 3-T APS array testing boards |
| 26. The characterization curve of LED light power vs. LED current |
| 27. The response of the 80 µm × 80 µm NDD/PW_HV PMOS FET-reset 3-T APS to light energy |
| 28. The sensitivity (Gain) and the total non-linearity of NDD/PW_HV PMOS FET reset APSs |
| 29. The parasitic capacitors at the cathode of photodiode (a), and the dominant capacitors used to quantitatively analyze the sensitivity and the nonlinearity of the APS (b) |
| 30. The photon-energy transfer curve plotted according to analytical expression with different sizes of the reset transistors |
| 31. The sensitivity (Gain) and the total non-linearity measurement results for N+/PW_HV PMOS FET reset APSs |
| 32. The sensitivity (Gain) and the total non-linearity measurement results for N+/PW PMOS FET reset APSs |
| 33. The sensitivity (Gain) and total non-linearity measurement results for NW/PW_HV PMOS FET reset APSs |
| 34. The linear response to light energy for four types of APS with 80 µm × 80 µm photodiode |
| 35. The sensitivity (Gain) and nonlinearity measurement result of the 80 µm × 80 µm NDD/PW_HV APS with 5 µm wide NMOS FET reset transistor |
| 36. The comparison of sensitivity (Gain) and nonlinearity of NMOS FET and PMOS FET reset APSs with 80 µm × 80 µm diode and different reset transistor channel widths |
| 37. The equivalent circuit models of the APS during and after the excitation light interference in reset phase |
| 38. The coarse measurement on the reset speed of the APS |
| 39. The coarse reset speed measurement results for various types of APSs |
| 40. The fine measurement on the reset speed of the APS |
| 41. The excitation light pulse measured by an avalanche photodiode (Hamamatsu, S2382) and a 5 GS/s oscilloscope |
| 42. The fine measurement result of the 80 µm × 80 µm NDD/PW_HV APS with 0.35 µm wide PMOS-FET-reset |
| 43. The fine measurement result on 80 µm × 80 µm NDD/PW_HV APS with 68 µm PMOS FET-reset |
| 44. The fine measurement result on the 80 µm × 80 µm N+/PW_HV APS with 0.35 µm PMOS reset transistor |
| 45. The fine measurement result for 80 µm × 80 µm N+/PW_HV type APS with 68 µm PMOS reset transistor |
| 46. The system architecture of the proposed monolithic fluorimeter IC |
| 47. The block diagram of a complete imager system for one selected pixel |
| 48. The block diagram of the gating controller |
| 49. The RTL block diagram of the 16-bit pre-scalar |
| 50. The schematic of pulse generator |
| 51. The schematic of the trans-conductance amplifier used in the pulse generator |
| 52. The tuning range of the excitation and reset pulse when $V_{ref}$ was 1.25 V |
| 53. The tuning range of the excitation and reset pulse when $V_{ref}$ was 1.3 V |
| 54. The tuning range of the excitation and reset pulse when $V_{ref}$ was 1.4 V |
| 55. The schematic of the delay line circuit [78] |
| 56. The measurement results on the 64-stage delay line that delays the reset pulse |
| 57. The measurement results on the 128-stage delay line that delays the excitation pulse |
| 58. The schematic of the switched-capacitor variable gain amplifier circuit |
| 59. The equivalent circuits of stage 1 in sampling phase (a) and in amplification phase (b) |
| 60. Frequency response of op amps used in switched-capacitor PGA based on post-layout simulation |
| 61. The noise sources of stage 1 amplifier during sampling mode |
| 62. The noise sources of the SC-PGA in amplification phase |
| 63. The effective number of bit (ENOB) of the signal at the output of the amplifier vs. pixel output for different input referred RMS noise of the PGA |
| 64. The post-layout simulation test bench for a single channel PGA |
| 65. The post-layout simulation result for a single channel PGA matched the hand-calculation results |
| 66. The schematic of the dual-12-bit coarse-fine DAC |
| 67. The lab test bench to measure the single channel PGA gain |
| 68. The measured transfer function of a PGA channel |
| 69. Post-layout simulation result on a single-channel PGA layout without clock skew |
| 70. Post-layout simulation result on a single-channel PGA in 32-channel PGA array without clock skew |
| 71. The parasitic capacitance model of the stage 1 when a single-channel layout is placed in the array |
| 72. The illustration of clock skew on affecting the gain of the SC-PGA |
| 73. Post-layout simulation results on a single-channel PGA in 32-channel PGA array when clock skew was included |
| 74. The block diagram of the CPOSSAR ADC |
| 75. The overlapping subranges used to remove the error produced in coarse conversion cycle |
| 76. A level shifter circuit used to produce the redundant overlapping subranges |
| 77. The block diagram of the serial data I/O in the CPOSSAR ADCs |
| 78. The measured bit error rate of the ADC I/O under two different transmission bandwidths |
| 79. The test bench to measure an individual ADC channel |
| 80. The measured DNL and INL on the single channel CPOSSAR ADC |
| 81. The finite state machine of the full custom top-level controller circuit |
| 82. The typical output waveform of the top-level controller for a single measurement |
| 83. The finite state machine of the lower-level controller |
| 84. The typical timing waveform of the lower-level control logic |
| 85. The synchronous pipeline units used in the controller design |
| 86. The block diagram of the top-level controller |
| 87. The tested waveform that verified the correct functional behavior of the top-level control logic |
| 88. The block-level interconnect of the lower-level controller |
| 89. The measured waveform of the lower-level controller when the "Global_reset" is high |
| 90. The measured waveform of the lower-level controller after the "Global_reset" signal is pulled low |
| 91. The pulse synchronizer is used to gate off the "Pixel_reset" in reset phase |
| 92. The RTL block diagram of the PGA sampling timer |
| 93. The implementation of the 8-bit programmable timer |
| 94. The schematic of the Data Load logic |
| 95. The schematic of the PGA amplification timer |
| 96. The schematic of the 8-bit programmable pre-scalar |
| 97. The schematic of the ADC state register logic |
| 98. The schematic of the ADC clock synchronizer |
| 99. The schematic of the SAR reset controller logic |
| 100. The schematic of Coarse Control logic |
| 101. The schematic of the Fine Control logic |
| 102. The schematic of the ADC data load logic |
| 103. Three different reference generation circuits used in the chip controller board |
| 104. The block diagram of the prototype imager device implementing the CMOS imager IC |
| 105. The assembled portable lensless fluorimeter built by the CMOS imager IC |
| 106. The delay-line-controlled excitation pulse and reset pulse for optical decay measurement |
| 107. The measured (fitted) 405 nm excitation pulse falling edge (a) and the differentiated data (b) |
| 108. The measured lifetime of 500 µM/L fluorescein sodium salt dissolved in PBS |
| 109. The measured lifetime of 500 µM/L Lucifer Yellow sodium salt dissolved in PBS |
| 110. The measured lifetime of 10 µM/L Protoporphyrin IX dissolved in PBS |
| 111. The measured lifetime of fluorescein sodium salt dissolved in PBS at different concentrations |
| 112. The noise model of one sensing channel in the fluorimeter IC |
| 113. The generalized time-domain measurement data on a fluorophore sample (a) and its decomposition (b) |
| 114. The noise model of a 3-T active pixel sensor |
| 115. The RMS voltage of the detectable signal vs. the amplifier gain under different improvement factors |
| 116. The first stage of the PGA connected to a pixel sensor |
| 117. The coordinate of pixels at the corner of pixel array |
| 118. The measured fluorescence lifetime data by three pixels located at the corner |
| 119. The modified first stage PGA circuit with the dynamic biasing function |
| 120. The connection of the stage 1 amplifier in (a) pixel reset phase, (b) pixel integration/first A-D-conversion phase, and (c) pixel output amplification phase |

#### **CHAPTER 1**

#### INTRODUCTION

Fluorescence lifetime spectroscopy is the technique of measuring the excited-state lifetime of fluorophores[1], molecules that emit lower-energy light when excited by higher-energy light, as shown in Fig. 1 (a). The technique is frequently used in chemistry to investigate the steady-state properties of molecules and to study their behavior in chemical reactions[2]–[5]. It is also a useful tool in biochemical environmental monitoring, because the lifetimes of fluorophores can be modulated by environmental parameters, such as pH and other ion concentrations [6]–[10].

Fluorescence lifetime is best understood by recognizing that fluorescence is a stochastic process. The average time a molecule spends in the excited state before releasing its energy in the form of light is called the fluorescence lifetime [11]. The fluorescence reaches its maximum intensity within picoseconds after the fluorophore molecule is excited by a short light pulse, as shown in Fig. 1 (b). It decays exponentially after the excitation light is removed. The time constant, $\tau$, associated with the decay is the fluorescence lifetime of that fluorophore.



Fig. 1 The illustration of fluorescence in energy level diagram (a) and time domain (b)

Commercial fluorophores exhibit lifetimes ranging from tens of picoseconds to tens of nanoseconds[12]. Such short timescales require high-speed imaging devices for accurate measurement. Modern fluorescence lifetime imaging microscopes (FLIM) rely on two mainstream methods, one based on time-domain photon counting and the other based on frequency-domain modulation[13].



Fig. 2 The system architecture of a time-correlated single-photon counter (TCSPC)

In the time-domain method, the fluorescence decay profile is recorded directly in the time domain. Instruments of this type include time-gated image intensifiers[14], [15], directly time-gated CCDs[16], [17], time-gated photon counters[18], [19], and the most prevalent, time-correlated single-photon counters (TCSPC)[20]. Fig. 2 illustrates the block diagram of a typical TCSPC system. In the instrument, fluorophores are periodically excited by a train of laser pulses to produce periodic fluorescence decays. The fluorescence is received by a photomultiplier tube (PMT) that converts the light signal to an electric signal. At the first observation of a photon in each cycle, the PMT is triggered and that time point is registered by a constant fraction discriminator (CFD). In a parallel path, a photodiode (PD) detects the excitation pulse as a time reference, which is registered by a second CFD. Each CFD reduces noise and generates a sharp-edged trigger signal. The outputs of the two CFDs start and stop a time-to-amplitude converter (TAC), which converts the time interval between the first-photon event and the registered time reference into a voltage. An analog-to-digital converter (ADC) then digitizes the voltage, and the subsequent circuit decodes the binary value into a memory address. The content of that memory location is incremented by one each time the channel address is sensed. Because each memory location represents a time bin, the overall amplitude across all time-bin channels represents the total number of photons observed over the entire time window. The fluorescence lifetime can be calculated by performing a least-squares fit on the histogram constructed from the entire memory contents.
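
The histogram-and-fit step described above can be illustrated with a small simulation. This is a minimal sketch, not the processing used in any of the cited instruments: the function names, the photon count, the bin width, and the 4 ns lifetime are all illustrative assumptions, and effects such as pile-up, dark counts, and the instrument response are ignored.

```python
import numpy as np

def tcspc_histogram(tau_ns, n_photons, bin_width_ns, n_bins, seed=0):
    """Accumulate simulated first-photon arrival times into time bins."""
    rng = np.random.default_rng(seed)
    # For a single-exponential decay, photon delays are exponentially distributed.
    arrivals_ns = rng.exponential(tau_ns, size=n_photons)
    edges_ns = np.arange(n_bins + 1) * bin_width_ns
    # Each bin plays the role of one memory location incremented per event.
    counts, _ = np.histogram(arrivals_ns, bins=edges_ns)
    centers_ns = edges_ns[:-1] + bin_width_ns / 2.0
    return centers_ns, counts

def fit_lifetime_ns(centers_ns, counts):
    """Weighted least-squares line fit of log(counts) vs. time; slope = -1/tau."""
    mask = counts > 0
    # Poisson counting noise: sigma of log(counts) is roughly 1/sqrt(counts),
    # so the fit weight (1/sigma) is sqrt(counts).
    weights = np.sqrt(counts[mask])
    slope, _ = np.polyfit(centers_ns[mask], np.log(counts[mask]), 1, w=weights)
    return -1.0 / slope

# Hypothetical example: 200,000 detected photons from a 4 ns fluorophore.
centers_ns, counts = tcspc_histogram(tau_ns=4.0, n_photons=200_000,
                                     bin_width_ns=0.1, n_bins=200)
tau_est_ns = fit_lifetime_ns(centers_ns, counts)
print(f"estimated lifetime: {tau_est_ns:.2f} ns")
```

The fit is done on the logarithm of the counts so that an ordinary linear least-squares routine can be used; a nonlinear fit of the exponential itself would be an equally valid choice.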

An alternative way to measure the fluorescence lifetime is the frequency-domain method. It measures the lifetime indirectly by characterizing the frequency response of the fluorophore-detector system. Two types of modulation are commonly used, one with a continuous-time sinusoidal wave[21]–[25] and the other with a laser pulse train[26]–[28]. Sinusoidal modulation is simpler but has a limited modulation frequency range, usually from 1 MHz to 200 MHz. Pulsed modulation can extend the range up to 1 GHz by utilizing the high-order harmonic components in high-repetition-rate square pulses[29]. A wider modulation frequency scanning range is valuable, because some useful fluorophores produce a phase change between their fluorescence and excitation signals that is large enough for a sufficient signal-to-noise ratio only at high modulation frequencies.



Fig. 3 The system architecture of the phase-modulation fluorimeter (a) and its time-domain response (b)

As depicted in Fig. 3 (a), in a continuous-time frequency-domain fluorimeter, the sample is excited by a laser diode (LS) or an LED modulated with a single-tone sine-wave electric signal (OSC). A band-pass filter, called the emission filter, removes the excitation wavelength at the emission angle. The filtered emission light is a time-varying signal that follows a sinusoidal function. Its frequency is the same as that of the modulation signal, but its phase is shifted by an offset that depends on that frequency. The filtered emission light is linearly converted into an electric signal by a photodetector (photomultiplier tube, PMT, or photodiode, PD), as shown in Fig. 3 (b). The phases of the emission signal and the excitation signal are compared, and their difference is used to calculate the lifetime of the fluorophore using Eqn. 1.

$$\tan(\Delta \phi) = \omega \tau$$
Eqn. 1

where $\Delta \phi$ is the phase shift, $\tau$ is the lifetime, and $\omega$ is the angular frequency of the modulating signal. When more than one fluorophore is involved in the measurement, a multi-exponential decay needs to be examined, which is called fluorescence lifetime multiplexing. The measurement is performed by scanning the modulation frequency across a broad range and fitting the phase shift versus modulation frequency response to a set of equations[26], so that the multiple lifetimes can be extracted from the data. Frequency-domain fluorimeters generally provide shorter data acquisition times than time-domain instruments, because the steady state of the excitation-emission is established on a picosecond time scale, and the phase information can be read out immediately.
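
As a numerical illustration of the phase-to-lifetime conversion, the sketch below assumes that Eqn. 1 is the standard single-exponential relation $\tan(\Delta \phi) = \omega \tau$. The function names and the 40 MHz / 4 ns example values are illustrative assumptions, not figures from the cited instruments.

```python
import math

def lifetime_from_phase(delta_phi_rad, mod_freq_hz):
    """Single-exponential lifetime from the phase shift: tan(dphi) = omega * tau."""
    omega = 2.0 * math.pi * mod_freq_hz
    return math.tan(delta_phi_rad) / omega

def phase_from_lifetime(tau_s, mod_freq_hz):
    """Inverse relation: the phase shift produced by a given lifetime."""
    return math.atan(2.0 * math.pi * mod_freq_hz * tau_s)

# Hypothetical example: a 4 ns fluorophore modulated at 40 MHz.
phi_rad = phase_from_lifetime(4e-9, 40e6)
tau_s = lifetime_from_phase(phi_rad, 40e6)
print(f"phase shift: {math.degrees(phi_rad):.1f} deg, "
      f"recovered lifetime: {tau_s * 1e9:.2f} ns")
```

Because the tangent saturates as the phase shift approaches 90 degrees, the conversion is most sensitive when the product of angular frequency and lifetime is near one, which is consistent with the need for high modulation frequencies when measuring short lifetimes.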

TCSPC systems are the leading technology in the photon counting area. The state-of-the-art commercial bench-top TCSPC instruments (as of March 2016) achieve a detection limit down to 50 fM (fluorescein, Fluorolog-3<sup>®</sup>, HORIBA Scientific) and a minimum detectable lifetime of less than 40 ps (PicoMaster<sup>TM</sup>, HORIBA Scientific). However, because they require complicated optics, high-power lasers, high-voltage (>1000 V) photomultiplier tubes, and a large instrument body, TCSPC instruments are expensive and bulky. Although frequency-domain fluorimeters are less complicated and can measure shorter lifetimes (20 ps, K2<sup>TM</sup>, ISS) than time-domain fluorimeters built with similar components, samples excited by a continuous-time sinusoidal wave or high-repetition-rate pulses suffer from more significant photo-bleaching, the effect in which molecules reduce their emission intensity after prolonged excitation.

This dissertation builds upon work in both discrete-circuit and integrated-circuit devices for fluorescence lifetime measurement. An early fluorescence lifetime instrument, reported in 1974, was based on a photomultiplier tube connected to a sampling oscilloscope[30]. This time-resolved system could measure fluorophores with 1.5 ns and 17.1 ns lifetimes, but it used a LINC computer to average the data, magnetic tapes for data collection, and a \$2.9 million IBM 7090 for data post-processing. In addition to the TCSPC and frequency-domain methods introduced previously, a number of discrete imagers have been built using a CCD coupled with a time-gated image intensifier[15], [31]–[35]. The temporal resolution of these instruments can be as fine as 10 ps, but all of them require high-power picosecond-scale lasers and complicated optics, making them hardly portable. The above instruments are usually accessible only in research laboratories and are of limited use for point-of-care diagnosis.

A portable time-domain fluorimeter achieved a compact size of $21 \times 40 \times 50$ cm<sup>3</sup>[36], but it used expensive electronics and a laser driver, as in a TCSPC system. Joanneum Research in Austria developed a low-cost semiconductor-based portable fluorimeter to measure their oxygen sensor, but its detection range is on the microsecond scale[37]. A portable filterless discrete-circuit fluorimeter was able to measure ~5 $\mu$M dye solutions with lifetimes of 1 ns, 2 ns, and 4 ns[38], but it used an expensive laser diode as the excitation source, which also restricts the excitation wavelengths. Apparently, these portable fluorimeters cannot satisfy the nanosecond-scale lifetime standards more widely used in research and industry[12].

Previously reported time-domain fluorimeters were either fast but expensive, suited to nanosecond-scale and sub-nanosecond-scale lifetimes, or cheap but slow, suited to non-critical applications that require only microsecond-scale measurements. The fluorimeters built in this work aim to measure nanosecond-scale fluorescence lifetimes on a portable device at a manufacturing cost significantly below that of commercial high-speed fluorimeters. Table 1 compares the cost, speed, and portability of these types of fluorimeters.

| Works      | Technique                           | Cost | Resolution     | Detection limit | Portable |
|------------|-------------------------------------|------|----------------|-----------------|----------|
| [30]       | Photomultiplier tube + oscilloscope | High | 1.5 ns         | 2 µM/L          | No       |
| [15], [31] | CCD + time-gated image intensifier  | High | 185 ps, 150 ps | N/A             | No       |
| [35]       | Streak camera                       | High | 50 ps          | N/A             | No       |
| [36]       | TCSPC                               | High | 200 ps         | <1000 µM/L      | Yes      |
| [37]       | Frequency-domain                    | Low  | 1 µs           | N/A             | Yes      |
| [38]       | Time-domain                         | High | 0.8 ns         | 1 µM/L          | Yes      |
| This work  | Time-domain                         | Low  | 3.6 ns         | 0.25 µM/L       | Yes      |

Table 1 The comparison of various types of fluorimeters

The time-domain instruments reported in this work trade off cost against speed, as shown in Table 1. To improve portability over conventional sampling-oscilloscope instruments, TCSPC systems, frequency-domain instruments, and time-gated image intensifier-CCD instruments, the discrete-component fluorimeter presented in this work is filterless and requires no optics. It applies an electronic shutter to reject the excitation light from the fluorescence emission in the time domain. To improve its performance over existing portable fluorimeters, the presented device employs a totem-pole circuit in the LED driver for fast excitation. A novel sensing circuit is built into the instrument to integrate the nanosecond fluorescence decay segment by segment in a time-gated mode.

After the discrete-component fluorimeter prototype was built and verified, its advanced version, the integrated-circuit-based fluorescence lifetime imagers, were built. They were designed to capture images across a sensor array, measuring the fluorescence lifetime at each pixel. Compact imagers of this kind are also divided into time-domain and frequency-domain operation modes. In the frequency domain, silicon photodiodes and custom trans-impedance amplifiers have been integrated on a CMOS IC for fluorescence lifetime detection[25]. The power consumption and cost of such a portable device are improved over existing discrete-circuit fluorimeters, but the detection limit is only 7.5 $\mu$M/L for fluorescein. It also requires filters to separate the excitation light from the emitted fluorescence, because the sample is excited continuously.

In the time domain, TCSPC imager ICs utilizing silicon single-photon avalanche diodes (SPADs) are able to detect single-photon events after excitation in the same way as discrete-component instruments[39]–[45]. But these conventional TCSPC imagers suffer from the pile-up effect[46], [47], the loss of photons due to the dead time of the device. Time-gated TCSPC imagers count the total number of photons in multiple observation windows to eliminate the pile-up effect[48]–[56], but pixels designed for this purpose include digital circuits for gating and counting, so their fill factors are usually only a few percent. This can be improved by creating micro-lenses above the pixels, but at additional cost beyond standard CMOS fabrication[57].

Another type of time-domain fluorimeter IC uses silicon photodiodes to integrate photons in an observation window rather than digitally count them. One time-domain imager used a differential photodiode to build a pixel[58]. By gating on the sensor only in programmable time windows, photons within the window are converted into the differential component for readout, while the rest of the light is in the common mode, which is rejected[59]–[61]. The fill factor, however, is less than 25%, because the dummy structure occupies half of the pixel area, and the limited reset current requires a wide reset transistor for fast operation.

Later, based on conventional pinned photo-diode active-pixel-sensor (APS), an additional charge transfer gate was implemented in the pixel to achieve gating, charge storage, and in-pixel amplification[62]. But the experimental result showed that fluorescence lifetime measured by the IC varied with samples' concentration, contradictory to the common belief that the fluorescence lifetime is not a function of concentration. One charge transfer gate of the sensor can be removed by a special doping profile to simplify the control[63], but the optimized doping concentration is not supported in the standard technology, and a calibrated turn-on voltage for an efficient charge transfer is required.

Photodiodes compatible with CMOS technology have been used in pixel sensors to measure fluorescence lifetime[64]. The pixel circuit, however, includes a 3 mm-wide reset transistor, a 1.5 mm-wide isolation transistor, and a differential amplifier. The resultant fill factor is less than 14% for a large $180 \times 415 \,\mu\text{m}^2$ pixel area. This limited the array size to only 8 × 4 in that work. The fabricated IC was able to measure the fall time of the excitation pulse as it varied from 1 ns to 5 ns, but no actual fluorescence decay measurement was performed. A real fluorescence measurement can be much more challenging, because the fluorescence is very weak compared to the excitation light.

The integrated-circuit-based fluorimeter presented in this work measures fluorescence lifetime in the time domain. It removes the filters required by frequency-domain imager ICs to make the final device more portable. It overcomes the low fill factor associated with SPADs by using compact 3-T APSs compatible with standard CMOS technology. The reduced sensitivity of an APS compared to a SPAD is overcome by a high-gain switched-capacitor column amplifier and a large number of averages (~10,000). The gating control of the sensors, pulse generation, delay control, data conversion, global timing circuits, and data transmission are all integrated on a single chip to minimize the off-chip discrete components. This allows for a more portable imager than existing fluorimeters.

In summary, the motivation of this work is to leverage the cost, portability, and performance of CMOS integrated circuits to build a time-domain fluorimeter that is suitable for point-of-care diagnostics. It was designed to demonstrate the accurate measurement and low detection limit required in biochemistry laboratories. The project was partitioned into four development phases starting from May 2013. The details of each phase are discussed in the subsequent chapters.

In Chapter 2, the novel time-domain sensing principle is described and first implemented in a single-pixel solid-state discrete-component device. Chapter 3 then presents the characterization of a family of active pixel sensors (APS) fabricated in 0.35 µm CMOS technology for use in an integrated-circuit version of the time-domain fluorimeter. In Chapter 4, the mixed-signal processing units are designed and partially characterized in silicon. The circuits were finally integrated into a monolithic imager IC with an array of sensors, all signal conditioning circuits, timing controllers, and the data transmission interface. Chapter 5 describes how the IC was integrated into a board-level camera module used to carry out real measurements on commercial fluorophores, and presents and discusses the experimental results. Chapter 6 summarizes the dissertation and proposes future work.

#### **CHAPTER 2**

# ALL-SOLID-STATE TIME-DOMAIN FLUORIMETER DISCRETE-COMPONENT PROTOTYPE

#### 2.1 Discussion on existing fluorescence lifetime sensing method

Time-domain fluorimeters can be more portable and can achieve higher performance than frequency-domain fluorimeters. As explained in Chapter 1, time-domain fluorimeters involve more complicated electronics, because of the gated excitation and the discrete-time electronics. However, they offer three advantages. First, gating circuits can separate the excitation light from the emission in the time domain better than a filter does for frequency-domain instruments, so filters are no longer needed. Second, the improved discrimination between the excitation and emission improves the signal-to-noise ratio, allowing a better detection limit (lower detectable fluorophore concentration). Third, excitation pulses in time-domain fluorimeters need not have a high repetition rate. This allows the use of light-emitting diodes (LEDs), which usually respond more slowly but provide more wavelength options at a much lower cost than solid-state laser sources.

Based on the above discussion, the fluorimeter developed in this work measures the fluorescence in the time domain. However, the TCSPC method poses many challenges to making the instrument portable. First, single-photon counting is most frequently performed using photomultiplier tubes. These vacuum tubes generate up to $\sim 1 \times 10^6$ electrons per incident photon, but their body is many times larger than a solid-state electronic component, such as a solid-state photodiode, that senses the light. The silicon photomultiplier (SiPM), the solid-state counterpart of the vacuum-tube photomultiplier, provides an alternative solution. It behaves similarly to a vacuum-tube photomultiplier, but TCSPC instruments rarely use SiPMs because of their high dark count rate and lower peak avalanche current[65].

Second, the histogram recording in TCSPC requires high-speed circuits to process data in each excitation cycle. For example, the picosecond-to-nanosecond time interval between the end of the excitation and the arrival of the first photon must first be converted into a voltage by a TAC. This voltage is digitized by an ADC to generate the binary data. The data is finally decoded to perform a memory write. The time consumed by the entire signal processing chain limits the period of the excitation, which in turn limits the measurement time, because a large number of excitation cycles must be carried out to generate enough data points in the histogram to approximate the time-resolved fluorescence decay.

Third, high-end TCSPC instruments use filters to reduce the excitation-emission interference. However, the design of optics usually leads to a bulky instrument.

## 2.2 Time-gated photon integration sensing method

The above drawbacks of TCSPC instruments can be mitigated by using the time-gated photon integration method[38]. In this method, the fluorescence decay profile is measured by time-domain integration rather than histogram recording. In Fig. 4, a pulsed reset signal gates off the photodetector when it is logic high, and gates on the detector to integrate the fluorescence when it goes low. A delay is implemented between the excitation falling edge and the reset falling edge to start a time window called the integration window. With the first time window, almost the entire decay profile is integrated, as shown by the shaded area on the left. Then the reset pulse is delayed by 250 ps with respect to the excitation pulse, so that the integration window is delayed by 250 ps and less of the fluorescence decay is integrated, as can be seen from the smaller shaded area. If the delay of the reset pulse is further increased in steps of 250 ps, the fluorescence decay profile is essentially partitioned into many segments, and the area underneath each segment is measured by the electronics. The measurement data can be used to calculate the fluorescence lifetime by a numerical method.



Fig. 4 The illustration of the integration-by-part fluorescence lifetime sensing mechanism



Fig. 5 The behavioral model of the time-domain solid-state fluorimeter discrete-component prototype

The behavioral model of the time-domain solid-state fluorimeter discrete-component prototype, which realizes the time-gated photon integration method, is presented in Fig. 5. Two synchronized pulses are produced by a pulse source. A programmable delay circuit generates the reset pulse delayed relative to the excitation pulse. The delay value, nD, is an integer multiple of the unit delay, D (250 ps), where n is an integer from 0 to 255, so the delay can vary from 0 ns to 63.75 ns. After the fluorophore sample is excited, fluorescence in the defined integration window is received by the photodetector, PD, and converted into a time-varying current, I(t). It is integrated into a voltage, called F(nD) because the voltage is a function of the delay, nD, between the excitation and reset pulses. This transient signal is averaged over time to produce a DC voltage. It is then amplified and digitized to create one data point indexed by the delay index, n. In the next step, n is increased by 1 to create one more data point, and so forth. In continuous-time mode, where the delay step is infinitely small, the integration of the fluorescence decay can be written as

$$F(t) = \int_{t}^{\infty} f(u) du$$
 Eqn. 2

where the lower limit, t, represents the start point of the integration window, and f(u) is the time-resolved fluorescence after the excitation pulse

$$f(u) = Ae^{\frac{-u}{\tau}}$$
Eqn. 3

The upper integration limit in Eqn. 2 can be taken as infinity as long as the reset pulse period is much larger than the lifetime of the fluorophore sample. The continuous variable, t, in Eqn. 2 can be replaced by the discrete delay value, nD. Combining Eqn. 2 with Eqn. 3 then gives

$$\frac{F(nD)}{T} = \frac{\int_{nD}^{\infty} f(u)du}{T} = \frac{A\tau e^{\frac{-nD}{\tau}}}{T}$$
Eqn. 4

where T is the duration of the integration window, or the total amount of time the reset pulse stays low in one cycle. Eqn. 4 shows that although the output of the averager circuit is not the raw fluorescence signal emitted by the sample, its time constant is the same. The data measured at the output of the fluorimeter can therefore be fitted directly to a single-exponential function to extract the lifetime. In the following section, the detailed circuit implementation of each block of the behavioral model is discussed.
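
The claim that the gated integral keeps the time constant of the raw decay can be checked numerically. The sketch below is an idealized illustration of Eqn. 2 to Eqn. 4 only: the amplitude, the 4 ns lifetime, the 100 ns integration limit (standing in for infinity), and the function names are assumptions, and no noise or circuit non-ideality is modeled.

```python
import numpy as np

D_NS = 0.25  # unit delay D = 250 ps, from the text

def gated_integral(n, amplitude, tau_ns, window_end_ns=100.0, du_ns=0.005):
    """Riemann-sum approximation of F(nD): integral of A*exp(-u/tau) from nD."""
    u = np.arange(n * D_NS, window_end_ns, du_ns)
    return float(np.sum(amplitude * np.exp(-u / tau_ns)) * du_ns)

def fit_lifetime_ns(delays_ns, values):
    """Fit log(F) vs. delay with a straight line; the slope equals -1/tau."""
    slope, _ = np.polyfit(delays_ns, np.log(values), 1)
    return -1.0 / slope

# Sweep the delay index n over its full range, 0 to 255.
n = np.arange(256)
F = np.array([gated_integral(k, amplitude=1.0, tau_ns=4.0) for k in n])
tau_fit_ns = fit_lifetime_ns(n * D_NS, F)

print(f"F(0) = {F[0]:.3f} (A*tau = 4.000), fitted lifetime = {tau_fit_ns:.3f} ns")
```

The first point approximates $A\tau$, and the fitted time constant of the integrated data matches the lifetime used to generate the decay, which is the property the single-exponential fit relies on.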

#### 2.3 Hardware architecture and circuit implementations

## 2.3.1 System block diagram

The time-domain fluorimeter discrete-component prototype is partitioned into several functional blocks, as shown in Fig. 6. A tunable ring oscillator produces a 1 MHz system master clock. It is converted into two short pulses in a delay circuit. One pulse is used to reset the sensing circuit; the other is delayed from the reset pulse and used for excitation. Although Fig. 5 shows the reset pulse as the delayed one for illustration purposes, in practice it is easier to delay the excitation pulse, because the phase of the control signal of the sensing circuit, which is more complicated than the excitation circuit, then stays unchanged. The sample is excited by this variable-phase excitation signal in each delay step, and the fluorescence is measured by the sensing circuit, which operates under a reset signal with a constant phase. Finally, a microcontroller unit (MCU) controls the excitation pulse delay and digitizes the sensing circuit's output. The functional blocks are built on a PCB that acts as a slave device controlled by a personal computer (PC). Commands and data are transmitted between the PC and the MCU through a USB cable. A complete measurement is performed by first setting an excitation delay and recording 100 digitized outputs from the MCU for averaging. Then the same process is repeated with a different excitation pulse delay, and so forth.
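
The measurement procedure in the last two sentences (set a delay, record 100 digitized outputs, average, repeat) can be sketched as a host-side loop. The `SimulatedFluorimeter` class and its methods are hypothetical stand-ins invented for this illustration; they are not the prototype's actual command interface, and the simulated output simply follows the ideal normalized exponential as a function of the relative delay index between excitation and reset.

```python
import numpy as np

D_NS = 0.25       # unit delay step, 250 ps
N_AVERAGE = 100   # digitized outputs averaged per delay setting, per the text

class SimulatedFluorimeter:
    """Hypothetical stand-in for the PCB and MCU; not the author's interface."""

    def __init__(self, tau_ns=4.0, noise_rms=0.02, seed=1):
        self.tau_ns = tau_ns
        self.noise_rms = noise_rms
        self._rng = np.random.default_rng(seed)
        self._n = 0

    def set_delay_index(self, n):
        # n is the relative delay index between the excitation and reset edges.
        self._n = n

    def read_output(self):
        # Ideal normalized output exp(-nD/tau) plus Gaussian readout noise.
        ideal = np.exp(-self._n * D_NS / self.tau_ns)
        return ideal + self._rng.normal(0.0, self.noise_rms)

def sweep(device, delay_indices, n_average=N_AVERAGE):
    """Set each delay, record n_average outputs, and keep their mean."""
    curve = []
    for n in delay_indices:
        device.set_delay_index(int(n))
        samples = [device.read_output() for _ in range(n_average)]
        curve.append(np.mean(samples))
    return np.array(curve)

indices = np.arange(0, 64, 4)
curve = sweep(SimulatedFluorimeter(), indices)
print(np.round(curve, 3))
```

Averaging 100 readings reduces uncorrelated noise by a factor of 10, which is why the averaged curve is much smoother than any single reading would be.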



Fig. 6 The block diagram of the discrete-component based fluorimeter prototype

# 2.3.2 Ring oscillator

The clock used to produce the excitation and reset pulses is generated by the 3-stage ring oscillator shown in Fig. 7. Each stage includes an inverter, labeled U<sub>1</sub>, U<sub>2</sub>, and U<sub>3</sub>. A 74 kΩ resistor, R<sub>1</sub>, connects the output of U<sub>2</sub> and the input of U<sub>3</sub> to form a slow signal path. A variable capacitor, C<sub>var1</sub>, is connected between the inputs of U<sub>2</sub> and U<sub>3</sub> to form a fast signal path. The frequency of the output clock is adjusted by the value of C<sub>var1</sub>.



Fig. 7 The schematic of the pulsing circuit[66].

The waveforms at nodes $V_1$, $V_2$, and $V_3$ in the ring-oscillator circuit are plotted in Fig. 8(a) for analysis. Inverter $U_3$ acts as a single-ended comparator. As its input, $V_3$, crosses the threshold voltage, VDD/2, the output of $U_3$ toggles, and the signal is propagated to node V<sub>1</sub> through inverter U<sub>2</sub>. At the rising edge of V<sub>1</sub>, the step at the input of U<sub>2</sub> is transmitted through the fast signal path involving C<sub>var1</sub> and C<sub>x</sub>, so that the voltage at node V<sub>3</sub> has a corresponding voltage step, $\Delta V_3$:

$$\Delta V_3 = \frac{C_{\text{var1}}}{C_{\text{var1}} + C_x} VDD$$
Eqn. 5

Because the left terminal of $R_1$ is driven to ground by inverter $U_2$, and its right terminal is above ground, current flows through $R_1$ to discharge the capacitors. This process ends when $V_3$ falls below VDD/2, at which point the states of $U_3$ and $U_1$ toggle, and a falling edge appears at $V_1$. At this point, $V_3$ also falls by the voltage given by Eqn. 5, and the current through resistor $R_1$ now charges the capacitors to drive $V_3$ high. The charging process continues until $V_3$ rises above VDD/2, when $U_3$ and $U_1$ toggle, and a rising edge appears at node $V_1$. After this point the circuit returns to the initial condition where $V_1$ trips to VDD. The waveforms at each node thus repeat the above behavior to produce the large-signal clock at the output of the ring oscillator. The times spent charging and discharging the capacitors are identical and are called $T_d$.



Fig. 8 The analysis on ring oscillator

To calculate  $T_d$ , the equivalent circuit for discharging capacitors after the rising edge of V<sub>1</sub> is plotted in Fig. 8(b). The differential equation associated with the circuit is

$$\frac{V_3(t)}{R_1} = C_{\text{var1}} \frac{d(VDD - V_3(t))}{dt} + C_x \frac{d(0 - V_3(t))}{dt}$$
Eqn. 6

Solving the equation by using the initial condition listed in Fig. 8(b), and equating the result to the threshold voltage of  $U_3$ , VDD/2, the clock frequency of the pulse is

$$f_{pulse} = \frac{1}{2T_d} = \frac{1}{2R_1(C_{var1} + C_x)\ln(1 + \frac{2\Delta V_3}{VDD})}$$
Eqn. 7
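
The step from Eqn. 6 to Eqn. 7 can be spelled out as follows. Since Fig. 8(b) is not reproduced here, the initial condition $V_3(0) = VDD/2 + \Delta V_3$ is an assumption inferred from the description above, in which $V_3$ sits at the threshold VDD/2 and is stepped up by $\Delta V_3$ at the rising edge of $V_1$.

$$\begin{aligned}
\frac{V_3(t)}{R_1} &= -\left(C_{\text{var1}} + C_x\right)\frac{dV_3(t)}{dt} && \text{(Eqn. 6, with } VDD \text{ constant)} \\
V_3(t) &= \left(\frac{VDD}{2} + \Delta V_3\right) e^{-\frac{t}{R_1\left(C_{\text{var1}} + C_x\right)}} && \text{(using the assumed } V_3(0)\text{)} \\
V_3(T_d) &= \frac{VDD}{2} \;\Rightarrow\; T_d = R_1\left(C_{\text{var1}} + C_x\right)\ln\left(1 + \frac{2\Delta V_3}{VDD}\right)
\end{aligned}$$

Because the charging and discharging intervals are identical, one clock period is $2T_d$, which gives the frequency in Eqn. 7.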

The frequency range calculated for $C_{var1}$ adjusted from 30 pF to 6.5 pF is from 195 kHz to 1.5 MHz, and the measured range was from 212 kHz to 1.5 MHz. The error at the low end is about 8%. A likely reason is that, for a large ratio of $C_{var1}$ to $C_x$, the step at $V_3$ is limited by the ESD circuit of the inverter IC, because this node is connected to a pin of the IC. The RC circuit then takes less time to charge and discharge, leading to a faster oscillation frequency at the output. However, because this effect is non-linear in nature and depends on the particular inverter IC being used, modeling it is outside the scope of this study. In the real measurements, the clock frequency was set to 1 MHz by adjusting $C_{var1}$ to a proper value.

# 2.3.3 Delay circuit

Two delay lines receive the clock generated by the oscillator and produce two sub-clocks that are synchronized but delayed from the raw clock. They are produced at nodes $E_1$ and $R_1$, as shown in Fig. 7. The succeeding circuit for each signal includes a high-pass filter and inverter buffers. The full width at half maximum (FWHM) of the pulse produced by the high-pass filter satisfies

$$t_{FWHM} = RC\ln(2)$$
Eqn. 8

where R and C are the resistance and the capacitance of the filter. In the measurement, the width of the excitation pulse can be adjusted between 20 ns and 70 ns. Because a larger resistor is used in the high-pass filter that produces the reset pulse, its pulse width is tunable between 30 ns and 100 ns. In the real measurement, the excitation pulse width was set to 20 ns, and the reset pulse width was 84 ns. The 64 ns difference between them matches the delay line range of ~64 ns, so that in the time domain the envelope of the excitation pulse can be shifted from the rising edge of the reset pulse to the falling edge of the reset pulse, supporting the maximum detection range.
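
Eqn. 8 can be used in either direction: to predict a pulse width from component values, or to find the RC product needed for a target width. The sketch below does the latter for the 20 ns and 84 ns widths quoted above. The function names are illustrative, and the actual resistor and capacitor values of the prototype are not given in this section, so only RC products are computed.

```python
import math

def pulse_width_s(r_ohm, c_farad):
    """Pulse width from Eqn. 8: t_FWHM = R * C * ln(2)."""
    return r_ohm * c_farad * math.log(2.0)

def rc_for_width_s(t_fwhm_s):
    """RC product required for a target pulse width, by inverting Eqn. 8."""
    return t_fwhm_s / math.log(2.0)

rc_excitation_s = rc_for_width_s(20e-9)  # 20 ns excitation pulse
rc_reset_s = rc_for_width_s(84e-9)       # 84 ns reset pulse

# Span available for sliding the excitation pulse inside the reset pulse,
# compared with the full delay range of 256 steps of 250 ps.
span_ns = 84 - 20
delay_range_ns = 256 * 0.25

print(f"RC (excitation) = {rc_excitation_s * 1e9:.2f} ns")
print(f"RC (reset)      = {rc_reset_s * 1e9:.2f} ns")
print(f"span = {span_ns} ns, delay range = {delay_range_ns} ns")
```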

### 2.3.4 Excitation circuit

Because the signal that drives the LED is pulsed, with its active-high state lasting only 20 ns in a 1000 ns clock period (a 2% duty cycle), a temporary biasing voltage of ~4.3 V, higher than the normal constant biasing voltage of 3.8 V given in the data sheet of the selected LED (Bivar Inc/UV5TZ-405-15), can be chosen to maximize the peak light power during the 20 ns of each period. This is feasible because after the LED is overdriven for 20 ns, the PN junction is able to cool down during the following 980 ns. This was verified by observing no circuit failure after operating the circuit in a long-term experiment lasting over 1 week. The excitation light has a higher power when the LED is biased at a higher voltage. This helps generate stronger fluorescence from the fluorophore chemicals, which improves the signal-to-noise ratio at the front end of the fluorimeter.



Fig. 9 The schematic of the high speed LED driver circuit [66]

The LED driver must excite the fluorophore samples over a nanosecond-scale interval and stop the excitation within a sub-nanosecond interval, which requires fast switching and the ability to deliver high power. In this design, the push-pull driver stage is modified into the circuit shown in Fig. 9 to drive the LED. It is built by stacking two NPN UHF power transistors (NXP Semiconductors /BFG21W, 115, $f_T$=18 GHz), $Q_1$ and $Q_2$, and driving the LED at node B. $Q_1$ and $Q_2$ are built from 4 and 2 parallel-connected transistors, respectively, to increase the peak current. A resistor, $R_1$, is connected between the power supply and the collector of $Q_1$ for current sensing; it was shorted in normal operation. Instead of using a PNP transistor in the lower position, $Q_2$ is an NPN transistor constantly biased by a voltage divider. Using an NPN transistor avoids the problem that the base-emitter junction of a PNP transistor becomes very weakly biased as the voltage at node B approaches 0.6 V, which would limit the sink current available to shut down the LED at the end of each excitation cycle.

$Q_1$ is switched on at the rising edge of the excitation pulse to provide current $I_1$. $Q_2$ delivers a constant current $I_3$ when $Q_1$ is turned on. The difference between $I_1$ and $I_3$ supplies the LED. The cathode of the LED can be grounded or biased at a negative voltage to further increase the peak biasing voltage for a higher peak light power. With a 5 V power supply, the measured peak power from the LED was 42 mW and 106 mW in the two cases, respectively. At the falling edge of the excitation pulse, $Q_1$ is turned off, and the charge at node B is sunk to ground by $Q_2$ to shut off the LED as fast as possible. $Q_2$ initially stays in the active region because the voltage at node B is larger than the base voltage, ~0.6 V. At the end of discharging, $Q_2$ moves to the saturation region, where it behaves as a constant resistance.
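The overdrive argument rests on the low duty cycle of the excitation pulse. The short sketch below works through that arithmetic. It assumes an ideal rectangular 20 ns pulse in a 1000 ns period, so the average-power figures it prints are derived estimates under that assumption, not measured values from this work.

```python
# Duty cycle and average optical power of the pulsed LED drive.
# Assumption: ideal rectangular pulse (finite rise and fall times are ignored).

PULSE_WIDTH_NS = 20.0   # active-high time per period, from the text
PERIOD_NS = 1000.0      # clock period, from the text

duty_cycle = PULSE_WIDTH_NS / PERIOD_NS


def average_power_mw(peak_power_mw: float) -> float:
    """Average optical power of a rectangular pulse train with the given peak power."""
    return peak_power_mw * duty_cycle


avg_grounded_mw = average_power_mw(42.0)    # measured peak, cathode grounded
avg_negative_mw = average_power_mw(106.0)   # measured peak, cathode at negative bias

print(f"duty cycle: {duty_cycle:.1%}")
print(f"average power, cathode grounded: {avg_grounded_mw:.2f} mW")
print(f"average power, negative cathode bias: {avg_negative_mw:.2f} mW")
```

Under this idealization the LED conducts for only 2% of each period, which is consistent with the junction having time to cool between pulses.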

### 2.3.5 Sensing circuit



Fig. 10 The schematic of the sensing circuit[66]

The schematic of the sensing circuit is shown in Fig. 10. Fluorescence is received by an avalanche photodiode (APD), $D_1$, with its cathode biased by a high-voltage source. The anode of $D_1$ is connected to the collector of the bipolar transistor $Q_1$, which is the same UHF power transistor used in the excitation circuit. In the sensing circuit, this transistor resets node A to 3.3 V when it is turned on. In the reset state, the reverse voltage across the APD is ~130 V to achieve a photo-multiplication gain of 100 at 20°C, according to the device data sheet (Hamamatsu S2382). When $Q_1$ is turned off, node A becomes a high-impedance node, and the photo-current generated by the APD is integrated on its parasitic capacitor, $C_{p1}$, to produce a voltage signal with an amplitude proportional to the number of photons received from the fluorescence. The voltage signal at node A is periodic, because $Q_1$ is driven by the reset pulse at the 1 MHz clock frequency. A low-pass filter, consisting of $R_3$ and $C_{p4}$, averages the periodic voltage signal and produces a DC signal at the non-inverting input of op amp U<sub>1</sub>. U<sub>1</sub> then buffers the voltage to the next stage. A replica of this circuit is placed in parallel at the bottom to create the differential signal seen at nodes V<sub>1</sub> and V<sub>2</sub>. The error voltage is amplified by a subtractor, realized by U<sub>3</sub> and resistors $R_7 \sim R_{10}$. The subtractor acts as an inverting amplifier when its non-inverting input is regarded as biased at a constant DC voltage. This holds because the dummy APD, D<sub>2</sub>, does not receive any light during operation. The small-signal output of the subtractor satisfies

$$V_3 = -V_1 \frac{R_9}{R_7} + V_2 \frac{R_{10}}{R_8 + R_{10}} \frac{R_7 + R_9}{R_7}$$
 Eqn. 9

By defining  $R_9=R_{10}$ , and  $R_7=R_8$ , Eqn. 9 is reduced to

$$V_3 = -(V_1 - V_2)\frac{R_9}{R_7}$$
 Eqn. 10

The values of $R_7$ and $R_9$ were chosen to be 33 $\Omega$ and 10 k$\Omega$, resulting in a differential gain of ~303. The succeeding stage, consisting of U<sub>4</sub> and resistors $R_5 \sim R_6$ and $R_{11} \sim R_{12}$, is a unity-gain inverting amplifier. Its output is further filtered by an R-C low-pass filter with a -3 dB bandwidth of ~1.6 Hz.
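As a quick numerical check of Eqn. 9 and Eqn. 10, the sketch below evaluates both expressions with the stated resistor values. The input voltages `v1` and `v2` are hypothetical and only serve to illustrate that the two forms agree when $R_9=R_{10}$ and $R_7=R_8$.

```python
def subtractor_output(v1, v2, r7, r8, r9, r10):
    """Eqn. 9: output of the difference amplifier built around U3."""
    return -v1 * r9 / r7 + v2 * (r10 / (r8 + r10)) * ((r7 + r9) / r7)


R7 = R8 = 33.0     # ohms, from the text
R9 = R10 = 10e3    # ohms, from the text

differential_gain = R9 / R7

# Hypothetical buffered voltages (volts), chosen only for illustration.
v1, v2 = 1.002, 1.000

v3_full = subtractor_output(v1, v2, R7, R8, R9, R10)   # Eqn. 9
v3_reduced = -(v1 - v2) * differential_gain            # Eqn. 10

print(f"differential gain: {differential_gain:.1f}")
print(f"V3 from Eqn. 9:  {v3_full:.4f} V")
print(f"V3 from Eqn. 10: {v3_reduced:.4f} V")
```

With matched resistor pairs, a 2 mV difference at the inputs appears as roughly 0.6 V at the subtractor output, inverted in sign.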

The power spectral density (PSD) of the thermal noise contributed by all resistors, referred to the output of $U_4$, follows

$$\overline{V_{n,R}^2} = 4kT \cdot \left[2R_3 \cdot \left(\frac{R_9}{R_7}\right)^2 + \frac{R_9^2}{R_7} + R_9 + \left(R_8 \parallel R_{10}\right) \cdot \left(1 + \frac{R_9}{R_7}\right)^2 + \left(R_5 \parallel R_6\right) \cdot \left(1 + \frac{R_{12}}{R_{11}}\right)^2 + \frac{R_{12}^2}{R_{11}} + R_{12}\right]$$
Eqn. 11

The PSD contributed from the input referred noise PSD of all op amps satisfies

$$\overline{V_{n,U,tot}^2} = \overline{V_{n,U}^2} \cdot \left[\frac{R_9}{R_7} + \frac{R_{10}}{R_8 + R_{10}} \cdot \frac{R_7 + R_9}{R_7} + \left(1 + \frac{R_9}{R_7}\right) + \left(1 + \frac{R_{12}}{R_{11}}\right)\right]$$
Eqn. 12

where $\overline{V_{n,U}^2}$ is the input-referred noise PSD of each op amp.

After substituting the resistor values into Eqn. 11, it is found that the dominant resistor thermal noise comes from $R_3$ and $R_4$, and the value of the PSD is $3.06 \times 10^{-9}$ V<sup>2</sup>/Hz. The total RMS value of the resistor thermal noise voltage at the sensing circuit output is

$$V_{n,R,RMS} = \sqrt{\int_0^\infty \frac{3.06 \times 10^{-9}}{1 + (2\pi R_{13} C_1 f)^2} df} = 87.5 \,\mu V$$
 Eqn. 13
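The integral in Eqn. 13 has the closed form $S/(4R_{13}C_1)$ for a flat PSD $S$ passed through a first-order R-C filter. The sketch below evaluates it and cross-checks the result numerically. The product $R_{13}C_1$ is not stated explicitly in this section, so it is inferred here from the quoted ~1.6 Hz corner frequency; that inference is an assumption.

```python
import math

PSD_V2_PER_HZ = 3.06e-9   # output thermal-noise PSD, from the text
F_3DB_HZ = 1.6            # approximate -3 dB corner, from the text

# Assumption: R13*C1 is inferred from the quoted corner frequency.
rc_seconds = 1.0 / (2.0 * math.pi * F_3DB_HZ)

# Closed form of the Eqn. 13 integral: S / (4*R*C), i.e. S * (pi/2) * f_3dB.
noise_bandwidth_hz = 1.0 / (4.0 * rc_seconds)
v_rms_closed = math.sqrt(PSD_V2_PER_HZ * noise_bandwidth_hz)


def filtered_psd(f_hz: float) -> float:
    """Integrand of Eqn. 13: flat PSD shaped by a first-order low-pass filter."""
    return PSD_V2_PER_HZ / (1.0 + (2.0 * math.pi * rc_seconds * f_hz) ** 2)


def integrate_midpoint(func, lo, hi, steps):
    """Plain midpoint-rule integration, used only as a cross-check."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width


# Truncate the infinite upper limit at 1000 corner frequencies.
v_rms_numeric = math.sqrt(
    integrate_midpoint(filtered_psd, 0.0, 1000.0 * F_3DB_HZ, 200_000)
)

print(f"closed form: {v_rms_closed * 1e6:.1f} uV")
print(f"numeric:     {v_rms_numeric * 1e6:.1f} uV")
```

Both values land close to the 87.5 µV quoted in Eqn. 13; the small residual difference comes from rounding the corner frequency to 1.6 Hz.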

Substituting the resistor values into Eqn. 12, the resulting noise PSD due to the op amps is $1214 \overline{V_{n,U}^2}$. Therefore, the sensing circuit amplifies the input-referred noise power spectrum of an op amp by a factor of ~1000 at the output. Because of the narrow bandwidth of the low-pass filter, 1.6 Hz, the op amp noise that actually matters is flicker noise. In real measurements, this part of the noise is reduced by averaging the sensing circuit output 100 times at each delay step. The averaging operation is capable of reducing the noise power by a factor of 100, that is, improving the SNR by 20 dB.
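The 20 dB figure follows from the standard result for averaging $N$ samples. The short derivation below assumes the noise in successive samples is uncorrelated with variance $\sigma^2$; flicker noise is correlated at low frequencies, so for that component the result is better read as an upper bound.

$$\sigma_{avg}^2 = \frac{\sigma^2}{N}, \qquad \Delta SNR = 10\log_{10}\left(\frac{\sigma^2}{\sigma_{avg}^2}\right) = 10\log_{10} N = 10\log_{10} 100 = 20 \text{ dB}$$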

As can be observed from the above analysis, because the sensing circuit converts the fast, nanosecond-scale dynamics of the fluorescence decay into a series of DC signals, the electronic noise contributed by the multi-stage circuit can be effectively reduced by adding a simple R-C low-pass filter at the output stage. This advantage is not available to discrete-time circuits that dynamically sample the fast fluorescence decay, such as the time-domain TCSPC instrument.

# 2.4 Experimental results

# 2.4.1 Fluorimeter prototype overview

The discrete-component prototype of the time-domain solid-state fluorimeter was designed and fabricated on two printed circuit boards (PCBs). Both sides of each board are shown in Fig. 11. All circuits handling large-swing digital signals, including the MCU operating at 3.3 V, the ring oscillator operating at 5 V, the delay line circuits operating at 5 V, the inverters operating at 5 V, and the excitation circuit operating at 5 V, were placed on a pulsing board, shown on the left in Fig. 11. The sensing circuit that processes small-signal analog voltages and the high-voltage bias generator used for the avalanche photodiode were placed on a sensing board, shown on the right. The bottom layer of each board is a ground plane, which forms a microstrip structure on the FR-4 material. To make the fluorimeter portable, each board was designed with an area of only 62 mm × 115 mm.



Fig. 11 The fabricated printed circuit boards of the discrete-component fluorimeter prototype: pulsing board (left) and sensing board (right), with both sides of each board shown[66]

After the solid-state components were manually soldered on each board, the boards were assembled into a portable device, shown in Fig. 12. Board-to-board communication is performed through a 5 cm ribbon cable. Black spacers were custom fabricated in a machine shop and placed between the two boards at the edge to form an enclosed measurement chamber. In a measurement, a 1.5 mL disposable cuvette (Fisherbrand<sup>TM</sup> 14-955-127) is plugged into the device to hold the liquid sample. An avalanche photodiode (Hamamatsu S2382) is soldered on the sensing board and sits against the surface of the cuvette. Right-angle illumination is provided by a through-hole 405 nm LED (Bivar Inc/UV5TZ-405-15) that is bent at a 90° angle to illuminate the cuvette from the side.



Fig. 12 The assembled discrete-components based fluorimeter prototype [66]

The MCU in the portable fluorimeter was programmed in C to implement an embedded system. The system is a slave device controlled by a personal computer through USB communication. On the personal computer, a user interface was created in MATLAB, in which users can define the parameters that specify a measurement. In particular, the sequence of delay values defines the integration window in each step; the repetition rate sets how many times each delay step is measured; the averaging number defines how many data points are recorded for one delay step in a single measurement; and the initial amplitude sets the peak sensing output at the beginning of the measurement. The bias voltage of the APD is automatically adjusted by the MCU to satisfy this requirement at the beginning of the measurement.

When the MATLAB user interface issues a measurement command to the slave device, it sends the measurement parameters to the MCU. The C code programmed in the MCU is then executed to control the fluorimeter and perform the specified measurement. In each delay step, the DC voltage output from the sensing circuit is digitized by a 10-bit ADC in the MCU. The data is first stored in the 128 k RAM and is transmitted to the personal computer at the end of the measurement. Once the data is captured, the MATLAB user interface fits the data to a single-exponential function to find the lifetime of the chemical under test.
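The single-exponential fit can be illustrated with a minimal sketch. The fitting routine used in the MATLAB interface is not described here, so the example below substitutes a log-linear least-squares fit written in Python; the delay grid, amplitude, and lifetime are hypothetical test values. This simple approach assumes positive, offset-free data and is only a stand-in for whatever fitting method was actually used.

```python
import math


def fit_single_exponential(t_ns, signal):
    """Fit signal ~ A * exp(-t / tau) by linear least squares on ln(signal).

    Returns (A, tau_ns). Requires strictly positive signal values.
    """
    logs = [math.log(s) for s in signal]
    n = len(t_ns)
    mean_t = sum(t_ns) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in t_ns)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(t_ns, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


# Hypothetical noise-free decay with a 4 ns lifetime sampled every 2 ns.
delays_ns = [float(d) for d in range(0, 21, 2)]
samples = [1.0 * math.exp(-d / 4.0) for d in delays_ns]

amplitude, tau_ns = fit_single_exponential(delays_ns, samples)
print(f"fitted amplitude: {amplitude:.3f}, fitted lifetime: {tau_ns:.3f} ns")
```

On noisy data, a nonlinear least-squares fit that weights all samples equally in the linear domain is usually preferable, since the logarithm amplifies noise in the tail of the decay.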



#### 2.4.2 Characterization result

Fig. 13 The measurement result of 5 $\mu$M 'lucifer yellow' dissolved in PBS[66]

The performance of the discrete-component prototype of the time-domain solid-state fluorimeter was characterized by testing commercial fluorophores. The measurement result for 5 $\mu$M 'lucifer yellow' (MP Biomedicals/M3415) dissolved in PBS is plotted in Fig. 13. The raw data is fitted to a single-exponential curve, shown as the solid line. The fit yields a lifetime of 5.6 ns, which is close to the 5.7 ns reported in the literature[67].

Next, 10 $\mu$M 'fluorescein' (Online science mall/CAS# 518-47-8) dissolved in PBS was measured, giving a lifetime of 4.1 ns. The 0.2 $\mu$M Qdot 585 (Life technologies/Qtracker 585) dissolved in PBS exhibited an 18.8 ns lifetime. Both results match the lifetime standards reported in the literature (4 ns, and 19.5 ns±1.6 ns)[67], [68]. Fig. 14 shows the normalized fluorescence decay data before and after the least-squares fit.



Fig. 14 The measurement results for 3 commercial fluorophores[66]

The detection limit of the fluorimeter was determined by measuring a group of fluorescein PBS solutions at different concentrations. As shown in Fig. 15, the average lifetime and the standard error deviate from the correct value as the concentration falls below 0.5 $\mu$M. This detection limit is a factor of 10 smaller than that of a frequency-domain fluorimeter implemented in an IC[69] developed in 2011.



Fig. 15 The detection limit measurement of the discrete-component fluorimeter prototype[66]

# 2.4.3 Improvement on the detection limit



Fig. 16 The fabricated PCBs of the second version discrete-component fluorimeter

The prototype of the discrete-component fluorimeter was redesigned to improve its detection limit, as shown in Fig. 16. Because a lower detection limit requires a more sensitive optoelectronic front-end in the sensing circuit, two avalanche photodiodes were implemented on the board in the second version. The principle is to double the photosensitive area of the sensor, so that twice as many fluorescence photons can be collected to improve the signal-to-noise ratio.

The second change is separating the excitation board from the board that carries the MCU and the delay circuit. Two excitation boards with smaller areas were implemented so that they can be installed in slots fabricated in the sensing board and the controller board shown in Fig. 16. This enables soldering the through-hole LEDs directly on the excitation boards without bending their package leads, and it guarantees a 90° angle between excitation and sensing. The parasitic effect of the LED package is reduced for a faster response time. With excitation boards on both sides of the cuvette, two LEDs excite a greater area of the chemical sample. In this way the total fluorescence can be doubled, allowing the additional avalanche photodiode to sense a part of the fluorescence that is unavailable in the prototype fluorimeter.

The third change is the removal of the ring oscillator circuit that was used to produce the clock signal for generating pulses. The clock signal produced by the inverter-based ring oscillator is subject to temperature variation and supply noise. A simpler and more effective way of implementing the clock is to use the MCU. The PIC32 MCU is equipped with several serial peripheral interface (SPI) modules. When operating in framed mode, the SCK port generates a continuous clock signal. The clock is pre-scaled from the output of an internal PLL, which receives an 8 MHz reference clock from an external crystal oscillator. The clock frequency of the SPI module can be programmed in software. Because a crystal oscillator is inherently stable, the PLL output synchronized with this reference exhibits lower jitter than the ring oscillator used in the prototype fluorimeter.

The redesigned printed circuit boards with the above modifications were assembled into the device shown in Fig. 17. The two LEDs were placed apart from each other to generate two separate excitation beams. The two avalanche photodiodes were soldered close together on the sensing board. After stacking the controller board and the sensing board, the two APDs sense the two separate excitation beam paths at a 90° angle.



Fig. 17 The assembled second version discrete-component fluorimeter

The portable fluorimeter was put inside an aluminum container to shield the device from external light and electromagnetic waves during measurements. A temperature controller was added to the system to stabilize the ambient temperature inside the measurement chamber, as shown in Fig. 18. A voltage divider built from resistor $R_1$ and thermistor $R_T$ creates a temperature-dependent voltage, V(T). It feeds into the non-inverting input of the op amp U<sub>2</sub>, which is configured as an integrator with resistor $R_2$ and capacitor C. The difference between the reference voltage, $V_{ref}$, and V(T) is applied across $R_2$ to create the integration current. The output of U<sub>2</sub> is amplified by an inverting amplifier built from U<sub>3</sub> and resistors $R_4$~$R_5$. If V(T) is larger than $V_{ref}$, $V_b$ decreases, and transistor $M_1$ draws less current through the Peltier element that removes heat from inside the measurement chamber. As the temperature increases, the resistance of the thermistor $R_T$ decreases, and so does V(T). The transistor $M_1$ then drives the Peltier element with a smaller current to maintain the temperature in the chamber. In steady state, V(T) tracks $V_{ref}$.



Fig. 18 The proportional-integral temperature control circuit used to control the instrument temperature
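The behavior of a proportional-integral temperature loop can be sketched in discrete time. The example below is a generic illustration and not a model of the analog circuit in Fig. 18: the first-order thermal plant, the gains, the time constant, and the cooling rate are all hypothetical values, and the drive polarity is chosen simply so that a warmer chamber receives more cooling.

```python
def simulate_pi_cooling(t_set_c=25.0, t_ambient_c=30.0, kp=0.5, ki=0.05,
                        tau_s=50.0, cool_rate_c_per_s=0.5,
                        dt_s=0.1, duration_s=600.0):
    """Simulate a PI loop cooling a chamber modeled as a first-order system.

    All parameter values are hypothetical. Returns the final temperature.
    """
    temperature = t_ambient_c
    integral = 0.0
    for _ in range(round(duration_s / dt_s)):
        error = temperature - t_set_c             # positive when too warm
        drive_raw = kp * error + ki * integral
        drive = min(max(drive_raw, 0.0), 1.0)     # cooler drive limited to 0..1
        if drive == drive_raw:                    # simple anti-windup:
            integral += error * dt_s              # integrate only when unsaturated
        heat_leak = (t_ambient_c - temperature) / tau_s
        temperature += dt_s * (heat_leak - cool_rate_c_per_s * drive)
    return temperature


final_temperature = simulate_pi_cooling()
print(f"final chamber temperature: {final_temperature:.3f} C")
```

The integral term is what removes the steady-state error: once the loop settles, the error is zero and the integrator alone holds the drive needed to balance the heat leaking in from the ambient.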

The instrument setup was used to measure the lifetime of fluorescein dissolved in PBS with concentrations ranging from 65 nM to 50 $\mu$M. At each concentration, the measurement was repeated 5 times to calculate the average value and the standard deviation. The lifetime of fluorescein reported in the literature is 4 ns[70]. As can be observed from Fig. 19, when the concentration was reduced to 0.25 $\mu$M, the measured average lifetime became 3.6 ns, which is only 10% below 4 ns. As the concentration was decreased further to 125 nM and 62.5 nM, the measurement results were 2.1 ns and 1.8 ns, which deviate significantly from 4 ns. The experiment suggests that the detection limit of the second discrete-component fluorimeter is 0.25 $\mu$M, a factor of 2 improvement over the detection limit of 0.5 $\mu$M observed with the first prototype.
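The deviations quoted above can be tabulated directly from the reported lifetimes. The snippet below is only a worked version of that arithmetic, using the three low-concentration results and the 4 ns literature value from the text.

```python
REFERENCE_LIFETIME_NS = 4.0   # literature value for fluorescein, from the text

# Concentration in nM mapped to measured average lifetime in ns, from the text.
measured_ns = {250.0: 3.6, 125.0: 2.1, 62.5: 1.8}

deviation_percent = {
    conc: (tau - REFERENCE_LIFETIME_NS) / REFERENCE_LIFETIME_NS * 100.0
    for conc, tau in measured_ns.items()
}

for conc, dev in deviation_percent.items():
    print(f"{conc:6.1f} nM: {dev:+.1f} % relative to the reference lifetime")
```

The error grows from about 10% at 0.25 µM to roughly half of the reference value at the two lower concentrations, which is why 0.25 µM is taken as the detection limit.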



Fig. 19 The detection limit measurement result of the second version discrete component fluorimeter

## **CHAPTER 3**

# ACTIVE PIXEL SENSOR CHARACTERIZATION UNDER TSMC 0.35 μm HIGH VOLTAGE TECHNOLOGY

### 3.1 Motivation to characterize sensor on the targeting technology

Realizing the discrete-component fluorimeter in a CMOS integrated circuit is attractive because it increases the sensor integration level, makes the device more portable, and reduces the average cost. Several changes need to be made to migrate the discrete-component electronics to an integrated circuit. The first change is the type of photodetector. A discrete avalanche photodiode is used in the prototype device, as demonstrated in Chapter 2. Such a device is not available in standard CMOS technology. Alternatively, a different type of sensor fabricated in CMOS technology, called the active pixel sensor, has been extensively studied during the past two decades to enable digital imaging[71]. In the CMOS time-domain fluorimeter IC developed in this project, the 3-transistor (3-T) active pixel sensor (APS) was used as the elementary sensing element to perform photon-to-voltage conversion. The pixel sensors are arranged into an imager array for 2-dimensional imaging. Each sensor and its readout circuit are a migrated version of the discrete-component fluorimeter, so that a number of fluorimeters can be multiplexed and built on a monolithic IC.

A 3-T active pixel sensor differs from the avalanche photodiode in that it has no photo-multiplication gain. A 3-T active pixel sensor also does not require a high-voltage bias for proper operation. Because documented resources comparing the performance of the sensors available in the target technology with that of the avalanche photodiode used in the discrete-component device are lacking, many characteristics of the 3-T APS, such as its linearity with light intensity, its sensitivity to light, and its response speed after reset, need to be understood by building various sensors on the IC and testing them carefully. The overall objective in this step is to select sensors with good sensitivity, good linearity, and fast response speed. These sensors were eventually used to build the imager array.

### 3.2 CMOS active pixel sensor in TSMC 0.35 µm high voltage technology

TSMC's 0.35 µm double-poly-four-metal high voltage process was chosen because it provides 7 different types of diodes. These simple silicon PN junctions can serve as photodetectors, because incident photons are absorbed around the space charge region of a reverse-biased PN junction, so that the energy of the photons is received by the silicon and a photo-current is generated in proportion to the amount of received energy. This section does not discuss sensor device physics in depth, because the conversion mechanism from photon energy to electric current has been thoroughly studied[72]. The purpose of the section is to report and analyze the test results of each sensor of interest built in a test tape-out.

In the time-domain fluorimeter IC, the photodiode-based 3-T active pixel sensor (APS) is used as the light-detecting element, a pixel[73]–[75]. The schematic of a 3-T APS is presented in Fig. 20. It consists of a PN-junction photodiode with a grounded anode, a reset transistor $M_1$, a source follower transistor $M_2$, and a row-select transistor $M_3$. All transistors in this example are NMOS FETs. During the reset phase, $M_1$ is turned on. The voltage across the diode is set to a fixed value, called the reset voltage $V_{rst}$, to reverse bias the PN junction. The voltage is actually applied across the junction capacitor, $C_i$, of the diode. After the falling edge of the reset pulse, the PN junction is no longer held at the reset voltage by the voltage source, and photons received by the diode are converted into a current. This current discharges $C_i$ over time, so that the light is converted into a voltage change. If $M_3$ is turned on all the time, the voltage across the photodiode is buffered to the output, $V_{out}$, through a source follower consisting of $M_2$, $M_3$, and the current source shared by an entire column of pixels.
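The light-to-voltage conversion described above can be approximated, to first order, by $\Delta V = I_{ph} \cdot t / C_i$. The sketch below evaluates this relation. The photocurrent, exposure time, and capacitance values are hypothetical, and the junction capacitance is treated as constant even though a real junction capacitance varies with the voltage across it.

```python
def photodiode_voltage_drop(photocurrent_a, exposure_s, capacitance_f):
    """First-order voltage drop on the junction capacitor: dV = I * t / C.

    Assumes a constant photocurrent and a voltage-independent capacitance.
    """
    return photocurrent_a * exposure_s / capacitance_f


# Hypothetical illustration values, not taken from measurements.
PHOTOCURRENT_A = 10e-12   # 10 pA photocurrent
EXPOSURE_S = 140e-6       # 140 us exposure window

drop_small_cap = photodiode_voltage_drop(PHOTOCURRENT_A, EXPOSURE_S, 100e-15)
drop_large_cap = photodiode_voltage_drop(PHOTOCURRENT_A, EXPOSURE_S, 200e-15)

print(f"100 fF node: {drop_small_cap * 1e3:.1f} mV")
print(f"200 fF node: {drop_large_cap * 1e3:.1f} mV")
```

Halving the capacitance at the sensing node doubles the voltage change for the same collected charge, which is why parasitic capacitance at this node directly reduces sensitivity.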



Fig. 20 The schematic of a CMOS 3-T active pixel sensor

A single pixel sensor can be readily realized in a square-shaped layout. It can then be copied in the column and row dimensions to create an array of sensors, called an image sensor array. The array allows spatially different light signals to be sensed across a certain area. By placing a lens in front of the imager array to focus the view, the IC is able to record a 2-dimensional image.

There are four degrees of freedom in building an APS in the target technology: the type of photodiode, the size of the active sensing area, the type of the reset transistor, and the channel width of the reset transistor. Each parameter affects a different aspect of the performance of an APS. Different types of diodes are fabricated with different doping profiles, so their linearity and sensitivity to light differ.

The active sensing area of a diode determines how many photons can be received by the sensor in a given time interval. It also affects the junction capacitance, C<sub>i</sub>, labeled in Fig. 20. The conversion efficiency from the number of received photons to voltage is different for different types of photodiode.

While transistors $M_2$ and $M_3$ are usually implemented as two minimum-sized NMOS FETs to save area, the reset transistor $M_1$ is built with either a PMOS FET or an NMOS FET. The advantage of a PMOS FET is its capability of passing a good logic "high", so that $V_{rst}$ can be as large as the power supply to maximize the sensor's output swing as well as its sensitivity. The drawback of a PMOS FET is that it requires more area than an NMOS FET, because it is fabricated in an N-well in this technology. The NMOS FET minimizes the area, but the actual reverse bias voltage applied to the photodiode is one threshold voltage below the power supply, lowering the sensor's output swing and sensitivity. Another issue with the NMOS FET is that, as the source voltage is charged to one threshold voltage below the power supply, the transistor stays in weak inversion until the end of reset. The subthreshold current mismatch from pixel to pixel creates a mismatch in the actual reset voltage applied to their photodiodes. This is improved with a PMOS FET, because the drain of the reset transistor in all pixels can be charged to the power supply, a constant voltage, during the reset.

Finally, the width of the reset transistor determines how well the reset voltage across the diode can be maintained during reset when the sensor is not shielded from light. Good clamping of the reset voltage across the photodiode is beneficial for a good electronic shutter. Intuitively, a wider reset transistor, because of its smaller on-resistance, can immediately replenish the charge lost from capacitor C<sub>i</sub> due to illumination during reset. This is particularly important because the electronic shutter of the imager is used to separate the excitation light from the emission light during the measurement. The drawback of a wider reset transistor is that it adds more parasitic capacitance and makes C<sub>i</sub> larger.

The objective in this stage of the research is to understand how each degree of freedom in building a CMOS active pixel sensor in the target technology affects its performance. Once the best-performing sensor was identified, it was chosen to serve as the elementary pixel in the final imager IC.

### 3.3 Sensor characterization method

A typical layout of a CMOS 3-T APS corresponding to Fig. 20 is depicted in Fig. 21. The PN junction is formed by doping N-type dopant in P-type substrate, so that the anode is grounded and the cathode is the doped area. Reset transistor  $M_1$ , source follower input transistor  $M_2$ , and the row selection transistor  $M_3$  are placed at the right hand side compactly.



Fig. 21 A typical 3-T CMOS active pixel sensor layout

Among the 7 types of diodes available in the target technology, 4 are PN junctions fabricated in the P-substrate, as shown in Fig. 22 (a), and 3 are fabricated in an N-well, as shown in Fig. 22 (b). Diodes fabricated in the P-substrate can be copied into an imager array with a higher integration level than diodes fabricated in an N-well. This is because the first type of diode shares the bulk silicon as a common well, while the second type is fabricated in separate N-wells that require more area for isolation. Another advantage of the diode fabricated in bulk silicon is that its junction capacitance consists of only one capacitor, $C_i$, as shown in Fig. 22 (a), whereas a diode fabricated in an N-well involves two capacitors, $C_{i1}$ and $C_{i2}$. A smaller junction capacitance per unit active sensing area is favorable, because when the same photo-current discharges the junction capacitor, the voltage across the smaller capacitor changes faster, giving a higher sensitivity to light.



Fig. 22 The cross-sectional view of photodiode fabricated on p-type substrate (a) and on n-well (b)

For the above reasons, diodes fabricated directly in the p-substrate were chosen for characterization. These include the N+/PW diode, N+/PW\_HV diode, NDD/PW\_HV diode, and NW/PW\_HV diode. The N+/PW diode is formed by simply doping N-type dopant into the P-type substrate. It is essentially formed in the same step in which the drain and source of an NMOS FET are formed. In the N+/PW\_HV diode, the thin oxide layer above the silicon before doping, originally used as gate oxide, is thicker than that of the N+/PW diode. The thicker oxide layer blocks some of the dopants from entering the silicon during ion implantation, so the actual doping concentration is lower than in the thinner-oxide case. Because of the lighter doping, the space charge region is wider than with heavy doping. This increases the reverse-bias breakdown voltage, so that the photodiode can be biased at a higher voltage than a low-voltage device to increase the sensitivity and output swing. NDD/PW\_HV stands for N-type deep-diffused-drain (NDDD) high-voltage diode. The metallurgical junction of this type of diode is deeper than in the above two diodes, which allows an even higher breakdown voltage. Finally, in the NW/PW\_HV diode, the cathode of the PN junction is the n-well. It is deeper than the common source and drain diffusion, and its doping concentration is lighter. All these properties further improve the sensitivity of the photodiode.

For each type of APS, the size of the pixel sensor layout was varied from 2 $\mu$m × 2 $\mu$m to 80 $\mu$m × 80 $\mu$m. The reset transistor was either an NMOS FET or a PMOS FET, with a channel width ranging from 0.35 $\mu$m to 68 $\mu$m. A total of 104 pixels were implemented in the testing array shown in Fig. 23. The row decoders and column buffers are placed on the left and at the bottom. The entire pixel array was implemented on a 2 mm × 2 mm silicon chip fabricated by TSMC.



Fig. 23 The layout of the 3-T CMOS APS testing array

After the chip was fabricated along with 3 other chips on a 2.5 mm × 10 mm bulk silicon die, it was diced and wire bonded in a 1 cm × 1 cm 64-pin FPN open cavity package. Two PCBs were designed to test all pixels. The block diagram of the testing circuit is shown in Fig. 24 (a). In the testing instrument, a programmable current source consisting of a 12-bit digital-to-analog converter (DAC), an op amp, an NMOS FET M<sub>1</sub>, and a resistor R<sub>s</sub> is controlled by an MCU to generate the current that drives an LED, so that the light intensity emitted from the LED can be digitally controlled. At each intensity level, the light is converted into a voltage by the selected pixel in the testing array of the chip. The analog outputs from the pixel array column buffers are then digitized by the 10-bit ADC in the PIC32 MCU and transmitted to the personal computer for analysis.
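A programmable current source of this kind can be sketched numerically. The example below assumes the common topology in which the op amp forces the DAC output voltage across R<sub>s</sub>, so that the LED current is the DAC voltage divided by R<sub>s</sub>. The reference voltage, the sense resistor value, and the DAC transfer function (code/4096 of the reference) are hypothetical choices for illustration and are not taken from the actual board.

```python
DAC_BITS = 12          # 12-bit DAC, from the text
V_REF = 3.3            # volts; hypothetical DAC reference
R_SENSE_OHM = 100.0    # ohms; hypothetical value for Rs


def led_current_a(dac_code: int) -> float:
    """LED current, assuming the op amp forces the DAC voltage across Rs."""
    if not 0 <= dac_code < 2 ** DAC_BITS:
        raise ValueError("DAC code out of range for a 12-bit converter")
    v_dac = V_REF * dac_code / 2 ** DAC_BITS
    return v_dac / R_SENSE_OHM


mid_scale_current = led_current_a(2048)
current_step = led_current_a(1) - led_current_a(0)

print(f"mid-scale current: {mid_scale_current * 1e3:.2f} mA")
print(f"current resolution: {current_step * 1e6:.2f} uA per code")
```

With these assumed values, one DAC code corresponds to a current step of a few microamperes, which sets how finely the LED light level can be stepped.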



Fig. 24 The schematic of the CMOS 3-T APS array testing board (a) and board connection (b)

Fig. 24 (b) shows the connection of the testing instrument. The two boards are stacked with spacers. A diffuser (THORLABS DG10-220-MD) was placed between the LED and the imager chip to spread out the light received by the pixels. This optic smooths out the light intensity difference over the pixel array, so that each sensor receives the same amount of light at a given time. A ribbon cable transmits the digital control signals from the MCU to the DAC and delivers the power. During measurement, the testing instrument communicates with a PC through a serial port controlled by a MATLAB script.

A single measurement is performed by first generating a constant biasing current to drive the LED. At this intensity, the ADC in the MCU digitizes the output of the selected sensor twice. The first sample is taken at the falling edge of the reset pulse (for an NMOS-reset pixel) to capture the reset voltage; the second sample is taken at the end of the exposure window. The total amount of photons received by the pixel is represented by the difference between the two samples. The exposure time is controlled by a software-reconfigurable timer inside the MCU. Fig. 25 presents the PCBs used in the measurement.
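The two-sample readout can be expressed as a small calculation. In the sketch below, the ADC reference voltage and the two raw codes are hypothetical; only the 10-bit resolution comes from the text.

```python
ADC_BITS = 10      # 10-bit ADC, from the text
ADC_REF_V = 3.3    # volts; hypothetical ADC reference


def adc_code_to_volts(code: int) -> float:
    """Convert a raw ADC code to volts, assuming an LSB of Vref / 2^N."""
    return code * ADC_REF_V / 2 ** ADC_BITS


def pixel_signal_volts(reset_code: int, exposed_code: int) -> float:
    """Pixel signal: reset sample minus the sample at the end of exposure."""
    return adc_code_to_volts(reset_code - exposed_code)


# Hypothetical raw codes for one pixel at one light level.
signal_v = pixel_signal_volts(reset_code=800, exposed_code=600)
print(f"pixel signal: {signal_v:.3f} V")
```

Taking the difference of the two samples removes the pixel's reset level from the result, so pixels with different reset voltages can be compared on the same scale.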



Fig. 25 The assembled CMOS 3-T APS array testing boards

# **3.4 Experimental results**

All pixels in the sensor characterization tape-out have been tested by the custom instrument. The measurement detail, data analysis, and discussion on each performance corner are presented in this section.

## **3.4.1 LED light power characterization**

In testing the linearity of a pixel sensor with different LED illumination, the pixel output is measured while varying the driving current, or the power, of a 635 nm LED. The relation between LED's current and its light power was characterized by a laser power meter. The result is shown in Fig. 26. In the subsequent data analysis, the fitted curve is used to map the bias current into the light power. This characterization was necessary because the measurement electronics indirectly set the LED light power by setting its current.



Fig. 26 The characterization curve of LED light power vs. LED current.

## 3.4.2 Sensitivity and linearity of the CMOS 3-T APS with PMOS FET reset

The sensitivity of a pixel sensor measures how much voltage change is created by the sensor when it receives a given amount of photons. It is desirable to have the sensitivity as high as possible to detect the weak fluorescence from the chemicals. To characterize the sensitivity, the photon-energy transfer curve shown in Fig. 27 was plotted for each pixel sensor. In this example, the selected APS is reset and then gated on to sense the constant light from the LED. The output of the sensor is first read immediately after the reset, and then read again at the end of the exposure window, which lasts 140 µs. During this period, photons received by the photodiode are converted into electrons that discharge the junction capacitor of the diode, so that the voltage at the sensor's output decreases by a value proportional to the amount of photons received in that period. In the measurement, the light power from the 635 nm LED increases by 16 $\mu$W between steps. The total photon energy from the LED thus increases by 2.2 nJ/step (16 $\mu$W/step × 140 $\mu$s). After 44 measurements, the APS output was plotted against the total photon energy from the LED. A least-squares fit was then performed to extract the slope of the curve. Note that the light is evenly distributed over the surface of the pixel array, so the actual photon energy received by any single APS within the exposure window was a constant fraction of the total amount. For this reason, the slope of the curve is not an absolute measure of a sensor's sensitivity to the received amount of photons. However, by comparing the slopes for different types of sensors, their relative sensitivity can be determined.
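The energy step quoted above can be reproduced, and converted into an approximate photon count, with a few lines of arithmetic. The sketch below treats the LED as monochromatic at 635 nm, which is an idealization, and the resulting count refers to the total light emitted per step, not to the fraction reaching any single pixel.

```python
PLANCK_J_S = 6.62607015e-34      # Planck constant
LIGHT_SPEED_M_S = 2.99792458e8   # speed of light in vacuum

POWER_STEP_W = 16e-6     # light power increment per step, from the text
EXPOSURE_S = 140e-6      # exposure window, from the text
WAVELENGTH_M = 635e-9    # LED wavelength, from the text
NUM_STEPS = 44           # number of measurements, from the text

energy_per_step_j = POWER_STEP_W * EXPOSURE_S

# Assumption: monochromatic emission at the nominal wavelength.
photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / WAVELENGTH_M
photons_per_step = energy_per_step_j / photon_energy_j

print(f"energy per step: {energy_per_step_j * 1e9:.2f} nJ")
print(f"photon energy at 635 nm: {photon_energy_j:.3e} J")
print(f"photons per step: {photons_per_step:.2e}")
print(f"energy span over {NUM_STEPS} steps: "
      f"{NUM_STEPS * energy_per_step_j * 1e9:.1f} nJ")
```

The product gives 2.24 nJ per step, which the text rounds to 2.2 nJ.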



Fig. 27 The response of the 80  $\mu$ m × 80  $\mu$ m NDD/PW\_HV PMOS FET-reset 3-T APS to light energy

Accurate measurement of the fluorescence decay profile requires the response of an APS to light to be linear. Intuitively, the total amount of photons from the fluorescence must be converted into electrons with a strictly linear relation. The total nonlinearity error is used to characterize the linearity of the APS[76]. In Fig. 27, the nonlinearity percentage error at each measurement point is calculated by

$$err_i = \frac{F_i - R_i}{\max(F) - \min(F)} \times 100\%$$
 (*i* = 1, 2, ..., 44) Eqn. 14

where $F_i$ is the fitted value at the *i*th point and $R_i$ is the measured value at that point. The total nonlinearity error is then calculated by taking the peak-to-peak value of *err*. A smaller total nonlinearity error represents a more linear response of an APS to light.
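Eqn. 14 and the peak-to-peak step can be illustrated with a short sketch. The data below are synthetic: a hypothetical linear response and a second response with a small quadratic term added, each sampled at 44 energy steps. The slope, offset, and curvature are illustration values only and do not represent measured pixel data.

```python
def linear_fit(x, y):
    """Ordinary least-squares line fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_nonlinearity_percent(x, y):
    """Peak-to-peak value of err_i from Eqn. 14, plus the fitted slope."""
    slope, intercept = linear_fit(x, y)
    fitted = [slope * xi + intercept for xi in x]
    span = max(fitted) - min(fitted)
    errors = [(f - r) / span * 100.0 for f, r in zip(fitted, y)]
    return max(errors) - min(errors), slope


# Synthetic example: 44 energy steps of 2.24 nJ each (values in nJ).
energy_nj = [2.24 * i for i in range(1, 45)]
ideal_output = [3.0 - 0.02 * e for e in energy_nj]                 # hypothetical
curved_output = [3.0 - 0.02 * e + 1e-5 * e ** 2 for e in energy_nj]  # hypothetical

nl_ideal, slope_ideal = total_nonlinearity_percent(energy_nj, ideal_output)
nl_curved, slope_curved = total_nonlinearity_percent(energy_nj, curved_output)

print(f"ideal response:  slope {slope_ideal:.4f} V/nJ, nonlinearity {nl_ideal:.3f} %")
print(f"curved response: slope {slope_curved:.4f} V/nJ, nonlinearity {nl_curved:.3f} %")
```

The fitted slope serves as the relative sensitivity (gain), while the peak-to-peak error captures how far the response bends away from the fitted line.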

The sensitivity (Gain) and the nonlinearity of NDD/PW\_HV PMOS-reset APSs were characterized by first testing 80 $\mu$m × 80 $\mu$m sensors with 0.35 $\mu$m, 5 $\mu$m, 25 $\mu$m, and 68 $\mu$m-wide PMOS-reset transistors (Fig. 28 (a) and (b)). The increased channel width creates more parasitic capacitance at the cathode of the photodiode. The sensitivity of the APS decreases as the reset transistor becomes larger, because more charge is needed to change the voltage. No significant variation in nonlinearity was observed when the channel width of the PMOS FET reset transistor was increased.



Fig. 28 The sensitivity (Gain) and the total non-linearity of NDD/PW\_HV PMOS FET reset APSs.

To quantitatively analyze the effect of the parasitic capacitors on sensitivity and nonlinearity, all parasitic capacitors at the cathode of the photodiode are shown in Fig. 29 (a). They include the PN junction capacitor of the photodiode, $C_d$; the drain-to-body capacitor of $M_1$, $C_{db1}$; the gate-to-drain capacitors, $C_{gd1}$ and $C_{gd2}$, of $M_1$ and $M_2$; and the gate-to-body capacitor of $M_2$, $C_{gb2}$. The parasitic capacitors associated with transistor $M_2$ are negligible when $M_1$ is chosen many times wider than $M_2$. In Fig. 29 (b), the three dominant capacitors at the cathode of the photodiode are presented. $C_d$ and $C_{db1}$ are non-linear PN junction capacitors. $C_d$ is grounded, and $C_{db1}$ is connected to the N-well, which is biased to the power supply, 3.3 V. $C_{gd1}$ is the voltage-independent overlap capacitor formed by the diffusion of the drain area underneath the gate.



Fig. 29 The parasitic capacitors at the cathode of photodiode (a), and the dominant capacitors used to quantitatively analyze the sensitivity and the nonlinearity of the APS (b).

Voltage  $V_a$  is first reset to power supply, 3.3 V, during reset; then it starts to fall because the photodiode pulls electric charges from this node to ground when light is detected. The total number of reduced charges at the cathode of the photodiode satisfies

$$\Delta Q = \int_{3.3-\Delta V}^{3.3} C_d(V_a) dV_a - \int_{3.3-\Delta V}^{3.3} C_{db1}(3.3-V_a) dV_a - \int_{3.3-\Delta V}^{3.3} C_{gd1} dV_a$$
 Eqn. 15

where $\Delta V$ is the voltage change at $V_a$. The equations for $C_d(V_a)$ and $C_{db1}(3.3 - V_a)$ are

$$C_d(V_a) = C_{d0}(1 + \frac{V_a}{V_{bi}})^{\frac{-1}{2}}$$
 Eqn. 16

$$C_{db1}(3.3 - V_a) = WC_{db0}(1 + \frac{3.3 - V_a}{V_{bi}})^{\frac{-1}{2}}$$
 Eqn. 17

where $C_{d0}$ is the zero-bias PN junction capacitance of the photodiode; $C_{db0}$ is the zero-bias drain-to-body junction capacitance per unit channel width; $W$ is the channel width of the reset transistor; and $V_{bi}$ is the built-in potential of the PN junction. Substituting Eqn. 16 and Eqn. 17 into Eqn. 15, with the overlap capacitor written as $C_{gd1} = WC_{gd0}$, the total amount of reduced charge equals

$$\Delta Q = 2V_{bi}C_{d0}\left[\left(1 + \frac{3.3}{V_{bi}}\right)^{\frac{1}{2}} - \left(1 + \frac{3.3 - \Delta V}{V_{bi}}\right)^{\frac{1}{2}}\right] + 2WV_{bi}C_{db0}\left[\left(1 + \frac{\Delta V}{V_{bi}}\right)^{\frac{1}{2}} - 1\right] - WC_{gd0}\Delta V$$
Eqn. 18

Because the change of electric charges is proportional to the energy of light emitted from the LED, Eqn. 18 also satisfies

$$\Delta Q = \eta E$$
 Eqn. 19

where $\eta$ is a constant representing the efficiency with which light energy is converted into charge by the photodiode.



Fig. 30 The photon-energy transfer curve plotted according to analytical expression with different sizes of the reset transistors

Equating Eqn. 18 and Eqn. 19 and varying $\Delta V$ from 0 to 3.3 V, a group of transfer curves can be plotted for different channel widths, $W$, as shown in Fig. 30. In plotting this figure, the values of $C_{d0}$, $C_{db0}$, and $C_{gd0}$ are 3.25 pF, 1.9 fF/$\mu$m, and 0.24 fF/$\mu$m, as obtained from the post-layout simulation; $V_{bi}$ was chosen to be 0.6 V. Fig. 30 confirms that a larger reset transistor reduces the sensitivity of an APS, because more light energy is needed to change the voltage across the photodiode by the full power supply voltage, 3.3 V.
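As a rough numerical check of this trend, the sketch below evaluates Eqn. 18 exactly as printed, including its signs, with the capacitance values quoted above. The function names are hypothetical. Because $\eta$ in Eqn. 19 is not given, the sketch reports the charge $\Delta Q$ rather than the light energy $E$; the two differ only by the constant factor $\eta$.

```python
import math

V_DD = 3.3        # reset level in volts (power supply, from the text)
V_BI = 0.6        # built-in potential in volts (value chosen in the text)
C_D0 = 3.25e-12   # zero-bias photodiode capacitance in farads
C_DB0 = 1.9e-15   # zero-bias drain-to-body capacitance in farads per um
C_GD0 = 0.24e-15  # overlap capacitance in farads per um


def delta_q(delta_v, width_um):
    """Charge in coulombs for an output swing delta_v, per Eqn. 18 as printed."""
    diode = 2 * V_BI * C_D0 * (
        math.sqrt(1 + V_DD / V_BI) - math.sqrt(1 + (V_DD - delta_v) / V_BI)
    )
    drain_body = 2 * width_um * V_BI * C_DB0 * (math.sqrt(1 + delta_v / V_BI) - 1)
    overlap = width_um * C_GD0 * delta_v
    return diode + drain_body - overlap


def transfer_curve(width_um, points=34):
    """List of (charge, swing) pairs for swings from 0 V to V_DD."""
    swings = [V_DD * i / (points - 1) for i in range(points)]
    return [(delta_q(dv, width_um), dv) for dv in swings]


# Reset transistor widths used in the measurements.
for w in (0.35, 5.0, 25.0, 68.0):
    full = delta_q(V_DD, w)
    print(f"W = {w:5.2f} um: charge for full 3.3 V swing = {full * 1e12:.3f} pC")
```

With these values the charge needed for a full swing grows with $W$, which is the same qualitative behavior as the family of curves in Fig. 30.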

In the next step, the channel width of the PMOS FET reset transistor was fixed at 5 $\mu$m while the area of the photodiode was increased from 5 $\mu$m × 5 $\mu$m to 80 $\mu$m × 80 $\mu$m, as shown in Fig. 28 (c) and (d). The smallest sensor is compact, but its sensitivity is lower than that of the largest sensor by a factor of ~2. The linearity of the sensor improves as the size of the photodiode increases. These observations indicate that using a large photodiode to build the active pixel sensor improves both its sensitivity and its linearity. However, a large pixel size limits the number of pixels in a given silicon area, so there is a tradeoff between a CMOS imager's spatial resolution and its imaging quality. The main reason for the reduced image quality when using the smallest possible pixel sensor is the reduction of incident photons[77].

The above measurement protocol was carried out for all other PMOS FET reset APSs. Results are presented in Fig. 31 for N+/PW\_HV APSs, in Fig. 32 for N+/PW APSs, and in Fig. 33 for NW/PW\_HV APSs. Similar trends in sensitivity and linearity with the reset transistor's channel width and the photodiode's size can be observed for N+/PW\_HV and N+/PW APSs.



Fig. 31 The sensitivity (Gain) and the total non-linearity measurement results for N+/PW\_HV PMOS FET reset APSs.





Fig. 32 The sensitivity (Gain) and the total non-linearity measurement results for N+/PW PMOS FET reset APSs.

It is observed that the gain of NW/PW\_HV APSs is not monotonic with respect to the reset transistor's channel width or the diode size, unlike the other types of APSs. One explanation is that the depth of the n-well was not as precisely controlled in the target technology as the other types of doping. The variation in n-well depth means that the PN junction depth varies from one APS to another. This causes no issue for a normal PMOS FET used in circuits, because the n-well is deep enough for the drain and source doping, and the channel resides close to the surface of the n-well. However, the varying junction depths of the NW/PW\_HV APSs across the test array result in different quantum efficiencies, that is, the number of electron-hole pairs generated by one incident photon. This is because a deeper PN junction requires photons to travel farther to reach the space charge region, the area where the photon energy is absorbed and electron-hole pairs are generated.



Fig. 33 The sensitivity (Gain) and total non-linearity measurement results for NW/PW\_HV PMOS FET reset APSs.

The photon-energy transfer curves for all four types of APS with 80  $\mu$ m × 80  $\mu$ m photodiodes were plotted in Fig. 34 for comparison. NW/PW\_HV APSs present the highest photo-electric sensitivity. N+/PW APSs weakly respond to light. The sensitivity of N+/PW\_HV and NDD/PW\_HV APSs stays in the middle.



Fig. 34 The linear response to light energy for four types of APS with 80  $\mu$ m × 80  $\mu$ m photodiode

# 3.4.3 Sensitivity and linearity of the CMOS 3-T APS with NMOS FET reset

The NMOS FET reset-type APSs were also tested following the same procedure. Fig. 35 is the measurement result for the 80 $\mu$m × 80 $\mu$m NDD/PW\_HV APS with a 5 $\mu$m-wide NMOS FET reset transistor. Its linearity is close to that of the PMOS FET reset APSs, but the output swing is reduced by ~40% because of the threshold voltage loss of the NMOS switch.



Fig. 35 The sensitivity (Gain) and nonlinearity measurement result of the 80 $\mu$m × 80 $\mu$m NDD/PW\_HV APS with 5 $\mu$m-wide NMOS FET reset transistor.

In the next measurement, the width of the NMOS reset transistor was increased from 0.35 $\mu$m to 68 $\mu$m while the diode size was fixed at 80 $\mu$m × 80 $\mu$m. The sensitivity and nonlinearity of the NDD/PW\_HV photodiode with NMOS FET reset transistors are plotted in Fig. 36 with solid lines. Test results for the same type of photodiode with PMOS FET reset transistors are plotted with dashed lines for comparison. One observation is that NMOS FET reset APSs provide higher linearity than their PMOS counterparts. This is because the drain-to-substrate parasitic capacitance of an NMOS transistor is more linear than that of a PMOS transistor. No significant difference in sensitivity was observed between the two types of APSs.



Fig. 36 The comparison of sensitivity (Gain) and nonlinearity of NMOS FET and PMOS FET reset APSs with 80  $\mu$ m × 80  $\mu$ m diode and different reset transistor channel widths

# 3.4.4 Reset speed

The proposed fluorimeter separates the excitation light from the emission light in the time domain, which eliminates the need for optical filters. However, non-ideal effects in the electronics require the reset of an APS to be fast enough. This section discusses the characterization of the reset speed for different types of sensors.

The APS is gated off electrically by turning on the reset transistor. This creates a conducting path from the reset power supply to the cathode of the photodiode. During the reset, however, the APS is not optically shuttered from the excitation light, so electrons and holes are still generated inside the PN junction. They produce a photocurrent that flows through the channel resistance of the reset transistor and creates a voltage drop. The effective reset voltage applied across the photodiode is therefore offset below the reset voltage. Because the offset depends on the intensity of the light, this voltage drop is an error voltage created in the reset phase. It takes time to eliminate the error voltage after the excitation light is removed, and only then can normal light sensing start. This time corresponds to the minimum delay between the end of excitation and the end of reset. It has to be as small as possible, because the fluorescence starts to decay as soon as the excitation ends. A large delay between the start of the exponential decay and the start of photon integration (the end of reset) leads to a dramatically attenuated fluorescence signal at the beginning of integration.



Fig. 37 The equivalent circuit models of the APS during and after the excitation light interference in reset phase

The equivalent circuit models for an APS during and after the excitation pulse interference in reset phase are depicted in Fig. 37. It is assumed that the photo-current generated in the APS is constant during the excitation. The time-varying voltage across the diode, u<sub>c</sub>, in both cases can be calculated by solving the non-homogeneous first-order linear differential equation

$$\frac{du_c}{dt} + \frac{1}{R_{on}C_d} u_c = \frac{1}{C_d} \left( \frac{V_{rst}}{R_{on}} - I_{ph} \right)$$
Eqn. 20

modeling the excitation interference period and

$$\frac{du_c}{dt} + \frac{1}{R_{on}C_d}u_c = \frac{V_{rst}}{R_{on}C_d}$$
 Eqn. 21

describing the behavior after the excitation. In these equations, $R_{on}$ is the channel resistance of the reset transistor, $C_d$ is the diode capacitance, $V_{rst}$ is the reset voltage, and $I_{ph}$ is the photocurrent, which is proportional to the constant excitation light. All of these quantities are treated as time-invariant in the analysis.

Calculation results for Eqn. 20 and Eqn. 21 are

$$u_{c}(t) = R_{on}I_{ph}(e^{\frac{-t}{R_{on}C_{d}}} - 1) + V_{rst} \quad (0 \le t < T_{ext})$$
 Eqn. 22

and

$$u_c(t) = -R_{on}I_{ph}e^{\frac{-(t-T_{ext})}{R_{on}C_d}} + V_{rst} \quad (t \ge T_{ext} \gg R_{on}C_d)$$
 Eqn. 23

where $T_{ext}$ is the duration of the excitation pulse. From Eqn. 22 and Eqn. 23 it is first concluded that the type of reset transistor and diode determine the time constant, $R_{on}C_d$, with which an APS recovers to the fully reset state after the interference caused by the excitation. The second conclusion is that high-sensitivity APSs tend to reset more slowly. For example, a very sensitive APS responds to weak light more strongly than a less sensitive APS, because of its larger $I_{ph}$. But this also increases the coefficient of the exponential term, $R_{on}I_{ph}$, so the APS takes longer to settle to $V_{rst}$ within, say, 0.1 % error. The last conclusion is that a PMOS-FET-reset APS is faster than its NMOS counterpart, because a PMOS FET operates in the triode region during the excitation light interference, making $R_{on}$ relatively small, while an NMOS transistor stays in the vicinity of the cut-off region, presenting a significantly larger channel resistance.
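The second and third conclusions can be made concrete with a short sketch based on the recovery expression of Eqn. 23. Setting the residual offset $R_{on}I_{ph}e^{-t/(R_{on}C_d)}$ equal to a tolerance of 0.1 % of $V_{rst}$ and solving for $t$ gives the recovery time. The function names are hypothetical, and the values of $R_{on}$ and $I_{ph}$ in the example are illustrative assumptions rather than measured parameters. The capacitance reuses the 3.25 pF zero-bias value quoted earlier as a stand-in for $C_d$.

```python
import math


def reset_error(t_after, r_on, c_d, i_ph):
    """Offset below V_rst (volts) a time t_after past the end of excitation."""
    return r_on * i_ph * math.exp(-t_after / (r_on * c_d))


def recovery_time(r_on, c_d, i_ph, v_rst=3.3, tolerance=1e-3):
    """Time (seconds) until the offset falls below tolerance * v_rst."""
    initial = r_on * i_ph          # offset at the end of a long excitation
    target = tolerance * v_rst
    if initial <= target:
        return 0.0                 # already within tolerance
    return r_on * c_d * math.log(initial / target)


C_D = 3.25e-12   # farads, zero-bias value from the text used as a stand-in
I_PH = 10e-6     # amperes, hypothetical photocurrent

# Hypothetical channel resistances: a low value for a switch in triode,
# a high value for a switch near cut-off.
for label, r_on in (("low R_on", 1e3), ("high R_on", 100e3)):
    t_rec = recovery_time(r_on, C_D, I_PH)
    print(f"{label}: recovery to 0.1 % takes {t_rec * 1e9:.1f} ns")
```

The recovery time grows with both the time constant $R_{on}C_d$ and the logarithm of $R_{on}I_{ph}$, so a larger channel resistance penalizes the reset speed twice.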

The reset speed of the different APSs was characterized in two stages. In the first, coarse measurement stage, the excitation pulse and the reset pulse are configured as shown in Fig. 38. In (a), a 2 $\mu$s-wide excitation light pulse generates electrons and holes in the PN junction during reset as an interference. For NMOS-FET-reset APS the falling edge of the reset pulse is delayed from that of the excitation pulse by only 2 ns. This time interval is too short for all excess carriers to be driven out of the space charge region and for the reset state to recover fully. When the 2 ns time interval is smaller than, say, 7 times an APS's time constant, a small number of residual carriers remain in the PN junction capacitor and create an offset error. That offset error is measured by subtracting the actual reset voltage across the photodiode from the ideal value. To measure the ideal reset voltage, as shown in Fig. 38 (a), the falling edge of the reset pulse is delayed from that of the excitation pulse by $\sim 2$ ns, which allows all excess carriers to vanish before the end of the reset. The difference between the actual reset voltage and the ideal value is denoted as $\Delta V$, as shown in Fig. 38 (b). A faster APS corresponds to a smaller $\Delta V$.



Fig. 38 The coarse measurement on the reset speed of the APS

The coarse measurement results for 80 $\mu$m × 80 $\mu$m APSs with all types of reset transistors are depicted in Fig. 39. First, PMOS-FET-reset APSs are significantly faster than their NMOS counterparts, because the observed error voltages, $\Delta V$, are smaller. Second, NW/PW\_HV APSs exhibit the highest sensitivity to weak light according to Fig. 34, but they also create more excess carriers during the excitation interference, leading to a larger reset offset error. This error is reduced as the reset transistor becomes wider, but it is still non-zero for the widest channel width, 68 $\mu$m. Third, no reset offset error was observed on NDD/PW\_HV, N+/PW\_HV, and N+/PW APSs with PMOS-FET reset, indicating that the 2 ns recovery window is sufficient for these sensors to remove the effect of the excitation interference.



Fig. 39 The coarse reset speed measurement results for various types of APSs

The coarse measurement shows that NMOS-FET-reset APSs are not suitable for measuring the fast decay of fluorescence if the recovery window is limited to 2 ns. N+/PW diode based sensors respond very weakly to light, so they are also not suitable for measuring fluorescence. Further verification of the speed is needed for the PMOS-FET-reset APSs that show no reset offset error in the coarse method. A testing circuit shown in Fig. 40 (a) characterizes the speed more precisely by measuring the falling edge of the excitation light pulse in the same way as a fluorescence decay is measured. As shown in Fig. 40 (b), the delay of the excitation pulse from the reset pulse was initially set so that the entire pulse is integrated by the sensor (shaded area). Then the delay between the two pulses was reduced in several steps so that the excitation pulse is only partially integrated. At each delay step, the difference between the output of the sensor and a reference voltage, $V_{ref}$, is amplified by a factor of $\frac{R_f}{R_s}$, 120, according to

$$V_{sense} = \frac{R_f}{R_s} (V_{pix} - V_{ref})$$
 Eqn. 24

A sample-and-hold circuit updates the sensed voltage at the end of each excitation cycle. The held voltage is then low-pass filtered and digitized by the 10-bit ADC in the MCU.



Fig. 40 The fine measurement on the reset speed of the APS

The falling edge of the excitation light pulse was first measured by using an avalanche photodiode (Hamamatsu, S2382). Its cathode was biased at ~130 V through a DC/DC converter (EMCO G05), and its anode was connected to a Tektronix MSO4104 5 GS/s oscilloscope with a 50 $\Omega$ input resistance. This connection allows the time-resolved excitation light pulse to be linearly converted into a time-varying photocurrent through the avalanche photodiode. That current then creates a voltage drop across the input resistance of the oscilloscope, so the waveform displayed on the oscilloscope directly represents the excitation light over time. As shown in Fig. 41, the FWHM of the light pulse was 38 ns and it took 4 ns to fall from 90% to 10%.



Fig. 41 The excitation light pulse measured by an avalanche photodiode (Hamamatsu, S2382) and a 5GS/s Oscilloscope

The same light pulse was measured by the 3-T CMOS APSs. The 80 $\mu$m × 80 $\mu$m NDD/PW\_HV APS with a 0.35 $\mu$m-wide PMOS-FET-reset was tested first, and the result is shown in Fig. 42. The left panel shows the raw output data from the instrument. Because it is the integral of the light signal, the data was numerically differentiated to recover the original pulse shape, as shown in Fig. 42 (b). The falling edge measures 10 ns and the FWHM pulse width is 40 ns. A longer falling edge than that measured by the avalanche photodiode indicates that this type of sensor still suffers from reset offset error.
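The differentiation step can be sketched as follows. This is a minimal illustration, not the processing actually used for Fig. 42: the trapezoidal test pulse, the 1 ns delay step, and the function names are all assumptions made for the example. The integrated signal is built from a known pulse so that the recovered shape can be checked against it.

```python
def differentiate(delays_ns, integrated):
    """Central-difference derivative of the integrated signal versus delay."""
    n = len(delays_ns)
    out = []
    for i in range(n):
        lo = max(i - 1, 0)          # one-sided difference at the ends
        hi = min(i + 1, n - 1)
        out.append((integrated[hi] - integrated[lo]) / (delays_ns[hi] - delays_ns[lo]))
    return out


def fwhm(x, y):
    """Full width at half maximum, interpolating linearly at both crossings."""
    half = max(y) / 2
    above = [i for i, v in enumerate(y) if v >= half]
    first, last = above[0], above[-1]

    def cross(i, j):
        return x[i] + (half - y[i]) * (x[j] - x[i]) / (y[j] - y[i])

    left = cross(first - 1, first) if first > 0 else x[0]
    right = cross(last, last + 1) if last < len(y) - 1 else x[-1]
    return right - left


def trapezoid_pulse(t):
    """Hypothetical light pulse with 4 ns edges and a flat top (arbitrary units)."""
    if 10 <= t < 14:
        return (t - 10) / 4
    if 14 <= t <= 50:
        return 1.0
    if 50 < t <= 54:
        return (54 - t) / 4
    return 0.0


delays = [float(t) for t in range(0, 65)]          # delay axis in ns
pulse = [trapezoid_pulse(t) for t in delays]

# What an integrating sensor would report: the running integral of the pulse.
integrated = [0.0]
for k in range(1, len(delays)):
    step = delays[k] - delays[k - 1]
    integrated.append(integrated[-1] + 0.5 * (pulse[k - 1] + pulse[k]) * step)

recovered = differentiate(delays, integrated)
print(f"FWHM of recovered pulse: {fwhm(delays, recovered):.1f} ns")
```

Differentiation amplifies noise in real data, so some smoothing before or after this step would normally be needed; that is left out here to keep the sketch short.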



Fig. 42 The fine measurement result of the 80  $\mu m$   $\times$  80  $\mu m$  NDD/PW\_HV APS with 0.35  $\mu m$ -wide PMOS-FET-reset

With the same type of photodiode, the width of the reset transistor was increased from 0.35 $\mu$m to 68 $\mu$m to visualize its effect on reset speed. As shown in Fig. 43, a wider reset transistor improves the reset speed so that the falling edge of the signal now matches the avalanche photodiode measurement result.



Fig. 43 The fine measurement result on 80  $\mu$ m × 80  $\mu$ m NDD/PW\_HV APS with 68  $\mu$ m PMOS-FET-reset

The same measurement was performed on 80  $\mu$ m × 80  $\mu$ m N+/PW\_HV APS with 0.35  $\mu$ m and 68  $\mu$ m PMOS-FET-reset. Fig. 44 (b) shows that with a small reset transistor the speed of the APS is strongly limited. But in Fig. 45 (b), the measured falling edge becomes accurate enough when using a 68  $\mu$ m-wide PMOS reset transistor.



Fig. 44 The fine measurement result on the 80 µm × 80 µm N+/PW\_HV APS with 0.35 µm PMOS reset transistor

Fig. 45 The fine measurement result for 80 $\mu$m × 80 $\mu$m N+/PW\_HV type APS with 68 $\mu$m PMOS reset transistor

# 3.4.5 Summary

In this section the 3-T CMOS APSs in the TSMC 0.35 µm high-voltage technology were characterized. The NW/PW\_HV diode based APS exhibits a strong photoelectric conversion gain (sensitivity), but its reset is unsatisfactorily slow even with the widest PMOS reset transistor. N+/PW based APSs suffer from weak sensitivity, even though their reset speed is fast. The linearity and sensitivity of the NDD/PW\_HV and N+/PW\_HV diode based APSs are comparably high, with NDD/PW\_HV slightly higher. Their reset speed measurements demonstrated that both are able to accurately measure the 4 ns falling edge of an excitation light pulse. A large active sensing area provides higher sensitivity to light, because the number of photons received by the sensor increases with the receiving area. Linearity also improves with a larger sensor, because the PN junction of the diode dominates. A PMOS-FET-reset sensor provides a larger signal swing than an NMOS-FET-reset sensor. It also provides a faster reset because of the smaller on-state channel resistance from the reset power line to the cathode of the photodiode. Based on the above, the NDD/PW\_HV diode based APS was chosen as the elementary sensor used in the final IC. The size was chosen to be 20 $\mu$m × 20 $\mu$m, because this is a reasonably small width that the column-parallel signal conditioning circuit, described in the next chapter, can achieve.

#### **CHAPTER 4**

# MIXED-SIGNAL BUILDING BLOCKS OF CMOS TIME-DOMAIN FLUORIMETER IC

## 4.1 Design consideration

The signal conditioning building blocks of the discrete-component fluorimeter were re-implemented as integrated circuits in this chapter. However, directly duplicating all existing discrete-component circuit schematics in the IC design would be inappropriate. First, an active pixel sensor realized in a CMOS IC is not as sensitive to light as an avalanche photodiode realized as a discrete component, because it lacks the avalanche multiplication effect. This leads to a lower signal amplitude at the sensor output for the same amount of fluorescence, so additional amplification is required after the front-end sensor. Using a continuous-time resistive-feedback amplifier as in the discrete-component circuit would be challenging, because a large resistor ratio (>1,000) is needed. Such a topology also consumes static power in all columns, making it less attractive for a large sensor array. Second, the sensor output in the discrete-component circuit is converted into a DC voltage through an RC low-pass filter for averaging. The filter was built from a 1 M$\Omega$ resistor and the 4.44 pF parasitic capacitor of the op amp IC. Implementing such a circuit in each column of the sensor array may require considerable area for both the resistor and the capacitor. Third, the variable capacitor used in the pulse generator of the discrete-component circuit is difficult to implement in an IC.

An effective solution to the first two problems is to implement discrete-time circuits with sampling behavior. In particular, the RC low-pass filter can be replaced by a sample-and-hold circuit, so that the sensor output is only used as a DC signal at particular moments during a clock cycle. A switched-capacitor amplifier supports both the sampling function and the amplification function. It consumes no static power because the feedback is capacitive. The high gain requirement can be met by dividing the gain into two stages.

The difficulty of implementing a variable capacitor is addressed by implementing a variable resistor in the IC instead. This can be realized by designing a trans-conductance amplifier connected in negative feedback. The input resistance of the module is the inverse of the trans-conductance, and it can be adjusted through the bias current.



#### 4.2 System architecture of the Imager IC

Fig. 46 The system architecture of the proposed monolithic fluorimeter IC

The imager IC integrates 2,048 NDD/PW\_HV based CMOS image sensors, 32-channel column-parallel readout circuitry, and a digital controller on a single chip, as shown in Fig. 46. A row of pixels is enabled by a row decoder with a serial peripheral interface (SPI). Once selected, the outputs of these pixels are connected to the corresponding column current sources, which complete the voltage followers built inside each selected pixel. The buffered voltages then feed into the 32-channel switched-capacitor amplifiers in parallel for amplification. The outputs are digitized by the 32-channel ADCs at the end of each measurement cycle. Finally, the binary data is transmitted off the chip through the I/O interface.

To control the delay step required in the fluorescence decay measurement, the controller logic is divided into a gating controller and a timing controller. Both controllers are gated and remain idle to save power while the reset signal, Global\_reset, is held high. When the reset is pulled low, the gating controller generates the reset pulse and the excitation pulse to create a specific integration window. The timing controller is then synchronized by the reset pulse to produce the timing signals required by the amplifiers, the ADCs, and the I/O. Both the gating controller and the timing controller contain internal registers that can be re-configured before each measurement. These registers define the delay between the reset and excitation pulses, as well as the period of some timing signals.

Because the excitation circuit is a high-speed LED driver that delivers high power in a short period, this circuit block was implemented off the chip. The imager IC generates the excitation signal, synchronized with the rest of the timing signals, to drive the off-chip LED driver. The system block diagram including the fluorimeter IC, the off-chip LED driver, and the chemical sample is presented in Fig. 47. Once the chemical sample is excited by the excitation light in the horizontal direction, the emission light is collected by the imager IC orthogonally. In the actual measurement, the emission light is received by all enabled pixels, and the succeeding column-parallel signal conditioning circuits read out the signals. Here the timing controller shown in Fig. 47 is further divided into a top-level controller, the main controller, and a lower-level controller, the ADC controller. Once re-configured in software, the main controller only triggers the ADC controller at certain points in each cycle to perform two conversions. The first conversion starts right after the reset pulse to record the actual reset level of the pixel; the second conversion reads the output of the sensor at the end of the integration window. Upon completion of both data conversions, the ADC controller sends a handshake signal, called ADC\_Idle, to the main controller as an acknowledgment. The top level can then continue to generate timing signals for further measurement cycles.



Fig. 47 The block diagram of a complete imager system for one selected pixel

#### 4.3 Individual functional block design and verification

The chip was implemented with a $32 \times 64$ pixel array, filled with 20 $\mu$m × 20 $\mu$m NDD/PW\_HV diode based APSs with 5 $\mu$m-wide PMOS-FET-reset transistors. Correspondingly, 32-channel column-parallel amplifiers and ADCs were implemented to fit the 25 $\mu$m pitch of the sensor array. All functional blocks were built hierarchically throughout the design phase. Circuits at each level were designed starting from schematic entry, followed by pre-layout simulation and custom layout, and ending with post-layout verification before chip-level integration.

# 4.3.1 Gating controller

The gating controller produces the excitation pulse and the reset pulse to define various integration windows. As shown in Fig. 48, an off-chip system master clock feeds into the chip. Its frequency is divided down by a 16-bit pre-scalar to generate the clock signal from which the pulses are derived. The 50% duty cycle clock is converted into two short pulses by a tunable dual-channel pulse generator. One pulse is used for excitation, Excitation<sub>0</sub>, and the other is used for reset, Reset<sub>0</sub>. The final pulse signals are delayed versions of these pulses, produced by two delay line circuits. The input of the dual-channel pulse generator can be multiplexed to an external clock for debugging purposes. Internal signals are buffered to pads for monitoring.



Fig. 48 The block diagram of the gating controller

# 4.3.1.1 16-bit pre-scalar design and the measurement

The 16-bit pre-scalar circuit consists of a 16-bit ripple counter and re-configurable selection logic, as shown in Fig. 49. The ripple counter uses transmission-gate D-flip-flops to build 16 stages of cascaded frequency dividers. The global reset signal is common to all D-flip-flops and asynchronously gates off the counter in idle mode. With the system master clock as the input, the counter provides 16 clock frequencies with dividing ratios of $1:2^x$ (x=1,2,...,16).

A 16-1 multiplexer logic receives all the derived clocks and outputs the selected clock based on the address sent to register during programming. A switch in the switch bank is closed for a particular address to connect one of the derived clocks to the output.



Fig. 49 The RTL block diagram of the 16-bit pre-scalar

All functional blocks were designed at the transistor level and were implemented in a 295 $\mu$m × 125 $\mu$m layout. After the chip was fabricated, this functional block was measured with a custom test instrument and a Tektronix MSO 4104 mixed-signal oscilloscope. A 10 MHz system master clock was fed into the circuit, and the output frequency was measured by the oscilloscope while the pre-scale ratio was increased. Table. 2 presents the measurement data for each configuration. Besides the correct frequencies observed on the oscilloscope, this result confirmed the correct operation of the serial programming data link. This was essential for measuring the more complicated on-chip programmable logic, as well as for debugging the most complicated operation of the I/O interface.

| Frequency ratio | fmeas (pre-layout/post-layout) |
|-----------------|--------------------------------|
| 1:2             | 5.000 MHz                      |
| 1:4             | 2.500 MHz                      |
| 1:8             | 1.250 MHz                      |
| 1:16            | 625.0 kHz                      |
| 1:32            | 312.5 kHz                      |
| 1:64            | 156.3 kHz                      |
| 1:128           | 78.13 kHz                      |
| 1:256           | 39.06 kHz                      |
| 1:512           | 19.53 kHz                      |
| 1:1024          | 9.766 kHz                      |
| 1:2048          | 4.883 kHz                      |
| 1:4096          | 2.441 kHz                      |
| 1:8192          | 1.221 kHz                      |
| 1:16384         | 610.4 Hz                       |
| 1:32768         | 305.2 Hz                       |
| 1:65536         | 152.6 Hz                       |

Table. 2 16-bit pre-scalar circuit post-simulation data and the post-silicon measurement data
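The divide ratios in Table. 2 follow directly from the behavior of a ripple counter, which the sketch below models at a purely behavioral level. The function names are hypothetical and the model ignores gate delays, the asynchronous reset, and the selection multiplexer. It only counts how many full output cycles each divide-by-two stage completes and derives the expected frequencies for the 10 MHz master clock.

```python
MASTER_CLOCK_HZ = 10e6  # master clock used in the measurement


def simulate_ripple_counter(stages, input_cycles):
    """Count the full output cycles of each divide-by-two stage."""
    state = [0] * stages
    cycles = [0] * stages
    for _ in range(input_cycles):
        # Each input cycle toggles stage 0. A stage that falls back to 0
        # has completed one output cycle and toggles the next stage.
        for i in range(stages):
            state[i] ^= 1
            if state[i] == 1:
                break
            cycles[i] += 1
    return cycles


def divided_frequency(x):
    """Expected output frequency in Hz for the ratio 1:2^x."""
    return MASTER_CLOCK_HZ / 2 ** x


counts = simulate_ripple_counter(16, 2 ** 16)
for x, count in enumerate(counts, start=1):
    # Frequency from the simulation: output cycles per input cycle times f_in.
    f_sim = MASTER_CLOCK_HZ * count / 2 ** 16
    print(f"1:{2 ** x:<6d} {f_sim:12.1f} Hz")
```

The printed frequencies can be compared line by line with the values in the table.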

# 4.3.1.2 Dual-channel pulse generator design and measurement

In the discrete-component fluorimeter, a short pulse for excitation or reset is created by sending a 50% duty cycle clock to a high-pass filter and sharpening the edges of the filter's output with inverters. Similarly, the on-chip single-channel pulse generator in the fluorimeter IC is built from a high-pass filter with a fixed input capacitor, $C_i$, and a variable resistor, $R_i$, as shown in Fig. 50. The variable resistor is a trans-conductance amplifier connected in a feedback loop. When a small-signal voltage, $\Delta V$, is applied at its inverting input, the current in the feedback path, $\Delta I$, equals $g_m \Delta V$, so the equivalent resistance looking into the inverting input of the trans-conductor is $\frac{1}{g_m}$. The value of the resistor can be adjusted by tuning the bias current of the trans-conductor. After the filter stage, transistors $M_1$ and $M_2$ limit the voltage at node A to between -0.3 V and 3.6 V. Two inverters sharpen the edges of the pulse before it drives the succeeding stages. The design uses a 1.04 pF input capacitor in one channel to create the excitation pulse and a 1.57 pF input capacitor in the second channel to create the reset pulse, because the excitation pulse should be narrower than the reset pulse. An additional inverter was added in the reset pulse channel to create the negative pulse required by the PMOS-FET-reset APS. The entire dual-channel pulse generator occupies an area of 410 $\mu$m × 245 $\mu$m.



Fig. 50 The schematic of pulse generator



Fig. 51 The schematic of the trans-conductance amplifier used in the pulse generator

Fig. 51 shows the schematic and the transistor aspect ratios of the trans-conductance amplifier. With its output connected to the inverting input to form the negative feedback loop, the circuit provides a sinking or sourcing path for the input capacitor, $C_i$, to discharge after each transition edge of the input clock. Because the step at the input equals the power supply, Vdd, it is a large signal, and the resistance seen by the input capacitor, $C_i$, changes from a large-signal resistance to a small-signal resistance over time. For example, at the rising edge of the input clock, the voltage at the inverting input of the circuit experiences the same step, if the parasitic capacitance is neglected. Transistor $M_1$ is fully turned off, allowing all of the tail current, $2I_{con}$, to flow through transistor $M_2$. This current is mirrored to the left branch by the current mirror consisting of $M_3$ and $M_4$. The maximum current that discharges capacitor $C_i$ is therefore limited by the total tail current. This indicates that in the initial phase after the input clock edge, the equivalent resistance seen by the input capacitor is

$$R_{a} = \frac{Vdd}{2I_{con}}$$
 Eqn. 25

As the drain and gate voltage of transistor $M_1$ decreases, it is eventually turned on and stays in the saturation region, allowing part of the tail current to be steered to the left arm. The input resistance seen by the capacitor $C_i$ in this phase equals

$$R_b = \frac{1}{g_m}$$
Eqn. 26

Both the large signal resistance and the small signal resistance can be tuned by the bias current of the trans-conductance amplifier. The exact relation between pulse width and the bias current was tested in circuit simulation to satisfy sufficient tuning range for both pulses. It was further measured on the IC by plotting the pulse width across different bias currents.
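As a first-order illustration of Eqn. 25, the sketch below evaluates the large-signal resistance and the corresponding R·C product for the two input capacitors. The 3.3 V supply and the list of $I_{con}$ values are assumptions chosen for illustration only, and the function name is hypothetical; the actual pulse width also depends on the small-signal phase and on the inverter thresholds, which are not modeled here.

```python
# First-order sketch of Eqn. 25: R_a = Vdd / (2 * I_con).
VDD = 3.3            # assumed supply voltage in volts (not stated for this block)
C_EXC = 1.04e-12     # excitation-channel input capacitor from the text, in farads
C_RST = 1.57e-12     # reset-channel input capacitor from the text, in farads


def large_signal_resistance(i_con, vdd=VDD):
    """Equivalent resistance right after the clock edge (Eqn. 25)."""
    return vdd / (2.0 * i_con)


for i_con in (1e-6, 5e-6, 20e-6):   # illustrative currents only
    r_a = large_signal_resistance(i_con)
    print(f"I_con={i_con * 1e6:5.1f} uA  R_a={r_a / 1e3:8.1f} kOhm  "
          f"R_a*C_exc={r_a * C_EXC * 1e9:7.1f} ns  "
          f"R_a*C_rst={r_a * C_RST * 1e9:7.1f} ns")
```

The product scales as $1/I_{con}$, and the larger reset-channel capacitor gives a product about 1.5 times that of the excitation channel.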

In the measurement on silicon, the tuning range of the pulse generator was tested by setting the $V_{ref}$ of the trans-conductor to 1.25 V, 1.3 V, and 1.4 V, and sweeping $I_{bias}$ from 1 $\mu$A to 40 $\mu$A in each case. Post-layout simulation results are plotted for comparison, as shown in Fig. 52, Fig. 53, and Fig. 54. For very small $I_{bias}$, pulses were not observed in the measurement because the circuit cannot provide sufficient discharging current for the input capacitor. The tuning range reached its maximum when $V_{ref}$ was 1.4 V. Under this condition, the excitation pulse width can be varied from 50 ns to 220 ns, and the reset pulse width from 100 ns to 400 ns.



Fig. 52 The tuning range of the excitation and reset pulse when  $V_{ref}$  was 1.25 V



Fig. 53 The tuning range of the excitation and reset pulse when  $V_{ref}$  was 1.3 V



Fig. 54 The tuning range of the excitation and reset pulse when  $V_{ref}$  was 1.4 V

# 4.3.1.3 Delay line design and measurement

The circuit used to delay a pulse is called a delay line. This design adopted a delay line structure that connects many unity-delay circuits in series, with a multiplexer used to select a tap along the line[78]. As shown in Fig. 55, the input pulse propagates through  $2^{N+1}$  unity-delay circuits to generate N pulses with incremental delays on both rising and falling edges. Any pulse along the line can be selected by providing an N-bit address to the multiplexer. The actual delay between input and output can be expressed as

$$TD(D_{in}) = \sum_{i=1}^{D_{in,dec}+1} t_{pd}(i) + t_{MUX}(D_{in})$$
 Eqn. 27

where:

TD -- the total delay for binary address  $D_{in}$ .

 $D_{in,dec}$  -- the decimal value of  $D_{in}$ .

 $t_{pd}(i)$  -- the actual delay of the ith unity-delay circuit.

 $t_{MUX}(D_{in})$  -- the propagation delay of the multiplexer for address  $D_{in}$ .

The sources of non-linearity in the delay line include the delay mismatch between different delay stages and the mismatch of propagation delay between the multiplexing channels.
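Eqn. 27 can be illustrated with a short sketch. The per-stage delays and multiplexer delays below are hypothetical values for an 8-tap line, not data from the chip, and the function name is an assumption.

```python
def total_delay(address_bits, stage_delays, mux_delays):
    """Eqn. 27: sum the first (D_in,dec + 1) stage delays, then add the
    multiplexer delay of the selected tap."""
    d_dec = int(address_bits, 2)          # decimal value of the binary address
    return sum(stage_delays[: d_dec + 1]) + mux_delays[d_dec]


# Hypothetical 8-tap line: nominal 1.1 ns per stage with small mismatch (ns).
stage_delays = [1.10, 1.05, 1.15, 1.08, 1.12, 1.09, 1.11, 1.10]
mux_delays = [8.3] * 8                    # assumed identical mux paths (ns)

for address in ("000", "011", "111"):
    print(address, round(total_delay(address, stage_delays, mux_delays), 2), "ns")
```

Unequal entries in `stage_delays` or `mux_delays` are what make the steps between adjacent addresses non-uniform.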



Fig. 55 The schematic of the delay line circuit[78]

The design implemented two delay line circuits, one for the excitation pulse and another for the reset pulse. The excitation pulse delay line includes 128 stages, and the reset pulse delay line contains 64 stages. Different numbers of stages are used because the excitation pulse is narrower than the reset pulse. These numbers allow the two pulses to be either completely separated or completely overlapped as the two extremes, depending on the measurement purpose.

The design implemented the adjustable delay element described in [78] in each stage, and post-silicon measurement was performed for comparison with the post-layout simulation result presented in that work. In the first measurement, the bias voltages for the NMOS draining transistors, PMOS sourcing transistors, and PMOS load capacitor in the unity-delay circuit were set to 3.3 V, 0 V, and 0 V. A 120 ns-wide reset pulse was generated by the pulse generator by setting V<sub>ref</sub> to 1.4 V and I<sub>bias</sub> to 24  $\mu$ A. The pulse was fed into the input of the 64-stage delay line, and the output was measured by a Tektronix MSO 4104 oscilloscope as the delay line address varied from 000000<sub>2</sub> to 111111<sub>2</sub>. The measured transfer function is presented in Fig. 56(a). The minimum delay, maximum delay, and average step size were 9.4 ns, 80.3 ns, and 1.1 ns. The DNL of the delay line is presented in Fig. 56(b). Its maximum value is +0.24 LSB, and its minimum value is -0.46 LSB. The measured transfer curve is fitted to a linear function, and the difference between the measured data and the fitted data determines the INL, as shown in Fig. 56(c). The maximum and minimum values of the INL are +0.30 LSB and -0.38 LSB. These errors are an order of magnitude larger than the post-layout simulation results[78], because the simulation did not take into account the process variation between delay stages.
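The DNL and INL extraction described above can be sketched as follows. The text states only that the INL is taken against a linear fit, so the DNL definition used here (step deviation from the average step, in LSB) is a common convention assumed for illustration, and the sample delays are hypothetical rather than measured data.

```python
def dnl_inl(delays):
    """Return (lsb, dnl, inl) for delays measured at consecutive addresses.

    lsb : average step size
    dnl : (step - lsb) / lsb for each address transition
    inl : deviation from a least-squares straight line, in LSB
    """
    n = len(delays)
    lsb = (delays[-1] - delays[0]) / (n - 1)
    dnl = [(delays[k + 1] - delays[k] - lsb) / lsb for k in range(n - 1)]

    # Least-squares line fit: delay = slope * code + intercept.
    codes = range(n)
    mean_x = sum(codes) / n
    mean_y = sum(delays) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(codes, delays))
             / sum((x - mean_x) ** 2 for x in codes))
    intercept = mean_y - slope * mean_x
    inl = [(y - (slope * x + intercept)) / lsb for x, y in zip(codes, delays)]
    return lsb, dnl, inl


# Hypothetical delays in ns for an 8-address line (not measured data).
example = [9.4, 10.5, 11.7, 12.7, 13.9, 15.0, 16.0, 17.1]
lsb, dnl, inl = dnl_inl(example)
print(round(lsb, 3), [round(d, 2) for d in dnl], [round(i, 2) for i in inl])
```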



Fig. 56 The measurement results on the 64-stage delay line that delays the reset pulse

Under the same bias condition, the 128-stage delay line used to delay the excitation pulse was measured using a 60 ns-wide pulse. Measurement results are presented in Fig. 57. The minimum DNL was -1.1 LSB, indicating that the multiplexing path associated with the MSB experienced a stronger mismatch than the other paths. This is because the 128-stage delay line combines two 64-stage delay lines, but the multiplexing paths connecting the two blocks were not laid out symmetrically. To work around this issue in the measurement, this delay step is avoided when designing the integration window series.



Fig. 57 The measurement results on the 128-stage delay line that delays the excitation pulse

# 4.3.2 Switched-capacitor programmable-gain amplifier (SC-PGA) design and measurement

# 4.3.2.1 Circuit design and closed-loop gain analysis

In the discrete-component fluorimeter, the front-end avalanche photodiode provides an avalanche multiplication gain of ~100 to amplify the weak fluorescence. After averaging by the low-pass filter, the DC electrical signal is further amplified by 330 in the sensing circuit. In a CMOS APS, avalanche multiplication is not available, so the signal needs to be amplified outside the pixel. A 3-stage switched-capacitor amplifier is used for this purpose, as shown in Fig. 58. The first stage is a non-inverting amplifier built from op amp U<sub>1</sub>, transistors M<sub>1</sub>~M<sub>6</sub>, and capacitors C<sub>1</sub> and C<sub>2</sub>. The second stage is a buffer that maximizes the load impedance seen by the first stage and minimizes the source impedance seen by the third stage. The third stage is an inverting amplifier built from op amp U<sub>3</sub>, transistors  $M_7 \sim M_{12}$ , and capacitors  $C_{3x}$  (x = 0, 1, 2, 3) and C<sub>4</sub>.



Fig. 58 The schematic of the switched-capacitor variable gain amplifier circuit

Stage 1 is controlled by a 4-phase clock, which divides the operation into a sampling phase and an amplification phase[79], [80]. In the sampling phase, switches  $M_1$  and  $M_4$  are closed. The bottom plate of capacitor  $C_1$  is connected to the pixel output, and its top plate is held at  $V_{ref1}$  by the feedback loop. Switch  $M_6$  is open and switch  $M_5$  is conducting, so that the offset voltage of  $U_1$  is sampled on capacitor  $C_2$ . After the fluorescence is integrated by the pixel, both  $M_1$  and  $M_4$  are opened, trapping charges on  $C_1$ .

The amplification phase starts at the rising edge of  $\Phi$ 3. At this moment, switches  $M_2$  and  $M_6$  are closed, and switch  $M_5$  is opened. The feedback loop around  $U_1$  sources or drains charges to  $C_2$  to maintain the voltage at node A at  $V_{ref1}$ . This process continues until capacitor  $C_1$  is fully discharged. Notice that there is a 300 ps delay between the falling edge of  $\Phi$ 1 and the rising edge of  $\Phi$ 3, measured on the clock generation circuit. This means the amplification phase starts 300 ps after the end of the sampling phase. The non-overlapping clock signals make sure charges are trapped on  $C_1$  and  $C_2$  before amplification starts.



Fig. 59 The equivalent circuits of stage 1 in sampling phase (a) and in amplification phase (b)

The equivalent circuits for stage 1 in sampling phase and the amplification phase are presented in Fig. 59 to quantitatively study the dynamic closed-loop behavior of the circuit. Neglecting the offset voltage of the op amp, voltage at node A in sampling phase is

$$V_{A0} = \frac{A_0}{1 + A_0} V_{ref1}$$
 Eqn. 28

where  $A_0$  is the open-loop DC gain of the op amp U<sub>1</sub>. At the beginning of the amplification phase, capacitor C<sub>2</sub> is connected in the feedback path, and the bottom plate of capacitor C<sub>1</sub> is connected to V<sub>ref1</sub>. Because the voltage at the output of the op amp cannot change abruptly due to the capacitive load, the change of the voltage at the bottom plate of C<sub>1</sub> moves the voltage at node A to

$$V_{A1} = V_{A0} + (V_{ref1} - V_{in}) \cdot \frac{C_1}{C_1 + C_2}$$
 Eqn. 29

After this point, due to the error voltage present at the op amp input, its output pulls charges from  $C_2$  to ground to restore the voltage at node A to  $V_{ref1}$ . At the end of the charge transfer, the voltage at the op amp output becomes  $V_x$ , and the voltage at node A becomes

$$V_{A2} = V_{A1} + (V_x - V_{A0}) \cdot \frac{C_2}{C_1 + C_2}$$
 Eqn. 30

In the steady state, voltage at the op amp output satisfies

$$V_x = (V_{ref1} - V_{A2}) \cdot A_0$$
 Eqn. 31

Solving Eqn. 28 to Eqn. 31, the final output of the amplifier is

$$V_{x} = \frac{V_{in} \frac{C_{1}}{C_{1} + C_{2}} - V_{ref1} \frac{C_{1}}{C_{1} + C_{2}} \frac{A_{0}}{1 + A_{0}}}{\frac{1}{A_{0}} + \frac{C_{2}}{C_{1} + C_{2}}}$$
Eqn. 32

Assuming the open-loop DC gain of the op amp is large but comparable to the capacitor ratio of  $C_1$  to  $C_2$ , Eqn. 32 can be simplified as

$$V_x = \frac{V_{in} - V_{ref1}}{\frac{1}{A_0} (1 + \frac{C_2}{C_1}) + \frac{C_2}{C_1}}$$
Eqn. 33

It can be observed that for a finite value of  $A_0$ , the closed-loop gain is limited to  $A_0$  as  $C_2$  approaches 0. In other words, the closed-loop gain is limited by the open-loop DC gain of the op amp.

All three op amps used in the PGA are two-stage op amps with a Miller compensation capacitor. The zero caused by the Miller capacitor path is cancelled by a transistor biased in the triode region. The open-loop DC gain of the op amp is ~80 dB, and the phase margin is 85.6°, as shown in Fig. 60. The design ratio of  $C_1$  to  $C_2$  is 300, because in the discrete-component electronics the gain of the amplifier stage was ~300. According to Eqn. 33, the design value of the closed-loop gain for stage 1 is 291.2, with a 2.9% gain error due to the finite open-loop gain. When the circuit is fabricated, it suffers from a larger gain error because the additional parasitic capacitance on  $C_2$  makes the ratio of  $C_1$  to  $C_2$  smaller than 300. Another effect is the gate parasitic





Fig. 60 Frequency response of op amps used in switched-capacitor PGA based on post-layout simulation
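The stage 1 design gain quoted from Eqn. 33 can be checked numerically. The sketch below assumes an open-loop gain of exactly 80 dB and evaluates only the gain magnitude; the function name is hypothetical.

```python
def closed_loop_gain(a0, c_ratio):
    """Eqn. 33 with c_ratio = C1/C2: gain magnitude including finite A0."""
    beta = 1.0 / c_ratio                      # C2/C1
    return 1.0 / ((1.0 / a0) * (1.0 + beta) + beta)


A0 = 10 ** (80 / 20)                          # 80 dB open-loop gain in V/V
gain = closed_loop_gain(A0, 300)              # design ratio C1/C2 = 300
error = (300 - gain) / 300
print(f"gain = {gain:.1f}, gain error = {error:.1%}")

# As C2 -> 0 the capacitor ratio grows without bound,
# but the closed-loop gain saturates near A0.
print(f"limit as C2 -> 0: {closed_loop_gain(A0, 1e12):.1f}")
```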

Because an avalanche photodiode was used to detect the fluorescence in the discrete-component fluorimeter, weak fluorescence could be amplified by a factor of ~100 owing to the avalanche gain of the sensor itself. The active pixel sensor used in the IC cannot provide in-pixel amplification, so the ~100 gain factor needs to be provided outside the pixel. This was realized by buffering the stage 1 output through stage 2 and implementing stage 3 in the PGA circuit to provide additional gain. The design implemented 4 input capacitors,  $C_{30}$ ~ $C_{33}$ , and a fixed 5 fF feedback capacitor,  $C_4$ . Input capacitors can be optionally connected to the circuit by turning on the associated switches,  $M_7$ ~ $M_{10}$ , making stage 3 programmable. This stage uses the same clock,  $\Phi 1$  and  $\overline{\Phi 1}$ , as stage 1 and divides the operation into the same sampling and amplification phases. In the sampling phase,  $C_4$  is shorted by turning on  $M_{12}$ . The closed loop around  $U_3$  samples the offsets of  $U_1$ ,  $U_2$ , and  $U_3$  onto the input capacitor(s) connected to the circuit. After the sampling phase,  $M_{12}$  is opened, and switch  $M_{11}$  is used to partially cancel the channel charge injection. As the output of stage 1 propagates to the input of stage 3, the feedback loop of  $U_3$  pulls charges from  $C_4$  to drive node B towards  $V_{ref2}$ . In the end, all charges transferred to the input capacitor(s) are also transferred to  $C_4$ . Similar to the closed-loop gain analysis for stage 1, the final output of stage 3 is

$$\Delta V_{out} = \frac{-\Delta V_{in}}{\frac{1}{A_0} (1 + \frac{C_4}{C_3}) + \frac{C_4}{C_3}}$$
Eqn. 34

where  $\Delta V_{in}$  is the input voltage at stage 3. Stage 3 is an inverting amplifier whose maximum ideal closed-loop gain equals A<sub>0</sub> as the feedback capacitor, C<sub>4</sub>, approaches 0. When connecting C<sub>30</sub>, C<sub>30</sub>~C<sub>31</sub>, C<sub>30</sub>~C<sub>32</sub>, and C<sub>30</sub>~C<sub>33</sub> as the input capacitor(s), the actual closed-loop gains are 5.996, 104.878, 201.822, and 296.886. The corresponding closed-loop gain errors are 0.07%, 1.06%, 2.02%, and 3.00%. Combining these with the gain calculated for stage 1, and assuming stage 2 is ideal, the programmable gains of the entire PGA are 65 dB, 89 dB, 95 dB, and 98 dB.
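The stage 3 gains can be reproduced from Eqn. 34 in the same way. The ideal ratios $C_3/C_4$ of 6, 106, 206, and 306 used below are an assumption inferred from the quoted gains and gain errors (only the 30 fF value of $C_{30}$ and the 5 fF value of $C_4$ are stated in the text), and the open-loop gain is taken as exactly 80 dB.

```python
import math


def closed_loop_gain(a0, c_ratio):
    """Magnitude of Eqn. 34 with c_ratio = C3/C4 (also Eqn. 33 with C1/C2)."""
    beta = 1.0 / c_ratio
    return 1.0 / ((1.0 / a0) * (1.0 + beta) + beta)


A0 = 1e4                                      # 80 dB open-loop gain in V/V
STAGE1_GAIN = closed_loop_gain(A0, 300)       # stage 1, from Eqn. 33

# Assumed ideal C3/C4 ratios, inferred from the quoted gains and errors.
ideal_ratios = [6, 106, 206, 306]

for ratio in ideal_ratios:
    g3 = closed_loop_gain(A0, ratio)
    total_db = 20 * math.log10(STAGE1_GAIN * g3)
    print(f"C3/C4={ratio:3d}  G3={g3:8.3f}  "
          f"error={(ratio - g3) / ratio:.2%}  total={total_db:.1f} dB")
```

Under these assumptions the computed total gains fall within about 1 dB of the figures quoted above.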

# 4.3.2.2 Noise analysis

The noise introduced by the PGA degrades the signal-to-noise ratio (SNR) seen by the succeeding ADCs compared to the original pixel output. This effect should be minimized because the SNR at the pixel output tends to be low for a very weak fluorescence signal. Assuming ideal op amps, the total input referred noise power spectral density (PSD) of the entire PGA satisfies

$$\overline{V_{in,tot}^2} = \overline{V_{in1}^2} + (\overline{V_{in2}^2} + \overline{V_{in3}^2}) \frac{C_2}{C_1}$$
 Eqn. 35

where  $\overline{V_{in1}^2}$ ,  $\overline{V_{in2}^2}$ , and  $\overline{V_{in3}^2}$  are the input referred PSDs of stages 1, 2, and 3. Because the ratio of C<sub>2</sub> to C<sub>1</sub> is  $\frac{1}{300}$ , the dominant noise source seen at the input of the PGA is the first stage.


Fig. 61 The noise sources of stage 1 amplifier during sampling mode

The noise sources in the first stage include the input referred noise of the op amp  $U_1$  and the thermal noise from transistors  $M_1$ ,  $M_2$ ,  $M_4$ ,  $M_5$ , and  $M_6$  when they are turned on. The analysis is split into the sampling mode and the amplification mode. In sampling mode, the noise sources are shown in Fig. 61. The effect of noise in this phase is that capacitors  $C_1$  and  $C_2$  keep tracking the thermal noise from the transistors until sampling is completed. The noise is then stored on the capacitors in the form of charge, which appears at the output through the charge transfer mechanism. The noise PSD deposited on capacitor  $C_1$  is

$$\overline{\Delta V_{C1}^2} = (\overline{V_{s1}^2} + \overline{V_{in,ref}^2}) \frac{1}{1 + (2\pi f R_{on1} C_1)^2}$$
Eqn. 36

Similarly, the noise PSD deposited on capacitor C<sub>2</sub> is

$$\overline{\Delta V_{C2}^2} = (\overline{V_{s5}^2} + \overline{V_{in,ref}^2}) \frac{1}{1 + (2\pi f R_{on5} C_2)^2}$$
Eqn. 37

As for the thermal noise generated by M<sub>4</sub>, the feedback loop continuously adjusts the output of U<sub>1</sub> so that the inverting input of the op amp is at virtual ground, and the effect of  $\overline{V_{s4}^2}$  on both C<sub>1</sub> and C<sub>2</sub> is cancelled.

The noise sources in amplification mode are shown in Fig. 62. Three mechanisms contribute to the input referred noise. First, the noise voltage sampled on capacitor C<sub>1</sub> during the sampling phase is amplified by a factor of  $\frac{C_1}{C_2}$  to the output, in the form of the ideal charge transfer shown in Fig. 62. This is because, when other noise sources are not considered, both terminals of C<sub>1</sub> will eventually be grounded, so the only path to dissipate the "noise charges" accumulated on C<sub>1</sub> during the sampling phase is through C<sub>2</sub>. When referred to the input, the output noise voltage is divided by the closed-loop gain, so this component of the input referred noise PSD equals the noise voltage sampled on C<sub>1</sub> itself.

That is,

$$\overline{V_{in,n1}^2} = \overline{\Delta V_{C1}^2}$$
 Eqn. 38



Fig. 62 The noise sources of the SC-PGA in amplification phase

Second, the input referred noise of  $U_1$ , and the thermal noise from transistor  $M_2$  present themselves directly at the input, so

$$\overline{V_{in,n2}^2} = \overline{V_{in,ref}^2} + \overline{V_{s2}^2}$$
 Eqn. 39

Third, the sampled noise voltage on  $C_2$  is reflected at the input by an attenuation factor determined by the capacitor ratio of  $C_2$  to  $C_1$ . This part of input referred noise PSD is

$$\overline{V_{in,n3}^2} = \overline{\Delta V_{C2}^2} (\frac{C_2}{C_1 + C_2})^2$$
 Eqn. 40

Because  $\frac{C_2}{C_1 + C_2}$  is close to zero, this part can be neglected. The thermal noise created by  $M_6$ ,  $\overline{V_{s6}^2}$ , does not present itself at the input because of the continuous adjustment made by the op amp through the feedback loop.

The total input referred noise power spectral density is the combination of Eqn. 38 and Eqn. 39, together with the bandwidth limit imposed by the op amp  $U_1$  in its closed feedback loop. It can be expressed as

$$\overline{V_{in,tot}^2} \approx \frac{4kTR_{on1}}{\left[1 + \left(\frac{f}{f_{Ron1,C1}}\right)^2\right]\left[1 + \left(\frac{f}{f_{samp}}\right)^2\right]} + \overline{V_{in,ref}^2(f)} + \frac{4kTR_{on2}}{1 + \left(\frac{f}{f_{amplf}}\right)^2}$$
Eqn. 41

In the equation,  $f_{Ron1,C1}$  is the -3 dB bandwidth due to resistor R<sub>on1</sub> (W/L = 1.2 µm/0.35 µm, 2.6 k$\Omega$ on-resistance) and C<sub>1</sub> (1.5 pF) in the sampling phase, which is 40.8 MHz.  $f_{samp}$  is the bandwidth of the stage 1 amplifier in sampling mode, which equals 54 MHz.  $f_{amplf}$  is the bandwidth of the stage 1 amplifier in amplification mode; it is  $f_{samp}$  divided by the closed-loop gain of 300, which is 180 kHz.  $f_{samp}$  and  $f_{amplf}$  are included because, before appearing at the output of stage 1, the resistor noise from R<sub>on1</sub> and R<sub>on2</sub> is filtered to the bandwidths  $f_{samp}$  and  $f_{amplf}$  by the band-limiting nature of the closed loop around U<sub>1</sub> in sampling mode and amplification mode, respectively. Because  $f_{Ron1,C1}$  and  $f_{samp}$  are very close, Eqn. 41 can be approximated by Eqn. 42

$$\overline{V_{in,tot}^2} \approx \frac{4kTR_{on1}}{\left[1 + \left(\frac{f}{f_{Ron1,C1}}\right)^2\right]^2} + \overline{V_{in,ref}^2(f)} + \frac{4kTR_{on2}}{1 + \left(\frac{f}{f_{amplf}}\right)^2}$$
Eqn. 42
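The two corner frequencies used in Eqn. 41 and Eqn. 42 follow directly from the component values given in the text. The sketch below assumes a single-pole RC response for the sampling switch and a bandwidth that scales inversely with the closed-loop gain; the variable names are hypothetical.

```python
import math

R_ON1 = 2.6e3        # on-resistance of M1 from the text, in ohms
C1 = 1.5e-12         # sampling capacitor C1 from the text, in farads
F_SAMP = 54e6        # stage 1 bandwidth in sampling mode, in Hz
CLOSED_LOOP_GAIN = 300

f_ron1_c1 = 1.0 / (2 * math.pi * R_ON1 * C1)   # single-pole RC corner
f_amplf = F_SAMP / CLOSED_LOOP_GAIN            # bandwidth divided by the gain

print(f"f_Ron1,C1 = {f_ron1_c1 / 1e6:.1f} MHz")
print(f"f_amplf   = {f_amplf / 1e3:.0f} kHz")
```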

Integrating Eqn. 42 in the bandwidth of interest, BW, and then taking the square root, the total input referred Root-Mean-Square (RMS) noise voltage of the PGA is

$$V_{in,rms} \approx \sqrt{V_{in,rms,U1}^{2} + \frac{2kT}{\pi C_{1}} \left( \frac{1}{2} \tan^{-1}\left(\frac{BW}{f_{Ron1,C1}}\right) + \frac{\frac{BW}{f_{Ron1,C1}}}{2\left(\frac{BW^{2}}{f_{Ron1,C1}^{2}} + 1\right)} \right) + 4kTR_{on2}f_{amplf} \tan^{-1}\left(\frac{BW}{f_{amplf}}\right)}$$
Eqn. 43

where  $V_{in,rms,U1}$  is the RMS input noise of op amp U<sub>1</sub> over BW. The insight from Eqn. 43 is threefold. First, C<sub>1</sub> should be made large to reduce the  $\frac{kT}{C}$  noise. Second, the op amp U<sub>1</sub> should be designed for low noise to minimize its input referred noise. Third, the aspect ratio of transistor M<sub>2</sub> should be large to decrease its thermal noise over BW.
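As a worked step, the arctangent terms in Eqn. 43 come from two standard integrals, applied with  $x = f/f_{Ron1,C1}$  for the first term of Eqn. 42 and  $x = f/f_{amplf}$  for the last term, each evaluated up to  $X = BW/f_{Ron1,C1}$  or  $X = BW/f_{amplf}$ :

$$\int_0^{X}\frac{dx}{(1+x^2)^2}=\frac{1}{2}\tan^{-1}X+\frac{X}{2(X^2+1)},\qquad \int_0^{X}\frac{dx}{1+x^2}=\tan^{-1}X$$

The prefactor of the first term follows from  $f_{Ron1,C1} = \frac{1}{2\pi R_{on1} C_1}$ , which gives

$$4kTR_{on1}f_{Ron1,C1} = \frac{2kT}{\pi C_1}$$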

In the design, the RMS input noise of op amp U<sub>1</sub> (over a 54 MHz bandwidth) was 50 $\mu$V. The aspect ratio of transistor M<sub>2</sub> was made 12  $\mu$m/0.35  $\mu$m, giving a channel resistance, R<sub>on2</sub>, of 126  $\Omega$ .  $f_{Ron1,C1}$  is 40.8 MHz and  $f_{amplf}$  is 180 kHz. Substituting all values into Eqn. 43, the total input referred RMS noise voltage of the amplifier is 223  $\mu$V. This value limits the signal-to-noise ratio (SNR) at the input of the succeeding ADC for a given signal power from the pixel. In calculating the input referred RMS noise of the op amp U<sub>1</sub>, only the thermal noise was considered, because the 1/f noise (with a corner frequency from 500 kHz to 1 MHz) is effectively reduced by the offset cancellation inherent in the switched-capacitor circuit[81]. The integration bandwidth was chosen as the unity-gain bandwidth of the op amp U<sub>1</sub>, which is 53.7 MHz.

The ADC was designed with a resolution of 10 bits, so the output SNR of the PGA should satisfy

$$\frac{SNR - 1.76}{6.02} > 10$$
 Eqn. 44

so that after amplification the signal still retains a sufficient number of bits, as an equivalent measure of signal-to-noise ratio, for the ADC to digitize. The output SNR of the PGA can be calculated from the RMS input noise voltage. Assuming that the noise introduced by the pixel is negligible to a first-order approximation, the output SNR of the PGA is

$$SNR = 20\log_{10}(\frac{V_{pout}}{V_{in,rms}})$$
 Eqn. 45

where  $V_{pout}$  is the small-signal RMS voltage at the output of the pixel sensor. For the SNR in Eqn. 45 to satisfy Eqn. 44, the small-signal RMS voltage at the output of the sensor should be larger than 180 mV. Fig. 63 plots how the effective number of bits (ENOB) of the signal seen by the 10-bit ADC varies as the signal power from the pixel sensor increases. It also shows how a smaller input referred RMS noise voltage of the PGA affects the curve. Reducing the PGA input referred RMS noise by a factor of ~3 reduces the signal power required from the pixel sensor by a factor of ~3 for the same 10-bit ENOB seen by the ADC.



Fig. 63 The effective number of bits (ENOB) of the signal at the output of the amplifier vs. pixel output for different input referred RMS noise of the PGA

The insight from Fig. 63 is that the ENOB of the signal seen by the ADC remains below 14 bits as the pixel output RMS voltage reaches 1 V. In practice this value depends on the type of chemical and the concentration of the liquid sample. A 10-bit ADC guarantees that as long as the pixel output RMS voltage is close to 180 mV, the digitized signal has approximately 10-bit accuracy. A weaker fluorescence signal from the excited chemical sample reduces the output amplitude of the pixel sensor, and the ENOB seen by the ADC becomes less than 10 bits. This makes the 10-bit ADC "over-qualified" to digitize the signal. However, averaging can be used to reduce the noise effectively. Averaging 10,000 uncorrelated measurements reduces the input referred RMS noise voltage to only 2.23  $\mu$V. Accordingly, the active pixel sensor only needs to output an RMS voltage of 1.8 mV.
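The effect of averaging can be sketched with the usual $1/\sqrt{N}$ scaling of uncorrelated noise. The function name is hypothetical, and the sketch assumes the measurements are fully uncorrelated, as the text does.

```python
import math


def averaged_rms_noise(rms_single, n_averages):
    """RMS noise after averaging n uncorrelated measurements."""
    return rms_single / math.sqrt(n_averages)


V_NOISE = 223e-6        # single-shot input referred RMS noise from the text (V)
V_PIXEL_MIN = 180e-3    # single-shot pixel output quoted for ~10 bits (V)

n = 10_000
scale = math.sqrt(n)    # both the noise and the required signal scale by this
print(f"noise after averaging: {averaged_rms_noise(V_NOISE, n) * 1e6:.2f} uV")
print(f"required pixel output: {V_PIXEL_MIN / scale * 1e3:.1f} mV")
```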

# 4.3.2.3 Layout and post-layout simulation

The layout of an entire PGA channel measures 24  $\mu$m × 660  $\mu$m, and 44% of the area is occupied by poly-poly capacitors. The parasitic parameters, including resistance, capacitance, and inductance, were extracted from a single-channel layout, and the resulting netlist was used for the post-layout simulation of the PGA. The test bench is shown in Fig. 64. Transistors M<sub>1</sub>, M<sub>2</sub>, and M<sub>3</sub> are direct copies of the source follower transistors in a pixel and of the column current source. In the pixel reset mode, the voltage applied to the cathode of the diode is 3.3 V, so the same voltage is applied to the gate of M<sub>1</sub> in simulation. Switch M<sub>2</sub> is turned on by applying 3.3 V to its gate. A 10  $\mu$A current is drained by the current source transistor M<sub>3</sub>. This leaves a 2.022 V DC bias at the inverting input of the PGA. The voltage at the non-inverting input was set to V<sub>offset</sub>+ $\Delta$ V<sub>in</sub>, where V<sub>offset</sub> is the offset voltage, equal to 2.022 V in this case, and  $\Delta$ V<sub>in</sub> is the incremental voltage that increases linearly in simulation. For each differential input, the PGA samples the input in 1  $\mu$s and amplifies it in 4  $\mu$s.



Fig. 64 The post-layout simulation test bench for a single channel PGA

Fig. 65 shows the simulated transfer function of the PGA when different input capacitors were connected at stage 3. For example, when  $D<3:0>=0001_2$ , only the 30 fF input capacitor,  $C_{30}$ , in Fig. 58 was used, and the simulated gain was 62 dB. The simulated gains for the remaining configurations were 83 dB, 89 dB, and 93 dB. They all matched the hand calculation results.



Fig. 65 The post-layout simulation result for a single channel PGA matched the hand-calculation results.

# 4.3.2.4 Post-silicon measurement results and discussion

The PGA channel in the middle of the column-parallel layout was measured using the same principle as in the post-layout simulation, but special testing techniques were needed to achieve fine-resolution input control. As can be observed from Fig. 65, testing the high-gain configurations requires the PGA's non-inverting input to increase in steps of ~ $\mu$V. Commercial high-resolution DACs (>16 bits) that produce such small voltage steps are expensive and suffer from large non-linearity. Two lower-resolution DACs (12 bits) were therefore used to construct a coarse-fine circuit that provides linear  $\mu$V-level resolution, as shown in Fig. 66. Two resistors serve as a voltage divider that weights the voltage contributions from the coarse DAC and the fine DAC, both of which are 12-bit accurate.

The voltage at the op amp output with respect to the two DAC outputs is



Fig. 66 The schematic of the dual-12-bit coarse-fine DAC

The least significant bit voltage of each 12-bit DAC is 806  $\mu$V, given the 3.3 V reference rail. This makes the coarse voltage step 798  $\mu$V and the fine voltage step 8  $\mu$V.
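The step sizes above can be reproduced with a short calculation. The divider weights of 0.99 and 0.01 are an assumption inferred from the quoted 798 µV and 8 µV steps; the resistor values are not given here, and the function models an idealized weighted sum without offset or loading.

```python
V_REF = 3.3                      # DAC reference rail from the text, in volts
BITS = 12
W_COARSE, W_FINE = 0.99, 0.01    # assumed divider weights (inferred, see above)

lsb = V_REF / 2 ** BITS          # one code step of either 12-bit DAC


def coarse_fine_output(code_coarse, code_fine):
    """Idealized weighted sum of the two DAC outputs, in volts."""
    return W_COARSE * code_coarse * lsb + W_FINE * code_fine * lsb


print(f"LSB         = {lsb * 1e6:.0f} uV")
print(f"coarse step = {W_COARSE * lsb * 1e6:.0f} uV")
print(f"fine step   = {W_FINE * lsb * 1e6:.0f} uV")
```

With these weights, one coarse code spans roughly 99 fine codes, so the fine DAC can interpolate between adjacent coarse levels.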



Fig. 67 The lab test bench to measure the single channel PGA gain

The coarse-fine DAC was programmed by an MCU through SPI. Its output was fed into the non-inverting input of the PGA, as shown in Fig. 67. The inverting input of the PGA was connected to a pixel on the chip. The pixel was reset by a 60 ns-wide 200 kHz reset pulse, but was shielded from light. The PGA operated at the same frequency, with a 1 µs sampling phase and a 4 µs amplification phase, the same as in the post-layout simulation. A 10-bit CPOSSAR ADC digitized the PGA output at the end of each cycle. Data was transmitted to a PC through the microcontroller memory interface.

The measured transfer functions of the PGA channel under different gain configurations are plotted in Fig. 68. Because the conversion range of the CPOSSAR ADC was from 450 mV to 1800 mV, PGA output lower than 450 mV was distorted. The measured gain values were 44 dB, 48 dB, 50 dB, and 56 dB. Each measurement was performed 10,000 times to reduce the input referred RMS noise voltage of the PGA to  $2.23 \mu$V.



Fig. 68 The measured transfer function of a PGA channel

The measured gain of the PGA was lower than the post-layout simulation results by  $18 \sim 39$  dB, depending on the gain configuration. Two mechanisms of gain reduction were identified. The first is the parasitic capacitance on C<sub>2</sub> and C<sub>4</sub> (Fig. 58), which was underestimated when the PGA was simulated using only a single-channel layout. The second is the clock skew caused by the top-level place and route. Both mechanisms were studied with simulation test benches more comprehensive than the single-channel post-layout simulations performed in the design phase.

Before the more complete post-layout simulations using the array layout were performed to include the parasitic effects due to channel coupling, the simulation of a single-channel PGA layout with  $C_{30}$  connected in stage 3 is shown in Fig. 69 as a reference. The transfer function of stage 1 is plotted at the top, and the transfer function of the entire amplifier is plotted at the bottom. The gain of stage 1 is called  $G_1$ , and the gain of the entire PGA is called  $G_{tot}$ .  $G_{tot}$  was divided by  $G_1$  to calculate the gain of stage 3, called  $G_3$ . Clock skew due to top-level place and route was not included in this simulation, so that only the parasitic effect of the single-channel layout itself was evaluated.



Fig. 69 Post-layout simulation result on a single-channel PGA layout without clock skew

Fig. 70 presents the simulation results when using the 32-channel PGA array layout. It is observed that  $G_1$  is reduced from 232 to 127, and  $G_3$  is reduced from 4.8 to 2.1. Because the parasitic parameters are the only difference between the two types of post-layout simulation, the discrepancy means that the parasitic effects change when the entire array layout, which is closer to the fabricated chip, is used.



Fig. 70 Post-layout simulation result on a single-channel PGA in 32-channel PGA array without clock skew

The parasitic capacitance created in the feedback paths of both stage 1 and stage 3 affects the actual gain of the PGA. The model that includes all parasitic capacitors in stage 1 is presented in Fig. 71.  $C_{p1}$  and  $C_{p2}$  are the parasitic capacitors in parallel with  $C_1$  and  $C_2$ .  $C_{p3}$  is the parasitic capacitor from the inverting input of the op amp to ground. In the sampling phase,  $C_1$  and  $C_{p1}$  track the pixel output by storing a charge  $Q_1$ . The inverting input of the op amp is at virtual ground, so no charge is stored on  $C_{p3}$ .



Fig. 71 The parasitic capacitance model of stage 1 when a single-channel layout is placed in the array

Starting from the amplification phase,  $C_1$  and  $C_{p1}$  are grounded.  $Q_1$  is conducted to ground, and the same amount of charge,  $(Q_2+Q_3)$ , is pulled away from  $C_2$ ,  $C_{p2}$ , and  $C_{p3}$ . This creates an error voltage,  $V_{err}$ , at the differential input, and the output,  $V_{out}$ , equals  $A_0V_{err}$ , where  $A_0$  is the open-loop gain of the op amp. Eqn. 47 ~ Eqn. 51 are satisfied during this process

$$Q_1 = V_{in}(C_1 + C_{p1})$$
 Eqn. 47

$$Q_1 = Q_2 + Q_3$$
 Eqn. 48

$$V_{err} = \frac{-Q_3}{C_{p3}}$$
 Eqn. 49

$$V_{out} = A_0 V_{err}$$
 Eqn. 50

$$Q_2 = (C_2 + C_{p2}) \cdot (A_0 + 1) \cdot V_{err}$$
 Eqn. 51

The closed-loop gain when including all parasitic capacitance can be written as

$$A_{cl} = \frac{A_0}{(A_0 + 1) + \frac{C_{p3}}{C_2 + C_{p2}}} \cdot \frac{C_1 + C_{p1}}{C_2 + C_{p2}}$$
 Eqn. 52
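
As a worked step, Eqn. 52 follows from Eqn. 47 to Eqn. 51 when the charges are taken as magnitudes. Substituting Eqn. 47, Eqn. 49, and Eqn. 51 into Eqn. 48 gives

$$V_{in}(C_1 + C_{p1}) = V_{err}\left[(C_2 + C_{p2})(A_0 + 1) + C_{p3}\right]$$

and dividing  $V_{out} = A_0 V_{err}$  by  $V_{in}$  yields

$$A_{cl} = \frac{V_{out}}{V_{in}} = \frac{A_0 (C_1 + C_{p1})}{(C_2 + C_{p2})(A_0 + 1) + C_{p3}}$$

which is Eqn. 52 after  $(C_2 + C_{p2})$  is factored out of the denominator.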

The first term in Eqn. 52 is less affected by the parasitic effect, because the open-loop gain of the op amp was ~80 dB in post-layout simulation. If the ratio of  $C_{p3}$  to  $(C_2+C_{p2})$  is 100, it will only create a 1% error. In that case,  $C_{p3}$  needs to be 0.5 pF, 100 times larger than  $C_2$  (5 fF), if  $C_{p2}$  equals 0. A non-zero  $C_{p2}$  requires an even larger  $C_{p3}$ . A parasitic capacitance this large, approaching the picofarad scale, is unlikely to occur.

The dominant reason for the closed-loop gain reduction lies in the second term in Eqn. 52. Because  $C_1$  is 1.5 pF but  $C_2$  is only 5 fF, a small  $C_{p2}$  on the femtofarad scale affects the gain significantly. By checking the netlist created by the parasitic extraction of the PGA array layout, the parasitic capacitor  $C_{p2}$  was found to be 4.1 fF, and the parasitic capacitor in parallel with  $C_4$ , called  $C_{p4}$ , was found to be 6.4 fF. Both were less than 1 fF in the parasitic extraction of the single-channel PGA layout.
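The sensitivity of Eqn. 52 to  $C_{p2}$  can be illustrated numerically. In the sketch below,  $C_{p1}$  and  $C_{p3}$  are set to zero as a simplifying assumption and the open-loop gain is taken as exactly 80 dB, so the result indicates the trend rather than reproducing the simulated gains; the function name is hypothetical.

```python
def closed_loop_gain(a0, c1, c2, cp1=0.0, cp2=0.0, cp3=0.0):
    """Eqn. 52: closed-loop gain including parasitic capacitors."""
    c_fb = c2 + cp2                       # effective feedback capacitance
    return a0 / ((a0 + 1.0) + cp3 / c_fb) * (c1 + cp1) / c_fb


A0 = 1e4                     # 80 dB open-loop gain in V/V
C1, C2 = 1.5e-12, 5e-15      # stage 1 capacitors from the text, in farads

ideal = closed_loop_gain(A0, C1, C2)
array = closed_loop_gain(A0, C1, C2, cp2=4.1e-15)   # extracted C_p2 only
print(f"no parasitics: {ideal:.1f}")
print(f"C_p2 = 4.1 fF: {array:.1f} "
      f"({array / ideal:.0%} of the parasitic-free value)")
```

A  $C_{p2}$  comparable to  $C_2$  roughly halves the second term of Eqn. 52, whereas a  $C_{p3}$  of 0.5 pF changes the first term by only about 1%.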



Fig. 72 Illustration of the effect of clock skew on the gain of the SC-PGA

Besides the gain reduction due to the parasitic capacitance, the gain of stage 3 is reduced by clock skew. As shown in Fig. 72, in the post-layout simulation of a single-channel PGA layout, the gates of switches M<sub>4</sub> and M<sub>12</sub> were ideally connected to the same clock,  $\Phi$ 1, so the falling edge of  $\Phi$ 1 arrives at both switches concurrently. This allows both stage 1 and stage 3 to leave their sampling phase synchronously. However, due to the top-level place and route, there was a 9 µm by 600 µm metal-2 connection between the gates of M<sub>4</sub> and M<sub>12</sub>. This layer was sandwiched between grounded metal-1 and metal-3 shielding layers. As shown in Fig. 72, this lossy transmission line creates a 4.6 ns delay according to the post-layout simulation, so switch M<sub>12</sub> is opened after M<sub>4</sub> by that amount. Because stage 1 enters its amplification phase only 300 ps after M<sub>4</sub> is opened, M<sub>12</sub> is still closed for most of the 4.6 ns delay caused by the metal-2 connection. Instead of being accumulated on  $C_4$ , some of the charge is bypassed through  $M_{12}$  while stage 1 is amplifying the signal. Because less charge is transferred to  $C_4$ , the signal amplified by stage 3 is reduced, which reduces the total gain of the PGA.

To quantify the effect of clock skew on the gain loss, the top-level clock connection was incorporated into the PGA array layout. After parasitic extraction, a new simulation was performed to find the gain of stage 1, stage 3, and the entire PGA. The results are shown in Fig. 73. The gain of stage 1 remained 126, which agreed with Fig. 70, but the gain of stage 3 was reduced from 2.1 to 1.5. That means about 28 % of the charge resulting from the amplification in stage 1 leaked through $M_{12}$ rather than being received by $C_4$ and $C_{p4}$.



Fig. 73 Post-layout simulation results for a single-channel PGA in the 32-channel PGA array with clock skew included

All of the above simulation and analysis were made with only $C_{30}$ connected at the input of stage 3, corresponding to D<3:0>=0001<sub>2</sub>. The remaining gain settings were also simulated for comparison with their measurement results. Table 3 summarizes the post-layout simulated gains and the measured gains. For each gain setting, three types of layouts were used: "Single" uses a single-channel PGA layout without the top-level clock routing; "Array" simulates the PGA array layout without the top-level clock routing; "Array + clock skew" adds the top-level clock routing to include the clock skew. The results first confirm that with the entire array layout, the gains of both stage 1 and stage 3 were reduced by approximately 50 % relative to the single-channel layout, suggesting that the feedback factors set by the capacitor ratios were roughly doubled by parasitic capacitance at the femtofarad level. When the clock skew was introduced, the gain of stage 1 was unchanged, but the gain of stage 3 dropped to as little as ~12 % of its value without skew. The array layout with clock skew, which is closest to the fabricated chip, produced simulated total gains of 45 dB, 56 dB, 58 dB, and 62 dB. These are close to the measured gains of 44 dB, 48 dB, 50 dB, and 56 dB.

| Gain setting             |         | Single | Array | Array + clock skew | Measured |
|--------------------------|---------|--------|-------|--------------------|----------|
| D<3:0>=0001 <sub>2</sub> | Stage 1 | 232    | 127   | 126                | -        |
|                          | Stage 3 | 4.8    | 2.1   | 1.5                | -        |
|                          | Total   | 61 dB  | 48 dB | 45 dB              | 44 dB    |
| D<3:0>=0011 <sub>2</sub> | Stage 1 | 242    | 118   | 118                | -        |
|                          | Stage 3 | 60     | 31    | 5.3                | -        |
|                          | Total   | 83 dB  | 71 dB | 56 dB              | 48 dB    |
| D<3:0>=0111 <sub>2</sub> | Stage 1 | 240    | 118   | 118                | -        |
|                          | Stage 3 | 109    | 60    | 7.0                | -        |
|                          | Total   | 88 dB  | 77 dB | 58 dB              | 50 dB    |
| D<3:0>=1111 <sub>2</sub> | Stage 1 | 240    | 120   | 120                | -        |
|                          | Stage 3 | 158    | 86    | 10.2               | -        |
|                          | Total   | 92 dB  | 80 dB | 62 dB              | 56 dB    |

Table 3 Summary of post-layout simulated gains and measured gains for all gain settings
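The totals in Table 3 can be cross-checked from the per-stage gains. The sketch below assumes that the total gain is simply the product of the stage 1 and stage 3 gains, which reproduces the tabulated totals to within about 1 dB (the stage gains in the table are rounded). Function and variable names are illustrative only.

```python
import math

# (stage 1 gain, stage 3 gain, tabulated total in dB), copied from Table 3.
TABLE_3 = {
    "0001": {"Single": (232, 4.8, 61), "Array": (127, 2.1, 48), "Array + clock skew": (126, 1.5, 45)},
    "0011": {"Single": (242, 60, 83), "Array": (118, 31, 71), "Array + clock skew": (118, 5.3, 56)},
    "0111": {"Single": (240, 109, 88), "Array": (118, 60, 77), "Array + clock skew": (118, 7.0, 58)},
    "1111": {"Single": (240, 158, 92), "Array": (120, 86, 80), "Array + clock skew": (120, 10.2, 62)},
}


def total_gain_db(stage1, stage3):
    """Total voltage gain in dB, assuming it is the product of the two stage gains."""
    return 20 * math.log10(stage1 * stage3)


def stage3_retention(setting):
    """Fraction of the stage 3 gain that remains once clock skew is included."""
    array_gain = TABLE_3[setting]["Array"][1]
    skew_gain = TABLE_3[setting]["Array + clock skew"][1]
    return skew_gain / array_gain


for setting, layouts in TABLE_3.items():
    for layout, (s1, s3, tabulated) in layouts.items():
        print(f"D<3:0>={setting} {layout}: {total_gain_db(s1, s3):.1f} dB (table: {tabulated} dB)")
    print(f"D<3:0>={setting} stage 3 gain retained with skew: {stage3_retention(setting):.0%}")
```

For D<3:0>=0001<sub>2</sub> the retained fraction is 1.5/2.1, so a little under 30 % of the stage 3 gain is lost to the skew; for the highest gain setting only about 12 % of the stage 3 gain remains.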

In conclusion, the first reason for the reduced amplifier gain is the use of femtofarad-level capacitors to realize the feedback factors. When these capacitors are disturbed by parasitic capacitance that is itself in the femtofarad range, the actual gain of the circuit deviates noticeably from the design value. This effect became dominant when many single-channel amplifier layouts were placed into an array layout at minimum spacing. Increasing the absolute capacitor values while maintaining their ratios would help, but the layout area would increase proportionally, making the 25 $\mu$m pixel-array pitch more difficult to realize. Building a column-parallel amplifier at that small pitch therefore requires more delicate layout to achieve more than ~60 dB of closed-loop gain. The second reason for the reduced gain is the clock skew caused by top-level place and route. This issue can be resolved by reversing the clock direction, so that switch M<sub>12</sub> is opened before M<sub>4</sub> and no charge produced by the amplification in stage 1 can leak through M<sub>12</sub>.

# 4.3.3 10-bit column-parallel overlapping-subrange SAR ADC (CPOSSAR ADC)

The output from the SC-PGA in each channel is digitized by a 10-bit, 22 µm-pitch column-parallel overlapping-subrange SAR ADC (CPOSSAR ADC). The first ADC tape-out with this architecture was developed and tested in 2012 [82]. Because of the large DNL error observed in measurement, it was re-implemented as a 9-bit ADC in 2013 by introducing the overlapping-subrange technique, adding autozeroing circuitry, and optimizing the digital logic [83]. Based on that work, a 10-bit ADC was built in 2014 by introducing an additional bit. This 10-bit ADC was used as the column ADC in the final imager chip.

### **4.3.3.1 CPOSSAR ADC architecture**

The CPOSSAR ADC array includes 32 identical ADC channels and a shared reference circuit, as shown in Fig. 74. Each ADC channel is 22 µm wide to support the 25 $\mu$m pixel format. The pitch size of a pixel is 20 $\mu$m, but the additional 5 $\mu$m space between two pixels is used for signal routing.



Fig. 74 The block diagram of the CPOSSAR ADC

In the architecture of the CPOSSAR ADC, a 4-bit resistor ladder DAC and its buffer bank are shared by all columns. This DAC evenly divides the reference rail from V<sub>resL</sub> to V<sub>resH</sub> into 16 subranges. Each ADC channel includes a 6-bit SAR ADC and a subranging circuit. The relation among the 4-bit resistor ladder, the subranging circuit, and the 6-bit SAR ADC is analogous to that among the main memory, the cache, and the central processing unit (CPU) in a computer system. The subranging circuit is transparent to the 6-bit ADC with regard to the resistor ladder. It senses all the subranges produced by the 4-bit resistor ladder, but selects only one pair of references to set the actual reference rail seen by the 6-bit SAR ADC. The advantage is that the data conversion can be divided into a coarse phase and a fine phase to achieve an overall resolution of more than 6 bits. The subranging circuit first passes the complete reference rail to the 6-bit ADC to resolve the MSBs. The result is used to control the subranging circuit to select the subrange in which the ADC input resides. The resolution achieved by the second phase is improved by a factor of $2^4$, because the reference rail seen by the 6-bit ADC is a factor of $2^4$ smaller than before.
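The coarse and fine reference rails described above can be sketched with an ideal ladder. The rail voltages used here (0.0 V to 1.6 V) and all function names are hypothetical and chosen only for illustration; the real V<sub>resL</sub> and V<sub>resH</sub> values are not given in this section, and the model ignores buffer offsets and the overlapping-subrange shift.

```python
def ladder_taps(v_res_l, v_res_h, bits=4):
    """Tap voltages of an ideal resistor ladder dividing the rail into 2**bits subranges."""
    n = 1 << bits
    step = (v_res_h - v_res_l) / n
    return [v_res_l + i * step for i in range(n + 1)]


def select_subrange(v_in, taps):
    """Return (index, low, high) of the ladder subrange that contains v_in."""
    n = len(taps) - 1
    step = (taps[-1] - taps[0]) / n
    index = min(max(int((v_in - taps[0]) / step), 0), n - 1)
    return index, taps[index], taps[index + 1]


def lsb_size(v_low, v_high, bits=6):
    """LSB size of a 6-bit conversion spanning the rail from v_low to v_high."""
    return (v_high - v_low) / (1 << bits)


taps = ladder_taps(0.0, 1.6)                 # hypothetical reference rail
index, low, high = select_subrange(0.73, taps)
coarse_lsb = lsb_size(taps[0], taps[-1])     # coarse phase uses the complete rail
fine_lsb = lsb_size(low, high)               # fine phase uses one subrange

print(f"subrange {index}: {low:.3f} V to {high:.3f} V")
print(f"coarse LSB / fine LSB = {coarse_lsb / fine_lsb:.1f}")
```

Because the fine phase spans one sixteenth of the rail with the same 6-bit converter, its LSB is 16 times smaller than the coarse LSB, which is the $2^4$ improvement stated in the text.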

The 6-bit ADC contains a pre-amplifier before the comparator. In sampling phase, switch S of the pre-amplifier is closed and the input voltage is tracked by the 6-bit capacitor array DAC, or 6-bit CDAC. When S is opened, charges are trapped on the capacitor of the CDAC.

In the coarse conversion phase, the subranging circuit is reset, and the complete reference rail, $V_{resL}$ to $V_{resH}$, is selected for the 6-bit ADC. The six most significant bits (MSBs), D[11:6], are resolved in this phase and stored in a 6-bit register.

The subranging circuit adjusts the reference rail seen by the 6-bit ADC in the subrange creation phase. This is realized by first feeding the upper 4 bits of the 6-bit register, D[11:8], into a 4-bit decoder that closes a pair of bias switches. The switches determine which subrange created by the 4-bit ladder is connected to the 6-bit ADC. The decoder that controls the switches is designed to guarantee that the selected subrange is the one in which the input voltage is located. Depending on the next bit of the 6-bit register, D[7], a level shifter either passes the selected subrange unchanged or shifts it to create an overlapping subrange for error correction.

The created subrange is then sensed by the 6-bit ADC and used to perform a second 6-bit conversion, the fine conversion. Because the reference rail in this conversion phase is a factor of $2^4$ smaller than in the coarse conversion phase, the resolution is improved by 4 bits. At the end of the fine conversion, the six LSBs, D[5:0], are kept in the 6-bit successive approximation register. Finally, the coarse result D[11:6] and the fine result D[5:0] are concatenated into the complete binary word D[11:0] and transmitted off the IC. Two bits of the 12-bit data, D[7:6], are redundant, because the overlapping subrange creates such redundancy to overcome errors that may occur during coarse conversion and subrange selection. These two bits are used to remove the error in software, so the overall resolution of the data conversion becomes 10 bits.

### **4.3.3.2** Overlapping subrange technique

The CPOSSAR ADC uses a 6-bit capacitor DAC twice to achieve a 10-bit resolution that is not available from a conventional 6-bit SAR ADC. Since the 6-bit capacitor array DAC is less than 10-bit ratio-accurate in real silicon, errors in the coarse conversion phase tend to cause an incorrect subrange to be selected for the fine conversion phase. In this case, the sampled input voltage exceeds the ceiling or floor of the reference rail selected for the fine conversion phase, and large differential non-linearity (DNL) errors are observed at the transitions between subranges.

The root of the above error is the mismatch between the ideal subrange voltages and the reference voltages created by the capacitive DAC during coarse conversion, due to capacitor mismatch. One possible way to eliminate the error is to implement a large capacitor array layout, so that the 6-bit capacitor array is more than 10-bit ratio-accurate. However, the lack of a mismatch model in the target technology makes it difficult to assess the minimum required area per unit capacitor for that objective. In addition, a large capacitor array is difficult to implement in a column layout as narrow as $22 \,\mu\text{m}$.

The second method to eliminate the error source, namely the input voltage exceeding the selected subrange rail, is to use overlapping subranges in the fine conversion phase, as shown in Fig. 75 (a) [83]. In this method, the original set of subranges is raised by an offset voltage, so that the new subranges overlap the old ones. To implement the method, the 5th bit resolved in the coarse conversion phase (the 5th bit stored in the 6-bit register), D[7], determines whether the un-raised or the raised subrange is used. If D[7] is 0, the input is located below the mid-point of the un-raised subrange, and that subrange is used directly in the fine conversion phase without level shifting. If D[7] is 1, the input is located above the mid-point of the un-raised subrange, and the rail is raised by an offset for use in the fine conversion. The offset is set to half the width of a normal subrange, so that each raised subrange straddles the boundary between two adjacent un-raised subranges.
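The role of the individual bit fields can be made concrete with a small decoding sketch. The software correction actually used is not spelled out in this section, so the merge rule below is a hypothetical, idealized one: it assumes each subrange spans exactly 64 fine LSBs, that a raised subrange starts exactly half a subrange (32 fine LSBs) higher, and it ignores any constant offset from the lowered ladder references. With these assumptions the result can slightly exceed 1023 at the top of the range.

```python
def split_fields(word):
    """Split a 12-bit output word D[11:0] into the fields named in the text."""
    if not 0 <= word < (1 << 12):
        raise ValueError("expected a 12-bit word")
    return {
        "subrange": (word >> 8) & 0xF,  # D[11:8], selects one of 16 subranges
        "raised": (word >> 7) & 0x1,    # D[7], 1 if the raised subrange was used
        "d6": (word >> 6) & 0x1,        # D[6], coarse LSB, redundant after the fine phase
        "fine": word & 0x3F,            # D[5:0], fine conversion result
    }


def reconstruct_code(word):
    """Hypothetical merge of the fields into one code, in units of fine LSBs."""
    fields = split_fields(word)
    # 64 fine LSBs per subrange; a raised subrange begins 32 fine LSBs higher.
    return fields["subrange"] * 64 + fields["raised"] * 32 + fields["fine"]


example_word = 0b0101_1_0_000011  # subrange 5, raised, D[6] = 0, fine code 3
print(split_fields(example_word))
print(reconstruct_code(example_word))
```

In this idealized model D[6] carries no additional information once the fine conversion is available, which is one way to read the statement that D[7:6] are redundant.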



Fig. 75 The overlapping subranges used to remove the error produced in coarse conversion cycle



Fig. 76 A level shifter circuit used to produce the redundant overlapping subranges

This method provides redundancy that guarantees the input voltage never exceeds the reference rail in the fine conversion phase. To make this method effective, the subrange voltages created by the 4-bit resistor DAC, $V_{ref(i)}$, were made lower than the ideal reference voltages created by the capacitor array DAC, $V_{cap\_ideal(i)}$, as shown in Fig. 75(b).

The overlapping subranges are generated by the level shifter circuit shown in Fig. 76. In the subrange creation phase, the bottom plate of capacitor $C_s$ is connected to Vbias through Switch B, a multiplexer controlled by $\Phi 2$ and D[7]. The un-raised reference voltage at the input, V<sub>in</sub>, is tracked by capacitor C<sub>s</sub> because transistor M<sub>1</sub> is closed. After the coarse conversion phase, the reference voltage is held on C<sub>s</sub> by opening switch M<sub>1</sub>. $M_2$ is made twice as wide as $M_1$ to partially cancel the channel charge injection. Then, if D[7] is 1, the bottom plate of $C_s$ is reconnected to a different voltage, Vbias+$\Delta V$. The voltage change at the bottom plate of $C_s$ is reflected to its top plate, because the top plate is floating. This voltage is buffered by the second op amp to create the raised subrange voltage, which is $\Delta V$ higher than before. When D[7] is 0, the logic maintains the state of switch B throughout the subrange creation phase, and the output of the level shifter stays at the sampled un-raised subrange voltage. As shown in Fig. 76, two clock signals separate the reference sampling and level shifting operations. They are named $\Phi 4$ and $\Phi 5$ to distinguish them from the timing signals used for the SC-PGA, $\Phi 1$, $\Phi 1B$, $\Phi 2$, and $\Phi 3$.

## 4.3.3.3 ADC I/O

The 12-bit data are loaded into a parallel-in-serial-out (PISO) register in each channel at the end of a complete conversion. The PISO registers are cascaded from one ADC channel to the next, so that data from all channels can be shifted off the chip serially, as shown in Fig. 77. The PISO register is in parallel mode when $\overline{PEN}$ is high. At the end of the data conversion, the controller logic issues a pulse at port "Load" to store D[11:0] in the register. When $\overline{PEN}$ is pulled low, each PISO register becomes a 12-bit shift register, and the data is read off the chip by the SPI module of a microcontroller.



Fig. 77 The block diagram of the serial data I/O of the CPOSSAR ADCs

The maximum transmission bandwidth of the I/O was tested by gating off the control logic, so that $\overline{PEN}$ stays high. The external MCU then sends serial data into the register chain and reads it back. The bit error rate (BER) was measured at 20 Mbps and 40 Mbps, as shown in Fig. 78. Limited by the SPI baud-rate range of the MCU, the maximum data transmission speed with zero BER is 20 Mbps. According to the measurement, a total of 19.2 µs is needed to transmit the complete data set off the chip.
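The 19.2 µs figure is consistent with shifting one 12-bit word from each of the 32 cascaded channels at 20 Mbps. A minimal arithmetic check, assuming no protocol overhead between words (the function name is illustrative):

```python
CHANNELS = 32           # ADC channels in the cascaded PISO chain
BITS_PER_CHANNEL = 12   # D[11:0] per channel


def readout_time_s(bit_rate_bps, channels=CHANNELS, bits=BITS_PER_CHANNEL):
    """Time to shift the whole register chain off the chip, ignoring any overhead."""
    return channels * bits / bit_rate_bps


print(f"{readout_time_s(20e6) * 1e6:.1f} us at 20 Mbps")
```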



Fig. 78 The measured bit error rate of the ADC I/O at two different transmission bandwidths

# 4.3.3.4 10-bit CPOSSAR ADC measurement

To test an individual ADC channel, its input was multiplexed to a pad connected to an Agilent 33250A arbitrary waveform generator (AWG), as shown in Fig. 79. The on-chip control logic was configured through software to produce periodic data conversion clock signals for continuous sampling. This was realized by first breaking a loop in the on-chip timing controller via a switch controlled by "ADC\_TEST\_EN". Second, the broken signal chain is repaired by feeding a substitute signal, "EXT\_TRIG", from the MCU into the break point. This signal is generated by sending two reference clock signals, " $\overline{\Phi 1}$ " and "ADC\_IDLE", from the chip to the MCU, and using the MCU's interrupt function to produce the required signal with timing constraints set in C code.



Fig. 79 The test bench to measure an individual ADC channel

Once the ADC input was connected to an external source and the data conversion clocks were continuously generated by the on-chip controller, the ADC operated in continuous conversion mode. The data produced by the ADC are loaded into its I/O register at the end of each conversion cycle. The MCU then reads the data through SPI and stores it in its memory. A 10 MHz system master clock was provided to the chip by the MCU's SPI module operating in continuous frame mode. The pre-scaler of the ADC controller then reduces the clock frequency to 1.25 MHz. This ADC clock was used to synthesize all data conversion clocks, resulting in a 4.883 kS/s sampling rate.

A 4.768554 Hz sinusoidal signal was used as the ADC input. Its peak-to-peak amplitude was 1350 mV and its offset voltage was 1125 mV. A total of 131,072 samples were digitized by the ADC to create the histogram. The calculated DNL and INL are presented in Fig. 80. The minimum DNL of -0.7 LSB confirmed that there were no missing codes, and the maximum DNL was 1.7 LSB. The measured INL was +9.6/-9.7 LSB; comparable results have been reported for ADCs used in other CMOS imagers [84], [85].
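The text does not state which estimator was used to turn the histogram into DNL and INL. A common choice is the cumulative sine-wave histogram method, sketched below as an assumption rather than as the procedure actually followed. The demonstration uses a simulated ideal 6-bit quantizer (not measurement data), a sine wave that slightly overdrives the input range, and coherent sampling with a hypothetical cycle count; all names are illustrative.

```python
import math


def sine_histogram_dnl_inl(codes, n_codes):
    """Estimate DNL and INL (in LSB) of the inner codes from sine-wave samples."""
    hist = [0] * n_codes
    for code in codes:
        hist[code] += 1
    total = len(codes)

    # Transition level above code k, on a scale normalized to the sine amplitude.
    transitions = []
    cumulative = 0
    for k in range(n_codes - 1):
        cumulative += hist[k]
        transitions.append(-math.cos(math.pi * cumulative / total))

    # Widths of codes 1 .. n_codes-2; the two end codes have no outer edge.
    widths = [hi - lo for lo, hi in zip(transitions, transitions[1:])]
    mean_width = sum(widths) / len(widths)
    dnl = [w / mean_width - 1 for w in widths]

    inl, running = [], 0.0
    for d in dnl:
        running += d
        inl.append(running)
    return dnl, inl


def ideal_adc(v, n_codes):
    """Ideal quantizer for inputs in -1..+1 that clips at the end codes."""
    code = math.floor((v + 1) / 2 * n_codes)
    return min(max(code, 0), n_codes - 1)


N_CODES = 64          # simulated 6-bit converter, for illustration only
N_SAMPLES = 131072    # same record length as in the measurement
CYCLES = 1021         # hypothetical odd cycle count for coherent sampling
AMPLITUDE = 1.05      # slight overdrive so every code is exercised

codes = [
    ideal_adc(AMPLITUDE * math.sin(2 * math.pi * CYCLES * i / N_SAMPLES), N_CODES)
    for i in range(N_SAMPLES)
]
dnl, inl = sine_histogram_dnl_inl(codes, N_CODES)
print(f"max |DNL| = {max(abs(d) for d in dnl):.4f} LSB")
print(f"max |INL| = {max(abs(x) for x in inl):.4f} LSB")
```

For the ideal quantizer the estimated DNL and INL stay close to zero, which is a useful sanity check before applying the same routine to measured codes. A DNL value of -1 LSB for any code would indicate a missing code, which is why a minimum DNL of -0.7 LSB rules missing codes out.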



Fig. 80 The measured DNL and INL on the single channel CPOSSAR ADC

## 4.3.4 Timing controller

The imager IC is equipped with an on-chip timing controller that produces timing signals and off-chip communication signals. The custom design is based on a synchronous pipeline topology. It contains internal registers, so that the duty cycle of some critical timing signals can be reconfigured in software. The controller architecture is divided into two levels, a top level and a lower level. The APS array and the SC-PGAs are controlled by the top-level controller to perform photoelectric conversion, sampling, and amplification. The top-level controller also performs the data transmission and the handshake with an off-chip MCU. The off-chip MCU is necessary because it serves as the hardware interface between the personal computer user interface and the imager IC. The lower-level logic stays in idle mode to save power. It is gated on by the top-level logic to operate the CPOSSAR ADC for analog-to-digital conversion.

## 4.3.4.1 Top-level control logic



Fig. 81 The finite state machine of the full custom top-level controller circuit

The top-level logic controller is a finite state machine, shown in the blocked area of Fig. 81. When the imager IC is powered up, the controller is reset by the MCU, which sends a logic-high "Global\_reset" signal. This gates off the clocking circuits inside the controller and resets all state registers and programming registers.

In MATLAB script 1, users can define various configuration data. These include the cycle number of the pixel integration window, the cycle number of the amplification phase, the cycle number of the auto-zeroing phase, the number of measurements, the delay line input, and the row address of the pixel array. A cycle number defines the number of system master clock periods counted by a particular timer. Once that amount of time has elapsed, a corresponding logic state is toggled to start or stop a function. The number of measurements defines the total amount of data to be collected for a particular gating window. After these configuration data are defined, MATLAB executes the script to configure the controller. This is realized by holding the controller in "Idle & Config" and shifting the configuration data into the registers inside the logic controller, as well as into the pixel array row decoder.

MATLAB script 2 defines the measurement control and data transmission protocol. Once it is executed by MATLAB, the MCU sends a low-state "Global\_reset" to the chip controller to gate on the internal clocking circuit. The logic is first synchronized by "Reset\_pulse" produced by the gating controller. In the first measurement cycle, the selected pixels are reset by "Pixel\_reset", which is synchronized with "Reset\_pulse". Then, at the rising edge of "Pixel\_reset", the pixels start to convert the received photons into voltages. This process lasts for the number of clock cycles defined in the integration timer register. When the timer overflows, the voltages at the pixel outputs are amplified by the PGAs for the number of clock cycles programmed in the amplification timer register.

The end of the amplification triggers the data conversion. The data generated by the conversion is read by the MCU. In MATLAB, a counter called "meas\_count" counts the number of measurements that have been performed. The value of this counter is compared with the pre-defined number of measurement cycles. If that number has not been reached, the controller stays in released mode, and the same measurement process continues. Otherwise, MATLAB notifies the MCU to gate off the controller by issuing a high state of "Global\_reset". This moves the controller finite state machine into the "Idle & Config" state once again. After that, the imager IC can be re-configured for a new set of measurements.



Fig. 82 The typical output waveform of the top-level controller for a single measurement

The typical waveform of the top-level controller is shown in Fig. 82. Notice that one complete measurement includes two data conversion cycles. The first conversion occurs immediately after the pixel is reset to record the actual reset voltage. The second conversion is performed after the fluorescence is integrated by the pixel in the integration window, and is further amplified by the PGA. Both conversion results are shifted off the chip, but only the difference is used as the measurement value for that delay step.
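The measurement flow described above, with two conversions per measurement and a host-side counter, can be sketched as a simple loop. This is only a behavioral illustration: the step names other than "Idle & Config", the `convert` callback, and the sign convention of the difference (reset code minus signal code) are assumptions, and on the real system the codes arrive over SPI rather than from a Python function.

```python
def run_measurements(n_measurements, convert):
    """Mimic the host-side loop: count measurements and keep one difference per measurement.

    `convert(phase)` is a hypothetical callback that returns an ADC code for
    the phase "reset" or "signal".
    """
    trace = ["Idle & Config", "release"]      # Global_reset pulled low
    results = []
    meas_count = 0
    while meas_count < n_measurements:
        trace.append("pixel reset")
        reset_code = convert("reset")         # first conversion, right after reset
        trace.append("integrate")
        trace.append("amplify")
        signal_code = convert("signal")       # second conversion, after integration
        trace.append("shift out")
        results.append(reset_code - signal_code)  # sign convention is an assumption
        meas_count += 1
    trace.append("Idle & Config")             # Global_reset pulled high again
    return results, trace


def fake_convert(phase):
    """Stand-in for the ADC with made-up codes, used only to exercise the loop."""
    return 600 if phase == "reset" else 450


values, steps = run_measurements(2, fake_convert)
print(values)
print(steps)
```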

# 4.3.4.2 Lower-level control logic



Fig. 83 The finite state machine of the lower-level controller

The lower-level controller is the secondary finite state machine presented in Fig. 83. This controller is also configured in the "Idle & Config" state while 'Global\_reset' from the MCU is held high. When the top-level controller is released and the integration window timer has overflowed, autozeroing clocks are produced by the lower-level controller to remove the offset in the ADC reference buffers. When the amplification phase is terminated by the top-level controller, an 'ADC\_trigger' signal is sent to the lower-level controller to start the data conversion. This gates on the clocking circuits in the lower-level controller to produce the data-conversion timing signals for the ADC. A complete data conversion goes through 5 phases: coarse conversion, subrange creation, 2nd autozeroing, fine conversion, and data load. The lower-level logic then returns to the idle state and waits for the next trigger signal from the top-level logic. An example of the lower-level controller timing waveform is presented in Fig. 84.



Fig. 84 The typical timing waveform of the lower-level control logic

# 4.3.4.3 Design and testing on timing controller



Fig. 85 The synchronous pipeline units used in the controller design

The timing control logic was a full custom design. The controller is divided into many cell circuits. Each cell follows the synchronous pipeline architecture shown in Fig. 85 (a). The sequential logic stores and updates the state, which is received by the succeeding combinational logic to produce the timing signal. Feedback may be established from the output to the input of the sequential logic for cells whose next state depends on the current output state. Each cell generates one particular timing signal or a set of multi-phase timing signals.

The design of the timing controller followed a top-down method. First, the relationships among the timing signals presented in Fig. 81 and Fig. 83 were studied. Then the block-level representations of the cells were connected so that the signal dependences were implemented. Next, each cell was designed at the register-transfer level (RTL) by composing the sequential part and the combinational part to realize the corresponding signal. Last, the standard cells used in the RTL design were implemented at the transistor level.



Fig. 86 The block diagram of the top-level controller

The RTL block diagram of the top-level controller is shown in Fig. 86. The controller is gated off when "Global\_reset" is held high. "Pixel\_reset" remains high so that all pixel sensors are disabled for the lowest power consumption. When the reset is removed, "Reset\_pulse" from the gating controller is transmitted through the pulse synchronizer. This happens only after the first rising edge of "Reset\_pulse", from which point "Pixel\_reset" is activated as a copy of "Reset\_pulse". In the next stage, the PGA Sampling Timer and Data Load blocks receive the system master clock and are triggered by the "ADC\_idle" signal sent from the lower-level controller. After "Data\_Load" and the multi-phase PGA control signals are generated, a PGA amplification timer is triggered by " $\Phi$ 1". The timer produces a signal called "ADC\_Trig\_int" to indicate that the PGA amplification is complete and the data conversion can start. A multiplexer selects either "ADC\_Trig\_int" in normal mode or "ADC\_Trig\_ext" in ADC testing mode. The two signals mentioned above are used for handshaking with the lower-level controller. The rising edge of "ADC\_Trig" is used to gate on the lower-level controller. Once gated on, the lower-level controller sends a low-state "ADC\_idle" to the top level, indicating that the data converter is busy. After the conversion is completed, this signal becomes high, and the top-level controller is notified that the data conversion has ended so that it continues to generate the succeeding timing signals.

The post-silicon functional verification of the top-level controller in combination with the gating controller was performed by sending a 10 MHz system master clock to the chip. The pre-scaler of the gating controller was programmed with a pre-scale ratio of 128:1, such that the frequency of the reset pulse was 78.1 kHz, or its period was 12.8 $\mu$s. The reset pulse generator in the gating controller was tuned to produce a 140 ns-wide reset pulse. The pixel integration timer was programmed with an overflow time interval, T<sub>1</sub>, of 60 system master clock cycles, or 6 $\mu$s. The PGA amplification timer was programmed with an overflow time interval, T<sub>2</sub>, of 10 system master clock cycles, or 1 $\mu$s. Finally, the lower-level controller pre-scaler was programmed with a pre-scale ratio of 1:1, so that the ADC clock frequency was 10 MHz. The data conversion took 18 ADC clock cycles, for a total conversion time of 1.8 $\mu$s. After all logic timers were programmed through the SPI, "Global\_reset" was held low to gate on the controller logic, and the timing signals produced by the top-level controller were observed on a Tektronix MSO4104 oscilloscope. The observed waveform is plotted in Fig. 87.
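The timing figures in this test setup follow directly from the clock frequency and the programmed cycle counts. The short check below reproduces them; the helper names are illustrative and the calculation assumes ideal dividers with no additional latency.

```python
F_SYS_HZ = 10e6  # system master clock


def prescaled_frequency(f_in_hz, ratio):
    """Output frequency of an ideal ratio:1 pre-scaler."""
    return f_in_hz / ratio


def cycles_to_seconds(cycles, f_clk_hz):
    """Duration of a whole number of clock cycles."""
    return cycles / f_clk_hz


f_reset_hz = prescaled_frequency(F_SYS_HZ, 128)    # gating controller, 128:1
t_reset_period_s = 1 / f_reset_hz
t1_s = cycles_to_seconds(60, F_SYS_HZ)             # pixel integration timer T1
t2_s = cycles_to_seconds(10, F_SYS_HZ)             # PGA amplification timer T2
f_adc_hz = prescaled_frequency(F_SYS_HZ, 1)        # lower-level pre-scaler, 1:1
t_conv_s = cycles_to_seconds(18, f_adc_hz)         # one data conversion

print(f"reset pulse: {f_reset_hz / 1e3:.3f} kHz, period {t_reset_period_s * 1e6:.1f} us")
print(f"T1 = {t1_s * 1e6:.1f} us, T2 = {t2_s * 1e6:.1f} us, conversion = {t_conv_s * 1e6:.1f} us")
```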



Fig. 87 The tested waveform that verified the correct functional behavior of the top-level control logic



Fig. 88 The block-level interconnect of the lower-level controller

The block-level interconnection of the lower-level controller is presented in Fig. 88. In reset mode, the system master clock is blocked at the input of the 8-bit pre-scaler. When the reset is removed, the clock is first divided by the 8-bit pre-scaler to generate the ADC clock, "CLK\_ADC", at a lower frequency. "CLK\_ADC" can be regarded as the ADC master clock synchronized with the system master clock. This clock is still blocked from the rest of the circuits as long as the "ADC\_Trig" signal from the top-level controller is held low. When it becomes high, the ADC master clock passes through the clock synchronizer to create the "CLK\_SAR" and "CLK\_COMP" signals. The former drives the successive approximation logic; the latter controls the dynamic comparator in the ADC. The high state of the "ADC\_trigger" signal also releases the ADC State Register, the I/O mode control block, the MSB Load block, and the coarse control block. The outputs of these functional blocks are provided to the next stage, which includes Start Control, Fine Control, and Level Shifter Auto-zeroing. Finally, "Fine\_flag", produced by the Fine Control block, feeds into the Level Shifting block to generate the timing signal for the level shifting function (overlapping subrange creation).

In testing the lower-level controller, the system master clock was set to 10 MHz and the lower-level controller's pre-scaler was set to 2:1, so that the ADC clock was 5 MHz. In reset mode, all timing signals remain idle, as seen in Fig. 89.



Fig. 89 The measured waveform of the lower-level controller when the "Global\_reset" is high.

As "Global\_reset" was pulled low, the lower-level controller was gated on. The waveform in Fig. 90 shows that only "ADC\_clock" was continuously generated while the handshake signal, "ADC\_trig", from the top-level controller remained low. After the rising edge of "ADC\_trig", all remaining timing signals started to toggle. The data conversion began at the rising edge of "ADC\_trig" and ended at its falling edge. In total there were 18 cycles of "ADC\_clock", corresponding to a 3.6 µs conversion time. After the falling edge of "ADC\_trig", the lower-level controller returned to the reset mode, where it stayed in standby until the next rising edge of "ADC\_trig" was issued.



Fig. 90 The measured waveform of the lower-level controller after the "Global\_reset" signal is pulled low

## 4.3.4.4 Full custom design of the top and lower-level controllers

Each cell in Fig. 86 and Fig. 88 was a full custom design. The following section reviews the detailed implementation of each cell.

4.3.4.4.1 Pulse synchronizer

The reset signal that directly drives the reset transistor of each active-pixel sensor is called "Pixel\_reset". While "Global\_reset" is high, it remains at logic high, so the diodes in all pixels are discharged to ground. After "Global\_reset" is pulled low, "Pixel\_reset" is synchronized to "Reset\_pulse". The synchronization is performed by the circuit shown in Fig. 91. At the first falling edge of "Reset\_pulse" after "Global\_reset" becomes low, the output of the D flip-flop, Q, becomes high. This allows the signal at node A to propagate through the 2-input NAND gate and two inverters. As a result, there are 4 gate delays between "Pixel\_reset" and "Reset\_pulse", approximately 400 ps.



Fig. 91 The pulse synchronizer used to gate off "Pixel\_reset" in the reset phase

4.3.4.4.2 SC-PGA sampling timer

The clock signal that divides the SC-PGA's operation into sampling and amplifying phases is created by the PGA sampling timer, as shown in Fig. 92. When "Global\_reset" is high, "Φ1" stays high so that the SC-PGA keeps tracking the pixel output; the 8-bit programmable timer is reset and disabled. After "Global\_reset" is pulled low, D flip-flop D1 waits for the first rising edge of "Pixel\_reset", on which its output, Q, is pulled high. This resets D2, so that "Φ1" is pulled low to start amplifying the sampled reset voltage of the pixel sensor. After the amplified voltage is digitized by the ADC, "ADC\_idle" becomes high. It is clocked into D2 to pull "Φ1" high again. The first sampling-amplification cycle is then complete.



Fig. 92 The RTL block diagram of the PGA sampling timer

Starting from the end of the first sampling-amplification cycle, the rising edge of " $\Phi$ 1" propagates to the "EN" input of the 8-bit programmable timer after ~3 cycles of "CLK\_sys" through D4~D7. The timer then counts clock cycles until it overflows, at which point its output, "matched", is pulled high to set D1 and subsequently reset D2, pulling " $\Phi$ 1" low. The SC-PGA and ADC then amplify and digitize the pixel output for the second time, which completes the double sampling-amplification cycle.

The PGA sampling timer returns to its initial reset state after two amplification-conversion cycles. Because the first cycle measures the actual pixel reset voltage and the second cycle measures the pixel output after exposure to light, their difference is the actual measurement data.

#### 4.3.4.4.3 8-bit programmable timer

The timer used in the SC-PGA sampling timer counts clock cycles until the counter state reaches a software-defined value. Its function is to control the time spent in the sampling phase. The block diagram is shown in Fig. 93. In global reset mode, "EN" is low and both the 8-bit synchronous up counter and the 8-bit comparator are reset. The 8-bit register is written through SPI to define the number of clock cycles to be counted before the counter overflows. The 8-bit comparator monitors the state of the 8-bit synchronous up counter and generates an asynchronous trigger signal, "matched", when the counter state equals the pre-defined value stored in the register. At the next rising edge of "CLK\_sys", its synchronous version, "matched\_sync", is produced. Either signal can be used, depending on the timing constraint.
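The behavior of this timer can be summarized in a small cycle-based model. It is a behavioral sketch rather than the transistor-level design: the class and method names are hypothetical, the model assumes the counter holds its value once the match occurs, and it restricts the compare value to 1..255 for simplicity.

```python
class ProgrammableTimer8:
    """Behavioral model of an 8-bit up counter with a compare register."""

    def __init__(self, compare_value):
        if not 1 <= compare_value <= 0xFF:
            raise ValueError("compare value must be in 1..255 for this model")
        self.compare_value = compare_value  # written over SPI in the real design
        self.reset()

    def reset(self):
        """State while EN is low: counter and comparator are cleared."""
        self.count = 0
        self.matched = False        # comparator output ("matched")
        self.matched_sync = False   # registered copy ("matched_sync")

    def clock(self, en):
        """Apply one rising edge of CLK_sys."""
        if not en:
            self.reset()
            return
        # The register samples the comparator output from before this edge.
        self.matched_sync = self.matched
        if not self.matched:
            self.count = (self.count + 1) & 0xFF
            self.matched = self.count == self.compare_value


timer = ProgrammableTimer8(compare_value=3)
history = []
for _ in range(5):
    timer.clock(en=True)
    history.append((timer.count, timer.matched, timer.matched_sync))
print(history)
```

In this model "matched" rises on the clock edge at which the counter reaches the programmed value, and "matched\_sync" follows one clock cycle later, mirroring the relationship between the two outputs described above.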



Fig. 93 The implementation of the 8-bit programmable timer

## 4.3.4.4.4 CDS Complete/Data Load

After the second amplification-conversion cycle is completed, "Data\_load" is pulled high to change the I/O mode from parallel to serial. It returns to the parallel mode at the next pixel reset operation. This function is implemented by the Data Load circuit shown in Fig. 94. In the global reset phase, the input of D1 is high, so its output, Q, remains high. This condition resets D2 and sets D3. The output signal, called "CDS\_complete/Data\_Load", remains low in the reset period, because D4 is also reset. After "Global\_reset" is pulled low, D2, D3, and D4 are released from the reset or set condition. D7 starts to sense the moment when "Pixel\_reset" becomes low. When "Pixel\_reset" is high, D6 senses the high state of "ADC\_idle" and passes it to the clock inputs of D2 and D3 at the rising edge of "CLK\_sys". Because D2 and D3 act as a mode-2 counter with an initial state of 01<sub>2</sub>, they overflow after two clock cycles. So when "ADC\_idle" becomes high for the second time, that is, after two complete sampling-amplification cycles, the counter overflows. The trigger signal from the output of D3 feeds into the clock input of D4 after a synchronization step performed by D5. The output signal "CDS\_complete/Data\_Load" is then toggled to logic high to change the I/O mode from parallel to serial. At the beginning of the next measurement, "Pixel\_reset" falls to ground. This resets the entire circuit so that the I/O mode control repeats.



Fig. 94 The schematic of the Data Load logic

# 4.3.4.4.5 SC-PGA amplification timer

The time spent on amplifying the sampled pixel output is controlled by the SC-PGA amplification timer shown in Fig. 95. In reset mode, D1 is set to reset D2. The output of D2, Q, propagates through 2 inverters to keep "ADC\_trig" low. The inverting output of D1 ($\overline{Q}$=0) feeds into the 8-bit programmable timer's "EN" port to reset and disable the timer. When "Global\_reset" is pulled low, D1 is released from the set condition and waits for "$\Phi$1" to become low, after which the next rising edge of "CLK\_sys" toggles D1. The 8-bit programmable timer is enabled, because $\overline{Q}$ is now pulled high. D2 is released from the reset condition, and its input, D, changes to and remains at logic high. After the 8-bit programmable timer overflows, the rising edge of "matched" toggles the state of D2 to produce a rising edge of "ADC\_trig", which ends the amplification phase. "ADC\_trig" is pulled low again once "$\Phi$1" becomes high.



Fig. 95 The schematic of the PGA amplification timer.

# 4.3.4.4.6 8-bit pre-scalar

In the lower-level controller, the system clock is converted into the lower-frequency ADC clock. The function is realized by a programmable frequency divider circuit and an 8-1 multiplexer, as shown in Fig. 96. Each stage of the frequency divider divides the clock frequency of its previous stage by 2, and the multiplexer selects one of the clock signals as the output.
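The divide-by-2 chain and multiplexer can be sketched numerically as follows. The mapping between the multiplexer select value and the divider tap is an assumption made for illustration (select value 0 is taken to pick the output of the first stage); the actual select encoding is not stated here. The 10 MHz system clock is only an example value, and the function names are hypothetical.

```python
def prescaler_taps(f_sys_hz, stages=8):
    """Clock frequency at the output of each cascaded divide-by-2 stage."""
    return [f_sys_hz / 2 ** (k + 1) for k in range(stages)]


def select_adc_clock(f_sys_hz, sel):
    """Model of the 8-1 multiplexer: `sel` (0..7) picks one divider tap.

    Assumption: sel = 0 selects the first stage (f_sys / 2).
    """
    taps = prescaler_taps(f_sys_hz)
    if not 0 <= sel < len(taps):
        raise ValueError("sel must be in the range 0..7")
    return taps[sel]


if __name__ == "__main__":
    for sel in range(8):
        print(sel, select_adc_clock(10e6, sel))
```

With eight stages, the available division ratios under this assumption range from 2 to 256.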



Fig. 96 The schematic of the 8-bit programmable pre-scalar

## 4.3.4.4.7 ADC state register



Fig. 97 The schematic of the ADC state register logic

As the lower-level controller is triggered by the rising edge of "ADC\_trig", the ADC state register pulls down "ADC\_idle" at the next falling edge of "CLK\_ADC" to signal that the data converter is busy. This logic then counts 17 clock cycles of "CLK\_ADC", the time spent on one complete data conversion, before pulling up "ADC\_idle" again. As shown in Fig. 97, while "Global\_reset" is held high, D2~D9 are set; D1 and D10~D20 are reset. "ADC\_idle" remains logic high and the ADC stays in idle mode. After "Global\_reset" is pulled low, the first rising edge of "CLK\_ADC" toggles D2, so that D1 is released from the reset condition. Then D1 toggles when "ADC\_trig" becomes high. As "ADC\_idle" is pulled low, it propagates to node A after 5 clock cycles of "CLK\_SAR", so that the reset and set conditions for D8~D20 are removed. D8~D10 form a loop of shift registers with the initial state of 110000000000<sub>2</sub>. The logic 1 is shifted to the right, and after 11 cycles it appears at node B. D2 is toggled at the next clock cycle to reset D1 again, so "ADC\_idle" is pulled high again to signal that one complete data conversion is finished. "ADC\_idle" remains low for a total of 17 clock cycles.

## 4.3.4.4.8 ADC clock synchronizer



Fig. 98 The schematic of the ADC clock synchronizer

In the lower-level controller, the sub-clock produced by the 8-bit pre-scalar, "CLK\_ADC", is used to create two ADC timing clocks: "CLK\_SAR", which drives the successive-approximation register logic, and "CLK\_COMP", which drives the comparator. The two clocks are created by the ADC clock synchronizer shown in Fig. 98, and one lags the other by 180°.

Both D1 and D2 are in reset condition while "Global\_reset" stays high. The reset signal blocks "CLK\_ADC" from propagating through the 2-input NAND gates N1 and N2. After "Global\_reset" is pulled low, when "ADC\_trig" becomes high and a falling edge of "CLK\_ADC" occurs, D1 toggles its state to logic high. This unblocks "CLK\_ADC" so that it propagates through N1, and "CLK\_SAR", synchronized with "CLK\_ADC", is generated at the output, starting at a falling edge. At the next rising edge of "CLK\_ADC", D2 is toggled to the high state, so that N2 is unblocked and starts propagating "CLK\_ADC" to its output; this signal, however, is inverted.

Notice that 8 cascaded inverters are placed in parallel with D1. They remove the glitch at "CLK\_SAR" once "ADC\_idle" becomes low. The delay between "CLK\_SAR" and "CLK\_ADC" is 10 gate delays, or 1 ns as evaluated from the post-layout simulation. This arrangement guarantees that "ADC\_idle" falls to ground in advance of the falling edge of "CLK\_SAR", which is a timing constraint required by the SAR reset controller described below.






# 4.3.4.4.9 SAR reset controller

The 6-bit Successive-Approximation register is reset twice during each data conversion cycle. A pulse signal named "SAR\_reset", with a pulse width of half an ADC clock cycle, is produced by this function block. As shown in Fig. 99, while "ADC\_idle" remains high, all D-flip-flops except for D8 are in reset condition; D8 is set by "ADC\_idle". The cascaded D-flip-flops, D1~D9, form a shift-register loop with the initial state 000000010<sub>2</sub>. When "ADC\_idle" falls, all D-flip-flops are released from the reset and set conditions. Next, at the first falling edge of "CLK\_SAR", D10 toggles its state to high. Then at the next rising edge of "CLK\_SAR", D9 is toggled to high. This allows "SAR\_reset" to be pulled high to start the first 6-bit conversion. The state of the shift-register loop now becomes 000000001<sub>2</sub>. At the next falling edge of "CLK\_SAR", the output of D10, Q, is pulled low again, so "SAR\_reset" becomes low again. This makes the pulse width of "SAR\_reset" half a clock cycle of "CLK\_SAR". The pulse occurs at the first rising edge of "CLK\_SAR" after "ADC\_idle" is pulled low.

The shift-register loop keeps shifting the logic 1 clockwise until it recovers to the initial state of 000000010<sub>2</sub>. At the 10th rising edge of "CLK\_SAR" after "ADC\_idle" is pulled low, the second short pulse of "SAR\_reset" is produced the same way as the first pulse. This starts the second 6-bit conversion.
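The timing of the two "SAR\_reset" pulses follows from the length of the shift-register loop, which can be checked with a short behavioral sketch. The model below is an illustration only: it assumes the loop shifts by one position on every rising edge of "CLK\_SAR", that the last stage wraps around to the first, and that a pulse is launched whenever the last stage (D9) holds the logic 1. The function name is hypothetical.

```python
def sar_reset_edges(initial_state="000000010", n_edges=17):
    """Rising-edge numbers at which the last stage of the loop holds the 1.

    The string lists the stages from D1 (left) to D9 (right).
    Assumption: one right shift per rising edge, with wrap-around.
    """
    state = [int(bit) for bit in initial_state]
    pulses = []
    for edge in range(1, n_edges + 1):
        state = [state[-1]] + state[:-1]   # shift right, D9 wraps to D1
        if state[-1] == 1:                 # D9 high launches a pulse
            pulses.append(edge)
    return pulses


if __name__ == "__main__":
    print(sar_reset_edges())   # [1, 10]
```

Because the loop contains nine stages, the logic 1 returns to the last stage nine edges after the first pulse, which is consistent with pulses at the 1st and 10th rising edges.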

In summary, after "ADC\_idle" is pulled low, the first 7 clock cycles of "CLK\_SAR" are used for coarse data conversion; the next 2 clock cycles are used for sub-ranging creation; the second "SAR\_reset" pulse is produced at the 10th cycle, which starts the fine conversion; an additional 7 cycles are used for the fine data conversion.

# 4.3.4.4.10 Coarse control

"Coarse\_reset" is a signal used to differentiate the coarse conversion cycle and the fine conversion cycle. As shown in Fig. 100, it is created by a shift-register loop formed by D2~D16, and the reset logic built by D1 and D17.



Fig. 100 The schematic of Coarse Control logic

While "Global\_reset" is high, because "ADC\_idle" is also high, D1 is in reset condition and D17 is in set condition. This makes the initial state of "Coarse\_reset" high. The inverting output of D1 feeds into the reset or set port of D2~D16, making the initial state of the shift register 000000010000000<sub>2</sub>. After "Global\_reset" is pulled low, D1 monitors "ADC\_trig" and toggles when it goes high. This releases all shift-register stages from the reset or set condition. "CLK\_SAR" then shifts the logic 1 to the right until it reaches the output of the last stage, at which point D17 is reset, lowering "Coarse\_reset". The coarse conversion cycle is then complete, having used a total of 7 clock cycles. At the end of a complete data conversion, "ADC\_idle" is pulled high and "ADC\_trig" is pulled low. This sets D17 to pull up "Coarse\_reset" for the next coarse conversion in the new data conversion cycle. The shift register is reset to its initial state because D1 is toggled to low.

# 4.3.4.4.11 MSB load

The "MSB\_load" signal is used to store the 6-bit coarse conversion result into the 6-bit register at the end of the coarse conversion. The RTL block diagram of the MSB load block is similar to Fig. 100, but D17 is not included, and the reset state of the shift register is 000000100000000<sub>2</sub>. So the rising edge of "MSB\_load" during the data conversion occurs at the 8th clock cycle of "CLK\_SAR", one clock cycle after the coarse conversion is completed. The pulse width is 1 clock cycle. The MSB load block is reset when "ADC\_trig" is pulled low and "ADC\_idle" is pulled high at the end of a complete data conversion cycle.
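The cycle counts quoted for "Coarse\_reset" and "MSB\_load" follow directly from where the single logic 1 sits in each 15-bit reset state. The following sketch counts the right shifts needed for the 1 to reach the last stage. It assumes one shift per "CLK\_SAR" cycle and that the leftmost bit of the written state corresponds to the first stage; the helper name is hypothetical.

```python
def cycles_to_last_stage(reset_state):
    """Right shifts needed for the single logic 1 to reach the last stage."""
    if set(reset_state) - {"0", "1"} or reset_state.count("1") != 1:
        raise ValueError("expected a one-hot binary string")
    return len(reset_state) - 1 - reset_state.index("1")


if __name__ == "__main__":
    coarse = cycles_to_last_stage("000000010000000")     # Coarse_reset register
    msb_load = cycles_to_last_stage("000000100000000")   # MSB_load register
    print(coarse, msb_load)   # 7 8
```

Moving the initial 1 one position to the left therefore delays the event by exactly one clock cycle, which is how "MSB\_load" is placed one cycle after the end of the coarse conversion.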

### 4.3.4.4.12 Fine control and shifting enable

Similar to "Coarse\_control", "Fine\_control" is a flag signal that marks the fine conversion cycle. Its purpose is to enable the shifting function of the level shifter at the right time, so that the overlapping sub-ranges are created right before the fine conversion cycle. As shown in Fig. 101, "Fine\_flag" is reset to 0 while "ADC\_idle" stays high. After a complete data conversion, the reset is removed. When coarse conversion is completed, "Coarse\_reset" becomes low. At the start of the fine conversion, "SAR\_reset" produces a short pulse, which sets the D-flip-flop to pull up "Fine\_flag" and hence "EN\_shift". At the end of a complete data conversion, the rising edge of "ADC\_idle" resets the D-flip-flop again to pull down "Fine\_flag" and disable the shifting function.



Fig. 101 The schematic of the Fine Control logic

### 4.3.4.4.13 ADC data load



Fig. 102 The schematic of the ADC data load logic

### **CHAPTER 5**

# FLUORESCENCE LIFETIME IMAGER CAMERA MODULE INTEGRATION

# 5.1 Chip controller hardware

The monolithic CMOS imager IC was integrated into a portable imager device built on printed-circuit boards for real applications. The prototype device is an embedded system controlled through a user interface on a personal computer. Chip controller hardware is needed to generate control signals, transmit data between the host computer and the imager IC, and provide supply voltages and bias currents. This section introduces the hardware components on the prototype board that control the imager IC.

### 5.1.1 PIC32 microcontroller

A 32-bit PIC microcontroller was used as the core of the embedded system. Because the imager device at this stage serves only prototyping purposes, there was no need to start with a high-end microcontroller. The PIC32 microcontroller family provides sufficient flexibility in the number of peripherals, memory space, instruction execution speed, and package options. It is suitable for prototyping because of the relatively simple development process and the wide variety of application notes available online. This project used the PIC32MX795F512L microcontroller as the imager chip controller. Its features include 512 KB of flash program memory, 128 KB of RAM data memory, a USB transceiver module, and a total of 85 general purpose I/O pins. The system can operate at 80 MHz, which is well above the 10 MHz master clock frequency of the imager IC used in pre-silicon validation.

The PIC32 microcontroller is programmed in C, and the code is compiled by a C compiler. The manufacturer, Microchip Technology, provides a complete integrated development environment called MPLAB IDE. In developing the software support for the prototype device, a universal bootloader was first ported to the selected MCU. The MCU was then programmed with the ported bootloader using a PICkit 3 programmer. Once the bootloader is loaded, the MCU can be reprogrammed simply over USB. This software development platform proved very convenient, and it was chosen as the general method for building the imager chip controller and testing other mixed-signal ICs.

## 5.1.2 Xilinx CoolRunner II CPLD

The selected PIC32 microcontroller can operate at up to 80 MHz. Although products running at up to 200 MHz are available, an even faster programmable logic device, such as a CPLD, can be placed beside the MCU to further enhance the data transmission bandwidth of the chip controller. The MCU speed is limited because it executes code serially and one instruction may take more than one clock cycle. Maintaining certain timing constraints in C code is relatively difficult because of this serial execution. In contrast, a CPLD is a reconfigurable logic device in which all the logic generates signals in parallel, which makes it easier to perform static timing verification before device programming. In the ADC testing tape-out that preceded the final monolithic imager IC, a Xilinx CoolRunner II CPLD was used to support the MCU in generating high-speed data transmission control signals. The Verilog code for the CPLD was developed using Xilinx ISE Design Suite 11. After the Verilog code was verified through static timing analysis, it was synthesized into a JEDEC programming file according to the selected hardware architecture. The JEDEC programming file was then converted into an XSVF file through iMPACT, companion software provided by Xilinx. The data file was further embedded into the application-layer code of the MCU, so that the CPLD was programmed in-system by the MCU[86] at the same time that the MCU was programmed with its application-layer code.

#### 5.1.3 Programmable voltage and current reference

There are 15 reference voltages and 9 bias currents required by the imager IC. All these voltages and currents are generated by programmable reference circuits that include a 12-bit DAC, an N-MOSFET (or a P-MOSFET), an op amp, and a source or sink resistor. Fig. 103 shows three types of reference generation circuits used in the chip controller. The 12-bit DAC is controlled by the PIC32 MCU through SPI. This is a 4-wire serial communication architecture that involves Serial Data In (SDI), Serial Data Out (SDO), Serial Clock (SCK), and Chip Select (CS). The CS signal of each 12-bit DAC is controlled by one of the general purpose I/O pins on the microcontroller. The SDI and SCK of all 12-bit DACs are shared. The two signals are supplied by the Serial Data Out (SDO) and SCK pins of the PIC32 MCU. In the Idle & Config mode of the imager IC, a DAC register is programmed by first pulling down the CS signal of the 12-bit DAC chip to be programmed; the MCU then sends serial data to the DAC register and refreshes the analog output by pulling CS high. All reference generation circuits are programmed sequentially at the power-up phase. Once the DACs are programmed, their voltage outputs can be used after buffering with op amps, as shown in Fig. 103(a). To produce a bias current, the DAC output feeds into the current generator circuit shown in Fig. 103 (b) and (c), for either sourcing a current from the chip or sinking a current to the chip. In the current generator circuit, the feedback of the op amp pins the voltage at node A to $V_{ref}$. The op amp adaptively biases the gate voltage of the N-MOSFET or the P-MOSFET, so that the drain current of the MOSFET is $V_{ref}$ divided by the resistor value, R.
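The relation between a DAC code, the reference voltage, and the resulting bias current can be sketched numerically. The numbers below are placeholders chosen for illustration and are not taken from the board design: the DAC full-scale voltage (3.3 V), the ideal linear transfer function, and the 165 kΩ resistor are all assumptions, and the function names are hypothetical.

```python
def dac_output_voltage(code, v_full_scale=3.3, bits=12):
    """Ideal DAC transfer function (assumed linear; full scale is hypothetical)."""
    if not 0 <= code < 2 ** bits:
        raise ValueError("code out of range for the DAC resolution")
    return v_full_scale * code / 2 ** bits


def bias_current(v_ref, r_ohms):
    """Drain current set by the op-amp loop: I = V_ref / R."""
    return v_ref / r_ohms


if __name__ == "__main__":
    v_ref = dac_output_voltage(2048)      # mid-scale code
    i_bias = bias_current(v_ref, 165e3)   # hypothetical 165 kOhm resistor
    print(f"V_ref = {v_ref:.3f} V, I_bias = {i_bias * 1e6:.2f} uA")
```

Under these assumptions one DAC step changes the bias current by the full-scale voltage divided by 4096 R, so the resistor value sets both the range and the resolution of the programmable current.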



Fig. 103 Three different reference generation circuits used in the chip controller board

### 5.1.4 Excitation circuit

The high speed LED driver circuit is the same as that of the discrete-component fluorimeter discussed in Chapter 2. To further increase the maximum excitation power, two excitation boards were placed in parallel to excite the chemical samples from both sides.

# 5.1.5 Power supply

Fig. 104 shows the functional blocks in the final fluorimeter prototype. The portable device receives a 5 V USB power supply from the personal computer. It is regulated into three separate on-board supply voltages: analog 3.3 V, digital 3.3 V, and digital 5 V. The fabricated imager IC is mounted in the middle, and it receives both the analog and digital 3.3 V supplies. The bias circuitry is powered by the analog 3.3 V supply. The excitation board that drives the high speed LED is supplied by digital 5 V. Finally, the MCU is powered by digital 3.3 V.



Fig. 104 The block diagram of the prototype imager device implementing the CMOS imager IC

# 5.1.6 CMOS fluorimeter integration

The fluorimeter hardware shown in Fig. 104 was separated into four PCBs, which were assembled into the portable device shown in Fig. 105. The PIC32 microcontroller board, labeled A, was used to gate, program, and read data from the imager IC, and to communicate with the personal computer. Both the imager IC and the bias circuitry were built on board B, because they share the same 3.3 V analog power supply. The two boards were stacked using four 6-32 screws and connected through a ribbon cable for electrical signals. The excitation boards, C and D, were placed orthogonally between boards A and B, so that a measurement chamber for the chemical sample cuvette is formed in the middle of the four boards. As the two LEDs excite the sample in the horizontal direction, fluorescence emitted from the sample is received by the imager IC at a 90° angle. The assembled device measures 17 cm × 9 cm × 3 cm, which demonstrates its portability.



Fig. 105 The assembled portable lensless fluorimeter built by the CMOS imager IC

# 5.2 Testing of the portable CMOS Time-domain fluorescence lifetime imager

After each individual building block was validated, and the imager camera module was integrated into a PCB prototype, the portable device was used for real measurements. This section describes measurements on the decay time of the excitation pulse, commercial fluorophores, and detection limit on fluorescein. Finally measurements made by different pixels are compared.

# 5.2.1 Measuring 405 nm excitation pulse decay time

The excitation light power is orders of magnitude higher than the fluorescence emitted from the chemical samples. To initially verify the fluorimeter's ability to measure a time-resolved optical signal, the portable device was first used to measure the falling edge profile of the excitation light by directly facing the LED toward the imager IC. A 50 ns-wide excitation pulse and a 250 ns-wide reset pulse were created from the pulse generator. With the delay of the excitation pulse fixed at 0 ns, the reset pulse was delayed from 0 ns to 5.5 ns, with a total of 6 steps and a step size of 1.1 ns, as shown in Fig. 106. At each delay step, the measurement was repeated 10,000 times, and the collected data were averaged to give the final result for that delay step. The gain of the PGA was set to 56 dB.



Fig. 106 The delayline controlled excitation pulse and reset pulse for optical decay measurement

The averaged data was normalized and plotted across the time scale, as shown in Fig. 107 (a). A mono-exponential fit based on the data gave a time constant of 1.3 ns. Then the raw data was discretely differentiated and fitted by the same exponential function, as shown in Fig. 107 (b). This time the fitted time constant was 0.3 ns larger than before, because the optical decay of the excitation light does not follow a strict exponential function as the fluorescence. As can be identified from the figure, it linearly decreases to around 20% of the peak value; then the signal starts to decay exponentially.
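The two fitting steps described above can be sketched on synthetic data. The example below is not the analysis code used for Fig. 107. It assumes an ideal mono-exponential decay with a 1.3 ns time constant, models each measured point as the integral of the light remaining after the delay, and uses a log-linear least-squares fit as a stand-in for whatever fitting routine was actually used. All function names are hypothetical.

```python
import math


def fit_time_constant(times_ns, values):
    """Least-squares line through (t, ln v); returns tau = -1 / slope."""
    logs = [math.log(v) for v in values]
    n = len(times_ns)
    t_mean = sum(times_ns) / n
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_ns, logs))
    den = sum((t - t_mean) ** 2 for t in times_ns)
    return -den / num


def discrete_derivative(values):
    """Drop between consecutive delay steps (positive for a decaying signal)."""
    return [a - b for a, b in zip(values, values[1:])]


tau_true = 1.3                                   # ns, synthetic example
delays = [1.1 * k for k in range(6)]             # 0 ns to 5.5 ns in 1.1 ns steps
# Integral of exp(-t/tau) from each delay to infinity.
integrated = [tau_true * math.exp(-t / tau_true) for t in delays]

tau_raw = fit_time_constant(delays, integrated)
tau_diff = fit_time_constant(delays[:-1], discrete_derivative(integrated))

if __name__ == "__main__":
    print(f"{tau_raw:.3f} ns, {tau_diff:.3f} ns")
```

For a strictly exponential decay both fits return the same time constant, so a difference between the two fitted values, such as the 0.3 ns reported here, is consistent with a falling edge that is not purely exponential.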



Fig. 107 The measured (fitted) 405 nm excitation pulse falling edge (a) and the differentiated data (b)

To verify the accuracy of the decay constant measured by the fluorimeter, the same excitation pulse was measured by a Hamamatsu S2382 avalanche photodiode biased at 130 V. The avalanche photodiode has a bandwidth close to 1 GHz. The output of the APD was connected to a Tektronix MSO3402 5GS/s oscilloscope with 50 ohm input impedance. As shown in Fig. 107 (b), data measured by the APD-Oscilloscope overlaps with that measured by the portable fluorimeter, indicating the sensitivity and accuracy of the optical decay measurement on the excitation light.

### 5.2.2 Measurement on commercial fluorophores by a single pixel

Real lifetime measurements with the portable CMOS fluorimeter were performed on three commercial fluorophores: Fluorescein sodium salt (Online science mall/CAS# 518-47-8), Lucifer yellow (MP Biomedicals/M3415), and Protoporphyrin IX (Sigma-Aldrich/P8293). Their lifetimes were measured separately, and the results were compared with their documented lifetimes.

Fluorescein sodium salt was first dissolved in PBS. Then it was diluted to prepare a 500 $\mu$M/L solution. The liquid chemical sample was placed in a 1.5 mL disposable plastic cuvette (Fisher Scientific/14-955-127) sealed with Parafilm Sealing Film (Universal Medical/ SKU: HS234526B). The cuvette was placed in the measurement chamber of the portable CMOS fluorimeter. During the measurement, the fluorimeter was put inside a constant temperature chamber in order to minimize temperature variation and to shield the image sensor from external light. The 20 $\mu$m × 20 $\mu$m N+/PW-HV diode based APS at column 32, row 64 was used to sense the fluorescence. The measurement process was similar to measuring the excitation decay, but because fluorescein's lifetime is larger than 1.3 ns, the delay range was set from 0 ns to 12.2 ns, with a total of 12 steps and a 1.1 ns step size. A total of 10,000 data points were captured for each delay step. After the average values were calculated for each delay step, they were plotted across the time axis as shown in Fig. 108 (a). A mono-exponential fit was performed to extract the lifetime, and the result was 3.3 ns, compared to the documented lifetime of 4 ns. Fig. 108 (b) presents the residue of the fitted data compared with the measurement data.



Fig. 108 The measured lifetime of 500 μM/L fluorescein sodium salt dissolved in PBS

Lucifer yellow sodium salt was dissolved in PBS and further prepared into a 500 μM/L sample. The measurement followed the same procedure as measuring the fluorescein. Fig. 109 (a) plots the time-resolved fluorescence decay measurement data and fitted data. The extracted lifetime was 5.4 ns, compared to the documented lifetime of 5.7 ns.



Protoporphyrin IX powder was first dissolved in DMSO and then diluted into a 10  $\mu$ M/L solution. The measured fluorescence decay and fitted curve are plotted in Fig. 110 (a). The measured lifetime was 19.6 ns, as compared to the documented lifetime of 16.4 ns[87]. The 19.5 % relative error is due to the non-linearity effect of the SC-PGA and the 10-bit ADC.
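The relative errors of the three single-pixel measurements can be tabulated from the measured and documented lifetimes quoted in this section. The short sketch below simply applies (measured − documented) / documented to those values; the variable and function names are hypothetical.

```python
# (measured lifetime, documented lifetime) in ns, as quoted in this section
LIFETIMES_NS = {
    "Fluorescein sodium salt": (3.3, 4.0),
    "Lucifer yellow": (5.4, 5.7),
    "Protoporphyrin IX": (19.6, 16.4),
}


def relative_error_percent(measured, documented):
    """Signed relative error of the measured lifetime, in percent."""
    return 100.0 * (measured - documented) / documented


if __name__ == "__main__":
    for name, (measured, documented) in LIFETIMES_NS.items():
        error = relative_error_percent(measured, documented)
        print(f"{name}: {error:+.1f} %")
```

The two shorter-lifetime fluorophores are underestimated while Protoporphyrin IX is overestimated, with the latter giving the 19.5 % figure stated above.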



Fig. 110 The measured lifetime of 10 µM/L Protoporphyrin IX dissolved in PBS

## 5.2.3 Single pixel measurement detection limit

One of the commonly used criteria to evaluate a fluorimeter's sensitivity to chemical samples is the minimum concentration of fluorescein it can detect. In order to determine the detection limit of the CMOS fluorimeter, fluorescein sodium salt dissolved in PBS was prepared in a series of concentrations, including 500 $\mu$M/L, 50 $\mu$M/L, 5 $\mu$M/L, and 0.5 $\mu$M/L, and each concentration was measured separately. The measurement on each group was repeated 5 times to calculate the average lifetime and the standard error. Fig. 111 shows the measurement data for all concentrations. The average lifetime remained the same when the concentration was reduced from 500 $\mu$M/L to 50 $\mu$M/L, but it dropped below 3 ns at lower concentrations. So the detection limit of the CMOS fluorimeter is 50 $\mu$M/L.



Fig. 111 The measured lifetime of fluorescein sodium salt dissolved in PBS at different concentrations

The measured detection limit of the IC is a factor of 200 larger than that of the discrete-component fluorimeter, 0.25 $\mu$M/L, so the IC does not perform as well as the discrete-component fluorimeter. There is no direct method to quantitatively compare the mechanisms that limit the detection limits of the two fluorimeters, since they use different types of sensors and sensing circuits. In the following discussion, a quantitative analysis of the IC's noise model is carried out to understand the limiting factors of its detection limit. It may seem intuitive that the reduced gain of the amplifier reduces the signal amplitude received by the ADC, requiring the concentration of chemical samples to increase proportionally to guarantee sufficient signal power. But detailed analysis shows that the relation between the system's detection limit and the gain of the amplifier is not linear. The gain realized by an actual amplifier affects the detection limit through its interaction with the noise generated in the circuit. Optimizing the dominant noise sources in the circuit can reduce the detection limit.

The cascaded connection of a pixel sensor, an amplifier, and an ADC that represents one sensing channel is shown in Fig. 112. The sensor receives the fluorescence signal with the power of $P_{fl}$ and converts it into an electric power called $P_{sig}$. The conversion efficiency from $P_{fl}$ to $P_{sig}$ is usually smaller than 1, but it is assumed to be 1 to keep the analysis concise. At the input of the amplifier, two noise sources interfere with the signal component: the output referred noise of the sensor and the input referred noise of the amplifier. Because signal and noise are uncorrelated, their superposition is amplified by a factor of A by the amplifier. The input referred noise of the ADC then further interferes with the signal, reducing the SNR at the final output.



Fig. 112 The noise model of one sensing channel in the fluorimeter IC

The SNR at the output of the entire signal chain can be written as

$$SNR_{out} = 10\log_{10}\left(\frac{10^4 P_{fl}}{\overline{P_{n,pixel}} + \overline{P_{n,amp}} + \frac{\overline{P_{n,ADC}}}{A^2}}\right)$$
Eqn. 53

In Eqn. 53, the signal power is multiplied by a factor of 10,000, because the 10,000-time averaging operation reduces the total noise power by that amount.

The detection limit of the system corresponds to the minimum SNR that is acceptable at the output of the ADC for reasonably accurate lifetime measurement. The data measured on a fluorophore sample, which is the integration of the time-resolved single exponential decay, can be plotted in the fashion shown in Fig. 113 (a). The data can be further decomposed into a signal component and a noise component in the time domain, as shown in Fig. 113 (b). The signal falls from the peak amplitude at time 0. After one decay time constant, $1\tau$, it becomes 37% of the peak value; after 2 decay time constants, $2\tau$, the signal is reduced to only 14% of the peak. Because the peak signal is the first data point to be measured, keeping the RMS noise level at 10% of the peak signal amplitude ensures that the first measured signal power is $10^2$, or 100, times as large as the noise power, and that the signal power measured at $1\tau$ is $3.7^2$, or ~14, times as large as the noise power. Because in both cases the signal power is more than 1 order of magnitude higher than the noise power, the effect of the noise component on lifetime extraction can be ignored. When recording data from 0 to $1\tau$, because the delay step of the gating controller is 1.1 ns and at least 2 measurements are needed to extract the lifetime, the minimum detectable lifetime is $\sim 1$ ns, which was the design objective of the system. When measuring samples with lifetimes longer than 1 ns, the effect of noise interference becomes weaker, because more data points are involved in the fitting process.
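The signal-to-noise power ratios quoted above can be reproduced with a few lines of arithmetic. The sketch assumes an ideal single exponential decay and an RMS noise level fixed at 10% of the peak amplitude, as in the argument above; the function name is hypothetical.

```python
import math


def signal_to_noise_power_ratio(t_over_tau, noise_fraction=0.1):
    """Power ratio between the decaying signal and a fixed RMS noise level.

    `noise_fraction` is the RMS noise relative to the peak signal amplitude.
    """
    amplitude = math.exp(-t_over_tau)   # signal amplitude relative to the peak
    return (amplitude / noise_fraction) ** 2


if __name__ == "__main__":
    for k in (0, 1, 2):
        ratio = signal_to_noise_power_ratio(k)
        print(f"t = {k} tau: ratio = {ratio:.1f} ({10 * math.log10(ratio):.1f} dB)")
```

The ratio is 100 at the peak and about 14 at $1\tau$, as stated above, but it falls to roughly 2 at $2\tau$, so points recorded beyond one time constant contribute less reliable information when the noise sits at this level.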



Fig. 113 The generalized time-domain measurement data on a fluorophore sample (a) and its decomposition (b)

The above discussion shows that setting the minimum SNR at the output of the ADC to 20 dB, which corresponds to a peak signal amplitude 10 times as large as the RMS noise level, allows reasonably good lifetime extraction. Under this condition, and based on Eqn. 53, the detection limit of the system can be written as

$$P_{fl} \ge \frac{1}{10^2} \left( \overline{P_{n,amp}} + \overline{P_{n,pixel}} + \frac{\overline{P_{n,ADC}}}{A^2} \right)$$
 Eqn. 54

This means that the minimum detectable fluorescence peak power from the excited chemical sample is determined by the noise generated by the sensor, the amplifier, and the ADC, where the ADC noise is attenuated by the square of the amplifier's gain.

Next, each noise power factor in Eqn. 54 will be analyzed for quantitative study. The noise behavior of the amplifier has been analyzed in section 4.3.2.2. The closed form expression of the input referred RMS noise voltage of the PGA was shown in Eqn. 43. When the closed-loop gain of stage 1 amplifier reduces from 300 to 120, the bandwidth in amplification mode,  $f_{amplf}$ , is increased from 180 kHz to 450 kHz. Using the new value of  $f_{amplf}$  to calculate the RMS noise voltage again,  $V_{in,rms,PGA}$  increases from 223.1  $\mu$ V to 223.2  $\mu$ V. So the effect of the reduced gain on amplifier's input referred noise can be ignored.  $\overline{P_{n,amp}}$  in Eqn. 54 is 223<sup>2</sup>  $\mu$ V<sup>2</sup>.



Fig. 114 The noise model of a 3-T active pixel sensor

The pixel sensor's noise model is shown in Fig. 114. The noise includes the kTC noise due to transistor $M_1$ and capacitor $C_d$, and the thermal noise from transistors $M_2 \sim M_4$. The 1/f noise is ignored. At node X, the PSD of the noise is

$$\overline{V_{n,X,PSD}^{2}} = \frac{4kTR_{on1}}{1 + (2\pi fR_{on1}C_{d})^{2}}$$
Eqn. 55

where  $R_{on1}$  is the channel resistance of  $M_1$ , and  $C_d$  is the junction capacitance of the diode.  $\overline{V_{n,X}^2}$  is directly reflected to the pixel output by assuming an ideal source follower. In the source follower circuit, because the impedance looking into the drain of  $M_3$  is much larger than that looking into the source of  $M_2$ , almost all noise current  $\overline{I_{n,2}^2}$  flows into the channel of  $M_2$  and does not affect the output. So the noise current of  $M_2$  is neglected. The output noise contributed from  $M_3$  and  $M_4$  can be written as

$$\overline{V_{n,pix,PSD}^2}\Big|_{M3,4} = 4kT\frac{2}{3}(\frac{1}{gm_3} + \frac{gm_4}{gm_3^2})$$
 Eqn. 56

where $gm_3$ and $gm_4$ are the trans-conductances of $M_3$ and $M_4$. So, by combining Eqn. 55 and 56, the total noise PSD at the pixel sensor output satisfies

$$\overline{V_{n,pix,PSD}^2} = \frac{4kTR_{on1}}{1 + (2\pi fR_{on1}C_d)^2} + 4kT\frac{2}{3}(\frac{1}{gm_3} + \frac{gm_4}{gm_3^2})$$
 Eqn. 57

Integrating it from 0Hz to BW (54 MHz, bandwidth of the op amp in PGA), the total noise power at the pixel output is

$$\overline{V_{n,pix}^2} = \frac{2kT}{\pi C_d} \tan^{-1}(2\pi BWR_{on1}C_d) + 4kT\frac{2}{3}(\frac{1}{gm_3} + \frac{gm_4}{gm_3^2})BW$$
 Eqn. 58

In post-layout simulation,  $R_{on1}$  is 4.96 k$\Omega$ ,  $C_d$  is 40 fF, gm<sub>3</sub> is 42  $\mu$ A/V, and gm<sub>4</sub> is 887  $\mu$ A/V. Substituting these values into Eqn. 57 and integrating from 0 Hz to 54 MHz, the total RMS noise voltage at the pixel output equals 615  $\mu$ V.  $\overline{P_{n,pixel}}$  in Eqn. 54 is 615<sup>2</sup>  $\mu$ V<sup>2</sup>.
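The closed form in Eqn. 58 can be evaluated directly from the quoted device values. The following is a minimal sketch, not the author's calculation: the temperature (300 K) is an assumption, and the function and variable names are illustrative. With that assumed temperature it yields a value of a few hundred microvolts, the same order as the 615  $\mu$ V quoted above. Any remaining difference would depend on the temperature and device details used in the post-layout simulation, which are not given here.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def pixel_noise_power(r_on1, c_d, gm3, gm4, bw, temp_k=300.0):
    """Integrated pixel output noise power in V^2, following Eqn. 58.

    temp_k is an assumed temperature; the text does not state one.
    """
    kt = K_BOLTZMANN * temp_k
    # KTC (reset switch) term: low-pass filtered thermal noise of R_on1.
    ktc_term = 2.0 * kt / (math.pi * c_d) * math.atan(2.0 * math.pi * bw * r_on1 * c_d)
    # Source follower term: white thermal noise of M3 and M4 over bandwidth bw.
    follower_term = 4.0 * kt * (2.0 / 3.0) * (1.0 / gm3 + gm4 / gm3**2) * bw
    return ktc_term + follower_term


# Values quoted from the post-layout simulation.
noise_power = pixel_noise_power(r_on1=4.96e3, c_d=40e-15, gm3=42e-6, gm4=887e-6, bw=54e6)
v_rms = math.sqrt(noise_power)
print(f"pixel output noise: {v_rms * 1e6:.0f} uV RMS (assuming 300 K)")
```

In this sketch the source follower term is much larger than the KTC term, which is consistent with the later suggestion to resize the current source transistor.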

The input referred noise of the ADC includes quantization noise and thermal noise. They are calculated by referring to the ADC dynamic performance. The measured ENOB of the 10-bit ADC at 1.1 MS/s conversion rate was 6.4 bits. In the measurement, a 1.024 V peak-to-peak sine signal was provided to the ADC. The relation between ENOB and the signal-to-noise-and-distortion ratio (SINAD) follows

$$ENOB = \frac{SINAD - 1.76}{6.02}$$
 Eqn. 59

The calculated SINAD is 40.3 dB. It further satisfies

$$SINAD = 10\log_{10}\left(\frac{P_{in,ADC}}{\overline{P_{n,quant}} + \overline{P_{n,therm}} + P_{dis}}\right)$$
Eqn. 60

where  $P_{in,ADC}$  is the input signal power of the ADC,  $\overline{P_{n,quant}}$  is the quantization noise power,  $\overline{P_{n,therm}}$  is the thermal noise power, and  $P_{dis}$  is the harmonic distortion power. Eqn. 60 is equivalent to

$$SINAD = 20\log_{10}(\frac{V_{in,ADC,RMS}}{V_{n,ADC,RMS}})$$
 Eqn. 61

where  $V_{in,ADC,RMS}$  is the RMS voltage of the ADC input signal, or 362 mV for the 1.024 V peak-to-peak sine wave, and  $V_{n,ADC,RMS}$  is the total input referred ADC RMS noise voltage. Solving Eqn. 61 for  $V_{n,ADC,RMS}$  gives 3.5 mV. So  $\overline{P_{n,ADC}}$  in Eqn. 54 is  $(3.5 \times 10^3)^2 \ \mu V^2$ . Note that in this calculation the harmonic distortion power is also included as part of the noise.
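The chain from ENOB to the input referred ADC noise (Eqn. 59 and Eqn. 61) is short enough to check numerically. The sketch below uses only the values quoted above; the function name is illustrative.

```python
import math


def adc_input_noise_rms(enob_bits, v_pp):
    """Return (SINAD in dB, input RMS in V, noise RMS in V) for a sine-wave test."""
    sinad_db = 6.02 * enob_bits + 1.76           # Eqn. 59 solved for SINAD
    v_in_rms = v_pp / (2.0 * math.sqrt(2.0))     # RMS of a sine with peak-to-peak v_pp
    v_n_rms = v_in_rms / 10.0 ** (sinad_db / 20.0)  # Eqn. 61 solved for the noise
    return sinad_db, v_in_rms, v_n_rms


sinad_db, v_in_rms, v_n_rms = adc_input_noise_rms(enob_bits=6.4, v_pp=1.024)
print(f"SINAD = {sinad_db:.1f} dB, V_in = {v_in_rms * 1e3:.0f} mV, V_n = {v_n_rms * 1e3:.1f} mV")
```

As in the text, the resulting noise figure lumps harmonic distortion together with quantization and thermal noise.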

Now all components in Eqn. 53 have been analytically calculated. To evaluate the effect of the reduced amplifier gain on the system's detection limit, the value of Eqn. 53 is plotted while varying the amplifier gain from 20 dB to 80 dB, as shown in Fig. 115 ( $\alpha$=1). The figure also includes three curves for which the total noise power from the pixel sensor and the amplifier is reduced by a factor of 2, 4, and 16 ( $\alpha$=2, 4, 16). This family of plots indicates that, for the noise performance of the existing system, the detection limit is nearly constant as long as the amplifier gain exceeds 40 dB. It shows first that the relation between the detection limit and the amplifier gain is not linear. It also shows that the reduction of the measured amplifier gain from 93 dB to 56 dB did not degrade the detection limit of the system.



Fig. 115 The RMS voltage of the detectable signal vs. the amplifier gain under different improvement factors
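The flattening of the curves above 40 dB can be illustrated with a simple input referred noise budget. Eqn. 53 is not reproduced in this section, so the form below is an assumption: the pixel and amplifier noise powers are added and divided by the improvement factor  $\alpha$ , and the ADC noise is referred to the amplifier input by dividing by the voltage gain. The signal-to-noise criterion that converts noise into a detection limit is not modeled, so the absolute values are not those of Fig. 115; only the dependence on gain is of interest.

```python
import math

V_PIX_RMS = 615e-6   # pixel output noise, V RMS (from the text)
V_AMP_RMS = 223e-6   # amplifier input referred noise, V RMS (from the text)
V_ADC_RMS = 3.5e-3   # ADC input referred noise, V RMS (from the text)


def input_referred_noise_rms(gain_db, alpha=1.0):
    """Total noise referred to the amplifier input (assumed model, see lead-in)."""
    gain = 10.0 ** (gain_db / 20.0)
    front_end_power = (V_PIX_RMS**2 + V_AMP_RMS**2) / alpha
    adc_power = (V_ADC_RMS / gain) ** 2
    return math.sqrt(front_end_power + adc_power)


for alpha in (1, 2, 4, 16):
    row = [input_referred_noise_rms(g, alpha) * 1e6 for g in (20, 40, 60, 80)]
    print(f"alpha={alpha:2d}: " + ", ".join(f"{v:.0f} uV" for v in row))
```

In this model the ADC term falls as the square of the gain, so beyond roughly 40 dB the front-end noise sets the floor and further gain changes have little effect.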

The above analysis studied the system's detection limit from the noise point of view. Three noise components are the limiting factors: pixel sensor noise, column amplifier noise, and ADC noise. It is concluded that the noise from the pixel sensor and the column amplifier is dominant, while the noise from the ADC is negligible if the signal is amplified by more than 40 dB before digitization. With the existing circuitry, the RMS noise voltage is 615  $\mu$ V at the pixel sensor output and 223  $\mu$ V at the amplifier input. The minimum detectable signal is 66  $\mu$ V RMS, corresponding to the fluorescence emitted from a 50  $\mu$ M/L fluorescein sample. It is worth evaluating how far the noise could be reduced by revising the circuit, so that the detection limit of the system can be extrapolated for future design reference.

First, pixel sensor noise includes the KTC noise and the source follower noise. Because KTC noise power is inversely proportional to the diode capacitance, it can be reduced by using a larger pixel. Source follower noise can be reduced by decreasing the aspect ratio of the current source transistor and increasing its area, so that its smaller transconductance lowers the thermal noise and its 1/f noise also decreases. For these two noise sources, a factor of 16 reduction in noise power can be realized by replacing the 20  $\mu$ m by 20  $\mu$ m pixel with an 80  $\mu$ m by 80  $\mu$ m pixel, and by replacing the 48 $\mu$ m/1 $\mu$ m current source transistor with a 48 $\mu$ m/16 $\mu$ m one. Second, the noise from the switched-capacitor amplifier is dominated by the KTC noise of the input switch. Its noise power can be reduced by a factor of 16 by increasing the sampling capacitor from 1.5 pF to 24 pF. Further improvement of these noise powers is possible but becomes less attractive, because the pixel sensor and capacitor areas grow in proportion to the noise power reduction factor. So the above evaluation shows that the detection limit can be improved provided that each imaging channel is allowed to occupy more silicon area. The increased pixel pitch will reduce the spatial resolution, but the improved detection limit can be beneficial to certain applications. With a noise power improvement factor of 16, the minimum detectable signal can be as low as 16  $\mu$ V RMS. This corresponds to the fluorescence emitted from a 12  $\mu$ M/L fluorescein sample, as plotted in Fig. 115.
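The scaling numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch assumes that the diode capacitance scales with pixel area, that the detectable signal scales with the RMS noise voltage (the square root of the noise power), and that the detectable concentration scales linearly with the detectable signal. These assumptions are inferred from the quoted figures rather than stated in the text.

```python
import math

# Geometry and component ratios quoted in the text.
pixel_area_ratio = (80.0 / 20.0) ** 2   # 20 um pixel -> 80 um pixel
cap_ratio = 24.0 / 1.5                  # 1.5 pF -> 24 pF sampling capacitor

noise_power_factor = 16.0
noise_voltage_factor = math.sqrt(noise_power_factor)

# Existing detection limit quoted in the text.
min_signal_uv = 66.0 / noise_voltage_factor      # microvolts RMS
min_concentration = 50.0 / noise_voltage_factor  # same concentration unit as the text

print(f"area ratio {pixel_area_ratio:.0f}x, capacitor ratio {cap_ratio:.0f}x")
print(f"detectable signal ~ {min_signal_uv:.1f} uV RMS, concentration ~ {min_concentration:.1f}")
```

Both results round to the 16  $\mu$ V and 12  $\mu$ M/L figures given above, and both area ratios equal the factor of 16 in noise power.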

### 5.2.4 Fluorescein lifetime measured with well-spaced pixels

The final objective of using the CMOS fluorimeter is to acquire a 2-dimensional image based on the lifetime measured by each pixel of the array. This could potentially serve as a lensless imager that replaces a fluorescence microscope for measuring live cells stained with fluorophores. However, the first CMOS fluorimeter IC prototype was limited to using a single pixel at a time. Measuring chemical samples using a second pixel has to be performed by manually tuning the reference voltage,  $V_{ref1}$ , of the column-parallel SC-PGA array. The reason is explained as follows.



Fig. 116 The first stage of the PGA connected to a pixel sensor

For review, Fig. 116 presents a pixel sensor connected to the first stage of the PGA circuit. In the pixel reset phase, because  $M_1$ ,  $M_4$ , and  $M_5$  are closed while the other switches are open, the voltage across capacitor  $C_1$  equals

$$V_{C1} = V_{rst} - V_{ref1} - V_{os}$$
Eqn. 62

where  $V_{rst}$  is the voltage at node A during the pixel reset phase, and  $V_{os}$  is the offset of the op amp U<sub>1</sub>. After the pixel reset phase, all above switches are opened, while the remaining switches are closed. At the end of the succeeding amplification phase, voltage across C<sub>1</sub> becomes

$$V_{C1} = -V_{os}$$
 Eqn. 63

So the differential input signal amplified in the first measurement, called  $V_{in,rst}$ , is the difference between Eqn. 62 and Eqn. 63, or  $(V_{rst}-V_{ref1})$ . The value of  $V_{in,rst}$  is pixel dependent. For a fixed  $V_{ref1}$ , it includes a noise component with a mean value of 0 and a DC offset component that varies from one pixel sensor to another:

$$V_{in,rst} = V_{n,pixel} + V_{os,pixel}$$
Eqn. 64

The purpose of the first measurement in one measurement cycle is to record both components, so that their effect can be removed by subtracting the first measurement from the second one, which is performed after the pixel integration. This correlated-double-sampling process functions properly only when the sum of the two components in Eqn. 64 is very close to 0. If their total becomes greater than, say, 1 mV, the amplifier, with a gain close to 60 dB, would rail its output. So to keep Eqn. 64 close to 0,  $V_{ref1}$  has to be tuned for each pixel to cancel its particular DC offset. Because the 32 PGA channels share the same  $V_{ref1}$ , measuring the entire pixel array for one frame of image would require a total of 1,024 adjustments of  $V_{ref1}$ . This cannot be realized with the current hardware.
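The railing problem can be made concrete with a toy model of correlated double sampling through a clipping amplifier. This is only an illustrative sketch: the gain of 1000 corresponds to the 60 dB mentioned above, while the output swing of 1 V, the function names, and the example input values are hypothetical. The noise component is taken to be identical in both samples and is folded into the offset.

```python
def amplify(v_in, gain=1000.0, v_swing=1.0):
    """Ideal gain stage whose output clips at +/- v_swing (hypothetical limit)."""
    return max(-v_swing, min(v_swing, gain * v_in))


def cds_input_referred(v_offset, delta_v, gain=1000.0, v_swing=1.0):
    """Correlated double sampling: second sample minus first, referred to the input."""
    first = amplify(v_offset, gain, v_swing)              # reset-level measurement
    second = amplify(v_offset + delta_v, gain, v_swing)   # measurement after integration
    return (second - first) / gain


# Small residual offset: the 0.2 mV signal is recovered.
ok = cds_input_referred(v_offset=0.1e-3, delta_v=0.2e-3)
# Offset above 1 mV: both samples clip and the signal is lost.
railed = cds_input_referred(v_offset=1.5e-3, delta_v=0.2e-3)
print(f"recovered: {ok * 1e3:.2f} mV, with railed amplifier: {railed * 1e3:.2f} mV")
```

The subtraction cancels the offset only while both samples stay within the linear output range, which is why  $V_{ref1}$  must track each pixel's reset level.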

To demonstrate the ability of different pixels across the array to accurately measure the same chemical sample with the same lifetime, three pixels at the corner of the pixel array were used to perform the measurement, as shown in Fig. 117.



Fig. 117 The coordinates of the pixels at the corner of the pixel array

The 50  $\mu$ M/L fluorescein sodium salt dissolved in PBS was measured by the pixels mentioned above. The measurement data and the fitted exponential curves are shown in Fig. 118. As can be seen from the result, all three pixels measured closely matching lifetimes, with a standard error of less than 700 ps.



Fig. 118 The measured fluorescence lifetime data by three pixels located at the corner

To correct the pixel offset variation issue, the amplifier circuit needs to be modified. The need to adjust  $V_{ref1}$  for each individual pixel arises fundamentally because the circuit cannot make  $V_{ref1}$  track the  $V_{rst}$  of each pixel automatically. The modification shown in Fig. 119 uses a sample-and-hold circuit to dynamically set  $V_{ref1}$ , which is the non-inverting bias voltage of the op amp U<sub>1</sub>.



Fig. 119 The modified first stage PGA circuit with the dynamic biasing function



Fig. 120 The connection of the stage 1 amplifier in (a) pixel reset phase, (b) pixel integration/first A-D conversion phase, and (c) pixel output amplification phase

The operation starts from the pixel sensor reset phase, as shown in Fig. 120 (a). The reset level of the pixel output,  $V_{rst}$ , is sampled on capacitor  $C_3$ .  $U_1$  is connected with unity gain feedback, so capacitor  $C_1$  tracks the offset voltage of the op amp  $U_1$ ,  $V_{os}$ . The output in pixel reset phase equals ( $V_{rst}+V_{os}$ ).

Right after the pixel reset is completed, both capacitors  $C_1$  and  $C_3$  are disconnected from the pixel output, so the charges stored on them during the pixel reset phase are trapped. Because the output remains ( $V_{rst}+V_{os}$ ), this voltage is digitized by the ADC to record the reset level of the measurement.

The pixel sensor integrates the incident light for a time period T after the reset phase. At the end of that period the pixel output has dropped by a voltage  $\Delta V_{pix}(T)$  because of the light. Then two switch operations are performed. First, the unity gain feedback loop is opened, leaving capacitor C<sub>2</sub> to couple the inverting input and the output of  $U_1$ . Second, capacitor  $C_1$  is re-connected to the pixel output. Because the inverting input is floating, the charge  $Q_1$  that flows out of  $C_1$  has to be drawn from  $C_2$ . This drives the output to  $V_{out2}$ , labeled in Fig. 120 (c). This voltage is digitized by the ADC again, and the first data,  $V_{out1}$ , is subtracted from it.
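The charge transfer described above can be written out as a short worked step. This is a sketch under idealizing assumptions that are not stated in the text: an ideal op amp that holds the inverting node at a fixed potential, no charge injection, and  $V_{out1}$  taken as the output level recorded before  $C_1$  is re-connected. With  $Q_1 = C_1 \Delta V_{pix}(T)$ , charge conservation at the floating inverting node gives

$$C_2\,(V_{out2} - V_{out1}) = C_1\,\Delta V_{pix}(T) \quad\Rightarrow\quad V_{out2} - V_{out1} = \frac{C_1}{C_2}\,\Delta V_{pix}(T)$$

Under these assumptions the subtraction leaves only the pixel drop scaled by the capacitor ratio, with  $V_{rst}$  and  $V_{os}$  cancelled.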

Because the dynamic biasing circuit in the first stage amplifier automatically tracks the offset from pixel to pixel, there is no need to manually control the bias point for each individual pixel sensor when taking an entire frame of image. The cost of this solution is the introduction of the sampling capacitor,  $C_3$ . To cancel the charge injection effect due to switches  $M_1$  and  $M_2$ ,  $C_3$  needs to be as large as  $C_1$ , which was 1.5 pF in the design.

## **CHAPTER 6**

#### CONCLUSION

Fluorescence lifetime imaging is a promising research and application area that requires multi-disciplinary knowledge and efforts. Although Time Correlated Single Photon Counting and frequency-domain instruments have been widely adopted as the industrial standard, portable and cost-effective fluorimeters capable of measuring nanosecond fluorescence lifetime are currently unavailable in the market, and are of great interest for point-of-care diagnosis, field research, and wearable consumer electronics. Time-domain fluorimeters have been demonstrated on CMOS integrated circuits, but to the best of the author's knowledge, a CMOS active pixel sensor has not yet been used as the sensor. This work proposed a novel sensing method that integrates the fluorescence decay in different time windows to record the area underneath different parts of the decay profile. Because the electronics associated with such a sensing method no longer require fast optoelectronics and high readout bandwidth as in a SPAD-based TCSPC system, the simplest CMOS active pixel sensor can be used to improve the fill factor of the pixel array.

This project developed portable imaging devices that are application specific. Several development phases were carefully planned and performed in a step by step manner. First, a discrete-component fluorimeter was built on printed circuit boards. After it demonstrated the ability to distinguish the lifetimes of different commonly used fluorophores in the range from 4 ns to 18.8 ns, the device was improved to reach a detection limit as low as 0.25  $\mu$ M/L on fluorescein sodium salt dissolved in PBS. Next, all discrete electronics were migrated into the integrated circuit. All the necessary building blocks were designed, simulated, fabricated, and tested, and the final CMOS fluorimeter was built on a single chip with all digital control logic integrated on it. The chip was packaged and installed on printed circuit boards to realize a portable device.

The single pixel in the imager demonstrated the ability to measure both the decay time constant of the excitation pulse and the fluorescence decay of three commercial fluorophores. The measured lifetimes ranged from 3.3 ns to 19.6 ns. The fluorimeter's detection limit, measured on fluorescein sodium salt dissolved in PBS, is 50  $\mu$ M/L. When pixels located at different coordinates of the array were used, measurements performed on the same chemical sample produced matching lifetime results.

In testing the final CMOS IC using the entire pixel array, it was noticed that the amplifier's reference voltage needs to be frequently adjusted by hand in order to use the entire pixel array. Also, the gain of the column amplifier was lower than in simulation, because parasitic effects were underestimated when a single-channel layout was used for the simulation. Both issues were quantitatively studied and discussed. For each issue, either an improvement strategy was proposed or the analysis was used to better evaluate the fundamental limiting factors of the system's performance.

To eliminate the adjustment of the pixel dependent reference voltage used in the column amplifier, so that an entire frame of image can be produced automatically, it is proposed that the amplifier front-end circuit be modified by introducing a dynamic biasing circuit. The circuit uses a sample-and-hold circuit to dynamically track and hold the unique reset level of each pixel sensor connected to the amplifier. A circuit operating under this scheme processes only the drop of the pixel output, while the pixel-to-pixel offset variation is canceled.

The relation between the column amplifier's gain and the system's detection limit was mathematically modeled. The quantitative analysis was made by calculating the noise contributed by the pixel sensor, the amplifier, and the ADC, then establishing a signal-to-noise ratio limit as the criterion that sets the detection limit of the system. The model shows that, for the noise produced by the existing circuitry, the system's sensitivity was only very weakly affected by the reduced gain. This is because the measured gain was ~60 dB, which is greater than the threshold of ~40 dB below which the system's detection limit starts to degrade rapidly.

There are two mechanisms that could affect the system's sensitivity. The first one is the noise contributed by the pixel sensor and the column amplifier. It was estimated that by reducing the pixel noise power and the amplifier input referred noise power by a factor of 16, the detection limit could be reduced from the existing 50  $\mu$ M/L to 12  $\mu$ M/L. The improvement can be realized by increasing the pixel sensor's size from 20  $\mu$ m by 20  $\mu$ m to 80  $\mu$ m by 80  $\mu$ m, using a 48 $\mu$ m/16 $\mu$ m source follower current source transistor, and using a 24 pF sampling capacitor at the amplifier front-end. The second is round-off error: because the curve fitting process that extracts the lifetime is performed numerically in the computer, round-off produces errors that could cause the measured lifetime to deviate from its actual value. This effect is less critical than the noise of the measurement circuit. It is proposed that an automatic gain control circuit be implemented in the column amplifier. This will allow the signal amplitude sensed by the ADC to fit the entire conversion range, maximizing the number of signal levels that can be distinguished by the ADC.

The discrete-component fluorimeter prototype was applied to the development of optical chemical sensors for food safety at Joanneum Research, Austria. Depending on the type of chemical sensor, samples with concentrations from 0.1  $\mu$ M/L to 100  $\mu$ M/L are usually measured. The discrete-component fluorimeter is able to measure fluorescein concentrations down to 0.25  $\mu$ M/L; the CMOS IC fluorimeter can measure fluorescein at a concentration of 50  $\mu$ M/L. Their performance is reasonably useful in the concentration range of interest given above. The devices not only provide a sufficient detection limit for research purposes; their low cost and portability also appeal to scientists in real biochemical research.

In conclusion, the CMOS IC fluorimeter was physically designed, simulated, fabricated, and tested. The measurements made by the IC demonstrated the viability of using conventional CMOS active pixel sensors to perform nanosecond-scale fluorescence lifetime measurement on real chemicals. Compared to more complicated pixels, such as SPAD sensors, Drain-Only-Modulation sensors, and Differential sensors, the adoption of conventional active pixel sensors maximized the fill factor of the sensor, lowered its cost, and reduced its operating supply voltage. Compared to the industry-level TCSPC and frequency-domain phase instruments, the fluorimeter built on an IC is portable, low cost, and can be readily used for field research and consumer electronics.

### BIBLIOGRAPHY

- R. F. M. de Almeida, L. M. S. Loura, and M. Prieto, "Membrane lipid domains and rafts: current applications of fluorescence lifetime spectroscopy and imaging," *Chem. Phys. Lipids*, vol. 157, no. 2, pp. 61–77, Feb. 2009.
- [2] C. Eggeling, J. R. Fries, L. Brand, R. Günther, and C. a. M. Seidel, "Monitoring conformational dynamics of a single molecule by selective fluorescence spectroscopy," *Proc. Natl. Acad. Sci.*, vol. 95, no. 4, pp. 1556–1561, Feb. 1998.
- [3] M. Elangovan, R. N. Day, and A. Periasamy, "Nanosecond fluorescence resonance energy transfer-fluorescence lifetime imaging microscopy to localize the protein interactions in a single living cell," J. Microsc., vol. 205, no. 1, pp. 3–14, Jan. 2002.
- [4] J. Goedhart, L. van Weeren, M. A. Hink, N. O. E. Vischer, K. Jalink, and T. W. J. Gadella, "Bright cyan fluorescent protein variants identified by fluorescence lifetime screening," *Nat. Methods*, vol. 7, no. 2, pp. 137–139, Feb. 2010.
- [5] J. R. Lakowicz, H. Szmacinski, K. Nowaczyk, and M. L. Johnson, "Fluorescence lifetime imaging of free and protein-bound NADH," *Proc. Natl. Acad. Sci.*, vol. 89, no. 4, pp. 1271–1275, Feb. 1992.
- [6] E. Kuwana, F. Liang, and E. M. Sevick-Muraca, "Fluorescence Lifetime Spectroscopy of a pH-Sensitive Dye Encapsulated in Hydrogel Beads," *Biotechnol. Prog.*, vol. 20, no. 5, pp. 1561–1566, Jan. 2004.
- [7] E. Kuwana and E. M. Sevick-Muraca, "Fluorescence Lifetime Spectroscopy for pH Sensing in Scattering Media," *Anal. Chem.*, vol. 75, no. 16, pp. 4325–4329, Aug. 2003.
- [8] J. R. Lakowicz and H. Szmacinski, "Fluorescence lifetime-based sensing of pH, Ca2+, K+ and glucose," Sens. Actuators B Chem., vol. 11, no. 1–3, pp. 133–143, Mar. 1993.
- [9] H.-J. Lin, P. Herman, J. S. Kang, and J. R. Lakowicz, "Fluorescence Lifetime Characterization of Novel Low-pH Probes," *Anal. Biochem.*, vol. 294, no. 2, pp. 118– 125, Jul. 2001.

- [10] T. Saxl, F. Khan, D. R. Matthews, Z.-L. Zhi, O. Rolinski, S. Ameer-Beg, and J. Pickup, "Fluorescence lifetime spectroscopy and imaging of nano-engineered glucose sensor microcapsules based on glucose/galactose-binding protein," *Biosens. Bioelectron.*, vol. 24, no. 11, pp. 3229–3234, Jul. 2009.
- [11] J. R. Lakowicz, *Principles of Fluorescence Spectroscopy*. Springer Science & Business Media, 2007.
- [12] N. Boens, W. Qin, N. Basarić, J. Hofkens, M. Ameloot, J. Pouget, J.-P. Lefèvre, B. Valeur, E. Gratton, M. vandeVen, N. D. Silva, Y. Engelborghs, K. Willaert, A. Sillen, G. Rumbles, D. Phillips, A. J. W. G. Visser, A. van Hoek, J. R. Lakowicz, H. Malak, I. Gryczynski, A. G. Szabo, D. T. Krajcarski, N. Tamai, and A. Miura, "Fluorescence Lifetime Standards for Time and Frequency Domain Fluorescence Spectroscopy," *Anal. Chem.*, vol. 79, no. 5, pp. 2137–2149, Mar. 2007.
- W. Becker, A. Bergmann, M. a. Hink, K. König, K. Benndorf, and C. Biskup,
   "Fluorescence lifetime imaging by time-correlated single-photon counting," *Microsc. Res. Tech.*, vol. 63, no. 1, pp. 58–66, Jan. 2004.
- [14] X. F. Wang, T. Uchida, D. M. Coleman, and S. Minami, "A Two-Dimensional Fluorescence Lifetime Imaging System Using a Gated Image Intensifier," *Appl. Spectrosc.*, vol. 45, no. 3, pp. 360–366, Mar. 1991.
- [15] K. Dowling, S. C. W. Hyde, J. C. Dainty, P. M. W. French, and J. D. Hares, "2-D fluorescence lifetime imaging using a time-gated image intensifier," *Opt. Commun.*, vol. 135, no. 1–3, pp. 27–31, Feb. 1997.
- [16] A. C. Mitchell, J. E. Wall, J. G. Murray, and C. G. Morgan, "Measurement of nanosecond time-resolved fluorescence with a directly gated interline CCD camera," *J. Microsc.*, vol. 206, no. 3, pp. 233–238, Jun. 2002.
- [17] A. C. Mitchell, S. Dad, and C. G. Morgan, "Selective detection of luminescence from semiconductor quantum dots by nanosecond time-gated imaging with a colourmasked CCD detector," *J. Microsc.*, vol. 230, no. 2, pp. 172–176, May 2008.

- [18] Sytsma, Vroom, De Grauw, and Gerritsen, "Time-gated fluorescence lifetime imaging and microvolume spectroscopy using two-photon excitation," *J. Microsc.*, vol. 191, no. 1, pp. 39–51, Jul. 1998.
- [19] E. P. Buurman, R. Sanders, A. Draaijer, H. C. Gerritsen, J. J. F. van Veen, P. M. Houpt, and Y. K. Levine, "Fluorescence lifetime imaging using a confocal laser scanning microscope," *Scanning*, vol. 14, no. 3, pp. 155–159, Jan. 1992.
- [20] W. Becker, Advanced Time-Correlated Single Photon Counting Techniques. Springer Science & Business Media, 2005.
- [21] R. D. Spencer and G. Weber, "Measurements of Subnanosecond Fluorescence Lifetimes with a Cross-Correlation Phase Fluorometer\*," *Ann. N. Y. Acad. Sci.*, vol. 158, no. 1, pp. 361–376, May 1969.
- [22] D. M. Jameson, E. Gratton, and R. D. Hall, "The Measurement and Analysis of Heterogeneous Emissions by Multifrequency Phase and Modulation Fluorometry," *Appl. Spectrosc. Rev.*, vol. 20, no. 1, pp. 55–106, Jan. 1984.
- [23] E. Gratton, D. M. Jameson, and R. D. Hall, "Multifrequency Phase and Modulation Fluorometry," *Annu. Rev. Biophys. Bioeng.*, vol. 13, no. 1, pp. 105–124, 1984.
- [24] P. Herman, B. P. Maliwal, H. J. Lin, and J. R. Lakowicz, "Frequency-domain fluorescence microscopy with the LED as a light source," *J. Microsc.*, vol. 203, no. Pt 2, pp. 176–181, Aug. 2001.
- [25] J. Kissinger and D. Wilson, "Portable Fluorescence Lifetime Detection for Chlorophyll Analysis in Marine Environments," *IEEE Sens. J.*, vol. 11, no. 2, pp. 288–295, 2011.
- [26] J. R. Lakowicz, G. Laczko, and I. Gryczynski, "2 GHz frequency domain fluorometer," *Rev. Sci. Instrum.*, vol. 57, no. 10, pp. 2499–2506, Oct. 1986.
- [27] G. Laczko, I. Gryczynski, Z. Gryczynski, W. Wiczk, H. Malak, and J. R. Lakowicz, "A 10 - GHz frequency - domain fluorometer," *Rev. Sci. Instrum.*, vol. 61, no. 9, pp. 2331–2337, Sep. 1990.

- [28] P. Herman and J. Vecer, "Frequency domain fluorometry with pulsed lightemitting diodes," *Ann. N. Y. Acad. Sci.*, vol. 1130, pp. 56–61, 2008.
- [29] E. Gratton, D. M. Jameson, N. Rosato, and G. Weber, "Multifrequency cross correlation phase fluorometer using synchrotron radiation," *Rev. Sci. Instrum.*, vol. 55, no. 4, pp. 486–494, Apr. 1984.
- [30] L. Hundley, T. Coburn, E. Garwin, and L. Stryer, "Nanosecond Fluorimeter," *Rev. Sci. Instrum.*, vol. 38, no. 4, pp. 488–492, Apr. 1967.
- [31] K. Dowling, S. C. W. Hyde, J. C. Dainty, P. M. W. French, and J. D. Hares, "2-D fluorescence lifetime imaging using a 10-kHz/150-ps gated image intensifier," in, *Summaries of papers presented at the Conference on Lasers and Electro-Optics*, 1996. CLEO '96, 1996, pp. 156–157.
- [32] M. Straub and S. W. Hell, "Fluorescence lifetime three-dimensional microscopy with picosecond precision using a multifocal multiphoton microscope," *Appl. Phys. Lett.*, vol. 73, no. 13, pp. 1769–1771, Sep. 1998.
- [33] S. E. D. Webb, Y. Gu, S. Lévêque-Fort, J. Siegel, M. J. Cole, K. Dowling, R. Jones, P. M. W. French, M. a. A. Neil, R. Juškaitis, L. O. D. Sucharov, T. Wilson, and M. J. Lever, "A wide-field time-domain fluorescence lifetime imaging microscope with optical sectioning," *Rev. Sci. Instrum.*, vol. 73, no. 4, pp. 1898–1907, Apr. 2002.
- [34] D. S. Elson, I. Munro, J. Requejo-Isidro, J. McGinty, C. Dunsby, N. Galletly, G. W. Stamp, M. a. A. Neil, M. J. Lever, P. A. Kellett, A. Dymoke-Bradshaw, J. Hares, and P. M. W. French, "Real-time time-domain fluorescence lifetime imaging including single-shot acquisition with a segmented optical image intensifier," *New J. Phys.*, vol. 6, no. 1, p. 180, Nov. 2004.
- [35] L. Liu, Y. Li, L. Sun, H. Li, X. Peng, and J. Qu, "Fluorescence lifetime imaging microscopy using a streak camera," 2014, vol. 8948, p. 89482L–89482L–5.
- [36] A. G. Ryder, S. Power, T. J. Glynn, and J. J. Morrison, "Time-domain measurement of fluorescence lifetime variation with pH," in *Biomarkers and Biological Spectral Imaging*, 2001, vol. 4259, pp. 102–109.

- [37] W. R. Gruber, P. O'Leary, and O. S. Wolfbeis, "Detection of fluorescence lifetime based on solid state technology and its application to optical oxygen sensing," 1995, vol. 2388, pp. 148–158.
- [38] C. D. Salthouse, R. Weissleder, and U. Mahmood, "Development of a Time Domain Fluorimeter for Fluorescent Lifetime Multiplexing Analysis," *IEEE Trans. Biomed. Circuits Syst.*, vol. 2, no. 3, pp. 204–211, Sep. 2008.
- [39] D. Stoppa, F. Borghetti, J. Richardson, R. Walker, L. Grant, R. K. Henderson, M. Gersbach, and E. Charbon, "A 32x32-pixel array with in-pixel photon counting and arrival time measurement in the analog domain," in *Proceedings of ESSCIRC*, 2009. *ESSCIRC* '09, 2009, pp. 204–207.
- [40] C. Niclass, C. Favi, T. Kluter, M. Gersbach, and E. Charbon, "A 128×128 Single-Photon Imager with on-Chip Column-Level 10b Time-to-Digital Converter Array Capable of 97ps Resolution," in *Solid-State Circuits Conference*, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International, 2008, pp. 44–594.
- [41] C. Niclass, C. Favi, T. Kluter, M. Gersbach, and E. Charbon, "A 128 128 Single-Photon Image Sensor With Column-Level 10-Bit Time-to-Digital Converter Array," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2977–2989, Dec. 2008.
- [42] C. Veerappan, J. Richardson, R. Walker, D.-U. Li, M. W. Fishburn, Y. Maruyama, D. Stoppa, F. Borghetti, M. Gersbach, R. K. Henderson, and E. Charbon, "A 160 #x00D7;128 single-photon image sensor with on-pixel 55ps 10b time-to-digital converter," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, 2011 IEEE International, 2011, pp. 312–314.
- [43] R. M. Field, S. Realov, and K. L. Shepard, "A 100 fps, Time-Correlated Single-Photon-Counting-Based Fluorescence-Lifetime Imager in 130 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 49, no. 4, pp. 867–880, Apr. 2014.
- [44] D. E. Schwartz, E. Charbon, and K. L. Shepard, "A Single-Photon Avalanche Diode Imager for Fluorescence Lifetime Applications," in 2007 IEEE Symposium on VLSI Circuits, 2007, pp. 144–145.
- [45] D. E. Schwartz, E. Charbon, and K. L. Shepard, "A Single-Photon Avalanche Diode Array for Fluorescence Lifetime Imaging Microscopy," *IEEE J. Solid-State Circuits*, vol. 43, no. 11, pp. 2546–2557, Nov. 2008.

- [46] C. Harris and B. Selinger, "Single-Photon Decay Spectroscopy. II. The Pile-up Problem," *Aust. J. Chem.*, vol. 32, no. 10, pp. 2111–2129, Jan. 1979.
- [47] J. Arlt, D. Tyndall, B. R. Rae, D. D.-U. Li, J. A. Richardson, and R. K. Henderson, "A study of pile-up in integrated time-correlated single photon counting systems," *Rev. Sci. Instrum.*, vol. 84, no. 10, p. 103105, Oct. 2013.
- [48] F. Borghetti et al., "a cmos single-photon avalanche diode sensor for fluorescence lifetime imaging - Google Search," in *Proc. Int. Image Sensor Workshop*, 2007, pp. pp. 250–253.
- [49] S. Burri, Y. Maruyama, X. Michalet, F. Regazzoni, C. Bruschini, and E. Charbon, "Architecture and applications of a high resolution gated SPAD image sensor," *Opt. Express*, vol. 22, no. 14, p. 17573, Jul. 2014.
- [50] B. R. Rae, C. Griffin, K. R. Muir, J. M. Girkin, E. Gu, D. R. Renshaw, E. Charbon, M. D. Dawson, and R. K. Henderson, "A Microsystem for Time-Resolved Fluorescence Analysis using CMOS Single-Photon Avalanche Diodes and Micro-LEDs," in *Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International*, 2008, pp. 166–603.
- [51] D. Stoppa, D. Mosconi, L. Pancheri, and L. Gonzo, "Single-Photon Avalanche Diode CMOS Sensor for Time-Resolved Fluorescence Measurements," *IEEE Sens. J.*, vol. 9, no. 9, pp. 1084–1090, Sep. 2009.
- [52] F. Guerrieri, S. Tisa, A. Tosi, and F. Zappa, "Two-Dimensional SPAD Imaging Camera for Photon Counting," *IEEE Photonics J.*, vol. 2, no. 5, pp. 759–774, Oct. 2010.
- [53] D. Chitnis and S. Collins, "Compact readout circuits for SPAD arrays," in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 357–360.
- [54] E. Panina, G.-F. Dalla Betta, L. Pancheri, and D. Stoppa, "Design of CMOS Gated Analog Readout Circuits for SPAD Pixel Arrays," in *Research in Microelectronics and Electronics (PRIME)*, 2012 8th Conference on Ph.D, 2012, pp. 1–4.

- [55] L. Pancheri, N. Massari, and D. Stoppa, "SPAD Image Sensor With Analog Counting Pixel for Time-Resolved Fluorescence Detection," *IEEE Trans. Electron Devices*, vol. 60, no. 10, pp. 3442–3449, Oct. 2013.
- [56] E. Panina, L. Pancheri, G.-F. Dalla Betta, N. Massari, and D. Stoppa, "Compact CMOS Analog Counter for SPAD Pixel Arrays," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 61, no. 4, pp. 214–218, Apr. 2014.
- [57] A. El Gamal and H. Eltoukhy, "CMOS image sensors," *IEEE Circuits Devices Mag.*, vol. 21, no. 3, pp. 6–20, May 2005.
- [58] J. Genoe, D. Coppee, J. H. Stiens, R. A. Vonekx, and M. Kuijk, "Calculation of the current response of the spatially modulated light CMOS detector," *IEEE Trans. Electron Devices*, vol. 48, no. 9, pp. 1892–1902, Sep. 2001.
- [59] T. Huang, S. Sorgenfrei, K. L. Shepard, P. Gong, and R. Levicky, "A CMOS Array Sensor for Sub-800-ps Time-Resolved Fluorescence Detection," in *IEEE Custom Integrated Circuits Conference*, 2007. CICC '07, 2007, pp. 829–832.
- [60] T. D. Huang, S. Sorgenfrei, P. Gong, R. Levicky, and K. L. Shepard, "A 0.18-μm CMOS Array Sensor for Integrated Time-Resolved Fluorescence Detection," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1644–1654, May 2009.
- [61] T. D. Huang, S. Paul, P. Gong, R. Levicky, J. Kymissis, S. A. Amundson, and K. L. Shepard, "Gene expression analysis with an integrated CMOS microarray by time-resolved fluorescence detection," *Biosens. Bioelectron.*, vol. 26, no. 5, pp. 2660–2665, Jan. 2011.
- [62] H.-J. Yoon, S. Itoh, and S. Kawahito, "A CMOS Image Sensor With In-Pixel Two-Stage Charge Transfer for Fluorescence Lifetime Imaging," *IEEE Trans. Electron Devices*, vol. 56, no. 2, pp. 214–221, Feb. 2009.
- [63] Z. Li, S. Kawahito, K. Yasutomi, K. Kagawa, J. Ukon, M. Hashimoto, and H. Niioka, "A Time-Resolved CMOS Image Sensor With Draining-Only Modulation Pixels for Fluorescence Lifetime Imaging," *IEEE Trans. Electron Devices*, vol. 59, no. 10, pp. 2715–2722, Oct. 2012.

- [64] G. Patounakis, K. L. Shepard, and R. Levicky, "Active CMOS Array Sensor for Time-Resolved Fluorescence Detection," *IEEE J. Solid-State Circuits*, vol. 41, no. 11, pp. 2521–2530, Nov. 2006.
- [65] G.-C. Wang, "Timing Optimization of Solid-State Photomultiplier Based PET Detectors," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 1, pp. 25–30, Feb. 2010.
- [66] H. Wang, Y. Qi, T. J. Mountziaris, and C. D. Salthouse, "A portable time-domain LED fluorimeter for nanosecond fluorescence lifetime measurements," *Rev. Sci. Instrum.*, vol. 85, no. 5, p. 055003, May 2014.
- [67] Terpetschnig, "Fluorescence Lifetime," ISS Technical Notes, 2013. [Online]. Available: http://www.iss.com/resources/research/technical\_notes/K2CH\_FLT.html
- [68] S. H. Ko, K. Du, and J. A. Liddle, "Quantum-Dot Fluorescence Lifetime Engineering with DNA Origami Constructs," *Angew. Chem. Int. Ed.*, vol. 52, no. 4, pp. 1193–1197, 2013.
- [69] J. Kissinger and D. Wilson, "Portable Fluorescence Lifetime Detection for Chlorophyll Analysis in Marine Environments," *IEEE Sens. J.*, vol. 11, no. 2, pp. 288–295, 2011.
- [70] ISS, Inc., "Lifetime Data of Selected Fluorophores," 2012.
- [71] "Active pixel sensors: are CCDs dinosaurs?," SPIE Proceedings. [Online]. Available: http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1008573. [Accessed: 10-Jan-2016].
- [72] O. Yadid-Pecht and R. Etienne-Cummings, *CMOS Imagers: From Phototransduction to Image Processing*. Springer Science & Business Media, 2007.
- [73] O. Yadid-Pecht, R. Ginosar, and Y. Shacham-Diamand, "A random access photodiode array for intelligent image capture," in *Proceedings of the 17th Convention of Electrical and Electronics Engineers in Israel*, 1991, pp. 301–304.

- [74] O. Yadid-Pecht, B. Pain, C. Staller, C. Clark, and E. Fossum, "CMOS active pixel sensor star tracker with regional electronic shutter," *IEEE J. Solid-State Circuits*, vol. 32, no. 2, pp. 285–288, Feb. 1997.
- [75] A. Krymski, D. Van Blerkom, A. Andersson, N. Bock, B. Mansoorian, and E. R. Fossum, "A high speed, 500 frames/s, 1024×1024 CMOS active pixel sensor," in *1999 Symposium on VLSI Circuits, Digest of Technical Papers*, 1999, pp. 137–138.
- [76] D. Lee, K. Cho, D. Kim, and G. Han, "Low-Noise In-Pixel Comparing Active Pixel Sensor Using Column-Level Single-Slope ADC," *IEEE Trans. Electron Devices*, vol. 55, no. 12, pp. 3383–3388, Dec. 2008.
- [77] F. Xiao, J. E. Farrell, P. B. Catrysse, and B. Wandell, "Mobile Imaging: The Big Challenge of the Small Pixel."
- [78] S. Li and C. Salthouse, "Digital-to-time converter for fluorescence lifetime imaging," in *2013 IEEE International Instrumentation and Measurement Technology Conference (I2MTC)*, 2013, pp. 894–897.
- [79] R. Gregorian, "High-resolution switched-capacitor D/A converter," *Microelectron. J.*, vol. 12, no. 2, pp. 10–13, Mar. 1981.
- [80] C. C. Enz and G. C. Temes, "Circuit techniques for reducing the effects of opamp imperfections: autozeroing, correlated double sampling, and chopper stabilization," *Proc. IEEE*, vol. 84, no. 11, pp. 1584–1614, Nov. 1996.
- [81] B. Razavi, *Design of Analog CMOS Integrated Circuits*, 1st ed. Boston, MA: McGraw-Hill Education, 2000.
- [82] H. Wang, "A Column-Parallel Two-Step Successive Approximation Analog-To-Digital Converter," *Masters Theses 1896 - February 2014*, Jan. 2013.
- [83] H. Wang and C. D. Salthouse, "22 μm-Pitch 9-bit Column-Parallel Overlapping-Subrange SAR (CPOSSAR) ADC," *Microelectron. J.*

- [84] M. Furuta, Y. Nishikawa, T. Inoue, and S. Kawahito, "A High-Speed, High-Sensitivity Digital CMOS Image Sensor With a Global Shutter and 12-bit Column-Parallel Cyclic A/D Converters," *IEEE J. Solid-State Circuits*, vol. 42, no. 4, pp. 766–774, Apr. 2007.
- [85] S. Matsuo, T. J. Bales, M. Shoda, S. Osawa, K. Kawamura, A. Andersson, M. Haque, H. Honda, B. Almond, Y. Mo, J. Gleason, T. Chow, and I. Takayanagi, "8.9-Megapixel Video Image Sensor With 14-b Column-Parallel SA-ADC," *IEEE Trans. Electron Devices*, vol. 56, no. 11, pp. 2380–2389, Nov. 2009.
- [86] R. Kuramoto, "Xilinx In-System Programming Using an Embedded Microcontroller," Xilinx Application Note XAPP058 (v4.1), Mar. 6, 2009.
- [87] J. A. Russell, K. R. Diamond, T. J. Collins, H. F. Tiedje, J. E. Hayward, T. J. Farrell, M. S. Patterson, and Q. Fang, "Characterization of Fluorescence Lifetime of Photofrin and Delta-Aminolevulinic Acid Induced Protoporphyrin IX in Living Cells Using Single- and Two-Photon Excitation," *IEEE J. Sel. Top. Quantum Electron.*, vol. 14, no. 1, pp. 158–166, Jan. 2008.