A 128x128 Single-Photon Imager with on-Chip Column-Level 10bit Time-to-Digital-Converter Array by Niclass, Cristiano et al.
3 •  2008 IEEE International Solid-State Circuits Conference
ISSCC 2008 / SESSION 2 / IMAGE SENSORS & TECHNOLOGY / 2.1
2.1 A 128×128 Single-Photon Imager with on-Chip 
Column-Level 10b Time-to-Digital Converter Array 
Capable of 97ps Resolution
Cristiano Niclass, Claudio Favi, Theo Kluter, Marek Gersbach, 
Edoardo Charbon
Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Time-resolved optical imaging has many uses in physics, molecular
biology, medical sciences and computer vision, just to name a few.
Deep sub-nanosecond timing resolution, in combination with high
sensitivity, is becoming increasingly important in a number of imaging
methods. Non-solid-state devices enabling picosecond time resolution,
such as photomultiplier tubes and microchannel plates, have
existed for decades. However, cost and size have limited their use to
low-scale and scientific applications. In solid-state technology, single-
photon avalanche diodes (SPADs) have become the alternative of
choice [1]. Recently, SPADs have become even more compelling with the
emergence of CMOS implementations [2] and the appearance of multi-pixel
designs [3]-[6]. 
As array sizes grow, however, it has become increasingly hard to
process the massive volumes of data generated by SPAD pixels. To
address this issue, hybrid systems have been proposed that combine
advanced CMOS technologies with processes designed to optimize
SPAD performance [7]. The main limitation of this approach is the
increased complexity of fabrication and, possibly, higher costs. Analog
design techniques have also been used to evaluate the photon’s time-
of-arrival (TOA) on-chip [5]. However, increased pixel size and poten-
tially complex schemes to compensate for temperature and technolog-
ical variability are often needed. 
We present an array of 128×128 highly miniaturized SPAD pixels
with a bank of 32 time-to-digital converters (TDCs) on chip. The block
diagram of the system is shown in Fig. 2.1.1. A decoder selects a 128-
pixel row. Every group of 4 pixels in the row shares a TDC based on
an event-driven mechanism similar to [4]. As a result, row-wise par-
allel acquisition is obtained with a low number of TDCs. Thanks to
the outstanding timing precision of SPADs and an optimized TDC
design, a typical resolution of 97ps is achieved within a range of
100ns (10b) at a maximum rate of 10MS/s per TDC. The TDC bank
exhibits a DNL of 0.08LSB and an INL of 1.89LSB. 
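The sharing of one TDC among four pixels can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the authors' circuit: the names `first_in_take_all` and `convert_row`, the use of `None` for "no photon detected", and the tie-breaking rule are all hypothetical. It assumes only that the TDC of a group converts the earliest STOP event among its four pixels in a given cycle.

```python
from typing import List, Optional, Sequence, Tuple

PIXELS_PER_ROW = 128
PIXELS_PER_TDC = 4
NUM_TDCS = PIXELS_PER_ROW // PIXELS_PER_TDC  # 32 TDCs, as stated in the text


def first_in_take_all(
    arrivals: Sequence[Optional[float]],
) -> Optional[Tuple[int, float]]:
    """Return (pixel index within the group, arrival time) of the earliest event.

    `arrivals` holds one entry per pixel of a 4-pixel group; None means the
    pixel detected nothing in this cycle. Ties go to the lowest pixel index,
    which is an arbitrary choice of this model.
    """
    if len(arrivals) != PIXELS_PER_TDC:
        raise ValueError("expected one arrival entry per pixel in the group")
    events = [(t, i) for i, t in enumerate(arrivals) if t is not None]
    if not events:
        return None  # no pixel fired, so the shared TDC stays idle
    t, i = min(events)
    return i, t


def convert_row(
    row_arrivals: Sequence[Optional[float]],
) -> List[Optional[Tuple[int, float]]]:
    """Apply the sharing rule to all 32 groups of one selected 128-pixel row."""
    if len(row_arrivals) != PIXELS_PER_ROW:
        raise ValueError("expected one arrival entry per pixel in the row")
    return [
        first_in_take_all(row_arrivals[g * PIXELS_PER_TDC:(g + 1) * PIXELS_PER_TDC])
        for g in range(NUM_TDCS)
    ]
```

For example, `first_in_take_all([None, 12.5, 3.0, None])` selects pixel 2 of the group, since its event arrives first; the later event on pixel 1 is dropped for that cycle.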
Figure 2.1.2 (a) shows the pixel schematics based on a configuration
with seven NMOS transistors. The SPAD, implemented as a p+/p-
well/deep n-well junction, is based on [4], where detailed device char-
acterization is reported. The breakdown voltage VBD of the SPAD in
this design is 17.7V. At its cathode, a bias voltage of 21V is applied in
order to operate with an excess bias voltage VE of 3.3V. A row selec-
tion transistor (M1) decouples the SPAD anode from pixels in the
columns that are not selected. At the selected row, the SPAD is
charged as a result of its anode being connected to ground via
quenching/recharge transistor M2. Transistors M3, M4, and M6 oper-
ate as switches to set VGS of M2 to either VQCH or VRCH in order to
quench the avalanche or recharge the SPAD. M5 is used as a capaci-
tor to reduce the effect of switching noise on VQCH caused by charge
injection from the gate of M2. M7 is the pixel output transistor. It
operates as a pull-down for the column line when a photon is detect-
ed. The column line potential is kept high by pull-up transistors at
the bottom of the column. 
Figure 2.1.2 (b) shows a simplified block diagram of the TDC. Time-
to-digital conversion is obtained as a result of an interpolation of
three delay measurements: coarse, medium, and fine. The main TDC
structure is similar to [8]. Nonetheless, further improvements have
been implemented to reduce the silicon area and perform flash con-
version, thus increasing throughput. Each TDC has an independent
controller that is used as a time interpolator, to generate internal sig-
nals and to control its operating mode. Each controller also manages
the interface with the global readout circuit and the column circuit-
ry. A global master DLL generates 16 uniformly spaced phases
PHI[15:0] based on a global CLK/START signal. Example waveforms
for CLK/START and PHI are shown in Fig. 2.1.2(c). The input fre-
quency of the master DLL is typically 40MHz, thus τC is 25ns for time
interval measurements. The time separation between two successive
phases sets τM to 1.5625ns. 
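The coarse and medium time steps quoted above follow directly from the clock frequency and the number of DLL phases. The sketch below reproduces that arithmetic; the function names are hypothetical, and aligning PHI[0] with the rising edge of CLK/START is an assumption of this model rather than something the text specifies.

```python
from typing import List, Tuple


def dll_timing(clk_hz: float, num_phases: int = 16) -> Tuple[float, float]:
    """Return (tau_C, tau_M) in seconds for an ideal master DLL."""
    tau_c = 1.0 / clk_hz        # coarse step: one clock period
    tau_m = tau_c / num_phases  # medium step: spacing between adjacent phases
    return tau_c, tau_m


def phase_edges(clk_hz: float, num_phases: int = 16) -> List[float]:
    """Ideal edge offsets of PHI[0..num_phases-1] within one clock period.

    Assumes PHI[0] is aligned with the clock edge (a modeling assumption).
    """
    _, tau_m = dll_timing(clk_hz, num_phases)
    return [k * tau_m for k in range(num_phases)]


tau_c, tau_m = dll_timing(40e6)
print(f"tau_C = {tau_c * 1e9:.4f} ns, tau_M = {tau_m * 1e9:.4f} ns")
```

At 40MHz this gives a 25ns clock period divided into sixteen 1.5625ns phase intervals.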
The TDC supports two main operating modes: (i) a measurement
mode and (ii) a calibration mode. In measurement mode, the STOP
signal originated by the first of four SPADs that detects a photon is
mapped to signal TRG in the column-level TDC. A 2b counter clocked by
signal CLK delivers coarse resolution τC for a measurement range of
4τC. Medium resolution τM is achieved by finding that pair of global
phases PHI which delimits the transition of TRG. The register used
for τM also generates a synchronization signal SYNC that is precise-
ly asserted on the second phase transition following the rising edge
of TRG. The time delay between TRG and SYNC is measured by
means of a 32-tap delay line and register, which are designed based
on the TDC core of [9]. In the TDC interpolator, the three measure-
ments, delivering 2, 4, and 4 bits of resolution, are combined into the
total time delay code. Only 16 delay cells (4b) of the fine delay line
are used for the final result. The remaining delay cells are added to
accommodate timing shifts due to process, voltage or temperature
variations.
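The way the 2b, 4b, and 4b fields add up to a 10b code can be sketched with an idealized quantizer. This is a simplified model, not the interpolator described above: it treats the three fields as a plain binary-weighted split of the delay, takes the fine step as τM/16 (consistent with 16 fine cells spanning one medium step), and ignores the SYNC alignment and the spare delay cells. The names `encode`, `combine`, and `decode_ps` are hypothetical.

```python
from typing import Tuple

TAU_C_PS = 25_000.0       # coarse step: one 40 MHz clock period, in ps
TAU_M_PS = TAU_C_PS / 16  # medium step: 1562.5 ps
TAU_F_PS = TAU_M_PS / 16  # fine step: 97.65625 ps (assumed tau_M / 16)


def encode(t_ps: float) -> Tuple[int, int, int]:
    """Split a delay into ideal (coarse, medium, fine) fields of 2, 4 and 4 bits."""
    if not 0 <= t_ps < 4 * TAU_C_PS:
        raise ValueError("delay outside the 4*tau_C measurement range")
    code = min(int(t_ps // TAU_F_PS), 1023)  # 10-bit code in units of tau_F
    return code >> 8, (code >> 4) & 0xF, code & 0xF


def combine(coarse: int, medium: int, fine: int) -> int:
    """Concatenate the three fields into one 10-bit code."""
    return (coarse << 8) | (medium << 4) | fine


def decode_ps(code: int) -> float:
    """Convert a 10-bit code back to a delay in picoseconds."""
    return code * TAU_F_PS


fields = encode(30_000.0)
print(fields, combine(*fields), decode_ps(combine(*fields)))
```

In this model the full-scale code 1023 corresponds to 4τC minus one fine step, which matches the 100ns range over 10 bits.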
In calibration mode, the TDC utilizes the full 32-tap fine delay line
as a local DLL that locks to two non-successive phases (PHI[i] and
PHI[i+2]) with a total duration of 2τM, thus generating a fine resolution
τF of 2τM/32 = 97.66ps. The analog control voltage of the DLL is stored on
a local capacitor. Since calibration is performed individually in each
TDC, matching requirements between TDCs may be relaxed. The
output of all TDCs is transferred off-chip via a fast global readout cir-
cuit consisting of 32 TDC interface blocks, a configuration/testing
JTAG controller and a pipelined time-multiplexer readout chain. The
readout circuit controls 8 digital 12b output buses. Each bus provides
the 10b TDC data and 2b column address to identify the originator of
the STOP signal among four pixels. In order to maximize data rate,
the readout circuit operates four times faster than the TDC frequen-
cy. To reduce power consumption, IO pads only change state when
valid data are available in a readout cycle. The readout circuit also
provides configuration/testing measures to read and modify most
TDC and readout-circuit registers via an integrated JTAG controller.
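The 12b bus word described above can be illustrated with a pack/unpack sketch. The text gives only the field widths (10b TDC data, 2b pixel address); the bit ordering used here (TDC code in the upper bits) and the mapping from TDC index and address to a column number are assumptions made for illustration, and all function names are hypothetical.

```python
from typing import Tuple


def pack_word(tdc_code: int, pixel_addr: int) -> int:
    """Pack a 10-bit TDC code and a 2-bit pixel address into a 12-bit word.

    Bit layout (assumed, not given in the source): [11:2] = code, [1:0] = address.
    """
    if not 0 <= tdc_code < 1024:
        raise ValueError("TDC code must fit in 10 bits")
    if not 0 <= pixel_addr < 4:
        raise ValueError("pixel address must fit in 2 bits")
    return (tdc_code << 2) | pixel_addr


def unpack_word(word: int) -> Tuple[int, int]:
    """Recover (tdc_code, pixel_addr) from a 12-bit word."""
    if not 0 <= word < 4096:
        raise ValueError("word must fit in 12 bits")
    return word >> 2, word & 0x3


def pixel_column(tdc_index: int, pixel_addr: int) -> int:
    """Column of the originating pixel, assuming TDC k serves columns 4k..4k+3."""
    if not 0 <= tdc_index < 32:
        raise ValueError("TDC index must be in 0..31")
    return 4 * tdc_index + pixel_addr
```

A receiver such as an FPGA would apply `unpack_word` to each valid bus word and use the address field to attribute the time stamp to one of the four pixels sharing that TDC.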
The chip micrograph is shown in Fig. 2.1.3 along with a detail of the
pixel. The pixel measures 25×25μm². The sensor was tested in three
steps. First, the TDCs were characterized separately. Second, the
TDC array was operated in measurement mode and connected with
the SPAD array when exposed to ambient light. The performance of
the TDC bank is summarized in Fig. 2.1.4. The figure shows the
worst-case DNL and INL measured over the entire bank over sever-
al days, to verify the effectiveness of calibration over temperature
and technological variations. Finally, the chip was exposed to direct
pulsed laser illumination generated by a 637nm solid-state laser
source. The pulses were 80ps wide with a repetition rate of 40MHz.
The power of the laser was adjusted to minimize pile-up distortion
and a TOA histogram was built for each pixel. The resulting jitter
measurements, along with the dark count rate (DCR), are shown in
Fig. 2.1.5.
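Building a per-pixel TOA histogram from TDC codes and reading a width from it can be sketched as follows. This is a minimal illustration, not the authors' analysis procedure: the helper names are hypothetical, the width estimate is a crude bin-count FWHM without sub-bin interpolation, and it assumes a single peak and a nominal 97.66ps LSB. Any width obtained this way would also include the laser pulse width.

```python
from collections import Counter
from typing import Iterable

LSB_PS = 97.66  # nominal TDC resolution from the text


def toa_histogram(codes: Iterable[int]) -> Counter:
    """Count how often each TDC code occurred for one pixel."""
    return Counter(codes)


def fwhm_ps(hist: Counter, lsb_ps: float = LSB_PS) -> float:
    """Crude FWHM: span of the bins whose count is at least half the peak count."""
    if not hist:
        raise ValueError("empty histogram")
    half = max(hist.values()) / 2
    above = [code for code, count in hist.items() if count >= half]
    return (max(above) - min(above) + 1) * lsb_ps


# Synthetic codes for illustration only; these are not measured data.
codes = [100] * 2 + [101] * 10 + [102] * 6 + [103] * 1
hist = toa_histogram(codes)
print(hist.most_common(1), fwhm_ps(hist))
```

Keeping the detection rate per laser pulse low, as the text notes, matters here because the shared TDC records only the first event of a cycle, which would otherwise skew the histogram toward early arrivals.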
The chip’s overall performance was tested in a breadboard system
based on an FPGA. The breadboard was designed to provide all the
digital interface signals and memory support for the imager output.
The sensor imaged a 3D scene illuminated by a pulsed laser. Figure
2.1.6 shows the 3D image obtained using the same techniques as in
[3], whereby the total integration time for one frame was 1s, with a
worst-case distance error of 1.4mm. Fig. 2.1.7 is a performance summary
of the sensor chip and its various components.
Acknowledgments:
This research was supported by a grant from the Swiss National Science
Foundation. The authors are grateful to Maximilian Sergio for help
during the IC tape-out.
References:
[1] S. Cova, A. Longoni and A. Andreoni, “Towards Picosecond Resolution with Single-
Photon Avalanche Diodes,” Rev. Sci. Instrum., vol. 52, no. 3, pp. 408-412, 1981.
[2] A. Rochas, M. Gani, B. Furrer et al., “Single Photon Detector Fabricated in a
Complementary Metal-Oxide-Semiconductor High-Voltage Technology,” Rev. Sci.
Instrum., vol. 74, no. 7, pp. 3263-3270, July 2003.
[3] C. Niclass and E. Charbon, “A Single Photon Detector Array with 64x64 Resolution
and Millimetric Depth Accuracy for 3D Imaging,” ISSCC Dig. Tech. Papers,
pp. 364-365, Feb. 2005.
[4] C. Niclass, M. Sergio and E. Charbon, “A Single-Photon Avalanche Diode Array
Fabricated in 0.35μm CMOS and Based on an Event-Driven Readout for TCSPC
Experiments,” APCT Conference, SPIE Optics East, Oct. 2006.
[5] D. Stoppa, L. Pancheri, M. Scandiuzzo et al., “A CMOS 3-D Imager Based on Single
Photon Avalanche Diode,” IEEE T. CAS I, pp. 4-12, Jan. 2007.
[6] M. Sergio, C. Niclass and E. Charbon, “A 128×2 CMOS Single-Photon Streak
Camera with Timing-Preserving Latchless Pipeline Readout,” ISSCC Dig. Tech.
Papers, pp. 394-395, Feb. 2007.
[7] B. Aull, J. Burns, C. Chen et al., “Laser Radar Imager Based on 3D Integration of
Geiger-Mode Avalanche Photodiodes with Two SOI Timing Circuit Layers,” ISSCC Dig.
Tech. Papers, pp. 238-239, Feb. 2006.
[8] A. Mantyniemi, T. Rahkonen and J. Kostamovaara, “An Integrated 9-Channel Time
Digitizer with 30ps Resolution,” ISSCC Dig. Tech. Papers, pp. 266-267, Feb. 2002.
[9] R. B. Staszewski, S. Vemulapalli, P. Vallur et al., “Time-to-Digital Converter for RF
Frequency Synthesis in 90nm CMOS,” IEEE RFIC Symp., pp. 473-476, 2005.
©2008 IEEE
ISSCC 2008 / February 4, 2008 / 1:30 PM
Figure 2.1.1: Block diagram of the proposed sensor. The sensor consists of a 128×128
pixel array, a bank of 32 TDCs, and fast parallel readout circuitry. Row-decoding logic
selects the 128 pixels that are activated for detection. The pixels are organized in groups of
four that access the same TDC based on a first-in-take-all sharing scheme.
Figure 2.1.3: Photomicrograph of the sensor chip with a pixel detail in the inset. The
circuit, fabricated in 0.35μm CMOS technology, has an area of 8×5mm². The pixel pitch
is 25μm.
Figure 2.1.4: Measurements of differential non-linearity (DNL) and integral non-linearity
(INL) for the worst-case TDC at room temperature.
Figure 2.1.5: (a) Time jitter measurement of the SPAD detector and overall circuitry using
the integrated TDCs. In the inset, a logarithmic plot is shown. The illumination laser
pulse width was 80ps. (b) Dark count rate (DCR) distribution over the array.
Figure 2.1.6: Experimental 3D image with model picture in inset. The 1σ error computed
from two subsequent images is 1.4mm.
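The relation between time of flight and distance behind such a 3D image can be sketched with a few lines of arithmetic. This is an illustrative calculation using the standard round-trip relation d = c·t/2, not the authors' reconstruction method; the function names are hypothetical. It shows that one 97.66ps LSB corresponds to roughly 14.6mm of distance, so a millimeter-level error presumably relies on combining many photon detections per pixel over the integration time.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_m(tof_s: float) -> float:
    """Target distance for a measured round-trip time of flight."""
    return C_M_PER_S * tof_s / 2


def unambiguous_range_m(rep_rate_hz: float) -> float:
    """Largest distance whose echo returns before the next laser pulse."""
    return C_M_PER_S / (2 * rep_rate_hz)


def lsb_distance_mm(lsb_s: float = 97.66e-12) -> float:
    """Distance equivalent of one TDC LSB, in millimeters."""
    return distance_m(lsb_s) * 1e3


print(f"range at 40 MHz: {unambiguous_range_m(40e6):.3f} m")
print(f"distance per LSB: {lsb_distance_mm():.2f} mm")
```

At a 40MHz repetition rate the unambiguous range comes out at about 3.75m.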




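Figure 2.1.7: Performance summary of the sensor chip and its various components.

Block            Parameter                               Symbol  Min.   Typ.   Max.          Unit
---------------  --------------------------------------  ------  -----  -----  ------------  ----
Pixel            Photon detection probability @ Ve=4.0V  η       3      -      40 (@ 460nm)  %
Pixel            Sensitivity spectrum                    λ       350    -      800           nm
Pixel            Median DCR                              -       -      284    -             Hz
Pixel            Dead time                               τDT     -      100    -             ns
TDC              Tuning of measurement range (10 bits)   -       71.68  100    204.8         ns
TDC              Resolution (LSB)                        τF      70     97.66  200           ps
TDC              Measurement rate                        -       -      -      10            MS/s
TDC              DNL                                     -       -      0.08   -             LSB
TDC              INL                                     -       -      1.89   -             LSB
System           Clock frequency                         -       19.5   40     55.8          MHz
System           Total IO bandwidth                      -       -      7.68   -             Gbps
System           JTAG bandwidth                          -       -      8      -             Mbps
System           Static power dissipation                -       -      33     -             mW
System           Dynamic power dissipation               -       -      150    -             mW
3D Image Sample  Integration time                        -       -      1      -             s
3D Image Sample  Illumination average power              -       -      1      -             mW
3D Image Sample  Illumination peak power                 -       -      250    -             mW
3D Image Sample  Illumination duty cycle                 -       -      0.4    -             %
3D Image Sample  Target area                             -       -      1      -             m²
3D Image Sample  Target distance @40MHz                  -       0.1    1.5    3.75          m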
