Abstract-An imager topology with sub-ns time gating for 3D distance measurement applications and the first measurement results of a prototype are presented. The imager has a fully digital operating principle with single-photon avalanche diode detectors and on-chip narrow gating of pixel groups. The prototype detector has 80 x 25 pixels with a fill factor of 34 % in the sensor area. The chip has been fabricated in a 0.35 µm high-voltage process and occupies an area of 5.69 x 5.02 mm².
I. INTRODUCTION
Three-dimensional (3D) imaging has become a necessity in many control and navigation applications. 3D scanners are commonly based on laser radar for the distance measurement, with the scanning function realized either by mechanically rotating the measuring head or by using rotating mirrors to scan only the laser beam. However, many new applications, for example in human-machine interfaces, gaming, surveillance and machine control, need a small and low-cost imager [1].
To achieve scanning without moving parts, a promising technique is to use optical time of flight (TOF) with the electronic focal plane scanning approach, where a 2D detector array is located at the focal plane of a positive lens. These kinds of TOF imagers typically use either continuous-wave phase comparison or stopwatch-type methods with single-photon avalanche diode (SPAD) detectors. Good results have been achieved with the phase comparison method in short-range applications, for example a range of 0.8-4.2 m with an accuracy of <1 % [2]. In stopwatch-type scanners the distance is usually measured by taking the start time from the laser pulse and measuring the time the photons take to travel from the laser to the target and back to the receiver. The time measurement is realized with time-to-digital converters (TDC) or time-to-amplitude converters. The measurement range can be up to several kilometers with this kind of pulsed TOF method [3].
In the focal plane approach, the fill factor of the detector array should be high so that no photon reaching the receiver is lost. SPAD arrays with sizes of 9×9 with a fill factor of 43 % [4], 64×32 with 3 % [5] and 512×128 with 5 % [6] have been fabricated in a standard high-voltage 0.35 µm process without micro lenses. These large arrays are usually implemented with in-pixel counting circuits or with shared TDCs, which either reduce the fill factor significantly or increase the measurement time because the TDC operation is multiplexed. Fill factors and especially pixel counts reported with more scaled processes are higher, but with the drawback of small SPAD sizes and increased noise density [7]. There is clearly a need for research on the optimum architecture of large SPAD arrays with timing capability and a high fill factor.
High background lighting, for example in outdoor measurements in bright daylight, can be problematic for SPAD detectors because of sensor saturation. Part of the solution is to use optical filtering, to select the field of view of the receiver carefully and to use either a mechanical or an electrical shutter. Range gating with a fast electrical shutter has been used in recent high-performance SPAD sensors to suppress the pile-up effect of unwanted photons arriving at the detector [7, 8]. The SPAD imager operating principle used in this work develops this idea further and reduces the time gate to the minimum. Furthermore, it is the time position of the gate itself that reveals the transit time of the detected photons. Minimizing the gating window width also makes the effect of the SPAD's own dark count rate insignificant, so the active area of the diode can be increased. With a large SPAD, a good fill factor can be achieved even with the pixel electronics included.
In this work, we present a CMOS SPAD imager topology based on sub-nanosecond time gating of pixel groups and a pulsed laser source that illuminates the target with very short and high-energy laser pulses. First, the operating principle is described and then the architecture and first measurement results of the fabricated chip are presented.
II. 3D IMAGER OPERATING PRINCIPLE
978-1-5090-6508-0/17/$31.00 ©2017 IEEE PRIME 2017, Giardini Naxos-Taormina, Italy Imaging

The SPAD imager presented in this article measures the flight time of a short laser pulse from the laser source to the target and back to the receiver. The measurement is realized by collecting binary 2D cross-sectional images from predetermined distances defined by the time gating electronics, see Fig. 1. A three-dimensional image of the selected measurement range is generated by combining these 2D images from different distances. The entire measurement range is first scanned with a lower time resolution (longer time gate) and, when the target is found, the time resolution is increased to the desired value. To obtain a high frame rate for applications where the target is followed in real time, the pixel array is divided into small subarrays whose time gates can be programmed individually within the specified time window. This enables each subarray to perform a partial scan around the surface of the target, so the longer, full scan of the measurement range at high resolution is not needed. The principle of programming the time gates of the subarrays separately is shown in Fig. 2, which demonstrates a measurement of an inclined plane where the time gates of the subarrays follow the target surface. Fig. 2 presents eight subarrays, but in the prototype circuit the pixel array is divided into 40 groups of 50 SPADs. The position and depth of the partial scans can be programmed individually for each subarray within the total range of ~3 m, to be able to follow the movements of a human being, for example.
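The coarse-to-fine gate scan described above can be illustrated with a small idealized model. This is only a sketch under stated assumptions: the detector is noiseless and fires whenever the echo falls inside the gate, the gate widths (1 ns coarse, 100 ps fine) and the 25 ns window are example values, and the function names are hypothetical rather than taken from the chip's control software.

```python
C = 299_792_458.0  # speed of light, m/s


def round_trip_ps(distance_m: float) -> float:
    """Round-trip flight time to a target, in picoseconds."""
    return 2.0 * distance_m / C * 1e12


def scan(distance_m: float, start_ps: int, stop_ps: int, gate_ps: int):
    """Step a gate of width gate_ps from start_ps towards stop_ps.

    Returns the start time of the first gate that contains the echo,
    or None if no gate catches it (idealized binary detection).
    """
    t = round_trip_ps(distance_m)
    for gate_start in range(start_ps, stop_ps, gate_ps):
        if gate_start <= t < gate_start + gate_ps:
            return gate_start
    return None


def coarse_to_fine(distance_m: float, window_ps: int = 25_000,
                   coarse_ps: int = 1_000, fine_ps: int = 100):
    """Find the target with long gates, then refine with short gates.

    Returns the distance (m) of the leading edge of the fine gate that
    caught the echo, or None if the target is outside the window.
    """
    coarse = scan(distance_m, 0, window_ps, coarse_ps)
    if coarse is None:
        return None
    fine = scan(distance_m, coarse, coarse + coarse_ps, fine_ps)
    if fine is None:
        return None
    return fine * 1e-12 * C / 2.0


# Assumed example: an inclined plane seen by eight subarrays, each of
# which runs its own partial scan around its part of the surface.
subarray_targets_m = [1.00 + 0.05 * k for k in range(8)]
estimates_m = [coarse_to_fine(d) for d in subarray_targets_m]
```

With these assumed values the coarse pass needs at most 25 gate positions and the fine pass at most 10, compared with 250 positions for a full scan of the window at the fine step.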
The accuracy of the depth measurement depends on the transmitted laser pulse width and on the distance measurement resolution of the detector. With a pulsed laser source that has high energy (Er > 1 nJ) and a narrow full width at half maximum (FWHM ~ 100 ps), a single-shot depth resolution of < 2 cm is achievable [8]. The detector time measurement precision depends on the SPAD timing jitter (50-100 ps) and the time gating precision. To achieve a depth resolution of 3 cm, the time gates are designed to be < 1 ns wide, and the gates can be shifted with respect to each other with an accuracy of ~ 100 ps. The width and time position of the gates within one time window (~24 ns) are programmable with 8 bits from the delay-locked loop with 240 outputs. In high background illumination conditions, the gating of the SPADs is chosen to be as narrow as possible.
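For orientation, the timing figures quoted above can be converted to depth with the usual round-trip relation, depth = c·Δt/2. The short sketch below only applies that relation to the time values given in the text; the function name is illustrative, and the vacuum speed of light is assumed.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def depth_from_round_trip(dt_s: float) -> float:
    """Depth interval (m) that corresponds to a round-trip time interval."""
    return C * dt_s / 2.0


for label, dt_s in [("gate shift step, ~100 ps", 100e-12),
                    ("gate width, 1 ns", 1e-9)]:
    print(f"{label}: {depth_from_round_trip(dt_s) * 100:.1f} cm")
```

A 100 ps step therefore corresponds to roughly 1.5 cm of depth, and a 1 ns gate spans roughly 15 cm.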
To get a high fill factor for the sensor area, the area of the pixel electronics has to be minimized. Therefore, the state of each SPAD during the detection window defined by the time gate is captured in only one memory cell for each laser pulse sent. This results in binary data, "1" or "0", depending on whether or not a photon has triggered the detector cell. The laser driver circuitry is controlled from the imager sensor chip, which makes it possible to delay the laser pulse with respect to the detector's measurement range. The possibility of shifting the time position of the laser pulse allows the effective measurement range to begin even at a distance of 0 cm from the receiver.
III. DETECTOR ARCHITECTURE
The image sensor prototype consists of a detector array of 80 x 25 pixels, programmable on-chip time gating and data processing with an FPGA, see Fig. 3 .
The time gating is realized on-chip so that the SPAD array can be divided into 40 subarrays, each of which has its own time gate. The time gates are based on a delay-locked loop that produces global timing signals for all the subarrays. The total delay line length of the delay-locked loop defines the time window for the programmable time gates, and a length of 25 ns with a 40 MHz input clock signal is selected to achieve a depth range of 3.75 m. The delay-locked loop has 240 outputs on a grid of 105 ps, from which three rising edges are selected for each of the 40 subarrays. The time gates are produced from these three rising edges locally at every pixel. The signals are selected with multiplexers (MUX), each of which is controlled with 8 bits by the external FPGA.
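The relation between the reference clock, the number of delay line outputs and the depth range can be checked with a few lines of arithmetic. The sketch below assumes an ideal, uniformly spaced delay line locked to one 40 MHz clock period; the mapping of an 8-bit code k to the tap at k tap delays, and the rejection of codes 240 to 255, are assumptions made for illustration and not details taken from the chip.

```python
C = 299_792_458.0   # speed of light, m/s
F_REF_HZ = 40e6     # DLL input clock
N_TAPS = 240        # delay line outputs

window_s = 1.0 / F_REF_HZ           # one clock period: 25 ns
tap_s = window_s / N_TAPS           # ideal tap spacing: about 104 ps
depth_range_m = C * window_s / 2.0  # about 3.75 m


def tap_time_ps(code: int) -> float:
    """Ideal delay of the tap selected by an 8-bit MUX code (assumed mapping)."""
    if not 0 <= code < N_TAPS:
        raise ValueError("only 240 of the 256 codes select a tap in this sketch")
    return code * tap_s * 1e12


def gate_width_ps(start_code: int, stop_code: int) -> float:
    """Width of a gate opened and closed by two selected rising edges."""
    return tap_time_ps(stop_code) - tap_time_ps(start_code)


print(f"window {window_s * 1e9:.1f} ns, tap {tap_s * 1e12:.1f} ps, "
      f"range {depth_range_m:.2f} m")
print(f"gate from tap 10 to tap 19: {gate_width_ps(10, 19):.1f} ps")
```

The ideal spacing of 25 ns / 240 ≈ 104 ps is close to the 105 ps grid quoted in the text, and nine tap steps give a gate just under 1 ns.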
The delay line output buffers are enabled for the specific detection time window by a control block with a 6-bit control word from the FPGA. The control block has a counter that starts counting the delay line reference clock periods when a command for the laser pulse driver is sent. When the counter reaches the value defined by the FPGA, the delay line outputs are buffered to the multiplexers, and selection of the time gating signals begins. The size of the counter defines the maximum delay for the time window. In the prototype a 6-bit counter was implemented, which enables distance measurement up to hundreds of meters. The delay of the laser pulse can be selected from the same delay line as the gate signals, but the selection is reduced to 120 phases with 210 ps delay shifts to simplify the chip layout. When the registers for controlling the counter block, laser pulse delay and gate timing are set, the imager chip sends a command to the laser diode driver circuitry, which then fires a high-energy, narrow laser pulse.

The result of a single measurement shot is read out by connecting the in-pixel flip-flops in series and buffering all 2000 bits to an external FPGA for further processing. The direct read-out of the SPAD array makes the signal processing adjustable with the FPGA, which is good for prototype testing, but at the cost of increased overall power consumption due to the buffering of high-speed off-chip signals. The read-out of the 80 registers in the x-direction is done serially and a bus of 80 x 25 bits is buffered out. With this arrangement, a frame rate of up to 1 MHz for 2D images can be achieved with a 100 MHz clock signal. There is no need for faster read-out at the moment, since the frame rate will be limited by the pulsed laser source circuitry to 100 kHz, and later potentially to 1 MHz with some improvements in the laser driver circuitry.
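The range and frame-rate figures above can be sanity-checked with simple arithmetic. The sketch below rests on two assumptions that the text does not spell out: the 6-bit counter delays the time window by 0 to 63 reference clock periods of 25 ns, and one 2D frame is shifted out in 80 clock cycles (one cycle per x position, with the 25 rows read in parallel).

```python
C = 299_792_458.0    # speed of light, m/s
T_REF_NS = 25.0      # reference clock period at 40 MHz
COUNTER_BITS = 6     # coarse delay counter width
READ_CLK_HZ = 100e6  # read-out clock
SHIFTS_PER_FRAME = 80  # assumed: one shift per x position, rows in parallel

# Largest coarse delay of the time window (assumed counter range 0..63).
max_coarse_delay_ns = (2 ** COUNTER_BITS - 1) * T_REF_NS

# Farthest distance covered: end of the last 25 ns window, halved for round trip.
max_range_m = C * (max_coarse_delay_ns + T_REF_NS) * 1e-9 / 2.0

# Read-out limited frame rate for binary 2D images.
frame_rate_hz = READ_CLK_HZ / SHIFTS_PER_FRAME

print(f"max coarse delay: {max_coarse_delay_ns:.0f} ns")
print(f"max range: {max_range_m:.0f} m")
print(f"read-out frame rate: {frame_rate_hz / 1e6:.2f} MHz")
```

Under these assumptions the counter extends the range to about 240 m, in line with the "hundreds of meters" stated above, and the read-out supports slightly more than 1 MHz.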
The prototype imager was fabricated in a cost-effective 0.35 µm HV process, which has good SPAD properties for demonstrating the functionality of the operating principle. The performance is likely to be better in a newer technology because of the higher fill factor of the sensor area and the faster transistors. A picture of the fabricated imager is presented in Fig. 4, and the dimensions of the chip are 5.02 mm x 5.69 mm.
A. Pixel design
One pixel consists of a SPAD, biasing switches for the diode, sample and hold, buffering of signals and data storage of 1 bit. A simplified schematic is presented in Fig. 5 and the layout of one group and one pixel in Fig. 4.
First, the SPAD is biased near the edge of the breakdown voltage by turning on the quench switch. This connects the anode of the SPAD to the power supply voltage and quenches the diode from the previous measurement cycle, if there has been one. The cathode is connected to a high-voltage supply of ~22.5 V. Then the sensor is opened for photons (leading edge of the time gate window) with the load signal, which biases the SPAD over the breakdown voltage by connecting the anode of the SPAD to ground. Hence the maximum excess bias voltage is the power supply voltage VDD. The signal for loading the SPAD anode to ground has to be fast and narrow to allow short time gating. Because the trace length is > 2.5 mm, the loading signal pulse is produced locally at every pixel from the rising edges of the Load start and Load stop signals. The state of the SPAD is sampled at the end of the gate with the sample switch and buffered to a 1-bit memory cell. The timing diagram of the SPAD gating and the related signals is presented in Fig. 5.
Sampling of the state of the SPAD can also be done fully digitally by connecting the sampling signal to trigger the flip-flop, which will save the state of the SPAD directly to memory without first sampling the charge to the capacitor. This option is also included in the fabricated chip for testing purposes. A multiplexer is added in front of the memory cell to enable the flip-flops to first store the state of the SPADs and then transfer data out serially.
The active area of the SPAD is a rectangle of 47 x 36 µm² in size, and the corners have been rounded to prevent premature edge breakdown. A deep nwell/p+ junction acts as the multiplication region, and the active area is surrounded by pwell guard rings. The pixel pitch is 50 µm in the x-direction and 100 µm in the y-direction. The pixels are placed in rows, every other one of which is flipped in the y-direction to get a better fill factor with shared deep nwells.
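The quoted fill factor and sensor dimensions follow directly from the SPAD size and the pixel pitch. The sketch below repeats that arithmetic; it treats the active area as a full rectangle and so ignores the small loss from the rounded corners, which is a simplification.

```python
SPAD_W_UM, SPAD_H_UM = 47.0, 36.0      # active area, rounded corners ignored
PITCH_X_UM, PITCH_Y_UM = 50.0, 100.0   # pixel pitch
COLS, ROWS = 80, 25                    # array size

# Fraction of each pixel cell that is photosensitive.
fill_factor = (SPAD_W_UM * SPAD_H_UM) / (PITCH_X_UM * PITCH_Y_UM)

# Overall sensor area from array size and pitch.
sensor_w_mm = COLS * PITCH_X_UM / 1000.0
sensor_h_mm = ROWS * PITCH_Y_UM / 1000.0

print(f"fill factor: {fill_factor * 100:.1f} %")
print(f"sensor area: {sensor_w_mm:.1f} mm x {sensor_h_mm:.1f} mm")
```

This gives about 33.8 %, which rounds to the 34 % fill factor stated for the prototype, and a sensor area of 4 mm x 2.5 mm.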
B. Delay-locked Loop Design
The delay-locked loop is divided into two loops, DLL1 and DLL2, that produce 240 output signals altogether. Each delay-locked loop consists of a phase detector, a charge pump and a delay line of 120 delay elements. Current-starved buffers act as voltage-controlled delay elements, and a filtering capacitor of 228 fF is added to every delay cell. A MUX that selects one signal from the 240 phases is made from four-stage pipelined 4-to-1 multiplexers. This means that there are 80 multiplexers in each MUX, and routing the paths so that they have similar parasitic loading becomes challenging. Post-layout simulation results of the carefully designed routing of the 240-to-1 MUX show < 20 ps systematic nonlinearity due to routing skew.
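The count of 80 multiplexers per 240-to-1 MUX can be reproduced by reducing the 240 phases through successive stages of 4-to-1 multiplexers. The helper below is an illustrative sketch of that counting argument only; it says nothing about how the stages are actually laid out on the chip.

```python
import math


def mux_tree(n_inputs: int, fan_in: int = 4) -> list:
    """Number of fan_in-to-1 multiplexers in each stage of a selection tree."""
    stages = []
    n = n_inputs
    while n > 1:
        n = math.ceil(n / fan_in)  # each mux merges up to fan_in signals into one
        stages.append(n)
    return stages


stages = mux_tree(240)
print(stages, sum(stages))  # per-stage counts and total for 240 phases
```

For 240 inputs the stages hold 60, 15, 4 and 1 multiplexers, which gives four stages and 80 multiplexers in total. Four 4-to-1 stages also need two select bits each, which is consistent with the 8-bit control word per MUX.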
The chip sends a command to the laser driver circuitry to send the pulse, which can be shifted by selecting one output from DLL1. The measurement results of the laser command output compared with the reference clock input are presented in Fig. 6. The result for the laser command output contains the nonlinearity of DLL1 combined with the inaccuracy of the laser multiplexer chain. The measurement results show ± 70 ps nonlinearity for laser output command shifting at room temperature. The measured standard deviation of the reference clock jitter was 8 ps and that of the DLL1 output jitter was 11 ps. The current consumption of the measured circuitry is 12 mA from a 3.3 V power supply.

A sub-nanosecond time gating topology for solid state 3D scanning with a CMOS SPAD imager is presented. The fabricated imager has 80 x 25 pixels in a sensor area of 4 mm x 2.5 mm with a 34 % fill factor. The SPAD array has been divided into 40 subarrays whose time gating can be programmed individually from an on-chip delay-locked loop with 240 outputs. The measurement results show ±70 ps nonlinearity for the laser triggering output. The characterization work of the chip is in progress.
