We proposed the world's first flexible ultrathin-body single-photon avalanche diode (SPAD) as a photon-counting device, providing a suitable solution for advanced implantable, biocompatible chronic medical monitoring, diagnostics and other applications. In this paper, we comprehensively investigate the Geiger-mode performance of this flexible ultrathin-body SPAD, and we extend this work to the first flexible SPAD image sensor with in-pixel and off-pixel electronics integrated in CMOS. Experimental results show that the dark count rate (DCR) due to band-to-band tunneling can be reduced by optimizing the multiplication doping. The DCR due to trap-assisted avalanche, which is believed to originate from the trench etching process, can be further reduced, resulting in a DCR density of tens to hundreds of hertz per square micrometer at cryogenic temperature. The influence of the trench etching process on DCR is also confirmed by comparison with planar ultrathin-body SPAD structures without trenches. The photon detection probability (PDP) can be enhanced by wider depletion and drift regions and by carefully optimizing the body thickness. The PDP in frontside illumination (FSI) and backside illumination (BSI) is comparable, making this technology suitable for both modes of illumination. Afterpulsing and crosstalk are negligible at 2µs dead time, and we show, for the first time, that a CMOS SPAD pixel of this kind can operate in a cryogenic environment. With an appropriate choice of substrate, this technology is amenable to implantation for biocompatible photon-counting applications and to any application where bent image sensors are essential.
Introduction
Solid-state single-photon avalanche diode (SPAD) technology has existed for decades and is receiving wide attention for applications such as time-of-flight vision, time-correlated single-photon counting, fluorescence lifetime sensing and biomedical imaging. A SPAD is an avalanche photodiode (APD) biased above its breakdown voltage, V_BD, in so-called Geiger mode, and equipped with avalanche quenching and recharge mechanisms. Thanks to improved CMOS fabrication and Moore's Law, SPAD technology implemented in planar processes for imaging systems has progressed significantly in recent years [1] [2] [3] [4] [5].
Especially in the last decade, the push towards new applications, such as disposable assays, edible probes and implantable sensors, has continued. A typical application in biomedical sensing is the retinal prosthesis [6] [7] [8], where a single-photon imaging device integrated with CMOS circuitry on a flexible substrate could be implanted into the eye and bent to match the curvature of the eyeball. Another application is chronic biomedical monitoring [9], where a wearable or implantable miniaturized single-photon sensor could be left in situ to continuously monitor a person's health, providing more accurate information about the progression of diseases such as cancer and other inflammatory or chronic ailments. However, current SPAD technology is generally implemented on bulk silicon and cannot meet the requirements of implantable biomedical applications, where backside illumination and post-processing onto new substrates are core technologies and inherent CMOS compatibility is required [10, 11]. In our previous work, we demonstrated the world's first flexible SPAD, fabricated in an ultrathin-body silicon-on-insulator (SOI) process followed by transfer post-processing to a flexible substrate [12], and we achieved a flexible SPAD with dual-side illumination for the first time [13].
In this work, we comprehensively investigate the Geiger-mode performance of this flexible ultrathin-body SPAD: DCR, V_BD and PDP are studied as functions of junction parameters, operating temperature and device structure. Afterpulsing and crosstalk are negligible at the nominal dead time, thanks to the reduced parasitic capacitance obtained by integrating the output buffer. Thanks to the flexible substrate and dual-side illumination, the flexible SPAD chip could easily be implanted into human organs, such as the eyeball, and integrated with stimulation components on either side of the retina. Moreover, integration with a CMOS process could pave the way to larger sensor arrays with more electronics on flexible substrates. The DCR can be drastically reduced by operating the SPADs at temperatures as low as 80K. These measurements are significant, since they prove for the first time that cryogenic operation of CMOS SPADs is possible, which could be very useful for telecommunication electronics [14] for future quantum computers [15].
Section 2 presents the cross-sections of the device structure and of the pixel unit, which includes a SPAD, a quenching resistor and CMOS transistors, together with the flow chart of the flexible CMOS SPAD sensor technology. Section 3 presents measurements of the DCR dependence on enrichment doping, temperature and device structure, as well as PDP, afterpulsing and timing jitter, and analyzes the improvements obtained by integrating CMOS buffer circuitry. Conclusions are given in Section 4.
Device structure and fabrication flowchart
Figure 1(a) shows a cross-section of the trench-isolated SOI SPAD. It was fabricated with our previously proposed flexible ultrathin-body SPAD technology enabling dual-side illumination [12, 13]. The SPAD has a circular shape and is based on an N+/P junction with P-type enrichment doping in the multiplication region. To prevent premature breakdown caused by the higher electric field at the edge of the junction, the enrichment doping is carefully designed to fit the ultrathin-body SOI structure. Figure 1(b) shows a cross-section of the flexible trench-isolated SPAD integrated with the quenching resistor: a trench-isolated epitaxial silicon island is used as the resistor that quenches the avalanche. V_OP is the operating voltage of the SPAD; it exceeds V_BD by the excess bias V_eb (V_OP = V_BD + V_eb), biasing the SPAD in Geiger mode.
As shown in Fig. 1(c), each of the two neighboring pixels contains a SPAD, a quenching resistor and CMOS buffer circuits powered by the supply voltage V_DD. All components are based on a trench-isolated silicon island structure to achieve a high degree of flexibility. Polyimide or polymer is used as the flexible substrate, which also acts as a microlens to increase the fill factor [16]. For the flexible CMOS SPAD sensor, fabrication begins with a p-type SOI wafer prepared by epitaxy. The N-well is formed by implantation followed by a thermal drive-in. After the CMOS transistors are built, an N+/P junction is formed by implantation. The process flow is further optimized by varying process parameters such as the enrichment doping dose and the epitaxial layer thickness, which are analyzed in Section 3. After junction implantation, the device islands are formed and isolated by trenches using dry etching. A passivation layer is then deposited and etched back twice to form a spacer at the trench step. Contact holes are opened by dry etching, and a metal layer is sputtered and patterned to form the first metal interconnect. After a second, similar metallization, the wafer is alloyed [16]. The device fabrication flow is summarized in Fig. 2(a).
The flexible substrate transfer and microlens imprint process is summarized in Fig. 2(b). Following the backside oxide deposition of Fig. 2(a), the oxide is patterned as a mask, as shown in step 1. A sol-gel polymer is then coated on top of the device. The polymer is patterned and a quartz mold is brought into contact with it; pressure must be applied to form the microlens array on top of the SPAD sensor [17] with a lateral alignment accuracy better than 1µm. The imprinting process was still being completed at the time of writing. The silicon substrate under the buried oxide layer is then etched away. The etching stops at the buried oxide layer, so the SPAD image sensor layer on the new flexible substrate can easily be released. Because the polymer layer acts as both the flexible substrate and the microlenses, the SPAD sensors with CMOS circuitry can act as a flexible imager with a higher fill factor. According to our current design, the fill factor is expected to exceed 10% with microlenses.
The whole process (except microlens imprinting) was implemented in the class-100 cleanroom and MEMS laboratory of the Else Kooi Lab (formerly Delft Institute of Microsystems and Nanoelectronics). Tens of thousands of SPAD pixels from several batches have been fabricated with a yield higher than 95%, and the results are consistent and reproducible.
Measurement results
For flexible device characterization, the chip was packaged by wire bonding and then measured. Because the SPADs are already equipped with quenching resistors, also integrated on the flexible substrate, only the operating voltage V_OP and the output need to be connected to pads. For the SPAD pixel integrated with the CMOS buffering circuit, the circuit supply V_DD must also be provided through a pad. The SPADs were operated in Geiger mode, and the output was monitored by a high-speed oscilloscope (LeCroy Wavemaster 8600A). Measurements were performed at room temperature unless otherwise noted.
Characteristics of flexible trench-isolated SPAD integrated with resistor
The analysis of process parameters is based on the flexible trench-isolated SPAD integrated with a quenching resistor. An SEM micrograph of this ultrathin-body trench-isolated SPAD is shown in Fig. 3. The DCR is measured by counting the number of avalanches per unit time while the SPAD is biased above V_BD in the dark. DCR is caused by a combination of band-to-band tunneling and trap-assisted avalanches, and it increases with the excess bias voltage V_eb.
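The DCR measurement described above reduces to simple arithmetic. The Python sketch below, a minimal illustration rather than the measurement software used in this work, converts a dark-count total over an observation time into a rate with a Poisson (sqrt(N)) uncertainty and normalizes it by the area of a circular active region to obtain the DCR density in Hz/µm² used throughout this paper. The function names and example numbers are hypothetical, and dead-time losses are ignored, which is reasonable only when the DCR is far below the saturation rate.

```python
import math


def dark_count_rate(n_counts, gate_time_s):
    """Return (DCR, 1-sigma Poisson uncertainty), both in Hz.

    DCR is the number of avalanches counted in the dark divided by the
    observation time. The count is assumed Poisson-distributed and dead-time
    losses are ignored (valid when DCR * dead time << 1).
    """
    if gate_time_s <= 0:
        raise ValueError("gate_time_s must be positive")
    rate = n_counts / gate_time_s
    sigma = math.sqrt(n_counts) / gate_time_s
    return rate, sigma


def dcr_density(dcr_hz, diameter_um):
    """DCR normalized by the area of a circular active region, in Hz/um^2."""
    area_um2 = math.pi * (diameter_um / 2.0) ** 2
    return dcr_hz / area_um2


# Hypothetical example: 12,000 dark counts in 1 s on a SPAD of 6 um diameter.
rate, sigma = dark_count_rate(12_000, 1.0)
print(f"DCR = {rate:.0f} +/- {sigma:.0f} Hz, "
      f"density = {dcr_density(rate, 6.0):.0f} Hz/um^2")
```

Normalizing by area is what allows SPADs of different diameters, such as the 3µm to 6µm planar devices discussed later, to be compared on the same scale.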
As shown in Fig. 1(a), a P+ enhancement region forms the multiplication region, where the electric field is concentrated in the planar region and is higher than around the curved region of the junction, thus preventing premature edge breakdown [18, 19]. DCR versus temperature measurements [13] confirmed that a large portion of the dark noise comes from band-to-band tunneling due to the high doping in the multiplication region. DCR as a function of excess bias for devices with different doping levels in the multiplication region is reported in Fig. 4. To keep the other process parameters consistent, the devices compared were fabricated on the same wafer as the device of Fig. 3(b). When the P enrichment dose was reduced from 3.00 × 10^13 cm^-2 to 2.95 × 10^13 cm^-2, the DCR was reduced by 50% at the same excess bias. However, when the dose was further reduced to 2.90 × 10^13 cm^-2, the DCR increased by around 6%, and at 2.85 × 10^13 cm^-2 the SPAD suffers from premature edge breakdown (PEB). As the dose decreases, V_BD increases, as shown in Fig. 5. V_BD and DCR were further analyzed at cryogenic temperatures. As the breakdown voltages at different temperatures in Fig. 5 show, V_BD decreases as the temperature decreases, owing to the higher impact ionization rates at low temperature. According to the modified Baraff theory, as T decreases, the carrier mean free path increases, so the probability of scattering while a carrier crosses the junction decreases. Carriers can therefore gain enough energy from the field to ionize at a lower field, which lowers V_BD [20]. Below ~200K, the dark noise caused by Shockley-Read-Hall (SRH) generation and trap-assisted tunneling in the high-field region is suppressed, and DCR is dominated by band-to-band tunneling [21, 22]. The DCR measured at cryogenic temperatures of 80K and 200K is shown in Fig. 6.
As at room temperature, the cryogenic DCR decreases as the enrichment dose is reduced; however, it stops decreasing and even starts to increase when the dose is reduced to 2.90 × 10^13 cm^-2, which is consistent with the results of Fig. 4. The DCR density could be reduced to hundreds of hertz per square micrometer at 80K, and it is believed that it would drop to tens of hertz per square micrometer [23] if the trap-assisted noise from trench etching could be filtered out at extremely low temperatures. However, possible increased afterpulsing below 70K [21, 24] needs to be confirmed in future work. Electric field simulations at breakdown were performed with MEDICI (Fig. 7). Because the breakdown voltage increases as the enrichment doping decreases, the electric field around the virtual guard ring is higher by the time the field in the multiplication region reaches the critical value (2.5 × 10^5 V/cm in silicon) [25, 26]. As shown in Figs. 7(b) and 7(c), when the enrichment doping is reduced, the field in the multiplication region must still reach the critical value to trigger the impact ionization avalanche, so PEB appears because of the stronger field at the curvature of the virtual guard ring. This is consistent with the measurement results in Figs. 4 and 6.
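The temperature behavior described above, with thermally activated SRH and trap-assisted contributions that vanish below ~200K and leave a band-to-band tunneling floor, can be illustrated with a simple two-component model, DCR(T) = A·exp(-E_a/kT) + B. This is an assumption made for illustration, not a fit to Figs. 5 and 6: lumping SRH generation and trap-assisted tunneling into a single activation energy E_a and treating tunneling as temperature-independent are simplifications, and all parameter values below are hypothetical.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def dcr_vs_temperature(t_k, a_hz, ea_ev, b_hz):
    """Two-component DCR model (illustrative assumption).

    a_hz * exp(-ea_ev / kT): thermally activated generation (SRH, trap-assisted)
    b_hz: temperature-independent floor standing in for band-to-band tunneling
    """
    return a_hz * math.exp(-ea_ev / (K_B_EV * t_k)) + b_hz


def crossover_temperature(a_hz, ea_ev, b_hz):
    """Temperature at which the thermal term equals the tunneling floor."""
    return ea_ev / (K_B_EV * math.log(a_hz / b_hz))


# Hypothetical parameters, chosen only to show the shape of the curve.
A, EA, B = 1e12, 0.3, 1e3
t_x = crossover_temperature(A, EA, B)
for t in (300.0, 200.0, t_x, 80.0):
    print(f"T = {t:6.1f} K  DCR = {dcr_vs_temperature(t, A, EA, B):.3g} Hz")
```

Below the crossover temperature the modeled DCR flattens toward B, which is the qualitative signature of tunneling-dominated dark counts described in the text.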
To characterize the sensitivity of the SPAD, a monochromator (Oriel/Newport part 77250) was used to project light into an integrating sphere equipped with a reference diode. The light-source intensity was adjusted so that the SPAD operated in the single-photon regime, and the PDP was characterized for incident wavelengths from 400nm to 950nm. In this work, the PDP was measured on flexible SPADs with different body thicknesses in both FSI and BSI, as shown in Fig. 8(a).
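The junction cross-section is shown in Fig. 8(b). The PDP is the integral, over the silicon body from the front surface (z = 0) to its bottom (z = w_z), of the probability that an electron-hole pair is generated at depth z multiplied by the probability p(z) that this pair triggers an avalanche:

$$\mathrm{PDP}(\lambda) = T(\lambda)\int_{0}^{w_z} f_\lambda(z)\,p(z)\,dz \tag{1}$$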
where T(λ) is the net transmission of light with wavelength λ and f_λ(z) is the absorption probability at depth z:

$$f_\lambda(z) = \frac{1}{\mu(\lambda)}\exp\left(-\frac{z}{\mu(\lambda)}\right) \tag{2}$$

with μ(λ) being the mean penetration depth of the light into silicon.
Equations (1) and (2) can be applied to FSI and BSI as follows. In FSI, light enters at the front surface (z = 0), and the PDP can be modeled as:

$$\mathrm{PDP}_{\mathrm{FSI}}(\lambda) = T(\lambda)\int_{0}^{w_z} \frac{1}{\mu(\lambda)}\exp\left(-\frac{z}{\mu(\lambda)}\right)p(z)\,dz$$

In BSI, light enters through the bottom of the body (z = w_z), so the absorption depth is measured from w_z:

$$\mathrm{PDP}_{\mathrm{BSI}}(\lambda) = T(\lambda)\int_{0}^{w_z} \frac{1}{\mu(\lambda)}\exp\left(-\frac{w_z - z}{\mu(\lambda)}\right)p(z)\,dz$$

By increasing the body thickness from 1.5µm to 3.0µm, the peak PDP in FSI increased from 10.7% to 14.3%, and at wavelengths above 650nm it was enhanced by more than a factor of two. The thicker body provides wider depletion and neutral regions below the multiplication region, from z_1 to z_3, so carriers generated deeper in the silicon are more likely to drift to the multiplication region and trigger avalanches. The enhancement is therefore strongest at long wavelengths, whose photons have longer penetration depths.
In contrast to FSI, the PDP of the 3.0µm-thick body was lower than that of the 1.5µm-thick body in BSI. In BSI, most carriers are generated near the bottom of the silicon body, because the absorption probability is highest close to the illuminated surface, and these carriers have to drift across the neutral region to the multiplication region to trigger avalanches. Since the cathode and enrichment profiles are formed from the front side, the neutral region is wider in the thicker body. The avalanche-triggering probability p(z) of carriers generated between z_3 and w_z therefore decreases as w_z increases, and carriers generated by photons of a given penetration depth are less likely to trigger avalanches.
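The PDP model PDP = T(λ)∫f_λ(z)p(z)dz, with f_λ(z) = exp(-d/μ)/μ and d the depth measured from the illuminated surface, can be evaluated numerically to see why body thickness affects FSI and BSI in opposite directions. The sketch below assumes a hypothetical trigger-probability profile p(z): zero in a thin top layer, constant in the depleted region down to z_3, and decaying exponentially across the neutral region with an assumed collection length. None of these parameters (z1_um, z3_um, p_dep, diff_len_um, the penetration depths or the transmission) are taken from the measured devices; only the trend is meant to be illustrative.

```python
import math
from typing import Callable


def make_trigger_profile(z1_um, z3_um, p_dep, diff_len_um) -> Callable[[float], float]:
    """Hypothetical avalanche-triggering probability p(z), z in um from the front.

    0 above z1 (assumed dead top layer), p_dep in the depleted region z1..z3,
    and exponential decay with distance below z3, mimicking carriers that must
    cross the neutral region before reaching the high-field region.
    """
    def p(z: float) -> float:
        if z < z1_um:
            return 0.0
        if z <= z3_um:
            return p_dep
        return p_dep * math.exp(-(z - z3_um) / diff_len_um)
    return p


def pdp(mu_um, w_um, p, transmission, bsi=False, n=2000):
    """Midpoint-rule evaluation of PDP = T * integral_0^w f(z) p(z) dz.

    f(z) = exp(-d/mu)/mu, where d = z for FSI and d = w - z for BSI,
    i.e. the depth measured from the illuminated surface.
    """
    dz = w_um / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        depth = (w_um - z) if bsi else z
        total += math.exp(-depth / mu_um) / mu_um * p(z) * dz
    return transmission * total


p = make_trigger_profile(z1_um=0.2, z3_um=1.0, p_dep=0.5, diff_len_um=1.0)
for w in (1.5, 3.0):
    fsi_red = pdp(mu_um=5.0, w_um=w, p=p, transmission=0.9)
    bsi_blue = pdp(mu_um=0.4, w_um=w, p=p, transmission=0.9, bsi=True)
    print(f"w = {w} um: FSI (mu = 5 um) {fsi_red:.3f}, BSI (mu = 0.4 um) {bsi_blue:.3f}")
```

With this profile, the thicker body raises the long-wavelength FSI value (more absorbing volume) and lowers the short-penetration BSI value (absorption moves away from the high-field region), mirroring the measured trends.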
Comparison between flexible trench-isolated SPAD and planar SOI SPAD
Trap-assisted avalanche is another major source of dark noise in SPADs and is caused by trap or defect states in silicon. To investigate the origin of these defect states in the device structure, a planar SPAD structure was developed with LOCOS isolation instead of trenches, as shown in Fig. 9(b) [27, 28]. An SEM micrograph of the circular planar SPAD, isolated by 840nm-thick LOCOS grown by wet oxidation, is shown in Fig. 9(a). The DCR measurement shows much lower dark noise in the planar SPAD. For multiplication-region diameters from 3µm to 6µm, the DCR density varied from 100Hz/µm² to 300Hz/µm², comparable with the DCR density of standard SOI CMOS SPADs [23]. The peak PDP was 13% at 450nm. The DCR density and PDP of trench-isolated and planar SPADs at different temperatures are compared in Fig. 10.
Defects at the sidewalls of plasma-etched trenches act as traps that cause DCR through SRH generation and trap-assisted tunneling. This is supported by the DCR density comparison: trench-isolated SPADs show a 10 times higher DCR density than planar SPADs at room temperature. At cryogenic temperatures as low as 80K, trap-assisted avalanches are suppressed, and trench-isolated SPADs show a DCR density similar to that of planar SPADs. However, planar SPADs are less flexible because they lack silicon islands. Passivation technology [29] to neutralize the defects around the trenches needs to be investigated in future work.
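The sidewall argument could be made more quantitative with an area/perimeter decomposition, a common technique that was not applied in this work: if the DCR of devices with several diameters is written as DCR = ρ_A·A + ρ_P·P, a large perimeter coefficient ρ_P points to defects at the isolation edge, such as etched trench sidewalls. The sketch below solves this two-parameter least-squares problem directly; the function name is hypothetical and the DCR values in the example are synthetic, generated from assumed coefficients rather than measured.

```python
import math


def fit_area_perimeter(diameters_um, dcr_hz):
    """Least-squares fit of DCR = rho_a * A + rho_p * P for circular SPADs.

    A = pi d^2 / 4 (um^2) and P = pi d (um). Returns (rho_a in Hz/um^2,
    rho_p in Hz/um) by solving the 2x2 normal equations directly.
    """
    a = [math.pi * d * d / 4.0 for d in diameters_um]
    p = [math.pi * d for d in diameters_um]
    saa = sum(x * x for x in a)
    spp = sum(x * x for x in p)
    sap = sum(x * y for x, y in zip(a, p))
    say = sum(x * y for x, y in zip(a, dcr_hz))
    spy = sum(x * y for x, y in zip(p, dcr_hz))
    det = saa * spp - sap * sap
    if abs(det) <= 1e-12 * saa * spp:
        raise ValueError("need at least two distinct diameters")
    rho_a = (say * spp - sap * spy) / det
    rho_p = (saa * spy - sap * say) / det
    return rho_a, rho_p


# Synthetic example: assumed rho_a = 100 Hz/um^2 and rho_p = 50 Hz/um.
d = [3.0, 4.0, 5.0, 6.0]
dcr = [math.pi * x * x / 4.0 * 100.0 + math.pi * x * 50.0 for x in d]
print(fit_area_perimeter(d, dcr))
```

Applied to trench-isolated and planar devices of matching diameters, a perimeter term that is large for the former and small for the latter would support the sidewall-defect interpretation.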
Characteristics of flexible CMOS SPAD sensor pixel
Following the fabrication flow of Fig. 2, the first SPAD image sensor integrated with a CMOS buffering circuit on a flexible substrate is demonstrated, as shown in the SEM micrograph of Fig. 11(a). A photograph of a bent flexible SPAD pixel-farm sample, released and mounted on a piece of PDMS, is shown in Fig. 11(b). The current-voltage (I-V) characteristics of the SPAD under different light conditions are shown in Fig. 12(a); the breakdown voltage is around 26.5V at room temperature. The input-output response of the CMOS inverter, which consists of three-terminal SOI PMOS and NMOS transistors, is shown in Fig. 12(b).
Compared with our previous work [13], the excess bias could be increased to 4V by integrating CMOS buffering. DCR as a function of excess bias is shown in Fig. 13. The pixel was also operated at cryogenic temperatures as low as 80K, which to our knowledge is reported here for the first time; the DCR is reduced dramatically, as shown in Fig. 14.
When the temperature decreased from 200K to 80K, the DCR was reduced by a factor of 4: below 200K, DCR is dominated by band-to-band tunneling, which has a weaker dependence on T. At 80K, the DCR density reaches around 150Hz/µm². This cryogenic dark noise is comparable to that of the planar SPAD at room temperature, consistent with trap-assisted avalanches being suppressed. To characterize the sensitivity of the pixel, the PDP was measured in both FSI and BSI, as shown in Fig. 15. The peak PDP was 13% in FSI and 12.5% in BSI, showing similar performance. Thanks to the buffering circuits, the dead time, which depends on the RC recharge time in a passive quenching scheme, was reduced to around 160ns; consequently, the PDP was enhanced because of the increased saturation frequency, which determines the upper limit of detectable photon flux. Furthermore, compared with our previous work [13], the PDP of BSI SPADs was also improved. The main reason is that the large thermal budget of the CMOS process diffuses the epitaxial body doping into the intrinsic silicon layer on top of the buried oxide (Fig. 8(b)). Hence, the electric field across the neutral region near the buried oxide is enhanced, increasing p(z) between z_3 and w_z and raising the probability that photo-generated carriers in BSI drift to the depletion region where avalanches are triggered. This observation is further confirmed by timing jitter measurements, which quantify the uncertainty of the interval between the arrival of a photon at the SPAD and the generation of the output pulse.
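The link between dead time and saturation frequency can be made explicit with the standard dead-time counting models. The sketch below assumes idealized non-paralyzable and paralyzable behavior; a passively quenched SPAD with gradual RC recharge lies somewhere between the two, so these functions bound rather than reproduce the pixel's response. The function names are hypothetical. With the 160ns dead time reported above, the saturation rate is 1/160ns = 6.25MHz.

```python
import math


def saturation_rate(dead_time_s):
    """Upper limit of the count rate for a given dead time, in Hz."""
    return 1.0 / dead_time_s


def measured_rate_nonparalyzable(true_rate_hz, dead_time_s):
    """Non-paralyzable model: events during the dead time are lost, not extended."""
    return true_rate_hz / (1.0 + true_rate_hz * dead_time_s)


def true_rate_nonparalyzable(measured_rate_hz, dead_time_s):
    """Inverse of the model above, used to correct measured count rates."""
    if measured_rate_hz * dead_time_s >= 1.0:
        raise ValueError("measured rate at or above saturation")
    return measured_rate_hz / (1.0 - measured_rate_hz * dead_time_s)


def measured_rate_paralyzable(true_rate_hz, dead_time_s):
    """Paralyzable model: each event restarts the dead time; peaks at 1/(e*tau)."""
    return true_rate_hz * math.exp(-true_rate_hz * dead_time_s)


tau = 160e-9  # dead time reported for the CMOS-buffered pixel
print(f"saturation rate ~ {saturation_rate(tau) / 1e6:.2f} MHz")
for r in (1e5, 1e6, 1e7):
    print(r, measured_rate_nonparalyzable(r, tau), measured_rate_paralyzable(r, tau))
```

Shortening the dead time moves the onset of count-rate compression to higher photon flux, which is the sense in which the buffered pixel extends the usable detection range.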
To evaluate the timing jitter of the flexible CMOS SPAD sensor, the sensor was illuminated in both FSI and BSI by a laser source operated at 40MHz, emitting light pulses with a timing jitter of a few picoseconds. Neutral density filters were used to attenuate the light to the single-photon regime. The time difference between the laser output trigger and the falling edge of the inverter buffer output, which is synchronized with the rising avalanche pulse at the quenching resistor, was measured with a high-speed oscilloscope (LeCroy Wavemaster 8600A, trigger jitter less than 2.5ps). The statistical distribution of the time difference was used to estimate the device jitter in terms of full width at half maximum (FWHM).
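The FWHM extraction described above can be sketched as follows: locate the histogram peak, find the half-maximum crossings on both sides by linear interpolation between bins, and take their difference. A second helper removes independent instrument contributions (laser and trigger jitter) in quadrature, which is strictly valid only for Gaussian components; real SPAD timing histograms have diffusion tails, so this correction is approximate. The function names are hypothetical. With the few-picosecond laser and 2.5ps trigger jitter quoted here, the correction to a jitter of several hundred picoseconds is negligible.

```python
import math


def _interp_x(x0, y0, x1, y1, y):
    """x at which the straight segment (x0, y0)-(x1, y1) reaches height y."""
    return x0 + (y - y0) * (x1 - x0) / (y1 - y0)


def fwhm_from_histogram(centers, counts):
    """FWHM of a single-peaked histogram, interpolating the half-maximum crossings."""
    if len(centers) != len(counts) or len(counts) < 3:
        raise ValueError("need matching centers/counts with at least 3 bins")
    peak = max(range(len(counts)), key=lambda i: counts[i])
    half = counts[peak] / 2.0
    left = peak
    while left > 0 and counts[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(counts) - 1 and counts[right + 1] >= half:
        right += 1
    if left == 0 or right == len(counts) - 1:
        raise ValueError("half-maximum crossing lies outside the histogram range")
    x_left = _interp_x(centers[left - 1], counts[left - 1], centers[left], counts[left], half)
    x_right = _interp_x(centers[right], counts[right], centers[right + 1], counts[right + 1], half)
    return x_right - x_left


def remove_in_quadrature(measured_fwhm, *other_fwhm):
    """Subtract independent Gaussian-like jitter contributions in quadrature."""
    residual = measured_fwhm ** 2 - sum(x ** 2 for x in other_fwhm)
    if residual <= 0:
        raise ValueError("contributions exceed the measured FWHM")
    return math.sqrt(residual)


# Hypothetical usage: a synthetic Gaussian histogram (sigma = 200 ps, 1 ps bins).
centers = [float(t) for t in range(-2000, 2001)]
counts = [math.exp(-0.5 * (t / 200.0) ** 2) for t in centers]
print(f"FWHM ~ {fwhm_from_histogram(centers, counts):.1f} ps")
print(f"intrinsic ~ {remove_in_quadrature(450.0, 3.0, 2.5):.1f} ps")
```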
In this work, the jitter measurement was performed with two lasers, at wavelengths of 405nm and 637nm (Advanced Laser Diode Systems GmbH), each with a timing jitter of less than 3ps. The results are summarized in Fig. 16. With the 405nm laser, FWHM jitters of 590ps and 520ps were measured in FSI and BSI, respectively. Compared with a single SPAD without CMOS integration [13], the jitter mismatch between FSI and BSI operation is significantly reduced, indicating a similar uncertainty in charge collection time whether light reaches the front or the back side of the sensor. With the 637nm laser, the FWHM jitter was 450ps in FSI and 480ps in BSI. The smaller FWHM with the 637nm laser than with the 405nm laser indicates that photons with longer penetration depth generate carriers closer to the multiplication region. The symmetric jitter performance in FSI and BSI is consistent with the PDP measurements.
Afterpulsing is a type of correlated noise caused mostly by traps. Traps with energy levels close to the band edges can capture carriers during the avalanche and release them after a lifetime on the order of nanoseconds; the minimum hold-off time is therefore limited by afterpulsing [30]. A histogram of avalanche inter-arrival times with an exponential fit is shown in Fig. 17(a). With a dead time of 160ns, the afterpulsing probability is around 1.9%, as shown in Fig. 17(b). At a dead time of 1.5µs, the afterpulsing probability is less than 0.1%, about 5 times better than in [13].
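The afterpulsing estimate from an inter-arrival histogram can be sketched as follows, assuming that uncorrelated dark counts follow an exponential inter-arrival distribution: fit an exponential to the tail of the histogram, where afterpulses have decayed, extrapolate it to all bins, and attribute the excess counts to afterpulsing. The fit-window start, the unweighted log-linear fit and the function name are assumptions; for sparse bins a Poisson-weighted or maximum-likelihood fit would be more robust.

```python
import math


def afterpulsing_probability(centers_s, counts, fit_start_s):
    """Estimate afterpulsing probability from an inter-arrival-time histogram.

    An exponential is fitted (log-linear least squares) to bins with
    t >= fit_start_s, where afterpulses are assumed to have died out, and
    extrapolated to every bin. Counts above the fit are attributed to
    afterpulsing and normalized by the total number of counts.
    """
    pts = [(t, c) for t, c in zip(centers_s, counts) if t >= fit_start_s and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-empty bins in the fit window")
    n = len(pts)
    mx = sum(t for t, _ in pts) / n
    my = sum(math.log(c) for _, c in pts) / n
    sxx = sum((t - mx) ** 2 for t, _ in pts)
    sxy = sum((t - mx) * (math.log(c) - my) for t, c in pts)
    slope = sxy / sxx  # slope = -lambda, the rate of uncorrelated (Poisson) counts
    intercept = my - slope * mx
    fitted_total = sum(math.exp(intercept + slope * t) for t in centers_s)
    total = sum(counts)
    return max(total - fitted_total, 0.0) / total


# Hypothetical usage: 10 ns bins starting at a 160 ns dead time, 1 us decay
# constant, plus 50 extra (afterpulse-like) counts in the first bin.
centers = [i * 10e-9 for i in range(16, 500)]
counts = [1000.0 * math.exp(-t / 1e-6) for t in centers]
counts[0] += 50.0
print(f"afterpulsing probability ~ {afterpulsing_probability(centers, counts, 300e-9):.2%}")
```

Repeating this for several dead-time settings gives a curve of afterpulsing probability versus dead time like the one summarized in Fig. 17(b).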
The comparison between flexible trench-isolated SPAD, planar SPAD, and flexible CMOS-buffering SPAD pixel is summarized in Table 1 .
Conclusions
The Geiger-mode performance of flexible ultrathin-body SPADs has been comprehensively analyzed. Dark counts due to band-to-band tunneling and trap-assisted avalanches in this structure can be reduced to 0.15kHz/µm² by optimizing the multiplication doping and by operating at cryogenic temperatures as low as 80K. By integrating pixel-level CMOS buffering circuits, the excess bias could reach 4V, enabling a high peak PDP of 13% in FSI and 12.5% in BSI, and better jitter performance, with FWHM values of 450ps and 480ps in frontside and backside illumination, respectively. Afterpulsing and crosstalk are negligible (less than 0.1% at nominal dead times longer than 1µs). This technology provides a suitable solution for advanced implantable photon-counting devices for retinal prostheses and other localized therapeutic and diagnostic solutions. Other applications include flexible multi-aperture imaging, anti-vignetting focal-plane optimization, (implantable) biocompatible chronic medical monitoring and any application where bent image sensors are essential.
