a) These authors contributed equally to this work.
Photonic integrated circuits (PICs) have emerged as a scalable platform for complex quantum technologies using photonic and atomic systems [1] [2] [3]. A central goal has been to integrate photon-resolving detectors to reduce optical losses, latency, and wiring complexity associated with off-chip detectors. Superconducting nanowire single-photon detectors (SNSPDs 4,5) are particularly attractive because of their high detection efficiency 6, sub-50-ps timing jitter 7, nanosecond-scale reset time 8, and sensitivity from the visible to the mid-infrared spectrum 9.
However, while single SNSPDs have been incorporated into individual waveguides 10,11, the system efficiency of multiple SNSPDs in one photonic circuit has been limited to below 0.2% 12,13 due to low device yield 14. Here we introduce a micrometer-scale flip-chip process that enables scalable integration of SNSPDs on a range of PICs. Ten low-jitter detectors were integrated on one PIC with 100% device yield. With an average system efficiency beyond 10% for multiple
SNSPDs on one PIC, we demonstrate high-fidelity on-chip photon correlation measurements of nonclassical light.
Photonic integrated circuits are being developed for a wide range of applications in quantum information science, including quantum simulation 1, [15] [16] [17] , quantum photonic state generation [18] [19] [20] [21] , quantum-limited detection 22 , and linear optical quantum computing 2,23-25 .
These applications require multiple detectors with low timing jitter. The lowest timing jitter for infrared photon detection has been achieved with SNSPDs based on sub-100-nm-wide and ∼ 4-nm-thick niobium nitride (NbN) nanowires. However, to date there has been no scalable approach to integration of SNSPDs into photonic circuits: while single isolated waveguide-integrated SNSPDs have been demonstrated 10, 11 , the highest reported system detection efficiency for just two SNSPDs integrated into the same photonic circuit remains significantly below 1% 12, 13 . The central challenge when building systems with multiple SNSPDs remains the low fabrication yield, which is limited by defects at the nanoscale 14 .
This yield problem is exacerbated when such detectors are integrated onto photonic chips, which can require tens of additional fabrication steps of their own. Here we report on a micrometer-scale flip-chip process developed to overcome the yield problem by separating the PIC and the SNSPD fabrication processes. Our approach is compatible with a wide range of PICs, including CMOS-compatible silicon photonics, in a back-end-of-the-line step. The SNSPDs were fabricated on ∼ 200-nm-thick silicon nitride (SiN x ) membranes; silicon-on-insulator (SOI)
PICs were fabricated separately (see Methods). After evaluating the SNSPDs in a cryostat, high-performance detectors were selected from the fabrication chip and transferred onto the desired SOI waveguides. Using this method, we assembled a proof-of-concept photonic circuit, shown in Fig. 1(b) , comprising an optical network with two input and four output ports, each coupled to an SNSPD. We measured an estimated on-chip detection efficiency up to 45% for 1550-nm-wavelength single photons and timing jitter as low as 42 ps. The light was coupled into the waveguides using inverse tapered couplers with ∼ 3 dB insertion loss 26 , resulting in a system detection efficiency (from the external fiber) up to 19 ± 2%.
This system efficiency enables the first on-chip intensity autocorrelation measurements of nonclassical light, demonstrated here for photon pairs generated by spontaneous parametric down conversion. The detector comprised multiple nanowires connected in parallel (see SI), as shown in Fig. 2(a) . This SNSPD variant 27, 28 has been shown to double the signal-to-noise ratio of the photodetection voltage compared to traditional single-wire SNSPDs. The detector length was designed using a finite-element model 29 to ensure optical absorption exceeding 50% (see SI).
We fabricated 225 detectors on a ∼ 200-nm-thick SiN x layer over a Si substrate. The underlying silicon was then etched (see Methods), leaving hundreds of free-standing membranes carrying SNSPDs. One of these suspended membranes is shown in Fig. 2 (b). Each membrane was connected to the bulk substrate through six narrow (∼ 2-µm-wide) bridges, two of which connected the detector on the membrane electrically to large contact pads on the bulk substrate for testing the detectors after the etch step (see SI).
We characterized all detectors to identify low-jitter, high-efficiency devices (typically about 30% of the detectors). As shown in Fig. 2 (c), we removed selected detector membranes from the substrate using tungsten microprobes coated with polydimethylsiloxane (PDMS)
adhesive. We then placed membranes detector-side-down onto the target waveguide with sub-1-µm alignment accuracy under an optical microscope. For electrical readout, the gold pads on the membranes contacted complementary pads on the PIC (Fig. 2(d)). These gold-gold contacts withstood repeated thermal cycles with no noticeable degradation (see SI).

By transferring only pre-selected high-performance detectors, we were able to achieve perfect yield in the assembled device, resolving the non-scalability of low-jitter SNSPD fabrication 14. Using this process, we integrated four detectors (labeled A1, A2, B1 and B2) on a PIC and characterized the performance of the PIC shown in Figs. 1(b,c) using four parameters: system detection efficiency (SDE), on-chip detection efficiency (ODE), FWHM timing jitter (TJ), and noise-equivalent incident power (NEIP). The SDE includes all losses (i.e., coupling and transmission) between the fiber port outside the cryostat and the detector.
We determined the SDE from the ratio of the SNSPD photocount rate to the photon flux coupled into the fiber port (see SI). Our chip reached an SDE of 19% for input A (11% for A1 and 8% for A2) and 7% for input B (3% for B1 and 4% for B2). These SDE values represent an improvement of two orders of magnitude compared to previous approaches for multi-detector integration 12.
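The SDE definition above (photocount rate divided by the photon flux at the fiber port) can be illustrated with a minimal Python sketch. The function names, the 1 pW input power, and the count rate are assumptions chosen purely for illustration and are not measured values from this work; the photon energy is computed from the 1550 nm wavelength using the exact SI values of h and c.

```python
# Sketch: system detection efficiency (SDE) as count rate / incident photon flux.
# All numerical inputs in the example are hypothetical, for illustration only.

PLANCK_H = 6.62607015e-34   # Planck constant in J*s (exact SI value)
LIGHT_C = 299_792_458.0     # speed of light in m/s (exact SI value)


def photon_flux(power_w: float, wavelength_m: float) -> float:
    """Photons per second carried by an optical power at a given wavelength."""
    photon_energy_j = PLANCK_H * LIGHT_C / wavelength_m
    return power_w / photon_energy_j


def system_detection_efficiency(count_rate_hz: float, power_w: float,
                                wavelength_m: float = 1550e-9) -> float:
    """Ratio of the detector photocount rate to the photon flux at the fiber port."""
    return count_rate_hz / photon_flux(power_w, wavelength_m)


# Hypothetical example: 1 pW at the fiber port and 8.6e5 counts per second.
example_sde = system_detection_efficiency(count_rate_hz=8.6e5, power_w=1e-12)
print(f"SDE = {example_sde:.1%}")
```

This sketch does not subtract dark counts from the count rate; whether and how that correction is applied in the actual measurement is described in the SI, not here.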
The ODE is defined as the probability that a photon already coupled into the waveguide is detected 11,12 (see SI). We estimated the ODE as SDE/$\eta_c$, where $\eta_c = 0.25$ accounts for coupling losses into the PIC (3 dB) and the splitting ratio of the directional couplers before the SNSPDs (3 dB). The transferred detectors reached ODEs between 12% and 45% and 42- to 65-ps TJ.
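As a quick check of this estimate, the following Python sketch divides the rounded SDE values quoted above by the stated coupling factor of 0.25. The dictionary and function names are illustrative assumptions; the numbers are the ones given in the text.

```python
# Sketch: on-chip detection efficiency (ODE) estimated as SDE / eta_c.
# SDE values and eta_c = 0.25 are taken from the text; names are illustrative.

ETA_C = 0.25  # 3 dB fiber-to-chip coupling loss times 3 dB directional-coupler split

SDE = {"A1": 0.11, "A2": 0.08, "B1": 0.03, "B2": 0.04}


def on_chip_efficiency(sde: float, eta_c: float = ETA_C) -> float:
    """Estimate the ODE by dividing out the coupling and splitting losses."""
    return sde / eta_c


ODE = {name: on_chip_efficiency(value) for name, value in SDE.items()}

for name, value in ODE.items():
    print(f"{name}: ODE = {value:.0%}")
```

With the rounded SDE values, the estimates span 12% to 44%, consistent with the reported range of 12% to 45%.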
The NEIP is given by (SDCR/SDE)·$\hbar\omega$, where SDCR is the system dark count rate and $\hbar\omega = 0.81$ eV. Fig. 3(b) shows the NEIP vs. ODE for the waveguide detectors on couplers A and B. The ratio of the power incident onto the detectors (IP) to the NEIP characterizes the signal-to-noise ratio for single-shot measurements. In this work, the NEIP was limited by radiation leakage (see SI) through a cryostat window used to image and align the lensed fibers to the polymer couplers (Fig. 1(c-I)). Hence, for subsequent measurements, we operated the detectors at lower ODEs of 10-32% (circled points in Fig. 3(b)), which reduced the dark count rate and resulted in a ratio of IP/NEIP ∼ 0.5-1.7.
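The NEIP expression and the IP/NEIP ratio can be sketched in Python as follows. The photon energy of 0.81 eV is the value given in the text; the dark count rate, SDE, and incident power in the example are hypothetical placeholders, since the measured SDCR values are not listed here.

```python
# Sketch: noise-equivalent incident power, NEIP = (SDCR / SDE) * photon energy.
# The dark count rate, SDE, and incident power used below are hypothetical.

EV_TO_J = 1.602176634e-19   # joules per electronvolt (exact SI value)
PHOTON_ENERGY_EV = 0.81     # photon energy quoted in the text


def neip_watts(sdcr_hz: float, sde: float,
               photon_energy_ev: float = PHOTON_ENERGY_EV) -> float:
    """Optical power at the fiber port that produces counts at the dark count rate."""
    return sdcr_hz / sde * photon_energy_ev * EV_TO_J


def ip_to_neip_ratio(incident_power_w: float, neip_w: float) -> float:
    """Single-shot signal-to-noise figure used in the text (IP / NEIP)."""
    return incident_power_w / neip_w


# Hypothetical operating point: 10 kHz system dark count rate at 8% SDE.
example_neip = neip_watts(sdcr_hz=1e4, sde=0.08)
example_ratio = ip_to_neip_ratio(incident_power_w=2e-14, neip_w=example_neip)
print(f"NEIP = {example_neip:.3e} W, IP/NEIP = {example_ratio:.2f}")
```

The sketch makes the trade-off described above explicit: lowering the bias reduces both SDE and SDCR, and the NEIP improves only if the dark count rate falls faster than the efficiency.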
We used these high-SDE SNSPDs to characterize time-energy entangled photon pairs entirely on the PIC. Entangled photon pairs were generated by spontaneous parametric down conversion (SPDC) from a 1-cm periodically poled potassium titanyl phosphate (PPKTP)
waveguide, as shown in Fig. 3(a). Signal and idler photons of ∼ 1 ps duration and orthogonal polarization were separated using a polarizing beam splitter and sent into inputs A and B of the PIC. The SPDC pump power was adjusted to generate pairs at $\sim 1.5 \times 10^{8}$ Hz, corresponding to a multi-pair probability of $\sim 4.4 \times 10^{-4}$ per TJ. We obtained the second-order correlation function from $g^{(2)}_{AB}(\tau_i) = N_{AB}(\tau_i)/(r_A r_B \Delta\tau T)$, where $N_{AB}(\tau_i)$ is the measured number of coincidences between inputs A and B at time difference $\tau_i$, $r_A$ ($r_B$) is the count rate from input A (B), $\Delta\tau$ is the coincidence bin duration, and $T$ is the integration time. Fig. 3(d) shows the $g^{(2)}_{AB}(\tau_i)$ function. Photon bunching is evident between inputs A and B, but not within individual channels (i.e., between A1 and A2 or B1 and B2), as expected for an entangled photon source. The observed peak heights of $g^{(2)}_{AB}(0) \sim 6$ are lower than the theoretical value for ideal detectors due to the finite IP/NEIP ratio of our detectors (see Methods). By contrast, when pulses from a mode-locked laser were injected into inputs A and B with average photon number per pulse greater than one, bunching was observed between all detector pairs (Fig. 3(e)), as expected for a pulsed classical source.

Figure 3. (a) $g^{(2)}_{AB}(\tau)$-measurements of an entangled-photon source coupled into the PIC (cooled to 3 K). (b) Noise-equivalent incident power vs. on-chip efficiency for the detectors shown in Fig. 1(b). The circles mark the operation points chosen for subsequent coincidence measurements. (c) Photodetection delay histogram of the detectors shown in Fig. 1(b) when operated at the maximum on-chip efficiency. (d, e) Coincidence counts vs. time delay between B1 and {A1, A2, B2} for the entangled-photon-pair source (d) and for a mode-locked sub-ps-pulsed laser (e). The average laser power was adjusted to match that of the photon-pair source.
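The normalization used for the second-order correlation function in the text, the coincidence count in each delay bin divided by the accidental count expected from the two singles rates, can be sketched in Python as below. The function name and all numbers in the example (count rates, bin width, integration time, histogram) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
# Sketch: normalize a coincidence histogram into g2 using
# g2(tau_i) = N_AB(tau_i) / (r_A * r_B * bin_width * integration_time).
# All example numbers are hypothetical.

from typing import List, Sequence


def g2_from_coincidences(coincidences: Sequence[float], rate_a_hz: float,
                         rate_b_hz: float, bin_s: float,
                         integration_s: float) -> List[float]:
    """Divide each coincidence bin by the accidental count expected per bin."""
    expected_accidentals = rate_a_hz * rate_b_hz * bin_s * integration_s
    return [n / expected_accidentals for n in coincidences]


# Hypothetical data: 100 ps bins, 100 s integration, singles rates of 100 kHz
# and 50 kHz, which gives about 50 accidental coincidences per bin.
histogram = [50, 48, 300, 52, 50]
g2 = g2_from_coincidences(histogram, rate_a_hz=1e5, rate_b_hz=5e4,
                          bin_s=100e-12, integration_s=100.0)
print([round(value, 2) for value in g2])
```

Bins far from zero delay sit near 1 (uncorrelated arrivals), while a bin with excess coincidences rises above 1, which is the bunching signature discussed in the text.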
The ability to pre-select functioning devices enables scaling to more detectors with unity yield. Fig. 4(a) shows ten SNSPDs (D1-10) on adjacent waveguides with TJ values of 39-57 ps for 1550-nm-wavelength light. For rapid characterization, these devices were measured by top illumination in a cryogenic probe station. The photodetection delay histograms for all detectors are shown in Fig. 4(b). The membrane transfer demonstrated here could be used to integrate other electro-optic devices, such as III-V lasers or single-photon sources, onto PICs. Since the device membrane is flexible, it conforms to the target chip, even if that chip is not perfectly flat. Because of the small size of the membrane, the process is also relatively tolerant to defects on the target chip, as opposed to processes involving large-area flip-chip bonding (e.g., see Ref. 30), which require both surfaces to be free of defects.
In conclusion, we have demonstrated the scalable integration of high-performance SNSPDs into photonic integrated circuits. We assembled ten adjacent waveguide-integrated detectors on a silicon PIC with 100% yield and observed detector timing jitter values between 39 and 57 ps. Waveguide-integrated SNSPDs on the same PIC enabled on-chip $g^{(2)}(\tau)$-measurements of nonclassical light. Scaling to many tens to hundreds of detectors would ultimately be limited by the readout complexity. There is ongoing work to address this problem using electrical multiplexing schemes 31. For more detectors, which require greater bandwidth, optical wavelength division multiplexing could be used, employing high-speed (> 50 GHz) modulators already available on PICs 32. The integration process demonstrated here is CMOS-compatible; indeed, the PICs used in this experiment were fabricated in a CMOS-compatible process with the exception of the polymer waveguide couplers, which can be replaced with SiN x 33. Thus, it appears likely that tens to hundreds of SNSPDs and other heterogeneous circuit elements can be integrated into high-performance PICs. This demonstration opens the door to fully integrated, high-performance photonic processors for quantum information science.

[...] were exposed outside the hairpin-shaped detector. These dummy structures, also referred to as proximity-effect-correction features, are shown as parallel lines in dark grey outside the detector in Fig. 2(a).
METHODS
Detector suspension. The detector was covered with S1813 and a trench pattern was exposed in the photoresist. This pattern was then used as an etch mask to define trenches around the detector through the SiN x layer via RIE with CF 4 . This trench pattern left the underlying silicon substrate exposed. The silicon under the SiN x layer was removed using XeF 2 , a selective isotropic etch gas. In the final step, the photoresist was removed in an NMP solution (see SI), resulting in a detector on a suspended SiN x membrane.
PIC fabrication. The PIC was fabricated on a 10 Ω-cm, p-doped, 200-mm silicon-on-insulator (SOI) wafer from SOITEC. The wafer had a 220-nm-thick silicon device layer on top of a 2-µm-thick buried oxide layer. The 500-nm-wide silicon waveguides were fabricated on a CMOS line at the IBM Watson Research Center using electron-beam lithography. In a subsequent optical lithography step, SU8 polymer couplers were fabricated to allow sub-3-dB coupling loss from a lensed fiber to the silicon waveguide (see Ref. 35 for further details). The gold pads on the PIC were fabricated in a similar manner to that outlined in the detector fabrication section above.
Timing jitter measurements. We used a mode-locked, sub-ps-pulse-width laser emitting at 1550 nm wavelength with a 38 MHz repetition rate. The laser output was split into two SMF28 fibers, which we coupled to the detector under test and to a low-timing-jitter photodiode. The light coupled to the detector was attenuated to < 5 pW, and operation of the detector in the single-photon regime was checked by confirming the linearity of the photocount rate as a function of incident photon flux (see SI). For detectors A1, A2, B1 and B2, the light was coupled to the waveguides A and B using a lensed fiber as shown in Fig. 1(b) and Fig. 1(c-I). The second sample, containing detectors D1-10, was back-illuminated through a high-NA fiber with light from the mode-locked laser, and the single-photon operation regime was confirmed as described above. The electrical outputs from the detector and from the photodiode were sent to a 6-GHz-bandwidth, 40-GSamples/s oscilloscope. We measured the time delay $t_D$ between the detector pulse (start signal) and the pulse from the fast photodiode (stop signal). We acquired the instrument response function (IRF), a histogram of > 2000 samples of $t_D$, and measured the timing jitter of the detector, which was defined as the FWHM of the IRF.
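The jitter definition above (FWHM of the delay histogram) can be illustrated with a small Python sketch. This is an assumed analysis approach and not necessarily the procedure used for the reported values: it finds the histogram peak and linearly interpolates the two half-maximum crossings. The function name and the noise-free Gaussian test histogram (20 ps standard deviation, 1 ps bins) are illustrative assumptions; a measured histogram of a few thousand samples is noisy and may call for smoothing or fitting first.

```python
# Sketch: FWHM of a delay histogram with linear interpolation at half maximum.
# The Gaussian test histogram below is synthetic and noise-free.

import math
from typing import Sequence


def fwhm_from_histogram(bin_centers: Sequence[float],
                        counts: Sequence[float]) -> float:
    """Width between the two half-maximum crossings around the histogram peak."""
    peak_index = max(range(len(counts)), key=lambda i: counts[i])
    half_max = counts[peak_index] / 2.0

    def crossing(i_outside: int, i_inside: int) -> float:
        # Interpolate between a bin below half maximum and its neighbour above it.
        x0, y0 = bin_centers[i_outside], counts[i_outside]
        x1, y1 = bin_centers[i_inside], counts[i_inside]
        return x0 + (half_max - y0) * (x1 - x0) / (y1 - y0)

    # Walk outwards from the peak while the bins stay at or above half maximum.
    left = peak_index
    while left > 0 and counts[left - 1] >= half_max:
        left -= 1
    right = peak_index
    while right < len(counts) - 1 and counts[right + 1] >= half_max:
        right += 1
    if left == 0 or right == len(counts) - 1:
        raise ValueError("histogram does not fall below half maximum on both sides")

    return crossing(right + 1, right) - crossing(left - 1, left)


# Synthetic example: Gaussian delay distribution, sigma = 20 ps, 1 ps bins.
sigma_ps = 20.0
centers_ps = [float(t) for t in range(-100, 101)]
counts = [1000.0 * math.exp(-t * t / (2.0 * sigma_ps ** 2)) for t in centers_ps]
jitter_ps = fwhm_from_histogram(centers_ps, counts)
print(f"FWHM = {jitter_ps:.1f} ps")
```

For a Gaussian distribution, the FWHM equals 2·sqrt(2·ln 2) times the standard deviation, which provides a convenient sanity check for the interpolation.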
Correlation measurements. $g^{(2)}_{AB}(\tau)$ can be calculated from experimental data using the formula given in the main text. To incorporate detector dark counts, we define rates $r^{Y}_{X}$, where X ∈ {A, B} (for channels A and B, respectively) and Y ∈ {P, D} (corresponding to a 'photon' and 'dark count,' respectively). $r$ [...]

where $\eta_H$ is the probability that channel B registers a photon given that channel A also registers a photon (i.e., the heralding efficiency) and $\Delta\tau$ is the bin duration. For $r$ [...]

$g^{(2)}_{AB}(0) =$ [...]

In our experiment, $g^{(2)}_{AB}(0) \approx 5$, which gives an estimate of the heralding efficiency, $\eta_H = 3.5 \times 10^{-3}$.
