Integrated quantum photonics is recognized as a key enabling technology on the road towards scalable quantum networking schemes. However, many state-of-the-art integrated quantum photonics demonstrations still require the coupling of light to external photodetectors. On-chip silicon single-photon avalanche diodes (SPADs) provide a viable solution as they can be seamlessly integrated with photonic components, and operated with high efficiencies and low dark counts at temperatures achievable with thermoelectric cooling. In this paper, we report the design and simulation of silicon waveguide-based SPADs on a silicon-oninsulator platform for visible wavelengths, focusing on two device families with different doping configurations: p-n + and p-i-n + . We calculate the photon detection efficiency (PDE) and timing jitter at an input wavelength of 640 nm by simulating the avalanche process using a 2-D Monte Carlo method, as well as the dark count rate (DCR) at 243 K and 300 K. For our simulated parameters, the optimal p-i-n + SPADs show the best device performance, with a saturated PDE of 52.4 ± 0.6% at a reverse bias voltage of 31.5 V, full-width-half-max (FWHM) timing jitter of 10 ps, and a DCR of <5 counts per second at 243 K.
many recent demonstrations still require the coupling of light to external single-photon detectors [2] , [3] . Major improvements in device footprint and scalability could be achieved if these photodetectors reside on the same chip and couple directly to the photonic waveguides [4] .
Superconducting nanowire single-photon detectors (SNSPDs) are a state-of-the-art solution, featuring waveguide integrability, near-unity quantum efficiencies, low dark count rate of a few counts per second (cps), and low timing jitter down to <20 ps [5] , [6] . However, they require cryogenic operating temperatures of a few degrees Kelvin, which is expensive and prohibitive for large-scale deployment.
A practical alternative can be found in single-photon avalanche diodes (SPADs), which are typically reverse biased beyond the breakdown voltage. In this so-called Geiger mode, a single incident photon can trigger a macroscopic avalanche current via a cascade of impact ionization processes. In contrast to SNSPDs, SPADs typically only require thermoelectric cooling and can even operate at room temperature [7] , [8] . Moreover, SPADs can benefit from mature complementary metal-oxide semiconductor (CMOS) fabrication technologies [9] , making them a promising candidate for scalable manufacturing.
Waveguide-coupled photodetectors have gained increasing attention in recent years; compared to traditional free-space devices, waveguide devices can achieve higher speeds and responsivities, and can also be integrated with photonics circuits and platforms [10] , [11] . To date, reports of waveguide-coupled SPADs have been limited to operation at infrared wavelengths [11] , [12] . However, many relevant quantum systems, including trapped ions [13] and color centers in diamond [14] , operate in the visible spectrum, which makes efficient, low-noise SPADs for visible wavelengths highly desirable. Such devices would also find numerous applications in other important technologies, including LIDAR [15] , non-line-of-sight imaging [16] , and fluorescence medical imaging [17] .
In this paper, we extend our recent work on the design and simulation of silicon waveguide-coupled SPADs for visible light operation, where we used a 2D Monte Carlo simulator to obtain the photon detection efficiency (PDE) and timing jitter, and studied the effect of different waveguide dimensions and doping concentrations [18] . Here we perform an in-depth study of different doping configurations, focusing on two device families: p-n + and p-i-n + . In addition to the PDE and timing jitter, we also analyze the expected dark count rate (DCR) at room temperature (a) SPAD structure, consisting of a Si rib waveguide end-fire coupled to an input Si 3 N 4 waveguide, (b) optical mode profile at 640 nm for the fundamental (quasi-)TE mode, (c) p-n + doping configuration with the junction placed at a distance Δj from the right edge of the waveguide core, (d) p-i-n + doping configuration with an intrinsic region width ΔW . The cross section is constant along the length of the waveguide. Images are not drawn to scale. and at −30 • C (243 K), which is a typical operating temperature achievable by Peltier coolers.
Many details regarding the basic SPAD geometry and simulation procedure can be found in ref. [18] and are not repeated here; instead we provide the essential points and highlight the improvements we have made on our previous work.
II. WAVEGUIDE-COUPLED SPAD DESIGNS

A. Device Geometry
The SPAD structure is shown in Fig. 1 . It is based on a silicon-on-insulator (SOI) platform, and consists of a 16 μm long silicon (Si) rib waveguide with an absorption of >99% at 640 nm. Input light is end-fire coupled from an input silicon nitride (Si 3 N 4 ) rectangular waveguide, which has high transmittivity at visible wavelengths [9] . We choose this input coupling geometry over a phase-matched interlayer transition, commonly used in integrated photodetectors for infrared wavelengths [12] , [19] , as the latter is difficult to achieve due to the large difference in refractive indices for Si (n = 3.8) and Si 3 N 4 (n = 2.1). An input coupling efficiency of >90% at the Si/Si 3 N 4 interface is obtained using 3D Finite Difference Time Domain (FDTD) simulations (Lumerical).
The structures are cladded with 3 μm of silicon dioxide (SiO 2 ) above and below. In this study, we fixed the waveguide core width and height at 900 nm and 340 nm respectively, with a shallow etch giving a rib height of 270 nm. A relatively wide waveguide width allows for greater flexibility in the diode junction design, as well as affording a greater tolerance for the fabrication of the junction.
Electrical connections to the device would be made via metal electrodes deposited on top of heavily-doped p ++ and n ++ regions at the far ends of the device (along the x axis).
B. Doping Configurations
Our previous simulation study of p-n + SPADs [18] showed that increasing waveguide core widths (up to 900 nm) could lead to a higher PDE, as charge carriers can travel a larger distance over which avalanche multiplication can occur. Here, we vary the placement of the p-n + junction, and investigate the hypothesis that increasing the displacement Δj of the junction beyond the edge of the waveguide core region ( Fig. 1(c) ) would also enhance this effective distance, and hence the PDE.
Another observation was that impact ionization was most efficient in a narrow region where the highest electric fields are concentrated (similar to Figs. 2(a)-(c)). Widening this high-field region could enhance the PDE, and is achievable by introducing an intrinsic region between the p-and n + -doped areas ( Fig. 1(d) ). However, doing so would also lower the peak electric field strength ( Fig. 2(d) ), which could in turn decrease the impact ionization efficiency. Here we explore the effectiveness of such p-i-n + devices, and attempt to find the optimum width of the intrinsic region ΔW , centered at 300 nm from the edge of the waveguide core.
For both device families, we maintain a constant geometry and doping profile along the length of the waveguide. In this study, we choose a n + (p) doping concentration of 1 × 10 19 (2 × 10 17 ) dopants/cm 3 , and a lightly p-doped intrinsic region with 1 × 10 15 dopants/cm 3 .
III. SIMULATION METHOD
A. DC Electrical Analysis
For each set of device dimensions and doping configurations, we perform a DC electrical analysis (ATLAS, Silvaco Inc.) by applying a reverse bias voltage V B across the device electrodes.
For each device, the cathode and anode are placed equidistant from the center of the Si waveguide, with a minimum n + region width of 45 nm. We thus obtain the electric field F(r), ionization coefficients, and other parameters dependent on the 2D position vector r in the x − y plane; these are required for the Monte Carlo simulation of the avalanche process. Further details can be found in ref. [18] . The breakdown voltage is also identified as the reverse bias voltage V B at which the device current increases sharply and crosses a threshold of 0.01 mA.
B. 2-D Monte Carlo Simulator
In comparison to deterministic techniques [20] , Monte Carlo simulators are well-suited for analyzing SPAD performance, as they can evaluate the timing jitter by modeling the stochastic nature of the impact ionization and avalanche buildup processes. For applications such as quantum key distribution (QKD) [21] and LIDAR [22] , low timing jitter is critical to the overall system performance.
In this work, we adapt the 2D Monte Carlo simulator detailed in ref. [18] . Briefly, a random path length (RPL) model is used to simulate the avalanche multiplication process [23] [24] [25] . Each simulation run starts with a photon absorption which creates an electron-hole pair. At each time step of interval Δt rpl , each charge carrier is accelerated by the electric field and, depending on the ionization coefficients, probabilistically causes an impact ionization after traversing a random path length. This creates further electron-hole pairs, which can then undergo further impact ionizations and eventually lead to a self-sustaining avalanche. Charge carriers are lost when they exit the device boundaries; we note that unlike in ref. [18] , the Monte Carlo simulation in this work considers the entire device area (the whole of the p, i, and n + regions) and is not restricted to the waveguide core region.
A successful detection event results if the device current reaches a detection threshold I det . Treating the success and failure outcomes as a binomial distribution, the PDE is then the fraction of successful detection events over all simulation runs, with an uncertainty given by the standard deviation (s.d.). The distribution of avalanche times (i.e. time between photon absorption and reaching I det ) yields the timing jitter.
1) Diffusion in Quasi-Neutral Regions:
The SPAD can be divided into a depletion region and quasi-neutral regions depending on the electric field strength. In the depletion region, the dominant charge carrier transport process is the drift force due to the strong electric fields, and the RPL model applies. However, in the quasi-neutral regions where electric fields are weak, impact ionization can be neglected, and a diffusion model which combines random walks (driven by Brownian motion) and the electric drift force is more suitable. Similar to ref. [18] , we use a threshold field to define the quasi-neutral region, i.e. |F(r)| < F thr = 1 × 10 5 V/cm, which is on the same order as the breakdown field in silicon [26] .
We use the fundamental (quasi-)TE mode profile ( Fig. 1(b) ) as a probability density map to determine the location where the initial electron-hole pair is injected for each simulation run. If the injection occurs in the quasi-neutral regions, charge carrier transport is simulated using the diffusion model; if the charge carrier crosses over to the depletion region, the simulation continues under the RPL model.
2) Device Current Via Shockley-Ramo's Theorem: Ref. [18] calculates the device current using a 1D approximation of Ramo's theorem, which only considers the motion of charge carriers in one direction. However, this would not be suitable here given our SPAD designs and more complex electric field profiles. As such, we use the generalized Shockley-Ramo's current theorem [27] , [28] , where each charge carrier i at position r i contributes to the device current I induced on the cathode via:
where q i is the charge, v i (r i ) is the instantaneous velocity, and F 0 (r i ) is a weighting electric field calculated in a similar way to F(r), but under these modified conditions: (i) the cathode is at unit potential, while the anode is grounded; (ii) all charges (including space charges) are removed, i.e. the waveguide is undoped [29] .
C. Dark Count Rate
Even in the absence of light, free charge carriers may be generated, which can probabilistically trigger avalanche events and result in dark counts. Due to the high electric fields in SPADs, the most relevant carrier generation mechanisms are thermal excitation enhanced by trap-assisted tunneling (TAT), and band-to-band tunneling (BTBT).
We quantify the dark noise by calculating the DCR R D (T ) via [30] :
(2) where T is the temperature in Kelvin, L = 16 μm is the SPAD length, P trig (r) is the avalanche triggering probability, and G TAT (r, T ), G BTBT (r, T ) are the net generation rates of charge carriers (per unit volume) of their respective mechanisms.
1) Trap-Assisted Tunneling: The thermal generation rate of carriers can be obtained from the Shockley-Read-Hall (SRH) model, modified to account for TAT [31] , [32] :
where n i (T ) is the intrinsic carrier concentration and τ g (r, T ) is the electron-hole pair generation lifetime, which can be expressed in terms of the recombination lifetime τ r (r, T ) [33] :
where the exponential term describes the main temperature dependence in TAT, and the field effect function Γ(F(r), T ) describes the effect of electric fields. E t and E i are the energy levels of the recombination centers (assumed to be equal to that of traps at the Si/SiO 2 interface [34] ) and the intrinsic Fermi level, respectively, and k B is the Boltzmann constant. The field effect function Γ(F(r), T ) is:
in which
where q is the electron charge, and m * t = 0.25 m 0 is the effective electron tunneling mass, with m 0 being the electron rest mass [35] .
2) Band-to-Band Tunneling: The BTBT mechanism has been shown to be important at electric field strengths above 7 × 10 5 V/cm, where band-bending is sufficiently strong to allow significant tunneling of electrons from the valence band to the conduction band [35] . This rate can be expressed as:
where B A , B B , and B Γ are model parameters; we use values based on ref [35] . The values of the parameters used in our calculations are listed in Table I , and further details of their derivation can be found in the Appendix.
3) Avalanche Triggering Probability: To obtain the avalanche triggering probability P trig (r) for each device, we perform >40 k Monte Carlo simulation runs, with photon absorption positions distributed uniformly across the device. A representative map of P trig (r) is shown in Fig. 3 .
IV. SIMULATOR OPTIMIZATION
The Monte Carlo simulations can become computationally expensive due to the need to keep track of and model individual charge carriers, especially when the number of charge carriers grows exponentially during the avalanche process. If we would use the same simulation parameters in our previous work [18] to model one SPAD at a given bias V B , our simulator (implemented in Python) would require ∼24 k CPU-hours on two sets of 12-core CPUs (Intel Xeon E5-2690 v3). Such a high computation cost would limit the variety of SPAD designs we can feasibly study.
Thus, we first use a representative device (p-n + SPAD with Δj = −50 nm, at V B = 21.5 V) to perform a series of preliminary studies to optimize the simulation parameters: the detection threshold I det , RPL time step Δt rpl , and number of simulation runs per parameter set. We aim to reduce computation time without sacrificing the simulation accuracy.
A. Detection Current Threshold
A reasonable discriminator threshold in experimental SPAD characterization setups is I det = 0.2 mA [36] , a value we used previously [18] . However, it may not be necessary to simulate the multiplication of charge carriers up to that point as the avalanche process might already have passed a self-sustaining threshold at a lower current. On the other hand, a very low I det would overestimate the PDE by falsely identifying small avalanches that would not be self-sustaining, and underestimate the timing jitter by not simulating the full avalanche.
By varying I det while fixing Δt rpl = 1 fs with 2 k simulation runs per I det value (Fig. 4(a) ), we conclude that we can lower I det to 20 μA without significant deviations in PDE or timing jitter.
B. RPL Time Step
A larger RPL time step Δt rpl would speed up simulations, but reduces time resolution and hence accuracy. A suitable choice would be just short enough such that the charge carrier environment does not change too significantly between each step, even in the high-field regions with large field gradients.
We vary Δt rpl while fixing I det = 20 μA with 2 k simulation runs per Δt rpl value ( Fig. 4(b) ). We choose Δt rpl = 10 fs as an optimal value; for larger time steps, PDE begins to deviate significantly compared to the previous value of Δt rpl = 1 fs.
C. Number of Simulation Runs
We analyze the PDE over an increasing number of simulation runs for Δt rpl = 10 fs and I det = 20 μA, and observe that the PDE converges to a stable value after several thousand runs. We choose to perform at least 6k runs per parameter set to reduce the relative uncertainty to ∼1%.
Compared to the previous simulation parameters (i.e. Δt rpl = 1 fs, I det = 0.2 mA, 18k runs), our optimized values (Δt rpl = 10 fs, I det = 20 μA, 6k runs) require only ∼90 CPU-hours per set, indicating an improved timing performance by a factor of ∼270. 
V. SIMULATION RESULTS AND DISCUSSION
A. Photon Detection Efficiency and Timing Jitter
We simulate each device at increasing reverse bias voltages V B , starting from just above its breakdown voltage. For all devices, PDE increases with V B and reaches a saturation level (representative plots shown in Fig. 5(a) ). We define the saturated bias voltage V sat as the lowest V B value where the obtained PDE values within a ±1 V range agree within their 1 s.d. uncertainty; the PDE at V sat is then the saturated PDE.
The distribution of avalanche times is generally asymmetric, especially for p-i-n + SPADs with ΔW > 600 nm (see Fig. 5(b) ). Long tails in the timing distribution can adversely affect applications requiring high timing accuracies, e.g. satellite-based quantum communications [40] . Therefore, we present the fullwidth-half-maximum (FWHM) and full-width-tenth-maximum (FWTM) timing jitter, both extracted from timing histograms with 1 ps bin size, to better describe the timing performance of the SPADs. In general, timing jitter does not vary significantly with V B , except when V B is near the breakdown voltage.
1) p-n + SPADs: For p-n + SPADs, we observe a general trend of PDE increasing with the junction displacement Δj (Fig. 6(a) ). If the junction is placed further away from the waveguide core, charge carriers injected after a photon absorption in the core region travel a longer distance and can undergo more impact ionizations, thus increasing the likelihood of a successful avalanche. The stochastic avalanche process taking place over a larger distance would also explain the increasing timing jitter at higher Δj. However, Δj being too large would weaken the electric field strength in the waveguide core, which would lead to more charge carriers being lost at the waveguide boundaries due to random walk; this may explain the slight drop in PDE for Δj > 400 nm.
The observed drop in PDE for Δj = 100 nm is due to an "edge effect": when the junction is placed in close proximity to the waveguide rib edge, we observe a narrowing of the effective impact ionization region where ionization coefficients are high ( Fig. 2(b) ), which leads to a lower PDE.
The highest saturated PDE obtained for p-n + SPADs is 48.4 ± 0.6% at V B = 26.5 V for Δj = 400 nm, with a FWHM timing jitter of 9 ps.
2) p-i-n + SPADs: For p-i-n + SPADs, the widening of the high-field region has led to a higher PDE than for p-n + devices ( Fig. 6(b) ). Besides the increased efficiency of impact ionizations, this can also be explained by a lower loss rate of charge carriers under the diffusion model (<5% for p-i-n + , ∼10% for p-n + ), which follows a photon absorption event in the quasi-neutral regions. The slight drop in PDE for ΔW = 600 nm is also likely due to an edge effect, as the p-i transition edge corresponds to the position of the waveguide rib edge; we do not otherwise find an obvious dependence of the PDE on the intrinsic region width, although timing jitter increases with ΔW .
Based on our analysis, we conclude that the optimum performance is obtained for ΔW = 400 nm, which gives a saturated PDE of 52.4 ± 0.6% at V B = 31.5 V and a FWHM timing jitter of 10 ps.
B. Dark Count Rate
We also evaluate the dark noise performance of the SPADs, focusing on devices which display high saturated PDE: p-n + SPAD with Δj = 400 nm and p-i-n + SPADs with ΔW = 400 nm and 900 nm. We calculate the DCR at 243 K, which is in a typical SPAD operating regime readily achieved with thermoelectric cooling, as well as at 300 K to explore the feasibility of room temperature operation.
For our simulated parameters, BTBT shows a greater sensitivity to peak electric field strength than TAT (Fig. 7) . In p-n + SPADs, where the peak fields are high, BTBT is the dominant dark carrier generation mechanism. As the bias V B increases, the depletion region widens, leading to a decrease in the peak field strength and hence the overall DCR, while the TAT contribution stays relatively constant ( Fig. 8(a) ). At an operating bias of V B = 31.5 V (which is above the saturated bias), the DCR is 11 kcps and 21 kcps at 243 K and 300 K, respectively.
In p-i-n + SPADs, due to wider high-field regions with lower peak fields, BTBT becomes negligible compared to TAT. As such, DCR generally increases with V B , and shows a steeper dependence on temperature (∼1000 -fold drop between 300 K and 243 K). We observe that while SPADs with wider intrinsic region widths ΔW had lower dark carrier generation rates per unit volume, this was offset by the larger device volume, and could lead to higher DCR compared to narrower ΔW .
Overall, dark count performance for p-i-n + SPADs is significantly better compared to p-n + devices, with observed DCR of <4 kcps at 300 K and <5 cps at 243 K ( Fig. 8(b) ), even at V B beyond the saturated bias.
VI. CONCLUSION
In conclusion, we have simulated waveguide-based silicon SPADs for visible wavelengths, studying both p-n + and p-i-n + doping profiles. For our simulated parameters, p-i-n + SPADs outperform p-n + devices in terms of PDE and DCR; we identify the optimum device as a p-i-n + SPAD with ΔW = 400 nm, with a saturated PDE of 52.4 ± 0.6% at a bias of V B = 31.5 V, FWHM timing jitter of 10 ps, and DCR < 5 cps at 243 K. This is an improvement over our previous study, where the highest PDE obtained was 45% [18] .
The PDE is slightly lower than typical free-space SPAD modules with PDEs of up to ∼ 70% [41] ; however, our waveguide devices can offer superior timing performance and dark noise compared to available commercial devices (jitter ∼ 35 ps, DCR < 25 cps). We note that even at room temperature, the DCR of a few kcps is acceptable for certain important technologies including LIDAR [42] due to the use of temporal gating, thus indicating the potential applicability of our waveguide SPADs.
Our simulation methods can also be further extended to study other device geometries (e.g. trapezoid waveguides), doping profiles (e.g. p + -i-p-n + ) and materials (e.g. Ge-on-Si SPADs for near-infrared wavelengths). We are currently fabricating SPAD devices according to the designs reported here, and will compare the measured device performance with our simulation results in the near future.
APPENDIX
A. Trap-Assisted Tunneling 1) Intrinsic Carrier Concentration: We calculate the intrinsic carrier concentration n i (T ) in silicon via [37] : n i (T ) = 5.29 × 10 19 · (T /300) 2.54 · exp(−6726/T ) (8)
2) Effective Recombination Lifetime:
The effective recombination lifetime τ r (T ) was measured to be 7 ns at room temperature for an undoped Si rib waveguide device with similar sub-μm dimensions [38] . To obtain a suitable value at 243 K, we analyze the temperature dependence of τ r (T ): for low-level injection in p-type silicon, τ r (T ) can be approximated as the electron recombination lifetime [33] , i.e.:
τ r (T ) ≈ 1 σ e · ν e (T ) · N t (9) where σ e is the electron capture cross section, ν e (T ) is the mean thermal velocity of electrons, and N t is the trap density. The trap density N t is assumed to be temperature-independent, while for traps at Si/SiO 2 interface with E t -E i = 0.25 eV, σ e has been shown to be relatively constant over our relevant temperature range (243-300 K) [43] . Thus, the temperature dependence comes only from ν e (T ) ∝ √ T , and we obtain τ r (243K) = τ r (300K) · 300/243 (10)
B. Band-to-Band Tunneling
Values for B A , B B and B Γ at room temperature are given in ref. [35] . Both B A and B Γ are nominally temperatureinsensitive, while B B (T ) ∝ [E g (T )] 3/2 , where E g (T ) is the Si bandgap energy [39] :
in which A = 1.1785 eV, B = −9.025 × 10 −5 eV/K, and C = −3.05 × 10 −7 eV/K 2 , for 150 K ≤ T ≤ 300 K. We thus obtain:
