
Abstract-We introduce an analytical framework to understand the path for scaling nanophotonic interconnects to meet the energy and footprint requirements of CMOS global interconnects. We derive the device requirements for sub-100 fJ/cm/bit interconnects, including tuning power, serialization-deserialization energy, and optical insertion losses. Using CMOS with integrated nanophotonics as an example platform, we derive the energy/bit and the linear and areal bandwidth density of optical interconnects. We also derive the targets for device performance, which indicate the need for continued improvements in insertion losses (<8 dB), laser efficiency, operational speeds (>40 Gb/s), tuning power (<100 μW/nm), and serialization-deserialization energy (<10 fJ/bit/Operation), and the necessity for spectrally selective devices with wavelength multiplexing (>6 channels).
Index Terms-Integrated optoelectronic circuits; switching; coupled resonators; integrated optics devices.
I. INTRODUCTION: A FRAMEWORK FOR SCALING CMOS NANOPHOTONIC GLOBAL INTERCONNECTS
INCREASING computational demands of enterprise and datacom (DC) applications [1, 2] have created a need for scalable interconnect solutions for high performance computing (HPC). While the present industry focus is on the adoption of inter-chip optical interconnections [3, 4], the rapid adoption of multicore processors in DC and HPC [5], with high demands on bandwidth density and efficiency [1], may necessitate new interconnect solutions for same-die global interconnects [6] [7] [8] [9]. Given the rapid progress in CMOS compatible nanophotonics using III-V [10], Germanium [11] as well as Silicon based [10] [11] [12] [13] [14] [15] [16] platforms, the on-chip adaptability of optical interconnects for global wires [17] needs to be revisited.
In this paper, we develop a systematic framework for scaling nanophotonic interconnects by using device and system level arguments. We use CMOS with integrated nanophotonic devices as an example platform, but the analytical framework can be applied to other platforms [e.g. 10, 11]. The device advances in couplers [18], low loss waveguides [19], modulators [20] [21] [22] [23] [24], switches [25] [26] [27] [28], multiwavelength devices [29] [30] and detectors [31] [32] [33] [34] [35] can be put in context with the targets for on-chip integration using this framework. We derive the total interconnect energy per bit, areal bandwidth density and linear bandwidth density for a silicon photonic link from the device parameters. We arrive at a minimal set of features that nanophotonic devices need in order to build a scalable on-chip photonic network. We limit our analysis to how photonic devices can be scaled to meet on-chip interconnect energy/bit and bandwidth density requirements. We compare the energy/bit/mm and linear bandwidth density of the optical interconnect with generic interconnect targets for CMOS. A direct comparison with future advanced low swing voltage (LSI) electrical on-chip interconnects is beyond the scope of this paper, since such an analysis must account for the variability limits of LSI interconnects [59, 60].
II. FIGURES OF MERIT FOR NANOPHOTONIC INTERCONNECTS
We discuss four critical figures of merit for nanophotonic interconnects based on physical constraints of the optical and electrical properties of a silicon based material system, namely: a) energy consumption per bit (E), b) interconnect density (β), c) single channel bandwidth (f), and d) areal bandwidth density (D).
Figure 1: A minimal nanophotonic link with an optical source, couplers, modulators, waveguide and a detector. A serializer and deserializer are considered to obtain the optimum operating speed of the link. Tuning at both ends is assumed to operate the link at a specific wavelength.
III. ENERGY/BIT OF A NANOPHOTONIC INTERCONNECT (E)
We derive the minimum bound for the electrical energy per bit of an optical interconnect, considering the performance of the modulators and detectors and the waveguide and coupling insertion losses. For the following analysis, we have assumed a receiverless topology for the optical interconnect, as proposed by Miller et al. [36]. While this is not the optimal optical link design for all operating conditions (see Appendix A, B), we believe it provides reasonable direction for the optical device requirements when the on-chip detector capacitance is low [36, 37]. The energy per bit can be written (in the absence of tuning power and serialization) as a sum of the energy from the source and the electro-optic modulator's energy as:
$$E = E_{Source,Detect} + E_{EO}$$
Where $E_{Source,Detect}$ is the energy spent in the source laser and the detector, and $E_{EO}$ is the energy spent in electro-optic coding of the electrical information into an optical signal. A lower bound to the interconnect energy can be obtained by assuming that the detector needs to charge a capacitor of capacitance $C_d$ to a voltage $V_r$ corresponding to a specific CMOS node [36]. While this is an aggressive requirement, this assumption lets us derive a minimum bound for energy per bit requirements. With $h\nu$ the photon energy and $e$ the electron charge, $E_{Source,Detect}$ can be written in terms of drive laser parameters and insertion losses as
$$E_{Source,Detect} = \frac{h\nu}{e}\,\frac{C_d V_r}{\eta_L\,\eta_C\,\eta_M\,\eta_D}\,10^{\alpha L/10}$$
where $V_r$ is the minimum voltage to which the detector capacitance is to be charged, $\eta_L$ and $\eta_D$ are the quantum efficiencies of the laser and detector normalized to their maximum values, $\eta_C$ is the laser to waveguide coupling efficiency ($\eta_D$ includes the waveguide to detector coupling efficiency), $\eta_M$ is the modulator insertion loss expressed as a transmission efficiency, $\alpha$ is the insertion loss of the waveguides in dB/cm, and $L$ is the length of the interconnect in cm. The above equation is a reasonable approximation under the following conditions: a) the detector RC response is significantly faster than the optical pulse width, b) the received optical power and extinction ratio exceed the bit error rate requirement of the link (see Appendix B), and c) the collected optical power at the receiver is always adjusted to allow full voltage at the detector. We also note that an on-chip receiver drives a significantly lower load capacitance (a few transistor gate capacitances, on the order of aF). The minimum electro-optic conversion energy per bit ($E_{EO}$) is arrived at using the modal volume of the modulator and the injected charge density for a given transmission change. We assumed a modulator drive voltage $V_m$, an electro-optic modal volume $\Theta$, and an optical transmission change $\Delta T$; $dn/dT$ is the spectral sensitivity of the optical device, and $dn/d\rho$ is the change in refractive index ($n$) vs. carrier concentration ($\rho$) in the electro-optic device.
We show that an idealized nanophotonic interconnect, in the absence of tuning power and electrical I/O overheads, can achieve sub-100 fJ/bit/cm operation. Modulator switching energy approaching 10 fJ/bit can be expected in the near future in depletion based and ultra-low modal volume modulators [23, 38]. Figure 2 shows the energy vs. distance scaling of a nanophotonic interconnect with $E_{mod}$ = 10 fJ/bit modulation energy, $C_d$ = 1 fF detector capacitance, 1 dB coupling loss, 1 dB modulator insertion loss, -1 dB detector efficiency and a 25 % efficiency laser source. (See Appendix C)
A. Effect of laser efficiency on the energy per bit
The power efficiency of the laser has a significant effect on the interconnect energy per bit. In figure 3 we show the interconnect energy per bit for varying laser efficiency (defined as the ratio of optical output power to electrical power supplied to the laser). Low laser efficiency may arise from several factors, including the requirement for thermoelectric cooling, collection efficiency and leakage power. At 5 % wall plug efficiency the interconnect energy/bit at 1 cm length can approach 50 fJ/bit/cm for idealized interconnects with no tuning requirement. The effect of additional insertion loss due to routing and selective devices is described in Appendix E.
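As a rough numerical check of the energy model in this section, the sketch below adds the source-detector energy to the modulator energy for the parameters quoted for Figure 2. It assumes the detector must collect $C_d V_r/e$ photoelectrons per ON bit. It also assumes three values the text does not state for this figure: a 1550 nm wavelength, $V_r$ = 1 V (consistent with the 6240 collected photons quoted in Appendix B), and a waveguide loss of 1 dB/cm. The function and parameter names are illustrative only.

```python
H = 6.62607015e-34      # Planck constant, J*s
C = 2.99792458e8        # speed of light, m/s
Q_E = 1.602176634e-19   # elementary charge, C


def link_energy_fj(length_cm, laser_eff=0.25, c_d=1e-15, v_r=1.0,
                   coupling_db=1.0, modulator_db=1.0, detector_db=1.0,
                   alpha_db_per_cm=1.0, e_mod_fj=10.0, wavelength_m=1.55e-6):
    """Idealized energy per bit in fJ (no tuning power, no SerDes).

    Assumed values, not given in the text: wavelength_m, v_r, alpha_db_per_cm.
    """
    photon_energy = H * C / wavelength_m           # J per photon
    n_electrons = c_d * v_r / Q_E                  # charge needed at the detector
    absorbed_optical = n_electrons * photon_energy # J that must be absorbed
    total_loss_db = (coupling_db + modulator_db + detector_db
                     + alpha_db_per_cm * length_cm)
    # Refer the absorbed energy back to the electrical input of the laser.
    electrical = absorbed_optical * 10 ** (total_loss_db / 10) / laser_eff
    return electrical * 1e15 + e_mod_fj


if __name__ == "__main__":
    for eff in (0.25, 0.10, 0.05):
        energy = link_energy_fj(1.0, laser_eff=eff)
        print(f"laser efficiency {eff:.0%}: {energy:.1f} fJ/bit at 1 cm")
```

Under these assumptions the 1 cm link comes out near 18 fJ/bit at 25 % laser efficiency and near 50 fJ/bit at 5 %, which is in line with the value quoted above for a 5 % wall plug efficiency.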
B. Effect of tuning nanophotonic devices to offset variability & temperature dependence
We show that higher operating speeds of the devices may allow the required tuning power to be averaged over many bits in order to achieve low energy per bit. Tuning of nanophotonic devices is essential due to the intrinsic temperature dependence of the refractive index of solid state materials, wafer-level variability, and run-time operating temperature variability [39]. The total power including the tuning power for modulator and detector wavelength selective devices can be written as
Where we included the tuning power per nanometer of correction, $P_{tune}$, needed to correct the operating wavelength of the modulator and detector by $\Delta\lambda$; $B$ is the bit rate of the link. In figure 4, we show the effect of the tuning power on the total interconnect energy. The constant power penalty due to tuning mandates operation at higher speeds so that the tuning power can be shared among more bits per second.
Higher operating speeds of interconnects will be necessary to achieve an energy/bit below 100 fJ/bit/cm, since the tuning power imposes a significant constraint on the energy efficiency of nanophotonic interconnects. As shown in figure 4, 100 fJ/bit energy targets can be reached only at 40 Gb/s when a 2 nm (20 °C) correction is required. The run-time temperature control for microprocessors is expected to be 20 °C, with a spatial variation of 50 °C in temperature [39]. Hence significant advances in temperature independent device operation [40] or highly efficient, low overhead tuning schemes [41, 42] remain to be developed. We note that packaging and module level cooling may significantly change the tuning requirements.
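The averaging argument can be illustrated with a short sketch. It assumes that the tuning contribution to the energy per bit is the static tuning power divided by the bit rate, and that two devices (the modulator and the detector-side filter) are each corrected by the same $\Delta\lambda$. The default of 100 µW/nm is the target stated in the abstract and is not necessarily the value plotted in figure 4; the function name and parameters are hypothetical.

```python
def tuning_energy_fj_per_bit(bit_rate_gbps, p_tune_uw_per_nm=100.0,
                             delta_lambda_nm=2.0, n_tuned_devices=2):
    """Static tuning power amortized over the bit rate, in fJ/bit.

    Assumption: each tuned device draws p_tune * delta_lambda continuously.
    """
    power_w = n_tuned_devices * p_tune_uw_per_nm * 1e-6 * delta_lambda_nm
    return power_w / (bit_rate_gbps * 1e9) * 1e15


if __name__ == "__main__":
    for rate in (1, 10, 40):
        print(f"{rate:>3} Gb/s: {tuning_energy_fj_per_bit(rate):.1f} fJ/bit")
```

With these inputs the same 400 µW of tuning power costs 400 fJ/bit at 1 Gb/s but only 10 fJ/bit at 40 Gb/s, which is the reason higher bit rates relax the tuning budget.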
C. Effect of on-chip serialize-deserialize operations
We show that efficient electrical serialize and deserialize operations are essential to operate the optical links at higher operating speeds. We obtain the optimum operation speeds of the silicon optical interconnect by including the energy cost of serialize-deserialize operations and the tuning power.
We modeled the power penalty for serialize and deserialize (SerDes) operations as a constant energy per bit per serialization order. The total energy of the link can be written as:
(5)
where $F_{clock}$ is the system clock and $E_{SD}$ is the energy per bit per serialization order ($N$). The SerDes are used for scaling the bit rates beyond twice the system clock. The exact functional form for the SerDes operations can be different; however, it is commonly understood that the higher the bit rate and degree of serialization, the larger the energy for serializing and deserializing. In figure 5, we show the effect of the serialize-deserialize power on the total energy per bit. Some recent examples of optimization for on-chip serial link SerDes are [52, 53]. For a large SerDes energy of 50 fJ/bit per serialization order, we see that the minimum energy is obtained when no serialization takes place, at a bit rate of $2F_{clock}$. However, for a lower SerDes energy (10 fJ/bit), the penalty due to SerDes is not significant enough to change the behavior of the interconnect energy. The minimum energy is then obtained when the interconnect is operated at the maximum possible drive conditions. (See Appendix D for SerDes energy scaling with CMOS technology node.) We also study the effect of the system clock on the behavior of the total interconnect energy per bit, considering tuning power, serialization and device insertion losses.
The energy penalty due to serialization can be minimized by operating at the highest available system clock. We also assumed that a distributed clock is available throughout the chip. The clock distribution from the local source to the SerDes is considered local distribution and is ignored. We see that at a 5 GHz system clock, with a SerDes power of 10 fJ/bit/Operation and a tuning power of 100 µW/nm, 150 fJ/bit operation can be achieved for all bit rates above 20 Gb/s.
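The trade-off between tuning and serialization can be explored with the sketch below. The text notes that the exact functional form of the SerDes cost can differ, so the form used here is only an assumption: the serialization order is taken as $N = B / (2F_{clock})$, the SerDes cost as $E_{SD} \cdot N$, and no SerDes cost is charged when $B \le 2F_{clock}$. The fixed link energy of 18 fJ/bit, the two tuned devices and the 2 nm correction are likewise illustrative inputs, and all names are hypothetical.

```python
def total_energy_fj(bit_rate_gbps, f_clock_ghz=5.0, e_sd_fj=10.0,
                    e_link_fj=18.0, p_tune_uw_per_nm=100.0,
                    delta_lambda_nm=2.0, n_tuned_devices=2):
    """Energy per bit (fJ) = link + amortized tuning + SerDes.

    Assumed form: N = B / (2 * F_clock), SerDes cost = e_sd_fj * N,
    and zero SerDes cost when B <= 2 * F_clock.
    """
    # uW divided by Gb/s is numerically equal to fJ/bit.
    tuning_fj = (n_tuned_devices * p_tune_uw_per_nm * delta_lambda_nm
                 / bit_rate_gbps)
    order = bit_rate_gbps / (2.0 * f_clock_ghz)
    serdes_fj = e_sd_fj * order if order > 1.0 else 0.0
    return e_link_fj + tuning_fj + serdes_fj


if __name__ == "__main__":
    for e_sd in (10.0, 50.0):
        best = min(range(1, 101), key=lambda b: total_energy_fj(b, e_sd_fj=e_sd))
        print(f"E_SD = {e_sd:.0f} fJ/bit/N: lowest energy at {best} Gb/s")
```

With a SerDes energy of 50 fJ/bit per order this model has its minimum at $2F_{clock}$, matching the behavior described above, and with 10 fJ/bit per order it stays below 150 fJ/bit from 20 to 100 Gb/s. Where the minimum falls for small SerDes energies depends on the assumed tuning power and functional form.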
E. Total Interconnect Energy Dependence on Length:
We study the total optical interconnect energy as a function of length, including insertion losses and laser, modulator and detector efficiency, in figure 6. Cross over points of the optical interconnect energy/bit vs. generic interconnects with a fixed energy per unit length are also shown in figure 6. A high energy/bit interconnect such as a 1 pJ/cm interconnect [43] (e.g. a full swing interconnect with a swing voltage of 0.68 V (ITRS 2011_ORTC-6, $V_{dd}$ for high performance) and a capacitance of 140 aF/μm (ITRS Table 2011_INTC2, 2020)) will have cross over points as low as a few mm. However, an energy efficient interconnect with 100 fJ/cm [44, 59] will have a longer cross over point. It remains to be seen whether the emerging electrical interconnects can meet the on-chip bit error rate and variability requirements [59, 60] given the high aggregated bandwidth of microprocessors [61]. We believe that, given the number of interconnects and the aggregated bandwidth in the microprocessor application of interconnects, error correction will be limited by latency, area and power considerations.
The fundamental limit to optical interconnect density is greatly enhanced by the high central carrier frequency and the ability to multiplex a large number of wavelengths [45]. For a nanophotonic waveguide array comprised of waveguides of width $W$, separated by a pitch $P$, the bandwidth density (per micron) can be written as:
$$\beta = \frac{N \cdot B}{P(L)}$$
Where $N$ is the number of WDM channels, $B$ is the single channel bandwidth, $P$ is the waveguide pitch and $L$ (in microns) is the cross talk distance. The pitch is the waveguide center to center pitch calculated for 250 nm (height) × 450 nm (width) waveguides such that a 3 dB coupling to the closest waveguide takes place for the TE mode over a length of $L$ (in microns) [46]. Novel CAD methods and wavelength allocation methods to separate the waveguides can reduce the effective pitch. Note that unlike the electrical case, the optical bandwidth density is not a strong function of the length of propagation. The dispersion effects enter the analysis as a secondary effect over several meters of propagation [47], enabling 1 Tb/s on a waveguide using WDM [45] and thus indicating bandwidth density limits exceeding $10^{12}$ bits/(µm·s).
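A minimal sketch of the linear bandwidth density follows, assuming it is simply the aggregate WDM rate carried by one waveguide divided by the waveguide pitch. The 25 channels at 40 Gb/s and the 1 µm pitch are illustrative numbers chosen to reproduce the 1 Tb/s per waveguide case; they are not values taken from the figures.

```python
def linear_bandwidth_density(n_channels, channel_rate_bps, pitch_um):
    """Bits per second per micron of cross-section for a WDM waveguide array.

    Assumed form: (channels * rate per channel) / waveguide pitch.
    """
    return n_channels * channel_rate_bps / pitch_um


if __name__ == "__main__":
    beta = linear_bandwidth_density(25, 40e9, 1.0)  # 1 Tb/s on a 1 um pitch
    print(f"{beta:.2e} bits/(um*s)")
```

Doubling the pitch halves the density, so the cross talk distance enters only through the pitch it forces.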
A. Length dependence of interconnect linear bandwidth density

B. Considerations on scaling the number of channels using micro-resonators
Here we analyze two critical design considerations for scaling the bandwidth density using WDM: a) the channel spacing and b) the total number of channels set by the cavity free spectral range. We use first-order optical micro-ring resonators as example resonators. We note that in general a variety of micro-resonators and higher order designs can be employed. The wavelength spacing between the resonators can be controlled by considering the effect of waveguide and material dispersion. The functional dependence of the resonance position of the rings can be given by:
Where $\lambda_k$ is the position of the optical resonance of the $k$-th micro-ring, $r$ is the radius of the base micro-ring resonant at $\lambda_0$, and $\delta r(k)$ is the radius perturbation introduced in the $k$-th ring. We note that for a WDM micro-ring bank spanning several tens of nm, $\delta r(k)$ will be a non-linear spacing variation obtained by including the variation in $n_{eff}(\lambda_0 + k\,\delta\lambda)$ due to the strong waveguide dispersion of high index contrast systems [45], waveguide bending and the material dispersion of the media. The channel spacing is also affected by the amplitude and phase cross talk due to off-resonant interaction with the adjacent channels.
A second consideration is the free spectral range (FSR) of the resonators, which must be large enough to provide a wide wavelength range for packing the WDM channels. The maximum number of channels that can be packed in a WDM system using micro-rings is set by the free spectral range, which in turn depends on the radius and the mode order of the base micro-ring. For example, a micro-ring resonator of 1.5 micron radius can have an FSR of 62 nm, allowing a large number of WDM channels [62]. A considerable design space is therefore available using micro-resonators to meet the linear bandwidth density requirement.
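The free spectral range argument can be checked with the standard ring resonator estimate FSR ≈ λ² / (n_g · 2πr). The sketch below uses that textbook relation, not an expression taken from this paper. The group index of 4.1 and the 5 nm channel spacing are assumed, illustrative values.

```python
import math


def ring_fsr_nm(radius_um, wavelength_um=1.55, group_index=4.1):
    """Free spectral range of a ring, FSR = lambda^2 / (n_g * 2*pi*r), in nm.

    group_index is an assumed value, not one stated in the text.
    """
    circumference_um = 2.0 * math.pi * radius_um
    return wavelength_um ** 2 / (group_index * circumference_um) * 1e3


def max_channels(radius_um, channel_spacing_nm, **kwargs):
    """Number of channels of the given spacing that fit inside one FSR."""
    return int(ring_fsr_nm(radius_um, **kwargs) // channel_spacing_nm)


if __name__ == "__main__":
    print(f"FSR: {ring_fsr_nm(1.5):.1f} nm")
    print(f"channels at 5 nm spacing: {max_channels(1.5, 5.0)}")
```

With these assumptions a 1.5 µm radius ring gives an FSR close to the 62 nm quoted above. The channel count then follows from the spacing that cross talk allows.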
V. SINGLE CHANNEL BANDWIDTH OF A NANOPHOTONIC INTERCONNECT (F)
The limit to single channel bandwidth is decided by the operation speed of the receiver and transmitter. The fundamental limits to the electro-optic device speed are given by free carrier response times [20] [21] [22] [23] [36], the electro-optic material response time, or the driving capacitor time constant [15]. For photo-detectors and free carrier dispersion modulators:
Where $v_{sat}$ is the saturation velocity of carriers in silicon (set by the optical phonon dispersion), with typical values of ~$10^7$ cm/s (for Si, Ge and III-Vs), $w = \lambda/15 \approx 103$ nm is the spatial decay length of the evanescent field of the waveguide [50], and $n$ is an arbitrary factor chosen such that $e^{-n}$ gives the factor by which the evanescent field decays. The typical clearance for placing thin film planar doped regions next to nanophotonic waveguides can be estimated to be $3\lambda/15 \approx 310$ nm.
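To give a sense of scale for the carrier transit limit, the sketch below computes the time a carrier at saturation velocity needs to cross the clearance $n \cdot w$ quoted above. It assumes a 1550 nm wavelength (consistent with $\lambda/15 \approx 103$ nm) and $n$ = 3. The prefactor relating this transit time to a usable bandwidth is left open here, so only the transit time is computed; the function name is hypothetical.

```python
def transit_time_ps(wavelength_nm=1550.0, n_decay=3.0, v_sat_cm_per_s=1e7):
    """Time for a carrier at saturation velocity to cross n * (lambda / 15)."""
    w_nm = wavelength_nm / 15.0            # evanescent decay length
    distance_m = n_decay * w_nm * 1e-9     # clearance to the doped region
    v_sat_m_per_s = v_sat_cm_per_s * 1e-2
    return distance_m / v_sat_m_per_s * 1e12


if __name__ == "__main__":
    print(f"transit time: {transit_time_ps():.2f} ps")
```

A transit time of a few picoseconds over a 310 nm clearance suggests that carrier transit by itself leaves headroom above the 40 Gb/s targets discussed in this paper.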
The switching speed of a scaled electro-optic device driven by a scaled single stage digital logic driver is [54]:
(11)
where $C_n$, $V_n$ and $I_n$ are the capacitance, voltage and current of a minimum sized transistor at a given technology node, and $I_{modulator}$ is the peak current through the modulator. We plot the maximum switching speed of the direct logic drive as a function of the drive current for the modulator in Fig. 9. Gate lengths, voltages and delays are taken from ITRS HPC PIDS [1]. The voltage and current drive requirements for the EO devices should therefore be compatible with scaled CMOS for high speed operation.
$N$ is the refractive index of the guiding medium. The density will have to be adjusted to allow for the driver and receiver circuits (as shown by the driver scaling in section V). A 1.5 µm radius modulator operating at 10 Gb/s will reach a bandwidth density of 1400 Tbit/(s·mm²) [38]. Improved speed, 3D integration and ultra-small modal volumes may be necessary for meeting the CMOS areal bandwidth density requirements.
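The areal figure quoted for the 1.5 µm modulator can be reproduced with a one-line estimate. The sketch assumes the device footprint is the disk area πr² of the ring and ignores drivers, receivers and routing, which the text notes will reduce the achievable density. The function name is illustrative.

```python
import math


def areal_bandwidth_density_tbps_per_mm2(radius_um, bit_rate_gbps):
    """Bit rate divided by the disk area pi * r^2, in Tbit/(s*mm^2)."""
    area_mm2 = math.pi * (radius_um * 1e-3) ** 2
    return bit_rate_gbps * 1e9 / area_mm2 / 1e12


if __name__ == "__main__":
    density = areal_bandwidth_density_tbps_per_mm2(1.5, 10.0)
    print(f"{density:.0f} Tbit/(s*mm^2)")
```

Under this footprint assumption the result is roughly 1400 Tbit/(s·mm²), matching the value cited above.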
VII. DEVICE REQUIREMENTS FOR SCALABLE NANOPHOTONIC INTERCONNECTS
Based on the figures of merit proposed earlier, we present a minimal set of optical device requirements for replacing CMOS global interconnects. However, we note that the specific device requirements derived above are for a single direct link and not a networked topology [6] [7] [8] [9]. Four minimal features to enable optical components on chip are:
A. High bandwidth, broadband devices: Higher speed of operation will allow large interconnect densities and offset the tuning power to reduce the energy/bit. Target speeds are in the 10 to 40 Gb/s range for modulators, with switch bandwidths that allow switching of 40 Gb/s signals.

With the appropriate scaling of device performance, photonic CMOS for on-chip interconnects may emerge as a technology for high performance computing applications in the CMOS/beyond-CMOS era.
APPENDIX A: DERIVING OPTICAL LINK ENERGY
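The energy per bit $E_{Source,Detect}$ can be derived as follows. At the detector end, the charge through the detector for one ON bit (and hence the current) is set by the detector capacitance:
$$Q = C_d V_r$$
With $h\nu$ the photon energy and $e$ the electron charge, the incident optical energy at the detector can be written as:
$$E_{opt} = \frac{h\nu}{e}\,\frac{C_d V_r}{\eta_D}$$
Which gives the total electrical energy as:
$$E_{Source,Detect} = \frac{h\nu}{e}\,\frac{C_d V_r}{\eta_L\,\eta_C\,\eta_M\,\eta_D}\,10^{\alpha L/10}$$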
APPENDIX B: BIT ERROR CONSTRAINTS AT THE RECEIVER
For an $N$ node interconnect network operating at frequency $f$, the tolerable error rate $P_{req}$ for operating with a failure rate of $R$ over time $T$ is [60]:
$$P_{req} = \frac{R}{N f T}$$
For 10,000 on-chip global interconnects operating at 5 GHz with a failure rate of $10^{-6}$ over a lifetime of 10 years, the required error rate is $6.3 \times 10^{-29}$. The mean number of photons required in an ON pulse for an error rate of $P$ and a modulation depth $M$ follows Beausoleil et al. [37], where $\eta$ is the total quantum efficiency of the detector. For a target error rate of $10^{-29}$, this corresponds to 823 collected photons per "ON" pulse at a detector capacitance of 1 fF and a modulation depth of $M = 0.9$ (extinction ratio $= -10\log_{10}(1-M) = 10$ dB).
The effect of modulation depth on the required optical power at the receiver (for a 40 Gbit/s signal) is shown in figure B.1. The minimum number of collected photons required at the given extinction ratio is also shown. Under the assumption of full charging of the detection capacitor (i.e. collected photons $= C_d V/e = 6240$), we can see that the tolerable extinction ratio at the receiver is 1.4 dB. Hence, the degradation of the modulated optical signal due to insertion loss should not affect the BER for low insertion losses (< 8 dB). For the analysis of the paper we assumed that the modulators are maintained at optimal modulation depth using a tuning mechanism. We note that the above received optical power is a lower limit for a receiverless detector. The degradation of SNR due to a TIA has to be accounted for in a receiver based system [63].
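The two numbers used in this appendix can be reproduced with a few lines. The sketch assumes the tolerable error rate is the allowed failure probability divided by the total number of bits transferred over the lifetime, and that full charging means $C_d V/e$ collected photoelectrons with $V$ = 1 V. The function names and the 365.25-day year are illustrative choices.

```python
Q_E = 1.602176634e-19                 # elementary charge, C
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def required_error_rate(n_links, freq_hz, failure_prob, lifetime_years):
    """Tolerable per-bit error rate: failure probability / total bits sent."""
    total_bits = n_links * freq_hz * lifetime_years * SECONDS_PER_YEAR
    return failure_prob / total_bits


def collected_electrons(c_d_farad, voltage):
    """Photoelectrons needed to charge the detector capacitance fully."""
    return c_d_farad * voltage / Q_E


if __name__ == "__main__":
    p_req = required_error_rate(10_000, 5e9, 1e-6, 10)
    print(f"required error rate: {p_req:.2e}")
    print(f"collected electrons: {collected_electrons(1e-15, 1.0):.0f}")
```

The first value lands at about 6.3 × 10^-29 and the second at about 6240, matching the figures quoted in this appendix.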
APPENDIX C: EO MODULATOR ENERGY FOR ELECTRO-OPTIC POLYMER MODULATORS
A second common class of modulators compatible with CMOS is electro-optic polymer modulators [15]. The scaling with electro-optic properties for such modulators is as follows:
The estimated SerDes power at 32 nm under a global on-chip synchronous clock without clock recovery is 27 fJ/bit/N [53]. Using the projected scaling ratio of 0.0797, the estimated energy/bit/order at the 11 nm node is 2.16 fJ/bit/N (non-ideal gate capacitance as predicted by ITRS increases this projected value to 3.35 fJ/bit/N). To study the effect of the SerDes we have included a wide range of energy estimates, from 100 fJ/bit/N to 10 fJ/bit/N, in this paper.
APPENDIX E: EFFECT OF INSERTION LOSSES
The energy/bit of the optical interconnect is affected by the insertion losses due to the passive and active optical components. The insertion losses may arise from non-resonant modulator loss, mux and de-mux filters, and waveguide crossing losses. The change in energy/bit due to total insertion losses is shown in figure E.1. Insertion losses can also play a major role if they reduce the received extinction ratio at the detector below the threshold required by the bit error rate. For example, in Appendix B, if the extinction ratio of the received bits falls below 1.4 dB due to insertion loss, the interconnect will be BER limited. (For a modulator ER of 10 dB, this places an 8.6 dB limit on IL.)
Figure E.1: Effect of insertion loss on the energy/bit
