Abstract: A monolithically-integrated optical receiver for low-energy on-chip and off-chip communication is presented. The monolithic photodiode integration enables an energy-efficient, high-sensitivity, sense-amplifier-based receiver design. The receiver is characterized in situ and shown to operate with μA-sensitivity at 3.5 Gb/s with a power consumption of 180 μW (52 fJ/bit) and an area of 108 μm². This work demonstrates that photonics and electronics can be jointly integrated in a standard 45-nm SOI process.
I. INTRODUCTION
In order to harness the potential of emerging many-core processor systems, the communication fabric between cores and shared off-chip memory must provide high throughput at low power and footprint costs, overcoming chip power constraints and I/O pin limitations. Monolithically-integrated silicon photonics offers a dense, wavelength-division multiplexed fabric with orders of magnitude better energy-efficiency and bandwidth density than electrical interconnects [1]. However, advanced process design rules place significant constraints on the integration of photonics and electronics.
The optical receiver, a necessary component in any optical link, has traditionally been designed as a discrete component for optical fiber communication. Gain and responsivity could be optimized through material selection, but packaging resulted in large parasitic capacitance. In order to mitigate the gain-bandwidth limitation at the dominant pole of the input node, transimpedance amplifiers were implemented to lower the input resistance, R_in, while preserving a large transimpedance gain, R_TIA [2].
More recently, integrated photonics has addressed chip I/O bottlenecks through hybrid-packaged solutions [3], [4]. While more sensitive and energy-efficient than discrete receivers, hybrid-packaged designs still have relatively large photodiode (PD) and parasitic capacitances. A capacitance of 90 fF is reported in [3]. The design in [5] has a 25 fF PD capacitance (with parasitics) connected through a 20 fF microsolder bump, leading to a receiver sensitivity of 9 μA but an energy cost of 690 fJ/bit at 5 Gb/s. This increases system laser power and dominates overall link costs, making such links less competitive with electrical solutions already at 1 pJ/bit [6].
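The energy and power figures quoted so far are related by the data rate alone. The sketch below only does that arithmetic on the numbers stated in the text (690 fJ/bit at 5 Gb/s for [5]; 180 μW at 3.5 Gb/s for this work). The function names are illustrative and not taken from any tool used by the authors.

```python
def receiver_power_w(energy_per_bit_j: float, bit_rate_bps: float) -> float:
    """Average power implied by an energy-per-bit figure at a given data rate."""
    return energy_per_bit_j * bit_rate_bps


def energy_per_bit_j(power_w: float, bit_rate_bps: float) -> float:
    """Energy per bit implied by an average power at a given data rate."""
    return power_w / bit_rate_bps


# Hybrid-packaged receiver of [5]: 690 fJ/bit at 5 Gb/s.
hybrid_power_w = receiver_power_w(690e-15, 5e9)

# This work: 180 uW at 3.5 Gb/s (values from the abstract).
this_work_fj = energy_per_bit_j(180e-6, 3.5e9) * 1e15

print(f"hybrid receiver power: {hybrid_power_w * 1e3:.2f} mW")
print(f"this work: {this_work_fj:.1f} fJ/bit")
```

With these inputs the hybrid receiver corresponds to a few milliwatts, while 180 μW at 3.5 Gb/s works out to roughly 51 fJ/bit, in line with the quoted 52 fJ/bit if the power figure is rounded.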
In this paper we present an optical, sense-amplifier-based data receiver with a monolithically-integrated photodetector. Tight integration of the photodetector and the latching sense-amplifier enables both low-energy operation and high input sensitivity. These metrics are important, as future many-core communication fabrics will host tens of thousands of receivers per processor die, and input sensitivity maps directly through the optical link loss to the laser power requirement.
II. PHOTONIC LINKS
An example of a dense wavelength-division multiplexing (DWDM), monolithically-integrated photonic link is shown in Figure 1. A continuous wave (CW), multi-λ laser is coupled from an optical fiber onto the chip through a vertical grating coupler. The light is then routed throughout the chip along waveguides fabricated using either gate poly-silicon or the SOI body. Ring-resonant modulators, driven by integrated modulator drivers, modulate data onto a particular wavelength channel by on-off keying. An optical clock signal could also be forwarded along with data. The modulated light is routed to another location on the die (e.g., core-to-core) or to another die (e.g., socket-to-socket). At the destination, the ring-tuning control block selects the channel to be removed from the optical bus by setting the resonance of a drop ring filter. An optical receiver, such as the one presented in this work, then converts the data back into the electrical domain by detecting the PD photocurrent.
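Because every element of such a link (coupler, waveguide, modulator, drop ring) attenuates the light, the receiver sensitivity scales directly into the laser power requirement. A minimal sketch of that link budget follows. The responsivity (0.5 A/W) and total link loss (15 dB) are hypothetical placeholders, not values reported in this paper, and the distinction between average and peak power under on-off keying is ignored.

```python
def required_laser_power_w(sensitivity_a: float,
                           responsivity_a_per_w: float,
                           link_loss_db: float) -> float:
    """Per-channel laser power needed to deliver the receiver's minimum photocurrent.

    sensitivity_a        -- minimum detectable photocurrent (A)
    responsivity_a_per_w -- PD responsivity (A/W); ASSUMED value in the example
    link_loss_db         -- total optical loss, laser to PD (dB); ASSUMED value
    """
    optical_power_at_pd_w = sensitivity_a / responsivity_a_per_w
    return optical_power_at_pd_w * 10 ** (link_loss_db / 10)


ASSUMED_RESPONSIVITY = 0.5   # A/W, hypothetical
ASSUMED_LINK_LOSS_DB = 15.0  # dB, hypothetical

# 9 uA is the sensitivity quoted for [5]; 1 uA stands in for "uA-sensitivity".
p_hybrid = required_laser_power_w(9e-6, ASSUMED_RESPONSIVITY, ASSUMED_LINK_LOSS_DB)
p_monolithic = required_laser_power_w(1e-6, ASSUMED_RESPONSIVITY, ASSUMED_LINK_LOSS_DB)

print(f"laser power ratio: {p_hybrid / p_monolithic:.1f}x")
```

Whatever loss and responsivity are assumed, the ratio of laser powers equals the ratio of sensitivities, which is why the sensitivity figure matters when a die hosts thousands of receivers.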
III. RECEIVER ARCHITECTURE
The receiver architecture (Figure 2) consists of a PD connected differentially across a latching sense-amplifier (LSA), followed by a dynamic-to-static (DS) converter and an on-chip high-speed digital testing backend. The receiver operates in two clock phases, receiving one bit per clock period.
In the PD (Figure 2g,j) we make use of P+ SiGe, which is integrated in the SOI process for PMOS strain engineering and is suitable for optical absorption in the near-IR range [7]. The photodiode is extremely compact and has an estimated capacitance of 10 fF.
The LSA (Figure 2a) senses the differential photocurrent and makes a bit decision. During the reset phase (Φ=0), the LSA's nodes pre-charge high. During the decision phase (Φ=1), the two branches, [...] compensation causes branch M_{1,3,5} to latch low. The LSA transistors are sized according to [8] in order to balance speed and sensitivity. In particular, transistors M_{3,4} are sized large relative to M_{5,6}. This lowers the trip-point voltage of the cross-coupled inverters, ensuring that they do not activate too early. Offset compensation is implemented as programmable current-steering (Figure 2b) and capacitive (Figure 2c) DACs [9], for coarse and fine compensation, respectively.
Figure 2h shows a diode-emulation circuit that is used to characterize the receiver's performance when decoupled from the optical devices. When the input data is 1, the circuit pulls current from IN-, emulating the photocurrent sourced from that node. A 0-bit sources no current. The diode-emulation circuit is driven by a pattern generator on a separate, programmable clock phase from the rest of the receiver.
Figure 1: A CW laser source is coupled onto Chip A through a vertical grating. Two ring-resonant modulators imprint data onto two wavelength channels, λ_0 and λ_1, which propagate along the waveguide. The bus is routed over an optical fiber to Chip B. The drop rings on Chip B are each tuned to either λ_0 or λ_1 to select that channel from the bus and direct it to the correct data receiver. A second set of wavelengths, λ_2 and λ_3, carries data from Chip B to Chip A.
To provide a qualitative analysis of the impact of parasitic capacitances and operating frequency on the receiver sensitivity, an equivalent model of the LSA is shown in Figure 3. Figure 3a shows the input nodes at the end of LSA reset (t = 0), pre-charged high. I_cm models M_{1,2} pulling down on the input nodes until the cross-coupled inverters M_{3-6} turn on. C_w represents the wiring capacitance from the PD to the receiver. The model divides the decision phase into two steps: integration and evaluation (Figure 3b). During the integration [...]
During the evaluation phase, V_diff regenerates exponentially until T_end according to Equation 4. Re-arranging, the current sensitivity of the receiver can be expressed by Equation 5. Figure 4a shows through extracted simulation that, for high data rates where the exponential does not fully settle, wire capacitance, C_w, delays the onset of evaluation, shortening the evaluation time and therefore demanding exponentially more input photocurrent. Figure 4b shows that PD capacitance, C_PD, reduces V_diff linearly, demanding only proportionally more photocurrent. As our PD was implemented on the same die as the receiver, the low-metal-layer routing between the PD and receiver results in a small C_w, assumed here to be ≈ 2.5 fF. The proposed topology may suffer in scenarios where a second die provides the optical transport layer, necessitating through-silicon vias (TSV) or microsolder bumps where C_w may increase above 20 fF [10]. Figure 4a shows that for C_w in this range and data rates above 4 Gb/s, the sensitivity becomes prohibitively poor.
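The two trends described here (exponential sensitivity to C_w, linear sensitivity to C_PD) can be illustrated with a toy integrate-then-regenerate model. This is a simplified stand-in, not a reproduction of Equations 4 and 5. It assumes that the integration time is set by I_cm discharging the node capacitance down to the inverter trip point, that the photocurrent develops V_diff across C_PD plus half the node capacitance, and that the decision phase occupies half the bit period. All default parameter values (c_in_f, i_cm_a, dv_trip_v, tau_s, v_req_v) are hypothetical and chosen only to give μA-scale results.

```python
import math


def min_photocurrent_a(bit_rate_bps, c_w_f, c_pd_f, *,
                       c_in_f=2e-15,      # ASSUMED LSA input capacitance per node
                       i_cm_a=50e-6,      # ASSUMED common-mode pull-down current
                       dv_trip_v=0.3,     # ASSUMED drop from VDD to inverter trip point
                       tau_s=25e-12,      # ASSUMED regeneration time constant
                       v_req_v=0.2,       # ASSUMED V_diff needed at T_end
                       decision_fraction=0.5):  # ASSUMED 50% decision phase
    """Toy estimate of the photocurrent needed to reach v_req_v by T_end."""
    t_end = decision_fraction / bit_rate_bps
    c_node = c_w_f + c_in_f
    # Integration lasts until the input nodes fall to the trip point.
    t_int = c_node * dv_trip_v / i_cm_a
    if t_int >= t_end:
        return math.inf  # evaluation never starts within the decision phase
    # Differential capacitance seen by the photocurrent (two node caps in series).
    c_diff = c_pd_f + 0.5 * c_node
    v_diff_per_amp = t_int / c_diff
    regeneration_gain = math.exp((t_end - t_int) / tau_s)
    return v_req_v / (v_diff_per_amp * regeneration_gain)


monolithic = min_photocurrent_a(3.5e9, 2.5e-15, 10e-15)  # C_w ~ 2.5 fF
bumped = min_photocurrent_a(3.5e9, 20e-15, 10e-15)       # C_w ~ 20 fF
double_pd = min_photocurrent_a(3.5e9, 2.5e-15, 20e-15)   # C_PD doubled

print(f"C_w = 2.5 fF: {monolithic * 1e6:.2f} uA")
print(f"C_w = 20 fF:  {bumped * 1e6:.2f} uA")
print(f"C_PD doubled: {double_pd * 1e6:.2f} uA")
```

Under these assumptions, raising C_w from 2.5 fF to 20 fF costs more than an order of magnitude in required photocurrent, while doubling C_PD costs less than a factor of two. The absolute numbers carry no weight; only the shape of the dependence is meant to match the discussion above.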
The output of the LSA is buffered (Figure 2d) to isolate the LSA decision nodes from the data-dependent capacitance looking into the DS (Figure 2e).
The bits stored in the DS are fed into the on-chip digital test backend for in situ processing (Figure 2) . The backend, consisting of synthesized PRBS and pattern generators, snapshots, and counters, gathers bit-error-rate and receiver decision threshold data and exports only statistics off-chip.
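The generator-and-counter structure of such a backend can be sketched in software. The example below uses the common PRBS-7 polynomial x^7 + x^6 + 1 purely for illustration; the polynomial, seed, and function names are assumptions, since the text does not specify how the on-chip generators are built.

```python
from itertools import islice


def prbs7(seed=0x7F):
    """Yield bits of a PRBS-7 sequence (x^7 + x^6 + 1), period 127. ASSUMED polynomial."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_errors(received_bits, reference_bits):
    """Compare received bits with a reference stream; return (errors, total)."""
    errors = 0
    total = 0
    for rx, ref in zip(received_bits, reference_bits):
        errors += rx != ref
        total += 1
    return errors, total


# Emulate a receiver output with two flipped bits, then count errors
# the way an on-chip counter would, exporting only the statistics.
sent = list(islice(prbs7(), 1000))
received = list(sent)
received[10] ^= 1
received[500] ^= 1
errors, total = count_errors(received, prbs7())
print(f"{errors} errors in {total} bits, BER = {errors / total:.1e}")
```

Only the two integers leave the checker, mirroring the idea of exporting statistics rather than raw high-speed data.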
IV. MEASURED RESULTS
The monolithically-integrated data receiver was fabricated in a standard 45-nm SOI process as part of a flexible electronic-photonic test vehicle. Figure 5 shows two DC photocurrents generated by a 1310-nm laser, coupled into the chip through a vertical coupler and a horizontal waveguide made with front-end body Si. The receiver's threshold is swept using the offset circuitry (Figure 2b,c) while recording the output decision statistics. Photocurrent values were de-embedded through simulation. Though the receiver was able to detect photocurrent from the PD, a foundry error in the SiGe mask definition limited the achievable PD bandwidth.
Figure 6a shows the receiver's sensitivity versus frequency for different supply voltages. Sensitivity is measured on a PD-connected receiver (Figure 2g,j) as the width of the transition region (Figure 5) of an optical-0. As clock frequency increases, sensitivity degrades exponentially, as predicted by our model, due to the decrease in T_end. Figure 6b shows the energy cost of the receiver. The linearity reflects the digital design, with power following P_digital = f C V_DD^2, keeping the receiver energy cost at ≈ 50 fJ/b across a range of frequencies.
Figure 7 shows the bit-error-rate eye diagram of the receiver when configured with a PD-emulation circuit (Figure 2h). Clock phase and receiver threshold were swept for a 31-bit PRBS data sequence at 3.5 Gb/s and a supply of 1.1 V, and error statistics were gathered in situ using the digital backend. Clock rates above 3.7 GHz caused the digital testing backend to fail.
A die photo is shown in Figure 8. The chip contains 72 test cells that implement combinations of optical modulators and receivers. Each receiver has a circuit area of 108 μm² and a PD area of 416 μm².
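The threshold-sweep measurement used in this section, where sensitivity is read off as the width of the transition region, can be mimicked numerically. The sketch below assumes Gaussian input-referred noise and uses hypothetical values for the noise level (0.3 μA rms), the sweep step (0.05 μA), and the 0.1%/99.9% limits that define the edges of the transition. None of these come from the measurements.

```python
import math


def fraction_of_ones(threshold_ua, input_ua, noise_sigma_ua):
    """Probability that the receiver decides 1, ASSUMING Gaussian input-referred noise."""
    z = (input_ua - threshold_ua) / (noise_sigma_ua * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def transition_width_ua(thresholds_ua, fractions, lo=0.001, hi=0.999):
    """Span of thresholds over which decisions are neither all-0 nor all-1."""
    inside = [t for t, f in zip(thresholds_ua, fractions) if lo < f < hi]
    return max(inside) - min(inside) if inside else 0.0


# Sweep the threshold across an optical-0 (zero input current).
thresholds = [-5.0 + 0.05 * i for i in range(201)]             # uA, hypothetical step
fractions = [fraction_of_ones(t, 0.0, 0.3) for t in thresholds]  # 0.3 uA rms, hypothetical
width = transition_width_ua(thresholds, fractions)
print(f"transition width: {width:.2f} uA")
```

In this picture, a wider transition region means that a larger photocurrent is needed before the decision statistics become unambiguous, which is how the width serves as a sensitivity metric.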
V. CONCLUSION
Integrated photonics has emerged as an I/O technology that can meet the throughput demands of future many-core processors. In this work, the monolithic integration of the photodetector enables the design of a fully-digital, low-energy receiver with high input sensitivity. A qualitative model of the receiver provides intuition for the impact of different PD integration scenarios on the receiver's sensitivity performance. The sense-amplifier-based latching receiver is shown to operate with μA-sensitivity at 3.5 Gb/s with an energy-efficiency of 52 fJ/b. This work demonstrates the first monolithic electronic-photonic integration in a sub-100-nm standard SOI process.
