Abstract-
A Three-Layer 3-D Silicon System Using Through-Si Vertical Optical Interconnections and Si CMOS Hybrid Building Blocks
Research into the next generation of Si CMOS electronic systems, termed giga-scale integration (GSI), has suggested that the operational speed of CMOS-based systems will be limited not by the devices but by the interconnections between devices [1]. Furthermore, the Semiconductor Industry Association (SIA) Roadmap for 1997 identifies CMOS interconnection technology for the year 2009 and beyond as a critical technology to develop in order to continue traditional performance scaling of CMOS technologies, but it does not identify a current technology that will satisfy the interconnection need [2]. The Roadmap goes on to suggest 3-D devices and architectures, and the use of on- and off-chip optical interconnects, as possible solutions to the communication bottleneck anticipated at CMOS linewidths below 70 nm.
The 3-D CMOS architectures with optical interconnections enable the development of advanced smart photonic systems and can alleviate the electrical interconnection limitations of future CMOS technologies [3], [4]. Although materials research is providing improvements in electrical interconnects by utilizing advanced materials (aluminum and standard SiO2 are being replaced with copper and titanium nitride [5]), optical interconnections can provide high-speed, low-loss, massively parallel, crosstalk-resistant communication, enabling 3-D processing system topologies suitable for chip-to-chip, chip-to-board, and board-to-board applications [6], [7].
Massively parallel applications such as image processing, image generation, and routing can be mapped into architectures which are well suited to 3-D interconnection structures. These architectures, which utilize multiple processors, are systems that could benefit from 3-D interconnections, as illustrated in Fig. 1. The latency associated with planar high-density interconnections does not scale well as the size of the system increases; latency problems are exacerbated with increasing system size. If a large number of processors on a planar substrate [32 × 32 × 4, shown in Fig. 1(a)] were arranged into a 3-D topology using vertical optical interconnections, as shown in Fig. 1(b), the interconnection power dissipation and communication delay (latency) between subsections would be substantially decreased [8]. Another system that can benefit from vertical optical interconnections is the highly parallel image processor discussed in [9]. This system utilizes an offset cube topology of 4096 similar processors, each vertically interconnected to the processing planes above and below it. Simulations of a 16 × 16 × 16 system indicate that it could perform image computations with an aggregate data throughput of 819.6 Gb/s.
Vertical electrical interconnection of circuitry has been investigated using Al thermomigrated feedthroughs through Si substrates to connect the front and back of each substrate [10]. These Si substrates, which included functional circuitry, were then stacked and connected using microbridges that connected the facing wafer planes. Heat dissipation limited operation to a single layer in the demonstrated five-layer stack. This heat dissipation problem is alleviated when optical interconnections are used, since simple point-to-point optical interconnections are unaffected by the flow of either liquid or gaseous coolant. The index of refraction changes caused by mixing and convection will not significantly refract the beam path; however, large index of refraction changes caused by gas bubbles in the flow should be avoided. Thus, vertically optically interconnected 3-D systems should be easier to cool than 3-D electrically connected systems, since the optically interconnected systems need not be mechanically linked for electrical conduction between the layers.
Vertical optical communication through Si circuits for 3-D interconnections was first posited using an external solid-state laser operating at 1.3 µm. Since then, three different demonstrations of vertical optical communication through stacked Si substrates have been reported in the literature [12], [13], [14]. The first technique utilizes advanced GaAs-based emitters flip-chip bonded to the Si circuits. Unfortunately, optical absorption for this through-Si link is high because the emission energies of these devices are larger than the bandgap of Si (approximately 1.12 eV, corresponding to a wavelength of about 1.1 µm) [15]. To achieve low-loss propagation in this system, the use of laser-drilled free-space waveguides through the host substrates is required [12]. The second demonstration utilizes three GaAs-based vertical-cavity surface-emitting lasers (VCSEL's) (930 nm to 950 nm) for multiple-wavelength reconfigurable communication to three separate substrates containing flip-chip bonded multiple-quantum-well (MQW) detectors. The emitters in this demonstration are all located at the same plane and operate at wavelengths that experience absorption through the substrate. The novelty of this system is that it allows a single source plane and the option to choose the individual receiver plane by choosing the wavelength [13]. The third method utilizes integrated InP-based optoelectronic devices operating at 1.2 µm bonded directly to Si circuits. This technique has been used to produce two-layer through-Si CMOS vertical optical interconnections [14], [16], [17].
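The wavelength argument above can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes a room-temperature Si bandgap of 1.12 eV and uses the standard conversion E (eV) = 1239.84 / λ (nm). The function names are hypothetical, and the test considers band-edge absorption only (free-carrier and other sub-bandgap losses are ignored).

```python
# Photon energy versus the Si bandgap (illustrative check, not from the paper).
HC_EV_NM = 1239.84     # h*c expressed in eV*nm
SI_BANDGAP_EV = 1.12   # ASSUMPTION: room-temperature Si bandgap


def photon_energy_ev(wavelength_nm: float) -> float:
    """Return the photon energy in eV for a free-space wavelength in nm."""
    return HC_EV_NM / wavelength_nm


def absorbed_by_si(wavelength_nm: float) -> bool:
    """True when the photon energy exceeds the assumed Si bandgap."""
    return photon_energy_ev(wavelength_nm) > SI_BANDGAP_EV


if __name__ == "__main__":
    # GaAs-based VCSEL wavelengths versus the long-wavelength InP-based devices.
    for wl in (930.0, 950.0, 1200.0, 1300.0):
        print(f"{wl:6.0f} nm  {photon_energy_ev(wl):.3f} eV  "
              f"above bandgap: {absorbed_by_si(wl)}")
```

Under these assumptions the 930 nm to 950 nm emitters lie above the bandgap and the 1.2 µm and 1.3 µm sources lie below it, which is consistent with the absorption behavior described for the three demonstrations.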
To access high-yield, low-cost VLSI for complex signal processing functions, the development of CMOS analog optoelectronic (OE) interface building blocks and the integration of OE devices with digital CMOS VLSI are critical. Parallel CMOS receivers have been demonstrated with electrical gigabits per second operation (not integrated with OE devices) [18]. Optoelectronic device integration with foundry digital Si CMOS, and the independent optimization of the CMOS and the OE devices, have been attempted with both monomaterial and hybrid integrated approaches. Si optoelectronic devices for monomaterial integration have been studied and are under development; however, Si CMOS detectors operate at short wavelengths (below nm) with low responsivity or low speed, and emitters have only been demonstrated with weak optical emission through defect centers [19]. Efficient through-Si optical interconnections require technologies that allow independent optimization of long-wavelength optoelectronics and CMOS processing circuitry, which rules out the use of currently reported Si optoelectronic devices.
One method of combining optoelectronic (OE) materials with highly integrated circuits utilizes hybrid epitaxial growth of III-V optoelectronic materials directly onto Si circuitry. The devices reported in the literature suffer from short lifetimes and low efficiencies due to mismatches in lattice constants and coefficients of thermal expansion [20], or the CMOS is damaged by the growth process [21], although there have been reports of hybrid growth success for modulators on Si substrates [22]. Other hybrid integration technologies, such as flip-chip bonding or thin-film integration, enable the independent optimization, fabrication, and testing of the optical and electronic components prior to bonding. Flip-chip technologies align and attach optical substrates to Si substrates using metal (usually indium-based) bump bonds, but retain the substrate (and the optical losses associated with the substrate), and have limited scalability in the vertical dimension. The assembly required for mixing and matching individual devices or small areas of devices on circuits using bump bonding is also problematic. In contrast, thin-film hybrid integration enables independently optimized devices (typically less than 5 µm thick) to be selectively aligned and bonded to the host substrate using a transparent transfer diaphragm [23]. This results in a virtually planar hybrid integrated optoelectronic circuit (OEIC), and enables vertical scalability, the mixing and matching of independently optimized multiple OE devices with circuits, and the use of standard microfabrication techniques. Recent results also indicate that thin-film hybrid integration using metallic bonding (in this case, bump bonding) yields high levels of hybrid integration [24].
The 3-D optical links described herein can be used to implement a true 3-D computational mesh, which may ultimately yield effective massively parallel computational systems. For example, a 32 × 32 × 4 processor system using the simple LED link described in this paper would yield a 12-in module. The transmission media latency, based upon the speed of light, is less than 3 ps (negligible) for adjacent processor/processor communication from layer to layer in any size system. The transceiver circuits and emitter/detector delay will dominate the optical link latency. However, the power consumed in the LED-based network is prohibitive. With LED's, the theoretical 32 × 32 × 4 system would use 60 kW of power for interconnect alone [25]. Although longer wavelength ( 12 m) VCSEL's are in their infancy, one can forecast, based upon GaAs-based VCSEL's, that a 20-µm-diameter, 1-mA-threshold VCSEL would provide a nearly 14 times improvement in the coupling efficiency. Reasonably assuming operation at 600 Mb/s with a power dissipation of 5 mW per link, and conservatively assuming a factor of 10 improvement in the coupling efficiency, the optical interconnect power for the 32 × 32 × 4 processor system drops to approximately 250 W with VCSEL's [25], a factor of 240 improvement over the LED-based system and a power dissipation level that can easily be handled with standard ethylene glycol liquid coolant.
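The figures quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below is not part of the original analysis: the processor count and the 60 kW and 250 W totals come from the text, while the 250-µm layer thickness and the Si refractive index of 3.5 used for the transit-time estimate are assumptions chosen only for illustration. All function names are hypothetical.

```python
# Cross-check of the system-level numbers quoted in the text (illustrative only).
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def processors(nx: int, ny: int, nz: int) -> int:
    """Number of processors in an nx-by-ny-by-nz mesh."""
    return nx * ny * nz


def improvement_factor(led_power_w: float, vcsel_power_w: float) -> float:
    """Ratio of LED-based to VCSEL-based interconnect power."""
    return led_power_w / vcsel_power_w


def transit_time_ps(thickness_um: float, refractive_index: float) -> float:
    """One-way optical transit time through a slab, in picoseconds."""
    return refractive_index * thickness_um * 1e-6 / C_M_PER_S * 1e12


if __name__ == "__main__":
    print("processors:", processors(32, 32, 4))
    print("LED/VCSEL power ratio:", improvement_factor(60e3, 250.0))
    # ASSUMPTION: 250-um-thick Si layer, n = 3.5 near 1.3 um.
    print(f"transit time: {transit_time_ps(250.0, 3.5):.2f} ps")
```

With the assumed thickness and index, the transit time comes out just under 3 ps, in line with the latency bound stated above; a thicker substrate or an added air gap would change that estimate.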
In this paper, the development of hybrid integrated optoelectronic transmitter and receiver building blocks in digital Si CMOS with operational speeds up to 155 Mb/s is reported. These building blocks are individually characterized and assessed for implementation in a three-layer system. The hybrid Si CMOS optoelectronic transceivers are then physically stacked into a three-layer system to demonstrate the first 3-D vertically optically interconnected CMOS three-layer communication system using through-Si vertical optical long-wavelength interconnections.
II. SYSTEM DESIGN
Demonstrating a 3-D CMOS system with vertical optical interconnections requires careful selection of suitable subsystem components for operational speeds up to 155 Mb/s. The analog CMOS receiver and transmitter circuits used scalable designs for implementation in digital Si CMOS processes and were optimized for minimum power dissipation at the operational speed. Long-wavelength InP-based optoelectronic devices with various sizes and configurations were used to enable characterization of the devices integrated with the CMOS analog interface circuitry. This section describes the design of the analog circuitry and optoelectronic components. The next section presents the measured characteristics of the different integrated designs that were used to determine the best system combination for the through-Si optical interconnection.
A. Analog Optoelectronic Interface Circuit Designs
1) Transmitter Design: The digital CMOS transmitter circuit described herein was optimized to meet the SONET OC-3 speed specification of 155 Mb/s while minimizing power consumption. The driver circuit was also designed to provide up to 80 mA of output current to the emitter, and consists of two tapered inverters, a current switch, and a current mirror, as illustrated in Fig. 2. The transmitter was fabricated through the MOSIS foundry in a 0.8-µm standard digital Si CMOS process. The two-stage tapered buffer input is designed to minimize power consumption at high speeds and begins with a minimum-geometry inverter used to drive a current switch. The inverter multiplication ratio is dictated by the tradeoffs between speed, power, and chip size. The maximum speed is obtained when the ratio is 3, but this results in high power dissipation. However, if the ratio is changed from 3 to 5, there is a slight decrease in the speed, but also a significant decrease in power dissipation.
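The speed/power tradeoff in the inverter multiplication ratio can be illustrated with the common first-order textbook model of an ideal tapered buffer chain. This is a sketch under stated assumptions, not the authors' design analysis: it ignores self-loading and wiring, treats the number of stages as continuous, and uses hypothetical function names. In this model the delay for a fixed load ratio scales as f / ln f, and the total capacitance of the driving stages relative to the load approaches 1 / (f − 1), where f is the taper ratio.

```python
import math

# First-order tapered-buffer model (textbook idealization, NOT the paper's analysis).
# ASSUMPTIONS: no self-loading, continuous stage count, large load-to-input ratio.


def relative_delay(taper: float) -> float:
    """Chain delay for a fixed load ratio, up to a constant: f / ln(f)."""
    return taper / math.log(taper)


def relative_internal_capacitance(taper: float) -> float:
    """Summed driver-stage input capacitance relative to the load: about 1 / (f - 1)."""
    return 1.0 / (taper - 1.0)


if __name__ == "__main__":
    for f in (3.0, 5.0):
        print(f"taper {f:.0f}: delay ~ {relative_delay(f):.3f}, "
              f"switched capacitance ~ {relative_internal_capacitance(f):.2f} x load")
    slowdown = relative_delay(5.0) / relative_delay(3.0) - 1.0
    print(f"delay penalty going from 3 to 5: {slowdown:.1%}")
```

In this idealized model, moving the ratio from 3 to 5 costs roughly 14% in delay while halving the capacitance switched by the intermediate stages, which is qualitatively the behavior described above. The actual numbers for the fabricated two-stage buffer would depend on device parasitics that the model leaves out.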
To avoid large voltage spikes and signal distortion in the output, the current switch was not used in the final stage of the amplifier. Instead, the current switch was connected before the power transistor stage to produce fast rise and fall times. The output stage did, however, include a current mirror for dc biasing of the emitter. Normal operation of the circuit begins with the emitter off, when switch SW1 is open (on) and SW2 is closed (off). To modulate the emitter to the on state, SW1 is closed and SW2 is opened. For fast circuit recovery to the emitter off state, node N2, which has a large capacitance due to the size of the power transistors (M1 and M2), is quickly discharged through SW2. Careful layout of this circuit was necessary to minimize series resistance in the output power transistors, to minimize noise introduced onto the input signal, and to ensure the use of metal lines that were wide enough to prevent thermal problems and electromigration.
2) Receiver Design: The transimpedance amplifier was designed for scalability and fabricated through the MOSIS foundry in a 0.8-µm standard digital Si CMOS process [26]. For wide bandwidth, a multistage, low-gain-per-stage, open-loop configuration was used [27]. The amplifier circuit, shown in Fig. 3, consists of five identical stages, an offset circuit, and a comparator. Each stage has a current gain of 3, which gives a total amplifier current gain of 243. The input offset correction circuitry is a current attenuator, which allows very small currents to be applied to the input of the amplifier to correct for current offset. The complete amplifier circuit was designed to drive an on-chip clocked comparator to convert the analog amplifier output to a digital signal [28].
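The total gain quoted above is the product of the per-stage gains. The short check below is illustrative arithmetic only: it treats the five stages as ideal and identical, and the function names are hypothetical.

```python
import math


def total_current_gain(per_stage_gain: float, stages: int) -> float:
    """Gain of a cascade of identical, ideal current-gain stages."""
    return per_stage_gain ** stages


def gain_db(current_gain: float) -> float:
    """Current gain expressed in decibels (20 * log10)."""
    return 20.0 * math.log10(current_gain)


if __name__ == "__main__":
    gain = total_current_gain(3, 5)
    print(f"total current gain: {gain} ({gain_db(gain):.1f} dB)")
```

Spreading the gain over five low-gain stages, rather than one high-gain stage, matches the stated design goal of wide bandwidth in an open-loop configuration.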
B. Optoelectronic Device Design and Integration with CMOS Circuits
1) Emitter Design: The optimal emitter for a through-Si link is a device capable of high output power, high speed, and low divergence angle. VCSEL's would be an optimal choice; however, long-wavelength VCSEL's are still in development, with the first continuous-wave (CW) demonstration in 1995 [29]. Light-emitting diodes (LED's) have demonstrated high reliability, are low cost and simple to fabricate, and can operate at the desired modulation speeds (155 Mb/s) with good internal quantum efficiency. Although the divergence angle of LED's is large, which implies reduced coupling under the condition of no misalignment, this large divergence also results in a high level of alignment tolerance for the packaged through-Si link.
Three LED designs were integrated onto Si CMOS transmitter circuits and evaluated for use in the three-layer through-Si interconnection system. To minimize absorption loss within the Si, all designs centered on a wavelength of m, at energies smaller than the Si bandgap. The first design, E1, was a square device 250 µm on a side implemented in the InGaAlAs-InP material system. It was designed for high speed by utilizing high doping in a thin active layer [30], [31]. The wafer was grown by MBE on an n-type InP substrate and consisted of an InGaAs stop etch layer (100 nm, cm ), followed by InAlAs (1. cm , m). In these designs, high-speed operation was achieved by employing high-current injection [32], [33]. Because the driver was limited to 80 mA of output current, smaller device sizes were used to achieve high current densities: E2 devices were 100 µm on a side and E3 devices were 50 µm on a side.
2) Detector Design: The optimal detector for implementation in the three-layer through-Si data link has a low capacitance per unit area, a high responsivity, and low-noise operation. A metal-semiconductor-metal (MSM) detector meets all of these requirements save the responsivity, which is poor, generally 0.2-0.4 A/W, and the dark current, which is relatively large in the InP material system [34]. An inverted MSM (I-MSM) photodetector, with the electrodes on the bottom of the device and the substrate removed, can produce higher responsivity than a conventional MSM and comparable efficiency to p-i-n detectors by eliminating the shadowing effect of the electrodes [35], while preserving the desirable low-capacitance, large-area characteristics of MSM's. Two different device designs were investigated for the system under development. Since the size of the detector and the finger spacing and width affect the capacitive loading of the receiver input, detectors were designed and tested in 50- and 250-µm sizes with varying finger widths and spacings. The 50-µm active area design utilized finger widths of 2 µm separated by 2 µm, and had an estimated capacitance of 70 fF. The 250-µm active area device utilized finger widths of 2 µm separated by 8 µm and had an estimated capacitance of 200 fF. The MSM structure was grown lattice matched to an InP substrate and consisted of InGaAs (100 nm, etch layer)/InAlAs (40 nm)/InGaAs (1000 nm)/InAlAs (40 nm), with all layers nominally undoped (on the order of 10 cm ). The InAlAs cladding layer minimizes the dark current [36], eliminates low-frequency gain [37], and enhances the Schottky barrier height. The heterointerfaces were graded to ensure better carrier transport as well as improved dark current characteristics.
3) Integration with CMOS Circuits: The transmitter and receiver circuits were fabricated through the 0.8-µm MOSIS Si CMOS foundry, and were hybrid integrated by direct metal/metal bonding of the thin-film optoelectronic devices onto the Si CMOS circuitry. The optoelectronic devices were mesa etched and the growth substrate was removed from the devices using selective substrate etching. The thin-film devices were then bonded to a Mylar transfer diaphragm [38], and the devices were selectively aligned and bonded to TiAu contact pads on the transceiver circuits. Emitters bonded to transmitter circuitry required a top (AuGe) contact, which was electrically isolated from the bottom contact using DuPont polyimide 2611. The detectors used in these demonstrations had planar contacts and were directly bonded to the circuitry after the LED integration was complete. In this manner, the optoelectronic devices, the emitter and detector, were independently optimized (as was the Si CMOS circuit) and metal/metal bonded together to realize a hybrid integrated OEIC. Photomicrographs of an integrated receiver circuit and an integrated transmitter circuit are shown in Fig. 4.
III. EXPERIMENTAL RESULTS

A. Optoelectronic Integrated Transmitter
To test the speed of the integrated transmitters, the circuits were wire bonded into a high-speed quad flat pack and placed into a conductive box for shielding. Prior to digital testing of the integrated transmitter, the thin-film devices were dc characterized with microprobes. Typical light output levels for the devices were:
E1: 50 µW at 80 mA (0.128 kA/cm²); E2: 500 µW at 80 mA (0.8 kA/cm²); and E3: 150 µW at 80 mA (3.2 kA/cm²). Next, a digital input from a pattern generator was applied to the transmitter circuit. The emitted light was collected using a dual-lens collimation/collection system and was detected using a high-sensitivity receiver (New Focus 1811).
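The three current densities quoted above follow directly from the 80-mA drive current and the square device sizes given in the emitter design discussion (250, 100, and 50 µm on a side). The sketch below reproduces that arithmetic; the function name is hypothetical and the calculation assumes the full square area carries the current uniformly.

```python
def current_density_ka_per_cm2(current_ma: float, side_um: float) -> float:
    """Current density in kA/cm^2 for a square device, assuming uniform injection."""
    area_cm2 = (side_um * 1e-4) ** 2       # 1 um = 1e-4 cm
    amps_per_cm2 = current_ma * 1e-3 / area_cm2
    return amps_per_cm2 / 1e3


if __name__ == "__main__":
    for side in (250.0, 100.0, 50.0):
        j = current_density_ka_per_cm2(80.0, side)
        print(f"{side:5.0f} um square at 80 mA: {j:.3f} kA/cm^2")
```

The computed values (0.128, 0.8, and 3.2 kA/cm²) match the quoted figures, and show how a 5 times reduction in side length raises the current density by a factor of 25 for the same drive current.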
The three different designs were tested and the eye diagrams are shown in Fig. 5. Fig. 5(a) shows the output data for device E1 at 155 Mb/s (2 1 NRZ). Fig. 5(b) shows the output data for device E2 at 100 Mb/s (2 1 NRZ). Fig. 5(c) shows the output data for device E3 at 155 Mb/s (2 1 NRZ). Only device E2 had sufficient output power to conduct BER testing (BERT). The output of the signal was fed to the error detector, and the measured BER at 100 Mb/s (2 1 NRZ) was 10 . These results indicate that device E2 has sufficient output power and speed to operate in a 100-Mb/s through-Si optical data link, described in [9]. Design E1 suffered from low internal quantum efficiency due to high doping in the active region, and design E3 had low output power. High doping in the active region was used for some of the devices studied to achieve higher speeds in future link experiments (note that smaller device sizes will be necessary as well). For the most efficient interconnections, with high coupling, high efficiency, high speed, low drive current, and low power dissipation, VCSEL's would be the devices of choice, as stated previously. As VCSEL technology in these wavelength regions progresses, the implementation of VCSEL's in these 3-D links will significantly improve the performance.
B. Optoelectronic Integrated Receiver
The integrated receivers were packaged into high-speed LDCC packages and fully tested: dc characteristics of both MSM designs were measured, and digital tests were performed. The optimal dark current was achieved by annealing the MSM's and the Schottky contact metals (Ti (150 Å)/Pt (500 Å)/Au (2500 Å)) for one hour at 200 °C. The dark current was less than 10 nA for device . Digital testing was performed using a commercial laser modulated by a pseudorandom bit stream. The output of the laser was fiber coupled to the detectors. An eye diagram and output waveform for the 50-µm device are shown in Fig. 6(a) and (b). This small device operated at 155 Mb/s using 2 1 NRZ, due to the small capacitive loading of the receiver. This particular receiver was also tested for BER as a function of sensitivity, as shown in Fig. 7. The receiver integrated with the larger 250-µm device, which has 25 times the area of the 50-µm device (capacitance of 200 fF), operated up to 10 Mb/s with a 2 1 pseudorandom data stream (PRBS), as shown in Fig. 6(c), and a BER of 10 for an incident optical power of 80 µW. Although the small device operated faster, it is not large enough to provide good alignment tolerance and collection efficiency for the LED-based through-Si optical transmitter.
C. Three-Layer 3-D Vertical Optically Interconnected Link Demonstration
The 3-D three-layer stacks of OEIC's demonstrating vertically optically interconnected through-Si CMOS systems were fabricated and tested. The Si CMOS analog interface building block circuit designs described previously herein were used in a transceiver circuit to produce three-level stacking by incorporating bonding pads on only one side of the circuit and by leaving optoelectronic integration sites on the other side of the circuit free from metallization layers. The most suitable optoelectronic devices, as determined above, were integrated onto three stackable circuits. The three-layer stacked system was assembled by aligning the individual layers using an infrared backplane mask aligner and by using Si material to provide mechanical support during wire bonding of the three-dimensional system. After all three layers were assembled and attached using an ultraviolet and heat curing epoxy, the system was packaged into a 144-pin ceramic pin grid array. Fig. 8 is a photomicrograph of one of the tested packaged 3-D systems.
To demonstrate the vertical optical interconnections in the 3-D three-layer Si CMOS circuit stack, the middle integrated receiver was powered up and a digital input signal was applied to the bottom-layer transmitter. This vertical communications link operated at 1 Mb/s with a PRBS signal of length 2 1. The middle layer receiver output pulse waveform and an infinite persistence eye diagram are shown in Fig. 9 . The second vertical optical interconnection was operated by applying the PRBS digital signal to the middle chip transmitter and by monitoring the output waveform on the top layer integrated receiver. This vertical communications link also operated at 1 Mb/s and demonstrated open eye diagrams as shown in Fig. 10 . A previous test of this system demonstrated bit error rates at 1 Mb/s of 1 10 with a PRBS signal of length 2 1; this is the first reported BER measurement for a through-Si CMOS optical interconnection.
The final demonstration that we report herein is the complete system interconnection from the bottom layer of the three-layer system to the top layer of the system, with only power supply and monitor connections to the middle layer. The PRBS digital signal was applied to the bottom-layer transmitter input, which communicated optically to the middle-layer receiver circuit. The received signal on the middle layer was then sent to an on-chip clocked comparator that regenerated the receiver signal to digital levels suitable for driving the digital input of the transmitter. This signal was used as the input to the middle-layer transmitter, which communicated optically to the top-layer receiver. Fig. 11 shows 500-kb/s results from the through three-layer optical interconnection. The top trace is the input signal applied to the bottom transmitter, the middle trace is a monitor of the middle chip comparator, and the bottom trace is the output of the top circuit receiver.
Although the optical transceiver components in this system have demonstrated individual operation speeds of up to 155 Mb/s, the multilayered system investigated in this paper was limited to 1-Mb/s operation. Signal coupling within the high-pin-count CPGA packages and within the test fixture is believed to have limited the speed of the layer-to-layer interconnections. The 3-D three-layer system, which operated at 500 kb/s, was speed limited by the noise generated within the middle-layer CMOS circuit, which had a clocked comparator, a single-ended transmitter, and a single-ended receiver all operating simultaneously. We believe that feedback of the single-ended transmitter output signal into the sensitive single-ended receiver was the primary cause of this noise. This was supported by the observation that the signal quality of the middle-layer receiver decreased as the middle-layer transmitter output current increased. The need to run the transmitter at low power limited the speed obtainable from the top chip receiver. Higher speed demonstrations should be possible by utilizing improved decoupling networks within the test fixture, higher speed quad-flat packages, and differential CMOS transceiver circuits.
IV. CONCLUSION
Presented herein for the first time is a report of the development, fabrication, and test of a three-layer 3-D CMOS interconnection system with integrated long-wavelength through-Si optical interconnections. We have demonstrated m InGaAlAs and InGaAsP thin-film LED's integrated directly on digital Si CMOS transmitter circuitry with digital optical operation at 100 and 155 Mb/s. We have also demonstrated a transimpedance amplifier circuit in standard digital CMOS integrated with a m thin-film I-MSM, yielding an integrated receiver with digital operation at 155 Mb/s. These system components have been combined to demonstrate the first optically interconnected 3-D three-layer CMOS system, with operation at 1 Mb/s at a BER of 10 . This is the first report of a BER measurement on a through-Si optical interconnection. System development is continuing to resolve the signal coupling problems in the high-pin-count package that have limited our operation speed to 1 Mb/s, with the goal of three-layer communication at 100 Mb/s.
