Abstract-The completed detailed design and initial phases of construction of an optoelectronic crossbar demonstrator are presented. The experimental system uses hybrid very large scale integrated optoelectronics technology whereby InGaAs-based detectors and modulators are flip-chip bonded onto silicon integrated circuits. The system aims to demonstrate a 1-Tb/s aggregate data input/output to a single chip by means of free-space optics.
Interconnects working on the scale of terabits per second and above will be required to meet the I/O requirements of future generation integrated circuits and switching systems [1], but calculations have shown [2], [3] that electrical interconnects have fundamental difficulties, arising from the skin effect, in achieving links of these bandwidths over distances of several centimeters. While state-of-the-art CMOS already offers production electronic interface technology with a capacity of 160 Gb/s [4], these limitations suggest that a new technology capable of delivering an order of magnitude more bandwidth (of the order of several terabits per second) is required to meet long-term needs.
In recent years, a technology referred to as optoelectronic very large scale integration (OE-VLSI) has emerged as a potential solution to this problem. It is based on hybrid integration of surface-normal optoelectronic devices with mainstream VLSI electronics using attachment techniques such as flip-chip bonding. A number of experimental demonstrator systems using this technology have already been constructed with terabits per second or near terabits per second optical interfaces [5]-[7], but fully populating these systems has been difficult and complete operation of such an interface is yet to be achieved. In addition, certain issues such as electrical crosstalk and thermal uniformity are unresolved.
In order to fully exercise such an interconnect in a realistic system context, we have chosen to construct a crossbar switch that exploits an optical fan-out of 64 input signals by a factor of 64 to provide the connectivity required by the switch matrix. With a 250-Mb/s targeted data-rate for each signal, the capacity of the internal interconnect is 1 Tb/s, while the system can be fully tested with only 64 input sources. In the course of this design, we have explored the issues that limit the design and have developed the technologies required by such free-space optical interconnects, including micro/diffractive optics, optomechanical packaging, vertical-cavity surface-emitting laser (VCSEL) and modulator fabrication and smart-pixel VLSI design.
Although the prime motivation for constructing this experimental system is to explore the potential of optical interconnect technology, it is nevertheless the case that crossbar switches are a good example of a system that can exploit the high connectivity offered by free-space optics. Thus, although the switch does not fully address the needs of a specific application, it remains a valuable generic architecture for considering how OE-VLSI may be used for routing applications.
1077-260X/99$10.00 © 1999 IEEE
In this paper, we describe the design of the system introduced above and report on the initial stages of construction and findings on some of the important issues in the design of terabits per second scale optical interfaces. The overall system architecture is described in Section II. The optical layout of the system and design of the optical demonstrator hardware are discussed in Section III. Descriptions of the VLSI switching circuit and the modulator/detector array with which it is flip-chip integrated are given in Sections IV and V, respectively. The inputs to the crossbar system are provided by a VCSEL array, whose performance characteristics are given in Section VI. General conclusions are drawn in Section VII.
II. SYSTEM ARCHITECTURE
The demonstrator system, which has been designed as part of the European project Smart Pixel Optoelectronic Connections (SPOEC), takes the form of an optoelectronic matrix-matrix crossbar [8]. The functional schematic of such a system is shown in Fig. 1. Starting from the left-hand side, input electrical data (64 signals) are converted into optical signals by an electrically addressed 8-by-8 VCSEL array (circuit C1). Each of the 64 optical outputs from the array is itself fanned out 64 times by an 8-by-8 fan-out diffractive optical element (DOE). The resulting set of 4096 optical signals, incident on the InGaAs optoelectronic chip (circuit C2-a), defines a partition of 64 blocks or "super-pixels" at the optoelectronic interface. Each super-pixel receives the full set of 64 optical input signals and converts these into electrical signals, which are electrically fanned in (fan-in of 64-to-1) by the silicon-based 0.6-µm CMOS routing chip (circuit C3). The unique output from each super-pixel, which represents one of the original set of 64 signals, is converted back into an optical output by means of a differential pair of multiple-quantum-well (MQW) spatial light modulators (circuit C2-b). The 64 output optical signals are sent, in this case, to a simple output chip composed of an array of photoreceivers (circuit C4) flip-chip bonded onto a silicon chip (circuit C5) that converts the signals to electrical outputs.
The crossbar system thus includes a 64 × 64 switch matrix with optical inputs and outputs permitting nonblocking one-to-one or unrestricted broadcast connectivity. The target data-rate is 250 Mb/s per channel. The system is designed as a packet switch with the routing chip configured by the packet header, which contains the address information provided as part of the optical inputs. Arbitration between those inputs which simultaneously request access to the same output is handled internally by means of a cyclic priority scheme.
In this demonstrator, a strained InGaAs on GaAs optoelectronic chip implements the optical-to-electrical and electrical-to-optical conversion of the signals, thereby allowing the use of a single chip for the circuits C2-a and C2-b. The interfacing of this chip with the silicon-based routing chip is carried out by solder-bump flip-chip bonding [9]. The specified overall throughput of the switch (i.e., before fan-out) is around 16 Gb/s, corresponding to 62 data inputs at 250 Mb/s each. The two remaining optical inputs are used for the (differential) clock signals, which are distributed optically to each super-pixel [10].
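The headline figures of the architecture follow directly from the channel counts given above. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative only and do not come from the SPOEC design.

```python
# Back-of-envelope check of the crossbar figures quoted in the text.
N_INPUTS = 64          # optical inputs from the 8-by-8 VCSEL array
FAN_OUT = 64           # 8-by-8 fan-out produced by the DOE
CLOCK_INPUTS = 2       # inputs reserved for the differential clock
BIT_RATE_MBPS = 250    # target data rate per channel, in Mb/s


def crossbar_figures():
    """Return beam count, internal capacity (Tb/s) and switch throughput (Gb/s)."""
    beams_on_chip = N_INPUTS * FAN_OUT                      # signals on the routing chip
    internal_tbps = beams_on_chip * BIT_RATE_MBPS / 1e6     # Mb/s -> Tb/s
    data_inputs = N_INPUTS - CLOCK_INPUTS                   # channels that carry data
    throughput_gbps = data_inputs * BIT_RATE_MBPS / 1e3     # Mb/s -> Gb/s
    return beams_on_chip, internal_tbps, throughput_gbps


if __name__ == "__main__":
    beams, tbps, gbps = crossbar_figures()
    print(f"{beams} beams, {tbps:.3f} Tb/s internal, {gbps:.1f} Gb/s throughput")
```

The internal interconnect capacity counts every fanned-out copy, whereas the throughput counts each data input once, which is why the two figures differ by roughly the fan-out factor.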
It is worth noting why we chose to use a separate, optically connected output chip rather than taking the signals directly off the routing chip. Firstly, we wanted to avoid having to drive 62 signals off-chip at 250 MHz. Drivers with this performance consume significant amounts of power and take up more area than we had available on the routing chip. Secondly, this routing chip is partitioned so that the high-speed signals are local to each super-pixel, and thus all the necessary data inputs and the clock are provided optically within the super-pixel area. By providing an optical output we maintain this independent structure and avoid the problem of tracking fast signals across the chip (which can lead to serious crosstalk difficulties). The advantage of optical interconnects is again demonstrated in this context by the simple manner in which the output modulators can be driven directly from the digital circuitry. It is also the case that, by avoiding cross-chip high-frequency signals, this partitioning is particularly suited to scaling up in size.
III. OPTICAL LAYOUT

A. Functional Description
The optical system layout [11], shown in Fig. 2, is designed to work at 956 and 1047 nm, the wavelengths of the VCSEL's and modulator readout laser, respectively. The two optical paths through the system are: 1) from the VCSEL array to the switching chip (Arm 1 or input arm) and 2) from the readout laser to the output chip detectors via the routing chip (Arm 2 or output arm). In the input arm, a square array of 64 optical input signals is provided by the 8-by-8 array of VCSEL's. The output from the VCSEL's is partially collimated (increased in f-number from 3 to 6.7 at 95% power gathered) using a refractive microlens array. A multi-element bulk lens (Lens 1) collimates the 64 beams. These beams are then fanned out to form an 8 × 8 array by a diffractive optical element (DOE1) and directed toward the routing chip by two polarizing beamsplitters (PBS-A and PBS-B), which operate as polarization-independent reflectors at the VCSEL wavelength. A second multi-element bulk lens (Lens 2) images the 64 × 64 array of optical signals onto the single-ended detectors of the routing chip. The collimated and polarized beam from the read laser (Nd:YLF) is fanned out to an 8 × 16 array by DOE2 and passed through PBS-A, which operates as a polarizing beamsplitter at this 1047-nm wavelength. A quarter-wave plate (λ/4 in Fig. 2) converts the polarization to circular before Lens 2 images the read beams onto the 8 × 8 array of differential modulators that form the output from the routing chip. Reflected light from the modulators is collected by Lens 2, converted to a polarization orthogonal to the input by the quarter-wave plate and reflected by PBS-A. To permit PBS-B to be an identical design to PBS-A, a half-wave plate (λ/2) polarization rotator is inserted between them. Lens 3 images the output beams onto a matching 8 × 8 array of differential detectors on the output chip.
Binary phase gratings are preferred over multi-level gratings for all the diffractive components because they provide very effective zeroth-order suppression despite their lower efficiency. This is essential to avoid optical crosstalk where the zeroth-order image of the input signals overlaps with quadrants of the four central super-pixels. The use of diffractive optics in an optoelectronic interconnect requires that critical wavelength stability and array uniformity are maintained to ensure correct alignment of the beams onto the detectors. In this system the uniformity across the VCSEL array needs to be nm.
The optical components are mounted using the slotplate approach [12]. A baseplate has been designed with V-grooves in which the barrel-mounted components sit. The advantages of this scheme include the ease of focal adjustment combined with good stability. The three circuit boards that hold the VCSEL, routing and output chips are mounted on adjustable brackets for fine angular and positional control. The entire optomechanical system, shown in Fig. 3, fits into a box approximately 30 cm × 20 cm × 10 cm in dimension. For compactness, the continuous-wave Nd:YLF laser which provides the modulator read beams is mounted below the optical baseplate.
B. Optical Design
A telecentric 4-f imaging system, composed of custom-made multi-element lenses, conveys the signals from the VCSEL array to the InGaAs/CMOS routing chip. A second 4-f relay images the outputs from the routing chip to the output chip. The specifications for the three lenses that make up this system (see Fig. 2 ) are shown in Table I . Lenses 1 and 2 are five-element telecentric anastigmatic lenses, developed from earlier designs [13] using the CODE V ray-tracing software. A particularly demanding aspect of the optical design is the requirement that Lens 2 has excellent image quality over the wavelength range 956-1047 nm to ensure good performance for both the input (VCSEL) and read (Nd:YLF) beams. The output Lens 3 is an adaptation of Reiley and Sasian's lens [14] .
Lens 2 has to be well-corrected over a large field (17.5-mm diagonal). This requirement led us to choose an f/4 lens after working through the trade-offs inherent in compound lens design. Lens 1 was then required to be f/6.7 to demagnify the data channel separation from the 250-µm VCSEL pitch to the 149.5-µm pitch of the detectors on the routing chip. In turn, a further demagnification is desirable between the switching chip and the output chip to minimize the area of the output chip. The use of microlenses on the VCSEL chip is required since the multimode emission from the VCSEL's contains 95% of the output power in a cone of numerical aperture (NA) 0.17. To collect this optical power with a single bulk lens would present challenges to later optical elements, so it was decided to split the collimation function between an 8 × 8 array of refractive microlenses ( µm, diameter 162 µm) and the following bulk Lens 1. The fused-silica microlenses are operated at f/3 to reduce the NA of the beams to 0.075 and thus match the NA of Lens 1.
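The f-numbers and pitches quoted above can be cross-checked with two paraxial relations. The Python sketch below is only a consistency check, not the CODE V design: it assumes f/# ≈ 1/(2 NA), and it assumes that Lens 1 and Lens 2 share the same collimated beam diameter, so that the relay magnification equals the ratio of their f-numbers. The function names are illustrative.

```python
def f_number_from_na(na):
    """Paraxial approximation: f/# is roughly 1 / (2 * NA)."""
    return 1.0 / (2.0 * na)


def required_f_number(source_pitch_um, image_pitch_um, image_lens_f_number):
    """f-number needed for the first lens of a 4-f relay.

    Assumes both lenses share one beam diameter, so the magnification
    (image pitch / source pitch) equals f/#_image / f/#_source.
    """
    magnification = image_pitch_um / source_pitch_um
    return image_lens_f_number / magnification


if __name__ == "__main__":
    print(f"NA 0.17  -> f/{f_number_from_na(0.17):.2f}")    # raw VCSEL emission cone
    print(f"NA 0.075 -> f/{f_number_from_na(0.075):.2f}")   # after the microlenses
    print(f"Lens 1   -> f/{required_f_number(250.0, 149.5, 4.0):.2f}")
```

Under these assumptions the numbers land close to the f/3 and f/6.7 values stated in the text.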
Modeling of the two arms of the system has shown that we can expect spot diameters at the detectors on the switching chip of 19 µm (90% enclosed energy). The expected spot sizes for the read beams at the modulators (80%) are 26 µm and the reflection from these is imaged to 13-µm (80%) spots on the output chip detectors. All of these dimensions are taken at the extremes of the imaged field. When manufacturing and assembly tolerances are included in the model, the required diameters for these three devices are expected to be 24, 28, and 18 µm, respectively. Following this analysis, and allowing for alignment margins, device diameters of 35 µm have been specified for the modulators and detectors on the switching and output chips.
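The alignment allowance implied by these figures can be made explicit. The Python sketch below assumes that the difference between the specified 35-µm device diameter and the tolerance-inclusive required diameter is available as lateral misalignment, and expresses it as a radial margin; this interpretation and the names used are illustrative assumptions rather than part of the published tolerance analysis.

```python
DEVICE_DIAMETER_UM = 35  # specified diameter for modulators and detectors

# Required diameters including manufacturing and assembly tolerances (from the text).
REQUIRED_DIAMETER_UM = {
    "routing-chip detectors": 24,
    "modulators": 28,
    "output-chip detectors": 18,
}


def radial_margins(device_um=DEVICE_DIAMETER_UM, required=REQUIRED_DIAMETER_UM):
    """Radial slack (µm) between the device edge and the required spot area."""
    return {name: (device_um - need) / 2 for name, need in required.items()}


if __name__ == "__main__":
    for name, margin in radial_margins().items():
        print(f"{name}: {margin} µm radial margin")
```

On this reading the modulators are the tightest case, since the read-beam spots are the largest of the three.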
Discrimination between the two wavelengths used is required within the system. While the 1047-nm beam from the Nd:YLF read laser can be polarization routed, the InGaAs VCSEL array is not polarization controlled. The two beamsplitters therefore need to act as PBS's at 1047 nm and as lossless polarization-independent reflectors at 956 nm to route the VCSEL emission efficiently. In addition, the large number of beams incident on the routing chip, combined with the degree of fan-out, requires that these beamsplitters operate over the relatively wide incident cone angle of 7.5°. The chosen PBS design is based on an air-spaced construction [15], rather than the usual cemented glass cube configuration, to exploit the inherent asymmetry and thus improve the performance. The materials used for the high/low refractive index coatings are TiO2 and SiO2. 27 layers were coated on a substrate of B270, resulting in a total thickness of 4.7 µm. Experimental tests of the fabricated beamsplitters have shown a contrast greater than 2:1 between s and p polarization reflection at 1047 nm and, importantly, more than 99% reflectivity of both s and p polarizations at 956 nm. These characteristics are maintained over a 16° angular range.
There is scope for a significant increase in the density of data channels relayed by optical schemes of the type described here. The spot size predictions above indicate that a receiver array on a pitch of 50 µm could be addressed, allowing data communication rates of the order of 10 Tb/s using the lenses described above. This would require a considerably denser electronic design at the routing chip than at present, but indicates the potential of free-space optical relay of data.
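The order-of-magnitude estimate for a denser receiver array can be reproduced by scaling the channel count with the inverse square of the pitch. The Python sketch below assumes the same lens field and the same 250-Mb/s rate per channel as the present design; both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def scaled_capacity_tbps(current_pitch_um, new_pitch_um, current_capacity_tbps):
    """Scale aggregate capacity by areal channel density at a fixed field size.

    Assumes the per-channel data rate and the imaged field are unchanged.
    """
    density_factor = (current_pitch_um / new_pitch_um) ** 2
    return density_factor, density_factor * current_capacity_tbps


if __name__ == "__main__":
    factor, capacity = scaled_capacity_tbps(149.5, 50.0, 1.024)
    print(f"density x{factor:.2f} -> about {capacity:.1f} Tb/s")
```

Moving from the 149.5-µm detector pitch to 50 µm gives roughly nine times as many channels, which is consistent with the figure of the order of 10 Tb/s quoted in the text.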
C. Optical Power Budget
A calculated optical power budget for the demonstrator system is shown in Table II. The optical power reaching the routing chip is of critical importance to ensure that sufficient Fig. 2. The values tabulated are those at an extreme field angle corresponding to the worst case for beamsplitter performance. Because of the limited output power of the VCSEL's and the large fan-out, Arm 1 is the most critical in terms of power budget; hence the stringent requirements on PBS reflectivity at 956 nm and the adoption of composite collimation optics for the VCSEL's. The predicted 8.6-µW optical power on each of the routing chip detectors, and the resulting photocurrent of 4.7 µA, are expected to permit operation with a satisfactory margin.
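Two quantities implied by these figures can be derived in a few lines. The Python sketch below computes the detector responsivity implied by the predicted power and photocurrent, and the margin over the 3.5-µA receiver sensitivity quoted for the final routing chip in Section IV-B. Expressing the margin in optical decibels is a choice made here for illustration, and the function names are not taken from the source.

```python
import math


def implied_responsivity(photocurrent_ua, optical_power_uw):
    """Responsivity in A/W implied by a photocurrent and an optical power."""
    return photocurrent_ua / optical_power_uw


def optical_margin_db(photocurrent_ua, sensitivity_ua):
    """Margin of the predicted photocurrent over the receiver sensitivity.

    Photocurrent is proportional to optical power, so the ratio is
    expressed as 10*log10 (optical dB).
    """
    return 10.0 * math.log10(photocurrent_ua / sensitivity_ua)


if __name__ == "__main__":
    print(f"responsivity ~ {implied_responsivity(4.7, 8.6):.3f} A/W")
    print(f"margin ~ {optical_margin_db(4.7, 3.5):.2f} dB")
```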
IV. ELECTRONIC DESIGN
Hybrid OE-VLSI technology imposes a certain number of constraints on the layout of the silicon electronics. For example, the positioning of the photoreceiver circuits, modulator drivers and the solder-bump pads required for flip-chip assembly puts stringent constraints on the placement of electronic circuits with the two-metal-layer process chosen for fabrication. In addition, the maximum die size permitted by the silicon process has implications for the degree of smartness of the super-pixel. The physical organization of the electronic chip is discussed in this section in the light of these constraints. A description of the transceivers is also given, along with their predicted performance and the key design issues relevant to designing a terabits per second OE interface.
2) Description of the Super-Pixel:
The functional configuration of one super-pixel is shown in Fig. 6. This architecture encompasses input and output interface circuits, an address decoder for each channel, a cyclic priority encoding block piloted by a 6-bit counter and a multiplexer which routes the selected input toward the light modulator acting as the output device for the super-pixel. Each super-pixel includes two clock pixels (differential optical signal), 62 input data pixels, two output amplifiers for the modulators, and scan path test logic for monitoring the address decoders and multiplexers. The input and output interface circuits are, respectively, analog receivers and modulator drivers. Because of their importance in optoelectronic hybrid-VLSI technology, they are discussed in more detail in Section IV-B.
Input data at each of the detectors are bit serial and organized into packets of arbitrary but equal length for all the channels. Each packet consists of a header section and a data section. The 7-bit header consists of a flag which specifies a valid address and a 6-bit address which identifies the destination of the packet. There is a gap between header and payload (2 clock cycles) to allow additional time for proper multiplexer switching. Each super-pixel has its own address, and thus all received packets with a matching address are considered for output. However, only the highest priority packet is successfully transmitted to the output modulator. This optically embedded routing mechanism, combined with super-pixel-level arbitration of simultaneous output requests, removes the need for global electronic control. It also avoids routing over long tracks across the chip and keeps all control signals local to a super-pixel. Another key aspect of this smart pixel architecture is the optical clock distribution to each of the super-pixels. This also avoids global routing across the chip and helps to minimize clock skew, potentially allowing higher clock rates.
The design of the arbitration scheme has been implemented so as not to compromise the performance of the data routing or the address decoding circuitry. An asynchronous arbitration ring, using a cyclic address encoder tree, is chosen over a random priority scheme since it requires less space on the chip and is simpler to test. No provision has been made, on the switching chip, for the buffering of data discarded during arbitration or for feeding back to the input information on the loss of the discarded packets.
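The behaviour of one super-pixel described above (header decoding followed by cyclic-priority selection among matching inputs) can be illustrated with a small software model. The Python sketch below is a behavioural illustration only, not the circuit: the bit order within the header (flag first, then the address most significant bit first), the way the priority pointer is supplied, and all function names are assumptions made here.

```python
def parse_header(bits):
    """Decode a 7-bit header given as a sequence of 0/1 integers.

    Assumed layout: bits[0] is the valid flag, bits[1:7] the 6-bit
    destination address, most significant bit first.
    """
    if len(bits) != 7:
        raise ValueError("header must contain exactly 7 bits")
    valid = bits[0] == 1
    address = 0
    for bit in bits[1:]:
        address = (address << 1) | bit
    return valid, address


def arbitrate(requests, counter):
    """Cyclic priority: pick the first requesting channel at or after `counter`."""
    n = len(requests)
    for offset in range(n):
        channel = (counter + offset) % n
        if requests[channel]:
            return channel
    return None  # no channel requested this output


def route_superpixel(headers, my_address, counter):
    """Return the input channel forwarded to this super-pixel's output, or None."""
    requests = []
    for header in headers:
        valid, address = parse_header(header)
        requests.append(valid and address == my_address)
    return arbitrate(requests, counter)


if __name__ == "__main__":
    headers = [
        [1, 0, 0, 0, 1, 0, 1],  # valid, address 5
        [1, 0, 0, 0, 0, 1, 1],  # valid, address 3
        [1, 0, 0, 0, 1, 0, 1],  # valid, address 5
        [0, 0, 0, 0, 1, 0, 1],  # flag clear, ignored
    ]
    print(route_superpixel(headers, my_address=5, counter=1))
```

As in the text, packets that lose arbitration are simply not forwarded; the model makes no attempt to buffer them or to notify the source.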
A specific external electrical signal (sync in Fig. 7) indicates the beginning of an address. It should be noted that it is the only external signal needed to synchronize operation of all the super-pixels. Consequently, this signal is distributed over the whole circuit through long tracks, but provided the first active clock edge arrives after sync has settled at each super-pixel, the propagation delay of this signal causes no problem. Thus, the frame rate is affected but not the data rate of the payload. Since data transmission is asynchronous, the clock signal is only needed during the address decoding phase. This helps in minimizing the power consumption of the circuit.
3) Layout and Packaging of the Chip:
The overall chip (14 580 × 15 640 µm) has been designed in a 0.6-µm technology from Austrian Mikro Systems. The core logic of the super-pixel is surrounded by horizontal and vertical power supply tracks (120 µm wide). The chip package is a PGA256 cavity-up carrier. 128 pins are devoted to digital power supply; 18 to digital signals including a global reset signal, a global synchronization signal, and 16 other test signals. 100 pins are devoted to analog power supply and bias signals.
B. Analog Design of Optoelectronic Receivers
The routing chip uses two types of receiver to amplify the detected photocurrent to a standard digital logic level: a single-ended receiver for the data channels and a differential receiver for the clock channel. The data receiver uses a simple, single-ended, dc-coupled transimpedance design similar to that described in [16]. The data receivers are designed to have a typical power consumption of 2.5 mW each, giving a total peak power consumption of 10 W during header decoding. The constraint of a two-level metal silicon process restricted the total width available for analog power supplies to approximately 40 µm per pixel. The voltage drop due to the resistance of the power supply rails between the edge and the centre of the chip placed the primary limit on the receiver supply current. The average power consumption is reduced by selectively disabling receivers during data reception according to whether the packet address matches the super-pixel address.
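The quoted peak power follows from the receiver count. The Python sketch below multiplies the per-receiver power by the number of data receivers (62 per super-pixel, 64 super-pixels, as given in Section IV); the clock receivers are left out because their power consumption is not stated, and the names are illustrative.

```python
SUPER_PIXELS = 64
DATA_RECEIVERS_PER_SUPER_PIXEL = 62
RECEIVER_POWER_MW = 2.5  # typical power per data receiver


def peak_receiver_power_w():
    """Total data-receiver power with every receiver enabled (header decoding)."""
    receivers = SUPER_PIXELS * DATA_RECEIVERS_PER_SUPER_PIXEL
    return receivers * RECEIVER_POWER_MW / 1000.0  # mW -> W


if __name__ == "__main__":
    print(f"{peak_receiver_power_w():.2f} W")
```

The result is just under 10 W, in line with the rounded figure in the text.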
Electrical tests of a prototype data receiver indicate a minimum sensitivity of 5-µA peak photocurrent at a data rate of 100 Mb/s (Fig. 8). The limited operating speed and the pattern-dependent jitter in the eye diagrams were due to the low transit frequency of the on-chip, long-channel transistor that was used to generate the small input current for electrical testing; calculations predict a roll-off at about 40 MHz. Simulations indicate that the receiver itself operates to 200 Mb/s. The receiver sensitivity is limited by the dc offset arising from transistor mismatch rather than thermal noise. Our simulations indicate that the system will operate at a bit error rate as low as 10 . The design included in the final routing chip was modified to slightly improve sensitivity to 3.5 µA, matching the requirements of the optical power budget.
The clock receiver uses an electrically differential transimpedance receiver (Fig. 9). The two-beam implementation allows a higher bandwidth at the same photocurrent per diode for the clock channel so that it can support a 250-MHz return-to-zero (RZ) clock waveform (equivalent to a 500-Mb/s data stream). It also allows the two clock VCSEL's to be biased close to or slightly above threshold to reduce turn-on delay. Two single-ended transimpedance front-ends are followed by a differential transconductance amplifier; the currents are subtracted at a node and converted to a digital output voltage by a second transimpedance stage. The transconductance-transimpedance postamplifier offers an improved gain-bandwidth product compared to a conventional voltage gain design [17]. A diode-connected clamp transistor [16], in parallel with the feedback resistor of the final transimpedance stage, limits the output swing and prevents the nonlinearity of the feedback transistor from degrading the transient response. The conventional approach to implementing a two-beam receiver [18] uses two photodiodes connected in series and requires electrically independent n-type and p-type contacts for all photodiodes. The main benefit of the electrically differential approach is a simplification of the InGaAs fabrication process (see Section V). The receiver circuit also has better immunity to common-mode voltage noise. The cost is a more complex receiver design with higher power consumption and layout area. However, the small number of clock channels per chip justifies the additional complexity. Simulations indicate operation to 500 Mb/s at a peak photocurrent of 3.5 µA per photodiode, but experimental tests were limited in speed by the same problem as the data receiver.
Note that the output chip is clocked using an electrical signal from the word generator (which provides the electrical data input to the system) and that the phase of this signal is adjusted manually to achieve synchronization with the routing chip.
An important design consideration was electrical crosstalk between receivers due to simultaneous switching noise in the analog power supply. Several techniques were used to control crosstalk: approximately 2 pF of gate-oxide decoupling capacitance was included in each pixel to filter the high-frequency supply noise; the decision stage of the receiver circuit, which created most of the switching noise, was given a separate analog power supply from that used by the sensitive front-end and post-amplifier; and a total of 100 external power supply pads were used for the analog supplies to provide acceptably low inductance.
The analog cells have been laid out to fit within a digital standard cell of height 38 µm for ease of integration with the digital logic circuitry. The data receiver occupies a width of 117 µm including flip-chip pads, power supply rails and decoupling capacitors. The clock receiver occupies a width of 255 µm.
V. MODULATOR AND PHOTODETECTOR ARRAYS
The III-V semiconductor optoelectronic interface arrays comprise only modulators and detectors. The photodetectors receive the optical inputs and are single ended; that is, a single photodiode is used to detect each channel of information. The optical outputs (modulators) are differential; two diodes are used to "transmit" each channel of information, one carrying the data while the other carries its logical complement. The two diodes in the differential output pair use a common n-type bias voltage and separate digital driver circuits are used to drive the two p-type contacts with the true and complementary data. Compared to the conventional approach of two series-connected modulator diodes, this approach reduces switching noise on the modulator bias voltage, but requires larger silicon driver circuits to sink the total photocurrent through each diode of the pair rather than the difference between the two. The modulator bias voltage is separated from the detector bias to avoid electrical crosstalk and to permit separate optimization.
The arrays are fabricated from In(Al,Ga)As strain-balanced multiple quantum well (MQW) p-i-n structures grown by molecular beam epitaxy (MBE). The structures are deposited on GaAs substrates with an intervening 2-µm-thick buffer layer containing a linear grade in In concentration. The top p+-InGaAs contact layer includes Be δ-doping to facilitate the formation of low-resistance, nonalloyed ohmic contacts. Fuller details of the MBE-grown MQW layers have been described in [19]. The processing of the arrays includes: 1) mesa isolation of the individual devices by a two-step wet chemical etch; 2) liftoff of a sputtered gold film with a bilayer photoresist to form nonalloyed contacts to the detectors and modulators (the gold film serves the additional purpose of a high-reflectivity mirror on the modulators); 3) trench isolation of the lower n contact layer to disconnect electrically the modulators and detectors; and 4) overall passivation with PECVD SiO2. Fig. 10(a) shows a scanning electron micrograph of a cross section through an array. The isolation trench is located on the left-hand side of the picture, the n metallization is left of centre, a detector mesa is on the right, while the overall SiO2 passivation layer is also clearly visible. Fig. 10(b) shows a top view of modulator (middle) and detector mesas with the isolation trench clearly visible. Test diodes fabricated as part of an array indicate turn-on voltages around 0.8 V and reverse saturation currents of less than 10 nA for a mesa with a diameter of 30 µm.
The modulators used in the demonstrator system are designed to operate with the available 5 V. However, there is a clear trend toward lower supply voltages in the underlying silicon CMOS, and it would be very desirable to remain compatible with these voltages. Thus modulator design at lower voltages is an important issue, and is also being pursued [20]. The basic physics of the quantum-confined Stark effect determines the electric fields required to change the absorption of a given material (InGaAs MQW's in this case). Thus, to reach the same fields with a lower voltage requires that the thickness of the region over which the voltage is dropped be decreased, effectively decreasing the number of QW's. One approach to accommodating this decrease is to make better use of the available absorption by incorporating the active region in an optical cavity [21]. This can be achieved in these devices by incorporating a Bragg mirror stack between the substrate and the diode structure, forming an optical cavity with the back metal mirror. We have modeled the optimum combination of mirror reflectivity and number of quantum wells for different voltages [20]. As the voltage decreases the number of wells is decreased, with an increase in the mirror reflectivity to counteract the loss of absorbing material. The major limitation of this approach is one of device manufacture. For correct operation the wavelength of the absorption peak of the MQW material must be accurately positioned with respect to the resonant wavelength of the optical cavity, which is determined by the cavity length. Modern MBE growth can achieve accuracies of 1%. Modeling suggests this would be sufficient to grow devices for the low-finesse cavities required for 3.3-V or perhaps 2-V operation, but would be very problematic for operation at lower voltages. To circumvent this problem we are investigating the possibility of postgrowth etching of the cavity thickness to tune the cavity resonance wavelength.
More immediately, we are also harnessing this approach to obtain improved modulation performance with the existing 5-V electrical drive.
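The scaling argument for low-voltage modulators can be stated numerically. If the electric field across the intrinsic region is approximated as F = V/d, then holding F fixed means the thickness d, and hence roughly the number of wells, scales in proportion to the drive voltage. The Python sketch below applies this first-order rule; it ignores the built-in diode potential, and the 60-well reference at 5 V is a purely hypothetical figure chosen for illustration, not a value from the demonstrator.

```python
def well_count_for_voltage(reference_wells, reference_voltage, new_voltage):
    """Number of wells that keeps the field F = V/d constant.

    First-order estimate: thickness (and well count) proportional to voltage,
    neglecting the built-in potential of the p-i-n diode.
    """
    return round(reference_wells * new_voltage / reference_voltage)


if __name__ == "__main__":
    HYPOTHETICAL_WELLS_AT_5V = 60  # illustrative assumption only
    for volts in (5.0, 3.3, 2.0):
        wells = well_count_for_voltage(HYPOTHETICAL_WELLS_AT_5V, 5.0, volts)
        print(f"{volts} V -> about {wells} wells")
```

The shrinking well count is what the optical cavity has to compensate, which is why the required mirror reflectivity rises as the voltage falls.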
VI. VCSEL ARRAY AND PRINTED CIRCUIT BOARD
The demonstrator switching system includes an 8 × 8 VCSEL array to provide the input signals. The fabrication of the array of VCSEL's and their drive circuits are described in this section.
A. VCSEL Structure
The overall design of the VCSEL's is shown in Fig. 11 and is based on dislocation-free strained AlAs-GaAs-InGaAs heterostructures. The MOVPE-grown VCSEL's are designed for top emission at 960 nm and consist of 30.5 pairs of λ/4-thick AlGaAs and GaAs layers for the bottom Bragg mirror and 22 pairs for the top mirror structure. In order to reduce the electrical resistance of the Bragg mirror structure, each interface is linearly graded over a region of 30 nm. To reduce the free carrier absorption, the doping level is increased in the regions with low intensity of the standing optical wave and decreased in regions of high light intensity. The total length of the cavity is 270 nm, including three central 8-nm-wide InGaAs quantum wells separated by 10-nm-wide GaAs barriers. A λ/2-thick GaAs contact layer concludes the structure. The individually addressable 8 × 8 (square) arrays were processed on quarter 2-in. wafers. The pitch is 250 µm and the overall size of each array, including bonding pads, is 3 × 3 mm². The VCSEL's have a 10-µm circular aperture diameter. The measured capacitances of the centre VCSEL (of longest wiring) and border VCSEL (of shortest wiring) are calculated to be 3.05 pF and 1.05 pF, respectively.
B. Electrical and Optical Characteristics
The DC electrical and optical characteristics of the VCSEL array bonded onto its PCB are shown in Fig. 12. The mean threshold current is 2.6 ± 0.05 mA, the mean threshold voltage is 1.9 ± 0.01 V and the output power at 8 mA is 1.25 ± 0.02 mW. The power conversion efficiency is 6.3 ± 0.1%. For this chosen array, all 64 VCSEL's are lasing (100% yield) with an emission wavelength of 956 nm and a maximum wavelength variation across the array of nm at 8 mA of input current. The additional wavelength variation when all VCSEL's are operated simultaneously is expected to be nm, which corresponds to an estimated temperature increase of 17 K caused by a dissipated power of around 600 mW. These wavelength variations are within the tolerance acceptable to the diffractive optical elements.
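The efficiency and thermal figures above can be cross-checked against each other. The Python sketch below computes the power conversion efficiency from the output power at 8 mA, taking the diode voltage at that current to be the 2.44 V operating point quoted in Section VI-C (an assumption, since the text does not state which voltage was used), and derives the effective thermal resistance implied by a 17-K rise for 600 mW. The function names are illustrative.

```python
def power_conversion_efficiency(optical_mw, current_ma, voltage_v):
    """Optical output power divided by electrical input power."""
    electrical_mw = current_ma * voltage_v  # mA * V = mW
    return optical_mw / electrical_mw


def thermal_resistance_k_per_w(delta_t_k, dissipated_w):
    """Effective thermal resistance of the mounted array."""
    return delta_t_k / dissipated_w


if __name__ == "__main__":
    eff = power_conversion_efficiency(1.25, 8.0, 2.44)
    print(f"efficiency ~ {100 * eff:.1f}%")
    print(f"thermal resistance ~ {thermal_resistance_k_per_w(17.0, 0.6):.1f} K/W")
```

The efficiency obtained this way falls within the stated 6.3 ± 0.1% range.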
High-frequency operation of the VCSEL array, prebiased at the 1.9-V threshold and bonded onto the PCB, was tested with data rates up to 500 Mb/s. Fig. 13 shows the resulting eye diagram measured on an HP54750A digitizing oscilloscope. The eye is clearly open, which demonstrates that data rates above 500-Mb/s nonreturn-to-zero (NRZ) are possible with this VCSEL/driver/PCB subsystem. Even without any prebias, the transfer of 250-Mb/s signals is possible. In this case the turn-on delay is 1.6 ns with an estimated maximum contribution of 0.7 ns from the PCB/driver subsystem. Electrical crosstalk is a potential source of errors when many VCSEL's on the PCB/driver assembly are operated simultaneously; however, it was found to be negligible. The optical intensity fluctuations induced by the modulation of one VCSEL on the output power of a neighboring VCSEL operated at a constant current of 8 mA were measured to be below 10%, and further measurements show that the observed crosstalk originates from the parallel wiring on the PCB. Since the chosen VCSEL had the longest parallel wiring on the driver PCB and VCSEL array, these measurements represent the worst case. Table III summarizes the target specifications and the measured characteristics of the VCSEL array in the PCB/driver subsystem. The assembly exceeds all of the system specifications.
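To put the 1.6-ns turn-on delay in context, it can be compared with the NRZ bit period at the two data rates mentioned. This is a back-of-the-envelope sketch; the helper name is hypothetical and the interpretation is ours rather than a result stated in the measurements.

```python
def bit_period_ns(rate_mbps):
    """NRZ bit period in nanoseconds for a data rate given in Mb/s."""
    return 1e3 / rate_mbps


TURN_ON_DELAY_NS = 1.6     # measured without prebias
PCB_DRIVER_MAX_NS = 0.7    # estimated maximum PCB/driver contribution

# Part of the delay that is at least attributable to the VCSEL itself.
vcsel_min_delay_ns = TURN_ON_DELAY_NS - PCB_DRIVER_MAX_NS

# Fraction of each bit period consumed by the unbiased turn-on delay.
delay_fraction = {
    rate: TURN_ON_DELAY_NS / bit_period_ns(rate) for rate in (250, 500)
}

for rate, fraction in delay_fraction.items():
    print(f"{rate} Mb/s: bit period {bit_period_ns(rate):.1f} ns, "
          f"turn-on delay uses {100 * fraction:.0f}% of it")
```

At 250 Mb/s the delay occupies well under half of the 4-ns bit period, which is consistent with unbiased operation being possible at the target rate, whereas at 500 Mb/s it would consume most of the bit period, which suggests why prebiasing matters at the higher rate.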
The fabrication of larger and/or denser VCSEL arrays may require close integration with external silicon drive circuits by such techniques as flip-chip bonding to minimize crosstalk between the power lines to each device. Such integration is being actively explored for possible future exploitation by a number of research groups.
C. PCB and VCSEL Drivers
The VCSEL array is mounted on an intermediate sapphire substrate, itself fixed on the PCB, as shown in Fig. 14. To design the circuitry on the PCB, the VCSEL is modeled as a 105-Ω resistance at the current versus voltage (I-V) operating points (2.4 mA, 1.85 V) and (8.0 mA, 2.44 V). Using this model, a simple passive two-port impedance network has been designed to provide a match between the specified 50-Ω digital signal generator and the VCSEL load. This gives a compromise between maximum bandwidth and minimum sensitivity to the dispersion in the VCSEL characteristics over the array. The PCB is 109 mm long and 62 mm wide and is attached to the slotted baseplate by means of an adjustable mount. If we assume that 0 and 1 have the same probability of appearing in the signals, the heat dissipation is 0.77 W for the VCSEL array and 6.14 W for the PCB (6.91 W total). In the worst case (maximum dissipation, each diode emits all the time) we expect 1.25 W for the VCSEL array and 8.50 W for the PCB (9.75 W total). We have designed a heat sink that sits close to the back plane (ground plane) of the PCB. A temperature sensor (PT100) is included to provide feedback for temperature regulation, which is accomplished with the aid of a Peltier element incorporated into the heat sink.
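The 105-Ω figure follows from the slope of the line through the two quoted operating points, and the thermal totals are simple sums. The following sketch reproduces both; the function and variable names are illustrative and not part of the design files.

```python
def differential_resistance_ohm(point_a, point_b):
    """Slope of the I-V line through two (current in mA, voltage in V) points."""
    (i_a_ma, v_a), (i_b_ma, v_b) = point_a, point_b
    return (v_b - v_a) / ((i_b_ma - i_a_ma) * 1e-3)


# Operating points quoted in the text.
r_vcsel_ohm = differential_resistance_ohm((2.4, 1.85), (8.0, 2.44))

# Heat budget in watts: (VCSEL array, PCB) for the two cases in the text.
typical_w = (0.77, 6.14)      # equal probability of 0 and 1
worst_case_w = (1.25, 8.50)   # every diode emitting continuously

typical_total_w = sum(typical_w)
worst_case_total_w = sum(worst_case_w)

print(f"VCSEL model resistance: {r_vcsel_ohm:.0f} ohm")
print(f"Typical dissipation:    {typical_total_w:.2f} W")
print(f"Worst-case dissipation: {worst_case_total_w:.2f} W")
```

The worst-case total is the figure the heat sink and Peltier element have to be sized for.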
VII. CONCLUSION
The system demonstrator described here brings together the full set of technologies required to take advantage of the potential of free-space optical interconnects. Detailed designs of all components have been completed and this permits us to draw some important conclusions.
Firstly, we have shown the feasibility of operating an optoelectronic system incorporating an optical input to a single silicon chip in the terabits per second domain. The designed system not only considers the hybrid OE-VLSI chip itself, as has already been investigated by other researchers, but also includes a practical infrastructure whereby the full performance can be demonstrated and assessed. As a design process alone this has proven to be an extremely valuable exercise, in advance of the later phase of this work in which experiments will aim to confirm the predictions of the various simulations.
Secondly, this study has acted as a driver for a number of important optoelectronic and optical technologies. An 8 × 8 VCSEL array capable of providing a 16-Gb/s electronic-to-optical interface has been developed and has been measured as satisfying numerous other system requirements, particularly with respect to array uniformity. New microoptical and diffractive optical elements have been fabricated along with compound lenses designed to handle large arrays of optical signals over the required fields of view (18 mm). Novel beam-splitters have been designed and successfully tested, and the challenge of providing a compact stable optomechanical assembly has been addressed. The development of our InGaAs modulator arrays has permitted the evaluation of improved (simplified) fabrication processes, suited to arrays of this size.
Finally, the various constraints imposed by distributing optical interfaces across a conventional CMOS chip have been addressed. This has led to the development of low-power receivers with novel designs and a study of the impact of embedding large numbers of such sensitive analog circuits with digital microelectronics. From this, we have concluded that the major factors limiting the communication capacity of this optoelectronic interconnect system are the constraints associated with the receiver designs. Combining sensitive analog electronics with digital circuitry in such close proximity is always going to be a challenge. In this system, with the limitation of using only a two-metal silicon process, the need to minimize: 1) voltage drop along the receiver power supply rails and 2) switching noise appearing on the analog power supply has proven particularly problematic. However, these are not fundamental difficulties, and the use of present generation CMOS technology, with multiple metal layers and substantially smaller feature sizes, immediately removes these constraints. Another parameter that is greatly improved by the move to silicon with smaller feature sizes is the receiver power dissipation. In the designed system, the power dissipated by each receiver was 2.5 mW, corresponding to an overall power demand of 10 W/(Tb/s). This was close to the limit of what could be easily dissipated. A study of the impact of size scaling on power consumption [22] has shown that as feature sizes drop toward 0.1 µm the electrical power consumption of the receivers also drops, giving the prospect of optical input at the level of 0.3 W/(Tb/s), although beyond this point physical limitations on transistor performance will limit further improvement in this metric. Similar studies by other authors support this trend [23], [24].
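The 10 W/(Tb/s) figure can be recovered from the per-receiver power and the channel count. The sketch below assumes the 64 × 64 fan-out and the 250-Mb/s per-channel rate given in the introduction; the variable names are illustrative.

```python
N_RECEIVERS = 64 * 64             # 64 inputs fanned out by a factor of 64
RATE_PER_CHANNEL_BPS = 250e6      # targeted data rate per signal
POWER_PER_RECEIVER_W = 2.5e-3     # designed receiver dissipation

total_power_w = N_RECEIVERS * POWER_PER_RECEIVER_W
aggregate_tbps = N_RECEIVERS * RATE_PER_CHANNEL_BPS / 1e12

# Power normalised to throughput, and the equivalent energy per bit.
watts_per_tbps = total_power_w / aggregate_tbps
energy_per_bit_pj = POWER_PER_RECEIVER_W / RATE_PER_CHANNEL_BPS * 1e12

print(f"Total receiver power: {total_power_w:.2f} W")
print(f"Aggregate input rate: {aggregate_tbps:.3f} Tb/s")
print(f"Power per throughput: {watts_per_tbps:.1f} W/(Tb/s)")
print(f"Energy per bit:       {energy_per_bit_pj:.1f} pJ")
```

Since 1 W/(Tb/s) equals 1 pJ/bit, the projected 0.3 W/(Tb/s) corresponds to roughly 0.3 pJ per received bit.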
It is worth noting that the operating speed per channel scales in line with the decreasing technology size even if there is no further improvement in photodiode capacitance from the values quoted above. A likely future implementation of a 1-Tb/s interface in a 0.1-µm technology would be in the form of 256 channels of 4 Gb/s per channel. A system with 4096 channels, such as the one described here, is therefore potentially capable of significantly higher bandwidths. The detailed study referred to above [22] indicates that the performance of the electronic receiver is unlikely to be the limiting factor for bandwidths well in excess of 1 Tb/s. This illustrates one of the attractive features of using optical techniques to solve the interconnect problems of the future.
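The channel-count arithmetic behind this scaling argument is easy to reproduce. In the sketch below, the 4096-channel case at 4 Gb/s is an illustrative extrapolation of ours, not a figure claimed in the study, and the function name is hypothetical.

```python
def aggregate_tbps(n_channels, rate_gbps):
    """Aggregate bandwidth in Tb/s for n_channels each running at rate_gbps."""
    return n_channels * rate_gbps / 1e3


# Configuration suggested in the text for a 0.1-um technology.
future_interface = aggregate_tbps(n_channels=256, rate_gbps=4.0)

# The present demonstrator: 4096 channels at 250 Mb/s.
present_system = aggregate_tbps(n_channels=4096, rate_gbps=0.25)

# ILLUSTRATIVE ONLY: the present channel count at the future per-channel rate.
extrapolated = aggregate_tbps(n_channels=4096, rate_gbps=4.0)

print(f"256 x 4 Gb/s:     {future_interface:.3f} Tb/s")
print(f"4096 x 250 Mb/s:  {present_system:.3f} Tb/s")
print(f"4096 x 4 Gb/s:    {extrapolated:.3f} Tb/s (extrapolation)")
```

The same aggregate rate is reached with a sixteenth of the channels once each channel runs sixteen times faster, which is the headroom the text refers to.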
Another parameter that will increase with time is clock frequency. The SIA Roadmap [1] predicts off-chip data rates for application-specific integrated circuits to rise to over 1.5-3.0 GHz by 2012. This is well within the capability of the optoelectronic interface devices developed as part of this study. The desire to communicate off-chip at whatever clock frequency the silicon is operating can not only be satisfied by these optical methods but becomes progressively easier to satisfy as the silicon technology itself improves.
The system that we have developed as a focus for this research, corresponding to the switching fabric of a crossbar interconnect, is not intended to be a complete architecture in its own right. Nonetheless, the completion of its design and the assessment of the critical components illustrates the manner in which the enormous capacity of free-space optical interconnects might be exploited in realistic high-performance systems of the future.

His main research interests are the design, fabrication, and characterization of VCSEL arrays.
Hans-Peter Gauggel was born in Sigmaringen, Germany, in 1966. He received the Dipl.-Phys. degree and the Dr. rer. nat. degree, both in physics, from the University of Stuttgart, Germany, in 1993 and 1998, respectively.
His graduate research work involved the design, processing and characterization of GaInP-AlGaInP DFB lasers.
Dr. Gauggel is a member of the Optoelectronics Group at the Swiss Centre for Electronics and Microtechnology (CSEM), Zurich, Switzerland, where he is working on the development and fabrication of GaAs-based VCSEL's and VCSEL arrays.
K.-H. Gulden received the diploma and Ph.D. degree in applied physics from the University of Erlangen-Nürnberg, Germany, in 1990 and 1994, respectively.
After joining the Paul Scherrer Institut, Zurich, Switzerland, he worked on the development of vertical-cavity surface-emitting lasers and integrated optoelectronic integrated circuits for applications in sensors, biochemical analysis, and optical interconnects. His expertise includes epitaxial growth, semiconductor processing and laser physics. Over 50 technical publications and three patents have accompanied this work. He is head of the Optoelectronic Device Section at the Centre Suisse d'Electronique et de Microtechnique SA (CSEM) in Zurich, Switzerland.
Alain Gauthier was born in Paris, France, in 1950. He received the degree of "diplôme d'ingénieur de l'Institut National des Sciences Appliquées de Rennes" in 1975.
He is currently a Professor of microelectronics. His research interests include both analog and digital electronics and particularly the design of sigma-delta analog-to-digital converters.
Philippe Benabes (M'98) was born in Nice, France, in 1967. He received the degree of "diplôme d'ingénieur de l'Ecole Centrale Paris" in 1989.
From 1989 to 1991, he worked for Thomson Sintra ASM as a board designer. He received the Ph.D. degree in 1994 for the work "wideband bandpass SD converters."
He is currently a Professor of electronics at SUPELEC. His research interests include both analog and digital electronics and particularly the design of bandpass sigma-delta analog-to-digital converters.

Prof. Goetz is a member of the Optical Society of America and the Optical Society of France.
Jean-Louis Gutzwiller
