Abstract-We have constructed an optoelectronic very-large-scale integration (OE-VLSI) chip with a 540-element receiver and transmitter array. Differential optical signaling was used in conjunction with a fully differential electrical architecture for the receiver and transmitter circuits. The chip was partitioned into multiple functional channels to demonstrate different chip-to-chip communication functions appropriate for applications of OE-VLSI technology. Wide optical input-output busses were provided for each channel in order to demonstrate high degrees of parallelism. The architecture and design of the chip are described in detail, including the digital functionality, the optoelectronic devices, and the arrays of receiver and transmitter circuits. The design verification of the chip is also described. We present experimental results that both verify the full functionality of the chip design and verify that the receiver and transmitter circuits and digital circuitry met their designed performance targets.
vertical-cavity surface-emitting lasers (VCSELs) and photodetectors (PDs), represent an enabling technology [5]-[8].
Using this technology, we have constructed an OE-VLSI chip with 1080 PDs and 1080 VCSELs that employs fully differential optical signaling to realize 540-element receiver and transmitter arrays that provide optical input-output (I/O) to the chip in addition to conventional electrical I/O. The PD and VCSEL arrays were heterogeneously integrated onto a CMOS VLSI chip using a flip-chip bonding and substrate removal process similar to that described elsewhere [9] . In addition to implementing multiple digital functions suitable for OE-VLSI technology, the receiver, transmitter, and digital circuits were designed for robustness, testability, and ease of operability.
Several large-scale OE-VLSI chips previously reported have employed a mix of single-ended [9] , [10] and differential [11] - [14] optical signaling. With the exception of a small subset of critical ODLs in [10] , however, all of the underlying electrical architectures of the receiver and transmitter circuits in these chips have been single-ended. For those chips that employed differential optical signaling, a principal motivation was to overcome the poor contrast ratio available from the use of multiple quantum-well devices in modulator-based transmitters.
The chip presented in this work employs differential optical signaling in conjunction with a fully differential electrical architecture for the receiver and transmitter circuits with the intention of improving the operability of large arrays of receivers and transmitters. Fully differential electrical architectures provide greater immunity to the effects of crosstalk and power supply switching noise [15] , [16] . The use of differential optical signaling with a dc-coupled fully differential receiver allows the receiver to perform self-threshold decision making through common-mode rejection [15] . This avoids the need to implement an offset control function to overcome the operational problems of large arrays of single-ended receivers that have a fixed decision threshold [17] , [18] . In single-ended optical signaling, switching noise that affects the output of the transmitter is optically transmitted to a receiver and directly impairs its operation, even when a differential electrical receiver architecture is employed. The use of differential optical signaling provides greater immunity for a receiver to optically transmitted switching noise [18] , [19] .
The drawbacks of using differential optical signaling are that the usable ODL density is halved as compared with single-ended signaling, and transmitter power dissipation is larger. In large-scale OE-VLSI applications, where the number of available ODLs is on the order of hundreds or thousands, halving the usable ODL density may not be a major sacrifice given the potential benefits described. Although the transmitter power dissipation increases with differential optical signaling due to the need to bias twice as many lasers, the additional power penalty diminishes as lasers with smaller threshold currents are developed. For tight optoelectronic device (OED) pitches of, for example, 125 µm, the physical space available to implement a receiver or transmitter circuit with single-ended optical signaling can severely limit the circuit complexity that can be implemented. Thus, from a physical implementation perspective, differential optical signaling can be advantageous as it provides twice as much area to implement the receiver or transmitter.
This chip is to be used in a system demonstrator that implements a point-to-point interchip link using an optical system based on free-space optical interconnects or on fiber image guides. The objective of the system is to demonstrate different functions appropriate for applications of OE-VLSI technology in chip-to-chip interconnects. The chip was partitioned into four separate functional channels, with optical I/O busses provided for each channel and a single, shared electrical I/O bus whose width (128 bits) matches the needs of the optical I/O busses. This paper discusses the architecture and design of the chip in detail, including its digital functionality, the OEDs, and the receiver and transmitter circuits and arrays. The design verification of the chip is also described. Experimental verification of the chip performance and design is described in detail, including the development of the printed circuit board packaging and a custom software interface that was used to control the operation of the chip. Although the printed circuit board (PCB) used for the chip testing reported in this paper had limitations that prevented exhaustive optical testing of large numbers of ODLs simultaneously, it was possible to verify the performance of the chip at the targeted data rates.
The paper is organized as follows. Section II presents the CMOS chip architecture. Section III describes the digital section of the chip, consisting mainly of the digital-functional channels. Section IV describes the construction of the receiver and transmitter arrays, including the OEDs and heterogeneous integration, and the receiver and transmitter circuit design. Section V describes the multistep design verification of the chip, including the design of the validation PCB, the software interface, and the test benches that were developed. Section VI presents the experimental results obtained. Section VII is a summary and discussion section.
II. CMOS CHIP ARCHITECTURE
Multiple digital-functional channels were implemented on the chip to demonstrate applications suitable for OE-VLSI technology. The digital-functional channels included a mix of bit-serial processing/routing (often referred to as "smart-pixel" applications) [9] , [14] , [20] [21] [22] [23] [24] , and bit-parallel processing functions [25] , [26] . Given a channel-based approach, a modular chip floor plan [25] was chosen. A photograph of the chip after heterogeneous OED integration is shown in Fig. 1 , with the principal sections of the chip indicated: the receiver array, the digital section, and the transmitter array. These sections are connected with highly parallel on-chip electrical interconnects. The digital-functional blocks, to be described in subsequent sections, were arranged in the middle section of the chip, with electrical I/O, control, and power connections distributed along the top and bottom in multiple rows of flip-chip (C4) bonding pads. The receiver and transmitter arrays were placed on the left and right sides of the digital section of the chip, respectively, with dedicated differential pairs of optical inputs and outputs for each of the four digital channels. Power connections were placed along the top and bottom sections of the receiver and transmitter arrays in multiple rows of C4 pads. Some electrical I/O and control pads were placed on the left side of the receiver array and on the right side of the transmitter array. The chip had a total of 879 off-chip electrical connections, including 47 for serial data, control, and clocking, 128 each for the electrical input and output busses, 32 for digital power and ground, and 272 each for power and ground connections for the receiver and transmitter arrays. Multiple supply voltages were provided for power for the receivers (2.5 V), the transmitters (3.3 V), the digital section (2.5 V), and the electrical I/O pads (3.3 V). All of these supply voltages were independent of one another on-chip.
The receiver and transmitter circuits were designed for operation at 250 Mb/s to match the performance of the electrical I/O pads and most of the digital-functional channels. Achieving the target performance for the receiver design was complicated by the need for the receiver to drive long interconnect busses out of the receiver array. Achieving the target performance for the transmitter design was simpler since the long interconnect busses into the transmitter array were driven by the digital circuitry. The 14.6 by 7.5-mm chip was designed in a 0.25-µm five-metal, single-poly, n-well CMOS process and fabricated by the Taiwan Semiconductor Manufacturing Company. The Canadian Microelectronics Corporation provided access to this technology through the MOSIS service. Arrays of VCSELs and PDs were heterogeneously integrated with the chip to provide optical I/O capability. Access to heterogeneous OED integration via flip-chip bonding was provided by BAE Systems. In Sections III and IV, we describe the design of the digital section of the chip and the design of the receiver and transmitter arrays in detail.
III. DIGITAL SECTION FUNCTIONALITY
The middle portion of Fig. 2 shows a block diagram of the digital section of the chip. The chip was partitioned into four separate functional channels, each of which is described in Sections III-A-D. Each channel was designed using conventional digital design methods. The Verilog hardware description language (HDL) [27] was used for coding the functionality of each channel, and Synopsys' Design Compiler software was used to synthesize these designs to the gate level. Cadence Design Systems' Design Planner and Silicon Ensemble software were used to perform automatic placement and routing for some channels and channel subblocks. Cadence Design Systems' IC Craftsman software was used to interconnect the subblocks and channels at the top level.
Banks of centrally located multiplexer circuits were used to connect the electrical input and output busses to the digital channels. The electrical input bus could be connected to multiple channels simultaneously. Only one channel (the active channel) could access the electrical output bus at any time.
Multiple sets of input and output buffers were used between the input and output bus pads, the digital channels, and the centrally located multiplexer circuits. Special care was taken in matching trace lengths and matching the number of buffers used between the different bits of the electrical busses in order to reduce skew.
The control register is a register file containing thirteen 8-bit registers that control the function and monitor the status of the digital channels. The control register is also used to set the connectivity of the electrical input and output busses. Individual registers in the register file are accessed by address. Writing to and reading from a register is performed bit-serially. To reduce the noise induced on the clock lines by the data input lines (i.e., crosstalk), a finite state machine (FSM) was implemented to control the programming function of the control register. In order to modify the contents of a register, a specific 4-bit sequence had to be input to the FSM.
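The guarded, bit-serial programming of the control register can be illustrated with a small behavioral sketch. It is not the authors' Verilog: the unlock pattern `1011`, the 4-bit address field, the MSB-first bit order, and all class and function names are assumptions chosen for illustration, since the paper states only that a specific 4-bit sequence must precede a register modification.

```python
UNLOCK = (1, 0, 1, 1)  # hypothetical unlock pattern; the actual sequence is not given


def _to_int(bits):
    """Interpret a list of bits as an unsigned integer, MSB first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


class ControlRegisterFile:
    """Bit-serial register file guarded by an unlock-sequence FSM (illustrative)."""

    def __init__(self, n_regs=13):
        self.regs = [0] * n_regs          # thirteen 8-bit registers
        self._state = "LOCKED"
        self._bits = []
        self._addr = 0

    def clock_in(self, bit):
        """Present one serial input bit on a clock edge."""
        self._bits.append(bit & 1)
        if self._state == "LOCKED":
            self._bits = self._bits[-len(UNLOCK):]      # sliding 4-bit window
            if tuple(self._bits) == UNLOCK:
                self._state, self._bits = "ADDR", []
        elif self._state == "ADDR" and len(self._bits) == 4:
            self._addr = _to_int(self._bits)
            self._state, self._bits = "DATA", []
        elif self._state == "DATA" and len(self._bits) == 8:
            if self._addr < len(self.regs):             # ignore out-of-range addresses
                self.regs[self._addr] = _to_int(self._bits)
            self._state, self._bits = "LOCKED", []


def write_register(reg_file, addr, value):
    """Unlock, then send a 4-bit address and an 8-bit value, MSB first."""
    bits = list(UNLOCK)
    bits += [(addr >> i) & 1 for i in range(3, -1, -1)]
    bits += [(value >> i) & 1 for i in range(7, -1, -1)]
    for b in bits:
        reg_file.clock_in(b)
```

In this sketch, stray input bits that do not contain the unlock pattern leave every register unchanged, which is the protective behavior the FSM is described as providing.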
The receiver and transmitter arrays served as optical inputs and outputs for the digital channels. Electrical data from the receiver array were buffered before being accessed by the digital channels. The transmitter design required complementary data inputs. Thus, the complement of each signal destined for the transmitter array was obtained using an inverter, and then both the data and its complement were buffered before being sent to the transmitter array. Care was taken in the selection of the inverter and buffer circuits to ensure that skew was not introduced in the complementary transmitter inputs.
A. First-In First-Out Channel
A typical problem encountered in OE-VLSI application-specific integrated circuit (ASIC) design is that the performance of the optical interconnect generally exceeds that of electrical I/O pads [28], [29]. In applications where data on a wide data bus are obtained from an off-chip source to be transmitted optically, the performance of the electrical interconnect can be a bottleneck. The first-in first-out (FIFO) channel (ffChan) allows the optical transfer of data between two chips while decoupling the performance of the optical interconnect from that of the electrical I/O busses. Electrical data can be loaded slowly from an off-chip source and then burst-transmitted optically. The channel consists of a send FIFO and a receive FIFO, each implemented as a 128-bit by 16-word dual-port static random access memory (SRAM) block. Independent read and write clocks are used to load data to and read data from each FIFO.
Data transfer between two chips is illustrated in Fig. 3. Fig. 3(a) shows the flow diagram of the send FIFO on the first chip. The 16 128-bit data words from the electrical input bus are written into the send FIFO using a write clock. An electrical read clock (operated by the read controller) is then used to read out the 16 128-bit data words from the send FIFO and send them to the transmitter array for optical transmission to the second chip. The electrical read clock is also sent to the transmitter array and, with a four-fold redundancy, is transmitted optically to the second chip as a strobe signal. Fig. 3(b) shows the flow diagram of the receive FIFO on the second chip. The received strobe signals from the first chip are demultiplexed, and the optimal signal is selected and used as a write clock to load the 16 128-bit data words received from the first chip into the receive FIFO. An electrical read clock is used to read out the 16 128-bit data words from the receive FIFO and send them to the electrical output bus.
The send and receive FIFOs in the ffChan can operate independently; thus, it is possible to establish bidirectional point-to-point communication between two chips. It is also possible to operate a single chip in an optical loop-back configuration, where the outputs of the transmitter array are connected to the inputs of the receiver array.
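The send/receive flow of Fig. 3 can be summarized with a minimal behavioral sketch. It models only the ordering and capacity of the two 16-word by 128-bit FIFOs; clock domains, the redundant strobe, and the optical link itself are abstracted away, and the names `Fifo` and `burst_transfer` are illustrative rather than taken from the chip design.

```python
from collections import deque

DEPTH, WIDTH = 16, 128   # 16 words of 128 bits, as in the ffChan


class Fifo:
    """Behavioral FIFO with a fixed depth and word width."""

    def __init__(self):
        self._mem = deque()

    def __len__(self):
        return len(self._mem)

    def write(self, word):
        if len(self._mem) >= DEPTH:
            raise OverflowError("FIFO full")
        if not 0 <= word < (1 << WIDTH):
            raise ValueError("word does not fit in 128 bits")
        self._mem.append(word)

    def read(self):
        return self._mem.popleft()


def burst_transfer(words):
    """Load the send FIFO, burst its contents to the receive FIFO, read back."""
    send, recv = Fifo(), Fifo()
    for w in words:                 # slow electrical load (write clock)
        send.write(w)
    while len(send):                # optical burst; the strobe acts as write clock
        recv.write(send.read())
    return [recv.read() for _ in range(len(recv))]   # electrical read-out
```

The point of the decoupling is visible in the structure: the loading loop and the burst loop are independent, so their rates need not match.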
Features were added to the ffChan to facilitate the electrical-only testing of the send and receive FIFOs. The 128-bit serial scan chain (SSC) registers were added before the input port and after the output port of both the send and receive FIFOs, allowing the electrical input and output busses to be bypassed. An additional test feature (not shown in Fig. 3 ) was added where the output of the send FIFO could be connected directly to the input of the receive FIFO, bypassing the transmitter and receiver arrays. Using these test features, it was possible to test the read and write functions of each FIFO without relying on optical I/O.
B. Data Generation Channel
The data generation channel (dgChan) was implemented to facilitate the bit-error testing of large transmitter and receiver arrays at high speeds. It incorporates a 16-bit linear feedback shift register (LFSR) that generates a pseudorandom bit sequence (PRBS). The dgChan consists of 128 corresponding sets of toggle/scan and comparator cells (described below), transmitter and receiver circuits, and electrical input and output bus bits.
Simplified flow diagrams for the toggle/scan and comparator cells are shown in Fig. 4 . Fig. 4(a) shows the flow diagram of a toggle/scan cell, which can operate in toggle or scan mode. All 128 toggle/scan cells form an SSC, where the Scan Out pin of one cell in the chain was connected to the Scan In pin of the next cell in the chain. An enable signal is provided for each toggle/scan cell from a corresponding bit on the electrical input bus. If the enable bit is held low, the corresponding transmitter circuit is sent Logic 0. In toggle mode, the toggle/scan cell generates a square wave at one half the frequency of the input clock. In scan mode, the first toggle/scan cell in the scan chain takes its input from the output of the LFSR, and the PRBS is shifted through the chain of toggle/scan cells, optically transmitting the PRBS data. Although this approach allows for the generation of a large amount of PRBS data, the data appearing at neighboring toggle/scan cells are correlated in time [30] . Fig. 4(b) shows the flow diagram of a comparator cell. Each comparator cell could be used to perform rudimentary error detection by comparing optically received data against the expected data stream generated by the LFSR. The PRBS data from the corresponding toggle/scan cell is used for comparison against data obtained from the corresponding receiver circuit. If the received data is correct, the comparator cell output is Logic 1; otherwise, it is Logic 0. The corresponding bit on the electrical output bus can either be the output of the comparator cell or the direct output of its corresponding receiver.
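The scan-mode behavior described above (an LFSR feeding a 128-cell shift chain, with per-cell comparators) can be sketched as follows. The feedback taps (16, 14, 13, 11) and the seed are assumptions; the paper does not state which polynomial was used, and this choice is simply a well-known maximal-length 16-bit configuration.

```python
def lfsr_step(state):
    """Advance a 16-bit Fibonacci LFSR (assumed taps 16, 14, 13, 11) by one clock."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def prbs(seed=0xACE1):
    """Yield the PRBS bit stream (the bit shifted in on each clock)."""
    state = seed
    while True:
        state = lfsr_step(state)
        yield state >> 15


def run_scan_chain(n_cells=128, n_clocks=256, seed=0xACE1):
    """Shift the PRBS through a chain of cells; return the chain state per clock."""
    src = prbs(seed)
    chain = [0] * n_cells
    history = []
    for _ in range(n_clocks):
        chain = [next(src)] + chain[:-1]   # Scan Out of cell k feeds Scan In of k+1
        history.append(chain)
    return history


def comparator(expected, received):
    """Per-cell comparison: Logic 1 if the received bit is correct, else Logic 0."""
    return [1 if e == r else 0 for e, r in zip(expected, received)]
```

Because each cell simply holds its neighbor's previous value, cell k carries the same sequence as cell 0 delayed by k clocks, which is the temporal correlation between neighboring cells noted in the text.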
Optical data transfer can take place between the transmitters and receivers of the same chip or between two different chips. In the single-chip case, the toggle/scan cells, comparator cells, and the LFSR operate from the same clock signal, making synchronization simple. In the two-chip case, more elaborate synchronization is required. The master chip performs the PRBS data generation and optical transmission. The slave chip receives optical data and performs the comparison. The slave chip also generates the PRBS data in phase with the master chip using its local LFSR by synchronizing its clock with that of the master chip and also by receiving a "start" signal from it. This is accomplished using clock and enable signals transmitted optically from the master chip to the slave chip. As these two signals are essential to the operation of the dgChan, the master chip optically transmits them each with four-fold redundancy. The slave chip subsequently demultiplexes these signals.
Features were added to the dgChan to facilitate the electrical-only testing of the comparator and toggle/scan cells. The D flip-flops (DFFs) within the comparator cells were connected in an SSC. Both this SSC and the one comprising the DFFs within the toggle/scan cells were accessible directly via chip I/O pads. They could be used to bypass the transmitter and receiver arrays as well as the electrical output bus. Using these test features, it was possible to test the toggle/scan and comparator cells without relying on optical I/O.
C. Feed-Through Channel
The feed-through channel (ftChan) was implemented to perform data routing functions similar to those implemented in other smart-pixel based OE-VLSI chips [9], [14], [20]-[24]. It is capable of performing add, drop, and feed-through functions on 128 bits of data. An ftChan unit cell consisted of a receiver circuit from the receiver array, a multiplexer, a corresponding transmitter circuit from the transmitter array, and corresponding bits on the electrical input and output busses. In the add mode, data from the electrical input bus are transmitted optically by the transmitters. In the drop mode, optical data are received by the receivers and placed on the electrical output bus. In the feed-through mode, optical data are received by the receivers and immediately retransmitted optically by the transmitters as well as being placed on the electrical output bus.
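The three ftChan modes amount to a per-bit multiplexer selection, which the following sketch expresses on whole 128-bit words. Driving the unused output to zero in the add and drop modes is an assumption made here for definiteness; the paper does not state what the idle outputs carry.

```python
MASK = (1 << 128) - 1   # the ftChan operates on 128 bits of data


def ft_chan(mode, optical_in=0, electrical_in=0):
    """Return (optical_out, electrical_out) for one 128-bit word."""
    if mode == "add":            # electrical input bus -> transmitters
        return electrical_in & MASK, 0
    if mode == "drop":           # receivers -> electrical output bus
        return 0, optical_in & MASK
    if mode == "feed-through":   # receivers -> transmitters and output bus
        return optical_in & MASK, optical_in & MASK
    raise ValueError(f"unknown mode: {mode}")
```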
Unlike the other digital channels, the ftChan does not provide any specific electrical-only test features. The ftChan is, however, the digital channel that provides the most direct access to transmitter and receiver circuits, without the need for any clocking. Many of the test results presented in Section VI were obtained using the ftChan.
D. Error Correction Channel
In the early development of OE-VLSI chips and systems, high reliability of the optical interconnect has been a principal yet difficult goal to attain. Yield problems beyond those normally associated with VLSI fabrication can result in an ODL having a higher than normal bit-error rate (BER) or one that is permanently inoperative. Such problems can occur, for example, during heterogeneous OED integration or during system packaging and alignment with an optical system, which can result in dead VCSELs, dead PDs, or reduced optical link power throughput due to aberrations or misalignment in the optical system.
The use of forward error correction (FEC) techniques in long-haul optical communication systems is a common approach to improving reliability and is typically based on long block lengths and high information rates using time-sequential (i.e., bit-serial) encoding and decoding functions [31], as illustrated in Fig. 5(a). The use of data (de)multiplexing in the time domain is also common in such schemes. In OE-VLSI systems, the density of the optical interconnect obviates the need for data (de)multiplexing. It is also preferable to avoid time-sequential encoding and decoding because such schemes cannot overcome a permanently inoperative ODL. A more suitable FEC approach uses a parallel decoder that receives an entire code word each clock cycle from the parallel transmission of optical data on multiple ODLs [32]-[34]. With an appropriate FEC scheme, such an approach can maintain error-free communication even when one or more ODLs are permanently inoperative.
The error correction channel (ecChan) was implemented as a parallel coder-decoder pair (codec). The FEC scheme selected was the Golay code [35] . The block length of the Golay code is 24 bits, its information rate is 0.5 (it is thus referred to as a (24, 12) code), and it is capable of correcting up to three errors and detecting up to four errors in a 24-bit encoded word. The selected implementation of the (24, 12) Golay codec, which uses purely combinational circuitry, is particularly suitable for VLSI implementation due to its compactness and low latency [32] , [36] .
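The stated properties of the (24, 12) Golay code can be checked with a short software model. This sketch is not the combinational codec of [32], [36]: it builds the extended code from the standard (23, 12) cyclic generator polynomial plus an overall parity bit and decodes by syndrome table lookup, a choice made here purely for brevity. The bit ordering within the 24-bit word is likewise an assumption.

```python
from itertools import combinations

GEN_POLY = 0xC75  # x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1, a (23, 12) Golay generator


def _remainder(word23):
    """Remainder of a 23-bit polynomial divided by GEN_POLY over GF(2)."""
    for shift in range(22, 10, -1):
        if word23 & (1 << shift):
            word23 ^= GEN_POLY << (shift - 11)
    return word23


def _parity(x):
    return bin(x).count("1") & 1


def encode(data12):
    """Encode 12 data bits into a 24-bit word: data, 11 check bits, overall parity."""
    shifted = (data12 & 0xFFF) << 11
    cw23 = shifted | _remainder(shifted)
    return (cw23 << 1) | _parity(cw23)


def syndrome(word24):
    """12-bit syndrome; zero for every valid code word."""
    return (_remainder(word24 >> 1) << 1) | _parity(word24)


def _build_table():
    """Map the syndrome of every error pattern of weight 0 to 3 to that pattern."""
    table = {}
    for weight in range(4):
        for positions in combinations(range(24), weight):
            err = sum(1 << p for p in positions)
            table[syndrome(err)] = err
    return table


SYNDROME_TABLE = _build_table()


def decode(word24):
    """Return the 12 data bits, or None if an uncorrectable error is detected."""
    err = SYNDROME_TABLE.get(syndrome(word24))
    if err is None:
        return None
    return ((word24 ^ err) >> 12) & 0xFFF
```

For example, `decode(encode(0xABC) ^ 0b111)` recovers `0xABC` despite three flipped bits, while a four-bit error yields `None`. In the parallel scheme described above, each of the 24 bits travels on its own ODL, so a permanently dead link appears as at most one error per code word.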
The ecChan has six subchannels, each consisting of 24 receiver circuits, 24 transmitter circuits, an encoder, a decoder, and the 12 corresponding payload bits on the electrical input and output busses. In total, the ecChan uses 72 bits of data on the electrical input and output busses and 144 receiver and transmitter circuits. Although twice as many ODLs are required to achieve the same throughput of an unencoded link for the same ODL data rate, it should be noted that, theoretically, the use of parallel FEC could achieve an improved ODL data rate due to coding gain [32] .
There are three modes of operation for the ecChan. In the normal mode, there are only combinational circuit elements in the data paths. The six subchannel encoders directly encode the 72 bits of data on the electrical input bus, with the resulting 144 bits of encoded data sent to the transmitter circuits. On the decoding side, the 144 bits of data obtained by the receiver circuits are decoded by the six subchannel decoders into 72 bits and placed on the electrical output bus. In the test mode, a number of sequential circuit elements are added to the data paths to facilitate electrical-only testing. The encoder has SSC registers at its inputs and outputs that can be used to bypass the electrical input bus and the transmitter circuits, respectively. The decoder also has SSC registers at its inputs and outputs that can be used to bypass the receiver circuits and the electrical output bus, respectively. The decoder has additional registers to allow errors to be inserted into the decoder input and to monitor the internal behavior of the decoder. In the bypass mode, the encoder and decoder of each subchannel are bypassed. No data processing is performed in this mode, facilitating the testing of the remaining circuitry.
IV. RECEIVER AND TRANSMITTER ARRAYS
The left-and right-hand portions of Fig. 2 show block diagrams of the receiver and transmitter arrays, respectively. The receiver and transmitter arrays are comprised of 540 receiver and 540 transmitter circuits and serve as optical I/O for the digital-functional channels described in Section III. The receiver and transmitter circuits in the arrays [discussed in Sections IV-A-D] were organized into groups that were controlled independently to allow for operational flexibility. Each channel had 128 bits of optical I/O organized as four common-control groups of 32 bits in each array, as indicated in Fig. 2 . Additional optical I/O was allotted to some channels on the right-hand side of the receiver array and the left-hand side of the transmitter array, as indicated in Fig. 2 . The ffChan had an additional 4 bits of optical I/O used for the read and write clocks for the channel's send and receive FIFOs, respectively. The dgChan had an additional 8 bits of optical I/O used for clock and enable signals. The ecChan had two additional sets of 8 bits of optical I/O. These additional sets of bits for the ecChan are located adjacent to the 32-bit common-control groups of the ftChan and ecChan in the receiver and transmitter arrays (refer to Fig. 2 ). All additional sets of transmitter and receiver circuits formed independent common-control groups in each array.
A vertical column of digital buffers was inserted approximately in the middle of the receiver and transmitter arrays, as shown in Fig. 2 . This was done to break up the long electrical interconnects out of the receiver array and into the transmitter array into shorter and more uniform segments. The control registers for the receiver and transmitter arrays were located at the vertical center of their respective arrays, with some dedicated I/O and power pads on the left-hand side of the receiver array and the right-hand side of the transmitter array, as shown in Fig. 2 . These control registers were used to digitally set the magnitudes of various parameters (feedback resistance magnitude for the receivers and bias and modulation current magnitude for the transmitters) and to control the enable and test signals for each common-control group. The power pads along the top of the arrays provide power for the receiver and transmitter circuits for the ffChan and dgChan, and the power pads along the bottom of the arrays provide power for the receiver and transmitter circuits for the ftChan and ecChan. The supply voltage for the receiver array was 2.5 V for voltage-level compatibility with the digital section of the chip, to which it directly interfaces. The supply voltage for the transmitter array was 3.3 V to accommodate the relatively large forward-bias voltage drop of the VCSELs. The inputs to the transmitter array are generated directly from the digital section of the chip and are rail-to-rail 2.5 V CMOS signals.
The receiver and transmitter circuits were designed for operation at a data rate of 250 Mb/s. Sections IV-A-D describe the characteristics and modeling of the PDs and VCSELs used in the receiver and transmitter arrays, the design of the receiver and transmitter circuits, and the formation of the receiver and transmitter arrays.
A. PD and VCSEL Properties, Modeling, and Integration
The 34-row by 35-column PD and VCSEL arrays were heterogeneously integrated with the CMOS chip using two flip-chip bonding and substrate removal procedures similar to that described in [9] . Of the 1190 devices in each OED array, 1080 were used with transmitter or receiver circuits. The remaining 110 OEDs in each array are physically present but not electrically connected to receiver or transmitter circuits. This occurs in the two horizontal rows of OEDs that covered the receiver and transmitter array control registers and the vertical column of OEDs that covered the buffers in the receiver and transmitter arrays (refer to Fig. 2) . Additionally, two sets of four OEDs near the four-element common-control group of the ffChan (refer to Fig. 2) were not connected to receiver or transmitter circuits.
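The circuit and device counts quoted in this section follow from the per-channel optical I/O allocations given above, as the following bookkeeping sketch confirms. The dictionary layout and variable names are illustrative only; all numbers are taken from the text.

```python
# Optical I/O per channel: 128 data bits plus any additional bits from Section IV.
channel_bits = {
    "ffChan": 128 + 4,       # plus 4 bits for FIFO clocks
    "dgChan": 128 + 8,       # plus 8 bits for clock and enable signals
    "ftChan": 128,
    "ecChan": 128 + 2 * 8,   # plus two additional sets of 8 bits
}

circuits_per_array = sum(channel_bits.values())   # receiver (or transmitter) circuits
devices_used = 2 * circuits_per_array             # differential: two OEDs per circuit
devices_total = 34 * 35                           # 34-row by 35-column OED array
devices_unused = devices_total - devices_used

print(circuits_per_array, devices_used, devices_total, devices_unused)
```

The ecChan total of 144 circuits also matches its six subchannels of 24 circuits each.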
The PDs were fabricated by BAE Systems with square active areas of 50 µm on a side and 15 × 15 µm pads for the p and n contacts. The array was constructed as a tile of two-row by one-column unit cells on 125-µm horizontal and 250-µm vertical pitches. Within a unit cell, the p and n contacts of the top and bottom PDs were mirrored about the horizontal axis. The PD active areas were placed on a 125-µm vertical pitch. When the unit cells were tiled, the active areas of all PDs were on a 125-µm grid. [...] contacts and were arranged on a standard 125-µm grid. Fig. 7(a) and (b) shows photographs of a 2 × 2 VCSEL subarray before and after heterogeneous integration, respectively.
Measurements were taken on OED samples to determine their nominal optical and electrical properties and their sensitivity to temperature variations. Table I summarizes the optical and electrical properties of the samples. All property values are measured quantities except for those stated as exact numbers, which are quoted from OED manufacturer data sheets. The PD responsivity across the sample array was found to be highly uniform and insensitive to reverse bias voltage, temperature, wavelength, and incident power. Most VCSEL parameters were also found to be fairly uniform across the sample array, with the differential resistance being an exception.
During receiver design, the PDs were modeled using lumped circuit elements, with optical power represented as a voltage signal. A capacitor was used to model the junction capacitance, and was placed in parallel with a voltage-controlled current source to model the conversion of incident optical power to input photocurrent, using the PD responsivity as the transconductance gain parameter. Two different models were used for the VCSELs during transmitter design. One was a simple model using lumped circuit elements. A dc voltage source was placed in series with a resistor, and both elements were placed in parallel with a capacitor to model the junction capacitance. The VCSEL output power, represented as a voltage signal, was modeled using a current-controlled voltage source using the slope efficiency as the transresistance gain parameter. Although satisfactory for initial transmitter designs and used frequently in the literature [8], [9], [37], this lumped element model was unable to model critical elements of VCSEL operation such as below- or near-threshold operation and the temperature dependencies of the threshold current and slope efficiency. A more accurate VCSEL model based on behavioral modeling [38]-[40] was developed using Verilog-A, a behavioral HDL for analog circuits. The Verilog-A model was based on the lumped circuit model, but it incorporated the typical diode-like exponential current-voltage relationship, as well as the temperature dependence of the threshold current and slope efficiency, as detailed in Table I.
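The static part of these device models can be sketched in a few lines. This is not the Verilog-A model: the junction capacitance and the diode current-voltage relationship are omitted, the temperature dependence is assumed linear, and every numerical default below is a hypothetical placeholder rather than a value from Table I.

```python
def pd_photocurrent(p_opt_w, responsivity_a_per_w=0.5):
    """PD model: photocurrent (A) = responsivity (A/W) x incident power (W)."""
    return responsivity_a_per_w * p_opt_w


def vcsel_power(i_a, temp_c=25.0, i_th0_a=1e-3, eta0_w_per_a=0.3,
                dith_dt=1e-5, deta_dt=-1e-3, t0_c=25.0):
    """VCSEL light-current model with assumed linear temperature coefficients.

    Below threshold the output is clamped to zero, which the simple
    current-controlled-source model cannot represent.
    """
    i_th = i_th0_a + dith_dt * (temp_c - t0_c)        # threshold current (A)
    eta = eta0_w_per_a + deta_dt * (temp_c - t0_c)    # slope efficiency (W/A)
    return max(0.0, eta * (i_a - i_th))
```

With these placeholder values, a 3-mA drive yields less output power at 85 °C than at 25 °C because the threshold current rises and the slope efficiency falls, the kind of effect the behavioral model was introduced to capture.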
B. Receiver Circuit Design
One of the principal reasons that differential optical signaling was employed was to accommodate the dc-coupled nature of the input photocurrent. Optically single-ended receivers have a fixed decision threshold, and variations in the average input photocurrent across an array of receivers can cause severe operational problems in receiver groups that are commonly biased and/or controlled [17] . In optically differential receivers, a fully differential preamplifier architecture with common-mode feedback (CMFB) circuitry [41] stabilizes the operating point and common-mode output voltage of the preamplifier in the face of variations in common-mode input photocurrent.
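The difference between a fixed decision threshold and the self-thresholding of a differential receiver can be illustrated numerically. The power levels, threshold, and link scale factors below are arbitrary illustrative values, and the comparison ignores noise and the preamplifier entirely; it only shows why variation in average received power across an array breaks a shared fixed threshold.

```python
HIGH, LOW = 1.0, 0.2          # nominal received power for Logic 1 / Logic 0 (arb. units)
FIXED_THRESHOLD = 0.6         # shared threshold of a single-ended receiver group


def single_ended_decide(power):
    return 1 if power > FIXED_THRESHOLD else 0


def differential_decide(p_pos, p_neg):
    return 1 if p_pos > p_neg else 0    # threshold is implicit in the signal pair


def count_errors(bits, link_scale):
    """Return (single-ended errors, differential errors) for one link."""
    se_errors = diff_errors = 0
    for b in bits:
        p_true = link_scale * (HIGH if b else LOW)
        p_comp = link_scale * (LOW if b else HIGH)   # complementary beam
        se_errors += single_ended_decide(p_true) != b
        diff_errors += differential_decide(p_true, p_comp) != b
    return se_errors, diff_errors


bits = [0, 1, 1, 0, 1, 0, 0, 1]
for scale in (0.3, 1.0, 4.0):           # weak, nominal, and strong links
    print(scale, count_errors(bits, scale))
```

In this toy setting the single-ended decision fails on the weak link (every 1 is missed) and on the strong link (every 0 is misread), whereas the differential decision is unaffected because both beams scale together.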
Two preamplifier designs were implemented. One is based on a feedback-free common-gate amplifier (CGA) with a diode-connected load, and the other is based on a conventional differential transimpedance amplifier (TIA) with resistive feedback. Transistor-level schematics of the CGA and TIA are shown in Fig. 8(a) and (b) , respectively. A common feature of the two preamplifier designs was the inclusion of circuitry (transistors MP in Fig. 8 ) to allow for functional circuit testing prior to heterogeneous OED integration. After heterogeneous integration, the PDs would appear in parallel with transistors MP, as indicated by the dashed lines in Fig. 8 . The corresponding active-low digital control inputs and for the receivers in all common-control groups that form a digital-functional channel are connected together, and can be used to inject small amounts of current (approximately 60 A) into either input of the preamplifier circuit, mimicking a differential input photocurrent.
The feedback resistances used in the TIA preamplifier configuration [resistors RF in Fig. 8(b)] are implemented using active devices and can be tuned using digital control inputs [42]. The transistor-level implementation of the feedback resistance is shown in Fig. 9. One reference N-type metal-oxide-semiconductor (NMOS) transistor (MR) and four other NMOS transistors (M0-M3) are all connected in parallel. MR is kept permanently conducting by having its gate terminal connected to the receiver supply voltage. This establishes a nominal resistance of approximately 16 kΩ for small current magnitudes. Transistors M0 through M3 have width-to-length (W/L) ratios that progressively increase by a factor of two; M0 is the smallest, with the same W/L ratio as MR. When M0-M3 are made conductive by setting their corresponding gate terminal control voltages to a digital Logic 1 voltage, resistor values of approximately 16, 8, 4, and 2 kΩ, respectively, are established. Because these resistances appear in parallel with MR, a total of 16 different effective RF magnitudes can be established, from 16 down to 1 kΩ. The active-high control inputs are common to all receivers in a common-control group, with different control input sets for each common-control group.
Fig. 10 shows a block diagram of the receiver and is applicable for either preamplifier design. There are four circuit stages that follow the preamplifier, including two postamplifier stages, a Schmitt-Trigger inverter stage, and a line driver stage. The two postamplifier stages are parallel NMOS- and P-type metal-oxide-semiconductor-based folded cascode differential amplifiers that employ feedback to maintain bias point stability [43]. They are intended to amplify the preamplifier output to signal levels that approach the receiver power supply rails and to convert the differential signal into a single-ended one.
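The 16 settings of the digitally tuned TIA feedback resistance described above follow from combining the always-on reference device with the four binary-weighted devices in parallel. The Python sketch below treats each transistor as an ideal resistor with the approximate values quoted in the text; the bit ordering of the control code (bit 0 drives M0) and the function name are assumptions made for illustration.

```python
# Effective feedback resistance RF for a 4-bit active-high control code,
# assuming ideal resistors: MR = 16 kOhm (always on), M0..M3 = 16, 8, 4, 2 kOhm.

R_REF_KOHM = 16.0
R_BRANCH_KOHM = (16.0, 8.0, 4.0, 2.0)  # M0, M1, M2, M3


def feedback_resistance_kohm(code):
    """Return RF in kOhm for a control code 0..15 (bit i enables Mi, assumed)."""
    if not 0 <= code <= 15:
        raise ValueError("code must be a 4-bit value")
    conductance = 1.0 / R_REF_KOHM  # MR conducts permanently
    for bit, r_kohm in enumerate(R_BRANCH_KOHM):
        if (code >> bit) & 1:
            conductance += 1.0 / r_kohm
    return 1.0 / conductance


if __name__ == "__main__":
    for code in range(16):
        print(f"code {code:2d}: RF = {feedback_resistance_kohm(code):6.3f} kOhm")
```

In this idealized picture the result reduces to RF = 16 kΩ/(1 + code), which gives 16 kΩ with all four devices off and 1 kΩ with all four on, consistent with the stated range.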
The hysteresis provided in the transfer characteristic of the Schmitt-Trigger [44] stage provides immunity to switching noise on the receiver power supply rails [9], [45]. The line driver stage is a pair of cascaded inverters with a W/L stage ratio of approximately three. Its purpose is to ensure rail-to-rail receiver operation and to drive the on-chip interconnect to the nearest buffer, which can be up to 2 mm away, at the target data rate of 250 Mb/s.
A representative receiver layout with the CGA-based preamplifier design is shown in Fig. 11. The receiver layout for the TIA-based preamplifier design is very similar. The layout is divided into two separate regions, each of which is protected by guard rings to immunize the circuit from substrate noise [46]. The line driver stage is in the smaller region on the right side of Fig. 11, isolated from all of the other receiver circuit stages in the larger region on the left. Isolating the line driver stage helps prevent the switching noise it generates from affecting the operation of the other receiver circuit stages and neighboring receivers. The region with the line driver stage has 25-µm horizontal and 18-µm vertical dimensions. The region with the remaining receiver circuit stages has approximately 80-µm horizontal and 36-µm vertical dimensions. The two layout regions are separated by approximately 25 µm. This separation could have been made larger to improve isolation, but doing so would have required the Schmitt-Trigger stage to be designed to drive a longer interconnect, which would have resulted in additional switching noise being generated.
The power dissipation of the receiver circuit was estimated to be between 8.5 and 9.5 mW per receiver under most operating conditions at a data rate of 250 Mb/s. The ftChan, which uses the smallest number of receivers, would dissipate approximately 
C. Transmitter Circuit Design
The optically differential transmitter circuit is based on a current steering design, and its simplified schematic is shown in Fig. 12. A current steering design was chosen over a current switching design [30] to minimize generation of power supply switching noise. Each VCSEL is offset biased with a bias current (IB). The intermediate modulation current (IM) is steered through one of the two VCSELs by transistors M1L and M1R using a pair of complementary rail-to-rail CMOS inputs. Operation of the transmitter is as follows: in the logic low (high) transmitter state, the complementary inputs steer IM through M1R (M1L), so that the VCSEL in that branch outputs a large amount of optical power and the other VCSEL outputs a smaller amount of optical power.
The IM and IB current sources shown in Fig. 12 are tuned by control circuits with digital control inputs, called the modulation and bias current control blocks (MCCB and BCCB), respectively [42]. The MCCB and BCCB are digital-to-analog converters and are identical except for the sizes of their constituent transistors; those in the MCCB are twice as large as their BCCB counterparts. The inputs to the MCCB and BCCB are sets of five active-low digital control signals used to encode the current magnitude. For the MCCB, the range of settable IM currents is from 0 to 4.96 mA in increments of 160 µA. For the BCCB, the range of settable IB currents is from 0 to 2.48 mA in increments of 80 µA. Fig. 13 shows the simplified schematic of the MCCB and BCCB. Each digital control input is connected to a corresponding switchable current source. The least significant control bit is connected to the smallest current source, and subsequent control bits are connected to current sources whose magnitudes progressively increase by a factor of two. A diode-connected transistor collects current from the activated current sources and generates the control signal. Each common-control transmitter group has its own MCCB and BCCB, and the generated signals are used to set the magnitudes of IM and IB for all transmitter circuits in the common-control group. Each common-control group of transmitter circuits has independent control inputs. The transmitter design includes circuitry to allow for circuit testing prior to heterogeneous OED integration in the form of transistors M2L and M2R (see Fig. 12). These additional transistors are in parallel with the VCSELs after heterogeneous OED integration is performed, as indicated by the dashed lines in Fig. 12. The two corresponding active-low digital control inputs are common to all transmitters in all common-control circuit groups in a digital-functional channel.
When enabled, M2L and M2R allow electrical paths to exist for the IM and IB currents to flow when the VCSELs are not present in the circuit topology.
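The relationship between the five active-low control bits of the MCCB and BCCB and the resulting IM and IB settings described above can be sketched as an ideal binary-weighted current DAC. The step sizes (160 µA and 80 µA) come from the text; the LSB-first bit ordering, the function name, and the neglect of current-mirror error are assumptions made for illustration.

```python
# Ideal model of the 5-bit MCCB/BCCB current settings (values in microamps).

IM_STEP_UA = 160  # MCCB increment from the text
IB_STEP_UA = 80   # BCCB increment from the text


def dac_current_ua(active_low_bits, step_ua):
    """Return the programmed current in uA.

    active_low_bits: sequence of five 0/1 values, least significant bit first
    (ordering assumed). A 0 enables the corresponding current source because
    the control inputs are active-low.
    """
    if len(active_low_bits) != 5 or any(b not in (0, 1) for b in active_low_bits):
        raise ValueError("expected five bits, each 0 or 1")
    code = sum((1 - bit) << i for i, bit in enumerate(active_low_bits))
    return code * step_ua


if __name__ == "__main__":
    all_on = (0, 0, 0, 0, 0)   # every source enabled
    all_off = (1, 1, 1, 1, 1)  # every source disabled
    print(dac_current_ua(all_on, IM_STEP_UA))   # full-scale IM in uA
    print(dac_current_ua(all_off, IB_STEP_UA))  # zero IB
```

With all five sources enabled the model gives 31 steps, that is, 4.96 mA for IM and 2.48 mA for IB, which matches the stated ranges.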
The physical layout of the transmitter is shown in Fig. 14 
D. Array Construction
The construction of the receiver and transmitter array layouts was performed manually using a bottom-up approach. Receiver unit cell layouts for each of the two preamplifier designs and the transmitter unit cell layout were prepared, including vertically oriented power and ground rails and appropriately located contact pads and passivation windows for OED integration. Upon completion, the unit cell receiver and transmitter layout blocks were tiled to form the common-control circuit groups for the digital-functional channels. In the case of the receivers, a common-control group always consisted of receiver designs of the same type; CGA and TIA designs were never mixed.
Upon completion of the layouts for the common-control receiver and transmitter circuit groups, the layouts of the portions of the receiver and transmitter array for each of the digital-functional channels were prepared by tiling the layouts of the common-control circuit groups. With the exception of the ftChan, all common-control receiver groups that form a digital-functional channel consisted of receiver designs of the same type to maintain uniform receiver latency within a channel. The common-control groups of the ffChan and ecChan used TIA-based receiver designs, whereas those of the dgChan used CGA-based receiver designs. In the case of the ftChan, uniform latency was not a design consideration, and the receiver designs used for the four common-control groups were evenly split between TIA- and CGA-based designs.
The interconnections between the receiver outputs and buffer circuits and between the buffer circuits and transmitter inputs were then routed. The layouts of the two receivers in a receiver unit cell were mirrored about the center horizontal axis of the unit cell to match the specialized PD contact arrangement described earlier in Section IV-A. In the receiver design, the p and n contacts of each PD were connected to the preamplifier input and to the receiver supply voltage, respectively. The supply voltages were routed vertically using the top two metal layers. Thus, by performing the horizontal mirroring of the PD contacts and receiver layouts, wide horizontal channels devoid of any interfering circuitry or lower level metal layers were created below the PD n contacts when receiver unit cells were tiled. These wide routing channels facilitated the routing of the many receiver outputs out of the receiver array with increased spacing, helping to reduce electrical crosstalk. Fig. 15 illustrates a 2 × 2 layout tile of receiver unit cells, including the locations of passivation windows for the PD p and n contacts. The locations of the routing channels are indicated.
Completion of the receiver and transmitter arrays was performed by positioning each section of the array corresponding to the digital-functional channels, adding the layouts for the power pads, buffer circuits, and the receiver control I/O and then manually routing all the electrical interconnections.
V. POSTFABRICATION ASIC DESIGN VERIFICATION

A. Validation PCB
A custom PCB was designed and fabricated to assist in design verification and to perform testing of the ASIC. The backside of the validation PCB contained a conductive area for packaging the ASIC using a chip-on-board approach. Capacitors were placed around the ASIC to decouple the power supplies and reduce switching noise. The conductive area was plated with gold, contained thermal vias to aid in the distribution of heat dissipated by the ASIC and was maintained at ground potential. Due to wirebond pitch limitations on the chip and the validation PCB, bonding fingers were provided to connect to only a subset of the pads on the ASIC. These included the rows of pads that provide power to the receiver and transmitter arrays that are closest to the chip edges and one row of power pads for the digital section of the chip and for the chip pads themselves. Additionally, bonding fingers were provided for all of the scan chain I/O, control, and clock pads for the digital section of the chip and the receiver and transmitter control registers. There were no connections for the electrical input and output busses. Of the 879 chip pads, only 216 wirebond connections were made to bonding fingers of the validation PCB. Fig. 16(a) shows a photograph of the portion of the backside of the validation PCB, where the ASIC is packaged. Fig. 16(b) shows a photograph of the entire backside of the validation PCB. It should be noted that the ASIC was designed for packaging via flip-chip bonding to a PCB or fanout substrate for the incorporation of the chip in a point-to-point interchip link.
The front side of the validation PCB contains all of the components used to operate the chip, and is shown in Fig. 16(c) . Connectors provide raw power to sets of voltage regulators for the digital section of the chip and the top and bottom portions of the receiver and transmitter arrays. The voltage regulators serve to minimize resistive voltage drops on these power supply lines. A 100-pin connector was used to interface the various control and data lines on the chip with a digital I/O card inside a computer running a custom software interface program described in Section V-B. This ASIC-computer interface is capable of operating only at slow speeds on the order of 100 kHz. To allow the chip to be operated significantly faster, one of the global clock signals on the chip was connected to a high-speed connector on the validation PCB. The high-speed connector was positioned close to the corresponding clock pad of the chip.
B. Software Interface
A custom software interface was developed to control a National Instruments PCI-DIO96 digital I/O card. The I/O card interfaced with and controlled all operational aspects of the ASIC on the validation PCB. A graphical user interface (GUI) was used to control low-level capabilities, including the ability to write or read individual pins on the ASIC and to write or read any of the registers in the digital section control register and the receiver and transmitter control registers. Additional GUIs, which make use of the low-level routines, were developed to facilitate control of the receiver and transmitter arrays when performing optical testing in the laboratory.
Higher level test bench routines and GUIs were developed to perform extensive testing of the digital-functional channels and control registers. These test benches were ported to the C++ programming language directly from the Verilog test bench routines used to verify the design of the ASIC prior to fabrication. Several scan tests were implemented for the control registers, the SSCs at the input and output ports of the ffChan send and receive FIFOs, the SSCs for the toggle/scan and comparator cells of the dgChan, and the SSCs at the inputs and outputs of the encoders and decoders of the ecChan. These scan tests scan a series of test vectors through the SSCs and verify that the same vectors are scanned out. Test vectors could be generated either deterministically or randomly. For the ffChan, a number of memory tests were implemented to detect defects in the FIFO SRAM storage elements. These tests write vectors to the SRAM and then read them back out for verification. Multiple sequence types could be generated to test for various defect classes [47], such as random vectors, constant-increment vector sequences, and walking zeroes and ones. For the ecChan, special test benches were developed to test the encoding and decoding functions. Unencoded vectors could be scanned in to test the encoder. Pre-encoded vectors could be scanned in to test the decoder. The pre-encoded vectors could also be scanned in and have errors injected prior to being decoded to test the error-correction capabilities of the decoder.
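The memory test sequences named above (walking ones, walking zeroes, and constant-increment vectors) and the write-then-read-back check can be sketched in Python as follows. This is not the C++ test bench code used with the chip; the word width, memory depth, function names, and the dictionary standing in for the FIFO SRAM are all hypothetical.

```python
# Sketch of deterministic test vector generation and a write/read-back check.
# The callables write(addr, value) and read(addr) stand in for the real
# hardware access routines, which are not described at this level in the text.


def walking_ones(width):
    """One vector per bit position, with a single 1 walking through zeroes."""
    return [1 << i for i in range(width)]


def walking_zeroes(width):
    """One vector per bit position, with a single 0 walking through ones."""
    mask = (1 << width) - 1
    return [mask ^ (1 << i) for i in range(width)]


def constant_increment(width, start, step, count):
    """Vectors that increase by a fixed step, wrapping at the word width."""
    mask = (1 << width) - 1
    return [(start + k * step) & mask for k in range(count)]


def memory_test(write, read, vectors):
    """Write each vector to consecutive addresses, then read all back.

    Returns the list of addresses whose read-back value differs.
    """
    for addr, value in enumerate(vectors):
        write(addr, value)
    return [addr for addr, value in enumerate(vectors) if read(addr) != value]


if __name__ == "__main__":
    sram = {}  # fault-free stand-in memory
    failures = memory_test(sram.__setitem__, sram.__getitem__, walking_ones(8))
    print("failing addresses:", failures)
```

A stuck-at-zero bit in the stand-in memory would show up as a failing address for the walking-ones vector that exercises that bit, which is the kind of defect class these sequences are intended to expose.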
C. Design Verification
Postfabrication verification of the chip design was performed in multiple steps. During fabrication of the chip, the software interface was debugged using a simulated version of the ASIC [48] . Verilog and Verilog-A HDL models were developed for the digital circuitry and the receiver and transmitter arrays. The complete HDL model for the chip was simulated within the Affirma software environment from Cadence Design Systems, running on a SUN Ultra-60 workstation. The software interface for the chip, running on an IBM-compatible personal computer, interfaced with the simulated chip by replacing the PCI-DIO96 card interface code with function calls from the Verilog programming language interface, allowing the I/O pads of the simulated chip to be accessed through TCP/IP ports.
Upon completion of chip fabrication and software interface debugging, a chip on which heterogeneous OED integration had not been performed was packaged on a validation PCB. Electrical-only testing was performed using the software interface to verify the chip design by running through the suite of digital test benches described in Section V-B. During initial testing, a design error was discovered in which a clock input pad and a data input pad were incorrectly wired to the internal circuitry. These errors were corrected on a number of fabricated chips using a focused ion beam microsurgery technique from Fibics, Inc. The repaired chips subsequently passed all of the digital test benches. Electrical-only testing was also performed on the receiver and transmitter circuits, making use of the digital test control inputs discussed in Section IV. For the receivers, these control inputs were used to force all outputs to a common logic state, which was checked using the available SSC registers in the digital-functional channels. For the transmitters, an indirect form of operational testing was used. The transmitter inputs were accessed via the available SSC registers in the digital-functional channels. Using these inputs, along with the test control inputs and the IB and IM magnitude control inputs, various current paths in the transmitter were established and eliminated, allowing crude testing to be performed by monitoring changes in the current drawn by the voltage regulators on the validation PCB.
VI. EXPERIMENTAL TEST RESULTS
After the ASIC design was verified, heterogeneous OED integration was performed on additional chips which were subsequently packaged on validation PCBs. Most of the test benches described in Section V were run on these chips to ensure they were free from manufacturing defects. Subsequently, a number of optical experiments were performed to characterize the postintegration qualities of the receiver and transmitter circuits.
The postintegration yield of the VCSELs in the transmitter array was determined by setting the bias currents of all the transmitters to forward-bias all of the VCSELs in the array. There were 1047 of 1080 VCSELs operative, for a yield of 96.9%. Fig. 17 shows the illuminated transmitter array biased below threshold. The inoperative VCSELs are indicated by white circles and were found to occur in a mix of random and clustered locations. It should be noted that the dark regions (one single vertical column, two horizontal rows, and two sets of 2 × 2 squares in the top-left section) of the otherwise illuminated array correspond to the locations of VCSELs that are not connected to transmitter circuits, as described in Section IV-A.
The light-current characteristics for the VCSELs in a representative common-control group (the common-control group of the dgChan in which all 64 VCSELs are operative) are shown in Fig. 18. The IB/64 curve was obtained by measuring the aggregate output power of all 64 VCSELs as the IB magnitude setting was swept through all 32 possible states from 0 to 2.48 mA with IM kept at 0 mA. The IM/32 curve was obtained by setting all of the transmitters in the group to be in Logic State 0 and measuring the aggregate output power of 32 VCSELs as the IM magnitude setting was swept through all 32 possible states from 0 to 4.96 mA. Comparing these results to Table I, we find that the experimentally measured threshold current is slightly larger than the specified value of 1.40 mA, whereas the slope efficiency is significantly smaller than the 0.340-W/A specification. An increasingly elevated VCSEL operating temperature as the bias and modulation currents are increased is one factor in the discrepancy because there was no means available to stabilize the temperature of the chip. Based on the threshold current temperature coefficient from Table I, the elevated threshold current obtained experimentally corresponds to an elevated temperature of approximately 12.5 °C, assuming no other factors contributed to the increase. Using a similar analysis with the slope efficiency temperature coefficient, the reduced slope efficiency corresponds to an elevated temperature of approximately 70 °C, assuming no other factors contributed to the decrease. Because these two estimates disagree, temperature was clearly not the sole factor that resulted in an elevated threshold current and a reduced slope efficiency. Experimental uncertainty in performing the measurements is also a factor. Any threshold current and slope efficiency variations among individual VCSELs within the common-control transmitter group were subject to averaging effects that could distort the results. Additionally, imperfect current mirroring in the MCCB and BCCB could result in either a larger or smaller IB or IM current flowing in the VCSELs than the intended settings.
It is also possible that the VCSEL properties were degraded as a result of the heterogeneous integration process.
The VCSEL output power in each common-control transmitter group was obtained with the IB magnitude setting at 2.0 mA. Fig. 19 shows the results of this measurement, with the x- and y-axes corresponding to the spatial arrangement of the common-control groups. This spatial arrangement is a horizontally mirrored version of the one shown on the right-hand side of Fig. 2. The average VCSEL output power was obtained by dividing the total measured power by the number of operative VCSELs for each common-control group. These results, which are subject to the same experimental uncertainty described earlier, suggest that neighboring VCSELs tend to have much greater parametric uniformity than those in disparate parts of the array. For the different common-control transmitter groups within a digital-functional channel, variation in the average VCSEL output power ranged from 9.7 (ffChan and dgChan) to 11.7 (ecChan). Across the entire VCSEL array, variation in the average VCSEL output power was 51.1. A likely reason for the large array-scale variation is the variation in temperature across the chip caused by the different amounts of power dissipated by each functional channel, which could exaggerate the temperature-induced disparity in VCSEL properties from one transmitter circuit group to another. The packaging limitations of the validation PCB were another reason, as only the outermost row of power pads for the transmitter array was wirebonded, resulting in unwanted IR voltage drops on the power supply and ground rails.
An experiment involving a common-control transmitter group (the same for which the data in Fig. 18 is plotted) on one validation board and a common-control receiver group on another validation board (in the ecChan) was performed to investigate the receiver switching characteristics under dc conditions. The two common-control groups were imaged onto each other using a bulk-lens optical system. The two test control inputs for the common-control receiver group were set to complementary states, injecting approximately 60 µA of current into one input of each receiver and forcing all receivers to a Logic 0 state. Light from the common-control transmitter group was incident onto the corresponding PDs such that the resultant photocurrent was injected into the other input of each receiver, attempting to force them to the Logic 1 state. The IB magnitude of the common-control transmitter group was kept at 0 mA, and the IM magnitude was swept through all possible states between 0 and 4.96 mA. For each IM setting, the average transmitted optical power per VCSEL, the estimated received optical power per receiver, and the percentage of receivers in the common-control group that switched to the Logic 1 state were recorded. Fig. 20 shows the results of this analysis. All but two receivers in the common-control group could eventually be made to switch in this experiment, and one of those two was verified to be in a permanent stuck-at-zero state using the test inputs. It should be noted that this experiment lumps together numerous effects that could detrimentally affect the percentage of switched receivers, such as nonuniform transmitted optical power, power throughput variations in the imaging system across the field of view due to misalignment and aberrations, and nonuniformity across the common-control receiver group. Fig. 20 indicates that, if provided with enough optical power (approximately 360 µW), most receivers could be made to switch notwithstanding these detrimental effects.
The operational performance of the receiver and transmitter circuits was characterized at various data rates and operating conditions. The transmitter circuits in the dgChan were tested at high speed by configuring the channel to generate PRBS data and using a high-speed clock signal brought onto the chip via the high-speed connector on the validation PCB. The optical output patterns from individual VCSELs were spatially filtered and captured by an external detector, and eye diagrams were generated on a communication signal analyzer (CSA). Fig. 21 shows eye diagrams from various transmitters at a data rate of 250 Mb/s. Eye diagrams along a row correspond to the complementary outputs of the same transmitter and exhibit varying degrees of similarity from one transmitter to another. Fig. 22 shows the eye diagrams from one output of a transmitter at data rates of 250, 300, 600, and 900 Mb/s. Despite the long on-chip interconnect between the digital section of the chip and the transmitter array, it is clear that the transmitters perform well at data rates well above their target 250-Mb/s data rate.
The receiver circuits in the ftChan were tested by configuring the channel to operate in the drop mode and using single-ended PRBS optical data from an external laser source. The corresponding bit on the electrical output bus was accessed via microprobing, and eye diagrams were generated on the CSA. Fig. 23 shows eye diagrams from selected receivers at data rates of 10, 25, 100, and 250 Mb/s. There is a significant amount of switching noise present, as well as voltage overshoot and undershoot that has been intentionally cropped to show the full eyes. A likely cause of the switching noise and voltage overshoot and undershoot is that there were an insufficient number of digital I/O power pad connections made to the chip from the validation PCB. Only two such connections were made for the entire electrical I/O bus.
A similar setup was used to test the combined optical-electrical-optical performance of the receiver and transmitter circuits by operating the ftChan in feed-through mode. The optical input to the receiver was converted to an electrical signal and passed through the digital section of the chip directly to the corresponding transmitter in the transmitter array, which retransmitted the data optically. The optical output from the transmitter was spatially filtered and captured by an external detector, and the CSA was again used to generate eye diagrams. Fig. 24 shows eye diagrams at data rates of 50, 100, 150, and 250 Mb/s. The quality of the eye diagrams in both Figs. 23 and 24 degrades at data rates approaching the target 250-Mb/s data rate. The results of Fig. 24 clearly indicate that the performance of the electrical output pads is the limitation for the results of Fig. 23 and is the source of the switching noise and voltage overshoot and undershoot. Also, it is clear based on the transmitter results from Fig. 22 that the performance of the receiver is the limitation in the receiver-transmitter link, likely due to the long on-chip interconnect that the receiver must drive at its output.
It was not possible to use conventional BER testing equipment to obtain BER data from any of the ODLs due mainly to the packaging limitations of the validation PCB and the data format requirements (rail-to-rail CMOS signals) of the chip I/O pads. Additionally, the error counter of a BER test setup could not be used in conjunction with the dgChan, despite its runlength PRBS data generation capability, because synchronization to the PRBS data could not be achieved. Consequently, BER could only be estimated theoretically through manual calculation [49] based on the experimentally obtained eye diagrams. However, BER results obtained in this manner are highly subjective and suspect in terms of absolute accuracy. Additionally, in the case of the transmitters, they are not representative of actual BER performance because only eye diagrams for single-ended transmitter outputs were available for analysis. Measuring the true BER performance of the transmitter would have required analysis of the differential output, which could not be obtained experimentally. Nevertheless, such BER data provides meaningful insight into the statistical and relative behavior of individual ODLs within the receiver and transmitter arrays. Fig. 25 shows the calculated relative BER for several single-ended transmitter outputs within a common-control group of the dgChan at a data rate of 250 Mb/s. The data is normalized such that the BER of the worst-performing transmitter output is equal to one. The spatial arrangement of columns in Fig. 25 matches the physical location of the corresponding transmitters in the common-control group. The gridlines demarcate individual transmitters within the common-control group. The relative BERs are all within two orders of magnitude of one another.
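One common way to estimate BER from an eye diagram, and to normalize the results as in Fig. 25, is the Gaussian-noise Q-factor method sketched below in Python. The text does not state which calculation from [49] was actually used, so this method is an assumption, and the function names and example values are hypothetical.

```python
import math

# Q-factor BER estimate from eye-diagram statistics, assuming Gaussian noise
# on both logic levels and an optimally placed decision threshold.


def q_factor(mean_one, mean_zero, sigma_one, sigma_zero):
    """Q = (mean_one - mean_zero) / (sigma_one + sigma_zero)."""
    return (mean_one - mean_zero) / (sigma_one + sigma_zero)


def ber_from_q(q):
    """BER = 0.5 * erfc(Q / sqrt(2)) under the Gaussian-noise assumption."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def relative_ber(bers):
    """Normalize so that the worst-performing output has a relative BER of one."""
    worst = max(bers)
    return [b / worst for b in bers]


if __name__ == "__main__":
    # Hypothetical eye statistics (arbitrary units) for three outputs.
    eyes = [(1.0, 0.0, 0.08, 0.07), (1.0, 0.0, 0.09, 0.08), (1.0, 0.0, 0.085, 0.08)]
    bers = [ber_from_q(q_factor(*eye)) for eye in eyes]
    print(relative_ber(bers))
```

Because the estimate depends exponentially on Q, small errors in reading the level means and noise spreads off the eye diagram produce large changes in absolute BER, which is consistent with the caution above that only the relative values are meaningful.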
VII. SUMMARY AND DISCUSSION
We have presented the architecture, circuit design, design verification, and experimental testing of an OE-VLSI chip with a 540-element receiver and transmitter array that employs differential optical signaling and a fully differential electrical architecture. Multiple digital functions suitable for OE-VLSI technology were implemented, and the receiver, transmitter, and digital circuits were designed for robustness, testability, and ease of operability. The receiver and transmitter circuits were designed to specifically meet the performance requirements of the digital circuitry and the electrical interface.
The experimental results presented on the electrical and optical performance of the chip verify the full functionality of the chip design and indicate that the receivers and the digital circuitry were operational close to their target data rates, and that the transmitters exceed their performance target.
When integrated into the chip-to-chip demonstrator system, the system will be capable of an aggregate interchip data bandwidth of [ 456 Gb/s. Neglecting the performance of the digital circuitry, the aggregate data bandwidth that can be achieved by the ASIC is limited by the receiver design constraints. The 125-µm pitch in the 34 × 35 OED arrays places a severe burden on the receiver, which consequently must drive long electrical interconnects out of the receiver array. Increased aggregate data bandwidths could be readily achieved if the OED pitch were, for example, halved to 62.5 µm. One approach would leave the receiver and transmitter designs unchanged (both the receiver and transmitter circuit layouts could fit without modification into the smaller area) and increase the size of the receiver and transmitter arrays fourfold. Another approach would be to redesign the line driver stage of the receiver, which would have an interconnect to drive that is only half as long, for higher performance.
