Abstract: Small, high-speed, and low-power optical transmitter circuits are needed for optical interconnects to play a role in improving chip I/O bandwidth. This paper demonstrates two different transmitter designs fabricated in a 1V 90nm CMOS technology, one suitable for driving vertical cavity surface emitting lasers (VCSELs) and the other for driving multiple quantum well modulators (MQWMs). In the VCSEL transmitter, a four-tap current summing FIR equalizer extends VCSEL data rate for a given average current. This transmitter consumes 80mW at 18Gb/s operation and occupies 0.03mm² area. The MQWM transmitter has a pulsed-cascode output stage capable of supplying a voltage swing of twice the 1V supply without overstressing thin-oxide core devices. It consumes 38mW at 16Gb/s and occupies 0.014mm² area.
I. INTRODUCTION
Using optics to help reduce the chip I/O bottleneck requires high-speed, low-cost electro-optic conversion circuits. This paper explores designs optimized for creating a large number of high-speed optical transmitters on a single die. For these applications, one needs to achieve high speed while keeping the area and power costs modest. Thus the goal is not the highest performance possible, but rather the best performance at reasonable cost. In addition, to keep manufacturing costs down, it makes sense to build the circuits using generic CMOS technology.
For optical transmitters, VCSELs [1] and MQWMs [2] are the two primary electro-optic transducer candidates for high density optical I/O systems. VCSELs are typically driven with current-mode drivers, since the current directly controls the emitted light. Currently, commercial VCSELs are available that work at 10Gb/s data rates with simple non-return-to-zero (NRZ) modulation [3]. Device performance is limited by a combination of electrical device parasitics and the optical bandwidth's square-root dependence on average current. Output power saturation due to self-heating [4] and device lifetime concerns [5] limit how far the VCSEL's average current can be increased to achieve higher bandwidth.
MQW modulators are used to externally modulate the intensity of an incident beam from a continuous wave laser source. Since the modulator's optical absorption is sensitive to the electric field across it, the reflected beam is modulated by changing the voltage across the device. While MQWMs don't have the optical bandwidth issues of VCSELs, the contrast ratio that can be achieved with CMOS-level voltage swings is somewhat limited, with a typical modulator requiring 3V swing for 3dB contrast ratio [6]. While recent work has been done to lower modulator drive voltages to near 1V [7], robust operation will require swings larger than predicted CMOS supply voltages [8].
This paper describes VCSEL and MQWM transmitters that address the issues associated with driving these optical devices at high data rates in a CMOS technology. The VCSEL transmitter has a four-tap current summing FIR equalizer that extends the VCSEL data rate. The modulator transmitter has a pulsed-cascode output stage capable of supplying a voltage swing of twice the nominal supply without overstressing the thin-oxide core devices used for maximum speed.
II. TRANSMITTER ARCHITECTURE
The transmitter architecture, shown in Fig. 1, uses a time division multiplexing factor of five in order to reduce on-chip clock rates. From the frequency synthesis PLL, a five-stage coupled pseudo-differential supply-regulated ring oscillator provides five sets of complementary phases spaced a bit-time apart that are used to switch a 5-to-1 mux to produce a serial data stream. The serial data stream is then buffered by a current-mode driver for VCSEL transmission or by a voltage-mode driver for MQWM transmission.
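The 5-to-1 time division multiplexing described above can be illustrated with a small behavioral sketch. This is not a circuit model; the function names `serialize` and `on_chip_clock_ghz` are hypothetical, and the only fact taken from the text is the multiplexing factor of five, with each clock phase selecting one bit per clock cycle.

```python
from typing import List

MUX_FACTOR = 5  # time division multiplexing factor stated in the text


def serialize(parallel_words: List[List[int]]) -> List[int]:
    """Flatten 5-bit parallel words into one serial bit stream.

    Each word holds the five bits presented during one clock cycle;
    clock phase k (k = 0..4) selects bit k, so each bit occupies
    one-fifth of the clock period on the serial output.
    """
    serial: List[int] = []
    for word in parallel_words:
        if len(word) != MUX_FACTOR:
            raise ValueError("each parallel word must hold exactly 5 bits")
        for phase in range(MUX_FACTOR):
            serial.append(word[phase])
    return serial


def on_chip_clock_ghz(data_rate_gbps: float) -> float:
    """Clock frequency needed when five bits are sent per clock cycle."""
    return data_rate_gbps / MUX_FACTOR
```

Under this model, the 18Gb/s and 16Gb/s data rates reported later in the paper would correspond to on-chip clock frequencies of about 3.6GHz and 3.2GHz respectively.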
III. VCSEL TRANSMITTER
A common technique in high-speed links is to use linear finite impulse response (FIR) filters to flatten the channel frequency response. While the VCSEL's varying frequency response with current limits the performance of a linear equalizer for large signal modulation, the frequency response variations diminish with increasing average current due to the square-root relationship, so a linear equalizer is effective in canceling ISI. Fig. 2 shows the VCSEL transmitter with a four-tap equalizer consisting of one pre-cursor, one main, and two post-cursor taps implemented by summing current sources at the output node. At each tap, a pseudo-differential multiplexer serializes the five parallel input bits with five pairs of complementary clocks spaced at one-fifth the clock cycle. The multiplexed data switches differential output drivers that steer current between the VCSEL and dummy diode-connected thick-oxide nMOS devices that are connected to a separate higher voltage LVdd supply. This higher supply is necessary to support the VCSEL knee voltage.
The data bits are shifted one bit time with respect to the clock phases to implement the necessary filter delays at each tap. 8-bit current mirror DACs bias the output stages to the desired current value. Because of the smaller current requirements of the pre/post-cursor taps, their muxes and output stages are set to one-fourth the size of the main tap to save power. A static DC current source, I_DC, biases the VCSEL for adequate frequency response.
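A minimal behavioral sketch of the four-tap current-summing equalizer follows. It makes several assumptions that are not stated in the text: each tap is modeled as an ideal current source steered fully to the VCSEL or to the dummy load, the pre- and post-cursor taps may be driven with inverted data to emphasize transitions, bits outside the pattern are treated as 0, and all tap currents in the usage note are hypothetical values chosen only for illustration.

```python
from typing import List, Sequence


def equalized_current_ma(
    bits: Sequence[int],
    tap_ma: Sequence[float],
    invert: Sequence[int],
    i_dc_ma: float,
) -> List[float]:
    """Per-bit VCSEL current from summing four steered tap currents.

    Tap order is [pre-cursor, main, post-cursor 1, post-cursor 2].
    Tap k looks at the bit delayed by (k - 1) bit times, so the
    pre-cursor tap sees the next bit and the post-cursor taps see
    the previous two bits. A tap steers its current into the VCSEL
    when (bit XOR invert flag) is 1, otherwise into the dummy load.
    """
    n = len(bits)
    out: List[float] = []
    for t in range(n):
        total = i_dc_ma  # static bias current, always in the VCSEL
        for k, (amp, inv) in enumerate(zip(tap_ma, invert)):
            idx = t + 1 - k
            bit = bits[idx] if 0 <= idx < n else 0  # pad with zeros
            if bit ^ inv:
                total += amp
        out.append(total)
    return out
```

For example, with the hypothetical settings `tap_ma=[0.5, 4.0, 1.0, 0.5]`, `invert=[1, 0, 1, 1]`, and `i_dc_ma=2.0`, an isolated 1 in the pattern `[0, 0, 1, 0, 0]` gives a current that dips just before the pulse, peaks during it, and undershoots afterwards, which is the kind of transition emphasis a linear FIR equalizer provides.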
The quality of the clock phases that drive the multiplexer has a large performance impact because any variations in duty cycle or phase spacing will result in increased timing uncertainty. To compensate for this, the mux predriver delays are made programmable by adding digitally-adjustable capacitive loads to the predriver mid node. Independent control of each predriver provides correction for both systematic duty cycle and random phase spacing errors.
IV. MQWM TRANSMITTER
For modern CMOS technologies, an output swing greater than the nominal power supply is required in order to provide an appropriate contrast ratio with a MQWM. A static-biased cascode output stage (Fig. 3a) [9] can potentially provide an output swing of twice the nominal supply without overstressing the core devices in a static high or low output state. However, during a falling transition, the middle node, mid_n, must discharge more than a threshold voltage, V_th, below the supply before the cascode nMOS conducts significant current and the output begins discharging. This causes an excessive drain-source voltage to develop across the cascode nMOS and can result in hot-carrier degradation. The cascode pMOS experiences similar stress during a rising transition.
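The stress condition in the static-biased cascode can be written out as a rough bound. This is a sketch under stated assumptions, not a result from the paper: the output is taken to still sit at twice the supply when the cascode nMOS starts to conduct, the cascode gate is held at V_dd, and the symbols V_out, V_midn, and V_ds,casc are labels introduced here.

```latex
% Cascode nMOS conducts only once V_{midn} < V_{dd} - V_{th},
% while the output is still near 2 V_{dd}:
V_{ds,\mathrm{casc}} = V_{out} - V_{midn}
  > 2V_{dd} - \left(V_{dd} - V_{th}\right)
  = V_{dd} + V_{th}
```

So, under these assumptions, the drain-source voltage of the cascode nMOS exceeds the nominal supply by at least a threshold voltage at the start of a falling transition.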
A potential solution is a double-cascode output stage with output tracking (Fig. 3b) [10]. While this implementation is more reliable, the speed is limited by the three series transistor stack and feedback tracking loops. In order to maximize speed, a driver is proposed which uses only a two series transistor stack and prevents excessive voltage stress by pulsing the cascode transistor gates during transitions. Fig. 4 shows the pulsed-cascode output stage, which accepts both a "low" input, IN_low, that swings between Gnd and the nominal chip Vdd and a "high" input, IN_high, with the same data value that has been level shifted to swing between Vdd and Vdd2, where Vdd2 is nominally twice the voltage of Vdd. Cascode pMOS (MP2) and nMOS (MN2) are driven by NAND-pulse and NOR-pulse gates respectively.
During an output transition from high to low, the "low" input switches the bottom nMOS (MN1) to drive node mid_n to Gnd, and the "high" input triggers a positive pulse from the NOR-pulse gate that drives the gate of MN2. This allows the output to begin discharging at roughly the same time that the MN2 source is being discharged, as shown in Fig. 5, where Vdd is 1V. Thus, the cascode nMOS drain-source voltage does not overly exceed the nominal supply voltage. The NOR-pulse gate is sized such that the gate of MN2 does not swing all the way to Vdd2 and the edge rate of the pulse signal matches the falling rate of mid_n. Therefore, during the transition, the gate-source voltage developed across MN2 does not overly exceed the nominal supply. The "high" input also activates a pull-down nMOS (MN3) to drive node mid_p from Vdd2 to Vdd to prevent excessive V_ds stress on MP2. Similarly, during an output transition from low to high, the "high" input switches the top pMOS (MP1) to drive node mid_p to Vdd2 and the "low" input triggers a negative pulse from the NAND-pulse gate that drives the gate of MP2. For ratios of C_out/C_midn from 1.3 (unloaded) to 15.5, no voltage between two terminals of any output device exceeds the supply voltage by more than 20%.
In order to minimize the body voltage effect on the cascode transistors' threshold voltages and increase the modulator driver's output transition rates, the cascode transistors are placed in separate wells that are dynamically biased with replica circuitry to track their source voltages. Fig. 6 shows the level-shifting mux that drives the modulator driver output stage. The five parallel input bits are serialized with five pairs of complementary clocks spaced at one-fifth the clock cycle. The multiplexer is loaded by an nMOS (M1) biased with a gate voltage, V_bias, equal to Vdd+V_th. M1 and source resistor R_s are sized such that the mux output swings about half the nominal supply from Vdd to produce the "low" signal path. A common-gate amplifier, consisting of M1 and resistor R_ls, level-shifts the multiplexer output to produce the "high" signal path. The amplifier gain is roughly 1.5 to avoid excessive V_ds stress across M1. The common-gate level-shift configuration easily allows the use of active inductive shunt peaking in order to increase the multiplexer bandwidth. This is implemented by adding a resistor to the M1 gate.
The "high" and "low" outputs are amplified by pseudo-nMOS inverters to reliably switch the buffers that drive the output stage with full CMOS levels. In order to compensate for delay between the "high" and "low" signals caused by the common-gate level shifter, a slightly lower inverter fanout ratio (1.5) is used in the "high" signal path, compared to the "low" signal path fanout ratio (1.8). The "high" signal path inverter nMOS transistors lie in a separate p-well to minimize body effects and improve delay tracking. Also, metal fringe coupling capacitors between the "high" and "low" signal paths perform skew compensation.
V. EXPERIMENTAL RESULTS
The transmitters were implemented in a 1V 90nm CMOS technology as part of an optical transceiver test-chip (Fig. 7). The VCSEL transmitter is placed close to the chip edge, with a commercial 10Gb/s VCSEL attached via short bondwires. The VCSEL has a 700µA threshold current and a 0.37 slope efficiency. A 2.8V supply is used in the final stage to support the 1.5V knee voltage of the VCSEL. Fig. 8 shows equalization providing a 32% increase in vertical optical eye opening at 18Gb/s with 6.8mA average VCSEL current, I_avg, and 3dB extinction ratio (ER). Fig. 9 shows the maximum data rate, defined by a minimum eye opening of 80% and less than 40% overshoot, versus I_avg with equalization and with all equalization turned off, for 3 and 6dB ER. At 14Gb/s and 3dB ER, equalization allows the VCSEL to run at 35% less average current, which results in a 138% increase in VCSEL lifetime. Equalization extends the maximum data rate from 14 to 18Gb/s for 3dB ER and from 13 to 15Gb/s for 6dB ER before exceeding driver current levels. The equalization works better with the lower extinction ratio due to the large signal nonlinearities in the VCSEL transient response.
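To give a feel for how average current and extinction ratio relate to the two drive levels, the sketch below solves for the logic-0 and logic-1 currents. It rests on assumptions not made in the text: optical power is taken to be linear in current above threshold, ones and zeros are equally likely, and equalization taps are ignored. The function name `drive_levels_ma` is hypothetical.

```python
from typing import Tuple


def drive_levels_ma(
    i_avg_ma: float, i_th_ma: float, er_db: float
) -> Tuple[float, float]:
    """Return (I0, I1) in mA for a given average current and extinction ratio.

    Assumes optical power is proportional to (I - I_th), so that
    ER = (I1 - I_th) / (I0 - I_th), and balanced data, so that
    I_avg = (I0 + I1) / 2.
    """
    ratio = 10 ** (er_db / 10)  # extinction ratio as a linear power ratio
    # Solve the two equations above for the current above threshold at logic 0.
    i0_above_th = 2 * (i_avg_ma - i_th_ma) / (1 + ratio)
    i0 = i_th_ma + i0_above_th
    i1 = i_th_ma + ratio * i0_above_th
    return i0, i1
```

With the reported 6.8mA average current, 700µA threshold, and 3dB ER, this simplified model gives drive levels of roughly 4.8mA and 8.8mA.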
VCSEL transmitter total power dissipation is 80mW at 18Gb/s, with 26mW PLL power, 25mW in the mux and predrivers, and 29mW in the VCSEL output stage. The entire VCSEL transmitter, including PLL, mux, and equalizer output stage, occupies 0.03mm² area.
An array of MQWM transmitters is located in the middle of the test-chip under flip-chip bondpads used to attach the MQWMs. The modulator driver functionality at 16Gb/s operation is verified electrically by using on-chip samplers to subsample the output voltage and convert it to a proportional current that is driven off-chip and viewed with an oscilloscope. Fig. 10 shows an electrical eye diagram obtained by subsampling a 16Gb/s 20-bit pattern and post-processing using sampler calibration information. The reliability of the pulsed-cascode output stage and level shifter is verified via corner and Monte Carlo simulations. Transient simulations with different operating temperatures and with various transistor and resistor models show that the maximum absolute voltage developed between the output transistors' gate, source, and drain terminals does not exceed the nominal 1V supply by more than 11%. Monte Carlo simulations yield tight distributions for all device maximum voltages of interest, with all sigmas less than 13mV.
MQWM transmitter total power dissipation is 38mW at 16Gb/s, with 23mW PLL power and 15mW consumed in the level-shifting mux and output stage. The entire MQWM transmitter occupies 0.014mm² area.
VI. CONCLUSION
Two different output stages capable of reliably driving either VCSELs or MQWMs at high data rates are presented. In order to ease the trade-off between VCSEL bandwidth and reliability, an equalizing output stage extends the data rate for a given average current. This transmitter achieves 18Gb/s at 80mW, or 4.4mW/(Gb/s). The pulsed-cascode output stage in the MQW modulator transmitter achieves an output voltage swing of twice the nominal CMOS supply without overstressing thin-oxide core devices. It achieves 16Gb/s at 38mW, or 2.4mW/(Gb/s). These good power efficiencies and small area requirements indicate that building the required electro-optic circuitry should not limit the use of optical I/O.
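The power figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the power breakdowns and data rates reported in the experimental results; the dictionary keys and function names are hypothetical labels.

```python
from typing import Dict

# Power breakdowns in mW, as reported in the experimental results.
VCSEL_BREAKDOWN_MW: Dict[str, float] = {
    "pll": 26,
    "mux_and_predrivers": 25,
    "output_stage": 29,
}
MQWM_BREAKDOWN_MW: Dict[str, float] = {
    "pll": 23,
    "level_shifting_mux_and_output_stage": 15,
}


def total_power_mw(breakdown: Dict[str, float]) -> float:
    """Sum the per-block power contributions."""
    return sum(breakdown.values())


def mw_per_gbps(power_mw: float, data_rate_gbps: float) -> float:
    """Power efficiency expressed as mW per Gb/s."""
    return power_mw / data_rate_gbps


vcsel_eff = mw_per_gbps(total_power_mw(VCSEL_BREAKDOWN_MW), 18)
mqwm_eff = mw_per_gbps(total_power_mw(MQWM_BREAKDOWN_MW), 16)
```

The breakdowns sum to 80mW and 38mW, and the resulting efficiencies round to the 4.4mW/(Gb/s) and 2.4mW/(Gb/s) stated in the conclusion.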
