Abstract-This paper presents a compact low-power 4 × 10 Gb/s quad-driver module for vertical-cavity surface-emitting laser (VCSEL) arrays in a 65-nm CMOS technology. The side-by-side drivers can be directly wire bonded to the VCSEL diode array, supporting up to four channels. To increase the bandwidth of the driver, an internal feed-forward path is added for pole-zero cancelation, without increasing the power consumption. An edge-configurable pre-emphasis technique is adopted to achieve high bandwidth and minimize the asymmetry of the fall and rise times of the driver output current. Measurement results demonstrate an rms jitter of 0.68 ps for 10 Gb/s operation. Tests demonstrate negligible crosstalk between channels. Under irradiation, the modulation amplitude degrades less than 5% up to 300-Mrad ionizing dose. The area of the quad-driver array is 500 µm by 1000 µm, and the total power consumption for the entire driver array chip is 130 mW for typical current setting.
I. INTRODUCTION
THE rapid growth of communication systems and expanding internet use result in an increasing need for high-speed data links. Fiber-optic communication technology has become very attractive for short-reach applications because of its high bandwidth, low signal distortion, and low cost [1]. In high-energy physics experiments, optical data links are widely used for data communication between detectors and control rooms [2].
Two critical components in optical transmitter systems are the laser diode (LD) and its driver IC. Vertical-cavity surface-emitting lasers (VCSELs) are the most commonly used LD devices. VCSELs have the advantages of low power consumption, ease of assembly and optical alignment, and low cost [3].
Many industrial and academic VCSEL drivers over 10 Gb/s have been reported in dedicated RF technologies, such as SiGe technology [4], [5]. However, SiGe circuits are expensive and difficult to package. Compared to SiGe VCSEL drivers, CMOS VCSEL drivers are typically limited in bandwidth but have advantages in terms of integration, chip size, power consumption, and cost. It is thus desirable to implement the drivers in CMOS technologies. The drawback is that special circuit techniques are needed to improve the bandwidth. In the Versatile Link PLUS (VL+) project [6], [7], a CMOS VCSEL driver array can save area and ease the assembly compared to several single drivers connected to a VCSEL array.

This paper is organized as follows. Section II describes the structure of the laser driver array, as well as the detailed design of the single-channel laser driver. The design considerations and techniques are included in this section. Section III presents the implementation of the 4 × 10 Gb/s driver array. Sections III-A and III-B show the electrical and optical measurement results. Conclusions are drawn in Section IV.
II. DESIGN OF LDQ10P

A. Overall Structure of Driver Array
The block diagram of Laser Driver Quad 10GPlus (LDQ10P) is shown in Fig. 1. The driver array is composed of four laser drivers (channels) and an I2C interface. The I2C is employed to control the VCSEL bias current and modulation amplitude for each channel. The output pad of each channel (250-µm pitch) is directly bonded to the anode of a VCSEL in a common-cathode VCSEL array. The common cathodes of the VCSELs are wire bonded back to the application-specific integrated circuit (ASIC) ground. This ensures that the bond wires connecting the ASIC to the module ground carry a constant current, which reduces ground bounce and crosstalk among channels.

0018-9499 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
The block diagram of each channel of the LDQ10P is shown in Fig. 1. The core circuits are the limiting amplifier and the output driver. Two 7-b digital-to-analog converters (DACs) are used to make the VCSEL bias and modulation currents programmable. Most high-speed commercial VCSELs have a threshold current between 0.5 and 2 mA and require a modulation current between 2 and 10 mA [8], [9]. Therefore, each channel was designed to have laser bias and modulation currents ranging from 0 to 12 mA and 0 to 8 mA, respectively. As shown in Fig. 2, the "bias current" here refers to the maximum current (eye height) flowing into the VCSEL, and the "modulation current" refers to the ac amplitude (eye depth) of the current. Another 3-b DAC is used to provide a programmable pre-emphasis function for the output driver.

Fig. 3 shows a simplified schematic of the output driver. A power supply of 2.5 V is used for the output driver to accommodate the forward-bias voltage of the VCSEL devices (typically around 1.8 V) [10]. However, all transistors in the circuit are thin-oxide devices in order to reach the required bandwidth (10 Gb/s). Because the voltages across the terminals of these devices should not exceed 1.2 V (nominally), reliable operation requires the output driver transistors to be stacked in such a way that the supply voltage is split across several devices in series. This technique is known as the totem-pole HV protection technique [11].
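The programming ranges of the two 7-b DACs described above can be made concrete with a short Python sketch. It assumes an ideal linear DAC in which code 0 gives 0 mA and the maximum code (127) gives the full-scale current; the actual transfer characteristic of the LDQ10P DACs is not given in the text, and the function names `dac_current_ma` and `nearest_code` are hypothetical.

```python
# Minimal sketch, assuming an ideal linear 7-b current DAC.
# Full-scale values come from the text: 12 mA (bias) and 8 mA (modulation).

BIAS_FULL_SCALE_MA = 12.0
MOD_FULL_SCALE_MA = 8.0


def dac_current_ma(code: int, full_scale_ma: float, bits: int = 7) -> float:
    """Return the output current for a DAC code (assumed linear transfer)."""
    max_code = (1 << bits) - 1
    if not 0 <= code <= max_code:
        raise ValueError(f"code must be in 0..{max_code}")
    return full_scale_ma * code / max_code


def nearest_code(target_ma: float, full_scale_ma: float, bits: int = 7) -> int:
    """Return the code whose ideal output is closest to the target current."""
    max_code = (1 << bits) - 1
    code = round(target_ma * max_code / full_scale_ma)
    return min(max(code, 0), max_code)


# Step size (1 LSB) of each DAC under the linearity assumption.
bias_lsb_ma = dac_current_ma(1, BIAS_FULL_SCALE_MA)
mod_lsb_ma = dac_current_ma(1, MOD_FULL_SCALE_MA)

if __name__ == "__main__":
    print(f"bias LSB       : {bias_lsb_ma * 1000:.1f} uA")
    print(f"modulation LSB : {mod_lsb_ma * 1000:.1f} uA")
    print("code for 6 mA bias      :", nearest_code(6.0, BIAS_FULL_SCALE_MA))
    print("code for 4 mA modulation:", nearest_code(4.0, MOD_FULL_SCALE_MA))
```

Under these assumptions the resolution is a little under 0.1 mA per code for the bias current and finer for the modulation current, which is small compared with the 0.5 to 2 mA threshold range quoted for commercial VCSELs.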
B. Design of the Output Driver
The output signals of the limiting amplifier feed into the input pair M1 and M2 of the output driver, and fully switch them ON/OFF. Transistors M3 and M4 are stacked above the input transistors for HV protection. M6 copies the current from M7 with a multiplicative gain of 10 and provides the

The main signal path is the right branch of the output driver, consisting of M2, M4, M6, and the VCSEL. The transfer function of the output driver can be expressed in the following equation:
where C_dio is the parasitic capacitance of the VCSEL, C_d6 is the drain capacitance of M6, and R is the equivalent output resistance of the VCSEL. To reduce the parasitics on the output node, M4 is kept small (48 µm/60 nm). M6 must be large enough to provide sufficient drive current for the VCSEL and is sized at 300 µm/150 nm. Therefore, the bandwidth of the output driver is dominated by the large parasitic capacitance C_d6 of M6 and the diode capacitance C_dio. Compared with the main signal path on the right branch of the output driver, the signal path on the left branch, consisting of transistors M1, M3, and M5, has much less parasitic capacitance, at the level of 100 fF, owing to the small size of its transistors. To achieve high bandwidth, a frequency-compensation scheme is used that adds a feed-forward path between the drain of M3 and the gate of M6, as shown in Fig. 3 [12]. Transistors M1, M3, and M5, and resistor R1 act as the ac-boosting path. M5 and R1 work as an active inductive load to further boost the speed of the left branch of the output driver. The transfer function of the feed-forward path can be derived as
where G_m1, G_m6, and G_m5 are the transconductances of M1, M6, and M5, C_FF is the feed-forward capacitor, and C_gs5 and C_gs6 are the gate-to-source capacitances of M5 and M6. As can be seen from (2), the dominant pole of the feed-forward path is mainly determined by C_FF, C_gs6, and G_m5. In addition, the zero introduced by C_gs5 and R1 can further boost the speed of the feed-forward path. A large C_FF is desirable since it maximizes the dc gain. However, an excessively large C_FF should be avoided because it degrades the bandwidth. In general, the value of C_FF should be chosen close to that of C_gs6. In our design, C_gs6 is 40 fF and simulation shows that a C_FF of 60 fF gives the best performance.
C. Output Driver Pre-Emphasis
VCSELs display asymmetric fall/rise times, with the rise time being faster than the fall time, which might lead to intersymbol interference (ISI) [13]. A reconfigurable edge pre-emphasis technique is introduced to address this problem by enabling the pre-emphasis on the falling edge, the rising edge, or both edges. In a differential pair, a source degeneration RC network introduces a zero in the frequency response. The effective transconductance G_m can be derived as
where g_m1,2 is the transconductance of the input pair, R is the degeneration resistor, and C is the degeneration capacitor.
As can be seen from (3), the effective transconductance contains a zero at 1/RC and a pole at (1 + g_m1,2 R)/(RC). If the pole at the drain of M3/M4 is canceled by this zero, the effective bandwidth is increased by a factor of (1 + g_m1,2 R). Of course, this bandwidth increase is achieved at the cost of a proportional reduction of dc gain.

The edge pre-emphasis function is realized by sensing the rising and falling edges of the signal and controlling the switches of the source degeneration network, as shown in Fig. 4. For example, if the pre-emphasis is configured for the falling edge only, switch M6 is ON while switches M5 and M7 are OFF. With this configuration, switch M8 transitions from ON to OFF during the falling edge of V_IN+ (and hence of the output current), which makes the RC degeneration effective and generates the emphasis peak in the output current. On the other hand, the pre-emphasis is disabled at the rising edge of V_IN+ because the RC network is shorted by switch M8. Similarly, rising-edge-only pre-emphasis or both-edge pre-emphasis can be achieved by enabling switch M7 or both M6 and M7, respectively. The pre-emphasis function can be easily turned OFF by setting switches M5 and M8 ON so that the degeneration network is always disabled regardless of the input signal.
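For reference, a first-order expression that is consistent with the zero, the pole, and the gain-bandwidth trade-off stated above can be written as follows. This is a sketch only: it assumes that R and C denote the half-circuit degeneration values and it neglects device parasitics, so it may differ from the exact form of (3) by such conventions.

```latex
% Effective transconductance of an RC-degenerated input device (first-order sketch)
G_m(s) = \frac{g_{m1,2}\,\left(1 + sRC\right)}{1 + g_{m1,2}R + sRC},
\qquad
\omega_z = \frac{1}{RC},
\qquad
\omega_p = \frac{1 + g_{m1,2}R}{RC}.

% Limiting cases:
%   low frequency:  G_m(0) = g_{m1,2} / (1 + g_{m1,2}R)   (dc gain reduced by 1 + g_{m1,2}R)
%   high frequency: G_m(s) \to g_{m1,2}                   (degeneration bypassed by C)
\frac{\omega_p}{\omega_z} = 1 + g_{m1,2}R
```

In this form the ratio of pole to zero frequency equals the dc gain reduction, which is why placing the zero on top of the output pole trades dc gain for bandwidth in the same proportion.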
In order to tune the zero location and change the pre-emphasis amplitude, a programmable capacitor array is implemented as shown in Fig. 4(c). Thin-oxide nMOS transistors are used as switches because they offer both a low ON-resistance and a small parasitic capacitance.
D. Limiting Amplifier
As the first stage of the laser driver, the limiting amplifier has to satisfy several requirements. First, it should amplify the input signal to the levels required to fully switch ON/OFF the VCSEL modulation current. This improves the power efficiency of the output driver. Second, the limiting amplifier stage should have a bandwidth compatible with ISI-free data transmission at 10 Gb/s. In addition, its layout should respect the 250-µm pitch imposed by the VCSEL array.
In the design of the LDQ10P limiting amplifier, a voltage gain of more than 6 dB is required to provide a differential swing of over 800 mVpp to the output driver from a 200-mVpp single-ended swing (that is, 400 mVpp differential, so a factor of two) at the limiting amplifier input. Therefore, a two-stage limiting amplifier with inductor peaking is adopted to reach this gain requirement and achieve high bandwidth. In a traditional two-stage limiting amplifier topology shown in Fig. 5 [13], two inductors are typically used in each stage. To minimize the area of the limiting amplifier, we employed an architecture where a single inductor is shared by the first and second stages [14], as shown in Fig. 6. The simulation results show that a gain of 13.3 dB and a bandwidth of 14.8 GHz are achieved. The power consumption of the limiting amplifier is 12 mW at a power supply of 1.2 V.

III. CHIP IMPLEMENTATION AND MEASUREMENT RESULT

Fig. 7 shows the die photograph of LDQ10P. The chip is implemented in a nine-metal 65-nm CMOS technology. The core driver circuits are placed close to the output pads to reduce parasitics. The four VCSEL drivers occupy an area of 500 µm × 1000 µm. The high-speed signals are routed with the thick top metal layers to reduce the parasitic resistance. At 10 Gb/s, each of the four channels in the LDQ10P consumes 32 mW under typical settings (4-mA modulation current and 6-mA bias current) and the total power consumption is 130 mW including the digital control circuitry.
A. Electrical Measurement Results
For electrical characterization, an Agilent JBERT N4903B was used for high-speed pseudorandom binary sequence (PRBS7) data generation and an Agilent DSA91204A oscilloscope was used for output waveform measurements. The jitter performance of the LDQ10P was measured with PRBS7 input data at a bit error rate (BER) of 10^-12. The PRBS7 test pattern length is 127 bits. Fig. 8 shows the electrical eye diagram at 10 Gb/s with 4-mA modulation current and 6-mA bias current. The total jitter TJ(BER) is defined as the amount of eye closure at a given BER, which is the width of the eye minus the eye opening; RJ, DDJ, and PJ are defined as random jitter, data-dependent jitter, and periodic jitter, respectively [15]. The total jitter TJ(BER = 10^-12) in the electrical eye diagram is 15.47 ps with an rms random jitter component of 0.68 ps.
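The stated PRBS7 pattern length of 127 bits follows from a 7-stage maximal-length shift register (2^7 - 1 states). The Python sketch below illustrates this with a software generator. The text gives only the pattern length, so the generator polynomial x^7 + x^6 + 1 used here is an assumption (it is the commonly used PRBS7 polynomial), and the function name `prbs7` and the seed value are hypothetical.

```python
# Minimal sketch of a PRBS7 generator, assuming the polynomial x^7 + x^6 + 1.
from itertools import islice

PRBS7_LENGTH = 2**7 - 1          # 127 bits per pattern repetition
BIT_RATE_GBPS = 10.0             # line rate used in the measurements
UNIT_INTERVAL_PS = 1000.0 / BIT_RATE_GBPS  # one bit period in picoseconds


def prbs7(seed: int = 0x7F):
    """Yield PRBS7 bits forever from a 7-bit linear-feedback shift register."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero (all-zero state never changes)")
    while True:
        # Feedback bit is the XOR of the two most significant register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


# Two consecutive repetitions of the pattern, used to check the period.
two_periods = list(islice(prbs7(), 2 * PRBS7_LENGTH))
one_period = two_periods[:PRBS7_LENGTH]

# Time taken by one pattern repetition at the line rate, in nanoseconds.
pattern_duration_ns = PRBS7_LENGTH * UNIT_INTERVAL_PS / 1000.0

if __name__ == "__main__":
    print("pattern length :", PRBS7_LENGTH, "bits")
    print("ones per period:", sum(one_period))
    print("repeats exactly:", one_period == two_periods[PRBS7_LENGTH:])
    print(f"pattern duration at 10 Gb/s: {pattern_duration_ns:.1f} ns")
```

At 10 Gb/s the unit interval is 100 ps, so one repetition of the pattern lasts 12.7 ns.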
B. Optical Measurement Results
The optical test board of the LDQ10P is assembled with a Philips Photonics ULM850-25-TT-N01xxU [16] VCSEL array. The eye diagram at 10 Gb/s is shown in Fig. 9, where the total jitter TJ(BER = 10^-12) is 17.97 ps and the rms random jitter is 0.84 ps, demonstrating the good performance of the LDQ10P.
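The relationship between the reported total and random jitter can be explored with the widely used dual-Dirac model, in which TJ(BER) = DJ + 2·Q(BER)·RJ_rms. The Python sketch below applies it to the electrical and optical figures quoted above. The text does not state how the instrument decomposes the jitter, so the use of the dual-Dirac model here is an assumption, and the function names are hypothetical.

```python
# Minimal sketch, assuming the dual-Dirac jitter model:
#   TJ(BER) = DJ + 2 * Q(BER) * RJ_rms
from statistics import NormalDist

UNIT_INTERVAL_PS = 100.0  # bit period at 10 Gb/s


def q_factor(ber: float) -> float:
    """Gaussian quantile whose one-sided tail probability equals the BER."""
    return -NormalDist().inv_cdf(ber)


def dual_dirac_dj_ps(tj_ps: float, rj_rms_ps: float, ber: float = 1e-12) -> float:
    """Deterministic jitter implied by the model for a given TJ and rms RJ."""
    return tj_ps - 2.0 * q_factor(ber) * rj_rms_ps


def eye_opening_ps(tj_ps: float, ui_ps: float = UNIT_INTERVAL_PS) -> float:
    """Horizontal eye opening: the unit interval minus the total jitter."""
    return ui_ps - tj_ps


# Values reported in the text (TJ at BER = 1e-12, rms random jitter).
dj_electrical_ps = dual_dirac_dj_ps(15.47, 0.68)
dj_optical_ps = dual_dirac_dj_ps(17.97, 0.84)

if __name__ == "__main__":
    print(f"Q at BER 1e-12      : {q_factor(1e-12):.3f}")
    print(f"implied DJ, electrical: {dj_electrical_ps:.2f} ps")
    print(f"implied DJ, optical   : {dj_optical_ps:.2f} ps")
    print(f"eye opening, electrical: {eye_opening_ps(15.47):.2f} ps")
    print(f"eye opening, optical   : {eye_opening_ps(17.97):.2f} ps")
```

Under this model the random component accounts for more than half of the total jitter in both cases, and the horizontal eye opening remains above 80 ps of the 100-ps unit interval.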
C. Crosstalk Measurement Results
Crosstalk among channels may increase the jitter and corrupt the eye diagram. The worst-case scenario for the channel under test occurs when the other three channels all transmit the same data sequence, and that sequence differs from the one transmitted by the channel under test. In our design, a power/ground mesh with a large amount of on-chip decoupling capacitance was used to minimize the crosstalk effects. Table I compares the performance with only one channel turned ON (no crosstalk) against the worst-case scenario with all four channels turned ON (with crosstalk). As evident from the results, the total jitter increases by less than 2 ps in the worst-case scenario compared to when only a single channel is running. This demonstrates that the crosstalk has negligible effects on the driver performance.
D. Pre-Emphasis Measurement Results
Frequency-domain edge pre-emphasis is employed in the LDQ10P to accommodate different parasitics and compensate for the VCSEL turn-ON delays. Table II compares the measured performance at 10 Gb/s with and without pre-emphasis. With pre-emphasis, more frequency peaking is introduced to sharpen the rising and/or falling edge. The total jitter is improved by 2.4 ps with pre-emphasis compared to the case without pre-emphasis.
E. Irradiation Measurement Results
The total ionizing dose (TID) characterization of the Taiwan Semiconductor Manufacturing Company 65-nm CMOS technology is reported in [17]. After exposure to a radiation dose of 200 Mrad, a significant loss of drain-source current is observed in pMOS transistors, and a large leakage current is observed for transistors smaller than 360 nm. Therefore, to mitigate TID effects in our design, pMOS transistors are avoided whenever possible and all gain stages in the driver are implemented using nMOS transistors. In addition, all core transistors have a size larger than 360 nm to limit leakage current. In this way, the TID effect on the circuits is minimized.
The chips were irradiated with X-rays for three days while powered ON, up to a dose of 300 Mrad at a dose rate of 9 Mrad/h. Plots of the total jitter and modulation depth versus TID are displayed in Fig. 10. Table III shows the effects of radiation on the jitter, the bias current, and the modulation current as percentage variations. The total jitter remains less than 20 ps for radiation doses up to 300 Mrad. Since the total jitter target specification is 30 ps and the modulation/bias current variation is well within the range of normal VCSEL operation, the measurement results show that our design is robust to ionizing radiation.
F. Comparison With Previous Works
A performance summary along with a comparison with previous works is shown in Table IV. As can be seen, this work achieves a power efficiency of 3.25 pJ/b, which is comparable to the state-of-the-art designs.
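The quoted power efficiency can be checked directly from the figures reported earlier in the paper: 130 mW total power, four channels, and 10 Gb/s per channel. The Python sketch below performs this arithmetic; the function name `energy_per_bit_pj` is hypothetical, and the "digital overhead" is simply the difference between the total and four times the per-channel power, not a separately reported measurement.

```python
# Minimal sketch: energy per bit from the reported power and data rate.
# Note that 1 mW / (1 Gb/s) = 1 pJ/b, so no unit conversion factor is needed.

TOTAL_POWER_MW = 130.0      # whole chip, typical settings
CHANNEL_POWER_MW = 32.0     # one channel, typical settings
CHANNELS = 4
RATE_GBPS = 10.0            # per channel


def energy_per_bit_pj(power_mw: float, aggregate_rate_gbps: float) -> float:
    """Energy per transmitted bit in pJ/b for a given power and data rate."""
    return power_mw / aggregate_rate_gbps


chip_efficiency_pj = energy_per_bit_pj(TOTAL_POWER_MW, CHANNELS * RATE_GBPS)
channel_efficiency_pj = energy_per_bit_pj(CHANNEL_POWER_MW, RATE_GBPS)

# Remainder attributed here to shared circuitry such as the digital control.
shared_power_mw = TOTAL_POWER_MW - CHANNELS * CHANNEL_POWER_MW

if __name__ == "__main__":
    print(f"chip efficiency   : {chip_efficiency_pj:.2f} pJ/b")
    print(f"channel efficiency: {channel_efficiency_pj:.2f} pJ/b")
    print(f"shared power      : {shared_power_mw:.1f} mW")
```

The chip-level result reproduces the 3.25 pJ/b figure, and the per-channel value is slightly lower because it excludes the shared circuitry.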
IV. CONCLUSION
A compact low-power 4 × 10 Gb/s laser driver array was implemented in 65-nm CMOS technology. A shared-inductor technique was used in the limiting amplifier to minimize the silicon area. A feed-forward path with inductive peaking was added for frequency compensation to broaden the bandwidth of the laser driver. An edge-configurable pre-emphasis technique was adopted to achieve high bandwidth. The 4 × 10 Gb/s laser driver array achieves a low power consumption of 130 mW. The four-channel VCSEL driver array occupies an area of 500 µm × 1000 µm.
