ABSTRACT This paper describes a voltage-mode transmitter that stacks independent non-return-to-zero (NRZ) and four-level pulse amplitude modulation (PAM4) drivers for dual channels. The stacked structure improves the transmitter's power efficiency through current recycling. The PAM4 voltage-mode driver, implemented to increase the data rate, uses voltage regulators to generate multilevel signals. The proposed dynamic load control inside the PAM4 driver minimizes the data-pattern-dependent voltage variation at the regulator outputs. A prototype chip was fabricated using a 28-nm CMOS process. The stacked 7-Gb/s NRZ and 14-Gb/s PAM4 transmitters achieve an aggregate data rate of 21 Gb/s while consuming 12.6 mW from a 1-V power supply. When 500-mVpp signals are transmitted through 2-cm FR4 channels, the eye heights obtained in the NRZ and PAM4 channels are 312 and 92 mV, respectively.
I. INTRODUCTION
As the demand for high-performance mobile devices increases, the performance of their processors and memory chips, as well as the communication speed between chips, also increases. As a result, the traditional, widely used low-speed parallel interfaces have been replaced by high-speed serial interfaces [1]. In addition, studies are underway to reduce the power consumption of short-channel interface circuits in mobile devices, in which battery life is important.
Among the available driver circuits for transmitters, voltage-mode drivers have a relatively low static current compared with current-mode drivers and are therefore predominantly used in low-power mobile applications [2].
In processor-memory interfaces such as DDR and LPDDR, a dedicated supply voltage (VDDQ) is used for the driver in order to reduce power consumption. On the other hand, in mobile interfaces such as Mobile Industry Processor Interface (MIPI) D-PHY and M-PHY, the low-swing signaling level is specified without an additional power definition, so on-chip linear regulators are often employed as power supplies for low-swing near-ground voltage-mode drivers. However, as described in Fig. 1, low-swing near-ground voltage-mode drivers using an on-chip linear regulator suffer from power loss in the pass transistor of the regulator [3]. In particular, high-speed interfaces for CMOS image sensor (CIS) systems demand high power efficiency to extend battery life and to avoid image-quality degradation due to local heating on the sensor side [4]. To solve this problem, stacked non-return-to-zero (NRZ) drivers [4], [5] with the current-recycling scheme shown in Fig. 2 have been introduced.
On the other hand, studies on four-level pulse amplitude modulation (PAM4) have been actively conducted to increase data transmission speeds without increasing the clock speed or the number of required I/O pins. To support PAM4 transmissions, researchers have used voltage-mode transmitters [6], [7], current-mode transmitters [8]-[11], and a hybrid transmitter [12] that combines a voltage-mode driver with an auxiliary current injection driver to increase the signal swing. In general, voltage-mode transmitters are suitable for low-power applications. However, as it becomes necessary to increase the number of driver segmentations to generate multiple levels for PAM4 transmissions, the additional required data paths and pre-drivers associated with the multiple segmentation result in significant power consumption [7]. Furthermore, the voltage regulators used for generating multiple levels have output noise caused by the variation of the data patterns, which considerably degrades the voltage margins [6].
This paper presents a power-efficient 21-Gb/s dual-channel voltage-mode transmitter in which a full-rate PAM4 driver and a half-rate NRZ driver are stacked. By using the phase information of the data recovered via clock data recovery (CDR) allocated to the NRZ channel, we were able to reduce the complexity of the PAM4 receiver circuit while increasing data transmission speed. The proposed PAM4 driver dynamically adjusts the current load of the regulator according to the data pattern, which can reduce the voltage noise of the driver output signal while minimizing the additional current consumption. In Section II, the proposed dual-channel transceiver architecture is described. A detailed description of the transmitter circuit is given in Section III. The measurement results of a prototype chip fabricated using a 28-nm CMOS process are presented in Section IV, followed by our conclusions in Section V.
II. PROPOSED ARCHITECTURE
Fig. 3 shows the overall structure of the proposed dual-channel transceiver, including the stacked NRZ and PAM4 drivers (SNPD). The output of the PLL on the transmitter side is used as a common clock for the generation of NRZ and PAM4 high-speed data. A one-hot encoder is inserted in the PAM4 data path to generate the input signals for the PAM4 driver. The static current of the NRZ driver is recycled by the PAM4 driver; therefore, the SNPD wastes no current. The NRZ and PAM4 channels can be driven with a current of
as shown in Fig. 1. R_TX is the output impedance of the driver and R_RX is the differential termination resistance of the receiver.
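As a rough numerical illustration of the per-channel static current, the following sketch assumes the common voltage-mode relation in which the driver supply voltage drops across two driver impedances in series with the receiver termination, I = V / (2·R_TX + R_RX). This relation, the 50-Ω and 100-Ω values, and the function name are assumptions made for illustration; they are not taken from the paper's own expression. They are, however, consistent with the statement in Section III that raising the PAM4 supply by 100 mV increases the static current by 0.5 mA.

```python
def static_current_ma(v_supply_v, r_tx_ohm=50.0, r_rx_ohm=100.0):
    """Static current in mA of a differential voltage-mode driver.

    Assumed model: the supply voltage drops across two driver output
    impedances in series with the receiver's differential termination.
    """
    return 1e3 * v_supply_v / (2.0 * r_tx_ohm + r_rx_ohm)


VDD = 1.0        # supply voltage in volts (from the paper)
VM = VDD / 2.0   # mid-rail node shared by the stacked drivers (default mode)

i_nrz_ma = static_current_ma(VDD - VM)   # NRZ driver sits between VDD and VM
i_pam4_ma = static_current_ma(VM)        # PAM4 high-swing driver sits between VM and ground

print(f"NRZ static current:  {i_nrz_ma:.2f} mA")
print(f"PAM4 static current: {i_pam4_ma:.2f} mA")
```

Under these assumptions both drivers draw the same current in the default mode, which is the condition that lets the PAM4 driver reuse the NRZ driver's current.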
As shown in Fig. 3, the PAM4 driver consists of a high-swing driver (HSD) and a low-swing driver (LSD). D10 and D00 are the top and bottom signal levels generated by the HSD. The two voltage levels of the LSD, D01 and D11, are generated by the source regulator, VREG1, and the sink regulator, VREG2. When the HSD is continuously activated and the signal level stays at D00 or D10, the current paths through the pass transistors of VREG1 and VREG2 disappear because the LSD is deactivated. In this case, VREG1 and VREG2 cannot properly regulate their output levels; the output of VREG1 slowly rises whereas the output of VREG2 drops. That is, the regulator output levels change depending on the data pattern, which causes voltage noise in the signal and thereby reduces the voltage margin [6]. The I_LOAD control circuits (S1, S2), which can dynamically form auxiliary current paths, were added to minimize this voltage noise.
On the receiver side, conventional PAM4 CDR circuits suffer from instantaneous perturbations or loop instability owing to the wide distribution of PAM4 signal edges. Overcoming these drawbacks increases the circuit's complexity and power consumption [10], [13]. Reference [4] introduced an NRZ+NRZ transceiver with a full CDR circuit for one channel and a simple skew compensation circuit for the other channel. The receiver is not implemented in this prototype chip, but we propose a similar scheme: a full CDR for the NRZ channel and skew compensation for the PAM4 channel. In the proposed transceiver structure, the clock recovered by the CDR circuit of the NRZ channel [14] is shared with the PAM4 channel. The PAM4 receiver can recover data using the frequency and phase information of the recovered NRZ channel clock and needs to compensate only the physical skew between the NRZ and PAM4 channels. The optimal sampling point of the PAM4 channel can be found by checking the bit error rate (BER) while sweeping the phase of the sampling clock.
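The phase sweep mentioned above can be sketched as follows. This is only an illustration of one plausible selection rule (take the center of the widest run of phase settings that meet the BER target); the receiver is not implemented in the prototype, and the function name, the BER target argument, and the sample data are hypothetical.

```python
def best_sampling_phase(ber_by_phase, ber_target=1e-12):
    """Return the index at the center of the widest contiguous run of
    phase settings whose measured BER meets ber_target.

    Returns None if no phase setting meets the target. The sweep is
    treated as linear, so a passing window that wraps around the end
    of the list is not merged.
    """
    best_start, best_len = None, 0
    run_start = None
    for index, ber in enumerate(ber_by_phase):
        if ber <= ber_target:
            if run_start is None:
                run_start = index
            run_len = index - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


# Hypothetical BER readings for eight phase steps across one unit interval.
sweep = [1e-3, 1e-6, 1e-13, 1e-14, 1e-13, 1e-5, 1e-2, 1e-1]
chosen = best_sampling_phase(sweep)
print("chosen phase index:", chosen)
```

Centering the sampling clock in the passing window leaves equal timing margin on both sides, which is why the window center is used here rather than the single lowest-BER point.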
III. CIRCUIT IMPLEMENTATION
The overall transmitter for verifying the proposed SNPD structure consists of a clock generation circuit, a logic circuit for digital control, a data path for the NRZ channel, and a data path for the PAM4 channel. The high-frequency clock required for high-speed data generation can be provided by an internal PLL or an external clock source. The target frequency of the high-speed clock is 3.5 GHz, and the data paths of the NRZ and PAM4 channels have a half-rate structure. Digital controls, such as enabling or disabling each circuit and adjusting the driver impedance, are externally accessible through the inter-integrated circuit (I2C) protocol, and the related digital circuits were implemented using place-and-route. The NRZ data path consists of a two-bit pseudo-random binary sequence (PRBS) generator, a 2:1 serializer, a pre-driver, and an NRZ driver. The PAM4 data path consists of a four-bit PRBS generator, a 4:2 serializer, a pre-driver with one-hot encoder logic, and a PAM4 driver.
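The relation between the 3.5-GHz clock and the quoted data rates can be checked with a few lines of arithmetic. The sketch assumes the usual meaning of a half-rate data path, namely that one symbol is launched on each clock edge; the variable names are illustrative.

```python
CLOCK_HZ = 3.5e9           # target high-speed clock frequency (from the paper)
SYMBOLS_PER_CYCLE = 2      # assumed: half-rate path, one symbol per clock edge

symbol_rate = CLOCK_HZ * SYMBOLS_PER_CYCLE   # symbols per second, per channel

nrz_bit_rate = symbol_rate * 1    # NRZ carries 1 bit per symbol
pam4_bit_rate = symbol_rate * 2   # PAM4 carries 2 bits per symbol
aggregate_bit_rate = nrz_bit_rate + pam4_bit_rate

print(f"NRZ:       {nrz_bit_rate / 1e9:.0f} Gb/s")
print(f"PAM4:      {pam4_bit_rate / 1e9:.0f} Gb/s")
print(f"Aggregate: {aggregate_bit_rate / 1e9:.0f} Gb/s")
```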
As shown in Fig. 4, the SNPD consists of an NRZ driver, a PAM4 driver, and three voltage regulators. The design specifications are summarized in Table 1. The NRZ driver was designed with PMOS transistors because its output voltage is higher than half of VDD. On the other hand, the PAM4 driver was designed with NMOS transistors because its output voltage is lower than half of VDD. The output impedance of the NRZ and PAM4 drivers can be set close to 50 Ω using digitally adjustable resistors, as shown in Fig. 5(a). The simulation results presented in Fig. 5(b) show that the tuning range of the four-bit controllable driver covers the full range of process variation and that the impedance step is approximately 2 Ω.
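The impedance setting can be thought of as choosing, from the 16 available codes, the one whose output impedance lies closest to the 50-Ω target. The sketch below illustrates that selection only. The linear impedance table (66 Ω at code 0, falling by 2 Ω per step) is a hypothetical stand-in for the simulated characteristic of Fig. 5(b), and in the prototype the code was set manually over I2C rather than by any such routine.

```python
def closest_code(impedance_by_code, target_ohm=50.0):
    """Return the control code whose impedance is nearest to target_ohm."""
    return min(
        range(len(impedance_by_code)),
        key=lambda code: abs(impedance_by_code[code] - target_ohm),
    )


# Hypothetical four-bit characteristic: 66 ohm at code 0, about 2 ohm per step.
nominal_table = [66.0 - 2.0 * code for code in range(16)]

# Hypothetical slow process corner: every resistance 10 % higher.
slow_table = [1.1 * r for r in nominal_table]

nominal_code = closest_code(nominal_table)
slow_code = closest_code(slow_table)
print("nominal corner code:", nominal_code)
print("slow corner code:   ", slow_code)
```

With a step of about 2 Ω, the residual error after choosing the nearest code is at most about 1 Ω as long as the target lies inside the tuning range.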
The PAM4 driver is divided into the HSD and the LSD, which do not operate simultaneously. The HSD generates the signal levels D10 and D00 between half of VDD and ground, and the LSD generates the signal levels D01 and D11 from the outputs of VREG1 and VREG2. This PAM4 driver can easily be converted to an NRZ driver by enabling only the HSD. However, this requires reconfiguration of the data path to support dual-mode operation, and the dangling LSD still adds loading to the NRZ driver. Hence, we demonstrate only NRZ/PAM4 stacked drivers in this work.
Table 2 summarizes the operation of the SNPD according to the input data. The input data of the SNPD is a one-hot code (H[3:0]) in which only one signal is high. When the HSD is activated, the static current of the HSD is equal to the static current of the NRZ driver (I_HSD = I_NRZ), and S1, which generates the current path of VREG1 and VREG2, is turned on. The current added by S1 is equal to I_ADD, supplied by VREG0. The total static current at this point is
However, owing to the small output signal swing, the static current of the LSD is one third of the static current of the HSD or the NRZ driver. When the LSD is activated, S2 generates the current,
corresponding to the static current difference between I_HSD (= I_NRZ) and I_LSD, so that the total static current remains constant, as shown in the following equation:
VREG0 also compensates for any dynamic current mismatch between the NRZ and PAM4 drivers and minimizes the voltage noise of VM. Fig. 6 shows the simulated voltage noise of the regulator output for various I_ADD values and numbers of consecutive HSD symbols. As indicated in the 14-Gb/s PAM4 eye diagrams of Fig. 6, when I_ADD is changed from 0.1 mA to 0.8 mA, the voltage noise is reduced from 37 mV to 22 mV. Adjusting the value of I_ADD according to the data pattern on a per-symbol basis can reduce the voltage noise while minimizing power consumption. Increasing the output capacitance of the regulators is an alternative way to reduce the output noise of the regulators and the output voltage noise of the PAM4 driver, but it has the disadvantage of requiring additional area.
The skew between the I_LOAD control signals and the drivers' inputs (H[3:0]) can cause voltage noise due to current mismatch. In this prototype chip, the skew does not exceed 10 ps and the resulting output voltage noise of the voltage regulator is less than 1 mV, which is negligible.
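The static-current bookkeeping described for the HSD and LSD modes can be sketched as below. Several points are assumptions made for illustration and are not confirmed by the text: the 2.5-mA NRZ current, S1 being off while the LSD is active, and the S2 current being chosen as I_HSD - I_LSD + I_ADD so that the total matches the HSD case. The function and dictionary key names are hypothetical.

```python
HSD_LEVELS = {"D00", "D10"}   # bottom and top levels, driven by the HSD
LSD_LEVELS = {"D01", "D11"}   # inner levels, driven by the LSD


def snpd_static_currents(level, i_nrz_ma=2.5, i_add_ma=0.2):
    """Return the assumed static currents (mA) on the PAM4 side for one symbol.

    i_nrz_ma is a hypothetical NRZ static current; i_add_ma is the
    auxiliary regulator load current I_ADD.
    """
    i_hsd = i_nrz_ma          # stated in the text: I_HSD = I_NRZ
    i_lsd = i_hsd / 3.0       # stated in the text: one third of I_HSD
    if level in HSD_LEVELS:
        # S1 keeps a current path through VREG1 and VREG2 while the LSD is idle.
        return {"driver": i_hsd, "S1": i_add_ma, "S2": 0.0}
    if level in LSD_LEVELS:
        # Assumption: S2 makes up the difference so the total stays constant.
        return {"driver": i_lsd, "S1": 0.0, "S2": i_hsd - i_lsd + i_add_ma}
    raise ValueError(f"unknown PAM4 level: {level}")


for level in ("D00", "D01", "D11", "D10"):
    currents = snpd_static_currents(level)
    print(level, currents, "total =", round(sum(currents.values()), 3))
```

Under these assumptions the total is the same for all four levels, which is the property the dynamic load control is meant to provide: the regulators always see a current path, and the current drawn through the stack does not depend on the data pattern.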
Even though VM is set to half of VDD in the default mode, we can raise VM by adjusting VREF and thereby improve the voltage margin of the PAM4 driver at the expense of the voltage swing of the NRZ driver. For example, when VREF is changed to 200 mV, VM is set to 600 mV, and the voltage swings of the NRZ and PAM4 drivers become 400 mVp-p and 600 mVp-p, respectively. In this case, the total static current increases by 0.5 mA, and the current mismatch between the NRZ and PAM4 drivers is compensated by VREG0. Although the PAM4 eye height can be improved by increasing VREF, a VREF of 166 mV is the optimum point in terms of current consumption because the matched swings of the NRZ and PAM4 drivers minimize the additional current consumption of the regulator. In this work, VREF is set to 166 mV by default because the resultant eye openings of the PAM4 channel are good enough to meet the target BER and we focused on improving the power efficiency.
VOLUME 6, 2018
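The VREF trade-off can be reproduced numerically with a short sketch. It assumes VM = 3 × VREF, a relation inferred from the two operating points quoted in the text (166 mV gives roughly 500 mV, 200 mV gives 600 mV), and a 200-Ω current loop made of two 50-Ω driver impedances and a 100-Ω termination. Both are assumptions, as are the function and variable names.

```python
LOOP_RESISTANCE_OHM = 200.0   # assumed: 2 x 50-ohm driver + 100-ohm termination


def operating_point(vref_mv, vdd_mv=1000.0):
    """Return (VM, NRZ swing, PAM4 swing) in mV and PAM4 static current in mA."""
    vm_mv = 3.0 * vref_mv                  # assumed relation VM = 3 x VREF
    nrz_swing_mv = vdd_mv - vm_mv          # NRZ driver sits between VDD and VM
    pam4_swing_mv = vm_mv                  # PAM4 driver sits between VM and ground
    pam4_current_ma = pam4_swing_mv / LOOP_RESISTANCE_OHM   # mV / ohm = mA
    return vm_mv, nrz_swing_mv, pam4_swing_mv, pam4_current_ma


default = operating_point(500.0 / 3.0)   # nominal "166 mV" setting
raised = operating_point(200.0)
extra_current_ma = raised[3] - default[3]

print("default:", [round(x, 2) for x in default])
print("raised: ", [round(x, 2) for x in raised])
print("extra static current (mA):", round(extra_current_ma, 2))
```

With these assumptions the raised setting gives 400-mV and 600-mV swings and 0.5 mA of extra static current, matching the figures quoted in the text.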
IV. EXPERIMENTAL RESULTS
A dual-channel transmitter with the proposed SNPD was fabricated using a 28-nm low-power CMOS process. The total area of the transmitter, including the power decoupling capacitor, was 0.13 mm², as shown in the micrograph presented in Fig. 7. The total test chip area, including the PLL, I2C, and I/Os, was 3.56 mm². The test chip was mounted on a printed circuit board (PCB) via a chip-on-board (COB) method. The wire bond length of the COB is approximately 0.2 cm, and the FR4 PCB trace after the SMA connector is approximately 2 cm. Fig. 8 shows the test board. A power analysis of the transmitter is shown in Table 3. The SNPD consumed 3.11 mW from a 1-V power supply. The power of the transmitter, including the data path and clock buffer, was 12.6 mW.
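The measured power figures translate into energy per bit by dividing power by the aggregate data rate. The helper below is a minimal illustration of that arithmetic; its name is hypothetical.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ: mW divided by Gb/s gives pJ/bit."""
    return power_mw / rate_gbps


AGGREGATE_RATE_GBPS = 21.0   # 7-Gb/s NRZ + 14-Gb/s PAM4

transmitter_pj = energy_per_bit_pj(12.6, AGGREGATE_RATE_GBPS)   # whole transmitter
snpd_only_pj = energy_per_bit_pj(3.11, AGGREGATE_RATE_GBPS)     # stacked drivers alone

print(f"transmitter: {transmitter_pj:.2f} pJ/bit")
print(f"SNPD only:   {snpd_only_pj:.3f} pJ/bit")
```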
The signal waveforms for the 7-Gb/s NRZ and 14-Gb/s PAM4 driver outputs were measured with a Tektronix DSA71254C instrument and P7313SMA differential probes. The termination voltages for the NRZ and PAM4 channels are 750 mV and 250 mV, respectively, set by the P7313SMA [18]. The driver impedance was manually controlled via the I2C interface. As shown in Fig. 9(a), when the NRZ driver and the PAM4 driver were operating simultaneously, the NRZ and PAM4 signals did not interfere with each other. Even when we disabled the NRZ driver, there was no noticeable change in the output eye diagram of the PAM4 driver. The BER of the transmitter was checked with a Keysight J-BERT N4903B. To measure the BER of the PAM4 signals, we set the three threshold voltage levels for the error check patterns of the equipment. At a BER of less than 10⁻¹², the NRZ and PAM4 channels achieved timing margins of 0.74 UI and 0.31 UI, respectively.
For the PRBS7 pattern, the maximum run length of the HSD was only five. When the waveform was measured with an I_ADD of 0.2 mA, the VREG noise was less than 5 mV, which is in good agreement with the simulation results shown in Fig. 6. Under these test conditions, we achieved a vertical eye opening of 312 mV for the NRZ driver, as shown in Fig. 9(b). For the PAM4 driver, a vertical eye opening of 92 mV and a horizontal eye opening of 0.54 UI were realized, and the three eyes were well balanced, as shown in Fig. 9(c). The high-speed clock was provided by an external source. The measured eye diagram yielded a level separation mismatch ratio (R_LM) of 0.94. Table 4 compares the performance of the proposed output driver with that of prior designs. The SNPD and pre-drivers achieved an energy efficiency of 0.26 pJ/bit with a signal swing of 500 mVp-p.
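For readers unfamiliar with R_LM, the sketch below computes it from four measured PAM4 levels using the definition given in the IEEE 802.3 PAM4 transmitter clauses. The paper does not state which definition was applied to obtain 0.94, so this choice is an assumption, and the example voltages are hypothetical rather than measured values.

```python
def level_separation_mismatch_ratio(v0, v1, v2, v3):
    """R_LM of four PAM4 levels, ordered from lowest (v0) to highest (v3).

    Uses the IEEE 802.3 style definition: the inner levels are compared
    against their ideal positions at one third of the outer amplitude.
    """
    v_mid = (v0 + v3) / 2.0
    es1 = (v1 - v_mid) / (v0 - v_mid)   # ideal value is 1/3
    es2 = (v2 - v_mid) / (v3 - v_mid)   # ideal value is 1/3
    return min(3.0 * es1, 3.0 * es2, 2.0 - 3.0 * es1, 2.0 - 3.0 * es2)


ideal = level_separation_mismatch_ratio(-3.0, -1.0, 1.0, 3.0)
# Hypothetical differential levels in mV with slightly unequal spacing.
skewed = level_separation_mismatch_ratio(-250.0, -80.0, 85.0, 250.0)

print(f"ideal levels:  R_LM = {ideal:.2f}")
print(f"skewed levels: R_LM = {skewed:.2f}")
```

A value of 1 corresponds to three equally tall eyes; values below 1 indicate that at least one eye is compressed relative to the others.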
V. CONCLUSION
A dual-channel 21-Gb/s voltage-mode transmitter with a current-recycling output driver was designed and implemented using a 28-nm CMOS process. The NRZ and PAM4 drivers were stacked to increase the power efficiency on the transmitter side and to reduce the complexity of the PAM4 clock recovery circuitry on the receiver side. The voltage noise was reduced by dynamically controlling the current load of the PAM4 voltage-mode driver. The output driver and the entire transmitter exhibited energy efficiencies of 0.26 and 0.6 pJ/bit, respectively.
