Abstract: Double-edged pulsewidth modulation (DPWM) is less sensitive to frequency-dependent losses in electrical chip-to-chip interconnects. However, the DPWM scheme instantaneously transmits information at a different rate than a synchronous source. This paper presents an 8-/9-bit line-coding scheme that compensates for the timing skew between the DPWM and synchronous clock domains while limiting the buffer size required in the transmitter and receiver. Furthermore, preemphasis is introduced and analyzed as a means to improve the signal integrity of a DPWM signal. A multiphase, time-interleaved receiver architecture using a sense amplifier is presented for high-speed data recovery. The DPWM transceiver is implemented in a 45-nm CMOS silicon-on-insulator (SOI) process, operates at 10 Gbit/s with a 10^−12 bit error rate, and consumes 96 mW. The 8-/9-bit coding hardware consumes 1.5 mW at 10 Gbit/s, demonstrating a low power overhead.
I. INTRODUCTION

Demands on serial communication have required circuit techniques that optimize spectral efficiency for high data rates [1]-[5]. While the conventional nonreturn-to-zero (NRZ) signal [2-level pulse-amplitude modulation (PAM)] offers a low-complexity format, its broadband power spectrum incurs significant power loss in channels with severe nulls in the frequency response, e.g., multidrop buses [6]-[8]. Alternatively, 4-PAM has been extensively investigated, since it enhances the spectral efficiency for a given channel bandwidth. However, 4-PAM is constrained by the per-pin peak power limitation in low-voltage applications. In addition, higher order modulation has been reported to consume more power because of the linearity requirements of the receiver [3], [9]. This paper focuses on modulation schemes that require only two signal amplitude levels. As shown in Fig. 1(a), NRZ encodes symbol values in the voltage domain, while the symbol intervals are constant. Fig. 1(b) shows pulsewidth modulation (PWM), which encodes information in the positive pulsewidth while the interval between rising edges is fixed. The minimum pulsewidth is constrained by the channel bandwidth to avoid significant signal power loss. In Fig. 1(c), multilevel signaling is performed in the time domain using double-edged PWM (DPWM), which encodes information into both the positive and negative pulsewidths and has recently been demonstrated to achieve picojoule-per-bit performance. The advantage of DPWM over NRZ has been demonstrated in a practical multidrop memory link channel with a loss of −32 dB below 5 GHz [8]. At 10 Gbit/s, NRZ presents significant eye closure due to this power loss, whereas DPWM concentrates signal power at lower frequencies and is therefore less affected by the sharp frequency rolloff.
DPWM has previously been investigated for asynchronous memory links. Because DPWM transmits multiple bits over a single wire, fewer lanes are required for a given throughput, and the analog circuitry does not need to be more linear. DPWM has been reported to reduce power consumption to 36% and area to 25% [10]. However, most modern memory accesses are based on synchronous interfaces. While PWM could be employed as a modulation format for serial links [11], the data capacity of PWM is half that of DPWM for similar bandwidth requirements. DPWM is inherently plesiochronous, in that the transmitted signal matches the synchronous clock only on average over a sufficiently long time interval. An elastic buffer (EB) can be inserted between the synchronous and plesiochronous clock domains to accommodate the instantaneous frequency skew between them [10]. In practice, a finite EB can absorb the instantaneous frequency deviation but cannot compensate for the long-term frequency drift of uncoded data. This paper proposes a CMOS mixed-signal transceiver that uses an 8-/9-bit adaptive pulsewidth encoding to interface the plesiochronous DPWM scheme with a synchronous clock domain in the transmitter and receiver. To the best of our knowledge, this is the first DPWM implementation that is compatible with synchronous data links. In addition, the digital-to-time converter (DTC) introduces amplitude preemphasis to provide a high-frequency boost that overcomes channel losses. This is the first work to incorporate preemphasis into a DPWM scheme [12].
Section II briefly describes DPWM signaling to explain the frequency drift issue. Section III proposes the 8-/9-bit encoder to realize a low-power EB that prevents excessive long-term frequency skew. Section IV analyzes the preemphasis parameters that improve signal integrity, and Section V presents a transceiver architecture for low-jitter control and high-speed conversion. In Section VI, the 8-/9-bit hardware is demonstrated and the transceiver performance is measured in both the temporal and spectral domains.
II. DPWM SIGNALING
A DPWM waveform is shown in Fig. 1(c). DPWM encodes log_2 M information bits, representing a symbol a[k], into both the positive and negative pulsewidths, where M denotes the total number of modulation levels. If a[k] takes a value between 0 and M − 1, the duration of the kth DPWM pulse is
$$T_{\mathrm{DPWM}}[k] = T_{\mathrm{ref}} + a[k]\,T = \left(P + a[k]\right)T \qquad (1)$$
where T_ref = PT is the minimum pulsewidth, P is a programmable integer, and T is the timing resolution of the PWM. Note that T_ref does not have to be an integer multiple of T, but this assumption simplifies the circuit implementation [8], [11]. A longer T_ref concentrates the signal power at lower frequencies, which can be beneficial in bandwidth-limited channels. While T_ref should be chosen to suit the channel bandwidth, T determines the achievable data rate and bit error rate (BER), subject to the accuracy of the time-to-digital conversion. With uniformly distributed symbols, E{a[k]} = (M − 1)/2, and the DPWM data rate is calculated from the expected symbol period:
$$f_b = \frac{\log_2 M}{E\{T_{\mathrm{DPWM}}[k]\}} = \frac{\log_2 M}{T_{\mathrm{ref}} + \frac{M-1}{2}\,T} \qquad (2)$$
To adjust the bit rate of DPWM, the number of bits per symbol is chosen based on the minimum pulsewidth and the timing resolution. For fixed T_ref and T, Fig. 2 shows the bit rate versus M in DPWM signaling. The numerator of (2) shows that increasing the number of signal levels M enlarges the data capacity, so the bit rate initially rises with M and then peaks at a value of M that depends on T_ref and T. Beyond the peak, a large M prolongs the average DPWM symbol period in the denominator of (2) and degrades the bit rate. The optimal M therefore transmits the most information bits in the least required symbol period. To achieve 10 Gbit/s with T = 40 ps and T_ref = 160 ps, (2) indicates M = 8, since 3 bits are sent in an average period of 160 + 3.5 × 40 = 300 ps. Interestingly, DPWM also operates at 10 Gbit/s with T = 80 ps and T_ref = 80 ps when M = 4 (2 bits per 200 ps). A larger T relaxes the timing resolution required of the transceiver, and fewer signal levels may simplify the hardware implementation. However, the shorter T_ref distributes signal power to higher frequencies and incurs substantial intersymbol interference (ISI) and data-dependent jitter (DDJ) in band-limited channels [10], [12], [13]. If T is reduced through the use of fine-line CMOS, a high bit rate is maintained even when T_ref is longer than 1/f_b. DPWM therefore relaxes the channel bandwidth requirement by using a finer T and a longer T_ref.
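The bit-rate trend of Fig. 2 can be checked numerically from f_b = log_2 M / (T_ref + (M − 1)T/2), i.e., (2) with uniformly distributed symbols. The Python sketch below is an illustration, not the authors' tooling; the function name `dpwm_bit_rate` is hypothetical. It sweeps M for T = 40 ps and T_ref = 160 ps, checks the T = T_ref = 80 ps, M = 4 case, and evaluates the programmable settings T_ref = PT for P = 4 to 7.

```python
import math


def dpwm_bit_rate(M: int, T: float, T_ref: float) -> float:
    """Average DPWM bit rate per (2), assuming uniformly distributed symbols.

    Each symbol carries log2(M) bits, and the expected pulsewidth is
    T_ref + (M - 1) * T / 2 because E{a[k]} = (M - 1) / 2.
    """
    mean_period = T_ref + (M - 1) * T / 2
    return math.log2(M) / mean_period


T, T_ref = 40e-12, 160e-12          # timing resolution and minimum pulsewidth (s)
rates = {M: dpwm_bit_rate(M, T, T_ref) for M in (2, 4, 8, 16, 32)}
for M, fb in rates.items():
    print(f"M = {M:2d}: {fb / 1e9:5.2f} Gbit/s")

best_M = max(rates, key=rates.get)  # level count with the highest bit rate
print("optimal M:", best_M)

# Coarser resolution and shorter T_ref: T = T_ref = 80 ps with M = 4.
alt_rate = dpwm_bit_rate(4, 80e-12, 80e-12)
print(f"T = T_ref = 80 ps, M = 4: {alt_rate / 1e9:.2f} Gbit/s")

# Programmable T_ref = P * T with M = 8.
p_rates = {P: dpwm_bit_rate(8, T, P * T) for P in range(4, 8)}
for P, fb in p_rates.items():
    print(f"P = {P}: {fb / 1e9:.2f} Gbit/s")
```

The sweep peaks at M = 8 (10 Gbit/s) and falls for M = 16 and 32, and the P sweep gives about 10, 8.82, 7.89, and 7.14 Gbit/s, in line with the programmable rates quoted for the DTC in Section V.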
The channel bandwidth determines the choice of T_ref, which must avoid substantial ISI and timing jitter due to the low-pass channel characteristics. T_ref is therefore chosen for the channel, while the timing resolution is set by the ability of the receiver to discriminate between the different pulsewidths. When transmitting T_ref = 160 ps, the desired channel bandwidth is 3.125 GHz (0.5/160 ps). For a 50-Ω terminated system, the first-order circuit bandwidth is 1/(2π · 25 Ω · C_p), where C_p represents the aggregate parasitic capacitance of the driver and package. The overall parasitic capacitance should therefore be budgeted below about 2 pF, since C_p < 1/(2π · 25 Ω · 3.125 GHz) ≈ 2.04 pF.
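The capacitance budget is a one-line calculation. This sketch reproduces it; reading the 25 Ω as the 50-Ω driver termination in parallel with the 50-Ω receiver termination is an assumption about where that value comes from, and `R_eff` is a hypothetical name.

```python
import math

T_ref = 160e-12      # minimum pulsewidth (s)
R_eff = 25.0         # assumed: 50-ohm source termination in parallel with 50-ohm load

f_req = 0.5 / T_ref                          # required channel bandwidth (Hz)
C_max = 1 / (2 * math.pi * R_eff * f_req)    # largest C_p keeping 1/(2*pi*R*C) >= f_req

print(f"required bandwidth: {f_req / 1e9:.3f} GHz")
print(f"parasitic budget:   {C_max * 1e12:.2f} pF")
```

The result, about 2.04 pF, is the upper bound behind the "<2 pF" budget; any package or driver capacitance beyond it pushes the first-order pole below 3.125 GHz.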
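The timing parameters determine the optimal number of DPWM levels M. The dependence becomes explicit when (2) is rewritten as
$$f_b = \frac{\log_2 M}{T_{\mathrm{ref}}\left(1 + r\,\frac{M-1}{2}\right)}$$
where r = T/T_ref. In the event of consecutive identical symbols (CIS), each of value a, the instantaneous bit rate instead becomes
$$f_{b,\mathrm{CIS}} = \frac{\log_2 M}{T_{\mathrm{ref}}\,(1 + r\,a)}$$
which differs from (2) for every symbol value, because the mean symbol value (M − 1)/2 is not an integer for even M.
The DPWM signal is therefore plesiochronous: its instantaneous data rate changes with the current symbol value. To mitigate this problem, an EB for synchronous data links is introduced next.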
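To make the plesiochronous behavior concrete, this sketch evaluates the instantaneous rate log_2 M / (T_ref + aT) for a run of identical symbols of each value a, using (1), and compares it with the average rate from (2). The helper name `cis_bit_rate` is hypothetical, and the parameters are the 8-DPWM values from the text.

```python
import math

M, T, T_ref = 8, 40e-12, 160e-12


def cis_bit_rate(a: int) -> float:
    """Instantaneous bit rate for a run of identical symbols of value a:
    log2(M) bits are delivered every T_ref + a*T, per (1)."""
    return math.log2(M) / (T_ref + a * T)


average = math.log2(M) / (T_ref + (M - 1) * T / 2)   # average rate from (2)
inst = [cis_bit_rate(a) for a in range(M)]
for a, fb in enumerate(inst):
    print(f"a = {a}: {fb / 1e9:5.2f} Gbit/s")
print(f"average over uniform symbols: {average / 1e9:.2f} Gbit/s")
```

The instantaneous rate ranges from 18.75 Gbit/s (a = 0) down to about 6.82 Gbit/s (a = 7), and no constant symbol reproduces the 10-Gbit/s average, which is why a buffer between the two clock domains is unavoidable.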
III. PLESIOCHRONOUS-TO-SYNCHRONOUS ELASTIC BUFFER
The signal transition time of the DPWM signal is a function of the symbol value, as described in (1). Since data are available from a synchronous data bus, the EB is required to store the data until the transmitter is ready to transmit them. As shown in Fig. 1, the system must accommodate the interface between the synchronous and plesiochronous timing domains. The EB is a circular shift register with 2N_BUF word registers that accommodates the timing skew between the synchronous data clock Data_CK and the plesiochronous DTC_CK in Fig. 3. Head and tail pointers direct the read and write accesses. The tail pointer holds the address of the next synchronous word to be written into the buffer, while the head pointer holds the address of the next DPWM symbol to be encoded by the DTC. Data_CK and DTC_CK control the advancement of the tail and head pointers, respectively. Since data are encoded on both rising and falling edges, DTC_CK triggers the head pointer of the EB on both edges. While the tail pointer advances at a constant rate, the head pointer rate varies significantly with the transmitted symbols. The EB must prevent the access conflicts that occur when the head and tail pointers intersect, by keeping the pointer difference under N_BUF.
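The pointer mechanism can be illustrated with a small behavioral model. The sketch below is hypothetical (the class `ElasticBuffer` and its method names are not from the paper): it holds 2N_BUF words, starts the pointers N_BUF apart, advances the tail on writes (Data_CK domain) and the head on reads (DTC_CK domain), and raises an error when one pointer catches the other.

```python
class ElasticBuffer:
    """Behavioral sketch of a circular elastic buffer (hypothetical model).

    The buffer holds 2 * n_buf words. The writer (Data_CK domain) advances the
    tail pointer; the reader (DTC_CK domain) advances the head pointer. The
    pointers start n_buf words apart, and a conflict is raised when one
    pointer catches the other.
    """

    def __init__(self, n_buf: int):
        self.size = 2 * n_buf
        self.words = [None] * self.size
        self.head = 0            # next word to be read by the DTC
        self.tail = n_buf        # next word to be written from the data bus
        self.occupancy = n_buf   # words between head and tail (preset, unread)

    def write(self, word) -> None:
        if self.occupancy == self.size:
            raise OverflowError("tail pointer caught the head pointer")
        self.words[self.tail] = word
        self.tail = (self.tail + 1) % self.size
        self.occupancy += 1

    def read(self):
        if self.occupancy == 0:
            raise BufferError("head pointer caught the tail pointer")
        word = self.words[self.head]
        self.head = (self.head + 1) % self.size
        self.occupancy -= 1
        return word


# Matched average rates: one write per read never produces a conflict.
eb = ElasticBuffer(n_buf=2)
out = []
for word in range(10):
    eb.write(word)
    out.append(eb.read())
print(out)   # [None, None, 0, 1, 2, 3, 4, 5, 6, 7]; first n_buf reads are preset words

# Writer persistently faster than the reader: the tail catches the head.
eb2 = ElasticBuffer(n_buf=2)
overflow_at = None
for word in range(10):
    try:
        eb2.write(word)
    except OverflowError:
        overflow_at = word
        break
print("overflow on write index:", overflow_at)
```

With matched rates the pointers stay N_BUF apart indefinitely, while a persistent rate mismatch exhausts the N_BUF words of slack after a few extra writes, which is the failure mode analyzed next.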
A. Elastic Buffer Timing Tolerance
While the EB can accommodate short-term variations in the frequency between Data_CK and DTC_CK, a long-term drift causes the buffer to overflow. The expected time to a pointer conflict is calculated from (1). The time of the kth transition of DTC_CK is
$$t_k = t_0 + \sum_{i=1}^{k}\left(T_{\mathrm{ref}} + a[i]\,T\right)$$
where t_0 is assumed to be aligned with the data clock at time zero. Because a[k] is a uniform random variable over the values 0 to M − 1, the sum of these symbols approaches a Gaussian distribution with variance proportional to k, according to the central limit theorem. The growing variance implies an inevitable access conflict when one pointer catches the other. The timing tolerance is assured only when the accumulated skew between Data_CK and DTC_CK is smaller than the buffer depth N_BUF:
$$\left|t_k - t_0 - k\,\bar{T}\right| < N_{\mathrm{BUF}}\,\bar{T}$$
where T̄ is the synchronous data-clock period, which equals the expected DPWM pulsewidth:
$$\bar{T} = T_{\mathrm{ref}} + \frac{M-1}{2}\,T$$
For T = 40 ps and T_ref = 160 ps, T̄ = 300 ps = 7.5T.
The running disparity (RD) is introduced here to evaluate the instantaneous frequency deviation between the plesiochronous DPWM clock domain, T_DPWM, and the synchronous data clock:
$$\mathrm{RD}[k] = \frac{T_{\mathrm{DPWM}}[k] - \bar{T}}{T} = a[k] - \frac{M-1}{2}$$
While RD captures the instantaneous frequency mismatch, the digital sum variation (DSV) indicates the overall frequency drift between the two clock domains in (5):
$$\mathrm{DSV}[k] = \sum_{i=1}^{k}\mathrm{RD}[i]$$
Since t_k − t_0 − kT̄ = T · DSV[k], the tolerance condition above is equivalent to |DSV[k]| < N_BUF · T̄/T, i.e., |DSV[k]| < 7.5N_BUF for these timing values.
When the DSV is zero, the two pointers rotate through the EB at the same speed in the long run. If the DSV is positive, the DPWM pulsewidth is on average longer than the data clock period, and the head pointer moves more slowly than the tail pointer.
To avoid access conflicts between the data clock domain and the DTC domain, the worst-case deviation in DSV must be understood to determine the required size of the EB. Ideally, the two pointer addresses are always spaced by N_BUF. When transmitting a run of CIS, e.g., a[k] = 7, each symbol results in RD = 3.5, and the DSV grows without bound, as shown in Fig. 4(b). While the EB is written at the constant clock period T̄, it is read more slowly because every DPWM pulsewidth is the longest one. The tail pointer runs faster and eventually catches the head pointer, at which point data access conflicts occur. Increasing the buffer size delays the data access conflicts but does not fundamentally eliminate the problem. A frequency adaptation loop is therefore needed to keep DPWM truly plesiochronous, i.e., matched to the data clock on average, and to limit the EB hardware cost so that DPWM can be implemented.
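The worst-case analysis can be reproduced with the RD and DSV definitions of Section III-A, RD[k] = a[k] − (M − 1)/2 and DSV[k] = Σ RD[i], together with the tolerance |DSV| < N_BUF · T̄/T, where T̄/T = P + (M − 1)/2 = 7.5. This is a minimal sketch under those definitions; the helper names are hypothetical, and everything is kept in units of T so the arithmetic stays exact.

```python
import math
import random

M, P = 8, 4                       # 8-DPWM with T_ref = P * T
T_BAR = P + (M - 1) / 2           # synchronous period in units of T (7.5)


def running_disparity(a: int) -> float:
    """RD of one symbol in units of T: a[k] - (M - 1) / 2."""
    return a - (M - 1) / 2


def dsv_trace(symbols):
    """DSV after each symbol: the running sum of RD (units of T)."""
    dsv, trace = 0.0, []
    for a in symbols:
        dsv += running_disparity(a)
        trace.append(dsv)
    return trace


def required_n_buf(peak_dsv: float) -> int:
    """Smallest integer N_BUF with |DSV| < N_BUF * T_BAR (both in units of T)."""
    return math.floor(abs(peak_dsv) / T_BAR) + 1


# Worst case from the text: a run of identical symbols a[k] = 7.
cis = dsv_trace([7] * 12)
print("DSV after 12 symbols of 7:", cis[-1])      # grows by 3.5 per symbol

# Uncoded random symbols: the DSV performs a random walk with
# variance (M**2 - 1) / 12 = 5.25 per symbol, so its spread keeps growing.
rng = random.Random(1)
walk = dsv_trace(rng.randrange(M) for _ in range(100_000))
peak = max(abs(x) for x in walk)
print(f"uncoded peak |DSV| = {peak}, N_BUF needed = {required_n_buf(peak)}")

print("N_BUF for |DSV| = 10.5:", required_n_buf(10.5))
print("N_BUF for |DSV| = 125:", required_n_buf(125))
```

A peak DSV of 125 requires N_BUF = 17 (the "exceeding 16" case of the uncoded PRBS), whereas a bounded DSV of 10.5 needs only N_BUF = 2.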
B. 8-/9-bit Encoding Scheme
To ensure frequency tracking between the synchronous clock domain and the DTC domain, the synchronous clock period should equal the average DPWM pulsewidth. An adaptive 8-/9-bit encoding scheme is therefore introduced for the DPWM transceiver. This 8-/9-bit scheme is distinct from… To minimize the DSV, each encoded symbol provides the desired RD and controls the overall frequency drift of the modulated signal. The encoding concept, shown in Fig. 4(b), avoids long CIS runs and the implicit size constraints they impose on the EB. The proposed scheme calculates the timing skew by dynamically examining the DSV. Depending on the current DSV polarity, the data are bitwise inverted, which equivalently maps them to a long or short pulsewidth and balances the speed of the head (read) pointer. When a CIS of a[k] = 7 is transmitted, the feedback loop periodically inverts the information bits, alternately generates long and short pulsewidths, and thereby adjusts the frequency of DTC_CK. By generating the desired RD, the loop seeks to balance the DSV and lock the average DTC_CK pulsewidth to T̄. The long-term frequency wander between the clock domains is suppressed to avoid data access conflicts, while the instantaneous frequency deviation is absorbed by the EB. The feedback operation of the adaptive 8-/9-bit scheme is similar to that of a phase-locked loop: the DTC is essentially an oscillator with phase noise determined by the values of the transmitted symbols, the DSV behaves as a phase/frequency detector, and the selected RD provides the frequency tuning of the oscillator.
Source encoding is implemented in the digital domain to minimize hardware cost and complexity. In Fig. 5, an 8-/9-bit scheme is proposed that encodes one 8-bit data byte, together with an additional inversion bit, into three 3-bit symbols. The three RDs are computed and accumulated by an integrator. A zero bit is initially inserted at the MSB of D0 to align the input word length of the adder. The integrator records the deviation of the information bits with respect to the expected average value in (6). If the current DSV is positive and the next transmitted byte would still result in a positive RD, the encoder inverts every bit in the next byte. If the next transmitted byte has a negative or zero RD, the bits remain unchanged. The 8-/9-bit decoding is implemented by observing the inversion bit, INVERT: the decoder inverts the received byte when INVERT is one. If the DPWM receiver incorrectly demodulates a received symbol and produces the wrong INVERT bit, burst errors occur but are limited to the current data byte. Because no feedback or accumulation is required, the 8-/9-bit decoding is robust to error propagation at the receiver (RX). Fig. 6(a) shows the simulated DSV histograms of uncoded and encoded 2^13 − 1 pseudo-random binary sequences (PRBS). When the uncoded PRBS is transmitted, DPWM is incapable of tracking the frequency deviation. When the same pattern is transmitted ten times in succession, the original PRBS exhibits a peak DSV of more than 125 and requires N_BUF exceeding 16 (125/7.5). Most importantly, the DSV histogram broadens without bound as the PRBS pattern length increases, and this unbounded DSV makes an economical EB implementation impossible. In contrast, the PRBS encoded by the 8-/9-bit scheme results in a concentrated DSV histogram, with the DSV magnitude bounded to 10.5. Fig. 6(b) shows the simulated temporal responses of the original PRBS pattern for ten repetitions. The encoding presents several advantages.
First, the DSV is bounded, because the peak DSV for three symbols is 10.5, i.e., (7 − 3.5) · 3, as shown in Fig. 4(b). Second, the encoded pattern substantially shrinks the required EB size, to N_BUF = 2 (10.5/7.5 = 1.4, rounded up). Regardless of the pattern, the 8-/9-bit-encoded stream presents a bounded disparity and requires a total buffer size of only four words. Fig. 7 shows the simulated power spectral density (PSD) of the original PRBS pattern, whose signal power is broadly distributed around 1.67 GHz (0.5/T̄). The PSD of the encoded pattern presents strong spectral components at 1.67 GHz because the 8-/9-bit scheme suppresses the DSV and locks the average DTC frequency.
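The encoding rule can be prototyped in a few lines. The sketch below is a behavioral model, not the hardware of Fig. 5, and all function names are hypothetical. Two points are assumptions: the bit layout (INVERT as the MSB of D0, symbols taken MSB-first from the 9-bit word), and the mirror-image rule for a negative DSV (invert when DSV and RD are both negative), which is our reading of "depending on the current DSV polarity". With this layout, inverting all nine bits maps every symbol a to 7 − a, so the inversion exactly negates the byte's RD.

```python
import random

M = 8
HALF = (M - 1) / 2                 # expected symbol value, 3.5


def to_symbols(word9: int) -> list:
    """Split a 9-bit word MSB-first into three 3-bit symbols D0, D1, D2 (assumed order)."""
    return [(word9 >> 6) & 0b111, (word9 >> 3) & 0b111, word9 & 0b111]


def word_rd(word9: int) -> float:
    """Running disparity of one encoded word: sum of RD over its three symbols."""
    return sum(a - HALF for a in to_symbols(word9))


def encode_8b9b(data: bytes) -> list:
    """Adaptive 8-/9-bit encoder sketch.

    Assumed layout: word9 = [INVERT | b7 ... b0], so INVERT is the MSB of D0.
    With INVERT = 0 (the inserted zero), flipping all nine bits maps every
    symbol a -> 7 - a and therefore negates the word's RD.
    """
    dsv, words = 0.0, []
    for byte in data:
        word = byte                 # INVERT bit (bit 8) starts at zero
        rd = word_rd(word)
        if dsv * rd > 0:            # same polarity: would push DSV further out
            word ^= 0x1FF           # invert all nine bits, setting INVERT = 1
            rd = -rd
        dsv += rd
        words.append(word)
    return words


def decode_8b9b(words) -> bytes:
    """Stateless decoder: undo the inversion whenever INVERT is one."""
    out = bytearray()
    for word in words:
        if word & 0x100:
            word ^= 0x1FF
        out.append(word & 0xFF)
    return bytes(out)


def dsv_per_word(words) -> list:
    """DSV sampled at word (byte) boundaries, in units of T."""
    dsv, trace = 0.0, []
    for word in words:
        dsv += word_rd(word)
        trace.append(dsv)
    return trace


rng = random.Random(7)
data = bytes(rng.randrange(256) for _ in range(10_000))
encoded = encode_8b9b(data)
peak_random = max(abs(x) for x in dsv_per_word(encoded))
peak_ones = max(abs(x) for x in dsv_per_word(encode_8b9b(b"\xff" * 1000)))
print("round trip ok:", decode_8b9b(encoded) == data)
print("peak |DSV| at byte boundaries, random data:", peak_random)
print("peak |DSV| at byte boundaries, all 0xFF:", peak_ones)
```

Under these assumptions, each byte's RD lies between −10.5 and +6.5 before inversion and the chosen polarity always opposes the current DSV, so the DSV sampled at byte boundaries never exceeds 10.5 in magnitude. The sketch checks the bound only at byte boundaries, not symbol by symbol. The decoder needs no state, which mirrors the error-containment argument above.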
The 8-/10-bit encoding for NRZ and the 8-bit/5Q encoding for 4-PAM ensure sufficient signal transitions at the expense of a 25% coding overhead [14], [15]. Similarly, the disadvantage of the proposed 8-/9-bit scheme is a reduced effective bit rate (a 12.5% overhead, i.e., one extra bit per eight data bits). However, the adaptive DTC transmitter is necessary for synchronous data links to prevent the access conflicts caused by timing skew between the clock domains, and this throughput penalty is the price of that protection.
For mesochronous applications, a delay-locked loop (DLL) synchronizer or a two-stage synchronizer avoids metastability by providing delayed versions of the data or sampling clock [16]. However, the analog circuitry associated with a high-speed DLL increases power consumption and hardware area. For plesiochronous applications using a first-in, first-out (FIFO) buffer, the clock predictor concept can provide periodic pointer resynchronization, at the cost of more complex DLL designs. However, timing jitter accumulated on the data or clock path through the buffer stages could corrupt the fine timing resolution required to reliably discriminate the pulsewidths. In addition, periodic pointer resynchronization could result in dropped or replicated data even when the phase between the pointers is corrected.
Alternatively, flow control provides an effective solution for conventional plesiochronous applications when the frequency difference between the two periodic clock domains is small. In conventional FIFOs, open-loop flow control compensates for the predicted pointer frequency deviation by periodically inserting nulls. In closed-loop flow control, the RX actively requests null insertion or clock halting [17] when the FIFO is overrun, provided that a feedback channel is available and the request latency is tolerable. For DPWM FIFOs, null insertion could be considered if the frequency difference between the periodic data clock and the DPWM signal were small. However, the uncoded DPWM signal is asynchronous, and its instantaneous frequency depends on the transmitted symbol values, as shown in (1). The timing of uncoded DPWM therefore performs a random walk relative to the data clock and can exhibit abrupt frequency changes.
The proposed 8-/9-bit encoding is essentially an open-loop flow control scheme. The encoder examines the transmitted symbol values and calculates the frequency deviation between the synchronous data clock and the DTC clock. It anticipates the EB condition from the current running symbol stream and inserts an additional bit that determines whether the symbol values are inverted, adjusting the DPWM pulsewidth to balance the pointer rate deviation. The 8-/9-bit encoding thus eliminates the external trigger for pointer resynchronization and avoids data loss in the FIFO buffer. It also avoids the jitter accumulation and power consumption associated with DLL-based techniques or variable-rate phase-locked loops that retime signals across the timing boundary.
IV. DPWM PREEMPHASIS
In band-limited electrical interconnects, frequency-dependent loss due to the skin effect and dielectric absorption is a dominant source of signal integrity degradation. In addition to the amplitude attenuation of the channel, phase dispersion is another significant source of signal integrity degradation. Group delay is the negative derivative of the phase response with respect to angular frequency, and the group delay variation (GDV) quantifies the phase distortion across the signal spectrum. Signal integrity degradation induced by GDV in circuit designs was investigated in [18]-[20]. Preemphasis is applied to the modulated signal to compensate for high-frequency loss in conventional NRZ and PAM signaling [9], [21]-[24]. The signal integrity of DPWM can likewise be improved using a feedforward equalizer.
Fig. 8. Original DPWM signal (black line) and preemphasized signal (gray line).
The transfer function of the DPWM preemphasis, using an amplitude tap weight G delayed by a duration T_d, is
$$H(s) = 1 - G\,e^{-sT_d}$$
Using the first-order Padé approximation, e^{−sT_d} ≈ (1 − sT_d/2)/(1 + sT_d/2), the preemphasis can be modeled as a pole/zero combination with a pole at ω_p = 2/T_d and a zero at
$$\omega_z = \frac{2(1-G)}{(1+G)\,T_d}$$
[25]. Fig. 8 shows the original DPWM signal transmitting the symbol stream [0, 7, 4, 0]. While DPWM modulates the signal pulsewidth according to the symbol values, the preemphasis boosts the signal amplitude for a duration T_d within each pulse. Since the preemphasis gain and duration both affect the zero frequency that sets the peaking, their joint influence on magnitude and GDV must be addressed. The choice of the preemphasis weight and duration is therefore investigated to improve DPWM signal integrity. Fig. 9(a) shows that the peak in the frequency response shifts to lower frequencies as G increases. While the gain peaking increases with G, the signal power close to dc is reduced. As described in Section II, the signal attenuation around the Nyquist frequency has a significant influence on signal integrity in band-limited channels. Fig. 9(a) also shows the group delay response: as G increases, the GDV increases, which causes phase distortion of the DPWM signal. Fig. 9(b) shows the simulated gain peaking at 3.125 GHz versus the preemphasis tap weight, together with the GDV. The GDV is evaluated up to 3.125 GHz, where the majority of the DPWM signal power is located. A significant GDV contributed by the preemphasis can deteriorate the available timing resolution of the PWM scheme. A performance tradeoff between peaking magnitude and GDV is thus observed. When G = 0, the transmitter presents a flat group delay response and therefore zero GDV. While a larger tap weight increases the gain peaking at the Nyquist frequency, the GDV worsens. The GDV introduced by the transmitter preemphasis adds to the overall DPWM group delay distortion through the electrical channel.
In addition, T_d affects the magnitude peaking frequency and the phase response, because increasing T_d lowers the peaking frequency. To evaluate the effects of T_d, Fig. 10(a) shows the simulated gain peaking and group delay versus T_d in units of T. With T_d = 4T = T_ref, the preemphasis provides the strongest boost at 3.125 GHz. Fig. 10(b) shows the peaking magnitude at 3.125 GHz and the GDV versus T_d. The GDV increases as the preemphasis tap is lengthened.
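The tradeoffs in Figs. 9 and 10 can be explored with a simple frequency-domain model. This sketch assumes the preemphasis transfer function H(s) = 1 − G e^{−sT_d}, a single delayed, negatively weighted tap. That form is an assumption chosen to match the text's description (dc gain reduced by G, zero moving lower as G or T_d grows), and the function names are hypothetical. The sketch computes the peaking relative to dc, the Padé zero ω_z = 2(1 − G)/((1 + G)T_d), and the GDV from the closed-form group delay.

```python
import cmath
import math


def response(G: float, Td: float, f: float) -> complex:
    """Assumed preemphasis transfer function H = 1 - G * exp(-j*2*pi*f*Td)."""
    return 1 - G * cmath.exp(-2j * math.pi * f * Td)


def peaking_db(G: float, Td: float, f: float) -> float:
    """Gain at f relative to the dc gain |H(0)| = 1 - G, in dB."""
    return 20 * math.log10(abs(response(G, Td, f)) / (1 - G))


def pade_zero_hz(G: float, Td: float) -> float:
    """Zero of the first-order Pade model, w_z = 2(1 - G) / ((1 + G) Td), in Hz."""
    return 2 * (1 - G) / ((1 + G) * Td) / (2 * math.pi)


def group_delay(G: float, Td: float, f: float) -> float:
    """Closed-form group delay -d(arg H)/d(omega) of H = 1 - G*exp(-j*omega*Td)."""
    c = math.cos(2 * math.pi * f * Td)
    return -Td * (G * c - G * G) / (1 - 2 * G * c + G * G)


def gdv(G: float, Td: float, f_max: float, n: int = 1001) -> float:
    """Group delay variation (max - min) sampled over 0..f_max."""
    taus = [group_delay(G, Td, f_max * i / (n - 1)) for i in range(n)]
    return max(taus) - min(taus)


T, f_nyq = 40e-12, 3.125e9
boost = peaking_db(0.125, 4 * T, f_nyq)
print(f"G = 0.125, Td = 4T: {boost:.2f} dB of peaking at 3.125 GHz")

best_n = max(range(1, 8), key=lambda n: peaking_db(0.125, n * T, f_nyq))
print(f"strongest boost at 3.125 GHz for Td = {best_n}T")

for G in (0.0, 0.125, 0.25, 0.5):
    print(f"G = {G:5.3f}: zero at {pade_zero_hz(G, 4 * T) / 1e9:.2f} GHz, "
          f"GDV up to 3.125 GHz = {gdv(G, 4 * T, f_nyq) * 1e12:.1f} ps")
```

Under this model, G = 0.125 with T_d = 4T gives about 2.2 dB of peaking at 3.125 GHz, close to the roughly 2-dB boost reported in Section VI. T_d = 4T = T_ref maximizes the boost at that frequency, and both the zero frequency and the GDV move in the directions described above as G or T_d increases. The model ignores the driver's own bandwidth, so treat the numbers as trends rather than predictions.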
To quantify the signal integrity enhancement achieved by preemphasis, eye diagrams are simulated through a 24-in standard FR4 printed circuit board (PCB) trace with an insertion loss of 5 dB at 3.125 GHz. Agilent Advanced Design System (ADS) channel simulation is used for rapid signal integrity analysis. A symbol stream of 2000 bits generated by a PRWG-7 is used to evaluate the preemphasis performance. Fig. 11(a) shows the simulated eyes of 8-DPWM. The time margin is normalized by T (40 ps), and the differential input swing is 2V_s. One unit interval (UI) equals one T. The original DPWM eye opening through the trace is 0.72 UI (28.6 ps) and 0.56 V_s at 10^−12 BER. When preemphasized with G = 0.125 and a delay of 4T, the eye opening is 0.87 UI (34.6 ps) and 0.671 V_s. Fig. 11(b) shows the eye width and eye height at 10^−12 BER versus G and T_d. In general, preemphasized DPWM improves the eye quality for G = 0.1 to 0.3. However, strong preemphasis, e.g., G = 0.4 or 0.5, yields an inferior eye opening because it reduces the signal power at low frequencies and worsens the group delay response. A tap weight of 0.1-0.2 is found to be sufficient and optimal for improving signal integrity in this channel. Fig. 11 also shows the simulated signal integrity for different tap delays: for DPWM with T_ref = 4T, a preemphasis delay of around 4T yields the largest improvement. To further evaluate the joint effects of G and T_d, Fig. 12(a) shows the simulated eye height (EH) contour. Assuming a target EH of 0.65 V_s, G of 0.1-0.2 is sufficient for
In fact, for a given eye opening, a lower G is desirable to reduce preemphasis power consumption and increase the transmitter (TX) dynamic range. Fig. 12(b) shows the simulated eye width (EW) contour. To achieve EW > 0.8 UI, DPWM requires the least preemphasis gain and provides the best energy efficiency when
While equalization techniques could be introduced to reduce crosstalk [26]-[28], the approach taken in this paper does not address this issue and instead uses preemphasis to mitigate the DDJ contribution to the total jitter. Nonetheless, equalization could be envisioned for both DDJ and crosstalk impairments.
V. DPWM TRANSCEIVER CIRCUIT IMPLEMENTATION
A. Elastic Buffer
Fig. 13 shows the EB schematic, consisting of two pointers and a register file of eight words (N_BUF = 4). For DPWM with T_ref = 160 ps, the EB must operate at 6.25 GHz (1/160 ps). The high-speed pointers are implemented as shift registers using one-hot coding instead of conventional binary counters. The tail and head pointers point to the word addresses to be written and read, respectively. Tristate buffers direct the data flows to the proper word registers. During system initialization, the EB presets H0 and T0, which activates the first (F0) and middle (F4) words. During normal operation, the enable bits (initially on H0 and T0) circulate along each pointer chain to sequentially enable the other word registers. If Data_CK and DTC_CK have the same frequency, the write and read addresses remain separated by four.
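The one-hot pointer chains of the eight-word EB can be modeled as rotating bit masks. This is a hypothetical behavioral sketch, not the circuit: each pointer is an 8-bit one-hot value, advancing the pointer is a circular shift, and, as described for initialization, the head chain starts on F0 and the tail chain on F4.

```python
N_WORDS = 8                          # register file F0..F7 (N_BUF = 4)
MASK = (1 << N_WORDS) - 1


def rotate(one_hot: int) -> int:
    """Advance a one-hot pointer by one word (circular shift left by one bit)."""
    return ((one_hot << 1) | (one_hot >> (N_WORDS - 1))) & MASK


def address(one_hot: int) -> int:
    """Return the index of the single enabled word register."""
    if one_hot == 0 or one_hot & (one_hot - 1):
        raise ValueError("pointer is not one-hot")
    return one_hot.bit_length() - 1


# Initialization: the head chain enables F0 and the tail chain enables F4.
head, tail = 1 << 0, 1 << 4
spacing = set()
for _ in range(3 * N_WORDS):
    head, tail = rotate(head), rotate(tail)   # equal Data_CK and DTC_CK rates
    spacing.add((address(tail) - address(head)) % N_WORDS)
print("write/read address spacing:", spacing)
```

Because each enable bit drives exactly one word register, no address decoding is needed in the 6.25-GHz path, which is the usual motivation for one-hot rather than binary pointers.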
B. DPWM Transmitter
Fig. 14 shows the diagram of the low-jitter DTC circuitry for an eight-level DPWM [8]. A phase-rotation circuit uses eight clock phases with state transition control to implement cycle-by-cycle pulsewidth control. For DPWM, the DTC modulates the dual-edge pulsewidth as (a[k] + P)T, where T_ref = PT and P is programmable to adapt to different channel characteristics. When P is programmed from 4 to 7, the DPWM bit rate becomes 10, 8.8, 7.9, and 7.14 Gbit/s, respectively. A low-jitter DPWM signal generation concept is demonstrated to overcome the duty-cycle distortion observed in traditional DPWM signal generation [8].
The DTC accumulates the current symbol a[k] with the prior symbols and computes a modulo-8 division to determine the current phase selection. The phase multiplexer (MUX) selects one of the clock phases to generate the trigger signal (M_CK). When M_CK passes through the latch (L), the frequency divider alternately generates positive and negative pulses without introducing duty-cycle distortion. Note that M_CK is gated by the FSM through the latch. When a[k] is less than or equal to a threshold T_H, the double-edged pulses are generated by rotating the phases of the MUX. When a large a[k] (larger than T_H) induces an excess phase change, the FSM uses state transition control to allow the phase to roll over and skip redundant clock cycles. After switching to the correct state, the FSM enables the latch (L) and the frequency divider to generate the DPWM pulses. By eliminating the redundant delay cells used in conventional DTCs, this feature substantially improves signal integrity, especially for a longer T_ref or a larger number of signal levels. The DTC timing diagram and preemphasis driver design are also shown in Fig. 14.
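The accumulate-and-modulo phase selection can be illustrated with an idealized model. In this sketch (a hypothetical function, not the circuit), each pulse lasts (P + a[k])T, the running sum in units of T is divided by 8, the remainder selects which of the eight clock phases (spaced by T) fires the next edge, and the quotient counts whole clock cycles. The FSM's threshold T_H and its cycle-skipping logic are not modeled; the cycle count simply stands in for the cycles the FSM lets pass.

```python
def dtc_edges(symbols, P: int = 4, n_phases: int = 8):
    """Idealized DTC edge schedule (behavioral sketch, not the circuit).

    Each pulse lasts (P + a[k]) * T. The accumulated pulsewidth, in units of T,
    is split by a modulo-n_phases division into a clock-cycle count and the
    index of the clock phase (spaced by T) that triggers the next edge.
    """
    total, edges = 0, []
    for a in symbols:
        total += P + a
        edges.append(divmod(total, n_phases))   # (cycle, phase)
    return edges


edges = dtc_edges([0, 7, 4, 0])      # symbol stream used in Fig. 8
print(edges)                          # [(0, 4), (1, 7), (2, 7), (3, 3)]

# Convert (cycle, phase) back to edge times in units of T (8 phases per cycle)
# and recover the pulsewidths P + a[k].
times = [0] + [8 * cycle + phase for cycle, phase in edges]
widths = [t1 - t0 for t0, t1 in zip(times, times[1:])]
print(widths)                         # [4, 11, 8, 4]
```

Every edge lands exactly on one of the eight phases, so pulse timing is set by phase selection rather than by a chain of delay cells, which is the property the phase-rotation DTC exploits.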
C. DPWM Receiver
Symbol recovery from the 10-Gbit/s DPWM signal requires dual-edge pulsewidth conversions at 3.1 GHz [1/(4T + 4T)] with a timing resolution of 40 ps. The proposed DPWM receiver, whose block diagram is shown in Fig. 15, uses time-interleaved time-to-digital converters (TDCs) to capture the incoming positive and negative pulsewidths and recover the transmitted 3-bit symbols. Since DPWM has only two signal levels, the receiver uses a common limiting preamplifier to regenerate the signal swing. Sampled by both edges of the 2T-period clock CK_IN, the slicer quantizes the pulsewidth in units of T, and the integrator records the conversion results for symbol demodulation. Dual TDCs perform the time-interleaved pulsewidth conversions. L_CK represents the signal transitions of DPWM and triggers the write operations of the EB. The proposed TDC circuits operate above a conversion rate of 3.1 GHz and achieve a simulated timing margin of 32 ps (0.8 UI) at an input swing of 50 mV. Sampling phases beyond 0.8 UI are constrained by sense-amplifier metastability. High-speed current-mode logic (CML) circuitry is used in the slicers, and CMOS logic is used in the computation blocks to lower the power consumption. A preliminary DPWM receiver design was demonstrated in [8]; a low-power version of the receiver is presented here. By refining the preamplifier design and using a sense-amplifier slicer, the new TDC reduces power consumption by 55% and improves the timing margin by 0.1 UI compared with the previous TDC design using CML slicers in [8]. At 10-Gbit/s operation with a 50-mV input swing, the simulated power consumption and timing margin of the previous and new TDCs are 38 mW/0.7 UI and 17 mW/0.8 UI, respectively.
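The demodulation principle can be sketched behaviorally. In this hypothetical model, each measured pulsewidth is rounded to the nearest multiple of T (standing in for the slicer samples taken every T by both edges of the 2T-period CK_IN), the offset P is removed, and alternate pulses are steered to two interleaved TDC lanes before being recombined. The ±0.4-UI timing error applied to the test stream is an illustrative choice that stays within the 0.8-UI simulated margin quoted above.

```python
import random


def tdc_demodulate(widths, T: float = 40e-12, P: int = 4, M: int = 8):
    """Behavioral sketch of two time-interleaved TDCs.

    Each width is quantized to the nearest multiple of T (standing in for the
    slicer samples taken every T by both edges of the 2T-period CK_IN), and the
    offset P is removed. Even-indexed (positive) pulses go to TDC 0 and
    odd-indexed (negative) pulses to TDC 1.
    """
    lanes = ([], [])
    for k, w in enumerate(widths):
        a = round(w / T) - P
        lanes[k % 2].append(min(max(a, 0), M - 1))   # clamp to valid symbols
    # Re-interleave the two TDC outputs in the original pulse order.
    return [lanes[k % 2][k // 2] for k in range(len(widths))]


rng = random.Random(3)
T, P = 40e-12, 4
tx = [rng.randrange(8) for _ in range(1000)]
# Timing error within +/-0.4 UI, i.e., inside the 0.8-UI simulated margin.
rx_widths = [(P + a + rng.uniform(-0.4, 0.4)) * T for a in tx]
rx = tdc_demodulate(rx_widths, T, P)
print("symbol errors:", sum(x != y for x, y in zip(tx, rx)))
```

Splitting positive and negative pulses between two lanes halves the rate each TDC must sustain, which is why the conversion-rate requirement is expressed per pulse pair (1/(4T + 4T)).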
The lack of signal transitions, a concern in NRZ and 4-PAM, does not arise in DPWM, because DPWM alternately transmits positive and negative pulses for every symbol. Hence, DPWM naturally presents a run-length-limited characteristic that facilitates clock recovery (CR). As shown in [8] and [12], DPWM has energy nulls at low frequencies even without encoding. In addition, DPWM uses only two signal levels and presents a single threshold crossing, as in NRZ; therefore, the erroneous edge detection of 4-PAM is avoided. For link applications with a forwarded clock channel, CR for DPWM is straightforward using a PLL or DLL. Without a forwarded clock channel, CR is required for sampling and synchronizing the information bits. If T_ref can be implemented precisely as a multiple of T, the CR concept is identical to that of NRZ. However, because DPWM uses a finer timing resolution than NRZ, a multiphase CR can be considered to reduce the receiver circuit bandwidth [29]-[31]. In this paper, since the proposed DTC precisely generates T_ref in terms of T, the receiver CR is compatible with a typical NRZ design concept. Source encoding impacts CR for multilevel signaling. For links without a forwarded clock channel, the 8-/10-bit and 8-bit/5Q schemes were developed for NRZ and 4-PAM to ensure signal transitions within a finite symbol length [15]. The 4-PAM CR should carefully avoid edge detection on erroneous threshold crossings to reduce recovered clock jitter. A symmetric code for 4-PAM provides a good solution for eliminating erroneous threshold crossings [32]. Notably, the 8-/10-bit, 8-bit/5Q, and symmetric codes all incur a bit rate penalty of 25%.
VI. MEASUREMENTS
The chip was implemented in a 45-nm CMOS silicon-on-insulator (SOI) process, and the die photograph is shown in Fig. 18. The DPWM transceiver and the 8-/9-bit hardware were measured by probing, and the supply voltage was applied through a high-frequency probe. Individual breakouts were used to evaluate the subcircuits.
A. 8-/9-bit Encoder/Decoder
To demonstrate the hardware feasibility of the 8-/9-bit scheme, the encoder and decoder are implemented in 45-nm SOI CMOS at the target data rate of 10 Gbit/s. Fig. 16 shows the implementation, including clock dividers, an on-chip PRBS generator, the encoder/decoder designs, and data MUXs. Subrate clocks (CK3/CK9) are used for the EB data clock and data bit multiplexing. The original 8-bit data byte is generated by the PRBS generator and encoded by the proposed 8-/9-bit scheme. Assuming the DPWM symbols are demodulated correctly, the encoded byte EN_Byte[8:0] is forwarded to the input of the decoder. According to the MSB inversion polarity, the decoder inverts the received byte and recovers the original data byte. If the decoder functions correctly, the decoded byte DE_Byte[8:0] is identical to the original data byte. To facilitate BERT testing, EN_Byte[8:0] and DE_Byte[8:0] are multiplexed into serial bit streams and compared with the expected bit patterns. The two circuits consume 1.5 mW in total when operating at 10 Gbit/s. Fig. 17(a) shows the active area of the encoder and decoder, and Fig. 17(b) shows the measured eye diagram of the decoded data bit. The hardware implementation demonstrates good area and power efficiency for the 8-/9-bit scheme.
B. Synchronous Data Links Incorporating Elastic Buffer and 8-/9-bit Encoding Scheme
The DTC and TDC chip microphotographs are shown in Fig. 18, with active areas of 93 × 94 μm² and 218 × 160 μm², respectively. The power consumption of the DTC, including the output driver, and of the TDC, including the preamplifier, is 66.5 and 29 mW, respectively. The TX driver consists of the predriver, the main driver, and the preemphasis driver. When the preemphasis tap is enabled to compensate for channel loss, the preemphasis driver provides a peak tap of G = 0.125 and consumes an additional 1.5 mW. Any choice of T_ref from 4T to 7T exhibits comparable power consumption. DPWM achieves 10 Gbit/s using T = 40 ps and T_ref = 160 ps. Unlike the prototype transceiver reported in [8], the new transceiver adds preemphasis at the TX output driver and reduces the power consumption of the RX front-end slicers. Transceiver performance is summarized in Table I.
Fig. 19 shows the test setup for verifying DPWM synchronous data links using the EB with the 8-/9-bit scheme. An Agilent 81142 pulse pattern generator is programmed to generate a 10-Gbit/s encoded bit stream (EN_Data) into the TX. An on-chip de-MUX provides a serial-to-parallel interface that converts the bit stream into DPWM symbols. The DTC reads out the pattern from the EB and generates the DPWM signal. At the RX, the TDC demodulates the DPWM symbols and stores the conversion results in the EB. The recovered bits are multiplexed into a serial bit stream (DE_Data). Fig. 20 shows the measured DPWM power spectral density (PSD), which matches the simulated PSD. Note that the signal power concentrates around 1.67 GHz, corresponding to an average pulsewidth of (4 + 3.5)T = 300 ps, and 77% of the signal power lies within the Nyquist frequency of 3.125 GHz. Fig. 21 shows the frequency-dependent loss of the 120-in coaxial cable, which is 4.5 dB at 3.125 GHz. DPWM preemphasis provides a gain boost of around 2 dB at 3.125 GHz.
Fig. 21. S21 of the 120-in cable and overall response using preemphasis.
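The reported operating points can be cross-checked with a few lines of arithmetic. The sketch below assumes, consistently with the (4 + 3.5)T figure but as a hypothesis rather than a statement from this section, that each pulse carries 3 bits as one of 8 equiprobable levels of width T_ref + kT and that one symbol is one positive plus one negative pulse. The Nyquist frequency is taken as 1/(2T_ref), treating the narrowest pulse as a half period. The function name is illustrative.

```python
def dpwm_rates(t_ps: float, t_ref_ps: float, bits_per_pulse: int = 3) -> dict:
    """Average DPWM rates under the equiprobable-level assumption."""
    mean_level = (2 ** bits_per_pulse - 1) / 2      # 3.5 for 8 levels
    mean_pulse_ps = t_ref_ps + mean_level * t_ps     # (4 + 3.5)T when T_ref = 4T
    symbol_ps = 2 * mean_pulse_ps                    # one positive + one negative pulse
    return {
        "bit_rate_gbps": 2 * bits_per_pulse / symbol_ps * 1e3,
        "mean_symbol_rate_ghz": 1e3 / symbol_ps,
        "nyquist_ghz": 1e3 / (2 * t_ref_ps),         # narrowest pulse = half period
    }


if __name__ == "__main__":
    for t_ref in (160, 280):                         # T_ref = 4T and 7T with T = 40 ps
        rates = dpwm_rates(40, t_ref)
        print(t_ref, {k: round(v, 3) for k, v in rates.items()})
```

For T = 40 ps and T_ref = 160 ps this gives 10 Gbit/s, a mean symbol rate of about 1.67 GHz, and a 3.125-GHz Nyquist frequency; with T_ref = 280 ps (7T) it gives about 7.14 Gbit/s, which would account for the second bit rate used in the preemphasis measurements.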
The preemphasis boosts the high-frequency response, specifically in the region around the peak of the DPWM PSD. Fig. 22 shows the BER bathtubs for the original and equalized channels. The transmitted eye width at 10^-12 BER is 31 ps (0.78 UI). At a bit rate of 10 Gbit/s, the received eye width at 10^-12 BER is 20 ps (0.5 UI) without preemphasis and 26 ps (0.65 UI) with preemphasis, which reduces data-dependent jitter. The preemphasized DPWM thus improves the eye opening by 0.15 UI. At a bit rate of 7.14 Gbit/s, the received eye width at 10^-12 BER is 25 ps (0.625 UI) without preemphasis and 28 ps (0.7 UI) with preemphasis. Fig. 23 shows the preemphasized eyes at the cable end, captured with an Agilent N4903 BERT. Note that the UI (T = 40 ps) is the same in the measurements at both bit rates, and each eye diagram spans two UIs.
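One way to see where a boost of about 2 dB can come from is a two-tap feedforward model. This is an assumption, not the paper's driver circuit: the output is modeled as y(t) = x(t) - G·x(t - τ) with the reported G = 0.125 and a hypothetical delay τ = T_ref = 160 ps, and the response is normalized to its DC value. The function name is illustrative.

```python
import math


def boost_db(f_hz: float, g: float = 0.125, tau_s: float = 160e-12) -> float:
    """|H(f)| / |H(0)| in dB for the assumed H(f) = 1 - g*exp(-j*2*pi*f*tau)."""
    w = 2 * math.pi * f_hz * tau_s
    mag = math.hypot(1 - g * math.cos(w), g * math.sin(w))
    return 20 * math.log10(mag / (1 - g))


if __name__ == "__main__":
    for f_ghz in (0.5, 1.67, 3.125):
        print(f"{f_ghz:5.3f} GHz: {boost_db(f_ghz * 1e9):.2f} dB")
```

Under these assumptions the boost peaks where 2πfτ = π, i.e., at 1/(2τ) = 3.125 GHz, with a value of 20·log10((1 + G)/(1 - G)) ≈ 2.2 dB, in line with the ~2 dB quoted for Fig. 21.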
C. Transmitter Performance
D. Receiver Performance
The timing resolution of the TDC determines the tolerance for additional random jitter (RJ) and deterministic jitter (DJ) sources. Symbol recovery is corrupted if the peak jitter in the DPWM signal exceeds 40 ps. The receiver BER bathtub measurements and recovered data are shown in Fig. 24. Fig. 25 shows the receiver performance versus differential input swing. In general, the measured time margin is degraded by ∼0.3 UI relative to the simulated performance. To explain this degradation, several major DJ sources should be considered. DJ_TDC denotes the simulated timing degradation caused by the actual SOI circuit implementation. DJ_BERT is the data jitter of the pattern generator, which increases as the data swing decreases. DJ_CH represents the cable/connector phase mismatches. Notably, the N4903A BERT incurs significant jitter when generating a data swing of ∼200 mV; therefore, the receiver performance is compared at a data swing of 240 mV. The simulated and measured time margins at BER < 10^-12 are 0.9 and 0.625 UI, respectively. Assuming that DJ sources other than the phase mismatches are negligible, the receiver tolerates additional RJ of 1.78 ps (0.625 UI/14) at 10^-12 BER. Table II compares the performance against previous 10-Gbit/s PWM, DPWM, and NRZ circuit demonstrations.
Fig. 23. Original and preemphasized DPWM measurements through the cable at 10 Gbit/s (top) and 7.14 Gbit/s (bottom).
VII. CONCLUSION
This paper presents a high-speed serial I/O scheme based on DPWM. DPWM signaling offers double the bit rate of PWM. A low-jitter DPWM transceiver architecture is introduced to achieve a finer timing resolution for bit rate enhancement. The TX presents an eye opening of 0.78 UI at 10^-12 BER. The preemphasis results show that this common feedforward technique is also helpful for DPWM: the preemphasized DPWM improves the eye opening by 0.15 UI through a 120-in cable.
The RX time margin at 10^-12 BER is experimentally verified versus input swing. Unlike conventional DPWM, which is restricted to asynchronous applications, the adaptive 8-/9-bit scheme tracks the frequency deviation and enables DPWM for synchronous data links. The 8-/9-bit encoder and decoder achieve a 10-Gbit/s data rate while consuming 1.5 mW, demonstrating the hardware feasibility of the scheme.
Wei Wang (S'12) received the B.S. degree in electrical engineering from Chang Gung University, Taoyuan, Taiwan, the M.S. degree in electronics engineering from National Taiwan University, Taipei, Taiwan, and the Ph.D. degree in electrical engineering from the University of California at San Diego, La Jolla, CA, USA.
His current research interests include phase-locked loops and high-speed interface circuits. 
