A duobinary transmitter (TX) in a single-ended topology using voltage-mode drivers is presented to support a dynamic random access memory (DRAM) interface. A four-phase parallel duobinary precoder is included. It relaxes one of the critical timing requirements of the duobinary TX by reducing the number of feedback steps of the precoder and performing the feedback at once. Fabricated in a 55 nm CMOS process, the TX occupies an active area of 0.053 mm². The TX achieves 10 Gbps operation at an energy efficiency of 6.8 pJ/b and operates up to 12.8 Gbps at 7.6 pJ/b.
Introduction: The data rate of DDR5 and LPDDR5 has increased to 6.4 Gbps, and the next generation of dynamic random access memory (DRAM) is expected to operate at data rates above 10 Gbps. At these data rates, frequency-dependent channel loss is the most common cause of inter-symbol interference. Duobinary signalling can theoretically achieve a data rate of twice the channel bandwidth [1], so it is one candidate for mitigating this frequency-dependent attenuation. Applying duobinary signalling to a DRAM interface requires single-ended signalling with a voltage-mode driver to provide backward compatibility [2]. When a duobinary sequence is received and decoded, a single bit error propagates to the subsequent data. Especially for a DRAM interface, which has bus idle time between data transmissions [3], the undefined data can corrupt the data sequence to be decoded. This issue can be resolved by implementing a precoder at the transmitter (TX) side to precode the data sequence before transmission [1]. The feedback time of the precoder should be minimised because it can limit the maximum operating speed.
Previous duobinary TXs [1, 4] were implemented with current-mode drivers and differential signalling, which may not be suitable for a DRAM interface. A single-ended current-mode duobinary TX operating up to 7 Gbps is presented in [5], but it does not implement a precoder. As mentioned earlier, the bus idle time of a DRAM interface makes a precoder necessary, and the stringent timing constraint of its feedback loop must be considered for higher data-rate operation.
In this Letter, we propose a single-ended voltage-mode duobinary TX including a four-phase parallel precoder. The feedback time of the precoder is minimised by reducing the number of feedback steps and performing the feedback at once.
Duobinary signalling: A duobinary signal can be generated by summing the present bit and the previous bit of a binary sequence. It can also be produced by a feed-forward equaliser (FFE) at the TX side so that the signal at the receiver input is duobinary. The three-level duobinary receiver converts the incoming duobinary signal ($Y_n$) to the original binary signal ($D_n$) by performing the following function: $D_n = Y_n - D_{n-1}$. However, if an error occurs, the subsequent data are affected successively. To prevent this error propagation, a precoder should be implemented on the TX side of a duobinary system. In general, a precoder performs the following function:
$$X_n = D_n \oplus X_{n-1} \qquad (1)$$
where $D_n$ and $X_n$ denote the data sequences before and after precoding, respectively, and the operator $\oplus$ denotes the XOR function. The resulting precoded duobinary signal can be decoded without depending on the previous decision, such that
$$D_n = Y_n \bmod 2$$
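The effect of precoding on error propagation can be illustrated with a short behavioural model. This is a minimal sketch, not the circuit itself: the function names, the initial state of 0, and the example data are assumptions made for illustration, and the duobinary levels are modelled as the integers 0, 1, and 2.

```python
def precode(bits, x_prev=0):
    """Serial precoder: X[n] = D[n] XOR X[n-1]."""
    out = []
    for d in bits:
        x_prev = d ^ x_prev
        out.append(x_prev)
    return out


def duobinary(bits, prev=0):
    """Three-level duobinary signal: Y[n] = B[n] + B[n-1]."""
    out = []
    for b in bits:
        out.append(b + prev)
        prev = b
    return out


def decode_with_feedback(levels, d_prev=0):
    """Decoder for a non-precoded stream: D[n] = Y[n] - D[n-1]."""
    out = []
    for y in levels:
        d_prev = y - d_prev
        out.append(d_prev)
    return out


def decode_memoryless(levels):
    """Decoder for a precoded stream: D[n] = Y[n] mod 2."""
    return [y % 2 for y in levels]


def count_errors(a, b):
    return sum(1 for u, v in zip(a, b) if u != v)


data = [1, 0, 1, 1, 0, 0, 1, 0]  # arbitrary example pattern

# Inject one level error at the same position in both streams.
plain = duobinary(data)
plain[2] = 2
coded = duobinary(precode(data))
coded[2] = 2

print("errors without precoder:", count_errors(decode_with_feedback(plain), data))
print("errors with precoder:   ", count_errors(decode_memoryless(coded), data))
```

In this model the single level error corrupts every later decision of the feedback decoder, whereas the modulo-2 decoder of the precoded stream gets only the affected bit wrong.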
A precoder is placed either before or after the output serialiser. When the precoder is implemented after serialisation (serial precoder), a full-rate clock is required because the precoding must be applied to full-rate serialised data. Therefore, the serial precoder is not suitable for a quarter-rate clocking system. The alternative is a parallel precoder, which precodes the parallel data before serialisation. A four-phase parallel precoder using a pre-calculation block was presented and simulated in [6]. Its feedback is performed in two steps to output two bits at a time. The timing constraint of that parallel precoder is $T_{qclk} = 4T_b > T_{d\_dff} + 2T_{d\_mux} + T_{setup}$, where $T_{qclk}$ is the quarter-rate clock period, $T_b$ is the output data bit time, $T_{d\_dff}$ is the clock-to-Q delay of a D flip-flop, $T_{d\_mux}$ is the propagation delay of a 2-to-1 MUX, and $T_{setup}$ is the setup time of a D flip-flop.
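A constraint of this form translates directly into an upper bound on the bit rate. The sketch below shows the arithmetic only; the function name and the delay values are hypothetical and are not taken from the measured design. The number of MUX delays in the feedback loop is left as a parameter so that loops of different depth can be compared.

```python
def max_bit_rate_gbps(t_dff_ps, t_mux_ps, t_setup_ps, n_mux=2, phases=4):
    """Bit-rate bound implied by phases * T_b > T_dff + n_mux * T_mux + T_setup.

    All delays are in picoseconds; the result is in Gbps.
    """
    loop_ps = t_dff_ps + n_mux * t_mux_ps + t_setup_ps
    min_bit_time_ps = loop_ps / phases
    return 1000.0 / min_bit_time_ps


# Hypothetical delays, chosen only to illustrate the calculation.
T_DFF, T_MUX, T_SETUP = 120, 60, 40

for n in (2, 1):
    rate = max_bit_rate_gbps(T_DFF, T_MUX, T_SETUP, n_mux=n)
    print(f"{n} MUX delay(s) in the loop: below {rate:.1f} Gbps")
```

With these assumed numbers, each MUX delay removed from the loop shortens the minimum bit time by a quarter of that delay, because the loop has four bit times to settle.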
Architecture: Fig. 1 shows the overall architecture of our duobinary TX with single-ended source-series terminated (SST) drivers and a four-phase precoder. A clock buffer (CK BUF) receives the external differential clock input (CLKP/CLKN), and four-phase quarter-rate clock signals are generated by an IQ divider (IQ DIV) and a single-to-differential converter. Thirty-two-bit parallel data from an on-chip PRBS generator are serialised to 4-bit sequences by a 32:4 serialiser. A four-phase precoder encodes these 4-bit parallel data into another 4-bit parallel sequence. The 4-bit output of the precoder is re-timed and re-arranged into five groups of 4-bit sequences by an FFE retimer. When the TAP_SIGN signal for an FFE tap is asserted, the tap coefficient is negative and the corresponding FFE retimer output is inverted. The output stage has 32 segments, each composed of a 4:1 serialiser and an SST driver. The skew of each four-phase clock signal to the output stage is controlled by a delay cell with a resolution of 2 ps, and a total skew of 14 ps can be adjusted. Except for two segments dedicated to the last tap (TAP4), each output segment can be allocated to any FFE tap from TAP0 to TAP3 by the TAP_SEL signal, and the tap coefficients are defined by the DRV_EN signal and the above-mentioned TAP_SIGN signal.
Feedback time reduced parallel precoder: For a quarter-rate clocking system, the precoding operation must be performed on 4-bit parallel data, and the precoding definition of (1) can be expanded as follows by applying the operation recursively:
$$\begin{aligned}
X_n &= D_n \oplus X_{n-1} \\
X_{n+1} &= D_{n+1} \oplus D_n \oplus X_{n-1} \\
X_{n+2} &= D_{n+2} \oplus D_{n+1} \oplus D_n \oplus X_{n-1} \\
X_{n+3} &= D_{n+3} \oplus D_{n+2} \oplus D_{n+1} \oplus D_n \oplus X_{n-1}
\end{aligned} \qquad (2)$$
Since the fourth output of the previous cycle ($X_{n-1}$) is known, all four outputs of the present cycle ($X_n$ to $X_{n+3}$) can be determined. Equation (2) can alternatively be evaluated through one of the two separate operations presented in Table 1, according to whether $X_{n-1}$ is '0' or '1'.
Our parallel precoder of Fig. 2 performs cumulative XOR operations on the incoming parallel data ($D_n$ to $D_{n+3}$) and outputs 4-bit parallel data according to the feedback of the fourth output ($X_{n-1}$) of the previous quadrature clock cycle. At the input stage of each data path, identical four-input XOR gates are used to match the delays, and the unused inputs of each XOR gate are tied to VSS. The timing constraint of the proposed four-phase parallel precoder is $T_{qclk} = 4T_b > T_{d\_dff} + T_{d\_mux} + T_{setup}$, which is relaxed by $T_{d\_mux}$ compared with that of the precoder in [6]. Our precoder uses four fewer DFFs and four fewer latches at the cost of four additional two-input XOR gates.
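The behaviour described above can be checked against a bit-serial precoder with a small model. This is a behavioural sketch under stated assumptions, not a description of the gates in Fig. 2: it assumes that each lane selects between its cumulative XOR and the complement of that value according to the fed-back bit, and the function names and the initial state are hypothetical.

```python
from itertools import accumulate
from operator import xor


def serial_precode(bits, x_prev=0):
    """Reference bit-serial precoder: X[n] = D[n] XOR X[n-1]."""
    out = []
    for d in bits:
        x_prev ^= d
        out.append(x_prev)
    return out


def parallel_precode(bits, x_prev=0):
    """Four bits per quarter-rate cycle with a single feedback step."""
    if len(bits) % 4 != 0:
        raise ValueError("input length must be a multiple of 4")
    out = []
    for i in range(0, len(bits), 4):
        # Feed-forward part: cumulative XORs of the four incoming bits.
        # This does not depend on the previous cycle.
        cum = list(accumulate(bits[i:i + 4], xor))
        # Feedback part (assumed structure): every lane picks the true or
        # the complemented cumulative XOR, all four lanes at once.
        lanes = [c ^ 1 if x_prev else c for c in cum]
        out.extend(lanes)
        x_prev = lanes[3]  # fourth output feeds the next cycle
    return out


example = [1, 0, 1, 1, 0, 0, 1, 0]  # arbitrary example pattern
assert parallel_precode(example) == serial_precode(example)
print(parallel_precode(example))
```

Because the cumulative XORs are computed ahead of the feedback, only the final selection sits inside the loop in this model, which corresponds to the single MUX delay in the relaxed timing constraint.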
Fig. 2 Proposed parallel duobinary precoder and its timing diagram
Measurement results: The duobinary TX is fabricated in a 55 nm CMOS process, and the total active area is 0.053 mm², including the clock buffer and the IQ divider. Fig. 3 shows the measurement setup and the die micrograph with a magnified layout. A half-rate differential input clock signal is applied through a single-to-differential converter, and four-phase quarter-rate clock signals are generated inside the chip. The $S_{21}$ response of the test channel, which consists of a 3-inch FR4 PCB trace, an SMA connector, and a 35-inch SMA cable, is shown in Fig. 4. The TX achieves 10 Gbps operation at an energy efficiency of 6.8 pJ/b and operates up to 12.8 Gbps at 7.6 pJ/b. The energy efficiency degrades at data rates of 11 Gbps and above because the channel loss increases sharply from 3.7 GHz, which is the Nyquist frequency of an 11 Gbps duobinary signal. (The Nyquist frequency of a duobinary signal is $1/(3T_b)$ [1].) Measured eye diagrams through the test channel are shown in Fig. 5 for the duobinary output signal without equalisation (when the tap coefficients of T0 and T1 are both 0.5) and with equalisation. Using a three-tap FFE with tap coefficients of 0.833, −0.125, and 0.042 at a data rate of 12.8 Gbps, the eye opening is 35 mV and 25 ps (0.32 UI). Table 2 compares the performance of our prototype with that of other duobinary TXs. The proposed TX is implemented in a single-ended SST topology and includes a quarter-rate parallel precoder.
