Abstract: The use of multi-tone signaling can avoid the high loss of FR4 traces beyond 10 GHz while confining the task of equalization to 2.5-GHz-wide subchannels. This paper presents multi-tone transceiver design issues and derives performance requirements such as linearity, sensitivity, quadrature mismatches, and phase noise. Possible transmitter and receiver implementations are described and a critical interference effect resulting from local oscillator harmonics is identified.
I. INTRODUCTION
High-speed data transmission on printed-circuit boards (PCBs) or cables in servers and networking equipment presents new challenges as data rates approaching 20 Gb/s are sought. The use of binary signaling at these rates entails two difficult issues. First, with a loss of about 30 dB at 10 GHz and 60 dB at 20 GHz, a 30-in trace on FR4 boards would require an extremely high level of equalization. Second, transmit (TX) and receive (RX) operations such as multiplexing (MUX) and demultiplexing (DEMUX), phase-locked loops (PLLs), and clock and data recovery (CDR) circuits tend to consume high power at 20 Gb/s.
An alternative approach based on the "divide-and-conquer" principle is to employ multi-tone signaling [1], as originally employed in DSL systems and later adopted in wireless communication in the form of orthogonal frequency-division multiplexing (OFDM). The advantages of this approach include lighter equalization, since each subchannel spans only a narrow band, and signal processing at lower speeds. However, many other aspects of transceiver performance that are typically unimportant in binary systems become critical here.
This paper presents an analysis of multi-tone signaling at 20 Gb/s over FR4 traces and derives the transceiver performance requirements. The simulations are performed in Simulink and include a lossy transmission line model obtained by electromagnetic simulations and circuit element approximations. Section II describes a representative system architecture and the design constraints that it entails. Section III deals with the TX design and Section IV with the RX design. Section V presents harmonic and interference issues resulting from broadband operation of the system and Section VI provides the PLL requirements. Section VII summarizes the simulation results.
II. TRANSCEIVER ARCHITECTURE
Unlike DSL and wireless systems, 20-Gb/s transceivers are likely to incorporate analog multi-tone signaling. That is, the modulation of each subcarrier and the summation of the subchannels in the TX must be performed in the analog domain. Similarly, the decomposition and demodulation of each of the subchannels in the RX must occur in analog form. If realized in the digital domain, multi-tone signaling would require 10-GHz DACs and ADCs with resolutions on the order of 6 bits. Figure 1(a) shows a representative architecture. The data to be transmitted appears as four 5-Gb/s binary streams; each stream is modulated onto a subcarrier, f_cj, and the resulting four subchannels are summed at the output. On the RX side, each subchannel is downconverted to baseband and demodulated, producing the corresponding 5-Gb/s data. Each subchannel utilizes 16QAM to occupy a bandwidth of 2.5 GHz so that the link avoids the lossy band beyond 10 GHz [Fig. 1(b)].
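As a quick check of the numbers above, the sketch below derives the per-subchannel rates and a non-overlapping carrier plan. It assumes that each 16QAM subchannel occupies a null-to-null bandwidth of twice its symbol rate and that carriers sit at odd multiples of half that bandwidth, which reproduces the odd-multiple-of-1.25-GHz plan described in Section VI. The function name `subcarrier_plan` is illustrative only.

```python
# Minimal sketch of the four-tone carrier plan (assumption: null-to-null
# passband bandwidth = 2 x symbol rate, carriers spaced by that bandwidth).
TOTAL_RATE_GBPS = 20.0
NUM_SUBCHANNELS = 4
BITS_PER_SYMBOL = 4  # 16QAM carries 4 bits per symbol


def subcarrier_plan(total_rate=TOTAL_RATE_GBPS, n=NUM_SUBCHANNELS, bits=BITS_PER_SYMBOL):
    """Return (bit rate per stream [Gb/s], symbol rate [GBd], bandwidth [GHz], carriers [GHz])."""
    rate = total_rate / n      # 5 Gb/s per stream
    baud = rate / bits         # 1.25 GBd per subchannel
    bw = 2 * baud              # double-sideband main lobe: 2.5 GHz
    # Non-overlapping spectra: first carrier at bw/2, then one carrier every bw.
    carriers = [bw / 2 + k * bw for k in range(n)]
    return rate, baud, bw, carriers


rate, baud, bw, carriers = subcarrier_plan()
print(rate, baud, bw, carriers)                        # 5.0 1.25 2.5 [1.25, 3.75, 6.25, 8.75]
print("upper band edge:", carriers[-1] + bw / 2, "GHz")  # upper band edge: 10.0 GHz
```

The highest subchannel ends exactly at 10 GHz, which is why the link stays clear of the lossy region of the trace.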
With a 30-in trace, each subchannel in the above system experiences a "tilt" of at most 13 dB, thus requiring only simple equalization. Furthermore, MUX/DEMUX and CDR operations run at only 5 Gb/s and the PLLs at no more than 8.75 GHz. Additionally, in the presence of a deep notch in the trace frequency response, the corresponding subchannel can be moved above 10 GHz, carry a lower data rate, or employ decision-feedback equalization. However, the transceiver now closely resembles an RF communication system, facing similar sensitivity, linearity, and precision issues. Moreover, unlike typical RF designs, the broadband nature of the serial link introduces additional harmonic and interference effects that impact the performance.
While the spectra shown in Fig. 1(b) have little overlap, multi-tone signaling in fact allows 50% overlap if the symbol period is an integer multiple of the subcarrier period [2]. That is, the overall bandwidth can be further reduced to 6.25 GHz if f_cj = j × 1.25 GHz. Unfortunately, such a scheme would dictate high quadrature precision and very low phase noise so as to retain orthogonality of the subchannels. For this reason, only the case of nonoverlapping spectra is considered here. We now describe the details of this architecture and the reasons for the design choices made here.
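The orthogonality condition behind the 50%-overlap scheme can be checked numerically. The sketch below integrates the product of two carriers over one 0.8-ns symbol period (1/1.25 GBd) with a midpoint rule; carriers at integer multiples of 1.25 GHz come out orthogonal, while an arbitrary spacing (1 GHz and 2 GHz here, chosen only for contrast) does not. It considers unmodulated cosine carriers only, which is a simplification.

```python
import math

T_SYM = 0.8e-9   # symbol period for 1.25 GBd
F_STEP = 1.25e9  # overlapped plan: f_cj = j * 1.25 GHz


def inner_product(f1, f2, T=T_SYM, steps=2000):
    """Midpoint-rule integral of cos(2*pi*f1*t)*cos(2*pi*f2*t) over [0, T],
    normalized so that identical carriers with f*T integer give 1."""
    dt = T / steps
    acc = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        acc += math.cos(2 * math.pi * f1 * t) * math.cos(2 * math.pi * f2 * t)
    return acc * dt / (T / 2)


print(f"{inner_product(F_STEP, F_STEP):.3f}")           # 1.000
print(f"{abs(inner_product(F_STEP, 2 * F_STEP)):.3f}")  # 0.000 (orthogonal despite overlap)
print(f"{inner_product(1.0e9, 2.0e9):.3f}")             # -0.150 (not orthogonal)
```

Any quadrature or frequency error perturbs these integer relationships, which is why the overlapped scheme is so sensitive to I/Q precision and phase noise.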
III. TRANSMITTER DESIGN

A. Number of Subchannels
The number of subchannels presents a trade-off between the required equalization, the number of PLLs and CDR circuits, and the TX/RX linearity. As the number increases, the first aspect is relaxed at the cost of exacerbating the last two.
A distinct disadvantage of multi-tone signaling is the large peak-to-average ratio (PAR) that arises as the number of subchannels increases, which dictates the use of high-linearity power amplifiers. Several techniques have been proposed in the literature to combat the nonlinear effects of power amplifiers in OFDM systems [3]. With only a few subchannels, however, the average transmitted power can simply be backed off by the signal's PAR from the maximum power that can be transmitted with high linearity. Figure 2 plots the error vector magnitude (EVM) as a function of the transmitter output 1-dB compression point (A_1dB) for a TX swing of 1.2 Vpp and four subchannels. We observe that A_1dB ≈ 2 Vpp degrades the EVM negligibly. (An EVM of 5% is required for BER = 10^−14.) This level of linearity proves feasible, indicating that the choice of four subchannels is a good compromise.
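To illustrate how PAR grows with the number of subchannels, the sketch below sums N equal-amplitude, unmodulated carriers at the 1.25 + k × 2.5 GHz positions and measures max(x²)/mean(x²). This is a simplification: real 16QAM subchannels have their own PAR, so the figures bound only the carrier-alignment part of the problem. When all carriers align in phase, the peak reaches N times a single carrier while the mean power is only N/2, giving the worst case 10·log10(2N).

```python
import math
import random


def par_db(samples):
    """Peak-to-average power ratio in dB: max(x^2) / mean(x^2)."""
    peak = max(x * x for x in samples)
    avg = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(peak / avg)


def multitone(phases, f0=1.25, spacing=2.5, fs=200.0, duration=4.0):
    """Equal-amplitude cosines at f0 + k*spacing GHz with the given phases (rad).
    Time in ns, fs in GS/s; 4 ns spans an integer number of periods of every tone."""
    n_samples = int(fs * duration)
    return [
        sum(math.cos(2 * math.pi * (f0 + k * spacing) * (i / fs) + p)
            for k, p in enumerate(phases))
        for i in range(n_samples)
    ]


# Worst case: all carriers in phase, so PAR = 10*log10(2N).
for n in (1, 2, 4, 8):
    print(n, round(par_db(multitone([0.0] * n)), 2))  # 3.01, 6.02, 9.03, 12.04

# One arbitrary draw of random phases stays at or below the worst case.
rng = random.Random(0)
print(round(par_db(multitone([rng.uniform(0, 2 * math.pi) for _ in range(4)])), 2))
```

Each doubling of the subchannel count adds about 3 dB to the worst-case PAR, and hence to the required back-off.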
B. Modulation
The system of Fig. 1(a) both performs a baseband-to-passband conversion and reduces the overall signal bandwidth from 20 GHz to 10 GHz. Because upconversion doubles the occupied bandwidth, the system must employ a modulation scheme that provides an overall bandwidth reduction factor of four, e.g., 16PSK or 16QAM. The latter exhibits a 4.2-dB SNR advantage over the former and is chosen here. Nonetheless, for a bit error rate (BER) of 10^−14, 16QAM necessitates an SNR of 25 dB [4], 13 dB higher than that required for binary signaling. Figure 3 illustrates a possible realization of the 16QAM modulator. PAM4 signals with square pulses carry substantial energy in their sidelobes (≈10 dB below the main lobe), corrupting adjacent subchannels. For this reason, pulse-shaping filters must be interposed between each PAM4 modulator and the upconversion mixer so as to suppress the sidelobes.
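The 16QAM-versus-16PSK comparison above can be reproduced from constellation geometry. Consistent with the PAM4-based modulator of Fig. 3, the sketch builds 16QAM from two PAM4 rails (the Gray mapping table is an assumption for illustration). It then compares minimum Euclidean distances after normalizing both constellations to unit average energy, a standard high-SNR approximation of the SNR gap.

```python
import itertools
import math

# Assumed Gray-coded PAM4 mapping: two bits -> one of four levels.
PAM4_GRAY = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def qam16_symbol(b3, b2, b1, b0):
    """Map four bits to a 16QAM point as two independent PAM4 rails (I, Q)."""
    return complex(PAM4_GRAY[(b3, b2)], PAM4_GRAY[(b1, b0)])


def min_distance_unit_energy(points):
    """Minimum Euclidean distance after scaling to unit average symbol energy."""
    scale = math.sqrt(sum(abs(p) ** 2 for p in points) / len(points))
    pts = [p / scale for p in points]
    return min(abs(a - b) for a, b in itertools.combinations(pts, 2))


qam16 = [qam16_symbol(*bits) for bits in itertools.product((0, 1), repeat=4)]
psk16 = [complex(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16))
         for k in range(16)]

advantage_db = 20 * math.log10(min_distance_unit_energy(qam16) / min_distance_unit_energy(psk16))
print(f"16QAM d_min advantage over 16PSK: {advantage_db:.1f} dB")  # 4.2 dB
```

The 16QAM minimum distance is 2/√10 ≈ 0.632 versus 2·sin(π/16) ≈ 0.390 for 16PSK, which accounts for the 4.2-dB figure.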
Raised-cosine filters ideally exhibit no energy beyond the main lobe and are thus suited to this design, but they introduce intersymbol interference (ISI) for small roll-off factors in a dispersive channel. A roll-off factor of 0.8 is chosen to achieve small ISI and moderate inter-channel interference (ICI).
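A minimal sketch of the ideal raised-cosine frequency response, evaluated for the 1.25-GBd symbol rate and the 0.8 roll-off chosen above. The baseband response vanishes beyond (1 + α)/(2T) = 1.125 GHz, so each upconverted subchannel occupies 2.25 GHz and leaves a small guard band within the 2.5-GHz channel spacing. This sketch describes the ideal (infinitely long) filter; practical filters are truncated and leak slightly.

```python
import math


def raised_cosine(f, T, alpha):
    """Ideal raised-cosine frequency response H(f) with peak value T; f in Hz."""
    af = abs(f)
    f1 = (1 - alpha) / (2 * T)  # end of the flat region
    f2 = (1 + alpha) / (2 * T)  # end of the roll-off region
    if af <= f1:
        return T
    if af <= f2:
        return 0.5 * T * (1 + math.cos(math.pi * T / alpha * (af - f1)))
    return 0.0


T = 0.8e-9   # symbol period for 1.25 GBd
ALPHA = 0.8  # roll-off factor chosen in the text
edge = (1 + ALPHA) / (2 * T)  # baseband band edge: 1.125 GHz
print(f"passband occupancy: {2 * edge / 1e9:.3f} GHz")  # 2.250 GHz
```

Smaller roll-off factors would widen the guard band further but, as noted above, at the cost of more ISI in the dispersive channel.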
The use of 16QAM also places stringent upper limits on the quadrature imbalances in Fig. 3. According to simulations, for a 1-dB SNR penalty, the phase mismatch and gain mismatch must remain below 1° and 1%, respectively. These values would require quadrature calibration in both the TX and the RX, especially for subcarrier frequencies as high as 8.75 GHz.
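A quick way to see what 1° and 1% mean is the standard image-rejection ratio (IRR) of a quadrature mixer pair: an I/Q imbalance leaves a mirror (conjugate) component of the signal whose relative power is 1/IRR. The sketch below evaluates that textbook expression for a single mixer pair. It is only a partial view, since the 1-dB penalty quoted above comes from simulations that include both TX and RX impairments.

```python
import math


def image_rejection_db(gain_error, phase_error_deg):
    """Standard image-rejection ratio of a quadrature mixer pair, in dB.
    gain_error is fractional (0.01 = 1%), phase_error_deg in degrees."""
    g = 1 + gain_error
    c = math.cos(math.radians(phase_error_deg))
    return 10 * math.log10((1 + 2 * g * c + g * g) / (1 - 2 * g * c + g * g))


print(f"{image_rejection_db(0.01, 1.0):.1f} dB")  # 40.0 dB (1% and 1 degree)
print(f"{image_rejection_db(0.01, 0.0):.1f} dB")  # 46.1 dB (gain error only)
```

The expression is symmetric in the two errors' roles in the sense that either one alone limits the rejection, which is why calibration must address both.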
C. Discrete Pre-Emphasis
We introduce the concept of "discrete pre-emphasis" as a TX function that relaxes the RX requirements. Consider the scenario illustrated in Fig. 4(a), where the subchannels are launched with equal levels, each having a differential swing of 0.3 Vpp. For a 30-in trace, the fourth subchannel is received at a level of 10 mVpp, dictating an RX noise figure (NF) of 19 dB. Now, suppose the subchannel swings are chosen as shown in Fig. 4(b) so that the total TX output level remains unchanged but the received subchannel swings are equal. The RX noise figure can thus be relaxed by 7.2 dB. Unfortunately, this choice of launched swings presents serious RX linearity issues for short traces. For a zero-length trace, the subchannel swings differ by about 23 dB, mandating an RX third-order intercept point (IIP3) of +19 dBm. As a compromise, the discrete pre-emphasis can be chosen such that the received levels are equal for mid-length (15-in) traces [Fig. 4(c)]. Such a choice relaxes the NF by 4.5 dB while imposing an IIP3 of +15.3 dBm.
IV. RECEIVER DESIGN
As mentioned in Section II, the receiver comprises four independent paths. Each path can be realized as a direct-conversion receiver consisting of a low-noise amplifier (LNA) with a 2.5-GHz bandwidth centered at the subcarrier frequency, quadrature downconversion mixers, low-pass filters (LPFs), two 2-bit 1.25-GHz ADCs, and a multiplexer. As explained below, variable-gain stages are necessary to accommodate different trace lengths. For a 30-in trace, the receiver must deliver an SNR of 25 dB for an input level of 16 mVpp (with discrete pre-emphasis) and a bandwidth of 2.5 GHz. This translates to an NF of 23.5 dB. Under these conditions, the IIP3 must exceed −8 dBm to minimize the corruption of the weakest subchannel by the strongest. Also, a maximum gain of 28 dB is necessary to raise the swing to the full scale of the ADC (assumed to be 400 mV here).
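The noise-figure numbers above follow from a standard sensitivity budget, NF = P_in − SNR_min − (−174 dBm/Hz + 10·log10 B). The sketch below assumes that a peak-to-peak swing is converted to power as a sinusoid into 50 Ω; that convention is our assumption, and the function names are illustrative.

```python
import math

KT_DBM_PER_HZ = -174.0  # thermal noise density at about 290 K


def vpp_to_dbm(vpp, r_ohm=50.0):
    """Power (dBm) of a sinusoid with peak-to-peak swing vpp into r_ohm (assumed convention)."""
    vrms = vpp / (2 * math.sqrt(2))
    return 10 * math.log10(vrms ** 2 / r_ohm / 1e-3)


def max_noise_figure_db(vpp_in, snr_db=25.0, bw_hz=2.5e9):
    """Largest NF that still leaves snr_db: NF = Pin - SNR - (kT + 10*log10(B))."""
    noise_floor_dbm = KT_DBM_PER_HZ + 10 * math.log10(bw_hz)
    return vpp_to_dbm(vpp_in) - snr_db - noise_floor_dbm


for vpp in (10e-3, 16e-3, 0.15):
    print(f"{vpp * 1e3:6.1f} mVpp -> NF <= {max_noise_figure_db(vpp):.1f} dB")
```

Under this convention, 10 mVpp gives 19.0 dB and 0.15 Vpp (the weakest subchannel of a zero-length trace) gives about 42.5 dB, in line with the figures in the text. The 16-mVpp case gives about 23.1 dB against the quoted 23.5 dB, a small gap that likely reflects rounding or a slightly different power convention.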
With a zero-length trace, on the other hand, the received swings vary from 0.15 Vpp to 0.47 Vpp, allowing an NF of 43 dB while necessitating an IIP3 of +15 dBm. It is therefore necessary to lower the LNA gain or bypass the LNA entirely. Similarly, the baseband gain must be reduced to avoid saturating the ADC. The overall gain in this case equals −1.4 dB for the strongest subchannel.
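The gain endpoints quoted for the 30-in and zero-length cases follow from scaling the received swing to the 400-mV ADC full scale. The sketch below (function name illustrative) computes both endpoints and the resulting variable-gain range.

```python
import math

ADC_FULL_SCALE_VPP = 0.4  # 400-mV ADC full scale assumed in the text


def required_gain_db(vin_vpp, full_scale_vpp=ADC_FULL_SCALE_VPP):
    """Voltage gain (dB) that scales a received swing to the ADC full scale."""
    return 20 * math.log10(full_scale_vpp / vin_vpp)


g_max = required_gain_db(16e-3)  # weakest subchannel, 30-in trace
g_min = required_gain_db(0.47)   # strongest subchannel, zero-length trace
print(f"max gain {g_max:+.1f} dB, min gain {g_min:+.1f} dB, range {g_max - g_min:.1f} dB")
# max gain +28.0 dB, min gain -1.4 dB, range 29.4 dB
```

The roughly 29-dB spread is what the variable-gain stages must cover across trace lengths.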
V. LOCAL OSCILLATOR HARMONICS
The simultaneous transmission and reception of four broadband signals must deal with an issue rarely encountered in RF design. This issue arises because the abrupt switching inherent in efficient mixers translates the spectrum not only to the local oscillator (LO) frequency but also to its harmonics. Illustrated in Fig. 5(a) for the odd harmonics of the LO, this phenomenon corrupts the higher subchannels with attenuated replicas of the first. Unfortunately, the natural attenuation factors (which follow a sinc envelope) prove inadequate. For example, the replica at 3f_c1 = 3.75 GHz is only 10 dB lower and, even with the discrete pre-emphasis shown in Fig. 4(c), still falls only 12 dB below the power of the second subchannel. A similar effect appears in the receiver. As depicted in Fig. 5(b), the harmonics of f_c1 downconvert the higher subchannels to baseband, thereby corrupting the first subchannel.
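The sketch below maps the odd harmonics of each LO onto the subchannel plan of Fig. 1(b). It assumes an ideal 50%-duty-cycle square-wave LO, whose nth odd harmonic has 1/n the amplitude of the fundamental. With f_c1 = 1.25 GHz, the 3rd, 5th, and 7th harmonics land exactly on the centers of subchannels 2, 3, and 4. By contrast, 3f_c2 = 11.25 GHz already falls outside the 10-GHz band.

```python
import math

F_C = [1.25, 3.75, 6.25, 8.75]  # subcarriers in GHz, Fig. 1(b)
HALF_BW = 1.25                  # each subchannel spans f_c +/- 1.25 GHz
BAND_EDGE = 10.0                # GHz


def harmonic_hits(f_lo, max_harmonic=9):
    """List (n, n*f_lo, subchannel index, attenuation in dB) for odd LO harmonics
    that fall inside a subchannel, assuming an ideal square-wave LO (amplitude 1/n)."""
    hits = []
    for n in range(3, max_harmonic + 1, 2):
        f = n * f_lo
        if f >= BAND_EDGE:
            break
        atten_db = 20 * math.log10(n)
        for j, fc in enumerate(F_C):
            if abs(f - fc) < HALF_BW:
                hits.append((n, f, j + 1, atten_db))
    return hits


for n, f, ch, att in harmonic_hits(F_C[0]):
    print(f"{n} x f_c1 = {f:.2f} GHz lands on subchannel {ch}, {att:.1f} dB below the fundamental")
print(harmonic_hits(F_C[1]))  # [] since 3 x 3.75 GHz = 11.25 GHz is out of band
```

The 9.5-dB attenuation of the third harmonic matches the "only 10 dB lower" replica noted above.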
The problem of LO harmonics cannot be easily solved by bandpass filtering because the required selectivities translate to relatively complex and power-hungry filters. Since this issue arises only for f_c1 (the third harmonic of f_c2, at 11.25 GHz, already lies beyond the 10-GHz signal band), it is possible to employ baseband PAM4 signaling for the first subchannel, thereby avoiding the use of f_c1 altogether and saving one PLL as well.
VI. PHASE-LOCKED LOOPS
The subcarrier frequencies shown in Fig. 1(b) are odd multiples of 1.25 GHz and require four PLLs shared by the TX and the RX. The 1.25-GHz subcarrier is also used in four CDR loops to recover the clocks required by the ADCs to correctly sample the data. In each loop, a phase interpolator is digitally controlled by the output of a majority early/late voting algorithm to determine the correct phase of the clock.
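The majority early/late vote can be sketched as a small behavioral model. Everything specific here is hypothetical: the class name, the 64-step interpolator resolution, the 8-decision vote window, and the sign convention (an "early" decision increments the code, i.e., delays the clock). The model illustrates how voting filters out isolated wrong decisions before the phase interpolator moves.

```python
class PhaseInterpolatorCDR:
    """Hypothetical bang-bang CDR model: a digitally controlled phase interpolator
    stepped by a majority vote over a window of early/late decisions."""

    def __init__(self, num_phases=64, window=8):
        self.num_phases = num_phases  # PI resolution (assumed)
        self.window = window          # decisions per vote (assumed)
        self.code = 0                 # current PI code; larger code = more clock delay
        self._votes = []

    def update(self, decision):
        """decision: +1 = early (clock leads data), -1 = late, 0 = no data transition."""
        self._votes.append(decision)
        if len(self._votes) < self.window:
            return self.code
        tally = sum(self._votes)
        self._votes.clear()
        if tally > 0:    # majority early: delay the clock by one step
            self.code = (self.code + 1) % self.num_phases
        elif tally < 0:  # majority late: advance the clock by one step
            self.code = (self.code - 1) % self.num_phases
        return self.code  # a tie holds the current phase


cdr = PhaseInterpolatorCDR()
target = 20                       # hypothetical ideal PI code
for i in range(8 * 30):           # 30 votes
    err = target - cdr.code
    d = (err > 0) - (err < 0)     # ideal early/late decision
    if i % 4 == 3:
        d = -d                    # corrupt every fourth decision
    cdr.update(d)
print(cdr.code)                   # 20
```

Even with a quarter of the decisions inverted, each vote still moves the interpolator in the right direction, and the loop settles at the target code.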
The use of 16QAM modulation places relatively stringent phase noise requirements on the subcarriers. Using the expressions derived in [5], the BER is plotted in Fig. 6 for different amounts of integrated phase noise. We observe that an rms phase noise of 0.5° yields a 0.5-dB SNR penalty in the vicinity of BER = 10^−14. The integrated phase noise can be expressed as
where S_φ denotes the in-band phase noise (in the flat region) and f_loop is the loop bandwidth. Assuming a reference frequency of 125 MHz and hence f_loop ≈ 10 MHz, we obtain S_φ = −117 dBc/Hz at 10-MHz offset for φ_n,rms = 0.5°. This value of S_φ falls within the reach of LC oscillators, even for frequencies approaching 10 GHz [6], [7]. Table I summarizes the required transceiver performance.
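For a rough feel of these numbers, a common brick-wall approximation treats the single-sideband phase noise as flat at S_φ out to f_loop and zero beyond, so that φ_rms² ≈ 2·S_φ·f_loop with S_φ in linear units. This approximation is an assumption here and is not necessarily the exact expression used in this paper. The sketch evaluates it for the values quoted above.

```python
import math


def rms_phase_noise_deg(s_phi_dbc_hz, f_loop_hz):
    """Brick-wall estimate of integrated rms phase noise in degrees:
    SSB phase noise flat at s_phi up to f_loop, negligible beyond,
    phi_rms^2 = 2 * L * f_loop (rad^2)."""
    l_lin = 10 ** (s_phi_dbc_hz / 10)
    return math.degrees(math.sqrt(2 * l_lin * f_loop_hz))


one_pll = rms_phase_noise_deg(-117, 10e6)  # about 0.36 degrees
two_plls = math.sqrt(2) * one_pll          # uncorrelated TX + RX PLLs: about 0.51 degrees
print(f"{one_pll:.2f} deg, {two_plls:.2f} deg")
```

Under this approximation a single PLL at −117 dBc/Hz integrates to about 0.36°. Counting uncorrelated TX and RX PLL contributions doubles the variance to about 0.51°, close to the 0.5° budget.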
VII. SIMULATION RESULTS
The end-to-end system of Fig. 1(a) has been simulated in Simulink with a 30-in FR4 trace. Continuous-time equalization techniques [8] have been added to the LNA model of each path, yielding a maximum tilt correction of 13 dB. Figure 7 shows the detected PAM4 output of the first subchannel with and without tilt correction, demonstrating that multi-tone signaling dramatically simplifies the task of equalization. These simulations include a PLL phase noise of −117 dBc/Hz at 10-MHz offset and I/Q mismatches of 1% and 1°.
Fig. 7. Received eye diagrams at demodulator input: (a) without tilt correction, (b) with tilt correction.
