This is the author's version of an article that has been published in this conference. Changes were made to this version by the publisher prior to publication. The final version is available at http://dx.doi.org/10.1109/LAEDC58183.2023.10209113

# A Multi-Stage CTLE Design and Optimization for PCI Express Gen6.0 Link Equalization

Karla G. López-Araiza<sup>#\*1</sup>, Francisco E. Rangel-Patiño<sup>#\*2</sup>, Jorge E. Ascencio-Blancarte<sup>#\*3</sup>,

Edgar A. Vega-Ochoa<sup>#4</sup>, José E. Rayas-Sánchez<sup>\*5</sup>, and Omar Longoria-Gandara<sup>\*6</sup>

<sup>#</sup> Intel Corp. Zapopan, Jalisco, 45019 Mexico

\* Department of Electronics, Systems, and Informatics, ITESO – The Jesuit University of Guadalajara, Tlaquepaque, Jalisco, 45604 Mexico

<sup>1</sup>karla.lopez.araiza, <sup>2</sup>francisco.rangel, <sup>3</sup>jorge.e.ascencio.blancarte, <sup>4</sup>edgar.vega.ochoa, {@intel.com}, <sup>5</sup>erayas, <sup>6</sup>olongoria {@iteso.mx}

Abstract-The continuously increasing bandwidth demand from new applications has led to the development of the new peripheral component interconnect express (PCIe) Gen6, reaching data rates of 64 giga-transfers per second (GT/s) and adopting the pulse amplitude modulation 4-level (PAM4) signaling scheme. While PAM4 solves the bandwidth requirements, it brings new challenges for the physical channel design. PAM4 is more susceptible to errors from various noise sources because of its reduced voltage (and timing) ranges, yielding a higher bit error rate (BER). It also introduces new challenges in slicers, transition jitter, and equalizers, making equalization (EQ) a critical process for PAM4 signaling. In this paper, we propose a multistage continuous-time linear equalizer (CTLE) with high-band, mid-band, and low-band frequency boost stages to deal with highly lossy channels. Given the complexity of EQ for multi-level signals, optimization techniques are used, including an efficient optimization of the transmitter finite impulse response (FIR) filter and the receiver CTLE tuning.

*Keywords*—channel, CTLE, equalization, eye-diagram, FIR, ISI, jitter, optimization, PAM4, PCIe, receiver, transmitter.

#### I. INTRODUCTION

The ever-increasing bandwidth required by new applications has driven the development of the peripheral component interconnect express (PCIe) Gen6, reaching data rates of 64 giga-transfers per second (GT/s) and adopting the pulse amplitude modulation 4-level (PAM4) signaling scheme. In contrast to conventional non-return-to-zero (NRZ) signaling, the design of PAM4 transceivers brings many new challenges for the physical channel analysis and design. The intrinsic 1/3 eye amplitude of PAM4 leads to a signal-to-noise ratio (SNR) penalty, and the transitions between non-adjacent levels with finite rise and fall times reduce the horizontal eye openings. Additionally, many undesired channel effects (*e.g.*, noise and attenuation in the received signal) worsen at higher data rates.

An intense industry effort is presently ongoing regarding the development of PAM4 receiver (Rx) architectures featuring high bandwidths, high gain, low noise, and high linearity [1]. In addition, equalizers are used to cancel many undesired physical channel effects, including inter-symbol interference (ISI), making PAM4 equalization (EQ) more demanding [2]. A combination of continuous-time linear equalizer (CTLE) and decision feedback equalization (DFE) is widely used to eliminate ISI. However, due to the higher transmission rates, the conventional CTLE is no longer able to meet the requirements in a wide range of channel losses [9],[10].

Fig. 1. Block diagram of the PCIe Gen6 serial link transceiver.

In this paper, we propose a 3-stage CTLE and a low-frequency equalizer (LFEQ) designed to compensate for highly lossy channels with high-, mid-, and low-frequency boosting stages. We also propose an efficient optimization methodology to determine the optimal coefficients for the transmitter (Tx) feed-forward equalizer (FFE) and the Rx CTLE. The procedure involves defining a new objective function as a figure of merit (FOM) suitable for PAM4, and then applying a numerical optimization method using a combination of pattern search [3] and Nelder-Mead [4] methods. We validate our proposed methodology using the MATLAB SerDes Toolbox with realistic parameters.

#### II. PCI EXPRESS EQUALIZATION

The PCIe Gen6 specification defines the requirements to perform on-chip EQ at the Tx and at the Rx to mitigate undesired channel effects and minimize the bit error rate (BER). The Tx EQ coefficients for 64 GT/s are based on a 4-tap FFE finite impulse response (FIR) filter ($C_{m2}$, $C_{m1}$, $C_0$, and $C_p$), as illustrated in Fig. 1. The serial data output is obtained by the superposition of four consecutive received pulses ($v_{nm2}$, $v_{nm1}$, $v_n$, $v_{np}$) that are weighted with the four filter coefficients [5]. The FIR filter output, $v_{out}$, can be adjusted by varying the coefficient values, since

$$v_{\rm out} = v_{\rm nm2}C_{\rm m2} + v_{\rm nm1}C_{\rm m1} + v_{\rm n}C_0 + v_{\rm np}C_{\rm p} \tag{1}$$
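As a minimal sketch of (1), the following Python snippet applies a 4-tap FIR to a short PAM4 symbol sequence. The tap alignment (two pre-cursors, the cursor, and one post-cursor), the normalized symbol levels, and the coefficient values are illustrative assumptions and are not taken from the specification.

```python
def tx_ffe(symbols, c_m2, c_m1, c_0, c_p):
    """Apply the 4-tap FIR of (1) to a list of symbol levels.

    Assumed tap alignment: c_m2 and c_m1 weight the two upcoming symbols
    (pre-cursors), c_0 the current symbol, and c_p the previous symbol
    (post-cursor). Symbols outside the sequence are treated as 0.
    """
    n_sym = len(symbols)

    def at(k):
        return symbols[k] if 0 <= k < n_sym else 0.0

    return [
        c_m2 * at(n + 2) + c_m1 * at(n + 1) + c_0 * at(n) + c_p * at(n - 1)
        for n in range(n_sym)
    ]


# Hypothetical coefficients whose absolute values sum to 1.
C_M2, C_M1, C_0, C_P = 1 / 24, -4 / 24, 17 / 24, -2 / 24

# Normalized PAM4 levels (illustrative): -1, -1/3, +1/3, +1.
pam4 = [-1.0, -1.0, 1.0, 1.0, 1 / 3, -1 / 3, -1.0, 1.0]
v_out = tx_ffe(pam4, C_M2, C_M1, C_0, C_P)
```

With these assumed taps, a long run of identical symbols is attenuated relative to a transition, which is the de-emphasis behavior the FFE relies on.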

The PCIe specification defines a set of predefined values for the Tx coefficients, referred to as presets, which are adaptively changed during the on-chip EQ. The Tx EQ coefficients are computed at the upstream port by the coefficient adaptation algorithm using the received signal. These coefficients are communicated to the downstream port by using the PCIe protocol. The Tx at the downstream port then applies the received coefficient settings to its Tx EQ circuitry. This process of computing the coefficients, communicating them to the Tx, and checking the signal quality can be repeated multiple times until the required BER is achieved [5],[6].

Fig. 2. EQ map coefficients search space for optimization. From [6].

To have a unity gain for the Tx equalizer, the Tx coefficients are subject to the following protocol constraints (as per the PCIe specification [6]):

$$|C_{m2}| + |C_{m1}| + |C_0| + |C_p| = 1 \quad \text{subject to } C_{m2} \ge 0,\ C_{m1} \le 0,\ C_p \le 0 \tag{2}$$

These constraints are implemented by determining only $C_{m1}$ and $C_p$ to fully define $v_{out}$ from (1), with $C_{m2} = 1/24$ (per the specification) and $C_0$ implied by (2). The coefficients must support all eleven preset values, and their respective tolerances, as defined by the PCIe specification [5].
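The way (2) pins down $C_0$ once $C_{m1}$ and $C_p$ are chosen can be sketched as follows. The helper name, the example coefficient values, and the assumption that $C_0$ is non-negative are ours; only $C_{m2} = 1/24$ and the sign constraints come from the text.

```python
from fractions import Fraction

C_M2 = Fraction(1, 24)  # fixed second pre-cursor, as stated in the text


def cursor_from_constraints(c_m1, c_p, c_m2=C_M2):
    """Return C_0 implied by |C_m2| + |C_m1| + |C_0| + |C_p| = 1.

    Assumes C_0 >= 0. Raises ValueError if the sign constraints of (2)
    are violated or if no non-negative cursor value remains.
    """
    if c_m2 < 0 or c_m1 > 0 or c_p > 0:
        raise ValueError("sign constraints of (2) violated")
    c_0 = 1 - abs(c_m2) - abs(c_m1) - abs(c_p)
    if c_0 < 0:
        raise ValueError("coefficients leave no room for the cursor tap")
    return c_0


# Hypothetical example: C_m1 = -4/24 and C_p = -2/24.
c_0 = cursor_from_constraints(Fraction(-4, 24), Fraction(-2, 24))
```

Exact fractions are used here only to keep the unity-sum check free of rounding error.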

When all the PCIe specification constraints are applied, the resulting coefficient space may be mapped onto a triangular matrix, as shown in Fig. 2. The $C_{m1}$ and $C_p$ coefficients are mapped onto the y-axis and x-axis, respectively. Each matrix cell corresponds to a valid combination of $C_{m1}$ and $C_p$ coefficients, and $u(\mathbf{x}^*)$ corresponds to a combination of $C_{m1}$ and $C_p$ that results in an eye diagram qualified as optimum (see Section IV).

#### III. CTLE DESIGN

At higher data rates, several EQ techniques can be used to compensate for ISI impairments and thereby open the eye diagram before the Rx sampling process fails to satisfy the required BER. Tx pre-emphasis suffers from peak power constraints, while Rx equalizer performance is limited by the amplifier bandwidth; therefore, design trade-offs are required between Tx and Rx implementations or a combination of both. However, the exact channel state is often unknown, and it can change with PCB manufacturing process, voltage, and temperature conditions. Continuous-time adaptive equalizers can be used to overcome these challenges.

#### A. Continuous Time Linear Equalization

A CTLE is a continuous-time circuit with high-frequency gain boosting, whose transfer function can flatten the channel frequency response. One of the most common types of CTLE is a source-coupled differential-pair circuit with source degeneration, whose basic topology is shown in Fig. 3 [7]. The differential-pair source resistor attenuates the low-frequency signals while the source capacitor passes the high-frequency signal content, resulting in high-frequency gain boosting [8].



Fig. 3. a) CTLE Bode plot, b) CTLE circuit using capacitive degeneration. From [7].

The transfer function of this circuit can be represented by one zero and two poles, where the zero provides a +20 dB/decade slope and each pole contributes -20 dB/decade, for a total of -40 dB/decade from the two poles. This topology can be modeled by

$$H(s) = w_{p2} \frac{s + w_{z1}}{(s + w_{p1})(s + w_{p2})} \tag{3}$$

where  $w_{z1} = w_{p1}A_{DC}$ ,  $w_{p1} = 2\pi f_{p1}$ , and  $w_{p2} = 2\pi f_{p2}$ , with  $w_{z\#}$  representing a zero location,  $w_{p\#}$  a pole location,  $f_{p\#}$  a pole frequency, and  $A_{DC}$  the DC gain. By placing  $w_{p2} > w_{p1} > w_{z1}$ , the CTLE provides high-frequency gain boosting [9].
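The behavior of (3) can be checked numerically with a short sketch. The DC gain and pole frequencies below are hypothetical values chosen only for illustration; 16 GHz is used as the evaluation point because it is the Nyquist frequency of a 32 GBaud PAM4 stream.

```python
import math


def ctle_single_stage(f_hz, a_dc, f_p1, f_p2):
    """Evaluate H(j*2*pi*f) of (3) and return its magnitude in dB."""
    w_p1 = 2 * math.pi * f_p1
    w_p2 = 2 * math.pi * f_p2
    w_z1 = w_p1 * a_dc  # zero location tied to the DC gain
    s = 1j * 2 * math.pi * f_hz
    h = w_p2 * (s + w_z1) / ((s + w_p1) * (s + w_p2))
    return 20 * math.log10(abs(h))


# Hypothetical settings: -6 dB DC gain, poles at 8 GHz and 32 GHz.
A_DC = 10 ** (-6 / 20)
F_P1, F_P2 = 8e9, 32e9

dc_gain_db = ctle_single_stage(0.0, A_DC, F_P1, F_P2)   # equals 20*log10(A_DC)
nyquist_db = ctle_single_stage(16e9, A_DC, F_P1, F_P2)  # gain at 16 GHz
boost_db = nyquist_db - dc_gain_db                      # peaking relative to DC
```

At $s = 0$ the expression reduces to $w_{z1}/w_{p1} = A_{DC}$, so the first value reproduces the programmed DC gain, and the positive difference between the two values is the high-frequency boost.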

The PCIe Gen6 specification [5] defines the requirements to support 64 GT/s, defining a CTLE with six poles and three zeros, and an adjustable DC gain, so the system transfer function can be modeled as

$$H(s) = \sigma \frac{(s + w_{z1})(s + w_{p2}A_{\rm DC})}{(s + w_{p1})(s + w_{p2})(s + w_{p3})} \cdot \frac{(s + w_{z3})}{(s + w_{p4})(s + w_{p5})(s + w_{p6})} \tag{4}$$

where  $\sigma$  is defined by,

$$\sigma = \frac{w_{p1}w_{p3}w_{p4}w_{p5}w_{p6}}{w_{z1}w_{z3}} \tag{5}$$

Considering that the CTLE must support a wide frequency range of channel loss, the proposed CTLE consists of three stages to cover the overall transfer function at low-, mid-, and high-frequency ranges, respectively. Hence, (4) can be described as

$$H(s) = \sigma G_1(s) G_2(s) G_3(s) \tag{6}$$

where,

$$G_1(s) = \frac{s + w_{z1}}{(s + w_{p1})(s + w_{p6})} \tag{7}$$

$$G_2(s) = \frac{s + w_{p2}A_{\rm DC}}{(s + w_{p2})(s + w_{p4})} \tag{8}$$

$$G_3(s) = \frac{s + w_{z3}}{(s + w_{p3})(s + w_{p5})} \tag{9}$$

Consequently, the EQ topology at the Rx is a combination of a 3-stage CTLE and a DFE, as shown in Fig. 1.
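The factorization in (6) to (9) can be sketched as a cascade of three identical stage functions. All pole and zero frequencies below are hypothetical placeholders, not values from the specification; the sketch only illustrates that, with $\sigma$ from (5), the cascade has a DC gain equal to $A_{DC}$.

```python
import math


def stage(s, zero, pole_a, pole_b):
    """One CTLE stage of the form (s + zero) / ((s + pole_a)(s + pole_b))."""
    return (s + zero) / ((s + pole_a) * (s + pole_b))


def ctle_three_stage(f_hz, a_dc, w_z1, w_z3, w_p):
    """Evaluate (6); w_p is the list [w_p1, ..., w_p6] in rad/s."""
    w_p1, w_p2, w_p3, w_p4, w_p5, w_p6 = w_p
    s = 1j * 2 * math.pi * f_hz
    sigma = w_p1 * w_p3 * w_p4 * w_p5 * w_p6 / (w_z1 * w_z3)  # (5)
    g1 = stage(s, w_z1, w_p1, w_p6)                           # (7)
    g2 = stage(s, w_p2 * a_dc, w_p2, w_p4)                    # (8)
    g3 = stage(s, w_z3, w_p3, w_p5)                           # (9)
    return sigma * g1 * g2 * g3


# Hypothetical settings (frequencies in Hz converted to rad/s).
TWO_PI = 2 * math.pi
A_DC = 10 ** (-8 / 20)
poles = [TWO_PI * f for f in (0.3e9, 4e9, 16e9, 30e9, 32e9, 35e9)]
W_Z1, W_Z3 = TWO_PI * 0.25e9, TWO_PI * 6e9

h_dc = ctle_three_stage(0.0, A_DC, W_Z1, W_Z3, poles)
```

Keeping each stage as a separate function mirrors the circuit partitioning, so a single stage can be retuned without touching the others.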

#### B. Low Frequency Equalizer

A conventional CTLE cannot compensate for the small amount of low-frequency channel loss since its primary objective is to compensate for high-frequency channel losses [11]. Since the slope of the low-frequency loss is quite flat (<3dB/dec), an extra circuit is required.

The uncompensated low-frequency loss causes non-negligible long-term residual ISI that results in data-dependent jitter (DDJ), which is difficult to reduce further by enhancing a CTLE unless an LFEQ is added [9]. The LFEQ is based on a negative feedback topology. The objective is to minimize the small slope of the low-frequency loss by placing the $w_{z1}$ and $w_{p1}$ pair close together to achieve a small amount of low-frequency gain (0 to 4 dB) [8]. The transfer function of the LFEQ can be defined by (7), where $w_{p1}$ is tuned to provide the expected DC gain in the low-frequency range.

#### IV. OPTIMIZATION OF THE PCIE GEN6.0 LINK EQUALIZATION

We aim at finding the optimal set of Tx and Rx EQ settings to maximize the eye diagram margins. Let  $\mathbf{R}_{m} \in \Re^{2}$  denote the electrical system margins response,

$$\boldsymbol{R}_{\mathrm{m}} = \boldsymbol{R}_{\mathrm{m}}(\boldsymbol{x}) = \begin{bmatrix} e_{\mathrm{w}}(\boldsymbol{x}) & e_{\mathrm{h}}(\boldsymbol{x}) \end{bmatrix}^{\mathrm{T}} \tag{10}$$

where $e_h \in \Re^1$ is the smallest of the three PAM4 eye height measurements and $e_w \in \Re^1$ is the smallest of the three PAM4 eye width measurements, which are functions of the Tx FFE and Rx CTLE EQ settings contained in vector $\mathbf{x}$.

We need to ensure the optimal system margin response also meets an eye linearity,  $e_{\text{linearity}}$ , larger than 0.85, and a vertical eye closure (*VEC*) below 6 dB. An initial optimization problem can be defined through a constrained formulation,

$$\mathbf{x}^{*} = \arg \min_{\mathbf{x}} u(\mathbf{x}) \tag{11}$$

subject to $e_{\text{linearity}}(\mathbf{x}) > 0.85$ and $VEC(\mathbf{x}) < 6$ dB, where $u(\mathbf{x})$ is the negative of the total area of the PAM4 eye diagram,

$$u(\boldsymbol{x}) = -e_{\mathrm{w}}(\boldsymbol{x})e_{\mathrm{h}}(\boldsymbol{x}) \tag{12}$$

 $e_{\text{linearity}}$  is the measure of the vertical linearity defined by the variance of amplitude separation among the different PAM4 levels, and *VEC* is the smallest of the ratios of voltage swing to eye height.
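The quantities in (10) to (12) can be sketched in a few lines. The function names and the sample eye measurements are hypothetical; in practice the three widths and heights would come from the link simulator.

```python
def eye_margins(eye_widths, eye_heights):
    """Return (e_w, e_h): the smallest width and height of the three PAM4 eyes."""
    if len(eye_widths) != 3 or len(eye_heights) != 3:
        raise ValueError("PAM4 has exactly three eyes")
    return min(eye_widths), min(eye_heights)


def eye_area_objective(eye_widths, eye_heights):
    """u(x) of (12): the negative product of worst-case width and height."""
    e_w, e_h = eye_margins(eye_widths, eye_heights)
    return -e_w * e_h


def is_feasible(e_linearity, vec_db):
    """Constraints that accompany (11)."""
    return e_linearity > 0.85 and vec_db < 6.0


# Hypothetical measurements: widths in UI, heights in mV (upper, middle, lower eye).
widths = [0.26, 0.30, 0.27]
heights = [20.0, 24.0, 21.0]
u = eye_area_objective(widths, heights)  # -(0.26 * 20.0)
```

Because the worst width and the worst height may belong to different eyes, $u(\mathbf{x})$ is a conservative measure of the opening rather than the area of any single eye.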

A more convenient unconstrained objective function is

$$u'(\mathbf{x}) = -w_1 u(\mathbf{x}) \rho(\mathbf{x}) + w_2 \|\lambda(\mathbf{x})\|_2^2 \tag{13}$$

where $\rho(\mathbf{x})$ is a vertical eye closure penalty function defined as

$$\rho(\mathbf{x}) = 10^{-\frac{VEC(\mathbf{x})}{6}} \tag{14}$$

and $\lambda(\mathbf{x})$ is an eye linearity penalty function defined as

$$\lambda(\mathbf{x}) = \max\left\{0, 0.85 - e_{\text{linearity}}(\boldsymbol{x})\right\} \tag{15}$$

Both terms in (13) are scaled by weighting factors  $w_1, w_2 \in \Re^1$  such that they become comparable. The initial unconstrained formulation can then be defined as

$$\boldsymbol{x}^* = \arg\min_{\boldsymbol{x}} u'(\boldsymbol{x}) \tag{16}$$

Additionally, we need to ensure the optimal system response is within a suitable area in the coefficients search space of the EQ map. Here we follow our work in [6] and [11] to redefine the corresponding objective function. The four responses around  $u'(\mathbf{x}^*)$  must be at least 80% of the value of  $u'(\mathbf{x}^*)$ , as shown in Fig. 2, where  $u'_{i,j}$  are the objective function values per (13) for the *i*-th  $C_{m1}$  and *j*-th  $C_p$  values.

Fig. 4. CTLE performance a) with LFEQ and b) without LFEQ.

The new optimization problem can be defined through a constrained formulation, such that the optimal set of coefficients maximizes the system response without violating the lower bound of $0.8u'(\mathbf{x}^*)$ in the vicinity,

$$\mathbf{x}^{*} = \arg \min_{\mathbf{x}} u'(\mathbf{x}) \quad \text{subject to } l_{11}(\mathbf{x}) \le 0,\ l_{12}(\mathbf{x}) \le 0,\ l_{21}(\mathbf{x}) \le 0,\ l_{22}(\mathbf{x}) \le 0 \tag{17}$$

with

$$\boldsymbol{l}(\boldsymbol{x}) = \begin{bmatrix} u(C_{m1i^{*}+1}, C_{ctle}, C_{pj^{*}}) & u(C_{m1i^{*}-1}, C_{ctle}, C_{pj^{*}}) \\ u(C_{m1i^{*}}, C_{ctle}, C_{pj^{*}+1}) & u(C_{m1i^{*}}, C_{ctle}, C_{pj^{*}-1}) \end{bmatrix} - 0.8\,u(C_{m1i^{*}}, C_{ctle}, C_{pj^{*}}) \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \tag{18}$$

where $C_{m1i^{*}}$ and $C_{pj^{*}}$ are the Tx set of coefficients that minimize (13) for each of the Rx CTLE setting values ($C_{ctle}$).

Similarly, a more convenient unconstrained objective function can be defined by adding a penalty term,

$$U(\mathbf{x}) = u'(\mathbf{x}) + w_3 \| \boldsymbol{L}(\mathbf{x}) \|_{\rm F} \tag{19}$$

where $\|\boldsymbol{L}(\boldsymbol{x})\|_{\rm F}$ is the Frobenius norm of matrix $\boldsymbol{L}(\boldsymbol{x})$, defined elementwise as

$$\boldsymbol{L}(\boldsymbol{x}) = \max\left\{\boldsymbol{0}, \boldsymbol{l}(\boldsymbol{x})\right\} \tag{20}$$

and  $w_3$  is a weighting factor.
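The neighborhood penalty of (18) to (20) can be sketched on a small grid of $u$ values. The grid layout (rows indexed by $C_{m1}$, columns by $C_p$, at one fixed CTLE setting) and the numbers are hypothetical; the maximum in (20) is taken entry by entry.

```python
import math


def neighborhood_penalty(u_grid, i, j):
    """Frobenius norm of L(x) in (20) around cell (i, j).

    u_grid[i][j] holds u for the i-th C_m1 and j-th C_p value at a fixed
    CTLE setting. The four neighbors must lie inside the grid.
    """
    center = u_grid[i][j]
    neighbors = [
        u_grid[i + 1][j], u_grid[i - 1][j],
        u_grid[i][j + 1], u_grid[i][j - 1],
    ]
    l_entries = [u_n - 0.8 * center for u_n in neighbors]  # (18)
    big_l = [max(0.0, entry) for entry in l_entries]       # (20)
    return math.sqrt(sum(entry ** 2 for entry in big_l))


# Hypothetical u values (negative eye areas); the center cell is the candidate.
grid = [
    [-3.0, -4.0, -3.5],
    [-4.5, -5.0, -4.2],
    [-3.0, -3.8, -2.0],
]
penalty = neighborhood_penalty(grid, 1, 1)
```

In this example only the neighbor with $u = -3.8$ violates the bound, since its eye area is below 80% of the center area; the other three contribute zero.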

Our final unconstrained formulation is

$$\mathbf{x}^{*} = \arg \min_{\mathbf{x}} U(\mathbf{x}) \tag{21}$$

with  $U(\mathbf{x})$  defined by (13), (19) and (20).

We find the optimal set of coefficients  $x^*$  by solving (21). To avoid estimating gradients and considering that the objective function has many local minima, we use a combination of pattern search [3] and Nelder-Mead [4] methods. We start the optimization with pattern search, to explore the design space until finding a potential region where the global minimum is located. Then, the solution found by pattern search is used as seed for the Nelder-Mead method, which further minimizes the objective function for a more precise solution.
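The two-phase strategy can be sketched in plain Python. This is not the implementation used in this work: the pattern search below is a basic compass-style variant, the Nelder-Mead routine is a simplified textbook version (reflection, expansion, inside contraction, and shrink), and the objective is a hypothetical smooth stand-in for $U(\mathbf{x})$, which in practice requires a link simulation and has many local minima.

```python
def pattern_search(f, x0, step=1.0, min_step=0.1, max_iter=200):
    """Compass-style pattern search: poll +/- step along each axis."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        improved = False
        for k in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[k] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step *= 0.5  # shrink the mesh when no poll point improves
            if step < min_step:
                break
    return x, fx


def nelder_mead(f, x0, size=0.1, tol=1e-8, max_iter=500):
    """Simplified Nelder-Mead; f is re-evaluated instead of cached for clarity."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[i] + (size if i == k else 0.0) for i in range(n)] for k in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Stop once every vertex is within tol of the best vertex.
        if max(abs(p[i] - best[i]) for p in simplex[1:] for i in range(n)) < tol:
            break
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # centroid + t * (centroid - worst); t > 0 moves away from worst.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)] for p in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0], f(simplex[0])


def toy_objective(x):
    # Hypothetical stand-in for U(x), with its minimum at (1.3, -0.7).
    return (x[0] - 1.3) ** 2 + 2.0 * (x[1] + 0.7) ** 2


coarse_x, coarse_u = pattern_search(toy_objective, [4.0, 3.0])  # global exploration
fine_x, fine_u = nelder_mead(toy_objective, coarse_x)           # local refinement
```

The coarse mesh of the first phase keeps the number of objective evaluations low while the search is far from the optimum, and the simplex phase then refines the seed without requiring gradients.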

#### V. SIMULATION RESULTS

To validate our methodology, we use the MATLAB SerDes Toolbox considering short-, medium-, and long-reach channels (CEI-56G serial links) with losses of 10 dB, 20 dB, and 27 dB, respectively, in a 64 GT/s PCIe Gen6 link, where the pass/fail criterion is defined in terms of a time-domain eye diagram at BER=10<sup>-6</sup>. The link is simulated with the corresponding Tx jitter parameters (deterministic and sinusoidal) based on [5], and Rx jitter parameters from a common reference clock Rx architecture. The simulator generates a statistical output containing the three eye heights and widths.

Fig. 5. Eye diagrams at different stages of the CTLE: a) high-frequency stage, b) mid-frequency stage, and c) low-frequency stage.

The simulation results within a medium-reach channel in Fig. 4 demonstrate how the LFEQ-CTLE combination enhances overall performance in the Rx equalization scheme, yielding an eye area improvement of 35.3%.

The effects of the 3-stage CTLE equalization, using a short-reach channel as reference, are shown in Fig. 5. It can be seen how each CTLE stage targets a range of frequencies, boosting the DC gain. The high-frequency stage results in an improved eye opening.

To validate the proposed design under worst-case conditions, we added Tx and Rx deterministic and sinusoidal jitter parameters to the system and proceeded with a link equalization optimization using a long-reach channel as reference. The eye diagrams at the receiver, before and after applying the optimization process in Section IV, are shown in Fig. 6. Additionally, Table I confirms that the resultant top eye width and height amply satisfy the channel tolerancing eye mask defined in the PCIe Gen6 specification [5]. The optimized eye diagram under worst-case channel conditions confirms the effectiveness of the proposed optimization approach.

Table I. 64 GT/s eye margins. Specification versus simulation.

| Eye diagram<br>parameter | PCIe Gen6<br>spec (min) | 27 dB channel simulation,<br>worst-case Tx/Rx jitter parameters |
|--------------------------|-------------------------|-----------------------------------------------------------------|
| top eye height           | 6.0 mV                  | 20.0 mV                                                         |
| top eye width            | 0.1 UI                  | 0.26 UI                                                         |

Fig. 6. Eye diagram before and after the optimization process.

#### VI. CONCLUSION

We proposed in this paper a 3-stage CTLE and an LFEQ designed to compensate for highly lossy PAM4 PCIe Gen6 channels, with high-, mid-, and low-frequency boosting stages. We also proposed an efficient optimization approach to find the optimal coefficients for the Tx FFE and Rx CTLE. We validated the proposed method by using the MATLAB SerDes Toolbox. The optimized EQ coefficients were tested by measuring the eye diagrams at the receiver, confirming a significant improvement in eye height, eye width, eye linearity, and vertical eye closure.

#### REFERENCES

- [1] H. Wang, Y. Chen, Y. Gao, N. Li, Z. Zhang, C. Guo and J. Li, "A quad linear 56Gbaud PAM4 transimpedance amplifier in 0.18 µm SiGe BiCMOS technology," in *IEEE Int. System-on-Chip Conf. (SOCC)*, Singapore, Sep. 2019, pp. 165-170.
- [2] J. L. Zerbe, C. W. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W. F. Stonecypher, A. Ho, T. P. Thrush, R. T. Kollipara, M. A. Horowitz and K. S. Donnelly, "Equalization and clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2121–2130, Dec. 2003.
- [3] R. Hooke and T. A. Jeeves, "Direct search solution of numerical and statistical problems," *J. of the ACM*, vol. 8, no. 2, pp. 212-229, Apr. 1961.
- [4] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright, "Convergence properties of the Nelder-Mead simplex method in low dimensions," *SIAM J. Opt.*, vol. 9, no. 1, Jan 1998, pp. 112–147.
- [5] PCI SIG Org. (2022), PCI Express® Base Specification Revision 6.0.1 [Online]. Available: https://pcisig.com/specifications.
- [6] F. E. Rangel-Patiño, J. E. Rayas-Sánchez, E. A. Vega-Ochoa, and N. Hakim, "Direct optimization of a PCI Express link equalization in industrial post-silicon validation," in *IEEE Latin American Test Symp.* (*LATS 2018*), Sao Paulo, Brazil, Mar. 2018, pp. 1-6.
- [7] W. T. Beyene, "The design of continuous-time linear equalizers using model order reduction techniques," in *IEEE EPEP Elec. Perform. Electron. Packag.*, San Jose, CA, USA, Oct. 2008, pp. 187-190.
- [8] R. Farjad-Rad, H.-T. Ng, M.-J. E. Lee, R. Senthinathan, W. J. Dally, A. Nguyen, R. Rathi, J. Poulton, J. Edmondson, J. Tran and H. Yazdanmehr, "0.622-8.0 Gbps 150 mW serial IO macrocell with fully flexible preemphasis and equalization," in *Symp. VLSI Circuits Dig. Tech. Papers*, Kyoto, Japan, Jun. 2003, pp. 63-66.
- [9] J. He, N. Qi, N. Yu, L. Wu, P. Yin Chiang, X. Xiao, and N. Wu, "A 2nd-order CTLE in 130nm SiGe BiCMOS for a 50GBaud PAM4 optical driver," in *IEEE Int. Conf. Integr. Circ. Tech. Applic.*, Beijing, China, Nov. 2018, p. 151.
- [10] S. Parikh, T. Kao, Y. Hidaka, J. Jiang, A. Toda, S. Mcleod, W. Walker, Y. Koyanagi, T. Shibuya and J. Yamada, "A 32Gb/s wireline receiver with a low-frequency equalizer, CTLE and 2-tap DFE in 28nm CMOS," in *IEEE Int. Solid-State Circ. Conf.*, CA, USA, Feb. 2013, pp. 28-29.
- [11] R. J. Ruiz-Urbina, F. E. Rangel-Patiño, J. E. Rayas-Sánchez, E. A. Vega-Ochoa, and O. Longoria-Gándara, "Transmitter and receiver equalizers optimization for PCI Express Gen6.0 based on PAM4," in *IEEE MTT-S Latin America Microw. Conf. (LAMC)*, Cali, Colombia, May 2021, pp. 1-4.