# Direct Optimization of a PCI Express Link Equalization in Industrial Post-Silicon Validation

Francisco E. Rangel-Patiño<sup>1,2</sup>, José E. Rayas-Sánchez<sup>1</sup>, Edgar A. Vega-Ochoa<sup>2</sup>, and Nagib Hakim<sup>3</sup>

<sup>1</sup> Department of Electronics, Systems, and Informatics, ITESO – The Jesuit University of Guadalajara, Tlaquepaque, Jalisco, 45604 Mexico

<sup>2</sup> Intel Corp. Zapopan, Jalisco, 45019 Mexico

<sup>3</sup> Intel Corp. Santa Clara, CA, 95052 USA

francisco.rangel@intel.com

Abstract — Post-silicon validation is a crucial industrial testing process in modern computer platforms. Post-silicon validation of high-speed input/output (HSIO) links can be critical for making a product release qualification. Peripheral component interconnect express (PCIe) is a high-performance interconnect architecture widely adopted in the computer industry, and one of the most complex HSIO interfaces. PCIe data rates increase with every new generation. To mitigate channel effects due to the increase in transmission speeds, the PCIe specification defines requirements to perform equalization (EQ) at the transmitter (Tx) and at the receiver (Rx). During the EQ process, one combination of Tx/Rx EQ coefficients must be selected to meet the performance requirements of the system. Testing all possible coefficient combinations is prohibitive. Current industrial practice consists of finding a subset of combinations at post-silicon validation using maps of EQ coefficients, which are obtained by measuring the eye height, eye width, and the eye asymmetries of the received signal. Given the large number of electrical parameters and the multiplicity of signal eyes that are produced by on-die probes for observation, finding this subset of coefficients is often a challenge. To overcome this problem, this report presents a direct optimization method based on a suitable objective function formulation to efficiently tune the Tx and Rx EQ coefficients so that they comply with the PCIe specification. The proposed optimization approach is based on a low-cost computational procedure combining pattern search and Nelder-Mead methods to efficiently solve an objective function with many local minima, and it is evaluated by lab measurements on a realistic industrial post-silicon validation platform.

*Index Terms* — channel, crosstalk, CTLE, equalization, equalization maps, eye-diagram, FIR, high-speed links, ISI, jitter, optimization, PCIe, post-silicon validation, receiver, signal integrity, transmitter, tuning.

#### I. INTRODUCTION

Nowadays, the combined effects of increased processor complexity, customers' performance requirements, and time-to-market pressure have placed a heavy burden on post-silicon validation, which is typically the last step prior to volume manufacturing. The purpose of post-silicon validation is to qualify a product over all process corners and operating conditions, which makes it a crucial industrial testing process for modern computer platforms.

A significant portion of circuits to be validated in modern microprocessors corresponds to high-speed input/output (HSIO) links. Post-silicon validation of HSIO links can be critical for making a product release qualification decision under aggressive launch schedules. Additionally, finding the best receiver analog circuitry settings in HSIO links is a very time-consuming post-silicon validation process.

Peripheral component interconnect express (PCIe) [1] is one of the most complex HSIO interfaces. PCIe is a packet based high-speed point-to-point interconnection technology that evolves with new computer industrial demands [2]. PCIe allows a star-topology architecture with strong similarity to the modern switched Ethernet fabric, and it is the primary interface for a host central processing unit (CPU) to connect with input/output (I/O) devices. The direct memory access (DMA) and other resources are also available to PCIe devices without having to share the data bus with other devices in the same system.

The PCIe bandwidth has been scaled by means of multiple lanes ($\times$1, $\times$2, $\times$4, $\times$8, $\times$16, and $\times$32), and interconnection rates have increased from 2.5 Gb/s in the first generation (PCIe1) to 5 Gb/s (PCIe2) and 8 Gb/s (PCIe3) [3]. PCIe has continually enhanced its performance, and the next generation (PCIe4) is expected to operate at 16 Gb/s [4], [5], targeting 512 Gb/s capacity over 32 lanes. However, as transmission speeds increase, transmission channel effects such as reflections, electromagnetic coupling, and attenuation become more severe, making the signals more susceptible to errors [6]-[8]. Additionally, PCIe channels are inherently bandwidth-limited and cause large signal attenuation at high frequencies. This distorts and spreads the transmitted signal over multiple symbols, causing intersymbol interference (ISI), which can make the signal unreadable at the Rx, producing bit errors and excessively closed eye diagrams. The most practical solution to this problem is signal conditioning to open the eye diagram [9] before the Rx samples the data, with on-chip equalization (EQ) being the most practical way to compensate for the channel attenuation [10]. The PCIe3 specification defines the requirements to perform EQ at the Tx and/or at the Rx to mitigate undesired effects and minimize the bit error rate (BER). For the Tx EQ, the signal can be reshaped before it is transmitted in an effort to overcome the distortion introduced by the channel. At the Rx, the signal can be reconditioned to improve the signal quality.

This work was supported in part by CONACYT (*Consejo Nacional de Ciencia y Tecnología*, Mexican Government) and Intel Corp. through a scholarship granted to F. E. Rangel-Patiño.

Fig. 1. A PCIe channel with Tx and Rx equalizers.

The PCIe3 specification defines an adaptive mechanism for EQ to determine the optimum value of the Tx and Rx EQ coefficients within a fixed time limit. A typical PCIe system may have hundreds of combinations of EQ coefficients, and some of these combinations will produce better EQ results than others. Testing every coefficient combination by exhaustive enumeration to find the best one is impractical, as this approach usually consumes a large amount of time. In order to reduce the selection time, the current practice is to find a subset of coefficient combinations during post-silicon validation, and then program it into the system BIOS. The current industrial method to find the best subset of coefficients consists of using maps of EQ coefficients, which are obtained by measuring the eye height, eye width, and the eye asymmetries of the received signal. These maps show how the Rx performs at different locations of the coefficient space. They are intuitive visual indicators that help experienced post-silicon validation engineers find the optimal coefficient combination by inspection. The method consists of finding the set of coefficients that qualify the eye width, eye height, and eye diagram asymmetries as near optimal. However, data collection to generate the EQ maps consumes a very large share of the post-silicon validation schedule and resources.

In this paper, we propose a simple yet efficient optimization methodology to find out the optimal subset of coefficients for the Tx and Rx in a PCIe equalization process. The procedure implies defining an effective objective function, and then applying a direct numerical optimization method using lab measurements in an industrial post-silicon validation platform. To overcome the problem of multiple local minima in the measurement-based objective function, an efficient combination of pattern search and Nelder-Mead methods is employed. The obtained eye-diagram results confirm the effectiveness of the proposed approach.

The organization of the paper is as follows. Section II describes the PCIe EQ process. The PCIe link equalization based on Tx EQ coefficient matrix maps is presented in Section III. The objective function formulation and the optimization procedure are presented in Section IV. The system test setup and system measurements are described in Section V. Finally, the results are discussed in Section VI, and conclusions are given in Section VII.

Fig. 2. PCIe Tx/Rx adaptive equalization.

#### II. PCI EXPRESS EQUALIZATION

PCIe3 provides a bit rate of 8 Gb/s while still using the same copper channel as PCIe2. PCIe channels allow up to 22 dB of attenuation at 4 GHz. The high frequency components are therefore diminished while crossing such a bandwidth-limited channel. To mitigate the effects of ISI and other channel-induced noise impairments, the PCIe3 specification defines the provision of performing equalization at the transmitter and at the receiver [2].

##### A. Tx and Rx Equalizers

Most Tx serializer-deserializer (SERDES) implementations comprise a feed-forward equalizer (FFE) based on a 3-tap finite impulse response (FIR) filter. $C_m$, $C_0$, and $C_p$ represent the three filter tap coefficients. The pre-cursor ($C_m$) and post-cursor ($C_p$) coefficients refer to whether the FFE filter taps work on an advanced or delayed signal with respect to time. In the FFE filter, the serial data signal is delayed by several flip-flops which implement the filter taps. Three consecutive pulses ($v_{nm}$, $v_n$, $v_{np}$) are multiplied by the three filter tap coefficients, and the results are summed and driven to the serial data output [11]. The filter response can then be adjusted by controlling the tap coefficient values. Therefore, the output signal ($v_{out}$) of the FIR filter is given by

$$v_{\rm out} = v_{\rm nm} C_{\rm m} + v_{\rm n} C_0 + v_{\rm np} C_{\rm p} \tag{1}$$
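As a minimal sketch of (1), the snippet below applies a 3-tap FFE to a short symbol sequence. The tap values and the bit pattern are hypothetical, chosen only for illustration, and the code assumes that $v_{\rm nm}$ is the symbol one unit interval ahead (pre-cursor) and $v_{\rm np}$ the symbol one unit interval behind (post-cursor).

```python
import numpy as np

def ffe_3tap(symbols, c_m, c_0, c_p):
    """Apply v_out = v_nm*C_m + v_n*C_0 + v_np*C_p to a symbol sequence."""
    s = np.asarray(symbols, dtype=float)
    # Assumption: v_nm is the symbol one UI ahead (pre-cursor) and
    # v_np the symbol one UI behind (post-cursor); edges are zero-padded.
    v_nm = np.append(s[1:], 0.0)
    v_np = np.insert(s[:-1], 0, 0.0)
    return c_m * v_nm + c_0 * s + c_p * v_np

# Hypothetical tap values with |C_m| + |C_0| + |C_p| = 1
bits = np.array([1, 1, 1, -1, -1, 1])
v_out = ffe_3tap(bits, c_m=-0.1, c_0=0.7, c_p=-0.2)
print(v_out)
```

With these illustrative taps, a symbol surrounded by equal symbols settles at the flat level $C_0 - |C_{\rm m}| - |C_{\rm p}| = 0.4$, while symbols adjacent to a transition are emphasized.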

The EQ topology at the Rx can be a combination of a continuous-time linear equalizer (CTLE) that works independently of the clock recovery circuit, and a decision feedback equalizer (DFE). The CTLE is a simple one tap coefficient ( $C_r$ ) continuous-time circuit with high-frequency gain boosting, whose transfer function can compensate the channel response [12]. Fig. 1 shows the simultaneous implementation of Tx and Rx equalizers in a PCIe channel.

##### B. Equalization Process

The PCIe3 specification establishes a predefined set of values for the three Tx coefficients, referred to as presets, which are then adaptively changed during the link training and equalization procedure, in which the downstream port and upstream port devices (see Fig. 2) negotiate the Tx EQ values with each other to guarantee a BER of less than $10^{-12}$. Since the Tx does not know the channel parameters, the Tx EQ coefficients are computed at the upstream port by the coefficient adaptation algorithm in the medium access control (MAC) layer using the received signal, as shown in Fig. 2. Then, these coefficients are communicated to the downstream port by using the PCIe protocol. The Tx at the downstream port then applies the received coefficient settings to the Tx EQ circuitry. The Rx provides two types of quality feedback, by measuring the eye opening or by evaluating eye edge ISI. This process of computing the coefficients, communicating them to the Tx, and checking the signal quality can be repeated multiple times until the required BER is achieved [13], [14].

Fig. 3. EQ map coefficients search space for optimization.

#### III. TRANSMITTER EQUALIZATION COEFFICIENT MATRIX

The values of the Tx coefficients are subjected to the following protocol constraints:

$$|C_{\rm m}| + |C_0| + |C_{\rm p}| = 1 \quad \text{subject to} \quad C_0 > 0,\ C_{\rm m} \le 0,\ C_{\rm p} \le 0 \tag{2}$$

These constraints are implemented by determining only $C_{\rm m}$ and $C_{\rm p}$ to fully define $v_{\rm out}$ from (1), with $C_0$ implied by (2). Additionally, the coefficient ranges and tolerances are constrained by the following requirements.

The coefficients must support all eleven values for the presets, and their respective tolerances, as defined by the Tx preset ratios table in the PCIe specification [2].

In order to keep the output-transmitted power constant with respect to coefficients, the full swing (FS), which indicates the maximum differential voltage that can be generated by the Tx, is defined as

$$FS = \left| C_{\rm m} \right| + \left| C_{\rm 0} \right| + \left| C_{\rm p} \right| \tag{3}$$

The flat level voltage should always be greater than or equal to the minimum differential voltage that can be generated by the Tx, indicated as the low frequency (LF) parameter,

$$C_0 - \left|C_{\rm m}\right| - \left|C_{\rm p}\right| \ge LF \tag{4}$$

Fig. 4. PCI Express test setup: an Intel server post-silicon validation.

When the above constraints are applied, the resulting coefficient space may be mapped onto a triangular matrix, as shown in Fig. 3, where several EQ maps, one per CTLE coefficient ($C_r$) value, are superimposed. The $C_m$ and $C_p$ coefficients are mapped onto the y-axis and x-axis, respectively. Each matrix cell corresponds to a valid combination of $C_m$ and $C_p$ coefficients, and $u(x^*)$ corresponds to a combination of $C_m$, $C_p$, and $C_r$ that results in an eye diagram qualified as optimum, as explained later in greater detail. These EQ maps can be used as an intuitive visual indicator of the equalization performance.
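To illustrate why constraints (2)-(4) produce a triangular matrix, the sketch below enumerates the valid cells for integer coefficient steps. The function name and the values `fs=24` and `lf=8` are hypothetical and used only for illustration; coefficient magnitudes are expressed in integer steps of the full swing rather than in the normalized form of (2).

```python
def valid_tx_cells(fs, lf):
    """List (|C_m|, |C_p|, C_0) integer cells allowed by constraints (2)-(4)."""
    cells = []
    for cm in range(fs + 1):              # |C_m| magnitude, y-axis of the map
        for cp in range(fs + 1 - cm):     # |C_p| magnitude, x-axis of the map
            c0 = fs - cm - cp             # C_0 implied by the full-swing sum (3)
            if c0 > 0 and c0 - cm - cp >= lf:   # sign rule in (2) and LF rule (4)
                cells.append((cm, cp, c0))
    return cells

# Hypothetical FS and LF values, for illustration only
cells = valid_tx_cells(fs=24, lf=8)
rows = {}
for cm, cp, _ in cells:
    rows[cm] = rows.get(cm, 0) + 1
for cm in sorted(rows):
    print(f"|C_m| = {cm}: {'x ' * rows[cm]}")
```

Each row of the printout is one cell shorter than the previous one, because (4) reduces to $|C_{\rm m}| + |C_{\rm p}| \le (FS - LF)/2$ once $C_0$ is eliminated with (3).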

The current practical post-silicon method to find the best subset of coefficients for both Tx and Rx consists of using these EQ maps, which are obtained by measuring the eye height, eye width, and eye asymmetries of the received signal for each of the $C_m$ and $C_p$ combinations. Three EQ maps are generated for each of the CTLE coefficient ($C_r$) values, and each lane and device pairing may require one or more EQ maps. Experienced validation engineers visually analyze each of the EQ maps to select the coefficients $C_{\rm m}$ and $C_{\rm p}$ for the FIR filter in the Tx, and $C_r$ for the CTLE in the Rx, that correspond to an eye qualified as optimum (maximum eye height, maximum eye width, and minimum eye asymmetry). However, this must be done while ensuring that the responses around the best $C_{\rm m}$-$C_{\rm p}$ matrix cell are at least 80% of the value of that matrix cell, as illustrated in Fig. 3, to avoid selecting a combination with too high a sensitivity. Given the large number of EQ maps, the large number of electrical parameters, and the multiplicity of signal eyes that are produced by on-die probes [15] for observation, finding the optimal subset of coefficients is usually a very challenging task.



Fig. 5. Objective function values across iterations.

#### IV. OBJECTIVE FUNCTION FORMULATION AND OPTIMIZATION

We aim at finding the optimal set of coefficients to maximize the functional eye diagram based on margins response. Here we follow our work in [16] to define the corresponding objective function. Let  $\mathbf{R}_{m} \in \Re^{2}$  denote the electrical system margins response, which consists of the width and height of the functional eye diagram,

$$\boldsymbol{R}_{\mathrm{m}} = \boldsymbol{R}_{\mathrm{m}}(\boldsymbol{x}, \boldsymbol{\psi}, \boldsymbol{\delta}) = \begin{bmatrix} e_{\mathrm{w}}(\boldsymbol{x}, \boldsymbol{\psi}, \boldsymbol{\delta}) & e_{\mathrm{h}}(\boldsymbol{x}, \boldsymbol{\psi}, \boldsymbol{\delta}) \end{bmatrix}^{T} \tag{5}$$

where $e_{w} \in \Re$ and $e_{h} \in \Re$ are the width and height, respectively, of the eye diagram. The eye width and height are functions of the coefficient values $(C_{m}, C_{p}, C_{r})$ contained in vector $\boldsymbol{x}$, and they also depend on the operating conditions $(\boldsymbol{\psi})$ and the connected devices $(\boldsymbol{\delta})$.

We aim at finding the optimal set of coefficient values to maximize the functional eye diagram area. Therefore, an initial objective function to be minimized is defined as

$$u(\mathbf{x}) = -\left[e_{w}(\mathbf{x}, \boldsymbol{\psi}, \boldsymbol{\delta})\right]\left[e_{h}(\mathbf{x}, \boldsymbol{\psi}, \boldsymbol{\delta})\right] \tag{6}$$

Based on the operating conditions and devices, the eye diagram may be decentered with respect to the eye-width (asymmetry  $e_{wa}$ ), eye-height (asymmetry  $e_{ha}$ ), or both. Hence, the objective function must consider these asymmetries. The area of the eye diagram and the asymmetries are scaled by weighting factors  $w_1, w_2, w_3 \in \Re$  such that they become comparable. Hence, a better objective function is defined as

$$u(\mathbf{x}) = -w_{1}[e_{w}(\mathbf{x},\boldsymbol{\psi},\boldsymbol{\delta})][e_{h}(\mathbf{x},\boldsymbol{\psi},\boldsymbol{\delta})] + w_{2}[e_{wa}(\mathbf{x},\boldsymbol{\psi},\boldsymbol{\delta})] + w_{3}[e_{ha}(\mathbf{x},\boldsymbol{\psi},\boldsymbol{\delta})] \tag{7}$$

with  $w_1$ ,  $w_2$ , and  $w_3$  calculated from

$$w_{1} = \frac{3}{\frac{1}{n} \sum_{i=1}^{n} \left[ e_{w}(\mathbf{x}^{(i)}) \right] \left[ e_{h}(\mathbf{x}^{(i)}) \right]} \tag{8}$$

$$w_{2} = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} e_{wa}(\mathbf{x}^{(i)})} \tag{9}$$

$$w_{3} = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} e_{ha}(\mathbf{x}^{(i)})} \tag{10}$$

Fig. 6. Normalized coefficients responses across iterations.

where  $x^{(i)}$  are *n* randomly distributed base points (i = 1, ..., n) for initial measurements of eye width and eye height.
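The weighting scheme can be sketched in a few lines of Python. The measurement values below are hypothetical placeholders, not lab data, and the function names are introduced here only for illustration; the numerator of $w_1$ follows (8) as printed.

```python
import numpy as np

def weights_from_base_points(e_w, e_h, e_wa, e_ha):
    """Weights (8)-(10) from n base-point measurements (1-D arrays)."""
    e_w, e_h, e_wa, e_ha = (np.asarray(a, dtype=float)
                            for a in (e_w, e_h, e_wa, e_ha))
    w1 = 3.0 / np.mean(e_w * e_h)   # scales the eye area term, per (8)
    w2 = 1.0 / np.mean(e_wa)        # scales the eye-width asymmetry, per (9)
    w3 = 1.0 / np.mean(e_ha)        # scales the eye-height asymmetry, per (10)
    return w1, w2, w3

def objective(e_w, e_h, e_wa, e_ha, w):
    """Objective function (7) for one or several measured points."""
    w1, w2, w3 = w
    return -w1 * e_w * e_h + w2 * e_wa + w3 * e_ha

# Hypothetical base-point measurements in ticks (n = 3)
e_w = np.array([20.0, 24.0, 22.0])
e_h = np.array([25.0, 27.0, 23.0])
e_wa = np.array([2.0, 4.0, 3.0])
e_ha = np.array([1.0, 3.0, 2.0])

w = weights_from_base_points(e_w, e_h, e_wa, e_ha)
u = objective(e_w, e_h, e_wa, e_ha, w)
print(w, u, u.mean())
```

By construction, the three weighted terms average to 3, 1, and 1 over the base points, so the mean of $u$ over those points is $-3 + 1 + 1 = -1$; this is one way to see how the weights keep the area term dominant.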

The optimization problem for system margining is then defined as,

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} u(\mathbf{x}) \tag{11}$$

with  $u(\mathbf{x})$  defined by (7).

As described in Section III, we need to ensure that the optimal system margins response lies within a suitable area of the coefficient search space of the EQ map. To satisfy this requirement, the four margin responses around $u(x^*)$ must be at least 80% of the value of $u(x^*)$, as shown in Fig. 3, where $u_{i,j}$ are the objective function values per (7) for the *i*-th $C_m$ and *j*-th $C_p$ values. Here $C_m$ and $C_p$ denote the vectors of Tx FIR pre-cursor and post-cursor values, respectively, and $C_r$ is the vector of Rx CTLE coefficient values. This avoids selecting an optimal solution with too high a sensitivity.
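The 80% neighborhood rule can be expressed as a small check on one EQ map. This is a sketch under stated assumptions: the map values are hypothetical, $u$ is taken to be negative near the optimum, and cells outside the map or marked as invalid (NaN) are treated as failing the check, which is a choice made here rather than something the text specifies.

```python
import numpy as np

def neighbours_ok(u_map, i, j, ratio=0.8):
    """True if the four cells around (i, j) keep at least `ratio` of |u[i, j]|."""
    u_star = u_map[i, j]
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        inside = 0 <= ni < u_map.shape[0] and 0 <= nj < u_map.shape[1]
        # Assumption: cells outside the map or marked NaN fail the check.
        if not inside or np.isnan(u_map[ni, nj]):
            return False
        # With negative u, a neighbour keeps 80% of the response
        # when u_neighbour - ratio * u_star <= 0.
        if u_map[ni, nj] - ratio * u_star > 0:
            return False
    return True

# Hypothetical objective values on a 3x3 patch of one EQ map
u_map = np.array([[-0.50, -0.90, -0.40],
                  [-0.90, -1.00, -0.85],
                  [-0.30, -0.82, -0.20]])
u_spiky = u_map.copy()
u_spiky[1, 2] = -0.50   # one neighbour drops to 50% of the centre value

print(neighbours_ok(u_map, 1, 1), neighbours_ok(u_spiky, 1, 1))
```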

We now modify the optimization problem such that the optimal set of coefficients maximizes the system margins response without exceeding the limit of  $0.8u(x^*)$  in the vicinity. The new optimization problem can be defined through a constrained formulation,

$$\begin{aligned} \mathbf{x}^{*} &= \arg\min_{\mathbf{x}} u(\mathbf{x}) \\ &\text{subject to } l_1(\mathbf{x}) \le 0, \ l_2(\mathbf{x}) \le 0, \ l_3(\mathbf{x}) \le 0, \ l_4(\mathbf{x}) \le 0 \end{aligned} \tag{12}$$

with

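$$l_{1}(\mathbf{x}) = u(C_{{\rm m}i^{*}+1}, C_{r}, C_{{\rm p}j^{*}}, \boldsymbol{\psi}, \boldsymbol{\delta}) - 0.8\,u(C_{{\rm m}i^{*}}, C_{r}, C_{{\rm p}j^{*}}, \boldsymbol{\psi}, \boldsymbol{\delta}) \tag{13}$$

$$l_{2}(\mathbf{x}) = u(C_{{\rm m}i^{*}-1}, C_{r}, C_{{\rm p}j^{*}}, \boldsymbol{\psi}, \boldsymbol{\delta}) - 0.8\,u(C_{{\rm m}i^{*}}, C_{r}, C_{{\rm p}j^{*}}, \boldsymbol{\psi}, \boldsymbol{\delta}) \tag{14}$$

$$l_{3}(\mathbf{x}) = u(C_{{\rm m}i^{*}}, C_{r}, C_{{\rm p}j^{*}+1}, \boldsymbol{\psi}, \boldsymbol{\delta}) - 0.8\,u(C_{{\rm m}i^{*}}, C_{r}, C_{{\rm p}j^{*}}, \boldsymbol{\psi}, \boldsymbol{\delta}) \tag{15}$$

$$l_{4}(\mathbf{x}) = u(C_{{\rm m}i^{*}}, C_{r}, C_{{\rm p}j^{*}-1}, \boldsymbol{\psi}, \boldsymbol{\delta}) - 0.8\,u(C_{{\rm m}i^{*}}, C_{r}, C_{{\rm p}j^{*}}, \boldsymbol{\psi}, \boldsymbol{\delta}) \tag{16}$$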

where $C_{{\rm m}i^*}$ and $C_{{\rm p}j^*}$ are the coefficients that maximize the margins response for each of the $C_r$ values. Notice that we assume in (12)-(16) that $u(\mathbf{x})$ is negative around $\mathbf{x}^*$, which can be easily ensured by the weighting factors (8)-(10).



Fig. 7. Eye diagram results: comparing the proposed methodology ( $R_f(x^*)$ ) with three different seeds against the exhaustive method ( $R_f(x^*)$ ).

A more convenient unconstrained formulation can be defined by adding a penalty term, as

$$U(\mathbf{x}) = u(\mathbf{x}) + \gamma_0^l |L(\mathbf{x})|^2 \tag{17}$$

where L(x) is a corner limits penalty function, defined as

$$L(\mathbf{x}) = \max\{0, l_1(\mathbf{x}), l_2(\mathbf{x}), l_3(\mathbf{x}), l_4(\mathbf{x})\} \tag{18}$$

The optimal solution depends on the value of the penalty coefficient $\gamma_0^l \in \Re$. We define $\gamma_0^l$ as

$$\gamma_0^l = \frac{|u(\mathbf{x}^{(0)})|}{\left|\max\{l_1(\mathbf{x}^{(0)}), l_2(\mathbf{x}^{(0)}), l_3(\mathbf{x}^{(0)}), l_4(\mathbf{x}^{(0)})\}\right|^2} \tag{19}$$

where $\mathbf{x}^{(0)}$ is the starting point. Then, the optimization problem for the system margins response becomes

$$\mathbf{x}^{*} = \arg\min_{\mathbf{x}} U(\mathbf{x}) \tag{20}$$

with

$$U(\mathbf{x}) = -w_1 [e_w(\mathbf{x}, \boldsymbol{\psi}, \boldsymbol{\delta})] [e_h(\mathbf{x}, \boldsymbol{\psi}, \boldsymbol{\delta})] + w_2 [e_{wa}(\mathbf{x}, \boldsymbol{\psi}, \boldsymbol{\delta})] + w_3 [e_{ha}(\mathbf{x}, \boldsymbol{\psi}, \boldsymbol{\delta})] + \gamma_0^l |L(\mathbf{x})|^2 \tag{21}$$
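The penalty formulation (17)-(19) can be sketched as follows. The function names and all numerical values are hypothetical, and the guard against a zero denominator in the penalty coefficient is an assumption added here, since (19) does not define that case.

```python
def corner_penalty(u_star, u_neighbours):
    """L(x) per (18): largest violation of l_k = u_k - 0.8 u*, floored at 0."""
    return max(0.0, *(u_k - 0.8 * u_star for u_k in u_neighbours))

def penalty_coefficient(u0, u0_neighbours):
    """Penalty coefficient per (19), evaluated once at the starting point."""
    worst = max(u_k - 0.8 * u0 for u_k in u0_neighbours)
    # Assumption: return 0 when the denominator of (19) would be zero.
    return abs(u0) / worst**2 if worst != 0 else 0.0

def penalized_objective(u_star, u_neighbours, gamma):
    """U(x) per (17): objective plus the squared corner-limits penalty."""
    return u_star + gamma * corner_penalty(u_star, u_neighbours) ** 2

# Hypothetical values at the starting point: one neighbour violates the 80% rule
u0, nb0 = -1.0, [-0.9, -0.6, -0.85, -0.95]
gamma = penalty_coefficient(u0, nb0)
print(gamma, penalized_objective(u0, nb0, gamma))

# Hypothetical later point where all four neighbours satisfy the rule
print(penalized_objective(-1.2, [-1.0, -1.1, -0.97, -1.0], gamma))
```

With this choice of penalty coefficient, a violated constraint at the starting point contributes a penalty equal in magnitude to $|u(\mathbf{x}^{(0)})|$, so the two terms of (17) start on a comparable scale; when no constraint is violated, $U(\mathbf{x})$ reduces to $u(\mathbf{x})$.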

We aim at finding the optimal set of coefficients values  $x^*$  by solving (20). We look for a low cost computational optimization technique without the need of estimating gradients. The combination of pattern search method [17] and the Nelder-Mead method [18] is a good approach to deal with our objective function which contains many local minima. We start the optimization with pattern search, which serves for exploring the design space until finding a potential region where the global minimum is located. Then, the solution found by pattern search is used as seed for the Nelder-Mead method, which further minimizes the objective function for a more precise solution.
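The two-stage strategy can be sketched as below. This is an illustration under stated assumptions, not the implementation used on the validation platform: `U` is a hypothetical continuous stand-in with several local minima (the real objective is measured in the lab on discrete coefficients), and `pattern_search` is a basic hand-written coordinate search. Only the Nelder-Mead stage relies on an existing SciPy routine, `scipy.optimize.minimize` with `method="Nelder-Mead"`.

```python
import numpy as np
from scipy.optimize import minimize

def U(x):
    """Hypothetical multimodal stand-in for the measured objective U(x)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x**2) + 2.0 * np.sum(1.0 - np.cos(3.0 * x)))

def pattern_search(f, x0, step=1.0, min_step=0.05, max_iter=200):
    """Basic coordinate pattern search: poll +/- step along each axis."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for k in range(x.size):
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[k] += sign * step
                ft = f(trial)
                if ft < fx:              # accept any improving poll point
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5                  # shrink the pattern when polling fails
            if step < min_step:
                break
    return x, fx

# Stage 1: coarse exploration from a hypothetical starting point (C_m, C_p, C_r)
x_seed, u_seed = pattern_search(U, x0=[2.5, -1.5, 2.0])

# Stage 2: local refinement with Nelder-Mead, seeded by the pattern search result
result = minimize(U, x_seed, method="Nelder-Mead",
                  options={"xatol": 1e-4, "fatol": 1e-6})
print(x_seed, u_seed, result.x, result.fun)
```

The large initial step lets the first stage jump over shallow local minima, while the simplex stage only refines locally; neither stage needs gradient estimates, which matters when every function evaluation is a lab measurement.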

#### V. SYSTEM TEST SETUP

The system under test is an Intel post-silicon validation platform involving a CPU and a platform controller hub (PCH) [9]. The PCIe link is exercised at the packet level with a protocol add-in test card which emulates the external device, as shown in Fig. 4. Measurements are based on a process called system margin validation (SMV) [19], a methodology that uses internal test circuitry to assess how much margin the design has with respect to silicon process, voltage, and temperature. The optimization algorithm described in Section IV is implemented in Python, using the SciPy [20] modules for the Nelder-Mead and pattern search algorithms.

Fig. 8. Eye diagram results: comparing the proposed methodology ($R_f(x^*)$) against the initial design ($R_f(x^{(0)})$) and the exhaustive method ($R_f(x^*)$ exhaustive).

#### VI. RESULTS

Through the optimization process defined in Section IV, we arrive at a set of Tx and Rx coefficients in just 47 iterations, as shown in Fig. 5, which are executed in 4 hours. The initial optimization with pattern search finds a solution at iteration 25, which is used as the seed for the Nelder-Mead method to refine the solution. Fig. 6 shows the evolution of the three coefficients during the optimization process.

In order to confirm the robustness of our technique, we ran the procedure three times using different seeds for the pattern search method. The optimized equalization coefficients were programmed into the system BIOS to measure the Rx eye diagrams, as shown in Fig. 7. The optimal eye diagrams found do not show significant differences between them (from a practical engineering point of view), confirming the robustness of our approach.

A comparison of the eye diagrams obtained with the proposed methodology, the initial design, and the exhaustive method is shown in Fig. 8. The optimized equalization coefficients yield an eye diagram with $e_h$ and $e_w$ of 30 ticks and 27 ticks, respectively, which corresponds to a 35% improvement in eye diagram area compared to that obtained with the initial coefficients. Even though the optimized coefficients show a 6% decrease in eye diagram area compared to the exhaustive method, the efficiency of this approach is demonstrated by the reduction of the eye diagram asymmetries (a more centered eye diagram) and a significant time reduction in post-silicon validation. While the exhaustive method requires 24 machine-hours of data collection plus 8 man-hours of data (EQ maps) analysis for a complete optimization (prone to human errors), the method proposed here can be completed in just 4 hours.
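The figures quoted above can be cross-checked with a short calculation. The implied areas for the initial and exhaustive cases are derived here, not reported measurements; they assume that the 35% and 6% figures are relative to the initial and exhaustive areas, respectively, and that the exhaustive effort is the plain sum of the data collection and analysis hours.

$$\begin{aligned}
A_{\rm opt} &= e_h\, e_w = 30 \times 27 = 810 \ \text{ticks}^2 \\
A_{\rm init} &\approx 810 / 1.35 = 600 \ \text{ticks}^2 \\
A_{\rm exh} &\approx 810 / 0.94 \approx 862 \ \text{ticks}^2 \\
t_{\rm exh} / t_{\rm opt} &= (24 + 8)\ \text{h} \,/\, 4\ \text{h} = 8
\end{aligned}$$

Under these assumptions, the proposed method trades roughly 52 ticks$^2$ of eye area for an eightfold reduction in total effort.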

In the initial PCIe link optimizations that we performed, the eye-width and eye-height asymmetries were not considered in the objective function since they caused too many local minima. However, the eye diagram obtained with the coefficients optimized using this initial approach yielded a maximized area but with a large eye-width asymmetry. We concluded that for PCIe, the eye diagram can be easily decentered with respect to the eye width, eye height, or both, depending on the operating conditions and devices. Hence, the objective function for PCIe link optimization must consider the asymmetries. We found that the combination of the pattern search and Nelder-Mead methods is a good approach to deal with our objective function containing many local minima.

#### VII. CONCLUSION

We proposed a direct optimization approach for PCIe link equalization based on a suitable objective function formulation to efficiently tune the Tx FIR filter and Rx CTLE EQ coefficients, mitigating ISI and other undesired channel effects while complying with the PCIe specification. The optimized EQ coefficients were evaluated by measuring the real eye diagram of the physical system, demonstrating a substantial mitigation of ISI and channel effects. The approach also shortens the long time typically required for Tx and Rx EQ coefficient tuning, significantly enhancing current industrial PCIe Tx/Rx tuning practices in post-silicon validation.

#### ACKNOWLEDGEMENT

We thank Andres Viveros-Wacher, Victor Castillo, and Carolina Olea, from Intel Corp., who greatly assisted this research.

#### REFERENCES

- [1] A. Wilen, J. Shade, and R. Thornburg, *Introduction to PCI Express: A Hardware and Software Developer's Guide*. Hillsboro, OR: Intel Press, 2003.
- [2] PCI SIG Org. (2017), PCI Express® Base Specification Revision 4.0 Version 1.0 [Online]. Available: https://pcisig.com/specifications.
- [3] A. Gatto, P. Parolari, M. Brunero, F. Corapi, V. Costa, C. Meani, and P. Boffi, "Intra-datacenter links exploiting PCI express generation 4 interconnections," in *Optical Fiber Communications Conference and Exhibition (OFC)*, Los Angeles, CA, March 2017, pp. 1-3.
- [4] D. Gonzales, "PCI express 4.0 electrical previews," PCI-SIG Developers Conference, Tel Aviv, Israel, March 2015.
- [5] Y. Ren, Y. W. Kim, and H. Y. Kim, "Implementation of system interconnection devices using PCI express," in *IEEE International Conference on Consumer Electronics (ICCE)*, Las Vegas, NV, Jan. 2015, pp. 281-282.
- [6] B. Casper, P. Pupalaikis, and J. Zerbe, "Serial data equalization," in 12th Conference and Technology Exhibition DesignCon 2007, Santa Clara, CA, Jan. 2007.
- [7] V. Stojanovic and M. Horowitz, Stanford University Lecture, Modeling and Analysis of High-Speed Links [Online]. Available: http://www.rle.mit.edu/isg/documents/Stojanovic\_CICC03.pdf
- [8] M. Li, *Jitter, Noise, and Signal Integrity at High-Speed.* Boston, MA: Prentice Hall, 2007.
- [9] F. Rangel-Patino, A. Viveros-Wacher, J. E. Rayas-Sanchez, E. A. Vega-Ochoa, I. Duron-Rosales, and N. Hakim, "A holistic methodology for system margining and jitter tolerance optimization in post-silicon validation," in *IEEE MTT-S Latin America Microwave Conf. (LAMC-2016)*, Puerto Vallarta, Mexico, Dec. 2016, pp. 1-4.

- [10] ALTERA. (2013). FPGAs at 40 nm and >10 Gbps: Jitter-, Signal integrity-, Power-, and Process-Optimized Transceivers [Online]. Available: https://www.altera.com/en\_US/pdfs/literature/wp/wp-01092stratix-iv-gt-10gbps-transceivers.pdf.
- [11] I. Duron-Rosales, F. Rangel-Patino, J. E. Rayas-Sánchez, J. L. Chávez-Hurtado, and N. Hakim, "Reconfigurable FIR filter coefficient optimization in post-silicon validation to improve eye diagram for optical interconnects," in *Int. Caribbean Conf. Devices, Circuits, and Systems* (ICCDCS-2017), Cozumel, Mexico, Jun. 2017, pp. 85-88.
- [12] D. R. Stauffer, J. T. Mechler, M. Sorna, K. Dramstad, C. R. Ogilvie, A. Mohammad, and J. Rockrohr, *High Speed SERDES Devices and Applications*, New York, NY: Springer, 2008.
- [13] LOGIC FRUIT. S. Kumar, and P. Agarwal, PCI Express 3.0 Equalization: The Mystery Unsolved, [Online]. White Paper, Logic Fruit Technologies. Available: http://www.logic-fruit.com.
- [14] PCI SIG Org. (2012), Designing to the new PCI Express 3.0 equalization requirements [Online]. Available: https://pcisig.com/specifications.
- [15] P. Iyer, S. Jain, B. Casper, and J. Howard, "Testing high-speed IO links using on-die circuitry," in 19th International Conference on VLSI Design, Hyderabad, India, Jan. 2006.
- [16] F. E. Rangel-Patiño, A. Viveros-Wacher, J. E. Rayas-Sánchez, I. Durón-Rosales, E. A. Vega-Ochoa, N. Hakim, and E. López-Miralrio, "A holistic formulation for system margining and jitter tolerance optimization in industrial post-silicon validation," *IEEE Trans. Emerging Topics Computing*, vol. \*\*, no. \*\*, pp. \*\*, \*\*\* 2017. (published online: 29 Sep. 2017; DOI: 10.1109/TETC.2017.2757937).
- [17] R. Hooke and T. A. Jeeves, "Direct search solution of numerical and statistical problems," *Journal of the Association for Computing Machinery*, vol. 8, no. 2, pp. 212-229, April 1961.
- [18] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright, "Convergence properties of the Nelder-Mead simplex method in low dimensions," *Society for Industrial and Applied Mathematics J. Optim.*, vol. 9, no. 1, pp. 112–147, 1998.
- [19] F. E. Rangel-Patiño, J. L. Chávez-Hurtado, A. Viveros-Wacher, J. E. Rayas-Sánchez, and N. Hakim, "System margining surrogate-based optimization in post-silicon validation," *IEEE Transactions on Microwave Theory and Techniques*, vol. 65, no. 9, pp. 3109-3115, May 2017.
- [20] SciPy.org. (2017), Open-source Software for Mathematics, Science, and Engineering [Online]. Available: https://docs.scipy.org/doc/scipy/reference/index.html.