Abstract-This paper investigates techniques to speed up HSSI bit-error rate (BER) and jitter testing. The proposed oversampling-based transmitter test scheme accelerates transmitter jitter and eye diagram testing by means of a multiphase bit-error rate test circuit (BERT). Parallel BERT elements digitize the jitter behavior of the input signal in a multiphase manner. We accurately extract the transmitter jitter in the time domain and finish the whole transmitter test within tens of milliseconds, which improves on the current norm of 100 ms.
I. INTRODUCTION
High speed serial interfaces (HSSI), interchangeably referred to as SerDes, are a relatively recent means of high speed communication between blocks, electronic integrated circuits, circuit boards and systems. There are numerous HSSI standards addressing different applications, including SATA, XAUI, and Fibre Channel [1]. HSSI are now finding widespread use in consumer devices via High-Definition Multimedia Interface (HDMI) and USB. Increasing demand for more bandwidth is continuously pushing HSSI toward higher data rates [2]. As the data rate of HSSI devices keeps increasing, jitter, noise and signal integrity challenges are bound to become even harder to tackle, and stricter silicon validation will become necessary [3]. While jitter and bit-error rate (BER) are just two of the hundreds of parameters to be tested on an average device, testing them is extremely time-consuming [2] [3].
Our aim is to improve the speed of BER testing. This is achieved by creating parallel bit-error rate test (BERT) elements that work concurrently. These elements must be able to digitize the jitter behavior of the input signal in a multiphase manner. The more phases are deployed, the faster the test will be. Depending on the target data rate, we deploy 16, 8, 4, or 2 phases, and aim at obtaining BER-scan plots 16, 8, 4, or 2 times faster, respectively, in the four corresponding parallel modes.
With our scheme, we accelerate transmitter jitter and eye diagram tests. The jitter and eye diagram data can be accurately extracted from BER test results. Using the dual-Dirac model [4], jitter components can be extracted from the BER-scan plots. The dual-Dirac model was used since it enhances the speed of jitter testing by extrapolating bathtub curves to lower BER levels [8].
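The dual-Dirac extrapolation mentioned above can be illustrated with a minimal sketch. It assumes the commonly quoted relation TJ(BER) = DJ + 2 Q(BER) RJ, where Q is taken as the Gaussian tail quantile; any transition-density scaling is omitted, and the function names are hypothetical rather than part of the test firmware.

```python
from statistics import NormalDist


def q_factor(ber: float) -> float:
    """Gaussian tail quantile Q such that P(X > Q) = ber for a standard normal X."""
    return -NormalDist().inv_cdf(ber)


def total_jitter(dj_dd: float, rj_rms: float, ber: float = 1e-12) -> float:
    """Dual-Dirac estimate: TJ(BER) = DJ(dd) + 2 * Q(BER) * RJ(rms).

    Assumption: all jitter values share one unit (for example UI).
    """
    return dj_dd + 2.0 * q_factor(ber) * rj_rms
```

Under this convention, Q at a BER of 1e-12 is close to 7.03, so the random-jitter term is scaled by roughly 14 at that BER level.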
The overall testing scheme has been verified at data rates up to 6.4 Gbps using test hardware from DFT Microsystems. Using the DJ60HS SerDes Test Module from DFT Microsystems as a demonstration platform, we implemented the proposed scheme and compared its accuracy and throughput with conventional BER-scan techniques.
II. BACKGROUND
The proposed SerDes test solution makes use of oversampling to cover a wide range of data rates and at the same time achieve a good performance in terms of jitter. As the PLL operating frequency range cannot cover the low data rates, the data is oversampled by a suitable factor, such that the sampling frequency lies within the acceptable range of the PLL. It should be noted that the factor by which the signal is oversampled affects the performance in terms of jitter. In fact, the sampling frequency of the oversampled signal, in spite of being achievable by the PLL, might become marginal with respect to the PLL operating frequency range. This could impact the performance, since more jitter exists closer to the ends of the range of the PLL operating frequency.
Although the usefulness of the oversampling technique is limited by the transceiver speed, with available transceivers now reaching 28 Gbps, the oversampling technique can be used efficiently for data rates up to 14 Gbps. This covers most telecommunication standards in use today. The oversampling factors applied to different data rates are shown in TABLE 1, while the BERT diagram is in Fig. 1.
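As an illustration of how an oversampling factor might be chosen, the sketch below picks the factor whose sampling rate lies inside the PLL range and farthest from both ends of that range, since more jitter is expected near the ends. The PLL limits, the candidate factors, and the function name are hypothetical; the factors actually used are those of TABLE 1.

```python
from typing import Optional, Sequence


def pick_oversampling_factor(
    data_rate_gbps: float,
    pll_min_gbps: float,
    pll_max_gbps: float,
    factors: Sequence[int] = (1, 2, 4, 8, 16),
) -> Optional[int]:
    """Return the factor whose sampling rate sits farthest from both PLL range edges.

    Returns None when no candidate factor brings the rate inside the range.
    """
    best, best_margin = None, -1.0
    for factor in factors:
        rate = data_rate_gbps * factor
        if not (pll_min_gbps <= rate <= pll_max_gbps):
            continue  # sampling rate not achievable by the PLL
        margin = min(rate - pll_min_gbps, pll_max_gbps - rate)
        if margin > best_margin:
            best, best_margin = factor, margin
    return best
```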
The oversampled data passes through a Decimator, where the extra samples brought in by oversampling are dropped out. For instance, in 4X mode, out of each four sampled bits, only one bit is kept. Hence, the Decimator slows down the rate of the received bit stream back to its original rate. Then, the BERT engine performs a bit-by-bit comparison between the data out of the Decimator and the reference bit stream.
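The decimation and comparison steps described above can be sketched in software as follows. This is only a behavioral model with hypothetical function names; the actual Decimator and BERT engine are hardware blocks.

```python
from typing import List, Sequence


def decimate(samples: Sequence[int], ratio: int, keep: int = 0) -> List[int]:
    """Keep one sample out of every `ratio` samples (index `keep` within each group)."""
    return list(samples[keep::ratio])


def count_bit_errors(received: Sequence[int], reference: Sequence[int]) -> int:
    """Bit-by-bit comparison over the overlapping length of the two streams."""
    return sum(r != e for r, e in zip(received, reference))
```

For example, in 4X mode `decimate(samples, 4)` returns the stream at its original rate, which `count_bit_errors` then compares against the reference.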
Before the BERT engine (Fig. 2) starts bit-error evaluation, it performs a synchronization process to align the two bit streams. Bit alignment is required because the gigabit receiver deserializes the incoming bits with an arbitrary bit alignment. Synchronization is performed iteratively: if the number of mismatches between the two bit streams is greater than a certain threshold value (e.g., 3), the reference pattern is shifted by one bit. The Bit Shifter block processes shift requests of less than 512 bits. If the requested shift is more than 512 bits, the pattern memory address is incremented by one, since each location holds 512 bits.
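The iterative synchronization can be modeled in a few lines. The sketch assumes a repeating reference pattern and a comparison window equal to the received block; the function name and the return convention are hypothetical.

```python
from typing import Optional, Sequence


def synchronize(
    received: Sequence[int], pattern: Sequence[int], threshold: int = 3
) -> Optional[int]:
    """Return the first shift of the repeating `pattern` that matches `received`
    with at most `threshold` mismatches, or None if no shift qualifies."""
    n = len(pattern)
    for shift in range(n):  # at most one full pattern length of shifts
        mismatches = sum(
            bit != pattern[(i + shift) % n] for i, bit in enumerate(received)
        )
        if mismatches <= threshold:
            return shift  # streams considered aligned
    return None
```

The loop bound reflects the fact that the number of shifts needed never exceeds the length of the reference pattern.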
III. MULTI-PHASE BERT
To obtain a full bathtub curve, BER evaluation should be performed over 2 UI (unit intervals) with fine time resolution. The sampling delay is swept over 2 UI to provide the receiver of the BERT with samples in that interval. The idea of multi-phase BERT (MPB) is that, instead of dropping the samples taken at the other sampling phase delays, we use them to further accelerate the BER-scan test. In the parallel modes we receive 16, 8, 4, or 2 samples, which can be used to drive multiple parallel BERT elements. Hence, with MPB we only need to sweep the sampling phase delay over a fraction of a UI. In the four parallel modes, sweeping the sampling delay over UI, UI/2, UI/4, and UI/8, respectively, results in a full bathtub curve (Fig. 3).
MPB also reduces the overhead of communication between hardware and software. Each BER value requires three software commands: set the phase, start the BER measurement, and read the bit-error value. MPB reduces this overhead by performing multiple BER measurements concurrently while setting the sampling delay on only one of the phases.
To make use of all the samples, we remove the Decimator block in Fig. 1. The BERT now receives the data directly from the SerDes and counts bit errors with respect to the bit stream received from the pattern generator or pattern memory. However, the rate of the reference pattern is not the same as the rate of the received pattern. For instance, in 4X mode the received pattern is 4 times faster than the reference: in the time it takes the pattern generator or pattern memory to provide the BERT with one reference bit, the SerDes samples the received bit at 4 different phases. Hence, the reference pattern must be rearranged to allow a valid comparison. To address this issue, we place a Duplicator block between the BERT engine and the pattern generator. The Duplicator repeats each reference bit a number of times equal to the oversampling ratio.
The Duplicator mainly consists of a FIFO that is written at the rate f and read at the rate oversampling-ratio × f.
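Functionally, the Duplicator amounts to repeating each reference bit. A minimal behavioral sketch, with a hypothetical function name and ignoring the FIFO clocking, is:

```python
from typing import Iterable, Iterator


def duplicate(reference_bits: Iterable[int], oversampling_ratio: int) -> Iterator[int]:
    """Emit each reference bit `oversampling_ratio` times in a row."""
    for bit in reference_bits:
        for _ in range(oversampling_ratio):
            yield bit
```

Because the output carries `oversampling_ratio` bits for every input bit, it has the same rate as the oversampled received stream.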
We aim to achieve multiple bit-error evaluations in parallel, while minimizing the additional hardware cost as much as we can. We utilize a single BERT engine that can perform the bit-error evaluation of up to 16 phases in parallel. This requires having 16 parallel comparators, each corresponding to a single phase.
Out of each 32-bit word of both the reference and the received bit stream, 32/oversampling-ratio bits belong to each phase. To measure the number of bit errors at each phase separately, we send both input streams to all 16 comparators and define a control signal as an input to each comparator that masks out the reference and sampled bits of the other phases. To synchronize the reference bit stream with the received bit stream, we track only the bit-error count of the first counter, since the other sampled phases are at a known distance from the first phase. Hence, when the number of mismatches on the first bit-error counter is less than the threshold value, the two bit streams are aligned.
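The masked per-phase comparison can be sketched as below. The sketch assumes that bit position i of a word belongs to phase i mod oversampling-ratio; this bit ordering and the function names are assumptions made for illustration, not details taken from the hardware.

```python
from typing import List

WORD_BITS = 32


def phase_mask(phase: int, ratio: int) -> int:
    """Mask selecting bit positions phase, phase + ratio, ... of a 32-bit word."""
    mask = 0
    for pos in range(phase, WORD_BITS, ratio):
        mask |= 1 << pos
    return mask


def count_errors_per_phase(received: int, reference: int, ratio: int) -> List[int]:
    """Count mismatching bits of one word separately for each phase."""
    diff = (received ^ reference) & 0xFFFFFFFF  # 1 where the two words disagree
    return [bin(diff & phase_mask(k, ratio)).count("1") for k in range(ratio)]
```

Each list entry plays the role of one comparator output: the mask hides the bits of all other phases before the mismatches are counted.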
One issue regarding the synchronization process in the multi-phase BERT method is that the buffer inside the Bit Shifter block fills up with fewer reference bits than before. For instance, if the oversampling ratio is 4, each reference bit is duplicated 4 times, so only 512/4 = 128 reference bits suffice to fill the buffer inside the Bit Shifter. If, at a request for a shift of more than 512 bits, we kept incrementing the memory address by one, we would in fact skip 512 - 128 = 384 reference bits, whereas we should only shift the reference pattern from memory by 128 bits. To solve this issue, instead of shifting the reference pattern forward, we delay the received bit stream to achieve the requested alignment. To illustrate, consider again an oversampling ratio of 4: at a request for a 512-bit shift, instead of shifting the reference pattern by 128 bits, we delay the received pattern by 4 clock cycles.
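The buffer arithmetic above generalizes to the other oversampling ratios. The short sketch below only reproduces that arithmetic with hypothetical helper names; it does not model the delay applied to the received stream.

```python
BUFFER_BITS = 512  # bits held by one pattern memory location


def reference_bits_per_buffer(ratio: int) -> int:
    """Distinct reference bits that fill the buffer once each bit is duplicated."""
    return BUFFER_BITS // ratio


def skipped_reference_bits(ratio: int) -> int:
    """Reference bits skipped if the memory address were still incremented by one."""
    return BUFFER_BITS - reference_bits_per_buffer(ratio)
```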
The bit-error values that we receive in sequence are not in the order required to constitute a bathtub curve. Fig. 5 shows how the bit-error values should be arranged. Also, because a bathtub curve is a plot of the BER versus the sampling phase delay, the array for the horizontal axis should be arranged in the same way. For the phases at which we receive samples but do not explicitly set the sampling phase delay, we calculate the phase value by adding fractions of a UI to the phase delay of the first sample, at which we do set the sampling delay.
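The rearrangement can be sketched as follows. The spacing between adjacent phases is passed in as a parameter because it depends on the mode; that parameter, the data layout (one list of error counts per sweep step), and the function name are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple


def assemble_bathtub(
    sweep_delays_ui: Sequence[float],
    errors_per_step: Sequence[Sequence[int]],
    phase_spacing_ui: float,
    bits_compared: int,
) -> Tuple[List[float], List[float]]:
    """Return (delays, BER values) sorted by sampling delay.

    errors_per_step[s][k] is the error count of phase k at sweep step s.
    """
    points = []
    for delay, errors in zip(sweep_delays_ui, errors_per_step):
        for phase, count in enumerate(errors):
            # Delay of phase k = delay set on the first phase + k * spacing.
            points.append((delay + phase * phase_spacing_ui, count / bits_compared))
    points.sort(key=lambda point: point[0])
    delays = [point[0] for point in points]
    bers = [point[1] for point in points]
    return delays, bers
```

Sorting by the computed delay keeps the horizontal axis and the BER values in the same order, whatever order the values arrive in.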
IV. EXPERIMENTAL RESULTS
Multi-phase BERT has been implemented on the DJ60HS SerDes Test Module from DFT Microsystems. The goal was to validate functionality and performance and to establish the actual speed-up that can be obtained; hence, a number of tests were performed using the test firmware of DFT Microsystems.
The most popular solution for multi-gigabit SerDes testing in production is to loop back the output of the transmitter to the input of the receiver [6]. We perform the tests in serial loopback, which can be done either internally or through the loadboard.
A. Synchronization Tests
Synchronization should be performed before each BER test; otherwise, the BER result is not valid. Hence, we start by verifying the results of synchronization tests with different patterns. We perform the synchronization tests while sweeping the transmitter phase delay over 2 UI. As the maximum number of bit shifts required to align the two patterns is equal to the length of the reference pattern, we perform the tests with patterns of different lengths.
B. BER-Scan Tests
Next, we validate the functionality of our method by performing multiple BER-scan tests and observing the quality of the generated bathtub curves. Under different test parameters, such as data rate, pattern, duration, and phase resolution, we check whether the bathtub curves show glitches, discontinuities, bumps, etc. In all of the tests performed, the BER-scans produced the expected bathtub curves, consistent with the regular BER-scan results.
Apart from this, we evaluate the reliability of the bathtub curves by comparing their location with that of the bathtub curves generated by the regular design. As an estimate of the dislocation of the bathtub curves, we monitor their edge displacement. The edge of a bathtub curve on a linear scale is the 50% transition point. We perform each test 100 times with both the previous method and MPB, and we find the phase location of each bathtub edge. For our design to meet the 95% confidence level, the mean of its edge phases should fall within the 95% confidence interval of the regular design [7]. Transmitter jitter components, including random jitter (RJ), deterministic jitter (DJ), and total jitter (TJ), are extracted using the dual-Dirac model [5]. As before, we perform each test 100 times with both methods under the same conditions and compare the statistical results to evaluate the reliability of the information extracted from the bathtub curves.
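The edge-displacement check can be sketched as follows. The sketch assumes a normal approximation for the confidence interval of the mean, which is reasonable for 100 runs but is an assumption of this example; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def confidence_interval(
    samples: Sequence[float], level: float = 0.95
) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the mean of `samples`."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    half_width = z * stdev(samples) / sqrt(len(samples))
    centre = mean(samples)
    return centre - half_width, centre + half_width


def edges_agree(
    regular_edges: Sequence[float], mpb_edges: Sequence[float], level: float = 0.95
) -> bool:
    """True if the mean MPB edge lies inside the interval of the regular design."""
    low, high = confidence_interval(regular_edges, level)
    return low <= mean(mpb_edges) <= high
```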
C. Speed Enhancement
To verify the actual speed-up obtained with MPB, at each data rate we perform the BER-scan test 100 times and calculate the average run time with both the regular method and MPB. We aimed to achieve a linear speed-up in each mode. The speed enhancement actually achieved is lower, as expected: although the hardware calculates the bit-error values of multiple phases concurrently, we pass them to software one by one through a single port. We made this choice only because we did not want to change the register interface by adding 15 extra registers before evaluating the method. While all the bit-error values could be obtained with 16 read commands, our implementation requires 16 write and 16 read commands. This communication overhead causes the difference between the speed-up achieved and the one theoretically expected. It is also the reason that the speed-up at lower data rates can deviate more from the ideal factor: at lower data rates the UI is larger and many more BER measurements are performed (assuming that the time resolution remains the same). The same overhead makes tests in 1X mode slower than before, because instead of getting the bit-error value with one read command, we now require one write and one read command. Adding 15 more registers to the register interface would remove this overhead and bring us closer to the expected speed-up.
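The communication overhead can be made concrete by counting software commands. The model below is a rough sketch: it assumes one set-phase and one start command per sweep step, counts commands only, and ignores the duration of the BER measurements themselves; the function names are hypothetical.

```python
def commands_regular(points: int) -> int:
    """Regular design: set phase, start, and read for every BER point."""
    return 3 * points


def commands_mpb(points: int, phases: int, extra_registers: bool = False) -> int:
    """MPB: one set-phase and one start per sweep step, plus per-phase readout.

    Without extra registers each phase costs one write and one read;
    with one register per phase it costs a single read.
    Assumption: `points` is divisible by `phases`.
    """
    steps = points // phases
    readout = phases if extra_registers else 2 * phases
    return steps * (2 + readout)
```

With one phase, this count gives four commands per point instead of three, which matches the slowdown observed in 1X mode; with one register per phase the 1X count returns to three.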
D. Enhancement Cost
The relative cost of the speed enhancement is assessed by determining how many extra logic cells, flip-flops, embedded FPGA memory blocks, etc. are used to implement multi-phase BERT.
V. CONCLUSION
In this paper we presented a scheme to accelerate BER testing of HSSI using an oversampled multi-phase BERT engine. The method was implemented and incorporated in a DFT Microsystems SerDes Test Module. Such a module offers the flexibility to adapt the measurement algorithms, which was ideal for exploring our multi-phase technique. We demonstrated that a good speed-up was achieved while the fidelity of the measurements was preserved. While the speed-up achieved is relatively greater at lower data rates, the method allows an order of magnitude faster testing in the field when the largest number of phases is used.
