Abstract: In this paper, we demonstrate a novel 20-GSample/s burst-mode clock and data recovery (BM-CDR) technique for optical multiaccess networks. The BM-CDR incorporates an injection-locking method for clock recovery and a clock phase aligner employing space sampling with two multiphase clocks at 10 GHz and a phase-picking algorithm for automatic clock phase acquisition. The design provides low latency and fast response without requiring a reset signal from the network layer. The BM-CDR achieves a bit error rate $< 10^{-10}$ while featuring instantaneous (0-bit) phase acquisition for any phase step ($\pm 2\pi$ rad) between successive bursts. We also compare the data with probabilistic theoretical predictions to validate the experimental results.
Introduction
The past decade has seen profound changes not only in the way we communicate but also in our expectations of what networks will deliver in terms of speed and bandwidth. More specifically, the explosive growth in Internet traffic has spurred the development of fiber-to-the-home (FTTH) systems for high-speed broadband access [1], [2]. Among them, the passive optical network (PON) is recognized as the most economical FTTH solution to alleviate the bandwidth bottleneck in access networks [2], [3]. PON standards based on time-division multiplexing (TDM), such as the IEEE 802.3ah gigabit Ethernet PON (GEPON) [4] and the ITU-T G.984 gigabit-capable PON (GPON) [5], have already been widely deployed; FTTH rollout has recently surpassed 30 million users worldwide and continues to grow rapidly [6]. The coming decade promises to demand more capacity and bandwidth in these networks. In this context, the IEEE 802.3av 10G-EPON [7] has recently been standardized to attain a total bandwidth of 10 Gb/s. Related research activities are being aggressively pursued [8], [9] because of the inherently cost-effective user-shared configuration of 10G-EPON, its good backward compatibility, and its smooth upgradability from GEPON. It is no longer a question of "if" 10G-EPON holds great promise for next-generation optical access networks (NGA); it is a question of "when."
Fig. 1 is a schematic of a PON system showing our study in context. In the downstream direction, continuous data are broadcast from the optical line terminal (OLT) to the optical network units (ONUs) using TDM. The transmit side of the OLT and the receive side of the ONUs can therefore use continuous-mode integrated circuits (ICs). The challenge in the design of a chip set comes from the upstream data path. In the upstream direction, using time-division multiple access (TDMA), multiple ONUs transmit data to the OLT in the central office (CO). Because of optical path differences, the upstream traffic is inherently bursty, with asynchronous phase steps $|\Delta\varphi| \le 2\pi$ rad between the consecutive $k$th and $(k+1)$th packets. This inevitably causes conventional clock and data recovery (CDR) circuits to lose pattern synchronization, leading to packet loss. Preamble bits can be inserted at the beginning of each packet to allow the CDR feedback loop enough time to settle down and thus acquire lock. However, the use of a preamble introduces overhead, reducing the effective throughput and increasing delay. Consequently, an OLT requires a burst-mode (BM) CDR, which is responsible for phase recovery; this must be achieved at the beginning of every packet. The most important characteristics of the BM-CDR are its phase acquisition time, which must be as short as possible, and its robustness to long runs of consecutive identical digits (CIDs).
Different approaches have been proposed to build BM-CDRs with short phase acquisition times. The first approach, based on feedback, consists of trading off the loop bandwidth of a phase-locked loop (PLL)-based CDR to reduce the settling time [10]. The disadvantages include stability issues, jitter peaking, and limited jitter filtering. The second approach, based on feedforward, uses gated voltage-controlled oscillators (GVCOs) [11]. Here, clock phase alignment is done by triggering a local clock on each transition of the input data. Phase acquisition is rapid, but this solution is susceptible to pulse distortions and does not filter out input jitter. The last approach is based on oversampling. One can oversample either in the time domain [12]-[14] or in the space domain [9], [15]. Oversampling in time is achieved by using a clock frequency higher than the bit rate. This requires faster electronics operating at twice or thrice the aggregate bit rate, which wastes power, in addition to knowledge of a predefined unique delimiter (start of packet) that is exploited as a signature for phase picking. These BM-CDRs are limited to interpacket phase acquisition, that is, between consecutive packets. Oversampling in space requires multiphase clocks with a frequency equal to the bit rate. This requires the generation of multiple (8 to 10) phases of the clock with low skew between them. This technique suffers from high complexity and power consumption.
In this paper, we present a novel BM-CDR architecture based on injection locking and space sampling with a hybrid topology of feedback and feedforward. The BM-CDR uses electronics operated at the bit rate with only two phase clocks, leading to more efficient power consumption. Furthermore, the BM-CDR requires no a priori knowledge of the packet delimiter. Hence, this BM-CDR can also acquire phase in even more stringent conditions, that is, for intrapacket phase acquisition (within a packet), making it truly modular across application testbeds. Our 20-GSample/s (10 GHz $\times$ 2 phase clocks) BM-CDR achieves a bit error rate (BER) $< 10^{-10}$ with instantaneous (0-bit) phase acquisition for any phase step $|\Delta\varphi| \le 2\pi$ rad between consecutive bits, with no tradeoff in the loop bandwidth. The rest of this paper is organized as follows: Section 2 details the architecture of the proposed BM-CDR. Section 3 is devoted to the presentation and analysis of the experimental results. Finally, this paper is concluded in Section 4.
Novel BM-CDR Architecture
A block diagram of the proposed BM-CDR is shown in Fig. 2. The BM-CDR is composed of a clock recovery circuit (CRC) and a clock phase aligner (CPA). The CRC employs an injection-locking technique, whereas the CPA is based on oversampling in the space domain with two multiphase clocks and a phase-picking algorithm.
Under ideal conditions, with no intersymbol interference (ISI), error-free data recovery is achieved when the received data are sampled within a half-bit period of the nominal sampling point. For a conventional PLL-based CDR, the ideal sampling point of the recovered clock is in the center of the data eye, and the phase error process is modulo-$2\pi$ rad. For this BM-CDR, with $2\times$ oversampling using multiphase clocks separated by $\pi$ rad, the sampling points are located at $-\pi/2$ rad and $+\pi/2$ rad, respectively, from the center of the data bit. In this case, the phase error process is modulo-$\pi$ rad.
Clock Recovery
PON systems employ a simple binary amplitude modulation data format, nonreturn to zero (NRZ), for ease of detection. Random NRZ data have characteristic properties that directly influence the design of CRCs. The power spectral density (PSD) $S_{\mathrm{NRZ}}(f)$ of an NRZ data sequence with normalized average power of unity is expressed as

$$S_{\mathrm{NRZ}}(f) = T_b \left[ \frac{\sin(\pi f T_b)}{\pi f T_b} \right]^2 \tag{1}$$
where $f$ is the frequency parameter, and $T_b$ is the bit period. The spectrum of the NRZ data, as depicted in Fig. 3(a) (solid curve), exhibits no spectral component (nulls) at integer multiples of the bit rate frequency $f = n/T_b$, $n = 1, 2, \ldots$, thus providing no direct information for clock extraction. This implies that a CRC may lock to spurious signals instead of the bit rate frequency, or not lock at all. Furthermore, a linear time-invariant (LTI) operation cannot extract a periodic clock from these data [16]. However, the information about the frequency of the data can be extracted from the spacing between the data transitions. These transitions appear as the rising and falling edges of the data signal. Thus, a nonlinear function may be used to recover the clock. In Fig. 2, clock recovery is performed by placing a nonlinear element, an edge detector, in front of the data signal. The edge detection is performed by an XOR gate operating on the input data $D_{in}$ and its delayed replica $\hat{D}_{in}$, as illustrated in Fig. 4(a). Fig. 3(b) shows that the pulse $D'_{in}$ generated by the XOR gate indicates the data transitions. Furthermore, since the transitions of a random data sequence are still random, the spectrum of the generated pulses resembles that of a return-to-zero (RZ) signal. That is, the spectrum of the pulse $D'_{in}$ is a squared sinc function with strong clock spectral lines at the data rate and its harmonics, as depicted in Fig. 3(a) (dotted curve). From (1), the spectrum for a bit-period ($T_b$) pulse has nulls at $1/T_b$, whereas the spectrum for a half-bit-period ($T_b/2$) pulse is twice as wide (nulls at $2/T_b$) but with lower magnitude. Consequently, this facilitates the injection locking of the subsequent VCO to the data rate or even its harmonics [17].
Theoretical derivations indicate that maintaining a $T_b/2$ delay between the two inputs of the XOR gate yields the strongest injection [16]. As shown in Fig. 2, the clock signal $CK$ is recovered from the edge-detected waveform by passing it through the PLL-based VCO tuned near the clock frequency. In order to reduce jitter on the recovered clock signal, the VCO should have good selectivity to suppress the unwanted data-dependent signal that results in amplitude and phase modulation. The recovered clock is then fed to the CPA.
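The effect of the edge detector on the spectrum can be illustrated numerically. The following is a minimal sketch, not the hardware implementation: it assumes ideal rectangular NRZ pulses with 0/1 levels, an integer number of samples per bit, and a delay of exactly $T_b/2$. The function names (`nrz_waveform`, `edge_detect`, `bit_rate_line`, `demo`) are hypothetical and chosen for illustration only. It evaluates a single DFT bin at $f = 1/T_b$ for the NRZ waveform and for the XOR output.

```python
import cmath
import random


def nrz_waveform(bits, samples_per_bit):
    """Hold each bit value for one bit period (ideal rectangular NRZ pulses)."""
    return [b for b in bits for _ in range(samples_per_bit)]


def edge_detect(wave, delay_samples):
    """XOR the waveform with a delayed replica of itself.

    The first `delay_samples` samples reuse wave[0], so no artificial
    edge is created at the start of the record.
    """
    return [wave[i] ^ wave[max(i - delay_samples, 0)] for i in range(len(wave))]


def bit_rate_line(wave, samples_per_bit):
    """Normalized DFT magnitude at f = 1/T_b (one cycle per bit period)."""
    acc = sum(x * cmath.exp(-2j * cmath.pi * i / samples_per_bit)
              for i, x in enumerate(wave))
    return abs(acc) / len(wave)


def demo(num_bits=512, samples_per_bit=8, seed=1):
    """Return (NRZ line, edge-detected line) at the bit rate frequency."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(num_bits)]
    nrz = nrz_waveform(bits, samples_per_bit)
    pulses = edge_detect(nrz, samples_per_bit // 2)  # assumed T_b/2 delay
    return (bit_rate_line(nrz, samples_per_bit),
            bit_rate_line(pulses, samples_per_bit))


if __name__ == "__main__":
    nrz_line, edge_line = demo()
    print("NRZ component at 1/T_b:          ", nrz_line)
    print("Edge-detected component at 1/T_b:", edge_line)
```

In this idealized model the NRZ component at $1/T_b$ vanishes to within rounding error, whereas every transition produces a half-bit pulse with the same phase relative to the bit boundary, so the pulses add coherently into a discrete line at the bit rate.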
CPA
The CPA utilizes multiphase clocks at the bit rate and a novel phase-picking algorithm based on an "early-late" detection principle that is simple, fast, and effective. As illustrated in Fig. 2, the CPA is based on a feedforward topology and comprises phase shifters ($\varphi$-shifters), an Alexander phase detector (PD), a phase picker, and a D flip-flop (D-FF). The $\varphi$-shifters use the recovered clock $CK$ to provide multiple clocks, i.e., $CK_0$, $CK_{-\pi/2}$, and $CK_{+\pi/2}$, with low skew and different phases, i.e., 0 rad, $-\pi/2$ rad, and $+\pi/2$ rad, respectively, with respect to $CK$. Next, an Alexander PD [18], which inherently exhibits bang-bang (binary) characteristics, is used to strobe the data waveform $D_{in}$ with consecutive clock $CK_0$ edges at multiple points in the vicinity of expected transitions. This results in three data samples, as shown in Fig. 5(a): 1) the previous bit $A$; 2) the current bit $B$; and 3) a sample of the current bit at the zero crossing $T$. Depending on the phase difference between the consecutive packets, the PD, aided by these samples, i.e., $X = T \oplus B$ and $Y = A \oplus T$, can determine the location of the clock edge with respect to the data edge as follows: [...] $k$th packet and out of phase by $|\Delta\varphi| \le 2\pi$ rad with the first bit of the $(k+1)$th packet. To achieve instantaneous phase acquisition, the CPA must use the instantaneous clock $t_{inst}$ to correctly sample the bits of the $(k+1)$th packet. Note that, in the case of a conventional PLL-based CDR, the feedback loop would need finite time to settle down and acquire lock, that is, to align the instantaneous clock $t_{inst}$ to the lock state $t_{lock}$ so as to sample in the middle of the data bit. When there is no phase difference between the consecutive packets, $\Delta\varphi = 0$ rad, either of the clocks $CK_{-\pi/2}$ or $CK_{+\pi/2}$ will correctly sample the data bits of the phase-shifted $(k+1)$th packet [see Fig. 6(d) and (g)].
This is also true for an antiphase step $\Delta\varphi = \pm\pi$ rad (not shown in Fig. 6 because the scenario is similar to the 0-rad phase step, a modulo-$\pi$ process). For a phase step $-\pi < \Delta\varphi < 0$ rad, clock $CK_0$ will lag the data [see ...]. That is, for any phase step $|\Delta\varphi| \le 2\pi$ rad, there will be at least one sampling clock, either $CK_{-\pi/2}$ or $CK_{+\pi/2}$, that will yield an accurate sample. The phase picker then selects the more accurate sampling clock $CK_{out}$ from these two possibilities to drive the D-FF that retimes the data, that is, samples the noisy data to yield an output $D_{out}$ with less jitter. The foregoing concepts on the Alexander PD and phase picker are summarized in Table 1, leading to the circuit topology in Fig. 4(b). The result is that the CPA achieves instantaneous phase acquisition (0 bit) for any phase step $|\Delta\varphi| \le 2\pi$ rad; that is, no preamble bits ($l = 0$) at the beginning of the packet are necessary. Next, we provide an experimental demonstration of this, backed by a probabilistic theoretical prediction.
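The early-late decision and the phase-picking step can be sketched in a simple phase-domain model. This is an illustration under stated assumptions rather than a transcription of Table 1: the truth table in `alexander_pd` is the conventional Alexander PD interpretation of $X$ and $Y$, the rule "late selects $CK_{-\pi/2}$, early selects $CK_{+\pi/2}$" and the sign convention for the $CK_0$ offset are assumptions, and all function names are hypothetical.

```python
import math


def alexander_pd(a, t, b):
    """Classify the CK_0 edge from samples A (previous), T (crossing), B (current).

    Assumed conventional truth table, with X = T xor B and Y = A xor T:
      X=0, Y=0 -> no transition observed
      X=0, Y=1 -> clock late  (data edge arrived before the T sample)
      X=1, Y=0 -> clock early (data edge arrived after the T sample)
      X=1, Y=1 -> inconsistent samples
    """
    x, y = t ^ b, a ^ t
    table = {(0, 0): "none", (0, 1): "late", (1, 0): "early", (1, 1): "invalid"}
    return table[(x, y)]


def wrap(phase):
    """Wrap a phase in radians to the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def pick_clock(ck0_offset):
    """Select the multiphase clock closer to the center of the data eye.

    `ck0_offset` is the CK_0 sampling instant relative to the eye center
    in radians (positive = late, an assumed sign convention).
    Returns (clock name, residual offset of the selected clock).
    """
    e = wrap(ck0_offset)
    if e > 0:  # CK_0 late: sample earlier
        return "CK_-pi/2", wrap(e - math.pi / 2)
    return "CK_+pi/2", wrap(e + math.pi / 2)  # CK_0 early or aligned


def worst_case_offset(steps=2000):
    """Largest |residual offset| for phase steps swept over [-2*pi, 2*pi]."""
    worst = 0.0
    for n in range(steps + 1):
        step = -2 * math.pi + 4 * math.pi * n / steps
        worst = max(worst, abs(pick_clock(step)[1]))
    return worst


if __name__ == "__main__":
    print(alexander_pd(0, 1, 1))   # transition before T
    print(pick_clock(math.pi / 2))
    print(worst_case_offset())
```

In this model the selected clock never samples more than $\pi/2$ rad (a quarter of a bit period) away from the center of the eye, which is consistent with the claim that one of the two clocks always yields a valid sample without any preamble.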
Results and Discussion
The proposed BM-CDR is built from low cost/complexity commercial off-the-shelf (COTS) components rated at 13 Gb/s. It is built by integrating the following evaluation boards from Hittite Microwave: XOR gate (HMC721LC3C), $\varphi$-shifters (HMC538LP4), Alexander PD (HMC6032LC4B), AND gate (HMC722LC3C), 2:1 selector (HMC678LC3C), and D-FF (HMC723LC3C).
The BM-CDR is tested in a conventional BM test setup [9], [10]. Bursty traffic is generated from an Anritsu pattern generator by adjusting the phase $\Delta\varphi$ between packets or within a packet with a phase shifter. The packets are formed from guard bits, preamble bits, delimiter bits, $2^{15} - 1$ pseudorandom binary sequence (PRBS) payload bits, and comma bits. The phase steps can be set within $\pm 125$ ps with a 1-ps resolution, corresponding to $\pm 1.25$ unit intervals (UI) at 10 Gb/s. Note that 1 UI or $2\pi$ rad corresponds to 100 ps (1-bit period) at 10 Gb/s.
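The relation between picoseconds, unit intervals, and radians used in the test setup can be checked with a few lines of code. This is a small illustrative helper; the function names are hypothetical, and the only inputs are the 10-Gb/s bit rate and the definition 1 UI = $2\pi$ rad.

```python
import math

BIT_RATE_BPS = 10e9  # 10 Gb/s, as in the test setup


def ps_to_ui(delay_ps, bit_rate_bps=BIT_RATE_BPS):
    """Convert a delay in picoseconds to unit intervals (bit periods)."""
    return delay_ps * 1e-12 * bit_rate_bps


def ui_to_rad(ui):
    """Convert unit intervals to radians (1 UI = 2*pi rad)."""
    return 2 * math.pi * ui


if __name__ == "__main__":
    for delay_ps in (50, 100, 125):
        ui = ps_to_ui(delay_ps)
        print(delay_ps, "ps =", ui, "UI =", ui_to_rad(ui), "rad")
```

A 50-ps step is therefore half a bit period ($\pi$ rad), which is the worst-case phase step discussed below.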
The plots in Fig. 7 depict the BER performance of a conventional CDR and the proposed BM-CDR at 10 Gb/s as a function of the phase step $|\Delta\varphi| \le 2\pi$ rad between two consecutive data bits, for a zero preamble length. For the CDR, we observe two bell-shaped curves centered at approximately $\pm 50$ ps [see Fig. 7(a)]. As expected, these represent the half-bit periods corresponding to the worst-case phase steps at $\Delta\varphi = \pm\pi$ rad, respectively. It follows that the CDR is sampling near the edge of the data eye, resulting in a loss of lock. At relatively small phase shifts, near $\Delta\varphi \in \{0\ \text{rad}, \pm 2\pi\ \text{rad}\}$, we can easily achieve error-free operation, BER $< 10^{-10}$, because the CDR is sampling almost at the middle of each data bit. For our BM-CDR, we achieve error-free operation for any phase step $|\Delta\varphi| \le 2\pi$ rad with zero preamble bits, allowing for instantaneous phase acquisition [see Fig. 7(b)]. Additionally, we note that the BM-CDR can support up to 1000 CIDs with error-free operation.

At this point, it is interesting to compare the experimental results with probabilistic theoretical predictions to draw some important conclusions. The sampling error probability $P_s$ of the CDR in the presence of phase steps $|\Delta\varphi| \le 2\pi$ rad and an $l$-bit preamble can be expressed as [12]
where $Q(\cdot)$, called the "Q function," is the normalized Gaussian-tail probability defined as

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\lambda^2/2}\, d\lambda$$

where $\sigma_{t_s}$ is the root mean square (RMS) jitter on the sampling clock in UI, and a correcting factor is introduced to account for the symmetrical performance about the edges of the data bit at $-T_b/2$ and $+T_b/2$ as
and a function of the preamble length $l$ measures the CDR's lock acquisition time, analytically derived to be [16]
where $\zeta > 0$ is the "damping ratio" and $\omega_n$, in radians per second, is the "natural frequency," both being functions of the CDR circuit parameters [16]. Moving forward, the Alexander PD's probability of correctly determining an early or late clock $CK_0$ can be written as
and the probabilities of correctly sampling points $A$, $B$, and $T$ can be given as:
where $P_s(|\Delta\varphi|) = P_s(|\Delta\varphi|, l = 0)$ is the sampling error probability of a bit, and another correcting factor is introduced to account for the symmetrical performance about the edges of the data bit as
The sampling points of the multiphase clocks $CK_{-\pi/2}$ and $CK_{+\pi/2}$ are located at $t_k \in \{-\pi/2, +\pi/2\}$, respectively, about the center of the data eye. The sampling error probabilities $P_s^k$ of the multiphase clocks can be calculated by convolving $P_s(|\Delta\varphi|)$ in (2) with the sampling points $t_k$, as
where $\delta(\cdot)$ is the Dirac delta function. It follows from the sifting property
The phase-picking algorithm determines the clock sample, $CK_{-\pi/2}$ or $CK_{+\pi/2}$, closest to the center of the data eye upon detecting an early or late clock $CK_0$ with respect to the data $D_{in}$. Consequently, the sampling error probability of the BM-CDR, $P_s^{\text{BM-CDR}}$, can be expressed as
Finally, we define the BER, denoted as $P_e$, of the CDR and BM-CDR, from the sampling error probabilities in (2) and (13), respectively, as follows:
We compare the experimental results with the theoretical predictions for the CDR and the BM-CDR from (2) and (13) in Fig. 7(a) and (b), respectively. For BER $< 10^{-6}$, the results are in close agreement. However, for the CDR, the theoretical bound is optimistic for BER $> 10^{-6}$, as the probabilistic model accounts for the jitter input to the receiver and not for jitter generated by the circuitry. This may be a result of VCO phase noise in the CDR and of data bits being asymmetric, with different rise and fall times leading to different jitter distributions on the edges of the data eye.
To measure the phase acquisition time of the CDR, preamble bits ("1010..." pattern) can be inserted at the beginning of the packet to help the PLL of the CDR settle down and acquire lock until error-free operation is achieved. In Fig. 8(a), as the preamble length is increased, there is an improvement in the BER. After 125 preamble bits, we observe error-free operation for any phase step. However, the use of a preamble introduces overhead, reducing the effective throughput and increasing delay.
To compare the jitter tolerance of the CDR with that of the proposed BM-CDR, examine the plots in Fig. 8(b), which show the number of preamble bits required to obtain a BER $\le 10^{-10}$ as a function of the maximum allowable RMS jitter for the worst-case phase step $|\Delta\varphi| = \pi$ rad. Our BM-CDR is able to achieve instantaneous phase acquisition ($l = 0$) when the RMS jitter $\sigma_{t_s} \le 0.04$ UI. This is true for any phase step $|\Delta\varphi| \le 2\pi$ rad, as shown in Fig. 7(b). This is not the case for the CDR, whose jitter tolerance is 0 UI for instantaneous phase acquisition at the worst-case phase step. This implies that it is not feasible for the CDR to obtain instantaneous phase acquisition, since a jitter-free sampling clock is practically impossible. These theoretical limits can be summarized as:
With increasing preamble length, the jitter tolerance of the CDR and BM-CDR on the sampling clock increases for a given phase step. It tends to become independent of the phase step in the presence of a large number of preamble bits:

$$\lim_{l \to \infty} \max \sigma_{t_s} = 0.08\ \text{UI}, \quad \text{for all } |\Delta\varphi| \le 2\pi\ \text{rad}.$$
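The jitter limits quoted above can be cross-checked with a back-of-the-envelope calculation. The sketch below is not the model of (2); it assumes a simplified one-sided Gaussian model in which a sampling error occurs when the clock jitter exceeds the margin between the sampling point and the nearest edge of the data eye, so that the BER is $Q(\text{margin}/\sigma_{t_s})$. The function names and the bisection-based inverse are illustrative choices.

```python
import math


def q_function(x):
    """Normalized Gaussian-tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inverse(p, lo=0.0, hi=40.0, iters=200):
    """Solve Q(x) = p for x by bisection (Q is monotonically decreasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_function(mid) > p:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_rms_jitter_ui(margin_ui, target_ber=1e-10):
    """Largest RMS jitter (UI) meeting the target BER in the simplified model."""
    return margin_ui / q_inverse(target_ber)


if __name__ == "__main__":
    # Locked clock sampling at the eye center: 0.5 UI to the nearest edge.
    print("center sampling:", max_rms_jitter_ui(0.5))
    # Worst case for CK_{-pi/2} / CK_{+pi/2}: pi/2 rad (0.25 UI) from the center,
    # leaving a 0.25-UI margin to the nearest edge.
    print("quarter-bit offset:", max_rms_jitter_ui(0.25))
```

Under these assumptions the two margins give roughly 0.08 UI and 0.04 UI, respectively, in line with the limits stated for a fully locked clock and for instantaneous acquisition with the BM-CDR.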
In Fig. 9 , we plot the BER performance of the CDR and BM-CDR as a function of phase steps for different RMS jitter and zero preamble bits. When the RMS jitter is t s 9 0:04 UI, the shaded area in Fig. 9(a) depicts the tradeoff region with the phase steps jÁ'j =4 rad, where the CDR has a better jitter tolerance than the BM-CDR. This is expected as the CDR's recovered clock is sampling closer to the middle of the data bit compared with the BM-CDR's multiphase clocks, which are sampling further away from the center of the data eye for phase steps jÁ'j =4 rad (see Fig. 6 ). However, for phase steps =4 G jÁ'j rad, the BM-CDR has a better jitter tolerance. Furthermore, Fig. 9(b) shows that, while the BM-CDR achieves instantaneous phase acquisition for any phase step jÁ'j 2 rad when the RMS jitter t s 0:04 UI, the CDR performance degrades with increasing RMS jitter. Fig. 10(a) shows the eye diagram of the recovered data and clock in response to a 2 15 À 1 PRBS pattern. The RMS jitter of the recovered clock and data is 1.75 ps and 2.5 ps with a peak-to-peak voltage swing of 435 mVp-p and 300 mVp-p, respectively. The output spectrum of the recovered clock is shown in Fig. 10(b) . The phase noise at 100, 500, and 1000 kHz offset is approximately À24, À62, and À68 dBc/Hz, respectively.
Conclusion
We have demonstrated a novel 20-GSample/s BM-CDR for optical multiaccess networks. The BM-CDR is based on an injection-locking technique for clock recovery and a CPA employing space sampling with two multiphase clocks at 10 GHz and a phase-picking algorithm for automatic clock phase acquisition. In summary, the BM-CDR achieves a BER $< 10^{-10}$ while featuring instantaneous (0 preamble bit) phase acquisition for any phase step $|\Delta\varphi| \le 2\pi$ rad, without trading off its loop bandwidth (jitter tolerance). Thus, the BM-CDR could also find applications in future high-speed optical burst/packet switched networks, which may require a cascade of BM-CDRs, each consuming some of the overall jitter budget of the system.
Instantaneous phase acquisition can be used to improve the physical efficiency of the PON traffic, reduce the BM sensitivity penalty, and increase the effective throughput of the system by increasing the information rate. Our elegant solution leverages components designed for long-haul transport networks using low-complexity commercial electronics, thus providing a cost-effective solution for PON BM-CDRs. These components are typically a generation ahead of the components for multiaccess networks. Thus, our solution will scale as the components for long-haul networks scale.
