Introduction
In anticipation of the growing demand for access bandwidth, especially for applications beyond FTTH (e.g. business services and 5G X-hauling), both IEEE and ITU-T have recently started to investigate the roadmap for future passive optical networks (PON) [1, 2]. The IEEE P802.3ca 100G Ethernet PON Task Force [3] was founded in 2016 to specify the physical-layer (PHY) parameters for next-generation 25/50/100 Gb/s EPONs (NG-EPON). Currently, NG-EPON is expected to use multiple wavelengths carrying 25 Gb/s each to achieve data rates up to 100 Gb/s [2]. However, at a line rate of 25 Gb/s, the fast synchronization required for the burst-mode upstream in PONs is challenging. So far, various techniques have been proposed for burst-mode clock and data recovery (BM-CDR), such as gated voltage-controlled oscillators (G-VCO), oversampling, and fast-lock phase-locked loops (PLL) [4]. The G-VCO CDR has the disadvantages of no jitter rejection and reduced pulse-width distortion (PWD) tolerance. The oversampling CDR can provide a high PWD tolerance, but suffers from high power consumption and a large IC area. The fast-lock PLL-based CDR can exhibit high jitter rejection. However, traditional PLL implementations rely on analog building blocks, making it very challenging to achieve a lock time below 100 ns [4]. Previously, we demonstrated a real-time 25 Gb/s PON upstream link using a low-cost 10 Gb/s burst-mode receiver (BM-Rx) [5], although it needed a phase-synchronized half-rate clock. A 25 Gb/s BM-Rx implementation for photonic switching networks [6] also requires a half-rate system clock input and an externally controlled start-of-burst signal to achieve 31 ns settling time with 4.4 pJ/bit efficiency. In this work, we report a 25 Gb/s all-digital clock and data recovery (AD-CDR) chip with a wide-band lock loop for next-generation PON applications. The AD-CDR is capable of phase-locking an incoming burst in 35 ns, and requires neither a system clock nor a start-of-burst signal.
Thanks to the all-digital implementation, the circuit occupies a compact active area of 0.050 mm² and consumes only 46 mW, resulting in an energy per bit of 1.8 pJ/bit.
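As a quick check of the efficiency figure, the energy per bit follows from dividing the power consumption by the line rate. The Python sketch below performs only this arithmetic; the function name is illustrative and is not part of the reported design.

```python
def energy_per_bit_pj(power_w: float, line_rate_bps: float) -> float:
    """Energy per bit in picojoules: power (W) divided by line rate (bit/s)."""
    return power_w / line_rate_bps * 1e12


# Values taken from the text: 46 mW at a line rate of 25 Gb/s.
epb = energy_per_bit_pj(46e-3, 25e9)
print(f"{epb:.2f} pJ/bit")
```

The result, about 1.84 pJ/bit, is consistent with the rounded value of 1.8 pJ/bit quoted above.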
System Description
Inspired by the all-digital phase-locked loop (AD-PLL) used for frequency synthesis, the AD-CDR implements its loop filter completely in the digital domain. Unlike conventional charge-pump designs, it does not need bulky analog loop filters, which either occupy a large chip area or have to be implemented off-chip. In the digital domain, large time constants can be implemented at low cost and the loop filter can maintain its state without any loss. As a result, no frequency drift occurs when no phase information is available, as is the case between bursts. Consequently, the AD-CDR can operate without a precise reference clock. An overview of the proposed AD-CDR is shown in Fig. 1. Unlike in an AD-PLL, the phase detection of an AD-CDR runs at full speed, so at 25 Gb/s a bang-bang phase detector (BB-PD) is the optimal choice because of its simplicity. To further relax the circuit requirements, the sampling stages of the PD are parallelized and sample the input data (D_in) with 8 quarter-rate clock signals that have equally spaced phases. The BB-PD generates two binary signals which indicate whether the recovered clock is leading (Early) or lagging (Late) with respect to the input data. Because these signals are binary, they can easily be processed by a digital loop filter (DLF). However, the operating frequency of the phase detector is too high to allow direct synthesis of the loop filter. To reduce its operating frequency, the phase information is subsampled by a factor N = 16, indicated with ↓N in Fig. 1. This is possible without degradation of the phase detector (PD) performance when it is operated as an Inverse Alexander phase detector [7]. Furthermore, because the input data is sampled with these quarter-rate clock signals, the output data is automatically parallelized, simplifying further processing.
978-1-5386-5624-2/17/$31.00 ©2017 IEEE
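The Early/Late decision of a bang-bang phase detector can be illustrated with a short behavioral sketch in Python. It implements the textbook Alexander decision rule on three consecutive samples (data, edge, data), not the Inverse Alexander variant of [7], and it assumes that subsampling simply keeps one decision out of every N. The function names and the decimation scheme are assumptions made for illustration only.

```python
from typing import List, Tuple


def alexander_decision(a: int, t: int, b: int) -> Tuple[int, int]:
    """Return (early, late) for data sample a, edge sample t and next data sample b."""
    if a == b:
        # No data transition: the samples carry no phase information.
        return (0, 0)
    if t == a:
        # Edge sample still shows the old bit: the clock samples too early.
        return (1, 0)
    # Edge sample already shows the new bit: the clock samples too late.
    return (0, 1)


def subsample(decisions: List[Tuple[int, int]], n: int = 16) -> List[Tuple[int, int]]:
    """Keep one decision out of every n (assumed decimation scheme)."""
    return decisions[::n]


# A "1010..." preamble sampled with a clock that is consistently early.
full_rate = [alexander_decision(0, 0, 1)] * 64
slow = subsample(full_rate)  # 4 decisions remain for the loop filter
```

With N = 16, the decisions reach the loop filter at 25 Gb/s / 16, which is the 1.56 GHz rate at which the synthesized logic runs.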
The loop filter, consisting of a proportional and an integral path with programmable gain, is automatically synthesized and has an operating frequency of 1.56 GHz. Based on the output of the loop filter, capacitors in a capacitor bank are switched to control the frequency of a digitally controlled oscillator (DCO). The LSB of the control word driving the DCO corresponds to a frequency step of 2.25 MHz.
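A minimal behavioral sketch of such a proportional-integral loop filter driving a DCO is given below in Python. Only the 2.25 MHz frequency step and the 25 Gb/s / 16 update rate come from the text. The integer gains, the sign convention (a Late decision raises the control word), the linear DCO model and the nominal frequency used in the example are assumptions for illustration and do not describe the synthesized circuit.

```python
from dataclasses import dataclass

F_STEP_HZ = 2.25e6    # DCO frequency step per LSB (from the text)
F_DLF_HZ = 25e9 / 16  # loop filter update rate, about 1.56 GHz (from the text)


@dataclass
class PILoopFilter:
    kp: int              # proportional gain in LSBs per decision (assumed)
    ki: int              # integral gain in LSBs per decision (assumed)
    integrator: int = 0  # digital state, held without loss between bursts

    def update(self, early: int, late: int) -> int:
        """Return the DCO control word offset for one subsampled decision."""
        err = late - early  # assumed sign: Late speeds the DCO up
        self.integrator += self.ki * err
        return self.kp * err + self.integrator


def dco_frequency(f_nominal_hz: float, code: int) -> float:
    """Assumed linear DCO model: each LSB shifts the frequency by F_STEP_HZ."""
    return f_nominal_hz + code * F_STEP_HZ


lf = PILoopFilter(kp=4, ki=1)
codes = [lf.update(0, 1), lf.update(0, 1), lf.update(1, 0), lf.update(0, 0)]
# The nominal frequency below is a placeholder value, not a reported figure.
f_out = dco_frequency(6.25e9, 2)
```

The last update in the example has neither an Early nor a Late input, and the output then equals the stored integrator value. This mirrors the property that a digital loop filter keeps its state when no phase information is available.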
The AD-CDR does not require a high-accuracy reference clock; only the DCO needs a one-time calibration to ensure that its frequency is in the vicinity of the line rate. The burst-mode operation of the CDR is realized thanks to its large loop bandwidth, which enables short settling times. As a result, no start-of-burst signal is required either. This significantly simplifies the integration of the component in a system, removing the need for a feedback signal. The complete CDR is implemented in the low-power flavor of a 40 nm CMOS process and measures only 0.050 mm².
Experimental Results
First, the power consumption and error-free operation were evaluated in continuous mode. Next, the burst-mode settling time was measured.
Continuous-mode measurements:
The AD-CDR is evaluated in continuous mode with a 25 Gb/s PRBS7 input. One of the quarter-rate outputs is analyzed using a BER tester, showing a BER lower than 1E-12 while the device consumes 46 mW. The digital loop filter is highly programmable. The phase noise of one of the quarter-rate clocks, shown in Fig. 2, was measured under different settings of the proportional gain (Kp). It is clear that increasing the proportional gain leads to a higher loop bandwidth, which directly reduces the settling time of the AD-CDR during burst-mode operation.

Burst-mode measurements:
To evaluate the burst-mode performance of the AD-CDR, 25 Gb/s packets starting with a "1010..." preamble are used. Because all the packets are generated from the same source, a sufficiently long gap is required between two consecutive packets to ensure the CDR is no longer in lock with the generator. This is illustrated with two packets with a gap size of 10 ns and 41 ns, shown in Fig. 3 and Fig. 4 respectively. The top waveform displays an instantaneously sampled output stream, while the bottom row shows a persistence-mode view of the output, which superimposes multiple waveforms on the same view. It is clear that with a gap of 10 ns the CDR remains in lock, while after 41 ns the CDR is out of lock. To ensure random phases of the incoming data with respect to the DCO of the CDR, a gap size of 5.2 μs was employed during the remainder of the measurements. Of course, this is only necessary to stress the device during experiments. In practice, the gap size can be made arbitrarily small.
In order to measure the settling time of the AD-CDR, a long preamble sequence was added in front of the packet. Because the input data is internally demultiplexed into four quarter-rate streams, it is very easy to observe the settling at the output of the device using a real-time oscilloscope: when the "1010..." preamble is demultiplexed by four, the output should stay either low or high. If any transition occurs during this preamble, an error has occurred. The number of transitions at the beginning of the packet indicates how many packets are received: because the phase of the incoming packet is randomly distributed, there is an equal chance of receiving a 1 or a 0. The packet structure and the demultiplexed output are schematically illustrated in Fig. 5. The packet consists of a 2^14 bit (≈16 kbit) preamble, a 16-bit delimiter used to align the four output data streams, and a 2^20 bit (≈1 Mbit) payload. The gap between two packets is 2^17 bits (≈131 kbit), which corresponds to 5.2 μs. A captured output packet is shown in Fig. 6. The long preamble was only used to verify that no errors occur during burst-mode operation after settling. In practice, the preamble length can be limited to the worst-case settling time.
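The packet timing and the behavior of the demultiplexed preamble can be checked with a few lines of Python. The sketch assumes a simple round-robin 1:4 demultiplexer, which is a simplification of the quarter-rate sampling in the chip, and the function names are illustrative.

```python
LINE_RATE_BPS = 25e9  # line rate from the text


def duration_us(n_bits: int) -> float:
    """Duration in microseconds of n_bits at the 25 Gb/s line rate."""
    return n_bits / LINE_RATE_BPS * 1e6


def demux_by_four(bits):
    """Round-robin 1:4 demultiplexing into four quarter-rate lanes (assumed model)."""
    return [bits[k::4] for k in range(4)]


gap_us = duration_us(2**17)       # gap between two packets
preamble_us = duration_us(2**14)  # preamble
payload_us = duration_us(2**20)   # payload

# Every fourth bit of a "1010..." pattern is identical, so each lane is constant.
lanes = demux_by_four([1, 0] * 8)
print(f"gap = {gap_us:.2f} us, lanes = {lanes}")
```

Each lane of the demultiplexed preamble is constant, so any transition observed on a lane during the preamble indicates that the CDR has not yet settled.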
The AD-CDR aligns the phase of its recovered clock using a wide-band PLL structure. Because this is a closed-loop system, the settling time is strongly related to its bandwidth. Additionally, the settling time depends on the relative phase of the incoming data stream and on the phase noise generated by the DCO. As a result, part of the settling time is deterministic, while it also has a stochastic component. The settling time of the AD-CDR is measured by recording when a transition occurs in the subsampled preamble at the output of the AD-CDR. Fig. 7 shows a maximum observed settling time of 35 ns after transmission of 2E6 packets.
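The measurement principle can be sketched in Python as follows: for each captured packet, the settling time is taken as the position of the last transition on one demultiplexed lane during the preamble, and the worst case is the maximum over all packets. The sketch assumes that the lane has already been sliced into bits with a period of 0.16 ns (four line-rate bit periods at 25 Gb/s). The function name and the example captures are hypothetical and do not reproduce the measured data.

```python
from typing import Sequence

LANE_BIT_PERIOD_NS = 4 / 25e9 * 1e9  # 0.16 ns per quarter-rate bit


def settling_time_ns(lane_bits: Sequence[int],
                     bit_period_ns: float = LANE_BIT_PERIOD_NS) -> float:
    """Time of the last transition on a lane that should be constant once settled."""
    last = 0
    for i in range(1, len(lane_bits)):
        if lane_bits[i] != lane_bits[i - 1]:
            last = i  # the CDR was still not settled at this bit
    return last * bit_period_ns


# Hypothetical captures of the preamble on one lane for three packets.
captures = [
    [0, 1, 0, 0, 1] + [1] * 20,  # last transition at lane bit 4
    [1] * 25,                    # settled from the first bit
    [0, 0, 1] + [1] * 22,        # last transition at lane bit 2
]
times = [settling_time_ns(c) for c in captures]
worst_case_ns = max(times)
```

Taking the maximum over many packets captures the stochastic component of the settling time, which is why a large number of packets is needed for a meaningful worst-case figure.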
Conclusions
In this work, we presented the first 25 Gb/s all-digital PLL-based clock and data recovery circuit working in burst-mode operation. The AD-CDR operates at a line rate of 25 Gb/s and consumes 46 mW without an external reference clock, resulting in an energy per bit of 1.8 pJ/bit. The digital loop filter is adjustable, and thanks to its large bandwidth, a settling time of 35 ns or less is obtained without a start-of-burst signal. The device was realized in the low-power flavor of a 40 nm CMOS process, occupying an area of only 0.050 mm².
