Abstract-The upcoming 100 Gb/s links in next-generation Ethernet passive optical networks (PONs) will be based on four channels of 25 Gb/s. The corresponding transceivers in these optical links require a high-speed clock and data recovery circuit to extract a synchronous clock and recover the received data. To achieve a sufficiently fast settling time for 25 Gb/s burst-mode upstream applications in PONs, we introduce the architecture of the first 25 Gb/s all-digital clock and data recovery circuit (AD-CDR). Thanks to the implementation of a digital loop filter, our AD-CDR avoids the need for a system clock or a start-of-burst signal. The circuit is implemented in a 40-nm CMOS process and has a very compact active chip area of only 0.050 mm². Furthermore, the burst-mode operation of our AD-CDR is measured in an optical setup, showing a burst-mode lock time of 37.5 ns at a power consumption of only 46 mW.
Index Terms-All-digital clock and data recovery (AD-CDR), burst mode, continuous mode, digital loop filter, inverse Alexander phase detector, NG-EPON, passive optical network (PON), PLL-based CDR, quarter-rate ring oscillator, subsampling.
for next-generation 25/50/100 Gb/s EPONs (NG-EPON) and targets the 100 Gb/s capacity PON to be required in 2025 [2]. The NG-EPON is expected to use 2 and 4 wavelengths, each carrying 25 Gb/s, to achieve data rates up to 50 Gb/s and 100 Gb/s, respectively [3].
The corresponding transceivers in these optical links require a high-speed clock and data recovery circuit to extract a synchronous clock and recover the received data. Especially in the upstream direction, clock and data recovery is challenging due to the burst-mode nature of the traffic. A sufficiently short settling time is required to ensure that the data is correctly recovered.
So far, various techniques have been proposed for high-speed burst-mode clock and data recovery (BM-CDR), such as gated voltage-controlled oscillators (G-VCO), oversampling, and fast-lock phase-locked loops (PLL) [5]. The G-VCO CDR has the disadvantages of no jitter rejection and reduced pulse-width distortion (PWD) tolerance. The oversampling CDR can provide a high PWD tolerance, but suffers from high power consumption and a large IC area. The fast-lock PLL-based CDR can exhibit high jitter rejection. However, traditional PLL implementations rely on analog building blocks and typically achieve a lock time longer than 100 ns [5].
Recently, fast-lock PLL-based CDRs for a real-time 25 Gb/s PON upstream link have become increasingly important [6]. The authors of [7] discuss a 25 Gb/s BM-Rx implementation for photonic switching networks, which requires a half-rate system clock input and an externally controlled start-of-burst signal to achieve a 31 ns settling time with an energy efficiency of 4.4 pJ/bit.
In this work, we demonstrate the operation of our 25 Gb/s all-digital clock and data recovery (AD-CDR) circuit for next-generation PON applications, implemented in a 40 nm CMOS process, in an optical setup. The AD-CDR has a wide-band lock loop, is capable of phase-locking to an incoming burst in 37.5 ns, and requires neither a system clock nor a start-of-burst signal. Thanks to the all-digital implementation, the circuit occupies a compact active area of 0.050 mm² and consumes only 46 mW, resulting in an energy-per-bit of 1.8 pJ/bit. This paper is an invited extension of our work presented at ECOC 2017 [8] and is organized as follows. In Section II, we elaborate on the architecture of the all-digital clock and data recovery circuit. Subsequently, the setup and the experimental results in continuous and burst mode are discussed in Section III. Finally, Section IV concludes the paper.
0733-8724 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
II. THE CLOCK AND DATA RECOVERY SYSTEM
A conventional PLL-based clock and data recovery circuit (shown in Fig. 1) comprises a phase detector, a charge pump, a loop filter and a voltage-controlled oscillator (VCO) [9]. The phase detector compares the phase of the input data with that of the locally generated clock. Based on the phase difference between these signals, the phase detector drives the charge pump such that it charges or discharges the loop filter. The resulting voltage controls the frequency of the VCO in order to decrease the phase difference between the input data and the recovered clock. As a result, the recovered clock is synchronized with the input data. Additionally, the input data is sampled using the recovered clock and is available together with this clock for further processing.
Recently, all-digital clock and data recovery (AD-CDR) circuits have emerged, inspired by the all-digital phase-locked loop (AD-PLL) used for frequency synthesis [10], [11]. These AD-CDRs implement their loop filter completely in the digital domain, which has many advantages. In contrast to conventional charge-pump architectures, no bulky analog loop filter is needed, which would otherwise occupy a large chip area or have to be implemented off-chip. Furthermore, in the digital domain, large time constants can be implemented at low cost and the loop filter can maintain its state without any loss. This is especially beneficial for burst-mode applications because the digitally controlled oscillator will not exhibit any frequency drift when no phase information is available, as is the case between bursts. As a result, the AD-CDR can also operate without a precise reference clock.
An overview of the proposed AD-CDR is shown in Fig. 2. It comprises a bang-bang phase detector (BB-PD), a subsampling block, a digital loop filter (DLF) and a digitally controlled oscillator (DCO). To relax the circuit requirements, the DCO operates at quarter rate and generates 8 clock phases that are spaced π/4 rad apart. At a clock rate of 6.25 GHz, this corresponds to a delay of 20 ps between consecutive clock phases. These clock phases are sent to the phase detector and are used to sample the input data. With a 25 Gb/s data input, a bang-bang phase detector is the optimal choice due to its simplicity: the BB-PD generates only two binary signals which indicate whether the recovered clock is leading (Early) or lagging (Late) with respect to the input data (D_in). Because these signals are binary, they can easily be processed by a digital loop filter. However, the operating frequency of the phase detector is too high to allow direct synthesis of the loop filter. To reduce this operating frequency, the phase information (the Early/Late signal) is subsampled by a factor N = 16, indicated with ↓N in Fig. 2. The subsampled signal is filtered by the digital loop filter (operating at 25 Gb/s / 16 ≈ 1.56 GHz) and the resulting signal controls the DCO such that the phase error is reduced.
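The timing figures quoted above follow directly from the data rate, the quarter-rate DCO, the 8 clock phases and the subsampling factor. The short sketch below simply redoes that arithmetic; the variable names are illustrative and not taken from the design.

```python
# Timing arithmetic for the quarter-rate architecture (values from the text).
DATA_RATE = 25e9      # input data rate in bit/s
DCO_DIVIDER = 4       # the DCO runs at one fourth of the data rate
N_PHASES = 8          # uniformly spaced clock phases
SUBSAMPLE_N = 16      # subsampling factor of the Early/Late signal

dco_freq = DATA_RATE / DCO_DIVIDER          # DCO frequency in Hz
phase_step_s = 1 / dco_freq / N_PHASES      # delay between consecutive phases
unit_interval_s = 1 / DATA_RATE             # duration of one bit
dlf_clock = DATA_RATE / SUBSAMPLE_N         # clock rate of the loop filter

print(f"DCO frequency : {dco_freq / 1e9:.2f} GHz")
print(f"Phase step    : {phase_step_s * 1e12:.1f} ps")
print(f"Unit interval : {unit_interval_s * 1e12:.1f} ps")
print(f"DLF clock     : {dlf_clock / 1e9:.4f} GHz")
```

The phase step (20 ps) is half of the 40 ps unit interval, which is why alternating clock phases can serve as edge samples and data samples.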
Below, the operation of each building block is discussed in more detail. For the discussion about the circuit implementation, the reader is referred to [12] .
A. Inverse Alexander PD
To implement the BB-PD, the Inverse Alexander PD topology introduced in [13] is chosen. When the PD output is subsampled, this topology has a superior performance over a conventional Alexander PD [13] . The phase detection logic of an Inverse Alexander PD uses three samples taken at three consecutive clock edges to determine whether a data transition is present and whether the clock leads or lags the data.
In lock, the first and third samples provide edge-related information, while the second sample is taken in the middle of the data symbol. Fig. 3(a) illustrates the ideal locking condition. In this case, the first and third samples are taken exactly on the edge and hence their value is undefined. In practice, however, noise will cause the PD to randomly produce an Early or a Late pulse, and these cancel each other out on average. Fig. 3(b) shows the case where the clock edge leads the data edge (Early) and Fig. 3(c) shows the case where the clock edge lags the data edge (Late).
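The three-sample decision described above can be sketched in a few lines. This is only one plausible reading of the text (edge sample, data sample, edge sample); the function name, the string return values and the handling of ambiguous cases are assumptions, and the actual gate-level logic is defined in [13].

```python
def inverse_alexander_decision(edge_a, data, edge_b):
    """Classify three consecutive samples (edge, data, edge).

    Assumed interpretation: if the clock is early, both edge samples are
    taken before the data transitions, so edge_b still equals the data
    sample while edge_a holds the previous bit. If the clock is late,
    edge_a already equals the data sample and edge_b shows the next bit.
    """
    if edge_a != data and data == edge_b:
        return "early"
    if edge_a == data and data != edge_b:
        return "late"
    # No transition seen, or an ambiguous pattern: no decision is produced.
    return None


print(inverse_alexander_decision(0, 1, 1))  # early
print(inverse_alexander_decision(1, 1, 0))  # late
print(inverse_alexander_decision(1, 1, 1))  # None (no transition)
```

Returning no decision when all three samples are equal mirrors the statement that the BB-PD produces no signal when there is no data transition.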
B. Digitally Controlled Oscillator
For the design of the DCO in our AD-CDR, a quarter-rate architecture [14] is used. This means that the DCO operates at one fourth of the data rate and provides the required sample-time resolution in the form of 8 uniformly phase-shifted clock phases. For a 25 Gb/s data input, the DCO frequency is therefore 6.25 GHz. The DCO can conveniently be realized as a four-stage ring oscillator with differential delay cells [15].
The operation of the quarter-rate topology is illustrated in Fig. 4, where the waveforms of a '1010' input data sequence and of the 8 different clock phases are shown for the case in which the AD-CDR is early. The input data is sampled using the quarter-rate clock phases. In the ideal locking condition, the odd clock phases are perfectly aligned with the data edges, while the even clock phases are in the middle of the data symbol, which is the ideal sampling moment. Per clock period, there are 4 sets of three consecutive samples, and each set can be used by the Inverse Alexander PD to generate an Early/Late signal.
The outputs of the BB-PD, i.e., the Early/Late signal and the recovered data samples D_out, are thus automatically parallelized. This demonstrates that quarter-rate operation significantly relaxes the requirements on the clock buffers and BB-PD circuitry and simplifies further processing.
C. Digital Loop Filter
The DCO is controlled by an automatically synthesized and placed-and-routed digital loop filter. However, to allow automatic synthesis, the operating frequency of the digital loop filter cannot exceed 1.56 GHz in the 40 nm CMOS process used. Therefore, the output of the BB-PD is subsampled, i.e., only one out of N = 16 consecutive Early/Late signals is sent to the digital loop filter.
The loop filter, consisting of a proportional and integral path, processes the Early/Late signals. Its transfer function is given by:
where K_p and K_i are the respective gains of the proportional path and integral path, and D_p and D_i are the corresponding delays. In our implemented DLF, both the proportional and the integral gain setting can be adapted, while the delays are hard-wired. The delays in the proportional path and in the integral path are D_p = 2 and D_i = 9 digital clock cycles, respectively. Especially the delay in the proportional path should be limited in order to avoid stability issues, but with the expected jitter in the CDR loop, this delay (D_p = 2) is low enough to ensure stability [16]. Based on the output of the loop filter, capacitors in a capacitor bank of the DCO are switched. This increases or decreases the delay of the differential delay cells and consequently adjusts the frequency of the DCO.
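The transfer function itself is not reproduced in the text above. As a hedged sketch only, a proportional-integral filter with the gains K_p, K_i and the delays D_p, D_i defined here is commonly written in the z-domain as shown below. The symbol H_LF and the exact form are assumptions and may differ from the expression used for the implemented DLF.

```latex
H_{\mathrm{LF}}(z) = K_p\, z^{-D_p} + \frac{K_i\, z^{-D_i}}{1 - z^{-1}},
\qquad D_p = 2,\quad D_i = 9
```

In this form the first term is the delayed proportional path and the second term is a delayed accumulator, which holds its value whenever its input is zero.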
When no data transition occurs, the BB-PD cannot determine whether the clock leads or lags the data and therefore does not generate any signal. Thanks to the digital nature of the loop filter, its state is maintained and consequently the DCO frequency is not adjusted. Except for a one-time calibration to set the DCO in the correct frequency range, the AD-CDR does not require a reference clock.
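The state-holding behaviour described above can be illustrated with a small behavioural model of a proportional-integral filter with fixed path delays. This is a sketch under assumptions, not the synthesized DLF: the class name, the +1/-1/0 encoding of the phase detector output and the floating-point arithmetic are all hypothetical, while the default gains and delays are the values quoted in the text.

```python
from collections import deque


class PILoopFilter:
    """Behavioural PI filter with fixed path delays (illustrative only)."""

    def __init__(self, kp=5.0, ki=2**-7, dp=2, di=9):
        self.kp, self.ki = kp, ki
        self.integrator = 0.0
        # Delay lines pre-filled with zeros; dp and di must be >= 1.
        self.p_line = deque([0.0] * dp, maxlen=dp)
        self.i_line = deque([0.0] * di, maxlen=di)

    def step(self, pd):
        """pd: assumed encoding +1 (Late), -1 (Early), 0 (no transition)."""
        self.integrator += self.ki * pd       # unchanged when pd == 0
        p_out = self.p_line[0]                # proportional term, dp cycles old
        i_out = self.i_line[0]                # integral term, di cycles old
        self.p_line.append(self.kp * pd)      # oldest entry drops out
        self.i_line.append(self.integrator)
        return p_out + i_out


lf = PILoopFilter()
# One Late decision followed by cycles without any data transition.
impulse_response = [lf.step(1 if n == 0 else 0) for n in range(12)]
print(impulse_response)
```

After the single decision, the proportional contribution appears once after 2 cycles, and the integral contribution appears after 9 cycles and then stays constant, which is the digital equivalent of a loop filter that does not drift between bursts.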
The burst-mode operation of the CDR is enabled by its large loop bandwidth, which allows short settling times. As a result, no start-of-burst signal is required either. This significantly simplifies the integration of the component in a system, as it removes the need for a feedback signal.
III. EXPERIMENTAL RESULTS
The AD-CDR is fabricated in a 40 nm low-power CMOS technology and the CDR core occupies a chip area of only 0.050 mm². To test the fabricated CDR, it was wire-bonded on a high-speed PCB. A photo of the fabricated chip, wire-bonded on the PCB, is shown in Fig. 5. The test chip is about 50 times larger than the 0.050 mm² CDR core because of the considerable number of bond pads used to observe all the test signals.
In [8], the AD-CDR was directly connected through the test PCB to measurement equipment and electrical-domain measurements were performed. In this work, the measurements are extended to a complete optical setup which includes our AD-CDR. The measurement setup is discussed first, followed by the continuous-mode and burst-mode measurement results.
A. Setup
The experimental setup shown in Fig. 6 is used to evaluate the performance of the AD-CDR in an optical setup. In this setup, a clock generator creates a 25 GHz clock signal that provides the necessary timing information for a bit pattern generator. This pattern generator outputs a 25 Gb/s data signal with a voltage swing that varies from 300 mVpp to 630 mVpp. Either a pseudorandom binary sequence with a length of 2^9 − 1 (PRBS9) or a user-defined (burst) packet is generated and applied to a 40 Gb/s LiNbO3 Mach-Zehnder modulator (MZM). This MZM modulates the light of a laser with the generated data. The laser operates at a wavelength of 1550 nm and the output power is set to 6 dBm.
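For readers who want to reproduce a PRBS9 test pattern in simulation, the sketch below generates one with a 9-bit linear feedback shift register. It assumes the common PRBS9 generator polynomial x^9 + x^5 + 1; the text does not state which polynomial or seed the pattern generator used, so both are assumptions.

```python
def prbs9(n_bits, seed=0x1FF):
    """Generate n_bits of a PRBS9 sequence (assumed polynomial x^9 + x^5 + 1)."""
    state = seed & 0x1FF          # 9-bit register; must not be all zeros
    out = []
    for _ in range(n_bits):
        # XOR of the bits produced 9 and 5 steps ago.
        new_bit = ((state >> 8) ^ (state >> 4)) & 1
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x1FF
    return out


PERIOD = 2**9 - 1                 # 511 bits
seq = prbs9(2 * PERIOD)
print(seq[:PERIOD] == seq[PERIOD:])   # the pattern repeats every 511 bits
print(sum(seq[:PERIOD]))              # number of ones in one period
```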
To mimic realistic PON applications for the CDR, a 40 Gb/s optical receiver with an RF bandwidth of 30 GHz is connected back-to-back with the modulator. This receiver comprises a PIN photodiode and a transimpedance amplifier (TIA), and converts the optical signal back to an electrical data stream. This data stream is amplified before it is applied to the AD-CDR. The amplifier has a gain of 11 dB and a bandwidth of 67 GHz and is required to increase the output swing of the optical receiver to about 370 mVpp such that the CDR can operate correctly. Subsequently, our implemented AD-CDR recovers the timing information and the data from the input signal. The recovered clock is observed with a spectrum analyzer, while the recovered data is recorded by an error analyzer for the BER measurements, by a sampling scope for the eye diagram measurements and by a real-time scope for the settling time measurements.
Please note that in a complete receiver, an adaptive gain control block is typically incorporated in the TIA to keep the input swing of the CDR constant during operation [17] . In our measurement setup, this was not needed because all data is generated by one source.
B. Continuous-Mode Measurements
First, the power consumption and error-free operation were evaluated in continuous mode. A 25 Gb/s PRBS9 input sequence was generated by the bit pattern generator. The corresponding eye diagram at the input of the AD-CDR is given in Fig. 7 and has an amplitude of 370 mVpp and an RMS jitter of 2.6 ps. The conversion to and from the optical domain introduces additional noise and jitter, which deteriorate the desired signal.
The error-free operation of our AD-CDR was then confirmed. Repeated measurements consistently showed that our AD-CDR is able to work error-free for more than 15 min, while consuming only 46 mW. Error-free operation is verified by analyzing one of the quarter-rate outputs with a BER tester. Additionally, the eye diagram of the CDR's output is shown in Fig. 8. The jitter at the output of the CDR is 3.73 ps RMS.
Fig. 9. Phase noise of the quarter-rate recovered clock of the AD-CDR for a PRBS9 input data sequence at 25 Gb/s.
Next, the phase noise of the quarter-rate recovered clock was measured and is depicted in Fig. 9. For this measurement, the proportional gain (K_p) and the integral gain (K_i) of the digital loop filter were set to 5 and 2^-7, respectively. A higher proportional gain would further increase the bandwidth and thus shorten the settling time. However, the input jitter would then be suppressed less, which would result in bit errors. The integral gain is set sufficiently smaller than the proportional gain to avoid instability. This value cannot be too small either, because a sufficiently high gain is needed to reduce any frequency error to zero. Fig. 9 shows that the CDR with our settings has a large 3 dB loop bandwidth (≈ 75 MHz), which directly reduces the settling time of the AD-CDR during burst-mode operation. The figure also shows that many spurs are present in the phase noise of the DCO. These spurs originate from the finite length of the PRBS9 sequence, and the frequency of the fundamental spur can be determined by:
where f_spur is the frequency offset of the spur, f_data is the clock frequency of the input data, N is the subsample factor and length(PRBS9) is the repetition period of the PRBS9 sequence. Moreover, we determined the input sensitivity of the CDR. Fig. 10 shows the obtained bit error rate when the amplitude at the input of the CDR is varied. A bit error rate lower than 10^-12 is reached when the input amplitude of the CDR is larger than 300 mVpp. For all subsequent measurements, the signal amplitude at the input of the CDR was set to 370 mVpp.
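Returning to the spur frequency: the expression is not reproduced in the text above. A form that is consistent with the listed symbols is sketched below, under the assumption that the subsampled pattern seen by the loop filter repeats once every N · length(PRBS9) input bits (16 and 511 share no common factor). Both the form and the resulting number are therefore an estimate, not a value quoted from the measurements.

```latex
f_{\mathrm{spur}} = \frac{f_{\mathrm{data}}}{N \cdot \mathrm{length}(\mathrm{PRBS9})}
= \frac{25\ \mathrm{GHz}}{16 \cdot 511} \approx 3.06\ \mathrm{MHz}
```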
C. Burst-Mode Measurements
To evaluate the burst-mode performance of the AD-CDR, 25 Gb/s packets starting with a "1010..." preamble are used. Because all the packets are generated from the same source, a sufficiently long gap is required between two consecutive packets to ensure that the CDR is no longer in lock with the generator. This is illustrated in Figs. 11 and 12 for two packets with gap sizes of 10 ns and 41 ns, respectively. The top waveform displays an instantaneous sampled output stream, while the bottom waveform shows a persistence-mode view of the output, which superimposes multiple waveforms on the same view. It is clear that with a gap of 10 ns the CDR remains in lock. Increasing the gap length by a factor of four, to 41 ns, brings the CDR out of lock. To ensure random phases of the incoming data with respect to the DCO of the CDR, the gap size is again increased by a factor of four to 164 ns, and this value is employed for the remainder of the measurements. The gap length cannot be increased further because we are limited by the commercially available AC-coupled amplifier in our setup: any increase in the gap length would cause the DC level at the input of the CDR to drift during an idle phase. However, the electrical burst-mode measurements of our CDR described in [8] were performed with extreme gap lengths of 2^17 bits, which corresponds to an idle time of 5.2 μs. Of course, the large gap size is only necessary to stress the device during experiments. In practice, the gap size can be made arbitrarily small.
In order to measure the settling time of the AD-CDR, a long preamble sequence was added in front of the packet. Because the input data is internally demultiplexed into four quarter-rate streams, it is very easy to observe the settling at the output of the device using a real-time oscilloscope: when the "1010..." preamble is demultiplexed by four, each output should stay either low or high. If any transition occurs during this preamble, an error has occurred. The number of transitions observed at the beginning of the packet indicates how many packets were received: because the phase of the incoming packet is randomly distributed, there is an equal chance of receiving a 1 or a 0.
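The observation that a "1010..." preamble produces constant quarter-rate outputs can be checked with a minimal sketch. The helper name and the `offset` parameter, which stands in for the random phase at which the CDR locks onto the burst, are illustrative assumptions.

```python
def demux_by_four(bits, offset=0):
    """Split a bit stream into four quarter-rate streams.

    `offset` models the (random) bit position at which demultiplexing starts.
    """
    return [bits[offset + k::4] for k in range(4)]


preamble = [1, 0] * 16                      # 32 bits of "1010..."

streams = demux_by_four(preamble)
streams_shifted = demux_by_four(preamble, offset=1)

for k, s in enumerate(streams):
    print(f"output {k}: {s}")               # each output is constant

# A one-bit shift in alignment flips the constant level of every output.
print([s[0] for s in streams], [s[0] for s in streams_shifted])
```

Any transition on one of these outputs during the preamble therefore marks a sampling error, which is what makes the settling time easy to read from a real-time oscilloscope.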
The packet structure and the demultiplexed output are schematically illustrated in Fig. 13. The packet consists of a 2^14 bit (≈16 kbit) preamble, a 16 bit long delimiter used to align the 4 output data streams and a 2^20 bit (≈1 Mbit) payload. The gap between two packets is 2^12 bits (≈4 kbit), which corresponds to 164 ns. A captured output packet is shown in Fig. 14. The long preamble was only used to verify that no errors occur during burst-mode operation after settling. In practice, the preamble length can be limited to the worst-case settling time.
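The durations of the packet fields follow from their lengths in bits and the 25 Gb/s line rate. The sketch below redoes that conversion and also expresses the reported 37.5 ns worst-case settling time in bits; the dictionary keys and helper name are illustrative.

```python
BIT_RATE = 25e9                   # line rate in bit/s


def bits_to_ns(n_bits):
    """Duration of n_bits at the line rate, in nanoseconds."""
    return n_bits / BIT_RATE * 1e9


fields = {
    "preamble": 2**14,
    "delimiter": 16,
    "payload": 2**20,
    "gap": 2**12,
}
durations_ns = {name: bits_to_ns(n) for name, n in fields.items()}

for name, t in durations_ns.items():
    print(f"{name:9s}: {fields[name]:8d} bits = {t:10.2f} ns")

# Reported worst-case settling time expressed in bit periods.
settling_bits = 37.5e-9 * BIT_RATE
print(f"settling : {settling_bits:.1f} bits")
```

The worst-case settling time corresponds to fewer than 1000 bit periods, far shorter than the 2^14 bit preamble used in the experiment.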
The AD-CDR aligns the phase of its recovered clock using a wide-band PLL structure. Because this is a closed-loop system, the settling time is strongly related to its bandwidth. Additionally, the settling time depends on the relative phase of the incoming data stream and on the phase noise generated by the DCO. As a result, the settling time has both a deterministic and a stochastic component.
The settling time of the AD-CDR is measured by recording when a transition occurs in the subsampled preamble at the output of the AD-CDR (Fig. 15). Nearly always, the CDR is able to lock on the data with a very short settling time: 99.9% of the transitions occur within a settling time shorter than 20 ns. After capturing 2 million packets, the worst-case settling time was 37.5 ns.
IV. CONCLUSION
In this work, we presented the first 25 Gb/s all-digital PLL-based clock and data recovery circuit working in burst-mode operation. The operation of our AD-CDR at a line rate of 25 Gb/s was measured in a complete optical setup. The device was realized in a 40 nm low-power CMOS process and occupies an area of only 0.050 mm². The AD-CDR consumes 46 mW without the use of an external reference clock, resulting in an energy-per-bit of 1.8 pJ/bit. The digital loop filter is adjustable and, thanks to its large bandwidth, a settling time of 37.5 ns or less is obtained without a start-of-burst signal.
