I. INTRODUCTION
Worldwide, passive optical networks (PONs) are widely deployed as a cost-effective technology for access networks. Since 1985, access line rates have increased tenfold every 5-6 years, which corresponds to a 50% compound annual growth rate [1]. The peak downstream (DS) rate per subscriber is expected to reach 1 Gb/s by 2014. Since 1995, various time division multiplexing PON (TDM-PON) standards have been proposed by two PON standardization working groups, the Full Service Access Network (FSAN) group and the Ethernet in the First Mile (EFM) alliance, and specified by two standards bodies, the International Telecommunication Union (ITU) and the Institute of Electrical and Electronics Engineers (IEEE), respectively. Both standardization efforts were accompanied by the concurrent development of innovative upstream (US) burst-mode receivers (BM-RXs), the physical media dependent (PMD) keystone component of any PON system. Fig. 1 shows the evolution of the ITU-T and IEEE PONs as well as their associated BM-RXs [2]. FSAN was founded in 1995, and its first standard, ITU-T G.983.1 [3], was published in 1998 for an asynchronous transfer mode-based PON (A-PON), mainly used for 155 Mb/s business applications. Some early work on BM-RXs was published in [4]-[10]. An amendment of G.983 in 2005 introduced the broadband PON (B-PON), which further improved PON systems with 622 Mb/s DS and 155-622 Mb/s US bandwidth (BW).
The authors are with the Department of Information Technology/Interuniversity Microelectronics Centre, Ghent University, 9000 Gent, Belgium (e-mail: xingzhi@intec.ugent.be; xin.yin@intec.ugent.be; jochen.verbrugghe@intec.ugent.be; bart.moeneclaey@intec.UGent.be; arno.vyncke@intec.ugent.be; christophe.vanpraet@intec.ugent.be; guy.torfs@intec.ugent.be; Johan.Bauwelinck@intec.ugent.be; Jan.Vandewege@intec.ugent.be).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JLT.2013.2285594
In 2004, the 1 Gb/s limit was exceeded with the publication of the ITU-T G.984 gigabit-capable PON (G-PON) standard [11] and the IEEE 802.3ah gigabit Ethernet PON (GE-PON) standard [12]. G-PON provides an important performance boost, offering higher data rates of 2.5 Gb/s DS and 1.25 Gb/s US with higher BW efficiency. GE-PON uses standard Ethernet frames with 1 Gb/s BW for both DS and US. Compared to G-PON, GE-PON is a simpler standard with relaxed physical (PHY) layer hardware specifications, but with lower BW utilization. During the standardization phase, fast-synchronization 1.25 Gb/s G-PON-compliant BM-RXs with a scrambled non-return-to-zero (NRZ) line code [13]-[16], and ac-coupled 1.25 Gb/s GE-PON BM-RXs with 8B/10B line encoding [17]-[20], were investigated. Both BM-RX techniques are now widely deployed in gigabit PON systems across the world.
The next evolution was the IEEE 802.3av 10G-EPON standard [21], ratified in September 2009. It provides 10 Gb/s DS and 10 Gb/s or 1 Gb/s US. The 10G-EPON must support coexistence with 1G-EPON on the same fiber infrastructure. This requires a 10G/1G dual-rate BM-RX and makes the 10G-EPON BM-RX design more challenging. Numerous papers have been published on this topic, a selection of which is presented in detail in the following sections. On the ITU-T side, FSAN completed the ITU-T G.987 10-Gigabit-capable PON, also known as XG-PON [22], offering 10 Gb/s of DS and 2.5 Gb/s of US BW. XG-PON evolved from the existing G-PON standard and requires coexistence with G-PON on the same optical fiber plant. Both XG-PON and 10G-EPON are conceived to offer fast broadband access on an unprecedented scale in the near future. The ITU-T G.989 40-Gigabit-capable passive optical network (NG-PON2) defines US BM-RX data rates of 2.5-10 Gb/s, while higher aggregate rates of 40 Gb/s are achieved by time and wavelength division multiplexing (TWDM). 40 Gb/s TDM line rates are beyond the scope of NG-PON2.
0733-8724 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This paper presents a tutorial overview of various BM-RX design techniques for both ITU-T and IEEE TDM-PONs, and of their recent developments, focusing on how to achieve fast RX synchronization and good overall performance. Section II describes the specific requirements of BM-RXs from a system-level point of view in terms of optical power budget, burst overhead (OH) composition for synchronization, and the overall BM-RX figure of merit. Section III discusses the two typical RX coupling methods, alternating current (ac) coupling and direct current (dc) coupling, together with their intrinsic technical issues and some solutions. Section IV presents various 2R BM-RX design techniques in detail from a component development point of view. Focusing on 10 Gb/s operation, different configurations and design approaches of burst-mode transimpedance amplifiers (BM-TIAs) and burst-mode limiting amplifiers (BM-LAs) are reviewed and compared. 10 Gb/s burst-mode clock and data recovery (BM-CDR) design techniques are described separately in Section V. 10G/1G dual-rate BM-RX requirements and their implementations are presented in Section VI. State-of-the-art 10 Gb/s BM-RX prototypes and their sub-system performance are demonstrated in Section VII. Finally, Section VIII gives conclusions.
II. BM-RX REQUIREMENTS

A. TDM-PON Upstream Transmission
Fig. 2 illustrates a typical point-to-multipoint TDM-PON system. A single optical line termination (OLT) located at a central office connects 64 to 128 optical network units (ONUs) at the customers' premises via a fiber plant composed mainly of fibers and passive optical splitters. The OLT BM-RX plays a key role in the whole PON system: because it handles the aggregate US traffic, its performance is essential to obtain a high split ratio and an extended reach.
Because the cost of the OLT equipment is shared among all ONUs, it hardly affects the total PON cost, whereas a 3-dB improvement in BM-RX sensitivity can double the split ratio and thus almost double the number of subscribers that can be connected to the PON. Fig. 2 illustrates the operation of a TDM-PON DS link. A single OLT broadcasts data frames with constant amplitude and synchronous phase to all ONUs. Each ONU extracts its own frames and discards all others. As the DS transmission operates in a continuous wave (CW) mode, one can employ a conventional CW-RX at the ONU, which can continuously perform precise amplitude and clock phase recovery by slowly averaging over the frames. The US transmissions, however, follow a time division multiple access (TDMA) scheme with bursty, multi-talker traffic. Each active ONU sends bursts to the OLT within precisely assigned timeslots. The BM-RX must process optical bursts from all active ONUs with very different optical power levels and unpredictable burst-to-burst phase. By means of a rigorous ranging process, TDM-PONs avoid burst collisions and provide high BW efficiency.
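The 3-dB rule of thumb for the split ratio stated above follows from the intrinsic loss of an ideal 1:N power splitter, $10\log_{10} N$ dB, so doubling N costs about 3 dB. The short Python sketch below only illustrates this arithmetic; it assumes ideal splitters and ignores excess loss, connectors and fiber loss.

```python
import math


def ideal_splitter_loss_db(n_ports: int) -> float:
    """Intrinsic loss of an ideal 1:N power splitter (excess loss ignored)."""
    return 10 * math.log10(n_ports)


def split_ratio_factor(extra_budget_db: float) -> float:
    """Factor by which N can grow when extra_budget_db of power budget is gained."""
    return 10 ** (extra_budget_db / 10)


if __name__ == "__main__":
    for n in (32, 64, 128):
        print(f"1:{n} splitter: {ideal_splitter_loss_db(n):.2f} dB")
    # A 3 dB better BM-RX sensitivity allows roughly twice as many ONUs.
    print(f"3 dB extra budget -> split ratio x{split_ratio_factor(3.0):.3f}")
```

Going from 1:64 to 1:128 adds about 3.01 dB of splitting loss, which an improved BM-RX sensitivity has to absorb.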
The BM-RX comprises three main building blocks, as shown in Fig. 2: an avalanche photodiode (APD)-based BM-TIA, a BM-LA, and a BM-CDR circuit. The BM-TIA and the BM-LA together perform the 2R functions of re-amplification and reshaping, forming a 2R BM-RX. The BM-CDR performs the third function, re-timing, so that the three parts together form a 3R BM-RX. Unlike the DS CW-RX, the US BM-RX has to adjust both the amplification gain and the dc offset quickly, and has to find the optimum clock phase quickly at the onset of each burst [13], before data decisions can be made. This makes a BM-RX more difficult to design, especially at 10 Gb/s.
B. Burst-Mode Overhead Composition
As explained in the previous section, the 2R BM-RX performs amplitude recovery via fast gain control and accurate decision threshold extraction on a burst-by-burst basis. The BM-CDR realizes fast clock phase alignment for all bursts received from active ONUs, producing a data stream that is fully synchronized with the OLT system clock. At the onset of each incoming burst, the BM-RX requires settling time for the 2R BM-RX operation and for CDR lock. Therefore, a burst overhead (OH), as defined by ITU-T, or a guard band, as defined by IEEE, is inserted between two adjacent bursts. Fig. 3 depicts the US burst composition and BM-RX synchronization time applicable to both ITU-T G.987 and IEEE 802.3av; both standards define these well, although with somewhat different terminology. Each burst contains two main parts: the guard band/overhead and the data payload. A dead zone is reserved for laser turn-off (Toff), as required by the burst-mode transmitter (BM-TX) of an ONU, to avoid overlap with bursts assigned to other ONUs. The idle time in front of the data is used for ONU laser turn-on (Ton), for the synchronization (Sync) time required for 2R BM-RX settling and CDR lock, and for burst delimitation (or code-group alignment). Each burst starts with a Sync pattern preceding the data payload. This pattern is also called the burst preamble, and its length is expressed in bits or in nanoseconds. G.987.2 offers a slightly different description, dividing the OH time into three sections: the guard time, which includes both Ton and Toff and prevents burst overlap; the preamble time; and the delimiter time, used to detect the presence of a PHY burst and to delineate it.
The burst OH time is variable within certain limits. IEEE 802.3av defines a max. of 512 ns for laser Ton and a max. of 512 ns for laser Toff. The Sync time of a 10 Gb/s BM-RX is defined as max. 800 ns for 2R RX settling, max. 400 ns for CDR lock, and a delimiter pattern of about 6 ns for code-group alignment. The 10 Gb/s data payload in 802.3av is 64B/66B line encoded and protected by forward error correction (FEC). Using a Reed-Solomon (RS) (255, 223) code, an electrical FEC coding gain of 6.4 dB has been measured experimentally for a 10G-EPON system after 20 km of transmission over standard single-mode fiber [23]. Accordingly, the undecoded bit error rate (BER) required to achieve a decoded BER of $1 \times 10^{-12}$ is relaxed to $1 \times 10^{-3}$. However, the effectiveness of FEC depends on the characteristics of the optical transceivers used. In principle, one can expect RS(255, 223) to improve the RX sensitivity by at least 5 dB.
A high optical power budget is an important figure of merit for PON system performance. There are basically two ways to achieve a higher optical power budget: increasing the optical output power launched by each ONU BM-TX, or improving the OLT BM-RX sensitivity. The latter is preferred as the more cost-effective solution, because the cost of the OLT BM-RX is shared by all ONUs. A flexible PON deployment requires not only a high RX sensitivity for a higher split ratio and longer reach, but also a wide dynamic range to handle the differential optical distribution network (ODN) losses that individual ONU bursts experience along their different optical paths. Differential ODN loss accounts for the loss differences between long and short optical paths and between high and low splitting ratios, as well as for the tolerances of all optical components in the ODN. The dynamic range requirement is further aggravated by the use of different types of ONUs on the same PON and by large tolerances on their launched optical powers.
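The 802.3av overhead maxima and code rates quoted above can be combined into a rough worst-case efficiency estimate. The Python sketch below is only an illustration under stated assumptions: the overhead components are simply added (no overlap between the Toff of one ONU and the Ton of the next), 64B/66B coding and RS(255, 223) parity are treated as plain rate factors, and the payload durations are hypothetical.

```python
# Worst-case 10G-EPON upstream burst overhead built from the 802.3av maxima quoted
# in the text. Assumption: all components simply add up (no overlap).
T_ON_NS = 512.0         # max. laser turn-on
T_OFF_NS = 512.0        # max. laser turn-off
T_2R_SETTLE_NS = 800.0  # max. 2R BM-RX settling
T_CDR_LOCK_NS = 400.0   # max. CDR lock
T_DELIMITER_NS = 6.0    # approx. burst delimiter (code-group alignment)

RATE_64B66B = 64 / 66   # line-code efficiency
RATE_RS = 223 / 255     # RS(255, 223) code rate, treated as a simple factor


def overhead_ns() -> float:
    """Total per-burst overhead in ns."""
    return T_ON_NS + T_OFF_NS + T_2R_SETTLE_NS + T_CDR_LOCK_NS + T_DELIMITER_NS


def slot_efficiency(payload_ns: float) -> float:
    """Fraction of an upstream slot occupied by coded payload."""
    return payload_ns / (payload_ns + overhead_ns())


def net_efficiency(payload_ns: float) -> float:
    """Slot efficiency further reduced by 64B/66B coding and RS parity."""
    return slot_efficiency(payload_ns) * RATE_64B66B * RATE_RS


if __name__ == "__main__":
    for payload_us in (1, 10, 100):  # hypothetical burst payload durations
        p = payload_us * 1000.0
        print(f"{payload_us:>4} us payload: slot {slot_efficiency(p):.3f}, "
              f"net {net_efficiency(p):.3f}")
```

With every maximum applied, each burst carries about 2.23 μs of overhead, so short bursts lose a large fraction of their slot, which is one reason why shorter Sync times are attractive.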
Table II summarizes the main BM-RX specifications of all TDM-PON systems. For example, for the 10G-EPON OLT 10GBASE-PR-D3, a max. BM-RX sensitivity of −28 dBm at a pre-FEC BER of $10^{-3}$ is specified, with a dynamic range of 22 dB, which requires an overload level of at least −6 dBm. Otherwise, a strong optical signal emitted by a nearby ONU experiencing the min. ODN loss would severely saturate the BM-RX front-end. The combination of −28 dBm sensitivity and −6 dBm overload is a demanding specification for 10 Gb/s burst-mode operation. The max. Sync time is set to 800 ns for 2R RX settling and 400 ns for CDR lock, with sufficient margin.
C. BM-RX Specifications
During the development of BM-RXs, much effort is put into reducing the Sync time required for BM-RX settling. This is because network operators and system vendors not only need RX performance that meets the specifications set in the PON standards, but also want better features and specifications to differentiate their products from other BM-RX approaches. The real challenge in designing such a high-performance 10 Gb/s PON BM-RX is to meet the combined requirements of high sensitivity, a large loud/soft ratio (dynamic range) and fast synchronization, while also taking other PON-related aspects into account. For example, one needs to compromise between a very short burst OH for high BW efficiency and relatively relaxed OLT timing parameters for a simpler PHY layer implementation and PON network operation. Many trade-offs affect cost effectiveness (such as chip die size, packaging cost and power consumption) as well as overall performance. Moreover, a robust BM-RX design is important for high-yield manufacturing and system interoperability, and multi-rate operation is a must for supporting network scalability and upgradeability.
III. BM-RX COUPLING METHODS
A BM-TIA is either ac-coupled or dc-coupled to a BM-LA. Traditionally, IEEE EPON-like BM-RXs employ the simpler ac-coupling method because the IEEE PON standards define looser Sync times. In this case, line encoding is necessary to optimize the time constants of the ac coupling and of some internal BM-RX circuits. ITU-T G-PON-like BM-RXs, in contrast, are designed to work with a much shorter OH time. The combination of a large input optical power variation (up to 22 dB), a short guard time (4 bytes for G-PON), and the absence of heavy line encoding in G-PON means that a G-PON BM-RX requires dc coupling. Given the 4-byte guard time and the scrambled NRZ payload, which allows up to 72 consecutive identical bits (CIDs) within a burst, it is impossible to choose an appropriate ac-coupling time constant without spoiling the initial conditions at the start of a new burst. This would result in signal distortion and a burst-mode penalty [24].
A. AC-Coupled BM-RXs
A typical ac-coupled BM-RX is shown in Fig. 4. The BM-TIA is ac-coupled to the BM-LA via coupling capacitors, so that the dc component at the BM-LA input decays as $e^{-t/\tau}$, where $t$ is time and $\tau$ is the RC time constant. Because the ac-coupled circuit is a high-pass filter, it rejects low-frequency content, and its low-frequency cutoff must be close enough to dc to avoid baseline droop and pattern-dependent jitter with long CID patterns in the payload [25]. However, if the low-frequency cutoff is too low, the BM-RX requires a long preamble for settling, because it takes time to evacuate the charge accumulated on the coupling capacitors during the preceding burst. The optimum RC time constant therefore needs to be chosen carefully, especially in the case of a loud burst followed by a soft burst [26]. The 8B/10B line encoding employed in GE-PON BM-RXs, which limits runs of CIDs to at most 5 bits, makes the design easier but at the expense of a throughput decrease of more than 20% [27].
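The ac-coupling dilemma can be quantified with the first-order decay $e^{-t/\tau}$: the droop over the longest CID run sets a lower bound on $\tau$, and the requirement to discharge the previous burst within the settling window sets an upper bound. In the Python sketch below, the 10% droop limit, the $10^{-3}$ residual, and the use of the whole 96-bit G-PON OH as settling window (an optimistic choice) are hypothetical values for illustration; the CID lengths and bit rates are those of G-PON (1.24416 Gb/s US) and 10G-EPON (10.3125 Gb/s).

```python
import math


def tau_window(cid_bits, settle_bits, bit_rate_hz, max_droop, residual):
    """Return (tau_min, tau_max) in seconds for a first-order ac coupling.

    tau_min: the droop 1 - exp(-t/tau) over the longest CID run stays <= max_droop.
    tau_max: the charge left from the previous burst, exp(-t/tau), decays to
             <= residual within the settling window.
    """
    t_cid = cid_bits / bit_rate_hz
    t_settle = settle_bits / bit_rate_hz
    tau_min = t_cid / -math.log(1.0 - max_droop)
    tau_max = t_settle / math.log(1.0 / residual)
    return tau_min, tau_max


CASES = {
    # name: (max. CID bits, settling window in bits, bit rate in b/s)
    "G-PON (96-bit OH)": (72, 96, 1.24416e9),
    "10G-EPON (800 ns)": (65, 8250, 10.3125e9),
}

if __name__ == "__main__":
    for name, (cid, settle, rate) in CASES.items():
        # Hypothetical limits: 10% droop, 1e-3 residual from the previous burst.
        lo, hi = tau_window(cid, settle, rate, max_droop=0.10, residual=1e-3)
        verdict = "feasible" if lo <= hi else "no valid tau"
        print(f"{name}: tau_min {lo * 1e9:.1f} ns, tau_max {hi * 1e9:.1f} ns -> {verdict}")
```

Under these assumptions, no time constant satisfies both bounds for G-PON, whereas the 800 ns 10G-EPON window leaves a range of valid values, consistent with dc coupling for G-PON and ac coupling for EPON-like receivers.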
As shown in Fig. 4(a), fast gain control is required for the BM-TIA because it needs to handle a 22 dB dynamic range (an amplitude ratio of more than 100); a fixed transimpedance gain would severely saturate the TIA. The BM-LA amplifies the BM-TIA output signal and limits its output to a digital level. The BM-LA also performs offset compensation: once the dc component has been removed by the ac-coupling capacitors and the offset voltage between the positive and negative signals of the differential input has been cancelled, its decision threshold lies in the middle of the signal. The BM-CDR performs fast clock/data phase extraction and data recovery.
A conventional BM-RX employs an average detector (AD) for decision threshold extraction, as depicted in Fig. 5(a) [28]. In this case, the response time is dominated by the time constants of the AD and the coupling capacitance. Figs. 5(b) and (c) explain how these time constants deteriorate the RX response time and cause waveform distortion and duty cycle fluctuation. 10G-EPON uses 64B/66B encoding with max. 65-bit CIDs. In this case, the time constant is critical to achieving both a short preamble and a sufficiently small baseline droop, so that a wide dynamic range can be handled without waveform distortion. The recommended RX settling time is ∼400 ns [26], and the 2R RX settling time specified in IEEE 802.3av is 800 ns, which includes a margin. Reference [28] proposes a baseline wander common-mode rejection (BLW-CMR) technique and an inverted distortion technique to cancel the transient response and reduce the duty cycle distortion of the 10G-EPON ac-coupled BM-RX. A fast RX settling time of <150 ns was demonstrated for this ac-coupled, average-detect type BM-RX.
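To illustrate how the AD time constant trades settling speed against CID tolerance, the Python sketch below models the AD as a first-order RC average of a normalized NRZ waveform (levels 0 and 1, ideal threshold 0.5). The 50 ns time constant and the stale starting threshold are hypothetical; the 65-bit run corresponds to the 64B/66B maximum quoted above.

```python
import math


def average_detect_threshold(bits, bit_rate_hz, tau_s, v_start=0.5):
    """Decision threshold produced by a first-order RC average detector.

    NRZ levels are normalized to 0 and 1, so the ideal threshold is 0.5.
    Every bit period the threshold moves a fraction alpha towards the current level.
    """
    alpha = 1.0 - math.exp(-1.0 / (bit_rate_hz * tau_s))
    vth = v_start
    trace = []
    for b in bits:
        vth += alpha * ((1.0 if b else 0.0) - vth)
        trace.append(vth)
    return trace


if __name__ == "__main__":
    rate, tau = 10.3125e9, 50e-9   # hypothetical AD time constant of 50 ns
    preamble = [1, 0] * 200        # balanced "1010" pattern
    cid_run = [1] * 65             # longest 64B/66B CID run
    trace = average_detect_threshold(preamble + cid_run, rate, tau)
    print(f"threshold after preamble: {trace[len(preamble) - 1]:.3f}")
    print(f"threshold after 65 CIDs:  {trace[-1]:.3f}")
    # Burst start with a stale threshold left at 0 by the preceding idle period.
    cold = average_detect_threshold([1, 0] * 2000, rate, tau, v_start=0.0)
    settled = next(i for i, v in enumerate(cold) if abs(v - 0.5) < 0.01)
    print(f"bits to settle within 1%: {settled} (~{settled / rate * 1e9:.0f} ns)")
```

With these values the threshold drifts by roughly 6% of the signal swing during a 65-bit run, and a cold start needs on the order of 200 ns to settle; a shorter time constant would speed up settling but increase the drift.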
B. DC-Coupled BM-RXs
Any mechanism that retains memory of a preceding burst, such as ac coupling, slow dc offset compensation and slow automatic gain control (AGC), can hardly be employed in a G-PON BM-RX because of its stringent OH requirement, which is needed to reach a very high network efficiency. In this case, dc coupling is preferred over ac coupling. Because there are no coupling capacitors between the TIA and the LA, there is no dominant time-constant constraint caused by coupling capacitors, but dc components are not removed automatically, and the decision threshold varies considerably from burst to burst. As shown in Fig. 6, dc coupling implies the presence of dc offsets, which can become a limiting factor in obtaining a high RX sensitivity. The BM-RX must therefore remove such offsets.
For example, a G-PON BM-RX must perform dynamic level detection and amplitude recovery based on fast decision threshold extraction from each incoming burst within a very short OH of 96 bits. Any transient or tail caused by a preceding strong burst will hinder the reception of a closely following weak burst. An additional Reset pulse is therefore required to shorten this tail and to erase the threshold set for the preceding burst. During and shortly after Reset, unwanted transients can occur within the guard time, and during RX settling, invalid output can occur within the preamble, as shown in Fig. 7 [29]. Therefore, a so-called blanking signal is usually sent to the G-PON BM-CDR by the succeeding medium access control (MAC) large-scale integrated circuit (LSI), so that the BM-CDR ignores the BM-RX output until valid data are received [30]. The Reset pulse is a time-critical signal whose timing depends on the BM-RX design and on the PON PHY layer implementation, which makes the interoperability of the PON system harder to guarantee. As a result, dc-coupled G-PON applications require a dedicated BM-TIA [16] and a dedicated BM-LA [14] with Reset signaling.
IV. 10 Gb/s 2R BM-RX ARCHITECTURES
The two typical BM-RX coupling methods, ac and dc coupling, and their specific issues were described in Section III. As 10G-EPON has been standardized, this section focuses on 10 Gb/s BM-RX architectures. There are two main configurations to implement a 2R BM-RX, depicted in Fig. 8(a) and (b), respectively [31]: one combines a step AGC with a peak-detection automatic threshold control (ATC) and uses Reset; the other combines a continuous AGC with a continuous, average-detection ATC and needs no Reset. The combination of step AGC and peak-detect ATC can settle gain and threshold quickly while maintaining a high tolerance to CIDs, but it is vulnerable to unspecified transient fluctuations of the ONU burst signals. The approach of Fig. 8(b) continuously adapts the gain and the threshold throughout the burst. It does not need a Reset signal, but it needs a longer 2R RX settling time (a few hundred ns) to achieve a high tolerance to CIDs.
According to the type of gain control used to achieve a wide dynamic range, BM-TIA architectures fall into three main categories [32]: 1) a conventional fixed-gain type, 2) a step-AGC type [33], [34], and 3) a continuous-AGC type [35]. Because an offset voltage between the positive and negative signals of the differential input is still present at the BM-TIA output, the BM-LA needs to cancel it. This offset is added at the ac coupling between the BM-TIA and the BM-LA, resulting in pulse width distortion (PWD). Therefore, most recent BM-RX design approaches include partial or full fast automatic offset cancelation (AOC) in the BM-TIA rather than only in the BM-LA. In this way, the BM-RX dynamic range can be improved dramatically. Five circuit techniques have been proposed for AOC/ATC [32]: 1) reduction of the ac-coupling capacitance, 2) feed-forward top/bottom (positive/negative) peak detection with Reset [14], [36]-[39], 3) edge detection without Reset [40], 4) feed-forward average detection without Reset [35], [41], and 5) feedback AOC without Reset [42]. A newly developed feedback AOC with switchable loop BW and on-chip Reset generation is reported in [43]-[45]. In the following subsections, six advanced 10 Gb/s BM-TIA and BM-LA design examples are described in detail.
A. 2-Step AGC BM-TIA With Coarse/Fine AOC
Reference [33] presents a 10 Gb/s BM-TIA with a 2-step AGC and a coarse/fine AOC, as shown in Fig. 9. A variable feedback resistor $R_F$ is switched automatically between high and low gain under the control of a fast level-detection circuit that uses a comparator with hysteresis. Coarse/fine AOC circuitry was implemented in the TIA to boost the dynamic range and to allow ac coupling to a simpler BM-LA. When the input signal is too large, the coarse AOC, switched by the hysteresis comparator, draws a current $I_{AOC}$ at the TIA input. The feed-forward fine AOC then compensates the remaining offset with a level-hold circuit that detects and holds the peak level of each differential input. An external Reset signal is inserted between bursts to initialize the level detection and the offset compensation for each burst. This PIN-based BM-TIA exhibited a response time of 10 ns, an RX sensitivity of −19.5 dBm at BER = $10^{-3}$, and a dynamic range of 20.5 dB.
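The benefit of the hysteresis comparator can be shown with a small behavioural model. In this Python sketch, the detected levels and the two thresholds are hypothetical, input-referred values in arbitrary units, and the external inter-burst Reset is represented only by the starting state.

```python
def hysteresis_gain_select(levels, th_up, th_down, high_gain=True):
    """Two-step AGC decision with a hysteresis comparator (behavioural sketch).

    levels: detected input-referred signal levels (hypothetical, arbitrary units).
    Switch to low gain when a level exceeds th_up; return to high gain only when
    a level drops below th_down (< th_up), so small fluctuations around a single
    threshold cannot make the gain toggle.
    """
    if th_down >= th_up:
        raise ValueError("hysteresis requires th_down < th_up")
    states = []
    for level in levels:
        if high_gain and level > th_up:
            high_gain = False
        elif not high_gain and level < th_down:
            high_gain = True
        states.append("high" if high_gain else "low")
    return states


if __name__ == "__main__":
    noisy_loud_burst = [0.9, 1.05, 0.98, 1.02, 0.97, 1.01]
    print(hysteresis_gain_select(noisy_loud_burst, th_up=1.0, th_down=0.8))
    # -> ['high', 'low', 'low', 'low', 'low', 'low']
    # A plain comparator at 1.0 would toggle: high, low, high, low, high, low.
```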
B. 3-Step AGC BM-TIA With Coarse AOC
Fig. 10 depicts the architecture of a 10 Gb/s BM-TIA with a 3-step AGC and a coarse AOC [44]. This IC was developed in a 0.13 μm SiGe BiCMOS technology with a 2.5 V power supply. The design of this 10 Gb/s switchable-transimpedance BM-TIA, implemented with two shunt-shunt feedback resistors across the TIA core, is critical. A complete high-frequency equivalent model of the TIA should be considered carefully and should include: 1) the 10G APD with its parasitics (the APD capacitance and the bond-pad capacitances of the APD and its decoupling capacitor), keeping in mind that at a high multiplication factor M (e.g., M = 10) the 3 dB BW of a 10G APD is mostly limited by its gain-BW product; 2) the input pad of the TIA IC and the bond-wire inductance between the APD and the TIA input; 3) the parasitics of the two feedback resistors and their switches; and 4) the parasitics of the bonding pads and bond wires of the power supply decoupling capacitors. This high-frequency equivalent circuit has multiple poles and zeros, which may cause resonance peaks and a steep roll-off in the TIA transfer function. The inductance of the bond wire between the APD and the TIA can help to compensate the frequency response and improve the RX sensitivity. However, designing an APD-TIA with fast gain switching is complex [46]. Unsatisfactory low-pass characteristics of a 10 Gb/s BM-TIA can result in an RX sensitivity penalty of 1-2 dB if the 3 dB BW is too low, and in a penalty of a few dB if severe peaking in the TIA frequency response generates inter-symbol interference.
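The effect of peaking and limited BW can be explored with a generic two-pole low-pass response $H(s) = \omega_0^2/(s^2 + s\,\omega_0/Q + \omega_0^2)$. This is a deliberate simplification: the real APD-TIA model described above has more poles and zeros, and the 7 GHz natural frequency used below is hypothetical. The Python sketch scans the magnitude response and reports the peaking and the −3 dB BW for several quality factors Q.

```python
import math


def two_pole_mag_db(f_hz, f0_hz, q):
    """|H(j*2*pi*f)| in dB for H(s) = w0^2 / (s^2 + s*w0/Q + w0^2)."""
    x = f_hz / f0_hz
    return -10 * math.log10((1 - x * x) ** 2 + (x / q) ** 2)


def peaking_and_bw(f0_hz, q, f_max_hz=40e9, n=40000):
    """Scan 0..f_max_hz; return (peak gain in dB, first frequency below -3 dB)."""
    peak, bw = 0.0, None  # DC gain is 0 dB
    for i in range(1, n + 1):
        f = f_max_hz * i / n
        g = two_pole_mag_db(f, f0_hz, q)
        peak = max(peak, g)
        if bw is None and g < -3.0:
            bw = f
    return peak, bw


if __name__ == "__main__":
    f0 = 7e9  # hypothetical natural frequency of the front-end
    for q in (0.5, 1 / math.sqrt(2), 1.0, 2.0):
        peak, bw = peaking_and_bw(f0, q)
        print(f"Q={q:.2f}: peaking {peak:.2f} dB, -3 dB BW {bw / 1e9:.2f} GHz")
```

For $Q = 1/\sqrt{2}$ the response is maximally flat with a −3 dB BW close to $f_0$; a larger Q extends the BW but introduces peaking (about 1.25 dB at Q = 1), the kind of behavior that leads to inter-symbol interference.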
In Fig. 10, 3-step gain switching (GS) is implemented between high, medium and low gain. The BM-TIA performs a fast AGC by means of a variable-gain TIA core followed by a variable-gain single-ended-to-differential (S2D) circuit. A dummy TIA feeds the references to both the gain switch block and the S2D. The gain switch block generates the logic signals GS1 and GS2 by comparing the signal in the data path with the references. At the end of each burst, the TIA is reset to its high transimpedance gain, i.e., to maximum RX sensitivity, to prepare for a possibly soft next burst. When a loud burst arrives, the transimpedance $R_F$ is switched from the high to the medium value under the control of GS1. Because the parasitics introduced by further feedback paths would limit the TIA front-end BW, and to keep the circuit stable with peaking within specifications, this approach switches the gain from medium to low by changing the load impedance of the S2D under the control of GS2. The advantage of this architecture is that the input-referred noise requirement of the S2D is less critical than that of the TIA core. This provides more freedom in the design trade-off and allows better optimization of the TIA BW. Gain switching in the S2D stage behaves as an open loop, so stability is not an issue, and the response is reliable and faster. Once gain switching has settled, the TIA gain is locked to avoid toggling on signal level fluctuations during the burst payload. The BM-TIA also performs a coarse AOC, which sets the balance signal of the S2D converter according to the TIA gain setting. This minimizes the dc offset of the TIA output at the RX sensitivity level and makes the succeeding BM-LA design easier.
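The gain-switching sequence described above (reset to high gain, stepping down via GS1 and GS2, then locking for the payload) can be captured in a small state machine. The Python sketch below is a hypothetical behavioural model: it compares input-referred levels against two references and does not model the separate TIA-core and S2D mechanisms or their timing.

```python
from enum import Enum


class Gain(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class ThreeStepAgc:
    """Behavioural sketch of 3-step gain switching with lock (hypothetical model).

    Levels are input-referred and in arbitrary units. ref_gs1 < ref_gs2 play the
    role of the references that trigger GS1 (high -> medium) and GS2 (-> low).
    """

    def __init__(self, ref_gs1: float, ref_gs2: float, settle_samples: int):
        if not ref_gs1 < ref_gs2:
            raise ValueError("expected ref_gs1 < ref_gs2")
        self.ref_gs1 = ref_gs1
        self.ref_gs2 = ref_gs2
        self.settle_samples = settle_samples
        self.reset()

    def reset(self) -> None:
        """End of burst: return to high gain (max. sensitivity) and unlock."""
        self.gain = Gain.HIGH
        self.locked = False
        self.count = 0

    def step(self, level: float) -> Gain:
        """Process one level sample; the gain only steps down until it is locked."""
        if not self.locked:
            if level > self.ref_gs2:
                self.gain = Gain.LOW              # GS2 asserted
            elif level > self.ref_gs1 and self.gain is Gain.HIGH:
                self.gain = Gain.MEDIUM           # GS1 asserted
            self.count += 1
            if self.count >= self.settle_samples:
                self.locked = True                # ignore payload fluctuations
        return self.gain


if __name__ == "__main__":
    agc = ThreeStepAgc(ref_gs1=1.0, ref_gs2=4.0, settle_samples=4)
    burst = [2.0, 2.0, 2.0, 2.0, 5.0]  # hypothetical levels; 5.0 arrives after lock
    print([agc.step(x).name for x in burst])  # all MEDIUM: the late spike is ignored
    agc.reset()
    print(agc.gain.name)  # HIGH, ready for a soft burst
```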
Moreover, in this design the Reset pulse is not provided externally via a separate pin of the APD-TIA module. Instead, the Reset is conveyed from the succeeding BM-LA by a common-mode (CM) signaling method: the input stage of the BM-LA alters the CM voltage of the BM-TIA output, and the TIA senses the CM change during the guard period, regenerates the Reset pulse, and then activates an on-chip reset and lock function, as shown in Fig. 10. This APD-based BM-TIA achieved an RX sensitivity of −31.8 dBm at BER = $10^{-3}$ (with an APD multiplication factor M = 9) and a dynamic range of 30.7 dB, with a response time of <10 ns.
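A minimal way for a receiver to regenerate a Reset pulse from a common-mode shift is a comparator followed by a debounce counter. The Python model below is hypothetical: the nominal CM voltage, the detection threshold and the number of consecutive samples are illustrative values, not parameters of the circuit in [44].

```python
def detect_cm_reset(v_cm_samples, v_nominal, delta_v, min_samples):
    """Return the sample indices at which a Reset pulse would be regenerated.

    A pulse is issued once the common-mode voltage has deviated from v_nominal by
    more than delta_v for min_samples consecutive samples (debounce against noise);
    the detector re-arms only after the CM voltage returns to nominal.
    """
    pulses = []
    run = 0
    for i, v in enumerate(v_cm_samples):
        if abs(v - v_nominal) > delta_v:
            run += 1
            if run == min_samples:
                pulses.append(i)
        else:
            run = 0
    return pulses


if __name__ == "__main__":
    # Hypothetical CM trace (volts): a guard-period shift, then a one-sample glitch.
    trace = [1.2, 1.2, 1.0, 1.0, 1.0, 1.0, 1.2, 1.0, 1.2]
    print(detect_cm_reset(trace, v_nominal=1.2, delta_v=0.1, min_samples=3))  # -> [4]
```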
C. Continuous AGC/ATC BM-TIA Without Reset
A dual-rate 10G/1G BM-TIA (preamplifier) employing a continuous AGC and a continuous ATC is shown in Fig. 11 [41]. This BM-TIA IC adapts its gain and threshold continuously to compensate for both the received optical power variation and the unspecified transient responses of the ONU burst signals. The AGC/ATC functions can switch the gain, transient response time and equivalent noise BW to optimize the reception of each 10G or 1G burst, and are driven by an external rate-select signal provided by the MAC LSI. The dual-rate APD-based BM-RX demonstrated, for 10 Gb/s operation, an RX sensitivity of −30.8 dBm and an overload of −5 dBm at BER = $10^{-3}$ with <800 ns 2R settling time, and, at 1.25 Gb/s, a sensitivity of −35.5 dBm and an overload of −9 dBm at BER = $10^{-12}$ with <400 ns 2R settling time.
D. Positive/Negative Peak Detect BM-LA
Positive/negative peak detection with Reset is the preferred technique for G-PON dc-coupled BM-RXs with a very short Sync time (a 2R RX settling time of 24 bits at 1.25 Gb/s) [14], [15]. This technique was also employed for 10 Gb/s symmetric long-reach PONs with 512 ONUs, supported by an erbium-doped fiber amplifier (EDFA) in front of the OLT BM-RX. In this case, fast synchronization is especially important for high network efficiency because of the very high split ratio, while the 10 Gb/s BM-RX sensitivity requirement is significantly relaxed thanks to the EDFA used as an optical preamplifier. The PON uplink with bursty traffic requires the EDFA to operate in burst mode; using a fast gain-stabilizing scheme, the EDFA can suppress its gain transients [38]. This low-noise EDFA relaxes both the BM-RX sensitivity and dynamic range requirements. Fig. 12(a) illustrates the circuit building blocks of the 10 Gb/s BM-LA [39]. It implements four stages of differential difference limiting amplifiers that successively determine the decision threshold with gradually improving accuracy. In the first stage, the threshold Vth for each phase is set using a positive peak detector (PPD), a negative peak detector (NPD) and a resistive divider. To reduce the number of peak detectors, the next three stages use only NPDs. The disadvantage of this feed-forward AOC scheme with fast peak detectors is that it is intrinsically less accurate: the BM-LA exhibits a noisy decision threshold caused by random dc offsets, resulting in a BM-RX sensitivity penalty [47]. Fig. 12(b) shows that this type of BM-LA needs critically timed Reset signals for each peak detector. With on-chip Reset generation [48], a 2R settling time of <25 ns was demonstrated, at the expense of a high power consumption (1.1 W) compared with a feedback AOC BM-LA (<250 mW) and a feedback AOC BM-LA with switchable loop BW and on-chip auto-Reset generation (430 mW).
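The role of the Reset in a feed-forward peak-detect ATC can be shown with ideal peak detectors. In the Python sketch below, the PPD and NPD hold the running maximum and minimum since the last Reset, and the threshold is their midpoint, as a resistive divider would produce; the signal levels are normalized, hypothetical values, and detector droop and noise are ignored.

```python
def peak_detect_threshold(samples, reset_mask):
    """Feed-forward ATC sketch with ideal positive/negative peak detectors.

    The PPD/NPD hold the running max/min since the last Reset; the threshold is
    their midpoint (resistive divider). Returns None for samples during Reset.
    """
    ppd = npd = None
    vth = []
    for v, rst in zip(samples, reset_mask):
        if rst:
            ppd = npd = None  # Reset erases the peaks held from the preceding burst
            vth.append(None)
            continue
        ppd = v if ppd is None else max(ppd, v)
        npd = v if npd is None else min(npd, v)
        vth.append((ppd + npd) / 2)
    return vth


if __name__ == "__main__":
    loud = [1.0, 0.0] * 4                 # hypothetical normalized loud burst
    soft = [0.05, 0.0] * 4                # soft burst, 20 times weaker
    samples = loud + [0.0] + soft         # one guard sample between the bursts
    with_reset = [False] * 8 + [True] + [False] * 8
    no_reset = [False] * 17
    print(peak_detect_threshold(samples, with_reset)[-1])  # midpoint of the soft burst
    print(peak_detect_threshold(samples, no_reset)[-1])    # stuck at loud-burst midpoint
```

Without the Reset between bursts, the threshold remains at the midpoint of the preceding loud burst and every bit of the soft burst is decided as zero. The same model shows why the scheme is noise-sensitive: a single noise spike becomes part of the held peak.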
E. Feedback AOC BM-LA Without Reset
Reference [42] reports a 10 Gb/s BM-LA consisting of two-stage amplifiers with two active feedback loops for the AOC, as shown in Fig. 13(a). It uses two differential amplification stages with the active feedback circuit, followed by two differential amplification stages without feedback. The circuit was designed to offer linear amplification even for a large input voltage offset, providing an almost constant feedback gain over a dynamic range of around 20-200 mV peak-to-peak. Most of the gain in the data path is provided by the last two amplification stages. Fig. 13(b) compares the simulated step response of the proposed scheme with that of the conventional one-stage scheme. The AOC time of 200 ns is 60% shorter than in the one-stage case, and 75% shorter than the max. settling time of 800 ns set in the 10G-EPON specifications.
F. Feedback AOC BM-LA With Switched Loop BW
Although the 2R RX Sync time is specified as max. 800 ns in IEEE 802.3av, a shorter Sync time, free of the RC time-constant limitation caused by ac coupling, is preferred. A dc-coupled 10 Gb/s BM-LA IC using a feedback-type AOC circuit with switchable loop BW has been proposed, as shown in Fig. 14. A feedback AOC offers better RX sensitivity than the feed-forward type, but it requires a slow feedback loop with a large time constant for accurate decision threshold extraction. By switching the loop BW between the preamble and the data payload, the RX sensitivity is improved while the Sync time remains shorter than in the ac-coupling approach. Moreover, the dc-coupled RX allows auto-Reset generation via common-mode signaling. The BM-LA data path comprises a linear preamplifier, two LA stages, an offset integrator with switchable loop BW, and an output buffer [44]. This BM-LA also integrates auto-Reset generation and activity detection circuits that control the AOC BW switch, so that the control signals illustrated in Fig. 14 are created internally, without the need for time-critical signals from the PON system. When a burst arrives, the activity detection circuit first detects the start of the incoming burst from its "1010" preamble pattern. During the preamble, the BM-LA performs fast AOC and amplitude recovery with a large AOC loop BW, corresponding to a small time constant of 8 ns. Once the correct threshold is established, the offset integrator switches its AOC loop BW to a larger time constant of 400 ns, entering a slow tracking mode within the payload. At the end of the burst (EOB), the BM-LA resets itself to its initial state in preparation for the next burst. The BM-LA also sends the Reset/Activity signal back to the preceding BM-TIA IC by driving the input common-mode voltage $V_{CM}$. As depicted in Fig. 10, the BM-TIA can extract this Reset/Activity signal from $V_{CM}$ without any performance penalty.
This technique is capable of removing the input dc offset in 50 ns, and it offers continuous decision threshold tracking during the burst payload so that the BM-RX can cope with long CIDs. Table III compares the performance of various 2R BM-RX prototypes [28], [49]-[54].
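The benefit of switching the loop BW can be reproduced with a discrete-time model of the offset integrator. The Python sketch below uses the 8 ns and 400 ns time constants quoted above; the normalized data swing of ±0.5, the input offset of 0.3 and the idealized balanced preamble are hypothetical, and the preamplifier, limiting stages and activity detection are not modeled.

```python
import math


def simulate_switched_aoc(bits, offset, bit_rate_hz, tau_fast_s, tau_slow_s, fast_bits):
    """Feedback offset integrator with a switchable loop time constant.

    The LA input is data + offset - correction, with balanced NRZ data of +/-0.5.
    Each bit the integrator moves `correction` towards the mean input level, so for
    balanced data it converges to `offset`. The first `fast_bits` bits (preamble)
    use tau_fast_s, the rest (payload) tau_slow_s. Returns the residual offset
    (offset - correction) after every bit.
    """
    a_fast = 1.0 - math.exp(-1.0 / (bit_rate_hz * tau_fast_s))
    a_slow = 1.0 - math.exp(-1.0 / (bit_rate_hz * tau_slow_s))
    correction = 0.0
    residual = []
    for k, b in enumerate(bits):
        v_in = (0.5 if b else -0.5) + offset - correction
        correction += (a_fast if k < fast_bits else a_slow) * v_in
        residual.append(offset - correction)
    return residual


if __name__ == "__main__":
    rate = 10.3125e9
    preamble = [1, 0] * 300   # about 58 ns of "1010" preamble
    payload = [1] * 65        # longest 64B/66B CID run
    bits = preamble + payload
    switched = simulate_switched_aoc(bits, 0.3, rate, 8e-9, 400e-9, len(preamble))
    fast_only = simulate_switched_aoc(bits, 0.3, rate, 8e-9, 8e-9, len(preamble))
    print(f"residual after preamble:      {switched[len(preamble) - 1]:+.4f}")
    print(f"after 65 CIDs, switched BW:   {switched[-1]:+.4f}")
    print(f"after 65 CIDs, fast BW only:  {fast_only[-1]:+.4f}")
```

With these assumptions, the residual offset after the preamble is a few thousandths of the swing, but during a 65-bit CID run the fast-only loop drifts by more than a quarter of the swing, whereas the switched loop drifts by less than 1%.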
V. 10 Gb/s BM-CDR DESIGN ARCHITECTURES
Short response time and large CID tolerance are conflicting requirements in high-speed BM-CDR design. Locking frequency and phase quickly, using only the limited information in the short preamble of each burst, is intrinsically difficult. There are three main BM-CDR design techniques to tackle this problem, as depicted in Fig. 15 [55]: a) a fast-lock phase-locked loop (PLL) based CDR [56], [57], b) a gated voltage-controlled oscillator (G-VCO) based CDR [58]-[61], and c) an over-sampling CDR [62]-[69].
A. Fast-Lock PLL Based BM-CDR
The fast-lock PLL based CDR is a closed-loop CDR implementation. It can provide high jitter rejection, but it needs many transitions to lock. The broadband PLL of [56] reduces the acquisition time by trading off jitter-transfer BW and resilience against clock drift, but the design relies on Manchester encoding. A short acquisition time causes bit errors after long CIDs and increases the output data/clock jitter. The minimum time constant of the PLL is limited by additive noise.
Calculations in [32] show that, for a 10 Gb/s CDR, the acquisition time is limited to around 130 ns for 0.1 unit interval (UI) of peak-to-peak output jitter. A locking time below 100 ns was demonstrated in [57] with a very careful design.
B. G-VCO Based BM-CDRs
The G-VCO based CDR is an open-loop approach that uses a gating circuit to re-align the oscillator. It re-aligns the sampling clock at every transition in the data stream, resulting in a very fast response of 1 bit. Its disadvantages are the lack of jitter rejection and reduced PWD tolerance. Conventional G-VCO based BM-CDR circuits use two G-VCOs [58]: a main G-VCO aligns the phase of the oscillation clock with the input data, and a sub G-VCO adjusts the frequency of the main G-VCO by means of a PLL. This multiple G-VCO architecture suffers from a frequency difference between the G-VCOs, caused mainly by on-chip mismatch. Because the main G-VCO is re-aligned only at data transitions, this frequency difference lets the sampling phase drift during a CID, resulting in low CID tolerance.
Reference [59] employs a single G-VCO and a delta-sigma digital-to-analog converter (ΔΣ DAC). This single G-VCO approach uses a frequency-locked loop instead of the PLL in [60], reducing the frequency error from 20 MHz to <2 MHz. However, it cannot reduce the input jitter, because the CDR aligns the phase of the oscillator clock with the input data bit by bit. Reference [61] further improves the CDR performance by using two cascaded LC oscillators combined with jitter-reduction and PWD-compensation circuits, as shown in Fig. 16. With the jitter-reduction circuit, this 10 Gb/s BM-CDR reduces the output data jitter by 3 dB at a jitter frequency of 1 GHz, and its jitter transfer is 1.3 dB better than that of the CDR without the jitter-reduction circuit [61]. It synchronizes to the input data within 14 bits of the burst input, and its PWD tolerance is +0.22/−0.32 UI at 10.3125 Gb/s. The PWD-compensation circuit can also shorten the preamble period needed for the BM-LA AOC: offset cancellation to a pulse width of over 0.9 UI takes about 50 ns, whereas the PWD-compensation circuit completes the cancellation in 25 ns.
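The link between G-VCO frequency error and CID tolerance can be made concrete with back-of-the-envelope arithmetic. Assuming the oscillator is re-aligned at the last data transition and then free-runs, its sampling phase drifts by Δf/f_b UI per bit. The hypothetical helper `max_cid_bits` returns the run length after which this drift exceeds an assumed phase margin (0.5 UI for ideal eye-center sampling with no jitter, so real tolerances are lower). The 10.3125 Gb/s rate is an assumption.

```python
def max_cid_bits(bit_rate_hz, freq_error_hz, phase_margin_ui=0.5):
    """Longest transition-free run before the free-running G-VCO phase
    drifts by more than `phase_margin_ui` (idealized: no jitter, no PWD)."""
    drift_per_bit_ui = freq_error_hz / bit_rate_hz
    return phase_margin_ui / drift_per_bit_ui


if __name__ == "__main__":
    for err_hz in (20e6, 2e6):  # frequency errors quoted for [59]
        print(f"{err_hz / 1e6:.0f} MHz error -> "
              f"{max_cid_bits(10.3125e9, err_hz):.0f} bits")
```

For 10.3125 Gb/s this gives about 258 bits at 20 MHz and about 2578 bits at 2 MHz, illustrating why reducing the frequency error improves the CID tolerance.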
C. 10 Gb/s Over-Sampling BM-CDRs
The over-sampling CDR follows a multiple-phase selection approach. It keeps output jitter low even under heavy input data jitter, because the multi-phase clocks are generated from a clean, stable OLT system clock that is independent of fluctuations of the burst packets. The over-sampling CDR can provide a higher PWD tolerance and a short Sync time of a few bits, at the cost of high power consumption and large IC chip area. Fig. 17 depicts the block diagram of the 82.5 GSa/s BM-CDR [67]. The over-sampling CDR IC integrates a 10.3 GHz × 8-phase PLL and an 82.5 GSa/s sampler. The PLL generates eight 10.3 GHz clocks, successively shifted by 45° in phase and synchronized with the OLT system clock. The 82.5 GSa/s sampler samples the burst data with the eight phase-shifted clocks (#0-#7), producing eight data streams D#0 to D#7, each shifted by 1/8 UI. A 1:4 demultiplexer (DEMUX) then converts the 10.3 Gb/s serial data to 2.6 Gb/s × 8-bit parallel data, which is sent to data selector logic implemented in a field-programmable gate array (FPGA), containing a data edge detector, a data phase decider and a data selector circuit. Selecting the optimum recovered data with this high sampling resolution makes the CDR 10 G-EPON compliant and yields a high PWD tolerance of +0.53 UI. The over-sampling CDR IC measured 3.5 × 3.3 mm² and consumed less than 5.8 W. Another over-sampling BM-CDR was reported in [48]. Its over-sampling principle is similar to that of [67]; the difference lies mainly in the design implementation. This BM-CDR integrates, in a single IC, an analogue front-end featuring 4× over-sampling and a digital back-end implementing a phase picking algorithm. The analogue front-end uses two identical tapped delay lines (a signal delay line and a replica delay line).
To ensure that the delay of the signal delay line equals one bit period, a delay-locked loop is closed around the replica delay line, whose control voltage is shared by all of the delay elements [64]. The front-end samples the input waveform at four equidistant phases within a bit period, realizing an equivalent sample rate of 40 GSa/s. The samples are then de-serialized into 4 groups of 16 samples. This reduces the speed requirement of the digital back-end [55] and provides averaging for the phase picking algorithm [69].
When the Reset/Activity signal goes high, the de-serialized samples are latched, initiating the phase picking circuitry. The phase picking algorithm first determines the time location of the data edges with XOR gates and then counts the number of data transitions that occur per phase. Six analogue comparators then provide valid current-mode logic (CML) levels to the subsequent decision logic, and finally combinational logic determines which sampling phase contains the most data transitions and makes a phase selection based on this. The complete algorithm has a simulated processing delay of only 280 ps. The analogue front-end and the on-chip phase detection algorithm occupy an IC area of 2.75 × 2.25 mm². The advantage of this implementation is that the BM-CDR outputs are synchronous to the system clock and the synchronization time is 1.6 ns. Including built-in data phase selection, the complete BM-CDR consumes 1.5 W from a 2.5 V supply, thanks to the low-speed 622 MHz clock used to implement the digital back-end phase picking algorithm.
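The XOR, count and select steps of the phase picking algorithm can be sketched for one 16-bit group of 4× over-sampled decisions. The comparator stage and CML levels are not modeled, the helper names `pick_phase` and `recover` are hypothetical, and choosing the phase half a UI (two phases) away from the phase with the most transitions is an assumed selection rule, not necessarily the one used in [48].

```python
from typing import List


def pick_phase(samples: List[int], oversampling: int = 4) -> int:
    """Return the sampling phase to use for a block of over-sampled data.

    `samples` holds hard decisions in time order, `oversampling` per bit.
    """
    counts = [0] * oversampling
    for i in range(1, len(samples)):
        if samples[i] ^ samples[i - 1]:    # XOR flags a data edge
            counts[i % oversampling] += 1  # edge falls just before phase i mod N
    edge_phase = max(range(oversampling), key=counts.__getitem__)
    # Assumed rule: sample half a UI away from the most frequent edge position.
    return (edge_phase + oversampling // 2) % oversampling


def recover(samples: List[int], phase: int, oversampling: int = 4) -> List[int]:
    """Keep one sample per bit, taken at the selected phase."""
    return samples[phase::oversampling]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]  # one 16-bit group
    offset = 1  # data edges arrive just before phase 1 (arbitrary choice)
    samples = [0] * offset + [b for b in bits for _ in range(4)]
    phase = pick_phase(samples)
    print(phase, recover(samples, phase) == bits)
```

Averaging the edge counts over a whole group, rather than deciding from a single transition, is what makes the selection robust against isolated jittered edges.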
Table IV [55] summarizes the burst-mode Sync time and PWD tolerance of the three approaches. It shows that the over-sampling approach is best suited for the PON BM-CDR.
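To put the lock times quoted in this section on a common scale, a trivial conversion from nanoseconds to bit periods helps. The sketch assumes a 10.3125 Gb/s line rate for all three CDR types, although the cited designs were not all characterized at exactly this rate; `ns_to_bits` is a hypothetical helper.

```python
BIT_RATE_GBPS = 10.3125  # assumed 10G-EPON line rate


def ns_to_bits(t_ns: float, bit_rate_gbps: float = BIT_RATE_GBPS) -> float:
    """Convert a lock time in ns to bit periods (ns x Gb/s = bits)."""
    return t_ns * bit_rate_gbps


if __name__ == "__main__":
    print(f"PLL acquisition limit, 130 ns: {ns_to_bits(130):.0f} bits")
    print(f"over-sampling CDR, 1.6 ns: {ns_to_bits(1.6):.1f} bits")
    print(f"G-VCO CDR, 14 bits: {14 / BIT_RATE_GBPS:.2f} ns")
```

At this rate the 130 ns PLL limit of [32] corresponds to roughly 1340 bit periods, whereas the G-VCO (14 bits) and over-sampling (1.6 ns, about 16 bits) CDRs lock within a few tens of bits.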
VI. 10 G/1 Gb/s DUAL-RATE BM-RXs
The 10 GE-PON system must support symmetric 10 Gb/s PONs and be backward compatible with deployed 1 G-EPONs. The US wavelength bands of the IEEE GE-PON (1260 nm to 1360 nm) and the 10 G-EPON (1260-1280 nm) overlap. The OLT data paths must therefore operate in a dual-rate mode with a single optical input, to support asymmetric and symmetric 10 GE-PONs and the coexistence of 10 G and 1 G-EPON ONUs on the same outside plant [23].
Because the US bands overlap, the signal at the OLT input is preferably split in the electrical domain after a shared dual-rate photodiode. A dual-rate BM-RX is therefore a mandatory requirement for 10 GE-PON development, which increases the complexity of the BM-RX design. Note that for ITU-T PONs the wavelength plan is well defined: 1260-1280 nm for the 2.5 Gb/s XG-PON US and 1290-1330 nm for the 1.25 Gb/s US, with a wavelength division multiplexing (WDM) filter at the OLT to separate the wavelengths. Therefore, no dual-rate operation is needed for XG-PONs.
Dual-rate 3R BM-RXs have been reported in [31] and [70]-[74]. References [70] and [71] propose a parallel-type dual-rate BM-RX with a BM bit-rate discrimination circuit (BDC), as shown in Fig. 19. The BM-BDC automatically discriminates the bit rate of the received signal within 50 ns. The mixed 1 G/10 G signals are separated by two gating circuits, controlled by two gating signals generated by a reset-set flip-flop. The advantage of this dual-rate BM-RX is that it needs no external control signal (such as Reset or rate select) from the MAC LSI, so it can operate independently in the PMD layer.
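The gating principle can be illustrated with a behavioral sketch. The text above does not describe how the BDC discriminates the rate, so the sketch assumes a hypothetical criterion: sampling at one sample per 10 G bit, a run shorter than about four samples can only come from 10 G data, because a 1.25 Gb/s bit spans about eight samples. A set-reset latch then opens either the 10 G or the 1 G gate. The names `shortest_run`, `SRLatch` and `route_burst` are illustrative only.

```python
def shortest_run(samples):
    """Length, in samples, of the shortest completed run of equal symbols."""
    shortest, run = len(samples), 1
    for prev, cur in zip(samples, samples[1:]):
        if cur == prev:
            run += 1
        else:
            shortest = min(shortest, run)
            run = 1
    return shortest


class SRLatch:
    """Set-reset flip-flop: Q high opens the 10 G gate, Q low the 1 G gate."""

    def __init__(self):
        self.q = False

    def set(self):
        self.q = True

    def reset(self):
        self.q = False


def route_burst(samples, latch, max_10g_run=4):
    """Hypothetical rate discrimination at one sample per 10 G bit."""
    latch.reset()                         # e.g. at the EOB of the previous burst
    if shortest_run(samples) < max_10g_run:
        latch.set()                       # a run this short implies 10 G data
    return "10G" if latch.q else "1G"


if __name__ == "__main__":
    pattern = [1, 0, 1, 1, 0, 0, 0, 1]
    latch = SRLatch()
    print(route_burst(pattern * 4, latch))                             # 10 G burst
    print(route_burst([b for b in pattern for _ in range(8)], latch))  # 8 samples/bit
```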
Reference [31] presents a 10 G/1 G dual-rate 3R BM-RX. Its dual-rate preamplifier (BM-TIA) and BM-LA can switch the transimpedance gain, the transient response time of the AGC/ATC, and the equivalent noise BW for 10.3 Gb/s and 1.25 Gb/s bursts, respectively, under control of an external rate select signal provided by the PON MAC LSI, as shown in Fig. 11. This continuous AGC/ATC type dual-rate 2R BM-RX thus switches the circuit parameters of a single preamplifier core. The power consumption of this BM-TIA was 480 mW [72], which is quite high for a TO-CAN assembly. An over-sampling BM-CDR can support dual-rate operation more easily when the high rate is an integer multiple of the low rate [64]. However, in the 10.3/1.25 Gb/s coexistence system, the two data rates are not integer multiples of each other. References [68] and [73] propose a single-platform dual-rate BM-CDR using an 82.5 GSa/s sampling IC and bit-rate adaptive data-decision logic circuits, which achieves rapid BM lock times of 37 ns at 10.3 Gb/s (BER = 10⁻³) and 64 ns at 1.25 Gb/s (BER = 10⁻¹²).
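The rate relationship is easy to check with exact rational arithmetic. The sketch below shows that 10.3125/1.25 is not an integer, whereas an 82.5 GSa/s sampler (8 phases × 10.3125 GHz) yields an integer number of samples per bit at both rates. That this may ease the bit-rate adaptive decision logic is an interpretation, not a claim made in [68] or [73]; `ratio` is a hypothetical helper.

```python
from fractions import Fraction


def ratio(a: str, b: str) -> Fraction:
    """Exact ratio of two rates given as decimal strings (same units)."""
    return Fraction(a) / Fraction(b)


if __name__ == "__main__":
    print("10.3125 / 1.25 Gb/s ->", ratio("10.3125", "1.25"))  # not an integer
    print("82.5 GSa/s per 10.3125 Gb/s bit ->", ratio("82.5", "10.3125"))
    print("82.5 GSa/s per 1.25 Gb/s bit ->", ratio("82.5", "1.25"))
```

Using `Fraction` on decimal strings avoids the rounding that floating-point division would introduce when testing for an integer ratio.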
VII. LATEST BM-RX PROTOTYPES AND PERFORMANCES
Recent progress on fast synchronization BM-RX prototypes and their sub-system performance has been reported in [49] , [54] and [74] - [76] .
Reference [74] demonstrates a 10.3 Gb/s 3R BM-RX with small die sizes and low power consumption. It employs a 2-step AGC type BM-TIA with Reset (1.05 × 0.9 mm² die size, 180 mW power dissipation), a two-stage active-feedback BM-LA without Reset (1.7 × 1.7 mm² die size, 210 mW power dissipation), and a BM-CDR with G-VCOs and jitter-reduction and PWD-compensation circuits. With a 200 ns preamble, the measured 3R 10 G BM-RX sensitivity was −29.5 dBm at BER = 10⁻³ and the loud/soft ratio was 23.5 dB. Reference [75] presents a small, low-cost dual-rate triplexer module for 10 G-EPON OLT XFP transceivers. An RX sensitivity of less than −28 dBm and an overload of more than −6 dBm were achieved over the whole temperature range.
The performance of a dual-rate 10.3/1.25 Gb/s 3R BM-RX with continuous AGC/ATC and an over-sampling CDR, shown in Fig. 20, was demonstrated in [31]. With an 800 ns preamble at 10.3 Gb/s, the measured RX sensitivity was −31.2 dBm and the dynamic range was 25.2 dB (BER = 10⁻³). With a 400 ns preamble at 1.25 Gb/s, the measured RX sensitivity was −35.6 dBm and the overload exceeded −9.3 dBm (BER = 10⁻¹²). This BM-RX therefore performs well. Its disadvantage is that it needs a 10 G/1 G rate select signal from the MAC LSI, which must be fed as 4 control signals: to the AGC and ATC blocks of the preamplifier, to the LA to switch its BW, and to the BM-CDR for bit-rate selection. Its power consumption is also quite high: 480 mW for the TIA, 280 mW for the LA and 3 W [72] for the CDR, excluding the data selector logic circuits implemented in an FPGA.
Recent progress on BM transceiver development for 10 G-EPON was reported in [49]. Using a fast/slow mode self-switching AGC, the 2R RX settling time is reduced from 800 ns to 240 ns for 10.3 Gb/s and from 400 ns to 240 ns for 1.25 Gb/s, at the cost of an additional Reset signal for the fast/slow AGC and a rate select signal for the transimpedance gain switch. Together with the ONU SFP+ transceiver, the dual-rate system achieves a loss budget of 35.9 dB and a fast Sync time of 240 ns, which can support a 256-way split with 15 km of transmission.
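A rough check shows how the 35.9 dB loss budget maps onto a 256-way split and 15 km of fiber. The sketch assumes an ideal 1:256 splitter and an O-band fiber attenuation of 0.4 dB/km, both assumptions rather than values from [49]; real splitters add excess loss, which must come out of the remaining margin. `ideal_split_loss_db` is a hypothetical helper.

```python
import math

LOSS_BUDGET_DB = 35.9   # reported in [49]
SPLIT_RATIO = 256
REACH_KM = 15
FIBER_DB_PER_KM = 0.4   # assumed O-band attenuation (not from [49])


def ideal_split_loss_db(n: int) -> float:
    """Intrinsic loss of an ideal 1:n power splitter."""
    return 10 * math.log10(n)


if __name__ == "__main__":
    split = ideal_split_loss_db(SPLIT_RATIO)
    fiber = REACH_KM * FIBER_DB_PER_KM
    margin = LOSS_BUDGET_DB - split - fiber
    print(f"ideal 1:{SPLIT_RATIO} split: {split:.1f} dB")
    print(f"fiber, {REACH_KM} km: {fiber:.1f} dB")
    print(f"left for excess splitter loss, connectors, penalties: {margin:.1f} dB")
```

Under these assumptions about 24.1 dB goes to the ideal split and 6 dB to the fiber, leaving roughly 5.8 dB for splitter excess loss, connectors and penalties.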
Reference [54] demonstrates a 10 Gb/s 2R BM-RX, shown in Fig. 21(a), which was originally designed for an ITU-T symmetric 10 Gb/s XG-PON2 system. It employs a BM-TIA with a 3-step AGC (1.28 × 1 mm² die size, 200 mW power dissipation) and a BM-LA with a switchable loop BW feedback-type AOC (1.21 × 1.26 mm² die size, 430 mW power dissipation). Its key feature is that it needs no external timing-critical control signals from the MAC LSI: instead, it automatically generates the Reset pulse and performs Activity detection inside the BM-LA chip. Together, the Reset and Activity signals form an envelope of the bursts, as shown in Fig. 14. Because this envelope is detected automatically, its timing is not critical for individual bursts with different payload lengths and different guard/idle times between adjacent bursts. These signals switch the AOC loop BW (or time constant) and are fed to the BM-TIA via common-mode signaling to reset the TIA.
The auto-Reset detects the EOB when the BM-LA receives a long sequence of logic 0s without any logic 1. This duration is chosen sufficiently longer than the maximum CID to avoid an untimely Reset during the payload. However, auto-Reset generation suffers from noise, especially at a high pre-FEC BER. The probability of a missing EOB Reset versus guard time was investigated in [76], and auto-Reset generation was carefully designed and successfully implemented [45]. Various payload lengths and guard times up to 1 ms were tested with on-chip auto-Reset and Activity detection, confirming that the BM-RX functioned correctly.
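The zero-run criterion behind the auto-Reset can be sketched as a simple counter. The 128-bit threshold and the test patterns are illustrative assumptions, and `auto_reset` is a hypothetical helper; the usage example also shows how a single noise-induced '1' in the guard time restarts the counter and causes a missed EOB Reset, the effect whose probability was studied in [76].

```python
from typing import List


def auto_reset(bits: List[int], zero_run_threshold: int) -> List[int]:
    """Return the bit indices at which an EOB Reset would be issued.

    A counter tracks consecutive zeros and restarts on any '1'; the Reset
    fires once the run reaches `zero_run_threshold`, which must exceed the
    longest CID expected in the payload.
    """
    resets = []
    run = 0
    for i, bit in enumerate(bits):
        if bit:
            run = 0               # a single '1' (data or noise) restarts the count
        else:
            run += 1
            if run == zero_run_threshold:
                resets.append(i)
    return resets


if __name__ == "__main__":
    burst = [1, 0] * 50 + [0] * 72 + [1, 0] * 50  # payload with a 73-zero CID
    guard = [0] * 200
    print(auto_reset(burst + guard, zero_run_threshold=128))
    noisy_guard = guard[:]
    noisy_guard[100] = 1                           # one noise-induced '1'
    print(auto_reset(burst + noisy_guard, zero_run_threshold=128))
```

The first call issues a single Reset in the guard time and none during the 73-zero CID; in the second, the noise bit splits the guard time into two runs shorter than the threshold, so no Reset is issued at all.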
Moreover, this single 10 Gb/s 2R BM-RX supports simultaneous multi-rate operation at 10/5/2.5 Gb/s with scrambled NRZ data (without line-coding overhead). Thanks to the fast/slow mode AOC with different time constants in the offset compensation loop, it simultaneously achieves a fast response of 75 ns at 10 Gb/s and a large CID tolerance of >28.8 ns without significant output jitter deterioration. A dual-rate CDR is straightforward for 10 G-GPONs, where the ratio of the data rates (10/2.5 Gb/s) is an exact integer: a single CDR with a rate select signal can pick the recovered data with the right phase for each rate.
Without any external control signals, a 2R RX settling time of 76.8 ns, an RX sensitivity of −31.3 dBm and a loud/soft ratio of 26.3 dB at BER = 10⁻³ were demonstrated for 10 Gb/s operation, as plotted in Fig. 21(b). Experimental results for multi-rate operation are plotted in Fig. 21(c): for simultaneous 10 G/5 G/2.5 G operation without a rate select control signal, the RX settling time was increased from 75 ns to 150 ns and the guard time to 100 ns. The RX shows superior CID tolerance, up to 512-bit CIDs at 10 Gb/s (128 bits at 2.5 Gb/s), and the BER curves are smooth, without kinks or BER floors. This design shows great potential for emerging asymmetric and symmetric 10 G-GPON systems.
It should be mentioned that the measured BER curves at 2.5 Gb/s and 5 Gb/s differ only negligibly. This is because the BW of the multi-rate BM-RX was fixed at the value originally designed for 10 G operation. This RX can be used for 5 Gb/s operation without changing its BW. At 2.5 Gb/s, however, the measured RX performance was poor because the TIA BW was too high (3 dB BW of 6.7 GHz, measured with an APD at APD gain M = 9). By switching the RX BW from 10 G to 2.5 G operation, the RX performance at 2.5 Gb/s could in principle be improved by up to 6 dB. There are two ways to adapt the RX BW quickly: an external rate select signal provided by the MAC LSI to the BM-TIA, as in [31], or auto-detection of the data rate of incoming signals.
VIII. CONCLUSION
This paper presented a tutorial introduction to the evolution of ITU-T and IEEE TDM-PONs and their key PMD component, the BM-RX. It described the high-speed PON system requirements: a high optical power budget for large split ratios and longer reach, short burst overhead for high network BW efficiency, and coexistence with already deployed PONs. It reviewed the design principles and architectures of various types of BM-TIAs, BM-LAs and BM-CDRs, as well as dual-rate BM-RXs, focusing on the realization of fast RX synchronization. Finally, it reviewed sub-system integration and system performance demonstrations. The main conclusion is that the technologies for 10 Gb/s PON BM-RXs are ready for production, although some engineering work for high-yield manufacturing and overall performance optimization may still be needed. The main figure of merit of a 10 Gb/s BM-RX is the combination of high RX sensitivity, wide dynamic range and fast RX synchronization, but cost effectiveness, low power consumption, and good interoperability for easy and robust deployment are also important. It was shown that time-critical control signals crossing the PON PHY and MAC layers can be avoided, which eases future deployment.
