Abstract-This paper presents a 1.25-Gb/s burst-mode receiver (BMRx) for upstream transmission over gigabit passive optical networks (G-PONs). The dc-coupled receiver uses a unique arrangement of three limiting amplifiers to convert the bursty input signal to a current-mode logic output signal while rejecting the dc offset from a preceding transimpedance amplifier. Peak detectors extract a decision threshold from a sequence of 12 successive nonreturn-to-zero (NRZ) 1's and 12 successive NRZ 0's received at the beginning of each packet. Automatic compensation of the remaining offsets of the BMRx is performed digitally via digital-to-analog converters. The chip was designed in a 0.35-µm SiGe BiCMOS process. The receiver contains an APD with a gain of 6 and a transimpedance amplifier, and shows a sensitivity of −32.8 dBm and a dynamic range of 23.8 dB. A sensitivity penalty of 2.2 dB is incurred when a packet with an average optical power of −9 dBm precedes the packet under consideration, the guard time between the packets being 25.6 ns. The BMRx includes activity detection circuitry capable of quickly detecting average optical levels as low as −35.5 dBm. The measurements show that the receiver meets the G-PON physical media dependent layer specification defined in ITU-T Recommendation G.984.2.
I. INTRODUCTION
PASSIVE optical networks (PONs) are identified as an economical and future-safe solution to alleviate the bandwidth bottleneck in the access network [1]. Fig. 1 shows the typical point-to-multipoint (P2MP) structure of a PON. The optical line termination (OLT) distributes data in the downstream direction toward the optical network units (ONUs), located at the subscriber's premises. Each subscriber can send upstream data toward the OLT, using a time-division multiple-access (TDMA) scheme. Recently, the ITU-T has ratified Recommendation G.984.2, which specifies the physical media-dependent (PMD) layer of gigabit-capable passive optical networks (G-PONs) [2]. This paper reports on a burst-mode receiver (BMRx) that is capable of handling the challenging demands of 1.25-Gb/s upstream transmission over an ITU-T G.984.2 compliant PON. Fig. 2 shows a typical BMRx input signal, which consists of a quick succession of packets. As the optical path loss and launched optical power vary from one ONU to another, the peak amplitude of the input signal may differ by more than 20 dB from packet to packet. Thus, the decision threshold needed to distinguish between the digital 1's and 0's must be adapted from packet to packet within nanoseconds [3]. A BMRx sets its threshold during the so-called preamble at the beginning of each packet. The presented receiver uses a sequence of 12 successive NRZ 1's and 12 NRZ 0's at 1.25 Gb/s to minimize the sensitivity penalty caused by the fast threshold extraction [4]. This is compliant with the ITU-T G.984.2 standard, where 44 b at 1.25 Gb/s are specified for both fast level and clock recovery [2]. The BMRx can handle a guard time as short as 25.6 ns between packets, during which all ONU transmitters are turned off. Further complicating the design is the fact that the input signal may contain sequences of more than 72 consecutive identical digits, corresponding to 57.6 ns at 1.25 Gb/s.
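The timing figures quoted above follow directly from the bit period at 1.25 Gb/s. The short sketch below reproduces them; the helper names are illustrative and not part of the original work.

```python
# Timing arithmetic for the 1.25-Gb/s upstream link (numbers taken from the text).
BIT_RATE = 1.25e9                  # bits per second
BIT_PERIOD_NS = 1e9 / BIT_RATE     # 0.8 ns per bit


def bits_to_ns(n_bits):
    """Duration of n_bits at the line rate, in nanoseconds."""
    return n_bits * BIT_PERIOD_NS


def ns_to_bits(t_ns):
    """Number of bit periods that fit in t_ns."""
    return t_ns / BIT_PERIOD_NS


cid_ns = bits_to_ns(72)            # longest run of consecutive identical digits
guard_bits = ns_to_bits(25.6)      # minimum guard time expressed in bit periods
preamble_ns = bits_to_ns(24)       # 12 ones + 12 zeros used for threshold extraction
overhead_ns = bits_to_ns(44)       # G.984.2 budget for level and clock recovery

print(f"72 identical digits last {cid_ns:.1f} ns; guard time spans {guard_bits:.0f} bits")
print(f"threshold preamble uses {preamble_ns:.1f} ns of the {overhead_ns:.1f} ns budget")
```

The guard time (32 bit periods) is shorter than the longest run of identical digits (72 bit periods), which is the comparison the next paragraph builds on.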
0018-9200/$20.00 © 2005 IEEE

The combination of a guard time that is shorter than the longest sequence of consecutive identical digits, together with the varying amplitude from packet to packet, implies that dc coupling is needed [3], [5]. Hence, dc offsets must be compensated to ensure high sensitivity and low pulse width distortion (PWD). For a cost-effective receiver, this needs to be done without any externally adjustable components. The presented receiver accomplishes inherent rejection of the dc offset from the preceding transimpedance amplifier (TIA) and uses a digitally controlled feedback loop to compensate its internal dc offsets. In between packets, the threshold needs to be erased. As small amounts of optical power may be present during the guard time, it is difficult to distinguish between transmitted 0's and the level during the guard time [2], [6]. Therefore, the BMRx cannot know precisely when a packet ends, and an external reset signal is necessary to signal the end of a packet to the BMRx. To ease the start-up of the PON system and the connection of a new ONU to a "warm" PON (i.e., a PON that is running and has at least one connected ONU which can transmit data upstream), an activity detection circuit is included to signal whether any power is being transmitted upstream. Finally, a power-level measurement circuit is integrated, which allows the OLT to tell a specific ONU to either increase or decrease its launched power, an optional feature (power level mechanism) with important advantages supported by ITU-T G.984.2. Fig. 3 shows a simplified block diagram of the BMRx. Fig. 4 illustrates how the threshold extraction circuitry operates. Using four peak detectors (PKDs), the top and bottom levels of each phase of the differential input signal are extracted. Using resistive bridges, these levels are combined into the thresholds for each phase.
The difference between each level and the threshold is then amplified by the first-stage limiting amplifiers (LAs). The differential outputs $V_{TIA}^{+}$ and $V_{TIA}^{-}$ of the TIA can be written as

$$V_{TIA}^{\pm} = V_{CM,TIA} \pm \frac{V_{sig} + V_{off,TIA}}{2} \tag{1}$$

where $V_{CM,TIA}$ is its common-mode (CM) output voltage, $V_{sig}$ is the useful differential output signal, and $V_{off,TIA}$ is the differential output offset of the TIA. Assuming sufficient CM rejection and completed threshold extraction, one can then write for the differential outputs $V_{LA}^{+}$ and $V_{LA}^{-}$ of the first-stage LA (positive phase)

$$V_{LA}^{\pm} = V_{CM,LA} \pm \frac{1}{2}\left[\frac{A}{2}\left(V_{sig} - \frac{V_{1} + V_{0}}{2}\right) + V_{off,LA}\right] \tag{2}$$

where $V_{CM,LA}$ is the CM output voltage of the first-stage LAs, $A$ is the differential gain, and $V_{1}$ (respectively, $V_{0}$) the differential output signal of the TIA when a 1 (0) is received. $V_{off,LA}$ accumulates the offsets of the PKDs and the LAs. A similar expression can be written for the negative phase. Using CM feedback (CMFB, not shown in Fig. 3), the CM output voltages of both first-stage LAs are forced to the same level, derived from the bandgap reference. Note that the dc offset of the TIA is rejected: $V_{off,TIA}$ does not appear in (2).

The differential outputs of each first-stage LA at the dark level (i.e., $V_{sig} = 0$ V) are the offsets from the LAs and the PKDs. Hence, compensation currents can be injected into the resistive bridges such that the differential outputs of the first-stage LAs are zero at dark level. The currents are set via a feedback loop using a digital algorithm. As all offsets are temperature-dependent, the BMRx regularly needs a dark period, i.e., a time slot in which no ONU is allowed to launch any power into the fiber and which is indicated to the BMRx via the "Dark period indication" signal (see Fig. 3). This is done once every second, and, as a compensation cycle takes about 10 µs, the impact on the transmission efficiency is negligible [7].
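The rejection of the TIA offset by the threshold extraction can be checked numerically. The sketch below assumes ideal peak detectors and an ideal resistive bridge that averages the top and bottom levels of the positive phase; the function name and all voltage values are hypothetical.

```python
def first_stage_inputs(v_cm, v1, v0, v_off_tia):
    """Return the LA differential input (signal minus threshold) for a 1 and a 0.

    v_cm      : TIA common-mode output voltage (V)
    v1, v0    : differential TIA output signal for a received 1 and 0 (V)
    v_off_tia : differential TIA output offset (V)
    Only the positive phase is modelled; peak detectors and bridge are ideal.
    """
    def tia_pos(v_sig):
        # Positive TIA output: common mode plus half the differential signal and offset
        return v_cm + 0.5 * (v_sig + v_off_tia)

    top = tia_pos(v1)                  # held by the positive peak detector
    bottom = tia_pos(v0)               # held by the negative peak detector
    threshold = 0.5 * (top + bottom)   # resistive bridge averages both levels
    return tia_pos(v1) - threshold, tia_pos(v0) - threshold


# The result does not change with the TIA offset (illustrative values).
for offset in (0.0, 0.05, -0.02):
    print(offset, first_stage_inputs(1.2, 0.10, 0.01, offset))
```

For every offset value the LA input is plus or minus one quarter of the difference between the 1 and 0 levels, so only the offsets of the PKDs and LAs themselves remain to be compensated.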
II. BURST-MODE RECEIVER ARCHITECTURE
The use of PKDs to extract the threshold instead of track-and-hold (T/H) circuitry as implemented in [8] relaxes the timing requirements put on the reset signal ("Reset 1 peak detectors" in Fig. 3) of the threshold extraction circuitry. When using PKDs, one needs to erase the threshold by shorting the hold capacitor using a switch once the packet is finished. When the switch is opened again, the PKD will track the maximum of the incoming signal. Thus, the threshold of any incoming packet is immediately established, irrespective of the moment when the packet arrives. However, when using a T/H circuit, the moment of transition from track to hold mode needs to be precisely aligned with the preamble. Otherwise, the threshold is either incompletely acquired or the preamble may have ended. The fact that the BMRx does not need this alignment is especially useful when an ONU is connected to the OLT for the first time and tries to transmit a signal upstream. Indeed, at this time, one does not know the distance between this ONU and the OLT; hence it is impossible to accurately align any control signal with an incoming packet.

Fig. 4. Operation of the threshold extraction circuitry. Pos. pk. detector: positive peak detector; neg. pk. detector: negative peak detector.
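The independence from packet arrival time can be illustrated with an idealized behavioral model of a positive peak detector with a reset switch. This is a sketch only: the sample values, the zero dark level and the function names are assumptions, and droop, finite acquisition time and noise are ignored.

```python
def positive_peak(samples, reset_flags, dark=0.0):
    """Ideal positive peak detector; returns the held value after each sample."""
    held = dark
    trace = []
    for value, reset in zip(samples, reset_flags):
        if reset:
            held = dark               # hold capacitor shorted by the reset switch
        else:
            held = max(held, value)   # track the maximum of the incoming signal
        trace.append(held)
    return trace


def final_peak(delay):
    """Peak held at the end of a packet that arrives `delay` samples after release."""
    packet = [1.0, 1.0, 0.0, 1.0, 0.0, 0.0]   # hypothetical normalized levels
    samples = [0.0] * delay + packet
    return positive_peak(samples, [False] * len(samples))[-1]


# The acquired level is the same whatever the arrival time of the packet.
print([final_peak(delay) for delay in (0, 3, 10)])
```

A track-and-hold model would additionally need the track-to-hold instant as an input, which is exactly the alignment requirement the PKD approach avoids.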
This receiver has several advantages over the receiver presented in [3]. In [3], the peak value of the positive output of a differential TIA is fed back to its negative input, thus amplifying the input signal with respect to this extracted peak value. The receiver presented here extracts both the 1 and 0 level, as was also done in [5]. This reduces the sensitivity penalty due to finite extinction ratio of the optical signal [4], [8]. Because in [3] only a positive PKD is used to extract the threshold, the extracted level will rise during reception of a packet, due to accumulation of positive noise peaks [9]. When using positive and negative PKDs, on average the positive and negative noise peaks will cancel, avoiding this sensitivity penalty.
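The effect of noise-peak accumulation can be illustrated with a small Monte Carlo sketch. It assumes ideal peak detectors without droop, Gaussian noise, an alternating bit pattern, and a single-detector threshold equal to half the positive peak with the 0 level at zero. These are simplifications chosen for illustration and do not model the circuit of [3] in detail.

```python
import random


def threshold_errors(n_bits=1000, sigma=0.05, seed=0):
    """Return (single_detector_error, dual_detector_error) for one packet."""
    rng = random.Random(seed)
    one, zero = 1.0, 0.0
    samples = [(one if i % 2 else zero) + rng.gauss(0.0, sigma)
               for i in range(n_bits)]
    top, bottom = max(samples), min(samples)   # levels held by ideal PKDs
    ideal = 0.5 * (one + zero)
    single = 0.5 * top                         # threshold from the positive peak only
    dual = 0.5 * (top + bottom)                # midpoint of both extracted levels
    return single - ideal, dual - ideal


def mean_errors(trials=200):
    """Average both threshold errors over several independent packets."""
    results = [threshold_errors(seed=s) for s in range(trials)]
    mean_single = sum(r[0] for r in results) / trials
    mean_dual = sum(r[1] for r in results) / trials
    return mean_single, mean_dual


mean_single, mean_dual = mean_errors()
print(f"single PKD: {mean_single:+.4f}, positive and negative PKDs: {mean_dual:+.4f}")
```

In this model the single-detector threshold is biased upward by the largest positive noise peak, whereas the upward drift of the top level and the downward drift of the bottom level largely cancel in the midpoint.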
Contrary to the feedforward approach in [5], the proposed architecture requires only a simple reset signal to erase the threshold of a preceding packet and prepare the BMRx for a new packet. In [5], a series of differential amplifiers is used to remove the offset from the unipolar optical signal. Each amplifier amplifies the difference between the output signal from the preceding amplifier and a threshold extracted from both the top and bottom level of this same signal. In this way, the signal-to-offset ratio is improved along the chain of amplifiers. No further offset compensation is required. However, each stage consisting of an amplifier and a threshold extraction circuit has to wait until the output of the previous stage has settled before acquiring the threshold itself. Hence, a delay is needed to provide successive resets for each stage. When moving to gigabit-per-second (Gb/s) speeds, delay uncertainties will make the design of such a reset circuit increasingly difficult. Furthermore, due to gain-bandwidth-accuracy tradeoffs, a higher number of stages will be needed with increasing bit rates, increasing power consumption and chip complexity. The presented architecture breaks these tradeoffs by using separate circuitry to handle the dc-offset compensation and the high-speed signal, while requiring only a single extra signal to indicate the presence of a dark period.
The approach in [7], developed for 155-Mb/s ATM-PON applications, consists of first measuring the threshold during special patterns embedded in physical layer operation, administration, and maintenance (PLOAM) cells. The measured threshold is then digitized and stored in a table. As the OLT knows which ONU is sending the following cell upstream, it is possible to fetch each threshold from the table and set it prior to data reception. This approach allows an accurate measurement of the threshold. However, when the PON starts up or when a newly connected ONU tries to initiate communication with the OLT, no threshold is available yet in the table. Hence, a complicated procedure is necessary to connect this ONU to the PON. The receiver interface becomes more complex as the OLT needs to tell the BMRx which ONU is sending next (using the so-called ONU-ID). This reduces transmission efficiency, as no traffic can be transmitted upstream as long as an ONU is being initialized [10]. Here, the threshold is extracted from the beginning of each packet, which permits a simple system protocol and quick initialization of upstream communication with an ONU, an important requirement for PONs [10]. Table I summarizes the main design objectives for the BMRx. The BMRx was designed for a junction temperature range from −40 °C to 110 °C and a power supply variation of 5%.
III. CIRCUIT DESIGN

A. Limiting Amplifiers
The LAs employ the Cherry-Hooper architecture consisting of a transadmittance stage (TAS, , and ) followed by a transimpedance stage (TIS, , and ); see Fig. 5 , [11] . The small-signal differential voltage gain (differential output divided by differential input) can be written as (3) where are defined in Fig. 5 , (respectively ) are the transconductances of ( ), and is the base emitter dynamic resistance of . The large-signal behavior of the LA is determined by two factors. First of all, the sequence in which the bias current of each of the two differential stages is steered entirely to a single side with increasing differential input voltage sets the final output limiting levels. This sequence is determined by the differential gain from the bases of and to their respective collectors.
can be approximated as (4) where (respectively, ) are the tail currents of the TAS (TIS). If this gain is smaller than unity, then the tail current of the TAS will be steered completely to a single side and the collector currents of and will no longer change with increasing differential input voltage. In this case, the tail current of the TIS is not steered entirely to a single side. Second, noting that the current through each load resistor consists of the sum of a noninverting and an inverting current, it is clear that the ratio will also determine the large-signal behavior [12] . To avoid that the polarity of the output signal reverses with an increasing differential input voltage, the ratio must be bigger than 2. The output CM level can be written as (5) where is the voltage drop from the output emitter followers. The positive and negative limiting levels and are obtained from (6) assuming that the currents in the TIS are (almost) completely switched. Using a CMFB loop, is adjusted such that the output CM level is equal to . is then equal to
There are sufficient degrees of freedom to independently set the small-signal gain on one hand and the output CM and limiting levels on the other hand, while at the same time ensuring that no transistors enter the saturation region. This is an advantageous property of this amplifier topology as this receiver is entirely dc-coupled. Each input of the BMRx is driven by the 50-Ω single-ended output impedance of the TIA, and hence only the voltage noise from gives a significant contribution to the input-referred noise of the BMRx. This noise term decreases with increasing collector current [13, p. 769]. As the preamble consists of 12 1's and 12 0's, the noise bandwidth at the base of can be reduced such that the noise contribution from can be neglected, despite the higher output resistance of the resistive bridge.
The differential response of the LAs can now be optimized. was optimized with respect to the large-signal response using SPICE simulations.
was optimized for dc-coupling to the following stage. needs to be sufficiently high to keep the input-referred voltage noise from low and is set by adjusting . determines the small-signal gain. The values of the final parameters are given in Table II . With these values, and are never entirely switched off, minimizing PWD for large input swings. and were laid out using two emitter stripes and three base stripes to minimize the noise from extrinsic base and emitter resistors. The emitter area of transistors and has been optimized to ensure minimal PWD, especially for large-signal swings. If the emitter current density of is too small, then the transition frequency degrades, causing eye closure. If the emitter current density is too large, then the forward current gain degrades due to extrinsic base and emitter resistances, causing PWD. A separate power supply pair was used for the first-stage LAs, to avoid receiver sensitivity degradation due to power supply noise.
The input CM voltages of the first-stage LAs vary with the incoming data signal and the extracted threshold; see Fig. 4. As shown in Fig. 6, this CM signal is injected into the CMFB loop and may cause signal degradation. The worst case happens when a weak packet follows a strong packet. This is unlike postamplifiers for traditional optical receivers, where ac coupling, automatic gain control, or a slow offset-compensation loop ensures that the CM input voltage of the postamplifier is essentially constant [12], [14], [15]. The dominant poles of are situated at the collectors of and and are therefore fixed once the differential response of the circuit has been optimized. As these poles are located at high frequencies, stability of the CMFB loop is ensured by introducing a dominant pole within the feedback amplifier response at a sufficiently low frequency. To minimize PWD, the mismatch between the CM levels of the first-stage LAs was required to be less than 15 mV (for 99.7% of all the devices). To ensure good matching, large devices are needed for the CMFB amplifier, trading bandwidth for dc accuracy. Therefore, the bandwidth of the CMFB loop was limited to 30 MHz; the dc loop gain is 60 dB. The CMFB loop was designed for a phase margin of 70°, ensuring small overshoot after a disturbance of the CMFB loop. Fig. 7 shows CM transients occurring after reception of a strong packet (the differential input of the BMRx was 1.6 V; the waveforms are a sample from a Monte Carlo run to include the effects of mismatch between devices). It is clear that the CM output level settles in less than 25.6 ns, which is the minimum guard time.
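A rough plausibility check of the settling claim can be made by treating the CMFB loop as a single-pole system with a 30-MHz closed-loop bandwidth. This first-order approximation is an assumption introduced here; the actual loop, with a 70° phase margin, is of higher order, so the figure below is only indicative.

```python
import math


def time_constant_s(bandwidth_hz):
    """Time constant of a first-order system with the given -3 dB bandwidth."""
    return 1.0 / (2.0 * math.pi * bandwidth_hz)


def residual_after(t_s, bandwidth_hz):
    """Fraction of an initial CM disturbance remaining after t_s seconds."""
    return math.exp(-t_s / time_constant_s(bandwidth_hz))


CMFB_BANDWIDTH_HZ = 30e6    # loop bandwidth quoted in the text
GUARD_TIME_S = 25.6e-9      # minimum guard time

tau = time_constant_s(CMFB_BANDWIDTH_HZ)
left = residual_after(GUARD_TIME_S, CMFB_BANDWIDTH_HZ)
print(f"tau = {tau * 1e9:.2f} ns, residual after guard time = {left * 100:.2f} %")
```

Under this assumption the guard time spans almost five time constants, leaving less than 1% of the initial disturbance, which is consistent with the settling behavior reported for Fig. 7.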
The output LA uses the same topology as in Fig. 5, but without a CMFB loop. The differential gain of the output LA is 24 dB. Finally, the CML output stage consists of an n-p-n differential pair, loaded with on-die 50-Ω resistors. Fig. 8 shows the configuration of each of the four PKDs. From Fig. 4, one can see that the amplifier of the two positive-phase PKDs exhibits an input CM level that swings from toward . For the two negative-phase PKDs, the CM level swings from toward . Hence, to ensure a wide input CM swing, the positive-phase PKDs use an nMOS input pair, while the negative-phase PKDs use a pMOS input pair [16, pp. 126-132]. Fig. 9 shows the relationship between the reset signal ("Reset 1 peak detectors" in Fig. 3), the activity detection signal ("Activity detection CMOS output" in Fig. 3) and the peak levels acquired from the positive phase output of the TIA. During the guard time, the reset signal erases the acquired peak levels and resets the activity detection circuit. When the reset signal is brought low again, the PKDs are "released" and will track their peak levels. However, when the negative PKD is "released" at the falling edge of the reset signal, it may acquire the dark level, which is different from the 0 level due to the finite extinction ratio of the optical input signal. Therefore, the negative PKD is "released" at the moment when the activity detection circuit senses the 1 b of the preamble.
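The reason for gating the release of the negative PKD can be shown with an idealized model. The dark, 0 and 1 levels, the activity threshold and the function name below are hypothetical values chosen for illustration; the detector is assumed ideal and the activity detector instantaneous.

```python
def negative_peak(samples, release_index):
    """Ideal negative peak detector, held in reset until release_index."""
    held = None
    for index, value in enumerate(samples):
        if index < release_index:
            continue                  # detector still held in reset
        held = value if held is None else min(held, value)
    return held


DARK, ZERO, ONE = 0.0, 0.1, 1.0       # finite extinction ratio: ZERO > DARK
guard = [DARK] * 4
preamble = [ONE] * 12 + [ZERO] * 12
samples = guard + preamble

ACTIVITY_THRESHOLD = 0.05             # hypothetical activity detection level
first_activity = next(i for i, v in enumerate(samples) if v > ACTIVITY_THRESHOLD)

top = max(samples)
early = negative_peak(samples, release_index=0)               # released with the reset
gated = negative_peak(samples, release_index=first_activity)  # released on activity

thr_early = 0.5 * (top + early)
thr_gated = 0.5 * (top + gated)
print(f"threshold if released early: {thr_early:.2f}, if gated: {thr_gated:.2f}")
```

Released at the falling edge of the reset, the detector holds the dark level and the threshold sits below the true midpoint between the 1 and 0 levels; gated by the activity detector, it holds the actual 0 level.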
B. Peak Detectors
An operational transconductance amplifier (OTA) is used to drive the hold capacitor during peak acquisition, simplifying compensation of the feedback loop [17]. Source followers are used to drive the resistive bridges (see Fig. 3) of the threshold extraction circuit. pMOS source followers (for the positive PKDs) and nMOS source followers (for the negative PKDs) are used as level shifters to compensate the forward voltage drop of the diode during peak acquisition, thus keeping the output voltage of the OTAs within their admissible ranges. For the OTA, a current mirroring topology was chosen, as can be seen in Fig. 10. To increase the output impedance, cascode current mirrors were used. To increase both the input common-mode range and the output swing, wide-swing cascode mirrors were employed [18, pp. 256-266]. Although it is known that the folded-cascode OTA exhibits somewhat better settling time [19] and lower input-referred noise [18, p. 276], the current mirroring topology was chosen here due to its excellent large-signal response [20]. To ensure low-noise operation, the transconductance of the mirroring transistors was chosen sufficiently small, thus trading noise performance for signal swing.
C. Offset Compensation Circuitry
Bidirectional current sources were constructed using a fixed pMOS current source and an nMOS current-output digital-to-analog converter (IDAC) whose full-scale current is double that of the fixed current. Using Monte Carlo simulations, the worst-case accumulated dc offset from the LAs and the PKDs of a single phase was established to be mV for 99.7% of the devices. Using 7-b IDACs, the offset can be compensated with accuracy better than 0.5 mV. When the "Dark period indication" is brought high, the digital algorithm switches the current from the IDACs away from the resistive bridge. Then, in a successive approximation fashion, each bit of the IDAC is set high, and it is tested whether or not the resulting differential output of the first-stage LAs is positive or negative. Based upon this information, it is decided whether to keep the current bit high or bring it low again. Fig. 11 shows the architecture of the IDACs. A two-dimensional (2-D) row-column decoder scheme was used to reduce interconnection complexity [16, pp. 241-242] . Careful post-layout simulations were performed to estimate the capacitance at the output node of the IDAC, which influences the time constant of the threshold extraction.
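The successive-approximation procedure described above can be sketched as follows. The sketch assumes an ideal comparator, an IDAC whose mid-scale code cancels the fixed current exactly, and a hypothetical step size of 1 mV per code; none of these values are taken from the original design.

```python
N_BITS = 7
MID_CODE = 1 << (N_BITS - 1)   # code at which the IDAC cancels the fixed current


def residual_mv(code, offset_mv, lsb_mv):
    """Remaining differential offset (mV) for a given IDAC code (ideal model)."""
    return offset_mv - (code - MID_CODE) * lsb_mv


def sar_compensate(offset_mv, lsb_mv=1.0):
    """Find the IDAC code by successive approximation, MSB first."""
    code = 0
    for bit in reversed(range(N_BITS)):
        trial = code | (1 << bit)                 # set the current bit high
        if residual_mv(trial, offset_mv, lsb_mv) >= 0.0:
            code = trial                          # output still positive: keep the bit
        # otherwise the bit is brought low again
    return code


for offset in (17.3, -22.6):
    code = sar_compensate(offset)
    print(offset, code, round(residual_mv(code, offset, 1.0), 3))
```

Seven comparator decisions suffice per IDAC, and the remaining offset is smaller than one step provided the initial offset lies within the IDAC range.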
D. Activity Detection and Power Level Measurement Circuitry
The activity detection circuitry is detailed in Fig. 12 . An amplifier provides 16-dB gain and limits the bandwidth of the input signal to 160 MHz. A CMFB loop is closed around this amplifier and fixes the CM output voltage to . The negative output of the amplifier is compared against a reference using a comparator. The reference is established using two 8-b IDACs and a resistor network. One IDAC serves to compensate for the offsets present in the activity detection path and is controlled via the digital algorithm. The other IDAC introduces an offset above the dark level and sets the threshold above which activity is signaled. This offset is set via a digital serial interface, and is the subject of a tradeoff. If the offset is set too low, then false activity will be reported due to noise peaks; if the offset is too high, then the sensitivity of the activity detection circuit is reduced.
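The tradeoff in choosing the activity threshold can be quantified under the assumption of Gaussian noise at the comparator input. The model and the numerical values below are illustrative assumptions rather than measured properties of the circuit.

```python
import math


def q_function(x):
    """Tail probability of a standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def false_alarm_probability(offset, sigma):
    """Probability that noise alone exceeds the offset above the dark level."""
    return q_function(offset / sigma)


def detection_probability(offset, signal, sigma):
    """Probability that signal plus noise exceeds the offset."""
    return q_function((offset - signal) / sigma)


SIGMA = 1.0     # noise standard deviation (arbitrary units, hypothetical)
SIGNAL = 4.0    # signal level of a weak packet (same units, hypothetical)
for offset in (1.0, 2.0, 3.0):
    print(offset,
          f"{false_alarm_probability(offset, SIGMA):.4f}",
          f"{detection_probability(offset, SIGNAL, SIGMA):.4f}")
```

Raising the offset lowers the false-alarm probability but also lowers the probability of detecting a weak packet, which is the tradeoff handled through the digital serial interface.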
The power-level measurement (PLM) circuitry compares the decision threshold from the PKDs that are representative for the average optical power of a packet against two references. Each reference is set using an 8-b IDAC in a similar fashion as in Fig. 12 . Two additional 8-b IDACs are used to compensate for the offsets from the PLM circuitry. Note that the offsets of the PLM circuitry can only be adjusted once the offsets from the PKDs and LAs have been compensated.
IV. EXPERIMENTAL RESULTS
A die microphotograph is shown in Fig. 13; the chip was packaged in a 68-pin VFQFPN, and various test pins were added for testing of individual blocks such as all IDACs and for monitoring of the offset compensation circuitry. The BMRx has been tested using a dc-coupled TIA, which was integrated together with an APD into a single module. Table III gives the parameters of this module. The BMRx and TIA were not integrated on the same die, to reduce the risk of a possible sensitivity penalty due to substrate crosstalk caused by the digital control logic running on the BMRx. The digital serial interface allows one to read the digital codes of the IDACs after the offset compensation, which makes it possible to estimate the actual dc offsets of the different blocks. In this way, it was estimated that the accumulated dc offset of the positive-phase PKDs and first-stage LA is 3.5 mV; for the negative phase, the dc offset was estimated to be 7.0 mV. The differential input noise voltage of the BMRx was estimated to be 180 µV.

Fig. 14 demonstrates that the BMRx is capable of recovering the amplitude information of incoming packets even with large amplitude differences. Unless otherwise stated, the guard time between packets is 25.6 ns. Each packet consists of a preamble (12 1's and 12 0's) followed by a PRBS (pseudorandom bit sequence) payload with a length of 128 000 bits. Within the guard time, the reset signal is brought high for 12.8 ns. The shown timing of packets is identified as being the worst case for the BMRx, in terms of any "tails" caused by the strong packet. Fig. 15 shows bit-error rate (BER) measurements performed upon the PRBS part of the packets. In case a weak packet follows a strong packet (with the same timing as in Fig. 14), it is clear that a sensitivity penalty is incurred, mainly due to a tail from the APD and TIA. Errors were only measured within the weak packet. Fig. 16 shows this sensitivity penalty at a BER as a function of the average optical power of the preceding packet.

The PWD of the bit pattern immediately after the preamble must be limited to ensure correct timing recovery, which is done on a chip following the BMRx. Fig. 17 shows the measured PWD of the first four bits after the preamble and the PWD measured after the TIA. Note the severe PWD of the first bit after the preamble for an average optical input power below −30.0 dBm. The reason is that for these optical powers the differential output voltage of the TIA is smaller than 20 mV. Simulations show that for differential input voltages below 15 mV, the OTA of the negative PKD of the negative phase can no longer switch the diode fully on, resulting in a transient that extends up to the first bit after the preamble. Indeed, one can see that the PWD of the second bit after the preamble falls within acceptable margins. For high optical input powers, the PWD increases due to PWD of the TIA. This can be seen in Fig. 17 by noting that the largest contribution to the PWD at high optical input powers stems from the TIA. Temperature tests were conducted over a temperature range from 0 °C to 50 °C, a range limited by the APD contained in the module together with the TIA. The bias voltage of the APD was adjusted to maintain an avalanche gain of 6. A sensitivity penalty of 0.5 dB was measured at 50 °C. At 50 °C, the PWD is 20% at an optical input power of −9.6 dBm. Fig. 18 shows the waveforms when the activity detection circuitry is handling a packet with an average optical level of −35.5 dBm. Table IV summarizes the most important experimental results. Using Q-measurements performed upon the TIA output, it is estimated that the continuous-mode sensitivity of the TIA is −33.1 dBm. Hence, the BMRx introduces a sensitivity penalty of 0.3 dB.
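The relation between the reported sensitivities, the burst-mode penalty and the dynamic range can be verified with simple dB arithmetic. The sketch takes the sensitivities as negative dBm values (optical powers below 1 mW) and uses the 23.8-dB dynamic range quoted in the abstract; the variable and function names are illustrative.

```python
def dbm_to_microwatt(p_dbm):
    """Convert an optical power in dBm to microwatts."""
    return 1e3 * 10.0 ** (p_dbm / 10.0)


BMRX_SENSITIVITY_DBM = -32.8   # burst-mode sensitivity of the receiver
TIA_SENSITIVITY_DBM = -33.1    # continuous-mode sensitivity estimated at the TIA output
DYNAMIC_RANGE_DB = 23.8        # dynamic range quoted in the abstract

penalty_db = BMRX_SENSITIVITY_DBM - TIA_SENSITIVITY_DBM
strongest_dbm = BMRX_SENSITIVITY_DBM + DYNAMIC_RANGE_DB

print(f"penalty introduced by the BMRx: {penalty_db:.1f} dB")
print(f"strongest packet handled: {strongest_dbm:.1f} dBm")
print(f"sensitivity in linear units: {dbm_to_microwatt(BMRX_SENSITIVITY_DBM):.3f} uW")
```

The upper end of the dynamic range obtained this way coincides with the average power of the strong preceding packet used in the penalty measurement.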
V. CONCLUSION
A 1.25-Gb/s burst-mode receiver with high sensitivity and wide dynamic range has been presented. The measurements confirm the successful implementation of the proposed architecture.
The key features of this receiver are the acquisition of the decision threshold from the 1's and 0's level, the inherent rejection of the dc-offset of the TIA and a sensitive and fast activity detection circuit. When compared to burst-mode receivers that use a single positive peak detector to extract the decision threshold, the problems of threshold drift due to accumulation of noise peaks and sensitivity penalty due to finite extinction ratio are solved [4] , [8] , [9] . The inherent rejection of the TIA dc-offset and the inclusion of an automatic offset compensation loop break the tradeoff between high gain-bandwidth required for high bit-rates on one hand and dc-accuracy required for a sensitive BMRx on the other hand.
