I. INTRODUCTION
According to IEEE standard 802.3ab, 1000BASE-T Gigabit Ethernet delivers a data rate of 1 Gb/s over copper wiring at distances of up to 100 m [1]. The 1-Gb/s rate is achieved by full-duplex baseband transmission over four pairs of Category 5 cabling, i.e., 250 Mb/s over each wire pair. Five-level Pulse Amplitude Modulation (PAM5) with levels {-2, -1, 0, 1, 2} is used as the transmission scheme on each wire pair. Grouping the four symbols transmitted simultaneously on the four channels forms a four-dimensional (4-D) symbol that carries eight information bits. The symbol rate on each wire pair is thus 125 Mbaud, corresponding to a symbol period of 8 ns.
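The rate figures above follow from simple arithmetic. The short check below reproduces them; the constant names are ours and chosen only for illustration.

```python
# Back-of-the-envelope check of the 1000BASE-T rate figures.
SYMBOL_RATE_BAUD = 125e6      # PAM5 symbols per second on each wire pair
NUM_PAIRS = 4                 # four Category 5 pairs used in parallel
INFO_BITS_PER_4D_SYMBOL = 8   # information bits carried by one 4-D symbol

# One 4-D symbol (one PAM5 symbol per pair) is sent every symbol period.
aggregate_bps = SYMBOL_RATE_BAUD * INFO_BITS_PER_4D_SYMBOL
per_pair_bps = aggregate_bps / NUM_PAIRS
symbol_period_ns = 1e9 / SYMBOL_RATE_BAUD

print(f"aggregate: {aggregate_bps / 1e9:.0f} Gb/s")   # aggregate: 1 Gb/s
print(f"per pair:  {per_pair_bps / 1e6:.0f} Mb/s")     # per pair:  250 Mb/s
print(f"period:    {symbol_period_ns:.0f} ns")         # period:    8 ns
```

Each pair therefore carries 2 information bits per PAM5 symbol.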
To achieve the target bit error rate (BER) of less than $10^{-10}$, the receiver has to cope with channel impairments such as intersymbol interference (ISI), echo, near-end crosstalk (NEXT), and far-end crosstalk (FEXT). The overall block diagram of the 1000BASE-T Gigabit Ethernet transceiver is shown in Fig. 1. After processing by the analog front end, the channel outputs are digitized by the A/D converter. An adaptive Feedforward Equalizer (FFE) removes the precursor ISI, making the channel minimum-phase, and whitens the noise, while echo is cancelled by the adaptive Echo Cancellers. The DFE/TCM decoding block then equalizes the postcursor ISI and performs trellis decoding. Finally, the decoded symbols are mapped into information bits and sent to the MAC layer.
This paper focuses on cost-effectively eliminating echo and ISI, the major channel impairments, using a Variable Step-Sized Partial-Update Echo Canceller (VSSPU-EC) and a Joint Decision-Feedback Prefilter and 1-tap Look-ahead Parallel Decision Feedback Decoder (DFP+PDFD), respectively. The baseband system is integrated, co-simulated with the analog front end, and implemented in a 2.5-V 0.25-µm CMOS standard cell design flow.
II. VARIABLE STEP-SIZED PARTIAL UPDATE ECHO CANCELLER
Because the echo impulse response is long, a traditional echo canceller requires tens to hundreds of taps in a direct-form adaptive filter. Since echo is the dominant channel impairment in 1000BASE-T Gigabit Ethernet, reducing the computational complexity of the echo cancellers is crucial. To address this, [2] proposed the partial-update least-mean-square (LMS) algorithm, in which only a subset of the weights is updated at each iteration.
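As a rough illustration of the partial-update idea, the sketch below adapts only one subset of the weights per iteration of an LMS echo canceller. It is a minimal NumPy sketch under stated assumptions, not the method of [2] or [3]: the interleaved partition (tap i belongs to subset i mod N), the cyclic update schedule, the hypothetical function name `sequential_pu_lms`, and the toy 32-tap echo path are all our own choices for illustration.

```python
import numpy as np

def sequential_pu_lms(x, d, num_taps, mu, n_subsets):
    """Sequential partial-update LMS (illustrative sketch, assumed schedule).

    Tap i is assigned to subset i % n_subsets, and at time k only subset
    k % n_subsets is adapted, so each iteration touches about
    num_taps / n_subsets weights. With n_subsets == 1 this is ordinary LMS.
    """
    w = np.zeros(num_taps)
    x_buf = np.zeros(num_taps)              # x_buf[i] holds x(k - i)
    err = np.zeros(len(x))
    taps = np.arange(num_taps)
    for k in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[k]
        err[k] = d[k] - w @ x_buf           # residual after subtracting the echo replica
        active = taps % n_subsets == k % n_subsets
        w[active] += mu * err[k] * x_buf[active]   # update only the active subset
    return w, err

# Toy echo-cancellation run: hypothetical 32-tap echo path, no far-end signal.
rng = np.random.default_rng(0)
true_echo = 0.5 * np.exp(-np.arange(32) / 6.0) * rng.standard_normal(32)
x = rng.choice([-2.0, -1.0, 0.0, 1.0, 2.0], size=20000)   # local PAM5 symbols
d = np.convolve(x, true_echo)[: len(x)]                  # echo seen at the receiver
w, err = sequential_pu_lms(x, d, num_taps=32, mu=0.005, n_subsets=4)
print("max tap error:", np.max(np.abs(w - true_echo)))
```

With four subsets, each iteration performs a quarter of the weight updates of full LMS, at the cost of each tap adapting only every fourth symbol, which is the slower convergence discussed next.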
However, the partial-update echo canceller (PU-EC) converges significantly more slowly. We therefore propose the VSSPU-EC, which reduces computational complexity without suffering from slow convergence. Its mathematical description is given below.
The weight update equation of the sequential partial-update LMS algorithm [3] is given by
where $i$ denotes the tap index, $k$ the time index, $w$ the tap weight, $\mu$ the step size, $e$ the error, and $x$ the input. $N$ is the number of subsets into which the weights are partitioned; for $N = 1$, the algorithm reduces to the traditional LMS algorithm. The step size of our VSSPU-EC is varied recursively according to the following equation:
where $0 < \alpha < 1$ and $\gamma > 0$, and $n$ is the time index. Note that $\mu(n+1)$ is bounded by $\mu_{\max}$ and $\mu_{\min}$. Typically, $\mu_{\max}$ is selected to provide the maximum possible rate of convergence, while $\mu_{\min}$ is chosen based on the steady-state misadjustment requirement. Fig. 2 shows the block diagram of the VSSPU-EC with 160 taps. The 160 tap weights are divided into five updating blocks of 32 tap weights each, and an LMS Engine is assigned to each updating block to handle its weight updates. The architecture of the LMS Engine is shown in Fig. 3. Fig. 4 shows that the echo impulse response and the estimated tap weights are almost identical, indicating that the VSSPU-EC cancels echo effectively. Fig. 5 shows that, at SNR = 23.5 dB, the VSSPU-EC converges 8000 to 10000 iterations faster than the PU-EC with only a minor increase in hardware.
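The step-size recursion itself is not reproduced in the text above. The stated conditions (two parameters with $0 < \alpha < 1$ and $\gamma > 0$, and clipping to $\mu_{\min}$ and $\mu_{\max}$) match the variable step-size LMS rule of Kwong and Johnston, $\mu(n+1) = \alpha\,\mu(n) + \gamma\,e^2(n)$ clipped to $[\mu_{\min}, \mu_{\max}]$, so the sketch below assumes that form. It is a minimal NumPy illustration, not the authors' hardware: the function name `vsspu_ec`, every parameter value, the interleaved update schedule, and the toy echo path are hypothetical, and the five 32-tap LMS Engines of Fig. 2 are not modeled.

```python
import numpy as np

def vsspu_ec(x, d, num_taps=160, n_subsets=4, alpha=0.97, gamma=1e-3,
             mu_min=1e-4, mu_max=2e-3):
    """Partial-update LMS echo canceller with a variable step size (sketch).

    Assumed step-size rule (Kwong-Johnston form):
        mu(n+1) = clip(alpha * mu(n) + gamma * e(n)**2, mu_min, mu_max)
    Only taps with i % n_subsets == n % n_subsets are adapted at time n.
    """
    w = np.zeros(num_taps)
    x_buf = np.zeros(num_taps)            # x_buf[i] holds x(n - i)
    taps = np.arange(num_taps)
    mu = mu_max                           # start with the fastest allowed step
    err = np.zeros(len(x))
    mu_hist = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        err[n] = d[n] - w @ x_buf         # residual after subtracting the echo replica
        active = taps % n_subsets == n % n_subsets
        w[active] += mu * err[n] * x_buf[active]
        mu_hist[n] = mu
        # Large errors push mu toward mu_max; small errors let it decay toward mu_min.
        mu = float(np.clip(alpha * mu + gamma * err[n] ** 2, mu_min, mu_max))
    return w, err, mu_hist

rng = np.random.default_rng(0)
num_taps = 160
echo_path = 0.3 * np.exp(-np.arange(num_taps) / 20.0) * rng.standard_normal(num_taps)  # hypothetical
x = rng.choice([-2.0, -1.0, 0.0, 1.0, 2.0], size=20000)    # local PAM5 transmit symbols
noise = 0.01 * rng.standard_normal(len(x))                  # hypothetical residual noise
d = np.convolve(x, echo_path)[: len(x)] + noise

w, err, mu_hist = vsspu_ec(x, d, num_taps=num_taps)
early = np.mean(err[:500] ** 2)
late = np.mean(err[-2000:] ** 2)
print(f"residual echo power: first 500 = {early:.3g}, last 2000 = {late:.3g}")
print(f"step size: start = {mu_hist[0]:.1e}, end = {mu_hist[-1]:.1e}")
```

During start-up the large residual echo holds the step at its upper bound for fast initial convergence; as the residual shrinks, the step decays toward the lower bound, trading adaptation speed for lower steady-state misadjustment.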
III. JOINT EQUALIZATION AND DECODING
To eliminate ISI and meet the $10^{-10}$ BER requirement of Gigabit Ethernet, we apply joint DFP+1-Tap LA-PDFD [4]. The block diagram of DFP+1-Tap LA-PDFD is shown in Fig. 6, and Fig. 7 shows our proposed DFP architecture. The FFE shapes the channel so that the postcursor channel impulse response seen by the DFE/TCM block is minimum-phase. Most of the postcursor channel energy is then concentrated at the beginning of the impulse response, which shortens the effective channel impulse response seen by the 1-Tap LA-PDFD. With this architecture, the branch metrics in the parallel decision feedback decoder can be computed in a look-ahead fashion, shortening the critical path of the 1-Tap LA-PDFD. Fig. 8 and Fig. 9 show the detailed architecture of our Decision Feedback Prefilter design [5]; Fig. 9 shows the architecture of the FBE. Fig. 10 shows the block diagram of the 1-Tap LA-PDFD [4]. Fig. 11 compares the error-rate performance of the 1-Tap LA-PDFD with that of a conventional Decision Feedback Equalizer (DFE) using uncoded PAM5. The 1-Tap LA-PDFD achieves a coding gain of 4 dB, whereas the maximum coding gain achievable with a traditional Viterbi decoder is 5.3 dB. This performance loss is tolerable because the target BER of $10^{-10}$ is still met under the worst-case 1000BASE-T channel condition [6]. The loss is due to error propagation in the DFP. Because the DFP cancels only the ISI caused by the less significant tail of the postcursor response, its error propagation is less severe than that of a conventional DFE.
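To make the prefilter and look-ahead ideas concrete, here is a deliberately simplified, uncoded NumPy sketch. A DFP stage subtracts the tail of the postcursor ISI (taps 2 and beyond) using past decisions, and a 1-tap look-ahead stage precomputes one tentative decision for each of the five possible previous PAM5 symbols, then selects among them with the actual previous decision. It omits the trellis decoding and per-state survivor paths of the real 1-Tap LA-PDFD [4]; the function names `slice_pam5` and `dfp_lookahead_1tap` and the channel taps are hypothetical.

```python
import numpy as np

PAM5 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

def slice_pam5(v):
    """Nearest PAM5 level (hard decision)."""
    return float(np.clip(np.round(v), -2.0, 2.0))

def dfp_lookahead_1tap(y, h):
    """Uncoded DFP + 1-tap look-ahead detector (illustrative sketch).

    y: received samples, y[k] = a[k] + sum_{j>=1} h[j] * a[k-j] + noise
    h: monic postcursor response (h[0] == 1), as assumed after the FFE
    """
    decisions = np.zeros(len(y))
    for k in range(len(y)):
        # DFP: cancel the tail ISI (taps 2..L) using past decisions.
        tail = sum(h[j] * decisions[k - j] for j in range(2, len(h)) if k - j >= 0)
        z = y[k] - tail
        # Look-ahead: one tentative decision per possible previous symbol,
        # computed without waiting for decisions[k - 1].
        candidates = {a: slice_pam5(z - h[1] * a) for a in PAM5}
        prev = decisions[k - 1] if k >= 1 else 0.0
        decisions[k] = candidates[prev]     # late selection by the previous decision
    return decisions

rng = np.random.default_rng(1)
h = np.array([1.0, 0.5, 0.2, 0.1, 0.05])           # hypothetical postcursor response
a = rng.choice(PAM5, size=5000)                     # transmitted PAM5 symbols
y = np.convolve(a, h)[: len(a)] + 0.05 * rng.standard_normal(len(a))
a_hat = dfp_lookahead_1tap(y, h)
print("symbol errors:", int(np.sum(a_hat != a)))
```

In a hardware version of this sketch, the five candidate computations do not depend on the previous decision and can run in parallel, so the feedback loop through `h[1]` reduces to a 5-to-1 selection.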
IV. SYSTEM SIMULATION AND IMPLEMENTATION RESULTS
Maximum Likelihood Timing Recovery (MLTR) [7] is applied for timing synchronization. Fig. 12 shows the eye diagram and phase variation of MLTR in our design. We co-simulated our digital circuit with the analog front end over distances of 87 to 100 m. Fig. 13 shows the digital eye diagram and phase variation of the joint analog equalization, digital equalization, and timing recovery during start-up, with an output SNR of 23.95 dB. We used a 0.25-µm CMOS standard cell library to implement our 1000BASE-T Gigabit Ethernet baseband DSP circuit. The critical path of our design is 7.48 ns, which meets the 8-ns timing requirement of 1000BASE-T. The core size of our design is 3 and the chip size of our design including pads occupies . The chip layout and chip summary are shown in Fig. 14(a) and (b).
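As a quick sanity check of the timing figures above, assuming the datapath must complete within one 8-ns symbol period (a 125-MHz clock), the timing margin works out as follows; the constant names are ours.

```python
CRITICAL_PATH_NS = 7.48   # reported critical path
SYMBOL_PERIOD_NS = 8.0    # 1 / 125 Mbaud

slack_ns = SYMBOL_PERIOD_NS - CRITICAL_PATH_NS
f_max_mhz = 1e3 / CRITICAL_PATH_NS    # highest clock rate the critical path allows

print(f"slack = {slack_ns:.2f} ns, f_max = {f_max_mhz:.1f} MHz (needed: 125 MHz)")
# slack = 0.52 ns, f_max = 133.7 MHz (needed: 125 MHz)
```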
V. CONCLUSION
With low-power and low-cost DSP techniques, we eliminated echo and ISI, two of the most dominant channel impairments in 1000BASE-T Gigabit Ethernet, while still meeting the BER specification of $10^{-10}$. MLTR is applied for timing synchronization, and the complete baseband system is co-simulated with the analog front end. Our proposed system architecture is implemented in a 2.5-V 0.25-µm CMOS standard cell design.
