Timing Recovery in Digital Subscriber Loops Using Baud-Rate Sampling
Abstract: Sampled-data techniques are the most practical means of obtaining the necessary signal processing functions for timing recovery in the VLSI implementation of a digital subscriber loop transceiver. The sampled-data timing recovery techniques described in this paper are applicable to both echo-cancellation and time-compression multiplexing systems. Timing recovery using baud-rate sampling in conjunction with a special pulse-shaping and timing function fulfills all the objectives for timing recovery in this application. It recovers a timing phase that has minimum precursor intersymbol interference, and makes possible the combination of decision feedback equalizer and echo canceler, reducing the convergence time and increasing the step size. The pulse-shaping function can be performed either in the transmitter by means of digital coding, or in the receiver by means of analog filtering. In the latter case, the transmitted pulse is compatible with more conventional approaches. The proposed partial-response line coding, a special form of AMI coding, is less susceptible to line impairments if detected as a two-level signal. Performance by analysis, simulation, and experimental measurements is reported on a variety of cable configurations, some including bridged taps. Analysis of jitter performance leads to design techniques for reducing the jitter magnitude.
I. INTRODUCTION
The digital subscriber loop (DSL) is an important element of the ISDN. In the DSL, integrated voice and data services will be provided to the customer over a common facility based upon existing twisted-pair cables. The proposed data rate for the digital subscriber loop is 144 kbits/s, which includes provision for two voice/data channels at 64 kbits/s each plus a signaling channel at 16 kbits/s [1]. The line data rate assumed in this paper is 160 kbits/s, with the extra 16 kbits/s reserved for maintenance, framing, and other functions.
Full-duplex transmission on a single pair of wires requires a means of separating the signals in the two transmission directions. Two competing methods for the baseband full-duplex data transceivers are time-compression multiplexing (TCM) and echo-cancellation (EC). This paper is primarily concerned with the EC technique, which is gaining favor in most standardization activities, but many of the results are equally applicable to the TCM method. This paper relates to the design of timing recovery to minimize the complexity of the EC and also maximize the EC accuracy.
There has been considerable work on realization of the echo canceler (EC) for DSL systems [1]-[6]. Monolithic realization of the EC has been shown to be feasible [2] in spite of the challenging requirement for 60 dB or more of EC accuracy. Nonlinearities introduced by asymmetry of the transmitted pulse, A-D or D-A conversions, and nonlinear transformers can be compensated with nonlinear or memory EC techniques [3], [6].
Conventional timing recovery methods are continuous-time [7]-[14], but sampled-data techniques are the most practical for low-cost VLSI implementations. Sampled-data DSL timing recovery techniques have been reported, with a sampling rate from two to eight times the baud rate [15]-[17]. In an EC system, timing can only be derived from the received signal after EC, and therefore the sampling rate required by the timing recovery determines the sampling rate of the EC and, hence, its complexity.
In this paper, we report a sampled-data timing recovery technique that samples at the minimum rate possible, the baud rate. In addition, the technique facilitates the use of a decision feedback equalizer (DFE), the combination of timing recovery and DFE to achieve low timing jitter, and the combination of DFE and EC to achieve fast convergence and simple realization, and it is insensitive to line impairments including bridged taps. The two-stage EC proposed in [18] to reduce the precision of the D/A converter can also be directly applied with this timing recovery technique.
In Section II, interrelated design issues and resulting objectives for the EC, the equalizer, and the timing recovery circuits are considered. In Section III, the timing function and the associated pulse shaping are introduced. The timing jitter is characterized analytically with a linearized PLL model in Section IV and later confirmed by computer simulations and experimentally in Sections VI and VII. Experimentally, this timing recovery technique performs satisfactorily for line lengths up to 5 km, some with multiple bridged taps. Several implementation alternatives are considered in Section V.
0733-8716/86/1100-1302$01.00 © 1986 IEEE
II. DESIGN CONSIDERATIONS AND OBJECTIVES
A block diagram of the U-transceiver is shown in Fig. 1. The EC, filters, equalizer, and detector use the clock recovered by the timing recovery. Only discrete-time techniques are considered for these functions because no known continuous-time circuit techniques can meet the required specifications when implemented in VLSI.
A. Echo Canceler
The time duration of the echo response of a subscriber loop depends upon its length and the location and length of bridged taps, and may extend to 50 µs or more. The desired 160 kbit/s baud rate translates to a baud interval of 6.25 µs, so that the EC must be designed to operate on eight or more previously transmitted symbols. Previously reported EC's operate on samples taken at two to eight times the baud rate [15]-[17]. The penalty for oversampling by a factor k is that the complexity of the EC is proportional to k. Because the EC is the most complex portion of a DSL transceiver, there is a high priority on minimizing the sampling rate.
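The arithmetic behind these figures can be checked with a short sketch. The 50 µs echo duration and the 160 kbaud rate come from the text; the helper name `ec_taps` and the list of oversampling factors are illustrative choices, not part of the original design.

```python
BAUD_RATE_HZ = 160_000          # line rate from the text (one bit per symbol)
ECHO_DURATION_NS = 50_000       # echo response duration quoted in the text (50 us)

# Baud interval in nanoseconds (exact integer arithmetic: 6250 ns = 6.25 us)
baud_interval_ns = 1_000_000_000 // BAUD_RATE_HZ

def ec_taps(oversampling_k=1):
    """Taps needed to span the echo response when sampling at k times the baud rate."""
    symbols_spanned = -(-ECHO_DURATION_NS // baud_interval_ns)   # ceiling division
    return symbols_spanned * oversampling_k

# Tap count grows in proportion to the oversampling factor k
taps_by_k = {k: ec_taps(k) for k in (1, 2, 4, 8)}
print(baud_interval_ns, taps_by_k)
```

Sampling at the baud rate (k = 1) keeps the canceler at the minimum length needed to span the echo, which is the motivation for the baud-rate timing recovery pursued here.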
One approach to meeting the EC accuracy objectives is to design for a very small timing jitter [19]. To achieve 60 dB of EC requires about -60 dB clock jitter, which is 1/1000 of a period, or about 6.25 ns for a 160 kbit/s transmission rate. This places a severe requirement on the timing recovery circuit. We will attempt to design to this requirement in this paper, although it should be noted that there are alternative methods of achieving the required EC accuracy [31].
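The jitter budget can be written out as a worked equation. It assumes the -60 dB figure is read as an amplitude ratio of the timing deviation $\Delta\tau$ (a symbol introduced here) to the baud interval $T$, which is consistent with the 1/1000 figure in the text:

```latex
20\log_{10}\!\left(\frac{\Delta\tau}{T}\right) = -60\ \mathrm{dB}
\;\Longrightarrow\;
\frac{\Delta\tau}{T} = 10^{-3},
\qquad
\Delta\tau = 10^{-3}\times 6.25\ \mu\mathrm{s} = 6.25\ \mathrm{ns}.
```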
Unless the far-end signal can be removed completely from the error signal used for adaptation in the EC, the step size in the EC adaptation must be kept small, typically 0.00047 [19], to ensure a 60 dB EC accuracy. A small step size translates to longer convergence time and larger dynamic range, and the latter makes a switched-capacitor implementation more difficult. If the timing recovery can be achieved with baud-rate sampling at the output of a DFE, and if the sampling phase recovered results in minimum precursor intersymbol interference so that the DFE can eliminate all the intersymbol interference in the far-end signal, the far-end signal can be completely removed from the adaptation of the EC. As a result a larger step size, e.g., 0.04 or 0.05, is adequate to achieve the required EC accuracy [19]-[21].
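The link between step size and the far-end signal can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not the simulation used in the paper: a plain LMS update, a hypothetical exponentially decaying eight-tap echo path, and a far-end signal modeled as an independent binary sequence 40 dB below the transmitted symbols.

```python
import math
import random

def residual_echo_db(step, far_end_amplitude, n_symbols=20000, n_taps=8, seed=1):
    """Run a baud-rate LMS echo canceler; return residual echo relative to echo power (dB).

    Assumptions (illustrative only): echo path 0.5**j, +/-1 near-end data,
    far-end signal = independent +/-far_end_amplitude sequence.
    """
    rng = random.Random(seed)
    echo_path = [0.5 ** j for j in range(n_taps)]   # hypothetical echo response
    taps = [0.0] * n_taps                           # adaptive EC coefficients
    history = [0] * n_taps                          # most recent transmitted symbols
    tail = 2000                                     # symbols averaged after convergence
    acc = 0.0
    for n in range(n_symbols):
        history = [rng.choice((-1, 1))] + history[:-1]
        far_end = far_end_amplitude * rng.choice((-1, 1))
        echo = sum(h * a for h, a in zip(echo_path, history))
        estimate = sum(c * a for c, a in zip(taps, history))
        if n >= n_symbols - tail:
            acc += (echo - estimate) ** 2           # uncancelled echo only
        error = echo + far_end - estimate           # adaptation error contains the far-end signal
        taps = [c + step * error * a for c, a in zip(taps, history)]
    echo_power = sum(h * h for h in echo_path)
    return 10 * math.log10(max(acc / tail, 1e-30) / echo_power)

large_step_with_far_end = residual_echo_db(0.05, 0.01)
small_step_with_far_end = residual_echo_db(0.0005, 0.01)
large_step_far_end_removed = residual_echo_db(0.05, 0.0)
print(large_step_with_far_end, small_step_with_far_end, large_step_far_end_removed)
```

In this sketch the far-end signal acts as adaptation noise, so only the small step reaches a low residual while it is present. Once it is removed from the error, the large step converges quickly and is limited only by numerical precision.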
B. Line Impairments
A transmission line with bridged taps introduces $\sqrt{f}$ attenuation and spectral nulls [31]. Fig. 2 shows the effects of various length lines using LINEMOD [22]. Bridged taps only affect the tail portion of the pulse and, therefore, introduce postcursor intersymbol interference. To be useful with a wide range of line configurations, the timing recovery technique must be insensitive to distortion caused by the cable and especially by bridged taps.
C. Equalization Techniques
Dispersion due to $\sqrt{f}$ attenuation and bridged taps makes adaptive equalization necessary. Linear equalization has two drawbacks in the VLSI implementation of DSL's. First, multiplication takes a great deal of silicon area. Second, there is noise enhancement when there are spectral nulls caused by the bridged taps. Adaptive equalization without multiplications and with less noise enhancement can be achieved by using a decision-feedback equalizer (DFE). The DFE uses past decisions to synthesize and subtract off the postcursor intersymbol interference. The noise enhancement problem is reduced in the DFE due to its nonlinearity [23], [24]. However, the DFE cannot cancel the precursor intersymbol interference, and unless the timing recovery derives a phase with minimum precursors, a linear equalizer is needed. An objective of this paper is therefore to find timing recovery techniques that minimize precursor intersymbol interference.

[...], where the desired timing phase is defined as $\tau_0$. This timing function is positive when the timing phase is advanced, and negative when retarded. A way to estimate this function from samples of the received signal taken at baud intervals was proposed by Mueller and Muller [25]. It uses a weighting vector that is an algebraic function of data symbols $a_k$ as an operator on the received samples such that the expected value of this weighted sum equals the timing function. The weighting vector corresponding to the timing function in Fig. 3 is [...] A different timing function will be introduced after a discussion of the issue of pulse shaping.
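The idea of a data-weighted sum whose expected value equals a timing function can be sketched numerically. The form used below, $z_k = \tfrac{1}{2}(x_k a_{k-1} - x_{k-1} a_k)$ with expected value $\tfrac{1}{2}[h(\tau+T) - h(\tau-T)]$, is the commonly cited Mueller and Muller estimate. It is assumed here, not confirmed by the surviving text, that this corresponds to the weighting vector for Fig. 3. The pulse samples passed in are hypothetical.

```python
import random

def mm_timing_estimate(h_pre, h_main, h_post, n_symbols=20000, seed=7):
    """Average z_k = (x_k*a_{k-1} - x_{k-1}*a_k)/2 over random +/-1 data.

    h_pre, h_main, h_post are the pulse samples h(tau - T), h(tau), h(tau + T)
    at the current sampling phase tau (hypothetical values supplied by the caller).
    """
    rng = random.Random(seed)
    a = [rng.choice((-1, 1)) for _ in range(n_symbols + 2)]
    # Baud-rate samples of a pulse spanning three intervals:
    # x_k = h_pre*a_{k+1} + h_main*a_k + h_post*a_{k-1}
    x = [0.0] * len(a)
    for k in range(1, len(a) - 1):
        x[k] = h_pre * a[k + 1] + h_main * a[k] + h_post * a[k - 1]
    total = 0.0
    count = 0
    for k in range(2, len(a) - 1):
        total += 0.5 * (x[k] * a[k - 1] - x[k - 1] * a[k])
        count += 1
    return total / count

skewed = mm_timing_estimate(0.1, 1.0, 0.4)      # postcursor larger than precursor
balanced = mm_timing_estimate(0.3, 1.0, 0.3)    # symmetric pulse samples
print(skewed, balanced)
```

The average is nonzero when the precursor and postcursor samples differ and approaches zero when they balance. The loop can therefore use it as an error signal while sampling only once per baud.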
The line code has considerable impact on the EC, equalization, and timing recovery. Appropriate preequalization, which is performed either before the EC or at the transmitter in the form of line coding, can reduce the number of taps in the EC and equalizer. The timing jitter can also be reduced by line coding. The self-equalizing line codes, such as Biphase and Wal-2 [31], reduce the duration of the impulse response significantly at the line output, and therefore are an example of preequalization.
We propose here to use as a line code $1 - z^{-1}$ partial response, also known as dicode partial response [21], [26]. Its self-equalization property reduces the duration of the effective impulse response in the presence of line impairments, and also ensures the absence of dc. Precoded dicode partial response is equivalent to alternate mark inversion (AMI) [3], [26]. However, if dicode partial response is detected at the receiver as a two-level signal with controlled ISI rather than as a three-level AMI code, it also has the additional advantage of being self-equalizing. Fig. 4(a) and (b) compares the received pulse shapes in these two cases on a gauge 26 line with length equal to 4 km. The two-level partial-response pulse clearly has a much shorter effective impulse response than the three-level AMI pulse, but it also extends over two time intervals and creates severe postcursor ISI, which fortunately can be removed by a decision-feedback equalizer (DFE).

We will modify the dicode transmitted pulse to introduce a zero crossing one baud interval prior to the main pulse. We will show that this precursor zero is preserved by the line, even in the presence of bridged taps, and, hence, can be used as a reference for timing recovery. This zero crossing can be introduced at the transmitter by means of digital coding, as shown in Fig. 5(a), where the transmitted pulse corresponding to a one bit is shown. The transmitted pulse for a zero bit is the negative of this waveform. The transmit and receive filters then reshape the pulse into the form shown in Fig. 5(b). [...] Fig. 5(c), is

$$f(\tau) = h(\tau - T) \tag{2}$$

and the desired phase is at $f(\tau_0) = 0$. This timing function has the effect of utilizing the zero crossing introduced by the pulse shaping, and has the additional benefit of specifically eliminating the first precursor intersymbol interference through the choice of derived timing phase. We will show that this eliminates the need for a linear equalizer [...]

[...] Fig. 6(a) and assumes values 0, ±1, ±2, ±3, which facilitates implementation by a shift-and-add operation in the digital domain or by a ratioed switched-capacitor technique in the analog domain. Fig. 7 shows the signal flow diagram [...]

There are other ways to remove the spectral energy at dc aside from the partial-response approach proposed here. A scrambler, which randomizes the data sequence, and AMI coding, which ensures equal number of positive and negative pulses, accomplish the same goal [...] The same timing function applies to these cases, and the requirement on the DFE is also relaxed because the postcursor ISI is smaller. However, the scrambler does not guarantee [...]

[...] We therefore analyze the jitter performance of baud-rate timing recovery using a PLL in this section, and this leads to an understanding of how to reduce the jitter. We show that if the signal is perfectly equalized to be free of ISI, a jitter-free timing signal can be achieved with the baud-rate sampling timing recovery technique.
A. Timing Jitter Model
A simplified version of the baud-rate-sampling timing recovery system is shown in Fig. 8(a). The VCO output, which is the recovered timing signal, is used to sample the input data signal. The received samples and the correct received data sequence are then processed in the timing function generator block. The output of the timing function generator is equivalent to the output of the phase detector of a conventional PLL [28], [29] [as shown in Fig. 8(b)]. It is passed through a loop filter to drive the VCO.
During steady-state operation, when the phase jitter and phase error are very small, the PLL can be modeled as a linear system, with the two inputs to the phase detector representing the phase of the incoming data signal and the VCO output, as shown in Fig. 8(c) . The phase detector is also replaced by a linearized phase detector with two inputs in the dimension of phase. Under the same assumptions, the system in Fig. 8(a) can also be modeled by the system in Fig. 8(c) . The characteristics of the phase detectors corresponding to two types of timing function are shown in Figs. 3(b) and 5(c). Note that the timing functions are approximately monotonic and linear functions of the difference between the VCO output phase and the ideal sampling phase of the data signal. Also note that the linear system model of the PLL is a low-pass filter, with cutoff frequency defined by the loop gain and the loop filter.
The feedback in the PLL drives the VCO such that the average sampling phase of the VCO output coincides with the desired timing phase, defined by the zero crossing of the timing function. However, the actual VCO output phase fluctuates around the desired phase because the timing generator output also fluctuates. The bandwidth of the loop as defined by the loop filter and loop gain determines how much fluctuation gets through to the VCO output phase.
The actual timing function outputs for the two types of timing function are computed in the Appendix. This instantaneous output of the timing function generator is data dependent, introducing timing jitter. If the received pulse has a finite duration, say $NT$, where $T$ is the baud interval, this output is a function of $M$ data symbols only, where $M$ is an integer depending on $N$ and the timing function, and can only assume a finite number of values. With the assumption that the PLL has converged and the timing jitter is very small, it can be shown that if the data sequence $a_k$ is an independent, identically distributed sequence assuming the values +1 and -1, the output of the timing function generator is a random sequence, and it is wide-sense stationary. The probability density function can be computed from the $2^M$ combinations of the $M$ data symbols. This random sequence can be referred back to the input of the phase detector, and treated as the jitter of the input phase $\theta_{in}$. In this case, input $\theta_{in}$ is a discrete-time random sequence with index $n$ denoting time.
B. Jitter Spectrum
The autocorrelation function of this input-referred jitter in two cases is computed in the Appendix. The result is plotted in Fig. 9(a) and (b), where $N$ was assumed to be 3 for simplicity. The same derivation can easily be extended to cases with larger $N$. The power spectra of the jitter are plotted in Fig. 9(c) and (d). It is interesting to note that the jitter power at low frequency of Fig. 9(c) is inherently much smaller than that in Fig. 9(d), which happens to be white. This is intuitive due to the property of symmetry in Fig. 3(a). After referring the jitter to the input, the PLL can be treated as a linear system with a low-pass characteristic. The jitter power at the output of the VCO, at point A in Fig. 8(c), is

$$S_{\theta_o}(\omega) = |H(\omega)|^2\, S_{\theta_i}(\omega) \tag{4}$$

where $H(\omega)$ is the effective transfer function of the PLL, $S_{\theta_i}(\omega)$ is the power spectrum of pattern jitter referred to the input, and $S_{\theta_o}(\omega)$ is the power spectrum of timing jitter. We see that the timing jitter can be reduced to any desired value by making the effective bandwidth of the PLL narrower. However, the bandwidth of the PLL is also subject to other constraints, such as the capture range and the pull-in range.
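The effect of loop bandwidth on output jitter can be illustrated with the simplest possible linearized loop. The sketch assumes a first-order discrete-time loop, $\theta_o[n+1] = \theta_o[n] + K(\theta_i[n] - \theta_o[n])$, driven by white input-referred jitter as in the case of Fig. 9(d). The loop order and the gain values are assumptions chosen for illustration, not the loop used in the paper.

```python
import random

def output_jitter_variance(loop_gain, input_variance=1.0):
    """Closed-form output variance of the first-order loop for white input jitter:
    sigma_o^2 = sigma_i^2 * K / (2 - K), valid for 0 < K < 2."""
    return input_variance * loop_gain / (2.0 - loop_gain)

def simulate_output_jitter(loop_gain, n=100000, seed=3):
    """Monte Carlo check of the closed form with unit-variance white input jitter."""
    rng = random.Random(seed)
    theta_o = 0.0
    acc = 0.0
    for _ in range(n):
        theta_i = rng.choice((-1.0, 1.0))           # white, unit variance
        theta_o += loop_gain * (theta_i - theta_o)  # first-order loop update
        acc += theta_o * theta_o
    return acc / n

for gain in (0.1, 0.01):
    print(gain, output_jitter_variance(gain), simulate_output_jitter(gain))
```

For small $K$ the output variance falls roughly in proportion to the loop gain, which is the narrow-bandwidth trade-off described above. The cost is slower acquisition.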
From (A7) and (A13) in the Appendix, another way to reduce timing jitter is to make $h_{-1}$ and $h_1$ both equal to zero, which is equivalent to equalizing the pulse to a Nyquist pulse. This property is a major advantage of this timing recovery technique, since the equalization operation needed for data decisions also serves the purpose of reducing timing jitter. With other timing recovery methods, a separate operation is usually needed to reduce the timing jitter [10], [30].
V. INTEGRATED CIRCUIT IMPLEMENTATION
Three approaches to the integrated circuit realization of the timing recovery and DFE have been considered, namely, fully digital timing recovery including a DFE, analog timing recovery with digital DFE adaptation and an analog EC, and a fully analog approach. The fully digital approach is only practical when the echo-free signal is available in the digital domain, as it is with certain EC's [3]. Several EC's perform the critical operations in the analog domain to avoid the intermodulation due to nonlinearity in the data conversion [3], [6] , and an analog echo-free signal is available in these designs.
A. Digital Timing Recovery and DFE
Some proposed implementations use digital adaptation and analog EC. In these configurations, analog echo-free signals are digitized for adaptation, and both analog and digital echo-free signals are available. A fully digital approach is to utilize the digital echo-free signal for timing recovery and equalization, as shown in Fig. 10 . The required accuracy for the analog-to-digital conversion and the DFE were determined by simulation to be 8 bits and 12 bits, respectively.
A variety of hardware architectures can be designed for the digital timing recovery and DFE. In the experimental setup, the required hardware for the DFE is a set of up/down counters for both storage and adaptation. An arithmetic-logic unit (ALU) and accumulator are used for addition and subtraction. To compute the timing function, the same ALU and accumulator can be used for the shift and add operations. A shift register is needed to store and update the detected data, and a ROM lookup table or some random logic is used to generate the weighting vector.
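One way up/down counters can serve for both storage and adaptation is sketched below. The text does not state the adaptation rule, so the sign-sign update used here (each counter steps by one LSB according to the sign of the error times the fed-back decision) is an assumption, as are the two postcursor values and the LSB weight.

```python
import random

def run_dfe(n_symbols=3000, lsb=1.0 / 256, seed=5):
    """Two-tap DFE whose coefficients live in up/down counters (illustrative sketch)."""
    rng = random.Random(seed)
    postcursors = [0.5, -0.2]      # hypothetical postcursor ISI; main sample is 1
    counters = [0, 0]              # up/down counters; tap value = counter * lsb
    sent = [1, 1]                  # transmitted a_{k-1}, a_{k-2}
    decided = [1, 1]               # detected symbols held in the shift register
    errors = 0
    for _ in range(n_symbols):
        a_k = rng.choice((-1, 1))
        x_k = a_k + sum(h * a for h, a in zip(postcursors, sent))
        # Subtract the synthesized postcursor ISI using past decisions
        y_k = x_k - sum(c * lsb * d for c, d in zip(counters, decided))
        d_k = 1 if y_k >= 0 else -1
        e_k = y_k - d_k
        sign_e = (e_k > 0) - (e_k < 0)
        # Count up or down by one LSB: no multiplier is needed
        counters = [c + sign_e * d for c, d in zip(counters, decided)]
        errors += int(d_k != a_k)
        sent = [a_k, sent[0]]
        decided = [d_k, decided[0]]
    return [c * lsb for c in counters], errors

taps, errors = run_dfe()
print(taps, errors)
```

Because each update is a single count up or down, the adaptation needs no multiplier. This matches the motivation given earlier for preferring a DFE over a linear equalizer in VLSI.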
B. Analog Timing Recovery and Digital DFE Adaptation with Analog Cancellation
Possibly only the sign of the digital echo-free signal will be used for adaptation. Analog timing recovery is then necessary. Adaptation for the DFE can still be implemented in the digital domain and converted via a D/A (time shared with the D/A in the EC). Postcursor cancellation in the DFE is then performed in the analog domain. A switched-capacitor realization of the analog delay line and the simple multiplications required by the timing recovery is shown in Fig. 11 , where the adaptation and the DFE cancellation are also shown. This is only one of the many possible switched-capacitor approaches where the number of operational amplifiers is minimized.
C. Fully Analog Approach
The fully analog EC was abandoned earlier due to the small step size required by the adaptation in the EC [2]. The small step size (0.00047 typically) was required due to the presence of the far-end signal in the error signal used for adaptation. The precursor-free timing recovery sampling at the baud rate combined with the DFE can remove the far-end signal from the error term, and, therefore, allow a much larger step size. Simulation showed that a step size of 0.03-0.05, which can be implemented with ratioed switched-capacitor techniques, performs satisfactorily for an eight-tap EC. The fully analog EC/DFE also eliminates the problems associated with nonlinearity in data conversion, which has been a difficulty in the realization of EC, and the associated die area, and therefore deserves further investigation.
VI. SIMULATION RESULTS
The transient behavior of the timing recovery process is too fast to observe experimentally. Computer simulation was used to examine the convergence of the DFE and timing recovery.
Simulation of the system is based on the block diagram shown in Fig. 12. Note that the samples used to generate the timing function are the echo-free signals which have also been equalized by the DFE. The DFE removes the ISI among pulses and greatly reduces the timing jitter of the recovered timing. This configuration is possible only when the sampling frequency required by the timing recovery is the baud rate, because the DFE works at the baud rate. This is one of the advantages of the techniques described here.
Simulations of the timing recovery and the residual error at the output of the DFE are shown in Figs. [...]
VII. EXPERIMENTAL RESULTS
Two breadboard systems, corresponding to the first two approaches discussed in Section V, have been built. MDAC's, rather than the switched-capacitor circuit, were used in the analog portion for multipliers. The EC is not included in the breadboard setup. A digital phase-locked loop with clock running at 80 times the data rate was used for simplicity. Experiments were performed at 160 kbits/s, on cables in the laboratory ranging from 0 to 5 km in length. The cables used were gauge 24 and 26. Cases with bridged taps were also tested.
The waveform of a single pulse corresponding to a "1" is shown in Fig. 16(a), where the bottom trace is the transmitter clock. The period of the clock is the baud interval, which is 6.25 µs for a 160 kbit/s transmission rate. Fig. 16(b) depicts the eye pattern (top trace), the recovered clock (second trace), the transmitter clock (third trace), and the received signal (bottom trace) for a line length equal to zero. Note that the three-level eye openings are visible in the received waveform. The eye pattern shown in the picture is the analog version of the DFE output. The eyes are square because the sampling rate is the baud rate. Fig. 16(c) depicts the same waveforms corresponding to a line length of 3.2 km with two bridged taps. Experiments on other line configurations gave similar results.
VIII. CONCLUSIONS
A sampled-data timing recovery technique with sampling rate equal to the baud rate is well suited for use in an EC DSL system. The recovered timing phase can be used in conjunction with DFE, and this phase is insensitive to line characteristics even in the presence of bridged taps. The timing jitter can be minimized without extra circuitry. Three possible integrated circuit implementations are described. The fully analog EC/DFE seems to be the most attractive approach because of the small chip area and the elimination of the nonlinearity problems and die area associated with data conversions.
APPENDIX
The received data signal in a baseband subscriber loop receiver can be expressed as [...]
where $h(t)$ is the channel response to the input pulse. In subsequent analytical results, binary line coding will be assumed in which $a_k$ is an independent, identically distributed sequence of transmitted data symbols assuming the values +1 and -1. Further assuming that the timing jitter is very small and that the samples are taken at intervals of $T$, the samples will be [...]
[...] (A5)
Considering the case where $h(t)$ has a finite duration of $3T$, plugging (A3) into (A5), we get [...] The autocorrelation function of $\theta_{ik}$ can be computed from (A9) and is plotted in Fig. 9(a). The power spectrum of the input phase jitter is the Fourier transform of the autocorrelation function, and is computed and plotted in Fig. 9(c).
In case 2, where $f(\tau) = h(\tau - T)$, the timing function generator output is

$$z_k = (a_{k-1} - a_k a_{k-1} a_{k-2})\,x_{k-2} + (a_{k-2} + 2a_k)\,x_{k-1} + (-a_{k-1} - 2a_k a_{k-1} a_{k-2})\,x_k. \tag{A11}$$

Again, consider the case where $h(t)$ has a finite duration of $3T$,

$$z_k = 3h_{-1} + (-a_{k-1}a_{k+1} - 2a_k a_{k-1} a_{k-2} a_{k+1})\,h_{-1} + (a_{k-1}a_{k-3} - a_k a_{k-1} a_{k-2} a_{k-3})\,h_1. \tag{A12}$$

Since $h_{-1} = 0$ at steady state,

$$z_k = (a_{k-1}a_{k-3} - a_k a_{k-1} a_{k-2} a_{k-3})\,h_1. \tag{A13}$$

To bring $z_k$ out from the loop to the input, divide $z_k$ by the gain of the phase detector, which in this case is the slope $s$ of the timing function, shown in Fig. 5(c),

$$\theta_{ik} = z_k\,\frac{2\pi}{3sT}. \tag{A14}$$

Exhaustive search shows that this random input jitter $\theta_{ik}$ also assumes three values with probability: [...]

The autocorrelation function of $\theta_{ik}$ is computed from (A15) and is plotted in Fig. 9(b). The power spectrum of the input phase jitter is white, and is plotted in Fig. 9(d).
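The exhaustive search mentioned for case 2 is easy to reproduce. The sketch below assumes that the received samples for a pulse of duration $3T$ are $x_k = h_{-1}a_{k+1} + h_0 a_k + h_1 a_{k-1}$, and that the weighting in (A11) is $z_k = (a_{k-1} - a_k a_{k-1} a_{k-2})x_{k-2} + (a_{k-2} + 2a_k)x_{k-1} + (-a_{k-1} - 2a_k a_{k-1} a_{k-2})x_k$, a reading of the damaged equation chosen because it reproduces (A12) term by term. The function names and the integer pulse values are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def z_from_samples(a_next, a0, a1, a2, a3, h_pre, h_main, h_post):
    """z_k for a 3T pulse, using the assumed reading of (A11).

    a_next, a0, a1, a2, a3 stand for a_{k+1}, a_k, a_{k-1}, a_{k-2}, a_{k-3};
    h_pre, h_main, h_post stand for h_{-1}, h_0, h_1.
    """
    x_k = h_pre * a_next + h_main * a0 + h_post * a1
    x_k1 = h_pre * a0 + h_main * a1 + h_post * a2
    x_k2 = h_pre * a1 + h_main * a2 + h_post * a3
    return ((a1 - a0 * a1 * a2) * x_k2
            + (a2 + 2 * a0) * x_k1
            + (-a1 - 2 * a0 * a1 * a2) * x_k)

def mean_z(h_pre, h_main, h_post):
    """Expected value of z_k over all 2**5 equally likely symbol patterns."""
    total = sum(z_from_samples(*a, h_pre, h_main, h_post)
                for a in product((-1, 1), repeat=5))
    return Fraction(total, 32)

def jitter_distribution(h_main=5, h_post=1):
    """Distribution of z_k / h_1 at steady state (h_{-1} = 0), by exhaustive search."""
    counts = Counter()
    for a_next, a0, a1, a2, a3 in product((-1, 1), repeat=5):
        z = z_from_samples(a_next, a0, a1, a2, a3, 0, h_main, h_post)
        # Only the h_1 term survives, as in the steady-state expression
        assert z == (a1 * a3 - a0 * a1 * a2 * a3) * h_post
        counts[z // h_post] += 1
    total = sum(counts.values())
    return {value: Fraction(n, total) for value, n in counts.items()}

print(mean_z(2, 5, 1), jitter_distribution())
```

Under these assumptions the mean of $z_k$ is $3h_{-1}$, so the estimate is proportional to the timing function $h(\tau - T)$. At steady state $z_k/h_1$ takes the three values -2, 0, and +2 with probabilities 1/4, 1/2, and 1/4.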
ACKNOWLEDGMENT
Many thanks are due to P. O'Riordan and P. Winship for building the experimental breadboard systems and performing the experiments. Support from Pacific Bell in providing cable for the experimental work is also appreciated.
