Abstract-An all-digital QAM system is studied here for high-speed data transmission on digital subscriber loops. All elements of the system have been fully investigated and their effects on the overall system performance documented. In particular, the following issues have been explicitly addressed: the utility of an adaptive error-prediction (noise-prediction) filter; equalizer size and convergence; blind equalization; choice of constellation size and center frequency; combined timing/carrier recovery; and finally the oversampling requirements of an NCO for potential use in the timing and carrier recovery loop. The paper relies on simulation results to investigate and quantify the overall interactions of the various blocks. Throughout the paper, emphasis has been placed on reducing the complexity of the system for implementation on an integrated circuit (IC). It is expected that the information provided here will serve as a good starting point for an IC designer implementing a transceiver ASIC.
Introduction
High rate data transmission over copper wire has received considerable attention in the recent past [4] [8]- [13] [23]- [25] due to the desire of telephone operating companies to transmit high rate digital data over their existing infrastructure of copper loops [1] [4] . These efforts have given rise to standards and products for Basic-Rate digital subscriber loop (DSL) services [2] [3], high-speed DSL (HDSL) services [6] [7] , and more recently, asymmetric DSL (ADSL) services.
Other industries have also started to look at copper as a medium for high rate digital transmission. There are presently efforts under way to investigate and identify 100 Mbps Ethernet LANs on category-V, IV and III copper lines. Similar efforts are under way for 155 Mbps ATM networks [8]. Copper distributed data interface (CDDI) systems have also been discussed as a substitute for FDDI systems [8] [9].
This paper presents a complete digital-QAM system, targeted for implementation on a single IC, for delivering ADSL services. The entire system architecture is presented in detail, right down to the finite-wordlength requirements of the signal processing blocks. Topics of discussion include blind equalization; the potential gain realized with error prediction; FFE-DFE requirements, convergence, and interaction; A/D size; finite-wordlength requirements for the FFE and DFE; and finally timing and carrier recovery. Although there have been considerable contributions on the subject of QAM modulation on subscriber loops [16] [23]-[26], they have mostly focused on establishing the feasibility of a QAM system using an ideal receiver and transmitter in the presence of channel impairments only.
Babak Daneshrad, Henry Samueli
These impairments include severe signal attenuation, inter-symbol interference and high cross-talk noise interference. Previous work by the authors [13] [33] has incorporated some of the receiver non-idealities as well as those of the channel. This paper, however, provides a comprehensive look at the system and its constituent parts, with an eye towards total VLSI integration. The paper addresses most, if not all, of the system-related issues and provides the hardware designer with the information needed to implement the system. Moreover, ease of implementation is an underlying focus throughout the paper.
The flow of the paper is as follows. The next section provides a brief introduction to ADSL. Section 3 presents a set of default simulation parameters and procedures. Sections 4 through 7 present results and concepts dealing with equalization, finite-wordlength requirements and clock recovery. The paper then concludes in Section 8.
Asymmetric Digital Subscriber Line (ADSL)
ADSL is the name given to a service to deliver high bit rate, half-duplex data to residential customers. Following Basic-Rate DSL and HDSL, it is the next step in the progression of Digital Subscriber Loop services. The earlier version of ADSL, ADSL-I, attempts to transmit 1.6 Mbps half-duplex on twisted copper loops falling outside the carrier service area (CSA). The 1.6 Mbps data rate is sufficient to transmit compressed motion video to the home for the realization of video on demand services. In recent years ADSL-II has been introduced for the transmission of 6 Mbps half-duplex data over shorter loops.
This work is focused on identifying a system for the delivery of ADSL-I services. The received signal is generally corrupted by inter-symbol interference (ISI) and severe attenuation, depending on the channel (Figure 1). Other impairments are manifested in the form of noise, classified into impulse noise, near-end crosstalk (NEXT), additive white Gaussian noise (AWGN), and to a lesser degree far-end crosstalk (FEXT) from other ADSL services. The slicer SNR should be sufficient to provide the targeted BER of 10^-7 plus a 6 dB margin. In addition to the 1.6 Mbps half-duplex channel, it is desired to have plain old telephone service (POTS) and a 16 kbps return channel on the same loop to carry control information in the reverse direction. The presence of the slower return path as well as the high data rate in the forward direction suggest that, unlike its predecessors basic-rate DSL and HDSL, ADSL may be better realized using a passband modulation scheme where the forward and return paths are frequency multiplexed, and the issue of echo cancellation is removed as a consequence. Moving the signal constellation to passband has the added advantage that the harmful NEXT due to basic-rate neighbors falls out of band. Although it still remains the most dominant source of noise, the NEXT contribution is severely diminished as a consequence. The power spectral density of NEXT from a basic-rate disturber and FEXT from another ADSL system, along with equations describing their behavior, can be found in [13] and [33]. In this work AWGN with a power spectral density of -140 dBm/Hz was also added to the system for a total power of -84 dBm. The powers of the NEXT and FEXT noise were measured to be -77.5 dBm and -110 dBm respectively, resulting in a total additive noise power of -76.5 dBm.
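The noise budget quoted above can be checked with a short calculation. The sketch below sums the three contributions in linear units; the 400 kHz noise bandwidth is an assumption (chosen because it reproduces the stated -84 dBm AWGN power), and the helper names are illustrative only.

```python
import math

def dbm_to_mw(p_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

def mw_to_dbm(p_mw):
    """Convert a power in milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)

# Assumption: AWGN is integrated over a 400 kHz band.
noise_bandwidth_hz = 400e3
awgn_dbm = -140.0 + 10.0 * math.log10(noise_bandwidth_hz)

# NEXT and FEXT powers as quoted in the text.
next_dbm = -77.5
fext_dbm = -110.0

# Uncorrelated noise powers add in linear units, not in dB.
total_dbm = mw_to_dbm(sum(dbm_to_mw(p) for p in (awgn_dbm, next_dbm, fext_dbm)))
```

Under these assumptions the AWGN term comes out at about -84 dBm and the total within a few tenths of a dB of the quoted -76.5 dBm, with NEXT clearly dominating.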
Proposed Modulation Formats for ADSL
One of the more sophisticated and ambitious passband modulation formats that was first proposed for HDSL and is now the standard for ADSL services is Discrete Multitone (DMT). The idea has been around for some time, and is also referred to as Orthogonal Frequency Division Multiplexing (OFDM) [39]. According to Chow [17], the multitone modulation can be theoretically proven to be optimal. In other words, it can come very close to the theoretical channel capacity for the subscriber loops. However, Zervos [13]
An alternative modulation format for ADSL is termed Carrierless AM-PM (CAP) [20].
CAP has recently been proposed by AT&T to realize a variety of DSL services including ADSL [12] [21] [22] . To this point, the literature has not been very rich regarding the possible performance of a CAP system for ADSL. Furthermore, there has been little discussion of the other aspects such as timing recovery and word length requirements, both of which are crucial in the identification of a system for IC realization.
Quadrature Amplitude Modulation (QAM) is a third possibility proposed for high speed DSL transmission. QAM has been utilized extensively in the past for many digital data transmission applications. In particular, it has received a considerable amount of attention as a candidate system for ADSL [16] [23]-[25]. These studies are for the most part theoretical analyses intended to gauge the performance limits of QAM systems for ADSL and do not differentiate between analog and digital implementations of the system. The system proposed in this paper is an all-digital QAM realization: the signals are processed digitally all the way to the carrier frequency of 300 kHz, where they are fed to a digital-to-analog (D/A) converter and coupled onto the line.
Such an implementation can be realized entirely on a single digital CMOS IC, thus greatly reducing the component count and the system cost. The implementation is possible in light of recent advances in the area of microelectronic circuits, where A/D converter rates have gone up and low spurious digital frequency synthesizers have become available. In fact, the ICs reported in [28] - [31] which are all targeted for high rate applications can be applied to ADSL, by scaling the operating frequency and changing the chip architecture to take advantage of the lower data rate.
With the architectural changes also comes the possibility of integrating the entire receiver on a single IC [34] . The notion of a single chip containing all the necessary receiver functions is quite appealing and is one of the driving forces behind this work.
Simulation Defaults
The block diagram of the simulated system is shown in Figure 2 . The system calls for the use of a numerically-controlled-oscillator (NCO), which can be implemented on the same IC as the other blocks, to correct for sampling phase and frequency offsets. As shown in Figure 2 , the system uses three complex adaptive filters, one for the feed-forward equalizer (FFE), the second for the decision-feedback equalizer (DFE), and the third is used as an error-prediction (EP) filter [32] . The three filters work in conjunction to adapt the system to any ADSL environment and to optimize the receiver performance. The FFE and DFE work together to equalize the linear distortion caused by the channel and the EP removes any correlation which may be left in the error signal after equalization. This is a way of getting higher performance with short equalizers. As seen in later sections the impact of the EP block on the steady state system performance is weakened as the equalizer length is increased.
The set of default parameters used in the studies is listed below; unless otherwise stated in the text, the simulations used the following configuration.
8. Performance measure: SNR at slicer input.
9. NEXT, FEXT, AWGN as described in Section 2. Total noise power -76.5 dBm.
10. 9.5 dBm transmitted power, -45 dBm total received signal power with a 16-QAM constellation.
The reason for the initial transmission of a 4-QAM constellation is that none of the simulations use a training sequence to converge the equalizer, and studies have shown that the system can be converged by going through a training period during which only two-level data is transmitted on each rail [13]. At the end of this period, the equalizers reach their steady-state operating point, and the system may be switched to the desired constellation size. However, if it is only desired to gauge the relative performance of the system as a function of parameter changes, it is sufficient to observe the 4-QAM system's performance alone, since the relative performance after the switch to the larger constellation will remain unchanged. It was observed that for a switch from 4- to 16-QAM the performance drops by close to 2.7 dB, which is in line with the drop in the transmitted symbol variance from 18, for I- and Q-rail data taken from the set {-3, 3} (4-QAM), to 10 for the set {-3, -1, 1, 3} (16-QAM).
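The symbol variances quoted above follow directly from the rail alphabets. The sketch below assumes equiprobable, independent I and Q levels; the function name is illustrative.

```python
import math

def symbol_variance(rail_levels):
    """Mean-square value of a complex symbol whose I and Q rails are drawn
    independently and uniformly from rail_levels."""
    per_rail = sum(v * v for v in rail_levels) / len(rail_levels)
    return 2.0 * per_rail  # I rail plus Q rail

var_4qam = symbol_variance([-3, 3])          # two-level data on each rail
var_16qam = symbol_variance([-3, -1, 1, 3])  # four-level data on each rail

# Power ratio between the two constellations, in dB.
drop_db = 10.0 * math.log10(var_4qam / var_16qam)
```

The ratio 18/10 corresponds to roughly 2.55 dB, slightly below the 2.7 dB drop observed in simulation.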
Equalization
The transmission of high-rate data on the copper loop cannot be realized without equalization. This section is dedicated to identifying the equalizer requirements of the QAM system operating in the DSL environment. A study is first made into the effectiveness of an adaptive error-prediction filter on the overall system performance; next, the minimum requirements of the FFE and DFE are examined, along with the optimum adaptation step size for these filters.
The ensuing sections will then investigate the interactions of the equalizer with the timing recovery algorithm. shows as much as 2.9 dB improvement in the performance of the T-spaced system as we move away from the optimum sampling phase. A similar trend was observed with a 64-QAM system operated at f_c = 267 kHz [13]. In the case of fractional equalization, however, the error predictor gain is not quite as significant, since such systems are more immune to sampling phase offsets than their baud-spaced counterparts [35]. In such systems the EP only improves the steady-state SNR by a maximum of 1.0 dB, and its inclusion into the system cannot be justified on this basis. We thus turned to investigating the transient behavior of the system both with and without the error predictor.
Performance Gain using Error Prediction
A set of simulations was carried out in which a sampling phase ramp (constant sampling frequency offset) was applied to the system. Such a scenario arises when the timing recovery loop is trying to adapt to a sampling frequency offset between the transmitter and receiver. In the study, the T/2-spaced system with 0% sampling phase offset was first allowed to converge. The phase ramp was then applied until the end of the simulation, when the SNR was noted both with and without the error predictor. The whole setup was then repeated for other values of sampling frequency offset. The results appear in Figure 4 and show a considerable improvement, as much as 3.5 dB, with the EP. In fact, as the phase ramp becomes steeper, the improvement obtained with the EP becomes larger. This is desirable since it allows the system designer to increase the bandwidth of the timing recovery PLL without seriously jeopardizing the system performance.
The simulation data presented in this section all point to the fact that, in the presence of a T/2-spaced equalizer, optimum steady-state performance can be achieved without the use of an error predictor. However, the advantages of an error predictor are most pronounced in situations where the performance must be maintained in the presence of slow variations in system parameters. The adaptation of the error predictor, as shown in Figure 2, is independent of that of the equalizers.
Consequently, as long as reliable data is available at the slicer, the adaptation step size of the error-prediction filter can be made larger than that of the equalizers without jeopardizing system stability. Thus the error predictor can be used to reduce the system response time to transient effects. Based on these results the use of a 2 to 3 tap error predictor, as suggested by Figure 4, is recommended.
Number of Taps
An important concern in the design of any system targeted for actual implementation is the hardware requirements of the system. By identifying the minimum amount of hardware required to realize the system without any significant loss in performance, the system designer can realize a cost-effective unit that does not waste any more power and dollars than is necessary. In this section, we determine the smallest number of taps required for the adaptive filters to yield the same SNR results as the 20-20-4 (20 FFE taps, 20 DFE taps, 4 EP taps) system discussed thus far.
The steady-state coefficient values for the three adaptive filters were observed at the end of a simulation run. These results suggested that an 8 to 10 tap DFE along with a 3 tap EP may be sufficient to realize the system. The FFE coefficients, however, did not lend themselves readily to visual inspection. Consequently, a set of simulations was carried out to identify the minimum number of T/2-spaced FFE taps. The results are summarized in Figure 5. The plot shows a slow but steady decrease in system performance when the number of FFE taps falls below 16. Although 16 taps is sufficient for the scenario simulated in this study, a larger choice for the FFE length may be judicious to ensure system performance over a wide set of loops.
To confirm the observations made regarding the number of DFE and EP taps required, a second simulation using the same exact system as before was run in which the number of FFE taps was fixed at 20 and the SNR fluctuations were observed as a function of the DFE and EP lengths.
The results are summarized in Table 1 . Little gain is achieved with a DFE longer than 10 taps or an EP filter of more than 2 taps. 
Adaptation Step Size, µ
All adaptive blocks within the system are updated using the least mean square (LMS) algorithm. The updating algorithm for each of the three filters is shown in equations (4.1) through (4.3), where the signals refer back to the receiver block diagram of Figure 2. These equations dictate the trajectory of the filter coefficients as they converge to their optimum value, and the step size µ controls their speed of convergence. Thus, within the stable range, the larger the µ value, the quicker the filter will converge. Care must be taken in choosing the appropriate value of µ for each of the three adaptive blocks, especially in the case where the system is expected to converge blindly. In addition to the absolute magnitude of the step size, which controls the convergence and the amount of residual error, the relative size of the µ's is also of importance in determining the interaction between the DFE and FFE blocks (the EP is outside the equalizer loop). In general, it was found that for proper convergence without a training sequence, it is best to choose µ_dfe to be smaller than µ_ffe. As reflected in the default parameter values presented in the previous section, a factor of four was found to be appropriate. This imbalance in the step sizes forces the DFE to follow the FFE and thus alleviates possible contention between the two units.
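As a generic illustration of the LMS recursion discussed above (not a reproduction of equations (4.1) through (4.3)), the sketch below adapts a short complex FIR filter toward a known target response using two-level data on each rail. The target taps, step size, iteration count and all function names are hypothetical choices made for the example.

```python
import random

def fir_output(coeffs, taps):
    """Inner product of filter coefficients with the tapped delay line."""
    return sum(c * x for c, x in zip(coeffs, taps))

def lms_update(coeffs, taps, error, mu):
    """One complex LMS step: c_j <- c_j + mu * e * conj(x_j)."""
    return [c + mu * error * x.conjugate() for c, x in zip(coeffs, taps)]

def identify_filter(num_iter=2000, mu=0.01, seed=1):
    """Adapt from zero state toward a hypothetical 3-tap target response."""
    rng = random.Random(seed)
    target = [0.8 + 0.2j, -0.3 + 0.1j, 0.1 - 0.05j]  # illustrative values only
    coeffs = [0j] * len(target)
    taps = [0j] * len(target)
    for _ in range(num_iter):
        # Two-level data on each rail, i.e. a 4-QAM symbol from {-3, 3}.
        symbol = complex(rng.choice([-3, 3]), rng.choice([-3, 3]))
        taps = [symbol] + taps[:-1]  # shift the delay line
        error = fir_output(target, taps) - fir_output(coeffs, taps)
        coeffs = lms_update(coeffs, taps, error, mu)
    return coeffs, target

adapted, reference = identify_filter()
```

Because the example is noise-free, the coefficients settle on the target values; in the receiver the error would instead come from the slicer, and a smaller µ trades convergence speed for lower residual error.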
The effects of changing µ_ffe on the overall system performance are shown in Figure 6 for a selected set of step sizes. There is an optimum µ_ffe value of 1.9 × 10^-4, above which any increase in the value of the step size will reduce, rather than increase, the speed of convergence. The effects of a large step size on the residual error may also be observed in the plot. The simulations showed the system performance for µ_ffe in the range of 1.0 × 10^-4 to 2.5 × 10^-4 to be almost identical, with degradations observed outside this range.
The system performance as a function of the DFE step size was also observed for a set of 
Blind Equalization
It is desirable to simplify the system and eliminate the need for extra circuitry or handshake protocols necessary to synchronize with a training sequence. A simple method was sought to converge the equalizer as well as the decision-directed timing recovery loop (Section 6). Several approaches were considered for this purpose: a) the use of a fixed prefilter to ensure an eye opening at the start of transmission, b) the use of spectral estimation techniques to obtain the frequency response and from that the impulse response of the channel, c) computation of the equalizer coefficients by measuring the channel impulse response, and d) the use of random QPSK (4-QAM) signaling.
Of the four options above, the last was by far the most promising. It combines simplicity of design with reliability. Through simulations, it was demonstrated that, starting from zero state, the equalizers can be converged by simply transmitting random two-level data. The method was tested by evaluating its performance on many different channels and system configurations. It was found that with careful selection of the adaptation step sizes in the FFE and DFE this approach works quite reliably. In fact, all simulations presented here were converged in this manner, using the default µ values of Section 3.
Finite-Wordlength
An important parameter in the design of ICs for data transmission is the number of bits necessary to represent each of the signals. Too many bits will use up silicon real estate and power without any improvement in the overall system performance. On the other hand, an insufficient number of bits will cause the quantization error to overwhelm the system and degrade the performance. A study into the finite-wordlength requirements of the system was made in [13], where it was reported that a 10-bit A/D is sufficient for optimum performance. The precision requirements for the equalizer signals were also derived, and are summarized below in Table 2. The table entries for the filter input and filter multiply-accumulate precisions are self-explanatory (the DFE input is the sliced data value, which requires only 2 bits on each of the I and Q rails for a 16-QAM constellation). The coefficient accumulator precision refers to the accumulator used in the coefficient update loop. Insufficient precision here keeps the filter coefficients from converging to their optimum values and thus slightly alters the frequency response of the filters. As the size of this accumulator decreases, the variance of the coefficients C_j will increase since the LSBs of the error signal are lost. The last entry in the table, coefficient, represents the number of bits used for the coefficient C_j in the filtering operation. In general this is smaller than the coefficient accumulator size, since any changes in the size of this parameter simply manifest themselves as multiplier quantization noise.
Figure 7. It is seen that the system performance is almost identical for f_c = 300 kHz and f_c = 350 kHz, with f_c = 400 kHz yielding the worst SNR, followed by f_c = 250 kHz. The degraded performance seen at the lower frequency is due to the high NEXT energy that is present at baseband.
With a center frequency of 250 kHz, the lower band edge of the constellation falls at 50 kHz, which is only 10 kHz away from the 40 kHz 2nd order Butterworth lowpass filter used to shape the NEXT disturber. Consequently, a good portion of the NEXT noise falls in-band and seriously compromises the system performance. The poor performance at the higher frequencies is caused by the high signal loss introduced by the channel.
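The band-edge argument can be made concrete with a small calculation. The sketch assumes the occupied bandwidth equals the 400 kHz baud rate (no excess bandwidth), which matches the 50 kHz lower edge quoted for f_c = 250 kHz; the function name is illustrative.

```python
def band_edges_khz(f_center_khz, f_baud_khz):
    """Lower and upper band edges, assuming occupied bandwidth = baud rate."""
    half_band = f_baud_khz / 2.0
    return f_center_khz - half_band, f_center_khz + half_band

NEXT_SHAPING_CUTOFF_KHZ = 40.0  # lowpass cutoff of the NEXT disturber, from the text

# Distance between the lower band edge and the NEXT shaping cutoff
# for the four center frequencies considered.
margins_khz = {
    fc: band_edges_khz(fc, 400.0)[0] - NEXT_SHAPING_CUTOFF_KHZ
    for fc in (250.0, 300.0, 350.0, 400.0)
}
```

At 250 kHz the margin is only 10 kHz, while at 300 kHz it grows to 60 kHz; the upper edge, however, moves to 500 kHz and beyond, where channel loss is higher.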
Similar simulations were carried out to find the optimum center frequency for systems operating over 12 kft, 24 AWG and 26 AWG loops. The channel loss is lower for these loops, and consequently the center frequency at which the additive Gaussian noise becomes a dominant factor reveals an increase of more than 11 dB in the system SNR on the shorter loop (Table 3), thus suggesting that at these shorter distances a higher order constellation may be used to increase the overall data rate. Table 3 also shows the 400 kbaud system's performance when operated over a 12 kft 26 AWG loop at four different center frequencies. The performance over this loop is better than that of the 18 kft 24 AWG loop by close to 7 dB, resulting in a final SNR of 34.6 dB. The results suggest that a center frequency of 300 kHz is still close to the optimum for operation over all three loops. This is mainly due to the lowpass nature of the dominant near-end cross-talk noise, which becomes less threatening with increasing carrier frequency.
The next set of studies focused on a 64-QAM system operating at f_baud = 266.7 kHz. Once again a 4-QAM constellation was transmitted for blind equalization at 266.7 kbaud, with the results summarized in Table 3. The SNR figures obtained here are very similar to those obtained for the 16-QAM case when the two systems have the same lower cutoff frequency, suggesting that the SNR is determined more by the lower band edge. Consideration of the 64-QAM constellation may thus be abandoned, in view of its higher SNR requirement for a given BER.
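The two baud rates used in these studies follow from the 1.6 Mbps payload and the number of bits carried per symbol. A minimal sketch, assuming an uncoded square constellation and ignoring any framing overhead (the function name is illustrative):

```python
import math

def baud_rate_hz(bit_rate_bps, constellation_size):
    """Symbol rate needed to carry bit_rate_bps with an uncoded M-point constellation."""
    bits_per_symbol = math.log2(constellation_size)
    return bit_rate_bps / bits_per_symbol

baud_16qam = baud_rate_hz(1.6e6, 16)  # 4 bits per symbol
baud_64qam = baud_rate_hz(1.6e6, 64)  # 6 bits per symbol
```

This gives 400 kbaud for 16-QAM and about 266.7 kbaud for 64-QAM, matching the figures quoted in the text.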
In choosing the constellation size and center frequency, consideration should also be given to the relative difficulty of VLSI implementation. Varying the constellation size has minimal effect on the hardware requirement of the system; however, the choice of center frequency may greatly simplify the IC implementation [28]. If the center frequency is chosen to be equal to the baud rate and one fourth of the sampling frequency (f_c = f_baud = f_s/4), then the quadrature samples for the sine and cosine terms needed at the mixer can be taken at π/2 intervals, with values taken from the set {1, 0, -1}. The introduction of such a sequence of zeros and ones eliminates the need for the digital frequency synthesizer altogether. Furthermore, the insertion of alternate zeros into the data stream allows us to collapse the separate I- and Q-rail lowpass shaping filters into a single filter structure. For a 16-QAM system, such an approach will result in f_c = 400 kHz, which from the previous SNR results is clearly unacceptable (Figure 7). On the other hand, a 64-QAM system with f_baud = 266.7 kHz operating at f_c = 266.7 kHz exhibits a 28.2 dB SNR, which is comparable to the 29 dB obtained for the 16-QAM system operating at f_c = 300 kHz. But once the switch is made to 64-QAM, the symbol error probability will not be tolerable. Consequently, simplicity of implementation must be sacrificed to obtain the higher SNR required for the target BER.
Table 4. 64-QAM performance over different loops and carrier frequencies.
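The f_s/4 simplification can be seen by sampling the carrier at π/2 phase steps. The sketch below is only an illustration of that sampling argument; the function name is hypothetical.

```python
import math

def quarter_rate_carrier(num_samples):
    """Cosine and sine carrier samples when f_c = f_s / 4 (phase step of pi/2)."""
    cos_seq = [round(math.cos(math.pi * n / 2)) for n in range(num_samples)]
    sin_seq = [round(math.sin(math.pi * n / 2)) for n in range(num_samples)]
    return cos_seq, sin_seq

cos_seq, sin_seq = quarter_rate_carrier(8)

# At every sample exactly one of the two carriers is zero, so the mixer
# reduces to sign changes and the I and Q streams interleave.
one_rail_active = all((c == 0) != (s == 0) for c, s in zip(cos_seq, sin_seq))
```

The cosine sequence repeats 1, 0, -1, 0 and the sine sequence repeats 0, 1, 0, -1, so no multiplier or frequency synthesizer is needed for the mixing operation.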
Timing and Carrier Recovery
The simulation results presented in the previous sections have all assumed that the receiver has perfect knowledge of the carrier frequency as well as the sampling phase of the transmitted signal. This assumption must be removed in the final analysis, since the performance of the system is entirely dependent on the effectiveness of the tracking loops at the receiver in acquiring and tracking both the carrier and the sampling phase of the transmitted signal. In the proposed digital QAM system, the tasks of carrier and timing recovery are greatly simplified due to choices made at the architectural level regarding the overall system [33], namely the use of digital IF and T/2-spaced complex equalization. Realizing the IF signal digitally has the added advantage that the carrier frequency and sampling frequency are related through the equation for the carrier signal as follows:
Thus, for a fixed value of f_c, variations in f_s will cause the normalized frequency of s(n) to change. Consequently, the need for carrier recovery is alleviated and only timing recovery remains. The use of a T/2 equalizer also has implications for the timing recovery loop: we no longer need to lock to the sampling phase, just the sampling frequency.
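For reference, the dependence of the digital carrier on the sampling frequency can be written out explicitly. The expression below is a reconstruction from the surrounding description, assuming a real cosine carrier with zero initial phase; it should not be read as the authors' exact equation.

```latex
s(n) = \cos\!\left( 2\pi \, \frac{f_c}{f_s} \, n \right), \qquad n = 0, 1, 2, \ldots
```

In this form the normalized frequency f_c/f_s is the only quantity that enters the carrier, so an error in f_s appears directly as a carrier frequency error, and correcting f_s corrects both.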
The concept of the phase detector used in this work is presented in detail in [33] . It is a decision directed scheme which is shown in block diagram form in Figure 8 . In [33] it is suggested that the sign of the phase error signal after the correlator (multiplier) should be fed to the loop filter.
Although attractive from a hardware standpoint, further investigation revealed such a loop to be very sensitive to the choice of PLL parameters. In this section we embark upon a series of studies to identify a more robust timing recovery loop based on the original concept of [33]. We will first present the S-curve of the phase error with different size accumulators in the presence and absence of the sign blocks (Figure 9). Next, the tracking and acquisition behavior of the various configurations is studied, as well as the interaction of the timing recovery loop with the equalizer as they start from zero state.
Phase Detector S-curves
Taking out the sign operations shown in Figure 9, and using a 300 point sliding window accumulator, the S-curve of Figure 10 is obtained. It clearly shows the variations in the difference as the timing moves over a baud interval. As expected, the curve is not symmetric, since the received demodulated pulse is not symmetric. Although a fairly reliable S-curve is obtained, the inclusion of a 300-point accumulator makes the system unattractive from a hardware implementation perspective. It is desirable to perform the averaging in the loop filter and NCO and thus simplify the phase detector. The S-curve for such a simplified phase detector is shown in Figure 11. It is still possible to observe the underlying trend in the error signal, but the signal is much noisier than that of Figure 10. Later sections will show that despite the large signal fluctuations such a phase detector can be relied upon to lock onto and track sampling phase errors.
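The hardware cost of the sliding-window accumulator comes from the delay line it must keep. A minimal software sketch of such an accumulator follows; it is a generic running-sum structure, not the authors' circuit, and the class name is hypothetical.

```python
from collections import deque

class SlidingWindowAccumulator:
    """Running sum of the most recent `length` samples."""

    def __init__(self, length=300):
        # The delay line must store every sample in the window,
        # which is what makes a 300-point version costly in silicon.
        self.window = deque([0.0] * length, maxlen=length)
        self.total = 0.0

    def update(self, sample):
        # Add the newest sample and subtract the one about to leave the window.
        self.total += sample - self.window[0]
        self.window.append(sample)  # maxlen drops the oldest entry
        return self.total

acc = SlidingWindowAccumulator(length=3)
outputs = [acc.update(x) for x in (1.0, 2.0, 3.0, 4.0)]
```

Removing the accumulator leaves only the correlator output, with the averaging deferred to the loop filter and NCO as described above.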
In going from an infinite precision implementation of the phase detector to a binary implementation, where the sign blocks of Figure 9 are included, a trade-off is made between the jitter introduced by the phase detector and the ease with which the phase detector may be implemented. The S-curve obtained with the sign algorithm after a 300 point accumulator is very similar in shape to the curve of Figure 
Tracking Performance, Infinite Precision Detector
In order to study the tracking performance of the loop we must first ensure correct slicer decisions. Thus, the system was initially allowed to run without any sampling phase error, on 4-QAM data, until the equalizer was converged. A phase step was then introduced, followed by a frequency step of 120 Hz. Finally, the constellation size was changed from 4- to 16-QAM. The results are summarized in Figure 12.
Figure 12. System response to step phase and frequency offsets using an infinite precision phase detector with a 300 sample correlator-accumulator. (a) Sampling phase error, (b) SNR trajectory with and without an EP, (c) Phase detector output after the 300 sample correlator-accumulator.
simulation was carried out in which only a phase error step was introduced. In this case the steady-state phase error curve and the SNR trajectory did not deviate significantly from those of Figure 12, suggesting a smaller set of values for the loop filter parameters in order to further average out the phase error word. This simulation was also used to provide further insight into the relative performance of an infinite precision phase detector and a sign phase detector; Figure 13 shows the output of the accumulator arm of the loop filter for the two detectors. Although the simulation only used the infinite-precision detector in the PLL, both detectors were included in the setup, where the output of each was fed into one of two identical loop filters. Apart from the obvious scaling difference, the effect of taking the sign of the correlator output is manifested in the slight difference between the two trajectories. Note that the steady-state values are also different.
During steady-state operation, the infinite precision detector will be at its zero crossing point, since it was the one used in the PLL. However, due to the slight offset of the zero crossing between the sign detector and the infinite precision detector, the loop filter accumulator for the sign detector shows a slight downward trend.
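The "accumulator arm" referred to above is characteristic of a proportional-plus-integral loop filter. The sketch below shows that generic structure only; the gain names k1 and k2 and their roles are assumptions made for illustration and are not taken from the paper's loop filter definition.

```python
class LoopFilter:
    """Generic proportional-plus-integral loop filter (illustrative structure)."""

    def __init__(self, k1, k2):
        self.k1 = k1              # assumed gain of the proportional arm
        self.k2 = k2              # assumed gain of the accumulator arm
        self.accumulator = 0.0    # state of the accumulator arm

    def update(self, phase_error):
        # The accumulator arm integrates the detector output; in lock it
        # settles at the value needed to cancel a constant frequency offset.
        self.accumulator += self.k2 * phase_error
        return self.k1 * phase_error + self.accumulator

lf = LoopFilter(k1=0.5, k2=0.25)
first = lf.update(1.0)
second = lf.update(1.0)
third = lf.update(0.0)
```

Feeding a detector with a small residual offset into such a filter makes the accumulator drift steadily, which is the kind of trend described for the sign detector above.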
Simultaneous Timing Recovery & Equalization
The simulations presented in the last two sections disabled the phase detector for the first 50,000 iterations to allow the equalizer to converge. Although the exercise demonstrated the tracking capabilities of the loop, it did not provide any insight into the capabilities of the system to acquire upon start-up, when the PLL has not locked and the equalizer has not yet converged. The next simulation introduces a phase step at the very beginning of the simulation. In this case both the equalizer and PLL were allowed to adapt in the presence of a phase step error. It was discovered that even in the presence of the phase drifts introduced by the PLL, the equalizer is capable of opening the eye of the received signal enough to allow the PLL to perform its task and eliminate the phase error (Figure 14). The system reaches a steady-state operating point with an SNR of 27.5 dB (4-QAM) after the completion of only 200,000 samples (125 ms).
Figure 13. Loop filter accumulator output for the infinite precision detector, and 50 times the sign-detector. System response to step phase offset using an infinite precision phase detector with a 300 element correlator-accumulator.
Given the impracticality of implementing a 300 element sliding-window accumulator on an integrated circuit, the response of the system using the infinite precision correlator without an accumulator was also investigated. However, due to the change in the phase detector S-curve, the loop gain and loop filter coefficients had to be changed. The phase error resulting from this change is also shown in Figure 14 for side-by-side comparison. The system performs satisfactorily and, in fact, exhibits a faster reaction time due to the absence of the 300 baud delay incurred in the accumulator of the first phase detector. The phase jitter in the steady state, however, is larger and could be corrected by using a smaller value for K1 or the loop gain. At the end of the simulation the system had an SNR of 27.2 dB (this simulation used 4-QAM, and was run without an error predictor).
Frequency Acquisition, Infinite Precision
The foregoing simulations have demonstrated the satisfactory behavior of the system in the tracking mode, as well as its response to an initial phase error when reliable data is not available. We will now study the acquisition of the system in the presence of a constant frequency offset.
Crystal oscillators with worst-case frequency offsets of ±60 ppm are readily available commercially. A system having two such oscillators, one at each end, will therefore experience a relative frequency offset of at most ±120 ppm. It is thus imperative that the acquisition of the PLL, and of the system as a whole, be tested and studied with appropriate frequency offsets between the transmit and receive sampling frequencies.
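The worst-case figure above is simple arithmetic, sketched below. The function names are illustrative; the only values taken from the paper are the ±60 ppm oscillator tolerance and the 1.6 MHz system sampling frequency, which is assumed here to be the reference against which the ppm values are quoted.

```python
F_S_HZ = 1.6e6        # system sampling frequency quoted in the paper
OSC_TOL_PPM = 60.0    # worst-case tolerance of a single crystal oscillator


def worst_case_offset_ppm(tol_ppm, n_oscillators=2):
    """Offsets add when the oscillators err in opposite directions."""
    return tol_ppm * n_oscillators


def ppm_to_hz(ppm, f_ref_hz=F_S_HZ):
    """Convert a relative offset in ppm to Hz at the reference frequency."""
    return ppm * 1e-6 * f_ref_hz


def hz_to_ppm(hz, f_ref_hz=F_S_HZ):
    """Convert an offset in Hz to ppm of the reference frequency."""
    return hz / f_ref_hz * 1e6


worst = worst_case_offset_ppm(OSC_TOL_PPM)
print(f"worst-case relative offset: +/-{worst:.0f} ppm")
print(f"equivalent at f_s: +/-{ppm_to_hz(worst):.0f} Hz")
```

At 1.6 MHz, ±120 ppm corresponds to roughly ±192 Hz, so a test sweep should extend at least that far on either side of zero.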
A series of simulations was carried out to better understand the acquisition behavior of the infinite precision phase detector with no accumulator. As before, an initial period of blind equalization was introduced during which only 4-QAM data was transmitted. The relative frequency offset between the transmitter D/A and receiver A/D converters was varied from -210 Hz (-131 ppm) to +210 Hz (+131 ppm) in 60 Hz (37.5 ppm) increments. In all cases the system acquired and tracked the frequency offset and resulted in the final steady-state SNR figures of Table 5. The SNR trajectories, the loop filter accumulator contents, and the phase error trajectories for a set of four selected frequency offsets are depicted in Figure 15. Observation of the SNR plot, Figure 15-c, reveals that with the exception of the system having +210 Hz offset, the equalizers of the other systems converge fairly rapidly and an open eye is obtained within the first 50,000 to 
Step Response and Acquisition, Sign Phase Detector
The sign phase detector is the most desirable detector for hardware implementation since a [...] simulations use an accumulator after the correlator and the correlator output was fed directly to the loop filter. We first attempt to gain insight into this loop by observing its step response. Except for the necessary changes to the loop filter parameters, all other simulation parameters were kept unchanged.
A step phase error was introduced at the start of the simulation and all adaptive loops were enabled.
The system corrected for the phase error and remained locked for the duration of the study, Figure 16. [...] the setup, the system was able to acquire and track offsets of ±56 ppm as well as -112 ppm [33] with a different set of loop filter parameters. In view of this sensitivity, it is recommended that the sign phase detector be abandoned in favor of the infinite precision detector, which is somewhat more hardware intensive but exhibits much more predictable behavior.
NCO Sampling Frequency
The last block of the PLL that requires attention is the Numerically Controlled Oscillator (NCO). This is nothing more than an accumulator with the overflow bit used as the oscillating signal [29].
Given the number of bits, N, used in the accumulator and its clocking frequency, f_s_nco, the frequency of the accumulator overflow is given by equation (7.4), where W represents the input word to the accumulator. Due to the discrete nature of this oscillator, it can only change the sampling phase in increments of 1/f_s_nco seconds. This introduces a delay between the time the loop filter output is available at the NCO input and the time when the effect of this word is actually seen on the sampling phase. The discrete jumps in the sampling point will also cause larger jitter noise in systems that use small f_s_nco values.
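The accumulator-with-overflow behavior described above can be sketched in a few lines. Equation (7.4) is not reproduced in this text, so the sketch assumes the standard phase-accumulator relation, f_out = W · f_s_nco / 2^N, which is presumably what that equation states. The function names, the 16-bit width, and the 64 MHz clock (40 times 1.6 MHz) are illustrative choices, not values from the paper.

```python
def nco_overflow_count(word, n_bits, n_clocks):
    """Clock an n_bits-wide accumulator n_clocks times, adding `word` on
    every clock, and count how many times it overflows."""
    modulus = 1 << n_bits
    acc = 0
    overflows = 0
    for _ in range(n_clocks):
        acc += word
        if acc >= modulus:      # carry out of the top bit
            acc -= modulus
            overflows += 1
    return overflows


def nco_output_frequency(word, n_bits, f_clk_hz):
    """Average overflow rate under the assumed relation W * f_clk / 2**N."""
    return word * f_clk_hz / (1 << n_bits)


# Illustrative values only: 16-bit accumulator clocked at 40 x 1.6 MHz.
N_BITS, F_CLK_HZ, WORD = 16, 64e6, 1638
clocks = 1 << N_BITS
measured = nco_overflow_count(WORD, N_BITS, clocks) * F_CLK_HZ / clocks
print(measured, nco_output_frequency(WORD, N_BITS, F_CLK_HZ))
```

Each overflow can only occur on a clock edge, so the instant of every output transition is quantized to 1/f_s_nco, which is the source of the sampling-phase jumps discussed in the text.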
The effect of varying the NCO operating frequency on the system behavior was the subject of a study whose results are presented in Figure 17. The NCO operating frequency, f_s_nco, was chosen to be a multiple of the system sampling frequency, f_s = 1.6 MHz. The oversampling factor for f_s_nco was reduced from 100 down to 20 in steps of 20. The quantization effects of the NCO on the sampling phase of the receiver become pronounced at oversampling factors of less than 40. Figure 17 shows that at an oversampling factor of 20 the sampling phase quantization effects are quite pronounced: the 5% jumps in the sampling phase cause the SNR to dip by as much as 2 dB.
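The 5% figure follows directly from the oversampling factor: the smallest sampling-phase adjustment is one NCO clock period, which is 1/20 of the sample period at a factor of 20. A short sketch of this arithmetic for the factors used in the study is given below; the function names are illustrative.

```python
F_S_HZ = 1.6e6  # system sampling frequency from the text


def phase_step_percent(oversampling_factor):
    """Smallest sampling-phase step as a percentage of one sample period."""
    return 100.0 / oversampling_factor


def phase_step_seconds(oversampling_factor, f_s_hz=F_S_HZ):
    """The same step in seconds, i.e. one NCO clock period 1 / f_s_nco."""
    return 1.0 / (oversampling_factor * f_s_hz)


for factor in range(100, 0, -20):   # 100, 80, 60, 40, 20
    print(f"{factor:3d}x: {phase_step_percent(factor):5.2f} % of T, "
          f"{phase_step_seconds(factor) * 1e9:6.2f} ns")
```

At a factor of 40 the step is 2.5% of the sample period, and it doubles to 5% at a factor of 20, where the text reports the SNR dips.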
Conclusions
This paper has touched on many of the issues encountered in the system level specification of an all-digital QAM system for the transmission of 1.6 Mbps data over digital subscriber loops.
The optimum constellation size and center frequency for the system operating over an 18 kft 24 AWG loop in the presence of basic-rate NEXT, AWGN and self-FEXT were shown to be 16-QAM at f_c = 300 kHz. Three adaptive blocks are recommended to equalize the system: a three-tap error prediction filter, a 16-tap T/2-spaced feed-forward equalizer, and a 10-tap feedback equalizer.
Additionally, values have been suggested for the adaptation step size used to update the tap values, and for the PLL parameters. These values, when used with the proposed blind-equalization technique and phase detector, resulted in convergence in all simulations carried out in the course of this work.
The finite wordlength requirements of the various signals were also discussed, and the latter half of the paper presented a detailed treatment of the timing/carrier recovery problem. The use of a fractionally-spaced equalizer and digital IF transmission, along with the static nature of the channel, was exploited to simplify the timing/carrier recovery loop, and a simple, easy to implement, decision-directed timing phase detector was recommended for use in the system. The infinite precision version of the phase detector without an averaging block was chosen from among a few variants because it exhibited reliable acquisition and tracking even when forced to converge simultaneously with the equalizer. Finally, the oversampling requirements of the NCO were investigated and a minimum oversampling factor of 40 was suggested.
The information presented in this paper can serve as a good baseline for the implementation of a modem operating over subscriber loops. All the system-level issues and trade-offs have been presented and discussed. The hardware designer can now take on the task of putting the various blocks together and, depending on the application at hand, the parameters and blocks presented here can be augmented or reduced to tailor the system to that application.
