Abstract-An all-digital architecture is presented for implementing the front-end signal processing functions in a quadrature modulator and demodulator for high bit-rate digital radio applications. A pair of CMOS chips has been designed and submitted for fabrication in a 1.25 μm process and is expected to accommodate symbol rates up to 35
I. INTRODUCTION
To accommodate demands for increasing data rates, many digital communication systems are switching over to more complex modulation formats which allow data transmission at higher rates without requiring larger bandwidths. As a consequence of these high data rates, most of the baseband signal processing is currently implemented with analog hardware. For example, in [1], a 400 Mb/s 256-QAM modem is described which primarily uses analog hardware for most of the signal processing functions.
Unfortunately, it becomes difficult and expensive to implement modems with complex modulation formats using analog hardware because the overall performance becomes very sensitive to various nonidealities, such as dc offset voltages, dc voltage drifts, analog filter phase distortions, and amplifier and mixer nonlinearities. However, with digital hardware implementations, virtually all of these distortions are eliminated and very precise and controllable performance can be achieved without sophisticated compensation techniques. Of course, even with a digital implementation of the baseband signal processing, amplifiers, mixers, and filters are still required at the IF and RF stages of the system implementation and careful design of these components is necessary in order to achieve satisfactory system performance. Other advantages of digital implementations include the ability to easily program the hardware to accommodate different data rates, modulation formats, and filter specifications. However, previous all-digital modem architectures, such as those described in [2] and [3] , have been designed for relatively low-speed applications, i.e., less than 2 Mb/s.
In this paper, an all-digital architecture is presented for implementing the front-end signal processing functions in a high-speed quadrature modulator and demodulator. The architecture is partitioned into a modulator chip and a demodulator chip, each of which incorporates 40-tap FIR square-root Nyquist matched filters with an excess bandwidth factor of 35%. The excess bandwidth factor was chosen so that the transmit signal spectrum would satisfy FCC mask requirements for the commonly used microwave digital radio systems [4] (15 MBd in a 20 MHz channel at 4 GHz, 22.5 MBd in a 30 MHz channel at 6 GHz, and 35 MBd in a 40 MHz channel at 11 GHz). Fully parallel and pipelined architectures were adopted to accommodate modem designs with data rates up to 35 MBd.
The modulator chip accepts a pair of in-phase (I) and quadrature (Q) data streams and generates a bandlimited IF output signal at a rate of 4 samples per symbol (e.g., 140 MHz for a 35 MBd system). This signal must then be D/A converted and translated to the appropriate RF carrier frequency for transmission. The I and Q input symbols to the modulator chip can each have up to 8 b of precision; thus, any signal constellation in a space of 256 × 256 points can be generated. The receiver must translate the RF input signal to an IF frequency equal to the symbol rate, where it is digitized at a rate of 4 samples per symbol. The demodulator chip accepts the digitized IF signal and generates a pair of I and Q filtered baseband signals at a rate of 2 samples per symbol. Clock and carrier recovery functions must be implemented external to the demodulator chip. When the modulator and
$H_R(\omega) = H_T^*(\omega).$
(2)
The serial input data stream is first converted into a lower rate symbol stream which can be interpreted as the real and imaginary parts of a complex stream of impulses $x(nT) = x_r(nT) + j x_i(nT)$. Different modulation formats are accommodated by specifying different signal constellations for $x(nT)$.
The impulses $x_r(nT)$ and $x_i(nT)$ are then filtered by a pair of identical interpolating low-pass filters which generate the real and imaginary parts of the complex bandlimited baseband signal $y(nT') = y_r(nT') + j y_i(nT')$, where $T' = T/K$, and $K$ is the integer factor by which the sampling rate must be increased in order that Nyquist's sampling theorem is not violated by the modulation process. The resulting modulated signal $z(nT')$ is therefore given by
(1)
The demodulator shown in Fig. 1(b) performs the inverse function of the modulator. The digitized IF signal is quadrature downconverted to baseband and low-pass filtered by the receive square-root Nyquist matched filters. Ideally, the outputs of the receive matched filters, when sampled every $T$ seconds, should correspond to the original transmitted symbols $x_r(nT)$ and $x_i(nT)$.
The transmit and receive filters are critical elements in the modulator and demodulator. They must meet very strict frequency-domain specifications imposed by FCC masks and they must achieve a very small level of time-domain ISI in order to accommodate higher order modulation formats such as 256-QAM. It is well known that the optimum partitioning of the transmit and receive fil- The EVENISI and ODDISI terms correspond to the ISI in the in-phase and quadrature rails of the demodulator.
A. Filter Coefficient Optimization
A direct implementation of the 40-tap matched filter would require 40 multipliers, which would consume a substantial amount of silicon area and, hence, would not permit a single chip implementation. Furthermore, the maximum speed of operation would be severely limited due to the numerous multiply and accumulate operations required. In addition, implementing a fixed coefficient digital filter with multipliers is very inefficient. A much more efficient realization involves the use of a canonic signed-digit (CSD) representation for each of the coefficients. A CSD code represents the filter coefficients as sums and differences of several powers of two. Since power-of-two multiplications can be obtained for free in a dedicated hardware implementation, the use of a CSD representation results in a substantial reduction in hardware complexity.
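As a rough illustration of the CSD idea described above, the following Python sketch recodes an integer-scaled coefficient into signed digits and then multiplies using shifts and adds only. The names `to_csd` and `csd_multiply` and the sample value 119 are hypothetical and are not taken from the chip design. The recoding shown is the standard nonadjacent-form conversion, not the coefficient optimization procedure used for the filters.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0, +1) of an integer, least significant first."""
    digits = []
    while value != 0:
        if value % 2 == 0:
            digit = 0
        else:
            # Pick +1 or -1 so the remainder is divisible by 4; this forces
            # the next digit to be zero, so no two nonzero digits are adjacent.
            digit = 2 - (value % 4)
        digits.append(digit)
        value = (value - digit) // 2
    return digits


def csd_multiply(sample, digits):
    """Multiply by shift-and-add: one shifted term per nonzero digit."""
    return sum(d * (sample << k) for k, d in enumerate(digits) if d != 0)


if __name__ == "__main__":
    coeff = 119                      # hypothetical integer-scaled coefficient
    digits = to_csd(coeff)
    nonzero = sum(1 for d in digits if d)
    print(digits, "nonzero digits:", nonzero, "adders:", nonzero - 1)
    print(csd_multiply(13, digits) == 13 * coeff)
```

In this sketch a coefficient with N nonzero digits needs N - 1 adders, since the shifts are hardwired.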
A CSD coefficient optimization technique for conventional FIR filters is presented in [6]. However, the optimization problem for data transmission filters is complicated by the fact that time-domain ISI constraints as well as frequency-domain attenuation constraints are imposed on the design. The algorithm in [6] was modified [7] to incorporate the additional ISI constraints, and a 40-tap filter with CSD coefficients was designed. The coefficients were restricted to have at most 4 nonzero digits; thus, each coefficient can be realized in hardware with at most 3 adders and 4 hardwired shifts. The stopband attenuation of the optimized CSD filter is 53 dB, and the residual ISI given by (4a) is -55.9 dB. The baseband magnitude response of the ideal and CSD filters is shown in Fig. 2. The IF output spectrum of the modulator is shown in Fig. 3, along with the FCC mask for a 35 MBd symbol rate in a 40 MHz microwave radio channel. Note that only a very small amount of stopband attenuation is sacrificed by the filter with quantized coefficients. However, the hardware implementation of the CSD transmit and receive filters becomes an order of magnitude simpler than the original filters, since each multiplier has been replaced with at most 3 adders. It is readily apparent from Fig. 3 that the 40-tap filter design greatly exceeds the FCC mask specifications in the stopband. A 40 dB stopband attenuation with a much more gradual roll-off would have been sufficient and could have been accomplished with fewer filter taps. The primary motivation for increasing the stopband attenuation of the digital matched filters is to minimize the ISI degradation resulting from the transition bands and stopbands of the analog filters in the system. Ideally, the passbands of all analog filters in the system extend out to the stopband edges of the modulator output spectrum and have zero ripple and exactly linear phase.
The transition bands and stopbands of the analog filters would then overlap with the stopband of the transmitted IF spectrum. The residual ISI of -55.9 dB resulting from the coefficient optimization of the digital filter will be degraded by any additional filtering in the system. However, if the stopband attenuation of the digital matched filters is large, then the transition bands and stopbands of the analog filters will have a negligible effect on the transmitted signal. But if the digital filters have a gradual roll-off with a smaller stopband attenuation, then the effect of the analog filters will be more pronounced and the ISI degradation will be greater.
In practice, it will probably turn out that the passband ripple and passband group delay distortion of the analog filters will have a much more serious effect on the ISI than the transition bands and stopbands. However, it was decided that the additional robustness of the ISI performance resulting from a larger stopband attenuation in the digital filter was worth the additional chip complexity.
B. Architecture Simplifications
Choosing the IF center frequency of the modulator and demodulator to be 1/4 of the sampling rate, or equivalently, to be equal to the symbol rate 1/T, results in substantial architecture simplifications. By selecting this center frequency, the cosine and sine waveforms needed in the mixing function can be sampled at 0°, 90°, 180°, and 270°, thereby producing samples of values 1, 0, -1, 0 for the cosine waveform and 0, 1, 0, -1 for the sine waveform. These values eliminate the need for high-speed digital multipliers and adders to implement the mixing functions in both the modulator and demodulator. Instead, a 2-to-1 multiplexer and an inverter can perform the mixing process in the modulator, and a 1-to-2 demultiplexer and an inverter can perform the mixing process in the demodulator. In addition, since half of the cosine and sine samples are zero, the two identical 40-tap FIR transmit filters required for the I and Q rails in the modulator can be replaced by only one 40-tap filter which can simultaneously process the data for both the I and Q rails. A further simplification in the modulator architecture can be obtained by taking advantage of the fact that the transmit filter is an interpolating FIR filter with a 1:4 interpolation ratio. Thus, the single 40-tap filter can be broken down into four 10-tap "subfilters," which can process the I and Q data in parallel. Moreover, these 10-tap "subfilters" can be clocked at a rate equal to the symbol rate 1/T, rather than the output oversampled clocking rate of 4/T. Hence, a tremendous amount of hardware devoted to pipelining overhead can be avoided, since the clock period is increased by a factor of 4. Additionally, the inverter needed to implement the -1 multiplication in the mixing process can be eliminated by simply negating half of the 40 filter coefficients. Thus, the only hardware needed to implement the modulator is four 10-tap FIR filters, obtained from the 40-tap transmit filter, and a 4-to-1 multiplexer.
The final modulator architecture is shown in Fig. 4(a), which implements the functions in the shaded portion of Fig. 1(a). It can be seen that while the input clock to the modulator is at a rate equal to 4/T, internally over 90% of the modulator hardware is clocking at the much slower rate of 1/T, thereby considerably easing the circuit design requirements.
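The equivalence between the straightforward modulator and the simplified one can be checked numerically. The sketch below is only an illustration under stated assumptions: the tap values are random integers rather than the actual CSD coefficients, the mixer sign convention z = y_I cos - y_Q sin is assumed, and the function names are hypothetical.

```python
import random

K = 4  # output samples per symbol (1:4 interpolation)


def direct_modulator(h, sym_i, sym_q):
    """Reference path: zero-stuff by 4, filter each rail with h, mix at fs/4."""
    cos_seq = [1, 0, -1, 0]   # cos(pi*n/2)
    sin_seq = [0, 1, 0, -1]   # sin(pi*n/2)
    out = []
    for n in range(K * len(sym_i)):
        y_i = y_q = 0
        for j, coeff in enumerate(h):
            idx = n - j
            if idx >= 0 and idx % K == 0:   # nonzero samples of the zero-stuffed input
                y_i += coeff * sym_i[idx // K]
                y_q += coeff * sym_q[idx // K]
        # Assumed sign convention: z = y_i*cos - y_q*sin
        out.append(y_i * cos_seq[n % 4] - y_q * sin_seq[n % 4])
    return out


def polyphase_modulator(h, sym_i, sym_q):
    """Four 10-tap subfilters at the symbol rate feeding a 4-to-1 multiplexer."""
    signs = [1, -1, -1, 1]                  # mixer signs folded into the taps
    subfilters = [[signs[p] * c for c in h[p::K]] for p in range(K)]
    out = []
    for m in range(len(sym_i)):
        for p in range(K):                  # multiplexer visits phases 0..3
            rail = sym_i if p % 2 == 0 else sym_q
            out.append(sum(c * rail[m - k]
                           for k, c in enumerate(subfilters[p]) if m - k >= 0))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    h = [rng.randint(-64, 64) for _ in range(40)]            # hypothetical taps
    sym_i = [rng.choice(range(-15, 16, 2)) for _ in range(32)]
    sym_q = [rng.choice(range(-15, 16, 2)) for _ in range(32)]
    print(direct_modulator(h, sym_i, sym_q) == polyphase_modulator(h, sym_i, sym_q))
```

Under the assumed sign convention, subfilters 1 and 2 carry negated taps, which corresponds to negating half of the 40 coefficients, and each subfilter produces one output per symbol period.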
Similar architectural simplifications can be achieved in the demodulator since the receive matched filter is a decimating filter with a 2:1 decimation ratio. The final demodulator architecture contains a 1-to-2 demultiplexer with inversion logic to implement the mixing process, and two 20-tap "subfilters" obtained from the 40-tap receive filter by selecting the even and odd tap coefficients. These "subfilters" are clocked at a rate of 2/T instead of the initial oversampled rate of 4/T. As a result, one 20-tap "subfilter" outputs the in-phase symbol component, and the other 20-tap "subfilter" outputs the quadrature symbol component. Output samples occur every T/2 seconds (ideal for T/2 fractionally-spaced adaptive equalizers). Again, while the demodulator appears to be operating at a rate equal to 4/T, internally over 90% of the demodulator hardware is clocking at a slower rate of 2/T. The final demodulator architecture is shown in Fig. 4(b), which implements the functions in the shaded portion of Fig. 1(b).
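The demodulator simplification can be checked in the same way. The following sketch compares a direct mix, filter, and decimate path against a demultiplexer feeding even-tap and odd-tap subfilters. It assumes the mixing convention I = r cos, Q = -r sin and uses arbitrary integer taps; the function names are hypothetical and the code is not a model of the actual chip.

```python
def direct_demodulator(h, r):
    """Reference path: mix at fs/4, filter each rail with h, keep every 2nd output."""
    cos_seq = [1, 0, -1, 0]
    sin_seq = [0, 1, 0, -1]
    # Assumed sign convention: I = r*cos, Q = -r*sin
    mix_i = [s * cos_seq[n % 4] for n, s in enumerate(r)]
    mix_q = [-s * sin_seq[n % 4] for n, s in enumerate(r)]

    def fir(x, n):
        return sum(c * x[n - j] for j, c in enumerate(h) if n - j >= 0)

    even_times = range(0, len(r), 2)        # 2:1 decimation
    return [fir(mix_i, n) for n in even_times], [fir(mix_q, n) for n in even_times]


def polyphase_demodulator(h, r):
    """1-to-2 demultiplexer with inversion, then two 20-tap subfilters at 2/T."""
    h_even, h_odd = h[0::2], h[1::2]
    n_out = (len(r) + 1) // 2
    # Even-indexed samples feed the I subfilter, odd-indexed samples feed the
    # Q subfilter; every other sample on each rail is inverted.
    rail_i = [(-1) ** q * r[2 * q] for q in range(n_out)]
    rail_q = [(-1) ** q * r[2 * q - 1] if q >= 1 else 0 for q in range(n_out)]

    def fir(taps, x, m):
        return sum(c * x[m - k] for k, c in enumerate(taps) if m - k >= 0)

    out_i = [fir(h_even, rail_i, m) for m in range(n_out)]
    out_q = [fir(h_odd, rail_q, m) for m in range(n_out)]
    return out_i, out_q
```

Because the zero-valued mixer samples never reach a multiplier, each subfilter only sees every other input sample, which is why it can run at 2/T.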
C . System-Level Considerations
The D/A converter and A/D converter play key roles in determining the overall system performance due to their critical location within the flow of data. For the all-digital architecture in Fig. 1, the D/A converter is placed at the IF output of the modulator where the data rate is 4/T, whereas for a typical analog architecture, the D/A's are in the baseband I and Q rails and thus operate at the symbol rate of 1/T. For the prototype modulator chip, a maximum symbol rate of 35 MBd is projected. Hence, the D/A converter must be capable of operating at 140 MHz.
An unavoidable side effect of the D/A converter is the introduction of sin(x)/x amplitude distortion into the transmit spectrum. Thus, an x/sin(x) compensation filter is required to equalize the sin(x)/x frequency response roll-off of the D/A converter. A digital x/sin(x) compensation filter chip has been designed to operate with the modulator chip [8]. A digital compensation filter has the advantage that the x/sin(x) compensation is accurate for all data rates, whereas an analog compensation filter compensates the spectrum distortion only for a single data rate.
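To give a feel for the size of this distortion, the sketch below evaluates the sin(x)/x roll-off across the transmit band. It assumes an ideal zero-order-hold D/A converter, for which x = πf/f_s, and takes the band edges from the 35% excess bandwidth at a 35 MBd symbol rate. The function names are hypothetical, and the actual compensation filter design is not reproduced here.

```python
import math


def zoh_gain(freq_hz, sample_rate_hz):
    """Amplitude response sin(x)/x of an ideal zero-order-hold D/A converter."""
    x = math.pi * freq_hz / sample_rate_hz
    return 1.0 if x == 0 else math.sin(x) / x


def compensation_gain_db(freq_hz, sample_rate_hz):
    """Gain in dB that an x/sin(x) equalizer must supply at freq_hz."""
    return -20.0 * math.log10(abs(zoh_gain(freq_hz, sample_rate_hz)))


if __name__ == "__main__":
    symbol_rate = 35e6
    sample_rate = 4 * symbol_rate                     # 140 MHz D/A clock
    half_bw = 0.5 * 1.35 * symbol_rate                # 35% excess bandwidth
    for f in (symbol_rate - half_bw, symbol_rate, symbol_rate + half_bw):
        print(f / 1e6, "MHz:", round(compensation_gain_db(f, sample_rate), 2), "dB")
```

The response depends only on the ratio f/f_s, which is consistent with the observation that a digital equalizer clocked with the D/A converter stays matched when the data rate changes.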
Special attention must also be given to the A/D converter in the demodulator. As shown in Fig. 1(b), the A/D converter is placed at the IF front-end of the demodulator where the data rate is 4/T, whereas for a typical analog architecture, the A/D's are in the baseband I and Q rails and need only operate at the symbol rate of 1/T. Therefore, to accommodate a symbol rate of 35 MBd, the A/D converter must be capable of operating at 140 MHz. The required wordlengths of the D/A and A/D converters are discussed in Section III.
The requirement to have the IF center frequency equal to the symbol rate greatly simplifies the digital chip architecture, but it does slightly complicate the system architecture. The majority of microwave digital radio systems in existence today use standard IF frequencies of 70 or 140 MHz, and thus, an additional mixer stage would be required to translate the IF to one of these standard frequencies. Furthermore, a fairly sophisticated IF filter is required after the first mixer stage to reject image frequencies. For example, with a 35% excess bandwidth factor and a 35 MBd symbol rate, the stopband-to-stopband edge bandwidth of the modulator output signal is 47.25 MHz (see Fig. 3 ) and the spectral separation of the next image band is only 22.75 MHz. Thus, the IF filters must have an approximately linear phase response, low passband ripple, 47.25 MHz passband bandwidth, and a stopband-to-stopband edge bandwidth of 92.75 MHz, which are fairly challenging specifications.
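The figures quoted above follow from simple arithmetic, sketched below. The code assumes that the image band is the mirror of the signal band about zero frequency (equivalently, about half the sampling rate when f_s = 4 f_IF), so the band centers are 2 f_IF apart. The function name and return layout are hypothetical.

```python
def if_filter_specs(symbol_rate_mhz, excess_bandwidth):
    """Return (signal bandwidth, gap to image band, filter stopband-to-stopband width) in MHz."""
    if_center = symbol_rate_mhz                       # IF equals the symbol rate
    signal_bw = symbol_rate_mhz * (1.0 + excess_bandwidth)
    # Band centers are 2*f_IF apart, so the empty gap is what remains
    # after subtracting one full signal bandwidth.
    image_gap = 2.0 * if_center - signal_bw
    # The filter may roll off across the gap on both sides of the passband.
    stopband_bw = signal_bw + 2.0 * image_gap
    return signal_bw, image_gap, stopband_bw


if __name__ == "__main__":
    print(if_filter_specs(35.0, 0.35))
```

For the 35 MBd case this reproduces the 47.25 MHz, 22.75 MHz, and 92.75 MHz values given in the text, to within floating-point rounding.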
III. PIPELINING AND WORDLENGTH ISSUES
Since speed was the primary consideration in the implementation of the prototype modulator and demodulator chips, fully parallel pipelined architectures were required. The transpose direct-form FIR filter structure with carry-save addition shown in Fig. 5 was chosen due to its inherent high-speed operation and its suitability for pipelining. Each filter coefficient in Fig. 5 was then replaced by its corresponding 4-digit CSD representation to obtain the final filter structure. Because of the architectural simplifications discussed in Section II-B, pipelining within the transmit filter was found to be unnecessary. However, for the demodulator receive filter, pipelining was found to be necessary since the receive filter is clocked at 2/T, while the transmit filter is clocked at 1/T. An example of the pipelining technique used in the transpose direct-form receive CSD filter is shown in Fig. 6. Pipeline registers are inserted between every two adder stages. Note that, in general, further pipelining within the transmit and receive filters could be implemented if the system specifications should warrant the additional speed. The adders within the prototype transmit and receive filters were implemented as carry-save adders so that carry signals were not required to ripple through the adders at each stage. At the end of each "subfilter" in the modulator and demodulator, pipelined carry ripple adders were used to merge the carry bits with the sum bits to obtain the final two's complement output.
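The carry-save principle mentioned above can be illustrated with a few lines of Python. This is a behavioral sketch on unbounded integers, with hypothetical function names; it does not model the chip's wordlengths, pipeline registers, or the transpose-form filter wiring.

```python
def carry_save_add(a, b, c):
    """3:2 compression: reduce three operands to a sum word and a carry word.

    Each bit position is an independent full adder, so no carry has to
    ripple across the word.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def accumulate_carry_save(terms):
    """Accumulate terms in redundant (sum, carry) form, then merge once."""
    sum_word, carry_word = 0, 0
    for term in terms:
        sum_word, carry_word = carry_save_add(sum_word, carry_word, term)
    # A single carry-propagate addition at the very end yields the
    # ordinary two's complement result.
    return sum_word + carry_word


if __name__ == "__main__":
    terms = [37, -12, 250, -199, 8, 3]
    print(accumulate_carry_save(terms) == sum(terms))
```

The delay of each accumulation step is that of one full adder regardless of wordlength; only the final merge involves carry propagation, which matches the role of the ripple adders at the end of each subfilter.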
Finite wordlength effects were also considered in the implementation of the transmit and receive digital filters. The tolerable level of ISI within the overall modem becomes an important factor in determining the internal wordlengths of the modulator and demodulator, as well as the input and output wordlengths. A simulation program was written which emulates all of the bit-level operations that occur within the modulator and demodulator chip architectures. By varying the modulator and demodulator internal and input/output wordlengths, the finite precision effects on the ISI could be determined, and from these results, the appropriate wordlengths for the VLSI implementation were selected. For the simulation results that follow, an 8 b input I and Q symbol wordlength for the modulator was assumed. In addition, the modulator output was directly connected to the demodulator input. This latter assumption is equivalent to a distortionless transmission channel and perfect D/A and A/D conversion; therefore, the simulation results represent an upper bound on the achievable performance in a practical modem implementation. The simulations calculated the resulting ISI SNR, which is defined as
$$\text{ISI SNR} = 10 \log_{10} \frac{\text{Signal Power}}{\text{Quantization Noise Power}} = 10 \log_{10} \frac{E[x^2(nT)]}{E[(x'(nT) - x(nT))^2]} \qquad (5)$$
where $x(nT)$ are the transmit symbols and $x'(nT)$ are the receive matched filter output samples. The expectations in (5) were computed as time averages over 16K random input symbols of a 256-QAM signal constellation. The I channel and Q channel ISI SNR were computed separately and the lesser value was selected as the overall ISI SNR. For the first simulation, all wordlengths were kept at their maximum values (i.e., no rounding or truncation errors occurred anywhere within the architecture). For this "ideal" case the ISI SNR was determined to be 54.3 dB. This is the maximum achievable performance of the finite wordlength architecture for the given set of CSD filter coefficients.
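A measurement of this kind might be coded as follows. The sketch assumes that the signal power is the time average of the squared transmit symbols and that the noise power is the time average of the squared difference between received and transmitted values, with the receive samples already scaled to the transmit symbol levels. The function names are hypothetical, and the bit-level modem simulation itself is not reproduced.

```python
import math


def isi_snr_db(tx, rx):
    """ISI SNR in dB for one rail, using time averages over the symbols."""
    signal_power = sum(x * x for x in tx) / len(tx)
    noise_power = sum((y - x) ** 2 for x, y in zip(tx, rx)) / len(tx)
    # noise_power must be nonzero; an error-free rail has no finite SNR.
    return 10.0 * math.log10(signal_power / noise_power)


def overall_isi_snr_db(tx_i, rx_i, tx_q, rx_q):
    """Report the worse (smaller) of the I-rail and Q-rail values."""
    return min(isi_snr_db(tx_i, rx_i), isi_snr_db(tx_q, rx_q))


if __name__ == "__main__":
    tx = [1, -1, 3, -3]
    rx = [x + 0.01 for x in tx]       # toy data with a small constant error
    print(round(isi_snr_db(tx, rx), 2), "dB")
```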
Next, the modulator internal wordlength was varied between 8 and 20 b and the modulator output and demodulator internal and output wordlengths were kept equal to their original full precision values. The resulting ISI SNR is plotted in Fig. 7. The SNR performance varies considerably for internal wordlengths between 8 and 14 b and shows little improvement when internal wordlengths over 14 b are used. Consequently, the internal wordlength of the prototype modulator was selected to be 14 b.
Next, the demodulator internal wordlength was varied between 10 and 24 b. For these calculations the modulator internal wordlength was fixed at 14 b, and various modulator output wordlengths (equal to the demodulator input wordlength) were used as parameters to generate a family of ISI SNR curves. The results, plotted in Fig. 8, show that while a demodulator internal wordlength of 14 b is sufficient for a modulator output wordlength of 8 b, a demodulator internal wordlength of 16 b provides satisfactory ISI SNR for up to 14 b of modulator output wordlength. Hence, an internal demodulator wordlength of 16 b was selected for the prototype demodulator.
In Fig. 9, the ISI SNR is plotted as a function of the output wordlength of the modulator, given that the internal wordlengths of the modulator and demodulator were lectable rounding of the modulator and demodulator outputs from 8 to 14 b was implemented. Rounding results in a nontrivial increase in the ISI SNR over that obtained by straight truncation at the modulator and demodulator outputs. This increase in ISI SNR can be seen in Fig. 10, where the ISI SNR was measured while varying the demodulator output wordlength between 8 and 16 b. For these results the modulator output wordlength was fixed at 10 b. It can be seen that the increase in ISI SNR with rounding versus truncation is especially evident when low demodulator output wordlengths are used.
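The advantage of rounding over truncation can be seen in a small experiment on two's complement integers. The sketch below drops a number of least significant bits either by truncation or by adding half an LSB before the shift, and compares the mean-square error. The function names are hypothetical, saturation on overflow is ignored, and the numbers are not the chip's measured results.

```python
def truncate(value, drop_bits):
    """Discard the low bits (arithmetic shift rounds toward minus infinity)."""
    return value >> drop_bits


def round_half_up(value, drop_bits):
    """Add half of the new LSB before discarding the low bits."""
    return (value + (1 << (drop_bits - 1))) >> drop_bits


def mean_square_error(quantizer, drop_bits, values):
    """Mean-square error, in units of the original LSB squared."""
    scale = 1 << drop_bits
    return sum((quantizer(v, drop_bits) * scale - v) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    values = range(-512, 512)
    print("truncation:", mean_square_error(truncate, 4, values))
    print("rounding:  ", mean_square_error(round_half_up, 4, values))
```

Truncation always errs in the same direction, so its error has a nonzero mean in addition to its variance, whereas rounding centers the error near zero. This is one way to see why the benefit is largest when few output bits are kept.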
A. Simulation Examples
To demonstrate the versatility of the all-digital QAM modulator and demodulator chip set, various QAM formats were simulated, and the resulting eye patterns and symbol constellations at the output of the demodulator were plotted. For these simulations, the actual modulator/demodulator VLSI implementation wordlengths were used, i.e., a modulator input wordlength of 8 b, a modulator internal wordlength of 14 b, and a demodulator internal wordlength of 16 b. As before, the modulator and demodulator were connected back-to-back, thereby simulating an ideal distortionless channel (and also perfect clock and carrier recovery). The wordlength at the IF interface between the modulator and demodulator chips was varied to simulate the effects of finite precision D/A and A/D conversion.
In the first example, a 16-QAM signal constellation was simulated with a 6 b D/A and A/D wordlength. An ISI SNR of 31.8 dB was measured and the resulting baseband eye diagram and symbol constellation at the demodulator output are shown in Fig. 11. In the second example, a 256-QAM input signal constellation was simulated with
IV. CONCLUSIONS
Several architectural simplifications and finite wordlength optimizations have been presented for implementing a high-performance all-digital quadrature modulator and demodulator which allow single-chip high-speed VLSI implementations. Prototype chips have been designed and submitted for fabrication to the TRW Microelectronics Center in a 1.25 μm CMOS process. These chips are projected to accommodate any symbol rate up can now be applied in high bit-rate digital radio modem designs.
