ABSTRACT Regenerative, on-board processing FDMA/TDM payloads have recently been proposed as valid candidates for user-oriented satellite systems. Both business traffic for fixed service and mobile satellite systems can potentially take advantage of the peculiarities of such payloads, which essentially require multicarrier demodulation (MCD) of the uplink FDMA carriers to recover the individual modulating streams, which are in turn TDM-formatted to modulate a single downlink carrier. Two main functions are therefore implemented by an MCD: demultiplexing (DEMUX) and demodulation (DEMOD). We focus here only on a digital implementation of the MCD, in view of its advantages: flexibility, better performance and VLSI integrability. This paper is concerned with suitable digital techniques to implement an on-board MCD. In particular, the impact of network clock synchronization on the overall MCD complexity is investigated in detail. The digital architecture of the proposed MCD can be adapted to different digital modulation techniques. However, we focus here only on its application to QPSK signals, considering the interest of this modulation scheme for digital satellite communications.
I INTRODUCTION
The evolution of satellite communications requires advanced satellites which can operate with earth stations of reduced complexity. This paper is concerned with satellites with on-board processing capability. In particular, an on-board processing system which receives an input FDMA signal and supplies an output to interface the TDM links is considered. It must therefore separate each individual radio channel, demodulate it, and switch it correctly to the appropriate downlink channel. An appropriate name for the on-board processing system performing the first two operations is the "multicarrier demodulator" (MCD). Two main functions are implemented by an MCD: demultiplexing (DEMUX) and demodulation (DEMOD). The focus here is only on a digital implementation of the MCD [1], [2] because, in perspective, it offers several advantages such as flexibility, VLSI integrability and better efficiency.
II DEMULTIPLEXER WITH INTEGRATED I-Q SPLITTING
The demultiplexing of an FDMA signal can be performed following two basic approaches: block methods and per-channel methods. A complete survey of demultiplexing techniques is presented in [1]. The aim of this paper is to show that the separation of the I-Q components of the incoming QPSK signal can be efficiently performed in the DEMUX. Therefore, we focus here on the Analytic Signal (AS) approach, which is a per-channel approach able to exploit this possibility. A detailed description of the AS principle is given in [1]-[3] and, for the sake of brevity, it will not be recalled here. The structure of the DEMUX according to the AS method is shown in Fig. 1 (DEMUX section). The FDMA input signal, after appropriate analog down-conversion to a low frequency range, is sampled according to the sampling theorem at the high-rate frequency $f_u = 1/T_u$ and processed in order to obtain $N_c$ TDM digital signals, each sampled at the low-rate frequency $f_d = 1/T_d = f_u/N_c$, $N_c$ being the number of multiplexed channels. In Fig. 1, $H_i(fT_u)$ and $H'_i(fT_u)$ represent the conjugate symmetric and antisymmetric parts, respectively, of the high-rate complex bandpass filter $B_i(fT_u)$, which can be regarded as a frequency-translated version of a low-pass prototype $E(fT_u)$ such that:
where $W$ is the channel spacing. In the same figure, $G_i(fT_d)$ represents a real low-rate filter, which integrates the required pulse-shaping function. At the output of the high-rate filters $H_i(fT_u)$, $H'_i(fT_u)$ we have the real and imaginary parts of the desired QPSK signal. The decimation by a factor $N_c$, equal to the number of input channels, translates these two signals to baseband. It is straightforward to verify that, by performing the multiplications by the terms $\cos(n\pi/2)$, $\sin(n\pi/2)$ shown in Fig. 1 (which do not influence the DEMUX implementation complexity, since these terms only take the values 0 and $\pm 1$), the separation of the I-Q components of the received QPSK signal is obtained after low-pass filtering (filters $G_i(fT_d)$). It is evident from Fig. 1 that integrating the pulse-shaping filtering and the separation into I-Q components into the low-rate stage of the DEMUX avoids the use of an additional low-pass filter in the DEMOD and therefore reduces the implementation complexity of the MCD. Another interesting feature of the implementation structure of Fig. 1 is that only processing of real quantities is required. The overall number of operations required per input channel and per second can be estimated as a function of the channel spacing $W$, the number of channels $N_c$, and the filtering bandwidth $B$ as [2]:
where K is given by:
The terms $\delta_1$ and $\delta_2$ denote the overall acceptable in-band and out-of-band ripples, respectively, derived according to given system specifications; an example of a filter design procedure is reported in [2]. It follows from the previous equations that, for specified values of $B$ and $N_c$, an optimum value $W_o$ of the channel spacing can be found which achieves the lowest number of operations.
However, since an integer number of samples per symbol is convenient for the subsequent demodulation operation, a suboptimum value of $W$ closest to $W_o$ is generally used. To this end, a suitable choice turned out to be a DEMUX output sampling frequency $2W = 1.5R$, with $R$ the transmission rate. Since a QPSK symbol carries two bits, this corresponds to 3 samples per symbol.
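The remark made earlier in this section, that the multiplications by $\cos(n\pi/2)$ and $\sin(n\pi/2)$ do not add to the DEMUX complexity, can be illustrated with a short sketch. It only shows that the two sequences cycle through 0 and $\pm 1$, so each "multiplication" reduces to passing, zeroing or negating a sample. The function name and the test vector are illustrative assumptions and are not taken from Fig. 1.

```python
import numpy as np

# Values taken by cos(n*pi/2) and sin(n*pi/2) for n = 0, 1, 2, 3, then repeating.
COS_TABLE = (1, 0, -1, 0)
SIN_TABLE = (0, 1, 0, -1)


def mix_without_multipliers(x, table):
    """Return x[n] * table[n % 4] using only pass / zero / negate operations."""
    out = np.zeros(len(x), dtype=float)
    for n, sample in enumerate(x):
        weight = table[n % 4]
        if weight == 1:
            out[n] = sample
        elif weight == -1:
            out[n] = -sample
        # weight == 0: the output sample stays at zero
    return out


# Hypothetical decimated samples, used only to exercise the function.
x = np.arange(1.0, 9.0)
via_cos = mix_without_multipliers(x, COS_TABLE)
via_sin = mix_without_multipliers(x, SIN_TABLE)
```

In a hardware realization the same effect would presumably be obtained by sign inversion and sample selection, which is why no multiplier is counted for this step.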
III ALTERNATIVES FOR DEMODULATOR IMPLEMENTATION
This section considers two alternatives for the digital implementation of a demodulator suitable for QPSK signals. In particular, simplified carrier and clock recovery approaches, necessary to perform coherent demodulation of the demultiplexed QPSK signals, are described. The benefits of network clock synchronization, in terms of a reduction of the overall MCD implementation complexity, are also investigated.
a.1 Nonlinear estimation method of QPSK-modulated carrier phase.
The block diagram of the phase estimator considered here is shown in Fig. 1. Its principle of operation is described in [4]. Interesting features of this carrier phase estimator are that preambles can usually be avoided and that a short and well-defined acquisition time is required. The influence of a finite arithmetic implementation on the estimated carrier phase can be derived only by simulation. In Fig. 2 the mean square error on the carrier phase is shown as a function of $E/N_0$, with $E$ the energy per bit and $N_0$ the one-sided noise power density. A floating point implementation (curve a) and a finite arithmetic implementation with $b_f = 6$ bits (curve b) are considered. It can be noted from this figure that a carrier phase mean square error of less than 5 degrees can be achieved for $E/N_0$ greater than 5 dB with $b_f = 6$ bits. This quantization loss can be significantly reduced by a more accurate quantization process. Naturally, this has a direct impact on the size of the ROM. The implementation complexity of the proposed carrier phase estimation method, taking into account that the separation into I-Q components of the QPSK signal is performed in the DEMUX, can be derived as:

The simplified QPSK timing-error detector approach presented herein avoids the use of any interpolation/decimation and uses three samples per symbol to perform symbol detection. The outputs of the low-rate filters of the DEMUX ($G_i$), which also carry out the pulse shaping with an appropriate roll-off factor, are two real sequences $\{y_I(\cdot)\}$ and $\{y_Q(\cdot)\}$.
Timing information must be recovered from these sequences. Symbols are transmitted synchronously, spaced by the symbol interval. Each sequence has three samples per symbol, and the samples of the in-phase sequence $\{y_I(\cdot)\}$ and of the quadrature sequence $\{y_Q(\cdot)\}$ are time-coincident.
The main goal of the proposed timing-error detector is to correctly select the set of three samples that belong to the same symbol; in other words, the timing correction for the considered timing-error detector algorithm consists in correctly partitioning the received samples into sets of three samples, each set belonging to a unique symbol. The implementation complexity of the proposed timing estimation method can be derived as:

$A = 1.5\,R$ adds/sec

By comparing the implementation complexity of the proposed method with that of other digital approaches, for example that proposed by Gardner in [5], it is evident that a considerable reduction is achieved. However, the method proposed by Gardner [5] introduces a lower degradation than the simplified approach considered in this section. Lastly, the simplified timing-error detector represents a suitable choice for MCD systems in which the overall implementation complexity, and therefore the power consumption, is the primary concern and a tradeoff with the degradation introduced is possible.
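The partitioning task described above can be made concrete with a minimal sketch. The decision rule used here (accumulate $|y_I| + |y_Q|$ separately for the three possible sample offsets and take the largest as the symbol centre) is an assumption introduced only for illustration; it is not the detector of this paper, whose details are not reproduced here. The function names and the three-tap test pulse are likewise hypothetical.

```python
import numpy as np

SAMPLES_PER_SYMBOL = 3


def estimate_symbol_phase(y_i, y_q):
    """Return the offset (0, 1 or 2) whose samples have the largest summed magnitude."""
    mag = np.abs(y_i) + np.abs(y_q)
    usable = len(mag) - len(mag) % SAMPLES_PER_SYMBOL
    per_offset = mag[:usable].reshape(-1, SAMPLES_PER_SYMBOL).sum(axis=0)
    return int(np.argmax(per_offset))


def partition_into_symbols(y, centre_offset):
    """Group samples into rows of three, each row centred on one symbol."""
    start = (centre_offset - 1) % SAMPLES_PER_SYMBOL
    usable = (len(y) - start) // SAMPLES_PER_SYMBOL * SAMPLES_PER_SYMBOL
    return y[start:start + usable].reshape(-1, SAMPLES_PER_SYMBOL)


# Hypothetical test signal: 50 QPSK symbols, a simple [0.5, 1, 0.5] pulse,
# and an unknown delay of two samples in front of the burst.
rng = np.random.default_rng(0)
a_i = rng.choice([-1.0, 1.0], size=50)
a_q = rng.choice([-1.0, 1.0], size=50)
pulse = np.array([0.5, 1.0, 0.5])
y_i = np.concatenate([np.zeros(2), np.kron(a_i, pulse)])
y_q = np.concatenate([np.zeros(2), np.kron(a_q, pulse)])

centre = estimate_symbol_phase(y_i, y_q)
rows_i = partition_into_symbols(y_i, centre)
rows_q = partition_into_symbols(y_q, centre)
```

Each row of `rows_i` and `rows_q` then holds the three samples attributed to one symbol, with the centre sample in the middle column. The rule uses only magnitudes and accumulations, in the spirit of a multiplier-free detector, but no claim is made that its operation count matches the figure quoted above.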
b. Clock synchronous system
This section considers the benefits of network clock synchronization for the DEMUX and DEMOD design. When all the carriers are clock synchronized at the satellite receiver, only one sample per symbol, taken at the optimum decision instant, is needed at the DEMOD input. For the AS approach, the use of network clock synchronization results in a reduction of the implementation complexity, since in this case only one sample per symbol is required at the DEMUX output. Therefore, keeping the signal bandwidth and the channel spacing unchanged, a sampling-frequency reduction by a factor of three can be included in the low-rate stage of the DEMUX. The number of multiplications required per channel and per second is now given by:
From the previous equation it can be noted that a significant reduction in the number of DEMUX multiplications and additions is achieved by making use of network clock synchronization. Moreover, the carrier recovery circuit can also be simplified. In fact, taking into account that the carrier phase estimation method presented in sect. a.1 can operate with only one sample per symbol, the implementation complexity becomes:
For the clock recovery, only one clock error estimator is required for all the $N_c$ channels. It can estimate the clock error by operating in time sharing on all the channels. Hence the contribution of the clock recovery circuit to the overall MCD implementation complexity is practically negligible.
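To illustrate how a nonlinear carrier phase estimator of the kind referred to in sect. a.1 can operate on one sample per symbol, the following sketch uses the common fourth-power feedforward rule. This is an assumption: the exact nonlinearity of [4] and its ROM-based realization are not reproduced here, and the function name and the QPSK constellation at $\pi/4 + k\pi/2$ are illustrative choices.

```python
import numpy as np


def estimate_carrier_phase(y_i, y_q):
    """Fourth-power feedforward phase estimate from one sample per symbol.

    Assumes QPSK points at pi/4 + k*pi/2. The result lies in (-pi/4, pi/4],
    so the usual four-fold phase ambiguity is not resolved here.
    """
    z = np.asarray(y_i, dtype=float) + 1j * np.asarray(y_q, dtype=float)
    # The 4th power removes the modulation: every symbol maps to
    # -|z|^4 * exp(j*4*theta); the sign flip restores exp(j*4*theta).
    return float(np.angle(-np.sum(z ** 4)) / 4.0)


# Hypothetical noise-free test burst with a 0.2 rad carrier phase offset.
rng = np.random.default_rng(1)
k = rng.integers(0, 4, size=64)
true_phase = 0.2
samples = np.exp(1j * (np.pi / 4 + k * np.pi / 2 + true_phase))
estimate = estimate_carrier_phase(samples.real, samples.imag)
```

Because the estimate is formed from a block of symbols without feedback, no preamble or loop acquisition is involved, which is consistent with the features noted in sect. a.1.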
IV SYSTEM DESIGN AND PERFORMANCE
The MCD design is discussed in this section. The two different DEMOD implementation techniques are considered. In this regard, an important consideration is that the number of channels processed by the DEMUX ($N_c$) influences the input sampling frequency and consequently the processing rate and the complexity of the first stage of the MCD. In particular, a reasonable constraint is to require that the sampling frequency (clock) of the input A/D converter be close to its maximum possible value. Starting from these considerations, we have selected as design goals $N_c = 8$ and $N_c = 10$ at $R = 2048$ kbit/s. The design of the DEMUX according to the AS approach is presented first. The high-rate and low-rate lowpass prototypes have been designed as linear-phase FIR filters by using the equiripple method [6]. The low-rate lowpass prototype has been designed to include the required pulse-shaping function, with a 40% roll-off factor equally shared between the transmitter and the receiver. The implementation complexity in terms of multiplications per second and per channel is reported in Tab. 1 for the different considered values of $N_c$.
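The sampling rates implied by these design values can be worked out with a few lines of arithmetic. The sketch assumes that the low-rate frequency $f_d$ coincides with the DEMUX output rate $2W = 1.5R$ of Section II and that $f_u = N_c f_d$; the function name and dictionary keys are illustrative only.

```python
def mcd_rates(bit_rate_hz, n_channels):
    """Return the sampling rates implied by 2W = 1.5 R and f_d = f_u / N_c."""
    symbol_rate = bit_rate_hz / 2.0      # QPSK carries two bits per symbol
    f_d = 1.5 * bit_rate_hz              # assumed DEMUX output rate, 2W = 1.5 R
    f_u = n_channels * f_d               # input (A/D) sampling frequency
    return {
        "samples_per_symbol": f_d / symbol_rate,
        "f_d_hz": f_d,
        "f_u_hz": f_u,
    }


rates_8 = mcd_rates(2048e3, 8)
rates_10 = mcd_rates(2048e3, 10)
```

Under these assumptions the A/D converter would run at 24.576 MHz for $N_c = 8$ and at 30.72 MHz for $N_c = 10$, with three samples per symbol at the DEMUX output in both cases.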
A finite arithmetic implementation is necessarily required for any digital processing system. To this end, the filtering specifications and the finite precision design of the DEMUX have been chosen so as to introduce an acceptable degradation at each demultiplexer output with respect to the input signal-to-noise ratio $SNR_i$ [2]. The finite arithmetic wordlengths are reported in Tab. 2. In Fig. 3 the degradations in dB of the output signal-to-noise ratio introduced by the digital implementation of the DEMUX are reported as a function of $E/N_0$. There is good agreement between the results derived through the theoretical analysis [2] and those obtained through computer simulations. Moreover, the theoretical analysis considers the worst-case degradation.

The overall DEMOD implementation complexity is equal to 69.5 R (adds/ch/sec) with $N_E = 23$. The degradation of the DEMOD (DEMOD loss) is due to the phase jitter introduced by the carrier phase estimate and to the symbol timing offset introduced by the symbol timing estimate. In order to evaluate the loss due to the phase jitter and to the symbol timing offset, we have assumed them to be two independent noise contributions. The loss due to phase jitter can be derived through the following equation
where $\rho$ is equal to $E/N_0$ and $\alpha$, for moderate to high signal-to-noise ratios, can be assumed equal to $1/\sigma_\theta^2$, with $\sigma_\theta$ the root-mean-square (r.m.s.) value of the phase error. The degradation is reported in Fig. 4 as a function of $E/N_0$ for a finite precision implementation at 6 bits. In the same figure, the degradation due to the symbol timing offset, also including the effect of a finite precision implementation at 6 bits, is reported.

The benefits of network clock synchronization for the MCD design are now illustrated. The resulting DEMUX and DEMOD implementation complexity (not including the common clock recovery circuit), in terms of number of multiplications per channel and per second, is reported in Tab. 3 for $N_c = 8$ and $R = 2048$ kbit/s. In the same table, the implementation complexity achieved for DEMUX and DEMOD without network synchronization is also reported for comparison. From these results it can be pointed out that, when network clock synchronization is used, a reduction of the overall MCD implementation complexity is achieved.
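Returning to the phase-jitter loss introduced earlier in this section, its effect on QPSK detection can be explored numerically with a minimal sketch. The sketch assumes a Tikhonov-distributed phase error with parameter $\alpha = 1/\sigma_\theta^2$ and averages the standard QPSK bit error probability for a fixed phase error over that distribution. This is a textbook model chosen for illustration and is not claimed to be the loss equation of this paper; all function names are hypothetical.

```python
import math

import numpy as np


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def qpsk_ber_fixed_phase_error(rho, theta):
    """QPSK bit error probability for E/N0 = rho and a fixed phase error theta."""
    a = math.sqrt(2.0 * rho)
    return 0.5 * (q_function(a * (math.cos(theta) - math.sin(theta)))
                  + q_function(a * (math.cos(theta) + math.sin(theta))))


def qpsk_ber_with_jitter(rho_db, sigma_deg, n_points=20001):
    """Average the bit error probability over a Tikhonov phase-error density."""
    rho = 10.0 ** (rho_db / 10.0)
    alpha = 1.0 / math.radians(sigma_deg) ** 2
    theta = np.linspace(-math.pi, math.pi, n_points)
    # exp(alpha*(cos(theta) - 1)) is the Tikhonov density up to a constant;
    # normalizing numerically avoids overflow for large alpha.
    weight = np.exp(alpha * (np.cos(theta) - 1.0))
    weight /= weight.sum()
    ber = np.array([qpsk_ber_fixed_phase_error(rho, t) for t in theta])
    return float(np.sum(weight * ber))


ideal_ber = q_function(math.sqrt(2.0 * 10.0 ** (8.0 / 10.0)))
ber_5_deg = qpsk_ber_with_jitter(8.0, 5.0)
ber_small_jitter = qpsk_ber_with_jitter(8.0, 0.1)
```

Comparing `ber_5_deg` with `ideal_ber` at the same $E/N_0$ gives a feel for the degradation caused by an r.m.s. phase error of 5 degrees under the stated model; the corresponding loss in dB would follow from the increase in $E/N_0$ needed to restore the ideal error rate.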
