Abstract-This paper describes a microprocessor-based modem developed for use in a packet switching network over satellite channels. This digital modem can process both BPSK and QPSK packets with near-optimum error rate performance over channels with marginal signal energy-to-noise density ratio.
including the tracking of both symbol timing and carrier offset phase, phase ambiguity resolution and data symbol estimation.
The paper concludes with details on the performance of the modem receiving a packet corrupted by one or more errors due to noise on the channel.
In the packet-switched network for which this modem was designed, it is desirable to minimize the length of the preamble while retaining the ability of the modem to detect and demodulate packets. Because error control coding is used, the received symbol energy-to-noise density ratio ($E_s/N_0$) can be as low as 1 dB. This factor leads to the choice of a 96-symbol BPSK preamble composed of alternating zeros and ones. This preamble has the favorable properties of allowing an easy detection algorithm, plus a maximum number of symbol transitions upon which an estimate of symbol phase (timing) can be made.
THIS paper describes various detection, estimation and demodulation algorithms that have been developed and implemented in a microprocessor-based satellite communication modem. This modem was designed for use in a packet switching network, using binary or quaternary phase-shift-keyed (BPSK or QPSK) modulation. The requirements of a packet-switched network pose several challenges to the modem designer. Packets arrive at the receiver at random intervals from various transmitters in the network. The demodulator must quickly detect the presence of each packet and recover the data contained within it. Operation with limited channel bandwidth and a marginal signal-to-noise ratio requires the implementation of near-optimum detection, estimation and demodulation functions.
Communications users in a packet-switched network transmit blocks of data (packets) over a common channel. Packet lengths are typically on the order of $10^3$ data bits (the length of a given packet may even be a random variable within a fixed range). The packet protocols and scheduling algorithms, also major elements in the system design, are beyond the scope of this paper and generally do not impact modem design; however, other characteristics of packet-switched networks do influence the modem. The throughput of a packet-switched network is affected by factors such as the probability of packet contention (two or more users attempting to send a packet at the same time) and system overhead (such as the lengths of packet preambles and scheduling information), as well as the need for retransmission due to the possibility of

with a rate 1/2, constraint length 7, convolutional code and a Viterbi decoder. This combination results in a bit error probability of less than assuming near optimum demodulation. In addition, a parity tail or "checksum" inserted on the end of each packet (before convolutional encoding) is used to detect errors which are not corrected by the decoder. The preamble length, modulation and coding parameters were chosen based on the goal of maximizing network efficiency (throughput).
The satellite channel for this application is a relatively narrowband channel in a frequency-division multiple access (FDMA) system. The modem is designed for BPSK and QPSK modulation with a 32 kHz symbol rate (a symbol is one transmitted bit in BPSK, and 2 bits in QPSK). Serious constraints on out-of-band power require that a filter with a sharp rolloff in the stop band be used at the transmitter. Similarly, the receiver must use a selective filter to minimize the effects of adjacent and cochannel interference.
As an additional constraint, the transmit and receive filtering should not result in

Before developing the details of packet demodulation, it is useful to describe qualitatively the nature of packet transmission and demodulation. A packet is formed by the transmission of a 96-symbol BPSK (1 bit/symbol) preamble of alternating zeros and ones, followed by a number of BPSK (1 bit/symbol) or QPSK (2 bits/symbol) data symbols. These symbols include a fixed start-of-message sequence which is used for frame synchronization and phase ambiguity resolution. The packet is bandlimited at the transmitter and then modulated by quadrature carriers. The channel introduces
additive white Gaussian noise (AWGN) to the signal, which is then presented to the receiver. The analog portion of the receiver first mixes the incoming signal with quadrature sinusoids which are at a frequency within 2 kHz of the carrier (±2 kHz is the received carrier uncertainty). This carrier frequency offset and its phase must be estimated by the processor so that the unwanted modulation which results can be removed numerically. Before the in-phase and quadrature signals are sampled and sent to the processor, both are passed through low pass filters with a noise bandwidth of approximately 18 kHz. This wider filter (as opposed to the Nyquist filter with noise bandwidth of 16 kHz) was chosen so that, when the frequency offset was maximum (±2 kHz), the signal would not be seriously attenuated. The two filtered signals are then sampled at 64 kHz (twice the symbol rate) and presented to the microprocessor for numerical processing.
The numerical processing of a packet can be decomposed into two stages. The first, the preamble stage, involves the detection of the presence of a packet. Since the arrival of a packet is a random event, an estimate of the symbol phase (timing) must be made during this stage as well as estimates of the carrier frequency offset and phase. Once this has been completed, the processor (which has complete control of the sampling clock) adjusts the sampling instants based on the timing estimate, and proceeds to the second stage of processing.
The second stage, the data recovery stage, involves the tracking of both symbol timing and frequency offset and phase so that estimates of the data symbols can be made. For symbol time tracking, an error signal is derived from the incoming samples, which is used to drive a numerical first-order phase-locked loop. The output of the PLL is used to adjust the sampling clock. The carrier frequency offset and phase tracking involves the computation of an error signal which is used to drive a second-order phase-locked loop. (The error signal calculation for QPSK symbols differs slightly from the error signal used for BPSK.) The output from this second numerical PLL is used to numerically rotate the samples as they enter the processor.
In addition to these two tracking functions, the rotated data are filtered by a simple finite impulse response (FIR) digital filter which helps to reduce the effects of intersymbol interference by giving the received data a spectral shape which is closer to satisfying the Nyquist criterion. A flowchart of the entire computational process is shown in Figure I 
where the sequences $\tau_E$ and $\phi_E$ are the tracking errors in timing and carrier offset phase, respectively. Assuming these errors are small, the processor outputs the even sequence as data, since
coefficients. In the absence of intersymbol interference, the received waveform of the preamble can be approximated by a
2T, i.e., $0 \le n < 96$ during the preamble. Since $f_0 T$ is small, we can also make the approximation $\phi - \pi/4 = \phi' = \phi''$. These preamble sequences, embedded in the channel noise, are detected by the processor and then estimates of $\tau$, $f_0$ and $\phi$ are obtained.
The second stage of packet demodulation involves tracking both the symbol timing and carrier frequency offset and phase. These tracking functions are initialized based on the estimates obtained in the first stage. The derivation of the error signals for carrier frequency offset tracking is dependent upon the modulation of the data portion (BPSK or QPSK) of the

we would expect the magnitude squared of these filter outputs,
to remain approximately constant (proportional to the noise variance, $\sigma^2$) before the arrival of a preamble, and to begin to grow linearly as the preamble is presented to the processor. If we assume for the moment that $f_0 = 0$, then we could set a threshold to be used to declare the presence of a packet when $\max\{\gamma_0, \gamma_1\}$ crossed the threshold. We could then estimate $\tau$ based on the relative sizes of $\gamma_0$ and $\gamma_1$. Two problems exist with this scheme. First, $f_0$ is, in general, not zero, but is constrained to be in the range,
In the worst case ($|f_0 T| = \epsilon_{\max} = 1/16$), the filter outputs due to the signal would be zero if $N$ (the length of summation) were 16 (see Figure 3.2). Thus, we must choose $N$ to be relatively small (say, $N = 4$) so that we do not lose excessive signal energy due to the offset. Since it is desirable to have a longer summation (integration) period, a postdetection summation must be performed, i.e.,
(3.4) (3.5)
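The trade-off between the predetection and postdetection sums can be checked numerically. The sketch below is not the modem's code: it assumes one unit-amplitude complex sample per symbol with the preamble alternation already removed, and the function names are hypothetical. It evaluates the coherent sum of $N$ samples rotating at the worst-case offset $|f_0 T| = 1/16$, then forms a postdetection statistic as the sum of 16 squared magnitudes of 4-sample sums.

```python
import cmath
import math


def predetection_gain(n_sum, f0T):
    """Magnitude of the coherent sum of n_sum unit phasors that rotate
    by 2*pi*f0T radians per symbol (noise-free, idealized model)."""
    total = sum(cmath.exp(2j * math.pi * f0T * k) for k in range(n_sum))
    return abs(total)


def postdetection_statistic(samples, n_sum=4, n_post=16):
    """Sum of the squared magnitudes of n_post consecutive coherent sums,
    each taken over n_sum samples (overall interval n_sum * n_post)."""
    stat = 0.0
    for block in range(n_post):
        chunk = samples[block * n_sum:(block + 1) * n_sum]
        stat += abs(sum(chunk)) ** 2
    return stat


worst_offset = 1.0 / 16.0                      # worst-case |f0 T| from the text
gain_16 = predetection_gain(16, worst_offset)  # one full rotation: cancels
gain_4 = predetection_gain(4, worst_offset)    # short sum: small loss only
loss_db_4 = 20.0 * math.log10(gain_4 / 4.0)    # loss relative to zero offset

preamble = [cmath.exp(2j * math.pi * worst_offset * k) for k in range(64)]
stat_64 = postdetection_statistic(preamble)
```

Under these assumptions the 16-sample coherent sum vanishes, while the 4-sample sum loses a little under 0.9 dB, and the squared magnitudes can then be accumulated over the full 64 symbols without further loss from the offset.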
Since the postdetection summation interval was chosen as 16, the overall summation interval is 64. Since the preamble is 96 bits long, this scheme allows the processor a window of 32 bits in which to detect a preamble (it is desired not to detect too early for reasons that will become apparent). Note that we could still estimate the timing phase, but this leads us to the other problem with the previously described strategy, namely, the difficulty of determining an estimate of the phase from the values of $\gamma_0$ and $\gamma_1$. To obtain this estimate, the following approach was used. We can form the set of sequences,
which have maxima for values of $\tau$ which are between the points where $\gamma_0$ and $\gamma_1$ are maximum. Figure 3.3 shows a plot of the gammas vs. $\tau$ when the preamble is present without noise. It is easy to see that the maxima of these four sequences are equally spaced on the interval $0 < \tau < T$. Thus, the modified algorithm involves computing the four gammas, taking the maximum, say $\gamma_i$, and determining whether its value exceeds a predetermined threshold. If the threshold is exceeded, a timing phase estimate is based solely on which gamma, i.e., $\gamma_i$, is maximum. In the absence of noise, this leads to an error in the estimate, $\tau_E$, in the range $-T/8 < \tau_E < T/8$.
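The selection step described above can be sketched as follows. This is a minimal illustration, not the modem's microcode: the function name and the table `ASSUMED_PHASE_OF_GAMMA` are hypothetical, and the assignment of $\gamma_2$ and $\gamma_3$ to $T/4$ and $3T/4$ is an assumption (the text fixes only $\gamma_0$ at $\tau = 0$ and places $\gamma_2$, $\gamma_3$ between the maxima of $\gamma_0$ and $\gamma_1$).

```python
# Assumed timing phase (in units of the symbol period T) associated with
# each detection statistic. Only gamma_0 -> 0 is stated in the text; the
# remaining entries are an assumption of this sketch.
ASSUMED_PHASE_OF_GAMMA = {0: 0.0, 1: 0.5, 2: 0.25, 3: 0.75}


def detect_preamble(gammas, threshold):
    """Return (index, timing_phase) of the largest of the four statistics
    if it exceeds the threshold, otherwise None (no packet declared)."""
    if len(gammas) != 4:
        raise ValueError("expected four detection statistics")
    best = max(range(4), key=lambda i: gammas[i])
    if gammas[best] <= threshold:
        return None
    return best, ASSUMED_PHASE_OF_GAMMA[best]
```

Because the four candidate phases are spaced $T/4$ apart, choosing the nearest one leaves a residual error of at most $T/8$ in the noise-free case, which matches the range quoted in the text.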
In the presence of noise, this scheme leads to a slight bias toward $\gamma_2$ and away from $\gamma_3$. The reason for this is the fact that the cross correlation between the sequences $\eta_e$ and $\eta_o$ is not zero. This tends to increase the noise variance of the $\gamma$'s (since a sum of $\eta_e$ and $\eta_o$ is computed), while the noise variance

are real and fixed for a given $\tau$. We use the set of 16 $p_i$'s which compose the maximum $\gamma_i$, which are most likely to yield the sequence with the largest constant $c_i$. We can now obtain an estimate of $f_0$ by forming a series of discrete bandpass filters centered at various points in the range $-4\epsilon_{\max} < f_c T < 4\epsilon_{\max}$. The filter with the maximum energy at the output leads to the estimate of $f_0$. Note that, if the processor detects the preamble early, some of the early $p_i$'s will be composed of noise, which will only tend to degrade our estimate of $f_0$. In the actual modem, 16 discrete filters are computed using the discrete Fourier transform formula,

which would have a phase equal to $\phi$ (ignoring the noise term). Once this computation is complete, the processor is ready to begin stage two of the packet demodulation.
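The filter-bank estimate can be illustrated with a short sketch. The exact filter centers used in the modem are not recoverable from the text, so the following assumes 16 filters spaced evenly over $\pm 1/4$ cycle per $p_i$ sample (each $p_i$ spans 4 symbols, and $|f_0 T| \le 1/16$); the function name and parameters are hypothetical.

```python
import cmath
import math


def estimate_offset(p, n_filters=16, max_cycles_per_sample=0.25):
    """Evaluate a bank of DFT-style filters on the sequence p and return
    (frequency, phase) of the filter with the largest output energy.
    Frequency is in cycles per sample of p; phase is in radians."""
    spacing = 2.0 * max_cycles_per_sample / n_filters
    best = None
    for m in range(n_filters):
        freq = -max_cycles_per_sample + m * spacing
        output = sum(p[k] * cmath.exp(-2j * math.pi * freq * k)
                     for k in range(len(p)))
        energy = abs(output) ** 2
        if best is None or energy > best[0]:
            best = (energy, freq, output)
    _, freq, output = best
    return freq, cmath.phase(output)


# Noise-free test sequence: 16 samples rotating at 3/32 cycle per sample
# with an initial phase of 0.7 rad (values chosen only for illustration).
p = [cmath.exp(1j * (2.0 * math.pi * 0.09375 * k + 0.7)) for k in range(16)]
freq_est, phase_est = estimate_offset(p)
```

If each $p_i$ spans four symbols, as assumed here, the offset per symbol would be `freq_est / 4`. The phase of the winning filter output gives the carrier phase estimate referred to at the end of the paragraph.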
The second stage of packet demodulation involves the tracking of both symbol timing and carrier phase, and estimating the values of the data symbols (see Figure 3.6). The initialization of these tracking functions is based upon the estimates obtained from the preamble stage. We will return to this initialization procedure after considering the algorithms which implement these functions. The data demodulation stage of processing begins by sampling $y(t)$ at times $\tau_E(n)$ and $\tau_E(n) + T/2$, which results in the two samples $y_e(n)$ and $y_o(n)$.

Since $|f_0 T| \ll 1$, we can make the approximation $\phi' = \phi$.
The two samples are then rotated by the estimated phase angle $\hat{\phi}(n)$ and filtered by the finite impulse response (FIR) filter $H(z) = 1 + z^{-1}$. The filtered samples are

where $\phi_E(n) = \phi(n) - \hat{\phi}(n)$ and

$$\eta_e'(n) = \eta_e(n) + \eta_o(n-1), \qquad \eta_o'(n) = \eta_o(n) + \eta_e(n).$$

The FIR filter has a lowpass characteristic, since the frequency response of this filter is proportional to $\cos(\pi f T/2)$. This filter reduces the effects of intersymbol interference (ISI) by shaping the spectrum of the signal such that the resulting pulse shape, $f(t)$, more closely approximates a Nyquist pulse, i.e., one for which $f(t)|_{t=nT} = 0$ for $n \neq 0$.
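A brief sketch of the two-tap filter $H(z) = 1 + z^{-1}$ may help. It assumes the filter runs on the interleaved stream of even and odd samples, spaced $T/2$ apart, which is what the noise expressions imply; the function names are hypothetical.

```python
import cmath
import math


def fir_one_plus_delay(samples):
    """Apply H(z) = 1 + z^-1: each output is the current input plus the
    previous input (the sample before the first is taken as zero)."""
    out = []
    previous = 0
    for x in samples:
        out.append(x + previous)
        previous = x
    return out


def magnitude_response(f_times_T):
    """|H| at frequency f for a sample spacing of T/2, with the frequency
    given as the product f*T: H = 1 + exp(-j*2*pi*f*T/2)."""
    return abs(1 + cmath.exp(-1j * math.pi * f_times_T))
```

The magnitude equals $2|\cos(\pi f T/2)|$, so the response is largest at $f = 0$ and has a null at $f = 1/T$, which is the lowpass behavior described in the text.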
The symbol timing function is performed on the real (I channel) portion of $a_e(n)$, so that this algorithm can be used for both BPSK and QPSK packets (in the BPSK case, the imaginary signal is zero). The error signal for the symbol timing is based on the observation that (ignoring the effects of ISI and channel noise) when a data transition occurs in the real (in-phase) data, the real part of the odd samples, $a_o(n)$, should be zero.
SGN (x) =
This error signal is used to drive a first-order phase-locked loop (PLL). This loop is of the form

$$\tau(n+1) = \tau(n) + K E_\tau, \qquad (3.17)$$

where $K$ is a gain factor which is used to achieve a desired loop bandwidth. In the actual microprocessor, the sampling clock can be adjusted in discrete steps of $\Delta = T/128$ s only. The error signal is accumulated and, if the magnitude of the accumulator crosses a preset threshold, $T_0$, then the clock is incremented or decremented by $\Delta$ according to the sign of the accumulator. The accumulator is then set to zero (Figure 3.7), i.e.,

** $\mathrm{Re}(\cdot)$ is real part; $\mathrm{Im}(\cdot)$ is imaginary part; $\eta$ is the noise term.
*** The mean of $E_\tau$ resulting from incorrect decisions can also be computed in the same manner as the carrier phase tracking error mean is determined (see Appendix B).
In this algorithm the value of the threshold determines the loop bandwidth of the tracking loop. For QPSK packets, we can modify the error signal to include transitions in the imaginary (quadrature) channel by using equation (3.16) on this channel also.
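The accumulate-and-step behavior of the timing loop can be sketched as below. The error samples are taken as given (the error-signal equation itself is not reproduced here), the function name is hypothetical, and the sign convention (a positive accumulator advances the clock) is an assumption.

```python
def timing_loop(errors, threshold, step=1.0 / 128.0):
    """Accumulate timing error samples; whenever the accumulator magnitude
    crosses the threshold, move the clock by one step in the direction of
    the accumulator's sign and reset the accumulator. Returns the clock
    offset (in units of T) after each error sample."""
    accumulator = 0.0
    offset = 0.0
    history = []
    for error in errors:
        accumulator += error
        if abs(accumulator) > threshold:
            offset += step if accumulator > 0 else -step
            accumulator = 0.0
        history.append(offset)
    return history
```

A larger threshold means more error samples are needed before the clock moves, which narrows the loop bandwidth, as the text notes.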
The initialization of the symbol timing involves the adjustment of the sampling clock after the symbol phase is estimated in the first stage of processing. As discussed earlier, the symbol phase estimate is quantized to the four intervals centered at $\tau = 0$, $T/4$, $T/2$ and $3T/4$, which correspond to the maxima of the four $\gamma_i$'s. The question that must be addressed is which of the four phases represents the desired timing for the second stage of processing. If, for the moment, we remove the FIR filter during the data demodulation stage (i.e., $H(z) = 1$), then the phase corresponding to $\gamma_0$, i.e., $\tau = 0$, would be the desired phase, since the even samples, $y_e(n)$, would be maximum, while the odd samples, $y_o(n)$, would be zero when a data transition occurred (see Figure 3.1). Thus, in this case, the sampling clock would be adjusted, based on the maximum $\gamma_i$, to the phase interval centered at

The demodulation stage tracks the carrier offset frequency and phase with a numerical second-order phase-locked loop (see Figure 3.8). An error signal, which depends on the data modulation (BPSK or QPSK), is derived and used to drive this tracking loop. Like the symbol timing loop, the tracking function is decision directed. For BPSK, the error signal is computed as,
$$E_\phi = \mathrm{Im}\{a_e(n)\}\,\mathrm{SGN}(\mathrm{Re}\{a_e(n)\}). \qquad (3.19)$$
The expected value of this error signal in Gaussian noise is shown in Appendix B to be

where

and

A plot of the expected value of the error signal as a function of $\phi_E$ for $|a_e| = 1$ and various signal-to-noise ratios (p) is given in Figure 3.9. The fact that this function is periodic with period $\pi$ is attributed to the two-way phase ambiguity that characterizes all BPSK systems.
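The noise-free behavior of the BPSK error signal of equation (3.19) can be checked with a few lines. This is a sketch only: the function names are hypothetical and the convention SGN(0) = +1 is an assumption.

```python
import cmath
import math


def sgn(x):
    """Sign function; the value at zero (+1 here) is an assumed convention."""
    return 1.0 if x >= 0 else -1.0


def bpsk_phase_error(a_e):
    """Decision-directed BPSK error signal: Im{a} * SGN(Re{a})."""
    return a_e.imag * sgn(a_e.real)
```

For a unit-amplitude symbol $d\,e^{j\phi_E}$ with $d = \pm 1$ and no noise, the result is $\sin\phi_E \cdot \mathrm{SGN}(\cos\phi_E)$. It does not depend on the data $d$, and it repeats when $\phi_E$ is shifted by $\pi$, which is the two-way ambiguity mentioned in the text.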
For QPSK modulation, the error signal must make use of the knowledge that data are present in the imaginary (quadrature) channel, i.e.,

Figure 3.9. Note that, in this case, the value of $\phi_E$ is measured with respect to the line $e^{j\pi/4}$, i.e., the line where the real part is equal to the imaginary part.

Again, the four-way phase ambiguity of QPSK modulation results in a periodic expected value (period = $\pi/2$).
The appropriate error signal is used to drive a second-order PLL which can be represented by the recursive relations (Figure 3.8),
(3.23)
The resulting estimate $\hat{\phi}(n+1)$ is used to process the next data symbol. The constants $K_1$ and $K_2$ are adjusted to optimize the performance of this tracking function in the presence of

The initialization of this carrier offset tracking loop is based upon scaled versions of the estimates obtained in the first stage of processing.
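Since the recursive relations of the loop are not legible in this copy, the following is only a generic second-order loop of the kind described, not the paper's equations. It assumes a frequency accumulator driven by one gain and a phase update driven by the frequency estimate plus a second gain times the error; the gain values, names and noise-free BPSK error model are all hypothetical.

```python
import math


def run_carrier_loop(true_phases, k1=0.2, k2=0.02, phase0=0.0, freq0=0.0):
    """Track a sequence of carrier phases (radians, one per symbol) with a
    generic second-order loop. Returns the phase predicted for the next
    symbol and the final frequency estimate (radians per symbol)."""
    phase_est = phase0
    freq_est = freq0
    for theta in true_phases:
        phase_err = theta - phase_est
        # Noise-free decision-directed BPSK error for a unit-amplitude symbol.
        sign = 1.0 if math.cos(phase_err) >= 0 else -1.0
        error = math.sin(phase_err) * sign
        freq_est += k2 * error              # integrator: frequency estimate
        phase_est += freq_est + k1 * error  # phase for the next symbol
    return phase_est, freq_est
```

With these assumed gains the loop is stable and, for a constant frequency offset, the error decays to zero, so the frequency accumulator ends up holding the per-symbol phase advance. The real loop would be started from the scaled preamble estimates rather than from zero.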
There is one other function which is performed by the processor during the demodulation stage. Both BPSK and QPSK modulation are characterized by a phase ambiguity which must be resolved by the processor. The technique which is used involves the transmission of a start-of-message (SOM) sequence after the preamble and before the data of the packet. The processor computes the correlation between the known sequence and the demodulated data.
If the magnitude of the correlation crosses a set threshold, the start of data is declared and the phase ambiguity is resolved by the sign of the correlation. By the appropriate choice of SOM, this method has proven to be very effective in both frame synchronization and phase ambiguity resolution.
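The correlation test can be sketched for the BPSK case as follows. The actual SOM sequence and threshold are not given in the text, so the 13-symbol Barker code and the threshold of 10 used in the usage example are stand-ins chosen only for illustration; the function name is hypothetical.

```python
def find_som(soft_symbols, som, threshold):
    """Slide the known +/-1 SOM pattern over the demodulated symbols.
    Returns (index of first data symbol, polarity) at the first position
    where the correlation magnitude exceeds the threshold, else None.
    A polarity of -1 means the demodulated data are inverted."""
    length = len(som)
    for start in range(len(soft_symbols) - length + 1):
        window = soft_symbols[start:start + length]
        correlation = sum(s * r for s, r in zip(som, window))
        if abs(correlation) > threshold:
            polarity = 1 if correlation > 0 else -1
            return start + length, polarity
    return None


# Illustrative use with a stand-in SOM (Barker-13) after a short
# alternating preamble, received with inverted polarity.
som = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
preamble = [1, -1, 1, -1, 1, -1, 1, -1]
data = [1, -1, -1, 1]
received = [-x for x in preamble + som + data]
result = find_som(received, som, threshold=10)
```

Multiplying the remaining symbols by the returned polarity removes the two-way BPSK ambiguity. For QPSK the same idea would apply with a complex correlation and four possible rotations, which this sketch does not cover.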
PERFORMANCE
The ability of the modem to successfully detect the preamble and efficiently demodulate the data portion of the packet, as measured by bit error rate, is directly related to the performance of the algorithms under actual operational conditions. The quantization of signals required by finite register length and processor computations tends to degrade the performance of the algorithms. As a first step in evaluating performance, a computer simulation was developed on a general purpose computer. This simulation included the effects of channel noise, intersymbol interference and finite register length. This FORTRAN program was used to gather statistics which were used to measure the influence of various parameters. The simulation helped to determine the level of quantization of the signal and to determine the implementation of processor functions, as well as to expose certain weak areas in the algorithms which could significantly degrade performance. For example, when the detection algorithm was simulated, it became apparent that a threshold could not be found which would give acceptable performance. If the threshold was low, then the algorithm tended to detect too early. As the threshold was raised, the random point in the preamble where detection is declared had an increasing variance, with the result that many packets were detected early and many detected late. The signal portion of the $\gamma_i$'s climbs until the 64th received symbol, but remains constant for the remainder of the preamble (i.e., until the 96th symbol). A modification of the algorithm was made based on this "flattening" of the signal energy. The modified algorithm requires the maximum $\gamma_i(n)$ to cross a preset threshold for eight values of $n$ in a row. This modification resulted in a significant reduction of the variance and gave a detection performance which was much improved.

The results of the simulations led to refinements of the algorithms and aided the development of the microprocessor which is used to implement these algorithms. The performance of the modem is measured in terms of (1) the probability of missing a packet and (2) the bit error rate of the data at the output of the demodulator. Missing packets can be attributed to two major causes. The first is the probability of not detecting the packet (or detecting it too late) due to the channel noise. Table 4.1 lists the measured probability of this occurrence for several values of signal-to-noise ratio. The other cause of missed packets is large errors in the estimates of symbol timing and/or carrier offset frequency and phase. When this occurs, the tracking loops are initialized incorrectly, which in turn means that they cannot "lock" onto the phase in a sufficiently short period of time. Large errors in the preamble estimates may be attributed to a number of causes, but the most significant appears to be early detection of the preamble. The probability of missing a packet for these reasons is also included in Table 4.1, as is the total probability of missing a packet.

The bit error rate performance is the other important modem performance measure in this packet format. Figure 4.1 shows bit error rate as a function of signal energy-to-noise density ratio. The lowest curve is the performance of ideal BPSK and QPSK over the infinite bandwidth, additive white Gaussian channel with matched filter detection. The measured bit error rate performance of the actual microprogrammed processor is indicated by the circles. Notice that the performance is very close to optimum; at $E_s/N_0 = 1$ dB the loss due to both signal filtering and other losses (such as quantization) is less than 0.5 dB from ideal. The simulation results were very close to these points, which indicates the validity of both the models in this paper and those used in the simulation program.

It is estimated that a data rate of 100 kbits/s can be achieved with this processor.
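The modified detection rule from the simulation study (the maximum $\gamma_i(n)$ must exceed the threshold for eight values of $n$ in a row) amounts to a run-length test. A minimal sketch, with a hypothetical function name and with the sequence of per-$n$ maxima supplied by the caller:

```python
def detect_with_run(max_gammas, threshold, run_length=8):
    """Return the index n at which the statistic has exceeded the threshold
    for run_length consecutive values of n, or None if that never occurs."""
    run = 0
    for n, value in enumerate(max_gammas):
        run = run + 1 if value > threshold else 0
        if run >= run_length:
            return n
    return None
```

A single noise spike resets nothing permanently but also cannot trigger detection on its own, which is how the rule reduces the spread of the detection instant once the signal energy has flattened.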
The development of the modem was motivated by the needs of an experimental packet-switching network. The original network incorporates a few large stations which have large antennas and low noise receivers. As expected, the operational signal-to-noise ratios are high. But the cost of constructing and maintaining a large number of these terminals is prohibitive. To increase the size of the network, it becomes necessary to introduce less expensive small terminals into the system. To obtain acceptable performance of the network (in terms of throughput), it is necessary to operate with a bit error rate on the order of or less. This is accomplished by the use of a demodulator which can operate with near optimum performance at the marginal signal-to-noise ratios which characterize the small station. In addition, use of convolutional encoding and Viterbi decoding is necessary to reduce the required signal-to-noise ratios even further. An important aspect of efficient decoding is the ability to use soft quantized estimates of the encoded symbols in the decoding process. The availability of these soft decisions at the output of the demodulator is an important feature of the modem.
Of course, the data rate of the channel is reduced by a factor of two (rate 1/2 code) but without the availability of this efficient demodulation/decoding

centered at $f_1$. It is then transmitted over an additive white Gaussian noise (AWGN) channel with (two-sided) spectral density $N_0/2$. The received signal is mixed to near baseband using two quadrature sinewaves of frequency $f_2$ and phase $\phi$. It is assumed that the difference between $f_1$ and $f_2$ is small compared to the data rate, $1/T$, i.e., $|f_0 T| \ll 1$, where $f_0 = f_1 - f_2$.
The two resulting signals are passed through a lowpass filter $H_2(f)$, which removes the sum frequency terms.

This complex model is used throughout this paper, where the transmitted signal is

The expected value for the QPSK case is a more tedious calculation, but uses the same basic concepts. Here,
+ bi / / p ( t -iT-T -u ) h l ' ( u ) h 2 (~) sin (2nfo(t -T ) + @) dud^ -b i / / p ( t -i T -~-o ) h 1 ' ( u ) h 2 (~)
-
