All-Digital Multicarrier Demodulators for On-Board Processing Satellites in Mobile Communication Systems. by Hung, Yim Wan.
All-Digital Multicarrier Demodulators for 
On-Board Processing Satellites in 
Mobile Communication Systems
Thesis submitted for the degree of 
Doctor of Philosophy
Yim Wan Hung
Department of Electrical and Electronic Engineering 
University of Surrey 
Guildford 
U.K.
COPYRIGHT ©JUNE 1991
ProQuest Number: 27750479
Published by ProQuest LLC (2019). Copyright of the Dissertation is held by the Author.
Abstract
Economical operation of future satellite systems for mobile communications can only be 
fulfilled by using dedicated on-board processing satellites, which would allow both cheap 
earth terminals and lower space segment costs. With on-board modems and codecs, the 
up-link and down-link can be optimized separately. An attractive scheme is to use 
FDMA/SCPC on the up-link and TDM on the down-link. This scheme allows mobile 
terminals to transmit a narrow band, low power signal, resulting in smaller dishes and 
HPAs with lower output power. On the up-link, there are hundreds to thousands of FDM 
channels to be demodulated on-board. The most promising approach is the use of all-digital 
multicarrier demodulators (MCDs), where analogue and digital hardware are efficiently 
shared amongst channels, and digital signal processing is used at an early stage to take 
advantage of VLSI implementation. A MCD consists of a channelliser for separation of 
FDM channels, followed by individual demodulators for each channel. Major research areas 
in MCDs are in multirate digital signal processing, and the optimal estimation for 
synchronization, which form the basis of the thesis. Complex signal theories are central 
to the development of structured approaches for the sampling and processing of bandpass 
signals, which are the foundations in both channelliser and demodulator design. In multirate 
DSP, novel polyphase theories replace many ad-hoc, tedious and error-prone design 
procedures. For example, a polyphase-matrix-DFT channelliser includes all efficient filter 
bank techniques as special cases. Also, a polyphase-lattice filter is derived, not only for 
sampling rate conversion, but also capable of sampling phase variation, which is required 
for symbol timing adjustment in all-digital demodulators. In modulation schemes, a 
systematic survey is reported, based on two expressions that include all formats in linear 
and constant envelope modulation. In synchronization techniques, classifications 
according to the criterion of statistical optimization, the data dependency, and the method 
of parameter extraction, reflect the inherent complexity and performance of numerous 
existing algorithms. The designs of two new algorithms are presented: a differential 
decision frequency error detector that is simple and fast; a dual-comb-filter 
frequency/timing error detector that is targeted at VLSI implementation. The real-time 
implementation of a complete 4x16 kb/s MCD for the T-SAT project is described in 
detail, which proved many of the structured design concepts developed in the thesis. The 
requirements of software tools for various levels of simulation in multirate DSP and 
communications are analysed. This led to the implementation of a data-flow oriented 
simulation system, which was used in all research work in the thesis.
Acknowledgments
I would first like to thank my supervisors, Professor B.G. Evans and Mr F.P. Coakley for 
their guidance and the opportunities they created for me.
I would also like to acknowledge the Croucher Foundation of Hong Kong for awarding 
me a scholarship.
As a member of the On-board Processing Research Group, I would like to say thank you 
to past and present members: C.C.D. Kwan, L.N. Chung, C.W. Wong, S.C. Lu and P. 
Wang.
To all who have suffered during the preparation of this thesis, I can only offer my 
appreciation for their patience, especially my family and Faria.
Table of Contents
1 Introduction
1.1 References
2 Complex Multirate Signal Processing
2.1 Complex Signal Representation
2.2 Frequency Translations
2.2.1 Complex Sinusoid
2.2.2 Quadrature Modulation
2.2.3 Two-sided Frequency Translation
2.3 Uniform Sampling Theorem for Arbitrary Signals
2.4 Complex Filtering
2.5 Bandpass Multirate Digital Signal Processing
2.5.1 Decimation
2.5.2 Interpolation
2.5.3 Spectral Mapping
2.5.4 Sampling Invariance
2.6 Realization of Multirate Filtering
2.6.1 Periodically Time-Varying Systems
2.6.2 Synchronous Data-Flow Graph Representation
2.6.3 Efficient Complex FIR Structures
2.6.4 Decimation
2.6.5 Interpolation
2.6.6 Polyphase-Matrix Structure for Sampling Rate Conversion
2.7 Summary
2.8 References
3 A Comparative Study of Filter Banks
3.1 Filter Bank Functions for Multicarrier Demodulators
3.2 Demultiplexer Approaches
3.2.1 Single-Channel Approaches
3.2.1.1 Direct Filtering
3.2.1.2 Multistage Filtering
Interpolated FIR Filters
Half-band Filters
Analytic Filters
Hilbert Transformers
Yim, All-Digital Multicarrier Demodulators
Comb Filters
3.2.1.3 Fast Convolution
3.2.2 Multichannel Approaches
3.2.2.1 Polyphase-DFT
3.2.2.2 Tree
3.2.2.3 Fast Convolution
3.2.2.4 Analysis-Synthesis
3.3 Optimization Hierarchy in DSP
3.4 A Unified Filter Bank Theory
3.5 Computer Simulation
3.6 Computation Hardware
3.7 Comparison of Demultiplexer Approaches
3.8 References
4 A Survey of Digital Modulation Schemes and Synchronization Techniques
4.1 Baseband Digital Modulation Formats
4.1.1 16-QAM
4.1.2 Linear Modulation Schemes
4.1.3 MSK
4.1.4 Constant Envelope Modulation Schemes
4.2 Modulation Schemes for Mobile Applications
4.3 Comparison of Power and Bandwidth Efficiency
4.4 All-Digital Demodulators
4.5 Digital Symbol Timing Adjustment
4.5.1 Polyphase-Lattice Structures
4.5.2 Adaptive Sampling Control
4.6 Synchronization in All-Digital Receivers
4.6.1 Maximum Likelihood Estimation
4.6.1.1 Other Criteria
4.6.1.2 Properties of ML Estimates
4.6.1.3 Classification of Synchronizers
4.6.1.4 Extraction of Estimates
4.6.2 Tone Synchronization
4.6.3 Symbol Timing Synchronization
4.6.3.1 Zero-Crossing Algorithms
4.6.3.2 The Mueller and Mueller Algorithm
4.6.3.3 Symbol Clock Extraction
4.6.4 Phase Synchronization
4.6.4.1 The Costas Algorithm
4.6.4.2 The Viterbi and Viterbi Algorithm
4.6.5 Automatic Frequency Control
4.6.5.1 Phasor Filtering
4.6.5.2 Differential Decision Frequency Error Detector
4.6.5.3 Dual-Comb-Filter Frequency-Timing Error Detector
4.7 Comparison of Demodulator Complexity
4.8 References
5 All-Digital Multicarrier Demodulator Implementation
5.1 Analogue I.F. Board
5.2 Multiprocessor DSP Board
5.2.1 Demultiplexer
5.2.2 Demodulator Array
5.2.2.1 Parameter Estimation
5.2.2.2 Digital Timing Correction
5.3 Computer-Aided Design and Simulation Study
5.4 Testing
5.4.1 ROM-Based Flexible Modulator
5.4.2 Arbitrary Wave-form Generator
5.4.3 Transient Wave-form Recorder
5.4.4 Custom Bit Error Rate Monitor
5.4.5 Test results
5.5 Complexity of Multicarrier Demodulators
5.5.1 Complexity of T-SAT MCD
5.5.2 ASIC Implementations
5.6 Signal Processor Architectures and Software
5.7 References
6 A Data-Flow Oriented Simulation System
6.1 Requirement Analysis
6.1.1 Automatic Scheduling
6.1.2 Feedback
6.1.3 Hierarchical Block Diagram Paradigm
6.1.4 Memory Management
6.1.5 Hardware Computation
6.1.6 Signal Analysis
6.1.7 Batch Execution
6.1.8 Run-time Efficiency
6.2 Survey of Simulation Packages
6.3 Survey of Computer Languages
6.4 Implementation of DOSS
6.4.1 Data-flow Oriented C
6.4.2 Batch Execution
6.4.3 Code Generator
6.4.4 Software Development and Maintenance
6.4.5 Integration of DSP Tools
6.5 References
7 Conclusion and Future Work
APPENDIX Published Papers
A.1 W. H. Yim, C. C. D. Kwan, F. P. Coakley and B. G. Evans, “Comparison of 
Digital Transmultiplexer Architectures for Use in On-Board Processing 
Satellites,” Satellite Integrated Communications Network, pp. 279-286, Elsevier, 
1988.
A.2 W. H. Yim, C. C. D. Kwan, F. P. Coakley and B. G. Evans, “Multicarrier
Demodulator for the On-Board Processing T-SAT Land Mobile Payload,” 4th 
IEE International Conference on Satellite Systems for Mobile Communications 
and Navigation, pp. 254-258, London, Oct. 1988.
A.3 W. H. Yim, C. C. D. Kwan, F. P. Coakley and B. G. Evans, “On-board
Multicarrier Demodulator for Mobile Applications Using DSP Implementation,” 
1st International Workshop on Digital Signal Processing Techniques Applied to 
Space Communications, pp. 124-130, European Space Agency, Noordwijk, Nov. 
1988.
A.4 W. H. Yim, C. C. D. Kwan, F. P. Coakley and B. G. Evans, “Multicarrier 
Demodulators for On-Board Processing Satellites,” International Journal of 
Satellite Communications, vol. 6, pp. 243-251, Wiley, 1988.
A.5 W. H. Yim, C. C. D. Kwan, F. P. Coakley and B. G. Evans, “On-Board
Multicarrier Demodulators for Mobile Applications Using DSP Implementation,” 
1st European Conference on Satellite Communications, pp. 437-443, Munich, 
Sep. 1989.
A.6 W. H. Yim, C. C. D. Kwan, F. P. Coakley and B. G. Evans, “On-Board
Multicarrier Demodulators for Mobile Applications Using DSP Implementation,” 
Space Communications, vol. 7, pp. 543-548, Elsevier, Amsterdam, 1990.
A.7 W. H. Yim and F. P. Coakley, “DSP MCD’s for Mobile Satellite Services,” 2nd 
International Workshop on Digital Signal Processing Techniques Applied to 
Space Communications, p. 3.2, European Space Agency, Turin, Sep. 90.
A.8 W. H. Yim and F. P. Coakley, “All-Digital Multicarriers for On-Board 
Processing Satellite Systems,” 3rd Bangor Communications Symposium, 
University of Wales, Bangor, May 91.
1 Introduction
Figure 1.1 Transparent Satellite System
In the beginning, satellites were merely repeaters in space: signals received from earth 
were retransmitted back. Figure 1.1 shows the tasks of these so-called “dumb” satellites. 
The payloads of such satellites consist of nothing more than amplifiers, frequency shifters 
and filters. To date all commercial civil satellites have used this type of transparent 
transponder, as shown in block diagram form in figure 1.2.
Figure 1.2 Transparent Satellite Payload
(Legend: LNA - Low Noise Amplifier; HPA - High Power Amplifier. Recoverable block labels: LNA, FREQUENCY SHIFT.)
Subsequent major developments in satellite systems include networks consisting of large 
numbers of very small terminals [Evans85a] -  mobiles and fixed VSATs (Very Small 
Aperture Terminals). Figure 1.3 depicts such a system. The required system capacity 
depends very much on the application: thousands of low-bit-rate channels for mobile 
services, e.g., 4.8 kb/s suitable for vocoded voice; hundreds of channels for higher data 
rate VSATs, e.g., 64 kb/s. Economical operation in the future can only be fulfilled by using 
dedicated On-Board Processing (OBP) satellites. The complexity of these “intelligent” 
satellites is high, but they should allow both cheap earth terminals and lower space segment 
costs.
As shown in figure 1.4, the likely features of OBP satellites are:
(a) multiple spot-beam antennas (with associated beam forming networks)
(b) regenerative repeaters (modems and codecs)
(c) baseband switches
(d) on-board processor for control and monitoring functions
Figure 1.3 On-Board Processing Satellite System
Figure 1.4 On-Board Processing Satellite Payload
(Recoverable block labels: ON-BOARD PROCESSOR, BFN, MOD, ENCODE.)
Multiple spot-beams allow frequency reuse on non-adjacent beams. This increases the 
number of channels many fold, given the same spectrum allocation. The higher efficiency 
of spot-beam antennas allows smaller dish sizes and lower-output-power HPAs at the 
earth stations, thereby reducing the cost of terminals. Regenerative repeaters with
demodulation and decoding on board improve system performance under noise and fading. 
On-board switching of data provides inter-beam connectivity, and allows more flexibility 
in distributing different bit rates amongst the users.
With regenerative repeaters, the up-link (earth to satellite) and down-link (satellite to earth) 
can be optimised separately, as shown in figure 1.5. An attractive scheme is to use 
FDMA/SCPC (Frequency Division Multiple Access /  Single Channel Per Carrier) on the 
up-link, and TDM (Time Division Multiplexing) on the down-link [El-Amin86a]. This 
FDMA/SCPC scheme allows mobile terminals to transmit a narrow-band, low-power 
signal. TDMA (Time Division Multiple Access) schemes require the transmission 
bandwidth of each mobile to be the total bandwidth allocated for all channels, thus requiring 
larger dishes and HPAs with higher output power.
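A rough, back-of-envelope sketch of this comparison follows. It rests on assumptions that are not stated in the thesis: every terminal needs the same energy per bit, a TDMA terminal bursts at the aggregate rate of all channels, and guard times and framing overheads are ignored. The function names and the example figures (100 channels at 4.8 kb/s) are illustrative only.

```python
import math


def tdma_burst_rate_bps(num_channels: int, channel_rate_bps: float) -> float:
    """Bit rate during a TDMA burst when num_channels terminals share one carrier.

    Assumption: equal time slots, no guard times or framing overhead.
    """
    return num_channels * channel_rate_bps


def peak_power_penalty_db(num_channels: int) -> float:
    """Extra peak transmit power of TDMA over FDMA/SCPC, in dB.

    Assumption: for a fixed energy per bit, transmit power is proportional
    to the instantaneous bit rate, so the power ratio equals num_channels.
    """
    return 10.0 * math.log10(num_channels)


# Hypothetical example: 100 mobile channels at 4.8 kb/s each.
num_channels = 100
channel_rate_bps = 4.8e3

burst_rate = tdma_burst_rate_bps(num_channels, channel_rate_bps)
penalty_db = peak_power_penalty_db(num_channels)

print(f"FDMA/SCPC carrier rate:  {channel_rate_bps / 1e3:.1f} kb/s")
print(f"TDMA burst rate:         {burst_rate / 1e3:.1f} kb/s")
print(f"TDMA peak power penalty: {penalty_db:.1f} dB")
```

Under these simplifications, both the transmission bandwidth and the peak power of a TDMA mobile grow in proportion to the number of channels, which is the motivation given above for preferring FDMA/SCPC on the up-link.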
A continuous, single TDM carrier (and single modulator) on the down-link is less affected 
by non-linearities due to the satellite HPAs, and thus can provide a higher EIRP (Equivalent 
Isotropic Radiated Power).
Figure 1.5 Up-link/Down-link Schemes
(Up-link: FDMA/SCPC; down-link: TDM.)
The use of FDMA/SCPC on the up-link requires hundreds to thousands of narrow band 
demodulators on each beam alone. To allow simple mobile terminals, the channels to be
demodulated are not synchronised in symbol timing or carrier phase. Analogue 
implementation is not feasible as a single demodulator per channel would require an 
unrealistic payload of large volume with high power consumption [Gardner85a].
The most promising approach is the use of all-digital Multicarrier Demodulators (MCDs), 
which have been extensively investigated in Europe, North America and Japan [Evans88a]. 
A single MCD demodulates a group of FDM channels. The analogue and digital hardware 
are efficiently shared amongst channels, and there are more opportunities for optimization. 
The number of channels in a group depends on the bit rate and available technology, e.g., 
30 to 100 channels per MCD. Digital signal processing (DSP) is used at an early stage 
(hence the term all-digital) to take advantage of VLSI implementation.
Figure 1.6 shows the functions of a MCD. A group of K FDM channels is sampled at a 
rate sufficient for the group bandwidth. The analogue front-end and A/D converter are 
shared. A channelliser, a particular type of filter bank for this application, separates the 
channels. Since the signal bandwidths at the channelliser outputs are reduced by a factor 
of K, a lower sampling rate is sufficient. The demodulators can then operate at the lower 
rate. For example, a 9.6 kb/s QPSK channel would require a sampling rate in the region 
of 9.6 kHz. For a group of 100 channels, the sampling rate of the A/D converter would be 
about 1 MHz.
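The arithmetic in this example can be written out as a short sketch. It assumes, as a simplification, that the A/D rate is the per-channel rate multiplied by the number of channels, with no guard bands or oversampling margin; the function name is illustrative.

```python
def group_sampling_rate_hz(num_channels: int, channel_rate_hz: float) -> float:
    """A/D sampling rate for a group of equally spaced FDM channels.

    Assumption: the group bandwidth is num_channels times the channel
    bandwidth, with no guard bands or oversampling margin.
    """
    return num_channels * channel_rate_hz


# Figures quoted in the text: 9.6 kb/s QPSK channels, about 9.6 kHz each.
channel_rate_hz = 9.6e3
num_channels = 100

adc_rate_hz = group_sampling_rate_hz(num_channels, channel_rate_hz)
# The channelliser outputs run slower than the A/D converter by this factor.
rate_reduction = adc_rate_hz / channel_rate_hz

print(f"A/D sampling rate: {adc_rate_hz / 1e6:.2f} MHz")
print(f"Rate reduction at channelliser outputs: {rate_reduction:.0f}")
```

The result, 0.96 MHz, matches the figure of about 1 MHz quoted above, and the rate reduction equals the number of channels K.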
Figure 1.6 Multicarrier Demodulator
(Recoverable block labels: K FDM CHANNELS, CHANNELLISER, SAMPLING RATE, DEMOD (one per channel), DATA.)
Major research areas in MCDs are in multirate digital signal processing, and the optimal 
estimation for synchronization (and detection) in demodulators. This work forms the basis 
of the thesis.
Chapter 2 describes the underlying analysis of MCDs -  complex signals and multirate 
signal processing. Complex signals are conceptually simple, and are necessary for a 
structured approach towards signals and systems (both analogue and digital). Multirate 
digital signal processing is applicable to real-time implementations whenever there are 
changes in signal bandwidths. A novel approach based on polyphase-matrixes and 
sampling invariance turns the realization of efficient multirate filters into a routine task.
Chapter 3 is the derivation of a unified description of filter banks as polyphase-matrix-DFT 
structures. Various derived schemes for implementing channellisers, including most 
current ad-hoc designs, are compared through computer simulation. General tree structures 
are found to be suitable for VLSI implementation, as well as non-uniform channel 
bandwidth.
Chapter 4 presents a comprehensive survey of modulation schemes and synchronization 
techniques. Based on complex signals, all digital modulation schemes can be represented 
in two expressions, which can be directly used for implementation of flexible and efficient 
DSP modulators. Because of the necessity of using all-digital implementations in MCDs, 
many new synchronization algorithms have been derived in recent years, by applying 
efficient DSP techniques for estimation. Novel techniques are also derived in this section: 
polyphase-lattice filters for digital adjustment of symbol timing; differential decision 
frequency error detectors; and dual-comb-filter detectors for both frequency and timing 
errors.
The next chapter, chapter 5, describes the implementation of a real-time, 4x16 kb/s MCD, 
using TMS320C25 digital signal processors. This is one of the major components of a 
complete prototype OBP payload for the U.K. Technology Satellite project, which involved 
a consortium of U.K. universities and the Rutherford and Appleton Laboratories.
Chapter 6 provides a requirement analysis of software tools for various levels of simulation 
in multirate DSP and communications. It describes the implementation of a Data-flow 
Oriented Simulation System, which was used in all research work in the thesis.
The final chapter, chapter 7, presents the conclusions and possible directions for future research work.
1.1 References
El-Amin86a. M. H. El-Amin, B. G. Evans, and L. N. Chung, “An Access Protocol for 
On-board Processing Business Satellite Systems,” Proc. 7th International 
Conference on Digital Satellite Communications, pp. 149-154, May 1986.
Evans85a. B. G. Evans, “Towards the Intelligent Bird,” International Journal of 
Satellite Communications, No. 3, pp. 203-215, July 1985.
Evans88a. B. G. Evans, ed., Special Issue on Multicarrier Demodulators, International 
Journal of Satellite Communications, Vol. 6, Wiley, 1988.
Gardner85a. F. M. Gardner, “On-Board Processing for Mobile-Satellite 
Communications,” ESTEC contract no. 5889/84/NL/GM, European Space 
Agency, May 1985.
2 Complex Multirate Signal Processing
Table of Contents
2 Complex Multirate Signal Processing
2.1 Complex Signal Representation
2.2 Frequency Translations
2.2.1 Complex Sinusoid
2.2.2 Quadrature Modulation
2.2.3 Two-sided Frequency Translation
2.3 Uniform Sampling Theorem for Arbitrary Signals
2.4 Complex Filtering
2.5 Bandpass Multirate Digital Signal Processing
2.5.1 Decimation
2.5.2 Interpolation
2.5.3 Spectral Mapping
2.5.4 Sampling Invariance
2.6 Realization of Multirate Filtering
2.6.1 Periodically Time-Varying Systems
2.6.2 Synchronous Data-Flow Graph Representation
2.6.3 Efficient Complex FIR Structures
2.6.4 Decimation
2.6.5 Interpolation
2.6.6 Polyphase-Matrix Structure for Sampling Rate Conversion
2.7 Summary
2.7.1 Generalized Sampling Theorem
2.7.2 Complex Signal Representation
2.7.3 One-Sided Frequency Translation
2.7.4 Complex Filtering
2.7.5 Bandpass Multirate DSP
2.7.6 Realization of Multirate Filtering
2.8 References
Complex Multirate Signal Processing
This chapter is an overview of the basic concepts and functions of multirate digital signal 
processing applied to complex (non-real) signals, and provides a generalised approach to 
the design and analysis of efficient DSP systems.
In contrast to the general DSP literature, we emphasise the processing of complex signals 
at an early stage. Sophisticated DSP algorithms arise from the processing of multiple 
bandpass signals, for example, conventional filter-banks, and digital I.F. processing in 
modems. Complex representations of bandpass signals and filters are more convenient for 
design and analysis, and more flexible for describing high level system functions.
Conventional approaches to both analogue and digital signal processing begin with 
realization theory, indulging in complicated transforms at an early stage. To provide 
straightforward and practical design concepts of multirate DSP systems, we make extensive 
use of apparently non-realizable entities such as continuous-time representation of digital 
signals and filters, complex analogue filters with finite impulse response, and direct 
analogue convolution. This approach provides a unified viewpoint towards signal 
processing, for example, analogue signals with infinite sampling rates are a particular case 
of multirate sampling. Therefore familiar signal processing concepts and notations can be 
directly applied.
Our generalised approach arrives at systems with predominantly complex processing. In 
DSP, complex signals and functions can be realized in digital hardware directly, often 
leading to the most flexible implementation, with similar efficiency compared to other 
options. These options arise because, for example, hardware efficiency can be gained at 
the expense of flexibility, by restriction to real processing where appropriate. We shall see 
later that from a predominantly complex processing system, a variety of alternatives can 
be derived (including original ones presented in later sections), which are often presented 
in the literature as different techniques, with distinct terminologies. A generalised approach 
allows us to perform systematic trade-offs and comparisons before committing to particular 
realization techniques.
2.1 Complex Signal Representation
Analytically, complex signals demand no special treatment. One-dimensional transforms, 
such as Fourier, Laplace and Z, are applicable to complex signals, with real signals as a 
particular class. Complex signals are sometimes regarded as two-dimensional. We shall 
avoid this term as multidimensional signals are associated with multidimensional 
transforms in general. In trellis coding, a multidimensional signal space is mapped onto a 
two-dimensional trellis (time-varying). This trellis can then be realised by physical 
analogue or digital complex signals. In analogue realisation, two related real signals can 
be regarded as complex, representing the real and imaginary components. These two 
components require gain matching, which illustrates the dominance of real signals in the 
analogue domain. In DSP realisation, complex signals are represented simply by complex 
number sequences. Both signal types appear frequently in communication links using DSP 
modems. Therefore we shall use the following graphically convenient notations to 
distinguish amongst spectra of different signal types.
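The gain-matching remark can be illustrated with a small numerical sketch. It assumes a simple model that is not taken from the thesis: the imaginary branch of a complex sinusoid is scaled by a gain g relative to the real branch, so that cos(wt) + j g sin(wt) = ((1+g)/2) exp(jwt) + ((1-g)/2) exp(-jwt), and an unwanted image appears at the negative frequency. The direct DFT and all names below are illustrative.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform, adequate for a small demo."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def mismatched_tone(n, cycles, gain):
    """Complex sinusoid whose imaginary branch is scaled by `gain`."""
    return [
        complex(
            math.cos(2 * math.pi * cycles * k / n),
            gain * math.sin(2 * math.pi * cycles * k / n),
        )
        for k in range(n)
    ]


N, CYCLES, GAIN = 32, 5, 0.9  # hypothetical values: 10% gain mismatch

spectrum = dft(mismatched_tone(N, CYCLES, GAIN))
wanted = abs(spectrum[CYCLES]) / N      # component at +f, expected (1 + g) / 2
image = abs(spectrum[N - CYCLES]) / N   # component at -f, expected (1 - g) / 2
image_rejection_db = 20 * math.log10(wanted / image)

print(f"wanted = {wanted:.3f}, image = {image:.3f}")
print(f"image rejection = {image_rejection_db:.1f} dB")
```

With matched gains (g = 1) the image term vanishes. In the DSP case the signal is simply a sequence of complex numbers, so no such matching arises.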
The frequency spectrum¹ [Oppenheim83a, sec. 4.2.2] of a real signal $s(t)$ exhibits 
symmetry. The magnitude plot of the Fourier transform $S(f)$ represents directly the even 
symmetry, $|S(-f)| = |S(f)|$. In the same plot, to represent the odd symmetry of the phase, 
$\angle S(-f) = -\angle S(f)$, we use hatching that forms a mirror image about the line of symmetry, 
as in figure 2.1(a). The line of symmetry must be at $f = 0$ for real signals.
¹ In dealing with frequency domain representations, we assume the Fourier transform of the signal of interest 
exists. This assumption holds for signals satisfying the Dirichlet conditions, or informally, finite energy 
signals. We also permit impulse functions such that an extremely broad class of signals can be considered 
to have Fourier transforms.
Figure 2.1 Spectrum of various signal types: (a) real signals; (b) complex signals; (c) analytic signal.
If the magnitude spectrum of a signal does not exhibit even symmetry, it is complex. For 
complex signals with even symmetry in the magnitude spectrum, we use asymmetric 
hatching to represent the signal type as in figure 2.1(b). This is also a convenient way to 
represent an arbitrary complex signal. In summary, we use hatching to emphasise the signal 
type. If hatching is not applied, the signal type under consideration is unimportant, and in 
general complex.
Analytic signals are a special class of complex signals that contain no negative frequency 
components, illustrated in figure 2.1(c). For a complex signal $s(t) = x(t) + jy(t)$ with 
Fourier transform $S(f)$, $s(t)$ is analytic if and only if $S(f) = 0$ for all $f < 0$. We shall 
exclude the limiting case with a d.c. component, for simplicity in later discussions. The most 
important property of an analytic signal in signal processing is that knowing the real part of 
$s(t)$ is as good as knowing $s(t)$ itself, i.e., there is a one-to-one mapping between $s(t)$ 
and $x(t)$. The signal $y(t)$ is the Hilbert transform of $x(t)$, whose sole purpose is to achieve 
the desired spectral property with no negative frequency components. Although analytic signals occur
frequently in signal analysis for converting between real and complex signals, they seldom 
exist physically in practical signal processing systems because only the real (or imaginary) 
component is strictly necessary.
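The one-to-one mapping between an analytic signal and its real part can be checked numerically. The following is a minimal sketch, not the author's method: it assumes a short periodic sequence so that a direct DFT can stand in for the Fourier transform, it assumes the sequence has no d.c. or half-rate component, and the helper names `dft`, `idft` and `analytic` are illustrative only.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) discrete Fourier transform (adequate for a short demo)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse of dft()."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def analytic(x):
    """Build s = x + j*y with no negative-frequency content.

    Assumes an even length and no d.c. or half-rate component in x.
    """
    N = len(x)
    X = dft(x)
    # Double the positive-frequency bins and zero everything else.
    S = [2 * X[k] if 0 < k < N // 2 else 0 for k in range(N)]
    return idft(S)

N = 32
x = [math.cos(2 * math.pi * 3 * n / N) + 0.5 * math.sin(2 * math.pi * 5 * n / N)
     for n in range(N)]
s = analytic(x)

# Energy left in the negative-frequency bins of s (upper half of the DFT).
neg_energy = sum(abs(v) ** 2 for v in dft(s)[N // 2 + 1:])
# The real part of the analytic signal should reproduce x.
max_err = max(abs(s[n].real - x[n]) for n in range(N))
print(neg_energy, max_err)
```

The imaginary part of `s` plays the role of $y(t)$: it carries no new information and exists only to cancel the negative-frequency half of the spectrum.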
Figure 2.1 depicts the reason for our prime interest in complex signals. The only difference 
between complex baseband, bandpass and analytic signals is the location of 
their centre frequencies. In contrast, real baseband signals become complex after arbitrary 
one-sided frequency translations. Real bandpass signals have twice the bandwidth of their 
baseband associates.
2.2 Frequency Translations
We shall identify reversible frequency translations of signal spectra, and the conditions 
under which the original signal can be recovered without distortion. These reversible 
operations allow us to establish the equivalence of complex signals and their real 
counterparts, and the equivalence of signals at different frequency bands. The approximate 
knowledge of frequency values and incomplete control of timing references lead to 
frequency and phase errors respectively. These will be the subjects in synchronization.
2.2.1 Complex Sinusoid
A conceptually simple form of frequency shifting uses a complex sinusoid $e^{j2\pi f_c t}$ with
Fourier transform $\delta(f - f_c)$. The relationship between the original and frequency translated 
signal is given by
$$s_c(t) = s(t)\, e^{j2\pi f_c t}$$
The Fourier transform shows the simplicity of this one-sided frequency translation:
$$S_c(f) = S(f - f_c)$$
We can obtain $s(t)$ from $s_c(t)$ with the knowledge of $f_c$:
$$s_c(t)\, e^{-j2\pi f_c t} = s(t)$$
The corresponding spectral translation operation is
$$S_c(f + f_c) = S(f)$$
These simple expressions emphasise the simplicity and flexibility of complex 
sinusoids, which are frequently encountered in DSP. In contrast to other forms of frequency 
translation, no images of the original signal are generated. Therefore no filtering is required 
to recover the original signal (assuming pure oscillators). This is true for arbitrary $f_c$ and 
signal type, both in the analogue and digital domain, although a real signal becomes 
complex after one-sided frequency translation.
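A short numerical sketch of this one-sided translation follows. It assumes a sampled signal, and the sampling period, shift frequency and function name `translate` are illustrative values chosen for the demonstration rather than anything specified in the text.

```python
import cmath
import math

def translate(s, fc, T):
    """One-sided shift: s_c(nT) = s(nT) * exp(j*2*pi*fc*n*T)."""
    return [v * cmath.exp(2j * math.pi * fc * n * T) for n, v in enumerate(s)]

T = 1.0 / 8000.0   # assumed sampling period
fc = 1000.0        # assumed translation frequency

# A real baseband test signal.
s = [math.cos(2 * math.pi * 200.0 * n * T) for n in range(64)]

s_c = translate(s, fc, T)         # shifted signal, complex in general
s_back = translate(s_c, -fc, T)   # reverse shift, no filtering involved

is_complex = any(abs(v.imag) > 1e-6 for v in s_c)
max_err = max(abs(a - b) for a, b in zip(s_back, s))
print(is_complex, max_err)
```

Because no image is created by the forward shift, the reverse shift alone restores the original samples to within rounding error.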
2.2.2 Quadrature Modulation
For direct frequency translation using complex sinusoids, even if the signal $s(t)$ is real, 
$s_c(t)$ is in general complex. Quadrature modulation enables us to shift a complex signal 
such that the resulting signal is real, and the original signal can be recovered undistorted in a 
reverse process with the aid of filtering. Therefore we shall regard quadrature modulation as a 
form of frequency translation rather than as related to any particular type of modulation. 
The process involves simply taking the real part of $s_c(t)$:
$$s'_c(t) = \mathrm{Re}\left[ s(t)\, e^{j2\pi f_c t} \right] = x(t) \cos 2\pi f_c t - y(t) \sin 2\pi f_c t \qquad (2\text{-}1)$$
Alternatively, $s'_c(t)$ can be written as
$$s'_c(t) = \tfrac{1}{2}\, s(t)\, e^{j2\pi f_c t} + \tfrac{1}{2}\, s^*(t)\, e^{-j2\pi f_c t}$$
with Fourier transform
$$S'_c(f) = \tfrac{1}{2}\, S^*(-f - f_c) + \tfrac{1}{2}\, S(f - f_c)$$
Taking the real part of a complex signal without losing information requires that $s_c(t)$ is 
analytic, i.e., $f_c$ is positive and large enough such that the conjugate image $S^*(-f - f_c)$ 
and the analytic component $S(f - f_c)$ do not overlap.
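Then the latter can be filtered and translated to the original position. This shows that any 
real signal (without d.c. component for simplicity) can be represented by an analytic 
component and a conjugate image.
For the reverse operation, after quadrature mixing and ideal low pass filtering, the recovered 
baseband signal is
$$\begin{aligned}
r(t) &= \left\{ s'_c(t)\, (\cos 2\pi f_c t - j \sin 2\pi f_c t) \right\} * h_{LP}(t) \\
     &= \left\{ s'_c(t)\, e^{-j2\pi f_c t} \right\} * h_{LP}(t) \\
     &= \left\{ \tfrac{1}{2}\, s(t) + \tfrac{1}{2}\, s^*(t)\, e^{-j4\pi f_c t} \right\} * h_{LP}(t) \\
     &= \tfrac{1}{2}\, s(t)
\end{aligned}$$
where $*$ denotes convolution and $h_{LP}(t)$ is the impulse response of the ideal low pass filter.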
Figure 2.2 shows the conceptual procedure in the forward operation. For the reverse 
operation, the signal (analytic component plus conjugate image) in figure 2.2(c) is one-side 
shifted such that the analytic component is at baseband (figure 2.2(a)), and the conjugate 
image at $-2f_c$ (the so-called double frequency terms) is removed through lowpass filtering. 
The physical realisation, the so-called quadrature modulator, is shown in figure 2.3. Both 
the forward and reverse frequency translations have the same form, and common high 
frequency devices are bidirectional. Here the direction of the one-sided frequency shift is 
interchanged in the forward and reverse operation, as opposed to the above analysis and 
spectral interpretation. This can be proved by using negative $f_c$. This suggests that negative 
analytic signals without positive frequency components have the same important properties 
as analytic signals (with no negative frequency components). The term quadrature 
multiplexing has been used to describe the sharing of the same bandwidth for two real 
signals. Considering these two signals as components of a complex signal, the term complex 
multiplexing seems appropriate.
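The forward and reverse operations can be sketched numerically as follows. This is an illustration under stated assumptions, not the realisation of figure 2.3: the signal is a short periodic sequence, the ideal low pass filter is applied by zeroing DFT bins, and the block length `N`, carrier bin `KC` and cutoff bin are arbitrary demonstration values.

```python
import cmath
import math

N, KC = 64, 16   # assumed block length and carrier bin

def tone(k, n):
    """Complex sinusoid at DFT bin k, sample n."""
    return cmath.exp(2j * math.pi * k * n / N)

def ideal_lowpass(x, cutoff_bin):
    """Keep only DFT bins within +/- cutoff_bin of d.c. (periodic signal assumed)."""
    X = [sum(x[n] * tone(-k, n) for n in range(N)) for k in range(N)]
    kept = [X[k] if min(k, N - k) <= cutoff_bin else 0 for k in range(N)]
    return [sum(kept[k] * tone(k, n) for k in range(N)) / N for n in range(N)]

# Complex baseband signal with an asymmetric spectrum (bins +3 and -2).
s = [tone(3, n) + 0.5j * tone(-2, n) for n in range(N)]

# Forward operation: shift up by the carrier and take the real part.
s_real = [(s[n] * tone(KC, n)).real for n in range(N)]

# Reverse operation: one-sided shift down, then remove the image near -2*KC.
mixed = [s_real[n] * tone(-KC, n) for n in range(N)]
r = ideal_lowpass(mixed, 8)

# The recovered signal should equal half the original baseband signal.
max_err = max(abs(r[n] - 0.5 * s[n]) for n in range(N))
print(max_err)
```

The factor of one half arises because only the analytic component survives the filter; the conjugate image carries the other half of the real signal.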
Figure 2.2 Spectral illustration of quadrature modulation: (a) complex baseband signal; (b) analytic signal $|S_c(f)|$; (c) real bandpass signal $|S'_c(f)|$.
Figure 2.3 Quadrature modulator: (a) quadrature modulator; (b) up conversion; (c) down conversion.
2.2.3 Two-sided Frequency Translation
If $s(t)$ is real, equation 2.1 becomes
$$s'_c(t) = s(t) \cos 2\pi f_c t$$
In continuous wave modulation, this two-sided frequency shift is double-sideband 
suppressed carrier modulation, and the multiplication is commonly realised by the balanced 
modulator. Although this realisation is attractive, involving a single multiplication device, 
the resulting signal remains complex if the baseband signal is complex. The modulated 
spectrum for a real baseband signal, shown in figure 2.4, exhibits redundant symmetries that 
imply inefficient use of bandwidth. Therefore using single-sideband modulation can 
reduce the modulated signal bandwidth by half. Direct realisations first obtain the analytic 
component through filtering or Hilbert transform, then this complex signal is frequency 
shifted using a quadrature modulator. In digital modulation schemes where the signal is 
artificially synthesized, it is simpler to generate a complex baseband signal directly.
Figure 2.4 Double-sideband suppressed carrier modulation: (a) real baseband signal; (b) real bandpass signal.
2.3 Uniform Sampling Theorem for Arbitrary Signals
In ideal sampling, for a continuous signal $s(t)$, we obtain the instantaneous values $s(nT)$, 
and represent these discrete-time values as an infinite precision sequence in the digital 
domain. The sampling period $T$ is normally dropped because in the digital domain, the 
sampling rate is only reflected by the operating rate of the DSP system in real time. The 
corresponding continuous-time representation of these instantaneous values is by ideal 
impulses uniformly spaced in time:
$$s_0(t) = \sum_{n=-\infty}^{\infty} s(t)\, \delta(t - nT) = \sum_{n=-\infty}^{\infty} s(nT)\, \delta(t - nT)$$
Ideal A/D and D/A conversions involve alternations between the above continuous-time 
and discrete-time representations, both based on instantaneous values of underlying 
analogue signals. We emphasise that any digital signal, in the form of a numeric sequence, 
has an equivalent continuous-time representation as a weighted impulse train, not just at 
the analog-digital boundary. We often prefer the impulsive representation: the absolute 
sampling period is naturally retained, which is well suited to illustrating multirate sampling, 
and we can conveniently use Fourier analysis for a digital signal. Let the Fourier transform of a 
continuous signal be $S(f)$; the frequency spectrum of the sampled signal is obtained by 
repeating $S(f)$ at harmonics of the sampling frequency $F = 1/T$. Conversely, any digital signal
has a periodic spectrum, with an associated, though not necessarily unique, continuous 
signal. Figure 2.5 (a) and (b) shows the spectral interpretation. Here the original signal can 
be reconstructed through suppressing the undesirable harmonic images with a low pass 
filter. The conditions for perfect reconstruction are established by the uniform sampling 
theorem² [Oppenheim83a, p. 517]:
For a continuous signal $s(t)$ with bandlimited Fourier transform $S(f)$ 
such that $|S(f)| = 0$ for $|f| > F_M$, $s(t)$ can be uniquely determined 
without error by the set of its values at regularly spaced intervals of 
period $T = 1/F$, provided that the sampling frequency $F > 2F_M$.
Figure 2.5 Continuous signals with identical sampled spectra: (a) baseband signal; (b) sampled spectrum of (a), (c) and (d); (c) complex signal; (d) real bandpass signal.
2 According to Oppenheim, the sampling theorem appears explicitly in communication theories in 1949 due 
to Shannon. Earlier in 1928, Nyquist pointed out the minimum required sampling frequency.
The sampling frequency $2F_M$ is commonly called the Nyquist rate. Sampling at this rate
is called critical sampling; otherwise, over or under sampling. Although this theorem holds 
for arbitrary signals, its usefulness is limited to a narrow class. Figure 2.5 (c) and (d) shows 
two signals that have the same spectrum (b) after sampling at the same rate, as the signal 
in (a). If we follow the sampling theorem directly, a much higher sampling frequency is 
required. Also, given the sampled signal alone, we have an infinite number of possible 
original signals. Informally, this can be stated in a more useful way:
For a continuous bandlimited signal $s(t)$ with two-sided bandwidth 
$B$, $s(t)$ can be uniquely determined without error by the set of its 
values at regularly spaced intervals of period $T = 1/F$, provided that 
the sampling frequency $F > B$, and the spectral location of $s(t)$ on 
the frequency axis is given.
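The role of the known spectral location can be illustrated with a single complex tone. The sketch below is an assumption-laden example, not part of the original text: the tone frequency, sampling rate and band position are arbitrary, and the function name `locate` is hypothetical. The samples alone fix the frequency only modulo $F$; the stated band removes the ambiguity.

```python
import cmath
import math

def locate(samples, F, band_start):
    """Frequency of one complex tone known to lie in [band_start, band_start + F)."""
    # The phase advance per sample gives the frequency modulo F.
    step = cmath.phase(samples[1] * samples[0].conjugate())
    f_alias = step / (2 * math.pi) * F
    # Place that alias inside the band where the signal is known to be.
    return band_start + (f_alias - band_start) % F

F = 1000.0    # sampling rate, larger than the occupied bandwidth
f0 = 7300.0   # tone far above F/2, so the classical theorem would demand F > 14600
samples = [cmath.exp(2j * math.pi * f0 * n / F) for n in range(4)]

f_in_band = locate(samples, F, 7000.0)   # correct spectral location supplied
f_baseband = locate(samples, F, 0.0)     # same samples, different assumed location
print(f_in_band, f_baseband)
```

Both answers are consistent with the samples, which is exactly the non-uniqueness noted above; only the side information about the band selects the original component.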
Avoiding the use of the Nyquist rate, we use sufficient sampling to refer to the case where 
the original signal can be reconstructed without error. Although complex signals are 
common in DSP, the sampling of complex signals has received little attention. The reason is 
that complex continuous signals are rarely used and therefore rarely sampled directly. 
When complex signals arise internally in DSP systems, the system functions are justified 
through mathematical equivalence. When the sampling rate is discussed explicitly 
[Gardner85a, sec. 4.2], the emphasis is on special cases as in figure 2.6. In (a), the uniform 
sampling theorem applies directly, and the minimum sampling rate of the baseband complex 
signal is the Nyquist rate. In (b), the bandwidth of the associated real signal is doubled, 
and therefore so is the minimum sampling rate. Although this is a particular case, it 
depicts the fact that there is no apparent disadvantage in using complex signals: we have either a 
single-valued sequence at the high rate, or ordered pairs at half the rate.
Figure 2.6 Association of complex baseband and real signals: (a) complex baseband signal; (b) associated real signal.
In emphasising the modified sampling theorem, we justify the meaningful existence of 
complex signals at certain sampling rates, at any point in a DSP system, not just the 
analog-digital boundary. The simplicity in sampling complex signals is emphasised again 
in figure 2.7. There is no axis of symmetry and we need not keep track of the $f = 0$ point. 
Complex sampling may not be practical when the original complex signal is continuous. 
However, this relationship is essential in multirate systems, where both signals are digital. 
It may be noticed that in the T-SAT SCPC modems, for flexibility and high quality, dual 
D/A, A/D convertors have been used for complex analogue signals, after careful 
considerations of all alternatives.
Figure 2.7 Bandpass complex sampling: (a) complex bandpass signal; (b) sampled spectrum.
Another useful form of sampling is for real bandpass signals as in figure 2.8. Only signals 
occupying particular bands on the frequency axis do not result in spectral overlapping of 
their analytic components and conjugate images after sampling. This so-called integer-band 
sampling has been extensively discussed in the literature, for example, [Crochiere83a, sec. 
2.4], and conditions for sufficient sampling have been established. Alternatively, we can 
treat the analytic component and the conjugate image as two separate signals. We use the 
modified sampling theorem to work out the two sampled spectra separately, and then see 
if these spectra overlap. We emphasise that this is only important when sampling a 
continuous real bandpass signal. Once in the digital domain, we can replace real bandpass 
signals by complex signals in various ways as the opportunity arises. Then we can use the 
simple relations for sampling complex signals without resorting to graphical means. Since 
discontinuous bandwidth is inconvenient to express, it is common to use graphical 
solutions, although too simple to be considered as such.
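The alternative check described above, treating the analytic component and the conjugate image as two separate signals and testing whether their sampled spectra collide, can be sketched as follows. The function name `images_overlap` and the band edges are hypothetical, and a positive lower band edge is assumed.

```python
def images_overlap(f_lo, f_hi, F):
    """True if sampling a real bandpass signal at rate F causes spectral overlap.

    The analytic component occupies (f_lo, f_hi) and the conjugate image
    occupies (-f_hi, -f_lo); f_lo > 0 is assumed.
    """
    if f_hi - f_lo > F:
        return True   # one component alone is wider than a spectral period
    k_max = int(2 * f_hi / F) + 1
    for k in range(-k_max, k_max + 1):
        # k-th periodic repetition of the analytic component.
        lo, hi = f_lo + k * F, f_hi + k * F
        # Does this repetition intersect the conjugate image (open intervals)?
        if lo < -f_lo and -f_hi < hi:
            return True
    return False

# Band from 20 to 25 units: at F = 10 it spans exactly 2F to 2.5F.
overlap_at_10 = images_overlap(20, 25, 10)
# A higher rate is not automatically sufficient for a real bandpass signal.
overlap_at_12 = images_overlap(20, 25, 12)
print(overlap_at_10, overlap_at_12)
```

In this example the lower rate is sufficient while the higher rate is not, which shows why the admissible rates for real bandpass signals form discontinuous ranges.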
Figure 2.8 Integer-band sampling: (a) real bandpass signal; (b) sampled spectrum.
2.4 Complex Filtering
Complex filtering is illustrated in figure 2.9. Here a real bandpass signal is chosen to 
illustrate that it is conceptually simple to obtain the analytic component or the conjugate 
image. This is one of the justifications for our prime attention to complex signals. Without 
symmetry, the line $f = 0$ on any spectrum is unimportant.
Figure 2.9 Complex filtering: (a) real bandpass signal; (b) complex bandpass filter; (c) filtered signal; (d) associated digital filter.
Theoretically, a complex filter can be obtained from a lowpass prototype:
$$h(t) = h_{LP}(t)\, e^{j2\pi f_c (t - \tau)}$$
This one-sided frequency translation of the impulse response of a filter is indistinguishable 
from that of a signal in theory. If we take arbitrary timing reference, the filtered signal will 
experience a complex constant gain. In communications, this phase rotation will be 
automatically taken care of in the process of synchronization. Therefore we shall enjoy 
this advantage and ignore timing references in following discussions.
Analogue complex filters are impractical. However, their digital counterparts implemented 
through Finite Impulse Response (FIR) filters are very important. In the digital domain, a 
complex filter is given by
$$h(nT) = h_{LP}(nT)\, e^{j2\pi f_c nT}$$
Complex filters with frequency translations are encountered frequently in filter bank 
operations. However, the emphasis is on the equivalence of frequency shifting real signals 
prior to real filtering [Crochiere83a, sec. 7.7.2]. We introduce analogue complex filters to 
illustrate the validity of their digital counterparts. Digital filters have periodic spectra, the 
same as digital signals, shown in figure 2.9. If the theoretical impulse response of an 
analogue filter is sufficiently sampled, the basic shape of the filter is preserved. Digital 
filtering is the multiplication of a filter spectrum with a signal spectrum, both being periodic.
In contrast to analogue complex filters, both the design and realisation of FIR complex 
filters are straightforward. We obtain the real lowpass prototype using common FIR design 
techniques. The explicit finite sequence of filter coefficients $h_{LP}(nT)$ is indistinguishable 
in theory and practice from a finite time signal. Therefore we can shift the spectrum of the 
filter using complex sinusoids prior to system initialization, just as we do for signals in 
real time. Theoretical filtering, i.e., spectral multiplication, is the convolution of two 
signals, one being the impulse response of a filter. Because of finite impulse response, 
filtering can be performed directly through discrete convolution in real time, without 
involving explicitly any form of transfer function. For a valid digital lowpass prototype, 
its impulse response is sufficiently sampled. Therefore the associated complex filter is also 
sufficiently sampled, independent of $f_c$. This centre frequency is arbitrary in the sense that 
it can be larger than the sampling frequency. The complex exponential function limits the 
actual centre frequency to $f_c \bmod F$ automatically.
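The construction can be sketched in a few lines. The example below is illustrative only: the Hamming-windowed sinc prototype is an assumed stand-in for "common FIR design techniques", and the tap count, cutoff, rates and function names are arbitrary choices for the demonstration.

```python
import cmath
import math

def lowpass_prototype(num_taps, cutoff, F):
    """Real lowpass prototype: Hamming-windowed sinc (an assumed design method)."""
    mid = (num_taps - 1) / 2.0
    x = 2.0 * cutoff / F   # cutoff as a fraction of half the sampling rate
    h = []
    for n in range(num_taps):
        t = n - mid
        ideal = x if t == 0 else math.sin(math.pi * x * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    return h

def shift_filter(h_lp, fc, F):
    """Complex filter h(nT) = h_LP(nT) * exp(j*2*pi*fc*n*T) with T = 1/F."""
    return [c * cmath.exp(2j * math.pi * fc * n / F) for n, c in enumerate(h_lp)]

def response(h, f, F):
    """Frequency response of the FIR filter h at frequency f."""
    return sum(c * cmath.exp(-2j * math.pi * f * n / F) for n, c in enumerate(h))

F = 8000.0
h_lp = lowpass_prototype(31, 500.0, F)
h_a = shift_filter(h_lp, 2000.0, F)
h_b = shift_filter(h_lp, 2000.0 + F, F)   # centre frequency beyond the sampling rate

coeff_diff = max(abs(a - b) for a, b in zip(h_a, h_b))
gain_centre = abs(response(h_a, 2000.0, F))
gain_proto_dc = abs(response(h_lp, 0.0, F))
gain_mirror = abs(response(h_a, -2000.0, F))
print(coeff_diff, gain_centre, gain_proto_dc, gain_mirror)
```

The two coefficient sets agree to rounding error, the gain at the new centre equals the d.c. gain of the prototype, and the mirror frequency is rejected because the complex filter has no symmetry about $f = 0$.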
2.5 Bandpass Multirate Digital Signal Processing
This section shows that multirate systems (linear periodically time-varying in general) can 
be easily explained by underlying single-rate processes (linear time-invariant). 
Furthermore, decimation can be described as digital-to-digital resampling. Digital 
anti-aliasing filters (DAAF) share the same design considerations as analogue anti-aliasing 
filters (AAAF) in analog-to-digital sampling. Interpolation can be described as 
digital-to-digital reconstruction. Digital anti-imaging filters (DAIF) share the same design 
considerations as analogue anti-imaging filters (AAIF) in D/A reconstruction.
To simplify multirate designs, we introduce the concept of sampling invariance, and we 
concentrate on the most general case, complex bandpass signals and multirate complex 
bandpass filters. We shall show that the design procedures for complex bandpass filters 
are more convenient than real bandpass filters. Commonly used modulo operators or 
graphical solutions are not necessary. (As a simple example, in integer-band sampling we 
have used the spectrum to show the constraints on the signal bandwidth location. The
location of signal centre frequency can also be stated mathematically as particular integer 
multiples of the sampling frequency.) This will lead us to multistage complex filters, 
which are absent from [Crochiere83a, sec. 5] or limited to particular cases [Gardner85a, 
sec. 3.5]. Multistage complex filters are very important in digital I.F. processing, as we 
shall see in later sections.
2.5.1 Decimation
In sampling rate compression (or down-sampling) by an integer factor M, a sequence 
denoted by $s(nT)$ is modified by retaining only one sample for every $M$ successive samples, 
becoming $s(nMT)$. This is illustrated in figure 2.10. (For complex signals, only one 
component is shown.) The effect is easily explained by considering an analogue signal 
sampled by two different rates, one being an integer multiple of the other. Figure 2.11 
shows the corresponding spectrum when the analogue signal is sufficiently sampled in 
both cases. When there are undesirable frequency components, anti-aliasing filters must 
be used to protect the signal bandwidth of interest — to avoid overlapping of desirable 
and undesirable frequency components. Decimation refers to the integrated process of 
digital anti-aliasing filtering and sampling rate compression.
Figure 2.10 Sampling rate compression: (a) continuous signal $s(t)$; (b) sampling at rate $F = 1/T$, giving $s(nT)$; (c) sampling at rate $F/3$, giving $s(3nT)$.
Figure 2.11 Sampling rate compressed spectrum: (a) continuous spectrum; (b) sampling at rate $F$; (c) sampling at rate $F/3$.
The gain factor associated with sampling is rarely considered in analog-to-digital sampling. 
In practice, this is determined by the A/D convertor: the full scale input range and the 
number of output bits. The theoretical scaling factor is absorbed internally in the A/D 
convertor. In sampling a continuous signal with spectrum $S(f)$ at rate $F$, the sampled 
spectrum can be denoted by $F \cdot S(f - kF)$, $k = 0, \pm 1, \pm 2, \ldots$. For the sampling rate 
compressed signal, this becomes $\frac{F}{M} \cdot S(f - k\frac{F}{M})$. Although the spectral scaling factor is
different, no gain adjustment is required in sampling rate compression, as evident in the 
time domain description. This scaling factor relationship is a direct consequence of the 
Fourier transform. For different sampling frequencies, the period and the scaling factor 
have to agree to represent the same underlying continuous signal with equal power.
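The agreement between period and scaling factor can be written out as a short worked step. This sketch assumes the standard Fourier pair for a unit impulse train (a textbook result that is not derived in this section) and reads the notation $F \cdot S(f - kF)$ as the sum of all images over $k$:

```latex
\begin{align*}
s(t)\sum_{n}\delta(t-nT)
  &\;\longleftrightarrow\; \frac{1}{T}\sum_{k} S\!\left(f-\frac{k}{T}\right)
   = F\sum_{k} S(f-kF), \\
s(t)\sum_{n}\delta(t-nMT)
  &\;\longleftrightarrow\; \frac{1}{MT}\sum_{k} S\!\left(f-\frac{k}{MT}\right)
   = \frac{F}{M}\sum_{k} S\!\left(f-k\,\frac{F}{M}\right).
\end{align*}
```

The impulse weights $s(nMT)$ in the second line are a subset of the weights $s(nT)$ in the first, which is why no gain adjustment is needed in the time domain even though the spectral scale changes from $F$ to $F/M$.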
Figure 2.12 shows the operation of an analogue anti-aliasing filter (AAAF) for sampling 
a continuous signal. The same considerations apply to resampling a digital signal as in 
figure 2.13. Here a digital anti-aliasing filter (DAAF) is used. The spectrum of a digital
filter has the same periodic characteristics as that of a digital signal. If each periodic image 
of the DAAF is identical to the AAAF, the directly sampled signal in figure 2.12(b) and 
the decimated signal in figure 2.13(b) are identical.
Figure 2.12 Analogue anti-aliasing filter: (a) continuous signal spectrum; (b) sampled signal spectrum at rate $F$.
Figure 2.13 Digital anti-aliasing filter: (a) sampled signal spectrum at rate $3F$; (b) sampled signal spectrum at rate $F$.
The design of the DAAF bandwidth is greatly simplified using complex signals and filters, 
and the modified sampling theorem. The bandwidth considerations are the same as 
baseband processing, which are simpler than two-sided real bandpass signals or filters. 
Let the signal bandwidth be $B$ and the decimated sampling frequency $F$. For the lowpass 
prototype filter, the two-sided passband bandwidth equals $B$. From figure 2.13(b), the 
minimum sampling rate equals half the total filter bandwidth, plus half the passband 
bandwidth. The conventional passband and stopband edge frequencies for the lowpass 
prototype are
$$f_p = \frac{B}{2}, \qquad f_s = F - \frac{B}{2}$$
This filter specification will be referred to as minimum protection of the desired signal 
bandwidth, and the filter order required is minimum. The required filter is finally obtained 
by frequency shifting the prototype to coincide with the bandwidth of the signal. In this 
way, analysis using modulo operator or graphical solution is not necessary. This will be 
illustrated throughout the MCD design. From figure 2.14, we see that there are significant 
alias components due to undesirable adjacent signals, which seem difficult to remove. 
However, this is a feature of the original frequency plan and not an undesirable limitation 
of the DAAF. If there are guard bands at the original frequency plan, these guard bands 
can be regarded as part of the signal bandwidth and therefore protected. In practice, the 
guard bands should be partially protected with a more relaxed filter specification, as guard 
bands are normally provided to ease filter realisations in the first place. Efficient 
implementations of digital decimators avoid the computation of the unnecessary signal 
samples that are discarded during compression.
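The two ideas in this subsection, the minimum-protection band edges and the avoidance of discarded output samples, can be sketched as follows. This is a simplified direct-form illustration, not the multirate structure developed later in the thesis; the function names and the small integer test data are hypothetical.

```python
def prototype_edges(B, F):
    """Passband and stopband edges of the lowpass prototype for minimum protection.

    B is the two-sided signal bandwidth and F the decimated sampling frequency.
    """
    if F < B:
        raise ValueError("decimated rate must be at least the signal bandwidth")
    return B / 2.0, F - B / 2.0

def decimate(x, h, M):
    """FIR filter and compress by M, computing only the outputs that are kept."""
    y = []
    for m in range(0, len(x), M):
        acc = 0
        for k, c in enumerate(h):
            if m - k >= 0:
                acc += c * x[m - k]
        y.append(acc)
    return y

def filter_then_discard(x, h, M):
    """Reference version: compute every output, then throw away M-1 of every M."""
    full = [sum(c * x[n - k] for k, c in enumerate(h) if n - k >= 0)
            for n in range(len(x))]
    return full[::M]

f_p, f_s = prototype_edges(4.0, 10.0)

x = list(range(12))   # toy input sequence
h = [1, 2, 1]         # toy anti-aliasing filter
y_fast = decimate(x, h, 3)
y_ref = filter_then_discard(x, h, 3)
print(f_p, f_s, y_fast, y_ref)
```

Both routines return the same sequence, but the first performs roughly one $M$-th of the multiplications because it never evaluates an output that compression would reject.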
2.5.2 Interpolation
In sampling rate expansion (or up-sampling) by an integer factor L, a signal sequence is 
modified by inserting $L - 1$ zeros between samples. This is illustrated in figure 2.14. Using 
the sequence representation, the sampled signal is a sequence of numbers. Because of 
expansion, the signal sequence is modified, apparently lengthened and therefore the 
sampling rate is increased. However, using the equivalent impulse representation as shown, 
there is no modification to the original signal and therefore both signals have the same 
Fourier transform. To correspond to an analogue signal sampled at the higher frequency,
unwanted images have to be removed using a digital anti-imaging filter (DAIF). This is 
commonly called interpolation, illustrated in figure 2.15. The original and expanded 
spectra are the same, as in (a). If we consider the lower sampling frequency, there is a 
single image in every frequency interval $F$. If we consider the higher sampling frequency, 
there are $L$ images in each interval $L \cdot F$. Therefore $L - 1$ images have to be removed. The 
DAIF bandwidth can only be represented at the high rate with zero samples inserted. At 
the lower rate, this bandwidth is larger than the sampling frequency.
Figure 2.14 Sampling rate expansion: (a) $s(nT)$; (b) $s(nT/3)$.
Figure 2.15 Digital anti-imaging filter: (a) sampling rate expanded spectrum; (b) image suppressed spectrum.
In contrast to decimation, a gain factor of $L$ is required. Let $F \cdot S(f - kF)$ denote the 
original periodic spectrum that is obtained from sampling the signal with spectrum $S(f)$ at rate $F$. The 
sampling rate expanded spectrum remains the same. After the removal of undesirable 
images, the Fourier transform becomes $F \cdot S(f - kLF)$. However, the Fourier transform for 
directly sampling the same signal at rate $L \cdot F$ is $L \cdot F \cdot S(f - kLF)$. The gain $L$ is usually provided by 
the DAIF.
The design of the DAIF is straightforward compared with the DAAF, without the potential 
problem of spectral overlapping. The prototype filter specification is exactly the same as 
in decimation above. If $B$ is the finite bandwidth of the signal to be interpolated, this 
specification corresponds to maximum suppression. Here all harmonic components of the 
original signal are suppressed. If partial image suppression is adequate for the application, 
a lower order filter suffices. The same filter specification applies to digital-to-analog 
reconstruction, where the spectra of both the AAIF and the final reconstructed signal are 
non-periodic. Efficient digital interpolators avoid multiplication involving the zero-valued 
samples.
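A minimal direct-form sketch of such an interpolator is given below. It is an illustration under assumptions rather than the structure derived later: the triangular (linear interpolation) filter is chosen only because its d.c. gain equals $L$, and the function names are hypothetical.

```python
def interpolate(x, h, L):
    """Expand by L and FIR filter, skipping products with the inserted zeros.

    The filter h is assumed to include the gain factor L.
    """
    y = []
    for n in range(len(x) * L):
        acc = 0.0
        # Only taps k with (n - k) a multiple of L meet an original sample.
        for k in range(n % L, len(h), L):
            i = (n - k) // L
            if i >= 0:
                acc += h[k] * x[i]
        y.append(acc)
    return y

def zero_insert_then_filter(x, h, L):
    """Reference version: insert L-1 zeros explicitly, then convolve."""
    up = []
    for v in x:
        up.append(v)
        up.extend([0.0] * (L - 1))
    return [sum(c * up[n - k] for k, c in enumerate(h) if n - k >= 0)
            for n in range(len(up))]

L = 3
h = [1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3]   # triangular filter with d.c. gain L
x = [3.0, 6.0, 9.0]

y_fast = interpolate(x, h, L)
y_ref = zero_insert_then_filter(x, h, L)
dc_gain = sum(h)
print(dc_gain, y_fast)
```

With the gain of $L$ included in the filter, the interpolated samples keep the amplitude of the original sequence; without it they would be scaled down by $L$.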
2.5.3 Spectral Mapping
In the advanced network concepts of multirate systems [Crochiere83a, sec. 3.5], sampling 
rate change is accompanied by a mapping of signal spectra from the old frequency axis to 
the new, as shown in figure 2.16. For the Fourier transform of a signal sampled at $F_0$, 
only the spectrum in the principal interval from 0 to $F_0$ enters consideration. Since the 
signal spectrum is periodic, the spectral shapes at other frequency intervals are identical. 
After sampling rate expansion (zero insertion) by 3, the mapping results in 3 identical 
harmonic images in the new principal interval $[0, F_1]$, as shown by the arrows. To be 
considered sampled at $F_1$, 2 of the images have to be removed using a filter designed and 
operated at the high rate. For sampling rate compression (sample rejection) by 3, the 
bandwidth in the principal interval $[0, F_1]$ has to be limited to $F_0$ by a filter designed and 
operated at the high rate, otherwise aliasing (spectral overlapping) occurs. Efficient 
implementations of these filters are based on avoiding redundant computations. They are 
multiplications with zero in interpolation, and computations of samples destined for 
rejection in decimation. As evident from the mapping concept, common filter specification 
design procedures are either graphical or involve modulo operators, concentrating on the 
principal interval. Although in simple cases, inspections suffice, these become tedious for 
multistage designs involving bandpass signals.
Figure 2.16 Multirate spectral mapping
Analogue signals can be considered to have an infinite sampling frequency. This is a 
particular case in spectral mapping when one principal interval has infinite length. 
Therefore sampling and reconstruction share the same design considerations as decimation 
and interpolation. The main distinction is that in resampling, both the spectra of signals 
and filters are periodic.
2.5.4 Sampling Invariance
We can simplify the design of multirate systems by an original concept: frequency 
components are considered as invariable. The common emphasis on the principal 
interval is not beneficial. Let us consider a frequency component in the analogue domain, 
$f_0$. Sampling with rate $F_0$, harmonics of $f_0$ appear at $f_0 - kF_0$, for $f_0 < F_0$, $k = 0, \pm 1, \pm 2, \ldots$ 
(If $f_0 > F_0$, harmonics appear at $(f_0 \bmod F_0) - kF_0$.) After decimating to rate $F_1 = F_0/M$,
harmonics appear at $f_0 - kF_1$. If only the principal interval is considered, $f_0$ appears to be 
shifted to $f'_0 = f_0 \bmod F_1$. This is commonly regarded in the DSP literature as the frequency 
shift associated with decimation. However, a component still exists at $f_0$, independent of 
the sampling frequency. Therefore in multirate systems, it is best to disregard the principal 
interval and consider that the original frequency component has not shifted at all. If the sampling
rate is further compressed to $F_2$, adhering to the principal interval in every sampling rate
change, the new component is located at $f''_0 = (f_0 \bmod F_1) \bmod F_2$. This is unnecessary as 
$f''_0 = f_0 \bmod F_2$. Therefore we can focus on the original analogue component $f_0$ for any 
sampling frequency thereafter.
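The redundancy of stage-by-stage bookkeeping is easy to confirm with a few numbers. The rates and the component frequency below are arbitrary illustrative values, chosen so that each rate is an integer fraction of the previous one, and the helper name `principal` is hypothetical.

```python
def principal(f, F):
    """Map a frequency f into the principal interval [0, F)."""
    return f % F

F0 = 48000     # original sampling rate (assumed)
F1 = F0 // 4   # rate after the first decimation stage
F2 = F1 // 3   # rate after the second decimation stage
f0 = 30500     # analogue component of interest (assumed)

# Tracking the component through the principal interval of every stage.
stepwise = principal(principal(f0, F1), F2)
# Referring directly to the original component at the final rate.
direct = principal(f0, F2)

# The original frequency is itself an image location at every rate.
offsets = [(f0 - principal(f0, F)) % F for F in (F0, F1, F2)]
print(stepwise, direct, offsets)
```

Since both routes give the same location, only the original component and the final sampling rate need to be carried through a multistage design.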
Harmonic images are equivalent. Operations in the digital domain are periodic, e.g., 
filtering, therefore we can focus on any one of the harmonic images as if it is a continuous 
signal. The centre frequency of a signal is arbitrary in the sense that it can be larger than
the sampling frequency. In this case we are focusing on a different but equivalent image. 
Because of the existence of multiple sampling rates, the common practice of focusing on 
the principal interval of signal spectra from 0 to the sampling frequency $F$ is no longer 
convenient. We can focus on any frequency interval with length $F$. For example, we can 
filter the spectral image centred at $f_c$ by frequency translating the prototype filter $h_{LP}(n)$ 
to become $h_{LP}(n)\,e^{j 2\pi n f_c / F}$. We also can filter the image centred at $f_c + F$ to become 
$h_{LP}(n)\,e^{j 2\pi n (f_c + F)/F}$, which is identical with the former case. The same applies to arbitrary 
translations of the images themselves using $e^{j 2\pi n f_t / F}$. If we focus on $f_c$, the new focus is at $f_c + f_t$. 
This avoids the overhead of adhering to the principal interval using modulo operators, 
where the new focus in the principal interval is $f = ((f_c \bmod F) + (f_t \bmod F)) \bmod F$.
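The equivalence of images can be illustrated with a short sketch. It assumes the translation is a multiplication by $e^{j 2\pi n f / F}$; the filter coefficients, the frequencies and the names `translate`, `f_c` and `f_t` are hypothetical.

```python
import cmath

def translate(h, f, F):
    """Multiply h(n) by exp(j*2*pi*n*f/F), i.e. shift its spectrum by f."""
    return [c * cmath.exp(2j * cmath.pi * n * f / F) for n, c in enumerate(h)]

F = 8000.0                              # sampling frequency (hypothetical)
f_c = 1500.0                            # centre frequency of the image in focus
f_t = 7250.0                            # a further translation, f_c + f_t > F
h_lp = [0.1, 0.25, 0.3, 0.25, 0.1]      # toy lowpass prototype

# Filters aimed at the images centred at f_c and f_c + F are the same filter.
image_diff = max(abs(a - b) for a, b in
                 zip(translate(h_lp, f_c, F), translate(h_lp, f_c + F, F)))

# Two successive translations equal one translation by f_c + f_t,
# with no modulo reduction back into the principal interval.
chain_diff = max(abs(a - b) for a, b in
                 zip(translate(translate(h_lp, f_c, F), f_t, F),
                     translate(h_lp, f_c + f_t, F)))

print(image_diff < 1e-9, chain_diff < 1e-9)
```

Both differences are at the level of floating-point rounding, so the focus frequency can simply be accumulated as $f_c + f_t$.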
2.6 Realization of Multirate Filtering
Efficient implementations of basic multirate DSP functions avoid unnecessary 
computations, which is not straightforward to express mathematically. An original approach 
to the derivation of multirate structures is presented here, based on early parallelization of 
the input signal, output signal and filter coefficients. The novel expressions of 
polyphase-matrix structures for rational sampling rate conversion illustrate the conciseness 
of this approach. The alternative structure offered by Crochiere and Rabiner [Crochiere83a, 
p. 91] is derived from integerizing. Both the approach and resulting structure seem to be 
unnecessarily complicated. The new approach shows that practically used periodically 
time-varying processes can be routinely built up from time-invariant processes, e.g., simple 
digital filters, with straightforward periodic control sequences.
The aim of deriving polyphase (or similar) structures is to bridge the gap between theory 
and implementation. Once a polyphase structure is arrived at, the necessary computations, 
inherent parallelisms, and control strategies are then apparent. These can be formally 
represented using Synchronous Data-Flow (SDF) graphs [Lee87a], which are superior to 
conventional signal-flow graphs in terms of clarity and usefulness in applying CAD tools.
2.6.1 Periodically Time-Varying Systems
A combination of interpolation and decimation leads to sampling rate change by a rational 
ratio. This belongs to the general class of periodically time-varying systems.
Let $s(nT_0)$ denote a signal sampled with period $T_0$. The continuous-time representation is
$$s'(t) = \sum_{n=-\infty}^{\infty} s(nT_0)\,\delta(t - nT_0)$$
This signal is passed through a filter with impulse response h{t). By the definition of 
impulse response,
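$$r(t) = \sum_{n=-\infty}^{\infty} s(nT_0)\,h(t - nT_0)$$
If the filtered signal is sampled with period $T_1$, we have
$$r(kT_1) = \sum_{n=-\infty}^{\infty} s(nT_0)\,h(kT_1 - nT_0)$$
Let $T_0 = LT_2$ and $T_1 = MT_2$, where $L$ and $M$ are positive integers. The previous expression 
becomes
$$r(kT_1) = \sum_{n=-\infty}^{\infty} s(nT_0)\,h(kMT_2 - nLT_2)$$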
Since the index to the filter impulse response values is always an integer, a filter sampled 
at period $T_2$ is all that is required. This expression means that if the sampling rates of $s(n)$ 
and $r(n)$ are related by a rational ratio, all operations can be performed digitally without 
the above intermediate D/A conversion. With a digital FIR filter, the summation limit is 
finite and realizable, covering all the non-zero coefficients of $h(n)$. Since the elements of 
the sequences are unambiguously indexed, the sampling periods are no longer useful and 
can be dropped, giving
$$r(k) = \sum_{n=-\infty}^{\infty} s(n)\,h(kM - nL) \qquad (2.2)$$
This expression does not lead to direct implementation. For an FIR filter with finite 
non-zero coefficients, the convolution length is finite, but the limit for $n$ grows 
with $k$. The integerizing approach introduces the floor function $\lfloor u \rfloor$ that returns the largest 
integer less than or equal to $u$. Making the change of variables
$$n = \left\lfloor \frac{kM}{L} \right\rfloor - i$$
and applying it to equation (2.2) gives
$$r(k) = \sum_{i=-\infty}^{\infty} s\!\left(\left\lfloor \frac{kM}{L} \right\rfloor - i\right) h(iL + kM@L)$$
where $i@L$ represents the value of $i$ modulo $L$. Although this approach yields the 
minimum computation rate, a direct interpretation of this expression does not result in 
efficient control strategies. For example, Crochiere and Rabiner derived a sampling rate 
convertor that uses digital sample and hold constructs to realize the floor function in real 
time, which is not desirable in either software or hardware. In our approach, parallelization 
is applied at an early stage to equation (2.2), such that the change of variables does not 
require floor functions or modulo operators.
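As a check on the two expressions above, the sketch below evaluates equation (2.2) directly and in the integerized (floor and modulo) form, and confirms that they agree. It assumes a causal FIR filter and a finite input that is zero outside the stored samples; the function names and test data are hypothetical.

```python
def convert_direct(s, h, L, M, n_out):
    """Equation (2.2): r(k) = sum_n s(n) h(kM - nL)."""
    r = []
    for k in range(n_out):
        acc = 0
        for n, x in enumerate(s):
            j = k * M - n * L
            if 0 <= j < len(h):          # h is zero outside 0..N-1
                acc += x * h[j]
        r.append(acc)
    return r

def convert_floor(s, h, L, M, n_out):
    """Integerized form: n = floor(kM/L) - i, filter index iL + (kM mod L)."""
    r = []
    for k in range(n_out):
        base, phase = divmod(k * M, L)   # floor(kM/L) and kM modulo L
        acc = 0
        i = 0
        while i * L + phase < len(h):
            n = base - i
            if 0 <= n < len(s):          # s is zero outside the stored samples
                acc += s[n] * h[i * L + phase]
            i += 1
        r.append(acc)
    return r

s = [3, -1, 4, 1, -5, 9, 2, -6]
h = [1, 2, 3, 4, 5, 6, 7]
r_direct = convert_direct(s, h, L=3, M=2, n_out=12)
r_floor = convert_floor(s, h, L=3, M=2, n_out=12)
print(r_direct == r_floor)
```

The integerized form touches only the non-zero products, but it needs a division and a modulo per output sample, which is the control overhead the text objects to.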
2.6.2 Synchronous Data-Flow Graph Representation
This section illustrates the advantages of the Synchronous Data-Flow (SDF) model for 
specifying multirate systems. Let us first examine a simple FIR structure that is the 
fundamental building block of complicated multirate systems. For $L = M = 1$, equation 
(2.2) becomes the familiar discrete convolution:
$$r(k) = \sum_{n=-\infty}^{\infty} s(n)\,h(k - n)$$
By the change of variables $n = k - i$,
$$r(k) = \sum_{i=0}^{N-1} s(k - i)\,h(i)$$
where $N$ is the length of the FIR filter, i.e., the number of consecutive non-zero coefficients. 
The corresponding SDF graph for $N = 5$ is shown in figure 2.17, commonly known as the 
multiply-accumulate-delay structure.
Figure 2.17 SDF Graph of an FIR Structure (snapshot: input s(n) with signal indexes 
4 3 2 1 0 in the delay line, coefficient indexes h 0 1 2 3 4, output r(n))
Figure 2.18 An Abstract FIR Node
The SDF representation consists of nodes that perform specific functions, joined by arcs 
that represent signal flow paths. The SDF model is not a standardization of the symbols. 
Here the shaded circles represent branching nodes, the square boxes are delay elements, 
and the triangles are scalars, i.e., multiplications by a constant. The filter coefficients h{n) 
shown are fixed for each scalar, whilst the input signal slides across the delay elements. 
The sample values s{n) shown represent a snapshot of the system. Figure 2.17 is not much 
different from a conventional signal-flow graph. However, there are distinct rules for an 
SDF graph:
(1) Each node can have an arbitrary but fixed number of input and output arcs.
(2) Each arc joins exactly two nodes.
(3) An arc performs perfect transportation of samples. No samples sent by a node 
will be lost or arrive out of sequence.
(4) When a node performs its specific function (fired), an arbitrary but fixed 
number of samples are consumed from its input arcs, and produced at its 
output arcs.
(5) A node is fired only when all its required samples have arrived.
This formalism decouples the desired signal processing functions from the implementation 
details such as the control of a finite-state machine. In contrast, a conventional signal-flow 
graph can be directly executed as a piece of hardware, and it is well known that large scale 
digital design methodologies do not begin in such a low level abstraction.
It is apparent that an SDF model of a system assumes lossless asynchronism for correct 
operation. However, the data-flow is synchronous in the digital signal processing sense, 
because of the fixed topology, and the fixed numbers of samples consumed and produced 
by every node.
The advantages of SDF can be seen even in the operations of the primitive nodes. For an 
SDF branch node, each input sample is duplicated in each output arc. These samples can 
be consumed by the following nodes at different times. In contrast, for a signal-flow graph, 
all samples have to be consumed in parallel. If this is not the case, additional delay elements 
have to be introduced for correct operation, which are inherently linked to particular 
implementation techniques. For an SDF delay element, a sample is stored when received, 
and the previous sample is output. A zero-valued sample is stored as the previous sample 
during system initialization. In signal-flow graphs, a delay element is a shift register 
(word-wise) that performs input and output simultaneously.
Any subsection of an SDF graph can be regarded as a node, because the input/output 
relations to the outside system also satisfy the SDF rules. Therefore complex systems can 
be specified in hierarchical form. The abstraction of an FIR structure as a node is shown in 
figure 2.18. The indexes for the signal shown represent an instant of the node for illustrative 
purposes only. These and the fixed filter indexes may be omitted for brevity.
An SDF graph is a hierarchical block diagram, therefore more suitable for human 
interpretation. More important, through the formal specification of each node, a control 
strategy can be worked out automatically for different realizations. For example, in the 
transputer, the channel definition is exactly the function of an arc. Each node can therefore 
be simply programmed as a process. In conventional signal processors or computers, the 
execution order of node functions can be inferred from the SDF specifications. This static 
scheduling is the basis for automatic code generation (for multiple target languages) from 
a data-flow style specification of a system to a run-time efficient procedural language 
program. In case of multiple processors with synchronous or asynchronous 
communication, computer optimization can be applied to search for the most efficient 
parallel schedule.
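Static scheduling starts from the fixed sample counts on each arc. A minimal sketch follows, assuming the balance-equation method described in the SDF literature [Lee87a]: for every arc, firings of the source times samples produced must equal firings of the destination times samples consumed. The graph, node names and function name are hypothetical, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(nodes, arcs):
    """Smallest integer firing counts that balance every arc.

    arcs: list of (source, produced, destination, consumed).
    """
    rate = {nodes[0]: Fraction(1)}
    changed = True
    while changed:                       # propagate relative firing rates
        changed = False
        for src, prod, dst, cons in arcs:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    for src, prod, dst, cons in arcs:    # consistency check for graphs with loops
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent sample rates")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {node: int(r * scale) for node, r in rate.items()}

# A 4/3 rate convertor drawn as a chain of four nodes (hypothetical example).
nodes = ["source", "interp4", "decim3", "sink"]
arcs = [("source", 1, "interp4", 1),
        ("interp4", 4, "decim3", 3),
        ("decim3", 1, "sink", 1)]
reps = repetition_vector(nodes, arcs)
print(reps)
```

One period of the schedule fires the source and interpolator three times and the decimator and sink four times, after which every arc holds the same number of samples as at the start.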
2.6.3 Efficient Complex FIR Structures
For complex signals and filters, if the discrete convolution
$$r(k) = \sum_{i=0}^{N-1} s(k - i)\,h(i) = s(k) * h(k)$$
is implemented directly as complex multiplications, the inherent 
multiply-accumulate-delay sequence of a real FIR structure is destroyed. By defining
$$s(k) = x(k) + j\,y(k)$$
$$h(k) = f(k) + j\,g(k)$$
we have the relation
$$s(k) * h(k) = x(k) * f(k) + y(k) * [-g(k)] + j\,x(k) * g(k) + j\,y(k) * f(k)$$
We see that a complex filter can be implemented as 4 simple FIR structures, or 2 such 
structures by combining all the real and imaginary convolutions. Because of the similarity, 
real or complex filter structures will not be separately discussed.
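The decomposition into four real convolutions can be verified with a small sketch. It assumes zero initial state and one output per input sample; the helper names and data are hypothetical.

```python
def fir(x, h):
    """Real or complex FIR: r(k) = sum_i x(k - i) h(i), zero initial state."""
    return [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0)
            for k in range(len(x))]

def complex_fir_4real(s, h):
    """Complex filtering built from four real FIR structures."""
    x = [v.real for v in s]
    y = [v.imag for v in s]
    f = [c.real for c in h]
    g = [c.imag for c in h]
    neg_g = [-c for c in g]
    real_part = [a + b for a, b in zip(fir(x, f), fir(y, neg_g))]   # x*f + y*(-g)
    imag_part = [a + b for a, b in zip(fir(x, g), fir(y, f))]       # x*g + y*f
    return [complex(re, im) for re, im in zip(real_part, imag_part)]

s = [1 + 2j, -3 + 1j, 2 - 2j, 4j, 5 + 0j]
h = [1 - 1j, 2 + 3j, -1 + 2j]

reference = fir(s, h)                    # direct complex multiplications
four_real = complex_fir_4real(s, h)
max_diff = max(abs(a - b) for a, b in zip(reference, four_real))
print(max_diff < 1e-12)
```

Each of the four calls to `fir` keeps the multiply-accumulate-delay sequence of a real filter, which is the property the direct complex form loses.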
For a linear phase (constant group delay) real filter with $N$ taps
$$h(k) = h(N - 1 - k)$$
Therefore the necessary multiplications are approximately reduced by half. For the 
transformed complex filter with centre frequency $f_c$ and sampling frequency $F$,
$$h(k) = h_{LP}(k)\,e^{j 2\pi k f_c / F}$$
Narasimha and Peterson [Narasimha79a] stated the condition
such that $h(N - 1 - k) = \pm h^*(k)$. Therefore multiplications are reduced by a factor of 2. 
This property is used by Gardner [Gardner85a, p. 3.28] in application to MCDs.
Because of the restriction on the centre frequency, this reduction may not be universally 
applicable. Alternatively, the same symmetric property can be achieved by introducing a 
constant phase rotation to the complex filter [Yim89a]:
This original frequency shift plus phase shift transform is universally applicable. In 
demodulators, any constant phase rotation will not affect the process of carrier 
synchronization, which assumes an unknown phase rotation. In general applications where 
there are more than two complex filters, one single filter can compensate the resultant 
phase rotation of all other filters. Since symmetry cannot be exploited in the compensation 
filter, the shortest filter should be chosen for this purpose.
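The effect of a constant phase rotation can be sketched numerically. The exact transform of [Yim89a] is not reproduced in this text, so the example below is only an assumption: it references the phase of the complex exponential to the midpoint $(N-1)/2$ of the filter, which is equivalent to multiplying the plain frequency-shifted filter by the constant $e^{-j\pi (N-1) f_c / F}$. The function names and values are hypothetical.

```python
import cmath

def shifted_filter(h_lp, f_c, F, centred):
    """Frequency-shift h_lp to f_c; optionally reference the phase to the midpoint."""
    offset = (len(h_lp) - 1) / 2 if centred else 0.0
    return [c * cmath.exp(2j * cmath.pi * (k - offset) * f_c / F)
            for k, c in enumerate(h_lp)]

def symmetry_error(h):
    """Largest deviation from the conjugate symmetry h(N-1-k) = conj(h(k))."""
    N = len(h)
    return max(abs(h[N - 1 - k] - h[k].conjugate()) for k in range(N))

h_lp = [1.0, 3.0, 5.0, 3.0, 1.0]    # symmetric (linear phase) prototype
F = 8000.0
f_c = 1234.5                        # arbitrary centre frequency (hypothetical)

err_plain = symmetry_error(shifted_filter(h_lp, f_c, F, centred=False))
err_centred = symmetry_error(shifted_filter(h_lp, f_c, F, centred=True))
print(err_plain > 1e-3, err_centred < 1e-12)
```

With the rotation, the conjugate symmetry holds for any centre frequency, so the halving of multiplications no longer depends on a restricted choice of $f_c$.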
2.6.4 Decimation
Substituting L = 1 into equation (2.2), we have the decimation by M expression:
$$r(k) = \sum_{n=-\infty}^{\infty} s(n)\,h(kM - n)$$
The reduced computation is simply achieved by shifting $M$ samples into a conventional 
filter before computing one output. Since a straightforward implementation based on this 
description is not satisfactory in either hardware or real-time software, the polyphase 
structure is often examined.
Parallelizing the input signal by defining
$$n = mM - q, \qquad q = 0, 1, \ldots, M - 1$$
Double summation is then required for $n$, giving
$$r(k) = \sum_{q=0}^{M-1} \sum_{m=-\infty}^{\infty} s(mM - q)\,h(kM - mM + q)$$
$$r(k) = \sum_{q=0}^{M-1} \sum_{m=-\infty}^{\infty} s_q(m)\,h_q(k - m)$$
Changing variables by $m = k - i$, we have
$$r(k) = \sum_{q=0}^{M-1} \sum_{i=-\infty}^{\infty} s_q(k - i)\,h_q(i)$$
The parallelized filter (polyphase subfilters) and signal are
$$h_q(i) = h(iM + q)$$
$$s_q(i) = s(iM - q)$$
The SDF graph of a 4 to 1 decimator is shown in figure 2.19. The input signal is split by 
a serial-to-parallel commutator. 4 samples are taken from the input arc and deposited into 
the output arcs in an anti-clockwise sequence. The initial output arc is the lowest branch, 
i.e., $q = 3$, as opposed to Crochiere's commutator where $q = 0$. Because of the SDF 
constraints, the input signal is effectively advanced in time, i.e., $s_q(i) = s(iM - q + M - 1)$.
The result is a commutator less error prone for interpolation, with less fractional delay 
compared to the conventional model.
The attractiveness of the polyphase structure is the simplicity in scheduling both in 
hardware and software. Parallelism is easily exploited by using distributed adders instead 
of the summing node. However, the symmetry of a linear phase filter is not exploited. In 
alternative structures that make use of symmetry, control complexity is increased, 
parallelism is reduced, and additional delay elements are required.
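The polyphase decimator expressions can be exercised with a short sketch. It uses the definitions $h_q(i) = h(iM + q)$ and $s_q(i) = s(iM - q)$ without the time advance of the SDF commutator, assumes a zero signal before the first sample, and compares the result with full-rate filtering followed by discarding samples. The function names and data are hypothetical.

```python
def fir(x, h):
    """Full-rate FIR: y(t) = sum_i x(t - i) h(i), zero initial state."""
    return [sum(h[i] * x[t - i] for i in range(len(h)) if t - i >= 0)
            for t in range(len(x))]

def decimate_polyphase(s, h, M):
    """r(k) = sum_q sum_i s_q(k - i) h_q(i) with h_q(i) = h(iM + q)."""
    def s_q(q, i):                       # s_q(i) = s(iM - q), zero outside the data
        n = i * M - q
        return s[n] if 0 <= n < len(s) else 0
    subfilters = [h[q::M] for q in range(M)]
    r = []
    for k in range(len(s) // M):
        acc = 0
        for q in range(M):               # one short FIR per commutator branch
            for i, c in enumerate(subfilters[q]):
                acc += s_q(q, k - i) * c
        r.append(acc)
    return r

s = [((5 * n) % 7) - 3 for n in range(20)]   # arbitrary integer test signal
h = list(range(1, 21))                       # 20 taps, as in the 4 to 1 example
M = 4

r_reference = fir(s, h)[::M]                 # filter at full rate, keep every M-th
r_polyphase = decimate_polyphase(s, h, M)
print(r_polyphase == r_reference)
```

Each subfilter runs at the output rate, so the products that the reference version computes and then discards are never formed.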
Branch   Signal indexes      Filter indexes
s_0      19 15 11  7  3      h_0    0  4  8 12 16
s_1      18 14 10  6  2      h_1    1  5  9 13 17
s_2      17 13  9  5  1      h_2    2  6 10 14 18
s_3      16 12  8  4  0      h_3    3  7 11 15 19
Figure 2.19 Polyphase Decimator.
2.6.5 Interpolation
Substituting M = 1 into equation (2.2), we have the interpolation by L expression:
$$r(k) = \sum_{n=-\infty}^{\infty} s(n)\,h(k - nL)$$
This shows that only some of the filter coefficients are used for each output. The selection 
of coefficients is made clear by parallelizing the output signal according to
$$k = lL + p, \qquad p = 0, 1, \ldots, L - 1$$
Focusing on only one branch of the output signal, we have
$$r(lL + p) = \sum_{n=-\infty}^{\infty} s(n)\,h(lL + p - nL) = \sum_{n=-\infty}^{\infty} s(n)\,h[(l - n)L + p]$$
Defining the parallelized output signal and filter as
$$r_p(i) = r(iL + p)$$
$$h_p(i) = h(iL + p)$$
we get
$$r_p(l) = \sum_{n=-\infty}^{\infty} s(n)\,h_p(l - n)$$
By the change of variables $n = l - i$,
$$r_p(l) = \sum_{i=-\infty}^{\infty} s(l - i)\,h_p(i)$$
This original expression allows us to focus on one of the components of the parallelized
output signal, leading to more sophisticated structures later. Figure 2.20 shows a 1 to 4 
interpolator. The 'parallel' output signal is assembled (if required) by a 1 to 4 
parallel-to-serial commutator, taking one sample from each of the polyphase subfilters, 
and then outputting these samples in a counter-clockwise sequence. Figure 2.21 emphasises 
that the same signal samples are shared amongst all the subfilters.
If the prototype filter has only linear phase symmetry, at most one subfilter is symmetric. 
This lack of symmetry is not due to polyphase decomposition, but inherent. Still, 
multiplications can be reduced by approximately half if we store each product term for 
later use in other branches. This requires a large increase in memory, leading to alternative 
structures that are not regarded as polyphase. Therefore exploiting symmetry in structures 
involving interpolation is not attractive.
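The polyphase interpolator expression $r_p(l) = \sum_i s(l - i)\,h_p(i)$ can be checked against zero insertion followed by full-rate filtering. This is a minimal sketch assuming zero initial state; the function names and data are hypothetical.

```python
def interpolate_reference(s, h, L):
    """Insert L - 1 zeros after every sample, then filter at the high rate."""
    up = []
    for x in s:
        up.append(x)
        up.extend([0] * (L - 1))
    return [sum(c * up[t - i] for i, c in enumerate(h) if t - i >= 0)
            for t in range(len(up))]

def interpolate_polyphase(s, h, L):
    """r(lL + p) = sum_i s(l - i) h_p(i) with h_p(i) = h(iL + p)."""
    subfilters = [h[p::L] for p in range(L)]
    r = []
    for l in range(len(s)):
        for p in range(L):               # parallel-to-serial commutator order
            r.append(sum(c * s[l - i]
                         for i, c in enumerate(subfilters[p]) if l - i >= 0))
    return r

s = [2, -1, 3, 0, 4]
h = list(range(1, 21))                   # 20 taps, as in the 1 to 4 example
L = 4

r_reference = interpolate_reference(s, h, L)
r_polyphase = interpolate_polyphase(s, h, L)
print(r_polyphase == r_reference)
```

All subfilters read the same input samples, which corresponds to the shared delay line of figure 2.21, and no multiplication by an inserted zero is ever performed.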
Signal indexes (each branch)   s      4  3  2  1  0
Filter indexes                 h_0    0  4  8 12 16
                               h_1    1  5  9 13 17
                               h_2    2  6 10 14 18
                               h_3    3  7 11 15 19
Figure 2.20 Polyphase Interpolator
Figure 2.21 Polyphase Interpolator with Shared Input
2.6.6 Polyphase-Matrix Structure for Sampling Rate Conversion
Sampling rate conversion by a rational ratio is accomplished by a 1 to L interpolator, 
followed by an M to 1 decimator. This order guarantees sufficient sampling for the 
intermediate signal when both the input and output signals are sufficiently sampled.
For a band-limited signal with two-sided bandwidth $B$, sampling frequency $F_0$, and centre 
frequency $f_c$, the specification of the post-filter for the interpolator is
$$f_p = B/2$$
$$f_s = F_0 - B/2$$
with sampling frequency $LF_0$ and centre frequency $f_c$.
The specification of the pre-filter for the decimator is
$$f_p = B/2$$
$$f_s = F_1 - B/2$$
also with sampling frequency $LF_0$ and centre frequency $f_c$. $F_1 = LF_0/M$ is the output 
sampling frequency of the convertor.
In the interpolator-decimator order, these two filters are in cascade. Since both are frequency 
translations of lowpass filters, only one filter is actually required, depending on which 
stopband frequency is lower. Additional passband characteristics or a narrower stopband 
can be imposed as desired for particular applications.
Recalling that the general multirate sampling equation is
$$r(k) = \sum_{n=-\infty}^{\infty} s(n)\,h(kM - nL)$$
Parallelizing the output signal by
$$k = lL + p$$
we have
$$r(lL + p) = \sum_{n=-\infty}^{\infty} s(n)\,h[(lL + p)M - nL]$$
$$r_p(l) = \sum_{n=-\infty}^{\infty} s(n)\,h(lML + pM - nL)$$
Parallelizing the input signal by
$$n = mM - q$$
we get
$$r_p(l) = \sum_{q=0}^{M-1} \sum_{m=-\infty}^{\infty} s(mM - q)\,h(lML - mML + pM + qL)$$
$$r_p(l) = \sum_{q=0}^{M-1} \sum_{m=-\infty}^{\infty} s(mM - q)\,h[(l - m)ML + pM + qL]$$
$$= \sum_{q=0}^{M-1} \sum_{m=-\infty}^{\infty} s_q(m)\,h_{pq}(l - m)$$
By changing variables, the polyphase-matrix structure results
$$r_p(l) = \sum_{q=0}^{M-1} \sum_{i=-\infty}^{\infty} s_q(l - i)\,h_{pq}(i)$$
The novel polyphase-matrix filter is defined as
$$h_{pq}(i) = h(iML + pM + qL)$$
Let $h(i)$ be an $N$-tap filter, such that
$$h(i) = 0 \quad \text{for } i < 0 \text{ and } i \ge N$$
In the polyphase-matrix convolution, the inner summation index $i$ ranges from $-\infty$ to $\infty$. 
For an FIR filter, the minimum range of $i$ is such that
For example, if $i$ starts from 0, some of the coefficients are not used. Therefore $i$ starts 
from -1. Alternatively, since
$$0 \le iML + pM + qL \le N - 1,$$
the formal limits for each subfilter are
$$\left\lceil -\frac{p}{L} - \frac{q}{M} \right\rceil \le i \le \left\lfloor \frac{N-1}{ML} - \frac{p}{L} - \frac{q}{M} \right\rfloor$$
where the floor function $\lfloor u \rfloor$ gives the largest integer equal to or less than $u$, and the ceiling 
function $\lceil u \rceil$ gives the smallest integer equal to or larger than $u$.
The polyphase-matrix structure is shown in figure 2.22, with $L = 4$, $M = 3$ and $N = 48$. 
The crosses represent zero valued coefficients with out of range indexes. Figure 2.23 
emphasises that the input signal is shared amongst polyphase branches.
This structure can also be inferred from the polyphase interpolator in figure 2.20. If the 
output parallel-to-serial commutator advances by M instead of 1, the desired L/M convertor 
results. A new sample is shifted in whenever the commutator passes the last branch.
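The subfilter limits can be tabulated for the example with $L = 4$, $M = 3$ and $N = 48$. The sketch below assumes the limits follow from $0 \le iML + pM + qL \le N - 1$, and uses exact fractions to avoid rounding at the floor and ceiling; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

def subfilter_limits(p, q, L, M, N):
    """Smallest and largest i with 0 <= i*M*L + p*M + q*L <= N - 1."""
    offset = Fraction(p, L) + Fraction(q, M)
    return ceil(-offset), floor(Fraction(N - 1, M * L) - offset)

def subfilter_indexes(p, q, L, M, N):
    """Prototype coefficient indexes used by subfilter h_pq."""
    lo, hi = subfilter_limits(p, q, L, M, N)
    return [i * M * L + p * M + q * L for i in range(lo, hi + 1)]

L, M, N = 4, 3, 48
table = {(p, q): subfilter_indexes(p, q, L, M, N)
         for p in range(L) for q in range(M)}

# Every prototype coefficient should be used by exactly one subfilter.
used = sorted(j for indexes in table.values() for j in indexes)

for (p, q), indexes in table.items():
    print(f"h_{p},{q}: i from {subfilter_limits(p, q, L, M, N)} -> {indexes}")
```

For $p = 3$, $q = 2$ the range is $i = -1$ to $2$, which gives the indexes 5, 17, 29 and 41: this is a subfilter whose first tap lies at $i = -1$, as noted in the text.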
This decomposition procedure is valid for arbitrary $L$ and $M$, with minimum computation 
rate, and minimum number of delay elements to store the signal samples. This approach 
can also be used to exploit FIR filter symmetry, but the delay elements have to be increased
and the expressions are not attractive. Therefore later chapters will be concentrating on 
general filters without symmetry. This would not forbid the manipulation of symmetry in 
later stages of implementation.
Figure 2.22 Polyphase-Matrix Structure
Branch    Signal / filter indexes (X = out of range, zero valued)
s_0       14 11  8  5  2
h_0,0         0 12 24 36
h_1,0      X  3 15 27 39
h_2,0      X  6 18 30 42
h_3,0      X  9 21 33 45
s_1       13 10  7  4  1
h_0,1      X  4 16 28 40
h_1,1      X  7 19 31 43
h_2,1      X 10 22 34 46
h_3,1      1 13 25 37  X
s_2       12  9  6  3  0
h_0,2      X  8 20 32 44
h_1,2      X 11 23 35 47
h_2,2      2 14 26 38  X
h_3,2      5 17 29 41  X
Figure 2.23 Polyphase-Matrix Structure with Shared Signal
In conventional sampling rate conversion applications, L and M are mutually prime. This 
can be directly applied to transmultiplexors such that demultiplexing and sampling rate 
conversion can be performed in a polyphase-matrix-DFT structure very efficiently. Also, 
we shall discover later that ratios that are not mutually prime, such as 2/30, are very 
important in multistage polyphase approaches.
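The polyphase-matrix expressions of this section can be tested against the general multirate equation (2.2). The sketch below uses $s_q(m) = s(mM - q)$ and $h_{pq}(i) = h(iML + pM + qL)$, assumes signal and filter are zero outside their stored ranges, and lets $i$ start from -1 as discussed in the text. The function names and test data are hypothetical.

```python
def convert_direct(s, h, L, M, n_out):
    """Equation (2.2): r(k) = sum_n s(n) h(kM - nL)."""
    r = []
    for k in range(n_out):
        acc = 0
        for n, x in enumerate(s):
            j = k * M - n * L
            if 0 <= j < len(h):
                acc += x * h[j]
        r.append(acc)
    return r

def convert_polyphase_matrix(s, h, L, M, n_out):
    """r_p(l) = sum_q sum_i s_q(l - i) h_pq(i), output k = lL + p."""
    N = len(h)

    def sample(n):                       # s(n), zero outside the stored data
        return s[n] if 0 <= n < len(s) else 0

    def coeff(j):                        # h(j), zero for out of range indexes
        return h[j] if 0 <= j < N else 0

    i_range = range(-1, (N - 1) // (M * L) + 1)
    r = []
    for k in range(n_out):
        l, p = divmod(k, L)              # position of the output commutator
        acc = 0
        for q in range(M):
            for i in i_range:
                acc += sample((l - i) * M - q) * coeff(i * M * L + p * M + q * L)
        r.append(acc)
    return r

s = [((5 * n) % 7) - 3 for n in range(30)]
h = [((7 * j) % 11) - 5 for j in range(48)]      # N = 48 taps
r_direct = convert_direct(s, h, L=4, M=3, n_out=40)
r_matrix = convert_polyphase_matrix(s, h, L=4, M=3, n_out=40)
print(r_matrix == r_direct)
```

The `divmod` here only serialises the parallel outputs $r_p(l)$ for comparison; within each branch the indexing needs no floor or modulo operation. The same code runs unchanged when $L$ and $M$ share a common factor.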
2.7 Summary
2.7.1 Generalized Sampling Theorem
In ideal uniform sampling, a digital signal sequence consists of the instantaneous values of the 
corresponding analogue signal equally spaced in time. The continuous-time representation 
of a digital signal is a sequence of unit impulses weighted by such instantaneous values. 
The spectrum of a sampled signal is a repetition of that of the corresponding continuous 
signal. These allow convenient handling of multirate sampling by introducing an ideal 
D/A convertor followed by an ideal A/D convertor as a conceptual intermediate step. (In 
practical conversions, the implementation loss due to non-ideal anti-aliasing or 
anti-imaging filters demand more attention than that due to the inaccurate capture or the 
make up of instantaneous values.)
A band-limited, continuous signal can be sufficiently sampled (reconstructed without error) 
at a rate higher than its two-sided bandwidth. This is the main reason for using complex 
signals rather than their real bandpass associates. We see that any real bandpass signal 
consists of two spectral parts, the analytic component and its conjugate image, with a 
spectral gap between them. In this case, the minimum sufficient sampling frequency 
depends on the magnitude of the spectral gap, in addition to the information bearing 
bandwidth.
2.7.2 Complex Signal Representation
The complex representation provides a unified treatment of arbitrary signals. Any signal 
can be equivalently represented by a baseband signal with spectrum centred at zero 
frequency, plus a one-sided frequency translation to the desired centre frequency. For an
arbitrary signal, the type (real or complex) of the baseband signal, the bandwidth, and the 
centre frequency provide a concise and unified characterisation, which enables immediate 
determination of the minimum sampling frequency and filter specification etc.
Analytic signals (complex) with no negative frequency components provide a linkage 
between general complex signals and real signals. Suppressing the conjugate image 
(frequency domain) of a real signal results in an analytic signal. Conversely, discarding 
the imaginary part (time domain) of this analytic signal gives the original signal.
2.7.3 One-Sided Frequency Translation
One-sided frequency translation using a complex sinusoid is flexible and reversible. All 
other translations are special cases depending on the input or output signal type, with 
restrictions on the carrier frequency. In addition, these other translations are all two-sided, 
generating double frequency terms that require filtering for the reverse frequency shift. 
Therefore complex sinusoids are often used in DSP. Apart from avoiding additional 
filtering, higher sampling frequencies are not required to accommodate double frequency 
terms.
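The difference between one-sided and two-sided translation can be seen in a small numerical sketch. It assumes a complex sinusoid for the one-sided case and a real cosine for the two-sided case; the signal values and frequencies are hypothetical.

```python
import cmath
import math

F = 8000.0                               # sampling frequency (hypothetical)
f_c = 1000.0                             # carrier frequency (hypothetical)
w = 2 * math.pi * f_c / F

x = [1 + 0.5j, -0.5 + 1j, 0.25 - 0.75j, 2 + 0j, -1 - 1j, 0.5 + 0.5j]

# One-sided: shift up with exp(+jwn), then back with exp(-jwn).
up = [v * cmath.exp(1j * w * n) for n, v in enumerate(x)]
back = [v * cmath.exp(-1j * w * n) for n, v in enumerate(up)]
err_one_sided = max(abs(a - b) for a, b in zip(back, x))

# Two-sided: mix a real signal up and down with cos(wn).
xr = [v.real for v in x]
mixed = [v * math.cos(w * n) ** 2 for n, v in enumerate(xr)]
# What remains besides x/2 is a double frequency term x(n) cos(2wn) / 2.
residual = [m - v / 2 for m, v in zip(mixed, xr)]
expected = [v * math.cos(2 * w * n) / 2 for n, v in enumerate(xr)]
err_double = max(abs(a - b) for a, b in zip(residual, expected))

print(err_one_sided < 1e-12, err_double < 1e-12)
```

The complex translation is undone exactly by its conjugate, whereas the cosine mixer leaves a component at twice the carrier that has to be filtered out.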
We see that any power and bandwidth efficient modulation scheme has an explicit complex 
baseband signal, which is often called the complex envelope. Every scheme uses quadrature 
modulation for eventual real signal transmission. This view allows us to avoid describing 
modulation schemes in terms of generation methods. These may involve directly generating 
a real signal at a very low I.F., or modulating more than two oscillator phases.
2.7.4 Complex Filtering
The impulse responses of filters are theoretically indistinguishable from signals. The 
theories of frequency shifting and sampling directly apply. A complex filter is obtained 
by shifting the lowpass prototype to a centre frequency aligned with the complex signal. 
Filtering is the multiplication of spectra or convolution in time domain. This holds for 
continuous or discrete filters. In the latter case, the filter spectrum is periodic. Theoretically, 
the convolution of a discrete signal and a continuous filter, or vice versa, is valid. We 
can simplify the design of DSP systems by using fictitious analogue filters such as 
time-limited impulse response types, since the corresponding finite impulse response 
digital filters can be directly realized in DSP.
2.7.5 Bandpass Multirate DSP
Spectral harmonic images arise due to sampling, resulting in a periodic spectrum. These 
images are indistinguishable from each other because any DSP operation is inherently 
modulo in nature. The centre frequency of each image can be arbitrarily large in magnitude. 
In sampling rate expansion (inserting zero samples), there is no change to the signal 
spectrum. In sampling rate compression (discarding samples), new images are created but 
not destroyed. Therefore we can focus on any one image throughout a multirate system, 
at a single centre frequency of interest. These concepts are summarised as sampling 
invariance.
The conventional approach to multirate sampling considers that each sampling rate 
reduction is associated with a frequency shift of the signal. This is because of the restriction 
of signal frequency components within the principal frequency interval, i.e., from zero to 
the new, lower sampling rate. The ‘shifted signal’ is merely a newly created image that 
falls within the principal period. Paying attention to this virtual frequency shift is both 
redundant and tedious.
In decimation by M, M - 1  samples are periodically discarded after digital anti-aliasing 
filtering. The centre frequency of the digital AAF is the same as that of the signal. The 
passband and stopband frequencies of the prototype lowpass filter are
$$f_p = \frac{B}{2}$$
$$f_s = F - \frac{B}{2}$$
where B is the bandwidth to be protected against aliasing, and F  is the output sampling 
frequency.
In interpolation by L, L - 1  zero valued samples are periodically inserted to the input signal 
before digital anti-imaging filtering. The specification of the lowpass prototype is the same 
as in decimation. Here F  is the input sampling frequency, and all harmonic components 
of signals inside the bandwidth B are to be suppressed.
The concept of sampling invariance, together with the fact that analogue signals are a 
particular case of multirate sampling (infinite sampling frequency), allows us to apply 
familiar design concepts of analogue AAF and AIF to the specification of their digital 
counterparts.
2.7.6 Realization of Multirate Filtering
The combination of interpolation and decimation leads to the general class of periodically 
time-varying systems. The underlying theory is that of a combination of linear 
time-invariant processes, i.e., single rate sampling, with sample rejection and insertion 
(zero-valued). In the general case of sampling rate conversion by a rational ratio, there are 
3 different sampling frequencies. The sampling rate of the filter is the rate of the underlying 
time-invariant system. The other sampling frequencies are those of the input and output 
signal.
Efficient implementations of multirate filtering avoid the computation of signal samples 
that are discarded during decimation, and the multiplication of zero-valued samples during 
interpolation.
An original approach is presented for bridging the gap between multirate theory and 
efficient implementation. This concise approach is based on parallelization of input signals, 
output signals and filters in an early stage, leading to mathematical expressions directly 
representing serial-to-parallel commutators, parallel-to-serial commutators, and polyphase 
filters respectively. The formal expressions obtained for interpolator and decimator 
structures can be exploited further, resulting in a novel derivation of polyphase-matrix 
structures. In conventional approaches, only the decimator structure can be directly derived.
2.8 References
Crochiere83a. R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing, 
Prentice-Hall (1983).
Gardner85a. F. M. Gardner, “On-Board Processing for Mobile-Satellite
Communications,” ESTEC contract no. 5889/84/NL/GM, European 
Space Agency (May 1985).
Lee87a. E. A. Lee and D. G. Messerschmitt, “Synchronous Data Flow,” Proc.
IEEE, Sep. 1987.
Narasimha79a. M. J. Narasimha and A. M. Peterson, “Design of a 24-Channel 
Transmultiplexer,” IEEE Trans. Acoustics, Speech and Signal Processing, 
Vol. ASSP-27, pp. 752-762, Dec. 1979.
Oppenheim83a. A. V. Oppenheim, A. S. Willsky, and I. T. Young, Signals and Systems, 
Prentice-Hall (1983).
Yim89a. W. H. Yim, C. C. D. Kwan, F. P. Coakley, and B. G. Evans, “On-Board
Multicarrier Demodulators for Mobile Applications using DSP 
Implementation,” Proc. 1st European Conference on Satellite 
Communications, Nov 1989.
3 A Comparative Study of Filter Banks
Table of Contents
3 A Comparative Study of Filter Banks
3.1 Filter Bank Functions for Multicarrier Demodulators
3.2 Demultiplexer Approaches
3.2.1 Single-Channel Approaches
3.2.1.1 Direct Filtering
3.2.1.2 Multistage Filtering
Interpolated FIR Filters
Half-band Filters
Analytic Filters
Hilbert Transformers
Comb Filters
3.2.1.3 Fast Convolution
3.2.2 Multichannel Approaches
3.2.2.1 Polyphase-DFT
3.2.2.2 Tree
3.2.2.3 Fast Convolution
3.2.2.4 Analysis-Synthesis
3.3 Optimization Hierarchy in DSP
3.4 A Unified Filter Bank Theory
3.5 Computer Simulation
3.5.1 Simulation Procedure
3.5.2 Simulation Results
3.6 Computation Hardware
3.7 Comparison of Demultiplexer Approaches
3.7.1 Computation Rate
3.7.2 Memory Size
3.7.3 Delay
3.7.4 Control Complexity
3.7.5 Modularity
3.7.6 Flexibility
3.7.7 Summary
3.8 References
A Comparative Study of Filter Banks
Filter bank theories are well-established and new developments continue to appear. Well 
known applications include transmultiplexers, spectrum analysers and speech coders. All 
these applications are potentially relevant to multicarrier demodulators for on-board 
processing. However, no previous implementation can be directly applied because of the 
different functions required in MCDs. A detailed study is required to review filter bank 
approaches and to extract relevant optimization techniques.
Although comprehensive surveys have appeared, most reports are either general 
comparisons of filter bank approaches, or can be regarded as collections of ad-hoc 
optimizations. In addition, due to different performance and optimization criteria, decisive 
conclusions are rarely drawn. To select filter banks for MCDs, this study includes the 
following:
(1) precise identification of filter bank functions
(2) systematic classification of filter bank approaches
(3) structured organisation of DSP optimization techniques
(4) accurate estimation of filter bank complexity through computer simulation
(5) determination of computation hardware complexity
A structured filter bank approach based on routine parallelization is developed, leading to 
the discovery of a new filter bank structure called polyphase-matrix-DFT. Since most
efficient filter bank approaches are special cases of this new structure, a superior unified
filter bank theory emerges that simplifies the task of filter bank selection, design and 
implementation.
3.1 Filter Bank Functions for Multicarrier Demodulators
In frequency domain multiplexing (FDM), a multiplexed signal of $K$ channels is described
as
$$s(t) = \sum_{k=0}^{K-1} \mathrm{Re}\left[ s_k(t)\, e^{j\omega_k t} \right]$$
where $\omega_k$ are the angular carrier frequencies of individual channels. At the receiver, an
analogue I.F. stage can be defined as
$$r_k(t) = [s(t) * h_k(t)] \cos((\omega_k - \omega_i)t)$$
where $\omega_i$ is the final I.F. prior to demodulation. Because there are two frequency
translations, adjacent channels or out of band noise may overlap the channel of interest.
The only purpose of the bandpass filters $h_k(t)$ is to prevent spectral overlapping. If $\omega_i$ is
large, only a wide band filter is necessary. All the I.F. stages together make up a filter
bank, with the basic characteristics of frequency shifting and filtering. The filter banks in
all-digital multicarrier demodulators can be considered as I.F. stages using DSP, sharing
a single A/D convertor for the sampling of $s(t)$. However, a one-sided frequency shift
guarantees no overlapping, and the complex signals and operations can be handled
conveniently in DSP, giving
It is obvious that $\omega_i$ can be zero. The function of this I.F. stage is to perform a coarse
frequency shift, leaving the residue frequency error to the demodulator. There is no apparent
advantage in separating the frequency shift.
Frequency domain demultiplexing is simply a filtering operation:
The first form emphasises that a simple bandpass filter can be used. The second form has
the advantage that the output is at baseband. Ideally, the passband of $h(t)$ is flat in order
to select the channel of interest, and all adjacent channels are situated in the stopband. In
the strict sense of demultiplexing, guard bands must exist between channels, otherwise 
non-realisable brick-wall filters (zero transition band) are required. However, in 
demodulation, guard bands are mainly provided to reduce adjacent channel interference, 
and to ease pulse shaping filter realisation. The addition of filtering stages or the absence 
of guard bands does not have major consequences.
The term transmultiplexer in telephony refers to both demultiplexing and multiplexing
functions. The signals $s_k(t)$ are single side-band (SSB) voice channels. After
demultiplexing, the samples of all channels are naturally assembled in a time domain
multiplexed (TDM) stream for subsequent transmission. Although the term demultiplexing
or FDM-TDM conversion is used, the functions are those of a single carrier SSB
demodulator:
Here $\omega_i$ is high enough such that the signal after filtering is analytic. After taking the real
part, the result is the original baseband signal. The carrier is either exactly known or corrected
before the demultiplexer. Very sharp cut-off IIR filters can be used because of the tolerance
of SSB speech signals to non-linear phase distortions. Here the main concerns are the
frequency shift and SSB recovery. MCDs deal with modulated complex signals, therefore
transmultiplexer methods are equivalent to obtaining real outputs with low I.F.'s. Further
quadrature frequency shifts are required in the following demodulators. There may be
some advantage of using real intermediate signals in the implementation, but complex
outputs are more straightforward.
The prime function of a filter bank in a MCD is to minimize the total computation rate
using multirate sampling. At the filter bank output, the sampling rate can be reduced in
proportion to avoid unnecessary computations. The use of a separate filter bank stage
allows maximum sharing of computations, with independent demodulators working at
much reduced sampling rates. This rate is chosen to reduce the overall computations of
the MCD. Digital anti-aliasing filters are required similar to the analogue ones in A/D
conversion, hence the term digital anti-aliasing filter bank. Here the frequency shift is
unimportant since this is always performed in individual demodulators. We can use
bandpass filters and need not perform frequency shift at all. However, the decimation of
a bandpass signal and frequency shift are inseparable, if we focus on the image with the
lowest positive centre frequency. For uniform channel spacing, a trivial shift such as
$\exp(j\pi n)$ is often enough to shift all channels to baseband. In contrast to demultiplexing,
partially attenuated adjacent channels may be present. The term channelliser has been used 
for this particular application.
In contrast to the above applications are overlapping filter banks. Here the passband of
$h(t)$ is not necessarily flat, and the transition band overlaps adjacent channels. Because of
the wide transition band, digital filtering requires fewer computations. A natural application
is in spectrum analysis. More relevant to MCDs are analysis-synthesis filter banks. The
function of the analyser is to decompose a large bandwidth signal into smaller ones of 
equal bandwidth. The filter characteristics are designed such that the original signal is 
recovered with minimum distortion in the reverse operation (synthesizer). This can be 
applied to MCDs with non-uniform channels. The analyser first splits the input into partial 
channels with equal bandwidth. Adjacent partial channels are then synthesized into larger 
groups. The number of partial channels in the synthesizing process determines the 
bandwidth of individual channels.
In summary, the main difference between the channelliser and other applications is the 
existence of demodulators. The overall filter characteristics can be shared between the 
two. Different optimization criteria may lead to different channel filters, but anti-aliasing 
bandpass filters are most appropriate. Since all filter banks perform demultiplexing to some 
degree, we shall use the general term demultiplexer in MCDs.
3.2 Demultiplexer Approaches
Filter bank approaches can be classified into single-channel and multichannel types. In 
single-channel approaches, no filtering is shared amongst channels as in analogue 
processing. In contrast, the efficient sharing of computations is the main concern in 
multichannel approaches. Another important classification is single-stage versus 
multistage. The latter refers to the cascade of filters or demultiplexers to complete the 
overall function. These classifications and common examples are summarised in table 3.1. 
With the exception of fast convolution, which is a frequency domain method, all approaches 
are based on time domain principles. Simple classification is inadequate for some general
approaches; these appear more than once in the table.
Table 3.1 Filter Bank Approaches
               Single-channel         Multichannel
Single-stage   Direct Filtering       Polyphase-DFT
               Fast Convolution       Fast Convolution
                                      Analysis-synthesis
Multistage     Multistage Filtering   Tree
                                      Analysis-synthesis
To simplify the following discussions, the demultiplexer input is a complex FDM signal
that consists of $K$ channels at equal spacing $W$. The two-sided signal bandwidths are all
equal to $B < W$. Critical sampling refers to the case where the input and output sampling
rates are $F_0 = KW$ and $F_1 = W$.
3.2.1 Single-Channel Approaches
Although inefficient, single-channel approaches are important because other approaches 
are extensions of single-channel principles.
3.2.1.1 Direct Filtering
This method is very similar to a direct digital implementation of analogue demultiplexing, 
with the exception of decimation. The FDM signal is quadrature frequency shifted, 
followed by an anti-aliasing lowpass filter:
$$r_k(n) = [s(n)\, e^{-j\omega_k n}] * h(n)$$
Only the samples at $n = mK$ are computed. Alternatively, a complex bandpass filter that
is a frequency shifted version of the lowpass prototype can be used.
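
The direct method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this study: the function name `direct_demux_channel`, the causal FIR convention and the decimation factor argument `K` are hypothetical choices made for the example.

```python
import cmath

def direct_demux_channel(s, h, omega_k, K):
    """Shift one channel to baseband, lowpass filter it, keep every K-th output.

    s       : list of complex input samples (the FDM signal)
    h       : list of FIR lowpass coefficients (assumed causal)
    omega_k : channel centre frequency in radians per input sample
    K       : decimation factor (assumed name for the rate reduction)
    """
    # Quadrature frequency shift of the whole FDM signal.
    shifted = [x * cmath.exp(-1j * omega_k * n) for n, x in enumerate(s)]
    out = []
    # Only outputs at n = m*K are computed; the others would be discarded.
    for n in range(0, len(s), K):
        acc = 0j
        for i, coeff in enumerate(h):
            if n - i >= 0:
                acc += coeff * shifted[n - i]
        out.append(acc)
    return out
```

With a 4-tap averaging filter and `K = 4`, a tone at the selected centre frequency passes with unit gain once the filter has filled, while a tone one channel spacing away is cancelled.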
For digital modulation schemes, FIR filters are commonly used for their phase linearity.
With the notable exception of audio applications, the use of an IIR filter requires a phase
equaliser to reduce distortion. Conclusions have been made that FIR filters require fewer
computations in most applications demanding phase linearity.
The filter length (or order) of an optimal equiripple FIR is approximated by [Bellanger]
$$N \approx \frac{2}{3}\, \frac{F}{\Delta f}\, \log_{10} \frac{1}{10\, \delta_s \delta_p}$$
where $F$ is the filter sampling frequency, $\Delta f$ is the transition band, $\delta_s$ and $\delta_p$ are the stopband
and passband ripples respectively. For anti-aliasing, the transition bandwidth is given by
$$\Delta f = F_1 - B$$
and the computation rate is proportional to
The transition bandwidth and computation rate relationship is important in determining 
the filter length required in different demultiplexer approaches, and hence the total 
computation rate.
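
As a numerical illustration, the sketch below evaluates Bellanger's approximation, length ≈ (2/3) log10(1 / (10 δp δs)) F / Δf, for an anti-aliasing filter. The function name and all numerical values are hypothetical and chosen only to show how strongly the transition bandwidth drives the filter length.

```python
import math

def equiripple_fir_length(F, delta_f, delta_p, delta_s):
    """Approximate length of an optimal equiripple FIR filter."""
    return (2.0 / 3.0) * math.log10(1.0 / (10.0 * delta_p * delta_s)) * F / delta_f

# Hypothetical numbers, in units of the channel spacing:
# filter (input) rate F = 8, output rate F1 = 1, two-sided bandwidth B = 0.5.
F, F1, B = 8.0, 1.0, 0.5
delta_f = F1 - B                     # anti-aliasing transition bandwidth
length = equiripple_fir_length(F, delta_f, delta_p=0.01, delta_s=0.001)
print(round(length, 2))
```

Halving the transition bandwidth doubles the estimated length, which is why the demultiplexer approaches that relax the transition band can need markedly fewer computations.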
3.2.1.2 Multistage Filtering
Computation rate can be reduced if the decimation is performed in cascaded filters with
input sampling rates $F_i$. For $I$ stages, the total computation rate is proportional to
/-I Ff
Computation reduces as $I$ increases, and optimum intermediate sampling rates have been
found by computer optimization [Crochiere83a]. In practice, $I > 2$ is unattractive because
each additional stage reduces the total computation rate by a smaller percentage. Only 
multistage lowpass filters have appeared in the literature. Using complex bandpass filters, 
their centre frequencies can be conveniently determined by sampling invariance. If the 
multistage lowpass filters are $h_i(n)$, the frequency shift terms for complex filters are
$$e^{j\omega_k n / F_i}$$
where $\omega_k$ is the non-normalised carrier frequency of the channel at the input of the first
stage.
Since the overall objective of anti-aliasing can be split between multiple stages, special 
filters can be used. The reasons for their use and efficiency compared to equiripple filters 
are summarised as follows:
Interpolated FIR Filters
If a FIR filter is interpolated by insertion of zeros, a filter with alternate passband and 
stopband results, which are repeated images of the original filter. In figure 3.1 for
example, if a FIR filter is designed to select one channel and suppress the other in a 
2-channel (uniform spacing) demultiplexer, after interpolation by 3, it can be used in 
a 6-channel demultiplexer with only the even (or odd) numbered channels selected. 
One of the remaining 3 channels is selected by the next stage filter, with a transition 
bandwidth increased by one channel spacing. Although the convolution with an 
interpolated FIR (IFIR) filter is a single rate operation, similar efficiency is achieved 
compared to ordinary multistage filtering. The multiplication of zero coefficients can 
be avoided by parallel decomposition similar to that in polyphase structures, and the 
next stage filter can take advantage of the wider transition bandwidth.
Figure 3.1 Interpolated FIR Filter
Possible uses of IFIR filters are in multichannel approaches. Without special design,
all the even (or odd) channels share the same computations in an IFIR stage. The sharing
extends to all channels for a real FDM signal. Since a $K$-channel real FDM signal has
the same spectral arrangement as a $2K$-channel complex FDM signal, selecting only
alternate channels covers either analytic components or conjugate images of all $K$
channels.
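
The imaging effect of zero insertion can be checked numerically. The sketch below is a minimal illustration with hypothetical helper names (`upsample_taps`, `freq_response`) and an arbitrary 3-tap prototype; it shows that the response of the interpolated filter at frequency f equals the prototype response at 3f, so the passband repeats three times across the band.

```python
import cmath

def upsample_taps(h, factor):
    """Insert factor-1 zeros between successive coefficients of h."""
    out = []
    for c in h:
        out.append(c)
        out.extend([0.0] * (factor - 1))
    return out[: len(out) - (factor - 1)]   # drop the trailing zeros

def freq_response(h, f):
    """Frequency response of h at f, with f in cycles per sample."""
    return sum(c * cmath.exp(-2j * cmath.pi * f * n) for n, c in enumerate(h))

prototype = [0.25, 0.5, 0.25]               # arbitrary lowpass prototype
ifir = upsample_taps(prototype, 3)
```

Only the non-zero taps of `ifir` need multiplications, which is the saving the text attributes to parallel decomposition.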
Half-band Filters
In half-band filters, alternate filter coefficients are zeros. This condition is satisfied by
an optimal equiripple FIR filter with
$$F_p + \Delta f = \frac{F}{2}$$
where $F$ is the filter sampling rate, $F_p$ is the two-sided passband bandwidth, and $\Delta f$ is
the transition bandwidth (one-sided). The computation rate is approximately reduced
by a factor of two compared to an arbitrary optimal equiripple FIR filter. Because of 
the passband constraint, half-band filters are only applicable in decimation by two 
stages.
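
The zero-coefficient property can be seen in a simple windowed design. Note that the sketch below substitutes a Hamming-windowed ideal lowpass filter for the optimal equiripple design discussed above, purely because it can be written in closed form; the function name `halfband_taps` is hypothetical.

```python
import math

def halfband_taps(num_taps):
    """Windowed ideal lowpass with cut-off at a quarter of the sampling rate.

    num_taps is assumed odd so that the filter has a centre tap.
    """
    centre = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        m = n - centre
        # Ideal impulse response sin(pi*m/2) / (pi*m), equal to 1/2 at m = 0.
        ideal = 0.5 if m == 0 else math.sin(math.pi * m / 2) / (math.pi * m)
        window = 0.54 + 0.46 * math.cos(math.pi * m / (centre + 1))
        taps.append(ideal * window)
    return taps

taps = halfband_taps(11)
```

Every second coefficient away from the centre tap is zero to within rounding, so roughly half of the multiplications can be skipped.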
Analytic Filters
Analytic signals, i.e., without negative frequency components, do not exist in DSP 
because of the periodic spectrum of sampled signals. However, similar properties of 
analytic signals apply for digital signals with no frequency components in the interval 
[-F/2,0], where F  is the sampling frequency. Digital analytic filters are complex filters 
that attenuate frequency components in the above interval. Since the output signals of 
such filters are analytic, only the real part need be computed. This is their major 
advantage and the main reason for their use. Since equiripple filters can be used by 
frequency shifting a lowpass prototype, there are no gains in computational efficiency.
Hilbert Transformers
Given a real signal $x(n)$, its analytic counterpart $x(n) + jy(n)$ can be found via the
Hilbert Transform. This transform is realised in real time as a real filter $H(n)$ such that
$y(n) = x(n) * H(n)$. In other words, the Hilbert Transform can be implemented as a
special case of an analytic filter, where the coefficients are purely imaginary. This can
be achieved with an equiripple, half-band, lowpass filter, and a frequency shift of
$\exp(j\pi n/2)$.
Comb Filters
Comb filters are FIR lowpass filters with equal coefficients. Therefore only additions
are required in convolution. The frequency spectrum of such a filter is a sinc function.
The approximately flat portion of the main lobe is the passband, and the side lobe
portions provide the stopband attenuation. Comb filters are only suitable for use in the
first stage of a low quality demultiplexer because of the fixed filter spectrum. To
improve the overall passband and stopband characteristics, the decimation rate of the
comb filter stage has to be reduced. This requires substantial computations in the later
stages that cannot be shared amongst channels.
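
A comb filter can be sketched as a running sum. The recursive form below (one addition and one subtraction per sample) is a common variant chosen here for brevity and is an assumption of this example, as are the function names; `comb_response` evaluates the fixed sinc-like spectrum, whose nulls fall at multiples of the sampling rate divided by the number of taps.

```python
import cmath

def running_sum(x, num_taps):
    """Comb (equal-coefficient) filter output without any multiplications."""
    out, acc = [], 0
    for n, sample in enumerate(x):
        acc += sample
        if n >= num_taps:
            acc -= x[n - num_taps]   # remove the sample leaving the window
        out.append(acc)
    return out

def comb_response(num_taps, f):
    """Frequency response of the comb filter at f cycles per sample."""
    return sum(cmath.exp(-2j * cmath.pi * f * n) for n in range(num_taps))
```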
3.2.1.3 Fast Convolution
An alternative implementation of the convolution $r(n) = s(n) * h(n)$ is given by the
Z-transform relationship $R(z) = S(z)H(z)$. If $s(n)$ is periodic in time with $I$ samples, DFT
coefficients can be used to approximate the Z domain specification such that
$R(i) = S(i)H(i)$, $i \in [0, I-1]$. The periodic signal $r(n)$ is then obtained by an inverse DFT.
In practice, $s(n)$ is non-periodic. Overlapping DFT's must be used to obtain $r(n)$. However,
in estimating the computation rates, periodic signals are adequate, which will be assumed
in later sections.
The computations required include an I  point DFT, I  multiplications with the stored filter 
DFT coefficients, and an inverse transform. Reduction of computations over direct FIR 
filtering can be achieved using a FFT when I  is large (e.g., 30). For comparison, I  can be 
taken as the approximate filter length of a FIR filter.
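
The periodic case can be verified with a short sketch. For clarity it uses a direct O(I²) DFT written out in plain Python, which an FFT would replace in practice; the function names are hypothetical. It checks that multiplying DFT coefficients and inverse transforming reproduces circular convolution of one period.

```python
import cmath

def dft(x, inverse=False):
    """Plain DFT of x; the inverse includes the 1/len(x) scaling."""
    size = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * i * t / size)
               for t in range(size)) for i in range(size)]
    return [v / size for v in out] if inverse else out

def circular_convolve_direct(s, h):
    """Time domain circular convolution of one period of s with h."""
    size = len(s)
    return [sum(h[i] * s[(t - i) % size] for i in range(len(h)))
            for t in range(size)]

def circular_convolve_dft(s, h):
    """Same result via R(i) = S(i) H(i) followed by an inverse DFT."""
    size = len(s)
    h_padded = list(h) + [0.0] * (size - len(h))
    S, H = dft(s), dft(h_padded)
    return dft([a * b for a, b in zip(S, H)], inverse=True)
```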
3.2.2 Multichannel Approaches
In multichannel approaches, computations in filtering can be shared amongst channels by 
careful arrangements of the frequency plan. This is impossible in analogue filtering.
3.2.2.1 Polyphase-DFT
Using polyphase implementation of the direct decimation approach, the channel output
has the form
$$r_k(n) = \sum_{q=0}^{K-1} [s_q(n) * h_q(n)]\, e^{-j\omega_k q}$$
where $\omega_k$ are the centre frequencies, $s_q(n)$ are the parallelised components of the FDM
signal, and $h_q(n)$ are the polyphase filters decimated from a lowpass prototype. By
exploiting the uniform channel spacing and suitable choice of $\omega_k$, all $K$ channels are
obtained from a generalised DFT applied to the outputs of polyphase filters (figure 3.2).
Figure 3.2 Polyphase-DFT Demultiplexer
The most important contribution towards computation efficiency is that the polyphase 
filters are independent of k, i.e., all channels use the same filter coefficients. Also, by 
definition, all channels share the same FDM signal. Therefore the number of computations 
for a single channel in the direct method remains the same for $K$ channels. The only
additional processing is a $K$ point DFT. For a FDM signal with high bandwidth efficiency,
the computation rate in the polyphase network is significantly larger than that in the DFT. 
A FFT implementation reduces the computation rate further, and the polyphase network 
occupies a higher percentage of the total processing.
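
The equivalence between the direct method and the polyphase-DFT structure can be checked numerically. The sketch below is an illustration under assumptions that are not taken from the text: critical sampling with decimation by `K`, channel centre frequencies at 2πk/K radians per sample, a causal prototype `h`, input phases defined as s(rK + q), and hypothetical function names. A plain DFT sum stands in for the FFT.

```python
import cmath

def direct_demux(s, h, K, k):
    """Reference: shift channel k to baseband, filter, keep outputs at n = m*K."""
    omega = 2 * cmath.pi * k / K
    out = []
    for m in range(len(s) // K):
        n = m * K
        out.append(sum(h[i] * s[n - i] * cmath.exp(-1j * omega * (n - i))
                       for i in range(len(h)) if n - i >= 0))
    return out

def polyphase_dft_demux(s, h, K):
    """All K channels from one polyphase network followed by a K point DFT."""
    num_out = len(s) // K
    channels = [[0j] * num_out for _ in range(K)]
    for m in range(num_out):
        branch = []
        for q in range(K):
            acc = 0j
            # Branch q pairs prototype taps h(j*K - q) with inputs s(r*K + q).
            for idx in range((K - q) % K, len(h), K):
                r = m - (idx + q) // K
                if r >= 0:
                    acc += h[idx] * s[r * K + q]
            branch.append(acc)
        # The branch outputs do not depend on k; the DFT separates the channels.
        for k in range(K):
            channels[k][m] = sum(branch[q] * cmath.exp(-2j * cmath.pi * k * q / K)
                                 for q in range(K))
    return channels
```

The filtering loop runs once per output sample regardless of the number of channels, whereas calling `direct_demux` for every channel repeats it `K` times.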
3.2.2.2 Tree
If the number of channels is a composite number such that
$$K = \prod_i K_i$$
the overall demultiplexing can be performed in cascade. The first stage is a $K_0$-channel
demultiplexer. Each of the $K_0$ intermediate channels is then followed by a $K_1$-channel
demultiplexer. This process is continued until all $K$ channels are separated, forming a
tree-like structure. This approach can be used to obtain a non-uniform filter bank, because
all intermediate channels are different in bandwidth, even though all demultiplexer nodes
make use of uniform channel spacing.
Different techniques can be used to implement the nodes. However, critically sampled
demultiplexer nodes are very inefficient because of the unnecessarily tight filter
specification in every node. For a decimation filter, if we increase the output sampling
rate, the filter length reduces, but the rate of computation increases. Near minimum
computation rate occurs at an over-sampling factor of 2 at the output of each node. To
allow cascading, the same over-sampling factor applies to the input of all nodes except
the first. Desired channels are finally obtained through decimation-by-2 half-band filters.
The binary tree method shown in figure 3.3 makes use of only 2-channel demultiplexer
nodes. The total number of channels is therefore an integer power of 2. Each node consists
of 2 filters, in which either the input signal or a prototype filter is shifted in frequency by
$\exp(\pm j n\pi/2)$. Therefore the product terms of filtering are shared. Since each node performs
decimation by 2, half-band filters can be used.
Figure 3.3 Binary Tree Demultiplexer
Both the passband and sampling frequency of each level are reduced by a factor of 2. Consequently,
all prototype filters and hence all nodes are identical except the operation rate. This is very 
different from a single-channel approach using multistage half-band filters. In this case, 
the passband of all stages is the same and equal to one channel bandwidth. With the change 
in sampling rates, different filters are used in each stage.
3.2.2.3 Fast Convolution
In applying the fast convolution to multiple channels, let the sets of $I$ DFT coefficients be
related by $R_k(i) = S(i)H_k(i)$. The forward DFT used to obtain $S(i)$ is shared amongst all
channels. For a continuous bandpass filter, the ideal frequency spectrum is zero in the
stopband. Therefore the DFT coefficients $H_k(i)$ can be designed such that of the $I$
coefficients, only $J \approx I/K$ are non-zero. Each channel is then obtained by a $J$ point inverse
DFT. The decimation rate is achieved by the difference in input and output transform sizes.
This approach is unpopular for uniform channel spacing applications because no decisive
advantages over the polyphase-DFT method have been reported. For a non-uniform filter
bank, the number of non-zeros in $H_k(i)$ is proportional to individual channel bandwidths.
Therefore the inverse transforms are of different sizes.
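
A toy version of this scheme is sketched below for a periodic input. It makes several simplifying assumptions that are not from the text: the channel filter is an idealised brick-wall response (one inside the band, zero outside), channel k occupies DFT bins kJ to (k+1)J-1, the output is scaled by the forward transform size, and plain DFT sums replace FFTs. The function names are hypothetical.

```python
import cmath

def forward_dft(s):
    """DFT of one input block; computed once and shared by every channel."""
    size = len(s)
    return [sum(s[n] * cmath.exp(-2j * cmath.pi * i * n / size)
                for n in range(size)) for i in range(size)]

def channel_from_bins(S, K, k):
    """Keep the J = I/K bins of channel k and apply a J point inverse DFT."""
    size = len(S)
    J = size // K
    band = S[k * J:(k + 1) * J]       # idealised channel filter: pass these bins only
    return [sum(band[i] * cmath.exp(2j * cmath.pi * i * m / J)
                for i in range(J)) / size
            for m in range(J)]

# A tone in bin 9 of a 16 point block lies in channel 2 of a 4-channel plan.
block = [cmath.exp(2j * cmath.pi * 9 * n / 16) for n in range(16)]
spectrum = forward_dft(block)
channel2 = channel_from_bins(spectrum, 4, 2)
```

The 16 input samples yield 4 output samples for the channel, so the decimation by 4 comes purely from the smaller inverse transform.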
3.2.2.4 Analysis-Synthesis
The analysis-synthesis approach is only relevant for non-uniform filter banks. The FDM 
signal is first split into intermediate channels with equal bandwidth (analyser). Different 
numbers of adjacent channels are then multiplexed into unequal bandwidth channels 
(synthesizers). Uniform channel spacing approaches are used in the analyser and the 
synthesizers. The channel filters are designed to minimize distortion in the 
analysis-synthesis process. Since the filter transition bands are overlapping to reduce 
computations, the synthesizer outputs are over sampled and additional decimation stages 
are required.
3.3 Optimization Hierarchy in DSP
Real-time digital signal processing systems are sequential computation machines. The 
output computations of the present time state depend on both the present input and previous 
computations stored in memory (or delay elements). IIR filters with feedback are typical
examples. FIR filters are special cases where previous computations are multiplications 
by unity, i.e., the inputs stored are directly used in computing the present output.
Operations that are to be completed in a single time state are combinational, or memoryless. 
For example, the Discrete Fourier Transform can in theory be implemented using purely 
combinational logic such as ROM lookup tables. In practice, however, complex
combinational computations are implemented sequentially.
The optimization theories of combinational computations and their hardware 
implementation are well developed, generally regarded as fast algorithms, including 
number theoretic transforms. However, the difference between sequential and 
combinational computation has not been identified in the literature. Therefore no structured
approach towards sequential optimization exists.
Sequential optimization is concerned with time states, including the derivation of an 
efficient realisable structure, the minimisation of delay elements and the avoidance of 
unnecessary computations. Combinational optimizations then follow. Sequential 
optimization has been overlooked because in the case of single-rate sampling such as a 
FIR filter, the sequential operations of a direct convolution are trivial. In the more relevant 
case of multirate sampling, efficient polyphase structures exist that can be considered to 
be already optimised sequentially. This solution can be applied directly to more 
sophisticated problems such as filter banks. However, the lack of a general sequential 
optimization procedure proves to be insufficient in many cases.
Signal flow graph and similar theories can be regarded as sequential optimization tools. 
With the ability to manipulate delay elements, direct implementations are often transformed 
into more attractive structures as in filtering, and unnecessary computations and delay 
elements are reduced prior to combinational optimization, as in multirate processing. As 
with other graphical tools, signal flow graphs are more appropriate as structured 
specifications of a DSP system, with the advantage of direct implementation using 
multipliers, adders and delay elements etc. Based on regular signal flow, the possibility
of using ad-hoc control signals is largely eliminated.
For sequential optimization, graphical theories often lead to ad-hoc procedures with limited 
scope. While polyphase structures for integer sampling rate conversion are readily obtained 
through graphical network transformations, Crochiere and Rabiner failed to apply this 
method for a rational ratio convertor. Successful attempts have appeared more recently in 
[Hsiao87a] and [Vaidynathan90a]. However, because of the ad-hoc procedures, special 
ratios demand different treatment, and the optimization task becomes tedious for ratios 
with large integers. In addition, delay elements are not necessarily minimized. These
limitations have led to the later proposal of more structured rules for improvements [Bi90a]. 
In more sophisticated applications such as filter banks, polyphase structures can always 
be expressed in formal signal flow graphs, but never derived from them. Due to the lack 
of a structured sequential optimization approach, previous attempts in providing a unified 
filter bank theory failed to represent the large number of ad-hoc implementations typical 
of transmultiplexers.
Mathematical approaches for sequential optimizations do not seem appropriate. This is 
because mathematical approaches are functional, i.e., global time states are not recognized 
in theory. For example, the availability of input signal sequence of infinite length is always 
implied, but in practice, only limited input signal samples can be stored and operated on 
at one time.
Nevertheless, a mathematical sequential optimization approach is identified based on 
routine parallel decomposition. New problems no longer need ad-hoc modifications to 
known solutions, e.g., polyphase structures. Optimization is best performed before 
realization, but the opposite is true in the case of signal flow graph techniques. Although 
this parallel decomposition approach seems to be a minor extension of polyphase principles, 
its advantage can be seen in the new derivation of the polyphase-matrix sampling rate 
convertor, and the novel polyphase-matrix-DFT filter bank.
Optimizations in DSP can be classified in the following hierarchy:
(1) system specification
(2) sequential computation
(3) combinational computation
(4) digital hardware
This study deals mainly with the first two levels of optimizations that are inherently 
application dependent. In filter banks, the optimization of system specification is primarily
the choice of a frequency plan that minimises the overall complexity. After sequential 
optimization, the necessary combinational computations are then identified, establishing 
the link with well-developed theories and hardware implementations in the literature.
3.4 A Unified Filter Bank Theory
The input signal of a demultiplexer consists of $K$ complex channels multiplexed in the
frequency domain. For a uniform channel spacing of $W$, the minimum sampling frequency
is $F_0 = KW$. The centre frequencies of these channels are
$$f_k = (kJ + k_0)W$$
where $k_0 W$ is the centre frequency of the first channel. By sampling invariance, $k_0$ can be an
arbitrary real number. (If $k_0 > F_0$, the frequency components at $k_0$ and $k_0 \bmod F_0$ are equivalent
in DSP operations.) $J$ is a new integer parameter that forces a permutation of the channel
numbers. For example with $K = 3$, if $J = 1$ and $k_0 = 0$, the ordering of centre frequencies
is 0, $W$ and $2W$. If $J = 2$, this becomes 0, $2W$ and $4W$, which is equivalent to 0, $2W$ and $W$.
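
The channel permutation in this example can be reproduced with a short sketch. It assumes critical sampling ($F_0 = KW$) and an integer $k_0$ so that frequencies can be folded with an integer modulo; the function names are hypothetical. The coprimality test is an added observation rather than a statement from the text: the mapping visits every channel position exactly once only when $J$ and $K$ share no common factor.

```python
from math import gcd

def channel_order(K, J, k0=0):
    """Centre frequencies (k*J + k0)*W folded into [0, K*W), in units of W."""
    return [(k * J + k0) % K for k in range(K)]

def is_permutation(K, J):
    """True when the folded centre frequencies are all distinct."""
    return gcd(J, K) == 1

print(channel_order(3, 1))   # ordering for J = 1
print(channel_order(3, 2))   # ordering for J = 2
```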
Instead of using a decimation by $M$ filter as in all existing theories, a more general rate
conversion filter with a factor of $L/M$ is used. Let $F_0$, $F_1$ and $F$ be the input, output and
filter sampling frequencies. Using the polyphase-matrix structure, the parallel output of
the $k$th channel is
A/ “ 1
A*^ (/)= S p  e [0 ,L -l]n Z
Similar structures exist; however, all previous developments were based on ad-hoc signal
flow graph manipulations with many limitations: only FIR filters can be used because
explicit filter coefficients are required; decomposition procedures become tedious for large
$L$ and $M$; because the decomposition can be performed in many ways, unnecessary delay
elements may be introduced. Here, the above expressions for the filtering operation and
filter definition are novel. Since sequential optimization is performed through routine
parallel decomposition, the problem of unnecessary delay elements does not arise. In
addition, the subfilter definitions are applicable for IIR filters.
The matrix of subfilters consists of frequency shifted versions of a common lowpass prototype:
$$h^k_{p,q}(l) = h_k(lML + pM + qL) = h(lML + pM + qL)\, e^{j\omega_k (lML + pM + qL)}$$
where $\omega_k = 2\pi f_k / F$.
By splitting the frequency shift term, we have
^ ^ = 0  /=-^^
The filtering can be shared amongst all channels if
For sufficient sampling, $F_0 > KW$. By sampling invariance, if we shift all channel centre
frequencies to zero, the multiplication term is
e ^
Finally, the demultiplexer equation with zero centred channels is
9 = 0 /=-»>
The inner summation denotes a polyphase-matrix network independent of $k$, i.e., shared
amongst all channels. Considering all channels, the outer summation represents a
generalised $M$ point DFT independent of $p$, with only the first $K$ outputs required. If
$M/N = K' \in \mathbb{Z}^+$, fast algorithms can be applied. This DFT can be implemented by a $K'$
point FFT, with trivial modifications and a small number of extra additions. The parameters
$k_0$, $n_0$, and $J$ are either system specifications or subjects of combinational optimizations.
This structure is shown in figure 3.4, where the frequency shift terms are omitted for clarity.
By optimizing the channel stacking arrangement, most of these terms do not require
multiplications.
For the reverse, frequency domain multiplexing expression, all the $K$ channels $r_k(l)$ are
simply summed, with the constraint that $F_1 > KW$ instead of $F_0 > KW$.
Figure 3.4 Polyphase-Matrix-DFT Demultiplexer
Most popular and efficient demultiplexers are special cases of the novel 
polyphase-matrix-DFT structure. Examples of conventional structures and equivalent 
parameters of the polyphase-matrix-DFT structure are summarised in table 3.2. Most 
structures do not require a specific $k_0$, and the corresponding entries are left blank.
All previous demultiplexers have been derived from a decimation filter with an integer 
factor of $M'$. The most general form of the novel polyphase-matrix-DFT structure has an 
overall decimation factor of $M' = M/L$. This is important when a specific output sampling 
frequency is required. In MCDs where the output signals are data channels, an integer 
number of samples per symbol can be provided, irrespective of the channel spacing. A 
direct comparison cannot be made with a conventional polyphase structure, as the output 
sampling rates differ between the two structures and the filter specification can take 
advantage of the relaxed filter transition band. However, if $M/L$ is very close to an integer, both 
structures will have approximately equal computation rates. The only price to pay for the 
rational sampling rate conversion capability is the increase in filter coefficients stored.
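
As a worked illustration of why a rational factor is useful, the sketch below finds a pair $(M, L)$ that converts the FDM input rate to an integer number of samples per symbol. The assumptions are ours: the input is taken to be sampled at exactly the number of channels times the channel spacing, the symbol rate is 32 ksymbol/s (64 kb/s QPSK, as in the simulations of section 3.5), two samples per symbol are wanted, and $M/L$ is reduced to lowest terms, which is only one possible choice. The function name `decimation_ratio` is hypothetical.

```python
from fractions import Fraction

def decimation_ratio(num_channels, channel_spacing_hz, symbol_rate_hz, samples_per_symbol):
    """Return (M, L) such that M/L converts the FDM input rate to the wanted output rate."""
    input_rate = num_channels * channel_spacing_hz      # assumed FDM input sampling rate
    output_rate = samples_per_symbol * symbol_rate_hz   # wanted per-channel output rate
    ratio = Fraction(input_rate, output_rate)           # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator

print(decimation_ratio(16, 48_000, 32_000, 2))   # spacing giving an integer factor
print(decimation_ratio(16, 45_000, 32_000, 2))   # spacing needing a rational factor
```

With 48 kHz spacing the factor is the integer 12, so a conventional structure suffices. With 45 kHz spacing the factor is 45/4, which an integer-only decimator cannot provide without a separate rate conversion stage.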
In all conventional methods with a polyphase-matrix-DFT equivalent where $L > 1$, 
previous structures are either ad-hoc FIR/IIR implementations of the polyphase-matrix 
network, as in single-sideband methods for transmultiplexors, or a formal signal flow graph 
specification cannot be obtained, as in the case of the weighted overlap-add method.
Table 3.2 Polyphase-Matrix-DFT Parameters

                                  Conventional parameters     Polyphase-matrix-DFT parameters
Structure                         M'                   k_0    M      L    n_0    k_0    J
Novel, rational decimation rate   M/L                         M      L    M/K           1
Critically sampled polyphase-DFT  K                           K      1    1             1
Over-sampled polyphase-DFT        nM' = K, n ∈ Z              K      n    1             1
Weighted overlap-add              M'                          KM'    K    M'            1
Single-sideband (a)               K                    1/2    2K     2    1      1/2    1
Single-sideband (b)               K                    1/4    K      1    1      1/2    2
Binary tree node                  2                           4      2    1             1

M' denotes the overall decimation rate.
In conventional binary tree methods, identical 2-channel demultiplexers are cascaded to 
form a tree structure. Therefore the number of channels is limited to an integer power of 
2. Each tree node consists of one filter for each of the 2 channels, but the multiplication 
terms are shared. The input and output signals are both over-sampled by a factor of 2, 
therefore nodes are cascadable. Unnecessarily tight filter specifications are avoided and 
half-band filters can be used. This method is very different from multistage filtering for a 
single channel, where the filter transition bands are decreased gradually from the input 
stage to the output stage, whereas in a multichannel, multistage demultiplexer all sets of 
filter coefficients are the same. A more appropriate description is that of a polyphase-matrix 
network with a 4-point DFT (figure 3.5), corresponding to the so-called multiplier-free 
frequency shifts. The novelty of the polyphase-matrix decomposition explains the previous 
absence of a polyphase explanation in this case, even though the multistage method has 
been vaguely recognized as such.
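
The multiplier-free property of the 4-point DFT can be confirmed directly. The sketch below is a small check, not part of the original analysis: it lists the twiddle factors of an $n$-point DFT and tests whether each is one of $1$, $-1$, $j$, $-j$, in which case the "multiplication" reduces to sign changes and swaps of real and imaginary parts. The helper names `dft_matrix` and `is_trivial` are hypothetical.

```python
import cmath

def dft_matrix(n):
    """n-point DFT matrix W[k][m] = exp(-j 2 pi k m / n)."""
    return [[cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)] for k in range(n)]

def is_trivial(w, tol=1e-12):
    """True if w is one of 1, -1, j, -j, so that no real multiplication is needed."""
    return any(abs(w - t) < tol for t in (1, -1, 1j, -1j))

print(all(is_trivial(w) for row in dft_matrix(4) for w in row))   # 4-point DFT
print(all(is_trivial(w) for row in dft_matrix(8) for w in row))   # 8-point DFT, for contrast
```

The 8-point case fails the test because it contains factors such as $(1 - j)/\sqrt{2}$, which is why a tree built from 4-point nodes avoids the twiddle multiplications of a larger transform.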
Figure 3.5 Binary Multistage Polyphase-Matrix-DFT Demultiplexer

A general tree structure is shown in figure 3.6. The number of channels and bandwidths 
are flexible. The advantages of this method can be illustrated by the case where $K$ is a 
highly composite number. The polyphase-DFT method (single-stage) requires a 
mixed-radix FFT. A typical pipeline implementation requires complex coupling between 
each stage in the pipeline, and time-varying twiddle factors (figure 3.7). The multistage 
polyphase-matrix-DFT alternative is shown in figure 3.8. All nodes in the same tree level 
are identical and therefore time multiplexed as one stage in the pipeline (solid boxes). Fast 
algorithms can be readily applied to the constant FFT and filter coefficients in each stage. 
The connections between stages are simply valid TDM signals. Also, non-uniform channel 
bandwidths are already provided by these intermediate signals. For highly non-uniform 
channels, additional stages are required, shown as dotted boxes. High variability can be 
provided in-service if the hardware architecture enables simple reconfiguration of different 
nodes.
Figure 3.6 General Multistage Polyphase-Matrix-DFT Demultiplexer

Figure 3.7 Polyphase-DFT Pipeline (with time-varying twiddle factors)

Figure 3.8 Multistage Polyphase-Matrix-DFT Pipeline
The polyphase-matrix networks in all cases are guaranteed to have minimum computation 
rates and minimum delay elements, given the prototype filter specification $h(t)$ or $H(f)$. 
The cases where $M$ and $L$ are mutually prime can be seen in identical structures elsewhere. 
In other cases, including where $M/L$ is an integer, our specification requires a higher 
sampling density for $h(t)$ compared to previous methods. However, not all filter samples 
are actually used in filtering (i.e., filter coefficient decimation). It can be shown that the 
same number of filter coefficients with identical values can be used in conventional 
methods and the polyphase-matrix-DFT method. Such filter over-sampling leads to a more 
successful unified theory than previously was the case.
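
The filter coefficient decimation can be illustrated by enumerating which prototype samples the subfilters touch. The sketch below assumes the subfilter indexing $iML + pM + qL$ with $p = 0, \ldots, L-1$ and $q = 0, \ldots, M-1$, as read from the subfilter definition and figure 3.4; this indexing is our reading of the source and should be treated as an assumption. The function name `used_prototype_residues` is hypothetical.

```python
def used_prototype_residues(M, L):
    """Residues modulo M*L of the prototype indices i*M*L + p*M + q*L used by the subfilters."""
    return sorted({(p * M + q * L) % (M * L) for p in range(L) for q in range(M)})

for M, L in [(3, 2), (4, 2)]:
    used = used_prototype_residues(M, L)
    print(M, L, used, f"{len(used)} of {M * L} sample positions used")
```

For the mutually prime pair $(3, 2)$ every residue appears, so every prototype sample is used. For $(4, 2)$, the binary tree node of table 3.2, only the even positions appear: the prototype is specified at twice the density actually needed by the filtering.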
3.5 Computer Simulation
The computation rate of different demultiplexers can be calculated given the prototype 
filter length, but there is no firmly established relationship between the filter specification 
and the ultimate performance criterion of bit error rate. The same is true for other complexity 
measures — the word lengths of coefficients and intermediate values. Although there are 
various quantization noise estimation techniques, they are based on simple analytic models 
and non-specific input signals. In contrast, computer simulation in DSP is primarily 
experimental: modeling is only required between the D/A and A/D converters, outside which 
exact numerical results can be obtained, as in real implementations.
Apart from the accurate estimation of demultiplexer complexity in selected cases, previous 
estimations of word length requirements are found to be largely pessimistic. This is because 
the trustworthy estimation formulae used are only relevant for single-rate DSP. In multirate 
sampling, there are large variations of processing bandwidths, so the quantization noise 
density, rather than the total noise power, has to be considered. Another example is 
the use of long filters with huge differences between the passband and stopband bandwidths, which 
rarely appear in single-rate sampling. The ranges of coefficient values in separate filters
differ greatly. The scaling of filter coefficients has to be taken into account before 
determining the quantization word length. As an illustration of these important factors, an 
A/D word length of 9 bits and filter coefficients of 8 bits can be used in the polyphase-DFT 
method from 8 to 256 channels, with bit error rate performance comparable to floating 
point precision.
These observations are confirmed by simulations involving large numbers of channels. No such 
scale of simulation can be seen in the literature to this date. The simulation software is custom 
developed with a data-flow oriented architecture targeted at multirate DSP. Here the 
number of channels is a simple parameter, with very flexible combinations of word length 
requirements. This is not the case even in the most sophisticated DSP software 
packages that are currently commercially available.
3.5.1 Simulation Procedure
In order to study the effect of demultiplexer distortion on the bit error rate (BER), a 
simulation procedure was developed that involved the generation of frequency multiplexed 
test signals, demultiplexing, demodulation and BER evaluation.
The input FDM consisted of independent QPSK channels generated from pseudo-random 
bit streams, at a rate of 64 kb/s. Nyquist shaping filter characteristics were equally shared 
between transmit and receive filters, with a roll-off factor of 40%. The number of PSK 
channels and the carrier spacing were varied to compare the computation rate, storage and 
word length requirements of the various demultiplexer approaches.
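
The equal sharing of the Nyquist characteristic can be made concrete in the frequency domain. The sketch below is an illustration only: it uses the textbook raised-cosine spectrum, takes its square root for each of the transmit and receive filters, and checks the Nyquist criterion for the cascade. The symbol period corresponds to 32 ksymbol/s (64 kb/s QPSK) with a roll-off of 0.4 as stated above; the function names `raised_cosine` and `root_raised_cosine` are hypothetical and this is not the simulation code of this work.

```python
import math

def raised_cosine(f, T, beta):
    """Raised-cosine (Nyquist) spectrum for symbol period T and roll-off beta."""
    f = abs(f)
    f1 = (1 - beta) / (2 * T)   # end of the flat region
    f2 = (1 + beta) / (2 * T)   # end of the roll-off region
    if f <= f1:
        return T
    if f <= f2:
        return T / 2 * (1 + math.cos(math.pi * T / beta * (f - f1)))
    return 0.0

def root_raised_cosine(f, T, beta):
    """Square-root split: the same response is used at the transmitter and at the receiver."""
    return math.sqrt(raised_cosine(f, T, beta))

T = 1 / 32_000      # symbol period for 64 kb/s QPSK
beta = 0.4          # 40% roll-off
grid = [i / (200 * T) for i in range(201)]            # frequencies from 0 to 1/T
cascade = lambda f: root_raised_cosine(f, T, beta) ** 2
worst = max(abs(cascade(f) + cascade(f - 1 / T) - T) for f in grid)
print("largest deviation from the Nyquist criterion:", worst)
print("two-sided signal bandwidth in Hz:", (1 + beta) / T)
```

The aliased copies of the cascade sum to the constant $T$, so the cascade has no intersymbol interference at the ideal sampling instants. The occupied bandwidth of $(1 + \beta)/T$ is what the demultiplexer passband has to accommodate, before any allowance for carrier uncertainty.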
The BER at the demodulator output was estimated by the semi-analytic method, where 
the following assumptions were made:
(i) the signal and quantization noise are uncorrelated,
(ii) the demodulator is ideal.
Hence the Gaussian noise power can be estimated from the equivalent noise bandwidth 
of the receive filter at the demodulator. The probability of individual symbol error was 
determined analytically from the distorted signal at the optimum sampling instant, and 
averaged over 100 symbols to obtain the BER. The demultiplexers were compared with a 
target performance of 0.1 dB degradation at an $E_b/N_0$ of 8.4 dB. When the binary tree 
architecture was employed for a small number of channels, the filters required 
were short, and a step change in filter order resulted in a drastic change in distortion. Hence the 
degradation value was chosen as a compromise between a measurable value and good 
performance. In order to study the undesired signal distortion caused by demultiplexing,
the test signal generation and demodulation algorithms were chosen such that the 
degradation due to inter-channel interference, carrier and clock recovery was negligible. 
Figure 3.9 illustrates graphically the procedure for determining the word length 
requirements, in this case the analogue to digital conversion (ADC) word length for the 
polyphase-DFT demultiplexer. The BER curve corresponding to the selected word length 
is not shown since the target degradation is very close to ideal. The scatter diagram 
corresponding to the chosen design, which included various finite word length effects, is 
shown on figure 3.10.
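
The semi-analytic method described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the simulation software of this work: QPSK symbols have unit energy, decisions are made independently on the in-phase and quadrature components, the noise is Gaussian with variance $N_0/2$ per component, and bit (rather than symbol) error probabilities are averaged. The function names `q_function` and `semi_analytic_ber` are hypothetical.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def semi_analytic_ber(samples, symbols, ebn0_db):
    """Average QPSK bit error probability over noise-free but distorted samples.

    samples: complex demodulator outputs at the sampling instants (distortion only)
    symbols: transmitted symbols (+-1 +-1j)/sqrt(2), i.e. unit symbol energy
    """
    ebn0 = 10 ** (ebn0_db / 10)
    eb = 0.5                              # Es = 1 and two bits per symbol
    sigma = math.sqrt(eb / ebn0 / 2)      # noise standard deviation per component
    total = 0.0
    for r, s in zip(samples, symbols):
        # Margin of each component from its decision threshold, signed by the bit sent.
        total += q_function(r.real * math.copysign(1, s.real) / sigma)
        total += q_function(r.imag * math.copysign(1, s.imag) / sigma)
    return total / (2 * len(samples))

a = 1 / math.sqrt(2)
symbols = [complex(a, a), complex(-a, a), complex(-a, -a), complex(a, -a)] * 25   # 100 symbols
ideal = semi_analytic_ber(symbols, symbols, 8.4)
distorted = semi_analytic_ber([0.95 * s for s in symbols], symbols, 8.4)          # illustrative distortion
print(ideal, distorted)
```

With undistorted samples the result reduces to the usual $Q(\sqrt{2E_b/N_0})$. Any distortion that moves samples towards the decision thresholds raises the average, and the degradation in dB is the extra $E_b/N_0$ needed to bring the BER back to the undistorted value.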
Figure 3.9 ADC Word Length Effect on BER (bit error rate versus $E_b/N_0$ in dB, for ADC word lengths of 6, 7 and 8 bits and the ideal case)

Figure 3.10 Scatter Diagram
In the design of the demultiplexer filtering bandwidth, a carrier uncertainty of ±600 Hz 
was taken into account. Consequently, the simulation results cannot be directly applied to 
other bit rate systems, since the filtering bandwidth is not directly proportional to the bit 
rate.
3.5.2 Simulation Results
In this study, the selected demultiplexer approaches were:
(1) Direct approach using complex filters,
(2) Polyphase-DFT approach using a 2K-point odd-DFT, realised by a K-point 
modified FFT,
(3) Binary Tree approach using complex filters.
Figure 3.11 shows a comparison of the computational complexity as the number of PSK 
channels is increased, for a channel spacing of 64 kHz. The Direct method is unsuitable 
for implementation if the computation rate is the prime concern. The Polyphase method 
is the most efficient in all cases, whilst the Tree method remains competitive, especially 
for a small number of channels. It should be noted that the distortion due to the Tree method 
is significantly smaller than that of the other methods in many cases. This is due to the low order 
of the required half-band filters. The available choices of design often result in performance 
that is either unsatisfactory or much better than the design goal.
The corresponding storage requirement follows similar trends to the computation rate, 
shown in figure 3.12, since both are related to the total filter length.
Figure 3.11 Computation Rate Comparison: (a) Direct Approach; (b) FFT and Tree Approaches (horizontal axis: number of channels, 8 to 256)

Figure 3.12 Storage Comparison: (a) Direct Approach; (b) FFT and Tree Approaches (horizontal axis: number of channels, 8 to 256)
Table 3.3 summarises the various word length requirements. This result was obtained by 
exploiting the relative amplitudes of filter coefficients. Overflow was prevented by scaling 
down of filter coefficients. This increase in filter word length was not counted since the 
most significant bits were zeros. In the case of polyphase sub-filters, scaling up was 
performed to take advantage of the reduced gain resulting from decomposition. The setting 
of ADC maximum input range varied as the standard deviation of the FDM signal. It is 
shown that the same ADC word length can be used for a large range of number of channels. 
This is because, as the number of channels increases, the ADC quantization noise power 
increases as well as the total bandwidth. Therefore the noise power density remains 
approximately constant.
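
The argument for a constant ADC word length can be followed with a short calculation. The sketch below is a simplified model under our own assumptions: the channels are independent with equal power, the ADC range is set to a fixed multiple (`loading`, a hypothetical parameter) of the FDM signal standard deviation, the quantization noise is uniform with power $\Delta^2/12$, and this noise is spread evenly over a total bandwidth proportional to the number of channels.

```python
def adc_noise_density(num_channels, bits, channel_power=1.0, channel_bw=1.0, loading=4.0):
    """Quantization noise power per unit bandwidth for an ADC whose range tracks the FDM signal."""
    sigma = (num_channels * channel_power) ** 0.5       # std of the sum of independent channels
    full_scale = 2 * loading * sigma                    # assumed range: +-loading standard deviations
    step = full_scale / 2 ** bits                       # quantization step
    noise_power = step ** 2 / 12                        # uniform quantization noise model
    return noise_power / (num_channels * channel_bw)    # noise spread over the whole FDM band

for k in (8, 16, 32, 64, 128, 256):
    print(k, adc_noise_density(k, bits=9))
```

The step size grows as the square root of the number of channels, so the total noise power grows linearly with it, exactly as the bandwidth does. The density seen by each channel is therefore unchanged, which is consistent with the same 9-bit ADC serving 8 to 256 channels in table 3.3.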
Table 3.3 Word length requirements in bits (including sign bit)

Direct Approach
No. of channels        8    16    32    64   128   256
ADC                    9     9     9     9     9     9
Filter Coef.           8     8     8     8     8     8
Filter Arith.         15    16    17    18    19    20

Tree Approach
No. of channels        8    16    32    64   128   256
ADC                    9     9     9     9     9     9
Filter Coef.           7     7     7     7     7     7
Filter Arith.         13    13    13    14    15    15

Polyphase FFT Approach
No. of channels        8    16    32    64   128   256
ADC                    9     9     9     9     9     9
Filter Coef.           8     8     8     8     8     8
Filter Arith.         11    11    11    11    11    11
FFT Coef.              6     6     6     7     7     7
FFT Arith.            11    12    12    13    13    14

The effect of channel spacing on the computation rate is shown in figure 3.13, where the 
number of channels is 16. The Tree method appears to be more efficient for narrow channel 
spacing. This can be explained by the fact that the filter length varies inversely with the 
transition bandwidth. The Tree method employs very low order filters and hence is 
insensitive to the narrow channel spacing. For low bit rate systems, the frequency deviation 
due to the Doppler effect, etc., increases relative to the PSK signal bandwidth. In order to 
avoid degradation due to band limitation in the demultiplexer, a filter bank with narrower 
transition bandwidth is required to account for an effective increase in signal bandwidth. 
The result is similar to a decrease in channel spacing. Hence the Tree method is more 
suitable where a small number of low bit rate channels require high bandwidth utilisation.
In general, the sampling rate at the output of the demultiplexer is not suitable for the 
following demodulation process, where an integer number of samples per data symbol is 
required. A sampling rate conversion filter is necessary for interfacing between the 
demultiplexed channels and the demodulator. This can be integrated with the post-filtering 
stage of the Tree method with reduced computations. This comparative advantage over 
the Polyphase method is also highly dependent on the channel spacing, etc., as mentioned 
above. Alternatively, all the per-channel filtering operations can be integrated into the pulse 
shaping filter of the demodulator. The overall computational complexity then involves 
demodulator design and optimization of the whole MCD. In clock recovery schemes where 
the pulse shaping filter is implemented as an adaptive polyphase network, the additional 
integration does not introduce computational overhead; however, the control complexity 
is necessarily increased.

Figure 3.13 Effect of Channel Spacing on Multiplication Rate (Tree and FFT approaches; channel spacing from 48 to 112 kHz)
3.6 Computation Hardware
From sequential optimization and computer simulation, we can obtain a good picture of 
the overall complexity of different demultiplexer approaches in terms of RAM/ROM sizes 
and computation rates. These comparisons are most relevant using multiplier 
implementation (lumped arithmetic), and low multiplication rate demultiplexers fare 
better. If implementation techniques are not restricted in combinational and hardware 
optimization, the preference may change. Different fast algorithms exploit the inherent 
relationship between memory size and computation rate in different ways. As a simple 
illustration, multipliers compute the product of two variables, but convolutions require 
scalers that compute the product of a constant and a variable. Multiplier implementation 
is efficient only when the number of constants is large. For example, there is no distinction 
between a variable and a constant operand if the variable word length is 8 bits and there 
are 256 constants. To establish the inherent complexity of demultiplexers, we need to 
consider a hardware computation technique that is not limited to particular applications, 
and generally efficient compared with other implementations.
We shall use distributed arithmetic (DA) as a yardstick. According to White [White89a], 
“DA has always fared well, not always (but often) best, and never poorly.”
DFT and FIR filters require the following inner-product
$$r = \sum_{n=0}^{N-1} A_n s_n$$
where $A_n$ are constants. If $s_n$ are represented with $K$ bits, $r$ can be obtained by a ROM with 
$NK$ address lines. The other extreme case is using $N$ ROMs with $K$ address lines for each 
multiplication, followed by additions. DA performs bit-wise computations. The 
implementation complexity can be illustrated by using positive base 2 numbers such that
$$s_n = \sum_{k=0}^{K-1} b_{n,k} 2^k$$
where $b_{n,k}$ equals 0 or 1. The inner-product becomes
$$r = \sum_{n=0}^{N-1} \sum_{k=0}^{K-1} A_n b_{n,k} 2^k$$
By interchanging the order of the summations, we have
$$r = \sum_{k=0}^{K-1} 2^k \left( \sum_{n=0}^{N-1} A_n b_{n,k} \right)$$
The partial result
$$\sum_{n=0}^{N-1} A_n b_{n,k}$$
can be found using a ROM with $N$ address lines. The address word is formed by taking 
only the $k$th bit from all the numbers $s_n$. Since $A_n$ is independent of $k$, only a single ROM 
is required. The $K$ partial results are then shifted (multiplication by $2^k$) and summed to give 
the inner-product.
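
The bit-serial computation can be sketched in software as follows. This is an illustration of the derivation above, not a hardware description: the samples are unsigned integers (the positive base 2 case), the constants are small integers chosen arbitrarily, and the function names `build_rom` and `da_inner_product` are hypothetical.

```python
def build_rom(coeffs):
    """ROM with N address lines: the entry at address a is the sum of A_n over the set bits n of a."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1) for addr in range(1 << n)]

def da_inner_product(rom, samples, word_bits):
    """Bit-serial inner product of unsigned samples with the constants stored in the ROM."""
    acc = 0
    for k in range(word_bits):
        # Address word: bit k of every sample s_n, one address line per sample.
        addr = 0
        for i, s in enumerate(samples):
            addr |= ((s >> k) & 1) << i
        acc += rom[addr] << k      # shifting by k places is the multiplication by 2**k
    return acc

coeffs = [3, -1, 4, 1, -5, 9, 2]              # illustrative 7-tap filter constants
rom = build_rom(coeffs)
samples = [17, 200, 5, 255, 0, 64, 129]       # unsigned 8-bit inputs
print(len(rom))
print(da_inner_product(rom, samples, 8) == sum(a * s for a, s in zip(coeffs, samples)))
```

The ROM has $2^N$ words regardless of the sample word length $K$, and the work per inner product is $K$ look-ups and additions with no multiplier. Signed samples would need a correction for the sign bit, which is omitted here.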
For a direct comparison, demultiplexers with over-sampling by 2 at both the input and 
output are used. This corresponds to a wide channel spacing with a guard band that is close 
to the signal bandwidth in the previous simulation study. Table 3.4 shows the ratios of 
different complexity measures in the filtering part of various demultiplexers.
The ratios of memory sizes given are the relative numbers of words. Since only one set of 
filter coefficients is used in the binary tree method, the complete demultiplexer can be 
implemented using one ROM and one adder. For example, a 7-tap filter requires a 128-word 
ROM. A higher computation rate requires duplication of ROMs and adders. For example,
Table 3.4 Complexity Ratio of Filtering in Demultiplexers
Demultiplexer ROM size RAM size Computation Rate
Polyphase-FFT K K 1
Binary Tree 1
2
K - 1
the bracketed term in the ROM size column gives the relative size in a pipeline 
implementation where each pipeline stage performs all computations in one level of the 
tree. The product of all columns (size times rate) gives a good measure of the total 
complexity. We can see that the binary tree approach using half-band filters is superior, 
despite the higher computation rate. The other favourable factor for the tree approach is 
that the FFT computations in the polyphase method are excluded in the table.
3.7 Comparison of Demultiplexer Approaches
We shall concentrate on a few promising demultiplexing methods for comparison. 
Single-channel approaches are inefficient, but multistage filtering may be worth 
considering. The polyphase-DFT method is a multichannel but single-stage approach. The 
corresponding multistage approach is the general tree method, where each node is a 
polyphase-matrix-DFT structure. This method is also suitable for non-uniform filter banks, 
along with the more conventional fast convolution and analysis-synthesis methods.
3.7.1 Computation Rate
The polyphase-DFT method is the most computationally efficient, especially for large numbers 
of channels. The tree method is a multistage polyphase method, which requires a higher 
computation rate as more stages (hence more nodes with smaller number of output 
channels) are used. This is the opposite to multistage filtering where the more stages the 
better. Since the computations are not shared amongst channels in multistage filtering, a 
factor of K  improvement compared to a single-stage filter is required to be comparable to 
the polyphase method. This does not occur except for very narrow channel guard bands. 
In this extreme case, the polyphase method is the least efficient.
3.7.2 Memory Size
The RAM size required to store input and intermediate signals is strongly related to the 
computation complexity, both being a function of the total filter length. Therefore the 
comparison is similar for both criteria.
The same holds for the ROM size required to store the filter and FFT coefficients, if these 
are mostly different from each other. The number of distinct coefficients is a prime factor 
of the inherent computation complexity. The more identical nodes (apart from clock rate) the 
tree method has, the smaller the ROM size. The polyphase method requires the largest 
ROM size, with multistage filtering lying between it and the tree method.
3.7.3 Delay
Signal delay due to filtering is a function of the equivalent per-channel filter length. The 
polyphase method uses a single-stage filter and therefore other methods cannot have a 
longer delay. Here multistage filtering is best because the purpose of such fundamental 
techniques is to obtain shorter equivalent filters by using more stages. Although multiple 
stages exist in the tree method, it does not share the same characteristics as multistage 
filtering, and the delay is between the two extreme methods.
3.7.4 Control Complexity
The overall complexity of demultiplexers is determined mainly by the computation rates 
and the complexity of control units (timing, logic, etc). Unfortunately, there is no general 
direct relation between the two. From experience gained in software simulation, the 
demultiplexer modularity has significant influence on the control complexity, because 
control units are required for individual modules as well as co-ordination between them.
Single-channel approaches have the least control complexity because there is no coupling 
between channels. The complexity of multistage filtering increases with the number of stages. 
For the polyphase method, the control of a large number of dissimilar units, i.e., sub-filters 
and butterfly processors, probably requires the most complex overhead circuitry. For the 
tree architectures, although the individual nodes are more complex than FIR filters, the 
co-ordination of these units, which differ in clock rate only, is simpler than in the polyphase 
method. Their control complexity lies somewhere between the above two.
3.7.5 Modularity
A highly modular demultiplexer consists of mostly identical sub-units of low complexity. 
This has the potential of low control complexity, good testability and high reliability, and is 
hence suitable for VLSI integration. Design and manufacturing will be greatly simplified 
compared to non-modular approaches. In addition, identical units allow uniform 
distribution of computation load to make the most efficient use of hardware.
The binary tree method belongs to this category. For uniform channel spacing, the nodes 
are identical, including filter coefficients, except for the clock rate. In addition, with only 
small differences in distortion, node designs are applicable to demultiplexers with large 
differences in the number of channels. In other approaches using high order prototype 
filters, the filter lengths and coefficients have to be modified according to the number of 
channels.
The polyphase method can be viewed as consisting of dissimilar subfilters of the polyphase 
network, and butterflies of the FFT processor. Since the filter and butterfly coefficients 
are different for each unit, the modularity is the lowest. In between is multistage filtering: 
individual stages are different, but the same set of stages is used in all channels.
3.7.6 Flexibility
Flexibility refers to the availability of efficient designs for arbitrary channel allocations, 
e.g., the number of channels and carrier spacing. Another important aspect is the ability 
to adapt to changing traffic requirements during the lifetime of a satellite.
Multistage filters are most flexible since the channel filters are designed independently to 
meet the individual channel requirements. Reconfiguration can be achieved by modifying 
the filter coefficients, but the computation rate has to accommodate the highest possible 
bit-rate channel.
The polyphase-DFT method is least flexible since uniform channel spacing is required. 
The number of channels depends on the FFT processor, and is most efficient when the 
transform length is a power of 2, 4, 8, etc. Flexibility in the number of channels can 
be achieved by using more complex algorithms: the mixed-radix FFT can be used for a 
highly composite channel number; the chirp-Z transform can be used in place of the FFT 
for an arbitrary number of channels. Reconfiguration is expected to be complex because both 
the number of sub-filters and the transform length have to be modified.
The general tree method can provide highly dissimilar bandwidths with a small number 
of nodes. All intermediate signals in the tree structure are valid TDM channels and can be 
selected, providing the simplest form of reconfiguration. A limited choice of channel 
bandwidth relocation can be provided by the switching of these intermediate channels. 
Compact stand-by modules increase the bandwidth variety.
The fast convolution and analysis-synthesis methods are targeted at non-uniform channel 
bandwidths. The advantage of these methods is the ability to provide dissimilar bandwidths. 
Their efficiency is difficult to assess because of complicated filter design and dependency 
on the degree of bandwidth dissimilarity. Since they are either inefficient or irrelevant in 
the case of uniform spacing, the tree structure is best placed to exploit any regularity in a 
non-uniform frequency plan, for example by grouping channels of different bandwidths into 
equal bandwidth chunks. Both the fast convolution and analysis-synthesis methods allow 
bandwidth relocation by simple switching. However, reconfigurations that affect the sizes 
of channel bandwidths require stand-by DFT modules of different sizes. There is no 
apparent advantage over the general tree method in this aspect.
3.7.7 Summary
The polyphase-DFT method has the lowest computation rate except for very narrow guard 
bands, but requires the largest memory size. This method is most suitable for single 
multiplier implementation, as in signal processors. If more multipliers are required to 
increase throughput, the ratio of constant coefficients per multiplier decreases, and with 
it the hardware efficiency. Modifications of the polyphase method allow trade-offs between 
computation rate and memory size, e.g., pre-filtering using IFIR filters and post-filtering 
using half-band filters. These methods share the same characteristics as the tree method, 
but are less flexible and modular.
The tree method requires a higher computation rate, but smaller memory size. This trend 
continues as more nodes with smaller DFT sizes are used in the tree. The overall hardware 
efficiency is highly competitive with the polyphase method. Non-uniform channel 
bandwidth can be easily provided by a small number of different nodes. One of the 
attractions of this method is the low design cost for the high hardware efficiency. The 
hardware design of the binary tree method is based on a single 2-channel demultiplexer 
node. Node designs are also easily reusable for demultiplexers with different capacities.
Single-channel methods are the least efficient overall. However, the short delay, simple 
design and high flexibility of multistage filtering may outweigh the inefficiency when the 
number of channels is very small. The relative efficiencies of different methods are close 
to unity in this case.
For non-uniform channel spacing, the tree method would be better than the fast convolution 
and analysis-synthesis methods if some regularity can be exploited. Adapting to changing 
traffic requirements would involve large overheads, and the tree method need not suffer 
most.
3.8 References
Bi90a. G. Bi, F. P. Coakley, and B. G. Evans, “Rational Sampling Rate 
Conversion Structures with Minimum Delay Requirements,” submitted 
to IEE Computer and Digital Techniques, 1990.
Crochiere83a. R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing, 
Prentice-Hall (1983).
Hsiao87a. C. C. Hsiao, “Polyphase Filter for Rational Sampling Rate 
Conversions,” Proc. IEEE Int. Conf. on ASSP, pp. 2173-2176, Apr. 
1987.
Vaidynathan90a. P. P. Vaidyanathan, “Multirate Digital Filters, Filter Banks, Polyphase 
Networks, and Applications: A Tutorial,” IEEE Proceedings, Vol. 78, 
pp. 56-93, 1990.
White89a. S. A. White, “Applications of Distributed Arithmetic to Digital Signal 
Processing: A Tutorial Review,” IEEE ASSP Magazine, Vol. 6, pp. 
4-19, Jul. 1989.
M. Bellanger, Digital Processing of Signals.
4 Digital Modulation Schemes and Synchronization Techniques
Table of Contents
4 A Survey of Digital Modulation Schemes and Synchronization Techniques
4.1 Baseband Digital Modulation Formats
4.1.1 16-QAM
4.1.2 Linear Modulation Schemes
4.1.3 MSK
4.1.4 Constant Envelope Modulation Schemes
4.2 Modulation Schemes for Mobile Applications
4.3 Comparison of Power and Bandwidth Efficiency
4.4 All-Digital Demodulators
4.4.1 Sampling Frequencies
4.4.2 Adaptive Sampling
4.4.3 Carrier Correction
4.4.4 Detection
4.4.5 Error Detectors
4.4.6 Loop Structures
4.4.7 Alternative Demodulator Structures
4.5 Digital Symbol Timing Adjustment
4.5.1 Polyphase-Lattice Structures
4.5.2 Adaptive Sampling Control
4.6 Synchronization in All-Digital Receivers
4.6.1 Maximum Likelihood Estimation
4.6.1.1 Other Criteria
4.6.1.2 Properties of ML Estimates
4.6.1.3 Classification of Synchronizers
Data-Aided Approach
Decision-Directed Approach
Non-Data-Aided Approach
4.6.1.4 Extraction of Estimates
Direct Search
Tracking
Direct Computation
Recursive Estimation
4.6.2 Tone Synchronization
4.6.3 Symbol Timing Synchronization
4.6.3.1 Zero-Crossing Algorithms
4.6.3.2 The Mueller and Mueller Algorithm
4.6.3.3 Symbol Clock Extraction
4.6.4 Phase Synchronization
4.6.4.1 The Costas Algorithm
4.6.4.2 The Viterbi and Viterbi Algorithm
4.6.5 Automatic Frequency Control
4.6.5.1 Phasor Filtering
4.6.5.2 Differential Decision Frequency Error Detector
4.6.5.3 Dual-Comb-Filter Frequency-Timing Error Detector
4.7 Comparison of Demodulator Complexity
4.8 References
A Survey of 
Digital Modulation Schemes and 
Synchronization Techniques
This chapter is an extensively revised version of a survey done as part of a contract 
for Marconi Space Systems. The prime aim of this chapter is to study techniques by which 
demodulators can be built from functional sub-units whose design is independent (or at 
most determined by a single parameter) of the chosen modulation format. Thus, rather 
than listing current (often ad-hoc) techniques, we would prefer to adopt a generalised 
description of the demodulation process [Yim90a]. It will be seen that this brings out 
many points of similarity between different formats, with many well known techniques 
dropping out as specialised cases. The demodulator structure derived from the generalised 
description may not, however, be the most efficient, as some format specific techniques 
may have an extremely simple implementation. Novel and improved algorithms will be 
listed in the relevant sections.
4.1 Baseband Digital Modulation Formats
The narrow sense of modulation is frequency shifting a baseband signal to obtain a high 
frequency signal for propagation. Carrier terms in a modulation scheme can be represented
by the generic modulation operator $\exp(j 2\pi f_c t)$. This allows us to concentrate on the 
baseband signal formats. Using this complex notation, the baseband signals are explicitly 
complex for bandwidth and power efficient digital modulation schemes.
Most existing digital modulation formats can be represented by the two baseband forms.
Linear Modulation (LM)
$$s(t) = A_1 \sum_n a_n\, g(t - nT)$$
and Constant Envelope Modulation (CEM)
$$s(t) = A_2 \exp\Big( j 2\pi h \sum_n a_n\, q(t - nT) \Big)$$
where $T$ is the symbol period, $A_1$ and $A_2$ are the normalizing constants, and $a_n$ is the 
transmitted symbol sequence with discrete complex values. In LM, the amplitude pulse 
$g(t)$ is often based on a Raised-Cosine (RC) spectral shape for zero Inter-Symbol 
Interference (ISI). In CEM, the phase pulse $q(t)$ determines the instantaneous phase of a 
complex sinusoid at baseband, with $h$ as the modulation index.
Linear (or amplitude) modulation is so called because the superposition principle holds 
for the data symbols $a_n$. For constant envelope modulation (or digital FM), $|s(t)|^2$ is 
unvarying. These baseband representations are also called the complex envelopes of their 
respective modulation schemes. A modulated signal is obtained by superimposing the 
instantaneous magnitude and phase of the envelope on the carrier.
LM includes M-ary Quadrature Amplitude Modulation (QAM), Phase Shift Keying (PSK), 
and Trellis-Coded Modulation (TCM) with signal space coding. CEM includes Continuous 
Phase Frequency Shift Keying (CPFSK) with Minimum Shift Keying (MSK) as the binary 
case. Wide-band PSK schemes that use a rectangular $g(t)$ also have a constant envelope. 
This will be clarified later. There are a variety of full response schemes, differing from 
CPFSK primarily in the phase pulse. CEM with signal space coding characteristics include 
partial response, correlative and multi-h schemes. Continuous-Phase Modulation covers 
most of these schemes. We adopt LM and CEM as the two main classes. The use of signal 
space coding divides each class into two. This classification also corresponds directly to 
receiver structure. For historical reasons, the more commonly used standard terms are not 
clear (or universally agreed upon). In addition, terms covering subsets of LM do not always 
have equivalents in CEM, and vice versa.
The basic difference between LM and CEM is similar to that between AM and FM in 
broadcasting. CEM is less bandwidth efficient compared to LM for the same M (related 
to the number of bits represented by each symbol). However, the performance under 
non-linearities (for R.F. power efficiency) is better for CEM. This is the main reason for 
considering the use of CEM; otherwise, LM is superior in general. LM and CEM will be 
illustrated in detail by 16-QAM and MSK respectively.
4.1.1 16-QAM
In the 16-QAM scheme, each symbol is determined by 4 bits. Therefore each element of 
the complex number sequence $a_n$ has 16 possible values. The signal space is the complex 
plane with 16 points on the integer grid as shown in figure 4.1. This is the trellis or 
constellation for linear modulation. The lines joining the trellis points show all the possible 
transitions between symbols. For trellis-coded modulation (TCM), not all the transitions 
are allowed. Multidimensional signal spaces are realised in periodic time-varying 2-D 
trellises, i.e., the trellises at $n$ and $n+1$ are different.
The mapping of the bit stream to $a_n$ is shown in figures 4.2 to 4.4. Although the input bit stream
is illustrated with a continuous rectangular wave, this is only to show the underlying time 
reference. We emphasise that bit streams are sequences of logic values each represented 
by 1 bit, whilst digital signals are number sequences (discrete in time) with finite precision 
(discrete valued). In DSP, a sufficiently sampled digital signal corresponds to the 
instantaneous values of an underlying continuous signal. Theoretically, each sample $s(nT)$ 
is equivalent to a weighted impulse $s(nT)\,\delta(t - nT)$ in the analog domain. Therefore, given 
the sampling rate, with zero-valued samples inserted between the $a_n$, we obtain an ideal complex 
impulse train (or two real impulse trains) in the digital domain, which is impractical in analog 
signal processing.
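The mapping and zero insertion described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which 4-bit pattern maps to which constellation point, so the Gray-coded table `LEVELS` is assumed, and the names `bits_to_16qam` and `zero_stuff` are hypothetical. Eight samples per symbol are used to match the time axes of the figures.

```python
# Assumed Gray mapping of a bit pair to one of the four levels -3, -1, +1, +3.
LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def bits_to_16qam(bits):
    """Group the bit stream into 4-bit blocks and map each block to a_n."""
    if len(bits) % 4:
        raise ValueError("bit stream length must be a multiple of 4")
    symbols = []
    for i in range(0, len(bits), 4):
        x = LEVELS[(bits[i], bits[i + 1])]       # real part of a_n
        y = LEVELS[(bits[i + 2], bits[i + 3])]   # imaginary part of a_n
        symbols.append(complex(x, y))
    return symbols


def zero_stuff(symbols, samples_per_symbol=8):
    """Insert zero-valued samples between the a_n to form the impulse train."""
    train = []
    for a in symbols:
        train.append(a)
        train.extend([0j] * (samples_per_symbol - 1))
    return train


bits = [0, 0, 1, 0, 1, 1, 0, 1]
a = bits_to_16qam(bits)    # two complex symbols
train = zero_stuff(a)      # 16 samples, non-zero only at the symbol instants
```

Filtering the real and imaginary parts of `train` with the symbol pulse then gives the sampled baseband signal.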
The generally used symbol pulse, $g(t)$, for QAM is the impulse response of a raised-cosine 
(RC) filter. The impulse response and frequency response of an RC filter with 40% excess 
bandwidth (roll-off factor) are shown in figure 4.5. For matched filter detection, the overall 
RC frequency response is realised by identical transmit and receive filters with square root 
raised-cosine frequency response, as shown in figure 4.6.
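As a minimal sketch of the RC symbol pulse, the block below evaluates the standard closed-form raised-cosine impulse response and samples it at the symbol instants to show the zero-ISI property. The closed form and the function names `sinc` and `raised_cosine` are assumptions for illustration; the thesis does not list its filter design procedure, and a practical FIR design would also window or optimise the taps.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def raised_cosine(t, alpha=0.4):
    """Raised-cosine impulse response g(t); t is in symbol periods."""
    denom = 1.0 - (2.0 * alpha * t) ** 2
    if abs(denom) < 1e-9:
        # Limit at |t| = 1 / (2 alpha), where the closed form is 0/0.
        return (math.pi / 4.0) * sinc(1.0 / (2.0 * alpha))
    return sinc(t) * math.cos(math.pi * alpha * t) / denom


# Taps at 8 samples per symbol over +/- 4 symbol periods.
taps = [raised_cosine(k / 8.0) for k in range(-32, 33)]

# Samples taken at the symbol instants t = nT: 1 at n = 0, zero elsewhere.
strobes = [raised_cosine(float(n)) for n in range(-4, 5)]
```

With a 40% roll-off, one of the 8-times oversampled taps falls exactly on the removable singularity at 1.25 symbol periods, which is why the limit is handled explicitly.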
Figure 4.1 16-QAM Trellis
Figure 4.2 Input Bit Stream (time in symbol period/8)
Figure 4.3 Real Component of $a_n$ (time in symbol period/8)
Figure 4.4 Imaginary Component of $a_n$ (time in symbol period/8)
Figure 4.5 Raised-Cosine FIR Filter with 40% Excess Bandwidth: (a) Impulse Response (time in symbol period/8), (b) Frequency Response (frequency in symbol rate)
Figure 4.6 Square Root Raised-Cosine FIR Filter with 40% Excess Bandwidth: (a) Impulse Response (time in symbol period/8), (b) Frequency Response (frequency in symbol rate)
The complex baseband signal $s(t)$ is obtained through passing the real and imaginary
components of the impulse train $\sum_n a_n\,\delta(t - nT)$ through two identical RC filters with
real impulse response $g(t)$. Figure 4.7 shows the baseband signal with the underlying 
impulse train. Here an RC filter (instead of square root RC) is used to show clearly the 
underlying number sequence.
Figure 4.7 16-QAM Baseband Signal: (a) Real Component, (b) Imaginary Component (time in symbol period/8)
Figure 4.8 shows the eye pattern for the overall RC pulse shape. The real (or imaginary) 
component of $s(t)$ is separated into time frames of 2 symbol periods and subsequently 
overlaid. The average amplitude fluctuation of $s(t)$ is maximum at the data strobes, i.e., 
at $t = nT$. Figure 4.9 shows the eye pattern of $|r(t)|^2$. The average of these $T$-spaced
samples exhibits a maximum at the data strobe, as shown in figure 4.10. This is the 
principle for symbol timing synchronization. The adverse effects of phase and frequency 
errors are eliminated through magnitude computation.
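The timing principle can be checked numerically with a short sketch. It assumes independent, zero-mean 16-QAM symbols with $E|a_n|^2 = 10$ and the standard closed-form RC pulse, so that the mean square magnitude at timing offset $\tau$ is $E|a_n|^2 \sum_n g^2(\tau + nT)$. The function names and the truncation length `span` are illustrative choices, not taken from the thesis.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def raised_cosine(t, alpha=0.4):
    """Raised-cosine impulse response g(t); t is in symbol periods."""
    denom = 1.0 - (2.0 * alpha * t) ** 2
    if abs(denom) < 1e-9:
        # Limit at |t| = 1 / (2 alpha), where the closed form is 0/0.
        return (math.pi / 4.0) * sinc(1.0 / (2.0 * alpha))
    return sinc(t) * math.cos(math.pi * alpha * t) / denom


def mean_square_magnitude(tau, symbol_power=10.0, alpha=0.4, span=200):
    """E|s(nT + tau)|^2 for i.i.d. zero-mean symbols; tau in symbol periods."""
    return symbol_power * sum(
        raised_cosine(tau + n, alpha) ** 2 for n in range(-span, span + 1)
    )


# Evaluate at 8 timing phases per symbol, as in the figures.
pattern = [mean_square_magnitude(k / 8.0) for k in range(8)]

# The phase with the largest mean square magnitude is the data strobe.
strobe_phase = max(range(8), key=lambda k: pattern[k])
```

Under these assumptions the pattern peaks at phase 0 and dips half a symbol later, and the depth of the dip grows with the roll-off factor. A timing estimator based on this principle therefore has less to work with as the excess bandwidth shrinks.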
Figure 4.8 16-QAM Eye Pattern (Real Component) (time interval in symbol period/8)
Figure 4.9 16-QAM Square Magnitude Pattern (time interval in symbol period/8)
Figure 4.10 16-QAM Mean Square Magnitude Pattern (time interval in symbol period/8)
When the initial frequency error is non-zero, the spectrum of the received signal and the 
matched filter are not aligned in frequency. This causes significant distortion to the overall 
RC response when the frequency error is large. In this case, synchronization prior to matched 
filtering is desirable. The principle of maximum amplitude fluctuation also holds for square 
root RC response. Figure 4.11 shows the eye pattern prior to matched filtering. The mean 
square magnitude pattern is shown in figure 4.12.
Figure 4.11 16-QAM Eye Pattern with Square Root Raised-Cosine Symbol Pulse (time interval in symbol period/8)
Figure 4.12 16-QAM Mean Square Magnitude Pattern with Square Root Raised-Cosine Symbol Pulse (time interval in symbol period/8)
After matched filtering and symbol timing synchronization, the received signal sampled 
at the symbol period is
$$r(nT) = a_n\, e^{j(\omega_0 nT + \theta_0)}$$
where $\omega_0$ and $\theta_0$ are the carrier frequency and phase error respectively. Carrier
synchronization reduces to computations on the data strobe only, i.e., one sample per 
symbol.
4.1.2 Linear Modulation Schemes
In 16-QAM, the real ($x_n$) and imaginary ($y_n$) parts of $a_n$ can take on values $\pm 1, \pm 3$. This is
a square constellation since 16 is an even power of 2. QAM normally refers to constellations 
where $x_n, y_n$ are independent, with $x_n, y_n = \pm 1, \pm 3, \pm 5$, etc. This covers rectangular 
constellations with cross constellations as a subset. Specific constellations may have 
special names. Optimal constellations are hexagonal and therefore $x_n, y_n$ are related.
In PSK, $x_n^2 + y_n^2$ is a constant, therefore the constellation is circular. In this thesis, PSK
usually refers to narrow-band modulation schemes based on RC spectral shaping which 
produces a varying envelope; this is sometimes called Quadrature Phase Modulation 
(QPM), emphasising the similarity to QAM. Wide-band PSK with rectangular $g(t)$ is a 
special case with a constant envelope.
Particular PSK schemes are: QPSK with $a_n = \pm 1 \pm j$; BPSK with $a_n = \pm 1$. They are both
circular and rectangular constellations. Although QAM and PSK are very similar, QPSK 
and BPSK should be called 4-QAM and 2-QAM respectively, based on the similarity in 
receiver structures. BPSK is the only scheme that has a real baseband signal, and is therefore 
very inefficient in terms of bandwidth, and offers no significant reduction in receiver 
complexity.
The amplitude pulses often used for LM have zero ISI. This condition holds for 
raised-cosine spectral response, though different pulse shapes may be used to achieve zero 
ISI. This is a filter approximation problem for analogue implementations, but a filter 
optimization problem for DSP. Partial response schemes with non-zero ISI can be used 
for further bandwidth reduction, but the same can be achieved using TCM.
Offset (or staggered) signals are LM schemes where the imaginary component of the 
baseband signal is delayed by half a symbol period with respect to the real component. 
This is a compromise between LM and CEM. The bandwidth (theoretically band-limited) 
and power efficiency are the same as for the corresponding LM scheme, but envelope variation 
is drastically reduced in the case of PSK. This is also a linear scheme, and at first sight 
there should be very little difference in modulation and demodulation techniques. Although 
the real and imaginary components have zero ISI points, no such points exist when 
considering the signal as complex. The zero ISI points can only be recovered when the phase 
(and frequency) error is zero. Therefore the inherent complexity of offset signals is often 
underestimated.
4.1.3 MSK
Figure 4.13 shows the phase pulse $q(t)$ for CPFSK. For MSK, the modulation index $h = 0.5$, 
and each symbol $a_n = \pm 1$ corresponds to one bit. Starting at zero, the phase reaches $\pi/2$ 
after one symbol and affects the starting point of the next symbol. For CPFSK, by 
convention, $q(t)$ reaches 0.5 after one symbol period and remains there indefinitely. 
Therefore $q(t)$ can be considered to be the impulse response of an IIR filter. FSK is so 
called because the frequency, i.e., the slope of the phase pulse, is constant within one 
symbol. For the purpose of generation, it is common to consider the frequency pulse $q'(t)$, 
which is the slope of the phase pulse. Therefore $q'(t)$ corresponds to the impulse response 
of an FIR filter. The infinite time convolution of $a_n$ and $q(t)$ can be obtained from the finite 
convolution of $a_n$ and $q'(t)$, followed by a phase accumulator.
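The generation route described above (finite frequency pulse followed by a phase accumulator) can be sketched as below. This is an illustrative model rather than the thesis's modulator: the function name `msk_baseband`, the choice of 8 samples per symbol and the unit amplitude are assumptions.

```python
import cmath
import math


def msk_baseband(symbols, h=0.5, samples_per_symbol=8):
    """Return (phases, signal) samples of a CPFSK signal; h = 0.5 gives MSK.

    The rectangular frequency pulse q'(t) has height 0.5 / T over one symbol
    (area 0.5), so each sample adds 2*pi*h*a_n*0.5/samples_per_symbol to the
    phase accumulator.
    """
    phase_acc = 0.0
    phases, signal = [], []
    for a in symbols:
        step = 2.0 * math.pi * h * a * 0.5 / samples_per_symbol
        for _ in range(samples_per_symbol):
            phases.append(phase_acc)
            signal.append(cmath.exp(1j * phase_acc))
            phase_acc += step  # phase accumulator (the recursive part)
    return phases, signal


a = [1, 1, -1, 1, -1, -1]
phases, s = msk_baseband(a)
```

Each symbol moves the phase by $\pm\pi/2$, and every sample of `s` has unit magnitude. A smoother frequency pulse would only change how `step` varies within the symbol.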
Figure 4.13 MSK (CPFSK) Phase Pulse (time in symbol interval)
Figure 4.14 shows the mapping of logic values into continuous phase values. The resultant 
complex baseband signal $s(t)$ is shown in figure 4.15. MSK is sometimes considered to 
be an offset-QPSK-like signal with half sinusoidal pulse shape. In this case one symbol 
corresponds to 2 bits. The mapping of logic values to $a_n$ is different but identical in property. 
Sometimes the baseband signal of MSK is shown as real. This is confusing because a low 
I.F. (an integer multiple of the bit rate) is used. We prefer the strict baseband signal with 
no carrier frequency terms. The complex envelope is shown in figure 4.16. It is evident 
from the expression of the baseband format that constant envelope holds for arbitrary phase 
pulses. Figure 4.17 shows the spectrum of MSK. Although better than wide-band PSK, 
the MSK spectrum is less attractive than narrow-band linear modulation using RC pulse 
shaping. For MSK, a sampling rate of at least 4 samples per bit is required. In comparison, 
for RC response with 40% excess bandwidth, the minimum sampling rate is 1.4 samples 
per symbol, i.e., 1.4 samples per bit for BPSK, and 0.7 for QPSK.
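The sampling rate figures for the RC case follow from a short calculation, sketched here under the assumption of complex sampling of the complex baseband signal, whose two-sided bandwidth is $(1+\alpha)/T$ for roll-off factor $\alpha$ (the symbol $\alpha$ is introduced here for illustration):

```latex
f_s \;\ge\; \frac{1+\alpha}{T} = \frac{1.4}{T}
\quad\Longrightarrow\quad 1.4 \ \text{samples per symbol},
\qquad
\frac{1.4}{\log_2 M} \ \text{samples per bit} =
\begin{cases}
1.4 & \text{BPSK } (M = 2) \\
0.7 & \text{QPSK } (M = 4)
\end{cases}
```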
Figure 4.14 MSK Phase Mapping: (a) Bit Stream, (b) MSK Modulated Phase (time in bit interval/8)
Figure 4.15 MSK Baseband Signal: (a) Real Component, (b) Imaginary Component (time in bit interval/8)
Figure 4.16 MSK Envelope
Figure 4.17 MSK Spectrum (frequency in symbol rate)
Since the amplitude of the modulated carrier is constant, information is encoded in the 
phase only. Figure 4.18 shows the ‘eye pattern’ of the phase in figure 4.14, with interval 
equal to 2 symbol periods. It is more informative to show the phase difference spaced $T$ 
in time. From the phase pulse, the phase difference between time $nT$ and $(n+1)T$ is 
either $\pi/2$ or $-\pi/2$. The phase difference at the middle of each symbol interval is either 
zero when there is a symbol transition, or remains at $\pm\pi/2$. The phase difference ‘eye pattern’ 
is shown in figure 4.19. (The visible transitions between $\pm\pi/2$ close to samples 0 and 8 are 
due to finite resolution for illustration purposes. The true transitions coincide with the 
symbol boundaries, i.e., two vertical lines at samples 0 and 8.) We can see that symbol 
synchronization is associated with phase difference, which is maximum at symbol 
boundaries. To calculate true phase differences, we need rectangular to polar conversions. 
However, cosine and sine approximations are easily computed. This is shown in figure 
4.20.
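One way to obtain the sine and cosine of the phase difference without a rectangular-to-polar conversion is the conjugate product $s(t)\,s^*(t - T)$, sketched below for a noise-free MSK signal. The thesis does not state that this is how figure 4.20 was produced; the product form, the function names and the test sequence are assumptions for illustration.

```python
import cmath
import math


def msk_signal(symbols, samples_per_symbol=8):
    """Complex MSK baseband samples (h = 0.5, rectangular frequency pulse)."""
    phase, out = 0.0, []
    for a in symbols:
        for _ in range(samples_per_symbol):
            out.append(cmath.exp(1j * phase))
            phase += a * math.pi / (2 * samples_per_symbol)
    out.append(cmath.exp(1j * phase))  # sample at the final symbol boundary
    return out


def sin_cos_phase_difference(s, samples_per_symbol=8):
    """Return sin and cos of the phase difference over one symbol period.

    s(t) * conj(s(t - T)) = exp(j * delta_phi) for a unit-envelope signal, so
    its imaginary and real parts give sin(delta_phi) and cos(delta_phi).
    """
    prod = [s[k] * s[k - samples_per_symbol].conjugate()
            for k in range(samples_per_symbol, len(s))]
    return [p.imag for p in prod], [p.real for p in prod]


a = [1, -1, -1, 1, 1, -1]
s = msk_signal(a)
sin_d, cos_d = sin_cos_phase_difference(s)

# Sampling sin(delta_phi) at the symbol boundaries recovers the symbols.
decisions = [1 if sin_d[k] > 0 else -1 for k in range(0, len(sin_d), 8)]
```

At the symbol boundaries the sine term equals $\pm 1$ and the cosine term is zero. Midway through a symbol the sine term is zero if the neighbouring symbols differ, which matches the behaviour described for the phase difference pattern.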
Figure 4.18 MSK Phase Pattern (time interval in bit interval/8)
Figure 4.19 MSK Phase Difference Pattern (time interval in bit interval/8)
Figure 4.20 Approximations of MSK Phase Difference Pattern: (a) SIN of Phase Difference Pattern, (b) COS of Phase Difference Pattern (time interval in bit interval/8)
4.1.4 Constant Envelope Modulation Schemes
For MSK, $a_n = \pm 1$. For general CEM, $a_n = \pm 1, \pm 3, \pm 5$, etc. Although the transmitted symbols
are real, the baseband signal is complex due to the complex exponential function in the 
baseband equation. The modulation index $h$ is adjusted accordingly to maximize the phase 
change between symbols. Using complex notations, the carrier term can be absent because 
negative frequencies can be represented.
In CPFSK, the phase pulse increases linearly from 0 to 0.5 within one symbol period. The 
slope of the phase pulse (the frequency pulse) is constant within one symbol and zero 
outside. This rectangular pulse shape leads to the term frequency shift keying, with infinite 
bandwidth in theory. Smoother phase pulses result in more attractive bandwidth 
requirements. A variety of smoother phase pulses can be used, with the same end points
0 and 0.5. For example, a raised-cosine (time domain) shape is smoother than a linear 
ramp. Using DSP, these smoother pulses share similar transmitter and receiver structures 
with CPFSK. However, in analog implementations, the phase pulse is difficult to 
approximate.
The above schemes with various phase pulses are full response types. Although the phase 
is correlated from symbol to symbol, the phase difference (or frequency) is independent 
between symbols. Using smooth phase pulses alone does not reduce bandwidth significantly. 
In partial response types, the phase pulse increases by 0.5 not in one symbol interval, but 
in multiple symbol intervals. That is, the frequency pulse is non-zero for more than one 
symbol interval. It is easy to see that the resultant phase is significantly smoother even 
with a frequency pulse extending to 2 symbol intervals. Partial response is popular in 
CEM schemes but not in LM schemes. This is because in full response LM, the signal 
amplitude at an arbitrary instant is already correlated with many symbols, and is 
independent only at the data strobe.
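The difference between full and partial response can be illustrated with a raised-cosine frequency pulse of length $L$ symbols. The pulse formula below is the form commonly labelled LRC in the continuous-phase modulation literature and is assumed here for illustration; the thesis does not give its pulse expressions, and the function names are hypothetical.

```python
import math


def lrc_frequency_pulse(t, L=2):
    """Raised-cosine frequency pulse of length L symbols (t in symbol periods)."""
    if t < 0.0 or t > L:
        return 0.0
    return (1.0 - math.cos(2.0 * math.pi * t / L)) / (2.0 * L)


def phase_pulse(L=2, samples_per_symbol=64):
    """Integrate q'(t) numerically (midpoint rule) to obtain samples of q(t)."""
    dt = 1.0 / samples_per_symbol
    q, acc = [0.0], 0.0
    for k in range(L * samples_per_symbol):
        acc += lrc_frequency_pulse((k + 0.5) * dt, L) * dt
        q.append(acc)
    return q


q_full = phase_pulse(L=1)      # full response: reaches 0.5 after one symbol
q_partial = phase_pulse(L=2)   # partial response: takes two symbols
```

With `L=2` the phase pulse has covered only half of its final value after one symbol, so each symbol interval carries contributions from two symbols. This overlap is what smooths the phase and correlates adjacent symbols.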
In multi-h schemes, h varies periodically from symbol to symbol. This signal space coding 
relates directly to TCM in LM.
In other CEM schemes, the phase pulse $q(t)$ is the impulse response (can be infinite) of a 
filter specified in the frequency domain. Explicit filtering is used for improved phase pulse 
smoothing. Therefore symbols are correlated. These are called Correlative Phase 
Modulation (CPM) schemes, including partial response types above. In Correlative Phase 
Shift Keying (CORPSK), filters with raised-cosine shape (frequency domain) are used. 
Tamed Frequency Modulation (TFM) is a binary case. In Gaussian Minimum Shift Keying 
(GMSK), a Gaussian filter is used. The term Continuous Phase Modulation (CPM) covers 
all types of CEM, including non-linear schemes that cannot be fitted directly into our CEM 
equation. Here the non-linearity refers only to the mapping of $a_n$ into phase values, therefore 
the modulated signals have similar properties to linear coding schemes.
4.2 Modulation Schemes for Mobile Applications
The assumption of this chapter is the use of an FDMA scheme. Much of the work 
would also be applicable to TDMA schemes, although the operating speed is much higher.
Many studies have examined the optimum choice of modulation. Three key factors in this 
choice are
(a) the need for low power
(b) the requirement for minimum bandwidth, especially if L-band transmission is 
required
(c) the signal distortion (e.g., increase in signal bandwidth) due to the 
economically unavoidable non-linearities of an HPA.
The requirements for minimum power and for minimum bandwidth are conflicting. 
Multi-level schemes, e.g., 64-QAM, are now widely employed on point-to-point digital 
radio but are very sensitive to channel impairments. Although the use of linearising 
feedback or pre-compensation of signals can counteract some of the HPA non-linearity, 
this factor encourages the use of constant (or near constant) envelope schemes such as 
MSK or Offset QPSK (OQPSK). That said, most systems have chosen QPSK, partly from 
familiarity and its ease of demodulation but also from its amenability to filtering using 
root raised-cosine filters. Table 4.1 shows some of the existing and proposed schemes for FDMA.
Table 4.1 Existing and Proposed Modulation Schemes for Mobile Applications
System                          | Service            | Access | Bit rate (bit/s) | Mod    | Channel spacing (Hz) | Codec    | Interleave (s) | C/No (dB)
--------------------------------+--------------------+--------+------------------+--------+----------------------+----------+----------------+----------
Inmarsat-B                      | signalling forward | SCPC   | 6k               | BPSK   |                      | conv 1/2 |                |
                                | signalling return  |        | 24k              | OQPSK  |                      | conv 3/4 |                |
                                | voice + data       |        | 24k              | OQPSK  | 20k                  |          |                | 49
Inmarsat-C                      | signalling forward | SCPC   | 1.2k             | BPSK   |                      | conv 1/2 |                |
                                | signalling return  |        | 1.2k             | BPSK   |                      | conv 1/2 |                |
                                | voice + data       |        | 1.2k             | BPSK   | 5k                   |          | 8.64           | 37
Inmarsat-M                      | signalling forward | SCPC   | 6k               | BPSK   |                      | conv 1/2 |                |
                                | signalling return  |        | 3k               | BPSK   |                      | conv 1/2 |                |
                                | voice + data       |        | 8k               | OQPSK  | 10k                  |          | 0.12           | 42
Inmarsat Aeronautic Medium Gain | signalling forward | SCPC   | 600              | ABPSK  |                      | conv 1/2 |                |
                                | signalling return  |        | 600              | ABPSK  |                      | conv 1/2 |                |
                                | voice + data       |        | 21k              | OQPSK  | 17.5k                |          | 0.04           | 48
Inmarsat Aeronautic Low Gain    | signalling forward | SCPC   | 600              | ABPSK  |                      | conv 1/2 |                |
                                | signalling return  |        | 600              | ABPSK  |                      | conv 1/2 |                |
                                | data               |        | 600              | ABPSK  | 2.5k                 |          | 0.67           | 36
TMI (Canada)                    | voice + data       | SCPC   | 4.8k             | 16QAM  | 5k                   | TCM      |                | 51
AMSC (America Phase I)          | message forward    | SCPC   | 1.2k             | BPSK   |                      |          |                |
                                | message return     |        | 600              | BPSK   |                      |          |                |
ABPSK: Avionic BPSK
Source: 2nd International Mobile Satellite Conference, Ottawa, 1990
4.3 Comparison of Power and Bandwidth Efficiency
The following tables are summaries of power and bandwidth efficiency of Linear 
Modulation schemes and various Constant Envelope Modulation schemes. $B$ is the 
normalized bandwidth, the one-sided signal bandwidth divided by the bit rate. For linear
modulations, the signal bandwidth is theoretically and practically limited. For constant 
envelope modulations, the -30 dB bandwidth is given. For CEM, $L$ is the duration of the 
frequency pulse in symbols. $L = 1$ for full response, $L > 1$ for partial response. The power 
efficiency is indicated by the approximate normalized minimum Euclidean distance $d^2_{\min}$, 
in dB with respect to BPSK. For high $E_b/N_0$, the approximate probability of error is given 
by
$$P_e \approx K\, Q\!\left(\sqrt{d^2_{\min}\,\frac{E_b}{N_0}}\right)$$
where $K$ is a positive constant independent of $E_b/N_0$, e.g., $K = 1$ for BPSK; 1 for
QPSK.
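The PSK entries of the linear modulation table can be reproduced with a short sketch. It assumes the usual high-SNR form $P_e \approx K\,Q(\sqrt{d^2_{\min} E_b/N_0})$, with the squared minimum distance normalised by $2E_b$ so that BPSK has $d^2_{\min} = 2$, and a raised-cosine spectrum with 40% excess bandwidth. The M-PSK distance formula and all function names are assumptions introduced for illustration, not taken from the thesis.

```python
import math


def psk_min_distance_db(M):
    """Normalised d^2_min of M-PSK in dB relative to BPSK.

    Uses d^2_min = 2 log2(M) sin^2(pi / M), i.e. the squared minimum
    distance divided by 2 E_b, which equals 2 for BPSK.
    """
    d2 = 2.0 * math.log2(M) * math.sin(math.pi / M) ** 2
    return 10.0 * math.log10(d2 / 2.0)


def lm_normalised_bandwidth(M, alpha=0.4):
    """One-sided raised-cosine bandwidth divided by the bit rate."""
    return (1.0 + alpha) / (2.0 * math.log2(M))


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def approx_error_probability(d2_min, ebn0_db, k=1.0):
    """High-SNR approximation K * Q(sqrt(d^2_min * E_b / N_0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return k * q_function(math.sqrt(d2_min * ebn0))


for M in (2, 4, 8):
    print(M, round(lm_normalised_bandwidth(M), 2), round(psk_min_distance_db(M), 2))
```

Under these assumptions the loop gives the bandwidth and distance figures tabulated for BPSK, QPSK and 8PSK. `approx_error_probability(2.0, 9.6)` gives the familiar BPSK error rate of about $10^{-5}$ at 9.6 dB.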
Table 4.2 Efficiency of Linear Modulation
M  | Scheme | B    | $d^2_{\min}$ (dB)
---+--------+------+------------------
2  | BPSK   | 0.70 | 0.00
4  | QPSK   | 0.35 | 0.00
8  | 8PSK   | 0.23 | -3.57
16 | 16QAM  | 0.18 | -6.99
Raised-cosine (frequency domain) amplitude pulse, excess bandwidth 40%. 
Data applicable to corresponding offset signals.
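Table 4.3 Efficiency of CPFSK
M | L | Scheme | h   | B     | $d^2_{\min}$ (dB)
--+---+--------+-----+-------+------------------
2 | 1 | MSK    | 1/2 | 1.55  | 0.00
  | 2 |        | 1/2 | 1.09  | -0.62
  | 3 |        | 1/2 | 0.82  | -1.64
4 | 1 |        | 1/4 | 1.02  | -1.38
  | 2 |        | 1/4 | 0.60  | -7.00
8 | 1 |        | 1/8 | >0.48 | -5.20
Rectangular frequency pulse.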
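Table 4.4 Efficiency of Raised-Cosine Phase Pulse
M | L | Scheme | h   | B    | $d^2_{\min}$ (dB)
--+---+--------+-----+------+------------------
2 | 1 | SFSK   | 1/2 | 1.56 | 0.00
  | 2 |        | 1/2 | 1.04 | -0.10
  | 3 |        | 1/2 | 0.76 | -0.54
  | 4 |        | 1/2 | 0.64 | -1.20
4 | 2 |        | 1/4 | 0.58 | -1.77
  | 3 |        | 1/4 | 0.42 | -3.16
  | 4 |        | 1/4 | 0.38 | -8.53
8 | 2 |        | 1/2 | 0.93 | 2.18
  | 3 |        | 1/4 | 0.45 | -1.40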
Table 4.5 Efficiency of Correlative Phase Shift Keying
M | Scheme                                   | h   | B     | $d^2_{\min}$ (dB)
--+------------------------------------------+-----+-------+------------------
2 | CORPSK (2-3, 1+D), 0=1/64                | 0.5 | 0.81  | -0.20
  | TFM                                      | 0.5 | 0.66  | -1.00
  | CORPSK (2-5, 1+D+D^2+D^3), 0=1/64        | 0.5 | 0.65  | -2.40
4 | CORPSK (4-7, 1+D), 0=1/64                | 0.5 | 0.72  | 2.00
  | Non-linear CORPSK (4-5), 0=0.5           |     | 0.66  | -0.30
8 | Non-linear CORPSK (8-9), 0=0.5           |     | <0.52 | -1.00
Nyquist-III phase filtering.
Table 4.6 Efficiency of Gaussian MSK
M | Scheme | h   | B    | $d^2_{\min}$ (dB)
--+--------+-----+------+------------------
2 | GMSK   | 0.5 | 0.78 | -0.46
Gaussian phase filtering.
From the above tables of comparison, the following conclusions can be drawn.
LM is superior in terms of bandwidth efficiency. Very few CEM schemes, including 
partial response types and those with high $M$, come close to BPSK, and they remain far from QPSK. With 
the same power efficiency, the bandwidth of MSK is at least 4 times larger than QPSK 
and twice that of BPSK (RC filtered). For binary ($M = 2$) CEM schemes, TFM and GMSK 
come close to BPSK. CEM schemes with attractive bandwidths comparable to QPSK 
require $M = 4$ and above, with partial response or CORPSK. However, these correlative 
schemes require Maximum Likelihood Sequence Estimation (MLSE) with long 
observation intervals (number of symbols). It should be noted that an out of band power 
less than -40 dB is easily achievable (at least with digital modulators) for LM. The -40 
dB bandwidth for CEM is significantly larger than the -30 dB bandwidth given above, as 
CEM spectra have a low roll-off rate. For all-digital modems, the smaller the bandwidth, 
the lower the computation rate. Therefore there is strong reason to go for high $M$ schemes, 
especially for CEM.
BPSK has the same power efficiency as QPSK, but requires twice the bandwidth. Therefore 
QPSK is more attractive than BPSK for all-digital receivers. In addition, PSK (RC filtered) 
schemes have marginally less average envelope fluctuations for higher M. For LM 
schemes, the bandwidth efficiency increases from that of QPSK for increasing M, but the 
power efficiency decreases. Using TCM, the power efficiency is increased with increasing 
demodulator complexity. This additional complexity is mainly due to trellis-decoding. The 
computation rate is actually reduced, since this is proportional to the signal bandwidth. To 
meet the future growth in spectrum demand, a probable choice of modulation is bandwidth 
efficient TCM schemes. The receiver need not be highly complex. Sub-optimal decoders 
can be used, and a higher implementation loss due to the complex trellis can be allowed. 
Also, because of the high number of trellis points that can be manipulated, current TCM 
developments include coding against fading and non-linear channels.
Most of the CEM schemes shown have a lower power efficiency than BPSK. Yet, there 
are trade-offs within each entry between power and bandwidth efficiency. We have selected 
the minimum bandwidth scheme as a standard. Larger h values can be found to give better 
power efficiency, which can be better than BPSK. However, the bandwidth increases with 
increasing h, and the observation interval for MLSE increases. Replacing QPSK with 
schemes that are less vulnerable to non-linearity caused by R.F. amplifiers requires highly 
complex CEM schemes, if similar bandwidth efficiency is to be maintained. As with TCM, 
the increased complexity is in detection, while the computation complexity varies with 
the signal bandwidth.
Yim, All-Digital Multicarrier Demodulators 4-17
4 Digital Modulation Schemes and Synchronization Techniques Comparison of Power and Bandwidth Efficiency 4.3
Offset PSK schemes and their variants are very attractive. Current developments in LM 
schemes are offset in nature. They have the desirable bandwidth and power efficiency of 
LM schemes, with largely reduced envelope fluctuations. However, for the same receiver 
complexity, the initial acquisition and steady state performance of offset schemes are 
poorer.
4.4 All-Digital Demodulators
This section gives an overview of demodulator structures emphasising DSP.
The modulated signal format at baseband can be written as $s(t, a)$. The baseband signal 
at the receiver becomes
$$r(t) = s(t + \varepsilon_0 T, a)\, e^{j(\omega_0 t / T + \theta_0)} + w(t)$$
The timing error, $\varepsilon_0$, is usually defined to be in the semi-closed interval $[-0.5, 0.5)$, since
signal delays equal to an integer multiple of the symbol period ($T$) are equivalent to shifting 
the sequence $a_n$. The frequency error $\omega_0$ is normalised with respect to the symbol rate. This 
term is not the carrier frequency, but the small frequency difference between the carrier 
and local oscillator. The phase error is $\theta_0$. All noise terms are included in $w(t)$.
In brief, the functions of the demodulator are synchronization and detection. 
Synchronization refers to the estimation and correction of the parameters $\varepsilon_0$, $\omega_0$ and $\theta_0$.
Using the estimated parameters, the corrections performed are
$r(t)\, e^{-j\hat{\omega}_0 t / T}$    frequency
$r(t)\, e^{-j\hat{\theta}_0}$    phase
$r(t - \hat{\varepsilon}_0 T)$    symbol timing
This is a simplification since correcting the timing error first leads to a different phase 
angle for correction. However, there are always two terms to be estimated and corrected 
separately. Similar consideration applies to the frequency error and phase error. Detection 
is the estimation of the sequence $a_n$.
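The frequency and phase corrections listed above amount to counter-rotating the received samples. The following is a minimal sketch, assuming the frequency error is expressed in radians per symbol and that perfect estimates are available. The function names and the test signal are hypothetical and serve only as an illustration; timing correction is omitted because it requires interpolation, which is the subject of section 4.5.

```python
import cmath

def apply_offsets(samples, omega0, theta0, samples_per_symbol):
    """Rotate samples by a frequency error omega0 (assumed rad/symbol) and a phase error theta0."""
    return [x * cmath.exp(1j * (omega0 * n / samples_per_symbol + theta0))
            for n, x in enumerate(samples)]

def correct_offsets(samples, omega_hat, theta_hat, samples_per_symbol):
    """Counter-rotate the samples using estimates of the frequency and phase errors."""
    return [x * cmath.exp(-1j * (omega_hat * n / samples_per_symbol + theta_hat))
            for n, x in enumerate(samples)]

# Hypothetical noise-free test signal: QPSK-like points, two samples per symbol.
tx = [1 + 0j, 1j, -1 + 0j, -1j] * 4
rx = apply_offsets(tx, omega0=0.05, theta0=0.3, samples_per_symbol=2)
fixed = correct_offsets(rx, omega_hat=0.05, theta_hat=0.3, samples_per_symbol=2)

# With exact estimates the corrected samples coincide with the transmitted ones.
max_err = max(abs(a - b) for a, b in zip(tx, fixed))
```

In a real demodulator the estimates come from the error detectors and loops described in the following subsections, so the residual error would not be zero.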
There are numerous possible demodulator structures. One example for linear modulation 
schemes is shown in figure 4.21. The overview of the demodulator structure and possible 
variants will be discussed in the following sections.
[Block diagram not reproduced. Legend:]
AAF   Analogue Anti-aliasing Filter    DAF   Digital Anti-aliasing Filter
XO    Crystal Oscillator               AS    Adaptive Sampler
MF    Matched-Filter                   DEC   Bit Decision
CSIN  Cosine and Sine Table            CAC   Cyclic Accumulator
TED   Timing Error Detector            TAC   Timing Accumulator
PED   Phase Error Detector             PAC   Phase Accumulator
FED   Frequency Error Detector         FAC   Frequency Accumulator
MUX   Multiplexer
Figure 4.21 All-Digital Demodulator Structure
4.4.1 Sampling Frequencies
After the analogue anti-aliasing filter (AAF), the input signal is initially sampled at some 
rate at the A/D convertor, driven by a free running oscillator (usually a crystal oscillator, 
XO). Sampling frequencies in the following stages vary according to the algorithms used, 
to take advantage of multirate processing. To simplify discussion, these rates are generally 
denoted by $F_i = 1/T_i$. It is more convenient to normalize these rates with respect to the 
symbol period:
$$f_N = T / T_i$$
That is, $f_N$ is the number of samples per symbol. Sampling rates are normally reduced
further away from the A/D convertor, requiring digital anti-aliasing filters. The DAF here 
represents the demultiplexer or digital anti-aliasing filter bank in the MCD. In general, a 
higher A/D sampling frequency allows a simpler or better AAF, with decimation in the 
DAF to the required sampling rate in the following stages.
For linear modulation, only one sample per symbol ($f_N = 1$) is required in the decision
(DEC) of the symbols $a_n$. Therefore the output of the matched-filter (MF) can be sampled 
at $f_N = 1$. However, the input at the MF should be sufficiently sampled. For RC pulse
shapes with up to 100% excess bandwidth, $f_N = 2$ is adequate. The signal in the preceding 
stages also should be sufficiently sampled, with rates depending on the synchronization 
algorithms and intermediate signal bandwidths.
4.4.2 Adaptive Sampling
In analogue and hybrid demodulators, the A/D convertor is driven by a voltage controlled 
oscillator (VCO) instead of a free running XO. Controlled by the demodulator, the VCO 
frequency and phase lock on to the timing format of the modulated signal. The sampling 
frequency is synchronized to the symbol rate, i.e., $F_i = C/T$ where $C$ is a small positive 
integer in general. The sampling phase is adjusted such that the timing error $\varepsilon_0$ is zero. 
Because of the synchronism, symbol-rate sampling ($f_N = 1$) and filtering may be performed.
In all-digital demodulators, the functions of A/D and VCO are performed digitally in the 
adaptive sampler (AS). All signals prior to the AS are sufficiently sampled. In MCD’s 
where a single A/D and shared digital processing is desirable, using an AS in each channel 
demodulator is the only way to achieve timing synchronization, if the data channels have 
independent symbol timing.
For best performance, the AS stage should be put before all other error detection and 
correction stages; otherwise, detection and correction are performed in different time scales. 
The input to the AS is $\hat{\varepsilon}_0$ and the timing correction performed is $r(nT_i - \hat{\varepsilon}_0 T)$. The AS can 
be implemented as a multirate filter with variable sampling phase. A simpler description 
of the AS is a digital variable delay element. However, the synchronization of the sampling 
frequency with the symbol rate is also important.
4.4.3 Carrier Correction
The first multiplier performs the carrier frequency correction
$$r(nT_i)\, e^{-j\hat{\omega}_0 n T_i / T}$$
The second multiplier deals with the phase
$$r(nT_i)\, e^{-j\hat{\theta}_0}$$
The complex exponential terms are obtained from tables of cosine and sine values (CSIN), 
with phase angles as the indexes.
This is a 2nd order split-loop structure. If the frequency error is significant, its correction 
must be placed before the matched-filter (MF). This is because the MF and the signal 
should have the same centre frequency, zero for baseband synchronization. If residue errors 
are present, the signal and filter are not frequency aligned, and distortions would be 
introduced. The phase correction can be combined with the frequency correction. However, 
a separate correction close to the phase error detector avoids the delay due to the MF, 
which improves the loop performance.
4.4.4 Detection
Matched-filtering can be derived from optimal estimation theory. In brief, the same pulse 
shape $g(t)$ should be used in both the transmitter and receiver. If the pulse shape is 
root-raised-cosine, the MF outputs sampled at the correct instants one symbol period apart 
are the continuous-valued estimates of $a_n$. The decision (DEC) function then obtains a 
discrete-valued estimate $\hat{a}_n$. The MF output sampled at $f_N = 1$ is known as the data strobe.
4.4.5 Error Detectors
Ideally, the timing, frequency and phase error detectors (TED, FED and PED) produce 
output values proportional to the parameters $\varepsilon_0$, $\omega_0$ and $\theta_0$. These detector outputs ultimately 
form the estimates for use in correction, generally once per symbol. These detectors (and 
general estimation techniques) can be classified according to where their inputs are 
obtained. The PED shown is decision directed (or decision feedback), since the decisions $\hat{a}_n$ are used. 
In contrast, the TED and FED shown are non-data-aided. A third type of detector is 
data-aided, that is, the sequence $a_n$ is known. This type of detector is used when a known 
sequence of data is transmitted initially to speed up or ease synchronization.
4.4.6 Loop Structures
The feedback-loop technique is a particular way of estimation. Simpler (fewer 
computations) detectors can be used, and non-ideal detector characteristics can be tolerated. 
If the parameter is unknown but constant, e.g. $\varepsilon_0$, a first-order loop suffices. In contrast to 
analogue loops, digital loop filters are simply accumulators (AC’s), i.e.,
$$\hat{\varepsilon}_n = \hat{\varepsilon}_{n-1} + k_t e_n$$
where the loop constant $k_t$ determines the loop bandwidth, and $e_n$ is the detected error at
the present symbol indexed by $n$. The new estimate $\hat{\varepsilon}_n$ is the last one plus the present input 
error. For noise rejection, the loop constants are less than one, and are usually chosen to 
be $2^{-m}$, with $m$ a positive integer, to avoid multiplications in 2’s complement arithmetic.
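The accumulator update can be tried out numerically. The sketch below assumes an ideal, noise-free detector whose output equals the residual error left after correction; the function name and the parameter values are hypothetical. The loop constant is a power of two, so in fixed-point hardware the scaling would reduce to a right shift.

```python
def first_order_loop(true_error, shift, n_symbols):
    """Track a constant error with an accumulator whose loop constant is 2**-shift."""
    estimate = 0.0
    history = []
    for _ in range(n_symbols):
        detected = true_error - estimate      # assumed ideal detector: residual error
        estimate += detected / (1 << shift)   # accumulate the scaled detected error
        history.append(estimate)
    return history

# Hypothetical timing error of a quarter symbol, loop constant 2**-3.
trace = first_order_loop(true_error=0.25, shift=3, n_symbols=200)
```

Under these assumptions the residual shrinks by the factor (1 - 2**-shift) each symbol, so a larger shift gives a narrower loop bandwidth and slower convergence.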
For ideal detector characteristics and white noise, AC’s are most simple and effective. 
Other loop filters may be used to overcome the non-ideal characteristics of the detectors 
as in control systems, but in DSP, improvements are always made on the detectors rather 
than the loop filters.
The purposes of feedback loops are best explained at steady state, first neglecting the effect 
of noise. For the first order timing loop, the timing accumulator (TAC) input is zero, 
otherwise the TAC output grows without bound. The TAC output is a constant equal to 
the timing error and corrected in the adaptive sampler (AS). The TED detects no error and 
hence steady state is maintained.
If the multiplexer switch (MUX) connects the PED output to the frequency accumulator 
(FAC), a second order phase-lock-loop results. The phase correction part with the phase 
accumulator (PAC) can be explained as a separate first order loop. For the frequency 
correction part, the input value of the FAC at steady state is zero. The FAC output is a 
constant equal to the frequency error. The cyclic accumulator (CAC) combined with the 
CSIN table is often called a numerically controlled oscillator (NCO). The constant frequency 
value at the CAC input can be regarded as a constant rate of change of phase. The CAC 
outputs are the phase angle values that increase modulo $2\pi$ with time.
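The combination of a cyclic accumulator and a cosine/sine table can be sketched as follows. The table size, the names and the frequency word are assumptions made for illustration; the phase is held as an integer table index so that wrap-around of the accumulator is exactly the modulo-$2\pi$ behaviour described above.

```python
import math

TABLE_BITS = 8                        # assumed table size of 2**8 entries
TABLE_SIZE = 1 << TABLE_BITS
CSIN = [(math.cos(2 * math.pi * i / TABLE_SIZE),
         math.sin(2 * math.pi * i / TABLE_SIZE)) for i in range(TABLE_SIZE)]

def nco(freq_word, n_samples):
    """Return (cos, sin) pairs; freq_word is the phase step in table indices per sample."""
    phase = 0
    out = []
    for _ in range(n_samples):
        out.append(CSIN[phase])
        # Cyclic accumulation: the index wraps around, i.e. the phase is taken modulo 2*pi.
        phase = (phase + freq_word) & (TABLE_SIZE - 1)
    return out

samples = nco(freq_word=64, n_samples=5)   # a quarter turn per sample
```

A constant frequency word gives a constant rate of change of phase, and after four samples in this example the phase index has wrapped back to zero.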
Because of the non-linear characteristics of phase-lock-loops, a large initial frequency 
error leads to slow lock-in or total failure. It is often desirable to have a first order frequency 
loop initially, by connecting the FED with the FAC through the MUX, until the frequency 
error is small.
Digital loops behave similarly to analogue ones. The analysis with ideal detector 
characteristics is simple but often inadequate. Full analysis requires solutions of non-linear 
equations that are often performed numerically. It is simpler and more useful to simulate 
the simple error detector algorithms themselves under various conditions, and to search 
for the design parameters $k_t$, $k_p$ and $k_f$ for the TAC, PAC and FAC respectively.
One of the factors that determine accumulator word lengths is the range of the error 
estimates (timing, frequency and phase). We have seen that at steady state, the outputs of
the AC’s are equal to the estimated errors. Since the error estimates are bounded, the output 
ranges of all AC’s should be equal to the bounded values. However, the practical AC range 
of each estimate requires different considerations.
If the frequency offset of a channel is specified to be within the range $[-f_M, f_M)$, an FAC
range of $[-f_M, f_M)$ is apparently adequate. However, the presence of noise will cause 
overflow (or underflow) in the accumulation process $\hat{\omega}_n = \hat{\omega}_{n-1} + k_f e_n$. In 2’s complement 
arithmetic, if the true value of $\hat{\omega}_n$ is larger than $f_M$, the value of the accumulator becomes 
$\hat{\omega}_n - 2f_M$, which is unacceptable as a valid estimate. The most general solution to avoid 
overflow is to use one extra bit for the FAC. Two extra bits may be required for low $E_b/N_0$, 
i.e., a large noise power relative to the signal power. In this way, overflow will occur with 
a very small probability. Even when an overflow occurs, the frequency loop will return to 
steady state in a short time. Gardner [Gardner88a] proposed to use a saturation accumulator, 
i.e., if the true value of $\hat{\omega}_n$ is larger than $f_M$, the value of the accumulator becomes $f_M$. The 
disadvantage is that saturation introduces a bias when the true frequency error is close to 
the extreme ends of the specified range. In addition, a saturation accumulator would not 
be much simpler than a 2’s complement accumulator with one more bit. In contrast, the 
PAC and CAC must be 2’s complement accumulators (or equivalent), because phase values 
are modulo $2\pi$. Timing errors are modulo $T$ (the symbol period). Apparently, a 2’s 
complement accumulator should be used as in the PAC, but we have to detect each 
overflow and then take appropriate actions, which will be discussed later in symbol timing 
adjustment.
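The difference between wrap-around, saturation and an extra guard bit can be seen with small integers. This is only an illustrative sketch: the 8-bit word length, the values and the function names are assumptions, not figures from the design discussed here.

```python
def wrap_add(acc, inc, bits):
    """Add in 2's complement with the given word length; overflow wraps around."""
    half = 1 << (bits - 1)
    return (acc + inc + half) % (1 << bits) - half

def saturating_add(acc, inc, bits):
    """Add and clip to the representable range instead of wrapping."""
    half = 1 << (bits - 1)
    return max(-half, min(half - 1, acc + inc))

# Assumed 8-bit accumulator near the top of its range, pushed over by a noisy input.
wrapped = wrap_add(120, 10, bits=8)          # jumps to the opposite end of the range
clipped = saturating_add(120, 10, bits=8)    # held at the positive limit
guarded = wrap_add(120, 10, bits=9)          # one extra bit keeps the true sum
```

For a frequency accumulator the wrapped value is a grossly wrong estimate, whereas for a phase accumulator the same wrap-around is exactly the required modulo behaviour.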
4.4.7 Alternative Demodulator Structures
The signal models above are simplified. The parameters $\varepsilon_0$, $\omega_0$ and $\theta_0$ are not true constants
in practice. These errors can be assumed constant if they are very slowly varying, e.g., drifts 
in symbol clock and carrier oscillator frequency.
Relative motion between transmitters and receivers causes Doppler shift. This can be 
modelled as a simple carrier frequency offset in addition to the carrier frequency 
uncertainty. A second order phase lock loop suffices. However, a constant relative 
acceleration requires a third-order loop for zero steady state error. In timing 
synchronization, a first order loop corrects symbol clock phase errors. Unequal transmitter 
and receiver symbol clock frequencies require a second order timing loop. However, the 
poor transient behaviour, instability and complexity make these higher order loops inferior 
for small parameter variations.
One parameter that is not included in the above model is the power level of the received 
signal. The estimation of power level uses simple averaging and is only important in the 
detection of linear modulation schemes such as QAM. Digital automatic gain control 
(AGC) is seldom used in all-digital modems. Amplitude insensitive estimation algorithms 
can be derived, and a longer A/D word length provides a larger signal dynamic range.
There are other more complicated models of the received signal. A linear model such as 
non-frequency-selective fading that leads to rapid phase variations can be countered by 
using Kalman filters. Alternatively, conventional techniques such as interleaving can be 
used without increasing the complexity of synchronization. Non-linearity such as that 
caused by saturated amplifiers can be compensated by using constant envelope modulation 
schemes. Therefore, although the above model of the received signal is simple, it is the 
most often used in synchronization, and hence the derived demodulator structures are most 
general.
Baseband synchronization deals mostly with complex signals. It is just as convenient to 
perform synchronization at passband with digital I.F. frequencies such as $\exp(j\pi n/4)$. Then 
synchronization involves mostly real numbers that may be more convenient to implement.
In the above demodulator structure, synchronization refers to parameter estimation and 
correction. However, correction need not be performed since the ultimate demodulator 
function is to estimate the data sequence. In the reference symbol or remodulation approach, 
the demodulator attempts to generate a reference signal with the same parameters and data 
sequence as the received signal. If successful, both the parameters and data sequence are 
known. In the reference-less approach, not even the parameters are known explicitly. 
Although demodulator structures differ drastically, they share the basic estimation 
techniques, and not all of them will be examined in later sections.
There are two main classes of demodulator structures. In the feedback structure above, 
correction is performed before error detection. The reverse is true in feed forward 
structures, and the error detectors in this case are called estimators. Often the same 
estimation technique can be implemented in both structures. The choice depends on factors 
such as symbol rate and computation hardware type.
4.5 Digital Symbol Timing Adjustment
The adaptive sampler for symbol timing adjustment involves two parts. One is an 
anti-aliasing filter that is capable of providing a computation efficient variable delay. In 
the most general case, this filter performs sampling rate conversion, and therefore a
polyphase-matrix structure can be used. The combination of variable delay and sampling 
rate conversion leads to a novel polyphase-lattice structure. The second part is the control 
of the polyphase-lattice structure to achieve symbol timing synchronization without 
varying the sampling clock of the A/D convertor.
Analogue demodulators attempt to sample the data strobe at exactly the symbol rate by 
controlling a VCO. Many so called digital demodulators use the same technique, except 
that the sampling is performed at an earlier stage. The term hybrid rather than digital is 
more appropriate. The major problem in digital adjustment of symbol timing is that mainstream 
DSP theories deal only with known sampling rates. Yet, the function of adaptive 
sampling does appear in symbol-rate adaptive equalizers. The prime aim of these equalizers 
is the compensation of channel distortions, which requires a large amount of computation. If 
only adaptive sampling is required, we can apply multirate DSP techniques to obtain much 
more efficient structures suitable for high bit rates, or for multichannels as in MCD’s.
According to [Gardner90a], discussions of symbol timing adjustment for MCD’s can be 
found only in [Takahata87a]. However, timing adjustment and estimation are closely 
knitted in this paper. Likewise, previous investigations consider special cases of the 
variable delay element, its control, and the demodulator structure to handle the 
asynchronism. The apparently complex function of timing adjustment becomes simple if 
a proper partitioning of the problem is performed. In this section, new developments since 
the MCD implementation published in [Yim88a] will be reported, together with alternative 
approaches.
4.5.1 Polyphase-Lattice Structures
Let $s(nT_0)$ be the input signal sampled at period $T_0$. After passing through a continuous 
anti-aliasing filter $h(t)$, we have
$$r(t) = \sum_{n=-\infty}^{\infty} s(nT_0)\, h(t - nT_0)$$
Resampling at period $T_1$ with an arbitrary delay (timing phase) $\varepsilon$ yields 
$$r[(k + \varepsilon)T_1] = \sum_{n=-\infty}^{\infty} s(nT_0)\, h[(k + \varepsilon)T_1 - nT_0]$$
For the purpose of timing adjustment, let
$$\varepsilon = \frac{v}{J}$$
where the positive integer $J$ is the timing resolution with respect to $T_1$, and the integer $v$ 
controls the delay (or advance), which can be larger than one sample. Specifying the filter 
sampling period $T_2$ by
$$T_0 = JLT_2$$
$$T_1 = JMT_2$$
we get
$$r^v(k) = \sum_{n=-\infty}^{\infty} s(n)\, h[(Jk + v)M - nJL]$$
For a FIR filter with $N$ taps,
$$h(n) = 0, \quad \text{for } n < 0 \text{ and } n > N - 1$$
The index $n$ is bounded by
$$0 \le (Jk + v)M - nJL \le N - 1$$
The summation limits are
$$n_1 = \left\lfloor \frac{(Jk + v)M}{JL} \right\rfloor, \qquad n_2 = \left\lceil \frac{(Jk + v)M - N + 1}{JL} \right\rceil$$
Therefore the above convolution is not directly realizable because we have to keep count 
of the unbounded index $k$. The usual approach used in simple sample rate conversion 
[Crochiere83a], and in more complicated timing adjustment [Gardner90a], is by the change 
of variables $i = n_1 - n$, giving
$$r^v(k) = \sum_{i=0}^{n_1 - n_2} s\!\left( \left\lfloor \frac{(Jk + v)M}{JL} \right\rfloor - i \right) h[\,iJL + ((Jk + v)M \bmod JL)\,]$$
Although this type of expression is commonly used, it is neither informative nor useful 
for implementation. Because of the floor and ceiling functions, the summation limit, signal 
and filter samples used in convolution depend on the unbounded index k . A better approach 
should exist as we have the prior knowledge that for every M  input samples, there are L 
output samples. A cyclic behaviour is therefore evident.
Again using routine parallelization, we obtain the following novel polyphase-lattice 
structures. Parallelizing the output signal only with $k = lL + p$, we obtain the 
polyphase-matrix structure:
$$r_p^v(l) = \sum_{i=-\infty}^{\infty} s(lM - i)\, h_p^v(i)$$
where the matrix of sub-filters is
$$h_p^v(i) = h(iJL + pJM + vM)$$
Parallelizing both the output and input signals, with $n = mM - q$ in addition, we obtain the 
polyphase-lattice structure:
$$r_p^v(l) = \sum_{q=0}^{M-1} \sum_{i=-\infty}^{\infty} s_q(l - i)\, h_{p,q}^v(i)$$
where the lattice of sub-filters is
$$h_{p,q}^v(i) = h(iJML + pJM + qJL + vM)$$
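The equivalence between the direct convolution and the polyphase-lattice form can be checked numerically. The sketch below is not the thesis implementation: it assumes the parallelized input is $s_q(m) = s(mM - q)$, uses arbitrary integer test data so that the comparison is exact, and all function names are hypothetical. The range of $i$ for each sub-filter is obtained with floor division, which is where the floor and ceiling functions reappear.

```python
def direct_output(s, h, k, v, J, L, M):
    """Brute-force r^v(k) = sum_n s(n) h[(Jk + v)M - nJL], with h zero outside 0..N-1."""
    total = 0
    for n in range(len(s)):
        idx = (J * k + v) * M - n * J * L
        if 0 <= idx < len(h):
            total += s[n] * h[idx]
    return total

def lattice_output(s, h, l, p, v, J, L, M):
    """r_p^v(l) = sum_q sum_i s_q(l - i) h_{p,q}^v(i), assuming s_q(m) = s(mM - q)."""
    block = J * M * L
    total = 0
    for q in range(M):
        offset = p * J * M + q * J * L + v * M
        i_min = -(offset // block)                # ceiling of -offset / block
        i_max = (len(h) - 1 - offset) // block    # floor of (N - 1 - offset) / block
        for i in range(i_min, i_max + 1):
            n = (l - i) * M - q                   # input sample aligned with this tap
            if 0 <= n < len(s):
                total += s[n] * h[i * block + offset]
    return total

J, L, M = 4, 2, 3                                # assumed small example parameters
h = [(7 * i) % 11 - 5 for i in range(30)]        # arbitrary integer prototype, N = 30
s = [(3 * n) % 7 - 3 for n in range(40)]         # arbitrary integer input record

# Compare both forms for every output phase p and delay v over a few output blocks l.
mismatches = [(l, p, v)
              for l in range(12) for p in range(L) for v in range(-J + 1, J)
              if direct_output(s, h, l * L + p, v, J, L, M)
              != lattice_output(s, h, l, p, v, J, L, M)]
```

The lattice form never refers to the unbounded index $k$: each sub-filter only needs the fixed offset determined by $p$, $q$ and $v$, together with the block index $l$ that is maintained by shifting.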
These approaches bring out directly the desirable features for implementation. The timing 
reference $k$ is maintained by uniformly shifting the input into a register as in simple FIR 
filtering. A block of $M$ samples is shifted into the polyphase-matrix structure, whilst one 
sample each of the parallelized signal is shifted into the polyphase-lattice structure. The 
polyphase matrix or lattice of sub-filters can in theory be IIR filters. This contradicts the 
claim that only FIR filters can be used [Gardner90a]. The sub-filters are of constant 
parameters, for example, the FIR coefficients are predetermined. Therefore only simple 
FIR (or possibly IIR) filtering is performed. A change of delay $v$ merely requires a different 
set of coefficients to be used for the sub-filter specified by $p$ and $q$.
The filter specifications are valid for arbitrary positive integers $L$, $M$ and the timing 
resolution $J$. If we use the filter sampling frequency $F_2$ and there exists a common factor 
$N_0$ such that
$$h_{p,q}^v(i) = h[N_0(N_1 i + N_2 p + N_3 q + N_4 v)]$$
then only one out of $N_0$ filter coefficients is actually used. Alternatively we can use the
sampling period $N_0 T_2$ directly such that all filter coefficients are used. Therefore an 
unnecessarily high $F_2$ is automatically taken care of, without limitations imposed on the 
other parameters. This guarantees minimum computation rate for arbitrary parameters.
The floor and ceiling functions can be used in the sub-filter definitions to determine the 
range of i for each sub-filter. Different ranges of i specify different alignment of sub-filter 
coefficients with signal samples in the shift register, as illustrated in the polyphase-matrix 
sampling rate convertor in chapter 2. This is emphasised again in figure 4.22. For a single 
rate FIR filter, the filter length determines the shift register length. There is only one 
possible alignment in convolution. However, in a polyphase matrix structure, the span of 
sub-filters with unequal but similar length determines the shift register length. The 
alignment of filter and signal samples is formally given in the sub-filter definitions.
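[Diagram not reproduced: input samples s(q) in the shift register aligned with the sub-filters producing the outputs r(p).]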
Figure 4.22 Alignment of Sub-filters Coefficients with Signal Samples
The delay variable $v$ can be specified to have an arbitrary range, e.g., the intervals 
$[-J+1, 0]$ or $[0, J-1]$. The polyphase-lattice sub-filters required can then be found from 
the definitions above by stepping through all possible values of $v$. It can be seen that the 
larger the range of $v$, the more sub-filters are required. It seems inefficient at first sight, 
but the number of computations is independent of the number of sub-filters. Apparently, 
ROM size increases with the number of sub-filters. However, these sub-filters are not 
unique. For two different lattice elements, the same set of coefficients may align with 
different signal samples in the shift register during convolution. A larger range of $v$ simply 
requires a longer shift register for the adjustment of timing range. Often the unique 
sub-filters can be found by inspection of the lattice elements. More formal methods can 
be given. For example, instead of indexing the lattice of sub-filters by $(p, q, v)$, the 
following equation can be used:
$$m = N_2 p + N_3 q + N_4 v$$
and the number of unique sub-filters is given by the range of $m$. More complex schemes 
of avoiding the storage of duplicated coefficients are generally possible, but the overhead 
of selecting the required sub-filter in the lattice increases.
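One possible way of finding the unique sub-filters, which is not necessarily the indexing scheme intended above, follows from the lattice definition $h(iJML + pJM + qJL + vM)$: two lattice elements whose offsets $pJM + qJL + vM$ are equal modulo $JML$ select the same coefficients, merely shifted in $i$, that is, aligned with different signal samples. The sketch below groups the elements on that assumption; the function name and the parameter values are hypothetical.

```python
def unique_subfilters(J, L, M, v_range):
    """Group lattice elements (p, q, v) whose coefficient sets differ only by alignment."""
    block = J * M * L
    groups = {}
    for p in range(L):
        for q in range(M):
            for v in v_range:
                offset = p * J * M + q * J * L + v * M
                # Equal residues modulo JML pick the same taps h(i*JML + residue).
                groups.setdefault(offset % block, []).append((p, q, v))
    return groups

# Assumed example: J = 4, L = 2, M = 3, with a delay range of one and of two samples.
narrow = unique_subfilters(J=4, L=2, M=3, v_range=range(0, 4))
wide = unique_subfilters(J=4, L=2, M=3, v_range=range(0, 8))
```

In this example doubling the range of $v$ doubles the number of lattice elements but leaves the number of distinct coefficient sets unchanged, which is consistent with the remark that a larger range mainly costs shift register length.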
This approach can be called explicit filtering. The alternative approach proposed in 
[Gardner90a] uses classical interpolation methods, and an implementation example is 
given in [Farrow88a]. This approach can be called implicit filtering since classical 
interpolation does not make use of explicit filter coefficients. However, an equivalent 
filter specification can always be found. The claimed advantages of this method are that 
the implicit filter is short and no switching between filter coefficient sets is required. To 
compare these two methods, we need to compare the filter specification (spectrum) and 
the inherent hardware complexity.
The filter length in explicit filtering need not be long. The transition band of the explicit 
anti-aliasing filter is given by
$$\Delta f = F_1 - B$$
as explained in the earlier sections. Together with the input sampling rate $F_0$, we have
complete control over the filter length. To begin comparison, we have to design an explicit 
FIR filter with the same spectral specification as in the implicit filtering method chosen. 
At this stage, it can be seen that the implicit filtering method is less flexible in the filter 
spectrum, and the use of optimal equiripple techniques guarantees minimum length of the 
explicit FIR filter.
The explicit filtering method requires scalars only (constant coefficient times variable 
signal). This fact has to be exploited for an efficient implementation, such as using 
distributed arithmetic. For short sub-filters, only a small ROM needs to be switched when 
different delays are required. In contrast, the implicit filtering method requires multipliers 
(variable times variable). Therefore conclusions can only be drawn given the bit rate and 
a detailed hardware design of both methods. The explicit filtering method has the advantage 
that it is suitable for both digital signal processors and custom hardware implementation. 
The use of distributed arithmetic in custom hardware has been discussed already. The 
architectures of digital signal processors are optimized for FIR filtering as a minimum 
requirement, and the switching of filter coefficient involves very little overhead.
4.5.2 Adaptive Sampling Control
Digitally modulated signals have a symbol period. The transmitter can be regarded as 
having an explicit symbol clock. For all-digital receivers, the A/D sampling clock is fixed, 
but sampling frequency varies inside the demodulator. It is convenient to regard the 
demodulator as having an implicit receiver symbol clock, which has a fixed relationship 
with the A/D sampling clock. Because of component tolerance, the exact value of the
transmitter symbol clock and the receiver A/D sampling clock is unknown. That is, the 
transmitter and receiver symbol clock frequencies are never identical, though very close. 
The function of the demodulator is to adjust the frequency and phase of the receiver symbol 
clock to match that of the transmitter. Synchronization of symbol timing requires adaptive 
control of the polyphase-lattice structure. We call this adaptive sampling.
Since the symbol clock frequency error is small, a first order timing phase-lock-loop 
suffices. The symbol clock phase adjustment is performed in the most general case by the 
polyphase-lattice structure, which provides a variable delay. The amount of delay is given 
by the timing synchronization algorithm as an integer variable $v$. Since a realizable signal 
delay must be finite, let $v \in [v_0, v_1]$. If the symbol clock frequencies of the transmitter and 
receiver are not exactly equal, the difference in timing phase grows indefinitely, requiring 
an infinite range of delay. Therefore $v$ will take on values outside the range $[v_0, v_1]$, no 
matter how large the range is. This problem is easily solved by focusing only on the output 
signal of the variable delay element used. For the polyphase-lattice structure, this is
$$r_p^v(l)$$
If $v > v_1$, let
$$r_p^v(l) = r_p^{v'}(l + 1), \quad v' = v - JL \le v_1$$
If $v < v_0$, let
$$r_p^v(l) = r_p^{v'}(l - 1), \quad v' = v + JL \ge v_0$$
The increment of the index $l$ is maintained by the regular shifting of signal samples into 
the polyphase-lattice structure; each shift denotes a processing cycle. In most of the cycles, 
the $l$th output is required, and the normal sequence of operation is
(1) SHIFT (2) COMPUTE OUTPUT (3) NEXT CYCLE
When the $(l+1)$th output is required, the sequence is
(1) SHIFT (2) NEXT CYCLE
When the $(l-1)$th output is required, the sequence is
(1) COMPUTE OUTPUT (2) SHIFT (3) COMPUTE OUTPUT (4) NEXT CYCLE
In this case both the $(l-1)$th and $l$th outputs are required.
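The three operation sequences can be expressed as a small controller. This is a sketch for the special case $L = M = 1$ only, and it rests on two assumptions that are not spelled out in the text: that a delay index above the allowed range corresponds to the $(l+1)$th output case and one below it to the $(l-1)$th output case, and that the index is wrapped by the timing resolution $J$. The function names and the exaggerated drift rates are hypothetical.

```python
def control_cycle(v, v0, v1, J):
    """One processing cycle for the special case L = M = 1.

    Returns the (possibly wrapped) delay index and the operations performed.
    """
    if v > v1:
        # Assumed mapping: the wanted sample is the (l+1)th output, so only shift.
        return v - J, ["SHIFT"]
    if v < v0:
        # Assumed mapping: the (l-1)th and lth outputs are both wanted in this cycle.
        return v + J, ["COMPUTE OUTPUT", "SHIFT", "COMPUTE OUTPUT"]
    return v, ["SHIFT", "COMPUTE OUTPUT"]

def count_outputs(drift_per_cycle, n_cycles, J=8):
    """Count outputs when the timing estimate drifts by a fixed number of steps per cycle."""
    v, outputs = 0, 0
    for _ in range(n_cycles):
        v, ops = control_cycle(v + drift_per_cycle, 0, J - 1, J)
        outputs += ops.count("COMPUTE OUTPUT")
    return outputs

no_drift = count_outputs(0, 80)      # one output per cycle
skipping = count_outputs(1, 80)      # one cycle in every J produces no output
doubling = count_outputs(-1, 80)     # one cycle in every J produces two outputs
```

With realistic clock tolerances the wrap events are far rarer than in this example, but the cycle with two output computations still sets the peak computation rate.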
This solution was implemented and published in [Yim88a], with the special case where 
L = M = 1 in the polyphase structure. This solution was originally developed to solve the 
initial timing phase acquisition problem, which proved to be unsolvable by the 
straightforward use of coarse and fine control suggested in [Gardner85a]. The tracking 
problem is less tricky than the acquisition problem, since the timing phase in the former 
case is slow varying. A simplified version of control was implemented in [Yim89a], where 
there is always only one shifting and output computation in each processing cycle. This 
leads to doubling or skipping of symbols, and hence bits. If this is not allowed, the 
demodulated bit streams of each channel in the MCD have slightly different rates, and 
cannot be simply multiplexed into a TDM stream.
The alternative control method proposed in [Gardner90a] uses a variable counter called 
NCO to lock on to the transmitter symbol clock frequency. This control method is claimed 
to be elegant. However, this method corresponds to the use of a second order loop to track 
a very small frequency offset. The expected poor transient behaviour during initial timing 
acquisition is reported in [Erup90a].
The maximum computation rate has to accommodate the case of $(l-1)$ above, where two 
output computations have to be performed in one processing cycle. For a tight clock 
tolerance, this rarely happens, but the maximum rate has to be approximately twice the 
average value. This case would not occur if the implicit symbol clock frequency at the 
receiver is higher than that at the transmitter [Oerder90a]. The alternative method to avoid 
unnecessarily high hardware speed, suggested in [Gardner90a], can be efficiently 
implemented using the above polyphase-matrix structure, rather than the polyphase-lattice 
structure. A decimation rate slightly higher than 1 is required, i.e., $L \approx M$ and $M/L > 1$. 
Instead of shifting in M  samples and producing L samples in the normal operation, the 
input and output are performed on a sample by sample basis, subject to timing control. 
The normal computation rate is proportional to L/M, but the maximum is proportional to 
1. Therefore only a slight increase in the maximum computation rate can tolerate a higher 
transmitter symbol clock frequency.
4.6 Synchronization in All-Digital Receivers
Synchronization in general refers to the estimation of parameters such as carrier phase, 
carrier frequency and symbol clock phase, and the correction of these parameters on the 
received signal for detection. Synchronization functions also include frame, packet or 
word alignment that are not covered here.
Yim, All-Digital Multicarrier Demodulators 4-32
4 Digital Modulation Schemes and Synchronization Techniques Synchronization in All-Digital Receivers 4.6
A brief review of maximum likelihood estimation theory is given. ML derived 
synchronizers are yardsticks for performance and complexity comparisons. Starting from 
the ML approach, optimal receiver structures can be derived from prior knowledge of 
signal formats and channel characteristics.
The most important synchronization algorithms are listed in the following sections, 
together with novel or improved techniques. Instead of dealing with well developed, but 
intricate, estimation theory, simple methods of approximation are developed to characterise 
synchronization algorithms, so that their complexity, advantages and limitations are 
obvious. This chapter covers synchronization techniques for non-offset linear modulation 
schemes. The basic estimation approaches, assessments of demodulator complexity and 
the efficient applications of multirate DSP are relevant to other schemes, which will be 
discussed at the end of this chapter.
4.6.1 Maximum Likelihood Estimation
A statistical optimization problem requires the following:
(a) criterion 
(b) structure (of the receiver)
(c) information
The derived structure approach leads to an optimal solution. The criterion is first chosen. 
Maximum Likelihood (ML) is the appropriate criterion for synchronization. Information 
required such as signal format, prior probabilities and noise distributions will then be 
apparent from the problem formulation. Finally, the solution leads to the receiver structure. 
Important points in classical ML estimation theory [Trees68a] will be briefly discussed. 
Detailed examples of synchronizer derivation can be found in [Gardner88a].
To simplify discussion, let the transmitted signal at baseband be $s(t)$, which carries the 
complex data sequence $a_n$. The received baseband signal suffers time delay and phase 
rotation:
$$r(t) = s(t - \epsilon_0 T)\, e^{j\theta_0}$$
It is more convenient to rewrite the received signal as
$$r(t) = s(t, \phi_0)$$
where $\phi_0$ is the parameter vector to be estimated in the demodulator. For ML estimation, 
the log likelihood function is given by
$$\Lambda(\phi) = -\int_{T_0} |r(t) - s(t, \phi)|^2 \, dt$$
And the ML estimate is
$$\hat{\phi}_{ML} = \arg\max_{\phi} \Lambda(\phi)$$
where $\phi$ is a vector containing the trial values of unknown parameters. Putting it in another 
way, $s(t, \phi)$ is a signal generated at the receiver that is an arbitrary guess of what the
transmitted signal becomes when it reaches the receiver, while $r(t)$ is the actual received 
signal. Alternatively, we can correct the received signal using the trial parameters and 
compare with the unmodified transmitted signal $s(t)$. The likelihood function is merely 
the minimum distance of two signals for a time duration of $T_0$, which is called the 
observation period.
The straightforward interpretation, and hence possible implementation, is trying all 
possible combinations of unknown parameter values making up different $\phi$'s. The trial 
that minimises the distance between the received and reference signal is the ML estimate 
denoted by $\hat{\phi}_{ML}$. The larger the observation period (usually in terms of number of symbols), 
the more accurate the estimate. This is in general infeasible because the search space is 
multidimensional. Different approaches to tackle the maximisation problem lead to 
different architectures, often with sub-optimal approximations.
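The exhaustive-search interpretation above can be made concrete with a minimal sketch. It assumes a one-dimensional case that is not taken from the thesis: the data symbols are known, timing is perfect, and only the carrier phase is searched. The function name `ml_phase_search`, the grid size and the noise level are illustrative choices.

```python
import cmath
import math
import random

def ml_phase_search(received, reference, num_trials=360):
    """Try every trial phase on a grid and keep the one that minimises the
    squared distance between the received and the rotated reference samples."""
    best_phase, best_dist = 0.0, float("inf")
    for k in range(num_trials):
        trial = -math.pi + 2.0 * math.pi * k / num_trials
        rotation = cmath.exp(1j * trial)
        dist = sum(abs(r - s * rotation) ** 2 for r, s in zip(received, reference))
        if dist < best_dist:
            best_phase, best_dist = trial, dist
    return best_phase

# Hypothetical test data: 200 known QPSK symbols, phase offset 0.6 rad, mild noise.
random.seed(1)
symbols = [random.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) for _ in range(200)]
true_phase = 0.6
noisy = [s * cmath.exp(1j * true_phase)
         + complex(random.gauss(0.0, 0.3), random.gauss(0.0, 0.3))
         for s in symbols]
estimate = ml_phase_search(noisy, symbols)
```

Even in this single-parameter case the cost is the grid size times the observation length. Adding timing and unknown data to the trial vector multiplies the search space, which is why the search is infeasible in general.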
4.6.1.1 Other Criteria
The ML estimate is attractive because it is simple compared to other criteria. The true 
parameter vector, $\phi_0$, is treated as non-random. The advantage is that we do not need to 
know the probability density functions. If we take this prior statistical knowledge into 
account, the Maximum A Posteriori (MAP) estimate results. The MAP criterion takes 
into account the fact that if some parameter values are more likely than others, the 
above likelihood function should be weighted accordingly. If the parameter values are 
uniformly distributed over some feasible range, the ML and MAP criteria are commonly 
regarded as equivalent. For the synchronization problem, parameters such as carrier phase 
and symbol clock phase are uniformly distributed modulo $2\pi$, and each possible data 
symbol is equally likely to occur in general. An alternative to the MAP criterion in random 
parameter estimation is minimum mean square error (MMSE). Since the error function
can often be differentiated to yield the minimum, this criterion is popular in adaptive and 
Kalman filtering. While not all criteria are applicable to a particular problem, different 
criteria may lead to the same solution.
4.6.1.2 Properties of ML Estimates
Although $\phi_0$ is non-random, other random processes in the receiver, such as noise, lead to
different estimated values for different observation intervals. That is, because of noise, the 
estimate is itself a random variable, and will be different for individual observations. The 
ML estimate is unbiased, $E[\hat{\phi}_{ML}] = \phi_0$. That is, the average of estimates approaches the 
true parameter value. It is consistent, $\lim_{T_0 \to \infty} \hat{\phi}_{ML} = \phi_0$. That is, a very long observation yields
an estimate that approaches the true value. If the Cramer-Rao Bound (CRB) can be found 
for the estimation problem at hand, no estimator can provide a lower variance of the estimate 
than that established by the CRB,
$$\mathrm{Var}(\hat{\phi}) \ge \mathrm{CRB}(N_0, T_0)$$
This bound depends on the noise density $N_0$ and the observation period $T_0$. An efficient estimate 
exists if the equality is satisfied. If an efficient estimate exists, it is an ML estimate. The 
existence and value of the CRB is not of prime importance. An ML estimate is 
asymptotically efficient, i.e., the equality is satisfied as $T_0 \to \infty$. This asymptotic 
behaviour is important in CAD and simulation. For example, the CRB can be found 
numerically by a direct search on the likelihood functions given above. Also, we can 
compare the performance of receivers and establish the confidence levels by simple 
numerical means.
4.6.1.3 Classification of Synchronizers
The complexity of synchronization can be reduced if $a_n$ is known. This is the data-aided
(DA) approach, typically used during initial acquisition by sending a known data sequence 
as the preamble. The decision-directed (DD) approach (or decision feedback) is similar, 
but the estimated sequence $\hat{a}_n$ obtained from the detection path is used as the true sequence 
in the synchronization path. In contrast, the non-data-aided (NDA) approach does not 
make assumptions about the data sequence. We adopt these three approaches, in the general 
sense, as the main classification of synchronizers.
In joint estimation, the remaining parameters $\epsilon_0$ and $\theta_0$ are considered simultaneously.
Simplification often results in independent estimation of individual parameters. This is 
performed by maximizing the expectation of the likelihood function over all unknown 
parameters, except the one to be estimated. A simpler way is to ignore the other parameters, 
i.e., assuming $\epsilon_0 = 0$ in estimating $\theta_0$ or vice versa.
Data-Aided Approach
In data-aided (DA) synchronization, the data symbols $a_n$ are known to the receiver. They
are located at the header of a packet or frame as a preamble. The known data symbols 
are chosen to give timing and carrier information. This reduces the synchronization 
problem to the parameter estimation of pure sinusoids, often called pilot tones. This is a 
significant reduction in estimation effort because $a_n$ varies from symbol to symbol, while 
symbol timing and carrier parameters are slowly varying relative to the symbol period $T$, 
and can be assumed constant in the observation interval. Since no information is 
transmitted, DA algorithms are primarily used for fast initial acquisition. For complex 
modulation schemes, this may be the only way to guarantee acquisition for reasonable 
receiver complexity.
Although DA algorithms are simple and desirable for fast acquisition, there are 
disadvantages to their use. Firstly, the information throughput decreases with the 
length of the preamble. Secondly, two different sets of algorithms must be used for 
acquisition and tracking. Thirdly, the determination of the instant to switch algorithms 
demands extra complexity. Therefore DA methods are most suitable for slotted-ALOHA type 
protocols, where the packet or frame arrival time is approximately known.
DA algorithms are important in tone-in-band or even dual tone-in-band schemes, primarily 
for fast acquisition and as a fading countermeasure. All estimations are performed on sinusoids 
transmitted alongside the information bearing signal. Narrow band filtering and fine 
frequency shifting are required for the tones and signals respectively. These are easily 
achieved using DSP.
Decision-Directed Approach
In decision-directed (DD) algorithms, we make use of the detected symbols $\hat{a}_n$ for
synchronization, assuming that they are the transmitted symbols $a_n$. Conventional DD 
algorithms using feedback loops are attractive in terms of computations. However, the 
reliance on correctly detected symbols leads to slow or unreliable acquisition. For coded
modulation, including TCM, we have a choice of using the undecoded, or decoded 
decisions. In the former case, the demodulator structure is the same with or without coding, 
but the coding gain cannot be exploited.
Non-Data-Aided Approach
In non-data-aided (NDA) synchronization, we neither rely on known transmitted symbols 
as in DA algorithms, nor rely on correctly estimated symbols in the receiver as in DD 
algorithms. Therefore NDA algorithms are the most difficult to derive and require more 
computation in general. However, the control complexity is the simplest.
No switching of algorithms is necessary because NDA algorithms can be used for 
acquisition and tracking. No preamble is necessary. This is important for continuous 
mode operation such as voice channels. When synchronization is lost during a long 
conversation due to adverse channel conditions, the recovery is automatic.
For tracking, it is difficult to compare NDA and DD algorithms. By definition, DD methods 
perform better if the demodulated data symbols are correct. Therefore NDA methods are 
more suitable when decision errors are frequent, due to low $E_b/N_0$, blocking, fast fading, 
etc.
With data coding or signal space coding (trellis coding for LM, partial response or multi-h 
for CEM), the degradation compared to DD algorithms increases with the complexity of 
the coding scheme. This is because the coding gain is not exploited in NDA algorithms.
4.6.1.4 Extraction of Estimates
For LM schemes, the likelihood function involving all parameters can be approximately 
computed from the matched-filter output sampled at one sample per symbol ($f_s = 1$). At 
this sampling rate, it is more convenient to describe the received signal at the matched-filter 
output as
j%
For an ideal and noiseless channel with $\epsilon_0 = 0$, this becomes
y-6o
There are common misconceptions about the minimum sampling rate of the demodulator 
required for optimal estimation. This confusion arises because the sampling rates of the 
input signal, the output signal and the matched filter are independent when multirate DSP is used. 
Although only $f_s = 1$ is required at the matched-filter output, the signal at the 
matched filter input must be sufficiently sampled, and the minimum is around $f_s = 2$. 
Otherwise, matched filtering cannot be performed on the under-sampled signal. On the other 
hand, there is no advantage in using a high input sampling frequency. The use of continuous 
signals and filters is not superior.
The sampling rate $f_s = 1$ at the matched filter output gives only one likelihood function
for a particular trial parameter $\epsilon$. To maximize, we require likelihood functions at all 
possible timing phases. This implies the availability of a continuous signal at the matched 
filter output. Availability is guaranteed if continuous signals and filters are used. Using 
DSP, a straightforward way to increase availability is to use a higher sampling density at 
the matched filter output. This is misleading, as a digital variable delay element at any 
stage prior to (or integrated with) the matched filter provides timing availability without 
increasing any sampling frequencies in the demodulator. Arbitrarily fine timing resolution 
can be provided without any increase in computation rate, for example, using the 
polyphase-lattice structure. Therefore implementation losses due to finite timing resolution 
do not mean that all-digital modems are inherently inferior. A sampling density higher 
than $f_s = 1$ at the matched filter output is sometimes used to reduce the estimator 
complexity rather than to increase timing availability.
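The idea of a digital variable delay element can be illustrated with a short sketch. This is not the polyphase-lattice structure of the thesis. It is a plain Hann-windowed sinc interpolator, chosen here as an assumption, whose taps are recomputed for any fractional delay `mu` while the input and output sample rates stay the same. The names `fractional_delay_taps` and `delay_signal` are hypothetical.

```python
import math

def fractional_delay_taps(mu, half_len=8):
    """Hann-windowed sinc taps that delay a signal by (half_len + mu) samples,
    for any fractional part 0 <= mu < 1."""
    taps = []
    for k in range(2 * half_len + 1):
        t = k - half_len - mu
        sinc = 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 + 0.5 * math.cos(math.pi * t / (half_len + 1))
        taps.append(sinc * window)
    return taps

def delay_signal(samples, mu, half_len=8):
    """Filter at the input rate; output k corresponds to input index k + 2*half_len."""
    taps = fractional_delay_taps(mu, half_len)
    return [sum(taps[k] * samples[n - k] for k in range(len(taps)))
            for n in range(len(taps) - 1, len(samples))]

# A slowly varying tone delayed by 8.3 samples without any oversampling.
tone = [math.cos(2.0 * math.pi * 0.05 * n) for n in range(100)]
delayed = delay_signal(tone, mu=0.3)
```

The timing resolution is set by the precision of `mu`, not by the sampling density, and the number of multiplications per output sample does not depend on `mu`.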
Estimates are extracted by the following general approaches, which may use criteria other 
than ML.
Direct Search
A direct search is feasible if we can reduce the dimension. The search for data symbols 
is avoided if we use the DD approach. For non-offset linear modulation formats the 
search for symbol clock phase is independent of carrier phase. The ML estimate of $\epsilon_0$ 
becomes a one dimensional search [Ascheid84a]. Phase independence can be illustrated 
by
\^n\ =l*^ «+eol^
The disadvantage is that whenever the DD approach is used, the initial acquisition becomes 
problematic because there are generally no correct decisions before synchronization is 
achieved.
Tracking
These make use of the fact that
$$\frac{\partial \Lambda(\phi)}{\partial \phi_i} = 0$$
where $\phi_i$ is a single element of the unknown trial parameters. For single parameter
estimation, $\phi = \phi_i$. The derivative is computed through error detectors. The error output 
gives the distance and direction of the local maximum. A tracking loop uses this 
information to drive the output of the error detector to zero, hence the likelihood function 
is maximised. The disadvantage is that the local maximum may not be the global one.
Direct Computation
When this method is applicable, the trial parameters that yield a maximum value of the 
likelihood function can be found directly without the need for searching. For example, 
using the DD approach with zero timing error,
/Go
Therefore Gq is given by
/0O «e =
Recursive Estimation
From the likelihood function, one estimate is computed for each observation period that 
extends over many symbols. In Kalman filtering, an estimate is computed per symbol. 
(The criterion is MMSE rather than ML.) In the simplest case, the first estimate has an 
equivalent observation interval of one symbol. The equivalent observation interval is 
increased by one symbol for each new estimate. An analogy is the difference between 
block and convolutional coding.
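The growing observation interval can be illustrated with a minimal sketch. It is not a full Kalman filter. It assumes a constant scalar parameter observed in noise, for which the recursive update with gain 1/n reproduces the average over all observations so far. The function name `recursive_mean` is hypothetical.

```python
def recursive_mean(observations):
    """Produce one estimate per observation; the n-th estimate has an
    equivalent observation interval of n symbols."""
    estimates = []
    estimate = 0.0
    for n, z in enumerate(observations, start=1):
        gain = 1.0 / n                      # gain shrinks as the interval grows
        estimate += gain * (z - estimate)   # correct the previous estimate
        estimates.append(estimate)
    return estimates

running = recursive_mean([1.0, 2.0, 3.0, 4.0])
```

Each new estimate reuses the previous one, so the cost per symbol is constant. A block estimator would have to reprocess the whole observation period.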
4.6.2 Tone Synchronization
The initial acquisition performance of demodulators is simplified or improved if specific 
waveforms are transmitted such that the synchronization parameters can be directly computed. 
These waveforms are often sinusoids that can be obtained simply by choosing a specific 
data sequence. For linear modulation, the preamble is chosen such that the sequence $a_n$ 
takes on alternating complex values $1 + j$ and $-1 - j$. This sequence contains symbol 
timing and carrier information. If $a_n = 1 + j, \forall n$, the carrier is unmodulated and timing
information is absent. The sequence of impulses with alternate polarity contains a 
fundamental frequency at half the symbol rate, $1/(2T)$. Subsequent RC filtering is lowpass by 
nature, which removes the harmonics. Therefore a complex pilot tone results.
First let us consider the real part of the transmitted signal:
$$s(t) = A \cos\frac{\pi t}{T}$$
The noiseless received signal is
r{t)=A cos^it +
If the matched filter output sampling density is 2 samples per symbol, there are 4 samples 
per period for the pilot tone. It is convenient to separate the matched-filter output into 
even and odd samples:
= r ( « r  i = r[{n-\- 1/2)(Z]
= A cos %{n + Eg) = A sin %{n + Eq) ^
The symbol timing error is
4= — s i n  23tEo COS—
cOq ^
=  y  2jtEo'COS— , f o r E o - > 0
This expression considers only a real tone. By superimposing an identical but imaginary 
tone, the complete algorithm has the same form, except that the output value is doubled.
Only half a complex multiplication per symbol is required. This simple algorithm illustrates 
the class of so called error detectors for demodulator structures using feedback loops. The 
estimate is accurate when the power level represented by $A$ is close to the designed value, 
and $\epsilon_0$ and $\omega_0$ approach zero. These conditions are met by feedback loops, providing a simple 
means of iteration, i.e., successive estimation and correction. Acquisition will be successful 
if the polarity of the estimate is correct initially. This only requires that the initial frequency 
error is less than half the symbol rate. The timing estimate is independent of phase error.
By definition, $-0.5 < \epsilon_0 < 0.5$. However the timing estimate is zero when $\epsilon_0 = 0$ or $\pm 0.5$.
The latter case corresponds to initial sampling midway between the data strobes, leading 
to the so called hang-up effect. The acquisition time slows down considerably only in rare 
cases where the initial symbol boundary lies very close to midway between the even 
samples. It is a simple matter to eliminate the hang-up effect completely, because in this 
case the odd samples are the data strobes.
Similarly, the frequency error estimate is
$$u_\omega = A^2 \cos^2 \pi(n + \epsilon_0) \sin \omega_0$$
For the complete complex tone, when $\omega_0 = \epsilon_0 = 0$, we have
$$r_n = x_n + j y_n = A (-1)^n \{[\cos\theta_0 - \sin\theta_0] + j[\cos\theta_0 + \sin\theta_0]\}$$
The phase error is
Uq=A sin Go
j  yn for even n
1 ~yn foi: odd n
In practice, we cannot differentiate between even or odd $n$ at the receiver. If we 
arbitrarily assume alternate symbols to be even and odd, a phase uncertainty of $\pi$ results. 
This is a 2 fold ambiguity that is equivalent to polarity inversion. A separate algorithm 
for phase acquisition is seldom required. The algorithm shown depends on timing error 
and frequency error. If symbol timing and carrier frequency are acquired already, a phase
acquisition algorithm does not offer significant improvements over a phase tracking 
algorithm. However, the phase ambiguity is reduced. For example, QPSK has a phase 
ambiguity of $\pi/2$. The ambiguity is reduced to $\pi$ with phase acquisition.
4.6.3 Symbol Timing Synchronization
At the matched filter output sampled at $f_s = 1$, each sampling phase gives one likelihood
function. To avoid direct searching of the infinite number of sampling phases, a separate 
timing matched filter is developed using the ML approach [Gardner88a], with the output 
also sampled at $f_s = 1$. To avoid the use of a separate filter, other generic algorithms exist 
based on the maximum fluctuation principle. For LM schemes, the amplitude fluctuation 
is maximum at the data strobe, therefore the symbol clock can be directly extracted. Also, 
simple and effective detectors exist based on zero crossing.
4.6.3.1 Zero-Crossing Algorithms
Although generally applicable, satisfactory performance of the zero-crossing algorithm is 
limited to binary schemes such as BPSK (real) and QPSK (complex). This type of algorithm 
is best illustrated using sinusoidal approximations. For BPSK, when two consecutive 
symbols are opposite in polarity, a zero crossing occurs approximately at midway between 
data strobes. Therefore a sampling rate of $f_s = 2$ is required at the matched filter output.
For BPSK, $a_n = \pm 1$. Assuming zero carrier frequency and phase errors, the timing error is 
given by
This is a primitive form of DD detector. The bracketed term is both a symbol transition 
detector and slope detector. The mid-strobe sample is approximately proportional to the 
timing error.
The effect of phase error can be eliminated as in tone estimation. However, we require a 
complex transition and slope detector. The absence or presence of symbol transitions for 
a BPSK signal can be approximated by segments of d.c. or sinusoids of one symbol 
duration respectively. As in synchronization using a pilot tone, let us approximate the 
received signal after matched filtering by
$$r_n = A \cos \pi(n + \epsilon_0)\, e^{j[\omega_0 n + \theta_0]}$$
with symbol transitions and
$$r_n = \pm A\, e^{j[\omega_0 n + \theta_0]}$$
without symbol transitions.
As in the DA method, the operation of zero-crossing detection on complex symbols (the 
Gardner detector) can be described as follows:
A . _ cOoy sm 2jteocosy  if
A ^cos^
C0(]
~2
A  C O q
-ysm 2:jceocosy if
A ^cos^ if
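Therefore the timing error is given by
$$u_\epsilon = \begin{cases} A^2 \sin 2\pi\epsilon_0 \cos\dfrac{\omega_0}{2} & \text{with symbol transitions} \\ 0 & \text{without symbol transitions} \end{cases}$$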
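A minimal sketch of the complex zero-crossing (Gardner) detector follows. It uses the widely published form of the detector (difference of two strobes multiplied by the conjugate of the mid-strobe sample); the sign convention and the function names are assumptions. The test signal is the sinusoidal approximation described in the text, with timing error `eps` in symbol periods and frequency error `omega` in rad/symbol.

```python
import cmath
import math

def gardner_error(prev_strobe, mid_sample, next_strobe):
    """Timing error from two consecutive strobes and the sample midway between them."""
    return ((next_strobe - prev_strobe) * mid_sample.conjugate()).real

def transition_model(t, eps, omega, theta, amplitude=1.0):
    """Sinusoidal approximation of a signal with a transition in every symbol.
    t is measured in symbol periods."""
    return (amplitude * math.cos(math.pi * (t + eps))
            * cmath.exp(1j * (omega * t + theta)))

eps, omega, theta, n = 0.1, 0.2, 0.7, 5
error = gardner_error(transition_model(n, eps, omega, theta),
                      transition_model(n + 0.5, eps, omega, theta),
                      transition_model(n + 1, eps, omega, theta))

# Without a transition the strobes are equal (zero frequency error) and the output is zero.
no_transition = gardner_error(1.0 + 0j, 1.0 + 0j, 1.0 + 0j)
```

For this model the output works out to sin(2*pi*eps) * cos(omega/2) for unit amplitude, whatever the carrier phase `theta`, which is the phase-independence property the text relies on.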
4.6.3.2 The Mueller and Mueller Algorithm
The Mueller and Mueller algorithm is given by
The complicated analysis of this simple detector can be found in [Mueller76a]. This 
detector works on only one sample per symbol and no multiplications are required. In 
addition to the initial acquisition problems of DD algorithms, the output error depends on 
carrier phase, because there are no computations for phase cancellation. Therefore failure 
under frequency error is certain.
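A sketch of a one-sample-per-symbol decision-directed detector of this type is given below. It uses the commonly quoted BPSK form (previous decision times current sample, minus current decision times previous sample), which is an assumption and not necessarily the exact expression of the thesis. The raised-cosine test signal, the roll-off of 0.35 and all function names are illustrative.

```python
import math
import random

def raised_cosine(t, rolloff=0.35):
    """Raised-cosine (Nyquist) pulse; t is in symbol periods."""
    if t == 0.0:
        return 1.0
    sinc = math.sin(math.pi * t) / (math.pi * t)
    denom = 1.0 - (2.0 * rolloff * t) ** 2
    if abs(denom) < 1e-9:               # removable singularity
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * rolloff * t) / denom

def strobe_samples(symbols, eps, span=6):
    """One sample per symbol, taken eps symbol periods late."""
    return [sum(symbols[k] * raised_cosine(n - k + eps)
                for k in range(n - span, n + span + 1))
            for n in range(span, len(symbols) - span)]

def mueller_muller_errors(strobes):
    """With +/-1 decisions the products reduce to sign changes."""
    decisions = [1.0 if r >= 0.0 else -1.0 for r in strobes]
    return [decisions[n - 1] * strobes[n] - decisions[n] * strobes[n - 1]
            for n in range(1, len(strobes))]

def mean_error(eps, count=2000, seed=7):
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(count)]
    errors = mueller_muller_errors(strobe_samples(symbols, eps))
    return sum(errors) / len(errors)
```

With this sign convention the averaged output is negative for late sampling and positive for early sampling, and it vanishes at the correct strobe. No carrier phase correction is included, which mirrors the limitation noted in the text.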
4.6.3.3 Symbol Clock Extraction
For non-offset linear modulations (QAM, PSK and PAM etc), there is a maximum 
amplitude fluctuation at the data strobe. Therefore a symbol clock frequency component 
can be extracted. This is the so called square and filter method that is widely used in 
analogue timing recovery. Figure 4.23 shows the basic functional blocks.
Figure 4.23 Square and Filter Symbol Synchronization (block diagram: input r(t), prefilter, postfilter, symbol clock extraction)
The prefilter is used to reduce the timing jitter. A matched filter can be used such that no 
additional filtering is required. Then the pulse shape of $r'(t)$ at this point is raised-cosine. 
In analogue timing recovery, the square of the real (or imaginary) part of $r'(t)$ is computed through
a non-linear device. Using DSP with sampling period $T_s$, we can simply compute $r'(nT_s) \cdot r'^*(nT_s)$. The effect of 
carrier phase is eliminated, and therefore frequency errors are tolerated. The square 
magnitude calculation is the essence of NDA methods. Although amplitude fluctuations 
are maximum at the data strobe, the average is zero. Squaring ensures maximum fluctuation 
on average, without knowing the polarity of the data symbols. After squaring, the 
bandwidth of the signal is doubled. Since $r(t)$ can be sufficiently sampled at $T_s = T/2$, the 
squared signal should be sampled at $T_s = T/4$, i.e., 4 samples/symbol.
In analogue implementations, the symbol clock extraction is a narrow band filter tuned at 
the symbol rate $1/T$. The sinusoid thus obtained is used directly as the sampling clock for 
the data strobe. Using DSP, the spectral line at $1/T$ can be computed directly using a DFT 
[Oerder88a]. For a sampling period $T_s = T/4$, we can compute a 4 point DFT for every symbol without any 
multiplications. The 4 frequency points are $0$, $1/T$, $2/T$ and $3/T$. Only the second frequency 
point is required, denoted by $x_n + j y_n$. For feed forward structures, an accurate estimate 
can be directly computed using look-up tables after comb filtering:
$$u_\epsilon = \tan^{-1}\left(\frac{\sum_n y_n}{\sum_n x_n}\right) = 2\pi\epsilon_0$$
The summation is performed over N symbols, where N is a trade-off between small jitter 
and fast response.
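The square-and-DFT estimator can be sketched as follows. The test signal (QPSK with a raised-cosine pulse of roll-off 0.5, an arbitrary carrier phase, and timing offset `eps` in symbol periods) and all function names are assumptions made for illustration. The estimator accumulates the squared magnitude against the 4-point DFT twiddle factors, which are only +1, -j, -1 and +j.

```python
import cmath
import math
import random

def raised_cosine(t, rolloff=0.5):
    """Raised-cosine pulse; t is in symbol periods."""
    if t == 0.0:
        return 1.0
    sinc = math.sin(math.pi * t) / (math.pi * t)
    denom = 1.0 - (2.0 * rolloff * t) ** 2
    if abs(denom) < 1e-9:               # removable singularity
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * rolloff * t) / denom

def matched_filter_output(num_symbols, eps, theta=0.9, span=6, seed=3):
    """Hypothetical test signal at 4 samples/symbol, valid for |eps| < 0.5."""
    rng = random.Random(seed)
    symbols = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2.0)
               for _ in range(num_symbols)]
    rotation = cmath.exp(1j * theta)
    samples = []
    for m in range(4 * span, 4 * (num_symbols - span - 1)):
        t = m / 4.0 + eps
        centre = int(round(t))
        value = sum(symbols[k] * raised_cosine(t - k)
                    for k in range(centre - span, centre + span + 1))
        samples.append(value * rotation)
    return samples

def square_law_timing_estimate(samples):
    """Spectral line of |r|^2 at the symbol rate; returns the timing offset
    as a fraction of the symbol period."""
    twiddle = (1, -1j, -1, 1j)          # exp(-j*2*pi*m/4) for m = 0..3
    acc = 0j
    for m, r in enumerate(samples):
        acc += (r.real * r.real + r.imag * r.imag) * twiddle[m % 4]
    return cmath.phase(acc) / (2.0 * math.pi)
```

Because only the squared magnitude is used, the carrier phase `theta` has no effect on the result. The number of symbols summed plays the role of N in the text: a longer sum lowers the pattern jitter at the cost of a slower response.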
4.6.4 Phase Synchronization
Since only $f_s = 1$ is required at the matched filter output for optimal estimation, phase
synchronization approaches that handle timing errors often arrive at the same algorithm 
when timing errors are ignored. For zero timing and frequency error, the matched filter 
output is
$$r_n = A a_n e^{j\theta_0} = x_n + j y_n$$
where $A$ represents amplitude level. It is evident that direct computation is possible.
4.6.4.1 The Costas Algorithm
In the Costas type algorithms, assuming the detection is correct, i.e., $\hat{a}_n = a_n$, the phase 
error is
$$u_\theta = \Im[r_n \hat{a}_n^*] = A |a_n|^2 \sin\theta_0$$
For QPSK, the detected symbol is $\hat{a}_n = \mathrm{sgn}(x_n) + j\,\mathrm{sgn}(y_n)$. The phase error is
$$u_\theta = \Im[r_n \hat{a}_n^*] = y_n\,\mathrm{sgn}(x_n) - x_n\,\mathrm{sgn}(y_n) = 2A \sin\theta_0$$
which is the digital version of the familiar Costas phase error detector.
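The QPSK detector can be sketched in a few lines, together with a first order feedback loop that drives its output to zero. The loop gain of 0.1, the noise-free test signal and the function names are illustrative assumptions. The initial phase error is kept below pi/4 so that all decisions are correct.

```python
import cmath
import math

def costas_error(r):
    """y*sgn(x) - x*sgn(y): sign changes only, no multiplications by data."""
    sgn_x = 1.0 if r.real >= 0.0 else -1.0
    sgn_y = 1.0 if r.imag >= 0.0 else -1.0
    return r.imag * sgn_x - r.real * sgn_y

def track_phase(samples, loop_gain=0.1):
    """First order loop: derotate by the current estimate, then correct it."""
    estimate = 0.0
    for r in samples:
        corrected = r * cmath.exp(-1j * estimate)
        estimate += loop_gain * costas_error(corrected)
    return estimate

amplitude, theta0 = 1.0, 0.5
qpsk = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
received = [amplitude * qpsk[n % 4] * cmath.exp(1j * theta0) for n in range(200)]

single_shot = costas_error(received[0])   # equals 2*A*sin(theta0) here
tracked = track_phase(received)           # converges towards theta0
```

The single-shot output is proportional to the sine of the phase error and to the amplitude. The loop only needs the sign of the error to be right, so the unknown scale factor affects the loop bandwidth rather than the final estimate.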
This DD approach can be considered as direct computation rather than null tracking. The 
function of the feedback loop in this case is to find $\theta_0$ by the approximation
$$\theta_0 \approx \sin\theta_0, \quad \theta_0 \to 0$$
This polar phase estimate also gives the rate of change of phase required in a 2nd order 
phase-lock-loop. However, the rectangular form $\cos\theta_0 + j\sin\theta_0$ is required in phase 
correction, and large lookup tables are required to perform the polar to rectangular 
conversion. In custom hardware, the CORDIC algorithm is often used to replace the lookup 
tables and multiplications. In the above algorithm, if the imaginary part is not taken, all 
these conversions are not required. A feedback or feed forward approach can be used. 
However, the rate of change of phase angle cannot be dealt with. To avoid lookup tables 
altogether, the Kalman filtering approach is promising. Both the phase angle estimate 
(tracking) and its rate of change (prediction) can be jointly given in rectangular form.
4.6.4.2 The Viterbi and Viterbi Algorithm
For QPSK,
$$a_n = e^{j(2m\pi + \pi)/4}, \quad m = 0, 1, 2, 3$$
The phase error is given by
$$u_\theta = -\Im[r_n^4] = -\Im\left[A^4 e^{j[2m\pi + \pi + 4\theta_0]}\right] = A^4 \sin 4\theta_0$$
Alternatively, an arctangent algorithm using look-up table techniques, together with comb 
filtering, can be used in an open loop (feed forward) structure [Viterbi83a], as in symbol 
clock extraction above. The same 4 fold phase ambiguity appears as in the Costas detector. 
Therefore the performance under large initial frequency errors will be similar. However, 
performance under low $E_b/N_0$ will be superior to DD methods.
This technique is applicable to M-ary PSK with $M$ phase states. Here the phase modulation 
is removed by taking the $M$-th power. Also, generalized non-linearities instead of the power 
law have been developed for various LM schemes.
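The open loop (feed forward) form can be sketched for QPSK as below. The constellation is assumed to sit at odd multiples of pi/4, so the fourth power of every symbol is -1 and the accumulated sum is negated before the arctangent. The noise level, block length and function name are illustrative assumptions.

```python
import cmath
import math
import random

def fourth_power_phase_estimate(samples):
    """Remove QPSK modulation with a 4th power, average (comb filter),
    then take the angle and divide by 4."""
    acc = sum(r ** 4 for r in samples)
    return cmath.phase(-acc) / 4.0      # a_n**4 = -1 for this constellation

rng = random.Random(11)
theta0 = 0.3
clean = [cmath.exp(1j * (math.pi / 4 + rng.randrange(4) * math.pi / 2 + theta0))
         for _ in range(200)]
noisy = [r + complex(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for r in clean]

estimate = fourth_power_phase_estimate(noisy)

# Rotating the input by pi/2 leaves the estimate unchanged: the 4 fold ambiguity.
rotated = [r * 1j for r in clean]
ambiguous = fourth_power_phase_estimate(rotated)
```

No decisions are involved, so the estimator degrades gracefully at low signal-to-noise ratio. The price is the noise enhancement of the fourth power and the ambiguity shown in the last two lines.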
These techniques have been extended to combat multipath fading [Yoshida90a], by using 
a combined narrow/wide band dual open loop estimator. The fast phase fluctuation due 
to multipath is estimated by the wide-band estimator, while the phase slip rate is prevented 
from increasing by the narrow-band estimator.
4.6.5 Automatic Frequency Control
Automatic frequency control (AFC) has received less attention than phase synchronization. 
This is because AFC is not required for medium to high bit rate systems, or fixed terminals. 
AFC becomes important in low bit rate, mobile and satellite systems.
If initial frequency errors are comparable to or larger than the channel bandwidth, coarse 
frequency correction is necessary and can be readily performed by filter based frequency 
discriminators or DFT based algorithms. Phase-lock-loops have a narrow pull-in range. 
Fine frequency correction is required for successful acquisition or to shorten the acquisition 
time. For a bandwidth efficient link, coarse frequency adjustment should not be required 
by design, otherwise bandwidth is wasted to allow for channel offsets. In addition, using 
all-digital approaches, receiver VCOs are replaced by fixed frequency crystal oscillators. 
The latter have good long term stability, reducing the need for coarse adjustment.
We concentrate on cases where the initial frequency error is a fraction of the channel 
bandwidth. With Doppler shift, compounded by low bit rate, AFC is necessary for fast 
acquisition.
Using the ML approach, a separate frequency matched filter is derived in [Gardner90b]. 
To avoid an additional filtering stage, there are other well developed methods such as 
digital frequency discriminators. Since a second order phase-lock-loop is often used during 
tracking, the complexity of using optimal frequency estimation during initial acquisition 
may not be justified.
Conventional frequency estimators are all DA or NDA types, because correct decisions 
are unreliable in the presence of frequency errors. However, correct differential decisions 
can still be made, which can be classified as DD approaches. Fast acquisition under 
significant frequency error can be achieved with minimal complexity.
4.6.5.1 Phasor Filtering
For arbitrary modulation schemes, let $r_m$ be the received signal (before or after matched
filtering) with a sampling density of $f_s$ samples per symbol. With arbitrary timing error, let 
the phase change between successive samples be
■«I’m -I
The expectation is given by
y—
£ '[ ^ J = e [ 4 y ’"]4e""
A V'" « y
If $f_s = 1$, $\phi_m$ can take on arbitrary values due to modulation. Therefore
4 V " ]  = 0
If $f_s > 1$, the range of phase change is restricted. We have
where w is a real number. That is, the imaginary component of the resultant vector due to 
modulation is zero. The frequency error is
= tan-1
Using feedback loops, the frequency error estimate to be driven to zero is
m
The balanced quadricorrelator uses the same principle with $f_s > 2$, but comb filtering is
absent. It is well known that the quadricorrelator works for QPSK, but the pattern jitter 
is too large to be useful for subsequent phase synchronization. Our analysis shows that 
the phasor filtering must be carried out over a number of symbols to reduce the pattern 
jitter to an acceptable level before outputting an estimate to a feedback loop [Yim90a].
For offset signals without preambles, symbol synchronization at non-zero frequency error 
is probably impossible. Correcting the frequency error first, using phasor filtering for 
example, may be the only way for initial acquisition.
4.6.5.2 Differential Decision Frequency Error Detector
A class of simple frequency error detectors (FED) is available for PSK and other formats 
based on differential decisions [Yim88b]. This general approach is recognized as a 
differential DD method for frequency estimation in [Gardner90b]. Although this approach 
is covered in [Messerschmitt79a], no concrete methods are given to compute the frequency 
error efficiently. We shall see the superiority of using complex notations in deriving the 
new FED algorithms.
If the initial frequency error is a fraction of the channel bandwidth, NDA algorithms for 
timing acquisition exist that are insensitive to frequency errors. We only need a FED that 
operates on the data strobe, i.e., 1 sample/symbol. This is inherently simple compared to 
other types of FED.
The performance is superior to using DD phase error detectors (PED) alone, e.g., Costas 
type detectors. Assuming correct decisions, a first order loop (using FED) is superior to 
a second order loop (using PED alone). Although the false lock frequency range is the 
same (doubled in some cases), demodulators using differential decision FED’s can acquire 
with a first order loop performance at a frequency error range approaching the theoretical 
limit. Demodulators using decision directed PED’s only exhibit a second order loop 
acquisition performance with a range that is an order of magnitude lower than the false 
lock limit. Large frequency errors lead to different phase assignments occurring frequently, 
i.e., $a_n$ becomes $a_n \exp(\pm j\pi/2)$ for QPSK. This is similar to frequent decision errors.
Frequency acquisition is first illustrated using QPSK. With zero timing error, the matched 
filtered outputs are approximated by
$$r_n = A \exp\left[ j \left( n\omega_0 + m \frac{\pi}{2} \right) \right], \qquad m = 0, 1, 2, 3$$
Let us define the differential phase as
$$\psi_n = r_n r_{n-1}^* = A^2 \exp\left[ j \left( \omega_0 - m' \frac{\pi}{2} \right) \right], \qquad m' = 0, 1, 2, 3$$
This leads to a new class of algorithms for detecting the frequency error. The phase change 
due to modulation can be eliminated by
$$u_\omega = \Im[\psi_n^4] = A^8 \sin 4\omega_0$$
This is an NDA method. The 4th power computation can be reduced with the introduction 
of decisions.
$$\psi_n^2 = A^4 \begin{cases} \cos 2\omega_0 + j \sin 2\omega_0 & \text{if } m' = 0, 2 \\ -\cos 2\omega_0 - j \sin 2\omega_0 & \text{if } m' = 1, 3 \end{cases}$$
If $|\omega_0| < \pi/4$, $\cos 2\omega_0 > 0$, the correct polarity can be resolved by decision. The frequency 
error is
$$u_\omega = \Im[\psi_n^2] \cdot \mathrm{sgn}(\Re[\psi_n^2]) = A^4 \sin 2\omega_0$$
The last version involves no complex power computation. Let $\psi_n = a + jb$. We have
$$\frac{a + jb}{A^2} = \begin{cases} +\cos\omega_0 + j \sin\omega_0 & \text{if } m' = 0 \\ +\sin\omega_0 - j \cos\omega_0 & \text{if } m' = 1 \\ -\cos\omega_0 - j \sin\omega_0 & \text{if } m' = 2 \\ -\sin\omega_0 + j \cos\omega_0 & \text{if } m' = 3 \end{cases}$$
If $|\omega_0| < \pi/4$, $|\sin\omega_0| < |\cos\omega_0|$. Therefore $m'$ can be resolved by multiple decisions. The 
frequency error is
$$u_\omega = \begin{cases} b \cdot \mathrm{sgn}(a) & \text{if } |a| > |b| \\ -a \cdot \mathrm{sgn}(b) & \text{if } |a| < |b| \end{cases} = A^2 \sin\omega_0$$
Only one complex multiplication is required for computing the phase change between 
data strobes. The absolute values required in making differential decisions have not 
explicitly appeared in the literature.
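The three QPSK detectors described in this subsection can be compared with a short Python sketch. It assumes noise-free data strobes of unit amplitude, zero timing error and a frequency error within $\pm\pi/4$ radians per symbol; the function names are illustrative and do not come from the thesis.

```python
import cmath
import math
import random


def sgn(x):
    return 1.0 if x >= 0 else -1.0


def fed_fourth_power(psi):
    """NDA detector: Im[psi^4], proportional to sin(4 * w0)."""
    return (psi ** 4).imag


def fed_square_decision(psi):
    """One decision on psi^2: Im[psi^2] * sgn(Re[psi^2]), i.e. sin(2 * w0)."""
    sq = psi * psi
    return sq.imag * sgn(sq.real)


def fed_multiple_decision(psi):
    """Decisions on psi = a + jb only, no power computation: sin(w0)."""
    a, b = psi.real, psi.imag
    if abs(a) > abs(b):
        return b * sgn(a)
    return -a * sgn(b)


rng = random.Random(7)
w0 = 0.2      # assumed frequency error in radians per symbol
prev = None
results = []
for n in range(200):
    m = rng.randrange(4)                       # random QPSK data
    r = cmath.exp(1j * (n * w0 + m * math.pi / 2))
    if prev is not None:
        psi = r * prev.conjugate()             # one complex multiplication
        results.append((fed_fourth_power(psi),
                        fed_square_decision(psi),
                        fed_multiple_decision(psi)))
    prev = r

print(results[0])
```

In this idealised setting every strobe yields the same three outputs regardless of the data, which is the pattern-jitter-free property the decisions are meant to provide.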
For general MPSK signals, the trellis of the phase change vector has the same form as the 
signal trellis, except for a phase rotation of $\pi/M$. Therefore the differential decisions above 
are very similar to the corresponding decisions for data symbols. Also, the Viterbi and 
Viterbi algorithm for phase modulation removal can be applied to the phase change vectors. 
For general LM schemes, e.g., QAM, we can apply this approach together with the common 
method of selecting a subset of trellis points with a PSK constellation.
4.6.5.3 Dual-Comb-Filter Frequency-Timing Error Detector
The frequency domain approach is simpler than the ML approach in deriving 
non-data-aided synchronizers. This approach makes use of the fact that the long term power 
spectral density of a digitally modulated signal is well-defined, despite carrying a random 
data sequence. The derived synchronizers are also less complex. The dual-filter frequency 
error detector is no more than a conventional frequency discriminator, but optimized for 
all-digital demodulators [Alberty89a]. Also, the same dual-filter specification can be used 
for timing error detection [Godard78a]. Both are for use in feedback loops. The algorithms 
described here make use of comb filters which are very suitable for hardware 
implementation. This approach also relates directly to DFT type estimators.
Figure 4.24 Dual-Filter Detector (power spectral density $G(f)$ with the two filters centred at $-R/2$ and $R/2$)
Figure 4.24 shows the power spectral density $G(f)$ of the signal of a linear modulation scheme, 
with raised-cosine pulse shape. (This approach is also relevant for other schemes with 
similar symmetries in $G(f)$.) Let $u_0(n)$ and $u_1(n)$ be the outputs of the two filters. If the
signal centre frequency is at zero, the average output power of the two filters centred at 
half the symbol rate ($\pm R/2$) is equal. The frequency error is given by the difference in the 
output power of the two filters, that is
$$u_\omega = |u_1(n)|^2 - |u_0(n)|^2$$
Any pair of filters can be used that satisfies the symmetry requirements found in 
[Alberty89a]. (Discussion of filter centre frequencies can be found in [Gardner90b].) One 
important trade-off is the filter bandwidth. The narrower the bandwidth, the slower the 
loop response. Also, the filter output can be decimated because of the reduced bandwidth. 
Comb filters satisfy all the symmetry requirements, with $h(i) = 1$, $0 \le i < N$, and $h(i) = 0$ 
otherwise. If $f_s = 2$, the dual-filters are
$$h_0(i) = h(i)\, e^{-j\pi i/2}, \qquad h_1(i) = h(i)\, e^{+j\pi i/2}$$
Therefore only additions and subtractions are required in filtering, except one squaring 
operation for each of the decimated filter outputs. The relation to DFT type detectors is 
clearly shown by the expression
$$u_1(n) = \sum_{i=0}^{N-1} s(n-i)\, h(i)\, e^{j\pi i/2} = e^{j\pi n/2} \sum_{i=n-N+1}^{n} s(i)\, e^{-j\pi i/2}$$
The frequency shift term $\exp(j\pi n/2)$ does not affect power measurement. With appropriate
scaling, the summation is to find the approximate d.c. component (complex) of a frequency 
shifted signal segment with length $N$. In other words, the frequency component at $R/2$ of 
the short signal segment sampled at $f_s = 2$ is approximated. This is the basic essence of 
the Fourier Transform and DFT.
The dual-comb filter above also satisfies the requirement for use as a timing error detector. 
The computation of timing error requires only 2 multiplications:
$$u_\tau = \Re[u_1^*(n)\, u_0(n)]$$
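A minimal Python sketch of the dual-comb detector at 2 samples per symbol is given below. It assumes that the filter at $+R/2$ uses the weights $(+j)^i$ and the filter at $-R/2$ the weights $(-j)^i$; this labelling, the test tones and all function names are illustrative assumptions rather than details taken from the thesis. Each tap reduces to a swap and sign change of the real and imaginary parts, so the filtering itself uses only additions and subtractions.

```python
import cmath
import math


def dual_comb_outputs(s, n, length):
    """Outputs (u0, u1) at time n of two comb filters of the given length.

    u1 weights s[n - i] by (+j)**i (centred at +R/2), u0 by (-j)**i
    (centred at -R/2). No multiplications are needed inside the loop.
    """
    re0 = im0 = re1 = im1 = 0.0
    for i in range(length):
        x = s[n - i]
        k = i % 4
        if k == 0:        # weight +1 for both filters
            re1 += x.real
            im1 += x.imag
            re0 += x.real
            im0 += x.imag
        elif k == 1:      # u1: x * j, u0: x * (-j)
            re1 -= x.imag
            im1 += x.real
            re0 += x.imag
            im0 -= x.real
        elif k == 2:      # weight -1 for both filters
            re1 -= x.real
            im1 -= x.imag
            re0 -= x.real
            im0 -= x.imag
        else:             # u1: x * (-j), u0: x * j
            re1 += x.imag
            im1 -= x.real
            re0 -= x.imag
            im0 += x.real
    return complex(re0, im0), complex(re1, im1)


def frequency_error(u0, u1):
    """Difference in output power of the two filters."""
    return abs(u1) ** 2 - abs(u0) ** 2


def timing_error(u0, u1):
    """Real part of conj(u1) * u0: two real multiplications."""
    return (u1.conjugate() * u0).real


N = 8
upper_tone = [cmath.exp(1j * math.pi * k / 2) for k in range(32)]    # at +R/2
lower_tone = [cmath.exp(-1j * math.pi * k / 2) for k in range(32)]   # at -R/2
real_signal = [complex(math.cos(0.3 * k), 0.0) for k in range(32)]   # symmetric spectrum

err_upper = frequency_error(*dual_comb_outputs(upper_tone, 31, N))
err_lower = frequency_error(*dual_comb_outputs(lower_tone, 31, N))
err_real = frequency_error(*dual_comb_outputs(real_signal, 31, N))
print(err_upper, err_lower, err_real)
```

A tone in the upper filter drives the error positive, a tone in the lower filter drives it negative, and a signal with a symmetric spectrum gives zero, which is the balance condition the feedback loop seeks.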
4.7 Comparison of Demodulator Complexity
In this section, we estimate the complexity of coherent demodulators based on the above 
generic synchronization techniques. For detection, variable polyphase structures for 
adaptive sampling are integrated with matched filters. The number of filter taps N given
in the following tables are pessimistic estimates, as there are many opportunities for 
optimization in filter design. The lengths of root-Nyquist filters with 40 dB out-of-band 
attenuation are given for frequency sampling designs.
Costas type detectors are used in phase error tracking. For frequency acquisition, phasor 
filtering or differential decision is used that has the same computation rate. For timing 
synchronization, both the square-and-filter algorithm and the zero-crossing algorithm are 
considered. The computation rates are given in tables 4.7 and 4.8, normalized with respect 
to the symbol rate, i.e., number of multiplications per symbol. The last columns show the 
total computation rate normalized with respect to the bit rate. This shows that for the same 
capacity (bit rate), the computation rates for more complex modulation schemes are 
reduced.
Table 4.7 Square-and-filter Timing Synchronization
M    Scheme   Filter   TED   PED   FED   Phase   Freq   Total   Rate/r_b
2    BPSK     256      8     0     4     4       16     288     288
4    QPSK     256      8     0     4     4       16     288     144
8    8PSK     256      8     2     4     4       16     290     97
16   16QAM    256      8     2     4     4       16     290     73
Table 4.8 Zero-crossing Timing Synchronization
M    Scheme   Filter   TED   PED   FED   Phase   Freq   Total   Rate/r_b
2    BPSK     128      2     0     4     4       8      146     146
4    QPSK     128      2     0     4     4       8      146     73
TED - timing error detector 
PED - phase error detector 
FED - frequency error detector
Phase - phase correction 
Freq - frequency correction
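The totals and the bit-rate normalization in the two tables can be checked with a few lines of Python. The per-function entries are copied from the tables; the rounding rule (round half up) is an assumption chosen because it reproduces the tabulated values.

```python
# (M, scheme, filter, TED, PED, FED, phase, freq): multiplications per symbol
SQUARE_AND_FILTER = [
    (2, "BPSK", 256, 8, 0, 4, 4, 16),
    (4, "QPSK", 256, 8, 0, 4, 4, 16),
    (8, "8PSK", 256, 8, 2, 4, 4, 16),
    (16, "16QAM", 256, 8, 2, 4, 4, 16),
]
ZERO_CROSSING = [
    (2, "BPSK", 128, 2, 0, 4, 4, 8),
    (4, "QPSK", 128, 2, 0, 4, 4, 8),
]


def totals(rows):
    """Return (scheme, total per symbol, total per bit) for each row."""
    out = []
    for m, scheme, *costs in rows:
        total = sum(costs)
        bits_per_symbol = m.bit_length() - 1      # log2(M) for M a power of 2
        per_bit = int(total / bits_per_symbol + 0.5)  # round half up (assumed)
        out.append((scheme, total, per_bit))
    return out


table_4_7 = totals(SQUARE_AND_FILTER)
table_4_8 = totals(ZERO_CROSSING)
print(table_4_7)
print(table_4_8)
```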
The major computations lie in filtering and detection. Although the number of 
multiplications per symbol remains constant, the multiplication rate for filtering reduces 
with the increase in M. Increasing M requires more complex algorithms, e.g., 
square-and-filter TED instead of zero-crossing, and more comparators for detection. 
However, these have less weight in the overall complexity as compared to filtering. 
Therefore in terms of computation rate, a higher M may result in simpler receivers.
The same receiver structure is applicable for modulation schemes in the same class. Only 
the detection process requires the knowledge of specific M. For binary schemes such as 
BPSK and QPSK, simpler algorithms exist that require a lower sampling density.
The control and design complexity is insensitive to M and the type of modulations used. 
Higher complexity is required for fast acquisition. For example, when preambles are used 
to speed up acquisition, simpler but extra algorithms are required. In addition, the change 
over between algorithms may require status monitoring.
One of the attractions of generalised structures is that software based algorithms can be 
automatically constructed from library modules when driven by a few parameters (e.g., 
choice of M). ASIC’s will be needed to satisfy the power and mass constraints of on-board 
processing. Generic algorithms offer the opportunity to reuse existing designs for 
functional sub-units.
The synchronization techniques (hence demodulator complexity) for non-offset linear 
modulation schemes discussed in this chapter are relevant to other schemes. For example: 
offset and non-offset schemes use the same frequency and timing matched-filters; after 
initial acquisition, synchronisers for non-offset schemes can be applied to offset schemes 
with minor modifications; simple constant envelope schemes, such as MSK, can be 
expressed in terms of offset linear formats; techniques for linear schemes (amplitude) can 
be applied to constant envelope schemes (phase) after a polar to rectangular conversion, 
or its approximation; frequency discriminators, which make use of signal power rather 
than signal format, can be generally used for all schemes.
4.8 References
Alberty89a. T. Alberty and V. Hespelt, “A New Pattern Jitter Free Frequency
Error Detector,” IEEE Trans. on Comm., COM-37, Feb. 1989.
Ascheid84a. G. Ascheid and H. Meyr, “Maximum Likelihood Detection and
Synchronization by Parallel Digital Signal Processing,” Globecom 
’84, Vol. 2, p. 32.2.
Crochiere83a. R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal
Processing, Prentice-Hall (1983).
Erup90a. L. Erup, “Simulation of Interpolators and their Control,” 3rd IEEE
Intern. Workshop on Computer-Aided Modeling, Analysis and 
Design of Communication Links and Networks, Vol. CAMAD-3, p. 
3.3, Sep. 1990.
Farrow88a. C. W. Farrow, “A Continuously Variable Digital Delay Element,”
Proc. Int. Symp. Circuits and Systems, Vol. ISCAS-88, pp. 
2641-2645, Espoo, Finland, Jun. 1988.
Gardner85a. F. M. Gardner, “On-Board Processing for Mobile-Satellite
Communications,” ESTEC Contract No. 5889/84/NL/GM, European 
Space Agency (May 1985).
Gardner88a. F. M. Gardner, “Demodulator Reference Recovery Techniques
Suited for Digital Implementation,” ESTEC Contract No. 
6847/86/NL/DG, European Space Agency (Aug. 1988).
Gardner90a. F. M. Gardner, “Timing Adjustment via Interpolation in Digital
Demodulators,” ESTEC Contract No. 8022/88/NL/DG, Vol. Part I, 
European Space Agency, Jun. 1990.
Gardner90b. F. M. Gardner, “Frequency Detectors for Digital Demodulators via
Maximum-Likelihood Derivation,” ESTEC Contract No. 
8022/88/NL/DG, Vol. Part II, European Space Agency, Jun. 1990.
Godard78a. D. Godard, “Passband Timing Recovery in an All-Digital Modem
Receiver,” IEEE Trans. Comm., Vol. COM-26, pp. 517-522, May 
1978.
Messerschmitt79a. D. G. Messerschmitt, “Frequency Detectors for PLL Acquisition in 
Timing and Carrier Recovery,” IEEE Trans. on Comm., Vol. 
COM-27, pp. 1288-1295, Sep. 1979.
Mueller76a. K. H. Mueller and M. Mueller, “Timing Recovery in Digital
Synchronous Data Receivers,” IEEE Trans. Comm., Vol. COM-24, 
pp. 516-531, May 1976.
Oerder88a. M. Oerder and H. Meyr, “Digital Filter and Square Timing 
Recovery,” IEEE Trans. on Comm., COM-36, pp. 605-612, May 
1988.
Oerder90a. M. Oerder and H. Meyr, “VLSI Implementation of a 100 Mbit/s 
Digital Receiver,” Proc. 2nd Intern. Workshop on Digital Signal 
Processing Techniques Applied to Space Communications, DSP90, 
p. 1.4, Sep. 1990.
Takahata87a. F. Takahata et al., “A PSK Group Modem for Satellite 
Communication,” IEEE J. on Selected Areas in Comm., Vol. SAC-5, 
pp. 648-661, May 1987.
Trees68a. H. L. Van Trees, Detection, Estimation, and Modulation Theory, John
Wiley & Sons (1968).
Viterbi83a. A. J. Viterbi and A. M. Viterbi, “Nonlinear Estimation of 
PSK-Modulated Carrier Phase with Application to Burst Digital 
Transmission,” IEEE Trans. Inf. Thy., IT-29, pp. 543-551, Jul. 1983.
Yim88a. W. H. Yim, C. C. D. Kwan, F. P. Coakley, and B. G. Evans, 
“Multicarrier Demodulator for the On-Board Processing T-SAT 
Land Mobile Payload,” Proc. 4th International Conference on 
Satellite Systems for Mobile Communications and Navigation, pp. 
254-258, IEE, 1988.
Yim88b. W. H. Yim, C. C. D. Kwan, F. P. Coakley, and B. G. Evans, “On-board 
Multicarrier Demodulator for Mobile Applications using DSP 
Implementation,” Proc. 1st Int. Workshop on Digital Signal 
Processing Techniques Applied to Space Communications, pp. 
124-130, European Space Agency, ESTEC, Nov. 1988.
Yim89a. W. H. Yim, C. C. D. Kwan, F. P. Coakley, and B. G. Evans, 
“On-Board Multicarrier Demodulators for Mobile Applications 
using DSP Implementation,” Proc. 1st European Conference on 
Satellite Communications, Nov. 1989.
Yim90a. W. H. Yim and F. P. Coakley, “DSP MCD’s for Mobile Satellite 
Services,” 2nd International Workshop on Digital Signal Processing 
Techniques Applied to Space Communications, DSP90, European 
Space Agency, Sep. 1990.
Yoshida90a. S. Yoshida and H. Tomita, “A New Coherent Demodulation 
Technique for Land-Mobile Satellite Communications,” Intern. 
Mobile Satellite Conference, pp. 622-627, 1990.
5 All-Digital Multicarrier Demodulator Implementation
Table of Contents
5 All-Digital Multicarrier Demodulator Implementation
5.1 Analogue I.F. Board
5.2 Multiprocessor DSP Board
  5.2.1 Demultiplexer
  5.2.2 Demodulator Array
    5.2.2.1 Parameter Estimation
      Symbol Timing
      Carrier Phase
      Carrier Frequency
    5.2.2.2 Digital Timing Correction
5.3 Computer-Aided Design and Simulation Study
5.4 Testing
  5.4.1 ROM-Based Flexible Modulator
  5.4.2 Arbitrary Wave-form Generator
  5.4.3 Transient Wave-form Recorder
  5.4.4 Custom Bit Error Rate Monitor
  5.4.5 Test results
5.5 Complexity of Multicarrier Demodulators
  5.5.1 Complexity of T-SAT MCD
  5.5.2 ASIC Implementations
5.6 Signal Processor Architectures and Software
  5.6.1 Implementation of FIR Filtering
  5.6.2 Implementation of Other Algorithms
  5.6.3 Multi-Processor Implementation
  5.6.4 Software Development Tools
  5.6.5 Summary
5.7 References
All-Digital Multicarrier Demodulator 
Implementation
This chapter describes the implementation of an all-digital multicarrier demodulator using 
TMS320C25 digital signal processors. This forms part of the U.K. Technology Satellite 
(T-SAT) project, in which a complete on-board processing prototype payload was produced 
[Aghvami88a]. The payload design is targeted at land-mobile services using satellites in 
highly elliptical orbits. The work has been performed by a consortium of U.K. universities 
and polytechnics, co-ordinated by the Rutherford and Appleton Laboratories. The key 
baseline specifications of the MCD are summarised in table 5.1.
Contemporary with the T-SAT project, there were two other all-digital MCD prototypes 
that have been tested and details published. These were implemented using signal 
processors by KDD/NEC (Japan) [Takahata87a] and ANT (Germany) [Hespelt88a] 
respectively. The KDD/NEC implementation concentrated on the estimation of digital 
circuit complexity. The demultiplexer was a straightforward application of the 
polyphase-DFT method, and the synchronization algorithms used were only suitable for 
tracking. The ANT implementation was for mobile services, which requires special 
attention to the speed and reliability of signal acquisition. The algorithms used were based 
on those proposed by Gardner [Gardner85a], but later found to be either inadequate or 
incomplete, which led to the development of new solutions. Different techniques emerged 
from the T-SAT MCD project, which were published in various technical papers. In
addition, only the T-SAT MCD has an I.F. front-end. The straightforward application of 
filter bank techniques is impossible due to the use of off-the-shelf high frequency 
components (hence practical and inexpensive). Much effort has gone into discovering the 
problems in real-time implementation, and devising new and efficient solutions.
Table 5.1 T-SAT MCD System Specifications
Number of channels               4
Data rate                        16 kb/s
Modulation scheme                QPSK
Channel spacing                  14 kHz
Receive filter characteristic    root raised cosine
Roll-off factor                  40%
Available                        14.5 dB
Maximum Frequency Deviation      ±600 Hz
Clock Accuracy                   10'^
The T-SAT MCD hardware, shown in figure 5.1, consists of two circuit boards, for analogue 
and digital signal processing respectively. The detailed design and implementation, 
including the test-bed, are described in the following sections.
Figure 5.1 T-SAT MCD Hardware (analogue I.F. board, 70 MHz to 0 Hz; multiprocessor DSP board with demultiplexer and demodulators, 4 x 16 kb/s)
5.1 Analogue I.F. Board
Figure 5.2 Analogue I.F. Board (70 MHz input shifted to 0 Hz and sampled at 256 kHz)
The design and partition of I.F. processing between analogue and digital hardware was 
chosen to allow the use of an off-the-shelf SAW (Surface Acoustic Wave) bandpass filter. 
A narrow passband close to 56 kHz (4 x 14) requires custom fabrication. The centre 
frequency of the SAW filter, specified at 70 MHz, has a loose tolerance such that the 
frequency offset can be up to 2 channel spacings. The SAW filter used has approximately 
a 1 dB bandwidth (passband) of 200 kHz, and a rejection bandwidth (passband plus 
transition width) of 300 kHz at 20 dB, i.e., the attenuation at the stopband edge is 20 dB,
and the ultimate rejection is 40 dB. Figure 5.3 shows the SAW filter parameters used, 
focusing on the conjugate component at -70 MHz. A one-sided (quadrature) frequency 
shifter with a 70 MHz oscillator translates the -70 MHz point to 0 Hz. We can also focus 
on the real component at 70 MHz, but this is a mirror image of the conjugate component.
Figure 5.3 SAW Filter Specification (centred at -70 MHz: 200 kHz at 1 dB, 300 kHz at 20 dB, 40 dB ultimate rejection; 14 kHz channel spacing, 56 kHz for 4 channels)
Using a one-sided frequency shifter, the signal prior to sampling is complex, requiring 
dual A/D converters. The minimum sampling rate is 250 kHz (200 + (300 - 200)/2), 
irrespective of the centre frequency of the SAW filter, provided that the channels lie in 
the passband (figure 5.4 (a) and (b)). The sampling frequency is chosen to be 256 kHz, an 
integer multiple of the bit rate at 16 kbit/s. This is approximately over-sampling by 2 for 
a total usable signal bandwidth of 112 kHz (8 x 14). The SAW filter used is most appropriate 
for an 8-channel demultiplexer using the tree method. To be efficient, the polyphase-FFT 
method requires a much narrower transition bandwidth. Figure 5.5 shows the SAW filter 
requirements for different methods. If a two-sided (balanced) frequency shifter is used, 
only a single A/D converter is required. However, additional digital I.F. processing is 
required to complete the quadrature shifting, and the sampling frequency has to be doubled. 
Because of the possibly large centre frequency offset, the required sampling frequency is 
higher still, dependent on the offset value (figure 5.4 (c) and (d)).
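The sampling rate arithmetic for the complex (one-sided) case can be reproduced as follows. The filter figures are those quoted above; picking the smallest multiple of the bit rate that is not below the minimum is an assumption that happens to give the 256 kHz used, not a rule stated in the text.

```python
import math

passband_khz = 200        # 1 dB bandwidth of the SAW filter
rejection_khz = 300       # 20 dB bandwidth (passband plus transition width)
bit_rate_khz = 16
channel_spacing_khz = 14

# Minimum complex sampling rate: passband plus half of the total transition width.
min_rate_khz = passband_khz + (rejection_khz - passband_khz) / 2

# Assumed selection rule: smallest integer multiple of the bit rate >= minimum.
chosen_rate_khz = math.ceil(min_rate_khz / bit_rate_khz) * bit_rate_khz

usable_bandwidth_khz = 8 * channel_spacing_khz
oversampling = chosen_rate_khz / usable_bandwidth_khz

print(min_rate_khz, chosen_rate_khz, usable_bandwidth_khz, round(oversampling, 2))
```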
Figure 5.4 Sampling Methods For Different SAW Filters: (a) complex sampling; (b) complex sampling with offset; (c) real sampling; (d) real sampling with offset
Figure 5.5 SAW Filter Requirements For Different Demultiplexers: (a) SAW filter used; (b) SAW filter for tree method; (c) SAW filter for polyphase-DFT method
Two flash type A/D converters were chosen, with 8-bit word length. Amplifiers interface 
between the SAW filter, quadrature shifter and A/D converters. These amplifiers also act 
as lowpass filters for suppressing the high frequency harmonics generated by the quadrature 
shifter. More importantly, gain adjustments are provided for matching the power between 
the in-phase and quadrature-phase signal components. Adjustments are aided by software 
monitoring of the sampled signal magnitude. Using the same sampling frequency of 256 
kHz, the I.F. signal with a much higher centre frequency can be directly sampled, as 
discussed in section 2. However, A/D converter parameters such as aperture and sampling 
time jitters are not normally given. Correct operation is guaranteed for baseband signals 
only, i.e., a centre frequency of zero.
The passband bandwidth of the SAW filter approaches 8 channels. Therefore the I.F. and 
sampling front-end allow for 8 channels, doubling the required capacity. This demands a 
higher sampling frequency and additional digital processing. However, all these 
compensate for the 20 dB SAW filter rejection and the 8-bit A/D quantization.
5.2 Multiprocessor DSP Board
Figure 5.6 Photograph of Multiprocessor DSP Board
Figure 5.7 Block Diagram of Multiprocessor DSP Board (processors TMS1 to TMS4 in cascade, linked by FIFOs, feeding four demodulators)
A photograph of the multiprocessor DSP board (built by C.C.D. Kwan) is shown in figure 
5.6, and the board functions are shown in figure 5.7. The TMS320C25 processor was 
chosen as the introduction of a future space qualified version was likely. Initially, the MCD 
was implemented on multiple TMS320C25 hardware development boards from 
Loughborough Sound Images Limited (U.K.). These boards were connected using either 
the serial link, or external high speed FIFO circuits. In the final design, custom hardware 
was built to simplify interface circuits and reduce inter-processor communication 
overheads.
In this final design, all DSP functions were contained on a single, custom made 
multiprocessor board, with IBM PC-bus compatibility. There are 4 TMS320C25 
processors in cascade, connected by high speed FIFOs. The inter-processor communication 
protocol is a simple interrupt mechanism using the XF and INT1 signals. This allows 
flexible, and high speed, block transfers to reduce the processor I/O overheads. Block 
synchronization is maintained by periodically clearing the FIFOs through software 
command. This allows reliable system initialization, and provides the fault tolerance in 
data transmission necessary for long duration bit-error-rate testing.
Because of the wide SAW filter passband, processor TMS1 is required as the root node 
of an 8-channel tree demultiplexer. This processor performs decimation by 2, from 256 
kHz to 128 kHz. The outputs are two signals with approximately 64 kHz bandwidth each. 
One of the signals is a 4-channel FDM for demodulation, while the other 64 kHz signal is 
reserved for future experiments. Processor TMS2 implements a 4-channel tree structure. 
The output signals of the tree are over-sampled by 2, i.e., at 32 kHz. Processor TMS3 is 
provided for final decimation to 16 kHz. Processor TMS4 implements a 4-channel 
demodulator array, together with additional bit formatting functions. Although 4 processors 
are used, the main processing of the 4-channel MCD is implemented by 2 processors: 
one for the demultiplexer and one for the demodulator array. The other processors are 
required to overcome undesirable analogue component characteristics, or provide auxiliary 
functions such as buffering.
The tree demultiplexing method was chosen for two reasons. The first reason is the ease 
of adapting to the analogue front-end, as discussed in the last section. The second reason 
is that the tree method can be partitioned serially and mapped onto a pipeline of identical 
processors. The processing load is equally shared. In the T-SAT case, a 4-channel tree 
structure is implemented in a single processor TMS2. The demultiplexing is completed 
together with TMS3, a half-band decimator. The polyphase-FFT method would require 6
(assuming 4 channels plus 1-channel-spacing transition band on each side) parallel 
polyphase filters and 1 FFT processor. Since the computation rate in the polyphase network 
is much higher than that of the FFT processor, this method cannot be implemented using 2 
processors in cascade as in the tree method. A parallel arrangement would require dual-port 
memories, which are more complex than the FIFOs used in cascading. The increased number 
of I/O or memory access instructions reduces the efficiency of each processor. In addition, 
complex software scheduling would be required. For the TMS320C25, a multiplication 
and associated operations in performing an FFT require more instruction cycles than those 
of an FIR filter. Therefore the polyphase-FFT method is less efficient than its computation 
rate implies.
5.2.1 Demultiplexer
A/D conversion is performed at a high sampling rate determined by the total bandwidth 
of adjacent channels allocated for a MCD. For subsequent individual channel 
demodulation, the digital signal can be down-sampled to a lower rate (‘decimated’) related 
to the signal bandwidth of each channel. As with the sampling of an analogue signal, this 
decimation operation requires anti-aliasing filters that are realised as a digital filter bank.
A multichannel, multistage approach is taken. The resulting binary tree structure is shown 
in figure 5.8, implemented within TMS2. The spectra of all stages, centred around one of 
the channels, are shown in figure 5.9. For clarity, a SAW filter with a passband 
approximately equal to 4 channels is shown. This demultiplexing algorithm is more flexible 
than that in [Gockler88a]. Also, the T-SAT demultiplexer is optimized for the TMS320C25 
pipeline architecture.
Figure 5.8 Binary Tree Demultiplexer
Figure 5.9 Spectra of Tree Demultiplexer (stages: SAW analogue anti-aliasing filter; quadrature frequency shifting from -70 MHz to 0 Hz; A/D; digital anti-aliasing filter (DAAF); down-sampling by 2)
Figure 5.10 Digital Anti-aliasing Filter (output sampling frequency $f_1 = f_0/2$)
Each node of the tree structure consists of two anti-aliasing filters. Each selects from the 
input a smaller group bandwidth (fewer channels), and performs decimation by two. One 
such operation is shown in figure 5.10, with a real prototype filter $h(n)$. Two groups are 
shown; the selected one has a group bandwidth $b_g$ and a group centre $f_c = 0$. There is no 
rigid restriction on the frequency plan as long as each group is protected against aliasing. 
We concentrate on the case when the two group bandwidths are equal, which simplifies 
node design. Exceptional nodes can be brought in to deal with special requirements. For 
complex signals, the group centre frequency can range from 0 to $f_0$ and the corresponding 
complex filter is obtained by frequency shifting $h(n)$:
$$h'(n) = h(n)\, e^{j \left( 2\pi \frac{f_c}{f_0} n + \theta \right)}$$
where $\theta$ is chosen to preserve the symmetry of $h(n)$. There are two such filters centred at 
the corresponding group frequency. For the next stage, both the sampling frequency and 
the group bandwidth to be selected are halved, therefore the prototype filter $h(n)$ is the 
same for all nodes.
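One tree node can be sketched in Python as two frequency-shifted copies of a real prototype, each followed by decimation by two. The prototype values are the 7-tap half-band coefficients quoted later in this section for the root node; the group centres at plus and minus a quarter of the sampling frequency, the 50 kHz test tone and the function names are assumptions made for illustration, and the straightforward FIR loop is not the optimized TMS320C25 code.

```python
import cmath
import math

PROTOTYPE = [-0.045041, 0.0, 0.291965, 0.5, 0.291965, 0.0, -0.045041]


def shifted_filter(h, fc, f0):
    """Shift the real prototype h to centre frequency fc (sampling rate f0).

    The phase is referred to the centre tap, which keeps the taps
    conjugate-symmetric (one way of choosing theta)."""
    centre = (len(h) - 1) // 2
    return [c * cmath.exp(2j * math.pi * fc / f0 * (n - centre))
            for n, c in enumerate(h)]


def filter_and_decimate(x, taps):
    """FIR filter x with taps and keep every second output sample."""
    out = []
    for n in range(len(taps) - 1, len(x), 2):
        out.append(sum(taps[i] * x[n - i] for i in range(len(taps))))
    return out


def tree_node(x, f0, fc_a, fc_b, h=PROTOTYPE):
    """Split x into two groups centred at fc_a and fc_b, each at rate f0 / 2."""
    return (filter_and_decimate(x, shifted_filter(h, fc_a, f0)),
            filter_and_decimate(x, shifted_filter(h, fc_b, f0)))


def mean_power(x):
    return sum(abs(v) ** 2 for v in x) / len(x)


f0 = 256.0                                    # kHz, node input sampling rate
tone = [cmath.exp(2j * math.pi * 50.0 / f0 * n) for n in range(256)]
upper, lower = tree_node(tone, f0, 64.0, -64.0)
print(len(upper), mean_power(upper), mean_power(lower))
```

The tone lies 14 kHz from the upper group centre, inside the prototype passband, and 114 kHz from the lower one, inside its stopband, so almost all of its power appears in the upper output.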
The large centre frequency offset of the SAW filter requires an adaptable approach that 
allows flexible allocation of channel centre frequencies. The four channel centre 
frequencies in the T-SAT experiment are 70.035, 70.021, 70.007 and 69.993 MHz. It is 
more convenient to use negative frequencies; the corresponding centre frequencies are 
-70.035, -70.021, -70.007 and -69.993 MHz. The quadrature frequency shifter with a 70 
MHz oscillator adds 70 MHz to the negative channel centre frequencies, which become 
-35, -21, -7 and 7 kHz. The four channels will be referred to as $S_{-35}$, $S_{-21}$, $S_{-7}$ and 
$S_7$ respectively. No signal components coincide with the receiver centre frequency of 70 MHz. 
The d.c. offset component due to the analogue quadrature frequency shifting front-end is 
automatically suppressed by the digital bandpass anti-aliasing filters.
Despite the flexible channel allocation, a more efficient implementation than 
straightforward anti-aliasing filtering can be found. To use half-band filters, the transition 
band edges of $h(n)$ must be symmetric about $f_0/4$. The minimum value of $f_0$ is therefore 
$4b_g$, twice the theoretical minimum for sampling complex signals. A slightly higher 
sampling frequency, an integer multiple of the symbol rate, can be selected to avoid 
sampling rate conversion in the demodulator. For example, the initial group bandwidth 
for 4 channels is 56 kHz, but the sampling rate is chosen to be 128 kHz rather than 112 
kHz.
Processor TMS1 is the root node of an 8-channel tree demultiplexer, although only 4 
channels are actually used. In this node, we select a group of 56 kHz (4 channels) at a 
sampling frequency of 256 kHz. The half-band prototype filter has a passband edge at 28 
kHz and a stopband edge at 100 kHz. The spectrum of this filter is shown in figure 5.11. 
The filter coefficients are:
-0.045041 0 0.291965 0.5 0.291965 0 -0.045041
The centre frequency of this group of 4 channels ($S_{-35}$, $S_{-21}$, $S_{-7}$ and $S_7$) is at -14 kHz. After
sampling at 256 kHz, this centre frequency becomes 242 (256-14) kHz, but it is valid and 
more convenient to use the unmodified value of -14 kHz. After frequency shifting and 
scaling, the complex filter used is shown in figure 5.12. The filter coefficients are:
real -0.046312 0 0.549796 1 0.549796 0 -0.046312
imag -0.077266 0 0.19672 0 -0.19672 0 0.077266
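The complex coefficients above can be checked numerically from the prototype. The following is a minimal sketch in Python, assuming that the phase offset is chosen by measuring the time index from the centre tap (so that the centre tap stays purely real) and that the scaling is a factor of 2. The function name `shift_prototype` is illustrative and is not taken from the T-SAT software.

```python
import cmath
import math

def shift_prototype(h, f_centre, f_sample, scale=2.0):
    """Frequency-shift a real symmetric prototype h(n) to f_centre.

    Measuring n from the centre tap is equivalent to choosing the phase
    offset so that the centre tap remains purely real (assumed convention).
    """
    mid = (len(h) - 1) // 2
    return [scale * c * cmath.exp(2j * math.pi * f_centre * (n - mid) / f_sample)
            for n, c in enumerate(h)]

# Root node quoted in the text: group centre -14 kHz, sampling at 256 kHz.
prototype = [-0.045041, 0.0, 0.291965, 0.5, 0.291965, 0.0, -0.045041]
root = shift_prototype(prototype, f_centre=-14.0, f_sample=256.0)

for tap in root:
    print(f"{tap.real:10.6f} {tap.imag:10.6f}")
```

The printed columns agree with the quoted real and imaginary coefficients to within rounding of the last digit.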
Figure 5.11 Tree Node Prototype Filter (vertical axis from -10 to -70; horizontal axis: Frequency / kHz, 0 to 252)
Figure 5.12 Tree Node Complex Filter (vertical axis from -10 to -70; horizontal axis: Frequency / kHz, 0 to 252)
For the next stage, the first node in TMS2, we split the 4 channels into 2 groups: ($S_{-35}$, $S_{-21}$)
and ($S_{-7}$, $S_7$). The group bandwidth is 28 kHz and the input sampling frequency is 128 kHz. 
Because of the constant ratio of the group bandwidth and the input sampling frequency in
all nodes, the prototype filter is the same. At the A/D input, the centre frequencies of the 
two groups are -28 kHz and 0 Hz respectively. The value -28 kHz becomes 228 kHz after 
sampling at 256 kHz. After decimation to 128 kHz in the preceding stage, the new value 
is 100 kHz. We can use the value of 100 kHz in shifting the prototype filter. However, the 
original value of -28 kHz is equivalent. Therefore all centre frequencies can be specified 
without regard to the different sampling rates in different stages. The frequency shift of 
the prototype filter will automatically keep track of the virtual frequency changes due to 
decimation. This can be identified as sampling invariance, which is central to the 
exploration and validation of a flexible multistage design. The filter coefficients of the 
filter centred at -28 kHz are:
real 0.050046 0 0.11392 1 0.11392 0 0.050046
imag 0.0749 0 0.57271 0 -0.57271 0 -0.0749
The coefficients of the filter centred at 0 Hz are purely real:
-0.090082 0 0.58393 1 0.58393 0 -0.090082
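The sampling invariance described above can be illustrated with a short check. This is a sketch under the same assumptions as the quoted coefficients (time index measured from the centre tap, scaling by 2); the helper name `shifted_taps` is hypothetical. It shifts the prototype once with the virtual centre frequency of -28 kHz and once with the aliased value of 100 kHz, both at the 128 kHz input rate of this node.

```python
import cmath
import math

PROTOTYPE = [-0.045041, 0.0, 0.291965, 0.5, 0.291965, 0.0, -0.045041]

def shifted_taps(f_centre, f_sample, scale=2.0):
    """Shift the half-band prototype to f_centre (both in kHz)."""
    mid = (len(PROTOTYPE) - 1) // 2
    return [scale * c * cmath.exp(2j * math.pi * f_centre * (n - mid) / f_sample)
            for n, c in enumerate(PROTOTYPE)]

virtual = shifted_taps(-28.0, 128.0)   # centre frequency as specified at the A/D input
aliased = shifted_taps(100.0, 128.0)   # the same group after sampling and decimation

# The two frequencies differ by exactly one sampling rate, so the taps coincide.
largest_difference = max(abs(a - b) for a, b in zip(virtual, aliased))
print(f"largest tap difference: {largest_difference:.2e}")
```

Because the two frequencies differ by a whole multiple of the sampling rate, the shifted filters are identical up to floating-point rounding, which is why the original value can be used at every stage.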
For efficient half-band filters, the number of taps is given by $N = 4n + 3$. All odd taps are 
zero except the centre tap. Using the polyphase decimation by two structure, the filter is 
decomposed into two branches, one of which contains only the centre tap. The phase 
rotation $\theta$ is chosen such that the value of the centre tap is purely real. Together with scaling 
such that the real part of the centre tap equals an integer power of 2, no multiplications 
are required for the whole branch containing the centre tap.
Although the two filters of each node share the same input signal, the signal flow is greatly 
simplified by using two different signals for the two filters. The only difference between 
these two signals is a delay of one sample, therefore no extra complexity is required. To 
simplify control, the complex convolutions required by filtering are decomposed and then 
combined into two convolutions.
Figure 5.13 shows the resulting node structure for $N = 7$. To compute one output for each 
branch, each node requires $4(N + 1)$ multiplications and additions. If half-band filters are 
not used, computations increase by a factor of approximately two. The node structure (and 
hence the number of computations) is the same for all nodes, but in general, the coefficients 
of all node filters are different because of the flexible centre frequency allocation. Since 
each node is approximately over-sampled by two, an additional decimation by two stage 
with similar structure is required before demodulation.
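The saving from the half-band and polyphase structure can be sketched as follows. This is an illustrative Python model, not the TMS implementation: it uses the quoted filter centred at -28 kHz, compares a direct convolution followed by decimation with a version that touches only the even taps plus the unity centre tap, and counts real multiplications under the assumption of four real products per complex tap. All function names are hypothetical.

```python
import math

# Complex half-band filter centred at -28 kHz (coefficients quoted in the text).
REAL = [0.050046, 0.0, 0.11392, 1.0, 0.11392, 0.0, 0.050046]
IMAG = [0.0749, 0.0, 0.57271, 0.0, -0.57271, 0.0, -0.0749]
TAPS = [complex(re, im) for re, im in zip(REAL, IMAG)]

def direct_decimate(taps, x):
    """Full convolution evaluated at every second input sample."""
    last = len(taps) - 1
    return [sum(taps[i] * x[k - i] for i in range(len(taps)))
            for k in range(last, len(x), 2)]

def polyphase_decimate(taps, x):
    """Same outputs from the even-tap branch plus the unity centre tap."""
    last = len(taps) - 1
    mid = last // 2
    even = [(i, taps[i]) for i in range(0, len(taps), 2)]
    return [sum(c * x[k - i] for i, c in even) + x[k - mid]
            for k in range(last, len(x), 2)]

def real_multiplies_per_node(n_taps):
    """Two filters per node, (N + 1) / 2 complex taps each, 4 real products per tap."""
    return 2 * ((n_taps + 1) // 2) * 4

signal = [complex(math.cos(0.3 * n), math.sin(0.7 * n)) for n in range(40)]
reference = direct_decimate(TAPS, signal)
fast = polyphase_decimate(TAPS, signal)
worst = max(abs(a - b) for a, b in zip(reference, fast))
print(f"outputs: {len(fast)}, largest mismatch: {worst:.2e}")
print(f"real multiplications per node: {real_multiplies_per_node(len(TAPS))}")
```

For a 7-tap filter the count reproduces the figure of $4(N + 1) = 32$ given in the text.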
Figure 5.13 Tree Node Structure (blocks: ROM, MAC, MUX, DEMUX, MUX, MAC, ROM)
The reason for the above complicated optimizations for the TMS architecture can be seen 
in the assembler code for each node in figure 5.14. To compute one output (complex) for 
each filter, we need four calls to the macro shown. The complexity of this macro is close 
to that of a real FIR filter (8 taps), for which the TMS architecture is optimized. This macro 
fully exploits the zero coefficients of half-band filters and polyphase decomposition. 
Neither separate signal shifting nor discontinuous convolution is required, otherwise the 
number of instructions and the run-time overhead would be significantly greater. Since 
97% of the 10 MIPS (million instructions per second) capacity of the TMS processor is 
utilized, even two extra instructions in each branch would require a faster processor. This 
is also the main reason for the choice of the binary tree structure instead of other single-stage 
or FFT based approaches. Because of the small memory sizes required for program code, 
coefficients and signal samples, all program and data accesses are to on-chip memories, 
with the fastest possible execution rate.
node    $macro  mid, tail, coef, output
        lac     :mid:,15        ; centre tap branch processing
        lar     ar1, :tail:     ; load signal pointer
        mpyk    0               ; clear registers
        rptk    convlen-1
        macd    :coef:, *-      ; 8 multiplications
        apac
        adds    half            ; optional rounding
        sach    :output:        ; store output
        $endm
Figure 5.14 Assembler Code for Tree Node
For an ordinary decimation by two half-band filter, we have to wait for two input samples 
to arrive before one output sample can be computed. Waiting is not necessary for a tree 
node because there are two half-band filters. We have seen earlier that a delay is introduced 
between the outputs of the two filters. Hence for each input sample, one output can always 
be computed, which alternates between the two filters. Consequently, one output sample 
for the whole tree structure can be computed for every input sample. Therefore buffering 
is not necessary amongst nodes. This also applies to the A/D converter interface. The 
channel order of output samples thus obtained is the bit reverse sequence associated with 
FFT algorithms.
For custom hardware implementation, the computation rate can be further reduced at the 
expense of control complexity. With some restrictions on the frequency plan, the two 
filters in each node can share the same convolution product terms. In addition, all nodes 
in the tree structure are identical, which is suitable for ASIC implementation. When 
$f_0 = 4b_g$, and the two initial groups are centred at $f_0/8$ and $3f_0/8$, the real and imaginary 
outputs of each branch are summed from the same product terms except for the sign. This is 
also the case for the two branches. For example, let the centre frequencies of four channels 
be 7, 21, 35 and 49 kHz. If we choose the sampling rate at this stage to be 112 kHz instead 
of 128 kHz (first node in TMS2), the prototype half-band filter is:
-0.050624 0 0.295059 0.5 0.295059 0 -0.050624
One group contains two channels at 7 and 21 kHz. The filter centred at 14 kHz is:
real 0.071594 0 0.417276 1 0.417276 0 0.071594
imag 0.071594 0 -0.417276 0 0.417276 0 -0.071594
The other group contains two channels at 35 and 49 kHz. The filter centred at 42 kHz is:
real -0.071594 0 -0.417276 1 -0.417276 0 -0.071594
imag 0.071594 0 -0.417276 0 0.417276 0 -0.071594
To compute one output for each of the two branches, only 4 real multiplications are required. 
Since sampling frequencies in all stages are always integer multiples of group centre 
frequencies, there are only four possible filters, which differ only in the signs of their coefficients. 
Therefore only 2 real numbers need to be stored as filter coefficients, independent of the 
size of the tree hierarchy. This type of optimization can hardly be exploited in signal 
processors. For example, FIR filtering in the TMS requires one instruction cycle per tap. 
Using the above optimizations, the multiplications saved are replaced by 
additions, sign changes, etc. These operations each require one instruction cycle.
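The claim that only two real numbers need to be stored can be checked against the 112 kHz example. The sketch below is a Python illustration with a hypothetical helper `shifted_taps`; it assumes the same shifting convention as the quoted coefficients (time index measured from the centre tap, scaling by 2). It shifts the prototype to $f_0/8$ and $3f_0/8$ and collects the distinct magnitudes of all non-trivial real and imaginary parts.

```python
import cmath
import math

PROTOTYPE_112 = [-0.050624, 0.0, 0.295059, 0.5, 0.295059, 0.0, -0.050624]

def shifted_taps(f_centre, f_sample, scale=2.0):
    """Shift the half-band prototype to f_centre (both in kHz)."""
    mid = (len(PROTOTYPE_112) - 1) // 2
    return [scale * c * cmath.exp(2j * math.pi * f_centre * (n - mid) / f_sample)
            for n, c in enumerate(PROTOTYPE_112)]

low_group = shifted_taps(14.0, 112.0)    # centred at f0/8
high_group = shifted_taps(42.0, 112.0)   # centred at 3*f0/8

# Collect distinct magnitudes, ignoring zero taps and the unity centre tap.
magnitudes = set()
for tap in low_group + high_group:
    for part in (tap.real, tap.imag):
        value = round(abs(part), 6)
        if value not in (0.0, 1.0):
            magnitudes.add(value)
stored = sorted(magnitudes)
print(stored)
```

Both filters reduce to the same two magnitudes with different signs, so in a custom hardware design the products can be shared and only the sign pattern changes between filters.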
The above optimizations require that the initial sampling frequency is a function of the 
channel spacing. This requires sampling rate conversion because demodulators ultimately 
sample at the symbol rate. An efficient alternative is to select the sampling rate as integer 
multiples of the symbol rate as in the most flexible case. If the centre frequencies of the 
two filters in each node are $\pi/4$ apart, they share the same product terms as in the most 
optimized case. The passband and stopband of the filters can be selected to allow the use 
of half-band filters. The computational efficiency of this alternative is similar to the most 
optimized version. All nodes are identical in structure, except that the coefficients of each 
node are generally different.
There are other apparent advantages with the multistage approach. If the number of 
channels is not a power of two, some nodes in the lower levels will not be required. 
Considerable amounts of computation can be saved if the number of channels is a multiple 
of 4, 8, 16, etc. Also, channels with larger bandwidths can be obtained from higher 
level nodes instead of the output of the demultiplexer.
The output of the tree is at 32 kHz, but the minimum sampling rate of a 16 kb/s QPSK 
signal, with a 40% roll-off factor, is 11.2 kHz. Allowing for a possible frequency offset 
of ±600 Hz, the signal bandwidth to be protected from aliasing is 12.4 kHz. Processor 
TMS3 provides a half-band decimator for each channel. The sampling frequency is reduced 
to 16 kHz prior to the demodulators in TMS4. The 19-tap half-band filter for channel $S_{-35}$ 
is shown in figure 5.15.
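The bandwidth figures in this paragraph follow from simple arithmetic, sketched below in Python. The constant names are illustrative; the values are those quoted in the text (16 kb/s QPSK, 40% roll-off, ±600 Hz frequency offset, 16 kHz rate after the half-band decimator).

```python
BIT_RATE_HZ = 16_000.0
BITS_PER_SYMBOL = 2          # QPSK
ROLL_OFF = 0.4
FREQ_OFFSET_HZ = 600.0
DECIMATED_RATE_HZ = 16_000.0

symbol_rate = BIT_RATE_HZ / BITS_PER_SYMBOL            # symbols per second
occupied = symbol_rate * (1.0 + ROLL_OFF)              # two-sided signal bandwidth
protected = occupied + 2.0 * FREQ_OFFSET_HZ            # allow the offset on either side
margin = DECIMATED_RATE_HZ - protected                 # spare bandwidth at 16 kHz

print(f"symbol rate        : {symbol_rate / 1e3:.1f} kHz")
print(f"occupied bandwidth : {occupied / 1e3:.1f} kHz")
print(f"protected bandwidth: {protected / 1e3:.1f} kHz")
print(f"margin at 16 kHz   : {margin / 1e3:.1f} kHz")
```

Since the protected bandwidth stays below the decimated complex sampling rate, the reduction to 16 kHz does not alias the wanted signal.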
Figure 5.15 Filter Specification For Half-Band Decimator (vertical axis from -10 to -70; horizontal axis: Frequency / kHz, 0 to 28)
5.2.2 Demodulator Array
The demodulator architecture chosen is shown in figure 5.16. The full demodulator array 
requires a simple context switching amongst channels. The frequency offset due to Doppler 
shift is a typical problem for low bit-rate mobile systems. A split-loop is used to enable 
frequency correction before further processing and to overcome the delay due to digital 
filters. Synchronization involves parameter estimation and correction. The former is 
described in section 5.2.2.1. Commonly used table look-up techniques are used for 
frequency and phase correction. The details of digital timing correction are covered in 
section 5.2.2.2. The anti-aliasing filter used in this process is combined with the 
matched-filter to reduce overheads in the TMS architecture.
Figure 5.16 Demodulator Architecture (blocks: 16 kHz input, COS/SIN ROM, SUB-FILTER ROM, MAC, SHFREG, ACC, COMP, TED, FED, MUX)
5.2.2.1 Parameter Estimation
In this section, error detectors for feedback architectures that require preamble patterns 
are first described. These principles are then developed into block estimation methods for 
use in fast lock-in, feed-forward architectures. Some of these detectors can be modified 
to work with QPSK random symbols. New techniques for absorbing severe power variation 
(and therefore improving fading performance) are also investigated.
A simple preamble pattern is obtained by transmitting identical sine waves at the in-phase 
(real) and quadrature-phase (imaginary) arms of a QPSK modulator, with a frequency equal 
to half the symbol rate $1/T$. For QPSK and a wide class of M-ary modulation, this 
preamble pattern can be provided by a simple bit sequence. The parameters of a sine wave 
are most conveniently estimated by sampling at four points per period. This is equivalent 
to two samples per symbol.
At the receiver, a more convenient form is given by separating the complex sequence z 
into even and odd samples:
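$$r_n = r(nT) = \sqrt{2}\,A\,(-1)^n \cos\alpha \; e^{j(\omega n + \phi)}$$
$$r_{n+1/2} = r[(n+1/2)T] = \sqrt{2}\,A\,(-1)^n \sin\alpha \; e^{j[\omega (n+1/2) + \phi]}$$
where $A$ is the pulse amplitude, $\alpha$ ($|\alpha| < \pi/2$) is the timing error measured from $r_n$, $\omega$ is the 
normalised frequency error, and $\phi$ is the phase error.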
Symbol Timing
A timing error detector (TED) algorithm is given by
$$u_n = A^2 \sin 2\alpha \, \cos\frac{\omega}{2} \qquad (5.1)$$
This expression varies directly with the timing error only when $\alpha$ approaches zero. 
Therefore iteration is necessary, which is conveniently realised by feedback loops. For a 
more accurate estimate, a power compensating term is required, e.g.,
$$u'_n = 2A^2 \cos 2\alpha \qquad (5.2)$$
The average values of $u_n$ and $u'_n$ taken over a block of samples yield $\alpha$ via an arctangent
lookup table. This estimate is accurate for feed-forward architectures if the frequency error 
is small relative to the bit rate. Otherwise, both the real and imaginary parts in expression 
(5.1) should be computed to cancel the frequency error terms.
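The block estimate described above can be sketched in Python. The detector forms used here are assumptions chosen to reproduce the stated results on the preamble model: the timing term is taken as the real part of $r_n r^*_{n+1/2}$, which gives $A^2 \sin 2\alpha \cos(\omega/2)$, and the power compensating term as $|r_n|^2 - |r_{n+1/2}|^2$, which gives $2A^2 \cos 2\alpha$. The arctangent lookup table is replaced by `math.atan2`, and all function names are illustrative.

```python
import cmath
import math

def preamble_samples(n_symbols, amplitude, alpha, omega, phi):
    """Even (r_n) and odd (r_{n+1/2}) preamble samples from the signal model."""
    even, odd = [], []
    for n in range(n_symbols):
        sign = -1.0 if n % 2 else 1.0
        scale = math.sqrt(2.0) * amplitude * sign
        even.append(scale * math.cos(alpha) * cmath.exp(1j * (omega * n + phi)))
        odd.append(scale * math.sin(alpha) * cmath.exp(1j * (omega * (n + 0.5) + phi)))
    return even, odd

def estimate_timing(even, odd):
    """Block timing estimate: alpha = 0.5 * atan2(2 * mean(u), mean(u'))."""
    count = len(even)
    u = sum((a * b.conjugate()).real for a, b in zip(even, odd)) / count
    u_dash = sum(abs(a) ** 2 - abs(b) ** 2 for a, b in zip(even, odd)) / count
    return 0.5 * math.atan2(2.0 * u, u_dash)

even, odd = preamble_samples(32, amplitude=1.5, alpha=0.3, omega=0.0, phi=0.7)
alpha_hat = estimate_timing(even, odd)
print(f"estimated timing error: {alpha_hat:.4f} rad")
```

With zero frequency error the estimate is independent of both the amplitude and the carrier phase; a small frequency error introduces only the $\cos(\omega/2)$ scaling of the timing term.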
For random data symbols, large estimated errors occur when there are no symbol 
transitions. This can be reduced by bringing an additional sample into consideration 
[Gardner86a]:
$$u_n = 2A^2 \sin 2\alpha \, \cos\frac{\omega}{2}$$
Simulations have shown that timing synchronization can be achieved in the presence of 
frequency offset. This is illustrated in figure 5.17. In the ideal case, when
synchronization is achieved. The scatter plot, $\Re[a_n]$ versus $\Im[a_n]$, for QPSK has only 4 
possible co-ordinates for all n . Due to frequency error, with no timing error, the received 
signal can be approximated by
Each point in the scatter plot corresponds to a different value of the time index n . With 
zero timing error, all points lie on the circumference of a circle, otherwise, spreading 
occurs. The narrow ring in the scatter plot shows that the symbol timing is approximately 
synchronized. This allows other detectors to operate on the data strobe only.
Figure 5.17 Scatter diagram showing approximate timing synchronization using 
Gardner’s TED under frequency error
Because of the term $\sin 2\alpha$, a null occurs in the detector characteristic not only when $\alpha = 0$, 
but also when $\alpha = \pm\pi/2$. This causes the hang-up effect and hence long acquisition time. 
An algorithm based on (5.2) is used to eliminate hang-up completely. This detector is 
cascaded with an integrate-and-dump filter. The output sign controls a two-state, 0 or $T/2$, 
variable delay element.
Carrier Phase
Phase error detectors (PED) can be implemented by a digital version of the Costas 
algorithm. Let $r_n = x_n + jy_n$. For zero timing and frequency error, we have
$$u_n = y_n\,\mathrm{sgn}(x_n) - x_n\,\mathrm{sgn}(y_n) = 2A \sin\phi$$
The decision on data symbols using the sgn function allows this detector to work on random 
symbols.
Power level variations affect the loop bandwidth. The effect can be reduced by the following 
estimate:
$$u'_n = |x_n| + |y_n| = 2A \cos\phi$$
A division of $u_n$ by $u'_n$ yields $\tan\phi$, which is independent of power level. The division
can be replaced by a multiplication together with an inverse lookup table. This has the 
advantage that the lookup table can be shared with other detectors. A feed-forward structure 
is obtained if an arctangent lookup table is used to yield $\phi$ directly. A more accurate phase 
estimate (less power level dependent) would improve the performance under fading. When 
phase synchronization is close to completion, the value $u'_n$ itself is the amplitude estimate 
that is important for QAM detection.
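The power-independent phase estimate can be sketched as follows. This Python illustration assumes the data strobe is written as $x + jy$ and that the received QPSK symbol is $\sqrt{2}A\,e^{j(\pi/4 + k\pi/2 + \phi)}$ for $k = 0, \ldots, 3$, with zero timing and frequency error; the function name `phase_estimate` is hypothetical and `math.atan2` stands in for the arctangent lookup table.

```python
import cmath
import math

def sgn(value):
    """Hard decision used by the detector."""
    return 1.0 if value >= 0.0 else -1.0

def phase_estimate(sample):
    """Decision-directed phase estimate that does not depend on power level."""
    x, y = sample.real, sample.imag
    u = y * sgn(x) - x * sgn(y)        # equals 2 A sin(phi)
    u_dash = abs(x) + abs(y)           # equals 2 A cos(phi)
    return math.atan2(u, u_dash)

phi = 0.2
estimates = []
for amplitude in (0.5, 2.0):
    for k in range(4):
        angle = math.pi / 4 + k * math.pi / 2 + phi
        strobe = math.sqrt(2.0) * amplitude * cmath.exp(1j * angle)
        estimates.append(phase_estimate(strobe))
print([round(e, 4) for e in estimates])
```

The estimate is the same for all four symbols and both amplitudes, which is the property that reduces the sensitivity of the loop bandwidth to fading.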
For feedback architectures, a second order phase-lock-loop using the Costas detector is 
inadequate for fast carrier acquisition, even with a preamble pattern. For example, due to 
phase ambiguity, the detection process for QPSK cannot differentiate between $a_n$ and 
$a_n \exp(j\pi/2)$. Therefore false-lock occurs if the symbol to symbol phase change due to 
frequency error is larger than $\pi/4$. Depending on the loop bandwidth, fast acquisition can 
only be achieved for a frequency error that is an order of magnitude smaller. This is because 
a second order loop behaviour is valid only if the initial and the final phase angle after 
carrier synchronization differ by a value smaller than $\pi/4$. That is, the symbol to symbol 
phase change due to frequency error does not cross the phase detection threshold. The 
larger the frequency error, the more frequently the phase detection threshold is crossed. The 
effect is similar to frequent decision errors that increase acquisition time. A direct 
estimation of the frequency error is required for fast acquisition.
Carrier Frequency
The balanced quadricorrelator [Gardner85a] operating on the preamble pattern is given by
$$u_n = -2A^2 \cos^2\alpha \, \sin\omega$$
Acquisition completes in 80 symbols. The real part can also be computed to reduce the 
effect of power level variation (hence fading) as in phase estimation. The dependency on 
timing error can be eliminated by the following algorithm:
$$u_n = -2A^2 \sin\omega$$
Although Gardner showed that the quadricorrelator works under random data symbols, 
our simulation study shows that this algorithm is impractical for fast acquisition. Due to 
the random data symbols, large frequency jitters are unavoidable. Therefore after changing 
over to a phase-lock-loop, the acquisition speed is still slow. This is confirmed 
independently in [Hespelt88a], and led to the development of the dual-filter detector. 
However, the two extra filters require many more instruction cycles to realise than the 
quadricorrelator. We use an opposite approach based on differential decisions to avoid 
large jitters [Yim88a]. The complexity of the resulting detector is similar to the 
quadricorrelator, enabling a single-processor implementation of four demodulators. When 
the timing error is zero, the quadricorrelator becomes:
$$2A^2 e^{j(\omega - r\pi/2)} = a + jb$$
where $r$ can take on values of 0, 1, 2 or 3 to account for phase modulation. The real and 
imaginary components are given by:
$$\frac{a}{2A^2} = \begin{cases} \cos\omega, & r = 0 \\ \sin\omega, & r = 1 \\ -\cos\omega, & r = 2 \\ -\sin\omega, & r = 3 \end{cases} \qquad \frac{b}{2A^2} = \begin{cases} \sin\omega, & r = 0 \\ -\cos\omega, & r = 1 \\ -\sin\omega, & r = 2 \\ \cos\omega, & r = 3 \end{cases}$$
If $|\omega| < \pi/4$, then $|\sin\omega| < |\cos\omega|$. Therefore $r$ can be resolved by differential decision. The 
frequency error is given by
$$u_n = \begin{cases} b\,\mathrm{sgn}(a), & |a| > |b| \\ -a\,\mathrm{sgn}(b), & |a| < |b| \end{cases}$$
Although this detector is decision directed, the simulated performance is satisfactory even 
at a low $E_b/N_0$ value of 6 dB. The differential decision FED is shown in figure 5.18. 
Combined with Gardner's timing error detector, acquisition under random data pattern 
completes in 200 symbols. This compares favourably with the initial design by ANT 
[Hespelt88a], which used the quadricorrelator. The acquisition time is 1000 symbols, and 
requires complicated trial-and-error type control. The initial frequency and timing 
acquisition is illustrated in figure 5.19. A switch over to a phase-lock-loop is necessary 
for coherent detection. There are limitations to this combination. Since a raised cosine 
pulse shape is assumed, the decision is performed after the matched filter. Large initial 
frequency errors lead to distorted pulse shapes. Yet the limiting factor is the false lock 
problem imposed by the condition on $\omega$ above. For efficient bandwidth utilisation, the 
allowable frequency error is usually small such that false lock will not occur. The exception 
is for very low bit rate systems where the Doppler effect causes significant frequency error 
compared to the symbol rate. Here the differential decisions should be disabled initially 
to form a quadricorrelator, which has a larger pull-in range.
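The differential decision rule can be sketched in a few lines of Python. The input is modelled, as an assumption consistent with the case table for $a$ and $b$, as $a + jb = 2A^2 e^{j(\omega - r\pi/2)}$ with $r$ the unknown phase modulation step; the function name is illustrative and is not taken from the T-SAT code.

```python
import cmath
import math

def sgn(value):
    """Hard decision on the sign of a component."""
    return 1.0 if value >= 0.0 else -1.0

def differential_decision_fed(a, b):
    """Frequency error detector output after removing the modulation step r."""
    if abs(a) > abs(b):
        return b * sgn(a)
    return -a * sgn(b)

amplitude = 1.0
omega = 0.2                      # must satisfy |omega| < pi/4
outputs = []
for r in range(4):
    term = 2.0 * amplitude ** 2 * cmath.exp(1j * (omega - r * math.pi / 2))
    outputs.append(differential_decision_fed(term.real, term.imag))
print([round(v, 5) for v in outputs])
```

All four values of $r$ give the same output, $2A^2 \sin\omega$, so the modulation is removed without a separate data decision; once $|\omega|$ exceeds $\pi/4$ the comparison selects the wrong branch, which is the false lock condition mentioned above.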
Figure 5.18 Differential Decision Frequency Error Detector (blocks: MUX, SGN, ABS, COMP, ABS).
Figure 5.19 Scatter Diagram Showing Timing and Frequency Acquisition
5.2.2.2 Digital Timing Correction
An important classification criterion for digital modems is the use of digital timing correction. 
If there is system-wide clock synchronization or, more commonly, the sampling instant is 
adjusted to track the optimum decision instant through a hybrid loop, one sample per 
symbol is sufficient for demodulation. For MCD applications with non-synchronised 
SCPC signals, the single on-board A/D converter cannot be synchronised to all channels. 
Therefore a fully digital algorithm is required. This involves obtaining the signal value 
near to the optimum decision instant, and taking into account the difference in mobile and 
satellite clocks.
A polyphase interpolation structure has been used for timing correction. Variable 
inter-sample delays can be realised from the sufficiently sampled input signal, by selecting 
one set of coefficients out of an array of sub-filters. The array size determines the delay 
resolution while the processing load remains practically constant. Direct application of 
digital interpolation is found to be impractical for timing synchronization. An original 
method using an extended array of polyphase sub-filters was developed for the T-SAT 
demodulator [Yim89a], which is a special case of the variable polyphase-lattice structure 
discussed in chapter 4.
To account for the difference between mobile and satellite clocks, let the sampling period 
be $T_s/2$, i.e., approximately two samples per symbol. Due to component tolerance, the 
transmitted symbol clock period is $\sigma T_s$. A straightforward interpolation filter has a 
timing correction range of $[0, T_s/2]$, but the required range is one symbol interval, 
$[-T_s/2, T_s/2]$. This two-sided range can be dealt with by a change of reference, but some 
form of correction range extension is implied. If $\sigma \neq 1$, the timing error accumulator can 
take on a value outside $[-T_s/2, T_s/2]$, occurring at a frequency depending on $\sigma$. This 
overflow/underflow condition should be used as a control signal such that the demodulator 
operates asynchronously with respect to the sampling clock. The accumulator value is 
then adjusted by $\pm T_s$ accordingly whenever it is out of range. In general, a buffer is required 
for the asynchronous operation.
For a group of SCPC signals that are not synchronized in timing, the clock tolerance 
problem reappears when the bit streams are to be subsequently multiplexed. Therefore 
independent buffers for each channel are not necessary. This enables efficient time 
multiplexing of demodulator operations. The accumulator value should always be adjusted 
to within one symbol interval. Each adjustment leads to a missing or extra symbol, and 
causes loss of frame synchronization. This can occur at any instant after initial timing
acquisition, although the frequency of occurrence is very low for high quality crystal 
controlled oscillators. If not carefully controlled, the accumulator value may oscillate 
between boundaries of the timing range, causing burst errors. This oscillation can be 
eliminated if the correction range is extended to $[-T_s, T_s]$ and the accumulator is zeroed 
whenever this range is exceeded. An important consequence is that a minimum trouble-free 
period is guaranteed after initial acquisition. This implies that burst mode operation 
will not be affected for commonly used packet sizes. Also, the trouble-free period can be 
arbitrarily extended at the expense of additional symbol delay.
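The guaranteed trouble-free period can be illustrated with a simple accumulator model. The Python sketch below is an assumption-laden illustration rather than the T-SAT control logic: the accumulator is held in millionths of a symbol period so that the arithmetic is exact, the clock mismatch of 100 parts per million is a hypothetical value, and the worst case is taken as an accumulator that starts at the edge of the one-symbol range after acquisition.

```python
def symbols_until_reset(drift_ppm, start_micro=0, limit_micro=1_000_000):
    """Count symbols until the accumulator leaves [-limit, +limit].

    Units are millionths of a symbol period. Each symbol adds the clock
    mismatch (sigma - 1), expressed here in parts per million.
    """
    if drift_ppm == 0:
        raise ValueError("a non-zero clock mismatch is required")
    acc = start_micro
    count = 0
    while abs(acc) <= limit_micro:
        acc += drift_ppm
        count += 1
    return count

# Extended range [-T_s, +T_s], starting from the worst case of +T_s/2.
extended = symbols_until_reset(drift_ppm=100, start_micro=500_000)

# One-symbol range [-T_s/2, +T_s/2] with the same starting point.
basic = symbols_until_reset(drift_ppm=100, start_micro=500_000,
                            limit_micro=500_000)

print(f"symbols before first adjustment, extended range: {extended}")
print(f"symbols before first adjustment, basic range   : {basic}")
```

With the basic range an adjustment can be needed immediately after acquisition, whereas the extended range guarantees a drift of at least half a symbol period before the first one.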
The range of timing correction is conveniently increased by an extended array of polyphase 
filter coefficients. Figure 5.20 shows a simplified illustration. The prototype filter length 
is 12 with an interpolation factor of 3, i.e., a timing resolution of $T_s/6$.
Interpolation sub-filters (correction range: half symbol interval)
h(0)   h(3)   h(6)   h(9)
h(1)   h(4)   h(7)   h(10)
h(2)   h(5)   h(8)   h(11)

Extended sub-filter ROM, feeding the MAC and shift register (correction range: 2 symbol intervals)
0      0      0      h(0)   h(3)   h(6)   h(9)
0      0      0      h(1)   h(4)   h(7)   h(10)
0      0      0      h(2)   h(5)   h(8)   h(11)
0      0      h(0)   h(3)   h(6)   h(9)   0
0      0      h(1)   h(4)   h(7)   h(10)  0
0      0      h(2)   h(5)   h(8)   h(11)  0
0      h(0)   h(3)   h(6)   h(9)   0      0
0      h(1)   h(4)   h(7)   h(10)  0      0
0      h(2)   h(5)   h(8)   h(11)  0      0
h(0)   h(3)   h(6)   h(9)   0      0      0
h(1)   h(4)   h(7)   h(10)  0      0      0
h(2)   h(5)   h(8)   h(11)  0      0      0

Figure 5.20 Extended Array of Polyphase Sub-filters for Digital Timing Correction
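The extended array of figure 5.20 can be generated mechanically from the prototype filter. The following Python sketch uses coefficient labels instead of numerical values; the function names and the `extra_shifts` parameter are illustrative assumptions. Each additional shift of the sub-filters against the shift register adds one sample period, $T_s/2$, to the correction range.

```python
def polyphase_subfilters(taps, factor):
    """Split a prototype filter into `factor` polyphase sub-filters."""
    return [taps[p::factor] for p in range(factor)]

def extended_rom(taps, factor, extra_shifts):
    """Build the extended ROM: every sub-filter at every whole-sample shift."""
    rows = []
    for shift in range(extra_shifts, -1, -1):
        for branch in polyphase_subfilters(taps, factor):
            rows.append(["0"] * shift + branch + ["0"] * (extra_shifts - shift))
    return rows

labels = [f"h({i})" for i in range(12)]   # prototype of length 12
rom = extended_rom(labels, factor=3, extra_shifts=3)

for row in rom:
    print("  ".join(f"{entry:6s}" for entry in row))

# Each row is one timing step of T_s/6, so the range in symbol periods is:
correction_range = len(rom) / 6
```

The twelve rows at a resolution of $T_s/6$ span two symbol intervals, while the convolution length seen by the MAC grows only from 4 to 7 taps, most of which are zero for any given row.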
5.3 Computer-Aided Design and Simulation Study
Much effort has gone into simulation to: discover the problems of proposed algorithms; 
devise remedies; select a large number of design parameters to meet the system 
specifications; perform trade-offs to reduce the number of instruction cycles. This section 
presents the main results for the T-SAT MCD specification. More detailed descriptions of
the selected algorithms can be found in [Ahmad90a] (a simulation study using the software 
developed for T-SAT) and [Voyant90a] (an implementation of a single-channel modem 
using the T-SAT algorithms and TMS320C25 signal processor).
Because of the limited processing power, algorithms of low complexity were selected that 
make use of separate parameter estimation. That is, in the estimation of a single parameter 
(e.g., phase), all other parameters are assumed synchronised (e.g., symbol timing). 
Therefore reliable signal acquisition is not guaranteed. Simulation is required to analyse 
the transient behaviour of the complete set of algorithms, and to search for the loop 
constants that allow reliable acquisition. The demodulators are reconfigured at 
predetermined intervals for acquisition and tracking. Acquisition with training patterns 
(burst mode) can be completed in 80 symbols at a signal attenuation level of up to 6 dB. 
The transient behaviours of various loops are shown in figure 5.21. Acquisition with 
random patterns can be achieved in 200 symbols (including worst case timing offset) by 
employing the differential decision frequency error detector previously described.
Simulation is also performed for the trade-off between overall complexity and 
performance. The selected parameters and simulated performances are summarised in tables
5.2 and 5.3. The total degradation is less than 1 dB at an available $E_b/N_0$ of 14.5 dB. The 
bracketed terms correspond to burst mode operation requiring larger loop bandwidth for 
faster acquisition.
Figure 5.21 Burst-Mode Acquisition: (a) Symbol Timing Transient Response, (b) Frequency Transient Response, (c) Phase Transient Response
Table 5.2 T-SAT MCD Parameters
Parameter                               Value
Input Sampling Frequencies
  TMS1 (Root Node)                      256 kHz
  TMS2 (Tree Demultiplexer)             128 kHz
  TMS3 (Half-Band Decimator)            32 kHz
  TMS4 (Demodulator Array)              16 kHz
A/D Word Length                         8 bits
Dynamic Range Margin                    3 dB
Signal Processor Word Length            16 bits
Demultiplexer Node Filter Length        7
Half-Band Decimator Filter Length       19
Matched Filter Length                   15
Timing Resolution                       $T_s/32$
Phase Resolution                        $\pi/256$
Table 5.3 T-SAT MCD Implementation Loss (dB)
Source of degradation                   Loss (dB)
Adjacent channel interference           0.25
Analogue anti-aliasing filter           0.14
Analogue to digital converter           0.02
Demultiplexer                           0.11
Data filter                             0.02
Carrier recovery                        0.10 (0.37)
Clock recovery                          0.04
Total                                   0.68 (0.95)
The evaluation of implementation loss can be depicted by scatter diagrams. The data 
symbols, $a_n$, for QPSK have four possible complex values, forming the corners of a square 
in the complex plane (the constellation). For noiseless channels and ideal receivers, the
plot of data strobe values on the complex plane is identical with the constellation in shape. 
Figure 5.22 shows a scatter diagram in the noiseless case. The main degradation here is 
due to filtering, leading to the spread of data strobe values from the four constellation 
points. The data strobe values spread further in the presence of white noise, as shown in figure 
5.23. If displayed in real-time as in a constellation analyzer, most of the outer points in 
each of the four quadrants will not be detected by the human eye, due to their low rate of 
occurrence.
Figure 5.22 Scatter Diagram, $E_b/N_0 = \infty$
Figure 5.23 Scatter Diagram, $E_b/N_0 = 14.5$ dB
For ideal receivers, the signal-to-noise ratio ($S/N$) at the data strobe is a simple function 
of $E_b/N_0$. Comparing the $S/N$ for a practical receiver with that of an ideal one yields the 
implementation loss. Assuming that the total noise is Gaussian, the bit-error-rate can be 
found from standard tables, based on the measured value of $S/N$.
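The table look-up mentioned above can be replaced by a direct calculation. The sketch below is a Python illustration that assumes coherent, Gray-coded QPSK in additive white Gaussian noise, for which the textbook bit-error-rate is $\frac{1}{2}\,\mathrm{erfc}\sqrt{E_b/N_0}$, and that treats the implementation loss as an equivalent reduction in $E_b/N_0$. The function name is illustrative; the 14.5 dB and 0.68 dB figures are those quoted in this section.

```python
import math

def qpsk_bit_error_rate(ebn0_db):
    """Bit-error-rate of coherent Gray-coded QPSK in white Gaussian noise."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))

AVAILABLE_EBN0_DB = 14.5
IMPLEMENTATION_LOSS_DB = 0.68    # continuous mode total from table 5.3

ideal = qpsk_bit_error_rate(AVAILABLE_EBN0_DB)
practical = qpsk_bit_error_rate(AVAILABLE_EBN0_DB - IMPLEMENTATION_LOSS_DB)

print(f"ideal receiver    : {ideal:.3e}")
print(f"with 0.68 dB loss : {practical:.3e}")
```

The comparison only holds to the extent that the residual degradations behave like additional Gaussian noise, which is the assumption stated in the text.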
5.4 Testing
The MCD test-bed is shown in figure 5.24. The main difficulty of testing is due to the low 
bit rate (16 kb/s) relative to the I.F. frequency (70 MHz). Packing SCPC signals into the 
narrow channel spacing allocated requires special attention to the tolerance of high 
frequency components. Another difficulty is the non-synchronous sampling of all-digital 
demodulators. Conventional bit-error-rate testers cannot be used directly. Therefore most 
of the test-bed equipment and interface circuits are custom designed.
Figure 5.24 MCD Test-Bed (blocks: noise source, ROM and D/A modulator at I.F., ROM and D/A wave generator at I.F., 70 MHz multicarrier demodulator, TMS on-board processor, transient capture, custom BER counter)
5.4.1 ROM-Based Flexible Modulator
Narrow-band QPSK is theoretically bandlimited. Using analogue implementation, 
approximate band limitation can only be achieved through complicated filter design. Strong 
components often appear in the spectral locations that correspond to the side-lobes of 
wide-band QPSK. A high quality modulator using DSP is simple in design and 
implementation. Ideal baseband signal synthesis for QPSK is the filtering of complex 
bipolar unit impulses with a linear phase root-raised-cosine filter. The bipolar impulses 
represent the binary data symbols. This can be directly implemented using DSP with
perfectly linear phase FIR filters. The only practical limitations are the quantization and 
filter shape. Both degradations can be made negligibly small using over-sampling and 
long filters.
Because of the zero valued samples, the filtering of impulses has the same structure as 
interpolation. The polyphase method has been used for modulated signal generation in 
simulation since the beginning of the research work in this thesis, well before the recent 
attention in the literature towards flexible and efficient DSP modulators. This illustrates 
one of the advantages of an experimental approach towards simulation. Any of the 
simulated algorithms can be directly implemented with high efficiency. The package 
TOPSIM in  uses square pulses and inverse sinc compensation to simulate typical analogue 
generation methods. One of the SPW brochures shows an inefficient single-rate digital 
modulator implementation involving multiplication by zeros. At the time of writing, the 
COSSAP package generates raised-cosine filtered signals using frequency domain 
methods.
A 128 kHz sampling rate and a 128-tap FIR filter are used. Applying multirate techniques, 
the processing can be performed efficiently using 16 polyphase sub-filters with 8 taps each. 
Using Distributed Arithmetic, a FIR filter can be implemented using one ROM lookup 
table and one adder. For QPSK, no adders are required because the bipolar impulses are 
represented by one-bit words. The signal is stored in an 8-bit shift register that is shared 
amongst sub-filters. Each sub-filter can be implemented by one ROM with 8-bit address. 
Alternatively, the 16 ROMs can be combined into one with 12-bit address, i.e., a memory 
size of 4K words.
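The ROM organisation described above can be sketched for one rail as follows. This is a minimal illustration under stated assumptions: the prototype filter is a Hann-windowed sinc placeholder rather than the root-raised-cosine response of the thesis, the ROM words are left unquantized, and the address layout (4-bit phase above the 8-bit shift register) is one possible choice. All names are hypothetical.

```python
import math

L = 16                  # interpolation factor = number of polyphase sub-filters
TAPS_PER_PHASE = 8      # symbols held in the shift register
N = L * TAPS_PER_PHASE  # 128-tap prototype filter

def prototype_filter():
    """Placeholder low-pass prototype (Hann-windowed sinc), NOT the thesis's
    root-raised-cosine filter; any 128-tap filter exercises the structure."""
    h = []
    for n in range(N):
        t = (n - (N - 1) / 2) / L  # time in symbol periods
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / N)
        h.append(sinc * window)
    return h

def build_rom(h):
    """One word per (phase, register contents): 16 * 256 = 4096 words."""
    rom = [0.0] * (L << TAPS_PER_PHASE)
    for phase in range(L):
        for reg in range(1 << TAPS_PER_PHASE):
            acc = 0.0
            for k in range(TAPS_PER_PHASE):          # bit k = symbol k periods ago
                sym = 1.0 if (reg >> k) & 1 else -1.0
                acc += h[phase + L * k] * sym
            rom[(phase << TAPS_PER_PHASE) | reg] = acc
    return rom

def modulate_rom(bits, rom):
    """Shift register plus phase counter address the ROM; no adders needed."""
    reg, out = 0, []
    for b in bits:
        reg = ((reg << 1) | (b & 1)) & ((1 << TAPS_PER_PHASE) - 1)
        for phase in range(L):
            out.append(rom[(phase << TAPS_PER_PHASE) | reg])
    return out

def modulate_direct(bits, h):
    """Reference: filter zero-stuffed bipolar impulses at the high rate.
    Symbols before the start are taken as -1 (register initially all zeros)."""
    def sym(j):
        return (1.0 if bits[j] else -1.0) if j >= 0 else -1.0
    out = []
    for m in range(L * len(bits)):
        acc = 0.0
        for n in range(N):
            if (m - n) % L == 0:                     # only non-zero input samples
                acc += h[n] * sym((m - n) // L)
        out.append(acc)
    return out
```

Comparing `modulate_rom` with `modulate_direct` on the same bits shows that the table look-up reproduces the single-rate filter output while skipping every multiplication by zero.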
The ROM-based modulator consists of a digital board and an analogue board. The digital 
board contains flip-flops to split the data sequence into real and imaginary streams. Two 
8-bit shift registers and a 4-bit counter are used for the ROM addressing logic. Since the root 
raised-cosine filter is real, only one ROM is necessary for filtering both the real and 
imaginary impulses. However, two identical 8K × 8 ROMs are used, with 4 kbytes for signal 
synthesis and 4 kbytes reserved for modulator calibration signals. Two separate ROMs 
allow simpler address logic and faster access, and are more flexible for other modulation 
schemes where non-identical ROMs are required. An external data source or an internal 
pseudo-random bit sequence generator with period 2^ can be selected through 
DIP-switches.
The analogue board is for D/A conversion and quadrature frequency shifting to the 70 MHz 
I.F. Two 8-bit current-source D/A converters with 90 ns settling time are used for the complex 
baseband signal synthesis. Current to voltage conversion is performed using high-speed 
dual op-amps. Two 4th order Bessel filters with linear phase characteristics are used for 
anti-imaging. The high over-sampling factor allows near linear phase passband response, 
0.4 dB maximum passband attenuation, and 40 dB minimum attenuation for the first image.
Complex analogue baseband signals together with a quadrature modulator simplify the digital 
hardware and allow flexibility in I.F. selection. Alternatively, the quadrature frequency shifting can be 
performed digitally with a low I.F. Only a real analogue signal is then produced and therefore 
a balanced modulator can be used. However, a narrow band SAW filter (or a linear phase 
filter) is required for suppressing the upper or lower sideband as in single side-band 
modulation. No unwanted side-band is generated using a quadrature modulator, so only a wide-band 
filter is required for suppressing the oscillator harmonics. Therefore the I.F. can 
be varied provided that the modulated signal falls inside the passband of the wide-band 
filter. The disadvantages of complex analogue signals are d.c. offset adjustment and gain 
matching. These are easily overcome by calibration using analytic signals. A complex 
tone is generated digitally, using vacant ROM locations in the modulator, or a TMS 
development board with data downloaded from a PC. With the aid of a spectrum analyser, 
the carrier residue (due to d.c. offset) and the sideband residue (due to unmatched gain) 
can be adjusted using variable resistors. These errors can be reduced to values less than 
-50 dB with respect to the signal power.
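To give a feel for what -50 dB implies, the two residues can be related to the underlying errors with a short calculation. This is a sketch under the assumption of a unit-amplitude complex calibration tone, a Q-rail gain of g times the I-rail gain and no quadrature phase error, so that cos(wt) + j g sin(wt) splits into a wanted tone of amplitude (1+g)/2 and an image of amplitude (1-g)/2. The function names are illustrative only.

```python
import math

def sideband_residue_db(gain_ratio):
    """Image (unwanted sideband) level relative to the wanted tone when the
    Q-rail gain is gain_ratio times the I-rail gain and the phase is ideal."""
    if gain_ratio == 1.0:
        return -math.inf
    return 20.0 * math.log10(abs(1.0 - gain_ratio) / (1.0 + gain_ratio))

def carrier_residue_db(dc_offset, tone_amplitude=1.0):
    """Carrier leakage relative to the tone for a (possibly complex) d.c. offset."""
    if dc_offset == 0:
        return -math.inf
    return 20.0 * math.log10(abs(dc_offset) / tone_amplitude)

def gain_ratio_for_residue(residue_db):
    """Gain ratio (below unity) that produces the given sideband residue."""
    r = 10.0 ** (residue_db / 20.0)
    return (1.0 - r) / (1.0 + r)
```

Under these assumptions, a -50 dB sideband residue corresponds to a gain ratio of about 0.9937, i.e. the two rails matched to within roughly 0.055 dB, and a -50 dB carrier residue corresponds to a d.c. offset of about 0.3% of the tone amplitude.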
Using DSP techniques, we have achieved a test quality, flexible I.F. modulator, with 
adjacent channel interference, carrier residue and sideband residue approaching the noise 
floor of a typical spectrum analyser. The observed eye pattern and signal spectrum can 
hardly be distinguished from the simulated signals shown in chapter 4.
5.4.2 Arbitrary Wave-form Generator
To test the performance of the T-SAT MCD, 4 or more channels are required. To reduce 
the amount of testing hardware, multiple channels are generated together using DSP such 
that only a single oscillator and a single analogue board are required. The analogue board 
is identical with that used in the ROM-based modulator. The digital board is a simple 
ROM look-up table. The digital sequences of all channels are precalculated, multiplexed 
and stored in ROM. The power level of each channel is compensated for the attenuation 
due to the Bessel anti-imaging filters. The multichannel wave-form is periodic, 
corresponding to a 1024-bit pseudo-random sequence. This digital board can be replaced by 
a TMS development board for more flexible wave-form generation, such as varying carrier 
offset and co-channel interference.
5.4.3 Transient Wave-form Recorder
A TMS development board with custom software is used to capture various signals internal 
to demodulators, mainly for acquisition performance analysis. In this testing mode, one 
of the demodulators in the MCD multiprocessor board is turned into a pure buffer. The 
demodulator software is executed on the TMS development board for monitoring. 
Therefore various internal transient signals and output signals can be stored in RAM. These 
wave-forms are retrieved by the host PC for display and recording.
5.4.4 Custom Bit Error Rate Monitor
The data clock of an analogue receiver is synchronised to that of the transmitter. In contrast, 
an all-digital receiver uses interpolation techniques that lead to asynchronous operation. 
Data clock frequencies in the receiver and transmitter are equal only in the long term 
average. This requires complex buffering hardware for interfacing to a conventional 
synchronous BER monitor. Therefore a software oriented BER monitor is implemented 
using a TMS development board as the high-speed front-end processing element, and a 
PC as an analyser and display unit.
T-SAT MCD operates in non-synchronous mode. Symbol skipping or doubling is allowed 
to avoid multiplexing data streams with slightly different bit rates. The frame structure is 
therefore occasionally destroyed during a continuous transmission, leading to burst errors. Although 
these burst errors are infrequent when high accuracy oscillators are used, counting burst errors 
separately gives a more accurate measure of the demodulator performance. More importantly, once 
a burst error occurs, counting cannot continue until the transmitted 
and received data sequences are realigned. Therefore automatic realignment is necessary for testing over 
a long period, which is required for high Eb/No cases.
The inputs to the monitor are the transmitted and demodulated data sequences. Because 
of the asynchronous operation, these data sequences are buffered using FIFOs in a digital 
interface board. Frame synchronization is performed using a 32-bit convolution, to 
determine the delay and phase ambiguity prior to error counting. Error counting is 
automatically disabled when frame synchronization is lost due to burst errors or cycle slips, 
and resumed after automatic resynchronization. The 32-bit convolution is performed by 
table look-up techniques. In the simplest case, a ROM with a 32-bit address can be used.
For a feasible memory size, the look-up procedure is decomposed by using 256-byte tables 
arranged in a tree structure. These complicated tables are generated and verified using 
simulation software. The look-up procedure itself is simple in software, with adequate 
operating speed for testing a single channel. The raw counts generated by the TMS board 
are analysed by the PC host. The BER, burst error rate and counting time are calculated 
and displayed in real time.
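The idea of performing a 32-bit comparison with small look-up tables can be sketched as below. This is not the tree of tables used in the monitor, whose contents are not given here. It is a simpler stand-in that splits the 32-bit word into four bytes and uses a single 256-entry bit-count table, and it resolves only a delay and a polarity inversion. The threshold and all names are assumptions made for illustration.

```python
POPCOUNT = [bin(i).count("1") for i in range(256)]  # 256-entry look-up table

def pack32(bits):
    """Pack the first 32 bits of a sequence into an integer, first bit as MSB."""
    word = 0
    for b in bits[:32]:
        word = (word << 1) | (b & 1)
    return word

def agreements(a, b):
    """Number of matching bit positions between two 32-bit words."""
    x = (a ^ b) & 0xFFFFFFFF
    errors = sum(POPCOUNT[(x >> s) & 0xFF] for s in (0, 8, 16, 24))
    return 32 - errors

def find_alignment(tx_bits, rx_bits, max_delay, threshold=30):
    """Return (delay, inverted) of the first window passing the threshold,
    or None if frame synchronization is not found."""
    ref = pack32(tx_bits)
    for delay in range(max_delay + 1):
        window = rx_bits[delay:delay + 32]
        if len(window) < 32:
            break
        score = agreements(ref, pack32(window))
        if score >= threshold:
            return delay, False
        if 32 - score >= threshold:      # same pattern with inverted polarity
            return delay, True
    return None

def count_errors(tx_bits, rx_bits, delay, inverted):
    """Bit errors after removing the detected delay and inversion."""
    flip = 1 if inverted else 0
    return sum(t != (r ^ flip) for t, r in zip(tx_bits, rx_bits[delay:]))
```

A monitor built this way would call `find_alignment` again whenever the running error count indicates a loss of synchronization, and resume `count_errors` from the new delay.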
5.4.5 Test results
Figure 5.25 shows the acquisition performance under random data symbols. This is a time 
plot of the in-phase (or quadrature-phase) data strobe values that are used for data decision, 
based on the polarity. The transient signal obtained is stored in real time during the acquisition 
test. Synchronization is achieved within 160 symbols as shown. This is better than the 
simulated performance of 200 symbols because the worst case initial timing error is difficult 
to generate experimentally. Here the initial carrier frequency offset is 900 Hz, higher than 
the 600 Hz specified to allow for Doppler shift.
Figure 5.25 Data Strobe Values Captured in Acquisition Test (horizontal axis: time in symbols, 0 to 384)
Figure 5.26 shows the BER curve. The maximum degradation is approximately 0.7 dB for 
the measured range. Measurements at high Eb/No values have not been attempted. Using 
true error counting, the measurement time required is excessive for accurate results in low 
bit-rate systems.
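The scale of the problem can be seen from a rough calculation. The sketch below assumes that about 100 counted errors are wanted for a reasonably stable estimate. That figure is a common rule of thumb introduced here for illustration and is not taken from the test-bed. The 16 kb/s bit rate is that of a single T-SAT channel.

```python
def measurement_time_s(target_ber, bit_rate, errors_needed=100):
    """Seconds of continuous transmission needed to observe, on average,
    errors_needed bit errors at the given bit error rate and bit rate."""
    return errors_needed / (target_ber * bit_rate)

# At 16 kb/s: a BER of 1e-6 needs about 6250 s (under two hours), whereas
# a BER of 1e-8 needs about 625 000 s, i.e. roughly a week.
hours_at_1e6 = measurement_time_s(1e-6, 16_000) / 3600.0
days_at_1e8 = measurement_time_s(1e-8, 16_000) / 86_400.0
```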
Figure 5.26 Bit Error Rate Measurements (bit error rate, 1E-2 to 1E-8, against Eb/No from 4 to 12 dB)
5.5 Complexity of Multicarrier Demodulators
Based on the T-SAT project, this section discusses the complexity of MCDs, including 
implementations using digital signal processors (DSPs) and ASICs. In VLSI circuits, the 
chip area is ideally proportional to the throughput required, i.e., bit rate times number of 
channels. This is most likely to be achieved by ASICs, through algorithm, architecture, 
computational structure and gate-level design. The hardware architectures of DSPs are 
fixed, and all operations are apparently performed in serial. However, the programmable 
feature allows efficient use of chip area by the mapping of appropriate algorithms onto the 
right processor.
5.5.1 Complexity of T-SAT MCD
The complete 4-channel MCD is realised using a pipeline of 4 TMS320C25 digital signal 
processors (DSPs). Initially, we adopted 3 commercial cards, each with one processor, 
additional memory, A/D converters etc. We actually need only 3 processor chips and 
simple connections. In the final custom hardware card, a fourth processor is added because 
the SAW filter bandwidth is wider than required.
Table 5.4 summarises the utilization of DSPs. The software has undergone several 
revisions, due to changes in hardware and testing etc. Therefore the quoted figures give 
an accurate representation of the overall complexity, but are not necessarily identical with 
those of the final version under test.
Table 5.4 MCD Processor Utilization
Processor  Function             Utilization      Memory Size (16-bit words)
                                (% of 10 MIPS)   ROM      RAM
TMS1       Root Node            94%              53       20
TMS2       Tree Demultiplexer   97%              302      75
TMS3       Half-band Decimator  60%              358      82
TMS4       Demodulator          86%              1851     240
We need less than 256 on-chip RAM locations for data plus another 256 RAM locations 
for time critical code. The 4096 words in the on-chip ROM (or possibly EPROM) are 
enough for the program in each block; we actually use less than 2048 in any block. 
Initially, communication between DSPs was by the high speed serial port, which is capable of working 
up to 5 MHz, whereas we need only 4 MHz. One lesson learnt was that the apparently simple 
task of inter-module communication between DSPs could require a considerable number of 
processor cycles. A software cyclic buffer requires up to 26% of processor cycles. Therefore 
we finally use external FIFOs to allow additional code for testing, and to improve the 
software structure.
The partitioning is that the first processor (TMS1) is the root node of an 8-channel tree 
demultiplexer, not required for 4 channels if a custom SAW filter is obtainable. The second 
processor (TMS2) is the 4-channel tree filter bank — the processing load is dependent on 
the number of channels and carrier spacing, which determines the sampling rate. The third 
processor (TMS3) provides a half-band decimation operation — the processing load 
depends on both channel spacing and modulation bandwidth. Although the amount of 
computations and memory required are significant, this processor is under-used and can 
be integrated with other functions to reduce complexity. In addition, the processing load 
of this stage only varies linearly with the bit rate or number of channels. The fourth processor
(TMS4) provides 4 demodulators and some formatting operations. These formatting 
operations, especially the unique word detection, require about 30% of the processing 
operations. These bit manipulations were finally performed in a custom bit-error-rate tester 
to allow for monitoring software in the demodulator. It proved difficult to fit the required 
tree-filter into a single chip so the quoted program sizes etc reflect considerable 
optimization effort. Other processors have not yet needed such attention so quoted figures 
should be taken as a guide rather than as best case figures.
Based on the T-SAT experience, we estimate the complexity of MCDs as the bit rate or 
number of channels increases [Yim88a]. We concentrate on the 4-channel tree hierarchy 
and the array of 4 demodulators. Each function is implemented using a single processor 
(TMS2 and TMS4). The number of processors required by the demodulator array grows 
linearly. This is not so for the tree hierarchy. For example, 3 processors are required for 
an 8 × 16 kb/s demultiplexer with 7 nodes, or 4 × 32 kb/s with 3 nodes. Above these limits, 
not all the tree nodes can be implemented using a single processor. However, the TMS 
can still be used as building blocks for lower speed tree nodes up to 64 kb/s. This bit rate 
is also the approximate limit for single processor implementation of a demodulator. For 
bit rates lower than 16 kb/s, more opportunities exist for program code optimization.
5.5.2 ASIC Implementations
The complexity of ASIC implementations can be seen in related projects, adapting the tree 
demultiplexer algorithms for bit-serial processing [Bi90a] and distributed arithmetic 
[Bi90b]. Table 5.5 shows the comparison with the T-SAT implementation. In the ASIC 
for the tree hierarchy corresponding to TMS2, an optimum sampling rate is chosen in order 
to exploit filter symmetry. Therefore a sampling rate converter is required instead of a 
half-band decimator (TMS3).
Table 5.5 Comparison of ASIC and TMS Implementation
MCD Implementation       ASIC                      T-SAT
Bit Rate (kb/s)          9.6                       16
No. of Channels          4                         4
Function                 Tree    Sampling Rate     Tree    Half-Band
                                 Converter                 Decimator
Chip Area (mm^2)         37      43                290     290
Equivalent Chip Area     15      18                73      73
per 16 kb/s Channel
Relative Complexity      0.21    0.25              1       1
Since the system parameters for the demultiplexers are identical except for bit rate, an accurate 
comparison of chip area can be given by direct scaling, assuming the ideal case that chip 
area is proportional to throughput. The complexity using ASIC is approximately a quarter 
of that using the TMS320C25. Some care should be taken in interpreting the figures. Some 
chip areas in the C25 provide features that are not used: 4K on-chip ROM; serial I/O; 
companding.
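The direct scaling behind Table 5.5 can be reproduced with a few lines of arithmetic. The sketch below only restates the proportionality assumption made above, with chip area divided by the number of channels and scaled by the ratio of 16 kb/s to the actual bit rate. The function name is illustrative.

```python
def area_per_16k_channel(chip_area_mm2, channels, bit_rate_kbps):
    """Chip area normalised to one 16 kb/s channel, assuming area is
    proportional to throughput (bit rate times number of channels)."""
    return chip_area_mm2 / channels * (16.0 / bit_rate_kbps)

asic_tree = area_per_16k_channel(37, 4, 9.6)       # about 15.4, quoted as 15
asic_converter = area_per_16k_channel(43, 4, 9.6)  # about 17.9, quoted as 18
tms_chip = area_per_16k_channel(290, 4, 16)        # 72.5, quoted as 73

relative_tree = asic_tree / tms_chip               # about 0.21
relative_converter = asic_converter / tms_chip     # about 0.25
```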
5.6 Signal Processor Architectures and Software
This section summarises the TMS320C25 architecture, and compares it with other DSPs, 
including associated software development tools. This would be relevant in selecting DSPs, 
and estimating the system complexity.
5.6.1 Implementation of FIR Filtering
In performing the convolution of a FIR filter, each multiplication is associated with an 
addition and memory shift (delay) operation. In typical signal processors, these associated 
functions do not require extra instruction cycles. Therefore the number of instructions 
required in FIR filtering is close to the number of multiplications.
For example, the TMS320C25 code segment for FIR filtering is:
    rptk    length-1                    ; repeat the next instruction length times
    macd    coefficient_address, *-     ; multiply, accumulate and shift the sample
The macd instruction is repeatedly executed in cache memory. The address of the first 
filter coefficient is directly supplied to this instruction. The signal samples are addressed 
through the default address register denoted by *. Both the entire coefficient array and 
signal array are accessed through automatic address increment or decrement. In addition 
to performing the convolution (multiply and accumulate), the macd instruction shifts the 
signal samples in memory (delay). Using on-chip fast memory banks, the macd instruction 
requires one instruction cycle per tap after initialization.
The corresponding code segment for the DSP32 and DSP32C is:
    a0 = a0 + (*r2-- = *r1--) * *r3--
This line of code can be accomplished in one instruction cycle, for one tap of a FIR filter. 
The commonly used, simple and fast way to complete the filtering is to use one similar line of 
code per tap. The address registers r2 and r1 both access the signal sample array. The 
assignment using these two registers performs signal shifting. Address register r3 accesses 
the filter coefficient array. The register a0 is used as an accumulator.
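What both code segments do on each tap (multiply, accumulate and move the sample one place along the delay line) can be modelled behaviourally. The sketch below is not cycle-accurate and does not reproduce the memory layout of either processor. The function name and the convention of a delay line one location longer than the filter are assumptions made for illustration.

```python
def fir_macd(delay_line, coeffs, new_sample):
    """One output sample of an FIR filter computed tap by tap.

    delay_line[0] holds the newest sample. Each loop pass stands for one
    macd-style operation: multiply, accumulate, and copy the sample to the
    next location so that the whole line is shifted after the last tap.
    """
    n = len(coeffs)
    assert len(delay_line) == n + 1   # one spare location absorbs the last shift
    delay_line[0] = new_sample
    acc = 0.0
    for k in range(n - 1, -1, -1):    # oldest sample first, address decrementing
        acc += coeffs[k] * delay_line[k]
        delay_line[k + 1] = delay_line[k]
    return acc

# Impulse response check: the outputs reproduce the coefficients.
line = [0.0] * 4
response = [fir_macd(line, [1.0, 2.0, 3.0], x) for x in (1.0, 0.0, 0.0, 0.0)]
```

The loop body corresponds to the single instruction per tap. Everything outside the loop is the initialization overhead mentioned above.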
A simple but useful indication of processor performance is the mac time — the time taken 
to complete the operations of a single filter tap. Since most DSPs, except the earlier ones, 
perform the mac operation in one instruction cycle, this cycle time is also the prime 
yardstick.
It has been shown that the tree nodes can be fitted into the TMS architecture such that the 
number of processor cycles required closely corresponds to the number of necessary 
multiplications, with all data move operations done in pipeline. In addition, the tree filter 
bank is capable of operating in a stream processing mode, i.e., computes on a sample by 
sample basis. Buffers may be avoided, but would be necessary in polyphase-FFT 
approaches. Each tree node consists of 2 anti-aliasing FIR filters with particular features 
— complex, half-band and down sampling by 2. Polyphase implementation of these filters 
avoids all the unnecessary computations as well as requiring no extra processor cycles for 
data move operations. The apparent overhead of complex filtering can be reduced by 
splitting into real FIR filters and by careful placement of the coefficients and signal samples 
in memory.
The multiplier accumulator architecture typical of digital signal processors has been shown to 
be able to implement the tree method efficiently. Therefore the mac time or instruction 
cycle time is very appropriate for comparing the maximum throughput (bit rate and number 
of channels) of different DSPs. Using more complex processor architectures may reduce 
the overhead during the initialization of filtering, but the total number of instructions 
required would not be significantly reduced. Filter symmetry cannot be exploited in the 
TMS architecture. Although the number of multiplications is reduced by half, the number 
of additions remains the same, and hence the number of instruction cycles cannot be reduced 
significantly. In addition, the delay operation in the macd instruction cannot be adapted 
to use for more complicated signal movement required in a symmetric filter. Although the 
DSP32 allows a more flexible combination of operations in a single instruction cycle, 
similar difficulties are encountered in exploiting filter symmetry. This is because only a 
limited number of combinations fit into the processor architecture — arbitrary 
combinations do not allow multiple operands to be fetched from memory in one cycle.
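The operation counts behind the symmetry argument can be checked with a small sketch. It assumes an even-length symmetric filter and counts one addition per accumulation. The two functions are illustrative and do not model any particular processor.

```python
def fir_direct(window, h):
    """Direct form: one multiply and one accumulate per tap."""
    acc, mults, adds = 0.0, 0, 0
    for x, c in zip(window, h):
        acc += x * c
        mults += 1
        adds += 1
    return acc, mults, adds

def fir_folded(window, h):
    """Symmetric form: samples sharing a coefficient are added first."""
    n = len(h)
    assert n % 2 == 0 and all(h[k] == h[n - 1 - k] for k in range(n // 2))
    acc, mults, adds = 0.0, 0, 0
    for k in range(n // 2):
        pair = window[k] + window[n - 1 - k]   # pre-addition
        adds += 1
        acc += pair * h[k]                     # multiply and accumulate
        mults += 1
        adds += 1
    return acc, mults, adds

direct = fir_direct([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 2.0, 1.0])
folded = fir_folded([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 2.0, 1.0])
```

Both forms give the same output. The folded form halves the multiplications but performs the same number of additions, so a processor that completes a multiply and an add together in one cycle gains little.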
5.6.2 Implementation of Other Algorithms
Processor architectures are more important for algorithms apart from filtering, such as FFT 
in demultiplexers and various detectors in demodulators. The TMS (and earlier DSPs) 
architecture is very poor in performing FFTs, despite the presence of bit reverse addressing. 
Complex filters can be decomposed into real ones, but complex multiplications cannot be
avoided in the FFT. Each complex multiplication involves 4 operands in memory, 4 
multiplications and 2 additions. The one word instruction cache cannot be used effectively 
as in the macd instruction. Also, the FFT algorithm is inherently recursive. The TMS 
architecture is particularly inefficient in mapping the FFT algorithm into an iterating 
program. Together with the large loop overhead (minimum 2 cycles per branch instruction), 
fast TMS programs for the FFT require large code memory, e.g., using one instruction per 
multiplication instead of a program loop. The DSP16 and DSP16A have one multiplier 
but 2 accumulators. This feature is more suitable for complex multiplications, reducing 
the memory access for storing intermediate values. Even with similar mac time, most newer 
DSPs outperform the TMS320C25 in FFT computation, at least in code memory size, 
e.g., the ADSP-2101. From the DSP32 code segment shown above, the multiple memory 
access in one instruction is apparently more suitable for complex number computations.
Despite the rich variety of processor architectures, a simple yardstick is the demand ratio 
— the total number of memory cycles per instruction cycle. A complex instruction may 
be executed in parallel or completed in a pipeline, but the demand ratio limits the maximum 
number of operands, and hence the complexity that can be achieved. The demand ratio of 
the TMS320C25 is two. Instructions with two operands from memory can be executed in 
a single cycle. To implement FIR filters in one instruction cycle per tap, a one word cache 
for the macd instruction is required. In addition, a specialized write cycle performs the 
shifting of signal in memory. With a demand ratio of three, the DSP56001 does not require 
an instruction cache, but does require a specialized write cycle. In this case, a circular buffer 
addressing mode is provided. Newer generations of DSPs have a demand ratio of four, 
e.g., DSP32, DSP96002 and TMS320C30. These processors need neither specialized write 
cycles nor cache for single instruction per tap. With four read or write cycles in a single 
instruction, algorithms with irregular memory access patterns can be implemented 
efficiently.
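The bookkeeping in this paragraph can be written out explicitly. The sketch below is one reading of the argument, assuming that a straightforward FIR tap needs four memory accesses (instruction fetch, coefficient read, sample read, and sample write for the delay shift). The demand ratios are those quoted above, and the names are illustrative.

```python
FIR_TAP_ACCESSES = (
    "instruction fetch",
    "coefficient read",
    "sample read",
    "sample write (delay shift)",
)

# Demand ratios as quoted in the text.
DEMAND_RATIO = {
    "TMS320C25": 2,
    "DSP56001": 3,
    "DSP32": 4,
    "DSP96002": 4,
    "TMS320C30": 4,
}

def accesses_to_hide(demand_ratio):
    """Accesses per tap that special hardware (instruction cache, specialized
    write cycle, circular addressing) must remove to reach one cycle per tap."""
    return max(0, len(FIR_TAP_ACCESSES) - demand_ratio)

hidden = {name: accesses_to_hide(r) for name, r in DEMAND_RATIO.items()}
```

On this reading the TMS320C25 must hide two accesses (the one word cache and the specialized write), the DSP56001 one, and the processors with a demand ratio of four none.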
A high demand ratio is realized by either multiple accesses of the same memory bank in 
one cycle, or by providing separate memory banks for parallel access. For example, the 
TMS320C30 accesses three parallel memory banks twice in each cycle. Since the memory 
cycle time is half of the instruction cycle time, very fast memories are required. Therefore 
most DSPs provide memory banks on-chip. In addition, the separate memory buses are 
also internal. Otherwise, these buses require a large number of external pins. The larger 
the on-chip memory banks, the faster the execution speed of large complex programs, and
the less the need for expensive external fast memories. The TMS320C25 has 256 words 
of RAM for program, whilst the C30 has 2K words. The C50 has 8K on-chip RAM, which 
is adequate for the entire program and data of the T-SAT MCD.
Multiple operand instructions complete in a single cycle only if the operands are 
appropriately scattered in the parallel memory banks. Otherwise memory conflict occurs, 
and correct operation requires extra cycles. The TMS320C30 is superior to other DSPs 
with the same demand ratio of four, since even if two operands come from the same memory 
block, a multi-operand instruction still only consumes one cycle.
Since many DSP functions require repeated execution of instructions for a fixed number 
of times, low over-head loops are important. The TMS320C25 has a one word cache for 
FIR filtering. The C30 has a cache of 64 words — a large block of instructions can be 
repeated without a branch instruction in each pass. The only other notable processor with 
a cache is the DSP16 with 15 words.
5.6.3 Multi-Processor Implementation
When multiple processors are required for high bit rate or large number of channels, the 
choice of algorithm and mac time are more important than the processor architecture and 
configuration. The ideal situation is that the overall algorithm can be naturally partitioned 
into sub-functions, where each requires a computation load equivalent to one processor. 
The partitioning of algorithm also determines the multiprocessor configuration. For 
example: levels in a tree algorithm can be executed in cascaded DSPs using serial link or 
FIFOs; nodes in each level can be executed in parallelled DSPs without any coupling.
If undesirable partitioning cannot be avoided, extra instruction cycles are required that 
cannot be compensated by better processor architectures. In this case, the mac time remains 
as the prime guide in selecting DSPs. Since related operations are not implemented on the 
same chip, the opportunities for optimization are much reduced. Tight coupling amongst 
processors (typically using dual-port memories) would require frequent I/O instructions. 
Apart from the fact that these instructions would not be required in natural partitioning, 
I/O cycles are often slower than on-chip memory accesses.
5.6.4 Software Development Tools
Software development tools are often missing in the consideration of DSPs. Processor 
architectures are reflected in their assembly languages. Different architectures probably 
would require special features in software tools, but may not be fully supported. Like
conventional assembly languages, the TMS language is instruction based. Since 
complicated operations can be accomplished in a single instruction, the total number of 
distinct instructions is large. Also, instructions with complex functions may be regarded as less 
readable. The DSP32 language is operator based as in high level languages, hence more 
readable. However, many operators often appear on the same line to be accomplished in 
a single cycle, which makes the line hard to comprehend. Which of these two classes of languages 
one prefers is inherently subjective. But operator based languages have the advantage that 
programs can be written in a much shorter time, if code optimization is not important.
It is desirable to write fully relocatable code due to the existence of often small parallel 
memory banks. To optimize the speed of execution, the frequently used variables should 
be allocated in the fastest memories, and the variables in a multiple operand instruction 
should be appropriately scattered in the parallel memory banks. Fully relocatable code 
allows the programmer to write the codes first. The number of memory locations required 
and the frequency of access would then be exactly known. Optimization can be performed 
by reassigning variables in different memory banks using assembler directives or linking 
instructions, without affecting the program codes written.
Even the most primitive development tools would support the relocation of a full address 
word. In the TMS320C25, fast memory accesses using a full 16-bit address is possible 
only by using the address registers. For example, the code for saving the accumulator 
is
    lrlk    ar1, operand_address
    larp    ar1
    sacl    *, shift, ar2
The first problem is that loading the 16-bit address into the address register ar1 takes at 
least one cycle (two for lrlk). This overhead is large compared to the single instruction per 
tap in FIR filtering. In multichannel applications, the number of operands is typically large. 
Since there are only eight address registers, the frequency of register loading is high. The 
second problem is that one of the eight address registers has to be selected (by larp) as the 
default register (*) before use, which consumes one cycle. This extra cycle can be avoided 
by using the address pointer look ahead feature. The sacl instruction above selects ar2 as 
the next register to be used by default. However, this feature easily leads to the wrong 
register being selected when the program code changes.
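The overhead can be tallied with a small cycle table. The two-cycle load of the address register and the one-cycle register selection are as stated above. The single cycle for the store itself is an assumption that holds only for on-chip memory, and the table and function names are hypothetical.

```python
# Cycles per instruction: lrlk and larp as stated in the text; sacl assumed
# to take one cycle with on-chip memory; macd is one cycle per tap in repeat mode.
CYCLES = {"lrlk": 2, "larp": 1, "sacl": 1, "macd": 1}

def total_cycles(instructions):
    """Sum the cycle counts of a straight-line instruction sequence."""
    return sum(CYCLES[name] for name in instructions)

save_with_larp = total_cycles(["lrlk", "larp", "sacl"])   # register selected explicitly
save_look_ahead = total_cycles(["lrlk", "sacl"])          # register pre-selected earlier
taps_equivalent = save_with_larp // CYCLES["macd"]
```

Under these assumptions one accumulator save costs as much as four filter taps, or three when the look ahead feature removes the explicit register selection.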
The alternative code using short immediate addressing in the TMS320C25 is
    ldpk    first_9_bit_address
    sacl    last_7_bit_address
The ldpk instruction loads the page pointer. The following short addresses refer to the 
memory locations in a 128-word page. Using symbolic addresses, the programmer is still 
required to supply the absolute value of both the page address and short address at least 
once, in order to specify the load location. More desirable code would be
    ldpk    16_bit_address
    sacl    16_bit_address
The preliminary software tools for the TMS320C25 were those for the earlier TMS processor 
family. These tools were surprisingly primitive compared to those of microprocessors: capital 
letters only; strict rules for spaces; short symbol length. Late in the T-SAT project, a 
completely new set of tools using a common object file format (COFF) was released. The 
first line of the desirable code works correctly only in a later version of the COFF tools. The significance is that 
the programmer no longer needs to specify any absolute address, which can be dealt with 
in the COFF linker. It was the arrival of this linker that provided sophisticated support for 
relocating pages. Modifications of assembler code were greatly reduced when optimizing 
the use of memory banks.
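The split that a relocating linker has to perform for page addressing is simple to state. The sketch below shows it for a 16-bit data address divided into a 9-bit page number and a 7-bit offset within a 128-word page, as described above. The function names are illustrative.

```python
PAGE_BITS = 7                      # 128-word pages
PAGE_MASK = (1 << PAGE_BITS) - 1

def split_address(addr):
    """Return (9-bit page number, 7-bit offset) of a 16-bit data address."""
    assert 0 <= addr < 1 << 16
    return addr >> PAGE_BITS, addr & PAGE_MASK

def same_page(a, b):
    """True if two operands can share one load of the page pointer."""
    return (a >> PAGE_BITS) == (b >> PAGE_BITS)

page, offset = split_address(0x03FF)   # page 7, offset 0x7F
```

Variables that are used together are best placed so that `same_page` holds, since each change of page costs an extra load of the page pointer.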
The addressing of operands illustrates the difficulties in estimating the required processing 
power for the algorithm at hand. Instruction cycles spent in memory accesses are significant 
compared to those spent in computations. The number of memory access cycles depends 
on many factors: the processor architecture; the algorithm (e.g., FIR vs FFT); the total 
number of memory locations used; the development time allocated for optimization.
A desirable software tool would be a processor cycle analyser, since the number of cycles 
required for an instruction varies, depending on how effectively the fast memory banks are 
used. Using a software simulator, it is possible to estimate the total number of cycles a 
program requires. But it is the number of cycles required by individual instructions that 
is most useful in optimizing time-critical code. Fitting a 4-channel tree hierarchy into a 
single chip required considerable effort. With a 97% processor utilization, much time was 
spent in just counting the total execution time of different program versions, using a 
complex table of instruction cycles.
5.6.5 Summary
In summary, the selection of DSPs can be based on the mac time and demand ratio. The 
latest members in families of general purpose DSPs all have a demand ratio of four.
Different processors with the same mac time would complete a particular algorithm in 
similar time. Selecting a less complex processor to optimize the cost requires a large effort, 
particularly for low demand ratios. To give an accurate estimate of the throughput of two 
processors, the complete system has to be coded in two languages with sufficient 
optimization.
Another cost is the development time. Although signal processing in digital modulation 
schemes need not require floating point computation, the use of fixed point processors 
requires considerable time for the scaling of numbers. Due to the large dynamic range of 
FDM signals in MCDs, appropriate scaling to optimize the use of the fixed word length 
is more important than in conventional modem applications.
After growing accustomed to operator based languages (e.g., DSP32) and their potential 
hazards, the development time would be shorter than with instruction based languages (e.g., 
TMS). The instruction sets of operator based languages are typically small, and the 
programs are very similar amongst processors.
We have seen that a typical processor has instructions that involve fetching of operands 
from memory, computations, and modifications of address registers. Fast execution is 
accomplished by overlapping these operations for different instructions, e.g., fetching the 
operands of the next instruction while performing computations. This pipelining is central 
to fast computation in DSPs, but it can have a serious impact on programmability. The 
techniques for dealing with pipelining can be classified into [Lee89a]:
(1) Interlocking (e.g., TMS320C25, C30)
(2) Time-Stationary Coding (e.g., DSP16A, DSP56001)
(3) Data-Stationary Coding (e.g., DSP32C)
Interlocking hides the pipeline operations from the programmer, which explains the 
popularity of the TMS processors. There is little difference between the TMS assembler 
language and that of a general purpose microprocessor. Each instruction apparently 
completes before the next begins. Although the instructions are actually overlapped, the 
processor guarantees that the results are the same as when the instructions are executed one 
by one.
Obviously, higher performance can be obtained by giving the programmer explicit control 
over the pipeline stages. In time-stationary coding, computations and the fetching of 
operands can be specified in different fields of a single instruction. The programming is 
demanding since this form of coding resembles microcode, in which fields of the instruction
specify operations in different parts of the processor. The advantage is that we can 
concentrate immediately on the coding of straightforward algorithms, rather than searching 
repeatedly for algorithms and computational structures to fit processor architectures, which 
is very time consuming, as we have seen earlier in the T-SAT demultiplexer.
In data-stationary coding, an instruction specifies what happens to the data through 
operators and assignments (as in high level languages), rather than specifying what happens 
to the hardware at a particular time (as in time-stationary coding). Pipelining is apparently 
hidden from the programmer as in interlocking, but data-stationary coding is no less 
efficient than time-stationary coding - processor resources such as memory bandwidth 
and register transfers can also be fully utilized with appropriate combinations of operators 
and assignments. The main disadvantage is that the results of an instruction may not be 
available immediately for the next instruction.
5.7 References
Aghvami88a. A. H. Aghvami, A. Clarke, B. G. Evans, P. G. Farrell, J. G. Gardiner, J. R.
Norbury, and E. Vilar, “Land Mobile Satellites Using Highly Elliptic Orbits 
- The UK T-SAT Mobile Payload,” 4th International Conference on 
Satellite Systems For Mobile Communications and Navigation, pp. 
147-153, IEE, Oct. 1988.
Ahmad90a. J. Ahmad, “Digital Modulator Techniques for On-Board Processing 
Satellites,” MPhil/PhD Transfer Report, University of Surrey, Jan. 1990.
Bi90a. G. Bi and F. P. Coakley, “The Design of Transmultiplexors For On-Board
Processing Satellites Using Bit-Serial Processing Technique,” 13th AIAA 
International Communication Satellite Systems Conference, pp. 613-622, 
March 1990.
Bi90b. G. Bi, F. P. Coakley, and B. G. Evans, “Rational Sampling Rate Conversion
Structures with Minimum Delay Requirements,” submitted to IEE 
Computer and Digital Techniques, 1990.
Gardner85a. F. M. Gardner, “On-Board Processing for Mobile-Satellite 
Communications,” ESTEC Contract No. 5889/84/NL/GM, European Space 
Agency, May 1985.
Gardner86a. F. M. Gardner, “A BPSK/QPSK Timing-Error Detector for Sampled 
Receivers,” IEEE Trans. on Comm., COM-34, May 1986.
Gockler88a. H. Gockler, “A Modular Multistage Approach to Digital FDM
Demultiplexing for Mobile SCPC Satellite Communications,” 
International Journal of Satellite Communications, Vol. 6, No. 3, pp. 
283-288, Wiley, 1988.
Hespelt88a. V. Hespelt and T. Alberty, “Study and Development of On-Board 
Multicarrier Demodulator for Mobile Communications,” ESTEC Contract 
No. 6497/85/NL, ANT Nachrichtentechnik, Apr. 1988.
Lee89a. E. A. Lee, “Programmable DSP Architectures: Part II,” IEEE ASSP
Magazine, Vol. 6, No. 1, pp. 4-14, Jan. 1989.
Takahata87a. F. Takahata et al., “A PSK Group Modem for Satellite
Communication,” IEEE J. on Selected Areas in Comm., Vol. SAC-5, pp. 
648-661, May 1987.
Voyant90a. J. Y. Voyant, “All-Digital, Programmable Modem Implementation,” MSc 
Research Project Report, University of Surrey, Dec. 1990.
Yim88a. W. H. Yim, C. C. D. Kwan, F. P. Coakley, and B. G. Evans, “On-board
Multicarrier Demodulator for Mobile Applications using DSP 
Implementation,” Proc. 1st Int. Workshop on Digital Signal Processing 
Techniques applied to Space Communications, pp. 124-130, European 
Space Agency, ESTEC, Nov. 1988.
Yim89a. W. H. Yim, C. C. D. Kwan, F. P. Coakley, and B. G. Evans, “On-Board 
Multicarrier Demodulators for Mobile Applications using DSP 
Implementation,” Proc. 1st European Conference on Satellite 
Communications, Nov. 1989.
6 A Data-Flow Oriented Simulation System
Table of Contents
6 A Data-Flow Oriented Simulation System
6.1 Requirement Analysis
6.1.1 Automatic Scheduling
6.1.2 Feedback
6.1.3 Hierarchical Block Diagram Paradigm
6.1.4 Memory Management
6.1.5 Hardware Computation
6.1.6 Signal Analysis
6.1.7 Batch Execution
6.1.8 Run-time Efficiency
6.2 Survey of Simulation Packages
6.3 Survey of Computer Languages
6.4 Implementation of DOSS
6.4.1 Data-flow Oriented C
6.4.1.1 Built-in Node
6.4.1.2 Arc
6.4.1.3 Dynamic Scheduling
6.4.1.4 Hardware Computation
6.4.2 Batch Execution
6.4.3 Code Generator
6.4.4 Software Development and Maintenance
6.4.5 Integration of DSP Tools
6.5 References
A Data-Flow Oriented 
Simulation System
Approaches in research and development can be classified into three categories:
(1) theoretical analysis
(2) simulation
(3) experiment
Simulation results are usually appreciated, for example, when no accurate analytical 
method exists (e.g., non-linearities), or when experiments are impossible or prohibitively 
expensive (e.g., satellite payloads). To the analyst, simulation is a brute force approach; 
to the experimentalist, simulation lacks credibility.
There should be no doubt about the contributions of simulation when it is regarded as a 
tool for computer-aided modeling, analysis and design. Proper use of simulation enjoys 
the best of both the analytical and experimental worlds: simulation models can be more 
accurate than analytic models; simulation conditions can be more controllable and 
repeatable than experimental conditions.
Yet, simulation is not always a fast and cheap alternative. When a simple analytic model 
exists, the quest for even a slightly more accurate simulation result may require a high
price: excessive computer run-time; long hours for a human operator to carry out the 
simulation; high development cost in both algorithm and software. Similarly, it may be 
more economical to draw conclusions from a carefully selected experiment.
This chapter describes the design and implementation of a Data-flow Oriented Simulation 
System (DOSS). Much of the research work in this thesis was carried out on, or inspired 
by, this system. But why another simulation package? This will be answered in the following 
sections when we look at other off-the-shelf packages, and requirements in simulation of 
multirate DSP.
6.1 Requirement Analysis
To cope with complexity, a communication system is often modelled in layers. Likewise, 
different simulation levels exist:
Network
Link
Signal Processing
    Function
    Analogue Circuit
    Computational Structure
    Digital Circuit
Here the network level refers to the random events that happen in a communication system 
consisting of a number of users - transmitters and receivers. The link level is what happens 
between a transmitter and a receiver. As with any other level, the signal processing level 
can be subdivided. Functional simulation need not be directly realizable in real time, as 
opposed to analogue or digital circuits. In DSP, it is desirable to have an additional level 
to bridge the large gap between mathematical formulae (function) and logic gates (digital 
circuit). Computational structures must be realizable in real time, in terms of multipliers, 
adders, etc. This level also deals with fast algorithms, number representations, and some 
bit level operations as in distributed arithmetic.
The requirements of different levels may be drastically different. The network and link 
levels are highly distinct, as is evident in the simulation packages for the two levels. 
Confusion arises in the requirements of the link level and the various levels in signal processing, 
since the ability to simulate a signal passing through a filter also can be used to simulate 
the response of a capacitor, or a bit passing through a logic gate. Assuming the required
functions are either supplied or can be defined by users, we can use a link simulation 
package to perform circuit analysis, or use a circuit analysis package to simulate a complete 
communication link. Therefore in considering a simulation package, we have to identify 
what the aims are.
The link level deals mainly with models, e.g., propagation channels, oscillator phase noise, 
non-linear amplifiers and filters etc. Demodulators are often modelled in terms of carrier 
phase jitter and symbol timing jitter. It is often more credible to simulate the signal 
processing functions of an analogue phase-lock-loop, rather than modelling the phase 
jitters. This kind of simulation at the functional level requires a package to provide more 
sophisticated support for users to deal with the increased complexity.
DOSS is targeted right at the middle of digital signal processing -  multirate computational 
structures. This is necessary for the design and implementation of all-digital MCDs. If we 
are primarily concerned with the implementation loss of MCDs in a communication link, 
there is little difference between analogue and digital implementations, in the sense that 
the same bit-error-rate can always be achieved by either approach. If we are concerned 
with the complexity of all-digital MCDs, we need not simulate multirate algorithms. We 
can establish the performance by single-rate simulation, and then count only the 
computations that are necessary. We focus on computational structures in MCDs for the 
same reasons as the use of CAD tools for digital circuit design: to validate algorithms; to 
reduce errors and development time.
After attempting to make use of TOPSIM III, stand-alone C programs and Unix C-shell 
scripts for integration, the need for a custom simulation system targeted at multirate DSP 
was evident. The requirements identified are described in the following sections.
6.1.1 Automatic Scheduling
Early packages perform simulation in the frequency domain. To simplify discussion, let 
the simulation duration determine the size of a single array to store the frequency domain 
samples. This array is modified in turn by procedures that represent models or signal 
processing algorithms. Therefore each procedure is executed only once in a simulation. 
This is possible by operating in the frequency domain because iterations implied in 
feedback loops can be eliminated.
To simulate feedback loops, computations must be performed in the time domain on a 
sample by sample basis. In single rate sampling, the simulation time determines the number
of execution cycles. In each cycle, only a single sample is stored. Each procedure modifies 
this sample in turn. When all procedures are executed once, the next cycle begins. Therefore 
each procedure is executed once per cycle.
In multirate sampling, the numbers of executions of procedures are not equal. If an 
interpolation-by-2 filter is followed by a decimation-by-3 filter, to maintain a steady signal 
flow, every 3 executions of the interpolator must be balanced by 2 executions of the 
decimator. Scheduling refers to the relative numbers of executions of all procedures. Without 
automatic scheduling, a user has to look at all procedures in the system to determine the 
overall schedule.
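The balance between two cascaded blocks can be computed rather than worked out by hand. The following sketch is an illustration only and is not taken from DOSS: it assumes the first block produces `produced` samples per execution and the second consumes `consumed` samples per execution, and finds the smallest numbers of executions that keep the signal flow steady. All names are hypothetical.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Solve n_first * produced == n_second * consumed for the smallest
 * positive integers n_first and n_second.
 */
void balance(unsigned produced, unsigned consumed,
             unsigned *n_first, unsigned *n_second)
{
    unsigned g;

    assert(produced > 0 && consumed > 0);
    g = gcd_u(produced, consumed);
    *n_first  = consumed / g;   /* executions of the producing block */
    *n_second = produced / g;   /* executions of the consuming block */
}
```

With `produced = 2` (interpolation by 2) and `consumed = 3` (decimation by 3), the result is 3 executions of the interpolator and 2 of the decimator, in agreement with the example in the text. In a system with many blocks, the same equation has to hold on every connection simultaneously, which is why determining the schedule by hand becomes tedious.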
6.1.2 Feedback
Signal feedback is common in DSP, e.g., IIR filters and phase-lock-loops. If a simulation 
package does not support feedback, a complete loop has to be hidden in a single library 
procedure. This means that a demodulator with a long loop structure is totally hidden from 
the user. To investigate alternative demodulator structures, a user has to write a library 
procedure for the complete demodulator. Such a package would offer little help in reducing 
user programming.
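The reason a feedback loop forces sample-by-sample computation can be seen in the simplest recursive structure. The sketch below is an illustration with hypothetical names, not a DOSS library procedure: a first-order loop y[n] = x[n] + a y[n-1], in which each output depends on the previous output held in the loop delay.

```c
#include <assert.h>
#include <stddef.h>

/* State of a first-order recursive loop: y[n] = x[n] + a * y[n-1]. */
typedef struct {
    double a;       /* feedback coefficient                     */
    double y_prev;  /* previous output, held in the loop delay  */
} loop_state;

/* Process one sample; the fed-back value is updated on every call. */
double loop_step(loop_state *s, double x)
{
    double y = x + s->a * s->y_prev;

    s->y_prev = y;
    return y;
}

/* Run the loop over a block of samples, one sample per cycle. */
void loop_run(loop_state *s, const double *x, double *y, size_t n)
{
    size_t i;

    assert(s != NULL && x != NULL && y != NULL);
    for (i = 0; i < n; i++)
        y[i] = loop_step(s, x[i]);
}
```

With a = 0.5 and a unit impulse at the input, the outputs are 1, 0.5, 0.25, and so on. In a package without feedback support, the adder, multiplier and delay of even this small loop would have to be written as one library procedure, as here, rather than being connected as separate blocks.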
6.1.3 Hierarchical Block Diagram Paradigm
In signal processing systems, the prime concern is the function operating on a signal, its 
source and destination. Therefore a data-flow language is highly desirable for the implicit 
transportation of signal samples, and the scheduling of functions by the availability of their 
operands.
To enable users to perform complex simulations, hierarchical specifications of systems 
must be supported. Good support for hierarchy, as in any high level language, also 
simplifies the task of programming library functions.
Most simulation packages have a data-flow programming language, graphical or textual. 
The most appropriate programming paradigm is a hierarchy of block diagrams.
6.1.4 Memory Management
To simulate two FIR filters in a system, a typical procedural language program would 
execute the procedures
output0 = fir( filter0, input0, register0 )
output1 = fir( filter1, input1, register1 )
at different times. The input and output relationships are necessary, as well as the parameters 
filter0 and filter1, which uniquely specify the functions to be performed. The shift register 
arrays, register0 and register1, are used to store the past signal samples. The user need 
not deal with these local memories, since only the procedure fir with the parameter filter0 
needs to know of the existence of register0. In a complex system, the number of local 
memories is large, therefore hiding all of them would make things much simpler for the user. Local 
memories then have to be managed automatically by the simulation system.
One approach for memory management is the use of macros, as in earlier versions of 
TOPSIM. Functions such as fir are straightforward procedures with the registers hidden 
inside as static variables. The simulation system duplicates the source code of such 
functions, giving them different names, e.g., fir0 and fir1. This type of package is very 
inefficient since the procedure code is not shared.
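An alternative to duplicating source code is to keep each filter's shift register in a per-instance structure, so that a single copy of the procedure serves every filter in the system. The sketch below illustrates the idea only; it is not the DOSS implementation, the structure and function names are hypothetical, and the fixed tap limit is an assumption made to keep the example short.

```c
#include <assert.h>

#define MAX_TAPS 64   /* assumed upper limit on filter length */

/* One instance per filter in the system: coefficients plus local memory. */
typedef struct {
    int    ntaps;
    double coeff[MAX_TAPS];
    double reg[MAX_TAPS];     /* shift register of past input samples */
} fir_inst;

/* Initialise an instance and clear its shift register. */
void fir_init(fir_inst *f, const double *coeff, int ntaps)
{
    int i;

    assert(ntaps > 0 && ntaps <= MAX_TAPS);
    f->ntaps = ntaps;
    for (i = 0; i < ntaps; i++) {
        f->coeff[i] = coeff[i];
        f->reg[i] = 0.0;
    }
}

/* Shared procedure: shift in one sample and return one output sample. */
double fir_step(fir_inst *f, double x)
{
    double acc = 0.0;
    int i;

    for (i = f->ntaps - 1; i > 0; i--)
        f->reg[i] = f->reg[i - 1];
    f->reg[0] = x;
    for (i = 0; i < f->ntaps; i++)
        acc += f->coeff[i] * f->reg[i];
    return acc;
}
```

Two filters are then two `fir_inst` variables passed to the same `fir_step`. If the simulation system allocates and initialises these instances itself, the user never sees the registers.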
6.1.5 Hardware Computation
The host computer of a simulation system determines the behaviour of arithmetic 
operations, e.g., 2’s complement arithmetic for integers. Target computation hardware will 
have different architectures, e.g., a fixed point signal processor with a certain word length. 
Custom hardware is more complex, since different multipliers or word lengths, etc., can be 
used in the same system. A simulation system for DSP should allow easy specification 
and variation of local word lengths and hardware types.
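As an illustration of what a variable word length implies, the following sketch models a fixed-point multiplier whose word length and number of fractional bits are parameters. The names are hypothetical, and truncation toward zero with saturation on overflow is an assumption; real hardware may round or wrap differently.

```c
#include <assert.h>

/* Saturate a value to the range of a two's complement word of `bits` bits. */
long long saturate(long long v, int bits)
{
    long long max, min;

    assert(bits >= 2 && bits <= 32);
    max = (1LL << (bits - 1)) - 1;
    min = -max - 1;
    if (v > max)
        return max;
    if (v < min)
        return min;
    return v;
}

/*
 * Fixed-point multiply. Both operands and the result carry `frac`
 * fractional bits; the result is truncated toward zero (an assumption)
 * and saturated to a word of `bits` bits. Operands are assumed to fit
 * in `bits` bits already, so the product fits in 64 bits.
 */
long long fx_mul(long long a, long long b, int frac, int bits)
{
    long long product;

    assert(frac >= 0 && frac < bits);
    product = (a * b) / (1LL << frac);
    return saturate(product, bits);
}
```

With 16-bit words and 15 fractional bits, 0.5 is represented by 16384 and `fx_mul(16384, 16384, 15, 16)` gives 8192, i.e. 0.25. The product of -1 and -1, `fx_mul(-32768, -32768, 15, 16)`, saturates to 32767. Changing `bits` and `frac` per block is the kind of local variation the requirement refers to.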
6.1.6 Signal Analysis
The aim of simulation is to prove that the algorithms work correctly, and to analyse the 
performance under different input conditions and operating parameters. The overheads to 
perform these frequent activities should be minimised. A simulation system should allow 
signal analysis to be carried out with ease, e.g., plotting of intermediate signals. Such a 
system also benefits programmers -  the task of validation of library functions would be 
easier.
6.1.7 Batch Execution
Trade-offs and optimizations may require many simulation runs involving many 
combinations of parameter values. A simulation system should provide sound support for 
multiple executions of the simulated system, with automatic substitution of parameter 
values.
Parameters such as signal-to-noise ratios specify the function of a single block only, but 
parameters such as interpolation rates affect the topology and schedule of the complete 
system under simulation. These problems are more serious in the use of a graphical 
language to specify batch execution, which is inherently restricted in both execution 
control and parameter substitution.
6.1.8 Run-time Efficiency
Fast algorithms are developed to reduce the complexity of dedicated hardware for 
computationally intensive applications. The number-crunching problem is more serious 
when simulation is carried out on a general purpose computer. Therefore simulation 
approaches that introduce unnecessary run-time penalties should be rejected if possible: using single 
sampling-rate techniques to simulate multirate systems; file-based approaches 
where all intermediate signals are stored in data files.
6.2 Survey of Simulation Packages
There are many software tools for signal processing and DSP, but most are signal analysis 
or numerical tools rather than simulation tools. Most simulation tools are targeted at 
communication links, since a variety of channel models is required, and the same signal 
processing function can be implemented in numerous ways, depending on the performance 
required and the complexity allowed.
Earlier packages, e.g., TOPSIM III, simulate continuous signals. Although DSP techniques 
must be used in a digital computer, the package assumes a single sampling rate throughout.
Later packages allow multirate sampling, but allowing is very different from supporting. 
Most of these packages have a single-rate architecture with minor modifications. In 
TOPSIM IV, users can specify the sampling rate between blocks. In SPW and later versions 
of BOSS from Comdisco, a block can hold the execution of another block by a control 
signal. However, this manual scheduling is extremely tedious to use. This type of package 
is only suitable for late design-phase simulation, but not for initial study of algorithms.
All packages suitable for multirate simulation have automatic scheduling. BLOSIM 
[Messerschmitt84a] is an early example. Details of its internal design are not available; 
however, the author’s understanding is that the user interface is poor, as it requires textual 
input for connecting blocks, similar to the input files of circuit analysis programs.
COSSAP from Cadis has all the features for simulating multirate systems. However, as 
with many other packages, one main problem is the graphical user interface. For example, 
a 256-channel demultiplexer would require an interpolator block with 256 output 
connections, and a user has to connect these outputs to following stages one by one. There 
may be ways to avoid this problem, but similar difficulties remain, which will be illustrated 
in more detail later.
It is apparent that only with a suitable language and a custom designed software 
architecture can the requirements for simulation be met.
6.3 Survey of Computer Languages
Occam is a data-flow language. Blocks can be represented by processes without the need 
for memory management. Inter-block connections can be represented by channels, which 
guarantee the correct transfer of samples between blocks. However, since the hardware 
architecture has to support the flowing of data across channels, the structure of data is 
restricted. The earliest version of Occam supported only integer variable types. Later, bytes 
and floating point numbers were supported, and sophisticated protocols may be specified 
for the channels. Yet the lack of support for user defined data structures hardly justifies its 
use as a software tool. In addition, because of the true data-flow characteristics, it would 
be difficult to control the overall simulated system, e.g., batch execution.
Functional languages such as ML and Miranda are very attractive for the validation of 
mathematical algorithms. However, time states are absent in these languages, i.e., the input 
and output relationships of functions are always true. Therefore it would be awkward to 
program algorithms capable of implementation in real-time.
There are languages targeted at simulation, an old example being Simula. Object Oriented 
Languages (OOL), such as Smalltalk, are commonly regarded as the languages (general 
purpose) of the future, but it is not often stated that some OOLs were designed originally 
for simulation. The main disadvantage of OOLs is that programming productivity is 
increased at the expense of run-time efficiency. C++ is a compromise: variable types are 
not supported; ordinary functions and data structures are allowed. However, a suitable 
language is not enough, e.g., software development tools in Unix understand C programs 
by default, but not C++. These tools include automatic makefile generators, program flow 
analysers and source code debuggers.
There are other general purpose languages that may be worth considering. The main purpose 
of Ada seems to be saving money for years to come. One of its selling points is the concept 
of modules, but OOLs can do a better job. However, there are two unusual features: the 
precision of computations can be specified, but this is not enough for simulating the variety 
of hardware types; tasks may be useful for simulating blocks, but they are less attractive 
in syntax than Occam processes, and suffer the same disadvantages when parallelism or 
concurrency is not the main concern.
To people believing that there may be problems in dealing with the complexity of Ada, 
Pascal and Modula-2 are good alternatives, since it is easy to award a training certificate 
proving that the holder knows everything about these two languages. (Even in Fortran, 
there is an excessively large number of format options to make holes in a punch card.) 
However, extensions are certainly required in Pascal to deal with real world programs, 
e.g., standard Pascal does not allow separate compilation, which would not be able to cope 
with even a small size package.
Standardized Fortran scored poorly in terms of software engineering, which led to the 
development of Rationalised Fortran, Extended Fortran, etc. However, most sophisticated 
simulation packages mentioned above are implemented in Fortran. This is misleading since 
the Fortran versions used are very similar to C, which has pointer manipulation, dynamic 
memory allocation functions and block structured statements. These versions do not have 
the portability implied by the name Fortran.
Some tasks that are not supported in a programming language can be performed through 
the operating system. To the programmer, the operating system is a set of system function 
calls. This definition avoids many often pointless arguments: user interaction, textual or 
graphical, is often not considered as part of a modern operating system; a sophisticated 
operating system could be made to resemble a simple one, with trivial customizations or 
the addition of a simple application package; run-time efficiency is one of the trade-offs 
amongst functionality, flexibility and hardware complexity. In Unix, blocks can be 
represented by stand alone programs (processes) connected by pipes, and shell scripts can 
be used to program the complete simulated system. However, simulating a large system 
with complex topology is not feasible because of the limited shell syntax, and the run-time 
overhead of true processes. These problems can be partially overcome by more advanced 
Unix features such as named pipes, shared memories, and light-weight processes. Yet these 
features would not be easily portable, and would not be necessary if true concurrency is 
not required.
C is chosen for the implementation of DOSS. There are other general advantages apart 
from those mentioned above. A C program can be as easy to read as a Pascal one. The 
scope of identifiers in C is as rich as that of a machine language. Therefore modules can 
be implemented with structured partition of program and header files. This is enhanced 
by using the program checker lint. Pointers are error prone, but allowing their use makes 
C versatile, e.g., variable size arrays and generic list processing. Function pointers add 
extra flexibility to the order of program execution, which is highly desirable for automatic 
scheduling.
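The role of function pointers in scheduling can be sketched as follows. This is an illustration under stated assumptions, not the DOSS scheduler: each node is represented by a function pointer and a pointer to its private state, and a table built at run time determines both the order and the number of executions. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A node is a function plus a pointer to its private state. */
typedef void (*node_fn)(void *state);

typedef struct {
    node_fn fn;        /* procedure to execute                  */
    void   *state;     /* local memory of this node instance    */
    int     repeats;   /* executions per schedule cycle         */
} sched_entry;

/* Execute one cycle of the schedule; returns the number of calls made. */
int run_cycle(const sched_entry *table, size_t n)
{
    size_t i;
    int r, calls = 0;

    assert(table != NULL);
    for (i = 0; i < n; i++) {
        for (r = 0; r < table[i].repeats; r++) {
            table[i].fn(table[i].state);
            calls++;
        }
    }
    return calls;
}

/* Example node: counts how many times it has been executed. */
void count_node(void *state)
{
    int *counter = (int *)state;

    (*counter)++;
}
```

Because the table is ordinary data, a scheduler can fill in the entries and repeat counts after examining the system, without the user's program stating the order of execution.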
6.4 Implementation of DOSS
There is a common misconception that conflicts are unavoidable between the three aspects 
of a simulation system:
(1) user interface
(2) algorithm
(3) software
For example: a sophisticated user interface would require highly complex software; 
algorithm designers would only be interested in the most straightforward way to express 
their formulae; programmers are left with a formidable problem in order to satisfy the 
needs of all others concerned. These conflicts of interest are evident in many packages, 
where the library functions may contain more control statements than the mathematical 
operations.
The aim of DOSS is to improve programming productivity by designing a suitable software 
architecture. As a result, the coding and validation of complex algorithms is simplified, 
and the package is easy to use.
DOSS is implemented in C, mainly because of the abundance of software development 
tools. The input to DOSS is a C program specifying the system to be simulated. Since the 
programming style of this input program is so different from C, a language called Data-flow 
Oriented C (DOC) is designed in order to provide a clear concept to users. The detailed 
implementation of DOSS is described in the following sections.
6.4.1 Data-flow Oriented C
A DOC program consists of nodes communicating through arcs, corresponding to blocks 
and connections in a block diagram. A complete DOC program for a baseband BPSK 
modulator is shown in figure 6.1. A pseudo-random bit sequence is generated in the node
ran_bit, converted to impulses in bit_impulse, and then passed through an interpolate-by-8 
transmit filter tx_filter with a raised cosine characteristic. The sampling rate at the 
modulator output is therefore 8 samples/symbol.
#include "doss.h"
#define L 8
system()
{
    def_arc(bit); def_arc(impulse); def_arc(signal);
    fir *rc_fir = load_fir("transmit.fir");

    ran_bit(bit); bit_impulse(bit, impulse);
    tx_filter(impulse, signal, L, rc_fir); sink(signal);
    save(signal, "bpsk.dat");
}

tx_filter(imp, sig, L, rc)
arc *imp, *sig;
fir *rc;
int L;
{
    def_vec(vec, L);
    interpo(imp, vec, L, rc); par_ser(vec, sig, L);
}
Figure 6.1 BPSK Modulator
A node can be the complete simulated system (system), composed of lower level nodes 
(tx_filter), or built-in (interpo), which performs predefined functions. The ordering of 
nodes is irrelevant, therefore users can place the nodes as in a block diagram. Also, feedback 
loops demand no special treatment.
Arcs are built-in data structures, and have to be defined before use as in most structured
languages. Here the definitions of arcs are macros, which include initialization functions
that insert debugging information, allocate memory etc. Arrays of arcs are
necessary in multirate DSP, e.g., vec connected to interpo, which is a polyphase
interpolator. The outputs of interpo are left in parallel form, since this is more often used.
Individual arcs in the array can be accessed by vec[i]. Since a single stream is required in
this application, the node par_ser performs parallel to serial conversion. Each arc must
be connected to one and only one node at each end. The node sink is used to satisfy this
condition. If two nodes share the same input samples, a duplicate node is required.
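The computation performed by a polyphase interpolator followed by parallel to serial conversion can be illustrated in ordinary C. The following is only a minimal sketch, not the DOSS implementation: the names polyphase_branches and interpolate are hypothetical, plain arrays stand in for arcs, and input samples before the start of the array are assumed to be zero. Branch k of the interpolator uses every L-th tap of the prototype filter, starting at tap k.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for interpo: computes the L parallel branch outputs
 * for input sample index n. Branch k uses the taps h[k], h[k+L], h[k+2L], ...
 * Input samples before x[0] are taken as zero. */
void polyphase_branches(const double *x, size_t n,
                        const double *h, size_t hlen,
                        size_t L, double *vec)
{
    size_t k, j;

    assert(L > 0);
    for (k = 0; k < L; k++) {
        double acc = 0.0;
        for (j = 0; k + j * L < hlen && j <= n; j++)
            acc += h[k + j * L] * x[n - j];
        vec[k] = acc;
    }
}

/* Hypothetical stand-in for interpo followed by par_ser: the L branch outputs
 * for each input sample are written consecutively, so y runs at L times the
 * input rate. y must hold xlen * L samples. */
void interpolate(const double *x, size_t xlen,
                 const double *h, size_t hlen,
                 size_t L, double *y)
{
    size_t n;

    for (n = 0; n < xlen; n++)
        polyphase_branches(x, n, h, hlen, L, &y[n * L]);
}
```

Feeding a unit impulse through interpolate returns the prototype filter taps in their original order, which is a convenient check that the branch decomposition and the serial ordering agree.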
Other features in DOC are built-in data structures such as fir, and built-in functions such 
as load_fir, which is for reading filter coefficients from a file. The function save specifies 
that the signal samples flowing through an arc are to be written to a file. Users can define 
other data structures and functions as in C. The more formal definition of function must 
be obeyed -  the input and output relationships must be independent of time, since there 
is no explicit global time state in a data-flow language.
To provide a clear concept to users, DOC is presented as a language with its own compiler,
DOCC. DOC is implemented by adding a small set of macros and library procedures to
the C language, so no compiler development is required. The compile command
DOCC is a shell script that calls the C compiler. Although DOC is a data-flow language, 
all C characteristics are inherited, therefore existing software such as filter design functions 
can be directly reused. The relationship between DOC and C is shown in table 6.1.
Table 6.1 Relationship between DOC and C

DOC                            C
arc                            predefined data structure
built-in data structure        predefined data structure
user defined data structure    user defined data structure
built-in node                  predefined procedure
built-in function              predefined function
user defined function          user defined function
The treatment of nodes and arcs in DOC has the same consistency as that of functions and 
data structures in any structured language. This allows a user to handle a complex system 
with ease, through the use of hierarchy. For the programmer, the coding of built-in nodes
can be kept to a minimum, since a large number of library functions can be written in DOC
directly.
DOC can handle very complex topologies, through iteration (loops) and recursion. This 
allows a polyphase-DFT filter bank to be built easily from only two nodes: a FIR node 
and a butterfly node. Derivatives of the basic polyphase-DFT structure can be built by 
adding very simple nodes in appropriate data-flow paths, without modifying the polyphase 
network or FFT node already tested and compiled in a library. A large number of logic 
gates also can be handled conveniently.
The reasons for the implementation of DOSS can be clearly seen from the above example. 
Let us first look at the node function:
tx_filter(impulse, signal, L, rc_fir);
If we change L, the filter length and coefficients of rc_fir have to be redesigned. For 
convenience, most packages try to simulate filters if possible, rather than implement filters 
directly as in DOSS, therefore these packages do not have the variety of filter optimization 
algorithms necessary for use in actual applications. Even if users are prepared to write filter
design procedures, the filter coefficients have to be imported into the package through a
file. In DOSS, the user procedures can be directly called.
Without automatic scheduling, the user has to perform many tasks to control all the stages 
following the node tx_filter whenever L changes. This is necessary in both textual and 
graphical input packages. In DOSS, as in COSSAP, no user action is required.
For investigating multirate systems, the user should be able to deal with lower level details. 
The problems of a graphical input package can be seen in the polyphase interpolator node:
interpo(imp, vec, L, rc);
There is one input connection (imp) but L parallel output connections (vec). To the author’s 
knowledge, COSSAP does not provide such a built-in icon, which allows a user to supply 
only the parameter L and the prototype filter rc. Normally, a graphical icon would have a 
fixed number of inputs and outputs. Even if a variable number of outputs can be connected 
to such an icon, the problems of connecting these outputs to the following stages are 
apparent for a graphical language. It is possible to build an interpolator from L FIR filters 
in COSSAP. However, this user defined icon has to be redrawn whenever L changes.
It is generally agreed that a graphical interface is very good for abstract specification. It
may be possible, but would not be useful, to deal with everything graphically, including
the lowest level details.
A good package should have at least two levels of user interface. In the first level, a new 
user should be able to make good use of the package in minutes or hours. In higher levels, 
experienced users should have more sophisticated support to perform tedious tasks.
In all the graphical packages mentioned above, if users find it inconvenient to build a 
complex function graphically, they have to write library functions in Fortran. The structure
of a library function is not designed for users without internal knowledge of the package. 
Even if users are prepared to deal with the fine details of the package, a function such as 
a fixed rate interpolator would not be useful for another simulation.
DOSS provides the language DOC to simplify the specification of complex systems, 
reducing the need for writing new built-in nodes when new functions are required. The 
additional implementation of a graphical interface would be structured -  nodes and arcs 
are mapped onto icons and lines respectively. The cores of other packages include not only 
manipulations of visual objects, but also ad-hoc source code manipulations to generate 
programs.
The complexity of a graphical interface is often overestimated. Once standards exist,
software tools for graphical interfaces are freely available. With high level language
compilers, the need for machine code programming is much reduced. Graphical tools are
similar, e.g.: X-Window deals with mouse movements, buttons and primitive visual
objects; Motif deals with windows, menu bars and selections. There are higher level tools
with which the overall layout, user interactions etc. are centralised rather than distributed
throughout the program -  the design and maintenance effort of a consistent graphical
interface would be much reduced.
The original purpose of DOC is to reduce programming effort, and to ease algorithm testing. 
Since the paradigm of DOC fits multirate DSP, the library functions are highly reusable. 
A package emerges as the number of functions increases, together with customizations that
perform tedious but routine tasks. The detailed implementation of DOC is described in the following 
sections.
6.4.1.1 Built-in Node
An example of a built-in node, a one sample delay element, is shown in figure 6.2. The 
first type definition section groups all local memories into a single structure. The second
section is the code statements that are executed during simulation. The third section is the
initialization procedure, which is called by a DOC program. This procedure is also divided
into three sections: specification of the number of samples the node consumes and
produces for each execution; allocation and initialization of local memories; informing the
DOC kernel of the address of the group of local memories, and the address of the procedure
to be executed.
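#include "node.h"

/* Section 1: local memories of the node, grouped into one structure */
typedef struct {
    arc *in, *out;
    double last;
} node_data;

/* Section 2: statements executed each time the node is fired */
static int node_code( nd )
node_data *nd;
{
    double new;

    new = get( nd->in );
    put( nd->out, nd->last );
    nd->last = new;
}

/* Section 3: initialization procedure called by a DOC program */
delay( input, output )
arc *input, *output;
{
    node_data *nd;
    int node_code();

    /* samples consumed and produced per execution */
    get_protocol( input, 1 );
    put_protocol( output, 1 );

    /* allocation and initialization of local memories */
    nd = (node_data *) mem( sizeof( node_data ) );
    nd->last = 0;
    nd->in = input;
    nd->out = output;

    /* register the local memories and the procedure with the DOC kernel */
    new_node( "delay", node_code, nd );
}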
Figure 6.2 Built-in Node Implementation
This implementation is passive -  specifying the procedure to be executed later rather than 
executing it immediately. The delayed execution allows structured sectioning of the 
program layout. The program layout of most packages, including BOSS and COSSAP, is not
attractive. In these packages, statements performing functions similar to those in one section of
a DOSS built-in node are scattered throughout the program.
Our prime interests are the functions that a built-in node performs, which are specified by 
the statements in the node_code procedure. In DOC, this procedure is very similar to a 
straightforward procedure in C, except that the input and output samples must be accessed
through the functions put() and get(), and local memories must be accessed through the 
pointer nd. Temporary variables can be used freely; the variable new is introduced 
deliberately here for illustration.
In a DOC built-in node, the programming overhead in supporting data-flow is very small 
compared with packages such as BOSS and COSSAP. The overhead is related to the ratio
    (number of statements in node_code) / (total number of statements)
Programming in DOC is much more pleasant than writing a new built-in node. Since DOC 
can handle complex structures easily, the number of built-in nodes can be kept very small. 
The use of object-oriented languages (OOLs) such as C++ would make the programming
of built-in nodes simpler. The three sections of a DOC built-in node implement
memory allocation, initialization functions and internal functions of objects, which are
primary features of OOLs.
6.4.1.2 Arc
Each arc contains a cyclic buffer and information to regulate the flow of samples. A node 
may have an arbitrary number of input and output arcs, but an arc must join exactly two 
nodes. Let each execution of the transmit node put a maximum of M samples into an arc.
Similarly, let each execution of the receive node take a maximum of N samples from the
arc. With a fixed size buffer (larger than M and N), buffer overflow or underflow will
not occur, since the DOC kernel has the necessary information inside each arc to determine
which nodes should be executed at a given time. However, to avoid deadlock of data-flow,
the minimum buffer size must be the least common multiple (LCM) of M and N. This size
is determined automatically by the DOC kernel, and the allocation of buffer memory is 
hidden from the user and programmer.
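The buffer sizing rule and the cyclic buffer can be sketched in a few lines of C. This is an illustration only, under the assumption that an arc stores samples as doubles in caller-supplied memory; the names arc_buffer_size, cyclic_buf, cb_put and cb_get are hypothetical and are not taken from the DOC kernel.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Buffer size rule from the text: the LCM of the block sizes M and N. */
unsigned arc_buffer_size(unsigned m, unsigned n)
{
    assert(m > 0 && n > 0);
    return m / gcd(m, n) * n;
}

/* Hypothetical cyclic buffer; the field names are illustrative only. */
typedef struct {
    double  *data;   /* storage supplied by the caller */
    unsigned size;   /* capacity in samples */
    unsigned head;   /* next write position */
    unsigned tail;   /* next read position */
    unsigned count;  /* samples currently stored */
} cyclic_buf;

/* Returns 1 on success, 0 if the buffer is full. */
int cb_put(cyclic_buf *b, double v)
{
    if (b->count == b->size)
        return 0;
    b->data[b->head] = v;
    b->head = (b->head + 1) % b->size;
    b->count++;
    return 1;
}

/* Returns 1 on success, 0 if the buffer is empty. */
int cb_get(cyclic_buf *b, double *v)
{
    if (b->count == 0)
        return 0;
    *v = b->data[b->tail];
    b->tail = (b->tail + 1) % b->size;
    b->count--;
    return 1;
}
```

As an example of the deadlock that the rule avoids, take M = 2 and N = 3 with a buffer of only 3 samples: after the first write the receive node still lacks one sample, while the transmit node lacks one free location, so neither node can be executed again.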
6.4.1.3 Dynamic Scheduling
After a DOC program is executed, lists are automatically generated to represent the 
simulated system. Any composite node is a group of built-in nodes joined by arcs. There 
is a single node list linking all built-in nodes. Each element of the node list contains a 
function pointer (node_code), local memory pointer (node_data) and two lists -  a list of 
all input arcs and a list of all output arcs.
The DOC kernel checks each node in turn to see that: all its input arcs contain the maximum 
number of samples that may be consumed by each execution of the node; all its output 
arcs can hold the maximum number of samples that may be produced by each execution 
of the node. If these conditions are satisfied, the node is executed once (fired) via the 
function pointer. This process is repeated on the list until the specified simulation time is 
reached.
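The firing rule above can be sketched as follows. This is a simplified illustration rather than the DOC kernel itself: arcs are reduced to a sample count and a capacity, the structures arc_t and node_t are hypothetical, and a fixed number of sweeps over the node list stands in for the specified simulation time.

```c
#include <assert.h>

#define MAX_ARCS 4

/* Hypothetical arc: only the sample count and the capacity matter here. */
typedef struct { int count, size; } arc_t;

/* Hypothetical node record with per-firing consumption and production. */
typedef struct {
    int    n_in, n_out;
    arc_t *in[MAX_ARCS];
    int    need[MAX_ARCS];   /* samples consumed from in[i] per firing */
    arc_t *out[MAX_ARCS];
    int    make[MAX_ARCS];   /* samples produced on out[i] per firing */
    int    fired;            /* number of firings so far */
} node_t;

/* A node may fire if every input holds enough samples and every output
 * has enough free space. */
static int can_fire(const node_t *nd)
{
    int i;

    for (i = 0; i < nd->n_in; i++)
        if (nd->in[i]->count < nd->need[i])
            return 0;
    for (i = 0; i < nd->n_out; i++)
        if (nd->out[i]->size - nd->out[i]->count < nd->make[i])
            return 0;
    return 1;
}

/* Firing only moves sample counts in this sketch; a real node would also
 * run its node_code procedure. */
static void fire(node_t *nd)
{
    int i;

    for (i = 0; i < nd->n_in; i++)
        nd->in[i]->count -= nd->need[i];
    for (i = 0; i < nd->n_out; i++)
        nd->out[i]->count += nd->make[i];
    nd->fired++;
}

/* Sweep the node list 'passes' times; returns the total number of firings. */
int run_schedule(node_t *nodes, int n_nodes, int passes)
{
    int p, i, total = 0;

    assert(n_nodes > 0);
    for (p = 0; p < passes; p++)
        for (i = 0; i < n_nodes; i++)
            if (can_fire(&nodes[i])) {
                fire(&nodes[i]);
                total++;
            }
    return total;
}
```

For a source feeding an interpolate-by-2 node feeding a sink, the sink fires twice for every firing of the interpolator without any user-written control, which is the behaviour that automatic scheduling is meant to provide.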
6.4.1.4 Hardware Computation
All samples are represented by double precision numbers normalized to one. This avoids 
scaling as in integer arithmetic, and allows uniform handling of samples. In DOC, fixed 
precision arithmetic can be realized in two ways.
In the first method, all fixed precision arithmetic is performed by arithmetic operator 
nodes, which use only floating point operators internally. The output samples of each node 
are modified according to user specifications (precision and mode) stored in each arc.
In the second method, function calls such as z = mul( x, y ) are used in nodes. The same 
arithmetic function call can be assigned to (via function pointers) different functions that 
carry out the required user specifications.
The arithmetic behaviour of nodes can be modified globally or locally as a group. For 
example:
host();
node0(...); node1(...);
tms( saturate, 10 );
node2(...);
The first two nodes will behave naturally as defined by the arithmetic operations in the 
host computer. The third node will conform to arithmetic operations in TMS processors 
with the saturation mode turned on, except that the word length used is 10 bits. (When 
overflow occurs during accumulation, the saturation mode specifies that the result should
be the largest positive number, or the most negative number.) The word length parameter 
is also an ordinary parameter in DOC, therefore iterations can be performed using batch 
execution.
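The second method, in which one arithmetic call is redirected through a function pointer, might look like the sketch below. It is not the DOSS code: use_host and use_fixed are hypothetical counterparts of host() and tms(), the word length is assumed to include the sign bit, and rounding to nearest is an assumed quantization mode.

```c
#include <assert.h>

typedef double (*mul_fn)(double, double);

static int word_length = 16;   /* bits including sign; an assumed convention */

/* Round v (nominally in [-1, 1)) to word_length bits, saturating on overflow. */
static double quantize_saturate(double v)
{
    double scale = (double)(1L << (word_length - 1));
    double r = v * scale;
    long q = (long)(r >= 0.0 ? r + 0.5 : r - 0.5);   /* round to nearest */

    if (q > (long)scale - 1)
        q = (long)scale - 1;                          /* largest positive code */
    if (q < -(long)scale)
        q = -(long)scale;                             /* most negative code */
    return (double)q / scale;
}

static double mul_host(double x, double y)  { return x * y; }
static double mul_fixed(double x, double y) { return quantize_saturate(x * y); }

/* Nodes call mul(); the pointer decides which arithmetic is actually used. */
mul_fn mul = mul_host;

/* Select the natural arithmetic of the host computer. */
void use_host(void)
{
    mul = mul_host;
}

/* Select saturating fixed precision arithmetic with the given word length. */
void use_fixed(int bits)
{
    assert(bits >= 2 && bits <= 24);
    word_length = bits;
    mul = mul_fixed;
}
```

Because the word length is an ordinary run-time variable in this arrangement, a batch procedure can call use_fixed with different values between simulation runs.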
Other specialized DSP languages or hardware languages such as VHDL, Silage 
[Rimey88a] and Signal [Guemic86a] support the specification of word lengths etc. 
However, these specifications cannot be varied during run-time. Therefore these languages 
are only good when the required precision is already determined.
6.4.2 Batch Execution
The purposes of simulation are to perform trade-offs and optimizations. This would require 
frequent changing of parameters of the simulated system. All other packages support only
the iteration of a single parameter, e.g., repeated simulation of a system at different
signal-to-noise ratios from 3 to 12 dB, in steps of 1 dB.
The complete simulated system is represented by a single node -  system(). DOSS allows
users to control the batch execution of system() by a C procedure. Since a DOC program
is also a C program, there is little restriction on the control and passing of multiple
parameters between the DOC program and the batch program. Numerical optimizations
can be performed by supplying a search and compare strategy.
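A batch procedure with a simple search and compare strategy over two parameters could be organised as in the following sketch. Everything here is an assumption made for illustration: system_fn is a hypothetical interface for one simulation run, toy_system is a stand-in cost function in place of a real simulated system, and the exhaustive search is only the simplest possible strategy.

```c
#include <assert.h>

/* Hypothetical interface for one simulation run returning a cost figure. */
typedef double (*system_fn)(int word_length, int n_taps);

typedef struct {
    int    word_length;
    int    n_taps;
    double cost;
} batch_result;

/* Toy stand-in for a simulated system; a real run would execute the
 * DOC program with these parameters and measure, e.g., an error figure. */
double toy_system(int word_length, int n_taps)
{
    double dw = word_length - 10;
    double dt = n_taps - 32;

    return dw * dw + dt * dt;
}

/* Exhaustive search and compare over two parameters. */
batch_result batch_search(system_fn run,
                          int wl_min, int wl_max,
                          int taps_min, int taps_max)
{
    batch_result best;
    int wl, nt, first = 1;

    assert(wl_min <= wl_max && taps_min <= taps_max);
    best.word_length = wl_min;
    best.n_taps = taps_min;
    best.cost = 0.0;
    for (wl = wl_min; wl <= wl_max; wl++)
        for (nt = taps_min; nt <= taps_max; nt++) {
            double c = run(wl, nt);
            if (first || c < best.cost) {
                best.word_length = wl;
                best.n_taps = nt;
                best.cost = c;
                first = 0;
            }
        }
    return best;
}
```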
In multirate systems, some parameters affect the system topology, e.g., the interpolation 
rate determines the number of output arcs of a polyphase interpolator. In DOSS, these 
parameters do not require special treatment. For example, the only difference between a 
DOC program of a 2-channel demultiplexer and that of a 256-channel demultiplexer is a 
single parameter -  the number of channels. In a graphical input package, we have to redraw 
the system block diagram whenever the number of channels changes.
6.4.3 Code Generator
DOSS needs only a small number of built-in nodes. In the extreme case, only multipliers, 
adders, and lookup tables etc, are required. Therefore a DOC program can be used in other 
ways.
Each built-in node can generate some textual information to describe the computations 
(code generation mode) instead of performing them (simulation mode). If the text generated
consists of valid statements of a programming language, a code generator results. Since the number
of built-in nodes is small, we can generate code for different target languages with
minimal effort by providing one small source code library for each target.
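The two modes of a built-in node can be illustrated with a constant multiplier. The sketch below is hypothetical and much simpler than a DOSS node: k_mul_fire, run_mode and the explicit buffer names and indexes are assumptions, and the generated text is simply appended to a character buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { SIMULATE, GENERATE } run_mode;

/* Hypothetical constant multiplier node.
 * SIMULATE: computes out[oi] = in[ii] * k and returns 0.
 * GENERATE: appends the equivalent C statement to 'text' and returns the
 *           number of characters that snprintf reports for the statement. */
int k_mul_fire(run_mode mode,
               const double *in, int ii, double *out, int oi, double k,
               const char *in_name, const char *out_name,
               char *text, size_t text_size)
{
    size_t used;

    if (mode == SIMULATE) {
        out[oi] = in[ii] * k;
        return 0;
    }
    assert(text != NULL && text_size > 0);
    used = strlen(text);
    assert(used < text_size);
    return snprintf(text + used, text_size - used, "%s[%d] = %s[%d] * %g;\n",
                    out_name, oi, in_name, ii, k);
}
```

In the generation mode only the buffer names and indexes are needed, not the sample values, so the same node description can serve several target languages by changing the format string.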
One such code generator is Gabriel [Lee89a], which is targeted at generating assembler 
programs for various digital signal processors. The inefficiencies of compilers for DSPs 
are well known. This is because the translation of individual high level language statements
into instructions for DSPs can neither control pipeline operations, nor take advantage of 
specially optimized instructions. Run-time efficiency can be increased by calling optimized 
assembler procedures from high level languages. However, this can be performed using 
code generators, without the overhead of function calls.
Code generation involves the translation of a data-flow oriented language (DOC) into a 
procedural language (target). In the implementation of DOC, the order of the execution of 
nodes (the schedule) is determined at run-time, by searching through the node list. In the 
generated target language, the schedule must be already determined in order to be run-time 
efficient, i.e., static scheduling.
Static scheduling is possible if the simulated system satisfies the Synchronous Data-Flow 
(SDF) model as described in chapter 2. In brief, each execution of a node (firing) consumes
a fixed number of samples from each of its input arcs, and produces a fixed number of samples
on each of its output arcs.
An experimental Data-flow Oriented Code Generator (DOCG) was implemented as part
of DOSS. The following figures illustrate the translation of a data-flow oriented language
(DOC in figure 6.3) into a procedural language (C in figure 6.4).
As each node is fired, code is generated, which may be C statements or function calls.
In the simulation mode, the buffering functions put and get deal with the actual reading
and writing of samples, but in the code generation mode, the indexes into the buffers are
returned. The generated program is one execution cycle. In each execution cycle, some
nodes are fired more frequently than the others, which is represented by the number of 
occurrences of a node in the program. Each node reads and writes blocks of samples in 
some locations (indexes) within buffers (arrays) of fixed size. For each occurrence of a 
node, the indexes to the buffers are the same for all execution cycles. Therefore dynamic 
buffers are not required.
#include "doss.h"
system()
{
    def_arc(src); def_vec(vec,2); def_arc(dst);
    sink(vec[0]);
    dc(src, 1.0); duplicate(src,vec);
    k_mul(vec[1],dst,0.5); sink(dst);
}
Figure 6.3 DOC Program

double vec_0[1];
double src_1[2];
double vec_2[1];
double dst_3[1];

syscode()
{
    /* t=0 */
    /* dc_1 : 0 */
    src_1[0] = 1;
    /* t=1 */
    /* dc_1 : 1 */
    src_1[1] = 1;
    /* t=2 */
    /* duplicate_2 : 0 */
    vec_0[0] = src_1[0];
    vec_2[0] = src_1[1];
    /* t=3 */
    /* sink_0 : 0 */
    /* k_mul_3 : 0 */
    dst_3[0] = vec_2[0] * 0.5;
    /* t=4 */
    /* sink_4 : 0 */
}
Figure 6.4 Generated Program

In DOCG, the approach for static scheduling is totally opposite to that of Gabriel. In the
target language, arcs are replaced by fixed size arrays, i.e., static buffers. In Gabriel, the
schedule is determined first, followed by the sizes of static buffers. This is misleading
since the minimum size of each static buffer can be determined simply by looking at the
two nodes joined by an arc. If the transmit node produces M samples for each execution,
and the receive node consumes N samples for each execution, the static buffer size is the
least common multiple of M and N. The same algorithm is used in the simulation mode for
determining the buffer size of arcs to avoid deadlock, long before the interest in static
buffering. The values of M and N of each arc are maximum values in the simulation mode,
but they must be constants in the code generation mode to satisfy the SDF model.
Gabriel determines the schedule by manipulation of matrix equations that contain the 
values M and N of all arcs. We use a very simple search algorithm. In the code generator 
DOCG, a DOC program is executed and a node list is produced as in the simulation mode. 
In the searching phase, the number of samples in each fixed size buffer is recorded, together 
with the total number of times that a node is fired (but without generating code). The search 
algorithm is:
(1) All arcs are initially empty (zero number of samples).
(2) Fire any one node. This would generate a negative number of samples in
its input arcs (buffer underflow).
(3) A node should be fired if any of its output arcs has a negative number
of samples, and any of its input arcs has a positive number of samples.
(4) The search continues until all arcs are empty again. Each node must be 
fired at least once.
The next code generation phase, which is very similar to the simulation mode, determines 
the order of execution. A node can be fired if:
(1) buffer underflow and overflow would not occur,
(2) the total number of times it should be fired, found in the searching phase, 
has not been reached.
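The searching phase can be sketched in C as shown below. This is one possible reading of the four steps, not the DOCG source: the structure sdf_arc and the function names are hypothetical, step (3) is interpreted as firing a node when either of its two conditions holds (otherwise a node without input arcs could never be selected), node 0 is chosen arbitrarily for step (2), and a limit on the number of firings is added as a guard against graphs with inconsistent rates.

```c
#include <assert.h>

/* Hypothetical arc description for an SDF graph. */
typedef struct {
    int src, dst;   /* indexes of the transmit and receive nodes */
    int produce;    /* samples written per firing of src (M) */
    int consume;    /* samples read per firing of dst (N) */
    int count;      /* signed sample count used during the search */
} sdf_arc;

/* Step (3), read here as "either condition holds" (an assumption). */
static int should_fire(const sdf_arc *arcs, int n_arcs, int node)
{
    int a;

    for (a = 0; a < n_arcs; a++) {
        if (arcs[a].src == node && arcs[a].count < 0)
            return 1;
        if (arcs[a].dst == node && arcs[a].count > 0)
            return 1;
    }
    return 0;
}

/* Firing only updates the signed counts; no code is generated here. */
static void fire(sdf_arc *arcs, int n_arcs, int node, int *fired)
{
    int a;

    for (a = 0; a < n_arcs; a++) {
        if (arcs[a].src == node)
            arcs[a].count += arcs[a].produce;
        if (arcs[a].dst == node)
            arcs[a].count -= arcs[a].consume;
    }
    fired[node]++;
}

/* Fills fired[] with the number of firings of each node in one execution
 * cycle. Returns 1 on success, 0 if max_firings would be exceeded. */
int search_firings(sdf_arc *arcs, int n_arcs, int n_nodes,
                   int *fired, int max_firings)
{
    int i, a, total;

    assert(n_nodes > 0);
    for (i = 0; i < n_nodes; i++)
        fired[i] = 0;
    for (a = 0; a < n_arcs; a++)
        arcs[a].count = 0;                 /* step (1) */
    fire(arcs, n_arcs, 0, fired);          /* step (2) */
    total = 1;
    for (;;) {
        int progressed = 0;

        for (i = 0; i < n_nodes; i++)
            if (should_fire(arcs, n_arcs, i)) {
                if (total >= max_firings)
                    return 0;
                fire(arcs, n_arcs, i, fired);
                total++;
                progressed = 1;
            }
        if (!progressed)
            return 1;                      /* step (4): all arcs are empty */
    }
}
```

For a chain of three nodes whose arcs have (M, N) = (2, 3) and (1, 2), the search ends with 3, 2 and 1 firings respectively, after which every arc is empty again.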
In this experimental version of DOCG, only C programs are generated from DOC. The 
advantages are:
(1) Static scheduling eliminates run-time inefficient dynamic buffers.
(2) Variable types can be implemented. Without code generation, all arcs must have 
the same data type for the samples that pass through. For example, all samples have 
to be represented by double precision variables. Using code generation, different 
types of static buffers can be used in the generated program, e.g., integers for fixed 
point numbers.
(3) The procedural program generated provides an additional level of debugging. The 
task of manual translation into signal processor languages is easier, since the code 
generator prints out constant data, and performs automatic scaling etc.
(4) Run-time efficient code can be generated for simulating fixed point arithmetic.
Without code generation, all samples are represented by floating point numbers
even when they can be represented by integers. Floating point arithmetic
is usually slower than integer arithmetic. In addition, the precision of floating point
results has to be readjusted after every operation to simulate integer arithmetic.
Using code generation, efficient C statements can be generated according to the 
type and word length of samples, and these statements are further optimized by the 
C compiler.
The technique of in-line code generation has been used to speed up computation in a general 
purpose computer. The target code may be assembler code or microcode inserted in a high 
level language. In the simulation mode, all calls to a built-in node share the same program
code, but this code is duplicated in the code generation mode. Therefore DOCG is
inherently designed to generate high level in-line code (efficient C statements) on a
massive scale. When the simulated system is complex, the large number of statements may
slow down execution significantly, especially for a RISC machine, which is highly unlikely
to be optimized for the unusual style of the generated program. One attractive solution
to reduce program size is to generate procedures that can be shared. These efficient 
procedures also can be used in the simulation mode, which is necessary to deal with systems 
that do not satisfy the SDF model.
Code generation is also useful for DSP systems implemented using parallel processors. 
Since DOC is a data-flow language, it does not inhibit inherent parallelism in DSP 
algorithms. Given the number of processors and the topology, an efficient parallel schedule 
can be found from a DOC program. One program is then generated for each processor. 
The programming languages for these processors may be procedural as in DSPs, or 
data-flow oriented as in transputers.
6.4.4 Software Development and Maintenance
DOSS has been developed under the Software Project Management System (SPMS), which 
is used to integrate almost all Unix programming tools. SPMS is targeted at the coding 
and maintenance phase. Since the DOSS kernel is small, design phase software tools would 
not be very useful.
Through the use of SPMS, facilities useful for both programmers and users can be provided 
economically. The automatically generated makefiles include facilities such as consistency 
checking (lint), source code control (rcs), installation of programs and libraries into 
allocated directories etc. On-line documentation can be entered and retrieved conveniently. 
Maintenance of DOSS is also simplified, with provision of directory dictionaries, and 
extension of Unix commands to operate on selected directory types.
The core of DOSS has undergone several major changes: to solve new problems 
encountered; to improve consistency and efficiency; to simplify input syntax. However, 
the design of the language DOC hardly changed. Old libraries of DOC procedures are 
usually reusable without source code modification. Compatibility is maintained by 
introducing a few macros.
6.4.5 Integration of DSP Tools
Since DOSS is implemented in C, packages on the Unix system can be easily integrated
by directly calling stand alone programs or custom C-shell scripts. For example, the
following utilities are necessary and readily available:
(1) AT&T System V Unix Graphics Package for fast plotting of data on 4010 series 
compatible terminals or emulators, and statistical data analysis such as variance and 
frequency distribution.
(2) GNU Plot for generating hardcopies of different graphic formats, and emulating 
different terminals.
(3) A modified Ratfor version of the Parks and McClellan filter design program that 
supports automatic searching to meet filter specifications.
6.5 References
Guemic86a. P. Le Guernic et al., "SIGNAL - A data flow-oriented language for
signal processing," IEEE Trans. Acoust., Speech, Signal
Processing, Vol. ASSP-34, April 1986.
Lee89a. E. A. Lee, "Gabriel: A Design Environment for DSP," IEEE Trans.
Acoust., Speech, Signal Processing, Vol. ASSP-37, Nov. 1989.
Messerschmitt84a. D. G. Messerschmitt, "A tool for structured functional simulation,"
IEEE J. Select. Areas Commun., Vol. SAC-2, Jan. 1984.
Rimey88a. K. Rimey and P. Hilfinger, "A compiler for application specific signal
processors," in VLSI Signal Processing III, IEEE Press, New York
(1988).
Conclusion and Future Work
In the study of all-digital multicarrier demodulators, we have identified structured 
approaches for the processing of bandpass signals in linear systems. The use of complex 
signals allows us to focus on the functions of linear systems, rather than their realizations, 
which have numerous possibilities in DSP. Simple concepts are emphasised: one-sided 
frequency translation of signals and filters; sufficient sampling rate in terms of two-sided 
signal bandwidths; sampling invariance. Complex signals are highly desirable in describing 
bandpass processing, although not always in a simple way, e.g., one can state that the I.F. 
stage of an AM receiver performs a Hilbert transform together with a one-sided frequency 
shift.
The novel theory of polyphase-matrix replaces many ad-hoc approaches in multirate DSP. 
A single expression for a polyphase-matrix filter includes interpolators, decimators and 
sampling rate converters as special cases. The most general form of the polyphase-matrix 
filter can have an arbitrary integer ratio between the input and output sampling rate. This 
means that we can parallelise a single-rate filter as well as a conventional polyphase filter 
to meet special demands, thus replacing many ad-hoc techniques. The decomposition of 
a prototype filter into a polyphase-matrix is also represented by a single, simple expression, 
which replaces many tedious and error-prone procedures. Polyphase-matrix theories 
always guarantee a straightforward signal-flow graph representation (hence simple 
control), minimum delay elements and minimum computation rate. Making use of filter 
symmetry reduces the computation rate further, but the delay elements and control
complexity are necessarily increased. There may be opportunities for further research work 
on structured exploitations of filter symmetries and their resulting overheads. Two 
applications of polyphase-matrix theories are central to all-digital implementations of 
MCDs — polyphase-lattice filter and polyphase-matrix-DFT filter bank.
A polyphase-lattice filter changes not only the sampling rate, but also the sampling phase. 
This is a very attractive method for symbol timing adjustment in all-digital demodulators. 
Since the exact relation between the transmitter and receiver symbol clocks is unknown, 
the ratio of the input and output sampling rates of a polyphase-lattice filter cannot be fixed. 
Another application is for known rational ratios of mutually prime but large integers. A 
straightforward sampling rate converter would require a huge ROM to store the large 
number of filter coefficients. The ratio of large integers can be approximated by smaller 
ones, with periodic changes of sampling phases to spread the approximation errors evenly. 
This can be implemented by a polyphase-lattice filter with a counter.
A single expression for a polyphase-matrix-DFT filter bank includes all efficient filter 
bank techniques (for non-overlapping channels) as special cases. Many ad-hoc approaches 
can be replaced, and the selection of filter bank structure becomes the choice between 
single-stage (low computation rate, large memory) and multistage (high computation rate, 
small memory). The advantages of multistage tree structures are: simple partitioning for 
multichip implementations; efficient handling of non-uniform bandwidth channels, which 
may be required for satellite applications. Also, the availability of a single concise 
expression and signal-flow graphs derived from it simplify the design and implementation 
of filter banks, particularly multistage ones.
We have seen that the approximate computation rate of a filter bank increases linearly with
the number of channels. This removes the fear that a MCD for a large number of FDM 
channels is much more complicated than a single TDM demodulator. Ideally, the chip area 
consumed in VLSI implementation varies linearly with the throughput (symbol rate times 
number of channels). Therefore the complexity of MCDs does not lie in the chip volume 
and power consumption, but in the search for algorithms and computational structures to 
match a given throughput, such that chip area is minimized. FDM has advantages in 
allowing parallel processing to achieve high throughput, e.g., a single TDM channel may 
not be as suitable as the splitting into two or more FDM subchannels at a very early stage.
A promising trend in multirate DSP is the use of over-sampling plus differential 
quantization, e.g., all samples are represented by the numbers -1,0,+1. This type of 
processing has already been used internally in high speed A/D converters. Although the
apparent functions of these converters are conventional (e.g., 8-bit absolute quantization at
the output), they use differential quantization internally and contain multistage 
decimation filters. In VLSI design, there may be simplifications by processing the 
differential quantized signals directly. Another advantage is the reduction of bus width, 
which may result in more efficient use of chip area for particular applications. Differential 
quantization requires, e.g., a 2-bit bus, while absolute quantization requires, e.g., an 8-bit 
bus. The importance of bus width can be seen in bit-serial processing, where absolute 
quantization is used, but a serial (1-bit) bus is used for interconnection. In this case, 
however, additional complexity is required in the form of a word mark or a separate control 
signal to denote the beginning of each word; otherwise, automatic recovery after a transient 
fault is not possible.
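As a hedged illustration of over-sampling plus differential quantization, the following Python sketch tracks an over-sampled sine wave with a first-order differential quantizer whose output is restricted to -1, 0, +1, so that a 2-bit bus suffices between stages. The quantizer structure, the step size, the over-sampling factor and the names `ternary_delta_encode` and `ternary_delta_decode` are assumptions made for this example; they are not taken from the thesis or from any particular A/D converter.

```python
import math

def ternary_delta_encode(samples, step):
    """Encode samples as differences in {-1, 0, +1} (illustrative quantizer)."""
    acc = 0  # integer accumulator; the running estimate is step * acc
    codes = []
    for x in samples:
        err = x - step * acc          # error against the previous estimate
        if err > step / 2:
            d = 1
        elif err < -step / 2:
            d = -1
        else:
            d = 0
        acc += d
        codes.append(d)
    return codes

def ternary_delta_decode(codes, step):
    """Rebuild the absolute-valued signal by accumulating the ternary codes."""
    acc = 0
    out = []
    for d in codes:
        acc += d
        out.append(step * acc)
    return out

# Assumed parameters: 64 samples per cycle keeps the per-sample change
# of a unit-amplitude sine (at most 2*pi/64) below the step size.
OVERSAMPLING = 64
STEP = 0.1
signal = [math.sin(2 * math.pi * n / OVERSAMPLING) for n in range(4 * OVERSAMPLING)]
codes = ternary_delta_encode(signal, STEP)
recon = ternary_delta_decode(codes, STEP)
max_err = max(abs(x - y) for x, y in zip(signal, recon))
```

In this sketch, as long as the input changes by no more than one step per sample, the tracking error stays within half a step, which is why a high over-sampling ratio permits such coarse differences.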
Digital modulation schemes are the synthesis of continuous signals to meet certain 
requirements. A structured approach allows us to focus on the characteristics of the 
artificially generated signal, rather than ad-hoc generating methods. For example, all linear 
modulation schemes can be represented by a single expression, which can be directly 
implemented using multirate DSP. If power and bandwidth efficiency are the only 
concerns, we do not need to consider schemes other than linear modulation using 
raised-cosine filters: the modulated signal is band-limited; trellis coding allows highly 
flexible combinations of power and bandwidth efficiency; raised-cosine pulse shaping is 
the simplest for matched filter detection. Constant envelope schemes are targeted at 
non-linear channels, but their bandwidth efficiencies are less attractive than linear schemes, 
since the modulated signals generally have theoretically unlimited bandwidths. Although 
there are many schemes in this class, a single expression can also represent all the signal 
formats, in terms of phase pulse and frequency pulse shapes.
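The single expression for linear modulation is not reproduced in this chapter; a minimal sketch, assuming the standard discrete-time form s[n] = sum_k a_k g[n - kL] (symbols a_k, pulse g, L samples per symbol), shows how it maps onto a multirate (polyphase interpolator) structure. The function names, the QPSK symbols, the roll-off of 0.35 and the truncation span are all assumptions made for illustration.

```python
import math

def raised_cosine(t, beta):
    """Raised-cosine pulse with unit symbol period and roll-off beta."""
    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    denom = 1.0 - (2.0 * beta * t) ** 2
    if abs(denom) < 1e-9:
        # Limit at the removable singularity t = +/- 1/(2*beta)
        return (math.pi / 4.0) * sinc(1.0 / (2.0 * beta))
    return sinc(t) * math.cos(math.pi * beta * t) / denom

def pulse_taps(L, span, beta):
    """Sample the pulse at L samples per symbol over +/- span symbols."""
    return [raised_cosine((m - span * L) / L, beta) for m in range(2 * span * L + 1)]

def modulate(symbols, taps, L):
    """Compute s[n] = sum_k a_k g[n - k L] using L polyphase branches."""
    branches = [taps[p::L] for p in range(L)]   # g_p[j] = g[j L + p]
    n_out = (len(symbols) - 1) * L + len(taps)
    out = []
    for n in range(n_out):
        q, p = divmod(n, L)                     # output phase p selects a branch
        acc = 0j
        for j, coeff in enumerate(branches[p]):
            k = q - j
            if 0 <= k < len(symbols):
                acc += symbols[k] * coeff
        out.append(acc)
    return out

# Assumed example: five QPSK symbols, 4 samples per symbol.
L, SPAN, BETA = 4, 6, 0.35
symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j, 1 + 1j]
taps = pulse_taps(L, SPAN, BETA)
s = modulate(symbols, taps, L)
```

Because the raised-cosine pulse is zero at every non-zero multiple of the symbol period, sampling `s` at index `SPAN * L + k * L` returns symbol `k` without intersymbol interference. Changing only the symbol alphabet or the pulse yields other linear schemes from the same structure.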
Opportunities exist for the design of new modulation schemes: improving the bandwidth 
efficiency of constant envelope schemes by allowing small variations in the signal 
envelope; schemes that are targeted at reducing demodulator complexity; fade tolerant 
schemes. One promising direction is the use of offset PSK schemes, with restricted 
transitions between trellis-points, and pulse shapes other than raised-cosine.
One problem in all-digital modem implementation is the design of analogue anti-aliasing 
filters preceding A/D conversion, and analogue anti-imaging filters following D/A 
conversion. Since digital modulation schemes are sensitive to phase distortion, 
approximately linear-phase filters should be used. However, linear-phase filters (e.g., Bessel) 
have wide transition bands that demand high sampling rates. To enable low sampling rates, 
sharp cut-off filters must be used, but they have undesirable phase characteristics that 
require phase equalizers. One logical approach in designing the overall transmitter-receiver 
pulse shape is to take into account the particular phase characteristics of analogue filters. 
Other possible improvements in filter design are: co-ordinated optimization of matched filters and 
other digital filters in the receiver; and run-time efficient optimization of root raised-cosine 
filters (used for matched filters), for which no method exists, although computer optimization 
methods to obtain minimum-ISI raised-cosine filters are well developed.
Although there are numerous synchronization techniques, their performance and 
complexity are reflected by three different classifications: criteria (e.g., Maximum 
Likelihood versus Minimum Mean Square Error); data dependency (e.g., data-aided versus 
decision-directed); and the method of parameter extraction (e.g., trackers versus Kalman 
filters). The designs of two new algorithms are presented: a differential decision frequency 
error detector that is simple and fast; a dual-comb-filter frequency/timing error detector 
that is targeted at VLSI implementation. Future research should address the design of 
algorithms and demodulator structures suitable for VLSI. Algorithms developed without taking 
hardware into account are often suitable for digital signal processors only. Current 
research efforts are mostly concentrated on high-bit-rate demodulators, which require the 
designing of chip-sets. Efficient single-chip implementation of multichannel, low-bit-rate 
demodulators would have different requirements.
The real-time implementation of a complete 4 x 16 kb/s MCD for the T-SAT project 
illustrated that all-digital modems need not be complex, and there is no inherent limitation 
on their performance. Most of the MCD functions were realized in two TMS320C25 chips, 
for the demultiplexer and demodulators respectively. An implementation loss of less than 
0.7 dB was easily achieved, and the acquisition time of 500 symbols with random data 
is fast compared to other analogue or digital demodulators of similar complexity. Also, 
structured design concepts developed in this thesis were proven: sufficient sampling rate; 
sampling invariance; a special case of polyphase-lattice filters.
The simulation system DOSS should continue as a valuable tool for future research. In the 
past few years, packages such as BOSS waxed (became commercially available) and waned 
(technical support discontinued). In the case of TOPSIM, earlier versions are macro based 
(hence non-portable). Version III is a complete redevelopment using Fortran. Version IV 
allows multirate sampling with a graphical interface under development; both features 
would require extensive redevelopment. Therefore the life-cycles of these off-the-shelf 
simulation packages are short, and should not be confused by the use of the same generic
name. In contrast, DOC programs fit the data-flow paradigm and hence have hardly needed 
any modification or recompilation since the beginning of DOSS. The DOSS kernel is 
designed to be structured, and is so simple and manageable (especially with SPMS) that a new 
kernel can be completed in days, using the modules already developed, e.g., generic list 
processing. As a result, the experiment of code generation hardly needed any extra 
programming. Also, DOSS provides sophisticated multirate structures that are easy to use, 
whereas published simulation results of synchronization using COSSAP all avoid the need for 
timing adjustment, which poses the same inherent problems for a graphical interface as 
multirate structures. What DOSS lacks are channel models. However, in the development 
of countermeasures for a particular channel characteristic, e.g., fading, one would benefit 
greatly from the development of the modelling process. One reason is that one cannot 
always assume that the models supplied by any packages are valid and highly realistic.
Mixed-level simulation has always been demanded by users, e.g., combining 
communication network, link and signal processing. Yet, no such package exists to date. 
From the software engineering viewpoint, the provision of features and functions has to be 
sensible. One research area that should be supported in a signal processing package is 
coding. Modulation and coding are inherently inseparable, as evident in trellis coding. 
Coding involves multiple bit-rates, which also demands a data-flow software architecture 
as in multirate sampling. Coding and DSP operate on different number fields, but these 
fields are crossed in soft decision. The synergy of coding and DSP can be illustrated by 
the fact that a linear convolution coder can be implemented by a polyphase-matrix filter, 
operating on the binary number field.
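The last point can be sketched in Python under stated assumptions: a rate-1/2 convolutional coder is written as two FIR filters whose arithmetic is carried out over the binary field (addition is XOR), with the two branch outputs interleaved as in a 1-to-2 polyphase interpolator. The generator taps (7, 5 in octal, constraint length 3) and the function names are chosen for illustration only; the polyphase-matrix formulation in the thesis may be more general than this single-input case.

```python
def gf2_fir(bits, taps):
    """FIR filter over the binary field: multiply is AND, add is XOR."""
    out = []
    for n in range(len(bits)):
        acc = 0
        for j, t in enumerate(taps):
            if t and n - j >= 0:
                acc ^= bits[n - j]
        out.append(acc)
    return out

def conv_encode(bits, generators):
    """Run one binary FIR branch per generator and interleave the outputs."""
    branches = [gf2_fir(bits, g) for g in generators]
    out = []
    for n in range(len(bits)):
        for branch in branches:      # commutator across the polyphase branches
            out.append(branch[n])
    return out

# Assumed generators: 111 and 101 (octal 7 and 5), encoder starts in the zero state.
GENERATORS = [[1, 1, 1], [1, 0, 1]]
coded = conv_encode([1, 0, 1, 1], GENERATORS)
print(coded)  # prints [1, 1, 1, 0, 0, 0, 0, 1]
```

Since each branch is a linear filter over the binary field, the encoding of the XOR of two messages equals the XOR of their encodings, which is the sense in which the coder is linear.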
