Digital Signal Processing Techniques for On-Board Processing Satellites by Kwan, Ching Chung
DIGITAL SIGNAL PROCESSING TECHNIQUES 
for 
ON-BOARD PROCESSING SATELLITES 
Thesis submitted to the University of Surrey 
for the degree of 
Doctor of Philosophy 
Ching Chung KWAN
Department of Electrical and Electronic Engineering, 
University of Surrey, 
Guildford, 
United Kingdom. 
March, 1990
Abstract 
In on-board processing satellite systems in which FDMA/SCPC access schemes are
employed, transmultiplexers are required for the frequency demultiplexing of the SCPC
signals. Digital techniques for the implementation of the transmultiplexer for such
applications were examined in this project. The signal processing in the transmultiplexer
operations involved many parameters which could be optimized in order to reduce the hardware
complexity whilst satisfying the level of performance required of the system. An approach
for the assessment of the relationship between the various parameters and the system
performance was devised, which allowed the hardware requirements of practical system
specifications to be estimated.
For systems involving signals of different bandwidths a more flexible implementation
of the transmultiplexer is required, and two computationally efficient methods, the DFT
convolution and the analysis/synthesis filter bank, were investigated. These methods gave
greater flexibility to the input frequency plan of the transmultiplexer, at the expense of
increased computational requirements. Filters were then designed to exploit specific
properties of the flexible transmultiplexer methods, resulting in considerable improvement in
their efficiencies. Hardware implementation of the flexible transmultiplexer was considered,
and an efficient multi-processor architecture, in combination with parallel processing
software algorithms for the signal processing operations, was designed.
Finally, an experimental model of the payload for a land-mobile satellite system
proposal, T-SAT, was constructed using general-purpose digital signal processors and the
merits of the on-board processing architecture were demonstrated.
Acknowledgements 
I would like to express my gratitude to my supervisors, Prof. B. G.
Evans and Mr. F. P. Coakley, without whose guidance and close attention over the years
this work would never have come into being.
The Science and Engineering Research Council is gratefully acknowledged for its
sponsorship through the T-SAT project.
My heartfelt thanks also go to the many friends and fellow researchers, in particular
L. N. Chung, Martin Lee, C. S. Leung, Isaac Ng and many others who have given me much
encouragement and practical support through the many challenges, academic and
otherwise, but I would need to save my deepest respect and acknowledgements for my parents,
whose love and generosity I would never be able to repay enough.
Last but never least, to Theresia, who has been my joy and strength through the
darkest hours in my conquest, I wish to dedicate this thesis.
Contents 
Abstract
Acknowledgements
Contents
1 Introduction
1.1 General Development in Satellite Communication Systems
1.2 Mobile Satellite System Requirements
1.2.1 User Requirements
1.2.2 Baseline System Architecture and Parameters
1.2.2.1 Advantages of On-Board Processing
1.2.2.2 Multiple Spot-beam Antennas
1.2.2.3 Orbits
1.3 On-Board Processing Systems and Architectures
1.3.1 Access Methods
1.3.1.1 TDMA Systems
1.3.1.2 SCPC-FDMA System
1.3.2 OBP Architecture
1.3.2.1 Baseband Switch
1.3.2.2 Multicarrier Demodulators
1.3.2.3 On-Board Processor
1.4 Technology and Implementation Aspects
1.4.1 All-Analogue Implementation
1.4.2 Hybrid Analogue-Digital Implementation
1.4.3 All Digital Implementation
1.4.4 Prospects and Major Problems of OBP Systems
1.5 Summary and Project Objectives
2 On-Board Transmultiplexer
2.1 Introduction
2.2 Theoretical Background
2.2.1 Multirate Filtering
2.2.2 Multirate Filtering with Comb Filters
2.2.3 Tree Filter Bank
2.2.4 Summary
2.3 Analysis of Parameters in Implementation
2.3.1 Statistical Properties Multicarrier SCPC Signals
2.3.1.1 Variance of SCPC QPSK Signal
2.3.1.2 Dynamic Range of Multicarrier Signals and ADC Scale Factor
2.3.2 Filter Specification
2.3.2.1 Stopband Attenuation
2.3.2.2 Other Parameters
2.3.2.3 Filter Length
2.3.3 ADC Quantization Wordlength
2.3.4 Filter Coefficient Wordlength
2.3.5 Filter Arithmetic Wordlengths
2.3.5.1 Scaling of Coefficients
2.3.5.2 Arithmetic Roundoff Noise
2.4 Computer Simulation
2.4.1 General Procedures
2.4.2 System Parameters
2.4.2.1 Filter Specification
2.4.2.2 Finite Wordlength Effects
2.4.3 Results
2.4.4 Inferences from Results
2.5 Conclusions
Appendix 2A. Optimum Signal Detection by Matched Filter
3 Flexible Transmultiplexer by DFT Convolution Methods
3.1 Introduction
3.1.1 Problem Description
3.1.2 Review of Methods
3.2 Non-uniform TMUX by DFT Convolution
3.2.1 Description of Method
3.2.2 Reduction of Processing Complexity
3.2.2.1 Reduced Frequency Windowing
3.2.2.2 Analysis of Degradation due to Forced Stopband Zeros
3.3 Filter Design for Variable Bandwidth TMUX
3.3.1 Design Criteria
3.3.2 Computational Methods
3.3.3 Results
3.4 Issues in Implementation
3.4.1 Practical Details
3.4.2 Computation Requirement
3.5 Summary
4 Flexible Transmultiplexer by Analysis/Synthesis Filter Banks
4.1 Introduction
4.2 Theory of Analysis/Synthesis Filter Banks
4.2.1 Review of Perfect Reconstruction Filter Bank Theory
4.2.2 Partial Reconstruction Filter Banks
4.2.3 Filter Design for Partial Reconstruction Filter Banks
4.3 Filter Design and Implementation Techniques for Partial Reconstruction
4.3.1 Filter Design
4.3.1.1 Design of Analysis Prototypes
4.3.1.2 Design of Synthesis Prototypes
4.3.1.3 Design Examples
4.3.2 Implementation Techniques
4.3.2.1 DCT Implementation Via DFT
4.3.2.2 Implementation of 2-D DFT by Polynomial Transform
4.4 Computation Requirements
4.5 Summary
Appendix 4A. Discrete Cosine Transform Filter Bank
Appendix 4B. Polynomial Transform Implementation of Radix-2, 2-Dimensional DFT
5 DSP Hardware Design for Flexible Transmultiplexers
5.1 Introduction
5.2 Hardware Capability Tradeoffs
5.2.1 Uniform Frequency Allocation and Fractionally-Related Bandwidths
5.2.2 Unconstrained Frequency Allocation
5.3 Multiprocessing Architectures for TMUX Implementation
5.3.1 Necessity for Multiprocessing Implementation
5.3.2 Parallel Implementation of DFT Algorithms
5.3.2.1 Review of Parallel Architectures
5.3.2.2 Algorithms for the Parallel Architecture
5.3.2.3 Methodology for Reconfigurable FFT Architectures
5.3.3 Parallel Implementation of FIR Filters
5.3.3.1 Parallel Algorithms for FIR filters
5.3.3.2 Processor Interconnection and System Operation
5.4 Efficient Techniques for Digital Signal Processors
5.4.1 Use of Multiplier-Accumulator in N-path implementation
5.4.2 Prime-Length DFT using Multiplier-Accumulator
5.4.3 Global Memory Interface Design
5.5 System Realization and Design Examples
5.5.1 DFT Convolution Realization
5.5.1.1 System Realization
5.5.1.2 Design Example
5.5.2 Analysis/Synthesis Filter Bank Realization
5.5.2.1 System Realization
5.5.2.2 Design Example
5.6 Summary
Appendix 5A. Winograd Fourier Transforms for Small Prime Lengths
Appendix 5B. Fourier Transform Algorithms for Multiplier Accumulator
Appendix 5C. TMS320C25 Assembly Language Subroutine for size-7 Winograd Fourier Transform
6 On-Board Processing for the T-SAT Payload
6.1 T-SAT System Description
6.1.1 System Overview
6.1.2 SCPC/TDM System
6.1.3 TDMA/TDM System
6.2 Experimental Model
6.2.1 On-board Processing Subsystem Design Concept
6.2.2 Analogue Front-end and TMUX Hardware
6.2.2.1 Analogue Front-end
6.2.2.2 TMUX Hardware
6.2.3 Formatter
6.2.3.1 Overall Operation
6.2.3.2 TDMA Formatter Realization
6.2.3.3 SCPC Formatter Realization
6.2.3.4 Convolutional Coder
6.2.3.5 Data I/O and Synchronization
6.2.3.6 Computational Requirement
6.3 Summary
Appendix 6A. T-SAT Link Budgets
7 Conclusions and Suggestions for Future Work
7.1 Conclusions
7.1.1 Objectives Achieved by Digital Processing Techniques
7.1.1.1 Uniform-Bandwidth TMUX
7.1.1.2 Flexible Transmultiplexer
7.1.1.3 Implementation of TMUX
7.1.1.4 OBP System
7.1.2 Difficulties in the Proposed Techniques
7.1.2.1 Computation Requirement
7.1.2.2 Realization of Flexible TMUX
7.2 Suggestions for Future Work
References
1 Introduction
1.1 General Development in Satellite Communication Systems 
Satellite communication systems have grown rapidly over the past two decades into a
large number of systems having capabilities not dreamt of in their early days. The original
system of SYNCOM 3 satellites launched into the geosynchronous orbit in 1964 has been
joined by many others that cater for different requirements. Advances in technology have
also made it more economical for a greater part of the population to have access to
such systems. Prominent examples of these are transoceanic links for international
telephony and TV distribution.
However, recent technological advances have also brought many other systems into
direct competition with satellite communication systems. Optical fibre cables are
threatening to replace the traditional role of satellites in long-haul communication links,
as their capital costs continue to decrease in relation to the traffic-carrying capacity. For
comparison, the coaxial trans-Atlantic submarine cable TAT 7 and its predecessors have
capacities much less than their counterparts of the INTELSAT series of satellites up to
INTELSAT V. With optical fibres, the channel bandwidth can support up to almost 2
Gbit/s, and with attenuation of 0.01 dB/km possible in the foreseeable future, very
economical communication links can be implemented and make themselves more attractive
than satellites [1,2]. Cellular radio networks have also become a reality in a very short
period of time for both mobile and portable communication, fulfilling what was once
proposed as an application for satellites [3]. The use of terrestrial networks will most likely
become more widespread, as reflected by the imminent arrival of the pan-European cellular
network GSM [4]. Personal communication systems for international use are also
undergoing rapid development using facilities provided by existing terrestrial networks, for
example the proposed CT-2 system for personal mobile telephony, rather than using a separate
system that could require satellites.
Under such intense competition, it is therefore important to concentrate development
of future satellite systems on areas that are still unexplored territory, uncatered
for by present systems and projected alternatives. Much effort has been given to identifying
such markets, as reflected by many independent studies conducted by different
organizations [5-11]. Although these studies point to a diversity of application areas, there are
some common characteristics unique to the satellite system which give rise to the many
possibilities. These characteristics are summarized below.
(1). Wide coverage: global coverage is facilitated by a small number of satellites. 
(2). Flexibility: links between any points within the coverage area can be established 
with the minimum of system reconfiguration. 
(3). High capacity: with the exception of the optical system, a satellite system has higher
traffic-handling capacity than all other terrestrial systems (such as microwave LOS
links, HF systems).
(4). Speed of deployment: a satellite system is established as soon as the spacecraft is in 
orbit and accessible by earth stations. 
(5). Connectivity: satellite systems have the inherent ability to connect any points within 
the coverage area, and are not limited by initial system configuration. 
(6). Accessibility: geographically severe conditions pose no barriers to the setting up of 
satellite communication links. 
In addition to these characteristics which are inherent in the satellite system, the
application of advanced technology also offers possibilities for future satellite systems.
Improved spacecraft design, higher spacecraft antenna gain with shaped spot beam
antennas [12], and efficient solid-state GaAs devices and monolithic microwave integrated circuits
(MMIC) [13,14] have enabled smaller, cheaper earth stations. Frequency reuse techniques
using spatial isolation or polarization discrimination [15] have allowed greater bandwidth
efficiency than would have been achieved conventionally. In addition, with China, the
Soviet Union and possibly Japan offering launch capabilities in competition with the U.S.
and Europe, launch costs will progressively be reduced, which further enhances the
economy of satellite systems.
Also to be taken into consideration is the rapid growth of information systems in
networking environments, with the ultimate goal of a universal integrated services
digital network (ISDN) satisfying all telecommunication needs through PTT networks.
The essential concept is a common standard for data communications satisfying a number
of different requirements such as voice, TV broadcasting and computer data transfer.
Coupling these factors with the characteristics of the satellite system discussed before, the
prospects of future satellite systems become more apparent. The approach towards future
system concepts should aim to exploit their potential advantages over the existing and
prospective terrestrial systems, while filling in the gap in the market still left open by them.
At the same time future satellite systems and services should be capable of integrating
into the global communication environment that is beginning to take shape, rather than
competing with it. These points have been made by many authors, with particular
examples in certain studies [16-20] that show synergy between various communication
systems.
One market area, widely acknowledged by many authors, is that of the mobile
satellite system, in particular land mobile for the larger continental areas. Digital
communication techniques are preferred for their advantages over analogue systems, with a view
also to facilitating convenient integration into the future ISDN environment. This project
is therefore concerned with the investigation of techniques that will contribute towards
achieving these goals.
1.2 Mobile Satellite System Requirements 
The services to be provided by a mobile satellite system and their influences upon the 
design of the various system components are addressed in the following sections. 
1.2.1 User Requirements 
The use of satellite systems for mobile communication has been most widely
implemented for maritime applications, with communication and navigation services provided
by INMARSAT. Studies of land mobile systems have also been carried out by a number
of parties, and in certain cases they have progressed to the implementation stage; for
example, an international satellite paging service was successfully demonstrated recently
[21,22]. Aeronautical systems for business use have also appeared recently and field
trials are being carried out [23].
Recent interest in the subject has, as discussed in the previous section, focused
more upon its integration with advanced terrestrial systems. Telephony was recognized as
one of the areas where potentially profitable markets and large growth exist. The specific
characteristic that enables this is the wide coverage of satellite systems. In addition to the
earlier belief that such systems can provide coverage of remote, sparsely populated
areas, the geographical continuity of service that a satellite system can provide is in
contrast to the regional nature of terrestrial systems. Achieving geographical continuity in
terrestrial systems involves a large amount of standardization, which usually becomes a
more demanding factor than purely technical ones.
Regarding its integration with terrestrial networks, a large number of possibilities
have also been proposed. These have arisen out of the satellite's capability of allowing
multiple access in addition to its large geographical coverage. As such, it has been proposed
to enhance existing satellite systems, which are essentially point-to-point systems
providing a kind of 'cables in the sky', to become a solution for networking the earth stations,
as a 'switchboard' or even 'network in the sky' [24,25]. Requirements of such systems
have been highlighted by many researchers and may be summarized as follows.
(1). ISDN compatibility: interface to standard digital links, digitized voice, packetized data,
etc.
(2). Full connectivity: between mobiles, between mobiles and fixed earth stations, and
point-to-multipoint links.
(3). Ability to handle different services of different data rates, similar to ground-based
ISDN.
(4). Compact and economical mobile terminals, with minimum processing complexity,
light weight and small physical size. Compatibility with terrestrial systems may also be
desirable.
(5). Cooperation/integration with the fixed satellite system such that mobile systems can
use the fixed earth stations as extra 'gateways' to PSTN and existing networks.
(6). Reconfigurability according to changing system requirements: optimized to accommodate
short-term or long-term changes in users' traffic demands.
These objectives for the mobile system enable a baseline system design and choice of 
system parameters as well as architectures to be carried out. 
1.2.2 Baseline System Architecture and Parameters 
This section discusses many popular choices of system configuration and parameters,
as reflected by some proposed and implemented systems, which can be seen to fulfil the
objectives discussed previously.
1.2.2.1 Advantages of On-Board Processing 
The case for the introduction of on-board processing (OBP) into satellite systems has 
been noted by many authors for different kinds of applications [26-31]. Details concern-
ing the on-board processor architecture and functions are discussed in a subsequent sec-
tion. Here the emphasis will be on the use of OBP in a system context as it is central to 
the entire system concept and affects the choice of other parameters. 
The benefit of the OBP has been seen as instrumental in achieving all of the objectives
discussed before, although many different forms and degrees of OBP have been proposed.
The earlier concepts of simple regenerative repeaters [32], as well as the more recent
satellite-switched time division multiple access (SS-TDMA) systems such as that of
INTELSAT VI [33], can be described as suboptimal in the sense that they do not
implement the full capability of OBP, although they may be satisfactory in fulfilling the
service for which they are designed. R.f. or i.f. switching is inflexible in that it is constrained
by the hardware, and hence specifications such as channel bandwidths and data rates cannot
be changed easily. The same applies to systems using baseband switching as well, such as
that for Italsat [34]. Therefore it is necessary to incorporate a higher degree of OBP at
baseband, if the objective of providing a mobile satellite service that remains easily
reconfigurable is to be achieved. Such an OBP would bring about the following advantages:
(1). Improved Link Budget. By separating the uplink and downlink, the end-to-end bit
error rate (BER) becomes the sum of the BERs of the individual links, whereas for a
transparent transponder the end-to-end BER is a function of the sum of the noise powers of the
two links, resulting in a higher Eb/No requirement for the same BER. This effect is
demonstrated by Fig.1.1, which shows the Eb/No requirements for QPSK signals with a symbol
error probability of 10^-6, for both OBP and transparent systems. OBP also allows different
modulation and coding schemes to be used on each link such that the link budget may be
optimized with respect to different uplink and downlink constraints [35].
[Figure: required downlink Eb/No (dB) plotted against uplink Eb/No (dB), with one curve for
systems without OBP and one for systems with OBP]
Fig.1.1 Eb/No requirements for symbol error rate of 10^-6
(2). Beam-to-beam Routing. This is required when multiple spot-beam antennas are
employed such that connections between channels on different beams are required. Many
different configurations are facilitated by a baseband OBP routing function to enable
efficient system utilization. For example, a TDMA uplink may be used by fixed earth
stations for spectral efficiency while SCPC-FDMA may be desirable for lower equipment
complexity for mobile earth stations. In comparison with conventional routing by analogue
means in a transparent transponder, the greater flexibility offered by OBP routing also leads
to more efficient spectral usage [36] through the use of TDMA/TDM where possible.
(3). On-board Network Control. A network control system is required to facilitate set-up 
and release of channels as required by the earth stations. With transparent transponders 
this would be located in a fixed earth station and access by a mobile would require two 
hops. With network control performed by the OBP this is reduced to one hop, hence releasing
link capacity for carrying actual traffic.
(4). On-board Automatic Retransmit Request (ARQ). ARQ is used to safeguard data
integrity by requesting the retransmission of the data whenever errors are detected. As
opposed to a transparent transponder, where ARQ would operate between two earth stations,
the OBP may operate an ARQ scheme for each link separately. This reduces the amount of
communication when, for example, only a mobile uplink introduces errors to the data. This
is in fact a likely situation since ARQ is designed to cater for short-term channel
impairment such as shadowing, which normally affects one link only.
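The saving from operating ARQ on each link separately can be illustrated with a short
calculation. The sketch below is not taken from the studies cited: it assumes independent
block errors with probabilities p_up and p_down on the two links, perfect error detection
and an error-free feedback channel, and it counts the expected number of single-link
transmissions needed to deliver one block. The function names and the numerical values are
illustrative assumptions.

```python
def end_to_end_arq(p_up: float, p_down: float) -> float:
    """Expected link transmissions per block with ARQ between the two earth stations.

    Assumption: every attempt occupies both links (the transparent transponder
    relays the block regardless), and succeeds only if neither link corrupts it.
    """
    p_success = (1.0 - p_up) * (1.0 - p_down)
    return 2.0 / p_success


def per_link_arq(p_up: float, p_down: float) -> float:
    """Expected link transmissions per block with a separate ARQ loop on each link."""
    return 1.0 / (1.0 - p_up) + 1.0 / (1.0 - p_down)


if __name__ == "__main__":
    # Hypothetical case: a shadowed mobile uplink and a clean downlink.
    p_up, p_down = 0.2, 0.0
    print("end-to-end ARQ:", end_to_end_arq(p_up, p_down))
    print("per-link ARQ:  ", per_link_arq(p_up, p_down))
```

Under these assumptions the per-link scheme never needs more transmissions than the
end-to-end scheme, and the difference is largest when only one link is impaired.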
The model of the proposed OBP is depicted in Fig.1.2. It is seen here that the uplink
and downlink are connected only by the OBP itself. Rather than simply functioning as a
switch, the OBP will have considerable processing capability to process received uplink
data at baseband in digital form, and perform the necessary functions on the data before
relaying data onto the downlink. Through the use of such an OBP, flexibility and
reconfigurability are guaranteed. It will be able to implement flexible routing algorithms
and multiple access protocols to achieve high traffic throughput. Being programmable, it
can be designed to cater for a whole range of data rates and network protocols to adapt
to the ISDN environment. Also of importance is the ability to implement different uplink
and downlink modulation and coding schemes. With on-board demodulation and decoding,
uplink and downlink channel impairments are effectively decoupled, enabling higher traffic
capacity or simpler demodulation and decoding techniques and hence simpler mobile
terminals. It has also been shown [37] that a frequency division multiple access (FDMA)
uplink using single channel per carrier (SCPC) signals from each mobile, with a time
division multiplexing (TDM) downlink, is an attractive option with many advantageous
factors, as described later. This is only possible with OBP, and is indeed the baseline
system studied by many researchers at present.
[Figure: block diagram with the labels Uplink, Demod, Decode, Baseband Switch, Encode, Mod, Downlink and OBP]
Fig. 1.2 General architecture of an OBP satellite
1.2.2.2 Multiple Spot-beam Antennas
Use of OBP also facilitates the use of multiple spot-beam coverage, with the OBP
implementing appropriate on-board routing. Frequency reuse can be achieved by the spatial
isolation of spot beams, thus leading to higher bandwidth efficiency. However, the main
advantage of spot-beam antennas is their higher gain, such that the signal power received
at the mobile is increased. This improves the link budget and hence contributes towards
lower-cost terminals, smaller mobile antennas or higher information rates. Proposals for
multiple spot beams with OBP have long been in existence and traffic handling procedures
for efficient usage of multibeam traffic capacity have also been examined [38]. INTELSAT
VI is an example in which such techniques are realized in the form of an SS-TDMA system.
The more recent ARAMIS payload also uses advanced multiple spot beams to achieve
frequency reuse and high satellite EIRP (Equivalent Isotropic Radiated Power) [39].
1.2.2.3 Orbits 
Geostationary orbits have been the norm for most communication satellites since the
earliest systems and have been satisfactory for most applications so far. Recent studies
concerning mobile systems have, however, drawn attention to many serious drawbacks of
the geosynchronous orbit for mobile use. Most important is the problem of signal blockage
and fading by foliage and buildings at low satellite elevation angles, which is the case for
most European regions. Another implication of the low elevation angle is the need for
accurate antenna-pointing mechanisms, whether mechanical or electronic, to achieve a
reasonable gain. This will increase the size and cost of the mobile terminals.
Highly elliptical orbits have therefore been proposed to counter the shortcomings of 
the low elevation angles. Amongst these are the Molniya and the Tundra orbits [40]. A 
graphical comparison of the ground tracks of these orbits is shown in Fig.1.3. The Molniya 
orbit has been used by the Soviet Union for a number of years, and is a 12 hour orbit,
whereas the Tundra has a 24 hour period. Over the active period of operation, the satellite
remains over Europe at an elevation angle above 55°. This allows the use of fixed
zenith-pointing antennas with lower probability of blockage.
[Figure: ground-track plot; horizontal axis: Longitude]
Fig.1.3 Ground tracks of Molniya and Tundra Orbits
Highly elliptical orbits introduce a number of different problems. The Molniya orbit
requires 2 satellites with complicated 'hand-over' procedures to provide a full 24 hour
service. The problem cannot be circumvented by limiting it to a 12 hour service because, as
the period is actually slightly less than 12 hours, a single satellite would slowly drift from
daytime to nighttime. A possible advantage offered by the 12-hour period of the orbit is
that there are effectively two coverage zones, located at longitudinally opposite parts of
the earth. For example, a satellite providing European coverage will have a second coverage
zone around the Bering Sea.
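The remark that the period is slightly less than 12 hours can be checked with Kepler's
third law. The sketch below is an illustration only: it assumes that the Molniya period is
chosen as half a sidereal day and the Tundra period as one sidereal day so that the ground
tracks repeat, and it uses standard values for the Earth's gravitational parameter and the
sidereal day. The function name is hypothetical.

```python
import math

MU_EARTH = 398600.4418      # km^3/s^2, gravitational parameter of the Earth
SIDEREAL_DAY = 86164.0905   # s, one rotation of the Earth relative to the stars


def semi_major_axis_km(period_s: float) -> float:
    """Semi-major axis of an Earth orbit with the given period (Kepler's third law)."""
    return (MU_EARTH * (period_s / (2.0 * math.pi)) ** 2) ** (1.0 / 3.0)


if __name__ == "__main__":
    for name, period in (("Molniya", SIDEREAL_DAY / 2.0), ("Tundra", SIDEREAL_DAY)):
        hours = period / 3600.0
        print(f"{name}: period {hours:.3f} h, "
              f"semi-major axis {semi_major_axis_km(period):.0f} km")
```

With these assumptions the Molniya period is about 11.97 hours, roughly two minutes short
of 12 solar hours, which is the source of the slow drift relative to the time of day.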
A further problem is the large Doppler shift introduced to the signals together with
a large variation in the propagation path length from apogee to perigee. It is here that OBP
may again play a role in alleviating these effects. Variation in data rate due to changes in
path delay may be absorbed by a Doppler buffer at the satellite and, with on-board
demodulation and a frequency reference for downlink modulation, the Doppler shift appearing
at the mobile terminal receiver is half of that without demodulation. Hence again the
specification of the mobile terminals will be less stringent.
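The halving of the Doppler shift can be illustrated with the first-order relation
f_d = f_c * v_r / c. The sketch below is an assumption-laden illustration, not a model from
the thesis: it treats a transparent transponder as passing the uplink Doppler offset
through unchanged, so that the receiver sees the sum of the uplink and downlink shifts,
whereas a regenerative payload with its own frequency reference passes on only the
downlink shift. The carrier frequency and radial velocity are hypothetical values.

```python
C = 299_792_458.0  # m/s, speed of light


def doppler_shift_hz(carrier_hz: float, radial_velocity_ms: float) -> float:
    """First-order Doppler shift for a given carrier and radial velocity."""
    return carrier_hz * radial_velocity_ms / C


def shift_at_receiver(f_up, f_down, v_up, v_down, regenerative: bool) -> float:
    """Doppler offset seen by the receiving terminal.

    Assumption: without on-board demodulation the uplink offset is translated
    to the downlink unchanged; with it, the downlink carrier is generated from
    the on-board reference and only the downlink shift remains.
    """
    down = doppler_shift_hz(f_down, v_down)
    if regenerative:
        return down
    return doppler_shift_hz(f_up, v_up) + down


if __name__ == "__main__":
    f = 1.6e9    # Hz, hypothetical carrier used on both links
    v = 1500.0   # m/s, hypothetical radial velocity towards both terminals
    print(shift_at_receiver(f, f, v, v, regenerative=False))
    print(shift_at_receiver(f, f, v, v, regenerative=True))
```

The factor of one half holds exactly only when the two links contribute equal shifts; with
different carrier frequencies or geometries the reduction differs accordingly.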
1.3 On-Board Processing Systems and Architectures
As pointed out earlier, OBP enhances a mobile system in many aspects and allows
other improvements to be introduced. This section examines systems which employ various
forms of OBP and attempts to determine the options pertaining to the most viable system
solution.
1.3.1 Access Methods 
Two access methods, TDMA and SCPC systems, are considered.
1.3.1.1 TDMA Systems 
TDMA systems have been used in conventional non-regenerative systems such as
the 120 Mbit/s system for INTELSAT V, although their introduction was relatively
recent due to the high cost of terminal equipment. In particular, TDMA is replacing the
older FDMA systems. The most significant advantage of the TDMA system is the transmission
of only one carrier at any one time by the satellite transponder, which avoids the
intermodulation products caused by nonlinear traveling wave tube amplifiers (TWTA) operating
on multiple carriers. These intermodulation products cause interference in the passband
of other signals, and hence output power back-off is required on the TWTAs to reduce
such distortion to an acceptable level, thus sacrificing some of the available power output.
As a result the transponder capacity is not fully utilized.
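The reason why a single carrier avoids this problem can be seen by listing the third-order
intermodulation frequencies, which have the forms 2*fi - fj and fi + fj - fk. The sketch
below is an illustration under simple assumptions: only third-order products are
considered, amplitudes are ignored, and the carrier frequencies (in arbitrary integer
units) are hypothetical.

```python
from itertools import permutations


def third_order_products(carriers):
    """Sorted list of third-order intermodulation frequencies.

    Covers the two-carrier products 2*fi - fj and the three-carrier products
    fi + fj - fk. Carrier frequencies are assumed to be distinct.
    """
    products = [2 * fi - fj for fi, fj in permutations(carriers, 2)]
    for fi, fj, fk in permutations(carriers, 3):
        if fi < fj:  # fi + fj is symmetric, so count each pair once
            products.append(fi + fj - fk)
    return sorted(products)


def in_band(products, carriers):
    """Products that fall between the lowest and highest carrier."""
    lo, hi = min(carriers), max(carriers)
    return [p for p in products if lo <= p <= hi]


if __name__ == "__main__":
    fdma = [10, 11, 12]   # three equally spaced carriers (hypothetical)
    tdma = [10]           # one carrier at any one time
    print(in_band(third_order_products(fdma), fdma))
    print(in_band(third_order_products(tdma), tdma))
```

For equally spaced carriers the in-band products land exactly on the carrier frequencies,
whereas a single carrier produces no products of this kind at all.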
TDMA systems are well suited for demand-assigned multiple access systems as the
time plan of the uplink time frame is easily changed. In a multibeam system, connectivity
between beams is much more conveniently achieved than in an FDMA system by the use
of on-board switching, i.e. SS-TDMA. For FDMA systems, considerable processing is
required which may include demodulation and remodulation, or at the very least some
frequency translation and filtering for beam-to-beam routing. Inclusion of OBP
changes this perspective and will be discussed in the next subsection.
Many OBP systems using TDMA have been proposed. Earlier emphasis was on SS-TDMA
systems with relatively small amounts of OBP being carried out. The switching
occurs either at r.f. using microwave switching technology or at a lower i.f. Baseband
processing has since taken over as the main area of interest. Several proposals have turned
into implementations, such as the INTELSAT VI and the NASA ACTS experimental
satellite [41,42]. However, a severe drawback of the TDMA system is the requirement for
precise timing synchronization of the bursts originating from earth stations. There are many
methods used for this frame synchronization [43] but in all cases it adds complexity to the
terminal equipment. The nature of mobile earth stations is also significantly different
from that of traditional fixed station TDMA systems, in that burst timing is associated with
mobile movement as well as satellite movement. Protocols that take these factors into
account have been developed [44]; they involve some form of closed-loop synchronization to
enable the mobile to adjust its own burst timing. This implies more complication at the
mobile terminal and thus increases cost.
1.3.1.2 SCPC-FDMA System 
In an FDMA system multiple access is achieved by sharing of disjoint frequency
slots by the earth stations. It has been well used in the past as demonstrated by the series
of INTELSAT satellites. There are other shortcomings in addition to the requirement for
TWTA power back-off. These include the need for similar carrier power from each uplink
signal to avoid stronger signals 'drowning out' weaker ones. Systems are normally
designed with guard bands to prevent adjacent channel interference, which requires extra
bandwidth and hence lowers spectral efficiency. Since earth stations are assigned
predetermined carrier frequencies and bandwidths, reconfiguration of a system normally requires
changes in the actual hardware as opposed to the simpler software changes of burst timing
for TDMA. This means the flexibility of the system is limited in that short-term changes
in the traffic demand cannot be catered for. Against these shortcomings is the advantage of
a cost effective system which allows relatively low-cost earth station equipment, since
precise timing control is not required as for TDMA. This is reflected by its popularity in past
systems. Demand-assigned systems have also been implemented in the form of SPADE,
which is an SCPC system that provides improved economy especially for low capacity links
[45].
With OBP, many of the above drawbacks can be readily eliminated. The most important
one, TWTA power back-off, can be avoided if the uplink and downlink modulation
schemes are separate. In this case, the downlink is no longer constrained to be FDM and
instead a single TDM carrier may be used. Hence FDMA is no longer a limitation to utilizing
the full capacity of the TWTA. Power control requirements in the uplinks are also not
as stringent as before because the power back-off requirements now only apply to the
uplink, and are associated only with the uplink power amplifier gain characteristics, and
do not involve the TWTA for the downlink.
The proposed system then comprises mobile earth stations each assigned a frequency
slot, which allows a mobile to access the satellite using its own SCPC signal. Issues related
to flexibility required of the system are then dependent on the forecast traffic
requirements. Several studies have indicated that a large number of low bit rate channels of 64
kbit/s and lower, mainly for speech and short data messages, are optimal, and that there is
no need to support higher data rates to such mobile terminals. Hence the traditional
requirement for flexible data rates from fixed earth stations to accommodate different
traffic conditions does not apply to the mobile system, which achieves its flexibility by
diversified allocation of its frequency spectrum.
Mobile terminals capable of different carrier frequencies and data rates can be easily 
implemented using frequency synthesizers and digital signal processing (DSP) techniques. 
In fact, prototypes have been produced to support a selection of data rates, modulation
schemes, access protocols as well as carrier frequencies [46]. 
Hence the advantages of an SCPC-FDMA uplink, TDM downlink OBP system for
mobile use are clear. Many studies have supported this option or a combination of both 
SCPC-FDMA and some form of thin-route TDMA on the uplink. 
1.3.2 OBP Architecture 
Work has been performed on defining the functions of the OBP according to the 
above baseline system design. The objective is to arrive at an architecture that offers 
enough flexibility to cater for the various system options, whilst at the same time
maintaining system simplicity for ease of control and feasibility of implementation. The
proposed OBP has a high degree of processing, significantly above presently implemented
systems. The system components of this OBP are described below.
1.3.2.1 Baseband Switch 
The on-board switch appears at the centre of the OBP model with all traffic passing
through it. Its input/output data are purely digital, as are all interfaces to other system
components. Its main function is to provide connection between uplink and downlink data.
In addition to beam-to-beam routing for multi-beam satellites, it must also perform the
task of reformatting the data. Uplink data may either be by FDMA or TDMA, and in the
most general form the data rates on the uplinks will differ from those on the downlinks.
Hence the received data is reorganized by the switch to form the downlink with the data
placed in the correct frame location. Specifically, in the SS-TDMA/FDMA system with
TDM downlink, each TDM time frame will consist of data from
different frequency or time slots, or a combination of both. The on-board switch will then
buffer the data received over a period of time, typically the downlink frame period, and
put data into the time slots as required. Hence full connectivity and flexibility are allowed
by such formatting of data.
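The buffering and reformatting described above can be sketched in a few lines. This is a
minimal illustration, not the switch design considered later in the thesis: it assumes
that one downlink frame period of data has already been collected from each uplink channel,
and that a routing table maps every uplink channel to a downlink beam and time slot. The
function name, channel identifiers and table layout are hypothetical.

```python
from collections import defaultdict


def format_downlink_frames(uplink_blocks, routing):
    """Place buffered uplink data into downlink TDM frames.

    uplink_blocks: dict mapping uplink channel id -> bytes received in one frame period
    routing:       dict mapping uplink channel id -> (downlink beam, slot index)
    Returns a dict mapping beam -> list of payloads in increasing slot order.
    """
    frames = defaultdict(dict)
    for channel, payload in uplink_blocks.items():
        beam, slot = routing[channel]
        if slot in frames[beam]:
            raise ValueError(f"slot {slot} on beam {beam} assigned twice")
        frames[beam][slot] = payload
    return {beam: [slots[s] for s in sorted(slots)] for beam, slots in frames.items()}


if __name__ == "__main__":
    # Two SCPC-FDMA channels and one TDMA slot, routed to two downlink beams.
    blocks = {"fdma-1": b"A", "fdma-2": b"B", "tdma-1": b"C"}
    table = {"fdma-1": (0, 1), "fdma-2": (1, 0), "tdma-1": (0, 0)}
    print(format_downlink_frames(blocks, table))
```

Changing the routing table is all that is needed to reconfigure the connections, which is
the sense in which such formatting gives full connectivity.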
There are many well-understood architectures that have been applied in terrestrial
switching systems. Possible options for OBP include the TST switch and the memory
switch, each of which has its advantages and disadvantages [47]. However, for a mobile
system, a considerably smaller version of these switches is sufficient, due to the smaller
number of channels together with less diversity in data rates. The memory switch is therefore
preferred for its simplicity. For even smaller systems which do not employ multibeam
antennas, such as the T-SAT system proposed for the U.K. [46], it is often not required to
have even a switch with such clearly defined and simple architecture, as the switching
function required is itself very simple. In this case a data buffer/formatter, which organizes
uplink data and formats the data for the downlink as required, serves the function of the switch.
It is later demonstrated that such a formatter can be readily implemented with small
amounts of processing power and memory. Also, being software based, a single processor
may incorporate other functions, such as downlink coding, with the formatting. This is the
subject of the discussion in Chapter 6.
1.3.2.2 Multicarrier Demodulators 
The function of the multicarrier demodulator (MCD) is the simultaneous demodulation
of a group of SCPC-FDMA signals. It consists of two essential functions: the
transmultiplexer and a bank of individual demodulators.
Transmultiplexers (TMUX) originate from terrestrial systems, where they convert
frequency-multiplexed signals to time-multiplexed forms and vice versa. The term is used
loosely to denote the similar operation on board the satellite, where in an SS-FDMA system
a device is needed to demultiplex the FDMA signals for subsequent time multiplexing.
It has also been referred to as the demultiplexer or channelizer, which more often denotes
implementation by analogue means. DSP technologies are believed to show better prospects
than the traditional analogue counterparts for satellite on-board application [48].
Hence much work has been done to investigate different DSP methods for implementing
the TMUX. This is the subject discussed throughout the main part of this thesis, and will
be elaborated in later chapters.
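As a concrete picture of digital demultiplexing, the sketch below shows one common
textbook structure, a critically sampled polyphase filter bank followed by an inverse FFT,
together with a direct per-channel reference against which it can be checked. It is
offered only as an illustration under stated assumptions (M uniformly spaced channels
centred at k/M cycles per sample, complex input, a caller-supplied prototype low-pass
filter h) and is not claimed to be any of the specific structures analysed in later
chapters. The function names and the Hamming-window prototype are hypothetical choices.

```python
import numpy as np


def polyphase_channelizer(x, h, M):
    """Split x into M uniformly spaced channels, each decimated by M.

    Channel k is centred at k/M cycles per input sample. Returns an array of
    shape (len(x) // M, M); column k holds the decimated output of channel k.
    """
    x = np.asarray(x, dtype=complex)
    P = -(-len(h) // M)                                 # taps per branch (ceiling)
    h = np.concatenate([h, np.zeros(P * M - len(h))])   # pad h to P*M taps
    xp = np.concatenate([np.zeros(P * M - 1), x])       # zeros stand in for x[n < 0]
    n_out = len(x) // M
    out = np.empty((n_out, M), dtype=complex)
    for n in range(n_out):
        block = xp[n * M : n * M + P * M][::-1]         # block[r] = x[n*M - r]
        u = (h * block).reshape(P, M).sum(axis=0)       # fold into M polyphase sums
        out[n] = M * np.fft.ifft(u)                     # sum_q u[q] exp(+j2*pi*k*q/M)
    return out


def direct_channel(x, h, M, k):
    """Reference: mix channel k to baseband, filter with h, keep every M-th sample."""
    x = np.asarray(x, dtype=complex)
    m = np.arange(len(x))
    mixed = x * np.exp(-2j * np.pi * k * m / M)
    return np.convolve(mixed, h)[: (len(x) // M) * M : M]


if __name__ == "__main__":
    M = 8
    h = np.hamming(4 * M)
    h = h / h.sum()                                     # unit gain at d.c.
    m = np.arange(64 * M)
    x = np.exp(2j * np.pi * 2 * m / M)                  # tone at the centre of channel 2
    y = polyphase_channelizer(x, h, M)
    power = np.mean(np.abs(y[8:]) ** 2, axis=0)         # skip the start-up transient
    print("strongest channel:", int(np.argmax(power)))
    print("matches direct form:", np.allclose(y[:, 2], direct_channel(x, h, M, 2)))
```

The filtering work is shared by all channels: each output frame costs one pass over the
prototype filter plus one M-point transform, rather than M separate mix-and-filter
operations.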
One essential difference between terrestrial TMUX and those used in satellite OBP is
the need for demodulation in the latter. Terrestrial TMUX are usually applied to PCM-coded
analogue signals such as speech, and therefore would not require demodulation
when transformed from frequency to time domain multiplexing. Satellite OBP using
baseband processing requires the digital information transmitted on the uplinks to enable
the control of channel accesses by the on-board processor, as will be described next, as
well as the isolation of the uplink from the downlink.
In TDMA systems the demodulators have been called burst modems. In terms of demodulation
they are not fundamentally different from those in SCPC systems since SCPC signals are
also burst signals. In the SCPC-FDMA situation, however, there needs to be a bank
of such demodulators, one for each frequency slot. Hence the terms multicarrier
demodulators or group modems have been used to denote the combination of the TMUX with a
bank of demodulators. Many studies have investigated efficient implementation of the
demodulators, especially for realistically large numbers of channels, and some possible
realizations have been reported [49]. Most of the algorithms researched so far deal with
QPSK modulation, which is widely accepted as a good compromise between equipment
complexity and performance for mobile use.
It should be noted that whilst the TMUX operation can be shared amongst a group of
SCPC signals, the demodulators remain conceptually separate for each channel, for the
reason that carrier and timing recovery have to be independent for each carrier. Some form of
sharing has been reported in some studies, but this is only the sharing of common
hardware while the synchronization remains separate in essence.
1.3.2.3 On-Board Processor 
The on-board processor essentially performs two different tasks: the controlling of
the other subsystem modules, and the implementation of the access protocol. It may
therefore be described as two distinct components: the on-board control processor (OBCP)
and the on-board network control system (OBNCS), which is the approach first proposed
in reference [50].
The OBCP is responsible for control of other modules which includes passing on 
configuration information required by other modules. For example beam-to-beam routing 
information is required by the switch when changes are carried out to suit different traffic 
demands. The OBCP thus needs to transmit a table containing the information every time 
a reconfiguration takes place. The same is true for other modules: the modems may sup-
port various data rates and carry out changes when prompted by the OBCP; and likewise 
the code rates for the codecs may be dynamically assigned. Hence with the OBCP a highly 
dynamic system is possible. System initialization at start-up or in the event of a fault may 
also be initiated by the OBCP which can perform the overall monitoring of the system. 
The OBNCS is analogous to the ground based NCS in multiple access systems. Its
main function is to implement the access protocol, which includes procedures to reply to
reservation requests, to allocate channels to the requests, and to clear down allocated
channels when they are no longer required. It needs to keep a record of the activities of
all earth stations, as well as a database that allows it to recognize the identity of all earth
stations from the messages that they send. Allocation of channels may be performed using
a number of assignment algorithms [51-53]. In addition to the normal call set-up and
clear-down procedures, it will also need to respond to erroneous conditions and prevent
the network from degrading. Such erroneous conditions may be common in mobile systems
due to signal fades causing corruption of signal messages on the uplink. The OBNCS must
then initiate the necessary clear-down procedures when these occur. Statistics concerning
the conditions of the channel may also be derived from a knowledge of the frequency of
such events. The OBNCS may pass on this information to the OBCP which can then
implement counter measures such as reducing the error-control code rates.
This partitioning of the on-board processor functions fits in particularly well with
the implementation aspect. The OBNCS will operate under a 'real-time' environment as
responses to events, such as reservation requests, are time-critical. The OBCP's mainly
housekeeping tasks will be less demanding in terms of timing constraints. Hence a parallel
processing system will be efficient in implementing these two different software
environments. Using this approach then allows adequate resources to be allocated to different
modules in designing an OBP system.
1.4 Technology and Implementation Aspects 
Having reviewed relevant concepts in the development of an OBP mobile satellite
system, some viable implementation aspects and technology are discussed here so as to
establish a possible approach. The model is the baseband OBP system as outlined
previously. Obviously the implementation of the on-board processor itself will be digital and
will not be elaborated upon further. The modules of interest then are the signal processing
parts and the aim of this section is to provide a review of possible alternatives for the
implementation of these signal processing components. Moreover, the choice of a particular
technology for the TMUX, for example, very often leads to the same form of implementation
for the demodulators. Hence the classification is more naturally associated with the
entire assembly rather than its constituents such as the TMUX alone. The objective of
all the studies on implementation has been to reduce the weight, size and power
requirements of the necessary hardware. Reconfigurability and modularity are also part of the
consideration.
1.4.1 All-Analogue Implementation 
Several studies have suggested all-analogue approaches using different techniques.
They are particularly suitable in SS-FDMA systems for beam-to-beam routing with
processing being mainly at i.f., thus eliminating the need for on-board demodulation. An
architecture proposed by KDD [54] uses surface acoustic wave (SAW) filter banks centred
at 700 MHz to demultiplex the multibeam uplink into 8 subbands per beam. An
interconnection switch can then allocate these subbands to any frequency location in any beam on
the downlink. Another similar example is included in an INMARSAT study also for an
SS-FDMA system [55]. A bank of 26 SAW filters is proposed, the outputs of which go to a
bank of programmable mixers of the same number. The programmable mixers can then
shift the FDMA signals in frequency to any location. Finally a switch matrix performs the
routing to the multiple downlink beams. Several technology options are suggested for the
implementation of the switch matrix.
The all-analogue approach is suitable in SS-FDMA systems with FDMA downlinks.
For downlinks using TDM, however, demodulation is almost unavoidable, which then
allows some digital electronics to be used.
1.4.2 Hybrid Analogue-Digital Implementation 
In systems with on-board demodulation it is possible to use digital technology for
implementation of most modules. However, speed limitation of the devices often makes it
impractical to use digital signal processing for demultiplexing and demodulation. The high
data rates in TDMA systems inevitably require analogue implementation of the
demodulators, as seen in the recent work in implementing modems for different SS-TDMA
systems [56-58]. The switch matrices in these systems use specialized technology to
accommodate the high data rate requirements, but digital devices using GaAs gate arrays or
CMOS memory devices are conceivable. In SCPC-FDMA systems, the data rates are a great
deal lower than in SS-TDMA. Many forms of hybrid implementation of the demodulators
are then possible. Most of the studies suggest an analogue implementation of the
TMUX, due to its high computation requirement if using DSP, with subsequent digital
implementation of the per-channel demodulators. Most of these proposals have suggested
SAW filter banks for the TMUX, since SAW filters can be produced to have the desirable
characteristics of linear phase responses, high stopband attenuation, flat passbands and
sharp transition bands. Digital implementation of the demodulators has advantages such
as programmability, reproducibility and the lack of instability due to aging of components
etc. A more novel idea was reported in [59] where the filter bank can be configured to
different channel bandwidths using the specially designed 'bandwidth switchable SAW
filters'. Still another more novel concept was reported in [60], which suggested the use of
optoelectronics as the ultimately efficient implementation. The technology unfortunately
will not be available in the short-term future. The technique of using SAW devices to
implement the chirp-Fourier transform with subsequent digital demodulation was probably
the most practically feasible, especially for large numbers of channels, and was also
adopted in the recent work by ELAB [61].
An up-to-date review of viable hybrid analogue-digital techniques can be found in
the recent literature [62,63,46]. The general conclusion is that in the foreseeable future,
most advances will be in the all-digital implementation of the OBP, with reducing weight
and increasing processing power predicted. Hence the main attention of this study will be
on all-digital implementation of the OBP.
1.4.3 All Digital Implementation 
Due to the dramatic improvements in the performance of digital electronics in terms
of size and power requirements, much attention has been focused upon the feasibility of
using all-digital techniques to implement the whole baseband on-board processor,
especially in SCPC-FDMA/TDM systems where the data rates per uplink channel are
considerably lower. Uplink signals are downconverted to near baseband where analogue-to-digital
converters (ADC) convert them to digital form. DSP techniques are then applied
to demultiplex and demodulate the SCPC signals. This is the operation that requires most
processing power in the OBP and hence much effort has been spent in finding ways to
reduce it. The first comprehensive investigations of digital demodulators and TMUX
techniques were studies sponsored by ESA [64,65]. Although earlier results were clearly in
existence [66,67], they were more related to specific systems rather than a general approach
to the subject. Since then further work has been devoted to detailed analysis of the DSP
algorithms to optimize the various parameters in order to reduce their complexity for
hardware realization, especially for larger numbers of channels [68-73]. Many useful
techniques have resulted which move the all-digital OBP closer to reality. Proof-of-concept
models have also been built following many of these studies [74,75].
Continual interest is being shown, especially towards the implementation of the
MCD. Recent efforts have been directed towards the usage of custom devices in the form
of ASICs, for which the production costs are decreasing rapidly. Work has also been
spurred on by the arrival of some very powerful single-chip digital signal processors [76],
which are less expensive than custom devices. The future of the all-digital approach,
therefore, appears very promising.
1.4.4 Prospects and Major Problems of OBP Systems 
The viability of OBP for mobile satellite systems depends ultimately on the economic
gain it offers to realistic system specifications. It is therefore necessary to assess the cost,
time scale and major problems for the realization of an OBP satellite in relation to such
specifications.
Proposals for land mobile systems for business application in the European region 
[77,78] and, in particular, a detailed study for a system with OBP [36] give quantitative
estimates of the various system parameters. The main conclusions from these studies 
relevant to this project are summarized below. 
(1). A large number of mobile SCPC channels are required to satisfy the expected traffic. An
estimate is 200 channels for each spot-beam with a total of 12 beams for the mobile links.
The data rate for each SCPC channel needs to be around 9.6 kbps to provide reasonable
voice and data services. In addition, an estimated 20 fixed earth stations are required to
provide the gateway to the terrestrial network, with each fixed earth station accessing the
satellite via a TDMA/TDM scheme. A TMUX is thus required for frequency demultiplexing
of the SCPC signals so as to enable connections between beams as well as different
access schemes.
(2). The space segment cost was estimated to be about 10 to 15 times the cost of the
ground segment, with the cost of the OBP amounting to about 40% of the cost of the
payload. These estimates reflect the objective of minimizing the mobile earth station
complexity, and the resultant increase of complexity of the OBP payload. Realization of the OBP
using 1.25-µm CMOS technology was considered feasible towards the end of the 1990s.
(3). The signal processing functions of the OBP payload were recognized as a main problem
area due to the high computational requirement of the TMUX. In addition, the
implementation of the TMUX using dedicated hardware imposes constraints on the frequency plans of
the system, as opposed to transparent transponders which allow frequency plans to be
changed readily.
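The scale of the processing task implied by the figures in item (1) follows from simple
arithmetic. The sketch below uses only the estimates quoted above (200 channels per beam,
12 beams, 9.6 kbps per channel); it ignores coding and framing overheads and the traffic of
the fixed earth stations, and the function name is hypothetical.

```python
CHANNELS_PER_BEAM = 200     # SCPC channels per spot-beam (estimate quoted above)
BEAMS = 12                  # spot-beams for the mobile links
CHANNEL_RATE_BPS = 9600     # data rate per SCPC channel


def aggregate_mobile_traffic(channels_per_beam, beams, rate_bps):
    """Return (total number of SCPC channels, aggregate information rate in bit/s)."""
    total_channels = channels_per_beam * beams
    return total_channels, total_channels * rate_bps


if __name__ == "__main__":
    channels, rate = aggregate_mobile_traffic(CHANNELS_PER_BEAM, BEAMS, CHANNEL_RATE_BPS)
    print(f"{channels} channels, {rate / 1e6:.2f} Mbit/s aggregate")
```

On these figures the payload has to demultiplex and demodulate 2400 carriers in total,
although the aggregate information rate is only about 23 Mbit/s.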
1.5 Summary and Project Objectives 
The importance of OBP in future mobile satellite systems has been discussed. An
architecture that could satisfy the needs of such systems has been chosen and the functions
of the various system components within the on-board processor have been described. Finally,
the different forms of implementation have been compared. The conclusions indicate that an
all-digital approach holds the greatest promise, at least in the foreseeable future.
In view of these conclusions and the rapid advances in digital device technology, this
project devoted its efforts to optimizing the DSP techniques required for the signal
processing operations in OBP satellites. The objective was to establish the relationship between
parameters in existing structures for the TMUX, and hence provide a design methodology
for the minimization of their complexity whilst satisfying some performance criteria. The
second objective was to investigate novel techniques for higher computation efficiency and
greater flexibility.
The remainder of this thesis will address the many different problems of the design
and implementation of the on-board processor. Attention is placed upon the signal
processing part of the OBP, with the objectives being to reduce the hardware requirements,
improve its performance and to consider problems that could be encountered in its
implementation within an OBP system.
Chapters 2 to 4 discuss the issues relating to the design of the most computation-intensive
component, the TMUX, and show some approaches towards the above objectives
by optimization of the signal processing algorithms. Chapters 5 and 6 discuss the
hardware implementation issues and suggest techniques for overcoming some problems
through hardware specially designed for efficient and flexible implementation of the signal
processing algorithms.
2 
On-Board Transmultiplexer 
2.1 Introduction 
In view of the conditions associated with mobile earth stations, FDMA is an attractive
option as the equipment of the mobile earth stations will be comparatively simple and
inexpensive [79]. The trade-offs between various multiple access techniques involve many
other factors such as spectral efficiency, interference and channel characteristics. Greater
detail may be found in the literature [80,81,53].
The operation of translating digitally a frequency multiplex to a time multiplex has
been studied in much detail in the past, and has been given the term transmultiplexer
(TMUX). In general, these studies have been aimed mainly at the problems of translating
between TDM and FDM standards in terrestrial systems, and in particular the public
switched telephone network. A general review on the subject can be found in
Scheuermann and Gockler [82]. In some respects, the techniques reported are not directly
applicable to TMUX's for on-board processing satellite operation as they are not required
to perform the process of demodulation, their main concern being the direct conversion
between TDM and FDM signals. Another important difference is the criterion by
which their performance is measured. Whilst the TMUX's used in terrestrial networks
have performance measured by predetermined criteria, such as the amount of inter-channel
crosstalk, for which a CCITT standard has been specified, the TMUX's for the OBP satellite
are essentially measured by the BER characteristics after demodulation. For this reason
it is also not possible to exploit some computation reduction techniques [83] designed
specifically to satisfy such a standard. Nevertheless, knowledge of DSP structures for the
terrestrial TMUX is useful for the demultiplexing stage of the satellite multicarrier
demodulator.
Structures for TMUX's may be classified into two categories: transform and non-transform
based. The objective in both cases is to reduce the hardware requirement in
terms of computation and storage. Methods using transforms are usually
based on an FFT type operation, the first reported technique of which used an FFT in
combination with a polyphase filter [84]. Similar techniques using variations of the filters and
transforms have also been proposed [85-90], although the principles behind all of these are
essentially similar. Some of these techniques, such as the use of recursive filters, are
unsuitable for the satellite application because of undesirable characteristics such as
non-linear phase. In some cases a different type of transform, such as the cosine transform or a
combination of two FFT's, leads to a lower computation requirement in particular
situations, such as when the input signal is real [91]. Another advantage of these variations over
the original polyphase-FFT algorithm is that they allow greater flexibility in the choice of
parameters, such as the number of channels, the channel-stacking arrangement and the
output sampling rates, which may be varied by selecting a suitable method. Another
approach to designing an FFT type TMUX is to consider the TMUX as a periodically
time-varying filter [92], which allows the TMUX filter response to be more arbitrary and
in some cases more efficient through the use of IIR filters [93,94] or a combination of both
FIR and IIR filters [95].
Non-transform type TMUX's are conceptually more straightforward. They perform
direct bandpass filtering of the input signal to obtain the desired channel slot from the
frequency multiplex. Multirate DSP techniques are widely applied in order to implement this
method in an efficient manner. The basic concept is a down-converter followed by lowpass
filtering of the FDM signal, which on its own is heavy on computational requirements.
Multirate techniques lead to structures with multirate filtering and successive decimation at
each stage [96]. The implementation may be on a per-slot basis, where the demultiplexing
operation consists of independent bandpass filters each assigned to a particular frequency
slot. Here the computation is not shared in any stage of the group demodulation process.
On the other hand, by exploiting the orderly nature of the channel-stacking arrangement,
greater reductions can be achieved by tree type structures [97,98], where the computation
and filtering are shared amongst all of the output signals as far as the demultiplexing
stage. Savings can also be made by imposing certain constraints on the filters, such as the
use of halfband filtering [99]. Work has also been reported on the use of complex filters,
where the wide transition bands allow filters of much shorter lengths to be used
[100,101]. Another variation of the tree type filter bank, especially efficient for very
narrow transition widths [102], uses the idea of interpolated FIR filters [103] for the tree
stages.
The two methods, transform and non-transform types, each have their own advantages
and disadvantages, and are the subject of study in this chapter. Efforts are concentrated
on the latter method, which has received less study in the past. Section 2.2 discusses the
fundamental principles involved and outlines some theoretical developments. Section 2.3
presents some analytical solutions to the choice of parameters in the implementation of
TMUX algorithms on practical DSP hardware. Section 2.4 describes the computer simulation
needed to verify the analysis presented in section 2.3, and in some cases provides a
more conclusive assessment of the performance of the TMUX algorithms. Section 2.5 is a
summary of this chapter, drawing some conclusions from the computer simulations and
theoretical analysis.
2.2 Theoretical Background 
This section reviews the theoretical background which supports the non-transform
type of TMUX's and in particular the tree type filter bank structures. Analysis is also
carried out where appropriate to derive the performance parameters, which give a measure of
the relative merits of the structures.
2.2.1 Multirate Filtering 
In efficiently implementing a digital TMUX, the theory of multirate DSP may be
employed. This is made possible by the fact that the TMUX performs sampling rate
conversions, as the input sampling frequency is much higher than the output sampling
frequency of each individual channel. The saving in computation using multirate techniques
comes from two sources. The first is decimation, in which only one in every M filter
output samples needs to be calculated, as can be seen in (2.1).
$$y(n) = \sum_{i=0}^{N-1} h(i)\, x(nM - i) \qquad (2.1)$$
where n = 0, 1, 2, ..., M = decimation rate, x(n) = input, h(n) = filter coefficient,
and y(n) = output.
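As an illustration of the saving expressed by (2.1), the following minimal Python sketch evaluates the FIR sum only at the retained output instants and compares the result with full-rate filtering followed by discarding samples. The function names and the zero initial condition (x(n) = 0 for n < 0) are assumptions made for this example only.

```python
def decimating_fir(x, h, M):
    """Evaluate y(n) = sum_i h(i) * x(n*M - i) only at the retained outputs.

    Assumes x(n) = 0 for n < 0 (an assumption of this sketch).
    """
    y = []
    for m in range(0, len(x), M):          # m = n*M, one output per M inputs
        acc = 0.0
        for i, hi in enumerate(h):
            if m - i >= 0:
                acc += hi * x[m - i]
        y.append(acc)
    return y


def full_rate_fir(x, h):
    """Reference: compute every output sample at the input rate."""
    return [
        sum(hi * x[n - i] for i, hi in enumerate(h) if n - i >= 0)
        for n in range(len(x))
    ]
```

Keeping every M-th sample of `full_rate_fir` gives the same values as `decimating_fir`, but the latter performs roughly 1/M of the multiplications.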
The second saving comes from multistage filtering. This technique effectively reduces
the filter length by a factor equal to the decimation rate of a filter stage [104]. The filter
response at the high sampling rate is relaxed, shortening the filter length and hence
reducing computation. The final stage filter, which in general has a more stringent frequency
response and hence a longer impulse response, operates at a reduced sampling rate because of
decimation, and the overall effect is a net reduction of computation. The choice of the many
design parameters depends on which criterion is considered important, and the parameters
may then be optimized [105] with respect to the criterion chosen. The most important ones
considered here are the multiplication rate and storage requirement. It is found that for
narrow band filtering as in the TMUX application, the greatest reduction in computation
and storage occurs in going from a single stage to a two stage structure. Subsequent
increase in the number of stages yields little, if any, reduction in computation and storage.
The decimation rate can be optimized against either multiplication rate or storage. For
structures with more than two stages some iterative procedures are needed, or alternatively
design curves may be used. For two stages, the optimum decimation ratios are simple
expressions in terms of the filter parameters [106], as in equations (2.2) and (2.3).
$$M_{1opt} = \frac{2M}{(2 - \Delta f) + \sqrt{2M\Delta f - M(\Delta f)^2}} \qquad (2.2)$$
$$M_{2opt} = \frac{M}{M_{1opt}} \qquad (2.3)$$
where $\Delta f = \frac{\text{transition width}}{\text{stopband frequency}}$, $M_{1opt}$ and $M_{2opt}$ are the optimum
decimation rates for stage 1 and 2 respectively, and M = total decimation rate.
It turns out that for most practical values of M and $\Delta f$ for filters in the TMUX
application, $M_2$ is less than 2. Practical implementation of non-integer decimation
requires interpolation followed by decimation, and would introduce so much additional
computation that it outweighs the original saving. Hence in practice $M_2$ is usually taken to be 2,
although it is not the theoretical optimum.
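The two-stage optimum can be evaluated numerically. The sketch below assumes that (2.2) reads M_1opt = 2M / ((2 - Δf) + sqrt(2MΔf - M(Δf)^2)), which is the reading of the printed equation adopted here, and that (2.3) is M_2opt = M / M_1opt. The function name and the sample values are hypothetical.

```python
import math


def optimum_two_stage_decimation(M, delta_f):
    """Return (M1opt, M2opt) for a two-stage decimator.

    M       : total decimation rate
    delta_f : transition width divided by stopband frequency (0 < delta_f < 2)
    Assumes the form of (2.2) and (2.3) stated in the lead-in.
    """
    root = math.sqrt(2.0 * M * delta_f - M * delta_f ** 2)
    m1 = 2.0 * M / ((2.0 - delta_f) + root)
    m2 = M / m1
    return m1, m2


if __name__ == "__main__":
    m1, m2 = optimum_two_stage_decimation(16, 0.1)
    print(f"M1opt = {m1:.3f}, M2opt = {m2:.3f}")
```

With the illustrative values M = 16 and Δf = 0.1 the second-stage optimum falls below 2, which is the situation in which a fixed second-stage decimation of 2 is adopted instead.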
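[Fig.2.1 TMUX using per-slot method. Block diagram: the input x(n) feeds K parallel
branches, each consisting of a mixer ($e^{j\omega_0}$, $e^{j\omega_1}$, ..., $e^{j\omega_{K-1}}$) followed by a lowpass
filter H(ω), giving the outputs $y_0(n)$ to $y_{K-1}(n)$.]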
The most straightforward implementation of a multistage structure is the per-slot
method. Each channel is down-converted to baseband and filtered by the lowpass filter, as
shown in Fig.2.1. This is expressed by (2.4).
$$y_k(n) = \sum_{i=-\infty}^{\infty} e^{-j\omega_k (n-i)}\, x(nM - i)\, h(i) \qquad (2.4)$$
where $\omega_k$ = channel carrier frequency, x(i) and h(i) are the input and filter impulse
response respectively, and M is the decimation rate.
Or equivalently, a complex bandpass filter may be used. Manipulation of the above
equation gives
$$y_k(n) = e^{-j\omega_k n} \sum_{i=-\infty}^{\infty} e^{j\omega_k i}\, h(i)\, x(nM - i) \qquad (2.5)$$
which is the convolution of the modulated filter response with the input sequence,
with down-conversion taking place after filtering.
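The equivalence between the two per-slot forms can be checked numerically. In the sketch below the mixer phase is referenced to the input sample index m = nM - i, which is an assumption of this example about how the exponent is indexed. The function names are hypothetical and the input is taken as zero before the first sample.

```python
import cmath


def mix_then_filter(x, h, M, w):
    """Down-convert the input by exp(-j*w*m), then lowpass filter and decimate."""
    out = []
    for m in range(0, len(x), M):
        acc = 0j
        for i, hi in enumerate(h):
            if m - i >= 0:
                acc += cmath.exp(-1j * w * (m - i)) * x[m - i] * hi
        out.append(acc)
    return out


def bandpass_then_mix(x, h, M, w):
    """Filter with the modulated response exp(j*w*i)*h(i), then down-convert."""
    g = [cmath.exp(1j * w * i) * hi for i, hi in enumerate(h)]  # complex bandpass
    out = []
    for m in range(0, len(x), M):
        acc = 0j
        for i, gi in enumerate(g):
            if m - i >= 0:
                acc += gi * x[m - i]
        out.append(cmath.exp(-1j * w * m) * acc)  # mixing at the output rate
    return out
```

Both functions return the same sequence to within rounding error. In the second form the mixer runs at the decimated rate, at the cost of complex filter coefficients.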
The computation rate per channel may be expressed as a function of the input
sampling rate, the decimation ratios and the filter lengths as shown in the following, assuming
that general N-tap FIR filters are used.
First consider the structure using modulators and real filters.
For multiplication (denote $R^*$ as the multiplication rate):
modulator: $R^*_{mod} = 2 f_{sam}$,
first stage filter: $R^*_{fil1} = 2\left(\frac{f_{sam}}{M_1} N_1\right)$,
second stage filter: $R^*_{fil2} = 2\left(\frac{f_{sam}}{M_1 M_2} N_2\right)$,
total:
$$R^*_{total} = 2 f_{sam}\left(1 + \frac{N_1}{M_1} + \frac{N_2}{M_1 M_2}\right) \text{ multiplications per second (mps)} \qquad (2.6)$$
where $M_i$ and $N_i$ are the decimation rate and the filter length of stage i respectively.
For addition (denote $R^+$ as the addition rate):
first stage filter: $R^+_{fil1} = 2\,\frac{f_{sam}}{M_1}(N_1 - 1)$,
second stage filter: $R^+_{fil2} = 2\,\frac{f_{sam}}{M_1 M_2}(N_2 - 1)$,
total:
$$R^+_{total} = 2 f_{sam}\left(\frac{N_1 - 1}{M_1} + \frac{N_2 - 1}{M_1 M_2}\right) \text{ additions per second} \qquad (2.7)$$
Now consider the structure using complex filters.
For multiplication:
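first stage filter: $R^*_{fil1} = 2\left(\frac{f_{sam}}{M_1} N_1\right)$, since the input signal is real,
second stage filter: $R^*_{fil2} = 4\left(\frac{f_{sam}}{M_1 M_2} N_2\right)$,
total:
$$R^*_{total} = 2\,\frac{f_{sam}}{M_1}\left(N_1 + 2\,\frac{N_2}{M_2}\right) \text{ mps} \qquad (2.8)$$
For addition, the first stage filter is identical to the previous case. The
second stage filter requires 2 additions per coefficient for complex multiplication plus
the summation of the products, amounting to approximately twice that of the previous
case. Hence
total:
$$R^+_{total} \approx 2\,\frac{f_{sam}}{M_1}\left[(N_1 - 1) + 2\,\frac{N_2 - 1}{M_2}\right] \text{ additions per second} \qquad (2.9)$$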
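The per-slot multiplication rates of the two arrangements can be compared with a short calculation. The sketch below sums the per-stage terms given in the text (modulator 2·f_sam, real stages 2·(rate)·N, complex second stage 4·(rate)·N). The function names and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def per_slot_mult_rate_real(f_sam, M1, N1, M2, N2):
    """Real mps for the modulator plus two real filter stages."""
    modulator = 2.0 * f_sam                      # real input times complex carrier
    stage1 = 2.0 * (f_sam / M1) * N1             # I and Q branches, real taps
    stage2 = 2.0 * (f_sam / (M1 * M2)) * N2
    return modulator + stage1 + stage2


def per_slot_mult_rate_complex(f_sam, M1, N1, M2, N2):
    """Real mps for two complex filter stages with a real input signal."""
    stage1 = 2.0 * (f_sam / M1) * N1             # real input, complex taps
    stage2 = 4.0 * (f_sam / (M1 * M2)) * N2      # complex input, complex taps
    return stage1 + stage2


if __name__ == "__main__":
    args = (64000.0, 8, 16, 2, 32)               # illustrative values only
    print(per_slot_mult_rate_real(*args), per_slot_mult_rate_complex(*args))
```

Which arrangement is cheaper depends on the balance between the modulator cost and the doubled cost of the second stage, so the comparison should be repeated for the filter lengths of the actual design.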
Storage requirement will depend on the word lengths of the A/D converter, filter
coefficients and filter arithmetic. For both structures, the coefficient storage will be identical
and equal to $(N_1 + N_2)$ words. For the structure using modulators and real filters, the
requirement is 2 words for the modulator and $(2N_1 + 2N_2)$ words for the filter history.
Therefore the total storage required is $2(N_1 + N_2 + 2)$ words.
For the complex filter arrangement the storage is for the real and imaginary parts of the
filter history only and is therefore equal to $2(N_1 + N_2)$.
These expressions for the storage assume that the wordlengths for both stages are
identical. In theory this should not be the case, because rounding at each FIR filter stage
introduces noise and the wordlength should be increased to maintain the
same signal to noise ratio [107]. This is a subject discussed in section 2.3. However, in
practice a single constant wordlength is most likely to be used because of hardware
simplicity.
The choice of the variables in the above expressions (namely the wordlengths, filter
lengths, sampling rate and decimation ratio) involves a trade-off study. This was carried
out by computer simulation in conjunction with theoretical analysis and is described in the
subsequent sections.
2.2.2 Multirate Filtering with Comb Filters 
Comb filters are FIR filters that have the following impulse response:
$$h(n) = \begin{cases} 1, & 0 \le n \le N-1, \\ 0, & \text{otherwise,} \end{cases}$$
which is in the form of a $\frac{\sin x}{x}$ function in the frequency domain. When the constraints on
frequency response are correctly applied, the comb filter may be used as an efficient
technique for lowpass filtering and decimation [108,109]. In this case the filter length N equals
its decimation rate, which lends itself to very simple implementation with inexpensive
computation in the form of a simple accumulator which sums groups of N input samples.
The design procedure is to choose the highest allowable decimation rate provided that the
requirements on stopband attenuation and passband bandwidth are met. The basic
approximate requirement, from [106], is
$$F_i \ge \frac{f_s}{\delta_s}, \quad \text{for } M_i \gg 1,$$
where $2f_s$ = stopband bandwidth, $F_i$ = sampling rate at decimator output, $\delta_s$ =
stopband attenuation, and $M_i$ = decimation rate.
For example, if $\delta_s$ = -20 dB, $F_i \ge 10 f_s$.
For this application, let $F_i = \frac{f_{sam}}{M_i}$, where $f_{sam}$ = input sampling rate. Then
$$M_i \le \frac{f_{sam}}{f_s}\,\delta_s \qquad (2.10)$$
To allow the largest decimation for a given $f_{sam}$ and $\delta_s$, $f_s$ must be made as small
as possible, which in practice means sampling at the Nyquist frequency with $f_s$ equal to
one channel spacing.
If sampled at the Nyquist rate, $f_{sam} = 2 K f_{slot}$ where K = number of channels, and
$f_{slot}$ = channel bandwidth. Then
$$M_i \le \frac{2 K f_{slot}}{f_s}\,\delta_s$$
Let $f_s = f_{slot}$; then
$$M_i \le 2K\delta_s \qquad (2.11)$$
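A minimal sketch of the comb-filter decimator follows, assuming that (2.11) reads M_i ≤ 2Kδ_s with δ_s expressed as a linear amplitude ratio (0.1 for -20 dB). Both function names are hypothetical. The decimator is the accumulator described in the text: it sums non-overlapping groups of M input samples and needs no multiplications.

```python
import math


def max_comb_decimation(K, delta_s):
    """Largest integer decimation rate allowed by M <= 2*K*delta_s.

    K       : number of channels (Nyquist sampling, f_s equal to one slot)
    delta_s : stopband attenuation as a linear ratio, e.g. 0.1 for -20 dB
    """
    return math.floor(2 * K * delta_s)


def comb_decimate(x, M):
    """Comb filter of length N = M followed by decimation by M.

    Implemented as an accumulator over consecutive groups of M samples;
    a trailing incomplete group is discarded.
    """
    return [sum(x[m:m + M]) for m in range(0, len(x) - M + 1, M)]
```

For example, `max_comb_decimation(32, 0.1)` gives 6 for a 32-channel multiplex with a 20 dB stopband requirement, and `comb_decimate([1, 2, 3, 4, 5, 6, 7], 3)` returns `[6, 15]`.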
This method performs filtering on a per slot basis and the decimation structure
replaces the first stage filter in the conventional two stage design. The second stage would
have to be a conventional FIR filter to provide the narrow band response required. The
option of complex filter implementation does not exist in practice because it would render
the accumulator implementation impossible and offer no advantage in computation. The
derivations of the computation rates and storage are trivial. There is no difference in storage
between this method and the multistage filtering method in section 2.2.1. The same is true
for the addition rate. The only difference is that there is no multiplication required at the
first stage filter, hence the total multiplication rate becomes
$$R^*_{total} = 2 f_{sam}\left(1 + \frac{N_2}{M_1 M_2}\right) \qquad (2.12)$$
where $N_i$ and $M_i$ are as defined previously.
It can be noted that $M_1 = N_1$ for a comb filter. Choosing the maximum filter length at
the first stage therefore minimizes that at the second.
2.2.3 Tree Filter Bank 
The tree filter bank method differs from the previous methods in that parts of the
filtering are shared amongst the outputs, as opposed to the situation before where each slot
has its own filtering. In the binary tree structure, the filtering is carried out by successive
stages each decimated by a factor of two, as shown in Fig.2.2. At each stage the output
consists of a lowpass and a highpass filter branch. It is designed such that the frequency
responses of the lowpass and highpass filters are simply frequency-shifted versions of the
same prototype specification, with each filtering half of the number of channels in the
input frequency multiplex.
Let the lowpass prototype impulse response be $h_{LP}(n)$, n = 0, 1, 2, ...;
then the highpass impulse response is
$$h_{HP}(n) = e^{j\omega_0 n}\, h_{LP}(n)$$
where $\omega_0$ = normalized frequency (rad/s).
Now $\omega_0 = \pi$ when real filters are used, as shown in Fig.2.3; then
$$h_{HP}(n) = (-1)^n\, h_{LP}(n)$$
The output of the highpass filter is
$$y_{HP}(n) = \sum_{i=0}^{N-1} h_{HP}(i)\, x(n-i) = \sum_{i=0}^{N-1} (-1)^i\, h_{LP}(i)\, x(n-i)$$
[Fig.2.2 Tree filter bank for 8-channel. Block diagram: the input x(n) passes through
successive filter stages H(z), each followed by decimation by 2, with final stage filters G(z).]
This means that the same product terms $h_{LP}(i)\,x(n-i)$ may be used to produce
both the lowpass and highpass outputs of the filter stage. An alternate sign-inversion is
involved in the summation in one of the two branches, which should impose less
computational burden than multiplication.
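The sharing of product terms between the two branches can be sketched as follows. Each product h_LP(i)·x(n-i) is formed once and is then added to the lowpass sum and added to or subtracted from the highpass sum according to the parity of i. The function name is hypothetical and x(n) is assumed to be zero for n < 0.

```python
def lowpass_highpass_outputs(x, h_lp, n):
    """Return (y_LP(n), y_HP(n)) for a real filter pair with h_HP(i) = (-1)^i h_LP(i).

    One multiplication per tap serves both branches; the highpass branch
    needs only an alternating sign inversion.
    """
    low = 0.0
    high = 0.0
    for i, hi in enumerate(h_lp):
        if n - i < 0:
            break                      # input assumed zero before the first sample
        p = hi * x[n - i]              # shared product term
        low += p
        high += p if i % 2 == 0 else -p
    return low, high
```

For x = [1, 2, 3, 4, 5], h_LP = [1, 2, 1] and n = 4 the products are 5, 8 and 3, giving a lowpass output of 16 and a highpass output of 5 - 8 + 3 = 0.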
Similarly, the computation rate for a binary tree filter bank using real filters may be
expressed in terms of other parameters as before.
[Fig.2.3 Filter responses for real tree TMUX: lowpass and highpass ($H_{hp}$) responses
plotted against frequency from 0 to $f_s$.]
Consider first stage filtering:
$f_1$ = output data rate = $\frac{f_{sam}}{2}$,
$R^*_1$ = multiplication rate = $f_1 N_1 = f_{sam}\,\frac{N_1}{2}$ real mps,
assuming general N-tap FIR filters.
Successive stages have data rates reduced by factors of two, but at the same time
the number of input signal branches increases by the same factor. Therefore the multiplication
rate of each stage is given by an expression similar to the above, and the total multiplication
rate is
$$R^*_{total} = \frac{f_{sam}}{2} \sum_{i=1}^{L} N_i \text{ real mps} \qquad (2.13)$$
where L = number of stages = $\log_2 K$; K = number of channels.
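The stage-by-stage argument can be turned into a small calculation. The sketch assumes, as the text argues, that every stage of the real binary tree costs f_sam·N/2 real multiplications per second because the halving of the data rate is offset by the doubling of the number of branches. The total is then the sum over L = log2(K) stages. The function names are hypothetical.

```python
import math


def number_of_stages_real_tree(K):
    """L = log2(K) for a binary tree; K is assumed to be a power of two."""
    return int(math.log2(K))


def real_tree_mult_rate(f_sam, filter_lengths):
    """Total real mps, one entry of filter_lengths per tree stage."""
    return sum(0.5 * f_sam * N for N in filter_lengths)


if __name__ == "__main__":
    K = 8
    L = number_of_stages_real_tree(K)
    # Illustrative filter length of 16 taps at every stage (an assumption).
    print(L, real_tree_mult_rate(64000.0, [16] * L))
```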
However, the use of real filters and real signals in the above binary tree structure has
a serious shortcoming, in that the filter must have very narrow transition bands to avoid
aliasing at the present decimation ratio. Ideally the filters are required to have a
brickwall response, or some bandwidth will be wasted in the transition bands. Thus the
filters at each stage become very long, making the scheme computationally inefficient,
although the computation may be reduced by the use of halfband filters and by exploiting
the coefficient symmetry of FIR filters.
To circumvent this problem the general approach is to use complex signals and filters.
Fig.2.4 illustrates the basic frequency responses of these filters. The transition band is
made equal to the passband bandwidth, which is equal to the stopband bandwidth. This
particular design also allows the use of halfband filters which are described in the next
section. The highpass filter is again a frequency shifted version of the lowpass response. In
this case it is more complicated as the frequency shift is $\frac{\pi}{2}$ rad/s, i.e.
$$h_{HP}(n) = e^{j\frac{\pi}{2} n}\, h_{LP}(n), \quad n = 0, 1, 2, \ldots$$
[Fig.2.4 Filter responses for complex tree TMUX: lowpass and highpass ($H_{hp}$) responses
plotted against frequency from 0 to $f_s$.]
The highpass and lowpass branch outputs may still be produced by the same product
terms $h_{LP}(i)\,x(n-i)$, but it now involves commutation of real and imaginary parts as
well as the sign-inversion previously described. One method of doing this is to use the
polyphase decomposition described in the next chapter.
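The commutation of real and imaginary parts can be illustrated with a short sketch. It assumes the highpass response is the lowpass prototype multiplied by exp(+jπn/2), that is by the sequence 1, j, -1, -j, ...; the sign of the exponent is an assumption of this example. Multiplying a product term by a power of j requires only swaps and sign changes, so no extra multiplications are needed. The function names are hypothetical.

```python
import cmath


def times_j_power(z, i):
    """Multiply z by j**i using only swaps and sign changes."""
    r = i % 4
    if r == 0:
        return z
    if r == 1:
        return complex(-z.imag, z.real)    # j * z
    if r == 2:
        return complex(-z.real, -z.imag)   # -z
    return complex(z.imag, -z.real)        # -j * z


def complex_tree_outputs(x, h_lp, n):
    """Return (lowpass, highpass) outputs at time n from shared product terms."""
    low = 0j
    high = 0j
    for i, hi in enumerate(h_lp):
        if n - i < 0:
            break                          # input assumed zero before the first sample
        p = hi * x[n - i]                  # shared product term
        low += p
        high += times_j_power(p, i)
    return low, high


def direct_highpass(x, h_lp, n):
    """Reference computation with explicit complex exponentials."""
    return sum(
        cmath.exp(1j * cmath.pi / 2 * i) * h_lp[i] * x[n - i]
        for i in range(len(h_lp)) if n - i >= 0
    )
```

The highpass value returned by `complex_tree_outputs` agrees with `direct_highpass` to within rounding error.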
The derivation for the computation rate of this structure is very similar to the case
with real filters. The difference lies in the implementation of complex filters and the
filtering of complex signals.
Consider, in general, 1 complex multiplication = 4 real multiplications;
therefore the multiplication rate of complex filters filtering complex signals = 4 x that of
real filters filtering real signals.
Hence by similar reasoning as before, the total multiplication rate for the structure
using complex filters is
$$R^*_{total} = 2 f_{sam} \sum_{i=1}^{L} N_i \text{ real mps.}$$
However, in the first stage the input signal is real and would only need two instead
of 4 real multiplications. The reduction is equal to $2\,\frac{f_{sam}}{2}\,N_1$ real mps.
Another possible saving in computation occurs in the final stage post-filters. These
filters are to eliminate the unwanted aliased signal band, and perform the final decimation
to sampling at the Nyquist rate for each channel. The complex filter used here is a real
lowpass prototype shifted by $\frac{\pi}{2}$ rad/s. Hence the filter coefficients are simply
commuted in real and imaginary parts with sign inversions. That is, the real and imaginary
parts are alternately zero. Hence the actual number of multiplications here can be reduced
by $f_{sam} N_L$ real mps.
Therefore the total multiplication rate becomes
$$R^*_{total} = 2 f_{sam} \sum_{i=1}^{L} N_i - f_{sam}(N_1 + N_L) \text{ real mps} \qquad (2.14)$$
where L = $\log_2(K) + 1$.
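The bookkeeping for the complex tree can be sketched in the same way. The calculation below rests on three assumptions drawn from the reasoning in the text rather than from a legible equation: every stage nominally costs four times the real-filter figure, that is 2·f_sam·N real mps; the real input saves 2·(f_sam/2)·N_1 = f_sam·N_1 at the first stage; and the alternately zero coefficients save f_sam·N_L at the final stage. The function name is hypothetical.

```python
def complex_tree_mult_rate(f_sam, filter_lengths):
    """Total real mps for a complex binary tree with L = log2(K) + 1 stages.

    filter_lengths holds N_1 ... N_L; at least two stages are assumed.
    """
    if len(filter_lengths) < 2:
        raise ValueError("the sketch assumes at least two stages")
    nominal = sum(2.0 * f_sam * N for N in filter_lengths)   # 4 x real-filter cost
    saving_first = f_sam * filter_lengths[0]                 # real input signal
    saving_last = f_sam * filter_lengths[-1]                 # alternately zero taps
    return nominal - saving_first - saving_last


if __name__ == "__main__":
    # K = 8 channels gives L = 4 stages; 8 taps per stage is illustrative only.
    print(complex_tree_mult_rate(1000.0, [8, 8, 8, 8]))
```

Halfband filters and coefficient symmetry would reduce these figures further; the sketch leaves those reductions out.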
The passband, stopband and transition bandwidth satisfy the specification of
halfband filters, and hence a reduction by roughly a factor of two can be achieved since
about half of the halfband filter coefficients are zero. Exploiting also the FIR filter
symmetry, reduction of the multiplication rates by approximately a factor of 4 can be
achieved.
It is interesting to note from the expression for L here that there is apparently
one extra stage in the complex filter structure as compared to the previous structure using
real filters. This can be explained by a close examination of the first stage
filtering. In this stage, the sampling rate is decimated by a factor of 2 but the output signal
becomes complex. In effect there is no net decimation in data rate, in that the number of
samples per second at the input is the same as that at the output. The function of the first
stage filter, then, is no more than to convert the real input signal to a complex output. It is
a wasteful process but no simpler process can be used to replace it. One possibility is to
use the Discrete Hilbert Transform [107] in the first stage to produce the analytic signal
representation of the input signal, as shown in Fig.2.5. The second stage filter can then
produce four useful outputs using the same lowpass prototype. The succeeding stages are the
same as before. This method eliminates one filter of the second stage at the expense of
increased computation at the first stage due to the Hilbert Transform. In practice the
overall computation is increased, since the Hilbert Transform is equivalent to an FIR filter with 3
to 4 times the length of the complex filter which it replaces. As with FIR implementation
of lowpass filters, the finite-duration approximation of the ideal Hilbert transformer has
finite transition bands, which means that the frequency bands near to the edges of the
baseband cannot be used. These disadvantages render the Discrete Hilbert Transform
approach an unattractive alternative.
[Fig.2.5 TMUX using Hilbert transform. Block diagram: the input is split into a delayed real
part and a Hilbert-transformed imaginary part, each decimated by 2 and fed to a 4-band filter;
the accompanying spectra show the input signal, the 4-band filter responses and the output
channel signals against frequency.]
Another point of interest is the final stage filters. As pointed out earlier, the function
of these filters is to remove the aliased band and decimate the signal to the final rate. They
have always been included as part of the TMUX in past literature on the subject. The
reason is that the subject addressed in the literature was the general TDM-FDM
TMUX structure, involving no demodulation since the signals were generally speech
signals in PCM format, and the final stage filter was required to give a real output signal at
the Nyquist rate of each channel. In the application for multicarrier demodulators,
however, the requirement on the sampling rate at the output of the TMUX is not a strict one.
The final filter can and should be part of the demodulator as the matched filter for data
detection, which would reduce the overall computations. This is the subject of overall
optimization of the TMUX-demodulator structure and requires further investigation.
Variations of the tree filter bank are plentiful. The first obvious choice is to use a
mixer at the first stage to shift the input signal in frequency, instead of shifting by the filters.
The analysis is the same as that given previously for per-slot structures and so will not be
repeated here.
One novel structure is described here. It achieves higher computation efficiency by
reducing the number of filters by one, but is limited to a small number of channels (K=4).
The concept is to use the first stage filter as a band-splitting operation to produce three
outputs instead of the normal binary band-splitting filter. The desired frequency responses
of the filters are shown in Fig.2.6 for the K=4 case, and the structure is shown in Fig.2.7.
The first stage consists of three filters: two real filters (a highpass and a lowpass) and a
complex one, all derived from the same prototype response. The real filters each select a
quarter of the baseband multiplex, while the complex filter selects the centre half of the
multiplex. It can be seen that the real filter responses perform the function that is
normally carried out by two stages in a tree structure as discussed before. Hence there is an
inherent saving of one filter. The rest of the operation is analogous to ordinary tree filter
banks and can be understood from Fig.2.6. The number of channels is chosen to be K=4
for a good reason. It can be seen that the output signals are not of the same form, i.e.
some are real and some complex. This can be awkward and impractical. Furthermore, for
higher values of K such as K=8, the real output signals from the first stage are very
awkward for efficient filtering implementation by the subsequent stages. In fact for any
case other than K=4, this implementation will not lead to any saving in computation at
all. This indicates the kind of variety that can be derived from the tree filter bank
architecture. It also further highlights the inefficiency caused by conversion of a real signal into
a complex one by a complex filtering process.
[Fig.2.6 4-band tree filter bank frequency responses: responses of the filters plotted
against frequency from 0 to $f_s$.]
[Fig.2.7 4-band tree filter bank. Block diagram: the input x(n) feeds the first stage filters
H1, H2 and H3, with second stage filters H4 and H5, producing the outputs $y_1(n)$ to $y_4(n)$.]
2.2.4 Summary 
A number of techniques applicable to non-transform type implementation of TMUX's
have been identified. These lead mainly to two types of structure: the per-channel type and
the tree type filter banks. Expressions relating their computation requirement to the
internal parameters of these structures have been derived. Some possibilities for reducing the
computation requirements, such as the use of halfband filters and comb filters, have been studied
as to their suitability and limitations in a TMUX. This allows some parameters of
importance to be identified, such as the interaction between filter lengths and decimation
rates in multistage filtering. Efforts can then be directed towards the optimization of these
parameters. This is the subject discussed in the next two sections.
2.3 Analysis of Parameters in Implementation 
This section describes an analytical approach for the evaluation of parameters in the
digital implementation of TMUX's for SCPC applications. The method is based on
statistical models for signal and noise analysis. Analytical methods using statistical means are
capable of deriving lower bounds on some performance measures. The performance
bounds so obtained are usually pessimistic compared to the actual performance achievable
in the real system, but are nevertheless useful in providing some estimates for the design of
the system, especially when these estimates are used in conjunction with computer
simulation to provide a more accurate design approach. First the model for the multicarrier SCPC
multiplex is derived. Then the model is used in conjunction with the noise model analysis
associated with the finite wordlength effects in the TMUX. As a result, the relationships
between various parameters are assessed, which then allows the choice of parameters given
certain constraints in the performance requirements.
2.3.1 Statistical Properties of Multicarrier SCPC Signals
The objectives in considering the statistical properties of multicarrier SCPC signals
are mainly twofold. Firstly it allows the dynamic range of the analogue to digital
converter (ADC) to be estimated. Secondly the signal power, or in statistical terms its
variance, is required for analysis of noise due to finite wordlength effects of digital signal
processors.
2.3.1.1 Variance of SCPC QPSK Signal 
In the application concerned, each SCPC signal is QPSK of the form [80]
$$x(t) = A \sum_{k=-\infty}^{\infty} g(t - kT_s)\cos(\omega_c t + \phi_k) \qquad (2.15)$$
where g(t) is the signalling waveform for a symbol period of $T_s$, k is the symbol number,
$\omega_c$ is the carrier frequency and $\phi_k$ is assumed to have a uniform probability mass
function such that $\Pr(\phi_k = 0) = \Pr(\phi_k = \frac{\pi}{2}) = \Pr(\phi_k = \frac{3\pi}{2}) = \Pr(\phi_k = \pi) = \frac{1}{4}$.
If g(t) is a rectangular pulse of magnitude 1, the double sided power spectral
density of the SCPC signal is then
(2.16)
Now with a frequency multiplexed multicarrier signal it can be assumed that the
individual SCPC signals are uncorrelated, such that given two signals x(t), y(t) and their
sum z(t) = x(t) + y(t), the autocorrelation function of z(t) is given by
$$R_z(\tau) = R_x(\tau) + R_y(\tau) \qquad (2.17)$$
where $R_x(\tau)$ and $R_y(\tau)$ are the autocorrelation functions of x(t) and y(t) respectively.
This is a reasonable assumption since the SCPC signals are transmitted by independent
sources. Let $G_i(\omega)$ be the power spectral density of the SCPC signal of channel i, and
$H_i(\omega)$ be its transmission filter frequency response, such that the power spectral density
of each channel is $G_i(\omega)\,|H_i(\omega)|^2$; then the power spectral density of a K channel multiplex is
$$P_x(\omega) = \sum_{i=0}^{K-1} G_i(\omega)\,|H_i(\omega)|^2 \qquad (2.18)$$
It is now possible to determine the dynamic range required of the ADC, given that
the variance of a random variable is related to its power spectral density by
(2.19)
By the central limit theorem, the resultant signal may be assumed to be a Gaussian
random process with zero mean and variance as expressed. A probability of overflow
corresponding to a particular range of x, hence $\sigma_x^2$ and a scale factor, can all be found.
It is more convenient to work in the discrete frequency domain since the expressions
are more directly related to the discrete frequency variable. The relationship between the
discrete and analogue power spectral densities is given by
$$G_d(\Omega) = \frac{2\pi}{T} \sum_{n=-\infty}^{\infty} G_a\!\left(\frac{\Omega}{T} - \frac{2\pi n}{T}\right) \qquad (2.20)$$
where T = sampling period of the ADC, and $G_a(\Omega)$ and $G_d(\Omega)$ are the power
spectral densities in the analogue and the digital domain respectively.
Assuming as is normal that there is suitable anti-alias filtering so that within the
range $-\pi \le \Omega \le \pi$, $G_a(\Omega)$ becomes negligible except for the term n = 0. Substituting (2.16)
into (2.19) then gives the following expression,
(2.21)
and in the presence of the transmission filter this becomes
(2.22)
where H(ω) is the transmission filter response.
H(ω) must be a root-Nyquist filter to avoid intersymbol interference, and a popular
and straightforward choice is the raised-cosine characteristic, in which case H(ω) has a
root raised-cosine response. Clearly the integrals of the negative and positive frequency
components in (2.22) are equal, and hence the problem may be simplified by considering
the integral of the one-sided power spectral density and abstracting the constant scale
factor, as expressed by
(2.23)
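Let H(ω) be the ideal root raised-cosine filter, such that
$$|H(\omega)|^2 = \begin{cases} 1, & |\omega - \omega_c| \le (1-\beta)W, \\ \frac{1}{2}\left[1 - \sin\left(\frac{\pi\,(|\omega - \omega_c| - W)}{2\beta W}\right)\right], & (1-\beta)W \le |\omega - \omega_c| \le (1+\beta)W, \\ 0, & \text{otherwise,} \end{cases} \qquad (2.24)$$
where $W = \frac{\pi}{T}$ and β = rolloff factor.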
Finally, referring to Fig.2.8, the $\frac{x}{\sin x}$ equalizer is included to find the variance of a
single channel signal to the ADC. This has the form of
$$F(\omega) = \frac{(\omega \pm \omega_c)\,T_s/2}{\sin\left((\omega \pm \omega_c)\,T_s/2\right)} \qquad (2.25)$$
[Fig.2.8 SCPC system model. Block diagram: root-Nyquist transmit and receive filters, a channel
with additive white noise e(t) and link gains, followed by the A/D converter, TMUX and
demodulator.]
The relationship between the power spectral densities of the output Y(ω) and a
random input X(ω) to a linear system H(ω) is given by
$$Y(\omega) = X(\omega)\,|H(\omega)|^2 \qquad (2.26)$$
Consideration of (2.23) and (2.25) together with the above expression shows that the
resultant power spectral density of the transmitted QPSK signal simply has the form of
the raised-cosine function, i.e.
$$Y(\omega) = A^2 T_s\,|H(\omega)|^2 \qquad (2.27)$$
Integration of this function, as required in (2.23), gives the result of $A^2$. This is the
variance or the power of the QPSK SCPC signal, which will then facilitate the evaluation
of the dynamic range of a group of frequency multiplexed signals.
2.3.1.2 Dynamic Range of Multicarrier Signals and ADC Scale Factor 
With a frequency multiplex of SCPC signals, it is again assumed that the individual
signals are uncorrelated with each other, in which case the total power for a K channel
multiplex is
$$P_K = \sum_{n=0}^{K-1} A_n^2 \qquad (2.28)$$
A further simplification occurs if the SCPC signals are assumed to have the same
amplitude $A$. In the practical situation, when the signals are transmitted by different earth
stations, this is not likely to be the case. However, it becomes a valid assumption when the
objective is to estimate the dynamic range or scale factor for the ADC. Letting all SCPC
signals be of equal amplitude then allows the maximum signal power at the input to the
ADC to be specified. In the mobile-satellite application, this specifies the situation when all
mobiles are transmitting without any attenuation other than that included in the
link budget calculation. This is therefore a valid and practical specification for the ADC.
The procedure to obtain a scale factor for the ADC is to assume that the FDM signal
constitutes a Gaussian random variable with zero mean and variance equal to $KA^2$. This
is a valid assumption by virtue of the central limit theorem, given a reasonably large
number of SCPC signals. The scale factor is chosen, for a given variance, to give a certain
probability of overflow at the ADC. The assumption of a zero-mean Gaussian distribution
means that the scale factor so chosen is proportional to the standard deviation, which is the
square root of the variance. Hence the final relationship obtained is that the scale factor $S$
for the ADC is
$$S \propto A\sqrt{K} \tag{2.29}$$
The constant of proportionality determines the probability of overflow allowed. It must
also take into account the amplitude gain from the transmitters to the ADC input, which
may be calculated from the link budget.
The effects of saturation or clipping at the ADC when the input signal overflows are
generally nonlinear and hence a detailed analysis is difficult. Guidelines based on
experimental results [107] suggest a scale factor equivalent to four times the standard
deviation. This corresponds to an overflow probability in the order of $10^{-5}$, which is an
alternative design rule suggested elsewhere [110].
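The link between the constant of proportionality and the overflow probability can be illustrated with a short calculation. The sketch below assumes the constant is expressed as a multiple k of the standard deviation of a zero-mean Gaussian signal; the function names are hypothetical and chosen only for this illustration.

```python
import math

def overflow_probability(k):
    """P(|x| > k*sigma) for a zero-mean Gaussian x (two-sided tail)."""
    return math.erfc(k / math.sqrt(2.0))

def adc_scale_factor(k, amplitude, channels):
    """Scale factor S = k * A * sqrt(K), following the proportionality in (2.29)."""
    return k * amplitude * math.sqrt(channels)

# Tabulate the overflow probability for a few candidate multiples of sigma.
for k in (3.0, 4.0, 5.0):
    print(f"k = {k:.0f}: overflow probability = {overflow_probability(k):.2e}")
```

For k = 4 the two-sided tail probability lies between $10^{-5}$ and $10^{-4}$, which is the regime quoted in the text.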
In the simulation studies the model used is slightly different from the one just
described and is shown in Fig.2.9. The difference comes from the requirement of the
quasi-analytical method [111] for the determination of bit-error rate (BER) using computer
simulation. The system under test is assumed linear, such that the channel additive white
Gaussian noise (AWGN) may be represented by an equivalent noise source at the
commuted position, as shown in Fig.2.9. This assumption of linearity for the system under
test, which is the TMUX in this case, is not valid in the strictest sense, as the finite
wordlength effects are generally nonlinear. However, given the correct scaling to prevent
overflow, the other finite wordlength effects may be modelled statistically, so that the TMUX
can be approximated as a linear system and hence may still be tested under these conditions.
The scale factor required for this system under simulation is determined as follows.
In the computer simulation the overall gain is taken to be unity to simplify calculations.
Thus the power spectral density at the input to the system is similar to that described
previously, with the main difference being the squared raised-cosine instead of the
raised-cosine, that is
$$|H(\omega)|^4 = \begin{cases} 1, & |\omega-\omega_c| \le (1-\beta)W \\ \tfrac{1}{4}\left[1-\sin\left(\dfrac{\pi\,(|\omega-\omega_c|-W)}{2\beta W}\right)\right]^2, & (1-\beta)W \le |\omega-\omega_c| \le (1+\beta)W \\ 0, & \text{otherwise} \end{cases} \tag{2.30}$$
[Fig.2.9 SCPC system model in simulation: modulators, Nyquist filters, link gains, system under test, ideal demodulator and quasi-analytical BER estimation with noise source e(t)]
The signal variance is then obtained as before using (2.23). This gives a closed form
expression of the form
$$\sigma^2 = A^2\left(1-\frac{\beta}{4}\right) \tag{2.31}$$
A practical value of $\beta$ is 40%. In this case $\sigma^2 = 0.9A^2$. The scale factor is then
proportional to $A\sqrt{0.9K}$. With unit amplitude signals and using a criterion for overflow
probability as described previously, the scale factor is about $3.0\sqrt{K}$.
Hence an approach to the analytical estimation of the parameters of the ADC is
obtained.
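The factor of 0.9 quoted for a 40% rolloff can be checked numerically. The sketch below is an illustration only: it assumes the raised-cosine shape has unit passband level and a sinusoidal rolloff over $(1-\beta)W \le |\omega-\omega_c| \le (1+\beta)W$, integrates the shape and its square by the midpoint rule, and normalizes by $W$. The function names and step count are hypothetical.

```python
import math

def raised_cosine(x, beta, w=1.0):
    """Raised-cosine shape as a function of x = |omega - omega_c| (assumed form)."""
    if x <= (1.0 - beta) * w:
        return 1.0
    if x <= (1.0 + beta) * w:
        return 0.5 * (1.0 - math.sin(math.pi * (x - w) / (2.0 * beta * w)))
    return 0.0

def normalized_power(beta, squared, steps=20000, w=1.0):
    """Midpoint-rule integral of the shape (or its square) over 0..(1+beta)W, divided by W."""
    upper = (1.0 + beta) * w
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        value = raised_cosine((i + 0.5) * dx, beta, w)
        total += (value * value if squared else value) * dx
    return total / w

print("raised-cosine power        :", normalized_power(0.4, squared=False))
print("squared raised-cosine power:", normalized_power(0.4, squared=True))
```

Under these assumptions the first integral is 1 (power $A^2$) and the second evaluates to $1-\beta/4$, which is close to 0.9 for $\beta = 0.4$.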
2.3.2 Filter Specification 
2.3.2.1 Stopband Attenuation 
The overall frequency response of the transmultiplexer must be specified according to
some performance criterion. The most direct criterion would be a given BER against
signal-to-noise characteristic which the system will need to satisfy. The objective therefore
is to derive a carrier-to-noise ratio in terms of the filter specification and then determine a
theoretical lower bound which represents a worst-case performance of the BER
characteristics. Such an approach also allows the filter frequency response to be arbitrary to
a large extent and not necessarily an equiripple prototype, and hence may also be applied in
systems such as the multistage architecture where the overall response is not equiripple.
The lowpass prototype frequency response is considered first. Starting with a
$K$-channel SCPC FDM signal with each channel having equal and unit amplitude, let the
stopband attenuation be $\delta_s$, with the transition band equal to the guard-band bandwidth
between adjacent channels. Again using the power spectral density relationship in (2.26),
if the filter response is $H(\omega)$, then for $\omega$ within the stopband, $|H(\omega)|^2 \le \delta_s^2$. A
pessimistic but reasonable approximation is to assume that $|H(\omega)|^2 = \delta_s^2$, which is a close
approximation for equiripple FIR filters.
The BER characteristics of the system can then be found by modelling the TMUX
operation as an additive noise source in addition to the AWGN for which the BER of the
system is evaluated. For the system modelled in the simulation as shown in Fig.2.9, the
noise source due to the TMUX may be combined with the AWGN noise source of
the BER estimation procedure. With the noise from both sources being uncorrelated, the
noise power at the ideal demodulator is then the sum of the AWGN and the TMUX noise
power.
Let the TMUX noise power be $\epsilon$, and using the notation for optimum signal detection
as shown in Appendix 2A, the signal to noise ratio at the optimum sampling instant is
given by
$$\frac{S}{N} = \frac{k^2E_b^2}{\tfrac{1}{2}k^2E_bN_0 + \epsilon} \tag{2.32}$$
Alternatively $E_b/N_0$ may be expressed in the logarithmic scale ($10\log E_b/N_0$) so
that the degradation in BER is expressed as a reduction in $E_b/N_0$ given by
$$\Delta\frac{E_b}{N_0} = 10\log\left[1 + \frac{2\epsilon}{k^2E_b^2}\,\frac{E_b}{N_0}\right] \tag{2.33}$$
The ratio $\epsilon/k^2E_b^2$ must be determined from the particular sets of system and signal
parameters under investigation. Consider the $K$-channel signal from which one channel is
to be filtered by the TMUX. $\epsilon$ is due to the aliasing into the passband of the $K-1$ channels
suppressed by the stopband attenuation of $\delta_s$. Using the signal variance from (2.27),
the noise power due to these channels is given by
$$\epsilon = (K-1)\,\delta_s^2A^2 \tag{2.34}$$
$E_b$ may similarly be expressed in terms of the signal parameters using (2.27), which
implies that $E_b = A^2T_b$. Referring to the derivation for the matched filter in the
Appendix, this corresponds to a gain constant $k$ of $1/(A\sqrt{T_b})$ between the transmitted
signal and the receiver filter. However, it can be seen that the value of $k$ does not affect the
ratio $\epsilon/k^2E_b^2$. Substituting $E_b$ and $\epsilon$ into (2.33) then gives the reduction in $E_b/N_0$ as
$$\Delta\frac{E_b}{N_0} = 10\log\left[1 + \frac{2\delta_s^2(K-1)}{T_b}\,\frac{E_b}{N_0}\right] \tag{2.35}$$
The actual value of $T_b$ in a sampled-data system should be normalized to the
sampling period of the original signal, as implicitly assumed in the discussion from (2.20) to
(2.23). For the system simulated, $T_b$ may be determined from the knowledge of the
digital Nyquist filter used in the generation of the QPSK signals. Such a filter with unity-gain
passband has the property that its $L$-sample impulse response $\{h(n)\}$,
$-(L-1)/2 \le n \le (L-1)/2$, has a maximum centre sample of $h(0) = 1/M$, where $M$ is the ratio
between the bit period and the sampling period [112]. With a passband gain of $A$ the
receiver filter output at the optimum sampling instant is then
(2.36)
Hence $T_b = 1/M^2$ and substituting into (2.35) gives
$$\Delta\frac{E_b}{N_0} = 10\log\left[1 + 2M^2\delta_s^2(K-1)\frac{E_b}{N_0}\right] \tag{2.37}$$
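The behaviour of (2.37) can be explored numerically. The sketch below is illustrative only: it evaluates the degradation for a given stopband level, channel count and oversampling ratio, and maps it onto a bit error rate by assuming the ideal reference curve is coherent QPSK in AWGN, $\tfrac{1}{2}\,\mathrm{erfc}\sqrt{E_b/N_0}$. The value M = 4 and all function names are assumptions made for the example, not values taken from the text.

```python
import math

def degradation_db(delta_s, channels, m, eb_no_db):
    """Reduction in Eb/No (dB) following the form of (2.37)."""
    eb_no = 10.0 ** (eb_no_db / 10.0)  # convert dB to a linear ratio
    return 10.0 * math.log10(1.0 + 2.0 * m ** 2 * delta_s ** 2 * (channels - 1) * eb_no)

def qpsk_ber(eb_no_db):
    """Ideal coherent QPSK bit error rate in AWGN (assumed reference curve)."""
    eb_no = 10.0 ** (eb_no_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(eb_no))

def degraded_ber(delta_s, channels, m, eb_no_db):
    """BER after subtracting the stopband-induced loss from the nominal Eb/No."""
    return qpsk_ber(eb_no_db - degradation_db(delta_s, channels, m, eb_no_db))

# Hypothetical operating point: 8 channels, M = 4, nominal Eb/No = 10 dB.
for delta_s in (0.001, 0.005, 0.01):
    loss = degradation_db(delta_s, 8, 4, 10.0)
    print(f"delta_s = {delta_s}: loss = {loss:.3f} dB, BER = {degraded_ber(delta_s, 8, 4, 10.0):.2e}")
```

The loss grows with the stopband level, the number of channels and the nominal $E_b/N_0$, as the expression indicates.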
This shows that the degradation in $E_b/N_0$ due to finite stopband attenuation is
related to the number of channels, the stopband attenuation and $E_b/N_0$ itself. The last
term of $\Delta(E_b/N_0)$ reflects the fact that the degradation in $E_b/N_0$ is proportionally greater
for larger values of $E_b/N_0$. A set of curves showing $\Delta(E_b/N_0)$ against $\delta_s$ for different
values of $E_b/N_0$, together with the resulting BER characteristics, is shown in Fig.2.10 to
Fig.2.13, for 8, 16, 32 and 64 channels respectively.
The previous analysis has made the important assumption that the stopband is
equiripple. This is suitable for a single stage filter or filter banks such as the polyphase
filter, where each channel response is a frequency shifted version of a lowpass prototype
which is equiripple. For multistage filtering such as the tree filter bank, the attenuation is
not uniform across the stopband. It then depends on the composite frequency response of
the multistage filter. For tree filter banks using real filters the evaluation is similar to that
previously described, except that the stopband attenuation is different for different
channels. The noise contribution from individual channels must be considered separately for
each channel. For complex filters the evaluation is not so straightforward, since many of
the channels fall within the transition bands at different stages of the multistage process,
and as the transition bands are not well defined in equiripple filter design, a simple closed
form approximation cannot be derived. In this case, given the filter coefficients of each
stage, the actual composite response may be derived. Then the frequency domain model for
the QPSK signal described in the previous section may be used, and by numerical
integration one can calculate the noise contribution due to the unwanted channels as in the
following expression
$$\epsilon = \frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega)\prod_{\text{all } k}\left|\sum_{\text{all } i} h_{k,i}\cos\omega i\right|^2 d\omega \tag{2.38}$$
where $h_{k,i}$ represents the $i$th filter coefficient of the $k$th stage filter.
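A numerical evaluation of an integral of the form (2.38) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the figures: every stage is taken as a zero-phase filter described by a mapping from tap index $i$ to coefficient $h_{k,i}$, all stages are evaluated at a common sampling rate exactly as the expression is written (any decimation between stages would need each response mapped to the input rate first), and the spectrum $S(\omega)$ is supplied by the caller. All names are hypothetical.

```python
import math

def composite_gain_squared(omega, stages):
    """Product over stages of |sum_i h[i] * cos(omega * i)|^2."""
    gain = 1.0
    for taps in stages:
        response = sum(h * math.cos(omega * i) for i, h in taps.items())
        gain *= response * response
    return gain

def stopband_noise(psd, stages, steps=4096):
    """Midpoint-rule estimate of (1/2pi) * integral over [-pi, pi] of psd * composite gain."""
    d_omega = 2.0 * math.pi / steps
    total = 0.0
    for n in range(steps):
        omega = -math.pi + (n + 0.5) * d_omega
        total += psd(omega) * composite_gain_squared(omega, stages)
    return total * d_omega / (2.0 * math.pi)

# Toy example: a flat unit spectrum through one 3-tap smoothing stage.
toy_stage = {-1: 0.25, 0: 0.5, 1: 0.25}
print(stopband_noise(lambda omega: 1.0, [toy_stage]))
```

For the toy stage the response is $0.5 + 0.5\cos\omega$, whose mean square over one period is 0.375, which gives a simple check on the integration.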
The composite responses for tree TMUXs of 16, 32 and 64 channels are shown in
Fig.2.14 to Fig.2.16. The numerical procedures are performed for these sets of filter
responses to find the stopband noise-to-signal ratio $\epsilon/A^2$ which, substituted for $\delta_s^2(K-1)$
in (2.37), allows the curves of $\Delta(E_b/N_0)$ to be plotted as shown in Fig.2.17.
[Fig.2.10 Degradation due to $\delta_s$ for 8 channels. Left: degradation against stopband gain (0 to 0.01) for Eb/No = 8 to 12 dB. Right: bit error rate against Eb/No (dB) for $\delta_s$ = 0.01, 0.005, 0.001 and the ideal case.]
[Fig.2.11 Degradation due to $\delta_s$ for 16 channels. Left: degradation against stopband gain (0 to 0.01) for Eb/No = 8 to 12 dB. Right: bit error rate against Eb/No (dB) for $\delta_s$ = 0.01, 0.005, 0.001 and the ideal case.]
[Fig.2.12 Degradation due to $\delta_s$ for 32 channels. Left: degradation against stopband gain ($1\times10^{-3}$ to $5\times10^{-3}$) for Eb/No = 8 to 12 dB. Right: bit error rate against Eb/No (dB) for $\delta_s$ = 0.005, 0.002, 0.001 and the ideal case.]
[Fig.2.13 Degradation due to $\delta_s$ for 64 channels. Left: degradation against stopband gain ($1\times10^{-3}$ to $5\times10^{-3}$) for Eb/No = 8 to 12 dB. Right: bit error rate against Eb/No (dB) for $\delta_s$ = 0.005, 0.002, 0.001 and the ideal case.]
[Fig.2.14 Composite response for 16-channel tree TMUX. Magnitude (logarithmic scale) against normalized frequency.]
[Fig.2.15 Composite response for 32-channel tree TMUX. Magnitude (logarithmic scale) against normalized frequency.]
[Fig.2.16 Composite response for 64-channel tree TMUX. Magnitude (logarithmic scale) against normalized frequency.]
[Fig.2.17 Degradation due to $\delta_s$ for tree TMUX. Degradation and bit error rate against Eb/No (dB) for K = 16, 32 and 64.]
A comparison of Fig.2.17 with Fig.2.10 to Fig.2.13 shows that the tree structures
have a much lower stopband noise-to-signal ratio than the equiripple filter of similar
computation requirement. In terrestrial TMUXs, where the signals are mainly digitized speech,
this fact cannot be easily exploited because although the total stopband noise is low, the
noise contribution from certain channels is higher than in the equiripple case. This gives rise
to the intolerable effects of intelligible crosstalk. In the TMUX for OBP satellites, the main
concern is the reduction of total stopband noise, and as long as the channel signals are
uncorrelated with each other, the BER performance is a function of the total stopband
noise. Hence the use of these composite responses is justified. A similar line of analysis was
reported by Medlin [113,114], where this ratio was termed the 'integrated sidelobe ratio'
and filters were designed to minimize this alone.
2.3.2.2 Other Parameters 
The other parameters in the filter specification appear to be less influential in the
tradeoff between computation efficiency and performance of the TMUX.
The transition band has direct influence on the filter length and hence the computa-
tion efficiency. It is common practice to assume it to be the guard-band bandwidth between 
the channels. Increasing the transition bandwidth beyond this introduces noise that cannot 
be modelled accurately as white Gaussian noise and hence can only be investigated using 
computer simulation or by experimental means. In a complex tree filter structure the tran-
sition band is already very wide and further increases bring very little advantage in reduc-
ing the filter lengths. 
The other parameter is the passband ripple. It introduces distortion into the channel
signal and may be interpreted as an additive noise source that is equivalent to the channel
signal having passed through a filter. This filter has a frequency response equal to the
difference between the ideal unity gain and the actual passband responses. If the passband
ripple equals that of the stopband, then this additional noise source becomes insignificant
compared to the noise contribution from a larger number of channels in the stopband. Hence it
can be expected that there is more freedom in the choice of the passband ripple compared
to the stopband. However, the empirical relationship linking the parameters of an
equiripple FIR filter [115] shows that the filter length is not strongly affected by the passband
ripple. Therefore there seems little point in the pursuit of an optimal criterion for the choice
of passband ripple.
2.3.2.3 Filter Length 
For an FIR equiripple lowpass filter the empirical expression relating the filter length to
its specification [115] is given by
$$N = \frac{D_\infty(\delta_p,\delta_s)}{(\omega_s-\omega_p)/2\pi} - f(\delta_p,\delta_s)\,\frac{\omega_s-\omega_p}{2\pi} + 1 \tag{2.39}$$
where
$$D_\infty(\delta_p,\delta_s) = \log_{10}\delta_s\left[0.005309(\log_{10}\delta_p)^2 + 0.07114(\log_{10}\delta_p) - 0.4761\right] - 0.00266(\log_{10}\delta_p)^2 - 0.5941(\log_{10}\delta_p) - 0.4278,$$
$f(\delta_p,\delta_s) = 11.012 + 0.512(\log_{10}\delta_p - \log_{10}\delta_s)$, and $\omega_s$, $\omega_p$ are the stopband and
passband transition frequencies respectively.
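The empirical length estimate is easy to evaluate directly. The following sketch transcribes (2.39) with the coefficients given above; the function names and the example specification are hypothetical, and the transition width is passed as the normalized quantity $(\omega_s-\omega_p)/2\pi$.

```python
import math

def d_infinity(delta_p, delta_s):
    """D_inf(delta_p, delta_s) from the empirical equiripple length formula."""
    lp = math.log10(delta_p)
    ls = math.log10(delta_s)
    return (ls * (0.005309 * lp ** 2 + 0.07114 * lp - 0.4761)
            - 0.00266 * lp ** 2 - 0.5941 * lp - 0.4278)

def f_term(delta_p, delta_s):
    """f(delta_p, delta_s) correction term."""
    return 11.012 + 0.512 * (math.log10(delta_p) - math.log10(delta_s))

def fir_length(delta_p, delta_s, transition):
    """Estimated filter length N for a normalized transition width (omega_s - omega_p) / (2*pi)."""
    return (d_infinity(delta_p, delta_s) / transition
            - f_term(delta_p, delta_s) * transition + 1.0)

# Example: equal passband and stopband ripple of 0.01 with a transition width of 0.05.
print(round(fir_length(0.01, 0.01, 0.05), 2))
```

Since the estimate is dominated by $D_\infty$ divided by the transition width, narrowing the transition band lengthens the filter roughly in inverse proportion.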
The expression is found to be reasonably accurate, and more so for larger values of $N$.
For single stage filters or polyphase type filter banks its application is straightforward
and depends simply on the filter specification chosen. Attention is paid here to the tree
type TMUX, whose filter responses possess some regularity that may be elaborated further.
For complex tree type TMUXs using halfband filters, the passband and stopband
transition frequencies are related by
(2.40)
Therefore
(2.41)
Also for halfband filters, $\delta_p = \delta_s$, hence $f(\delta_p,\delta_s) = 11.012$, and the expression
simplifies to
(2.42)
The final stage post filter does not satisfy (2.40) and must be considered separately.
Taking a typical example, as used in the TSAT experiment [46], the channel spacing of
14kHz with 2.8kHz guardband leads to $(\omega_s-\omega_p)/2\pi = 0.05$. And hence
(2.43)
This means that if $\delta_p$ and $\delta_s$ remain the same as in the previous stages of the tree
TMUX, the filter length of this postfilter needs to be approximately five times that of the
preceding stages. From (2.13) it is seen that the multiplication rate is directly proportional
to the sum of the filter lengths. Hence the objective is to minimize the total sum of the
filter lengths while achieving the BER performance required. Each of the sets of parameters
for each stage, and especially $\delta_s$, may be varied so as to achieve this objective.
Unfortunately the BER performance as modelled by the analytical derivation in the previous
section does not lend itself easily to algebraic manipulation, due to the numerical
integration required. Hence optimization based on an analytical relationship cannot be performed.
Simulation is then the only means by which the effects caused by variation of the filter
lengths at different stages may be studied.
2.3.3 ADC Quantization Wordlength 
An estimate of the quantization wordlength required may be obtained using the
statistical model of a uniformly distributed random variable for the quantization error [107].
The noise variance is then
$$\sigma_e^2 = \frac{A^2\,2^{-2b}}{12} \tag{2.44}$$
where $b+1$ = quantization wordlength, $A$ = ADC scale factor.
The quantized number is to be taken as a fraction normalized within the range
$(-1+2^{-b-1})$ and $(1-2^{-b-1})$. This is an accurate model for complex signals, for which the
correlation between the quantization error and the signal is small. For SCPC/FDM with
a large number of channels (> 10), the signal may be assumed to be sufficiently complex.
Using the previous model for QPSK FDM signals, from (2.29) and the subsequent
analysis the scale factor was determined as $3\sqrt{K}$, and substituting into the previous
expression (2.44) gives
$$\sigma_e^2 = 3K\cdot 2^{-(2b+2)} \tag{2.45}$$
The output noise variance may be found by (2.19), assuming that the noise is
white, to give
(2.46)
where $H(e^{j\omega})$ is the system response.
Again consider a prototype lowpass filter response with stopband attenuation $\delta_s$; the
output noise variance may then be approximated by
$$\sigma_o^2 = \sigma_e^2\left(\frac{1}{K} + \frac{\delta_s^2(K-1)}{K}\right) \tag{2.47}$$
The first term in the bracket is due to the passband of the filter, corresponding to the
bandwidth of one channel, and the second term is due to the stopband. An assumption is
made that the quantization noise is white, which is usually valid [110].
Combining (2.45) and (2.47), the following expression is obtained
$$\sigma_o^2 = 3\cdot 2^{-(2b+2)}\left[1 + \delta_s^2(K-1)\right] \tag{2.48}$$
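It is now possible to deduce the degradation in BER using the previous approach.
Substituting $\sigma_o^2$ as $\epsilon$ into (2.34) to (2.37) gives the expression for the reduction in $E_b/N_0$ as
$$\Delta\frac{E_b}{N_0} = 10\log_{10}\left[1 + 6\cdot 2^{-(2b+2)}M^2\left(1+\delta_s^2(K-1)\right)\frac{E_b}{N_0}\right] \tag{2.49}$$
For values of $\delta_s$ and $K$ for which $\delta_s^2(K-1) \ll 1$, the expression is simplified to
$$\Delta\frac{E_b}{N_0} = 10\log_{10}\left[1 + 6\cdot 2^{-(2b+2)}M^2\,\frac{E_b}{N_0}\right] \tag{2.50}$$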
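The dependence of the ADC-induced loss on the wordlength can be tabulated with a few lines of code. The sketch below is illustrative: it evaluates the output noise of the form (2.48) and the degradation of the form (2.50), with an optional stopband factor $1+\delta_s^2(K-1)$ that is an assumption obtained by carrying the bracket of (2.48) through the substitution. The oversampling ratio M = 4 and the function names are hypothetical.

```python
import math

def adc_output_noise(b, channels, delta_s):
    """Output quantization noise variance of the form (2.48), for a 3*sqrt(K) scale factor."""
    return 3.0 * 2.0 ** (-(2 * b + 2)) * (1.0 + delta_s ** 2 * (channels - 1))

def adc_degradation_db(b, m, eb_no_db, channels=1, delta_s=0.0):
    """Eb/No loss (dB); with the defaults this reduces to the simplified form (2.50)."""
    eb_no = 10.0 ** (eb_no_db / 10.0)  # convert dB to a linear ratio
    stopband = delta_s ** 2 * (channels - 1)
    return 10.0 * math.log10(
        1.0 + 6.0 * 2.0 ** (-(2 * b + 2)) * m ** 2 * (1.0 + stopband) * eb_no)

# Loss against wordlength b at a nominal Eb/No of 10 dB, assuming M = 4.
for b in (4, 5, 6, 7, 8):
    print(f"b = {b}: loss = {adc_degradation_db(b, 4, 10.0):.4f} dB")
```

Each bit removed multiplies the noise term by four, so the loss stays small for larger b and rises quickly once b falls to around 5 or below.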
A set of curves for $\Delta(E_b/N_0)$ with different values of $b$ and their resulting BER curves
are shown in Fig.2.18 to Fig.2.21, for two extreme cases of $K = 8$ and 256, with $\delta_s = 0.05$
and 0.001. The similarity between the graphs justifies the use of (2.50) as a good
approximation for practical combinations of $\delta_s$ and $K$. It can also be seen, by a comparison with
Fig.2.10 to Fig.2.13, where the stopband attenuation $\delta_s$ is the parameter instead of $b$,
that the noise due to the ADC quantization has less effect on $\Delta(E_b/N_0)$ and hence on BER
performance. This continues to be so until $b$ is reduced to about 5 or 4, when $\Delta(E_b/N_0)$
shows much greater increases. However, the assumption of the ADC quantization noise model also
becomes less valid as $b$ becomes small, because the correlation between quantization noise
and signal becomes more significant [107]. Hence these curves may only serve as guidelines
to provide some form of estimate for the wordlength. The actual wordlength required
may be found by adding 1 or 2 more bits to that predicted analytically. Indeed this is
shown in a later section where computer simulation is used to compare the results.
In multistage filtering the same difficulty arises in finding a suitable closed form
approximation for the composite frequency response. Numerical methods must be adopted,
which will only give specific results for the filter responses used. In this case the
approximate relationship in (2.50) shows that the dependence on the filter response is negligible.
This approximation remains valid for the composite responses in the tree structures shown
before. Hence the same set of curves applies for the tree TMUX. A conclusion may be
drawn here, that if the TMUX filter response satisfies the stopband requirements in 2.3.2.1
then the ADC wordlength requirement is independent of the number of FDM channels.
2.3.4 Filter Coefficient Wordlength
Finite coefficient wordlength introduces error in the coefficient values, which then
results in the deviation of the resultant frequency response from the original response
designed using infinite precision. The approach to quantifying such error is to model the
coefficient quantization as independent noise sources, very similar to the ADC quantization
noise discussed previously. Rather than expressing the degradation in terms of additive
noise as for the ADC, a more accurate assessment of the effect is provided by examination
of the resultant frequency response deviation. For an FIR lowpass filter, a condition can be
set in that this quantization noise starts to cause degradation to the frequency response
when its magnitude becomes significant compared to the stopband attenuation. In this case
the stopband level will start to rise significantly with further reduction in
wordlength. The relationship [116] is given by
$$\max|\Delta H(\omega)| = \sigma\sqrt{N\ln N} \tag{2.51}$$
which gives the maximum error in the frequency response $\Delta H(\omega)$, where $\sigma^2$ = variance
of the additive coefficient error noise and $N$ = filter length.
Using the noise model of quantization noise for a $(b+1)$-bit wordlength in (2.45) for
$\sigma^2$, and also setting $\max|\Delta H(\omega)| = \delta_s$ as the condition at which $\delta_s$ starts to be
degraded, the following relationship is derived.
$$b = \frac{1}{2}\log_2\left(\frac{N\ln N}{12\,\delta_s^2}\right) \tag{2.52}$$
[Fig.2.18 Degradation due to ADC wordlength for $K = 8$, $\delta_s = 0.05$. Left: degradation against Eb/No (dB) for b = 5 to 8. Right: bit error rate against Eb/No (dB) for b = 5, 6, 8 and the ideal case.]
[Fig.2.19 Degradation due to ADC wordlength for $K = 8$, $\delta_s = 0.001$. Left: degradation against Eb/No (dB) for b = 5 to 8. Right: bit error rate against Eb/No (dB) for b = 5, 6, 8 and the ideal case.]
[Fig.2.20 Degradation due to ADC wordlength for $K = 256$, $\delta_s = 0.05$. Left: degradation against Eb/No (dB) for b = 5 to 8. Right: bit error rate against Eb/No (dB) for b = 5, 6, 8 and the ideal case.]
[Fig.2.21 Degradation due to ADC wordlength for $K = 256$, $\delta_s = 0.001$. Left: degradation against Eb/No (dB) for b = 5 to 8. Right: bit error rate against Eb/No (dB) for b = 5, 6, 8 and the ideal case.]
Thus for a given filter of length $N$ and stopband attenuation $\delta_s$, the minimum
wordlength $b$ can be found. Furthermore, $N$ may be found using (2.39) given a set of
filter specifications, which then allows the optimization of any one of $b$, $N$ or $\delta_s$, given
the other two. The application of this design rule is limited by the fact that (2.51) only
represents an asymptotic bound for $\max|\Delta H(\omega)|$ as $N$ approaches infinity. For the
prototype lowpass filter the filter length is normally greater than 100 for a number of channels
greater than 16. In these cases the asymptotic bound is shown to be a good approximation
[116]. For the complex tree TMUX the filter lengths in each stage, with the possible
exception of the final stage postfilters, are very short, of the order of $N = 10$. In this case the
application of (2.51) is not so accurate.
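The design rule linking $b$, $N$ and $\delta_s$ can be sketched as below. The code assumes the coefficient error variance is $2^{-2b}/12$ (unit scale factor) and equates the asymptotic bound $\sigma\sqrt{N\ln N}$ with $\delta_s$; the function names are hypothetical, and the result is only as good as the asymptotic bound, so it should be treated with caution for short filters.

```python
import math

def coefficient_wordlength(n, delta_s):
    """Wordlength b (excluding sign bit) at which the error bound reaches delta_s."""
    return 0.5 * math.log2(n * math.log(n) / (12.0 * delta_s ** 2))

def max_response_error(n, b):
    """Asymptotic bound sigma * sqrt(N ln N) with sigma^2 = 2^(-2b) / 12."""
    sigma = math.sqrt(2.0 ** (-2.0 * b) / 12.0)
    return sigma * math.sqrt(n * math.log(n))

# Example: a 100-tap prototype with a stopband level of 0.001.
b = coefficient_wordlength(100, 0.001)
print(f"b = {b:.2f} bits, bound = {max_response_error(100, b):.6f}")
```

In practice the computed value would be rounded up to the next integer number of bits.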
Another complication arises from the multistage nature of the tree filter bank. In
theory each stage of the filter bank may have a different frequency response, in particular
a different stopband attenuation, as described in section 2.3.2. In this case the criteria for
optimization may include minimization of coefficient wordlengths at each stage. Since in
general a lower stopband attenuation (a larger $\delta_s$) permits a shorter wordlength, the use
of a different $\delta_s$ at each stage appears to offer an advantage in reducing wordlengths. In
fact this was found not to be the case, as a reduction of the attenuation in one stage
requires an increase in another to maintain the same BER performance, and the net effect
is insignificant.
The determination of the coefficient wordlengths for the tree structures is therefore
more properly achieved by computer simulation, as analytical means are unlikely to yield
sufficiently accurate estimates.
2.3.5 Filter Arithmetic Wordlengths 
The objective in this section is to model noise due to the rounding of the
multiplication products to a given arithmetic wordlength. This is required after each multiplication
when the sum of the wordlengths of the two multiplicands is greater than that of the
product accumulator, or when the product needs to be rounded to a wordlength convenient
for subsequent processing. It involves the appropriate scaling of the coefficients and the
determination of the resultant roundoff noise.
2.3.5.1 Scaling of Coefficients 
Scaling of the filter coefficients is required to prevent overflow in the resultant
products. It also affects the roundoff noise, and hence the objective is to minimize both the
effects of overflow and roundoff noise at the same time. Based on characterization of
signals using $L_p$ norms, appropriate scale factors may be found [117] using the appropriate
overflow constraints. From these the following relationship is derived
$$\sigma_v^2 \le \|F_i^2\|_p\,\|\Phi_u\|_q \tag{2.53}$$
where $\sigma_v^2$ = output signal variance, $F_i$ = system frequency response, $\Phi_u$ = input
power spectral density, and $\|X\|_p$ = $L_p$ norm of $X$, defined by
$$\|X\|_p = \left[\frac{1}{2\pi}\int_{-\pi}^{\pi}|X(\omega)|^p\,d\omega\right]^{1/p}, \quad \text{and} \quad \frac{1}{p}+\frac{1}{q} = 1.$$
Hence the higher the order of the norm one has for the input signal, the more accurate one
is in predicting the bound of the output signal variance. In general the only norm known of
the input signal is $\|\Phi_u\|_1$, which corresponds to the variance of the signal. This results
in the relationship
$$\sigma_v^2 \le \|F^2\|_\infty\,\sigma_u^2 \tag{2.54}$$
where $\sigma_u^2$ is the input signal variance.
This then implies that in order to keep the output variance the same as the
input, so that the output signal will overflow no more often than the input, the condition
on the filter response is $\|F^2\|_\infty \le 1$, which implies
$$\|F\|_\infty \le 1 \tag{2.55}$$
As $\|F\|_\infty = \max|F(\omega)|$, the condition corresponds to a bound of 1 on the maximum
frequency response magnitude. As $\max|F(\omega)|$ occurs in the passband, and in the
TMUX filters the passband has a nominal gain of 1, the required scale factor is therefore
simply 1.
For a single stage approach or for the first stage of a tree TMUX, the input power
spectral density may be reasonably assumed to have a maximum, which means $\|\Phi_u\|_\infty$
can be defined. In this case the scale factor may be found from the condition $\|F^2\|_1 \le 1$,
which is a bound on the variance of the filter impulse response, and is a less stringent
requirement than that on $\|F\|_\infty$ before. For multistage filtering, the input to the subsequent
stages cannot be defined this way and the previous bound has to be used. Decimation of
the output causes no difference in the signal variance and the results still hold. This is
verified by simulation as discussed in later sections.
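The two scaling conditions can be compared on a concrete filter. The sketch below is an illustration under stated assumptions: real FIR coefficients, a peak magnitude found on a uniform frequency grid over $[0,\pi]$, and the $L_1$ norm of $|F|^2$ computed as the sum of squared coefficients (Parseval's relation). The function names and the 3-tap example are hypothetical.

```python
import cmath
import math

def frequency_response(h, omega):
    """F(omega) = sum_n h[n] * exp(-j * omega * n) for a causal FIR filter."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))

def linf_norm(h, points=2048):
    """Peak magnitude max|F(omega)|, searched on a uniform grid over [0, pi]."""
    return max(abs(frequency_response(h, math.pi * k / points)) for k in range(points + 1))

def l1_norm_of_f_squared(h):
    """(1/2pi) * integral of |F|^2, equal to the sum of squared coefficients."""
    return sum(c * c for c in h)

def scale_to_unit_peak(h):
    """Scale the coefficients so that the peak gain equals 1."""
    peak = linf_norm(h)
    return [c / peak for c in h]

h = scale_to_unit_peak([0.5, 1.0, 0.5])
print("peak gain:", linf_norm(h), " sum of squares:", l1_norm_of_f_squared(h))
```

For a filter with unit peak gain the sum of squared coefficients is at most 1, which is why the second condition is the less stringent of the two.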
2.3.5.2 Arithmetic Roundoff Noise 
The objective here is to obtain some relationship between the filter arithmetic 
wordlength and the resultant roundoff noise so generated. 
Rounding of the product of filter coefficient and signal sample to a wordlength of $b$
bits is similar to quantization in the ADC. Hence the noise generated by each rounding
operation has variance as given in (2.44). In practical realizations of FIR filters, especially
those using general purpose digital signal processors, rounding is performed after the summation
of the coefficient-signal products instead of at each of the individual product
terms. Hence it may be modelled as a single additive noise source at the output of the
filter. This ignores the occurrence of overflow, which would cause much larger error than
roundoff noise alone.
In the TMUX, using (2.26), the resultant roundoff noise variance is given by
$$\sigma_r^2 = \sum_{\text{all } i}\|H_i^2(\omega)\|_1\,\frac{2^{-2b_i}}{12} \tag{2.56}$$
where $H_i(\omega)$ and $b_i$ denote the frequency response from the output of the $i$th stage
to the output of the TMUX, and the wordlength (without sign bit) of that stage.
For single stage filtering, the noise source is added directly to the output signal.
Substitution of (2.45) as $\epsilon$ into (2.37) gives the expression for the reduction in $E_b/N_0$ as
(2.57)
A comparison of (2.57) with (2.49) reveals that the filter arithmetic wordlength has
to be greater than the ADC wordlength by a number of bits equal to $\tfrac{1}{2}\log_2K$. In practice
this proves to be too optimistic, as the filter arithmetic wordlength typically exceeds the
ADC wordlength by 4 to 5 bits. This fact may be attributed to overflow after summation of the
products.
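The $\tfrac{1}{2}\log_2K$ figure can be illustrated with a small calculation. The sketch assumes that the roundoff noise term is $K$ times the simplified ADC term of (2.50), which is what substituting (2.45) without the $1/K$ passband factor would give; this form and the function names are assumptions made for the example.

```python
import math

def adc_noise_term(b):
    """Noise coefficient 6 * 2^-(2b+2) from the simplified ADC expression (2.50)."""
    return 6.0 * 2.0 ** (-(2 * b + 2))

def arithmetic_noise_term(b, channels):
    """Assumed roundoff counterpart: the same term multiplied by the channel count K."""
    return 6.0 * channels * 2.0 ** (-(2 * b + 2))

def extra_arithmetic_bits(channels):
    """Additional bits needed for the two noise terms to be equal: 0.5 * log2(K)."""
    return 0.5 * math.log2(channels)

for k in (16, 32, 64):
    print(f"K = {k}: extra bits = {extra_arithmetic_bits(k):.1f}")
```

Under this assumption 16 channels call for 2 extra bits and 64 channels for 3, which is below the 4 to 5 bits observed in practice.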
Another fact to be noted is that whereas the ADC wordlength is largely unrelated to the
number of channels, the arithmetic wordlength has to be increased with a greater number of
channels to give the same BER performance.
The situation with multistage filtering requires an extension of (2.56), which for an
$L$-stage filter is
$$\sigma_r^2 = \sum_{\text{all } l}\frac{2^{-2b_l}}{12}\left\|\prod_{m=l+1}^{L-1}H_m^2(\omega)\right\|_1 \tag{2.58}$$
where $b_l$ = wordlength of stage $l$, and $H_m(\omega)$ = frequency response of stage $m$.
As before. the product term in (2.58) may be evaluated numerically given the 
specific sets of filters. Substitution of the value of (2.58) as E into (2.34) to (2.37) then 
E E 
allows the set of curves for A ;0 against ;0 and the corresponding BER characteristics 
for different values of band K to be plotted as shown in Fig.2.22 to Fig.2.24. with K-16. 
32 and 64 respectively. The word length bz was made constant for alll and hence the same 
for all filter stages. 
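The numerical evaluation of a sum of the form (2.58) can be sketched as follows. This is a minimal Python illustration under stated assumptions and not the procedure used to produce the figures: the norm of the product term is interpreted as the white-noise power gain of the cascade (the sum of its squared impulse response samples, by Parseval's relation), the stages are real FIR filters, and the sampling rate changes between stages are ignored. All function names are hypothetical.

```python
def convolve(a, b):
    """Linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def noise_power_gain(filters):
    """Power gain seen by white noise passing through a cascade of FIR filters."""
    h = [1.0]  # an empty cascade is a direct connection (gain 1)
    for taps in filters:
        h = convolve(h, taps)
    return sum(t * t for t in h)


def multistage_roundoff_variance(stage_filters, wordlengths):
    """Sum the roundoff noise of every stage, referred to the final output.

    stage_filters[l] holds the taps of stage l; wordlengths[l] is its
    arithmetic wordlength in bits (without sign bit). The noise injected at
    the output of stage l is shaped by all the stages that follow it.
    """
    total = 0.0
    for l, b in enumerate(wordlengths):
        gain = noise_power_gain(stage_filters[l + 1:])
        total += (2.0 ** (-2 * b)) / 12.0 * gain
    return total
```

With two identical stages `[0.5, 0.5]` and 8 bits in each, the first stage contributes half as much output noise as the second, because its noise is attenuated by the following filter.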
[Figure omitted: degradation (dB) against Eb/No (dB), and bit error rate against Eb/No (dB), for various arithmetic wordlengths b and the ideal case]
Fig.2.22 Degradation due to arithmetic wordlength for 16-channels
[Figure omitted: degradation (dB) against Eb/No (dB), and bit error rate against Eb/No (dB), for various arithmetic wordlengths b and the ideal case]
Fig.2.23 Degradation due to arithmetic wordlength for 32-channels
[Figure omitted: degradation (dB) against Eb/No (dB), and bit error rate against Eb/No (dB), for various arithmetic wordlengths b and the ideal case]
Fig.2.24 Degradation due to arithmetic wordlength for 64-channels
Equation (2.58) indicates that, for $H_m(\omega)$ being bandpass filters with passband gain = 1,
the summation terms with smaller values of $l$ are multiplied by more terms of $H_m(\omega)$
and are therefore attenuated more by the stopbands of these terms. The wordlength $b_l$ should
theoretically increase with $l$ if each stage of the TMUX is to contribute the same amount
of roundoff noise. The form shown in (2.58) is also similar to the cascade form realization
of FIR filters. An approach to the minimization of roundoff noise in this form of FIR filter
is given in [118], which obtains the optimum solution by means of the ordering of second-order
cascade sections. This cannot be applied to the tree TMUX here because the frequency
response of each stage is rigidly specified so as to allow decimation and generation
of both highpass and lowpass signals, and hence the reordering of the zeroes from the
cascade stages is not possible.
Consideration of a typical tree TMUX for 16 channels, for example, using (2.57) and
(2.58) shows that the difference in wordlength between the first and the last (4th) stages
is about 2 bits when the roundoff noise contributions from the stages are equal. Conversely, it
shows that increasing the wordlength of a later stage by a given amount reduces the overall
roundoff noise more than if the same increase is applied to an earlier stage. This serves as a
design guideline for the efficient allocation of processing resources to achieve better performance.
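The equal-contribution condition can be written out explicitly. If stage l contributes a noise power proportional to 2^-2b_l multiplied by the power gain G_l from its output to the TMUX output, then equal contributions require b_l = b_ref + (1/2) log2(G_l / G_ref). The Python sketch below evaluates this relation; the function name and the gain values in the usage note are hypothetical and chosen only to be easy to check, so they do not reproduce the 16-channel case quoted above.

```python
import math


def equal_noise_wordlengths(gains, last_stage_bits):
    """Wordlengths that make every stage contribute the same output noise.

    gains[l] is the noise power gain from the output of stage l to the
    final output; the last stage is taken as the reference. The result is
    left as fractional bits so that the trend is visible before rounding up.
    """
    reference = gains[-1]
    return [last_stage_bits + 0.5 * math.log2(g / reference) for g in gains]
```

For example, `equal_noise_wordlengths([0.0625, 0.125, 0.25, 1.0], 12)` returns `[10.0, 10.5, 11.0, 12.0]`: the earlier a stage, the more its noise is attenuated by the following filters and the shorter its wordlength can be.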
2.4 Computer Simulation 
Computer simulation was carried out in order to investigate the effects that the various
parameters have on the overall performance of the TMUX. The results
provide additional information on how the BER characteristics vary with
the parameters, enabling the theoretical analysis to be verified. Where the results from
simulation generally agree with those from theoretical analysis, the BER characteristics can
be established as having a reasonable degree of accuracy.
2.4.1 General Procedures 
A specification for the signals was chosen to represent one that may be used in practical
satellite communication systems for mobile applications. Hence in this case a frequency
multiplex of signals equivalent to 16kb/s QPSK with 40% cosine rolloff and 14kHz channel
bandwidth was used as the input to the TMUX. The demultiplexed signal
at the output was then demodulated by an ideal demodulator. The BER characteristics
were then evaluated using the quasi-analytical technique [111], which provides estimates of
the BER characteristics from a small number of signal samples, of the order of a few
hundred QPSK symbols in this case. This was used as a basis for comparison of
performance. The general procedure was to obtain results for the variation of one
parameter whilst keeping all the others constant, so that the degradation of performance
due to the variation of one particular parameter could be separately quantified. In most
cases, however, this was not fully possible as some arbitrary choices had to be made for
some of the other parameters. In some cases the parameters were not independent
variables and the choice of one affected that of another.
The TOPSIM software package [119] was used in the generation and BER estimation
of the QPSK signals. Routines were then written to simulate the TMUX structures, allowing
parameters to be varied.
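The general idea behind a quasi-analytical BER estimate can be sketched as follows. This is not the TOPSIM interface, which is not described here; it is a minimal Python illustration with hypothetical function names. It assumes that the noise-free, distorted decision samples have already been obtained by simulation (one value per bit, signed so that a positive value corresponds to a correct decision), and that the receiver noise is Gaussian with variance N0/2 per decision variable.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def quasi_analytical_ber(decision_samples, ebno_db, energy_per_bit=1.0):
    """Average the analytical error probability over simulated samples.

    decision_samples: noise-free values at the decision instants, one per
    bit, normalised so that an undistorted sample equals sqrt(energy_per_bit).
    """
    n0 = energy_per_bit / (10.0 ** (ebno_db / 10.0))
    sigma = math.sqrt(n0 / 2.0)
    total = sum(q_function(v / sigma) for v in decision_samples)
    return total / len(decision_samples)
```

For undistorted samples the estimate reduces to Q(sqrt(2 Eb/No)), the ideal QPSK curve. Because only the distortion is simulated and the noise is treated analytically, a few hundred symbols can be sufficient even at very low error rates.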
2.4.2 System Parameters 
For the non-transform methods of TMUX, namely the per-channel method and the
tree filter bank, the system parameters that were varied are listed as follows:
(1) A/D converter wordlength,
(2) Filter coefficient wordlength,
(3) Filter arithmetic wordlength,
(4) Stopband attenuation,
(5) Passband ripple, and
(6) Transition frequencies.
The first three are finite wordlength effects due to digital implementation of the
architecture. The others are filter specification parameters and exist regardless of the
means of implementation, whether digital or analogue.
2.4.2.1 Filter Specification 
The choice for the variations in filter specification is straightforward for the per-slot
approach. The transition bandwidth may theoretically take any value from zero to a
maximum equal to the width of the guard band between adjacent frequency slots. Any
wider bandwidth will certainly introduce aliasing components into the wanted frequency
band. Increasing the transition band beyond the guard band effectively increases the adjacent
channel interference. The degradation to the performance resulting from this depends on
the behaviour of the adjacent channels, making the results very specific to the system
under examination. It is therefore thought that the transition band should be kept to
within one guard band. Reducing the transition band will not improve the BER performance
significantly in the situation where the adjacent channels are perfectly aligned in
the frequency slots, i.e. no frequency drifts, as assumed in this study. Even for the case of
frequency drift into the guard band by adjacent channels, reducing the transition band of
the demultiplexing filter is unlikely to improve the receiver BER greatly since the out-of-band
signal is essentially attenuated by the final receive data filter.
As discussed previously, the length of an FIR filter is approximately inversely proportional
to its transition width [115], and the amount of computation varies accordingly.
Hence the straightforward choice of a value for the transition band is the maximum of
one guard band. In multistage decimation the final stage has a nominal decimation rate of 2,
as discussed previously, making it possible to use halfband filters and thus yielding further
reductions.
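The inverse proportionality between filter length and transition width can be illustrated with Kaiser's well-known empirical estimate for equiripple FIR filters. This formula is swapped in here for illustration and may differ from the relation given in [115]; the function name is hypothetical.

```python
import math


def kaiser_fir_length(delta_pass, delta_stop, transition_width):
    """Kaiser's estimate of the length of an equiripple lowpass FIR filter.

    delta_pass, delta_stop: passband and stopband deviations (linear).
    transition_width: transition bandwidth as a fraction of the sampling rate.
    """
    attenuation_db = -20.0 * math.log10(math.sqrt(delta_pass * delta_stop))
    return (attenuation_db - 13.0) / (14.6 * transition_width) + 1.0
```

With deviations of 0.01 in both bands, a transition width of 0.02 of the sampling rate gives an estimated length of about 93, and halving the transition width roughly doubles it. This is why the widest permissible transition band, one guard band, is the natural choice.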
For tree filter banks the filter for each stage is specified to satisfy the requirements
for the composite response. It is equivalent to the specification of individual filters for
multistage filtering [107]. This approach results in filter specifications that are theoretically
predicted to be overly pessimistic, because the composite stopband response is not
equiripple but rather has an upper bound equal to the stopband attenuation of one filter
stage. It is therefore possible to use simulation to investigate how much the specification
can be relaxed without greatly degrading the BER characteristics of the system.
Complex halfband filters were used throughout in the tree filter banks due to their low
multiplication rate. This means that the filter length is restricted to N = 4n + 3 (n
integer). As a result, the filter specification cannot be varied continuously. The wide
transition bands of the complex response also allow the filter lengths to be very short. Hence
the variation of the specification is constrained by the small number of available filter
lengths (7, 11, etc.) and the 'optimal' specification is usually not achieved. In practice, the
same filter specification may be used in all stages of the tree although theoretically the
optimum for each stage should be slightly different from the next. A fast method to obtain
near-optimal filters for successive decimation-by-2 stages was suggested by Goodman
[120] whereby the filter at each stage is chosen from nine specific filters. It was found that
little advantage could be gained for the complex tree filters here as their short filter
lengths and halfband responses make them very efficient already.
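The length restriction N = 4n + 3 can be seen from the structure of a halfband impulse response, in which every even-indexed tap away from the centre is zero. The Python sketch below builds a real, Hamming-windowed halfband lowpass prototype for illustration only; it is not the design method used in this work, the function name is hypothetical, and the modulation that produces the complex halfband filters is not shown.

```python
import math


def halfband_taps(n):
    """Windowed-sinc halfband lowpass filter of length N = 4n + 3."""
    length = 4 * n + 3
    half = (length - 1) // 2  # equals 2n + 1, which is always odd
    taps = []
    for k in range(-half, half + 1):
        if k == 0:
            ideal = 0.5
        elif k % 2 == 0:
            ideal = 0.0  # sin(pi*k/2) vanishes for even k
        else:
            sign = 1.0 if (abs(k) // 2) % 2 == 0 else -1.0
            ideal = sign / (math.pi * abs(k))
        window = 0.54 + 0.46 * math.cos(math.pi * k / half)  # Hamming window
        taps.append(ideal * window)
    return taps
```

For length 7 (n = 1) only five taps are non-zero, and for length 11 (n = 2) only seven. Because the outermost index 2n + 1 is odd, the end taps are non-zero; a length of the form 4n + 1 would end on zero-valued taps and so reduce to a shorter filter.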
For this system, it is found that the single stage filter for demultiplexing one channel
must have a nominal passband and stopband deviation of 0.01, or -40dB, with a transition
band equal to the guard band bandwidth. Any relaxation of the filter specification will
cause significant degradation to the BER characteristics. This is taken as the nominal
specification for the composite response of the multistage filters.
2.4.2.2 Finite Wordlength Effects 
Finite wordlengths were simulated for the ADC, filter coefficients and filter arithmetic
separately. The wordlengths were reduced successively from a value at which the
BER characteristics started to show deviation from the ideal curve. BER curves were then
obtained for each value of wordlength until the degradation became so great that the
BER curves became meaningless. Hence a set of BER curves was obtained for each finite
wordlength effect.
2.4.3 Results 
A large number of BER curves were obtained due to the number of combinations possible
with the many parameters. In addition, variations are possible with the input signal
characteristics, such as the number of channels, channel spacing, bit rate and the rolloff
factor. In this study only the number of channels was varied. The data rate was 64kbit/s
with a channel spacing of 64kHz. The BER curves for 16, 32 and 64 channels with variation
in ADC, filter coefficient and arithmetic wordlengths are shown in Fig.2.25 to Fig.2.27.
The following table shows the wordlength requirements for different numbers of channels
with an arbitrarily chosen degradation of about 0.5dB from the ideal at BER = 10^-6. It can
be seen that the required wordlengths for a particular parameter are largely similar for the
different numbers of channels K.
K              8    16    32    64   128   256
ADC            8     8     8     8     8     8
Coefficient    7     7     7     7     7     7
Arithmetic    12    12    12    13    14    14
Table 1. Tree wordlength requirements
It is seen from the BER curves (Figs. 2.26 to 2.28) that the degradation due to the
various parameters does not take place gradually. Hence it is not always possible to choose
for each parameter a curve that corresponds to the same amount of degradation from the
ideal. A set of wordlengths may be chosen arbitrarily by selecting from each graph the
curve that corresponds to a reasonable amount of degradation. Then a comparison can be
made between different architectures in terms of computational requirements. It is, however,
a comparison for a specific set of filters and is not, strictly speaking, applicable to
general cases. Nevertheless, the specific set of filters so chosen represents a reasonable set
of parameters for the TMUX and hence the results apply to many common examples of
TMUX having similar specifications. The computational and storage requirements are listed
in the following table.
K                    8     16     32     64    128    256
Storage (kB)       0.4    0.8    1.7    3.4    7.4   15.0
Multiply (x10^9)  0.02   0.04   0.10   0.23   0.52   1.18
Add (x10^9)       0.04   0.08   0.20   0.46   1.0    2.4
Table 2. Tree computation and storage
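The growth of the computation with the number of channels is easier to see when the entries of Table 2 are divided by K. The short Python sketch below does only this arithmetic, taking the table entries at face value; the variable and function names are hypothetical.

```python
# Entries copied from Table 2 (multiplications and additions, in units of 10^9).
CHANNELS = [8, 16, 32, 64, 128, 256]
MULTIPLIES = [0.02e9, 0.04e9, 0.10e9, 0.23e9, 0.52e9, 1.18e9]
ADDS = [0.04e9, 0.08e9, 0.20e9, 0.46e9, 1.0e9, 2.4e9]


def per_channel(values, channels):
    """Divide each table entry by the corresponding number of channels."""
    return [v / k for v, k in zip(values, channels)]


MULTIPLIES_PER_CHANNEL = per_channel(MULTIPLIES, CHANNELS)
ADDS_PER_CHANNEL = per_channel(ADDS, CHANNELS)
```

The multiplication rate per channel rises from 2.5 x 10^6 at K = 8 to about 4.6 x 10^6 at K = 256, so the total computation grows somewhat faster than linearly with K, while the number of additions stays close to twice the number of multiplications.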
BER curves for the per-slot approach for 8 channels using comb filters with various
decimation rates are shown in Fig.2.28. The comb filter is used as the decimator in the first
stage, followed by an FIR filter in the second. It can be seen that a decimation rate of M=4
introduces a small degradation that is similar to that of the tree filter bank with no finite
wordlength effects. This again does not agree with the theoretical condition shown in
(2.11), which implies a maximum M equal to 0.02K for $\delta_s$ = -40dB. It is shown here that
an acceptable maximum for M can be 0.5K. This is because the application of $\delta_s$ to (2.11)
produces a pessimistic upper bound for the comb filter, which does not have equiripple
stopband characteristics. The nominal filter specification for the single stage filter, as stated
in the previous section, therefore cannot be directly applied to find the maximum decimation
rate for the comb filter.
[Figure omitted: bit error rate against Eb/No (dB) for (a) 16 channels, (b) 32 channels and (c) 64 channels; curves for b = 4, 5, 6, 7, 8, infinite and ideal]
Fig.2.25 Degradation due to ADC wordlength
[Figure omitted: bit error rate against Eb/No (dB) for (a) 16 channels, (b) 32 channels and (c) 64 channels; curves for b = 5, 6, 7, 8, infinite and ideal]
Fig.2.26 Degradation due to coefficient wordlength
[Figure omitted: bit error rate against Eb/No (dB) for (a) 16 channels, (b) 32 channels and (c) 64 channels; curves for b = 8, 9, 10, 12, 16, infinite and ideal]
Fig.2.27 Degradation due to arithmetic wordlength
[Figure omitted: bit error rate against Eb/No (dB); curves for M = 8, 4, 2 and ideal]
Fig.2.28 Degradation due to comb filters for 8-channels
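The reason why an equiripple stopband specification is pessimistic for a comb filter can be illustrated numerically. An M-tap comb (moving average) filter has its nulls at the centres of the bands that alias onto the wanted band after decimation by M, so the attenuation that matters depends on the width of those bands and not on the peak sidelobe level. The Python sketch below is an illustration under these assumptions; the function names and the bandwidth values are hypothetical.

```python
import math


def comb_gain(omega, m):
    """Magnitude response of an M-tap moving-average filter, unity gain at DC."""
    s = math.sin(omega / 2.0)
    if abs(s) < 1e-12:
        return 1.0  # limit of the ratio at multiples of 2*pi
    return abs(math.sin(m * omega / 2.0) / (m * s))


def worst_alias_attenuation_db(m, half_bandwidth, points=200):
    """Smallest attenuation over the bands that alias onto the wanted band.

    The aliasing bands are centred on 2*pi*r/m (r = 1..m-1) with the given
    half bandwidth in radians per sample at the input rate (must be > 0).
    """
    worst = 0.0
    for r in range(1, m):
        centre = 2.0 * math.pi * r / m
        for i in range(points + 1):
            omega = centre - half_bandwidth + 2.0 * half_bandwidth * i / points
            worst = max(worst, comb_gain(omega, m))
    return -20.0 * math.log10(worst)
```

Narrowing the wanted band relative to the input sampling rate increases the worst-case attenuation of the aliasing bands, which is consistent with the comb filter tolerating a larger decimation rate than an equiripple bound would suggest.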
2.4.4 Inferences from Results 
Two kinds of inferences can be drawn from the results. Firstly, the simulation results
are compared with the theoretical analysis of the previous section. Secondly, the simulation
results may be considered on their own and conclusions drawn.
Comparison between theoretical and simulation results generally shows good agreement
between the two methods in estimating the degradation in BER characteristics with
various system parameters. The theoretical results do show, however, lower values of the
BER degradation, although the differences are small. This is true for all the parameters
considered, such as the stopband attenuation, the ADC wordlength and the
arithmetic wordlength. The discrepancies are of similar magnitude, generally differing in
$\Delta E_b/N_0$ by a factor between 1 and 1.5 approximately. Consideration of the theoretical
results reveals trends similar to those shown in the simulation results for the variation of
individual parameters. For example, the ADC wordlength causes large degradation in BER
when it is reduced to below 6. In addition, the amount of BER degradation due to one
parameter and that due to another, whether estimated by theoretical or simulation means,
show similar relative magnitudes. For example, the arithmetic wordlength is greater than
the ADC wordlength by about 4 bits for the same BER degradation, in both theoretical and
simulation results.
This leads to the conjecture that there are additional sources of degradation in computer
simulation not included in the theoretical analysis. It is believed that arithmetic
overflow is one such source of error for the finite wordlength analysis. In particular
this will be most prominent in the finite arithmetic wordlength case since there are more
frequent occurrences of overflow in the multistage tree structure, although the probability
of overflow is theoretically the same at each stage. Another source of error is believed to be
the nature of the signals, which are not strictly uncorrelated with the channel signal to be
demodulated. In the theoretical analysis the out-of-band signals are approximated as white
Gaussian noise. In reality, QPSK signals of the same data rate as the signal to be demodulated
are likely to cause a higher BER than white noise of the same power, due to the higher
variance of the sum of signals when the covariance between them is not zero.
Considering the simulation results on their own, it was noted that
the filter specification for the tree structure can be greatly relaxed without seriously
degrading the BER performance. This was also supported by theoretical analysis, as the
composite response of the tree structure was shown to have a very low stopband noise-to-power
ratio. It was also observed that the passband ripple had much less effect than the
stopband attenuation. It is not possible to exploit this property if halfband filters are used,
which have equal passband and stopband deviation. However, it is observed that the condition
for passband ripple need not be satisfied when realized by multistage filtering. Hence
the same filter specifications for individual filter stages may be used for different numbers
of channels, which require different numbers of stages, with little degradation of the BER
characteristics.
The final stage post-filters in the complex tree filter bank were expected to amount to
a larger proportion of the total computation required, because of their much sharper
transition band. It was found that the lengths of these post-filters could actually be reduced
by a large amount without much degradation of the BER curve. In the example the length was
reduced to 11, with a stopband attenuation of about 20dB, when the BER curve
started to depart visibly from the ideal. This is a significant result which differs from the
common notion found in the literature that the computation due to these post-filters
would amount to more than half of the total computation in a tree structure TMUX. It is
established here, however, that they could actually be greatly reduced in this system for
OBP TMUX application. This is the result of the difference between the OBP TMUX and
the TMUX considered in the literature, in that the criterion for assessing the OBP TMUX is
the BER characteristics, as opposed to an established standard for TMUX in terrestrial
networks.
2.5 Conclusions 
DSP techniques applicable to the OBP TMUX have been studied. The background
theory was developed and different possibilities explored. Some alternatives to the conventional
structures have been suggested with possible advantages in terms of computational
requirements. Simulation was carried out on the tree structure to find the wordlength
required for each of the parameters and to produce a design philosophy. The actual
computational requirement was then calculated using the results obtained to give an idea of the
hardware complexity that would be required in real terms.
Theoretical analysis was carried out for an initial estimation of the parameters. These
estimates were compared with the simulation results and some small differences were found. This was
believed to be due to the shortcomings of the theoretical models, which have not taken all
factors into account. As a tool for comparison of the effects caused by the parameters, this
theoretical approach provides a fast way of assessing the effects on the BER performance.
Greater accuracy may then be established by simulation.
It was generally found that the system design involved many parameters, making
overall optimization difficult. So far, the only criterion for optimization has been the
computation rate (multiplication and addition). The parameters that were arbitrarily chosen were
chosen in order to represent a typical satellite application. The results are therefore
particular to this set of parameters. However, they do represent a correct order of magnitude
and small changes to the system specification should not result in very different conclusions.
An approach towards the choice of parameters for the TMUX structures has been
described. This leads to some sets of parameters being chosen for the tree structure TMUX.
The order of magnitude required was found to be very demanding on the computation
requirement. On the other hand the storage requirement is trivial and easily achievable
with current memory technology.
Appendix 2A. Optimum Signal Detection by Matched Filter [80] 
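The optimum receiver filter for the detection of a completely known signal in additive
white Gaussian noise is found by the maximization of the signal to noise ratio of the
filter output signal at the optimum sampling time.
Let $y(t)$ be the filter output signal when the known signal $f(t)$, to be detected, is
input. In the presence of additive noise, the filter output becomes $y(t) + n(t)$ where $n(t)$
is the noise output component due to the additive noise at the input. The objective of the
receiver filter is then to maximize the ratio
$$\frac{y^2(t_m)}{\overline{n^2(t)}} \qquad (2A.1)$$
where $t_m$ denotes the optimum sampling time instant.
If the Fourier transforms of $f(t)$ and the receiver filter are $F(\omega)$ and $H(\omega)$
respectively, then the output $y(t_m)$ is given by
$$y(t_m) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega) F(\omega) e^{j\omega t_m} \, d\omega \qquad (2A.2)$$
For white Gaussian noise of power spectral density $N_0/2$, $\overline{n^2(t)}$ is the noise power at
the output of the filter, given by
$$\overline{n^2(t)} = \frac{N_0}{4\pi} \int_{-\infty}^{\infty} |H(\omega)|^2 \, d\omega \qquad (2A.3)$$
Substituting (2A.3) and (2A.2) into (2A.1) gives
$$\frac{y^2(t_m)}{\overline{n^2(t)}} = \frac{\left| \int_{-\infty}^{\infty} H(\omega) F(\omega) e^{j\omega t_m} \, d\omega \right|^2}{\pi N_0 \int_{-\infty}^{\infty} |H(\omega)|^2 \, d\omega} \qquad (2A.4)$$
Using the Schwarz inequality
$$\left| \int_{-\infty}^{\infty} f_1(x) f_2(x) \, dx \right|^2 \le \int_{-\infty}^{\infty} |f_1(x)|^2 \, dx \int_{-\infty}^{\infty} |f_2(x)|^2 \, dx \qquad (2A.5)$$
The equality is satisfied by the condition
$$f_1(x) = k f_2^*(x)$$
where k is an arbitrary constant.
Hence the maximization of (2A.1) is achieved by the condition
$$H(\omega) = k F^*(\omega) e^{-j\omega t_m} \qquad (2A.6)$$
which gives the optimum matched filter condition for the receiver filter. The maximum
output signal to noise ratio under this condition is obtained by substituting the
equality of (2A.5) into (2A.4), which gives
$$\left. \frac{y^2(t_m)}{\overline{n^2(t)}} \right|_{\max} = \frac{1}{\pi N_0} \int_{-\infty}^{\infty} |F(\omega)|^2 \, d\omega = \frac{2E_b}{N_0} \qquad (2A.7)$$
where $E_b = \frac{1}{2\pi} \int_{-\infty}^{\infty} |F(\omega)|^2 \, d\omega$, which is the energy of $f(t)$ with respect to 1 ohm.
The signal output $y(t_m)$ is given by substituting (2A.6) into (2A.2) to give
$$y(t_m) = \frac{k}{2\pi} \int_{-\infty}^{\infty} |F(\omega)|^2 \, d\omega = k E_b \qquad (2A.8)$$
and the noise power $\overline{n^2(t)}$ is given by
$$\overline{n^2(t)} = \frac{k^2 N_0}{4\pi} \int_{-\infty}^{\infty} |F(\omega)|^2 \, d\omega = \frac{k^2 N_0 E_b}{2} \qquad (2A.9)$$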
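The matched filter result can be checked with a discrete-time analogue. The Python sketch below is an illustration only and makes the following assumptions: the pulse and the filter are finite real sequences of equal length, the noise is white with variance N0/2 per sample, and the output is sampled when the pulse has fully entered the filter. The function name and the numerical values in the usage note are hypothetical.

```python
def output_snr(pulse, filt, n0):
    """Ratio of squared signal output to noise power at the sampling instant.

    The output sample used is the convolution of pulse and filt at index
    len(pulse) - 1; filt must have the same length as pulse.
    """
    n = len(pulse)
    peak = sum(filt[k] * pulse[n - 1 - k] for k in range(n))
    noise_power = (n0 / 2.0) * sum(c * c for c in filt)
    return peak * peak / noise_power
```

For the pulse `[1.0, 2.0, 0.5, -1.0]` (energy 6.25) and N0 = 0.5, any scaled time-reversed copy of the pulse used as the filter gives a ratio of 2E/N0 = 25, independent of the scaling constant, whereas a filter with four equal taps gives only 6.25, in agreement with the Schwarz inequality.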
3 
Flexible Transmultiplexer 
by DFT Convolution Methods
3.1 Introduction 
3.1.1 Problem Description 
As the number of users in an SCPC/FDM type environment increases, it is envisaged
that a need will arise for the system to provide services of different bandwidths so as to
allow for the different traffic requirements of individual users. In some cases the situation is
expected to be a combination of TDMA and SCPC signals, in which case the TDMA
bandwidth will typically exceed those of the SCPC ones [121]. In other cases different
types of users, for example mobile and business earth stations, can be supported economically
using SCPC signals of different data rates and hence bandwidths [36]. Furthermore,
a TMUX which can accommodate different bandwidths and frequency allocations will be
able to handle different requirements as the future system configuration changes. This will
give much greater flexibility to a system with OBP. In systems where on-board demodulation
and remodulation is not required, the flexible TMUX may be used in the routing of
FDM signals between multiple spotbeam antennas, allowing greater ease of change in
frequency plans than equivalent proposals of i.f. or r.f. switching [25,53].
A recent publication [122] provides a solution to the problem with more emphasis on
hardware implementation than the optimization of the processing algorithms. This chapter
is concerned with a particular design methodology and its optimization with respect to
computation requirement.
3.1.2 Review of Methods 
The problem of TMUXs accommodating variable bandwidths is related to non-uniform
filter banks, which have been studied mostly in application to spectral analysis. Due
to the nature of their application, most of these studies on non-uniform filter banks cannot
be applied to the TMUX considered here. The main differences lie in the requirement of
spectral analysers to obtain an accurate estimate of the magnitude or power spectrum of
the signals. The output from the spectral analyser represents the spectrum for a time
duration which is the observation period of the spectral analyser. In the TMUX, the output
signals are required to satisfy more stringent conditions. The sampling rate is one such
condition, in that it is determined by the amount of frequency domain aliasing allowable
into that channel to preserve the integrity of its signal. For the same reason the system
response of the TMUX is required to be linear phase to satisfy necessary filter
specifications such as stopband attenuation.
In some cases it is possible to modify the algorithms used in spectral analysis to
serve in the TMUX. A number of cases have been found as possible solutions, but generally
the literature on non-uniform filter banks, even those for spectral analysis, is scarce.
Some non-uniform filter banks for speech analysis have been reported [123]. The emphasis
in the latter was the 'constant Q' response in which the channel bandwidth is directly
proportional to the centre frequency of the channel. Similarly filter banks with octave-band
channels have been studied [99,106]. Together they represent a very particular form of
non-uniform filter bank, which is generally unacceptable for the TMUX because of the
strict relationship between the channel frequency and its bandwidth, whilst more general
channel specifications would result in rather large increases in computational requirement
[124]. A special case of polyphase filter bank [125] using window-designed FIR filters
[126] as polyphase branches requires less computation, but this is still very high compared to
uniform polyphase-FFT filter banks because the operation rates are determined by the
sampling rate of the channel with the widest bandwidth, and because the window-designed
filter branches are sub-optimal. Frequency domain interpolation of FFT output coefficients
offers a very efficient solution [127] due to the high efficiency of available techniques [128],
but suffers from the poor transition band characteristics of the DFT channel responses. An
alternative approach for multiresolution spectral analysis was suggested using FFT-pruning
[129] which, although not applicable on its own due to the poor transition band
characteristics as before, may be applied to the method described in this chapter to reduce
computation when some channels are not actually carrying signals.
Another more general type of non-uniform filter bank can be categorized as the
synthesis/analysis method. These have structures that can be described by a general theory
of filter banks [130] as having an analysis section which has the same structure as uniform
filter banks, followed by a synthesis section which recombines the separate channel outputs
from the analyser to form wider bandwidth channels. The rich source of literature
on this type of filter bank is concerned with perfect reconstruction of the original signal
after it has been analysed into many subbands with decimation at the analyser output
[131,132]. The main concern has been the solution to the perfect reconstruction problem
of the signal such that the analysis/synthesis filter bank behaves as a pure delay without
any magnitude or phase distortion. To this end the subject has been well understood and
different solutions are continually being found. As for the flexible TMUX, it is not
required to recombine all the channels but only the number of channels that together
constitute a channel of wider bandwidth. This is the partial reconstruction of the input signal
and clearly this cannot be perfect as it would imply an ideal bandpass filter with zero
transition band. The problem of partial reconstruction has not received as much attention
and this approach is the subject of detailed discussion in the next chapter.
An earlier approach aimed at variable bandwidth spectral analysis uses a variation of
the well known fast convolution procedure [133], which was also described in the slightly
different context of narrow band high resolution spectral analysis in a different paper
[134] and improved upon more recently for greater computational efficiency [135]. It was
also noted specifically as a frequency domain implementation for transmultiplexers but
only considered for conventional uniform bandwidth application [136]. It was again
considered in the more recent paper which addressed the issues of the flexible TMUX for OBP
satellites [122]. Emphasis was given to the hardware implementation aspects but details
were scant on the consideration of the method itself. A qualitative outline of possible
approaches was given in a review [137] which chose the analysis/synthesis filter bank as
the best solution in terms of the tradeoff between flexibility and computational efficiency.
It was compared with the tree type structure and the per-channel approach, but quantitative
measures regarding the comparison of these structures and their tradeoff were lacking.
More detailed study is therefore needed to assess the suitability of these analysis/synthesis
techniques for the flexible bandwidth TMUX in OBP applications.
In this chapter, designs based on the DFT convolution method are described, their
merits in terms of implementation complexity and performance are assessed, and techniques
for simplifying the methods and optimizing performance are discussed.
3.2 Non-uniform TMUX by DFT Convolution
This section describes the theoretical background of a non-uniform TMUX using an
approach similar to fast convolution by discrete Fourier transform (DFT) or equivalent
fast Fourier transform (FFT) algorithms [133,136]. The method discussed here concerns
the necessary changes for adaptation to the OBP TMUX requirement, in particular the
sampling rate consideration, and also serves as a unified treatment of the various published
works in different areas which are related to this same theoretical basis.
3.2.1 Description of Method 
In conventional fast convolution using the overlap-save or overlap-add algorithms
the convolution is performed in the Fourier domain such that efficiency is achieved primarily
by the use of fast algorithms for the FFT [138]. The difference here occurs in the
recombination of the frequency domain samples and the subsequent inverse transform
back to the time domain. The problems to be investigated therefore are the effects of
decimation on the block processing input/output relationship, and possibilities of reduction in
computation.
Starting with the expression for the down-conversion and lowpass filtering of a single
channel signal $x(n)$ at carrier frequency $\omega_{k_0}$,
$$y(m) = \sum_{n=-\infty}^{\infty} h(n)\, x(m-n)\, e^{-j\omega_{k_0}(m-n)} \tag{3.1}$$
where $h(n)$ is the lowpass filter impulse response.
Performing the convolution in the frequency domain using an $N$-point DFT can be
described by
(3.2)
$i = 0,1,\ldots,N-1$, $m = 0,1,\ldots,N/M-1$,
where
$$H(k) = \sum_{n=0}^{N-1} h(n)\, W_N^{nk}, \qquad X(m,k) = \sum_{n=0}^{N-1} x(mM+n)\, W_N^{nk},$$
with $M$ = decimation rate and $W_N = e^{-j2\pi/N}$.
$X(m,k)$ is the short time Fourier transform of $x(n)$ at time instant $m$.
$X(m,k-k_0)$ represents a circular shift by $k_0$ samples in the $N$ frequency domain samples
of $x(n)$, with $\omega_{k_0} = 2\pi k_0/N$.
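The circular shift of the frequency domain samples can be checked numerically. The following is a minimal sketch in Python, assuming the convention $W_N = e^{-j2\pi/N}$ with the forward DFT written as a sum of $x(n)W_N^{nk}$; the helper names and the test values are illustrative only. Under that assumption, shifting the DFT by $k_0$ bins gives the same samples as transforming the block after multiplication by $W_N^{-k_0 n}$.

```python
import cmath


def twiddle(n_pts, exponent):
    """Return W_N**exponent with W_N = exp(-j*2*pi/N); exponent is reduced mod N."""
    return cmath.exp(-2j * cmath.pi * (exponent % n_pts) / n_pts)


def dft(x):
    """N-point DFT: X(k) = sum_n x(n) * W_N**(n*k)."""
    n_pts = len(x)
    return [sum(x[n] * twiddle(n_pts, n * k) for n in range(n_pts))
            for k in range(n_pts)]


def circular_shift(big_x, k0):
    """Return the sequence X(k - k0), indices taken modulo N."""
    n_pts = len(big_x)
    return [big_x[(k - k0) % n_pts] for k in range(n_pts)]


def modulate(x, k0):
    """Multiply x(n) by W_N**(-k0*n), i.e. by exp(+j*2*pi*k0*n/N)."""
    n_pts = len(x)
    return [x[n] * twiddle(n_pts, -k0 * n) for n in range(n_pts)]


x = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.5]   # arbitrary test block, N = 8
k0 = 3                                            # arbitrary channel offset in bins
shifted = circular_shift(dft(x), k0)
via_modulation = dft(modulate(x, k0))
max_error = max(abs(a - b) for a, b in zip(shifted, via_modulation))
print(max_error < 1e-9)
```

The direct $O(N^2)$ transform is used only to keep the sketch self-contained; a practical implementation would use an FFT.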
The time reference of the DFT in (3.2) is the first sample of every input block
{ x(mM+n) }. In the conventional overlap-add or overlap-save method of convolution this
does not cause any problems, because the time reference is automatically compensated for
at the inverse transform. With frequency domain shifting and decimation this is no longer
true and some appropriate correction is needed.
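Let the decimated output sequence be $y'(m)$; then $y'(m) = y(mM)$, and therefore
$$y'(m) = \frac{1}{N}\sum_{k=0}^{N-1} Y(k)\, W_N^{-kmM}, \qquad m = 0,1,\ldots,\frac{N}{M}-1.$$
Using the change of variable $k = l + sN/M$, $l = 0,1,\ldots,N/M-1$, $s = 0,1,\ldots,M-1$, leads
to
$$y'(m) = \frac{1}{N}\sum_{s=0}^{M-1}\sum_{l=0}^{N/M-1} Y\!\left(l+s\frac{N}{M}\right) W_N^{-(l+sN/M)mM}
= \frac{1}{N}\sum_{l=0}^{N/M-1}\left[\sum_{s=0}^{M-1} Y\!\left(l+s\frac{N}{M}\right)\right] W_{N/M}^{-lm} \tag{3.3}$$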
This shows that the decimated sequence may be obtained by a $Z$-point DFT operation,
with $Z = N/M$, on the frequency domain aliased sequence $Y(l+sZ)$. In the conventional
method such as the overlap-save [107], the requirement for elimination of time domain
aliasing is that
$$N_D \ge N_x + N_h - 1$$
where $N_x$, $N_h$ and $N_D$ are the lengths of the input data block, the filter impulse
response and the DFT size respectively.
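The decimation relationship of (3.3) can be verified with a short numerical experiment. The sketch below is an illustration under stated assumptions rather than the implementation used in this work: it assumes the inverse transform carries the factor $1/N$ and a positive exponent, and the function names and test spectrum are hypothetical. It sums the $M$ spectral segments of length $Z = N/M$ and applies a $Z$-point inverse DFT, then compares the result with every $M$-th sample of the full $N$-point inverse DFT.

```python
import cmath


def idft(big_y):
    """N-point inverse DFT: y(n) = (1/N) * sum_k Y(k) * exp(+j*2*pi*k*n/N)."""
    n_pts = len(big_y)
    return [sum(big_y[k] * cmath.exp(2j * cmath.pi * ((k * n) % n_pts) / n_pts)
                for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def decimate_in_frequency(big_y, m_dec):
    """Return y(m*M), m = 0..N/M-1, from a Z-point IDFT of the aliased spectrum."""
    n_pts = len(big_y)
    if n_pts % m_dec:
        raise ValueError("the decimation rate M must divide the DFT size N")
    z = n_pts // m_dec
    aliased = [sum(big_y[l + s * z] for s in range(m_dec)) for l in range(z)]
    # idft() divides by Z whereas (3.3) divides by N, hence the extra factor 1/M.
    return [value / m_dec for value in idft(aliased)]


big_y = [complex(k % 5, (3 * k) % 7) for k in range(16)]   # arbitrary spectrum, N = 16
full = idft(big_y)
decimated = decimate_in_frequency(big_y, 4)                # M = 4, so Z = 4
max_error = max(abs(full[4 * m] - decimated[m]) for m in range(4))
print(max_error < 1e-9)
```

The saving is that only a $Z$-point inverse transform is computed per channel instead of an $N$-point one.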
[Figure: filter response h(n) of length N_h, input block x(n) of length N_x, and transformed blocks X_m(k), X_{m+1}(k) of length N_D]
Fig.3.1 Input sectioning for overlap-add convolution
This implies the sectioning of input data samples as shown in Fig.3.1. Considering
this with (3.2) reveals that this kind of sectioning, combined with the frequency shifting
in $X(k)$, causes a relative time offset which varies from block to block. This time offset
may be interpreted as an equivalent phase offset in the frequency shifting operation.
Recalling that by definition,
$$X(m,k-k_0) = \sum_{n=0}^{N-1} x(m+n)\, W_N^{(k-k_0)n}$$
This definition assumes that the phase of $W_N^{-k_0 n}$ is referred to the start of the data
blocks. A generalization is necessary to make all transformed blocks have a common
phase reference. Rewriting the previous expression as
$$X(m,k-k_0) = \sum_{n=0}^{N-1} x(m+n)\, W_N^{-k_0(m+n)}\, W_N^{kn}
= W_N^{-k_0 m} \sum_{n=0}^{N-1} x(m+n)\, W_N^{(k-k_0)n}$$
With sectioning, $m = iN_x$, $i$ = integer, then
$$X(i,k-k_0) = W_N^{-k_0 i N_x} \sum_{n=0}^{N-1} x(m+n)\, W_N^{(k-k_0)n} \tag{3.4}$$
The first term in (3.4) is a phase term by which the DFT output samples are multiplied.
It varies with both the block number $i$ and the channel location $k_0$. For multiple
channels it is different for each channel and requires more multiplications. A more
efficient solution arises if, instead of the previous procedure, a complex bandpass filtering
operation is used. (3.2) can be written as
(3.5)
Since $H(k+k_0) = \sum_{n=0}^{N-1} h(n)\, W_N^{k_0 n}\, W_N^{kn}$ is independent of the block number $i$, the
phase correction term is not needed. This is not strictly equivalent to the operation
expressed by (3.1), but with maximal decimation of the output the two forms become
equivalent.
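The block-dependent phase term of (3.4) can be illustrated numerically. The sketch below assumes $W_N = e^{-j2\pi/N}$ and a mixing sequence $W_N^{-k_0(m+n)}$ referred to absolute time; the sign convention, the helper names and the test signal are assumptions made for illustration. It shows that mixing with a common phase reference is the same as circularly shifting the block DFT and multiplying by a phase factor that depends on where the block starts.

```python
import cmath


def twiddle(n_pts, exponent):
    """Return W_N**exponent with W_N = exp(-j*2*pi/N); exponent is reduced mod N."""
    return cmath.exp(-2j * cmath.pi * (exponent % n_pts) / n_pts)


def dft(x):
    """N-point DFT: X(k) = sum_n x(n) * W_N**(n*k)."""
    n_pts = len(x)
    return [sum(x[n] * twiddle(n_pts, n * k) for n in range(n_pts))
            for k in range(n_pts)]


def common_reference_dft(x, start, n_pts, k0):
    """DFT of the block starting at `start`, mixed with W_N**(-k0*(start+n))."""
    return [sum(x[start + n] * twiddle(n_pts, -k0 * (start + n)) * twiddle(n_pts, k * n)
                for n in range(n_pts))
            for k in range(n_pts)]


def phase_corrected_dft(x, start, n_pts, k0):
    """Block DFT shifted by k0 bins, times the phase term W_N**(-k0*start)."""
    big_x = dft(x[start:start + n_pts])
    phase = twiddle(n_pts, -k0 * start)
    return [phase * big_x[(k - k0) % n_pts] for k in range(n_pts)]


x = [((7 * n) % 11) - 5.0 for n in range(40)]   # arbitrary real test signal
n_pts, k0 = 8, 3                                # arbitrary DFT size and channel bin
errors = []
for start in (0, 5, 13):                        # arbitrary block start positions
    a = common_reference_dft(x, start, n_pts, k0)
    b = phase_corrected_dft(x, start, n_pts, k0)
    errors.append(max(abs(u - v) for u, v in zip(a, b)))
print(max(errors) < 1e-9)
# The phase term differs between blocks unless k0*start is a multiple of N.
print(abs(twiddle(n_pts, -k0 * 8) - 1) < 1e-12, abs(twiddle(n_pts, -k0 * 5) - 1) > 0.1)
```

Moving the frequency shift onto the filter samples, as in the bandpass form, removes this per-block factor because the filter has a fixed time reference.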
Introducing decimation of the output using (3.3) leads to the expression
(3.6)
The output overlapping as performed in the conventional overlap-add technique
remains similar with decimation, as shown in Fig.3.2, with the overlapping determined
by
$$m = i\,(N_D - N_h + 1) \tag{3.7}$$
[Figure: output blocks y_m(n) and y_{m+1}(n) of length N_D added sample-by-sample, shown without decimation and with decimation (M = 3)]
Fig.3.2 Output overlapping under decimation
From the same figure it can be seen that an additional requirement is
$$\frac{N_h - 1}{M} = Z = \text{integer} \tag{3.8}$$
This restriction may be bypassed in two ways. Firstly, it is noticed that the decimation
of the output sequence means that not every sample in the overlapped region of
$N_h - 1$ samples is required, but only $(N_h - 1)/M$ of them are useful. Recalling that output
overlap-adding is a result of the sectioning in the input sequence, it is then possible to
section the input sequence in such a way that the output samples in the overlap region of
consecutive sections occur at the same time instants. Instead of (3.8) the constraint is
then
$$m = i\left[N_D - \mathrm{INT}\!\left(\frac{N_h - 1}{M}\right) M\right] \tag{3.9}$$
where INT(x) denotes the integer part of x.
The relation is shown graphically in Fig.3.3. The significance of this new technique is
that if $M$ is of a high order, such as that encountered in narrow-band filtering, there is a
considerable increase in efficiency as $\mathrm{INT}[(N_h - 1)/M]\,M$ becomes significantly smaller
than $N_h$. It has one serious difficulty, however, when a multi-bandwidth filter bank is to
be implemented. The variation of bandwidths in this case leads to different decimation
rates. The previous method of input sectioning cannot be applied because in general (3.9)
yields different values of $m$ for different $M$. There are special cases when the decimation
rates are integer multiples of each other, or for carefully chosen sets of parameters for
(3.9). Indeed, when the channels concerned consist of bandwidths which are integer
multiples of a basic quantity, it is straightforward to choose a transform size $N_D$ to cater
for the appropriate decimation rates and $N_h$. It should be noted also that whilst in
general the filter length $N_h$ is not the same for different bandwidths and filter specifications,
the filter lengths depend largely upon the transition bandwidths [115], which would be
constant in the variable bandwidth TMUX considered here.
[Figure: output blocks y_m(n) and y_{m+1}(n) of length N_D; the N_h - 1 overlapped samples and the extra overlapped samples are added sample-by-sample]
Fig.3.3 Extra overlapping for time-alignment of output samples
A second method of eliminating the constraint of (3.8) comes from the time shifting
property of the inverse DFT (IDFT) operation. The application of the standard IDFT
results in zero time and frequency references. If instead an arbitrary time reference is used,
such that the transformed sequence in the time domain may have an arbitrary time origin,
then the decimated samples may be time-offset to coincide with the samples in the
previous block.
Let the time offset be $n_0$; then the required output signal $y'$ is
(3.10)
[Figure: output blocks y_m(n) and y_{m+1}(n) of length N_D with N_h - 1 overlapped samples; the second block is offset by n_0 before the sample-by-sample addition]
Fig.3.4 Output overlapping with time-offset by n_0
Hence by premultiplying the samples in the frequency domain by the factor $W_N^{k n_0}$,
any time offset can be introduced. This procedure is illustrated in Fig.3.4. The range of $n_0$
in the case with decimation is between 0 and $M-1$. Therefore this offers a completely
general sectioning scheme that will accommodate a variable bandwidth TMUX with different
decimation rates. The extra multiplication of the input samples in the frequency domain
by this factor is easily avoided by premultiplying it into the filter frequency domain
samples $H(k)$. Clearly it is not a constant factor, since $n_0$ varies from frame to frame.
Each value of $n_0$ is then associated with a different set of frequency domain samples. The
value of $n_0$ is calculated for each data block according to the following relationship, which
can be easily verified.
(3.11)
where $n_0(i)$ is the value of $n_0$ at data block $i$.
Since the second term in the expression is constant, the value of $n_0$ for the next data
block is easily calculated from the previous one using lookup tables, and the associated
frequency samples are multiplied by the input samples.
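The time shifting property used here can be demonstrated with a few lines of Python. This is a minimal sketch under an assumed sign convention: with $W_N = e^{-j2\pi/N}$, multiplying the frequency samples by $W_N^{k n_0}$ delays the block circularly by $n_0$ samples (the opposite sign would advance it). The function names and the test block are illustrative.

```python
import cmath


def twiddle(n_pts, exponent):
    """Return W_N**exponent with W_N = exp(-j*2*pi/N); exponent is reduced mod N."""
    return cmath.exp(-2j * cmath.pi * (exponent % n_pts) / n_pts)


def dft(x):
    """N-point DFT: X(k) = sum_n x(n) * W_N**(n*k)."""
    n_pts = len(x)
    return [sum(x[n] * twiddle(n_pts, n * k) for n in range(n_pts))
            for k in range(n_pts)]


def idft(big_x):
    """N-point inverse DFT: x(n) = (1/N) * sum_k X(k) * W_N**(-n*k)."""
    n_pts = len(big_x)
    return [sum(big_x[k] * twiddle(n_pts, -n * k) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def apply_time_offset(big_x, n0):
    """Multiply X(k) by the linear phase W_N**(k*n0) (assumed sign: a delay by n0)."""
    n_pts = len(big_x)
    return [big_x[k] * twiddle(n_pts, k * n0) for k in range(n_pts)]


x = [float(n) for n in range(8)]                 # arbitrary test block
n0 = 3                                           # arbitrary offset, 0 <= n0 <= N-1
y = idft(apply_time_offset(dft(x), n0))
max_error = max(abs(y[n] - x[(n - n0) % 8]) for n in range(8))
print(max_error < 1e-9)
```

Because the factor multiplies each bin independently, it can be merged into a stored copy of $H(k)$ for each value of $n_0$, at the cost of one table of filter samples per offset.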
Finally, it is seen that the previous procedure becomes trivial for small values of $M$.
When $M$ is small, there is very little difference even if the overlap is increased from the
minimum of $N_h - 1$ to $\lceil (N_h - 1)/M \rceil M$, where $\lceil x \rceil$ denotes the smallest
integer not less than $x$. The computational efficiency is not greatly reduced as a
result. It may then be more practical to satisfy (3.11) for reasons of simplicity in
implementation. On the other hand, if $M$ is of a high order, it is worthwhile to implement the
time-offsetting procedure for greater efficiency.
For example, with a 256-point DFT, a filter of length 130 and $M = 16$, the minimum
overlap is 129 samples. Increasing it for the above reason to 144 reduces the efficiency by
about 12%, because the extra overlap produces redundant output samples. If the decimation
rate is now 32, the overlap required is 160, which reduces the efficiency by 24%, and the
time-shifting scheme may be advantageous. With multiple decimation rates as in the
variable bandwidth TMUX, the DFT size needs to be a highly composite number so that it is
an integer multiple of the decimation rates. This is not a difficult requirement but would
require the implementation of mixed-radix DFT algorithms [139] instead of the popular
radix-2 FFT.
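The figures quoted in this example can be reproduced with a short calculation. The sketch below assumes that efficiency is measured by the number of new (non-overlapped) samples processed per block, which is an interpretation that matches the quoted percentages; the function names are illustrative.

```python
def padded_overlap(n_h, m_dec):
    """Smallest multiple of the decimation rate M that is not less than N_h - 1."""
    return -(-(n_h - 1) // m_dec) * m_dec        # integer ceiling division


def efficiency_loss(n_d, n_h, m_dec):
    """Fractional loss of new samples per block when the overlap is padded."""
    new_samples_minimum = n_d - (n_h - 1)
    new_samples_padded = n_d - padded_overlap(n_h, m_dec)
    return 1.0 - new_samples_padded / new_samples_minimum


for m_dec in (16, 32):
    overlap = padded_overlap(130, m_dec)
    loss = efficiency_loss(256, 130, m_dec)
    print(m_dec, overlap, round(100 * loss))
# M = 16: overlap 144, loss about 12 %
# M = 32: overlap 160, loss about 24 %
```

With the minimum overlap of 129 samples each 256-point block advances by 127 new samples; padding to 144 or 160 reduces this to 112 or 96 respectively.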
3.2.2 Reduction of Processing Complexity 
Some techniques may be used to exploit special properties of the input signal or the
filters, and simplified processing structures are then obtained. In this section the general
methodology is applied to more specific circumstances in order to describe the procedures
for reducing complexity and to assess the subsequent degradation of the system
characteristics. This requires some means of quantifying the degradation to provide criteria
for the tradeoffs between processing complexity and performance.
3.2.2.1 Reduced Frequency Windowing 
Analogous to time domain windowing that is applied to the filter impulse response, a
similar windowing operation may be used in the frequency domain to limit the number of
frequency domain samples of $H(k)$. This is made possible by the fact that the frequency
domain samples over the stopband region are no greater in magnitude than the level set by
the stopband attenuation, which is normally quite small. Hence a rectangular window that
suppresses the stopband samples introduces relatively small error into the overall frequency
response of the filter. In practice, quantization of the stopband samples to finite wordlengths
in signal processing hardware very often leads to the same result when the wordlength is
small compared to the stopband attenuation.
Since for a narrowband filter the stopband region is a large fraction of the sampling
frequency, such windowing leads to a large reduction in the number of multiplications of
frequency samples. The frequency domain window effectively sets all stopband samples of
$H(k)$ to zero, such that
(3.12)
where $\omega_s$ is the stopband band edge frequency.
Given a set of such samples corresponding to an FIR filter of length $N_h$, setting the
stopband frequency samples to zero will alter the time response, the frequency response, or
both, depending on the characteristics of the original impulse response, the location of the
stopband zeros and the DFT size. It is therefore possible to devise an optimization strategy
for the choice of these parameters so that the 'best' solution is found. This procedure is
described in section 3.3.
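The effect of forcing stopband samples to zero can be explored with a small experiment. The sketch below is illustrative only: it uses a Hamming-windowed sinc lowpass filter as a stand-in design (not the filters designed in this work), a direct $O(N^2)$ DFT, and hypothetical parameter values ($N_h = 59$, $N_D = 128$, bins within 13 of DC retained). It zeroes the remaining bins and reports how far the resulting $N_D$-point impulse response departs from the original.

```python
import cmath
import math


def dft(x, sign=-1):
    """Direct N-point DFT (sign=-1) or unscaled inverse DFT (sign=+1)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * ((n * k) % n_pts) / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]


def lowpass(n_h, cutoff):
    """Hamming-windowed sinc lowpass; cutoff in cycles/sample (stand-in design)."""
    mid = (n_h - 1) / 2
    taps = []
    for n in range(n_h):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        taps.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_h - 1))))
    return taps


def zero_stopband(h, n_d, n_keep):
    """Zero-pad h to n_d points, keep DFT bins within n_keep of DC, zero the rest."""
    big_h = dft(list(h) + [0.0] * (n_d - len(h)))
    kept = [big_h[k] if min(k, n_d - k) <= n_keep else 0.0 for k in range(n_d)]
    # The retained set is symmetric about DC, so the inverse transform is real.
    return [value.real / n_d for value in dft(kept, sign=+1)]


n_h, n_d, n_keep = 59, 128, 13                 # hypothetical example values
h = lowpass(n_h, 0.06)
h_new = zero_stopband(h, n_d, n_keep)
peak = max(abs(v) for v in h_new)
tail = max(abs(v) for v in h_new[n_h:])        # samples beyond the original length
change = max(abs(h_new[n] - h[n]) for n in range(n_h))
print(tail / peak, change / peak)
```

For a design of this kind the tail and the change in the first $N_h$ samples are both small compared with the peak, since the discarded samples are all at the stopband level.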
A natural relationship between the number of stopband and passband samples and
the decimation rate is that, with ideal filter response and maximal decimation, the
decimation rate is equal to the ratio of the DFT size to the number of passband samples.
Since the decimation rate also equals the ratio between the DFT and IDFT sizes, as shown
in (3.3), a simplified structure results in which the IDFT operates on the samples within
the passband region only, as shown in Fig.3.5. The reduction in computation for each
channel is therefore by a factor equal to the decimation rate. For the complete TMUX, the
reduction makes the total number of frequency domain multiplications equal to the DFT
size, if the sampling frequency is such that the whole DFT domain is occupied by the
channels.
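As a small worked example of this bookkeeping, the sketch below computes the IDFT size of each channel from its decimation rate and totals the frequency domain multiplications. The frequency plan (a 64-point DFT with decimation rates 8, 4, 8 and 2) is hypothetical and chosen so that the channels fill the whole DFT domain.

```python
def idft_sizes(n_d, decimation_rates):
    """Return the IDFT size L_i = N_D / M_i for each channel."""
    sizes = []
    for m_dec in decimation_rates:
        if n_d % m_dec:
            raise ValueError("each decimation rate must divide the DFT size")
        sizes.append(n_d // m_dec)
    return sizes


n_d = 64                          # hypothetical DFT size
rates = [8, 4, 8, 2]              # hypothetical per-channel decimation rates
sizes = idft_sizes(n_d, rates)    # [8, 16, 8, 32]
total_multiplications = sum(sizes)
print(sizes, total_multiplications)   # the total equals the DFT size, 64
```

Each channel multiplies only its own passband samples, so the per-channel count is $N_D/M_i$ rather than $N_D$.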
[Figure: input shift register feeding an N_D-point DFT; the DFT outputs are multiplied by the filter samples H_0 to H_{N_D-1} and grouped into L_1-, L_2-, ..., L_K-point IDFTs, each followed by a shift register giving the Channel 1, Channel 2, ..., Channel K outputs]
Fig.3.5 DFT convolution filter bank with zero-valued stopband samples
Concerning the error introduced by the suppression of stopband samples to zero, a
number of methods are available to analyse its magnitude. It has been suggested [136] that
the filter may be designed to contain preassigned stopband zeros so that (3.12) is exactly
satisfied, and an optimization procedure can be set up to maximize the stopband
attenuation. This approach does not take into account the fact that an FIR filter of length N
has at most N+1 zeros on the unit circle in the z-domain for N odd, and N zeros when N is
even [140]. As a result, even with all zeros preassigned to the stopband, which is the case for
filters with monotonically-flat passband, the maximum number of zeros in the stopband is
N+1 or N for odd and even N respectively. This approach places constraints on the
stopband bandwidth or the DFT size, which then limits the number of DFT samples in the
stopband region. It becomes very inefficient if narrowband channels are used. The same
reason also makes it inappropriate for the variable bandwidth TMUX, because of the
widely varying filter lengths for filters of various bandwidths. These drawbacks were not
noted in the literature [136] because the TMUXs considered there consisted of small
numbers of channels (< 4) and used uniform bandwidth channels.
3.2.2.2 Analysis of Degradation due to Forced Stopband Zeros 
In general, alteration of the frequency domain samples of a filter changes the length
of its impulse response, and when stopband samples are forced to become zero the general
consequence is an increase in the filter length due to the overall increase in stopband
attenuation. A method is needed to quantify the effects and assess the form of degradation
to the TMUX. Effects of assigning stopband zeros to a lowpass filter are first discussed.
The $N_D$ frequency domain samples form a DFT transform pair with the
$N_h$ point impulse response augmented with $N_D - N_h$ zeros. Setting stopband samples to
zero is equivalent to satisfying (3.12) with $\omega = 2\pi k/N_D$, $k$ = integer. The equivalent
impulse response is obtained by the $N_D$ point IDFT of the new DFT samples. Since the
frequency response is windowed to a finite bandwidth, the corresponding impulse response is
theoretically infinite in duration. By taking the $N_D$ point IDFT, the resulting $N_D$ point time
domain sequence represents one period of the periodic sequence formed by time-aliasing of
the infinite impulse response [107], as given by
$$\tilde{x}(n) = \sum_{r=-\infty}^{\infty} x(n + rN_D), \qquad n = 0,1,\ldots,N_D-1, \tag{3.13}$$
where $\tilde{x}(n)$ is the periodic sequence and $x(n)$ is the infinite sequence.
Equivalently, $\tilde{x}(n)$ may be considered as an $N_D$ point impulse response having the
stopband zeros. As a result of the time-domain aliasing, the frequency response is no
longer finite in bandwidth, meaning that the stopband attenuation is not infinite as with the
rectangular window. The zeros in the stopband do not change, however, since they are
defined by the DFT relationship which gives rise to (3.13). Since (3.13) depends on $N_D$, and
in general $x(n + rN_D)$ tends to zero as $r$ tends to infinity, the larger $N_D$ is, the less
time-aliasing occurs and the greater is the stopband attenuation. Fig.3.6a,b and 3.7a,b show
respectively the magnitude of the time and frequency responses of two filter examples which
illustrate this process. Both filters were designed with the same nominal filter length $N_h$ of
59. Fig.3.6a,b correspond to the case of $N_D = 128$ with 100 stopband zeros, whilst Fig.3.7a,b
correspond to $N_D = 512$ with 400 stopband zeros.
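The time-aliasing relationship of (3.13) can be checked directly. The following sketch uses a short finite sequence in place of the infinite impulse response (an assumption made so that the sum is finite): sampling its Fourier transform at $N_D$ equally spaced frequencies and taking the $N_D$-point IDFT returns the sequence folded modulo $N_D$. The names and values are illustrative.

```python
import cmath


def dtft(x, omega):
    """Fourier transform of a finite causal sequence evaluated at frequency omega."""
    return sum(x[n] * cmath.exp(-1j * omega * n) for n in range(len(x)))


def idft(big_x):
    """N-point inverse DFT with the 1/N factor."""
    n_pts = len(big_x)
    return [sum(big_x[k] * cmath.exp(2j * cmath.pi * ((k * n) % n_pts) / n_pts)
                for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def sampled_then_inverted(x, n_d):
    """Sample the transform of x at 2*pi*k/N_D and take the N_D-point IDFT."""
    samples = [dtft(x, 2 * cmath.pi * k / n_d) for k in range(n_d)]
    return [value.real for value in idft(samples)]


def time_aliased(x, n_d):
    """Fold x into N_D points: sum of x(n + r*N_D) over r, as in (3.13)."""
    folded = [0.0] * n_d
    for n, value in enumerate(x):
        folded[n % n_d] += value
    return folded


x = [4.0, 3.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625, 1.0, -1.0, 2.0, -2.0]  # length 12
folded = time_aliased(x, 8)
recovered = sampled_then_inverted(x, 8)
max_error = max(abs(a - b) for a, b in zip(folded, recovered))
print(folded[:4], max_error < 1e-9)
```

The four samples beyond index 7 fold back onto the first four positions, which is the aliasing that a larger $N_D$ would avoid.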
It can be observed from these results that although the impulse responses have been
extended to $N_D$ points, the magnitudes of the samples $h(n)$ for $n \ge N_h - 1$ are distinctly
smaller than the other samples, and that the $N_h$ point responses have not changed by a
large extent after stopband zeros have been inserted. This suggests that with an overlap of
$N_h - 1$ samples as in the original scheme, the degradation due to circular convolution of
$h(n)$, $n \ge N_h - 1$, should be reasonably small and controllable. A procedure to quantify
this error will be required to assess the suitability of the choice of parameters for a given
specification of the TMUX. General analytical results concerning overlap error in fast
convolution methods exist [141] and are quoted in the following for application to the
TMUX problem.
The system can be characterized by a periodically time-varying impulse response
$h(n,m)$ for which the input-output relationship is given by
$$y(n) = \sum_{\text{all } m} h(n,m)\, x(n-m) \tag{3.14}$$
where $h(n,m) = h(n+N,m)$, $N = N_D - N_h + 1$, i.e. $h(n,m)$ is periodic in $n$ with period $N$.
The frequency domain relationship is given by
$$Y(\omega) = \sum_{k=0}^{N-1} H_k\!\left(\omega - \frac{2\pi k}{N}\right) X\!\left(\omega - \frac{2\pi k}{N}\right) \tag{3.15}$$
$H_N(\omega_1,\omega_2)$ is the two-dimensional Fourier transform of one period of $h(n,m)$. In
practice the values of $\omega_1$ of interest are only $2\pi k/N$, $k = 1,\ldots,N-1$. This leads to the
following simplified expression that avoids the need for carrying out the two-dimensional
DFT explicitly.
[Figure: (a) time response, magnitude versus sample number; (b) frequency response, magnitude versus normalized frequency]
Fig.3.6 59-point filter with 128-point DFT
[Figure: (a) time response, magnitude versus sample number; (b) frequency response, magnitude versus normalized frequency]
Fig.3.7 59-point filter with 512-point DFT
where $E_Z\!\left(\dfrac{2\pi k}{N}\right) = \dfrac{1 - e^{-j2\pi kZ/N}}{1 - e^{-j2\pi k/N}}$.
Equation (3.15) means that the output suffers from a form of aliasing in the
frequency domain similar to that in decimation, but differing in that the filter response is
different for each aliasing component. The relationship extends to stochastic input in terms
of input and output power spectral densities, as given by
$$S_y(\omega) = \frac{1}{N^2} \sum_{k=0}^{N-1} \left| H_N\!\left(\frac{2\pi k}{N},\ \omega - \frac{2\pi k}{N}\right) \right|^2 S_x\!\left(\omega - \frac{2\pi k}{N}\right) \tag{3.16}$$
The terms for $k \ne 0$ correspond to the cyclic convolution error due to the overlap
being less than $N_h - 1$. The term with $k = 0$ corresponds to an 'average filter' with impulse
response given by
$$\bar{h}_0(m) = \frac{1}{N} \sum_{n=0}^{N-1} h(n,m) \tag{3.17}$$
This is clearly not the same as the $N_D$ point IDFT of the frequency samples and it
represents another source of error. The average filter response may equally be interpreted
as a windowed version of the periodic extension of $h(m)$, the $N_D$ point IDFT of the
frequency samples, such that
$$\bar{h}_0(m) = \tilde{h}(m)\, W(m)$$
where $\tilde{h}(m)$ = periodic extension of $h(m)$, and $W(m)$ is a trapezoidal window given
by
$$W(m) = \begin{cases} 1 + m/N, & m = -N+1, -N+2, \ldots, -1, \\ 1, & m = 0,1,\ldots,N_h-1, \\ 1 - (m-P)/N, & m = P+1, P+2, \ldots, M-1, \end{cases} \qquad \text{with } P = N_h - 1,$$
as shown in Fig.3.8.
Since the samples $h(m)$ for $0 \le m \le N_h - 1$ have much larger magnitude than the
other samples, $\bar{h}_0(m)$ becomes a close approximation of $h(m)$. Fig.3.9a,b show the time
and frequency responses of the average filters of the respective filters shown previously
in Fig.3.6 and Fig.3.7. These show that the differences between the average filters and the
original filters are negligibly small.
[Figure: trapezoidal window W(n) of unit height, with axis marks at -N, -1, 0, 1, P, P+1 and M]
Fig.3.8 Trapezoidal window of W(n)
Returning to the first problem of frequency domain aliasing, it can be seen from
(3.15) and (3.16) that the aliasing components depend on the aliasing response $H_k(\omega)$,
which is dependent only on the 'tail' of the impulse response, i.e. $h(m)$ for
$N_h \le m \le N_D - 1$. The magnitudes of two aliasing responses for each of the two previous
examples of Fig.3.6 and Fig.3.7 are shown in Fig.3.10a,b and Fig.3.11a,b respectively. The
responses shown correspond to $k = 1$ and $N/2$, as from (3.16), for which the overall
magnitude was found to be minimum and maximum respectively. The magnitude responses for
other values of $k$ generally lie somewhere between these two extremes. It can be seen
from these responses that although the aliasing response for $k = 1$ is generally of a smaller
magnitude with a larger DFT size, the responses for $k = N/2$ for different DFT sizes are
very similar. It substantiates the previous observation that aliasing noise due to enforcing
stopband zeros is not effectively reduced by increasing the DFT size. The objective is
therefore to minimize $H_k(\omega)$, for $k \ne 0$, by suitable choice of the original impulse response
$h(n)$, given some specific decimation rate and transform size. This therefore requires the
minimization of the tail of the impulse response, which is carried out by a computer
optimization technique described in the next section.
3.3 Filter Design for Variable Bandwidth TMUX 
The previous section described an effective but somewhat heuristic approach to the
design of filters for use in the variable bandwidth TMUX. This section describes a
systematic means for the design of FIR filters using optimization techniques, which leads to a
unique and best solution for a given set of optimization criteria. This method takes into
account the various constraints imposed by the decimation rates, the transform sizes and
filter specifications, and obtains the optimal solutions given a set of such constraints.
[Figure: (a) time response, magnitude versus sample number; (b) frequency response, magnitude versus normalized frequency]
Fig.3.9a Responses of average filter of Fig.3.6
[Figure: (a) time response, magnitude versus sample number; (b) frequency response, magnitude versus normalized frequency]
Fig.3.9b Responses of average filter of Fig.3.7
[Figure: aliasing response magnitude versus normalized frequency; (a) k=1, (b) k=N/2]
Fig.3.10 Aliasing responses of filter in Fig.3.6
[Figure: aliasing response magnitude versus normalized frequency; (a) k=1, (b) k=N/2]
Fig.3.11 Aliasing responses of filter in Fig.3.7
3.3.1 Design Criteria 
The objective of the filter design method is to obtain the best approximation to the
desired ideal response while satisfying all the required constraints. The best approximation,
or the optimal solution, in the Chebyshev sense is the criterion used in the particularly
well-known Parks-McClellan filter design algorithm [142], in which the maximum
difference between the desired frequency response and the actual response is minimized. In
order for the filter to be applicable here to the variable bandwidth TMUX design,
additional criteria are introduced which were not in the conventional algorithms.
The first additional criterion is for an arbitrary number of zero values to be asserted at
the stopband frequencies at predetermined intervals. This is the measure to increase
computational efficiency by reducing the number of multiplications of frequency domain
samples. Given maximal decimation and stopband bandwidth relating to the decimation rate
by (3.3), the locations where the frequency response is to be set to zero are given by
(3.18)
where $N_D$ = DFT size, and $M$ = decimation rate.
Another criterion is imposed in order to limit the 'tail' magnitude of the filter impulse
response whilst having sufficient filter length for the number of frequency response
zeros to exist. This is expressed by
$$|h(n)| \le \delta_h \quad \text{for } |n| \ge \frac{N_{OL}}{2},\ N_{OL}\ \text{even, or } |n| \ge \frac{N_{OL}-1}{2},\ N_{OL}\ \text{odd.} \tag{3.19}$$
$\delta_h$ is an arbitrary value below which the tail samples are to be suppressed. $N_{OL}$ is
the number of overlap samples in the frequency domain convolution algorithm used in the
TMUX structure discussed in the previous section, assuming also that $N_{OL} \le N_h$. $h(n)$ is
defined for $-\frac{N_h-1}{2} \le n \le \frac{N_h-1}{2}$ for $N_h$ odd, and $-\frac{N_h}{2} \le n \le \frac{N_h}{2}$ for $N_h$ even.
The number of overlap samples $N_{OL}$ and the filter length $N_h$ are assumed to be
either both even or both odd, which in practice is easily satisfied. When this is not the
case, additional care has to be taken with respect to the relationship of (3.19). For
convenience, only the former type of overlap and filter length is considered here, but the
results can easily be extended to the latter case.
3.3.2 Computational Methods 
Software was written in the C language to carry out the optimization procedure
using an iterative algorithm. The core of the procedure is a linear programming
optimization method using the simplex algorithm [143]. The linear programming approach is
the only filter design method that allows simultaneous time and frequency domain constraints
[140], such as those in (3.18) and (3.19) here. It maximizes or minimizes a cost function
subject to a set of constraints. All of these constraints and the cost function must be linear
combinations of a set of variables. In the FIR filter design problem, the frequency response
at a certain frequency $\omega$ is a linear function of the filter impulse response coefficients,
given by
$$H(e^{j\omega}) = h(0) + \sum_{n=1}^{(N_h-1)/2} h(n)\cos(\omega n) \tag{3.20a}$$
for $N_h$ odd, and
$$H(e^{j\omega}) = \sum_{n=0}^{N_h/2-1} h(n)\cos(\omega n) \tag{3.20b}$$
for $N_h$ even.
To obtain the optimal filter in the Chebyshev sense, the cost function to be minimized
is the passband and stopband ripple, which are the maximum deviations of the actual filter
response from the desired ideal response. Hence the linear constraints for the passband of a
lowpass filter with passband ripple $\delta_p$ are given by
(3.21a)
and
(3.21b)
where $\omega_p$ is the passband band edge frequency.
Similarly the stopband constraints for stopband ripple of $\delta_s$ are
(3.22a)
and
(3.22b)
where $\omega_s$ is the stopband band edge frequency.
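To make the linear structure of these constraints concrete, the sketch below builds the pair of constraint rows for one frequency. It is an illustration under stated assumptions: the response is evaluated as in (3.20a) for odd $N_h$, and the two inequalities are assumed to take the common form $H - \delta \le D$ and $-H - \delta \le -D$, where $D$ is the desired response (1 in the passband, 0 in the stopband). The function names and numerical values are hypothetical.

```python
import math


def response_row(omega, n_h):
    """Coefficients multiplying h(0), ..., h((N_h-1)/2) in (3.20a); N_h must be odd."""
    if n_h % 2 == 0:
        raise ValueError("this sketch covers odd filter lengths only")
    half = (n_h - 1) // 2
    return [1.0] + [math.cos(omega * n) for n in range(1, half + 1)]


def ripple_constraints(omega, n_h, desired):
    """Rows (coefficients, rhs) for H - delta <= D and -H - delta <= -D.

    The unknown vector is [h(0), ..., h((N_h-1)/2), delta]; this form of the
    inequalities is an assumption, not a transcription of (3.21) or (3.22).
    """
    row = response_row(omega, n_h)
    upper = (row + [-1.0], desired)
    lower = ([-c for c in row] + [-1.0], -desired)
    return [upper, lower]


def is_satisfied(constraint, unknowns):
    """Check one linear inequality sum(c*u) <= rhs with a small tolerance."""
    coefficients, rhs = constraint
    return sum(c * u for c, u in zip(coefficients, unknowns)) <= rhs + 1e-12


constraints = ripple_constraints(0.0, 5, 1.0)     # passband point at omega = 0
loose = [0.5, 0.3, 0.0, 0.25]                     # H(0) = 0.8 with ripple 0.25
tight = [0.5, 0.3, 0.0, 0.10]                     # same taps with ripple 0.10
print([is_satisfied(c, loose) for c in constraints])
print([is_satisfied(c, tight) for c in constraints])
```

Each frequency at which the response is constrained contributes two such rows, which is why a dense frequency grid produces a large linear program.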
The set of constraints from (3.18) through (3.22) then constitutes the linear program
by which one of $\delta_h$, $\delta_p$ or $\delta_s$ is minimized. As in conventional procedures, either constant
values are assigned to two of these variables and the third minimized, or more
appropriately the variables are made multiples of each other, which results in simultaneous
minimization of all three variables.
There are two important additions to this linear programming algorithm, firstly to
increase efficiency and secondly to enable solution under the additional constraints (3.18)
and (3.19). The first one is similar to a reported procedure related to the design of Nyquist
filters for data transmission [144] and is applied more generally here. In the conventional
linear programming filter design [140] the frequency domain constraints (3.21), (3.22) are
enforced over a dense frequency grid, normally of the order of 16 times the length of the
filter. This leads to a high number of constraints, since for every frequency there are also
two separate constraints (3.21a,b) or (3.22a,b). The storage requirement becomes
formidable, as well as the amount of computation necessary to obtain a solution, both of
which are approximately proportional to the number of constraints.
The method to increase its efficiency is based on the observation that for a length $N$
linear phase FIR filter, there are at most $(N+1)/2$ extrema in the range $0 \le \omega \le \pi$ where the
magnitude of the ripple $\delta_p$ or $\delta_s$ attains a maximum, and the optimal solution is
obtainable with the minimum number of constraints if the constraints are enforced at these
extremal frequencies alone. Hence instead of the dense frequency grid, an iterative
approach is used: in each iteration the extremal frequencies are determined numerically,
the linear program is formulated and its solution found. The process is repeated until a
final optimal solution is found. This method is illustrated in Fig.3.12. Initially the $(N+1)/2$
frequencies are divided proportionally between the passband and stopband. Subsequently,
extremal frequencies of the solution from each iteration are found using the
Newton-Raphson method for finding the roots of the derivative of the frequency response.
Initial estimates must first be found by determining the approximate locations of the
zero-crossings of the derivative, and their exact values are then determined to high accuracy
by the Newton-Raphson method. Minimum and maximum constraints are imposed alternately
on successive extremal frequencies, which gives further saving in computation. The solution
from the iterations converges to the optimum, which is indicated by a negligible change in
the value of the cost function between successive iterations.
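The extremal frequency search described above can be sketched as follows. This is an illustrative outline rather than the software written for this work: it assumes the response is a cosine series $H(\omega) = \sum_n a_n \cos(n\omega)$, brackets the zero-crossings of its derivative on a coarse grid, and refines each one by Newton-Raphson iteration. The grid size, tolerance and coefficient values are hypothetical, and the band edges $\omega = 0$ and $\omega = \pi$ (which are always stationary points of a cosine series) are left out of the search.

```python
import math


def response(a, omega):
    """H(omega) = sum_n a[n] * cos(n * omega)."""
    return sum(a_n * math.cos(n * omega) for n, a_n in enumerate(a))


def first_derivative(a, omega):
    """dH/d(omega) of the cosine series."""
    return -sum(n * a_n * math.sin(n * omega) for n, a_n in enumerate(a))


def second_derivative(a, omega):
    """Second derivative of the cosine series, used as the Newton slope."""
    return -sum(n * n * a_n * math.cos(n * omega) for n, a_n in enumerate(a))


def find_extrema(a, grid_points=50, tol=1e-12, max_iter=50):
    """Locate interior extrema of H on (0, pi) by grid search plus Newton-Raphson."""
    extrema = []
    step = math.pi / grid_points
    for i in range(1, grid_points - 1):          # skip the band edges 0 and pi
        left, right = i * step, (i + 1) * step
        if first_derivative(a, left) * first_derivative(a, right) < 0.0:
            omega = 0.5 * (left + right)         # initial estimate from the grid
            for _ in range(max_iter):
                slope = second_derivative(a, omega)
                if slope == 0.0:
                    break
                delta = first_derivative(a, omega) / slope
                omega -= delta
                if abs(delta) < tol:
                    break
            extrema.append(omega)
    return extrema


a = [0.0, 1.0, 0.5]                              # H = cos(w) + 0.5*cos(2w), toy example
extrema = find_extrema(a)
print(extrema, [response(a, w) for w in extrema])
```

For this toy series the derivative is $-\sin\omega\,(1 + 2\cos\omega)$, so the single interior extremum lies at $\omega = 2\pi/3$, where the response equals $-0.75$.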
[Flowchart: Begin; allocate arbitrary frequencies as initial extrema; optimize response at the extremal frequency locations; find new extremal frequency locations; repeat until converged; optimization complete]
Fig.3.12 Steps in Optimization Procedure
The second important addition to the linear programming filter design procedure is
the dynamic allocation of the number of constraints to each iteration. This is necessary
because the number of extremal frequencies of the optimal solution cannot be predetermined.
In the conventional equiripple design [140] the optimal solution may either be
'extraripple', in which case the number of extremal frequencies between ω = 0 and π is
(Nh + 1)/2, or otherwise have one fewer than the extraripple case. Whether the optimal
solution is extraripple or has one fewer ripple depends on the location of the transition
band [115]. Using the present technique with greatly reduced numbers of constraints, it
was found that the number of constraints used has to be exactly the number of extremal
frequencies of the optimal solution, otherwise the iterative procedure will fail to converge.
For the design of conventional equiripple filters the choice is between either (Nh + 1)/2 or
(Nh - 1)/2 constraints and it can be made by trial and error. With the additional constraints
(3.18) and (3.19) the optimal solution is no longer equiripple and the number of extremal
frequencies is not just one of the two possibilities. Hence the number of constraints is
allowed to vary from one iteration to the next, depending on the number of extremal
frequencies found at the previous iteration. This yields an optimal final solution since the
initial problem assumes the maximum number of constraints. Successive iterations giving
fewer extremal frequencies imply that the optimal solution is achieved with fewer
extremal frequency constraints satisfied with equality, with some of the other constraints
being so satisfied instead.
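As a point of reference only, the linear programming formulation that this procedure builds on can be sketched as a minimax design over a dense frequency grid. The sketch is not the reduced-constraint algorithm described above: it constrains every grid point rather than only the extremal frequencies, it omits the additional constraints (3.18) and (3.19), and it assumes SciPy's `linprog` as the solver. The function name `lp_lowpass`, the grid density and the band edges are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def lp_lowpass(n_taps, wp, ws, n_grid=200):
    """Minimax linear-phase lowpass design (odd length) by linear programming."""
    m = (n_taps - 1) // 2
    # Dense grid over the passband [0, wp] and the stopband [ws, pi].
    w = np.concatenate([np.linspace(0.0, wp, n_grid), np.linspace(ws, np.pi, n_grid)])
    desired = np.concatenate([np.ones(n_grid), np.zeros(n_grid)])
    # Zero-phase response A(w) = sum_k a_k cos(k w), k = 0..m.
    C = np.cos(np.outer(w, np.arange(m + 1)))
    ones = np.ones((len(w), 1))
    # Unknowns [a_0..a_m, delta]:  A(w) - delta <= D  and  -A(w) - delta <= -D.
    A_ub = np.vstack([np.hstack([C, -ones]), np.hstack([-C, -ones])])
    b_ub = np.concatenate([desired, -desired])
    cost = np.zeros(m + 2)
    cost[-1] = 1.0  # minimise the peak error delta
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (m + 2), method="highs")
    a = res.x[:-1]
    # Convert the cosine coefficients into a symmetric impulse response.
    h = np.concatenate([a[:0:-1] / 2, [a[0]], a[1:] / 2])
    return h, res.x[-1], res.success

h, delta, ok = lp_lowpass(21, 0.3 * np.pi, 0.5 * np.pi)
print(ok, len(h), delta)
```

The dense grid makes the constraint matrix grow with both the grid size and the filter length, which is the storage cost that restricting the constraints to the extremal frequencies is intended to avoid.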
3.3.3 Results 
This filter design method was used to design a number of filters with various
specifications and the results appear to confirm the reliability of the algorithm.
Some design examples are shown in Fig.3.13 to 3.16 to illustrate the effectiveness of
the algorithm. All these examples are for the design of a length 63 filter to be used with a
DFT size of 64. The transition widths of the filters are designed to remain constant, hence
the stopband attenuation remains approximately constant even for different passband
bandwidths.
Fig.3.13a to 3.13c shows three different passband bandwidths, as would be encountered
in the variable bandwidth TMUX. The number of stopband zeros is reduced as the
passband bandwidth increases. However the changes in the stopband attenuation in all
these examples are not large, although the apparent result of having the asserted stopband
zeros is the loss of their original equiripple characteristics.
Fig.3.13 Three filters with stopband zeros and different bandwidths (panels (a) to (c), with (c) bandwidth = 0.5; magnitude against normalized frequency)
Fig.3.14a to 3.14c shows the frequency responses of the filter of Fig.3.13b but with
the magnitude of the tail of its impulse response limited. Fig.3.15a to 3.15c shows the
magnitude of their respective impulse responses. The parameter varied for these cases
is the length of the tail. The longer the tail, the more efficient is the filter bank, but as can
be seen from their frequency responses, the passband gain becomes widely varying. A
suitable compromise must be made and in these examples Fig.3.14c may represent a
reasonable tradeoff between computational efficiency and performance, with a tail of length
20. Longer tails would lead to very large passband ripple.
Fig.3.16a to 3.16c shows the frequency responses of the filter of Fig.3.14b but with
three different tail magnitudes, which can be seen from their respective impulse response
magnitudes in Fig.3.17a to 3.17c. Since the aliasing noise is approximately proportional to
the square of the tail magnitude, it is important to minimize it as far as possible. Again it
involves a tradeoff with the degradation in the frequency response. Fig.3.16b may
represent a reasonable example of such a tradeoff. The final choice would depend on the
particular specifications of the system and a separate tradeoff study is required for each
individual system. The most dominant aliasing responses of the filters in Fig.3.16a to
3.16c are shown in Fig.3.18a to 3.18c respectively. For comparison, the aliasing response
of a length 63 filter designed by simply setting the DFT stopband samples to zero, as
discussed in section 3.2.2.1, is shown in Fig.3.19. The reduction in aliasing noise for the filters
designed using the present technique is apparent.
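The square-law relationship between tail magnitude and aliasing noise can be put into numbers with a short calculation. The sketch assumes the proportionality holds exactly, which the text only states as an approximation, and the function name is illustrative.

```python
import math

def aliasing_change_db(tail_old, tail_new):
    """Change in aliasing noise power in dB, assuming noise ~ (tail magnitude)^2."""
    return 10.0 * math.log10((tail_new / tail_old) ** 2)

# The three tail constraints used in Fig.3.16 to 3.18.
for old, new in [(0.05, 0.01), (0.01, 0.001)]:
    print(f"tail {old} -> {new}: {aliasing_change_db(old, new):.1f} dB")
```

Under that assumption, each tenfold reduction of the tail magnitude lowers the aliasing noise by 20 dB, which is the benefit to be weighed against the passband degradation.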
The practical limitations of computer memory, and perhaps more importantly the
execution time, hinder extensive testing for the design of higher order filters. For example,
the execution time for designing a length 255 filter on a VAX 11/750 was of the order of
65 minutes. The execution time increases with the square of the filter length, which
suggests the amount of time required for longer filters would be inconvenient although not
entirely intolerable. Since the filter design is an 'off-line' process, its execution time is not a
prime concern and the filter design method remains a useful tool in the design of
computationally efficient filters for use in DFT convolution even for higher-order filters.
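The quadratic growth of design time can be extrapolated from the single timing quoted above. This is only a rough estimate: it assumes the square law holds exactly and uses the length 255, 65 minute figure as the sole reference point.

```python
def design_time_minutes(n_taps, ref_taps=255, ref_minutes=65.0):
    """Extrapolated design time, assuming time grows as the square of filter length."""
    return ref_minutes * (n_taps / ref_taps) ** 2

for n_taps in (63, 255, 511, 1023):
    print(n_taps, round(design_time_minutes(n_taps), 1), "minutes")
```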
3.4 Issues in Implementation 
The practical details in the implementation of the variable bandwidth TMUX using
the method described and the particular filter design are examined, which reveals further
modifications necessary to the fast convolution procedure. The computational effort
required for various specifications is assessed to show the order of magnitude of the
hardware requirement.
Fig.3.14 Frequency responses of filters with different tail lengths (panels: (a) tail = 6, (b) tail = 10, (c) tail = 20; magnitude against normalized frequency, each with an enlarged view of the passband)
Fig.3.15 Impulse response magnitude of filters in Fig.3.14 (panels: (a) tail = 6, (b) tail = 10, (c) tail = 20; magnitude against sample number)
Fig.3.16 Frequency responses of filters with different tail magnitude (panels: (a) tail constraint = 0.05, (b) tail constraint = 0.01, (c) tail constraint = 0.001; magnitude against normalized frequency, each with an enlarged view of the passband)
Fig.3.17 Impulse response magnitude of filters in Fig.3.16 (panels: (a) tail constraint = 0.05, (b) tail constraint = 0.01, (c) tail constraint = 0.001; magnitude against sample number)
Fig.3.18 Aliasing responses of filters in Fig.3.16 (panels: (a) tail constraint = 0.05, (b) tail constraint = 0.01, (c) tail constraint = 0.001; magnitude against normalized frequency)
Fig.3.19 Aliasing response of filter with suppressed DFT samples (magnitude against normalized frequency)
3.4.1 Practical Details 
In the special filter design described in the previous section, it is noted that the samples
of the impulse response with their magnitudes controlled are evenly divided between
the two ends of the impulse response. This is a convenient and in fact necessary condition
for the filter design algorithm, because the impulse response of a linear phase FIR filter
must be symmetrical about the centre sample(s). The resulting filter, when applied to the
previous variable bandwidth TMUX method, will appear to have a two-sided tail response
instead of being one-sided as those shown in Fig.3.6 and 3.7. Clearly it is not possible to
apply the method using such filters without some changes. The changes that have to be
made are in the sectioning and overlapping procedures. Fig.3.20 illustrates the relationship
between the input and output samples with the two-sided impulse response tail in the
overlap-save process. It can be seen that the output samples in the cyclic convolution
must be rearranged to produce the desired linear convolution output. The Nh - 1 samples
where uncontrolled wrap-around or time-aliasing occurs are located in the middle of the
output block. They are preceded by the first (ND - Nh + 1)/2 samples which have no
uncontrolled time-aliasing, but only the time-aliasing associated with the impulse response tail,
which is a feature controlled by the filter design algorithm. If the impulse response tail is
ignored, these samples then correspond to the output samples of the linear convolution
between the impulse response of length Nh and the input block of length ND, at the
sample time of n = Nh - 1 + (ND - Nh + 1)/2 to n = ND - 1, with n = 0 being the first sample of the
output block. After the next Nh - 1 samples which are time-aliased, there are another
(ND - Nh + 1)/2 samples similar to the first such number of samples, and this time
corresponding to the output samples of the linear convolution at n = 0 to
n = Nh - 1 + (ND - Nh + 1)/2 - 1. Hence the two blocks of samples must be realigned as shown
in Fig.3.21 to give the correct output samples, instead of the simple discarding of Nh - 1
samples at the end of the output block as carried out in the usual overlap-save technique.
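For comparison, the usual overlap-save technique that this procedure modifies can be sketched as follows. This is the conventional method only, not the rearranged version of Fig.3.21: the filter is taken as causal with a one-sided response, so the Nh - 1 time-aliased samples fall at the start of each cyclic convolution block under the indexing convention chosen here. The function name and the test signal are illustrative.

```python
import numpy as np

def overlap_save(x, h, n_fft):
    """Conventional overlap-save FIR filtering with n_fft-point FFTs."""
    n_h = len(h)
    hop = n_fft - (n_h - 1)               # new input samples consumed per block
    if hop <= 0:
        raise ValueError("n_fft must exceed len(h) - 1")
    n_blocks = -(-len(x) // hop)          # ceiling division
    # Leading n_h - 1 zeros give the first block a full overlap region.
    padded = np.zeros(n_blocks * hop + n_h - 1, dtype=complex)
    padded[n_h - 1 : n_h - 1 + len(x)] = x
    H = np.fft.fft(h, n_fft)
    y = np.empty(n_blocks * hop, dtype=complex)
    for b in range(n_blocks):
        block = padded[b * hop : b * hop + n_fft]
        cyclic = np.fft.ifft(np.fft.fft(block) * H)
        # The first n_h - 1 samples are time-aliased and are discarded.
        y[b * hop : (b + 1) * hop] = cyclic[n_h - 1 :]
    return y[: len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
h = rng.standard_normal(39)               # length 39 filter, 64-point DFT
y = overlap_save(x, h, 64)
print(np.max(np.abs(y - np.convolve(x, h)[:200])))
```

With a symmetric filter whose controlled tail is split between both ends, the valid samples are no longer contiguous within the block, which is why the simple discard step has to be replaced by the realignment described above.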
Fig.3.20 Aliasing effects in convolution using filters with controlled tail magnitude (input x(i) and filter h(n-i) over a block of ND samples, with an n0-point 'tail' of the impulse response, n0 = 3 shown; marked instants: n = 0, where time-domain aliasing due to the first part of the tail begins; the point where aliasing due to the non-negligible samples of the impulse response begins; the point where aliasing due to the second part of the tail begins)
Fig.3.21 Rearrangements of cyclic convolution output (the cyclic convolution output of ND samples, with the Nh - 1 time-aliased samples in the middle, realigned into the desired linear convolution output)
3.4.2 Computation Requirement 
An assessment of the amount of computation required for two different system
configurations is shown herein. The numbers of multiplications and additions are tabulated
in Tables 3.1 and 3.2 for two different DFT sizes and a number of channel
bandwidths. The numbers of arithmetic operations in the tables correspond to the number
of operations for each input sample. Hence they are effectively the operation rates
normalized to the input sampling rate. The actual operation rates for any given system may thus
be found by simply multiplying the values by the input sampling rate of the system. The
system overhead refers to the operations in the input DFT, which do not vary with the
number of output channels and their bandwidths. The other operation counts are those for
one channel of the channel bandwidth shown. The channel bandwidths are normalized to
the input sampling frequency, for example 1/8 means a channel bandwidth of 125Hz
when the input sampling frequency is 1kHz.
The operation counts were calculated assuming complex input signals as well as
frequency domain samples. No assumptions were made concerning 'trivial' multiplications
(1, -1, j, etc.) that could be avoided.
Table 3.1 shows the case with DFT size = 64, with a 38-sample overlap between each
input block, or in effect data is input in blocks of 64 - 38 = 26 samples at a time. The choice
of the DFT size and amount of overlap are governed by the filter length. In this case, a
filter of length 39 is appropriate for an 8-channel filter bank, and the DFT size is chosen
for efficiency [145] and as a power of 2 for implementation via radix-2 FFT.
Channel Bandwidth    Multiplication    Addition
System Overhead      29.5              44.3
1/8                  3.1               2.8
1/4                  7.4               8.6
1/2                  17.2              20.9
Table 3.1. Arithmetic Operation Counts for 64-point DFT, 38-sample overlap.
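The block sizes quoted for the two configurations follow directly from the DFT size and the filter length. A minimal sketch of the arithmetic, with an illustrative function name:

```python
def overlap_save_blocking(dft_size, filter_length):
    """Return (overlap, new input samples per block) for overlap-save processing."""
    overlap = filter_length - 1
    new_samples = dft_size - overlap
    return overlap, new_samples

# The two configurations considered in this section.
for dft_size, filter_length in [(64, 39), (120, 63)]:
    overlap, new_samples = overlap_save_blocking(dft_size, filter_length)
    print(f"DFT {dft_size}, filter {filter_length}: overlap {overlap}, block of {new_samples}")
```

The fraction of each DFT block that carries new input samples is 26/64 in the first case and 58/120 in the second.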
A principal drawback of the use of radix-2 FFT is the very limited number of
integer factors of the DFT size. As discussed in the previous section, this leads to limited
variation in the channel bandwidths, in this case to the set of binary fractions of the
sampling frequency. To overcome this limitation, the DFT size may be chosen to be a highly
composite number. Table 3.2 shows an example of such a choice.
The DFT size of 120 has integer factors of 2, 3 and 5, which enables the number of
different bandwidths as shown. The filter of length 63 is sufficient for a 20-channel filter
bank, leading to an overlap of 62, and data is input in blocks of 58 samples using the
120-point DFT. The system gives rise to non-radix-2 DFTs. These are calculated via the
index-mapping algorithms [138] which reduce the amount of computation by mapping a
large DFT to a number of smaller DFTs. The values shown in the table have not assumed
further use of fast algorithms once a small prime DFT has been reached. As a result, a
smaller sized DFT such as 3 requires more operations than a 4-point DFT because the
4-point DFT is calculated via radix-2 FFT while the 3-point DFT is calculated in the regular manner.
This is the reason for the unusually high values for the channel bandwidths of 1/12 and
1/8. Fast algorithms for the small prime DFT can be used but will increase the complexity
of the computation structure, and their value depends on the particular hardware and
software issues, hence they are not investigated further here.
Channel Bandwidth    Multiply    Add
System Overhead      78.6        62.1
1/20                 1.0         0.62
1/15                 1.4         1.2
1/12                 6.2         1.7
1/10                 5.0         2.1
1/8                  10.3        3.1
1/6                  9.7         4.1
Table 3.2. Arithmetic Operation Count for 120-point DFT, 62-point overlap
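The effect of the DFT size on the available channel bandwidths can be illustrated by listing its integer factors. The sketch assumes, as the text suggests, that each admissible normalized bandwidth 1/d corresponds to an integer factor d of the DFT size; the function name is illustrative.

```python
def divisors(n):
    """All positive integer factors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

for dft_size in (64, 120):
    print(dft_size, divisors(dft_size))
```

A size of 64 offers only powers of two, whereas 120 includes all the denominators used in Table 3.2 (20, 15, 12, 10, 8 and 6).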
A number of inferences may be drawn from the examples shown. Firstly, the actual
amount of computation required is of a realistic order. For example, the system of
Table 3.1 for an input sampling frequency of 100kHz requires just less than 10M
multiplications/sec, with the output consisting of one 1/2 and four 1/8 channels. The
addition rate is similar. Secondly, it can be seen that the wideband channels require a
much higher number of operations than the narrower ones, such that the number of
operations is approximately proportional to the square of the bandwidth. However the number
of operations associated with the output channels is much smaller than the system
overhead. This emphasizes the efficient nature of this type of processing structure as the
system overhead is shared among the channels. Additional saving in computation can be
achieved by FFT pruning [146] when some channels are not used and hence not all samples
from the FFT are required.
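The conversion from the normalized counts to operation rates can be sketched for the example quoted above. The estimate simply sums the per-sample entries of Table 3.1 for the system overhead, one 1/2 channel and four 1/8 channels, and multiplies by the input sampling rate; how the quoted figure was actually obtained is not stated, so this summation is an assumption.

```python
def operation_rate(per_sample_counts, fs_hz):
    """Operations per second from per-input-sample counts and the sampling rate."""
    return sum(per_sample_counts) * fs_hz

fs = 100e3                                  # input sampling frequency, Hz
# Table 3.1 entries: system overhead, one 1/2 channel, four 1/8 channels.
mults = [29.5, 17.2] + [3.1] * 4
adds = [44.3, 20.9] + [2.8] * 4
print(f"multiplications/sec: {operation_rate(mults, fs):.3g}")
print(f"additions/sec:       {operation_rate(adds, fs):.3g}")
```

Under this reading both rates come out at a few million operations per second, below the 10M figure mentioned in the text.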
3.5 Summary 
In this chapter a very efficient structure for implementing variable bandwidth
TMUX has been described. A generalization of the classical fast convolution via FFT
algorithm was introduced to accommodate the additional requirements. In order to improve
efficiency a filter design method was established. The filter design algorithm itself shows
improvements over the conventional linear programming techniques, particularly in terms
of storage requirement on the host computer. The speed of this algorithm was shown to
be of reasonable order. A number of filters were so designed for specific examples to
demonstrate its usefulness. The filters were shown to satisfy the special sets of constraints
for implementation in the TMUX structure, while having good characteristics such as high
stopband attenuation. Finally, the practical issues in implementation were addressed. An
assessment of the order of computation requirement was obtained and shown to be
achievable in practice.
4 Flexible Transmultiplexer by Analysis/Synthesis Filter Banks
4.1 Introduction 
The advantages and possible applications of the flexible TMUX have been established
in the previous chapter. One approach offering such flexibility together with computational
efficiency was described. In this chapter, another method is exploited. This method draws
from the knowledge of analysis/synthesis filter banks, which have received a great deal of
attention recently.
The original application of the filter bank problem was to the subband coding of
speech signals, in which the input signal is split up into a number of narrow frequency
bands and decimated accordingly. These subbands are then to be reconstructed at the
receiving end to produce the original input signal. The problem was to eliminate the
frequency domain aliasing due to the decimation at the analysis filter bank, and also to
reconstruct the signal perfectly such that the analysis/synthesis operations behave as a pure
delay with no magnitude or phase distortions. Earlier approximate solutions which
provide good reconstruction properties with low aliasing components are supplemented by the
recent discovery of a class of perfect reconstruction filter banks which is theoretically
perfect in that the system behaves as a pure delay. The drawback is the increased amount of
processing required to implement the analysis and synthesis filter banks. A review of the
filter bank theory was carried out in section 3.2.
The concept for this approach in the application to the flexible TMUX is that instead
of the synthesis of the entire input signal, only a portion of the subband channels are
reconstructed. These subbands are in consecutive frequency locations so that the
reconstructed signal is a wider band signal made up of a number of the subband bandwidths.
Any number of the subbands may be synthesized and the method provides a very flexible
approach to the variable bandwidth TMUX problem. The only condition is that the wider
bandwidths must be a multiple of a fundamental, which is the bandwidth of a subband
from the analysis filter bank. This offers an obvious advantage for this scheme over the
one considered in the previous chapter, in that the decimation rates or the bandwidths of
the wideband channels are not constrained by implementation issues such as the DFT size
in the previous method.
There are some differences between the filter bank applied here and the filter bank
theory developed for the original problem, arising from the fact that not all the subbands
are used in synthesis in this application. The process described here will be referred to as
the partial reconstruction filter bank and in section 4.2 some of its essential properties
are described. The consequences for the filter design and practical implementation are
examined. A simplified technique of filter design which offers a reasonable tradeoff
between computational complexity and system performance is then described. In section
4.3 practical issues relating to filter design and implementation are addressed and
techniques leading to computationally efficient structures are described. The computational
requirements of the structures are examined in section 4.4 with a number of different
system configurations. It is shown that greater efficiency can be achieved using a combination
of filtering structures when the number of wideband channels is large.
4.2 Theory of Analysis/Synthesis Filter Banks 
The theoretical background to the perfect reconstruction filter banks is first described. 
It is then extended to the case of partial reconstruction filter banks. The conditions of 
alias-cancellation and signal reconstruction are established and the important consequences 
concerning the filter design for these filter banks are highlighted. 
4.2.1 Review of Perfect Reconstruction Filter Bank Theory
Fig.4.1 General analysis/synthesis filter bank (input X(z); analysis filters H0(z) to HK-1(z), each followed by decimation by M and interpolation by L; synthesis filters F0(z) to FK-1(z), summed to give the output Y(z))
The original structure of a K-channel analysis/synthesis filter bank is shown in
Fig.4.1. There is no strict relationship between the decimation rate M, the interpolation
rate L and the number of channels K, although attention is given to the case of
maximally-decimated filter banks where L = M = K. It should also be noted that the
analysis and synthesis filters are completely general and that no relationship is assumed
between any of them. Attention is however given to the case when the filters are a
frequency translation of a prototype, which has computational advantages. As for the
general filter bank structure shown in Fig.4.1, the analysis filter output of channel k is given
by [106] the z-transform relationship
(4.1)
where H_k(z), X(z) are the z-transforms of the k th channel filter and the input
signal respectively, and W_N = e^{-j2π/N}.
Similarly, the synthesis filter output is given by
(4.2)
where F_k(z) is the z-transform of the k th channel synthesis filter.
Combining (4.1) and (4.2) and assuming L = M gives the input-output relationship
for the classical filter bank as
(4.3)
The terms for nonzero values of l correspond to the aliasing terms. The condition for
aliasing-cancellation is therefore given by
$$\sum_{k=0}^{K-1} H_k(zW_M^l)\,F_k(z) = 0 \qquad (4.4)$$
for l = 1, 2, ..., M-1.
For perfect reconstruction of the original signal, the condition is given by considering
the terms of l = 0, leading to the requirement of
(4.5)
where T(z) is the system transfer function and must equal a delay for perfect
reconstruction.
The problem is then to find the sets of filters H_k(z) and F_k(z) such that (4.4) and
(4.5) are satisfied. The difficulty in satisfying both of these conditions is that if the filter
frequency responses are not allowed to overlap so as to satisfy (4.4), (4.5) cannot be
satisfied unless the filters are designed to have extremely narrow transition bands such that
the lack of overlap is negligible. In this case the filter lengths become unnecessarily long and
the filter bank is inefficient. An approximate solution was found in the special case of K = 2,
in which case the filters can be made to be mirror images of each other so that the alias components
are cancelled, although perfect reconstruction was not achieved [147]. Such two-band
filter banks are therefore called quadrature-mirror filter (QMF) banks. The significance is
that the main aliasing components are cancelled rather than prevented using very sharp
transition bands. Subsequently solutions were found for the two-band QMF banks [148]
whereby the spectral factorization of half-band filters used in conjunction with the
previous alias-cancellation techniques leads to alias-free perfect reconstruction. The same
principles extend easily to numbers of channels greater than 2, in that M th-band filters can be
spectrally factorized to provide perfect reconstruction without aliasing. However the
resulting filters from this method have poor stopband attenuation properties, especially
when M is large.
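The alias cancellation in the two-band case can be checked numerically. The sketch uses the textbook QMF choice H1(z) = H0(-z), F0(z) = H0(z), F1(z) = -H1(z), which is an assumption about the form intended by [147], and it assumes the usual 1/M scaling of the transfer function; condition (4.4) with M = 2 and l = 1 then reads H0(-z)F0(z) + H1(-z)F1(z) = 0. Polynomial products in z^-1 are computed by convolving coefficient vectors.

```python
import numpy as np

def modulate(h):
    """Coefficients of H(-z): multiply h[n] by (-1)^n."""
    return h * (-1.0) ** np.arange(len(h))

def two_band_qmf(h0):
    """Assumed textbook QMF assignment built from a lowpass h0."""
    h1 = modulate(h0)      # H1(z) = H0(-z), mirror image of H0 about pi/2
    f0 = h0.copy()         # F0(z) = H0(z)
    f1 = -h1               # F1(z) = -H1(z)
    return h1, f0, f1

def alias_and_transfer(h0):
    h1, f0, f1 = two_band_qmf(h0)
    # Aliasing term: H0(-z)F0(z) + H1(-z)F1(z)
    alias = np.convolve(modulate(h0), f0) + np.convolve(modulate(h1), f1)
    # Transfer function: (1/2)[H0(z)F0(z) + H1(z)F1(z)]
    transfer = 0.5 * (np.convolve(h0, f0) + np.convolve(h1, f1))
    return alias, transfer

haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
alias, transfer = alias_and_transfer(haar)
print(alias, transfer)
```

The aliasing term vanishes for any h0 under this assignment, while the transfer function reduces to a pure delay only for special choices such as the two-tap filter used here, which is the distinction between alias cancellation and perfect reconstruction drawn above.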
More general solutions were found through the use of matrix identities [130-132]. In
particular an elaborate type of all-pass structure is used to construct the analysis and
synthesis network. The all-pass functions are 'structurally lossless' such that the
resultant filter responses always satisfy the alias-free perfect reconstruction conditions and
therefore good filter responses can be obtained through some optimization techniques.
However, since the filters are not frequency translated versions of a prototype, the efficient
polyphase-FFT type structure cannot be used and the computation requirement of the
all-pass type structures tends to be excessive in comparison.
Hence the tradeoff between computational efficiency and performance here has led to
the investigation of efficient filter structures and design methods which only approximate
the perfect reconstruction properties but do so more efficiently. The general approach to
these structures was to use real filter responses instead of complex responses. They are
similar to the variation of the DFT filter banks known as single sideband (SSB) filter
banks [106]. Since the filter responses are real, the channels are located in the frequency
range 0 ≤ ω ≤ π, and hence the channel bandwidths are effectively half of that in the DFT
filter banks discussed earlier. This fact may then be exploited by a careful choice of the
channel filter responses to achieve near perfect reconstruction with negligible aliasing
[149,150]. The channel filters are also frequency translated versions of a prototype and
hence may be implemented using very efficient structures.
This approach of SSB filter bank was taken for the design of flexible TMUX for its
computational efficiency as well as its suitability for partial reconstruction filter banks.
Partial reconstruction results in certain differences to the filter bank theory and
consequently the filter design, due to a different set of design criteria which is described in the
next section.
4.2.2 Partial Reconstruction Filter Banks 
Fig.4.2 Partial reconstruction filter bank (input X(z); analysis filters H0(z) to HK-1(z), each followed by decimation by M; subbands 0 to J-1 are interpolated by L and filtered by F0(z) to FJ-1(z), then summed to form one wideband output; the remaining subbands up to K-1 are interpolated by N, filtered and summed to form a further output)
A partial reconstruction filter bank is shown in Fig.4.2. The structure shows the case
for reconstructing a wideband channel made up of J subband channels. Without loss of
generality, the J channels are assigned starting from channel 0. It can be easily seen in the
following that these channels may instead be any J consecutive channels out of the K
subbands and the derivation still holds.
Starting from the synthesis filter bank, the reconstructed output signal is given by
$$Y(z) = \frac{1}{J}\sum_{k=0}^{J-1} F_k(z)\,X_k(z^L) \qquad (4.6)$$
The system transfer relationship is then given by
(4.7)
For computational efficiency, the channel filters are made to be frequency translations
of a prototype, such that
(4.8a)
$$F_k(z) = F(zW_J^k) \qquad (4.8b)$$
where H(z), F(z) are the lowpass prototypes for the analysis and synthesis filters
respectively.
Using these expressions, the k th channel analysis filter output is
(4.9)
Substituting (4.9) into (4.6) gives the input-output relationship as
(4.10)
Equation (4.10) is the basic relationship required for the design of partial
reconstruction filter banks and it shows certain differences with the conventional filter bank theory.
Conditions for perfect reconstruction and alias cancellation similar to (4.4) and (4.5) for
the conventional filter bank case can now be established by an examination of (4.10).
First of all, it can be easily seen that for a computationally efficient structure, the
numbers of analysis and synthesis channels, K and J, and the decimation and
interpolation rates, M and L, must obey the relationship
$$\frac{L}{M} = \frac{J}{K} \qquad (4.11)$$
It follows that L = J for maximally decimated filter banks with M = K.
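Relationship (4.11), L/M = J/K, fixes the interpolation rate once the other three parameters are chosen. A minimal sketch, in which the requirement that L be an integer is an assumption made for the purpose of the example and the function name is illustrative:

```python
from fractions import Fraction

def interpolation_rate(K, M, J):
    """Interpolation rate L satisfying L/M = J/K, required here to be an integer."""
    L = Fraction(M * J, K)
    if L.denominator != 1:
        raise ValueError("M*J/K is not an integer for these parameters")
    return int(L)

# Maximally decimated case (M = K): L equals J.
print(interpolation_rate(K=8, M=8, J=3))
# A case with M < K.
print(interpolation_rate(K=16, M=8, J=4))
```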
Fig.4.3 Relationship between signal and filter in partial reconstruction filter bank (panels (a) to (e): input signal, analysis filters, synthesis filters F0 to FJ-1, output signal Y(f) and ideal response G(f); frequency axes from 0 to fs, with fs.L/M marked in panel (e))
To help identify the terms corresponding to the wanted signal component, the
figurative representations of the input signal, analysis and synthesis filters, and the output
signal frequency responses are shown in Fig.4.3a to 4.3d respectively. The output response
shows the ideal desired output signal from the system. This system in effect performs
sampling-rate conversion by a rational factor L/M. Consider now a system implementing
fractional sampling-rate conversion as shown in Fig.4.4, which is carried out by an initial
interpolation by L and subsequent decimation by M with a filter G(z) acting as the
combined interpolation and decimation filter. The input-output relationship of this system is
given by [106]
(4.12)
Fig.4.4 Fractional sampling-rate conversion filter (input X(z), interpolation by L, filter G(z), decimation by M)
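The structure of Fig.4.4 can be sketched directly in the time domain: insert L - 1 zeros between samples, filter with G, and keep every M th output. The filter used here is a Hamming-windowed sinc with gain L and cutoff π/max(L, M), which is a conventional choice for real signals and an assumption for this example rather than the ideal response defined in the text. The function names are illustrative.

```python
import numpy as np

def lowpass_prototype(n_taps, cutoff, gain):
    """Hamming-windowed sinc lowpass with the given cutoff (rad/sample) and DC gain."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = np.sinc(cutoff / np.pi * n) * np.hamming(n_taps)
    return gain * h / h.sum()

def rational_resample(x, L, M, g):
    """Interpolate by L, filter with g, decimate by M (direct, inefficient form)."""
    up = np.zeros(len(x) * L, dtype=x.dtype)
    up[::L] = x                      # zero insertion
    return np.convolve(up, g)[::M]   # filter, then keep every M-th sample

L, M = 2, 3
g = lowpass_prototype(61, np.pi / max(L, M), gain=L)
y = rational_resample(np.ones(200), L, M, g)
print(len(y), y[30:35])
```

A constant input should emerge as a constant of the same level away from the filter transients, which is the role of the gain L in the passband of G. An efficient implementation would avoid computing the discarded samples.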
The ideal desired output signal is then given by
(4.13)
with 0 ≤ ω ≤ 2π.
This corresponds to the terms for l = 0 in (4.12). The conditions which give this
ideal output can therefore be identified to be
$$G(e^{j(\omega - 2\pi l)/M}) = L \quad \text{for } l = 0, \qquad (4.14a)$$
and
$$G(e^{j(\omega - 2\pi l)/M}) = 0 \quad \text{for } l = 1, 2, \ldots, M-1. \qquad (4.14b)$$
These conditions can be satisfied by simply defining an ideal response for G(e^{jω}) as
$$G(e^{j\omega}) = \begin{cases} L, & |\omega| \le 2\pi/M \\ 0, & \text{otherwise.} \end{cases} \qquad (4.15)$$
Comparison of (4.12) with (4.10) shows the direct parallel between the terms in a
fractional sample-rate conversion system and the partial reconstruction filter bank. It
implies therefore that for the partial reconstruction filter bank to obtain the ideal output
shown in (4.13), one possible solution for the analysis and synthesis filters is to make
them satisfy the condition
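$$\sum_{k=0}^{J-1} F(zW_J^k)\,H(z^{L/M}W_M^l W_K^k) = G(z^{1/M}W_M^l), \qquad l = 0, 1, \ldots, M-1. \qquad (4.16)$$
Combining (4.15) and (4.16) then leads to the solution for the analysis and synthesis
filters as
$$\sum_{k=0}^{J-1} F(zW_J^k)\,H(z^{L/M}W_K^k) = \begin{cases} L, & 0 \le \omega \le 2\pi\frac{L}{M} \\ 0, & \text{otherwise.} \end{cases} \qquad (4.17)$$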
This ideal response is shown in Fig.4.3e. These illustrations serve to confirm the
operations performed in the analysis of the partial reconstruction filter bank. In fact it is
entirely possible to arrive at the result of (4.17) purely in this qualitative manner.
However, it will not be straightforward to consider the problem further qualitatively to
encompass such effects as non-ideal analysis and synthesis filters. This solution is also
only one particular solution amongst many. It is possible to use the general analysis of the
partial reconstruction problem to arrive at other solutions different from that of (4.17),
which are not possible from the qualitative consideration. In addition, certain assumptions
may be made in the filter design procedure to obtain approximate but computationally
attractive solutions. To this end, the previous theoretical analysis is used in the further
investigation into such filter design methods and is described in the next section.
4.2.3 Filter Design for Partial Reconstruction Filter Banks 
Elaborate filter design methods and processing structures have been devised for uni-
form maximally-decimated filter banks with perfect reconstruction properties [131]. 
Apart from the relatively high computation requirement of the structures. these design 
methods cannot be applied directly to the partial reconstruction filter bank because of 
their different set of conditions for aliasing cancellation and perfect reconstruction as dis-
cussed before. The same methods of aliasing cancellation and perfect reconstruction are 
therefore not available to the partial reconstruction case. Instead the perfect reconstruction 
condition is approximated by a particular method of filter design. 
The filter design procedure uses an approach that was applied to uniform SSB filter
banks [150]. The filter bank has real channel filter responses and hence the K subband
channels are located in the range $0 \le \omega \le \pi$ instead of $0 \le \omega \le 2\pi$ as in the complex DFT filter
banks. The analysis channel filters are frequency translated versions of a lowpass prototype
and are located at the channel centre frequencies
$$\omega_k = 2\pi\,(2k+1)\,\frac{1}{4K}$$
The filter coefficients of any channel k are then expressed in terms of the prototype
coefficients as [150]
$$h_k(n) = 2\cos\left[2\pi\,\frac{2k+1}{4K}\left(n - \frac{N_h-1}{2}\right) + (2k+1)\frac{\pi}{4}\right] h(n) \tag{4.18}$$
where h(n) is the prototype filter coefficient with $n = 0, 1, \ldots, N_h-1$, and $N_h$ is the
filter length.
The synthesis filters are similarly related by
$$f_k(n) = 2\cos\left[2\pi\,\frac{2k+1}{4J}\left(n - \frac{N_f-1}{2}\right) + (2k+1)\frac{\pi}{4}\right] f(n) \tag{4.19}$$
where f(n) is the length $N_f$ synthesis prototype filter coefficient, $n = 0, 1, \ldots, N_f-1$, and
J is the number of channels being reconstructed.
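The modulation in (4.18) can be illustrated numerically. The following is a minimal sketch only: the Hamming-windowed sinc prototype, the function names and the choice of K = 8 with a length 63 prototype are assumptions made for illustration and are not the prototype design used in this work.

```python
import math

def channel_filter(h, k, num_channels):
    """Coefficients of channel k obtained by cosine modulation of prototype h, as in (4.18)."""
    n_h = len(h)
    w_k = 2.0 * math.pi * (2 * k + 1) / (4.0 * num_channels)  # channel centre frequency
    phase = (2 * k + 1) * math.pi / 4.0
    return [2.0 * math.cos(w_k * (n - (n_h - 1) / 2.0) + phase) * h[n]
            for n in range(n_h)]

def magnitude(coeffs, w):
    """Magnitude of the frequency response at angular frequency w."""
    re = sum(c * math.cos(w * n) for n, c in enumerate(coeffs))
    im = -sum(c * math.sin(w * n) for n, c in enumerate(coeffs))
    return math.hypot(re, im)

def lowpass_prototype(length, cutoff):
    """Hamming-windowed sinc lowpass (a stand-in prototype, assumed for illustration)."""
    mid = (length - 1) / 2.0
    taps = []
    for n in range(length):
        t = n - mid
        ideal = cutoff / math.pi if t == 0 else math.sin(cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        taps.append(ideal * window)
    return taps

K = 8                                            # number of analysis channels (assumed)
h = lowpass_prototype(63, math.pi / (2 * K))     # passband half-width of pi/2K
h3 = channel_filter(h, 3, K)                     # channel k = 3
centre = 2.0 * math.pi * (2 * 3 + 1) / (4 * K)   # its centre frequency, 7*pi/16
print(magnitude(h3, centre), magnitude(h3, 13 * math.pi / 16))
```

The first printed value is the gain at the centre of channel 3 and the second is the gain at the centre of channel 6, which lies in the stopband of channel 3.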
Equation (4.19) shows a different synthesis prototype {f(n)} from the analysis
prototype {h(n)}. They are identical if J = K, which is the case in the conventional filter
banks, but not so in partial reconstruction filter banks. Instead, the numbers of channels J
and K, and the decimation and interpolation rates M and L, are related by (4.11) as before.
To enable near perfect reconstruction and alias cancellation, the prototypes must be related
by
$$F(z) = H(z^{L/M}) \tag{4.20}$$
The reasons for defining this relationship will become clear in subsequent discussion.
Given this relationship, the analysis and synthesis channel filters may be expressed in
terms of a common prototype in the z-domain by, respectively,
$$H_k(z) = e^{j\phi_k} H(zW_K^{(2k+1)/4}) + e^{-j\phi_k} H(zW_K^{-(2k+1)/4}) \tag{4.21a}$$
$$F_k(z) = e^{j\theta_k} H(z^{L/M}W_K^{(2k+1)/4}) + e^{-j\theta_k} H(z^{L/M}W_K^{-(2k+1)/4}) \tag{4.21b}$$
where $\phi_k$ and $\theta_k$ are arbitrary phase shifts designed for aliasing cancellation and are
given by
$$\phi_k = (K - N_h + 1)\,(2k+1)\,\frac{\pi}{4K} \tag{4.22a}$$
(4.22b)
An important assumption on the prototype filter is that the resulting channel filters
overlap only into the immediately adjacent channels. For example, $H_k(z)$ overlaps only
with $H_{k-1}(z)$ and $H_{k+1}(z)$. Theoretically this is unachievable since it implies infinite
stopband attenuation, but in practice it can be closely approximated by a reasonably high
stopband attenuation such that the amount of overlap into non-adjacent channels is
negligible.
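Since the SSB filter bank specific to the filter design considered here is different from
the DFT filter bank considered for (4.17), it is necessary to redefine G(z) for the SSB filter
bank as
$$G_l(z) = \sum_{k=0}^{J-1} F_k(z)\,H_k(z^{L/M}W_K^{l}) \tag{4.23}$$
The aliasing components are then found by application of the particular filter
characteristics defined by (4.20) through (4.22) into (4.23) for non-zero values of l. Expressing
$G_l(z)$ in terms of the prototype filter H(z) gives
$$\begin{aligned}
G_l(z) = \sum_{k=0}^{J-1}\Big[\; & e^{j(\phi_k+\theta_k)}\,H(z^{L/M}W_K^{(4l+2k+1)/4})\,H(z^{L/M}W_K^{(2k+1)/4}) \\
 +\; & e^{j(\phi_k-\theta_k)}\,H(z^{L/M}W_K^{(4l+2k+1)/4})\,H(z^{L/M}W_K^{-(2k+1)/4}) \\
 +\; & e^{-j(\phi_k-\theta_k)}\,H(z^{L/M}W_K^{(4l-2k-1)/4})\,H(z^{L/M}W_K^{(2k+1)/4}) \\
 +\; & e^{-j(\phi_k+\theta_k)}\,H(z^{L/M}W_K^{(4l-2k-1)/4})\,H(z^{L/M}W_K^{-(2k+1)/4}) \;\Big]
\end{aligned} \tag{4.24}$$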
Since only adjacent channel filters overlap, the non-zero terms in the above expression
are only due to the overlap of the positive frequency component of the channel filter
with the image of its negative frequency component, and of the negative frequency
component with the image of the positive. This is illustrated in Fig.4.5. The separation
between the positive and negative frequency components is $(2k+1)\pi/K$ and
$2\pi - (2k+1)\pi/K$, as shown in the diagram. Since adjacent channels are spaced by $\pi/K$, it
follows that the overlap occurring to channel k is due to the components at the values of
l such that
$$\frac{2\pi l}{K} = (2k+1)\frac{\pi}{K} \pm \frac{\pi}{K} \tag{4.25a}$$
and
$$\frac{2\pi l}{K} = 2\pi - (2k+1)\frac{\pi}{K} \pm \frac{\pi}{K} \tag{4.25b}$$
Fig.4.5 Aliasing effects in SSB filter bank (filter response showing the positive and negative
frequency components, their images, and the main overlapping regions)
This gives the four possibilities of l = k, k+1, K-k, and K-k-1. Equivalently, the
only non-zero terms in (4.24) are those corresponding to k = l, l-1, K-l and K-l-1.
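The arithmetic behind these four values can be checked with a few lines of code. This is a small sketch, with a hypothetical function name, that solves (4.25a) and (4.25b) for l after multiplying both sides by $K/\pi$.

```python
def overlapping_alias_indices(k, num_channels):
    """Alias indices l whose images fall next to channel k, from (4.25a) and (4.25b)."""
    odd = 2 * k + 1
    # (4.25a): 2*pi*l/K = (2k+1)*pi/K +/- pi/K   ->   2l = (2k+1) +/- 1
    from_a = [(odd + s) // 2 for s in (-1, 1)]
    # (4.25b): 2*pi*l/K = 2*pi - (2k+1)*pi/K +/- pi/K   ->   2l = 2K - (2k+1) +/- 1
    from_b = [(2 * num_channels - odd + s) // 2 for s in (-1, 1)]
    return sorted(set(from_a + from_b))

indices = overlapping_alias_indices(k=2, num_channels=8)
print(indices)  # [2, 3, 5, 6], i.e. k, k+1, K-k-1 and K-k for k = 2, K = 8
```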
Substitution of these four values of k into (4.24) leads to further simplification of
the expression for $G_l(z)$. This is because some of the terms inside the summation of (4.24)
become zero for these values of k. Since it has been assumed that non-adjacent channels
do not overlap, the terms $H(z^{L/M}W_K^{\alpha})\,H(z^{L/M}W_K^{\beta})$ where the difference between $\alpha$ and
$\beta$ is greater than or equal to 1 are in fact zero. The result is that for each value of k
above there is only one term in the summation that is non-zero. Equation (4.24) therefore
simplifies to
$$\begin{aligned}
G_l(z) = {} & \left[e^{j(\phi_{K-l}-\theta_{K-l})} + e^{j(\phi_{K-l-1}-\theta_{K-l-1})}\right] H(-z^{L/M}W_K^{(2l+1)/4})\,H(-z^{L/M}W_K^{(2l-1)/4}) \\
 & + \left[e^{-j(\phi_l-\theta_l)} + e^{-j(\phi_{l-1}-\theta_{l-1})}\right] H(z^{L/M}W_K^{(2l+1)/4})\,H(z^{L/M}W_K^{(2l-1)/4})
\end{aligned} \tag{4.26}$$
The last step in the elimination of aliasing component is to force all the terms in 
(4.26) to be zero. The technique is to make the exponential terms in the brackets cancel 
each other. Considering that 
-I-Nt +1
I ) (4.27)
If now an additional condition is defined such that
(4.28)
then (4.27) reduces to
(4.29)
Substitution of (4.29) into (4.26) eliminates all the terms and alias-cancellation is
achieved. Hence for partial reconstruction filter banks the analysis and synthesis prototypes
must be designed to satisfy (4.20) and (4.28) together. At first sight the condition of
(4.28) appears to be undesirably rigid as it is a strict relationship between the filter
lengths and the numbers of analysis and synthesis channels. However, by virtue of (4.20)
it is a relationship easily satisfied. More interestingly, this condition can be bypassed by
noting from (4.18) and (4.19) that it is in fact not a condition upon the actual lengths of
the filters but upon the relative phase of the channel filters. One of the two prototypes H(z)
and F(z) can therefore be appended with zeros to a length which satisfies (4.28). In practice
these extra zeros are only conceptual, in order to introduce the necessary phase shifts
to the channel signals without changing the channel filter coefficients in (4.18) or (4.19).
Therefore they do not change the lengths of the resultant channel filters, which can in turn
be arbitrary. This will be shown in more detail in the design examples in section 4.3.
The other important property for filter banks is the reconstruction of the wanted signal,
which depends on the satisfaction of (4.14a). Substitution of (4.21a) and (4.21b) into
(4.23) for l = 0 gives
$$G_0(z) = z^{-(L-1)} \sum_{k=0}^{J-1} \left[ H^2(z^{L/M}W_K^{(2k+1)/4}) + H^2(z^{L/M}W_K^{-(2k+1)/4}) \right] \tag{4.30}$$
This is similar to the 'power complementary' conditions required for perfect
reconstruction in conventional filter banks [132]. In such cases it is sufficient to satisfy the
following constraint for perfect reconstruction to hold.
(4.31)
However, consideration of (4.30) with this constraint shows that perfect
reconstruction is not achieved in the partial reconstruction filter bank except for the
unrealizable case of
$$|H(e^{j\omega})| = \begin{cases} 1, & |\omega| \le \dfrac{\pi}{2K} \\ 0, & \text{otherwise.} \end{cases}$$
Fig.4.6 Uncancelled aliasing in partial reconstruction filter bank (sampling rate conversion by
filter bank, showing the region of uncancelled aliasing)
For all practical filters, in order for (4.30) to reduce to a pure delay $z^{-(L-1)}$ and thus
perfect reconstruction, (4.31) requires overlap between $H^2(z^{L/M}W_K^{(2k+1)/4})$ and
$H^2(z^{L/M}W_K^{-(2k+1)/4})$ for k = J-1. As illustrated in Fig.4.6, this is not the case with
partial reconstruction. The result is that instead of signal reconstruction, aliasing is caused
by the overlapping of the signal in the Jth channel into the (J-1)th channel. This form
of aliasing is not cancelled by the previous procedure because it is constituted by the terms
with l = 0, and the previous aliasing-cancellation technique only eliminates the terms with
non-zero values of l. Using filter prototypes with reasonably small transition bandwidths,
this aliasing component can be made very small. From the multirate filtering point of
view, this is exactly equivalent to the aliasing due to the transition band of the filter in
sampling-rate conversion with rational factors expressed in (4.12). As with all multirate
filtering systems, this source of aliasing is normally negligible because the signal
magnitude in the transition region is designed to be very small. Hence prototype filters for
the partial reconstruction filter bank are designed to satisfy the constraint of (4.31). This
is described in the next section.
4.3 Filter Design and Implementation Techniques for Partial Reconstruction Filter Banks
A simple and efficient method for the design of prototype filters for partial recon-
struction filter banks is described. The design approach approximates the constraints dis-
cussed previously while achieving high computation efficiency. Design examples are shown 
to confirm its properties of near-perfect reconstruction and aliasing cancellation. Computa-
tion reduction techniques for implementation are then examined and efficient processing 
structures are arrived at. 
4.3.1 Filter Design 
There are two approximation problems in the design of the prototype filters. Firstly,
the analysis prototype H(z) is designed to satisfy the power-complementary constraint of
(4.31). Secondly, the synthesis prototype F(z) is derived from the analysis prototype
subject to the conditions for filter lengths and numbers of channels in (4.28) and
frequency responses in (4.20).
4.3.1.1 Design of Analysis Prototypes 
A number of different approaches are available for designing power-complementary
filters. The special case for the two-channel filter bank can be designed by spectral
factorization of halfband filters [148]. For more than two channels the method can be
generalized to the spectral factorization of Mth-band filters. It was shown [132], however, that
this method produces filters with poor stopband attenuation characteristics, or else aliasing
cancellation is not achieved. Since the special near-perfect reconstruction method discussed in the
previous section only approximates aliasing cancellation by assuming no overlap between
non-adjacent filters, the method of spectral factorization of Mth-band filters can actually
be applied, because exact aliasing cancellation can be sacrificed in order to achieve good
stopband characteristics. The factor against the use of spectral factorization is rather the
non-linear phase characteristics of the resultant filter. Exact linear phase cannot be
obtained, and approximate linear phase is only obtained by arbitrary choices of the
z-domain zero pairs for the spectral factors, and even then the group-delay variation is
considerable [148]. This approach is therefore unsuitable for the variable bandwidth TMUX.
Another approach [150] is to use an optimization procedure applied to filters designed
using windows such as the Hanning window. The filters so designed are not optimal in the
Chebyshev sense and further improvement in efficiency is possible [115].
A reasonable tradeoff between computation efficiency and performance is the primary
concern in the design method described here. It is based on the observation that, as
opposed to conventional filter banks for which the previous filter design methods are
devised, the partial reconstruction filter bank applied to the variable bandwidth TMUX is
not required to carry out signal reconstruction at all times. It can be assumed that most of
the time there will be more narrow band channels, which are the outputs from the
analysis filter banks, than wide band channels requiring reconstruction. Furthermore, if
this is not the case the partial reconstruction filter bank will not be an efficient structure
for the variable bandwidth TMUX, as a good deal of processing is spent in separating
channels which are only recombined later. This intuitive argument is examined in a more
quantitative manner in the next section. The objective in the filter design is therefore to obtain a
near optimal filter in the Chebyshev sense while approximating the filter bank constraints
for near perfect reconstruction. This minimizes the amount of computation in the analysis
filter bank while providing reasonable reconstruction properties.
The filter design approach is based either on the Remez exchange algorithm [151] or
the linear programming method [140] in order to achieve the objective of obtaining a near
optimal solution in the Chebyshev sense. The linear programming method was implemented
for its ease of imposing constraints and maintaining control of the iterative process.
The technique to incorporate the power-complementary constraint (4.31) into the
design algorithm is then to include additional frequency domain constraints which
approximate the power-complementary behaviour in the frequency domain. The passband and
stopband regions of an optimal filter are equiripple and thus, apart from the error due to the
passband and stopband ripple, readily approximate the power-complementary requirement.
The region where additional constraints are enforced is the transition band. Obviously
there is an infinite number of transition band characteristics that satisfy the
power-complementary condition and it is impossible to determine what constraints should be
enforced at these frequencies. Also there is a direct relationship between the number of
constraints and the optimality of the resulting filter, in that the greater the number of
constraints, the more unequiripple the 'optimal' filter becomes. This is due to the
mathematical basis for optimal filter design in which a linear phase filter of length N (N
odd) has a maximum of (N-1)/2 frequency constraints satisfied with equality. These
constraints correspond to the ripple maxima and minima in the optimal filters. Hence the price
to be paid for having additional constraints is that some of the original maxima and
minima are now satisfied with inequality, and the general result is an increase in the ripple
magnitude together with the ripples being uneven in magnitude.
It is therefore desired to keep the number of additional constraints to a minimum,
firstly because the optimum choice of the additional constraints is unknown and secondly
to preserve the equiripple characteristics of the optimal filter. The only constraint enforced
is then at the centre frequency of the transition band where the filter response is
determined by
(4.32)
Although this appears to be a simple condition on the transition band behaviour, it
has proved to be very effective, and filters designed using this technique show good
stopband attenuation and near-perfect reconstruction. This is demonstrated in the design
examples together with the synthesis filter bank in section 4.3.1.3.
4.3.1.2 Design of Synthesis Prototypes 
Given an analysis prototype, the problem is then to derive the synthesis prototypes
for any number of reconstruction channels J while satisfying the constraints (4.20) and
(4.28). Equation (4.20) leads directly to the time domain relationship
$$f(n) = h\!\left(n\frac{M}{L}\right) = h\!\left(n\frac{K}{J}\right), \qquad n = 0, 1, \ldots, N_f-1. \tag{4.33}$$
This means that the derivation of F(z) is trivial for integer values of K/J, where
the coefficients of {f(n)} are just the decimated versions of {h(n)}. On the other hand,
when K/J is not an integer, (4.33) is meaningless and the {f(n)} coefficients need to be
calculated. There are two ways in which this may be done. Firstly, it can be seen that the
operation required is equivalent to carrying out a fractional sampling rate change on the
impulse response {h(n)}, and hence {f(n)} may be so obtained using the normal
procedures [106]. The problem associated with this method is that the length of the resultant
impulse response depends also on the interpolation-decimation filter used in the sampling
rate conversion process. The closer this filter is to the ideal expressed in (4.15), the longer
the impulse response of {f(n)} becomes. It is then necessary to truncate it to a reasonable
length using windowing such as the Kaiser window, which inevitably causes deviation of
the response from the original.
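For the integer K/J case, (4.33) amounts to keeping every (K/J)th coefficient of the analysis prototype. The following is a minimal sketch under that assumption; the function name is hypothetical, and any gain scaling or subsequent windowing of the result is deliberately left out.

```python
def synthesis_prototype_by_decimation(h, num_analysis, num_synthesis):
    """Return f(n) = h(n*K/J) as in (4.33); only valid when K/J is an integer."""
    if num_analysis % num_synthesis != 0:
        # Non-integer K/J needs fractional rate conversion or a separate design instead.
        raise ValueError("K/J is not an integer; (4.33) cannot be applied directly")
    step = num_analysis // num_synthesis
    return list(h[::step])

# Toy coefficients (not a designed filter), K = 8 analysis channels, J = 4 reconstructed.
h = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
f = synthesis_prototype_by_decimation(h, num_analysis=8, num_synthesis=4)
print(f)  # [0.0, 2.0, 4.0, 2.0, 0.0]
```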
The second method is to design the synthesis prototype individually using frequency
constraints derived from H(z) and (4.20) and applying them to the algorithm used to
design H(z). This technique has the advantage of being able to directly specify the filter
length, but suffers from the fact that the transition band is not well constrained and the
error in the transition band with respect to (4.20) is not controlled by the design
algorithm. It is, however, the transition band on which (4.20) needs to be strictly imposed,
because the error in the passband and stopband is only of the order of magnitude of the
ripple, which is already designed to be very small for TMUX application. It is therefore
likely to be inferior to the first method in accurately determining {f(n)}. This conjecture
was indeed confirmed by numerical examples, and {f(n)} was generally calculated using
sampling rate conversion followed by windowing of the analysis filter impulse response.
The second constraint (4.28) is related to the filter lengths and the numbers of
channels. It can simply be satisfied by determining the size of the window according to (4.28).
This was found to be a reasonable method for the many different specifications attempted. If it is
unacceptable to limit the filter length in this way, another simple method is to append
zeros to both ends of the impulse response of either the analysis or synthesis prototype so that the
condition (4.28) is met. Appending zeros in this way does not make any actual changes to
the coefficients of the channel filters, as can be seen from (4.18) and (4.19). The operation
in fact introduces delay into the filter bank and, because the analysis-synthesis filter banks
are time-varying operations, their phase responses are altered by the time delay and may
thus be varied to meet the constraints. However, it does not offer completely arbitrary
choices of the filter lengths, because in the case with more than one reconstruction channel
the length of the analysis prototype must satisfy more than one of the (4.28) type
constraints, and while there is complete freedom in the choice of one of the synthesis
prototypes, the others are then fixed by the length of the analysis prototype so chosen. In most
practical cases this is not a problem and a combination of the two techniques normally
provides the appropriate compromise.
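The statement that appending zeros leaves the channel filter coefficients unchanged can be checked directly from (4.18). The sketch below assumes the same number of zeros is appended at each end, so that the centre of the prototype moves by exactly that number of samples; the toy coefficients and function names are illustrative only.

```python
import math

def channel_filter(h, k, num_channels):
    """Cosine-modulated channel filter of (4.18) for a prototype h of any length."""
    n_h = len(h)
    w_k = 2.0 * math.pi * (2 * k + 1) / (4.0 * num_channels)
    phase = (2 * k + 1) * math.pi / 4.0
    return [2.0 * math.cos(w_k * (n - (n_h - 1) / 2.0) + phase) * h[n]
            for n in range(n_h)]

def pad_both_ends(h, zeros_each_side):
    """Append the same number of zeros to both ends of the impulse response."""
    return [0.0] * zeros_each_side + list(h) + [0.0] * zeros_each_side

h = [0.1, 0.3, 0.5, 0.3, 0.1]        # toy symmetric prototype (assumed)
K, k, pad = 4, 1, 3
original = channel_filter(h, k, K)
padded = channel_filter(pad_both_ends(h, pad), k, K)
inner = padded[pad:pad + len(h)]     # non-zero part of the padded channel filter

# The padded channel filter is the original one delayed by `pad` samples.
print(max(abs(a - b) for a, b in zip(original, inner)))
```

The printed difference is at rounding-error level, because the shift in n and the shift in the centre term $(N_h-1)/2$ cancel inside the cosine argument.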
4.3.1.3 Design Examples 
Two different specifications of analysis filter bank are designed. In each of these cases
two different reconstruction filter banks for two different wideband channel bandwidths
are designed. The analysis prototype characteristics, reconstruction and
aliasing errors are examined to demonstrate the merits of the design approach.
Fig.4.7a to 4.7e show a design example for a length 63 analysis filter which may be
used in a filter bank of 8 channels. Fig.4.7a shows the frequency response of the analysis
prototype. Fig.4.7b and 4.7c show the frequency responses of the reconstruction filter bank
and the most dominant aliasing component, corresponding to l = J-1 in (4.23), for a
wideband channel reconstructed from channels 1 and 2. Fig.4.7d and 4.7e show the
reconstruction and aliasing responses for reconstructing the different channels of 1 to 3. It can
be seen that the number of reconstruction channels does not make any significant
difference to the stopband attenuation and transition band characteristics.
The second design example is shown in Fig.4.8a to c. The prototype is of length 99
for a filter bank of 12 channels. Fig.4.8a shows its frequency response, Fig.4.8b shows the
reconstruction filter bank response for reconstructing channels 1 to 5, and Fig.4.8c is the
most dominant aliasing component.
Fig.4.7a Frequency response of analysis prototype (magnitude against normalized frequency)
Fig.4.7b Frequency response for reconstruction of two channels
Fig.4.7c Frequency response of aliasing component
Fig.4.7d Frequency response for reconstruction of three channels
Fig.4.7e Frequency response of aliasing component
Fig.4.8a Frequency response of analysis prototype (magnitude against normalized frequency)
Fig.4.8b Frequency response for reconstruction of five channels
Fig.4.8c Frequency response of aliasing component
These examples show that the design method is capable of producing filters of good
stopband attenuation characteristics together with low aliasing component and good
reconstruction response. The dominant aliasing components in Fig.4.7c, e and 4.8c show
distinctly the overlapping region between the (J-1)th and Jth channel filters which is not
cancelled, as discussed previously. They also show another difference from conventional
filter banks, in which the aliasing components are approximately uniform across the
frequency spectrum. In the partial reconstruction filter banks here, it can be seen that the
aliasing noise contributed by reconstructed channels is much more significant than that from
channels not processed by the synthesis filter bank, as may be expected. The passband
deviation of the reconstruction responses is of the same order of magnitude as the passband
ripple of the prototype, and hence may be deemed acceptable. Passband ripple and stopband
attenuation can of course be improved by increasing the prototype filter length.
4.3.2 Implementation Techniques 
The design of computationally efficient structures is based on the channel filters being
frequency translations of a prototype, and on all the filters and signals being real. The
filter bank is implemented using the Discrete Cosine Transform (DCT) in a polyphase
block processing structure. The DCT is in turn modified to allow it to be evaluated using
the DFT. This modified DFT filter bank is then further optimized by implementing the
polyphase filter branches in the frequency domain, resulting in a two-dimensional DFT
that may be evaluated via the polynomial transform. This complex series of modifications
results in high computational efficiency, although the structural complexity is considerably
increased.
4.3.2.1 DCT Implementation Via DFT
The DCT has been proven to be an efficient technique in SSB filter banks where the
channel filter frequency responses are real [87]. The derivation for the polyphase-DCT filter
bank for the particular set of analysis filters given by (4.18) is detailed in Appendix 4A. It
is seen that the output signal for the specific set of analysis filters defined in (4.18) can be
expressed in terms of the polyphase decomposition of the prototype filter as
$$X_k(n) = \sum_{p=0}^{K-1} C_K(k,p)\,p_p(n), \qquad k = 0, 1, \ldots, K-1. \tag{4.34}$$
where $C_K(k,p) = 2\cos[(2k+1)(2p+1)\pi/4K]$, and $p_p(n)$ is the polyphase branch filter
output as defined in the Appendix.
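As a point of reference for the fast structures that follow, (4.34) can be evaluated directly in the order of $K^2$ operations. The sketch below does only that; the branch output values are arbitrary test numbers and the function names are assumptions. It also uses the known property that this cosine kernel, applied twice, returns the input scaled by 2K.

```python
import math

def dct_kernel(k, p, num_channels):
    """C_K(k, p) = 2 cos[(2k+1)(2p+1) pi / 4K] from (4.34)."""
    return 2.0 * math.cos((2 * k + 1) * (2 * p + 1) * math.pi / (4.0 * num_channels))

def combine_branches(branch_outputs):
    """Direct evaluation of (4.34) for one time instant n."""
    K = len(branch_outputs)
    return [sum(dct_kernel(k, p, K) * branch_outputs[p] for p in range(K))
            for k in range(K)]

branch = [1.0, -2.0, 0.5, 3.0]        # p_p(n) for p = 0..K-1 at one instant (arbitrary)
channels = combine_branches(branch)   # X_k(n) for k = 0..K-1

# Applying the same kernel again and dividing by 2K recovers the branch outputs.
round_trip = [v / (2 * len(branch)) for v in combine_branches(channels)]
print(round_trip)
```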
To implement this DCT using standard DFT algorithms, the
polyphase output sequence is modified to produce another sequence as [151]
$$q_p(n) = p_p(n) + j\,p_{K-p}(n) \tag{4.35}$$
Defining an odd-squared DFT of this new sequence as
X-l 
x" (n) = 1: q p(n) W ~+l)(2p+1) (4.36) 
p=o 
Strictly speaking, the exponent of (4k+1) means that it does not conform to the standard
odd-squared DFT, but to a generalized DFT with a frequency origin of 1/4 [106]. This does
not make any difference in terms of implementation because the generalized DFT can be
implemented with any arbitrary time and frequency origins. The choice of this frequency
origin is to facilitate the use of a polynomial transform in a subsequent modification.
Substituting (4.35) into (4.36) gives 
X-l X-l 
x" (n ) = 1: p pen )W ~ +1)(2p+1) + j 1: p p(n)W Jf +l)(2p+l)W if (4.37) 
p=o p=o 
which simplifies to 
X-l 
- 1: Cx (2k .p) p p(n) (4.38) 
p=o 
Considering (4.34) and (4.38) for even and odd values of k shows that 
(4.39a) 
(4.39b) 
with k' = 0.1, ...• ~ -l. 
Hence the odd-squared DCT can be evaluated using an odd-squared DFT with the
necessary modifications to the input sequence and ordering of the output sequence. This
series of operations is represented in the block diagram shown in Fig.4.9.
Fig.4.9 DCT implementation using DFT
4.3.2.2 Implementation of 2-D DFT by Polynomial Transform 
The final step in optimizing the filter bank structure is to use the polynomial
transform approach for efficient implementation of a two-dimensional DFT [153].
Firstly, the polyphase branches are implemented using DFT convolution such as the
overlap-save method. There are a number of operations required on the output from these
polyphase branches, as described in the Appendix, but they may be carried out in the
frequency domain because of the linearity property of the DFT. Since the outputs from each
polyphase branch now occur in blocks of samples, instead of one odd-squared DFT
as before the structure now has a number of odd-squared DFT's, in effect operating
on input samples corresponding to different time instants. This results in the
structure shown in Fig.4.10. It can be seen that the IDFT's for the convolution of the polyphase
branches together with the odd-squared DFT's form a two-dimensional DFT which is to be
implemented using the polynomial transform. It is also assumed for the figure that the
appropriate overlapping necessary for DFT convolution is carried out, and hence it is not shown
explicitly.
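The overlap-save method referred to above can be sketched for a single polyphase branch as follows. This is an illustrative sketch only: it uses a plain $O(N^2)$ DFT written out in full rather than a fast algorithm, and the block size, taps and signal are arbitrary assumed values.

```python
import cmath

def dft(x, inverse=False):
    """Plain DFT / IDFT (quadratic cost), used here only for clarity."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def overlap_save(signal, taps, block_size):
    """FIR filtering by DFT convolution with the overlap-save method."""
    overlap = len(taps) - 1
    hop = block_size - overlap            # new input samples consumed per block
    if hop <= 0:
        raise ValueError("block_size must exceed len(taps) - 1")
    taps_spectrum = dft(list(taps) + [0.0] * (block_size - len(taps)))
    history = [0.0] * overlap             # samples carried over from the previous block
    output = []
    for start in range(0, len(signal), hop):
        new = list(signal[start:start + hop])
        new += [0.0] * (hop - len(new))   # zero-fill the final partial block
        block = history + new
        product = [a * b for a, b in zip(dft(block), taps_spectrum)]
        circular = dft(product, inverse=True)
        # The first `overlap` outputs are corrupted by circular wrap-around; discard them.
        output.extend(v.real for v in circular[overlap:])
        history = block[hop:]
    return output[:len(signal)]

def direct_fir(signal, taps):
    """Reference time-domain convolution, truncated to the input length."""
    return [sum(taps[i] * signal[n - i] for i in range(len(taps)) if n - i >= 0)
            for n in range(len(signal))]

signal = [float((3 * n) % 7 - 3) for n in range(20)]
taps = [0.5, 0.3, 0.2]
fast = overlap_save(signal, taps, block_size=8)
slow = direct_fir(signal, taps)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

In the filter bank the forward DFT of each input block is shared and the inverse DFT is merged with the odd-squared DFT, which is where the two-dimensional transform arises; the sketch shows only the convolution step itself.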
Fig.4.10 Block implementation of polyphase filters and odd-squared DFT (input sample
commutator, K DFT's, frequency domain multiplication, K M-point IDFT's, M odd-squared DFT's, output)
The theoretical derivation of the polynomial transform for the two-dimensional DFT
is given in Appendix 4B. As shown in the derivation, the polynomial transform requires
both the sizes of the IDFT and the odd-squared DFT to be a power of 2, and also that
4N ~ M where M and N are the respective sizes. For practical cases of interest, these
conditions are easily satisfied, as in the examples shown in the next section.
The operation required in the polynomial transform is, as shown in the Appendix,
approximately of the order of M DFT's of size N. The original structure has N IDFT's of
size M followed by M DFT's of size N. Hence the advantage in computational efficiency is
clear. The actual computation requirements for different system specifications are examined
in the next section. The same optimization techniques are equally applied to the synthesis
filter bank, resulting in a similar gain in efficiency. This is also considered in the different
cases in the next section.
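The size of this saving can be indicated with a rough count. The sketch below assumes the nominal radix-2 figure of (n/2) log2(n) complex multiplications per DFT of size n, ignores trivial twiddle factors and the additions required by the polynomial transform itself, and uses M = 32 and N = 64 purely as illustrative sizes; none of these figures is taken from the operation counts given in the next section.

```python
def radix2_complex_mults(size):
    """Nominal complex multiplications of a radix-2 FFT: (size/2) * log2(size)."""
    stages = size.bit_length() - 1
    if size < 2 or 2 ** stages != size:
        raise ValueError("size must be a power of 2")
    return (size // 2) * stages

def row_column_mults(idft_size, odd_dft_size):
    """Original structure: N IDFT's of size M followed by M DFT's of size N."""
    return (odd_dft_size * radix2_complex_mults(idft_size)
            + idft_size * radix2_complex_mults(odd_dft_size))

def polynomial_transform_mults(idft_size, odd_dft_size):
    """Polynomial transform structure: approximately M DFT's of size N."""
    return idft_size * radix2_complex_mults(odd_dft_size)

M, N = 32, 64   # illustrative sizes only
print(row_column_mults(M, N), polynomial_transform_mults(M, N))  # 11264 6144
```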
4.4 Computation Requirements 
The computational requirements of the processing structures depend on many system
parameters such as the number of reconstruction channels and their bandwidths. It is
therefore impossible to give general assessments covering all variations in these parameters.
However, once the number of narrow-band channels and the analysis filter specification are
defined, it is possible to evaluate the computation requirement systematically. Two design
examples resembling practical sets of parameters are examined, which give an idea of the
operations required.
Tables 4.1 and 4.2 show the operation counts for the two examples. The values in the
tables are the numbers of operations required per input sample. The structure performs the
polyphase branch filtering using DFT convolution, and the subsequent DCT using the
polynomial transform as described previously. All DFT's are performed by radix-2 algorithms.
The lengths of the synthesis prototypes are determined by (4.28), and the synthesis filter
banks employ the same techniques of polynomial transforms and DFT convolutions. Also
by the same virtue the size of the DFT for the synthesis polyphase branch is made
identical to that of the analysis. The arithmetic operations are classified into 'system overhead',
which refers to the analysis filter bank, and the synthesis channels reconstructed from the
numbers of narrow-band channels shown as 'channel bandwidth' in the tables. The
values are for real multiplications and additions.
Table 4.1 shows the system for 16 channels with an analysis prototype of length
129. The polyphase branches are evaluated using DFT's of size 16 and an overlap of 9 in
the overlap-save scheme. In reconstructing numbers of channels that are not powers of 2,
non-radix-2 DFT's must be used in the calculations of the polynomial transform. These are broken
down where possible into smaller DFT's using the Cooley-Tukey indexing scheme described
in the previous chapter.
It can be seen that the operation counts where a non-radix-2 DFT is involved become
considerably higher because of the inefficiency of DFT's of small prime sizes. These may of
course be improved using other fast algorithms. The theoretical limit of operations due to
reconstruction is equal to the system overhead, when the number of reconstruction
channels equals the number of analysis channels and the synthesis filter bank is the mirror image of the
analysis filter bank.
Channel Bandwidth    Multiplication    Addition
System Overhead      38.9              55.1
2                    2.6               3.5
3                    5.1               4.7
5                    10                8.3
8                    14.9              16.0
12                   29.1              27.2
15                   38.6              35.9
Table 4.1 Operation counts for 16-channel example
Table 4.2 shows the operation count for the case of 64 channels and with a prototype
filter of length 641. The size of DFT for the polyphase branches is 32 with an overlap of
11 samples in the overlap-save convolution.
Channel Bandwidth    Multiplication    Addition
System Overhead      35.0              54.6
2                    0.52              0.78
3                    1.0               0.87
8                    3.2               4.7
16                   7.2               10.5
32                   16.0              22.6
48                   26.3              28
Table 4.2 Operation counts for 64-channel example
A comparison between the two tables shows firstly that the system overhead is
approximately the same. This is in accordance with the feature of the FFT type filter bank
that the computation requirement varies linearly with the number of channels. The
operation counts for reconstruction are generally lower in the 64-channel case than in the
16-channel case. This is due to the greater efficiency in the polyphase branch convolution
with the larger DFT size and the greater input block size.
The orders of magnitude in both of these tables are within reach of practical device
capabilities. They also show that the computation requirement for reconstruction is
approximately proportional to the wideband channel bandwidth. Hence efficiency is achieved
irrespective of the constituents of reconstruction channels such that, for instance, a
wideband channel reconstructed from 8 analysis channels requires similar computation to 4
separate channels each reconstructed from 2 analysis channels.
4.5 Summary 
A very efficient and flexible method for the implementation of variable bandwidth 
TMUX was described. The issues in partial reconstruction filter banks were examined to 
find the conditions for perfect reconstruction and alias cancellation. A particular form of 
filter banks was chosen for its efficiency. The filter design criteria were studied and a sim-
ple and effective design approach offered. This was demonstrated by design examples to 
show its properties and was found to be effective in the tradeoff between computation 
efficiency and performance. 
Further improvements in efficiency were carried out from a structural approach. The 
filter bank was modified to allow the use of polynomial transforms, which achieve high 
efficiency through block processing techniques. The actual computational requirement was 
then examined and shown to be of realistic orders for many practical system specifications. 
It was shown that in many cases efficiency could be improved using a combination of 
processing structures so that a best compromise was used for a particular system specification. 
As with the approach described in Chapter 3, however, the actual computation requirement 
is highly dependent on the system parameters. This approach is then probably most 
attractive when the frequency multiplex consists of a large number of narrow-band channels. 
Together with the methods described in the previous Chapters, it provides a selection 
of very efficient means of variable bandwidth filtering. The overall choices and 
combinations between these techniques to achieve the optimum performance will be the 
subject of discussion in the next Chapter. 
Appendices 
4A. Discrete Cosine Transform Filter Bank [150] 
The analysis filter bank with filter length L and N channels, as given by 
$$ h_k(n) = 2\cos\!\left[2\pi\,\frac{2k+1}{4N}\left(n-\frac{L-1}{2}\right)+(2k+1)\frac{\pi}{4}\right]h(n) \qquad (4A.1) $$
can be evaluated using the DCT in conjunction with a polyphase network, which is derived 
as follows. 
The k th channel output of the maximally decimated filter bank is given by 
$$ X_k(m) = \sum_{n=0}^{L-1} h_k(n)\, x(mN-n) \qquad (4A.2) $$
Introducing the polyphase decomposition by letting n = rN + p, p = 0,1,...,N-1 and r 
= integer, and substituting (4A.1) into (4A.2) gives 
$$ X_k(m) = \sum_{p=0}^{N-1}\sum_{r=0}^{\lambda-1} 2\cos\!\left[2\pi\,\frac{2k+1}{4N}\left(rN+p-\frac{L-1}{2}\right)+(2k+1)\frac{\pi}{4}\right] h(rN+p)\,x[(m-r)N-p] \qquad (4A.3) $$
where $\lambda = L/N$ is a condition assumed only for ease of algebraic manipulation in this 
derivation, which extends easily to the case when this condition is not assumed. 
The polyphase decomposition of the prototype filter and the signal are defined as 
$$ h_p(r) = h(rN+p) $$
$$ x_p(r) = x(rN-p) $$
Defining the polyphase filter output as 
pp(m) = -p" N (m )-p" N (m) 
1r-1- P 1r+P 
where 
).-1 - 7T 
p' p(m ) = L h per )cos[(r -1)2]x p(m -r ) 
r=O 
(4A.4) 
(4A.5) 
(4A.6) 
The particular forms for this polyphase decomposition are designed to take into 
account the sample-offset and phase terms in the channel filters of (4A.1). Substituting 
(4A.4), (4A.5) and (4A.6) into (4A.3) then gives 
(4A.7) 
This shows that the analysis filter bank may be implemented as a polyphase filter 
network, defined by the first term in (4A.7) which is independent of k, followed by an 
odd-time odd-frequency DCT, defined by the second term in (4A.7). 
4B. Polynomial Transform Implementation of Radix-2, 2-Dimensional DFT [153] 
The radix-2 2-D DFT is of size MxN where M and N are both powers of 2. The 
2-D DFT of interest here is defined by 
$$ V_{k_1,k_2} = \sum_{n_1=0}^{N-1}\sum_{n_2=0}^{M-1} v_{n_1,n_2}\, W_{8N}^{(4k_1+1)(2n_1+1)}\, W_M^{n_2 k_2} \qquad (4B.2) $$
where $v_{n_1,n_2}$ are the input samples and $k_1 = 0,1,\ldots,N-1$, $k_2 = 0,1,\ldots,M-1$. 
Define a permutation on the index $k_2$ such that $k_2$ is mapped onto $(4k_1+1)k_2$. 
Notice that this mapping is unique for M a power of 2. Then (4B.2) becomes 
(4B.3) 
Polynomial notations can then be used to represent the 2-D DFT [152] in (4B.3) by 
defining 
(4B.4) 
(4B.5) 
(4B.6) 
(4B.4) defines the polynomial representation of the input samples. The modulo 
operations in (4B.5) and (4B.6) mean respectively that $z^N$ is replaced by $-j$ and $z$ is replaced 
by $W_{4N}^{(4k_1+1)}$, which is the definition of a polynomial modulo another polynomial. 
Combining (4B.5) and (4B.6) and using these modulo arithmetic relationships results 
in 
(4B.7) 
which is an inverse polynomial transform. 
The inverse polynomial transform operation may be understood as modifying the 
power of z as in the second term in (4B.7), and doing a modulo operation on the result. This 
amounts to a cyclic shift of the sequence, due to the change in the power of z, and 
multiplying all overflowed terms, i.e. terms with powers of z of N or greater, by -j. 
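The cyclic-shift interpretation can be illustrated with a minimal Python sketch. It assumes only the rule stated above, that $z^N$ is replaced by $-j$; the function name and the example polynomial are introduced here for illustration and are not part of the original derivation.

```python
def multiply_by_power_of_z(coeffs, shift):
    """Return z**shift * sum(coeffs[l] * z**l), reduced modulo z**N + j.

    N is len(coeffs); coeffs[l] is the coefficient of z**l.
    """
    n = len(coeffs)
    out = [0j] * n
    for l, c in enumerate(coeffs):
        # Each time the power passes N, z**N is replaced by -j,
        # so a term that wraps around once is multiplied by -j.
        wraps, position = divmod(l + shift, n)
        out[position] += c * (-1j) ** wraps
    return out


# Shift a 4-term polynomial by one power of z: the coefficient of z**3
# overflows to position 0 and is scaled by -j.
print(multiply_by_power_of_z([1, 2, 3, 4], 1))
```

No general multiplication is needed in this operation: the overflowed terms only have their real and imaginary parts exchanged with a sign change, which is the source of the efficiency of the polynomial transform.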
Equation (4B.7) then represents M polynomials of N terms, and may be defined by 
$$ V_{(4k_1+1)k_2}(z) = \sum_{l=0}^{N-1} d_{k_2,l}\, z^l \qquad (4B.8) $$
where $d_{k_2,l}$ represents the terms of the polynomials. 
Substituting (4B.8) into (4B.6) leads to 
(4B.9) 
which corresponds to M odd-squared DFT's of size N. 
Since (4B.9) is related to the original DFT simply by permutations on the index, as in 
(4B.3), the 2-D DFT is obtained. The operations involved are then 
1. Place the input samples in polynomial terms as defined by (4B.4). 
2. Perform the inverse polynomial transform defined by (4B.7). 
3. Perform M odd-squared DFT's defined by (4B.9). 
4. Reorder the index, mapping $V_{k_1,(4k_1+1)k_2}$ back to $V_{k_1,k_2}$. 
5 
DSP Hardware Design 
for Flexible Transmultiplexers 
5.1 Introduction 
In the previous chapters, results concerning the theoretical aspects of flexible TMUX 
were discussed purely from the point of view of optimization of the algorithms. In this 
chapter, the practical problems relating to hardware implementation of these algorithms 
are addressed. The objective of this investigation is to design a processing structure which 
can support the high computational requirement of practical systems. A general technique 
to optimize the usage of any processing structure for implementing the TMUX is described. 
Algorithms and structures for the DFT are then considered with respect to the ease of 
multiprocessor implementation and flexibility for implementing different DFT sizes. The 
operation and provisions for reconfiguration of a multiprocessing design are examined. 
Efficient techniques for the realization of the structure using low-cost digital signal 
processors are devised and the application of the methodology is illustrated by two design 
examples. 
5.2 Hardware Capability Tradeoffs 
In a system having a combination of channels of different bandwidths, the computation 
requirement depends on the specific system parameters. In particular the number of 
channels, their bandwidths and frequency allocations are the most important factors. In the 
ideal case where the computation requirement is unconstrained, the hardware may simply be 
designed to accommodate all possible system configurations. In practice this is neither 
necessary nor desired since the system configuration is inherently constrained by other 
factors such as the capability of the mobile earth stations or the multicarrier demodulators 
which follow the TMUX. As a result it is possible to trade off the TMUX hardware against 
these system constraints. 
The essential problem is then to obtain a comprehensive assessment of the computation 
requirement in relation to all system configurations needed, so that the hardware may 
be designed to be just sufficient to support this particular range of configurations. 
Alternatively, given a maximum hardware capability, the constraints on system configuration 
may be established and the system specified accordingly. It is therefore necessary to develop a 
general approach whereby the constraints on these factors may be systematically set up to 
satisfy a given computation capability. This approach is described in the following for two 
different situations in which different methods of flexible TMUX are applied. 
Although it is always possible to use a combination of these processing structures, 
the practical advantage in computation is minor in comparison to the increase in complexity 
when system reconfiguration is required. In addition to the fact that a general procedure 
to devise the optimal algorithm from an exhaustive consideration of all possible 
combinations will be computationally intensive, it is also impractical to design hardware 
architectures which can implement any combination of these structures. Hence in the 
following only the separate consideration of the two types of structures is carried out. 
5.2.1 Uniform Frequency Allocation and Fractionally-Related Bandwidths 
The first situation considered is one that can be implemented by the 
analysis/synthesis filter banks described in Chapter 4. It requires the channel bandwidths 
to be a multiple of a basic bandwidth such that 
$$ B_k = \alpha_k B_K, \qquad \alpha_k = \text{integer}, \qquad (5.1) $$
where $B_k$ is the normalized angular bandwidth of channel k, and $B_K = 2\pi/K$ for 
complex DFT filter banks and $B_K = \pi/K$ for real SSB filter banks. 
Also these channels are required to be located uniformly such that their centre 
frequencies are expressed by 
for $\alpha_k$ odd, 
$= \beta_k B_K$ for $\alpha_k$ even. (5.2) 
The procedure to devise a systematic approach to the tradeoff between computation 
requirement and system configurations is then equivalent to the problem of finding the 
relationships between the system parameters and the operation counts. A procedure similar 
to that carried out in Chapter 4 for assessment of the computation requirements can be 
applied to obtain such relationships, since the operation counts due to different system 
parameters are independent of each other: the computation due to channels of one 
bandwidth is independent of that due to a different bandwidth, except for the system 
overhead, which only varies with the number of analysis channels. This leads to a linear 
relationship between the system configuration, in terms of the numbers of channels of 
different bandwidths, and the operation counts, as shown in (5.3). 
$$ C_{total} = C_s + \sum_{i=2}^{K-1} C_i n_i \qquad (5.3) $$
where $C_{total}$ is the total operation count for the TMUX per input sample, for either 
multiplications or additions. $C_s$ is the operation count for the system overhead due to the 
analysis filter bank, which is a function of the number of analysis channels K. $C_i$ is the 
operation count due to reconstructing one wide-band channel whose bandwidth is 
i times the analysis channel bandwidth, and $n_i$ is the number of such wide-band channels. 
Clearly the numbers of wide-band channels are related to the number of analysis 
channels by 
$$ \sum_{i=1}^{K-1} i\, n_i = K \qquad (5.4) $$
For a given hardware capability, therefore, all realizable system configurations can be 
determined by (5.3) and (5.4) alone. Since functions of many variables are involved, it is 
not possible to provide a simple graphical representation whereby the system 
configurations can be easily determined. However, the fact that they are linear functions 
allows the relatively simple methods of linear optimization [143] to be applied to obtain 
optimal configurations that will maximize the usage of computational capability. For a 
given computation capability in terms of maximum rates of multiplication and addition, 
the linear optimization maximizes the numbers of any wide-band channels according to an 
arbitrary cost function in which the numbers of channels $n_i$ are the variables. This is 
expressed by 
$$ P = \sum_{i=2}^{K-1} w_i n_i \qquad (5.5) $$
where P is the quantity to be maximized, and $w_i$ are the arbitrary weights which 
determine the ratio between the numbers of channels of different bandwidths in the 
optimum configuration. 
In addition any arbitrary requirements on the system configuration can be enforced 
by adding the appropriate constraints to the optimization problem. For example it is often 
required to provide a minimum number of channels of a particular bandwidth, which 
simply translates to a constraint of the form 
$$ n_i \ge N_i \qquad (5.6) $$
where $N_i$ is the minimum number of the particular type of channels required. 
The operation counts in (5.3) also imply two constraints in the optimization problem, 
one for multiplications and the other for additions. Alternatively, if specific processors 
are being considered, the operation counts can be the number of instruction cycles instead, 
resulting in only one constraint. Finally, the operation counts themselves are limited by 
the hardware capability, which leads to constraints of the form 
$$ C_s + \sum_{i=2}^{K-1} C_i n_i \le C_a \qquad (5.7) $$
where $C_a$ is the maximum operation count supported by the hardware. 
In general, the solution of the linear optimization problem is 'less optimal' when the 
number of constraints is greater, in the sense that more inequality constraints are satisfied 
with inequality as opposed to equality. In the situation when (5.7) is satisfied with 
inequality, the computational power is not fully utilized, which is undesirable. It is, however, 
inevitable that in practice most of the constraints are only satisfied with inequality 
because the numbers of channels $n_i$ must be integer values. The integer programming 
technique [143] may be used to obtain tighter solutions, but in practice the difference between 
this technique and rounding to integers the solutions obtained from the standard linear 
programming technique is small. 
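For a small system the integer problem defined by (5.3) to (5.7) can also be solved by direct enumeration, which may help to make the formulation concrete. The following Python sketch is an illustration only: the function name, the unit weights and the choice of exhaustive search (rather than the linear or integer programming methods of [143]) are assumptions made here, and the operation counts are the sums of the multiplication and addition columns of Table 4.1.

```python
from itertools import product


def best_configuration(K, cost, overhead, capacity, weight):
    """Exhaustive integer search over the numbers of wide-band channels n_i.

    cost[i]   : operation count C_i for one channel of bandwidth i (i >= 2)
    overhead  : system overhead C_s
    capacity  : maximum operation count C_a, as in (5.7)
    weight[i] : weight w_i in the objective (5.5)
    Returns (P, {i: n_i}, C_total) for the best configuration found.
    """
    widths = sorted(cost)
    ranges = [range(K // i + 1) for i in widths]
    best = None
    for counts in product(*ranges):
        used = sum(i * n for i, n in zip(widths, counts))
        if used > K:          # (5.4): unused analysis channels stay at bandwidth 1
            continue
        total = overhead + sum(cost[i] * n for i, n in zip(widths, counts))
        if total > capacity:  # (5.7)
            continue
        score = sum(weight[i] * n for i, n in zip(widths, counts))
        if best is None or score > best[0]:
            best = (score, dict(zip(widths, counts)), total)
    return best


# Combined multiplication + addition counts from Table 4.1 (K = 16).
cost = {2: 6.1, 3: 9.8, 5: 18.3, 8: 30.9, 12: 56.3, 15: 74.5}
weight = {i: 1 for i in cost}   # hypothetical weights: every wide-band channel counts equally
print(best_configuration(16, cost, overhead=94.0, capacity=168.5, weight=weight))
```

Minimum-channel requirements of the form (5.6) could be added as a further test inside the loop. Enumeration is only practical for small K; for larger systems the linear programming approach described above is the appropriate tool.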
            Channel Bandwidth              Operation Counts   Usage (%)
  1    2    3    5    8   12   15
  1    0    0    0    0    0    1               168.5            100
  0    2    0    0    0    1    0               162.5             96
  1    0    1    0    0    1    0               160.1             95
  0    0    1    1    1    0    0               153.0             91
  1    0    0    3    0    0    0               152.6             90
  0    0    2    2    0    0    0               150.2             89
Table 5.1 Operation counts of optimal configurations for analysis/synthesis TMUX 
An example for a flexible TMUX with K = 16 using this approach is shown in 
Table 5.1. It is based on the results obtained previously in Chapter 4 as shown in Table 
4.1, with each multiplication and addition counted equally as an operation. The table 
shows the operation counts for a number of combinations of channels with different 
bandwidths determined by this approach. The percentage usage is the operation count 
expressed as a percentage of the maximum operation count available which, for this 
example, is the operation count required to reconstruct the widest bandwidth channel plus the 
system overhead as shown in Table 4.1 previously. It can be seen that in most cases the 
configurations utilize a large percentage of the available computational power. It is 
observed also that if the channels to be reconstructed consist of bandwidths that differ 
less, the usage of computation will generally be higher. 
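The way in which an entry of Table 5.1 follows from Table 4.1 and (5.3) can be sketched in a few lines of Python. The sketch assumes that a channel of bandwidth 1 adds nothing beyond the system overhead, which is consistent with the first row of Table 5.1; the names used are illustrative only.

```python
# Multiplications + additions per input sample, taken from Table 4.1 (K = 16).
OVERHEAD = 38.9 + 55.1
COST = {
    1: 0.0,            # assumption: a bandwidth-1 channel needs no reconstruction
    2: 2.6 + 3.5,
    3: 5.1 + 4.7,
    5: 10 + 8.3,
    8: 14.9 + 16.0,
    12: 29.1 + 27.2,
    15: 38.6 + 35.9,
}


def total_ops(config, K=16):
    """Evaluate (5.3) for config = {bandwidth i: number of channels n_i}."""
    if sum(i * n for i, n in config.items()) != K:
        raise ValueError("configuration does not satisfy (5.4)")
    return OVERHEAD + sum(COST[i] * n for i, n in config.items())


# Maximum available count: widest channel (bandwidth 15) plus system overhead.
C_MAX = total_ops({1: 1, 15: 1})

for config in ({1: 1, 15: 1}, {2: 2, 12: 1}, {3: 2, 5: 2}):
    ops = total_ops(config)
    print(config, round(ops, 1), round(100 * ops / C_MAX))
```

The three configurations evaluated correspond to the first, second and last rows of Table 5.1.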
5.2.2 Unconstrained Frequency Allocation 
The second type of situation to be considered is for the channels to be unconstrained 
in their frequency allocation, which requires the use of the DFT convolution type TMUX 
as described in Chapter 3. The approach applied previously to the analysis/synthesis filter 
bank type structures can also be employed in this case. Since the frequency allocation of 
the channels has no effect on the operation counts of the system, the only variables in the 
optimization problem are again the channel bandwidths and the numbers of such channels. 
The number of possible combinations of these variables is constrained by the size of the 
input DFT which forms the basis of the optimization. Given an arbitrary choice of this 
parameter, the other variables may be optimized using the previous approach so that the 
system aims to utilize the maximum computation power available. 
              Channel Bandwidth                Operation Counts   Usage (%)
 1/20  1/15  1/12  1/10   1/8   1/6
   0     0     0     0     0     6                  82.8            100
   0     4     0     2     4     0                  79.1             98
   0     1     4     6     0     0                  77.0             97
   0     5     8     0     0     0                  77.2             98
   0    15     0     0     0     0                  42.0             82
  20     0     0     0     0     0                  32.0             77
Table 5.2 Operation counts of optimal configurations for DFT convolution TMUX 
Some examples for the application of this technique are shown in Table 5.2, based on 
the results shown in Table 3.2 previously. It shows a number of different combinations of 
channel bandwidths given a system with an input DFT size of 120 and with the maximum 
operation count arbitrarily chosen as that required to produce 6 channels of bandwidth 
1/6. Again the usage of the available computation power is generally reasonable. The 
same observation as from Table 5.1 applies here in that the variation in operation counts 
may be reduced, and hence the percentage usage increased, if demultiplexing of channels of 
wider bandwidths is not required. 
5.3 Multiprocessing Architectures for TMUX Implementation 
Having considered efficient algorithms and the approach to implement these 
algorithms to full advantage in relation to any given computation power, the problem is then 
to design actual hardware architectures for implementation of these algorithms. As well as 
being able to satisfy the computation requirements of the processing structures, the 
hardware is desired to be reconfigurable and modular. Reconfigurability is essential if the 
TMUX is to remain as flexible as the algorithms allow. Modularity of hardware allows 
ease of design and implementation. From a system point of view, the hardware should not 
introduce any constraints to the system other than those due to the algorithms it 
implements. 
The requirement of reconfigurability can be most conveniently achieved by a 
software-based approach in which the algorithms are implemented by software on 
programmable signal processors, which may be reprogrammed easily according to the 
configurations required. 
5.3.1 Necessity for Multiprocessing Implementation 
The simplest concept which will satisfy the requirement is a single processor having 
the maximum computation power required for the algorithms. However, for any practical 
specifications applicable to land mobile satellite systems [50] this implies an order of 
several hundred Mips (million instructions per second), which is hardly achievable by any 
single processor alone. Hence the approach of multiprocessing is taken to achieve the high 
throughput required by combining the resources of a number of processors. It is then 
necessary to examine the multiprocessing architectures together with the methods of 
parallel implementation of the algorithms by software in the distributed processors. The main 
signal processing components used in the TMUX structures described previously are the 
FIR filter and the DFT. The computation rates for both operations depend on the number 
of channels and their bandwidth as explained previously, and for the realistic order of 
magnitude in some proposed systems [50] neither operation on its own can be 
achieved using a single processor available at present, and this is likely to remain so for 
the near future [154]. Hence multiprocessing is the only solution for their 
implementation. 
Since the system is also required to be reconfigurable, the architecture must be 
designed to be sufficiently general-purpose for different processing structures to be 
implemented. The processing structures or algorithms must be adapted for efficient 
implementation in this general-purpose architecture. The techniques to achieve this are described in the 
following, starting with some general techniques for the basic signal processing operations 
followed by the application of the techniques to the system design. They lead to efficient 
parallel implementation of the algorithms as well as system reconfigurability with 
minimal disruption to the continuous operation of the TMUX. The mechanism for 
reconfiguration of the system is also described. 
5.3.2 Parallel Implementation of DFT Algorithms 
5.3.2.1 Review of Parallel Architectures 
The DFT operation is a basic element used in both the DFT convolution and the 
analysis/synthesis filter bank and its parallel implementation is necessary for providing 
the required throughput. For example, in the DFT convolution TMUX, the multiplication 
rate of the DFT for an input sampling rate of 1 Msample/s with 20 channels is about 
50 Mips, which is not possible with a single processor. The particular method of parallel 
implementation required for the flexible TMUX must also allow the DFT size to be varied, 
and allocate the number of processors according to the DFT size in a systematic manner so 
that reconfiguration is facilitated. 
There are a large number of implementation techniques for DFT algorithms suitable 
for parallel implementation [155]. Most of the investigations have however been 
concentrated on dedicated hardware or VLSI implementations which perform only one particular 
algorithm. The use of look-up tables as arithmetic units has received much attention for 
their high speed when implemented using read-only memory (ROM) [156-158]. This type 
of approach is not suitable for the general-purpose application considered here because the 
algorithms are fixed by the hardware. Recent advances in high-speed digital signal 
processors [154,159,160] also mean that the use of ROM's is no longer the only low-cost solution for 
high-speed arithmetic operations. However, some of the architecture designs in these 
investigations can be implemented using other hardware, such as programmable processors, 
without losing any advantages of the architecture design. 
[Figure: signal flow graph of a radix-2, 8-point FFT with inputs x(0) to x(7) and outputs X(0) to X(7). Butterfly calculation: (a, b) -> (a + bX, a - bX). Stages 1, 2 and 3 are assigned to Processors 1, 2 and 3 respectively.]
Fig.5.1 Pipeline DFT processor-partitioning for 8-point DFT 
The first technique in parallel architecture design is a structural approach in 
which the processors are allocated through a graphical examination of the FFT structures. 
This technique, which was the basic architecture behind an implementation approach using 
ROM's [156], is the general method of the pipeline FFT [161,162]. The architecture achieves 
parallelism by a linear network of processors, with each processor allocated 
to one stage of the FFT butterfly computations. An example of this is shown in Fig.5.1 
for the radix-2, 8-point FFT. In general, an N-point radix-2 FFT implemented in this 
manner uses log2 N processors with 2N data transfers between the processors for each N 
input data samples, and each processor executes N/2 butterfly operations. As the size of 
the FFT increases, there are more processors for the increased number of stages as well as a 
greater number of multiplications and additions in each processor due to the greater 
number of butterflies in each stage. Hence this architecture does not lend itself directly to 
system reconfiguration which causes changes in the DFT size. 
Another basic method of parallel implementation is the cube-connected network 
[163,164] which uses N/2 processors for an N-point FFT, with each processor allocated 
log2 N butterfly computations. An example of this for the radix-2, 8-point FFT is 
shown in Fig.5.2. A higher degree of parallelism is achieved, especially for large N, but the 
amount of data transfer between processors is now (N/2)log2(N/2), which is greater than that of 
the pipeline FFT for larger values of N. The architecture can be further generalized [165] 
so that N/2^k processors may be used for the N-point FFT, with 1 <= k <= log2 M. Each 
processor is then allocated 2^(k-1) log2 N butterfly computations. For example the 8-point 
FFT in Fig.5.2 can be implemented with 2 processors, which take on the 
operations of processors 1 and 3, and of processors 2 and 4, respectively. The amount of data transfer is also 
reduced. This then offers a possibility for system reconfiguration in that the amount of 
computation taken by each processor can be systematically changed by allocating the 
appropriate number of butterfly operations. As the size of the FFT increases, the amount of 
computation in each processor can be kept fairly constant by increasing the number of 
processors without upsetting the orderly nature of the hardware architecture. This 
characteristic is important for the ease of the overall control of the hardware. 
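The processor counts and loads quoted for the two architectures can be tabulated with a short Python sketch. It simply evaluates the expressions given in the text for N a power of 2; the function names and dictionary keys are assumptions introduced for illustration, and the transfer count of the generalized cube-connected case (k > 1) is left out because no expression for it is given here.

```python
def pipeline_fft(n):
    """Pipeline FFT: one processor per radix-2 stage of an n-point FFT."""
    stages = n.bit_length() - 1          # log2(n) for n a power of 2
    return {"processors": stages, "butterflies_each": n // 2, "transfers": 2 * n}


def cube_connected_fft(n, k=1):
    """Cube-connected FFT generalized to n / 2**k processors."""
    stages = n.bit_length() - 1
    result = {"processors": n // 2**k, "butterflies_each": 2 ** (k - 1) * stages}
    if k == 1:
        # (N/2) log2(N/2) transfers, quoted for the N/2-processor case only.
        result["transfers"] = (n // 2) * (stages - 1)
    return result


for n in (8, 64):
    print(n, pipeline_fft(n), cube_connected_fft(n), cube_connected_fft(n, k=2))
```

For the 8-point example, k = 2 gives the 2-processor arrangement mentioned above, with each processor taking the butterflies of two of the original four.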
These two basic architectures for parallel implementation can of course be used in 
conjunction, such that the processors in the pipeline FFT may themselves consist of many 
processors operating in parallel as described in the second method. Difficulties only arise when 
the DFT size does not lead to the standard Cooley-Tukey FFT structures [115]. In these 
cases, the structures of the algorithms have to be considered separately to devise the most 
efficient partitioning of the arithmetic operations for parallel implementation. This is the 
problem encountered in the split-radix algorithms [166,167] for which a multiprocessor 
implementation is obtained by allocating a similar number of butterfly operations to each 
processor in a somewhat heuristic fashion [168]. 
[Figure: radix-2, 8-point FFT flow graph with inputs x(0) to x(7) in bit-reversed order and outputs X(0) to X(7), partitioned among Processors 1 to 4.]
Fig.5.2 Parallel DFT structure for 8-point DFT 
Another quite different approach was suggested originally for the implementation of 
large DFT's using low-cost processors [169]. It is different from all the previous 
techniques of parallel implementation in that it is based on algebraic manipulation of the 
DFT algorithms rather than graphical consideration of their structures. The idea was 
to reduce a large DFT to a number of smaller ones using index-mapping techniques such as 
the Cooley-Tukey or the Good-Thomas methods [138]. The smaller DFT's can then be 
arbitrarily distributed among a number of processors according to their computational 
requirements. The smaller DFT's are carried out by individual processors and hence many 
serial algorithms can be used for their calculation. It is also more flexible in that the same 
approach can be used in deriving multiprocessing schemes for DFT's of different sizes. For 
these reasons, the multiprocessing architecture for DFT's used in the flexible TMUX is 
based upon this approach for the initial design of the algorithms suitable for parallel 
implementation. Furthermore, this procedure can be applied repeatedly to successively 
reduce the DFT's to ever smaller units, and in the limit when the final DFT's are of size 
2 this process is in fact equivalent to the Cooley-Tukey radix-2 FFT algorithm. The 
allocations of the processors are equivalent to a combination of the pipeline FFT and the 
cube-connected parallel FFT architectures. An example which illustrates this point is shown in 
Fig.5.3 for an 8-point DFT distributed between two processors using the Cooley-Tukey 
index-mapping. Unlike the structural approach, however, the computation load is not 
distributed evenly among the processors using this technique, and some additional 
consideration is required in the design. This is discussed in the following. 
[Figure: 8-point DFT with inputs x(0) to x(7) and outputs X(0) to X(7). Two 4-point DFT's in Processor 1 feed four 2-point DFT's in Processor 2.]
Fig.5.3 Index-mapping processor-partitioning for 8-point DFT 
5.3.2.2 Algorithms for the Parallel Architecture 
The parallel architecture is based on a combination of algebraic techniques which 
lead to computationally efficient algorithms as well as modularity for ease of parallel 
implementation. The prerequisite in this approach is for the DFT size to be highly 
composite so that it may be factored into small primes. This is usually satisfied by virtue of the 
filter bank requirement, and the choice of the DFT size is addressed in a following section. 
[Figure: flow of operations: {x(n)} -> Cooley-Tukey or Good-Thomas index mapping -> (2-D DFT) -> Rader's DFT algorithm -> (linear convolutions) -> Winograd convolution algorithm -> output index-mapping.]
Fig.5.4 Steps in constructing parallel DFT calculations 
The steps in the algorithm are illustrated in Fig.5.4. Initially the DFT is converted to 
a two-dimensional DFT using either the Good-Thomas or the Cooley-Tukey permutation 
scheme, depending on the DFT size. The Good-Thomas scheme [170] requires the DFT size 
to be factored into two integers such that if the DFT size is n = n1 n2, then n1 and n2 are 
relatively prime. The input sequence can then be permuted to an n1 by n2 array with 
array element indices (i1, i2), which are related to the original sequence index i by 
(5.8a) 
(5.8b) 
The input index i is uniquely defined by the residues i1, i2 according to the Chinese 
remainder theorem [171] by the expressions 
(5.9a) 
(5.9b) 
A 2-D DFT is performed on the n1 by n2 array, and the DFT of the original 1-D 
sequence is found by another permutation operation on the array indices which converts the 
array back to a 1-D sequence. The output array indices (k1, k2) are related to the index k 
of the required 1-D output sequence by 
(5.10a) 
(5.10b) 
which combine with (5.9a,b) to give the simple expression 
(5.11) 
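The Good-Thomas procedure can be illustrated by the following Python sketch, which checks the result against a direct DFT. The particular index maps used (input residues i1 = i mod n1, i2 = i mod n2, and output index k = (k1 n2 + k2 n1) mod n) are one standard choice and are an assumption here; they are not necessarily identical to (5.8) to (5.11). The kernel convention W_n = exp(-j2pi/n) and the function names are likewise assumed for illustration.

```python
import cmath
from math import gcd


def dft(x):
    """Direct DFT, used for the short transforms and as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def good_thomas_dft(x, n1, n2):
    """n-point DFT (n = n1*n2, gcd(n1, n2) = 1) as a true 2-D DFT."""
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1*n2 with n1, n2 relatively prime")
    # Input permutation (assumed form): array indices are the residues of i.
    a = [[0j] * n2 for _ in range(n1)]
    for i in range(n):
        a[i % n1][i % n2] = x[i]
    # 2-D DFT: n2-point DFT's along i2, then n1-point DFT's along i1.
    # No twiddle factors are needed between the two stages.
    rows = [dft(r) for r in a]
    cols = [dft([rows[i1][k2] for i1 in range(n1)]) for k2 in range(n2)]
    # Output permutation (assumed form): k = (k1*n2 + k2*n1) mod n.
    X = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            X[(k1 * n2 + k2 * n1) % n] = cols[k2][k1]
    return X


x = [complex(i, -i) for i in range(12)]
error = max(abs(a - b) for a, b in zip(good_thomas_dft(x, 3, 4), dft(x)))
print("maximum deviation from direct DFT:", error)
```

Because the short DFT's in each dimension are independent of one another, they are natural units for distribution among processors.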
When the factors n1, n2 are not relatively prime, the Cooley-Tukey scheme [139] can 
be used instead. However the permutation in this case does not result in a 2-D DFT 
operation on the 2-D array, since extra multiplications by twiddle factors are performed between 
stages. The permutations on the input indices are simpler than in the Good-Thomas scheme 
and are expressed by 
(5.12) 
where i is the input sequence index which is mapped to an n1 by n2 array with 
indices (i1, i2), i1 = 0,1,...,n1-1, i2 = 0,1,...,n2-1. 
Similarly the output sequence is mapped by the index 
(5.13) 
where (k1, k2) are the output array indices. 
The operations involve firstly an n2-point DFT on each column of the array, 
followed by a point-by-point multiplication by a twiddle factor $W_n^{i_1 k_2}$, and finally an n1-point 
DFT on each row of the array. The required output sequence is obtained from the 
array according to the indexing of (5.13), which is a simple column-by-column ordering of 
the array elements. These advantages of simpler permutation operations and free choice of 
factors n1, n2 are achieved at the expense of an increase in multiplications due to the 
twiddle factors. 
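The three operations described above can be sketched in Python as follows. The index maps i = n1 i2 + i1 and k = n2 k1 + k2 are assumed standard forms chosen to be consistent with the twiddle factor $W_n^{i_1 k_2}$; they are not taken from (5.12) and (5.13). The function names and the convention W_n = exp(-j2pi/n) are also assumptions made for illustration.

```python
import cmath


def dft(x):
    """Direct DFT, used for the short transforms and as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def cooley_tukey_dft(x, n1, n2):
    """n-point DFT (n = n1*n2) from n2-point DFT's, twiddles and n1-point DFT's."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("need len(x) == n1*n2")
    # Step 1: n2-point DFT over i2 for each i1 (assumed input map i = n1*i2 + i1).
    inner = [dft([x[n1 * i2 + i1] for i2 in range(n2)]) for i1 in range(n1)]
    # Step 2: point-by-point multiplication by the twiddle factors W_n^(i1*k2).
    for i1 in range(n1):
        for k2 in range(n2):
            inner[i1][k2] *= cmath.exp(-2j * cmath.pi * i1 * k2 / n)
    # Step 3: n1-point DFT over i1 for each k2 (assumed output map k = n2*k1 + k2).
    X = [0j] * n
    for k2 in range(n2):
        outer = dft([inner[i1][k2] for i1 in range(n1)])
        for k1 in range(n1):
            X[n2 * k1 + k2] = outer[k1]
    return X


# 8-point DFT split into two 4-point DFT's followed by four 2-point DFT's,
# the partition used in the two-processor example of Fig.5.3.
x = [complex(i, 1) for i in range(8)]
error = max(abs(a - b) for a, b in zip(cooley_tukey_dft(x, 2, 4), dft(x)))
print("maximum deviation from direct DFT:", error)
```

With n1 = 2 and n2 = 4, step 1 corresponds to the work of the first processor and step 3 to that of the second; where the twiddle multiplications of step 2 are placed is a design choice.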
For DFT sizes of more than two factors the process is repeated until the computations 
are made up of DFT's of prime factors. The computations of these DFT's are carried out 
using the Winograd DFT algorithms, which are generally the most efficient, for small DFT 
sizes, in terms of the lowest combined numbers of multiplications and additions [138]. The 
operations consist of the Rader DFT algorithm, which converts a DFT into a convolution. 
For a small DFT, the convolutions are short and may be made very efficient using 
convolutions in polynomial fields. These two operations together are often referred to as the Winograd DFT 
algorithm [172]. 
In the Rader DFT algorithm, the input sequence is permuted by a mapping using a 
primitive element in the finite field of n, so that for an n-point DFT of the form 
$$ X(k) = \sum_{i=0}^{n-1} W_n^{ik} x(i) = x(0) + \sum_{i=1}^{n-1} W_n^{ik} x(i) $$
the permutation results in expressions of the form [173] 
$$ X(p^v) = x(0) + \sum_{u=0}^{n-2} x(p^u)\, W_n^{p^{u+v}} \qquad (5.14) $$
where p is a primitive element of the finite field of n such that 
$$ i = p^u \bmod n, \qquad k = p^v \bmod n, \qquad u,v = 0,1,\ldots,n-2. \qquad (5.15) $$
Equation (5.14) shows that the DFT is obtained by the (n-1)-point cyclic convolution 
of the sequence $x(p^{-u})$ with $W_n^{p^u}$. The permutations of the input and output 
sequences are carried out according to (5.15). 
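A minimal Python sketch of the Rader conversion for a prime length n >= 3 is given below. The cyclic convolution is evaluated directly for clarity, whereas in the scheme described here it would be computed by a Winograd convolution algorithm. The function names, the brute-force search for the primitive element and the convention W_n = exp(-j2pi/n) are assumptions made for illustration.

```python
import cmath


def dft(x):
    """Direct DFT used as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def primitive_root(n):
    """Smallest primitive element modulo a prime n >= 3 (brute-force search)."""
    for p in range(2, n):
        if len({pow(p, u, n) for u in range(n - 1)}) == n - 1:
            return p
    raise ValueError("no primitive root found; n must be a prime >= 3")


def rader_dft(x):
    """DFT of prime length n via an (n-1)-point cyclic convolution."""
    n = len(x)
    p = primitive_root(n)
    p_inv = pow(p, n - 2, n)                       # inverse of p modulo the prime n
    w = [cmath.exp(-2j * cmath.pi * e / n) for e in range(n)]
    a = [x[pow(p_inv, u, n)] for u in range(n - 1)]    # permuted input x(p^-u)
    b = [w[pow(p, u, n)] for u in range(n - 1)]        # kernel W_n^(p^u)
    X = [0j] * n
    X[0] = sum(x)                                  # k = 0 is handled separately
    for v in range(n - 1):
        conv = sum(a[u] * b[(v - u) % (n - 1)] for u in range(n - 1))
        X[pow(p, v, n)] = x[0] + conv              # output permutation k = p^v
    return X


x = [complex(i, 2 * i) for i in range(7)]
error = max(abs(a - b) for a, b in zip(rader_dft(x), dft(x)))
print("maximum deviation from direct DFT:", error)
```

The kernel sequence b depends only on n and can be precomputed, so the run-time work reduces to the two permutations and the convolution.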
The Winograd convolution algorithms are constructed by reducing the convolution 
into very small convolutions which contain fewer multiplications and additions. A cyclic 
convolution can be expressed as a polynomial product as 
$$ Y(z) = x(z)\,h(z) \bmod m(z) \qquad (5.16) $$
where $m(z) = z^n - 1$. 
It is possible to factor m (z) into relatively-prime polynomials mi (z) so that (5.16) 
is broken down into smaller polynomial products of the form 
mod mi(z) (5.17) 
where the polynomials are the original polynomials taken modulus mi (z) such that 
Yi (z )= Y (z ) mod mi (z ) and so on. 
The required polynomial Y (z ) is found from Yi (z ) by 
(5.18) 
where m (z) is factored into a polynomials. and Qi (z) are determined using the 
Chinese remainder theorem for polynomials [138] by 
(5.19a) 
ni (z )mi (z ) + Ni (z )Mi (z ) = 1 and (5.19b) 
Ot-1 
M; (z) = 
n m,. (z) 
r=O (5.19c) 
- 165 -
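As a small worked illustration of the Chinese remainder reconstruction, consider the shortest cyclic convolution, $n = 2$. The example and its explicit values are not taken from the text; it assumes the usual form $Q_i(z) = N_i(z) M_i(z)$ with $n_i(z) m_i(z) + N_i(z) M_i(z) = 1$.

```latex
\begin{align*}
m(z) &= z^{2} - 1 = (z - 1)(z + 1), \qquad m_0(z) = z - 1, \quad m_1(z) = z + 1 \\
M_0(z) &= z + 1, \quad N_0 = \tfrac{1}{2}, \quad n_0 = -\tfrac{1}{2}
  \;\Rightarrow\; Q_0(z) = \tfrac{1}{2}(z + 1) \\
M_1(z) &= z - 1, \quad N_1 = -\tfrac{1}{2}, \quad n_1 = \tfrac{1}{2}
  \;\Rightarrow\; Q_1(z) = -\tfrac{1}{2}(z - 1) \\
Y_0 &= (x_0 + x_1)(h_0 + h_1), \qquad Y_1 = (x_0 - x_1)(h_0 - h_1) \\
Y(z) &= Q_0(z)\,Y_0 + Q_1(z)\,Y_1
      = \tfrac{1}{2}(Y_0 + Y_1) + \tfrac{1}{2}(Y_0 - Y_1)\,z \\
     &= (x_0 h_0 + x_1 h_1) + (x_0 h_1 + x_1 h_0)\,z
\end{align*}
```

The two residue products $Y_0$ and $Y_1$ replace the four multiplications of the direct 2-point cyclic convolution, and the factors of $\tfrac{1}{2}$ can be absorbed into the fixed sequence.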
The efficiency of the Winograd DFT algorithms depends on the choice of the polynomial factors $m_i(z)$, which must be considered on a case-by-case basis depending on the lengths of the convolutions. For small DFT's, which are of interest here, a collection of optimal algorithms can be found in the literature [138], and those of interest here are listed in Appendix 5A.
5.3.2.3 Methodology for Reconfigurable FFT Architectures
Although the algorithm described in the previous section offers a very low computational requirement compared to other algorithms, it has not been widely implemented because of its structural complexity, which very often outweighs the reduction in arithmetic operations. The parallel architecture designed here, as well as being reconfigurable, helps to reduce the structural complexity by avoiding the need for complex house-keeping operations to be performed by software.
5.3.2.3.1 Processor Interconnection and Task Allocation 
The first problem is, given a number of identical processors, how the processors are to be connected to implement the algorithm while achieving the objectives of ease of reconfiguration and maximum speedup. As shown in the previous section, the main part of the computation of the DFT algorithm consists of Winograd DFT algorithms of different lengths, which are the factors of the original DFT length. There are many ways to combine these Winograd DFT units (WFT) using the Cooley-Tukey or the Good-Thomas scheme, resulting in a large number of combinations each representing a different algorithm. For example, an $n$-point DFT where $n$ has four prime factors $n_i$, $i = 1$ to 4, may be implemented as represented in Fig.5.5, amongst others. The figures on the arms of the diagrams denote the number of times the processes in the succeeding branches are to be performed per DFT. The amounts of arithmetic operations of these different algorithms are very similar. Any difference is due only to the difference between the two indexing schemes, since the total number of the basic WFT operations is the same for all combinations. This can be observed from the previous examples, in both cases of which there are $n_2 n_3 n_4$ DFT's of size $n_1$, $n_1 n_3 n_4$ of size $n_2$, $n_1 n_2 n_4$ of size $n_3$, and $n_1 n_2 n_3$ of size $n_4$. This can be easily generalized to any number of factors by noting that in using the Cooley-Tukey or the Good-Thomas schemes the number of DFT's of one factor is always equal to the product of all other factors. Hence the total number of arithmetic operations, in terms of multiplications or additions or both, due to the WFT's can be expressed by

$$\sum_{j=1}^{\alpha} O(n_j) \prod_{r=1,\; r \ne j}^{\alpha} n_r \qquad (5.20)$$

where $\alpha$ is the number of prime factors $n_j$, and $O(n_j)$ is the number of operations of the WFT of size $n_j$.
[Figure: two tree diagrams. Legend: X-point DFT nodes and data permutation nodes]
Fig.5.5 Representation for two DFT algorithms of size $n_1 n_2 n_3 n_4$
Equation (5.20) is important for the consideration of the processor configuration because it shows that, in terms of the arithmetic operations of the WFT's, it makes no difference how the algorithm is organized. Furthermore, the amount of arithmetic operations due to the twiddle-factor multiplications required in the Cooley-Tukey scheme is also constant for a given DFT size. This is because, given any DFT size which is a product of a number of prime integers, the Cooley-Tukey permutation is required whenever equal prime integers exist in both factors into which the original DFT size is factorized. If the original DFT size contains a number of equal prime factors, it is impossible to avoid using the Cooley-Tukey scheme. In other words, the components, in terms of the permutation and WFT operations, in algorithm structures such as those shown in Fig.5.5 are determined by the DFT size, despite the apparently different forms the structure can take. Strictly speaking, the number of multiplications due to the Cooley-Tukey scheme is also constant for a given DFT size because its application always involves a total number of twiddle factors equal to the original DFT size. However, the smaller the factors the Cooley-Tukey scheme factorizes into, the greater is the number of 'trivial' twiddle factors. Hence, using the tree hierarchy concept shown in Fig.5.5, the further down the hierarchy the Cooley-Tukey scheme is used, the smaller the number of non-trivial multiplications becomes. This forms the only design rule on the algorithm: if the computational complexity is considered on the basis of arithmetic operations alone, the Good-Thomas scheme should be used in preference to the Cooley-Tukey scheme whenever possible. Otherwise the organization of the algorithm can be designed to best conform to the hardware architecture. Since the fundamental operations are the WFT's, the architecture concept becomes more apparent when its design is considered in terms of the allocation of these units to a number of processors.
[Figure: data input passing through processors S1, S2, ..., SK in series to the data output]
Fig.5.6 Linear processor pipeline
The most straightforward approach is for each processor to be allocated the task of the WFT's of one length in a linear processor pipeline of the form shown in Fig.5.6. The input data sequence is transferred successively down the pipeline and at each stage the necessary indexing operations and WFT's are performed on the data. Considering the operations due to the WFT's only, this architecture corresponds to the case when each of the terms in the summation series of (5.20) is allocated to one processor. From Appendix 5A it can be seen that the number of operations, in terms of the total number of multiplications and additions, is not proportional to the size of the WFT. The summation terms in (5.20) are therefore unequal. In particular the number of operations increases
more than the proportional increase in the size of the WFT. Given two terms from (5.20),

$$O(n_i) \prod_{r \ne i} n_r \;>\; O(n_j) \prod_{r \ne j} n_r \qquad \text{if } n_i > n_j .$$
Hence if the processors in the linear pipeline are identical, the speedup will be restricted by the bottleneck due to the processor executing the WFT's of the largest size, with the processors executing shorter WFT's having more idle time, as shown in the space-time diagram of Fig.5.7. Neglecting the overhead in data transfer and other house-keeping tasks, the time for carrying out all the operations for one input data block under this arrangement is equal to the time required by the longest task, which is the group of the largest WFT's. Hence the speedup, defined as the ratio of the time required by one processor to execute a task to that required by a number of processors, is expressed by

$$S = \frac{\displaystyle\sum_{j=1}^{\alpha} O(n_j) \prod_{r \ne j} n_r}{\displaystyle O(n_{max}) \prod_{r \ne max} n_r} \qquad (5.21)$$

where $n_{max}$ is the largest WFT size.
[Figure: space-time diagram, processor against time, for pipeline stages S1, S2 and S3 executing tasks T1(1), T1(2), ..., T2(1), T2(2), T2(3) and T3(1), T3(2), with idle time marked]
Fig.5.7 Space-time diagram illustrating bottleneck in linear pipeline
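The per-stage work of (5.20) and the pipeline speedup of (5.21) can be evaluated with a short sketch such as the one below. The operation counts in `ILLUSTRATIVE_OPS` are made-up placeholders, not the values of Appendix 5A, and the function names are hypothetical. The bottleneck is taken as the largest term of (5.20), which coincides with the largest WFT size whenever the operation count grows faster than the WFT size.

```python
from math import prod

# Placeholder operation counts O(n) per WFT of size n (NOT from Appendix 5A).
ILLUSTRATIVE_OPS = {2: 4, 3: 16, 5: 44}

def wft_terms(factors, ops):
    """One term of (5.20) per factor: O(n_i) times the product of the other factors."""
    total = prod(factors)
    return [ops[n] * (total // n) for n in factors]

def pipeline_speedup(factors, ops):
    """Speedup of a linear pipeline with one processor per factor, as in (5.21)."""
    terms = wft_terms(factors, ops)
    return sum(terms) / max(terms)   # single-processor work / bottleneck stage work
```

With the placeholder counts, `pipeline_speedup([2, 3, 5], ILLUSTRATIVE_OPS)` gives about 1.83 for three processors. Equal terms, as in `pipeline_speedup([2, 3], {2: 4, 3: 6})`, reach the theoretical maximum of 2.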
The theoretical maximum of $S$ in (5.21) is $\alpha$, but substitution of practical values from Appendix 5A shows that only a low percentage of this is achieved when there are a number of different prime factors. Although it is possible to improve the speedup in such cases by dividing the WFT's among the processors differently so that the time durations of the tasks are more similar, the organization of the algorithm becomes less systematic and requires separate consideration for each DFT size, which is not desirable for reconfigurability. This is also unavoidable when DFT's of different sizes are to be implemented by the same number of processors. Apart from these drawbacks, this approach offers the advantages of simple processor interconnection and data transfer. Data transfer between processors is uni-directional, leading to easy implementation. Low-cost FIFO (first-in-first-out) memories may be used, which also simplify synchronization between processors. A special case occurs when all the prime factors are identical, in which case this approach becomes equivalent to the standard pipeline FFT [161]. The linear processor-pipeline architecture is a viable solution for specific FFT algorithms in such cases, but becomes inconvenient when many different algorithms are to be implemented.
[Figure: space-time diagram, processor against time, in which processors S1, S2 and S3 execute the task groups Tn1, Tn2, Tn3, ... in step with one another]
Fig.5.8 Ideal space-time diagram for a size $n_1 n_2 n_3 n_4$ DFT
The second and more versatile approach uses a multiprocessing array type architecture. It is built upon the observation that for the speedup to approach the theoretical maximum, the tasks allocated to the processors must be of similar time durations. A convenient method is to use all processors in parallel for the processing of each group of WFT's, as depicted ideally in the space-time diagram of Fig.5.8. This also has the advantage that the same task allocation concept can be applied to different DFT sizes without significant changes to the way the WFT's are shared. The difficulty with this approach is the added complexity in the organization and transfer of data needed so that the processors are not slowed down by such operations. Considering again the diagrammatic representation of the algorithm shown in Fig.5.5, a possibility is to separate the permutation operations from the WFT's as shown in Fig.5.9. The advantages of this partitioning are, firstly, that the permutation operations need not be duplicated by different processors, as would be necessary if the permutation operations resided in the processors which also carry out the WFT's. Secondly, it leads to efficient data transfer, since only the data required for the WFT's need be transferred to the processors executing them. This would result in an architecture resembling a star network, shown in Fig.5.10a. The input data is buffered and transferred into the 'hub' processor which performs all permutation operations. Consideration of the space-time diagram of Fig.5.8 shows that the hub processor has to transfer data to the WFT processors at a rate higher than or equal to the rate of execution of a single WFT multiplied by the number of processors. This is highest when the WFT is shortest, which is $n = 2$. From Appendix 5A this is equivalent to 4 complex additions, which normally require about the same number of processor cycles as the transfer of the 2 input data samples by the hub processor reading from the input buffer and writing to the dual-port memory. This implies that the hub processor must operate at a rate equal to the sum of the rates of the WFT processors. On the other hand, when the WFT's are of a higher order, for example $n = 5$, the amount of computation per WFT is a factor higher and the operation rate required of the hub processor is lowered by the same factor. Hence either the utilization of this processor will be very low when high-order WFT's are performed, or the system becomes input-bound when low-order WFT's are performed.
A compromise to avoid this deficiency is to distribute the permutation operations to the WFT processors. These are interconnected using a multiport memory as shown in Fig.5.10b. Data are transferred directly between the WFT processors and the multiport memory. This places a high memory bandwidth requirement on the multiport memory but eliminates the need for the hub processor to organize input data for the WFT's. Special I/O buffering of this multiport memory is required to avoid access conflict. A design applicable to a single-chip signal processor is described in section 5.4.
Also, the WFT processors must now be notified of the algorithm to be performed on any given group of data in order to determine the permutation schemes and hence the addresses of input and output data. This therefore still requires the existence of the hub processor, the purpose of which is the overall control of the system, while all arithmetic operations are performed by the WFT processors. The procedures for the data transfer and overall control of the operations are described in the following sections.

[Figure: M permutation processors Pp1 to PpM feeding N WFT processors Pw1 to PwN]
Fig.5.9 Separation of permutation and WFT operations

[Figure: (a) hub processors Pp1 to PpM connected through memories to the WFT processors Pw1 to PwN; (b) hub processors Pp1 to PpM and WFT processors Pw1 to PwN connected through a common memory with data I/O]
Fig.5.10 Architectures for sharing of WFT operations
5.3.2.3.2 Overall Operation 
The general strategy is to perform all housekeeping tasks using the hub processor. These include the tasks required in the event of a system reconfiguration, as well as the control of the DFT operations. On the former, the process required in response to a change in the DFT size is shown in the flow-chart of Fig.5.11. The algorithm is determined by successively factorizing the DFT size to give at least one prime factor at each factorization. This systematic approach results in a 'one-sided' structure for the algorithm such as that shown in Fig.5.5b. The structure is then communicated to the WFT processors, which use the information to determine the operations they are to perform for each input data block.
[Figure: flow-chart. Main procedure: begin; get DFT size; find prime factors; determine algorithm structure; update control information to WFT processors; end. Structure determination: decision branches select Good-Thomas or Cooley-Tukey index-mapping; the size is factorized to generate the left-hand branch of prime-length WFT's, and finally the right-hand branch of prime-length WFT's is generated]
Fig.5.11 Reconfiguration procedure for a change in DFT size
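A possible form of the structure determination is sketched below. It is an assumption rather than the procedure of Fig.5.11 itself: one prime factor is split off at each step, a prime that is coprime to the remainder is preferred so that the Good-Thomas mapping can be used, and the Cooley-Tukey mapping is used only when the split-off prime recurs in the remainder. The function names are hypothetical and a DFT size of at least 2 is assumed.

```python
def prime_factors(n):
    """Prime factors of n in ascending order, with multiplicity."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def one_sided_structure(size):
    """Return [(prime_length, index_mapping), ...], one entry per level."""
    remaining = sorted(prime_factors(size), reverse=True)
    plan = []
    while len(remaining) > 1:
        # Prefer a prime that occurs once: it is coprime to the rest.
        unique = [p for p in remaining if remaining.count(p) == 1]
        p = unique[0] if unique else remaining[0]
        remaining.remove(p)
        mapping = "Good-Thomas" if p not in remaining else "Cooley-Tukey"
        plan.append((p, mapping))
    plan.append((remaining[0], None))  # final prime-length WFT's, no further mapping
    return plan
```

For a 60-point DFT this yields the levels 5, 3, 2, 2, with the Cooley-Tukey mapping used only for the final split of 4 into 2 by 2, the lowest position in the hierarchy.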
The permutation of data can be efficiently implemented by a careful organization of the way the data samples are addressed in memory. Since the permutation operations are required for every DFT, it would be inefficient if the same index calculations were performed every time. Instead, much of this arithmetic can be avoided if the results of the index calculations are stored and reused by a table look-up technique. The contents of these look-up tables are effectively pointers to the addresses of the data samples. For each permutation operation, two sets of tables are generated, one for input and the other for output data indexing. An example of such tables is shown in Fig.5.12, which is generated for a 60-point DFT with the structure shown in Fig.5.13. The tables in each level are associated with a group of WFT's of size equal to the number of rows in the table. The input and output sequences are indexed in ascending order such that, for example, the 60-point sequences would be numbered from 0 to 59. The input sample indices to each WFT are then given by the entries in each column of the input tables, and likewise for the output of the WFT. The columns in the final level are simply the transposed rows of the previous level because the final permutation factorizes into two prime integers, and hence the final table is not needed in practice. The final group of WFT's can obtain the input indices by reading the previous table row-wise.
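The table look-up idea can be illustrated for a single Good-Thomas level. The sketch below builds input and output index tables for coprime factors $n_1$ and $n_2$; for $n_1 = 5$, $n_2 = 12$ the entries coincide with the Level 1 tables of Fig.5.12. The function names are hypothetical, the sign convention of $W_n$ is assumed, and the second-stage DFT's are evaluated directly rather than decomposed further.

```python
import cmath

def good_thomas_tables(n1, n2):
    """Input/output index tables with n1 rows and n2 columns (n1, n2 coprime)."""
    n = n1 * n2
    a = n2 * pow(n2, -1, n1)   # a = 1 (mod n1) and a = 0 (mod n2)
    b = n1 * pow(n1, -1, n2)   # b = 0 (mod n1) and b = 1 (mod n2)
    table_in = [[(a * r + b * c) % n for c in range(n2)] for r in range(n1)]
    table_out = [[(n2 * r + n1 * c) % n for c in range(n2)] for r in range(n1)]
    return table_in, table_out

def dft(x):
    """Reference DFT evaluated from the definition."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def dft_via_tables(x, n1, n2):
    """n1*n2-point DFT using only table look-ups for the index arithmetic."""
    table_in, table_out = good_thomas_tables(n1, n2)
    data = list(x)
    # Size-n1 DFT's on the columns, performed in place.
    for c in range(n2):
        idx = [table_in[r][c] for r in range(n1)]
        for i, v in zip(idx, dft([data[i] for i in idx])):
            data[i] = v
    # Size-n2 DFT's on the rows, written out through the output table.
    out = [0j] * (n1 * n2)
    for r in range(n1):
        res = dft([data[i] for i in table_in[r]])
        for c, v in enumerate(res):
            out[table_out[r][c]] = v
    return out
```

No twiddle factors appear because the two factors are coprime. A Cooley-Tukey level would need the extra multiplications described earlier.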
Strictly speaking, the number of tables increases with each permutation because each table represents the permutation of a row of the previous table. However, the total number of entries in the tables is the same for each level in the tree hierarchy of the algorithm because it is the number of samples in the input data block. Hence the amount of storage required for the look-up tables is proportional to the number of levels in the hierarchy, which is the number of prime factors of the DFT size minus one.
The generation of the tables is performed by the WFT processors at the request of the hub processor at reconfiguration. Alternatively the hub processor may be used to generate these tables, but their transfer via the multiport memory to the WFT processors will be time-consuming when the DFT size is large. Since each WFT processor only carries out a portion of the WFT's in a given level, it only requires the corresponding portion of the look-up tables in that level. Each processor then generates and stores its own part of the look-up tables and no duplication is needed. The storage requirement for the tables therefore remains the same for the multiprocessor implementation of this permutation technique.
It may be more straightforward, in the generation of the tables, for all the entries at a level to be generated so that the entries of the next level can be generated simply. This results in only a small increase in the memory requirement since the extra entries can be discarded once the next level is generated, and at most only two levels of complete tables are in existence.

Input look-up tables

Level 1
 0 25 50 15 40  5 30 55 20 45 10 35
36  1 26 51 16 41  6 31 56 21 46 11
12 37  2 27 52 17 42  7 32 57 22 47
48 13 38  3 28 53 18 43  8 33 58 23
24 49 14 39  4 29 54 19 44  9 34 59

Level 2
 0 45 30 15 36 21  6 51 12 57 42 27 48 33 18  3 24  9 54 39
40 25 10 55 16  1 46 31 52 37 22  7 28 13 58 43  4 49 34 19
20  5 50 35 56 41 26 11 32 17  2 47  8 53 38 23 44 29 14 59

Level 3
 0 30 40 10 20 50 36  6 16 46 56 26 12 42 52 22 32  2 48 18 28 58  8 38 24 54  4 34 44 14
45 15 25 55  5 35 21 51  1 31 41 11 57 27 37  7 17 47 33  3 13 43 53 23  9 39 49 19 29 59

Level 4
 0 45 40 25 20  5 36 21 16  1 56 41 12 57 52 37 32 17 48 33 28 13  8 53 24  9  4 49 44 29
30 15 10 55 50 35  6 51 46 31 26 11 42 27 22  7  2 47 18  3 58 43 38 23 54 39 34 19 14 59

Output look-up tables

Level 1
 0  5 10 15 20 25 30 35 40 45 50 55
12 17 22 27 32 37 42 47 52 57  2  7
24 29 34 39 44 49 54 59  4  9 14 19
36 41 46 51 56  1  6 11 16 21 26 31
48 53 58  3  8 13 18 23 28 33 38 43

Level 2
 0 15 30 45 12 27 42 57 24 39 54  9 36 51  6 21 48  3 18 33
20 35 50  5 32 47  2 17 44 59 14 29 56 11 26 41  8 23 38 53
40 55 10 25 52  7 22 37  4 19 34 49 16 31 46  1 28 43 58 13

Level 3
 0 15 20 35 40 55 12 27 32 47 52  7 24 39 44 59  4 19 36 51 56 11 16 31 48  3  8 23 28 43
30 45 50  5 10 25 42 57  2 17 22 37 54  9 14 29 34 49  6 21 26 41 46  1 18 33 38 53 58 13

Level 4
 0 30 20 50 40 10 12 42 32  2 52 22 24 54 44 14  4 34 36  6 56 26 16 46 48 18  8 38 28 58
15 45 35  5 55 25 27 57 47 17  7 37 39  9 59 29 19 49 51 21 11 41 31  1  3 33 23 53 43 13

Fig.5.12 Look-up tables for 60-point DFT

[Figure: tree structure with WFT sizes 5 (Level 1), 3 (Level 2), 2 (Level 3) and 2 (Level 4)]
Fig.5.13 Algorithm structure for 60-point DFT of Fig.5.12
Dividing the look-up tables amongst the processors is made easier by the fact that it is not necessary to distinguish one table from another, because the WFT's obtain input data according to the contents of the columns regardless of the table in which a particular column is located. Hence the tables in the same level of the hierarchy may conceptually be merged together so that the dimensions of the single table are $n_j$ by $\prod_{r \ne j} n_r$. The columns may then be distributed among $P$ processors in a round-robin manner so that the processors are assigned the columns with indices $mP, mP+1, mP+2, \ldots, mP+P-1$, where $m$ is an integer, until all columns are allocated. This division also fits in with the process schedule described in the next section.
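The round-robin distribution amounts to giving processor $p$ every column whose index is congruent to $p$ modulo $P$. A minimal sketch, with a hypothetical function name, is:

```python
def assign_columns(num_columns, num_processors):
    """Processor p receives columns p, p + P, p + 2P, ... of the merged table."""
    return [list(range(p, num_columns, num_processors))
            for p in range(num_processors)]
```

For the 12 columns of the Level 1 table of a 60-point DFT shared among 3 processors, `assign_columns(12, 3)` gives `[[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]`. When the number of columns is not a multiple of $P$, the last processors receive one column fewer.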
The overall operation is closely related to the look-up tables. The WFT's on the input data are performed 'in-place', such that the results from these WFT's are stored back to the locations of the input data. The indices of the input data to the WFT's are given in the column entries of the look-up tables. Referring to Fig.5.12 again, the WFT's of sizes from $n_1$ to $n_{\alpha-1}$ are successively performed on the columns until the final group of WFT's, of size $n_\alpha$, which are performed on the rows of the last set of tables. This completes the DFT operations, but permutation operations on the output data are still required to put the data in the correct order. In general, output permutation is required at every level of the hierarchy as shown in Fig.5.12. However, the one-sided structure of the algorithms here means that all output permutation operations are performed in succession, after all WFT's are completed. The mappings may then be combined to give one single step of output permutation, which not only simplifies the overall operation but also reduces the storage requirements by having only one look-up table for output permutations. The steps in the overall operation of a DFT on each input block of data are summarized in the flow-chart of Fig.5.14. The twiddle-factor multiplications required by the Cooley-Tukey permutation scheme are easily incorporated into the WFT calculations.
[Figure: flow-chart. Begin; for each level i in turn: read input data using columns of table in level i, perform the WFT's of that level, write results back to input data buffer; after the last such level: read input data using rows of table in level α-1, perform the final group of WFT's, write results to output data buffer using output look-up tables; end]
Fig.5.14 DFT operations for each input data block
The storage requirement for the look-up tables for some examples of different DFT sizes is shown in Table 5.3, with two bytes assumed for each entry in the look-up tables.

DFT size          16     40     80     120    256    512    1000
Storage (KByte)   0.13   0.31   0.78   1.17   4.0    9.0    11.7

Table 5.3. Storage requirement for look-up tables

The storage requirements above can be easily achieved in practice. The values for the power-of-2 DFT's are relatively higher because of the larger number of factors. The radix-2 DFT's in fact represent the maximum storage requirements of DFT's of comparable sizes because they always have the maximum number of prime factors.
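The values of Table 5.3 can be reproduced by counting one table of $N$ entries per input level (the number of prime factors minus one) plus one combined output table. This counting rule is inferred from the description above and happens to match the tabulated values; it is not stated as a formula in the text, and the function names are hypothetical.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def lookup_table_kbytes(size, bytes_per_entry=2):
    """Storage for (alpha - 1) input tables plus one output table, in KByte."""
    num_tables = len(prime_factors(size))  # (alpha - 1) + 1
    return num_tables * size * bytes_per_entry / 1024
```

For example, a 1000-point DFT has six prime factors, giving 6 x 1000 x 2 bytes, or about 11.7 KByte.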
5.3.2.3.3 Data Input/Output and Process Scheduling 
Process scheduling for the processors is relatively trivial because the DFT algorithm is deterministic, and so is the process schedule. In addition, the algorithm is essentially a synchronized parallel algorithm, such that a process to perform the WFT's of a certain level in the hierarchy cannot start until the process performing the WFT's in the preceding level has completed. As each of these processes consists of equal-length WFT's, the earliest-scheduling strategy [174] may be used to achieve the optimal schedule under the circumstances. This means that the WFT's are equally shared amongst the processors such that the process schedule has ideally the form shown in Fig.5.15, for a 3-processor system. The diagram also illustrates two external factors which affect the efficiency of this scheduling strategy. Firstly, if the number of WFT's of a certain size is not divisible by the number of processors to which they are assigned, there will be significant idle time at the end of the computation for this group of WFT's. This is inevitable, but with a large ratio between the number of WFT's and the number of processors the mean utilization of the processors is usually not greatly affected. The second factor is the dependence of a group of WFT's on the output data from the WFT's on the preceding level. Since it is always possible that the first WFT of the next group operates on the output data from the last WFT in the present group, the former cannot start until the latter has completed, causing more idle time. Strictly speaking, this can be avoided by arranging the order of execution of the WFT's such that none of them is dependent upon the immediate output from the previous ones, but it would require a more complicated algorithm for allocating the columns of the look-up tables among the WFT processors, which would be time-consuming at reconfiguration.
[Figure: space-time diagram, processor against time, for WFT processors Pw1, Pw2 and Pw3 executing tasks T1(1) to T1(3), T2(1) to T2(3) and T3(1) to T3(3), with idle time marked. Annotations: hub processor writes input data to multiport memory and WFT processors read input data; first level of WFT's completed and WFT processors read input data for next level of WFT's]
Fig.5.15 Space-time diagram for WFT processors
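The first of the two effects, idle time when the number of WFT's is not divisible by the number of processors, can be quantified with a simple sketch. It assumes that all WFT's in a level take the same time and ignores data transfer and the dependence between levels; the function name is hypothetical.

```python
from math import ceil

def level_utilization(num_wfts, num_processors):
    """Mean processor utilization for one level of equal-length WFT's."""
    rounds = ceil(num_wfts / num_processors)   # time slots needed by the busiest processor
    return num_wfts / (rounds * num_processors)
```

For the twelve 5-point WFT's of a 60-point DFT, three processors are fully used, whereas five processors need three rounds for twelve WFT's and reach a utilization of 0.8. With twenty WFT's on three processors the loss is already below 5%.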
Process synchronization is facilitated by the hub processor acting as the monitor, with communications between processors realized by semaphores via the multiport memory. This involves the notification of reconfigurations and the activation of each level of WFT calculations. Both of these can be achieved by semaphores set by the hub processor and monitored by the WFT processors. On the latter, the WFT processors must signal the completion of a group of WFT's, which is monitored by the hub processor. When the final group of WFT's is completed, the hub processor sets the semaphore which starts the next group of WFT's. This scheme remains applicable for the initial synchronization for every new block of input data, when the WFT operations cannot start until all input data have been written by the hub processor to the multiport memory.
5.3.3 Parallel Implementation of FIR Filters 
Parallel implementation of the polyphase filters is not normally necessary because the decimation performed at the input to each filter and the polyphase decomposition of the prototype result in very short polyphase branches. However, it is entirely possible in larger systems that the input sampling rate is very high in relation to the number of analysis channels, which is equal to the decimation rate, such that the processing effort of a polyphase branch requires more than one processor. A solution for such an implementation is described in this section.
5.3.3.1 Parallel Algorithms for FIR filters 
Multiprocessing implementation of FIR filters is less complex than that of the DFT. An obvious solution is to carry out the convolution in the frequency domain, in which case the previous parallel DFT architecture can be applied. However, the complexity of this realization is considerable and simpler techniques would be more straightforward to implement. A well-known technique for parallel processing of FIR filters is the use of the residue number system [175] to break down the convolution into smaller units which can be implemented using ROM look-up tables as multipliers [176]. This approach is more suited to VLSI implementation, where the wordlengths of the arithmetic units are important factors for trade-off. For single-chip processor implementation it does not offer high efficiency. Another technique is to implement the filters using the cascade realization [115], where the individual cascade sections can be assigned to different processors, resulting in a pipeline architecture. The difficult issue in this technique is the critical nature of the grouping of the zeros of the transfer function to form the cascade sections so as to avoid extreme gain characteristics in the individual frequency responses of the sections. Obtaining the zeros from the impulse response also requires much computation, which does not ease system reconfiguration. For these reasons, a more straightforward approach is N-path digital filtering [177], in which the FIR filter is transformed into N parallel filter branches, the impulse responses of which consist of samples from the original impulse response. It is a special case of block-processing filters [178-180], which can be less complex in terms of fewer delay elements when the direct implementation of the block-processing structure is used [181]. This structure is not as modular as the N-path structure and hence not as suitable for multiprocessor implementation.
The N-path structure is shown in Fig.5.16a. The branch filters $H_i(z)$ are related to the system transfer function $H(z)$ by

$$H(z) = \sum_{i=0}^{N-1} H_i(z^{N})\, z^{-i} \qquad (5.22)$$

Since the transfer function is related to the impulse response by

$$H(z) = \sum_{k=0}^{L-1} h(k)\, z^{-k} \qquad (5.23)$$

for a length-$L$ filter, where $h(k)$ are the impulse response samples, the transfer functions of the branches are given by

$$H_i(z) = \sum_{k=0}^{M-1} h(kN + i)\, z^{-k} \qquad (5.24)$$
where $M = L \,\mathrm{div}\, N + 1$ for $i = 0, 1, \ldots, (L \bmod N) - 1$, and $M = L \,\mathrm{div}\, N$ for $i = L \bmod N$ to $N - 1$, with $x \,\mathrm{div}\, y$ representing the integer part of $x/y$.
This is in fact the formulation for the polyphase decomposition of FIR filters [106] into N polyphase branches, but since there is no sample-rate conversion in the system, the form of (5.22) does not lead to that of polyphase filter networks. In particular, although the rate of multiplication for each branch filter is reduced by a factor N relative to the original filter, the input samples are not decimated in the branch filters, where all samples are required. Hence the total number of delay elements in the system, and therefore the storage requirement, is increased by a factor of N, as shown more clearly in Fig.5.16b. This is not a real problem in practice since the maximum filter lengths in the applications here are in the order of a few hundreds, representing a storage requirement of around 1 KByte per branch.
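The decomposition of (5.22) and (5.24) can be checked numerically with a short sketch. The function names are hypothetical; each branch keeps every N-th coefficient of the impulse response, and its taps act on the full-rate input delayed by $kN + i$ samples.

```python
def fir_direct(h, x):
    """Direct-form FIR filter, y(n) = sum over k of h(k) x(n - k)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def n_path_branches(h, num_paths):
    """Branch i keeps the coefficients h(kN + i), k = 0, 1, ..."""
    return [h[i::num_paths] for i in range(num_paths)]

def fir_n_path(h, x, num_paths):
    """Sum of the N branch outputs H_i(z^N) z^-i applied to the same input."""
    y = [0.0] * len(x)
    for i, branch in enumerate(n_path_branches(h, num_paths)):
        for n in range(len(x)):
            for k, tap in enumerate(branch):
                m = n - (k * num_paths + i)   # delay of kN + i samples
                if m >= 0:
                    y[n] += tap * x[m]
    return y
```

For a length-7 filter with $N = 3$ the branches have 3, 2 and 2 coefficients, and `fir_n_path` gives the same output as `fir_direct`.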
5.3.3.2 Processor Interconnection and System Operation 
The design of the multiprocessing architecture is again considered with respect to the ease of data transfer and system reconfiguration. With each transfer function $H_i(z)$ implemented in one processor, there is one output sample from each processor for each input sample, which must be distributed to all processors. However, since there is no interaction between processors except for the summation of the outputs from all processors, it is not strictly necessary to provide means of communication between processors such as the multiport memory used in the DFT. An architecture as shown in Fig.5.17 will then be efficient for data I/O.
[Figure: (a) N-path structure in which the input x(n) feeds the parallel branches H0(z), H1(z), ... through successive unit delays; (b) equivalent tapped delay-line form, in which each branch is a chain of z^-N delays with coefficients h0, hN, h2N, ..., hkN in the first branch, h1, hN+1, h2N+1, ..., hkN+1 in the second, and so on to hN-1, h2N-1, h3N-1, ... in the last, with the branch outputs summed to give y(n)]
Fig.5.16 N-path implementation of FIR filter
The hub processor is responsible for the transfer of data to and from the N-path filter system as well as system reconfiguration. Communication between the hub processor and the FIR processors is carried out using dual-port memory. In addition, the FIR processors are connected by FIFO's, which are designed to optimize the data I/O procedure.
[Figure: data I/O connected to the hub processor, which communicates through dual-port memories with the FIR processors P1, P2, ...; adjacent FIR processors are linked by FIFO's]
Fig.5.17 Processor interconnection for N-path FIR filter
The architecture offers two efficient ways of data transfer and overall operation, and
the choice between them is a matter of tradeoff between memory bandwidth requirement
and operational complexity. Assuming that the processor P_i implements the transfer
function H_i(z), the first method is to transfer input data to M_1 only, after which
the input data are transferred successively to the other processors via the FIFO's. The
output data from each branch are transferred to the hub processor, where their summation is
carried out. By placing the summation of output data externally at the hub processor, this
method requires the minimum of one data write per data read operation by the FIR
processors. The drawback is that the time relationship between the output data samples from
different branches is modified by the delay in the FIFO's and the FIR processors. The
summation must account for this, which requires storage at the hub processor and
more organization in its operation. The second way is for the summation to be carried out
by the FIR processors. The overall operation is simplified in that, for an input data sample,
the output from the processor using this input data is simply added to the accompanying
intermediate sum of the output from all previous branches. This intermediate sum is then
transferred to the next processor via the FIFO's together with the previous input data
sample. This is carried out in order to implement the unit sample delay between successive
branches. Input data are transferred to the processor P_1 and output from the N-path filter
is obtained from processor P_N, which obviates the need for the dualport memories M_2
to M_{N-1}, and the architecture becomes a linear pipeline. The drawback of this method is
that there is an extra data read operation of the intermediate sum for each input data
sample for the filter branch. When the filter length is long this increase in data I/O
operations is insignificant compared to the arithmetic operations, and this method has the
advantage of simpler operation over the previous one.
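The second method can be illustrated with a minimal Python sketch. It models only the
arithmetic of the linear pipeline (a running sum handed from branch to branch, with one unit
delay per stage), not the FIFO timing or the processor hardware. The function names and the
test data are illustrative assumptions and do not appear in the original work.

```python
def direct_fir(h, x):
    """Reference FIR filter: y(n) = sum_k h[k] * x(n - k)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def n_path_pipeline(h, x, n_paths):
    """N-path form: branch i holds the taps h[i], h[i + N], h[i + 2N], ..."""
    branches = [h[i::n_paths] for i in range(n_paths)]
    y = []
    for n in range(len(x)):
        running_sum = 0  # intermediate sum passed from one branch to the next
        for i, taps in enumerate(branches):
            # Branch i works on the input delayed by i samples (one unit
            # delay per pipeline stage); its own taps are N samples apart.
            for m, tap in enumerate(taps):
                idx = n - i - m * n_paths
                if idx >= 0:
                    running_sum += tap * x[idx]
        y.append(running_sum)
    return y


h = [1, 2, 3, 4, 5, 6, 7]
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
assert n_path_pipeline(h, x, 3) == direct_fir(h, x)
```

The final assertion checks that accumulating the branch outputs in pipeline order gives the
same result as the direct convolution.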
5.4 Efficient Techniques for Digital Signal Processors 
Application of the parallel architectures and algorithms using programmable signal 
processors can be made more efficient by taking advantage of the special characteristics of 
the processors. In particular the aspects likely to cause inefficiency in practical implementa-
tion are examined and some techniques for their improvement are described. The TMS320 
family of digital signal processors is the subject of investigation although the techniques 
are applicable to many other similar processors. 
5.4.1 Use of Multiplier-Accumulator in N-path implementation 
Most digital signal processors are equipped with a hardware multiplier-accumulator 
(MA) making it very efficient to perform the operation of the kind 
$$ y = \sum_{i=0}^{K-1} a(i)\, b(i) $$
where the combined multiply-add operation only requires a single processor cycle. 
The N-path filter branch has transfer function H_i(z^N) and the MA cannot be applied
directly. The technique is to organize the input data efficiently to allow use of the MA
without proportionally increasing the amount of processing in the necessary rearrangement of
the data. Using network identities for multirate networks [106], the transfer function
H_i(z^N) can easily be proven to be equivalent to the network shown in Fig.5.18. Since the
branches in Fig.5.18 are now H_i(z), which implies the normal convolution, the MA can be
used. The organization of the input data is also convenient because it is similar to the
polyphase decomposition of the signal. Since there are a total of N branches, there need to
be N separate buffers in which the data are updated and shifted in turn. A convolution
operation is carried out for the buffer where the data update and shift is executed, with
the coefficients for the convolutions being the same for all buffers. All of these operations
can be implemented very efficiently using a combination of the RPT and MAC instructions
of the TMS320 processor [182] to achieve single-cycle multiply-accumulate.

[Figure not reproduced: X(z) feeds N filter branches, each containing H_i(z), whose combined
output is Y(z) = H_i(z^N) X(z).]
Fig.5.18 Realization for a path filter of an N-path structure
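The buffer organization described above can be sketched in Python as follows. This is a
behavioural illustration only: the inner sum over a contiguous buffer stands in for the
RPT/MAC loop, and the function names are assumptions made for this sketch.

```python
def sparse_filter_direct(coeffs, x, n):
    """Direct evaluation of H_i(z^N): y(t) = sum_m coeffs[m] * x(t - m*n)."""
    return [sum(c * x[t - m * n] for m, c in enumerate(coeffs) if t - m * n >= 0)
            for t in range(len(x))]


def sparse_filter_buffers(coeffs, x, n):
    """Same filter using N buffers that are updated and shifted in turn."""
    buffers = [[0] * len(coeffs) for _ in range(n)]
    y = []
    for t, sample in enumerate(x):
        buf = buffers[t % n]   # the buffer whose turn it is
        buf.insert(0, sample)  # data update and shift
        buf.pop()
        # Ordinary convolution over contiguous data, with the same
        # coefficients for every buffer: the form a hardware MA handles well.
        y.append(sum(c * b for c, b in zip(coeffs, buf)))
    return y


coeffs = [2, -1, 3]
x = [1, 4, -2, 0, 5, 7, -3, 2, 6, 1]
assert sparse_filter_buffers(coeffs, x, 4) == sparse_filter_direct(coeffs, x, 4)
```

Each buffer only ever sees every N-th input sample, so the taps that are N samples apart in
H_i(z^N) become adjacent entries in the buffer.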
5.4.2 Prime-Length DFT using Multiplier-Accumulator
Another fundamental operation directly influencing the efficiency of the TMUX algorithms
is the WFT of small prime lengths. A procedure has been reported [183] which
devises DFT algorithms for use with the MA to reduce the number of operations for small
prime-length DFT's. The algorithms for the DFT sizes of 3, 5, and 7 are shown in Appendix
5B. The computational requirements in terms of MA operations are shown in Table 5.4,
together with the combined numbers of multiplies and adds for the Winograd DFT.
DFT size    DFT using MA    Winograd DFT
    2             4               4
    3            12              16
    5            40              44
    7            72              88
   11           160             208
   13           204             228
Table 5.4. Operation counts for DFT using MA
Compared to the operation counts of the Winograd DFT algorithm as shown in Appendix 5B,
the reduction is generally small, ranging from zero for the size-2 DFT to a maximum
of around 20% for the others, but is still useful when computational resources are
costly.
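The relative savings can be read off Table 5.4 directly. The short Python sketch below simply
evaluates the percentage reduction for each DFT size from the tabulated counts; the variable
and function names are illustrative.

```python
# (DFT using MA, Winograd DFT) operation counts copied from Table 5.4
TABLE_5_4 = {2: (4, 4), 3: (12, 16), 5: (40, 44),
             7: (72, 88), 11: (160, 208), 13: (204, 228)}


def reduction_percent(ma_ops, winograd_ops):
    """Saving of the MA algorithm relative to the Winograd operation count."""
    return 100.0 * (winograd_ops - ma_ops) / winograd_ops


reductions = {size: reduction_percent(*ops) for size, ops in TABLE_5_4.items()}
for size in sorted(reductions):
    print(f"size {size:2d}: {reductions[size]:5.1f} % fewer operations")
```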
5.4.3 Global Memory Interface Design 
As explained in section 5.3.3, the multiport memory, which is a global memory, is
accessed by a number of WFT processors and has a very high memory bandwidth requirement.
To achieve this, an interfacing technique is devised which prevents memory access conflict
and hence allows the bandwidth of the memory components to be fully utilized.
The technique can in general be applied to any type of processor, although in the following
description some particulars of the TMS320C25 processor are exploited. It was originally
designed for zero-wait-state global memory for two processors, but it is shown here that it
can be generalized to apply to any number of processors given that some conditions on the
access time of the memory components are satisfied.
[Figure not reproduced: timing diagrams of the signals CLKOUT1, CLKOUT2, STRB, Address and
Data against quarter-phase for (a) read cycle and (b) write cycle.]
Fig.5.19 Read/write timing of the TMS320C25 processor
The memory read and write timing for the TMS320C25 processor is shown in
Fig.5.19a,b respectively. For zero-wait-state memory read, data must be ready at about 2
quarter-phases after STRB is asserted at the end of quarter-phase 3. It can be seen that,
since there can only be one memory access per processor cycle, the memory is only
accessed half of the time at maximum, and hence with some interfacing logic the same
memory can be shared as a global memory between two processors with zero-wait-state
access. This was the basis for the original contribution. This idea can be extended to more
than two processors based on the observation that if the processor cycle is N times the
memory access time, theoretically there can be a maximum of N memory accesses per cycle,
and hence a global memory with zero-wait-state shared by N processors is possible. The
interface logic must then perform the buffering of addresses from the processors to the
memory and of the data between them. The difficulty with more than two processors is
in the buffering for data write operations. Since data is not ready until about one
quarter-phase after the address is asserted, a data write from one processor will interfere
with memory accesses by the others. The only solution is to buffer the data and address and
perform the write operation at the next processor cycle, which means that there may be a
maximum of two memory access operations by a processor in any processor cycle. This applies
to half of the total number of processors, giving up to about 3N/2 accesses per cycle, and
hence the maximum access time of the RAM needs to be a fraction of about 2/(3N) of the
processor cycle. This is illustrated by the example below.
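As a rough numerical check of this condition, the following Python sketch reads the argument
above as "half of the N processors may need two accesses in a cycle", i.e. about 3N/2 memory
accesses per processor cycle. The 100 ns processor cycle is an assumption taken from the
multiply/add time of the TMS320C25 used in the design example, and the function name is
illustrative.

```python
def max_ram_access_time_ns(cycle_ns, n_processors):
    """Upper bound on RAM access time for zero-wait-state sharing.

    Assumes half of the processors need two accesses in a cycle (a read
    plus a buffered write) and the other half need one.
    """
    accesses_per_cycle = 1.5 * n_processors
    return cycle_ns / accesses_per_cycle


# Four processors with an assumed 100 ns cycle
bound_ns = max_ram_access_time_ns(100.0, 4)
print(f"RAM access time must be below about {bound_ns:.1f} ns")
```

For four processors this bound is of the same order as the access time of fast static RAM.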
[Figure not reproduced: (a) block diagram of the RAM connected through the interface logic
to processors TMS 1, TMS 2, ..., TMS N; (b) timing of the read and write cycles against
quarter-phase, showing the address to the RAM, the data from the interface to TMS1 and TMS2,
and the data from the interface to the RAM.]
Fig.5.20 Multiport memory interface for TMS320C25
The block diagram of the concept is shown in Fig.5.20a, and an example of the timing
relationship for four processors is shown in Fig.5.20b. The logic functions for realizing
these operations are shown in Fig.5.21 for interfacing two processors operating in
synchronized clock phase. The logic functions for the other two processors are exactly the
same. This example can be readily implemented using commonly available fast TTL and
static RAM with access time of around 15ns. In the diagram of Fig.5.20b, the timing for
two processors operating in synchronized clock phase is shown. This can be achieved by
means of the SYNC input of the processors and a common clock. The other two processors
in this example would have clock phases differing by two quarter-phases from the ones
shown, hence their memory accesses will not interfere with each other.
[Figure not reproduced: logic diagram. Symbols correspond to TMS320C25 signals, with the
trailing number denoting the two different processors.]
Fig.5.21 Interface logic for four TMS320 sharing global memory
As illustrated in the figure, the difficulty in the design of the interface logic is the
lack of events by which the logic can be driven. For a number of processors less than four,
the quarter-phase signals CLKOUT1 and CLKOUT2 can be used, with possibly some small
delay as shown in Fig.5.21, to generate the events to trigger actions such as latching the
data from the memory to a buffer on the processor data bus. When more than four processors
are present, the only solution would be to use an external clock for the processors
which is divided from a high rate clock, which can then be used for generating the events.
The logic functions would be more complex and the timing 
5.5 System Realization and Design Examples 
The methodology for the algorithms and architecture of the components in the
TMUX has been described. The issues in connecting these components together and providing
means of reconfiguration of the system are discussed here by means of two design
examples which illustrate the application of the overall methodology described in this
chapter. TMUX's using the two structures, namely the DFT convolution and the
analysis/synthesis filter bank methods, are considered separately.
5.5.1 DFT Convolution Realization
5.5.1.1 System Realization 
The main functions of this structure are the input DFT, frequency-domain multiplication,
and the inverse DFT's (IDFT). The input DFT can be implemented by using
directly the parallel architecture shown in section 5.3.2. The frequency-domain
multiplication is in the same form as the twiddle factor multiplication in the Cooley-Tukey
permutation scheme and can therefore be carried out in the same way by the WFT processors.
The application of the parallel DFT architecture to the IDFT's is less straightforward
because it involves a number of IDFT's of generally different sizes operating on different
portions of the data from the frequency-domain multiplication. However, they can be
treated by a similar procedure as the input DFT in two aspects. Firstly, the IDFT X_IDFT(k)
of a sequence x(n) can be expressed as

$$ X^{*}_{\mathrm{IDFT}}(k) = \frac{1}{N} \sum_{n=0}^{N-1} x^{*}(n)\, W_N^{kn} \qquad (5.25) $$

where the superscript * denotes complex conjugate.
The IDFT can thus be obtained by any DFT algorithm operating on the complex conjugate
of the original sequence, and then taking the complex conjugate of the output sequence.
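The conjugation property can be checked numerically with a minimal Python sketch. A plain
O(N^2) DFT stands in for whatever DFT algorithm is available; the function names are
illustrative and the 1/N scaling follows (5.25).

```python
import cmath


def dft(x):
    """Plain DFT with W_N = exp(-j*2*pi/N); stands in for any DFT algorithm."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft_via_dft(x):
    """IDFT obtained by conjugating the input and the output of a forward DFT."""
    n = len(x)
    conjugated_input = [complex(v).conjugate() for v in x]
    return [v.conjugate() / n for v in dft(conjugated_input)]


signal = [1 + 2j, -0.5j, 3.0, 2 - 1j, 0.25 + 0.75j, -1.5]
recovered = idft_via_dft(dft(signal))
assert all(abs(a - b) < 1e-9 for a, b in zip(recovered, signal))
```

The round trip through the forward DFT and the conjugate-based IDFT returns the original
sequence to within rounding error.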
Secondly, the partitioning of the input sequence into many shorter sequences for the
IDFT's is not fundamentally different from the permutation operations in the input DFT,
in the sense that both involve a number of smaller DFT modules operating on subsets of
the input data sequence, the indices of which are predetermined by the algorithms. The
IDFT's are only more complex in that instead of a single input sequence, the input data
itself is to be treated as many sequences of generally different lengths. The same
methodology for constructing parallel algorithms and architectures can therefore be applied
to the IDFT's, with the IDFT's shared among a number of processors interconnected by a
global multiport memory as shown before. Process scheduling is identical to the DFT case
with the exception that the IDFT's themselves are totally independent of each other and it
is not necessary to synchronize the start and finish of each with the others.
Look-up tables can be used as before to implement the process allocation and
scheduling. The efficient allocation of the IDFT's among multiple processors depends on
the specific numbers and sizes of the IDFT's. Since the IDFT's are independent of each
other, the general principle is to allocate operations associated with an IDFT to the same
processor in order to minimize idle time resulting from the sharing of computation of a
single IDFT among a number of processors. The idle time in this method of allocation will
be due to the differences between the computational requirements of the groups of IDFT's
allocated to different processors. In a practical system consisting of a large number of
channels of various bandwidths, appropriate groupings of the channels can usually be found
so that such differences are small. In the cases where this is not satisfactory, sharing of
the larger IDFT's among the processors can be resorted to. The increase in processor
utilization is then at the expense of increased complexity in process allocation and
scheduling, and consequently a system reconfiguration will also be more complex.
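One simple way of finding such groupings is sketched below in Python. It is a greedy
"largest job first" heuristic chosen here purely for illustration; it is not the allocation
procedure of the original work, and the cost values are hypothetical cycle counts.

```python
import heapq


def allocate_idfts(costs, n_processors):
    """Assign whole IDFT's to processors, largest first, to the least loaded one.

    costs[i] is the (hypothetical) cycle count of IDFT i. Returns the list of
    IDFT indices per processor and the resulting load of each processor.
    """
    heap = [(0, p) for p in range(n_processors)]  # (current load, processor)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_processors)]
    for idx in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, proc = heapq.heappop(heap)
        assignment[proc].append(idx)
        heapq.heappush(heap, (load + costs[idx], proc))
    loads = [sum(costs[i] for i in group) for group in assignment]
    return assignment, loads


groups, loads = allocate_idfts([10, 7, 5, 4, 2], 2)
idle = max(loads) - min(loads)  # load difference that would appear as idle time
print(groups, loads, idle)
```

Each IDFT stays on a single processor, so the only idle time is the difference between the
group loads, as described above.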
Transfer of data from the input DFT part to the IDFT part may be carried out by
FIFO or dualport memories connecting the two hub processors. This is illustrated in the
design example that follows.
5.5.1.2 Design Example 
The example considered here is specified as shown in Table 5.5. It is based on the
example shown in Table 3.2 of Chapter 3. The multiply/add time is the minimum time
required by the TMS320C25 to execute either multiply or add separately, or a
multiply-accumulate instruction. Complex input sampling is assumed, such that there may be a
total of twenty of the narrow-band channels of 16kHz each.
Input DFT size                   120
Maximum filter length            63
Narrowest channel bandwidth      16 kHz
Input sampling frequency         320 k samples/s
Processor multiply/add time      100 ns
Table 5.5. Design example specification

[Figure not reproduced: WFT sizes by level are Level 1: 5, Level 2: 3, Level 3: 2,
Level 4: 2, Level 5: 2.]
Fig.5.22 DFT algorithm of design example
Using the previous methodology for designing DFT algorithms, the input DFT algorithm
would then have a structure as shown in Fig.5.22. Assuming that MA algorithms
are used for the WFT modules, the number of MA operations can then be evaluated using
(5.20). The algorithm uses two Cooley-Tukey permutations, which result in multiplication
alone. Frequency-domain multiplication is also assumed to be carried out in this part
of the system. The resulting amount of arithmetic operations is shown in Table 5.6, with
the total operations being the sum of the MA and the multiplication since both require
the same number of processor cycles.
MA operations per DFT       2160
Multiplication per DFT      1200
MA rate                     11.9 M/s
Multiplication rate         6.6 M/s
Total operations            18.5 M/s
Table 5.6. Arithmetic operations of input DFT
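The entries of Table 5.6 can be cross-checked with a short Python sketch. It is not a
restatement of (5.20): it assumes that each level of size n in Fig.5.22 contains 120/n WFT
modules costing the MA counts of Table 5.4, and that successive DFT blocks advance by
120 - 63 + 1 = 58 input samples. The block advance is inferred here from the quoted rates
rather than stated in this section, and all names are illustrative.

```python
MA_OPS = {2: 4, 3: 12, 5: 40}    # MA operations per WFT module (Table 5.4)
LEVEL_SIZES = [5, 3, 2, 2, 2]    # WFT size at each level (Fig.5.22)
DFT_SIZE = 120
FILTER_LENGTH = 63
SAMPLE_RATE = 320e3              # samples per second (Table 5.5)
MULT_PER_DFT = 1200              # multiplications per DFT, taken from Table 5.6


def ma_ops_per_dft(dft_size, level_sizes, ma_ops):
    """Each level of size n holds dft_size / n modules of that size."""
    return sum((dft_size // n) * ma_ops[n] for n in level_sizes)


def blocks_per_second(sample_rate, dft_size, filter_length):
    """Assumed block rate: each block contributes N - L + 1 new samples."""
    return sample_rate / (dft_size - filter_length + 1)


ma_per_dft = ma_ops_per_dft(DFT_SIZE, LEVEL_SIZES, MA_OPS)
rate = blocks_per_second(SAMPLE_RATE, DFT_SIZE, FILTER_LENGTH)
ma_rate = ma_per_dft * rate / 1e6        # million MA operations per second
mult_rate = MULT_PER_DFT * rate / 1e6    # million multiplications per second
print(ma_per_dft, round(ma_rate, 1), round(mult_rate, 1),
      round(ma_rate + mult_rate, 1))
```

Under these assumptions the sketch gives 2160 MA operations per DFT and rates that round to
the tabulated 11.9, 6.6 and 18.5 M/s.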
To determine the number of processors required, details concerning the processors are
needed in order to estimate the number of processor cycles required for specific operations.
For the TMS320C25, data read/write requires loading the auxiliary registers with addresses
from the look-up tables followed by subsequent operations using the indirect addressing
mode. To minimize overhead in data access, each WFT is to load the data into on-chip
RAM locations so that subsequent reference to the data can then use direct addressing, and
the WFT can be efficiently implemented using 'straight-line' code. The number of
instruction cycles for the WFT's required in the example is shown in Table 5.7, based on the
algorithms of Appendix 5B.
WFT size    Number of Instruction Cycles
   2                  30
   3                  98
   5                 201
   7                 294
Table 5.7. TMS320 instruction cycles for WFT subroutines

The numbers of cycles for the subroutines are quite high in comparison with the
theoretical number of multiplications and additions required by the algorithms. This is
due to the data manipulation required both in the ordering of input/output data and in the
arithmetic operations themselves. A large proportion of the instruction cycles is used
directly in dealing with data I/O. For example, the data I/O instructions for the size-7
WFT amount to about 20% of the total. For the size-2 WFT, 24 out of the 30 cycles are
for this purpose. This is not a result of the multiprocessor implementation but a common
feature of mixed-radix DFT algorithms, which require complex index calculations. The
convolutions involved in the MA algorithms are of lengths (N-1)/2, where N is the DFT
length. Since N is very small for the subroutines required, the overhead in setting up the
pipeline for single-cycle MAC operation makes the use of the MAC instruction less efficient
compared to separate multiply and add instructions. The latter approach was therefore used
in the assembly language subroutines, as can be seen in the subroutine for the N=7 WFT in
Appendix 5C. It also shows the mechanism for data access using look-up tables, which
contain the actual addresses of the input data. Columns of the look-up tables are
arranged in consecutive locations to facilitate access by simply incrementing a pointer.
Scaling is also carried out at the end of the subroutine according to the usual method for
DFT scaling [107]. The subroutine has been written as straight-line code to maximize
speed of execution. The use of look-up tables allows the equivalent of index calculation
to be performed in three instruction cycles, which certainly cannot be achieved if it is
carried out directly. Different WFT sizes can be implemented in the same manner as the
subroutine shown. Some reduction in the number of instruction cycles can possibly be
achieved by rearranging the order of some operations and hence eliminating some data
store/load operations between accumulator and memory, but the reduction was found to
be small.
The WFT subroutines have been simulated using the TMS320C25 Simulator to verify
the correct operation of a single processor in the system. In this case the single processor
carries out all the WFT calculations. The look-up tables occupy 720 addresses and have to
be located in the area of external memory, adding a few extra memory management
instructions to the subroutines. This is only necessary for the purpose of simulation and
for actual multiprocessor implementation this is normally not necessary. Impulses were
used as input to the system to obtain frequency responses of the system for a few
different channel bandwidths. These are shown in Fig.5.23 and they do not exhibit
significant degradation from the theoretical design shown in Chapter 3. Some alterations to
the subroutines were made to bypass decimation of the output so that the frequency
responses could be obtained. They do not affect the nature of the subroutines as far as the
arithmetic operations are concerned, so the results are an accurate representation of the
system responses.
Applying these figures to the design example leads to a processing requirement of
78.2 Mips (million instruction cycles per second). Allowing some cycles for other
housekeeping tasks, a total of nine processors will then be required for the implementation
of the input DFT. This number of processors may appear high but it compares favourably with
a similar type of implementation [169] where the top limit for a 5 Mips processor was
reported to be a 128-point DFT at the sampling rate of 20kHz.
The amount of processing required for the IDFT's depends on the combination of
configurations required of the system. The most comprehensive situation is to provide
enough processors to accommodate all possible configurations, which means the number of
processors required is determined by the maximum computational requirement out of all
these configurations. From Appendix 5B and Table 5.7 it can be seen that the computational
requirement of the DFT increases more than linearly with its size. Hence the maximum
computational requirement occurs in the configuration where the channel
bandwidths are the maximum allowable. For the example considered here, this
corresponds to having two channels each with bandwidth of half the sampling frequency.
Two size-60 IDFT's are required, and using the values in Table 5.7 this results in a total
processing rate of 58.2 Mips. Hence seven processors should be sufficient. At the other
extreme, the minimum processing requirement corresponds to the case with twenty size-6
IDFT's, which require a processing rate of only 31.6 Mips. The number of processors can
therefore be traded off against the flexibility in configuration, and the technique described
in section 5.2 can be applied to find the 'optimal' configurations realizable with the chosen
processing rate.
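The way such processing rates follow from Table 5.7 can be sketched in Python. The sketch
assumes that a transform of a given size is built from levels of small WFT's with size/n
modules at a level of size n, that a size-6 IDFT is factored as 3 x 2, and that transforms
are computed at a block rate of 320 k / 58 per second (an inferred value, not stated in this
section). Only WFT subroutine cycles are counted and all names are illustrative.

```python
WFT_CYCLES = {2: 30, 3: 98, 5: 201, 7: 294}  # instruction cycles (Table 5.7)
BLOCK_RATE = 320e3 / 58                       # assumed blocks per second


def transform_cycles(size, factors):
    """Cycles for one transform built from small WFT's of the given sizes."""
    product = 1
    for n in factors:
        product *= n
    if product != size:
        raise ValueError("factors must multiply to the transform size")
    return sum((size // n) * WFT_CYCLES[n] for n in factors)


def mips(cycles_per_block, blocks_per_second=BLOCK_RATE):
    """Processing rate in millions of instruction cycles per second."""
    return cycles_per_block * blocks_per_second / 1e6


input_dft_mips = mips(transform_cycles(120, [5, 3, 2, 2, 2]))
min_idft_mips = mips(20 * transform_cycles(6, [3, 2]))
print(round(input_dft_mips, 1), round(min_idft_mips, 1))
```

Under these assumptions the twenty size-6 IDFT's come to about 31.6 Mips, and the input DFT
to roughly 78 Mips, close to the 78.2 Mips quoted earlier.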
As discussed in section 5.3, the hub processor for the input DFT carries out the data
transfer to and from the multiport memory, detecting and signaling the completion of each
level of WFT's, and the frequency domain multiplication. Data write to the multiport
memory occurs at the input sampling rate and a minimum of 2 instruction cycles are
required. Data read from memory is identical. Detecting and signaling the completion of
WFT's requires reading a memory location for every WFT size, and upon completion writing
to another location. This requires a few cycles per DFT and is insignificant. The
frequency domain multiplication requires 120 multiplications between pairs of real and
complex numbers, or 240 real multiplications per DFT. With each multiplication requiring
about 5 cycles, the total processing rate required of the hub processor is about 8.4 Mips.
In addition to these, the DFT hub processor needs to transfer data to the IDFT hub
processor. Assuming that FIFO memories are used, each data transfer takes a minimum of 2
cycles, which increases the total requirement to about 9.6 Mips. A single TMS320C25
processor should still be sufficient.

[Figure not reproduced: magnitude responses on a logarithmic scale against normalized
frequency (0 to 1) for (a) bandwidth = 0.12, (b) bandwidth = 0.3, (c) bandwidth = 0.5.]
Fig.5.23 Frequency responses of design example
The hub processor for the IDFT's performs essentially the same tasks as that for the
input DFT. No frequency domain multiplications are needed here but instead the complex
conjugation of the input and output data is required. These operations can be easily
absorbed into the WFT subroutines by changing some additions into subtractions and vice
versa, so that there is no increase in the number of instruction cycles. The WFT subroutines
are then turned into IDFT subroutines and the same method of constructing a large DFT
from small WFT's can be applied identically to construct IDFT's from the small IDFT
subroutines. The processing rate required of the IDFT hub processor is therefore reduced to
about 1.8 Mips. This is very low in comparison with the DFT hub processor. Since the
frequency domain multiplications in fact occur between the DFT and the IDFT's, it is
possible to split them between both processors. By dividing these multiplications in two
halves, the DFT hub processor has a processing rate of around 6.3 Mips and that of the IDFT
around 5.1 Mips, which allows each to be implemented comfortably on a single processor.
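The effect of splitting the frequency domain multiplications can be followed with a small
Python sketch based on the rounded figures quoted above. The 6.6 Mips and 1.2 Mips components
are obtained here as differences of the quoted totals (8.4 - 1.8 and 9.6 - 8.4), which is an
inference made for this illustration; the function and parameter names are hypothetical.

```python
def hub_loads(transfer_mips, freq_mult_mips, fifo_mips, share_on_dft_hub):
    """Return (DFT hub load, IDFT hub load) in Mips.

    share_on_dft_hub is the fraction of the frequency domain multiplications
    carried out by the DFT hub processor; the IDFT hub takes the remainder.
    """
    dft_hub = transfer_mips + fifo_mips + share_on_dft_hub * freq_mult_mips
    idft_hub = transfer_mips + (1.0 - share_on_dft_hub) * freq_mult_mips
    return dft_hub, idft_hub


TRANSFER = 1.8    # memory transfer and signaling, as quoted for the IDFT hub
FREQ_MULT = 6.6   # inferred: 8.4 - 1.8
FIFO = 1.2        # inferred: 9.6 - 8.4

unsplit = hub_loads(TRANSFER, FREQ_MULT, FIFO, 1.0)
split = hub_loads(TRANSFER, FREQ_MULT, FIFO, 0.5)
print([round(v, 1) for v in unsplit], [round(v, 1) for v in split])
```

With all multiplications on the DFT hub the loads are about 9.6 and 1.8 Mips; an equal split
gives about 6.3 and 5.1 Mips, in line with the text.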
The control information required by the WFT processors and IDFT processors can be
provided by the hub processors in lists of the form shown in Fig.5.24a,b respectively. For
the WFT case, the Processor Number allows each WFT processor to determine the WFT's
for which it is responsible and construct the look-up tables accordingly upon a system
reconfiguration. The Start Flag is set by the hub processor to initiate the start of a level
of WFT's. Likewise the Reconfiguration Flag signals a system reconfiguration, upon which the
WFT processor needs to update its look-up tables. Both flags are reset by the WFT processor.
The Number of Levels is the number of branches in the DFT algorithm. It is followed by
the numbers and sizes of the WFT's in each level. This information is sufficient for the
WFT processors to easily determine the structure of the algorithms and hence perform the
twiddle factor multiplications when necessary. System reconfiguration is achieved by the
hub processor updating the entries in the number of levels and the level numbers and
sizes, and then setting the reconfiguration flag.
(a) WFT Processors
    Processor No.
    Start Flag
    Reconfig. Flag
    No. of Levels
    Level 1 No.
    Level 1 Size
    Level 2 No.
    Level 2 Size
    ...
    Level n No.
    Level n Size

(b) IDFT Processors
    Processor No.
    Start Flag
    Reconfig. Flag
    Shared IDFT Flag
    followed, for a number of individual IDFT's, by:
        No. of IDFT's
        IDFT1 Start Addr.
        IDFT1 Size
        IDFT2 Start Addr.
        IDFT2 Size
        ...
    or, for a shared IDFT, by:
        Start Address
        No. of Levels
        Level 1 No.
        Level 1 Size
        Level 2 No.
        Level 2 Size
        ...

Fig.5.24 Control information tables for WFT and IDFT processors
The control information for the IDFT case is slightly more complex. The first three
entries are identical to the DFT case. The Shared-IDFT Flag is used to signal either the
execution of a single IDFT in parallel with other processors, or a number of individual
IDFT's by the processor alone. In the first case, the following entries are the same as the
corresponding entries in the DFT list except for the additional Start Address, which is a
pointer to the address of the first input data sample for the IDFT. The uses of these entries
and the operation of the IDFT processor are then identical to the DFT case. If a number of
individual IDFT's are to be executed, the following entries in the list contain the Number
of IDFT's, followed by the Start Address and Size of each one. In this case the Start Flag is
reset by the IDFT processor after all IDFT's have been completed. It is set by the hub
processor when the next block of input data is ready.
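The control lists of Fig.5.24 can be pictured as simple records. The Python sketch below is
only an illustration of their layout and of the reconfiguration step; the class names, field
names and the example level counts are hypothetical and are not taken from the original
implementation, which stores these entries in processor memory.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WftControl:
    """Control list for a WFT processor, after Fig.5.24(a)."""
    processor_no: int
    start_flag: bool = False
    reconfig_flag: bool = False
    levels: List[Tuple[int, int]] = field(default_factory=list)  # (number, size)


@dataclass
class IdftControl:
    """Control list for an IDFT processor, after Fig.5.24(b)."""
    processor_no: int
    start_flag: bool = False
    reconfig_flag: bool = False
    shared_idft_flag: bool = False
    start_address: int = 0                                       # shared IDFT only
    levels: List[Tuple[int, int]] = field(default_factory=list)  # shared IDFT only
    idfts: List[Tuple[int, int]] = field(default_factory=list)   # (start addr, size)


def reconfigure(table, new_levels):
    """Hub side: update the level entries, then set the reconfiguration flag."""
    table.levels = list(new_levels)
    table.reconfig_flag = True


wft = WftControl(processor_no=1)
reconfigure(wft, [(24, 5), (40, 3), (60, 2), (60, 2), (60, 2)])
idft = IdftControl(processor_no=2, idfts=[(0, 6), (6, 6)])
```

The processor would clear `reconfig_flag` after rebuilding its look-up tables, mirroring the
flag handshake described above.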
5.5.2 Analysis/Synthesis Filter Bank Realization 
5.5.2.1 System Realization 
The system components here consist of the input polyphase filter branches, the input
DFT, the IDFT's and the output polyphase branches for each IDFT. The input DFT and IDFT's
are in the same form as those used in the DFT convolution method and the techniques
discussed previously can be applied for their realization. Multiprocessing techniques
described in sections 5.3.3 and 5.4.1 can also be applied directly to the realization of the
FIR polyphase filter branches. The necessity for multiprocessor implementation of single
filter branches depends on the system parameters. It was found that for many practical
system specifications this is not necessary, but that many branches can be implemented on a
single processor instead. This is demonstrated in the example below.
5.5.2.2 Design Example 
Consider the same system specification as shown in Table 5.5 before, with the exception
that the analysis DFT size here is the same as the maximum number of narrow-band
channels of 20. Using the processing requirement of Table 5.7, the input DFT then
requires about 23 Mips and three processors are needed. With a filter length of 63 the
polyphase branches have lengths of 3 or 4 each, and the same number of complex MA per
input sample. The numbers of processor cycles for real MA operation for lengths 3 and 4
are 8 and 10 respectively. Combining these values then leads to the processing
requirements of around 0.5 Mips per branch. Hence a single processor may implement a number
of branches and a total of two processors will be sufficient for the specification here.
Consideration of the synthesis filter bank can also be carried out as for the IDFT's
before. The maximum processing requirement occurs when the synthesized wide-band
channel bandwidth is maximum. Since there is no practical value in having a wide-band
channel that is the same as the input, and by insisting in addition that the IDFT size be
a multiple of the prime lengths of Table 5.7, the maximum bandwidth allowable is the
synthesis of 18 narrow-band channels. The configuration is then made up of one such
wide-band channel plus another channel synthesized from 2 narrow-band channels. The
processing rate required for the two IDFT's is about 22.2 Mips and three processors will be
required. Conversely, the minimum requirement is the configuration with 10 wide-band
channels each synthesized from 2 narrow-band channels, which amounts to a processing
requirement of 4.8 Mips.
As described in Chapter 4, the synthesis polyphase filter lengths are of the same order as
those of the analysis polyphase filters because the synthesis prototype impulse response is
obtained by decimation by a factor of K/J from the analysis prototype, where J is the
number of channels synthesized to form the wide-band channel, and K is the total number of
analysis channels. As the sampling rate of each polyphase branch is also the same as each
analysis branch, the processing requirement is identical. Therefore if all analysis channels
are used in the synthesis filter bank, the amount of processing required by the polyphase
filter branches remains essentially constant regardless of the configuration. A total of five
processors is then required in the synthesis filter bank. It is possible to reduce this by
sharing the processing of filtering and IDFT between all processors and hence increase the
utilization of each, but the amount of control information and hence the software
overhead for each processor would also increase considerably.
For both the analysis and the synthesis filter banks, the same processor may be used
as the hub processor for both the polyphase filters and the DFT or IDFT's because only
data transfer operations are executed. For the analysis filter bank, there is one data write
to and one data read from the FIR processors for every input sample. The same operations
occur with the WFT processors. Assuming a minimum of 2 cycles per read or write and that the
data are complex, this amounts to a processing requirement of 5.1 Mips. In addition,
assuming that the analysis and synthesis filter banks are connected via FIFO memories
between the hub processors, complex data write from the analysis hub processor to the
FIFO occurs at the input sampling rate. The total requirement is about 6.4 Mips, and a
single processor is adequate. In the synthesis filter bank the same amount of data transfer
as in the analysis filter bank takes place and the same argument applies.
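The hub processor figures above follow from simple counting, as the Python sketch below
shows. It assumes that a complex sample is moved as two words (real and imaginary parts),
each costing the stated minimum of 2 cycles; the function and parameter names are
illustrative.

```python
SAMPLE_RATE = 320e3  # input samples per second (Table 5.5)


def transfer_mips(sample_rate, transfers_per_sample,
                  cycles_per_word=2, words_per_sample=2):
    """Hub load in Mips for moving complex samples word by word."""
    cycles_per_second = (sample_rate * transfers_per_sample
                         * cycles_per_word * words_per_sample)
    return cycles_per_second / 1e6


# One write and one read for the FIR processors, and the same for the WFT
# processors: four transfers per input sample.
filter_and_dft = transfer_mips(SAMPLE_RATE, 4)
# One additional write per input sample to the FIFO towards the synthesis hub.
fifo = transfer_mips(SAMPLE_RATE, 1)
total = filter_and_dft + fifo
print(round(filter_and_dft, 1), round(total, 1))
```

The two values come to about 5.1 and 6.4 Mips, matching the figures quoted in the text.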
Control information for the WFT and the IDFT processors can be the same as shown 
in Fig.5.24 before. For the analysis polyphase filters. the control information can be in the 
form of the list shown in Fig.5.25. The first two Flags are the same as the previous list of 
- 200-
. Start Flag 
Reconfig. Flag 
No. of Filters 
Start Sample 
Interpolation Rate 
Filter 1 Length 
Filter 1 Coerf. 1 
Filter 1 Coeff. 2 
Filter N Length 
Filter N Coe!!. 1 
Filter N Coeff. 2 
Fig.5.25 Control information table for FIR filter processors 
Fig.5.24. The Number of Filters is the number of polyphase branches to be implemented on
the FIR processor. The Start Address is the pointer to the start of the group of input data
samples for this processor. The concept is analogous to the DFT case in that the input data
are arranged in blocks, the size of which equals the number of polyphase branches. The
location of a data sample relative to the start of the block determines the
polyphase branch to which it is input. Hence consecutive polyphase filters are implemented
together and likewise the input samples for these filters are located in consecutive
addresses starting from Start Address. The next entry in the list is Interpolation Rate. This
is for the situation of N-path implementation of a single polyphase filter as described in
section 5.3.3. In this case the filter response is theoretically interpolated by N as required
in N-path filtering, and implemented efficiently using the technique of section 5.4.1. This
therefore allows each processor to implement more than one polyphase filter, as in this
example, or when necessary to implement one path of an N-path implementation of a
polyphase filter. The following entries in the list are only used in a system
reconfiguration. These are the lengths and coefficients of each polyphase filter provided by
the hub processor. Only coefficients of the prototypes need to be stored and the hub 
processor can perform the simple polyphase decomposition to obtain the coefficients for 
each polyphase filter. 
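The polyphase decomposition performed by the hub processor amounts to distributing the prototype coefficients cyclically over the branches. The following is a minimal sketch, assuming the usual type-1 convention in which branch p takes coefficients h(p), h(p+M), h(p+2M), ...; the function name, the placeholder coefficients and the choice of four branches are hypothetical and serve only as an illustration.

```python
def polyphase_decompose(prototype, branches):
    """Split prototype coefficients h[n] into `branches` polyphase filters.

    Branch p holds h[p], h[p + M], h[p + 2M], ... with M = branches.
    """
    return [list(prototype[p::branches]) for p in range(branches)]


# Placeholder coefficients for a length-63 prototype (values are not real taps).
prototype = list(range(63))
bank = polyphase_decompose(prototype, 4)
lengths = [len(branch) for branch in bank]
print(lengths)
```

Only the prototype needs to be held by the hub; the branch lengths and coefficients sent to each FIR processor follow from the slicing.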
The same list and control procedure can be applied to the polyphase filters in the
synthesis filter bank. The only difference is in obtaining the filter coefficients. Here, for
each synthesis bandwidth there needs to be one prototype filter, which means a large
number of coefficients to be stored. This can be reduced by noting that the bandwidths
which are integer factors of wider bandwidths have prototypes that are integer decimations
of those of the wider bandwidths and hence need not be stored. Also, making use of the
arbitrary constraint that only IDFTs of sizes that are composites of small primes are to be
implemented, the number of possible synthesis bandwidths is reduced. For the design
example being considered, this means the prototypes for the synthesis bandwidths of 18,
16, 15, 14, and 12 narrow-band channels need to be stored. With the analysis prototype
being of length 63, the total number of coefficients to be stored amounts to just less than
300. Hence changes in the configuration can be quickly carried out by loading different sets
of coefficients to the FIR processors.
Subroutines implementing the polyphase filters on the TMS320C25 were tested
together with the WFT subroutines using the TMS320C25 Simulator. The frequency
responses of the system for reconstructing 2, 3, and 4 channels are shown in Fig.5.26. This
was carried out using impulse input and avoiding decimation of the output as before.
Again the responses do not show significant degradation as compared to the theoretical
design shown in Fig.4.7b and 4.7d of Chapter 4.
5.6 Summary 
A methodology for the design and implementation of two types of flexible TMUX
structures has been described. A technique to obtain configurations that maximize the
utilization of any given processing capability was considered. The high computational
requirement of prospective systems then led to the consideration of multiprocessing
implementation of the DFT and FIR filtering. A structure which allowed ease of implementation of
different DFT algorithms was designed. A major factor affecting the efficiency of this
[Figure: panels (a) 2 channels, (b) 3 channels, (c) 4 channels; horizontal axis: frequency (normalized), 0 to 1]
Fig.5.26 Frequency responses of analysis/synthesis filter bank example
design was recognized to be the global multiport memory. To address this, an implementation
technique which allowed zero-wait-state access by multiple processors was described,
with application to the TMS320C25 processor in particular.
Multiprocessing algorithms for FIR filtering were also examined in relation to ease of
implementation and reconfiguration. It was shown that the N-path structure was a viable
solution, and a technique for efficient implementation on processors with hardware
multiplier-accumulators was described.
Design examples for the two TMUX structures were then carried out. Subroutines for 
the TMS320C25 were implemented to investigate the practical viability of the design. It 
was found that the resulting computation requirement was generally quite high, although 
not unachievable. It was also shown that the DFT convolution structure had a much 
higher processing requirement than the analysis/synthesis filter bank. This was not a 
prominent feature noted in previous theoretical analysis, and could be attributed to the
greater computational requirement for the WFT subroutines.
Simulations of the TMS320 code were carried out to verify the basic operation of the
design. Although not a true simulation of the multiprocessing operation, the results
obtained in terms of the system responses suggested the correct operation of the
distributed processes.
Appendices 
Appendix 5A. Winograd Fourier Transforms for Small Prime Lengths [138]
The following expressions give the N-point Fourier transform X(k) of the sequence
x(n), for n, k = 0, 1, ..., N-1. x(n) is complex in general and so are the intermediate
terms a(i), b(i). The multiplication and addition counts are for real operations.
1. N = 3
\[
\begin{aligned}
a(2) &= x(1)+x(2), & a(1) &= x(1)-x(2), & a(0) &= x(0)+a(2); \\
b(1) &= (\cos 2\pi/3 - 1)\,a(2), & b(2) &= (\sin 2\pi/3)\,a(1), & b(0) &= a(0)+b(1); \\
X(0) &= a(0), & X(1) &= b(0) - j\,b(2), & X(2) &= b(0) + j\,b(2).
\end{aligned}
\]
Number of multiplications = 4, number of additions = 12.
2. N = 5
\[
\begin{aligned}
a(6) &= x(1)+x(4), & a(7) &= x(2)+x(3), & a(4) &= x(3)-x(2), \\
a(5) &= x(1)-x(4), & a(1) &= a(6)+a(7), & a(2) &= a(6)-a(7), \\
a(3) &= a(4)+a(5), & a(0) &= x(0)+a(1);
\end{aligned}
\]
\[
\begin{aligned}
b(1) &= \left[\tfrac{1}{2}(\cos 2\pi/5 + \cos 4\pi/5) - 1\right] a(1), & b(2) &= \tfrac{1}{2}(\cos 2\pi/5 - \cos 4\pi/5)\,a(2), \\
b(3) &= (\sin 2\pi/5)\,a(3), & b(4) &= (\sin 2\pi/5 + \sin 4\pi/5)\,a(4), \\
b(5) &= (\sin 4\pi/5 - \sin 2\pi/5)\,a(5), & b(6) &= a(0)+b(1), \\
b(7) &= b(3)-b(4), & b(8) &= b(3)+b(5), \\
b(9) &= b(6)+b(2), & b(10) &= b(6)-b(2);
\end{aligned}
\]
\[
\begin{aligned}
X(0) &= a(0), & X(1) &= b(9) - j\,b(7), & X(2) &= b(10) - j\,b(8), \\
X(3) &= b(10) + j\,b(8), & X(4) &= b(9) + j\,b(7).
\end{aligned}
\]
Number of multiplications = 10, number of additions = 34.
3. N = 7
\[
\begin{aligned}
a(9) &= x(1)+x(6), & a(10) &= x(1)-x(6), & a(11) &= x(2)+x(5), \\
a(12) &= x(2)-x(5), & a(13) &= x(4)+x(3), & a(14) &= x(4)-x(3), \\
a(15) &= a(11)+a(9), & a(16) &= a(14)+a(12), & a(1) &= a(13)+a(15), \\
a(0) &= x(0)+a(1), & a(2) &= a(9)-a(13), & a(3) &= a(13)-a(11), \\
a(4) &= a(11)-a(9), & a(5) &= a(10)+a(16), & a(6) &= a(10)-a(14), \\
a(7) &= a(14)-a(12), & a(8) &= a(12)-a(10);
\end{aligned}
\]
\[
\begin{aligned}
b(1) &= \left[\tfrac{1}{3}(\cos 2\pi/7 + \cos 4\pi/7 + \cos 6\pi/7) - 1\right] a(1) \\
b(2) &= \tfrac{1}{3}(2\cos 2\pi/7 - \cos 4\pi/7 - \cos 6\pi/7)\,a(2) \\
b(3) &= \tfrac{1}{3}(\cos 2\pi/7 - 2\cos 4\pi/7 + \cos 6\pi/7)\,a(3) \\
b(4) &= \tfrac{1}{3}(\cos 2\pi/7 + \cos 4\pi/7 - 2\cos 6\pi/7)\,a(4) \\
b(5) &= \tfrac{1}{3}(\sin 2\pi/7 + \sin 4\pi/7 - \sin 6\pi/7)\,a(5) \\
b(6) &= \tfrac{1}{3}(2\sin 2\pi/7 - \sin 4\pi/7 + \sin 6\pi/7)\,a(6) \\
b(7) &= \tfrac{1}{3}(\sin 2\pi/7 - 2\sin 4\pi/7 - \sin 6\pi/7)\,a(7) \\
b(8) &= \tfrac{1}{3}(\sin 2\pi/7 + \sin 4\pi/7 + 2\sin 6\pi/7)\,a(8)
\end{aligned}
\]
\[
\begin{aligned}
b(9) &= a(0)+b(1), & b(10) &= b(2)+b(3), & b(11) &= b(4)-b(3), \\
b(12) &= -b(2)-b(4), & b(13) &= b(6)+b(7), & b(14) &= b(8)-b(7), \\
b(15) &= -b(8)-b(6), & b(16) &= b(9)+b(10), & b(17) &= b(9)+b(11), \\
b(18) &= b(9)+b(12), & b(19) &= b(13)+b(5), & b(20) &= b(14)+b(5), \\
b(21) &= b(15)+b(5);
\end{aligned}
\]
\[
\begin{aligned}
X(0) &= a(0), & X(1) &= b(16) - j\,b(19), & X(2) &= b(18) - j\,b(21), \\
X(3) &= b(17) + j\,b(20), & X(4) &= b(17) - j\,b(20), \\
X(5) &= b(18) + j\,b(21), & X(6) &= b(16) + j\,b(19).
\end{aligned}
\]
Number of multiplications = 16, number of additions = 72.
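As a check on the structure of these algorithms, the N = 5 case can be coded directly and compared with a straightforward DFT. The sketch below is an illustrative Python transcription rather than the thesis implementation: the function names are hypothetical, and the intermediate terms follow the a(i), b(i) naming of this appendix with the index assignments chosen so that the result agrees with the direct transform.

```python
import cmath
import math


def dft(x):
    """Direct N-point DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
            for k in range(n)]


def wft5(x):
    """5-point Winograd-style transform with 5 real-coefficient multiplies."""
    u = 2 * math.pi / 5
    c1, c2 = math.cos(u), math.cos(2 * u)
    s1, s2 = math.sin(u), math.sin(2 * u)
    # input additions
    a6, a7 = x[1] + x[4], x[2] + x[3]
    a4, a5 = x[3] - x[2], x[1] - x[4]
    a1, a2, a3 = a6 + a7, a6 - a7, a4 + a5
    a0 = x[0] + a1
    # multiplications by real constants
    b1 = (0.5 * (c1 + c2) - 1.0) * a1
    b2 = 0.5 * (c1 - c2) * a2
    b3 = s1 * a3
    b4 = (s1 + s2) * a4
    b5 = (s2 - s1) * a5
    # output additions
    b6 = a0 + b1
    b7, b8 = b3 - b4, b3 + b5
    b9, b10 = b6 + b2, b6 - b2
    return [a0, b9 - 1j * b7, b10 - 1j * b8, b10 + 1j * b8, b9 + 1j * b7]


x = [1 + 2j, -0.5j, 3, 2 - 1j, -4 + 0.5j]
max_err = max(abs(p - q) for p, q in zip(wft5(x), dft(x)))
print(max_err)
```

Each complex quantity multiplied by a real constant costs two real multiplications, which gives the count of 10 quoted for N = 5.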
Appendix 5B. Fourier Transform Algorithms for Multiplier Accumulator [183]
The following expressions give the N-point Fourier transform X(k) of the sequence
x(n), for n, k = 0, 1, ..., N-1. All terms are generally complex. The expressions for m(i)
are multiply-and-accumulate type operations.
Notation: $d_r(n)$, $d_i(n)$ = real and imaginary parts of $d(n)$ respectively.
1. N = 3
\[
\begin{aligned}
d(0) &= x(1)+x(2), \qquad d(1) = x(1)-x(2) \\
m_1 &= \cos(2\pi/3)\,d_r(0) + x_r(0), \qquad m_2 = -\sin(2\pi/3)\,d_i(1) \\
m_3 &= \cos(2\pi/3)\,d_i(0) + x_i(0), \qquad m_4 = -\sin(2\pi/3)\,d_r(1) \\
X(0) &= x(0) + x(1) + x(2), \qquad X(1) = (m_1-m_2) + j(m_3+m_4) \\
X(2) &= (m_1+m_2) + j(m_3-m_4)
\end{aligned}
\]
2. N = 5
\[
\begin{aligned}
d(0) &= x(1)+x(4), \qquad d(1) = x(2)+x(3) \\
d(2) &= x(1)-x(4), \qquad d(3) = x(2)-x(3) \\
m_1(0) &= \cos(2\pi/5)\,d_r(0) + \cos(4\pi/5)\,d_r(1) + x_r(0) \\
m_1(1) &= \cos(4\pi/5)\,d_r(0) + \cos(2\pi/5)\,d_r(1) + x_r(0) \\
m_2(0) &= -\sin(2\pi/5)\,d_i(2) - \sin(4\pi/5)\,d_i(3) \\
m_2(1) &= -\sin(4\pi/5)\,d_i(2) - \sin(2\pi/5)\,d_i(3) \\
m_3(0) &= \cos(2\pi/5)\,d_i(0) + \cos(4\pi/5)\,d_i(1) + x_i(0) \\
m_3(1) &= \cos(4\pi/5)\,d_i(0) + \cos(2\pi/5)\,d_i(1) + x_i(0) \\
m_4(0) &= -\sin(2\pi/5)\,d_r(2) - \sin(4\pi/5)\,d_r(3) \\
m_4(1) &= -\sin(4\pi/5)\,d_r(2) - \sin(2\pi/5)\,d_r(3) \\
X(0) &= x(0) + x(1) + x(2) + x(3) + x(4) \\
X(1) &= [m_1(0)-m_2(0)] + j[m_3(0)+m_4(0)], \qquad X(2) = [m_1(1)+m_2(1)] + j[m_3(1)-m_4(1)] \\
X(3) &= [m_1(1)-m_2(1)] + j[m_3(1)+m_4(1)], \qquad X(4) = [m_1(0)+m_2(0)] + j[m_3(0)-m_4(0)]
\end{aligned}
\]
3. N = 7
\[
\begin{aligned}
d(0) &= x(1)+x(6), \qquad d(1) = x(5)+x(2), \qquad d(2) = x(4)+x(3) \\
d(3) &= x(1)-x(6), \qquad d(4) = x(5)-x(2), \qquad d(5) = x(4)-x(3) \\
m_1(0) &= \cos(2\pi/7)\,d_r(0) + \cos(10\pi/7)\,d_r(1) + \cos(8\pi/7)\,d_r(2) + x_r(0) \\
m_1(1) &= \cos(8\pi/7)\,d_r(0) + \cos(2\pi/7)\,d_r(1) + \cos(10\pi/7)\,d_r(2) + x_r(0) \\
m_1(2) &= \cos(10\pi/7)\,d_r(0) + \cos(8\pi/7)\,d_r(1) + \cos(2\pi/7)\,d_r(2) + x_r(0) \\
m_2(0) &= -\sin(2\pi/7)\,d_i(3) - \sin(10\pi/7)\,d_i(4) - \sin(8\pi/7)\,d_i(5) \\
m_2(1) &= -\sin(8\pi/7)\,d_i(3) - \sin(2\pi/7)\,d_i(4) - \sin(10\pi/7)\,d_i(5) \\
m_2(2) &= -\sin(10\pi/7)\,d_i(3) - \sin(8\pi/7)\,d_i(4) - \sin(2\pi/7)\,d_i(5) \\
m_3(0) &= \cos(2\pi/7)\,d_i(0) + \cos(10\pi/7)\,d_i(1) + \cos(8\pi/7)\,d_i(2) + x_i(0) \\
m_3(1) &= \cos(8\pi/7)\,d_i(0) + \cos(2\pi/7)\,d_i(1) + \cos(10\pi/7)\,d_i(2) + x_i(0) \\
m_3(2) &= \cos(10\pi/7)\,d_i(0) + \cos(8\pi/7)\,d_i(1) + \cos(2\pi/7)\,d_i(2) + x_i(0) \\
m_4(0) &= -\sin(2\pi/7)\,d_r(3) - \sin(10\pi/7)\,d_r(4) - \sin(8\pi/7)\,d_r(5) \\
m_4(1) &= -\sin(8\pi/7)\,d_r(3) - \sin(2\pi/7)\,d_r(4) - \sin(10\pi/7)\,d_r(5) \\
m_4(2) &= -\sin(10\pi/7)\,d_r(3) - \sin(8\pi/7)\,d_r(4) - \sin(2\pi/7)\,d_r(5) \\
X(0) &= x(0) + x(1) + x(2) + x(3) + x(4) + x(5) + x(6) \\
X(1) &= [m_1(0)-m_2(0)] + j[m_3(0)+m_4(0)], \qquad X(2) = [m_1(2)-m_2(2)] + j[m_3(2)+m_4(2)] \\
X(3) &= [m_1(1)-m_2(1)] + j[m_3(1)+m_4(1)], \qquad X(4) = [m_1(1)+m_2(1)] + j[m_3(1)-m_4(1)] \\
X(5) &= [m_1(2)+m_2(2)] + j[m_3(2)-m_4(2)], \qquad X(6) = [m_1(0)+m_2(0)] + j[m_3(0)-m_4(0)]
\end{aligned}
\]
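The multiply-accumulate form keeps the real and imaginary parts separate so that each m term is a short sum of products on real data. A minimal Python sketch of the N = 3 case is given below for illustration, checked against a direct DFT; the function names are hypothetical and the code is not taken from the thesis.

```python
import cmath
import math


def direct_dft(x):
    """Direct N-point DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
            for k in range(n)]


def mac_dft3(x):
    """3-point DFT written as multiply-accumulate operations on real data."""
    c = math.cos(2 * math.pi / 3)
    s = math.sin(2 * math.pi / 3)
    d0 = x[1] + x[2]
    d1 = x[1] - x[2]
    m1 = c * d0.real + x[0].real   # real cosine sum
    m2 = -s * d1.imag              # real sine sum
    m3 = c * d0.imag + x[0].imag   # imaginary cosine sum
    m4 = -s * d1.real              # imaginary sine sum
    return [x[0] + x[1] + x[2],
            complex(m1 - m2, m3 + m4),
            complex(m1 + m2, m3 - m4)]


x3 = [complex(1, -2), complex(0.5, 4), complex(-3, 1)]
err3 = max(abs(p - q) for p, q in zip(mac_dft3(x3), direct_dft(x3)))
print(err3)
```

On a processor with a hardware multiplier-accumulator each m term maps onto a run of multiply and accumulate instructions, which is the motivation for this form.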
Appendix 5C. TMS320C25 Assembly Language Subroutine for size-7 Winograd
Fourier Transform
***************************
** size-7 WFT subroutine **
***************************
        LRLK    2,TABPTR
        LAR     1,*,1
** load data to on-chip RAM **
** R = real, I = imag. **
        LAC     *+
        SACL    X0R
        LAC     *,2
        SACL    X0I
        LAR     1,*+,1
        LAC     *+
        SACL    XD0R
        LAC     *,2
        SACL    XD0I
        LAR     1,*+,1
        LAC     *+
        SACL    XD4R
        LAC     *,2
        SACL    XD4I
        LAR     1,*+,1
        LAC     *+
        SACL    XD5R
        LAC     *,2
        SACL    XD5I
        LAR     1,*+,1
        LAC     *+
        SACL    XD2R
        LAC     *,2
        SACL    XD2I
        LAR     1,*+,1
        LAC     *+
        SACL    XD1R
        LAC     *,2
        SACL    XD1I
        LAR     1,*+,1
        LAC     *+
        SACL    XD3R
        LAC     *,2
        SACL    XD3I
** calculate d() **
** real and imag. parts stored in
** separate consecutive blocks, i.e. d0r
** followed by d1r etc., and d0i, d1i etc. **
        LAC     XD0R
        ADD     XD3R
        SACL    D0R
        LAC     XD0I
        ADD     XD3I
        SACL    D0I
        LAC     XD1R
        ADD     XD4R
        SACL    D1R
        LAC     XD1I
        ADD     XD4I
        SACL    D1I
        LAC     XD2R
        ADD     XD5R
        SACL    D2R
        LAC     XD2I
        ADD     XD5I
        SACL    D2I
        LAC     XD0R
        SUB     XD3R
        SACL    D3R
        LAC     XD0I
        SUB     XD3I
        SACL    D3I
        LAC     XD1R
        SUB     XD4R
        SACL    D4R
        LAC     XD1I
        SUB     XD4I
        SACL    D4I
        LAC     XD2R
        SUB     XD5R
        SACL    D5R
        LAC     XD2I
        SUB     XD5I
        SACL    D5I
** calculate m() **
** COSn = cos(2pi n/7), SINn = -sin(2pi n/7) **
        ZAC
        LT      D0R
        MPY     COS1
        LTA     D1R
        MPY     COS5
        LTA     D2R
        MPY     COS4
        APAC
        ADD     X0R
        SACL    M10
        ZAC
        LT      D0R
        MPY     COS4
        LTA     D1R
        MPY     COS1
        LTA     D2R
        MPY     COS5
        APAC
        ADD     X0R
        SACL    M11
        ZAC
        LT      D0R
        MPY     COS5
        LTA     D1R
        MPY     COS4
        LTA     D2R
        MPY     COS1
        APAC
        ADD     X0R
        SACL    M12
        ZAC
        LT      D3I
        MPY     SIN1
        LTA     D4I
        MPY     SIN5
        LTA     D5I
        MPY     SIN4
        APAC
        SACL    M20
        ZAC
        LT      D3I
        MPY     SIN4
        LTA     D4I
        MPY     SIN1
        LTA     D5I
        MPY     SIN5
        APAC
        SACL    M21
        ZAC
        LT      D3I
        MPY     SIN5
        LTA     D4I
        MPY     SIN4
        LTA     D5I
        MPY     SIN1
        APAC
        SACL    M22
        ZAC
        LT      D0I
        MPY     COS1
        LTA     D1I
        MPY     COS5
        LTA     D2I
        MPY     COS4
        APAC
        ADD     X0I
        SACL    M30
        ZAC
        LT      D0I
        MPY     COS4
        LTA     D1I
        MPY     COS1
        LTA     D2I
        MPY     COS5
        APAC
        ADD     X0I
        SACL    M31
        ZAC
        LT      D0I
        MPY     COS5
        LTA     D1I
        MPY     COS4
        LTA     D2I
        MPY     COS1
        APAC
        ADD     X0I
        SACL    M32
        ZAC
        LT      D3R
        MPY     SIN1
        LTA     D4R
        MPY     SIN5
        LTA     D5R
        MPY     SIN4
        APAC
        SACL    M40
        ZAC
        LT      D3R
        MPY     SIN4
        LTA     D4R
        MPY     SIN1
        LTA     D5R
        MPY     SIN5
        APAC
        SACL    M41
        ZAC
        LT      D3R
        MPY     SIN5
        LTA     D4R
        MPY     SIN4
        LTA     D5R
        MPY     SIN1
        APAC
        SACL    M42
** output : DFT scaled by 1/7 **
** X(0) **
        LAC     X0R,13
        ADD     XD0R,13
        ADD     XD1R,13
        ADD     XD2R,13
        ADD     XD3R,13
        ADD     XD4R,13
        ADD     XD5R,13
        SACH    TMP
        LT      TMP
        MPY     FOVERS          * 4/7
        PAC
        LRLK    2,TABPTR
        LAR     1,*+,1
        SACH    *+
        LAC     X0I,13
        ADD     XD0I,13
        ADD     XD1I,13
        ADD     XD2I,13
        ADD     XD3I,13
        ADD     XD4I,13
        ADD     XD5I,13
        SACH    TMP
        LT      TMP
        MPY     FOVERS
        PAC
        SACH    *,2
** X(1) **
        LAC     M10
        SUB     M20
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        LAR     1,*+,1
        SACH    *+
        LAC     M30
        ADD     M40
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        SACH    *,2
** X(2) **
        LAC     M12
        SUB     M22
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        LAR     1,*+,1
        SACH    *+
        LAC     M32
        ADD     M42
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        SACH    *,2
** X(3) **
        LAC     M11
        SUB     M21
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        LAR     1,*+,1
        SACH    *+
        LAC     M31
        ADD     M41
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        SACH    *,2
** X(4) **
        LAC     M11
        ADD     M21
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        LAR     1,*+,1
        SACH    *+
        LAC     M31
        SUB     M41
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        SACH    *,2
** X(5) **
        LAC     M12
        ADD     M22
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        LAR     1,*+,1
        SACH    *+
        LAC     M32
        SUB     M42
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        SACH    *,2
** X(6) **
        LAC     M10
        ADD     M20
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        LAR     1,*+,1
        SACH    *+
        LAC     M30
        SUB     M40
        SACL    TMP
        LT      TMP
        MPY     SEVNTH          * 1/7
        PAC
        SACH    *,2
        SAR     2,TABPTR
        RET
** END OF SUBROUTINE **
6 
On-Board Processing 
for the T-SAT Payload
6.1 T-SAT System Description
T-SAT is the U.K. Technology Satellite project carried out jointly among a number of
university groups [46,184]. It is an investigation into the use of highly elliptic orbits for
land-mobile satellite services. The objectives are to arrive at the designs for the system and
its various components, in order to demonstrate the viability of the use of satellites for the
provision of such services. On-board processing was identified as a necessary element in
the study, and is the part with which this project is concerned. The design of certain
elements within the on-board processor (OBP), together with the experimental model for
their realization, are described in this Chapter. Only features in the system design which
are relevant to the work carried out on the OBP here are addressed in this section. Detailed
design methodology and specification of the system can be found in other publications
[185,186].
6.1.1 System Overview 
The aim of the T-SAT system is to provide land-mobile communication services for
the whole of the U.K. area. The system is therefore similar in its objectives to previous
investigations into land-mobile services for European coverage [78,79], and has also been
followed lately by the ESA Archimedes study [77] in which many suggestions for solving
problems associated with the mobile-satellite environment are the same as those for the
T-SAT system, most notably the proposal for the use of highly elliptic orbits.
The reasoning behind the T-SAT system design follows the argument for land-mobile
satellite services as discussed in Chapter 1. The geostationary orbit leads to many
propagation problems in the land-mobile environment. The elevation angle in regions of high
latitude becomes very small, resulting in blockage and multipath fading due to buildings and
vegetation in urban areas as well as on motorways [187]. This then requires large fade
margins to guarantee an acceptable quality of service [188]. These can be achieved through a
combination of methods including forward error-correction (FEC), automatic retransmit
request (ARQ), high-gain tracking antennas for the mobile terminal and increasing the
satellite equivalent isotropic radiated power (EIRP). These measures would also inevitably
increase the cost of the mobile terminals as well as the satellites. Amongst other
disadvantages, ARQ increases system delay when retransmission is needed and may not be
acceptable for services such as telephony. Tracking antennas for the mobile terminals would be
physically undesirable for smaller vehicles. An increase in the satellite EIRP would also
require large deployable antennas and spot-beam coverage, making the satellite more
expensive. It has therefore been established [78] that for land-mobile communications in the
European region using geostationary satellites, providing the necessary fade margin is not
economically feasible. The Molniya orbit is then the alternative taken to avoid many of
these problems, the root cause of which is the low elevation angle.
The basic system design consists of three satellites in three 12-hour Molniya orbits
existing in different planes which are oriented at 120 degrees to each other. This provides
24-hour coverage with an elevation angle varying between about 50 and 70 degrees, hence
significantly reducing the amount of blockage and multipath fading. The theoretical
improvement to the system link budget was calculated to be around 30 dB [184], although
actual channel characteristics have still to be measured. Work in this area and directly
related to the T-SAT initiative has been reported [189] to be in progress. The
advantages offered by the Molniya orbit are, however, not without accompanying drawbacks.
The requirement for three satellites makes the system cost considerably higher, both in
providing the satellites and in the added complexity of operating a 'hand-over' when the
services are transferred from one satellite to the next. It can be more economical if 24-hour
coverage is not required, and other cost factors such as the complexity of the hand-over
procedure need to be taken into account as part of the tradeoff between system performance
and cost.
Another major drawback brought about by the Molniya orbit is the considerable
Doppler shift due to the high relative velocity between the satellite and the earth. For
systems designed to operate in the 1.5/1.6 GHz band this results in a maximum shift of 10 kHz
in the carrier frequencies, which must be compensated for. In T-SAT this is carried out
at the I.F. down-conversion stage by a Doppler Correction unit which is effectively a
variable frequency synthesizer. The output frequency of this unit is a function of the position
of the satellite so that the carrier frequencies at the output of the I.F. down-converter
exhibit only frequency shifts tolerable by the signal processing functions that follow.
Likewise, the transmitted signals from the satellite are shifted in frequency such that the
Doppler shift is not experienced by the mobile terminal. However, with the nature of the
Doppler shift being non-uniform across the spectrum, only the carrier frequencies are
compensated for and the changes in data rates due to the Doppler effect are still present. In
existing geostationary systems this effect is small and is compensated for by a Doppler
buffer [80] which absorbs changes in the data rates and provides a constant output data
stream. With on-board processing this buffering can be carried out at the satellite and only
the Doppler effects due to the downlink are present at the mobile terminal. As described
later, the data buffer is also a necessary part of the OBP regardless of the Doppler effect,
hence it effectively serves two purposes at the same time. Theoretically it is also possible
with the OBP to adjust the data rate of the downlink so that it appears constant at the mobile
terminal. This complicates the downlink modulator and is also not always practical. For
example, when the satellite is moving away from earth, the data rates received at the
satellite decrease. The downlink data rate must be increased to maintain the same data rate
received by the mobile terminal, which means that a long message of indefinite length
transmitted from one mobile to another must be bit-stuffed at the OBP. Conversely,
extensive buffering is required when the satellite is moving towards earth. To accommodate
both situations leads to much more complex system operation which the advantage
gained does not appear to justify, and hence this technique is not investigated.
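The size of the Doppler effect on the carrier and on the data rate can be related through the fractional shift v/c. The sketch below is an illustration under stated assumptions: it uses the first-order (non-relativistic) Doppler relation, takes 1.6 GHz as the carrier and the 16 kbps uplink channel rate of the SCPC system as the data rate, and all names are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def fractional_doppler(shift_hz, carrier_hz):
    """First-order Doppler ratio v/c implied by a given carrier shift."""
    return shift_hz / carrier_hz


ratio = fractional_doppler(10e3, 1.6e9)  # 10 kHz shift on a 1.6 GHz carrier
radial_velocity = ratio * C              # implied relative velocity, m/s
rate_offset = ratio * 16e3               # offset on a 16 kbps stream, bit/s
bits_per_hour = rate_offset * 3600       # drift a Doppler buffer would absorb

print(f"v = {radial_velocity:.0f} m/s, data-rate offset = {rate_offset:.2f} bit/s")
```

The same fractional shift applies to the data clock, which is why correcting the carrier alone leaves a small data-rate offset for the buffer to absorb.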
[Block diagram: R.F. to I.F. downconversion and amplification with Doppler correction, TMUX, demodulators/decoders, OBP, modulators/encoders, I.F. to R.F. upconversion and amplification]
Fig.6.1 T-SAT payload system functions
The block diagram representation of the payload is then as shown in Fig.6.1. It is
designed to provide a high degree of flexibility for the experimentation of different system
configurations. Hence two access schemes, SCPC/TDM and TDMA/TDM, can both be
implemented. The general merits and drawbacks of each have been discussed in Chapter 1.
In the following the particular implementations of the two schemes in the T-SAT system
are examined.
6.1.2 SCPC/TDM System 
This access scheme uses SCPC for uplink access from the mobiles and TDM from the
satellite on the downlink. Essential parameters in the specification of this system are
shown in Table 6.1.
Uplink access scheme          FDMA by SCPC
Number of uplink channels     16
Uplink channel spacing        14 kHz
Uplink data rate              16 kbps per channel
Downlink signaling scheme     Single channel, continuous TDM
Downlink data rate            256 kbps
Modulation                    QPSK
Table 6.1 SCPC/TDM System Parameters
The link budgets for the system have been calculated [185] and are included in
Appendix 6A. They show that values of Eb/No of 13 dB on the uplink and 8 dB on the downlink
are achievable under reasonable assumptions on other parts of the links.
Of the 16 uplink channels, 14 are used as traffic channels, which are channels for the
transmission of messages from one mobile to another. These messages are termed
'indefinite length data messages' (ILDM) because in general they can be of any duration.
The signals on these channels are demodulated at the satellite and simply relayed onto the
TDM downlink. The remaining two channels are the short coded messages (SCM) and the
signaling channels, which are used mainly for the communications between the mobiles
and the OBP itself so that overall control of the system is facilitated. Both channels
operate in the slotted-Aloha format to allow random access to the OBP and system
resources. The messages on these channels are therefore of finite duration and generally
short. The SCM channel serves two purposes. Firstly, it is used for the initial
communication from a mobile to the OBP whenever the mobile requires to transmit ILDM via traffic
channels. Secondly, it allows very short messages to be transmitted from one mobile to
another as SCM without the overhead in setting up a traffic channel. Using the
signaling channel, a mobile can signal to the OBP the release of the allocated traffic channel
when it is no longer needed, as well as acknowledging messages from the OBP as necessary.
Fig.6.2 SCM channel data packet format (22 ms packet: Preamble (224), Message (8), Source I.D. (16), Destination I.D. (16), Parity + PSTN No. + Spare (88))
Fig.6.3 Signaling channel data packet format (18 ms packet beginning with Preamble (224))
The formats of the data packets for the SCM and signaling channels are shown in
Fig.6.2 and 6.3 respectively. The time slots are 24 and 20 ms long with a guard time of 2
ms within each slot. The lengths of the packets are then 352 and 288 bits respectively. The
224-bit preamble at the beginning of each packet is necessary to assist synchronization in
the on-board demodulators. The lengths of the actual messages are only 128 and 64 bits.
The guard time is designed to accommodate the variation in the propagation delay of the
signal due to geographical movement of the mobile. As will be discussed later, the mobile
derives the timing of the uplink time slots from the downlink and the expected location
of the satellite and hence the propagation delay. It is not practical, however, to require
the mobile to take into account its geographical location. Hence the guard time represents
the maximum difference in propagation delay within the beam coverage, which for T-SAT
is the U.K. region, and 2 ms was calculated [186] to be sufficient in accommodating the
difference between the north-most and south-most points.
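The packet sizes quoted for the SCM and signaling channels follow from the slot lengths, the guard time and the 16 kbps uplink rate. A minimal sketch of the arithmetic, with hypothetical names, is:

```python
UPLINK_RATE_BPS = 16_000  # uplink data rate per SCPC channel
GUARD_MS = 2              # guard time within each slot


def packet_bits(slot_ms, guard_ms=GUARD_MS, rate_bps=UPLINK_RATE_BPS):
    """Bits that fit in a slot once the guard time is removed."""
    return (slot_ms - guard_ms) * rate_bps // 1000


scm_bits = packet_bits(24)     # SCM channel, 24 ms slots
sig_bits = packet_bits(20)     # signaling channel, 20 ms slots
scm_preamble = scm_bits - 128  # SCM message is 128 bits
sig_preamble = sig_bits - 64   # signaling message is 64 bits

print(scm_bits, sig_bits, scm_preamble, sig_preamble)
```

Both channels end up with the same preamble length, so the larger part of each short packet is synchronization overhead for the on-board demodulators.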
Fig.6.4 TDM downlink frame format for SCPC system (80 ms frame: Preamble (128); Signaling/SCM Field (2432) comprising Coding Header, Frame Structure Table, Signal & SCM Messages, Channel Headers and Coding Tail; Traffic Data (14x1280))
The TDM downlink is organized in time frames 80 ms long. Traffic data which are
transmitted from mobiles are time-multiplexed with messages from the OBP into a
number of time slots within each frame, as shown in Fig.6.4. The start of each frame is
preceded by a preamble and unique word for synchronization in the mobile demodulators
as well as identification of the start of frame. The slots which follow can be classified into
two different fields, the signaling/SCM field followed by a traffic data field. The 2432-bit
signaling/SCM field contains mainly messages from the OBP to mobiles, except the SCM
which are sent from the mobiles on the SCM channel on the uplink. It also contains a
Frame Structure Table which defines the composition of that frame, and Channel Headers
which define the destination mobiles where the corresponding traffic data are to be
received. The signaling/SCM field is convolution-coded, which requires a head and tail to be
inserted at the beginning and end of the field for the proper operation of the Viterbi
decoder at the mobile. Details concerning the functions of these entries are addressed more
fully in section 6.2 on the implementation of the OBP Buffer/Formatter and Coder. The
traffic data field contains data which are relayed from the traffic channels of the uplink. It
is divided into 14 slots of 1280 bits each and hence traffic data on the uplink and
downlink are balanced.
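The frame budget described above can be checked with a few lines of arithmetic. In this sketch the remainder of the frame after the signaling/SCM and traffic fields is taken to be the preamble and unique word, which is an inference from the figures quoted; the variable names are hypothetical.

```python
DOWNLINK_RATE_BPS = 256_000
UPLINK_RATE_BPS = 16_000
FRAME_MS = 80

frame_bits = DOWNLINK_RATE_BPS * FRAME_MS // 1000       # bits per downlink frame
signaling_scm_bits = 2432                               # signaling/SCM field
traffic_bits = 14 * 1280                                # 14 traffic slots
preamble_bits = frame_bits - signaling_scm_bits - traffic_bits
uplink_bits_per_frame = UPLINK_RATE_BPS * FRAME_MS // 1000

print(frame_bits, traffic_bits, preamble_bits, uplink_bits_per_frame)
```

Each uplink traffic channel delivers 1280 bits in one 80 ms frame period, which is exactly the size of one downlink traffic slot, so the uplink and downlink traffic are balanced.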
Fig.6.5 Normal channel set-up and release procedure (message exchanges between Source Mobile, Satellite and Destination Mobile during Channel Set-Up and Channel Release)
The procedures involved in a normal setting-up and clearing of a traffic channel are
shown in Fig.6.5. The source mobile initiates the process by sending an RR on the SCM
channel. If it is successfully received, the OBP acknowledges the RR by an RR_Ack. The OBP
must then establish that the destination mobile requested by the RR is ready for receiving
traffic data by sending a Poll destined to that mobile. The destination mobile then confirms
its readiness by sending a Poll_Ack on the signaling channel to the OBP. The OBP is then
ready to assign a traffic channel, provided that one is available, by sending an Assignment
message to the source mobile, which upon receipt of the message is allowed to transmit on
the channel. An Assignment message to the destination mobile is not necessary because the
destinations of the downlink traffic are specified in the Channel Headers.
Clearing of the channels is also initiated by the source mobile, which sends a
Channel_Clear message on the signaling channel. The OBP acknowledges the clearing with
a Channel_Clear_Ack to the source mobile and the traffic channel is returned to the OBP.
The destination mobile can either be signalled likewise or be notified of the clearing by a
change of the corresponding destination entry in the Channel Header, depending on the
ease of implementation of each. In addition to these normal setting-up and clearing
procedures, there must be provisions to safeguard the system against all possible situations
such as any one of the messages not being received. These are defined by the system access
protocol [190], the details of which are not relevant here.
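The normal set-up and clearing exchanges can be summarized as ordered message lists. The sketch below is only a compact restatement of the sequence described above: the tuple layout and function name are hypothetical, and the assumption that all OBP messages travel in the signaling/SCM field of the TDM downlink is drawn from the frame description rather than from the access protocol itself.

```python
# (sender, receiver, message, channel)
SETUP = [
    ("source mobile", "OBP", "RR", "SCM channel"),
    ("OBP", "source mobile", "RR_Ack", "TDM downlink"),
    ("OBP", "destination mobile", "Poll", "TDM downlink"),
    ("destination mobile", "OBP", "Poll_Ack", "signaling channel"),
    ("OBP", "source mobile", "Assignment", "TDM downlink"),
]

CLEAR = [
    ("source mobile", "OBP", "Channel_Clear", "signaling channel"),
    ("OBP", "source mobile", "Channel_Clear_Ack", "TDM downlink"),
]


def uplink_messages(sequence):
    """Messages that the mobiles must send on the random-access channels."""
    return [msg for sender, _, msg, _ in sequence if sender != "OBP"]


print(uplink_messages(SETUP), uplink_messages(CLEAR))
```

Only the RR uses the SCM channel; the later uplink messages of an established exchange use the signaling channel.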
SCM messages from one mobile to another are directly relayed onto the downlink
and no actions by the OBP are required. The identity of the destination mobile resides as
part of the SCM, hence allowing a mobile to retrieve SCM addressed to it by reading
their destination identities. Acknowledgement of the SCM is generated by the destination
in the form of an SCM also, and sent to the source mobile in the same manner. This
provides an efficient channel for short messages analogous to a transparent transponder.
Channel coding is not included in the calculation of the link budgets. The figure for
the uplink Eb/No implies a BER of around 10^-9 and further coding gain appears
unnecessary. The downlink Eb/No of 8 dB, however, equates to a BER of around 10^-5 and
channel coding will be necessary for the integrity of the signaling messages from the OBP to
mobiles. For the same reason, coding of the traffic data on the uplink does not improve the
end-to-end link performance. Coding traffic data on the downlink only leads to
unbalanced data rates between uplink and downlink, which does not result in better utilization
of the resources. Hence no on-board coding/decoding is performed on the traffic data but
obviously any error correction codes may be used on an end-to-end basis to improve the
link performance.
Timing error, which is the difference between the expected and the actual
time-of-arrival of an uplink burst signal within a time slot, can be measured by the OBP and fed
back to the mobile so that a form of closed-loop frame synchronization is implemented to
guarantee high accuracy of the mobile timing. This information is included as part of the
RR_Ack and Assignment messages from the OBP so that timing correction may be
performed immediately after the first message from the mobile. In addition, a special message
requesting timing-correction to the mobile may be sent if the timing error from a mobile is
deemed unacceptable at any stage of communications between the OBP and the mobile.
The accuracy of the measurement does not have to be higher than 1 bit-duration, and the
maximum error is theoretically the length of a slot. The timing error then requires a
maximum of 9 bits which can be easily accommodated within the RR_Ack or Assignment
messages. In practice, since the accuracy of the mobile timing is required not to cause timing
error greater than the guard time, the timing error only requires 5 bits. The timing error in
this case is only used as a warning to the mobile rather than strictly essential for frame
synchronization.
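The word lengths quoted for the timing error follow from the number of 1-bit steps to be distinguished. A short sketch of the calculation is given below, assuming an unsigned error value, the 24 ms SCM slot as the longest slot and the 16 kbps uplink rate; the function name is hypothetical.

```python
def bits_needed(steps):
    """Bits required to encode `steps` distinct values (steps >= 1)."""
    return max(1, (steps - 1).bit_length())


RATE_KBPS = 16                # uplink rate: 16 bits per millisecond
slot_steps = 24 * RATE_KBPS   # error up to one 24 ms slot, in bit-durations
guard_steps = 2 * RATE_KBPS   # error limited to the 2 ms guard time

slot_field = bits_needed(slot_steps)
guard_field = bits_needed(guard_steps)
print(slot_field, guard_field)
```

The two results correspond to the 9-bit theoretical maximum and the 5-bit practical field mentioned in the text.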
6.1.3 TDMA/TDM System
This system is similar to the SCPC/TDM system but uses TDMA for uplink access 
from the mobiles instead. The essential specifications are shown in Table 6.2. 
Uplink access scheme          TDMA
Number of uplink channels     1
Uplink data rate              256 kbps
Downlink signaling scheme     Single channel, continuous TDM
Downlink data rate            256 kbps
Modulation                    QPSK
Table 6.2 TDMA/TDM System Parameters
The link budgets for the TDMA uplink are also included in Appendix 6A [185]. They show
an uplink Eb/No of only 7 dB with the much higher mobile transmitter power of 20 W,
which is in contrast to the comfortable uplink budget of the SCPC system with less severe
assumptions.
The TDMA uplink channel is composed of time frames of 80 ms duration. In addition
these time frames are divided into groups of four, with each group being a superframe.
The operation of the TDMA uplink is similar to the SCPC case in that the SCPC channels
in the latter are analogous to the time slots in the TDMA uplink frame here. Each frame is
then subdivided as shown in Fig.6.6 into many slots where different types of signal bursts
are assigned. The figure shows 1 access slot, 4 signaling slots and 13 traffic slots which are
direct parallels of the SCM, signaling and traffic channels of the SCPC system. As with the
SCPC system, the variation in propagation delay due to geographical movement leads to
variation in the time of arrival of the uplink bursts of the order of 2 ms. This would
imply a guard time between each slot of the same order to prevent overlapping of the
bursts. With a data rate of 256 kbps this is equivalent to around 500 bits, which is longer
than all SCM and signaling messages, and hence the frame efficiency would be very poor.
Timing correction is therefore used to improve the accuracy of the mobile timing. The
provision of the large guard time is then only required for the first RR message from the
mobile, when timing correction has not yet been carried out. The access slot is therefore
given a guard time of around 1.8 ms. The timing error of the burst is measured by the OBP
and fed back to the mobile, which adjusts its own timing accordingly. Subsequent signaling
and traffic bursts from the mobile can then satisfy a much higher timing accuracy. The guard
time between slots is then reduced to 62.5 µs or 16 bit-durations, which defines the
basic requirement on the mobile timing accuracy. The value of the timing error has a
maximum of 448 bits, as shown in Fig.6.6, and with an accuracy of 1 bit-duration the timing
error is represented by a 9-bit word. As opposed to the SCPC system, this timing
correction is strictly necessary for frame synchronization, bringing with it a more stringent
specification on the accuracy of the mobile timing.

[Figure: 80 ms frame containing an Access Burst (816), Signaling Bursts (256) and Traffic Bursts (1408).
Access Burst: Preamble (128), RR/SCM (256), Guard Time (432).
Signaling Burst: Preamble (128), Message (128).
Traffic Burst: Preamble (128), Traffic Data (1280).]
Fig.6.6 TDMA uplink frame structure
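The guard-time and word-length figures quoted for the TDMA uplink can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the only inputs are the 256 kbps uplink rate, the 2 ms delay variation, the 62.5 µs guard time and the 448-bit maximum timing error stated in the text.

```python
BIT_RATE = 256_000  # uplink data rate in bit/s (Table 6.2)


def bits_in_us(microseconds: float) -> float:
    """Number of bit-durations that fit into the given time at BIT_RATE."""
    return BIT_RATE * microseconds / 1_000_000


def word_length(max_value: int) -> int:
    """Smallest number of bits able to represent every integer in 0..max_value."""
    return max(1, max_value.bit_length())


uncorrected_guard_bits = bits_in_us(2000)   # 2 ms delay variation without correction
corrected_guard_bits = bits_in_us(62.5)     # guard time once timing is corrected
max_error_ms = 448 / BIT_RATE * 1000        # duration of the maximum timing error
timing_error_bits = word_length(448)        # word needed at 1 bit-duration accuracy

print(uncorrected_guard_bits, corrected_guard_bits, max_error_ms, timing_error_bits)
```

The 2 ms variation corresponds to 512 bit-durations, in line with the "around 500 bits" quoted, and a maximum error of 448 needs a 9-bit word.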
[Figure: 80 ms downlink frame. Fields, in order: Preamble, Coding Header, Frame Structure Table, Signal & SCM Messages, Channel Headers, Coding Tail, Traffic Data. Sizes legible in the figure (bits): 128, 96, 256, 7x128, 32, 13x1280. The Signaling/SCM Field totals 3712 bits.]
Fig.6.7 TDM downlink frame format for TDMA system
The TDM downlink frame is shown in Fig.6.7. It is also very similar to that of the SCPC
system, but with fewer traffic slots on the uplink the downlink traffic data field is also
shorter and the signaling/SCM field longer. The functions of the slots are as in the SCPC
system. The normal channel setting-up and clearing procedures are also as in the SCPC
case shown in Fig.6.5, apart from the necessity of timing correction messages for
synchronization of the mobile uplink.
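As a consistency check on the downlink frame, the field sizes can be summed against the number of bits available in 80 ms at 256 kbps. This is a sketch under one assumption: the frame consists of the 128-bit preamble, the 3712-bit Signaling/SCM field and 13 traffic slots of 1280 bits, which is a reading of Fig.6.7 rather than something stated in the text.

```python
BIT_RATE = 256_000   # downlink data rate in bit/s (Table 6.2)
FRAME_MS = 80        # frame duration in ms

# Field sizes in bits, as read from Fig.6.7 (assumed to make up the whole frame)
PREAMBLE_BITS = 128
SIGNALING_SCM_FIELD_BITS = 3712
TRAFFIC_SLOTS = 13
TRAFFIC_SLOT_BITS = 1280

frame_capacity_bits = BIT_RATE * FRAME_MS // 1000
frame_content_bits = (PREAMBLE_BITS
                      + SIGNALING_SCM_FIELD_BITS
                      + TRAFFIC_SLOTS * TRAFFIC_SLOT_BITS)

print(frame_capacity_bits, frame_content_bits)
```

Under this reading the three parts fill the frame exactly, which supports the field sizes shown in the figure.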
The low Eb/No requires channel coding on the uplink. A BCH code with a coding gain
of around 3 dB was proposed [46] as a reasonable compromise between decoder complexity
and coding gain. Since the uplink and downlink budgets in this case are similar, decoding
the through-traffic data from the uplink and encoding it back onto the downlink would
bring about a reduction in BER roughly by a factor of 2. This was the approach taken in
the system design, although in the experimental model its implementation is limited by the
computational power of the processors.
6.2 Experimental Model 
An experimental model of the payload was implemented in order to gain insight into
the technology required and practical problems that may be encountered in realizing the
T-SAT system. The specification of the experimental model is a scaled-down version of the
original proposal in that there are 4 instead of 16 SCPC channels, due to the limitation on
available resources. This allows the SCM, signaling and 2 other traffic channels to be
implemented for the SCPC system. All other aspects of the payload are as originally
specified. The design and implementation of the on-board processing elements, in particular
the Formatter, Coder and the Transmultiplexer, are described in this section.
6.2.1 On-board Processing Subsystem Design Concept 
The block diagram of the on-board processing subsystem of the payload is shown in
Fig.6.8. Both SCPC and TDMA systems are shown but only one is used at any time
instant. The structure of the subsystem is designed to facilitate efficient data transfer
between functional modules as well as ease of overall control of the subsystem. The OBP
is the unit which is responsible for this overall control function, as well as the other main
task of implementing the access protocol. It is therefore necessary to provide bi-directional
links, such as a data bus, between the OBP and the various modules. On the other hand,
since the messages addressed to the OBP are a small proportion of the total data received at
the uplink, it would be inefficient to transfer all received data to the OBP. Hence the
design concept is to provide a path where the real-time signal processing operations are
performed on the data, and subsequently to buffer the data to allow non-real-time access
by the OBP to selected parts of the data. The meaning of real-time is applied here only in a
relative sense of course, since all processes are strictly real-time. This approach for
connecting the modules then allows the high-speed data path to be implemented efficiently
since it is uni-directional, and the data transfer between the OBP and other modules to be
used exclusively for overall control and implementing the access protocol.
For the SCPC system, the input signal is down-converted from the 70 MHz i.f. output
from the preceding r.f. sections and sampled to digital form by the Analogue Front-end.
The SCPC channels are separated and demodulated by the TMUX and Multicarrier
Demodulator (MCD). The data is then temporarily stored in the Formatter. Uplink
messages from mobiles are read by the OBP, which in turn writes back messages to the
Formatter to be sent on the downlink. The Formatter then carries out the necessary
rearrangements of the data, with the Coder performing channel coding on the data requiring
coding. The completed downlink frame is then sequentially output to the modulator and onto
the downlink. Operation of the TDMA system is similar, with the additional process of
uplink decoding.

[Figure: blocks shown are the TDMA Demod., Decoder, Formatter, Encoder, Modulator, TMUX, MCD and OBP, with inputs from the TDMA uplink i.f. and the SCPC i.f. and an output to the downlink i.f.]
Fig.6.8 OBP subsystem functions
The implementation of the experimental system is based on an IBM-PC and
TMS320C25 processors (TMS320 for short). These are chosen because products exist to
allow multiple TMS320 processors to be linked to the IBM-PC processor via the PC Bus.
The TMS320 serial link allows simple data transfer from one processor to another. With
the IBM-PC acting as the OBP, the hardware structure shown in Fig.6.8 can be readily
implemented, allowing efforts to be concentrated on software development. This approach
also leads to great flexibility in that the system parameters can be easily changed, which
is desirable in an experimental model. Detailed implementation techniques and operations
of the various functions are described in the following sections.
6.2.2 Analogue Front-end and TMUX Hardware 
This part of the work was related to hardware development, firstly of a signal
conditioning and acquisition circuit for the TMUX and secondly of the TMUX itself. The
actual software used for the TMUX in the experimental model was developed in a
separate effort [191] as it related to the MCD, which is beyond the scope of this project.
However, design considerations of the software in relation to the hardware design were
investigated and are described here.
6.2.2.1 Analogue Front-end 
The block diagram of the Analogue Front-end is shown in Fig.6.9. Its function is to
perform down-conversion of the SCPC signals from i.f. and analogue-to-digital conversion
at a suitably low sampling frequency. The input consists of 4 SCPC signals centred around
the 70 MHz i.f. The SAW bandpass filter acts as the anti-aliasing filter. The filter passband
is 300 kHz centred around 70 MHz, with a stopband attenuation of 25 dB at the bandedges,
increasing to about 40 dB at 300 kHz either side of the centre frequency. It is used
because of its linear phase characteristics and narrow passband at the required centre
frequency. After quadrature down-conversion, the SCPC signals are centred around zero
frequency. The image-frequency component at 140 MHz is removed by simple RC lowpass
filters. This is not really necessary in practice since such high frequency components are
severely attenuated by the buffer amplifiers that follow. It does, however, help to reduce
the noise in the final signal. The sampling rate of the ADC is 256 kHz, so that the aliasing
noise to the SCPC signals comes from the region of the SAW filter with attenuation
approaching 40 dB. A lower sampling rate would reduce the computational requirement of
the TMUX but increase aliasing noise as the stopband of the SAW filter is not equiripple.
Its choice is therefore a matter of tradeoff and 256 kHz was found to be a good compromise.
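The link between the sampling rate and the aliasing region of the SAW filter can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the 4 SCPC channels occupy a contiguous band of 4 x 14 kHz centred on zero frequency after quadrature down-conversion; the function name and the trial rate of 192 kHz are hypothetical.

```python
NUM_CHANNELS = 4
CHANNEL_SPACING_KHZ = 14
# Assumed occupied band after down-conversion: -28 kHz .. +28 kHz
HALF_BAND_KHZ = NUM_CHANNELS * CHANNEL_SPACING_KHZ // 2


def first_alias_band(fs_khz: int, half_band_khz: int = HALF_BAND_KHZ):
    """Offsets from the centre frequency (kHz) that fold onto the wanted band
    when the complex baseband signal is sampled at fs_khz."""
    return (fs_khz - half_band_khz, fs_khz + half_band_khz)


alias_256 = first_alias_band(256)   # sampling rate used in the experimental model
alias_192 = first_alias_band(192)   # a hypothetical lower rate for comparison

print(alias_256, alias_192)
```

With 256 kHz sampling the nearest aliasing band starts 228 kHz from the centre frequency, well beyond the 150 kHz bandedge and close to the 300 kHz point where the attenuation reaches about 40 dB. A lower rate moves this band towards the bandedge, where the attenuation is nearer 25 dB.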
The use of quadrature down-conversion has its advantages and disadvantages. The
primary advantage is that there are effectively no imaging effects due to the negative
frequency components, in contrast to down-conversion by a real mixer, as illustrated in
Fig.6.10. Hence the output from the quadrature mixer may be centred around zero frequency
so that a very low sampling rate can be used at the ADC, simplifying the signal conditioning
and interfaces to the ADC. The main disadvantage is that the signal is now represented by
two parts, the real and imaginary parts. Since the output of the quadrature mixer is normally
at a low frequency, the increase in cost due to the doubling of components is minimal.
However it presents the problem that the two paths must be made exactly identical or
non-linear distortion to the signal would result. In the experimental model, efforts were
made to achieve this as far as possible. Adjustable gain stages are implemented for fine
matching of the gains between the two paths. The problem is also complicated by the
flash ADCs used, which require single-sided input signals. The output from the quadrature
mixer must be added to a d.c. offset before input to the ADC. This d.c. offset was also
made adjustable to allow such matching to be carried out. This was done by using a simple
sinusoidal signal for the 70 MHz i.f. input, and adjusting the gains and offsets until the
outputs from the ADCs, observed through a TMS320 card interfaced to them, are made
identical in all but their phases. It was found that good matching can normally be
achieved in this way with careful adjustment. A more systematic solution to this
problem is to use digital filters to operate on the sampled data and equalize the difference
between the two paths [192] at the expense of increased computation. The equalization
filters using this method require lengths of around 50, and at the sampling rate of 256 kHz
it was not considered a viable option.

[Figure: 70 MHz i.f. input, Buffer Amp., SAW Bandpass Filter and Quadrature Mixer (70 MHz), followed in each of the two paths by a Lowpass Filter, Buffer Amp., Variable Gain and D.C. Offset stage and A-to-D Convertor (256 kHz) feeding the TMUX input.]
Fig.6.9 Analogue front end functions

[Figure: spectra after down-conversion by a single (real) mixer and by a quadrature mixer, with frequency axis marks around 0, 70 MHz and 140 MHz.]
Fig.6.10 Down-conversion by real and quadrature mixers
The Analogue Front-end circuit was implemented on a double-sided PCB measuring
6 in x 4 in, requiring power of about 4 W, of which the main consumers are the two ADCs.
6.2.2.2 TMUX Hardware
This hardware was required to implement the 4-channel TMUX and the MCD that
follows using TMS320 processors. For the TMUX the small number of channels and the
efficient multiply-accumulate operation of the TMS320 make the tree structure more
efficient than FFT type structures. The output data rate from the TMUX is required to give
2 samples per symbol, which equals 16 kHz. Neither this nor the input sampling rate of 256
kHz is an integer multiple of the channel spacing of 14 kHz, and hence some modification to
the tree structure discussed in Chapter 2 is needed. The basic tree TMUX requires the input
sampling rate to be a multiple of the channel spacing. Hence the straightforward solution
is to use a sampling rate conversion filter at the input to decimate by a rational factor such
that the new sampling rate is a multiple of 14 kHz. Similarly decimation by an
appropriate rational factor at the final stages of the tree filters gives the required output
sampling rate. Clearly the post filters can be combined with the output sampling rate
conversion filters, and the resulting structure is shown in Fig.6.11. The rate conversion
filters can be implemented very efficiently using a type of block processing procedure [193]
so that the operations are performed at the lower output data rate. Finally, it is obvious
that the input rate-conversion filter can be eliminated altogether if the ADC sampling rate
is made a multiple of the channel spacing. However an efficient structure in this case
would still consist of a decimation filter at the input and require a similar amount of
computation. The input rate-conversion filter should therefore be viewed as performing the
same decimation function rather than an operation solely attributed to incommensurate
input sampling rate.
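The chain of sampling rates through the TMUX can be traced with exact rational arithmetic. The following sketch takes the decimation factors 16/7, 2, 2 and 7/4 from Fig.6.11; the stage labels and variable names are illustrative only.

```python
from fractions import Fraction

CHANNEL_SPACING_HZ = 14_000
ADC_RATE_HZ = Fraction(256_000)

# (stage label, decimation factor) in processing order, factors as read from Fig.6.11
STAGES = [
    ("input rate-conversion filter", Fraction(16, 7)),
    ("tree filter stage 1", Fraction(2)),
    ("tree filter stage 2", Fraction(2)),
    ("post-filter", Fraction(7, 4)),
]

rates = [ADC_RATE_HZ]
for _label, factor in STAGES:
    rates.append(rates[-1] / factor)   # decimation divides the sampling rate

for (label, _factor), rate in zip(STAGES, rates[1:]):
    print(f"{label}: {int(rate)} sample/s")

# The tree needs an input rate that is a multiple of the channel spacing.
adc_is_multiple = ADC_RATE_HZ % CHANNEL_SPACING_HZ == 0
tree_input_is_multiple = rates[1] % CHANNEL_SPACING_HZ == 0
```

The ADC rate of 256 kHz is not a multiple of 14 kHz, whereas the 112 kHz rate after the input filter is 8 x 14 kHz, and the final 7/4 decimation brings 28 kHz down to the required 16 kHz.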
[Figure: input rate-conversion filter (decimation by 16/7, 256k to 112k sample/s), tree filter bank with two decimate-by-2 stages (112k to 56k to 28k sample/s) and post-filters (decimation by 7/4, 28k to 16k sample/s) giving the channel outputs.]
Fig.6.11 T-SAT tree TMUX design
The filter lengths and computational requirements of the TMUX are then evaluated
to be as shown in Table 6.3 below. The input filter length is the effective filter length at
the output sampling rate using the block processing structure for fractional decimation
[106]. The instruction rates are based upon the use of the MAC instruction of the
TMS320C25.

                        Input filter    Tree filters
Filter length           19              11 (per filter)
Multiplication (M/s)    4.3             3.7
Addition (M/s)          4.0             2.2
Instruction (M/s)       6.1             7.1
Table 6.3 Computation Requirement of TMUX

The instruction rates imply that three TMS320 are needed in total for the TMUX
implementation. The values above do not include other house-keeping tasks such as data
I/O from the Analogue Front-end and the transfer of data between processors, which total
to less than 3 Mips, and the spare capacity offered by three processors is needed for these
purposes. The three sections of filtering can then be implemented in the three separate
processors. The computational requirement of the MCD was found to require one processor
[75] and hence four processors were used in the hardware, although in the actual
implementation the tasks were distributed differently [191] to suit the interfacing and other
considerations.
Data transfers between the processors were originally tested using the serial link of
the TMS320. The data rate between processors is 112 k complex sample/s, or 3.6 Mbps,
which can be supported by the serial link maximum of 5 Mbps. It was found, however,
that the serial link was not very reliable at such data rates. Also the data samples would
need to be transferred at rather regular time intervals, making block processing type
procedures difficult to implement. Hence first-in-first-out (FIFO) memory was used for
interfacing between the processors.
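The 3.6 Mbps figure follows from the sample rate if each complex sample is carried as two 16-bit words. That word size is an assumption made here (it matches the 16-bit TMS320C25 data word) and the names below are illustrative.

```python
COMPLEX_SAMPLE_RATE = 112_000   # complex sample/s between processors
WORDS_PER_SAMPLE = 2            # real and imaginary parts
WORD_BITS = 16                  # assumed word size per part
SERIAL_LINK_MAX_BPS = 5_000_000

link_rate_bps = COMPLEX_SAMPLE_RATE * WORDS_PER_SAMPLE * WORD_BITS
utilisation = link_rate_bps / SERIAL_LINK_MAX_BPS

print(link_rate_bps, round(utilisation, 2))
```

Under this assumption the link would run at roughly 72% of its rated maximum, which leaves little margin and is consistent with the reliability problems reported.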
The block diagram of the hardware is shown in Fig.6.12. The four processors are
arranged in a pipeline structure with almost identical stages. Apart from the final stage,
which requires external data RAM, the on-chip RAM was sufficient for their functions. The
program is stored in PROM and downloaded to the on-chip RAM for fastest execution. The
FIFOs are mapped to the I/O ports of the processors, and zero-wait-state data access to and
from the FIFOs is implemented to allow efficient data transfer. All logic interfacing functions,
such as the number of wait states for I/O access, are carried out by a single programmable
array logic (PAL) device for each processor. The PAL takes its input from selected address and
control lines from the TMS320, and control signals for the peripheral devices are derived.
This helps in the modularity of the design in that stages with different peripheral devices
can be easily accommodated by different logic functions for the PAL and no changes in the
hardware are necessary. It can also implement different wait states for different addresses,
such as in the first stage, where the buffer interface to the Analogue Front-end has
one-wait-state I/O access whereas access to the I/O address of the FIFO is zero wait-state.
Finally the final stage processor is interfaced to the PC Bus, allowing the OBP to communicate
with the MCD. Interrupts are generated when the OBP writes data to the buffer on the PC Bus.

[Figure: four TMS320 stages in a pipeline, each with PROM and PAL, linked by FIFOs; input from the Analogue Front End, external RAM on one stage, a PC Bus interface and an output to the Formatter.]
Fig.6.12 TMUX hardware block diagram

The hardware was implemented on two standard-sized IBM-PC peripheral cards. It
could be reduced to a single card had more powerful PCB design and manufacturing
facilities been available.
6.2.3 Formatter 
The basic function of the Formatter is to act as a temporary storage area for the
uplink data, allowing messages to the OBP to be retrieved while traffic data not required
by the OBP is simply routed to the downlink. An additional coding function exists to
allow coding of downlink messages from the OBP. The functions of the Formatter in the
SCPC and TDMA systems are the same, but due to differences between the ways uplink
signals are received, the internal realization and interfacing procedures need to be different.
A software-based approach using the TMS320 is taken. The advantage of this is the
reconfigurability it offers for changing system requirements. The data input/output
interfacing is also simplified by using the standard TMS320 serial link interface.
6.2.3.1 Overall Operation 
The operation of the Formatter can be represented as a cyclic buffer as shown in
Fig.6.13, with pointers addressing locations where data are being read or written by
different processes. For the purpose of Doppler buffering [80], the processes only consist of
data write and read, and the size of the buffer needs only to accommodate the maximum
variation in propagation delay. With each satellite offering 8 hour/day service this
corresponds to just less than 80 ms, implying a buffer size of around 2 kByte for both
SCPC and TDMA systems because the net uplink data rate in both systems is the same.
In addition, data access by the OBP requires a certain finite time which must be provided for
by the buffer size. To facilitate these data accesses as well as the internal operations of the
Formatter, a total buffer memory equivalent to 3 downlink frames or 7.5 kByte is used.
Small amounts of additional memory are needed to assist the formatting and convolutional
coding of the downlink data. It will be seen that this memory requirement is necessary
to implement all the Formatter operations, and that it does not necessarily lead to
increased system delay.
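The 7.5 kByte figure for three downlink frames can be reproduced directly from the downlink rate and frame duration. This is a minimal sketch with illustrative names; it assumes 1 kByte = 1024 bytes and does not attempt to reproduce the 2 kByte Doppler buffer estimate, which depends on the net uplink rate.

```python
DOWNLINK_RATE_BPS = 256_000
FRAME_MS = 80
FRAME_BUFFERS = 3

frame_bits = DOWNLINK_RATE_BPS * FRAME_MS // 1000
frame_bytes = frame_bits // 8
total_bytes = FRAME_BUFFERS * frame_bytes
total_kbyte = total_bytes / 1024   # assuming 1 kByte = 1024 bytes

print(frame_bytes, total_bytes, total_kbyte)
```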
[Figure: cyclic buffer with read and write pointers; labels: "OBP processes uplink messages" and "System delay".]
Fig.6.13 Formatter operation concept

The conceptual memory organization is shown in Fig.6.14. There are three memory
blocks which act as the storage areas for the uplink and downlink data and operate as
'ping-pong' buffers so that each block, which is suitably called a Frame Buffer, is assigned
to buffer data for one 80 ms downlink frame. Each of these Frame Buffers is divided into
two main buffer sections, one for signaling messages and another for the output data. The
signaling messages buffer has two functions. Firstly it is used for storing data received on
the uplink other than those in the traffic channels for the SCPC or the traffic time slots in
the TDMA systems. It is organized so that each SCPC channel or TDMA time slot has a
fixed address in the buffer of each block. Thus the OBP can easily retrieve all uplink
messages from the mobiles by accessing the corresponding memory locations. The second
function of this input data buffer is for storing temporarily the downlink messages from
the OBP. This is necessary because these downlink messages need to be coded. The
convolutional coder, which exists as part of the Formatter software, operates on the data
from the OBP and then places the coded data at the appropriate addresses in the output
buffer area. This is described in more detail in section 6.2.3.4.

The output buffer is a memory area that is continuously and sequentially being read
out to the serial link connected to the downlink TDM modulator. As the downlink messages
from the OBP are coded and placed in the output buffer, these OBP-generated messages
are followed by traffic data in the downlink time frame. These traffic data are placed
directly from the input since no operations are required upon them.
[Figure: Frame Buffers 1, 2 and 3 within the Formatter, each containing a Signaling Message Buffer, an Output Buffer and a Status Table; data from the uplink is written at a moving write pointer together with status data, and a moving output pointer reads data out to the downlink modulator.]
Fig.6.14 Formatter memory organization
In addition to the input and output buffering areas, there needs to be a Status Table
for each Frame Buffer. Each entry in the Status Tables corresponds to a time slot within
the 80 ms time frame. These entries refer to the presence and validity of the data in a slot,
and are supplied by the demodulator and decoder via the serial link as part of the data as
described previously in section 6.2.1.
6.2.3.2 TDMA Formatter Realization 
The timing relationship between the various events in the Formatter is shown in
Fig.6.15. This shows the three main events: input data being placed in the appropriate
locations of the Frame Buffer, data from the OBP being coded and placed into the output
buffer, and data in the output buffer being transferred out to the modulator. The necessity
for three buffers, as opposed to two in the usual ping-pong buffer set-up, can be seen
from the diagram to be due to the time required for the OBP to transfer downlink messages
to the Formatter and for the messages to be coded. In practice, since the downlink
signaling messages for each frame amount to around 50 to 100 sixteen-bit words, data
transfer from the OBP takes just a few hundred microseconds, which is very short compared
to the frame time of 80 ms. Coding of the signaling messages takes considerably
longer, however, and is therefore the main reason for the 3-buffer arrangement. This
introduces a delay of about one frame to the downlink messages from the OBP, which is
inconsequential. It should be noted that traffic data is not affected by this delay because it
only applies to the signaling messages from the OBP, whereas data from the uplink traffic
slots are put directly into the next downlink frame and the maximum delay introduced is one
frame.
[Figure: timing rows for the Downlink Frame (128-bit preamble followed by the Signalling/SCM field), XF (scrambler off), Coder and Input Buffer over successive frames 0, 1, 2. The Coder row alternates "OBP writes to buffer 0", "code buffer 0 to 1", "OBP writes to buffer 1", "code buffer 1 to 2", "OBP writes to buffer 2", "code buffer 2 to 0". Annotations: "OBP reads signaling messages and returns downlink messages" and "OBP reads status table of frame 1".]
Fig.6.15 Formatter timing relationship
The software has been implemented so that the three processes effectively operate
independently. It generates an interrupt to the PC bus at a certain time in each frame to
signify the complete reception of a frame of data, upon which the OBP can read the signaling
messages and the data status from the Status Table. The OBP then writes back messages
to be sent on the downlink, the completion of which is indicated by a special word
written to a certain memory location. This is detected by the coder, which then starts
coding the data.
The output buffer is continuously being read out to the modulator by loading successive
memory contents to the Data Transmit Register upon the serial link transmit interrupt
of the TMS320. As shown previously in Fig.6.15, data in the output buffer memory
will always be ready when it is read out to the downlink.
6.2.3.3 SCPC Formatter Realization 
The SCPC Formatter differs from the TDMA Formatter mainly in the handling of the
input data. As opposed to the time slots in the TDMA system, the input data in the
SCPC system corresponds to frequency slots which are asynchronous with each other,
and this makes formatting of input data more difficult. The approach to this problem is to
use the real-time frame synchronization information available from the output buffer.
The change-over from one input buffer to the next is determined by the rule that an
input data packet starting before the beginning of the next downlink frame is placed in the
present input buffer. This is straightforward for SCM and signaling channels, where
uplink data occur naturally in packets, but is not so for ILDM from the traffic channels,
where arbitrary packetizing of the data needs to be carried out by the Formatter. The start
of the ILDM is made the start of the first packet. Thereafter every 1280 bits of data,
which is the amount received every 80 ms, are grouped into one packet and put in the
corresponding downlink time slot. Fig.6.16 illustrates an example of this procedure.
According to this organization of uplink data, therefore, there may be a maximum of 4
packets each from the SCM and the signaling channels and one packet per traffic channel
for every 80 ms. This determines the number of entries in the Status Tables.
Since the SCM, signaling and ILDM packets are of different lengths, the handling of
data from these different channels needs to be independent for each channel. The software
in effect implements four independent input buffers, one for each channel, with each one
behaving in the manner of the input buffer in the TDMA system operating on the input
data from each channel, and allowing the formatting of the downlink to be performed
without being aware of the asynchronous nature of the input.
[Figure: 80 ms frame timeline for the SCM Channel, Signaling Channel and Traffic Channel. Labels: "3 packets to input buffer", "to previous input buffer", "2 packets to input buffer", "traffic packet to next buffer".]
Fig.6.16 Example of uplink packet distribution to an input buffer
6.2.3.4 Convolutional Coder 
FEC is carried out on the downlink SCM and signaling messages by the Formatter.
The coding scheme chosen is a 1/2-rate binary convolutional code of constraint length 6.
Each code word contains two coded bits c_i(0) and c_i(1) given by
    c_i(0) = d_i + d_{i-2} + d_{i-5}                            (6.1a)
    c_i(1) = d_i + d_{i-1} + d_{i-3} + d_{i-4} + d_{i-5}        (6.1b)
where d_n represents the n-th input data bit and the additions are modulo 2.
In addition a bit-interleaving scheme is needed after the convolutional coding to cater
for burst errors. The bit-interleaving scheme operates on blocks of 128 bits of data with a
depth of 4. The original data is partitioned into four 32-bit words, C_1 C_2 C_3 C_4, and
the output block is then constructed by taking a bit from each word in turn. The
interleaved output block is expressed as a concatenation of four words, Z_1 Z_2 Z_3 Z_4.
An efficient implementation of these coding schemes was achieved using a look-up
table technique which combines the two processes. The look-up table returns the
2-bit output for a 6-bit input word in a pre-interleaved position, such that the four
output words are constructed simultaneously with the convolutional coding. The
implementation on the TMS320 requires 1406 processor cycles for each 64-bit input or
128-bit output block. For the maximum of coding the 3712-bit signaling/SCM field for each
downlink frame in the TDMA system, the computational requirement is 0.51 Mips. In practice
the Coder also needs to read from the Frame Structure Table of each downlink frame to
determine the number of downlink messages, which determines the actual amount of data
requiring coding. When this is less than the length of the Signaling/SCM field, zeroes are
filled in to occupy the unused slots and the amount of computation is reduced. A coding
tail is required by the Viterbi decoder at the receiving mobile terminals, which is produced
by effectively appending the appropriate number of zeroes to the end of the input data.
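The look-up table idea can be illustrated in a few lines. The sketch below is not the TMS320 implementation: it keeps coding and interleaving as two separate steps for clarity, whereas the text describes a table that places the bits directly in their interleaved positions. The tap sets are read from Eq. (6.1a) and (6.1b), with delays 0, 2, 5 for c_i(0) and 0, 1, 3, 4, 5 for c_i(1); the delay-4 tap is an assumption where the printed equation is hard to read, and all function names are hypothetical.

```python
# Tap delays taken from Eq. (6.1a)/(6.1b); the delay-4 tap in C1 is an assumption.
TAPS_C0 = (0, 2, 5)
TAPS_C1 = (0, 1, 3, 4, 5)
CONSTRAINT_LENGTH = 6


def _parity(window: int, taps) -> int:
    """Modulo-2 sum of the window bits selected by taps (bit k holds d_{i-k})."""
    return sum((window >> k) & 1 for k in taps) & 1


# 2^6 = 64-entry table mapping a 6-bit window to the pair (c_i(0), c_i(1))
LUT = [(_parity(w, TAPS_C0), _parity(w, TAPS_C1))
       for w in range(1 << CONSTRAINT_LENGTH)]


def encode(bits):
    """Rate-1/2 convolutional encoding by table look-up, from the all-zero state."""
    window = 0
    out = []
    for d in bits:
        window = ((window << 1) | d) & 0x3F   # new bit enters at position 0
        out.extend(LUT[window])
    return out


def interleave(block):
    """Depth-4 interleaving of a 128-bit block: one bit from each 32-bit word in turn."""
    if len(block) != 128:
        raise ValueError("interleaver operates on 128-bit blocks")
    return [block[32 * word + j] for j in range(32) for word in range(4)]


impulse_response = encode([1, 0, 0, 0, 0, 0])
print(impulse_response)
```

Feeding a single 1 followed by zeroes reads the two generator sequences back out of the table, which is a convenient way to confirm that the table matches the tap sets. A different code would only require the two tap tuples and the constraint length to be changed.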
The look-up table occupies 2^6 = 64 addresses in the on-chip RAM of the TMS320.
Conceptually the coding schemes can be easily implemented in hardware using a few shift
registers, multiplexers and exclusive-or gates. The software approach taken, however,
allows simpler control when coding is to be applied only to part of the downlink data, the
length of which is also variable. A different convolutional code, such as one with a different
constraint length, can also be implemented easily with only minor changes to the look-up
table and program code. The low computational requirement allows the Coder to be
incorporated with other tasks within the Formatter.
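The quoted Coder load of 0.51 Mips can be reproduced from the cycle count per block. The sketch assumes that the 3712-bit signaling/SCM field is counted in coded (output) bits and that one processor cycle corresponds to one instruction; the variable names are illustrative.

```python
CYCLES_PER_BLOCK = 1406    # per 64-bit input / 128-bit output block
OUTPUT_BLOCK_BITS = 128
FIELD_BITS = 3712          # signaling/SCM field, assumed to be coded bits
FRAME_MS = 80

blocks_per_frame = FIELD_BITS // OUTPUT_BLOCK_BITS
cycles_per_frame = blocks_per_frame * CYCLES_PER_BLOCK
mips = cycles_per_frame * 1000 / FRAME_MS / 1e6

print(blocks_per_frame, cycles_per_frame, round(mips, 2))
```

Treating the 3712 bits as input data instead would double the number of blocks and give roughly 1 Mips, so the output-bit reading is the one that agrees with the figure in the text.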
6.2.3.5 Data I/O and Synchronization
The implementation of the Formatter indirectly affects the other components of the
OBP subsystem by determining the way data are transferred between them. A data format
is therefore designed to facilitate efficient transfer of uplink/downlink data as well as
information necessary for synchronization between the components.
Burst mode operation is used in the TMS320 serial link, in which data are transferred
in packets of 16-bit words. In each packet of 16 bits, 8 bits are allocated for data and the
other 8 bits for synchronization (Sync) information concerning those 8 bits of data. The
8-bit Sync word is further divided into four fields with separate functions as shown in
Fig.6.17.
msb  | S | DS | U | PC | DATA |  lsb
bits:  1    2   1    4     8
S  : Start of block, 1 = start of block (1 bit)
DS : Data State (2 bits)
U  : Uncoded, 1 = uncoded (1 bit)
PC : Packet Count (4 bits)
Fig.6.17 Serial data packet format
U is used in the demodulator-decoder link in the TDMA system, and is for the 
demodulator to signal to the decoder whether the data is coded (U=0) or uncoded (U=1). DS 
carries other information associated with the data. It is 0 when there is no detected data 
error, 1 when there is a data error, and 2 when the slot/channel is empty. Any component, 
e.g. the demodulator or the decoder, may set DS to 1 when it concludes that the data is 
erroneous. DS is set to 2 by the demodulator when it believes that the uplink TDMA time 
slot or SCPC channel is empty. When a packet indicates an empty slot/channel the other 
fields can be ignored. It is simply a signaling message to the Formatter, from where the 
information is reported to the OBP via the Status Table. 
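
The packet layout of Fig.6.17 can be illustrated with a short Python sketch. It is not the Formatter code, which runs on the TMS320. The function names pack_packet and unpack_packet are hypothetical, and the bit positions assume that the Sync word occupies the upper byte of the 16-bit word with S as the msb.

```python
# Illustrative sketch of the Fig.6.17 packet layout (not the thesis code).
# Assumed bit order, msb first: S(1) | DS(2) | U(1) | PC(4) | DATA(8).

DS_OK, DS_ERROR, DS_EMPTY = 0, 1, 2  # Data State values described in the text


def pack_packet(s, ds, u, pc, data):
    """Build one 16-bit serial-link word from its five fields."""
    if not (0 <= s <= 1 and 0 <= ds <= 3 and 0 <= u <= 1
            and 0 <= pc <= 15 and 0 <= data <= 255):
        raise ValueError("field out of range")
    return (s << 15) | (ds << 13) | (u << 12) | (pc << 8) | data


def unpack_packet(word):
    """Split a 16-bit word back into the tuple (s, ds, u, pc, data)."""
    return ((word >> 15) & 0x1, (word >> 13) & 0x3, (word >> 12) & 0x1,
            (word >> 8) & 0xF, word & 0xFF)


# First packet of a block carrying one uncoded data byte.
word = pack_packet(s=1, ds=DS_OK, u=1, pc=0, data=0xA5)
print(hex(word), unpack_packet(word))
```

A receiver would unpack each word first and, when DS indicates an empty slot/channel, disregard the remaining fields as described above.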
The strategy for safeguarding synchronization of the serial data transfer is 
implemented by the S and PC bits. In the TDMA system, synchronization of the serial link data 
is essential because any data packet missed by the decoder is disastrous for the decoding 
of block codes. Similarly, for the Formatter, uplink frame synchronization would be lost, 
corrupting the entire system. The concept is to divide data into blocks of 128 bits for 
transfer from one board to another. The S bit is 1 when the data is the first 8 bits of 
the 128-bit block. Each 128-bit block is transferred in 16 packets, and PC increments from 0 
to 15 as a count of these 16 packets. Hence any missed serial link packets can be easily 
detected, allowing the decoder or the Formatter to readjust as necessary. For the SCPC 
system, since the uplink data are asynchronous, the data from each channel may occur in 
any order. The PC field is therefore used to indicate to the Formatter which channel the 
data is from. The S bit is then used to indicate the start of an uplink burst. As opposed to 
the TDMA system, loss of frame synchronization for data from each channel is not fatal, 
in that only the messages in that particular uplink burst are corrupted without affecting the 
rest of the system. Hence a block count of the type used in the TDMA system is thought 
unnecessary. 
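
The TDMA-mode check can be sketched as follows. This is a minimal illustration in Python rather than the TMS320 implementation. The helper names count_missed and find_gaps are hypothetical, and the sketch assumes that S is 1 exactly when PC is 0.

```python
BLOCK_PACKETS = 16  # one 128-bit block = 16 packets of 8 data bits


def count_missed(prev_pc, pc):
    """Packets lost between two consecutively received PC values (mod 16)."""
    return (pc - prev_pc - 1) % BLOCK_PACKETS


def find_gaps(packets):
    """Scan (s, pc) pairs and return [(index, lost)] for every detected gap."""
    gaps = []
    prev_pc = None
    for index, (s, pc) in enumerate(packets):
        if s != int(pc == 0):
            # Assumed convention: S marks the first packet of a block only.
            raise ValueError("S bit inconsistent with packet count")
        if prev_pc is not None:
            lost = count_missed(prev_pc, pc)
            if lost:
                gaps.append((index, lost))
        prev_pc = pc
    return gaps


# Two complete blocks, from which packets 5 and 6 of the first are dropped.
stream = [(int(pc == 0), pc) for pc in list(range(16)) * 2]
del stream[5:7]
print(find_gaps(stream))  # [(5, 2)]
```

Because the count is only 4 bits wide, a loss of exactly 16 consecutive packets (or any multiple of 16) would go unnoticed by this check.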
The allocation of 8 bits for the Sync word leads to some redundancy in the 
information for synchronization. It is, however, a compromise between efficiency in data transfer 
and the ease with which the Formatter can manipulate input data. The maximum data rate 
for the serial link is thereby increased to 512 kbps, which is still very low compared to the 5 
Mbps supported by the TMS320. This data format therefore allows all 'real-time' 
information to be transferred alongside the data without causing unacceptable increases in the 
serial link data rate. 
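
The doubling of the link rate follows directly from the packet format, as the small Python calculation below shows. The 256 kbps payload rate is an assumption taken from the data rate quoted in Appendix 6A, and the function name serial_link_rate is hypothetical.

```python
def serial_link_rate(payload_bps, data_bits=8, sync_bits=8):
    """Line rate needed when every data_bits of payload carry sync_bits of overhead."""
    return payload_bps * (data_bits + sync_bits) / data_bits


PAYLOAD_BPS = 256_000        # assumed payload rate (data rate in Appendix 6A)
TMS320_MAX_BPS = 5_000_000   # serial port capability quoted in the text

rate = serial_link_rate(PAYLOAD_BPS)
print(rate, rate / TMS320_MAX_BPS)  # 512000.0 0.1024
```

Under these assumptions the serial link would use roughly a tenth of the quoted serial port capacity.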
The hardware interface to the TDM Modulator is via the transmit serial link of the 
TMS320. The data clock of the TDM Modulator is used as the external clock for the serial 
port. It therefore requires the serial link to operate in continuous mode, while the receive 
side requires burst mode operation. This is made possible by dividing the data clock 
by 16 using external hardware and using the result as an external FSX to the TMS320, which is 
configured to operate in burst mode. 
The timing requirement of the TDM Modulator implies that upon a TINT the 
TMS320 has only 19 cycles to load valid data to the DXR. A further requirement of the 
TDM Modulator is an external signal that disables the scrambler. This is asserted at the 
beginning of the 80 ms time frame to prevent scrambling of the preamble and unique 
word, and has the same timing requirement as the loading of data into the DXR. Both 
of these are made possible by a careful implementation of the transmit interrupt service 
routine, and the XF of the TMS320 is used as the external signal to disable the scrambler. 
This also provides a convenient hardware signal that marks the start of the downlink 
frames, which is required by the TDMA demodulator. 
6.2.3.6 Computational Requirement 
The executable object code for the Formatter occupies about 1K words. Total storage 
requirement amounts to just over 4K words, of which the output buffers amount to about 
90%. Differences between the SCPC and TDMA implementations are small, with the SCPC 
version requiring slightly more on-chip RAM for variable storage. 
The computational requirement in both cases amounts to about 1.8 Mips for the 
experimental model specifications. This is shared approximately equally between the 
coding, data output and the actual formatting of downlink data. 
The Formatter was tested with known data input together with the OBP using a 
TMS320 development system interfaced to the PC Bus, and with a variety of input data 
the operation of the Formatter was observed to be as designed, without any problems with 
processing speed. 
6.3 Summary 
The system design of the OBP subsystem for the T-SAT payload has been described. 
The actual implementation of the OBP architecture was investigated through the design of 
an experimental model of the payload using TMS320C25 processors interfaced to an IBM 
PC. Details concerning the design of the TMUX and the Formatter were discussed to show 
the techniques for implementing the OBP architecture by exploiting the capability of the 
hardware with a suitable format to assist data transfer and synchronization of the 
subsystem. Apart from demonstrating the hardware requirement of an OBP payload, it has also 
brought to light the complexity in implementing such systems and indeed the possible 
problems in larger systems with higher data rates or larger numbers of channels. Amongst 
others, the efficient transfer of data and synchronization information between components 
achieved by the serial data format used in the experimental model would not be directly 
applicable to larger systems. The viability of the OBP payload therefore lies not only in 
the efficient implementation of its constituent components but just as importantly in 
the interconnections between them. 
Appendix 6A. T-SAT Link Budgets 
These link budgets are calculated using assumptions made in the T-SAT system 
specification [185]. 

1. TDMA/TDM System Uplink. 
Mobile transmit power             13.0 dBW 
Mobile antenna gain               15.0 dB 
Mobile antenna gain ripple        -1.0 dB 
Mobile EIRP                       27.0 dBW 
Free space loss                 -188.9 dB 
Atmospheric/rain attenuation      -0.5 dB 
Carrier power at satellite      -162.4 dBW 
Satellite antenna gain            23.8 dB 
Antenna noise temperature         27.4 dBK 
Satellite G/T                     -3.6 dB/K 
Boltzmann's Constant            -228.6 dBW/K/Hz 
Modem implementation loss         -1.5 dB 
Carrier-to-noise ratio            61.1 dBHz 
Data rate (256 kbps)              54.1 dBHz 
Eb/No                              7.0 dB 

2. SCPC/TDM System Uplink. 
Mobile transmit power              7.0 dBW 
Mobile antenna gain               15.0 dB 
Mobile antenna gain ripple        -1.0 dB 
Mobile EIRP                       21.0 dBW 
Free space loss                 -188.9 dB 
Atmospheric/rain attenuation      -0.5 dB 
Carrier power at satellite      -168.4 dBW 
Satellite antenna gain            23.8 dB 
Antenna noise temperature         27.4 dBK 
Satellite G/T                     -3.6 dB/K 
Boltzmann's Constant            -228.6 dBW/K/Hz 
Modem implementation loss         -1.5 dB 
Carrier-to-noise ratio            55.1 dBHz 
Data rate (256 kbps)              42.1 dBHz 
Eb/No                             13.0 dB 

3. TDM Downlink. 
Satellite transmit power          13.0 dBW 
Diplexer and feeder loss          -1.5 dB 
Mobile antenna gain               23.0 dB 
Satellite EIRP                    35.0 dBW 
Free space loss                 -188.9 dB 
Atmospheric/rain attenuation      -0.5 dB 
Carrier power at mobile         -154.4 dBW 
Mobile antenna gain               15.0 dB 
Mobile antenna gain ripple        -1.0 dB 
Antenna noise temperature         24.5 dBK 
Mobile G/T                       -10.5 dB/K 
Boltzmann's Constant            -228.6 dBW/K/Hz 
Modem implementation loss         -1.5 dB 
Carrier-to-noise ratio            62.2 dBHz 
Data rate (256 kbps)              54.1 dBHz 
Eb/No                              8.1 dB 
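
The way the entries of these budgets combine can be sketched in Python. This is only an illustration of the usual decibel bookkeeping, applied to the two uplink tables. The function name link_budget and its argument names are hypothetical, and the data rate is passed in dBHz exactly as tabulated.

```python
BOLTZMANN_DBW_K_HZ = -228.6  # Boltzmann's constant as used in the tables


def link_budget(tx_power_dbw, tx_gains_db, path_losses_db,
                g_over_t_db_k, impl_loss_db, data_rate_dbhz):
    """Combine link budget entries (all in dB units) into C/No and Eb/No."""
    eirp = tx_power_dbw + sum(tx_gains_db)
    carrier = eirp + sum(path_losses_db)
    # C/No = received carrier + G/T - k, reduced by the implementation loss
    c_over_n0 = carrier + g_over_t_db_k - BOLTZMANN_DBW_K_HZ + impl_loss_db
    return {
        "eirp_dbw": round(eirp, 1),
        "carrier_dbw": round(carrier, 1),
        "c_over_n0_dbhz": round(c_over_n0, 1),
        "eb_over_n0_db": round(c_over_n0 - data_rate_dbhz, 1),
    }


# Table 1: TDMA/TDM uplink
tdma = link_budget(13.0, [15.0, -1.0], [-188.9, -0.5], -3.6, -1.5, 54.1)
# Table 2: SCPC/TDM uplink
scpc = link_budget(7.0, [15.0, -1.0], [-188.9, -0.5], -3.6, -1.5, 42.1)
print(tdma)
print(scpc)
```

With the tabulated inputs the sketch gives the tabulated carrier-to-noise ratios (61.1 and 55.1 dBHz) and Eb/No values (7.0 and 13.0 dB).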
7 
Conclusions and 
Suggestions for Future Work 
7.1 Conclusions 
A number of conclusions may be drawn from the work carried out in this project, in 
terms of the objectives achieved by the digital signal processing techniques and the general 
digital implementation of the OBP functions, and the difficulties found in the investiga-
tions. 
7.1.1 Objectives Achieved by Digital Processing Techniques 
7.1.1.1 Uniform-Bandwidth TMUX 
The viability of applying computation reduction techniques, through multirate digital 
signal processing, to the uniform TMUX problem has been considered, and a significant 
improvement in computational efficiency was found possible using the binary tree filter bank 
structure with complex filtering at each stage. 
Results have been obtained in the investigations of the relationship between system 
performance and the parameters involved in the processing structures. In particular, finite 
wordlength effects due to various elements within the TMUX operations were considered 
in detail in order to provide an approach to the optimization of hardware requirement in 
relation to system specifications in on-board processing satellites. Two different 
approaches to the estimation of the BER degradation with variations of the individual 
parameters were chosen. Firstly, an analytical approach based on statistical properties of 
the signals and system was taken, and approximate closed-form expressions relating the 
BER degradation to the individual parameters were obtained. Secondly, the system was 
simulated with the various finite wordlength effects introduced separately to obtain 
experimental data on the BER characteristics. The results from these two approaches were found 
to be very similar and hence appear to confirm their validity. Thus the overall 
methodology allows a tradeoff between computational requirement and performance to be carried 
out. Results relating individual parameters to the performance, in terms of bit-error-rates 
of QPSK signals, can therefore be used for the design of a TMUX to different system 
specifications so that the minimum wordlength requirement for each element in the TMUX 
may be assigned. 
As the results for the reasonably large number of channels considered indicate that 
the various wordlength requirements are relatively low and less than the wordlengths of many 
microprocessors, this design approach for the minimization of wordlength requirement 
is unlikely to be applied when the TMUX is implemented using such devices. 
However, with the increasing capability and falling prices of custom-designed digital 
devices, the design approach becomes an important tool for the efficient implementation of 
the TMUX in silicon. 
It was also determined that exhaustive variation of the parameters was impractical 
because of the colossal amount of data it would produce out of all the reasonable 
combinations of the parameters. Hence the results generated would not be applicable to system 
specifications which were very different, and in such a case the approach taken in the 
investigation of the variation of performance with the parameters would have to be applied to 
produce results specific to a given system requirement, and a tradeoff may then be carried 
out using the approach demonstrated. 
The other parameters investigated were associated with the filter specification of the 
TMUX. The analytical investigations of the tree TMUX in this respect could not be 
approximated easily in closed form, and numerical methods were applied to obtain the 
equivalent approximation specific to the set of filters used. As with the other parameters, 
the results so obtained were seen to be of the same order of magnitude as the simulation 
results, thus confirming the approach for the assessment of the tree filter bank using the 
integrated sidelobe ratio of the composite response of the required channel. This property 
of the tree filter bank due to multistage filtering has not been exploited in terrestrial TMUX 
because of the need for keeping crosstalk strictly below a predetermined level. The OBP 
TMUX does not need to satisfy such a requirement and its frequency response may thus be 
more arbitrary, so that the overall computation may be minimized with respect to the BER 
characteristics. 
Finally, in terms of the actual arithmetic requirements of the tree filter bank, the 
hardware required for the digital implementation of the TMUX using the tree structure is 
quite high for realistic numbers of SCPC channels, but entirely achievable using existing 
DSP microprocessors or custom-designed silicon. The feasibility of an OBP TMUX for 
demultiplexing of uniform-bandwidth SCPC signals is therefore only dependent on other 
factors such as the availability of space-qualified devices. 
7.1.1.2 Flexible Transmultiplexer 
For systems involving non-uniform bandwidth channels, two Flexible TMUX 
methods were examined and compared. Each method has its own constraints as to the 
bandwidth and frequency allocations of the channels, and their resultant computational 
requirements. 
In order to improve the computational efficiency of the DFT Convolution method, 
special filters were designed to reduce the number of arithmetic operations, with any 
degradation to the frequency response characteristics of the filters being kept to a minimum. 
This has been achieved by two means, firstly by forcing the DFT stopband samples to zero 
and secondly by limiting the tail samples of the impulse response. The stopband 
attenuation and passband ripple characteristics need to be traded off against the magnitude of the 
impulse response tail, which causes time domain aliasing of the DFT convolution output. 
The examples show that good tradeoffs may normally be achieved to obtain reasonable 
frequency response characteristics with rather low aliasing response magnitude, as 
compared to that of a filter resulting from the 'brute force' means of simply setting the DFT 
stopband samples of an optimal equiripple filter to zero. Apart from the minor drawback 
of being rather computation intensive and hence requiring a long run time, the filter design 
method is thus very effective for improving the efficiency of DFT convolution. 
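
The effect of the 'brute force' approach on the impulse response tail can be illustrated with a short Python sketch. It is not the filter design method developed in this work. A Hamming-windowed sinc lowpass stands in for the optimal equiripple filter, and the DFT size, filter length and band edges are arbitrary assumed values.

```python
import cmath
import math

N = 64            # DFT size used for the convolution (assumed)
TAPS = 33         # length of the prototype lowpass filter (assumed)
CUTOFF = 0.125    # normalised cutoff in cycles/sample (assumed)
STOP_EDGE = 0.20  # bins at or above this frequency are forced to zero (assumed)


def lowpass(taps, cutoff):
    """Hamming-windowed sinc, a stand-in for the optimal equiripple design."""
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    return h


def dft(x, inverse=False):
    """Direct O(N^2) DFT or inverse DFT, adequate for a small example."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * m * k / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def zero_stopband(spectrum, stop_edge):
    """Set every DFT sample whose frequency lies in the stopband to zero."""
    n = len(spectrum)
    return [0 if min(k, n - k) / n >= stop_edge else v
            for k, v in enumerate(spectrum)]


def tail_energy(h, taps):
    """Energy of the impulse response samples beyond the original filter length."""
    return sum(abs(v) ** 2 for v in h[taps:])


padded = lowpass(TAPS, CUTOFF) + [0.0] * (N - TAPS)
forced = dft(zero_stopband(dft(padded), STOP_EDGE), inverse=True)
print(tail_energy(padded, TAPS), tail_energy(forced, TAPS))
```

The zero-padded prototype has no tail at all, whereas the response with its stopband samples forced to zero is no longer confined to the first TAPS samples. The energy that appears in the tail is the kind of quantity that the designed filters trade against stopband attenuation and passband ripple.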
For the Flexible TMUX using the analysis/synthesis filter bank, a combination of the SSB 
filter bank and a simple alteration of the equiripple filter design procedure provides a good 
tradeoff between computational requirement and performance. The near-optimal nature 
of the analysis prototype also leads to high computational efficiency for the analysis filter 
bank on its own, which is significant especially when the number of channels to be 
reconstructed is small. Aliasing noise due to imperfect reconstruction at the synthesis filter 
bank can be seen to be of the same order as that in the DFT convolution method by a 
comparison of the responses of their aliasing components, and hence comparable 
performance can be expected. 
The computation requirement of the DFT convolution TMUX is rather high compared 
to the uniform-bandwidth TMUX, mainly due to the large input DFT. It is, however, still 
achievable for lower input sampling rates and hence a smaller total possible number of 
channels. For the analysis/synthesis filter bank the number of arithmetic operations is 
more dependent on the number of channels to be reconstructed and their respective 
bandwidths. In general, the analysis/synthesis filter bank offers higher computational 
efficiency when the wideband channel bandwidths are small in relation to the input 
sampling frequency. In addition to the consideration of computational efficiency, each method 
imposes different constraints on the channel bandwidths and frequencies, such that the 
choice of a processing structure depends on the specific requirement of a system, and the 
optimum will involve a combination of the different TMUX methods. As such, the two 
Flexible TMUX methods can be most effectively applied as processing elements in a large 
OBP system where a fixed analogue processing front-end may precede a number of digital 
TMUX's which provide the flexibility required of the system. 
7.1.1.3 Implementation of TMUX 
Practical issues in the implementation of a Flexible TMUX were examined in relation 
to the problem of high computational requirement and ability to implement different algo-
rithms for flexibility. 
The critical element requiring a large number of arithmetic operations was the DFT in 
both methods of Flexible TMUX. A software based design approach was adopted for the 
ease of implementing different DFT algorithms. This was achieved by an index-mapping 
technique which effectively distributed the DFT computations among multiple processors. 
Using a look-up table technique, the complex index-mapping procedure was simplified such 
that input data samples for the multiple processors were obtained without requiring any 
index calculations. The difficulty then lay in the multiport memory, the bandwidth of which 
needs to be quite high. With careful hardware design, however, this problem may be 
resolved, as has been shown in the design example. 
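
The idea of a table-driven index mapping can be sketched in Python. The mapping actually used in this work is described in Chapter 5 and is not reproduced here. The sketch substitutes a plain decimation-in-time split, in which processor p receives input samples p, p+P, p+2P, ..., and the names build_index_table and distributed_dft are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT, standing in for whatever algorithm each processor runs."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def build_index_table(n, processors):
    """Precompute once which input samples each processor must read."""
    if n % processors:
        raise ValueError("DFT size must be a multiple of the processor count")
    return [list(range(p, n, processors)) for p in range(processors)]


def distributed_dft(x, table):
    """N-point DFT assembled from one short DFT per (simulated) processor."""
    n = len(x)
    sub_len = n // len(table)
    # Each processor fetches its samples through the table, so no index
    # arithmetic is needed while the data are being distributed.
    partial = [dft([x[i] for i in rows]) for rows in table]
    # Recombination: X[k] = sum over p of W_N^(p*k) * X_p[k mod (N/P)]
    return [sum(cmath.exp(-2j * cmath.pi * p * k / n) * partial[p][k % sub_len]
                for p in range(len(table)))
            for k in range(n)]


table = build_index_table(16, 4)   # 16-point DFT shared by 4 processors
x = [complex(i % 5, -i) for i in range(16)]
error = max(abs(a - b) for a, b in zip(distributed_dft(x, table), dft(x)))
print(table[1], error < 1e-9)      # [1, 5, 9, 13] True
```

In a real multiprocessor system the table would reside in memory alongside the data, and only the recombination step would require communication between the processors.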
FIR filtering operations were also considered, and techniques for parallel 
implementation were examined with respect to their ease of reconfiguration. Special attention was 
given to the interconnection between the FIR filtering and the DFT processes, as required in 
the analysis/synthesis filter bank. It was found that the parallel FIR structures can be 
interfaced easily with the DFT structures using the appropriate processor 
configurations. 
As an example, the problem of implementing the two different Flexible TMUX 
methods was considered specifically for the TMS320C25 DSP microprocessor, and an 
approach to the design of an interface for connecting four processors to a global memory 
was found. In addition, techniques for the efficient implementation of the FIR and DFT 
processes on the TMS320 processor were examined and software written to demonstrate 
their effectiveness. The design examples highlighted two main problems for the 
implementation of a Flexible TMUX, namely the high computation requirement and the complexity of 
reconfiguration. The proposed software and hardware techniques do, however, provide 
effective means of alleviating these problems. The techniques are not limited in their 
application to the TMS320 processors only and, as would be more likely in an OBP satellite 
environment, more efficient special-purpose processors may be used to carry out the 
relatively simple tasks required of the TMS320 processors under the control of a more 
general-purpose hub processor. 
7.1.1.4 OBP System 
The T-SAT OBP experimental model demonstrated, to a limited extent, the capability 
of OBP satellites using digital baseband processing techniques. While the signal processing 
functions have traditionally received much attention, it was shown that the efficient 
implementation of the OBP depended as much on the design of other supporting 
components in the OBP and the means of data transfer and synchronization between the 
elements, which had to be designed to suit the specific data I/O requirement in the system. 
The function of the Formatter was found to be essential for the implementation of a data 
transfer and synchronization protocol within the OBP subsystem. The real-time nature of 
the uplink and downlink data is then effectively buffered from the OBP, so that the OBP 
may be more effective in the other system functions. The ways in which the OBP subsystem 
could be effectively implemented in the experimental model also helped to substantiate the 
argument for the general OBP architecture proposed for T-SAT. The scaling of the 
experimental model to a larger practical system would require the data transfer and control 
protocol to be reconsidered, as the data rate becomes higher. Apart from such 
physical considerations, however, the OBP architecture and the associated control methodology 
should still be generally applicable. 
7.1.2 Difficulties in the Proposed Techniques 
7.1.2.1 Computation Requirement 
The greatest difficulty in the realization of the Flexible TMUX methods remains the 
high computation requirement for practical system specifications, despite the large 
improvement provided by the techniques considered in this investigation. For less 
demanding system specifications such as those considered in the design examples, the algorithms can be 
readily implemented but the amount of hardware was still shown to be of a high order. 
7.1.2.2 Realization of Flexible TMUX 
The realization of a procedure for the reconfiguration of the Flexible TMUX was 
found to be rather complex, especially if a multiprocessor implementation was used. It 
would require a large amount of information specific to different algorithms to be stored and 
transferred between processors in the event of a reconfiguration. While much work has 
been carried out on the design of the algorithms and hardware structure to facilitate such 
a mechanism, the precise details would be specific to the particular system requirement and 
consideration would be required on a case-by-case basis. The more flexible the system is 
required to be, the more complex and time-consuming this reconfiguration procedure 
becomes. In the same way as for the consideration of computational requirement and 
performance of TMUX structures, the complexity of the reconfiguration procedure needs to be 
considered as part of the tradeoff in the design of a Flexible TMUX. 
7.2 Suggestions for Future Work 
(1). This project has concentrated its effort on the optimization of the digital TMUX with 
respect to the signal processing algorithms. To minimize hardware complexity, the 
implementation of the algorithms may be further optimized using algebraic techniques on the 
algorithms to result in operations which are more amenable to hardware implementation. 
These include the methods of multiplierless FIR filters and DFT processors and the use of 
the residue number system for look-up table implementation of the arithmetic operations. 
(2). The availability of semi-custom VLSI devices offers another possibility for the imple-
mentation of the TMUX algorithms. These devices may incorporate special signal process-
ing hardware such as multiplier-accumulators or a DSP core making them more efficient in 
terms of component counts and hence hardware complexity. The efficient mapping of the 
algorithms to such devices needs further investigation as to which type of devices are more 
suitable and how the hardware capability may be optimally utilized. 
(3). The TMUX algorithms were investigated separately in this project, such that 
consideration has not been given to the application of the algorithms in conjunction. Intuitively, 
combining them would be computationally efficient for certain mixes of channel bandwidths 
and frequency allocations. A general approach towards such overall optimization of the 
algorithms is therefore needed to provide a systematic way of determining the 
combinations of algorithms. 
(4). An experimental model of the multiprocessor architecture described in Chapter 5 for a 
Flexible TMUX needs to be realized. Such a model would be helpful to the investigations 
of the Flexible TMUX in two ways. Firstly it would demonstrate the viability of the 
multiprocessing implementation of the TMUX, and secondly it would offer insight into the 
complexity of the reconfiguration procedure in such a system. The T-SAT experimental 
model may serve as the backbone of the system to which the Flexible TMUX can be 
interfaced. The interfacing techniques learnt in the T-SAT system would be applicable to the 
interfacing of the Flexible TMUX to the OBP, allowing experimentation to be performed 
without requiring much additional hardware. 
References 
1. Manassah, J. T., Lightwave communication: innovations in telecommunications, Pt. A, 
Academic Press, New York, 1982. 
2. Kuecken, J. A., Fibre optics, a revolution in communication, TAB, 1987. 
3. Horstein, M., "Land-mobile communications satellite system design," Proc. AIAA 
10th Communication Satellite Systems Conference, Orlando, pp. 467-475, 1984. 
4. Beddoes, T., "Roaming in the pan-European cellular system," Telecommunications, 
vol. 22, pp. 38-46, Sept. 1988. 
5. Tirro, S., "Possibilities offered by digital satellite communications for national 
communications in European country," 5th Int. Conf. Digital Satellite Commun., Genoa, 
pp. 217-225, 1981. 
6. Ghais, A. F., "Future development of the INMARSAT system," Proc. AIAA 10th 
Communication Satellite Systems Conf., Orlando, pp. 440-449, 1984. 
7. Stevenson, S. M. and Provencher, C. E., "Rural land mobile radio market assessment; 
satellite and terrestrial concepts," Proc. AIAA 10th Communication Satellite Systems 
Conf., Orlando, pp. 595-604, 1984. 
8. Pfund, F. T., "Regional satellite systems for the late 1980's," Proc. AIAA 10th 
Communication Satellite Systems Conf., Orlando, pp. 226-234, 1984. 
9. Bartholome, P., "Satellites: tomorrow's solution to tomorrow's needs," Proc. 3rd 
Tirrenia Int. Workshop Digital Communications, pp. 3-7, 1987. 
10. Bargellini, P. L. and Hyde, G., "Electrical communications systems approaching the 
year 2000: cables and satellites," Proc. 3rd Tirrenia Int. Workshop Digital 
Communications, pp. 11-21, 1987. 
11. Kiesling, J. D., "Mobile satellites - the possibility of universal service," Proc. ICC 86, 
Toronto, p. 1384, 1986. 
12. Reudink, D. O. and Yeh, Y. S., "A scanning spotbeam satellite system," Bell Systems 
Technical Journal, vol. 56, pp. 1549-1560, Oct. 1977. 
13. Anzic, G., "Microwave monolithic integrated circuit development for future 
spaceborne phased array antennas," Proc. AIAA 10th Communication Satellite Systems 
Conf., Orlando, pp. 43-53, 1984. 
14. Robertson, I., "L-band 20 watts GaAs FET amplifier for on-board digital mobile 
satellite system," Proc. ICC 88, Philadelphia, pp. 16.8.1-16.8.5, 1988. 
15. Constantinos, C., "Frequency reuse in 3rd generation maritime/aeronautical systems," 
MSc. thesis, University of Surrey, U.K., Oct. 1985. 
16. Wu, W. W., Elements of digital satellite communication, 2, Computer Science Press, 
1984. 
17. Guenin, J. P., "Towards the integrated services digital network: TELECOM 1," Proc. 
5th Int. Conf. Digital Satellite Commun., Genoa, pp. 209-215, 1981. 
18. Gatfield, A. G., "Satellite links in integrated services digital network," Proc. 5th 
Int. Conf. Digital Satellite Commun., Genoa, pp. 235-239, 1981. 
19. Casas, J. M. and Bartholome, P., "Some aspects of the implementation of an 
integrated space/terrestrial network for Europe," Proc. 3rd Tirrenia Int. Workshop 
Digital Communications, pp. 23-33, 1987. 
20. Pennoni, G. and Bella, L., "Integration of a payload enhanced networking satellite 
(PENSAT) and terrestrial broadband ISDN," Proc. 3rd Tirrenia Int. Workshop 
Digital Communications, pp. 47-55, 1987. 
21. Casewell, I. E., Ferebee, I. C., and Tomlinson, M., "A satellite paging system for land 
mobile users," Proc. 4th Int. Conf. Satellite Systems Commun. Navigation, pp. 57-62, 
London, 1988. 
22. McClure, I. J., "B.T. experiments in radio paging via satellite," Proc. 4th Int. Conf. 
Satellite Systems Commun. Navigation, pp. 63-67, 1988. 
23. Schoenenberger, J. G. and McKinlay, R. A., "An airline passenger telephone system - 
design, development and early trials," Proc. 4th Int. Conf. Satellite Systems Commun. 
Navigation, pp. 97-101, 1988. 
24. Pennoni, G., "A TST/SS-TDMA telecommunication system: from cable to 
switchboard in the sky," ESA Journal, vol. 8, 1984. 
25. Bella, L., "Satellite switching for mobile communications: new issues and 
perspectives," Proc. 3rd Tirrenia Int. Workshop on Digital Communications, pp. 65-72, 1987. 
26. Kiesling, J. D., "Direct access satellite communications using SS-FDMA," Proc. AIAA 
8th Communication Satellite Systems Conf., Orlando, pp. 627-633, 1980. 
27. Apple, J., "An onboard baseband switch matrix for SS-TDMA systems," Proc. 5th 
Int. Conf. Digital Satellite Communications, Genoa, pp. 429-434, 1981. 
28. Shinonaga, H. and Ito, Y., "SS/FDMA system for digital transmission," Proc. 7th Int. 
Conf. Digital Satellite Communications, Munich, 1986. 
29. DeRosa, J. K., Ozarow, L. H., and Weiner, L. W., "Efficient packet satellite 
communications," IEEE Trans. Communications Technology, vol. COM-27, pp. 1416-1422, Oct. 
1979. 
30. Evans, B. G., "Towards the intelligent bird," Int. J. Satellite Communications, pp. 
203-215, July 1985. 
31. Nuspl, P., Peters, R., Abdel-Nabi, T., and Mathews, N., "On-board processing for 
communication satellites: system and benefits," Int. J. Satellite Commun., vol. 5, pp. 
65-76, 1987. 
32. Koga, K., Muratani, T., and Ogawa, A., "On-board regenerative repeaters applied to 
satellite communication," Proc. IEEE, vol. 65, Mar. 1977. 
33. Perillan, L. B. and Rowbowtham, T. R., "INTELSAT VI: SS-TDMA system definition 
and technology development," Proc. 5th Int. Conf. Digital Satellite Communication, 
Genoa, p. 411, 1981. 
34. Marcaricchio, F. and Arioli, B., "The Italsat programme," Int. J. Satellite 
Communication, July 1983. 
35. Mathews, N., "Performance evaluation of regenerative digital satellite links with FEC 
codecs," Proc. 3rd Tirrenia Int. Workshop Digital Commun., pp. 213-223, Sept. 1987. 
36. "Study of feasibility of an on-board processor concept for mobile communications by 
satellite," ESA Contract 6050/84/NL/JX, Final Report, Dec. 1987. 
37. "Study of systems and repeaters for future narrowband communication satellite," 
ESA ESTEC contract no. 5484/83/NL/GM(SC), Phase II final report, 1985. 
38. Scott, P. and Craig, A., "A statistical analysis of the traffic capacity of a multibeam 
multidestination satellite," ESA Journal, vol. 4, 1980. 
39. Barnes, M. H. D. and Talbot, A. C., "The ARAMIS payload," Proc. 4th Int. Conf. 
Satellite Systems Mobile Commun. Navigation, pp. 244-248, 1988. 
40. Maral, Satellite Communication System Engineering, J. Wiley, 1987. 
41. Prasanna, S., Pontano, B., Dicks, J. L., and Koh, E. K., "INTELSAT VI 120 Mb/s 
TDMA transmission design," Proc. 5th Int. Conf. Digital Satellite Communication, 
Genoa, 1981. 
42. Grabner, J. C. and Cashman, W. F., "Advanced communications technology satellite: 
systems description," Proc. IEEE Globecom 1986, Houston, pp. 559-567, 1986. 
43. Carter, D. R., "Survey of synchronisation technique for a TDMA satellite-switched 
system," IEEE Trans. Communication Technology, Aug. 1980. 
44. El-Amin, M., "Protocols for mobile satellite communication," Ph.D. thesis, 
University of Surrey, 1985. 
45. Edelson, B., "SPADE system progress and application," COMSAT Tech. Review, vol. 2, 
Spring 1972. 
46. Aghvami, A. H., Clarke, A., Evans, B. G., Farrell, P. G., Gardiner, J. G., Norbury, J. 
R., and Vilar, E., "Land mobile satellites using highly elliptic orbits - the UK T-SAT 
mobile payload," Proc. 4th Int. Conf. Satellite Systems Mobile Communications and 
Navigation, pp. 147-153, 1988. 
47. Wong, C. W., "Baseband switch and device technology for on-board processing 
satellites," Ph.D. thesis, University of Surrey, 1988. 
48. Kato, S., Arita, T., and Morita, K., "Onboard digital signal processing technologies for 
present and future TDMA and SCPC systems," IEEE J. Selected Areas Commun., vol. 
SAC-5, pp. 685-700, May 1987. 
49. Kohri, T., Morikura, M., and Kato, S., "A 400ch SCPC signal demodulator using chirp 
transform and correlation detection scheme," Proc. Globecom 87, Tokyo, pp. 286-291, 
1987. 
50. Evans, B. G., Coakley, F. P., El-Amin, M. H. M., Lu, S. C., and Wong, C. W., 
"Baseband switches and transmultiplexers for use in an on-board processing 
mobile/business satellite system," IEE Proc. Pt.F, vol. 133, pp. 356-363, July 1986. 
51. Inukai. T .• "An efficient SS-TDMA time slot assignment algorithm." IEEE Trans. 
Communications Technology. vol. COM-27. pp. 1449-1455, Oct. 1979. 
52. Bongiovanni. D., Tang. D .. and Wong. C .• "A general multibeam satellite switching 
algorithm:' IEEE Trans. Communications Technology, vol. COM-29. July 1981. 
53. Kuo, F. F. ed., Prdocols and techniques for data communication networks, Prentice-
Hall, Englewood Cliffs, 1981. 
54. Takahata, F., Shinonaga, H., and Ohkawa, M. , "Recent developments of on-board 
processing technologies in Japan," Proc. 3rd Tirrenia Int. Workshop Digital Commun-
ications, pp. 235-242, 1987. 
55. Ananasso, F., Deacon, J. M., Priscoli, F. D., and Green, R. C., "SS FDMA
channelization and routing in advanced multibeam mobile satellite systems," Proc. 4th Int.
Conf. Satellite Systems Mobile Communications and Navigation, pp. 259-263, 1988.
56. Naderi, F. M., "ACTS: the first step toward a switchboard in the sky," Proc. 3rd
Tirrenia Int. Workshop Digital Commun., pp. 225-233, 1987.
57. D'Ambrosio, A. and Alletto, G., "The ITALSAT QPSK burst mode coherent
demodulator," Proc. 3rd Tirrenia Workshop Digital Commun., pp. 253-260, 1987.
58. Ishizu, T., Kazekami, Y., and Sawada, H., "A design approach for on-board modem,"
Proc. ICC 86, Toronto, p. 1814, 1986.
59. Benedicto, F. J., "Elements for a year 1995 mobile communications satellite," Proc.
4th Int. Conf. Satellite Systems Mobile Communications and Navigation, pp. 5-9,
1988.
60. Ananasso, F. and Re, E. Del, "Techniques and technologies for multicarrier
demodulation in FDMA/TDM satellite systems," Proc. 3rd Tirrenia Int. Workshop Digital
Communications, pp. 243-251, 1987.
61. Bakken, P. M., Ringset, V., Ronnekleiv, A., and Olsen, E., "Multicarrier demodulator
(MCD) using analog and digital signal processing," Proc. 3rd Tirrenia Int. Workshop
Digital Communications, pp. 187-196, 1987.
62. Ananasso, F. and Saggese, E., "A survey on the technology of multicarrier
demodulators for FDMA/TDM user-oriented satellite systems," Proc. Globecom 1985, pp.
6.1.1-6.1.7, 1985.
63. Ananasso, F. and DeSantis, P., "On-board technologies for user oriented SS-FDMA
satellite systems," Proc. ICC 87, Seattle, p. 8.2, 1987.
64. Gardner, F. M., "Multipath link and modem design for satellite-mobile
communications," ESTEC Contract 5146/82/NL/GM(SC) Final Report, Gardner Research
Company, Palo Alto, Aug. 1983.
65. Gardner, F. M., "On-board processing for mobile-satellite communications," ESA
tech. rep., ESTEC contract no. 5889/84/NL/GM, May 1985.
66. Kato, S., Morikawa, M., and Umehira, M., "General purpose TDMA LSI development
for low cost earth station," Proc. IEEE ICC 86, Toronto, pp. 16.6.1-16.6.6, June 1986.
67. Izumisawa, T., Kato, S., and Kohri, T., "Regenerative SCPC satellite communications
systems," Proc. AIAA 10th Communication Satellite Systems Conf., Orlando, pp. 269-275,
1984.
68. Ohtani, K. and Kato, S., "An onboard digital demodulator for regenerative SCPC
satellite communication systems," Conf. Rec. ICC 86, pp. 1803-1808, Toronto, June
1986.
69. Gockler, H., "A highly efficient multistage approach to digital FDM demultiplexing
for mobile SCPC satellite communications," Proc. 3rd Tirrenia Int. Workshop Digital
Communications, pp. 179-186, 1987.
70. "Multicarrier demodulator design," ESA ESTEC contract 6096/84/NL/GM(SC), Dec.
1986.
71. Takahata, F., Yasunaga, M., Hirata, Y., Ohsawa, T., and Namiki, J., "A PSK group
modem for satellite communications," IEEE J. Selected Areas Commun., vol. SAC-5,
pp. 648-661, May 1987.
72. Yim, W. H., Kwan, C. C. D., Coakley, F. P., and Evans, B. G., "Comparison of digital
transmultiplexer architectures for use in on-board processing satellites," Proc. 3rd
Tirrenia Int. Workshop Digital Communications, pp. 279-286, 1987.
73. Re, E. Del and Fantacci, R., "Multicarrier demodulator with integrated MAP
synchronization and ML demodulation for advanced satellite digital communication
systems," Globecom 87 Proc., pp. 913-917, 1987.
74. Alberty, T., Bjornstrom, G., Eyssele, H., Gockler, H., and Hespelt, V., "Digital
on-board multicarrier demodulator for mobile satellite communications," Proc. ICC 88,
Philadelphia, vol. 1, pp. 16.5.1-16.5.5, 1988.
75. Yim, W. H., Kwan, C. C. D., Coakley, F. P., and Evans, B. G., "Multi-carrier
demodulator for the on-board processing T-SAT land mobile payload," Proc. 4th Int. Conf.
Satellite Systems Mobile Communications and Navigation, pp. 254-258, 1988.
76. TMS320C30 Users Guide, Texas Instruments, 1988.
77. Ashton, C. J., "Archimedes - land mobile communications from highly inclined
satellite orbits," Proc. 4th Int. Conf. Satellite Systems Commun. Navigation, pp. 133-137,
London, Oct. 1988.
78. Casewell, I. E., Evans, B. G., and Craig, A. D., "An on-board processing satellite
payload for a European land mobile satellite system," Proc. 3rd Tirrenia Int. Workshop
Digital Communications, pp. 171-178, 1987.
79. Evans, B. G., El-Amin, M. H. M., Casewell, I. E., and Craig, A. D., "An on-board
processing satellite payload for European mobile communications," Proc. ICDSC-7, pp.
545-550, Munich, May 1986.
80. Feher, K., Digital communications, satellite/earth station engineering, Prentice-Hall, 
Englewood Cliffs, 1983. 
81. Bhargava, V. K., Haccoun, D., Matyas, R., and Nuspl, P. P., Digital communications by 
satellite, John Wiley & Sons, 1981. 
82. Scheuermann, H. and Gockler, H., "A comprehensive survey of digital
transmultiplexing methods," Proc. IEEE, vol. 69, pp. 1419-1450, Nov. 1981.
83. Bellanger, M. G., "On computational complexity in digital transmultiplexer filters," 
IEEE Trans. Commun., vol. COM-30, pp. 1461-1465, July 1982. 
84. Bellanger, M. G. and Daguet, J. L., "TDM-FDM transmultiplexer: digital polyphase 
and FFT," IEEE Trans. Commun., vol. COM-22, pp. 1199-1205, Sep 1974. 
85. Bellanger, M. G., Bonnerot, G., and Coudreuse, M., "Digital filtering by polyphase 
network: application to sample-rate alteration and filter banks," IEEE Trans. Acoust., 
Speech, Signal Processing, vol. ASSP-24, pp. 109-114, Apr 1976. 
86. Bonnerot, G., Coudreuse, M., and Bellanger, M. G., "Digital processing techniques in 
the 60-channel transmultiplexer," IEEE Trans. Commun., vol. COM-26, pp. 698-706, 
May 1978. 
87. Narasimha, M. J., "Design of FIR filter banks for a 24-channel transmultiplexer," 
IEEE Trans. Commun., vol. COM-30, pp. 1506-1510, July, 1982. 
88. Ansari, R. and Liu, B., "Transmultiplexer design using all-pass filters," IEEE Trans. 
Commun., vol. COM-30, pp. 1569-1574, July, 1982. 
89. Takahata, F., Hirata, Y., Ogawa, A., and Inagaki, K., "Development of a TDM/FDM
transmultiplexer," IEEE Trans. Commun., vol. COM-26, pp. 726-733, May 1978.
90. Takahata, F., Inagaki, K., Hirata, Y., and Ogawa, A., "A digital 60 channel
transmultiplexer: algorithm minimizing multiplication rate and hardware implementation,"
IEEE Trans. Commun., vol. COM-30, pp. 1511-1519, July 1982.
91. Vityazev, V. V. and Stepashkin, A. I., "Synthesis of digital filter-demodulators by a
double fast-Fourier transform," Telecommun. & Radio Eng. (USA), Pt. 1, vol. 36, pp.
40-43, 1982.
92. Loeffler, C. M. and Burrus, C. S., "Optimal design of periodically time-varying and
multirate digital filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32,
pp. 991-997, Oct. 1984.
93. Drews, W. and Gazsi, L., "A new design method for polyphase filters using all-pass
sections," IEEE Trans. Circuits Systs., vol. CAS-33, pp. 346-348, Mar. 1986.
94. Critchley, J. and Rayner, P. J. W., "Design methods for periodically time varying
digital filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 661-673,
May 1988.
95. Foldvari-Orosz, J., Henk, T., and Simonyi, E., "A direct approximation of the
polyphase filter banks," Proc. 1986 Int. Symp. Circuits Syst., vol. 3, pp. 1101-1104,
May 1986.
96. Bellanger, M. G., Daguet, J. L., and Lepagnol, G. P., "Interpolation, extrapolation, and
reduction of computation speed in digital filters," IEEE Trans. Acoust., Speech, Signal
Processing, vol. ASSP-22, pp. 231-235, Aug. 1974.
97. Tsuda, T., Morita, S., and Fujii, Y., "Digital TDM-FDM translator with multistage
structure," IEEE Trans. Commun., vol. COM-26, pp. 734-741, May 1978.
98. Molo, F., "Transmultiplexer realization with multistage filtering," IEEE Trans.
Commun., vol. COM-30, pp. 1614-1622, July 1982.
99. Nelson, G. A., Pfeifer, L. L., and Wood, R. C., "High-speed octave band digital
filtering," IEEE Trans. Audio, Electroacoustics, vol. AU-20, pp. 58-65, Mar. 1972.
100. Re, E. Del and Emiliani, P. L., "An analytic signal approach for transmultiplexers:
theory and design," IEEE Trans. Commun., vol. COM-30, pp. 1623-1628, July 1982.
101. Constantinides, A. G. and Valenzuela, R. A., "An efficient and modular
transmultiplexer design," IEEE Trans. Commun., vol. COM-30, pp. 1629-1641, July 1982.
102. Lim, Y. C. and Ko, C. C., "Synthesis of digital filter bank with narrow transition
width," Proc. 1986 Int. Symp. Circuits Syst., vol. 2, pp. 655-656, May 1986.
103. Neuvo, Y., Rajan, G., and Mitra, S. K., "Design of narrow-band FIR bandpass digital
filters with reduced arithmetic complexity," IEEE Trans. Circuits Syst., vol. CAS-34,
pp. 409-419, April 1987.
104. Crochiere, R. E. and Rabiner, L. R., "Optimum FIR digital filter implementations for
decimation, interpolation, and narrow-band filtering," IEEE Trans. Acoust.,
Speech, Signal Processing, vol. ASSP-23, pp. 444-456, Oct. 1975.
105. Crochiere, R. E. and Rabiner, L. R., "Further considerations in the design of
decimators and interpolators," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24,
pp. 296-311, Aug. 1976.
106. Crochiere, R. E. and Rabiner, L. R., Multirate digital signal processing, Prentice-Hall,
Englewood Cliffs, 1983.
107. Oppenheim, A. V. and Schafer, R. W., Digital signal processing, Prentice-Hall, 1975.
108. Hirata, Y., "SSB-FDM using complementary comb filters," Elect. Lett., vol. 17, pp.
614-615, 20th Aug. 1981.
109. Hogenauer, E. B., "An economical class of digital filters for decimation and
interpolation," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 155-162,
Apr. 1981.
110. Bellanger, M. G., Digital Processing of Signals, J. Wiley, 1988.
111. Jeruchim, M. C., "Techniques for estimating the bit error rate in the simulation of
digital communication systems," IEEE Journal on Selected Areas in Comm., vol.
SAC-2, pp. 153-170, Jan. 1984.
112. Estola, K.-P., "Design of computationally efficient FIR filters for sampling rate
alteration and multiband filtering with arbitrary pass bands and time response,"
Proc. ICASSP 86, Tokyo, pp. 2575-2578, 1986.
113. Medlin, G. W., Adams, J. W., and Leondes, C. T., "Optimal coefficients for
FDM-TDM transmultiplexers," Proc. ICC '86, pp. 960-964, 1986.
114. Medlin, G. W., "Optimal communications filters for multirate applications," Proc. 8th
European Conf. Electrotechnics, Stockholm, Sweden, pp. 138-141, 13-17 June, 1988.
115. Rabiner, L. R. and Gold, B., Theory and application of digital signal processing,
Prentice-Hall, Englewood Cliffs, 1975.
116. Gersho, A., Gopinath, B., and Odlyzko, A. M., "Coefficient inaccuracy in transversal
filtering," Bell System Tech. J., pp. 2301-2316, Dec. 1979.
117. Jackson, L. B., "On the interaction of roundoff noise and dynamic range in digital
filters," Bell Syst. Tech. J., vol. 49, pp. 159-184, Feb. 1970.
118. Jackson, L. B., "Roundoff noise analysis for fixed point digital filters realized in cas-
cade or parallel form," IEEE Trans. Audio Electroacoustics, vol. AU-18, pp. 107-122, 
June 1970. 
119. TOPSIM Users Manual, Politecnico di Milano, 1985.
120. Goodman, D. J. and Carey, M. J., "Nine digital filters for decimation and
interpolation," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 121-126,
Apr. 1977.
121. Alaria, G. B., Colombo, G., and Pennoni, G., "T-S-T SS/TDMA system for services of
different bandwidths," Proc. Third Int. Workshop on Digital Communications,
Tirrenia, Italy, 14-16 Sept., pp. 261-270, North Holland, 1987.
122. Campanella, S. J. and Sayegh, S., "A flexible on-board demultiplexer/demodulator,"
Proc. AIAA 88, pp. 299-303, Apr. 1988.
123. Gambardella, G., "A contribution to the theory of short-time spectral analysis with
nonuniform bandwidth filters," IEEE Trans. Circuit Theory, vol. CT-18, pp. 455-460,
July 1971.
124. Guarino, C. R., "A new method of spectral analysis and sample rate reduction for
band-limited signals," Proc. IEEE, vol. 69, pp. 1161-1163, Sept. 1981.
125. Jeren, B., "A method of nonuniform filter bank implementation using DFT," Proc.
1985 European Conf. Circuit Theory & Design, Prague, 2-6 Sept. 1985, pp. 533-536,
1985.
126. Kaiser, J. F., "Nonrecursive digital filter design using the I0-sinh window function,"
Proc. 1974 IEEE Int. Symp. on Circuits and Systems, pp. 20-23, 1974.
127. Rivers, D. D. and Rosen, R. A., "Efficient formation of filter banks with frequency 
dependent resolution," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1982, 
pp. 311-314, 1982. 
128. Sudhakar, R., Agarwal, R. C., and Roy, S. C. Dutta, "Fast computation of Fourier
transform at arbitrary frequencies," IEEE Trans. Circuits, Systems, vol. CAS-28, pp.
972-980, Oct. 1981.
129. Sreenivas, T. V. and Rao, P. V. S., "High-resolution narrow-band spectra by FFT
pruning," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 254-257,
April 1980.
130. Vetterli, M., "A theory for multirate filter banks," IEEE Trans. Acoust., Speech,
Signal Processing, vol. ASSP-35, pp. 356-372, Mar. 1987.
131. Vaidyanathan, P. P., "Quadrature mirror filter banks, M-band extensions and perfect
reconstruction techniques," IEEE ASSP Magazine, pp. 4-19, July 1987.
132. Vaidyanathan, P. P., "Theory and design of M-channel maximally decimated
quadrature mirror filters with arbitrary M, having the perfect-reconstruction property,"
IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 476-492, April
1987.
133. Paulus, E., "A fast convolution procedure for discrete short-time spectral analysis
with frequency-dependent resolution," IEEE Trans. Acoust., Speech, Signal
Processing, vol. ASSP-32, pp. 1100-1104, Oct. 1984.
134. Hung, E. K.-L., "A multiresolution sampled-data spectrum analyzer for a detection
system," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 163-170,
April 1981.
135. Chadwick, V. J. and Bray, P. T., "The modified Hung method of multiresolution
frequency analysis," Signal Processing, vol. 14, pp. 25-35, North-Holland, Jan. 1988.
136. Valenzuela, R. A., "Techniques for transmultiplexer design," Ph.D. thesis, University
of London, June 1982.
137. Beacken, M., "Efficient implementation of highly variable bandwidth filter banks
with highly decimated output channels," Proc. ICASSP 84, San Diego, pp.
11.4.1-11.4.3, Mar. 1984.
138. Blahut, R. E., Fast algorithms for digital signal processing, Addison-Wesley, 1985.
139. Singleton, R. C., "An algorithm for computing the mixed radix fast Fourier
transform," IEEE Trans. Audio Electroacoust., vol. AU-17, pp. 93-103, June 1969.
140. Rabiner, L. R., "Linear program design of finite impulse response (FIR) digital
filters," IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 280-288, Oct. 1972.
141. Pelkowitz, L., "Frequency domain analysis of wraparound error in fast convolution
algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp.
413-422, June 1981.
142. Parks, T. W. and McClellan, J. H., "Chebyshev approximation for nonrecursive
digital filters with linear phase," IEEE Trans. Circuit Theory, vol. CT-19, pp. 189-194,
Mar. 1972.
143. Kuester, J. L. and Mize, J. H., Optimization techniques with Fortran, McGraw-Hill,
1973.
144. Samueli, H., "On the design of optimal equiripple FIR digital filters for data
transmission applications," IEEE Trans. Circuits Systems, vol. 35, pp. 1542-1546,
Dec. 1988.
145. Helms, H. D., "Fast Fourier Transform method of computing difference equations and
simulating filters," IEEE Trans. Audio, Electroacoustics, vol. AU-15, pp. 85-90, June
1967.
146. Sreenivas, T. V. and Rao, P. V. S., "FFT algorithm for both input and output
pruning," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 291-292, June
1979.
147. Croisier, A., Esteban, D., and Galand, C., "Perfect channel splitting by use of
interpolation/decimation/tree decomposition techniques," Int. Conf. Inf. Sci. Syst.,
Patras, Aug. 1976.
148. Smith, M. J. T. and Barnwell, T. P., "Exact reconstruction techniques for
tree-structured subband coders," IEEE Trans. Acoust., Speech, Signal Processing, vol.
ASSP-34, pp. 434-441, June 1986.
149. Nussbaumer, H. J. and Vetterli, M., "Computationally efficient QMF filter banks,"
Proc. ICASSP 84, San Diego, pp. 11.3.1-11.3.4, Mar. 1984.
150. Masson, J. and Picel, Z., "Flexible design of computationally efficient nearly perfect
QMF filter banks," Proc. ICASSP 85, Tampa, USA, pp. 541-544, Mar. 1985.
151. Rabiner, L. R., McClellan, J. H., and Parks, T. W., "FIR digital filter design techniques
using weighted Chebyshev approximation," Proc. IEEE, vol. 63, pp. 595-610,
Apr. 1975.
152. Nussbaumer, H. J., Fast Fourier transform and convolution algorithms, 2nd ed.,
Springer-Verlag, Berlin, 1982.
153. Nussbaumer, H. J., "Polynomial transform implementation of digital filter banks,"
IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 616-622, June
1983.
154. Lee, E. A., "Programmable DSP architectures: Part II," IEEE ASSP Magazine, pp.
4-14, Jan. 1989.
155. Elliott, D. F. and Rao, R., Fast transforms: algorithms, analyses, and applications,
Academic Press, New York, 1983.
156. Tseng, B.-D., Jullien, G. A., and Miller, W. C., "Implementation of FFT structures
using the residue number system," IEEE Trans. Computers, vol. C-28, pp. 831-845,
Nov. 1979.
157. Huang, C. H., Peterson, D. G., Rauch, H. E., Teague, J. W., and Fraser, D. F.,
"Implementation of a fast digital processor using the residue number system," IEEE Trans.
Circuits Syst., vol. CAS-28, pp. 32-38, Jan. 1981.
158. Liu, B. and Peled, A., "A new hardware realization of high-speed fast Fourier
transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp.
543-547, Dec. 1975.
159. Essig, D., Erskine, C., Caudel, E., and Magar, S., "A second generation digital signal
processor," IEEE Trans. Circuits, Systs.
160. Lee, E. A., "Programmable DSP architectures: Part I," IEEE ASSP Magazine, pp.
4-19, Oct. 1988.
161. Groginsky, H. L. and Works, G. Q., "A pipeline fast Fourier transform," IEEE Trans.
Computers, vol. C-19, 1970.
162. Gold, B. and Bially, T., "Parallelism in fast Fourier transform hardware," IEEE
Trans. Electroacoustics, vol. AU-21, 1973.
163. Modi, J. J., Parallel algorithms and matrix computation, Oxford University Press,
1988.
164. Johnsson, S. L., Ho, C.-T., Jacquemin, M., and Ruttenberg, A., "Computing Fast
Fourier Transforms on boolean cubes and related networks," Advanced Algorithms
and Architectures for Signal Processing II, vol. SPIE 826, pp. 223-231, 1987.
165. Mueller, P. T., Siegel, L. J., and Siegel, H. T., "Parallel algorithms for the
two-dimensional FFT," Proc. 5th Int. Conf. Pattern Recog. and Image Proc., pp. 497-502,
Dec. 1980.
166. Duhamel, P. and Hollmann, H., "Split-radix FFT algorithm," Electron. Lett., vol. 20,
pp. 14-16, Jan. 5, 1984.
167. Vetterli, M. and Duhamel, P., "Split-radix algorithms for length-p^m DFT's," IEEE
Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 57-64, Jan. 1989.
168. Richards, M. A., "On hardware implementation of the split-radix FFT," IEEE Trans.
Acoust., Speech, Signal Processing, vol. 36, pp. 1575-1581, Oct. 1988.
169. Tortoli, P. and Andreuccetti, F., "A high-speed FFT unit based on a low cost digital
signal processor," IEEE Trans. Circuits Systems, vol. 35, pp. 1434-1438, Nov. 1988.
170. Good, I. J., "The relationship between two fast Fourier transforms," IEEE Trans.
Comp., vol. C-20, pp. 310-317, 1971.
171. McClellan, J. H. and Rader, C. M., Number theory in digital signal processing,
Prentice-Hall, Englewood Cliffs, 1979.
172. Winograd, S., "On computing the discrete Fourier transform," Mathematics of
Computation, vol. 32, pp. 175-199, Jan. 1978.
173. Rader, C. M., "Discrete Fourier transforms when the number of data samples is
prime," Proc. IEEE, vol. 56, pp. 1107-1108, June 1968.
174. Hwang, K. and Briggs, F. A., Computer architecture and parallel processing,
McGraw-Hill, 1985.
175. Leung, S. H., "Application of residue number system to complex digital filters,"
Proc. 15th Asilomar Conf. Circuits, Systems and Computers, Pacific Grove, CA, pp.
70-74, Nov. 1981.
176. Jenkins, W. K. and Leon, B. J., "The use of residue number systems in the design of
finite impulse response digital filters," IEEE Trans. Circuits Syst., vol. CAS-24, pp.
191-200, April 1977.
177. Hayashi, K., Dhar, K. K., Sugahara, K., and Hirano, K., "Design of high-speed digital
filters suitable for multi-DSP implementation," IEEE Trans. Circuits, Systems, vol.
CAS-33, pp. 202-217, Feb. 1986.
178. Marshall, T. G., "Structures for digital filter banks," Proc. IEEE ICASSP 82, Paris,
pp. 315-318, April 1982.
179. Marshall, T. G., "Transform methods for developing parallel algorithms for cyclic
block signal processing," Proc. ICC 1986, pp. 288-294, 1986.
180. Marshall, T. G., "The polyphase transform and its applications to block-processing
and filter-bank structures," Proc. IEEE Int. Symposium Circuits Syst. 1987, pp.
1103-1109, 1987.
181. Gnanasekaran, R., "Equivalence of block and N-path implementations of digital
filters," IEEE Trans. Circuits Systems, vol. 35, pp. 1326-1330, Oct. 1988.
182. TMS320C25 Users Guide, Texas Instruments, 1987.
183. Li, Z., Sorensen, H. V., and Burrus, C. S., "FFT and convolution algorithms on DSP
microprocessors," Proc. ICASSP 86, Tokyo, vol. 1, pp. 289-292, Apr. 1986.
184. Evans, B. G. and Chung, L. N., "Land mobile satellites using highly elliptic orbits,"
Proc. 3rd Tirrenia Int. Workshop Digital Commun., pp. 163-170, Sep. 1987.
185. "T-SAT Mobile Payload Specification," SERC Report, vol. TN-1001-MSS, 1986.
186. "T-SAT Mobile Payload Final Report," SERC Report, June 1989.
187. Lutz, E., Papke, W., and Plochinger, E., "Land mobile satellite communications,
channel model, modulation, and error control," Proc. 7th Int. Conf. Digital Satellite
Commun., pp. 537-543, 1986.
188. Lutz, E., "Land mobile satellite channel - recording and modelling," Proc. 4th Int.
Conf. Satellite System Commun. Navigation, pp. 15-19, London, Oct. 1988.
189. Norbury, J., Smith, H., Renduchintala, V. S. M., and Gardiner, J. G., "Land mobile
satellite service provision from the Molniya orbit - channel characterisation," Proc.
4th Int. Conf. Satellite System Commun. Navigation, pp. 143-146, London, Oct. 1988.
190. Chung, L. N., "Access protocols for on-board processing satellites," Ph.D. thesis,
University of Surrey, to be submitted.
191. Yim, W. H., Kwan, C. C. D., Coakley, F. P., and Evans, B. G., "On-board multicarrier
demodulator for mobile applications using DSP implementation," 1st European Conf.
Satellite Commun., pp. 1-7, Munich, Germany, Nov. 1989.
192. Dickinson, M., "Digital matching of the I and Q signal paths of a direct conversion
radio," J. I.E.R.E., vol. 56, pp. 75-75, Feb. 1986.
193. Programs for digital signal processing, IEEE Press, New York, 1979.
