Abstract-Modern satellite communication calls for novel and flexible concepts for de- and remultiplexing of wide-band FDM signals in conjunction with beam switching on board a satellite. For this application, two highly efficient approaches to uniform yet reconfigurable digital filter banks are known: i) the complex-modulated polyphase filter bank, and ii) the tree-structured filter bank.
I. INTRODUCTION
Digital signal processing on board communication satellites (OBP) is an active field of research where, in conjunction with frequency division multiplex (FDM) systems, two trends and challenges are presently observed: i) the need for an ever-increasing number of user channels makes it necessary to digitally process, i.e. to demultiplex, cross-connect and remultiplex, ultra-wideband FDM signals requiring high end sampling rates that range considerably beyond 1 GHz [1], [2], [3], [4], [5], and ii) the desire for flexible channel bandwidth-to-user assignment calls for simply reconfigurable OBP systems [6], [7], [8], [9]. Yet, overall power consumption must be minimal, which demands highly efficient filter banks for FDM demultiplexing (FDMUX) and remultiplexing (FMUX).
Two baseline approaches to highly efficient uniform digital filter banks (FB), as required for OBP, are known: a) the complex-modulated (DFT) polyphase (PP) FB [10], and b) the multistage tree-structured FB, whose universal directional filter cells (UNDIFICE) are likewise based on the DFT PP method [11], [12]. For both approaches it has been shown that bandwidth-to-user assignment is feasible within reasonable constraints [7], [8], [13]: a minimum user channel bandwidth, denoted as slot bandwidth b, can be extended stepwise by any integer number of additional slots up to a desired maximum overall bandwidth that shall be assigned to a single user.
However, as to challenge i), the above two FB approaches differ fundamentally from each other: In a DFT PP FDMUX (a), the overall sample rate reduction is performed in a single step in compliance with the number of user channels, so that all arithmetic operations are carried out at the (lowest) output sampling rate [10]. In contrast, in the multistage FDMUX (b) the sampling rate is reduced stepwise, in each stage by a factor of two [11]. As a result, the polyphase approach (a) inherently represents a completely parallelised structure, immediately usable for extremely high front end sampling frequencies, whereas the high end stages of the tree-structured FDMUX (b) cannot be implemented with standard space-proven CMOS technology. Hence, the tree structure, FDMUX as well as FMUX, calls for a parallelisation of the high rate stages.
As motivated, this contribution deals with the parallelisation of multistage multirate systems. To this end, we introduce a general systematic procedure for multirate system parallelisation [14], which is described in detail in Section II. For proper understanding, in Section III this procedure is applied to the high rate front end stages of the FDMUX part of the recently proposed tree-structured SBC-FDFMUX FB [9], [13], which uniformly demultiplexes an FDM signal always down to slot level (of bandwidth b) and which, after on-board switching, recombines these independent slot signals to an FDM signal (FMUX) with different channel allocation (FDFMUX functionality). If a single user occupies a multiple slot channel, the corresponding parts of FDMUX and FMUX are matched for (nearly) perfect reconstruction of this wideband channel signal (SBC functionality) [10]. Finally, some conclusions are drawn to stimulate further research.
II. SAMPLE-BY-SAMPLE APPROACH TO PARALLELISATION
In this section, we introduce the novel sample-by-sample processing (SBSP) approach to parallelisation of digital multirate systems [14] where, without any additional delay, all incoming signal samples are directly fed into assigned units for immediate signal processing. Hence, in contrast to the widely used block processing (BP) approach, SBSP does not increase latency.
In order to systematically parallelise a (multirate) system, we distinguish four procedural steps [14] :
1. Partition the original system into (elementary SISO or MIMO) subsystems E(z) with single or multiple input and/or output ports, respectively, that are readily amenable to parallelisation. Examples are: delay, multiplier, down- and up-sampler, summation and branching, but also suitable compound subsystems such as SISO filters and FFT blocks.
2. Parallelise each subsystem E(z) in an SBSP manner according to the desired degree of parallelisation $P$. To this end, each subsystem is cascaded with a $P$-fold SBSP serial-to-parallel (SP) commutator for signal decomposition (demultiplexing) followed by a consistently connected $P$-fold parallel-to-serial (PS) commutator for recomposition (remultiplexing) of the original signal, as depicted in Fig. 1(a). Here, obviously $P = P_{\mathrm{SP}} = P_{\mathrm{PS}}$, and $p \in [0, P-1]$ denotes the relative time offsets of connected pairs of down- and up-samplers, respectively. Evidently, the $P$ output signals of the SP interface comprise all polyphase components of its input signal in a time-interleaved (SBSP) manner at a $P$-fold lower sampling rate [10], [12]. Since the subsequent PS interface is inverse to the preceding SP interface [12], the SP-PS commutator cascade has unity transfer with zero delay, in contrast to the $(P-1)$-fold delay of the BP delay-chain perfect-reconstruction system [10], as anticipated (cf. also Fig. 2).
After this preparation, $P$-fold parallelisation is readily achieved by shifting the (SISO) subsystem E(z) between the SP and PS interfaces by exploiting the noble identities [10] and some novel generalised multirate identities [14], [15]. Thus, as shown in Fig. 1(b), the two interfaces are interconnected by an equivalent $P \times P$ MIMO system $\mathbf{E}(z_s)$, which represents the $P$-fold parallelisation of E(z) and all operations of which are performed at a $P$-fold reduced operational rate.
3. Reconnect all parallelised subsystems in exactly the same manner as in the original system. This is always possible, since parallelisation does not change the original numbers of input and output ports of SISO or MIMO subsystems, respectively.
4. Eliminate all interfractional cascades of PS-SP interfaces using the obvious multirate identity depicted in Fig. 2. Note that this elimination process requires identical up- and down-sampling factors, $P^{\mathrm{out},a}_{\mathrm{PS}} = P^{\mathrm{in},b}_{\mathrm{SP}}$, of each PS-SP interface cascade, which restricts the free choice of $P$ for subsystem parallelisation.
As a result of parallelisation, all input signals of the original (possibly MIMO) system are decomposed into $P$ time-interleaved polyphase components by an SP demultiplexer for subsequent parallel processing at a $P$-fold lower rate, and all system output ports are provided with a PS commutator that interleaves all low rate subsignals to form the high speed output signals.
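The SP and PS commutators of steps 2 and 4 can be illustrated by a minimal index-level sketch in Python. It is an offline list model under the assumption of finite input sequences, not a clocked hardware description; the function names `sp_commutator` and `ps_commutator` are hypothetical and do not stem from [14].

```python
def sp_commutator(x, P):
    """Serial-to-parallel: component p holds the samples x[k*P + p]."""
    return [x[p::P] for p in range(P)]


def ps_commutator(components):
    """Parallel-to-serial: interleave the P components sample by sample."""
    P = len(components)
    y = [None] * sum(len(c) for c in components)
    for p, comp in enumerate(components):
        y[p::P] = comp  # component p occupies positions p, p+P, p+2P, ...
    return y


x = list(range(16))
parts = sp_commutator(x, 4)      # four polyphase components at 1/4 rate
assert ps_commutator(parts) == x  # SP-PS cascade: unity transfer, no shift
```

In this model every output sample sits at the same index as the corresponding input sample, which mirrors the zero-delay property of the SP-PS cascade stated in step 2.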
For illustration, we present the parallelisation of a unit delay $z^{-1}$ and of an $M$-fold down-sampler with zero time offset [14], as shown in Fig. 3. The unit delay (a) is realised by $P$ parallel time-interleaved shimming delays, to be implemented by suitable system control, where a permutation is introduced for straightforward elimination of interfractional PS-SP cascades according to Fig. 2. In the case of down-sampling, Fig. 3(b), the $P$ parallel down-samplers of the diagonal MIMO system $\mathbf{E}(z_s)$ are merged with the $P$ down-samplers of the SP interface to improve efficiency. Hence, by using suitable multirate identities [14], the contiguous $PM$-fold down-samplers of the SP demultiplexer have a relative time offset of $M$.
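Both examples can be checked at the level of sample indices with a short Python sketch. It is an assumption-laden model: sequences are finite with zero initial state, their length is a multiple of $P$, and the unit delay is written in the standard polyphase form (a cyclic shift of the components with one low rate delay on the wrapped component) rather than in the shimming-delay realisation of Fig. 3(a). All function names are hypothetical.

```python
def polyphase(x, P):
    """P time-interleaved polyphase components of x."""
    return [x[p::P] for p in range(P)]


def delay(x, d=1):
    """Delay by d samples with zero initial state, same length as x."""
    return [0] * d + x[:len(x) - d]


def parallel_unit_delay(parts):
    """Unit delay on P components: output p is input p-1; output 0 is
    input P-1 delayed by one low rate sample."""
    P = len(parts)
    return [delay(parts[P - 1])] + [list(parts[p - 1]) for p in range(1, P)]


def parallel_downsample(x, M, P):
    """P parallel PM-fold down-samplers with relative time offset M."""
    return [x[M * p::P * M] for p in range(P)]


x = list(range(1, 25))
assert polyphase(delay(x), 4) == parallel_unit_delay(polyphase(x, 4))
assert polyphase(x[::2], 4) == parallel_downsample(x, 2, 4)
```

The second assertion reflects the merging of the $M$-fold down-sampler with the SP interface: component $p$ of the down-sampled signal is obtained directly from the input by $PM$-fold down-sampling with time offset $Mp$.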
III. PARALLELISATION OF SBC-FDFMUX FILTER BANK
Subsequently, we describe the parallelisation of the high rate FDMUX front end section of the versatile tree-structured SBC-FDFMUX FB for flexible channel and bandwidth allocation [9], [13]. The first three hierarchically cascaded stages of the FDMUX are shown in Fig. 4 in block diagram form applying BP. In each stage, $\nu = 1, 2, 3$, the respective input spectrum is split into two subbands of equal bandwidth in conjunction with decimation by two. All UNDIFICE have identical coefficients and are assumed to be critically sampling 2-channel DFT PP FB with zero frequency offset (even channel allocation scheme [13]). The branch filter transfer functions $H_\lambda(z_\nu)$, $\lambda = 0, 1$, represent the two PP components of the prototype filter [10], [12] where, by setting $z_\nu := e^{j\Omega_\nu}$ with $\Omega_\nu = 2\pi f/f_\nu$ and $\nu = 1, 2, 3$, the respective frequency responses $H_\lambda(e^{j\Omega_\nu})$ are obtained, which are related to the operational sampling rate $f_\nu$ of stage $\nu$.
Assuming, for instance, a high end input sampling frequency of $f_i = f_0 = 2.4$ GHz [3], [7], the operational clock rate of the third stage is $f_3 = f_i/2^3 = 300$ MHz, which is deemed feasible using present-day CMOS technology. Hence, front end parallelisation has to reduce the operational clock rate of all subsystems preceding the third stage down to $f_s = f_3 = 300$ MHz. This is achieved by 8-fold parallelisation of input branching and blocking (delay $z_i^{-1}$), 4-fold parallelisation of the first stage of the FDMUX tree (comprising input decimation by two, the PP branch filters $H_\lambda(z_1)$, $\lambda = 0, 1$, and butterfly) and of the input branching and blocking (delay $z_1^{-1}$) of the second stage and, finally, corresponding 2-fold parallelisation of the two parallel 2-channel FDMUX FB (UNDIFICE) of the second stage of the tree, as indicated in Fig. 4.
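The clock rate arithmetic of this example can be retraced with a few lines of Python. The sketch assumes the values quoted above ($f_i = 2.4$ GHz, maximum degree of parallelisation 8, halving of the rate per stage); the function name `stage_plan` and the use of MHz integers are illustrative choices.

```python
F_IN_MHZ = 2400  # high end input sampling frequency f_i = f_0
P_MAX = 8        # degree of parallelisation at the input (stage 0)


def stage_plan(f_in_mhz=F_IN_MHZ, p_max=P_MAX, stages=3):
    """Per stage nu: (nu, native rate f_nu, degree P_nu, clock f_nu / P_nu)."""
    plan = []
    for nu in range(stages + 1):
        f_nu = f_in_mhz // 2 ** nu  # rate is halved in every stage
        p_nu = p_max // 2 ** nu     # degree of parallelisation follows
        plan.append((nu, f_nu, p_nu, f_nu // p_nu))
    return plan


for nu, f_nu, p_nu, clock in stage_plan():
    print(f"stage {nu}: f = {f_nu} MHz, P = {p_nu}, clock = {clock} MHz")
```

Every row of the plan ends at the same operational clock of 300 MHz, which is the common rate $f_s$ targeted by the front end parallelisation.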
The result of parallelisation, as required above, is shown in Fig. 5, where all interfractional interfaces have been removed by straightforward application of the identity of Fig. 2. Subsequently, the parallelisation of the elementary subsystems is explained in detail:
1. Down-sampling by $M = 2$: In compliance with Fig. 3(b), each 2-fold down-sampler is replaced with $P_\nu$ units in parallel for $2P_\nu$-fold down-sampling with even time offset $2p$, where $p = 0, 1, 2, 3$ applies to the first tree stage ($P_1 = 4$), and $p = 0, 1$ to the second stage ($P_2 = 2$). The result of 4-fold parallelisation of the front end input down-sampler of the upper branch ($\nu = 1$, $\lambda = 0$) is readily visible in Fig. 5 preceding the filter MIMO block $\mathbf{H}^1_0(z_s)$: In fact, it represents an 8-to-4 parallelisation, where all odd PP components are removed according to Fig. 3(b) [14].
2. Cascade of unit blocking delay and 2-fold down-sampler: For proper explanation, we first focus on the input section of the first tree stage, lower branch ($\nu = \lambda = 1$), in front of filter block $H_1(z_1)$. To this end, as required by Fig. 4, the unit delay $z_i^{-1}$ is parallelised by $P_0 = 8$, as shown in Fig. 3(a), while the subsequent down-sampler applies $P_1 = 4$, as described above w.r.t. Fig. 3(b). Immediate cascading of the parallelised unit delay ($P_0 = 8$) and down-sampling ($P_1 = 4$, $M = 2$), as induced by Fig. 3, shows that only those four PP components of the parallelised delay with even time offset ($p = 0, 2, 4, 6$) are transferred via the 4-branch SP-input interface of down-sampling ($2P_1 = 8$) to its PS-output interface with naturally ordered time offsets $p = 0, 1, 2, 3$ w.r.t. $P_1 = 4$. Hence, only 4 out of 8 PP components are retained, namely those of odd time index $p = 7, 1, 3, 5$, which are provided by the unit delay's SP-input interface and delayed by $z_s^{-1/P_0}$. As a result, the upper branch of stage 1, $H_0(z_\nu) \to \mathbf{H}^1_0(z_s)$, is fed by the even-indexed PP components of the high rate FDMUX input signal, whereas the lower branch, $H_1(z_\nu) \to \mathbf{H}^1_1(z_s)$, is provided with the delayed versions of the PP components of odd index, as depicted in Fig. 5. Hence, as in the original system of Fig. 4, the input sequence is completely fed into the parallelised system. This procedure is repeated with the input branching and blocking sections of the subsequent stages $\nu = 2, 3$: The PP branch filters $H_0(z_\nu) \to \mathbf{H}^\nu_0(z_s)$, parallelised by $P_\nu$, where $P_2 = 2$ and $P_3 = 1$ ($P_1 = 4$), are provided with the even-numbered PP components of the respective input signals with timing offsets in natural order. In contrast, the set of PP components of odd index is always delayed by $z_s^{-1/P_{\nu-1}}$ and fed into the filter blocks $H_1(z_\nu) \to \mathbf{H}^\nu_1(z_s)$.
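The routing of the even- and odd-indexed PP components into the two branches of stage 1 can be verified with a small Python sketch. It tracks sample indices only, so the sub-sample shimming delays of the hardware realisation are not modelled; the wrapped component 7 simply appears with a one-sample delay at the low rate. The input length is assumed to be a multiple of 8, and `stage1_inputs` and its helpers are hypothetical names.

```python
def polyphase(x, P):
    """P time-interleaved polyphase components of x."""
    return [x[p::P] for p in range(P)]


def delay(x, d=1):
    """Delay by d samples with zero initial state, same length as x."""
    return [0] * d + x[:len(x) - d]


def stage1_inputs(x, P0=8, P1=4):
    """Return the P0 input components and the P1 components of both branches."""
    x_parts = polyphase(x, P0)
    upper = x[::2]         # branching, then 2-fold down-sampling
    lower = delay(x)[::2]  # blocking delay, then 2-fold down-sampling
    return x_parts, polyphase(upper, P1), polyphase(lower, P1)


x = list(range(100, 164))  # 64 distinct non-zero samples
x_parts, upper_parts, lower_parts = stage1_inputs(x)

# Upper branch: even-indexed input components 0, 2, 4, 6 in natural order.
assert upper_parts == [x_parts[0], x_parts[2], x_parts[4], x_parts[6]]
# Lower branch: odd-indexed input components in the order 7, 1, 3, 5.
assert lower_parts[0] == delay(x_parts[7])
assert lower_parts[1:] == [x_parts[1], x_parts[3], x_parts[5]]
```

Together the two branches consume all eight input components exactly once, in line with the statement that the input sequence is completely fed into the parallelised system.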
3. $P_\nu$-fold parallelisation of the PP branch filters $H_\lambda(z_\nu) \to \mathbf{H}^\nu_\lambda(z_s)$, $\lambda = 0, 1$; $\nu = 1, 2$, is achieved by systematic application of the procedure condensed in Fig. 1 (for details cf. [14], [12]). To this end, $H_\lambda(z_\nu)$ is decomposed into $P_\nu$ PP components of correspondingly reduced order, which are arranged to a MIMO system by exploiting a multitude of multirate identities [14], [15]. Here, $z_s := e^{j\Omega_s}$ with $\Omega_s = 2\pi f/f_s$ and $f_s = f_i/8$. The resulting $P_\nu \times P_\nu$ MIMO filter transfer matrix $\mathbf{H}^\nu_\lambda(z_s)$ contains each PP component of $H_\lambda(z_\nu)$ $P_\nu$ times: Thus, the amount of hardware is increased $P_\nu$ times whereas, as desired for feasibility, the operational clock rate is concurrently reduced by $P_\nu$. Hence, the overall expenditure, i.e. the number of operations times the respective operational clock rate [12], is not changed.
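The structure of such a $P \times P$ MIMO filter can be illustrated for a generic FIR filter with a short Python sketch. It uses the textbook polyphase (pseudo-circulant) arrangement under the assumptions of zero initial state and an input length that is a multiple of $P$; it is not claimed to be the exact arrangement derived in [14], and the coefficients and function names are hypothetical.

```python
def polyphase(x, P):
    """P time-interleaved polyphase components of x."""
    return [x[p::P] for p in range(P)]


def fir(h, x):
    """Direct-form FIR filter at the full rate, zero initial state."""
    return [sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(x))]


def parallel_fir(h, x_parts):
    """P x P MIMO form: output p sums the PP filter h_r applied to input
    component (p - r) mod P, with one extra low rate delay if r > p."""
    P = len(x_parts)
    h_parts = polyphase(h, P)
    K = len(x_parts[0])
    y_parts = []
    for p in range(P):
        acc = [0] * K
        for r in range(P):  # every h_r is used once per output row
            src = x_parts[(p - r) % P]
            extra = 1 if r > p else 0
            for k in range(K):
                for j, c in enumerate(h_parts[r]):
                    i = k - j - extra
                    if i >= 0:
                        acc[k] += c * src[i]
        y_parts.append(acc)
    return y_parts


h = [1, -2, 3, 4, -1, 2, 5]  # illustrative integer coefficients
x = list(range(1, 33))
assert polyphase(fir(h, x), 4) == parallel_fir(h, polyphase(x, 4))
```

Each PP component of `h` appears in all $P$ output rows, so the multiplier count grows from `len(h)` to `P * len(h)` while each multiplier runs at a $P$-fold lower rate; the product of both, i.e. the expenditure, stays the same.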
IV. CONCLUSION
A general and systematic procedure for the parallelisation of multirate systems has been presented. Its application to the high rate decimating FDMUX front end of the tree-structured SBC-FDFMUX FB [9], [13] has been described in detail. The degree of parallelisation $P_\nu$ of stage $\nu = 0, 1, 2, 3$ decreases in proportion to the operational clock frequency $f_\nu$ of that stage and is, thus, adapted to the actual sampling rate. As a result, after suitable decomposition of the high rate front end input signal by an input commutator into $P_0 = P_{\max}$ polyphase components (as depicted for $P_{\max} = 8$ in Fig. 5), all subsequent processing units are operated at the same operational clock rate $f_s = f_i/P_0$. Since the inherent parallelism of the original tree-structured FDMUX (Fig. 4) has attained $P_{\max} = 8$ in the third stage, and the output signals of this stage represent the desired eight demultiplexed FDM subsignals, interleaving PS-output commutators are no longer required, as can be seen in Fig. 5. Finally, it should be noted that parallelisation does not change the overall expenditure; yet, by multiplying the stage $\nu$ hardware by $P_\nu$, the operational clock rates are reduced by a factor of $P_\nu$ to a feasible order of magnitude, as desired.
Applying the rules of multirate transposition [12] to the parallelised FDMUX front end, the high rate interpolating back end of the tree-structured SBC-FDFMUX FB is obtained likewise and exhibits the same properties as to expenditure and feasibility [14] . Hence, the versatile and efficient tree-structured filter bank (FDMUX, FMUX, SBC, wavelet, or any combination thereof) can be used in any (ultra) wide-band application without any restriction.
