Efficient bit-level design of an on-board digital TV demultiplexer by Sala Álvarez, José et al.
EFFICIENT BITLEVEL DESIGN OF AN ONBOARD
DIGITAL TV DEMULTIPLEXER
S Calvo JSala A Pages G Vazquez
Dept of Signal Theory and Communications Universitat Politecnica de Catalunya
c Jordi Girona  Modul D	
b

 Barcelona SPAIN
Tel 

	
 Fax 




email fsergioalvarezggpstscupces
ABSTRACT
A bitlevel description of the signal processing stage of
an onboard integrated VLSI multicarrier demodulator
is presented in this paper along with a description of the
optimization procedure that has been developed for the
signal processing functions
 
 The demultiplexer is capa
ble of handling a varying number of carriers in a  MHz
bandwidth on the satellite uplink Its architecture has
been optimized at bitlevel in a way dependent on the
known input signal statistics and carrier distributions
allowed by the frequency plan
1 INTRODUCTION
Space digital video broadcasting systems are evolving toward the DVB (Digital Video Broadcasting) standard, based on MPEG-2. An increasingly large amount of processing is being moved toward the space segment, so that complex regenerative payloads will have to be carried by the forthcoming satellite generation. This paper describes the architecture that has been developed for an on-board multicarrier demultiplexer ASIC prototype to provide services for digital television and multimedia in the frame of the HISPANET network project. HISPANET is aimed at broadcasting digital multi-programme television to Spanish-speaking communities in Europe and America. The basic concept is to provide access to individual broadcasters and service providers through specific transponders carried by the HISPASAT satellite. One carrier conveying all programmes received on the individual uplinks (Multifrequency TDMA) is transmitted on the downlink. Therefore, demultiplexing and demodulation (not considered in this paper) must be carried out on board.
The design of digital onboard systems and speci
cally of the ltering stages of the digital demultiplexer
have to keep power consumption gate count and imple
mentation losses to a minimum while maintaining ac
ceptable system performance A special criterion that
 
This research work has been partially supported by the Nati
onal Research Plan of Spain CYCIT TIC	

C		 and
TIC			C		 and the Catalonian Regional Government
CIRIT SGR			
takes into account the structure of the interfering ad
jacent carriers has been developed  to derive suitable
decimation lters for the demultiplexing function The
criterion optimizes jointly the lter response in the pass
 transition and stopbands for a given number of coe
cients as the complexity of ltering is exponential in the
lter length In this followup paper we consider the
bitlevel design of the architecture therein described A
system overview is presented in section 	 System Des
cription Section  presents the approach followed in
bitlevel design for VLSI integration Results and Con
clusions are shown in Section 
2 SYSTEM DESCRIPTION
The architecture of the digital on-board demultiplexer has to deliver any allowed carrier combination (see Fig. 1) of four signaling rates, each expressed as a multiple of $R_s$, the lowest signaling rate. Each carrier is QPSK modulated with a square-root raised-cosine pulse. Two possible frequency plans have been tailored to facilitate the demultiplexing scheme, with the separation between adjacent carriers specified as a multiple of $R_s$. In the final architecture, both frequency plans depicted in Fig. 1 are processed by two independent demultiplexers that can be internally configured to deal with either of them. The overall bandwidth can contain up to  small carriers at the $R_s$ signaling rate. The sampling scheme is IF sampling at a rate $f_s$ that is a multiple of $R_s$. Both frequency plans have been devised to contain the four signaling rates mentioned above under two constraints: (a) that only very simple frequency shifting operations should be carried out, and (b) that the output sampling rate of each carrier, in samples per symbol, should be the same for all rates. These two constraints have led to the construction of two frequency plans and to the design of the demodulators at a common number of samples per symbol. The inner architecture of the demultiplexer consists of intercommunicating polyphase processors in a tree scheme. In principle, it would have been possible to define a common architecture capable of processing both frequency plans: note in Figures 2 and 3 that it would only be necessary to introduce the input signal at a decimation-by-two or at a decimation-by-three block. Nevertheless, the impact of this approach at bit level is considerable, as each processing block must be dimensioned to handle complex signals at a sufficient rate. In the end, the hardware of each tree was implemented separately. Hardware optimization is then more straightforward and can be handled more effectively using the techniques described in the section on Bit-Level Design.
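
Constraint (a) can be illustrated with a shift by a quarter of the sampling rate, which needs no multipliers at all. The Python sketch below is only an illustration under that assumption: the function name `shift_by_quarter_rate` is hypothetical, and the shifts actually used in the frequency plans are those shown in the figures.

```python
import numpy as np

def shift_by_quarter_rate(x):
    """Multiply x[n] by (-j)**n using only real/imaginary swaps and sign changes."""
    x = np.asarray(x, dtype=complex)
    y = np.empty_like(x)
    y[0::4] = x[0::4]                             # (-j)**0 = 1
    y[1::4] = x[1::4].imag - 1j * x[1::4].real    # (-j)(a + jb) = b - ja
    y[2::4] = -x[2::4]                            # (-j)**2 = -1
    y[3::4] = -x[3::4].imag + 1j * x[3::4].real   # (-j)**3 = j, j(a + jb) = -b + ja
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
n = np.arange(x.size)
reference = x * np.exp(-1j * np.pi * n / 2)       # generic complex mixer
print(np.allclose(shift_by_quarter_rate(x), reference))
```

In hardware, the `1j *` products above correspond to wiring the real part to the imaginary output (and vice versa), so the shift costs at most a negation per sample.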
Figure 1: Frequency Plans No. 1 and No. 2. Above: plan with carriers at two bit rates. Below: plan with carriers at three bit rates.
Demultiplexing is carried out according to the scheme defined in Figures 2 and 3.
Figure 2: Filtering and decimation structure for Tree no. 1 (block labels in the figure: 1/2-band filters with decimation by 2 and 1/3-band filters with decimation by 3; outputs at 6 Mbps and 2 Mbps). Each block is a polyphase processor consisting of a decimated filter bank and an IDFT operation. Bit-level optimization is carried out separately at both blocks.
The architecture of the decimation-by-two processor is displayed in Figure 4. Each block is implemented with a polyphase processor that demultiplexes twice as many carriers as its decimation ratio. The working rate of each processor is therefore twice that of a conventional polyphase processor, since the decimation ratio is only half the number of carriers. This only affects the IDFT part of the polyphase processor, as the same outputs of the filter bank may be used twice, at different input positions to the IDFT, to evaluate odd output samples. Only some of the outputs of each processor contain useful data, depending on the frequency plan, so that inhibited outputs will not be synthesized in the final hardware.
Figure 3: Filtering and decimation structure for Tree no. 2 (outputs at 8 Mbps, 4 Mbps and 2 Mbps).
Figure 4: Architecture of the decimation-by-two processor filter bank. The output vector y is fed into an IDFT processor. The four different subfilters constitute the decimated versions of the carrier demultiplexing filter $h[n]$: $h_i[n] = h[4n + i]$. Upper/lower branches of the four logic multiplexers are selected according to even/odd IDFT output samples (vector z). The decimation-by-three processor is designed in a similar way.
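
The structure of such a processor can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implemented design: the function names `direct_bank` and `polyphase_bank` are hypothetical, the prototype filter is an arbitrary window rather than the optimized demultiplexing filter, and each channel is left in band-pass form (no per-channel down-conversion). It checks that M decimated subfilters followed by an M-point IDFT reproduce the outputs of M separately modulated filters when the decimation ratio D is half the number of carriers.

```python
import numpy as np

def direct_bank(x, h, M, D):
    """Reference: modulate h to each of M carriers, filter, keep every D-th sample."""
    n = np.arange(len(h))
    return np.array([np.convolve(x, h * np.exp(2j * np.pi * k * n / M))[::D]
                     for k in range(M)])

def polyphase_bank(x, h, M, D):
    """Same outputs from M decimated subfilters followed by an M-point IDFT."""
    Q = -(-len(h) // M)                       # subfilter length (ceiling division)
    hp = np.zeros(Q * M)
    hp[:len(h)] = h                           # zero-pad h to a multiple of M
    n_out = (len(x) + len(h) + D - 2) // D    # length of np.convolve(x, h)[::D]
    xpad = np.concatenate([np.zeros(Q * M), x, np.zeros(Q * M)])
    taps = np.arange(Q * M)
    out = np.empty((M, n_out), dtype=complex)
    for m in range(n_out):
        seg = xpad[Q * M + m * D - taps]      # x[mD - n] for n = 0 .. QM-1
        u = (hp * seg).reshape(Q, M).sum(axis=0)  # u[p] = sum_q h[qM+p] x[mD-qM-p]
        out[:, m] = np.fft.ifft(u) * M        # sum_p u[p] exp(+j 2 pi k p / M)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
h = np.hanning(16) / np.hanning(16).sum()     # placeholder prototype filter
M, D = 4, 2                                   # four carriers, decimation by two
print(np.allclose(direct_bank(x, h, M, D), polyphase_bank(x, h, M, D)))
```

With D = M/2 the data window advances by only half a subfilter block per output, which is why the same subfilter products feed the IDFT at different input positions on even and odd output samples.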
  BITLEVEL DESIGN
Bitlevel designs of signal processing algorithms often
require careful wordlength dimensioning in each node of
the architecture The overall performance of the system
in terms of a distortion criterion depends on two factors	
a the wordlength assigned to each internal variable and
b the number of quantization levels at the input The
usual criterion is thus to optimize performance for a re
asonable complexity It is useful to consider several op
timization levels in the design of a digital architecture	
a top level where algorithmical optimization is carried
out choice of the algorithm b bit level or register
transfer level RTL where the optimization procedure
described in this paper applies and c gate level where
the RTL specication is synthesized into an interconnec
tion of gatesregisters Tradeo	s must be met during
the design phase Therefore it is a usual procedure in
the provision of the system specications to state the
maximum tolerated level of implementation loss evalu
ated in dB We have chosen this therefore as the dis
tortion criterion under which the architecture must be
optimized The implementation loss is dened according
to

L
I
  log
  
E
b
N
o

arch
E
b
N
o

true
dB 
with E
b
N
o

arch
the bitenergy to noise spectral power
density ratio obtained at the output of the architecture
nally implemented and E
b
N
o

true
the true ratio of
the ideal signal model
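
As a purely illustrative calculation, assume that the implementation loss is the ratio of the implemented to the ideal $E_b/N_o$ expressed in dB. The numerical values are assumptions chosen for the example, not results from this design. If the ideal model gives $E_b/N_o = 10$ dB and the implemented architecture delivers 9.8 dB, then

```latex
L_I = 10\log_{10}\frac{10^{9.8/10}}{10^{10/10}} = 9.8 - 10 = -0.2\ \mathrm{dB}
```

that is, a loss of 0.2 dB in magnitude, to be compared with the maximum loss tolerated by the system specifications.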
Previously we have described the highlevel architec
ture for the digital TV demultiplexer in terms of a polyp
hase tree The next step in the design process is to
translate this algorithmics to RTL VHDL primitives
Specications of the present system in terms of the
number of carriers to be processed the four admissi
ble symbol rates for each carrier and the varying input
dynamics must be approached with a suitable strategy
at the arithmetical and logic levels The complexity of
the overall system depends on a large degree on the
number of bits assigned to each node in the architec
ture It is therefore important to identify those points
in the architecture that are more critical in terms of the
implementation loss introduced
This objective is usually achieved through repeated simulations using probing sequences previously defined for a number of scenarios (we refer here to the critical cases defined in the system specifications). Tables are produced showing the implementation loss associated with each node against the complexity involved, and a final decision is reached on its proper dimensioning. A close understanding of the behaviour of bit dynamics in terms of the data statistics can provide shortcuts to this procedure. In our setting, data statistics can be intuitively related to the spectrum or frequency plan present at the input to each polyphase processor.
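
The kind of table produced by such simulations can be sketched in Python for a single node. This is a minimal sketch, assuming a hypothetical binary probing sequence, additive Gaussian noise and a saturating rounding quantizer over [-1, 1); the names `quantize` and `implementation_loss_db` are illustrative and do not come from the design.

```python
import numpy as np

def quantize(x, bits):
    """Round to a signed fixed-point grid with `bits` bits over [-1, 1), saturating."""
    scale = 2 ** (bits - 1)
    return np.clip(np.round(x * scale), -scale, scale - 1) / scale

def implementation_loss_db(signal, noise, bits):
    """10*log10(SNR after quantization / SNR before quantization) for one node."""
    snr_true = np.mean(signal ** 2) / np.mean(noise ** 2)
    error = quantize(signal + noise, bits) - signal   # channel plus quantization noise
    snr_arch = np.mean(signal ** 2) / np.mean(error ** 2)
    return 10 * np.log10(snr_arch / snr_true)

rng = np.random.default_rng(1)
signal = 0.25 * rng.choice([-1.0, 1.0], size=100_000)  # hypothetical probing sequence
noise = 0.1 * rng.standard_normal(signal.size)          # hypothetical channel noise
loss_table = {bits: implementation_loss_db(signal, noise, bits)
              for bits in range(4, 11)}
for bits, loss in loss_table.items():
    print(f"{bits:2d} bits: {loss:7.3f} dB")
```

Each row pairs a candidate wordlength (a proxy for complexity) with the loss it causes, which is the information needed to dimension the node.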
The working margin of each processor can be defined as the range of signal power that may be presented to it by the preceding stage in the architecture. Provided that the input power stays within this range, that particular processor will work according to specifications. In this particular case, the whole demultiplexing tree is implemented as a cascade of polyphase processors, so that careful monitoring of signal dynamics is crucial to guarantee that each polyphase processor is near its optimum working point.
Bitlevel dynamics depend heavily on the input data
statistics In particular cascaded architectures are ex
tremely sensitive to this e	ect Bitlevel primitives are
implemented as xedpoint operations so that recurrent
processing on the input data vector results in signal at
tenuation along the cascade This attenuation e	ect
being detrimental to system performance in terms of
the quantization SNR or implementation loss L
I
 is
heavily dependent on the input data spectrum
The approach that has been taken for this design is to monitor the data histogram along the cascade as probing sequences are fed through. The attenuation can be eliminated if fixed-point amplification is implemented at key points in the architecture. The choice of the amplification factors is critical, in the sense that only a precise understanding of the signal statistics can provide a value suitable for the whole range of carrier distributions contemplated in the frequency plan. The value must be chosen to prevent both saturation of the arithmetic and an excessively low signal to quantization noise ratio for the specified carrier powers.
Therefore in order to determine the behaviour of one
target architecture it is only necessary to determine
those input data distributions or spectra that deliver
at the output the maximally and minimally attenuated
signal power for constant input signal power It can
be shown that the attenuation induced by the archi
tecture depends on the randomness of the input signal
spectrum All operations involved in demultiplexing the
carrier set are linear operations It is straightforward to
provide an intuitive justication
 let fx
i
 i  Ig be a
set of input correlated and bounded random variables
and let us perform a linear operation L on these vari
ables
 y  Lx

    x
N
 Then the probability density
function of the random variable Y
 


p
Y
 
y
 
  y
 
def
 ymaxy 
is atter the more those input random variables are cor
related this can be justied from the Central Limit
Theorem In other words absence of correlation at
the input can be interpreted as the output taking its
maximum values with vanishing probability
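
This argument can be checked numerically with a small Python experiment. It is a sketch under stated assumptions: the linear operation is taken to be a plain sum of 16 inputs, the inputs are uniform on [-1, 1], and the output is normalized by its largest possible magnitude. None of these choices comes from the design itself.

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials = 16, 200_000

# Uncorrelated bounded inputs, combined by a hypothetical linear operation (plain sum).
x_uncorr = rng.uniform(-1.0, 1.0, size=(trials, N))
# Fully correlated inputs: every variable is a copy of the same draw.
x_corr = np.repeat(rng.uniform(-1.0, 1.0, size=(trials, 1)), N, axis=1)

def normalized_output(x):
    """y' = y / max|y|, with y the sum of the inputs and max|y| = N for |x_i| <= 1."""
    return x.sum(axis=1) / x.shape[1]

# Probability that the normalized output exceeds half of its maximum magnitude.
p_uncorr = np.mean(np.abs(normalized_output(x_uncorr)) > 0.5)
p_corr = np.mean(np.abs(normalized_output(x_corr)) > 0.5)
print(f"uncorrelated: {p_uncorr:.5f}  correlated: {p_corr:.5f}")
```

The correlated case keeps a flat density over the whole range, whereas the uncorrelated case concentrates around zero and almost never reaches the upper half of the range.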
Therefore wordlength dimensioning is critical in
terms of the data statistics to guarantee minimum im
plementation loss The most signicant bits at several
points in the architecture can be dropped as they will
only activate with negligible probability depending on
the data statistics Thus the necessary logic to evalu
ate those bits can be obviated in the synthesis process
leading to area and gate delay reductions in the nal
implementation That is true a tradeo	 shall have to
be established between the clipping probability those
MSB bits that would activate and the logic complexity
In the nal hardware simple scaling operations with
factors   shift bits to the upper positions Rounding
is performed afterwards to limit the wordlength passed
on to the next processor
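
The MSB trade-off can be sketched in Python by reading the required wordlength off a simulated histogram. This is a minimal sketch, assuming a hypothetical node that sums 16 independent 8-bit inputs and a target clipping probability of 1e-3; the function name `required_bits` is illustrative.

```python
import numpy as np

def required_bits(samples, p_clip):
    """Smallest two's-complement wordlength exceeded by at most p_clip of the samples."""
    samples = np.asarray(samples)
    bits = 1
    while True:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if np.mean((samples < lo) | (samples > hi)) <= p_clip:
            return bits
        bits += 1

rng = np.random.default_rng(4)
# Hypothetical node: sum of 16 independent 8-bit inputs (worst case needs 12 bits).
inputs = rng.integers(-128, 128, size=(200_000, 16))
node = inputs.sum(axis=1)
worst_case_bits = 12
bits_needed = required_bits(node, p_clip=1e-3)
print(f"worst case: {worst_case_bits} bits, statistical: {bits_needed} bits")
```

Any bit saved relative to the worst-case wordlength is logic that need not be synthesized, at the price of the chosen clipping probability.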
In conclusion the use of histograms and characteris
tic probing sequences has provided the necessary means
Gates stage  stage  stage 
Tree    	

Tree  
  
Table  Number of gates used by the integrated circuit
for each tree Trees  and  total  and

	 gates respectively Tree  displays a heavier
computational load
to reduce the digital architecture complexity of the de
multiplexer to an acceptable level
4 RESULTS
Histograms and spectra are shown at key points in the architecture. We have considered two different scenarios: (a) a carrier configuration containing nine carriers at the same bit rate, and (b) a carrier configuration containing two carriers at one bit rate and one carrier at another. In this way we can show the effect of the data statistics and of the spectrum at the points of interest in the architecture; in particular, we have chosen the input to each of the polyphase processors. The dynamic range is specified identically in all cases.
Figure 5: Histograms (a) and (b) at the output of the first (1) and second (2) stages (horizontal axes from -0.5 to 0.5, logarithmic vertical axes from 10^-4 to 10^-1). Note that histogram (b) is more spread than (a), as the number of independent carriers therein contained is four times lower than in (a). It is advisable to monitor the dynamics associated with (b) to keep a sufficiently low clipping probability, while (a) will be more prone to granular noise. Note also that in (2) histogram (b) is already departing from the Gaussian shape.
Figure 6: (above) spectrum of the (b) configuration and (below) spectrum of the (a) configuration (horizontal axes from 0 to 1, logarithmic vertical axes from 10^-3 to 10^3; one carrier in the plot is labelled 2.1 Mbps). The frequency axis is normalized to the sampling frequency.
Extensive simulations have been run to obtain a wordlength assignment for each polyphase processor capable of meeting the specifications. An analog-to-digital converter (ADC) is used to IF-sample the input signal. The wordlengths passed between polyphase processors are limited to a fixed number of bits. The complexity of each stage is presented in Table 1, as evaluated from the VHDL synthesis process. Note that one of the trees is more complex due to more demanding computational requirements. For comparison see Figures 2 and 3.
5 CONCLUSIONS
It has been shown that linear signal processing operations can be efficiently synthesized onto an integrated circuit when the statistical dependence between data is exploited in the design process. The reduction in complexity must be traded off against clipping probability. Saturating arithmetic is therefore necessary to avoid excessive distortion.
References
[1] J. Sala, A. Pages, J. Riba, S. Calvo, G. Vazquez, M.A. Rey, "Algorithms Study and Simulation Results", Report AEO, submitted to the European Space Agency under contract ESA/ESTEC NL/US.
[2] J. Prat, A. Rodríguez, F. Ortega, M.A. Rey, "Digital Architectural Design", Report AEO, submitted to the European Space Agency under contract ESA/ESTEC NL/US.
[3] F. Ortega, A. Rodríguez et al., "An advanced Multi-Carrier Demodulator for the ESA OBP System", Proceedings of the Fifth ESA International Workshop on Digital Signal Processing Techniques Applied to Space Communications.
[4] J. Sala, A. Pages, S. Calvo, J. Prat, "Design and Implementation of a DVB On-Board Multi-Carrier Demodulator", Proceedings of ICASSP, COMM.
