







A computationally efficient DAB bit-stream processor. 
 
Renan Kazazoglu1 
Suleyman S. Demirsoy1 
Izzet Kale1,2 
Richard C.S. Morling1 
 
1 School of Informatics, University of Westminster 
2 Applied DSP and VLSI Research Centre, Eastern Mediterranean University 
 
 
Copyright © [2006] IEEE. Reprinted from the proceedings of the 2006 IEEE 
International Symposium on Circuits and Systems, ISCAS 2006, pp. 5543-
5546. 
 
This material is posted here with permission of the IEEE. Such permission of 
the IEEE does not in any way imply IEEE endorsement of any of the 
University of Westminster's products or services. Internal or personal use of 
this material is permitted. However, permission to reprint/republish this 
material for advertising or promotional purposes or for creating new collective 
works for resale or redistribution must be obtained from the IEEE by writing to 
pubs-permissions@ieee.org. By choosing to view this document, you agree to 
all provisions of the copyright laws protecting it. 
 
 
The WestminsterResearch online digital archive at the University of Westminster 
aims to make the research output of the University available to a wider audience.  
Copyright and Moral Rights remain with the authors and/or copyright owners. 
Users are permitted to download and/or print one copy for non-commercial private 
study or research.  Further distribution and any use of material from within this 
archive for profit-making enterprises or for commercial gain is strictly forbidden.    
 
 
Whilst further distribution of specific materials from within this archive is forbidden, 
you may freely distribute the URL of the University of Westminster Eprints 
(http://www.wmin.ac.uk/westminsterresearch). 
 
In case of abuse or copyright appearing without permission e-mail wattsn@wmin.ac.uk. 
A Computationally Efficient DAB Bit-Stream Processor
Renan Kazazoglu, Suleyman S. Demirsoy*, Izzet Kale*+ and Richard C.S. Morling*
*Applied DSP and VLSI Research Group, University of Westminster, London, UK
+Applied DSP and VLSI Research Centre, Eastern Mediterranean University, Gazimagosa, Mersin 10, Turkiye
RenanKazazoglu@hotmail.com, SDemirsoy@hotmail.com, kalei@wmin.ac.uk, morlingd@wmin.ac.uk
Abstract- This paper describes an MPEG (Moving Pictures Expert Group) Audio Layer II - LFE (Lower Frequency Extension) bit-stream processor targeting DAB (Digital Audio Broadcasting) receivers that will handle the decoding of the frames in a computationally efficient manner to provide a synthesis sub-band filter with the reconstructed sub-band samples. Focus is given to the frequency sample reconstruction part, which handles the re-quantization and re-scaling of the samples once the necessary information is extracted from the frame. The comparison to a direct implementation of the frequency sample reconstruction block is carried out to prove increased computational efficiency.

I. INTRODUCTION
Digital Audio Broadcasting (DAB) is a system specified by a European Telecommunication Standard (ETS) to transmit high quality digital audio and programme related data services using terrestrial and satellite transmitters and cable networks in the Very High Frequency (VHF) and Ultra High Frequency (UHF) bands [1]. As raw digital audio has a high bandwidth, a method of data reduction and packing has to be applied to the sampled audio data before it can be transmitted over the medium. To this end, DAB uses the MPEG Audio Layer II - LFE coding algorithm [1], [2]. This coding scheme, depending on the results of the psychoacoustic analysis performed on the data, allocates bits to the audio samples and packs these samples along with the ancillary data into frames to be transmitted over the medium.

The capability of providing CD quality audio and data transmission services is the main advantage of DAB over FM. However, the increased quality of audio and the added services result in relatively complex receiver architectures, often consuming more power. The aim of this paper is to describe a computationally efficient DAB bit-stream processor targeting mobile applications, as current commercially available DAB receivers are limited to standalone units or in-car kits.

A DAB decoder can be divided into three basic blocks, as seen in Fig. 1. The DAB bit-stream processor described in this paper deals with the bit-stream unpacking and frequency sample reconstruction blocks, with emphasis on increasing the computational efficiency of the frequency sample reconstruction block, leaving the synthesis sub-band filter block for future study. Section II of this paper will detail the architecture of the bit-stream processor and elaborate on the techniques utilized in the frequency sample reconstruction block in order to increase its computational efficiency. Further power reductions will be proposed in Section III.

Figure 1. DAB audio frame decoding [block diagram: encoded bit-stream -> bit-stream unpacking -> quantized sub-band data and scale factors -> frequency sample reconstruction -> reconstructed sub-band data -> synthesis sub-band filter]

II. ARCHITECTURE OF THE DAB BIT-STREAM PROCESSOR
This section details the architecture of the designed DAB bit-stream processor. The first part will briefly discuss the structure and operation of some of the important sub-blocks of the bit-stream unpacking block in order to further clarify the improvements brought to the frequency sample reconstruction block. This will be followed by the details of the techniques utilized in the frequency sample reconstruction block to increase its computational efficiency. In order to render the overall design power efficient, the bit-stream processor was designed to work bit-serially, minimizing the dynamic power dissipation of the processor. The format of the DAB bit-streams to be processed is given in Fig. 2.
0-7803-9390-2/06/$20.00 ©2006 IEEE. ISCAS 2006.
Figure 2. A DAB frame [fields: Header, CRC, Bit Allocation, SCFSI, SCF, Audio Data, Ancillary Data]

A. Bit-Stream Unpacking
The purpose of the bit-stream unpacking block is to synchronize to the incoming bit-stream and extract frame related data necessary for the reconstruction of the samples, such as the scale factors and the sample word lengths, as well as the ancillary data packed by the service provider. The block diagram of the bit-stream unpacking block is given in Fig. 3.

1) The Controller
The controller is responsible for maintaining proper operation of the decoding process. It is supported by three sub-blocks: Synchronize & Extract Header, CRC check and the decode mode pointer.

The synchronization of the incoming bit-stream is handled by the Synchronize & Extract Header sub-block, which synchronizes and extracts the 32 bit header information. The header contains a synchronization word followed by frame related data such as the bit rate, sampling frequency and frame mode. As DAB uses a constrained version of the MPEG Audio Layer II - LFE coding algorithm, some of the frame related data bits are known a priori, such as the layer and sampling frequency bits. These bits are also included in the synchronization process to reduce the chance of misdetection, as the synchronization word itself is not a forbidden word, and can be found elsewhere in a frame.

Upon completing the synchronization and extracting the header, the CRC-16 code word that follows the header is written to the CRC block to be checked later on. A DAB audio frame uses an additional CRC-8 word to protect the scale factors, in addition to the CRC-16 word that the MPEG Audio Layer II - LFE coding algorithm utilizes to protect the last 16 bits of the header along with the bit allocation and scale factor select information parts of the frame. As the bit-stream processor was designed to operate bit-serially, the CRC check was implemented as given in [2].

The final sub-block used by the controller is the decode mode pointer, which is used to determine which section of the frame is currently being processed and what decoding steps will follow.

2) The Coefficient Store
The coefficient store contains standard related coefficients, such as the re-quantization constants and bit-allocation tables that will be used later on for frequency sample reconstruction purposes. However, not all coefficients are stored as they appear in [1], due to the efficient implementation of the frequency sample reconstruction block, which will be explained in Section II B.

3) Bit-Stream Data
The bit-allocation, SCFSI and scale factor index information extracted from the frame is stored in the bit-stream data block. The bit allocation data is stored in a table that is capable of scanning through the sub-bands continuously, thus simplifying the audio sample extraction process. The SCFSI and scale factor indexes share storage to reduce the overall size of the bit-stream processor, as they are not used simultaneously. Keeping the addressing of the storage elements based on non-zero bit allocated sub-bands eliminates the need to check each sub-band for bit allocation information each time the data needs to be accessed. The controller provides the bit-stream data block with the sub-band number, and receives the bit allocation and the SCFSI or scale factor index, depending on which stage of the decoding is in process.
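As an illustration of the addressing scheme described for the bit-stream data block, the following minimal Python sketch allocates storage only for sub-bands with a non-zero bit allocation and lets the SCFSI and the scale factor indexes share one slot. The class name `BitStreamData`, its methods and the example values are hypothetical and not taken from the design; the sketch models the behaviour only, not the bit-serial hardware.

```python
class BitStreamData:
    """Behavioural model of the bit-stream data block (hypothetical names)."""

    def __init__(self, bit_allocation):
        # Bit allocation per sub-band, as extracted from the frame.
        self.bit_allocation = list(bit_allocation)
        # Only sub-bands with a non-zero allocation receive a storage slot.
        self.active = [sb for sb, alloc in enumerate(self.bit_allocation) if alloc]
        self.slot = {sb: k for k, sb in enumerate(self.active)}
        # One shared store: it first holds the SCFSI and is later overwritten
        # by the scale factor indexes, as the two are not used simultaneously.
        self.shared = [None] * len(self.active)

    def store(self, subband, value):
        self.shared[self.slot[subband]] = value

    def load(self, subband):
        """Return (bit allocation, SCFSI or scale factor indexes)."""
        return self.bit_allocation[subband], self.shared[self.slot[subband]]


data = BitStreamData([0, 4, 0, 2])   # example allocation for sub-bands 0..3
data.store(3, 1)                     # SCFSI for sub-band 3
data.store(3, (10, 12))              # later replaced by scale factor indexes
```

Because the slots are indexed by position in the list of active sub-bands, no per-access test of the bit allocation is needed, which is the saving the paragraph above refers to.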
4) Sample Register
The sample register is used to temporarily store the audio sample until it is completed, as these samples are read bit-serially. Depending on whether these bits correspond to a sample or a grouped sample code word, the ordering and meaning of the bits also change, which is handled by the sample register.
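The bit-serial operation used throughout this section, both in the sample register and in the CRC check of the controller, can be sketched in Python as follows. This is a behavioural sketch under stated assumptions: the function names are hypothetical, and the CRC-16 generator polynomial x^16 + x^15 + x^2 + 1 (0x8005) with an all-ones initial state is assumed to be the one specified in [2]; it should be checked against the standard before being relied upon.

```python
def shift_in(register, bit, width):
    """Shift one received bit into a `width`-bit register, MSB first."""
    return ((register << 1) | bit) & ((1 << width) - 1)


def crc16_step(crc, bit, poly=0x8005):
    """Advance the 16-bit CRC register by one input bit (assumed polynomial)."""
    feedback = ((crc >> 15) & 1) ^ bit
    crc = (crc << 1) & 0xFFFF
    return crc ^ poly if feedback else crc


def crc16(bits, init=0xFFFF):
    """Run the bit-serial CRC over a sequence of bits (assumed initial state)."""
    crc = init
    for bit in bits:
        crc = crc16_step(crc, bit)
    return crc


# Example: assemble a 5-bit sample one bit at a time.
sample = 0
for bit in (1, 0, 1, 1, 0):
    sample = shift_in(sample, bit, 5)

# Example: compute the check word of an arbitrary protected bit sequence.
protected = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
check = crc16(protected)
check_bits = [(check >> k) & 1 for k in range(15, -1, -1)]
```

Feeding the protected bits followed by the transmitted check word through the same register leaves it at zero when no error occurred, so the check reduces to a zero comparison at the end of the protected field.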
Figure 3. Block diagram of the bit-stream unpacking block [blocks: Synchronize & Extract Header, CRC Check, Decode Mode, Controller, Coefficient Store, Sample Register, Bit-Stream Data; input: encoded bit-stream; output to frequency sample reconstruction]

B. Frequency Sample Reconstruction
The audio samples are decoded and reconstructed using the information obtained from the header, bit allocation, scale factor select information and scale factor indexes. However, due to the complex encoding algorithm, the reconstruction of the samples requires a moderate amount of operations. The frequency sample reconstruction block handles these operations in a computationally efficient manner to minimize the area and reduce the power
consumption of the device. The necessary computations for
the reconstruction process of each sample can be seen in (1).
$$\mathrm{PCM} = \mathrm{SCF} \times C \times (\mathrm{Sample} + D) \qquad (1)$$

where SCF is the scale factor extracted from the frame, and C and D are constants determined from the bit allocation information and are given in [1] and [2]. However, the samples are not always directly coded in the bit-stream. Depending on the results of the psychoacoustic analysis, the encoder may pack three samples into one sample code word that will have to be decoded in order to obtain the samples. The packing is done in accordance with (2).

$$v_i = i^2 x + i\,y + z \qquad (2)$$

where i = 3, 5, or 9 and x, y and z are the individual samples.

Figure 5. The ALU block diagram [labels: Sample Code Word, Combinational Logic, Latch, MB, Right Shift, PCM Sample Latch]

The de-grouping algorithm requires a modulus operation followed by an integer division operation executed three times to obtain three samples from one code word. The first modulus operation provides the sample z, while the integer division operation provides the input for the next cycle of the modulus operation. The second and third modulus operations will provide y and x, respectively.
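The grouping in (2), the three modulus and division steps of the de-grouping algorithm, and the reconstruction in (1) can be sketched in Python as shown below. The function names are hypothetical, and the values used for SCF, C and D in the example are illustrative assumptions only; the actual constants are tabulated in [1] and [2], and the sample is assumed to be already expressed as a fraction.

```python
def group(x, y, z, i):
    """Pack three samples into one code word, following (2)."""
    return i * i * x + i * y + z


def degroup_direct(code_word, i):
    """Recover (x, y, z) with three modulus / integer division steps."""
    z = code_word % i          # first modulus gives z
    code_word //= i            # quotient feeds the next cycle
    y = code_word % i          # second modulus gives y
    code_word //= i
    x = code_word % i          # third modulus gives x
    return x, y, z


def requantize(sample, scf, c, d):
    """Reconstruct one sample, following (1)."""
    return scf * c * (sample + d)


code_word = group(2, 0, 1, 3)            # three 3-level samples in one word
x, y, z = degroup_direct(code_word, 3)

# Illustrative (assumed) values for SCF, C and D.
pcm = requantize(0.25, scf=2.0, c=4.0 / 3.0, d=0.5)
```

This is the direct form of the computation, with an explicit division and modulus per sample.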
A simplified block diagram of the direct implementation of the frequency sample reconstruction block is given in Fig. 4. It can be seen from the block diagram that the implementation requires an adder, two multipliers and a divider. However, dividers are not very efficient in terms of power consumption or size [4].

An alternative structure capable of handling these computations is given in Fig. 5. The proposed structure contains no dividers, due to the fixed number of divisors needed to evaluate (2).

The division and modulus operations are only ever needed when three samples are grouped into one sample code word and a de-grouping operation is necessary. In this case the decoded sample code word will be divided by 3, 5 or 9, depending on the number of quantization levels obtained from the bit allocation data. Exploiting this, the division operation is carried out as a multiplication with the inverse of the divisor, i.e. $\mathrm{nlevels}^{-1}$.

The integer part of the result of this multiplication will give the required integer division result. Since i can only take on three values, the fractional part of the result of the multiplication can be mapped to the 17 possible values for the remainder of the division operation through a simple combinational logic block.
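The divider-free scheme described above can be modelled in Python as follows. This is a sketch under stated assumptions, not the authors' circuit: the 16-bit reciprocal precision (`FRAC_BITS`), the use of the four most significant fraction bits as input to the remainder logic (`KEY_BITS`), and all function names are choices made here for illustration. The lookup tables stand in for the combinational logic block and together hold 3 + 5 + 9 = 17 entries, matching the 17 possible remainder values mentioned in the text.

```python
FRAC_BITS = 16   # assumed fixed-point precision of the stored reciprocal
KEY_BITS = 4     # assumed number of fraction MSBs fed to the remainder logic
FRAC_MASK = (1 << FRAC_BITS) - 1


def reciprocal(i):
    """Fixed-point approximation of 1/i, rounded up."""
    return -(-(1 << FRAC_BITS) // i)


def remainder_table(i):
    """Model of the combinational logic: fraction MSBs -> remainder."""
    inv = reciprocal(i)
    return {((r * inv) & FRAC_MASK) >> (FRAC_BITS - KEY_BITS): r for r in range(i)}


TABLES = {i: remainder_table(i) for i in (3, 5, 9)}


def divmod_by_reciprocal(v, i):
    """Quotient and remainder of v / i using one multiplication and a lookup."""
    product = v * reciprocal(i)
    quotient = product >> FRAC_BITS                 # integer part
    fraction = product & FRAC_MASK                  # fractional part
    remainder = TABLES[i][fraction >> (FRAC_BITS - KEY_BITS)]
    return quotient, remainder


def degroup(code_word, i):
    """De-group one code word into (x, y, z) without a divider."""
    samples = []
    for _ in range(3):
        code_word, r = divmod_by_reciprocal(code_word, i)
        samples.append(r)
    z, y, x = samples
    return x, y, z
```

With these parameter choices the result agrees with exact integer division for every code word that (2) can produce, i.e. for all values below 27, 125 and 729 for i = 3, 5 and 9 respectively; other precisions would need to be re-verified.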
Closer inspection of the coefficient C reveals that this coefficient is equivalent to $\mathrm{nlevels}^{-1} + 1$. Thus, multiplication by C is also handled by the multiplier responsible for division, with an extra adder at the output of the multiplier to compensate for the + 1 factor. The coefficient D is extracted from the bit allocation index value. This reduces necessary




Figure 4. Direct implementation block diagram [labels: Sample, Dividend, Divisor, Remainder, Quotient, D Latch, Sample Latch]

The final reduction in computational complexity is introduced in the rescaling part of the reconstruction of the audio samples. The 64 scale factors necessary for rescaling the audio samples can actually be obtained from three base numbers and their division by the powers of 2, thus eliminating the need to store all 64 scale factors in the coefficient store. As one of these three base values is actually 1, the selection of any scale factor produced from this base will result in a division by the required power of 2, which is merely a shift operation without any multiplication involved. Exploiting this property of the scale factors, one can implement the multiplication by the scale factors as a
multiplier-block [5], as seen in Fig. 6. The multiplier block consists of 4 adders and 4 hardwired shift operations. The first adder to the left is fed with the input (x1) and its 2-bit shifted version (x4), to produce 5 times the input (x5). Similarly, the second one is fed with the output of the first adder and its 4-bit shift to produce 85 times the input, the third with the output of the second adder and an 8-bit shift to produce 21845 times the input, and the final one with the input and a 1-bit shift of the output of the third adder to produce 43691 times the input. The de-multiplexer in Fig. 5 is used to bypass the multiplier block when necessary.

Having the scale factor values inherent in the multiplier block architecture renders the storage of any of the scale factors unnecessary. Thus, effectively the 64 scale factor storage block along with an NxN combinational multiplier, where N is the number of bits, is replaced by 4 adder stages, 4 hardwired shifts and a variable shifter.

TABLE II. OPERATIONAL COMPARISON

Clock Cycle    | Direct Implementation                                                                                   | Proposed Implementation
1 (Initialize) | N/N comb. division; store remainder in latch                                                            | NxN comb. multiplication; store combinational logic output in latch
2 (Loop x 3)   | Add D to the stored remainder; multiply by C; store result in latch; store quotient in latch            | Add D to stored sample; multiply by $\mathrm{nlevels}^{-1}$; add results; store in latch
3 (Loop x 3)   | Multiply result by SCF; store result in output latch; N/N combinational division; store remainder in latch | Multiply by SCF; shift as necessary; store result in output latch; NxN combinational multiplication; store combinational logic output in latch
A comparison of the resources required by the two implementations is made in Table I. In order to make the comparison meaningful the direct implementation was also made to be as efficient as possible, implementing the combinational divider as an efficient N/N integer divider [6], with its size equivalent to $N^2/2$ adder cells, and keeping the controller logic unchanged where possible. The operations that take place in each clock cycle for the worst case of having to de-group three samples from a three bit sample code word for the direct implementation and the proposed implementation are listed in Table II. The last two clock cycles need to repeat twice more to complete the de-grouping process.

Figure 6. Scale factor multiplier block [adder chain with hardwired shifts of x4, x16, x256 and x2, producing x5, x85, x21845 and x43691]

TABLE I. RESOURCE COMPARISON

Resource Type      | Direct Implementation                                                              | Proposed Implementation
Memory             | Bit Allocation Index Storage; Scale Factor Selection Index / Scale Factor Storage | Bit Allocation Index Storage; Scale Factor Selection Index / Scale Factor Storage
Coefficient Stores | nlevels (16xN); D (16xN); C (16xN); Scale Factors (64xN)                           | $\mathrm{nlevels}^{-1}$ (16xN)
Arithmetic Blocks  | NxN multiplier (x2); N-bit adder (x1); N/N divider (x1)                            | NxN multiplier (x1); N-bit adder (~x6)

III. FUTURE WORK
The proposed frequency sample reconstruction block contains one more multiplier. However, closer inspection reveals possibilities for further reductions in computational complexity, size and power. The aforementioned multiplier is responsible for handling the multiplication with $\mathrm{nlevels}^{-1}$, which has only 16 different values. Thus, this multiplier could be implemented as a reconfigurable multiplier block (ReMB) for multiple constant multiplications as proposed in [3]. The proposed work will be carried out until the submission date for the final paper.

IV. CONCLUSIONS
In this paper the design of a computationally efficient bit-stream processor for MPEG Audio Layer II LFE targeting DAB audio receivers was described. Manipulation of the inherent constants revealed a simpler structure for handling the re-quantization and re-scaling of the audio samples, which resulted in a reduced size coefficient store as well as a computationally efficient frequency sample reconstruction block implementation.

REFERENCES
[1] ETSI, "Radio broadcasting systems; Digital audio broadcasting (DAB) to mobile, portable and fixed receivers", ETS 300 401, 1997.
[2] ISO, "Information technology - Generic coding of moving pictures and associated audio information - Part 3: Audio", ISO/IEC 13818-3, 1998.
[3] S.S. Demirsoy, A.G. Dempster and I. Kale, "Design guidelines for reconfigurable multiplier blocks", IEEE International Symposium on Circuits and Systems, 2003, vol.4, pp.289-292, May 2003.
[4] S.F. Oberman and M.J. Flynn, "Division algorithms and implementations", IEEE Transactions on Computers, vol.46, no.8, pp.833-854, August 1997.
[5] A.G. Dempster and M.D. Macleod, "Use of minimum-adder multiplier-blocks in FIR digital filters", IEEE Trans. CAS-II, vol.42, no.9, pp.569-577, November 1995.
[6] K.Y. Khoo and A.N. Willson, Jr., "Efficient VLSI implementation of n/n integer division", IEEE International Symposium on Circuits and Systems, 2005, vol.1, pp.672-675, May 2005.
