Abstract-The purpose of our work has been to evaluate whether it is practical to use a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used, which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.
I. INTRODUCTION
MPEG-1 layer III [1], commonly referred to as MP3, is well understood, both on desktop systems and in embedded systems. Decoders for desktop systems can be implemented using either fixed point or floating point arithmetic, whereas embedded systems typically use fixed point arithmetic.
Embedded MP3 decoders usually have to use two 16-bit memory words for each intermediate value to achieve the required dynamic range and precision with fixed point arithmetic. We have investigated the feasibility of using a 16-bit floating point representation to reduce the memory cost without sacrificing sound quality. This would halve the data memory usage, which would have a significant impact on power consumption and chip area. Another advantage of floating point arithmetic is that the hardware eliminates all scaling operations associated with fixed point arithmetic, which leads to shorter firmware development time.
One drawback of floating point is the complexity of the arithmetic units. However, for a given dynamic range, the multiplier in a floating point data path is smaller than the corresponding multiplier in a fixed point data path.
In order to evaluate our floating point approach, we have used the MPEG audio compliance test [3]. In short, a decoder can be classified as full precision, limited accuracy, or not compliant depending on the difference between the provided reference output and the decoded output. We have also conducted informal listening tests since there are no formal criteria for evaluating the quality of an MP3 decoder for an arbitrary bit stream.
0-7803-8578-0/04/$20.00 ©2004 IEEE
II. FLOATING POINT REQUIREMENTS
In order to design a system with floating point arithmetic, two important design decisions of the system have to be made.
One is the floating point format which decides the range and precision of all values that can be handled by the system. The other decision is the arithmetic operations that should be supported in hardware for a given target application.
A. The Floating Point Format
Although it is possible to analytically determine the maximum values encountered in an MP3 decoder, this information is not really useful. For example, by setting the gain and scale factors to their maximum values, it is possible to create a synthetic MP3 bit stream where the final output samples are orders of magnitude larger than the allowed output range. Because of this, we did not try to perform any formal analysis of the possible number ranges occurring in MP3 decoding.
Instead, we instrumented the ISO MP3 decoder [2] to use our own custom floating point arithmetic library with configurable mantissa and exponent widths. The library also supported arithmetic with mixed precision in order to mimic a processor with a high precision data path but lower precision memory. By keeping track of the smallest and largest values encountered in the decoder, the library was used for determining the required dynamic range.
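The instrumentation idea can be illustrated with a minimal sketch. This is not the authors' library: the names `quantize`, `RangeTracker` and `mac` are hypothetical, the exponent range is left unbounded, and round-to-nearest is an assumption. The sketch only shows how a configurable mantissa width and min/max tracking could be combined.

```python
import math


def quantize(x, mantissa_bits):
    """Round x to a float with `mantissa_bits` stored fraction bits.

    Assumes an implicit leading one; the exponent range is not limited here.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << (mantissa_bits + 1)  # one extra bit: frexp normalises to [0.5, 1)
    return math.ldexp(round(m * scale) / scale, e)  # round() sends ties to even


class RangeTracker:
    """Records the smallest and largest nonzero magnitudes observed."""

    def __init__(self):
        self.min_mag = math.inf
        self.max_mag = 0.0

    def observe(self, x):
        mag = abs(x)
        if mag > 0.0:
            self.min_mag = min(self.min_mag, mag)
            self.max_mag = max(self.max_mag, mag)

    def exponent_span(self):
        """Return the (lowest, highest) power-of-two exponent observed."""
        if self.max_mag == 0.0:
            raise ValueError("no nonzero values observed")
        lo = math.frexp(self.min_mag)[1] - 1
        hi = math.frexp(self.max_mag)[1] - 1
        return lo, hi


def mac(acc, a, b, tracker, reg_bits=16):
    """Multiply-accumulate at register precision, recording the result."""
    result = quantize(acc + a * b, reg_bits)
    tracker.observe(result)
    return result
```

Running a decoder's arithmetic through wrappers like `mac` and then calling `exponent_span()` gives the range of exponents that the storage format would have to cover.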
Our goal was to find an exponent configuration where all MP3 bit streams could be decoded without having to saturate any intermediate value. We did not consider hand-crafted bit streams with extreme values but we tested more than 200 different music and speech bit streams.
We concluded that all normal bit streams could be decoded successfully with an exponent size of 5 bits in data memory.
The exponent bias was selected to give a number range of approximately $2^{-26}$ to $2^{5}$, which would correspond to the dynamic range of a 32-bit fixed point processor.
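The stated range can be checked with a short calculation. It assumes, following the memory format in the data type figure, a 5-bit signed exponent $e \in [-15, 15]$ with $e = -16$ reserved for zero, an exponent offset of 11, and a 10-bit mantissa with an implicit leading one:

$$
x_{\min} = 2^{-15-11}\,(1 + 0) = 2^{-26}, \qquad
x_{\max} = 2^{15-11}\left(1 + \frac{2^{10}-1}{2^{10}}\right) = 2^{5} - 2^{-6} \approx 2^{5}.
$$

For comparison, a 32-bit two's complement word with 26 fractional bits has a resolution of $2^{-26}$ and a magnitude limit of $2^{5}$, which is presumably the correspondence intended.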
In order to simplify the hardware, we used the same bias for register values, but we had to increase the exponent to 6 bits to [...]
An MP3 decoder is tested by decoding a bit stream supplied in the compliance test and comparing the output with a supplied reference output. If the rms of the difference is less than $8.8 \cdot 10^{-6}$ and the absolute difference is less than $2^{-14}$ relative to full scale for all samples, the decoder is classified as a full precision decoder. Otherwise, if the rms of the difference is less than $1.4 \cdot 10^{-4}$ regardless of the maximum absolute difference, the decoder is classified as a limited accuracy decoder. If the decoder fails to meet these criteria, the decoder is not compliant.
The compliance level for different sizes of the mantissa was investigated and the result is given in Fig. 1. The exponent sizes used were 6 and 5 bits in registers and memory, respectively.
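The classification rule can be written down as a small sketch. The function name `classify` is hypothetical, samples are assumed to be floats normalised so that full scale is 1.0, and the absolute-difference limit is taken to be $2^{-14}$ relative to full scale, which is an assumption here.

```python
import math

RMS_FULL = 8.8e-6      # rms limit for a full precision decoder
ABS_FULL = 2.0 ** -14  # assumed limit on the absolute difference (full scale = 1.0)
RMS_LIMITED = 1.4e-4   # rms limit for a limited accuracy decoder


def classify(decoded, reference):
    """Classify a decoder from its output and the reference output."""
    if not decoded or len(decoded) != len(reference):
        raise ValueError("outputs must be non-empty and of equal length")
    diffs = [d - r for d, r in zip(decoded, reference)]
    rms = math.sqrt(sum(e * e for e in diffs) / len(diffs))
    peak = max(abs(e) for e in diffs)
    if rms < RMS_FULL and peak < ABS_FULL:
        return "full precision"
    if rms < RMS_LIMITED:  # the peak difference is not checked at this level
        return "limited accuracy"
    return "not compliant"
```

Note that a single large sample error is enough to lose the full precision classification even when the rms error is small, because the first test checks both quantities.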
B. Operations
An analysis of the ISO MP3 decoder shows that the following floating point operations should be supported in hardware to implement an efficient MP3 decoder.
• Add
• Subtract
These operations can be mapped to a floating point adder and a floating point multiplier. All remaining operations can be reduced to these primitives or implemented as table look ups. Because the memory and registers have different word lengths it is necessary to convert between different floating point formats. The round operation converts from the register word length to the memory word length, and the floating point load operation expands a memory word to a register word.
[Fig. 2: register and memory floating point data types]
Register floating point data type (23 bits): sign (1 bit), exponent (6 bits), mantissa (16 bits).
The number $x = (-1)^{\mathrm{sign}} \cdot 2^{\mathrm{exponent}-11} \cdot \left(1 + \frac{\mathrm{mantissa}}{2^{16}}\right)$, except when $\mathrm{exponent} = -32$, for which $x = 0$.
Memory floating point data type (16 bits): sign (1 bit), exponent (5 bits), mantissa (10 bits, bits 9 to 0).
The number $x = (-1)^{\mathrm{sign}} \cdot 2^{\mathrm{exponent}-11} \cdot \left(1 + \frac{\mathrm{mantissa}}{2^{10}}\right)$, except when $\mathrm{exponent} = -16$, for which $x = 0$.
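To make the two data types concrete, the following sketch decodes a raw word of either format into its real value. It is an illustration only: the function name `decode` is hypothetical, and the field order (sign, exponent, mantissa from the most significant bit down) and the two's complement encoding of the exponent field are assumptions, while the offset of 11 and the reserved zero exponent follow the format definitions.

```python
def decode(word, exp_bits, man_bits):
    """Return the real value of a raw floating point word.

    Assumed layout, most significant bit first: sign | exponent | mantissa.
    """
    sign = (word >> (exp_bits + man_bits)) & 1
    exponent = (word >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = word & ((1 << man_bits) - 1)
    # Assumption: the exponent field is stored in two's complement.
    if exponent >= 1 << (exp_bits - 1):
        exponent -= 1 << exp_bits
    # The most negative exponent (-16 or -32) encodes zero.
    if exponent == -(1 << (exp_bits - 1)):
        return 0.0
    value = 2.0 ** (exponent - 11) * (1 + mantissa / (1 << man_bits))
    return -value if sign else value


def decode_memory(word):
    """16-bit memory format: 5-bit exponent, 10-bit mantissa."""
    return decode(word, 5, 10)


def decode_register(word):
    """23-bit register format: 6-bit exponent, 16-bit mantissa."""
    return decode(word, 6, 16)
```

With this layout, the value 1.0 has exponent 11 and a zero mantissa in both formats, so it is the memory word `11 << 10` and the register word `11 << 16`.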
III. HARDWARE IMPLEMENTATION
As a proof of concept, we developed a simple pipelined DSP core to demonstrate the feasibility of the approach outlined above. The DSP core is a load-store architecture with separate program, data, and constant memories. The general idea was to keep the hardware reasonably simple without making the software unreasonably complex. In our experience, software is generally easier to debug than hardware. The instruction set was kept to a minimum and the hardware had no inter-instruction dependency checking.
A. Data types
Each general purpose register can contain a 16-bit integer or a 23-bit floating point value. In the former case, the upper 7 bits are unused. When a floating point value is loaded from memory it is expanded from 16 to 23 bits. Before storing a floating point value it is rounded to 16 bits. The data types are summarized in Fig. 2.
The most important reason for using these values is to avoid a configuration where the decoder barely meets the requirements for limited accuracy. Another reason is the convenience of having a 16-bit wide memory.
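The expansion and rounding steps can be sketched at the bit level. This is a hedged illustration, not the hardware's documented behaviour: the names `load` and `round_to_memory` are hypothetical, the field layout and two's complement exponent are assumed, and rounding to nearest (ties away from zero), saturation on overflow, and flushing to zero on underflow are all assumptions.

```python
def load(mem_word):
    """Expand a 16-bit memory word (1+5+10) to a 23-bit register word (1+6+16)."""
    sign = (mem_word >> 15) & 1
    exp = (mem_word >> 10) & 0x1F
    man = mem_word & 0x3FF
    if exp >= 16:                 # sign-extend the 5-bit exponent
        exp -= 32
    if exp == -16:                # memory zero maps to register zero (exponent -32)
        return (sign << 22) | (0x20 << 16)
    return (sign << 22) | ((exp & 0x3F) << 16) | (man << 6)


def round_to_memory(reg_word):
    """Round a 23-bit register word to a 16-bit memory word."""
    sign = (reg_word >> 22) & 1
    exp = (reg_word >> 16) & 0x3F
    man = reg_word & 0xFFFF
    zero = (sign << 15) | (0x10 << 10)
    if exp >= 32:                 # sign-extend the 6-bit exponent
        exp -= 64
    if exp == -32:
        return zero
    man = (man + (1 << 5)) >> 6   # add half an LSB, then drop 6 mantissa bits
    if man == 1 << 10:            # mantissa rounded up to 2.0: renormalise
        man = 0
        exp += 1
    if exp > 15:                  # assumption: saturate to the largest magnitude
        exp, man = 15, 0x3FF
    elif exp < -15:               # assumption: flush small values to zero
        return zero
    return (sign << 15) | ((exp & 0x1F) << 10) | man
```

Under these assumptions `load` is exact, so `round_to_memory(load(w))` returns `w` for every canonical memory word, while rounding a register value just below 2.0 carries into the exponent.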
B. Instruction Set
The instruction set basically consisted of load and store from any of the general purpose registers, register to register integer and floating point operations, and I/O operations.
There are 16 general purpose registers. This number was decided upon after studying the algorithms used in MP3 [...]
The 36-point inverse modified DCT (IMDCT) was implemented using a fast IMDCT algorithm [4] and the 12-point IMDCT was implemented using 36 floating point multiply and accumulate instructions.
The 32-point DCT used in the subband synthesis part was implemented using Lee's fast DCT algorithm [5]. With careful scheduling, the 16-point kernel could be implemented in registers only, without loading or storing temporary values to memory.
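One way a count of 36 multiply-accumulate operations can arise for the 12-point IMDCT is through the symmetry of its output; the text does not say that this is how the firmware is organised, so the sketch below is an illustration only. It uses the standard IMDCT definition without windowing, and the names `imdct_direct` and `imdct_symmetric` are hypothetical. The direct form needs 12 x 6 = 72 MACs, whereas computing only 6 unique outputs needs 6 x 6 = 36.

```python
import math

N = 12  # outputs of the short-block IMDCT; it takes N // 2 = 6 inputs


def _term(i, k):
    """Cosine factor of the IMDCT definition for output i and input k."""
    return math.cos(math.pi / (2 * N) * (2 * i + 1 + N // 2) * (2 * k + 1))


def imdct_direct(X):
    """Reference form: every output is a 6-term sum (72 MACs in total)."""
    return [sum(X[k] * _term(i, k) for k in range(N // 2)) for i in range(N)]


def imdct_symmetric(X):
    """Compute 6 unique outputs with 36 MACs and mirror the rest."""
    y = [0.0] * N
    macs = 0
    for i in (0, 1, 2, 6, 7, 8):
        acc = 0.0
        for k in range(N // 2):
            acc += X[k] * _term(i, k)
            macs += 1
        y[i] = acc
    for i in range(3):
        y[5 - i] = -y[i]       # first half is odd-symmetric about its centre
        y[11 - i] = y[6 + i]   # second half is even-symmetric about its centre
    return y, macs
```

In a real implementation the cosine factors would be precomputed constants rather than calls to `math.cos`.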
B. Quality
According to the MP3 compliance test, our decoder is classified as a limited accuracy MPEG-1 Layer III decoder.
The rms of the difference between our decoded output and the reference provided with the compliance test is $3.2 \cdot 10^{-5}$, which is well below the limit for limited accuracy, $1.4 \cdot 10^{-4}$.
Even though our decoder is not a full precision layer III decoder, in informal listening tests we could not distinguish files decoded with our decoder from files decoded with the full precision ISO MP3 decoder.
C. Memory Use
The final version of the decoder used approximately 6800 24-bit words for program memory, 900 23-bit words for the constant memory, and 6100 16-bit words for data memory. We have not spent any time trying to reduce the program memory size. More than 40% of the program memory is used for the Huffman tables.
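As a rough worked total, treating the approximate word counts above as exact:

$$
6800 \cdot 24 + 900 \cdot 23 + 6100 \cdot 16 = 163200 + 20700 + 97600 = 281500 \text{ bits} \approx 34.4 \text{ KiB}.
$$

Under the assumption that a fixed point decoder would store the same 6100 intermediate values in two 16-bit words each, its data memory would need $6100 \cdot 32 = 195200$ bits, which is 97600 bits more than the floating point data memory.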
D. Performance
In order to measure the performance of the decoder on a typical MP3 bit stream we used a 44.1 kHz music bit stream, with an average bit rate of 202 kbps. A profile of the decoder is shown in Fig. 3.
The time spent in the Huffman decoding and sample dequantization is data dependent. A bit stream was constructed to trigger worst case execution time in the data dependent parts.
In our case, this consisted of a 48 kHz bit stream using only [...]
The performance of the decoder is summarized in Fig. 5.
We see some possible improvements that could reduce the program memory size and increase the performance.
