Abstract-We present the multiple variable length decode (Multiple-VLD) algorithm implemented in most video applications on the MDSP. Our implementations decode multiple symbols per cycle. The implementation is efficient and targets memory-constrained embedded systems. We have validated the algorithm in our implementations of H261/3 and MPEG2/4 and achieved multifold speedups over algorithms that can decode only at the symbol rate, which limits their decoding throughput. Most parallel decoding approaches use the length of the first codeword to detect the second codeword in parallel. With a single table lookup operation, we detect multiple codewords without any such detection mechanism.
I. INTRODUCTION
Variable Length Decoding (VLD) is the most important part of video standards such as MPEG2/4 and H261/3. VLD is the first stage and feeds the rest of the processing in the pipe, such as IDCT and motion compensation, so VLD throughput drives the throughput of the whole decode pipe. The rest of the processing in the decode pipe can be parallelized based on VLD throughput. Variable Length Coding (VLC), also known as Huffman coding, is a mapping process between source symbols and variable length code words. The variable length coder assigns shorter code words to frequently occurring source symbols, and longer code words to rare ones, so that the average bit rate is reduced. To achieve maximum compression, the coded data is sent as a continuous stream of bits with no guard bits separating consecutive symbols. As a result, the decoding procedure must recognize the code length as well as the symbol itself. VLD is carried out using tree-based [7, 13 Table B-15 is also provided. Multiple such tables are required to implement each video standard. All the tables are indexed with at most 8-10 bits.
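As an illustration of why the decoder must recover the code length together with the symbol, the following minimal sketch decodes one symbol per table lookup, i.e. at the symbol rate. The toy prefix code (A = 0, B = 10, C = 110, D = 111), the table layout, and the names `VlcEntry`, `peek3`, and `vlc_decode` are assumptions made for this example and are not taken from any video standard or from the MDSP implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy prefix code (not from any standard):
 *   A = 0, B = 10, C = 110, D = 111   (longest code word: 3 bits) */
typedef struct {
    char    symbol;  /* decoded source symbol */
    uint8_t length;  /* code word length in bits */
} VlcEntry;

/* Indexed by the next 3 bits of the stream, MSB first. */
static const VlcEntry vlc_table[8] = {
    {'A', 1}, {'A', 1}, {'A', 1}, {'A', 1},  /* 0xx */
    {'B', 2}, {'B', 2},                      /* 10x */
    {'C', 3},                                /* 110 */
    {'D', 3},                                /* 111 */
};

/* Return 3 bits starting at bit position pos; bits past the end read as 0. */
static unsigned peek3(const uint8_t *buf, size_t nbits, size_t pos)
{
    unsigned v = 0;
    for (size_t i = 0; i < 3; i++) {
        size_t p = pos + i;
        unsigned bit = (p < nbits) ? (buf[p >> 3] >> (7u - (p & 7u))) & 1u : 0u;
        v = (v << 1) | bit;
    }
    return v;
}

/* Decode at most max_out symbols from nbits bits; returns the count decoded. */
size_t vlc_decode(const uint8_t *buf, size_t nbits, char *out, size_t max_out)
{
    assert(buf != NULL && out != NULL);
    size_t pos = 0, n = 0;
    while (pos < nbits && n < max_out) {
        VlcEntry e = vlc_table[peek3(buf, nbits, pos)];
        if (pos + e.length > nbits)
            break;            /* code word is cut off by the end of the stream */
        out[n++] = e.symbol;
        pos += e.length;      /* the length locates the next code word */
    }
    return n;
}
```

Each loop iteration yields exactly one symbol, and the next lookup cannot start until the current length is known. This serial dependency is what limits such decoders to the symbol rate.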
Software-only Multiple-VLD decoding is made easier by the field-access unit available on each DSE. This unit extracts multiple bit fields of any length from a given 32-bit word and supports sign extension. Using an index into the Multiple-VLD table, normally an 8-10 bit value, we fetch the entry containing the multiple run-levels and the cumulative length, as shown in Fig. 1. This cumulative length is then used to align the 32-bit buffer to decode the next set of entries. The sign extension capability of the field-access unit is exploited to sign extend the level values. The architecture provides no special instructions for VLC decoding.
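The lookup described above can be sketched in portable C. This is a minimal illustration under stated assumptions, not the MDSP implementation: the toy code (0 = +1, 10 = -1, 110 = +2, 111 = end of block), the 32-bit entry layout, the cap of three levels per entry, and all names (`mvld_table`, `mvld_build`, `extract_signed`, `peek8`, `mvld_decode_block`) are hypothetical. Run values are omitted for brevity, and `extract_signed` is a software stand-in for the sign-extending extraction that the field-access unit performs in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDEX_BITS 8   /* table index width (the text cites 8-10 bits) */
#define MAX_LEVELS 3   /* levels packed per entry in this sketch */

/* Assumed entry layout: [3:0] cumulative length, [5:4] level count,
 * [6] end-of-block flag, [15:8] [23:16] [31:24] signed 8-bit levels. */
static uint32_t mvld_table[1u << INDEX_BITS];

/* Extract a sign-extended field of `width` bits (1..31) starting at `lsb`. */
static int32_t extract_signed(uint32_t word, unsigned lsb, unsigned width)
{
    uint32_t field = (word >> lsb) & ((1u << width) - 1u);
    uint32_t sign  = 1u << (width - 1u);
    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* Fill every entry by greedily decoding its index bits, MSB first. */
void mvld_build(void)
{
    for (unsigned idx = 0; idx < (1u << INDEX_BITS); idx++) {
        unsigned pos = 0, count = 0, eob = 0;
        uint32_t levels = 0;
        while (count < MAX_LEVELS && !eob) {
            unsigned left = INDEX_BITS - pos;      /* index bits not yet consumed */
            unsigned bits = (idx << pos) & 0xFFu;  /* those bits, left aligned */
            unsigned len = 0;
            int level = 0;
            if (left >= 1 && (bits & 0x80u) == 0)            { len = 1; level = 1;  }
            else if (left >= 2 && (bits & 0xC0u) == 0x80u)   { len = 2; level = -1; }
            else if (left >= 3 && (bits & 0xE0u) == 0xC0u)   { len = 3; level = 2;  }
            else if (left >= 3)                              { len = 3; eob = 1;    }
            else break;  /* code word straddles the index boundary */
            if (!eob) {
                levels |= ((uint32_t)level & 0xFFu) << (8u + 8u * count);
                count++;
            }
            pos += len;
        }
        mvld_table[idx] = levels | (eob << 6) | (count << 4) | pos;
    }
}

/* Return 8 bits starting at bit position bitpos; bytes past the end read as 0. */
static unsigned peek8(const uint8_t *buf, size_t nbytes, size_t bitpos)
{
    size_t byte = bitpos >> 3;
    unsigned hi = (byte < nbytes) ? buf[byte] : 0u;
    unsigned lo = (byte + 1 < nbytes) ? buf[byte + 1] : 0u;
    return (((hi << 8) | lo) >> (8u - (bitpos & 7u))) & 0xFFu;
}

/* Decode levels up to the end-of-block code. Returns the number of levels,
 * or -1 if no end of block is found or out[] is too small. */
int mvld_decode_block(const uint8_t *buf, size_t nbytes,
                      int32_t *out, size_t max_out, size_t *bits_used)
{
    assert(buf != NULL && out != NULL && bits_used != NULL);
    size_t pos = 0, n = 0;
    while (pos < nbytes * 8) {
        uint32_t e = mvld_table[peek8(buf, nbytes, pos)];
        unsigned count = (e >> 4) & 3u;
        for (unsigned i = 0; i < count; i++) {
            if (n == max_out)
                return -1;
            out[n++] = extract_signed(e, 8u + 8u * i, 8u);
        }
        pos += e & 0xFu;  /* the cumulative length realigns the bit position */
        if (e & 0x40u) {
            *bits_used = pos;
            return (int)n;
        }
    }
    return -1;
}
```

In this sketch a single lookup returns up to three levels plus one cumulative length, so the decoder never inspects the length of an individual code word at run time. The table must be built once with `mvld_build()` before decoding. Code words longer than the index width would need a second-level table, which this example does not cover.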
The field access control register (see Fig. 2) allows signed extraction of 4 different bit lengths from the 32-bit packed entry, as shown in Table 1. Code words longer than 8 bits are directed to the proper Multiple-VLD table by branching in software. All tables are stored in local data memory, so the implementation is not subject to data cache behavior.
We have demonstrated a software-only multi-symbol variable length decoder. We have implemented this method in H261/3 and MPEG2/4 real-time video decoders on the MDSP CR420033 & CRA3O0l3 evaluation boards. Our approach decodes multiple symbols whenever the bit stream allows, without detecting the length of the previous symbol. The approach is generic and can be used in high-throughput video applications. We achieve high-throughput decoding of multiple symbols per cycle in software only. We found the general implementation to be flexible and easily adaptable across standards in terms of code reuse.
VLD accelerator processing cores need to be tightly coupled to the instruction pipeline and are part of the core of the main processor. Our solution is a loosely coupled, data-parallel approach to decoding, in contrast to the instruction-parallel approach of VLIW.
