High-performance arithmetic coding VLSI macro for the H264 video compression standard by Nunez-Yanez, JL & Chouliaras, VA
Nunez-Yanez, J. L., & Chouliaras, V. A. (2005). High-performance
arithmetic coding VLSI macro for the H264 video compression standard.
IEEE Transactions on Consumer Electronics, 51(1), 144-151.
doi:10.1109/TCE.2005.1405712
High-performance Arithmetic Coding VLSI Macro for the H264 
Video Compression Standard 
J. L. Núñez, V. A. Chouliaras, Member, IEEE
Abstract - This paper investigates the algorithmic
complexity of arithmetic coding in the new H264 video coding
standard and proposes a coprocessor to reduce it by more
than an order of magnitude. The coprocessor is based on an
innovative algorithm named the MZ-coder and maintains
the original coding efficiency with a multiplication-free,
non-stalling, fully pipelined architecture with modest hardware
requirements. The coprocessor delivers a constant throughput
of 1 bit per cycle for both coding and decoding and can be
attached to a controlling CPU whose ISA has been extended
with arithmetic coding instructions.
Index Terms - arithmetic coding, H264, video coding,
Golomb codes, renormalization.
arithmetic coding chip presented in [6] replaces the division
operation by storing the probability values in a look-up table and
using the coder state as a pointer to a particular probability in
that table. Multiplications, on the other hand, are done
explicitly using an 8x8 parallel multiplier.
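The table-lookup-plus-multiplier approach described above can be sketched as follows. This is only an illustration under stated assumptions, not the design of the chip in [6]: the table contents, the table size, the 8-bit fixed-point scaling and the names PROB_TABLE and lps_width are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical table of LPS probabilities scaled by 256 (8-bit fixed
   point): 128 means 0.5, 64 means 0.25, and so on. The coder state is
   used as a pointer (index) into this table, so no division is needed. */
static const uint8_t PROB_TABLE[4] = { 128, 64, 32, 16 };

/* Width of the LPS sub-interval: an explicit 8x8 multiplication of the
   8-bit range by the 8-bit probability, keeping the high byte of the
   16-bit product. */
uint8_t lps_width(uint8_t range, unsigned state)
{
    uint16_t product = (uint16_t)((uint16_t)range * PROB_TABLE[state & 3u]);
    return (uint8_t)(product >> 8);
}
```

With these assumed table values, a range of 200 in state 0 yields an LPS width of 100, that is, half of the current range.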
III. ANALYSIS OF AC IN THE H264 VIDEO CODEC
The original arithmetic coding implementation in the H264 
codec is known as CABAC [8]. Preliminary profiling of the 
H264 algorithm revealed the average number of calls to the 
AC routines per frame as a function of the quantization 
parameter QP as shown in Fig. 1. 
I. INTRODUCTION 
The exponential increase in the amount of digital visual
information that must be transmitted and stored efficiently
has motivated a large body of research into advanced video
coding techniques that allow orders-of-magnitude
reductions in the required bit rates. New video coding
standards such as the recent H264 video codec (also known as
MPEG4 part 10) [2] deliver better quality and lower bit rates
but at the expense of an almost exponential increase in the
number of CPU cycles required per input frame of video data.
The introduction of advanced entropy coding within the H264
standard with the pioneering use of context-based arithmetic
coding [3] in a lossy video standard is one of the reasons
behind the increase in the computational cost of the codec.
The high-speed arithmetic coder (AC) coprocessor described
in this paper achieves a significant reduction of the AC
computational cost in the H264 video codec with modest
hardware complexity.
Figure 1. H264 arithmetic coding complexity
Fig. 1 shows that AC is a very compute-intensive operation.
Since traditional parallelizing techniques such as SIMD
extensions cannot accelerate this essentially sequential
process, dedicated hardware support in the
form of a specialized coprocessor is a suitable solution.
II. HARDWARE-BASED BINARY ARITHMETIC CODERS
The IBM Q-coder and the QM-coder [4] are the best-known
examples of hardware-based binary arithmetic coders.
A VLSI implementation of both the Q-coder and the QM-coder
is presented in [5]. The device, called the Qx-coder,
can implement both algorithms, clocking at 75 MHz with a
throughput of around 64 Mbits/second using 0.35 µm standard
cell technology from IBM (CMOS 5S). The adaptive binary
J. L. Núñez is with the Department of Electronic and Electrical
Engineering, University of Bristol, UK (e-mail: j.l.nunez-yanez@bristol.ac.uk).
V. A. Chouliaras is with the Department of Electronic Engineering,
University of Loughborough, UK (e-mail: v.a.chouliaras@lboro.ac.uk).
0-7803-8838-0/05/$20.00 ©2005 IEEE.
IV. PROPOSED ARITHMETIC CODING ALGORITHM
The MZ-coder evolves from the Z-coder software algorithm
presented in [7] as a generalization of the well-known
Golomb/Rice coder for lossless coding of bilevel images. Fig.
2 shows simplified pseudocode descriptions of the original
CABAC algorithm and the MZ-coder algorithm.
The coder state variables are range and low for the
CABAC algorithm and range and subend for the MZ-coder.
The renormalization process in the MZ-coder does not
include internal dependencies. As a result, it can be readily
accomplished with a single shift-left operation. The
pseudocode for the CABAC algorithm, on the other hand, shows the
internal dependencies of low inside the while loop. This
dependency means that a variable number of cycles (from 0
up to a maximum of 6) is required to maintain the state
variables in the required range.
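The contrast between loop-based and single-shift renormalization can be sketched in C. This is a minimal sketch under assumptions, not the coprocessor's datapath: the 10-bit register layout (HALF = 0x200, QUARTER = 0x100), the LoopCoder structure and the function names are hypothetical, bit output is only counted rather than written to a stream, and shift_count uses the CABAC-style threshold as a stand-in because the text does not define the MZ-coder's shift function.

```c
#include <stdint.h>

/* Assumed 10-bit register layout; the text only names the constants. */
#define HALF    0x200u
#define QUARTER 0x100u

typedef struct {
    uint32_t low;
    uint32_t range;           /* must be non-zero */
    unsigned bits_out;        /* bits emitted directly */
    unsigned bits_to_follow;  /* deferred (outstanding) bits */
    unsigned iterations;      /* passes taken by the last renormalization */
} LoopCoder;

/* Loop renormalization in the style of the CABAC pseudocode. Every pass
   inspects low before doubling it, so the passes cannot be merged into
   one shift and the cycle count depends on the data. */
void renorm_loop(LoopCoder *c)
{
    c->iterations = 0;
    while (c->range < QUARTER) {
        if (c->low >= HALF) {
            c->bits_out++;
            c->low -= HALF;
        } else if (c->low < QUARTER) {
            c->bits_out++;
        } else {
            c->bits_to_follow++;
            c->low -= QUARTER;
        }
        c->low <<= 1;
        c->range <<= 1;
        c->iterations++;
    }
}

/* Shift amount obtained in one step, written as the comparison ladder of
   a priority encoder. Valid for 1 <= range <= 0x1FF. */
unsigned shift_count(uint32_t range)
{
    if (range >= 0x100u) return 0;
    if (range >= 0x080u) return 1;
    if (range >= 0x040u) return 2;
    if (range >= 0x020u) return 3;
    if (range >= 0x010u) return 4;
    if (range >= 0x008u) return 5;
    if (range >= 0x004u) return 6;
    if (range >= 0x002u) return 7;
    return 8;
}
```

For the same starting range both functions agree on the total shift; the difference is that the loop needs one pass per bit position while the ladder yields the amount at once, which is what allows a single shift-left to replace the loop when no other state variable has to be examined between shifts.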
CABAC:
    rLPS = table256x8(state, range);
    range = range - rLPS;
    if (symbol != MPS)
    {
        low += range;
        range = rLPS;
    }
    /* renormalization loop */
    while (range < QUARTER)
    {
        if (low >= HALF)
        {
            output bit;
            low -= HALF;
        }
        else if (low < QUARTER)
            output bit;
        else
        {
            bits_to_follow++;
            low -= QUARTER;
        }
        low <<= 1;
        range <<= 1;
    }

MZ-coder:
    pLPS = table(state);
    Z = range + pLPS;
    if (Z > HALF)
        Z = QUARTER + Z >> 1;
    if (symbol == MPS)
    {
        range = Z;
        if (range >= HALF)
        {
            output 1 bit;
            range <<= 1;
            subend <<= 1;
        }
    }
    else
    {
        Z = FULL - Z;
        subend += Z;
        range += Z;
        shift_bits = shift(range);
        output shift_bits bits;
        subend <<= shift_bits;
        range <<= shift_bits;
    }

Figure 2. CABAC & MZ pseudocode description

motion estimation, transform and quantization functions
through developing fast algorithms and exploiting the
available data level parallelism. Entropy coding based on
Figure 4. Coprocessor architecture
Fig. 3 illustrates that the cost of multiple-cycle
renormalization accounts for a throughput degradation of
around 15% in the original CABAC algorithm.
V. ARCHITECTURE AND VLSI IMPLEMENTATION
Fig. 4 shows the hardware architecture of the arithmetic
coding/decoding coprocessor. The chosen VLSI technology
was the UMC 0.13 µm, 8-copper silicon process. The
maximum operating frequency was 330 MHz worst-case
(throughput of 330 Mbits/second) and the complexity of both
the coder and decoder is 5600 standard cells.
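The relation between clock frequency, stall cycles and throughput behind these figures can be written out as a small calculation. This is a back-of-the-envelope sketch, assuming one binary symbol per bit and one base cycle per symbol; the function name throughput_mbps and its parameters are hypothetical.

```c
#include <stdint.h>

/* Sustained throughput in Mbit/s for a coder clocked at clock_mhz that
   processes `symbols` binary symbols (one base cycle each) and spends
   `stall_cycles` additional cycles on multi-cycle renormalization.
   A non-stalling design has stall_cycles == 0. */
uint32_t throughput_mbps(uint32_t clock_mhz, uint32_t symbols,
                         uint32_t stall_cycles)
{
    uint64_t total_cycles = (uint64_t)symbols + stall_cycles;
    if (total_cycles == 0) {
        return 0;
    }
    return (uint32_t)(((uint64_t)clock_mhz * symbols) / total_cycles);
}
```

With no stalls, a 330 MHz clock gives 330 Mbit/s, matching the figure quoted above. As a purely illustrative number rather than a measurement, 176 stall cycles per 1000 symbols at the same clock would give about 280 Mbit/s, a loss of roughly 15%.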




REFERENCES
[1] G. Lawton, "New Technologies Place Video in Your Hand", IEEE
Computer, Vol. 34, No. 4, pp. 14-17, 2001.
[2] G. Bjontegaard, "H.26L Test Model Long Term Number 4 (TML 4)
draft0", ITU-T SG16/Q.6 Q15-J-72, June 2000.
[3] G. Langdon, "An Introduction to Arithmetic Coding", IBM J. Res.
Develop., Vol. 28, No. 2, pp. 135-149, March 1984.
[4] W. B. Pennebaker et al., "An Overview of the Basic Principles of the
Q-Coder Adaptive Binary Arithmetic Coder", IBM J. Res. Develop., Vol. 32,
No. 6, pp. 717-725, November 1988.
[5] M. J. Slattery, J. L. Mitchell, "The Qx-coder", IBM Journal of Research
and Development, Vol. 42, No. 6, pp. 747-7534, 1998.
[6] D. Marpe, H. Schwarz, T. Wiegand, "Context-Based Adaptive
Arithmetic Coding in the H.264 Video Compression Standard", IEEE
Transactions on Circuits and Systems for Video Technology, Vol. 13,
No. 7, pp. 620-636, 2003.
[7] L. Bottou, P. G. Howard, Y. Bengio, "The Z-coder adaptive binary
coder", in Proceedings of the Data Compression Conference, pp. 13-22,
March 1998.
