Rochester Institute of Technology

RIT Scholar Works
Articles

6-26-2003

Real-time video annotation using MPEG-7 motion
activity descriptors
Andreas Savakis
Pawel Sniatala
Radoslaw Rudnicki

Recommended Citation
Mixed Design of Integrated Circuits and Systems 10 (2003)


MIXDES 2003
Real-time Video Annotation using MPEG-7 Motion Activity Descriptors
Andreas Savakis 1, Pawel Sniatala 2, and Radoslaw Rudnicki 2

1 Department of Computer Engineering
Rochester Institute of Technology
Rochester, NY 14623, USA

2 Department of Control and System Engineering
Poznan University of Technology
60-965 Poznan, Poland

Abstract
The MPEG-7 standard provides a framework of
standardized tools that can be used to describe and
efficiently manage multimedia content. Visual
descriptors include color, texture, shape and
motion. In this paper, we address the hardware
implementation of MPEG-7 motion descriptors
using Handel-C. In particular, descriptors for
motion intensity and spatial distribution of motion
activity are generated and implemented.
1. Introduction
The MPEG-7 standard was approved in 2001 as
an effort to address the growing need for handling
multimedia content [1, 2]. The key aspect of
MPEG-7 is that it provides a framework of
standardized tools that can be used to describe and
efficiently manage multimedia content. The
standard specifies a set of descriptors and
description schemes that can be employed in
applications such as video indexing and retrieval,
content summarization, real-time content delivery,
surveillance, personalized services, etc.
Visual descriptors include color, texture, shape
and motion [2, 3]. While much research on the
development of MPEG-7 has concentrated on
generating the descriptors, the environments and
the applications, there has been little effort devoted
to real-time annotation of video sequences. In this
paper, we propose a hardware implementation of
MPEG-7 motion descriptors using Handel-C. The
motion descriptors considered here are for motion
intensity and spatial distribution of motion
activity.
The proposed hardware implementation can be
used to provide real-time annotation of MPEG-7
video streams, and can enable real-time content
delivery applications ranging from home security
to personalized news and entertainment.
2. Hardware Implementation using Handel-C
Implementation of MPEG-7 motion descriptors on
an FPGA platform provides a viable alternative
to full custom solutions in terms of cost,
size, speed and power consumption. Until
recently, hardware description languages such as
VHDL or Verilog drove the majority of hardware
development on FPGA platforms. Handel-C was
recently introduced as an alternative that is easier
for the typical programmer to master and may
lead to faster development cycles. Most
algorithms are prototyped in C and then
translated into VHDL or Verilog. This translation
can introduce errors, and risks prolonging
development time and increasing cost. The
Handel-C language avoids this problem. It uses
much of the syntax of conventional C with the
addition of inherent parallelism that is necessary
for hardware implementation. The language was
designed to describe algorithms, which are
subsequently compiled down to hardware. By
targeting FPGAs directly, Handel-C provides a fast
route for hardware prototyping and development of
first-generation electronic products. The language
supports complex C functionality including
structures, pointers, and functions. Extended
operators for bit manipulation and high-level
mathematical macros (including floating point)
are available. State machines, often used in
hardware design, can be synthesized directly from
C statements such as if, case, and while. Handel-C
automatically deals with clocks, clock enables,
and data transfers across clock domain
boundaries. Another useful feature is that it
supports multiple asynchronous clock domains.
By allowing the use of the optimal clock rate for
each part of the design, it enables increased speed
and reduced power consumption [9].
3. MPEG-7 Motion Activity Descriptors
Motion activity may describe several attributes
that contribute towards the efficient use of these
motion descriptors in a number of applications.
For example, the MPEG-7 framework allows for
motion descriptors relative to intensity, spatial
distribution, temporal distribution, and directional
distribution of activity. In this paper, we consider
the computation and implementation of motion
intensity and spatial distribution of activity. In
previous work, these features have been used for
shot boundary detection and key frame extraction
for the purpose of video indexing, retrieval and
summarization [4-6].
Motion activity is typically measured using the
magnitude of motion vectors. For a given video
frame, let x(i, j) and y(i, j) denote the motion vector
components in the x and y directions respectively,
where (i, j) indicates the block indices. The spatial
activity matrix was defined in [7] to be

$$
Z(i,j) =
\begin{cases}
R_{xy}(i,j) & \text{if } R_{xy}(i,j) \ge \operatorname{avg}(R_{xy}(i,j)) \\
0 & \text{otherwise}
\end{cases}
$$

where

$$
R_{xy}(i,j) = \sqrt{x(i,j)^2 + y(i,j)^2}
$$

and the average

$$
\operatorname{avg}(R_{xy}(i,j)) = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} R_{xy}(i,j).
$$

Here M and N denote the dimensions of each frame in
blocks. This approach ignores low activity blocks and
maintains high activity blocks unaltered to form
the spatial activity matrix.
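The thresholding step can be sketched in plain C as a software reference. This is a minimal sketch, not the authors' implementation: the function name `spatial_activity`, the flat row-major layout and the truncating integer average are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Form the spatial activity matrix Z from block magnitudes R.
 * r and z each hold n = M*N entries (row-major layout is assumed).
 * Returns the truncated integer average used as the threshold. */
unsigned spatial_activity(const unsigned *r, unsigned *z, size_t n)
{
    assert(n > 0);
    unsigned long sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += r[k];
    unsigned avg = (unsigned)(sum / n);     /* integer division truncates */
    for (size_t k = 0; k < n; k++)
        z[k] = (r[k] >= avg) ? r[k] : 0;    /* keep high activity, zero the rest */
    return avg;
}
```

For the magnitudes {1, 2, 3, 10} the average is 4, so only the last block survives and Z becomes {0, 0, 0, 10}.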

Intensity of activity is expressed by an integer in
the range 1 to 5, where higher values of intensity
correspond to higher motion activity [3]. The
intensity of motion for each frame is determined
as follows

$$
I_n = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} Z(i,j)
$$

where n is the frame index. The intensity of
activity is normalized and quantized based on its
variance across all frames.
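As an illustration, the per-frame intensity and a quantizer to five levels might look as follows in C. This is a sketch under stated assumptions: the names `frame_intensity` and `quantize_intensity` are hypothetical, and the four thresholds are caller-supplied placeholders, since the paper does not list the values derived from the variance across frames.

```c
#include <assert.h>
#include <stddef.h>

/* Frame intensity: mean of the spatial activity matrix Z (n = M*N entries). */
unsigned frame_intensity(const unsigned *z, size_t n)
{
    assert(n > 0);
    unsigned long sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += z[k];
    return (unsigned)(sum / n);             /* integer division truncates */
}

/* Map an intensity to a level in 1..5 using four ascending thresholds.
 * The thresholds are hypothetical inputs; the paper does not give them. */
int quantize_intensity(unsigned intensity, const unsigned thresholds[4])
{
    int level = 1;
    for (int k = 0; k < 4; k++)
        if (intensity >= thresholds[k])
            level = k + 2;                  /* one level per threshold reached */
    return level;
}
```

With Z = {0, 0, 0, 10} the intensity is 2, and with the placeholder thresholds {2, 4, 8, 16} that value maps to level 2.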
Spatial distribution of activity indicates whether
the activity is spread across many image regions
or is confined to one region [7, 8]. To
determine the spatial distribution of activity, the
spatial activity matrix is divided into nine
non-overlapping regions. The spatial activity matrix
values in each region are summed to give the
spatial motion distribution in each region.
The method then localizes the spatial distribution
of activity in each frame to the region that shows
the maximum activity. This gives an indication of the
number of active regions in a frame.
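The region step can be sketched in C as follows. The 3x3 grid layout, the function name `most_active_region` and the tie-breaking rule (the first region with the largest sum wins) are assumptions for illustration; the paper only states that nine non-overlapping regions are used.

```c
#include <assert.h>
#include <stddef.h>

/* Sum Z over a 3x3 grid of non-overlapping regions and return the
 * row-major index (0..8) of the region with the largest sum.
 * z holds rows*cols entries in row-major order. */
int most_active_region(const unsigned *z, size_t rows, size_t cols,
                       unsigned long region_sum[9])
{
    assert(rows >= 3 && cols >= 3);
    for (int k = 0; k < 9; k++)
        region_sum[k] = 0;
    for (size_t i = 0; i < rows; i++) {
        size_t ri = i * 3 / rows;           /* region row 0..2 */
        for (size_t j = 0; j < cols; j++) {
            size_t rj = j * 3 / cols;       /* region column 0..2 */
            region_sum[ri * 3 + rj] += z[i * cols + j];
        }
    }
    int best = 0;
    for (int k = 1; k < 9; k++)
        if (region_sum[k] > region_sum[best])
            best = k;                       /* ties keep the earlier region */
    return best;
}
```

Counting how many entries of `region_sum` are non-zero would give the number of active regions mentioned in the text.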
4. Handel-C description

The equations that define the motion descriptors
were coded using the Handel-C language. The
code can be divided into a few main components,
as shown in Figure 1, which have their
counterparts in the hardware design and
implementation. The lowest-level component is a
block that calculates the squares of the input numbers.
The inputs are the motion vector components x(i,j)
and y(i,j), which lie in the range (-15.5, +16) and
take discrete values with a step size of 0.5. Hence, to
simplify, we can move the decimal point and treat
these numbers as integers. Additionally, to
simplify the hardware, the sign can be neglected,
since we only need the squares of the input values
for further calculation. Next, the quantities x(i,j)^2
and y(i,j)^2 are calculated and added, and the square
root of the sum is computed. An algorithm that
requires only basic arithmetic operations was chosen
to calculate the square root. It was coded in Handel-C
as an inline function presented below. This
approach results in a simple hardware
implementation.

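inline unsigned int 8 sqrt(unsigned int 16 number)
// square root calculation
{
    unsigned int 16 xn,xn1;             // current and next estimate
    unsigned int 16 xn_temp1,xn_temp2;  // intermediate results
    unsigned int 4 sqrt_i;              // iteration counter
    par{                                // both assignments in one clock cycle
        sqrt_i=0;
        xn= number / 2;                 // initial estimate; it is 0 for number < 2,
                                        // which makes the first divisor zero
    }
    do{ //xn1 = (xn + (number/xn))/2;
        xn_temp1 = number / xn;
        xn_temp2 = xn_temp1 + xn;
        xn1 = xn_temp2 / 2;
        xn = xn1;
        sqrt_i++;
    } while(sqrt_i<9); // 9 loops
    return xn1[7:0];                    // lower 8 bits of the result
}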

The Handel-C syntax is very intuitive. One can
see from the code that the input parameter is a
16-bit unsigned integer and the result is an
8-bit unsigned integer. We notice the keyword par,
which is not available in standard C. When
targeting hardware it is extremely important to use
parallelism. The keyword par allows the statements
in a block to be executed in parallel. By
employing the par statement in the above code,
the variables sqrt_i and xn receive their new
values in the same clock cycle. Two specific pieces
of hardware are built to perform these two
assignments. The result is stored in a matrix Rxy.
The three steps, square, addition and square root,
take only a few lines of Handel-C code, as shown
below.
do{ // R matrix
    par{ // both squares are computed in the same clock cycle
        pow1 = (0@x[p])*(0@x[p]);
        pow2 = (0@y[p])*(0@y[p]);
    }
    sum_pow = pow1+pow2;
    R[p] = sqrt(sum_pow);
    p++;
}while((0@p)<SIZE);

It should be noted that the @ operator is a simple
bit concatenation operation. At this stage the Rxy
matrix is calculated.
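For checking the hardware against software, the scaling, squaring and square-root steps can be modelled in plain C. This is a sketch, not the authors' C++ reference code: the names `scaled_magnitude`, `sqrt_model` and `block_magnitude` are hypothetical, the scale factor is left as a parameter because the paper only says the decimal point is moved, and the guard for inputs below 2 is an addition that the Handel-C listing does not contain.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a motion vector component (a multiple of 0.5) to the unsigned
 * integer magnitude |v * scale|. The scale factor is an assumption. */
uint8_t scaled_magnitude(double v, int scale)
{
    double scaled = v * scale;
    int s = (int)(scaled < 0 ? scaled - 0.5 : scaled + 0.5); /* round to nearest */
    if (s < 0)
        s = -s;                         /* the sign is irrelevant once squared */
    assert(s <= 255);
    return (uint8_t)s;
}

/* Software model of the fixed-count iteration xn1 = (xn + number/xn) / 2. */
uint8_t sqrt_model(uint16_t number)
{
    if (number < 2)                     /* guard added for this model only */
        return (uint8_t)number;
    uint16_t xn = (uint16_t)(number / 2);
    for (int i = 0; i < 9; i++) {
        uint32_t t = (uint32_t)(number / xn) + xn;  /* widened so the sum cannot wrap */
        xn = (uint16_t)(t / 2);
    }
    return (uint8_t)(xn & 0xFF);        /* mirrors the [7:0] bit select */
}

/* Magnitude of one block's motion vector from its scaled components. */
uint8_t block_magnitude(uint8_t ax, uint8_t ay)
{
    uint32_t sum_pow = (uint32_t)ax * ax + (uint32_t)ay * ay;
    assert(sum_pow <= UINT16_MAX);      /* must fit the 16-bit sqrt input */
    return sqrt_model((uint16_t)sum_pow);
}
```

If the components were scaled by 10, the largest sum of squares would be 160^2 + 160^2 = 51200, which fits in 16 bits, and the model returns 226 for it after nine iterations, which fits in 8 bits. Because every division truncates, the iteration can alternate between two neighbouring values for some inputs, so the result may differ by one from the floor of the true root.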
The next component, presented in the block
diagram of Figure 1, is responsible for calculating
avg(Rxy(i, j)). A do...while control statement was
again used to loop through the Rxy matrix.
do{ // average of R matrix - avg
    avg_temp2 = avg_temp1;
    avg_temp1 = avg_temp2 + (0@R[r]);
    r++;
}while((0@r)<SIZE);
avg_temp3 = avg_temp1 / SIZE;
avg = avg_temp3[7:0];

Finally, the decision block was used to create the
spatial activity matrix Z and based on this matrix
the intensity of activity was calculated. This part
is described in the portion of code shown below.
do{ // Z matrix (spatial activity matrix)
    if (R[s] >= avg) Z[s]=R[s];
    else Z[s]=0;
    s++;
}while((0@s)<SIZE);
do{ // In (intensity of motion)
    In_temp2 = In_temp1;
    In_temp1 = In_temp2 + (0@Z[t]);
    t++;
}while((0@t)<SIZE);
In_temp3 = In_temp1 / SIZE;
In = In_temp3[7:0];

The last step was to normalize the intensity value
to integers in the range (1-5).
5. Hardware Implementation

The Celoxica RC1000 card with a Xilinx Virtex
V1000 FPGA was chosen as the platform to
implement the MPEG-7 motion descriptors
described above. The Virtex V1000 FPGA, with
its 1 million system gates, is suitable for this class
of algorithms. The RC1000 card has four memory
banks of 2 MB each. The memory banks are
accessible by both the FPGA and any device on
the PCI bus. This allows the image to be loaded
through the PCI bus and made available to the
FPGA for processing. The structure of the RC1000
card and the application of its components to
implement some of the MPEG-7 descriptors are
presented in Figure 2. Although this
paper addresses only the implementation of
motion descriptors, the FPGA is large
enough to implement a variety of algorithms for
content-based visual description.
The Celoxica DK1 Design Suite was used as the
software environment to create, debug and
load the code into the card. After compilation, the
Handel-C code was exported as an EDIF file.
The Xilinx integrated software environment
(Xilinx ISE) was used to create the BIT file.

6. Results

A testing environment was built as presented in
Figure 3. Results from the FPGA processing were
compared with results calculated by code
written in C++. This allowed us to
verify the correctness and accuracy of the results
of the hardware implementation. The Xilinx ISE
tool reported the following parameters: the total
equivalent gate count for the design was 58,453;
the tested clock frequency was 20 MHz; and the
total power consumption was 280 mW.
7. Conclusion

This paper presented an implementation of motion
descriptors for motion intensity and spatial
distribution of motion activity. The descriptor
computations were implemented on the Virtex
XCV1000-6 FPGA platform. The hardware was
synthesized from a Handel-C language
description. The Handel-C language was found to
be a good alternative to standard lower-level
hardware description languages. Time to
market can be significantly decreased, since the
designer can, and even should, think in terms of
high-level algorithms rather than low-level
circuits. The size of the design is no longer the
major optimization criterion of the design process.
Hence, this parameter can be neglected in favor
of lower complexity in the design process. With
Handel-C, new algorithms can be quickly
implemented in hardware, as was demonstrated in
this paper for the case of MPEG-7 motion
descriptors.

References

[1] S.F. Chang, A. Puri, T. Sikora, H. Zhang,
"Overview of the MPEG-7 Standard," IEEE Trans.
Circ. Sys. Video Tech., vol. 11, pp. 688-695, June
2001.
[2] B. S. Manjunath, P. Salembier, T. Sikora,
Introduction to MPEG-7 Multimedia Content
Description Interface, J. Wiley, New York, 2002.
[3] S. Jeannin and A. Divakaran, "MPEG-7 Visual
Motion Descriptors," IEEE Trans. Circ. Sys. Video
Tech., vol. 11, pp. 720-724, June 2001.
[4] I. Koprinska and S. Carrato, "Temporal video
segmentation: A survey," Signal Processing:
Image Communication, Elsevier Science, 2001.
[5] B. Shahraray and D. C. Gibbon, "Automatic
Generation of Pictorial Transcripts of Video
Programs," Multimedia Computing and
Networking 1995, Proc. SPIE vol. 2417, Feb
1995.
[6] A. Divakaran, R. Regunathan, and K. A.
Peker, "Video Summarization Using Descriptors
of Motion Activity: A Motion Activity Based
Approach to Key-Frame Extraction from Video
Shots," Journal of Electronic Imaging, vol. 10, pp.
909-916, October 2001.
[7] X. Sun, D. Ajay, and B. S. Manjunath, "A
Motion Activity Descriptor and Its Extraction in
Compressed Domain," IEEE Pacific-Rim Conf.
Multimedia (PCM), pp. 450-453, October 2001.
[8] A. Divakaran and H. Sun, "Descriptor for
spatial distribution of motion activity for
compressed video," Proceedings of SPIE on
Storage and Retrieval for Media Databases 2000,
vol. 3972, pp. 24-28, Jan 2000.
[9] Celoxica, "Handel-C Language Reference
Manual," www.celoxica.com.

Figure 1. Block diagram of the motion descriptors
calculation.

Figure 2. Video stream MPEG-7 processing using
Virtex XCV1000 FPGA.

Figure 3. Testbench created to verify results.

