Evaluation of the Performance/Energy Overhead in DSP Video Decoding and
  its Implications by Benmoussa, Yahia et al.
Yahia Benmoussa†§, Jalil Boukhobza†, Eric Senn† and Djamel Benazzouz§
†Université Européenne de Bretagne, CNRS, UMR 6285 Lab-STICC, France
§Université M’hamed Bougara, Boumerdes, Algeria
Abstract—Video decoding is considered as one of the most
compute- and energy-intensive applications in energy-constrained
mobile devices. Specific processing units, such as DSPs, are
added to those devices in order to optimize performance and
energy consumption. However, in DSP video decoding, the
inter-processor communication overhead may have a considerable
impact on performance and energy consumption. In this
paper, we evaluate this overhead and analyse its impact
on performance and energy consumption as compared to
GPP decoding. Our work reveals that the GPP can be the
best choice in many cases because of a significant overhead in
DSP decoding, which may represent 30% of the total decoding
energy.
Keywords—Video decoding, Performance, Energy, GPP, DSP,
H264/AVC, OMAP, Gstreamer.
I. INTRODUCTION
Energy saving has become central to hardware and
application design in mobile devices such as
smartphones and tablets. In fact, lithium battery technologies
are not evolving fast enough, which negatively impacts
battery life. This is becoming a critical issue, especially
when using processor-intensive applications such as video
playback. In [1], it is shown that video playback is the most
energy-consuming application used on mobile devices.
This is due to its heavy use of processing resources, which are
responsible for more than 60% of the consumed energy [1].
Furthermore, to allow high quality video decoding, the
processors equipping mobile devices are becoming more and more
powerful. Hardware configurations including a processor
clocked at more than 1 GHz have become common. The
main drawback of using high frequencies is that they require
higher voltage levels. This leads to a considerable increase in
energy consumption due to the quadratic relation between
dynamic power and supply voltage in CMOS circuits. To
overcome this issue, Digital Signal Processors (DSPs) are used
to provide better performance-energy properties. Indeed, the
use of parallelism in data processing increases performance
without the need for higher voltages and frequencies [2].
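The quadratic dependence mentioned above can be illustrated with the classical CMOS dynamic power model P = a * C * V^2 * f. The following is a minimal sketch only: the activity factor, the switched capacitance and the two voltage/frequency operating points are hypothetical values chosen for illustration and are not measurements from this study.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """Classical CMOS dynamic power model: P = a * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical parameters (assumptions, not taken from the paper).
ACTIVITY = 0.5        # assumed switching activity factor
CAPACITANCE = 1e-9    # assumed switched capacitance in farads

# Two hypothetical operating points: (supply voltage in V, clock in Hz).
low = dynamic_power(ACTIVITY, CAPACITANCE, 1.0, 500e6)
high = dynamic_power(ACTIVITY, CAPACITANCE, 1.35, 1000e6)

# Doubling f while raising V from 1.0 V to 1.35 V multiplies the dynamic
# power by 2 * 1.35^2, i.e. by more than the 2x clock increase alone.
ratio = high / low
print(f"low: {low:.3f} W, high: {high:.3f} W, ratio: {ratio:.2f}x")
```

Under these assumed values, the power grows faster than the clock frequency, which is the motivation for seeking performance through parallelism rather than through higher voltage and frequency.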
In the case of DSP decoding, in addition to the clock
frequency and the decoded video quality parameters stated above,
the overhead due to inter-processor communication should
be considered. This issue was addressed from a performance
point of view in studies such as [3], [4]. However, its impact on
energy consumption as compared to GPP decoding has
not been studied before. In this paper, we evaluate
the performance and energy overhead in DSP decoding
and analyse its impact on performance and energy
consumption as compared to GPP video decoding. For this
purpose, we conduct experimental measurements, which
are described in Section II. The obtained results and the
conclusion are discussed in Sections III and IV respectively.
II. EXPERIMENTAL METHODOLOGY AND SETUP
Our experiments followed two steps. 1) A video
frame level performance and energy characterization, in which the
DSP performance and energy overhead is evaluated within a frame
decoding cycle. We define the overhead as all processing
that is not related to the actual frame decoding, such as
GPP-DSP communication and cache memory maintenance
operations. 2) A video sequence evaluation, in which the performance
and energy consumption are measured and compared to those of the GPP.
Power measurements in this study were
performed using the Open-PEOPLE framework [5], a multi-user
and multi-target power and energy optimization platform
and estimator. The target platform is the OMAP3530EVM
board, which includes a Cortex A8 ARM processor and a
TMS320C64x DSP. The power consumptions of the DSP and
the ARM processors are measured using . On this hardware
platform, the Linux operating system version 2.6.32 was
used. Video decoding was performed using Gstreamer, a
multimedia development framework. ARM decoding was
performed using ffdec_h264, an open-source plug-in based
on the ffmpeg/libavcodec library. For DSP decoding, we used
TIViddec2, a proprietary Gstreamer H264/AVC baseline profile
plug-in provided by Texas Instruments. The video sequences
used in the tests are Harbour and Soccer. Each video is coded
at different bit-rates (64 Kb/s, ..., 5120 Kb/s) and in qcif, cif
and 4cif resolutions. Each video is then decoded at different
clock frequencies ranging from 125 MHz to 720 MHz. The
performance (frames/s) and the energy consumption (mJ/frame)
are measured for each (bit-rate, resolution, frequency) combination.
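The two reported metrics are linked by a simple relation: at an average power P (W) and a decoding rate r (frames/s), the energy per frame is 1000 * P / r mJ. The sketch below applies this conversion and enumerates a measurement grid of the kind described above. The grid is an illustrative subset, and the sample power and frame-rate figures are hypothetical placeholders, not measurements from this study.

```python
from itertools import product


def energy_per_frame_mj(avg_power_w, frames_per_s):
    """Energy per decoded frame in mJ: (W / (frames/s)) * 1000."""
    if frames_per_s <= 0:
        raise ValueError("frame rate must be positive")
    return 1000.0 * avg_power_w / frames_per_s


# Illustrative subset of a (bit-rate, resolution, frequency) grid.
bitrates_kbps = [64, 1024, 5120]
resolutions = ["qcif", "cif", "4cif"]
frequencies_mhz = [125, 720]
grid = list(product(bitrates_kbps, resolutions, frequencies_mhz))
print(len(grid), "configurations")

# Hypothetical measurement: 0.9 W average power at 45 frames/s.
sample = energy_per_frame_mj(0.9, 45.0)
print(f"{sample:.2f} mJ/frame")
```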
III. EXPERIMENTAL RESULTS & DISCUSSIONS
A. Frame-Level Performance and Energy Characterization
Fig. 3 shows the power consumption level of 4cif and
qcif DSP video decoding. The DSP frame decoding phase is
represented by the values varying between 0.7 W and 1.1 W,
corresponding to the intervals [32 ms, 62 ms] for 4cif and
[6.2 ms, 7.5 ms] for qcif.
This phase is terminated by a burst of DMA transfers of the
decoded frame macro-blocks from the DSP cache to the shared
memory, which corresponds to the intervals [56 ms, 62 ms] and
[7.2 ms, 7.5 ms] respectively and is illustrated by an increase in memory
power consumption. The ARM wake-up latency is represented
by the power level 0.66 W. The ARM wake-up is represented
by the power transition to 0.83 W. Table I shows the obtained
time and energy overhead values for qcif, cif and 4cif videos.
One can notice that the overhead can reach 50% and 30% for
energy and performance respectively in case of qcif resolution.

arXiv:1309.2533v1 [cs.AR] 10 Sep 2013

Fig. 1: ARM and DSP decoding performance of the Harbour video
(Panels: qcif, cif and 4cif ARM and DSP decoding (Harbour). Axes: Frequency, Bitrate (Kb/s), Frames/s. Series: ARM, DSP.)

Fig. 2: ARM vs DSP decoding energy consumption of H264/AVC video
(Panels: qcif, cif and 4cif decoding energy consumption (Harbour). Axes: Frequency, Bitrate (Kb/s), mJ/Frame. Series: ARM, DSP.)
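The per-phase energies discussed for the power trace of Fig. 3 can be obtained by integrating sampled power over each phase interval. The following is a minimal sketch using the trapezoidal rule. The function name and the synthetic trace are hypothetical and serve only as an illustration; the sampled values are not the measured trace.

```python
def phase_energy_mj(times_ms, powers_w, start_ms, end_ms):
    """Trapezoidal integration of power samples over [start_ms, end_ms].

    Since 1 W * 1 ms = 1 mJ, no unit conversion is needed. Only sample
    segments lying entirely inside the interval are counted.
    """
    energy = 0.0
    samples = list(zip(times_ms, powers_w))
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t0 >= start_ms and t1 <= end_ms:
            energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


# Synthetic trace (hypothetical): a decoding phase at 0.9 W followed by
# a lower level of 0.66 W, sampled every 2 ms.
times_ms = [0, 2, 4, 6, 8, 10]
powers_w = [0.9, 0.9, 0.9, 0.9, 0.66, 0.66]

decode = phase_energy_mj(times_ms, powers_w, 0, 6)
wakeup = phase_energy_mj(times_ms, powers_w, 8, 10)
print(f"decode: {decode:.2f} mJ, wake-up: {wakeup:.2f} mJ")
```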
Fig. 3: ARM and DSP frames decoding
(Panels: (a) DSP frame decoding (4cif) power consumption; (b) DSP frame decoding (qcif) power consumption. Axes: Time (ms), Power (W). Curves: Memory, DSP + ARM. Annotations: Frame decoding period; DSP Decoding (DSP active/ARM idle); Decoded frame transfer using DMA; Memory power increase due to frame copy; Overhead; DSP idle/ARM idle; DSP idle/ARM active.)
TABLE I: DSP video decoding time and energy overhead
                  DSP decoding energy (mJ/frame)     DSP decoding time (ms/frame)
Resolution        Processing  Total  Overhead (%)    Processing  Total  Overhead (%)
qcif (128 Kb/s)   1.97        4.16   52.64           1.71        2.33   30.48
cif (1024 Kb/s)   6.016       8.36   28.11           5.35        6.72   20.38
4cif (5120 Kb/s)  23.73       25.93  8.48            21.59       22.16  2.5
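The overhead percentages can be related to the other two columns with a short calculation. The sketch below assumes that the overhead is defined as (Total - Processing) / Total; this definition is an interpretation of Table I rather than a formula stated in the text, and it approximately reproduces the tabulated energy percentages.

```python
def overhead_percent(processing, total):
    """Share of the total not spent on actual frame decoding, in percent.

    Assumed definition: 100 * (total - processing) / total.
    """
    return 100.0 * (total - processing) / total


# Energy columns of Table I in mJ/frame: (processing, total).
energy_mj = {
    "qcif": (1.97, 4.16),
    "cif": (6.016, 8.36),
    "4cif": (23.73, 25.93),
}

for resolution, (processing, total) in energy_mj.items():
    share = overhead_percent(processing, total)
    print(f"{resolution}: {share:.1f} % energy overhead")
```

The calculation makes the trend of the table explicit: the absolute overhead stays at roughly 2 mJ/frame across resolutions, so its relative weight shrinks as the processing energy grows.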
B. Video Stream Performance and Energy Evaluation
1) Decoding Performance Results: Fig. 1 shows a comparison
between ARM and DSP video decoding performance
for the 4cif, cif and qcif resolutions of the Harbour video
sequence. The flat surface represents the reference acceptable
video display rate (30 frames/s). One can observe that
the performances of the ARM processor and of the DSP are
almost equivalent in the case of qcif resolution. However, the
ARM decoding speed is 43% higher than that of the DSP at a
64 Kb/s bit-rate, while the DSP decoding speed is 14%
higher than that of the ARM at a 5120 Kb/s bit-rate.
The DSP decoding is almost 50% faster
than that of the ARM for cif resolution and 100% faster for
4cif. This ratio decreases drastically at low bit-rates, where the
ARM performance increases faster than that of the DSP.
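The 30 frames/s reference corresponds to a budget of about 33.3 ms per frame. As a minimal sketch, the per-frame times of Table I can be converted to frame rates and checked against this reference. The helper names are hypothetical, and since Table I does not state the clock frequency at which the times were obtained, the resulting rates are only indicative.

```python
REFERENCE_FPS = 30.0  # reference display rate shown as the flat surface in Fig. 1


def frames_per_second(ms_per_frame):
    """Convert a per-frame decoding time (ms) to a decoding rate (frames/s)."""
    return 1000.0 / ms_per_frame


def meets_reference(ms_per_frame, reference_fps=REFERENCE_FPS):
    """True when the decoding rate reaches the reference display rate."""
    return frames_per_second(ms_per_frame) >= reference_fps


# Total DSP decoding times from Table I (ms/frame).
total_ms = {"qcif": 2.33, "cif": 6.72, "4cif": 22.16}

for resolution, ms in total_ms.items():
    fps = frames_per_second(ms)
    print(f"{resolution}: {fps:.1f} frames/s, real time: {meets_reference(ms)}")
```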
2) Energy Consumption Results: Fig. 2 shows a comparison
between the ARM and DSP video decoding energy
consumption (mJ/frame) for the 4cif, cif and qcif resolutions.
The DSP qcif video decoding consumes 100% more energy
than the ARM at low bit-rates and 20% more at high bit-rates.
On the other hand, the DSP 4cif video decoding consumes less
energy than the ARM. For cif resolution, we
noticed a crossover between the ARM and the DSP energy
consumption levels: for low bit-rates, below
1 Mb/s, the ARM consumes less energy than the DSP.
IV. CONCLUSION
The analysis of the obtained results shows that the overall
performance and energy efficiency of the DSP as compared
to the ARM processor depend mainly on the required video
coding quality (bit-rate and resolution). In fact, DSP video
decoding is the best choice in terms of performance and energy
efficiency for 4cif resolution, while ARM decoding is
better for qcif resolution and for cif resolution with a
bit-rate below 1 Mb/s. The degradation of the performance and energy
consumption properties of DSP video decoding is due to
a significant inter-processor overhead.
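The conclusion above can be read as a simple selection rule. The following is a minimal sketch of such a rule; the function name is hypothetical, 1 Mb/s is taken as 1000 Kb/s by assumption, and the choice of the DSP for cif at or above that threshold is inferred from the energy crossover rather than stated explicitly.

```python
CIF_THRESHOLD_KBPS = 1000  # assumption: 1 Mb/s taken as 1000 Kb/s


def preferred_decoder(resolution, bitrate_kbps):
    """Return "ARM" or "DSP" following the paper's conclusion (sketch)."""
    if resolution == "4cif":
        return "DSP"
    if resolution == "qcif":
        return "ARM"
    if resolution == "cif":
        # Inferred: the DSP becomes preferable at or above the threshold.
        return "ARM" if bitrate_kbps < CIF_THRESHOLD_KBPS else "DSP"
    raise ValueError(f"unsupported resolution: {resolution}")


for resolution, bitrate in [("qcif", 64), ("cif", 512), ("cif", 1024), ("4cif", 5120)]:
    print(resolution, bitrate, "->", preferred_decoder(resolution, bitrate))
```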
REFERENCES
[1] A. Carroll and G. Heiser, “An analysis of power consumption in a
smartphone,” Proceedings of the 2010 USENIX conference on USENIX
annual technical conference, pp. 21–21, 2010.
[2] D. Markovic, V. Stojanovic, B. Nikolic, M. Horowitz, and R. Brodersen,
“Methods for true energy-performance optimization,” Solid-State
Circuits, IEEE Journal of, vol. 39, no. 8, pp. 1282–1293, 2004.
[3] P. Ramachandra and M. R. Satish, “H.264 main profile video decoding
implementation techniques on OMAP3430IVA,” Signal Processing
(ICSP), 2010 IEEE 10th International Conference on, pp. 271–274, 2010.
[4] S. Kant, U. Mithun, and P. Gupta, “Real time H.264 video encoder
implementation on a programmable dsp processor for videophone
applications,” Consumer Electronics, 2006. ICCE ’06. 2006 Digest of
Technical Papers. International Conference on, pp. 93–94, 2006.
[5] E. Senn, D. Chillet, O. Zendra, C. Belleudy, S. Bilavarn, R. Atitallah,
C. Samoyeau, and A. Fritsch, “Open-people: Open power and energy
optimization PLatform and estimator,” 2012 15th Euromicro Conference
on Digital System Design (DSD), pp. 668–675, Sep. 2012.
