Low power techniques for video compression by Muresan, Valentin et al.
ISSC 2002, Cork. June 25–26
Low Power Techniques for Video Compression
Valentin Muresan†, Noel O’Connor†, Noel Murphy†, Sean Marlow† and Stephen McGrath∗
†Center for Digital Video Processing, Dublin City University, IRELAND
∗Multimedia Business Unit, Parthus plc, IRELAND
E-mail: †Valentin.Muresan@dcu.ie ∗Stephen.McGrath@parthus.com
Abstract — This paper gives an overview of low-power techniques
proposed in the literature for mobile multimedia and Internet appli-
cations. Exploitable aspects are discussed in the behavior of differ-
ent video compression tools. These power-efficient solutions are then
classified by synthesis domain and level of abstraction. As this paper
is meant to be a starting point for further research in the area, a
low-power hardware/software co-design methodology is outlined at the
end as a possible scenario for video-codec-on-a-chip implementations
on future mobile multimedia platforms.
Keywords — mobile multimedia, video compression, low power,
hardware acceleration
I Introduction
This paper will focus on the increasingly important
set of embedded applications to run on portable
systems in the areas of digital communications
and multimedia consumer electronics, e.g., cellular
phones, personal digital assistants and multimedia
terminals. These complex systems rely on “power
hungry” algorithms for wireless communications,
video compression and decompression [1], image
processing, etc. Figure 1 depicts what are expected
to be the main functional peripherals of the next
generation mobile multimedia platforms. We see
from this that hardware accelerating solutions need
to be designed to support future real-time mobile
applications for MPEG-4 coding/decoding, GPS
localization, and Bluetooth, UMTS, GSM/GPRS
wireless communications. The portability of mo-
bile platforms makes energy consumption a par-
ticularly critical design concern as it reduces bat-
tery life. Moreover, high power dissipation leads to
more expensive packaging and decreases reliability.
[Figure 1 labels: UMTS, GSM/GPRS, Bluetooth, GPS, MPEG-4 codec]
Fig. 1: Next Generation Mobile Platforms’ Peripherals
The potential for multimedia applications on
mobile platforms is constrained by low bandwidth
wireless connections, low computational power, low
memory capacity, and short-life battery problems.
The first problem is ameliorated by efficient video
compression standards as in MPEG-4. The com-
putational requirements of MPEG-4, however, ex-
acerbate the remaining three problems, which are
themselves strongly interrelated. For example,
software implementations of video compression tools
are time consuming and cannot deliver the real-time
results required by applications such as
video-conferencing. Real-time applications therefore
require high-throughput hardware accelerators to speed
up the computationally demanding tools. Hardware
solutions designed solely for high throughput are much
faster, but usually draw more power than their software
counterparts. Consequently, when high-throughput
hardware acceleration is compulsory, short battery life
becomes the biggest problem for mobile multimedia.
The next section describes how features of
MPEG-4 can be the basis for power/performance
efficiency of future mobile multimedia platforms.
Section III enumerates power-efficient hardware so-
lutions already proposed or potentially applicable
for video compression acceleration. Section IV
describes how power savings can be achieved at a
high level by exploiting the behavioral peculiarities
of each video compression tool. Finally, Section V
sketches a possible System-on-a-Chip scenario for
mobile multimedia based on a top-down
hardware/software (HW/SW) co-design methodology.
II Shape Adaptive Video Compression
MPEG-4 was proposed as a standard to meet the
scalability and flexibility requirements of different
mobile multimedia platforms. It therefore provides
a large set of tools that can be selectively applied.
The tools are divided into many overlapping sets,
called profiles. The MPEG-4 Simple Profile (SP), for
example, provides the most commonly used tools and is
closely related to H.263. A highly efficient compression
standard like MPEG-4 pays the price for its object-based
advantages by employing computationally expensive
algorithms for motion estimation (ME), discrete cosine
transform (DCT), discrete wavelet transform (DWT),
Content-based Arithmetic Encoding (CAE) binary shape
coding, variable length coding (VLC), and quantisation (Q).
The basic idea of previous video compression
standards (MPEG-1, MPEG-2) is to employ mo-
tion estimation and DCT tools in order to elimi-
nate temporal and spatial redundancy. MPEG-4
introduces new tools in order to boost compression
efficiency and to provide new object-based func-
tionality. The introduction of object shape along
with its texture and motion features is one of the
main steps forward. The shape of a video object
is represented by a pixel-resolution binary alpha
plane, which is coded by CAE binary shape coding.
The coded shape is used as the basis for compression
in a number of ways, for example, in the polygon
matching technique for motion estimation/compensation.
In polygon matching, only pixels within the shape of
the video object are considered in the matching
criterion used by the motion estimation search
strategy. This information can be used to skip the
computation that would otherwise be spent compressing
regions outside the shape. Shape-adaptive versions of
the DCT are another example of compression tools
where unnecessary computation can be avoided to
save power.
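The polygon matching criterion described above can be sketched as a sum of absolute differences (SAD) restricted to the pixels flagged in the binary alpha plane. This is a minimal illustration, not the MPEG-4 reference procedure; the function name masked_sad and the sample 4x4 blocks are hypothetical.

```python
def masked_sad(cur_block, ref_block, alpha_block):
    """SAD over pixels inside the object shape only (hypothetical helper).

    alpha_block is a binary alpha plane: 1 = inside the object, 0 = outside.
    Returns the SAD and the number of pixel differences actually evaluated.
    """
    total = 0
    evaluated = 0
    for cur_row, ref_row, alpha_row in zip(cur_block, ref_block, alpha_block):
        for cur_px, ref_px, inside in zip(cur_row, ref_row, alpha_row):
            if inside:  # outside-shape pixels cost no arithmetic
                total += abs(cur_px - ref_px)
                evaluated += 1
    return total, evaluated


# Illustrative 4x4 blocks; the values are made up for the example.
cur = [[10, 12, 200, 200],
       [11, 13, 200, 200],
       [9, 14, 15, 200],
       [10, 12, 13, 200]]
ref = [[11, 12, 0, 0],
       [11, 15, 0, 0],
       [9, 14, 15, 0],
       [12, 12, 13, 0]]
alpha = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 1, 0],
         [1, 1, 1, 0]]

sad, evaluated = masked_sad(cur, ref, alpha)
print(sad, evaluated)
```

In this example only 10 of the 16 pixel differences are computed, which is the kind of saving the shape information makes possible.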
III Low Power Hardware Architectures
Several architectural solutions have already been
proposed for implementing video compression tools
in hardware [2], including dedicated Application
Specific Integrated Circuit (ASIC) solutions, Digital
Signal Processing (DSP) architectures, reconfigurable
Single Instruction Multiple Data (SIMD) based hardware,
Field Programmable Gate Array (FPGA) implementations,
and circuit-level technological domain techniques.
Unfortunately, only a few of these are power-conscious.
FPGA technologies cannot yet meet the power,
miniaturization, and speed requirements of mobile
devices. Technological domain techniques are usually
employed close to the foundry and are therefore largely
beyond the scope of this overview; voltage scaling and
dynamic clock frequency are nevertheless two
circuit-level techniques known to be power-efficient.
Even though video compression tools can be implemented
by means of regular hardware (SIMD and systolic arrays),
the video content and its associated processing and
compression are highly non-uniform in both space and
time. The efficiency of such high-throughput solutions
therefore decreases dramatically in the case of video
compression, and they are also very power hungry.
Unless high throughput is a necessity for the highest
rates, or is already available for other mobile
multimedia functions (GSM/GPRS/UMTS, GPS), regular
hardware solutions are unlikely to be appropriate for
power-efficient video compression. Consequently,
dedicated (ASIC) DSP solutions are the most promising
approach to achieving the level of performance and
power consumption appropriate to mobile multimedia
applications.
Power efficiency of computational hardware can
be addressed in the behavioral or structural domains.
Examples of power optimization techniques applied to
the behavioral specification, so that the video
compression tools become power-conscious, are
power-aware scalability and motion estimation by
adaptive block-matching and power-conscious search
algorithms. These are usually achieved at the
algorithmic level, either by ordering or reducing the
basic operations of the compression tools so that
switching activity is minimized, or by decreasing the
level of computation (e.g., dropping the processing of
enhancement-layer video data) when power consumption
exceeds a given limit. Such techniques sometimes achieve
lower power consumption at the expense of poorer video
quality. The scalability of the compression tools is
controlled at system level and involves only
software-based decisions. These follow the paradigm of
power/distortion-optimized and power/rate-optimized
compression strategies. On the other hand, the adaptive
block-matching and power-conscious search algorithms
for motion estimation involve HW/SW co-design solutions
(see Section V).
In the structural domain, power optimization de-
cisions can be taken at system, register transfer
(RT), or logic levels. At system level the low power
techniques consist of the reconfiguration of datap-
ath, memory, system bus and control units. In re-
configurable architectures, the control unit usually
has a system configuration part that is in charge
of the on-line or off-line reconfiguration of the sys-
tem components: datapath, memory, system bus.
At RT and logic levels, pipeline re-structuring and
word truncation are typical low-power techniques.
However, there are also low-power techniques
that require a tight HW/SW co-design methodology
in order to achieve efficient designs. For example,
MPEG-4’s object shape and texture processing requires
additional hardware flexibility in order to follow the
arbitrarily changing size and shape of the object being
compressed. The arbitrarily shaped object mechanism
has behavioral implications that can be accelerated
only by a highly reconfigurable, power-efficient
hardware architecture, flexible enough to follow the
run-time characteristics of an object’s shape.
Power-efficient hardware solutions potentially
applicable to video compression acceleration are
briefly described next.
a) Low Power Reconfigurable Architectures
A variety of solutions have been proposed as
configurable and programmable architectures for
video compression. They can be classified into the
following main categories [3]: circuit-level
technological reconfigurability, gate-level
reconfigurability (FPGAs), logic-level
reconfigurability of the functional modules,
parametrical reconfigurability of the functional
modules (memory, bus, or datapath bit-sizes), and
reconfigurability by programmability. Architectures
reconfigurable by programmability are based on
general-purpose programmable processors, with or
without DSP/multimedia extensions. They are normally
power-inefficient and are beyond the scope of this
paper. Parametrical reconfigurability of the functional
modules usually assumes a run-time reallocation of the
hardware resources, so that the parts which are not
involved in the processing are disabled or shut down.
These techniques are usually combined with the
logic-level reconfigurability techniques.
a).1 Voltage-Scaling Techniques
In [4], voltage-scaling approaches are summarized
as examples of circuit-level technological techniques
that can be employed in the DSP domain: first, globally
scaling the supply voltage along with the threshold
voltage; second, a dual-Vdd approach in which the
reduced Vdd is selectively applied to non-critical
paths; and third, a variable supply voltage approach,
where Vdd is controlled adaptively on-chip. This
approach is one of the most efficient, but it can be
achieved only at circuit level in the technological
domain.
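To see why supply-voltage scaling is attractive, the following sketch evaluates the widely used first-order model of CMOS dynamic power, P = a * C * Vdd^2 * f. The model and all numeric values are assumptions introduced here for illustration; they are not taken from [4].

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS dynamic power estimate in watts (assumed model).

    activity:      average switching activity factor (0..1)
    capacitance_f: switched capacitance in farads
    vdd_v:         supply voltage in volts
    freq_hz:       clock frequency in hertz
    """
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating points: same circuit and clock, halved Vdd.
full = dynamic_power(0.2, 1e-9, 3.3, 50e6)
scaled = dynamic_power(0.2, 1e-9, 1.65, 50e6)
ratio = scaled / full
print(f"power at half Vdd relative to full Vdd: {ratio:.2f}")
```

Under this model, halving Vdd cuts dynamic power to about a quarter. In practice a lower Vdd also slows the logic, which is why the reduced voltage is applied selectively or adaptively.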
a).2 Low-Power Programmable Architectures
[Figure 2: plot of power inefficiency (vertical axis) against data rate (horizontal axis); in-figure labels: "power inefficiency = concurrency", "power consumption"]
Fig. 2: Power vs Data Rate
FPGAs allow parallelism, pipelining, local memory,
and both functional and data dedication. However,
FPGAs suffer from disadvantages for mobile
applications, such as difficulty of miniaturization,
higher power consumption, and slowness at sequential
computations. FPGAs are not designed to support
high-speed dynamic reconfiguration, as the
reconfiguration mechanism introduces a delay overhead.
Similar circuit-level (technological) reconfigurability
solutions have also been proposed recently in [5] for
dynamic interconnect architectures. These solutions
have not yet reached maturity because of the long
delays exhibited by the programmable interconnection
logic. FPGAs are not power efficient because of
their high level of programmability and their lack
of support for memory-intensive computation [6],
even though low-power FPGAs have recently been
proposed [7] for DSP.
a).3 Low-Power by Datapath Flexibility
Several parallel architectures have already been ap-
plied to the datapath structure of video compres-
sion: SIMD arrays [2] and pipelining are amongst
the most popular. The SIMD architectures are
used as hardware accelerators for high-throughput
DSP applications. They are very efficient for ap-
plications where a constantly high processing level
is required. Even though video compression can be
implemented by means of regular DSP hardware,
the video content and its associated processing and
compression are highly non-uniform in both space
and time [6]. This means that much of the time,
the SIMD hardware is consuming power but not
carrying out useful processing, as the average video
rates are significantly lower than the maximal ones.
Fig. 3: Dynamic Pipelining
The highly pipelined VLSI architectures designed
for the high data rates of video compression consume
excessive power at low rates (see Figure 2). Figure 2
depicts the power inefficiency curve drawn against a
fluctuating video data rate for a certain level of
parallelism or concurrency (e.g., pipelining) [8].
Similar curves can be drawn for any level of
parallelism (e.g., number of pipeline stages). A
power-efficient parallel structure would be one able
to reconfigure dynamically so that the minimum
possible power is consumed for any given video data
rate. Pipeline throughput scalability is the commonly
proposed solution in this case; in hardware terms it
translates to architectural flexibility. For example,
[8] proposes a reconfigurable datapath like the one
depicted in Figure 3. Here the maximal pipeline can be
employed to achieve high throughput when the video
data rates are high. At lower video data rates, the
number of pipeline stages can be reduced so that
power is consumed efficiently. This is achieved by
disabling and bypassing an appropriate number of
dissipative pipeline stages according to the data
rate. For the lowest video data rate, the minimal
pipeline structure can be employed.
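The stage-selection idea can be sketched as a small controller routine. The sketch assumes, as a simplification not stated in [8], that sustainable throughput grows linearly with the number of enabled stages; the function name stages_needed and the numbers are hypothetical.

```python
import math


def stages_needed(data_rate, rate_per_stage, max_stages, min_stages=1):
    """Pick the smallest number of enabled pipeline stages for a data rate.

    Assumes each enabled stage adds rate_per_stage of throughput, which is
    an idealization. The result is clamped to [min_stages, max_stages].
    """
    if data_rate < 0 or rate_per_stage <= 0:
        raise ValueError("data_rate must be >= 0 and rate_per_stage > 0")
    needed = math.ceil(data_rate / rate_per_stage)
    return max(min_stages, min(max_stages, needed))


# Hypothetical 8-stage pipeline, 10 rate units per enabled stage.
for rate in (0.0, 35.0, 80.0, 200.0):
    print(rate, stages_needed(rate, 10.0, 8))
```

Stages beyond the returned count would be disabled and bypassed; a demand above the pipeline's capacity simply saturates at the maximal pipeline.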
Fundamentally, the above dynamic pipeline
technique saves power by eliminating pipeline
stages when the high processing-per-cycle rate is
not justified. A simpler way to achieve the same
result would be, for example, to run the full
pipeline for 50% of the time and shut it down for the
remainder, rather than to configure a minimal pipeline
with 50% of the stages disabled. This approach is
architecturally simpler, but its power draw is
unbalanced over time: in the above example, the
pipelined architecture reaches its maximum power level
for the first 50% of the computation, after which the
power drops to virtually zero.
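The difference between the two schedules can be made concrete with a toy power profile. The sketch assumes power proportional to the number of active stages and ignores leakage and switching overheads; both helper functions and all values are hypothetical.

```python
def profile_duty_cycled(stages, power_per_stage, slots, duty):
    """Full pipeline for a fraction `duty` of the time slots, then shut down."""
    active = round(slots * duty)
    return [stages * power_per_stage] * active + [0.0] * (slots - active)


def profile_reduced(stages, power_per_stage, slots, fraction_enabled):
    """A fraction of the stages enabled for every time slot."""
    enabled = round(stages * fraction_enabled)
    return [enabled * power_per_stage] * slots


# Hypothetical 8-stage pipeline observed over 10 time slots.
duty_cycled = profile_duty_cycled(8, 1.0, 10, 0.5)
reduced = profile_reduced(8, 1.0, 10, 0.5)

print("energy:", sum(duty_cycled), sum(reduced))
print("peak power:", max(duty_cycled), max(reduced))
```

Under these assumptions both schedules spend the same energy, but the duty-cycled one has twice the peak power, which is the imbalance noted above.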
Other logic-level techniques for reducing complexity
and power are the word-length shortening techniques,
which truncate the bit-length of the pixel values when
the input video data exhibits a high level of
correlation. These techniques are known to reduce
computational complexity and, indirectly, power
consumption, but they sometimes also degrade
compression efficiency because the SAD (sum of
absolute differences) estimate loses precision.
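A short sketch shows both the saving and the precision loss of word-length shortening. It drops low-order bits from each pixel before the SAD is accumulated; the function name truncated_sad and the pixel values are hypothetical, and real designs may truncate differently.

```python
def truncated_sad(cur, ref, dropped_bits):
    """SAD computed on pixels with the lowest `dropped_bits` bits removed."""
    return sum(abs((c >> dropped_bits) - (r >> dropped_bits))
               for c, r in zip(cur, ref))


# Hypothetical highly correlated pixels and an imperfect candidate match.
cur = [100, 101, 102, 103]
candidate = [103, 100, 103, 100]

full_precision = truncated_sad(cur, candidate, 0)  # all 8 bits kept
shortened = truncated_sad(cur, candidate, 2)       # only the top 6 bits kept
print(full_precision, shortened)
```

With two bits dropped, the narrower arithmetic can no longer tell this candidate from a perfect match, which is how a poorer motion vector may be chosen.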
a).4 Low-Power On-Chip Memory
In general, the memory sub-architecture consumes
a significant amount of power because of two
sources of power loss: frequent memory accesses
cause dynamic power loss, and leakage current
causes static power loss. Organizing the memory so
that an access activates only part of it helps limit
dynamic memory power loss. Memory banking, currently
used in some low-power designs, splits the memory
into banks and activates only the bank presently in
use. It relies on exploiting the spatial locality of
the video content, which can be increased by studying
and optimizing the video content reference pattern.
To reduce leakage power loss, memory bank shut-down
procedures can be applied to memory parts that are
unused for a long time. Other power-efficient
techniques deal with the optimization of memory
(bank) size and addressing hardware. In video
compression, dynamic power loss can also be reduced
either by eliminating redundant accesses to video data
or by reordering and grouping the independent
operations of the compression tools so that the
number of accesses to the same video-data element is
reduced.
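The effect of the reference pattern on banked memory can be illustrated with a toy model that counts how often the active bank changes during an access trace. The frame width, block size, bank size, and both data layouts are hypothetical choices made for this sketch.

```python
def bank_activations(addresses, bank_size):
    """Count how many times a different bank has to be activated."""
    activations = 0
    current = None
    for addr in addresses:
        bank = addr // bank_size
        if bank != current:  # a new bank must be switched on
            activations += 1
            current = bank
    return activations


FRAME_WIDTH = 64  # pixels per frame row (assumed)
BLOCK = 8         # 8x8 pixel block
BANK_SIZE = 64    # words per memory bank (assumed)

# Layout A: frame stored row by row, so each block row falls in another bank.
raster_trace = [row * FRAME_WIDTH + col
                for row in range(BLOCK) for col in range(BLOCK)]
# Layout B: the block's 64 pixels stored contiguously.
tiled_trace = list(range(BLOCK * BLOCK))

print(bank_activations(raster_trace, BANK_SIZE),
      bank_activations(tiled_trace, BANK_SIZE))
```

Reading the same 64 pixels activates eight banks in the first layout and one in the second, so a layout matched to the access pattern lets more banks stay idle.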
a).5 Low-Power Local Bus Architectures
Video compression tools are memory intensive.
Therefore, local memory architectures are used to
avoid system bus conflicts, lighten system bus
management, and speed up system-level bus
transfers. These low-power on-chip memory techniques
are also meant to eliminate power-inefficient
system-bus transfers. In the literature, buses have
also proved to be a significant source of power
loss, especially in the case of wide (32 to 64 lines)
inter-chip buses, where each line requires substantial
drivers. One approach to limiting the switching on
these lines is to integrate data encoding techniques
(e.g., Gray code for address lines, or transmitting the
difference between successive address values) into
the bus controllers to reduce the switching activity
on the bus. The encoding and decoding are executed on
the fly and reduce the power loss.
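The benefit of Gray coding on address lines can be checked by counting bit transitions for a run of sequential addresses. The sketch uses the standard binary-reflected Gray code; the 16-address trace is only an example.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bit_transitions(words):
    """Total number of bus lines that toggle between successive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


addresses = list(range(16))  # sequential addresses 0..15
binary_toggles = bit_transitions(addresses)
gray_toggles = bit_transitions([to_gray(a) for a in addresses])
print(binary_toggles, gray_toggles)
```

For sequential accesses exactly one line toggles per step under Gray coding, compared with almost two on average for plain binary in this trace. The gain shrinks for non-sequential address patterns.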
IV Low Power Behavioral Optimization
At a high level, hardware/software architectures
can be tailored to achieve low power consumption
by synthesizing dynamically parameterized algorithms
[6, 9]. These architectures are able to adapt to
varying rates of video information and its associated
compression computations. Other behaviorally
power-optimized approaches suitable for video
compression tools are functionally reconfigurable
algorithms and architectures, which have the potential
to reduce power consumption by adjusting their
dimensions. For example, in the case of the motion
estimation tool, a dynamically resized memory can
implement a flexible search area for motion vectors
[10]. SAD calculation cancellation techniques can
also be employed in the motion estimation search
strategy and block-matching algorithms to cease the
computation when the power consumption exceeds a
certain limit.
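SAD cancellation can be sketched as an early exit from the accumulation loop. The cutoff used here is the best SAD found so far, a common early-exit criterion chosen for illustration; tying the cutoff to a measured power limit, as the text describes, would need a platform-specific power estimate that this sketch does not model. All names and values are hypothetical.

```python
def sad_with_cutoff(cur, ref, cutoff):
    """Accumulate the SAD, abandoning the candidate once cutoff is exceeded.

    Returns (sad, differences_used); sad is None when cancelled early.
    """
    total = 0
    for used, (c, r) in enumerate(zip(cur, ref), start=1):
        total += abs(c - r)
        if total > cutoff:
            return None, used
    return total, len(cur)


def best_match(cur, candidates):
    """Return (best candidate index, its SAD, absolute differences computed)."""
    best_index, best_sad, ops = None, float("inf"), 0
    for index, ref in enumerate(candidates):
        sad, used = sad_with_cutoff(cur, ref, best_sad)
        ops += used
        if sad is not None and sad < best_sad:
            best_index, best_sad = index, sad
    return best_index, best_sad, ops


cur = [10, 20, 30, 40]
candidates = [[12, 20, 30, 41], [90, 20, 30, 40], [10, 20, 31, 40]]
result = best_match(cur, candidates)
print(result)
```

Here the second candidate is cancelled after a single difference, so 9 absolute differences are computed instead of the 12 needed by an exhaustive evaluation.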
Power savings can be achieved in the behavioral
domain by exploiting the shape-adaptive feature
of the MPEG-4 tools described in Section II. Polygon
matching for motion estimation and shape-adaptive
DCT/IDCT are the best known; they reduce the
computation and, indirectly, the power consumption to
what is needed for compressing the shape, motion, and
texture information of each arbitrarily shaped object
in the video scene.
V Low Power Mobile Multimedia Design
Methodology
The complexity of video compression calls for a
System on a Chip (SOC) design methodology in order
to embrace most of its functionality. Mobile
multimedia applications bring yet another level of
power versus performance constraints. Starting from
each MPEG tool specification, investigations have to
be made and a dedicated architectural view formulated
for each tool, based on its behavioral peculiarities.
High-level hardware/software co-design decisions need
to be made in order to decide the ratio between the
hardware and software architectural solutions to be
employed further down in the synthesis/implementation.
The models then have to be validated and tested to
determine whether their functionality meets the
specifications. From this level, silicon product
vendors can take over and bring the optimized SOC to
silicon level.
VI Conclusions
Handheld manufacturers have already started to
eye digital multimedia applications as a huge
source of value-added products to be sold on the
future 3G wireless market. They will therefore
gradually shift their focus towards the video
compression world to find the HW/SW implementations
with the best price/power/performance balance. This
paper tackles this balance from the power consumption
perspective and enumerates the low-power techniques
known in the mobile multimedia literature.
References
[1] O’Connor, N. E. and Murphy, N. A. and Marlow, S.:
Image and Video Compression - Module Notes - School
of Electronics, Dublin City University, 2000-2001.
[2] Muresan, V.: Hardware Accelerating Solu-
tions for Mobile Device Platforms - Inter-
nal Technical Report, Video Media Processing
Lab, Dublin City University, April 2001.
[3] Rabaey, J.: Reconfigurable Processing:
The Solution to Low-Power Programmable DSP,
Proceedings of the IEEE International Conference on
Acoustics, Speech and Signal Processing, April 1997.
[4] Usami, K. and Igarashi, M. and Ishikawa, T.
and Kanazawa, M. and Takahashi, M. and
Hamada, M. and Arakida, H. and Terazawa,
T. and Kuroda, T.: Design Methodology of
Ultra Low-Power MPEG-4 Codec Core
Exploiting Voltage Scaling Techniques,
Proceedings of the 35th IEEE Design Automa-
tion Conference, June 1998, pp.483-488.
[5] Zhang, H. and Wan, M. and George, V. and
Rabaey, J.: Interconnect Architecture Ex-
ploration for Low-Power Reconfigurable
Single-Chip DSPs, Proceedings of the IEEE
Workshop on VLSI Signal Processing, Napa
California, October 1992, pp. 166-174.
[6] Burleson, W. and Tessier, R. and Goeckel,
D. and Swaminathan, S. and Jain, P. and
Euh, J. and Venkatraman, S. and Thyagarajan, V.:
Dynamically Parameterized Algorithms and
Architectures to Exploit Signal Variations for
Improved Performance and Reduced Power, Proceedings
of the International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), USA, May 2001.
[7] George, V. and Zhang, H. and Rabaey, J.: The
Design of a Low Energy FPGA, ISLPED
‘99 - Proceedings of the 1999 International
Symposium on Low Power Electronic Design,
1999, pp. 188-193.
[8] Kim, S. and Ziesler, C. H. and Papaefthymiou,
M. C.: A Reconfigurable Pipelined IDCT
for Low-Energy Video Processing, Pro-
ceedings of the 14th IEEE International
ASIC/SOC Conference, USA, Sep 2001.
[9] Wan, M. and Zhang, H. and George, V. and
Benes, M. and Abnous, A. and Prabhu, V. and
Rabaey, J.: Design Methodology of a Low-
Energy Reconfigurable Single-Chip DSP
System, Journal of VLSI Signal Processing,
2000.
[10] Park, S. R. and Burleson, W.: Reconfig-
uration for Power Saving in Real-Time
Motion Estimation, Proceedings of ICASSP,
1997.
