Techniques For Low Power Motion Estimation In Video Encoders by Gupte, Ajit D
 Abstract 
 
 
This thesis looks at hardware algorithms that help reduce dynamic power dissipation in 
video encoder applications. Computational complexity of motion estimation and the data 
traffic between external memory and the video processing engine are two main reasons for 
large power dissipation in video encoders. While motion estimation may consume 50% to 
70% of total video encoder power, the power dissipated in external memory such as DDR
SDRAM can be of the order of 40% of the total system power. Reducing power dissipation 
in video encoders is important for improving the battery life of mobile devices such as
smartphones and digital camcorders. We propose hardware algorithms that extract only
the important features in the video data to reduce the complexity of computations, 
communications and storage, thereby reducing average power dissipation. We apply this 
concept to design hardware algorithms for optimizing motion estimation matching 
complexity, and reference frame storage and access from the external memory. In addition, 
we develop techniques to reduce the search complexity of motion estimation.
First, we explore a set of adaptive algorithms that reduce average power dissipated due 
to motion estimation. We propose that by taking into account the macro-block level features 
in the video data, the average matching complexity of motion estimation in terms of number 
of computations in real-time hardwired video encoders can be significantly reduced when 
compared with traditional hardwired implementations, which are designed to handle the most
demanding data sets. Features of the current macro-block, such as pixel variance and Hadamard
transform coefficients, are analyzed and used to adapt the matching complexity. The
macro-block is partitioned based on these features to obtain sub-block sums, which are used
for matching operations. Thus, simple macro-blocks without many features can be matched
with far fewer computations than macro-blocks with complex features, leading
to a reduction in average power dissipation. Apart from optimizing the matching operation,
optimizing the search operation is a powerful way to reduce motion estimation complexity. 
We propose novel search optimization techniques including (1) a center-biased search order 
and (2) skipping unlikely search positions, both applied in the context of real-time hardware
implementation. The proposed search optimization techniques take into account and are 
compatible with the reference data access pattern from the memory as required by the 
hardware algorithm. We demonstrate that the matching and searching optimization
techniques together achieve a nearly 65% reduction in power dissipation due to motion
estimation, without any significant degradation in motion estimation quality.
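
The matching and search ideas described above can be illustrated with a small software sketch. It is only an illustration under stated assumptions, not the hardware algorithm of the thesis: it uses pixel variance alone (the Hadamard-based analysis is omitted), assumes a 16x16 macro-block, and every function name and threshold value is hypothetical. A flat macro-block is split into a few large sub-blocks, so matching compares only a handful of sums, while a detailed macro-block is split into many small ones.

```python
def block_variance(block):
    """Population variance of all pixels in a square block (list of rows)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def choose_subblock_size(block, thresholds=((25.0, 8), (400.0, 4))):
    """Pick a sub-block edge length from the block variance.

    Low-variance (simple) blocks get large sub-blocks and therefore few sums.
    The threshold values are hypothetical, chosen only for illustration.
    """
    variance = block_variance(block)
    for limit, size in thresholds:
        if variance < limit:
            return size
    return 2  # most detailed blocks: finest partition in this sketch


def subblock_sums(block, size):
    """Sum the pixels of each size x size sub-block, in raster order."""
    n = len(block)
    return [
        sum(block[y][x] for y in range(by, by + size) for x in range(bx, bx + size))
        for by in range(0, n, size)
        for bx in range(0, n, size)
    ]


def matching_cost(cur_sums, ref_block, size):
    """Sum of absolute differences between sub-block sums (not per-pixel SAD)."""
    ref_sums = subblock_sums(ref_block, size)
    return sum(abs(c - r) for c, r in zip(cur_sums, ref_sums))


def center_biased_order(search_range):
    """All (dx, dy) offsets within +/- search_range, nearest to the center first."""
    r = search_range
    offsets = [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    return sorted(offsets, key=lambda o: (abs(o[0]) + abs(o[1]), o))
```

In this sketch a uniform 16x16 block needs only four sub-block sums per candidate position, whereas a checkerboard block needs 64. Visiting candidates in `center_biased_order` tries the zero-motion position first, which is where an early-termination or position-skipping rule would most often stop.
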
A key to low power dissipation in video encoders is minimizing the data traffic between 
external memory devices, such as DDR SDRAM, and the video processor. External
memory power can be as high as 50% of the total power budget in a multimedia system. 
Apart from the power dissipated in external memory, the amount of data traffic is an
important parameter that has a significant impact on the system cost. Large memory traffic
necessitates high speed external memories, high speed on-chip interconnect, and more 
parallel I/Os to increase the memory throughput. This leads to higher system cost. We 
explore a lossy, scalar-quantization-based reference frame compression technique that can be
used to reduce the amount of reference data traffic from external memory devices 
significantly. In this scheme, the quantization is adapted based on the pixel range within each 
block being compressed. We show that the error introduced by the scalar quantization is 
bounded and can be represented by a smaller number of bits than the original pixel.
The proposed reference frame compression scheme uses this property to minimize the motion 
compensation related traffic, thereby improving the compression scheme efficiency. The 
scheme maintains a fixed compression ratio, and the size of the quantization error is also kept 
constant. This enables easy storage and retrieval of reference data. The impact of using a lossy
reference on the motion estimation quality is negligible. As a result of the reduction in DDR
traffic, the DDR power is reduced significantly. The power dissipation due to additional 
hardware required for reference frame compression is very small compared to the reduction 
in DDR power. A 24% reduction in peak DDR bandwidth and a 23% net reduction in average
DDR power are achieved. For video sequences with larger motion, the bandwidth
reduction is even higher (close to 40%), and the reduction in power is close to 30%.
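
The range-adaptive quantization idea can be sketched as follows. This is a generic uniform scalar quantizer whose step is derived from the pixel range of each block, written only to illustrate why the code size is fixed and the error is bounded. It is not the compression scheme of the thesis, and the function names, the 8-bit pixel assumption, and the default of 4 bits per code are hypothetical.

```python
def compress_block(pixels, bits=4):
    """Quantize a block of pixels to fixed-width codes of `bits` bits each.

    Returns (lo, step, codes). The step grows with the pixel range of the
    block, so the number of bits per code never changes.
    """
    lo, hi = min(pixels), max(pixels)
    levels = 1 << bits
    # Smallest integer step for which every code fits in `bits` bits
    # (ceiling division of the number of distinct values by the level count).
    step = max(1, -(-(hi - lo + 1) // levels))
    codes = [(p - lo) // step for p in pixels]
    return lo, step, codes


def decompress_block(lo, step, codes, max_value=255):
    """Reconstruct each pixel near the middle of its quantization interval."""
    return [min(max_value, lo + c * step + (step - 1) // 2) for c in codes]


def quantization_errors(pixels, reconstructed):
    """Per-pixel error; its magnitude never exceeds step // 2 in this sketch."""
    return [p - r for p, r in zip(pixels, reconstructed)]
```

For example, the block `[100, 103, 110, 115]` with `bits=2` gives `lo = 100`, `step = 4` and codes `[0, 0, 2, 3]`, which reconstruct to `[101, 101, 109, 113]`. Every error lies within plus or minus 2, so it needs far fewer bits than the original 8-bit pixel.
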
