Frame Buffer Energy Optimization by Pixel Prediction
K. Patel, E. Macii, M. Poncino
Dipartimento di Automatica e Informatica – Politecnico di Torino
10129 Torino ITALY
Abstract
We propose a technique to reduce the energy consumption
of the frame buffer memory, based on the spatial locality
of images and display frames. Our scheme reduces energy
by selectively avoiding reads from the frame buffer when
identical adjacent pixels are detected. This is made possible
by using an auxiliary memory that stores the locality
information. The proposed architecture allows the locality
information to be updated dynamically and, unlike previous
approaches, it works virtually independently of the size and
position of the updates of the display frames. Experimental
results on a set of typical graphical applications
show a reduction of about 40% in frame buffer reads.
1 Introduction
Energy consumption is regarded as a critical issue in embedded
systems, mostly because they are often battery-operated
devices. The advent of devices equipped with an LCD
display has exacerbated the energy issue because the LCD
subsystem consumes considerable power (it can easily exceed
the 1 W mark [1]).
LCDs differ from other types of resources because they are
intrinsically non-power-manageable: an LCD must be
continuously refreshed, and shutting it down incurs
penalties in performance and in image quality.
The display subsystem consists of several components
(frame buffer, frame buffer bus, LCD controller, LCD bus,
LCD panel), all of which have a significant impact on the
total consumption. While the LCD panel is known to be the
most power-hungry component, the frame buffer is the next in
order of importance: since these displays must be refreshed
constantly, the pixel data must be read continuously from the
frame buffer memory. With a typical refresh rate of 50–60 Hz,
the whole frame buffer is read 50–60 times every second.
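To give a sense of the read traffic this implies, the sketch below computes it for continuous refresh. It assumes the 640x480 resolution used in Section 5 and the 24-bit pixels assumed in Section 4.1; the function name is illustrative only.

```python
def frame_buffer_read_rate(width: int, height: int, refresh_hz: float,
                           bits_per_pixel: int = 24):
    """Return (pixel reads per second, bytes read per second) for continuous refresh."""
    pixels_per_frame = width * height
    pixel_reads_per_s = pixels_per_frame * refresh_hz   # every pixel is read once per refresh
    bytes_per_s = pixel_reads_per_s * bits_per_pixel / 8
    return pixel_reads_per_s, bytes_per_s

reads, bw = frame_buffer_read_rate(640, 480, 60)
print(f"{reads:,.0f} pixel reads/s, {bw / 1e6:.1f} MB/s")
```

Every one of these reads is a candidate for elimination when the pixel can be predicted from its neighbour.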
In this work, we focus on the power consumption of the
frame buffer, basically a large DRAM array that stores the
frame to be sent to the LCD controller. Unlike a conventional
DRAM, which may contain mixed data and instructions
with poor correlation among them, the frame buffer
contains frames of an application, whose pixels exhibit a
well-defined correlation.
We propose a frame buffer energy optimization that exploits
such (spatial) correlation, broadly defined as the presence of
many adjacent pixels with similar values. We call our scheme
pixel prediction because, when two identical consecutive
pixels are detected, the currently read pixel is predicted
from the previous pixel value, thus avoiding a read from
the frame buffer. This is made possible by using an auxiliary
memory that stores the locality information.
The proposed technique yields promising results: measured
on a set of display configurations of applications typical
of LCD-based devices (spreadsheets, movie players, image
viewers), our scheme saves about 40% of the reads
from the frame buffer, roughly corresponding to an equivalent
reduction in energy.
2 Previous Work
Energy optimization techniques for the display subsystem
are relatively new in the low-power design domain. In [1],
the authors propose various schemes to reduce the display
subsystem's energy consumption, such as a variable duty ratio
with less frequent refreshes, dynamic depth control at a
tolerable loss of quality, and brightness compensation and
contrast enhancement for the backlight.
Despite the high energy share of frame buffers, little work
has addressed them. Frame buffers are essentially
DRAMs and, in principle, any low-power technique for
DRAMs could also be applied to frame buffers. These
techniques, however, usually focus on refresh energy, since
it is the dominant component for off-chip memories [4, 5],
unlike frame buffers, for which read energy is dominant.
The approach proposed in [2] is close to ours: based on the
observation that frames have a lot of spatial locality and the
same color is often repeated several times consecutively, the
authors encode colors by storing only the first occurrence of
each color along with its number of consecutive occurrences
(i.e., run-length encoding, RLE). Partial screen updates are
managed by dividing the screen into a number of blocks and
performing updates on each block separately.
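The run-length idea can be illustrated with the minimal sketch below. It is plain RLE over one row of pixels, not the actual storage format of [2]; the function names are hypothetical and pixels are assumed to be packed 24-bit RGB integers.

```python
from typing import List, Tuple

Pixel = int  # assumption: 24-bit RGB value packed as 0xRRGGBB

def rle_encode(pixels: List[Pixel]) -> List[Tuple[Pixel, int]]:
    """Store each color once, together with the length of its run."""
    runs: List[Tuple[Pixel, int]] = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            color, count = runs[-1]
            runs[-1] = (color, count + 1)   # extend the current run
        else:
            runs.append((p, 1))             # start a new run
    return runs

def rle_decode(runs: List[Tuple[Pixel, int]]) -> List[Pixel]:
    """Expand (color, count) pairs back into a pixel row."""
    out: List[Pixel] = []
    for color, count in runs:
        out.extend([color] * count)
    return out

row = [0xFFFFFF] * 5 + [0x000000] * 2 + [0xFFFFFF]
print(rle_encode(row))  # [(16777215, 5), (0, 2), (16777215, 1)]
```

Changing a single pixel can split or merge runs, so the encoding after it must be recomputed; this is why [2] confines the recomputation to blocks.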
In this paper, we propose a simple scheme that effectively
achieves the same objective with virtually no limitation on
run length or update size. Each pixel is augmented with an
extra flag bit that indicates whether the pixel is the same as
the previous pixel. This simple scheme avoids, on average, 40%
of frame buffer reads, and is independent of the size and
position of updates.
Proceedings of the 2005 International Conference on Computer Design (ICCD’05) 
0-7695-2451-6/05 $20.00 © 2005 IEEE 
Authorized licensed use limited to: University of Southern California. Downloaded on April 08,2010 at 20:17:38 UTC from IEEE Xplore.  Restrictions apply. 
3 Motivation
Frames of typical applications generally exhibit a significant
amount of spatial locality, which manifests as a strong
correlation between adjacent pixels.
More generally, correlation among pixels exists in both the
horizontal (H) and vertical (V) dimensions, which generalizes
the notion of “adjacency”. Moreover, besides correlation on
each individual RGB component, joint correlation over the
three channels may also be significant.
If we consider all the possible variants, we can identify four
“types” of locality: Type A (1-D locality in the H direction
only, joint RGB), Type B (2-D locality in the H and V
directions, joint RGB), Type C (1-D locality in the H direction
only, individual RGB), and Type D (2-D locality in the H and V
directions, individual RGB).
We ran a test on a set of images representing frames
of typical applications and collected data about these four
types of correlation. Correlation is measured as the
percentage of adjacent pixels that are identical, where
adjacency is defined by each correlation type. The results,
shown in Figure 1, indicate that all four types of correlation
are significant, but that their magnitude depends on the
nature of the frame.
Figure 1. Amount of Various Locality Types.
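The measurement behind Figure 1 can be approximated with the sketch below. The exact counting rules are not specified in the text, so the following are assumptions: a pixel (Types A, B) or a single color component (Types C, D) counts as predictable if it equals its left neighbour (1-D) or its left or upper neighbour (2-D); pixels without such a neighbour count as unpredictable; percentages are taken over all pixels or all components. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

def locality_types(img: List[List[RGB]]) -> Dict[str, float]:
    """Percentage of predictable pixels (Types A, B) or components (Types C, D)."""
    h, w = len(img), len(img[0])
    hits = {"A": 0, "B": 0, "C": 0, "D": 0}
    for i in range(h):
        for j in range(w):
            p = img[i][j]
            left = img[i][j - 1] if j > 0 else None   # H neighbour
            up = img[i - 1][j] if i > 0 else None     # V neighbour
            # Joint RGB: the whole pixel must equal a neighbour.
            hits["A"] += p == left
            hits["B"] += p == left or p == up
            # Individual RGB: each of the three components is tested on its own.
            for k in range(3):
                same_h = left is not None and p[k] == left[k]
                same_v = up is not None and p[k] == up[k]
                hits["C"] += same_h
                hits["D"] += same_h or same_v
    n = h * w
    return {"A": 100 * hits["A"] / n, "B": 100 * hits["B"] / n,
            "C": 100 * hits["C"] / (3 * n), "D": 100 * hits["D"] / (3 * n)}

W, R = (255, 255, 255), (255, 0, 0)
print(locality_types([[W, W], [W, R]]))  # A = 25.0, B = 50.0, C ≈ 33.3, D ≈ 58.3
```

Even on this tiny image the ordering A ≤ B and C ≤ D holds, reflecting that the 2-D types can only find more matches than their 1-D counterparts.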
Based on this quantitative analysis, we propose an
energy-efficient frame buffer architecture that exploits
inter-pixel correlation. The basic principle of our solution is
to avoid reading pixels whose content can be predicted.
Information about the predictability of pixel values needs to
be stored in an additional support memory, which is accessed
in parallel with each frame buffer access. Energy is saved by
skipping the frame buffer read whenever this support memory
indicates that the pixel can be predicted. The next section
describes the architectural details of the proposed scheme.
4 Pixel Prediction Architecture
The structure of the pixel prediction architecture depends on
which type of correlation is exploited. Figure 1 shows a
tradeoff (more evident for “richer” images) between correlation
complexity and predictability of the values. Type A correlation
is the simplest one and requires the least overhead. As we move
from Type A to Type D, complexity and overhead increase. 2-D
correlation in fact requires specifying which direction the
correlation refers to, while individual RGB correlation requires
storing the extra information on a component-by-component
basis (as opposed to pixel-by-pixel). Besides the amount of
extra information required, the extra circuitry for updating
this information must also be considered.
In order to keep energy and performance overhead to a
minimum, we consider here an architectural implementation
of Type A correlation. Our approach requires modifying the
conventional frame buffer architecture. The modified frame
buffer (FB) uses a support buffer (Prediction Buffer, PB)
that stores one bit per pixel, indicating whether that pixel is
identical to the previous one. More precisely, element
$PB(i, j)$ contains a ’1’ if $FB_{i,j-1} \equiv FB_{i,j}$ (where
$FB_{i,j}$ denotes the $(i, j)$-th pixel), and ’0’ otherwise.
Every row of the FB has a corresponding row in the PB that
holds the flag bits of the pixels in that row.
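The definition of the PB translates directly into the sketch below. It assumes pixels are stored as packed 24-bit integers and, since the definition leaves $PB(i, 0)$ open, conservatively sets the flag of the first pixel of each row to ’0’; the function names are hypothetical.

```python
from typing import List

def build_prediction_buffer(fb: List[List[int]]) -> List[List[int]]:
    """Type A prediction bits: PB[i][j] = 1 iff FB[i][j-1] == FB[i][j]."""
    # Assumption: the first pixel of a row (j = 0) is never predicted.
    return [[int(j > 0 and row[j - 1] == row[j]) for j in range(len(row))]
            for row in fb]

def predicted_fraction(pb: List[List[int]]) -> float:
    """Fraction of pixels whose FB read could be skipped."""
    total = sum(len(r) for r in pb)
    return sum(sum(r) for r in pb) / total if total else 0.0

fb = [[0xFFFFFF, 0xFFFFFF, 0xFFFFFF, 0x0000FF],
      [0x0000FF, 0x0000FF, 0x000000, 0x000000]]
pb = build_prediction_buffer(fb)
print(pb)                      # [[0, 1, 1, 0], [0, 1, 0, 1]]
print(predicted_fraction(pb))  # 0.5
```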
4.1 Read Architecture
We assume that the FB stores an image consisting of M · N
pixels, and that each row of the FB consists of n bits,
corresponding to m pixels. Assuming 24-bit pixel values,
n = 24 · m. Figure 2 shows the read architecture in more detail.
Figure 2. Selective Read Architecture.
During the read of a row x of the FB, the next row x + 1
of the PB is also read. The value read is stored in an m-bit
register R, which contains the information about the
“correlation” of the m pixels of row x + 1.
When we come to read the next row x + 1 of the FB, the
prediction bits corresponding to this row, stored in R, are
used to drive the selective read block, which drives the local
word lines of each pixel so that the pixels marked as predicted
in R are not read. For a single row, this architecture is
similar to the divided word-line (DWL) approach [7], with the
difference that the local word line corresponding to each
pixel is driven by its corresponding prediction bit stored in
register R.
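The row-level behaviour of the selective read can be modelled as below. This is a behavioural sketch, not a circuit description, under these assumptions: FB rows are memory rows of m pixels, prediction is linear across row boundaries (so the first pixel of a row may be predicted from the last pixel of the previous row), and R is preloaded with PB row 0 before a refresh starts. The function name is hypothetical.

```python
from typing import List, Optional

def selective_read(fb_rows: List[List[int]],
                   pb_rows: List[List[int]]) -> List[List[Optional[int]]]:
    """Simulate one refresh; return FB rows as seen on the bitlines.

    Skipped pixels (local word line disabled, no pre-charge) appear as None.
    """
    fetched: List[List[Optional[int]]] = []
    r = list(pb_rows[0])            # assumption: R is preloaded with PB row 0
    for x, fb_row in enumerate(fb_rows):
        # Pixel k of row x is read only if its prediction bit in R is 0.
        fetched.append([p if bit == 0 else None for p, bit in zip(fb_row, r)])
        # In parallel with FB row x, PB row x + 1 is read into R.
        if x + 1 < len(pb_rows):
            r = list(pb_rows[x + 1])
    return fetched

fb_rows = [[5, 5, 5, 7], [7, 7, 1, 1]]
pb_rows = [[0, 1, 1, 0], [1, 1, 0, 1]]
out = selective_read(fb_rows, pb_rows)
print(out)  # [[5, None, None, 7], [None, None, 1, None]]
print(sum(p is not None for row in out for p in row), "of 8 pixels read")
```

Only 3 of the 8 pixels reach the bitlines; the remaining values are reconstructed at the output from the last pixel value.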
Selective pre-charge is the mechanism by which the
architecture saves energy: based on the bits stored in R, we
avoid reading the corresponding pixels by preventing the
pre-charge of their bitlines and by cutting off the local
word line from the global one for each such pixel. The whole
architecture relies on the fact that reads to the FB are always
sequential, thereby avoiding the risk of overwriting R with
wrong values. Sequential access also occurs in the case
of updates of the FB, as will be shown later.
The content of R is also fed to the Predictive Readout block,
which, based on the current address and the prediction bit,
selects the value sent to the output.
Figure 3. Predictive Readout Circuit.
Figure 3 depicts how the values are read out. The LSBs of
the address identify the pixel currently being read and select
both its prediction bit among the m prediction bits of R and
its value among the n bits of the FB output buffer. If the
flag bit for the current pixel is ’1’, the values stored in
the registers (i.e., the last pixel value) are passed on to the
output; these registers always store the last pixel value sent
to the output. Otherwise, the actual pixel value is read out.
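A behavioural model of the readout multiplexer follows. It assumes m is a power of two (so the address LSBs equal `address % m`) and represents skipped pixels in the FB output buffer as `None`; the class and method names are hypothetical.

```python
from typing import List, Optional

class PredictiveReadout:
    """Output stage model: `last` stands in for the last-pixel registers."""

    def __init__(self, m: int) -> None:
        self.m = m                      # pixels per FB row (assumed a power of two)
        self.last: Optional[int] = None

    def output(self, address: int, r_bits: List[int],
               fb_buffer: List[Optional[int]]) -> int:
        k = address % self.m            # equals the address LSBs when m is a power of two
        if r_bits[k] == 1:
            value = self.last           # predicted: repeat the last pixel sent out
        else:
            value = fb_buffer[k]        # not predicted: this pixel was actually read
        if value is None:
            raise ValueError("pixel was neither read nor predictable")
        self.last = value               # registers always hold the last output value
        return value

ro = PredictiveReadout(4)
print([ro.output(a, [0, 1, 1, 0], [5, None, None, 7]) for a in range(4)])
print([ro.output(a, [1, 1, 0, 1], [None, None, 1, None]) for a in range(4, 8)])
```

Because `last` persists across rows, the first pixel of a row can be predicted from the final pixel of the previous one.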
4.2 Write Architecture
When frames are written into the FB, we need to determine
whether the pixel being written is identical to the previous
one. This is done by the architecture shown in Figure 4, which
emphasizes only the circuitry needed for writing values in the
FB and the PB; the selective pre-charge circuitry of Figure 2
is not shown.
Figure 4. Write Architecture.
All the overhead required by writes is due to the update
of the PB; the FB is updated as in a regular write
operation.
The upper part of the circuit detects whether two consecutively
issued addresses x and y are in sequence, i.e., whether
y = x + 1. Similarly, the bottom part of the circuit checks
whether two consecutive pixels have the same value. If both
conditions hold, we write ’1’ in the PB for the current
address, and ’0’ otherwise.
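The two comparators amount to the following logic, sketched here as a small state machine that remembers the previous write. Pixels are assumed to be packed integers and the class name is hypothetical.

```python
from typing import Optional

class PBWriteLogic:
    """Computes the PB flag for each FB write (Type A, joint RGB)."""

    def __init__(self) -> None:
        self.prev_addr: Optional[int] = None
        self.prev_pixel: Optional[int] = None

    def flag(self, addr: int, pixel: int) -> int:
        # Top comparator: is this address the successor of the previous one?
        in_sequence = self.prev_addr is not None and addr == self.prev_addr + 1
        # Bottom comparator: is this pixel equal to the previously written one?
        same_value = pixel == self.prev_pixel
        self.prev_addr, self.prev_pixel = addr, pixel
        return int(in_sequence and same_value)

logic = PBWriteLogic()
writes = [(0, 5), (1, 5), (2, 5), (3, 7), (4, 7), (10, 7)]
print([logic.flag(a, p) for a, p in writes])  # [0, 1, 1, 0, 1, 0]
```

The last write shows why both conditions are needed: the value matches, but the address jump means the previous pixel in the frame is unknown, so the flag stays ’0’.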
4.2.1 Write Updates
One important issue is related to FB updates. Interactive
applications (e.g., spreadsheets, warehouse managers)
usually perform small, regular updates, while updates of the
entire screen happen only occasionally. In [2], the authors
propose dividing the whole image into small blocks.
Without this division, if the image is considered as a whole,
then even for small updates the RLE compression has to be
recomputed for the entire image. By dividing the image into
blocks, the update has to be performed only for the blocks
affected by the update, and hence RLE has to be recomputed
only on those blocks.
We propose a more elegant update scheme suited to our
architecture, which does not require dividing the screen
into blocks: updates can be of any size. If only a small block
of the screen is updated, then after a few consecutive
addresses the address sequence jumps by a fixed stride.
For example, assume an image of size 640x480 with an
update of size 16x16 pixels whose top-left corner is at
coordinates (x, y). After the address y · 640 + x + 15, the
last updated pixel of the first row, the next address is not
y · 640 + x + 16 but (y + 1) · 640 + x, since consecutive rows
of the update are 640 addresses (the width of the image)
apart. This indicates that we reached the boundary of our
rectangular update, and it is signaled by a ’0’ at the output
of the top address comparator of Figure 4. Whenever this is
the case, we perform two writes to the PB. The first is for
the predicted address, which in this case is
y · 640 + x + 16. Since pixel y · 640 + x + 15 has been
updated, the status of y · 640 + x + 16 is no longer known
(it may or may not be the same as y · 640 + x + 15). For
simplicity, and to avoid reading the value at
y · 640 + x + 16, its flag bit is conservatively set to ’0’.
The second write is for the current address, which in this
case is (y + 1) · 640 + x, since we crossed the boundary of
the rectangle. Therefore, updates introduce little overhead
in our architecture: one extra PB write per row of the update.
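The two-write rule can be simulated on flat, row-major FB and PB arrays as sketched below. The text does not say how the pixel after the very last updated pixel is handled when no further write triggers a jump; the sketch assumes it is invalidated when the update ends. The function and helper names are hypothetical.

```python
from typing import List, Optional

def apply_partial_update(fb: List[int], pb: List[int], width: int,
                         x0: int, y0: int, block: List[List[int]]) -> int:
    """Write a rectangular update at (x0, y0); return the number of extra PB writes."""
    extra = 0
    prev_addr: Optional[int] = None
    prev_pixel: Optional[int] = None

    def invalidate_successor(addr: int) -> None:
        # The pixel after the last written one may no longer match its new left
        # neighbour; its flag is conservatively cleared without reading it.
        nonlocal extra
        if addr + 1 < len(pb):
            pb[addr + 1] = 0
            extra += 1

    for dy, row in enumerate(block):
        for dx, pixel in enumerate(row):
            addr = (y0 + dy) * width + (x0 + dx)
            in_seq = prev_addr is not None and addr == prev_addr + 1
            if prev_addr is not None and not in_seq:
                invalidate_successor(prev_addr)            # write 1: predicted address
            fb[addr] = pixel
            pb[addr] = int(in_seq and pixel == prev_pixel)  # write 2: current address
            prev_addr, prev_pixel = addr, pixel
    if prev_addr is not None:
        invalidate_successor(prev_addr)   # assumption: flush when the update ends
    return extra

fb = [0] * 32                 # 8x4 image, all pixels identical
pb = [0] + [1] * 31
print(apply_partial_update(fb, pb, 8, 3, 1, [[9, 9], [9, 9]]))  # 2
```

A 2x2 update thus costs two extra PB writes, one per updated row, regardless of where the rectangle sits on the screen.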
5 Experimental Results
We have considered a set of applications similar to those
of [2]: a spreadsheet, two image viewers, a movie player
(three different frames), a map viewer, and a PDF reader.
All the screenshots are 640x480 pixels (not shown here due
to space limitations).
Figure 5 shows the percentage of pixel reads saved for the
various screenshots. The data include the overhead for PB
reading, which is quantified in Section 5.2.
Figure 5. Savings Compared to Regular FB.
The plot does not account for possible partial updates of
the frames, which may reduce the savings, since they
locally alter the correlation. Moreover, the arrival rate
of full-screen updates also affects the savings. In fact,
our architecture saves only frame buffer reads, and a high
arrival rate of updates reduces the ratio of reads to writes
of the frame buffer, thus reducing the savings.
For most applications the update rate is quite low, and it
is thus not taken into account in the results plotted in
Figure 5. One application for which the full-screen update
rate is critical is the movie player. Assuming that updates
arrive at approximately 20 frames/s and that the screen
refresh rate is 60 Hz (i.e., the screen is refreshed 60 times
per second), each frame is read only 3 times between two
consecutive updates, a much lower rate than for the other
applications.
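The read-to-update ratio for the movie player follows directly from the two rates quoted above; the symbols $f_{\text{refresh}}$ and $f_{\text{update}}$ are introduced here only for illustration:

```latex
N_{\text{reads per update}} = \frac{f_{\text{refresh}}}{f_{\text{update}}}
  = \frac{60\ \text{frames/s}}{20\ \text{frames/s}} = 3
```

Since the PB is rewritten along with every full-screen update, the write-side cost is amortized over only these 3 reads, whereas for applications that update rarely it is amortized over many more.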
5.1 Write Update Analysis
The impact of updates is relevant mostly for applications
that exhibit partial updates; full updates do not
change the result since they replace an entire frame with a
new one. We analyze the effects of five partial updates,
applied in sequence, on a spreadsheet application. The first
three updates modify a cell, while the last two updates open
some of the pull-down menus. Unlike the block-based update
model of [2], our update model is independent of the
update position. It updates only the status of the pixels of
the updated block, plus an extra column, as described in
Section 4.2.1, which is negligible in terms of overhead.
In modeling these updates, we have ignored their arrival
rate. An update of a frame is treated as follows: after an
update of the frame buffer, the fraction of pixels whose
prediction bit is still ’1’ is used as an estimate of how much
correlation this partial update has introduced into or
destroyed in the frame.
The results show that the first three updates exhibit similar
results, that is, about 78% of prediction bits set to ’1’. The
other two updates, however, have a more visible effect on
the results. Pulling down a menu brings in more correlated
pixels than before (an area with many pixels with white
background), thus further reducing the number of pixel reads.
The reduction is, however, marginal due to the relatively small
size of the menus: the percentage of prediction bits set to ’1’
increases to about 79% for these two updates.
5.2 Overhead Analysis
In the proposed architecture, we have an extra bit per pixel,
stored in the PB. Since we assume 24 bits per pixel, as in a
full-color display, the overhead in terms of memory cells is
about 4% (1/24). Considering that the flag bits are stored
in a separate array, we conservatively estimate the energy
cost of reading a PB bit to be 10% of that of reading a
pixel in the FB. This is the overhead value considered in
determining the savings plotted in Figure 5.
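These two overhead figures can be combined into a simple cost model, sketched below under stated assumptions: one FB pixel read costs 1 unit, every pixel also reads its PB bit at 0.1 units (the conservative figure above), and predicted pixels skip the FB read entirely. The model and function names are illustrative, not taken from the paper's evaluation flow.

```python
def memory_overhead(bits_per_pixel: int = 24, flag_bits: int = 1) -> float:
    """Extra PB storage relative to the FB, as a fraction."""
    return flag_bits / bits_per_pixel

def net_read_saving(predicted_fraction: float, pb_read_cost: float = 0.10) -> float:
    """Relative read-energy saving versus a regular FB under the assumed cost model."""
    baseline = 1.0                                             # one FB read per pixel
    proposed = (1.0 - predicted_fraction) + pb_read_cost       # unpredicted reads + PB bit
    return (baseline - proposed) / baseline

print(f"memory overhead: {memory_overhead():.1%}")
for p in (0.0, 0.25, 0.5, 0.75):                               # hypothetical fractions
    print(f"predicted {p:.0%} -> net saving {net_read_saving(p):+.0%}")
```

Under this model the scheme breaks even once about 10% of the pixels are predictable, and a frame with half of its pixels predicted would yield a net saving of about 40%.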
We have implemented the control circuit and synthesized
it with Synopsys Design Compiler on a 0.13 µm library by
STMicroelectronics. Simulations show that the control
circuit consumes 3 pJ per pixel per write and 0.17 pJ per
pixel per read. As a rough comparison, the typical energy of a
read access to a DRAM array in a similar technology is about
three orders of magnitude larger: assuming a typical current
of about 100 mA, an access time of 10–20 ns, and a supply
voltage of 1.5 V yields 1.5–3 nJ, i.e., on the order of 1 nJ
for a single access.
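The DRAM estimate is simply $E = I \cdot V \cdot t$. The sketch below evaluates it for the values quoted above and compares the result with the control-circuit figures; the function and constant names are hypothetical.

```python
def access_energy_nj(current_a: float, voltage_v: float, time_ns: float) -> float:
    """E = I * V * t; amperes * volts * nanoseconds gives nanojoules."""
    return current_a * voltage_v * time_ns

CTRL_WRITE_PJ = 3.0    # control-circuit energy per pixel write (from the text)
CTRL_READ_PJ = 0.17    # control-circuit energy per pixel read (from the text)

for t_ns in (10.0, 20.0):
    e_nj = access_energy_nj(0.100, 1.5, t_ns)
    print(f"t = {t_ns:.0f} ns: E = {e_nj:.1f} nJ, "
          f"{e_nj * 1e3 / CTRL_WRITE_PJ:.0f}x the write overhead, "
          f"{e_nj * 1e3 / CTRL_READ_PJ:.0f}x the read overhead")
```

Depending on the access time and on which control-circuit figure is used, the ratio falls between roughly 500 and 18,000, so “about three orders of magnitude” is a reasonable round figure.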
6 Conclusions
The frames stored in frame buffers exhibit strong spatial
locality. We have proposed a compression architecture that
exploits this locality by using one bit per pixel to indicate
whether a pixel is the same as the previous one; the extra bit
is used to avoid reading such pixels. Our approach achieves a
39% relative reduction in pixel reads from the frame buffer.
We have also shown that our approach works independently of
the size and position of the updates.
References
[1] I. Choi, H. Shim, N. Chang, “Low-power color TFT LCD display
for hand-held embedded systems,” ISLPED’02, pp. 112–117,
August 2002.
[2] H. Shim, N. Chang, M. Pedram, “A compressed frame buffer to
reduce display power consumption in mobile systems,” ASP-DAC
2004, pp. 819–824, January 2004.
[3] L. Benini, A. Bogliolo, G. De Micheli, “A survey of design
techniques for system-level dynamic power management,” IEEE
Transactions on VLSI Systems, Vol. 8, No. 3, pp. 299–316,
June 2000.
[4] T. Ohsawa, K. Kai, K. Murakami, “Optimizing the DRAM Refresh
Count for Merged DRAM/Logic LSIs,” ISLPED’98, pp. 82–87,
August 1998.
[5] J. Kim, M. C. Papaefthymiou, “Block-Based Multi-period Dynamic
Memory Design for Low Data-Retention Power,” IEEE Transactions
on VLSI Systems, Vol. 11, No. 6, pp. 1006–1018, December 2003.
[6] K. Inoue et al., “A 10 Mb 3D frame buffer memory with Z-compare
and alpha-blend units,” ISSCC’95, pp. 129–134, February 1995.
[7] M. Yoshimoto et al., “A divided word-line structure in the static
RAM and its application to a 64K full CMOS RAM,” IEEE Journal
of Solid-State Circuits, Vol. 18, No. 5, pp. 479–485, October
1983.
