This paper describes a novel memory hierarchy and a line-pixel-lookahead (LPL) scheme for an H.264/AVC video decoder. The memory system is the bottleneck of most video processors, particularly for the newly announced H.264/AVC standard. This is because H.264/AVC uses neighboring pixels to create a reliable predictor, which leads to a dependency on a long past history of data. The problem can be resolved by allocating more memory, but at the cost of large silicon area and power consumption. We first review the existing solutions and then propose a three-level memory hierarchy with line-pixel-lookahead to improve access efficiency. The three-level memory hierarchy comprises registers, content/slice SRAM, and external frame DRAM. We emphasize the need to consider the secondary level of the hierarchy, the content/slice SRAM, during the design of an H.264/AVC decoder. Specifically, we introduce a slice SRAM and line-pixel-lookahead to lower the memory capacity and external bandwidth. The slice SRAM stores neighboring pixels and avoids re-accessing the same data from DRAM. Line-pixel-lookahead exploits multi-dimensional pixel locality and improves prediction performance by 6.54% on average compared to conventional vertical prediction. Simulation results also show that the proposal achieves a better trade-off among memory allocation, external bandwidth, and power, yielding a 50% reduction in memory power compared to a design that does not exploit the secondary slice SRAM hierarchy.
Introduction
While there has been much work studying memory performance for scientific and general-purpose applications, this paper focuses on the needs of H.264/AVC [1] video applications. H.264/AVC achieves a high compression ratio because it uses neighboring pixels to obtain a reliable predictor and thereby reduces the prediction errors. Compared to the prevalent MPEG-x and H.26x video standards, H.264/AVC [1] decodes the present pixel from a long history of pixel data and therefore requires much intermediate storage in a VLSI implementation. This high data correlation, or dependency, poses a great challenge for the memory subsystem when designing multimedia systems [2] [3] [4].
Many memory-hierarchy-based designs for H.264/AVC have been reported [5-10]. However, they usually adopt bandwidth- and/or memory-capacity-starved design approaches without taking memory allocation and data locality into account. In this paper, we first review existing memory hierarchies in video processing and microprocessor systems. We found that a three-level memory hierarchy, composed of registers, content SRAM, and frame DRAM, is generally accepted in existing H.264/AVC systems. Because of the disparity between the register and DRAM levels, the design of the SRAM level plays an increasing role and will dominate the system area or power requirements. Hence, this paper pays more attention to this level of the hierarchy. The emphasis on SRAM is also apparent from Fig. 1, which shows the die photo of the MPEG-2/H.264 video decoding system in [4]. Most of the usable area is dedicated to the on-chip SRAM and data buffers, which account for 40% of the system area and 70% of the power dissipation. This observation leads us to extend the topic of memory hierarchy in the H.264/AVC video decoding system.
To improve the memory hierarchy in a video decoding system, we exploit a three-level memory hierarchy with a line-pixel-lookahead (LPL) scheme. In addition to the content SRAM in the secondary level of the hierarchy, we introduce a slice SRAM that pre-stores the neighboring pixels to improve access efficiency. Furthermore, the H.264/AVC video standard [1] is characterized by a peculiar access locality: there is a high probability of accessing the logically adjacent pixel in the vertical direction. Because of this predictable data access pattern, a hypothesis-based lookahead scheme has been proposed to predict well in advance what data will be needed, and thereby reduce prediction miss rates [4]. In this paper, we further improve the prediction scheme by using multi-dimensional features and incorporating a 4×3 TAG template. As a result, the proposal reduces miss rates by 6.54% on average compared to the vertical prediction in [4].
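The role of an on-chip neighbor store can be illustrated with a minimal sketch. It is not the proposed architecture: it assumes 4×4 blocks that all use vertical intra prediction, a constant residual, and a placeholder value of 128 where no upper neighbor exists. The names FrameDRAM, vertical_predict, and decode_frame are hypothetical. The sketch only shows that keeping one row of upper neighbors on chip removes the external re-reads of those pixels.

```python
class FrameDRAM:
    """Hypothetical stand-in for external frame memory; counts pixel reads."""

    def __init__(self, width, height):
        self.pixels = [[0] * width for _ in range(height)]
        self.reads = 0

    def read_row(self, y, x0, n):
        self.reads += n  # every pixel fetched counts as one external read
        return self.pixels[y][x0:x0 + n]

    def write_block(self, y0, x0, block):
        for dy, row in enumerate(block):
            self.pixels[y0 + dy][x0:x0 + len(row)] = row


def vertical_predict(top):
    """Vertical intra prediction: every row of the block copies the pixels above it."""
    return [list(top) for _ in range(len(top))]


def decode_frame(width, height, residual, use_line_buffer, n=4):
    """Decode a frame of n x n blocks, optionally using an on-chip line buffer."""
    dram = FrameDRAM(width, height)
    # On-chip store holding the bottom row of the blocks decoded one block-row above.
    line_buffer = [128] * width
    for y0 in range(0, height, n):
        for x0 in range(0, width, n):
            if y0 == 0:
                top = [128] * n  # assumption: placeholder when no upper neighbor exists
            elif use_line_buffer:
                top = line_buffer[x0:x0 + n]        # on-chip access, no DRAM traffic
            else:
                top = dram.read_row(y0 - 1, x0, n)  # re-access of decoded pixels in DRAM
            pred = vertical_predict(top)
            block = [[min(255, max(0, p + residual)) for p in row] for row in pred]
            dram.write_block(y0, x0, block)
            line_buffer[x0:x0 + n] = block[-1]      # keep the bottom row for the next block-row
    return dram


if __name__ == "__main__":
    with_buf = decode_frame(16, 16, residual=1, use_line_buffer=True)
    without_buf = decode_frame(16, 16, residual=1, use_line_buffer=False)
    print("identical output:", with_buf.pixels == without_buf.pixels)
    print("DRAM reads with line buffer:", with_buf.reads)
    print("DRAM reads without line buffer:", without_buf.reads)
```

In this toy model the buffer costs one pixel per frame column, so its capacity grows with the picture width rather than with the picture area. A real decoder must also store left and diagonal neighbors and other syntax elements, which this sketch ignores.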
We proceed to integrate the proposed memory hierarchy with the LPL scheme into H.264/AVC video systems. In general, the performance of the prediction unit relies mainly on the prediction miss/hit rate, which in turn affects the memory size and external bandwidth. To achieve a better trade-off among these performance indices, we analyze and optimize the memory capacity and bandwidth at both the SRAM and DRAM levels. As a result, the optimized memory hierarchy with line-pixel-lookahead achieves a 50% reduction in memory power compared to a design that does not exploit the slice SRAM hierarchy. The remainder of this paper is structured as follows. Section 2 reviews related work on the memory hierarchy of H.264/AVC. Sections 3 and 4 describe how data reuse can be exploited in the three-level memory hierarchy and the line-pixel-lookahead (LPL) scheme, respectively. Simulation results are summarized in Section 5, and conclusions are drawn in Section 6.
Reviews of Memory Hierarchy in H.264/AVC

H.264/AVC Video Standards
We start with a brief overview of the H.264/AVC video standard and illustrate why memory storage is required in a practical VLSI implementation. In general, the intent of the H.264/AVC project was to create a standard capable of providing good video quality at bit rates substantially lower than those required by previous standards (e.g., MPEG-2, H.263, or MPEG-4 Part 2), and to do so without a large increase in complexity. The reduced bit rates come from new techniques such as spatial prediction in intra coding, variable block-size motion compensation, the 4×4 integer transform, context-adaptive entropy coding, and an adaptive deblocking filter. Although these coding tools improve compression performance, they depend on a long past history of pixel data. That is, the present data reference previously decoded neighboring pixels or syntax elements. Consequently, previously decoded results should be stored into a certain amount of storage for 
