    Parallel algorithms and architectures for low power video decoding

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 197-204). Parallelism coupled with voltage scaling is an effective approach to achieving high processing performance with low power consumption. This thesis presents parallel architectures and algorithms designed to deliver the power and performance required for current and next-generation video coding. Coding efficiency, area cost and scalability are also addressed. First, a low-power video decoder is presented for the current state-of-the-art video coding standard, H.264/AVC. Parallel architectures are used along with voltage scaling to deliver high definition (HD) decoding at low power levels. Additional architectural optimizations, such as reducing memory accesses and using multiple frequency/voltage domains, are also described. An H.264/AVC Baseline decoder test chip was fabricated in 65-nm CMOS. It can operate at 0.7 V for HD (720p, 30 fps) video decoding, with a measured power of 1.8 mW. The highly scalable decoder can trade off power and performance across a >100x range. Second, this thesis demonstrates how serial algorithms, such as Context-based Adaptive Binary Arithmetic Coding (CABAC), can be redesigned for parallel architectures to enable high throughput at a low coding efficiency cost. A parallel algorithm called Massively Parallel CABAC (MP-CABAC) is presented that uses syntax element partitions and interleaved entropy slices to achieve better throughput-coding efficiency and throughput-area tradeoffs than H.264/AVC. The parallel algorithm also improves scalability by providing a third dimension along which coding efficiency can be traded off for power and performance. Finally, joint algorithm-architecture optimizations are used to increase performance and reduce area with almost no coding penalty. The MP-CABAC is mapped to a highly parallel architecture with 80 parallel engines, which together deliver >10x higher throughput than existing H.264/AVC CABAC implementations. An MP-CABAC test chip was fabricated in 65-nm CMOS to demonstrate the power-performance-coding efficiency tradeoff. By Vivienne Sze. Ph.D.
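The power argument for "parallelism coupled with voltage scaling" can be checked with the usual first-order CMOS dynamic power model, P = alpha * C * V^2 * f. The sketch below is illustrative only: it ignores leakage, and the function names, the 1.0 V nominal supply, the 4-way split and the `overhead` parameter are hypothetical choices, not figures or methods from the thesis (only the 0.7 V operating point is taken from the abstract).

```python
def dynamic_power(c_eff: float, vdd: float, freq: float, activity: float = 1.0) -> float:
    """First-order CMOS dynamic power: P = activity * C * V^2 * f (leakage ignored)."""
    return activity * c_eff * vdd ** 2 * freq


def parallel_power_ratio(n_units: int, v_nominal: float, v_scaled: float,
                         overhead: float = 0.0) -> float:
    """Power of an N-way parallel design relative to a single serial unit
    delivering the same throughput.

    Each of the N units runs at f/N, so the capacitance switched per second is
    the same as in the serial design; the saving comes from the lower supply.
    `overhead` is a hypothetical fractional capacitance increase for the extra
    control and multiplexing logic that parallelism adds.
    """
    c, f = 1.0, 1.0  # normalized capacitance and clock of the serial design
    serial = dynamic_power(c, v_nominal, f)
    parallel = n_units * dynamic_power(c * (1.0 + overhead), v_scaled, f / n_units)
    return parallel / serial
```

Under these assumptions, `parallel_power_ratio(4, 1.0, 0.7)` comes out at roughly 0.49, and adding 10% capacitance overhead raises it to roughly 0.54. The saving is quadratic in the supply ratio. Whether a given voltage still meets timing at f/N depends on the process, so the voltage pair here is an input rather than something the model derives.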

    Variable block size motion estimation hardware for video encoders.

    Li, Man Ho. Thesis submitted in: November 2006. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 137-143). Abstracts in English and Chinese.
    Contents:
    Abstract
    Acknowledgement
    Chapter 1. Introduction
      1.1 Motivation
      1.2 The objectives of this thesis
      1.3 Contributions
      1.4 Thesis structure
    Chapter 2. Digital video compression
      2.1 Introduction
      2.2 Fundamentals of lossy video compression
        2.2.1 Video compression and human visual systems
        2.2.2 Representation of color
        2.2.3 Sampling methods: frames and fields
        2.2.4 Compression methods
        2.2.5 Motion estimation
        2.2.6 Motion compensation
        2.2.7 Transform
        2.2.8 Quantization
        2.2.9 Entropy encoding
        2.2.10 Intra-prediction unit
        2.2.11 Deblocking filter
        2.2.12 Complexity analysis of different compression stages
      2.3 Motion estimation process
        2.3.1 Block-based matching method
        2.3.2 Motion estimation procedure
        2.3.3 Matching criteria
        2.3.4 Motion vectors
        2.3.5 Quality judgment
      2.4 Block-based matching algorithms for motion estimation
        2.4.1 Full search (FS)
        2.4.2 Three-step search (TSS)
        2.4.3 Two-dimensional logarithmic search algorithm (2D-log search)
        2.4.4 Diamond search (DS)
        2.4.5 Fast full search (FFS)
      2.5 Complexity analysis of motion estimation
        2.5.1 Different searching algorithms
        2.5.2 Fixed-block size motion estimation
        2.5.3 Variable block size motion estimation
        2.5.4 Sub-pixel motion estimation
        2.5.5 Multi-reference frame motion estimation
      2.6 Picture quality analysis
      2.7 Summary
    Chapter 3. Arithmetic for video encoding
      3.1 Introduction
      3.2 Number systems
        3.2.1 Non-redundant number system
        3.2.2 Redundant number system
      3.3 Addition/subtraction algorithm
        3.3.1 Non-redundant number addition
        3.3.2 Carry-save number addition
        3.3.3 Signed-digit number addition
      3.4 Bit-serial algorithms
        3.4.1 Least-significant-bit (LSB) first mode
        3.4.2 Most-significant-bit (MSB) first mode
      3.5 Absolute difference algorithm
        3.5.1 Non-redundant algorithm for absolute difference
        3.5.2 Redundant algorithm for absolute difference
      3.6 Multi-operand addition algorithm
        3.6.1 Bit-parallel non-redundant adder tree implementation
        3.6.2 Bit-parallel carry-save adder tree implementation
        3.6.3 Bit-serial signed-digit adder tree implementation
      3.7 Comparison algorithms
        3.7.1 Non-redundant comparison algorithm
        3.7.2 Signed-digit comparison algorithm
      3.8 Summary
    Chapter 4. VLSI architectures for video encoding
      4.1 Introduction
      4.2 Implementation platform (FPGA)
        4.2.1 Basic FPGA architecture
        4.2.2 DSP blocks in FPGA devices
        4.2.3 Advantages of employing FPGAs
        4.2.4 Commercial FPGA devices
      4.3 Top-level architecture of motion estimation processor
      4.4 Bit-parallel architectures for motion estimation
        4.4.1 Systolic arrays
        4.4.2 Mapping of a motion estimation algorithm onto a systolic array
        4.4.3 1-D systolic array architecture (LA-1D)
        4.4.4 2-D systolic array architecture (LA-2D)
        4.4.5 1-D tree architecture (GA-1D)
        4.4.6 2-D tree architecture (GA-2D)
        4.4.7 Variable block size support in bit-parallel architectures
      4.5 Bit-serial motion estimation architecture
        4.5.1 Data processing direction
        4.5.2 Algorithm mapping and dataflow design
        4.5.3 Early termination scheme
        4.5.4 Top-level architecture
        4.5.5 Non-redundant positive number to signed-digit conversion
        4.5.6 Signed-digit adder tree
        4.5.7 SAD merger
        4.5.8 Signed-digit comparator
        4.5.9 Early termination controller
        4.5.10 Data scheduling and timeline
      4.6 Decision metric in different architectural types
        4.6.1 Throughput
        4.6.2 Memory bandwidth
        4.6.3 Silicon area occupied and power consumption
      4.7 Architecture selection for different applications
        4.7.1 CIF and QCIF resolution
        4.7.2 SDTV resolution
        4.7.3 HDTV resolution
      4.8 Summary
    Chapter 5. Results and comparison
      5.1 Introduction
      5.2 Implementation details
        5.2.1 Bit-parallel 1-D systolic array
        5.2.2 Bit-parallel 2-D systolic array
        5.2.3 Bit-parallel tree architecture
        5.2.4 MSB-first bit-serial design
      5.3 Comparison between motion estimation architectures
        5.3.1 Throughput and latency
        5.3.2 Occupied resources
        5.3.3 Memory bandwidth
        5.3.4 Motion estimation algorithm
        5.3.5 Power consumption
      5.4 Comparison to ASIC and FPGA architectures in past literature
      5.5 Summary
    Chapter 6. Conclusion
      6.1 Summary
        6.1.1 Algorithmic optimizations
        6.1.2 Architecture and arithmetic optimizations
        6.1.3 Implementation on an FPGA platform
      6.2 Future work
    Appendix A. VHDL sources
      A.1 Online full adder
      A.2 Online signed-digit full adder
      A.3 Online full adder tree
      A.4 SAD merger
      A.5 Signed-digit adder tree stage (top)
      A.6 Absolute element
      A.7 Absolute stage (top)
      A.8 Online comparator element
      A.9 Comparator stage (top)
      A.10 MSB-first motion estimation processor
    Bibliography
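Chapters 2 and 4 cover full search, SAD matching, variable block sizes and a "SAD merger". The Python sketch below is a software reference model of that general idea, not the thesis's bit-serial VHDL design. For each candidate motion vector it computes the sixteen 4x4 SADs of a 16x16 block once, then obtains the 8x8 and 16x16 SADs by adding them, so larger partitions cost no extra pixel reads. The function names are hypothetical. The rectangular H.264 partitions (16x8, 8x16, 8x4, 4x8) are omitted for brevity; they could be merged in the same way.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]
MV = Tuple[int, int]


def sad_4x4_grid(cur: Frame, ref: Frame, bx: int, by: int,
                 dx: int, dy: int) -> List[List[int]]:
    """SADs of the sixteen 4x4 sub-blocks of the 16x16 block at (bx, by),
    each matched against the reference block displaced by (dx, dy)."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            diff = abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
            grid[y // 4][x // 4] += diff
    return grid


def merge_sads(grid4: List[List[int]]) -> Dict[str, List[int]]:
    """Derive 8x8 and 16x16 SADs by adding 4x4 SADs; no pixel is read again."""
    sad8 = [grid4[2 * r][2 * c] + grid4[2 * r][2 * c + 1]
            + grid4[2 * r + 1][2 * c] + grid4[2 * r + 1][2 * c + 1]
            for r in range(2) for c in range(2)]
    return {"4x4": [v for row in grid4 for v in row],
            "8x8": sad8,
            "16x16": [sum(sad8)]}


def full_search_vbs(cur: Frame, ref: Frame, bx: int, by: int,
                    search_range: int) -> Dict[str, List[Tuple[float, MV]]]:
    """Full search over [-search_range, search_range]^2 that keeps the best
    (SAD, motion vector) for every sub-block of every block size."""
    height, width = len(ref), len(ref[0])
    best: Dict[str, List[Tuple[float, MV]]] = {}
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block would leave the frame.
            if not (0 <= by + dy <= height - 16 and 0 <= bx + dx <= width - 16):
                continue
            merged = merge_sads(sad_4x4_grid(cur, ref, bx, by, dx, dy))
            for size, sads in merged.items():
                slots = best.setdefault(size, [(float("inf"), (0, 0))] * len(sads))
                for i, sad in enumerate(sads):
                    if sad < slots[i][0]:
                        slots[i] = (sad, (dx, dy))
    return best
```

Because every block size is evaluated from the same per-candidate 4x4 SADs, the cost of supporting variable block sizes is a few additions and comparisons per candidate, not a separate search per partition. This reuse is what makes adder-tree hardware attractive for this task.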

    Architectures for Adaptive Low-Power Embedded Multimedia Systems

    This Ph.D. thesis describes novel hardware/software architectures for adaptive low-power embedded multimedia systems. Novel techniques for run-time adaptive energy management are proposed so that hardware and software adapt together in response to unpredictable scenarios. A complete power-aware H.264 video encoder was developed. Comparison with the state of the art demonstrates significant energy savings while meeting the performance constraint and keeping video quality degradation unnoticeable.

    Implementing Real-Time Video Deblocking in FPGA Hardware

    Video compression techniques are commonly used to meet the increasing demands for the storage and transmission of digital video content. Popular video compression techniques such as MPEG video encoding make use of block-transform coding algorithms, which are susceptible to blocking artifacts. These artifacts can be reduced using one of many available deblocking processes. However, the deblocking algorithms that provide noticeable improvements in visual quality also tend to be computationally expensive and unsuitable for real-time video use. This dissertation selects and examines an appropriate algorithm for real-time video deblocking applications, and describes its hardware implementation on an Altera Cyclone II FPGA. The chosen algorithm is based on the concept of shifted thresholding; it reduces computational complexity by several means, such as using only integer arithmetic and replacing division operations with bit shifting. The implementation leverages the reduced hardware complexity of the chosen algorithm to implement real-time video deblocking cost-effectively.
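The two complexity-reduction tricks named above, integer-only arithmetic and shifts in place of division, can be illustrated with a generic boundary filter. This is a hypothetical sketch, not the dissertation's shifted-thresholding algorithm, whose details the abstract does not give. It smooths only the two pixels on either side of each vertical block edge, skips steps larger than an assumed `threshold` (treated as real image edges), and divides by 4 with `>> 2`. Horizontal edges, filter strength selection and the 8-pixel block size are all simplifying assumptions.

```python
from typing import List


def filter_boundary(row: List[int], boundary: int, threshold: int) -> List[int]:
    """Smooth the two pixels that straddle a vertical block boundary in one row.

    p1, p0 are the last two pixels left of the boundary; q0, q1 the first two
    pixels right of it.
    """
    out = list(row)
    p1, p0 = row[boundary - 2], row[boundary - 1]
    q0, q1 = row[boundary], row[boundary + 1]
    # A small step across the boundary is treated as a blocking artifact;
    # a large step is assumed to be a genuine image edge and kept.
    if abs(p0 - q0) >= threshold:
        return out
    # 1-2-1 weighted averages: weights sum to 4, so "+ 2" rounds to nearest
    # and ">> 2" replaces the division by 4 (valid for non-negative values).
    out[boundary - 1] = (p1 + 2 * p0 + q0 + 2) >> 2
    out[boundary] = (p0 + 2 * q0 + q1 + 2) >> 2
    return out


def deblock_rows(frame: List[List[int]], block_size: int = 8,
                 threshold: int = 12) -> List[List[int]]:
    """Apply filter_boundary at every vertical block edge of every row."""
    result = []
    for row in frame:
        filtered = list(row)
        for b in range(block_size, len(row) - 1, block_size):
            filtered = filter_boundary(filtered, b, threshold)
        result.append(filtered)
    return result
```

For example, a row of four 100s followed by four 110s with the boundary at index 4 becomes `[100, 100, 100, 103, 108, 110, 110, 110]`. Every operation is an add, compare or shift, which maps directly onto FPGA logic without a divider.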

    Semi-synchronous video for deaf telephony with an adapted synchronous codec

    Magister Scientiae - MSc. Communication tools such as text-based instant messaging, voice and video relay services, real-time video chat, and mobile SMS and MMS have been used successfully among Deaf people. Several years of field research with a local Deaf community revealed that disadvantaged South African Deaf people preferred to communicate with both Deaf and hearing peers in South African Sign Language rather than in text. Synchronous video chat and video relay services provided such opportunities. Both types of services are commonly available in developed regions, but not in developing countries like South Africa. This thesis reports on a workaround approach: the design and development of an asynchronous video communication tool that adapted synchronous video codecs to store-and-forward video delivery. This novel asynchronous video tool provided high-quality South African Sign Language video chat at the expense of some additional latency. Adapting the synchronous video codec consisted of comparing codecs and choosing one to optimise in order to minimise latency and preserve video quality. Traditional quality of service metrics only addressed real-time video quality and related services; there was no such standard for asynchronous video communication. Therefore, we also enhanced traditional objective video quality metrics with subjective assessment metrics conducted with the local Deaf community. South Afric

    Algorithms for compression of high dynamic range images and video

    Recent advances in sensor and display technologies have brought about High Dynamic Range (HDR) imaging capability. Modern multiple-exposure HDR sensors can achieve a dynamic range of 100-120 dB, and LED and OLED display devices have contrast ratios of 10^5:1 to 10^6:1. Despite these advances, image/video compression algorithms and the associated hardware are still based on Standard Dynamic Range (SDR) technology, i.e. they operate within an effective dynamic range of up to 70 dB for 8-bit gamma-corrected images. Further, the existing infrastructure for content distribution is also designed for SDR, which creates interoperability problems with true HDR capture and display equipment. Current solutions to this problem include tone mapping the HDR content to fit SDR. However, this approach leads to image quality problems when strong dynamic range compression is applied. Even though some HDR-only solutions have been proposed in the literature, they are not interoperable with current SDR infrastructure and are thus typically used in closed systems. Given these observations, a research gap was identified: the need for efficient algorithms for the compression of still images and video that can store the full dynamic range and colour gamut of HDR images while remaining backward compatible with existing SDR infrastructure. To improve the usability of SDR content, it is vital that any such algorithms accommodate different tone mapping operators, including spatially non-uniform ones. In the course of the research presented in this thesis, a novel two-layer CODEC architecture is introduced for both HDR image and video coding. Further, a universal and computationally efficient approximation of the tone mapping operator is developed and presented.
    It is shown that using perceptually uniform colourspaces for the internal representation of pixel data improves the compression efficiency of the algorithms. Further, the proposed novel approaches to compressing the tone mapping operator metadata are shown to improve compression performance for low-bitrate video content. Multiple compression algorithms are designed, implemented and compared, and quality-complexity trade-offs are identified. Finally, practical aspects of implementing the developed algorithms are explored by automating the design space exploration flow and integrating the high-level systems design framework with domain-specific tools for synthesis and simulation of multiprocessor systems. Directions for further work are also presented.
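A backward-compatible two-layer HDR codec generally works as follows. A tone-mapped 8-bit base layer remains decodable by SDR equipment. Side information lets an HDR decoder approximately invert the tone mapping, and an enhancement layer carries the remaining residual. The Python sketch below shows that structure on a list of positive luminance values. It is not the thesis's CODEC. The Reinhard-style global operator `L/(1+L)`, the per-code mean of log2 luminance used as the "inverse TMO" lookup table, and the `step` quantizer are all hypothetical choices made for illustration. Entropy coding, colour and any perceptually uniform colourspace are omitted.

```python
import math
from typing import List, Tuple


def tone_map(lum: List[float]) -> List[int]:
    """Hypothetical global TMO (Reinhard-style L/(1+L)) quantized to 8-bit codes.
    This is the SDR base layer a legacy decoder would display. Requires lum > 0."""
    return [min(255, round(255 * l / (1.0 + l))) for l in lum]


def build_inverse_lut(lum: List[float], base: List[int]) -> List[float]:
    """Per-code mean of log2 luminance: an empirical approximation of the inverse
    TMO, sent to the decoder as 256 entries of side information. Because it is
    estimated from the data, it does not need the TMO's formula."""
    sums, counts = [0.0] * 256, [0] * 256
    for l, c in zip(lum, base):
        sums[c] += math.log2(l)
        counts[c] += 1
    # Unused codes get 0.0; they are never looked up for this image.
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]


def encode(lum: List[float], step: float = 1 / 64) -> Tuple[List[int], List[float], List[int]]:
    """Return (base layer, inverse-TMO table, quantized log2-domain residual)."""
    base = tone_map(lum)
    lut = build_inverse_lut(lum, base)
    residual = [round((math.log2(l) - lut[c]) / step) for l, c in zip(lum, base)]
    return base, lut, residual


def decode(base: List[int], lut: List[float], residual: List[int],
           step: float = 1 / 64) -> List[float]:
    """HDR reconstruction: predict from the base layer, then add the residual."""
    return [2.0 ** (lut[c] + r * step) for c, r in zip(base, residual)]
```

With this layout the reconstruction error in the log2 domain is bounded by `step / 2`. An SDR-only receiver simply ignores the table and residual and shows `base`. Estimating the inverse mapping from the data, rather than inverting a formula, is one way a decoder could accommodate arbitrary tone mapping operators. The thesis's own approximation may differ.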

    Algorithm/Architecture Co-Exploration of Visual Computing: Overview and Future Perspectives

    Concurrently exploring both algorithmic and architectural optimizations is a new design paradigm. This survey paper addresses the latest research and future perspectives on the simultaneous development of video coding, processing, and computing algorithms with emerging platforms that have multiple cores and reconfigurable architectures. As the algorithms in forthcoming visual systems become increasingly complex, many applications must support different profiles with different levels of performance. Hence, because the visual experience is expected to keep improving, it is critical that advanced platforms provide higher performance, better flexibility, and lower power consumption. To achieve these goals, algorithm and architecture co-design is important for characterizing the algorithmic complexity that is then used to optimize the targeted architecture. This paper shows that seamlessly weaving together the previously autonomous development of visual computing algorithms and multicore or reconfigurable architectures will inevitably become the leading trend in the future of video technology.