2,765 research outputs found

    Dynamically variable step search motion estimation algorithm and a dynamically reconfigurable hardware for its implementation

    Get PDF
    Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. For the recently available High Definition (HD) video formats, the computational complexity of De full search (FS) ME algorithm is prohibitively high, whereas the PSNR obtained by fast search ME algorithms is low. Therefore, ill this paper, we present Dynamically Variable Step Search (DVSS) ME algorithm for Processing high definition video formats and a dynamically reconfigurable hardware efficiently implementing DVSS algorithm. The architecture for efficiently implementing DVSS algorithm. The simulation results showed that DVSS algorithm performs very close to FS algorithm by searching much fewer search locations than FS algorithm and it outperforms successful past search ME algorithms by searching more search locations than these algorithms. The proposed hardware is implemented in VHDL and is capable, of processing high definition video formats in real time. Therefore, it can be used in consumer electronics products for video compression, frame rate up-conversion and de-interlacing(1)

    Variable block size motion estimation hardware for video encoders.

    Get PDF
    Li, Man Ho.Thesis submitted in: November 2006.Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.Includes bibliographical references (leaves 137-143).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.ivChapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivation --- p.3Chapter 1.2 --- The objectives of this thesis --- p.4Chapter 1.3 --- Contributions --- p.5Chapter 1.4 --- Thesis structure --- p.6Chapter 2 --- Digital video compression --- p.8Chapter 2.1 --- Introduction --- p.8Chapter 2.2 --- Fundamentals of lossy video compression --- p.9Chapter 2.2.1 --- Video compression and human visual systems --- p.10Chapter 2.2.2 --- Representation of color --- p.10Chapter 2.2.3 --- Sampling methods - frames and fields --- p.11Chapter 2.2.4 --- Compression methods --- p.11Chapter 2.2.5 --- Motion estimation --- p.12Chapter 2.2.6 --- Motion compensation --- p.13Chapter 2.2.7 --- Transform --- p.13Chapter 2.2.8 --- Quantization --- p.14Chapter 2.2.9 --- Entropy Encoding --- p.14Chapter 2.2.10 --- Intra-prediction unit --- p.14Chapter 2.2.11 --- Deblocking filter --- p.15Chapter 2.2.12 --- Complexity analysis of on different com- pression stages --- p.16Chapter 2.3 --- Motion estimation process --- p.16Chapter 2.3.1 --- Block-based matching method --- p.16Chapter 2.3.2 --- Motion estimation procedure --- p.18Chapter 2.3.3 --- Matching Criteria --- p.19Chapter 2.3.4 --- Motion vectors --- p.21Chapter 2.3.5 --- Quality judgment --- p.22Chapter 2.4 --- Block-based matching algorithms for motion estimation --- p.23Chapter 2.4.1 --- Full search (FS) --- p.23Chapter 2.4.2 --- Three-step search (TSS) --- p.24Chapter 2.4.3 --- Two-dimensional Logarithmic Search Algorithm (2D-log search) --- p.25Chapter 2.4.4 --- Diamond Search (DS) --- p.25Chapter 2.4.5 --- Fast full search (FFS) --- p.26Chapter 2.5 --- Complexity analysis of motion estimation --- p.27Chapter 2.5.1 --- Different searching algorithms --- p.28Chapter 2.5.2 --- Fixed-block size motion estimation --- p.28Chapter 2.5.3 --- Variable block size motion estimation --- p.29Chapter 2.5.4 --- Sub-pixel motion estimation --- p.30Chapter 2.5.5 --- Multi-reference frame motion estimation . --- p.30Chapter 2.6 --- Picture quality analysis --- p.31Chapter 2.7 --- Summary --- p.32Chapter 3 --- Arithmetic for video encoding --- p.33Chapter 3.1 --- Introduction --- p.33Chapter 3.2 --- Number systems --- p.34Chapter 3.2.1 --- Non-redundant Number System --- p.34Chapter 3.2.2 --- Redundant number system --- p.36Chapter 3.3 --- Addition/subtraction algorithm --- p.38Chapter 3.3.1 --- Non-redundant number addition --- p.39Chapter 3.3.2 --- Carry-save number addition --- p.39Chapter 3.3.3 --- Signed-digit number addition --- p.40Chapter 3.4 --- Bit-serial algorithms --- p.42Chapter 3.4.1 --- Least-significant-bit (LSB) first mode --- p.42Chapter 3.4.2 --- Most-significant-bit (MSB) first mode --- p.43Chapter 3.5 --- Absolute difference algorithm --- p.44Chapter 3.5.1 --- Non-redundant algorithm for absolute difference --- p.44Chapter 3.5.2 --- Redundant algorithm for absolute difference --- p.45Chapter 3.6 --- Multi-operand addition algorithm --- p.47Chapter 3.6.1 --- Bit-parallel non-redundant adder tree implementation --- p.47Chapter 3.6.2 --- Bit-parallel carry-save adder tree implementation --- p.49Chapter 3.6.3 --- Bit serial signed digit adder tree implementation --- p.49Chapter 3.7 --- Comparison algorithms --- p.50Chapter 3.7.1 --- Non-redundant comparison algorithm --- p.51Chapter 3.7.2 --- Signed-digit comparison algorithm --- p.52Chapter 3.8 --- Summary --- p.53Chapter 4 --- VLSI architectures for video encoding --- p.54Chapter 4.1 --- Introduction --- p.54Chapter 4.2 --- Implementation platform - (FPGA) --- p.55Chapter 4.2.1 --- Basic FPGA architecture --- p.55Chapter 4.2.2 --- DSP blocks in FPGA device --- p.56Chapter 4.2.3 --- Advantages employing FPGA --- p.57Chapter 4.2.4 --- Commercial FPGA Device --- p.58Chapter 4.3 --- Top level architecture of motion estimation processor --- p.59Chapter 4.4 --- Bit-parallel architectures for motion estimation --- p.60Chapter 4.4.1 --- Systolic arrays --- p.60Chapter 4.4.2 --- Mapping of a motion estimation algorithm onto systolic array --- p.61Chapter 4.4.3 --- 1-D systolic array architecture (LA-ID) --- p.63Chapter 4.4.4 --- 2-D systolic array architecture (LA-2D) --- p.64Chapter 4.4.5 --- 1-D Tree architecture (GA-1D) --- p.64Chapter 4.4.6 --- 2-D Tree architecture (GA-2D) --- p.65Chapter 4.4.7 --- Variable block size support in bit-parallel architectures --- p.66Chapter 4.5 --- Bit-serial motion estimation architecture --- p.68Chapter 4.5.1 --- Data Processing Direction --- p.68Chapter 4.5.2 --- Algorithm mapping and dataflow design . --- p.68Chapter 4.5.3 --- Early termination scheme --- p.69Chapter 4.5.4 --- Top-level architecture --- p.70Chapter 4.5.5 --- Non redundant positive number to signed digit conversion --- p.71Chapter 4.5.6 --- Signed-digit adder tree --- p.73Chapter 4.5.7 --- SAD merger --- p.74Chapter 4.5.8 --- Signed-digit comparator --- p.75Chapter 4.5.9 --- Early termination controller --- p.76Chapter 4.5.10 --- Data scheduling and timeline --- p.80Chapter 4.6 --- Decision metric in different architectural types . . --- p.80Chapter 4.6.1 --- Throughput --- p.81Chapter 4.6.2 --- Memory bandwidth --- p.83Chapter 4.6.3 --- Silicon area occupied and power consump- tion --- p.83Chapter 4.7 --- Architecture selection for different applications . . --- p.84Chapter 4.7.1 --- CIF and QCIF resolution --- p.84Chapter 4.7.2 --- SDTV resolution --- p.85Chapter 4.7.3 --- HDTV resolution --- p.85Chapter 4.8 --- Summary --- p.86Chapter 5 --- Results and comparison --- p.87Chapter 5.1 --- Introduction --- p.87Chapter 5.2 --- Implementation details --- p.87Chapter 5.2.1 --- Bit-parallel 1-D systolic array --- p.88Chapter 5.2.2 --- Bit-parallel 2-D systolic array --- p.89Chapter 5.2.3 --- Bit-parallel Tree architecture --- p.90Chapter 5.2.4 --- MSB-first bit-serial design --- p.91Chapter 5.3 --- Comparison between motion estimation architectures --- p.93Chapter 5.3.1 --- Throughput and latency --- p.93Chapter 5.3.2 --- Occupied resources --- p.94Chapter 5.3.3 --- Memory bandwidth --- p.95Chapter 5.3.4 --- Motion estimation algorithm --- p.95Chapter 5.3.5 --- Power consumption --- p.97Chapter 5.4 --- Comparison to ASIC and FPGA architectures in past literature --- p.99Chapter 5.5 --- Summary --- p.101Chapter 6 --- Conclusion --- p.102Chapter 6.1 --- Summary --- p.102Chapter 6.1.1 --- Algorithmic optimizations --- p.102Chapter 6.1.2 --- Architecture and arithmetic optimizations --- p.103Chapter 6.1.3 --- Implementation on a FPGA platform . . . --- p.104Chapter 6.2 --- Future work --- p.106Chapter A --- VHDL Sources --- p.108Chapter A.1 --- Online Full Adder --- p.108Chapter A.2 --- Online Signed Digit Full Adder --- p.109Chapter A.3 --- Online Pull Adder Tree --- p.110Chapter A.4 --- SAD merger --- p.112Chapter A.5 --- Signed digit adder tree stage (top) --- p.116Chapter A.6 --- Absolute element --- p.118Chapter A.7 --- Absolute stage (top) --- p.119Chapter A.8 --- Online comparator element --- p.120Chapter A.9 --- Comparator stage (top) --- p.122Chapter A.10 --- MSB-first motion estimation processor --- p.134Bibliography --- p.13

    Efficient hardware implementations of low bit depth motion estimation algorithms

    Get PDF
    In this paper, we present efficient hardware implementation of multiplication free one-bit transform (MF1BT) based and constraint one-bit transform (C-1BT) based motion estimation (ME) algorithms, in order to provide low bit-depth representation based full search block ME hardware for real-time video encoding. We used a source pixel based linear array (SPBLA) hardware architecture for low bit depth ME for the first time in the literature. The proposed SPBLA based implementation results in a genuine data flow scheme which significantly reduces the number of data reads from the current block memory, which in turn reduces the power consumption by at least 50% compared to conventional 1BT based ME hardware architecture presented in the literature. Because of the binary nature of low bit-depth ME algorithms, their hardware architectures are more efficient than existing 8 bits/pixel representation based ME architectures

    Efficient hardware architectures for MPEG-4 core profile

    Get PDF
    Efficient hardware acceleration architectures are proposed for the most demandingMPEG-4 core profile algorithms, namely; texture motion estimation (TME), binary motion estimation (BME)and the shape adaptive discrete cosine transform (SA-DCT). The proposed ME designs may also be used for H.264, since both architectures can handle variable block sizes. Both ME architectures employ early termination techniques that reduce latency and save needless memory accesses and power consumption. They also use a pixel subsampling technique to facilitate parallelism, while balancing the computational load. The BME datapath also saves operations by using Run Length Coded (RLC) pixel addressing. The SA-DCT module has a re-configuring multiplier-less serial datapath using adders and multiplexers only to improve area and power. The SA-DCT packing steps are done using a minimal switching addressing scheme with guarded evaluation. All three modules have been synthesised targeting the WildCard-II FPGA benchmarking platform adopted by the MPEG-4 Part9 reference hardware group

    Hardware acceleration architectures for MPEG-Based mobile video platforms: a brief overview

    Get PDF
    This paper presents a brief overview of past and current hardware acceleration (HwA) approaches that have been proposed for the most computationally intensive compression tools of the MPEG-4 standard. These approaches are classified based on their historical evolution and architectural approach. An analysis of both evolutionary and functional classifications is carried out in order to speculate on the possible trends of the HwA architectures to be employed in mobile video platforms

    Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

    Get PDF
    Real-time and high-quality video coding is gaining a wide interest in the research and industrial community for different applications. H.264/AVC, a recent standard for high performance video coding, can be successfully exploited in several scenarios including digital video broadcasting, high-definition TV and DVD-based systems, which require to sustain up to tens of Mbits/s. To that purpose this paper proposes optimized architectures for H.264/AVC most critical tasks, Motion estimation and context adaptive binary arithmetic coding. Post synthesis results on sub-micron CMOS standard-cells technologies show that the proposed architectures can actually process in real-time 720 × 480 video sequences at 30 frames/s and grant more than 50 Mbits/s. The achieved circuit complexity and power consumption budgets are suitable for their integration in complex VLSI multimedia systems based either on AHB bus centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip

    Energy-efficient acceleration of MPEG-4 compression tools

    Get PDF
    We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art

    Multi-standard reconfigurable motion estimation processor for hybrid video codecs

    Get PDF

    Embedded FIR filter design for real-time refocusing using a standard plenoptic video camera

    Get PDF
    Copyright 2014 Society of Photo-Optical Instrumentation Engineers and IS&T—The Society for Imaging Science and Technology. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.A novel and low-cost embedded hardware architecture for real-time refocusing based on a standard plenoptic camera is presented in this study. The proposed layout design synthesizes refocusing slices directly from micro images by omitting the process for the commonly used sub-aperture extraction. Therefore, intellectual property cores, containing switch controlled Finite Impulse Response (FIR) filters, are developed and applied to the Field Programmable Gate Array (FPGA) XC6SLX45 from Xilinx. Enabling the hardware design to work economically, the FIR filters are composed of stored product as well as upsampling and interpolation techniques in order to achieve an ideal relation between image resolution, delay time, power consumption and the demand of logic gates. The video output is transmitted via High-Definition Multimedia Interface (HDMI) with a resolution of 720p at a frame rate of 60 fps conforming to the HD ready standard. Examples of the synthesized refocusing slices are presented
    corecore