9,721 research outputs found

    A high performance hardware architecture for one bit transform based motion estimation

    Get PDF
    Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. One bit transform (IBT) based ME algorithms have low computational complexity. Therefore, in this paper, we propose a high performance systolic hardware architecture for IBT based ME. The proposed hardware performs full search ME for 4 Macroblocks in parallel and it is the fastest IBT based ME hardware reported in the literature. In addition, it uses less on-chip memory than the previous IBT based ME hardware by using a novel data reuse scheme and memory organization. The proposed hardware is implemented in Verilog HDL. It consumes %34 of the slices in a Xilinx XC2VP30-7 FPGA. It works at 115 MHz in the same FPGA and is capable of processing 50 1920x1080 full High Definition frames per second. Therefore, it can be used in consumer electronics products that require real-time video processing or compression

    A toolset for the analysis and optimization of motion estimation algorithms and processors

    Get PDF

    Optimization of the motion estimation for parallel embedded systems in the context of new video standards

    Get PDF
    15 pagesInternational audienceThe effciency of video compression methods mainly depends on the motion compensation stage, and the design of effcient motion estimation techniques is still an important issue. An highly accurate motion estimation can significantly reduce the bit-rate, but involves a high computational complexity. This is particularly true for new generations of video compression standards, MPEG AVC and HEVC, which involves techniques such as different reference frames, sub-pixel estimation, variable block sizes. In this context, the design of fast motion estimation solutions is necessary, and can concerned two linked aspects: a high quality algorithm and its effcient implementation. This paper summarizes our main contributions in this domain. In particular, we first present the HME (Hierarchical Motion Estimation) technique. It is based on a multi-level refinement process where the motion estimation vectors are first estimated on a sub-sampled image. The multi-levels decomposition provides robust predictions and is particularly suited for variable block sizes motion estimations. The HME method has been integrated in a AVC encoder, and we propose a parallel implementation of this technique, with the motion estimation at pixel level performed by a DSP processor, and the sub-pixel refinement realized in an FPGA. The second technique that we present is called HDS for Hierarchical Diamond Search. It combines the multi-level refinement of HME, with a fast search at pixel-accuracy inspired by the EPZS method. This paper also presents its parallel implementation onto a multi-DSP platform and the its use in the HEVC context

    Motion estimation with chessboard pattern prediction strategy

    Get PDF
    Due to high correlations among the adjacent blocks, several algorithms utilize movement information of spatially and temporally correlated neighboring blocks to adapt their search patterns to that information. In this paper, this information is used to define a dynamic search pattern. Each frame is divided into two sets, black and white blocks, like a chessboard pattern and a different search pattern, is defined for each set. The advantage of this definition is that the number of spatially neighboring blocks is increased for each current block and it leads to a better prediction for each block. Simulation results show that the proposed algorithm is closer to the Full-Search algorithm in terms of quality metrics such as PSNR than the other state-of-the-art algorithms while at the same time the average number of search points is less.info:eu-repo/semantics/publishedVersio

    VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard

    Get PDF
    The growing popularity of high resolution video and the continuously increasing demands for high quality video on mobile devices are producing stronger needs for more efficient video encoder. Concerning these desires, HEVC, a newest video coding standard, has been developed by a joint team formed by ISO/IEO MPEG and ITU/T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 with an equal or even higher perceptual video quality. Motion Estimation (ME) being as one of the most critical module in video coding contributes almost 50%-70% of computational complexity in the video encoder. This high consumption of the computational resources puts a limit on the performance of encoders, especially for full HD or ultra HD videos, in terms of coding speed, bit-rate and video quality. Thus the major part of this work concentrates on the computational complexity reduction and improvement of timing performance of motion estimation algorithms for HEVC standard. First, a new strategy to calculate the SAD (Sum of Absolute Difference) for motion estimation is designed based on the statistics on property of pixel data of video sequences. This statistics demonstrates the size relationship between the sum of two sets of pixels has a determined connection with the distribution of the size relationship between individual pixels from the two sets. Taking the advantage of this observation, only a small proportion of pixels is necessary to be involved in the SAD calculation. Simulations show that the amount of computations required in the full search algorithm is reduced by about 58% on average and up to 70% in the best case. Secondly, from the scope of parallelization an enhanced TZ search for HEVC is proposed using novel schemes of multiple MVPs (motion vector predictor) and shared MVP. Specifically, resorting to multiple MVPs the initial search process is performed in parallel at multiple search centers, and the ME processing engine for PUs within one CU are parallelized based on the MVP sharing scheme on CU (coding unit) level. Moreover, the SAD module for ME engine is also parallelly implemented for PU size of 32×32. Experiments indicate it achieves an appreciable improvement on the throughput and coding efficiency of the HEVC video encoder. In addition, the other part of this thesis is contributed to the VLSI architecture design for finding the first W maximum/minimum values targeting towards high speed and low hardware cost. The architecture based on the novel bit-wise AND scheme has only half of the area of the best reference solution and its critical path delay is comparable with other implementations. While the FPCG (full parallel comparison grid) architecture, which utilizes the optimized comparator-based structure, achieves 3.6 times faster on average on the speed and even 5.2 times faster at best comparing with the reference architectures. Finally the architecture using the partial sorting strategy reaches a good balance on the timing performance and area, which has a slightly lower or comparable speed with FPCG architecture and a acceptable hardware cost

    Implementation of a motion estimation algorithm for Intel FPGAs using OpenCL

    Get PDF
    Producción CientíficaMotion Estimation is one of the main tasks behind any video encoder. It is a compu- tationally costly task; therefore, it is usually delegated to specific or reconfigurable hardware, such as FPGAs. Over the years, multiple FPGA implementations have been developed, mainly using hardware description languages such as Verilog or VHDL. Since programming using hardware description languages is a complex task, it is desirable to use higher-level languages to develop FPGA applications.The aim of this work is to evaluate OpenCL, in terms of expressiveness, as a tool for devel- oping this kind of FPGA applications. To do so, we present and evaluate a parallel implementation of the Block Matching Motion Estimation process using OpenCL for Intel FPGAs, usable and tested on an Intel Stratix 10 FPGA. The implementa- tion efficiently processes Full HD frames completely inside the FPGA. In this work, we show the resource utilization when synthesizing the code on an Intel Stratix 10 FPGA, as well as a performance comparison with multiple CPU implementations with varying levels of optimization and vectorization capabilities. We also compare the proposed OpenCL implementation, in terms of resource utilization and perfor- mance, with estimations obtained from an equivalent VHDL implementation.Junta de Castilla y León - Consejería de Educación de la Proyecto PROPHET-2 (VA226P20)Ministerio de Economía, Industria y Competitividad: (PID2019- 104834 GB-I00) and European Regional Development Fund (ERDF) program: Project PCAS (TIN2017-88614-R)Ministerio de Ciencia e Innovación (PID2019-104184RB-I00 / AEI / 10.13039/501100011033)Xunta de Galicia y fondos FEDER de la UE (Centro de Investigación de Galicia acreditación 2019-2022, ref. ED431G 2019/01; Consolidation Program of Competitive Reference Groups, ref. ED431C 2021/30Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación y “European Union NextGenerationEU/PRTR” : (MCIN/ AEI/10.13039/501100011033) - grant TED2021-130367B-I00Publicación en abierto financiada por el Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), con cargo al Programa Operativo 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, Actuación:20007-CL - Apoyo Consorcio BUCL

    A Review Paper On Motion Estimation Techniques

    Get PDF
    Motion estimation (ME) is a primary action for video compression. Actually, it leads to heavily to the compression efficiency by eliminating temporal redundancies. This approach is one among the critical part in a video encoder and can take itself greater than half of the coding complexity or computational coding time. Several fast ME algorithms were proposed as well as realized. In this paper, we offers a brief review on various motion estimation techniques mainly block matching motion estimation techniques. The paper additionally presents a very brief introduction to the whole flow of video motion vector calculation

    Parallel H.264/AVC Fast Rate-Distortion Optimized Motion Estimation using Graphics Processing Unit and Dedicated Hardware

    Get PDF
    Heterogeneous systems on a single chip composed of CPU, Graphical Processing Unit (GPU), and Field Programmable Gate Array (FPGA) are expected to emerge in near future. In this context, the System on Chip (SoC) can be dynamically adapted to employ different architectures for execution of data-intensive applications. Motion estimation is one such task that can be accelerated using FPGA and GPU for high performance H.264/AVC encoder implementation. In most of works on parallel implementation of motion estimation, the bit rate cost of motion vectors is generally ignored. On the contrary, this paper presents a fast rate-distortion optimized parallel motion estimation algorithm implemented on GPU using OpenCL and FPGA/ASIC using VHDL. The predicted motion vectors are estimated from temporally preceding motion vectors and used for evaluating the bit rate cost of the motion vectors simultaneously. The experimental results show that the proposed scheme achieves significant speedup on GPU and FPGA, and has comparable ratedistortion performance with respect to sequential fast motion estimation algorithm

    Automatic aerial target detection and tracking system in airborne FLIR images based on efficient target trajectory filtering

    Get PDF
    Common strategies for detection and tracking of aerial moving targets in airborne Forward-Looking Infrared (FLIR) images offer accurate results in images composed by a non-textured sky. However, when cloud and earth regions appear in the image sequence, those strategies result in an over-detection that increases very significantly the false alarm rate. Besides, the airborne camera induces a global motion in the image sequence that complicates even more detection and tracking tasks. In this work, an automatic detection and tracking system with an innovative and efficient target trajectory filtering is presented. It robustly compensates the global motion to accurately detect and track potential aerial targets. Their trajectories are analyzed by a curve fitting technique to reliably validate real targets. This strategy allows to filter false targets with stationary or erratic trajectories. The proposed system makes special emphasis in the use of low complexity video analysis techniques to achieve real-time operation. Experimental results using real FLIR sequences show a dramatic reduction of the false alarm rate, while maintaining the detection rate

    Frame Interpolation for Cloud-Based Mobile Video Streaming

    Full text link
    © 2016 IEEE. Cloud-based High Definition (HD) video streaming is becoming popular day by day. On one hand, it is important for both end users and large storage servers to store their huge amount of data at different locations and servers. On the other hand, it is becoming a big challenge for network service providers to provide reliable connectivity to the network users. There have been many studies over cloud-based video streaming for Quality of Experience (QoE) for services like YouTube. Packet losses and bit errors are very common in transmission networks, which affect the user feedback over cloud-based media services. To cover up packet losses and bit errors, Error Concealment (EC) techniques are usually applied at the decoder/receiver side to estimate the lost information. This paper proposes a time-efficient and quality-oriented EC method. The proposed method considers H.265/HEVC based intra-encoded videos for the estimation of whole intra-frame loss. The main emphasis in the proposed approach is the recovery of Motion Vectors (MVs) of a lost frame in real-time. To boost-up the search process for the lost MVs, a bigger block size and searching in parallel are both considered. The simulation results clearly show that our proposed method outperforms the traditional Block Matching Algorithm (BMA) by approximately 2.5 dB and Frame Copy (FC) by up to 12 dB at a packet loss rate of 1%, 3%, and 5% with different Quantization Parameters (QPs). The computational time of the proposed approach outperforms the BMA by approximately 1788 seconds
    corecore