18,978 research outputs found

    Perception-oriented methodology for robust motion estimation design

    Optimizing a motion estimator (ME) for picture rate conversion is challenging because there are many types of MEs and, within each type, many parameters, which makes subjective assessment of all the alternatives impractical. To solve this problem, we propose an automatic design methodology that selects 'well-performing MEs' from the multitude of options. Moreover, we prove that applying this methodology results in subjectively pleasing quality of the upconverted video, even though our objective performance metrics are necessarily suboptimal. This proof involved a user rating of 93 MEs on 3 video sequences; the 93 MEs were systematically selected from a total of 7000 ME alternatives. The proposed methodology may serve as inspiration for similarly tough multi-dimensional optimization tasks with unreliable metrics.

    Advances in video motion analysis research for mature and emerging application areas


    Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

    Real-time, high-quality video coding is attracting wide interest in the research and industrial communities for a range of applications. H.264/AVC, a recent standard for high-performance video coding, can be successfully exploited in several scenarios, including digital video broadcasting, high-definition TV and DVD-based systems, which must sustain up to tens of Mbit/s. To that end, this paper proposes optimized architectures for the most critical H.264/AVC tasks: motion estimation and context-adaptive binary arithmetic coding (CABAC). Post-synthesis results on sub-micron CMOS standard-cell technologies show that the proposed architectures can process 720 × 480 video sequences at 30 frames/s in real time and sustain more than 50 Mbit/s. The achieved circuit complexity and power consumption budgets are suitable for integration in complex VLSI multimedia systems based either on an AHB bus-centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip).
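To put the real-time figures quoted above in perspective, the short calculation below derives the luma sample rate and macroblock rate implied by 720 × 480 at 30 frames/s. It assumes the standard 16 × 16 H.264/AVC macroblock; the function and variable names are illustrative and do not come from the paper.

```python
WIDTH, HEIGHT, FPS = 720, 480, 30   # figures quoted in the abstract
MB_SIZE = 16                        # H.264/AVC macroblock edge, in luma samples


def macroblock_rate(width, height, fps, mb=MB_SIZE):
    """Macroblocks per second; partial macroblocks at the frame edge are rounded up."""
    mbs_per_frame = -(-width // mb) * -(-height // mb)  # ceiling division
    return mbs_per_frame * fps


luma_rate = WIDTH * HEIGHT * FPS                  # luma samples per second
mb_rate = macroblock_rate(WIDTH, HEIGHT, FPS)     # macroblocks per second

print(f"{luma_rate} luma samples/s, {mb_rate} macroblocks/s")
```

A motion estimation co-processor meeting this target therefore has to complete its search for each macroblock within 1 / mb_rate seconds on average.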

    Optimization of the motion estimation for parallel embedded systems in the context of new video standards

    15 pages. International audience. The efficiency of video compression methods depends mainly on the motion compensation stage, and the design of efficient motion estimation techniques is still an important issue. Highly accurate motion estimation can significantly reduce the bit rate, but involves high computational complexity. This is particularly true for the new generations of video compression standards, MPEG AVC and HEVC, which involve techniques such as multiple reference frames, sub-pixel estimation and variable block sizes. In this context, the design of fast motion estimation solutions is necessary, and concerns two linked aspects: a high-quality algorithm and its efficient implementation. This paper summarizes our main contributions in this domain. In particular, we first present the HME (Hierarchical Motion Estimation) technique. It is based on a multi-level refinement process in which the motion vectors are first estimated on a sub-sampled image. The multi-level decomposition provides robust predictions and is particularly suited to variable-block-size motion estimation. The HME method has been integrated in an AVC encoder, and we propose a parallel implementation of this technique, with the pixel-level motion estimation performed by a DSP processor and the sub-pixel refinement realized in an FPGA. The second technique that we present is called HDS, for Hierarchical Diamond Search. It combines the multi-level refinement of HME with a fast search at pixel accuracy inspired by the EPZS method. This paper also presents its parallel implementation on a multi-DSP platform and its use in the HEVC context.
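The multi-level refinement idea described above can be illustrated with a minimal two-level sketch: a coarse vector is found on a 2x sub-sampled frame pair, scaled up, and refined by one pixel at full resolution. This is a generic coarse-to-fine block matcher, not the authors' HME implementation; the function names, the SAD cost, the 2 × 2 averaging filter and the search radii are all assumptions made for illustration, and the sub-pixel stage is omitted.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a block of cur and the displaced block of ref.
    Returns None when the displaced block falls outside ref."""
    h, w = len(ref), len(ref[0])
    if by + dy < 0 or bx + dx < 0 or by + dy + bs > h or bx + dx + bs > w:
        return None
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(bs) for x in range(bs))


def search(cur, ref, bx, by, bs, center, radius):
    """Exhaustive search in a (2*radius+1)^2 window around a predicted vector."""
    best, best_cost = None, None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if cost is not None and (best_cost is None or cost < best_cost):
                best, best_cost = (dx, dy), cost
    return best, best_cost


def downsample(img):
    """Halve the resolution by averaging each 2x2 neighbourhood."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]


def hierarchical_me(cur, ref, bx, by, bs=4, radius=2):
    """Two-level estimate: coarse search on sub-sampled frames, then a +/-1 pixel refinement."""
    coarse, _ = search(downsample(cur), downsample(ref),
                       bx // 2, by // 2, bs // 2, (0, 0), radius)
    predicted = (2 * coarse[0], 2 * coarse[1])   # scale the coarse vector to full resolution
    return search(cur, ref, bx, by, bs, predicted, 1)
```

With these settings the coarse level covers displacements of up to ±4 pixels using 25 candidates on a quarter-size block, and the fine level adds only 9 more, which is where the computational saving over a full-resolution exhaustive search comes from.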

    Multiple-Vector-Based MEMC and Deep CNN for Video Frame Interpolation

    Thesis (Ph.D.), Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2019. 이혁재. Block-based hierarchical motion estimation is widely used and is successful in generating high-quality interpolation. However, it still fails to estimate the motion of small objects when the background region moves in a different direction. This is because the motion of small objects is neglected by the down-sampling and over-smoothing operations at the top level of the image pyramid in the maximum a posteriori (MAP) method. Consequently, the motion vector of small objects cannot be detected at the bottom level, and the small objects therefore often appear deformed in an interpolated frame. This thesis proposes a novel algorithm that preserves the motion vector of small objects by adding a secondary motion vector candidate that represents their movement. This additional candidate is always propagated from the top to the bottom layer of the image pyramid. Experimental results demonstrate that the intermediate frame interpolated by the proposed algorithm has significantly better visual quality than conventional MAP-based frame interpolation. In motion-compensated frame interpolation, a repetition pattern in an image makes it difficult to derive an accurate motion vector because multiple similar local minima exist in the search space of the matching cost for motion estimation. To improve the accuracy of motion estimation in a repetition region, this thesis attempts a semi-global approach that exploits both local and global characteristics of the repetition region. A histogram of the motion vector candidates is built using a voter-based voting system, which is more reliable than an elector-based voting system. Experimental results demonstrate that the proposed method significantly outperforms the previous local approach in terms of both objective peak signal-to-noise ratio (PSNR) and subjective visual quality.
In video frame interpolation, or motion-compensated frame rate up-conversion (MC-FRUC), motion compensation along unidirectional motion trajectories directly causes overlap and hole issues. To solve these issues, this research presents a new algorithm for bidirectional motion-compensated frame interpolation. First, the proposed method generates bidirectional motion vectors from two unidirectional motion vector fields (forward and backward) obtained from unidirectional motion estimation. This is done by projecting the forward and backward motion vectors into the interpolated frame. A comprehensive metric, which extends the distance between a projected block and an interpolated block, is proposed to compute weighting coefficients when the interpolated block has multiple projected blocks. Holes are filled using a vector median filter over the available non-hole neighboring blocks. The proposed method outperforms existing MC-FRUC methods and removes block artifacts significantly. Video frame interpolation with a deep convolutional neural network (CNN) is also investigated in this thesis. Optical flow and video frame interpolation are regarded as a chicken-and-egg problem, in which each problem affects the other. This thesis presents a stack of networks: intermediate optical flows are estimated from a first synthesized intermediate frame, and the final interpolated frame is then generated by a second synthesis network whose input stacks the first synthesized frame with the two frames warped by the learned intermediate optical flows. The primary benefit is that this glues the two problems into one comprehensive framework that is learned jointly, using both an analysis-by-synthesis technique for optical flow estimation and, conversely, CNN-kernel-based synthesis-by-analysis.
The proposed network is the first attempt to bridge the two branches of previous approaches, optical-flow-based synthesis and CNN-kernel-based synthesis, in one comprehensive network. Experiments are carried out on various challenging datasets, all showing that the proposed network outperforms state-of-the-art methods by significant margins for video frame interpolation and that the estimated optical flows are accurate for challenging movements. The proposed deep video frame interpolation network is also applied as a post-processing step to improve the coding efficiency of the state-of-the-art video compression standard, HEVC/H.265, and experimental results demonstrate the efficiency of the proposed network.
Table of Contents
Chapter 1. Introduction: 1.1. Hierarchical Motion Estimation of Small Objects; 1.2. Motion Estimation of a Repetition Pattern Region; 1.3. Motion-Compensated Frame Interpolation; 1.4. Video Frame Interpolation with Deep CNN; 1.5. Outline of the Thesis
Chapter 2. Previous Works: 2.1. Previous Works on Hierarchical Block-Based Motion Estimation (2.1.1. Maximum a Posteriori (MAP) Framework; 2.1.2. Hierarchical Motion Estimation); 2.2. Previous Works on Motion Estimation for a Repetition Pattern Region; 2.3. Previous Works on Motion Compensation; 2.4. Previous Works on Video Frame Interpolation with Deep CNN
Chapter 3. Hierarchical Motion Estimation for Small Objects: 3.1. Problem Statement; 3.2. The Alternative Motion Vector of High Cost Pixels; 3.3. Modified Hierarchical Motion Estimation; 3.4. Framework of the Proposed Algorithm; 3.5. Experimental Results (3.5.1. Performance Analysis; 3.5.2. Performance Evaluation)
Chapter 4. Semi-Global Accurate Motion Estimation for a Repetition Pattern Region: 4.1. Problem Statement; 4.2. Objective Function and Constraints; 4.3. Elector-Based Voting System; 4.4. Voter-Based Voting System; 4.5. Experimental Results
Chapter 5. Multiple Motion Vectors Based Motion Compensation: 5.1. Problem Statement; 5.2. Adaptive Weighted Multiple Motion Vectors Based Motion Compensation (5.2.1. One-to-Multiple Motion Vector Projection; 5.2.2. A Comprehensive Metric as the Extension of Distance); 5.3. Handling Hole Blocks; 5.4. Framework of the Proposed Motion Compensated Frame Interpolation; 5.5. Experimental Results
Chapter 6. Video Frame Interpolation with a Stack of Deep CNN: 6.1. Problem Statement; 6.2. The Proposed Network for Video Frame Interpolation (6.2.1. A Stack of Synthesis Networks; 6.2.2. Intermediate Optical Flow Derivation Module; 6.2.3. Warping Operations; 6.2.4. Training and Loss Function; 6.2.5. Network Architecture; 6.2.6. Experimental Results: 6.2.6.1. Frame Interpolation Evaluation, 6.2.6.2. Ablation Experiments); 6.3. Extension for Quality Enhancement for Compressed Videos Task; 6.4. Extension for Improving the Coding Efficiency of HEVC Based Low Bitrate Encoder
Chapter 7. Conclusion
References
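The hole-filling step mentioned in this abstract (a vector median filter over non-hole neighbouring blocks) can be sketched as follows. This is a generic vector median filter under stated assumptions, not the thesis implementation: the 8-neighbourhood, the Euclidean distance, the use of None to mark hole blocks, and the names vector_median and fill_holes are all illustrative choices.

```python
import math


def vector_median(vectors):
    """Return the input vector whose summed Euclidean distance to all inputs is smallest."""
    if not vectors:
        raise ValueError("need at least one motion vector")
    return min(vectors, key=lambda v: sum(math.dist(v, u) for u in vectors))


def fill_holes(field):
    """field: 2-D list of (dx, dy) motion vectors, with None marking hole blocks.
    Each hole takes the vector median of its non-hole 8-neighbours.
    Holes with no valid neighbour stay None. The input is not modified."""
    rows, cols = len(field), len(field[0])
    out = [row[:] for row in field]
    for r in range(rows):
        for c in range(cols):
            if field[r][c] is not None:
                continue
            neighbours = [field[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr or dc)
                          and 0 <= r + dr < rows and 0 <= c + dc < cols
                          and field[r + dr][c + dc] is not None]
            if neighbours:
                out[r][c] = vector_median(neighbours)
    return out
```

Unlike a component-wise median, the vector median always returns one of the neighbouring vectors, so a filled block never receives a motion vector that no neighbour actually has.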

    New Fast Block Matching Algorithm Using New Hybrid Search Pattern And Strategy To Improve Motion Estimation Process In Video Coding Technique

    To date, video compression algorithms have been applied in a variety of video applications, ranging from video conferencing to video telephony. Motion Estimation (ME) is regarded as one of the most effective and popular techniques in video compression. One ME technique, the Block Matching Algorithm (BMA), is widely employed in the majority of well-known video codecs because of its simplicity and high compression efficiency. It is therefore crucial to find fast BMA variants, since the simplest, most straightforward BMA is not a good fit for real-time video coding because of its high computational complexity. The aim of this study is to design and develop a new hybrid search pattern and strategy for new fast BMAs that can further improve the ME process in terms of estimation accuracy and video image quality, searching speed and computational complexity. Six algorithms are proposed: the Orthogonal-Diamond Search Algorithm with Small Diamond Search Pattern (ODS-SDSP), the Orthogonal-Diamond Search Algorithm with Large Diamond Search Pattern (ODS-LDSP), the Diamond-Orthogonal Search Algorithm with Small Diamond Pattern (DOS-SDSP), the Diamond-Orthogonal Search Algorithm with Large Diamond Pattern (DOS-LDSP), the Modified Diamond-Orthogonal Search Algorithm with Small Diamond Pattern (MDOS-SDSP), and the Modified Diamond-Orthogonal Search Algorithm with Large Diamond Pattern (MDOS-LDSP). These six algorithms are divided into three methods, Method A, Method B and Method C, according to their search patterns and strategies. The first method manipulates the diamond pattern in the process, the second manipulates the orthogonal steps, and the third is a modified version of the second intended to improve the performance of the algorithms.
Evaluation is based on the number of search points needed to find the final motion vector, the Peak Signal-to-Noise Ratio (PSNR) of the algorithms, and the runtime of the algorithm simulations. The results show that the DOS-SDSP algorithm requires the fewest search points, with only 1.7341, 4.9059 and 4.0230 for each type of motion content, respectively, while all the algorithms achieve similar PSNR values for all types of motion content. As for simulation runtime, the results show that Method B has the shortest and Method C the longest simulation runtime for all video sequences. The findings suggest that an early termination technique should be applied at an early stage of the process, and that mixing the mode selection can improve the performance of the algorithms. It can therefore be concluded that Method B gives the best performance in terms of search point reduction and simulation runtime, while Method C yields the best PSNR values for all types of motion content.
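To make the "search points" metric above concrete, the sketch below runs a plain small-diamond-pattern descent for one block and reports how many candidate positions it evaluated. It is a generic illustration only and does not reproduce any of the six hybrid algorithms in the abstract; the SAD cost, the zero-cost early exit, the step limit and the names block_sad and small_diamond_search are assumptions.

```python
def block_sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between a block of cur and the displaced block of ref (inf if out of frame)."""
    h, w = len(ref), len(ref[0])
    if by + dy < 0 or bx + dx < 0 or by + dy + bs > h or bx + dx + bs > w:
        return float("inf")
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(bs) for x in range(bs))


# Small diamond search pattern: the four neighbours of the current centre.
SDSP = ((0, -1), (-1, 0), (1, 0), (0, 1))


def small_diamond_search(cur, ref, bx, by, bs=4, max_steps=16):
    """Greedy descent over the SDSP. Returns (motion_vector, cost, search_points)."""
    best = (0, 0)
    best_cost = block_sad(cur, ref, bx, by, 0, 0, bs)
    visited = {best}                      # every evaluated candidate counts as a search point
    for _ in range(max_steps):
        if best_cost == 0:
            break                         # illustrative early termination on a perfect match
        center = best
        for ox, oy in SDSP:
            cand = (center[0] + ox, center[1] + oy)
            if cand in visited:
                continue
            visited.add(cand)
            cost = block_sad(cur, ref, bx, by, cand[0], cand[1], bs)
            if cost < best_cost:
                best, best_cost = cand, cost
        if best == center:
            break                         # no neighbour improved: local minimum reached
    return best, best_cost, len(visited)
```

A stationary block with a perfect match at (0, 0) costs a single search point here, which shows how early termination pulls the average search-point count down for low-motion content.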

    Accurate Optical Flow via Direct Cost Volume Processing

    We present an optical flow estimation approach that operates on the full four-dimensional cost volume. This direct approach shares the structural benefits of leading stereo matching pipelines, which are known to yield high accuracy. To this day, such approaches have been considered impractical due to the size of the cost volume. We show that the full four-dimensional cost volume can be constructed in a fraction of a second due to its regularity. We then exploit this regularity further by adapting semi-global matching to the four-dimensional setting. This yields a pipeline that achieves significantly higher accuracy than state-of-the-art optical flow methods while being faster than most. Our approach outperforms all published general-purpose optical flow methods on both Sintel and KITTI 2015 benchmarks. Comment: Published at the Conference on Computer Vision and Pattern Recognition (CVPR 2017).
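The notion of a four-dimensional cost volume can be illustrated with a toy sketch: for every pixel (two dimensions) and every candidate displacement within a radius (two more dimensions), a matching cost is stored. The code below is not the paper's pipeline. It uses raw absolute intensity difference as a stand-in matching cost, omits the semi-global matching step entirely, and picks the flow by a simple per-pixel minimum; the names cost_volume and winner_take_all are hypothetical.

```python
def cost_volume(img1, img2, r):
    """Build cost[y][x][v + r][u + r] = |img1[y][x] - img2[y + v][x + u]|.
    Displacements that leave the image get an infinite cost."""
    h, w = len(img1), len(img1[0])
    inf = float("inf")
    return [[[[abs(img1[y][x] - img2[y + v][x + u])
               if 0 <= y + v < h and 0 <= x + u < w else inf
               for u in range(-r, r + 1)]
              for v in range(-r, r + 1)]
             for x in range(w)]
            for y in range(h)]


def winner_take_all(volume, r):
    """Pick, for each pixel, the displacement (u, v) with the lowest cost."""
    flow = []
    for row in volume:
        flow_row = []
        for costs in row:
            _, best = min((costs[v + r][u + r], (u, v))
                          for v in range(-r, r + 1)
                          for u in range(-r, r + 1))
            flow_row.append(best)
        flow.append(flow_row)
    return flow
```

The volume holds height × width × (2r + 1)² entries, which is why its size was long considered a practical obstacle; the regular layout (every pixel has the same displacement grid) is the property the abstract refers to.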

    H.264 Motion Estimation and Applications
