
1,955 research outputs found

Multiple Vector-Based MEMC and Deep CNN for Video Frame Interpolation

Author: 니구엔탕
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2019

Doctoral thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2019. 이혁재.

Block-based hierarchical motion estimation is widely used and succeeds in generating high-quality interpolation. However, it still fails to estimate the motion of small objects when the background region moves in a different direction. This is because the motion of small objects is neglected by the down-sampling and over-smoothing operations at the top level of the image pyramid in the maximum a posteriori (MAP) method. Consequently, the motion vector of a small object cannot be detected at the bottom level, and small objects often appear deformed in the interpolated frame. This thesis proposes a novel algorithm that preserves the motion vector of small objects by adding a secondary motion vector candidate that represents their movement. This additional candidate is always propagated from the top to the bottom layer of the image pyramid. Experimental results demonstrate that the intermediate frame interpolated by the proposed algorithm has significantly better visual quality than conventional MAP-based frame interpolation.

In motion-compensated frame interpolation, a repetition pattern in an image makes it difficult to derive an accurate motion vector, because multiple similar local minima exist in the search space of the matching cost for motion estimation. To improve the accuracy of motion estimation in a repetition region, this thesis attempts a semi-global approach that exploits both the local and the global characteristics of the region. A histogram of the motion vector candidates is built using a voter-based voting system, which is more reliable than an elector-based voting system. Experimental results demonstrate that the proposed method significantly outperforms the previous local approach in terms of both objective peak signal-to-noise ratio (PSNR) and subjective visual quality.

In video frame interpolation, or motion-compensated frame rate up-conversion (MC-FRUC), motion compensation along unidirectional motion trajectories directly causes overlaps and holes. To solve these issues, this research presents a new algorithm for bidirectional motion-compensated frame interpolation. First, the proposed method generates bidirectional motion vectors from two unidirectional motion vector fields (forward and backward) obtained from unidirectional motion estimation, by projecting the forward and backward motion vectors into the interpolated frame. A comprehensive metric, an extension of the distance between a projected block and an interpolated block, is proposed to compute the weighting coefficients when an interpolated block has multiple projected blocks. Holes are filled with a vector median filter over the available non-hole neighbor blocks. The proposed method outperforms existing MC-FRUC methods and removes block artifacts significantly.

Video frame interpolation with a deep convolutional neural network (CNN) is also investigated in this thesis. Optical flow estimation and video frame interpolation form a chicken-and-egg problem in which each problem affects the other. This thesis presents a stack of networks: intermediate optical flows are estimated from a first synthesized intermediate frame, and the final interpolated frame is then generated by a second synthesis network whose input stacks that first frame with the two frames warped by the learned intermediate optical flows. The primary benefit is that the two problems are joined in one comprehensive framework that is trained jointly, using analysis-by-synthesis for optical flow estimation and, conversely, CNN-kernel-based synthesis-by-analysis for frame synthesis. The proposed network is the first attempt to bridge two branches of previous approaches, optical-flow-based synthesis and CNN-kernel-based synthesis, in a single network. Experiments on various challenging datasets all show that the proposed network outperforms state-of-the-art methods by significant margins for video frame interpolation and that the estimated optical flows are accurate for challenging movements. The proposed deep video frame interpolation network is also applied as a post-processing step to improve the coding efficiency of the state-of-the-art video compression standard, HEVC/H.265, and experimental results demonstrate the effectiveness of the proposed network.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
  1.1. Hierarchical Motion Estimation of Small Objects
  1.2. Motion Estimation of a Repetition Pattern Region
  1.3. Motion-Compensated Frame Interpolation
  1.4. Video Frame Interpolation with Deep CNN
  1.5. Outline of the Thesis
Chapter 2. Previous Works
  2.1. Previous Works on Hierarchical Block-Based Motion Estimation
    2.1.1. Maximum a Posterior (MAP) Framework
    2.1.2. Hierarchical Motion Estimation
  2.2. Previous Works on Motion Estimation for a Repetition Pattern Region
  2.3. Previous Works on Motion Compensation
  2.4. Previous Works on Video Frame Interpolation with Deep CNN
Chapter 3. Hierarchical Motion Estimation for Small Objects
  3.1. Problem Statement
  3.2. The Alternative Motion Vector of High Cost Pixels
  3.3. Modified Hierarchical Motion Estimation
  3.4. Framework of the Proposed Algorithm
  3.5. Experimental Results
    3.5.1. Performance Analysis
    3.5.2. Performance Evaluation
Chapter 4. Semi-Global Accurate Motion Estimation for a Repetition Pattern Region
  4.1. Problem Statement
  4.2. Objective Function and Constrains
  4.3. Elector based Voting System
  4.4. Voter based Voting System
  4.5. Experimental Results
Chapter 5. Multiple Motion Vectors based Motion Compensation
  5.1. Problem Statement
  5.2. Adaptive Weighted Multiple Motion Vectors based Motion Compensation
    5.2.1. One-to-Multiple Motion Vector Projection
    5.2.2. A Comprehensive Metric as the Extension of Distance
  5.3. Handling Hole Blocks
  5.4. Framework of the Proposed Motion Compensated Frame Interpolation
  5.5. Experimental Results
Chapter 6. Video Frame Interpolation with a Stack of Deep CNN
  6.1. Problem Statement
  6.2. The Proposed Network for Video Frame Interpolation
    6.2.1. A Stack of Synthesis Networks
    6.2.2. Intermediate Optical Flow Derivation Module
    6.2.3. Warping Operations
    6.2.4. Training and Loss Function
    6.2.5. Network Architecture
    6.2.6. Experimental Results
      6.2.6.1. Frame Interpolation Evaluation
      6.2.6.2. Ablation Experiments
  6.3. Extension for Quality Enhancement for Compressed Videos Task
  6.4. Extension for Improving the Coding Efficiency of HEVC based Low Bitrate Encoder
Chapter 7. Conclusion
References
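The hole-filling step described in the abstract (a vector median filter over non-hole neighbor blocks) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names `vector_median` and `fill_holes`, the 3x3 neighborhood, the use of `None` to mark a hole, and the Euclidean distance are all assumptions made for the example.

```python
from math import hypot


def vector_median(vectors):
    """Return the input vector with the smallest summed Euclidean distance to all others."""
    if not vectors:
        raise ValueError("need at least one motion vector")

    def total_distance(v):
        return sum(hypot(v[0] - w[0], v[1] - w[1]) for w in vectors)

    return min(vectors, key=total_distance)


def fill_holes(mv_field):
    """Fill hole blocks (None) with the vector median of their non-hole 3x3 neighbors.

    mv_field is a 2-D list of (dx, dy) tuples; the input is not modified.
    A hole with no available neighbor is left as None.
    """
    rows, cols = len(mv_field), len(mv_field[0])
    filled = [row[:] for row in mv_field]
    for r in range(rows):
        for c in range(cols):
            if mv_field[r][c] is not None:
                continue
            neighbors = [
                mv_field[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c) and mv_field[rr][cc] is not None
            ]
            if neighbors:
                filled[r][c] = vector_median(neighbors)
    return filled
```

Because the vector median always returns one of the input vectors, a single outlier among the neighbors cannot pull the filled vector away from the dominant motion, which is the usual reason for preferring it to a component-wise mean.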

SNU Open Repository and Archive

A Content-Adaptive Side Information Generation Method for Distributed Video Coding

Author: Cao Ning
Li Li
Shao Yongqin
Zhu Jinxiu
Publication venue: Published by Elsevier Ltd.
Publication date: 31/12/2012

In this paper, a content-adaptive method for generating side information at the block level is presented. First, a motion-compensated temporal interpolation (MCTI) algorithm is applied between the reconstructed key frames at the decoder to acquire initial motion vectors. Second, the image is segmented and the edges of the moving region are detected from the residual frame between two consecutive key frames. Then, hierarchical motion estimation (HME) and a motion vector filter (MVF) are adopted for the edge region, and an adaptive motion vector filter (AMVF) is introduced in the non-edge region to correct falsely estimated motion vectors. The proposal is tested against the state-of-the-art DISCOVER codec, and RD improvements are observed on the set of test sequences.

Elsevier - Publisher Connector

Motion Estimation at the Decoder

Author: Jörn Ostermann
Sven Klomp
Publication venue: 'IntechOpen'
Publication date: 26/04/2011

IntechOpen

Variable Block Size Motion Compensation In The Redundant Wavelet Domain

Author: Suliman Ahmed Abdelgadir
Publication venue: Aggie Digital Collections and Scholarship
Publication date: 01/01/2011

Video is one of the most powerful forms of multimedia because of the extensive information it delivers. Video sequences are highly correlated both temporally and spatially, which is what makes video compression possible. Modern video systems employ motion estimation and motion compensation (ME/MC) to de-correlate a video sequence temporally. ME/MC forms a prediction of the current frame from frames that have already been encoded. Consequently, only the corresponding residual image needs to be transmitted instead of the original frame, together with a set of motion vectors that describe the scene motion as observed at the encoder. The redundant wavelet transform (RDWT) provides several advantages over the conventional discrete wavelet transform (DWT). The RDWT overcomes the shift-variance problem of the DWT. Moreover, the RDWT retains all the phase information of the wavelet coefficients and provides multiple prediction possibilities for ME/MC in the wavelet domain. The general idea of the variable size block motion compensation (VSBMC) technique is to partition a frame so that regions with uniform translational motion are divided into larger blocks and regions containing complicated motion into smaller blocks, leading to an adaptive distribution of motion vectors (MVs) across the frame. The research proposes new adaptive partitioning schemes and decision criteria in the RDWT domain that use the motion content of a frame more effectively in terms of block size. It also proposes a selective subpixel accuracy algorithm for the motion vector using a multiband approach, which reduces the computation required by the conventional subpixel algorithm while maintaining the same accuracy. In addition, overlapped block motion compensation (OBMC) is used to reduce blocking artifacts. Finally, the research extends the proposed VSBMC to 3D video sequences. The experimental results show that VSBMC in the RDWT domain can be a powerful tool for video compression.
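The general idea of variable block size partitioning stated in the abstract (large blocks where motion is uniform, smaller blocks where it is not) can be sketched with a simple quadtree split. This is an illustration only: the function name `partition`, the dense per-pixel motion field, and the "all vectors identical" uniformity test are assumptions, and the sketch works in the pixel domain rather than the RDWT domain with the decision criteria proposed in the work.

```python
def partition(mv_field, y, x, size, min_size):
    """Recursively split a square block until its motion vectors are uniform.

    mv_field is a 2-D list of (dx, dy) tuples, one per pixel (assumed input).
    Returns a list of (y, x, size) blocks in raster order of the quadtree.
    """
    vectors = {mv_field[y + i][x + j] for i in range(size) for j in range(size)}
    if len(vectors) == 1 or size <= min_size:
        # Uniform motion (or smallest allowed block): keep the block whole.
        return [(y, x, size)]
    half = size // 2
    blocks = []
    for oy in (0, half):
        for ox in (0, half):
            blocks += partition(mv_field, y + oy, x + ox, half, min_size)
    return blocks
```

With this rule a frame region with a single translational motion costs one motion vector, while a region containing a motion boundary is refined only around that boundary.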

North Carolina Agricultural and Technical State University: NC A&T SU Bluford Library's Aggie Digital Collections and Scholarship

Hierarchical motion estimation for side information creation in Wyner-Ziv video coding

Author: Fernando Pereira
João Ascenso
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008

Recently, several video coding solutions based on the distributed source coding paradigm have appeared in the literature. Among them, Wyner-Ziv video coding schemes allow a flexible distribution of the computational complexity between the encoder and the decoder, promising to fulfill the requirements of emerging applications such as visual sensor networks and wireless surveillance. To achieve performance comparable to predictive video coding solutions, it is necessary to increase the quality of the side information, that is, the estimate of the original frame created at the decoder. In this paper, a hierarchical motion estimation (HME) technique using different scales and increasingly smaller block sizes is proposed to generate a more reliable estimate of the motion field. The HME technique is integrated into a well-known motion-compensated frame interpolation framework responsible for creating the side information in a Wyner-Ziv video decoder. The proposed technique improves rate-distortion (RD) performance by up to 7 dB compared with H.263+ Intra and up to 3 dB compared with H.264/AVC Intra.
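The coarse-to-fine principle behind hierarchical motion estimation can be sketched as below. This is a generic illustration under stated assumptions, not the paper's HME: the helper names (`downsample`, `sad`, `search`, `hierarchical_me`), the 2x2 averaging pyramid, the sum-of-absolute-differences cost and the fixed search radius are all choices made for the example, and the block size is simply scaled with the pyramid level rather than reduced across iterations as the paper describes.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 neighborhood."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def sad(cur, ref, y, x, dy, dx, size):
    """Sum of absolute differences; infinite if the reference block leaves the frame."""
    h, w = len(ref), len(ref[0])
    if not (0 <= y + dy and y + dy + size <= h and 0 <= x + dx and x + dx + size <= w):
        return float("inf")
    return sum(abs(cur[y + i][x + j] - ref[y + dy + i][x + dx + j])
               for i in range(size) for j in range(size))


def search(cur, ref, y, x, size, centre, radius):
    """Full search in a (2*radius+1)^2 window centred on a predicted vector."""
    cy, cx = centre
    candidates = [(cy + dy, cx + dx)
                  for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)]
    return min(candidates, key=lambda mv: sad(cur, ref, y, x, mv[0], mv[1], size))


def hierarchical_me(cur, ref, y, x, size, levels=2, radius=2):
    """Estimate the (dy, dx) motion of the block at (y, x), coarsest level first."""
    pyramid = [(cur, ref)]
    for _ in range(levels - 1):
        c, r = pyramid[-1]
        pyramid.append((downsample(c), downsample(r)))
    mv = (0, 0)
    for level in range(levels - 1, -1, -1):
        c, r = pyramid[level]
        scale = 2 ** level
        mv = search(c, r, y // scale, x // scale, max(size // scale, 1), mv, radius)
        if level > 0:
            # Propagate the vector to the next finer level, where distances double.
            mv = (mv[0] * 2, mv[1] * 2)
    return mv
```

The coarse level covers a large displacement range cheaply, and each finer level only refines the propagated vector within a small window, which is what makes the estimated field more reliable than a single-scale search of the same cost.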

CiteSeerX

Crossref

Motion Estimation and Compensation in the Redundant Wavelet Domain

Author: Cui Suxia
Publication venue: Scholars Junction
Publication date: 25/07/2003

Despite being the preferred approach for still-image compression for nearly a decade, wavelet-based coding for video has been slow to emerge, primarily because the shift variance of the discrete wavelet transform hinders the motion estimation and compensation that are crucial to modern video coders. Recently it has been recognized that a redundant, or overcomplete, wavelet transform is shift invariant and thus permits motion prediction in the wavelet domain. In this dissertation, other uses for the redundancy of overcomplete wavelet transforms in video coding are explored. First, it is demonstrated that the redundant-wavelet domain facilitates the placement of an irregular triangular mesh on video images, thereby exploiting transform redundancy to implement geometries for motion estimation and compensation that are more general than the widely employed block structure. As the second contribution of this dissertation, a new form of multihypothesis prediction, redundant wavelet multihypothesis, is presented. This new approach to motion estimation and compensation produces motion predictions that are diverse in transform phase in order to increase prediction accuracy. Finally, a battery of experimental results demonstrates that the proposed redundant-wavelet strategies complement existing advanced video-coding techniques and produce significant performance improvements.
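The shift-variance issue mentioned in the abstract can be seen on a one-dimensional toy signal. The sketch below compares one level of Haar detail coefficients computed with and without downsampling; the function names, the Haar filter, the periodic boundary handling and the test signal are assumptions chosen for illustration and are not taken from the dissertation.

```python
def haar_detail_decimated(signal):
    """One-level Haar detail coefficients with downsampling by two (critically sampled)."""
    return [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def haar_detail_redundant(signal):
    """One-level Haar detail coefficients at every position, with periodic extension."""
    n = len(signal)
    return [(signal[i] - signal[(i + 1) % n]) / 2 for i in range(n)]


def shift(signal, k):
    """Circularly shift a list to the right by k samples."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else signal[:]


x = [0, 0, 4, 8, 0, 0, 0, 0]

# Decimated transform: shifting the input by one sample changes the coefficient
# values themselves, so they are not a shifted copy of the original coefficients.
decimated = haar_detail_decimated(x)
decimated_shifted = haar_detail_decimated(shift(x, 1))

# Redundant transform: shifting the input simply shifts the coefficients.
redundant = haar_detail_redundant(x)
redundant_shifted = haar_detail_redundant(shift(x, 1))
```

In the redundant case `redundant_shifted` equals `shift(redundant, 1)`, so a translated image region has translated coefficients and block matching in the transform domain remains meaningful; no circular shift of `decimated` reproduces `decimated_shifted`.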

Mississippi State University Libraries ETD database

Scholars Junction - Mississippi State University Institutional Repository

Disparity-compensated view synthesis for s3D content correction

Author: Conze Pierre-Henri
Robert Philippe
Thébault Cédric
Publication venue: HAL CCSD
Publication date: 23/01/2012

The production of stereoscopic 3D HD content is increasing considerably, and experience in two-view acquisition is still developing. High-quality material for the audience is required but not always ensured, so correction of the stereo views may be needed. This is done via disparity-compensated view synthesis. A robust method has been developed that deals with acquisition problems that cause discomfort (e.g. hyperdivergence and hyperconvergence) as well as those that may disrupt the correction itself (e.g. vertical disparity and color differences between views). The method has three phases. The first is a preprocessing phase that corrects the stereo images and estimates features (e.g. the disparity range) over the sequence. The second (main) phase then performs disparity estimation and view synthesis. Dual disparity estimation based on robust block matching, discontinuity-preserving filtering, and consistency and occlusion handling has been developed. Accurate view synthesis is carried out through disparity compensation. Disparity assessment has been introduced in order to detect and quantify errors, and a post-processing phase deals with these errors as a fallback mode. The paper focuses on disparity estimation and view synthesis for HD images. Quality assessment of synthesized views on a large set of HD video data has demonstrated the effectiveness of the method.
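The consistency handling that accompanies dual (left-to-right and right-to-left) disparity estimation is commonly implemented as a cross-check between the two disparity maps. The sketch below shows that generic check on a single scanline; it is not the paper's algorithm, and the function name `consistency_mask`, the sign convention and the tolerance parameter are assumptions made for the example.

```python
def consistency_mask(disp_left, disp_right, tolerance=1):
    """Mark pixels of one scanline whose two disparity estimates agree.

    Assumed convention: left pixel x corresponds to right pixel x - disp_left[x],
    and disp_right stores the disparity of each right pixel with the same sign.
    Pixels that map outside the image or disagree by more than `tolerance`
    are marked False (typically occlusions or mismatches).
    """
    width = len(disp_left)
    mask = []
    for x, d in enumerate(disp_left):
        xr = x - d
        consistent = 0 <= xr < width and abs(d - disp_right[xr]) <= tolerance
        mask.append(consistent)
    return mask
```

Pixels flagged as inconsistent are the natural candidates for the occlusion handling and error assessment stages that the abstract describes.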

Deliverable D3.3 of the PERSEE project: 2D coding tools

Author: Cagnazzo Marco
Guillemot Christine
Guillo Laurent
Le Meur Olivier
Pesquet-Popescu Béatrice
Ricordel Vincent
Publication venue: HAL CCSD
Publication date: 01/10/2011

Deliverable D3.3 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D3.3 of the project. Its title: 2D coding tool

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Research and developments of distributed video coding

Author: Xue Zhuo
Publication venue: School of Engineering and Design, Brunel University
Publication date: 01/01/2009

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.

The recently developed Distributed Video Coding (DVC) is well suited to applications such as wireless or wired video sensor networks and mobile cameras, where traditional video coding standards are not feasible because of the constrained computation at the encoder. With DVC, the computational burden is moved from the encoder to the decoder, and compression efficiency is achieved via joint decoding at the decoder. The practical application of DVC is referred to as Wyner-Ziv (WZ) video coding, where side information is available at the decoder to perform joint decoding. This joint decoding inevitably results in a very complex decoder. Much current work on WZ video coding emphasises improving the system's coding performance but neglects the huge complexity incurred at the decoder, which directly influences the system output. The first stage of this research aims to optimise the decoder in pixel domain WZ video coding (PDWZ) while still achieving similar compression performance. More specifically, four issues are addressed: optimising the input block size, the side information generation, the side information refinement process and the feedback channel. Transform domain WZ video coding (TDWZ) performs distinctly better than normal PDWZ because it exploits spatial correlation during encoding. However, since there is no motion estimation at the encoder in WZ video coding, temporal correlation is not exploited at all at the encoder in current WZ video coding schemes. In the middle stage of this research, the 3D DCT is adopted in TDWZ to remove redundancy in both the spatial and temporal directions and thus provide even higher coding performance. In the next step, the performance of transform domain Distributed Multiview Video Coding (DMVC) is also investigated. In particular, three types of transform domain DMVC framework are investigated: transform domain DMVC using TDWZ based on 2D DCT, transform domain DMVC using TDWZ based on 3D DCT, and transform domain residual DMVC using TDWZ based on 3D DCT.

One of the important applications of the WZ coding principle is error resilience. There have been several attempts to apply WZ error-resilient coding to current video coding standards, e.g. H.264/AVC or MPEG-2. The final stage of this research is the design of a WZ error-resilient scheme for a wavelet-based video codec. To balance the trade-off between error resilience and bandwidth consumption, the proposed scheme emphasises protection of the Region of Interest (ROI). Efficient bandwidth utilisation is achieved through the combination of WZ coding and sacrificing the quality of unimportant areas. In summary, this research contributes several advances in WZ video coding. First, it targets an efficient PDWZ with an optimised decoder. Second, it aims to build an advanced TDWZ based on 3D DCT, which is then applied to multiview video coding to realise advanced transform domain DMVC. Finally, it aims to design an efficient error-resilient scheme for a wavelet video codec, with which the trade-off between bandwidth consumption and error resilience can be better balanced.

CiteSeerX

Brunel University Research Archive


Intelligent Side Information Generation in Distributed Video Coding

Author: Akinola Mobolaji Olukunle
Publication venue
Publication date: 15/05/2015

Distributed video coding (DVC) reverses the traditional coding paradigm of complex encoders allied with basic decoding to one where the computational cost is largely incurred by the decoder. This is attractive because the proven theoretical work of Wyner-Ziv (WZ) and Slepian-Wolf (SW) shows that the performance of such a system should be exactly the same as that of a conventional coder. Despite the solid theoretical foundations, current DVC qualitative and quantitative performance falls short of existing conventional coders, and crucial limitations remain. A key constraint governing DVC performance is the quality of the side information (SI), a coarse representation of original video frames that are not available at the decoder. Techniques to generate SI have usually been based on linear motion compensated temporal interpolation (LMCTI), though these do not always produce satisfactory SI quality, especially in sequences exhibiting non-linear motion. This thesis presents an intelligent higher order piecewise trajectory temporal interpolation (HOPTTI) framework for SI generation, with original contributions that afford better SI quality than existing LMCTI-based approaches. The major elements of this framework are: (i) a cubic trajectory interpolation algorithm model that significantly improves the accuracy of motion vector estimation; (ii) an adaptive overlapped block motion compensation (AOBMC) model that reduces both the blocking and the overlapping artefacts in the SI that emanate from the block matching algorithm; (iii) an empirical mode switching algorithm; and (iv) an intelligent switching mechanism that constructs the SI by automatically selecting the best macroblock from the intermediate SI generated by the HOPTTI and AOBMC algorithms. Rigorous analysis and evaluation confirm that significant quantitative and perceptual improvements in SI quality are achieved with the new framework.
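The difference between linear and higher-order trajectory interpolation can be illustrated with a small numerical sketch. The abstract does not spell out the cubic model, so the example below simply assumes Lagrange interpolation through the positions of an object in four key frames; the function name `lagrange_position`, the key-frame times and the quadratic test trajectory are hypothetical and serve only to show why a linear midpoint fails under non-linear motion.

```python
def lagrange_position(times, positions, t):
    """Evaluate the polynomial through (times[i], positions[i]) at time t."""
    total = 0.0
    for i, (ti, pi) in enumerate(zip(times, positions)):
        weight = 1.0
        for j, tj in enumerate(times):
            if j != i:
                weight *= (t - tj) / (ti - tj)
        total += weight * pi
    return total


# Hypothetical object accelerating along one axis: x(t) = t^2 + t.
key_times = [-3, -1, 1, 3]
key_positions = [t * t + t for t in key_times]   # [6, 0, 2, 12]

# Position in the missing frame at t = 0 (the true value is 0).
linear = lagrange_position(key_times[1:3], key_positions[1:3], 0)  # two nearest key frames
cubic = lagrange_position(key_times, key_positions, 0)             # all four key frames
```

Here the linear estimate from the two nearest key frames is off by one pixel, while the four-point cubic recovers the true position, which mirrors the motivation for moving beyond LMCTI in sequences with non-linear motion.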

Open Research Online (The Open University)