326 research outputs found
Multiple Vector-based MEMC and Deep CNN for Video Frame Interpolation
Ph.D. thesis, Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, February 2019.
Block-based hierarchical motion estimation is widely used and successful in generating high-quality interpolated frames. However, it still fails to estimate the motion of small objects when the background region moves in a different direction. This is because the motion of small objects is neglected by the down-sampling and over-smoothing operations at the top level of the image pyramid in the maximum a posteriori (MAP) method. Consequently, the motion vectors of small objects cannot be detected at the bottom level, and the small objects therefore often appear deformed in the interpolated frame. This thesis proposes a novel algorithm that preserves the motion vectors of small objects by adding a secondary motion-vector candidate that represents their movement. This additional candidate is always propagated from the top to the bottom level of the image pyramid. Experimental results demonstrate that intermediate frames interpolated by the proposed algorithm significantly improve visual quality compared with conventional MAP-based frame interpolation.
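The two-candidate idea can be sketched in a few lines. The following is a minimal illustrative implementation, not the thesis's algorithm: it uses plain SAD block matching, a 2x2-averaging pyramid, and simply keeps the higher-cost candidate alive as the "secondary" vector, whereas the thesis derives the secondary candidate from the small object's motion. All function names are invented for this sketch.

```python
import numpy as np

def downsample(f):
    """Average 2x2 blocks to build the next (coarser) pyramid level."""
    h, w = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2
    f = f[:h, :w].astype(np.float64)
    return (f[0::2, 0::2] + f[1::2, 0::2] + f[0::2, 1::2] + f[1::2, 1::2]) / 4.0

def refine(cur, ref, y, x, bs, cand, r=1):
    """Small SAD search of radius r around candidate vector; returns (vector, cost)."""
    best, cost = cand, float("inf")
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            vy, vx = cand[0] + dy, cand[1] + dx
            ry, rx = y + vy, x + vx
            if 0 <= ry <= ref.shape[0] - bs and 0 <= rx <= ref.shape[1] - bs:
                c = float(np.abs(cur[y:y + bs, x:x + bs] - ref[ry:ry + bs, rx:rx + bs]).sum())
                if c < cost:
                    best, cost = (vy, vx), c
    return best, cost

def two_candidate_me(cur, ref, bs=4, levels=2):
    """Coarse-to-fine block ME that always propagates TWO vector candidates
    per block, so a candidate vector is never discarded at coarse levels."""
    pyr_c, pyr_r = [np.asarray(cur, float)], [np.asarray(ref, float)]
    for _ in range(levels - 1):
        pyr_c.append(downsample(pyr_c[-1]))
        pyr_r.append(downsample(pyr_r[-1]))
    field = None
    for lv in range(levels - 1, -1, -1):
        c, rf = pyr_c[lv], pyr_r[lv]
        nby, nbx = c.shape[0] // bs, c.shape[1] // bs
        new = np.zeros((nby, nbx, 2, 2), dtype=int)  # [block_y, block_x, candidate, (vy, vx)]
        for by in range(nby):
            for bx in range(nbx):
                if field is None:
                    cands = [(0, 0), (0, 0)]
                else:  # inherit both candidates from the parent block, scaled by 2
                    py = min(by // 2, field.shape[0] - 1)
                    px = min(bx // 2, field.shape[1] - 1)
                    cands = [tuple(2 * field[py, px, k]) for k in range(2)]
                results = [refine(c, rf, by * bs, bx * bs, bs, cd) for cd in cands]
                results.sort(key=lambda vc: vc[1])
                new[by, bx, 0] = results[0][0]   # primary: lowest matching cost
                new[by, bx, 1] = results[-1][0]  # secondary: kept alive regardless of cost
        field = new
    return field[:, :, 0]  # primary vectors at full resolution
```

In the thesis, the secondary slot would carry the small object's vector down the pyramid even when the coarse levels smooth it away; here it merely demonstrates the propagation mechanics.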
In motion-compensated frame interpolation, a repetition pattern in an image makes it difficult to derive an accurate motion vector because multiple similar local minima exist in the search space of the matching cost for motion estimation. To improve the accuracy of motion estimation in a repetition region, this thesis adopts a semi-global approach that exploits both the local and the global characteristics of the region. A histogram of motion-vector candidates is built using a voter-based voting system, which is more reliable than an elector-based voting system. Experimental results demonstrate that the proposed method significantly outperforms the previous local approach in terms of both objective peak signal-to-noise ratio (PSNR) and subjective visual quality.
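The voting scheme can be illustrated as follows. This is a hedged sketch: `weights` stands in for per-candidate voter counts (e.g., the number of supporting pixels), which is what makes voter-based voting more reliable than one-vote-per-block elector voting, and `resolve_block` shows one plausible way a globally dominant vector can disambiguate a block's near-tied local minima.

```python
from collections import Counter

def dominant_motion(candidates, weights=None):
    """Histogram over motion-vector candidates; returns the dominant vector.
    In an elector-style scheme each block casts one vote; in a voter-style
    scheme every pixel votes, which `weights` models as per-candidate counts."""
    hist = Counter()
    if weights is None:
        weights = [1] * len(candidates)
    for mv, wt in zip(candidates, weights):
        hist[mv] += wt
    return hist.most_common(1)[0][0]

def resolve_block(local_minima, global_mv):
    """Among a block's near-tied local minima (typical in a repetition pattern),
    keep the candidate most consistent with the globally dominant vector."""
    return min(local_minima,
               key=lambda mv: (mv[0] - global_mv[0]) ** 2 + (mv[1] - global_mv[1]) ** 2)
```

With equal weights the two schemes coincide; the pixel-level weights are what let a well-supported vector outvote many weakly supported ones.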
In video frame interpolation or motion-compensated frame rate up-conversion (MC-FRUC), motion compensation along unidirectional motion trajectories directly causes overlap and hole problems. To solve these issues, this research presents a new algorithm for bidirectional motion-compensated frame interpolation. First, the proposed method generates bidirectional motion vectors from the two unidirectional motion vector fields (forward and backward) obtained by unidirectional motion estimation; this is done by projecting the forward and backward motion vectors into the interpolated frame. A comprehensive metric, defined as an extension of the distance between a projected block and the interpolated block, is proposed to compute weighting coefficients when an interpolated block has multiple projected blocks. Holes are filled by applying a vector median filter to the available non-hole neighboring blocks. The proposed method outperforms existing MC-FRUC methods and significantly removes block artifacts.
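Two of the building blocks above can be sketched compactly. The inverse-distance weights below are an assumption for illustration only; the thesis proposes a comprehensive metric that extends the plain projected-block distance.

```python
import numpy as np

def vector_median(vectors):
    """Vector median filter: return the member vector minimising the sum of
    Euclidean distances to all other vectors in the set (used for hole filling
    from the non-hole neighbouring blocks)."""
    v = np.asarray(vectors, dtype=float)
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2).sum(axis=1)
    return tuple(int(t) for t in v[int(np.argmin(d))])

def blend_projected_blocks(blocks, dists, eps=1e-6):
    """When several blocks project onto one interpolated block, blend them with
    weights decreasing in the distance metric (plain inverse-distance here)."""
    w = 1.0 / (np.asarray(dists, dtype=float) + eps)
    w /= w.sum()
    return np.tensordot(w, np.asarray(blocks, dtype=float), axes=1)
```

Unlike a per-component median, the vector median always returns one of the input vectors, which avoids inventing motion that no neighbour actually has.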
Video frame interpolation with a deep convolutional neural network (CNN) is also investigated in this thesis. Optical flow estimation and video frame interpolation are treated as a chicken-and-egg problem, in which each affects the other. This thesis presents a stack of networks trained to estimate intermediate optical flows from a first synthesized intermediate frame; the final interpolated frame is then generated by a second synthesis network fed with the first synthesized frame and two frames warped by the learned intermediate flows. The primary benefit is that the two problems are fused into one comprehensive framework trained jointly, using an analysis-by-synthesis technique for optical flow estimation and, conversely, a CNN-kernel-based synthesis-by-analysis for frame generation. The proposed network is the first attempt to bridge the two branches of previous approaches, optical-flow-based synthesis and CNN-kernel-based synthesis, in a single comprehensive network. Experiments on various challenging datasets show that the proposed network outperforms state-of-the-art methods by significant margins for video frame interpolation, and that the estimated optical flows are accurate for challenging motions. The proposed deep video frame interpolation network is also applied as a post-processing step to improve the coding efficiency of the state-of-the-art video compression standard HEVC/H.265, and experimental results confirm its efficiency.
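The warping operation that feeds the second synthesis network is, in essence, backward warping of an input frame by an estimated flow. A minimal numpy sketch with bilinear sampling and border clamping follows; the actual network performs this differentiably inside the training graph, and the flow convention here (per-pixel (dy, dx) displacements) is an assumption of the sketch.

```python
import numpy as np

def backward_warp(img, flow):
    """Bilinearly sample `img` at positions displaced by `flow` (H, W, 2), so
    out[y, x] = img[y + flow[y, x, 0], x + flow[y, x, 1]], clamped at borders."""
    h, w = img.shape[:2]
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sy = np.clip(yy + flow[..., 0], 0, h - 1)
    sx = np.clip(xx + flow[..., 1], 0, w - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = sy - y0, sx - x0
    # Weighted sum of the four neighbouring samples.
    return (img[y0, x0] * (1 - wy) * (1 - wx) + img[y0, x1] * (1 - wy) * wx
            + img[y1, x0] * wy * (1 - wx) + img[y1, x1] * wy * wx)
```

Given flows from the intermediate instant toward each input frame, warping both inputs this way yields the two "flow-based warped frames" that are stacked with the first synthesized frame as input to the second network.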
Abstract i
Table of Contents iv
List of Tables vii
List of Figures viii
Chapter 1. Introduction 1
1.1. Hierarchical Motion Estimation of Small Objects 2
1.2. Motion Estimation of a Repetition Pattern Region 4
1.3. Motion-Compensated Frame Interpolation 5
1.4. Video Frame Interpolation with Deep CNN 6
1.5. Outline of the Thesis 7
Chapter 2. Previous Works 9
2.1. Previous Works on Hierarchical Block-Based Motion Estimation 9
2.1.1. Maximum a Posteriori (MAP) Framework 10
2.1.2. Hierarchical Motion Estimation 12
2.2. Previous Works on Motion Estimation for a Repetition Pattern Region 13
2.3. Previous Works on Motion Compensation 14
2.4. Previous Works on Video Frame Interpolation with Deep CNN 16
Chapter 3. Hierarchical Motion Estimation for Small Objects 19
3.1. Problem Statement 19
3.2. The Alternative Motion Vector of High Cost Pixels 20
3.3. Modified Hierarchical Motion Estimation 23
3.4. Framework of the Proposed Algorithm 24
3.5. Experimental Results 25
3.5.1. Performance Analysis 26
3.5.2. Performance Evaluation 29
Chapter 4. Semi-Global Accurate Motion Estimation for a Repetition Pattern Region 32
4.1. Problem Statement 32
4.2. Objective Function and Constraints 33
4.3. Elector based Voting System 34
4.4. Voter based Voting System 36
4.5. Experimental Results 40
Chapter 5. Multiple Motion Vectors based Motion Compensation 44
5.1. Problem Statement 44
5.2. Adaptive Weighted Multiple Motion Vectors based Motion Compensation 45
5.2.1. One-to-Multiple Motion Vector Projection 45
5.2.2. A Comprehensive Metric as the Extension of Distance 48
5.3. Handling Hole Blocks 49
5.4. Framework of the Proposed Motion Compensated Frame Interpolation 50
5.5. Experimental Results 51
Chapter 6. Video Frame Interpolation with a Stack of Deep CNN 56
6.1. Problem Statement 56
6.2. The Proposed Network for Video Frame Interpolation 57
6.2.1. A Stack of Synthesis Networks 57
6.2.2. Intermediate Optical Flow Derivation Module 60
6.2.3. Warping Operations 62
6.2.4. Training and Loss Function 63
6.2.5. Network Architecture 64
6.2.6. Experimental Results 64
6.2.6.1. Frame Interpolation Evaluation 64
6.2.6.2. Ablation Experiments 77
6.3. Extension for Quality Enhancement for Compressed Videos Task 83
6.4. Extension for Improving the Coding Efficiency of HEVC based Low Bitrate Encoder 88
Chapter 7. Conclusion 94
References 97
Fusion of interpolated frames superresolution in the presence of atmospheric optical turbulence
An extension of the fusion of interpolated frames superresolution (FIF SR) method to perform SR in the presence of atmospheric optical turbulence is presented. The goal of such processing is to improve the performance of imaging systems impacted by turbulence. We provide an optical transfer function analysis that illustrates regimes where significant degradation from both aliasing and turbulence may be present in imaging systems. This analysis demonstrates the potential need for simultaneous SR and turbulence mitigation (TM). While the FIF SR method was not originally proposed to address this joint restoration problem, we believe it is well suited for the task. We propose a variation of the FIF SR method with a fusion parameter that allows it to transition from traditional diffraction-limited SR to pure TM with no SR, as well as a continuum in between. This fusion parameter balances the subpixel resolution needed for SR with the amount of temporal averaging needed for TM and noise reduction. In addition, we develop a model of the interpolation blurring that results from the fusion process, as a function of this tuning parameter. The blurring model is then incorporated into the overall degradation model that is addressed in the restoration step of the FIF SR method. This innovation benefits the FIF SR method in all applications. We present a number of experimental results to demonstrate the efficacy of the FIF SR method at different levels of turbulence. Simulated imagery with known ground truth is used for a detailed quantitative analysis. Three real infrared image sequences are also used. Two of these include bar targets that allow for a quantitative resolution-enhancement assessment.
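As a rough illustration of how a single fusion parameter can trade subpixel placement against temporal averaging, the toy sketch below places registered low-resolution frames on a finer grid and averages them with geometrically decaying temporal weights. This is only a schematic stand-in under invented conventions: the actual FIF SR method interpolates the fused samples onto a uniform grid and follows with a restoration step that models the interpolation blur.

```python
import numpy as np

def toy_fuse(frames, shifts, scale=2, alpha=1.0):
    """Toy placement-and-average fusion. `alpha` mimics a fusion parameter:
    alpha=0 keeps only the reference frame (pure subpixel placement, no
    temporal averaging); alpha=1 weights all frames equally (maximum temporal
    averaging, as needed for turbulence and noise mitigation)."""
    h, w = frames[0].shape
    wgt = np.array([alpha ** k for k in range(len(frames))], dtype=float)
    acc = np.zeros((h * scale, w * scale))
    cnt = np.zeros_like(acc)
    for frame, (sy, sx), wk in zip(frames, shifts, wgt):
        if wk == 0.0:
            continue
        # Place each low-res sample at its registered high-res position.
        ys = np.clip(np.arange(h) * scale + int(round(sy * scale)), 0, h * scale - 1)
        xs = np.clip(np.arange(w) * scale + int(round(sx * scale)), 0, w * scale - 1)
        acc[np.ix_(ys, xs)] += wk * np.asarray(frame, dtype=float)
        cnt[np.ix_(ys, xs)] += wk
    return np.where(cnt > 0, acc / np.where(cnt > 0, cnt, 1.0), 0.0)
```

Gaps in the high-res grid (positions no frame lands on) are left at zero here; in the real method they are filled by the interpolation step whose blur the restoration stage then deconvolves.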
Intelligent Side Information Generation in Distributed Video Coding
Distributed video coding (DVC) reverses the traditional coding paradigm of complex encoders allied with basic decoding to one where the computational cost is largely incurred by the decoder. This is attractive, as the proven theoretical work of Wyner-Ziv (WZ) and Slepian-Wolf (SW) shows that the performance of such a system should be exactly the same as that of a conventional coder. Despite the solid theoretical foundations, current DVC qualitative and quantitative performance falls short of existing conventional coders, and crucial limitations remain. A key constraint governing DVC performance is the quality of side information (SI), a coarse representation of original video frames which are not available at the decoder. Techniques to generate SI have usually been based on linear motion compensated temporal interpolation (LMCTI), though these do not always produce satisfactory SI quality, especially in sequences exhibiting non-linear motion.
This thesis presents an intelligent higher order piecewise trajectory temporal interpolation (HOPTTI) framework for SI generation with original contributions that afford better SI quality in comparison to existing LMCTI-based approaches. The major elements in this framework are: (i) a cubic trajectory interpolation algorithm that significantly improves the accuracy of motion vector estimation; (ii) an adaptive overlapped block motion compensation (AOBMC) model which reduces both blocking and overlapping artefacts in the SI emanating from the block matching algorithm; (iii) an empirical mode switching algorithm; and (iv) an intelligent switching mechanism to construct SI by automatically selecting the best macroblock from the intermediate SI generated by the HOPTTI and AOBMC algorithms. Rigorous analysis and evaluation confirm that significant quantitative and perceptual improvements in SI quality are achieved with the new framework.
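Element (i) can be illustrated in a few lines: fit a cubic through a block's centre positions in four consecutive frames and evaluate it at the interpolation instant, instead of assuming linear motion between the two middle frames. The frame times (-1, 0, 1, 2) and the (4, 2) position layout are assumptions of this sketch.

```python
import numpy as np

def cubic_trajectory_position(positions, t=0.5):
    """Fit a cubic through a block's (y, x) centres observed in four frames at
    times -1, 0, 1, 2 and evaluate it at the interpolation instant t."""
    ts = np.array([-1.0, 0.0, 1.0, 2.0])
    pos = np.asarray(positions, dtype=float)   # shape (4, 2)
    cy = np.polyfit(ts, pos[:, 0], 3)          # cubic fit, y-trajectory
    cx = np.polyfit(ts, pos[:, 1], 3)          # cubic fit, x-trajectory
    return float(np.polyval(cy, t)), float(np.polyval(cx, t))
```

With four observations a cubic interpolates exactly, so accelerating or curved motion that a linear model would misplace is followed faithfully at t = 0.5.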
Temporal frame interpolation with pixel-based refined motion estimation and halo-effect reduction
In this work, after a summary of the state of the art, a new temporal frame interpolation method with halo reduction is proposed. First, for standard-definition television, a motion estimation whose resolution is the pixel is proposed. The estimation is performed by block matching and is followed by a pixel-based refinement that considers the surrounding motion vectors. The halo reduction, performed with an adaptively shaped sliding window, does not require explicit detection of occlusion regions. Then, for high-definition television, in order to reduce complexity, the pixel-resolution motion estimation and the halo reduction are generalized within a hierarchical decomposition. The proposed final interpolation is generic and depends both on the position of the frame and on the reliability of the estimation. Several post-processing steps to improve image quality are also proposed. The proposed algorithm, integrated into an ASIC in contemporary integrated-circuit technology, operates in real time.
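The pixel-based refinement step can be sketched as follows: for each pixel, the motion vectors of the surrounding blocks are re-evaluated with a small window-matching cost, and the best one is kept. This is an illustrative reconstruction under assumed conventions (frame-to-frame vectors, mean-absolute-difference cost), not the thesis's exact criterion.

```python
import numpy as np

def refine_pixel_mv(prev, nxt, y, x, candidates, win=1):
    """Per-pixel refinement: among candidate vectors (e.g., from surrounding
    blocks), keep the one whose small window around (y, x) matches best
    between the previous and next frames. Vectors are (vy, vx)."""
    h, w = prev.shape
    best, best_cost = candidates[0], float("inf")
    for vy, vx in candidates:
        cost, n = 0.0, 0
        for dy in range(-win, win + 1):
            for dx in range(-win, win + 1):
                py, px = y + dy, x + dx          # sample in previous frame
                qy, qx = py + vy, px + vx        # matched sample in next frame
                if 0 <= py < h and 0 <= px < w and 0 <= qy < h and 0 <= qx < w:
                    cost += abs(float(prev[py, px]) - float(nxt[qy, qx]))
                    n += 1
        if n and cost / n < best_cost:
            best_cost, best = cost / n, (vy, vx)
    return best
```

Because only a handful of neighbouring-block vectors are tested per pixel, the refinement stays far cheaper than a full pixel-level search while still sharpening motion boundaries.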
Channel Resource Allocation for Multi-camera Video Streaming in Vehicular Ad-hoc Networks
This thesis studies the problem of channel resource allocation in a vehicular video communication system. The goal of the work is to investigate the allocation of resources in the case when each user estimates the channel without coordination from other users and then transmits the video data.
The analysis in this work has been carried out in two stages. In the first stage, the simulations conducted in earlier studies were reproduced and then extended. In the second stage, an environment was recreated based on the specifications of the IEEE 802.11p standard, in which the antennas are static and close to each other, so that packet losses are caused mostly by congestion and collisions on random multiple-access channels. To set up a network fully based on this standard, antennas adapted to operate at the specified frequency were employed. At both stages, in order to transmit a video stream over the network and to collect channel-estimation information, instant-messaging software was used; among the available options, Skype was chosen.
This thesis shows results from simulations performed in a real-world environment. These prove that the behavior of the different users in the network has a direct effect on the quality perceived by the users themselves. From this study it was possible to define a new transmission system that gives users greater knowledge of the network. Using this system, it would be possible to achieve a more stable and fair level of visual quality for all users in the system.
REGION-BASED ADAPTIVE DISTRIBUTED VIDEO CODING CODEC
The recently developed Distributed Video Coding (DVC) is typically suitable for applications where conventional video coding is not feasible because of its inherently high-complexity encoding. Examples include video surveillance using wireless or wired video sensor networks and applications using mobile cameras. With DVC, the complexity is shifted from the encoder to the decoder.
The practical application of DVC is referred to as Wyner-Ziv (WZ) video coding, where an estimate of the original frame, called "side information", is generated using motion compensation at the decoder. Compression is achieved by sending only the extra information needed to correct this estimate. An error-correcting code is used under the assumption that the estimate is a noisy version of the original frame, and the rate needed is a certain amount of parity bits. The side information is assumed to become available at the decoder through a virtual channel. Due to the limitations of the compensation method, the predicted frame, or side information, is expected to have varying degrees of success. These limitations stem from location-specific, non-stationary estimation noise. To avoid these, conventional video coders such as MPEG make use of frame partitioning to allocate an optimal coder for each partition and hence achieve better rate-distortion performance. The same approach, however, has not been used in DVC, as it increases encoder complexity.
This work proposes partitioning the considered frame into many coding units (regions), where each unit is encoded differently. This partitioning is, however, done at the decoder while generating the side information, and the region map is sent to the encoder at a very small rate penalty. The partitioning allows allocation of appropriate DVC coding parameters (virtual channel, rate, and quantizer) to each region. The resulting region map is compressed using a quadtree algorithm and communicated to the encoder via the feedback channel. Rate control in DVC is performed by channel-coding techniques (turbo codes, LDPC, etc.). The performance of the channel code depends heavily on the accuracy of the virtual channel model that models the estimation error for each region. In this work, a turbo code has been used, and an adaptive WZ DVC is designed in both the transform domain and the pixel domain. Transform-domain WZ video coding (TDWZ) has distinctly superior performance compared to normal pixel-domain Wyner-Ziv coding (PDWZ), since it exploits spatial redundancy during encoding. The performance evaluations show that the proposed system is superior to existing distributed video coding solutions. Although the proposed system requires extra bits representing the region map to be transmitted, the rate gain is still noticeable, and it outperforms state-of-the-art frame-based DVC by 0.6-1.9 dB.
The feedback channel (FC) has the role of adapting the bit rate to the changing statistics between the side information and the frame to be encoded. In a unidirectional scenario, the encoder must perform the rate control. To correctly estimate the rate, the encoder must calculate typical side information. However, the rate cannot be exactly calculated at the encoder; it can only be estimated. This work therefore also proposes a feedback-free, region-based adaptive DVC solution in the pixel domain, based on a machine-learning approach to estimate the side information. Although the performance evaluations show a rate penalty, it is acceptable considering the simplicity of the proposed algorithm.
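The quadtree compression of the region map can be sketched in a few lines: a square whose cells all carry the same region label becomes a leaf, and anything else splits into four quadrants. The tuple encoding below is an assumption of this sketch; a real coder would serialise the tree into bits before sending it over the feedback channel.

```python
def quadtree_encode(m, y=0, x=0, size=None):
    """Encode a square region map (values = per-region coder indices) as a
    quadtree: ('L', value) for a uniform square, otherwise ('N', tl, tr, bl, br)."""
    if size is None:
        size = len(m)
    vals = {m[yy][xx] for yy in range(y, y + size) for xx in range(x, x + size)}
    if len(vals) == 1:
        return ("L", vals.pop())
    h = size // 2
    return ("N",
            quadtree_encode(m, y, x, h),          # top-left
            quadtree_encode(m, y, x + h, h),      # top-right
            quadtree_encode(m, y + h, x, h),      # bottom-left
            quadtree_encode(m, y + h, x + h, h))  # bottom-right
```

Large uniform areas, common in region maps, collapse into single leaves, which is why the map costs only a small rate penalty to transmit.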
NASA Tech Briefs, May 1990
Topics: New Product Ideas; NASA TU Services; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences
MRI Artefact Augmentation: Robust Deep Learning Systems and Automated Quality Control
Quality control (QC) of magnetic resonance imaging (MRI) is essential to establish whether a scan or dataset meets a required set of standards. In MRI, many potential artefacts must be identified so that problematic images can either be excluded or accounted for in further image processing or analysis. To date, the gold standard for the identification of these issues is visual inspection by experts.
A primary source of MRI artefacts is caused by patient movement, which can affect clinical diagnosis and impact the accuracy of Deep Learning systems. In this thesis, I present a method to simulate motion artefacts from artefact-free images to augment convolutional neural networks (CNNs), increasing training appearance variability and robustness to motion artefacts. I show that models trained with artefact augmentation generalise better and are more robust to real-world artefacts, with negligible cost to performance on clean data. I argue that it is often better to optimise frameworks end-to-end with artefact augmentation rather than learning to retrospectively remove artefacts, thus enforcing robustness to artefacts at the feature level representation of the data.
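Motion artefact simulation of this kind is commonly implemented in k-space: lines of the image's Fourier transform are replaced with lines taken from a spatially shifted copy, mimicking subject movement between the acquisition of those phase-encode lines. The sketch below is a generic illustration (random line selection, integer pixel shifts), not the author's exact augmentation pipeline.

```python
import numpy as np

def add_motion_artefact(img, max_shift=3, frac=0.3, seed=0):
    """Simulate a motion artefact: replace a random fraction of k-space
    phase-encode lines with the corresponding lines from a shifted copy of
    the image, then return the corrupted magnitude image."""
    rng = np.random.default_rng(seed)
    k = np.fft.fft2(img)
    h = img.shape[0]
    lines = rng.choice(h, size=max(1, int(frac * h)), replace=False)
    for ln in lines:
        shift = tuple(rng.integers(-max_shift, max_shift + 1, size=2))
        # k-space of the image "after the movement" supplies this line.
        k[ln, :] = np.fft.fft2(np.roll(img, shift, axis=(0, 1)))[ln, :]
    return np.real(np.fft.ifft2(k))
```

Applied on the fly during training, such a transform pairs each clean image with an unlimited stream of artefacted variants, which is what lets the augmented CNN stay robust to real motion-corrupted scans.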
The labour-intensive and subjective nature of QC has increased interest in automated methods. To address this, I approach MRI quality estimation as the uncertainty in performing a downstream task, using probabilistic CNNs to predict segmentation uncertainty as a function of the input data. Extending this framework, I introduce a novel decoupled uncertainty model, enabling separate uncertainty predictions for different types of image degradation. Training with an extended k-space artefact augmentation pipeline, the model provides informative measures of uncertainty on problematic real-world scans classified by QC raters and enables sources of segmentation uncertainty to be identified.
Suitable quality for algorithmic processing may differ from an image's perceptual quality. Exploring this, I pose MRI visual quality assessment as an image restoration task. Using Bayesian CNNs to recover clean images from noisy data, I show that the uncertainty indicates the possible recoverability of an image. A multi-task network combining uncertainty-aware artefact recovery with tissue segmentation highlights the distinction between visual and algorithmic quality, with the implication that, depending on the downstream task, less data should be discarded for purely visual quality reasons.
Error Signals from the Brain: 7th Mismatch Negativity Conference
The 7th Mismatch Negativity Conference presents the state of the art in methods, theory, and application (basic and clinical research) of the MMN and related error signals of the brain. Moreover, there will be two pre-conference workshops: one on the design of MMN studies and the analysis and interpretation of MMN data, and one on the visual MMN (with 20 presentations). There will be more than 40 presentations on hot topics of MMN grouped into thirteen symposia, and about 130 poster presentations. Keynote lectures by Kimmo Alho, Angela D. Friederici, and Israel Nelken will round off the program by covering topics related to and beyond MMN.