
    Test-Time Adaptive Methodologies for Video Frame Interpolation

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: Kyoung Mu Lee. Computationally handling videos has been one of the foremost goals in computer vision. In particular, analyzing the complex dynamics including motion and occlusion between two frames is of fundamental importance in understanding the visual contents of a video. Research on video frame interpolation, a problem where the goal is to synthesize high-quality intermediate frames between the two input frames, specifically investigates the low-level characteristics within the consecutive frames of a video. The topic has recently been gaining popularity and can be applied to various real-world applications such as generating slow-motion effects, novel view synthesis, or video stabilization. Existing methods for video frame interpolation aim to design complex new architectures to effectively estimate and compensate for the motion between two input frames. However, natural videos contain a wide variety of different scenarios, including foreground/background appearance and motion, frame rate, and occlusion. Therefore, even with a huge amount of training data, it is difficult for a single model to generalize well to all possible situations. This dissertation introduces novel methodologies for test-time adaptation for tackling the problem of video frame interpolation. In particular, I propose to enable three different aspects of the deep-learning-based framework to be adaptive: (1) feature activations, (2) network weights, and (3) architectural structures. Specifically, I first present how adaptively scaling the feature activations of a deep neural network with respect to each input frame using attention models allows for accurate interpolation. Unlike the previous approaches that heavily depend on optical flow estimation models, the proposed channel-attention-based model can achieve high-quality frame synthesis without explicit motion estimation.
    Then, meta-learning is employed for fast adaptation of the parameter values of the frame interpolation models. By learning to adapt to each input video clip, the proposed framework can consistently improve the performance of many existing models with just a single gradient update to their parameters. Lastly, I introduce an input-adaptive dynamic architecture that can assign different inference paths with respect to each local region of the input frames. By deciding the scaling factors of the inputs and the network depth of the early exit in the interpolation model, the dynamic framework can greatly improve the computational efficiency while maintaining, and sometimes even exceeding, the performance of the baseline interpolation method. The effectiveness of the proposed test-time adaptation methodologies is extensively evaluated on multiple benchmark datasets for video frame interpolation. Thorough ablation studies with various hyperparameter settings and baseline networks also demonstrate the superiority of adaptation to the test-time inputs, which is a new research direction orthogonal to the other state-of-the-art frame interpolation approaches.
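    The single-gradient-update adaptation described above can be sketched with a toy example. Everything below is a hypothetical stand-in, not the dissertation's models: `interpolate` is a one-parameter blend playing the role of an interpolation network, and `adapt_one_step` performs one gradient step on a self-supervised loss computed from frames that are available at test time.

    ```python
    import numpy as np

    def interpolate(theta, f0, f1):
        # Toy stand-in for a frame interpolation network: a parameterised
        # blend of the two input frames (theta plays the role of the weights).
        return theta * f0 + (1.0 - theta) * f1

    def adapt_one_step(theta, f0, f1, mid, lr=0.05):
        # One gradient update on a loss computed at test time, mirroring
        # the "single gradient update" adaptation of the framework.
        pred = interpolate(theta, f0, f1)
        grad = np.sum(2.0 * (pred - mid) * (f0 - f1))  # d(MSE)/d(theta)
        return theta - lr * grad

    rng = np.random.default_rng(1)
    f0, f1 = rng.random(16), rng.random(16)
    mid = 0.5 * (f0 + f1)   # ground-truth middle frame of this toy clip
    theta = 0.9             # deliberately mis-calibrated parameter
    theta_new = adapt_one_step(theta, f0, f1, mid)
    # the adapted parameter moves toward the optimal blend weight 0.5
    assert abs(theta_new - 0.5) < abs(theta - 0.5)
    ```

    The point of the sketch is only the mechanism: the loss uses information specific to the current clip, so one update already specialises the generic parameters to that clip.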

    CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation

    The advancement of computer vision has pushed visual analysis tasks from still images to the video domain. In recent years, video instance segmentation, which aims to track and segment multiple objects in video frames, has drawn much attention for its potential applications in various emerging areas such as autonomous driving, intelligent transportation, and smart retail. In this paper, we propose an effective framework for instance-level visual analysis on video frames, which can simultaneously conduct object detection, instance segmentation, and multi-object tracking. The core idea of our method is collaborative multi-task learning, achieved by a novel structure, named associative connections, among the detection, segmentation, and tracking task heads in an end-to-end learnable CNN. These additional connections allow information to propagate across multiple related tasks, so as to benefit these tasks simultaneously. We evaluate the proposed method extensively on the KITTI MOTS and MOTS Challenge datasets and obtain quite encouraging results.
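    A minimal sketch of the associative-connection idea, assuming toy linear task heads (all names, shapes, and weights here are hypothetical placeholders, not the paper's CNN): the detection head's output is fed into the segmentation head through an extra learnable connection, so related tasks share evidence instead of running in isolation.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def head(w, x):
        # toy task head: a linear layer followed by ReLU
        return np.maximum(w @ x, 0.0)

    # shared backbone feature for one object proposal
    feat = rng.standard_normal(32)
    w_det, w_seg, w_trk = (rng.standard_normal((16, 32)) for _ in range(3))
    w_det2seg = rng.standard_normal((16, 16))  # associative connection (hypothetical)

    det = head(w_det, feat)
    trk = head(w_trk, feat)
    # the segmentation head sees the backbone feature PLUS the detection
    # head's output, propagated through the associative connection
    seg = head(w_seg, feat) + w_det2seg @ det
    assert det.shape == seg.shape == trk.shape == (16,)
    ```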

    Optimizing Magnetic Resonance Imaging for Image-Guided Radiotherapy

    Magnetic resonance imaging (MRI) is playing an increasingly important role in image-guided radiotherapy. MRI provides excellent soft tissue contrast, and is flexible in characterizing various tissue properties including relaxation, diffusion and perfusion. This thesis aims at developing new image analysis and reconstruction algorithms to optimize MRI in support of treatment planning, target delineation and treatment response assessment for radiotherapy. First, unlike Computed Tomography (CT) images, MRI cannot provide the electron density information necessary for radiation dose calculation. To address this, we developed a synthetic CT generation algorithm that generates pseudo CT images from MRI, based on tissue classification results on MRI for female pelvic patients. To improve tissue classification accuracy, we learnt a pelvic bone shape model from a training dataset, and integrated the shape model into an intensity-based fuzzy c-means classification scheme. The shape-regularized tissue classification algorithm is capable of differentiating tissues that have significant overlap in MRI intensity distributions. Treatment planning dose calculations using synthetic CT image volumes generated from the tissue classification results show acceptably small variations as compared to CT volumes. As MRI artifacts such as B1 field inhomogeneity (bias field) may negatively impact the tissue classification accuracy, we also developed an algorithm that integrates the correction of the bias field into the tissue classification scheme. We modified the fuzzy c-means classification by modeling the image intensity as the true intensity corrupted by the multiplicative bias field. A regularization term further ensures the smoothness of the bias field. We solved the optimization problem using a linearized alternating direction method of multipliers (ADMM) method, which is more computationally efficient than existing methods.
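    A 1-D sketch of the fuzzy c-means core of the classification scheme (intensity clustering only; the shape regularisation and bias-field terms of the actual algorithm are omitted, and the data are synthetic):

    ```python
    import numpy as np

    def fcm_memberships(x, centers, m=2.0):
        """Fuzzy c-means membership update: each sample gets a soft degree
        of belonging to every cluster, from inverse relative distances."""
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12  # (N, C) distances
        inv = d ** (-2.0 / (m - 1.0))
        return inv / inv.sum(axis=1, keepdims=True)        # rows sum to 1

    def fcm_centers(x, u, m=2.0):
        """Cluster-center update: membership-weighted mean of the samples."""
        w = u ** m
        return (w * x[:, None]).sum(axis=0) / w.sum(axis=0)

    # two well-separated synthetic intensity clusters
    x = np.concatenate([np.full(50, 1.0), np.full(50, 5.0)])
    centers = np.array([0.0, 6.0])
    for _ in range(20):
        u = fcm_memberships(x, centers)
        centers = fcm_centers(x, u)
    assert np.allclose(np.sort(centers), [1.0, 5.0], atol=1e-3)
    ```

    The soft memberships are what allow extra terms (a shape prior, a smooth multiplicative bias field) to be folded into the same alternating-update objective, which is the route the thesis takes.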
    The second part of this thesis looks at a special MR imaging technique, diffusion-weighted MRI (DWI). By acquiring a series of DWI images with a wide range of b-values, high-order diffusion analysis can be performed using the DWI image series, and new biomarkers for tumor grading, delineation and treatment response evaluation may be extracted. However, DWI suffers from low signal-to-noise ratio at high b-values, and the multi-b-value acquisition makes the total scan time impractical for clinical use. In this thesis, we proposed an accelerated DWI scheme that sparsely samples k-space and reconstructs images using a model-based algorithm. Specifically, we built a 3D block-Hankel tensor from k-space samples, and modeled both local and global correlations of the high-dimensional k-space data as a low-rank property of the tensor. We also added a phase constraint to account for large phase variations across different b-values, and to allow reconstruction from partial Fourier acquisition, which further accelerates the image acquisition. We proposed an ADMM algorithm to solve the constrained image reconstruction problem. Image reconstructions using both simulated and patient data show improved signal-to-noise ratio. As compared to the clinically used parallel imaging scheme, which achieves a 4-fold acceleration, our method achieves an 8-fold acceleration. Reconstructed images show reduced reconstruction errors on simulated data and similar diffusion parameter mapping results on patient data.
    Ph.D., Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/143919/1/llliu_1.pd
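    The thesis builds a 3-D block-Hankel tensor from k-space; the 1-D matrix analogue below (with an illustrative `hankel_matrix` helper) only shows the key property being exploited: structured signals give Hankel matrices of very low rank, so missing samples can be recovered by low-rank completion.

    ```python
    import numpy as np

    def hankel_matrix(s, k):
        """Stack sliding windows of signal s into a Hankel matrix. Rows are
        shifted copies of each other, so a signal built from a few
        exponentials yields a correspondingly low-rank matrix."""
        n = len(s) - k + 1
        return np.stack([s[i:i + k] for i in range(n)])

    # a single complex exponential yields a rank-1 Hankel matrix
    t = np.arange(32)
    s = np.exp(1j * 0.3 * t)
    H = hankel_matrix(s, 8)
    assert H.shape == (25, 8)
    assert np.linalg.matrix_rank(H, tol=1e-8) == 1
    ```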

    Optical flow estimation using steered-L1 norm

    Motion is a very important part of understanding the visual picture of the surrounding environment. In image processing it involves the estimation of displacements for image points in an image sequence. In this context, dense optical flow estimation is concerned with the computation of pixel displacements in a sequence of images, and has therefore been used widely in the fields of image processing and computer vision. A great deal of research has been dedicated to enabling accurate and fast motion computation in image sequences. Despite the recent advances in the computation of optical flow, there is still room for improvement, and optical flow algorithms still suffer from several issues, such as motion discontinuities, occlusion handling, and robustness to illumination changes. This thesis includes an investigation of the topic of optical flow and its applications. It addresses several issues in the computation of dense optical flow and proposes solutions. Specifically, this thesis is divided into two main parts dedicated to two main areas of interest in optical flow. In the first part, image registration using optical flow is investigated. Both local and global optical flow methods have been used for image registration. An image registration approach based on an improved version of the combined local-global method of optical flow computation is proposed. A bilateral filter was used in this optical flow method to improve the edge-preserving performance. It is shown that image registration via this method gives more robust results compared to the local and global optical flow methods previously investigated. The second part of this thesis encompasses the main contribution of this research, which is an improved total variation L1 norm. A smoothness term is used in the optical flow energy function to regularise this function.
    The L1 norm is a plausible choice for such a term because of its performance in preserving edges; however, this term is known to be isotropic and hence decreases the penalisation near motion boundaries in all directions. The proposed improved L1 smoothness term (termed here the steered-L1 norm) demonstrates similar performance across motion boundaries but improves the penalisation performance along such boundaries.
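    For concreteness, here is a sketch of the standard isotropic (Charbonnier-smoothed) L1 smoothness penalty that the steered-L1 norm refines; the steering itself is not reproduced, and the function name is illustrative:

    ```python
    import numpy as np

    def l1_smoothness(u, eps=1e-6):
        """Isotropic, Charbonnier-smoothed L1 penalty on the gradients of
        one flow component u. The same cost is paid regardless of the
        gradient's orientation relative to a motion boundary, which is
        exactly the behaviour a steered variant would change."""
        dx = np.diff(u, axis=1)[:-1, :]  # horizontal finite differences
        dy = np.diff(u, axis=0)[:, :-1]  # vertical finite differences
        return np.sum(np.sqrt(dx**2 + dy**2 + eps))

    flat = np.zeros((8, 8))              # constant flow: almost no penalty
    step = np.zeros((8, 8))
    step[:, 4:] = 1.0                    # a sharp motion discontinuity
    assert l1_smoothness(flat) < l1_smoothness(step)
    ```

    The discontinuous field is penalised much more heavily, and by the same amount whatever the boundary's direction, which is the isotropy the abstract refers to.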

    Realtime Dynamic 3D Facial Reconstruction for Monocular Video In-the-Wild

    With the increasing amount of videos recorded using 2D mobile cameras, techniques for recovering dynamic 3D facial models from these monocular videos have become a necessity for many image and video editing applications. While methods based on parametric 3D facial models can reconstruct the 3D shape in dynamic environments, large structural changes are ignored. Structure-from-motion methods can reconstruct these changes but assume the object to be static. To address this problem, we present a novel method for realtime dynamic 3D facial tracking and reconstruction from videos captured in uncontrolled environments. Our method can track the deforming facial geometry and reconstruct external objects that protrude from the face, such as glasses and hair. It also allows users to move around and perform facial expressions freely without degrading the reconstruction quality.

    Beyond the pixels: learning and utilising video compression features for localisation of digital tampering.

    Video compression is pervasive in digital society. With the rising usage of deep convolutional neural networks (CNNs) in the fields of computer vision, video analysis and video tampering detection, it is important to investigate how patterns invisible to human eyes may be influencing modern computer vision techniques and how they can be used advantageously. This work thoroughly explores how video compression influences the accuracy of CNNs and shows how optimal performance is achieved when compression levels in the training set closely match those of the test set. A novel method is then developed, using CNNs, to derive compression features directly from the pixels of video frames. It is then shown that these features can be readily used to detect inauthentic video content with good accuracy across multiple different video tampering techniques. Moreover, the ability to explain these features allows predictions to be made about their effectiveness against future tampering methods. The problem is motivated with a novel investigation into recent video manipulation methods, which shows that there is a consistent drive to produce convincing, photorealistic, manipulated or synthetic video. Humans, blind to the presence of video tampering, are also blind to the type of tampering. New detection techniques are required and, in order to compensate for human limitations, they should be broadly applicable to multiple tampering types. This thesis details the steps necessary to develop and evaluate such techniques.