13,051 research outputs found

    Spatio-temporal Video Re-localization by Warp LSTM

    The need to efficiently find the video content a user wants is growing with the explosion of user-generated videos on the Web. Existing keyword-based or content-based video retrieval methods usually determine what occurs in a video, but not when and where. In this paper, we answer the question of when and where by formulating a new task, namely spatio-temporal video re-localization. Specifically, given a query video and a reference video, spatio-temporal video re-localization aims to localize tubelets in the reference video such that the tubelets semantically correspond to the query. To accurately localize the desired tubelets in the reference video, we propose a novel warp LSTM network, which propagates spatio-temporal information over a long period and thereby captures the corresponding long-term dependencies. Another issue for spatio-temporal video re-localization is the lack of properly labeled video datasets. Therefore, we reorganize the videos in the AVA dataset to form a new dataset for spatio-temporal video re-localization research. Extensive experimental results show that the proposed model achieves superior performance over the designed baselines on the spatio-temporal video re-localization task.
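
    As a rough illustration of the warp LSTM idea, the sketch below backward-warps the previous hidden and cell states along an optical-flow field before an ordinary ConvLSTM update, so that recurrent state follows scene motion across frames. This is a hypothetical PyTorch reconstruction, not the authors' code; the flow field and the ConvLSTM `cell` are assumed to be supplied externally.

        # Hypothetical sketch of a warp-LSTM step (not the authors' code): the
        # previous hidden and cell states are backward-warped along an
        # optical-flow field so that recurrent state follows scene motion, and
        # an ordinary ConvLSTM update is then applied to the warped states.
        import torch
        import torch.nn.functional as F

        def warp(state, flow):
            """Backward-warp a (N, C, H, W) state by a (N, 2, H, W) flow field."""
            _, _, h, w = state.shape
            ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w),
                                    indexing="ij")
            base = torch.stack((xs, ys)).float().to(state.device)  # (2, H, W)
            coords = base.unsqueeze(0) + flow
            grid = torch.stack((2 * coords[:, 0] / (w - 1) - 1,    # x in [-1, 1]
                                2 * coords[:, 1] / (h - 1) - 1), dim=-1)
            return F.grid_sample(state, grid, align_corners=True)

        def warp_lstm_step(cell, x_t, h_prev, c_prev, flow_t):
            """One step: warp both states along the flow, then update as usual."""
            return cell(x_t, (warp(h_prev, flow_t), warp(c_prev, flow_t)))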

    Distributed video coding for wireless video sensor networks: a review of the state-of-the-art architectures

    Distributed video coding (DVC) is a relatively new video coding architecture that originated from two fundamental theorems, namely Slepian–Wolf and Wyner–Ziv. Recent research developments have made DVC attractive for applications in the emerging domain of wireless video sensor networks (WVSNs). This paper reviews the state-of-the-art DVC architectures, focusing on their opportunities and gaps in addressing the operational requirements and application needs of WVSNs.

    Color-decoupled photo response non-uniformity for digital image forensics

    The last few years have seen the use of photo response non-uniformity noise (PRNU), a unique fingerprint of imaging sensors, in various digital forensic applications such as source device identification, content integrity verification and authentication. However, the use of a colour filter array for capturing only one of the three colour components per pixel introduces colour interpolation noise, and the existing methods for extracting PRNU provide no effective means of addressing this issue. Because the artificial colours obtained through the colour interpolation process are not directly acquired from the scene by physical hardware, we expect the PRNU extracted from the physical components, which are free from interpolation noise, to be more reliable than that from the artificial channels, which carry interpolation noise. Based on this assumption, we propose a Colour-Decoupled PRNU (CD-PRNU) extraction method, which first decomposes each colour channel into 4 sub-images and then extracts the PRNU noise from each sub-image. The PRNU noise patterns of the sub-images are then assembled to obtain the CD-PRNU. This new method prevents the interpolation noise from propagating into the physical components, thus improving the accuracy of device identification and image content integrity verification.
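
    The decomposition at the heart of CD-PRNU lends itself to a compact sketch. The following is an illustrative reconstruction, not the authors' implementation: each colour channel is split into four sub-images by 2x2 CFA position, a noise residual is extracted from each sub-image separately, and the residuals are reassembled so interpolation noise never mixes across positions. The `denoise` filter is a stand-in; real PRNU extraction typically uses a wavelet-based denoiser.

        # Minimal sketch of the sub-image decomposition behind CD-PRNU (an
        # illustrative reconstruction, not the authors' implementation).
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def denoise(img):
            return gaussian_filter(img, sigma=1.0)  # placeholder denoiser

        def cd_prnu_residual(channel):
            """Colour-decoupled noise residual for one colour channel (2D array)."""
            residual = np.zeros(channel.shape, dtype=np.float64)
            for dy in (0, 1):
                for dx in (0, 1):
                    # Each 2x2 CFA position is denoised in isolation, so the
                    # interpolation noise of other positions never mixes in.
                    sub = channel[dy::2, dx::2].astype(np.float64)
                    residual[dy::2, dx::2] = sub - denoise(sub)
            return residual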

    Driving steady-state visual evoked potentials at arbitrary frequencies using temporal interpolation of stimulus presentation

    Date of Acceptance: 29/10/2015. We thank Renate Zahn for help with data collection. This work was supported by Deutsche Forschungsgemeinschaft (AN 841/1-1, MU 972/20-1). We would like to thank A. Trujillo-Ortiz, R. Hernandez-Walls, A. Castro-Perez and K. Barba-Rojo (Universidad Autonoma de Baja California) for making Matlab code for non-sphericity corrections freely available. Peer reviewed. Publisher PDF.

    Test-Time Adaptive Methodologies for Video Frame Interpolation (λΉ„λ””μ˜€ ν”„λ ˆμž„ 보간을 μœ„ν•œ ν…ŒμŠ€νŠΈ λ‹¨κ³„μ˜ 적응적 방법둠 연ꡬ)

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: Kyoung Mu Lee.

    Computationally handling videos has been one of the foremost goals in computer vision. In particular, analyzing the complex dynamics, including motion and occlusion, between two frames is of fundamental importance in understanding the visual content of a video. Research on video frame interpolation, a problem whose goal is to synthesize high-quality intermediate frames between two input frames, specifically investigates the low-level characteristics of consecutive video frames. The topic has recently been gaining popularity and can be applied to various real-world applications such as generating slow-motion effects, novel view synthesis, or video stabilization. Existing methods for video frame interpolation aim to design complex new architectures that effectively estimate and compensate for the motion between two input frames. However, natural videos contain a wide variety of scenarios, including different foreground/background appearance and motion, frame rates, and occlusion. Therefore, even with a huge amount of training data, it is difficult for a single model to generalize well to all possible situations.

    This dissertation introduces novel methodologies for test-time adaptation to tackle the problem of video frame interpolation. In particular, I propose to make three different aspects of the deep-learning-based framework adaptive: (1) feature activations, (2) network weights, and (3) architectural structure. Specifically, I first present how adaptively scaling the feature activations of a deep neural network with respect to each input frame using attention models allows for accurate interpolation. Unlike previous approaches that heavily depend on optical flow estimation models, the proposed channel-attention-based model can achieve high-quality frame synthesis without explicit motion estimation. Then, meta-learning is employed for fast adaptation of the parameter values of frame interpolation models. By learning to adapt to each input video clip, the proposed framework can consistently improve the performance of many existing models with just a single gradient update to their parameters. Lastly, I introduce an input-adaptive dynamic architecture that can assign different inference paths to each local region of the input frames. By deciding the scaling factors of the inputs and the network depth of the early exit in the interpolation model, the dynamic framework can greatly improve computational efficiency while maintaining, and sometimes even exceeding, the performance of the baseline interpolation method.

    The effectiveness of the proposed test-time adaptation methodologies is extensively evaluated on multiple benchmark datasets for video frame interpolation. Thorough ablation studies with various hyperparameter settings and baseline networks also demonstrate the superiority of adaptation to the test-time inputs, a new research direction orthogonal to other state-of-the-art frame interpolation approaches.

    Table of contents:
    1 Introduction
        1.1 Motivations
        1.2 Proposed method
        1.3 Contributions
        1.4 Organization of dissertation
    2 Feature Adaptation based Approach
        2.1 Introduction
        2.2 Related works
            2.2.1 Video frame interpolation
            2.2.2 Attention mechanism
        2.3 Proposed Method
            2.3.1 Overview of network architecture
            2.3.2 Main components
            2.3.3 Loss
        2.4 Understanding our model
            2.4.1 Internal feature visualization
            2.4.2 Intermediate image reconstruction
        2.5 Experiments
            2.5.1 Datasets
            2.5.2 Implementation details
            2.5.3 Comparison to the state-of-the-art
            2.5.4 Ablation study
        2.6 Summary
    3 Meta-Learning based Approach
        3.1 Introduction
        3.2 Related works
        3.3 Proposed method
            3.3.1 Video frame interpolation problem set-up
            3.3.2 Exploiting extra information at test time
            3.3.3 Background on MAML
            3.3.4 MetaVFI: Meta-learning for frame interpolation
        3.4 Experiments
            3.4.1 Settings
            3.4.2 Meta-learning algorithm selection
            3.4.3 Video frame interpolation results
            3.4.4 Ablation studies
        3.5 Summary
    4 Dynamic Architecture based Approach
        4.1 Introduction
        4.2 Related works
            4.2.1 Video frame interpolation
            4.2.2 Adaptive inference
        4.3 Proposed Method
            4.3.1 Dynamic framework overview
            4.3.2 Scale and depth finder (SD-finder)
            4.3.3 Dynamic interpolation model
            4.3.4 Training
        4.4 Experiments
            4.4.1 Datasets
            4.4.2 Implementation details
            4.4.3 Quantitative comparison
            4.4.4 Visual comparison
            4.4.5 Ablation study
        4.5 Summary
    5 Conclusion
        5.1 Summary of dissertation
        5.2 Future works
    Bibliography
    Abstract (in Korean)

    H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions

    Capitalizing on the rapid development of neural networks, recent video frame interpolation (VFI) methods have achieved notable improvements. However, they still fall short for real-world videos containing large motions. Complex deformation and/or occlusion caused by large motions make video frame interpolation an extremely difficult problem. In this paper, we propose a simple yet effective solution, H-VFI, to deal with large motions in video frame interpolation. H-VFI contributes a hierarchical video interpolation transformer (HVIT) that learns a deformable kernel in a coarse-to-fine strategy across multiple scales. The learnt deformable kernel is then used to convolve the input frames and predict the interpolated frame. Starting from the smallest scale, H-VFI updates the deformable kernel by a residual in succession, based on the previously predicted kernels, intermediate interpolated results and hierarchical features from the transformer. Biases and masks to refine the final outputs are then predicted by a transformer block based on the interpolated results. The advantage of such a progressive approximation is that the large-motion frame interpolation problem can be decomposed into several relatively simpler sub-tasks, enabling highly accurate final predictions. Another noteworthy contribution of our paper is a large-scale high-quality dataset, YouTube200K, which contains videos depicting a great variety of scenarios captured at high resolution and high frame rate. Extensive experiments on multiple frame interpolation benchmarks validate that H-VFI outperforms existing state-of-the-art methods, especially for videos with large motions.
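
    The progressive approximation can be pictured as a loop over scales in which the kernel estimate is corrected by a predicted residual and upsampled before the next, finer scale. The sketch below is a speculative reading of the abstract, not code from the paper; `predict_residual` stands in for the transformer blocks and `apply_kernel` for the deformable convolution of the input frames, and both are hypothetical callables.

        # Speculative outline of the coarse-to-fine residual-kernel loop (an
        # assumed reading of the abstract, not the H-VFI release).
        import torch.nn.functional as F

        def hierarchical_interpolate(frame_pyramid, init_kernel,
                                     predict_residual, apply_kernel):
            """frame_pyramid: list of (f0, f1) pairs, coarsest scale first."""
            kernel = init_kernel                            # coarsest estimate
            for i, (f0, f1) in enumerate(frame_pyramid):
                interp = apply_kernel(f0, f1, kernel)       # current prediction
                kernel = kernel + predict_residual(f0, f1, kernel, interp)
                if i < len(frame_pyramid) - 1:              # move to finer scale
                    kernel = F.interpolate(kernel, scale_factor=2,
                                           mode="bilinear", align_corners=False)
            return apply_kernel(f0, f1, kernel)             # finest-scale frame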