13,052 research outputs found
Spatio-temporal Video Re-localization by Warp LSTM
The need to efficiently find the video content a user wants is growing with
the explosion of user-generated videos on the Web. Existing keyword-based or
content-based video retrieval methods usually determine what occurs in a
video, but not when and where. In this paper, we answer the question of when
and where by formulating a new task, namely spatio-temporal video
re-localization. Specifically, given a query video and a
reference video, spatio-temporal video re-localization aims to localize
tubelets in the reference video such that the tubelets semantically correspond
to the query. To accurately localize the desired tubelets in the reference
video, we propose a novel warp LSTM network, which propagates the
spatio-temporal information for a long period and thereby captures the
corresponding long-term dependencies. Another issue for spatio-temporal video
re-localization is the lack of properly labeled video datasets. Therefore, we
reorganize the videos in the AVA dataset to form a new dataset for
spatio-temporal video re-localization research. Extensive experimental results
show that the proposed model achieves superior performance over the designed
baselines on the spatio-temporal video re-localization task.
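The abstract does not spell out the warp LSTM's update rule. As a rough illustration of the general idea, a convolutional LSTM whose recurrent states are warped by an externally supplied flow field before each gate update might look like this (PyTorch sketch; `WarpLSTMCell`, `warp`, and the flow input are illustrative assumptions, not the paper's actual architecture):

```python
import torch
import torch.nn.functional as F

def warp(state, flow):
    """Warp a state tensor (B, C, H, W) with a dense flow field (B, 2, H, W)."""
    b, _, h, w = state.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1, 2, H, W)
    coords = grid + flow
    # Normalize to [-1, 1] as grid_sample expects.
    cx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    cy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((cx, cy), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(state, sample_grid, align_corners=True)

class WarpLSTMCell(torch.nn.Module):
    """Convolutional LSTM cell whose recurrent states are flow-warped first
    so that spatio-temporal information follows the motion across frames."""
    def __init__(self, channels):
        super().__init__()
        self.gates = torch.nn.Conv2d(2 * channels, 4 * channels, 3, padding=1)

    def forward(self, x, h, c, flow):
        h, c = warp(h, flow), warp(c, flow)  # align states with the motion
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```

Running the cell over a frame sequence, carrying `(h, c)` forward, is what lets the hidden state accumulate long-term spatio-temporal dependencies along the motion trajectories.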
Distributed video coding for wireless video sensor networks: a review of the state-of-the-art architectures
Distributed video coding (DVC) is a relatively new video coding architecture originated from two fundamental theorems, namely Slepian–Wolf and Wyner–Ziv. Recent research developments have made DVC attractive for applications in the emerging domain of wireless video sensor networks (WVSNs). This paper reviews the state-of-the-art DVC architectures with a focus on understanding their opportunities and gaps in addressing the operational requirements and application needs of WVSNs.
Color-decoupled photo response non-uniformity for digital image forensics
The last few years have seen the use of photo response non-uniformity noise (PRNU), a unique fingerprint of imaging sensors, in various digital forensic applications such as source device identification, content integrity verification and authentication. However, the use of a colour filter array for capturing only one of the three colour components per pixel introduces colour interpolation noise, while the existing methods for extracting PRNU provide no effective means for addressing this issue. Because the artificial colours obtained through the colour interpolation process are not directly acquired from the scene by physical hardware, we expect that the PRNU extracted from the physical components, which are free from interpolation noise, should be more reliable than that from the artificial channels, which carry interpolation noise. Based on this assumption we propose a Colour-Decoupled PRNU (CD-PRNU) extraction method, which first decomposes each colour channel into 4 sub-images and then extracts the PRNU noise from each sub-image. The PRNU noise patterns of the sub-images are then assembled to get the CD-PRNU. This new method can prevent the interpolation noise from propagating into the physical components, thus improving the accuracy of device identification and image content integrity verification.
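For intuition, the sub-image decomposition the abstract describes can be sketched as follows (a rough NumPy sketch; the 3x3 mean filter is a crude stand-in for the wavelet denoiser typically used in the PRNU literature, and all function names are illustrative):

```python
import numpy as np

def box3(img):
    """3x3 mean filter with edge padding (crude denoiser stand-in; PRNU work
    typically uses a wavelet-based denoiser here)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def noise_residual(img):
    """PRNU-style residual: the image minus its denoised estimate."""
    return img - box3(img)

def cd_prnu_channel(channel):
    """Colour-decoupled residual for one channel: split into the four 2x2 CFA
    sub-images, extract the residual from each separately so interpolation
    noise cannot leak across CFA positions, then reassemble."""
    out = np.zeros(channel.shape, dtype=float)
    for dy in (0, 1):
        for dx in (0, 1):
            sub = channel[dy::2, dx::2].astype(float)
            out[dy::2, dx::2] = noise_residual(sub)
    return out
```

The key point is that each of the four sub-images contains only pixels from the same CFA position, so the denoising and residual extraction never mix physically sensed values with interpolated ones.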
A Study on Test-Time Adaptive Methodologies for Video Frame Interpolation
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: Kyoung Mu Lee.
Computationally handling videos has been one of the foremost goals in computer vision. In particular, analyzing the complex dynamics including motion and occlusion between two frames is of fundamental importance in understanding the visual contents of a video. Research on video frame interpolation, a problem where the goal is to synthesize high-quality intermediate frames between the two input frames, specifically investigates the low-level characteristics within the consecutive frames of a video. The topic has recently been gaining popularity and can be applied to various real-world applications such as generating slow-motion effects, novel view synthesis, or video stabilization. Existing methods for video frame interpolation aim to design complex new architectures to effectively estimate and compensate for the motion between two input frames. However, natural videos contain a wide variety of different scenarios, including foreground/background appearance and motion, frame rate, and occlusion. Therefore, even with a huge amount of training data, it is difficult for a single model to generalize well to all possible situations.
This dissertation introduces novel methodologies for test-time adaptation for tackling the problem of video frame interpolation. In particular, I propose to make three different aspects of the deep-learning-based framework adaptive: (1) feature activations, (2) network weights, and (3) architectural structures. Specifically, I first present how adaptively scaling the feature activations of a deep neural network with respect to each input frame using attention models allows for accurate interpolation. Unlike previous approaches that heavily depend on optical flow estimation models, the proposed channel-attention-based model can achieve high-quality frame synthesis without explicit motion estimation. Then, meta-learning is employed for fast adaptation of the parameter values of the frame interpolation models. By learning to adapt to each input video clip, the proposed framework can consistently improve the performance of many existing models with just a single gradient update to their parameters. Lastly, I introduce an input-adaptive dynamic architecture that can assign different inference paths to each local region of the input frames. By deciding the scaling factors of the inputs and the network depth of the early exit in the interpolation model, the dynamic framework can greatly improve computational efficiency while maintaining, and sometimes even surpassing, the performance of the baseline interpolation method.
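The meta-learning component (a single gradient update per input video) can be illustrated with a minimal MAML-style sketch, assuming a triplet of consecutive frames whose middle frame serves as a self-supervised target; the toy model and all names are illustrative, not the dissertation's actual networks:

```python
import copy
import torch

class TinyInterp(torch.nn.Module):
    """Toy stand-in for a frame interpolation network (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(6, 3, 3, padding=1)

    def forward(self, f0, f2):
        # Predict the middle frame from the two surrounding frames.
        return self.net(torch.cat([f0, f2], dim=1))

def adapt_one_step(model, f0, f1, f2, lr=1e-4):
    """One test-time gradient update: the middle frame f1 of an input triplet
    is a free self-supervised target for interpolating between f0 and f2."""
    adapted = copy.deepcopy(model)  # leave the meta-trained weights intact
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    loss = torch.nn.functional.l1_loss(adapted(f0, f2), f1)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return adapted
```

Meta-training would optimize the initial weights so that this single inner-loop step yields the largest improvement on each clip, which is what allows one update to suffice at test time.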
The effectiveness of the proposed test-time adaptation methodologies is extensively evaluated with multiple benchmark datasets for video frame interpolation. Thorough ablation studies with various hyperparameter settings and baseline networks also demonstrate the superiority of adaptation to the test-time inputs, which is a new research direction orthogonal to the other state-of-the-art frame interpolation approaches.
1 Introduction
1.1 Motivations
1.2 Proposed method
1.3 Contributions
1.4 Organization of dissertation
2 Feature Adaptation based Approach
2.1 Introduction
2.2 Related works
2.2.1 Video frame interpolation
2.2.2 Attention mechanism
2.3 Proposed Method
2.3.1 Overview of network architecture
2.3.2 Main components
2.3.3 Loss
2.4 Understanding our model
2.4.1 Internal feature visualization
2.4.2 Intermediate image reconstruction
2.5 Experiments
2.5.1 Datasets
2.5.2 Implementation details
2.5.3 Comparison to the state-of-the-art
2.5.4 Ablation study
2.6 Summary
3 Meta-Learning based Approach
3.1 Introduction
3.2 Related works
3.3 Proposed method
3.3.1 Video frame interpolation problem set-up
3.3.2 Exploiting extra information at test time
3.3.3 Background on MAML
3.3.4 MetaVFI: Meta-learning for frame interpolation
3.4 Experiments
3.4.1 Settings
3.4.2 Meta-learning algorithm selection
3.4.3 Video frame interpolation results
3.4.4 Ablation studies
3.5 Summary
4 Dynamic Architecture based Approach
4.1 Introduction
4.2 Related works
4.2.1 Video frame interpolation
4.2.2 Adaptive inference
4.3 Proposed Method
4.3.1 Dynamic framework overview
4.3.2 Scale and depth finder (SD-finder)
4.3.3 Dynamic interpolation model
4.3.4 Training
4.4 Experiments
4.4.1 Datasets
4.4.2 Implementation details
4.4.3 Quantitative comparison
4.4.4 Visual comparison
4.4.5 Ablation study
4.5 Summary
5 Conclusion
5.1 Summary of dissertation
5.2 Future works
Bibliography
Abstract in Korean
Driving steady-state visual evoked potentials at arbitrary frequencies using temporal interpolation of stimulus presentation
Date of Acceptance: 29/10/2015. We thank Renate Zahn for help with data collection. This work was supported by Deutsche Forschungsgemeinschaft (AN 841/1-1, MU 972/20-1). We would like to thank A. Trujillo-Ortiz, R. Hernandez-Walls, A. Castro-Perez and K. Barba-Rojo (Universidad Autonoma de Baja California) for making Matlab code for non-sphericity corrections freely available. Peer reviewed.
H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions
Capitalizing on the rapid development of neural networks, recent video frame
interpolation (VFI) methods have achieved notable improvements. However, they
still fall short for real-world videos containing large motions. Complex
deformation and/or occlusion caused by large motions make it an extremely
difficult problem in video frame interpolation. In this paper, we propose a
simple yet effective solution, H-VFI, to deal with large motions in video frame
interpolation. H-VFI contributes a hierarchical video interpolation transformer
(HVIT) to learn a deformable kernel with a coarse-to-fine strategy at multiple
scales. The learnt deformable kernel is then used to convolve the input
frames and predict the interpolated frame. Starting from the smallest scale,
H-VFI updates the deformable kernel by a residual in succession based on former
predicted kernels, intermediate interpolated results, and hierarchical features
from the transformer. Biases and masks to refine the final outputs are then predicted
by a transformer block based on interpolated results. The advantage of such a
progressive approximation is that the large-motion frame interpolation problem
can be decomposed into several relatively simpler sub-tasks, which enables
highly accurate final predictions. Another noteworthy contribution
of our paper consists of a large-scale high-quality dataset, YouTube200K, which
contains videos depicting a great variety of scenarios captured at high
resolution and high frame rate. Extensive experiments on multiple frame
interpolation benchmarks validate that H-VFI outperforms existing
state-of-the-art methods, especially for videos with large motions.
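The coarse-to-fine residual refinement the abstract describes follows a common pattern that can be sketched as below (a minimal sketch; `predict_residual` stands in for H-VFI's transformer stages, and the kernel tensor layout is an assumption):

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_refine(predict_residual, init_kernel, num_scales=3):
    """Refine a kernel estimate from coarse to fine: at each scale, upsample
    the current estimate and add a predicted residual correction.
    `predict_residual` is a placeholder for the per-scale prediction stage,
    which in H-VFI would also condition on intermediate interpolation
    results and hierarchical transformer features."""
    kernel = init_kernel  # (B, C, h, w) at the coarsest scale
    for _ in range(num_scales):
        kernel = F.interpolate(kernel, scale_factor=2, mode="bilinear",
                               align_corners=False)
        kernel = kernel + predict_residual(kernel)
    return kernel
```

Each stage only has to correct the previous estimate rather than solve the full large-motion problem, which is what makes the decomposition into simpler sub-tasks effective.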