Search CORE

103 research outputs found

Circulant temporal encoding for video retrieval and temporal alignment

Author: Douze Matthijs
Jégou Hervé
Revaud Jérôme
Schmid Cordelia
Verbeek Jakob
Publication venue
Publication date: 30/11/2015
Field of study

We address the problem of specific video event retrieval. Given a query video of a specific event, e.g., a concert of Madonna, the goal is to retrieve other videos of the same event that temporally overlap with the query. Our approach encodes the frame descriptors of a video to jointly represent their appearance and temporal order. It exploits the properties of circulant matrices to efficiently compare the videos in the frequency domain. This offers a significant gain in complexity and accurately localizes the matching parts of videos. The descriptors can be compressed in the frequency domain with a product quantizer adapted to complex numbers. In this case, video retrieval is performed without decompressing the descriptors. We also consider the temporal alignment of a set of videos. We exploit the matching confidence and an estimate of the temporal offset computed for all pairs of videos by our retrieval approach. Our robust algorithm aligns the videos on a global timeline by maximizing the set of temporally consistent matches. The global temporal alignment enables synchronous playback of the videos of a given scene

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

LAMV: Learning to align and match videos with kernelized temporal layers

Author: Baraldi Lorenzo
Cucchiara Rita
Douze Matthijs
Jégou Hervé
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

This paper considers a learnable approach for comparing and aligning videos. Our architecture builds upon and revisits temporal match kernels within neural networks: we propose a new temporal layer that finds temporal alignments by maximizing the scores between two sequences of vectors, according to a time-sensitive similarity metric parametrized in the Fourier domain. We learn this layer with a temporal proposal strategy, in which we minimize a triplet loss that takes into account both the localization accuracy and the recognition rate. We evaluate our approach on video alignment, copy detection and event retrieval. Our approach outperforms the state on the art on temporal video alignment and video copy detection datasets in comparable setups. It also attains the best reported results for particular event search, while precisely aligning videos

Crossref

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Text-based Localization of Moments in a Video Corpus

Author: Mithun Niluthpol Chowdhury
Paul Sudipta
Roy-Chowdhury Amit K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

Prior works on text-based video moment localization focus on temporally grounding the textual query in an untrimmed video. These works assume that the relevant video is already known and attempt to localize the moment on that relevant video only. Different from such works, we relax this assumption and address the task of localizing moments in a corpus of videos for a given sentence query. This task poses a unique challenge as the system is required to perform: (i) retrieval of the relevant video where only a segment of the video corresponds with the queried sentence, and (ii) temporal localization of moment in the relevant video based on sentence query. Towards overcoming this challenge, we propose Hierarchical Moment Alignment Network (HMAN) which learns an effective joint embedding space for moments and sentences. In addition to learning subtle differences between intra-video moments, HMAN focuses on distinguishing inter-video global semantic concepts based on sentence queries. Qualitative and quantitative results on three benchmark text-based video moment retrieval datasets - Charades-STA, DiDeMo, and ActivityNet Captions - demonstrate that our method achieves promising performance on the proposed task of temporal localization of moments in a corpus of videos

arXiv.org e-Print Archive

eScholarship - University of California

VADER: Video Alignment Differencing and Retrieval

Author: Black Alexander
Bui Tu
Collomosse John
Jenni Simon
Petrangeli Stefano
Sinha Ritwik
Swaminathan Viswanathan
Tanjim Md. Mehrab
Publication venue
Publication date: 25/03/2023
Field of study

We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos. VADER matches and coarsely aligns partial video fragments to candidate videos using a robust visual descriptor and scalable search over adaptively chunked video content. A transformer-based alignment module then refines the temporal localization of the query fragment within the matched video. A space-time comparator module identifies regions of manipulation between aligned content, invariant to any changes due to any residual temporal misalignments or artifacts arising from non-editorial changes of the content. Robustly matching video to a trusted source enables conclusions to be drawn on video provenance, enabling informed trust decisions on content encountered

arXiv.org e-Print Archive

Surgical video retrieval using deep neural networks

Author: Kollias Stefanos
Loukas Constantinos
Rapantzikos Kostas
Varytimidis Christos
Publication venue
Publication date: 21/10/2016
Field of study

Although the amount of raw surgical videos, namely videos captured during surgical interventions, is growing fast, automatic retrieval and search remains a challenge. This is mainly due to the nature of the content, i.e. visually non-consistent tissue, diversity of internal organs, abrupt viewpoint changes and illumination variation. We propose a framework for retrieving surgical videos and a protocol for evaluating the results. The method is composed of temporal shot segmentation and representation based on deep features, and the protocol introduces novel criteria to the field. The experimental results prove the superiority of the proposed method and highlight the path towards a more effective protocol for evaluating surgical videos

University of Lincoln Institutional Repository