Fine-grained Incident Video Retrieval with Video Similarity Learning.
PhD Theses
In this thesis, we address the problem of Fine-grained Incident Video Retrieval (FIVR)
using video similarity learning methods. FIVR is a video retrieval task that aims to
retrieve all videos that depict the same incident given a query video. Related video
retrieval tasks adopt either very narrow or very broad scopes, considering only near-duplicate
or same-event videos. To formulate the case of same-incident videos, we
define three video associations taking into account the spatio-temporal spans captured
by video pairs. To cover the benchmarking needs of FIVR, we construct a large-scale
dataset, called FIVR-200K, consisting of 225,960 YouTube videos from major news
events crawled from Wikipedia. The dataset contains four annotation labels according
to the FIVR definitions; hence, it can simulate several retrieval scenarios with the same
video corpus. To address FIVR, we propose two video-level approaches leveraging
features extracted from intermediate layers of Convolutional Neural Networks (CNN).
The first is an unsupervised method that relies on a modified Bag-of-Words scheme,
which generates video representations from the aggregation of the frame descriptors
based on learned visual codebooks. The second is a supervised method based on Deep
Metric Learning, which learns an embedding function that maps videos into a feature
space where relevant video pairs are closer than irrelevant ones. However, video-level
approaches generate global video representations, losing all spatial and temporal
relations between compared videos. Therefore, we propose a video similarity learning
approach that captures fine-grained relations between videos for accurate similarity
calculation. We train a CNN architecture to compute video-to-video similarity from
refined frame-to-frame similarity matrices derived from a pairwise region-level similarity
function. The proposed approaches have been extensively evaluated on FIVR-200K
and other large-scale datasets, demonstrating their superiority over other video
retrieval methods and highlighting the challenging aspect of the FIVR problem.
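
As a rough illustration of the frame-to-frame similarity matrices mentioned in the abstract, the Python sketch below builds such a matrix for two videos and reduces it to a single video-to-video score. It is not the thesis's method: the frame descriptors are hypothetical toy vectors, cosine similarity stands in for the region-level similarity function, and a simple Chamfer-style aggregation (best match per query frame, then the mean) replaces the trained CNN. All function names are assumptions made for this example.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two frame descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def frame_similarity_matrix(query: List[Vector],
                            candidate: List[Vector]) -> List[List[float]]:
    """Entry [i][j] is the similarity of query frame i and candidate frame j."""
    return [[cosine(q, c) for c in candidate] for q in query]


def chamfer_similarity(matrix: List[List[float]]) -> float:
    """Assumed stand-in for the learned network: for every query frame take
    its best-matching candidate frame, then average over query frames."""
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)


# Hypothetical toy descriptors, one vector per frame.
query = [[1.0, 0.0], [0.0, 1.0]]
near_duplicate = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
unrelated = [[-1.0, 0.0], [0.0, -1.0]]

score_related = chamfer_similarity(frame_similarity_matrix(query, near_duplicate))
score_unrelated = chamfer_similarity(frame_similarity_matrix(query, unrelated))
```

In this toy setup the candidate that contains a match for every query frame scores higher than the unrelated one. Unlike a single video-level embedding, the matrix keeps which frames match which, and this is the information a learned model can exploit.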