32,020 research outputs found
On Semantic Similarity in Video Retrieval
Current video retrieval efforts all found their evaluation on an
instance-based assumption, that only a single caption is relevant to a query
video and vice versa. We demonstrate that this assumption results in
performance comparisons often not indicative of models' retrieval capabilities.
We propose a move to semantic similarity video retrieval, where (i) multiple
videos/captions can be deemed equally relevant, and their relative ranking does
not affect a method's reported performance and (ii) retrieved videos/captions
are ranked by their similarity to a query. We propose several proxies to
estimate semantic similarities in large-scale retrieval datasets, without
additional annotations. Our analysis is performed on three commonly used video
retrieval datasets (MSR-VTT, YouCook2 and EPIC-KITCHENS).
Comment: Accepted in CVPR 2021. Project Page: https://mwray.github.io/SSVR
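To illustrate property (i) above, here is a minimal sketch of a graded-relevance metric, normalised discounted cumulative gain (nDCG). The abstract does not say which metric the authors use, so nDCG is an assumption here, and the caption ids and similarity values are hypothetical. The point is only that swapping two equally relevant captions leaves the score unchanged, while ranking them last lowers it.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(ranked_ids, relevance):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    gains = [relevance[item] for item in ranked_ids]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Hypothetical semantic similarity of four captions to one query video.
relevance = {"cap_a": 1.0, "cap_b": 1.0, "cap_c": 0.4, "cap_d": 0.0}

score_1 = ndcg(["cap_a", "cap_b", "cap_c", "cap_d"], relevance)
score_2 = ndcg(["cap_b", "cap_a", "cap_c", "cap_d"], relevance)  # equal items swapped
score_3 = ndcg(["cap_d", "cap_c", "cap_b", "cap_a"], relevance)  # relevant items last

print(score_1, score_2, score_3)
```

Under an instance-based metric such as recall at rank 1, the first two rankings could score differently if only cap_a were annotated as the ground truth.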
Event Retrieval Using Motion Barcodes
We introduce a simple and effective method for retrieval of videos showing a
specific event, even when the videos of that event were captured from
significantly different viewpoints. Appearance-based methods fail in such
cases, as appearances change with large changes of viewpoints.
Our method is based on a pixel-based feature, "motion barcode", which records
the existence/non-existence of motion as a function of time. While appearance,
motion magnitude, and motion direction can vary greatly between disparate
viewpoints, the existence of motion is viewpoint invariant. Based on the motion
barcode, a similarity measure is developed for videos of the same event taken
from very different viewpoints. This measure is robust to occlusions common
under different viewpoints, and can be computed efficiently.
Event retrieval is demonstrated using challenging videos from stationary and
hand-held cameras.
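The idea of a motion barcode can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: motion is detected by thresholding the intensity change between consecutive frames, and two barcodes are compared with Pearson correlation. The abstract specifies neither choice, and the pixel traces and threshold are hypothetical.

```python
import math

def motion_barcode(intensities, threshold=10):
    """Binary sequence for one pixel: 1 where the intensity changes by more
    than `threshold` between consecutive frames, else 0 (assumed detector)."""
    return [1 if abs(b - a) > threshold else 0
            for a, b in zip(intensities, intensities[1:])]

def barcode_correlation(x, y):
    """Pearson correlation of two equal-length barcodes (0.0 if one is constant)."""
    if len(x) != len(y) or not x:
        raise ValueError("barcodes must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical intensity traces of one pixel in each of three videos.
pixel_cam1 = [50, 50, 90, 50, 50, 50, 120, 50]        # event seen from viewpoint 1
pixel_cam2 = [200, 200, 140, 200, 200, 200, 100, 200]  # same event, other viewpoint
pixel_other = [30, 80, 30, 30, 80, 30, 30, 30]         # unrelated motion

code1 = motion_barcode(pixel_cam1)
code2 = motion_barcode(pixel_cam2)
code3 = motion_barcode(pixel_other)

same_event = barcode_correlation(code1, code2)
different_event = barcode_correlation(code1, code3)
print(code1, code2, code3, same_event, different_event)
```

The first two traces differ in brightness and in the size of the change, yet their barcodes coincide because only the timing of motion is recorded.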
Preliminary results in tag disambiguation using DBpedia
The availability of tag-based user-generated content for a variety of Web resources (music, photos, videos, text, etc.) has grown substantially in recent years. Users can assign tags freely and then use them to share and retrieve information. However, tag-based sharing and retrieval is not optimal because tags are plain text labels without an explicit or formal meaning, so polysemy and synonymy must be dealt with appropriately. To ameliorate these problems, we propose a context-based tag disambiguation algorithm that selects the meaning of a tag among a set of candidate DBpedia entries, using a common information retrieval similarity measure. The most similar DBpedia entry is selected as the one representing the meaning of the tag. We describe and analyze some preliminary results, and discuss current challenges in this area.
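The selection step described above could look roughly like this. The abstract only says "a common information retrieval similarity measure", so cosine similarity over term counts is an assumption, as is taking the tags that co-occur with the ambiguous tag as its context. The candidate names and description texts are hypothetical stand-ins for DBpedia entries, not data retrieved from DBpedia.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def disambiguate(context_tags, candidates):
    """Return the candidate whose description is most similar to the tag context."""
    context = Counter(tag.lower() for tag in context_tags)
    scores = {name: cosine(context, Counter(text.lower().split()))
              for name, text in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical context: tags that co-occur with the ambiguous tag "jaguar".
context_tags = ["car", "engine", "speed"]

# Hypothetical candidate entries with made-up description text.
candidates = {
    "Jaguar_Cars": "british car manufacturer luxury car engine",
    "Jaguar": "large cat species native to the americas",
}

best, scores = disambiguate(context_tags, candidates)
print(best, scores)
```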
Towards an All-Purpose Content-Based Multimedia Information Retrieval System
The growth of multimedia collections - in terms of size, heterogeneity, and
variety of media types - necessitates systems that are able to conjointly deal
with several forms of media, especially when it comes to searching for
particular objects. However, existing retrieval systems are organized in silos
and treat different media types separately. As a consequence, retrieval across
media types is either not supported at all or subject to major limitations. In
this paper, we present vitrivr, a content-based multimedia information
retrieval stack. As opposed to the keyword search approach implemented by most
media management systems, vitrivr makes direct use of the object's content to
facilitate different types of similarity search, such as Query-by-Example or
Query-by-Sketch, for and, most importantly, across different media types -
namely, images, audio, videos, and 3D models. Furthermore, we introduce a new
web-based user interface that enables easy-to-use, multimodal retrieval from
and browsing in mixed media collections. The effectiveness of vitrivr is shown
on the basis of a user study that involves different query and media types. To
the best of our knowledge, the full vitrivr stack is unique in that it is the
first multimedia retrieval system that seamlessly integrates support for four
different types of media. As such, it paves the way towards an all-purpose,
content-based multimedia information retrieval system.
Eolas: video retrieval application for helping tourists
In this paper, a video retrieval application for the Android mobile platform is described. The application utilises computer vision technologies that, given a photo of a landmark of interest, will automatically locate online videos about that landmark. Content-based video retrieval technologies are adopted to find the most relevant videos based on visual similarity of video content. The system has been evaluated using a custom test collection with human-annotated ground truth. We show that our system is effective, both in terms of speed and accuracy. This application is proposed for demonstration at MMM2014 and we are sure that this application would benefit tourists either planning travel or while travelling in real-time.
Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval
We address the problem of image-to-video retrieval. Given a query image, the aim is to identify the frame or scene within a collection of videos that best matches the visual input. Matching images to videos is an asymmetric task in which specific features for capturing the visual information in images and, at the same time, compacting the temporal correlation from videos are needed. Methods proposed so far are based on the temporal aggregation of hand-crafted features. In this work, we propose a deep learning architecture for learning specific asymmetric spatio-temporal embeddings for image-to-video retrieval. Our method learns non-linear projections from training data for both images and videos and projects their visual content into a common latent space, where they can be easily compared with a standard similarity function. Experiments conducted here show that our proposed asymmetric spatio-temporal embeddings outperform the state of the art in standard image-to-video retrieval datasets.