Unified Embedding and Metric Learning for Zero-Exemplar Event Detection
Event detection in unconstrained videos is conceived as content-based video
retrieval with two modalities: textual and visual. Given a text describing a
novel event, the goal is to rank related videos accordingly. This task is
zero-exemplar: no video examples of the novel event are given.
Related work trains a bank of concept detectors on external data sources.
These detectors predict confidence scores for test videos, which are ranked and
retrieved accordingly. In contrast, we learn a joint space in which the visual
and textual representations are embedded. The space casts a novel event as a
probability of pre-defined events. It also learns to measure the distance
between an event and its related videos.
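A minimal sketch of ranking in a shared space, assuming text and video embeddings have already been produced by some encoders (not shown). Each embedding is cast as a probability vector over pre-defined events via a softmax of cosine similarities to one hypothetical prototype per event, and videos are ranked by Euclidean distance between probability vectors. The prototypes, the function names and the fixed Euclidean distance are illustrative assumptions; the paper learns its embedding and metric end-to-end, which this sketch does not do.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def event_probabilities(embedding, event_prototypes):
    """Express an embedding as a probability vector over pre-defined events,
    using its cosine similarity to one (hypothetical) prototype per event."""
    return softmax([cosine(embedding, p) for p in event_prototypes])


def rank_videos(text_embedding, video_embeddings, event_prototypes):
    """Rank video ids by Euclidean distance between the query's and each video's
    event-probability vectors (a fixed stand-in for a learned metric)."""
    query = event_probabilities(text_embedding, event_prototypes)

    def distance(video_id):
        probs = event_probabilities(video_embeddings[video_id], event_prototypes)
        return math.dist(query, probs)

    return sorted(video_embeddings, key=distance)


prototypes = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical pre-defined events
videos = {"vid_a": [0.9, 0.1], "vid_b": [0.1, 0.9]}
print(rank_videos([1.0, 0.1], videos, prototypes))  # ['vid_a', 'vid_b']
```

Going through event probabilities rather than comparing raw embeddings means a novel event is described only in terms of events the model has already seen, which is the intuition the abstract describes.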
Our model is trained end-to-end on the publicly available EventNet dataset. When
applied to the TRECVID Multimedia Event Detection dataset, it outperforms the
state of the art by a considerable margin.
Comment: IEEE CVPR 201
Search Tracker: Human-derived object tracking in-the-wild through large-scale search and retrieval
Humans use context and scene knowledge to easily localize moving objects in
conditions of complex illumination changes, scene clutter and occlusions. In
this paper, we present a method to leverage human knowledge in the form of
annotated video libraries in a novel search and retrieval based setting to
track objects in unseen video sequences. For every video sequence, a document
that represents motion information is generated. Documents of the unseen video
are queried against the library at multiple scales to find videos with similar
motion characteristics. This provides us with coarse localization of objects in
the unseen video. We further adapt these retrieved object locations to the new
video using an efficient warping scheme. The proposed method is validated on
in-the-wild video surveillance datasets where we outperform state-of-the-art
appearance-based trackers. We also introduce a new challenging dataset with
complex object appearance changes.
Comment: Under review with the IEEE Transactions on Circuits and Systems for
Video Technolog
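One way to picture the "motion document" retrieval step is a bag-of-words sketch: motion samples are quantized into direction "words" per grid cell, each video becomes a histogram, and library videos are scored by cosine similarity averaged over several grid scales. Everything here is an assumption for illustration (the 8-bin direction vocabulary, the `(x, y, dx, dy)` flow format, the grid scales, equal frame sizes across videos, and all function names); it is not the paper's document construction or query procedure, and it omits the warping step entirely.

```python
import math

N_WORDS = 8  # hypothetical vocabulary: motion direction quantized into 8 bins


def direction_word(dx, dy):
    """Map a motion vector to one of N_WORDS direction bins."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle / (2 * math.pi / N_WORDS)) % N_WORDS


def motion_document(flow, width, height, grid):
    """Bag-of-words histogram over (grid cell, direction word) pairs.

    flow: list of (x, y, dx, dy) motion samples from one video.
    """
    doc = [0] * (grid * grid * N_WORDS)
    for x, y, dx, dy in flow:
        if dx == 0 and dy == 0:
            continue  # static samples carry no motion information
        cx = min(int(x * grid / width), grid - 1)
        cy = min(int(y * grid / height), grid - 1)
        doc[(cy * grid + cx) * N_WORDS + direction_word(dx, dy)] += 1
    return doc


def cosine(a, b):
    """Cosine similarity between two histograms (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_flow, library, width, height, scales=(1, 2, 4), k=1):
    """Return the k library videos whose motion documents best match the query,
    scoring each candidate by its mean cosine similarity across grid scales."""
    scores = {}
    for name, flow in library.items():
        sims = [
            cosine(motion_document(query_flow, width, height, g),
                   motion_document(flow, width, height, g))
            for g in scales
        ]
        scores[name] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Coarse scales match overall motion direction regardless of position, while finer scales reward candidates whose motion also occurs in the same part of the frame.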
Coherent segmentation of video into syntactic regions
In this paper we report on our work in realising an approach to video shot matching that automatically segments video into abstract, intertwined shapes in a temporally coherent way. These shapes, which approximate objects and background regions, can then be matched to give fine-grained shot-to-shot matching. The main contributions of the paper are, first, the extension of our segmentation algorithm for still images to spatial segmentation in video and, second, the introduction of a measure of the temporal coherency of the spatial segmentation. The latter allows us to quantitatively demonstrate the effectiveness of our approach on real video data.
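To make "temporal coherency of a spatial segmentation" concrete, here is one plausible measure, sketched under the assumption that each frame's segmentation is a 2-D grid of region ids: for every region in frame t, take its best intersection-over-union with any region in frame t+1, then average those IoUs weighted by region area. This particular formula is an illustrative choice, not necessarily the measure the paper defines, and `temporal_coherency` is a hypothetical name.

```python
from collections import Counter


def temporal_coherency(labels_t, labels_next):
    """Area-weighted best-match IoU between region labelings of two frames.

    labels_t, labels_next: equal-sized 2-D lists of region ids. Region ids
    need not correspond across frames; regions are matched by overlap.
    Returns a value in [0, 1]; 1.0 means every region reappears unchanged.
    """
    a = [r for row in labels_t for r in row]
    b = [r for row in labels_next for r in row]
    area_a, area_b = Counter(a), Counter(b)
    overlap = Counter(zip(a, b))  # (region in t, region in t+1) -> shared pixels
    score = 0.0
    for region, area in area_a.items():
        best_iou = max(
            inter / (area + area_b[other] - inter)
            for (src, other), inter in overlap.items()
            if src == region
        )
        score += best_iou * area / len(a)
    return score


print(temporal_coherency([[1, 1, 2, 2]], [[1, 2, 2, 2]]))  # 7/12, about 0.583
```

A score like this can be averaged over all consecutive frame pairs in a shot, so a segmentation whose regions flicker or drift between frames scores lower than one whose regions persist.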
Visual Information Retrieval in Digital Libraries
The emergence of information highways and multimedia computing has led to a redefinition of the concept of libraries. It is widely believed that in the next few years a significant portion of the information in libraries will be in the form of multimedia electronic documents. Many approaches are being proposed for storing, retrieving, assimilating, harvesting, and prospecting information from these multimedia documents. Digital libraries are expected to allow users to access information independently of the location and type of the data sources, and to provide a unified picture of the information. In this paper, we discuss the requirements of these emerging information systems and present query methods and data models for them. Finally, we briefly present a few examples of approaches that preview how things will be done in digital libraries in the near future.
published or submitted for publicatio
Integrated content presentation for multilingual and multimedia information access
For multilingual and multimedia information retrieval from multiple, potentially distributed collections, generating output in the form of standard ranked lists may often mean that a user has to explore the contents of many lists before finding enough relevant or linguistically accessible material to satisfy their information need. In some situations, delivering an integrated multilingual multimedia presentation could enable users to explore a topic, selecting from a range of available content based on suitably chosen displayed metadata. A presentation of this type has similarities with the outputs of existing adaptive hypermedia systems. However, such systems generate their output from "closed" content using sophisticated user and domain models. Extending them to "open" domain information retrieval applications would raise many issues. We present a preliminary exploration of what will form a challenging new direction for research in multilingual information access.
Dublin City University video track experiments for TREC 2003
In this paper, we describe our experiments for both the News Story Segmentation task and the Interactive Search task for TRECVID 2003. For the News Story Segmentation task, we used a Support Vector Machine (SVM) to combine evidence from audio-visual analysis tools in order to generate a listing of news stories from a given news programme. Our Search task experiment compared a video retrieval system based on text, image and relevance feedback with a text-only video retrieval system in order to identify which was more effective. To do so, we developed two variations of our Físchlár video retrieval system and conducted user testing in a controlled lab environment. In this paper we outline our work on both tasks.
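Combining evidence with an SVM can be sketched as training a linear classifier over per-candidate evidence scores. The sketch below uses Pegasos-style sub-gradient descent on the hinge loss, visiting examples in a fixed order instead of sampling at random. The three evidence features, the toy data, and the hyperparameters are hypothetical and do not reflect the features, kernel, or training setup actually used in these experiments.

```python
def train_linear_svm(features, labels, lam=0.01, epochs=500):
    """Linear SVM trained with Pegasos-style sub-gradient steps on the hinge loss.

    features: list of evidence vectors; labels: +1 (story boundary) or -1.
    A constant 1.0 is appended to each vector so the last weight acts as a bias
    (regularized along with the other weights, a common simplification).
    Examples are visited in a fixed order rather than sampled at random.
    """
    data = [list(x) + [1.0] for x in features]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularizer step)
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Classify an evidence vector with the learned weights (bias is last)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical evidence per candidate boundary: [anchor-shot score, silence score, shot-cut score]
X = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.1, 0.2, 0.1], [0.2, 0.1, 0.3]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

A linear model keeps the fusion interpretable: each learned weight indicates how much one analysis tool's output contributes to deciding that a story boundary occurs.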