An empirical study of inter-concept similarities in multimedia ontologies
Generic concept detection has been widely studied in recent research on multimedia analysis and retrieval, but the question of how to exploit the structure of a multimedia ontology and its inter-concept relations has not received similar attention. In this paper, we present results from our empirical analysis of different types of similarity among semantic concepts in two multimedia ontologies, LSCOM-Lite and CDVP-206. The results suggest that the proposed methods can give insight into the existing inter-concept relations within an ontology and help select the most helpful set of concepts and hierarchical relations. Such an analysis can be used in tasks such as building more reliable concept detectors and designing large-scale ontologies.
Strategies for Searching Video Content with Text Queries or Video Examples
The large number of user-generated videos uploaded to the Internet
every day has led to many commercial video search engines, which mainly rely on
text metadata for search. However, metadata is often lacking for user-generated
videos, so these videos are unsearchable by current search engines.
Content-based video retrieval (CBVR) tackles this metadata-scarcity
problem by directly analyzing the visual and audio streams of each video. CBVR
encompasses multiple research topics, including low-level feature design,
feature fusion, semantic detector training and video search/reranking. We
present novel strategies in these topics to enhance CBVR in both accuracy and
speed under different query inputs, including pure textual queries and query by
video examples. Our proposed strategies have been incorporated into our
submission for the TRECVID 2014 Multimedia Event Detection evaluation, where
our system outperformed other submissions in both text queries and video
example queries, thus demonstrating the effectiveness of our proposed
approaches.
Learning to detect video events from zero or very few video examples
In this work we deal with the problem of high-level event detection in video.
Specifically, we study the challenging problems of i) learning to detect video
events from solely a textual description of the event, without using any
positive video examples, and ii) additionally exploiting very few positive
training samples together with a small number of "related" videos. For
learning only from an event's textual description, we first identify a general
learning framework and then study the impact of different design choices for
various stages of this framework. For additionally learning from example
videos, when true positive training samples are scarce, we employ an extension
of the Support Vector Machine that allows us to exploit "related" event
videos by automatically introducing different weights for subsets of the videos
in the overall training set. Experimental evaluations performed on the
large-scale TRECVID MED 2014 video dataset provide insight on the effectiveness
of the proposed methods.
Comment: Image and Vision Computing Journal, Elsevier, 2015, accepted for
publication
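As a rough illustration of the weighting idea in the abstract above, the following sketch trains a linear SVM by subgradient descent with a per-sample weight on the hinge loss, so that a "related" video contributes less than a true positive. The toy feature vectors, the weight values, and the function names are all hypothetical. The paper's SVM extension introduces the weights automatically; here they are simply assumed to be given.

```python
def train_weighted_svm(X, y, weights, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via subgradient descent on a weighted hinge loss.

    X: list of feature vectors, y: labels in {-1, +1},
    weights: per-sample importance (assumed given, lower for "related" videos).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi, ci in zip(X, y, weights):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularisation shrinks w slightly on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: the step is scaled by the sample weight.
                w = [wj + lr * ci * yi * xj for wj, xj in zip(w, xi)]
                b += lr * ci * yi
    return w, b


def decision(w, b, x):
    """Signed score; positive means the event is predicted."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Made-up data: two true positives, one "related" video, three negatives.
X = [[2.0, 2.0], [2.5, 1.5], [1.0, 0.5],
     [-2.0, -1.5], [-1.5, -2.0], [-2.5, -2.5]]
y = [1, 1, 1, -1, -1, -1]
weights = [1.0, 1.0, 0.3, 1.0, 1.0, 1.0]  # the related video counts less

w, b = train_weighted_svm(X, y, weights)
```

Lowering a sample's weight reduces how strongly a margin violation on that sample moves the decision boundary, which is the general effect one would want from noisy "related" examples.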
Video Data Visualization System: Semantic Classification And Personalization
We present in this paper an intelligent video data visualization tool, based
on semantic classification, for retrieving and exploring a large scale corpus
of videos. Our work is based on semantic classification resulting from semantic
analysis of video. The obtained classes will be projected in the visualization
space. The graph is represented by nodes and edges: the nodes are the keyframes
of video documents, and the edges are the relations between documents and the
classes of documents. Finally, we construct the user's profile, based on the
interaction with the system, to render the system more adequate to its
references.
Comment: graphic
DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
In this paper, we study the task of detecting semantic parts of an object,
e.g., a wheel of a car, under partial occlusion. We propose that all models
should be trained without seeing occlusions while being able to transfer the
learned knowledge to deal with occlusions. This setting alleviates the
difficulty in collecting an exponentially large dataset to cover occlusion
patterns and is more essential. In this scenario, the proposal-based deep
networks, like RCNN-series, often produce unsatisfactory results, because both
the proposal extraction and classification stages may be confused by the
irrelevant occluders. To address this, [25] proposed a voting mechanism that
combines multiple local visual cues to detect semantic parts. The semantic
parts can still be detected even though some visual cues are missing due to
occlusions. However, this method is manually designed and is thus hard to
optimize in an end-to-end manner.
In this paper, we present DeepVoting, which incorporates the robustness shown
by [25] into a deep network, so that the whole pipeline can be jointly
optimized. Specifically, it adds two layers after the intermediate features of
a deep network, e.g., the pool-4 layer of VGGNet. The first layer extracts the
evidence of local visual cues, and the second layer performs a voting mechanism
by utilizing the spatial relationship between visual cues and semantic parts.
We also propose an improved version, DeepVoting+, which learns visual cues from
context outside objects. In experiments, DeepVoting achieves significantly
better performance than several baseline methods, including Faster-RCNN, for
semantic part detection under occlusion. In addition, DeepVoting enjoys
explainability, as the detection results can be diagnosed by looking up the
voting cues.
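The voting idea in the abstract above can be sketched outside a network as plain vote accumulation: each detected visual cue casts a vote for where the semantic part should be, using a spatial offset between cue and part. This is only a minimal sketch under stated assumptions. The cue names, offsets, scores, and the function `vote_for_part` are hypothetical, and DeepVoting itself implements cue extraction and voting as network layers trained end to end, which this does not reproduce.

```python
from collections import defaultdict


def vote_for_part(cue_detections, cue_offsets):
    """Accumulate votes for a semantic part's location.

    cue_detections: list of (cue_id, (row, col), score) for detected cues.
    cue_offsets: cue_id -> (d_row, d_col), the assumed displacement from the
                 cue to the part centre (fixed here, learned in a real system).
    Returns the vote map and the highest-scoring location (or None).
    """
    votes = defaultdict(float)
    for cue_id, (r, c), score in cue_detections:
        if cue_id not in cue_offsets:
            continue  # this cue says nothing about the part's position
        dr, dc = cue_offsets[cue_id]
        votes[(r + dr, c + dc)] += score
    best = max(votes, key=votes.get) if votes else None
    return votes, best


# Hypothetical cues around a car wheel centred at grid cell (5, 5).
offsets = {"rim": (0, 0), "tire_top": (2, 0), "fender": (3, -1)}
full = [("rim", (5, 5), 0.9), ("tire_top", (3, 5), 0.8), ("fender", (2, 6), 0.7)]
occluded = [d for d in full if d[0] != "rim"]  # pretend the rim is hidden

votes_full, best_full = vote_for_part(full, offsets)
votes_occluded, best_occluded = vote_for_part(occluded, offsets)
```

Because the remaining cues still agree on the same cell, the part is located even when one cue is missing. Inspecting which cues voted for the winning cell is the kind of diagnosis the abstract refers to as explainability.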
Measuring concept similarities in multimedia ontologies: analysis and evaluations
The recent development of large-scale multimedia concept ontologies has provided new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been extensively studied, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts on low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving the visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many different application areas in semantic multimedia processing.
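One plausible reading of an entropy-based quality measure for clustering-based concept models is cluster purity: if the samples in each feature-space cluster mostly carry a single concept label, the label entropy inside the cluster is low. The sketch below computes that quantity. It is an assumption about what such a measure could look like, not the paper's definition, and the labels and function names are made up.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the concept labels inside one cluster."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def weighted_entropy(clusters):
    """Size-weighted mean entropy over clusters; lower means purer clusters."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)


# Hypothetical cluster contents, listed by the concept label of each sample.
pure = [["sky", "sky", "sky"], ["road", "road"]]
mixed = [["sky", "road", "sky", "road"], ["road"]]

pure_score = weighted_entropy(pure)
mixed_score = weighted_entropy(mixed)
```

Under this reading, a concept model whose clusters mix many labels scores higher (worse) than one whose clusters each capture a single concept.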