Search CORE

7,844 research outputs found

An HMM-Based Framework for Video Semantic Analysis

Author: Ma Yu-Fei
Xu Gu
Yang Shiqiang
Zhang Hong-Jiang
Publication venue: Scholars\u27 Mine
Publication date: 01/01/2005
Field of study

Video semantic analysis is essential in video indexing and structuring. However, due to the lack of robust and generic algorithms, most of the existing works on semantic analysis are limited to specific domains. In this paper, we present a novel hidden Markove model (HMM)-based framework as a general solution to video semantic analysis. In the proposed framework, semantics in different granularities are mapped to a hierarchical model space, which is composed of detectors and connectors. In this manner, our model decomposes a complex analysis problem into simpler subproblems during the training process and automatically integrates those subproblems for recognition. The proposed framework is not only suitable for a broad range of applications, but also capable of modeling semantics in different semantic granularities. Additionally, we also present a new motion representation scheme, which is robust to different motion vector sources. The applications of the proposed framework in basketball event detection, soccer shot classification, and volleyball sequence analysis have demonstrated the effectiveness of the proposed framework on video semantic analysis

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Video semantic content analysis framework based on ontology combined MPEG-7

Author: A. Artale
A. Ekin
A. Hanjalic
A. Jaimes
A. Yoshitaka
H.X. Xu
I. Kompatsiaris
S. Dasiopoulou
S.-F. Chang
V. Mezaris
W. Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2007
Field of study

The rapid increase in the available amount of video data is creating a growing demand for efficient methods for understanding and managing it at the semantic level. New multimedia standard, MPEG-7, provides the rich functionalities to enable the generation of audiovisual descriptions and is expressed solely in XML Schema which provides little support for expressing semantic knowledge. In this paper, a video semantic content analysis framework based on ontology combined MPEG-7 is presented. Domain ontology is used to define high level semantic concepts and their relations in the context of the examined domain. MPEG-7 metadata terms of audiovisual descriptions and video content analysis algorithms are expressed in this ontology to enrich video semantic analysis. OWL is used for the ontology description. Rules in Description Logic are defined to describe how low-level features and algorithms for video analysis should be applied according to different perception content. Temporal Description Logic is used to describe the semantic events, and a reasoning algorithm is proposed for events detection. The proposed framework is demonstrated in sports video domain and shows promising results

Crossref

DCU Online Research Access Service

Content-based Video Retrieval by Integrating Spatio-Temporal and Stochastic Recognition of Events

Author: Jonker W.
Petkovic M.
Publication venue: IEEE
Publication date: 01/01/2001
Field of study

As amounts of publicly available video data grow the need to query this data efficiently becomes significant. Consequently content-based retrieval of video data turns out to be a challenging and important problem. We address the specific aspect of inferring semantics automatically from raw video data. In particular, we introduce a new video data model that supports the integrated use of two different approaches for mapping low-level features to high-level concepts. Firstly, the model is extended with a rule-based approach that supports spatio-temporal formalization of high-level concepts, and then with a stochastic approach. Furthermore, results on real tennis video data are presented, demonstrating the validity of both approaches, as well us advantages of their integrated us

CiteSeerX

Pure OAI Repository

University of Twente Research Information

An audio-based sports video segmentation and event detection algorithm

Author: Baillie M.
Jose J.M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004
Field of study

In this paper, we present an audio-based event detection algorithm shown to be effective when applied to Soccer video. The main benefit of this approach is the ability to recognise patterns that display high levels of crowd response correlated to key events. The soundtrack from a Soccer sequence is first parameterised using Mel-frequency Cepstral coefficients. It is then segmented into homogenous components using a windowing algorithm with a decision process based on Bayesian model selection. This decision process eliminated the need for defining a heuristic set of rules for segmentation. Each audio segment is then labelled using a series of Hidden Markov model (HMM) classifiers, each a representation of one of 6 predefined semantic content classes found in Soccer video. Exciting events are identified as those segments belonging to a crowd cheering class. Experimentation indicated that the algorithm was more effective for classifying crowd response when compared to traditional model-based segmentation and classification techniques

CiteSeerX

University of Strathclyde Institutional Repository

Enlighten

Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

Author: Barbu Andrei
Berzak Yevgeni
Harari Daniel
Katz Boris
Ullman Shimon
Publication venue
Publication date: 04/12/2015
Field of study

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing to disambiguate sentences in a unified fashion across the different ambiguity types.Comment: EMNLP 201

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT