Semantic Model Vectors for Complex Video Event Recognition

Author: Apostol Natsev
Bert Huang
Gang Hua
Lexing Xie
Michele Merler
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

TagBook: A Semantic Video Representation without Supervision for Event Detection

Author: Li Xirong
Mazloom Masoud
Snoek Cees G. M.
Publication date: 01/01/2016

We consider the problem of event detection in video for scenarios where only a few, or even zero, examples are available for training. For this challenging setting, the prevailing solutions in the literature rely on a semantic video representation obtained from thousands of pre-trained concept detectors. Unlike existing work, we propose a new semantic video representation that is based only on freely available socially tagged videos, without the need to train any intermediate concept detectors. We introduce a simple algorithm that propagates tags from a video's nearest neighbors, similar in spirit to the ones used for image retrieval, but redesign it for video event detection by including video source set refinement and varying the video tag assignment. We call our approach TagBook and study its construction, descriptiveness and detection performance on the TRECVID 2013 and 2014 multimedia event detection datasets and the Columbia Consumer Video dataset. Despite its simple nature, the proposed TagBook video representation is remarkably effective for few-example and zero-example event detection, even outperforming very recent state-of-the-art alternatives building on supervised representations.

Comment: accepted for publication as a regular paper in the IEEE Transactions on Multimedia

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE
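
The neighbor-based tag propagation mentioned in the TagBook abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine similarity measure, the uniform voting scheme, and the toy data layout (`feature`, `tags`) are all assumptions made for illustration, and the source set refinement and tag assignment variants the abstract mentions are omitted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_tags(query_feature, source_videos, k=2):
    """Score tags for an untagged video by voting over its k nearest tagged neighbors.

    source_videos is a list of dicts with hypothetical keys "feature" and "tags".
    Returns a dict mapping each tag to the fraction of the k neighbors that carry it.
    """
    ranked = sorted(
        source_videos,
        key=lambda video: cosine(query_feature, video["feature"]),
        reverse=True,
    )
    votes = Counter()
    for video in ranked[:k]:
        votes.update(set(video["tags"]))  # each neighbor votes once per tag
    return {tag: count / k for tag, count in votes.items()}
```

The resulting tag-score dictionary plays the role of a semantic representation of the video: two videos can then be compared, or matched against a textual event description, through their tag scores rather than through trained concept detectors.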

Strategies for Searching Video Content with Text Queries or Video Examples

Author: Chang Xiaojun
Du Xingzhong
Gan Chuang
Hauptmann Alexander G.
Jiang Lu
Lan Zhenzhong
Li Huan
Li Xuanchong
Lin Ming
Ma Zhigang
Mao Zexi
Meng Deyu
Xu Shicheng
Xu Zhongwen
Yang Yi
Yu Shoou-I
Publication date: 01/01/2016

The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, so these videos are unsearchable by current search engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies have been incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions in both text queries and video example queries, thus demonstrating the effectiveness of our proposed approaches.

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

A framework for event detection in field-sports video broadcasts based on SVM generated audio-visual feature model. Case-study: soccer video

Author: Marlow Seán
Murphy Noel
O'Connor Noel E.
Sadlier David A.
Publication date: 01/09/2004

In this paper we propose a novel audio-visual feature-based framework for event detection in field sports broadcast video. The system is evaluated via a case study involving MPEG-encoded soccer video. Specifically, the evidence gathered by various feature detectors is combined by means of a learning algorithm (a support vector machine), which infers the occurrence of an event based on a model generated during a training phase that uses a corpus of 25 hours of content. The system is evaluated using 25 hours of separate test content. An evaluation of the results shows that, for this case, both high precision and high recall are achievable.

Irish Universities

DCU Online Research Access Service

Learning to detect video events from zero or very few video examples

Author: Galanopoulos Damianos
Mezaris Vasileios
Patras Ioannis
Tzelepis Christos
Publication venue: 'Elsevier BV'
Publication date: 25/11/2015

In this work we deal with the problem of high-level event detection in video. Specifically, we study the challenging problems of i) learning to detect video events solely from a textual description of the event, without using any positive video examples, and ii) additionally exploiting very few positive training samples together with a small number of "related" videos. For learning only from an event's textual description, we first identify a general learning framework and then study the impact of different design choices for various stages of this framework. For additionally learning from example videos, when true positive training samples are scarce, we employ an extension of the Support Vector Machine that allows us to exploit "related" event videos by automatically introducing different weights for subsets of the videos in the overall training set. Experimental evaluations performed on the large-scale TRECVID MED 2014 video dataset provide insight into the effectiveness of the proposed methods.

Comment: Image and Vision Computing Journal, Elsevier, 2015, accepted for publication

arXiv.org e-Print Archive

City Research Online

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

Author: Cheng Hui
Elgammal Ahmed
Elhoseiny Mohamed
Liu Jingen
Sawhney Harpreet
Publication date: 15/12/2015

We propose a new zero-shot event detection method based on multimodal distributional semantic embedding of videos. Our model embeds object and action concepts, as well as other available modalities from videos, into a distributional semantic space. To our knowledge, this is the first zero-shot event detection model that is built on top of distributional semantics and extends it in the following directions: (a) semantic embedding of multimodal information in videos (with a focus on the visual modalities), (b) automatically determining the relevance of concepts/attributes to a free-text query, which could be useful for other applications, and (c) retrieving videos by a free-text event query (e.g., "changing a vehicle tire") based on their content. We embed videos into a distributional semantic space and then measure the similarity between videos and the event query in free-text form. We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state of the art, which uses long event descriptions, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83. It is also an order of magnitude faster.

Comment: To appear in AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
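
The idea in the last abstract above, embedding videos and a free-text query into one semantic space and ranking by similarity, can be illustrated with a toy sketch. It is not the authors' model: the two-dimensional concept vectors, the score-weighted averaging, the cosine similarity, and every name below are assumptions chosen only to make the idea concrete.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embed_concepts(concept_scores, concept_vectors):
    """Embed a bag of scored concepts as the score-weighted mean of their vectors."""
    dim = len(next(iter(concept_vectors.values())))
    total = sum(concept_scores.values())
    vec = [0.0] * dim
    for concept, score in concept_scores.items():
        for i, value in enumerate(concept_vectors[concept]):
            vec[i] += score * value
    return [value / total for value in vec] if total else vec


def rank_videos(query_terms, videos, concept_vectors):
    """Rank videos by similarity between the query embedding and each video embedding.

    videos maps a video name to hypothetical detector scores per concept.
    """
    query_vec = embed_concepts({term: 1.0 for term in query_terms}, concept_vectors)
    scored = [
        (name, cosine(query_vec, embed_concepts(scores, concept_vectors)))
        for name, scores in videos.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a realistic setting the concept vectors would come from a word embedding trained on text and the per-video scores from concept detectors; here both are hand-written placeholders, and no training video of the event itself is needed to produce a ranking.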