Joint Video and Text Parsing for Understanding Events and Answering Queries

Authors: Choe Tae Eun, Lee Mun Wai, Meng Meng, Tu Kewei, Zhu Song-Chun
Publication date: 21/02/2014

We propose a framework for parsing video and text jointly for understanding events and answering user queries. Our framework produces a parse graph that represents the compositional structures of spatial information (objects and scenes), temporal information (actions and events) and causal information (causalities between events and fluents) in the video and text. The knowledge representation of our framework is based on a spatial-temporal-causal And-Or graph (S/T/C-AOG), which jointly models possible hierarchical compositions of objects, scenes and events as well as their interactions and mutual contexts, and specifies the prior probabilistic distribution of the parse graphs. We present a probabilistic generative model for joint parsing that captures the relations between the input video/text, their corresponding parse graphs and the joint parse graph. Based on the probabilistic model, we propose a joint parsing system consisting of three modules: video parsing, text parsing and joint inference. Video parsing and text parsing produce two parse graphs from the input video and text respectively. The joint inference module produces a joint parse graph by performing matching, deduction and revision on the video and text parse graphs. The proposed framework has the following objectives: first, we aim at deep semantic parsing of video and text that goes beyond the traditional bag-of-words approaches; second, we perform parsing and reasoning across the spatial, temporal and causal dimensions based on the joint S/T/C-AOG representation; third, we show that deep joint parsing facilitates subsequent applications such as generating narrative text descriptions and answering queries in the forms of who, what, when, where and why. We empirically evaluated our system based on comparison against ground truth as well as accuracy of query answering, and obtained satisfactory results.

arXiv.org e-Print Archive

CiteSeerX

Contextual anomaly detection in crowded surveillance scenes

Authors: Leach Michael, Robertson Neil, Sparks Ed
Publication venue: Elsevier BV
Publication date: 01/01/2014

This work addresses the problem of detecting human behavioural anomalies in crowded surveillance environments. We focus in particular on the problem of detecting subtle anomalies in a behaviourally heterogeneous surveillance scene. To reach this goal we implement a novel unsupervised context-aware process. We propose and evaluate a method of utilising social context and scene context to improve behaviour analysis. We find that in a crowded scene the application of Mutual Information based social context makes it possible to prevent self-justifying groups and to propagate anomalies in a social network, granting a greater anomaly detection capability. Scene context uniformly improves the detection of anomalies in both datasets. The strength of our contextual features is demonstrated by the detection of subtly abnormal behaviours, which otherwise remain indistinguishable from normal behaviour.

Queen's University Belfast Research Portal

Heriot Watt Pure

Elsevier - Publisher Connector

Sound event recognition in urban soundscapes with self-organizing maps and support vector machines

Authors: Alías Francesc, Botteldooren Dick, Oldoni Damiano, Valero Xavier
Publication venue: Soundscape-COST
Publication date: 01/01/2013

Ghent University Academic Bibliography