
    Acoustic event detection based on feature-level fusion of audio and video modalities

    Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large number of errors, mostly due to temporal overlaps. In fact, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectro-temporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multi-camera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed. The experimental results show that information from both the microphone array and the video cameras is useful for improving the detection rate of isolated as well as spontaneously generated acoustic events.
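    The fusion and detection scheme described in this abstract can be illustrated with a short sketch. The following hypothetical Python example (not the authors' implementation) concatenates per-frame audio features, 3D source coordinates, and video features into one fused vector sequence, then runs a parallel bank of binary detectors, one GaussianHMM pair (event vs. background, via the hmmlearn package) per event class; all dimensions, state counts, and thresholds are assumed values.

    # Minimal sketch of feature-level fusion feeding a parallel bank of
    # binary HMM-based detectors (illustrative only; not the paper's code).
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    def fuse_features(audio_feats, loc_3d, video_feats):
        """Feature-level (early) fusion: concatenate per-frame vectors.
        audio_feats: (T, A) spectro-temporal features
        loc_3d:      (T, 3) sound-source coordinates from the mic array
        video_feats: (T, V) visual features (motion, tracking, ...)
        """
        return np.concatenate([audio_feats, loc_3d, video_feats], axis=1)

    class BinaryHMMDetector:
        """One detector per event class: an event HMM and a background
        HMM compared through a log-likelihood ratio."""
        def __init__(self, n_states=3, threshold=0.0):
            self.event_hmm = GaussianHMM(n_components=n_states)
            self.background_hmm = GaussianHMM(n_components=n_states)
            self.threshold = threshold  # assumed; tuned on held-out data

        def fit(self, event_segments, background_segments):
            self.event_hmm.fit(np.vstack(event_segments),
                               lengths=[len(s) for s in event_segments])
            self.background_hmm.fit(np.vstack(background_segments),
                                    lengths=[len(s) for s in background_segments])

        def detect(self, fused_segment):
            # Positive ratio: the segment looks more like the event class.
            ratio = (self.event_hmm.score(fused_segment)
                     - self.background_hmm.score(fused_segment))
            return ratio > self.threshold

    Running one such detector per class over the same fused segment gives the parallel structure the abstract refers to, e.g. {name: d.detect(fused) for name, d in detectors.items()}.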

    Anti-social behavior detection in audio-visual surveillance systems

    In this paper we propose a general-purpose framework for the detection of unusual events. The proposed system is based on the unsupervised method for unusual scene detection in webcam images introduced in [1]. We extend that algorithm to accommodate data from different modalities and introduce the concept of time-space blocks. In addition, we evaluate early and late fusion techniques for our audio-visual features. Experimental results on 192 hours of data show that fusing audio and video outperforms using a single modality.
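    The early/late fusion comparison mentioned above can be sketched as follows. This is a hypothetical illustration (not the paper's code): "usual scene" models are reduced to sets of prototype vectors, early fusion scores a concatenated audio-visual vector against one joint model, and late fusion combines per-modality scores with an assumed weight w.

    # Sketch of early vs. late fusion for audio-visual anomaly scoring
    # (illustrative only; prototype models and the weight are assumptions).
    import numpy as np

    def anomaly_score(prototypes, x):
        """Distance of feature vector x to the nearest 'usual scene'
        prototype; higher means more unusual."""
        return np.min(np.linalg.norm(prototypes - x, axis=1))

    def early_fusion_score(joint_prototypes, a, v):
        # Early fusion: concatenate the modalities first and score in
        # the joint feature space with a single model.
        return anomaly_score(joint_prototypes, np.concatenate([a, v]))

    def late_fusion_score(audio_prototypes, video_prototypes, a, v, w=0.5):
        # Late fusion: score each modality separately, then combine
        # (a weighted sum here; w would be tuned on validation data).
        return (w * anomaly_score(audio_prototypes, a)
                + (1 - w) * anomaly_score(video_prototypes, v))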

    PhD Forum: Investigating the performance of a multi-modal approach to unusual event detection

    In this paper, we investigate the parameters underpinning our previously presented system for detecting unusual events in surveillance applications [1]. The system identifies anomalous events using an unsupervised, data-driven approach. During a training period, typical activities within a surveilled environment are modeled using multi-modal sensor readings. Significant deviations from the established model of regular activity can then be flagged as anomalous at run-time. Using this approach, the system can be deployed in any environment and adapt automatically, without manual adjustment. Experiments were carried out on two days of audio-visual data and evaluated against a manually annotated ground truth. We investigate sensor fusion and quantitatively evaluate the performance gains over single-modality models. We also investigate different formulations of our cluster-based model of usual scenes, as well as the impact of dynamic thresholding on identifying anomalous events. Experimental results are promising, even when modeling is performed using very simple audio and visual features.
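    The cluster-based model of usual scenes and the dynamic thresholding investigated in this abstract can be sketched roughly as below. This is a hypothetical illustration, not the authors' implementation: regular activity is summarized by k-means centroids over fused multi-modal features, and at run-time the distance to the nearest centroid is compared against a threshold that adapts to the running score statistics; the cluster count, smoothing factor alpha, and sensitivity k are assumed values.

    # Sketch of a cluster-based usual-scene model with a dynamic threshold
    # (illustrative; all hyperparameters are assumptions).
    import numpy as np
    from sklearn.cluster import KMeans

    class UnusualEventDetector:
        def __init__(self, n_clusters=8, alpha=0.05, k=3.0):
            self.kmeans = KMeans(n_clusters=n_clusters)
            self.alpha = alpha  # EMA smoothing for the score statistics
            self.k = k          # sensitivity: threshold = mean + k * std
            self.mean, self.var = 0.0, 1.0

        def fit(self, usual_features):
            """Model regular activity from a training period of fused
            multi-modal features, shape (n_frames, n_dims)."""
            self.kmeans.fit(usual_features)

        def step(self, x):
            """Score one run-time frame and flag it if its distance to
            the nearest 'usual' cluster exceeds the adaptive threshold."""
            score = self.kmeans.transform(x.reshape(1, -1)).min()
            threshold = self.mean + self.k * np.sqrt(self.var)
            is_anomalous = score > threshold
            # Update running statistics (exponential moving average) so
            # the threshold tracks gradual changes in the environment.
            self.mean = (1 - self.alpha) * self.mean + self.alpha * score
            self.var = (1 - self.alpha) * self.var + self.alpha * (score - self.mean) ** 2
            return score, is_anomalous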

    UR-FUNNY: A Multimodal Language Dataset for Understanding Humor

    Humor is a unique and creative communicative behavior displayed during social interactions. It is produced in a multimodal manner, through the use of words (text), gestures (vision), and prosodic cues (acoustic). Understanding humor from these three modalities falls within the boundaries of multimodal language, a recent research trend in natural language processing that models natural language as it occurs in face-to-face communication. Although humor detection is an established research area in NLP, it is understudied in a multimodal context. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding the multimodal language used in expressing humor. The dataset and accompanying studies present a framework for multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research.