TagBook: A Semantic Video Representation without Supervision for Event Detection

Author: Li Xirong
Mazloom Masoud
Snoek Cees G. M.
Publication date: 01/01/2016

We consider the problem of event detection in video for scenarios where only a few, or even zero, examples are available for training. For this challenging setting, the prevailing solutions in the literature rely on a semantic video representation obtained from thousands of pre-trained concept detectors. In contrast to existing work, we propose a new semantic video representation that is based only on freely available socially tagged videos, without the need to train any intermediate concept detectors. We introduce a simple algorithm that propagates tags from a video's nearest neighbors, similar in spirit to the algorithms used for image retrieval, but redesigned for video event detection by including video source set refinement and varying the video tag assignment. We call our approach TagBook and study its construction, descriptiveness and detection performance on the TRECVID 2013 and 2014 multimedia event detection datasets and the Columbia Consumer Video dataset. Despite its simple nature, the proposed TagBook video representation is remarkably effective for few-example and zero-example event detection, even outperforming very recent state-of-the-art alternatives that build on supervised representations. Comment: accepted for publication as a regular paper in the IEEE Transactions on Multimedia.
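The neighbor-based tag propagation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity, the voting scheme, the parameter `k`, and the names `cosine` and `propagate_tags` are all assumptions made for illustration, and the source set refinement and tag assignment variants described in the abstract are not modeled.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_tags(query, source, vocabulary, k=2):
    """Build a tag-based vector for `query` from its k nearest tagged videos.

    `source` is a list of (feature_vector, set_of_tags) pairs (hypothetical
    format). Each neighbor votes once for every tag it carries; the result
    holds, per vocabulary tag, the fraction of neighbors that carry it.
    """
    ranked = sorted(source, key=lambda item: cosine(query, item[0]), reverse=True)
    neighbors = ranked[:k]
    if not neighbors:
        return [0.0 for _ in vocabulary]
    votes = Counter(tag for _, tags in neighbors for tag in tags)
    return [votes[tag] / len(neighbors) for tag in vocabulary]
```

In this sketch the output vector has one dimension per vocabulary tag, so an event can be matched against it without training any intermediate concept detector.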

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Detection of overlapped acoustic events using fusion of audio and video modalities

Author: Butko Taras
Nadeu Camprubí Climent
Publication date: 01/01/2010

Acoustic event detection (AED) may help to describe acoustic scenes and also contribute to improving the robustness of speech technologies. Even if the number of considered events is not large, detection becomes a difficult task in scenarios where the acoustic events (AEs) are produced rather spontaneously and often overlap in time with speech. In this work, fusion of audio and video information is performed at either the feature level or the decision level, and the results are compared for different levels of signal overlap. The best improvement with respect to an audio-only baseline system was obtained using the feature-level fusion technique. Furthermore, a significant improvement in recognition rate is observed where the AEs overlap with loud speech, mainly because the video modality remains unaffected by the interfering sound. Peer reviewed. Postprint (published version).
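The difference between the two fusion levels compared in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the system from the paper: the function names, the weighted-sum rule for decision-level fusion, and the `audio_weight` parameter are hypothetical.

```python
def feature_level_fusion(audio_features, video_features):
    """Feature-level fusion: concatenate the per-frame features of both
    modalities so that a single classifier sees one joint vector."""
    return list(audio_features) + list(video_features)


def decision_level_fusion(audio_scores, video_scores, audio_weight=0.5):
    """Decision-level fusion: each modality is classified separately and
    only the per-class scores are merged (here by an assumed weighted sum).

    Returns the winning class label and the fused score table.
    """
    classes = audio_scores.keys() & video_scores.keys()
    fused = {
        c: audio_weight * audio_scores[c] + (1.0 - audio_weight) * video_scores[c]
        for c in classes
    }
    return max(fused, key=fused.get), fused
```

In the decision-level case, a confident video score can outvote an audio score degraded by overlapping speech, which is the intuition the abstract gives for the benefit of the video modality.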

UPCommons. Portal del coneixement obert de la UPC

Combining Multiple Sensors for Event Detection of Older People

Author: Bremond François
Crispim-Junior Carlos
Fosty Baptiste
Ma Qiao
Romdhane Rim
Thonnat Monique
Publication venue: HAL CCSD
Publication date: 14/07/2015

We herein present a hierarchical model-based framework for event detection using multiple sensors. Event models combine a priori knowledge of the scene (3D geometric and semantic information, such as contextual zones and equipment) with moving objects (e.g., a Person) detected by a video monitoring system. The event models follow a generic ontology based on natural language, which allows domain experts to easily adapt them. The novelty of the framework lies in combining multiple sensors at the decision (event) level and handling their conflicts using a probabilistic approach. The event conflict handling consists of computing the reliability of each sensor before their fusion, using an alternative combination rule for Dempster-Shafer theory. The framework is evaluated on multisensor recordings of instrumental activities of daily living (e.g., watching TV, writing a check, preparing tea, organizing the weekly intake of prescribed medication) performed by participants in a clinical trial for an Alzheimer's disease study. Two fusion cases are presented: the combination of events (or activities) from heterogeneous sensors (an RGB ambient camera and a wearable inertial sensor) in a deterministic fashion, and the combination of conflicting events from video cameras with partially overlapping fields of view (an RGB camera and an RGB-D camera, Kinect). Results show that the framework improves the event detection rate in both cases.
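The idea of weighting each sensor by its reliability before fusing conflicting events can be sketched with the classical tools of Dempster-Shafer theory. Note that the abstract refers to an alternative combination rule that it does not spell out; the sketch below instead uses Shafer's discounting followed by the standard Dempster rule, and the event labels, mass values, and function names are assumptions for illustration only.

```python
def discount(mass, reliability, frame):
    """Shafer discounting: scale every focal set by the sensor reliability
    and move the remaining mass to the whole frame (total ignorance)."""
    out = {focal: reliability * m for focal, m in mass.items() if focal != frame}
    out[frame] = 1.0 - reliability + reliability * mass.get(frame, 0.0)
    return out


def combine(m1, m2):
    """Standard Dempster rule: multiply masses, keep non-empty intersections,
    and renormalize by the non-conflicting mass."""
    joint = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                joint[inter] = joint.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {focal: m / (1.0 - conflict) for focal, m in joint.items()}


# Hypothetical example: two sensors disagree about the current event.
FRAME = frozenset({"sitting", "standing"})
camera = {frozenset({"sitting"}): 0.8, FRAME: 0.2}
wearable = {frozenset({"standing"}): 0.6, FRAME: 0.4}

fused_raw = combine(camera, wearable)
fused_discounted = combine(discount(camera, 0.5, FRAME), wearable)
```

With these made-up numbers, the undiscounted fusion favors the camera's event, while halving the camera's reliability lets the wearable sensor's event win, which shows how a per-sensor reliability estimate changes the outcome of a conflict.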

Healthcare Event and Activity Logging

Author: Fried Jeffrey C
Manjunath BS
Torres Carlos
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

The health of patients in the intensive care unit (ICU) can change frequently and inexplicably. Crucial events and activities responsible for these changes often go unnoticed. This paper introduces healthcare event and action logging (HEAL), which automatically and unobtrusively monitors and reports on events and activities that occur in a medical ICU room. HEAL uses a multimodal distributed camera network to monitor and identify ICU activities and estimate sanitation-event qualifiers. At its core is a novel approach to infer person roles based on semantic interactions, a critical requirement in many healthcare settings where individuals must not be identified. The proposed approach for activity representation identifies a basis of contextual aspects and estimates aspect weights for proper action representation and reconstruction. The flexibility of the proposed algorithms enables the identification of people's roles by associating them with inferred interactions and detected activities. A fully working prototype system was developed, tested in a mock ICU room and then deployed in two ICU rooms at a community hospital, thus offering unique capabilities for data gathering and analytics. The proposed method achieves a role identification accuracy of 84% and a backtracking role identification accuracy of 79% for obscured roles, using interaction and appearance features on real ICU data. Detailed experimental results are provided in the context of four event-sanitation qualifiers: clean, transmission, contamination, and unclean.

eScholarship - University of California

Acoustic event detection based on feature-level fusion of audio and video modalities

Author: Butko Taras
Canton Ferrer Cristian
Casas Pla Josep Ramon
Giró Nieto Xavier
Hernando Pericás Francisco Javier
Nadeu Camprubí Climent
Segura Perales Carlos
Publication venue: Hindawi Limited
Publication date: 01/01/2011

Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information produces a large number of errors, most of which are due to temporal overlaps. In fact, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed. The experimental results show that information from both the microphone array and the video cameras is useful for improving the detection rate of isolated as well as spontaneously generated acoustic events. Peer reviewed. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

Springer - Publisher Connector

Directory of Open Access Journals