    Combining Multiple Sensors for Event Detection of Older People

    We herein present a hierarchical model-based framework for event detection using multiple sensors. Event models combine a priori knowledge of the scene (3D geometric and semantic information, such as contextual zones and equipment) with moving objects (e.g., a Person) detected by a video monitoring system. The event models follow a generic ontology based on natural language, which allows domain experts to easily adapt them. The framework's novelty lies in combining multiple sensors at the decision (event) level and handling their conflict using a probabilistic approach. The event conflict handling consists of computing the reliability of each sensor before their fusion using an alternative combination rule for Dempster-Shafer Theory. The framework is evaluated on multisensor recordings of instrumental activities of daily living (e.g., watching TV, writing a check, preparing tea, organizing the weekly intake of prescribed medication) performed by participants of a clinical trial for an Alzheimer's disease study. Two fusion cases are presented: the combination of events (or activities) from heterogeneous sensors (an RGB ambient camera and a wearable inertial sensor) in a deterministic fashion, and the combination of conflicting events from video cameras with partially overlapping fields of view (an RGB camera and an RGB-D camera, Kinect). Results show that the framework improves the event detection rate in both cases.
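
As a rough illustration of decision-level fusion with per-sensor reliability, the sketch below applies Shafer discounting followed by the classical Dempster rule of combination. This is an assumption made for illustration: the abstract refers to an alternative combination rule that it does not specify, so the code is not the authors' method. The event names, mass values, and reliabilities are hypothetical.

```python
from itertools import product


def discount(mass, reliability, frame):
    """Shafer discounting: keep `reliability` of each mass, move the rest to the full frame."""
    out = {h: reliability * m for h, m in mass.items() if h != frame}
    out[frame] = 1.0 - reliability + reliability * mass.get(frame, 0.0)
    return out


def dempster_combine(m1, m2):
    """Classical Dempster rule for two mass functions keyed by frozenset hypotheses.

    Returns the normalized fused masses and the conflict mass K.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            # Mass assigned to incompatible hypotheses counts as conflict.
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    fused = {h: m / (1.0 - conflict) for h, m in combined.items()}
    return fused, conflict


# Hypothetical example: two cameras disagree on the observed activity.
FRAME = frozenset({"preparing_tea", "watching_tv"})
TEA = frozenset({"preparing_tea"})
TV = frozenset({"watching_tv"})

m_rgb = {TEA: 0.8, FRAME: 0.2}   # RGB camera favours "preparing tea"
m_rgbd = {TV: 0.6, FRAME: 0.4}   # RGB-D camera favours "watching TV"

fused, conflict = dempster_combine(m_rgb, m_rgbd)

# Treating the RGB-D camera as less reliable (0.5, an assumed value) lowers the conflict.
fused_disc, conflict_disc = dempster_combine(m_rgb, discount(m_rgbd, 0.5, FRAME))
```

Discounting a sensor moves part of its belief to the full frame (ignorance), which is one common way to reduce conflict before fusion.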

    Dublin City University at TRECVID 2008

    In this paper we describe our system and the experiments performed for both the automatic search task and the event detection task in TRECVid 2008. For the automatic search task we submitted 3 runs utilizing only visual retrieval experts, continuing our previous work on query-time weight generation for data fusion and on determining what can be gained from global, visual-only experts. For the event detection task we submitted results for 5 required events (ElevatorNoEntry, OpposingFlow, PeopleMeet, Embrace and PersonRuns) and 1 optional event (DoorOpenClose).

    Cultural Event Recognition with Visual ConvNets and Temporal Models

    This paper presents our contribution to the ChaLearn Challenge 2015 on Cultural Event Classification. The challenge in this task is to automatically classify images from 50 different cultural events. Our solution is based on the combination of visual features extracted from convolutional neural networks with temporal information using a hierarchical classifier scheme. We extract visual features from the last three fully connected layers of both CaffeNet (pretrained with ImageNet) and our fine-tuned version for the ChaLearn challenge. We propose a late fusion strategy that trains a separate low-level SVM on each of the extracted neural codes. The class predictions of the low-level SVMs form the input to a higher-level SVM, which gives the final event scores. We achieve our best result by adding a temporal refinement step to our classification scheme, which is applied directly to the output of each low-level SVM. Our approach penalizes high classification scores based on visual features when their time stamp does not match well with an event-specific temporal distribution learned from the training and validation data. Our system achieved the second-best result in the ChaLearn Challenge 2015 on Cultural Event Classification, with a mean average precision of 0.767 on the test set. Comment: Initial version of the paper accepted at the CVPR Workshop ChaLearn Looking at People 201
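
The temporal refinement idea can be sketched as follows. The abstract does not give the penalty function, so this is a minimal illustration under stated assumptions: the temporal distribution is modelled as a smoothed per-month histogram, and each visual score is scaled by how typical the image's month is for that event. The function names, the month granularity, and the example events are all hypothetical.

```python
def month_histogram(months, smoothing=1.0):
    """Estimate a smoothed distribution over the 12 months from training time stamps (1-12)."""
    counts = [smoothing] * 12
    for m in months:
        counts[m - 1] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def refine_scores(scores, month, priors):
    """Scale each event score by the relative likelihood of `month` for that event.

    The weight is 1.0 in the event's most typical month and smaller elsewhere,
    so a high visual score is penalized when the time stamp is atypical.
    """
    refined = {}
    for event, score in scores.items():
        prior = priors[event]
        weight = prior[month - 1] / max(prior)
        refined[event] = score * weight
    return refined


# Hypothetical training time stamps (months) for two events.
priors = {
    "carnival": month_histogram([2, 2, 3]),
    "oktoberfest": month_histogram([9, 10, 10]),
}

# Visual scores for one image taken in October (month 10).
visual_scores = {"carnival": 0.7, "oktoberfest": 0.6}
refined = refine_scores(visual_scores, month=10, priors=priors)
```

In this toy case the visually stronger class is demoted because October is an unlikely month for it, while the score of the temporally consistent class is left unchanged.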

    Surrey-cvssp system for DCASE2017 challenge task4

    In this technical report, we present several methods for task 4 of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE2017) challenge. This task evaluates systems for the large-scale detection of sound events using weakly labeled training data. The data are YouTube video excerpts focusing on transportation and warnings due to their industry applications. There are two subtasks: audio tagging and sound event detection from weakly labeled data. A convolutional neural network (CNN) and a gated recurrent unit (GRU) based recurrent neural network (RNN) are adopted as our basic framework. We propose a learnable gating activation function for selecting informative local features. An attention-based scheme is used for localizing the specific events in a weakly supervised mode. A new batch-level balancing strategy is also proposed to tackle the data imbalance problem. Fusion of posteriors from different systems is found to be effective in improving the performance. In summary, we obtain a 61% F-value for the audio tagging subtask and a 0.73 error rate (ER) for the sound event detection subtask on the development set, whereas the official multilayer perceptron (MLP) based baseline obtained only a 13.1% F-value for audio tagging and an ER of 1.02 for sound event detection. Comment: DCASE2017 challenge ranked 1st system, task4, tech repor
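
For readers unfamiliar with the two reported metrics, the sketch below computes an F-value and a segment-based error rate (ER) in the form commonly used for sound event detection, where ER = (S + D + I) / N over substitutions, deletions, insertions, and reference events. It is an illustrative assumption that this matches the challenge's exact evaluation setup; the segment labels are made up.

```python
def segment_metrics(reference, predicted):
    """Compute F-value and error rate from per-segment sets of active event labels."""
    tp = fp = fn = 0
    subs = dels = ins = n_ref = 0
    for ref, pred in zip(reference, predicted):
        seg_fp = len(pred - ref)   # predicted but not in the reference
        seg_fn = len(ref - pred)   # in the reference but missed
        tp += len(ref & pred)
        fp += seg_fp
        fn += seg_fn
        # A miss paired with a false alarm in the same segment is a substitution.
        seg_subs = min(seg_fp, seg_fn)
        subs += seg_subs
        dels += seg_fn - seg_subs
        ins += seg_fp - seg_subs
        n_ref += len(ref)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    error_rate = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f_value, error_rate


# Hypothetical reference and system output for three one-second segments.
reference = [{"car"}, {"siren"}, {"car", "horn"}]
predicted = [{"car"}, {"horn"}, {"car"}]

f_value, error_rate = segment_metrics(reference, predicted)
```

Because insertions are counted against the number of reference events, ER is not bounded by 1, which is why a baseline can score an ER above 1.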