8,073 research outputs found

    A generic framework for video understanding applied to group behavior recognition

    This paper presents an approach to detect and track groups of people in video-surveillance applications, and to automatically recognize their behavior. The method keeps track of individuals moving together by maintaining spatial and temporal group coherence. First, people are individually detected and tracked. Second, their trajectories are analyzed over a temporal window and clustered using the Mean-Shift algorithm. A coherence value describes how well a set of people can be described as a group. Furthermore, we propose a formal event description language. The group event recognition approach is successfully validated on 4 camera views from 3 datasets: an airport, a subway, a shopping center corridor and an entrance hall.
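    To make the clustering step concrete, here is a minimal sketch of grouping tracked trajectories with Mean-Shift and scoring group coherence. It assumes scikit-learn, and summarizes each trajectory over the temporal window by its mean position and velocity; the feature choice, function names, and the coherence formula are illustrative assumptions, not details taken from the paper.

    # Hypothetical sketch: clustering trajectory windows into groups with
    # Mean-Shift; people in the same cluster form a candidate group.
    import numpy as np
    from sklearn.cluster import MeanShift

    def group_trajectories(tracks, bandwidth=2.0):
        """tracks: list of (T, 2) arrays of (x, y) positions over a window."""
        feats = []
        for tr in tracks:
            pos = tr.mean(axis=0)                   # average position
            vel = np.diff(tr, axis=0).mean(axis=0)  # average velocity
            feats.append(np.concatenate([pos, vel]))
        return MeanShift(bandwidth=bandwidth).fit_predict(np.array(feats))

    def coherence(tracks, labels, gid):
        """Toy coherence value: inverse mean pairwise distance in a group."""
        members = [t for t, l in zip(tracks, labels) if l == gid]
        if len(members) < 2:
            return 0.0
        centers = np.array([t.mean(axis=0) for t in members])
        d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
        return 1.0 / (1e-6 + d[np.triu_indices(len(members), 1)].mean())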

    Activity Recognition Using Probabilistic Timed Automata


    Event Detection in Videos


    Modeling cognitive load as a self-supervised brain rate with electroencephalography and deep learning

    The principal reason for measuring mental workload is to quantify the cognitive cost of performing tasks in order to predict human performance. Unfortunately, a method for assessing mental workload with general applicability does not yet exist. This research presents a novel self-supervised method for mental workload modelling from EEG data that employs deep learning and a continuous brain rate, an index of cognitive activation, without requiring human declarative knowledge. The method is a convolutional recurrent neural network trainable with spatially preserving spectral topographic head-maps from EEG data to fit the brain rate variable. Findings demonstrate the capacity of the convolutional layers to learn meaningful high-level representations from EEG data, since within-subject models had an average test Mean Absolute Percentage Error of 11%. The addition of a Long Short-Term Memory layer for handling sequences of high-level representations did not yield a significant gain, although it did improve accuracy. Findings point to the existence of quasi-stable blocks of learnt high-level representations of cognitive activation: they can be induced through convolution and seem not to depend on each other over time, intuitively matching the non-stationary nature of brain responses. Across-subject models, induced with data from an increasing number of participants and thus containing more variability, obtained accuracy similar to the within-subject models. This highlights the potential generalisability of the induced high-level representations across people, suggesting the existence of subject-independent cognitive activation patterns. This research contributes to the body of knowledge by providing scholars with a novel computational method for mental workload modelling that aims to be generally applicable and does not rely on ad hoc human-crafted models, supporting replicability and falsifiability.
    Comment: 18 pages, 12 figures, 1 table
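    A minimal PyTorch sketch of a convolutional recurrent regressor over sequences of spectral topographic head-maps may help picture the architecture: a per-frame convolutional encoder, an LSTM over the resulting sequence of representations, and a linear head fitting the continuous brain rate. All shapes, band counts, and hyperparameters are assumptions for illustration, not the paper's configuration.

    import torch
    import torch.nn as nn

    class ConvRecurrentBrainRate(nn.Module):
        def __init__(self, bands=5, hidden=64):
            super().__init__()
            # per-frame encoder: each head-map assumed (bands, 32, 32)
            self.enc = nn.Sequential(
                nn.Conv2d(bands, 16, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # -> 32*4*4 = 512
            )
            self.rnn = nn.LSTM(512, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)            # continuous brain rate

        def forward(self, x):              # x: (batch, seq, bands, 32, 32)
            b, s = x.shape[:2]
            z = self.enc(x.flatten(0, 1)).view(b, s, -1)  # encode each frame
            out, _ = self.rnn(z)
            return self.head(out[:, -1]).squeeze(-1)      # last step -> scalar

    # MAPE, the error metric reported in the abstract
    def mape(pred, target):
        return (100.0 * (pred - target).abs() / target.abs().clamp_min(1e-6)).mean()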

    Defining CARE Properties Through Temporal Input Models

    In this paper we show how the CARE properties (complementarity, assignment, redundancy, equivalence) can be represented by modelling the temporal relationships among inputs provided through different modalities. For this purpose we extended GestIT, which provides a declarative and compositional model for gestures, to support other modalities. The generic models for the CARE properties can be used for input model design, but also for analysing the relationships between the different modalities included in an existing input model.
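    As a rough illustration of the four properties as temporal combinators over timestamped input events, consider the following sketch. This is not the GestIT API; the Event type, function names, and time-window semantics are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class Event:
        modality: str   # e.g. "speech", "gesture"
        value: str
        t: float        # timestamp in seconds

    def assignment(events, modality):
        """Assignment: the task accepts input from one fixed modality only."""
        return [e for e in events if e.modality == modality]

    def equivalence(events, modalities):
        """Equivalence: any one of the listed modalities alone suffices."""
        return [e for e in events if e.modality in modalities]

    def redundancy(events, m1, m2, window=1.0):
        """Redundancy: both modalities convey the same value close in time."""
        return [(a, b) for a in events if a.modality == m1
                       for b in events if b.modality == m2
                       if a.value == b.value and abs(a.t - b.t) <= window]

    def complementarity(events, m1, m2, window=1.0):
        """Complementarity: values from both modalities merge into one command."""
        return [(a.value, b.value) for a in events if a.modality == m1
                                   for b in events if b.modality == m2
                                   if abs(a.t - b.t) <= window]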

    What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision

    We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. In particular, we focus on the cooking domain, where the instructions correspond to the recipe. Our technique relies on an HMM to align the recipe steps to the (automatically generated) speech transcript. We then refine this alignment using a state-of-the-art visual food detector based on a deep convolutional neural network. We show that our technique outperforms simpler techniques based on keyword spotting. It also enables interesting applications, such as automatically illustrating recipes with keyframes, and searching within a video for events of interest.
    Comment: To appear in NAACL 2015
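    To illustrate the alignment idea, here is a minimal Viterbi sketch over a monotonic HMM whose states are recipe steps and whose observations are transcript segments. The word-overlap emission score is a stand-in for the paper's model, and the transition probabilities and function names are assumptions made for this example.

    import numpy as np

    def emission(step, segment):
        """Log-score a transcript segment against a recipe step by word overlap."""
        s, g = set(step.lower().split()), set(segment.lower().split())
        return np.log(1e-6 + len(s & g) / max(1, len(s)))

    def align(steps, segments, stay=np.log(0.6), advance=np.log(0.4)):
        """Viterbi decoding; the path may only stay on a step or move forward."""
        S, T = len(steps), len(segments)
        dp = np.full((T, S), -np.inf)
        bp = np.zeros((T, S), dtype=int)
        dp[0, 0] = emission(steps[0], segments[0])  # alignment starts at step 0
        for t in range(1, T):
            for j in range(S):
                cands = [(dp[t-1, j] + stay, j)]
                if j > 0:
                    cands.append((dp[t-1, j-1] + advance, j-1))
                best, bp[t, j] = max(cands)
                dp[t, j] = best + emission(steps[j], segments[t])
        path = [int(np.argmax(dp[-1]))]             # best final state
        for t in range(T - 1, 0, -1):
            path.append(bp[t, path[-1]])            # follow back-pointers
        return path[::-1]                           # step index per segment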