8,073 research outputs found
A generic framework for video understanding applied to group behavior recognition
This paper presents an approach to detect and track groups of people in
video-surveillance applications, and to automatically recognize their behavior.
This method keeps track of individuals moving together by maintaining spatial
and temporal group coherence. First, people are individually detected and
tracked. Second, their trajectories are analyzed over a temporal window and
clustered using the Mean-Shift algorithm. A coherence value describes how well
a set of people can be described as a group. Furthermore, we propose a formal
event description language. The group-event recognition approach is
successfully validated on four camera views from three datasets: an airport, a
subway, a shopping-center corridor, and an entrance hall.
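The trajectory-clustering step can be sketched as follows: a minimal flat-kernel Mean-Shift over hypothetical per-person features (here, each person's mean position over the temporal window; the paper's actual trajectory features and its coherence value are not specified in the abstract):

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=50):
    """Flat-kernel Mean-Shift: each point climbs to the mean of its
    bandwidth-neighbourhood, then nearby modes are merged into clusters."""
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        for i, m in enumerate(modes):
            neighbours = points[np.linalg.norm(points - m, axis=1) < bandwidth]
            modes[i] = neighbours.mean(axis=0)
    # Merge modes closer than half the bandwidth into shared cluster labels.
    labels = -np.ones(len(points), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

# Hypothetical features: mean (x, y) position per tracked person over the
# temporal window; two pairs walking together, far apart from each other.
feats = np.array([[0.0, 0.0], [0.4, 0.1], [10.0, 10.0], [10.3, 9.8]])
labels = mean_shift(feats, bandwidth=2.0)
```

People whose trajectories converge to the same mode receive the same label, which is the sense in which the method keeps track of "individuals moving together".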
Modeling cognitive load as a self-supervised brain rate with electroencephalography and deep learning
The principal reason for measuring mental workload is to quantify the
cognitive cost of performing tasks to predict human performance. Unfortunately,
a method for assessing mental workload that has general applicability does not
exist yet. This research presents a novel self-supervised method for mental
workload modelling from EEG data employing Deep Learning and a continuous brain
rate, an index of cognitive activation, without requiring human declarative
knowledge. This method is a convolutional recurrent neural network trainable
with spatially preserving spectral topographic head-maps from EEG data to fit
the brain rate variable. Findings demonstrate the capacity of the convolutional
layers to learn meaningful high-level representations from EEG data since
within-subject models had a test Mean Absolute Percentage Error average of 11%.
The addition of a Long Short-Term Memory layer for handling sequences of
high-level representations improved model accuracy, but not significantly.
Findings point to the existence of quasi-stable blocks of learnt
high-level representations of cognitive activation because they can be induced
through convolution and seem not to be dependent on each other over time,
intuitively matching the non-stationary nature of brain responses.
Across-subject models, induced with data from an increasing number of
participants, thus containing more variability, obtained a similar accuracy to
the within-subject models. This highlights the potential generalisability of
the induced high-level representations across people, suggesting the existence
of subject-independent cognitive activation patterns. This research contributes
to the body of knowledge by providing scholars with a novel computational
method for mental workload modelling that aims to be generally applicable and
does not rely on ad hoc human-crafted models, supporting replicability and
falsifiability.
Comment: 18 pages, 12 figures, 1 table
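The evaluation metric the abstract reports, Mean Absolute Percentage Error, can be computed as below; the brain-rate targets and predictions here are invented purely for illustration, not taken from the study:

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-8):
    """Mean Absolute Percentage Error: the average of |true - pred| / |true|,
    expressed as a percentage; eps guards against division by zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

# Hypothetical brain-rate targets and model predictions.
true_rate = [10.0, 12.0, 8.0]
pred_rate = [9.0, 13.2, 8.0]
err = mape(true_rate, pred_rate)  # (10% + 10% + 0%) / 3 = 6.67%
```

A within-subject test MAPE of 11%, as reported, means the model's brain-rate predictions deviated from the true values by 11% on average.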
Defining CARE Properties Through Temporal Input Models
In this paper we show how it is possible to represent the CARE properties
(complementarity, assignment, redundancy, equivalence) by modelling the
temporal relationships among inputs provided through different modalities. For
this purpose we extended GestIT, which provides a declarative and
compositional model for gestures, in order to support other modalities. The
generic models for the CARE properties can be used both for input-model design
and for analysing the relationships between the different modalities included
in an existing input model.
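The four CARE properties can be illustrated with simple temporal predicates over timestamped inputs. This is only an approximation for intuition: the paper expresses the properties as compositional temporal operators in an extended GestIT model, and the `Input` record and window-based checks below are our own assumptions:

```python
from dataclasses import dataclass

@dataclass
class Input:
    modality: str   # e.g. "speech", "gesture"
    value: str      # the information the input carries
    time: float     # arrival time in seconds

def assignment(inp, modality):
    """Assignment: the task accepts this input from one modality only."""
    return inp.modality == modality

def equivalence(inp, modalities):
    """Equivalence: any one of several modalities is sufficient."""
    return inp.modality in modalities

def redundancy(a, b, window):
    """Redundancy: the same information arrives on two different
    modalities close together in time."""
    return (a.modality != b.modality and a.value == b.value
            and abs(a.time - b.time) <= window)

def complementarity(a, b, window):
    """Complementarity: two different modalities each carry part of the
    command and must be fused within the temporal window."""
    return (a.modality != b.modality and a.value != b.value
            and abs(a.time - b.time) <= window)

say = Input("speech", "delete", 0.0)
point = Input("gesture", "delete", 0.3)
ok = redundancy(say, point, window=1.0)
```

Saying "delete" while pointing at the same object within the window is redundant; if the gesture instead carried the target and the speech the action, the same pair would be complementary.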
What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision
We present a novel method for aligning a sequence of instructions to a video
of someone carrying out a task. In particular, we focus on the cooking domain,
where the instructions correspond to the recipe. Our technique relies on an HMM
to align the recipe steps to the (automatically generated) speech transcript.
We then refine this alignment using a state-of-the-art visual food detector,
based on a deep convolutional neural network. We show that our technique
outperforms simpler techniques based on keyword spotting. It also enables
interesting applications, such as automatically illustrating recipes with
keyframes, and searching within a video for events of interest.
Comment: To appear in NAACL 201
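The core of the alignment can be sketched as a monotonic Viterbi-style dynamic program: each transcript segment is assigned to one recipe step, step indices never decrease over time, and the emission score is plain word overlap. This is a simplification of the paper's HMM (which also uses a visual food detector to refine the alignment), and the recipe and transcript below are invented:

```python
def align_steps(steps, segments):
    """Monotonically align transcript segments to recipe steps.
    Returns one step index per segment, non-decreasing over time."""
    def overlap(step, seg):
        return len(set(step.lower().split()) & set(seg.lower().split()))

    n, m = len(steps), len(segments)
    NEG = float("-inf")
    score = [[NEG] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]
    score[0][0] = overlap(steps[0], segments[0])  # force start at step 0
    for t in range(1, m):
        for j in range(n):
            # Either stay on step j, or advance from step j - 1.
            stay = score[t - 1][j]
            adv = score[t - 1][j - 1] if j > 0 else NEG
            prev, best = (j, stay) if stay >= adv else (j - 1, adv)
            score[t][j] = best + overlap(steps[j], segments[t])
            back[t][j] = prev
    # Backtrace from the best final state.
    j = max(range(n), key=lambda k: score[m - 1][k])
    path = [j]
    for t in range(m - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]

steps = ["crack the eggs", "whisk the eggs", "pour into the pan"]
segments = ["first crack two eggs", "now whisk them well",
            "whisk until fluffy", "pour the mix into the pan"]
path = align_steps(steps, segments)  # → [0, 1, 1, 2]
```

The abstract's keyword-spotting baseline corresponds to scoring each segment independently; the DP's monotonicity constraint is what lets the HMM-style model recover from segments with weak or ambiguous word overlap.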