8,073 research outputs found
A generic framework for video understanding applied to group behavior recognition
This paper presents an approach to detect and track groups of people in
video-surveillance applications, and to automatically recognize their behavior.
This method keeps track of individuals moving together by maintaining spatial
and temporal group coherence. First, people are individually detected and
tracked. Second, their trajectories are analyzed over a temporal window and
clustered using the Mean-Shift algorithm. A coherence value describes how well
a set of people can be described as a group. Furthermore, we propose a formal
event description language. The group-event recognition approach is
successfully validated on four camera views from three datasets: an airport, a
subway, a shopping-center corridor, and an entrance hall.
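The trajectory-clustering step can be sketched as follows: a minimal flat-kernel Mean-Shift over hypothetical per-person features (here, each person's mean position over the temporal window; the paper's actual trajectory features and its coherence value are not specified in the abstract):

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=50):
    """Flat-kernel Mean-Shift: each point climbs to the mean of its
    bandwidth-neighbourhood, then nearby modes are merged into clusters."""
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        for i, m in enumerate(modes):
            neighbours = points[np.linalg.norm(points - m, axis=1) < bandwidth]
            modes[i] = neighbours.mean(axis=0)
    # Merge modes closer than half the bandwidth into shared cluster labels.
    labels = -np.ones(len(points), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

# Hypothetical features: mean (x, y) position per tracked person over the
# temporal window; two pairs walking together, far apart from each other.
feats = np.array([[0.0, 0.0], [0.4, 0.1], [10.0, 10.0], [10.3, 9.8]])
labels = mean_shift(feats, bandwidth=2.0)
```

People whose trajectories converge to the same mode receive the same label, which is the sense in which the method keeps track of "individuals moving together".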
Modeling cognitive load as a self-supervised brain rate with electroencephalography and deep learning
The principal reason for measuring mental workload is to quantify the
cognitive cost of performing tasks to predict human performance. Unfortunately,
a method for assessing mental workload that has general applicability does not
exist yet. This research presents a novel self-supervised method for mental
workload modelling from EEG data employing Deep Learning and a continuous brain
rate, an index of cognitive activation, without requiring human declarative
knowledge. This method is a convolutional recurrent neural network trainable
with spatially preserving spectral topographic head-maps from EEG data to fit
the brain rate variable. Findings demonstrate the capacity of the convolutional
layers to learn meaningful high-level representations from EEG data since
within-subject models had a test Mean Absolute Percentage Error average of 11%.
The addition of a Long Short-Term Memory layer for handling sequences of
high-level representations improved model accuracy, but not significantly.
Findings point to the existence of quasi-stable blocks of learnt
high-level representations of cognitive activation because they can be induced
through convolution and seem not to be dependent on each other over time,
intuitively matching the non-stationary nature of brain responses.
Across-subject models, induced with data from an increasing number of
participants, thus containing more variability, obtained a similar accuracy to
the within-subject models. This highlights the potential generalisability of
the induced high-level representations across people, suggesting the existence
of subject-independent cognitive activation patterns. This research contributes
to the body of knowledge by providing scholars with a novel computational
method for mental workload modelling that aims to be generally applicable and
does not rely on ad hoc human-crafted models, supporting replicability and
falsifiability.
Comment: 18 pages, 12 figures, 1 table
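The evaluation metric the abstract reports, Mean Absolute Percentage Error, can be computed as below; the brain-rate targets and predictions here are invented purely for illustration, not taken from the study:

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-8):
    """Mean Absolute Percentage Error: the average of |true - pred| / |true|,
    expressed as a percentage; eps guards against division by zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

# Hypothetical brain-rate targets and model predictions.
true_rate = [10.0, 12.0, 8.0]
pred_rate = [9.0, 13.2, 8.0]
err = mape(true_rate, pred_rate)  # (10% + 10% + 0%) / 3 = 6.67%
```

A within-subject test MAPE of 11%, as reported, means the model's brain-rate predictions deviated from the true values by 11% on average.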
Defining CARE Properties Through Temporal Input Models
In this paper we show how it is possible to represent the CARE properties
(complementarity, assignment, redundancy, equivalence) by modelling the
temporal relationships among inputs provided through different modalities. For
this purpose we extended GestIT, which provides a declarative and
compositional model for gestures, in order to support other modalities. The
generic models for the CARE properties can be used both for input-model design
and for analysing the relationships between the different modalities included
in an existing input model.
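The four CARE properties can be illustrated with simple temporal predicates over timestamped inputs. This is only an approximation for intuition: the paper expresses the properties as compositional temporal operators in an extended GestIT model, and the `Input` record and window-based checks below are our own assumptions:

```python
from dataclasses import dataclass

@dataclass
class Input:
    modality: str   # e.g. "speech", "gesture"
    value: str      # the information the input carries
    time: float     # arrival time in seconds

def assignment(inp, modality):
    """Assignment: the task accepts this input from one modality only."""
    return inp.modality == modality

def equivalence(inp, modalities):
    """Equivalence: any one of several modalities is sufficient."""
    return inp.modality in modalities

def redundancy(a, b, window):
    """Redundancy: the same information arrives on two different
    modalities close together in time."""
    return (a.modality != b.modality and a.value == b.value
            and abs(a.time - b.time) <= window)

def complementarity(a, b, window):
    """Complementarity: two different modalities each carry part of the
    command and must be fused within the temporal window."""
    return (a.modality != b.modality and a.value != b.value
            and abs(a.time - b.time) <= window)

say = Input("speech", "delete", 0.0)
point = Input("gesture", "delete", 0.3)
ok = redundancy(say, point, window=1.0)
```

Saying "delete" while pointing at the same object within the window is redundant; if the gesture instead carried the target and the speech the action, the same pair would be complementary.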
What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision
We present a novel method for aligning a sequence of instructions to a video
of someone carrying out a task. In particular, we focus on the cooking domain,
where the instructions correspond to the recipe. Our technique relies on an HMM
to align the recipe steps to the (automatically generated) speech transcript.
We then refine this alignment using a state-of-the-art visual food detector,
based on a deep convolutional neural network. We show that our technique
outperforms simpler techniques based on keyword spotting. It also enables
interesting applications, such as automatically illustrating recipes with
keyframes, and searching within a video for events of interest.
Comment: To appear in NAACL 201
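The core of the alignment can be sketched as a monotonic Viterbi-style dynamic program: each transcript segment is assigned to one recipe step, step indices never decrease over time, and the emission score is plain word overlap. This is a simplification of the paper's HMM (which also uses a visual food detector to refine the alignment), and the recipe and transcript below are invented:

```python
def align_steps(steps, segments):
    """Monotonically align transcript segments to recipe steps.
    Returns one step index per segment, non-decreasing over time."""
    def overlap(step, seg):
        return len(set(step.lower().split()) & set(seg.lower().split()))

    n, m = len(steps), len(segments)
    NEG = float("-inf")
    score = [[NEG] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]
    score[0][0] = overlap(steps[0], segments[0])  # force start at step 0
    for t in range(1, m):
        for j in range(n):
            # Either stay on step j, or advance from step j - 1.
            stay = score[t - 1][j]
            adv = score[t - 1][j - 1] if j > 0 else NEG
            prev, best = (j, stay) if stay >= adv else (j - 1, adv)
            score[t][j] = best + overlap(steps[j], segments[t])
            back[t][j] = prev
    # Backtrace from the best final state.
    j = max(range(n), key=lambda k: score[m - 1][k])
    path = [j]
    for t in range(m - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]

steps = ["crack the eggs", "whisk the eggs", "pour into the pan"]
segments = ["first crack two eggs", "now whisk them well",
            "whisk until fluffy", "pour the mix into the pan"]
path = align_steps(steps, segments)  # → [0, 1, 1, 2]
```

The abstract's keyword-spotting baseline corresponds to scoring each segment independently; the DP's monotonicity constraint is what lets the HMM-style model recover from segments with weak or ambiguous word overlap.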