Mean-shift background image modelling
Background modelling is widely used in computer vision for the detection of foreground objects in a frame sequence. The more accurate the background model, the more correct the detection of the foreground objects. In this paper, we present an approach to background modelling based on a mean-shift procedure. The convergence properties of the mean-shift vector enable the system to achieve reliable background modelling. In addition, histogram-based computation and the new concept of local basins of attraction allow us to meet the stringent real-time requirements of video processing. ©2004 IEEE
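As a rough illustration of the mean-shift idea only (not the paper's histogram-based procedure or its basins of attraction), the sketch below finds the mode of a single pixel's intensity history with a flat kernel. The function name `mean_shift_mode`, the bandwidth of 10 grey levels, and the sample history are all hypothetical.

```python
def mean_shift_mode(samples, start, bandwidth=10.0, tol=1e-3, max_iter=100):
    """Follow flat-kernel mean-shift steps from `start` until the shift is below `tol`."""
    x = float(start)
    for _ in range(max_iter):
        # Samples that fall inside the current window around x.
        window = [s for s in samples if abs(s - x) <= bandwidth]
        if not window:
            break
        new_x = sum(window) / len(window)  # window mean = x + mean-shift vector
        converged = abs(new_x - x) < tol
        x = new_x
        if converged:
            break
    return x

# Hypothetical intensity history of one pixel: mostly background (around 100),
# with two frames where a brighter foreground object passed (around 200).
history = [100, 102, 99, 101, 200, 98, 100, 205, 101, 99]
background_mode = mean_shift_mode(history, start=history[0])
print(background_mode)
```

Starting the procedure from a foreground sample instead converges to the smaller foreground cluster, which is why a real system would also need a rule for picking the dominant mode.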
Robust density modelling using the student's t-distribution for human action recognition
The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (the hidden Markov model and others), and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show that experiments on two well-known datasets (Weizmann, MuHAVi) yield a remarkable improvement in classification accuracy. © 2011 IEEE
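The robustness argument can be seen by comparing log-densities at an outlier. The sketch below uses the standard univariate Gaussian and Student's t log-densities; the parameter values (location 0, scale 1, 3 degrees of freedom, outlier at 8) are illustrative assumptions, and the code is not the paper's mixture-of-t HMM.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a univariate Gaussian with mean mu and std sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def student_t_logpdf(x, mu, sigma, nu):
    """Log-density of a location-scale Student's t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

# Hypothetical outlier eight scale units from the centre.
outlier = 8.0
print(gaussian_logpdf(outlier, 0.0, 1.0))        # quadratic penalty in z
print(student_t_logpdf(outlier, 0.0, 1.0, 3.0))  # logarithmic penalty in z
```

Because the t log-density falls off only logarithmically in the squared distance, a single outlier pulls the estimated parameters far less than it does under a Gaussian.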
Structural SVM with Partial Ranking for Activity Segmentation and Classification
© 1994-2012 IEEE. Structural SVM is an extension of the support vector machine for the joint prediction of structured labels from multiple measurements. Following a large-margin principle, the training of structural SVM ensures that the ground-truth labeling of each sample receives a score higher than that of any other labeling. However, no specific score ranking is imposed among the other labelings. In this letter, we extend the standard constraint set of structural SVM with constraints between 'almost-correct' labelings and less desirable ones to obtain a partial-ranking structural SVM (PR-SSVM) approach. Experimental results on action segmentation and classification with two challenging datasets (the TUM Kitchen mocap dataset and the CMU-MMAC video dataset) show that the proposed method achieves better detection and false alarm rates and higher F1 scores than both the conventional structural SVM and a comparable unstructured predictor. The proposed method also exceeds the accuracy of the state of the art on these datasets by more than 14 and 31 percentage points, respectively.
Histogram-based training initialisation of hidden Markov models for human action recognition
Human action recognition is often addressed by use of latent-state models such as the hidden Markov model and similar graphical models. As such models require Expectation-Maximisation training, arbitrary choices must be made for training initialisation, with major impact on the final recognition accuracy. In this paper, we propose a histogram-based deterministic initialisation and compare it with both random and time-based deterministic initialisations. Experiments on a human action dataset show that the proposed method achieves higher accuracy than the other tested methods. © 2010 IEEE
Minimum-Risk Structured Learning of Video Summarization
© 2017 IEEE. Video summarization is an important multimedia task for applications such as video indexing and retrieval, video surveillance, human-computer interaction and video 'storyboarding'. In this paper, we present a new approach for automatic summarization of video collections that leverages a structured minimum-risk classifier and efficient submodular inference. To test the accuracy of the predicted summaries we utilize a recently-proposed measure (V-JAUNE) that considers both the content and frame order of the original video. Qualitative and quantitative tests over two action video datasets - the ACE and the MSR DailyActivity3D datasets - show that the proposed approach delivers more accurate summaries than the compared minimum-risk and syntactic approaches
Where do bright ideas occur in our brain? Meta-analytic evidence from neuroimaging studies of domain-specific creativity
Many studies have assessed the neural underpinnings of creativity, failing to find a clear anatomical localization. We aimed to provide evidence for a multi-componential neural system for creativity. We applied a general activation likelihood estimation (ALE) meta-analysis to 45 fMRI studies. Three individual ALE analyses were performed to assess creativity in different cognitive domains (Musical, Verbal, and Visuo-spatial). The general ALE revealed that creativity relies on clusters of activations in the bilateral occipital, parietal, frontal, and temporal lobes. The individual ALE revealed different maximal activation in different domains. Musical creativity yields activations in the bilateral medial frontal gyrus, in the left cingulate gyrus, middle frontal gyrus, and inferior parietal lobule and in the right postcentral and fusiform gyri. Verbal creativity yields activations mainly located in the left hemisphere, in the prefrontal cortex, middle and superior temporal gyri, inferior parietal lobule, postcentral and supramarginal gyri, middle occipital gyrus, and insula. The right inferior frontal gyrus and the lingual gyrus were also activated. Visuo-spatial creativity activates the right middle and inferior frontal gyri, the bilateral thalamus and the left precentral gyrus. This evidence suggests that creativity relies on multi-componential neural networks and that different creativity domains depend on different brain regions
Bi-modal emotion recognition from expressive face and body gestures
Psychological research findings suggest that humans rely on the combined visual channels of face and body more than any other channel when they make judgments about human communicative behavior. However, most existing systems that attempt to analyze human nonverbal behavior are mono-modal and focus only on the face. Research that aims to integrate gestures as a means of expression has only recently emerged. Accordingly, this paper presents an approach to automatic visual recognition of expressive face and upper-body gestures from video sequences, suitable for use in a vision-based affective multi-modal framework. Face and body movements are captured simultaneously using two separate cameras. For each video sequence, single expressive frames from both face and body are selected manually for analysis and recognition of emotions. Firstly, individual classifiers are trained from the individual modalities. Secondly, we fuse facial expression and affective body gesture information at the feature level and at the decision level. In the experiments performed, emotion classification using the two modalities achieved better recognition accuracy than classification using the facial or bodily modality alone. © 2006 Elsevier Ltd. All rights reserved
Human action recognition with MPEG-7 descriptors and architectures
Modern video surveillance requires addressing high-level concepts such as humans' actions and activities. In addition, surveillance applications need to be portable over a variety of platforms, from servers to mobile devices. In this paper, we explore the potential of the MPEG-7 standard to provide interfaces, descriptors, and architectures for human action recognition from surveillance cameras. Two novel MPEG-7 descriptors, symbolic and feature-based, are presented alongside two different architectures, server-intensive and client-intensive. The descriptors and architectures are evaluated in the paper by way of a scenario analysis
A pair hidden Markov support vector machine for alignment of human actions
© 2016 IEEE. Alignment of human actions in videos is an important task for applications such as action comparison and classification. While well-established algorithms such as dynamic time warping are available for this task, they still heavily rely on basic linear cost models and heuristic parameter tuning. In this paper we propose a novel framework that combines the flexibility of the pair hidden Markov model (PHMM) with the effective parameter training of the structural support vector machine (SSVM). The framework extends the scoring function of SSVM to capture the similarity of two input sequences and introduces suitable feature and loss functions. The proposed approach is evaluated against state-of-the-art algorithms such as dynamic time warping (DTW) and canonical time warping (CTW) on pairs of human actions from the Weizmann and Olympic Sports datasets. The experimental results show that the proposed approach is capable of achieving an accuracy improvement of over 7 percentage points over the runner-up on both datasets
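For context, the dynamic time warping baseline named in this abstract can be sketched as follows. This is the textbook DTW recurrence with an absolute-difference cost on 1-D sequences, not the proposed PHMM-SSVM framework; the function name `dtw_distance` and the toy sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Cumulative cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # fixed, hand-chosen local cost
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] also matched earlier element of b
                                 D[i][j - 1],      # b[j-1] also matched earlier element of a
                                 D[i - 1][j - 1])  # one-to-one match
    return D[n][m]

# The same hypothetical motion performed at two different speeds.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The local cost here is fixed by hand, which is the kind of basic cost model and heuristic tuning that the abstract contrasts with learned parameters.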
Hidden Markov models with kernel density estimation of emission probabilities and their use in activity recognition
In this paper, we present a modified hidden Markov model with emission probabilities modelled by kernel density estimation and its use for activity recognition in videos. In the proposed approach, kernel density estimation of the emission probabilities is performed simultaneously with the estimation of all the other model parameters by an adapted Baum-Welch algorithm. This allows us to retain maximum-likelihood estimation while overcoming the known limitations of mixtures of Gaussians in modelling certain probability distributions. Experiments on activity recognition have been performed on groundtruthed data from the CAVIAR video surveillance database and are reported in the paper. The error on the training and validation sets with kernel density estimation remains around 14-16%, while for the conventional Gaussian mixture approach it varies between 15 and 24%, strongly depending on the initial values chosen for the parameters. Overall, kernel density estimation proves capable of providing more flexible modelling of the emission probabilities and, unlike Gaussian mixtures, does not suffer from being highly parametric and difficult to initialise. © 2007 IEEE
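A minimal sketch of a kernel-density emission probability for one HMM state is given below. It assumes a 1-D Gaussian kernel and per-sample weights that would play the role of state-occupancy posteriors during training; the abstract does not spell out the adapted Baum-Welch update, so the weighting scheme, the name `kde_emission`, and the bandwidth are assumptions for illustration.

```python
import math

def kde_emission(x, samples, weights, bandwidth):
    """Weighted Gaussian kernel density estimate of p(x | state).

    `weights` are assumed (hypothetically) to be the posterior probabilities
    that each training sample was emitted by this state.
    """
    total = sum(weights)
    norm = bandwidth * math.sqrt(2 * math.pi)
    acc = 0.0
    for s, w in zip(samples, weights):
        z = (x - s) / bandwidth
        acc += w * math.exp(-0.5 * z * z)
    return acc / (total * norm)

# Hypothetical bimodal observations for one state: no mixture count or
# initial means need to be chosen, only a kernel bandwidth.
samples = [0.0, 0.0, 10.0, 10.0]
weights = [1.0, 1.0, 1.0, 1.0]
print(kde_emission(0.0, samples, weights, bandwidth=1.0))
print(kde_emission(5.0, samples, weights, bandwidth=1.0))
```

The estimate follows both clusters without specifying a number of components, which reflects the flexibility the abstract attributes to kernel density estimation.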