
    Mean-shift background image modelling

    Background modelling is widely used in computer vision for the detection of foreground objects in a frame sequence. The more accurate the background model, the more reliable the detection of foreground objects. In this paper, we present an approach to background modelling based on a mean-shift procedure. The convergence properties of the mean-shift vector enable the system to achieve reliable background modelling. In addition, histogram-based computation and the new concept of local basins of attraction allow us to meet the stringent real-time requirements of video processing. © 2004 IEEE
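
    As a rough illustration of the idea described above, the sketch below estimates a per-pixel background value as the intensity mode found by a mean-shift procedure over that pixel's recent history. It is a minimal sketch only: the bandwidth, the convergence tolerance and the synthetic pixel history are assumptions, not the paper's histogram-based implementation or its local basins of attraction.

        # Minimal mean-shift sketch for per-pixel background estimation (illustrative only).
        import numpy as np

        def mean_shift_mode(samples, bandwidth=10.0, tol=0.5, max_iter=100):
            """Return the intensity mode that the mean-shift iterations converge to."""
            x = np.median(samples)                          # deterministic starting point
            for _ in range(max_iter):
                window = samples[np.abs(samples - x) <= bandwidth]
                if window.size == 0:
                    break
                x_new = window.mean()                       # mean-shift step
                if abs(x_new - x) < tol:                    # convergence check
                    return x_new
                x = x_new
            return x

        # Toy pixel history: mostly background near 60, occasional foreground near 200.
        rng = np.random.default_rng(0)
        history = np.concatenate([rng.normal(60, 5, 90), rng.normal(200, 5, 10)])
        print(f"estimated background intensity: {mean_shift_mode(history):.1f}")

    In this toy run the procedure converges to the dominant (background) mode near 60 and ignores the foreground samples, which is the property the abstract relies on.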

    Robust density modelling using the student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (the hidden Markov model and others), and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show, through experiments on two well-known datasets (Weizmann, MuHAVi), a remarkable improvement in classification accuracy. © 2011 IEEE
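
    To make the robustness argument above concrete, the sketch below fits a Gaussian and a Student's t density to the same one-dimensional feature sample contaminated with a few outliers and compares their average log-likelihoods. The synthetic data and the choice of a single univariate density (rather than the paper's mixtures inside an HMM) are illustrative assumptions.

        # Gaussian vs Student's t density fit on outlier-contaminated features (illustrative only).
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        features = np.concatenate([rng.normal(0.0, 1.0, 95),
                                   np.array([8.0, 9.0, 10.0, -9.0, 12.0])])   # outliers

        mu, sigma = stats.norm.fit(features)          # maximum-likelihood Gaussian fit
        df, loc, scale = stats.t.fit(features)        # maximum-likelihood Student's t fit

        ll_gauss = stats.norm.logpdf(features, mu, sigma).mean()
        ll_t = stats.t.logpdf(features, df, loc, scale).mean()
        print(f"mean log-likelihood  Gaussian: {ll_gauss:.3f}   Student's t: {ll_t:.3f}")

    The heavier tails of the t-distribution typically keep its scale estimate close to the inlier spread, so its average log-likelihood is less degraded by the outliers than the Gaussian's.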

    Structural SVM with Partial Ranking for Activity Segmentation and Classification

    Structural SVM is an extension of the support vector machine for the joint prediction of structured labels from multiple measurements. Following a large-margin principle, the training of structural SVM ensures that the ground-truth labeling of each sample receives a score higher than that of any other labeling. However, no specific score ranking is imposed among the other labelings. In this letter, we extend the standard constraint set of structural SVM with constraints between 'almost-correct' labelings and less desirable ones to obtain a partial-ranking structural SVM (PR-SSVM) approach. Experimental results on action segmentation and classification with two challenging datasets (the TUM Kitchen mocap dataset and the CMU-MMAC video dataset) show that the proposed method achieves better detection and false-alarm rates and higher F1 scores than both the conventional structural SVM and a comparable unstructured predictor. The proposed method also exceeds the state-of-the-art accuracy on these datasets by more than 14 and 31 percentage points, respectively. © 1994-2012 IEEE
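
    A minimal sketch of the kind of constraint set the letter describes is given below: alongside the standard margin-rescaled constraints that favour the ground-truth labeling, extra constraints require 'almost-correct' labelings to out-score less desirable ones. The toy scores and losses, and the fixed partial-ranking margin, are illustrative assumptions, not the PR-SSVM formulation or its training algorithm.

        # Checking standard and partial-ranking margin constraints on toy scores (illustrative only).
        def violated_constraints(scores, losses, gt, almost_correct, margin=0.1):
            """List (should_be_higher, should_be_lower) pairs whose constraint is violated."""
            violations = []
            for y in scores:
                # Standard SSVM constraint: ground truth out-scores y by at least loss(y).
                if y != gt and scores[gt] < scores[y] + losses[y]:
                    violations.append((gt, y))
            for y_ac in almost_correct:
                # Partial-ranking constraint with a fixed illustrative margin.
                for y in scores:
                    if y != gt and y not in almost_correct and scores[y_ac] < scores[y] + margin:
                        violations.append((y_ac, y))
            return violations

        scores = {"y*": 3.0, "y_near": 2.7, "y_far": 2.9}   # y* is the ground-truth labeling
        losses = {"y*": 0.0, "y_near": 0.2, "y_far": 0.8}
        print(violated_constraints(scores, losses, gt="y*", almost_correct=["y_near"]))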

    Histogram-based training initialisation of hidden Markov models for human action recognition

    Human action recognition is often addressed by use of latent-state models such as the hidden Markov model and similar graphical models. As such models require Expectation-Maximisation training, arbitrary choices must be made for training initialisation, with a major impact on the final recognition accuracy. In this paper, we propose a histogram-based deterministic initialisation and compare it with both random and time-based deterministic initialisations. Experiments on a human action dataset show that the proposed method achieves higher accuracy than the other tested methods. © 2010 IEEE
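
    One plausible reading of a histogram-based, deterministic initialisation is sketched below: the centres of the most populated (and mutually separated) histogram bins seed the emission means of a K-state model before Expectation-Maximisation. The bin count, the neighbour suppression and the variance heuristic are assumptions for illustration, not the paper's exact procedure.

        # Deterministic, histogram-based seeding of K emission means (illustrative only).
        import numpy as np

        def histogram_init(features, n_states, n_bins=20):
            counts, edges = np.histogram(features, bins=n_bins)
            centres = 0.5 * (edges[:-1] + edges[1:])
            counts = counts.astype(float)
            means = []
            for _ in range(n_states):
                peak = int(np.argmax(counts))                 # most populated remaining bin
                means.append(centres[peak])
                counts[max(peak - 1, 0): peak + 2] = -1.0     # suppress neighbouring bins
            variances = np.full(n_states, features.var() / n_states)
            return np.sort(np.array(means)), variances

        rng = np.random.default_rng(2)
        feats = np.concatenate([rng.normal(-2, 0.3, 200),
                                rng.normal(1, 0.3, 200),
                                rng.normal(4, 0.3, 200)])
        means, variances = histogram_init(feats, n_states=3)
        print("initial emission means:", np.round(means, 2))

    Because the seeding depends only on the histogram, repeated training runs start from the same point, unlike random initialisation.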

    Human action recognition with MPEG-7 descriptors and architectures

    Modern video surveillance requires addressing high-level concepts such as human actions and activities. In addition, surveillance applications need to be portable over a variety of platforms, from servers to mobile devices. In this paper, we explore the potential of the MPEG-7 standard to provide interfaces, descriptors, and architectures for human action recognition from surveillance cameras. Two novel MPEG-7 descriptors, symbolic and feature-based, are presented alongside two different architectures, server-intensive and client-intensive. The descriptors and architectures are evaluated in the paper by way of a scenario analysis.
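
    For a flavour of what a symbolic, MPEG-7-style annotation could look like when serialised as XML, see the sketch below. The element and attribute names are hypothetical placeholders chosen for illustration; they are not the actual MPEG-7 schema nor the two descriptors proposed in the paper.

        # Hypothetical symbolic action annotation serialised as XML (illustrative only).
        import xml.etree.ElementTree as ET

        segment = ET.Element("VideoSegment", id="cam01_000123")          # placeholder names
        ET.SubElement(segment, "MediaTime", start="00:00:12", duration="PT4S")
        action = ET.SubElement(segment, "ActionLabel", confidence="0.87")
        action.text = "walking"

        print(ET.tostring(segment, encoding="unicode"))

    A symbolic descriptor of this kind carries only labels and timing, whereas a feature-based descriptor would carry numerical features for downstream classification.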

    A pair hidden Markov support vector machine for alignment of human actions

    Alignment of human actions in videos is an important task for applications such as action comparison and classification. While well-established algorithms such as dynamic time warping are available for this task, they still rely heavily on basic linear cost models and heuristic parameter tuning. In this paper we propose a novel framework that combines the flexibility of the pair hidden Markov model (PHMM) with the effective parameter training of the structural support vector machine (SSVM). The framework extends the scoring function of the SSVM to capture the similarity of two input sequences and introduces suitable feature and loss functions. The proposed approach is evaluated against state-of-the-art algorithms such as dynamic time warping (DTW) and canonical time warping (CTW) on pairs of human actions from the Weizmann and Olympic Sports datasets. The experimental results show that the proposed approach achieves an accuracy improvement of more than 7 percentage points over the runner-up on both datasets. © 2016 IEEE
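
    For reference, the dynamic time warping baseline named above can be sketched in a few lines; the version below uses a plain squared-difference cost on one-dimensional sequences and is only an illustration of the baseline, not the proposed PHMM+SSVM framework or its learned parameters.

        # Classic DTW alignment cost between two 1-D sequences (illustrative baseline only).
        import numpy as np

        def dtw_cost(a, b):
            """Accumulated alignment cost with the standard unit step pattern."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = (a[i - 1] - b[j - 1]) ** 2
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        seq_a = np.sin(np.linspace(0, 3, 40))
        seq_b = np.sin(np.linspace(0, 3, 55))      # the same motion at a different speed
        print(f"DTW alignment cost: {dtw_cost(seq_a, seq_b):.4f}")

    The hand-crafted cost illustrates the kind of fixed cost model that the paper replaces with a discriminatively trained scoring function.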

    Where do bright ideas occur in our brain? Meta-analytic evidence from neuroimaging studies of domain-specific creativity

    Many studies have assessed the neural underpinnings of creativity but have failed to find a clear anatomical localization. We aimed to provide evidence for a multi-componential neural system for creativity. We applied a general activation likelihood estimation (ALE) meta-analysis to 45 fMRI studies. Three individual ALE analyses were performed to assess creativity in different cognitive domains (musical, verbal, and visuo-spatial). The general ALE revealed that creativity relies on clusters of activations in the bilateral occipital, parietal, frontal, and temporal lobes. The individual ALE analyses revealed different maximal activations in the different domains. Musical creativity yields activations in the bilateral medial frontal gyrus, in the left cingulate gyrus, middle frontal gyrus, and inferior parietal lobule, and in the right postcentral and fusiform gyri. Verbal creativity yields activations mainly located in the left hemisphere, in the prefrontal cortex, middle and superior temporal gyri, inferior parietal lobule, postcentral and supramarginal gyri, middle occipital gyrus, and insula; the right inferior frontal gyrus and the lingual gyrus were also activated. Visuo-spatial creativity activates the right middle and inferior frontal gyri, the bilateral thalamus, and the left precentral gyrus. This evidence suggests that creativity relies on multi-componential neural networks and that different creativity domains depend on different brain regions.

    Bi-modal emotion recognition from expressive face and body gestures

    Psychological research findings suggest that humans rely on the combined visual channels of face and body more than any other channel when they make judgments about human communicative behavior. However, most existing systems attempting to analyze human nonverbal behavior are mono-modal and focus only on the face. Research aiming to integrate gestures as an expressive means has only recently emerged. Accordingly, this paper presents an approach to automatic visual recognition of expressive face and upper-body gestures from video sequences suitable for use in a vision-based affective multi-modal framework. Face and body movements are captured simultaneously using two separate cameras. For each video sequence, single expressive frames from both face and body are selected manually for analysis and recognition of emotions. Firstly, individual classifiers are trained from the individual modalities. Secondly, we fuse facial expression and affective body gesture information at the feature level and at the decision level. In the experiments performed, emotion classification using the two modalities achieved higher recognition accuracy than classification using the facial or bodily modality alone. © 2006 Elsevier Ltd. All rights reserved
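
    The two fusion strategies mentioned above can be illustrated with a short sketch: feature-level fusion concatenates the face and body feature vectors before a single classifier is applied, while decision-level fusion combines the per-modality class posteriors. The vectors, posteriors and equal weights below are toy values, not features or outputs from the paper's system.

        # Feature-level vs decision-level fusion of two modalities (illustrative only).
        import numpy as np

        face_features = np.array([0.12, 0.55, 0.33])
        body_features = np.array([0.80, 0.10, 0.42, 0.07])

        # Feature-level fusion: one joint vector for a single classifier.
        fused_features = np.concatenate([face_features, body_features])

        # Decision-level fusion: combine per-modality posteriors (equal weights here).
        face_posterior = np.array([0.70, 0.20, 0.10])      # e.g. P(happy), P(sad), P(angry)
        body_posterior = np.array([0.50, 0.35, 0.15])
        fused_posterior = 0.5 * face_posterior + 0.5 * body_posterior

        print("feature-level vector:", fused_features)
        print("decision-level class:", int(np.argmax(fused_posterior)))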

    Minimum-Risk Structured Learning of Video Summarization

    Video summarization is an important multimedia task for applications such as video indexing and retrieval, video surveillance, human-computer interaction and video 'storyboarding'. In this paper, we present a new approach for automatic summarization of video collections that leverages a structured minimum-risk classifier and efficient submodular inference. To test the accuracy of the predicted summaries, we utilize a recently proposed measure (V-JAUNE) that considers both the content and frame order of the original video. Qualitative and quantitative tests over two action video datasets - the ACE and the MSR DailyActivity3D datasets - show that the proposed approach delivers more accurate summaries than the compared minimum-risk and syntactic approaches. © 2017 IEEE
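
    To give a concrete sense of submodular selection in this setting, the sketch below greedily picks K keyframes under a facility-location-style coverage objective, so that every frame is well represented by its most similar selected frame. The toy descriptors, the similarity measure and the greedy routine are assumptions for illustration; they are not the paper's structured minimum-risk classifier or its inference procedure.

        # Greedy keyframe selection under a submodular coverage objective (illustrative only).
        import numpy as np

        def greedy_summary(frames, k):
            sims = frames @ frames.T                      # frame-to-frame cosine similarities
            coverage = np.zeros(len(frames))
            selected = []
            for _ in range(k):
                gains = [np.maximum(coverage, sims[:, j]).sum() - coverage.sum()
                         for j in range(len(frames))]     # marginal gain of adding frame j
                best = int(np.argmax(gains))
                selected.append(best)
                coverage = np.maximum(coverage, sims[:, best])
            return selected

        rng = np.random.default_rng(3)
        frames = rng.normal(size=(60, 16))                # toy frame descriptors
        frames /= np.linalg.norm(frames, axis=1, keepdims=True)
        print("selected keyframes:", greedy_summary(frames, k=5))

    Diminishing marginal gains make the objective submodular, which is what allows a simple greedy pass to run efficiently while keeping an approximation guarantee.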