17,177 research outputs found

    Discriminatively Trained Latent Ordinal Model for Video Classification

    Full text link
    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (eg. onset and offset phase for "smile", running and jumping for "highjump"). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF -- it extends such frameworks to model the ordinal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150

    Complexity-Aware Assignment of Latent Values in Discriminative Models for Accurate Gesture Recognition

    Full text link
    Many of the state-of-the-art algorithms for gesture recognition are based on Conditional Random Fields (CRFs). Successful approaches, such as the Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose values are mapped to the values of the labels. In this paper we propose a novel methodology to set the latent values according to the gesture complexity. We use an heuristic that iterates through the samples associated with each label value, stimating their complexity. We then use it to assign the latent values to the label values. We evaluate our method on the task of recognizing human gestures from video streams. The experiments were performed in binary datasets, generated by grouping different labels. Our results demonstrate that our approach outperforms the arbitrary one in many cases, increasing the accuracy by up to 10%.Comment: Conference paper published at 2016 29th SIBGRAPI, Conference on Graphics, Patterns and Images (SIBGRAPI). 8 pages, 7 figure

    An Expressive Deep Model for Human Action Parsing from A Single Image

    Full text link
    This paper aims at one newly raising task in vision and multimedia research: recognizing human actions from still images. Its main challenges lie in the large variations in human poses and appearances, as well as the lack of temporal motion information. Addressing these problems, we propose to develop an expressive deep model to naturally integrate human layout and surrounding contexts for higher level action understanding from still images. In particular, a Deep Belief Net is trained to fuse information from different noisy sources such as body part detection and object detection. To bridge the semantic gap, we used manually labeled data to greatly improve the effectiveness and efficiency of the pre-training and fine-tuning stages of the DBN training. The resulting framework is shown to be robust to sometimes unreliable inputs (e.g., imprecise detections of human parts and objects), and outperforms the state-of-the-art approaches.Comment: 6 pages, 8 figures, ICME 201

    A hierarchy of recurrent networks for speech recognition

    Get PDF
    Generative models for sequential data based on directed graphs of Restricted Boltzmann Machines (RBMs) are able to accurately model high dimensional sequences as recently shown. In these models, temporal dependencies in the input are discovered by either buffering previous visible variables or by recurrent connections of the hidden variables. Here we propose a modification of these models, the Temporal Reservoir Machine (TRM). It utilizes a recurrent artificial neural network (ANN) for integrating information from the input over time. This information is then fed into a RBM at each time step. To avoid difficulties of recurrent network learning, the ANN remains untrained and hence can be thought of as a random feature extractor. Using the architecture of multi-layer RBMs (Deep Belief Networks), the TRMs can be used as a building block for complex hierarchical models. This approach unifies RBM-based approaches for sequential data modeling and the Echo State Network, a powerful approach for black-box system identification. The TRM is tested on a spoken digits task under noisy conditions, and competitive performances compared to previous models are observed
    • …
    corecore