17,177 research outputs found
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (eg. onset and offset phase for "smile", running and jumping for
"highjump"). The proposed model is inspired by the recent works on Multiple
Instance Learning and latent SVM/HCRF -- it extends such frameworks to model
the ordinal aspect in the videos, approximately. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video based facial analysis datasets for prediction of
expression, clinical pain and intent in dyadic conversations and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text
overlap with arXiv:1604.0150
Complexity-Aware Assignment of Latent Values in Discriminative Models for Accurate Gesture Recognition
Many of the state-of-the-art algorithms for gesture recognition are based on
Conditional Random Fields (CRFs). Successful approaches, such as the
Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose
values are mapped to the values of the labels. In this paper we propose a novel
methodology to set the latent values according to the gesture complexity. We
use an heuristic that iterates through the samples associated with each label
value, stimating their complexity. We then use it to assign the latent values
to the label values. We evaluate our method on the task of recognizing human
gestures from video streams. The experiments were performed in binary datasets,
generated by grouping different labels. Our results demonstrate that our
approach outperforms the arbitrary one in many cases, increasing the accuracy
by up to 10%.Comment: Conference paper published at 2016 29th SIBGRAPI, Conference on
Graphics, Patterns and Images (SIBGRAPI). 8 pages, 7 figure
An Expressive Deep Model for Human Action Parsing from A Single Image
This paper aims at one newly raising task in vision and multimedia research:
recognizing human actions from still images. Its main challenges lie in the
large variations in human poses and appearances, as well as the lack of
temporal motion information. Addressing these problems, we propose to develop
an expressive deep model to naturally integrate human layout and surrounding
contexts for higher level action understanding from still images. In
particular, a Deep Belief Net is trained to fuse information from different
noisy sources such as body part detection and object detection. To bridge the
semantic gap, we used manually labeled data to greatly improve the
effectiveness and efficiency of the pre-training and fine-tuning stages of the
DBN training. The resulting framework is shown to be robust to sometimes
unreliable inputs (e.g., imprecise detections of human parts and objects), and
outperforms the state-of-the-art approaches.Comment: 6 pages, 8 figures, ICME 201
A hierarchy of recurrent networks for speech recognition
Generative models for sequential data based on directed graphs of Restricted Boltzmann Machines (RBMs) are able to accurately model high dimensional sequences as recently shown. In these models, temporal dependencies in the input are discovered by either buffering previous visible variables or by recurrent connections of the hidden variables. Here we propose a modification of these models, the Temporal Reservoir Machine (TRM). It utilizes a recurrent artificial neural network (ANN) for integrating information from the input over
time. This information is then fed into a RBM at each time step. To avoid difficulties of recurrent network learning, the ANN remains untrained and hence can be thought of as a random feature extractor. Using the architecture of multi-layer RBMs (Deep Belief Networks), the TRMs can be used as a building block for complex hierarchical models. This approach unifies RBM-based approaches for sequential data modeling and the Echo State Network, a powerful approach for black-box system identification. The TRM is tested on a spoken digits task under noisy conditions, and competitive performances compared to previous models are observed
- …