LOMo: Latent Ordinal Model for Facial Analysis in Videos
We study the problem of facial analysis in videos. We propose a novel weakly
supervised learning method that models the video event (expression, pain, etc.)
as a sequence of automatically mined, discriminative sub-events (e.g., onset and
offset phase for smile, brow lower and cheek raise for pain). The proposed
model is inspired by the recent works on Multiple Instance Learning and latent
SVM/HCRF; it extends such frameworks to model the ordinal or temporal aspect in
the videos, approximately. We obtain consistent improvements over relevant
competitive baselines on four challenging and publicly available video-based
facial analysis datasets for prediction of expression, clinical pain and intent
in dyadic conversations. In combination with complementary features, we report
state-of-the-art results on these datasets.
Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Topic supervised non-negative matrix factorization
Topic models have been extensively used to organize and interpret the
contents of large, unstructured corpora of text documents. Although topic
models often perform well on traditional training vs. test set evaluations, it
is often the case that the results of a topic model do not align with human
interpretation. This interpretability fallacy is largely due to the
unsupervised nature of topic models, which prohibits any user guidance on the
results of a model. In this paper, we introduce a semi-supervised method called
topic supervised non-negative matrix factorization (TS-NMF) that enables the
user to provide labeled example documents to promote the discovery of more
meaningful semantic structure of a corpus. In this way, the results of TS-NMF
better match the intuition and desired labeling of the user. The core of TS-NMF
relies on solving a non-convex optimization problem for which we derive an
iterative algorithm that is shown to be monotonic and convergent to a local
optimum. We demonstrate the practical utility of TS-NMF on the Reuters and
PubMed corpora, and find that TS-NMF is especially useful for conceptual or
broad topics, where topic key terms are not well understood. Although
identifying an optimal latent structure for the data is not a primary objective
of the proposed approach, we find that TS-NMF achieves higher weighted Jaccard
similarity scores than the contemporary methods, (unsupervised) NMF and latent
Dirichlet allocation, at supervision rates as low as 10% to 20%.
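The weighted Jaccard similarity used as the evaluation score above can be illustrated with a minimal sketch. It uses the standard definition for non-negative vectors (sum of element-wise minima divided by sum of element-wise maxima). The function name `weighted_jaccard` and the example vectors are assumptions for illustration, and how the paper builds the vectors being compared is not specified in the abstract.

```python
def weighted_jaccard(u, v):
    """Weighted Jaccard similarity of two equal-length, non-negative vectors.

    Defined as sum(min(u_i, v_i)) / sum(max(u_i, v_i)).
    Returns 1.0 when both vectors are all zeros (a convention chosen here).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(min(a, b) for a, b in zip(u, v))
    denominator = sum(max(a, b) for a, b in zip(u, v))
    return 1.0 if denominator == 0 else numerator / denominator


# Hypothetical topic-weight vectors for one document:
# a user-provided labeling versus a model's estimate.
labeled = [1, 0, 2]
estimated = [1, 1, 0]
score = weighted_jaccard(labeled, estimated)
```

For binary vectors this reduces to the ordinary Jaccard index (intersection over union).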
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (e.g., onset and offset phase for "smile", running and jumping for
"highjump"). The proposed model is inspired by the recent works on Multiple
Instance Learning and latent SVM/HCRF; it extends such frameworks to model
the ordinal aspect in the videos, approximately. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video-based facial analysis datasets for prediction of
expression, clinical pain and intent in dyadic conversations and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.
Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text
overlap with arXiv:1604.0150
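The idea of scoring a video as an ordered sequence of sub-events can be illustrated with a small dynamic program. This is a simplified sketch under stated assumptions, not the authors' model: it assumes a hypothetical table `scores[k][t]` holding the response of sub-event template k at frame t, and it enforces a strict temporal order, whereas the abstract says the ordinal aspect is modeled only approximately. Learning the templates is out of scope here.

```python
def best_ordered_score(scores):
    """Place sub-event k at frame t_k with t_0 < t_1 < ... and maximize the sum.

    scores[k][t] is the (assumed, precomputed) response of sub-event k at frame t.
    Returns (best total score, list of chosen frame indices).
    """
    num_events = len(scores)
    num_frames = len(scores[0])
    if num_events > num_frames:
        raise ValueError("need at least as many frames as sub-events")
    neg_inf = float("-inf")

    # best[t]: best total for sub-events 0..k with sub-event k placed at frame t.
    best = list(scores[0])
    back_pointers = []
    for k in range(1, num_events):
        new_best = [neg_inf] * num_frames
        pointers = [-1] * num_frames
        run_max, run_arg = neg_inf, -1  # max of best[0..t-1] and its frame
        for t in range(num_frames):
            if run_max > neg_inf:
                new_best[t] = run_max + scores[k][t]
                pointers[t] = run_arg
            if best[t] > run_max:
                run_max, run_arg = best[t], t
        best = new_best
        back_pointers.append(pointers)

    # Recover the chosen frames by following back-pointers from the best end frame.
    t = max(range(num_frames), key=lambda i: best[i])
    total = best[t]
    frames = [t]
    for pointers in reversed(back_pointers):
        t = pointers[t]
        frames.append(t)
    frames.reverse()
    return total, frames


# Two hypothetical sub-events (e.g. onset, offset) scored over four frames.
example_scores = [
    [1, 5, 0, 0],  # sub-event 0
    [4, 0, 2, 3],  # sub-event 1
]
total, frames = best_ordered_score(example_scores)
```

In this example the unordered maximum would pair frame 1 with frame 0, but the ordering constraint forces the second sub-event to occur after the first, so the best placement is frames 1 and 3.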