2,741 research outputs found

Discriminatively Trained Latent Ordinal Model for Video Classification

Authors: Sharma Gaurav; Sikka Karan
Publication date: 14/08/2017

We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phases for "smile", running and jumping for "highjump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal aspect of videos. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for prediction of expression, clinical pain, and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method. Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150

arXiv.org e-Print Archive

MPG.PuRe

LOMo: Latent Ordinal Model for Facial Analysis in Videos

Authors: Bartlett Marian; Sharma Gaurav; Sikka Karan
Publication date: 01/01/2016

We study the problem of facial analysis in videos. We propose a novel weakly supervised learning method that models the video event (expression, pain, etc.) as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phases for smile, brow lower and cheek raise for pain). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal or temporal aspect of videos. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for prediction of expression, clinical pain, and intent in dyadic conversations. In combination with complementary features, we report state-of-the-art results on these datasets. Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv.org e-Print Archive

MPG.PuRe

EmoNets: Multimodal deep learning approaches for emotion recognition in video

Authors: Bengio Yoshua; Boulanger-Lewandowski Nicolas; Bouthillier Xavier; Courville Aaron; Dauphin Yann; Ferrari Raul Chandias; Froumenty Pierre; Gulcehre Caglar; Jean Sébastien; Kahou Samira Ebrahimi; Konda Kishore; Lamblin Pascal; Memisevic Roland; Michalski Vincent; Mirza Mehdi; Pal Christopher; Vincent Pascal; Warde-Farley David
Publication date: 29/03/2015

The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches that combine features from multiple modalities for label assignment. In this paper we present our approach to learning several specialist models using deep learning techniques, each focusing on one modality. Among these are a convolutional neural network that captures visual information in detected faces; a deep belief net that represents the audio stream; a K-Means based "bag-of-mouths" model that extracts visual features around the mouth region; and a relational autoencoder that addresses spatio-temporal aspects of videos. We explore multiple methods for combining cues from these modalities into one common classifier. This achieves considerably greater accuracy than predictions from our strongest single-modality classifier. Our method was the winning submission in the 2013 EmotiW challenge and achieved a test set accuracy of 47.67% on the 2014 dataset.

arXiv.org e-Print Archive

Crossref

PolyPublie

Single-trial analysis of EEG during rapid visual discrimination: enabling cortically-coupled computer vision

Authors: Gerson Adam D.; Parra Lucas; Philiastides Marios G.; Sajda Paul
Publication venue: The MIT Press
Publication date: 01/01/2007

We describe our work using linear discrimination of multi-channel electroencephalography for single-trial detection of neural signatures of visual recognition events. We demonstrate the approach as a methodology for relating neural variability to response variability, describing studies of response accuracy and response latency during visual target detection. We then show how the approach can be used to construct a novel type of brain-computer interface, which we term cortically-coupled computer vision. In this application, a large database of images is triaged using the detected neural signatures. We show how 'cortical triaging' improves image search over a strictly behavioral response.

Enlighten

Spatio-temporal wardrobe generation of actor's clothing in video content

Authors: E Simo-Serra; F Wang; H Wang; J Liaukonyte; K Nogueira; K Taşdemir; L Baraldi; L dos Santos Belo; M Ajmal; P Šaloun; R Achanta; SA Chatzichristofis
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Crossref

Ghent University Academic Bibliography