
    Representation Learning: A Review and New Perspectives

    The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
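
    The review is a survey, but one of its central objects, the auto-encoder, is compact enough to sketch. Below is a minimal PyTorch auto-encoder that learns a representation by reconstruction; the input and code sizes are illustrative assumptions, not values from the paper.

        import torch
        import torch.nn as nn

        class AutoEncoder(nn.Module):
            """Toy auto-encoder: input -> low-dimensional code -> reconstruction."""
            def __init__(self, n_in=784, n_code=32):  # sizes are assumptions
                super().__init__()
                self.encoder = nn.Sequential(nn.Linear(n_in, 256), nn.ReLU(),
                                             nn.Linear(256, n_code))
                self.decoder = nn.Sequential(nn.Linear(n_code, 256), nn.ReLU(),
                                             nn.Linear(256, n_in))

            def forward(self, x):
                code = self.encoder(x)            # the learned representation
                return self.decoder(code), code

        model = AutoEncoder()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        x = torch.rand(64, 784)                   # stand-in batch of flattened images
        recon, code = model(x)
        loss = nn.functional.mse_loss(recon, x)   # reconstruction objective
        loss.backward()
        opt.step()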

    Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields

    Automated Facial Expression Recognition (FER) has been a challenging task for decades. Many existing works use hand-crafted features such as LBP, HOG, LPQ, and Histogram of Optical Flow (HOF), combined with classifiers such as Support Vector Machines, for expression recognition. These methods often require rigorous hyperparameter tuning to achieve good results. Recently, Deep Neural Networks (DNNs) have been shown to outperform traditional methods in visual object recognition. In this paper, we propose a two-part network consisting of a DNN-based architecture followed by a Conditional Random Field (CRF) module for facial expression recognition in videos. The first part captures the spatial relations within facial images using convolutional layers followed by three Inception-ResNet modules and two fully-connected layers. To capture the temporal relations between the image frames, we use a linear-chain CRF in the second part of our network. We evaluate our proposed network on three publicly available databases, viz. CK+, MMI, and FERA. Experiments are performed in subject-independent and cross-database manners. Our experimental results show that cascading the deep network architecture with the CRF module considerably increases the recognition of facial expressions in videos; in particular, it outperforms the state-of-the-art methods in the cross-database experiments and yields comparable results in the subject-independent experiments.
    Comment: To appear in 12th IEEE Conference on Automatic Face and Gesture Recognition Workshop
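
    A hedged sketch of the two-part design: a small CNN (a stand-in for the paper's Inception-ResNet architecture, not the real thing) emits per-frame class scores, and a linear-chain CRF transition matrix is decoded with Viterbi over the frames. The seven-class label set, layer sizes, and random transition matrix (learned jointly with the CNN in practice) are all assumptions.

        import torch
        import torch.nn as nn

        N_CLASSES = 7  # e.g. seven basic expressions (assumption)

        cnn = nn.Sequential(                      # spatial part: frame -> class scores
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(8 * 4 * 4, N_CLASSES))

        # CRF transition scores between consecutive frame labels
        transitions = torch.randn(N_CLASSES, N_CLASSES)

        def viterbi(emissions, trans):
            """Most likely label sequence under a linear-chain CRF."""
            T, _ = emissions.shape
            score = emissions[0].clone()
            back = []
            for t in range(1, T):
                # total[i, j] = score of being in i at t-1 and j at t
                total = score.unsqueeze(1) + trans + emissions[t].unsqueeze(0)
                score, idx = total.max(dim=0)
                back.append(idx)
            best = [int(score.argmax())]
            for idx in reversed(back):            # backtrack through pointers
                best.append(int(idx[best[-1]]))
            return best[::-1]

        frames = torch.rand(10, 1, 64, 64)        # stand-in video clip
        labels = viterbi(cnn(frames), transitions)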

    Multicamera Action Recognition with Canonical Correlation Analysis and Discriminative Sequence Classification

    Proceedings of: 4th International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2011, La Palma, Canary Islands, Spain, May 30 - June 3, 2011.
    This paper presents a feature fusion approach to the recognition of human actions from multiple cameras that avoids the computation of the 3D visual hull. Action descriptors are extracted for each of the available camera views and projected into a common subspace that maximizes the correlation between the components of the projections. That common subspace is learned using Probabilistic Canonical Correlation Analysis, and action classification is performed in that subspace using a discriminative classifier. Results of the proposed method are shown for the classification of the IXMAS dataset.
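
    As a rough illustration of the fusion step, the sketch below uses scikit-learn's classical CCA as a stand-in for the Probabilistic CCA variant used in the paper: two camera views' action descriptors are projected into a common correlated subspace, and a discriminative classifier is trained there. All data, dimensions, and the 11-class count are synthetic assumptions.

        import numpy as np
        from sklearn.cross_decomposition import CCA
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X_view1 = rng.normal(size=(120, 50))   # stand-in descriptors, camera 1
        X_view2 = rng.normal(size=(120, 50))   # stand-in descriptors, camera 2
        y = rng.integers(0, 11, size=120)      # e.g. 11 IXMAS action classes

        cca = CCA(n_components=10)
        Z1, Z2 = cca.fit_transform(X_view1, X_view2)   # common subspace projections
        clf = SVC(kernel="linear").fit(np.hstack([Z1, Z2]), y)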

    Multi-Action Recognition via Stochastic Modelling of Optical Flow and Gradients

    In this paper we propose a novel approach to multi-action recognition that performs joint segmentation and classification. This approach models each action as a Gaussian mixture over robust low-dimensional action features. Segmentation is achieved by performing classification on overlapping temporal windows, which are then merged to produce the final result. This approach is considerably less complicated than previous methods which use dynamic programming or computationally expensive hidden Markov models (HMMs). Initial experiments on a stitched version of the KTH dataset show that the proposed approach achieves an accuracy of 78.3%, outperforming a recent HMM-based approach which obtained 71.2%.
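
    A minimal sketch of the segmentation-by-classification idea, assuming one scikit-learn GaussianMixture per action trained on synthetic features: overlapping temporal windows are labelled by the best-scoring action model. The merging of window decisions, and all sizes below, are simplifying assumptions.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        n_actions, feat_dim = 6, 20           # e.g. six KTH actions (assumption)

        # One GMM per action, fit on that action's (stand-in) training features.
        models = []
        for a in range(n_actions):
            gmm = GaussianMixture(n_components=3, covariance_type="diag")
            gmm.fit(rng.normal(loc=a, size=(300, feat_dim)))
            models.append(gmm)

        video = rng.normal(loc=2, size=(500, feat_dim))   # per-frame features
        win, step = 60, 15                                 # overlapping windows
        for start in range(0, len(video) - win + 1, step):
            window = video[start:start + win]
            # average per-frame log-likelihood under each action model
            scores = [m.score(window) for m in models]
            print(start, int(np.argmax(scores)))           # window's action label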

    Human action recognition with sparse classification and multiple-view learning

    Employing multiple camera viewpoints in the recognition of human actions increases performance. This paper presents a feature fusion approach to efficiently combine 2D observations extracted from different camera viewpoints. Multiple-view dimensionality reduction is employed to learn a common parameterization of 2D action descriptors computed for each of the available viewpoints. Canonical correlation analysis and its variants are employed to obtain such parameterizations. A sparse sequence classifier based on L1 regularization is proposed to avoid the problem of having to choose the proper number of dimensions of the common parameterization. The proposed system is employed in the classification of the Inria Xmas Motion Acquisition Sequences (IXMAS) dataset with successful results.
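
    The role of the L1 penalty can be illustrated with a plain sparse linear classifier (a simplification of the paper's sparse sequence classifier): the penalty drives the weights of irrelevant dimensions of the common parameterization to zero, which is what sidesteps choosing the subspace dimensionality in advance. Shapes and class counts below are assumptions.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        Z = rng.normal(size=(150, 40))        # stand-in common-subspace features
        y = rng.integers(0, 11, size=150)     # e.g. 11 IXMAS action classes

        # L1-regularized classifier: irrelevant dimensions get exactly zero weight
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        clf.fit(Z, y)
        print("nonzero weights per class:", (clf.coef_ != 0).sum(axis=1))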

    LOMo: Latent Ordinal Model for Facial Analysis in Videos

    We study the problem of facial analysis in videos. We propose a novel weakly supervised learning method that models the video event (expression, pain, etc.) as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phases for a smile, or brow lowering and cheek raising for pain). The proposed model is inspired by recent works on Multiple Instance Learning and latent SVM/HCRF: it extends such frameworks to approximately model the ordinal or temporal aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for prediction of expression, clinical pain, and intent in dyadic conversations. In combination with complementary features, we report state-of-the-art results on these datasets.
    Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
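
    In the multiple-instance spirit of the model, a video can be scored by letting each sub-event template pick its best-matching frame. The sketch below shows only this max-pooling step with random stand-in templates (learned in practice) and deliberately omits LOMo's ordinal constraint on the temporal order of sub-events.

        import numpy as np

        rng = np.random.default_rng(0)
        n_frames, feat_dim, n_subevents = 80, 30, 3      # sizes are assumptions
        frames = rng.normal(size=(n_frames, feat_dim))   # stand-in frame features
        W = rng.normal(size=(n_subevents, feat_dim))     # sub-event templates

        per_frame = frames @ W.T                 # (n_frames, n_subevents) scores
        best = per_frame.max(axis=0)             # each sub-event's best frame score
        video_score = best.sum()                 # pooled video-level score
        print("predicted positive" if video_score > 0 else "predicted negative")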