Discriminative Autoencoder for Feature Extraction: Application to Character Recognition
Conventionally, autoencoders are unsupervised representation learning tools.
In this work, we propose a novel discriminative autoencoder. Use of supervised
discriminative learning ensures that the learned representation is robust to
variations commonly encountered in image datasets. Using the basic
discriminative autoencoder as a unit, we build a stacked architecture aimed at
extracting relevant representation from the training data. The efficiency of
our feature extraction algorithm ensures a high classification accuracy with
even simple classification schemes like KNN (K-nearest neighbor). We
demonstrate the superiority of our model for representation learning by
conducting experiments on standard datasets for character/image recognition and
subsequent comparison with existing supervised deep architectures like class
sparse stacked autoencoder and discriminative deep belief network.
Comment: The final version has been accepted at Neural Processing Letters.
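As a rough sketch of the idea (names and the exact form are illustrative, not the authors' implementation), a discriminative autoencoder's objective can combine the usual reconstruction error with a supervised term computed from class scores predicted out of the hidden code:

```python
import numpy as np

def discriminative_ae_loss(x, x_hat, logits, y, lam=0.1):
    """Joint objective: reconstruction error plus a supervised
    cross-entropy term on class scores predicted from the hidden code.
    lam (the trade-off weight) is an assumed hyperparameter."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    z = logits - logits.max(axis=1, keepdims=True)            # stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    xent = -np.mean(log_p[np.arange(len(y)), y])
    return recon + lam * xent
```

Minimizing the joint loss pushes the learned representation to be both reconstructive and class-discriminative, which is the property the abstract attributes to the model.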
Gender and Ethnicity Classification of Iris Images using Deep Class-Encoder
Soft biometric modalities have shown their utility in different applications
including reducing the search space significantly. This leads to improved
recognition performance, reduced computation time, and faster processing of
test samples. Some common soft biometric modalities are ethnicity, gender, age,
hair color, iris color, presence of facial hair or moles, and markers. This
research focuses on performing ethnicity and gender classification on iris
images. We present a novel supervised autoencoder-based approach, Deep
Class-Encoder, which uses class labels to learn discriminative representation
for the given sample by mapping the learned feature vector to its label. The
proposed model is evaluated on two datasets each for ethnicity and gender
classification. The results obtained using the proposed Deep Class-Encoder
demonstrate its effectiveness in comparison to existing approaches and
state-of-the-art methods.
Comment: International Joint Conference on Biometrics, 201
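One hedged reading of "mapping the learned feature vector to its label" is a loss with a reconstruction term plus a label-regression term on the code; the names and the exact form below are illustrative, not the paper's formulation:

```python
import numpy as np

def class_encoder_loss(X, X_hat, H, Y, M, lam=1.0):
    """||X - X_hat||^2 + lam * ||Y - H M||^2, where Y holds one-hot
    labels, H the learned codes, and M a learned code-to-label map.
    All symbols here are assumptions made for the sketch."""
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    label = np.mean(np.sum((Y - H @ M) ** 2, axis=1))
    return recon + lam * label
```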
Greedy Deep Dictionary Learning
In this work we propose a new deep learning tool called deep dictionary
learning. Multi-level dictionaries are learnt in a greedy fashion, one layer at
a time. This requires solving a simple (shallow) dictionary learning problem,
whose solution is well known. We apply the proposed technique to some
benchmark deep learning datasets. We compare our results with other deep
learning tools like the stacked autoencoder and deep belief network, and with
state-of-the-art supervised dictionary learning tools like discriminative KSVD
and label-consistent KSVD. Our method yields better results than all of these
baselines.
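The greedy scheme can be sketched as follows: each layer solves one shallow factorization X ≈ DZ, and the codes Z become the input to the next layer. The shallow solver below is plain alternating least squares, a simplification (the paper's shallow solver and any sparsity penalty are omitted):

```python
import numpy as np

def shallow_dictionary(X, k, iters=50, seed=0):
    """One shallow dictionary-learning step: X (d x n) ~= D (d x k) @ Z (k x n),
    solved by alternating least squares."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], k))
    for _ in range(iters):
        Z = np.linalg.lstsq(D, X, rcond=None)[0]        # fix D, solve for Z
        D = np.linalg.lstsq(Z.T, X.T, rcond=None)[0].T  # fix Z, solve for D
    return D, Z

def greedy_deep_dictionary(X, layer_sizes):
    """Learn multi-level dictionaries greedily, one layer at a time:
    the codes of layer l are factorized again at layer l+1."""
    dicts, Z = [], X
    for k in layer_sizes:
        D, Z = shallow_dictionary(Z, k)
        dicts.append(D)
    return dicts, Z
```

The deepest code Z is then the feature representation fed to a classifier.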
Learning Representations of Affect from Speech
There has been a lot of prior work on representation learning for speech
recognition applications, but not much emphasis has been given to an
investigation of effective representations of affect from speech, where the
paralinguistic elements of speech are separated out from the verbal content. In
this paper, we explore denoising autoencoders for learning paralinguistic
attributes, i.e., categorical and dimensional affective traits, from speech. We
show that the representations learnt by the bottleneck layer of the autoencoder
are highly discriminative of activation intensity and effective at separating
negative valence (sadness and anger) from positive valence (happiness). We
experiment with different input speech features (such as FFT and log-mel
spectrograms with temporal context windows), and different autoencoder
architectures (such as stacked and deep autoencoders). We also learn utterance
specific representations by a combination of denoising autoencoders and BLSTM
based recurrent autoencoders. Emotion classification is performed with the
learnt temporal/dynamic representations to evaluate the quality of the
representations. Experiments on a well-established real-life speech dataset
(IEMOCAP) show that the learnt representations are comparable to
state-of-the-art feature extractors (such as voice quality features and MFCCs)
and are
competitive with state-of-the-art approaches at emotion and dimensional affect
recognition.
Comment: This is a submission for the ICLR (International Conference on
Learning Representations) Workshop 201
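The denoising-autoencoder idea above can be sketched minimally: corrupt the input, encode the corrupted version, and reconstruct the clean one, with the bottleneck activations used as the downstream feature. This is a generic sketch with assumed names, not the authors' architecture:

```python
import numpy as np

def corrupt(x, noise_std, rng):
    """Additive-Gaussian corruption; masking noise is another common choice."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

def dae_forward(x_clean, x_noisy, W_enc, W_dec):
    """Denoising objective: encode the *noisy* input but reconstruct the
    *clean* one. h is the bottleneck representation used downstream."""
    h = np.tanh(x_noisy @ W_enc)
    x_hat = h @ W_dec
    loss = np.mean((x_clean - x_hat) ** 2)
    return loss, h
```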
Supervised Mixed Norm Autoencoder for Kinship Verification in Unconstrained Videos
Identifying kinship relations has garnered interest due to several
applications such as organizing and tagging the enormous amount of videos being
uploaded on the Internet. Existing research in kinship verification primarily
focuses on kinship prediction with image pairs. In this research, we propose a
new deep learning framework for kinship verification in unconstrained videos
using a novel Supervised Mixed Norm regularization Autoencoder (SMNAE). This
new autoencoder formulation introduces class-specific sparsity in the weight
matrix. The proposed three-stage SMNAE-based kinship verification framework
utilizes the learned spatio-temporal representation in the video frames for
verifying kinship in a pair of videos. A new kinship video (KIVI) database of
more than 500 individuals with variations due to illumination, pose, occlusion,
ethnicity, and expression is collected for this research. It comprises a total
of 355 true kin video pairs with over 250,000 still frames. The effectiveness
of the proposed framework is demonstrated on the KIVI database and six existing
kinship databases. On the KIVI database, SMNAE yields video-based kinship
verification accuracy of 83.18%, which is at least 3.2% better than existing
algorithms. The algorithm is also evaluated on six publicly available kinship
databases and compared with the best reported results. The proposed SMNAE
consistently yields the best results on all the databases.
Comment: Accepted for publication in Transactions in Image Processing
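"Class-specific sparsity in the weight matrix" can be read as a group penalty: columns of the weight matrix are grouped by the class they are tied to, with an l2 norm inside each class block and an l1 sum across blocks. This is one illustrative reading; the paper's exact mixed norm may differ:

```python
import numpy as np

def class_mixed_norm(W, class_of_unit):
    """Sum of l2 (Frobenius) norms of per-class column blocks of W,
    i.e. an l1 across classes of l2 within each class. Small values
    mean whole class blocks are driven toward zero (group sparsity)."""
    penalty = 0.0
    for c in np.unique(class_of_unit):
        block = W[:, class_of_unit == c]
        penalty += np.sqrt(np.sum(block ** 2))
    return penalty
```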
Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization
Image clustering is one of the most important computer vision applications,
which has been extensively studied in the literature. However, current clustering
methods mostly suffer from lack of efficiency and scalability when dealing with
large-scale and high-dimensional data. In this paper, we propose a new
clustering model, called DEeP Embedded RegularIzed ClusTering (DEPICT), which
efficiently maps data into a discriminative embedding subspace and precisely
predicts cluster assignments. DEPICT generally consists of a multinomial
logistic regression function stacked on top of a multi-layer convolutional
autoencoder. We define a clustering objective function using relative entropy
(KL divergence) minimization, regularized by a prior for the frequency of
cluster assignments. An alternating strategy is then derived to optimize the
objective by updating parameters and estimating cluster assignments.
Furthermore, we employ the reconstruction loss functions in our autoencoder, as
a data-dependent regularization term, to prevent the deep embedding function
from overfitting. In order to benefit from end-to-end optimization and
eliminate the necessity for layer-wise pretraining, we introduce a joint
learning framework to minimize the unified clustering and reconstruction loss
functions together and train all network layers simultaneously. Experimental
results indicate the superiority and faster running time of DEPICT in
real-world clustering tasks, where no labeled data is available for
hyper-parameter tuning.
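The core of the clustering objective can be sketched numerically: soft assignments Q from the softmax head are turned into an auxiliary target distribution P whose columns are down-weighted by the square root of their soft cluster frequency (the frequency prior discouraging degenerate, oversized clusters), and the network is trained to minimize KL(P || Q). This follows the DEPICT construction loosely:

```python
import numpy as np

def depict_targets(q, eps=1e-12):
    """Auxiliary target P built from soft assignments Q (n x k):
    sharpen by dividing each column by sqrt of its soft frequency,
    then renormalize each row to a distribution."""
    f = q.sum(axis=0)                       # soft cluster frequencies
    p = q / np.sqrt(f + eps)
    return p / p.sum(axis=1, keepdims=True)

def clustering_loss(q, p, eps=1e-12):
    """Relative entropy KL(P || Q), averaged over samples."""
    return np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=1))
```

In the alternating scheme described above, P is recomputed from the current Q, then the network parameters are updated to pull Q toward P.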
Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle
Visual representation is crucial to a visual tracking method's performance.
Conventionally, visual representations adopted in visual tracking rely on
hand-crafted computer vision descriptors. These descriptors were developed
generically without considering tracking-specific information. In this paper,
we propose to learn complex-valued invariant representations from tracked
sequential image patches, via strong temporal slowness constraint and stacked
convolutional autoencoders. The deep slow local representations are learned
offline on unlabeled data and transferred to the observational model of our
proposed tracker. The proposed observational model retains old training samples
to alleviate drift, and collects negative samples that are coherent with the
target's motion pattern for better discriminative tracking. With the learned
representation and online training samples, a logistic regression classifier is
adopted to distinguish target from background, and retrained online to adapt to
appearance changes. Subsequently, the observational model is integrated into a
particle filter framework to perform visual tracking. Experimental results on
various challenging benchmark sequences demonstrate that the proposed tracker
performs favourably against several state-of-the-art trackers.
Comment: Pattern Recognition (Elsevier), 201
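The temporal-slowness constraint mentioned above is commonly expressed as a penalty on the change between representations of consecutive frames; a minimal sketch of such a term (generic, not the paper's exact constraint):

```python
import numpy as np

def slowness_penalty(H):
    """Temporal-slowness term: mean squared difference between
    representations of consecutive frames. H is (T, k), one feature
    vector per time step; small values mean slowly varying features."""
    diffs = np.diff(H, axis=0)
    return np.mean(np.sum(diffs ** 2, axis=1))
```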
Supervised COSMOS Autoencoder: Learning Beyond the Euclidean Loss!
Autoencoders are unsupervised deep learning models used for learning
representations. In the literature, autoencoders have been shown to perform well on a
variety of tasks spread across multiple domains, thereby establishing
widespread applicability. Typically, an autoencoder is trained to generate a
model that minimizes the reconstruction error between the input and the
reconstructed output, computed in terms of the Euclidean distance. While this
can be useful for applications related to unsupervised reconstruction, it may
not be optimal for classification. In this paper, we propose a novel Supervised
COSMOS Autoencoder which utilizes a multi-objective loss function to learn
representations that simultaneously encode the (i) "similarity" between the
input and reconstructed vectors in terms of their direction, (ii)
"distribution" of pixel values of the reconstruction with respect to the input
sample, while also incorporating (iii) "discriminability" in the feature
learning pipeline. The proposed autoencoder model incorporates a Cosine
similarity and Mahalanobis distance based loss function, along with supervision
via Mutual Information based loss. Detailed analysis of each component of the
proposed model motivates its applicability for feature learning in different
classification tasks. The efficacy of Supervised COSMOS autoencoder is
demonstrated via extensive experimental evaluations on different image
datasets. The proposed model outperforms existing algorithms on MNIST,
CIFAR-10, and SVHN databases. It also yields state-of-the-art results on
CelebA, LFWA, Adience, and IJB-A databases for attribute prediction and face
recognition, respectively.
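Two of the three COSMOS terms lend themselves to a short numerical sketch: the "direction" term via cosine similarity between input and reconstruction, and the "distribution" term via a Mahalanobis-type distance (here `cov_inv` is assumed to be an inverse covariance estimated from training data). The supervised mutual-information term is omitted, and the exact combination is an assumption of this sketch:

```python
import numpy as np

def cosmos_recon_loss(x, x_hat, cov_inv):
    """(1 - cosine similarity) + Mahalanobis distance between each
    input row and its reconstruction, averaged over the batch."""
    cos = np.sum(x * x_hat, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1) + 1e-12)
    d = x - x_hat
    maha = np.einsum('ni,ij,nj->n', d, cov_inv, d)
    return np.mean(1.0 - cos) + np.mean(maha)
```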
Alzheimer's Disease Diagnostics by a Deeply Supervised Adaptable 3D Convolutional Network
Early diagnosis, which plays an important role in slowing the progression of
and treating Alzheimer's disease (AD), is based on classification of features
extracted from brain images. The features have to accurately capture the main
AD-related variations of anatomical brain structures, such as ventricle size,
hippocampus shape, cortical thickness, and brain volume. This paper
proposes to predict the AD with a deep 3D convolutional neural network
(3D-CNN), which can learn generic features capturing AD biomarkers and adapt to
different domain datasets. The 3D-CNN is built upon a 3D convolutional
autoencoder, which is pre-trained to capture anatomical shape variations in
structural brain MRI scans. Fully connected upper layers of the 3D-CNN are then
fine-tuned for each task-specific AD classification. Experiments on the
\emph{ADNI} MRI dataset with no skull-stripping preprocessing have shown our
3D-CNN outperforms several conventional classifiers in accuracy and robustness.
The ability of the 3D-CNN to generalize the learnt features and adapt to other
domains has been validated on the \emph{CADDementia} dataset.
EE-AE: An Exclusivity Enhanced Unsupervised Feature Learning Approach
Unsupervised learning has become increasingly important in recent years. As one of
its key components, the autoencoder (AE) aims to learn a latent feature
representation of data which is more robust and discriminative. However, most
AE-based methods focus only on the reconstruction within the encoder-decoder
phase, which ignores the inherent relation of data, i.e., statistical and
geometrical dependence, and easily causes overfitting. In order to deal with
this issue, we propose an Exclusivity Enhanced (EE) unsupervised feature
learning approach to improve the conventional AE. To the best of our knowledge,
our research is the first to utilize such an exclusivity concept to cooperate
with feature extraction within an AE. Moreover, in this paper we also make some
improvements to the stacked AE structure, especially for the connection of
different layers from the decoders, which can be regarded as a weight
initialization strategy. The experimental results show that our proposed
approach achieves remarkable performance compared with other related methods.
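One common way to quantify exclusivity between weight rows is the summed l1 norm of their elementwise products over row pairs, which is zero when rows occupy non-overlapping supports. This is an assumed reading of the "exclusivity" concept; the paper's definition may differ:

```python
import numpy as np

def exclusivity_penalty(W):
    """Sum over row pairs of ||w_i * w_j||_1 (elementwise product):
    zero exactly when no two rows share a nonzero coordinate, so
    minimizing it pushes rows toward disjoint supports."""
    total = 0.0
    n = W.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sum(np.abs(W[i] * W[j]))
    return total
```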