Discriminative Autoencoder for Feature Extraction: Application to Character Recognition
Conventionally, autoencoders are unsupervised representation learning tools.
In this work, we propose a novel discriminative autoencoder. Use of supervised
discriminative learning ensures that the learned representation is robust to
variations commonly encountered in image datasets. Using the basic
discriminative autoencoder as a unit, we build a stacked architecture aimed at
extracting relevant representation from the training data. The efficiency of
our feature extraction algorithm ensures a high classification accuracy with
even simple classification schemes like KNN (K-nearest neighbor). We
demonstrate the superiority of our model for representation learning by
conducting experiments on standard datasets for character/image recognition and
subsequent comparison with existing supervised deep architectures like class
sparse stacked autoencoder and discriminative deep belief network.
Comment: The final version has been accepted at Neural Processing Letters.
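As a rough sketch of the idea (names and the exact form are illustrative, not the authors' implementation), a discriminative autoencoder's objective can combine the usual reconstruction error with a supervised term computed from class scores predicted out of the hidden code:

```python
import numpy as np

def discriminative_ae_loss(x, x_hat, logits, y, lam=0.1):
    """Joint objective: reconstruction error plus a supervised
    cross-entropy term on class scores predicted from the hidden code.
    lam (the trade-off weight) is an assumed hyperparameter."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    z = logits - logits.max(axis=1, keepdims=True)            # stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    xent = -np.mean(log_p[np.arange(len(y)), y])
    return recon + lam * xent
```

Minimizing the joint loss pushes the learned representation to be both reconstructive and class-discriminative, which is the property the abstract attributes to the model.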
Gender and Ethnicity Classification of Iris Images using Deep Class-Encoder
Soft biometric modalities have shown their utility in different applications
including reducing the search space significantly. This leads to improved
recognition performance, reduced computation time, and faster processing of
test samples. Some common soft biometric modalities are ethnicity, gender, age,
hair color, iris color, presence of facial hair or moles, and markers. This
research focuses on performing ethnicity and gender classification on iris
images. We present a novel supervised autoencoder-based approach, Deep
Class-Encoder, which uses class labels to learn discriminative representation
for the given sample by mapping the learned feature vector to its label. The
proposed model is evaluated on two datasets each for ethnicity and gender
classification. The results obtained using the proposed Deep Class-Encoder
demonstrate its effectiveness in comparison to existing approaches and
state-of-the-art methods.
Comment: International Joint Conference on Biometrics, 201
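One hedged reading of "mapping the learned feature vector to its label" is a loss with a reconstruction term plus a label-regression term on the code; the names and the exact form below are illustrative, not the paper's formulation:

```python
import numpy as np

def class_encoder_loss(X, X_hat, H, Y, M, lam=1.0):
    """||X - X_hat||^2 + lam * ||Y - H M||^2, where Y holds one-hot
    labels, H the learned codes, and M a learned code-to-label map.
    All symbols here are assumptions made for the sketch."""
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    label = np.mean(np.sum((Y - H @ M) ** 2, axis=1))
    return recon + lam * label
```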
Greedy Deep Dictionary Learning
In this work we propose a new deep learning tool called deep dictionary
learning. Multi-level dictionaries are learnt in a greedy fashion, one layer at
a time. This requires solving a simple (shallow) dictionary learning problem,
whose solution is well known. We apply the proposed technique to some
benchmark deep learning datasets. We compare our results with other deep
learning tools like the stacked autoencoder and deep belief network, and with
state-of-the-art supervised dictionary learning tools like discriminative KSVD
and label-consistent KSVD. Our method yields better results than all of these
baselines.
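The greedy scheme can be sketched as follows: each layer solves one shallow factorization X ≈ DZ, and the codes Z become the input to the next layer. The shallow solver below is plain alternating least squares, a simplification (the paper's shallow solver and any sparsity penalty are omitted):

```python
import numpy as np

def shallow_dictionary(X, k, iters=50, seed=0):
    """One shallow dictionary-learning step: X (d x n) ~= D (d x k) @ Z (k x n),
    solved by alternating least squares."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], k))
    for _ in range(iters):
        Z = np.linalg.lstsq(D, X, rcond=None)[0]        # fix D, solve for Z
        D = np.linalg.lstsq(Z.T, X.T, rcond=None)[0].T  # fix Z, solve for D
    return D, Z

def greedy_deep_dictionary(X, layer_sizes):
    """Learn multi-level dictionaries greedily, one layer at a time:
    the codes of layer l are factorized again at layer l+1."""
    dicts, Z = [], X
    for k in layer_sizes:
        D, Z = shallow_dictionary(Z, k)
        dicts.append(D)
    return dicts, Z
```

The deepest code Z is then the feature representation fed to a classifier.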
Learning Representations of Affect from Speech
There has been a lot of prior work on representation learning for speech
recognition applications, but not much emphasis has been given to an
investigation of effective representations of affect from speech, where the
paralinguistic elements of speech are separated out from the verbal content. In
this paper, we explore denoising autoencoders for learning paralinguistic
attributes, i.e., categorical and dimensional affective traits, from speech. We
show that the representations learnt by the bottleneck layer of the autoencoder
are highly discriminative of activation intensity and effective at separating
negative valence (sadness and anger) from positive valence (happiness). We
experiment with different input speech features (such as FFT and log-mel
spectrograms with temporal context windows), and different autoencoder
architectures (such as stacked and deep autoencoders). We also learn utterance
specific representations by a combination of denoising autoencoders and BLSTM
based recurrent autoencoders. Emotion classification is performed with the
learnt temporal/dynamic representations to evaluate the quality of the
representations. Experiments on a well-established real-life speech dataset
(IEMOCAP) show that the learnt representations are comparable to
state-of-the-art feature extractors (such as voice quality features and MFCCs)
and are
competitive with state-of-the-art approaches at emotion and dimensional affect
recognition.
Comment: This is a submission for the ICLR (International Conference on
Learning Representations) Workshop 201
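The denoising-autoencoder idea above can be sketched minimally: corrupt the input, encode the corrupted version, and reconstruct the clean one, with the bottleneck activations used as the downstream feature. This is a generic sketch with assumed names, not the authors' architecture:

```python
import numpy as np

def corrupt(x, noise_std, rng):
    """Additive-Gaussian corruption; masking noise is another common choice."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

def dae_forward(x_clean, x_noisy, W_enc, W_dec):
    """Denoising objective: encode the *noisy* input but reconstruct the
    *clean* one. h is the bottleneck representation used downstream."""
    h = np.tanh(x_noisy @ W_enc)
    x_hat = h @ W_dec
    loss = np.mean((x_clean - x_hat) ** 2)
    return loss, h
```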
Supervised Mixed Norm Autoencoder for Kinship Verification in Unconstrained Videos
Identifying kinship relations has garnered interest due to several
applications such as organizing and tagging the enormous amount of videos being
uploaded on the Internet. Existing research in kinship verification primarily
focuses on kinship prediction with image pairs. In this research, we propose a
new deep learning framework for kinship verification in unconstrained videos
using a novel Supervised Mixed Norm regularization Autoencoder (SMNAE). This
new autoencoder formulation introduces class-specific sparsity in the weight
matrix. The proposed three-stage SMNAE-based kinship verification framework
utilizes the learned spatio-temporal representation in the video frames for
verifying kinship in a pair of videos. A new kinship video (KIVI) database of
more than 500 individuals with variations due to illumination, pose, occlusion,
ethnicity, and expression is collected for this research. It comprises a total
of 355 true kin video pairs with over 250,000 still frames. The effectiveness
of the proposed framework is demonstrated on the KIVI database and six existing
kinship databases. On the KIVI database, SMNAE yields video-based kinship
verification accuracy of 83.18%, which is at least 3.2% better than existing
algorithms. The algorithm is also evaluated on six publicly available kinship
databases and compared with the best reported results. The proposed SMNAE
consistently yields the best results on all the databases.
Comment: Accepted for publication in Transactions in Image Processing
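"Class-specific sparsity in the weight matrix" can be read as a group penalty: columns of the weight matrix are grouped by the class they are tied to, with an l2 norm inside each class block and an l1 sum across blocks. This is one illustrative reading; the paper's exact mixed norm may differ:

```python
import numpy as np

def class_mixed_norm(W, class_of_unit):
    """Sum of l2 (Frobenius) norms of per-class column blocks of W,
    i.e. an l1 across classes of l2 within each class. Small values
    mean whole class blocks are driven toward zero (group sparsity)."""
    penalty = 0.0
    for c in np.unique(class_of_unit):
        block = W[:, class_of_unit == c]
        penalty += np.sqrt(np.sum(block ** 2))
    return penalty
```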
Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization
Image clustering is one of the most important computer vision applications,
which has been extensively studied in the literature. However, current clustering
methods mostly suffer from lack of efficiency and scalability when dealing with
large-scale and high-dimensional data. In this paper, we propose a new
clustering model, called DEeP Embedded RegularIzed ClusTering (DEPICT), which
efficiently maps data into a discriminative embedding subspace and precisely
predicts cluster assignments. DEPICT generally consists of a multinomial
logistic regression function stacked on top of a multi-layer convolutional
autoencoder. We define a clustering objective function using relative entropy
(KL divergence) minimization, regularized by a prior for the frequency of
cluster assignments. An alternating strategy is then derived to optimize the
objective by updating parameters and estimating cluster assignments.
Furthermore, we employ the reconstruction loss functions in our autoencoder, as
a data-dependent regularization term, to prevent the deep embedding function
from overfitting. In order to benefit from end-to-end optimization and
eliminate the necessity for layer-wise pretraining, we introduce a joint
learning framework to minimize the unified clustering and reconstruction loss
functions together and train all network layers simultaneously. Experimental
results indicate the superiority and faster running time of DEPICT in
real-world clustering tasks, where no labeled data is available for
hyper-parameter tuning.
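The core of the clustering objective can be sketched numerically: soft assignments Q from the softmax head are turned into an auxiliary target distribution P whose columns are down-weighted by the square root of their soft cluster frequency (the frequency prior discouraging degenerate, oversized clusters), and the network is trained to minimize KL(P || Q). This follows the DEPICT construction loosely:

```python
import numpy as np

def depict_targets(q, eps=1e-12):
    """Auxiliary target P built from soft assignments Q (n x k):
    sharpen by dividing each column by sqrt of its soft frequency,
    then renormalize each row to a distribution."""
    f = q.sum(axis=0)                       # soft cluster frequencies
    p = q / np.sqrt(f + eps)
    return p / p.sum(axis=1, keepdims=True)

def clustering_loss(q, p, eps=1e-12):
    """Relative entropy KL(P || Q), averaged over samples."""
    return np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=1))
```

In the alternating scheme described above, P is recomputed from the current Q, then the network parameters are updated to pull Q toward P.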
Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle
Visual representation is crucial to a visual tracking method's performance.
Conventionally, visual representations adopted in visual tracking rely on
hand-crafted computer vision descriptors. These descriptors were developed
generically without considering tracking-specific information. In this paper,
we propose to learn complex-valued invariant representations from tracked
sequential image patches, via strong temporal slowness constraint and stacked
convolutional autoencoders. The deep slow local representations are learned
offline on unlabeled data and transferred to the observational model of our
proposed tracker. The proposed observational model retains old training samples
to alleviate drift, and collects negative samples that are coherent with the
target's motion pattern for better discriminative tracking. With the learned
representation and online training samples, a logistic regression classifier is
adopted to distinguish target from background, and retrained online to adapt to
appearance changes. Subsequently, the observational model is integrated into a
particle filter framework to perform visual tracking. Experimental results on
various challenging benchmark sequences demonstrate that the proposed tracker
performs favourably against several state-of-the-art trackers.
Comment: Pattern Recognition (Elsevier), 201
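The temporal-slowness constraint mentioned above is commonly expressed as a penalty on the change between representations of consecutive frames; a minimal sketch of such a term (generic, not the paper's exact constraint):

```python
import numpy as np

def slowness_penalty(H):
    """Temporal-slowness term: mean squared difference between
    representations of consecutive frames. H is (T, k), one feature
    vector per time step; small values mean slowly varying features."""
    diffs = np.diff(H, axis=0)
    return np.mean(np.sum(diffs ** 2, axis=1))
```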
Supervised COSMOS Autoencoder: Learning Beyond the Euclidean Loss!
Autoencoders are unsupervised deep learning models used for learning
representations. In the literature, autoencoders have been shown to perform well on a
variety of tasks spread across multiple domains, thereby establishing
widespread applicability. Typically, an autoencoder is trained to generate a
model that minimizes the reconstruction error between the input and the
reconstructed output, computed in terms of the Euclidean distance. While this
can be useful for applications related to unsupervised reconstruction, it may
not be optimal for classification. In this paper, we propose a novel Supervised
COSMOS Autoencoder which utilizes a multi-objective loss function to learn
representations that simultaneously encode the (i) "similarity" between the
input and reconstructed vectors in terms of their direction, (ii)
"distribution" of pixel values of the reconstruction with respect to the input
sample, while also incorporating (iii) "discriminability" in the feature
learning pipeline. The proposed autoencoder model incorporates a Cosine
similarity and Mahalanobis distance based loss function, along with supervision
via Mutual Information based loss. Detailed analysis of each component of the
proposed model motivates its applicability for feature learning in different
classification tasks. The efficacy of Supervised COSMOS autoencoder is
demonstrated via extensive experimental evaluations on different image
datasets. The proposed model outperforms existing algorithms on MNIST,
CIFAR-10, and SVHN databases. It also yields state-of-the-art results on
CelebA, LFWA, Adience, and IJB-A databases for attribute prediction and face
recognition, respectively.
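Two of the three COSMOS terms lend themselves to a short numerical sketch: the "direction" term via cosine similarity between input and reconstruction, and the "distribution" term via a Mahalanobis-type distance (here `cov_inv` is assumed to be an inverse covariance estimated from training data). The supervised mutual-information term is omitted, and the exact combination is an assumption of this sketch:

```python
import numpy as np

def cosmos_recon_loss(x, x_hat, cov_inv):
    """(1 - cosine similarity) + Mahalanobis distance between each
    input row and its reconstruction, averaged over the batch."""
    cos = np.sum(x * x_hat, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1) + 1e-12)
    d = x - x_hat
    maha = np.einsum('ni,ij,nj->n', d, cov_inv, d)
    return np.mean(1.0 - cos) + np.mean(maha)
```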
Alzheimer's Disease Diagnostics by a Deeply Supervised Adaptable 3D Convolutional Network
Early diagnosis, which plays an important role in slowing the progression of
and treating Alzheimer's disease (AD), is based on classification of features
extracted from brain images. The features have to accurately capture the main
AD-related variations of anatomical brain structures, such as ventricle size,
hippocampus shape, cortical thickness, and brain volume. This paper
proposes to predict the AD with a deep 3D convolutional neural network
(3D-CNN), which can learn generic features capturing AD biomarkers and adapt to
different domain datasets. The 3D-CNN is built upon a 3D convolutional
autoencoder, which is pre-trained to capture anatomical shape variations in
structural brain MRI scans. Fully connected upper layers of the 3D-CNN are then
fine-tuned for each task-specific AD classification. Experiments on the
\emph{ADNI} MRI dataset with no skull-stripping preprocessing have shown our
3D-CNN outperforms several conventional classifiers in accuracy and robustness.
The ability of the 3D-CNN to generalize the learnt features and adapt to other
domains has been validated on the \emph{CADDementia} dataset.
EE-AE: An Exclusivity Enhanced Unsupervised Feature Learning Approach
Unsupervised learning has become increasingly important in recent years. As one of
its key components, the autoencoder (AE) aims to learn a latent feature
representation of data which is more robust and discriminative. However, most
AE-based methods focus only on the reconstruction within the encoder-decoder
phase, which ignores the inherent relation of data, i.e., statistical and
geometrical dependence, and easily causes overfitting. In order to deal with
this issue, we propose an Exclusivity Enhanced (EE) unsupervised feature
learning approach to improve the conventional AE. To the best of our knowledge,
our research is the first to utilize such an exclusivity concept to cooperate
with feature extraction within an AE. Moreover, in this paper we also make some
improvements to the stacked AE structure, especially for the connection of
different layers from the decoders, which can be regarded as a weight
initialization strategy. The experimental results show that our proposed
approach achieves remarkable performance compared with other related methods.
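One common way to quantify exclusivity between weight rows is the summed l1 norm of their elementwise products over row pairs, which is zero when rows occupy non-overlapping supports. This is an assumed reading of the "exclusivity" concept; the paper's definition may differ:

```python
import numpy as np

def exclusivity_penalty(W):
    """Sum over row pairs of ||w_i * w_j||_1 (elementwise product):
    zero exactly when no two rows share a nonzero coordinate, so
    minimizing it pushes rows toward disjoint supports."""
    total = 0.0
    n = W.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sum(np.abs(W[i] * W[j]))
    return total
```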