Deep Linear Discriminant Analysis
We introduce Deep Linear Discriminant Analysis (DeepLDA) which learns
linearly separable latent representations in an end-to-end fashion. Classic LDA
extracts features which preserve class separability and is used for
dimensionality reduction in many classification problems. The central idea of
this paper is to put LDA on top of a deep neural network. This can be seen as a
non-linear extension of classic LDA. Instead of maximizing the likelihood of
target labels for individual samples, we propose an objective function that
pushes the network to produce feature distributions which: (a) have low
variance within the same class and (b) high variance between different classes.
Our objective is derived from the general LDA eigenvalue problem and still
allows training with stochastic gradient descent and back-propagation. For
evaluation we test our approach on three different benchmark datasets (MNIST,
CIFAR-10 and STL-10). DeepLDA produces competitive results on MNIST and
CIFAR-10 and outperforms a network trained with categorical cross entropy (same
architecture) in a supervised setting of STL-10.
Comment: Published as a conference paper at ICLR 2016
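As a rough illustration of the objective described above, here is a minimal NumPy sketch of an LDA-eigenvalue loss. The function names and the eps regularizer are my own choices for this sketch, not the authors' implementation:

```python
import numpy as np

def lda_scatter(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices of features X."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    return Sw, Sb

def deeplda_loss(X, y, k=1, eps=1e-3):
    """Negative mean of the k smallest of the C-1 leading generalized eigenvalues
    of Sb v = lam (Sw + eps I) v; minimizing it increases separation even along
    the worst-separated discriminant directions."""
    Sw, Sb = lda_scatter(X, y)
    L = np.linalg.cholesky(Sw + eps * np.eye(Sw.shape[0]))
    A = np.linalg.solve(L, Sb)            # inv(L) @ Sb
    M = np.linalg.solve(L, A.T).T         # inv(L) @ Sb @ inv(L).T (symmetric)
    evals = np.sort(np.linalg.eigvalsh(M))[::-1]
    lead = evals[:len(np.unique(y)) - 1]  # at most C-1 nonzero eigenvalues
    return -np.sort(lead)[:k].mean()
```

In the paper this loss is computed on mini-batch features and differentiated through the eigendecomposition; the sketch only shows the forward computation.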
Low Resolution Face Recognition Using a Two-Branch Deep Convolutional Neural Network Architecture
We propose a novel couple mappings method for low resolution face recognition
using deep convolutional neural networks (DCNNs). The proposed architecture
consists of two branches of DCNNs to map the high and low resolution face
images into a common space with nonlinear transformations. The branch
corresponding to transformation of high resolution images consists of 14 layers
and the other branch which maps the low resolution face images to the common
space includes a 5-layer super-resolution network connected to a 14-layer
network. The distance between the features of corresponding high and low
resolution images is backpropagated to train the networks. Our proposed method
is evaluated on FERET data set and compared with state-of-the-art competing
methods. Our extensive experimental results show that the proposed method
significantly improves the recognition performance especially for very low
resolution probe face images (11.4% improvement in recognition accuracy).
Furthermore, it can reconstruct a high resolution image from its corresponding
low resolution probe image that is comparable with state-of-the-art
super-resolution methods in terms of visual quality.
Comment: 11 pages, 8 figures
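The coupled-mapping training signal can be sketched with single linear maps standing in for the two (much deeper) branches. `branch`, `coupling_loss`, and `train_low_branch` are hypothetical names for illustration only, not the paper's code:

```python
import numpy as np

def branch(W, X):
    """Toy stand-in for a DCNN branch: a single linear map into the common space."""
    return X @ W

def coupling_loss(W_hi, W_lo, X_hi, X_lo):
    """Mean squared distance between paired high- and low-resolution embeddings;
    the paper backpropagates exactly this distance through both branches."""
    d = branch(W_hi, X_hi) - branch(W_lo, X_lo)
    return np.mean(np.sum(d * d, axis=1))

def train_low_branch(W_hi, W_lo, X_hi, X_lo, lr=0.01, steps=100):
    """Fit the low-resolution branch to match a frozen high-resolution branch
    by gradient descent on the coupling loss."""
    for _ in range(steps):
        d = branch(W_hi, X_hi) - branch(W_lo, X_lo)
        # dL/dW_lo = -2/N * X_lo.T @ d, so the descent step adds this term back
        W_lo = W_lo + lr * 2.0 * X_lo.T @ d / len(X_lo)
    return W_lo
```

The real architecture maps the two resolutions through 14-layer (and 5+14-layer) networks; the toy linear branches only illustrate how the distance drives both mappings toward a common space.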
Improving Efficiency in Convolutional Neural Network with Multilinear Filters
The excellent performance of deep neural networks has enabled us to solve
several automation problems, opening an era of autonomous devices. However,
current deep net architectures are heavy with millions of parameters and
require billions of floating point operations. Several works have been
developed to compress a pre-trained deep network to reduce memory footprint
and, possibly, computation. Instead of compressing a pre-trained network, in
this work, we propose a generic neural network layer structure employing
multilinear projection as the primary feature extractor. The proposed
architecture requires several times less memory than traditional
Convolutional Neural Networks (CNNs), while inheriting the design principles
of a CNN. In addition, the proposed architecture is equipped with
two computation schemes that enable computation reduction or scalability.
Experimental results show the effectiveness of our compact projection, which
outperforms traditional CNNs while requiring far fewer parameters.
Comment: 10 pages, 3 figures
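A minimal NumPy sketch of multilinear projection as a feature extractor, under the assumption that the filter is a rank-1 (or rank-R CP) kernel; the names are illustrative, not the paper's code:

```python
import numpy as np

def multilinear_response(patch, u, v, w):
    """Response of a rank-1 multilinear filter on an (H, W, C) patch: contract
    each mode with its own small vector, storing H + W + C parameters instead
    of the H * W * C weights of a full convolutional kernel."""
    return np.einsum('hwc,h,w,c->', patch, u, v, w)

def rank_r_response(patch, U, V, W):
    """Sum of R rank-1 terms (columns of U, V, W): a higher-capacity
    multilinear filter that still uses far fewer parameters than a full kernel."""
    return np.einsum('hwc,hr,wr,cr->', patch, U, V, W)
```

Sliding either response over an image reproduces a convolution whose kernel is the outer product of the factors, which is where the memory saving comes from.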
Deep Linear Discriminant Analysis on Fisher Networks: A Hybrid Architecture for Person Re-identification
Person re-identification seeks a correct match for a person of interest
across views among a large number of impostors. It typically involves two
procedures: non-linear feature extraction against dramatic appearance
changes, and subsequent discriminative analysis that reduces intra-personal
variations while enlarging inter-personal differences. In this paper,
we introduce a hybrid architecture which combines Fisher vectors and deep
neural networks to learn non-linear representations of person images in a
space where the data are linearly separable. We impose Linear Discriminant
Analysis (LDA) on top of the deep neural network so that linearly separable
latent representations can be learnt in an end-to-end fashion. By optimizing an
objective function modified from LDA, the network is forced to produce
feature distributions which have a low variance within the same class and high
variance between classes. The objective is essentially derived from the general
LDA eigenvalue problem and allows training the network with stochastic gradient
descent, back-propagating LDA gradients to compute the gradients involved in
Fisher vector encoding. For evaluation we test our approach on four benchmark
data sets in person re-identification (VIPeR [1], CUHK03 [2], CUHK01 [3], and
Market1501 [4]). Extensive experiments on these benchmarks show that our model
can achieve state-of-the-art results.
Comment: 12 pages
Natural Image Manipulation for Autoregressive Models Using Fisher Scores
Deep autoregressive models are among the most powerful models available today,
achieving state-of-the-art bits per dim. However, they are at a
strict disadvantage when it comes to controlled sample generation compared to
latent variable models. Latent variable models such as VAEs and normalizing
flows allow meaningful semantic manipulations in latent space, a capability
that autoregressive models lack. In this paper, we propose using Fisher
scores as a method to extract embeddings from an autoregressive model to use
for interpolation and show that our method provides more meaningful sample
manipulation compared to alternate embeddings such as network activations.
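Fisher scores are gradients of the model log-likelihood with respect to its parameters. As a minimal sketch, a diagonal Gaussian stands in here for the deep autoregressive model whose scores the paper extracts; all names are illustrative:

```python
import numpy as np

def fisher_score_gaussian(x, mu, sigma):
    """Fisher score of a diagonal Gaussian: gradient of log p(x) with respect
    to the parameters (mu, log sigma). log p = -log sigma - (x-mu)^2/(2 sigma^2)
    + const, so d/dmu = (x-mu)/sigma^2 and d/dlog sigma = (x-mu)^2/sigma^2 - 1."""
    d_mu = (x - mu) / sigma**2
    d_logsig = (x - mu)**2 / sigma**2 - 1.0
    return np.concatenate([d_mu, d_logsig])

def interpolate(e0, e1, t):
    """Linear interpolation in score space, the operation used for manipulation."""
    return (1.0 - t) * e0 + t * e1
```

In the paper the score of a trained autoregressive model plays the role of an embedding; interpolated scores are then mapped back to images, a decoding step this sketch omits.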
Text Classification Algorithms: A Survey
In recent years, there has been an exponential growth in the number of
complex documents and texts that require a deeper understanding of machine
learning methods to classify texts accurately in many applications.
Many machine learning approaches have achieved strong results in natural
language processing. The success of these learning algorithms relies on their
capacity to understand complex models and non-linear relationships within data.
However, finding suitable structures, architectures, and techniques for text
classification is a challenge for researchers. In this paper, a brief overview
of text classification algorithms is discussed. This overview covers different
text feature extractions, dimensionality reduction methods, existing algorithms
and techniques, and evaluation methods. Finally, the limitations of each
technique and its applications to real-world problems are discussed.
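A toy end-to-end pipeline of the kind the survey covers (feature extraction followed by a classifier). This TF-IDF plus nearest-centroid sketch is illustrative only and does not mirror any surveyed system exactly:

```python
import numpy as np

def tfidf(docs):
    """Minimal TF-IDF feature extraction with a smoothed idf term."""
    vocab = sorted({w for d in docs for w in d.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    df = np.zeros(len(vocab))
    for d in docs:
        for w in set(d.split()):
            df[idx[w]] += 1
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.split():
            X[i, idx[w]] += 1
        X[i] *= idf
    return X, vocab

def nearest_centroid_predict(X_train, y_train, X_test):
    """Minimal classifier: assign each test vector to the closest class centroid."""
    classes = sorted(set(y_train))
    cents = np.array([X_train[np.array(y_train) == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - cents[None, :, :])**2).sum(axis=2)
    return [classes[j] for j in dist.argmin(axis=1)]
```

Any of the surveyed feature extractors (word embeddings, n-grams) and classifiers (SVMs, deep nets) could replace the two stages here; the point is only the pipeline shape.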
Efficient Gender Classification Using a Deep LDA-Pruned Net
Many real-time tasks, such as human-computer interaction, require fast and
efficient facial gender classification. Although deep CNN nets have been very
effective for a multitude of classification tasks, their high space and time
demands make them impractical for personal computers and mobile devices without
a powerful GPU. In this paper, we develop a 16-layer, yet lightweight, neural
network which boosts efficiency while maintaining high accuracy. Our net is
pruned from the VGG-16 model starting from the last convolutional (conv) layer
where we find neuron activations are highly uncorrelated given the gender.
Through Fisher's Linear Discriminant Analysis (LDA), we show that this high
decorrelation makes it safe to directly discard last conv layer neurons with
high within-class variance and low between-class variance. Combined with either
Support Vector Machines (SVM) or Bayesian classification, the reduced CNNs are
capable of achieving accuracies on the LFW and CelebA datasets comparable to
(or even higher than) those of the original net with fully connected layers. On LFW, only
four Conv5_3 neurons are able to maintain a comparably high recognition
accuracy, which reduces the total network size by a factor of 70x with an
11-fold speedup. Comparisons with a state-of-the-art pruning method as
well as two smaller nets in terms of accuracy loss and convolutional layers
pruning rate are also provided.
Comment: The only difference from the previous version v2 is the title on the
arxiv page. I am changing it back to the original title of v1 because
otherwise Google Scholar cannot track the citations to this arxiv paper
correctly. You could cite either the conference version or this arxiv
version; they are equivalent.
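The LDA-based pruning criterion can be sketched as a per-neuron Fisher ratio. This is a simplified reading of the approach (the paper works with Conv5_3 activations of VGG-16), with illustrative function names:

```python
import numpy as np

def fisher_ratio(acts, labels):
    """Per-neuron Fisher discriminant ratio: between-class variance of the
    neuron's activation over its within-class variance (large = discriminative,
    small = safe to discard under the paper's criterion)."""
    labels = np.asarray(labels)
    mu = acts.mean(axis=0)
    between = np.zeros(acts.shape[1])
    within = np.zeros(acts.shape[1])
    for c in np.unique(labels):
        A = acts[labels == c]
        between += len(A) * (A.mean(axis=0) - mu)**2
        within += ((A - A.mean(axis=0))**2).sum(axis=0)
    return between / (within + 1e-12)

def prune(acts, labels, keep):
    """Indices of the `keep` most discriminative neurons."""
    return np.argsort(fisher_ratio(acts, labels))[::-1][:keep]
```

In the paper the surviving neurons then feed an SVM or Bayesian classifier in place of the fully connected layers.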
Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks
In this paper, we propose a novel model for high-dimensional data, called the
Hybrid Orthogonal Projection and Estimation (HOPE) model, which combines a
linear orthogonal projection and a finite mixture model under a unified
generative modeling framework. The HOPE model itself can be learned
unsupervised from unlabelled data based on maximum likelihood estimation as
well as discriminatively from labelled data. More interestingly, we have shown
the proposed HOPE models are closely related to neural networks (NNs) in the
sense that each hidden layer can be reformulated as a HOPE model. As a result,
the HOPE framework can be used as a novel tool to probe why and how NNs work
and, more importantly, to learn NNs in either supervised or unsupervised ways. In
this work, we have investigated the HOPE framework to learn NNs for several
standard tasks, including image recognition on MNIST and speech recognition on
TIMIT. Experimental results have shown that the HOPE framework yields
significant performance gains over the current state-of-the-art methods in
various types of NN learning problems, including unsupervised feature learning
and supervised or semi-supervised learning.
Comment: 31 pages, 5 figures, technical report
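A minimal sketch of HOPE's linear building block under one simple reading: a projection whose rows are pushed toward orthonormality by a penalty term. The exact penalty form and names here are assumptions for illustration:

```python
import numpy as np

def ortho_penalty(U):
    """Frobenius penalty ||U U^T - I||_F^2 driving the rows of the projection
    U toward orthonormality, so the linear stage stays an orthogonal projection."""
    G = U @ U.T
    return np.sum((G - np.eye(U.shape[0]))**2)

def hope_project(U, X):
    """Project data X onto the low-dimensional subspace spanned by the rows of
    U; in HOPE a finite mixture model is then fit in this projected space."""
    return X @ U.T
```

With an orthonormal U the penalty vanishes and the projection preserves distances within the subspace, which is what lets each hidden layer be reinterpreted as a HOPE model.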
Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition
Linear Discriminant Analysis (LDA) has been used as a standard
post-processing procedure in many state-of-the-art speaker recognition tasks.
Through maximizing the inter-speaker difference and minimizing the
intra-speaker variation, LDA projects i-vectors to a lower-dimensional and more
discriminative sub-space. In this paper, we propose a neural network based
compensation scheme (termed deep discriminant analysis, DDA) for i-vector
based speaker recognition, which shares the spirit with LDA. Optimized against
softmax loss and center loss at the same time, the proposed method learns a
more compact and discriminative embedding space. Compared with the Gaussian
distribution assumption of the data and the learnt linear projection in LDA,
the proposed method does not impose any assumptions on the data and can learn a non-linear
projection function. Experiments are carried out on a short-duration
text-independent dataset based on the SRE Corpus, and noticeable performance
improvements are observed over the standard LDA and PLDA methods.
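The joint objective DDA optimizes (softmax loss plus center loss) can be sketched in a few lines; the `lam` weighting and all names are illustrative, not the paper's code:

```python
import numpy as np

def softmax_loss(logits, labels):
    """Mean cross-entropy of softmax outputs (the discriminative term)."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def center_loss(emb, labels, centers):
    """Mean squared distance of each embedding to its class center
    (the compactness term, analogous to minimizing intra-speaker variation)."""
    return np.mean(np.sum((emb - centers[labels])**2, axis=1))

def dda_loss(logits, emb, labels, centers, lam=0.5):
    """Joint objective: cross-entropy plus lam times the center loss."""
    return softmax_loss(logits, labels) + lam * center_loss(emb, labels, centers)
```

The center loss pulls same-speaker embeddings together while the softmax term keeps speakers apart, mirroring the within/between trade-off of LDA without its Gaussian and linearity assumptions.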
Multi-view Laplacian Eigenmaps Based on Bag-of-Neighbors For RGBD Human Emotion Recognition
Human emotion recognition is an important direction in the field of biometric
and information forensics. However, most existing human emotion research is
based on a single RGB view. In this paper, we introduce an RGBD video-emotion
dataset and an RGBD face-emotion dataset for research. To the best of our
knowledge, this may be the first RGBD video-emotion dataset. We propose a new supervised
nonlinear multi-view Laplacian eigenmaps (MvLE) approach and a
multi-hidden-layer out-of-sample network (MHON) for RGB-D human emotion
recognition. To get better representations of the RGB view and depth view, MvLE is
used to map the training set of both views from original space into the common
subspace. As the RGB view and depth view lie in different spaces, a new distance
metric, bag-of-neighbors (BON), used in MvLE yields similar distributions for
the two views. Finally, MHON is used to get the low-dimensional representations
of test data and predict their labels. MvLE can handle cases in which the RGB
view and the depth view have different feature sizes, or even different numbers
of samples and classes, and our methods can easily be extended to more than two
views. The experimental results indicate the effectiveness of our methods over
some state-of-the-art methods.
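One plausible reading of the bag-of-neighbors (BON) metric is a per-sample label histogram over in-view nearest neighbors, which is comparable across views even when the raw feature spaces differ. This interpretation is an assumption for illustration, not taken from the paper:

```python
import numpy as np

def bag_of_neighbors(X, y, k=3):
    """For each sample in one view, the normalized label histogram of its k
    nearest neighbors within that view. Histograms from different views share
    the same dimensionality (number of classes), so they can be compared even
    when the views' feature sizes differ."""
    y = np.asarray(y)
    n_cls = len(np.unique(y))
    D = ((X[:, None, :] - X[None, :, :])**2).sum(axis=2)
    np.fill_diagonal(D, np.inf)   # exclude each sample from its own neighbors
    H = np.zeros((len(X), n_cls))
    for i in range(len(X)):
        for j in np.argsort(D[i])[:k]:
            H[i, y[j]] += 1.0
    return H / k
```

Under this reading, a sample deep inside a class cluster gets a histogram concentrated on that class in both views, which is what lets MvLE align the two distributions.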