Feature Selection via Sparse Approximation for Face Recognition
Inspired by biological vision systems, over-complete local features of huge
cardinality have been increasingly used for face recognition in recent
decades. Accordingly, feature selection has become increasingly important,
playing a critical role in face data description and recognition. In this paper,
we propose a trainable feature selection algorithm based on a regularization
framework for face recognition. By enforcing a sparsity penalty term on the minimum
squared error (MSE) criterion, we cast the feature selection problem into a
combinatorial sparse approximation problem, which can be solved by greedy
methods or convex relaxation methods. Moreover, based on the same framework, we
propose a sparse Ho-Kashyap (HK) procedure to simultaneously obtain the optimal
sparse solution and the corresponding margin vector of the MSE criterion. The
proposed methods are used for selecting the most informative Gabor features of
face images for recognition and the experimental results on benchmark face
databases demonstrate the effectiveness of the proposed methods.
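The combinatorial MSE-plus-sparsity problem described above is typically attacked with greedy pursuit. As an illustrative sketch only (not the paper's implementation), an orthogonal-matching-pursuit-style feature selector could look like this:

```python
import numpy as np

def omp_feature_selection(X, y, k):
    """Greedy (OMP-style) selection of k features minimizing squared error.

    X: (n_samples, n_features) feature matrix, y: (n_samples,) targets.
    Returns indices of selected features and the least-squares coefficients.
    """
    residual = y.copy()
    selected = []
    for _ in range(k):
        # Pick the feature most correlated with the current residual.
        scores = np.abs(X.T @ residual)
        scores[selected] = -np.inf          # do not re-select
        selected.append(int(np.argmax(scores)))
        # Re-fit least squares on the current support.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    return selected, coef
```

Each round adds the single feature that best explains the remaining residual, which is the greedy relaxation of the combinatorial subset-selection problem.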
Winner-Take-All Autoencoders
In this paper, we propose a winner-take-all method for learning hierarchical
sparse representations in an unsupervised fashion. We first introduce
fully-connected winner-take-all autoencoders which use mini-batch statistics to
directly enforce a lifetime sparsity in the activations of the hidden units. We
then propose the convolutional winner-take-all autoencoder which combines the
benefits of convolutional architectures and autoencoders for learning
shift-invariant sparse representations. We describe a way to train
convolutional autoencoders layer by layer, where in addition to lifetime
sparsity, a spatial sparsity within each feature map is achieved using
winner-take-all activation functions. We show that winner-take-all
autoencoders can be used to learn deep sparse representations from the
MNIST, CIFAR-10, ImageNet, Street View House Numbers and Toronto Face datasets,
and achieve competitive classification performance.
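The lifetime-sparsity mechanism, keeping for each hidden unit only its strongest activations across a mini-batch, can be sketched as follows (a minimal NumPy illustration, not the authors' training code):

```python
import numpy as np

def lifetime_sparsity(h, k):
    """Winner-take-all lifetime sparsity: for each hidden unit (column),
    keep only its k largest activations across the mini-batch, zero the rest.

    h: (batch, hidden) activation matrix; returns a sparsified copy.
    """
    out = np.zeros_like(h)
    # Row indices of the k largest entries in each column.
    top = np.argpartition(h, -k, axis=0)[-k:]           # shape (k, hidden)
    cols = np.arange(h.shape[1])
    out[top, cols] = h[top, cols]
    return out
```

During training, gradients then flow only through the surviving "winner" activations, which is what enforces the sparsity without an explicit penalty term.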
Sparse Architectures for Text-Independent Speaker Verification Using Deep Neural Networks
Network pruning is of great importance because it eliminates unimportant
weights or features activated as a result of network over-parametrization.
Advantages of sparsity enforcement include preventing overfitting and speeding
up computation. Given the large number of parameters in deep architectures,
network compression becomes critically important due to the huge amount of
computational power required. In this work, we impose structured sparsity for
speaker verification, i.e., the validation of a query speaker against a
speaker gallery. We show that sparsity enforcement alone can improve
verification results by counteracting possible initial overfitting in the
network.
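The abstract does not spell out its exact regularizer. One common way to impose structured sparsity on a weight matrix is a row-wise group-lasso proximal step, which prunes whole neurons; the following is a generic sketch under that assumed row-group structure:

```python
import numpy as np

def group_lasso_prox(W, lam):
    """Proximal step for a row-wise group-lasso penalty lam * sum_i ||W[i]||_2.

    Shrinks each row of W toward zero; rows whose norm falls below lam are
    zeroed entirely, pruning the corresponding neuron.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return W * scale
```

Applied after each gradient step, this drives entire rows (neurons) to exact zero rather than scattering small weights across the matrix, which is what makes the resulting sparsity "structured" and exploitable for speedup.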
Deep Component Analysis via Alternating Direction Neural Networks
Despite a lack of theoretical understanding, deep neural networks have
achieved unparalleled performance in a wide range of applications. On the other
hand, shallow representation learning with component analysis is associated
with rich intuition and theory, but smaller capacity often limits its
usefulness. To bridge this gap, we introduce Deep Component Analysis (DeepCA),
an expressive multilayer model formulation that enforces hierarchical structure
through constraints on latent variables in each layer. For inference, we
propose a differentiable optimization algorithm implemented using recurrent
Alternating Direction Neural Networks (ADNNs) that enable parameter learning
using standard backpropagation. By interpreting feed-forward networks as
single-iteration approximations of inference in our model, we provide both a
novel theoretical perspective for understanding them and a practical technique
for constraining predictions with prior knowledge. Experimentally, we
demonstrate performance improvements on a variety of tasks, including
single-image depth prediction with sparse output constraints.
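As a generic illustration of the unrolling idea (the exact ADNN updates are not reproduced here), proximal-gradient inference for nonnegative sparse coding collapses to a single feed-forward ReLU layer when run for one iteration:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def unrolled_inference(D, x, lam=0.1, step=None, iters=20):
    """Unrolled proximal-gradient inference for nonnegative sparse coding:
    min_z 0.5*||x - D z||^2 + lam*||z||_1  subject to  z >= 0.

    With iters=1 and z0 = 0 this reduces to a single feed-forward layer,
    relu(step * (D.T @ x) - step * lam).
    """
    if step is None:
        step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        # Gradient step on the data term, then the nonnegative soft-threshold.
        z = relu(z - step * (D.T @ (D @ z - x)) - step * lam)
    return z
```

Running more iterations refines the same inference that one feed-forward pass only approximates, which is the interpretation the abstract attributes to standard networks.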
Dictionary Learning and Sparse Coding on Statistical Manifolds
In this paper, we propose a novel information theoretic framework for
dictionary learning (DL) and sparse coding (SC) on a statistical manifold (the
manifold of probability distributions). Unlike the traditional DL and SC
framework, our new formulation does not explicitly incorporate any sparsity
inducing norm in the cost function being optimized but yet yields sparse codes.
Our algorithm approximates the data points on the statistical manifold (which
are probability distributions) by the weighted Kullback-Leibler center/mean
(KL-center) of the dictionary atoms. The KL-center is defined as the minimizer
of the maximum KL-divergence between itself and members of the set whose center
is being sought. Further, we prove that the weighted KL-center is a sparse
combination of the dictionary atoms. This result also holds for the case when
the KL-divergence is replaced by the well known Hellinger distance. From an
applications perspective, we present an extension of the aforementioned
framework to the manifold of symmetric positive definite matrices (which can be
identified with the manifold of zero-mean Gaussian distributions). We present
experiments involving a variety of dictionary-based
reconstruction and classification problems in Computer Vision. Performance of
the proposed algorithm is demonstrated by comparing it to several
state-of-the-art methods in terms of reconstruction and classification accuracy
as well as sparsity of the chosen representation.
Comment: arXiv admin note: substantial text overlap with arXiv:1604.0693
Enforcing Template Representability and Temporal Consistency for Adaptive Sparse Tracking
Sparse representation has been widely studied in visual tracking, which has
shown promising tracking performance. Despite a lot of progress, the visual
tracking problem is still a challenging task due to appearance variations over
time. In this paper, we propose a novel sparse tracking algorithm that well
addresses temporal appearance changes, by enforcing template representability
and temporal consistency (TRAC). By modeling temporal consistency, our
algorithm addresses the issue of drifting away from a tracking target. By
exploring the templates' long- and short-term representability, the proposed
method adaptively updates the dictionary using the most descriptive templates,
which significantly improves the robustness to target appearance changes. We
compare our TRAC algorithm against the state-of-the-art approaches on 12
challenging benchmark image sequences. Both qualitative and quantitative
results demonstrate that our algorithm significantly outperforms previous
state-of-the-art trackers.
Comment: 8 pages. It has been accepted for publication in the 25th International
Joint Conference on Artificial Intelligence (IJCAI-16).
An efficient supervised dictionary learning method for audio signal recognition
Machine hearing or listening is an emerging area. Conventional approaches
rely on handcrafted features specialized to a specific audio task that can
hardly generalize to other audio fields. For example,
Mel-Frequency Cepstral Coefficients (MFCCs) and its variants were successfully
applied to computational auditory scene recognition while Chroma vectors are
good at music chord recognition. Unfortunately, these predefined features may
have variable discriminative power when extended to other tasks, or even
within the same task, owing to the differing nature of clips. Motivated by the
need for a principled framework applicable across domains in machine
listening, we propose a generic and data-driven representation learning
approach. To this end, a novel and efficient supervised dictionary learning
method is presented.
The method learns dissimilar dictionaries, one per each class, in order to
extract heterogeneous information for classification. In other words, we are
seeking to minimize the intra-class homogeneity and maximize class
separability. This is made possible by promoting pairwise orthogonality between
class specific dictionaries and controlling the sparsity structure of the audio
clip's decomposition over these dictionaries. The resulting optimization
problem is non-convex and solved using a proximal gradient descent method.
Experiments are performed on both computational auditory scene (East Anglia and
Rouen) and synthetic music chord recognition datasets. The results show that
our method reaches the performance of state-of-the-art hand-crafted features
on both applications.
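The pairwise orthogonality promoted between class-specific dictionaries can be illustrated by a simple incoherence penalty; this is a hypothetical stand-in for the paper's exact term, written only to make the idea concrete:

```python
import numpy as np

def incoherence_penalty(dicts):
    """Pairwise-orthogonality (incoherence) penalty between class dictionaries:
    sum over i < j of ||D_i.T @ D_j||_F^2.

    The penalty is zero exactly when every pair of class dictionaries has
    mutually orthogonal atoms, pushing classes to carry dissimilar information.
    """
    total = 0.0
    for i in range(len(dicts)):
        for j in range(i + 1, len(dicts)):
            total += np.linalg.norm(dicts[i].T @ dicts[j], 'fro') ** 2
    return total
```

Adding such a term to a dictionary-learning objective discourages two classes from reconstructing each other's signals, which is one way to reduce intra-class homogeneity across dictionaries.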
Structured Dictionary Learning for Classification
Sparsity driven signal processing has gained tremendous popularity in the
last decade. At its core, the assumption is that the signal of interest is
sparse with respect to either a fixed transformation or a signal dependent
dictionary. To better capture the data characteristics, various dictionary
learning methods have been proposed for both reconstruction and classification
tasks. For classification particularly, most approaches proposed so far have
focused on designing explicit constraints on the sparse code to improve
classification accuracy while simply adopting the ℓ0-norm or ℓ1-norm for
sparsity regularization. Motivated by the success of structured sparsity in the
area of Compressed Sensing, we propose a structured dictionary learning
framework (StructDL) that incorporates the structure information on both group
and task levels in the learning process. Its benefits are two-fold: (i) the
label consistency between dictionary atoms and training data is implicitly
enforced; and (ii) the classification performance is more robust in the cases
of a small dictionary size or limited training data than other techniques.
Using the subspace model, we derive the conditions for StructDL to guarantee
the performance and show theoretically that StructDL is superior to ℓ0-norm
or ℓ1-norm regularized dictionary learning for classification. Extensive
experiments have been performed on both synthetic simulations and real world
applications, such as face recognition and object classification, to
demonstrate the validity of the proposed DL framework.
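One way group-level structure ties sparse codes to labels is by restricting a signal's support to a single class sub-dictionary. The following toy classifier illustrates that idea; it is not the StructDL algorithm itself:

```python
import numpy as np

def group_sparse_classify(D_groups, x):
    """Classify x by the class sub-dictionary that reconstructs it best.

    A crude stand-in for group-sparse coding: the code is restricted to one
    group's atoms at a time (least squares within each group), and the label
    of the lowest-error group is returned.
    """
    errs = []
    for D in D_groups:
        c, *_ = np.linalg.lstsq(D, x, rcond=None)
        errs.append(np.linalg.norm(x - D @ c))
    return int(np.argmin(errs))
```

Restricting the support to whole groups is what makes small dictionaries and limited training data less damaging: the competition happens between class subspaces rather than between individual atoms.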
Task-Driven Dictionary Learning for Hyperspectral Image Classification with Structured Sparsity Constraints
Sparse representation models a signal as a linear combination of a small
number of dictionary atoms. As a generative model, it requires the dictionary
to be highly redundant in order to ensure both a stable high sparsity level and
a low reconstruction error for the signal. However, in practice, this
requirement is usually impaired by the lack of labelled training samples.
Fortunately, previous research has shown that the requirement for a redundant
dictionary can be less rigorous if simultaneous sparse approximation is
employed, which can be carried out by enforcing various structured sparsity
constraints on the sparse codes of the neighboring pixels. In addition,
numerous works have shown that applying a variety of dictionary learning
methods for the sparse representation model can also improve the classification
performance. In this paper, we highlight the task-driven dictionary learning
algorithm, which is a general framework for the supervised dictionary learning
method. We propose to enforce structured sparsity priors on the task-driven
dictionary learning method in order to improve the performance of
hyperspectral classification. Our approach is able to benefit from both the
advantages of the simultaneous sparse representation and those of the
supervised dictionary learning. We enforce two different structured sparsity
priors, the joint and Laplacian sparsity, on the task-driven dictionary
learning method and provide the details of the corresponding optimization
algorithms. Experiments on numerous popular hyperspectral images demonstrate
that the classification performance of our approach is superior to that of a
sparse representation classifier with structured priors or of the task-driven
dictionary learning method alone.
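Joint sparsity over neighboring pixels is classically realized by simultaneous sparse approximation, e.g. a simultaneous-OMP pass that forces all pixels to share the same atoms. The following is a minimal sketch of that idea, not the paper's solver:

```python
import numpy as np

def somp(D, Y, k):
    """Simultaneous OMP: approximate all columns of Y (neighboring pixels)
    with the SAME k dictionary atoms (joint sparsity).

    D: (m, n_atoms) with unit-norm columns, Y: (m, n_pixels).
    Returns the shared atom indices and the coefficient matrix.
    """
    R = Y.copy()
    support = []
    for _ in range(k):
        # Atom most correlated with the residuals of all pixels jointly.
        scores = np.linalg.norm(D.T @ R, axis=1)
        scores[support] = -np.inf
        support.append(int(np.argmax(scores)))
        C, *_ = np.linalg.lstsq(D[:, support], Y, rcond=None)
        R = Y - D[:, support] @ C
    return support, C
```

Because the atom score aggregates correlations over all pixels in the neighborhood, atoms that explain only one noisy pixel are unlikely to be chosen, which is the practical benefit of the joint-sparsity prior.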
Learning and Evaluating Sparse Interpretable Sentence Embeddings
Previous research on word embeddings has shown that sparse representations,
which can be either learned on top of existing dense embeddings or obtained
through model constraints during training, have the benefit of increased
interpretability: to some degree, each dimension can be understood
by a human and associated with a recognizable feature in the data. In this
paper, we transfer this idea to sentence embeddings and explore several
approaches to obtain a sparse representation. We further introduce a novel,
quantitative and automated evaluation metric for sentence embedding
interpretability, based on topic coherence methods. We observe an increase in
interpretability compared to dense models, on a dataset of movie dialogs and on
the scene descriptions from the MS COCO dataset.
Comment: Will be presented at the workshop "Analyzing and interpreting neural
networks for NLP", co-located with the EMNLP 2018 conference in Brussels.
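Among the approaches for obtaining sparse representations on top of dense embeddings, the simplest is post-hoc top-k sparsification per sentence. The paper explores several methods; the sketch below shows only this crudest baseline:

```python
import numpy as np

def sparsify_embeddings(E, k):
    """Post-hoc sparsification of dense sentence embeddings: keep only each
    row's k largest-magnitude dimensions and zero the rest.

    E: (n_sentences, dim) embedding matrix; returns a sparsified copy.
    """
    out = np.zeros_like(E)
    idx = np.argpartition(np.abs(E), -k, axis=1)[:, -k:]
    rows = np.arange(E.shape[0])[:, None]
    out[rows, idx] = E[rows, idx]
    return out
```

With only a few active dimensions per sentence, each dimension's meaning can be probed by inspecting the sentences that activate it, which is what the topic-coherence-based interpretability metric then quantifies.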