Robust Sparse Coding via Self-Paced Learning
Sparse coding (SC) is attracting more and more attention due to its
comprehensive theoretical studies and its excellent performance in many signal
processing applications. However, most existing sparse coding algorithms are
nonconvex and thus prone to getting stuck in bad local minima,
especially in the presence of outliers and noisy data. To enhance learning
robustness, in this paper we propose a unified framework named Self-Paced
Sparse Coding (SPSC), which gradually includes matrix elements in SC learning,
from easy to complex. We also generalize the self-paced learning scheme to
different levels of dynamic selection, on samples, features and elements
respectively. Experimental results on real-world data demonstrate the efficacy
of the proposed algorithms.
Comment: submitted to AAAI201
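The self-paced mechanism described above can be illustrated compactly. Below is a minimal sketch in Python, assuming an element-wise hard weighting on the squared reconstruction residual and a plain ISTA inner solver; the function names, step size and the threshold schedule (lam, growth) are illustrative assumptions, not the authors' exact SPSC formulation.

```python
# Hedged sketch of element-wise self-paced sparse coding (not the exact SPSC
# algorithm): low-loss matrix elements are included first, and the threshold
# lam grows so harder elements are admitted in later rounds.
import numpy as np

def ista_step(D, X, A, W, alpha, step):
    """One weighted ISTA step on min_A 0.5*||W * (X - D A)||_F^2 + alpha*||A||_1."""
    G = D.T @ (W * (D @ A - X))                      # gradient w.r.t. the codes A
    A = A - step * G
    return np.sign(A) * np.maximum(np.abs(A) - step * alpha, 0.0)  # soft threshold

def self_paced_sparse_coding(X, D, alpha=0.1, lam=0.05, growth=1.3,
                             outer_iters=10, inner_iters=50):
    A = np.zeros((D.shape[1], X.shape[1]))
    step = 1.0 / np.linalg.norm(D, 2) ** 2           # crude step from a Lipschitz bound
    W = np.ones_like(X)                              # start with every element selected
    for _ in range(outer_iters):
        for _ in range(inner_iters):                 # solve codes for selected elements
            A = ista_step(D, X, A, W, alpha, step)
        loss = (X - D @ A) ** 2                      # per-element reconstruction loss
        W = (loss < lam).astype(float)               # keep only the "easy" elements
        lam *= growth                                # admit harder elements next round
    return A, W

# toy usage
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 40)); D /= np.linalg.norm(D, axis=0)
X = D @ (rng.standard_normal((40, 30)) * (rng.random((40, 30)) < 0.1))
A, W = self_paced_sparse_coding(X, D)
```

The mask W is what realizes the "from easy to complex" schedule: high-loss (potentially outlying) elements are excluded early and only enter once the threshold has grown.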
Learning Less-Overlapping Representations
In representation learning (RL), making the learned representations easy
to interpret and less overfitted to the training data are two important but
challenging issues. To address these problems, we study a new type of
regularization approach that encourages the supports of weight vectors in RL
models to have small overlap, by simultaneously promoting near-orthogonality
among vectors and sparsity of each vector. We apply the proposed regularizer to
two models: neural networks (NNs) and sparse coding (SC), and develop an
efficient ADMM-based algorithm for regularized SC. Experiments on various
datasets demonstrate that weight vectors learned under our regularizer are more
interpretable and have better generalization performance.
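As a rough illustration of such a regularizer, the sketch below combines a Gram-matrix near-orthogonality term with an L1 term on the weight vectors; the exact penalty and its weights are assumptions for illustration, not the paper's formulation or its ADMM solver.

```python
# Hedged sketch of a "less-overlapping" penalty: near-orthogonality of the weight
# vectors (small off-diagonal Gram entries) plus per-vector sparsity.
import numpy as np

def less_overlap_penalty(W, ortho_weight=1.0, l1_weight=0.1):
    """W: (n_vectors, dim) matrix with one weight vector per row."""
    G = W @ W.T                                       # Gram matrix of the weight vectors
    ortho = np.sum((G - np.eye(W.shape[0])) ** 2)     # pushes vectors toward orthonormality
    sparsity = np.sum(np.abs(W))                      # pushes each vector toward sparsity
    return ortho_weight * ortho + l1_weight * sparsity
```

Adding this penalty to a model's training loss discourages different weight vectors from placing large entries on the same coordinates, which is what yields small support overlap.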
Saturating Auto-Encoders
We introduce a simple new regularizer for auto-encoders whose hidden-unit
activation functions contain at least one zero-gradient (saturated) region.
This regularizer explicitly encourages activations in the saturated region(s)
of the corresponding activation function. We call these Saturating
Auto-Encoders (SATAE). We show that the saturation regularizer explicitly
limits the SATAE's ability to reconstruct inputs which are not near the data
manifold. Furthermore, we show that a wide variety of features can be learned
when different activation functions are used. Finally, connections are
established with the Contractive and Sparse Auto-Encoders.
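A minimal sketch of the saturation-penalty idea follows, assuming a saturated-linear activation clip(z, -1, 1) whose zero-gradient regions are |z| >= 1; the layer shapes and the weight eta are illustrative, not the paper's exact setup.

```python
# Hedged sketch of a saturating auto-encoder loss: reconstruction error plus a
# penalty on the distance of each pre-activation from the nearest saturated
# (zero-gradient) region of the activation function.
import numpy as np

def saturated_linear(z):
    return np.clip(z, -1.0, 1.0)

def saturation_penalty(z):
    """Distance of pre-activations z to the nearest zero-gradient region |z| >= 1."""
    return np.sum(np.maximum(0.0, 1.0 - np.abs(z)))

def satae_loss(x, W_enc, b_enc, W_dec, b_dec, eta=0.5):
    z = x @ W_enc + b_enc                  # hidden pre-activations
    h = saturated_linear(z)                # hidden code
    x_hat = h @ W_dec + b_dec              # reconstruction
    return np.sum((x - x_hat) ** 2) + eta * saturation_penalty(z)
```

Pushing pre-activations into the flat regions is what limits the auto-encoder's ability to reconstruct inputs far from the data manifold.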
A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning and its
aim is to leverage useful information contained in multiple related tasks to
help improve the generalization performance of all the tasks. In this paper, we
give a survey for MTL. First, we classify different MTL algorithms into several
categories, including feature learning approach, low-rank approach, task
clustering approach, task relation learning approach, and decomposition
approach, and then discuss the characteristics of each approach. In order to
improve the performance of learning tasks further, MTL can be combined with
other learning paradigms including semi-supervised learning, active learning,
unsupervised learning, reinforcement learning, multi-view learning and
graphical models. When the number of tasks is large or the data dimensionality
is high, batch MTL models have difficulty handling such settings, so online,
parallel and distributed MTL models, as well as dimensionality reduction and
feature hashing, are reviewed to reveal their computational and storage
advantages. Many real-world applications use MTL to boost their performance, and
we review representative works. Finally, we present theoretical analyses and
discuss several future directions for MTL.
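For concreteness, the feature learning approach mentioned above is often realized as hard parameter sharing: one representation shared by all tasks with task-specific output heads. The sketch below is a generic illustration of that pattern, not a model taken from the survey.

```python
# Hedged sketch of hard parameter sharing for multi-task learning: a shared
# nonlinear encoder followed by one linear head per task.
import numpy as np

class SharedRepresentationMTL:
    def __init__(self, in_dim, hidden_dim, n_tasks, seed=0):
        rng = np.random.default_rng(seed)
        self.W_shared = rng.standard_normal((in_dim, hidden_dim)) * 0.1
        self.heads = [rng.standard_normal(hidden_dim) * 0.1 for _ in range(n_tasks)]

    def predict(self, X, task_id):
        H = np.tanh(X @ self.W_shared)      # representation shared across all tasks
        return H @ self.heads[task_id]      # task-specific prediction
```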
Learning Efficient Structured Sparse Models
We present a comprehensive framework for structured sparse coding and
modeling extending the recent ideas of using learnable fast regressors to
approximate exact sparse codes. For this purpose, we develop a novel
block-coordinate proximal splitting method for the iterative solution of
hierarchical sparse coding problems, and show an efficient feed-forward
architecture derived from its iteration. This architecture faithfully
approximates the exact structured sparse codes with a fraction of the
complexity of the standard optimization methods. We also show that by using
different training objective functions, learnable sparse encoders are no longer
restricted to being mere approximants of the exact sparse code for a pre-given
dictionary, as in earlier formulations, but can rather be used as full-featured
sparse encoders or even modelers. A simple implementation shows several orders
of magnitude speedup compared to the state of the art at minimal performance
degradation, making the proposed framework suitable for real-time and
large-scale applications.
Comment: ICML201
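The "learnable fast regressor" idea can be sketched by unrolling a plain ISTA iteration into a fixed-depth encoder whose matrices are then trained; the hierarchical, structured proximal operator developed in the paper is replaced here by a simple soft threshold, so this shows only the general pattern.

```python
# Hedged sketch of an unrolled (LISTA-style) sparse encoder: a fixed number of
# ISTA-like layers with learnable matrices W_e, S and threshold theta.
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def unrolled_ista_encoder(x, W_e, S, theta, n_layers=3):
    """W_e ~ step * D^T, S ~ I - step * D^T D, theta ~ step * alpha (all learnable)."""
    z = soft_threshold(W_e @ x, theta)               # first layer
    for _ in range(n_layers - 1):                    # remaining unrolled iterations
        z = soft_threshold(W_e @ x + S @ z, theta)
    return z
```

Because the depth is fixed, the encoder has constant complexity regardless of how many iterations an exact solver would need.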
Compression of Deep Convolutional Neural Networks under Joint Sparsity Constraints
We consider the optimization of deep convolutional neural networks (CNNs)
such that they provide good performance while having reduced complexity if
deployed on either conventional systems utilizing spatial-domain convolution or
lower complexity systems designed for Winograd convolution. Furthermore, we
explore the universal quantization and compression of these networks. In
particular, the proposed framework produces one compressed model whose
convolutional filters can be made sparse either in the spatial domain or in the
Winograd domain. Hence, one compressed model can be deployed universally on any
platform, without need for re-training on the deployed platform, and the
sparsity of its convolutional filters can be exploited for further complexity
reduction in either domain. To get a better compression ratio, the sparse model
is compressed in the spatial domain, which has fewer parameters. In our
experiments, we obtain compressed models for ResNet-18, AlexNet and CT-SRCNN,
and their computational cost is also reduced.
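A rough sketch of what being sparse "in either domain" can mean for a single 3x3 filter: penalize its L1 norm both in the spatial domain and after the Winograd F(2x2, 3x3) filter transform G g G^T. The penalty form and its weights are assumptions for illustration, not the paper's training objective.

```python
# Hedged sketch of a joint sparsity penalty for a 3x3 convolutional filter in the
# spatial and Winograd domains.
import numpy as np

# Winograd filter-transform matrix for F(2x2, 3x3)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

def joint_sparsity_penalty(filt3x3, w_spatial=1.0, w_winograd=1.0):
    winograd_filt = G @ filt3x3 @ G.T                 # 4x4 Winograd-domain filter
    return (w_spatial * np.sum(np.abs(filt3x3)) +
            w_winograd * np.sum(np.abs(winograd_filt)))
```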
Riemannian Dictionary Learning and Sparse Coding for Positive Definite Matrices
Data encoded as symmetric positive definite (SPD) matrices frequently arise
in many areas of computer vision and machine learning. While these matrices
form an open subset of the Euclidean space of symmetric matrices, viewing them
through the lens of non-Euclidean Riemannian geometry often turns out to be
better suited in capturing several desirable data properties. However,
formulating classical machine learning algorithms within such a geometry is
often non-trivial and computationally expensive. Inspired by the great success
of dictionary learning and sparse coding for vector-valued data, our goal in
this paper is to represent data in the form of SPD matrices as sparse conic
combinations of SPD atoms from a learned dictionary via a Riemannian geometric
approach. To that end, we formulate a novel Riemannian optimization objective
for dictionary learning and sparse coding in which the representation loss is
characterized via the affine invariant Riemannian metric. We also present a
computationally simple algorithm for optimizing our model. Experiments on
several computer vision datasets demonstrate superior classification and
retrieval performance using our approach when compared to sparse coding via
alternative non-Riemannian formulations.
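As a sketch of the representation loss described above: the affine-invariant Riemannian metric (AIRM) distance between an SPD data point and a sparse conic combination of SPD atoms, plus an L1 penalty on the nonnegative weights. The optimization of the codes and of the dictionary is omitted, and the function names are illustrative.

```python
# Hedged sketch of the AIRM-based coding loss:
#   d(X, S)^2 = ||log(X^{-1/2} S X^{-1/2})||_F^2,  S = sum_i w_i B_i,  w_i >= 0.
import numpy as np

def _sym_funcm(A, func):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * func(vals)) @ vecs.T

def airm_distance_sq(X, S):
    X_inv_sqrt = _sym_funcm(X, lambda v: v ** -0.5)   # X^{-1/2} (X assumed SPD)
    M = X_inv_sqrt @ S @ X_inv_sqrt
    return np.sum(_sym_funcm(M, np.log) ** 2)

def sparse_conic_coding_loss(X, atoms, w, alpha=0.1):
    """AIRM reconstruction loss plus an L1 penalty on nonnegative weights w."""
    S = sum(wi * Bi for wi, Bi in zip(w, atoms))      # conic combination of SPD atoms
    return airm_distance_sq(X, S) + alpha * np.sum(np.abs(w))
```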
Learning efficient sparse and low rank models
Parsimony, including sparsity and low rank, has been shown to successfully
model data in numerous machine learning and signal processing tasks.
Traditionally, such modeling approaches rely on an iterative algorithm that
minimizes an objective function with parsimony-promoting terms. The inherently
sequential structure and data-dependent complexity and latency of iterative
optimization constitute a major limitation in many applications requiring
real-time performance or involving large-scale data. Another limitation
encountered by these modeling techniques is the difficulty of their inclusion
in discriminative learning scenarios. In this work, we propose to move the
emphasis from the model to the pursuit algorithm, and develop a process-centric
view of parsimonious modeling, in which a learned deterministic
fixed-complexity pursuit process is used in lieu of iterative optimization. We
show a principled way to construct learnable pursuit process architectures for
structured sparse and robust low rank models, derived from the iteration of
proximal descent algorithms. These architectures learn to approximate the exact
parsimonious representation at a fraction of the complexity of the standard
optimization methods. We also show that appropriate training regimes make it
possible to naturally extend parsimonious models to discriminative settings.
State-of-the-art results are demonstrated on several challenging problems in
image and audio processing with several orders of magnitude speedup compared to
the exact optimization algorithms.
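On the robust low-rank side, the proximal step that such unrolled architectures derive from is singular-value thresholding, the proximal operator of the nuclear norm; a minimal sketch is below (the learned fixed-complexity encoder itself is not reproduced).

```python
# Hedged sketch of singular-value thresholding, prox_{tau ||.||_*}(X).
import numpy as np

def singular_value_thresholding(X, tau):
    """Shrink the singular values of X by tau, which promotes low rank."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```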
Auxiliary Image Regularization for Deep CNNs with Noisy Labels
Precisely-labeled data sets with sufficient amount of samples are very
important for training deep convolutional neural networks (CNNs). However, many
of the available real-world data sets contain erroneously labeled samples and
those errors substantially hinder the learning of very accurate CNN models. In
this work, we consider the problem of training a deep CNN model for image
classification with mislabeled training samples - an issue that is common in
real image data sets with tags supplied by amateur users. To solve this
problem, we propose an auxiliary image regularization technique, optimized by
the stochastic Alternating Direction Method of Multipliers (ADMM) algorithm,
that automatically exploits the mutual context information among training
images and encourages the model to select reliable images to robustify the
learning process. Comprehensive experiments on benchmark data sets clearly
demonstrate that our proposed regularized CNN model is resistant to label noise
in training data.
Comment: Published as a conference paper at ICLR 201
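The stochastic ADMM used in the paper is not reproduced here, but the scaled-form ADMM splitting underlying such a regularized objective, minimize f(x) + g(z) subject to x = z, can be sketched generically; prox_f and prox_g below are placeholder proximal operators, e.g. a data-fitting step and the auxiliary regularizer step.

```python
# Hedged sketch of scaled-form ADMM for min_x f(x) + g(x), split as f(x) + g(z), x = z.
import numpy as np

def admm(prox_f, prox_g, x0, n_iters=100):
    x = x0.copy(); z = x0.copy(); u = np.zeros_like(x0)   # u: scaled dual variable
    for _ in range(n_iters):
        x = prox_f(z - u)          # x-update: proximal step on f
        z = prox_g(x + u)          # z-update: proximal step on g (the regularizer)
        u = u + x - z              # dual update on the consensus constraint x = z
    return x
```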
Identifying global optimality for dictionary learning
Learning new representations of input observations in machine learning is
often tackled using a factorization of the data. For many such problems,
including sparse coding and matrix completion, learning these factorizations
can be difficult, in terms of efficiency and to guarantee that the solution is
a global minimum. Recently, a general class of objectives, which we term
induced dictionary learning models (DLMs), has been introduced that has an
induced convex form enabling global optimization. Though attractive
theoretically, this induced form is impractical, particularly for large or
growing datasets. In this work, we investigate the use of practical alternating
minimization algorithms for induced DLMs that ensure convergence to global
optima. We characterize the stationary points of these models, and, using these
insights, highlight practical choices for the objectives. We then provide
theoretical and empirical evidence that alternating minimization, from a random
initialization, converges to global minima for a large subclass of induced
DLMs. In particular, we take advantage of the existence of the (potentially
unknown) convex induced form, to identify when stationary points are global
minima for the dictionary learning objective. We then provide an empirical
investigation into practical optimization choices for using alternating
minimization for induced DLMs, for both batch and stochastic gradient descent.
Comment: Updates to previous version include a small modification to
Proposition 2, to only use normed regularizers, and a modification to the
main theorem (previously Theorem 13) to focus on the overcomplete, full rank
setting and to better characterize non-differentiable induced regularizers.
The theory has been significantly modified since version
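As a minimal sketch of alternating minimization for a dictionary learning objective, assume plain squared-Frobenius regularizers so that each subproblem has a closed-form ridge solution; induced DLMs use more general (normed) regularizers, so this is only an illustrative instance of the scheme, not the paper's algorithm.

```python
# Hedged sketch of alternating minimization for
#   min_{D, A} ||X - D A||_F^2 + alpha (||D||_F^2 + ||A||_F^2).
import numpy as np

def alternating_minimization(X, n_atoms, alpha=0.1, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    for _ in range(n_iters):
        # codes given the dictionary: ridge regression in closed form
        A = np.linalg.solve(D.T @ D + alpha * np.eye(n_atoms), D.T @ X)
        # dictionary given the codes: ridge regression in closed form
        D = np.linalg.solve(A @ A.T + alpha * np.eye(n_atoms), A @ X.T).T
    return D, A
```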