Learning Compact Features via In-Training Representation Alignment
Deep neural networks (DNNs) for supervised learning can be viewed as a
pipeline of a feature extractor (i.e., the last hidden layer) and a linear
classifier (i.e., output layer) that are trained jointly with stochastic
gradient descent (SGD) on the loss function (e.g., cross-entropy). In each
epoch, the true gradient of the loss function is estimated using a mini-batch
sampled from the training set and model parameters are then updated with the
mini-batch gradients. Although the latter provide an unbiased estimate of
the former, they are subject to substantial variance arising from the size and
number of sampled mini-batches, leading to noisy and jumpy updates. To
stabilize such undesirable variance in estimating the true gradients, we
propose In-Training Representation Alignment (ITRA) that explicitly aligns
feature distributions of two different mini-batches with a matching loss in the
SGD training process. We also provide a rigorous analysis of the desirable
effects of the matching loss on feature representation learning: (1) extracting
compact feature representations; (2) reducing over-adaptation to mini-batches via
an adaptive weighting mechanism; and (3) accommodating multi-modality.
Finally, we conduct large-scale experiments on both image and text
classification to demonstrate its superior performance over strong baselines.
Comment: 11 pages, 4 figures, 6 tables. Accepted for publication by AAAI-23.
arXiv admin note: text overlap with arXiv:2002.0991
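To make the mechanism concrete, here is a minimal, hypothetical PyTorch sketch of one such training step. It assumes the matching loss is an RBF-kernel maximum mean discrepancy (MMD) between the features of two independent mini-batches; the names (itra_step, rbf_mmd) and the weight lam are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def rbf_mmd(x, y, sigma=1.0):
        # Squared maximum mean discrepancy between two feature batches
        # under a Gaussian (RBF) kernel.
        def k(a, b):
            return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

    def itra_step(extractor, classifier, batch_a, batch_b, optimizer, lam=0.1):
        # One SGD step: cross-entropy on mini-batch A plus a matching loss
        # aligning the feature distributions of two independent mini-batches.
        (xa, ya), (xb, _) = batch_a, batch_b
        fa, fb = extractor(xa), extractor(xb)  # last-hidden-layer features
        loss = F.cross_entropy(classifier(fa), ya) + lam * rbf_mmd(fa, fb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()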
Cascade Subspace Clustering for Outlier Detection
Many methods based on sparse and low-rank representation have been developed along
with guarantees of correct outlier detection. Self-representation states that a
point in a subspace can always be expressed as a linear combination of other
points in the subspace. A suitable Markov Chain can be defined on the
self-representation, allowing us to distinguish between inliers and outliers.
However, the reconstruction error of the self-representation, which is still
informative for outlier detection, is neglected. Inspired by gradient boosting,
in this paper we propose a new outlier detection
framework that combines a series of weak "outlier detectors" into a single
strong one in an iterative fashion by constructing multi-pass
self-representation. At each stage, we construct a self-representation based on
elastic-net and define a suitable Markov Chain on it to detect outliers. The
residual of the self-representation is used for the next stage to learn the
next weak outlier detector. This stage is repeated many times, and the final
decision on outliers is generated by aggregating the results of all stages.
Experimental results on image and speaker datasets demonstrate its superiority
with respect to state-of-the-art sparse and low-rank outlier detection methods.
Comment: arXiv admin note: text overlap with arXiv:1704.03925 by other authors
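A purely illustrative Python sketch of the cascade, under stated assumptions: each stage fits an elastic-net self-representation with scikit-learn, runs a random walk on the row-normalized absolute coefficient matrix to score points, and hands its residual to the next stage. The exact Markov-chain construction, scoring, and aggregation rule in the paper may differ.

    import numpy as np
    from sklearn.linear_model import ElasticNet

    def weak_detector(X, alpha=1e-3, l1_ratio=0.5, walk_steps=100):
        # One stage: elastic-net self-representation (one data point per
        # column of X) plus a Markov chain defined on the coefficients.
        n = X.shape[1]
        C = np.zeros((n, n))
        for j in range(n):
            others = np.arange(n) != j
            reg = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False)
            reg.fit(X[:, others], X[:, j])  # express point j by the other points
            C[others, j] = reg.coef_
        P = np.abs(C)
        P /= np.maximum(P.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
        pi = np.full(n, 1.0 / n)
        for _ in range(walk_steps):  # approximate the stationary distribution
            pi = pi @ P
        return pi, X - X @ C  # scores and the residual for the next stage

    def cascade(X, n_stages=3):
        # Boosting-style combination: each weak detector consumes the
        # previous stage's residual; all scores are aggregated at the end.
        scores, R = np.zeros(X.shape[1]), X
        for _ in range(n_stages):
            pi, R = weak_detector(R)
            scores += pi  # a low aggregate score suggests an outlier
        return scores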
MAGMA: Multi-level accelerated gradient mirror descent algorithm for large-scale convex composite minimization
Composite convex optimization models arise in several applications, and are
especially prevalent in inverse problems with a sparsity inducing norm and in
general convex optimization with simple constraints. The most widely used
algorithms for convex composite models are accelerated first-order methods;
however, they can take a large number of iterations to compute an acceptable
solution for large-scale problems. In this paper we propose to speed up
first-order methods by taking advantage of the structure present in many applications
and in image processing in particular. Our method is based on multi-level
optimization methods and exploits the fact that many applications that give
rise to large scale models can be modelled using varying degrees of fidelity.
We use Nesterov's acceleration techniques together with the multi-level
approach to achieve an $\mathcal{O}(1/\sqrt{\epsilon})$ convergence rate, where $\epsilon$
denotes the desired accuracy. The proposed method has a better
convergence rate than any other existing multi-level method for convex
problems, and in addition has the same rate as accelerated methods, which is
known to be optimal for first-order methods. Moreover, as our numerical
experiments show, on large-scale face recognition problems our algorithm is
several times faster than the state of the art.
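For orientation, here is a sketch of the single-level accelerated backbone (FISTA-style proximal gradient with Nesterov momentum) that attains the $\mathcal{O}(1/\sqrt{\epsilon})$ rate. This is not MAGMA itself: the distinguishing multi-level step, replacing some fine-level iterations with corrections computed on a cheaper low-fidelity model, is only noted in a comment because its restriction/prolongation operators are problem-specific.

    import numpy as np

    def fista(grad_f, prox_g, L, x0, n_iters=200):
        # Accelerated proximal gradient for min f(x) + g(x), with f smooth
        # (L-Lipschitz gradient) and g simple. MAGMA would additionally
        # replace some of these iterations with coarse-model corrections.
        x, y, t = x0.copy(), x0.copy(), 1.0
        for _ in range(n_iters):
            x_next = prox_g(y - grad_f(y) / L, 1.0 / L)  # proximal gradient step
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # Nesterov momentum
            x, t = x_next, t_next
        return x

    # Example composite model: min_x 0.5*||Ax - b||^2 + lam*||x||_1
    rng = np.random.default_rng(0)
    A, b, lam = rng.standard_normal((40, 100)), rng.standard_normal(40), 0.1
    soft = lambda z, tau: np.sign(z) * np.maximum(np.abs(z) - lam * tau, 0.0)
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
    x_hat = fista(lambda x: A.T @ (A @ x - b), soft, L, np.zeros(100))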
Convolutional Dictionary Learning: Acceleration and Convergence
Convolutional dictionary learning (CDL or sparsifying CDL) has many
applications in image processing and computer vision. There has been growing
interest in developing efficient algorithms for CDL, mostly relying on the
augmented Lagrangian (AL) method or its variant, the alternating direction method of
multipliers (ADMM). When their parameters are properly tuned, AL methods have
shown fast convergence in CDL. However, the parameter tuning process is not
trivial due to its data dependence and, in practice, the convergence of AL
methods depends on the AL parameters for nonconvex CDL problems. To mitigate
these problems, this paper proposes a new practically feasible and convergent
Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The
BPG-M-based CDL is investigated with different block updating schemes and
majorization matrix designs, and further accelerated by incorporating some
momentum coefficient formulas and restarting techniques. All of the methods
investigated incorporate a boundary artifacts removal (or, more generally,
sampling) operator in the learning model. Numerical experiments show that,
without needing any parameter tuning process, the proposed BPG-M approach
converges more stably to desirable solutions of lower objective values than the
existing state-of-the-art ADMM algorithm and its memory-efficient variant do.
Compared to the ADMM approaches, the BPG-M method using a multi-block updating
scheme is particularly useful for single-threaded CDL algorithms handling large
datasets, due to its lower memory requirement and its lack of polynomial
computational complexity. Image denoising experiments show that, for relatively strong
additive white Gaussian noise, the filters learned by BPG-M-based CDL
outperform those trained by the ADMM approach.
Comment: 21 pages, 7 figures, submitted to IEEE Transactions on Image Processing
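As a schematic only, one diagonally majorized block update looks as follows; M_diag, prox, and the unit-ball projection are illustrative stand-ins, and the paper's actual majorizer designs, block schemes, momentum formulas, and restarting are considerably richer.

    import numpy as np

    def bpg_m_block_update(x, grad_x, M_diag, prox):
        # One majorized proximal step on a single block: minimize
        #   f(x_k) + <grad f(x_k), x - x_k> + 0.5 (x - x_k)^T M (x - x_k) + g(x),
        # where the diagonal majorization matrix M upper-bounds the block's
        # curvature, so no AL-style parameter tuning is required.
        z = x - grad_x / M_diag  # gradient step preconditioned by M^{-1}
        return prox(z, 1.0 / M_diag)

    # e.g., for a filter block constrained to the unit ball, the proximal
    # operator reduces to a Euclidean projection:
    proj_unit_ball = lambda z, _step: z / max(np.linalg.norm(z), 1.0)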