Tyler's Covariance Matrix Estimator in Elliptical Models with Convex Structure
We address structured covariance estimation in elliptical distributions by
assuming that the covariance is a priori known to belong to a given convex set,
e.g., the set of Toeplitz or banded matrices. We consider the General Method of
Moments (GMM) optimization applied to Tyler's robust scatter M-estimator
subject to these convex constraints. Unfortunately, GMM turns out to be
non-convex due to the objective. Instead, we propose a new COCA estimator - a
convex relaxation which can be efficiently solved. We prove that the relaxation
is tight in the unconstrained case for a finite number of samples, and in the
constrained case asymptotically. We then illustrate the advantages of COCA in
synthetic simulations with structured compound Gaussian distributions. In these
examples, COCA outperforms competing methods such as Tyler's estimator and its
projection onto the structure set.
Comment: arXiv admin note: text overlap with arXiv:1311.059
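For intuition, the sketch below shows the kind of baseline the abstract compares against: NumPy code for the standard fixed-point iteration defining Tyler's scatter M-estimator, followed by a Euclidean projection onto the Toeplitz set obtained by averaging diagonals. The function names and the compound Gaussian toy data are illustrative assumptions, not code from the paper, and the COCA relaxation itself is not implemented here.

```python
import numpy as np

def tyler_estimator(X, n_iter=100, tol=1e-6):
    """Fixed-point iteration for Tyler's scatter M-estimator.
    X is an (n, p) array of zero-mean samples; the result is
    normalized to trace p (the overall scale is unidentifiable)."""
    n, p = X.shape
    Sigma = np.eye(p)
    for _ in range(n_iter):
        inv_Sigma = np.linalg.inv(Sigma)
        # weights p / (x_i^T Sigma^{-1} x_i) for each sample
        w = p / np.einsum('ij,jk,ik->i', X, inv_Sigma, X)
        Sigma_new = (X * w[:, None]).T @ X / n
        Sigma_new *= p / np.trace(Sigma_new)        # fix the scale
        if np.linalg.norm(Sigma_new - Sigma, 'fro') < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma

def project_toeplitz(S):
    """Euclidean projection onto symmetric Toeplitz matrices:
    average each diagonal of S."""
    p = S.shape[0]
    T = np.zeros_like(S)
    for k in range(p):
        d = np.mean(np.diagonal(S, offset=k))
        T += d * (np.eye(p, k=k) + (np.eye(p, k=-k) if k > 0 else 0))
    return T

# Toy data: compound Gaussian samples with a Toeplitz scatter matrix
rng = np.random.default_rng(0)
p, n = 5, 200
Sigma_true = np.array([[0.8 ** abs(i - j) for j in range(p)] for i in range(p)])
tau = rng.gamma(1.0, 1.0, size=n)                   # random texture (heavy tails)
X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=n) * np.sqrt(tau)[:, None]
Sigma_hat = project_toeplitz(tyler_estimator(X))    # "project onto the structure set" baseline
```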
Stochastic Training of Neural Networks via Successive Convex Approximations
This paper proposes a new family of algorithms for training neural networks
(NNs). These are based on recent developments in the field of non-convex
optimization, going under the general name of successive convex approximation
(SCA) techniques. The basic idea is to iteratively replace the original
(non-convex, high-dimensional) learning problem with a sequence of (strongly
convex) approximations, which are both accurate and simple to optimize.
Unlike similar ideas (e.g., quasi-Newton algorithms), the
approximations can be constructed using only first-order information of the
neural network function, in a stochastic fashion, while exploiting the overall
structure of the learning problem for a faster convergence. We discuss several
use cases, based on different choices for the loss function (e.g., squared loss
and cross-entropy loss), and for the regularization of the NN's weights. We
experiment on several medium-sized benchmark problems, and on a large-scale
dataset involving simulated physical data. The results show how the algorithm
outperforms state-of-the-art techniques, providing faster convergence to a
better minimum. Additionally, we show how the algorithm can be easily
parallelized over multiple computational units without hindering its
performance. In particular, each computational unit can optimize a tailored
surrogate function defined on a randomly assigned subset of the input
variables, whose dimension can be selected depending entirely on the available
computational power.
Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning System
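As a rough illustration of the SCA idea described above, the sketch below replaces the loss at each step with a strongly convex surrogate built from a stochastic gradient plus a proximal term, then moves toward the surrogate's minimizer with a diminishing step. The surrogate choice, step-size rule, and function names are simplifying assumptions; the paper develops richer surrogates and the corresponding convergence conditions.

```python
import numpy as np

def sca_train(grad_loss, w0, reg_lambda=1e-3, rho=1.0, n_iter=500, gamma0=0.5):
    """Generic stochastic SCA sketch for a loss with L2 regularization.

    At step t the (non-convex) loss is replaced by the strongly convex surrogate
        u(w; w_t) = g_t^T (w - w_t) + (rho/2)||w - w_t||^2 + (reg_lambda/2)||w||^2,
    built from first-order information only, whose minimizer has the
    closed form used below."""
    w = w0.copy()
    for t in range(n_iter):
        g = grad_loss(w)                              # stochastic gradient at w_t
        w_hat = (rho * w - g) / (rho + reg_lambda)    # minimizer of the surrogate
        gamma = gamma0 / (1.0 + 0.01 * t)             # diminishing step size
        w = w + gamma * (w_hat - w)                   # move toward the surrogate minimizer
    return w

# Toy usage: least squares with a mini-batch gradient oracle
rng = np.random.default_rng(1)
w_true = rng.normal(size=20)
A = rng.normal(size=(1000, 20))
b = A @ w_true + 0.1 * rng.normal(size=1000)

def grad_loss(w, batch=64):
    idx = rng.choice(len(b), size=batch, replace=False)
    Ab = A[idx]
    return Ab.T @ (Ab @ w - b[idx]) / batch

w_hat = sca_train(grad_loss, np.zeros(20))
```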
Covariance Estimation in Elliptical Models with Convex Structure
We address structured covariance estimation in elliptical distributions. We
assume it is a priori known that the covariance belongs to a given convex set,
e.g., the set of Toeplitz or banded matrices. We consider the General Method of
Moments (GMM) optimization subject to these convex constraints. Unfortunately,
GMM is still non-convex due to its objective. Instead, we propose COCA - a convex
relaxation which can be efficiently solved. We prove that the relaxation is
tight in the unconstrained case for a finite number of samples, and in the
constrained case asymptotically. We then illustrate the advantages of COCA in
synthetic simulations with structured compound Gaussian distributions. In these
examples, COCA outperforms competing methods such as Tyler's estimator and its
projection onto a convex set.
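For reference, Tyler's estimator is the fixed point of a simple moment condition, and the structured problem described above can be written as a moment-matching program over the convex set, as sketched below in LaTeX (with x_1, ..., x_n the p-dimensional samples and S the convex structure set). The specific norm and the exact form of the COCA relaxation are given in the paper and are not reproduced here.

```latex
% Tyler's scatter M-estimator solves the fixed-point (moment) condition
\Sigma \;=\; \frac{p}{n} \sum_{i=1}^{n}
    \frac{x_i x_i^{\top}}{x_i^{\top} \Sigma^{-1} x_i} .

% The structured problem restricts \Sigma to a convex set \mathcal{S}
% (e.g., Toeplitz or banded matrices) and matches the same moments:
\min_{\Sigma \in \mathcal{S}} \;
    \Big\| \Sigma \;-\; \frac{p}{n} \sum_{i=1}^{n}
    \frac{x_i x_i^{\top}}{x_i^{\top} \Sigma^{-1} x_i} \Big\|_F ,
% which is non-convex because \Sigma also enters through \Sigma^{-1}.
```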
Large Scale Variational Bayesian Inference for Structured Scale Mixture Models
Natural image statistics exhibit hierarchical dependencies across multiple
scales. Representing such prior knowledge in non-factorial latent tree models
can boost performance of image denoising, inpainting, deconvolution or
reconstruction substantially, beyond standard factorial "sparse" methodology.
We derive a large scale approximate Bayesian inference algorithm for linear
models with non-factorial (latent tree-structured) scale mixture priors.
Experimental results on a range of denoising and inpainting problems
demonstrate substantially improved performance compared to MAP estimation or to
inference with factorial priors.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012
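To make the factorial-versus-tree distinction concrete, the toy sketch below samples coefficients from two Gaussian scale mixtures: one with independent latent scales (a factorial "sparse" prior) and one whose scales are coupled along a binary tree so that a large parent scale inflates its children, loosely mimicking parent-child dependence of wavelet coefficients across resolutions. The coupling rule and parameter names are purely illustrative assumptions, not the paper's model or its variational inference algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_factorial_gsm(p, a=1.0):
    """Factorial prior: every coefficient gets its own independent
    latent scale (a plain Gaussian scale mixture)."""
    s = rng.gamma(a, 1.0, size=p)                 # independent latent scales
    return rng.normal(0.0, np.sqrt(s))

def sample_tree_gsm(depth=4, a=1.0, coupling=2.0):
    """Non-factorial prior: latent scales live on a binary tree and a
    large parent scale inflates its children's scales."""
    scales = [np.array([rng.gamma(a, 1.0)])]      # root scale
    for _ in range(1, depth):
        parent = np.repeat(scales[-1], 2)         # each node has two children
        scales.append(rng.gamma(a, 1.0, size=parent.size) * (1 + coupling * parent))
    s = np.concatenate(scales)
    return rng.normal(0.0, np.sqrt(s))

x_factorial = sample_factorial_gsm(15)
x_tree = sample_tree_gsm(depth=4)                 # 1 + 2 + 4 + 8 = 15 coefficients
```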
Slow Learners are Fast
Online learning algorithms have impressive convergence properties when it
comes to risk minimization and convex games on very large problems. However,
they are inherently sequential in their design which prevents them from taking
advantage of modern multi-core architectures. In this paper we prove that
online learning with delayed updates converges well, thereby facilitating
parallel online learning.
Comment: Extended version of conference paper - NIPS 200
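A minimal serial simulation of the delayed-update idea is sketched below: each gradient is computed at the parameters currently visible and only applied a fixed number of steps later, mimicking the staleness introduced by running several cores asynchronously. The delay model, step-size schedule, and toy objective are assumptions for illustration, not the paper's exact protocol or analysis.

```python
import numpy as np
from collections import deque

def delayed_sgd(grad, w0, n_iter=1000, delay=4, eta0=0.1):
    """Online gradient descent where each gradient is applied
    'delay' steps after it was computed (stale gradients)."""
    w = w0.copy()
    pending = deque()                             # gradients waiting to be applied
    for t in range(n_iter):
        pending.append(grad(w, t))                # gradient at the (possibly stale) iterate
        if len(pending) > delay:
            eta = eta0 / np.sqrt(t + 1)           # decaying step size
            w = w - eta * pending.popleft()       # apply the delayed gradient
    return w

# Toy usage: streaming least squares
rng = np.random.default_rng(3)
w_true = rng.normal(size=10)

def grad(w, t):
    x = rng.normal(size=10)
    y = x @ w_true + 0.1 * rng.normal()
    return (x @ w - y) * x

w_hat = delayed_sgd(grad, np.zeros(10))
```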
Non-convex Optimization for Machine Learning
A vast majority of machine learning algorithms train their models and perform
inference by solving optimization problems. In order to capture the learning
and prediction problems accurately, structural constraints such as sparsity or
low rank are frequently imposed or else the objective itself is designed to be
a non-convex function. This is especially true of algorithms that operate in
high-dimensional spaces or that train non-linear models such as tensor models
and deep networks.
The freedom to express the learning problem as a non-convex optimization
problem gives immense modeling power to the algorithm designer, but often such
problems are NP-hard to solve. A popular workaround to this has been to relax
non-convex problems to convex ones and use traditional methods to solve the
(convex) relaxed optimization problems. However, this approach may be lossy, and
the relaxed problem can still present significant challenges for large-scale optimization.
On the other hand, direct approaches to non-convex optimization have met with
resounding success in several domains and remain the methods of choice for the
practitioner, as they frequently outperform relaxation-based techniques -
popular heuristics include projected gradient descent and alternating
minimization. However, these are often poorly understood in terms of their
convergence and other properties.
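As an example of one such heuristic, the sketch below runs projected gradient descent with a non-convex sparsity constraint, i.e., iterative hard thresholding for sparse least squares. The problem sizes, step-size choice, and names are illustrative assumptions and the code is not taken from the monograph.

```python
import numpy as np

def project_sparse(w, s):
    """Projection onto the (non-convex) set of s-sparse vectors:
    keep the s largest-magnitude entries, zero out the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]
    out[idx] = w[idx]
    return out

def projected_gradient_descent(A, b, s, eta=None, n_iter=200):
    """Projected gradient descent for sparse least squares
    (iterative hard thresholding)."""
    n, p = A.shape
    if eta is None:
        eta = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        w = w - eta * A.T @ (A @ w - b)            # gradient step on the smooth loss
        w = project_sparse(w, s)                   # project back onto the constraint set
    return w

# Toy usage: recover a 5-sparse signal from noisy linear measurements
rng = np.random.default_rng(4)
n, p, s = 100, 200, 5
A = rng.normal(size=(n, p)) / np.sqrt(n)
w_true = np.zeros(p)
w_true[rng.choice(p, s, replace=False)] = rng.normal(size=s)
b = A @ w_true + 0.01 * rng.normal(size=n)
w_hat = projected_gradient_descent(A, b, s)
```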
This monograph presents a selection of recent advances that bridge a
long-standing gap in our understanding of these heuristics. The monograph will
lead the reader through several widely used non-convex optimization techniques,
as well as applications thereof. The goal of this monograph is both to
introduce the rich literature in this area and to equip the reader with
the tools and techniques needed to analyze these simple procedures for
non-convex problems.
Comment: The official publication is available from now publishers via
http://dx.doi.org/10.1561/220000005