Schur properties of convolutions of gamma random variables
Sufficient conditions for comparing the convolutions of heterogeneous gamma
random variables in terms of the usual stochastic order are established. Such
comparisons are characterized by the Schur convexity properties of the
cumulative distribution function of the convolutions. Some examples of the
practical applications of our results are given.
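For readers meeting these two notions for the first time, the standard textbook definitions the result builds on (background, not quoted from the paper) can be written as follows, with $\succeq_m$ denoting majorization:

% Usual stochastic order: X is smaller than Y when its survival
% function is pointwise dominated.
\[
X \le_{\mathrm{st}} Y \iff \Pr(X > t) \le \Pr(Y > t) \quad \text{for all } t \in \mathbb{R}.
\]
% Schur convexity of a function \phi of a parameter vector.
\[
\phi \text{ is Schur-convex} \iff
\bigl( \boldsymbol{a} \succeq_{m} \boldsymbol{b} \implies \phi(\boldsymbol{a}) \ge \phi(\boldsymbol{b}) \bigr).
\]

Roughly, and paraphrasing the abstract, the characterization views the cumulative distribution function of the convolution of gamma variables as a function of the parameter vector and asks when that function is Schur-convex or Schur-concave.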
Optimization Methods for Inverse Problems
Optimization plays an important role in solving many inverse problems.
Indeed, the task of inversion often either involves or is fully cast as a
solution of an optimization problem. In this light, the sheer non-linear,
non-convex, and large-scale nature of many of these inversions gives rise to
some very challenging optimization problems. The inverse problem community has
long been developing various techniques for solving such optimization tasks.
However, other, seemingly disjoint communities, such as that of machine
learning, have developed, almost in parallel, interesting alternative methods
which might have stayed under the radar of the inverse problem community. In
this survey, we aim to change that. In doing so, we first discuss current
state-of-the-art optimization methods widely used in inverse problems. We then
survey recent related advances in addressing similar challenges in problems
faced by the machine learning community, and discuss their potential advantages
for solving inverse problems. By highlighting the similarities among the
optimization challenges faced by the inverse problem and the machine learning
communities, we hope that this survey can serve as a bridge in bringing
together these two communities and encourage cross-fertilization of ideas.
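To make the "inversion cast as optimization" framing concrete, here is a minimal sketch, entirely illustrative rather than taken from the survey, of a generic regularized data-fitting formulation solved by plain gradient descent; the linear forward operator A, the data d, and the Tikhonov weight lam are placeholder assumptions (real inversions typically involve a nonlinear, expensive forward simulation):

import numpy as np

# Minimal sketch of an inverse problem cast as optimization:
#   min over m of  0.5*||F(m) - d||^2 + 0.5*lam*||m||^2
# Here the forward map F is a made-up linear operator A @ m; in real
# inversions F is typically a nonlinear, expensive PDE solve.

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))                # placeholder forward operator
m_true = rng.standard_normal(20)
d = A @ m_true + 0.01 * rng.standard_normal(50)  # noisy synthetic data
lam = 1e-2                                       # assumed Tikhonov weight

def objective_grad(m):
    r = A @ m - d                    # residual F(m) - d
    g = A.T @ r + lam * m            # gradient of the objective
    return 0.5 * (r @ r) + 0.5 * lam * (m @ m), g

m = np.zeros(20)
step = 1.0 / np.linalg.norm(A, 2) ** 2           # safe step for this quadratic
for _ in range(500):
    _, g = objective_grad(m)
    m -= step * g                    # plain gradient descent

print("relative error:", np.linalg.norm(m - m_true) / np.linalg.norm(m_true))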
Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
While first-order optimization methods such as stochastic gradient descent
(SGD) are popular in machine learning (ML), they come with well-known
deficiencies, including relatively slow convergence, sensitivity to the
settings of hyper-parameters such as learning rate, stagnation at high training
errors, and difficulty in escaping flat regions and saddle points. These issues
are particularly acute in highly non-convex settings such as those arising in
neural networks. Motivated by this, there has been recent interest in
second-order methods that aim to alleviate these shortcomings by capturing
curvature information. In this paper, we report detailed empirical evaluations
of a class of Newton-type methods, namely sub-sampled variants of trust region
(TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex
ML problems. In doing so, we demonstrate that these methods not only can be
computationally competitive with hand-tuned SGD with momentum, obtaining
comparable or better generalization performance, but are also highly
robust to hyper-parameter settings. Further, in contrast to SGD with momentum,
we show that the manner in which these Newton-type methods employ curvature
information allows them to seamlessly escape flat regions and saddle points.
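As a rough illustration of the sub-sampled trust-region idea evaluated in the paper (a toy sketch under my own assumptions, not the authors' implementation): estimate the Hessian from a random subset of the finite-sum terms, then take a Cauchy-point step within the trust region.

import numpy as np

# Toy sketch of a sub-sampled trust-region (TR) loop for a finite sum
#   f(w) = (1/n) * sum_i 0.5*(x_i @ w - y_i)^2.
# The gradient uses all n terms; the Hessian is estimated from a random
# sub-sample, and the TR sub-problem is solved crudely at the Cauchy point.

rng = np.random.default_rng(1)
n, dim = 1000, 10
X = rng.standard_normal((n, dim))
y = X @ rng.standard_normal(dim)

def full_grad(w):
    return X.T @ (X @ w - y) / n

def subsampled_hessian(batch):
    Xb = X[batch]
    return Xb.T @ Xb / len(batch)    # Hessian of the sampled terms only

w, Delta = np.zeros(dim), 1.0
for _ in range(50):
    g = full_grad(w)
    gn = np.linalg.norm(g)
    if gn < 1e-10:
        break
    H = subsampled_hessian(rng.choice(n, size=100, replace=False))
    gHg = g @ H @ g
    # Cauchy point: minimize g@p + 0.5*p@H@p along -g subject to ||p|| <= Delta
    tau = 1.0 if gHg <= 0 else min(1.0, gn ** 3 / (Delta * gHg))
    w = w - tau * Delta * g / gn
    # (A real TR method would also accept/reject steps and adapt Delta.)

print("final gradient norm:", np.linalg.norm(full_grad(w)))

A full TR method additionally compares actual to predicted reduction and adapts Delta; ARC instead replaces the radius constraint with a cubic penalty on the step.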
Assessing stochastic algorithms for large scale nonlinear least squares problems using extremal probabilities of linear combinations of gamma random variables
This article considers stochastic algorithms for efficiently solving a class
of large scale non-linear least squares (NLS) problems which frequently arise
in applications. We propose eight variants of a practical randomized algorithm
where the uncertainties in the major stochastic steps are quantified. Such
stochastic steps involve approximating the NLS objective function using
Monte Carlo methods, and this is equivalent to the estimation of the trace of
corresponding symmetric positive semi-definite (SPSD) matrices. For the latter,
we prove tight necessary and sufficient conditions on the sample size (which
translates to cost) to satisfy the prescribed probabilistic accuracy. We show
that these conditions are practically computable and yield small sample sizes.
They are then incorporated in our stochastic algorithm to quantify the
uncertainty in each randomized step. The bounds we use are applications of more
general results regarding extremal tail probabilities of linear combinations of
gamma distributed random variables. We derive and prove new results concerning
the maximal and minimal tail probabilities of such linear combinations, which
can be considered independently of the rest of this paper.
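The trace-estimation step admits a compact illustration. The following is a minimal sketch (mine, not the paper's code) of Gaussian Monte Carlo trace estimation for an SPSD matrix; the sample size N here is an arbitrary choice, standing in for the probabilistically justified sample sizes the paper derives:

import numpy as np

# Minimal sketch of Monte Carlo trace estimation for an SPSD matrix B:
#   tr(B) ~= (1/N) * sum_j  w_j' B w_j,   with w_j ~ N(0, I).
# The paper's contribution is tight, computable conditions on the
# sample size N; here N is just a user-chosen parameter.

rng = np.random.default_rng(2)
dim = 200
C = rng.standard_normal((dim, dim))
B = C @ C.T                                    # SPSD test matrix

def mc_trace(B, N, rng):
    W = rng.standard_normal((B.shape[0], N))   # N Gaussian probe vectors
    # Each column contributes w' B w; average the quadratic forms.
    return np.einsum("ij,ij->", W, B @ W) / N

print("exact trace:     ", np.trace(B))
print("MC estimate N=50:", mc_trace(B, 50, rng))

Note that the estimator never requires B explicitly: only products B @ w are needed, which is what makes it attractive when the matrix is available only implicitly.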
Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information
We consider variants of trust-region and cubic regularization methods for
non-convex optimization, in which the Hessian matrix is approximated. Under
mild conditions on the inexact Hessian, and using approximate solutions of the
corresponding sub-problems, we provide iteration complexity bounds for achieving ε-approximate second-order optimality, which have been shown to be tight.
Our Hessian approximation conditions constitute a major relaxation over the
existing ones in the literature. Consequently, we are able to show that such
mild conditions allow for the construction of the approximate Hessian through
various random sampling methods. In this light, we consider the canonical
problem of finite-sum minimization, provide appropriate uniform and non-uniform
sub-sampling strategies to construct such Hessian approximations, and obtain
optimal iteration complexity for the corresponding sub-sampled trust-region and
cubic regularization methods.
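To illustrate the sub-sampling construction in the finite-sum setting, here is a sketch under my own assumptions; the non-uniform sampling probabilities below are one plausible importance-weighting choice, not necessarily the paper's:

import numpy as np

# Sketch of building an inexact Hessian for a finite-sum problem
#   f(w) = (1/n) * sum_i f_i(w),  f_i(w) = 0.5*(x_i @ w - y_i)^2,
# so that hess f_i = x_i x_i'.  Both samplers below are unbiased.

rng = np.random.default_rng(3)
n, dim, s = 2000, 15, 100
X = rng.standard_normal((n, dim))

def hessian_uniform(s):
    idx = rng.choice(n, size=s, replace=False)
    Xb = X[idx]
    return Xb.T @ Xb / s             # E[H_S] = X'X / n

def hessian_nonuniform(s):
    # Importance sampling proportional to per-term Hessian size ||x_i||^2,
    # reweighted by 1/(n * p_i) to remain unbiased.
    p = np.sum(X ** 2, axis=1)
    p /= p.sum()
    idx = rng.choice(n, size=s, replace=True, p=p)
    return (X[idx].T / (n * p[idx])) @ X[idx] / s

H_full = X.T @ X / n
for name, H in [("uniform", hessian_uniform(s)),
                ("non-uniform", hessian_nonuniform(s))]:
    err = np.linalg.norm(H - H_full, 2) / np.linalg.norm(H_full, 2)
    print(f"{name:12s} relative spectral error: {err:.3f}")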
Data completion and stochastic algorithms for PDE inversion problems with many measurements
Inverse problems involving systems of partial differential equations (PDEs) with many measurements or experiments can be very expensive to solve numerically. In a recent paper we examined dimensionality reduction methods, both stochastic and deterministic, to reduce this computational burden, assuming that all experiments share the same set of receivers. In the present article we consider the more general and practically important case where receivers are not shared across experiments. We propose a data completion approach to alleviate this problem: using gradient or Laplacian regularization, the existing data for each experiment are approximately extended to the union of all receiver locations. Results using the method of simultaneous sources with the completed data are then compared to those obtained by a more general but slower random subset method, which requires no modifications.
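A toy 1D sketch of the completion idea (mine; the receiver geometry, regularization weight, and discrete Laplacian are illustrative assumptions): given data at a subset of receiver locations, extend to the full receiver grid by solving a Laplacian-regularized least-squares problem.

import numpy as np

# Toy 1D sketch of data completion with Laplacian regularization:
# given measurements d at a subset of receiver positions, find a field u
# on the full receiver grid minimizing
#   ||P u - d||^2 + alpha * ||L u||^2,
# where P selects the measured locations and L is a discrete Laplacian.

rng = np.random.default_rng(4)
m = 100                                 # full receiver grid size
x = np.linspace(0, 1, m)
u_true = np.sin(2 * np.pi * x)          # made-up smooth field

obs = np.sort(rng.choice(m, size=30, replace=False))  # measured receivers
P = np.eye(m)[obs]                      # selection operator
d = P @ u_true + 0.01 * rng.standard_normal(len(obs))

L = -2 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)  # 1D Laplacian
alpha = 1e-2                            # assumed regularization weight

# Normal equations: (P'P + alpha L'L) u = P'd
u = np.linalg.solve(P.T @ P + alpha * L.T @ L, P.T @ d)
print("relative completion error:",
      np.linalg.norm(u - u_true) / np.linalg.norm(u_true))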