9 research outputs found

    Schur properties of convolutions of gamma random variables

    Sufficient conditions for comparing the convolutions of heterogeneous gamma random variables in terms of the usual stochastic order are established. Such comparisons are characterized by the Schur convexity properties of the cumulative distribution function of the convolutions. Some examples of the practical applications of our results are given.

    Optimization Methods for Inverse Problems

    Optimization plays an important role in solving many inverse problems. Indeed, the task of inversion often either involves or is fully cast as a solution of an optimization problem. In this light, the non-linear, non-convex, and large-scale nature of many of these inversions gives rise to some very challenging optimization problems. The inverse problem community has long been developing various techniques for solving such optimization tasks. However, other, seemingly disjoint communities, such as that of machine learning, have developed, almost in parallel, interesting alternative methods which might have stayed under the radar of the inverse problem community. In this survey, we aim to change that. In doing so, we first discuss current state-of-the-art optimization methods widely used in inverse problems. We then survey recent related advances in addressing similar challenges in problems faced by the machine learning community, and discuss their potential advantages for solving inverse problems. By highlighting the similarities among the optimization challenges faced by the inverse problem and the machine learning communities, we hope that this survey can serve as a bridge in bringing together these two communities and encourage cross-fertilization of ideas. Comment: 13 pages

    Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study

    While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to the settings of hyper-parameters such as learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems. In doing so, we demonstrate that these methods can not only be computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but are also highly robust to hyper-parameter settings. Further, in contrast to SGD with momentum, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points. Comment: 21 pages, 11 figures. Restructure the paper and add experiments

    Assessing stochastic algorithms for large scale nonlinear least squares problems using extremal probabilities of linear combinations of gamma random variables

    This article considers stochastic algorithms for efficiently solving a class of large-scale non-linear least squares (NLS) problems which frequently arise in applications. We propose eight variants of a practical randomized algorithm where the uncertainties in the major stochastic steps are quantified. Such stochastic steps involve approximating the NLS objective function using Monte-Carlo methods, and this is equivalent to the estimation of the trace of corresponding symmetric positive semi-definite (SPSD) matrices. For the latter, we prove tight necessary and sufficient conditions on the sample size (which translates to cost) to satisfy the prescribed probabilistic accuracy. We show that these conditions are practically computable and yield small sample sizes. They are then incorporated in our stochastic algorithm to quantify the uncertainty in each randomized step. The bounds we use are applications of more general results regarding extremal tail probabilities of linear combinations of gamma distributed random variables. We derive and prove new results concerning the maximal and minimal tail probabilities of such linear combinations, which can be considered independently of the rest of this paper.
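
The Monte-Carlo trace estimation mentioned in this abstract can be illustrated with a minimal sketch. It assumes standard normal probe vectors (one common choice, not necessarily the one used in the article); with such probes, each sample z^T A z is a weighted sum of independent chi-squared variables with weights given by the eigenvalues of A, which is a linear combination of gamma random variables. The function name `gaussian_trace_estimate`, the sample size, and the small test matrix are hypothetical and chosen only for illustration.

```python
import random


def gaussian_trace_estimate(A, num_samples, seed=0):
    """Monte Carlo estimate of trace(A) using standard normal probe vectors.

    Illustrative sketch only: A is a square matrix given as a list of rows.
    """
    rng = random.Random(seed)
    n = len(A)
    total = 0.0
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Quadratic form z^T A z; its expectation equals trace(A).
        Az = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
        total += sum(z[i] * Az[i] for i in range(n))
    return total / num_samples


# Small SPSD example (assumed data): A = B^T B for a 2x3 matrix B.
B = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
A = [[sum(B[k][i] * B[k][j] for k in range(2)) for j in range(3)]
     for i in range(3)]
exact = sum(A[i][i] for i in range(3))  # trace computed directly
estimate = gaussian_trace_estimate(A, num_samples=20000)
print(exact, estimate)
```

The number of samples controls the accuracy of the estimate; the article's contribution is a set of conditions on that sample size, which this sketch does not reproduce.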

    Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information

    We consider variants of trust-region and cubic regularization methods for non-convex optimization, in which the Hessian matrix is approximated. Under mild conditions on the inexact Hessian, and using approximate solutions of the corresponding sub-problems, we provide iteration complexity bounds for achieving ϵ-approximate second-order optimality, which have been shown to be tight. Our Hessian approximation conditions constitute a major relaxation over the existing ones in the literature. Consequently, we are able to show that such mild conditions allow for the construction of the approximate Hessian through various random sampling methods. In this light, we consider the canonical problem of finite-sum minimization, provide appropriate uniform and non-uniform sub-sampling strategies to construct such Hessian approximations, and obtain optimal iteration complexity for the corresponding sub-sampled trust-region and cubic regularization methods. Comment: 32 pages
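
As a rough illustration of the uniform sub-sampling idea for finite-sum minimization, the sketch below averages component Hessians over a random subset of indices instead of over all n components. Everything specific here is an assumption made for the example: the non-convex component f_i(x) = cos(a_i . x), the synthetic data, the sample size, and the helper names. It shows only the sampling step, not the trust-region or cubic regularization methods themselves, and not the non-uniform strategy.

```python
import math
import random


def component_hessian(a, x):
    """Hessian of the hypothetical component f_i(x) = cos(a . x)."""
    t = sum(ai * xi for ai, xi in zip(a, x))
    c = -math.cos(t)  # second derivative of cos evaluated at a . x
    return [[c * ai * aj for aj in a] for ai in a]


def average_hessian(data, x, indices):
    """Average of the component Hessians over the given indices."""
    indices = list(indices)
    d = len(x)
    H = [[0.0] * d for _ in range(d)]
    for i in indices:
        Hi = component_hessian(data[i], x)
        for r in range(d):
            for c in range(d):
                H[r][c] += Hi[r][c] / len(indices)
    return H


def subsampled_hessian(data, x, sample_size, rng):
    """Uniform sub-sampling with replacement of the finite-sum Hessian."""
    indices = [rng.randrange(len(data)) for _ in range(sample_size)]
    return average_hessian(data, x, indices)


rng = random.Random(0)
data = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2000)]
x = [0.3, -0.2, 0.5]
H_full = average_hessian(data, x, range(len(data)))  # exact Hessian
H_sub = subsampled_hessian(data, x, 400, rng)        # cheaper approximation
max_err = max(abs(H_full[r][c] - H_sub[r][c])
              for r in range(3) for c in range(3))
print(max_err)
```

Here the approximation uses a fifth of the components; how large the sample must be for the resulting method to keep its complexity guarantees is what the paper's conditions address.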

    Data completion and stochastic algorithms for PDE inversion problems with many measurements

    Inverse problems involving systems of partial differential equations (PDEs) with many measurements or experiments can be very expensive to solve numerically. In a recent paper we examined dimensionality reduction methods, both stochastic and deterministic, to reduce this computational burden, assuming that all experiments share the same set of receivers. In the present article we consider the more general and practically important case where receivers are not shared across experiments. We propose a data completion approach to alleviate this problem. This is done by means of an approximation using a gradient or Laplacian regularization, extending existing data for each experiment to the union of all receiver locations. Results using the method of simultaneous sources with the completed data are then compared to those obtained by a more general but slower random subset method which requires no modifications.