981 research outputs found

    A General Framework of Large-Scale Convex Optimization Using Jensen Surrogates and Acceleration Techniques

    In a world where data rates are growing faster than computing power, algorithmic acceleration based on developments in mathematical optimization plays a crucial role in narrowing the gap between the two. As optimization problems in many fields grow in scale, we need faster optimization methods that work well not only in theory but also in practice, by exploiting state-of-the-art computing technology. In this document, we introduce a unified framework of large-scale convex optimization using Jensen surrogates, an iterative optimization method that has been used in different fields since the 1970s. After this general treatment, we present a non-asymptotic convergence analysis of this family of methods and the motivation behind developing accelerated variants. Moreover, we discuss widely used acceleration techniques for convex optimization, investigate acceleration techniques that can be used within the Jensen surrogate framework, and propose several novel acceleration methods. Furthermore, we show that the proposed methods perform competitively with or better than state-of-the-art algorithms for several applications, including Sparse Linear Regression (Image Deblurring), Positron Emission Tomography, X-Ray Transmission Tomography, Logistic Regression, Sparse Logistic Regression, and Automatic Relevance Determination for X-Ray Transmission Tomography.

    Generalized Bregman Divergence and Gradient of Mutual Information for Vector Poisson Channels

    We investigate connections between information-theoretic and estimation-theoretic quantities in vector Poisson channel models. In particular, we generalize the gradient of mutual information with respect to key system parameters from the scalar to the vector Poisson channel model. As another contribution, we propose a generalization of the classical Bregman divergence that encapsulates, under a unifying framework, the gradient of mutual information results for scalar and vector Poisson and Gaussian channel models. The so-called generalized Bregman divergence is also shown to exhibit various properties akin to those of the classical version. The vector Poisson channel model is drawing considerable attention in view of its application in various domains: as an example, the gradient of mutual information can be used in conjunction with gradient descent methods to effect compressive-sensing projection designs in emerging X-ray and document classification applications.

    The Limitation and Practical Acceleration of Stochastic Gradient Algorithms in Inverse Problems.

    In this work we investigate the practicability of stochastic gradient descent and recently introduced variants with variance-reduction techniques in imaging inverse problems, such as space-varying image deblurring. Such algorithms have been shown in the machine learning literature to have optimal complexities in theory and to provide great empirical improvement over full gradient methods. Surprisingly, in some tasks such as image deblurring, many of these methods fail to converge faster than the accelerated full gradient method (FISTA), even in terms of epoch counts. We investigate this phenomenon and propose a theory-inspired mechanism to characterize whether a given inverse problem is better solved by a stochastic optimization technique with a known sampling pattern. Furthermore, to overcome another key bottleneck of stochastic optimization, namely the heavy computation of proximal operators, while maintaining fast convergence, we propose an accelerated primal-dual SGD algorithm and demonstrate the effectiveness of our approach in image deblurring experiments. (Accepted version; peer reviewed.)

    Two Polyak-Type Step Sizes for Mirror Descent

    We propose two Polyak-type step sizes for mirror descent and prove their convergence for minimizing convex locally Lipschitz functions. Both step sizes, unlike the original Polyak step size, do not need the optimal value of the objective function. Comment: 13 pages
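For context, the original Polyak step size that this abstract contrasts with can be sketched as follows. This is a minimal illustration of the classical rule in the Euclidean setting (the plain subgradient method), not the two step sizes proposed in the paper. The test function, the starting point, and all names (`polyak_subgradient`, `f`, `subgrad`) are assumptions chosen for illustration. Note that the rule needs the optimal value `f_star`, which is exactly the requirement the paper says it removes.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def f(x):
    """Illustrative nonsmooth convex objective; its minimum value is 0 at (1, -3)."""
    return abs(x[0] - 1.0) + 2.0 * abs(x[1] + 3.0)


def subgrad(x):
    """One valid subgradient of f at x (0 is chosen at the kinks)."""
    return [float(sign(x[0] - 1.0)), 2.0 * sign(x[1] + 3.0)]


def polyak_subgradient(f, subgrad, x0, f_star, iters=200):
    """Subgradient method with the classical Polyak step size.

    The step is (f(x) - f_star) / ||g||^2, so the optimal value f_star
    must be known in advance.
    """
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for _ in range(iters):
        g = subgrad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq == 0.0:
            break  # a zero subgradient means x minimizes the convex function
        step = (f(x) - f_star) / g_norm_sq
        x = [xi - step * gi for xi, gi in zip(x, g)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f


best_x, best_f = polyak_subgradient(f, subgrad, x0=[4.0, 1.0], f_star=0.0)
```

The best iterate is tracked separately because subgradient steps are not guaranteed to decrease the objective at every iteration.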

    Robust Data-Driven Accelerated Mirror Descent

    Learning-to-optimize is an emerging framework that leverages training data to speed up the solution of certain optimization problems. One such approach is based on the classical mirror descent algorithm, where the mirror map is modelled using input-convex neural networks. In this work, we extend this functional parameterization approach by introducing momentum into the iterations, based on the classical accelerated mirror descent. Our approach combines short-time accelerated convergence with stable long-time behavior. We empirically demonstrate additional robustness with respect to multiple parameters on denoising and deconvolution experiments. Comment: Note inconsistency with ICASSP paper for step-size choice in (4c) and associated Alg. 1; this version is correct with step-size kt/
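As a point of reference for the classical mirror descent algorithm mentioned in this abstract, here is a minimal sketch with a fixed, hand-picked mirror map: the negative entropy on the probability simplex, which gives the exponentiated-gradient update. It is neither the learned input-convex mirror map nor the momentum variant described in the paper. The quadratic objective, the target `p`, the step size, and the function names are assumptions made for illustration.

```python
import math


def mirror_descent_simplex(grad, x0, step=0.5, iters=500):
    """Classical mirror descent on the probability simplex.

    With the negative-entropy mirror map, each iteration multiplies the
    coordinates by exp(-step * gradient) and renormalizes so they sum to 1.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x


# Illustrative objective: f(x) = 0.5 * ||x - p||^2 with p inside the simplex,
# so the constrained minimizer is p itself.
p = [0.2, 0.3, 0.5]


def grad(x):
    """Gradient of the illustrative quadratic objective."""
    return [xi - pi for xi, pi in zip(x, p)]


x_md = mirror_descent_simplex(grad, x0=[1.0 / 3.0] * 3)
```

Because the update is multiplicative, iterates that start strictly inside the simplex stay inside it, so no separate projection step is needed.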