
    Newton-type methods under generalized self-concordance and inexact oracles

    Many modern applications in machine learning, image/signal processing, and statistics require solving large-scale convex optimization problems. These problems share common challenges such as high dimensionality, nonsmoothness, and complex objectives and constraints. Because of these challenges, the theoretical assumptions of existing numerical methods are often not satisfied. It is also frequently impractical to perform exact computations (e.g., due to noisy computation or storage and time limitations). New approaches, including algorithms that tolerate inexact computations, are therefore needed. In this thesis, we develop fundamental theory and numerical methods, especially second-order methods, to solve classes of convex optimization problems where first-order methods are inefficient or lack theoretical guarantees. We aim to exploit the underlying smoothness structure of the problem to design novel Newton-type methods. More specifically, we generalize the powerful concept of self-concordance, introduced by Nesterov and Nemirovski, to a broader class of convex functions. We develop several basic properties of this concept and prove key estimates for function values and their derivatives. We then apply our theory to design different Newton-type methods, including damped-step Newton methods, full-step Newton methods, and proximal Newton methods. The new theory allows us to establish both global and local convergence guarantees for these methods without imposing the unverifiable conditions required by classical Newton-type analyses. Numerical experiments show that our approach has several advantages over existing works. In the second part of this thesis, we introduce new global and local inexact oracle settings and use them to develop inexact proximal Newton-type schemes for general composite convex problems equipped with such oracles. These schemes allow errors to be measured theoretically and systematically while still yielding the desired convergence results, and they can be applied to a wider class of applications arising in statistics and machine learning. Doctor of Philosophy
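    The damped-step Newton scheme mentioned above can be pictured with a short sketch. The Python snippet below is our illustration, not code from the thesis: it applies a generic damping rule based on the Newton decrement to a logistic-regression objective, a standard example of a generalized self-concordant function. The particular step-size formula and the synthetic data are assumptions made for the example.

        import numpy as np

        def damped_newton(grad, hess, x0, tol=1e-8, max_iter=50):
            # Damped-step Newton: the step is scaled by 1/(1 + lam), where lam is
            # the Newton decrement, so no line search is needed.
            x = x0
            for _ in range(max_iter):
                g, H = grad(x), hess(x)
                d = np.linalg.solve(H, -g)     # Newton direction
                lam = np.sqrt(d @ (H @ d))     # Newton decrement
                if lam < tol:
                    break
                x = x + d / (1.0 + lam)        # damped step (illustrative rule)
            return x

        # Logistic regression on synthetic data: a generalized self-concordant loss.
        rng = np.random.default_rng(0)
        A = rng.standard_normal((100, 5))
        b = rng.choice([-1.0, 1.0], size=100)

        def grad(x):
            s = 1.0 / (1.0 + np.exp(b * (A @ x)))   # sigmoid of negative margins
            return -(A.T @ (b * s)) / len(b)

        def hess(x):
            s = 1.0 / (1.0 + np.exp(b * (A @ x)))
            return (A.T * (s * (1.0 - s))) @ A / len(b)

        x_star = damped_newton(grad, hess, np.zeros(5))

    For self-concordant objectives, damping of this form guarantees descent without a line search; extending guarantees of this type to a broader function class is what the thesis's generalized theory is about.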

    Rigorous optimization recipes for sparse and low rank inverse problems with applications in data sciences

    Many natural and man-made signals can be described as having a few degrees of freedom relative to their size due to natural parameterizations or constraints; examples include bandlimited signals, collections of signals observed from multiple viewpoints in a network of sensors, and per-flow traffic measurements of the Internet. Low-dimensional models (LDMs) mathematically capture the inherent structure of such signals via combinatorial and geometric data models, such as sparsity, unions of subspaces, low-rankness, manifolds, and mixtures of factor analyzers, and are emerging to revolutionize the way we treat inverse problems (e.g., signal recovery, parameter estimation, or structure learning) from dimensionality-reduced or incomplete data. Assuming our problem resides in an LDM space, in this thesis we investigate how to integrate such models into convex and non-convex optimization algorithms for significant gains in computational complexity. We mostly focus on two LDMs: (i) sparsity and (ii) low-rankness. We study trade-offs and their implications to develop efficient and provable optimization algorithms, and, more importantly, to exploit convex and combinatorial optimization that can enable cross-pollination of decades of research in both
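    As one concrete way an LDM enters an optimization loop, the sketch below (our illustration, not an algorithm from the thesis) runs iterative hard thresholding for sparse recovery: a gradient step on the data-fit term followed by a combinatorial projection onto k-sparse vectors. The measurement matrix, sparsity level, and step size are assumptions made for the example.

        import numpy as np

        def hard_threshold(x, k):
            # Keep the k largest-magnitude entries of x, zero out the rest.
            out = np.zeros_like(x)
            idx = np.argsort(np.abs(x))[-k:]
            out[idx] = x[idx]
            return out

        def iht(A, y, k, n_iter=200):
            # Iterative hard thresholding: gradient step on ||Ax - y||^2,
            # then projection onto the set of k-sparse vectors.
            step = 1.0 / np.linalg.norm(A, 2) ** 2   # conservative step size
            x = np.zeros(A.shape[1])
            for _ in range(n_iter):
                x = hard_threshold(x + step * (A.T @ (y - A @ x)), k)
            return x

        # Recover a 5-sparse vector in R^200 from 60 random linear measurements.
        rng = np.random.default_rng(1)
        A = rng.standard_normal((60, 200)) / np.sqrt(60)
        x_true = np.zeros(200)
        x_true[rng.choice(200, size=5, replace=False)] = rng.standard_normal(5)
        x_hat = iht(A, A @ x_true, k=5)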

    Optimization Methods for Structured Machine Learning Problems

    Solving large-scale optimization problems lies at the core of modern machine learning applications. Unfortunately, obtaining a sufficiently accurate solution quickly is a difficult task. However, the problems we consider in many machine learning applications exhibit a particular structure. In this thesis we study optimization methods and improve their convergence behavior by taking advantage of such structure. In particular, this thesis consists of two parts. In the first part, we consider the Temporal Difference learning (TD) problem in off-line Reinforcement Learning (RL). In off-line RL, the number of samples is typically small compared to the number of features. Therefore, recent advances have focused on efficient algorithms that incorporate feature selection via ℓ1-regularization, which effectively avoids over-fitting. Unfortunately, the TD optimization problem reduces to a fixed-point problem where convexity of the objective function cannot be assumed. Further, it remains unclear whether existing algorithms can offer good approximations for the task of policy evaluation and improvement (they are either non-convergent or do not solve the fixed-point problem). In this part of the thesis, we attempt to solve the ℓ1-regularized fixed-point problem with the help of the Alternating Direction Method of Multipliers (ADMM), and we argue that the proposed method is well suited to the structure of the aforementioned fixed-point problem. In the second part of the thesis, we study multilevel methods for large-scale optimization and extend their theoretical analysis to self-concordant functions. In particular, we address the following issues that arise in the analysis of second-order optimization methods based on sampling, randomization, or sketching: (a) the analysis of the iterates is not scale-invariant, and (b) global fast convergence rates are lacking without restrictive assumptions. We argue that, with the analysis undertaken in this part of the thesis, the analysis of randomized second-order methods can be considered on par with that of the classical Newton method. Further, we demonstrate how our proposed method can exploit typical spectral structures of the Hessian that arise in machine learning applications to further improve the convergence rates.
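    For the first part, the splitting pattern behind ADMM can be illustrated on the simpler ℓ1-regularized least-squares problem. The sketch below is our own illustration, not the TD fixed-point formulation studied in the thesis; it only shows the x-, z-, and dual updates that an ADMM scheme alternates between.

        import numpy as np

        def soft_threshold(v, t):
            # Proximal operator of the l1 norm.
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        def admm_lasso(A, b, lam, rho=1.0, n_iter=300):
            # ADMM for min_x 0.5*||Ax - b||^2 + lam*||x||_1, split as x = z.
            n = A.shape[1]
            x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
            # Factor the x-update system once and reuse it every iteration.
            L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
            for _ in range(n_iter):
                rhs = A.T @ b + rho * (z - u)
                x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # x-update
                z = soft_threshold(x + u, lam / rho)               # z-update
                u = u + x - z                                      # dual update
            return z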

    Regularized Newton Method with Global O(1/k^2) Convergence

    We present a Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by merging the ideas of cubic regularization with a certain adaptive Levenberg--Marquardt penalty. In particular, we show that the iterates given by $x^{k+1} = x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|}\,\mathbf{I}\bigr)^{-1}\nabla f(x^k)$, where $H>0$ is a constant, converge globally with a $\mathcal{O}(1/k^2)$ rate. Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence. Moreover, we prove that locally our method converges superlinearly when the objective is strongly convex. To boost the method's performance, we present a line search procedure that does not need hyperparameters and is provably efficient. Comment: 21 pages, 2 figures
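    The update rule quoted in the abstract is simple enough to transcribe directly. The sketch below follows that formula in NumPy; the pseudo-Huber regression demo and the choice H = 1.0 are our own illustrative assumptions (the abstract only requires H > 0), and the paper's hyperparameter-free line search is not included.

        import numpy as np

        def regularized_newton(grad, hess, x0, H, max_iter=100, tol=1e-10):
            # x_{k+1} = x_k - (hess f(x_k) + sqrt(H * ||grad f(x_k)||) I)^{-1} grad f(x_k)
            x = x0
            n = x0.shape[0]
            for _ in range(max_iter):
                g = grad(x)
                gnorm = np.linalg.norm(g)
                if gnorm < tol:
                    break
                reg = np.sqrt(H * gnorm)   # adaptive Levenberg-Marquardt penalty
                x = x - np.linalg.solve(hess(x) + reg * np.eye(n), g)
            return x

        # Demo on pseudo-Huber regression, a smooth convex objective with a
        # Lipschitz Hessian; the data and the choice H = 1.0 are illustrative.
        rng = np.random.default_rng(0)
        A = rng.standard_normal((50, 10))
        b = rng.standard_normal(50)

        def grad(x):
            r = A @ x - b
            return A.T @ (r / np.sqrt(1.0 + r**2))

        def hess(x):
            r = A @ x - b
            return (A.T * (1.0 + r**2) ** (-1.5)) @ A

        x_star = regularized_newton(grad, hess, x0=5.0 * np.ones(10), H=1.0)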