50 research outputs found

    Newton-type methods under generalized self-concordance and inexact oracles

    Many modern applications in machine learning, image/signal processing, and statistics require solving large-scale convex optimization problems. These problems share common challenges such as high dimensionality, nonsmoothness, and complex objectives and constraints. Due to these challenges, the theoretical assumptions of existing numerical methods are not satisfied. In many cases it is also impractical to carry out exact computations (e.g., because of noisy computation or storage and time limitations). Therefore, new approaches, including algorithms designed around inexact computations, should be considered. In this thesis, we develop fundamental theory and numerical methods, especially second-order methods, to solve some classes of convex optimization problems where first-order methods are inefficient or do not have a theoretical guarantee. We aim to exploit the underlying smoothness structure of the problem to design novel Newton-type methods. More specifically, we generalize a powerful concept called self-concordance, introduced by Nesterov and Nemirovski, to a broader class of convex functions. We develop several basic properties of this concept and prove key estimates for function values and their derivatives. Then, we apply our theory to design different Newton-type methods such as damped-step Newton methods, full-step Newton methods, and proximal Newton methods. Our new theory allows us to establish both global and local convergence guarantees for these methods without imposing unverifiable conditions as in classical Newton-type methods. Numerical experiments show that our approach has several advantages compared to existing works. In the second part of this thesis, we introduce new global and local inexact oracle settings, and apply them to develop inexact proximal Newton-type schemes for optimizing general composite convex problems equipped with such inexact oracles. These schemes allow us to measure errors theoretically and systematically and still lead to the desired convergence results. Moreover, they can be applied to a wider class of applications arising in statistics and machine learning. Doctor of Philosophy
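
As a rough illustration of the damped-step Newton methods mentioned in this abstract, the sketch below implements the classical damped Newton iteration for a standard self-concordant function, with step size 1/(1 + λ), where λ is the Newton decrement. It is not the thesis's generalized scheme. The function name `damped_newton`, the test objective f(x) = cᵀx − Σ log(x_i), and the tolerances are assumptions chosen for the example.

```python
import numpy as np

def damped_newton(grad, hess, x0, tol=1e-10, max_iter=100):
    """Classical damped Newton method for a standard self-concordant function.

    Hypothetical helper: `grad` and `hess` are callables returning the
    gradient vector and the (positive definite) Hessian matrix at x.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        d = np.linalg.solve(hess(x), g)      # Newton direction H^{-1} g
        lam = np.sqrt(max(g @ d, 0.0))       # Newton decrement
        if lam < tol:
            break
        x = x - d / (1.0 + lam)              # damped step keeps x in the domain
    return x

# Illustrative objective (an assumption): f(x) = c^T x - sum(log(x_i)), x > 0.
# Its minimizer is x_i = 1 / c_i.
c = np.array([1.0, 2.0, 4.0])
grad = lambda x: c - 1.0 / x
hess = lambda x: np.diag(1.0 / x**2)

x_star = damped_newton(grad, hess, x0=np.ones(3))
```

The damping factor needs no line search: for standard self-concordant functions the step 1/(1 + λ) is known to keep the iterate inside the domain, which is why the logarithmic barrier above never sees a non-positive argument.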

    Nonsmooth Optimization; Proceedings of an IIASA Workshop, March 28 - April 8, 1977

    Optimization, a central methodological tool of systems analysis, is used in many of IIASA's research areas, including the Energy Systems and Food and Agriculture Programs. IIASA's activity in the field of optimization is strongly connected with nonsmooth or nondifferentiable extremum problems, which consist of searching for conditional or unconditional minima of functions that, due to their complicated internal structure, have no continuous derivatives. Particularly significant for these kinds of extremum problems in systems analysis is the strong link between nonsmooth or nondifferentiable optimization and the decomposition approach to large-scale programming. This volume contains the report of the IIASA workshop held from March 28 to April 8, 1977, entitled Nondifferentiable Optimization. However, the title was changed to Nonsmooth Optimization for publication of this volume, as we are concerned not only with optimization without derivatives, but also with problems having functions for which gradients exist almost everywhere but are not continuous, so that the usual gradient-based methods fail. Because of the small number of participants and the unusual length of the workshop, a substantial exchange of information was possible. As a result, details of the main developments in nonsmooth optimization are summarized in this volume, which might also be considered a guide for inexperienced users. Eight papers are presented: three on subgradient optimization, four on descent methods, and one on applicability. The report also includes a set of nonsmooth optimization test problems and a comprehensive bibliography.

    Self-concordant Smoothing for Convex Composite Optimization

    We introduce the notion of self-concordant smoothing for minimizing the sum of two convex functions: the first is smooth and the second may be nonsmooth. Our framework results naturally from the smoothing approximation technique referred to as partial smoothing in which only a part of the nonsmooth function is smoothed. The key highlight of our approach is in a natural property of the resulting problem's structure which provides us with a variable-metric selection method and a step-length selection rule particularly suitable for proximal Newton-type algorithms. In addition, we efficiently handle specific structures promoted by the nonsmooth function, such as $\ell_1$-regularization and group-lasso penalties. We prove local quadratic convergence rates for two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm, and Prox-GGN-SCORE, a proximal generalized Gauss-Newton (GGN) algorithm. The Prox-GGN-SCORE algorithm highlights an important approximation procedure which helps to significantly reduce most of the computational overhead associated with the inverse Hessian. This approximation is essentially useful for overparameterized machine learning models and in the mini-batch settings. Numerical examples on both synthetic and real datasets demonstrate the efficiency of our approach and its superiority over existing approaches. Comment: 37 pages, 7 figures, 3 tables
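
For readers unfamiliar with how proximal methods handle $\ell_1$-regularization, the sketch below shows the standard proximal operator of t·‖x‖₁ (soft-thresholding). It is textbook material rather than the paper's Prox-N-SCORE or Prox-GGN-SCORE algorithms, and the function name `soft_threshold` is an assumption made for this example.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink each entry toward zero by t.

    Hypothetical helper name; solves argmin_z 0.5 * ||z - x||^2 + t * ||z||_1.
    """
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Entries with magnitude at most t are set to zero, which is what makes the
# l1 penalty promote sparsity.
shrunk = soft_threshold([3.0, -0.5, 1.5], t=1.0)
```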

    Non-smooth optimization methods for computation of the conditional value-at-risk and portfolio optimization

    We examine the numerical performance of various methods for calculating the Conditional Value-at-Risk (CVaR), and for portfolio optimization with respect to this risk measure. We concentrate on the method proposed by Rockafellar and Uryasev in (Rockafellar, R.T. and Uryasev, S., 2000, Optimization of conditional value-at-risk. Journal of Risk, 2, 21-41), which converts this problem to that of convex optimization. We compare the use of linear programming techniques against a non-smooth optimization method of the discrete gradient, and establish the superiority of the latter. We show that non-smooth optimization can be used efficiently for large portfolio optimization, and also examine parallel execution of this method on computer clusters.
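
The Rockafellar and Uryasev reformulation cited in this abstract expresses CVaR at level β as the minimum over α of F(α) = α + E[(L − α)⁺] / (1 − β), which is convex in α. The sketch below evaluates the sample version of F by brute force at every sample point. It uses neither the linear programming approach nor the discrete gradient method compared in the paper. The function name `cvar_rockafellar_uryasev` and the toy loss data are assumptions for illustration.

```python
import numpy as np

def cvar_rockafellar_uryasev(losses, beta=0.95):
    """Sample CVaR via F(alpha) = alpha + mean((L - alpha)^+) / (1 - beta).

    Hypothetical helper. F is convex and piecewise linear with kinks at the
    sample values, so its minimum is attained at one of the samples.
    Builds an N x N array, so it is only suitable for small samples.
    """
    losses = np.asarray(losses, dtype=float)
    candidates = np.unique(losses)                       # candidate alphas
    excess = np.maximum(losses[None, :] - candidates[:, None], 0.0)
    values = candidates + excess.mean(axis=1) / (1.0 - beta)
    k = int(np.argmin(values))
    return candidates[k], values[k]                      # (a minimizer, CVaR)

# Toy data (an assumption): ten equally likely losses 1, 2, ..., 10.
# At beta = 0.8 the CVaR is the mean of the two worst losses, (9 + 10) / 2.
alpha_star, cvar = cvar_rockafellar_uryasev(np.arange(1, 11), beta=0.8)
```

The minimizing α need not be unique (in the toy data F is flat between 8 and 9), so only the returned CVaR value should be relied on.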