    Super-Linear Convergence of Dual Augmented-Lagrangian Algorithm for Sparsity Regularized Estimation

    We analyze the convergence behaviour of a recently proposed algorithm for regularized estimation called Dual Augmented Lagrangian (DAL). Our analysis is based on a new interpretation of DAL as a proximal minimization algorithm. We theoretically show under some conditions that DAL converges super-linearly in a non-asymptotic and global sense. Due to a special modelling of sparse estimation problems in the context of machine learning, the assumptions we make are milder and more natural than those made in conventional analysis of augmented Lagrangian algorithms. In addition, the new interpretation enables us to generalize DAL to a wide variety of sparse estimation problems. We experimentally confirm our analysis on a large-scale ℓ1-regularized logistic regression problem and extensively compare the efficiency of the DAL algorithm to previously proposed algorithms on both synthetic and benchmark datasets. (Comment: 51 pages, 9 figures.)
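    To make the proximal-minimization interpretation concrete: a generic proximal point step for an objective f, written in our own notation rather than the paper's, reads

        w^{t+1} = \operatorname*{arg\,min}_{w} \Big\{ f(w) + \tfrac{1}{2\eta_t} \| w - w^t \|^2 \Big\},

    and super-linear convergence of such schemes is typically obtained by letting the proximity parameter \eta_t grow across iterations, which is consistent with the interpretation the abstract describes.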

    A regularization technique in dynamic optimization

    In this dissertation we discuss certain aspects of a parametric regularization technique based on recent work by R. Goebel. For proper, lower semicontinuous, and convex functions, this regularization is self-dual with respect to convex conjugation, and a simple extension of this smoothing exhibits the same feature when applied to proper, closed saddle functions. In Chapter 1 we give an introduction to convex and saddle function theory, which includes new results on the convergence of saddle function values that were not previously available in the form presented. In Chapter 2, we define the regularization and extend some of the properties previously shown in the convex case to the saddle case. Furthermore, we investigate the properties of this regularization without convexity assumptions. In particular, we show that for a prox-bounded function the family of infimal values of the regularization converges to the infimal value of the given function, even when the given function might not have a minimizer. We also show that for a general class of prox-regular functions the regularization is locally convex, even though their Moreau envelope might fail to have this property. Moreover, we apply the regularization technique to Lagrangians of convex optimization problems in two different settings, and describe the convergence of the associated saddle values and value functions. In Chapter 3, we employ the regularization in fully convex problems in the calculus of variations, in the setting studied by R. Rockafellar and P. Wolenski. In this case, we extend a result by Rockafellar on the Lipschitz continuity of the proximal mapping of the value function jointly in the time and state variables, which in turn implies the same regularity for the gradient of the self-dual regularization. Finally, we attach software code for use with SCAT (Symbolic Convex Analysis Toolbox) to symbolically compute the regularization for functions of one variable.
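    As background for the smoothing discussed above, the basic building blocks are the Moreau envelope and the proximal mapping; for a function f and parameter \lambda > 0 they are (standard definitions, not specific to this dissertation):

        e_\lambda f(x) = \inf_y \Big\{ f(y) + \tfrac{1}{2\lambda} \|x - y\|^2 \Big\}, \qquad P_\lambda f(x) = \operatorname*{arg\,min}_y \Big\{ f(y) + \tfrac{1}{2\lambda} \|x - y\|^2 \Big\}.

    Goebel's self-dual variant modifies this envelope so that regularizing f and then taking the convex conjugate gives the same result as conjugating first and then regularizing; we do not reproduce its exact formula here.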

    A Study on Sparse Time-Frequency Representations

    Waseda University degree record number: Shin 9160. Doctor of Engineering, Waseda University.

    Fourier ptychographic microscopy via alternating direction method of multipliers

    Fourier ptychographic microscopy (FPM) has risen as a promising computational imaging technique that breaks the trade-off between high resolution and large field of view (FOV). Its reconstruction is normally formulated as a blind phase retrieval problem, where both the object and the probe have to be recovered from phaseless measured data. However, stability and reconstruction quality may deteriorate dramatically in the presence of noise. Herein, we utilized the alternating direction method of multipliers (ADMM) to solve this problem (termed ADMM-FPM) by breaking it into multiple subproblems, each of which may be easier to deal with. We compared its performance against existing algorithms on both simulated and practical FPM platforms. We found that ADMM-FPM is a global optimization method with a high degree of parallelism and thus yields more stable and robust phase recovery under noisy conditions. We anticipate that ADMM will rekindle interest in FPM as more modifications and innovations are implemented in the future.
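    For orientation, a generic scaled-form ADMM iteration for a consensus splitting min_x f(x) + g(x) looks like the sketch below; the actual FPM subproblems (object and probe updates under the forward model) are more involved, and prox_f, prox_g, and the hyperparameters here are illustrative placeholders, not the paper's algorithm.

        import numpy as np

        def admm(prox_f, prox_g, x0, rho=1.0, n_iter=100):
            # Generic scaled-form ADMM for min f(x) + g(z) s.t. x = z.
            # prox_f(v, rho) should return argmin_x f(x) + (rho/2)||x - v||^2,
            # and likewise for prox_g; both are problem-specific placeholders.
            x, z = x0.copy(), x0.copy()
            u = np.zeros_like(x0)              # scaled dual variable
            for _ in range(n_iter):
                x = prox_f(z - u, rho)         # x-update
                z = prox_g(x + u, rho)         # z-update
                u = u + x - z                  # dual update on the constraint x = z
            return z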

    Fully discrete approximation of rate-independent damage models with gradient regularization

    This work provides a convergence analysis of a time-discrete scheme coupled with a finite-element approximation in space for a model of partial, rate-independent damage featuring a gradient regularization as well as a non-smooth constraint to account for the unidirectionality of the damage evolution. The numerical algorithm to solve the coupled problem of quasistatic small-strain linear elasticity with rate-independent gradient damage is based on a variable ADMM method to approximate the non-smooth contribution. The space discretization is based on P1 finite elements, and the algorithm directly couples the time-step size with the spatial grid size h. For a wide class of gradient regularizations, which allows both for Sobolev functions of integrability exponent r ∈ (1, ∞) and for BV-functions, it is shown that solutions obtained with the algorithm approximate, as h → 0, a semistable energetic solution of the original problem. The latter is characterized by a minimality property for the displacements, a semistability inequality for the damage variable, and an energy dissipation estimate. Numerical benchmark experiments confirm the stability of the method.
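    In the energetic framework for rate-independent systems, the three properties named above typically take the following generic form (a sketch in our own notation; the paper's precise function spaces, unidirectionality constraint, and dissipation functional R differ):

        E(t, u(t), z(t)) \le E(t, \tilde{u}, z(t))                                  (minimality in the displacements)
        E(t, u(t), z(t)) \le E(t, u(t), \tilde{z}) + R(\tilde{z} - z(t))            (semistability of the damage variable)
        E(T, u(T), z(T)) + \mathrm{Diss}_R(z; [0, T]) \le E(0, u(0), z(0)) + \int_0^T \partial_t E(t, u(t), z(t)) \, dt   (energy dissipation estimate)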


    Contribution to the study of the separable augmented Lagrangian: application to the data routing problem in telecommunication networks

    In this thesis we propose a decomposition method based on the separable augmented Lagrangian (Separable Augmented Lagrangian Algorithm, or SALA, in the English-language literature). The algorithm we propose relies on a rewriting of the feasible region in which the constraint matrix is multiplied by a symmetric positive definite matrix. We thus obtain a globally convergent algorithm that we call the Separable Augmented Lagrangian Algorithm with multiple parameters (SALAMP). Convergence results were obtained under weak convexity assumptions on the objective in the presence of linear constraints. After the convergence study, we first applied SALAMP to the monotropic problem. It decomposes the latter into a sequence of problems whose objective functions are each written as a parametrized Lagrangian, and which would benefit greatly from a parallel implementation. After monotropic problems, we turned to one of their instances, namely the nonlinear convex minimum-cost multicommodity flow problem. The SALAMP algorithm decomposes it into one-dimensional subproblems along the arcs of the network (which, in the case of the data routing problem, we solved with a combined bisection-Newton algorithm) and into shortest-path problems for the different flows (solved, in the case of data routing in communication networks, with Dijkstra's algorithm). The last part of this thesis is devoted to the numerical analysis of the SALAMP algorithm. We applied it to the routing problem in data networks, where we proposed empirical formulas for choosing the parameters as well as heuristics for updating them. We obtained very satisfactory results, better (in terms of number of iterations and execution time) than those reported in the literature. The thesis begins with a presentation of single-flow problems, then of multicommodity flow problems and their main solution algorithms. A brief review of the classical Lagrangian method is then given before presenting the SALA algorithm and a brief survey of separable augmented Lagrangian methods (such as those of Peaceman-Rachford and Douglas-Rachford, to name only these) and their main convergence results. --Abstract shortened by UMI
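    The per-arc subproblems mentioned above are one-dimensional and smooth, so a safeguarded Newton iteration with a bisection fallback is a natural solver. The sketch below is a generic such routine in our own notation (dphi and d2phi stand for the first and second derivatives of a per-arc cost augmented with the quadratic Lagrangian term), not the thesis's exact implementation.

        def bisection_newton(dphi, d2phi, lo, hi, tol=1e-10, max_iter=100):
            # Safeguarded 1D root-finding for dphi(x) = 0 on [lo, hi].
            # Assumes a sign change: dphi(lo) <= 0 <= dphi(hi), as holds for
            # the derivative of a convex cost plus a positive quadratic term.
            x = 0.5 * (lo + hi)
            for _ in range(max_iter):
                g, h = dphi(x), d2phi(x)
                # Newton step, replaced by bisection if it leaves the bracket
                x_new = x - g / h if h > 0 else None
                if x_new is None or not (lo < x_new < hi):
                    x_new = 0.5 * (lo + hi)
                if dphi(x_new) > 0:
                    hi = x_new          # root lies to the left
                else:
                    lo = x_new          # root lies to the right
                if abs(x_new - x) < tol:
                    return x_new
                x = x_new
            return x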

    4D imaging in tomography and optical nanoscopy

    This thesis contributes to the fields of mathematical image processing and inverse problems. An inverse problem is the task of computing the values of model parameters from observed data. Such problems arise in a wide variety of applications in science and engineering, such as medical imaging, biophysics, or astronomy. We mainly consider reconstruction problems with Poisson noise in tomography and optical nanoscopy. In the latter case, the task is to reconstruct images from blurred and noisy measurements, whereas in positron emission tomography the task is to visualize the physiological processes of a patient. Standard methods for 3D static image reconstruction do not incorporate time-dependent information or dynamics, e.g. heart beat or breathing in tomography, or cell motion in microscopy. This thesis is a treatise on models, analysis, and efficient algorithms for solving 3D and 4D time-dependent inverse problems.
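    For reconstruction under Poisson noise, the data fidelity term is typically the Kullback-Leibler divergence arising from the Poisson negative log-likelihood. A generic variational model of this type, in our notation (forward operator K, measured data f, regularizer R, weight \alpha > 0), reads

        \min_{u \ge 0} \int_\Omega \big( (Ku) - f \log(Ku) \big) \, dx + \alpha R(u),

    which is the standard starting point for the tomography and nanoscopy problems described above; the thesis's specific models and time-dependent extensions go beyond this sketch.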

    Recent Advances in Randomized Methods for Big Data Optimization

    In this thesis, we discuss and develop randomized algorithms for big data problems. In particular, we study finite-sum optimization with newly emerged variance-reduction optimization methods (Chapter 2), explore the efficiency of second-order information applied to both convex and non-convex finite-sum objectives (Chapter 3), and employ fast first-order methods in power system problems (Chapter 4). In Chapter 2, we propose two variance-reduced gradient algorithms, mS2GD and SARAH. mS2GD incorporates a mini-batching scheme to improve the theoretical complexity and practical performance of SVRG/S2GD, aiming to minimize a strongly convex function represented as the average of a large number of smooth convex functions plus a simple non-smooth convex regularizer. SARAH, short for StochAstic Recursive grAdient algoritHm, uses a stochastic recursive gradient and targets minimizing the average of a large number of smooth functions in both convex and non-convex cases. Both methods fall into the category of variance-reduction optimization and obtain a total complexity of O((n + κ) log(1/ε)) to achieve an ε-accurate solution for strongly convex objectives, while SARAH also maintains sub-linear convergence for non-convex problems. SARAH additionally admits a practical variant, SARAH+, motivated by the linear convergence of the expected stochastic gradients in its inner loops. In Chapter 3, we show that randomized batches can be combined with second-order information to improve convergence in both theory and practice, using an L-BFGS framework as a novel approach to finite-sum optimization problems. We provide theoretical analyses for both convex and non-convex objectives. We also propose LBFGS-F, a variant in which the Fisher information matrix is used in place of the Hessian, and show that it is applicable in a distributed environment for the popular least-squares and cross-entropy losses. In Chapter 4, we develop fast randomized algorithms for solving polynomial optimization problems arising from alternating-current optimal power flow (ACOPF) in power systems. Traditional research on power system problems has focused on second-order solvers; randomized algorithms had not previously been developed for them. First, we propose a coordinate-descent algorithm as an online solver for time-varying optimization problems in power systems; we bound from above the difference between the approximate optimal cost generated by our algorithm and the optimal cost of a relaxation using the most recent data, by a function of the properties of the instance and the rate at which the instance changes over time. Second, we focus on a steady-state problem in power systems and study means of switching from solving a convex relaxation to a Newton method applied to a non-convex (augmented) Lagrangian of the problem.
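    The recursive-gradient idea behind SARAH can be sketched in a few lines. The following is an illustrative implementation under our own naming (grad_i(w, i) returns the gradient of component f_i; the step size and loop lengths are placeholders, not the tuned values from the thesis):

        import numpy as np

        def sarah(grad_i, w0, n, alpha=0.01, n_outer=10, n_inner=100, rng=None):
            # Sketch of SARAH for min_w (1/n) sum_i f_i(w).
            rng = rng or np.random.default_rng(0)
            w = w0.copy()
            for _ in range(n_outer):
                # full gradient at the start of each outer loop
                v = np.mean([grad_i(w, i) for i in range(n)], axis=0)
                w_prev = w.copy()
                w = w - alpha * v
                for _ in range(n_inner):
                    i = rng.integers(n)
                    # recursive update: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
                    v = grad_i(w, i) - grad_i(w_prev, i) + v
                    w_prev = w.copy()
                    w = w - alpha * v
            return w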

    Efficient Methods For Large-Scale Empirical Risk Minimization

    Empirical risk minimization (ERM) problems express optimal classifiers as solutions of optimization problems in which the objective is the sum of a very large number of sample costs. An evident obstacle in using traditional descent algorithms for solving this class of problems is their prohibitive computational complexity when the number of component functions in the ERM problem is large. The main goal of this thesis is to study different approaches to solve these large-scale ERM problems. We begin by focusing on incremental and stochastic methods, which split the training samples into smaller sets across time to lower the computational burden of traditional descent algorithms. We develop and analyze convergent stochastic variants of quasi-Newton methods which do not require computation of the objective Hessian and approximate the curvature using only gradient information. We show that the curvature approximation in stochastic quasi-Newton methods leads to faster convergence relative to first-order stochastic methods when the problem is ill-conditioned. We culminate with the introduction of an incremental method that exploits memory to achieve a superlinear convergence rate, the best known convergence rate for an incremental method. An alternative strategy for lowering the prohibitive cost of solving large-scale ERM problems is decentralized optimization, whereby samples are separated not across time but across multiple nodes of a network. In this regime, the main contribution of this thesis is incorporating second-order information about the aggregate risk over the samples of all nodes in the network in a way that can be implemented in a distributed fashion. We also explore the separation of samples across both time and space to reduce the computational and communication cost of solving large-scale ERM problems. We study this path by introducing a decentralized stochastic method which incorporates the idea of stochastic averaging gradients, leading to a low computational complexity method with a fast linear convergence rate. We then introduce a rethinking of ERM in which we consider not a partition of the training set, as in the case of stochastic and distributed optimization, but a nested collection of subsets that we grow geometrically. The key insight is that the optimal argument associated with a training subset of a certain size is not far from the optimal argument associated with a larger training subset. Based on this insight, we present adaptive sample size schemes which start with a small number of samples and solve the corresponding ERM problem to its statistical accuracy. The sample size is then grown geometrically, and the solution of the previous ERM is used as a warm start for the new one. Theoretical analyses show that the use of adaptive sample size methods reduces the overall computational cost of achieving the statistical accuracy of the whole dataset for a broad range of deterministic and stochastic first-order methods. We further show that if we couple the adaptive sample size scheme with Newton's method, it is possible to consider subsequent doubling of the training set and perform a single Newton iteration in between. This is possible because of the interplay between the statistical accuracy and the quadratic convergence region of these problems, and it yields a method that is guaranteed to solve an ERM problem by performing just two passes over the dataset.
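    The adaptive sample size scheme described above fits in a few lines of pseudocode. The sketch below uses our own placeholder names (solve(n, w_init) stands for any solver run to the statistical accuracy of an n-sample subset) and illustrative defaults, not the thesis's exact procedure:

        def adaptive_sample_size(solve, n_total, n0=128, growth=2):
            # Solve ERM on a small subset to its statistical accuracy, then
            # geometrically grow the subset, warm-starting each solve from
            # the previous solution.
            n, w = n0, None
            while n < n_total:
                w = solve(n, w)            # warm start from previous solution
                n = min(growth * n, n_total)
            return solve(n_total, w)       # final solve on the full dataset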