An Extragradient-Based Alternating Direction Method for Convex Minimization
In this paper, we consider the problem of minimizing the sum of two convex
functions subject to linear linking constraints. The classical alternating
direction type methods usually assume that the two convex functions have
relatively easy proximal mappings. However, many problems arising from
statistics, image processing and other fields have the structure that while one
of the two functions has easy proximal mapping, the other function is smoothly
convex but does not have an easy proximal mapping. Therefore, the classical
alternating direction methods cannot be applied. To deal with the difficulty,
we propose in this paper an alternating direction method based on
extragradients. Under the assumption that the smooth function has a Lipschitz
continuous gradient, we prove that the proposed method returns an
$\epsilon$-optimal solution within $O(1/\epsilon)$ iterations. We apply the
proposed method to solve a new statistical model called fused logistic
regression. Our numerical experiments show that the proposed method performs
very well when solving the test problems. We also test the performance of the
proposed method through solving the lasso problem arising from statistics and
compare the result with several existing efficient solvers for this problem;
the results are very encouraging indeed.
Coordinate-Update Algorithms can Efficiently Detect Infeasible Optimization Problems
Coordinate update/descent algorithms are widely used in large-scale
optimization due to their low per-iteration cost and scalability, but their
behavior on infeasible or misspecified problems has not been much studied
compared to the algorithms that use full updates. For coordinate-update methods
to be as widely adopted to the extent so that they can be used as engines of
general-purpose solvers, it is necessary to also understand their behavior
under pathological problem instances. In this work, we show that the normalized
iterates of randomized coordinate-update fixed-point iterations (RC-FPI)
converge to the infimal displacement vector and use this result to design an
efficient infeasibility detection method. We then extend the analysis to the
setup where the coordinates are defined by non-orthonormal basis using the
Friedrichs angle and then apply the machinery to decentralized optimization
problems.
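The behavior described in this abstract can be illustrated on a toy infeasible problem. The sketch below is not the paper's method or code: it assumes a hypothetical two-dimensional feasibility problem with the disjoint sets A = {x : x[0] <= -1} and B = {x : x[0] >= 1}, uses the Douglas-Rachford operator as the fixed-point map, and all function names are ours. With full updates the normalized iterate z_k / k settles at the gap vector (2, 0) between the sets; when one uniformly random coordinate is updated per iteration, it settles near (1, 0) in this example, which is the same direction scaled by the coordinate-selection probability 1/2.

```python
import random


def dr_operator(z):
    """Douglas-Rachford map for A = {x : x[0] <= -1} and B = {x : x[0] >= 1}."""
    pa = [min(z[0], -1.0), z[1]]                 # projection of z onto A
    refl = [2 * pa[0] - z[0], 2 * pa[1] - z[1]]  # reflection of z through A
    pb = [max(refl[0], 1.0), refl[1]]            # projection of the reflection onto B
    return [z[i] + pb[i] - pa[i] for i in range(2)]


def normalized_iterate(num_iters, randomized, seed=0):
    """Run the fixed-point iteration from the origin and return z_k / k."""
    rng = random.Random(seed)
    z = [0.0, 0.0]
    for _ in range(num_iters):
        tz = dr_operator(z)
        if randomized:
            i = rng.randrange(2)  # update a single, uniformly chosen coordinate
            z[i] = tz[i]
        else:
            z = tz                # full update of every coordinate
    return [zi / num_iters for zi in z]


full = normalized_iterate(20000, randomized=False)
coord = normalized_iterate(20000, randomized=True)
print("full update:      ", full)
print("coordinate update:", coord)
```

A nonzero limit of the normalized iterates signals that the fixed-point map has no fixed point, which in this toy setting means the two sets do not intersect.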
Operator Splitting Methods for Convex and Nonconvex Optimization
This dissertation focuses on a family of optimization methods called operator splitting methods. They solve complicated problems by decomposing the problem structure into simpler pieces and making progress on each of them separately. Over the past two decades, there has been a resurgence of interest in these methods as the demand for solving structured large-scale problems has grown. One of the major challenges for splitting methods is their sensitivity to ill-conditioning, which often makes them struggle to achieve high accuracy. Furthermore, their classical analyses are restricted to the nice setting where solutions exist and everything is convex. Much less is known when either of these assumptions breaks down.
This work aims to address the issues above. Specifically, we propose a novel acceleration technique called inexact preconditioning, which exploits second-order information at relatively low computational cost. We also show that certain splitting methods still work on problems without solutions, in the sense that their iterates provide information on what goes wrong and how to fix it. Finally, for nonconvex problems with saddle points, we show that, under certain assumptions, splitting methods almost surely converge only to local minima.
Convergence in Distribution of Randomized Algorithms: The Case of Partially Separable Optimization
We present a Markov-chain analysis of blockwise-stochastic algorithms for
solving partially block-separable optimization problems. Our main contributions
to the extensive literature on these methods are statements about the Markov
operators and distributions behind the iterates of stochastic algorithms, and
in particular the regularity of Markov operators and rates of convergence of
the distributions of the corresponding Markov chains. This provides a detailed
characterization of the moments of the sequences beyond just the expected
behavior. This also serves as a case study of how randomization restores
to algorithms the favorable properties that iterating with only partial
information destroys. We demonstrate this on stochastic blockwise implementations of the
forward-backward and Douglas-Rachford algorithms for nonconvex (and, as a
special case, convex), nonsmooth optimization.
Fixed Point Iterations for Finite Sum Monotone Inclusions
This thesis studies two families of methods for finding zeros of finite sums of monotone operators, the first being variance-reduced stochastic gradient (VRSG) methods. This is a large family of algorithms that use random sampling to improve the convergence rate compared to more traditional approaches. We examine the optimal sampling distributions and their interaction with the epoch length. Specifically, we show that in methods like SAGA, where the epoch length is directly tied to the random sampling, the optimal sampling becomes more complex than in, for instance, L-SVRG, where the epoch length can be chosen independently. We also show that biased VRSG estimates in the style of SAG are sensitive to the problem setting. More precisely, a significantly larger step size can be used when the monotone operators are cocoercive gradients than when they are merely cocoercive. This is noteworthy because standard gradient descent is not affected by this change and because the sensitivity to the problem assumption vanishes when the estimates are unbiased.
The second set of methods we examine are deterministic operator splitting methods, and we focus on frameworks for constructing and analyzing such splitting methods. One such framework is based on what we call nonlinear resolvents, and we present a novel way of ensuring convergence of iterations of nonlinear resolvents by means of a momentum term. This approach leads in many cases to a cheaper per-iteration cost than a previously established projection approach. The framework covers many existing methods, and we provide a new primal-dual method that uses an extra resolvent step as well as a general approach for adding momentum to any special case of our nonlinear resolvent method.
We use a similar concept to the nonlinear resolvent to derive a representation of the entire class of frugal splitting operators, which are splitting operators that use exactly one direct or resolvent evaluation of each operator of the monotone inclusion problem. The representation reveals several new results regarding lifting numbers, existence of solution maps, and parallelizability of the forward/backward evaluations. We show that the minimal lifting is n − 1 − f, where n is the number of monotone operators and f is the number of direct evaluations in the splitting. A new convergent and parallelizable frugal splitting operator with minimal lifting is also presented.
Proximal methods for structured group features and correlation matrix nearness
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: June 2014.
Optimization is ubiquitous in real life as many of the strategies followed both by nature and
by humans aim to minimize a certain cost, or maximize a certain benefit. More specifically,
numerous strategies in engineering are designed according to a minimization problem, although
usually the problems tackled are convex with a differentiable objective function, since these
problems have no local minima and they can be solved with gradient-based techniques. Nevertheless,
many interesting problems are not differentiable, such as, for instance, projection problems
or problems based on non-smooth norms. An approach to deal with them can be found in
the theory of Proximal Methods (PMs), which are based on iterative local minimizations using
the Proximity Operator (ProxOp) of the terms that compose the objective function.
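For reference, the standard definition of the proximity operator can be written as below. The symbols f, \lambda and v are our notation rather than the thesis's; the second line gives the well-known closed form for the l1 norm (soft thresholding), which is the operator behind Lasso-type models.

```latex
\operatorname{prox}_{\lambda f}(v)
  = \arg\min_{x} \Big\{ f(x) + \tfrac{1}{2\lambda} \lVert x - v \rVert_2^2 \Big\},
  \qquad \lambda > 0,
\\[4pt]
\big[ \operatorname{prox}_{\lambda \lVert \cdot \rVert_1}(v) \big]_i
  = \operatorname{sign}(v_i)\, \max\big( \lvert v_i \rvert - \lambda,\; 0 \big).
```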
This thesis begins with a general introduction and a brief motivation of the work done. The state
of the art in PMs is thoroughly reviewed, defining the basic concepts from the very beginning
and describing the main algorithms, as far as possible, in a simple and self-contained way.
After that, the PMs are employed in the field of supervised regression, where regularized models
play a prominent role. In particular, some classical linear sparse models are reviewed and unified
under the point of view of regularization, namely the Lasso, the Elastic–Network, the Group
Lasso and the Group Elastic–Network. All these models are trained by minimizing an error
term plus a regularization term, and thus they fit nicely in the domain of PMs, as the structure of
the problem can be exploited by minimizing alternatively the different expressions that compose
the objective function, in particular using the Fast Iterative Shrinkage–Thresholding Algorithm
(FISTA). As a real-world application, it is shown how these models can be used to forecast wind
energy, where they yield both good predictions in terms of the error and, more importantly,
valuable information about the structure and distribution of the relevant features.
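As an illustration of how FISTA applies to such models, here is a minimal sketch for the Lasso, assuming the objective 0.5 * ||A x - b||^2 + lam * ||x||_1. It is not the thesis's implementation: the function names, the fixed iteration count, and the use of the squared Frobenius norm of A as a conservative upper bound on the gradient's Lipschitz constant are all our assumptions, chosen to keep the example dependency-free. It also assumes A has at least one nonzero entry.

```python
import math


def soft_threshold(x, tau):
    """Proximity operator of tau * |.| applied to a scalar."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def fista_lasso(A, b, lam, num_iters=2000):
    """Approximately minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with FISTA."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds ||A||_2^2, the Lipschitz
    # constant of the gradient, so 1 / lipschitz is a safe step size.
    lipschitz = sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(num_iters):
        # Gradient of the smooth term at the extrapolated point y: A^T (A y - b).
        residual = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Forward (gradient) step followed by the backward (proximal) step.
        x_new = [soft_threshold(y[j] - grad[j] / lipschitz, lam / lipschitz)
                 for j in range(n)]
        # Momentum update that distinguishes FISTA from plain ISTA.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x_new[j] + (t - 1.0) / t_new * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
solution = fista_lasso(identity, [3.0, -0.5, -2.0], lam=1.0)
print(solution)
```

With A equal to the identity, the Lasso solution is the soft-thresholded right-hand side, so the printed vector should be close to (2, 0, -1); the small entry is shrunk to zero, which is the sparsity effect the regularizer is meant to produce.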
Continuing with the regularized learning approach, a new regularizer is proposed, called the
Group Total Variation, which is a group extension of the classical Total Variation regularizer
and thus it imposes constancy over groups of features. In order to deal with it, an approach to
compute its ProxOp is derived. Moreover, it is shown that this regularizer can be used directly
to clean noisy multidimensional signals (such as colour images) or to define a new linear model,
the Group Fused Lasso (GFL), which can be then trained using FISTA. It is also exemplified
how this model, when applied to regression problems, is able to provide solutions that identify
the underlying problem structure. As an additional result of this thesis, a public software
implementation of the GFL model is provided.
The PMs are also applied to the Nearest Correlation Matrix problem under observation uncertainty.
The original problem consists in finding the correlation matrix which is nearest to the
true empirical one. Some variants introduce weights to adapt the confidence given to each entry
of the matrix; with a more general perspective, in this thesis the problem is explored directly
considering uncertainty on the observations, which is formalized as a set of intervals where the
measured matrices lie. Two different variants are defined under this framework: a robust approach
called the Robust Nearest Correlation Matrix (which aims to minimize the worst-case
scenario) and an exploratory approach, the Exploratory Nearest Correlation Matrix (which focuses
on the best-case scenario). It is shown how both optimization problems can be solved
using the Douglas–Rachford PM with a suitable splitting of the objective functions.
The thesis ends with a brief overall discussion and pointers to further work.