Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors
Penalized regression is an attractive framework for variable selection
problems. Often, variables possess a grouping structure, and the relevant
selection problem is that of selecting groups, not individual variables. The
group lasso has been proposed as a way of extending the ideas of the lasso to
the problem of group selection. Nonconvex penalties such as SCAD and MCP have
been proposed and shown to have several advantages over the lasso; these
penalties may also be extended to the group selection problem, giving rise to
group SCAD and group MCP methods. Here, we describe algorithms for fitting
these models stably and efficiently. In addition, we present simulation results
and real data examples comparing and contrasting the statistical properties of
these methods.
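The group updates at the heart of such descent algorithms have closed forms. A minimal sketch in Python (our own illustration, assuming an orthonormalized within-group design; function names are ours): the group lasso update applies group soft-thresholding, while group MCP applies a "firm" variant that shrinks small groups but leaves large groups unpenalized.

```python
import numpy as np

def group_soft_threshold(z, lam):
    """Group soft-thresholding: the group-lasso update for one group
    of coefficients under an orthonormalized design."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z

def group_firm_threshold(z, lam, gamma=3.0):
    """Group MCP ("firm") thresholding: shrinks small groups like the
    group lasso but returns large groups unshrunk (requires gamma > 1)."""
    norm = np.linalg.norm(z)
    if norm <= gamma * lam:
        return (gamma / (gamma - 1.0)) * group_soft_threshold(z, lam)
    return z
```

A full fitter would cycle these updates over groups inside a coordinate-descent loop; the operators above are the per-group step.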
Learning from High-Dimensional Multivariate Signals.
Modern measurement systems monitor a growing number of variables at low cost. In the problem
of characterizing the observed measurements, budget limitations usually constrain the number n of samples that one can acquire, leading to situations where the number p of variables is much larger than n. In this situation, classical statistical methods, founded on the assumption that n is large and p is fixed,
fail both in theory and in practice. A successful approach to overcome this problem is to assume a parsimonious generative model characterized by a number k of
parameters, where k is much smaller than p.
In this dissertation we develop algorithms to fit low-dimensional generative models
and extract relevant information from high-dimensional, multivariate signals. First,
we define extensions of the well-known Scalar Shrinkage-Thresholding Operator, which
we name Multidimensional and Generalized Shrinkage-Thresholding Operators, and
show that these extensions arise in numerous algorithms for structured-sparse linear and non-linear regression. Using convex optimization techniques, we show that
these operators, defined as the solutions to a class of convex, non-differentiable optimization problems, have an equivalent convex, low-dimensional reformulation. Our
equivalence results shed light on the behavior of a general class of penalties that includes classical sparsity-inducing penalties such as the LASSO and the Group LASSO.
In addition, our reformulation leads in some cases to new efficient algorithms for a
variety of high-dimensional penalized estimation problems.
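As a toy illustration of such a low-dimensional reduction (our own minimal example, not the dissertation's operators in full generality): the proximal operator of a Euclidean-norm penalty, a multidimensional shrinkage-thresholding operator, collapses to the scalar shrinkage-thresholding operator applied to the norm of the input.

```python
import numpy as np

def scalar_st(t, lam):
    """Scalar shrinkage-thresholding operator: prox of lam * |.|."""
    return np.sign(t) * max(abs(t) - lam, 0.0)

def multidimensional_st(y, lam):
    """Multidimensional shrinkage-thresholding: prox of lam * ||.||_2.
    The p-dimensional problem reduces to the scalar operator applied
    to the single number ||y||, since the solution is radial."""
    r = np.linalg.norm(y)
    if r == 0.0:
        return np.zeros_like(y)
    return (scalar_st(r, lam) / r) * y
```

The reduction from a p-dimensional nonsmooth problem to a one-dimensional one is the pattern the equivalence results above generalize.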
Second, we introduce two new classes of low-dimensional factor models that account for temporal shifts commonly occurring in multivariate signals. Our first contribution, called Order Preserving Factor Analysis, can be seen as an extension of the
non-negative, sparse matrix factorization model to allow for order-preserving temporal translations in the data. We develop an efficient descent algorithm to fit this model
using techniques from convex and non-convex optimization. Our second contribution
extends Principal Component Analysis to the analysis of observations suffering from
circular shifts, and we call it Misaligned Principal Component Analysis. We
quantify the effect of the misalignments in the spectrum of the sample covariance matrix in the high-dimensional regime and develop simple algorithms to jointly estimate
the principal components and the misalignment parameters.
Ph.D. dissertation, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91544/1/atibaup_1.pd
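A crude two-step stand-in for the joint estimation idea (our own sketch, not the dissertation's algorithm): estimate each observation's circular shift by the peak of an FFT-based circular cross-correlation against a reference row, realign, and then extract principal components from the aligned data.

```python
import numpy as np

def estimate_circular_shift(x, ref):
    """Estimate the circular shift s such that np.roll(x, s) best
    matches ref, via the peak of the circular cross-correlation."""
    xc = np.fft.ifft(np.fft.fft(ref) * np.conj(np.fft.fft(x))).real
    return int(np.argmax(xc))

def align_and_pca(X, n_components=1):
    """Naive two-step procedure: align every row of X to the first
    row, then take principal components of the realigned matrix."""
    ref = X[0]
    shifts = [estimate_circular_shift(x, ref) for x in X]
    aligned = np.vstack([np.roll(x, s) for x, s in zip(X, shifts)])
    aligned = aligned - aligned.mean(axis=0)   # center before PCA
    U, S, Vt = np.linalg.svd(aligned, full_matrices=False)
    return shifts, Vt[:n_components]
```

Joint estimation, as studied in the dissertation, would alternate or couple these two steps rather than run them once in sequence.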
Group-Sparse Signal Denoising: Non-Convex Regularization, Convex Optimization
Convex optimization with sparsity-promoting convex regularization is a
standard approach for estimating sparse signals in noise. In order to promote
sparsity more strongly than convex regularization, it is also standard practice
to employ non-convex optimization. In this paper, we take a third approach. We
utilize a non-convex regularization term chosen such that the total cost
function (consisting of data consistency and regularization terms) is convex.
Therefore, sparsity is more strongly promoted than in the standard convex
formulation, but without sacrificing the attractive aspects of convex
optimization (unique minimum, robust algorithms, etc.). We use this idea to
improve the recently developed 'overlapping group shrinkage' (OGS) algorithm
for the denoising of group-sparse signals. The algorithm is applied to the
problem of speech enhancement with favorable results in terms of both SNR and
perceptual quality.
Comment: 14 pages, 11 figures
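The baseline OGS iteration for the convex formulation is a short majorization-minimization loop. A sketch for 1-D signals with fully overlapping groups of K consecutive samples (our own illustration of the convex baseline only; the paper's contribution replaces this penalty with a non-convex one chosen to keep the total cost convex):

```python
import numpy as np

def ogs_denoise(y, lam, K=3, iters=100, eps=1e-10):
    """Overlapping group shrinkage via majorization-minimization:
    minimizes 0.5*||y - x||^2 + lam * sum_j ||x_{g_j}|| over all
    length-K windows g_j. Each MM step is a pointwise rescaling."""
    x = y.astype(float).copy()
    for _ in range(iters):
        # norm of each length-K group (window sums of x^2)
        gnorm = np.sqrt(np.convolve(x**2, np.ones(K), mode='valid')) + eps
        # r[i] = sum over groups containing i of 1/||x_g||
        r = np.convolve(1.0 / gnorm, np.ones(K), mode='full')
        x = y / (1.0 + lam * r)
    return x
```

Each update shrinks every sample by an amount that depends on the energy of all groups covering it, which is what lets isolated noise spikes be suppressed while grouped signal samples survive.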
SLOPE - Adaptive variable selection via convex optimization
We introduce a new estimator for the vector of coefficients $\beta$ in the
linear model $y = X\beta + z$, where $X$ has dimensions $n \times p$ with $p$
possibly larger than $n$. SLOPE, short for Sorted L-One Penalized Estimation,
is the solution to
$$\min_{b \in \mathbb{R}^p} \ \tfrac{1}{2}\|y - Xb\|_{\ell_2}^2 + \lambda_1 |b|_{(1)} + \lambda_2 |b|_{(2)} + \cdots + \lambda_p |b|_{(p)},$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ and
$|b|_{(1)} \ge |b|_{(2)} \ge \cdots \ge |b|_{(p)}$ are the
decreasing absolute values of the entries of $b$. This is a convex program and
we demonstrate a solution algorithm whose computational complexity is roughly
comparable to that of classical $\ell_1$ procedures such as the Lasso. Here,
the regularizer is a sorted $\ell_1$ norm, which penalizes the regression
coefficients according to their rank: the higher the rank - that is, the stronger
the signal - the larger the penalty. This is similar to the Benjamini and
Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] procedure (BH), which
compares more significant $p$-values with more stringent thresholds. One
notable choice of the sequence $\{\lambda_i\}$ is given by the BH critical
values $\lambda_{\mathrm{BH}}(i) = z(1 - i \cdot q / 2p)$, where $q \in (0, 1)$ and
$z(\alpha)$ is the $\alpha$-quantile of a standard normal distribution. SLOPE aims to
provide finite sample guarantees on the selected model; of special interest is
the false discovery rate (FDR), defined as the expected proportion of
irrelevant regressors among all selected predictors. Under orthogonal designs,
SLOPE with $\lambda_{\mathrm{BH}}$ provably controls FDR at level $q$.
Moreover, it also appears to have appreciable inferential properties under more
general designs while having substantial power, as demonstrated in a series
of experiments running on both simulated and real data.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS842 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
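The core step of proximal algorithms for SLOPE is the proximal operator of the sorted $\ell_1$ norm, which can be computed with a sort followed by a stack-based pool-adjacent-violators pass. A sketch (function name ours):

```python
import numpy as np

def prox_sorted_l1(y, lam):
    """Prox of b -> sum_i lam[i] * |b|_(i) with lam nonincreasing:
    sort |y| decreasingly, subtract lam, project onto the
    nonincreasing cone by pooling adjacent violators, clip at zero,
    then undo the sort and restore signs."""
    sign = np.sign(y)
    ay = np.abs(y)
    order = np.argsort(-ay)            # indices sorting |y| decreasingly
    v = ay[order] - lam
    blocks = []                        # stack of [block_sum, block_size]
    for vi in v:
        blocks.append([vi, 1])
        # merge while block averages violate the nonincreasing order
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] <= blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    x_sorted = np.concatenate([np.full(c, max(s / c, 0.0)) for s, c in blocks])
    x = np.empty_like(x_sorted)
    x[order] = x_sorted                # undo the sort
    return sign * x
```

When all entries of `lam` are equal, this reduces to ordinary soft-thresholding, which is one easy sanity check.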
Proximal methods for structured group features and correlation matrix nearness
Unpublished doctoral thesis, Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defended June 2014.
Optimization is ubiquitous in real life, as many of the strategies followed both by nature and
by humans aim to minimize a certain cost, or maximize a certain benefit. More specifically,
numerous strategies in engineering are designed according to a minimization problem, although
usually the problems tackled are convex with a differentiable objective function, since these
problems have no local minima and they can be solved with gradient-based techniques. Nevertheless,
many interesting problems are not differentiable, such as, for instance, projection problems
or problems based on non-smooth norms. An approach to deal with them can be found in
the theory of Proximal Methods (PMs), which are based on iterative local minimizations using
the Proximity Operator (ProxOp) of the terms that compose the objective function.
This thesis begins with a general introduction and a brief motivation of the work done. The state
of the art in PMs is thoroughly reviewed, defining the basic concepts from the very beginning
and describing the main algorithms, as far as possible, in a simple and self-contained way.
After that, the PMs are employed in the field of supervised regression, where regularized models
play a prominent role. In particular, some classical linear sparse models are reviewed and unified
under the point of view of regularization, namely the Lasso, the Elastic–Network, the Group
Lasso and the Group Elastic–Network. All these models are trained by minimizing an error
term plus a regularization term, and thus they fit nicely in the domain of PMs, as the structure of
the problem can be exploited by alternately minimizing the different expressions that compose
the objective function, in particular using the Fast Iterative Shrinkage–Thresholding Algorithm
(FISTA). As a real-world application, it is shown how these models can be used to forecast wind
energy, where they yield both good predictions in terms of the error and, more importantly,
valuable information about the structure and distribution of the relevant features.
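For a concrete instance of this training scheme, here is a minimal FISTA loop for the plain Lasso (our own sketch; the thesis applies the same proximal-gradient template to the group models):

```python
import numpy as np

def fista_lasso(X, y, lam, iters=500):
    """FISTA for min_w 0.5*||Xw - y||^2 + lam*||w||_1: a gradient step
    on the smooth error term, a soft-thresholding proximal step on the
    l1 term, and Nesterov momentum between iterates."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(p); z = w.copy(); t = 1.0
    for _ in range(iters):
        grad = X.T @ (X @ z - y)
        u = z - grad / L
        w_new = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = w_new + ((t - 1) / t_new) * (w_new - w)
        w, t = w_new, t_new
    return w
```

Swapping the soft-thresholding line for a group proximal operator turns this into a Group Lasso solver, which is the unification the thesis exploits.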
Following with the regularized learning approach, a new regularizer is proposed, called the
Group Total Variation, which is a group extension of the classical Total Variation regularizer
and thus it imposes constancy over groups of features. In order to deal with it, an approach to
compute its ProxOp is derived. Moreover, it is shown that this regularizer can be used directly
to clean noisy multidimensional signals (such as colour images) or to define a new linear model,
the Group Fused Lasso (GFL), which can then be trained using FISTA. It is also shown
how this model, when applied to regression problems, is able to provide solutions that identify
the underlying problem structure. As an additional result of this thesis, a public software
implementation of the GFL model is provided.
The PMs are also applied to the Nearest Correlation Matrix problem under observation uncertainty.
The original problem consists in finding the correlation matrix which is nearest to the
true empirical one. Some variants introduce weights to adapt the confidence given to each entry
of the matrix; with a more general perspective, in this thesis the problem is explored directly
considering uncertainty on the observations, which is formalized as a set of intervals where the
measured matrices lie. Two different variants are defined under this framework: a robust approach
called the Robust Nearest Correlation Matrix (which aims to minimize the worst-case
scenario) and an exploratory approach, the Exploratory Nearest Correlation Matrix (which focuses
on the best-case scenario). It is shown how both optimization problems can be solved
using the Douglas–Rachford PM with a suitable splitting of the objective functions.
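A minimal Douglas–Rachford sketch for the classical (non-robust) nearest correlation matrix problem, splitting the objective into a quadratic-plus-unit-diagonal term and the PSD-cone indicator (our own illustration; the thesis' robust and exploratory interval variants require a different splitting):

```python
import numpy as np

def nearest_correlation_dr(A, iters=200, t=1.0):
    """Douglas-Rachford splitting for
    min_C 0.5*||C - A||_F^2  s.t.  diag(C) = 1, C positive semidefinite,
    with f = quadratic + unit-diagonal constraint and g = PSD indicator."""
    def prox_f(Z):
        C = (t * A + Z) / (t + 1.0)    # entrywise prox of the quadratic
        np.fill_diagonal(C, 1.0)        # enforce the unit diagonal
        return C
    def proj_psd(Z):
        S = (Z + Z.T) / 2.0             # symmetrize before eigendecomposition
        w, V = np.linalg.eigh(S)
        return (V * np.maximum(w, 0.0)) @ V.T
    Z = A.astype(float).copy()
    for _ in range(iters):
        X = prox_f(Z)
        Z = Z + proj_psd(2.0 * X - Z) - X
    return prox_f(Z)
```

If `A` is already a valid correlation matrix the iteration is at a fixed point immediately; otherwise the reflected steps negotiate between the two constraint sets until they agree.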
The thesis ends with a brief overall discussion and pointers to further work.