
11 research outputs found

Intersecting singularities for multi-structured estimation

Author: Bach Francis
Richard Emile
Vert Jean-Philippe
Publication venue: HAL CCSD
Publication date: 16/06/2013
Field of study

We address the problem of designing a convex nonsmooth regularizer encouraging multiple structural effects simultaneously. Focusing on the inference of sparse and low-rank matrices, we suggest a new complexity index and a convex penalty approximating it. The new penalty term can be written as the trace norm of a linear function of the matrix. By analyzing theoretical properties of this family of regularizers we come up with oracle inequalities and compressed sensing results ensuring the quality of our regularized estimator. We also provide algorithms and supporting numerical experiments.

INRIA a CCSD electronic archive server

Solving ℓ0-penalized problems with simple constraints via the Frank–Wolfe reduced dimension method

Author: Liuzzi G.
Rinaldi F.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

ℓ0-penalized problems arise in a number of applications in engineering, machine learning and statistics, and, in the last decades, the design of algorithms for these problems has attracted the interest of many researchers. In this paper, we are concerned with the definition of a first-order method for the solution of ℓ0-penalized problems with simple constraints. We use a reduced dimension Frank–Wolfe algorithm (Rinaldi, Optim Methods Softw, 26, 2011) and show that the subproblem related to the computation of the Frank–Wolfe direction can be solved analytically at least for some sets of simple constraints. This gives us a very easy to implement and quite general tool for dealing with ℓ0-penalized problems. The proposed method is then applied to the numerical solution of two practical optimization problems, namely, the Sparse Principal Component Analysis and the Sparse Reconstruction of Noisy Signals. In both cases, the reported numerical performances and comparisons with state-of-the-art solvers show the efficiency of the proposed method.
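To illustrate why the Frank–Wolfe direction can be available in closed form for simple constraints, the sketch below runs plain Frank–Wolfe over the unit simplex, where the linear subproblem reduces to picking the coordinate with the smallest gradient entry. This is the generic textbook method, not the reduced dimension variant of the paper, and it does not include an ℓ0 penalty. The names `simplex_lmo` and `frank_wolfe` and the toy objective are assumptions made for illustration.

```python
def simplex_lmo(grad):
    """Linear minimization oracle over the unit simplex.

    The minimizer of <grad, s> over the simplex is the vertex e_i
    with i = argmin_j grad[j], so no iterative solver is needed.
    """
    i = min(range(len(grad)), key=lambda j: grad[j])
    return [1.0 if j == i else 0.0 for j in range(len(grad))]


def frank_wolfe(grad_f, x0, iters=200):
    """Plain Frank-Wolfe with the classical step size 2 / (k + 2)."""
    x = list(x0)
    for k in range(iters):
        s = simplex_lmo(grad_f(x))  # closed-form direction subproblem
        gamma = 2.0 / (k + 2.0)
        # Convex combination keeps the iterate inside the simplex.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Toy objective (hypothetical): f(x) = 0.5 * ||x - b||^2, with b in the simplex.
b = [0.2, 0.5, 0.3]


def grad_f(x):
    return [xi - bi for xi, bi in zip(x, b)]


x_star = frank_wolfe(grad_f, [1.0, 0.0, 0.0])
```

Every iterate is a convex combination of simplex vertices, so feasibility holds without any projection step; for other simple sets such as boxes, the oracle changes but stays equally cheap.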

Archivio della ricerca- Università di Roma La Sapienza

Risk neutral and risk averse stochastic optimization

Author: Cheng Yi
Publication venue: Georgia Institute of Technology
Publication date: 10/01/2023
Field of study

In this thesis, we focus on the modeling, computational methods and applications of multistage/single-stage stochastic optimization, which entail risk aversion under certain circumstances. Chapters 2-4 concentrate on multistage stochastic programming while Chapters 5-6 deal with a class of single-stage functional constrained stochastic optimization problems. First, we investigate the deterministic upper bound of a Multistage Stochastic Linear Program (MSLP). We first present the Dual SDDP algorithm, which solves the Dynamic Programming equations for the dual and computes a sequence of nonincreasing deterministic upper bounds for the optimal value of the problem, even without the Relatively Complete Recourse (RCR) condition. We show that optimal dual solutions can be obtained using Primal SDDP when computing the duals of the subproblems in the backward pass. As a byproduct, we study the sensitivity of the optimal value as a function of the involved problem parameters. In particular, we provide formulas for the derivatives of the value function with respect to the parameters and illustrate their application on an inventory problem. Next, we extend to the infinite-horizon MSLP and show how to construct a deterministic upper bound (dual bound) via the proposed Periodical Dual SDDP. Finally, as a proof of concept of the developed tools, we present numerical results on (1) the sensitivity of the optimal value as a function of the demand process parameters and (2) Dual SDDP applied to the inventory and the Brazilian hydro-thermal planning problems under both finite-horizon and infinite-horizon settings.
Third, we discuss the sample complexity of solving stationary stochastic programs by the Sample Average Approximation (SAA) method. We investigate this in the discrete-time Stochastic Optimal Control setting. In particular, we derive Central Limit Theorem type asymptotics for the optimal values of the SAA problems. The main conclusion is that the sample size required to attain a given relative error of the SAA solution is not sensitive to the discount factor, even if the discount factor is very close to one. We consider the risk neutral and risk averse settings. The presented numerical experiments confirm the theoretical analysis.
Fourth, we propose a novel projection-free method, referred to as the Level Conditional Gradient (LCG) method, for solving convex functional constrained optimization. Different from the constraint-extrapolated conditional gradient type methods (CoexCG and CoexDurCG), LCG, as a primal method, does not assume the existence of an optimal dual solution, thus improving the convergence rate of CoexCG/CoexDurCG by eliminating the dependence on the magnitude of the optimal dual solution. Similar to existing level-set methods, LCG uses an approximate Newton method to solve a root-finding problem. In each approximate Newton update, LCG calls a conditional gradient oracle (CGO) to solve a saddle point subproblem. The CGO developed herein employs easily computable lower and upper bounds on these saddle point problems. We establish the iteration complexity of the CGO for solving a general class of saddle point optimization. Using these results, we show that the overall iteration complexity of the proposed LCG method is $\mathcal{O}\left(\frac{1}{\epsilon^2}\log\left(\frac{1}{\epsilon}\right)\right)$ for finding an $\epsilon$-optimal and $\epsilon$-feasible solution of the considered problem. To the best of our knowledge, LCG is the first primal conditional gradient method for solving convex functional constrained optimization. For the subsequently developed nonconvex algorithms in this thesis, LCG can also serve as a subroutine or provide high-quality starting points that expedite the solution process.
Last, to cope with nonconvex functional constrained optimization problems, we develop three approaches: the Level Exact Proximal Point (EPP-LCG) method, the Level Inexact Proximal Point (IPP-LCG) method and the Direct Nonconvex Conditional Gradient (DNCG) method. The proposed EPP-LCG and IPP-LCG methods utilize the proximal point framework and solve a series of convex subproblems. In solving each subproblem, they leverage the proposed LCG method, thus averting the effect of large Lagrangian multipliers. We show that the iteration complexity of the algorithms is bounded by $\mathcal{O}\left(\frac{1}{\epsilon^3}\log\left(\frac{1}{\epsilon}\right)\right)$ in order to obtain an (approximate) KKT point. However, the proximal-point type methods have a triple-layer structure and may not be easily implementable. To alleviate the issue, we also propose the DNCG method, which is the first single-loop projection-free algorithm for solving nonconvex functional constrained problems in the literature. This algorithm provides a drastically simpler framework as it only contains three updates in one loop. We show that the iteration complexity to find an $\epsilon$-Wolfe point is bounded by $\mathcal{O}\left(1/\epsilon^4\right)$. To the best of our knowledge, all these developments are new for projection-free methods for nonconvex optimization. We demonstrate the effectiveness of the proposed nonconvex projection-free methods on a portfolio selection problem and the intensity modulated radiation therapy treatment planning problem. Moreover, we compare the results with the LCG method proposed in Chapter \ref{chapter-noncvx}. The outcome of the numerical study shows that all methods are efficient in jointly minimizing risk while promoting sparsity in a rather short computational time for the real-world and large-scale datasets. Ph.D.

Scholarly Materials And Research @ Georgia Tech

Exploiting Smoothness in Statistical Learning, Sequential Prediction, and Stochastic Optimization

Author: Mahdavi Mehrdad
Publication venue
Publication date: 19/07/2014
Field of study

In the last several years, the intimate connection between convex optimization and learning problems, in both statistical and sequential frameworks, has shifted the focus of algorithmic machine learning to examine this interplay. In particular, on one hand, this intertwinement brings forward new challenges in reassessing the performance of learning algorithms, including generalization and regret bounds, under the assumptions imposed by convexity such as analytical properties of loss functions (e.g., Lipschitzness, strong convexity, and smoothness). On the other hand, the emergence of datasets of unprecedented size demands the development of novel and more efficient optimization algorithms to tackle large-scale learning problems. The overarching goal of this thesis is to reassess the smoothness of loss functions in statistical learning, sequential prediction/online learning, and stochastic optimization and explicate its consequences. In particular, we examine how smoothness of the loss function could be beneficial or detrimental in these settings in terms of sample complexity, statistical consistency, regret analysis, and convergence rate, and investigate how smoothness can be leveraged to devise more efficient learning algorithms. Comment: Ph.D. Thesis

arXiv.org e-Print Archive

CiteSeerX

Automated smoothing parameter estimation for quantile additive models

Author: Nortier Bertrand
Publication venue
Publication date: 23/03/2021
Field of study

Explore Bristol Research

Bayesian fusion of multi-band images : A powerful tool for super-resolution

Author: Wei Qi
Publication venue
Publication date: 24/09/2015
Field of study

Hyperspectral (HS) imaging, which consists of acquiring the same scene in several hundreds of contiguous spectral bands (a three-dimensional data cube), has opened a new range of relevant applications, such as target detection [MS02], classification [C.-03] and spectral unmixing [BDPD+12]. However, while HS sensors provide abundant spectral information, their spatial resolution is generally more limited. Thus, fusing the HS image with other highly resolved images of the same scene, such as multispectral (MS) or panchromatic (PAN) images, is an interesting problem. The problem of fusing a high spectral and low spatial resolution image with an auxiliary image of higher spatial but lower spectral resolution, also known as multi-resolution image fusion, has been explored for many years [AMV+11]. From an application point of view, this problem is also important as motivated by recent national programs, e.g., the Japanese next-generation space-borne hyperspectral image suite (HISUI), which fuses co-registered MS and HS images acquired over the same scene under the same conditions [YI13]. Bayesian fusion allows for an intuitive interpretation of the fusion process via the posterior distribution. Since the fusion problem is usually ill-posed, the Bayesian methodology offers a convenient way to regularize the problem by defining an appropriate prior distribution for the scene of interest. The aim of this thesis is to study new multi-band image fusion algorithms to enhance the resolution of hyperspectral images. In the first chapter, a hierarchical Bayesian framework is proposed for multi-band image fusion by incorporating the forward model, statistical assumptions and a Gaussian prior for the target image to be restored. To derive Bayesian estimators associated with the resulting posterior distribution, two algorithms based on Monte Carlo sampling and an optimization strategy have been developed.
In the second chapter, a sparse regularization using dictionaries learned from the observed images is introduced, as an alternative to the naive Gaussian prior proposed in Chapter 1, to regularize the ill-posed problem. Identifying the supports jointly with the dictionaries circumvents the difficulty inherent to sparse coding. To minimize the target function, an alternating optimization algorithm has been designed, which accelerates the fusion process significantly compared with the simulation-based method. In the third chapter, by exploiting intrinsic properties of the blurring and downsampling matrices, a much more efficient fusion method is proposed thanks to a closed-form solution for the Sylvester matrix equation associated with maximizing the likelihood. The proposed solution can be embedded into an alternating direction method of multipliers or a block coordinate descent method to incorporate different priors or hyper-priors for the fusion problem, allowing for Bayesian estimators. In the last chapter, a joint multi-band image fusion and unmixing scheme is proposed by combining the widely accepted linear spectral mixture model and the forward model. The joint fusion and unmixing problem is solved in an alternating optimization framework, mainly consisting of solving a Sylvester equation and projecting onto a simplex resulting from the non-negativity and sum-to-one constraints. Simulations conducted on synthetic and semi-synthetic images illustrate the advantages of the developed Bayesian estimators, both qualitatively and quantitatively.
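The projection step mentioned at the end of the abstract (onto the set defined by non-negativity and sum-to-one constraints) can be sketched with the standard sort-based Euclidean projection onto the unit simplex. This is a generic routine under that assumption, not the thesis implementation; the function name `project_to_simplex` is hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}.

    Sort-based method: find the threshold theta such that
    max(v_i - theta, 0) sums to one.
    """
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # the last index passing this test fixes the threshold
    return [max(x - theta, 0.0) for x in v]
```

In an unmixing setting, a routine of this kind would typically be applied to each pixel's abundance vector after the unconstrained update.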

Open Archive Toulouse Archive Ouverte

Learning with Submodular Functions: A Convex Optimization Perspective

Author: Bach Francis
Publication venue: 'Now Publishers'
Publication date: 01/01/2013
Field of study

Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning. In this monograph, we present the theory of submodular functions from a convex analysis perspective, presenting tight links between certain polyhedra, combinatorial optimization and convex optimization problems. In particular, we show how submodular function minimization is equivalent to solving a wide variety of convex optimization problems. This allows the derivation of new efficient algorithms for approximate and exact submodular function minimization with theoretical guarantees and good practical performance. By listing many examples of submodular functions, we review various applications to machine learning, such as clustering, experimental design, sensor placement, graphical model structure learning or subset selection, as well as a family of structured sparsity-inducing norms that can be derived and used from submodular functions.
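As a small illustration of point (2), the Lovász extension of a set function F with F(∅) = 0 can be evaluated by sorting the components of the input in decreasing order and accumulating marginal gains. The sketch below assumes F is given as a Python callable on frozensets of indices; the function name and the two example set functions are assumptions chosen for illustration, not taken from the monograph.

```python
def lovasz_extension(F, w):
    """Evaluate the Lovasz extension of the set function F at the vector w.

    Indices are visited in order of decreasing w; each contributes
    w[j] times the marginal gain of adding j to the indices seen so far.
    """
    order = sorted(range(len(w)), key=lambda j: -w[j])
    value = 0.0
    chosen = set()
    prev = F(frozenset())
    for j in order:
        chosen.add(j)
        cur = F(frozenset(chosen))
        value += w[j] * (cur - prev)
        prev = cur
    return value


def at_least_one(A):
    """F(A) = min(|A|, 1); its Lovasz extension is max_j w_j."""
    return float(min(len(A), 1))


def two_node_cut(A):
    """Cut function of a single edge {0, 1}; extension is |w_0 - w_1|."""
    return 1.0 if len(A & {0, 1}) == 1 else 0.0
```

The second example shows how a cut function yields a total-variation-like regularizer, which is one way such extensions end up being used as penalties.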

INRIA a CCSD electronic archive server

Algorithmic Analysis And Statistical Inference Of Sparse Models In High Dimension

Author: Bu Zhiqi
Publication venue: ScholarlyCommons
Publication date: 01/01/2021
Field of study

The era of machine learning features large datasets with high-dimensional features. This leads to the emergence of various algorithms to learn efficiently from such high-dimensional datasets, as well as the need to analyze these algorithms from both the prediction and the statistical inference viewpoint. To be more specific, an ideal model is expected to predict accurately on unseen new data, and to provide valid inference so as to harness the uncertainty in the model. Unfortunately, the high dimension of features poses a great challenge to the analysis of many prevalent models, rendering them either inapplicable or difficult to study. This thesis leverages the approximate message passing (AMP) algorithm, optimization theory, and the Sorted L-One Penalized Estimation (SLOPE) to study several important problems of sparse models. The first chapter introduces various $\ell_1$ penalties including but not limited to SLOPE, a relatively new convex optimization procedure via the sorted $\ell_1$ penalty, in general machine learning models. We then focus on linear models and demonstrate some basic properties of SLOPE, especially its advantages over the Lasso. Next, we cover the AMP algorithm in terms of convergence behavior and asymptotic statistical characterization. The second chapter extends the AMP algorithms from the Lasso to SLOPE and provides an asymptotically tight characterization of the SLOPE solution. Note that SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted $\ell_1$ penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. We develop an asymptotically exact characterization of the SLOPE solution under Gaussian random designs through solving the SLOPE problem using approximate message passing (AMP). This algorithmic approach allows us to approximate the SLOPE solution via the much more amenable AMP iterates. Explicitly, we characterize the asymptotic dynamics of the AMP iterates relying on a recently developed state evolution analysis for non-separable penalties, thereby overcoming the difficulty caused by the sorted $\ell_1$ penalty. Moreover, we prove that the AMP iterates converge to the SLOPE solution in an asymptotic sense, and numerical simulations show that the convergence is surprisingly fast. Our proof rests on a novel technique that specifically leverages the SLOPE problem. In contrast to prior literature, our work not only yields an asymptotically sharp analysis but also offers an algorithmic, flexible, and constructive approach to understanding the SLOPE problem. The third chapter builds on top of the asymptotic characterization of SLOPE to study the trade-off between true positive proportion (TPP) and false discovery proportion (FDP) or, equivalently, between measures of type I error and power. Assuming a regime of linear sparsity and working under Gaussian random designs, we obtain an upper bound on the optimal trade-off for SLOPE, showing its capability of breaking the Donoho–Tanner power limit. To put it into perspective, this limit is the highest possible power that the Lasso, which is perhaps the most popular $\ell_1$-based method, can achieve even with arbitrarily strong effect sizes. Next, we derive a tight lower bound that delineates the fundamental limit of sorted $\ell_1$ regularization in optimally trading the FDP off for the TPP. Finally, we show that on any problem instance, SLOPE with a certain regularization sequence outperforms the Lasso, in the sense of having a smaller FDP, larger TPP, and smaller $\ell_2$ estimation risk simultaneously. Our proofs are based on a novel technique that reduces a calculus of variations problem to a class of infinite-dimensional convex optimization problems and a very recent result from approximate message passing theory. The fourth chapter works on the practical application of SLOPE by efficiently designing the SLOPE penalty sequence in the finite dimension, by restricting the number of unique values in the SLOPE penalty to be small. SLOPE's magnitude-dependent regularization requires a penalty sequence $\boldsymbol{\lambda}$ as input, instead of a scalar penalty as in the Lasso case, thus making the design extremely expensive in computation. We propose two efficient algorithms to design the possibly high-dimensional SLOPE penalty, in order to minimize the mean squared error. For Gaussian data matrices, we propose a first-order Projected Gradient Descent (PGD) under the Approximate Message Passing regime. For general data matrices, we present a zeroth-order Coordinate Descent (CD) to design a sub-class of SLOPE, referred to as the $k$-level SLOPE. Our CD allows a useful trade-off between accuracy and computation speed. We demonstrate the performance of SLOPE with our designs via extensive experiments on synthetic data and real-world datasets.
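The sorted ℓ1 penalty described above pairs the largest coefficient magnitude with the largest entry of a nonincreasing penalty sequence, the second largest with the second entry, and so on. The sketch below evaluates that penalty; it is only the penalty value, not a SLOPE solver or the thesis code, and the function name `sorted_l1_penalty` is an assumption.

```python
def sorted_l1_penalty(beta, lam):
    """Sorted-L1 (SLOPE) penalty: sum_i lam[i] * |beta|_(i).

    |beta|_(1) >= |beta|_(2) >= ... are the magnitudes of beta in
    decreasing order; lam must be nonincreasing and nonnegative.
    """
    if len(beta) != len(lam):
        raise ValueError("beta and lam must have the same length")
    if any(a < b for a, b in zip(lam, lam[1:])) or (lam and lam[-1] < 0):
        raise ValueError("lam must be nonincreasing and nonnegative")
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lam, magnitudes))
```

With a constant sequence the penalty reduces to a scaled ℓ1 norm, as in the Lasso, while a sequence with only k distinct values corresponds to the k-level sub-class mentioned in the abstract.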

ScholarlyCommons@Penn

An adaptive, fault-tolerant system for road network traffic prediction using machine learning

Author: Mena-Yedra Rafael
Publication venue: Universitat Politècnica de Catalunya
Publication date: 06/03/2020
Field of study

This thesis has addressed the design and development of an integrated system for real-time traffic forecasting based on machine learning methods. Although traffic prediction has been the driving motivation for the thesis development, a great part of the proposed ideas and scientific contributions in this thesis are generic enough to be applied to any other problem whose definition is, ideally, that of the flow of information in a graph-like structure. Such application is of special interest in environments susceptible to changes in the underlying data generation process. Moreover, the modular architecture of the proposed solution facilitates the adoption of small changes to the components that allow it to be adapted to a broader range of problems. On the other hand, certain specific parts of this thesis are strongly tied to traffic flow theory. The focus in this thesis is on a macroscopic perspective of the traffic flow where the individual road traffic flows are correlated to the underlying traffic demand. These short-term forecasts include the road network characterization in terms of the corresponding traffic measurements (traffic flow, density and/or speed), the traffic state (whether a road is congested or not, and its severity), and anomalous road conditions (incidents or other non-recurrent events). The main traffic data used in this thesis come from detectors installed along the road networks. Nevertheless, other kinds of traffic data sources could be equally suitable with the appropriate preprocessing. This thesis has been developed in the context of Aimsun Live, a simulation-based traffic solution for real-time traffic prediction developed by Aimsun. The methods proposed here are planned to be linked to it in a mutually beneficial relationship where they cooperate and assist each other. 
For example, when an incident or non-recurrent event is detected with the methods proposed in this thesis, the simulation-based forecasting module can simulate different strategies to measure their impact. Part of this thesis has also been developed in the context of the EU research project "SETA" (H2020-ICT-2015). The main motivation that has guided the development of this thesis is addressing the weak points and limitations previously identified in Aimsun Live, on which the research found in the literature has not been especially extensive. These include: • Autonomy, both in the preparation and real-time stages. • Adaptation, to gradual or abrupt changes in traffic demand or supply. • Informativeness, about anomalous road conditions. • Forecasting accuracy improved with respect to the previous methodology at Aimsun and a typical forecasting baseline. • Robustness, to deal with faulty or missing data in real time. • Interpretability, adopting modelling choices towards a more transparent reasoning and understanding of the underlying data-driven decisions. • Scalability, using a modular architecture with emphasis on a parallelizable exploitation of large amounts of data. The result of this thesis is an integrated system, Adarules, for real-time forecasting which is able to make the best of the available historical data, while at the same time it also leverages the theoretically unbounded size of data in a continuously streaming scenario. This is achieved through the online learning and change detection features along with the automatic finding and maintenance of patterns in the network graph. In addition to the Adarules system, another result is a probabilistic model that characterizes a set of interpretable latent variables related to the traffic state based on the traffic data provided by the sensors along with optional prior knowledge provided by the traffic expert following a Bayesian approach. 
On top of this traffic state model, a probabilistic spatiotemporal model is built that learns the dynamics of the transition of traffic states in the network, and whose objectives include automatic incident detection.

This thesis has addressed the design and development of an integrated system for real-time traffic prediction based on machine learning methods. Although traffic prediction has been the motivation guiding the development of the thesis, a large part of the ideas and scientific contributions proposed in it are generic enough to be applied to any other problem whose definition is, ideally, that of the flow of information in a graph structure. This application is of special interest in environments susceptible to changes in the data generation process. In addition, the modular architecture facilitates adaptation to a broader range of problems. On the other hand, certain specific parts of this thesis are strongly tied to traffic flow theory. The thesis focuses on a macroscopic perspective of traffic flow in which the individual flows are tied to the underlying traffic demand. The short-term predictions include the characterization of roads based on traffic measurements (flow, density and/or speed), the traffic state (whether the road is congested or not, and its severity), and the detection of anomalous conditions (incidents or other non-recurrent events). The data used in this thesis come from detectors installed along the road networks. Nevertheless, other kinds of data sources could equally be used with the appropriate preprocessing. This thesis has been developed in the context of Aimsun Live, simulation-based software developed by Aimsun for real-time traffic prediction. The methods proposed here will cooperate with it. 
For example, when an incident or a non-recurrent event is detected, different strategies can be simulated to measure their impact. Part of this thesis has also been developed within the framework of the EU project "SETA" (H2020-ICT-2015). The main motivation that has guided the development of this thesis is to improve on the limitations previously identified in Aimsun Live, on which the research found in the literature has not been very extensive. These include: • Autonomy, in both the preparation and real-time stages. • Adaptation, to gradual or abrupt changes in traffic demand or supply. • Informativeness, about anomalous road conditions. • Improved prediction accuracy with respect to the previous methodology at Aimsun and a typical method used as a baseline. • Robustness, to cope with faulty or missing data in real time. • Interpretability, adopting modelling criteria towards reasoning that is more transparent to a human. • Scalability, using a modular architecture with emphasis on parallel exploitation of large amounts of data. The result of this thesis is an integrated system, Adarules, for real-time prediction that makes the most of the available historical data while also taking advantage of the theoretically unbounded size of the data in a streaming scenario. This is achieved through online learning and the change detection capability together with the automatic search for and maintenance of patterns in the graph structure of the network. In addition to the Adarules system, another result of the thesis is a probabilistic model that characterizes a set of interpretable latent variables related to the traffic state, based on sensor data together with optional prior knowledge provided by the traffic expert, using a Bayesian approach. 
On top of this traffic state model, the probabilistic spatiotemporal model is built that learns the dynamics of the state transition

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

An adaptive, fault-tolerant system for road network traffic prediction using machine learning

Author: Mena-Yedra Rafael
Publication venue: Universitat Politècnica de Catalunya
Publication date: 06/03/2020
Field of study

UPCommons. Portal del coneixement obert de la UPC