
    Exclusive Sparsity Norm Minimization with Random Groups via Cone Projection

    Many practical applications, such as gene expression analysis, multi-task learning, image recognition, signal processing, and medical data analysis, pursue a sparse solution for feature selection and particularly favor nonzeros that are \emph{evenly} distributed across different groups. The exclusive sparsity norm has been widely used to serve this purpose. However, systematic studies of exclusive sparsity norm optimization are still lacking. This paper offers two main contributions from the optimization perspective: 1) we provide several efficient algorithms to solve exclusive sparsity norm minimization with either a smooth loss or the hinge loss (a non-smooth loss); all algorithms achieve the optimal convergence rate $O(1/k^2)$ ($k$ is the iteration number), and to the best of our knowledge this is the first work to guarantee such a convergence rate for general exclusive sparsity norm minimization; 2) when group information is unavailable to define the exclusive sparsity norm, we propose a random grouping scheme to construct groups and prove that, if the number of groups is chosen appropriately, the nonzeros (true features) are grouped in the ideal way with high probability. Empirical studies validate the efficiency of the proposed algorithms and the effectiveness of the random grouping scheme on the proposed exclusive SVM formulation.
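
    A minimal NumPy sketch of the two ingredients named in the abstract: the exclusive sparsity penalty, usually written as the sum over groups of the squared $\ell_1$ norm of each group's coefficients, and a uniformly random assignment of features to groups. The group count, seed, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def random_groups(n_features, n_groups, seed=0):
    """Assign each feature to one of n_groups groups uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_groups, size=n_features)

def exclusive_sparsity_norm(x, groups):
    """Exclusive sparsity penalty: sum over groups of the squared l1 norm
    of the coefficients belonging to that group."""
    total = 0.0
    for g in np.unique(groups):
        total += np.abs(x[groups == g]).sum() ** 2
    return total

x = np.array([0.0, 1.5, -0.3, 0.0, 2.0, 0.0])
groups = random_groups(len(x), n_groups=3)
print(exclusive_sparsity_norm(x, groups))
```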

    Fast Sparse Least-Squares Regression with Non-Asymptotic Guarantees

    In this paper, we study a fast approximation method for the {\it large-scale high-dimensional} sparse least-squares regression problem by exploiting Johnson-Lindenstrauss (JL) transforms, which embed a set of high-dimensional vectors into a low-dimensional space. In particular, we propose to apply JL transforms to the data matrix and the target vector and then solve a sparse least-squares problem on the compressed data with a {\it slightly larger regularization parameter}. Theoretically, we establish the optimization error bound of the learned model for two different sparsity-inducing regularizers, i.e., the elastic net and the $\ell_1$ norm. Compared with previous relevant work, our analysis is {\it non-asymptotic and exhibits more insights} into the bound, the sample complexity, and the regularization. As an illustration, we also provide an error bound for the {\it Dantzig selector} under JL transforms.
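
    A hedged sketch of the compression step described above: a Gaussian JL-style sketch is applied to the data matrix and target vector, and a lasso is then solved on the compressed pair with a somewhat larger regularization parameter. The problem sizes, the choice of sketch, the regularization value, and the use of scikit-learn's Lasso solver are illustrative assumptions, not the paper's prescription.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, m = 2000, 500, 200               # samples, features, sketch size (m << n)
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:10] = 1.0                      # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Gaussian JL-style sketch: i.i.d. N(0, 1/m) entries so that E[S^T S] = I,
# and (S A, S b) approximately preserves the least-squares geometry.
S = rng.standard_normal((m, n)) / np.sqrt(m)
A_s, b_s = S @ A, S @ b

# Solve the lasso on the compressed data with a slightly larger
# regularization parameter (value chosen for illustration only).
model = Lasso(alpha=0.05, max_iter=10000).fit(A_s, b_s)
print("recovered support:", np.flatnonzero(model.coef_)[:10])
```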

    Fast and Scalable Lasso via Stochastic Frank-Wolfe Methods with a Convergence Guarantee

    Frank-Wolfe (FW) algorithms have often been proposed over the last few years as efficient solvers for a variety of optimization problems arising in the field of Machine Learning. The ability to work with cheap projection-free iterations and the incremental nature of the method make FW a very effective choice for many large-scale problems where computing a sparse model is desirable. In this paper, we present a high-performance implementation of the FW method tailored to solve large-scale Lasso regression problems, based on a randomized iteration, and prove that the convergence guarantees of the standard FW method are preserved in the stochastic setting. We show experimentally that our algorithm outperforms several existing state-of-the-art methods, including the Coordinate Descent algorithm by Friedman et al. (one of the fastest known Lasso solvers), on several benchmark datasets with a very large number of features, without sacrificing the accuracy of the model. Our results illustrate that the algorithm is able to generate the complete regularization path on problems with up to four million variables in less than one minute.
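
    A minimal sketch of a randomized Frank-Wolfe iteration for the $\ell_1$-constrained least-squares (Lasso) problem: each step evaluates the linear minimization oracle only over a random block of coordinates and then takes the standard convex-combination update. The block size, step rule, and function name are assumptions for illustration; the paper's randomized scheme may differ in its details.

```python
import numpy as np

def stochastic_fw_lasso(A, b, tau, n_iters=500, block=50, seed=0):
    """Frank-Wolfe for min 0.5*||Ax - b||^2 subject to ||x||_1 <= tau,
    with the linear minimization oracle restricted to a random coordinate
    block at each iteration (a simple randomized variant)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for k in range(n_iters):
        grad = A.T @ (A @ x - b)
        idx = rng.choice(d, size=min(block, d), replace=False)
        i = idx[np.argmax(np.abs(grad[idx]))]   # best atom within the block
        s = np.zeros(d)
        s[i] = -tau * np.sign(grad[i])          # vertex of the l1 ball
        gamma = 2.0 / (k + 2.0)                 # standard FW step size
        x = (1 - gamma) * x + gamma * s         # stays inside the l1 ball
    return x
```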

    Randomized sketch descent methods for non-separable linearly constrained optimization

    In this paper we consider large-scale smooth optimization problems with multiple linear coupled constraints. Due to the non-separability of the constraints, arbitrary random sketching is not guaranteed to work. Thus, we first investigate necessary and sufficient conditions on the sketch sampling for the algorithms to be well defined. Based on these sampling conditions, we develop new sketch descent methods for solving general smooth linearly constrained problems, in particular random sketch descent and accelerated random sketch descent methods. To our knowledge, this is the first convergence analysis of random sketch descent algorithms for optimization problems with multiple non-separable linear constraints. In the general case, when the objective function is smooth and non-convex, we prove a sublinear rate in expectation for the non-accelerated variant with respect to an appropriate optimality measure. In the smooth convex case, we derive sublinear convergence rates in the expected objective value for both the non-accelerated and accelerated random sketch descent methods. Additionally, if the objective function satisfies a strong-convexity-type condition, both algorithms converge linearly in expectation. In special cases where complexity bounds are known for particular sketching algorithms, such as coordinate descent methods for optimization problems with a single linear coupled constraint, our theory recovers the best-known bounds. We also show that sketching the coordinate directions randomly produces better results than a fixed selection rule. Finally, we present some numerical examples to illustrate the performance of our new algorithms. Comment: 28 pages.
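
    A minimal sketch of the special case the abstract mentions: two-coordinate descent for a problem with a single coupled constraint sum(x) = const, where moving along e_i - e_j keeps every iterate feasible. The paper generalizes this kind of feasibility-preserving "sketching" to multiple non-separable constraints and adds an accelerated variant; the function name, step rule, and Lipschitz bound L below are assumptions for illustration.

```python
import numpy as np

def pairwise_cd(grad, x0, L, n_iters=2000, seed=0):
    """Random two-coordinate descent for min f(x) s.t. sum(x) = sum(x0).
    A gradient step along the direction e_i - e_j preserves the coupled
    constraint; L is an assumed coordinate-wise Lipschitz bound on grad f."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    d = x.size
    for _ in range(n_iters):
        i, j = rng.choice(d, size=2, replace=False)
        g = grad(x)
        t = -(g[i] - g[j]) / (2.0 * L)   # step along e_i - e_j
        x[i] += t
        x[j] -= t
    return x

# Example: project y onto the hyperplane {x : sum(x) = 1}
# by minimizing 0.5*||x - y||^2 over that constraint.
y = np.array([0.9, 0.3, -0.2, 0.5])
x = pairwise_cd(lambda x: x - y, np.full(4, 0.25), L=1.0)
print(x)
```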

    A Field Guide to Forward-Backward Splitting with a FASTA Implementation

    Non-differentiable and constrained optimization play a key role in machine learning, signal and image processing, communications, and beyond. For high-dimensional minimization problems involving large datasets or many unknowns, the forward-backward splitting method provides a simple, practical solver. Despite its apparent simplicity, the performance of forward-backward splitting is highly sensitive to implementation details. This article is an introductory review of forward-backward splitting with a special emphasis on practical implementation concerns. Issues like stepsize selection, acceleration, stopping conditions, and initialization are considered. Numerical experiments are used to compare the effectiveness of different approaches. Many variations of forward-backward splitting are implemented in the solver FASTA (short for Fast Adaptive Shrinkage/Thresholding Algorithm). FASTA provides a simple interface for applying forward-backward splitting to a broad range of problems.
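
    A minimal forward-backward splitting loop for the lasso, the kind of composite problem FASTA targets: a gradient ("forward") step on the smooth term followed by a proximal ("backward") step on the nonsmooth term. The fixed stepsize and iteration count are simplifying assumptions; FASTA's value lies precisely in the adaptive stepsizes, acceleration, and stopping rules omitted here.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 (the 'backward' step)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def forward_backward_lasso(A, b, lam, n_iters=500):
    """Forward-backward splitting for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    tau = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L = Lipschitz constant of the gradient
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)                 # forward (explicit gradient) step
        x = soft_threshold(x - tau * grad, tau * lam)  # backward (proximal) step
    return x
```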

    Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces

    In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms; (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees and remains in a stable region of the parameter space; (iii) how to design "off-policy" temporal difference learning algorithms in a reliable and stable manner; and finally (iv) how to integrate the study of reinforcement learning into the rich theory of stochastic optimization. In this paper, we provide detailed answers to all these questions using the powerful framework of proximal operators. The key idea that emerges is the use of primal-dual spaces connected through a Legendre transform. This allows temporal difference updates to occur in dual spaces, yielding a variety of important technical advantages. The Legendre transform elegantly generalizes past algorithms for solving reinforcement learning problems, such as natural gradient methods, which we show relate closely to the previously unconnected framework of mirror descent methods. Equally importantly, proximal operator theory enables the systematic development of operator splitting methods that show how to safely and reliably decompose complex products of gradients that occur in recent variants of gradient-based temporal difference learning. This key technical innovation makes it possible to finally design "true" stochastic gradient methods for reinforcement learning. Finally, Legendre transforms enable a variety of other benefits, including modeling sparsity and domain geometry. Our work builds extensively on recent work on the convergence of saddle-point algorithms, and on the theory of monotone operators. Comment: 121 pages.
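
    For concreteness, a sketch of a gradient-TD style update with an auxiliary dual weight vector, in the spirit of the primal-dual structure discussed above. The specific rule shown is the standard two-timescale GTD2 update, not necessarily the mirror-descent or operator-splitting variant developed in the paper; all names and stepsizes are illustrative.

```python
import numpy as np

def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 update: theta holds the primal (value-function) weights,
    w is an auxiliary dual vector; phi and phi_next are feature vectors
    of the current and next state."""
    delta = reward + gamma * theta @ phi_next - theta @ phi          # TD error
    w = w + beta * (delta - w @ phi) * phi                           # dual update
    theta = theta + alpha * (phi - gamma * phi_next) * (w @ phi)     # primal update
    return theta, w
```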

    Proximal Distance Algorithms: Theory and Examples

    Proximal distance algorithms combine the classical penalty method of constrained minimization with distance majorization. If $f(\boldsymbol{x})$ is the loss function and $C$ is the constraint set in a constrained minimization problem, then the proximal distance principle mandates minimizing the penalized loss $f(\boldsymbol{x})+\frac{\rho}{2}\mathop{dist}(\boldsymbol{x},C)^2$ and following the solution $\boldsymbol{x}_{\rho}$ to its limit as $\rho$ tends to $\infty$. At each iteration the squared Euclidean distance $\mathop{dist}(\boldsymbol{x},C)^2$ is majorized by the spherical quadratic $\|\boldsymbol{x}-P_C(\boldsymbol{x}_k)\|^2$, where $P_C(\boldsymbol{x}_k)$ denotes the projection of the current iterate $\boldsymbol{x}_k$ onto $C$. The minimum of the surrogate function $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{x}-P_C(\boldsymbol{x}_k)\|^2$ is given by the proximal map $\mathop{prox}_{\rho^{-1}f}[P_C(\boldsymbol{x}_k)]$. The next iterate $\boldsymbol{x}_{k+1}$ automatically decreases the original penalized loss for fixed $\rho$. Since many explicit projections and proximal maps are known, it is straightforward to derive and implement novel optimization algorithms in this setting. These algorithms can take hundreds if not thousands of iterations to converge, but the stereotyped nature of each iteration makes proximal distance algorithms competitive with traditional algorithms. For convex problems, we prove global convergence. Our numerical examples include a) linear programming, b) nonnegative quadratic programming, c) projection to the closest kinship matrix, d) projection onto a second-order cone constraint, e) calculation of Horn's copositive matrix index, f) linear complementarity programming, and g) sparse principal components analysis. The proximal distance algorithm in each case is competitive or superior in speed to traditional methods. Comment: 23 pages, 2 figures, 7 tables.
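
    A minimal sketch of the iteration described above, specialized to nonnegative least squares with $f(\boldsymbol{x}) = \frac{1}{2}\|A\boldsymbol{x}-\boldsymbol{b}\|^2$ and $C$ the nonnegative orthant, where both the projection and the proximal map have simple closed forms. The annealing schedule for $\rho$ and the iteration budget are illustrative assumptions, not the paper's tuning.

```python
import numpy as np

def proximal_distance_nnls(A, b, rho=1.0, rho_growth=1.2, n_iters=200):
    """Proximal distance sketch for nonnegative least squares.
    Each iteration projects the iterate onto C (the nonnegative orthant),
    then minimizes f(x) + (rho/2)*||x - P_C(x_k)||^2, i.e. applies
    prox_{f/rho} at the projected point, while rho grows toward infinity."""
    d = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    x = np.zeros(d)
    for _ in range(n_iters):
        p = np.maximum(x, 0.0)                                   # P_C(x_k)
        x = np.linalg.solve(AtA + rho * np.eye(d), Atb + rho * p)  # prox step
        rho *= rho_growth                                        # anneal the penalty
    return np.maximum(x, 0.0)
```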

    On the Suboptimality of Proximal Gradient Descent for $\ell^0$ Sparse Approximation

    We study the proximal gradient descent (PGD) method for the $\ell^0$ sparse approximation problem, as well as its acceleration with randomized algorithms. We first offer a theoretical analysis of PGD showing a bounded gap between the sub-optimal solution produced by PGD and the globally optimal solution of the $\ell^0$ sparse approximation problem, under conditions weaker than the Restricted Isometry Property widely used in the compressive sensing literature. Moreover, we propose randomized algorithms to accelerate the optimization by PGD using randomized low-rank matrix approximation (PGD-RMA) and randomized dimension reduction (PGD-RDR). Our randomized algorithms substantially reduce the computation cost of the original PGD for the $\ell^0$ sparse approximation problem, and the resultant sub-optimal solution still enjoys provable suboptimality: the sub-optimal solution to the reduced problem still has a bounded gap to the globally optimal solution of the original problem.
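
    A minimal sketch of plain PGD for the $\ell^0$-penalized least-squares objective, whose proximal map is elementwise hard thresholding. The randomized accelerations (PGD-RMA and PGD-RDR) are omitted, and the stepsize and iteration count are illustrative assumptions.

```python
import numpy as np

def hard_threshold(z, t):
    """Proximal map of t*||.||_0: zero out entries with |z_i| <= sqrt(2t)."""
    out = z.copy()
    out[np.abs(z) <= np.sqrt(2.0 * t)] = 0.0
    return out

def pgd_l0(A, b, lam, n_iters=300):
    """Proximal gradient descent for min 0.5*||Ax - b||^2 + lam*||x||_0."""
    x = np.zeros(A.shape[1])
    tau = 1.0 / np.linalg.norm(A, 2) ** 2       # stepsize 1/L
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)                # gradient step on the smooth term
        x = hard_threshold(x - tau * grad, tau * lam)  # prox of the l0 penalty
    return x
```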

    Efficient numerical algorithms for regularized regression problem with applications to traffic matrix estimations

    In this work we collect and compare many different numerical methods for the regularized regression problem and for the problem of projection onto a hyperplane. Such problems arise, for example, as a subproblem of demand matrix estimation in IP networks. In this special case the matrix of affine constraints has a special structure: all its elements are 0 or 1, and the matrix is quite sparse. We thus have to deal with a huge-scale convex optimization problem of a special type. Using the properties of the problem, we try "to look inside the black box" and to see how the best modern methods behave when applied to this problem. Comment: 16 pages; Information Technologies and Systems. Sochi: September, 201
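
    For the hyperplane-projection subproblem mentioned above there is a simple closed form, $P(\boldsymbol{x}) = \boldsymbol{x} - \frac{\boldsymbol{a}^\top\boldsymbol{x} - c}{\|\boldsymbol{a}\|^2}\,\boldsymbol{a}$, sketched below with a 0/1 incidence-style row as in the traffic-matrix setting. The example vectors and names are purely illustrative.

```python
import numpy as np

def project_onto_hyperplane(x, a, c):
    """Euclidean projection of x onto the hyperplane {y : a^T y = c}."""
    return x - (a @ x - c) / (a @ a) * a

# a is a sparse 0/1 routing-style row; values chosen for illustration.
a = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
x = np.array([2.0, 1.0, 0.5, -1.0, 3.0])
print(project_onto_hyperplane(x, a, c=4.0))
```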

    Sparse Trace Norm Regularization

    We study the problem of estimating multiple predictive functions from a dictionary of basis functions in the nonparametric regression setting. Our estimation scheme assumes that each predictive function can be estimated as a linear combination of the basis functions. By assuming that the coefficient matrix admits a sparse low-rank structure, we formulate the function estimation problem as a convex program regularized by the trace norm and the $\ell_1$-norm simultaneously. We propose to solve the convex program using the accelerated gradient (AG) method and the alternating direction method of multipliers (ADMM), respectively; we also develop efficient algorithms to solve the key components in both AG and ADMM. In addition, we conduct a theoretical analysis of the proposed function estimation scheme: we derive a key property of the optimal solution to the convex program and, based on an assumption on the basis functions, we establish a performance bound of the proposed function estimation scheme (via the composite regularization). Simulation studies demonstrate the effectiveness and efficiency of the proposed algorithms.
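
    The key components inside both AG and ADMM for such a composite penalty are the proximal maps of the two regularizers; the composite penalty itself needs a joint prox that the paper develops and that is not reproduced here. Below is a sketch of the two standard building blocks, singular-value soft-thresholding for the trace norm and elementwise soft-thresholding for the $\ell_1$-norm, with illustrative function names.

```python
import numpy as np

def prox_l1(W, t):
    """Elementwise soft-thresholding: prox of t*||W||_1."""
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def prox_trace_norm(W, t):
    """Singular-value soft-thresholding: prox of t*||W||_* (trace norm)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt
```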