Search CORE

20 research outputs found

Sparse projections onto the simplex

Author: and Volkan Cevher
Becker Stephen
Koch Christoph
Kyrillidis Anastasios
Publication venue
Publication date: 10/04/2013
Field of study

Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the

\ell_1

-norm. However, several important learning applications cannot benefit from this approach as they feature these convex norms as constraints in addition to the non-convex rank and sparsity constraints. In this setting, we derive efficient sparse projections onto the simplex and its extension, and illustrate how to use them to solve high-dimensional learning problems in quantum tomography, sparse density estimation and portfolio selection with non-convex constraints.Comment: 9 Page

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Randomized Low-Memory Singular Value Projection

Author: Becker Stephen
Cevher Volkan
Kyrillidis Anastasios
Publication venue
Publication date: 25/02/2013
Field of study

Affine rank minimization algorithms typically rely on calculating the gradient of a data error followed by a singular value decomposition at every iteration. Because these two steps are expensive, heuristic approximations are often used to reduce computational burden. To this end, we propose a recovery scheme that merges the two steps with randomized approximations, and as a result, operates on space proportional to the degrees of freedom in the problem. We theoretically establish the estimation guarantees of the algorithm as a function of approximation tolerance. While the theoretical approximation requirements are overly pessimistic, we demonstrate that in practice the algorithm performs well on the quantum tomography recovery problem.Comment: 13 pages. This version has a revised theorem and new numerical experiment

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Peaceman-Rachford splitting for a class of nonconvex optimization problems

Author: Li Guoyin
Liu Tianxiang
Pong Ting Kei
Publication venue
Publication date: 09/01/2017
Field of study

We study the applicability of the Peaceman-Rachford (PR) splitting method for solving nonconvex optimization problems. When applied to minimizing the sum of a strongly convex Lipschitz differentiable function and a proper closed function, we show that if the strongly convex function has a large enough strong convexity modulus and the step-size parameter is chosen below a threshold that is computable, then any cluster point of the sequence generated, if exists, will give a stationary point of the optimization problem. We also give sufficient conditions guaranteeing boundedness of the sequence generated. We then discuss one way to split the objective so that the proposed method can be suitably applied to solving optimization problems with a coercive objective that is the sum of a (not necessarily strongly) convex Lipschitz differentiable function and a proper closed function; this setting covers a large class of nonconvex feasibility problems and constrained least squares problems. Finally, we illustrate the proposed algorithm numerically

arXiv.org e-Print Archive

PolyU Institutional Repository

UNSWorks

Global convergence of splitting methods for nonconvex composite optimization

Author: Li Guoyin
Pong Ting Kei
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2015
Field of study

We consider the problem of minimizing the sum of a smooth function

h

with a bounded Hessian, and a nonsmooth function. We assume that the latter function is a composition of a proper closed function

P

and a surjective linear map

\cal M

, with the proximal mappings of

\tau P

\tau > 0

, simple to compute. This problem is nonconvex in general and encompasses many important applications in engineering and machine learning. In this paper, we examined two types of splitting methods for solving this nonconvex optimization problem: alternating direction method of multipliers and proximal gradient algorithm. For the direct adaptation of the alternating direction method of multipliers, we show that, if the penalty parameter is chosen sufficiently large and the sequence generated has a cluster point, then it gives a stationary point of the nonconvex problem. We also establish convergence of the whole sequence under an additional assumption that the functions

h

and

P

are semi-algebraic. Furthermore, we give simple sufficient conditions to guarantee boundedness of the sequence generated. These conditions can be satisfied for a wide range of applications including the least squares problem with the

\ell_{1/2}

regularization. Finally, when

\cal M

is the identity so that the proximal gradient algorithm can be efficiently applied, we show that any cluster point is stationary under a slightly more flexible constant step-size rule than what is known in the literature for a nonconvex

h

.Comment: To appear in SIOP

arXiv.org e-Print Archive

The Hong Kong Polytechnic University Pao Yue-kong Library

UNSWorks

Calculus of the exponent of Kurdyka-{\L}ojasiewicz inequality and its applications to linear convergence of first-order methods

Author: Li Guoyin
Pong Ting Kei
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/08/2017
Field of study

In this paper, we study the Kurdyka-{\L}ojasiewicz (KL) exponent, an important quantity for analyzing the convergence rate of first-order methods. Specifically, we develop various calculus rules to deduce the KL exponent of new (possibly nonconvex and nonsmooth) functions formed from functions with known KL exponents. In addition, we show that the well-studied Luo-Tseng error bound together with a mild assumption on the separation of stationary values implies that the KL exponent is

\frac12

. The Luo-Tseng error bound is known to hold for a large class of concrete structured optimization problems, and thus we deduce the KL exponent of a large class of functions whose exponents were previously unknown. Building upon this and the calculus rules, we are then able to show that for many convex or nonconvex optimization models for applications such as sparse recovery, their objective function's KL exponent is

\frac12

. This includes the least squares problem with smoothly clipped absolute deviation (SCAD) regularization or minimax concave penalty (MCP) regularization and the logistic regression problem with

\ell_1

regularization. Since many existing local convergence rate analysis for first-order methods in the nonconvex scenario relies on the KL exponent, our results enable us to obtain explicit convergence rate for various first-order methods when they are applied to a large variety of practical optimization models. Finally, we further illustrate how our results can be applied to establishing local linear convergence of the proximal gradient algorithm and the inertial proximal algorithm with constant step-sizes for some specific models that arise in sparse recovery.Comment: The paper is accepted for publication in Foundations of Computational Mathematics: https://link.springer.com/article/10.1007/s10208-017-9366-8. In this update, we fill in more details to the proof of Theorem 4.1 concerning the nonemptiness of the projection onto the set of stationary point

arXiv.org e-Print Archive

Crossref

PolyU Institutional Repository

UNSWorks

Double spike Dirichlet priors for structured weighting

Author: Li Meng
Lin Huiming
Publication venue
Publication date: 30/03/2021
Field of study

Assigning weights to a large pool of objects is a fundamental task in a wide variety of applications. In this article, we introduce a concept of structured high-dimensional probability simplexes, whose most components are zero or near zero and the remaining ones are close to each other. Such structure is well motivated by 1) high-dimensional weights that are common in modern applications, and 2) ubiquitous examples in which equal weights -- despite their simplicity -- often achieve favorable or even state-of-the-art predictive performances. This particular structure, however, presents unique challenges both computationally and statistically. To address these challenges, we propose a new class of double spike Dirichlet priors to shrink a probability simplex to one with the desired structure. When applied to ensemble learning, such priors lead to a Bayesian method for structured high-dimensional ensembles that is useful for forecast combination and improving random forests, while enabling uncertainty quantification. We design efficient Markov chain Monte Carlo algorithms for easy implementation. Posterior contraction rates are established to provide theoretical support. We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters dataset and a UCI dataset

arXiv.org e-Print Archive