3,197 research outputs found

On convergence of the maximum block improvement method

Author: Li Zhening
Uschmajew André
Zhang Shuzhong
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2015

Abstract. The MBI (maximum block improvement) method is a greedy approach to solving optimization problems where the decision variables can be grouped into a finite number of blocks. Assuming that optimizing over one block of variables while fixing all others is relatively easy, the MBI method updates the block of variables corresponding to the maximally improving block at each iteration, which is arguably a most natural and simple process to tackle block-structured problems, with great potential for engineering applications. In this paper we establish global and local linear convergence results for this method. The global convergence is established under the Łojasiewicz inequality assumption, while the local analysis invokes second-order assumptions. We study in particular the tensor optimization model with spherical constraints. Conditions for linear convergence of the famous power method for computing the maximum eigenvalue of a matrix follow in this framework as a special case. The condition is interpreted in various other forms for the rank-one tensor optimization model under spherical constraints. Numerical experiments are shown to support the convergence property of the MBI method.
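The block-selection rule described in this abstract can be illustrated on a small case. The following is a minimal sketch, not the authors' implementation: it assumes the two-block problem of maximizing x^T A y over unit vectors x and y (a rank-one matrix model with spherical constraints), where each block's best response has a closed form. The function name `mbi_rank_one`, the test matrix, and the stopping tolerance are all hypothetical choices made for illustration.

```python
import math

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def normalize(v):
    n = norm(v)
    return [a / n for a in v]

def mbi_rank_one(A, x, y, iters=200, tol=1e-12):
    """Maximize x^T A y over unit vectors x, y, updating one block per step."""
    At = transpose(A)
    x, y = normalize(x), normalize(y)
    value = sum(xi * ai for xi, ai in zip(x, matvec(A, y)))
    for _ in range(iters):
        # Best response of each block with the other block held fixed:
        # the optimal x is Ay/||Ay|| (objective ||Ay||), and the optimal
        # y is A^T x/||A^T x|| (objective ||A^T x||).
        Ay = matvec(A, y)
        Atx = matvec(At, x)
        gain_x = norm(Ay) - value
        gain_y = norm(Atx) - value
        if max(gain_x, gain_y) <= tol:
            break
        # Update only the block that improves the objective the most.
        if gain_x >= gain_y:
            x, value = normalize(Ay), norm(Ay)
        else:
            y, value = normalize(Atx), norm(Atx)
    return x, y, value

# Hypothetical test matrix whose largest singular value is 3.
x_hat, y_hat, sigma = mbi_rank_one([[3.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [1.0, 1.0])
```

On this two-block example a block that has just been updated has zero gain on the next pass, so the selection rule ends up alternating between the blocks, much like a power-type iteration. With three or more blocks the greedy choice would generally differ from a fixed cyclic order.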

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

OPUS Augsburg

Crossref

Portsmouth University Research Portal (Pure)

Non-convex Optimization for Machine Learning

Author: Jain Prateek
Kar Purushottam
Publication venue: 'Now Publishers'
Publication date: 01/01/2017

A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed, or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround has been to relax non-convex problems to convex ones and use traditional methods to solve the relaxed (convex) optimization problems. However, this approach may be lossy and nevertheless presents significant challenges for large-scale optimization. On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques; popular heuristics include projected gradient descent and alternating minimization. However, these are often poorly understood in terms of their convergence and other properties. This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. The monograph will lead the reader through several widely used non-convex optimization techniques, as well as applications thereof. The goal of this monograph is both to introduce the rich literature in this area and to equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems.

Comment: The official publication is available from now publishers via http://dx.doi.org/10.1561/220000005

arXiv.org e-Print Archive

Crossref

CERN Document Server

Gradient methods for convex minimization: better rates under weaker conditions

Author: Yin Wotao
Zhang Hui
Publication date: 01/01/2013

The convergence behavior of gradient methods for minimizing convex differentiable functions is one of the core questions in convex optimization. This paper shows that their well-known complexities can be achieved under conditions weaker than the commonly accepted ones. We relax the common gradient Lipschitz-continuity condition and strong convexity condition to ones that hold only over certain line segments. Specifically, we establish complexities $O(\frac{R}{\epsilon})$ and $O(\sqrt{\frac{R}{\epsilon}})$ for the ordinary and accelerated gradient methods, respectively, assuming that $\nabla f$ is Lipschitz continuous with constant $R$ over the line segment joining $x$ and $x-\frac{1}{R}\nabla f$ for each $x\in\operatorname{dom} f$. Then we improve them to $O(\frac{R}{\nu}\log(\frac{1}{\epsilon}))$ and $O(\sqrt{\frac{R}{\nu}}\log(\frac{1}{\epsilon}))$ for a function $f$ that also satisfies the secant inequality $\langle \nabla f(x), x-x^*\rangle \ge \nu\|x-x^*\|^2$ for each $x\in\operatorname{dom} f$ and its projection $x^*$ to the minimizer set of $f$. The secant condition is also shown to be necessary for the geometric decay of solution error. Not only are the relaxed conditions met by more functions, but the restrictions also give a smaller $R$ and a larger $\nu$ than would hold without them, and thus lead to better complexity bounds. We apply these results to sparse optimization and demonstrate a faster algorithm.

Comment: 20 pages, 4 figures, typos are corrected, Theorem 2 is ne
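To make the role of the secant inequality concrete, here is a minimal sketch of the ordinary gradient method with step size 1/R on a toy problem. Everything in it is an assumption chosen for illustration rather than taken from the paper: the quadratic f(x) = 0.5 * sum(d_i * x_i^2) with d = (4, 1, 0), the starting point, and the names `D`, `R`, `NU`, and `gradient_descent`. This f is convex but not strongly convex (it is flat in the third coordinate), yet it satisfies the secant inequality with nu = 1 relative to the projection onto its minimizer set, and its gradient is Lipschitz with R = 4.

```python
import math

# Hypothetical test problem: f(x) = 0.5 * sum(d_i * x_i^2) with d = (4, 1, 0).
D = [4.0, 1.0, 0.0]
R = max(D)                       # Lipschitz constant of the gradient
NU = min(d for d in D if d > 0)  # secant-inequality constant for this f

def grad(x):
    return [d * xi for d, xi in zip(D, x)]

def dist_to_minimizers(x):
    # The minimizer set is {x : x_1 = x_2 = 0}; projecting onto it keeps x_3,
    # so the distance only involves coordinates with d_i > 0.
    return math.sqrt(sum(xi * xi for d, xi in zip(D, x) if d > 0))

def gradient_descent(x, steps):
    """Ordinary gradient method with the fixed step size 1/R."""
    errors = [dist_to_minimizers(x)]
    for _ in range(steps):
        g = grad(x)
        x = [xi - gi / R for xi, gi in zip(x, g)]
        errors.append(dist_to_minimizers(x))
    return x, errors

x_final, errors = gradient_descent([1.0, 1.0, 5.0], steps=30)
```

In this sketch the distance to the minimizer set shrinks by a constant factor at every step even though f has no unique minimizer, while the flat coordinate never moves. That is the kind of geometric decay the abstract associates with the secant condition.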

arXiv.org e-Print Archive

CiteSeerX

DSpace at Rice University