
    Fitting Spectral Decay with the k-Support Norm

    The spectral $k$-support norm enjoys good estimation properties in low rank matrix learning problems, empirically outperforming the trace norm. Its unit ball is the convex hull of rank-$k$ matrices with unit Frobenius norm. In this paper we generalize the norm to the spectral $(k,p)$-support norm, whose additional parameter $p$ can be used to tailor the norm to the decay of the spectrum of the underlying model. We characterize the unit ball and we explicitly compute the norm. We further provide a conditional gradient method to solve regularization problems with the norm, and we derive an efficient algorithm to compute the Euclidean projection onto the unit ball in the case $p=\infty$. In numerical experiments, we show that allowing $p$ to vary significantly improves performance over the spectral $k$-support norm on various matrix completion benchmarks and better captures the spectral decay of the underlying model.
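    For a rough illustration of the object being generalized, the sketch below evaluates the vector $k$-support norm via the closed form of Argyriou, Foygel and Srebro (2012) and applies it to the singular values to obtain the spectral $k$-support norm (the $p=2$ case of the norm studied in this paper). This is a minimal sketch under that closed form; the function names and the fallback branch are our own, and it is not the paper's algorithm for the general $(k,p)$-support norm.

```python
import numpy as np

def k_support_norm(w, k):
    """Vector k-support norm via the closed form of Argyriou, Foygel and Srebro (2012)."""
    z = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]   # |w| sorted in decreasing order
    d = len(z)
    assert 1 <= k <= d
    tails = np.cumsum(z[::-1])[::-1]                         # tails[j] = z[j] + ... + z[d-1]
    for r in range(k):
        # condition (1-indexed): z_{k-r-1} > (1/(r+1)) * sum_{i=k-r}^{d} z_i >= z_{k-r}, with z_0 = +inf
        upper = np.inf if k - r - 1 == 0 else z[k - r - 2]
        mean_tail = tails[k - r - 1] / (r + 1)
        if upper > mean_tail >= z[k - r - 1]:
            head = np.sum(z[: k - r - 1] ** 2)
            return float(np.sqrt(head + tails[k - r - 1] ** 2 / (r + 1)))
    return float(np.sqrt(tails[0] ** 2 / k))                 # theoretically unreachable; numerical fallback

def spectral_k_support_norm(X, k):
    """Spectral k-support norm: the vector norm applied to the singular values."""
    return k_support_norm(np.linalg.svd(X, compute_uv=False), k)

# Sanity checks: k = 1 recovers the l1 norm, k = d recovers the l2 (Frobenius) norm.
w = np.array([3.0, -1.0, 0.5, 0.0])
print(k_support_norm(w, 1), np.sum(np.abs(w)))   # 4.5, 4.5
print(k_support_norm(w, 4), np.linalg.norm(w))   # ~3.2016, ~3.2016
```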

    Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets

    In this paper, we propose approximate Frank-Wolfe (FW) algorithms to solve convex optimization problems over graph-structured support sets where the \textit{linear minimization oracle} (LMO) cannot be efficiently obtained in general. We first demonstrate that two popular approximation assumptions (\textit{additive} and \textit{multiplicative} gap errors) are not valid for our problem, in that no cheap gap-approximate LMO exists in general. Instead, a new \textit{approximate dual maximization oracle} (DMO) is proposed, which approximates the inner product rather than the gap. When the objective is $L$-smooth, we prove that the standard FW method using a $\delta$-approximate DMO converges as $\mathcal{O}(L / \delta t + (1-\delta)(\delta^{-1} + \delta^{-2}))$ in general, and as $\mathcal{O}(L/(\delta^2(t+2)))$ over a $\delta$-relaxation of the constraint set. Additionally, when the objective is $\mu$-strongly convex and the solution is unique, a variant of FW converges as $\mathcal{O}(L^2\log(t)/(\mu \delta^6 t^2))$ with the same per-iteration complexity. Our empirical results suggest that even these improved bounds are pessimistic, with significant improvement in recovering real-world images with graph-structured sparsity.
    Comment: 30 pages, 8 figures
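    The sketch below shows the structure of a standard Frank-Wolfe loop with step size $2/(t+2)$ in which the linear minimization step is a pluggable oracle. The $\ell_1$-ball oracle, the least-squares objective, and all names are illustrative stand-ins rather than the paper's graph-structured support set or its $\delta$-approximate DMO, which only guarantees an inner product within a factor $\delta$ of the exact maximizer.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Exact linear minimization over the l1-ball of the given radius.

    Stand-in for an oracle over a graph-structured support set; a
    delta-approximate DMO would only guarantee <v, -grad> within a factor
    delta of the exact maximum, which we do not model here.
    """
    i = np.argmax(np.abs(grad))
    v = np.zeros_like(grad)
    v[i] = -radius * np.sign(grad[i])
    return v

def frank_wolfe(grad_f, x0, oracle, n_iters=200):
    """Standard Frank-Wolfe iteration with the classic 2/(t+2) step size."""
    x = x0.copy()
    for t in range(n_iters):
        v = oracle(grad_f(x))          # (approximate) linear minimization direction
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * v
    return x

# Toy example: f(x) = 0.5 * ||A x - b||^2 constrained to the unit l1-ball.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
grad_f = lambda x: A.T @ (A @ x - b)
x_hat = frank_wolfe(grad_f, np.zeros(20), lmo_l1_ball)
```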

    Structured sparsity via optimal interpolation norms

    We study norms that can be used as penalties in machine learning problems. In particular, we consider norms that are defined by an optimal interpolation problem and whose additional structure can be used to encourage specific characteristics, such as sparsity, in the solution to a learning problem. We first study a norm that is defined as an infimum of quadratics parameterized over a convex set. We show that this formulation includes the k-support norm for sparse vector learning, and its Moreau envelope, the box-norm. These extend naturally to spectral regularizers for matrices, and we introduce the spectral k-support norm and spectral box-norm. We study their properties and we apply the penalties to low rank matrix and multitask learning problems. We next introduce two generalizations of the k-support norm. The first of these is the (k, p)-support norm. In the matrix setting, the additional parameter p allows us to better learn the curvature of the spectrum of the underlying solution. The second is an application to multilinear algebra: by considering the rank of its matricizations, we obtain a k-support norm that can be applied to learn a low rank tensor. For each of these norms we provide an optimization method to solve the underlying learning problem, and we present numerical experiments. Finally, we present a general framework for optimal interpolation norms. We focus on a specific formulation that involves an infimal convolution coupled with a linear operator, and which captures several of the penalties discussed in this thesis. We conclude with an algorithm to solve regularization problems with norms of this type, and numerical experiments to illustrate the method.
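    For concreteness, the sketch below numerically evaluates the infimum-of-quadratics formulation for the k-support norm, the first norm discussed above: $\|w\|^2 = \inf\{\sum_i w_i^2/\theta_i : 0 < \theta_i \le 1,\ \sum_i \theta_i \le k\}$; the box-norm replaces the bounds $[0,1]$ by a general interval $[a,b]$. The generic solver call is a plain numerical illustration of the definition, not an optimization method from the thesis, and the function name is our own.

```python
import numpy as np
from scipy.optimize import minimize

def theta_norm(w, k, eps=1e-9):
    """Numerically evaluate the infimum-of-quadratics (Theta-norm) form of the k-support norm:

        ||w||^2 = inf { sum_i w_i^2 / theta_i : eps <= theta_i <= 1, sum_i theta_i <= k }.

    A plain numerical sketch using a generic constrained solver.
    """
    w = np.asarray(w, dtype=float)
    d = len(w)
    obj = lambda theta: np.sum(w ** 2 / theta)
    cons = ({"type": "ineq", "fun": lambda theta: k - np.sum(theta)},)   # sum(theta) <= k
    res = minimize(obj, x0=np.full(d, k / d), bounds=[(eps, 1.0)] * d,
                   constraints=cons, method="SLSQP")
    return float(np.sqrt(res.fun))

# The infimum formulation interpolates between the l1 norm (k = 1) and the l2 norm (k = d).
w = np.array([3.0, -1.0, 0.5, 0.0])
print(theta_norm(w, 1))   # should approximate the l1 norm, 4.5
print(theta_norm(w, 4))   # should approximate the l2 norm, ~3.2016
```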