    SVAG: Stochastic Variance Adjusted Gradient Descent and Biased Stochastic Gradients

    We examine biased gradient updates in variance-reduced stochastic gradient methods. For this purpose we introduce SVAG, a SAG/SAGA-like method with adjustable bias. SVAG is analyzed under smoothness assumptions and we provide step-size conditions for convergence that match or improve on previously known conditions for SAG and SAGA. The analysis highlights a difference in step-size requirements between applying SVAG to cocoercive operators and applying it to gradients of smooth functions, a difference not present in ordinary gradient descent. This difference is verified with numerical experiments. A variant of SVAG that adaptively selects the bias is presented and compared numerically to SVAG on a set of classification problems. The adaptive SVAG frequently performs among the best and always improves on the worst-case performance of the non-adaptive variant.
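
    As a rough illustration of a SAG/SAGA-like update with adjustable bias, here is a minimal sketch in which an innovation weight theta interpolates between a SAGA-style update (theta = n) and a SAG-style update (theta = 1). The exact SVAG iteration, its step-size conditions, and the adaptive bias selection are given in the paper; the parameterization and the names below (svag, grad_i, theta) are illustrative assumptions, not the authors' code.

        import numpy as np

        def svag(grad_i, x0, n, theta, step, iters, rng=None):
            """SAG/SAGA-like loop with an adjustable bias weight theta.

            theta = n resembles a SAGA-style (unbiased) update, theta = 1 a
            SAG-style (biased) one; intermediate values interpolate between them.
            grad_i(x, i) should return the gradient of the i-th component function.
            """
            rng = rng or np.random.default_rng(0)
            x = x0.copy()
            table = np.zeros((n, x0.size))   # stored component gradients y_j
            mean = table.mean(axis=0)        # running average (1/n) sum_j y_j
            for _ in range(iters):
                i = rng.integers(n)
                g = grad_i(x, i)
                # gradient estimate: weighted innovation plus the stored average
                est = (theta / n) * (g - table[i]) + mean
                x -= step * est
                mean += (g - table[i]) / n   # keep the running average current
                table[i] = g
            return x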

    The Loss Rank Principle for Model Selection

    We introduce a new principle for model selection in regression and classification. Many regression models are controlled by some smoothness, flexibility, or complexity parameter c, e.g. the number of neighbors to be averaged over in k nearest neighbor (kNN) regression or the polynomial degree in polynomial regression. Let f_D^c be the (best) regressor of complexity c on data D. A more flexible regressor can fit more data sets D' well than a more rigid one. If something (here a small loss) is easy to achieve, it is typically worth less. We define the loss rank of f_D^c as the number of other (fictitious) data sets D' that are fitted better by f_{D'}^c than D is fitted by f_D^c. We suggest selecting the model complexity c that has minimal loss rank (LoRP). Unlike most penalized maximum likelihood variants (AIC, BIC, MDL), LoRP depends only on the regression function and the loss function. It works without a stochastic noise model and is directly applicable to any non-parametric regressor, like kNN. In this paper we formalize, discuss, and motivate LoRP, study it for specific regression problems, in particular linear ones, and compare it to other model selection schemes.
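
    The loss-rank count ranges over all fictitious data sets D', which for continuous targets the paper handles with a regularized, closed-form treatment (in particular for linear regressors). The following is only a minimal Monte Carlo illustration of the counting idea for kNN regression; sampling fictitious targets uniformly over the observed range, and the names knn_train_loss and loss_rank, are assumptions made for illustration, not the paper's construction.

        import numpy as np

        def knn_train_loss(x, y, k):
            """Squared training loss of kNN regression (1-d inputs x) on its own data."""
            d = np.abs(x[:, None] - x[None, :])      # pairwise distances
            nn = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours (incl. self)
            yhat = y[nn].mean(axis=1)
            return np.mean((y - yhat) ** 2)

        def loss_rank(x, y, k, n_fake=2000, rng=None):
            """Monte Carlo proxy for the loss rank of the kNN regressor of
            complexity k: the fraction of fictitious target vectors y' that the
            same regressor class fits with smaller loss than the observed y."""
            rng = rng or np.random.default_rng(0)
            base = knn_train_loss(x, y, k)
            lo, hi = y.min(), y.max()
            hits = 0
            for _ in range(n_fake):
                y_fake = rng.uniform(lo, hi, size=y.shape)   # fictitious data D'
                if knn_train_loss(x, y_fake, k) < base:
                    hits += 1
            return hits / n_fake

        # LoRP: pick the complexity with minimal loss rank, e.g.
        # best_k = min(range(1, 11), key=lambda k: loss_rank(x, y, k))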

    Fast Nonsmooth Regularized Risk Minimization with Continuation

    In regularized risk minimization, the associated optimization problem becomes particularly difficult when both the loss and the regularizer are nonsmooth. Existing approaches either have slow or unclear convergence properties, are restricted to limited problem subclasses, or require careful setting of a smoothing parameter. In this paper, we propose a continuation algorithm that is applicable to a large class of nonsmooth regularized risk minimization problems, can be flexibly used with a number of existing solvers for the underlying smoothed subproblem, and comes with convergence results for the whole algorithm rather than just one of its subproblems. In particular, when accelerated solvers are used, the proposed algorithm achieves the fastest known rates of O(1/T^2) on strongly convex problems and O(1/T) on general convex problems. Experiments on nonsmooth classification and regression tasks demonstrate that the proposed algorithm outperforms the state-of-the-art.
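
    The paper's algorithm pairs continuation with existing (in particular accelerated) solvers for the smoothed subproblems to obtain the stated rates. The sketch below only illustrates the general continuation pattern under assumed choices: Moreau/Huber smoothing of an absolute-value loss and an l1 regularizer, plain gradient descent on each subproblem, and a geometric decrease of the smoothing parameter. None of these choices, nor the names continuation and huber_grad, come from the paper.

        import numpy as np

        def huber_grad(z, mu):
            """Gradient of the Huber (Moreau-envelope) smoothing of |z| with parameter mu."""
            return np.clip(z / mu, -1.0, 1.0)

        def continuation(X, y, lam, mu0=1.0, rho=0.5, stages=8, inner_iters=200):
            """Continuation sketch: smooth both the absolute-value loss and the
            l1 regularizer with parameter mu, run gradient descent on the smoothed
            subproblem, then shrink mu and warm-start the next stage."""
            n, d = X.shape
            w = np.zeros(d)
            mu = mu0
            for _ in range(stages):
                L = (X ** 2).sum() / (n * mu) + lam / mu   # crude Lipschitz bound
                step = 1.0 / L
                for _ in range(inner_iters):
                    r = X @ w - y
                    grad = X.T @ huber_grad(r, mu) / n + lam * huber_grad(w, mu)
                    w -= step * grad
                mu *= rho                                   # tighten the smoothing
            return w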

    Minimax lower bounds for function estimation on graphs

    We study minimax lower bounds for function estimation problems on large graphs when the target function is smoothly varying over the graph. We derive minimax rates in the context of regression and classification problems on graphs that satisfy an asymptotic shape assumption and under a smoothness condition on the target function, both formulated in terms of the graph Laplacian.
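
    The smoothness condition here is formulated in terms of the graph Laplacian. As a small, self-contained illustration (not taken from the paper), the sketch below builds the combinatorial Laplacian L = D - A and evaluates the quadratic form f^T L^r f, a standard measure of how smoothly f varies over the graph; the specific smoothness class used in the paper may differ.

        import numpy as np

        def graph_laplacian(A):
            """Combinatorial Laplacian L = D - A of an adjacency matrix A."""
            return np.diag(A.sum(axis=1)) - A

        def laplacian_smoothness(A, f, r=1):
            """Quadratic form f^T L^r f; for r = 1 this equals the sum over edges
            of (f_i - f_j)^2, i.e. how roughly f varies over the graph."""
            L = graph_laplacian(A)
            return f @ np.linalg.matrix_power(L, r) @ f

        # Path graph on 4 vertices: a constant function is maximally smooth.
        A = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
        print(laplacian_smoothness(A, np.array([1.0, 1.0, 1.0, 1.0])))  # 0.0
        print(laplacian_smoothness(A, np.array([0.0, 1.0, 0.0, 1.0])))  # 3.0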