SVAG: Stochastic Variance Adjusted Gradient Descent and Biased Stochastic Gradients
We examine biased gradient updates in variance reduced stochastic gradient
methods. For this purpose we introduce SVAG, a SAG/SAGA-like method with
adjustable bias. SVAG is analyzed under smoothness assumptions and we provide
step-size conditions for convergence that match or improve on previously known
conditions for SAG and SAGA. The analysis highlights a difference in step-size
requirements between applying SVAG to cocoercive operators and applying it to
gradients of smooth functions, a difference not present in ordinary gradient
descent. This difference is verified with numerical experiments. A
variant of SVAG that adaptively selects the bias is presented and compared
numerically to SVAG on a set of classification problems. The adaptive SVAG
frequently performs among the best and always improves on the worst-case
performance of the non-adaptive variant.
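To make the adjustable bias concrete, below is a minimal NumPy sketch of a SAG/SAGA-style iteration with a bias weight theta on the gradient innovation. The parametrization (theta = 1 recovering a SAGA-like unbiased update, theta = 1/n a SAG-like biased one) and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def svag_sketch(grad_i, n, x0, step, theta, iters, seed=0):
    """SAG/SAGA-style iteration with adjustable bias theta (illustrative).

    grad_i(x, i) returns the gradient of the i-th component function.
    theta = 1 gives a SAGA-like (unbiased) update, theta = 1/n a
    SAG-like (biased) one; intermediate values interpolate between them.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    table = np.stack([grad_i(x, i) for i in range(n)])  # stored gradients y_i
    avg = table.mean(axis=0)                            # running average of the y_i
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(x, i)
        # move along a (possibly biased) mix of innovation and stored average
        x -= step * (theta * (g - table[i]) + avg)
        avg += (g - table[i]) / n                       # keep the average in sync
        table[i] = g
    return x
```

For a finite-sum least-squares problem, for instance, grad_i(x, i) would return A[i] * (A[i] @ x - b[i]).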
The Loss Rank Principle for Model Selection
We introduce a new principle for model selection in regression and
classification. Many regression models are controlled by some smoothness or
flexibility or complexity parameter c, e.g. the number of neighbors to be
averaged over in k nearest neighbor (kNN) regression or the polynomial degree
in regression with polynomials. Let f_D^c be the (best) regressor of complexity
c on data D. A more flexible regressor can fit more data sets D' well than a
more rigid one. If something (here, small loss) is easy to achieve, it is typically
worth less. We define the loss rank of f_D^c as the number of other
(fictitious) data D' that are fitted better by f_D'^c than D is fitted by
f_D^c. We suggest selecting the model complexity c that has minimal loss rank
(LoRP). Unlike most penalized maximum likelihood variants (AIC, BIC, MDL), LoRP
depends only on the regression function and the loss function. It works without a
stochastic noise model, and is directly applicable to any non-parametric
regressor, like kNN. In this paper we formalize, discuss, and motivate LoRP,
study it for specific regression problems, in particular linear ones, and
compare it to other model selection schemes.
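As a hedged illustration of counting fictitious data sets, the sketch below estimates the loss rank of kNN regression by Monte Carlo, sampling random target vectors on fixed one-dimensional inputs. The uniform sampling scheme and all function names are assumptions made for exposition; the paper develops the loss rank more carefully (e.g. via log-volumes for continuous data).

```python
import numpy as np

def knn_fit_loss(y, k):
    """Empirical squared loss of kNN regression refitted on targets y.

    Inputs are assumed to sit at 1, ..., n on a line, so the k nearest
    neighbours of point i are simply its k closest indices (including i).
    """
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        idx = np.argsort(np.abs(np.arange(n) - i))[:k]
        preds[i] = y[idx].mean()
    return np.mean((preds - y) ** 2)

def loss_rank(y, k, n_fictitious=500, seed=0):
    """Monte Carlo loss rank: fraction of fictitious target vectors y'
    that the complexity-k regressor fits better than the observed y."""
    rng = np.random.default_rng(seed)
    ref = knn_fit_loss(y, k)
    samples = rng.uniform(y.min(), y.max(), size=(n_fictitious, len(y)))
    return np.mean([knn_fit_loss(s, k) < ref for s in samples])

# LoRP: pick the complexity (here, k) with minimal loss rank.
rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 6, 40)) + 0.3 * rng.normal(size=40)
best_k = min(range(1, 10), key=lambda k: loss_rank(y, k))
```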
Fast Nonsmooth Regularized Risk Minimization with Continuation
In regularized risk minimization, the associated optimization problem becomes
particularly difficult when both the loss and regularizer are nonsmooth.
Existing approaches either have slow or unclear convergence properties, are
restricted to limited problem subclasses, or require careful setting of a
smoothing parameter. In this paper, we propose a continuation algorithm that is
applicable to a large class of nonsmooth regularized risk minimization
problems, can be flexibly used with a number of existing solvers for the
underlying smoothed subproblem, and comes with convergence results for the
whole algorithm rather than just for one of its subproblems. In particular,
when accelerated solvers are used, the proposed algorithm achieves the fastest
known rates of O(1/T^2) on strongly convex problems and O(1/T) on general convex
problems. Experiments on nonsmooth classification and regression tasks
demonstrate that the proposed algorithm outperforms the state-of-the-art.
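A minimal sketch of the continuation idea, assuming Huber-type smoothings of both nonsmooth terms in a doubly nonsmooth example (hinge loss plus an l1 regularizer) and a plain gradient inner solver; the smoothing choice, the geometric schedule for mu, and all names are illustrative rather than the paper's algorithm.

```python
import numpy as np

def continuation_l1_hinge(X, y, lam, mu0=1.0, decay=0.5,
                          stages=8, inner=200, step=0.1):
    """Continuation for hinge loss + l1, both Huber-smoothed (illustrative).

    Each stage minimizes the mu-smoothed subproblem by gradient descent,
    warm-starts the next stage, and shrinks the smoothing parameter mu.
    An accelerated solver could be plugged into the inner loop instead.
    """
    n, d = X.shape
    w, mu = np.zeros(d), mu0
    for _ in range(stages):
        for _ in range(inner):
            margin = 1.0 - y * (X @ w)
            dh = np.clip(margin / mu, 0.0, 1.0)       # derivative of smoothed hinge
            g_loss = -(X * (y * dh)[:, None]).mean(axis=0)
            g_reg = lam * np.clip(w / mu, -1.0, 1.0)  # derivative of smoothed l1
            w -= step * (g_loss + g_reg)
        mu *= decay  # continuation: progressively tighter smoothing
    return w
```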
Minimax lower bounds for function estimation on graphs
We study minimax lower bounds for function estimation problems on large graphs
when the target function varies smoothly over the graph. We derive minimax
rates for regression and classification problems on graphs satisfying an
asymptotic shape assumption, under a smoothness condition on the target
function; both assumptions are formulated in terms of the graph Laplacian.
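For intuition, the smoothness functional most commonly associated with the graph Laplacian is the quadratic form f^T L f, which sums squared differences of f across edges; the small sketch below computes it on a path graph. The paper's exact smoothness class (e.g. involving powers of the Laplacian) may differ, so this is only an illustration.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A from an adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def laplacian_smoothness(adj, f):
    """Quadratic form f^T L f = sum over edges (i, j) of (f_i - f_j)^2."""
    return f @ laplacian(adj) @ f

# Path graph on 5 vertices: a slowly varying f has a small quadratic form.
A = np.zeros((5, 5))
idx = np.arange(4)
A[idx, idx + 1] = A[idx + 1, idx] = 1
f = np.linspace(0.0, 1.0, 5)
print(laplacian_smoothness(A, f))  # 0.25: small, since f varies smoothly
```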