    SVAG: Stochastic Variance Adjusted Gradient Descent and Biased Stochastic Gradients

    We examine biased gradient updates in variance-reduced stochastic gradient methods. For this purpose we introduce SVAG, a SAG/SAGA-like method with adjustable bias. SVAG is analyzed under smoothness assumptions and we provide step-size conditions for convergence that match or improve on previously known conditions for SAG and SAGA. The analysis highlights a difference in step-size requirements between applying SVAG to cocoercive operators and applying it to gradients of smooth functions, a difference not present in ordinary gradient descent. This difference is verified with numerical experiments. A variant of SVAG that adaptively selects the bias is presented and compared numerically to SVAG on a set of classification problems. The adaptive SVAG frequently performs among the best and always improves on the worst-case performance of the non-adaptive variant.
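
    As a rough illustration of a SAG/SAGA-like update with adjustable bias, here is a minimal sketch in which an innovation weight theta interpolates between a SAGA-style update (theta = n) and a SAG-style update (theta = 1). The exact SVAG iteration, its step-size conditions, and the adaptive bias selection are given in the paper; the parameterization and the names below (svag, grad_i, theta) are illustrative assumptions, not the authors' code.

        import numpy as np

        def svag(grad_i, x0, n, theta, step, iters, rng=None):
            """SAG/SAGA-like loop with an adjustable bias weight theta.

            theta = n resembles a SAGA-style (unbiased) update, theta = 1 a
            SAG-style (biased) one; intermediate values interpolate between them.
            grad_i(x, i) should return the gradient of the i-th component function.
            """
            rng = rng or np.random.default_rng(0)
            x = x0.copy()
            table = np.zeros((n, x0.size))   # stored component gradients y_j
            mean = table.mean(axis=0)        # running average (1/n) sum_j y_j
            for _ in range(iters):
                i = rng.integers(n)
                g = grad_i(x, i)
                # gradient estimate: weighted innovation plus the stored average
                est = (theta / n) * (g - table[i]) + mean
                x -= step * est
                mean += (g - table[i]) / n   # keep the running average current
                table[i] = g
            return x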

    The Loss Rank Principle for Model Selection

    We introduce a new principle for model selection in regression and classification. Many regression models are controlled by some smoothness, flexibility, or complexity parameter c, e.g. the number of neighbors to be averaged over in k nearest neighbor (kNN) regression or the polynomial degree in polynomial regression. Let f_D^c be the (best) regressor of complexity c on data D. A more flexible regressor can fit more data sets D' well than a more rigid one. If something (here a small loss) is easy to achieve, it is typically worth less. We define the loss rank of f_D^c as the number of other (fictitious) data sets D' that are fitted better by f_{D'}^c than D is fitted by f_D^c. We suggest selecting the model complexity c that has minimal loss rank (LoRP). Unlike most penalized maximum likelihood variants (AIC, BIC, MDL), LoRP depends only on the regression function and the loss function. It works without a stochastic noise model and is directly applicable to any non-parametric regressor, like kNN. In this paper we formalize, discuss, and motivate LoRP, study it for specific regression problems, in particular linear ones, and compare it to other model selection schemes.
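
    The loss-rank count ranges over all fictitious data sets D', which for continuous targets the paper handles with a regularized, closed-form treatment (in particular for linear regressors). The following is only a minimal Monte Carlo illustration of the counting idea for kNN regression; sampling fictitious targets uniformly over the observed range, and the names knn_train_loss and loss_rank, are assumptions made for illustration, not the paper's construction.

        import numpy as np

        def knn_train_loss(x, y, k):
            """Squared training loss of kNN regression (1-d inputs x) on its own data."""
            d = np.abs(x[:, None] - x[None, :])      # pairwise distances
            nn = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours (incl. self)
            yhat = y[nn].mean(axis=1)
            return np.mean((y - yhat) ** 2)

        def loss_rank(x, y, k, n_fake=2000, rng=None):
            """Monte Carlo proxy for the loss rank of the kNN regressor of
            complexity k: the fraction of fictitious target vectors y' that the
            same regressor class fits with smaller loss than the observed y."""
            rng = rng or np.random.default_rng(0)
            base = knn_train_loss(x, y, k)
            lo, hi = y.min(), y.max()
            hits = 0
            for _ in range(n_fake):
                y_fake = rng.uniform(lo, hi, size=y.shape)   # fictitious data D'
                if knn_train_loss(x, y_fake, k) < base:
                    hits += 1
            return hits / n_fake

        # LoRP: pick the complexity with minimal loss rank, e.g.
        # best_k = min(range(1, 11), key=lambda k: loss_rank(x, y, k))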

    Fast Nonsmooth Regularized Risk Minimization with Continuation

    In regularized risk minimization, the associated optimization problem becomes particularly difficult when both the loss and the regularizer are nonsmooth. Existing approaches either have slow or unclear convergence properties, are restricted to limited problem subclasses, or require careful setting of a smoothing parameter. In this paper, we propose a continuation algorithm that is applicable to a large class of nonsmooth regularized risk minimization problems, can be flexibly used with a number of existing solvers for the underlying smoothed subproblem, and comes with convergence results for the whole algorithm rather than just one of its subproblems. In particular, when accelerated solvers are used, the proposed algorithm achieves the fastest known rates of O(1/T^2) on strongly convex problems and O(1/T) on general convex problems. Experiments on nonsmooth classification and regression tasks demonstrate that the proposed algorithm outperforms the state-of-the-art.
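
    The paper's algorithm pairs continuation with existing (in particular accelerated) solvers for the smoothed subproblems to obtain the stated rates. The sketch below only illustrates the general continuation pattern under assumed choices: Moreau/Huber smoothing of an absolute-value loss and an l1 regularizer, plain gradient descent on each subproblem, and a geometric decrease of the smoothing parameter. None of these choices, nor the names continuation and huber_grad, come from the paper.

        import numpy as np

        def huber_grad(z, mu):
            """Gradient of the Huber (Moreau-envelope) smoothing of |z| with parameter mu."""
            return np.clip(z / mu, -1.0, 1.0)

        def continuation(X, y, lam, mu0=1.0, rho=0.5, stages=8, inner_iters=200):
            """Continuation sketch: smooth both the absolute-value loss and the
            l1 regularizer with parameter mu, run gradient descent on the smoothed
            subproblem, then shrink mu and warm-start the next stage."""
            n, d = X.shape
            w = np.zeros(d)
            mu = mu0
            for _ in range(stages):
                L = (X ** 2).sum() / (n * mu) + lam / mu   # crude Lipschitz bound
                step = 1.0 / L
                for _ in range(inner_iters):
                    r = X @ w - y
                    grad = X.T @ huber_grad(r, mu) / n + lam * huber_grad(w, mu)
                    w -= step * grad
                mu *= rho                                   # tighten the smoothing
            return w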

    Minimax lower bounds for function estimation on graphs

    We study minimax lower bounds for function estimation problems on large graphs when the target function is smoothly varying over the graph. We derive minimax rates in the context of regression and classification problems on graphs that satisfy an asymptotic shape assumption and under a smoothness condition on the target function, both formulated in terms of the graph Laplacian.
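
    The smoothness condition here is formulated in terms of the graph Laplacian. As a small, self-contained illustration (not taken from the paper), the sketch below builds the combinatorial Laplacian L = D - A and evaluates the quadratic form f^T L^r f, a standard measure of how smoothly f varies over the graph; the specific smoothness class used in the paper may differ.

        import numpy as np

        def graph_laplacian(A):
            """Combinatorial Laplacian L = D - A of an adjacency matrix A."""
            return np.diag(A.sum(axis=1)) - A

        def laplacian_smoothness(A, f, r=1):
            """Quadratic form f^T L^r f; for r = 1 this equals the sum over edges
            of (f_i - f_j)^2, i.e. how roughly f varies over the graph."""
            L = graph_laplacian(A)
            return f @ np.linalg.matrix_power(L, r) @ f

        # Path graph on 4 vertices: a constant function is maximally smooth.
        A = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
        print(laplacian_smoothness(A, np.array([1.0, 1.0, 1.0, 1.0])))  # 0.0
        print(laplacian_smoothness(A, np.array([0.0, 1.0, 0.0, 1.0])))  # 3.0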