Learning with SGD and Random Features
Sketching and stochastic gradient methods are arguably the most common
techniques to derive efficient large-scale learning algorithms. In this paper,
we investigate their application in the context of nonparametric statistical
learning. More precisely, we study the estimator defined by stochastic gradient
with mini-batches and random features. The latter can be seen as a form of
nonlinear sketching and used to define approximate kernel methods. The
considered estimator is not explicitly penalized/constrained and regularization
is implicit. Indeed, our study highlights how different parameters, such as the
number of features, iterations, step size, and mini-batch size, control the
learning properties of the solutions. We do this by deriving optimal finite
sample bounds, under standard assumptions. The obtained results are
corroborated and illustrated by numerical experiments.
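As a rough illustration of the kind of estimator studied here, the following sketch trains an unpenalized random-features model with mini-batch SGD on synthetic data. The bandwidth, feature count, step size, batch size, and epoch count are illustrative choices, not the paper's tuned values; regularization comes only from those knobs, as the abstract describes.

```python
import numpy as np

def rff(X, W, b):
    """Random Fourier features approximating the Gaussian (RBF) kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
n, d, m = 2000, 5, 300                   # samples, input dim, random features
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

W = rng.standard_normal((d, m))          # unit bandwidth, illustrative choice
b = rng.uniform(0.0, 2.0 * np.pi, m)
Z = rff(X, W, b)                         # n x m nonlinear sketch of the data

theta = np.zeros(m)
step_size, batch, epochs = 1.0, 32, 50   # the knobs that regularize implicitly
for _ in range(epochs):
    for idx in np.array_split(rng.permutation(n), n // batch):
        residual = Z[idx] @ theta - y[idx]
        theta -= step_size * Z[idx].T @ residual / len(idx)  # no explicit penalty

print("train MSE:", np.mean((Z @ theta - y) ** 2))
```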
Practical sketching algorithms for low-rank matrix approximation
This paper describes a suite of algorithms for constructing low-rank
approximations of an input matrix from a random linear image of the matrix,
called a sketch. These methods can preserve structural properties of the input
matrix, such as positive-semidefiniteness, and they can produce approximations
with a user-specified rank. The algorithms are simple, accurate, numerically
stable, and provably correct. Moreover, each method is accompanied by an
informative error bound that allows users to select parameters a priori to
achieve a given approximation quality. These claims are supported by numerical
experiments with real and synthetic data.
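The basic one-sided variant of this idea is simple to state: sketch the range of the matrix with a random test matrix, orthonormalize, and project. The sketch below follows that recipe with a Gaussian test matrix and a user-specified target rank; the paper's suite additionally covers single-pass and structure-preserving (e.g., PSD) variants not shown here, and this version revisits the input once to form the projection.

```python
import numpy as np

def sketch_low_rank(A, rank, oversample=10, rng=None):
    """One-sided randomized low-rank approximation: capture the range of A
    from a random linear image (sketch) Y = A @ Omega, project A onto that
    range, then truncate to the requested rank via a small SVD."""
    rng = rng or np.random.default_rng()
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Y = A @ Omega                         # the sketch
    Q, _ = np.linalg.qr(Y)                # orthonormal basis for the range
    B = Q.T @ A                           # small projected matrix
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U[:, :rank], s[:rank], Vt[:rank]   # A ~= (QU) diag(s) Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 80)) @ rng.standard_normal((80, 400))  # rank 80
U, s, Vt = sketch_low_rank(A, rank=80, rng=rng)
err = np.linalg.norm(A - U @ (s[:, None] * Vt)) / np.linalg.norm(A)
print(f"relative error: {err:.2e}")
```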
Sublinear Time Numerical Linear Algebra for Structured Matrices
We show how to solve a number of problems in numerical linear algebra, such
as least squares regression, $\ell_p$-regression for any $p \geq 1$, low rank
approximation, and kernel regression, in time $T(A) \cdot \poly(\log(nd))$,
where for a given input matrix $A \in \mathbb{R}^{n \times d}$, $T(A)$ is the
time needed to compute $Ay$ for an arbitrary vector $y \in \mathbb{R}^d$. Since
$T(A) \leq O(\nnz(A))$, where $\nnz(A)$ denotes the number of non-zero entries
of $A$, the time is no worse, up to polylogarithmic factors, than all of the
recent advances for such problems that run in input-sparsity time. However, for
many applications, $T(A)$ can be much smaller than $\nnz(A)$, yielding
significantly sublinear time algorithms. For example, in the overconstrained
$(1+\epsilon)$-approximate polynomial interpolation problem, $A$ is a
Vandermonde matrix and $T(A) = O(n \log n)$; in this case our running time is
$n \cdot \poly(\log n) + \poly(d/\epsilon)$ and we recover the results of
\cite{avron2013sketching} as a special case. For overconstrained
autoregression, which is a common problem arising in dynamical systems,
$T(A) = O(n \log n)$, and we immediately obtain $n \cdot \poly(\log n) +
\poly(d/\epsilon)$ time. For kernel autoregression, we significantly improve
the running time of prior algorithms for general kernels. For the important
case of autoregression with the polynomial kernel and arbitrary target vector
$b$, we obtain even faster algorithms. Our algorithms show that,
perhaps surprisingly, most of these optimization problems do not require much
more time than that of a polylogarithmic number of matrix-vector
multiplications.
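To make the access model concrete, here is a minimal sketch-and-solve least-squares routine that touches $A$ only through matrix-vector products, the only operation the abstract assumes. A dense Gaussian sketch is used for simplicity, so the number of products is $O(d/\epsilon)$ rather than the polylogarithmic counts the paper achieves with structured transforms; the Vandermonde example below uses plain dense products where a fast $O(n \log n)$ matvec would apply.

```python
import numpy as np

def sketched_lstsq(matvec_AT, b, n, d, eps=0.25, rng=None):
    """Sketch-and-solve least squares, accessing A only via products with A^T:
    row i of the sketch S @ A is A^T applied to row i of S. A Gaussian sketch
    stands in here for the paper's structured (FFT-based) transforms."""
    rng = rng or np.random.default_rng()
    m = int(np.ceil(4 * d / eps))                   # illustrative sketch size
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    SA = np.stack([matvec_AT(S[i]) for i in range(m)])  # m products with A^T
    x, *_ = np.linalg.lstsq(SA, S @ b, rcond=None)
    return x

# Overconstrained polynomial interpolation: A is a Vandermonde matrix.
rng = np.random.default_rng(0)
n, d = 4000, 8
A = np.vander(np.linspace(-1, 1, n), d, increasing=True)
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

x_hat = sketched_lstsq(lambda v: A.T @ v, b, n, d, rng=rng)
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
print("residual ratio:",
      np.linalg.norm(A @ x_hat - b) / np.linalg.norm(A @ x_opt - b))
```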
Network Sketching: Exploiting Binary Structure in Deep CNNs
Convolutional neural networks (CNNs) with deep architectures have
substantially advanced the state-of-the-art in computer vision tasks. However,
deep networks are typically resource-intensive and thus difficult to deploy on
mobile devices. Recently, CNNs with binary weights have shown compelling
efficiency, but the accuracy of such models is usually unsatisfactory in
practice. In this paper, we introduce network sketching as a novel technique
for pursuing binary-weight CNNs, targeting more faithful inference and a better
trade-off for practical applications. Our basic idea is to exploit binary
structure directly in pre-trained filter banks and produce binary-weight models
via tensor expansion. The whole process can be treated as a coarse-to-fine
model approximation, akin to the pencil drawing steps of outlining and shading.
To further speed up the generated models, namely the sketches, we also propose
an associative implementation of binary tensor convolutions. Experimental
results demonstrate that a proper sketch of AlexNet (or ResNet) outperforms the
existing binary-weight models by large margins on the ImageNet large-scale
classification task, while requiring only slightly more memory for network
parameters.
Comment: To appear in CVPR201
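A minimal sketch of the coarse-to-fine idea, assuming the simple greedy expansion: each binary tensor is fitted to the residual left by the previous terms, with a closed-form scale. The filter shape and term counts below are illustrative; the paper's refined variants (e.g., refitting all scales jointly) and the associative convolution speedup are not shown.

```python
import numpy as np

def binary_sketch(W, num_terms):
    """Greedily expand a real-valued filter tensor W as a sum of scaled binary
    tensors, W ~= sum_j a_j * B_j with B_j in {-1, +1}: each term fits the
    residual left by the previous ones (outlining, then shading)."""
    R = W.copy()
    scales, binaries = [], []
    for _ in range(num_terms):
        B = np.where(R >= 0, 1.0, -1.0)   # binary tensor closest to residual
        a = np.abs(R).mean()              # least-squares scale: <B,R> / <B,B>
        scales.append(a)
        binaries.append(B)
        R -= a * B                        # refine on what is still unexplained
    return np.array(scales), np.stack(binaries)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 3, 3, 3))    # e.g., a pre-trained conv filter bank
for m in (1, 2, 3, 4):
    a, B = binary_sketch(W, m)
    approx = np.tensordot(a, B, axes=1)   # sum_j a_j * B_j
    err = np.linalg.norm(W - approx) / np.linalg.norm(W)
    print(f"{m} binary terms -> relative error {err:.3f}")
```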
FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning
Recent Newton-type federated learning algorithms have demonstrated linear
convergence with respect to the communication rounds. However, communicating
Hessian matrices is often infeasible due to their quadratic communication
complexity. In this paper, we introduce a novel approach to tackle this issue
while still achieving fast convergence rates. Our proposed method, named
Federated Newton Sketch (FedNS), approximates the centralized Newton's method
by communicating the sketched square-root Hessian instead of the exact Hessian.
To enhance communication efficiency, we reduce the sketch size to match the
effective dimension of the Hessian matrix. We provide a convergence analysis
based on statistical learning for the federated Newton sketch approaches.
Specifically, our approaches reach super-linear convergence rates with respect
to the communication rounds for the first time. We validate the effectiveness
of our algorithms through various experiments, which corroborate our
theoretical findings.
Comment: Accepted at AAAI 202
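As a rough, hypothetical rendering of the communication pattern (not the authors' implementation), the following simulates federated Newton steps for ridge-regularized least squares: each client uploads its local gradient and an $m \times d$ Gaussian sketch of its square-root Hessian, and the server assembles an approximate Hessian from the sketches. The sketch size `m` stands in for the effective dimension.

```python
import numpy as np

def client_message(X, y, w, m, lam, rng):
    """One client's upload: the local gradient plus a sketched square-root
    Hessian. For ridge-regularized least squares the square-root Hessian is X
    itself, so an m x d sketch S @ X is sent instead of a d x d exact Hessian."""
    n = X.shape[0]
    grad = X.T @ (X @ w - y) / n + lam * w
    S = rng.standard_normal((m, n)) / np.sqrt(m * n)  # Gaussian sketch (assumed)
    return grad, S @ X

rng = np.random.default_rng(0)
d, m, lam = 20, 60, 1e-2        # sketch size m plays the role of the eff. dim.
w_true = rng.standard_normal(d)
clients = []
for _ in range(4):
    X = rng.standard_normal((500, d))
    clients.append((X, X @ w_true + 0.1 * rng.standard_normal(500)))

w = np.zeros(d)
for _ in range(10):
    msgs = [client_message(X, y, w, m, lam, rng) for X, y in clients]
    grad = np.mean([g for g, _ in msgs], axis=0)
    # Server: rebuild an approximate Hessian from the sketched square roots
    H = np.mean([SX.T @ SX for _, SX in msgs], axis=0) + lam * np.eye(d)
    w -= np.linalg.solve(H, grad)          # approximate Newton step
print("distance to w_true:", np.linalg.norm(w - w_true))
```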
Training (Overparametrized) Neural Networks in Near-Linear Time
The slow convergence rate and pathological curvature issues of first-order
gradient methods for training deep neural networks, initiated an ongoing effort
for developing faster second-order optimization algorithms beyond SGD, without
compromising the generalization error. Despite their remarkable convergence
rate (independent of the training batch size $n$), second-order algorithms
incur a daunting slowdown in the cost per iteration (inverting the Hessian
matrix of the loss function), which renders them impractical. Very recently,
this computational overhead was mitigated by the works of [ZMG19, CGH+19],
yielding an $O(mn^2)$-time second-order algorithm for training two-layer
overparametrized neural networks of polynomial width $m$.
We show how to speed up the algorithm of [CGH+19], achieving an
$\widetilde{O}(mn)$-time backpropagation algorithm for training (mildly
overparametrized) ReLU networks, which is near-linear in the dimension ($mn$)
of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to
reformulate the Gauss-Newton iteration as an $\ell_2$-regression problem, and
then use a Fast-JL type dimension reduction to precondition the underlying Gram
matrix in time independent of the condition number, allowing to find a
sufficiently good approximate solution via first-order conjugate gradient. Our
result provides a proof-of-concept that advanced machinery from randomized
linear algebra -- which led to recent breakthroughs in convex optimization
(ERM, LPs, Regression) -- can be carried over to the realm of deep learning as
well.
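The sketch-to-precondition pipeline at the heart of this result is classical enough to demonstrate in a few lines: QR-factor a sketched copy of the matrix once, then run conjugate gradient on the preconditioned system, whose condition number is $O(1)$. In the sketch below, a Gaussian sketch stands in for the Fast-JL transform and a plain least-squares problem stands in for the Gauss-Newton regression step.

```python
import numpy as np
from scipy.linalg import solve_triangular

def sketch_precond_lstsq(A, b, iters=25, rng=None):
    """Least squares as l2-regression solved by sketch-preconditioned CG:
    QR-factor S @ A once, so that M = A @ R^{-1} is nearly orthonormal and CG
    on M^T M z = M^T b converges in a few iterations; return x = R^{-1} z."""
    rng = rng or np.random.default_rng()
    n, d = A.shape
    S = rng.standard_normal((4 * d, n)) / np.sqrt(4 * d)
    _, R = np.linalg.qr(S @ A)                     # sketch-based preconditioner

    M = lambda z: A @ solve_triangular(R, z)               # M = A R^{-1}
    Mt = lambda v: solve_triangular(R, A.T @ v, trans='T') # M^T = R^{-T} A^T

    z = np.zeros(d)
    r = Mt(b)                                      # residual of normal equations
    p, rs = r.copy(), r @ r
    for _ in range(iters):
        Mp = Mt(M(p))
        alpha = rs / (p @ Mp)
        z += alpha * p
        r -= alpha * Mp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return solve_triangular(R, z)

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 50))
b = rng.standard_normal(5000)
x = sketch_precond_lstsq(A, b, rng=rng)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print("relative error:", np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref))
```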