Second-Order Stochastic Optimization for Machine Learning in Linear Time

Authors: Agarwal Naman; Bullins Brian; Hazan Elad
Publication date: 01/11/2017

First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored due to the high cost of computing the second-order information. In this paper we develop second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient-based methods, and in certain settings improve upon the overall running time over popular first-order methods. Furthermore, our algorithm has the desirable property of being implementable in time linear in the sparsity of the input data.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Stochastic gradients methods for statistical inference

Author: Li Tianyang, Ph. D.
Publication date: 10/07/2019

Statistical inference, such as hypothesis testing and calculating a confidence interval, is an important tool for assessing uncertainty in machine learning and statistical problems. Stochastic gradient methods, such as stochastic gradient descent (SGD), have recently been successfully applied to point estimation in large-scale machine learning problems. In this work, we present novel stochastic gradient methods for statistical inference in large-scale machine learning problems.
Unregularized M-estimation using SGD. Using SGD with a fixed step size, we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first-order method, and is well-suited for large-scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
Approximate Newton-based statistical inference using only stochastic gradients for unregularized M-estimation. We present a novel inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and allows the approximation of a Hessian-vector product from first-order information. In theory, our method efficiently computes the statistical error covariance in M-estimation for unregularized convex learning problems, without using exact second-order information, or resampling the entire data set. In practice, we demonstrate the effectiveness of our framework on large-scale machine learning problems that go even beyond convexity: as a highlight, our work can be used to detect certain adversarial attacks on neural networks.
High-dimensional linear regression statistical inference using only stochastic gradients. As an extension of the approximate Newton-based statistical inference algorithm for unregularized problems, we present a similar algorithm, using only stochastic gradients, for statistical inference in high-dimensional linear regression, where the number of features is much larger than the number of samples.
Stochastic gradient methods for time series analysis. We present a novel stochastic gradient descent algorithm for time series analysis, which correctly captures correlation structures in a time series dataset during optimization. Instead of uniformly sampling indices in vanilla SGD, we uniformly sample contiguous blocks of indices, where the block length depends on the dataset.
Computer Science

Texas ScholarWorks
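
The thesis abstract above mentions approximating a Hessian-vector product from first-order information by finite differences. A minimal sketch of that general idea follows. It is an illustration rather than the thesis's algorithm, and it makes some simplifying assumptions: the gradient is deterministic rather than stochastic, and the names `finite_difference_hvp`, `example_grad`, and the step `delta` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def finite_difference_hvp(
    grad: Callable[[Vector], Vector],
    theta: Vector,
    v: Vector,
    delta: float = 1e-5,
) -> Vector:
    """Approximate H(theta) @ v using two gradient evaluations.

    Uses the forward difference
        H(theta) v  ~=  (grad(theta + delta * v) - grad(theta)) / delta,
    so no second derivatives are ever formed.
    """
    shifted = [t + delta * vi for t, vi in zip(theta, v)]
    g_shifted = grad(shifted)
    g_base = grad(theta)
    return [(gs - gb) / delta for gs, gb in zip(g_shifted, g_base)]


# Hypothetical test problem: f(x, y) = x**2 + x*y + 2*y**2,
# whose Hessian is the constant matrix [[2, 1], [1, 4]].
def example_grad(theta: Vector) -> Vector:
    x, y = theta
    return [2.0 * x + y, x + 4.0 * y]


hv = finite_difference_hvp(example_grad, [0.3, -0.7], [1.0, 2.0])
print(hv)  # close to [4.0, 9.0], the exact product for this quadratic
```

For a quadratic objective the forward difference is exact up to floating-point rounding. For a general objective the error grows with `delta`, while a very small `delta` amplifies rounding noise, so the step size is a tuning choice.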

Optimization Methods for Inverse Problems

Authors: Cui Tiangang; Roosta-Khorasani Farbod; Ye Nan
Publication date: 30/11/2017

Optimization plays an important role in solving many inverse problems. Indeed, the task of inversion often either involves or is fully cast as a solution of an optimization problem. In this light, the mere non-linear, non-convex, and large-scale nature of many of these inversions gives rise to some very challenging optimization problems. The inverse problem community has long been developing various techniques for solving such optimization tasks. However, other, seemingly disjoint communities, such as that of machine learning, have developed, almost in parallel, interesting alternative methods which might have stayed under the radar of the inverse problem community. In this survey, we aim to change that. In doing so, we first discuss current state-of-the-art optimization methods widely used in inverse problems. We then survey recent related advances in addressing similar challenges in problems faced by the machine learning community, and discuss their potential advantages for solving inverse problems. By highlighting the similarities among the optimization challenges faced by the inverse problem and the machine learning communities, we hope that this survey can serve as a bridge in bringing together these two communities and encourage cross-fertilization of ideas.
Comment: 13 pages

arXiv.org e-Print Archive

University of Queensland eSpace

Stochastic Training of Neural Networks via Successive Convex Approximations

Authors: Di Lorenzo Paolo; Scardapane Simone
Publication date: 15/06/2017

This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, high-dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Unlike similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss), and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems, and on a large-scale dataset involving simulated physical data. The results show how the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.
Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning Systems

arXiv.org e-Print Archive

Archivio della ricerca - Università di Roma La Sapienza
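
The abstract above describes replacing a non-convex problem with a sequence of strongly convex approximations built from first-order information. The sketch below shows the simplest surrogate of that kind, a linearization plus a quadratic proximal term, on a one-dimensional toy loss. It is an assumption-laden illustration, not the surrogates proposed in the paper. The names `sca_minimize`, `toy_loss`, and `toy_grad` and the parameters `tau` and `gamma` are hypothetical.

```python
from typing import Callable


def sca_minimize(
    grad: Callable[[float], float],
    w0: float,
    tau: float = 10.0,
    gamma: float = 0.5,
    steps: int = 200,
) -> float:
    """Successive convex approximation with a linearized surrogate.

    At each iterate w the objective is replaced by the strongly convex model
        f(w) + grad(w) * (u - w) + (tau / 2) * (u - w) ** 2,
    which needs only first-order information and has a closed-form minimizer.
    """
    w = w0
    for _ in range(steps):
        # Minimizer of the surrogate built around the current iterate.
        w_hat = w - grad(w) / tau
        # Move a fraction gamma of the way toward the surrogate minimizer.
        w = w + gamma * (w_hat - w)
    return w


# Hypothetical non-convex toy loss with several stationary points.
def toy_loss(w: float) -> float:
    return w**4 - 3.0 * w**2 + w


def toy_grad(w: float) -> float:
    return 4.0 * w**3 - 6.0 * w + 1.0


w_final = sca_minimize(toy_grad, w0=1.5)
print(w_final, toy_loss(w_final))
```

With this particular surrogate the update reduces to a gradient step of size `gamma / tau`, so the iterate settles at a nearby stationary point rather than necessarily the global minimum. Richer surrogates that keep more of the problem's structure are where SCA methods differ from plain gradient descent.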