Statistical inference using SGD
We present a novel method for frequentist statistical inference in
M-estimation problems, based on stochastic gradient descent (SGD) with a
fixed step size: we demonstrate that the average of such SGD sequences can be
used for statistical inference, after proper scaling. An intuitive analysis
using the Ornstein-Uhlenbeck process suggests that such averages are
asymptotically normal. From a practical perspective, our SGD-based inference
procedure is a first order method, and is well-suited for large scale problems.
To show its merits, we apply it to both synthetic and real datasets, and
demonstrate that its accuracy is comparable to classical statistical methods,
while requiring potentially far less computation.
Comment: To appear in AAAI 2018
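To make the averaging idea concrete, here is a minimal, self-contained sketch on least-squares linear regression: fixed-step SGD, a running average of the iterates, and approximate 95% confidence intervals from a plug-in sandwich covariance. The step size, iteration count, and the plug-in covariance are illustrative assumptions, not the paper's exact scaling or estimator.

```python
# Minimal sketch: fixed-step SGD averaging for inference on least squares.
# Step size, iteration count, and the plug-in covariance are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.normal(size=(n, d))
theta_star = np.arange(1.0, d + 1)
y = X @ theta_star + rng.normal(size=n)

eta, T = 0.05, 50_000                        # fixed step size, iteration count
theta, avg = np.zeros(d), np.zeros(d)
for t in range(T):
    i = rng.integers(n)                      # one uniformly sampled observation
    grad = (X[i] @ theta - y[i]) * X[i]      # stochastic gradient of squared loss
    theta -= eta * grad
    avg += (theta - avg) / (t + 1)           # running average of the SGD path

# The averaged iterate is approximately normal around the true parameter;
# a plug-in sandwich covariance yields approximate 95% confidence intervals.
H = X.T @ X / n                              # empirical Hessian
r = y - X @ avg                              # residuals at the averaged iterate
S = (X * r[:, None]).T @ (X * r[:, None]) / n
cov = np.linalg.inv(H) @ S @ np.linalg.inv(H) / n
se = np.sqrt(np.diag(cov))
print("estimate:", np.round(avg, 3))
print("95% CI half-widths:", np.round(1.96 * se, 3))
```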
Online Bootstrap Inference with Nonconvex Stochastic Gradient Descent Estimator
In this paper, we investigate the theoretical properties of stochastic
gradient descent (SGD) for statistical inference in the context of nonconvex
optimization problems, which have been relatively unexplored compared to convex
settings. Our study is the first to establish provable inferential procedures
using the SGD estimator for general nonconvex objective functions, which may
contain multiple local minima.
We propose two novel online inferential procedures that combine SGD and the
multiplier bootstrap technique. The first procedure employs a consistent
covariance matrix estimator, and we establish its error convergence rate. The
second procedure approximates the limit distribution using bootstrap SGD
estimators, yielding asymptotically valid bootstrap confidence intervals. We
validate the effectiveness of both approaches through numerical experiments.
Furthermore, our analysis yields an intermediate result: the in-expectation
error convergence rate for the original SGD estimator in nonconvex settings,
which is comparable to existing results for convex problems. We believe this
novel finding holds independent interest and enriches the literature on
optimization and statistical inference.
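To illustrate the multiplier-bootstrap construction, the sketch below reruns averaged SGD with random mean-one multiplier weights on the per-sample gradients and reads confidence intervals off the bootstrap quantiles. It uses a convex logistic-regression toy for brevity (the paper's setting is nonconvex), and the exponential(1) weights, decaying step size, and all constants are illustrative assumptions rather than the paper's exact procedures.

```python
# Minimal multiplier-bootstrap SGD sketch on a logistic-regression toy.
# Weight distribution, step-size schedule, and constants are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d, B = 5_000, 3, 50
X = rng.normal(size=(n, d))
theta_star = np.array([1.0, -0.5, 0.25])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ theta_star))).astype(float)

def averaged_sgd(w, T=10_000, eta=0.5):
    """Averaged SGD on the logistic loss, gradients scaled by weights w."""
    theta, avg = np.zeros(d), np.zeros(d)
    for t in range(T):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-X[i] @ theta))
        theta -= eta / np.sqrt(t + 1) * w[i] * (p - y[i]) * X[i]
        avg += (theta - avg) / (t + 1)
    return avg

point = averaged_sgd(np.ones(n))                        # unperturbed estimator
boot = np.stack([averaged_sgd(rng.exponential(size=n))  # mean-1 multipliers
                 for _ in range(B)])
lo, hi = np.quantile(boot, [0.025, 0.975], axis=0)
print("point estimate  :", np.round(point, 3))
print("bootstrap 95% CI:", np.round(lo, 3), "to", np.round(hi, 3))
```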
Statistical Inference with Stochastic Gradient Methods under $\phi$-mixing Data
Stochastic gradient descent (SGD) is a scalable and memory-efficient
optimization algorithm for large datasets and streaming data, and it has
attracted a great deal of attention. Applications of SGD-based estimators to
statistical inference, such as interval estimation, have also achieved great
success. However, most related work assumes i.i.d. observations or Markov
chains; when the observations come from a mixing time series, how to conduct
valid statistical inference remains unexplored. Indeed, the correlation among
observations poses a challenge for interval estimation: most existing methods
ignore this correlation and can produce invalid confidence intervals. In this
paper, we propose a mini-batch SGD estimator for statistical inference when
the data is $\phi$-mixing. The
confidence intervals are constructed using an associated mini-batch bootstrap
SGD procedure. Using the ``independent block'' trick from \cite{yu1994rates}, we
show that the proposed estimator is asymptotically normal, and its limiting
distribution can be effectively approximated by the bootstrap procedure. The
proposed method is memory-efficient and easy to implement in practice.
Simulation studies on synthetic data and an application to a real-world dataset
confirm our theory.
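As a rough illustration of mini-batching on dependent data, the sketch below estimates a linear model with AR(1) errors (a simple mixing series) using mini-batches that are contiguous stretches of the sequence, so each batch gradient retains the local dependence structure. The data model, block length, and step size are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch: mini-batch SGD with contiguous blocks on dependent data.
# AR(1) errors, block length, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, d, block = 20_000, 4, 50
eps = np.zeros(n)
for t in range(1, n):                        # AR(1) noise -> mixing, not i.i.d.
    eps[t] = 0.6 * eps[t - 1] + rng.normal()
X = rng.normal(size=(n, d))
y = X @ np.ones(d) + eps

theta, avg = np.zeros(d), np.zeros(d)
for t in range(2_000):
    s = rng.integers(n - block)              # start of a contiguous block
    Xb, yb = X[s:s + block], y[s:s + block]
    grad = Xb.T @ (Xb @ theta - yb) / block  # mini-batch gradient on the block
    theta -= 0.05 * grad
    avg += (theta - avg) / (t + 1)
print("block mini-batch SGD estimate:", np.round(avg, 3))
```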
Stochastic gradient methods for statistical inference
Statistical inference, such as hypothesis testing and calculating a confidence interval, is an important tool for assessing uncertainty in machine learning and statistical problems. Stochastic gradient methods, such as stochastic gradient descent (SGD), have recently been applied successfully to point estimation in large scale machine learning problems. In this work, we present novel stochastic gradient methods for statistical inference in large scale machine learning problems.

Unregularized M-estimation using SGD. Using SGD with a fixed step size, we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.

Approximate Newton-based statistical inference using only stochastic gradients for unregularized M-estimation. We present a novel inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and allows the approximation of a Hessian-vector product from first-order information alone, as sketched below. In theory, our method efficiently computes the statistical error covariance in M-estimation for unregularized convex learning problems, without using exact second order information or resampling the entire data set. In practice, we demonstrate the effectiveness of our framework on large-scale machine learning problems that go even beyond convexity: as a highlight, our work can be used to detect certain adversarial attacks on neural networks.

High dimensional linear regression statistical inference using only stochastic gradients. As an extension of the approximate Newton-based statistical inference algorithm for unregularized problems, we present a similar algorithm, using only stochastic gradients, for statistical inference in high dimensional linear regression, where the number of features is much larger than the number of samples.

Stochastic gradient methods for time series analysis. We present a novel stochastic gradient descent algorithm for time series analysis, which correctly captures correlation structures in a time series dataset during optimization. Instead of uniformly sampling indices as in vanilla SGD, we uniformly sample contiguous blocks of indices, where the block length depends on the dataset.
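The finite-difference step behind the approximate Newton procedure can be sketched in a few lines: a Hessian-vector product is approximated from two stochastic gradient evaluations, so no second-order information is ever formed. The least-squares test problem and the perturbation size delta below are illustrative assumptions.

```python
# Minimal sketch: Hessian-vector product from first-order information only.
# The least-squares test problem and delta are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n, d = 2_000, 4
X = rng.normal(size=(n, d))
y = X @ np.ones(d) + rng.normal(size=n)

def stoch_grad(theta, idx):
    """Mini-batch gradient of the least-squares loss."""
    return X[idx].T @ (X[idx] @ theta - y[idx]) / len(idx)

theta = rng.normal(size=d)                   # current iterate
v = rng.normal(size=d)                       # direction to multiply by
idx = rng.integers(n, size=256)              # one mini-batch
delta = 1e-4
# Two gradient calls approximate the Hessian-vector product H @ v:
hv = (stoch_grad(theta + delta * v, idx) - stoch_grad(theta, idx)) / delta
exact = X[idx].T @ (X[idx] @ v) / len(idx)   # exact mini-batch Hessian times v
print("finite-difference Hv:", np.round(hv, 4))
print("exact Hv            :", np.round(exact, 4))
```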