Asymptotic normality for weighted sums of linear processes
We establish asymptotic normality of weighted sums of linear processes with general triangular-array weights when the innovations in the linear process are martingale differences. The results are obtained under minimal conditions on the weights and innovations. We also obtain weak convergence of weighted partial-sum processes. The results are applicable to linear processes that have short or long memory or exhibit seasonal long-memory behavior. In particular, they are applicable to GARCH and ARCH(∞) models and to their squares. They are also useful in deriving asymptotic normality of kernel-type estimators of a nonparametric regression function with short- or long-memory moving-average errors.
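The setting can be illustrated with a small Monte Carlo sketch. The AR(1) process, the sinusoidal weight array, and all numeric choices below are illustrative assumptions, not taken from the paper; i.i.d. Gaussian innovations are used as a special case of martingale differences:

```python
import numpy as np

rng = np.random.default_rng(0)

def standardized_weighted_sums(n=400, rho=0.5, reps=2000):
    """Monte Carlo check of asymptotic normality for weighted sums of a
    linear process. The AR(1) process and the sinusoidal weights are
    illustrative choices; the i.i.d. innovations are a special case of
    martingale differences."""
    w = np.sin(np.pi * np.arange(1, n + 1) / n)   # example triangular-array weights
    sums = np.empty(reps)
    for r in range(reps):
        eps = rng.standard_normal(n)
        x = np.empty(n)
        x[0] = eps[0]
        for t in range(1, n):
            x[t] = rho * x[t - 1] + eps[t]        # AR(1) linear process
        sums[r] = w @ x
    return sums / sums.std()                      # standardize empirically

z = standardized_weighted_sums()
# Roughly 95% of the standardized sums should fall within +/-1.96:
print(round(float(np.mean(np.abs(z) < 1.96)), 2))
```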
Design Issues for Generalized Linear Models: A Review
Generalized linear models (GLMs) have been used quite effectively in the
modeling of a mean response under nonstandard conditions, where discrete as
well as continuous data distributions can be accommodated. The choice of design
for a GLM is a very important task in the development and building of an
adequate model. However, one major problem that handicaps the construction of a
GLM design is its dependence on the unknown parameters of the fitted model.
Several approaches have been proposed in the past 25 years to solve this
problem. These approaches, however, have provided only partial solutions that
apply in only some special cases, and the problem, in general, remains largely
unresolved. The purpose of this article is to focus attention on the
aforementioned dependence problem. We provide a survey of various existing
techniques dealing with the dependence problem. This survey includes
discussions concerning locally optimal designs, sequential designs, Bayesian
designs and the quantile dispersion graph approach for comparing designs for
GLMs.

Comment: Published at http://dx.doi.org/10.1214/088342306000000105 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
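The dependence problem is easy to see concretely: for a logistic model, the Fisher information matrix, and hence any optimality criterion built on it, is a function of the unknown parameters the design is supposed to help estimate. The sketch below (the model, design points, and parameter guesses are illustrative assumptions) shows the D-optimality criterion det(I) changing with the parameter guess:

```python
import numpy as np

def logistic_information(design_x, beta):
    """Fisher information matrix for a simple logistic GLM
    E[y|x] = 1/(1 + exp(-(b0 + b1*x))), evaluated at the design points.
    It depends on the unknown beta -- the dependence problem."""
    X = np.column_stack([np.ones_like(design_x), design_x])
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    W = p * (1.0 - p)                  # GLM iterative weights
    return X.T @ (W[:, None] * X)

design = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
# The D-optimality criterion det(I) changes with the parameter guess:
for beta_guess in ([0.0, 1.0], [2.0, 1.0]):
    I = logistic_information(design, np.array(beta_guess))
    print(beta_guess, round(float(np.linalg.det(I)), 4))
```

A locally optimal design fixes a best guess for beta and optimizes the criterion at that guess; the sequential and Bayesian approaches surveyed in the article are responses to the fragility of that guess.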
Bias-Variance Tradeoff in a Sliding Window Implementation of the Stochastic Gradient Algorithm
This paper provides a framework to analyze stochastic gradient algorithms in
a mean squared error (MSE) sense using the asymptotic normality result of the
stochastic gradient descent (SGD) iterates. We perform this analysis by taking
the asymptotic normality result and applying it to the finite iteration case.
Specifically, we look at problems where the gradient estimators are biased and
have reduced variance and compare the iterates generated by these gradient
estimators to the iterates generated by the SGD algorithm. We use the work of
Fabian to characterize the mean and the variance of the distribution of the
iterates in terms of the bias and the covariance matrix of the gradient
estimators. We introduce the sliding window SGD (SW-SGD) algorithm, with its
proof of convergence, which incurs a lower MSE than the SGD algorithm on
quadratic and convex problems. Lastly, we present some numerical results to
show the effectiveness of this framework and the superiority of the SW-SGD
algorithm over the SGD algorithm.
Point estimation, stochastic approximation, and robust Kalman filtering
Caption title. Includes bibliographical references (p. 23-25). Supported by the U.S. Air Force Office of Scientific Research (AFOSR-85-0227, AFOSR-89-0276). Sanjoy K. Mitter and Irvin C. Schick.
Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions
We consider the least-squares regression problem and provide a detailed
asymptotic analysis of the performance of averaged constant-step-size
stochastic gradient descent (a.k.a. least-mean-squares). In the strongly-convex
case, we provide an asymptotic expansion up to explicit exponentially decaying
terms. Our analysis leads to new insights into stochastic approximation
algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the
generalization error may be divided into a variance term which decays as
O(1/n), independently of the step-size γ, and a bias term that decays as
O(1/(γ²n²)); (c) when allowing non-uniform sampling, the choice of a good
sampling density depends on whether the variance or bias terms dominate. In
particular, when the variance term dominates, optimal sampling densities do
not lead to much gain, while when the bias term dominates, we can choose
larger step-sizes that lead to significant improvements.
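The averaged constant-step-size recursion the abstract analyzes can be sketched as follows; the synthetic Gaussian design, noise level, and step size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def averaged_lms(n, gamma=0.05, d=5):
    """Constant-step-size LMS (SGD on least squares) with Polyak-Ruppert
    averaging on a synthetic problem y = x @ theta_star + noise. Returns
    the last iterate and the running average of all iterates."""
    theta_star = np.ones(d)
    theta = np.zeros(d)
    avg = np.zeros(d)
    for t in range(1, n + 1):
        x = rng.standard_normal(d)
        y = x @ theta_star + 0.5 * rng.standard_normal()
        theta -= gamma * (x @ theta - y) * x    # LMS / SGD step
        avg += (theta - avg) / t                # running average of iterates
    return theta, avg

last, avg = averaged_lms(20000)
err_last = float(np.linalg.norm(last - 1.0))
err_avg = float(np.linalg.norm(avg - 1.0))
print(round(err_last, 3), round(err_avg, 3))  # the average is far more accurate
```

The last iterate fluctuates in a stationary band whose width depends on γ, while the averaged iterate concentrates around the optimum, consistent with the O(1/n) variance term described above.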
Efficiency of the stochastic approximation method
The practical aspects of the stochastic approximation (SA) method are studied. Specifically, we investigate how efficiency depends on the coefficients that generate the step lengths in the optimization algorithm, as well as on the type and level of the corresponding noise. Efficiency is measured by the mean value of the objective function at the final estimates of the algorithm, over a specified number of replications. This paper provides suggestions on how to choose the aforementioned coefficients in order to achieve better performance of the stochastic approximation algorithm.
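As a concrete sketch of this coefficient sensitivity, the run below uses Robbins-Monro gains a_k = a/(k + A)^alpha on a simple quadratic; the test problem, noise level, and coefficient triples are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_final_objective(a, A, alpha, iters=2000, reps=100):
    """Robbins-Monro SA minimizing f(t) = 0.5*t**2 from noisy gradients,
    with step lengths a_k = a/(k + A)**alpha. Returns the mean objective
    value at the final estimate over `reps` replications, mirroring the
    efficiency measure described in the abstract."""
    finals = []
    for _ in range(reps):
        theta = 5.0
        for k in range(1, iters + 1):
            g = theta + rng.standard_normal()      # noisy gradient of f
            theta -= (a / (k + A) ** alpha) * g
        finals.append(0.5 * theta ** 2)
    return float(np.mean(finals))

# The coefficients (a, A, alpha) strongly affect efficiency:
for a, A, alpha in [(1.0, 0.0, 1.0), (0.1, 0.0, 1.0), (1.0, 50.0, 0.602)]:
    print((a, A, alpha), round(mean_final_objective(a, A, alpha), 4))
```

With a = 0.1 the 1/k gains shrink too fast and the initial error is never worked off, while a = 1.0 performs well, illustrating why the choice of these coefficients matters.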
Cyclic Stochastic Optimization: Generalizations, Convergence, and Applications in Multi-Agent Systems
Stochastic approximation (SA) is a powerful class of iterative algorithms for nonlinear root-finding that can be used for minimizing a loss function, L(θ), with respect to a parameter vector θ, when only noisy observations of L(θ) or its gradient are available (through the natural connection between root-finding and minimization); SA algorithms can be thought of as stochastic line search methods where the entire parameter vector is updated at each iteration. The cyclic approach to SA is a variant of SA procedures where θ is divided into multiple subvectors that are updated one at a time in a cyclic manner.
This dissertation focuses on studying the asymptotic properties of cyclic SA and of the generalized cyclic SA (GCSA) algorithm, a variant of cyclic SA where the subvector to update may be selected according to a random variable or according to a predetermined pattern, and where the noisy update direction can be based on the updates of any SA algorithm (e.g., stochastic gradient, Kiefer–Wolfowitz, or simultaneous perturbation SA). The convergence of GCSA, asymptotic normality of GCSA (related to rate of convergence), and efficiency of GCSA relative to its non-cyclic counterpart are investigated both analytically and numerically. Specifically, conditions are obtained for the convergence with probability one of the GCSA iterates and for the asymptotic normality of the normalized iterates of a special case of GCSA. Further, an analytic expression is given for the asymptotic relative efficiency (when efficiency is defined in terms of mean squared error) between a special case of GCSA and its non-cyclic counterpart. Finally, an application of the cyclic SA scheme to a multi-agent stochastic optimization problem is investigated.
This dissertation also contains two appendices. The first appendix generalizes Theorem 2.2 in Fabian (1968) (a seminal paper in the SA literature that derives general conditions for the asymptotic normality of SA procedures) to make the result more applicable to some modern applications of SA including (but not limited to) the GCSA algorithm, certain root-finding SA algorithms, and certain second-order SA algorithms. The second appendix considers the problem of determining the presence and location of a static object within an area of interest by combining information from multiple sensors using a maximum-likelihood-based approach
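A minimal sketch of the cyclic scheme on a toy quadratic; the blocks, gain sequence, and noise model are illustrative assumptions, and GCSA additionally allows randomly selected subvectors and other SA update directions:

```python
import numpy as np

rng = np.random.default_rng(4)

def cyclic_sa(noisy_grad, theta0, blocks, a=1.0, iters=4000):
    """Sketch of cyclic stochastic approximation: each iteration updates
    only one subvector (block of coordinates), cycling through the blocks
    with gains a_k = a/(k + 1) that decay once per completed cycle."""
    theta = theta0.copy()
    for t in range(iters):
        idx = blocks[t % len(blocks)]              # subvector for this iteration
        gain = a / (1 + t // len(blocks))          # decays once per full cycle
        theta[idx] -= gain * noisy_grad(theta)[idx]
    return theta

# Noisy gradient of L(theta) = 0.5*||theta - 1||^2
def noisy_grad(theta):
    return (theta - 1.0) + 0.3 * rng.standard_normal(theta.shape)

blocks = [np.arange(0, 2), np.arange(2, 4)]        # two subvectors of a 4-vector
theta = cyclic_sa(noisy_grad, np.zeros(4), blocks)
print(np.round(theta, 2))   # converges toward the minimizer (1, 1, 1, 1)
```

The asymptotic questions the dissertation answers concern exactly this kind of recursion: whether the cyclically updated iterates still converge, at what rate, and how their mean squared error compares with updating the full vector every iteration.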