Asymptotic Bias of Stochastic Gradient Search
The asymptotic behavior of the stochastic gradient algorithm with a biased
gradient estimator is analyzed. Relying on arguments from dynamical systems
theory (chain recurrence) and differential geometry (the Yomdin theorem and the
Lojasiewicz inequality), tight bounds on the asymptotic bias of the
iterates generated by such an algorithm are derived. The obtained results hold
under mild conditions and cover a broad class of high-dimensional nonlinear
algorithms. Using these results, the asymptotic properties of the
policy-gradient (reinforcement) learning and adaptive population Monte Carlo
sampling are studied. Relying on the same results, the asymptotic behavior of
the recursive maximum split-likelihood estimation in hidden Markov models is
analyzed, too.
Comment: arXiv admin note: text overlap with arXiv:0907.102
Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias
We present in this paper a family of generalized simultaneous
perturbation-based gradient search (GSPGS) estimators that use noisy function
measurements. The number of function measurements required by each estimator is
guided by the desired level of accuracy. We first present in detail unbalanced
generalized simultaneous perturbation stochastic approximation (GSPSA)
estimators and later present the balanced versions (B-GSPSA) of these. We
extend this idea further and present the generalized smoothed functional (GSF)
and generalized random directions stochastic approximation (GRDSA) estimators,
respectively, as well as their balanced variants. We show that, within any
specified class, estimators that require more function measurements have
lower estimator bias. We present a detailed analysis of both the
asymptotic and non-asymptotic convergence of the resulting stochastic
approximation schemes. We further present a series of experimental results with
the various GSPGS estimators on the Rastrigin and quadratic function
objectives. Our experiments are seen to validate our theoretical findings.
Comment: The material in this paper was presented in part at the Conference on
Information Sciences and Systems (CISS) in March 202
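For orientation, the simultaneous-perturbation idea behind these estimators can be sketched with the classic two-measurement SPSA gradient estimate. This is a minimal illustration of standard SPSA, not the generalized GSPSA, GSF, or GRDSA estimators of the abstract above. The function names (spsa_gradient, quadratic), the perturbation size delta, and the noise-free quadratic objective are assumptions made for the example.

```python
import numpy as np

def spsa_gradient(f, x, delta=1e-2, rng=None):
    """Classic two-measurement SPSA gradient estimate (illustrative sketch).

    All coordinates are perturbed at once along a random +/-1 direction,
    so only two evaluations of f are needed regardless of the dimension.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Rademacher (+1 / -1) perturbation direction, one entry per coordinate.
    d = rng.choice([-1.0, 1.0], size=x.shape)
    y_plus = f(x + delta * d)
    y_minus = f(x - delta * d)
    # Componentwise division by the perturbation gives the gradient estimate.
    return (y_plus - y_minus) / (2.0 * delta * d)

def quadratic(x):
    # Hypothetical test objective; its true gradient is 2 * x.
    return float(np.sum(x ** 2))

x = np.array([1.0, -2.0, 0.5])
rng = np.random.default_rng(0)
# A single estimate is noisy; averaging many of them approaches the gradient.
est = np.mean(
    [spsa_gradient(quadratic, x, rng=rng) for _ in range(20000)], axis=0
)
print(est)  # close to 2 * x on average
```

For a quadratic objective this two-sided estimate is unbiased for any delta; for general smooth objectives the bias depends on delta, which is the quantity that estimators using more function measurements aim to reduce.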
Simultaneous Perturbation Algorithms for Batch Off-Policy Search
We propose novel policy search algorithms in the context of off-policy, batch
mode reinforcement learning (RL) with continuous state and action spaces. Given
a batch collection of trajectories, we perform off-line policy evaluation using
an algorithm similar to that by [Fonteneau et al., 2010]. Using this
Monte-Carlo like policy evaluator, we perform policy search in a class of
parameterized policies. We propose both first order policy gradient and second
order policy Newton algorithms. All our algorithms incorporate simultaneous
perturbation estimates for the gradient as well as the Hessian of the
cost-to-go vector, since the latter is unknown and only biased estimates are
available. We demonstrate their practicality on a simple one-dimensional
continuous state space problem.
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward.