Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems
The multi-armed bandit problem forms the foundation for solving a wide range
of on-line stochastic optimization problems through a simple, yet effective
mechanism. One simply casts the problem as a gambler that repeatedly pulls one
out of N slot machine arms, eliciting random rewards. Learning of reward
probabilities is then combined with reward maximization, by carefully balancing
reward exploration against reward exploitation. In this paper, we address a
particularly intriguing variant of the multi-armed bandit problem, referred to
as the {\it Stochastic Point Location (SPL) Problem}. The gambler is here only
told whether the optimal arm (point) lies to the "left" or to the "right" of
the arm pulled, with the feedback being erroneous with probability $1-\pi$.
This formulation thus captures optimization in continuous action spaces with
both {\it informative} and {\it deceptive} feedback. To tackle this class of
problems, we formulate a compact and scalable Bayesian representation of the
solution space that simultaneously captures both the location of the optimal
arm as well as the probability of receiving correct feedback. We further
introduce the accompanying Thompson Sampling guided Stochastic Point Location
(TS-SPL) scheme for balancing exploration against exploitation. By learning
$\pi$, TS-SPL also supports {\it deceptive} environments that are lying about
the direction of the optimal arm. This, in turn, allows us to solve the
fundamental Stochastic Root Finding (SRF) Problem. Empirical results
demonstrate that our scheme deals with both deceptive and informative
environments, significantly outperforming competing algorithms both for SRF and
SPL.

Comment: 17 pages, 2 figures. A preliminary version of some of the results of this paper appears in the Proceedings of AIAI'1
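The sketch below is a minimal illustration of the kind of scheme the abstract describes, not the authors' TS-SPL implementation: it maintains a discrete joint posterior over the optimal point and the feedback-correctness probability $\pi$, draws one hypothesis from that posterior each round (Thompson sampling), queries the sampled point, and updates by Bayes' rule. The grid resolution, uniform prior, and the simulated (deceptive) environment are assumptions made purely for illustration.

```python
# Minimal sketch: Thompson sampling over a joint grid posterior for the
# SPL problem. Illustrative only; grid, prior and environment are assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_POINTS = 101                            # discretisation of the unit interval
PI_GRID = np.linspace(0.05, 0.95, 19)     # candidate feedback-correctness levels
points = np.linspace(0.0, 1.0, N_POINTS)

# Uniform joint prior over (optimal point, pi = P(correct feedback)).
posterior = np.ones((N_POINTS, len(PI_GRID)))
posterior /= posterior.sum()

# Hypothetical environment: pi < 0.5 makes the feedback deceptive.
TRUE_OPT, TRUE_PI = 0.73, 0.3

def feedback(query):
    """Return +1 ('optimum lies to the right') or -1 ('to the left'),
    correct with probability TRUE_PI."""
    truth = 1 if TRUE_OPT > query else -1
    return truth if rng.random() < TRUE_PI else -truth

for t in range(2000):
    # Thompson sampling: draw one (location, pi) hypothesis and pull that arm.
    idx = rng.choice(posterior.size, p=posterior.ravel())
    i, _ = np.unravel_index(idx, posterior.shape)
    query = points[i]

    direction = feedback(query)

    # Bayesian update: likelihood of the observed direction under every
    # (location, pi) hypothesis on the grid.
    says_right = (direction == 1)
    truly_right = points[:, None] > query
    correct = (truly_right == says_right)
    likelihood = np.where(correct, PI_GRID[None, :], 1.0 - PI_GRID[None, :])
    posterior *= likelihood
    posterior /= posterior.sum()

# Point estimate: posterior mean of the optimal location.
est = (posterior.sum(axis=1) * points).sum()
print(f"estimated optimum ~ {est:.3f} (true {TRUE_OPT})")
```

Because $\pi$ is learned jointly with the location, the same update works whether the environment is informative ($\pi > 0.5$) or deceptive ($\pi < 0.5$); in the latter case the posterior simply learns to read the feedback in reverse.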
Root Finding via DARTS: Dynamic Adaptive Random Target Shooting
Consider multi-dimensional root finding when the equations are available only implicitly via a Monte Carlo simulation oracle that, for any candidate solution, returns a vector of point estimates. We develop DARTS, a stochastic-approximation algorithm that makes quasi-Newton moves to a new solution whenever the current sample size is large compared to the estimated quality of the current solution and the estimated sampling error. We show that DARTS converges in a certain precise sense, and discuss reasons to expect substantial computational efficiencies over traditional stochastic-approximation variants.
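The following is a hedged, one-dimensional sketch of the adaptive-sampling idea only: keep sampling the noisy oracle at the current iterate, and take a quasi-Newton (here secant) move once the estimated sampling error is small relative to the estimated function value. The test function, the two-sigma-style resolution rule, the sample cap, and the step clipping are assumptions for illustration, not the rules given in the paper, which also treats the multi-dimensional case.

```python
# Illustrative sketch of adaptive-sample-size stochastic root finding;
# not the DARTS algorithm as specified in the paper.
import numpy as np

rng = np.random.default_rng(1)

def oracle(x, noise=0.5):
    """Unbiased but noisy estimate of g(x) = x**3 - 2 (root at 2**(1/3))."""
    return x**3 - 2.0 + noise * rng.standard_normal()

x_prev, g_prev = 0.5, None
x = 1.5
for step in range(25):
    # Sample at the current point until the mean is well resolved relative
    # to its sampling error ("sample size large compared to estimated
    # solution quality and sampling error").
    n, s, s2 = 0, 0.0, 0.0
    while True:
        y = oracle(x)
        n += 1
        s += y
        s2 += y * y
        g_bar = s / n
        stderr = np.sqrt(max(s2 / n - g_bar**2, 1e-12) / n)
        if (n > 1 and stderr < 0.5 * abs(g_bar)) or n >= 10_000:
            break
    # Quasi-Newton move: secant slope from the previous iterate's estimate.
    if g_prev is not None and abs(x - x_prev) > 1e-9:
        slope = (g_bar - g_prev) / (x - x_prev)
        move = g_bar / slope if abs(slope) > 1e-9 else 0.1 * g_bar
    else:
        move = 0.1 * g_bar                 # bootstrap step on the first pass
    x_prev, g_prev = x, g_bar
    x = x - np.clip(move, -1.0, 1.0)       # clipped for robustness to noise
    print(f"step {step:2d}: x = {x:.4f}  g_bar = {g_bar:+.4f}  n = {n}")
```

The design point this sketch tries to convey is that sampling effort grows only as the iterate approaches the root, where the mean response is hard to distinguish from zero, rather than being fixed in advance as in classical stochastic-approximation schemes.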