Stacked Thompson Bandits
We introduce Stacked Thompson Bandits (STB) for efficiently generating plans
that are likely to satisfy a given bounded temporal logic requirement. STB uses
a simulation for evaluation of plans, and takes a Bayesian approach to using
the resulting information to guide its search. In particular, we show that
stacking multi-armed bandits and using Thompson sampling to guide the action
selection process for each bandit enables STB to generate plans that satisfy
requirements with a high probability while only searching a fraction of the
search space.
Comment: Accepted at SEsCPS @ ICSE 201
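The stacking idea can be illustrated with a minimal Python sketch. For illustration only, it assumes that each plan step has its own Beta-Bernoulli Thompson sampling bandit, which picks an action independently of the other steps (open-loop). It also assumes that the simulator's verdict on the requirement is a binary reward credited to every bandit in the stack. The names `BetaBandit`, `stacked_thompson_plan` and `satisfies` are hypothetical stand-ins, not STB's published interface.

```python
import random


class BetaBandit:
    """Thompson sampling bandit with a Beta(1, 1) prior on each arm's success rate."""

    def __init__(self, n_arms):
        self.alpha = [1.0] * n_arms  # 1 + number of successes per arm
        self.beta = [1.0] * n_arms   # 1 + number of failures per arm

    def select(self, rng):
        # Draw one sample from each arm's posterior and play the largest draw.
        draws = [rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, success):
        if success:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

    def best_arm(self):
        # Arm with the highest posterior mean alpha / (alpha + beta).
        means = [a / (a + b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(means)), key=means.__getitem__)


def stacked_thompson_plan(actions, horizon, satisfies, n_iters, seed=0):
    """Hypothetical sketch: one bandit per plan step; `satisfies(plan)` stands in
    for simulating the plan and checking the bounded requirement."""
    rng = random.Random(seed)
    stack = [BetaBandit(len(actions)) for _ in range(horizon)]
    for _ in range(n_iters):
        arms = [bandit.select(rng) for bandit in stack]  # open-loop: no conditioning
        success = satisfies([actions[i] for i in arms])
        for bandit, arm in zip(stack, arms):
            bandit.update(arm, success)  # every step shares the plan's outcome
    return [actions[bandit.best_arm()] for bandit in stack]
```

As a toy bounded requirement, "never move down within three steps" can be checked with `stacked_thompson_plan(["up", "down", "left"], 3, lambda p: "down" not in p, 2000)`. This call should return a three-step plan that contains no "down". The Beta(1, 1) prior is a choice made here because the requirement check yields a success/failure outcome.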
Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling
State-of-the-art approaches to partially observable planning like POMCP are
based on stochastic tree search. While these approaches are computationally
efficient, they may still construct search trees of considerable size, which
can limit performance when memory resources are restricted. In this paper,
we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory
bounded approach to open-loop planning in large POMDPs, which optimizes a fixed
size stack of Thompson Sampling bandits. We empirically evaluate POSTS on four
large benchmark problems and compare its performance with different tree-based
approaches. We show that POSTS achieves performance competitive with
tree-based open-loop planning and offers a performance-memory tradeoff, making
it suitable for partially observable planning with highly restricted
computational and memory resources.
Comment: Presented at AAAI 201
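A fixed-size stack of Thompson Sampling bandits for open-loop planning might look like the Python sketch below. Several details are assumptions made for illustration. Returns are modelled with a Normal posterior under a known unit noise variance, and the posterior used in the paper may differ. The bandit at level t is updated with the discounted return from step t onward. `sample_state` and `simulate` are hypothetical hooks for drawing a state from the belief and for querying a generative model. `GaussianTSBandit` and `posts_first_action` are likewise hypothetical names.

```python
import math
import random


class GaussianTSBandit:
    """Thompson sampling over real-valued returns. Each arm's mean has a
    N(0, prior_var) prior; noise variance is assumed known and equal to 1."""

    def __init__(self, n_arms, prior_var=100.0):
        self.n = [0] * n_arms
        self.total = [0.0] * n_arms
        self.prior_var = prior_var

    def posterior(self, arm):
        # Conjugate Normal update: precision = 1/prior_var + n, mean = sum / precision.
        precision = 1.0 / self.prior_var + self.n[arm]
        return self.total[arm] / precision, 1.0 / precision

    def select(self, rng):
        draws = []
        for arm in range(len(self.n)):
            mean, var = self.posterior(arm)
            draws.append(rng.gauss(mean, math.sqrt(var)))
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, ret):
        self.n[arm] += 1
        self.total[arm] += ret


def posts_first_action(sample_state, simulate, n_actions, horizon, n_sims,
                       gamma=0.95, seed=0):
    """Hypothetical sketch of open-loop planning with a fixed stack of
    `horizon` bandits; memory is O(horizon * n_actions)."""
    rng = random.Random(seed)
    stack = [GaussianTSBandit(n_actions) for _ in range(horizon)]
    for _ in range(n_sims):
        state = sample_state(rng)  # draw a state from the current belief
        chosen, rewards = [], []
        for t in range(horizon):
            action = stack[t].select(rng)  # ignores observations: open-loop
            state, reward, done = simulate(state, action, rng)
            chosen.append(action)
            rewards.append(reward)
            if done:
                break
        ret = 0.0
        for t in reversed(range(len(chosen))):
            ret = rewards[t] + gamma * ret  # discounted return from step t onward
            stack[t].update(chosen[t], ret)
    first = stack[0]
    # Recommend the level-0 arm with the best empirical mean return.
    return max(range(n_actions),
               key=lambda a: first.total[a] / first.n[a] if first.n[a] else -math.inf)
```

The `horizon * n_actions` pairs of counters make up the whole memory footprint. A search tree, by contrast, grows with the number of simulations.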
Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning
We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general
memory bounded approach to partially observable open-loop planning. SYMBOL
maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded
by the planning horizon and can be automatically adapted according to the
underlying domain without any prior domain knowledge beyond a generative model.
We empirically test SYMBOL on four large POMDP benchmark problems to
demonstrate its effectiveness and robustness w.r.t. the choice of
hyperparameters and evaluate its adaptive memory consumption. We also compare
its performance with other open-loop planning algorithms and POMCP.
Comment: Accepted at IJCAI 2019. arXiv admin note: substantial text overlap
with arXiv:1905.0402
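The adaptive stack can be sketched in Python as follows. The growth rule used here is a hypothetical illustration and is not claimed to be SYMBOL's actual criterion. Under this rule, one level is added once the deepest bandit has picked the same arm `kappa` times in a row, the stack never grows beyond the horizon, and steps below the stack use uniformly random rollout actions. `TSBandit`, `AdaptiveStack`, `run_adaptive`, `sample_state` and `simulate` are likewise hypothetical names.

```python
import math
import random


class TSBandit:
    """Gaussian Thompson sampling bandit (known unit noise variance, N(0, 100) prior)."""

    def __init__(self, n_arms):
        self.n = [0] * n_arms
        self.total = [0.0] * n_arms

    def select(self, rng):
        draws = []
        for n, s in zip(self.n, self.total):
            precision = 0.01 + n  # 1/100 prior precision plus one per observation
            draws.append(rng.gauss(s / precision, math.sqrt(1.0 / precision)))
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, ret):
        self.n[arm] += 1
        self.total[arm] += ret


class AdaptiveStack:
    """Hypothetical adaptive stack: starts with one bandit and grows one level
    at a time (assumed rule: deepest bandit repeats an arm `kappa` times)."""

    def __init__(self, n_actions, horizon, kappa=20):
        self.n_actions, self.horizon, self.kappa = n_actions, horizon, kappa
        self.stack = [TSBandit(n_actions)]
        self.last_arm, self.streak = None, 0

    def select(self, t, rng):
        if t < len(self.stack):
            return self.stack[t].select(rng)
        return rng.randrange(self.n_actions)  # no bandit at this depth yet

    def update(self, chosen, returns):
        for t, (arm, ret) in enumerate(zip(chosen, returns)):
            if t < len(self.stack):
                self.stack[t].update(arm, ret)
        depth = len(self.stack) - 1
        if depth < len(chosen):
            arm = chosen[depth]
            self.streak = self.streak + 1 if arm == self.last_arm else 1
            self.last_arm = arm
            if self.streak >= self.kappa and len(self.stack) < self.horizon:
                self.stack.append(TSBandit(self.n_actions))  # bounded by horizon
                self.last_arm, self.streak = None, 0


def run_adaptive(sample_state, simulate, n_actions, horizon, n_sims,
                 gamma=0.95, seed=0):
    rng = random.Random(seed)
    planner = AdaptiveStack(n_actions, horizon)
    for _ in range(n_sims):
        state = sample_state(rng)
        chosen, rewards = [], []
        for t in range(horizon):
            action = planner.select(t, rng)
            state, reward, done = simulate(state, action, rng)
            chosen.append(action)
            rewards.append(reward)
            if done:
                break
        returns, g = [0.0] * len(rewards), 0.0
        for t in reversed(range(len(rewards))):
            g = rewards[t] + gamma * g  # discounted return from step t onward
            returns[t] = g
        planner.update(chosen, returns)
    return planner
```

Levels are only allocated when needed, so memory stays at one bandit per allocated level and never exceeds one bandit per step of the planning horizon.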
Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation
Decision making in distributed autonomous environments is challenging
because of enormous state spaces and uncertainty. Many online planning algorithms
rely on statistical sampling to avoid searching the whole state space while
still making acceptable decisions. However, planning often has to be performed
under strict computational constraints, which severely limits online planning
in multi-agent systems and can lead to poor system performance, especially in
stochastic domains. In this paper, we propose
Emergent Value function Approximation for Distributed Environments (EVADE), an
approach to integrate global experience into multi-agent online planning in
stochastic domains to consider global effects during local planning. For this
purpose, a value function is approximated online based on the emergent system
behaviour using reinforcement learning methods. We empirically evaluate
EVADE with two statistical multi-agent online planning algorithms in a highly
complex and stochastic smart factory environment, where multiple agents need to
process various items at a shared set of machines. Our experiments show that
EVADE can effectively improve the performance of multi-agent online planning
while offering efficiency w.r.t. the breadth and depth of the planning process.
Comment: Accepted at AAMAS 201
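Bootstrapping local plans with a globally learned value function can be sketched in Python. In this sketch, semi-gradient TD(0) with linear features stands in for "methods of reinforcement learning". EVADE's actual learner, features and integration may differ, and `LinearValueFunction` and `truncated_return` are hypothetical names. Transitions observed by the agents update a shared value function. A local planner then adds the discounted value of the state reached at the end of a short rollout to that rollout's reward.

```python
class LinearValueFunction:
    """V(s) ~ w . phi(s), learned online with semi-gradient TD(0)."""

    def __init__(self, n_features, alpha=0.05, gamma=0.95):
        self.w = [0.0] * n_features
        self.alpha, self.gamma = alpha, gamma

    def value(self, phi):
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def td_update(self, phi, reward, phi_next, terminal=False):
        # TD target bootstraps from the current estimate of the next state.
        target = reward if terminal else reward + self.gamma * self.value(phi_next)
        delta = target - self.value(phi)
        self.w = [wi + self.alpha * delta * xi for wi, xi in zip(self.w, phi)]
        return delta


def truncated_return(rewards, phi_last, vf):
    """Local planning estimate: discounted rewards of a short rollout plus a
    bootstrapped tail from the globally learned value function."""
    g = 0.0
    for r in reversed(rewards):
        g = r + vf.gamma * g
    return g + vf.gamma ** len(rewards) * vf.value(phi_last)
```

With a single self-looping state (one-hot feature `[1.0]`), reward 1 and `gamma=0.9`, repeated `td_update` calls drive the weight toward 1 / (1 - gamma) = 10. The truncated return therefore lets a shallow local search account for effects that lie beyond its own depth.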