
    Stacked Thompson Bandits

    We introduce Stacked Thompson Bandits (STB) for efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement. STB uses simulation to evaluate plans and takes a Bayesian approach to using the resulting information to guide its search. In particular, we show that stacking multi-armed bandits and using Thompson sampling to guide the action selection process for each bandit enables STB to generate plans that satisfy requirements with high probability while searching only a fraction of the search space.
    Comment: Accepted at SEsCPS @ ICSE 201
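
    As a rough illustration of the stacking idea, the sketch below keeps one Bernoulli bandit per plan step, each holding Beta posteriors over the success probability of its actions, and updates every bandit with the simulated outcome of the whole plan. The names (`BetaBandit`, `stb_plan`) and the `simulate(plan) -> 0/1` interface are our assumptions, not the paper's code.

```python
import random

class BetaBandit:
    """One bandit per plan step; each action (arm) keeps a Beta posterior
    over its probability of leading to requirement satisfaction."""
    def __init__(self, actions):
        self.posteriors = {a: [1.0, 1.0] for a in actions}  # uniform Beta(1,1) priors

    def sample_action(self):
        # Thompson sampling: draw one sample per arm, act greedily on the draws
        return max(self.posteriors,
                   key=lambda a: random.betavariate(*self.posteriors[a]))

    def update(self, action, success):
        a, b = self.posteriors[action]
        self.posteriors[action] = [a + success, b + (1 - success)]

def stb_plan(simulate, actions, horizon, n_sims=10000):
    """simulate(plan) must return 1 if the sampled trajectory satisfies the
    bounded temporal logic requirement and 0 otherwise (assumed interface)."""
    stack = [BetaBandit(actions) for _ in range(horizon)]
    for _ in range(n_sims):
        plan = [bandit.sample_action() for bandit in stack]  # one action per step
        outcome = simulate(plan)
        for bandit, action in zip(stack, plan):
            bandit.update(action, outcome)  # all bandits share the plan-level outcome
    # report the plan that is greedy w.r.t. posterior mean success probability
    return [max(b.posteriors, key=lambda a: b.posteriors[a][0] / sum(b.posteriors[a]))
            for b in stack]

# toy usage: the "requirement" is reaching a cumulative displacement of 3
# within 5 steps on a line, with actions {-1, 0, +1}
plan = stb_plan(lambda p: int(sum(p) >= 3), actions=(-1, 0, 1), horizon=5)
```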

    Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling

    State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which can limit performance when memory resources are restricted. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory-bounded approach to open-loop planning in large POMDPs, which optimizes a fixed-size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.
    Comment: Presented at AAAI 201
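
    A minimal sketch of the fixed-stack idea, assuming a Gaussian reward model for the Thompson sampling bandits and a `sample_episode(plan) -> rewards` generative-model interface (both our assumptions; the paper's bandit model and interfaces may differ):

```python
import math
import random

class GaussianBandit:
    """Thompson sampling arm with a Gaussian posterior over the mean return."""
    def __init__(self, actions):
        self.stats = {a: [0.0, 0] for a in actions}  # per action: [return sum, count]

    def sample_action(self):
        def draw(a):
            s, n = self.stats[a]
            mean = s / n if n else 0.0
            return random.gauss(mean, 1.0 / math.sqrt(n + 1))  # shrinking uncertainty
        return max(self.stats, key=draw)

    def update(self, action, ret):
        self.stats[action][0] += ret
        self.stats[action][1] += 1

def posts_plan(sample_episode, actions, stack_size, n_sims=5000, gamma=0.95):
    """Open-loop planning with a fixed-size stack of bandits, one per step.
    sample_episode(plan) should return the per-step rewards obtained when the
    generative model executes the plan (assumed interface)."""
    stack = [GaussianBandit(actions) for _ in range(stack_size)]
    for _ in range(n_sims):
        plan = [b.sample_action() for b in stack]
        rewards = sample_episode(plan)
        for t, (b, a) in enumerate(zip(stack, plan)):
            # each bandit learns from the discounted return from its step onward
            ret = sum(gamma ** i * r for i, r in enumerate(rewards[t:]))
            b.update(a, ret)
    return [max(b.stats, key=lambda a: b.stats[a][0] / max(b.stats[a][1], 1))
            for b in stack]
```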

    Continuous Monte Carlo Graph Search

    In many complex sequential decision-making tasks, online planning is crucial for high performance. For efficient online planning, Monte Carlo Tree Search (MCTS) employs a principled mechanism for trading off exploration and exploitation. MCTS outperforms comparison methods in various discrete decision-making domains such as Go, Chess, and Shogi. Subsequently, extensions of MCTS to continuous domains have been proposed. However, the inherent high branching factor and the resulting explosion of search tree size limit existing methods. To solve this problem, this paper proposes Continuous Monte Carlo Graph Search (CMCGS), a novel extension of MCTS to online planning in environments with continuous state and action spaces. CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance. To implement this idea, at each time step CMCGS clusters similar states into a limited number of stochastic action bandit nodes, which produce a layered graph instead of an MCTS search tree. Experimental evaluation with limited sample budgets shows that CMCGS outperforms comparison methods in several complex continuous DeepMind Control Suite benchmarks and a 2D navigation task.
    Comment: Under review as a conference paper at ICLR 202
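
    The sketch below illustrates the layered-graph idea: each layer holds a bounded number of bandit nodes, a rollout's state at depth t is routed to the nearest node in layer t (spawning a new node only while the layer has room), and that node's stochastic action policy is updated with the rollout return. The CEM-style Gaussian refit and the distance-threshold clustering are our assumptions; the paper's clustering and bandit updates may differ.

```python
import random

class BanditNode:
    """A cluster of similar states sharing one stochastic action policy."""
    def __init__(self, center, action_dim):
        self.center = center              # representative state of the cluster
        self.mu = [0.0] * action_dim      # mean of the Gaussian action policy
        self.sigma = [1.0] * action_dim   # per-dimension standard deviation
        self.elites = []                  # best (return, action) pairs seen so far

    def sample_action(self):
        return [random.gauss(m, s) for m, s in zip(self.mu, self.sigma)]

    def update(self, action, ret, elite_size=10):
        # refit the Gaussian to the top-k actions (a CEM-style update; the
        # paper's exact bandit update may differ -- this is an assumption)
        self.elites = sorted(self.elites + [(ret, action)],
                             key=lambda e: -e[0])[:elite_size]
        acts = [a for _, a in self.elites]
        for d in range(len(self.mu)):
            vals = [a[d] for a in acts]
            self.mu[d] = sum(vals) / len(vals)
            var = sum((v - self.mu[d]) ** 2 for v in vals) / len(vals)
            self.sigma[d] = max(var ** 0.5, 0.05)  # floor keeps exploration alive

def route(layer, state, max_nodes, action_dim, radius=1.0):
    """Assign a state to its nearest node in the layer; spawn a new node only
    while the layer has room, so the graph's width stays bounded."""
    def dist(node):
        return sum((c - s) ** 2 for c, s in zip(node.center, state))
    if layer and (len(layer) >= max_nodes or min(map(dist, layer)) < radius ** 2):
        return min(layer, key=dist)
    layer.append(BanditNode(state, action_dim))
    return layer[-1]
```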

    Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning

    We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond a generative model. We empirically test SYMBOL in four large POMDP benchmark problems to demonstrate its effectiveness and robustness w.r.t. the choice of hyperparameters and evaluate its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and POMCP.
    Comment: Accepted at IJCAI 2019. arXiv admin note: substantial text overlap with arXiv:1905.0402
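
    Building on the POSTS sketch above (and reusing its `GaussianBandit` class), a minimal sketch of the adaptive stack: start with a single bandit and deepen the stack as evidence accumulates, up to the horizon. The concrete growth criterion (`grow_after` updates of the deepest bandit) is our assumption; the paper derives its own adaptation rule.

```python
def symbol_plan(sample_episode, actions, horizon, n_sims=5000,
                gamma=0.95, grow_after=50):
    """Adaptive-depth variant of posts_plan. sample_episode(plan) must accept
    plans of any length up to horizon and return one reward per plan step
    (assumed interface)."""
    stack = [GaussianBandit(actions)]
    for _ in range(n_sims):
        plan = [b.sample_action() for b in stack]
        rewards = sample_episode(plan)
        for t, (b, a) in enumerate(zip(stack, plan)):
            b.update(a, sum(gamma ** i * r for i, r in enumerate(rewards[t:])))
        # deepen the stack once the deepest bandit has seen enough updates
        deepest_updates = sum(n for _, n in stack[-1].stats.values())
        if len(stack) < horizon and deepest_updates >= grow_after:
            stack.append(GaussianBandit(actions))
    return [max(b.stats, key=lambda a: b.stats[a][0] / max(b.stats[a][1], 1))
            for b in stack]
```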

    Stable Model Predictive Path Integral Control for Aggressive Autonomous Driving

    A common challenge with sampling-based Model Predictive Control (MPC) algorithms operating in stochastic environments is ensuring stable behavior under sudden state disturbances. Model Predictive Path Integral (MPPI) control is an MPC algorithm that can optimize control of non-linear systems subject to non-differentiable cost criteria. It iteratively computes optimal control sequences by re-using the sequence optimized at the previous timestep as a warm start for the current iteration, which allows rapid convergence and makes it real-time capable. This approach is successful in producing a diverse set of behaviors, the most impressive being its ability to control systems at the limits of handling. However, a strong unexpected state disturbance can make the previous control sequence an unsafe initialization for the new state and can result in undesired behavior. In this work, we address this problem by implementing a path tracker that produces the control sequences used as initializers for the current timestep, instead of simply re-using the sequence from the previous timestep. The path tracker iteratively computes control sequences that can guide the system to low-cost regions and feeds them into the MPPI framework as a sampling reference. This forces the algorithm to sample behaviors normally distributed around controls that guide the state back to low-cost regions, even in cases where the state changes drastically. An additional advantage of our method is that it retains the ability to sample diverse and dynamically feasible controls, thus maintaining its capability for motion at the limits of handling. We experimentally verify this method on the AutoRally autonomous research platform, a one-fifth-scale race car for aggressive driving tasks, and compare its performance against the most recently published results of MPPI for autonomous driving.
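
    A condensed sketch of the two ingredients: a standard information-theoretic MPPI update, plus a hypothetical proportional path tracker whose rollout replaces the shifted previous solution as the warm start. `dynamics`, `cost`, `gain`, and `reference` are placeholder interfaces of our own, and the AutoRally tracker is certainly more sophisticated than this.

```python
import numpy as np

def mppi_step(dynamics, cost, state, u_init, n_samples=512, noise_std=0.5, lam=1.0):
    """One MPPI iteration: perturb the initializer u_init with Gaussian noise,
    roll out each candidate, and average the candidates with exponentiated
    negative-cost weights (the standard path integral update)."""
    horizon, u_dim = u_init.shape
    noise = np.random.randn(n_samples, horizon, u_dim) * noise_std
    candidates = u_init[None] + noise
    costs = np.empty(n_samples)
    for k in range(n_samples):
        x, c = state, 0.0
        for u in candidates[k]:
            x = dynamics(x, u)
            c += cost(x, u)
        costs[k] = c
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return np.tensordot(weights, candidates, axes=1)  # weighted average sequence

def tracker_init(gain, reference, state, horizon, dynamics):
    """Hypothetical path tracker: roll out a proportional feedback law toward
    the reference path and return the resulting control sequence, which then
    serves as the MPPI warm start instead of the shifted previous solution."""
    u_seq, x = [], state
    for t in range(horizon):
        target = reference[min(t, len(reference) - 1)]
        u = gain @ (target - x)   # steer toward the reference point
        u_seq.append(u)
        x = dynamics(x, u)
    return np.array(u_seq)

# per control step (sketch): warm-start from the tracker, then refine with MPPI
# u0 = tracker_init(gain, reference, state, horizon=25, dynamics=dynamics)
# u  = mppi_step(dynamics, cost, state, u_init=u0)
```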