
    Monte Carlo Planning method estimates planning horizons during interactive social exchange

    Reciprocating interactions represent a central feature of all human exchanges. They have been the target of various recent experiments, with healthy participants and psychiatric populations engaging as dyads in multi-round exchanges such as a repeated trust task. Behaviour in such exchanges involves complexities related to each agent's preference for equity with their partner, beliefs about the partner's appetite for equity, beliefs about the partner's model of their partner, and so on. Agents may also plan different numbers of steps into the future. Providing a computationally precise account of the behaviour is an essential step towards understanding what underlies choices. A natural framework for this is that of an interactive partially observable Markov decision process (IPOMDP). However, the various complexities make IPOMDPs inordinately computationally challenging. Here, we show how to approximate the solution for the multi-round trust task using a variant of the Monte-Carlo tree search algorithm. We demonstrate that the algorithm is efficient and effective, and therefore can be used to invert observations of behavioural choices. We use generated behaviour to elucidate the richness and sophistication of interactive inference.
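
    The technical core of this abstract is approximating an interactive planning problem with tree search. The sketch below is an illustration only, not the authors' algorithm or task parameters: it runs a plain UCT-style Monte-Carlo tree search for an investor planning several rounds of a simplified trust game against a fixed, assumed partner-return model. The action set, payoffs, and `generosity` parameter are all assumptions.

```python
import math
import random

ACTIONS = [0, 5, 10, 15, 20]          # units the investor may send each round
ENDOWMENT, MULTIPLIER = 20, 3

def partner_return(sent, generosity=0.5):
    # Stand-in for the partner model: the trustee returns a fixed share of the
    # multiplied investment (real IPOMDP partner models are far richer).
    return generosity * MULTIPLIER * sent

def step_reward(sent, generosity):
    # Investor's payoff for one round: what is kept plus what comes back.
    return ENDOWMENT - sent + partner_return(sent, generosity)

class Node:
    def __init__(self, depth):
        self.depth = depth            # rounds already played on this path
        self.visits = 0
        self.total = 0.0
        self.children = {}            # action -> child Node

def simulate(node, horizon, generosity, c=1.4):
    """One MCTS iteration: selection, expansion, random rollout, backup."""
    if node.depth == horizon:
        return 0.0
    untried = [a for a in ACTIONS if a not in node.children]
    if untried:                       # expansion followed by a random rollout
        a = random.choice(untried)
        child = node.children[a] = Node(node.depth + 1)
        reward = step_reward(a, generosity)
        reward += sum(step_reward(random.choice(ACTIONS), generosity)
                      for _ in range(horizon - child.depth))
    else:                             # UCT selection, then recurse
        a, child = max(node.children.items(),
                       key=lambda kv: kv[1].total / kv[1].visits
                       + c * math.sqrt(math.log(node.visits) / kv[1].visits))
        reward = step_reward(a, generosity) + simulate(child, horizon, generosity, c)
    child.visits += 1
    child.total += reward
    node.visits += 1
    return reward

def plan_first_move(horizon=5, generosity=0.5, n_iter=3000):
    root = Node(depth=0)
    for _ in range(n_iter):
        simulate(root, horizon, generosity)
    # Recommend the most-visited first action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print("suggested first-round investment:", plan_first_move())
```

    In the paper's setting the partner model would itself be an inferring agent (beliefs about beliefs), and the planning horizon would be a parameter inferred from observed choices rather than fixed in advance.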

    Belief State Planning for Autonomous Driving: Planning with Interaction, Uncertain Prediction and Uncertain Perception

    This thesis presents a behavior planning algorithm for automated driving in urban environments with an uncertain and dynamic nature. The uncertainty in the environment arises from the fact that the intentions as well as the future trajectories of the surrounding drivers cannot be measured directly but can only be estimated in a probabilistic fashion. Even the perception of objects is uncertain due to sensor noise or possible occlusions. When driving in such environments, the autonomous car must predict the behavior of the other drivers and plan safe, comfortable and legal trajectories. Planning such trajectories requires robust decision making when several high-level options are available for the autonomous car. Current planning algorithms for automated driving split the problem into different subproblems, ranging from discrete, high-level decision making to prediction and continuous trajectory planning. This separation of one problem into several subproblems, combined with rule-based decision making, leads to sub-optimal behavior.

    This thesis presents a global, closed-loop formulation of the motion planning problem that intertwines action selection and the corresponding prediction of the other agents in one optimization problem. The global formulation allows the planning algorithm to make decisions about high-level options implicitly. Furthermore, the closed-loop nature of the algorithm optimizes the solution for various future scenarios concerning the behavior of the other agents. Formulating prediction and planning as an intertwined problem allows interaction to be modeled, i.e. the future reaction of the other drivers to the behavior of the autonomous car. The problem is modeled as a partially observable Markov decision process (POMDP) with a discrete action space and continuous state and observation spaces. The solution to the POMDP is a policy over belief states, which contains different reactive plans for possible future scenarios. Surrounding drivers are modeled with interactive, probabilistic agent models to account for their prediction uncertainty. The field of view of the autonomous car is simulated ahead over the whole planning horizon during the optimization of the policy. Simulating the possible corresponding future observations allows the algorithm to select actions that actively reduce the uncertainty of the world state. Depending on the scenario, the behavior of the autonomous car is optimized in the longitudinal direction alone or in combined lateral and longitudinal directions. The algorithm is formulated in a generic way and solved online, which allows it to be applied to various road layouts and scenarios. While such a generic problem formulation is intractable to solve exactly, this thesis demonstrates how a sufficiently good approximation to the optimal policy can be found online. The problem is solved by combining state-of-the-art Monte Carlo tree search algorithms with near-optimal, domain-specific roll-outs.

    The algorithm is evaluated in scenarios such as crossing intersections under unknown intentions of other vehicles, interactive lane changes into narrow gaps, and decision making at intersections with large occluded areas. It is shown that the behavior of the closed-loop planner is less conservative than that of comparable open-loop planners. More precisely, it is even demonstrated that the policy enables the autonomous car to drive in a way similar to an omniscient planner with full knowledge of the scene. It is also demonstrated how the autonomous car executes actions to actively gather more information about its surroundings and to reduce the uncertainty of its belief state.
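
    Purely as an illustration of the belief-state idea, and not the thesis' planner (which builds a full policy over belief states with Monte Carlo tree search and domain-specific roll-outs), the sketch below keeps a scalar belief over another driver's intention at an intersection, updates it from noisy observations, and scores the ego car's high-level actions under that belief. The scenario, observation model, actions, and rewards are assumptions.

```python
import math
import random

ACTIONS = ["wait", "creep", "go"]

def sample_observation(intention, noise=0.2):
    # Noisy measurement of the other car's deceleration (1.0 if it yields).
    return (1.0 if intention == "yield" else 0.0) + random.gauss(0.0, noise)

def update_belief(p_yield, obs, noise=0.2):
    # Bayes update of P(yield) from one observation under Gaussian noise.
    def lik(mu):
        return math.exp(-(obs - mu) ** 2 / (2 * noise ** 2))
    num = p_yield * lik(1.0)
    return num / (num + (1.0 - p_yield) * lik(0.0))

def outcome_reward(action, intention):
    # Crude outcome model: collisions dominate, waiting and creeping cost time.
    if action == "go":
        return 10.0 if intention == "yield" else -100.0
    return -1.0 if action == "creep" else -2.0

def plan(p_yield, n_samples=2000):
    # Evaluate each high-level action by sampling the other driver's intention
    # from the current belief and pick the best expected outcome.
    def value(a):
        return sum(outcome_reward(a, "yield" if random.random() < p_yield else "go")
                   for _ in range(n_samples)) / n_samples
    return max(ACTIONS, key=value)

belief = 0.5                              # prior: the other driver yields with p = 0.5
for _ in range(3):                        # three noisy observations of yielding behaviour
    belief = update_belief(belief, sample_observation("yield"))
print(f"P(yield) = {belief:.2f} -> chosen action: {plan(belief)}")
```

    The thesis goes well beyond this one-step picture: observations are themselves simulated forward over the planning horizon, so the planner can prefer actions (such as creeping forward) that will reduce uncertainty before committing.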

    Belief State Planning for Autonomous Driving: Planning with Interaction, Uncertain Prediction and Uncertain Perception

    This work presents a behavior planning algorithm for automated driving in urban environments with an uncertain and dynamic nature. The algorithm explicitly accounts for prediction uncertainty (e.g. different intentions), perception uncertainty (e.g. occlusions), and the uncertain, interactive behavior of the other agents. Simulating the most likely future scenarios makes it possible to find an optimal policy online that enables non-conservative planning under uncertainty.

    Scalable Decision-Theoretic Planning in Open and Typed Multiagent Systems

    In open agent systems, the set of agents that are cooperating or competing changes over time and in ways that are nontrivial to predict. For example, if collaborative robots were tasked with fighting wildfires, they may run out of suppressants and be temporarily unavailable to assist their peers. We consider the problem of planning in these contexts with the additional challenges that the agents are unable to communicate with each other and that there are many of them. Because an agent's optimal action depends on the actions of others, each agent must not only predict the actions of its peers but, before that, reason about whether they are even present to perform an action. Addressing openness thus requires agents to model each other's presence, which becomes computationally intractable with high numbers of agents. We present a novel, principled, and scalable method in this context that enables an agent to reason about others' presence in its shared environment and their actions. Our method extrapolates models of a few peers to the overall behavior of the many-agent system and combines this extrapolation with a generalization of Monte Carlo tree search to perform individual agent reasoning in many-agent open environments. Theoretical analyses establish the number of agents to model in order to achieve acceptable worst-case bounds on extrapolation error, as well as regret bounds on the agent's utility from modeling only some neighbors. Simulations of multiagent wildfire suppression problems demonstrate our approach's efficacy compared with alternative baselines. Comment: Pre-print with appendices for AAAI 202
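
    To make the extrapolation idea concrete, here is a deliberately small sketch; the depletion model, numbers, and function names are assumptions, not the paper's method. Presence of a peer is estimated by simulating only a handful of modelled peers and then treated as representative of the whole population, which is the quantity a sampling-based planner would draw from.

```python
import random

def simulate_peer(steps, p_deplete=0.15):
    # Model of a single peer: at each step it may run out of suppressant,
    # after which it is absent from the task (the "openness" of the system).
    present = True
    for _ in range(steps):
        if present and random.random() < p_deplete:
            present = False
    return present

def extrapolated_presence(n_modelled=5, steps=10, n_samples=500):
    # Estimate P(a peer is still present) by simulating only a few modelled
    # peers, then treat that estimate as representative of all peers.
    draws = n_modelled * n_samples
    hits = sum(simulate_peer(steps) for _ in range(draws))
    return hits / draws

p_present = extrapolated_presence()
n_agents = 50
print(f"estimated P(peer present after 10 steps): {p_present:.2f}")
print(f"expected available helpers among {n_agents} peers: {n_agents * p_present:.1f}")
# A full planner would draw peer presence from this extrapolated distribution
# inside each simulated trajectory of a Monte Carlo tree search, rather than
# maintaining an explicit model of every individual agent.
```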

    Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

    Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades off between actions that contribute to the agent's knowledge and actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive. In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show it performs well in a variety of problems. We use policy queries, in which we ask an expert for the correct action, to infer the consequences of a potential pitfall without experiencing its effects. More importantly for human–robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.
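
    As a toy illustration of when a policy query pays off, the sketch below compares the expected loss of acting greedily with the cost of asking the expert. The action names, probabilities, loss, and query cost are invented; the paper's Bayes-risk criterion operates over uncertain transition, observation, and reward models rather than a fixed table like this.

```python
# Posterior over which action is correct in the current belief state; the
# actions, probabilities, loss, and query cost below are made-up numbers.
posterior = {"pick_up": 0.55, "wait": 0.30, "hand_over": 0.15}
LOSS_WRONG = 1.0     # loss incurred by executing an incorrect action
QUERY_COST = 0.3     # cost of asking the expert which action is correct

def bayes_risk(posterior, loss_wrong=LOSS_WRONG):
    # Risk of acting greedily = P(the chosen action is wrong) * loss.
    best = max(posterior, key=posterior.get)
    return best, (1.0 - posterior[best]) * loss_wrong

action, risk = bayes_risk(posterior)
if risk > QUERY_COST:
    print(f"risk {risk:.2f} > query cost {QUERY_COST:.2f}: issue a policy query")
else:
    print(f"risk {risk:.2f} <= query cost {QUERY_COST:.2f}: execute '{action}'")
```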