Monte Carlo Planning method estimates planning horizons during interactive social exchange
Reciprocating interactions represent a central feature of all human exchanges. They have been the target of various recent experiments, with healthy participants and psychiatric populations engaging as dyads in multi-round exchanges such as a repeated trust task. Behaviour in such exchanges involves complexities related to each agent's preference for equity with their partner, beliefs about the partner's appetite for equity, beliefs about the partner's model of their partner, and so on. Agents may also plan different numbers of steps into the future. Providing a computationally precise account of the behaviour is an essential step towards understanding what underlies choices. A natural framework for this is that of an interactive partially observable Markov decision process (IPOMDP). However, the various complexities make IPOMDPs inordinately computationally challenging. Here, we show how to approximate the solution for the multi-round trust task using a variant of the Monte-Carlo tree search algorithm. We demonstrate that the algorithm is efficient and effective, and can therefore be used to invert observations of behavioural choices. We use generated behaviour to elucidate the richness and sophistication of interactive inference.
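The Monte-Carlo tree search idea referred to above can be illustrated with a minimal UCT sketch in Python. It assumes a hypothetical, fully observed three-step toy problem (not the trust task) with plain UCB1 selection and random rollouts; it does not model beliefs about a partner, which the IPOMDP variant in the abstract requires. The names `step`, `simulate`, `plan` and the exploration constant `c` (scaled to the reward range, a common heuristic) are illustrative assumptions.

```python
import math
import random

# Hypothetical toy problem (not the trust task): over a 3-step horizon, action 0
# pays 1 immediately, while action 1 pays nothing unless it is chosen at every
# step, in which case the final step pays a bonus of 10.
HORIZON = 3
ACTIONS = (0, 1)


def step(state, action):
    """Deterministic generative model; a state is the tuple of actions so far."""
    history = state + (action,)
    reward = 1.0 if action == 0 else 0.0
    done = len(history) == HORIZON
    if done and all(a == 1 for a in history):
        reward += 10.0
    return history, reward, done


class Node:
    def __init__(self):
        self.visits = 0
        self.children = {}  # action -> [visit count, mean return]


def rollout(state, rng):
    """Estimate the value of a new leaf with a uniformly random default policy."""
    total, done = 0.0, len(state) == HORIZON
    while not done:
        state, reward, done = step(state, rng.choice(ACTIONS))
        total += reward
    return total


def simulate(tree, state, rng, c):
    """One UCT iteration: UCB1 selection, one-node expansion, rollout, backup."""
    if len(state) == HORIZON:
        return 0.0
    if state not in tree:
        tree[state] = Node()
        return rollout(state, rng)
    node = tree[state]
    untried = [a for a in ACTIONS if a not in node.children]
    if untried:  # try every action once before using UCB1
        action = untried[0]
        node.children[action] = [0, 0.0]
    else:
        action = max(
            ACTIONS,
            key=lambda a: node.children[a][1]
            + c * math.sqrt(math.log(node.visits) / node.children[a][0]),
        )
    next_state, reward, _ = step(state, action)
    ret = reward + simulate(tree, next_state, rng, c)
    stats = node.children[action]
    stats[0] += 1
    stats[1] += (ret - stats[1]) / stats[0]  # incremental mean of returns
    node.visits += 1
    return ret


def plan(root=(), iterations=2000, c=10.0, seed=0):
    """Run UCT from `root` and return the action with the highest mean return."""
    rng = random.Random(seed)
    tree = {}
    for _ in range(iterations):
        simulate(tree, root, rng, c)
    children = tree[root].children
    return max(children, key=lambda a: children[a][1])
```

With the default settings, `plan()` should prefer action 1, since only the all-1 sequence earns the terminal bonus (a return of 10, versus at most 3 for any sequence that starts with action 0).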
Belief State Planning for Autonomous Driving: Planning with Interaction, Uncertain Prediction and Uncertain Perception
This thesis presents a behavior planning algorithm for automated driving in urban environments with an uncertain and dynamic nature. The uncertainty in the environment arises from the fact that the intentions and future trajectories of the surrounding drivers cannot be measured directly and can only be estimated probabilistically. Even the perception of objects is uncertain due to sensor noise or possible occlusions. When driving in such environments, the autonomous car must predict the behavior of the other drivers and plan safe, comfortable and legal trajectories. Planning such trajectories requires robust decision making when several high-level options are available for the autonomous car.
Current planning algorithms for automated driving split the problem into different subproblems, ranging from discrete, high-level decision making to prediction and continuous trajectory planning. This separation of one problem into several subproblems, combined with rule-based decision making, leads to sub-optimal behavior.
This thesis presents a global, closed-loop formulation for the motion planning problem which intertwines action selection and the corresponding prediction of the other agents in one optimization problem. The global formulation allows the planning algorithm to decide between certain high-level options implicitly. Furthermore, because the algorithm is closed-loop, it optimizes the solution for various possible future behaviors of the other agents. Formulating prediction and planning as an intertwined problem allows for modeling interaction, i.e., the future reactions of the other drivers to the behavior of the autonomous car.
The problem is modeled as a partially observable Markov decision process (POMDP) with a discrete action space and continuous state and observation spaces. The solution to the POMDP is a policy over belief states, which contains different reactive plans for possible future scenarios. Surrounding drivers are modeled with interactive, probabilistic agent models to account for their prediction uncertainty. The field of view of the autonomous car is simulated over the whole planning horizon during the optimization of the policy. Simulating the corresponding possible future observations allows the algorithm to select actions that actively reduce the uncertainty about the world state. Depending on the scenario, the behavior of the autonomous car is optimized in the longitudinal direction or in combined lateral and longitudinal directions. The algorithm is formulated generically and solved online, which allows it to be applied to various road layouts and scenarios.
While such a generic problem formulation is intractable to solve exactly, this thesis demonstrates how a sufficiently good approximation to the optimal policy can be found online. The problem is solved by combining state-of-the-art Monte Carlo tree search algorithms with near-optimal, domain-specific roll-outs.
The algorithm is evaluated in scenarios such as crossing intersections when the intentions of other crossing vehicles are unknown, interactive lane changes into narrow gaps, and decision making at intersections with large occluded areas. It is shown that the behavior of the closed-loop planner is less conservative than that of comparable open-loop planners. It is even demonstrated that the policy enables the autonomous car to drive similarly to an omniscient planner with full knowledge of the scene. It is also demonstrated how the autonomous car executes actions to actively gather more information about its surroundings and to reduce the uncertainty of its belief state.
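The belief states mentioned above are updated after each action and observation by Bayes' rule. The sketch below assumes a small discrete state space (a hypothetical "yield" versus "go" intention of another driver) with made-up observation probabilities; the thesis itself works with continuous state and observation spaces and Monte Carlo tree search, so this only illustrates the update step, not the thesis's implementation. All names here are hypothetical.

```python
def belief_update(belief, action, observation, transition, observe):
    """Discrete Bayes filter: b'(s') is proportional to
    O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        predicted = sum(transition(s, action, s_next) * belief[s] for s in states)
        unnormalized[s_next] = observe(s_next, action, observation) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical model: the other driver's intention does not change between steps.
def transition(s, action, s_next):
    return 1.0 if s == s_next else 0.0


# Hypothetical probabilities of observing each speed profile given the intention.
OBS_MODEL = {
    "yield": {"decelerating": 0.8, "constant": 0.2},
    "go": {"decelerating": 0.3, "constant": 0.7},
}


def observe(s_next, action, observation):
    return OBS_MODEL[s_next][observation]


prior = {"yield": 0.5, "go": 0.5}
posterior = belief_update(prior, "approach", "decelerating", transition, observe)
```

Under these assumed numbers, observing deceleration raises the belief in "yield" from 0.5 to about 0.73 (0.4 / 0.55).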
Belief State Planning for Autonomous Driving: Planning with Interaction, Uncertain Prediction and Uncertain Perception
This work presents a behavior planning algorithm for automated driving in urban environments with an uncertain and dynamic nature. The algorithm explicitly considers prediction uncertainty (e.g. different intentions), perception uncertainty (e.g. occlusions) and the uncertain interactive behavior of the other agents. Simulating the most likely future scenarios makes it possible to find an optimal policy online that enables non-conservative planning under uncertainty.
Scalable Decision-Theoretic Planning in Open and Typed Multiagent Systems
In open agent systems, the set of agents that are cooperating or competing changes over time and in ways that are nontrivial to predict. For example, if collaborative robots were tasked with fighting wildfires, they might run out of suppressants and be temporarily unavailable to assist their peers. We consider the problem of planning in these contexts with the additional challenges that the agents are unable to communicate with each other and that there are many of them. Because an agent's optimal action depends on the actions of others, each agent must not only predict the actions of its peers but, before that, reason about whether they are even present to perform an action. Addressing openness thus requires agents to model each other's presence, which becomes computationally intractable with large numbers of agents. We present a novel, principled, and scalable method in this context that enables an agent to reason about others' presence in its shared environment and about their actions. Our method extrapolates models of a few peers to the overall behavior of the many-agent system and combines this extrapolation with a generalization of Monte Carlo tree search to perform individual agent reasoning in many-agent open environments. Theoretical analyses establish the number of agents to model in order to achieve acceptable worst-case bounds on extrapolation error, as well as regret bounds on the agent's utility from modeling only some neighbors. Simulations of multiagent wildfire suppression problems demonstrate our approach's efficacy compared with alternative baselines.
Comment: Pre-print with appendices for AAAI 202
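As a generic illustration of extrapolating from a few modeled peers (not the authors' method or their bounds), one can estimate the fraction of agents present from k sampled neighbors and scale it to the whole population. For independent samples, Hoeffding's inequality gives a sample size k that keeps the estimated fraction within epsilon of the true fraction with probability at least 1 - delta. The function names and the Hoeffding-based bound are assumptions chosen for illustration.

```python
import math


def hoeffding_sample_size(epsilon, delta):
    """Smallest k such that 2 * exp(-2 * k * epsilon**2) <= delta (Hoeffding's inequality)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def extrapolate_presence(modeled_presence, population):
    """Scale the fraction of present agents among the modeled peers to the whole population.

    modeled_presence is a list of 0/1 flags (1 = peer believed present).
    """
    if not modeled_presence:
        raise ValueError("need at least one modeled peer")
    fraction = sum(modeled_presence) / len(modeled_presence)
    return fraction * population


# Peers to model for an error of at most 0.1 with probability at least 0.95.
k = hoeffding_sample_size(epsilon=0.1, delta=0.05)
# Hypothetical observation: 3 of 4 modeled peers present, 100 agents in total.
estimate = extrapolate_presence([1, 0, 1, 1], population=100)
```

The bound depends only on epsilon and delta, not on the population size, which is why modeling a fixed number of peers can scale to many agents in this simplified setting.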
Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs
Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades off actions that contribute to the agent's knowledge against actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive.
In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space, and we show that it performs well in a variety of problems. We use policy queries, in which we ask an expert for the correct action, to infer the consequences of a potential pitfall without experiencing its effects. More importantly for human–robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.
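The Bayes-risk action selection described above can be sketched as follows, under stated assumptions: a handful of sampled models with posterior weights, each supplying action values at the current belief, and a hypothetical risk threshold `xi` beyond which the agent issues a policy query instead of acting. This follows the general idea in the abstract and is not the paper's exact procedure; the function names and example numbers are hypothetical.

```python
def bayes_risk(weights, q_values, action):
    """Expected loss of `action` relative to each sampled model's best action.

    weights[i] is the posterior weight of sampled model i, and q_values[i] maps
    each action to that model's value Q_i(b, a) at the current belief b.
    The result is <= 0; values closer to 0 mean less risk.
    """
    return sum(w * (q[action] - max(q.values())) for w, q in zip(weights, q_values))


def choose_action(weights, q_values, xi):
    """Take the least risky action, or issue a policy query if even it is too risky."""
    risks = {a: bayes_risk(weights, q_values, a) for a in q_values[0]}
    best = max(risks, key=risks.get)
    if risks[best] < -xi:
        return "query", risks[best]  # ask the expert for the correct action
    return best, risks[best]


# Hypothetical example: two sampled models that disagree about the best action.
weights = [0.6, 0.4]
q_values = [{"left": 1.0, "right": 0.0}, {"left": 0.0, "right": 1.0}]
decision, risk = choose_action(weights, q_values, xi=0.1)
```

Here the least risky action is "left" with a risk of -0.4, so with `xi=0.1` the agent queries the expert, while with `xi=0.5` it would act on "left" directly.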