Improving Automated Driving through Planning with Human Internal States
This work examines the hypothesis that partially observable Markov decision
process (POMDP) planning with human driver internal states can significantly
improve both safety and efficiency in autonomous freeway driving. We evaluate
this hypothesis in a simulated scenario where an autonomous car must safely
perform three lane changes in rapid succession. Approximate POMDP solutions are
obtained through the partially observable Monte Carlo planning with observation
widening (POMCPOW) algorithm. This approach outperforms over-confident and
conservative MDP baselines and matches or outperforms QMDP. Relative to the MDP
baselines, POMCPOW typically cuts the rate of unsafe situations in half or
increases the success rate by 50%.
Comment: Preprint before submission to IEEE Transactions on Intelligent
Transportation Systems. arXiv admin note: text overlap with arXiv:1702.0085
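For readers unfamiliar with the QMDP baseline named in the abstract above, the following is a minimal sketch of the standard QMDP action rule: each action is scored by its fully observable MDP Q-value averaged over the current belief. It is not code from the paper; the hidden states, action names, and Q-values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch of the standard QMDP rule; not code from the paper.
# Hidden states, actions, and Q-values below are hypothetical.

def qmdp_scores(belief, q_values):
    """Return the belief-weighted Q-value of each action.

    belief[s] is the probability of hidden state s;
    q_values[s][a] is the fully observable MDP value of action a in state s.
    """
    n_actions = len(q_values[0])
    return [
        sum(b * q_values[s][a] for s, b in enumerate(belief))
        for a in range(n_actions)
    ]


def qmdp_action(belief, q_values):
    """Pick the index of the action with the highest belief-weighted Q-value."""
    scores = qmdp_scores(belief, q_values)
    return max(range(len(scores)), key=scores.__getitem__)


ACTIONS = ["change_lane", "keep_lane"]
Q = [
    [-10.0, 0.0],  # state 0: neighbouring driver is aggressive (assumed)
    [5.0, 0.0],    # state 1: neighbouring driver is cautious (assumed)
]

uncertain = [0.5, 0.5]
confident = [0.1, 0.9]
print(ACTIONS[qmdp_action(uncertain, Q)])
print(ACTIONS[qmdp_action(confident, Q)])
```

With these made-up numbers the agent keeps its lane under the uncertain belief and changes lane once it is fairly sure the other driver is cautious. Because QMDP scores actions as if all uncertainty vanished after one step, it places no value on actions that only gather information, which is one way a full POMDP planner can differ from it.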
E-MCTS: Deep Exploration in Model-Based Reinforcement Learning by Planning with Epistemic Uncertainty
One of the most well-studied and highly performing planning approaches used
in Model-Based Reinforcement Learning (MBRL) is Monte-Carlo Tree Search (MCTS).
Key challenges of MCTS-based MBRL methods remain dedicated deep exploration and
reliability in the face of the unknown, and both challenges can be alleviated
through principled epistemic uncertainty estimation in the predictions of MCTS.
We present two main contributions: First, we develop methodology to propagate
epistemic uncertainty in MCTS, enabling agents to estimate the epistemic
uncertainty in their predictions. Second, we utilize the propagated uncertainty
for a novel deep exploration algorithm by explicitly planning to explore. We
incorporate our approach into variations of MCTS-based MBRL approaches with
learned and provided dynamics models, and empirically show deep exploration
through successful epistemic uncertainty estimation achieved by our approach.
We compare to a non-planning-based deep-exploration baseline, and demonstrate
that planning with epistemic MCTS significantly outperforms non-planning based
exploration in the investigated deep exploration benchmark.
Comment: Submitted to NeurIPS 2023, accepted to EWRL 202
The Hanabi Challenge: A New Frontier for AI Research
From the early days of computing, games have been important testbeds for
studying how well machines can do sophisticated decision making. In recent
years, machine learning has made dramatic advances with artificial agents
reaching superhuman performance in challenge domains like Go, Atari, and some
variants of poker. As with their predecessors of chess, checkers, and
backgammon, these game domains have driven research by providing sophisticated
yet well-defined challenges for artificial intelligence practitioners. We
continue this tradition by proposing the game of Hanabi as a new challenge
domain with novel problems that arise from its combination of purely
cooperative gameplay with two to five players and imperfect information. In
particular, we argue that Hanabi elevates reasoning about the beliefs and
intentions of other agents to the foreground. We believe developing novel
techniques for such theory of mind reasoning will not only be crucial for
success in Hanabi, but also in broader collaborative efforts, especially those
with human partners. To facilitate future research, we introduce the
open-source Hanabi Learning Environment, propose an experimental framework for
the research community to evaluate algorithmic advances, and assess the
performance of current state-of-the-art techniques.
Comment: 32 pages, 5 figures, In Press (Artificial Intelligence)
Planning to Learn: A Novel Algorithm for Active Learning during Model-Based Planning
Active Inference is a recent framework for modeling planning under
uncertainty. Empirical and theoretical work has now begun to evaluate the
strengths and weaknesses of this approach and how it might be improved. A
recent extension - the sophisticated inference (SI) algorithm - improves
performance on multi-step planning problems through recursive decision tree
search. However, little work to date has been done to compare SI to other
established planning algorithms. SI was also developed with a focus on
inference as opposed to learning. The present paper has two aims. First, we
compare performance of SI to Bayesian reinforcement learning (RL) schemes
designed to solve similar problems. Second, we present an extension of SI -
sophisticated learning (SL) - that more fully incorporates active learning
during planning. SL maintains beliefs about how model parameters would change
under the future observations expected under each policy. This allows a form of
counterfactual retrospective inference in which the agent considers what could
be learned from current or past observations given different future
observations. To accomplish these aims, we make use of a novel, biologically
inspired environment designed to highlight the problem structure for which SL
offers a unique solution. Here, an agent must continually search for available
(but changing) resources in the presence of competing affordances for
information gain. Our simulations show that SL outperforms all other algorithms
in this context - most notably, Bayes-adaptive RL and upper confidence bound
algorithms, which aim to solve multi-step planning problems using similar
principles (i.e., directed exploration and counterfactual reasoning). These
results provide added support for the utility of Active Inference in solving
this class of biologically-relevant problems and offer added tools for testing
hypotheses about human cognition.
Comment: 31 pages, 5 figures
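The abstract above names upper confidence bound algorithms as a comparison point without giving a formula. As a rough illustration of directed exploration, here is a minimal sketch of the classic UCB1 score (empirical mean plus an exploration bonus that shrinks as an option is tried more often). The exact variant used in the paper is not stated in this listing, so the function name, the constant `c`, and the example numbers are assumptions.

```python
# Illustrative sketch of the classic UCB1 score; the variant used in the
# paper is not specified here, and all numbers below are hypothetical.
import math


def ucb1_scores(counts, means, c=math.sqrt(2)):
    """Return mean + c * sqrt(ln(total pulls) / pulls) for every option.

    Options that were never tried get an infinite score so they are tried first.
    """
    total = sum(counts)
    return [
        float("inf") if n == 0 else m + c * math.sqrt(math.log(total) / n)
        for n, m in zip(counts, means)
    ]


counts = [100, 5]   # how often each option has been tried
means = [0.6, 0.5]  # average reward observed for each option
scores = ucb1_scores(counts, means)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)
```

In this example the rarely tried option receives the higher score despite its lower observed mean, which is the sense in which the bonus directs exploration toward uncertain choices.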
Decision-Making in Autonomous Driving using Reinforcement Learning
The main topic of this thesis is tactical decision-making for autonomous driving. An autonomous vehicle must be able to handle a diverse set of environments and traffic situations, which makes it hard to manually specify a suitable behavior for every possible scenario. Therefore, learning-based strategies are considered in this thesis, which introduces different approaches based on reinforcement learning (RL). A general decision-making agent, derived from the Deep Q-Network (DQN) algorithm, is proposed. With few modifications, this method can be applied to different driving environments, which is demonstrated for various simulated highway and intersection scenarios. A more sample efficient agent can be obtained by incorporating more domain knowledge, which is explored by combining planning and learning in the form of Monte Carlo tree search and RL. In different highway scenarios, the combined method outperforms using either a planning or a learning-based strategy separately, while requiring an order of magnitude fewer training samples than the DQN method. A drawback of many learning-based approaches is that they create black-box solutions, which do not indicate the confidence of the agent's decisions. Therefore, the Ensemble Quantile Networks (EQN) method is introduced, which combines distributional RL with an ensemble approach, to provide an estimate of both the aleatoric and the epistemic uncertainty of each decision. The results show that the EQN method can balance risk and time efficiency in different occluded intersection scenarios, while also identifying situations that the agent has not been trained for. Thereby, the agent can avoid making unfounded, potentially dangerous, decisions outside of the training distribution. Finally, this thesis introduces a neural network architecture that is invariant to permutations of the order in which surrounding vehicles are listed. This architecture improves the sample efficiency of the agent by the factorial of the number of surrounding vehicles.
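The factorial in the last sentence of the abstract above comes from counting orderings: n surrounding vehicles can be listed in n! ways, and a permutation-invariant model treats all of them as the same input. The sketch below illustrates the idea with one common construction, a shared per-vehicle encoder followed by a sum. It is not the architecture from the thesis, and the encoder, feature layout, and numbers are hypothetical.

```python
# Illustrative sketch only; the thesis architecture is not specified here.
# The per-vehicle encoder and the feature layout are hypothetical.
from itertools import permutations
from math import factorial


def encode_vehicle(vehicle):
    """Toy per-vehicle encoder applied identically to every vehicle."""
    gap, rel_speed = vehicle
    return (2 * gap + rel_speed, gap - rel_speed)


def pooled_features(vehicles):
    """Sum the per-vehicle encodings; a sum ignores the listing order."""
    encoded = [encode_vehicle(v) for v in vehicles]
    return tuple(sum(column) for column in zip(*encoded))


vehicles = [(30, -2), (12, 4), (55, 0)]  # (gap in m, relative speed in m/s)
orderings = list(permutations(vehicles))
outputs = {pooled_features(order) for order in orderings}

print(len(orderings) == factorial(len(vehicles)))  # 3 vehicles, 3! orderings
print(len(outputs))  # number of distinct pooled outputs across orderings
```

All six orderings of the three example vehicles collapse to a single pooled feature vector, so a learner built on such features does not need to see each ordering separately.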
Adaptation and Communication in Human-Robot Teaming to Handle Discrepancies in Agents' Beliefs about Plans
When agents collaborate on a task, it is important that they have some shared
mental model of the task routines -- the set of feasible plans towards
achieving the goals. However, in reality, situations often arise in which such a
shared mental model cannot be guaranteed, such as in ad-hoc teams where agents
may follow different conventions or when contingent constraints arise that only
some agents are aware of. Previous work on human-robot teaming has assumed that
the team has a set of shared routines, which breaks down in these situations.
In this work, we leverage epistemic logic to enable agents to understand the
discrepancy in each other's beliefs about feasible plans and dynamically plan
their actions to adapt or communicate to resolve the discrepancy. We propose a
formalism that extends conditional doxastic logic to describe knowledge bases
in order to explicitly represent agents' nested beliefs on the feasible plans
and state of execution. We provide an online execution algorithm based on Monte
Carlo Tree Search for the agent to plan its action, including communication
actions to explain the feasibility of plans, announce intent, and ask
questions. Finally, we evaluate the success rate and scalability of the
algorithm and show that our agent is better equipped to work in teams without
the guarantee of a shared mental model.
Comment: 10 pages, Published at ICAPS 2023 (Main Track)
Strategic Argumentation Dialogues for Persuasion: Framework and Experiments Based on Modelling the Beliefs and Concerns of the Persuadee
Persuasion is an important and yet complex aspect of human intelligence. When
undertaken through dialogue, the deployment of good arguments, and therefore
counterarguments, clearly has a significant effect on the ability to be
successful in persuasion. Two key dimensions for determining whether an
argument is good in a particular dialogue are the degree to which the intended
audience believes the argument and counterarguments, and the impact that the
argument has on the concerns of the intended audience. In this paper, we
present a framework for modelling persuadees in terms of their beliefs and
concerns, and for harnessing these models in optimizing the choice of move in
persuasion dialogues. Our approach is based on the Monte Carlo Tree Search
which allows optimization in real-time. We provide empirical results of a study
with human participants showing that our automated persuasion system based on
this technology is superior to a baseline system that does not take the beliefs
and concerns into account in its strategy.
Comment: The Data Appendix containing the arguments, argument graphs,
assignment of concerns to arguments, preferences over concerns, and
assignment of beliefs to arguments, is available at the link
http://www0.cs.ucl.ac.uk/staff/a.hunter/papers/unistudydata.zip The code is
available at https://github.com/ComputationalPersuasion/MCC
Strategic argumentation dialogues for persuasion: Framework and experiments based on modelling the beliefs and concerns of the persuadee
Persuasion is an important and yet complex aspect of human intelligence. When undertaken through dialogue, the deployment of good arguments, and therefore counterarguments, clearly has a significant effect on the ability to be successful in persuasion. Two key dimensions for determining whether an argument is 'good' in a particular dialogue are the degree to which the intended audience believes the argument and counterarguments, and the impact that the argument has on the concerns of the intended audience. In this paper, we present a framework for modelling persuadees in terms of their beliefs and concerns, and for harnessing these models in optimizing the choice of move in persuasion dialogues. Our approach is based on the Monte Carlo Tree Search which allows optimization in real-time. We provide empirical results of a study with human participants that compares an automated persuasion system based on this technology with a baseline system that does not take the beliefs and concerns into account in its strategy.