99 research outputs found

    Interpretable and verifiable planning and prediction for autonomous vehicles

    Autonomous driving (AD) has gained much attention in recent years due to its many potential benefits, such as improving safety and increasing efficiency. However, AD is a difficult problem with challenges such as handling interactions with other vehicles and predicting the future behaviour of human drivers. This often takes place in complicated urban environments where information is missing due to occlusions. AD methods must also be accurate and effective while still being efficient enough to run in real time. In this thesis, several novel AD methods are presented which contribute towards solving some of the problems of AD. In particular, the focus is on planning, prediction and goal recognition (GR) methods which are interpretable by humans and formally verifiable. Interpretability can increase user trust in AD systems and aid in debugging such systems. The ability to formally verify propositions made about AD methods can help ensure safety and compliance with regulations. The first novel method is Interpretable Goal-based Prediction and Planning (IGP2), which integrates GR through inverse planning with Monte Carlo tree search (MCTS) to achieve a full planning and prediction system. IGP2 is evaluated in several urban driving scenarios and is shown to successfully recognise other vehicles' goals and improve driving efficiency. The second method is Goal Recognition with Interpretable Trees (GRIT). GRIT makes use of learned decision trees trained to infer a probability distribution over the goals of other vehicles. An evaluation across two vehicle trajectory datasets shows that the inference process of GRIT is fast, accurate, interpretable and verifiable. The third method is Goal Recognition with Interpretable Trees under Occlusion (OGRIT). Similarly to GRIT, OGRIT makes use of learned decision trees for GR. Through an evaluation across two vehicle trajectory datasets with significant occlusions, OGRIT is also shown to handle information missing due to occlusions and to make inferences across multiple scenarios using the same learned models, while still remaining fast, accurate, interpretable and verifiable. This thesis contributes three novel methods which work towards allowing autonomous vehicles to accurately and efficiently infer the goals of other vehicles in complex, partially occluded urban environments, and then predict their future behaviour and plan accordingly.

    Opponent awareness at all levels of the multiagent reinforcement learning stack

    Multiagent Reinforcement Learning (MARL) has experienced numerous high-profile successes in recent years in generating superhuman game-playing agents for a wide variety of video games. Despite these successes, MARL techniques have not been adopted by game developers as a tool for developing their games. Developers often cite two main obstacles: the high computational cost of training agents, and the difficulty of understanding and evaluating MARL methods. This thesis attempts to close this gap by introducing an informative modular abstraction under which any Reinforcement Learning (RL) training pipeline can be studied. This is defined as the MARL stack, which explicitly expresses any MARL pipeline as an environment where agents equipped with learning algorithms train via simulated experience as orchestrated by a training scheme. Within the context of 2-player zero-sum games, different approaches to granting opponent awareness at all levels of the proposed MARL stack are explored in a broad study of the field. At the level of training schemes, a grouping generalization over many modern MARL training schemes is introduced under a unified framework. Empirical results demonstrate that the choice of which sequence of opponents a learning agent faces during training greatly affects learning dynamics. At the agent level, the introduction of opponent modelling in state-of-the-art algorithms is explored as a way of generating targeted best responses towards opponents encountered during training, improving upon the sample efficiency of these methods. At the environment level, the use of MARL as a game design tool is explored by using MARL-trained agents as metagame evaluators inside an automated process of game balancing.