15 research outputs found

    Towards explainable AI planning as a service

    Get PDF
    Explainable AI is an important area of research within which Explainable Planning is an emerging topic. In this paper, we argue that Explainable Planning can be designed as a service -- that is, as a wrapper around an existing planning system that utilises the existing planner to assist in answering contrastive questions. We introduce a prototype framework to facilitate this, along with some examples of how a planner can be used to address certain types of contrastive questions. We discuss the main advantages and limitations of such an approach and we identify open questions for Explainable Planning as a service that identify several possible research directions

    Policy Interpretation for Partially Observable Monte-Carlo Planning: A Rule-Based Approach

    Get PDF
    Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm that can generate online policies for large Partially Observable Markov Decision Processes. The lack of an explicit representation of the policy, however, hinders interpretability. In this work, we present a MAX-SMT based methodology to iteratively explore local properties of the policy. Our approach generates a compact and informative representation that describes the system under investigation

    Explaining the influence of prior knowledge on POMCP policies

    Get PDF
    Partially Observable Monte Carlo Planning is a recently proposed online planning algorithm which makes use of Monte Carlo Tree Search to solve Partially Observable Monte Carlo Decision Processes. This solver is very successful because of its capability to scale to large uncertain environments, a very important property for current real-world planning problems. In this work we propose three main contributions related to POMCP usage and interpretability. First, we introduce a new planning problem related to mobile robot collision avoidance in paths with uncertain segment difficulties, and we show how POMCP performance in this context can take advantage of prior knowledge about segment difficulty relationships. This problem has direct real-world applications, such as, safety management in industrial environments where human-robot interaction is a crucial issue. Then, we present an experimental analysis about the relationships between prior knowledge provided to the algorithm and performance improvement, showing that in our case study prior knowledge affects two main properties, namely, the distance between the belief and the real state, and the mutual information between segment difficulty and action taken in the segment. This analysis aims to improve POMCP explainability, following the line of recently proposed eXplainable AI and, in particular, eXplainable planning. Finally, we analyze results on a synthetic case study and show how the proposed measures can improve the understanding about internal planning mechanisms

    Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach

    Get PDF
    Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. The lack of an explicit representation however hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theory (SMT) for analyzing POMCP policies by inspecting their traces, namely sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps, i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions, ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem), iii) the analysis of the generated logical formula and the related decision boundaries for identifying unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge on the domain, outperforming state-of-the-art anomaly detection methods in identifying unexpected decisions. An improvement of the Area Under Curve up to 47\% has been achieved in our tests.Comment: AAMAS 2021, 3-7 May 2021, London-UK (Virtual

    Identification of unexpected decisions in Partially Observable Monte Carlo Planning: a rule-based approach

    Get PDF
    Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. The lack of an explicit representation however hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theory (SMT) for analyzing POMCP policies by inspecting their traces, namely sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps, i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions, ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMTproblem), iii) the analysis of the generated logical formula and the related decision boundaries for identifying unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge on the domain, outperforming state-of-the-art anomaly detection methods in identifying unexpected decisions. An improvement of the Area Under Curve up to 47% has been achieved in our tests

    FCE: Feedback Based Counterfactual Explanations for Explainable AI

    Get PDF
    none4sìSuffian, Muhammad; Graziani, Pierluigi; Alonso, Jose M.; Bogliolo, AlessandroSuffian, Muhammad; Graziani, Pierluigi; Alonso, Jose M.; Bogliolo, Alessandr

    Local and Global Explanations of Agent Behavior: Integrating Strategy Summaries with Saliency Maps

    Get PDF
    With advances in reinforcement learning (RL), agents are now being developed in high-stakes application domains such as healthcare and transportation. Explaining the behavior of these agents is challenging, as the environments in which they act have large state spaces, and their decision-making can be affected by delayed rewards, making it difficult to analyze their behavior. To address this problem, several approaches have been developed. Some approaches attempt to convey the global\textit{global} behavior of the agent, describing the actions it takes in different states. Other approaches devised local\textit{local} explanations which provide information regarding the agent's decision-making in a particular state. In this paper, we combine global and local explanation methods, and evaluate their joint and separate contributions, providing (to the best of our knowledge) the first user study of combined local and global explanations for RL agents. Specifically, we augment strategy summaries that extract important trajectories of states from simulations of the agent with saliency maps which show what information the agent attends to. Our results show that the choice of what states to include in the summary (global information) strongly affects people's understanding of agents: participants shown summaries that included important states significantly outperformed participants who were presented with agent behavior in a randomly set of chosen world-states. We find mixed results with respect to augmenting demonstrations with saliency maps (local information), as the addition of saliency maps did not significantly improve performance in most cases. However, we do find some evidence that saliency maps can help users better understand what information the agent relies on in its decision making, suggesting avenues for future work that can further improve explanations of RL agents

    Event-driven Temporal Models for Explanations - ETeMoX: Explaining Reinforcement Learning

    Get PDF
    Modern software systems are increasingly expected to show higher degrees of autonomy and self-management to cope with uncertain and diverse situations. As a consequence, autonomous systems can exhibit unexpected and surprising behaviours. This is exacerbated due to the ubiquity and complexity of Artificial Intelligence (AI)-based systems. This is the case of Reinforcement Learning (RL), where autonomous agents learn through trial-and-error how to find good solutions to a problem. Thus, the underlying decision-making criteria may become opaque to users that interact with the system and who may require explanations about the system’s reasoning. Available work for eXplainable Reinforcement Learning (XRL) offers different trade-offs: e.g. for runtime explanations, the approaches are model-specific or can only analyse results after-the-fact. Different from these approaches, this paper aims to provide an online model-agnostic approach for XRL towards trustworthy and understandable AI. We present ETeMoX, an architecture based on temporal models to keep track of the decision-making processes of RL systems. In cases where the resources are limited (e.g. storage capacity or time to response), the architecture also integrates complex event processing, an event-driven approach, for detecting matches to event patterns that need to be stored, instead of keeping the entire history. The approach is applied to a mobile communications case study that uses RL for its decision-making. In order to test the generalisability of our approach, three variants of the underlying RL algorithms are used: Q-Learning, SARSA and DQN. The encouraging results show that using the proposed configurable architecture, RL developers are able to obtain explanations about the evolution of a metric, relationships between metrics, and were able to track situations of interest happening over time windows

    Risk-aware shielding of Partially Observable Monte Carlo Planning policies

    Get PDF
    Partially Observable Monte Carlo Planning (POMCP) is a powerful online algorithm that can generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. However, the lack of an explicit policy representation hinders interpretability and a proper evaluation of the risks an agent may incur. In this work, we propose a methodology based on Maximum Satisfiability Modulo Theory (MAX-SMT) for analyzing POMCP policies by inspecting their traces, namely, sequences of belief- action pairs generated by the algorithm. The proposed method explores local properties of the policy to build a compact and informative summary of the policy behaviour. Moreover, we introduce a rich and formal language that a domain expert can use to describe the expected behaviour of a policy. In more detail, we present a formulation that directly computes the risk involved in taking actions by considering the high- level elements specified by the expert. The final formula can identify risky decisions taken by POMCP that violate the expert indications. We show that this identification process can be used offline (to improve the policy’s explainability and identify anomalous behaviours) or online (to shield the risky decisions of the POMCP algorithm). We present an extended evaluation of our approach on four domains: the well-known tiger and rocksample benchmarks, a problem of velocity regulation in mobile robots, and a problem of battery management in mobile robots. We test the methodology against a state-of- the-art anomaly detection algorithm to show that our approach can be used to identify anomalous behaviours in faulty POMCP. We also show, comparing the performance of shielded and unshielded POMCP, that the shielding mechanism can improve the system’s performance. We provide an open-source implementation of the proposed methodologies at https://github.com/GiuMaz/XPOMCP
    corecore