3,854 research outputs found

    Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations

    Post-hoc explanations of machine learning models are crucial for people to understand and act on algorithmic predictions. An intriguing class of explanations is through counterfactuals, hypothetical examples that show people how to obtain a different prediction. We posit that effective counterfactual explanations should satisfy two properties: feasibility of the counterfactual actions given user context and constraints, and diversity among the counterfactuals presented. To this end, we propose a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes. To evaluate the actionability of counterfactuals, we provide metrics that enable comparison of counterfactual-based methods to other local explanation methods. We further address necessary tradeoffs and point to causal implications in optimizing for counterfactuals. Our experiments on four real-world datasets show that our framework can generate a set of counterfactuals that are diverse and closely approximate local decision boundaries, outperforming prior approaches to generating diverse counterfactuals. We provide an implementation of the framework at https://github.com/microsoft/DiCE.
    Comment: 13 pages
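    As a concrete illustration of the determinantal-point-process idea in this abstract, the sketch below scores a candidate set of counterfactuals by the determinant of a distance-based kernel, so that near-duplicate counterfactuals push the score toward zero. The 1/(1 + distance) kernel entries follow the diversity term described for DiCE, but the L1 distance and the toy data are simplifying assumptions rather than the library's exact implementation.

```python
import numpy as np

def dpp_diversity(counterfactuals: np.ndarray) -> float:
    """DPP-style diversity score for a set of counterfactuals (rows = candidates).

    Builds a kernel K with K[i, j] = 1 / (1 + dist(c_i, c_j)) and returns det(K):
    near-identical candidates give a determinant close to 0, well-spread ones a
    larger value. The L1 distance here is a simplifying assumption.
    """
    n = counterfactuals.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dist = np.linalg.norm(counterfactuals[i] - counterfactuals[j], ord=1)
            K[i, j] = 1.0 / (1.0 + dist)
    return float(np.linalg.det(K))

# Two nearly identical counterfactuals vs. two clearly distinct ones.
near_duplicates = np.array([[0.20, 1.0], [0.21, 1.0]])
spread_out      = np.array([[0.20, 1.0], [0.90, 0.1]])
print(dpp_diversity(near_duplicates))  # ~0.02
print(dpp_diversity(spread_out))       # ~0.85
```

    In the full framework this diversity term is traded off against proximity to the original input and validity of the desired prediction; the score above captures only the diversity component.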

    Specifying and Interpreting Reinforcement Learning Policies through Simulatable Machine Learning

    Human-AI collaborative policy synthesis is a procedure in which (1) a human initializes an autonomous agent's behavior, (2) reinforcement learning improves the human-specified behavior, and (3) the agent explains the final optimized policy to the user. This paradigm leverages human expertise and offers greater insight into the learned behaviors of an agent. Existing approaches to collaborative policy specification rely on black-box methods that are unintelligible and not geared towards non-expert end-users. In this paper, we develop a novel collaborative framework, rooted in principles of human-centered design, that enables humans to initialize and interpret an autonomous agent's behavior. Through our framework, humans specify an initial behavior model as unstructured natural language, which we convert into lexical decision trees. Next, we leverage these human-specified policies to warm-start reinforcement learning, which then further optimizes them. Finally, to close the loop on human specification, we produce explanations of the final learned policy in multiple modalities, giving the user a clear depiction of the agent's learned behavior. We validate our approach by showing that our model achieves >80% accuracy and that human-initialized policies successfully warm-start RL. We then conduct a novel human-subjects study quantifying the relative subjective and objective benefits of varying XAI modalities (e.g., Tree, Language, and Program) for explaining learned policies to end-users, in terms of usability and interpretability, and identify the circumstances that influence these measures. Our findings emphasize the need for personalized explainable systems that can facilitate user-centric policy explanations for a variety of end-users.
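    The pipeline described here (natural language → lexical decision tree → warm-started reinforcement learning → explanation) can be pictured with a small sketch. The tree structure, feature names, and the behavior-cloning-style warm start below are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical lexical decision tree, e.g. parsed from: "if the battery is low,
# charge; otherwise, if a task is pending, do the task; otherwise wait."
TREE = {
    "feature": "battery_low",
    "true":  {"action": "charge"},
    "false": {
        "feature": "task_pending",
        "true":  {"action": "do_task"},
        "false": {"action": "wait"},
    },
}

def tree_policy(state: dict, node: dict = TREE) -> str:
    """Walk the human-specified decision tree and return an action label."""
    if "action" in node:
        return node["action"]
    branch = "true" if state[node["feature"]] else "false"
    return tree_policy(state, node[branch])

def warm_start_dataset(n: int = 1000):
    """Label sampled states with the tree policy; an RL agent's policy can be
    pre-trained on these (state, action) pairs before on-policy optimization
    refines the behavior further."""
    data = []
    for _ in range(n):
        state = {"battery_low": random.random() < 0.3,
                 "task_pending": random.random() < 0.5}
        data.append((state, tree_policy(state)))
    return data

print(warm_start_dataset(3))
```

    The optimized policy could later be explained back to the user by distilling it into the same tree, language, and program modalities compared in the study.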

    Neuro-symbolic Computation for XAI: Towards a Unified Model

    The idea of integrating symbolic and sub-symbolic approaches to make intelligent systems (IS) understandable and explainable is at the core of new fields such as neuro-symbolic computing (NSC). This work falls under the umbrella of NSC and has a twofold objective. First, we present a set of guidelines for building explainable IS, which leverage logic induction and constraints to integrate symbolic and sub-symbolic approaches. Then, we reify the proposed guidelines in a case study to show their effectiveness and potential, presenting a prototype built on top of existing NSC technologies.
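    As a generic illustration of the "constraints" side of such an integration (not the prototype described here), the sketch below relaxes a symbolic rule into a penalty that can be added to a neural network's training loss, nudging sub-symbolic predictions towards consistency with symbolic knowledge.

```python
import numpy as np

def implication_penalty(p_bird: np.ndarray, p_animal: np.ndarray) -> float:
    """Soft penalty for violating the symbolic rule bird(x) -> animal(x).

    Given class probabilities from a (hypothetical) neural classifier, the rule
    is relaxed to max(0, p_bird - p_animal): zero when the implication holds,
    positive when the network believes 'bird' more strongly than 'animal'.
    """
    return float(np.mean(np.maximum(0.0, p_bird - p_animal)))

# Hypothetical batch of network outputs; the third example violates the rule.
p_bird   = np.array([0.90, 0.20, 0.70])
p_animal = np.array([0.95, 0.80, 0.30])

data_loss = 0.42                 # stand-in for the usual supervised loss
lam = 1.0                        # constraint weight (assumed)
total_loss = data_loss + lam * implication_penalty(p_bird, p_animal)
print(total_loss)
```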

    Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models

    Neural conversational models require substantial amounts of dialogue data for their parameter estimation and are therefore usually learned on large corpora such as chat forums or movie subtitles. These corpora are, however, often challenging to work with, notably due to their frequent lack of turn segmentation and the presence of multiple references external to the dialogue itself. This paper shows that these challenges can be mitigated by adding a weighting model into the architecture. The weighting model, which is itself estimated from dialogue data, associates each training example with a numerical weight that reflects its intrinsic quality for dialogue modelling. At training time, these sample weights are included in the empirical loss to be minimised. Evaluation results on retrieval-based models trained on movie and TV subtitles demonstrate that the inclusion of such a weighting model improves the model performance on unsupervised metrics.
    Comment: Accepted to SIGDIAL 2017
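    The core mechanism described here, scaling each training example's contribution to the empirical loss by a quality weight, can be sketched in a few lines. The weights below are hand-picked placeholders; in the paper they come from a separate weighting model estimated from dialogue data.

```python
import numpy as np

def weighted_nll(probs: np.ndarray, targets: np.ndarray, weights: np.ndarray) -> float:
    """Weighted empirical loss: each example's negative log-likelihood is scaled
    by its quality weight before averaging, so low-quality dialogue examples
    contribute less to parameter estimation."""
    per_example = -np.log(probs[np.arange(len(targets)), targets] + 1e-12)
    return float(np.sum(weights * per_example) / np.sum(weights))

# Toy batch: predicted response distributions, gold responses, quality weights.
probs   = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.1, 0.8],
                    [0.3, 0.4, 0.3]])
targets = np.array([0, 2, 1])
weights = np.array([1.0, 0.9, 0.2])   # the last example is deemed low quality
print(weighted_nll(probs, targets, weights))
```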