Transparency: from tractability to model explanations

Abstract

As artificial intelligence (AI) and machine learning (ML) models are increasingly incorporated into critical applications, ranging from medical diagnosis to loan approval, they show tremendous potential to benefit society; however, realising this potential is predicated on establishing a transparent relationship between humans and automation. Transparency requirements span multiple dimensions, incorporating both technical and societal aspects, in order to promote the responsible use of AI/ML. In this thesis we present contributions along both of these axes. We start with the technical side and model transparency, studying ways to enhance tractable probabilistic models (TPMs) with properties that enable an in-depth understanding of their decision-making process. We then expand the scope of our work, studying how providing explanations about a model's predictions influences the extent to which humans understand and collaborate with it, and finally we design an introductory course on the emerging field of explanations in AI to foster the competent use of the developed tools and methodologies.

In more detail, the complex design of TPMs makes it very challenging to extract information that conveys meaningful insights, even though they are closely related to Bayesian networks (BNs), which readily provide such information. This has led to TPMs being viewed as black boxes, in the sense that their internal representations are elusive, in contrast to BNs. The first part of this thesis challenges this view, focusing on whether certain transparent features of BNs can be extended to TPMs. We start by considering the problem of transforming TPMs into alternative graphical models in a way that makes their internal representations easy to inspect. We also study the utility of existing algorithms in causal applications, where we identify significant limitations. To remedy this, we propose a set of algorithms whose transformations accurately uncover the internal representations of TPMs.

Following this result, we look into the problem of incorporating probabilistic constraints into TPMs. Although it is well known that BNs satisfy this property, the complex structure of TPMs impedes applying the same arguments, so advances on this problem have been very limited. In this thesis we provide formal proofs that TPMs can be made to satisfy both probabilistic and causal constraints through parameter manipulation, showing that incorporating a constraint corresponds to solving a system of multilinear equations.

We conclude the technical contributions by studying the problem of generating counterfactual instances for classifiers based on TPMs, motivated by the fact that BNs are the building blocks of most standard approaches to this task. We propose a novel algorithm that is provably guaranteed to generate valid counterfactuals. The algorithm takes advantage of the multilinear structure of TPMs, generalizing existing approaches, while also allowing a priori constraints to be imposed on the final counterfactuals.

In the second part of this thesis we go beyond model transparency, looking into the role of explanations in achieving an effective collaboration between human users and AI.
To study this, we design a behavioural experiment showing that explanations provide unique insights that cannot be obtained from more traditional uncertainty measures. The findings support the view that explanations and uncertainty estimates serve complementary functions, advocating for incorporating elements of both in order to promote a synergistic relationship between humans and AI.

Finally, building on our findings, we design a course on explanations in AI that covers both the technical details of state-of-the-art algorithms and the overarching goals, limitations, and methodological approaches in the field. This contribution aims at ensuring that users can make competent use of explanations, a need that has also been highlighted by recent large-scale social initiatives. The resulting course was offered by the University of Edinburgh at MSc level, where student evaluations, as well as their performance, showcased the course's effectiveness in achieving its primary goals.
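To make the constraint result more concrete, the following is a minimal sketch, not code from the thesis, of how a probabilistic constraint on a toy sum-product network (a common TPM) reduces to a multilinear equation in the network's parameters. Every symbol, weight, and number below is an illustrative assumption.

```python
# Minimal illustrative sketch (not from the thesis): a toy sum-product network
# over two binary variables. Its output is multilinear in the sum-node weights,
# so a constraint such as P(X1 = 1) = 0.7 becomes a multilinear equation in
# those weights. All weights and target values here are made up.
import sympy as sp

# Indicator variables for each state of X1 and X2 (set to 1 when marginalised).
x1, nx1, x2, nx2 = sp.symbols("x1 nx1 x2 nx2")
# Weights of the root sum node: the parameters we would manipulate.
w1, w2 = sp.symbols("w1 w2", positive=True)

# Root sum node mixing two product nodes over independent leaf distributions.
S = (w1 * (0.9 * x1 + 0.1 * nx1) * (0.4 * x2 + 0.6 * nx2)
     + w2 * (0.2 * x1 + 0.8 * nx1) * (0.7 * x2 + 0.3 * nx2))

# Marginalise X2 (both of its indicators set to 1) to obtain P(X1 = 1):
# the result has degree one in each weight, i.e. it is multilinear.
p_x1 = S.subs({x1: 1, nx1: 0, x2: 1, nx2: 1})
print(sp.expand(p_x1))  # 0.9*w1 + 0.2*w2

# Imposing P(X1 = 1) = 0.7 together with the normalisation w1 + w2 = 1
# yields a small system whose solution gives the adjusted parameters.
print(sp.solve([sp.Eq(p_x1, 0.7), sp.Eq(w1 + w2, 1)], [w1, w2]))
# {w1: 0.714..., w2: 0.285...}
```

In a full TPM the same idea applies to the complete network polynomial, with each imposed constraint contributing one equation over the parameters being adjusted.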
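Similarly, the sketch below illustrates counterfactual generation for a classifier built on a small probabilistic model. It uses a naive brute-force search rather than the algorithm developed in the thesis, and the model, features, and parameters are invented for the example.

```python
# Illustrative sketch only (not the thesis algorithm): brute-force counterfactual
# search for a classifier built on a toy probabilistic model. We look for the
# instance closest in Hamming distance that flips the predicted class while
# keeping user-specified features fixed. All numbers below are made up.
from itertools import product

# Toy naive-Bayes-style model over 3 binary features, class prior P(Y=1) = 0.5.
P_X_GIVEN_Y = {           # P(X_i = 1 | Y = y), indexed as [y][i]
    0: [0.2, 0.7, 0.5],
    1: [0.8, 0.3, 0.6],
}

def predict(x):
    """Return the most probable class for the binary feature vector x."""
    scores = []
    for y in (0, 1):
        p = 0.5
        for i, xi in enumerate(x):
            p *= P_X_GIVEN_Y[y][i] if xi else 1 - P_X_GIVEN_Y[y][i]
        scores.append(p)
    return int(scores[1] > scores[0])

def counterfactual(x, target, frozen=()):
    """Closest (Hamming) instance predicted as `target`, with `frozen` features unchanged."""
    candidates = [c for c in product((0, 1), repeat=len(x))
                  if predict(list(c)) == target
                  and all(c[i] == x[i] for i in frozen)]
    return min(candidates, key=lambda c: sum(a != b for a, b in zip(c, x)), default=None)

x = [0, 1, 0]
print(predict(x))                                 # 0
print(counterfactual(x, target=1))                # (1, 1, 0): flipping feature 0 suffices
print(counterfactual(x, target=1, frozen=(0,)))   # None: no valid counterfactual keeps feature 0 fixed here
```

The brute-force search stands in for the structured search that a TPM-based method can perform, but it shows the two ingredients the abstract refers to: validity of the generated counterfactual under the model, and a priori constraints on which features may change.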
