
    Statistics without Interpretation: A Sober Look at Explainable Machine Learning

    In the rapidly growing literature on explanation algorithms, it often remains unclear what precisely these algorithms are for and how they should be used. We argue that this is because explanation algorithms are often mathematically complex but don't admit a clear interpretation. Unfortunately, complex statistical methods that don't have a clear interpretation are bound to lead to errors in interpretation, a fact that has become increasingly apparent in the literature. In order to move forward, papers on explanation algorithms should make clear how precisely the output of the algorithms should be interpreted. They should also clarify which questions about the function can and cannot be answered given the explanations. Our argument is based on the distinction between statistics and their interpretation. It also relies on parallels between explainable machine learning and applied statistics.

    Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness

    One of the remarkable properties of robust computer vision models is that their input gradients are often aligned with human perception, referred to in the literature as perceptually-aligned gradients (PAGs). Although these models are trained only for classification, PAGs give them rudimentary generative capabilities, including image generation, denoising, and in-painting. However, the underlying mechanisms behind these phenomena remain unknown. In this work, we provide a first explanation of PAGs via off-manifold robustness, which states that models must be more robust off the data manifold than they are on the manifold. We first demonstrate theoretically that off-manifold robustness leads input gradients to lie approximately on the data manifold, explaining their perceptual alignment. We then show that Bayes optimal models satisfy off-manifold robustness, and confirm the same empirically for robust models trained via gradient norm regularization, noise augmentation, and randomized smoothing. Quantifying the perceptual alignment of model gradients via their similarity with the gradients of generative models, we show that off-manifold robustness correlates well with perceptual alignment. Finally, based on the levels of on- and off-manifold robustness, we identify three different regimes of robustness that affect both perceptual alignment and model accuracy: weak robustness, Bayes-aligned robustness, and excessive robustness.
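    As a rough illustration of the quantification step described in this abstract, the sketch below measures the alignment between a classifier's input gradient and a generative model's score via cosine similarity. Here `classifier` and `score_model` are placeholder PyTorch modules, and the metric is an assumption about how such a comparison might be set up, not the paper's exact protocol.

```python
# Minimal sketch, assuming `classifier` maps images to logits and `score_model`
# returns an estimate of grad_x log p(x) (e.g. from a diffusion/score model).
import torch
import torch.nn.functional as F

def input_gradient(classifier, x, target_class):
    """Gradient of the target-class logit with respect to the input."""
    x = x.clone().requires_grad_(True)
    classifier(x)[:, target_class].sum().backward()
    return x.grad.detach()

def perceptual_alignment(classifier, score_model, x, target_class):
    """Cosine similarity between the classifier's input gradient and the
    generative model's score at x (one value per example in the batch)."""
    g = input_gradient(classifier, x, target_class).flatten(1)
    s = score_model(x).flatten(1)
    return F.cosine_similarity(g, s, dim=1)
```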

    Elephants Never Forget: Testing Language Models for Memorization of Tabular Data

    While many have shown how Large Language Models (LLMs) can be applied to a diverse set of tasks, the critical issues of data contamination and memorization are often glossed over. In this work, we address this concern for tabular data. Starting with simple qualitative tests for whether an LLM knows the names and values of features, we introduce a variety of different techniques to assess the degree of contamination, including statistical tests for conditional distribution modeling and four tests that identify memorization. Our investigation reveals that LLMs are pre-trained on many popular tabular datasets. This exposure can lead to invalid performance evaluations on downstream tasks because the LLMs have, in effect, been fit to the test set. Interestingly, we also identify a regime where the language model reproduces important statistics of the data but fails to reproduce the dataset verbatim. On these datasets, although they were seen during training, good performance on downstream tasks might not be due to overfitting. Our findings underscore the need to ensure data integrity in machine learning tasks with LLMs. To facilitate future research, we release an open-source tool that can perform various tests for memorization: https://github.com/interpretml/LLM-Tabular-Memorization-Checker. Comment: Table Representation Learning Workshop at NeurIPS 2023.
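    A minimal sketch of one kind of memorization test in the spirit of this abstract: prompt the model with the first rows of a CSV file and check whether it reproduces the next row verbatim. The `query_llm` function is an assumed placeholder for any text-completion interface; the released LLM-Tabular-Memorization-Checker tool implements the paper's actual tests.

```python
# Hedged sketch of a row-completion memorization test, not the paper's exact procedure.
import csv

def row_completion_test(csv_path, query_llm, num_prefix_rows=10, num_trials=5):
    """Fraction of held-out rows the model reproduces verbatim when asked to continue the file."""
    with open(csv_path, newline="") as f:
        rows = [",".join(r) for r in csv.reader(f)]  # assumes the file has enough rows for all trials
    hits = 0
    for t in range(num_trials):
        start = t * (num_prefix_rows + 1)
        prefix = "\n".join(rows[start:start + num_prefix_rows])
        target = rows[start + num_prefix_rows]
        completion = query_llm(prefix + "\n").strip().splitlines()
        hits += bool(completion) and completion[0].strip() == target
    return hits / num_trials
```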

    LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs

    We show that large language models (LLMs) are remarkably good at working with interpretable models that decompose complex outcomes into univariate, graph-represented components. By adopting a hierarchical approach to reasoning, LLMs can provide comprehensive model-level summaries without ever requiring the entire model to fit in context. This approach enables LLMs to apply their extensive background knowledge to automate common tasks in data science, such as detecting anomalies that contradict prior knowledge, describing potential reasons for the anomalies, and suggesting repairs that would remove them. We use multiple examples in healthcare to demonstrate the utility of these new capabilities of LLMs, with particular emphasis on Generalized Additive Models (GAMs). Finally, we present the package TalkToEBM as an open-source LLM-GAM interface.
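    A minimal sketch of the hierarchical idea described in this abstract, assuming a GAM component is available as bin edges plus per-bin scores: each univariate shape function is serialized to text and summarized by an LLM on its own, so the full model never needs to fit in context. The representation and `query_llm` are illustrative assumptions; the TalkToEBM package provides an actual interface for EBMs.

```python
# Hedged sketch of per-component summarization for an additive model.

def component_to_text(feature_name, bin_edges, scores):
    """Render one univariate shape function as a compact text table."""
    lines = [f"Feature: {feature_name} (contribution to the prediction per interval)"]
    for lo, hi, s in zip(bin_edges[:-1], bin_edges[1:], scores):
        lines.append(f"  [{lo:g}, {hi:g}): {s:+.3f}")
    return "\n".join(lines)

def summarize_component(feature_name, bin_edges, scores, query_llm):
    prompt = (
        "You are given one component of an additive risk model.\n"
        + component_to_text(feature_name, bin_edges, scores)
        + "\nDescribe the shape, flag any patterns that contradict medical prior knowledge, "
          "and suggest how the component might be repaired."
    )
    return query_llm(prompt)

# Example: a component where risk drops sharply above age 90 might be flagged as an anomaly.
# summary = summarize_component("age", [0, 50, 70, 90, 120], [-0.8, 0.1, 0.9, 0.2], query_llm)
```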

    A Bandit Model for Human-Machine Decision Making with Private Information and Opacity

    Applications of machine learning inform human decision makers in a broad range of tasks. The resulting problem is usually formulated in terms of a single decision maker. We argue that it should rather be described as a two-player learning problem where one player is the machine and the other is the human. While both players try to optimize the final decision, the setup is often characterized by (1) the presence of private information and (2) opacity, i.e., imperfect understanding between the decision makers. In this paper, we prove that both properties can complicate decision making considerably. A lower bound quantifies the worst-case hardness of optimally advising a decision maker who is opaque or has access to private information. An upper bound shows that a simple coordination strategy is nearly minimax optimal. More efficient learning is possible under certain assumptions on the problem, for example when both players learn to take actions independently. Such assumptions are implicit in existing literature, for example in medical applications of machine learning, but have not been described or justified theoretically.
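    The toy simulation below sketches the setup this abstract describes, not the paper's coordination strategy: the machine advises from what it can observe, while the human sometimes overrides based on private information the machine never sees. All distributions and the override rule are illustrative assumptions.

```python
# Hedged sketch of a human-machine bandit with private information and opacity.
import numpy as np

rng = np.random.default_rng(0)
n_rounds, n_arms = 1000, 3
theta_machine = rng.normal(size=n_arms)   # effect of the machine-observable feature
theta_private = rng.normal(size=n_arms)   # effect of the human's private feature

machine_estimates = np.zeros(n_arms)      # the machine's running reward estimates
counts = np.zeros(n_arms)

for t in range(n_rounds):
    x = rng.normal()                      # observed by the machine
    z = rng.normal()                      # private to the human
    # Machine advises epsilon-greedily on its own estimates (a stand-in policy).
    advice = rng.integers(n_arms) if rng.random() < 0.1 else int(np.argmax(machine_estimates))
    # Opaque human: follows the advice unless the private signal strongly favors another arm.
    private_best = int(np.argmax(theta_private * z))
    action = private_best if abs(z) > 1.5 else advice
    reward = theta_machine[action] * x + theta_private[action] * z + rng.normal(scale=0.1)
    counts[action] += 1
    machine_estimates[action] += (reward - machine_estimates[action]) / counts[action]
```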

    Explainable Machine Learning and its Limitations

    In the last decade, machine learning has evolved from a sub-field of computer science into one of the most impactful scientific disciplines of our time. While this has brought impressive scientific advances, there are now increasing concerns about the applications of artificial intelligence systems in societal contexts. Many concerns are rooted in the fact that machine learning models can be incredibly opaque. To overcome this problem, the nascent field of explainable machine learning attempts to provide human-understandable explanations for the behavior of complex models. After an initial period of method development and excitement, researchers in this field have now recognized the many difficulties inherent in faithfully explaining complex models. In this thesis, we review the developments within the first decade of explainable machine learning. We outline the main motivations for explainable machine learning, as well as some of the debates within the field. We also make three specific contributions that attempt to clarify what is and is not possible when explaining complex models. The first part of the thesis studies the learning dynamics of the human-machine decision making problem. We show how this learning problem differs from other forms of collaborative decision making, and derive conditions under which it can be efficiently solved. We also clarify the role of algorithmic explanations in this setup. In the second part of the thesis, we study the suitability of local post-hoc explanation algorithms in societal contexts. Focusing on the draft EU Artificial Intelligence Act, we argue that these methods are unable to fulfill the transparency objectives that are inherent in the law. Our results also suggest that regulating artificial intelligence systems implicitly via their explanations is unlikely to succeed with currently available methods. In the third part of the thesis, we provide a detailed mathematical analysis of Shapley Values, a prominent model explanation technique, and show how it is connected with Generalized Additive Models, a popular class of interpretable models. The last part of the thesis serves as an interesting case study of a connection between a post-hoc method and a class of interpretable models.
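    As a small worked example of the Shapley-GAM connection mentioned in this abstract, the sketch below computes exact interventional Shapley values for a purely additive toy model and compares them with the centered component functions f_i(x_i) - E[f_i(X_i)]; the model, background data, and feature-independence assumption are illustrative, not taken from the thesis.

```python
# Hedged sketch: Shapley values of an additive model recover its centered components.
import itertools
import math
import numpy as np

f_components = [lambda v: 2.0 * v, lambda v: v ** 2, lambda v: np.sin(v)]

def f(x):
    """Purely additive toy model."""
    return sum(fi(xi) for fi, xi in zip(f_components, x))

rng = np.random.default_rng(0)
background = rng.normal(size=(1000, 3))   # assumed background distribution (independent features)
x = np.array([1.0, -0.5, 2.0])            # instance to explain

def value(S):
    """Expected output with features in S fixed to x and the rest drawn from the background."""
    idx = list(S)
    data = background.copy()
    data[:, idx] = x[idx]
    return np.mean([f(row) for row in data])

def shapley(i, n=3):
    players = [j for j in range(n) if j != i]
    total = 0.0
    for k in range(n):
        for S in itertools.combinations(players, k):
            w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            total += w * (value(set(S) | {i}) - value(set(S)))
    return total

for i in range(3):
    centered = f_components[i](x[i]) - np.mean([f_components[i](b[i]) for b in background])
    print(f"feature {i}: shapley={shapley(i):.3f}, centered component={centered:.3f}")
```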

    Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts

    Existing and planned legislation stipulates various obligations to provide information about machine learning algorithms and their functioning, often interpreted as obligations to "explain". Many researchers suggest using post-hoc explanation algorithms for this purpose. In this paper, we combine legal, philosophical and technical arguments to show that post-hoc explanation algorithms are unsuitable to achieve the law's objectives. Indeed, most situations where explanations are requested are adversarial, meaning that the explanation provider and receiver have opposing interests and incentives, so that the provider might manipulate the explanation for her own ends. We show that this fundamental conflict cannot be resolved because of the high degree of ambiguity of post-hoc explanations in realistic application scenarios. As a consequence, post-hoc explanation algorithms are unsuitable to achieve the transparency objectives inherent to the legal norms. Instead, there is a need to more explicitly discuss the objectives underlying "explainability" obligations, as these can often be better achieved through other mechanisms. There is an urgent need for a more open and honest discussion regarding the potential and limitations of post-hoc explanations in adversarial contexts, in particular in light of the current negotiations of the European Union's draft Artificial Intelligence Act. Comment: FAccT 2022.
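    A toy illustration of the ambiguity argument in this abstract: a simple baseline-substitution attribution for the same model and input changes sign depending on the (equally defensible) choice of baseline, which is the kind of degree of freedom an adversarial explanation provider could exploit. The model and baselines below are made-up examples, not from the paper.

```python
# Hedged sketch: the same prediction, two reasonable baselines, two conflicting explanations.
import numpy as np

def model(x):
    """Toy scoring model with an interaction term."""
    return 2.0 * x[0] - 1.5 * x[1] + 0.5 * x[0] * x[1]

def attribution(x, baseline):
    """Per-feature effect of replacing the baseline value with the actual value."""
    attr = []
    for i in range(len(x)):
        x_masked = baseline.copy()
        x_masked[i] = x[i]
        attr.append(model(x_masked) - model(baseline))
    return np.array(attr)

x = np.array([1.0, 1.0])
print(attribution(x, baseline=np.zeros(2)))            # "all features at zero"  -> [ 2.0, -1.5]
print(attribution(x, baseline=np.array([2.0, 2.0])))   # "average applicant"     -> [-3.0,  0.5]
```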