Statistics without Interpretation: A Sober Look at Explainable Machine Learning
In the rapidly growing literature on explanation algorithms, it often remains
unclear what precisely these algorithms are for and how they should be used. We
argue that this is because explanation algorithms are often mathematically
complex but don't admit a clear interpretation. Unfortunately, complex
statistical methods that don't have a clear interpretation are bound to lead to
errors in interpretation, a fact that has become increasingly apparent in the
literature. In order to move forward, papers on explanation algorithms should
make clear how precisely the output of the algorithms should be interpreted.
They should also clarify what questions about the function can and cannot be
answered given the explanations. Our argument is based on the distinction
between statistics and their interpretation. It also relies on parallels
between explainable machine learning and applied statistics.
Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness
One of the remarkable properties of robust computer vision models is that
their input-gradients are often aligned with human perception, referred to in
the literature as perceptually-aligned gradients (PAGs). Despite only being
trained for classification, PAGs cause robust models to have rudimentary
generative capabilities, including image generation, denoising, and
in-painting. However, the underlying mechanisms behind these phenomena remain
unknown. In this work, we provide a first explanation of PAGs via
\emph{off-manifold robustness}, which states that models must be more robust
off the data manifold than they are on-manifold. We first demonstrate
theoretically that off-manifold robustness leads input gradients to lie
approximately on the data manifold, explaining their perceptual alignment. We
then show that Bayes optimal models satisfy off-manifold robustness, and
confirm the same empirically for robust models trained via gradient norm
regularization, noise augmentation, and randomized smoothing. Quantifying the
perceptual alignment of model gradients via their similarity with the gradients
of generative models, we show that off-manifold robustness correlates well with
perceptual alignment. Finally, based on the levels of on- and off-manifold
robustness, we identify three different regimes of robustness that affect both
perceptual alignment and model accuracy: weak robustness, Bayes-aligned
robustness, and excessive robustness.
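The central claim above, that off-manifold robustness forces input gradients to lie on the data manifold, can be illustrated with a toy sketch. Everything here is invented for illustration (a 1-D manifold in the plane and two hypothetical scalar models), not the paper's experimental setup:

```python
import math

# Toy setup: the data manifold is the 1-D line spanned by u in R^2;
# v is the off-manifold (normal) direction.
u = (1 / math.sqrt(2), 1 / math.sqrt(2))
v = (1 / math.sqrt(2), -1 / math.sqrt(2))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical "off-manifold robust" model: it depends only on the
# on-manifold coordinate, so off-manifold perturbations cannot move it.
def f_robust(x):
    return math.tanh(dot(x, u))

# Hypothetical non-robust model: equally sensitive in both directions.
def f_plain(x):
    return math.tanh(dot(x, u) + dot(x, v))

def grad(f, x, eps=1e-5):
    """Central finite-difference input gradient."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

def on_manifold_fraction(g):
    """Fraction of squared gradient norm lying along the manifold."""
    return dot(g, u) ** 2 / dot(g, g)

x = (0.3, 0.3)  # a point on the manifold
print(on_manifold_fraction(grad(f_robust, x)))  # ~1.0: gradient on-manifold
print(on_manifold_fraction(grad(f_plain, x)))   # ~0.5: half the energy off-manifold
```

The fully off-manifold-robust model has a perfectly on-manifold gradient, while the direction-agnostic model does not, mirroring the correlation between off-manifold robustness and perceptual alignment reported above.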
Elephants Never Forget: Testing Language Models for Memorization of Tabular Data
While many have shown how Large Language Models (LLMs) can be applied to a
diverse set of tasks, the critical issues of data contamination and
memorization are often glossed over. In this work, we address this concern for
tabular data. Starting with simple qualitative tests for whether an LLM knows
the names and values of features, we introduce a variety of techniques to
assess the degree of contamination, including statistical tests
for conditional distribution modeling and four tests that identify
memorization. Our investigation reveals that LLMs are pre-trained on many
popular tabular datasets. This exposure can lead to invalid performance
evaluation on downstream tasks because the LLMs have, in effect, been fit to
the test set. Interestingly, we also identify a regime where the language model
reproduces important statistics of the data, but fails to reproduce the dataset
verbatim. Although these datasets were seen during training, good performance
on downstream tasks might not be due to overfitting. Our findings underscore
the
need for ensuring data integrity in machine learning tasks with LLMs. To
facilitate future research, we release an open-source tool that can perform
various tests for memorization
\url{https://github.com/interpretml/LLM-Tabular-Memorization-Checker}. Comment: Table Representation Learning Workshop at NeurIPS 202
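One of the memorization tests described above can be sketched as a row-completion check: prompt a model with the first k fields of a dataset row and test whether it reproduces the remaining fields verbatim. This is a minimal, hypothetical sketch of the idea, not the released tool's API; `complete_row` stands in for an actual LLM call, simulated here by a lookup table:

```python
# Toy row-completion memorization test: a model that has memorized a
# table can complete rows verbatim from a short prefix of fields.

def row_completion_rate(rows, complete_row, k=2):
    """Fraction of rows whose tail the model reproduces exactly."""
    hits = sum(1 for row in rows if complete_row(row[:k]) == row[k:])
    return hits / len(rows)

# Simulate a fully memorizing model with a lookup over "training" rows.
train_rows = [["5.1", "3.5", "1.4", "0.2", "setosa"],
              ["7.0", "3.2", "4.7", "1.4", "versicolor"]]
lookup = {tuple(r[:2]): r[2:] for r in train_rows}
memorizer = lambda prefix: lookup.get(tuple(prefix))

print(row_completion_rate(train_rows, memorizer))       # 1.0: verbatim recall
print(row_completion_rate(train_rows, lambda p: None))  # 0.0: no recall
```

A rate near 1.0 on rows from a public dataset would indicate verbatim memorization, while the intermediate regime described above corresponds to a model that matches the data's statistics yet scores near 0.0 on this verbatim test.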
LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs
We show that large language models (LLMs) are remarkably good at working with
interpretable models that decompose complex outcomes into univariate
graph-represented components. By adopting a hierarchical approach to reasoning,
LLMs can provide comprehensive model-level summaries without ever requiring the
entire model to fit in context. This approach enables LLMs to apply their
extensive background knowledge to automate common tasks in data science such as
detecting anomalies that contradict prior knowledge, describing potential
reasons for the anomalies, and suggesting repairs that would remove the
anomalies. We use multiple examples in healthcare to demonstrate the utility of
these new capabilities of LLMs, with particular emphasis on Generalized
Additive Models (GAMs). Finally, we present the package as
an open-source LLM-GAM interface.
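The hierarchical idea above rests on the fact that a GAM decomposes the prediction into univariate graphs, so each component can be serialized to compact text and sent to an LLM one at a time, never requiring the whole model in context. A minimal sketch with invented component data (not the actual interface's format):

```python
# Hypothetical serialization of one univariate GAM component as text.
# Each feature's graph is a list of (input value, score) pairs.

def describe_component(name, xs, ys):
    """Render one univariate GAM component as a compact text table."""
    pairs = ", ".join(f"{x}: {y:+.2f}" for x, y in zip(xs, ys))
    return f"Feature '{name}' contribution (input value: score): {pairs}"

# Toy "fitted GAM": per-feature graphs, invented for illustration.
gam = {
    "age": ([20, 40, 60, 80], [-0.5, -0.1, 0.4, 0.2]),
    "bmi": ([18, 25, 30, 40], [-0.2, 0.0, 0.3, 0.6]),
}

# Hierarchical step: summarize each component independently; a model-level
# summary can then be built from the per-component summaries.
prompts = [describe_component(n, xs, ys) for n, (xs, ys) in gam.items()]
print(prompts[0])
```

Each such string fits easily in a prompt, which is what lets an LLM reason about anomalies (e.g., a risk score that drops at high age) one component at a time.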
A Bandit Model for Human-Machine Decision Making with Private Information and Opacity
Applications of machine learning inform human decision makers in a broad
range of tasks. The resulting problem is usually formulated in terms of a
single decision maker. We argue that it should rather be described as a
two-player learning problem where one player is the machine and the other the
human. While both players try to optimize the final decision, the setup is
often characterized by (1) the presence of private information and (2) opacity,
i.e., imperfect understanding between the decision makers. In this paper, we prove
that both properties can complicate decision making considerably. A lower bound
quantifies the worst-case hardness of optimally advising a decision maker who
is opaque or has access to private information. An upper bound shows that a
simple coordination strategy is nearly minimax optimal. More efficient learning
is possible under certain assumptions on the problem, for example that both
players learn to take actions independently. Such assumptions are implicit in
existing literature, for example in medical applications of machine learning,
but have not been described or justified theoretically.
Explainable Machine Learning and its Limitations
In the last decade, machine learning evolved from a sub-field of computer science into one of the most impactful scientific disciplines of our time. While this has brought impressive scientific advances, there are now increasing concerns about the applications of artificial intelligence systems in societal contexts. Many concerns are rooted in the fact that machine learning models can be incredibly opaque. To overcome this problem, the nascent field of explainable machine learning attempts to provide human-understandable explanations for the behavior of complex models. After an initial period of method development and excitement, researchers in this field have now recognized the many difficulties inherent in faithfully explaining complex models.

In this thesis, we review the developments within the first decade of explainable machine learning. We outline the main motivations for explainable machine learning, as well as some of the debates within the field. We also make three specific contributions that attempt to clarify what is and is not possible when explaining complex models.

The first part of the thesis studies the learning dynamics of the human-machine decision making problem. We show how this learning problem is different from other forms of collaborative decision making, and derive conditions under which it can be efficiently solved. We also clarify the role of algorithmic explanations in this setup.

In the second part of the thesis, we study the suitability of local post-hoc explanation algorithms in societal contexts. Focusing on the draft EU Artificial Intelligence Act, we argue that these methods are unable to fulfill the transparency objectives that are inherent in the law. Our results also suggest that regulating artificial intelligence systems implicitly via their explanations is unlikely to succeed with currently available methods.
In the third part of the thesis, we provide a detailed mathematical analysis of Shapley Values, a prominent model explanation technique, and show how it is connected with Generalized Additive Models, a popular class of interpretable models. The last part of the thesis serves as an interesting case study of a connection between a post-hoc method and a class of interpretable models.
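The Shapley-GAM connection mentioned above can be sketched concretely: for an additive model f(x) = f1(x1) + f2(x2), the interventional Shapley value of feature i reduces to its centered component fi(xi) - E[fi]. The component functions and background data below are toy assumptions, not the thesis's examples:

```python
from itertools import combinations
from math import factorial

# Toy additive model f(x) = f1(x1) + f2(x2) and a small reference dataset.
background = [(0.0, 1.0), (2.0, 3.0)]
f1 = lambda x1: 2.0 * x1
f2 = lambda x2: x2 ** 2
f = lambda x: f1(x[0]) + f2(x[1])

def value(S, x):
    """Expected output with features in S fixed to x, the rest sampled
    from the background (interventional value function)."""
    total = 0.0
    for b in background:
        z = [x[i] if i in S else b[i] for i in range(len(x))]
        total += f(z)
    return total / len(background)

def shapley(i, x):
    """Exact Shapley value of feature i by coalition enumeration."""
    n = len(x)
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for k in range(n):
        for S in combinations(others, k):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += w * (value(set(S) | {i}, x) - value(set(S), x))
    return phi

x = (1.0, 2.0)
mean_f1 = sum(f1(b[0]) for b in background) / len(background)
# For an additive model, both quantities coincide:
print(shapley(0, x), f1(x[0]) - mean_f1)
```

Because the model is additive, every marginal contribution value(S ∪ {i}, x) - value(S, x) equals fi(xi) - E[fi] regardless of the coalition S, so the weighted sum collapses to the centered GAM component.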
Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts
Existing and planned legislation stipulates various obligations to provide
information about machine learning algorithms and their functioning, often
interpreted as obligations to "explain". Many researchers suggest using
post-hoc explanation algorithms for this purpose. In this paper, we combine
legal, philosophical and technical arguments to show that post-hoc explanation
algorithms are unsuitable to achieve the law's objectives. Indeed, most
situations where explanations are requested are adversarial, meaning that the
explanation provider and receiver have opposing interests and incentives, so
that the provider might manipulate the explanation for her own ends. We show
that this fundamental conflict cannot be resolved because of the high degree of
ambiguity of post-hoc explanations in realistic application scenarios. As a
consequence, post-hoc explanation algorithms are unsuitable to achieve the
transparency objectives inherent to the legal norms. Instead, there is a need
to more explicitly discuss the objectives underlying "explainability"
obligations as these can often be better achieved through other mechanisms.
There is an urgent need for a more open and honest discussion regarding the
potential and limitations of post-hoc explanations in adversarial contexts, in
particular in light of the current negotiations of the European Union's draft
Artificial Intelligence Act. Comment: FAccT 202