202 research outputs found
Interpretability and Explainability: A Machine Learning Zoo Mini-tour
In this review, we examine the problem of designing interpretable and
explainable machine learning models. Interpretability and explainability lie at
the core of many machine learning and statistical applications in medicine,
economics, law, and natural sciences. Although interpretability and
explainability have escaped a clear universal definition, many techniques
motivated by these properties have been developed over the past 30 years, with
the focus currently shifting towards deep learning methods. In this review, we
emphasise the divide between interpretability and explainability and illustrate
these two different research directions with concrete examples of the
state-of-the-art. The review is intended for a general machine learning
audience with interest in exploring the problems of interpretation and
explanation beyond logistic regression or random forest variable importance.
This work is not an exhaustive literature survey, but rather a primer focusing
selectively on certain lines of research that the authors found interesting or
informative.
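The baseline techniques the review positions itself against are simple enough to sketch. Below is a minimal, hypothetical illustration (synthetic data, plain NumPy gradient descent, not any method from the review) of why logistic regression counts as transparent: its fitted coefficients directly report each feature's influence on the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
# Label depends strongly on feature 0 and not at all on feature 1.
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(float)

w = np.zeros(2)
b = 0.0
for _ in range(500):  # plain batch gradient descent on logistic loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# The fitted weights are the interpretation: |w[0]| dwarfs |w[1]|.
print(w)
```

The same one-number-per-feature reading is what random forest variable importance offers; the review's point is that interpretation beyond such summaries requires the techniques it surveys.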
(Un)reasonable Allure of Ante-hoc Interpretability for High-stakes Domains: Transparency Is Necessary but Insufficient for Explainability
Ante-hoc interpretability has become the holy grail of explainable machine
learning for high-stakes domains such as healthcare; however, this notion is
elusive, lacks a widely-accepted definition and depends on the deployment
context. It can refer to predictive models whose structure adheres to
domain-specific constraints, or ones that are inherently transparent. The
latter notion assumes observers who judge this quality, whereas the former
presupposes them to have technical and domain expertise, in certain cases
rendering such models unintelligible. Additionally, its distinction from the
less desirable post-hoc explainability, which refers to methods that construct
a separate explanatory model, is vague given that transparent predictors may
still require (post-)processing to yield satisfactory explanatory insights.
Ante-hoc interpretability is thus an overloaded concept that comprises a range
of implicit properties, which we unpack in this paper to better understand what
is needed for its safe deployment across high-stakes domains. To this end, we
outline model- and explainer-specific desiderata that allow us to navigate its
distinct realisations in view of the envisaged application and audience.
Generation of Differentially Private Heterogeneous Electronic Health Records
Electronic Health Records (EHRs) are commonly used by the machine learning
community for research on problems specifically related to health care and
medicine. EHRs have the advantages that they can be easily distributed and
contain many features useful for e.g. classification problems. What makes EHR
data sets different from typical machine learning data sets is that they are
often very sparse, due to their high dimensionality, and often contain
heterogeneous (mixed) data types. Furthermore, the data sets contain
sensitive information, which limits the distribution of any models learned
from them due to privacy concerns. For these reasons, using EHR data in
practice presents a real challenge. In this work, we explore using Generative
Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of
using these synthetic records in place of existing data sets for downstream
classification tasks. We further explore applying differentially private (DP)
optimization in order to produce DP synthetic EHR data sets,
which provide rigorous privacy guarantees, and are therefore shareable and
usable in the real world. The performance (measured by AUROC, AUPRC and
accuracy) of our model's synthetic, heterogeneous data is very close to the
original data set (within 3-5% of the baseline) for the non-DP model when
tested in a binary classification task. Using strong DP, our
model still produces data useful for machine learning tasks, albeit incurring a
roughly 17% performance penalty in our tested classification task. We
additionally perform a sub-population analysis and find that our model does not
introduce any bias into the synthetic EHR data compared to the baseline in
either male/female populations or the 0-18, 19-50 and 51+ age groups, in terms
of classification performance for either the non-DP or DP variant.
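The DP-preserving optimization mentioned above is typically realised with the DP-SGD recipe: clip each per-example gradient to a fixed L2 norm, add calibrated Gaussian noise to the aggregate, then average. A minimal sketch with illustrative hyper-parameter values (not the paper's):

```python
import numpy as np

def clip_gradient(g, clip_norm=1.0):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(g)
    return g * min(1.0, clip_norm / max(norm, 1e-12))

# Two per-example gradients: one large (norm 5.0), one small (norm 0.5).
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
clipped = [clip_gradient(g) for g in grads]

# Sum the clipped gradients, add Gaussian noise scaled by the clip norm,
# then average -- this noisy mean is what the optimizer actually applies.
rng = np.random.default_rng(0)
noise_multiplier = 1.1  # illustrative value
noisy_mean = (np.sum(clipped, axis=0)
              + rng.normal(0.0, noise_multiplier * 1.0, size=2)) / len(grads)
print(noisy_mean)
```

Clipping bounds each record's influence on the update; the noise level relative to that bound is what yields the formal (epsilon, delta) guarantee.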
Generalized Multimodal ELBO
Multiple data types naturally co-occur when describing real-world phenomena
and learning from them is a long-standing goal in machine learning research.
However, existing self-supervised generative models approximating an ELBO are
not able to fulfill all desired requirements of multimodal models: their
posterior approximation functions lead to a trade-off between the semantic
coherence and the ability to learn the joint data distribution. We propose a
new, generalized ELBO formulation for multimodal data that overcomes these
limitations. The new objective encompasses two previous methods as special
cases and combines their benefits without compromises. In extensive
experiments, we demonstrate the advantage of the proposed method compared to
state-of-the-art models in self-supervised, generative learning tasks.
Comment: ICLR 2021
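One of the special cases such generalized multimodal objectives subsume aggregates unimodal posteriors as a product of Gaussian experts, which has a convenient closed form: precisions add, and the product mean is a precision-weighted average. A small sketch with made-up expert parameters:

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Closed-form product of diagonal Gaussian experts."""
    precisions = [np.exp(-lv) for lv in logvars]
    total_prec = np.sum(precisions, axis=0)
    mu = np.sum([m * p for m, p in zip(mus, precisions)], axis=0) / total_prec
    return mu, -np.log(total_prec)

# Two unit-variance experts with means 0 and 2: the product posterior
# sits at their precision-weighted mean with halved variance.
mu, logvar = product_of_experts(
    [np.array([0.0]), np.array([2.0])],
    [np.array([0.0]), np.array([0.0])],
)
print(mu, np.exp(logvar))  # mean 1.0, variance 0.5
```

The alternative special case uses a mixture (rather than product) of the unimodal posteriors; the trade-off between the two aggregations is what the generalized objective addresses.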
Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence
Learning from different data types is a long-standing goal in machine
learning research, as multiple information sources co-occur when describing
natural phenomena. However, existing generative models that approximate a
multimodal ELBO rely on difficult or inefficient training schemes to learn a
joint distribution and the dependencies between modalities. In this work, we
propose a novel, efficient objective function that utilizes the Jensen-Shannon
divergence for multiple distributions. It simultaneously approximates the
unimodal and joint multimodal posteriors directly via a dynamic prior. In
addition, we theoretically prove that the new multimodal JS-divergence (mmJSD)
objective optimizes an ELBO. In extensive experiments, we demonstrate the
advantage of the proposed mmJSD model compared to previous work in
unsupervised, generative learning tasks.
Comment: Accepted at NeurIPS 2020, camera-ready version
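The Jensen-Shannon divergence for multiple distributions averages the KL divergence of each distribution to their mixture. The paper works with Gaussian posteriors and a dynamic prior; the discrete toy sketch below only illustrates the multi-distribution divergence itself:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions (all entries > 0)."""
    return np.sum(p * np.log(p / q))

def generalized_js(dists, weights=None):
    """JS over M distributions: weighted mean KL from each to their mixture."""
    dists = np.asarray(dists, dtype=float)
    w = np.full(len(dists), 1.0 / len(dists)) if weights is None else weights
    mixture = w @ dists
    return sum(wi * kl(p, mixture) for wi, p in zip(w, dists))

identical = [[0.5, 0.5], [0.5, 0.5]]
print(generalized_js(identical))  # 0.0 for identical distributions
different = [[0.9, 0.1], [0.1, 0.9]]
print(generalized_js(different))  # strictly positive
```

Unlike a pairwise KL term, this objective compares every unimodal posterior against one shared mixture, which is what allows the unimodal and joint posteriors to be approximated simultaneously.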
Decoupling State Representation Methods from Reinforcement Learning in Car Racing
In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could offer advantages for scaling RL algorithms by providing the models with a useful inductive bias. To achieve this, an encoder is trained in an unsupervised manner with two state representation methods, a variational autoencoder and a contrastive estimator. The learned features are then fed to the actor-critic RL algorithm Proximal Policy Optimization (PPO) to learn a policy for playing OpenAI's car racing environment. This procedure decouples state representation learning from the RL controller. For the integration of RL with unsupervised learning, we explore various designs for variational autoencoders and contrastive learning. The proposed method is compared to a deep network trained directly on pixel inputs with PPO. The results show that the proposed method performs slightly worse than directly learning from pixel inputs; however, it has a more stable learning curve, a substantially reduced buffer size, and requires optimizing 88% fewer parameters. These results indicate that the use of pre-trained state representations has several benefits for solving RL tasks.
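The decoupling itself is simple to sketch: a frozen, pre-trained encoder turns raw frames into a low-dimensional state, and only the small policy head is optimized by PPO. Everything below (shapes, random weights, tanh policy) is a placeholder, not the paper's VAE or contrastive architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
# Frozen "encoder" weights -- stands in for a pre-trained VAE or
# contrastive encoder; these are never updated by the RL algorithm.
W_enc = rng.normal(size=(32, 64 * 64 * 3)) / 96.0

def encode(frame):
    """Map a raw pixel frame to a compact 32-dim latent state."""
    return np.tanh(W_enc @ frame.ravel())

def policy(latent, W_pi):
    """Tiny policy head: latent state -> continuous action."""
    return np.tanh(W_pi @ latent)

frame = rng.random((64, 64, 3))     # fake 64x64 RGB observation
W_pi = rng.normal(size=(3, 32)) * 0.1  # only these parameters are trained
action = policy(encode(frame), W_pi)
print(action.shape)
```

Because the RL optimizer only ever touches the policy head, the trainable parameter count shrinks drastically relative to learning end-to-end from pixels, which is the source of the parameter reduction reported above.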
Beyond Normal: On the Evaluation of Mutual Information Estimators
Mutual information is a general statistical dependency measure which has
found applications in representation learning, causality, domain generalization
and computational biology. However, mutual information estimators are typically
evaluated on simple families of probability distributions, namely multivariate
normal distribution and selected distributions with one-dimensional random
variables. In this paper, we show how to construct a diverse family of
distributions with known ground-truth mutual information and propose a
language-independent benchmarking platform for mutual information estimators.
We discuss the general applicability and limitations of classical and neural
estimators in settings involving high dimensions, sparse interactions,
long-tailed distributions, and high mutual information. Finally, we provide
guidelines for practitioners on how to select an appropriate estimator adapted
to the difficulty of the problem considered, and on the issues to keep in mind
when applying an estimator to a new data set.
Comment: Accepted at NeurIPS 2023. Code available at
https://github.com/cbg-ethz/bm
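The benchmark's key ingredient is distributions whose mutual information is known in closed form. The canonical example: a bivariate normal with correlation rho has I(X; Y) = -(1/2) log(1 - rho^2). The histogram estimator below is a crude stand-in for the classical and neural estimators the paper evaluates, shown only to illustrate comparing an estimate against ground truth:

```python
import numpy as np

def gaussian_mi(rho):
    """Exact MI of a bivariate normal with correlation rho (in nats)."""
    return -0.5 * np.log(1.0 - rho ** 2)

def histogram_mi(x, y, bins=30):
    """Crude plug-in MI estimate from a 2D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask]))

rho = 0.8
rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1.0, rho], [rho, 1.0]],
                               size=100_000).T
print(gaussian_mi(rho), histogram_mi(x, y))  # estimate vs. ground truth
```

Such histogram estimators degrade rapidly in high dimensions, which is exactly the regime where the benchmark's harder distribution families stress-test modern estimators.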