Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance
Interpretability methods are valuable only if their explanations faithfully
describe the explained model. In this work, we consider neural networks whose
predictions are invariant under a specific symmetry group. This includes
popular architectures, ranging from convolutional to graph neural networks. Any faithful explanation of such a model must agree with this invariance property. We formalize this intuition through
the notion of explanation invariance and equivariance by leveraging the
formalism from geometric deep learning. Through this rigorous formalism, we
derive (1) two metrics to measure the robustness of any interpretability method
with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods; and (3) a systematic approach to
increase the invariance of any interpretability method with respect to a
symmetry group. By empirically measuring our metrics for explanations of models across various modalities and symmetry groups, we derive five guidelines that help users and developers of interpretability methods produce robust explanations.
Comment: 26 pages, 7 figures
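As a rough illustration of the first metric, an invariance score can be estimated by comparing explanations of an input before and after applying symmetry-group actions. The sketch below is not the paper's implementation; `explain`, `group_actions`, and `sim` are assumed placeholder interfaces.

```python
import numpy as np

def explanation_invariance(explain, model, x, group_actions, sim=None):
    """Monte-Carlo estimate of explanation invariance: average similarity
    between the explanation of the original input and the explanations of
    its symmetry-transformed copies. Hypothetical interfaces:
    explain(model, x) returns an attribution array; each g in
    group_actions maps inputs to inputs."""
    if sim is None:
        # default similarity: cosine between flattened explanations
        def sim(a, b):
            a, b = a.ravel(), b.ravel()
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    e_ref = explain(model, x)
    scores = [sim(explain(model, g(x)), e_ref) for g in group_actions]
    return float(np.mean(scores))
```

An equivariance score follows the same pattern, except that `explain(model, g(x))` is compared against `g` applied to `e_ref` rather than `e_ref` itself.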
TRIAGE: Characterizing and auditing training data for improved regression
Data quality is crucial for robust machine learning algorithms, with the
recent interest in data-centric AI emphasizing the importance of training data
characterization. However, current data characterization methods focus largely on classification settings, leaving regression settings understudied. To address this, we introduce TRIAGE, a novel data
characterization framework tailored to regression tasks and compatible with a
broad class of regressors. TRIAGE utilizes conformal predictive distributions
to provide a model-agnostic scoring method, the TRIAGE score. We operationalize
the score to analyze individual samples' training dynamics and characterize
samples as under-, over-, or well-estimated by the model. We show that TRIAGE's
characterization is consistent and highlight its utility for improving performance via data sculpting/filtering in multiple regression settings. Additionally,
beyond the sample level, we show that TRIAGE enables new approaches to dataset selection and feature acquisition. Overall, TRIAGE highlights the value unlocked by data characterization in real-world regression applications.
Comment: Presented at NeurIPS 2023
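A loose sketch of the idea behind the TRIAGE score (not the authors' code): track, across training checkpoints, where each sample's label falls within a predictive distribution, then bucket samples by the average and variability of that position. Here the conformal predictive distribution is crudely approximated by residual ranks.

```python
import numpy as np

def triage_style_characterization(residuals):
    """residuals: array (n_checkpoints, n_samples) of signed errors
    y_true - y_pred collected over training checkpoints. Returns a
    per-sample score in [0, 1] and a coarse under/over/well label."""
    n_ckpt, n_samples = residuals.shape
    # rank of each sample's residual at each checkpoint, mapped to (0, 1):
    # a crude stand-in for evaluating a conformal predictive distribution
    ranks = np.argsort(np.argsort(residuals, axis=1), axis=1)
    probs = (ranks + 0.5) / n_samples
    mean_p = probs.mean(axis=0)   # where the label tends to sit
    var_p = probs.var(axis=0)     # stability of that position over training
    labels = np.where(mean_p > 0.75, "under-estimated",
             np.where(mean_p < 0.25, "over-estimated", "well-estimated"))
    return mean_p, var_p, labels
```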
Joint Training of Deep Ensembles Fails Due to Learner Collusion
Ensembles of machine learning models have been well established as a powerful
method of improving performance over a single model. Traditionally, ensembling
algorithms train their base learners independently or sequentially with the
goal of optimizing their joint performance. In the case of deep ensembles of
neural networks, we are provided with the opportunity to directly optimize the
true objective: the joint performance of the ensemble as a whole. Surprisingly,
however, directly minimizing the loss of the ensemble is rarely applied in practice. Instead, most previous research trains individual models independently, with ensembling performed post hoc. In this work, we show that this is for good reason: joint optimization of the ensemble loss results in
degenerate behavior. We approach this problem by decomposing the ensemble
objective into the strength of the base learners and the diversity between
them. We discover that joint optimization results in a phenomenon in which base
learners collude to artificially inflate their apparent diversity. This
pseudo-diversity fails to generalize beyond the training data, causing a larger
generalization gap. We proceed to comprehensively demonstrate the practical
implications of this effect on a range of standard machine learning tasks and
architectures by smoothly interpolating between independent training and joint
optimization.
Comment: To appear in the Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
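The interpolation mentioned above can be sketched as a convex combination of the independent-training loss (average of per-learner losses) and the joint loss of the averaged prediction. This is an illustrative regression version under assumed tensor shapes, not the paper's exact objective or its strength/diversity decomposition.

```python
import torch
import torch.nn.functional as F

def interpolated_ensemble_loss(models, x, y, beta):
    """beta = 0 recovers independent training (each learner fits y on
    its own); beta = 1 trains the ensemble mean jointly, the regime in
    which learner collusion is observed."""
    preds = torch.stack([m(x) for m in models])                  # (M, batch, out)
    independent = torch.stack([F.mse_loss(p, y) for p in preds]).mean()
    joint = F.mse_loss(preds.mean(dim=0), y)
    return (1.0 - beta) * independent + beta * joint
```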
DAGnosis: Localized Identification of Data Inconsistencies using Structures
Identification and appropriate handling of inconsistencies in data at
deployment time is crucial to reliably use machine learning models. While
recent data-centric methods are able to identify such inconsistencies with
respect to the training set, they suffer from two key limitations: (1) suboptimality in settings where features exhibit statistical independencies, due to their use of compressive representations; and (2) a lack of localization to pinpoint why a sample might be flagged as inconsistent, which is important for guiding future data collection. We solve these two fundamental limitations using directed acyclic graphs (DAGs) to encode the training set's feature probability distribution and independencies as a structure. Our method, called DAGnosis, leverages these structural interactions to draw valuable, insightful data-centric conclusions. DAGnosis localizes the causes of inconsistencies on a DAG, an aspect overlooked by previous
approaches. Moreover, we show empirically that leveraging these interactions
(1) leads to more accurate conclusions in detecting inconsistencies, as well as
(2) provides more detailed insights into why some samples are flagged.
Comment: AISTATS 2024; added correspondence email
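A simplified sketch of DAG-localized flagging (not the DAGnosis implementation): each feature is predicted from its DAG parents, and a test value whose residual exceeds a split-conformal cutoff from the training set is flagged, which pinpoints the inconsistent feature. `parents` is an assumed adjacency description, and linear regressors stand in for whatever conditional estimators one prefers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def dag_localized_flags(X_train, X_test, parents, alpha=0.05):
    """parents[i] lists the DAG-parent column indices of feature i.
    Returns a boolean (n_test, n_features) matrix of per-feature flags."""
    n_features = X_train.shape[1]
    flags = np.zeros((X_test.shape[0], n_features), dtype=bool)
    for i in range(n_features):
        pa = parents[i]
        if not pa:  # root node: fall back to marginal quantile bounds
            lo, hi = np.quantile(X_train[:, i], [alpha / 2, 1 - alpha / 2])
            flags[:, i] = (X_test[:, i] < lo) | (X_test[:, i] > hi)
            continue
        reg = LinearRegression().fit(X_train[:, pa], X_train[:, i])
        resid = np.abs(X_train[:, i] - reg.predict(X_train[:, pa]))
        cutoff = np.quantile(resid, 1 - alpha)  # split-conformal style threshold
        flags[:, i] = np.abs(X_test[:, i] - reg.predict(X_test[:, pa])) > cutoff
    return flags
```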
MatterGen: a generative model for inorganic materials design
The design of functional materials with desired properties is essential in
driving technological advances in areas like energy storage, catalysis, and
carbon capture. Generative models provide a new paradigm for materials design
by directly generating entirely novel materials given desired property
constraints. Despite recent progress, current generative models have a low success rate in proposing stable crystals or can satisfy only a very limited set of property constraints. Here, we present MatterGen, a model that generates
stable, diverse inorganic materials across the periodic table and can further
be fine-tuned to steer the generation towards a broad range of property
constraints. To enable this, we introduce a new diffusion-based generative
process that produces crystalline structures by gradually refining atom types,
coordinates, and the periodic lattice. We further introduce adapter modules to
enable fine-tuning towards any given property constraints with a labeled
dataset. Compared to prior generative models, structures produced by MatterGen
are more than twice as likely to be novel and stable, and more than 15 times
closer to the local energy minimum. After fine-tuning, MatterGen successfully
generates stable, novel materials with desired chemistry, symmetry, and mechanical, electronic, and magnetic properties. Finally, we demonstrate
multi-property materials design capabilities by proposing structures that have
both high magnetic density and a chemical composition with low supply-chain
risk. We believe that the quality of generated materials and the breadth of
MatterGen's capabilities represent a major advancement towards creating a
universal generative model for materials design.
Comment: 13 pages main text, 35 pages supplementary information
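The shape of the generative process described above can be sketched as a joint reverse-diffusion loop over the three structure components. This is a toy outline, not MatterGen's architecture; `denoiser` is an assumed model that returns a slightly less noisy structure at each step.

```python
import numpy as np

def crystal_reverse_diffusion(denoiser, n_atoms, n_steps=1000, seed=0):
    """Toy reverse-diffusion loop: atom types, fractional coordinates,
    and the periodic lattice start as noise and are refined together."""
    rng = np.random.default_rng(seed)
    types = rng.normal(size=(n_atoms, 100))   # relaxed one-hot atom types
    coords = rng.uniform(size=(n_atoms, 3))   # fractional coordinates
    lattice = rng.normal(size=(3, 3))         # periodic cell
    for t in reversed(range(n_steps)):
        types, coords, lattice = denoiser(types, coords, lattice, t)
        coords = coords % 1.0                 # stay inside the unit cell
    return types.argmax(axis=1), coords, lattice
```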
Data-SUITE: Data-centric identification of in-distribution incongruous examples
Systematic quantification of data quality is critical for consistent model
performance. Prior works have focused on out-of-distribution data. Instead, we
tackle an understudied yet equally important problem of characterizing
incongruous regions of in-distribution (ID) data, which may arise from feature
space heterogeneity. To this end, we propose a paradigm shift with Data-SUITE:
a data-centric AI framework to identify these regions, independent of a
task-specific model. Data-SUITE leverages copula modeling, representation
learning, and conformal prediction to build feature-wise confidence interval
estimators based on a set of training instances. These estimators can be used
to evaluate the congruence of test instances with respect to the training set,
to answer two practically useful questions: (1) which test instances will be
reliably predicted by a model trained with the training instances? and (2) can
we identify incongruous regions of the feature space so that data owners
understand the data's limitations or guide future data collection? We
empirically validate Data-SUITE's performance and coverage guarantees and
demonstrate, on cross-site medical data, biased data, and data with concept drift, that Data-SUITE best identifies ID regions where a downstream model may
be reliable (independent of said model). We also illustrate how these
identified regions can provide insights into datasets and highlight their
limitations.
Comment: Presented at the International Conference on Machine Learning (ICML) 2022
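A stripped-down sketch of the congruence check (the real framework additionally uses copula modeling and learned representations, which are omitted here): per-feature conformal intervals are built from training residuals, and a test instance is incongruous on whichever features fall outside their intervals.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def featurewise_incongruence(X_train, X_test, alpha=0.1):
    """Returns a boolean (n_test, n_features) matrix: True where a test
    feature value falls outside its conformal interval, built here by
    predicting each feature from the remaining ones."""
    d = X_train.shape[1]
    outside = np.zeros((X_test.shape[0], d), dtype=bool)
    for j in range(d):
        rest = [k for k in range(d) if k != j]
        reg = LinearRegression().fit(X_train[:, rest], X_train[:, j])
        resid = np.abs(X_train[:, j] - reg.predict(X_train[:, rest]))
        width = np.quantile(resid, 1 - alpha)  # split-conformal half-width
        outside[:, j] = np.abs(X_test[:, j] - reg.predict(X_test[:, rest])) > width
    return outside
```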
Universal Dependencies 2.3
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).