TRUST-LAPSE: An Explainable and Actionable Mistrust Scoring Framework for Model Monitoring
Continuous monitoring of trained ML models to determine when their
predictions should and should not be trusted is essential for their safe
deployment. Such a framework ought to be high-performing, explainable, post-hoc
and actionable. We propose TRUST-LAPSE, a "mistrust" scoring framework for
continuous model monitoring. We assess the trustworthiness of each input
sample's model prediction using a sequence of latent-space embeddings.
Specifically, (a) our latent-space mistrust score estimates mistrust using
distance metrics (Mahalanobis distance) and similarity metrics (cosine
similarity) in the latent-space and (b) our sequential mistrust score
determines deviations in correlations over the sequence of past input
representations in a non-parametric, sliding-window based algorithm for
actionable continuous monitoring. We evaluate TRUST-LAPSE via two downstream
tasks: (1) distributionally shifted input detection, and (2) data drift
detection. We evaluate across diverse domains (audio and vision) using public
datasets and further benchmark our approach on challenging, real-world
electroencephalogram (EEG) datasets for seizure detection. Our latent-space
mistrust scores achieve state-of-the-art results with AUROCs of 84.1 (vision),
73.9 (audio), and 77.1 (clinical EEGs), outperforming baselines by over 10
points. We expose critical failures in popular baselines that remain
insensitive to input semantic content, rendering them unfit for real-world
model monitoring. We show that our sequential mistrust scores achieve high
drift detection rates; over 90% of the streams show < 20% error for all
domains. Through extensive qualitative and quantitative evaluations, we show
that our mistrust scores are more robust and provide explainability for easy
adoption into practice.
Comment: Keywords: Mistrust Scores, Latent-Space, Model monitoring, Trustworthy AI, Explainable AI, Semantic-guided A
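As a rough illustration of the latent-space ingredients named in the abstract above (a distance metric and a similarity metric computed on embeddings), the following is a minimal sketch, not the TRUST-LAPSE implementation. It assumes a diagonal covariance, which reduces the Mahalanobis distance to a per-dimension standardized distance; the function names, the `eps` parameter, and the toy embeddings are all hypothetical. How the two quantities are combined into a single mistrust score is not stated in the abstract, so the sketch returns them separately.

```python
import math


def mean_and_variance(vectors):
    """Per-dimension mean and (population) variance of reference embeddings."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    var = [sum((v[i] - mean[i]) ** 2 for v in vectors) / n for i in range(dim)]
    return mean, var


def diagonal_mahalanobis(x, mean, var, eps=1e-8):
    """Mahalanobis distance under an ASSUMED diagonal covariance matrix."""
    return math.sqrt(
        sum((xi - mi) ** 2 / (vi + eps) for xi, mi, vi in zip(x, mean, var))
    )


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm = math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b))
    return dot / norm if norm else 0.0


def latent_space_signals(x, reference):
    """Return (distance, similarity) of embedding x relative to reference embeddings.

    A larger distance and a smaller similarity both suggest more mistrust.
    """
    mean, var = mean_and_variance(reference)
    return diagonal_mahalanobis(x, mean, var), cosine_similarity(x, mean)


# Toy 2-D embeddings standing in for a trained model's latent space.
reference = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [1.0, 0.2]]
near_distance, near_similarity = latent_space_signals([1.0, 0.05], reference)
far_distance, far_similarity = latent_space_signals([-1.0, 3.0], reference)
```

In this toy setup the second input lies far from the reference embeddings, so it receives a larger distance and a lower similarity than the first.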
Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Labeling training datasets has become a key barrier to building medical
machine learning models. One strategy is to generate training labels
programmatically, for example by applying natural language processing pipelines
to text reports associated with imaging studies. We propose cross-modal data
programming, which generalizes this intuitive strategy in a
theoretically-grounded way that enables simpler, clinician-driven input,
reduces required labeling time, and improves with additional unlabeled data. In
this approach, clinicians generate training labels for models defined over a
target modality (e.g. images or time series) by writing rules over an auxiliary
modality (e.g. text reports). The resulting technical challenge consists of
estimating the accuracies and correlations of these rules; we extend a recent
unsupervised generative modeling technique to handle this cross-modal setting
in a provably consistent way. Across four applications in radiography, computed
tomography, and electroencephalography, and using only several hours of
clinician time, our approach matches or exceeds the efficacy of
physician-months of hand-labeling with statistical significance, demonstrating
a fundamentally faster and more flexible way of building machine learning
models in medicine
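The idea of writing rules over an auxiliary modality (text reports) to label a target modality (images) can be sketched as follows. This is a simplified illustration only: the rules, keywords, and label names are hypothetical, and a plain majority vote stands in for the generative model that the abstract describes for estimating rule accuracies and correlations.

```python
ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1


# Hypothetical clinician-written rules over the text report.
def lf_mentions_fracture(report):
    return ABNORMAL if "fracture" in report.lower() else ABSTAIN


def lf_no_acute(report):
    return NORMAL if "no acute" in report.lower() else ABSTAIN


def lf_unremarkable(report):
    return NORMAL if "unremarkable" in report.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_fracture, lf_no_acute, lf_unremarkable]


def majority_vote(report):
    """Combine rule outputs into one training label for the paired image.

    Majority vote is a stand-in here; ties and all-abstain cases yield ABSTAIN.
    """
    votes = [lf(report) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    abnormal = sum(1 for v in votes if v == ABNORMAL)
    normal = len(votes) - abnormal
    if abnormal == normal:
        return ABSTAIN
    return ABNORMAL if abnormal > normal else NORMAL


reports = [
    "Displaced fracture of the left radius.",
    "Lungs are clear. Unremarkable study.",
    "No acute fracture.",
]
labels = [majority_vote(r) for r in reports]
```

The third report triggers two conflicting rules, which is the kind of disagreement that motivates modeling rule accuracies rather than counting votes.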
Domino: Discovering Systematic Errors with Cross-Modal Embeddings
Machine learning models that achieve high overall accuracy often make
systematic errors on important subsets (or slices) of data. Identifying
underperforming slices is particularly challenging when working with
high-dimensional inputs (e.g. images, audio), where important slices are often
unlabeled. In order to address this issue, recent studies have proposed
automated slice discovery methods (SDMs), which leverage learned model
representations to mine input data for slices on which a model performs poorly.
To be useful to a practitioner, these methods must identify slices that are
both underperforming and coherent (i.e. united by a human-understandable
concept). However, no quantitative evaluation framework currently exists for
rigorously assessing SDMs with respect to these criteria. Additionally, prior
qualitative evaluations have shown that SDMs often identify slices that are
incoherent. In this work, we address these challenges by first designing a
principled evaluation framework that enables a quantitative comparison of SDMs
across 1,235 slice discovery settings in three input domains (natural images,
medical images, and time-series data). Then, motivated by the recent
development of powerful cross-modal representation learning approaches, we
present Domino, an SDM that leverages cross-modal embeddings and a novel
error-aware mixture model to discover and describe coherent slices. We find
that Domino accurately identifies 36% of the 1,235 slices in our framework - a
12 percentage point improvement over prior methods. Further, Domino is the
first SDM that can provide natural language descriptions of identified slices,
correctly generating the exact name of the slice in 35% of settings.
Comment: ICLR 2022 (Oral
Spatiotemporal Modeling of Multivariate Signals With Graph Neural Networks and Structured State Space Models
Multivariate signals are prevalent in various domains, such as healthcare,
transportation systems, and space sciences. Modeling spatiotemporal
dependencies in multivariate signals is challenging due to (1) long-range
temporal dependencies and (2) complex spatial correlations between sensors. To
address these challenges, we propose representing multivariate signals as
graphs and introduce GraphS4mer, a general graph neural network (GNN)
architecture that captures both spatial and temporal dependencies in
multivariate signals. Specifically, (1) we leverage the Structured State Space
model (S4), a state-of-the-art sequence model, to capture long-term temporal
dependencies and (2) we propose a graph structure learning layer in GraphS4mer
to learn dynamically evolving graph structures in the data. We evaluate our
proposed model on three distinct tasks and show that GraphS4mer consistently
improves over existing models, including (1) seizure detection from
electroencephalography signals, outperforming a previous GNN with
self-supervised pretraining by 3.1 points in AUROC; (2) sleep staging from
polysomnography signals, a 4.1 points improvement in macro-F1 score compared to
existing sleep staging models; and (3) traffic forecasting, reducing MAE by
8.8% compared to existing GNNs and by 1.4% compared to Transformer-based
models
Semi-Supervised Learning for Sparsely-Labeled Sequential Data: Application to Healthcare Video Processing
Labeled data is a critical resource for training and evaluating machine
learning models. However, many real-life datasets are only partially labeled.
We propose a semi-supervised machine learning training strategy to improve
event detection performance on sequential data, such as video recordings, when
only sparse labels are available, such as event start times without their
corresponding end times. Our method uses noisy guesses of the events' end times
to train event detection models. Depending on how conservative these guesses
are, mislabeled false positives may be introduced into the training set (i.e.,
negative sequences mislabeled as positives). We further propose a mathematical
model for estimating how many inaccurate labels a model is exposed to, based on
how noisy the end time guesses are. Finally, we show that neural networks can
improve their detection performance by leveraging more training data with less
conservative approximations despite the higher proportion of incorrect labels.
We adapt sequential versions of MNIST and CIFAR-10 to empirically evaluate our
method, and find that our risk-tolerant strategy outperforms conservative
estimates by 12 points of mean average precision for MNIST, and 3.5 points for
CIFAR. Then, we leverage the proposed training strategy to tackle a real-life
application: processing continuous video recordings of epilepsy patients to
improve seizure detection, and show that our method outperforms baseline
labeling methods by 10 points of average precision
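The trade-off described in the abstract above, between conservative and risk-tolerant guesses of event end times, can be illustrated with a small sketch. It assumes a fixed guessed duration per event and synthetic ground truth for counting mislabeled frames; the function names and numbers are hypothetical and this is not the authors' mathematical model.

```python
def label_frames(num_frames, start_times, assumed_duration):
    """Mark frames as positive from each known start time for a guessed duration."""
    labels = [0] * num_frames
    for start in start_times:
        for t in range(start, min(start + assumed_duration, num_frames)):
            labels[t] = 1
    return labels


def count_false_positives(labels, true_events):
    """Count frames labeled positive that fall outside every true (start, end) span.

    Each span is half-open: frames start, ..., end - 1 belong to the event.
    """
    truth = [0] * len(labels)
    for start, end in true_events:
        for t in range(start, min(end, len(labels))):
            truth[t] = 1
    return sum(1 for lab, tru in zip(labels, truth) if lab == 1 and tru == 0)


# Synthetic example: two events whose end times are unknown at training time.
true_events = [(2, 5), (10, 12)]
starts = [start for start, _ in true_events]

conservative = label_frames(15, starts, assumed_duration=2)
risk_tolerant = label_frames(15, starts, assumed_duration=4)

conservative_fp = count_false_positives(conservative, true_events)    # 0
risk_tolerant_fp = count_false_positives(risk_tolerant, true_events)  # 3
```

Here the longer guess doubles the number of positive training frames (8 instead of 4) at the cost of 3 mislabeled frames, which mirrors the trade-off the abstract discusses.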
MSH3 polymorphisms and protein levels affect CAG repeat instability in Huntington's disease mice
Expansions of trinucleotide CAG/CTG repeats in somatic tissues are thought to contribute to ongoing disease progression through an affected individual's life with Huntington's disease or myotonic dystrophy. Broad ranges of repeat instability arise between individuals with expanded repeats, suggesting the existence of modifiers of repeat instability. Mice with expanded CAG/CTG repeats show variable levels of instability depending upon mouse strain. However, to date the genetic modifiers underlying these differences have not been identified. We show that in liver and striatum the R6/1 Huntington's disease (HD) (CAG)~100 transgene, when present in a congenic C57BL/6J (B6) background, incurred expansion-biased repeat mutations, whereas the repeat was stable in a congenic BALB/cByJ (CBy) background. Reciprocal congenic mice revealed the Msh3 gene as the determinant for the differences in repeat instability. Expansion bias was observed in congenic mice homozygous for the B6 Msh3 gene on a CBy background, while the CAG tract was stabilized in congenics homozygous for the CBy Msh3 gene on a B6 background. The CAG stabilization was as dramatic as genetic deficiency of Msh2. The B6 and CBy Msh3 genes had identical promoters but differed in coding regions and showed strikingly different protein levels. B6 MSH3 variant protein is highly expressed and associated with CAG expansions, while the CBy MSH3 variant protein is expressed at barely detectable levels, associating with CAG stability. The DHFR protein, which is divergently transcribed from a promoter shared by the Msh3 gene, did not show varied levels between mouse strains. Thus, naturally occurring MSH3 protein polymorphisms are modifiers of CAG repeat instability, likely through variable MSH3 protein stability. 
Since evidence supports that somatic CAG instability is a modifier and predictor of disease, our data are consistent with the hypothesis that variable levels of CAG instability associated with polymorphisms of DNA repair genes may have prognostic implications for various repeat-associated diseases
Catching Element Formation In The Act
Gamma-ray astronomy explores the most energetic photons in nature to address
some of the most pressing puzzles in contemporary astrophysics. It encompasses
a wide range of objects and phenomena: stars, supernovae, novae, neutron stars,
stellar-mass black holes, nucleosynthesis, the interstellar medium, cosmic rays
and relativistic-particle acceleration, and the evolution of galaxies. MeV
gamma-rays provide a unique probe of nuclear processes in astronomy, directly
measuring radioactive decay, nuclear de-excitation, and positron annihilation.
The substantial information carried by gamma-ray photons allows us to see
deeper into these objects, the bulk of the power is often emitted at gamma-ray
energies, and radioactivity provides a natural physical clock that adds unique
information. New science will be driven by time-domain population studies at
gamma-ray energies. This science is enabled by next-generation gamma-ray
instruments with one to two orders of magnitude better sensitivity, larger sky
coverage, and faster cadence than all previous gamma-ray instruments. This
transformative capability permits: (a) the accurate identification of the
gamma-ray emitting objects and correlations with observations taken at other
wavelengths and with other messengers; (b) construction of new gamma-ray maps
of the Milky Way and other nearby galaxies where extended regions are
distinguished from point sources; and (c) considerable serendipitous science of
scarce events -- nearby neutron star mergers, for example. Advances in
technology push the performance of new gamma-ray instruments to address a wide
set of astrophysical questions.
Comment: 14 pages including 3 figure
iEEG-BIDS, extending the Brain Imaging Data Structure specification to human intracranial electrophysiology
The Brain Imaging Data Structure (BIDS) is a community-driven specification for organizing neuroscience data and metadata with the aim to make datasets more transparent, reusable, and reproducible. Intracranial electroencephalography (iEEG) data offer a unique combination of high spatial and temporal resolution measurements of the living human brain. To improve internal (re)use and external sharing of these unique data, we present a specification for storing and sharing iEEG data: iEEG-BIDS
The James Webb Space Telescope Mission: Optical Telescope Element Design, Development, and Performance
The James Webb Space Telescope (JWST) is a large, infrared space telescope
that has recently started its science program which will enable breakthroughs
in astrophysics and planetary science. Notably, JWST will provide the very
first observations of the earliest luminous objects in the Universe and start a
new era of exoplanet atmospheric characterization. This transformative science
is enabled by a 6.6 m telescope that is passively cooled with a 5-layer
sunshield. The primary mirror is comprised of 18 controllable, low areal
density hexagonal segments, that were aligned and phased relative to each other
in orbit using innovative image-based wavefront sensing and control algorithms.
This revolutionary telescope took more than two decades to develop with a
widely distributed team across engineering disciplines. We present an overview
of the telescope requirements, architecture, development, superb on-orbit
performance, and lessons learned. JWST successfully demonstrates a segmented
aperture space telescope and establishes a path to building even larger space
telescopes.
Comment: accepted by PASP for JWST Overview Special Issue; 34 pages, 25 figure