Graph Contrastive Learning for Materials
Recent work has shown the potential of graph neural networks to efficiently
predict material properties, enabling high-throughput screening of materials.
Training these models, however, often requires large quantities of labelled
data, obtained via costly methods such as ab initio calculations or
experimental evaluation. By leveraging a series of material-specific
transformations, we introduce CrystalCLR, a framework for contrastive learning
of representations with crystal graph neural networks. With the addition of a
novel loss function, our framework is able to learn representations competitive
with engineered fingerprinting methods. We also demonstrate that via model
finetuning, contrastive pretraining can improve the performance of graph neural
networks for prediction of material properties and significantly outperform
traditional ML models that use engineered fingerprints. Lastly, we observe that
CrystalCLR produces material representations that form clusters by compound
class.
Comment: 7 pages, 3 figures, NeurIPS 2022 AI for Accelerated Materials Design Workshop
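The abstract does not give the form of CrystalCLR's contrastive objective or its additional loss term, so the snippet below is only a minimal sketch of a SimCLR-style NT-Xent loss applied to embeddings of two augmented views of the same crystal graphs; the function name, temperature, and normalization choices are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Generic NT-Xent contrastive loss over two batches of graph embeddings.

    z1, z2: [batch, dim] embeddings of two augmented views of the same crystals.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)              # [2B, dim]
    sim = (z @ z.t()) / temperature             # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))           # exclude self-similarity
    batch = z1.size(0)
    # The positive for sample i is its other augmented view.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets.to(sim.device))
```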
Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency
Interpreting time series models is uniquely challenging because it requires
identifying both the location of time series signals that drive model
predictions and their matching to an interpretable temporal pattern. While
explainers from other modalities can be applied to time series, their inductive
biases do not transfer well to the inherently uninterpretable nature of time
series. We present TimeX, a time series consistency model for training
explainers. TimeX trains an interpretable surrogate to mimic the behavior of a
pretrained time series model. It addresses the issue of model faithfulness by
introducing model behavior consistency, a novel formulation that preserves
relations in the latent space induced by the pretrained model with relations in
the latent space induced by TimeX. TimeX provides discrete attribution maps
and, unlike existing interpretability methods, it learns a latent space of
explanations that can be used in various ways, such as to provide landmarks to
visually aggregate similar explanations and easily recognize temporal patterns.
We evaluate TimeX on 8 synthetic and real-world datasets and compare its
performance against state-of-the-art interpretability methods. We also conduct
case studies using physiological time series. Quantitative evaluations
demonstrate that TimeX achieves the highest or second-highest performance in
every metric compared to baselines across all datasets. Through case studies,
we show that the novel components of TimeX show potential for training
faithful, interpretable models that capture the behavior of pretrained time
series models.
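As a rough illustration of the model behavior consistency idea described above (preserving relations in the pretrained model's latent space within the surrogate's latent space), here is a hypothetical consistency loss that matches pairwise cosine-similarity structure between the two embedding spaces; TimeX's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def behavior_consistency_loss(z_pretrained: torch.Tensor,
                              z_surrogate: torch.Tensor) -> torch.Tensor:
    """Encourage the surrogate's latent space to preserve pairwise relations
    from the pretrained model's latent space (a sketch of the idea only).

    Both inputs: [batch, dim] embeddings of the same batch of time series.
    """
    sim_ref = F.cosine_similarity(z_pretrained.unsqueeze(1),
                                  z_pretrained.unsqueeze(0), dim=-1)  # [B, B]
    sim_sur = F.cosine_similarity(z_surrogate.unsqueeze(1),
                                  z_surrogate.unsqueeze(0), dim=-1)   # [B, B]
    return F.mse_loss(sim_sur, sim_ref)
```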
Domain Adaptation for Time Series Under Feature and Label Shifts
Unsupervised domain adaptation (UDA) enables the transfer of models trained
on source domains to unlabeled target domains. However, transferring complex
time series models presents challenges due to the dynamic temporal structure
variations across domains. This leads to feature shifts in the time and
frequency representations. Additionally, the label distributions of tasks in
the source and target domains can differ significantly, posing difficulties in
addressing label shifts and recognizing labels unique to the target domain.
Effectively transferring complex time series models remains a formidable
problem. We present Raincoat, the first model for both closed-set and universal
domain adaptation on complex time series. Raincoat addresses feature and label
shifts by considering both temporal and frequency features, aligning them
across domains, and correcting for misalignments to facilitate the detection of
private labels. Additionally, Raincoat improves transferability by identifying
label shifts in target domains. Our experiments with 5 datasets and 13
state-of-the-art UDA methods demonstrate that Raincoat can improve transfer
learning performance by up to 16.33% and can handle both closed-set and
universal domain adaptation.
Comment: Accepted by ICML 2023; 29 pages (14 pages main paper + 15 pages supplementary materials). Code: see https://github.com/mims-harvard/Raincoat
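Since Raincoat is described as combining temporal and frequency features and aligning them across domains, the sketch below illustrates that general pattern with a toy encoder and a simple mean-discrepancy alignment term; the architecture, loss, and names are illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn

class TimeFreqEncoder(nn.Module):
    """Toy encoder combining temporal and frequency views of a time series.

    Input x: [batch, length] univariate series.
    """
    def __init__(self, length: int, hidden: int = 64):
        super().__init__()
        n_freq = length // 2 + 1                      # size of the rFFT output
        self.time_net = nn.Sequential(nn.Linear(length, hidden), nn.ReLU())
        self.freq_net = nn.Sequential(nn.Linear(n_freq, hidden), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        freq = torch.fft.rfft(x, dim=-1).abs()        # amplitude spectrum
        return torch.cat([self.time_net(x), self.freq_net(freq)], dim=-1)

def feature_alignment_loss(src_feat: torch.Tensor, tgt_feat: torch.Tensor) -> torch.Tensor:
    """Simple mean-feature discrepancy, used here as a stand-in for the
    paper's alignment and correction steps."""
    return (src_feat.mean(dim=0) - tgt_feat.mean(dim=0)).pow(2).sum()
```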
U-Noise: Learnable Noise Masks for Interpretable Image Segmentation
Deep Neural Networks (DNNs) are widely used for decision making in a myriad
of critical applications, ranging from medical to societal and even judicial.
Given the importance of these decisions, it is crucial for us to be able to
interpret these models. We introduce a new method for interpreting image
segmentation models by learning regions of images in which noise can be applied
without hindering downstream model performance. We apply this method to
segmentation of the pancreas in CT scans, and qualitatively compare the quality
of the method to existing explainability techniques, such as Grad-CAM and
occlusion sensitivity. Additionally, we show that, unlike other methods, our
interpretability model can be quantitatively evaluated based on the downstream
performance over obscured images.
Comment: ICIP 2021. Revision: corrected affiliation and reference
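To make the idea of "regions where noise can be applied without hindering downstream performance" concrete, here is a hypothetical sketch: a learnable mask blends the input with noise, and the objective keeps a frozen segmentation model accurate while rewarding mask coverage. The parameterization and weighting are assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableNoiseMask(nn.Module):
    """Learn where noise can be injected into an image without degrading a
    frozen downstream segmentation model (illustrative sketch)."""
    def __init__(self, height: int, width: int):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(1, 1, height, width))

    def forward(self, image: torch.Tensor):
        mask = torch.sigmoid(self.mask_logits)        # 1 = replace with noise
        noise = torch.randn_like(image)
        noisy = (1.0 - mask) * image + mask * noise
        return noisy, mask

def interpretability_objective(seg_model, images, labels, masker, weight: float = 0.1):
    """Keep segmentation accurate on the noised input while encouraging the
    mask to cover as much of the image as possible."""
    noisy, mask = masker(images)
    seg_loss = F.cross_entropy(seg_model(noisy), labels)
    coverage = mask.mean()                            # fraction of image noised
    return seg_loss - weight * coverage
```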
Higher-order equivariant neural networks for charge density prediction in materials
The calculation of electron density distribution using density functional theory (DFT) in materials and molecules is central to the study of their quantum and macro-scale properties, yet accurate and efficient calculation remains a long-standing challenge. We introduce ChargE3Net, an E(3)-equivariant graph neural network for predicting electron density in atomic systems. ChargE3Net enables the learning of higher-order equivariant features to achieve high predictive accuracy and model expressivity. We show that ChargE3Net exceeds the performance of prior work on diverse sets of molecules and materials. When trained on the massive dataset of over 100K materials in the Materials Project database, our model is able to capture the complexity and variability in the data, leading to a significant 26.7% reduction in self-consistent iterations when used to initialize DFT calculations on unseen materials. Furthermore, we show that non-self-consistent DFT calculations using our predicted charge densities yield near-DFT performance on electronic and thermodynamic property prediction at a fraction of the computational cost. Further analysis attributes the greater predictive accuracy to improved modeling of systems with high angular variations. These results illuminate a pathway towards machine learning-accelerated ab initio calculations for materials discovery.
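Charge-density prediction work in this area is commonly evaluated with a normalized mean absolute error over the real-space grid; the helper below computes that generic metric as a point of reference. It is an assumption about standard practice in this literature, not necessarily the exact metric definition used in the paper.

```python
import numpy as np

def normalized_density_error(rho_pred: np.ndarray, rho_true: np.ndarray) -> float:
    """Normalized MAE for charge-density prediction: the integrated absolute
    error |rho_pred - rho_true| divided by the integrated reference density,
    evaluated on the same real-space grid."""
    return float(np.abs(rho_pred - rho_true).sum() / np.abs(rho_true).sum())
```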
TorchMetrics - Measuring Reproducibility in PyTorch
A main problem with reproducing machine learning publications is the variance of metric implementations across papers. A lack of standardization leads to different behavior in mechanisms such as checkpointing, learning rate schedulers, or early stopping, which will influence the reported results. For example, a complex metric such as Fréchet inception distance (FID) for synthetic image quality evaluation will differ based on the specific interpolation method used. There have been a few attempts at tackling the reproducibility issues. Papers With Code links research code with its corresponding paper. Similarly, arXiv recently added a code and data section that links both official and community code to papers. However, these methods rely on the paper code being made publicly accessible, which is not always possible. Our approach is to provide the de facto reference implementation for metrics. This approach enables proprietary work to remain comparable as long as it uses our reference implementations. We introduce TorchMetrics, a general-purpose metrics package that covers a wide variety of tasks and domains used in the machine learning community. TorchMetrics provides standard classification and regression metrics, as well as domain-specific metrics for audio, computer vision, natural language processing, and information retrieval. Our process for adding a new metric is as follows: first, we integrate a well-tested and established third-party library. Once we have verified the implementations and written tests for them, we re-implement them in native PyTorch to enable hardware acceleration and remove any bottlenecks in inter-device transfer. If you want to cite the framework, feel free to use this (but only if you loved it).
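A brief usage example of the package's module-style metrics (written against a recent TorchMetrics release, where classification metrics take a task argument): state accumulates across update calls and is aggregated once by compute.

```python
import torch
import torchmetrics

# Module-style metric: accumulates state across batches, then computes once.
accuracy = torchmetrics.Accuracy(task="multiclass", num_classes=3)

for preds, target in [(torch.tensor([0, 2, 1]), torch.tensor([0, 1, 1])),
                      (torch.tensor([2, 2, 0]), torch.tensor([2, 1, 0]))]:
    accuracy.update(preds, target)          # per-batch statistics

print(accuracy.compute())                   # aggregated accuracy over all batches
accuracy.reset()                            # clear state for the next epoch
```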
Early adverse physiological event detection using commercial wearables: challenges and opportunities
Data from commercial off-the-shelf (COTS) wearables leveraged with machine learning algorithms provide an unprecedented potential for the early detection of adverse physiological events. However, several challenges inhibit this potential, including (1) heterogeneity among and within participants that makes scaling detection algorithms to a general population less precise, (2) confounders that lead to incorrect assumptions regarding a participant’s healthy state, (3) noise in the data at the sensor level that limits the sensitivity of detection algorithms, and (4) imprecision in self-reported labels that misrepresents the true data values associated with a given physiological event. The goal of this study was two-fold: (1) to characterize the performance of such algorithms in the presence of these challenges and provide insights to researchers on limitations and opportunities, and (2) to subsequently devise algorithms to address each challenge and offer insights on future opportunities for advancement. Our proposed algorithms include techniques that build on determining suitable baselines for each participant to capture important physiological changes, as well as label correction techniques for participant-reported identifiers. Our work is validated on potentially one of the largest datasets available, obtained from 8000+ participants and 1.3+ million hours of wearable data captured from Oura smart rings. Leveraging this extensive dataset, we achieve pre-symptomatic detection of COVID-19 with a receiver operating characteristic (ROC) area under the curve (AUC) of 0.725 without correction techniques, 0.739 with baseline correction, 0.740 with baseline correction and label correction on the training set, and 0.777 with baseline correction and label correction on both the training and the test set. Using the same respective paradigms, we achieve ROC AUCs of 0.919, 0.938, 0.943, and 0.994 for the detection of self-reported fever, and 0.574, 0.611, 0.601, and 0.635 for the detection of self-reported shortness of breath. These techniques offer improvements across almost all metrics and events, including PR AUC, sensitivity at 75% specificity, and precision at 75% recall. The ring allows continuous monitoring for detection of event onset, and we further demonstrate an improvement in the early detection of COVID-19 from an average of 3.5 days to an average of 4.1 days before a reported positive test result.
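The baseline-correction idea, expressing each participant's measurements relative to their own healthy baseline, can be sketched as a per-participant rolling z-score; the window length and exact statistics below are illustrative assumptions, not the study's algorithm.

```python
import numpy as np

def baseline_correct(signal: np.ndarray, baseline_days: int = 14) -> np.ndarray:
    """Per-participant baseline correction sketch: express each daily value as
    a z-score relative to that participant's trailing window of recent values.

    signal: 1-D array of daily summary values (e.g., nightly resting heart rate).
    """
    corrected = np.zeros_like(signal, dtype=float)
    for t in range(len(signal)):
        # Use the preceding days as the personal baseline (first day maps to 0).
        window = signal[max(0, t - baseline_days):t] if t > 0 else signal[:1]
        mu, sigma = window.mean(), window.std()
        corrected[t] = (signal[t] - mu) / (sigma + 1e-8)
    return corrected
```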