LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs
We show that large language models (LLMs) are remarkably good at working with
interpretable models that decompose complex outcomes into univariate
graph-represented components. By adopting a hierarchical approach to reasoning,
LLMs can provide comprehensive model-level summaries without ever requiring the
entire model to fit in context. This approach enables LLMs to apply their
extensive background knowledge to automate common tasks in data science such as
detecting anomalies that contradict prior knowledge, describing potential
reasons for the anomalies, and suggesting repairs that would remove the
anomalies. We use multiple examples in healthcare to demonstrate the utility of
these new capabilities of LLMs, with particular emphasis on Generalized
Additive Models (GAMs). Finally, we release an open-source package that serves as an LLM-GAM interface.
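The core interface idea, serializing each univariate graph component as text, summarizing terms one at a time, and then combining the per-term summaries into a model-level prompt, can be sketched in a few lines. Everything below is illustrative: the data, the term representation, and the `summarize` stub (standing in for an actual LLM call) are invented, not the paper's package.

```python
# Sketch of hierarchical LLM-over-GAM reasoning: summarize each univariate
# term separately, then summarize the summaries, so the whole model never
# has to fit in a single context window. All names here are illustrative.

def term_to_text(feature, xs, contributions):
    """Render one graph-represented component as plain text an LLM can read."""
    rows = "\n".join(f"  x={x:g} -> f(x)={c:+.3f}" for x, c in zip(xs, contributions))
    return f"Term '{feature}':\n{rows}"

def summarize(prompt):
    """Placeholder for an LLM call; here it just echoes a trivial summary."""
    first_line = prompt.splitlines()[0]
    return f"summary of {first_line}"

def model_summary(terms):
    """Per-term summaries first, then a single model-level prompt."""
    per_term = [summarize(term_to_text(f, xs, cs)) for f, xs, cs in terms]
    return summarize("Model with terms:\n" + "\n".join(per_term))

terms = [
    ("age", [20, 40, 60, 80], [-0.2, 0.0, 0.3, 0.9]),
    ("bmi", [18, 25, 32], [-0.1, 0.0, 0.4]),
]
print(model_summary(terms))
```

Because each term is a small, self-contained text block, background-knowledge checks (e.g., "does risk plausibly rise with age?") can be asked of one component at a time.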
Using Interpretable Machine Learning to Predict Maternal and Fetal Outcomes
Most pregnancies and births result in a good outcome, but complications are
not uncommon, and when they occur they can have serious consequences for
mothers and babies. Predictive modeling has the potential to
improve outcomes through better understanding of risk factors, heightened
surveillance, and more timely and appropriate interventions, thereby helping
obstetricians deliver better care. For three types of complications, (i) Severe
Maternal Morbidity (SMM), (ii) shoulder dystocia, and (iii) preterm
preeclampsia, we identify and study the most important risk factors using the
Explainable Boosting Machine (EBM), a glass-box model, in order to gain
intelligibility. While the interpretability of EBMs reveals surprising insights
into the features contributing to risk, our experiments show that EBMs match
the accuracy of black-box ML methods such as deep neural nets and random
forests.

Comment: DSHealth at SIGKDD 2022, 5 pages, 3 figures
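What makes an EBM a "glass box" is that its prediction is just an intercept plus a sum of per-feature shape functions, so each risk factor's contribution can be read off directly. The sketch below illustrates that decomposition; the shape functions and feature values are invented for illustration, not taken from the clinical data the paper used.

```python
# A GAM/EBM-style prediction decomposes into intercept + per-feature
# contributions on the log-odds scale. All numbers below are invented.
import math

def shape_age(age):        # hypothetical contribution of maternal age
    return 0.02 * (age - 30)

def shape_parity(parity):  # hypothetical contribution of prior births
    return -0.1 if parity > 0 else 0.15

intercept = -2.0
features = {"age": 38, "parity": 0}
contribs = {"age": shape_age(features["age"]),
            "parity": shape_parity(features["parity"])}
log_odds = intercept + sum(contribs.values())
risk = 1 / (1 + math.exp(-log_odds))

for name, c in contribs.items():       # each term is directly inspectable
    print(f"{name:>7}: {c:+.3f}")
print(f"risk = {risk:.3f}")
```

Because every term is univariate, a clinician can audit each shape function in isolation, which is exactly what enables the "surprising insights" the abstract describes.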
Experimental and Computational Mutagenesis To Investigate the Positioning of a General Base within an Enzyme Active Site
The positioning of catalytic groups within proteins plays an important
role in enzyme catalysis, and here we investigate the positioning
of the general base in the enzyme ketosteroid isomerase (KSI). The
oxygen atoms of Asp38, the general base in KSI, were previously shown
to be involved in anion–aromatic interactions with two neighboring
Phe residues. Here we ask whether those interactions are sufficient,
within the overall protein architecture, to position Asp38 for catalysis
or whether the side chains that pack against Asp38 and/or the residues
of the structured loop that is capped by Asp38 are necessary to achieve
optimal positioning for catalysis. To test positioning, we mutated
each of the aforementioned residues, alone and in combinations, in
a background with the native Asp general base and in a D38E mutant
background, as Glu at position 38 was previously shown to be mispositioned
for general base catalysis. These double-mutant cycles reveal positioning
effects as large as 10^3-fold, indicating that structural
features in addition to the overall protein architecture and the Phe
residues neighboring the carboxylate oxygen atoms play roles in positioning.
X-ray crystallography and molecular dynamics simulations suggest that
the functional effects arise from both restricting dynamic fluctuations
and disfavoring potential mispositioned states. Whereas it may have
been anticipated that multiple interactions would be necessary for
optimal general base positioning, the energetic contributions from
positioning and the nonadditive nature of these interactions are not
revealed by structural inspection and require functional dissection.
Recognizing the extent, type, and energetic interconnectivity of interactions
that contribute to positioning catalytic groups has implications for
enzyme evolution and may help reveal the nature and extent of interactions
required to design enzymes that rival those found in biology.
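The arithmetic behind a double-mutant cycle can be made concrete: a rate ratio converts to a free-energy effect via ΔΔG = −RT·ln(k_mut/k_wt), and the coupling (nonadditivity) between two mutations is ΔΔG(double) − [ΔΔG(A) + ΔΔG(B)]. Note that a 10^3-fold rate effect corresponds to roughly RT·ln(1000) ≈ 4.1 kcal/mol at 298 K. The rate constants below are hypothetical, chosen only to show the bookkeeping, not values from the KSI study.

```python
# Double-mutant cycle bookkeeping with hypothetical rate constants.
import math

R = 0.0019872  # gas constant, kcal/(mol*K)
T = 298.0      # temperature, K

def ddG(k_mut, k_wt):
    """Free-energy cost of a mutation from its rate-constant ratio."""
    return -R * T * math.log(k_mut / k_wt)

# Hypothetical k_cat values (s^-1): wild type, single mutants A and B,
# and the double mutant AB.
k_wt, k_A, k_B, k_AB = 1.0, 0.1, 0.05, 0.05

coupling = ddG(k_AB, k_wt) - (ddG(k_A, k_wt) + ddG(k_B, k_wt))
print(f"coupling energy = {coupling:+.2f} kcal/mol")
```

A nonzero coupling energy (here negative, since the double mutant is no worse than mutant B alone) is the quantitative signature of the nonadditive interactions the abstract argues cannot be read off from structure alone.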
Ten Quick Tips for Deep Learning in Biology
Machine learning is a modern approach to problem-solving and task automation.
In particular, machine learning is concerned with the development and
applications of algorithms that can recognize patterns in data and use them for
predictive modeling. Artificial neural networks are a particular class of
machine learning algorithms and models that evolved into what is now described
as deep learning. Given the computational advances made in the last decade,
deep learning can now be applied to massive data sets and in innumerable
contexts. Therefore, deep learning has become its own subfield of machine
learning. In the context of biological research, it has been increasingly used
to derive novel insights from high-dimensional biological data. To make the
biological applications of deep learning more accessible to scientists who have
some experience with machine learning, we solicited input from a community of
researchers with varied biological and deep learning interests. These
individuals collaboratively contributed to this manuscript's writing using the
GitHub version control platform and the Manubot manuscript generation toolset.
The goal was to articulate a practical, accessible, and concise set of
guidelines and suggestions to follow when using deep learning. In the course of
our discussions, several themes became clear: the importance of understanding
and applying machine learning fundamentals as a baseline for utilizing deep
learning, the necessity for extensive model comparisons with careful
evaluation, and the need for critical thought in interpreting results generated
by deep learning, among others.

Comment: 23 pages, 2 figures
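The first theme, machine learning fundamentals as a baseline, can be shown in miniature: before training a deep model, score a trivial baseline on the same held-out labels and require the model to beat it. The labels and predictions below are toy data invented for illustration.

```python
# Tip in miniature: always score a trivial baseline before reaching for
# deep learning. Labels and predictions here are toy data.
from collections import Counter

labels = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]       # held-out true labels
model_preds = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]  # hypothetical model output

majority = Counter(labels).most_common(1)[0][0]
baseline_acc = sum(y == majority for y in labels) / len(labels)
model_acc = sum(p == y for p, y in zip(model_preds, labels)) / len(labels)

print(f"majority-class baseline: {baseline_acc:.2f}")
print(f"model accuracy:          {model_acc:.2f}")
# A model that cannot beat this baseline has learned nothing useful.
```

On imbalanced biological datasets the majority-class baseline is often deceptively high, which is exactly why the comparison belongs in any evaluation.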
Opportunities and obstacles for deep learning in biology and medicine
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems in these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes, and the treatment of patients) and discuss whether deep learning will be able to transform these tasks or whether the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made in linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside, with the potential to transform several areas of biology and medicine.