
    LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs

    We show that large language models (LLMs) are remarkably good at working with interpretable models that decompose complex outcomes into univariate graph-represented components. By adopting a hierarchical approach to reasoning, LLMs can provide comprehensive model-level summaries without ever requiring the entire model to fit in context. This approach enables LLMs to apply their extensive background knowledge to automate common tasks in data science such as detecting anomalies that contradict prior knowledge, describing potential reasons for the anomalies, and suggesting repairs that would remove the anomalies. We use multiple examples in healthcare to demonstrate the utility of these new capabilities of LLMs, with particular emphasis on Generalized Additive Models (GAMs). Finally, we present the package TalkToEBM as an open-source LLM-GAM interface.
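    The hierarchical approach described above can be sketched in pure Python: represent each univariate GAM component as a small graph (bin edges on the x-axis, additive scores on the y-axis), summarize each component locally, then assemble the per-component summaries into a model-level report without ever holding the whole model in one context. A minimal sketch; all names (`summarize_component`, `summarize_model`) and the anomaly heuristic are illustrative, not the actual TalkToEBM API.

    ```python
    # Sketch: hierarchical summarization of a GAM's univariate components.
    # Each component is a graph: bin edges on the x-axis, additive scores on y.
    # Names and heuristics are illustrative; see TalkToEBM for the real interface.

    def summarize_component(name, edges, scores):
        """Describe one shape function in text, flagging large local reversals."""
        trend = "increasing" if scores[-1] >= scores[0] else "decreasing"
        rng = max(scores) - min(scores)
        # Crude anomaly check: a sign reversal whose jump exceeds half the range.
        anomalies = [
            f"reversal near {edges[i]}"
            for i in range(1, len(scores) - 1)
            if (scores[i] - scores[i - 1]) * (scores[i + 1] - scores[i]) < 0
            and abs(scores[i + 1] - scores[i]) > rng / 2
        ]
        note = f"; possible anomalies: {', '.join(anomalies)}" if anomalies else ""
        return f"{name}: overall {trend} effect (range {rng:.2f}){note}"

    def summarize_model(components):
        """Model-level summary built only from per-component summaries."""
        return "\n".join(summarize_component(n, e, s) for n, e, s in components)

    components = [
        ("age", [20, 40, 60, 80], [-0.5, 0.0, 0.4, 0.9]),
        ("bmi", [18, 25, 32, 40], [-0.2, 0.1, 0.9, 0.3]),  # dip at the top bin
    ]
    print(summarize_model(components))
    ```

    In the real system an LLM would produce each component summary from the graph; the point of the sketch is the decomposition, which keeps every individual prompt small.
    
    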

    Using Interpretable Machine Learning to Predict Maternal and Fetal Outcomes

    Most pregnancies and births result in a good outcome, but complications are not uncommon, and when they do occur they can have serious implications for mothers and babies. Predictive modeling has the potential to improve outcomes through better understanding of risk factors, heightened surveillance, and more timely and appropriate interventions, thereby helping obstetricians deliver better care. For three types of complications we identify and study the most important risk factors using the Explainable Boosting Machine (EBM), a glass-box model chosen for its intelligibility: (i) severe maternal morbidity (SMM), (ii) shoulder dystocia, and (iii) preterm preeclampsia. Our experiments show that EBMs match the accuracy of black-box ML methods such as deep neural nets and random forests, while their interpretability reveals surprising insights into the features contributing to risk. Comment: DSHealth at SIGKDD 2022, 5 pages, 3 figures
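    Because an EBM's prediction is a sum of per-feature score functions, the importance of a risk factor can be read directly as its mean absolute contribution across the data. A minimal pure-Python sketch of that ranking step; the feature names and contribution values below are made up for illustration, and a real workflow would take them from a fitted EBM.

    ```python
    # Sketch: ranking risk factors from a fitted additive (glass-box) model.
    # In an EBM the prediction is a sum of per-feature scores, so a feature's
    # importance is its mean absolute contribution over the samples.
    # Feature names and values are illustrative, not real clinical data.

    def feature_importances(contributions):
        """contributions: {feature: [per-sample additive scores]} -> ranked list."""
        return sorted(
            ((f, sum(abs(v) for v in vals) / len(vals))
             for f, vals in contributions.items()),
            key=lambda kv: kv[1],
            reverse=True,
        )

    contribs = {
        "prior_cesarean": [0.8, 0.9, -0.1, 0.7],
        "maternal_age":   [0.1, -0.2, 0.05, 0.1],
        "bmi":            [0.3, 0.4, -0.5, 0.2],
    }
    for name, imp in feature_importances(contribs):
        print(f"{name}: {imp:.3f}")
    ```

    This per-feature decomposability is exactly what black-box models like deep neural nets and random forests lack without post-hoc approximation.
    
    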

    Experimental and Computational Mutagenesis To Investigate the Positioning of a General Base within an Enzyme Active Site

    The positioning of catalytic groups within proteins plays an important role in enzyme catalysis, and here we investigate the positioning of the general base in the enzyme ketosteroid isomerase (KSI). The oxygen atoms of Asp38, the general base in KSI, were previously shown to be involved in anion–aromatic interactions with two neighboring Phe residues. Here we ask whether those interactions are sufficient, within the overall protein architecture, to position Asp38 for catalysis or whether the side chains that pack against Asp38 and/or the residues of the structured loop that is capped by Asp38 are necessary to achieve optimal positioning for catalysis. To test positioning, we mutated each of the aforementioned residues, alone and in combinations, in a background with the native Asp general base and in a D38E mutant background, as Glu at position 38 was previously shown to be mispositioned for general base catalysis. These double-mutant cycles reveal positioning effects as large as 10^3-fold, indicating that structural features in addition to the overall protein architecture and the Phe residues neighboring the carboxylate oxygen atoms play roles in positioning. X-ray crystallography and molecular dynamics simulations suggest that the functional effects arise from both restricting dynamic fluctuations and disfavoring potential mispositioned states. Whereas it may have been anticipated that multiple interactions would be necessary for optimal general base positioning, the energetic contributions from positioning and the nonadditive nature of these interactions are not revealed by structural inspection and require functional dissection. Recognizing the extent, type, and energetic interconnectivity of interactions that contribute to positioning catalytic groups has implications for enzyme evolution and may help reveal the nature and extent of interactions required to design enzymes that rival those found in biology.

    Ten Quick Tips for Deep Learning in Biology

    Machine learning is a modern approach to problem-solving and task automation. In particular, machine learning is concerned with the development and application of algorithms that can recognize patterns in data and use them for predictive modeling. Artificial neural networks are a particular class of machine learning algorithms and models that evolved into what is now described as deep learning. Given the computational advances made in the last decade, deep learning can now be applied to massive data sets and in innumerable contexts. Therefore, deep learning has become its own subfield of machine learning. In the context of biological research, it has been increasingly used to derive novel insights from high-dimensional biological data. To make the biological applications of deep learning more accessible to scientists who have some experience with machine learning, we solicited input from a community of researchers with varied biological and deep learning interests. These individuals collaboratively contributed to this manuscript's writing using the GitHub version control platform and the Manubot manuscript generation toolset. The goal was to articulate a practical, accessible, and concise set of guidelines and suggestions to follow when using deep learning. In the course of our discussions, several themes became clear: the importance of understanding and applying machine learning fundamentals as a baseline for utilizing deep learning, the necessity for extensive model comparisons with careful evaluation, and the need for critical thought in interpreting results generated by deep learning, among others. Comment: 23 pages, 2 figures
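    One theme above, establishing machine learning fundamentals as a baseline before reaching for deep learning, can be illustrated with a few lines of pure Python: a majority-class baseline that any deep model should beat. The toy labels are made up for illustration.

    ```python
    # Sketch: a majority-class baseline, the simplest sanity check before
    # training a deep model. If a network cannot beat this, something is wrong.
    from collections import Counter

    def majority_baseline(train_labels, test_labels):
        """Predict the most common training label for every test example."""
        majority = Counter(train_labels).most_common(1)[0][0]
        correct = sum(1 for y in test_labels if y == majority)
        return correct / len(test_labels)

    # Toy labels: an imbalanced binary task (e.g., disease vs. healthy).
    train = [0, 0, 0, 0, 1, 0, 0, 1]
    test = [0, 1, 0, 0, 0]
    print(f"baseline accuracy: {majority_baseline(train, test):.2f}")  # 0.80
    ```

    On imbalanced data such a baseline can look deceptively strong, which is exactly why careful evaluation and model comparison matter before crediting a deep model's accuracy.
    
    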