Learning a local-variable model of aromatic and conjugated systems
New approaches to building and training neural networks, collectively referred to as deep learning, are attracting attention in theoretical chemistry. Several groups aim to replace computationally expensive ab initio quantum mechanics calculations with learned estimators. This raises questions about whether neural networks can represent complex quantum chemical systems. Can local-variable models efficiently approximate nonlocal quantum chemical features? Here, we find that convolutional architectures, which aggregate information only locally, cannot efficiently represent aromaticity and conjugation in large systems. They cannot represent the long-range nonlocality known to be important in quantum chemistry. This study focuses on aromatic and conjugated systems computed from molecular graphs, although reproducing quantum simulations is the ultimate goal. This task, by definition, is both computable and known to be important to chemistry. The failure of convolutional architectures on this focused task calls into question their use in modeling quantum mechanics. To remedy this previously unrecognized deficiency, we introduce a new architecture that propagates information back and forth in waves of nonlinear computation. This architecture is still a local-variable model, and it is both computationally and representationally efficient, processing molecules in sublinear time with far fewer parameters than convolutional networks. Wave-like propagation models aromatic and conjugated systems with high accuracy and even models the impact of small structural changes on large molecules. This new architecture demonstrates that some nonlocal features of quantum chemistry can be efficiently represented in local-variable models.
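The locality argument above can be illustrated with a toy reachability count. This sketch is not the authors' wave architecture: it assumes a hypothetical linear chain of atoms (a stand-in for a conjugated backbone) and tracks only which atoms can influence each atom's representation. A convolution layer lets each atom absorb information from its immediate neighbors, so k layers reach at most k bonds away. A single forward-then-backward sweep along an atom ordering (a simplified reading of "back and forth in waves") lets every atom on the chain see every other atom. All helper names are hypothetical, and the sequential sweep here says nothing about the sublinear runtime reported for the actual architecture.

```python
from typing import Dict, List, Set

Adjacency = Dict[int, List[int]]


def path_graph(n: int) -> Adjacency:
    """Adjacency list for a hypothetical linear chain of n atoms."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def conv_receptive_fields(adj: Adjacency, layers: int) -> Dict[int, Set[int]]:
    """Atoms that can influence each atom after `layers` rounds of local aggregation."""
    fields = {v: {v} for v in adj}  # each atom initially knows only itself
    for _ in range(layers):
        # One convolution layer: every atom merges its neighbors' fields from
        # the previous layer, so the field grows by at most one bond per layer.
        fields = {v: fields[v].union(*(fields[u] for u in adj[v])) for v in adj}
    return fields


def wave_receptive_fields(adj: Adjacency, order: List[int]) -> Dict[int, Set[int]]:
    """Atoms that can influence each atom after one forward and one backward sweep."""
    fields = {v: {v} for v in adj}
    # Forward sweep: atoms are updated in place in `order`, so each update can
    # already include information carried by earlier atoms in the sweep.
    for v in order:
        for u in adj[v]:
            fields[v] |= fields[u]
    # Backward sweep: the same update in reverse order carries information back.
    for v in reversed(order):
        for u in adj[v]:
            fields[v] |= fields[u]
    return fields


chain = path_graph(10)
print(sorted(conv_receptive_fields(chain, 2)[0]))  # [0, 1, 2]
print(all(len(f) == 10 for f in wave_receptive_fields(chain, list(range(10))).values()))  # True
```

With ten atoms, two convolution layers give the first atom a view of only three atoms, whereas the toy wave reaches all ten; a convolution would need nine layers for the same reach.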
Deep learning quantification of percent steatosis in donor liver biopsy frozen sections
BACKGROUND: Pathologist evaluation of donor liver biopsies provides information for accepting or discarding potential donor livers. Due to the urgent nature of the decision process, this is regularly performed using frozen sectioning at the time of biopsy. The percent steatosis in a donor liver biopsy correlates with transplant outcome; however, there is significant inter- and intra-observer variability in quantifying steatosis, which is compounded by frozen section artifact. We hypothesized that a deep learning model could identify and quantify steatosis in donor liver biopsies.
METHODS: We developed a deep learning convolutional neural network that generates a steatosis probability map from an input whole slide image (WSI) of a hematoxylin and eosin-stained frozen section, and subsequently calculates the percent steatosis. Ninety-six WSI of frozen donor liver sections from our transplant pathology service were annotated for steatosis and used to train (n = 30 WSI) and test (n = 66 WSI) the deep learning model.
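The final step of this pipeline, turning a per-pixel steatosis probability map into a single percentage, can be sketched as below. This is a minimal sketch under stated assumptions: the probability map is a 2-D grid aligned with a boolean tissue mask, a pixel counts as steatotic when its probability is at least a hypothetical threshold of 0.5, and the denominator is total tissue area. The abstract does not specify the thresholding or the denominator, so `percent_steatosis` and its parameters are illustrative only.

```python
from typing import Sequence


def percent_steatosis(
    prob_map: Sequence[Sequence[float]],
    tissue_mask: Sequence[Sequence[bool]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the study
) -> float:
    """Percent of tissue pixels whose steatosis probability meets the threshold."""
    steatotic = 0
    tissue = 0
    for prob_row, mask_row in zip(prob_map, tissue_mask):
        for p, in_tissue in zip(prob_row, mask_row):
            if not in_tissue:
                continue  # ignore background (glass) pixels
            tissue += 1
            if p >= threshold:
                steatotic += 1
    if tissue == 0:
        raise ValueError("tissue mask contains no tissue pixels")
    return 100.0 * steatotic / tissue


# Toy 2x2 map: three tissue pixels, two of which are above the threshold.
prob = [[0.9, 0.2], [0.7, 0.1]]
mask = [[True, True], [True, False]]
print(round(percent_steatosis(prob, mask), 1))  # 66.7
```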
FINDINGS: The model had good correlation and agreement with the annotation in both the training set (r = 0.88, intraclass correlation coefficient [ICC] = 0.88) and the test set of novel inputs (r = 0.85, ICC = 0.85). These values were higher than those for the on-service pathologist's estimates at the time of initial evaluation (r = 0.52 and ICC = 0.52 for the training set; r = 0.74 and ICC = 0.72 for the test set).
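The two statistics reported above measure different things: Pearson's r rewards any linear relationship, while the ICC also penalizes systematic offsets between raters. The sketch below makes that concrete in plain Python. The abstract does not say which ICC form was used; this assumes the two-way random-effects, absolute-agreement, single-measure form (ICC(2,1)) with the model and the annotation treated as two raters. The sample values are made up for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_2_1(x: Sequence[float], y: Sequence[float]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure,
    with two raters (assumed here: model estimate vs. annotation)."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / 2 for a, b in rows]
    col_means = [sum(x) / n, sum(y) / n]
    # Mean squares for subjects (rows), raters (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum(
        (v - row_means[i] - col_means[j] + grand) ** 2
        for i, row in enumerate(rows)
        for j, v in enumerate(row)
    )
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# A constant +5 offset: perfectly correlated, but not in perfect agreement.
model = [10, 20, 30, 40]
annotation = [15, 25, 35, 45]
print(round(pearson_r(model, annotation), 3))  # 1.0
print(round(icc_2_1(model, annotation), 3))  # 0.93
```

Because the ICC form here measures absolute agreement, a rater that is consistently off by a fixed amount scores below 1 even when r is exactly 1.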
INTERPRETATION: This deep learning algorithm could be incorporated into routine pathology workflows for fast, accurate, and reproducible donor liver evaluation.
FUNDING: Mid-America Transplant Society
ProteomeScout: A repository and analysis resource for post-translational modifications and proteins
ProteomeScout (https://proteomescout.wustl.edu) is a resource for the study of proteins and their post-translational modifications (PTMs) consisting of a database of PTMs, a repository for experimental data, an analysis suite for PTM experiments, and a tool for visualizing the relationships between complex protein annotations. The PTM database is a compendium of public PTM data, coupled with user-uploaded experimental data. ProteomeScout provides analysis tools for experimental datasets, including summary views and subset selection, which can identify relationships within subsets of data by testing for statistically significant enrichment of protein annotations. Protein annotations are incorporated into the ProteomeScout database from external resources and include terms such as Gene Ontology annotations, domains, secondary structure and non-synonymous polymorphisms. These annotations are available in the database download, in the analysis tools and in the protein viewer. The protein viewer allows for the simultaneous visualization of annotations in an interactive web graphic, which can be exported in Scalable Vector Graphics (SVG) format. Finally, quantitative measurements associated with public experiments are also easily viewable within protein records, allowing researchers to see how PTMs change across different contexts. ProteomeScout should prove useful for protein researchers and should benefit the proteomics community by providing a stable repository for PTM experiments.
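Subset analysis of this kind asks whether an annotation is over-represented in a selected subset relative to the full dataset. The abstract does not name the statistical test; a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) is a common choice for such enrichment questions, and the sketch below assumes it. The function name and parameters are hypothetical, not part of ProteomeScout's interface, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, subset: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: proteins in the full dataset
    annotated:  proteins in the full dataset carrying the annotation
    subset:     proteins in the selected subset
    hits:       proteins in the subset carrying the annotation
    """
    upper = min(annotated, subset)
    total = comb(population, subset)
    # Sum the probabilities of observing `hits` or more annotated proteins
    # when drawing `subset` proteins without replacement.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, subset - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 15 of 50 selected proteins carry a term that
# 100 of 1000 proteins in the full dataset carry.
print(enrichment_p_value(population=1000, annotated=100, subset=50, hits=15))
```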
Standard operating procedure for somatic variant refinement of sequencing data with paired tumor and normal samples
Modeling Small-Molecule Reactivity Identifies Promiscuous Bioactive Compounds
Scientists rely on high-throughput screening tools to identify promising small-molecule compounds for the development of biochemical probes and drugs. This study focuses on the identification of promiscuous bioactive compounds: compounds that appear active in many high-throughput screening experiments against diverse targets but are often false positives that may not be easily developed into successful probes. These compounds can exhibit bioactivity through nonspecific, intractable mechanisms of action and/or through interference with specific assay technology readouts. Such “frequent hitters” are now commonly identified using substructure filters, including pan-assay interference compounds (PAINS). Herein, we show that mechanistic modeling of small-molecule reactivity using deep learning can improve upon PAINS filters when modeling promiscuous bioactivity in PubChem assays. Without training on high-throughput screening data, a deep learning model of small-molecule reactivity achieves a sensitivity of 18.5% and a specificity of 95.5% in identifying promiscuous bioactive compounds. This performance is similar to that of PAINS filters, which achieve a sensitivity of 20.3% at the same specificity. Importantly, such reactivity modeling is complementary to PAINS filters. When PAINS filters and reactivity models are combined, the resulting model outperforms either method alone, achieving a sensitivity of 24% at the same specificity. In addition, because the deep learning model is probabilistic, its sensitivity and specificity can be tuned by adjusting the decision threshold. Moreover, for a subset of PAINS filters, this reactivity model can help discriminate between promiscuous and nonpromiscuous bioactive compounds even among compounds matching those filters. Critically, the reactivity model provides mechanistic hypotheses for assay interference by predicting the precise atoms involved in compound reactivity. Overall, our analysis suggests that deep learning approaches to modeling promiscuous compound bioactivity may complement current methods for identifying promiscuous compounds.
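Reporting sensitivity at a fixed specificity, and tuning a probabilistic model's threshold to reach it, can be sketched in plain Python. This is an illustration under assumptions, not the authors' pipeline: the scores, labels, and PAINS matches are made-up toy values, the threshold is chosen as the lowest observed score that reaches a target specificity, and the PAINS-plus-model combination is a simple logical OR of the two flags. The abstract does not say how the combined model was built, so the OR rule is only one plausible choice, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sens_spec(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity of boolean predictions against labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def flag(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag a compound as promiscuous when its score reaches the threshold."""
    return [s >= threshold for s in scores]


def threshold_for_specificity(
    scores: Sequence[float], actual: Sequence[bool], target_spec: float
) -> float:
    """Lowest candidate threshold whose specificity reaches target_spec."""
    # Specificity never decreases as the threshold rises, so scan upward.
    for t in sorted(set(scores)) + [float("inf")]:
        if sens_spec(flag(scores, t), actual)[1] >= target_spec:
            return t
    return float("inf")  # unreachable: the inf threshold flags nothing


def combine_or(a: Sequence[bool], b: Sequence[bool]) -> List[bool]:
    """Assumed combination rule: flag if either method flags the compound."""
    return [x or y for x, y in zip(a, b)]


scores = [0.9, 0.8, 0.25, 0.3, 0.2, 0.1]  # toy model probabilities
actual = [True, False, True, False, False, False]  # toy promiscuity labels
pains = [False, False, True, False, False, False]  # toy PAINS matches

t = threshold_for_specificity(scores, actual, target_spec=0.75)
model_flags = flag(scores, t)
print(t, sens_spec(model_flags, actual))  # 0.8 (0.5, 0.75)
print(sens_spec(combine_or(model_flags, pains), actual))  # (1.0, 0.75)
```

Raising the threshold can only unflag compounds, which is why a single ascending scan finds the lowest qualifying threshold. The OR rule can only add flags, so it can raise sensitivity but may lower specificity if PAINS matches many non-promiscuous compounds.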