Hierarchical Protein Function Prediction with Tail-GNNs
Protein function prediction may be framed as predicting subgraphs (with
certain closure properties) of a directed acyclic graph describing the
hierarchy of protein functions. Graph neural networks (GNNs), with their
built-in inductive bias for relational data, are hence naturally suited for
this task. However, in contrast with most GNN applications, the graph is not
related to the input, but to the label space. Accordingly, we propose
Tail-GNNs, neural networks which naturally compose with the output space of any
neural network for multi-task prediction, to provide relationally-reinforced
labels. For protein function prediction, we combine a Tail-GNN with a dilated
convolutional network which learns representations of the protein sequence,
yielding a significant improvement in F1 score and demonstrating the ability of
Tail-GNNs to learn useful representations of labels and exploit them in
real-world problem solving.
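The composition described above, in which a base network produces per-label features that are then refined by message passing over the label hierarchy before readout, can be sketched as follows. All function and parameter names here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def tail_gnn_logits(f_x, label_adj, W_msg, w_out):
    """Hypothetical sketch of the Tail-GNN idea: per-label features
    produced by a base network are refined by one round of message
    passing over the label hierarchy before the final per-label logits.

    f_x       : (num_labels, d) per-label representation from the base net
    label_adj : (num_labels, num_labels) adjacency of the label DAG
    W_msg     : (d, d) message transform
    w_out     : (d,) readout vector producing one logit per label
    """
    deg = np.maximum(label_adj.sum(axis=1, keepdims=True), 1.0)
    msgs = (label_adj @ f_x) / deg           # mean over neighboring labels
    h = np.maximum(f_x + msgs @ W_msg, 0.0)  # residual update + ReLU
    return h @ w_out                         # one logit per label
```

A real model would stack several such layers and respect edge direction in the DAG; this single symmetric round is only meant to show where the label graph enters the computation.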
Predicting multicellular function through multi-layer tissue networks
Motivation: Understanding functions of proteins in specific human tissues is
essential for insights into disease diagnostics and therapeutics, yet
prediction of tissue-specific cellular function remains a critical challenge
for biomedicine.
Results: Here we present OhmNet, a hierarchy-aware unsupervised node feature
learning approach for multi-layer networks. We build a multi-layer network,
where each layer represents molecular interactions in a different human tissue.
OhmNet then automatically learns a mapping of proteins, represented as nodes,
to a neural embedding based low-dimensional space of features. OhmNet
encourages sharing of similar features among proteins with similar network
neighborhoods and among proteins activated in similar tissues. The algorithm
generalizes prior work, which generally ignores relationships between tissues,
by modeling tissue organization with a rich multiscale tissue hierarchy. We use
OhmNet to study multicellular function in a multi-layer protein interaction
network of 107 human tissues. In 48 tissues with known tissue-specific cellular
functions, OhmNet provides more accurate predictions of cellular function than
alternative approaches, and also generates more accurate hypotheses about
tissue-specific protein actions. We show that taking into account the tissue
hierarchy leads to improved predictive power. Remarkably, we also demonstrate
that it is possible to leverage the tissue hierarchy in order to effectively
transfer cellular functions to a functionally uncharacterized tissue. Overall,
OhmNet moves from flat networks to multiscale models able to predict a range of
phenotypes spanning cellular subsystems.
Comment: In Proceedings of the 25th International Conference on Intelligent Systems for Molecular Biology (ISMB), 2017
InteractionNet: Modeling and Explaining of Noncovalent Protein-Ligand Interactions with Noncovalent Graph Neural Network and Layer-Wise Relevance Propagation
Expanding the scope of graph-based, deep-learning models to noncovalent
protein-ligand interactions has earned increasing attention in structure-based
drug design. Modeling the protein-ligand interactions with graph neural
networks (GNNs) has experienced difficulties in the conversion of
protein-ligand complex structures into the graph representation and left
questions regarding whether the trained models properly learn the appropriate
noncovalent interactions. Here, we proposed a GNN architecture, denoted as
InteractionNet, which learns two separate molecular graphs, one covalent and
one noncovalent, through distinct convolution layers. We also analyzed the
InteractionNet model with an explainability technique, i.e., layer-wise
relevance propagation, for examination of the chemical relevance of the model's
predictions. Separation of the covalent and noncovalent convolutional steps
made it possible to evaluate the contribution of each step independently and
analyze the graph-building strategy for noncovalent interactions. We applied
InteractionNet to the prediction of protein-ligand binding affinity and showed
that our model successfully captured the noncovalent interactions, both in
predictive performance and in the chemical relevance of its interpretations.
Strategies for Pre-training Graph Neural Networks
Many applications of machine learning require a model to make accurate
predictions on test examples that are distributionally different from training
ones, while task-specific labels are scarce during training. An effective
approach to this challenge is to pre-train a model on related tasks where data
is abundant, and then fine-tune it on a downstream task of interest. While
pre-training has been effective in many language and vision domains, it remains
an open question how to effectively use pre-training on graph datasets. In this
paper, we develop a new strategy and self-supervised methods for pre-training
Graph Neural Networks (GNNs). The key to the success of our strategy is to
pre-train an expressive GNN at the level of individual nodes as well as entire
graphs so that the GNN can learn useful local and global representations
simultaneously. We systematically study pre-training on multiple graph
classification datasets. We find that naive strategies, which pre-train GNNs at
the level of either entire graphs or individual nodes, give limited improvement
and can even lead to negative transfer on many downstream tasks. In contrast,
our strategy avoids negative transfer and improves generalization significantly
across downstream tasks, leading up to 9.4% absolute improvements in ROC-AUC
over non-pre-trained models and achieving state-of-the-art performance for
molecular property prediction and protein function prediction.
Comment: Accepted as a spotlight at ICLR 2020
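As a concrete illustration of node-level self-supervision, one ingredient this strategy combines with graph-level pre-training, the sketch below reconstructs a masked node's attributes from the mean of its neighbors. The names and the simple linear predictor are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attribute_masking_loss(h, adj, W, mask_idx):
    """Node-level self-supervised objective in the spirit of attribute
    masking (illustrative sketch only): predict each masked node's
    features from the mean of its neighbors' features.

    h        : (n, d) node feature matrix
    adj      : dict mapping node id -> list of neighbor ids
    W        : (d, d) linear predictor (stands in for a GNN readout)
    mask_idx : indices of the masked nodes
    """
    loss = 0.0
    for v in mask_idx:
        neigh = adj[v]
        pred = h[neigh].mean(axis=0) @ W   # reconstruct from context
        loss += np.sum((pred - h[v]) ** 2)
    return loss / len(mask_idx)
```

In the full recipe, minimizing an objective of this shape at the node level is followed by supervised pre-training at the graph level, so the GNN learns local and global representations simultaneously.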
Predicting drug-target interaction using 3D structure-embedded graph representations from graph neural networks
Accurate prediction of drug-target interaction (DTI) is essential for in
silico drug design. For the purpose, we propose a novel approach for predicting
DTI using a GNN that directly incorporates the 3D structure of a protein-ligand
complex. We also apply a distance-aware graph attention algorithm with gate
augmentation to increase the performance of our model. As a result, our model
shows better performance than docking and other deep learning methods for both
virtual screening and pose prediction. In addition, our model can reproduce the
natural population distribution of active and inactive molecules.
Comment: 20 pages, 2 figures
Representation Learning on Graphs: Methods and Applications
Machine learning on graphs is an important and ubiquitous task with
applications ranging from drug design to friendship recommendation in social
networks. The primary challenge in this domain is finding a way to represent,
or encode, graph structure so that it can be easily exploited by machine
learning models. Traditionally, machine learning approaches relied on
user-defined heuristics to extract features encoding structural information
about a graph (e.g., degree statistics or kernel functions). However, recent
years have seen a surge in approaches that automatically learn to encode graph
structure into low-dimensional embeddings, using techniques based on deep
learning and nonlinear dimensionality reduction. Here we provide a conceptual
review of key advancements in this area of representation learning on graphs,
including matrix factorization-based methods, random-walk based algorithms, and
graph neural networks. We review methods to embed individual nodes as well as
approaches to embed entire (sub)graphs. In doing so, we develop a unified
framework to describe these recent approaches, and we highlight a number of
important applications and directions for future work.
Comment: Published in the IEEE Data Engineering Bulletin, September 2017; version with minor corrections
DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier
A large number of protein sequences are becoming available through the
application of novel high-throughput sequencing technologies. Experimental
functional characterization of these proteins is time-consuming and expensive,
and is often only done rigorously for few selected model organisms.
Computational function prediction approaches have been suggested to fill this
gap. The functions of proteins are classified using the Gene Ontology (GO),
which contains over 40,000 classes. Additionally, proteins have multiple
functions, making function prediction a large-scale, multi-class, multi-label
problem.
We have developed a novel method to predict protein function from sequence.
We use deep learning to learn features from protein sequences as well as a
cross-species protein-protein interaction network. Our approach specifically
outputs information in the structure of the GO and utilizes the dependencies
between GO classes as background information to construct a deep learning
model. We evaluate our method using the standards established by the
Computational Assessment of Function Annotation (CAFA) and demonstrate a
significant improvement over baseline methods such as BLAST, particularly for
predicting cellular locations.
Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
Empirical scoring functions based on either molecular force fields or
cheminformatics descriptors are widely used, in conjunction with molecular
docking, during the early stages of drug discovery to predict potency and
binding affinity of a drug-like molecule to a given target. These models
require expert-level knowledge of physical chemistry and biology to be encoded
as hand-tuned parameters or features rather than allowing the underlying model
to select features in a data-driven procedure. Here, we develop a general
3-dimensional spatial convolution operation for learning atomic-level chemical
interactions directly from atomic coordinates and demonstrate its application
to structure-based bioactivity prediction. The atomic convolutional neural
network is trained to predict the experimentally determined binding affinity of
a protein-ligand complex by direct calculation of the energy associated with
the complex, protein, and ligand given the crystal structure of the binding
pose. Non-covalent interactions present in the complex that are absent in the
protein-ligand sub-structures are identified and the model learns the
interaction strength associated with these features. We test our model by
predicting the binding free energy of a subset of protein-ligand complexes
found in the PDBBind dataset and compare with state-of-the-art cheminformatics
and machine learning-based approaches. We find that all methods achieve
experimental accuracy and that atomic convolutional networks either outperform
or perform competitively with the cheminformatics based methods. Unlike all
previous protein-ligand prediction systems, atomic convolutional networks are
end-to-end and fully-differentiable. They represent a new data-driven,
physics-based deep learning model paradigm that offers a strong foundation for
future improvements in structure-based bioactivity prediction.
Modeling polypharmacy side effects with graph convolutional networks
The use of drug combinations, termed polypharmacy, is common to treat
patients with complex diseases and co-existing conditions. However, a major
consequence of polypharmacy is a much higher risk of adverse side effects for
the patient. Polypharmacy side effects emerge because of drug-drug
interactions, in which activity of one drug may change if taken with another
drug. The knowledge of drug interactions is limited because these complex
relationships are rare, and are usually not observed in relatively small
clinical testing. Discovering polypharmacy side effects thus remains an
important challenge with significant implications for patient mortality. Here,
we present Decagon, an approach for modeling polypharmacy side effects. The
approach constructs a multimodal graph of protein-protein interactions,
drug-protein target interactions, and the polypharmacy side effects, which are
represented as drug-drug interactions, where each side effect is an edge of a
different type. Decagon is developed specifically to handle such multimodal
graphs with a large number of edge types. Our approach develops a new graph
convolutional neural network for multirelational link prediction in multimodal
networks. Decagon predicts the exact side effect, if any, through which a given
drug combination manifests clinically. Decagon accurately predicts polypharmacy
side effects, outperforming baselines by up to 69%. We find that it
automatically learns representations of side effects indicative of
co-occurrence of polypharmacy in patients. Furthermore, Decagon models
particularly well side effects with a strong molecular basis, while on
predominantly non-molecular side effects, it achieves good performance because
of effective sharing of model parameters across edge types. Decagon creates
opportunities to use large pharmacogenomic and patient data to flag and
prioritize side effects for follow-up analysis.
Comment: Presented at ISMB 2018
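The multirelational link prediction described above scores each drug pair once per side effect. A minimal sketch of a tensor-factorization decoder in this style (a global interaction matrix shared across side effects, modulated by per-side-effect diagonal weights) is shown below; the shapes and names are illustrative, not Decagon's released code.

```python
import numpy as np

def polypharmacy_score(z_i, z_j, R, d_r):
    """Sketch of a per-edge-type decoder for multirelational link
    prediction: score one candidate (drug_i, side_effect_r, drug_j) edge.

    z_i, z_j : (d,) drug embeddings
    R        : (d, d) interaction matrix shared across all side effects
    d_r      : (d,) per-side-effect importance weights (diagonal of D_r)
    """
    D_r = np.diag(d_r)
    logit = z_i @ D_r @ R @ D_r @ z_j
    return 1.0 / (1.0 + np.exp(-logit))  # probability of this side effect
```

Sharing R across edge types while keeping d_r specific to each side effect is what lets a model of this shape handle a very large number of edge types without a full parameter matrix per type.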
Inductive Representation Learning on Large Graphs
Low-dimensional embeddings of nodes in large graphs have proved extremely
useful in a variety of prediction tasks, from content recommendation to
identifying protein functions. However, most existing approaches require that
all nodes in the graph are present during training of the embeddings; these
previous approaches are inherently transductive and do not naturally generalize
to unseen nodes. Here we present GraphSAGE, a general, inductive framework that
leverages node feature information (e.g., text attributes) to efficiently
generate node embeddings for previously unseen data. Instead of training
individual embeddings for each node, we learn a function that generates
embeddings by sampling and aggregating features from a node's local
neighborhood. Our algorithm outperforms strong baselines on three inductive
node-classification benchmarks: we classify the category of unseen nodes in
evolving information graphs based on citation and Reddit post data, and we show
that our algorithm generalizes to completely unseen graphs using a multi-graph
dataset of protein-protein interactions.
Comment: Published in NIPS 2017; version with full appendix and minor corrections
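The sample-and-aggregate step described in the abstract can be sketched with a mean aggregator, one of the aggregator variants the paper evaluates. This is a minimal illustrative re-implementation under assumed names, not the released code.

```python
import numpy as np

def sage_mean_layer(h, adj, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator (sketch).

    h       : (num_nodes, d_in) input node features
    adj     : dict mapping node id -> list of neighbor ids
    W_self  : (d_in, d_out) weights for the node's own features
    W_neigh : (d_in, d_out) weights for the aggregated neighborhood
    """
    out = np.zeros((h.shape[0], W_self.shape[1]))
    for v in range(h.shape[0]):
        neigh = adj.get(v, [])
        # Mean-aggregate neighbor features (zero vector if isolated).
        agg = h[neigh].mean(axis=0) if neigh else np.zeros(h.shape[1])
        out[v] = np.maximum(h[v] @ W_self + agg @ W_neigh, 0.0)  # ReLU
    # L2-normalize embeddings, as in the paper.
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)
```

Because the layer is a function of features and neighborhoods rather than a per-node lookup table, it applies unchanged to nodes (and whole graphs) never seen during training, which is what makes the approach inductive.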