43,481 research outputs found
Enhancing Model Learning and Interpretation Using Multiple Molecular Graph Representations for Compound Property and Activity Prediction
Graph neural networks (GNNs) demonstrate great performance in compound
property and activity prediction due to their capability to efficiently learn
complex molecular graph structures. However, two main limitations persist:
compound representation and model interpretability. While atom-level
molecular graph representations are commonly used because of their ability to
capture natural topology, they may not fully express important substructures or
functional groups which significantly influence molecular properties.
Consequently, recent research proposes alternative representations employing
reduction techniques to integrate higher-level information and leverages both
representations for model learning. However, the effects of different
molecular graph representations on model learning and interpretation remain
understudied. Interpretability is also crucial for drug discovery as it can
offer chemical insights and inspiration for optimization. Numerous studies
attempt to include model interpretation to explain the rationale behind
predictions, but most of them focus solely on individual predictions, with little
analysis of interpretation across different molecular graph representations.
This research introduces multiple molecular graph representations that
incorporate higher-level information and investigates their effects on model
learning and interpretation from diverse perspectives. The results indicate
that combining atom graph representation with reduced molecular graph
representation can yield promising model performance. Furthermore, the
interpretation results can provide significant features and potential
substructures consistently aligning with background knowledge. These multiple
molecular graph representations and interpretation analysis can bolster model
comprehension and facilitate relevant applications in drug discovery.
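The idea of a reduced molecular graph mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the functional groups are already given as lists of atom indices. The function name `reduce_graph` and the acetic acid example are hypothetical and are not taken from the paper, which does not specify its reduction procedure here.

```python
def reduce_graph(num_atoms, bonds, groups):
    """Collapse each atom group into a single node (hypothetical helper).

    num_atoms: number of atoms in the atom-level graph
    bonds:     list of (atom_a, atom_b) index pairs
    groups:    list of atom-index lists, one per substructure to merge
    Returns (atom_to_node mapping, sorted list of reduced-graph edges).
    """
    atom_to_node = {}
    # Every listed group becomes one node of the reduced graph.
    for node_id, group in enumerate(groups):
        for atom in group:
            atom_to_node[atom] = node_id
    # Atoms outside any group are kept as singleton nodes.
    next_id = len(groups)
    for atom in range(num_atoms):
        if atom not in atom_to_node:
            atom_to_node[atom] = next_id
            next_id += 1
    # Keep only bonds that connect two different reduced nodes.
    reduced_edges = set()
    for a, b in bonds:
        u, v = atom_to_node[a], atom_to_node[b]
        if u != v:
            reduced_edges.add((min(u, v), max(u, v)))
    return atom_to_node, sorted(reduced_edges)


# Acetic acid heavy atoms (assumed indexing):
# 0 = methyl C, 1 = carbonyl C, 2 = carbonyl O, 3 = hydroxyl O
bonds = [(0, 1), (1, 2), (1, 3)]
groups = [[1, 2, 3]]  # treat the carboxyl group as one node
mapping, edges = reduce_graph(4, bonds, groups)
print(mapping, edges)
```

A model could then consume both the atom-level graph and this reduced graph, which is the combination the abstract reports as promising.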
Molecular Joint Representation Learning via Multi-modal Information
In recent years, artificial intelligence has played an important role in
accelerating the whole process of drug discovery. Various molecular
representation schemes of different modalities (e.g. textual sequence or graph)
have been developed. By digitally encoding them, different chemical information
can be learned through corresponding network structures. Molecular graphs and
the Simplified Molecular Input Line Entry System (SMILES) are currently popular
means of molecular representation learning. Previous works have attempted to
combine both of them to address the loss of specific information in
single-modal representations on various tasks. To further fuse such
multi-modal information, the correspondence between the chemical features learned
from different representations should be considered. To realize this, we propose
a novel framework of molecular joint representation learning via Multi-Modal
information of SMILES and molecular Graphs, called MMSG. We improve the
self-attention mechanism by introducing a bond-level graph representation as an
attention bias in the Transformer to reinforce feature correspondence between
multi-modal information. We further propose a Bidirectional Message
Communication Graph Neural Network (BMC GNN) to strengthen the information flow
aggregated from graphs for further combination. Numerous experiments on public
property prediction datasets have demonstrated the effectiveness of our model.
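The attention-bias idea in the abstract above can be sketched as scaled dot-product attention with an additive term on the scores. This is a minimal illustration, not the MMSG implementation: the abstract does not say how the bond-level bias is computed, so the sketch assumes it arrives as one scalar per token pair, and the names `biased_attention` and `bias` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive bias on the scores.

    q, k, v: lists of equal-length float vectors, one per token
    bias:    bias[i][j] is added to the score of query i against key j
             (assumed here to come from a bond-level graph encoding)
    Returns (outputs, attention_weights).
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + bias[i][j]
            for j, kj in enumerate(k)
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors for this query.
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


# Three tokens with identical queries and keys, so unbiased attention is uniform.
q = k = [[1.0, 0.0]] * 3
v = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
no_bias = [[0.0] * 3 for _ in range(3)]
bond_bias = [[0.0, 2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # tokens 0 and 1 "bonded"

_, w_plain = biased_attention(q, k, v, no_bias)
_, w_biased = biased_attention(q, k, v, bond_bias)
print(w_plain[0], w_biased[0])
```

With the bias applied, token 0 attends more strongly to token 1 than it does without it, which is the sense in which structural information can steer attention.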
Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated into an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure-Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.