Graph Neural Networks for Molecules
Graph neural networks (GNNs), which are capable of learning representations
from graphical data, are naturally suitable for modeling molecular systems.
This review introduces GNNs and their various applications for small organic
molecules. GNNs rely on message-passing operations, a generic yet powerful
framework, to update node features iteratively. Many studies design GNN
architectures to effectively learn topological information of 2D molecule
graphs as well as geometric information of 3D molecular systems. GNNs have been
implemented in a wide variety of molecular applications, including molecular
property prediction, molecular scoring and docking, molecular optimization and
de novo generation, and molecular dynamics simulation. In addition, the review
summarizes recent developments in self-supervised learning for molecules with
GNNs.
Comment: A chapter for the book "Machine Learning in Molecular Sciences". 31 pages, 4 figures.
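The message-passing framework mentioned in this abstract can be sketched in a few lines. The sketch below is a minimal illustration with a fixed averaging update on a toy molecular graph; real GNNs replace the update rule with learned neural networks, and the feature values here are hypothetical.

```python
# Minimal message-passing sketch on a toy molecular graph.
# Hypothetical feature values; real GNNs use learned weights,
# not this fixed averaging update.

def message_passing_step(features, adjacency):
    """One round: each node aggregates neighbor features, then updates."""
    updated = {}
    for node, neighbors in adjacency.items():
        # Aggregate: sum the feature vectors of all neighbors.
        agg = [0.0] * len(features[node])
        for nb in neighbors:
            agg = [a + f for a, f in zip(agg, features[nb])]
        # Update: simple average of the node's own features and the
        # aggregated message; a learned network would replace this rule.
        updated[node] = [(s + a) / 2 for s, a in zip(features[node], agg)]
    return updated

# Toy C-C-O chain with 2-dimensional node features.
adjacency = {"C1": ["C2"], "C2": ["C1", "O"], "O": ["C2"]}
features = {"C1": [1.0, 0.0], "C2": [1.0, 0.0], "O": [0.0, 1.0]}
h1 = message_passing_step(features, adjacency)
```

Stacking several such rounds lets each node's representation absorb information from progressively larger neighborhoods, which is how GNNs capture both 2D topology and, with geometric features, 3D structure.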
Gode -- Integrating Biochemical Knowledge Graph into Pre-training Molecule Graph Neural Network
The precise prediction of molecular properties holds paramount importance in
facilitating the development of innovative treatments and comprehending the
intricate interplay between chemicals and biological systems. In this study, we
propose a novel approach that integrates graph representations of individual
molecular structures with multi-domain information from biomedical knowledge
graphs (KGs). Integrating information from both levels, we can pre-train a more
extensive and robust representation for both molecule-level and KG-level
prediction tasks with our novel self-supervision strategy. For performance
evaluation, we fine-tune our pre-trained model on 11 challenging chemical
property prediction tasks. Results from our framework demonstrate that our
fine-tuned models outperform existing state-of-the-art models.
Comment: This is ongoing work; we are exploring the ability of Gode on other tasks.
Reduced collision fingerprints and pairwise molecular comparisons for explainable property prediction using Deep Learning
The relationships between the structure of chemical compounds and their properties are complex and high dimensional. In the drug development process, multiple properties of a compound often need to be optimized simultaneously, further complicating the task. This work explores two representations of chemical compounds for property prediction tasks. The goal of these suggested representations is improved explainability to better understand the compound property optimization process. First, we decompose the Extended Connectivity Fingerprint (ECFP) algorithm and make it more straightforward for human understanding. We replace a collision-prone hash function with a one-to-one substructure-to-bit relationship. We find that this change does not translate into higher predictive performance of a multilayer perceptron compared to ECFP. However, if the capacity of the predictor is lowered to that of a linear predictor, it does perform better than ECFP. Second, we apply machine learning to Matched Molecular Pair Analysis (MMPA), a drug development design paradigm. MMPA compares pairs of highly similar compounds, differing in structure by a modification at one site. We train prediction models on pairs of compounds to predict differences in activity. We use pairwise similarity constraints like MMPA, but also use randomly sampled pairs to train the models. We find that models perform better on randomly chosen pairs than on pairs with strict similarity constraints. However, the best pairwise models are not able to beat the prediction performance of the simpler single-model baseline.
Both of these investigations, RCFP and pairwise comparisons, aim to approach property prediction in a more explainable way. By using the intuition and experience of medicinal chemists within predictive modelling, we hope to encourage explainability as a necessary component of predictive cheminformatic models.
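The contrast the abstract draws, a collision-prone hash versus a one-to-one substructure-to-bit mapping, can be sketched as follows. The substructure strings and the vocabulary below are hypothetical stand-ins, not real ECFP invariants or the thesis's actual RCFP construction.

```python
# Sketch contrasting ECFP-style hashed folding (collision-prone) with a
# one-to-one substructure-to-bit mapping. Substructure identifiers are
# hypothetical strings, not real ECFP invariants.

def folded_fingerprint(substructures, n_bits=8):
    """Hash each substructure into a fixed-size bit vector; distinct
    substructures can collide on the same bit, obscuring which
    substructure set a bit."""
    bits = [0] * n_bits
    for sub in substructures:
        bits[hash(sub) % n_bits] = 1
    return bits

def one_to_one_fingerprint(substructures, vocabulary):
    """Assign each known substructure its own dedicated bit, so every
    set bit maps back to exactly one substructure (explainable)."""
    present = set(substructures)
    return [1 if sub in present else 0 for sub in vocabulary]

# Hypothetical vocabulary of substructures observed in a training set.
vocab = ["C-C", "C-O", "C=O", "O-H"]
fp = one_to_one_fingerprint(["C-C", "O-H"], vocab)

folded = folded_fingerprint(["C-C", "O-H"])
```

The one-to-one variant trades a fixed, compact vector for a vocabulary-sized one, which is the explainability-for-compactness trade-off the thesis examines.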
A Systematic Survey of Chemical Pre-trained Models
Deep learning has achieved remarkable success in learning representations for
molecules, which is crucial for various biochemical applications, ranging from
property prediction to drug design. However, training Deep Neural Networks
(DNNs) from scratch often requires abundant labeled molecules, which are
expensive to acquire in the real world. To alleviate this issue, tremendous
efforts have been devoted to Chemical Pre-trained Models (CPMs), where DNNs
are pre-trained using large-scale unlabeled molecular databases and then
fine-tuned on specific downstream tasks. Despite this rapid progress, the field
still lacks a systematic review. In this paper, we present the
first survey that summarizes the current progress of CPMs. We first highlight
the limitations of training molecular representation models from scratch to
motivate CPM studies. Next, we systematically review recent advances on this
topic from several key perspectives, including molecular descriptors, encoder
architectures, pre-training strategies, and applications. We also highlight the
challenges and promising avenues for future research, providing a useful
resource for both the machine learning and scientific communities.
Comment: IJCAI 2023, Survey Track.
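The pretrain-then-fine-tune workflow this survey covers can be illustrated schematically. The toy "pre-training" below just builds atom-frequency statistics from unlabeled molecules, a deliberately trivial stand-in for large-scale self-supervised learning; the molecules and atom symbols are hypothetical.

```python
# Toy illustration of the pretrain/fine-tune split: representations are
# derived once from cheap unlabeled data, then reused for downstream
# tasks where labels are scarce. All data here is hypothetical.

def pretrain_embeddings(unlabeled):
    """'Pre-train' by computing atom frequency statistics over an
    unlabeled corpus (stand-in for self-supervised pre-training)."""
    counts = {}
    for mol in unlabeled:
        for atom in mol:
            counts[atom] = counts.get(atom, 0) + 1
    total = sum(counts.values())
    return {atom: c / total for atom, c in counts.items()}

def featurize(mol, embeddings):
    """Represent a molecule as the mean embedding of its atoms; a
    downstream task head would be fit on top of this representation."""
    return sum(embeddings.get(a, 0.0) for a in mol) / len(mol)

# Unlabeled corpus is large and cheap; downstream labels are scarce.
unlabeled = [["C", "C", "O"], ["C", "N"], ["C", "C", "C"]]
emb = pretrain_embeddings(unlabeled)
x = featurize(["C", "O"], emb)
```

In a real CPM the embedding table is a deep encoder trained with objectives such as masked-atom prediction, and fine-tuning updates the encoder (or a small head) on each downstream task.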
Graph neural networks for materials science and chemistry
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they work directly on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, concluding with a roadmap for the further development and application of GNNs.