Heuristics-Guided Exploration of Reaction Mechanisms
For the investigation of chemical reaction networks, the efficient and
accurate determination of all relevant intermediates and elementary reactions
is mandatory. The complexity of such a network may grow rapidly, in particular
if reactive species are involved that might cause a myriad of side reactions.
Without automation, a complete investigation of complex reaction mechanisms is
tedious and possibly unfeasible. Therefore, only the expected dominant reaction
paths of a chemical reaction network (e.g., a catalytic cycle or an enzymatic
cascade) are usually explored in practice. Here, we present a computational
protocol that constructs such networks in a parallelized and automated manner.
Molecular structures of reactive complexes are generated based on heuristic
rules derived from conceptual electronic-structure theory and subsequently
optimized by quantum chemical methods to produce stable intermediates of an
emerging reaction network. Pairs of intermediates in this network that might be
related by an elementary reaction according to some structural similarity
measure are then automatically detected and subjected to an automated search
for the connecting transition state. The results are visualized as an
automatically generated network graph, from which a comprehensive picture of
the mechanism of a complex chemical process can be obtained that greatly
facilitates the analysis of the whole network. We apply our protocol to the
Schrock dinitrogen-fixation catalyst to study alternative pathways of catalytic
ammonia production.
Comment: 27 pages, 9 figures
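The pair-detection step described in this abstract can be illustrated with a minimal sketch. The abstract only says "some structural similarity measure", so the measure used here (the number of bonds that differ between two structures) is an assumption, as are the names `bond`, `bond_difference`, `candidate_pairs`, and the threshold `max_changed_bonds`. The toy intermediates are invented for illustration and are not taken from the paper.

```python
from itertools import combinations


def bond(i, j):
    """Undirected bond between atom indices i and j."""
    return frozenset((i, j))


def bond_difference(bonds_a, bonds_b):
    """Number of bonds present in exactly one of the two structures."""
    return len(bonds_a ^ bonds_b)


def candidate_pairs(intermediates, max_changed_bonds=2):
    """Pairs of intermediates that might be linked by one elementary step.

    Assumed criterion: the bond sets differ by at least one and at most
    `max_changed_bonds` bonds. Each returned pair would then be handed to
    a transition-state search (not shown here).
    """
    pairs = []
    for name_a, name_b in combinations(sorted(intermediates), 2):
        changed = bond_difference(intermediates[name_a], intermediates[name_b])
        if 0 < changed <= max_changed_bonds:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical intermediates, each described only by its set of bonds.
intermediates = {
    "A": {bond(0, 1), bond(1, 2)},
    "B": {bond(0, 1), bond(1, 2), bond(2, 3)},
    "C": {bond(0, 3), bond(4, 5), bond(5, 6), bond(6, 7)},
}

edges = candidate_pairs(intermediates)
```

The resulting `edges` list is the edge set of a small network graph. A production workflow would likely use a richer similarity measure and 3D structures, which this sketch does not attempt.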
Automated exploration of prebiotic chemical reaction space: progress and perspectives
Prebiotic chemistry often involves the study of complex systems of chemical reactions that form large networks containing many diverse species. Such complex systems may have given rise to emergent phenomena that ultimately led to the origin of life on Earth. The environmental conditions and processes involved in this emergence may not be fully recapitulable, making it difficult for experimentalists to study prebiotic systems in laboratory simulations. Computational chemistry offers efficient ways to study such chemical systems and identify the ones most likely to display complex properties associated with life. Here, we review tools and techniques for modelling prebiotic chemical reaction networks and outline possible ways to identify self-replicating features that are central to many origin-of-life models.
Molecular geometric deep learning
Geometric deep learning (GDL) has demonstrated huge power and enormous
potential in molecular data analysis. However, a great challenge still remains
for highly efficient molecular representations. Currently, covalent-bond-based
molecular graphs are the de facto standard for representing molecular topology
at the atomic level. Here we demonstrate, for the first time, that molecular
graphs constructed only from non-covalent bonds can achieve similar or even
better results than covalent-bond-based models in molecular property
prediction. This demonstrates the great potential of novel molecular
representations beyond the de facto standard of covalent-bond-based molecular
graphs. Based on the finding, we propose molecular geometric deep learning
(Mol-GDL). The essential idea is to incorporate a more general molecular
representation into GDL models. In our Mol-GDL, molecular topology is modeled
as a series of molecular graphs, each focusing on a different scale of atomic
interactions. In this way, both covalent interactions and non-covalent
interactions are incorporated into the molecular representation on an equal
footing. We systematically test Mol-GDL on fourteen commonly used benchmark
datasets. The results show that Mol-GDL can achieve better performance
than state-of-the-art (SOTA) methods. Source code and data are available at
https://github.com/CS-BIO/Mol-GDL
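The idea of modeling molecular topology as a series of graphs, each covering a different scale of atomic interactions, can be sketched as follows. This is not the Mol-GDL implementation: building one graph per interatomic distance range is an assumption made for illustration, and the function names, the coordinates, and the cutoff values in `scales` are all hypothetical.

```python
from itertools import combinations
from math import dist


def graph_for_scale(coords, lower, upper):
    """Edges (i, j) whose interatomic distance d satisfies lower <= d < upper."""
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if lower <= dist(coords[i], coords[j]) < upper
    ]


def multiscale_graphs(coords, scales):
    """One edge list per distance range, keyed by the (lower, upper) range."""
    return {scale: graph_for_scale(coords, *scale) for scale in scales}


# Hypothetical atom positions (arbitrary length units) and distance ranges.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
scales = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]

graphs = multiscale_graphs(coords, scales)
```

In this toy setup the shortest range roughly plays the role of covalent contacts and the longer ranges capture non-covalent neighbours, so every scale enters the representation in the same form. Each edge list could then feed its own graph neural network branch.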
Evolutionary Computation and QSAR Research
The successful high throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR, and the future trends in joint or multi-task feature selection methods.
Funding: Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia. Consellería de Economía e Industria, 10SIN105004P
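The variable-selection step described in this abstract can be illustrated with a minimal genetic algorithm over binary descriptor masks. This is a sketch under stated assumptions and not a method from the review: the toy fitness (summed absolute Pearson correlation minus a size penalty), the operators (truncation selection, one-point crossover, single bit-flip mutation, elitism), all function names, and the data are hypothetical. It assumes at least two descriptors and a population of at least four.

```python
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def fitness(mask, descriptors, activity, penalty=0.1):
    """Toy score: summed |correlation| of chosen descriptors minus a size penalty."""
    chosen = [col for keep, col in zip(mask, descriptors) if keep]
    return sum(abs(pearson(col, activity)) for col in chosen) - penalty * len(chosen)


def select_features(descriptors, activity, pop_size=8, generations=20, seed=0):
    """Evolve binary masks; returns (best mask, best score per generation)."""
    rng = random.Random(seed)
    n = len(descriptors)

    def score(mask):
        return fitness(mask, descriptors, activity)

    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = list(a[:cut] + b[cut:])
            flip = rng.randrange(n)             # single bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(tuple(child))
        population = children
    best = max(population, key=score)
    history.append(score(best))
    return best, history


# Hypothetical data: four descriptor columns and one activity column.
activity = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptors = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # perfectly correlated with activity
    [2.0, 1.0, 2.0, 1.0, 2.0],   # uncorrelated with activity
    [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
    [1.0, 1.0, 1.0, 1.0, 1.0],   # constant, carries no information
]

best_mask, history = select_features(descriptors, activity)
```

Because the best mask of each generation is carried over unchanged, the best score recorded in `history` never decreases. A real QSAR workflow would replace the toy fitness with a cross-validated model score.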
Kekulescope: Prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images
The application of convolutional neural networks (ConvNets) to harness
high-content screening images or 2D compound representations is gaining
increasing attention in drug discovery. However, existing applications often
require large data sets for training, or sophisticated pretraining schemes.
Here, we show using 33 IC50 data sets from ChEMBL 23 that the in vitro activity
of compounds on cancer cell lines and protein targets can be accurately
predicted on a continuous scale from their Kekule structure representations
alone by extending existing architectures, which were pretrained on unrelated
image data sets. We show that the predictive power of the generated models is
comparable to that of Random Forest (RF) models and fully-connected Deep Neural
Networks trained on circular (Morgan) fingerprints. Notably, including
additional fully-connected layers further increases the predictive power of the
ConvNets by up to 10%. Analysis of the predictions generated by RF models and
ConvNets shows that simply averaging the outputs of the two model types yields
significantly lower prediction errors for multiple data sets than either model
alone, although the effect size is small. This indicates that the features
extracted by the convolutional layers of the ConvNets provide predictive signal
complementary to Morgan fingerprints.
Lastly, we show that multi-task ConvNets trained on compound images make it
possible to model COX isoform selectivity on a continuous scale with errors in
prediction comparable to the uncertainty of the data. Overall, in this work we
present a set of ConvNet architectures for the prediction of compound activity
from their Kekule structure representations with state-of-the-art performance
that require no generation of compound descriptors or use of sophisticated
image processing techniques.
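The averaging of RF and ConvNet outputs mentioned in this abstract can be sketched in a few lines. The numbers below are invented solely to show the mechanics and are not results from the paper. The function names are hypothetical, and RMSE is assumed here as the error metric.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def average_predictions(first, second):
    """Unweighted mean of two models' predictions, compound by compound."""
    return [(a + b) / 2 for a, b in zip(first, second)]


# Hypothetical activity values and model outputs (illustration only).
observed = [6.0, 7.0, 8.0]
rf_pred = [6.4, 6.6, 8.5]
convnet_pred = [5.8, 7.6, 7.7]

ensemble_pred = average_predictions(rf_pred, convnet_pred)
errors = {
    "rf": rmse(rf_pred, observed),
    "convnet": rmse(convnet_pred, observed),
    "ensemble": rmse(ensemble_pred, observed),
}
```

In this toy case the two models err in opposite directions on each compound, so the ensemble error is lower than either individual error. Averaging helps only to the extent that the models' errors are not strongly correlated.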