1,070 research outputs found
A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges
Reaction and retrosynthesis prediction are fundamental tasks in computational
chemistry that have recently garnered attention from both the machine learning
and drug discovery communities. Various deep learning approaches have been
proposed to tackle these problems, and some have achieved initial success. In
this survey, we conduct a comprehensive investigation of advanced deep
learning-based models for reaction and retrosynthesis prediction. We summarize
the design mechanisms, strengths, and weaknesses of state-of-the-art
approaches. Then, we discuss the limitations of current solutions and open
challenges in the problem itself. Finally, we present promising directions to
facilitate future research. To our knowledge, this paper is the first
comprehensive and systematic survey that seeks to provide a unified
understanding of reaction and retrosynthesis prediction.
Comment: Accepted as IJCAI 2023 Surve
MotifRetro: Exploring the Combinability-Consistency Trade-offs in retrosynthesis via Dynamic Motif Editing
Is there a unified framework for graph-based retrosynthesis prediction?
Through analysis of full-, semi-, and non-template retrosynthesis methods, we
discovered that they strive to strike an optimal balance between combinability
and consistency: should atoms be combined as motifs to simplify the
molecular editing process, or should motifs be broken down into atoms to reduce
the vocabulary and improve predictive consistency?
Recent works have studied several specific cases, but none of them explores
different combinability-consistency trade-offs. Therefore, we propose
MotifRetro, a dynamic motif editing framework for retrosynthesis prediction
that can explore the entire trade-off space and unify graph-based models.
MotifRetro comprises two components: RetroBPE, which controls the
combinability-consistency trade-off, and a motif editing model, where we
introduce a novel LG-EGAT module to dynamically add motifs to the molecule. We
conduct extensive experiments on USPTO-50K to explore how the trade-off affects
the model performance and finally achieve state-of-the-art performance
Re-evaluating Retrosynthesis Algorithms with Syntheseus
The planning of how to synthesize molecules, also known as retrosynthesis,
has been a growing focus of the machine learning and chemistry communities in
recent years. Despite the appearance of steady progress, we argue that
imperfect benchmarks and inconsistent comparisons mask systematic shortcomings
of existing techniques. To remedy this, we present a benchmarking library
called syntheseus which promotes best practice by default, enabling consistent
meaningful evaluation of single-step and multi-step retrosynthesis algorithms.
We use syntheseus to re-evaluate a number of previous retrosynthesis
algorithms, and find that the ranking of state-of-the-art models changes when
evaluated carefully. We end with guidance for future works in this area
Retrosynthesis prediction enhanced by in-silico reaction data augmentation
Recent advances in machine learning (ML) have expedited retrosynthesis
research by assisting chemists to design experiments more efficiently. However,
all ML-based methods consume substantial amounts of paired training data (i.e.,
chemical reaction: product-reactant(s) pair), which is costly to obtain.
Moreover, companies view reaction data as a valuable asset and restrict
researchers' access to it. These issues prevent the creation of more
powerful retrosynthesis models due to their data-driven nature. As a response,
we exploit easy-to-access unpaired data (i.e., one component of
product-reactant(s) pair) for generating in-silico paired data to facilitate
model training. Specifically, we present RetroWISE, a self-boosting framework
that employs a base model inferred from real paired data to perform in-silico
reaction generation and augmentation using unpaired data, ultimately leading to
a superior model. On three benchmark datasets, RetroWISE achieves the best
overall performance against state-of-the-art models (e.g., +8.6% top-1 accuracy
on the USPTO-50K test dataset). Moreover, it consistently improves the
prediction accuracy of rare transformations. These results show that RetroWISE
overcomes the training bottleneck with in-silico reactions, thereby paving
the way toward more effective ML-based retrosynthesis models
T-Rex: Text-assisted Retrosynthesis Prediction
As a fundamental task in computational chemistry, retrosynthesis prediction
aims to identify a set of reactants to synthesize a target molecule. Existing
template-free approaches only consider the graph structures of the target
molecule, which often cannot generalize well to rare reaction types and large
molecules. Here, we propose T-Rex, a text-assisted retrosynthesis prediction
approach that exploits pre-trained text language models, such as ChatGPT, to
assist the generation of reactants. T-Rex first exploits ChatGPT to generate a
description for the target molecule and rank candidate reaction centers based
on both the description and the molecular graph. It then re-ranks these candidates
by querying the descriptions for each reactant and examining which group of
reactants can best synthesize the target molecule. We observed that T-Rex
substantially outperformed graph-based state-of-the-art approaches on two
datasets, indicating the effectiveness of considering text information. We
further found that T-Rex outperformed the variant that only uses the ChatGPT-based
description without the re-ranking step, demonstrating how our framework
outperformed a straightforward integration of ChatGPT and graph information.
Collectively, we show that text generated by pre-trained language models can
substantially improve retrosynthesis prediction, opening up new avenues for
exploiting ChatGPT to advance computational chemistry. The code is
available at https://github.com/lauyikfung/T-Rex
Graph neural networks for materials science and chemistry
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs
GLAMM: Genome-Linked Application for Metabolic Maps
The Genome-Linked Application for Metabolic Maps (GLAMM) is a unified web interface for visualizing metabolic networks, reconstructing metabolic networks from annotated genome data, visualizing experimental data in the context of metabolic networks and investigating the construction of novel, transgenic pathways. This simple, user-friendly interface is tightly integrated with the comparative genomics tools of MicrobesOnline [Dehal et al. (2010) Nucleic Acids Research, 38, D396–D400]. GLAMM is available for free to the scientific community at glamm.lbl.gov
InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery
The rapid evolution of artificial intelligence in drug discovery encounters
challenges with generalization and extensive training, yet Large Language
Models (LLMs) offer promise in reshaping interactions with complex molecular
data. Our novel contribution, InstructMol, a multi-modal LLM, effectively
aligns molecular structures with natural language via an instruction-tuning
approach, utilizing a two-stage training strategy that adeptly combines limited
domain-specific data with molecular and textual information. InstructMol
showcases substantial performance improvements in drug discovery-related
molecular tasks, surpassing leading LLMs and significantly reducing the gap
with specialized models, thereby establishing a robust foundation for a
versatile and dependable drug discovery assistant