1,021 research outputs found
Geometric Latent Diffusion Models for 3D Molecule Generation
Generative models, especially diffusion models (DMs), have achieved promising
results for generating feature-rich geometries and advancing foundational
science problems such as molecule design. Inspired by the recent huge success
of Stable (latent) Diffusion models, we propose a novel and principled method
for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM).
GeoLDM is the first latent DM for the molecular geometry domain, composed
of autoencoders encoding structures into continuous latent codes and DMs
operating in the latent space. Our key innovation is that, to model 3D
molecular geometries, we capture their critical roto-translational equivariance
constraints by building a point-structured latent space with both invariant
scalars and equivariant tensors. Extensive experiments demonstrate that GeoLDM
can consistently achieve better performance on multiple molecule generation
benchmarks, with up to 7% improvement for the valid percentage of large
biomolecules. Results also demonstrate GeoLDM's higher capacity for
controllable generation thanks to the latent modeling. Code is provided at
https://github.com/MinkaiXu/GeoLDM.
Comment: Published at ICML 202
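The distinction between invariant scalars and equivariant tensors in the abstract above can be illustrated with a small check. This is a minimal sketch, not GeoLDM code: the function names and the toy coordinates are assumptions made for illustration. It shows that pairwise distances (scalars) stay the same when a point cloud is rotated and translated, while the coordinates themselves change.

```python
import math

def pairwise_distances(points):
    """Return all pairwise Euclidean distances of a 3D point cloud."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

def rotate_z_and_translate(points, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]

# Hypothetical three-atom geometry, used only for this illustration.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
moved = rotate_z_and_translate(atoms, angle=0.7, shift=(1.0, -2.0, 3.0))

before = pairwise_distances(atoms)
after = pairwise_distances(moved)

# Distances are invariant; coordinates move with the transformation.
distances_match = all(abs(a - b) < 1e-9 for a, b in zip(before, after))
coordinates_changed = moved != atoms
```

A model whose scalar features depend only on such invariant quantities, and whose vector features rotate together with the input, respects the symmetry the abstract refers to.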
DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing
Proteins play a critical role in carrying out biological functions, and their
3D structures are essential in determining their functions. Accurately
predicting the conformation of protein side-chains given their backbones is
important for applications in protein structure prediction, design and
protein-protein interactions. Traditional methods are computationally intensive
and have limited accuracy, while existing machine learning methods treat the
problem as a regression task and overlook the restrictions imposed by the
constant covalent bond lengths and angles. In this work, we present DiffPack, a
torsional diffusion model that learns the joint distribution of side-chain
torsional angles, the only degrees of freedom in side-chain packing, by
diffusing and denoising on the torsional space. To avoid issues arising from
simultaneous perturbation of all four torsional angles, we propose
autoregressively generating the four torsional angles from χ1 to χ4
and training diffusion models for each torsional angle. We evaluate the method
on several benchmarks for protein side-chain packing and show that our method
achieves improvements of 11.9% and 13.5% in angle accuracy on CASP13 and
CASP14, respectively, with a significantly smaller model size (60x fewer
parameters). Additionally, we show the effectiveness of our method in enhancing
side-chain predictions in the AlphaFold2 model. Code will be available upon
acceptance.
Comment: Under review
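Because torsional angles are periodic, any noise added on the torsional space has to be wrapped back onto the circle. The following is a minimal sketch of that forward perturbation only, assuming Gaussian noise and angles in radians. It is not the DiffPack implementation: it does not include the learned denoiser or the autoregressive χ1-to-χ4 scheme, and the angle values are hypothetical.

```python
import math
import random

def wrap_angle(theta: float) -> float:
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi

def perturb_torsions(chis, sigma, rng):
    """Add Gaussian noise to each torsional angle, then wrap it back onto the circle."""
    return [wrap_angle(chi + rng.gauss(0.0, sigma)) for chi in chis]

rng = random.Random(0)            # fixed seed so the example is repeatable
chis = [-1.0, 3.0, 0.5, -3.0]     # hypothetical chi1..chi4 values in radians
noisy = perturb_torsions(chis, sigma=0.5, rng=rng)
```

Wrapping keeps every perturbed angle a valid torsion, so bond lengths and bond angles are never touched by the noise.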
Graph Neural Networks for Molecules
Graph neural networks (GNNs), which are capable of learning representations
from graphical data, are naturally suitable for modeling molecular systems.
This review introduces GNNs and their various applications for small organic
molecules. GNNs rely on message-passing operations, a generic yet powerful
framework, to update node features iteratively. Many studies design GNN
architectures to effectively learn topological information of 2D molecule
graphs as well as geometric information of 3D molecular systems. GNNs have been
implemented in a wide variety of molecular applications, including molecular
property prediction, molecular scoring and docking, molecular optimization and
de novo generation, and molecular dynamics simulation. The review
also summarizes recent developments in self-supervised learning for
molecules with GNNs.
Comment: A chapter for the book "Machine Learning in Molecular Sciences". 31
pages, 4 figures
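The message-passing operation mentioned in the abstract above can be sketched in a few lines. This is a deliberately simplified illustration, assuming sum aggregation and no learned weights; the function name and the toy graph are not taken from the review.

```python
from typing import Dict, List

def message_passing_step(
    features: Dict[int, List[float]],
    neighbors: Dict[int, List[int]],
) -> Dict[int, List[float]]:
    """One round of message passing: each node adds the sum of its neighbors' features."""
    updated = {}
    for node, h in features.items():
        aggregated = [0.0] * len(h)
        for nb in neighbors.get(node, []):
            for i, value in enumerate(features[nb]):
                aggregated[i] += value
        # Update rule (assumed for illustration): new feature = own feature + aggregated message.
        updated[node] = [own + msg for own, msg in zip(h, aggregated)]
    return updated

# Toy path graph 0 - 1 - 2 with one scalar feature per node.
features = {0: [1.0], 1: [2.0], 2: [3.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

h1 = message_passing_step(features, neighbors)
# h1 == {0: [3.0], 1: [6.0], 2: [5.0]}
```

Repeating the step lets information travel further along the graph; practical GNNs replace the plain sum and addition with learned functions.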
Barking up the right tree: An approach to search over molecule synthesis DAGs
When designing new molecules with particular properties, it is not only
important what to make but crucially how to make it. These instructions form a synthesis directed acyclic graph (DAG), describing how a large vocabulary of simple building blocks can be recursively combined through chemical reactions to create more complicated molecules of interest. In contrast, many current deep generative models for molecules ignore synthesizability. We therefore propose a deep generative model that better represents the real-world process, by directly outputting molecule synthesis DAGs. We argue that this provides interpretability as well as sensible inductive biases, ensuring that our model searches over the same chemical space that chemists would also have access to. We show that our approach is able to model chemical space well, producing a wide range of diverse molecules, and allows for unconstrained optimization of an inherently constrained problem: maximize certain chemical properties such that discovered molecules are synthesizable.
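A synthesis DAG of the kind described above can be represented as a mapping from each product to the reactants it is made from. The sketch below is an illustration only, assuming Python 3.9+ for `graphlib`; the molecule names are hypothetical placeholders and the code is not the paper's model.

```python
from graphlib import TopologicalSorter

# Each key is a product; its value is the set of reactants combined to make it.
# All names are hypothetical placeholders.
synthesis_dag = {
    "intermediate_1": {"block_A", "block_B"},
    "target": {"intermediate_1", "block_C"},
}

# A topological order lists every reactant before the product that needs it,
# which is a valid order in which to carry out the synthesis steps.
order = list(TopologicalSorter(synthesis_dag).static_order())

# Building blocks are the nodes that are never produced by a reaction.
building_blocks = sorted(m for m in order if m not in synthesis_dag)
```

Reading the order from left to right gives one feasible sequence of steps from purchasable building blocks to the target molecule.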
Modeling and design of heterogeneous hierarchical bioinspired spider web structures using generative deep learning and additive manufacturing
Spider webs are incredible biological structures, composed of thin but strong
silk filaments arranged into complex hierarchical architectures with
striking mechanical properties (e.g., lightweight but high strength, achieving
diverse mechanical responses). While simple 2D orb webs can easily be mimicked,
the modeling and synthesis of 3D-based web structures remain challenging,
partly due to the rich set of design features. Here we provide a detailed
analysis of the heterogeneous graph structures of spider webs, and use deep
learning as a way to model and then synthesize artificial, bio-inspired 3D web
structures. The generative AI models are conditioned based on key geometric
parameters (including average edge length, number of nodes, average node
degree, and others). To identify graph construction principles, we use
inductive representation sampling of large experimentally determined spider web
graphs, to yield a dataset that is used to train three conditional generative
models: 1) An analog diffusion model inspired by nonequilibrium thermodynamics,
with sparse neighbor representation, 2) a discrete diffusion model with full
neighbor representation, and 3) an autoregressive transformer architecture with
full neighbor representation. All three models are scalable, produce complex,
de novo bio-inspired spider web mimics, and successfully construct graphs that
meet the design objectives. We further propose an algorithm that assembles web
samples produced by the generative models into larger-scale structures based on
a series of geometric design targets, including helical and parametric shapes,
mimicking and extending natural design principles towards integration with
diverging engineering objectives. Several webs are manufactured using 3D
printing and tested to assess mechanical properties.
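The geometric conditioning parameters named in the abstract above (number of nodes, average edge length, average node degree) are simple to compute from a graph with 3D node positions. The following is a minimal sketch assuming an undirected graph given as a coordinate dictionary and an edge list; the function name and the toy graph are illustrative and not taken from the paper.

```python
import math

def graph_descriptors(coords, edges):
    """Compute simple geometric descriptors of an undirected graph embedded in 3D."""
    num_nodes = len(coords)
    lengths = [math.dist(coords[u], coords[v]) for u, v in edges]
    return {
        "num_nodes": num_nodes,
        "avg_edge_length": sum(lengths) / len(lengths) if lengths else 0.0,
        # Each undirected edge contributes to the degree of two nodes.
        "avg_node_degree": 2 * len(edges) / num_nodes if num_nodes else 0.0,
    }

# Toy web fragment: three nodes joined by two unit-length edges.
coords = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0), 2: (1.0, 1.0, 0.0)}
edges = [(0, 1), (1, 2)]

descriptors = graph_descriptors(coords, edges)
```

Values like these could serve as the conditioning vector for a generative model, or as targets to check whether a generated graph meets its design objectives.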
A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design
Structure-based drug design (SBDD), which utilizes the three-dimensional
geometry of proteins to identify potential drug candidates, is becoming
increasingly vital in drug discovery. However, traditional methods based on
physicochemical modeling and experts' domain knowledge are time-consuming and
laborious. The recent advancements in geometric deep learning, which integrates
and processes 3D geometric data, coupled with the availability of accurate
protein 3D structure predictions from tools like AlphaFold, have significantly
propelled progress in structure-based drug design. In this paper, we
systematically review the recent progress of geometric deep learning for
structure-based drug design. We start with a brief discussion of the mainstream
tasks in structure-based drug design, commonly used 3D protein representations
and representative predictive/generative models. Then we delve into detailed
reviews for each task (binding site prediction, binding pose generation,
de novo molecule generation, linker design, and binding affinity
prediction), including the problem setup, representative methods, datasets, and
evaluation metrics. Finally, we conclude this survey with the current
challenges and highlight potential opportunities of geometric deep learning for
structure-based drug design.
Comment: 14 pages
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
Advances in artificial intelligence (AI) are fueling a new paradigm of
discoveries in natural sciences. Today, AI has started to advance natural
sciences by improving, accelerating, and enabling our understanding of natural
phenomena at a wide range of spatial and temporal scales, giving rise to a new
area of research known as AI for science (AI4Science). Being an emerging
research paradigm, AI4Science is unique in that it is an enormous and highly
interdisciplinary area. Thus, a unified and technical treatment of this field
is needed yet challenging. This work aims to provide a technically thorough
account of a subarea of AI4Science; namely, AI for quantum, atomistic, and
continuum systems. These areas aim at understanding the physical world from the
subatomic (wavefunctions and electron density), atomic (molecules, proteins,
materials, and interactions), to macro (fluids, climate, and subsurface) scales
and form an important subarea of AI4Science. A unique advantage of focusing on
these areas is that they largely share a common set of challenges, thereby
allowing a unified and foundational treatment. A key common challenge is how to
capture physics first principles, especially symmetries, in natural systems by
deep learning methods. We provide an in-depth yet intuitive account of
techniques to achieve equivariance to symmetry transformations. We also discuss
other common technical challenges, including explainability,
out-of-distribution generalization, knowledge transfer with foundation and
large language models, and uncertainty quantification. To facilitate learning
and education, we provide categorized lists of resources that we found to be
useful. We strive to be thorough and unified and hope this initial effort may
trigger more community interest and efforts to further advance AI4Science.