1,021 research outputs found

Geometric Latent Diffusion Models for 3D Molecule Generation

Author: Dror Ron
Ermon Stefano
Leskovec Jure
Powers Alexander
Xu Minkai
Publication date: 01/05/2023

Generative models, especially diffusion models (DMs), have achieved promising results for generating feature-rich geometries and advancing foundational science problems such as molecule design. Inspired by the recent huge success of Stable (latent) Diffusion models, we propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM). GeoLDM is the first latent DM for the molecular geometry domain, composed of autoencoders that encode structures into continuous latent codes and DMs that operate in the latent space. Our key innovation is that, for modeling 3D molecular geometries, we capture their critical roto-translational equivariance constraints by building a point-structured latent space with both invariant scalars and equivariant tensors. Extensive experiments demonstrate that GeoLDM can consistently achieve better performance on multiple molecule generation benchmarks, with up to 7% improvement for the valid percentage of large biomolecules. Results also demonstrate GeoLDM's higher capacity for controllable generation thanks to the latent modeling. Code is provided at https://github.com/MinkaiXu/GeoLDM.

Comment: Published at ICML 202

arXiv.org e-Print Archive
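
The GeoLDM abstract above distinguishes invariant scalars from equivariant tensors under rotations and translations. The following is a minimal sketch of that distinction only, not the authors' model: the toy coordinates, the rotation about the z-axis, and all function names are assumptions chosen for illustration. It checks that pairwise distances are unchanged by a rigid motion (invariance) and that centroid-relative positions rotate together with the input (equivariance).

```python
import math


def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def transform(points, theta, shift):
    """Apply a rigid motion: rotate about z, then translate by shift."""
    moved = []
    for p in points:
        x, y, z = rotate_z(p, theta)
        moved.append((x + shift[0], y + shift[1], z + shift[2]))
    return moved


def pairwise_distances(points):
    """Invariant scalars: distances between all unordered point pairs."""
    return [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]


def centered(points):
    """Equivariant vectors: positions relative to the centroid."""
    n = len(points)
    centroid = tuple(sum(p[k] for p in points) / n for k in range(3))
    return [tuple(p[k] - centroid[k] for k in range(3)) for p in points]


# Hypothetical toy geometry (three atoms), not taken from any dataset.
coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (1.5, 1.0, 0.3)]
theta, shift = 0.7, (2.0, -1.0, 0.5)
moved = transform(coords, theta, shift)

# Invariance: distances are the same before and after the rigid motion.
invariant_ok = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(coords), pairwise_distances(moved))
)

# Equivariance: centering the moved points equals rotating the centered points.
rotated_centered = [rotate_z(p, theta) for p in centered(coords)]
equivariant_ok = all(
    math.isclose(u, v, abs_tol=1e-9)
    for p, q in zip(centered(moved), rotated_centered)
    for u, v in zip(p, q)
)
```

Subtracting the centroid removes the translation, which is why only the rotation remains in the equivariance check.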

DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing

Author: Misra Sanchit
Tang Jian
Zhan Yangtian
Zhang Zuobai
Zhong Bozitao
Publication date: 01/06/2023

Proteins play a critical role in carrying out biological functions, and their 3D structures are essential in determining their functions. Accurately predicting the conformation of protein side-chains given their backbones is important for applications in protein structure prediction, design and protein-protein interactions. Traditional methods are computationally intensive and have limited accuracy, while existing machine learning methods treat the problem as a regression task and overlook the restrictions imposed by the constant covalent bond lengths and angles. In this work, we present DiffPack, a torsional diffusion model that learns the joint distribution of side-chain torsional angles, the only degrees of freedom in side-chain packing, by diffusing and denoising on the torsional space. To avoid issues arising from simultaneous perturbation of all four torsional angles, we propose autoregressively generating the four torsional angles from χ1 to χ4 and training diffusion models for each torsional angle. We evaluate the method on several benchmarks for protein side-chain packing and show that our method achieves improvements of 11.9% and 13.5% in angle accuracy on CASP13 and CASP14, respectively, with a significantly smaller model size (60x fewer parameters). Additionally, we show the effectiveness of our method in enhancing side-chain predictions in the AlphaFold2 model. Code will be made available upon acceptance.

Comment: Under review

arXiv.org e-Print Archive
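
The DiffPack abstract above describes generating the four side-chain torsional angles one at a time, from χ1 to χ4, on a periodic (torsional) space. The sketch below illustrates only those two ideas, angle wrapping and an autoregressive loop. It is not the authors' implementation: `placeholder_sampler` is a hypothetical stand-in for a trained per-angle diffusion model, and the noise scale is arbitrary.

```python
import math
import random


def wrap_angle(angle):
    """Map an angle in radians onto the periodic interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def perturb(angle, sigma, rng):
    """Add Gaussian noise to an angle and wrap it back onto the circle."""
    return wrap_angle(angle + rng.gauss(0.0, sigma))


def sample_autoregressively(sample_one, n_angles=4):
    """Generate angles in order; each sampler call sees the earlier angles."""
    chis = []
    for index in range(n_angles):
        chis.append(wrap_angle(sample_one(index, tuple(chis))))
    return chis


rng = random.Random(0)


def placeholder_sampler(index, previous):
    # Hypothetical: a real model would denoise chi_{index+1} conditioned on
    # the backbone and on the previously generated angles in `previous`.
    return rng.uniform(-math.pi, math.pi) + 0.1 * sum(previous)


chis = sample_autoregressively(placeholder_sampler)
noisy_chi1 = perturb(chis[0], sigma=0.5, rng=rng)
```

Wrapping after every update keeps each angle on the circle, so a value slightly above π is treated as a value slightly above -π rather than as an outlier.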

Graph Neural Networks for Molecules

Author: Farimani Amir Barati
Li Zijie
Wang Yuyang
Publication date: 06/02/2023

Graph neural networks (GNNs), which are capable of learning representations from graph-structured data, are naturally suitable for modeling molecular systems. This review introduces GNNs and their various applications for small organic molecules. GNNs rely on message-passing operations, a generic yet powerful framework, to update node features iteratively. Many studies design GNN architectures to effectively learn topological information of 2D molecule graphs as well as geometric information of 3D molecular systems. GNNs have been implemented in a wide variety of molecular applications, including molecular property prediction, molecular scoring and docking, molecular optimization and de novo generation, and molecular dynamics simulation. The review also summarizes the recent development of self-supervised learning for molecules with GNNs.

Comment: A chapter for the book "Machine Learning in Molecular Sciences". 31 pages, 4 figures

arXiv.org e-Print Archive
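
The review abstract above notes that GNNs update node features iteratively through message passing. As a minimal sketch of that idea, assuming an unweighted sum aggregation with no learned parameters (real GNN layers apply learned functions to messages and updates), the toy graph below uses the heavy atoms of ethanol with the atomic number as a single scalar feature. The function name and graph encoding are illustrative assumptions.

```python
def message_passing_step(features, edges):
    """One round: each node adds the sum of its neighbours' features to its own."""
    incoming = {node: 0.0 for node in features}
    for u, v in edges:  # edges are treated as undirected
        incoming[u] += features[v]
        incoming[v] += features[u]
    return {node: features[node] + incoming[node] for node in features}


# Toy molecular graph: C1-C2-O, feature = atomic number.
features = {"C1": 6.0, "C2": 6.0, "O": 8.0}
edges = [("C1", "C2"), ("C2", "O")]

h1 = message_passing_step(features, edges)  # information from 1-hop neighbours
h2 = message_passing_step(h1, edges)        # information from up to 2 hops away
```

After one round the two carbons already differ (C2 has seen the oxygen, C1 has not); after two rounds C1 also carries information about the oxygen two bonds away.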

Barking up the right tree: An approach to search over molecule synthesis DAGs

Author: Bradshaw J
Hernández-Lobato JM
Kusner MJ
Paige B
Segler MHS
Publication venue: Advances in Neural Information Processing Systems
Publication date: 01/01/2020

When designing new molecules with particular properties, it is important not only what to make but, crucially, how to make it. These instructions form a synthesis directed acyclic graph (DAG), describing how a large vocabulary of simple building blocks can be recursively combined through chemical reactions to create more complicated molecules of interest. In contrast, many current deep generative models for molecules ignore synthesizability. We therefore propose a deep generative model that better represents the real-world process by directly outputting molecule synthesis DAGs. We argue that this provides sensible inductive biases, ensuring that our model searches over the same chemical space that chemists would also have access to, as well as interpretability. We show that our approach is able to model chemical space well, producing a wide range of diverse molecules, and allows for unconstrained optimization of an inherently constrained problem: maximize certain chemical properties such that discovered molecules are synthesizable.

arXiv.org e-Print Archive

Apollo (Cambridge)
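
The synthesis-DAG abstract above describes building blocks that are recursively combined through reactions into more complex molecules. A minimal sketch of such a DAG as a plain data structure follows; the node names (`block_A`, `intermediate_1`, and so on) are hypothetical placeholders, and the ordering uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). It illustrates the graph structure only, not the generative model proposed in the paper.

```python
from graphlib import TopologicalSorter

# Each product maps to the set of reactants it is made from.
# All names are hypothetical placeholders, not real compounds.
synthesis_dag = {
    "intermediate_1": {"block_A", "block_B"},
    "intermediate_2": {"block_B", "block_C"},
    "final_product": {"intermediate_1", "intermediate_2"},
}

# A valid synthesis order makes every reactant before any product that needs it.
order = list(TopologicalSorter(synthesis_dag).static_order())

# Building blocks are the nodes that never appear as a product.
building_blocks = sorted(
    {r for reactants in synthesis_dag.values() for r in reactants}
    - set(synthesis_dag)
)
```

Note that `block_B` feeds two intermediates, which is what makes this a DAG rather than a tree.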

Modeling and design of heterogeneous hierarchical bioinspired spider web structures using generative deep learning and additive manufacturing

Author: Buehler Markus J.
Lee Nic A.
Lu Wei
Publication date: 11/04/2023

Spider webs are incredible biological structures, comprising thin but strong silk filaments arranged into complex hierarchical architectures with striking mechanical properties (e.g., lightweight but high strength, achieving diverse mechanical responses). While simple 2D orb webs can easily be mimicked, the modeling and synthesis of 3D-based web structures remain challenging, partly due to the rich set of design features. Here we provide a detailed analysis of the heterogeneous graph structures of spider webs, and use deep learning as a way to model and then synthesize artificial, bio-inspired 3D web structures. The generative AI models are conditioned on key geometric parameters (including average edge length, number of nodes, average node degree, and others). To identify graph construction principles, we use inductive representation sampling of large experimentally determined spider web graphs to yield a dataset that is used to train three conditional generative models: 1) an analog diffusion model inspired by nonequilibrium thermodynamics, with sparse neighbor representation, 2) a discrete diffusion model with full neighbor representation, and 3) an autoregressive transformer architecture with full neighbor representation. All three models are scalable, produce complex, de novo bio-inspired spider web mimics, and successfully construct graphs that meet the design objectives. We further propose an algorithm that assembles web samples produced by the generative models into larger-scale structures based on a series of geometric design targets, including helical and parametric shapes, mimicking and extending natural design principles towards integration with diverging engineering objectives. Several webs are manufactured using 3D printing and tested to assess mechanical properties.

arXiv.org e-Print Archive
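
The spider-web abstract above lists geometric parameters used to condition the generative models: average edge length, number of nodes, and average node degree. The sketch below shows one way such descriptors could be computed from a graph with 3D node positions. The function name, the dictionary keys, and the toy graph are assumptions made for illustration; the paper's own feature pipeline may differ.

```python
import math


def graph_descriptors(positions, edges):
    """Compute simple conditioning descriptors for an undirected 3D graph."""
    n = len(positions)
    lengths = [math.dist(positions[u], positions[v]) for u, v in edges]
    return {
        "num_nodes": n,
        "avg_edge_length": sum(lengths) / len(lengths),
        # Each undirected edge contributes to the degree of two nodes.
        "avg_node_degree": 2 * len(edges) / n,
    }


# Hypothetical toy graph: a 3 x 4 rectangle with one diagonal.
positions = {
    0: (0.0, 0.0, 0.0),
    1: (3.0, 0.0, 0.0),
    2: (3.0, 4.0, 0.0),
    3: (0.0, 4.0, 0.0),
}
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

descriptors = graph_descriptors(positions, edges)
```

For this toy graph the edge lengths are 3, 4, 3, 4 and 5, so the average edge length is 19/5 and the average degree is 10/4.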

A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design

Author: Chen Enhong
Liu Qi
Yan Jiaxian
Zhang Zaixi
Publication date: 21/06/2023

Structure-based drug design (SBDD), which utilizes the three-dimensional geometry of proteins to identify potential drug candidates, is becoming increasingly vital in drug discovery. However, traditional methods based on physicochemical modeling and experts' domain knowledge are time-consuming and laborious. The recent advancements in geometric deep learning, which integrates and processes 3D geometric data, coupled with the availability of accurate protein 3D structure predictions from tools like AlphaFold, have significantly propelled progress in structure-based drug design. In this paper, we systematically review the recent progress of geometric deep learning for structure-based drug design. We start with a brief discussion of the mainstream tasks in structure-based drug design, commonly used 3D protein representations and representative predictive/generative models. Then we delve into detailed reviews for each task (binding site prediction, binding pose generation, de novo molecule generation, linker design, and binding affinity prediction), including the problem setup, representative methods, datasets, and evaluation metrics. Finally, we conclude this survey with the current challenges and highlight potential opportunities of geometric deep learning for structure-based drug design.

Comment: 14 pages

arXiv.org e-Print Archive

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Author: Adams Keir
Anandkumar Anima
Aspuru-Guzik Alán
Azizzadenesheli Kamyar
Barzilay Regina
Bekkers Erik
Bohde Montgomery
Bronstein Michael
Coley Connor W.
Daigavane Ameya
Du Yuanqi
Edwards Carl
Ermon Stefano
Fang Ada
Fu Cong
Fu Tianfan
Fu Xiang
Gao Nicholas
Gui Shurui
Günnemann Stephan
Helwig Jacob
Hofgard Elyssa F.
Huang Qian
Jaakkola Tommi
Ji Heng
Ji Shuiwang
Joshi Chaitanya K.
Kurtin Jerry
Ladera Adriana
Lawrence Hannah
Leskovec Jure
Li Xiner
Lin Yuchao
Ling Hongyi
Liu Meng
Liu Yi
Liò Pietro
Luo Youzhi
Mathis Simon V.
Phung Tuong
Qian Xiaofeng
Qian Xiaoning
Saxton Alexandra
Smidt Tess
Strasser Alex
Stärk Hannes
Sun Jimeng
Tehrani Aria Mansouri
Wang Limei
Wang Rui
Wang Yucheng
Weiler Maurice
Wu Tailin
Xie Yaochen
Xie YuQing
Xu Minkai
Xu Shenglong
Xu Zhao
Yan Keqiang
Yu Haiyang
Yu Rose
Zhang Xuan
Zitnik Marinka
Publication date: 15/11/2023

Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interest and efforts to further advance AI4Science.

arXiv.org e-Print Archive