Learning Harmonic Molecular Representations on Riemannian Manifold
Molecular representation learning plays a crucial role in AI-assisted drug
discovery research. Encoding 3D molecular structures through Euclidean neural
networks has become the prevailing method in the geometric deep learning
community. However, the equivariance constraints and message passing in
Euclidean space may limit the network's expressive power. In this work, we
propose a Harmonic Molecular Representation learning (HMR) framework, which
represents a molecule using the Laplace-Beltrami eigenfunctions of its
molecular surface. HMR offers a multi-resolution representation of molecular
geometric and chemical features on a 2D Riemannian manifold. We also introduce a
harmonic message passing method to realize efficient spectral message passing
over the surface manifold for better molecular encoding. Our proposed method
shows comparable predictive power to current models in small molecule property
prediction, and outperforms the state-of-the-art deep learning models for
ligand-binding protein pocket classification and the rigid protein docking
challenge, demonstrating its versatility in molecular representation learning.
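The multi-resolution idea in the abstract above can be illustrated with a minimal sketch. It is not the HMR implementation: the Laplacian of a cycle graph stands in for a discretized Laplace-Beltrami operator on a molecular surface, and the names `cycle_laplacian` and `truncated_reconstruction` are hypothetical. The sketch projects a signal onto the k lowest-frequency eigenvectors, so a larger k recovers finer detail.

```python
import numpy as np

def cycle_laplacian(n):
    """Combinatorial Laplacian of an n-node cycle graph.

    Assumption: used here only as a simple stand-in for a mesh
    (Laplace-Beltrami) Laplacian of a molecular surface.
    """
    L = 2.0 * np.eye(n)
    idx = np.arange(n)
    L[idx, (idx + 1) % n] = -1.0
    L[idx, (idx - 1) % n] = -1.0
    return L

def truncated_reconstruction(L, f, k):
    """Project f onto the k lowest-frequency eigenvectors of L and map back."""
    _, vecs = np.linalg.eigh(L)   # ascending eigenvalues, orthonormal columns
    basis = vecs[:, :k]           # low-frequency part of the spectrum
    coeffs = basis.T @ f          # spectral coefficients of the signal
    return basis @ coeffs         # smoothed signal back on the nodes

n = 64
L = cycle_laplacian(n)
rng = np.random.default_rng(0)
f = rng.normal(size=n)            # arbitrary per-node feature

# Reconstruction error at coarse, medium, and full resolution.
errors = [np.linalg.norm(f - truncated_reconstruction(L, f, k)) for k in (4, 16, 64)]
```

The error cannot increase as k grows, because each basis contains the previous one, and it vanishes up to rounding when all eigenvectors are used.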
A Geometric Approach for Deciphering Protein Structure from Cryo-EM Volumes
Electron cryo-microscopy (cryo-EM) is an area that has received much attention in the recent past. Compared to the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM can be used to image much larger complexes, in many different conformations, and under a wide range of biochemical conditions. This is because it does not require the complex to be crystallisable. However, cryo-EM reconstructions are limited to intermediate resolutions, with the state of the art being 3.6 Å, where secondary structure elements can be visually identified but not individual amino acid residues. This lack of atomic-level resolution creates new computational challenges for protein structure identification. In this dissertation, we present a suite of geometric algorithms to address several aspects of protein modeling using cryo-EM density maps. Specifically, we develop novel methods to capture the shape of density volumes as geometric skeletons. We then use these skeletons to find the secondary structure elements (SSEs) of a given protein, to identify the correspondence between these SSEs and those predicted from the primary sequence, and to register high-resolution protein structures onto the density volume. In addition, we designed and developed Gorgon, an interactive molecular modeling system that integrates the above methods with other interactive routines to generate reliable and accurate protein backbone models.
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
Advances in artificial intelligence (AI) are fueling a new paradigm of
discoveries in natural sciences. Today, AI has started to advance natural
sciences by improving, accelerating, and enabling our understanding of natural
phenomena at a wide range of spatial and temporal scales, giving rise to a new
area of research known as AI for science (AI4Science). Being an emerging
research paradigm, AI4Science is unique in that it is an enormous and highly
interdisciplinary area. Thus, a unified and technical treatment of this field
is needed yet challenging. This work aims to provide a technically thorough
account of a subarea of AI4Science; namely, AI for quantum, atomistic, and
continuum systems. These areas aim at understanding the physical world from the
subatomic (wavefunctions and electron density), atomic (molecules, proteins,
materials, and interactions), to macro (fluids, climate, and subsurface) scales
and form an important subarea of AI4Science. A unique advantage of focusing on
these areas is that they largely share a common set of challenges, thereby
allowing a unified and foundational treatment. A key common challenge is how to
capture physics first principles, especially symmetries, in natural systems by
deep learning methods. We provide an in-depth yet intuitive account of
techniques to achieve equivariance to symmetry transformations. We also discuss
other common technical challenges, including explainability,
out-of-distribution generalization, knowledge transfer with foundation and
large language models, and uncertainty quantification. To facilitate learning
and education, we provide categorized lists of resources that we found to be
useful. We strive to be thorough and unified and hope this initial effort may
trigger more community interest and efforts to further advance AI4Science.
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a
long period of stagnation in drug approvals. Due to the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform LigNFam enables users to interactively explore
similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly,
two deep-learning-based binding site comparison tools were developed, competing with
the state-of-the-art over benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships and has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together, and with existing tools, the contributions made will aid in the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster.
Alignment-free molecular shape comparison using spectral geometry: the framework
A framework is presented for the calculation of novel alignment-free descriptors of molecular shape. The methods are based on the technique of spectral geometry, which has been developed in the field of computer vision where it has shown impressive performance for the comparison of deformable objects such as people and animals. Spectral geometry techniques encode shape by capturing the curvature of the surface of an object into a compact, information-rich representation that is alignment-free while also being invariant to isometric deformations, that is, changes that do not distort distances over the surface. Here, we adapt the technique to the new domain of molecular shape representation. We describe a series of parametrization steps aimed at optimizing the method for this new domain. Our focus here is on demonstrating that the basic approach is able to capture a molecular shape into a compact and information-rich descriptor. We demonstrate improved performance in virtual screening over a more established alignment-free method and impressive performance compared to a more accurate, but much more computationally demanding, alignment-based approach.
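The alignment-free property described above can be sketched in a few lines. This is an illustration under stated assumptions, not the descriptors of the paper: it builds a Gaussian-weighted graph Laplacian from pairwise Euclidean distances of a point cloud (rather than a Laplace-Beltrami operator on a molecular surface) and uses its smallest eigenvalues as a descriptor. The function name `spectral_descriptor` and the parameters `k` and `sigma` are hypothetical. Because only pairwise distances enter, the descriptor is unchanged by rotation and translation; invariance to isometric surface deformations would require geodesic, not Euclidean, distances.

```python
import numpy as np

def spectral_descriptor(points, k=8, sigma=1.0):
    """Return the k smallest eigenvalues of a distance-based graph Laplacian.

    Assumption: a Gaussian kernel on Euclidean distances approximates
    surface connectivity; the paper's own construction may differ.
    """
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)        # pairwise squared distances
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))   # affinity weights
    np.fill_diagonal(W, 0.0)                    # no self-loops
    L = np.diag(W.sum(axis=1)) - W              # graph Laplacian D - W
    return np.linalg.eigvalsh(L)[:k]            # eigenvalues in ascending order

rng = np.random.default_rng(1)
points = rng.normal(size=(30, 3))               # toy "molecule" of 30 points

# Apply a random rigid motion: orthogonal matrix Q plus a translation.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
moved = points @ Q.T + np.array([5.0, -2.0, 1.0])

d_original = spectral_descriptor(points)
d_moved = spectral_descriptor(moved)
same = np.allclose(d_original, d_moved, atol=1e-8)
```

Two shapes can then be compared by a simple distance between their descriptor vectors, with no superposition step.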