19 research outputs found
Multi-State RNA Design with Geometric Multi-Graph Neural Networks
Computational RNA design has broad applications across synthetic biology and
therapeutic development. Fundamental to the diverse biological functions of RNA
is its conformational flexibility, enabling single sequences to adopt a variety
of distinct 3D states. Currently, computational biomolecule design tasks are
often posed as inverse problems, where sequences are designed based on adopting
a single desired structural conformation. In this work, we propose gRNAde, a
geometric RNA design pipeline that operates on sets of 3D RNA backbone
structures to explicitly account for and reflect RNA conformational diversity
in its designs. We demonstrate the utility of gRNAde for improving native
sequence recovery over single-state approaches on a new large-scale 3D RNA
design dataset, especially for multi-state and structurally diverse RNAs. Our
code is available at https://github.com/chaitjo/geometric-rna-desig
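As a rough illustration of the native sequence recovery metric mentioned in the abstract above, the sketch below computes the fraction of positions at which a designed sequence matches the native one. The function name `sequence_recovery` and the example sequences are hypothetical and are not taken from the gRNAde code base; the paper's exact evaluation protocol may differ.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one.

    Illustrative helper only (assumed definition, not the gRNAde implementation).
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical 7-nucleotide example: one mismatch at the fourth position.
recovery = sequence_recovery("GGCAUCC", "GGCUUCC")
print(f"{recovery:.3f}")  # prints 0.857
```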
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models?
Deep generative models for structure-based drug design (SBDD), where molecule
generation is conditioned on a 3D protein pocket, have received considerable
interest in recent years. These methods offer the promise of higher-quality
molecule generation by explicitly modelling the 3D interaction between a
potential drug and a protein receptor. However, previous work has primarily
focused on the quality of the generated molecules themselves, with limited
evaluation of the 3D molecule poses that these methods produce, with
most work simply discarding the generated pose and only reporting a "corrected"
pose after redocking with traditional methods. Little is known about whether
generated molecules satisfy known physical constraints for binding and the
extent to which redocking alters the generated interactions. We introduce
PoseCheck, an extensive analysis of multiple state-of-the-art methods and find
that generated molecules have significantly more physical violations and fewer
key interactions compared to baselines, calling into question the implicit
assumption that providing rich 3D structure information improves molecule
complementarity. We make recommendations for future research tackling
identified failure modes and hope our benchmark can serve as a springboard for
future SBDD generative modelling work to have a real-world impact
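One kind of physical violation such an analysis might count is a steric clash between ligand and protein atoms. The sketch below is a minimal version under stated assumptions: a clash is any ligand-protein atom pair closer than the sum of their van der Waals radii minus a tolerance. The 0.5 Å default tolerance, the small radii table (Bondi values), and the function name `count_clashes` are assumptions for illustration, not PoseCheck's actual API.

```python
import math

# Bondi van der Waals radii in angstroms for a few common elements (illustrative subset).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def count_clashes(ligand, protein, tolerance=0.5):
    """Count ligand-protein atom pairs that sit closer than allowed.

    Each atom is an (element, (x, y, z)) tuple. A pair clashes when its distance is
    below r_ligand + r_protein - tolerance. The tolerance is an assumed default.
    """
    clashes = 0
    for l_elem, l_xyz in ligand:
        for p_elem, p_xyz in protein:
            limit = VDW_RADII[l_elem] + VDW_RADII[p_elem] - tolerance
            if math.dist(l_xyz, p_xyz) < limit:
                clashes += 1
    return clashes


# Hypothetical coordinates: one ligand carbon near an oxygen and far from a nitrogen.
ligand_atoms = [("C", (0.0, 0.0, 0.0))]
protein_atoms = [("O", (2.0, 0.0, 0.0)), ("N", (4.0, 0.0, 0.0))]
print(count_clashes(ligand_atoms, protein_atoms))
```

A count like this could be computed once on the generated pose and once on the redocked pose to see how much redocking changes the result.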
SARS-CoV-2 3D database: Understanding the Coronavirus Proteome and Evaluating Possible Drug Targets.
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a rapidly growing infectious disease, widely spread with high mortality rates. Since the release of the SARS-CoV-2 genome sequence in March 2020, there has been an international focus on developing target-based drug discovery, which also requires knowledge of the 3D structure of the proteome. Where there are no experimentally solved structures, our group has created 3D models with coverage of 97.5% and characterised them using state-of-the-art computational approaches. Models of protomers and oligomers, together with predictions of substrate and allosteric binding sites, protein-ligand docking, SARS-CoV-2 protein interactions with human proteins, impacts of mutations, and mapped solved experimental structures are freely available for download. These are implemented in SARS CoV-2 3D, a comprehensive and user-friendly database, available at https://sars3d.com/. This provides essential information for drug discovery, both to evaluate targets and design new potential therapeutics. This work is supported and funded by King Abdullah scholarship (Saudi Arabia research council), and American Leprosy Missions grants (G88726), SET is funded by the Cystic Fibrosis Trust (RG 70975) and Fondation Botnar (RG91317). A.R.J. is funded by the Biotechnology and Biological Sciences Research Council (BBSRC) DTP studentship (BB/M011194/1). B.B. is funded by the Cystic Fibrosis Trust and L.C. on a studentship from Ipsen. T.L.B. is funded by the Wellcome Trust Investigator Award, PHZJ/489 RG83114 (2016-2021).
Functional and anatomical specificity in a higher olfactory centre.
Most sensory systems are organized into parallel neuronal pathways that process distinct aspects of incoming stimuli. In the insect olfactory system, second order projection neurons target both the mushroom body, required for learning, and the lateral horn (LH), proposed to mediate innate olfactory behavior. Mushroom body neurons form a sparse olfactory population code, which is not stereotyped across animals. In contrast, odor coding in the LH remains poorly understood. We combine genetic driver lines, anatomical and functional criteria to show that the Drosophila LH has ~1400 neurons and >165 cell types. Genetically labeled LHNs have stereotyped odor responses across animals and on average respond to three times more odors than single projection neurons. LHNs are better odor categorizers than projection neurons, likely due to stereotyped pooling of related inputs. Our results reveal some of the principles by which a higher processing area can extract innate behavioral significance from sensory stimuli
Data-driven discovery of molecular photoswitches with multioutput Gaussian processes
Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states, whilst overall red-shifting of the absorption bands serves to limit material damage due to UV exposure and increases penetration depth in photopharmacological applications. Engineering these properties into a system through synthetic design, however, remains a challenge. Here, we present a data-driven discovery pipeline for molecular photoswitches underpinned by dataset curation and multitask learning with Gaussian processes. In the prediction of electronic transition wavelengths, we demonstrate that a multioutput Gaussian process (MOGP) trained using labels from four photoswitch transition wavelengths yields the strongest predictive performance relative to single-task models and operationally outperforms time-dependent density functional theory (TD-DFT) in terms of the wall-clock time for prediction. We validate our proposed approach experimentally by screening a library of commercially available photoswitchable molecules. Through this screen, we identified several motifs that displayed separated electronic absorption bands of their isomers, exhibited red-shifted absorptions, and are suited for information transfer and photopharmacological applications. Our curated dataset, code, as well as all models are made available at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset
Structure-aware generation of drug-like molecules
Structure-based drug design involves finding ligand molecules that exhibit
structural and chemical complementarity to protein pockets. Deep generative
methods have shown promise in proposing novel molecules from scratch (de-novo
design), avoiding exhaustive virtual screening of chemical space. Most
generative de-novo models fail to incorporate detailed ligand-protein
interactions and 3D pocket structures. We propose a novel supervised model that
generates molecular graphs jointly with 3D pose in a discretised molecular
space. Molecules are built atom-by-atom inside pockets, guided by structural
information from crystallographic data. We evaluate our model using a docking
benchmark and find that guided generation improves predicted binding affinities
by 8% and drug-likeness scores by 10% over the baseline. Furthermore, our model
proposes molecules with binding scores exceeding some known ligands, which
could be useful in future wet-lab studies
Protein Representation Learning by Geometric Structure Pretraining
Learning effective protein representations is critical in a variety of tasks
in biology such as predicting protein function or structure. Existing
approaches usually pretrain protein language models on a large number of
unlabeled amino acid sequences and then finetune the models with some labeled
data in downstream tasks. Despite the effectiveness of sequence-based
approaches, the power of pretraining on known protein structures, which are
available in smaller numbers only, has not been explored for protein property
prediction, though protein structures are known to be determinants of protein
function. In this paper, we propose to pretrain protein representations
according to their 3D structures. We first present a simple yet effective
encoder to learn the geometric features of a protein. We pretrain the protein
graph encoder by leveraging multiview contrastive learning and different
self-prediction tasks. Experimental results on both function prediction and
fold classification tasks show that our proposed pretraining methods outperform
or are on par with the state-of-the-art sequence-based methods, while using
much less data. All codes and models will be published upon acceptance
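To make the idea of multiview contrastive learning more concrete, the sketch below computes an InfoNCE-style loss between two sets of embeddings, where row i of each set is assumed to come from two views of the same protein. The abstract does not state which contrastive objective is used, so InfoNCE, the temperature value, and the names `cosine` and `info_nce` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss: view_a[i] should match view_b[i] and no other view_b[j]."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Hypothetical 2-D embeddings: matched views give a lower loss than swapped views.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(anchors, matched) < info_nce(anchors, swapped))  # prints True
```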
The Photoswitch Dataset: A Molecular Machine Learning Benchmark for the Advancement of Synthetic Chemistry
The space of synthesizable molecules is greater than , meaning only a vanishingly small fraction of these molecules have ever been realized in the lab. In order to prioritize which regions of this space to explore next, synthetic chemists need access to accurate molecular property predictions. While great advances in molecular machine learning have been made, there is a dearth of benchmarks featuring properties that are useful for the synthetic chemist. Focussing directly on the needs of the synthetic chemist, we introduce the Photoswitch Dataset, a new benchmark for molecular machine learning where improvements in model performance can be immediately observed in the throughput of promising molecules synthesized in the lab. Photoswitches are a versatile class of molecule for medical and renewable energy applications where a molecule's efficacy is governed by its electronic transition wavelengths. We demonstrate superior performance in predicting these wavelengths compared to both time-dependent density functional theory (TD-DFT), the incumbent first principles quantum mechanical approach, as well as a panel of human experts. Our baseline models are currently being deployed in the lab as part of the decision process for candidate synthesis. It is our hope that this benchmark can drive real discoveries in photoswitch chemistry and that future benchmarks can be introduced to pivot learning algorithm development to benefit more expansive areas of synthetic chemistry