
19 research outputs found

Multi-State RNA Design with Geometric Multi-Graph Neural Networks

Author: Harris Charles
Jamasb Arian R.
Joshi Chaitanya K.
Liò Pietro
Mathis Simon
Viñas Ramon
Publication date: 28/05/2023

Computational RNA design has broad applications across synthetic biology and therapeutic development. Fundamental to the diverse biological functions of RNA is its conformational flexibility, enabling single sequences to adopt a variety of distinct 3D states. Currently, computational biomolecule design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired structural conformation. In this work, we propose gRNAde, a geometric RNA design pipeline that operates on sets of 3D RNA backbone structures to explicitly account for and reflect RNA conformational diversity in its designs. We demonstrate the utility of gRNAde for improving native sequence recovery over single-state approaches on a new large-scale 3D RNA design dataset, especially for multi-state and structurally diverse RNAs. Our code is available at https://github.com/chaitjo/geometric-rna-desig

arXiv.org e-Print Archive

Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models?

Author: Blundell Tom
Didi Kieran
Harris Charles
Jamasb Arian R.
Joshi Chaitanya K.
Liò Pietro
Mathis Simon V.
Publication date: 14/08/2023

Deep generative models for structure-based drug design (SBDD), where molecule generation is conditioned on a 3D protein pocket, have received considerable interest in recent years. These methods offer the promise of higher-quality molecule generation by explicitly modelling the 3D interaction between a potential drug and a protein receptor. However, previous work has primarily focused on the quality of the generated molecules themselves, with limited evaluation of the 3D molecule poses that these methods produce; most work simply discards the generated pose and reports only a "corrected" pose after redocking with traditional methods. Little is known about whether generated molecules satisfy known physical constraints for binding and the extent to which redocking alters the generated interactions. We introduce PoseCheck, an extensive analysis of multiple state-of-the-art methods and find that generated molecules have significantly more physical violations and fewer key interactions compared to baselines, calling into question the implicit assumption that providing rich 3D structure information improves molecule complementarity. We make recommendations for future research tackling identified failure modes and hope our benchmark can serve as a springboard for future SBDD generative modelling work to have a real-world impact

arXiv.org e-Print Archive

SARS-CoV-2 3D database: Understanding the Coronavirus Proteome and Evaluating Possible Drug Targets.

Author: Alsulami Ali
Bannerman Bridget
Beaudoin Christopher
Blundell Tom
Copoiu Liviu
Jamasb Arian
Moghul Ismail
Thomas Sherine
Torres Pedro
Vedithi Sundeep
Publication venue: Briefings in Bioinformatics
Publication date: 22/03/2021

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a rapidly growing infectious disease, widely spread with high mortality rates. Since the release of the SARS-CoV-2 genome sequence in March 2020, there has been an international focus on developing target-based drug discovery, which also requires knowledge of the 3D structure of the proteome. Where there are no experimentally solved structures, our group has created 3D models with coverage of 97.5% and characterised them using state-of-the-art computational approaches. Models of protomers and oligomers, together with predictions of substrate and allosteric binding sites, protein-ligand docking, SARS-CoV-2 protein interactions with human proteins, impacts of mutations, and mapped solved experimental structures are freely available for download. These are implemented in SARS CoV-2 3D, a comprehensive and user-friendly database, available at https://sars3d.com/. This provides essential information for drug discovery, both to evaluate targets and design new potential therapeutics. This work is supported and funded by King Abdullah scholarship (Saudi Arabia research council), and American Leprosy Missions grants (G88726), SET is funded by the Cystic Fibrosis Trust (RG 70975) and Fondation Botnar (RG91317). A.R.J. is funded by the Biotechnology and Biological Sciences Research Council (BBSRC) DTP studentship (BB/M011194/1). B.B. is funded by the Cystic Fibrosis Trust and L.C. on a studentship from Ipsen. T.L.B. is funded by the Wellcome Trust Investigator Award, PHZJ/489 RG83114 (2016-2021)

UCL Discovery

Apollo (Cambridge)


Functional and anatomical specificity in a higher olfactory centre.

Author: Bates Alexander Shakeel
Bock Davi
Dolan Michael-John
Frechter Shahar
Jamasb Arian Rokkum
Jefferis Gregory
Kohl Johannes
Manton James
Tootoonian Sina
Publication venue: Elife
Publication date: 21/05/2019

Most sensory systems are organized into parallel neuronal pathways that process distinct aspects of incoming stimuli. In the insect olfactory system, second order projection neurons target both the mushroom body, required for learning, and the lateral horn (LH), proposed to mediate innate olfactory behavior. Mushroom body neurons form a sparse olfactory population code, which is not stereotyped across animals. In contrast, odor coding in the LH remains poorly understood. We combine genetic driver lines, anatomical and functional criteria to show that the Drosophila LH has ~1400 neurons and >165 cell types. Genetically labeled LHNs have stereotyped odor responses across animals and on average respond to three times more odors than single projection neurons. LHNs are better odor categorizers than projection neurons, likely due to stereotyped pooling of related inputs. Our results reveal some of the principles by which a higher processing area can extract innate behavioral significance from sensory stimuli

Apollo (Cambridge)

Data-driven discovery of molecular photoswitches with multioutput Gaussian processes

Author: Aldrick Alexander A
Bourached Anthony
Fuchter Matthew J
Greenfield Jake L
Griffiths Ryan-Rhys
Jamasb Arian R
Jones Penelope
Lee Alpha A
McCorkindale William
Moss Henry B
Thawani Aditya R
Publication venue: ROYAL SOC CHEMISTRY
Publication date: 10/11/2022

Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states, whilst overall red-shifting the absorption bands serves to limit material damage due to UV-exposure and increases penetration depth in photopharmacological applications. Engineering these properties into a system through synthetic design, however, remains a challenge. Here, we present a data-driven discovery pipeline for molecular photoswitches underpinned by dataset curation and multitask learning with Gaussian processes. In the prediction of electronic transition wavelengths, we demonstrate that a multioutput Gaussian process (MOGP) trained using labels from four photoswitch transition wavelengths yields the strongest predictive performance relative to single-task models as well as operationally outperforming time-dependent density functional theory (TD-DFT) in terms of the wall-clock time for prediction. We validate our proposed approach experimentally by screening a library of commercially available photoswitchable molecules. Through this screen, we identified several motifs that displayed separated electronic absorption bands of their isomers, exhibited red-shifted absorptions, and are suited for information transfer and photopharmacological applications. Our curated dataset, code, as well as all models are made available at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset

UCL Discovery

PubMed Central

Structure-aware generation of drug-like molecules

Author: Cangea Cătălina
Day Ben
Drotár Pavol
Jamasb Arian Rokkum
Liò Pietro
Publication date: 07/11/2021

Structure-based drug design involves finding ligand molecules that exhibit structural and chemical complementarity to protein pockets. Deep generative methods have shown promise in proposing novel molecules from scratch (de-novo design), avoiding exhaustive virtual screening of chemical space. Most generative de-novo models fail to incorporate detailed ligand-protein interactions and 3D pocket structures. We propose a novel supervised model that generates molecular graphs jointly with 3D pose in a discretised molecular space. Molecules are built atom-by-atom inside pockets, guided by structural information from crystallographic data. We evaluate our model using a docking benchmark and find that guided generation improves predicted binding affinities by 8% and drug-likeness scores by 10% over the baseline. Furthermore, our model proposes molecules with binding scores exceeding some known ligands, which could be useful in future wet-lab studies

arXiv.org e-Print Archive

Apollo (Cambridge)

Protein Representation Learning by Geometric Structure Pretraining

Author: Chenthamarakshan Vijil
Das Payel
Jamasb Arian
Lozano Aurelie
Tang Jian
Xu Minghao
Zhang Zuobai
Publication date: 23/05/2022

Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein function or structure. Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid sequences and then finetune the models with some labeled data in downstream tasks. Despite the effectiveness of sequence-based approaches, the power of pretraining on known protein structures, which are available in smaller numbers only, has not been explored for protein property prediction, though protein structures are known to be determinants of protein function. In this paper, we propose to pretrain protein representations according to their 3D structures. We first present a simple yet effective encoder to learn the geometric features of a protein. We pretrain the protein graph encoder by leveraging multiview contrastive learning and different self-prediction tasks. Experimental results on both function prediction and fold classification tasks show that our proposed pretraining methods outperform or are on par with the state-of-the-art sequence-based methods, while using much less data. All codes and models will be published upon acceptance

arXiv.org e-Print Archive

The Photoswitch Dataset: A Molecular Machine Learning Benchmark for the Advancement of Synthetic Chemistry

Author: Aditya Raymond Thawani
Alexander Aldrick
Alpha Lee
Anthony Bourached
Arian Jamasb
Penelope Jones
Ryan-Rhys Griffiths
William McCorkindale
Publication date: 06/07/2020

The space of synthesizable molecules is greater than 10^{60}, meaning only a vanishingly small fraction of these molecules have ever been realized in the lab. In order to prioritize which regions of this space to explore next, synthetic chemists need access to accurate molecular property predictions. While great advances in molecular machine learning have been made, there is a dearth of benchmarks featuring properties that are useful for the synthetic chemist. Focussing directly on the needs of the synthetic chemist, we introduce the Photoswitch Dataset, a new benchmark for molecular machine learning where improvements in model performance can be immediately observed in the throughput of promising molecules synthesized in the lab. Photoswitches are a versatile class of molecule for medical and renewable energy applications where a molecule's efficacy is governed by its electronic transition wavelengths. We demonstrate superior performance in predicting these wavelengths compared to both time-dependent density functional theory (TD-DFT), the incumbent first principles quantum mechanical approach, as well as a panel of human experts. Our baseline models are currently being deployed in the lab as part of the decision process for candidate synthesis. It is our hope that this benchmark can drive real discoveries in photoswitch chemistry and that future benchmarks can be introduced to pivot learning algorithm development to benefit more expansive areas of synthetic chemistry

ChemRxiv