TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations to deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts. Comment: 20 pages, 8 figures, 5 tables
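The idea of reducing 3D geometry to 1D topological invariants per element channel can be illustrated with a minimal sketch. It computes only 0-dimensional persistence (connected components) over a distance-based filtration for a chosen set of element types. The function names, the toy coordinates, and the convention that a bar dies at the pairwise distance where two components merge are assumptions made for illustration; this is not the ESPH implementation used in the paper.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Death times of 0-dimensional bars for a distance-based filtration.

    Every component is born at scale 0. A bar dies at the pairwise distance
    at which its component merges into another one (union-find over edges
    sorted by length). One bar never dies and is not returned.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


def element_specific_h0(atoms, elements):
    """Restrict the point cloud to the given element types, then compute H0."""
    selected = [xyz for element, xyz in atoms if element in elements]
    return h0_barcode(selected)


# Hypothetical toy structure: (element, (x, y, z)) in angstroms.
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),
    ("N", (0.0, 2.0, 0.0)),
    ("C", (4.5, 0.0, 0.0)),
    ("O", (0.0, 0.0, 1.2)),
]
carbon_bars = element_specific_h0(atoms, {"C"})
all_bars = element_specific_h0(atoms, {"C", "N", "O"})
```

Running the same routine on different element subsets (for example carbon only, or carbon plus nitrogen) gives one barcode per channel, which is the sense in which a multichannel representation can be assembled.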
A combined computational-experimental approach to define the structural origin of antibody recognition of sialyl-Tn, a tumor-associated carbohydrate antigen.
Anti-carbohydrate monoclonal antibodies (mAbs) hold great promise as cancer therapeutics and diagnostics. However, their specificity can be mixed, and detailed characterization is problematic, because antibody-glycan complexes are challenging to crystallize. Here, we developed a generalizable approach employing high-throughput techniques for characterizing the structure and specificity of such mAbs, and applied it to the mAb TKH2 developed against the tumor-associated carbohydrate antigen sialyl-Tn (STn). The mAb specificity was defined by apparent KD values determined by quantitative glycan microarray screening. Key residues in the antibody combining site were identified by site-directed mutagenesis, and the glycan-antigen contact surface was defined using saturation transfer difference NMR (STD-NMR). These features were then employed as metrics for selecting the optimal 3D-model of the antibody-glycan complex, out of thousands of plausible options generated by automated docking and molecular dynamics simulation. STn-specificity was further validated by computational screening of the selected antibody 3D-model against the human sialyl-Tn-glycome. This computational-experimental approach would allow rational design of potent antibodies targeting carbohydrates.
Detection of the TCDD binding-fingerprint within the Ah receptor ligand binding domain by structurally driven mutagenesis and functional analysis
The aryl hydrocarbon receptor (AhR) is a ligand-dependent, basic helix-loop-helix Per-Arnt-Sim (PAS)-containing transcription factor that can bind and be activated by structurally diverse chemicals, including the toxic environmental contaminant 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Our previous three-dimensional homology model of the mouse AhR (mAhR) PAS B ligand binding domain allowed identification of the binding site and its experimental validation. We have extended this analysis by conducting comparative structural modeling studies of the ligand binding domains of six additional high-affinity mammalian AhRs. These results, coupled with site-directed mutagenesis and AhR functional analysis, have allowed detection of the "TCDD binding-fingerprint" of conserved residues within the ligand binding cavity necessary for high-affinity TCDD binding and TCDD-dependent AhR transformation DNA binding. The essential role of selected residues was further evaluated using molecular docking simulations of TCDD with both wild-type and mutant mAhRs. Taken together, our results dramatically improve our understanding of the molecular determinants of TCDD binding and provide a basis for future studies directed toward rationalizing the observed species differences in AhR sensitivity to TCDD and understanding the mechanistic basis for the dramatic diversity in AhR ligand structure. © 2009 American Chemical Society
Serverification of Molecular Modeling Applications: the Rosetta Online Server that Includes Everyone (ROSIE)
The Rosetta molecular modeling software package provides experimentally
tested and rapidly evolving tools for the 3D structure prediction and
high-resolution design of proteins, nucleic acids, and a growing number of
non-natural polymers. Despite its free availability to academic users and
improving documentation, use of Rosetta has largely remained confined to
developers and their immediate collaborators due to the code's difficulty of
use, the requirement for large computational resources, and the unavailability
of servers for most of the Rosetta applications. Here, we present a unified web
framework for Rosetta applications called ROSIE (Rosetta Online Server that
Includes Everyone). ROSIE provides (a) a common user interface for Rosetta
protocols, (b) a stable application programming interface for developers to add
additional protocols, (c) a flexible back-end to allow leveraging of computer
cluster resources shared by RosettaCommons member institutions, and (d)
centralized administration by the RosettaCommons to ensure continuous
maintenance. This paper describes the ROSIE server infrastructure, a
step-by-step 'serverification' protocol for use by Rosetta developers, and the
deployment of the first nine ROSIE applications by six separate developer
teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance,
Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated
by the number and diversity of these applications, ROSIE offers a general and
speedy paradigm for serverification of Rosetta applications that incurs
negligible cost to developers and lowers barriers to Rosetta use for the
broader biological community. ROSIE is available at
http://rosie.rosettacommons.org
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
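As an illustration of how a Wasserstein distance can compare two molecules through their persistence diagrams, the sketch below matches points of one diagram either to points of the other or to the diagonal, and minimizes the total cost. The function name, the L-infinity ground metric, and the brute-force search over matchings are assumptions chosen to keep the example dependency-free; they are not taken from the paper, and the factorial search is only usable for very small diagrams (a practical version would use an assignment solver).

```python
from itertools import permutations


def diagram_wasserstein(diag_a, diag_b, p=1):
    """p-Wasserstein distance between two small persistence diagrams.

    Each diagram is a list of (birth, death) pairs. Unmatched points are
    sent to the diagonal, at L-infinity cost (death - birth) / 2.
    """
    m, n = len(diag_a), len(diag_b)
    size = m + n
    if size == 0:
        return 0.0

    def to_diagonal(point):
        birth, death = point
        return (death - birth) / 2.0

    def between(a, b):
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    # Rows: points of A followed by n diagonal slots.
    # Columns: points of B followed by m diagonal slots.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < m and j < n:
                cost[i][j] = between(diag_a[i], diag_b[j]) ** p
            elif i < m:
                cost[i][j] = to_diagonal(diag_a[i]) ** p
            elif j < n:
                cost[i][j] = to_diagonal(diag_b[j]) ** p
            # Diagonal-to-diagonal matches cost nothing.

    best = min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )
    return best ** (1.0 / p)


# Hypothetical diagrams for two molecules.
d_same = diagram_wasserstein([(0.0, 2.0)], [(0.0, 2.0)])
d_shift = diagram_wasserstein([(0.0, 2.0)], [(0.0, 3.0)])
d_empty = diagram_wasserstein([(0.0, 1.0)], [])
```

A matrix of such pairwise distances is the kind of input a k-nearest-neighbors model can consume directly.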
An Olfactory Receptor Pseudogene whose Function emerged in Humans
The human olfactory receptor hOR17-210 is identified as a pseudogene in the human genome. Experimental data have shown, however, that the gene product of cloned hOR17-210 cDNA was able to bind an odorant-binding protein and is narrowly tuned for excitation by cyclic ketones. Supported by experimental results, we used the bioinformatics methods of sequence analysis, computational protein modeling, and docking to show that functionality in this receptor is retained due to sequence-structure features not previously observed in mammalian ORs. This receptor does not possess the first two transmembrane helical domains (of seven typically seen in GPCRs). It does, however, possess an additional TM that has not been observed in other human olfactory receptors. By incorporating these novel structural features, we created two putative models for this receptor. We also docked odor ligands that were experimentally shown to bind hOR17-210 into these models. We show how and why structural modifications of OR17-210 do not hinder this receptor's functionality. Our studies reveal that novel gene rearrangements that result in sequence and structural diversity have a bearing on OR and GPCR function and evolution.
DEEP LEARNING METHODS FOR PREDICTION OF AND ESCAPE FROM PROTEIN RECOGNITION
Protein interactions drive diverse processes essential to living organisms, and thus numerous biomedical applications center on understanding, predicting, and designing how proteins recognize their partners. While unfortunately the number of interactions of interest still vastly exceeds the capabilities of experimental determination methods, computational methods promise to fill the gap. My thesis pursues the development and application of computational methods for several protein interaction prediction and design tasks. First, to improve protein-glycan interaction specificity prediction, I developed GlyBERT, which learns biologically relevant glycan representations encapsulating the components most important for glycan recognition within their structures. GlyBERT encodes glycans with a branched biochemical language and employs an attention-based deep language model to embed the correlation between local and global structural contexts. This approach enables the development of predictive models from limited data, supporting applications such as lectin binding prediction. Second, to improve protein-protein interaction prediction, I developed a unified geometric deep neural network, ‘PInet’ (Protein Interface Network), which leverages the best properties of both data- and physics-driven methods, learning and utilizing models capturing both geometrical and physicochemical molecular surface complementarity. In addition to obtaining state-of-the-art performance in predicting protein-protein interactions, PInet can serve as the backbone for other protein-protein interaction modeling tasks such as binding affinity prediction. Finally, I turned from prediction to design, addressing two important tasks in the context of antibody-antigen recognition. The first problem is to redesign a given antigen to evade antibody recognition, e.g., to help biotherapeutics avoid pre-existing immunity or to focus vaccine responses on key portions of an antigen.
The second problem is to design a panel of variants of a given antigen to use as “bait” in experimental identification of antibodies that recognize different parts of the antigen, e.g., to support classification of immune responses or to help select among different antibody candidates. I developed a geometry-based algorithm to generate variants to address these design problems, seeking to maximize utility subject to experimental constraints. During the design process, the algorithm accounts for and balances the effects of candidate mutations on antibody recognition and on antigen stability. In retrospective case studies, the algorithm demonstrated promising precision, recall, and robustness of finding good designs. This work represents the first algorithm to systematically design antigen variants for characterization and evasion of polyclonal antibody responses.
Protein-Ligand Scoring with Convolutional Neural Networks
Computational approaches to drug discovery can reduce the time and cost
associated with experimental assays and enable the screening of novel
chemotypes. Structure-based drug design methods rely on scoring functions to
rank and predict binding affinities and poses. The ever-expanding amount of
protein-ligand binding and structural data enables the use of deep machine
learning techniques for protein-ligand scoring.
We describe convolutional neural network (CNN) scoring functions that take as
input a comprehensive 3D representation of a protein-ligand interaction. A CNN
scoring function automatically learns the key features of protein-ligand
interactions that correlate with binding. We train and optimize our CNN scoring
functions to discriminate between correct and incorrect binding poses and known
binders and non-binders. We find that our CNN scoring function outperforms the
AutoDock Vina scoring function when ranking poses both for pose prediction and
virtual screening.
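A 3D representation of a protein-ligand interaction suitable for a CNN is commonly built by placing atoms on a voxel grid with one channel per atom type. The sketch below shows one simple way to do this with plain occupancy counts. The channel names, box size, resolution, and the use of counts instead of a smoothed atom density are all assumptions for illustration and are not taken from the paper.

```python
def voxelize(atoms, channels, center, box=8.0, resolution=1.0):
    """Place atoms on a cubic grid with one occupancy channel per atom type.

    atoms:    iterable of (atom_type, (x, y, z)) in angstroms
    channels: atom types to keep; other types are ignored
    center:   grid center, e.g. the ligand centroid
    Returns a dict mapping each channel to an n x n x n nested list.
    """
    n = int(round(box / resolution))
    grid = {
        ch: [[[0.0] * n for _ in range(n)] for _ in range(n)]
        for ch in channels
    }
    origin = tuple(c - box / 2.0 for c in center)
    for atom_type, xyz in atoms:
        if atom_type not in grid:
            continue
        idx = tuple(int((x - o) // resolution) for x, o in zip(xyz, origin))
        # Atoms outside the box are dropped.
        if all(0 <= i < n for i in idx):
            i, j, k = idx
            grid[atom_type][i][j][k] += 1.0
    return grid


def channel_total(volume):
    """Sum of all voxel values in one channel."""
    return sum(v for plane in volume for row in plane for v in row)


# Hypothetical atoms; the type labels are made up for this example.
atoms = [
    ("protein_C", (0.2, 0.2, 0.2)),
    ("ligand_C", (1.5, -0.5, 0.0)),
    ("ligand_C", (10.0, 0.0, 0.0)),  # outside the 8 angstrom box
]
grid = voxelize(atoms, ["protein_C", "ligand_C"], center=(0.0, 0.0, 0.0))
```

Stacking the channels in a fixed order gives a tensor of shape (channels, n, n, n), which is the usual input layout for a 3D convolutional network.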
Structure-guided machine learning prediction of drug resistance mutations in Abelson 1 kinase.
Funder: State Government of Victoria
Kinases play crucial roles in cellular signalling and biological processes, with their dysregulation associated with diseases, including cancers. Kinase inhibitors, most notably those targeting ABeLson 1 (ABL1) kinase in chronic myeloid leukemia, have had a significant impact on cancer survival, yet emergence of resistance mutations can reduce their effectiveness, leading to therapeutic failure. Limited effort, however, has been devoted to developing tools to accurately identify ABL1 resistance mutations, as well as providing insights into their molecular mechanisms. Here we investigated the structural basis of ABL1 mutations modulating binding affinity of eight FDA-approved drugs. We found mutations impair affinity of type I and type II inhibitors differently and used this insight to develop a novel web-based diagnostic tool, SUSPECT-ABL, to pre-emptively predict resistance profiles and binding free-energy changes (ΔΔG) of all possible ABL1 mutations against inhibitors with different binding modes. Resistance mutations in ABL1 were successfully identified with a Matthews correlation coefficient of up to 0.73, and the resulting change in ligand binding affinity was predicted with a Pearson's correlation of up to 0.77, with performance consistent across non-redundant blind tests. Through an in silico saturation mutagenesis, our tool has identified possibly emerging resistance mutations, which offers opportunities for in vivo experimental validation. We believe SUSPECT-ABL will be an important tool not just for improving precision medicine efforts, but for facilitating the development of next-generation inhibitors that are less prone to resistance. We have made our tool freely available at http://biosig.unimelb.edu.au/suspect_abl/
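For readers unfamiliar with the two metrics quoted above, the following sketch computes a Matthews correlation coefficient for a binary resistant/susceptible classification and a Pearson correlation for predicted versus measured ΔΔG values. The labels and numbers are made-up toy data, not results from SUSPECT-ABL.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = resistant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data: 1 = resistance mutation, 0 = susceptible.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)

# Toy measured and predicted binding free-energy changes.
ddg_measured = [1.0, 2.0, 3.0, 4.0]
ddg_predicted = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(ddg_measured, ddg_predicted)
```

The Matthews coefficient uses all four cells of the confusion matrix, so it stays informative when resistant and susceptible classes are imbalanced, while Pearson's r measures only linear agreement and ignores systematic scale offsets such as the factor of two in the toy predictions.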
Virtual screening for inhibitors of the human TSLP:TSLPR interaction
The pro-inflammatory cytokine thymic stromal lymphopoietin (TSLP) plays a pivotal role in the pathophysiology of various allergy disorders that are mediated by type 2 helper T cell (Th2) responses, such as asthma and atopic dermatitis. TSLP forms a ternary complex with the TSLP receptor (TSLPR) and the interleukin-7-receptor subunit alpha (IL-7Ra), thereby activating a signaling cascade that culminates in the release of pro-inflammatory mediators. In this study, we conducted an in silico characterization of the TSLP:TSLPR complex to investigate the druggability of this complex. Two commercially available fragment libraries were screened computationally for possible inhibitors, and a selection of fragments was subsequently tested in vitro. The screening setup consisted of two orthogonal assays measuring TSLP binding to TSLPR: a BLI-based assay and a biochemical assay based on a TSLP:alkaline phosphatase fusion protein. Four fragments pertaining to diverse chemical classes were found to reduce TSLP:TSLPR complex formation to less than 75% at millimolar concentrations. We have used unbiased molecular dynamics simulations to develop a Markov state model that characterized the binding pathway of the most interesting compound. This work provides a proof-of-principle for use of fragments in the inhibition of TSLP:TSLPR complexation.
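The core of a Markov state model can be sketched in a few lines: count transitions between discretized states at a fixed lag time, row-normalize the counts into a transition matrix, and read off the long-time state populations. The two-state trajectory, the state labels, and the simple non-reversible estimator below are assumptions for illustration; the study's actual model construction is not described in the abstract.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts from a discrete state trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left yields an all-zero row; real analyses
        # would restrict the model to the connected set of states instead.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=1000):
    """Approximate the stationary distribution by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    total = sum(pi)
    return [p / total for p in pi]


# Hypothetical trajectory: 0 = fragment unbound, 1 = fragment bound.
dtraj = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]
T = transition_matrix(dtraj, n_states=2, lag=1)
pi = stationary_distribution(T)
```

In practice the lag time is chosen by checking that the model's implied timescales have converged, and binding pathways are then extracted from the transition matrix between the unbound and bound states.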