89,432 research outputs found

    PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications

    Full text link
    A cascading system of hierarchical, artificial neural networks (named PRED-CLASS) is presented for the generalized classification of proteins into four distinct classes-transmembrane, fibrous, globular, and mixed-from information solely encoded in their amino acid sequences. The architecture of the individual component networks is kept very simple, reducing the number of free parameters (network synaptic weights) for faster training, improved generalization, and the avoidance of data overfitting. Capturing information from as few as 50 protein sequences spread among the four target classes (6 transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able to obtain 371 correct predictions out of a set of 387 proteins (success rate approximately 96%) unambiguously assigned into one of the target classes. The application of PRED-CLASS to several test sets and complete proteomes of several organisms demonstrates that such a method could serve as a valuable tool in the annotation of genomic open reading frames with no functional assignment or as a preliminary step in fold recognition and ab initio structure prediction methods. Detailed results obtained for various data sets and completed genomes, along with a web sever running the PRED-CLASS algorithm, can be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLAS

    Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    Full text link
    Recently exciting progress has been made on protein contact prediction, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual networks. This deep neural network allows us to model very complex sequence-contact relationship as well as long-range inter-contact correlation. Our method greatly outperforms existing contact prediction methods and leads to much more accurate contact-assisted protein folding. Tested on three datasets of 579 proteins, the average top L long-range prediction accuracy obtained our method, the representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints can yield correct folds (i.e., TMscore>0.6) for 203 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively. Further, our contact-assisted models have much better quality than template-based models. Using our predicted contacts as restraints, we can (ab initio) fold 208 of the 398 membrane proteins with TMscore>0.5. By contrast, when the training proteins of our method are used as templates, homology modeling can only do so for 10 of them. One interesting finding is that even if we do not train our prediction models with any membrane proteins, our method works very well on membrane protein prediction. Finally, in recent blind CAMEO benchmark our method successfully folded 5 test proteins with a novel fold

    Segmentally Variable Genes: A New Perspective on Adaptation

    Get PDF
    Genomic sequence variation is the hallmark of life and is key to understanding diversity and adaptation among the numerous microorganisms on earth. Analysis of the sequenced microbial genomes suggests that genes are evolving at many different rates. We have attempted to derive a new classification of genes into three broad categories: lineage-specific genes that evolve rapidly and appear unique to individual species or strains; highly conserved genes that frequently perform housekeeping functions; and partially variable genes that contain highly variable regions, at least 70 amino acids long, interspersed among well-conserved regions. The latter we term segmentally variable genes (SVGs), and we suggest that they are especially interesting targets for biochemical studies. Among these genes are ones necessary to deal with the environment, including genes involved in host–pathogen interactions, defense mechanisms, and intracellular responses to internal and environmental changes. For the most part, the detailed function of these variable regions remains unknown. We propose that they are likely to perform important binding functions responsible for protein–protein, protein–nucleic acid, or protein–small molecule interactions. Discerning their function and identifying their binding partners may offer biologists new insights into the basic mechanisms of adaptation, context-dependent evolution, and the interaction between microbes and their environment. Segmentally variable genes show a mosaic pattern of one or more rapidly evolving, variable regions. Discerning their function may provide new insights into the forces that shape genome diversity and adaptationNational Science Foundation (998088, 0239435

    A saposin-lipoprotein nanoparticle system for membrane proteins.

    Get PDF
    A limiting factor in membrane protein research is the ability to solubilize and stabilize such proteins. Detergents are used most often for solubilizing membrane proteins, but they are associated with protein instability and poor compatibility with structural and biophysical studies. Here we present a saposin-lipoprotein nanoparticle system, Salipro, which allows for the reconstitution of membrane proteins in a lipid environment that is stabilized by a scaffold of saposin proteins. We demonstrate the applicability of the method on two purified membrane protein complexes as well as by the direct solubilization and nanoparticle incorporation of a viral membrane protein complex from the virus membrane. Our approach facilitated high-resolution structural studies of the bacterial peptide transporter PeptTSo2 by single-particle cryo-electron microscopy (cryo-EM) and allowed us to stabilize the HIV envelope glycoprotein in a functional state

    MODBASE, a database of annotated comparative protein structure models and associated resources.

    Get PDF
    MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http:/salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/)

    TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

    Full text link
    Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e., element specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations to deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in the predictions of protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts.Comment: 20 pages, 8 figures, 5 table

    Malaria parasite translocon structure and mechanism of effector export.

    Get PDF
    The putative Plasmodium translocon of exported proteins (PTEX) is essential for transport of malarial effector proteins across a parasite-encasing vacuolar membrane into host erythrocytes, but the mechanism of this process remains unknown. Here we show that PTEX is a bona fide translocon by determining structures of the PTEX core complex at near-atomic resolution using cryo-electron microscopy. We isolated the endogenous PTEX core complex containing EXP2, PTEX150 and HSP101 from Plasmodium falciparum in the 'engaged' and 'resetting' states of endogenous cargo translocation using epitope tags inserted using the CRISPR-Cas9 system. In the structures, EXP2 and PTEX150 interdigitate to form a static, funnel-shaped pseudo-seven-fold-symmetric protein-conducting channel spanning the vacuolar membrane. The spiral-shaped AAA+ HSP101 hexamer is tethered above this funnel, and undergoes pronounced compaction that allows three of six tyrosine-bearing pore loops lining the HSP101 channel to dissociate from the cargo, resetting the translocon for the next threading cycle. Our work reveals the mechanism of P. falciparum effector export, and will inform structure-based design of drugs targeting this unique translocon
    corecore