PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications
A cascading system of hierarchical, artificial neural networks (named
PRED-CLASS) is presented for the generalized classification of proteins into
four distinct classes (transmembrane, fibrous, globular, and mixed) using
information encoded solely in their amino acid sequences. The architecture of
the individual component networks is kept very simple, reducing the number of
free parameters (network synaptic weights) for faster training, improved
generalization, and the avoidance of data overfitting. Capturing information
from as few as 50 protein sequences spread among the four target classes (6
transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able to
obtain 371 correct predictions out of a set of 387 proteins (success rate
approximately 96%) unambiguously assigned into one of the target classes. The
application of PRED-CLASS to various test sets and to the complete proteomes of
several organisms demonstrates that such a method could serve as a valuable
tool in the annotation of genomic open reading frames with no functional
assignment or as a preliminary step in fold recognition and ab initio structure
prediction methods. Detailed results obtained for various data sets and
completed genomes, along with a web server running the PRED-CLASS algorithm, can
be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLAS
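The cascade idea can be illustrated with a minimal Python sketch. Everything concrete here is an assumption rather than PRED-CLASS itself: the input features (20-dimensional amino acid composition), the stage order (transmembrane, then fibrous, then globular, with mixed as the fallback), the 0.5 threshold and the one-hidden-layer network shape. It shows how a chain of small binary networks can assign one of the four classes, and how few free parameters each such network has.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> List[float]:
    """Fraction of each of the 20 standard residues; other letters are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class TinyNet:
    """Hypothetical stage: one hidden sigmoid layer, one sigmoid output (weights trained elsewhere)."""
    w_hidden: List[List[float]]  # n_hidden rows, each of length n_inputs
    b_hidden: List[float]
    w_out: List[float]
    b_out: float

    def n_params(self) -> int:
        # Every weight and bias is a free parameter fitted during training.
        return (sum(len(row) for row in self.w_hidden) + len(self.b_hidden)
                + len(self.w_out) + 1)

    def __call__(self, x: Sequence[float]) -> float:
        hidden = [_sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        return _sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)


def cascade_predict(x: Sequence[float], stages: Sequence[Tuple[str, TinyNet]],
                    fallback: str = "mixed", threshold: float = 0.5) -> str:
    """Ask each binary stage in turn; the first one that fires decides the class."""
    for label, net in stages:
        if net(x) >= threshold:
            return label
    return fallback
```

With 20 inputs and 2 hidden units, one stage has 20*2 + 2 + 2 + 1 = 45 free parameters, the kind of small budget the abstract credits for faster training and reduced overfitting.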
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Recently, exciting progress has been made on protein contact prediction, but
the predicted contacts for proteins without many sequence homologs are still of
low quality and not very useful for de novo structure prediction. This paper
presents a new deep learning method that predicts contacts by integrating both
evolutionary coupling (EC) and sequence conservation information through an
ultra-deep neural network formed by two deep residual networks. This deep
neural network allows us to model very complex sequence-contact relationships as
well as long-range inter-contact correlation. Our method greatly outperforms
existing contact prediction methods and leads to much more accurate
contact-assisted protein folding. Tested on three datasets of 579 proteins, the
average top L long-range prediction accuracy obtained by our method, the
representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21
and 0.30, respectively; the average top L/10 long-range accuracy of our method,
CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding
using our predicted contacts as restraints can yield correct folds (i.e.,
TMscore>0.6) for 203 test proteins, while folding with MetaPSICOV- and
CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively.
Further, our contact-assisted models have much better quality than
template-based models. Using our predicted contacts as restraints, we can (ab
initio) fold 208 of the 398 membrane proteins with TMscore>0.5. By contrast,
when the training proteins of our method are used as templates, homology
modeling can only do so for 10 of them. One interesting finding is that even though
we do not train our prediction models with any membrane proteins, our method
works very well on membrane protein prediction. Finally, in the recent blind CAMEO
benchmark, our method successfully folded 5 test proteins with a novel fold.
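The "top L" and "top L/10" long-range accuracies quoted above are, under the usual CASP-style convention, precisions: rank all residue pairs with sequence separation of at least 24 by predicted contact probability, keep the top k (k = L or L/10, where L is the sequence length), and count the fraction that are true contacts. This Python sketch assumes that convention (separation ≥ 24, and a true contact means C-beta atoms closer than 8 Å); the paper's exact evaluation protocol may differ.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Pair = Tuple[int, int]


def true_contacts(cb_coords: Sequence[Tuple[float, float, float]],
                  cutoff: float = 8.0) -> Set[Pair]:
    """Residue pairs (i < j) whose C-beta atoms are closer than `cutoff` angstroms."""
    n = len(cb_coords)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(cb_coords[i], cb_coords[j]) < cutoff}


def top_k_long_range_precision(pred: Dict[Pair, float], native: Set[Pair],
                               k: int, min_sep: int = 24) -> float:
    """Fraction of the k most probable long-range predicted pairs that are native contacts."""
    long_range = [(p, (min(i, j), max(i, j))) for (i, j), p in pred.items()
                  if abs(i - j) >= min_sep]
    long_range.sort(key=lambda item: item[0], reverse=True)
    top = long_range[:k]
    if not top:
        return 0.0
    # Divide by the number actually kept (conventions differ when fewer than k exist).
    hits = sum(1 for _, pair in top if pair in native)
    return hits / len(top)
```

For a protein of length L, calling `top_k_long_range_precision(pred, native, L)` gives the top-L figure and `k = max(1, L // 10)` gives the top-L/10 figure.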
Segmentally Variable Genes: A New Perspective on Adaptation
Genomic sequence variation is the hallmark of life and is key to understanding diversity and adaptation among the numerous microorganisms on earth. Analysis of the sequenced microbial genomes suggests that genes are evolving at many different rates. We have attempted to derive a new classification of genes into three broad categories: lineage-specific genes that evolve rapidly and appear unique to individual species or strains; highly conserved genes that frequently perform housekeeping functions; and partially variable genes that contain highly variable regions, at least 70 amino acids long, interspersed among well-conserved regions. The latter we term segmentally variable genes (SVGs), and we suggest that they are especially interesting targets for biochemical studies. Among these genes are ones necessary to deal with the environment, including genes involved in host–pathogen interactions, defense mechanisms, and intracellular responses to internal and environmental changes. For the most part, the detailed function of these variable regions remains unknown. We propose that they are likely to perform important binding functions responsible for protein–protein, protein–nucleic acid, or protein–small molecule interactions. Discerning their function and identifying their binding partners may offer biologists new insights into the basic mechanisms of adaptation, context-dependent evolution, and the interaction between microbes and their environment. Segmentally variable genes show a mosaic pattern of one or more rapidly evolving, variable regions. Discerning their function may provide new insights into the forces that shape genome diversity and adaptation.
National Science Foundation (998088, 0239435
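The definition of a segmentally variable gene (variable stretches of at least 70 residues set among conserved regions) can be turned into a simple scan over a per-residue variability profile. In this hypothetical Python sketch, the profile itself (for example, some per-column divergence score from an alignment with homologs), the 0.5 threshold, and the `has_homologs` flag standing in for the lineage-specific test are all assumptions, not the authors' procedure.

```python
from typing import List, Sequence, Tuple


def variable_segments(scores: Sequence[float], threshold: float,
                      min_len: int = 70) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs scoring above `threshold` that are at least `min_len` long."""
    segments: List[Tuple[int, int]] = []
    start = None
    for pos, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = pos  # a variable run begins
        elif start is not None:
            if pos - start >= min_len:
                segments.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))
    return segments


def classify_gene(scores: Sequence[float], has_homologs: bool,
                  threshold: float = 0.5, min_len: int = 70) -> str:
    """Hypothetical three-way split modeled on the categories described above."""
    if not has_homologs:
        return "lineage-specific"
    segments = variable_segments(scores, threshold, min_len)
    if not segments:
        # No variable run reaches min_len residues.
        return "highly conserved"
    variable = sum(end - start for start, end in segments)
    # An SVG needs conserved sequence around its variable region(s).
    if variable < len(scores):
        return "segmentally variable"
    return "unclassified"
```

A real analysis would also need to decide how short variable stretches below the 70-residue cutoff are treated; this sketch simply ignores them.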
Structure of the AAA protein Msp1 reveals mechanism of mislocalized membrane protein extraction.
The AAA protein Msp1 extracts mislocalized tail-anchored membrane proteins and targets them for degradation, thus maintaining proper cell organization. How Msp1 selects its substrates and firmly engages them during the energetically unfavorable extraction process remains a mystery. To address this question, we solved cryo-EM structures of Msp1-substrate complexes at near-atomic resolution. Akin to other AAA proteins, Msp1 forms hexameric spirals that translocate substrates through a central pore. A singular hydrophobic substrate recruitment site is exposed at the spiral's seam, which we propose positions the substrate for entry into the pore. There, a tight web of aromatic amino acids grips the substrate in a sequence-promiscuous, hydrophobic milieu. Elements at the intersubunit interfaces coordinate ATP hydrolysis with the subunits' positions in the spiral. We present a comprehensive model of Msp1's mechanism, which follows general architectural principles established for other AAA proteins yet specializes Msp1 for its unique role in membrane protein extraction.
A saposin-lipoprotein nanoparticle system for membrane proteins.
A limiting factor in membrane protein research is the ability to solubilize and stabilize such proteins. Detergents are used most often for solubilizing membrane proteins, but they are associated with protein instability and poor compatibility with structural and biophysical studies. Here we present a saposin-lipoprotein nanoparticle system, Salipro, which allows for the reconstitution of membrane proteins in a lipid environment that is stabilized by a scaffold of saposin proteins. We demonstrate the applicability of the method on two purified membrane protein complexes as well as by the direct solubilization and nanoparticle incorporation of a viral membrane protein complex from the virus membrane. Our approach facilitated high-resolution structural studies of the bacterial peptide transporter PeptTSo2 by single-particle cryo-electron microscopy (cryo-EM) and allowed us to stabilize the HIV envelope glycoprotein in a functional state.
MODBASE, a database of annotated comparative protein structure models and associated resources.
MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).
TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element-specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations of deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts.
Comment: 20 pages, 8 figures, 5 table
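Element-specific persistent homology can be illustrated in its simplest form: keep only the atoms of chosen elements, grow a Vietoris-Rips filtration on their pairwise distances, and record when connected components merge. This Python sketch computes only 0-dimensional barcodes via union-find (equivalent to single-linkage merging) and bins the bar deaths into a fixed-length vector that could serve as one channel of an image-like input. The element sets, bin width and bin count are assumptions, and TopologyNet's actual features (including higher-dimensional bars and protein-ligand element pairs) are richer.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Bar = Tuple[float, float]


def h0_barcodes(points: Sequence[Point]) -> List[Bar]:
    """0-dimensional barcodes of the Rips filtration (filtration value = distance).

    Every point is born at 0; each merge of two components kills one bar at the
    length of the merging edge, and one component survives forever."""
    parent = list(range(len(points)))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    bars: List[Bar] = []
    for length, i, j in edges:  # Kruskal order: shortest edges first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))
    if points:
        bars.append((0.0, math.inf))
    return bars


def element_specific_h0(atoms: Sequence[Tuple[str, Point]], elements: Set[str]) -> List[Bar]:
    """Barcodes computed only from atoms whose element symbol is in `elements`."""
    return h0_barcodes([xyz for element, xyz in atoms if element in elements])


def death_histogram(bars: Sequence[Bar], bin_width: float, n_bins: int) -> List[int]:
    """Count finite bars by death value; one vector per element set acts as one channel."""
    counts = [0] * n_bins
    for _, death in bars:
        if math.isfinite(death):
            index = int(death // bin_width)
            if index < n_bins:
                counts[index] += 1
    return counts
```

Stacking `death_histogram` outputs for several element sets (say carbon only, nitrogen only, carbon plus oxygen) yields a small multichannel array of the kind a convolutional network can consume.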
Malaria parasite translocon structure and mechanism of effector export.
The putative Plasmodium translocon of exported proteins (PTEX) is essential for transport of malarial effector proteins across a parasite-encasing vacuolar membrane into host erythrocytes, but the mechanism of this process remains unknown. Here we show that PTEX is a bona fide translocon by determining structures of the PTEX core complex at near-atomic resolution using cryo-electron microscopy. We isolated the endogenous PTEX core complex containing EXP2, PTEX150 and HSP101 from Plasmodium falciparum in the 'engaged' and 'resetting' states of endogenous cargo translocation using epitope tags inserted using the CRISPR-Cas9 system. In the structures, EXP2 and PTEX150 interdigitate to form a static, funnel-shaped pseudo-seven-fold-symmetric protein-conducting channel spanning the vacuolar membrane. The spiral-shaped AAA+ HSP101 hexamer is tethered above this funnel, and undergoes pronounced compaction that allows three of six tyrosine-bearing pore loops lining the HSP101 channel to dissociate from the cargo, resetting the translocon for the next threading cycle. Our work reveals the mechanism of P. falciparum effector export, and will inform structure-based design of drugs targeting this unique translocon.