275 research outputs found

    Prediction of cis/trans isomerization in proteins using PSI-BLAST profiles and secondary structure information

    BACKGROUND: The majority of peptide bonds in proteins occur in the trans conformation. However, for proline residues, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. RESULTS: In this paper, we propose a new approach to predict proline cis/trans isomerization in proteins using a support vector machine (SVM). Preliminary results indicated that Radial Basis Function (RBF) kernels lead to better prediction performance than polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignments obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. Training and testing were performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-ray diffraction, using 5-fold cross-validation. A window size of 11 provided the best performance for determining proline cis/trans isomerization from the single amino acid sequence. Using multiple sequence alignments in the form of PSI-BLAST profiles significantly improved the prediction performance: the prediction accuracy increased from 62.8% with single sequence to 69.8%, and the Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. Furthermore, when coupled with the secondary structure information predicted by PSIPRED, our method yielded a prediction accuracy of 71.5% and an MCC of 0.43, approximately 9 percentage points and 0.17 higher, respectively, than the values achieved with single sequence information. CONCLUSION: A new method has been developed to predict proline cis/trans isomerization in proteins based on a support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequences flanking the central proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of the SVM approach in this study reinforces that SVM is a powerful tool for predicting proline cis/trans isomerization in proteins and for biological sequence analysis.
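
The window-based encoding and the MCC figures quoted in this abstract can be illustrated with a minimal sketch. The function names, the padding character `X`, and the handling of prolines near the chain ends are assumptions made for illustration; the abstract does not describe how the authors treated these details.

```python
import math

def proline_windows(sequence, window=11, pad="X"):
    """Return (index, window_string) for every proline, centered on it.

    Padding with `pad` at the chain ends is an assumption, not the paper's
    documented procedure.
    """
    half = window // 2
    padded = pad * half + sequence + pad * half
    return [(i, padded[i:i + window])
            for i, aa in enumerate(sequence) if aa == "P"]

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

if __name__ == "__main__":
    print(proline_windows("ACDEFPGHIKL"))
    print(mcc(tp=40, tn=45, fp=5, fn=10))
```

Each window string would then be converted to a numeric vector (for example one-hot or PSSM rows) before being passed to a classifier; that step is omitted here.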

    Conditional random field approach to prediction of protein-protein interactions using domain information

    To understand cellular systems and biological networks, it is important to analyze the functions and interactions of proteins and domains. Many methods for predicting protein-protein interactions have been developed. It is known that mutual information between residues at interacting sites can be higher than that at non-interacting sites. This is based on the idea that amino acid residues at interacting sites have coevolved with the corresponding residues in the partner proteins. Several studies have shown that such mutual information is useful for identifying contact residues in interacting proteins.
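
As a rough illustration of the mutual information mentioned in this abstract, the sketch below computes it for two alignment columns using plain frequency counts. The function name and the toy columns are hypothetical, and the abstract does not say which estimator or correction the authors used.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two aligned columns.

    col_a[i] and col_b[i] are the residues observed in the same organism
    (row i of a paired alignment). Uses raw frequencies with no
    pseudocounts, which is a simplifying assumption.
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi

if __name__ == "__main__":
    # Perfectly covarying columns versus independent columns.
    print(mutual_information("AAVV", "DDKK"))
    print(mutual_information("AAVV", "DKDK"))
```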

    APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility

    BACKGROUND: It is well known that most of the binding free energy of protein interaction is contributed by a few key hot spot residues. These residues are crucial for understanding the function of proteins and studying their interactions. Experimental hot spot detection methods such as alanine scanning mutagenesis are not applicable on a large scale since they are time consuming and expensive. Therefore, reliable and efficient computational methods for identifying hot spots are greatly desired and urgently required. RESULTS: In this work, we introduce an efficient approach that uses a support vector machine (SVM) to predict hot spot residues in protein interfaces. We systematically investigate a wide variety of 62 features from a combination of protein sequence and structure information. Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is employed using the F-score method. Based on the selected features, nine individual-feature based predictors are developed to identify hot spots using SVMs. Furthermore, a new ensemble classifier, namely APIS (A combined model based on Protrusion Index and Solvent accessibility), is developed to further improve the prediction accuracy. The results on two benchmark datasets, ASEdb and BID, show that this proposed method yields significantly better prediction accuracy than those previously published in the literature. In addition, we demonstrate the predictive power of our proposed method by modelling two protein complexes: the calmodulin/myosin light chain kinase complex and the heat shock locus gene products U and V complex. The results indicate that our method can identify more hot spots in these two complexes than other state-of-the-art methods. CONCLUSION: We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex. A major contribution of this study is to propose several new features based on the protrusion index of amino acid residues, which has been shown to significantly improve the prediction performance of hot spots. Moreover, we identify a compact and useful feature subset that has an important implication for identifying hot spot residues. Our results indicate that these features are more effective than the conventional evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available at http://home.ustc.edu.cn/~jfxia/hotspot.html.
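
The F-score feature selection step can be sketched as follows. This assumes the commonly used two-class F-score (between-class separation of the means divided by the sum of within-class sample variances); the abstract does not give the exact formula, and the feature names and values below are invented for illustration.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """Two-class F-score of one feature (assumed definition, see lead-in).

    pos and neg hold the feature values of the positive (hot spot) and
    negative samples; each needs at least two values.
    """
    overall = mean(list(pos) + list(neg))
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    denominator = variance(pos) + variance(neg)  # sample variances (n - 1)
    if denominator == 0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator

def rank_features(features):
    """Sort feature names by descending F-score.

    features maps a name to a (pos_values, neg_values) pair.
    """
    return sorted(features, key=lambda name: f_score(*features[name]),
                  reverse=True)

if __name__ == "__main__":
    toy = {  # hypothetical feature values
        "protrusion_index": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
        "conservation": ([1.0, 5.0, 3.0], [2.0, 4.0, 3.5]),
    }
    print(rank_features(toy))
```

Features with a higher score separate the two classes better on their own; the score ignores interactions between features, which is one reason a selection step like this is usually followed by validation of the chosen subset.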

    Predicting Residue-Residue Contacts and Helix-Helix Interactions in Transmembrane Proteins Using an Integrative Feature-Based Random Forest Approach

    Integral membrane proteins constitute 25–30% of genomes and play crucial roles in many biological processes. However, less than 1% of membrane protein structures are in the Protein Data Bank. In this context, it is important to develop reliable computational methods for predicting the structures of membrane proteins. Here, we present the first application of random forest (RF) for residue-residue contact prediction in transmembrane proteins, which we term TMhhcp. Rigorous cross-validation tests indicate that the built RF models provide more favorable prediction performance than two state-of-the-art methods, TMHcon and MEMPACK. Using a strict leave-one-protein-out jackknifing procedure, the models reached top L/5 prediction accuracies of 49.5% and 48.8% for two different residue contact definitions, respectively. The predicted residue contacts were further employed to predict interacting helical pairs and achieved Matthews correlation coefficients of 0.430 and 0.424 for the two residue contact definitions, respectively. To serve the academic community, the TMhhcp server has been made freely accessible at http://protein.cau.edu.cn/tmhhcp
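
The "top L/5" accuracy reported here is a standard contact-prediction metric: rank all predicted residue pairs by score, keep the best L/5, and report the fraction that are true contacts. The sketch below is one plausible reading; how L is defined (for example full sequence length versus combined helix length) is not stated in the abstract, so it is left as a caller-supplied parameter, and all names are hypothetical.

```python
def top_l_over_k_accuracy(scored_pairs, true_contacts, length, k=5):
    """Fraction of the top length/k scored pairs that are true contacts.

    scored_pairs: iterable of ((i, j), score) predictions.
    true_contacts: iterable of (i, j) residue pairs observed in contact.
    length: the L in "top L/5"; its definition is an assumption here.
    """
    truth = {tuple(sorted(pair)) for pair in true_contacts}
    n_top = max(1, length // k)
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    selected = ranked[:n_top]
    if not selected:
        return 0.0
    hits = sum(1 for pair, _ in selected if tuple(sorted(pair)) in truth)
    return hits / len(selected)

if __name__ == "__main__":
    predictions = [((1, 10), 0.9), ((2, 9), 0.8), ((3, 8), 0.1)]
    contacts = [(10, 1), (3, 8)]
    print(top_l_over_k_accuracy(predictions, contacts, length=10))
```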

    PFresGO: an attention mechanism-based deep-learning approach for protein annotation by integrating gene ontology inter-relationships

    MOTIVATION: The rapid accumulation of high-throughput sequence data demands the development of effective and efficient data-driven computational methods to functionally annotate proteins. However, most current approaches to functional annotation focus on protein-level information and ignore inter-relationships among annotations. RESULTS: Here, we established PFresGO, an attention-based deep-learning approach that incorporates hierarchical structures in Gene Ontology (GO) graphs and advances in natural language processing algorithms for the functional annotation of proteins. PFresGO employs a self-attention operation to capture the inter-relationships of GO terms, updates its embedding accordingly and uses a cross-attention operation to project protein representations and GO embedding into a common latent space to identify global protein sequence patterns and local functional residues. We demonstrate that PFresGO consistently achieves superior performance across GO categories when compared with 'state-of-the-art' methods. Importantly, we show that PFresGO can identify functionally important residues in protein sequences by assessing the distribution of attention weightings. PFresGO should serve as an effective tool for the accurate functional annotation of proteins and functional domains within proteins. AVAILABILITY AND IMPLEMENTATION: PFresGO is available for academic purposes at https://github.com/BioColLab/PFresGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
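
To make the cross-attention idea concrete, here is a minimal sketch of generic scaled dot-product attention in plain Python. It is not PFresGO's implementation: treating GO-term embeddings as queries and per-residue representations as keys and values is an assumption for illustration, and learned projection matrices, multiple heads, and normalization layers are omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention.

    queries: one vector per GO term (assumed role).
    keys, values: one vector per residue (assumed role).
    Returns (outputs, weights); weights[t][r] is how strongly term t
    attends to residue r.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

if __name__ == "__main__":
    out, att = cross_attention(
        queries=[[1.0, 0.0]],
        keys=[[1.0, 0.0], [0.0, 1.0]],
        values=[[1.0, 2.0], [3.0, 4.0]],
    )
    print(out, att)
```

Inspecting a row of `weights` is the kind of analysis the abstract refers to when it mentions assessing the distribution of attention weightings over residues.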

    PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural and network features in a machine learning framework

    Determining the catalytic residues in an enzyme is critical to our understanding of the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence–structure–function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve ranging from 0.896 to 0.973 and an area under the precision-recall curve ranging from 0.294 to 0.523. We demonstrated that this method captures useful signals arising from different levels, and that combining these distinct types of features significantly improves the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence–structure–function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.
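
The two summary metrics quoted in this abstract can be computed from per-residue scores and labels as sketched below. The ROC AUC uses the pairwise-ranking identity, and average precision is used as one common estimate of the area under the precision-recall curve; the abstract does not state which estimator the authors used, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a random positive
    outscores a random negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits

if __name__ == "__main__":
    s = [0.9, 0.8, 0.3, 0.1]
    y = [1, 0, 1, 0]
    print(roc_auc(s, y), average_precision(s, y))
```

Catalytic residues are a small minority of all residues, which is why the precision-recall figures in the abstract are much lower than the ROC figures: precision is sensitive to class imbalance while ROC AUC is not.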

    FusC, a member of the M16 protease family acquired by bacteria for iron piracy against plants

    Iron is essential for life. Accessing iron from the environment can be a limiting factor that determines success in a given environmental niche. For bacteria, access to chelated iron from the environment is often mediated by TonB-dependent transporters (TBDTs), which are β-barrel proteins that form sophisticated channels in the outer membrane. Reports of iron-bearing proteins being used as a source of iron indicate specific protein import reactions across the bacterial outer membrane. The molecular mechanism by which a folded protein can be imported in this way had remained mysterious, as did the evolutionary process that could lead to such a protein import pathway. How does the bacterium evolve the specificity factors that would be required to select and import a protein encoded on another organism's genome? We describe here a model whereby the plant iron-bearing protein ferredoxin can be imported across the outer membrane of the plant pathogen Pectobacterium by means of a Brownian ratchet mechanism, thereby liberating iron into the bacterium to enable its growth in plant tissues. This import pathway is facilitated by FusC, a member of the same protein family as the mitochondrial processing peptidase (MPP). The Brownian ratchet depends on binding sites discovered in crystal structures of FusC that engage a linear segment of the plant protein ferredoxin. Sequence relationships suggest that the bacterial gene encoding FusC has previously unappreciated homologues in plants and that the protein import mechanism employed by the bacterium is an evolutionary echo of the protein import pathway in plant mitochondria and plastids.

    A subset of HLA-I peptides are not genomically templated: evidence for cis- and trans-spliced peptide ligands

    The diversity of peptides displayed by class I human leukocyte antigen (HLA) plays an essential role in T cell immunity. The peptide repertoire is extended by various posttranslational modifications, including proteasomal splicing of peptide fragments from distinct regions of an antigen to form nongenomically templated cis-spliced sequences. Previously, it has been suggested that a fraction of the immunopeptidome constitutes such cis-spliced peptides; however, because of computational limitations, it has not been possible to assess whether trans-spliced peptides (i.e., the fusion of peptide segments from distinct antigens) are also bound and presented by HLA molecules, and if so, in what proportion. Here, we have developed and applied a bioinformatic workflow and demonstrated that trans-spliced peptides are presented by HLA-I, and that their abundance challenges current models of proteasomal splicing that predict cis-splicing as the most probable outcome. These trans-spliced peptides display canonical HLA-binding sequence features and are as frequently identified as cis-spliced peptides found bound to a number of different HLA-A and HLA-B allotypes. Structural analysis reveals that the junction between spliced peptides is highly solvent exposed and likely to participate in T cell receptor interactions. These results highlight the unanticipated diversity of the immunopeptidome and have important implications for autoimmunity, vaccine design, and immunotherapy.