18 research outputs found

    PiRaNhA: A server for the computational prediction of RNA-binding residues in protein sequences

    The PiRaNhA web server is a publicly available online resource that automatically predicts the location of RNA-binding residues (RBRs) in protein sequences. The goal of functional annotation of sequences in the field of RNA binding is to provide predictions of high accuracy that require only small numbers of targeted mutations for verification. The PiRaNhA server uses a support vector machine (SVM), with position-specific scoring matrices, residue interface propensity, predicted residue accessibility and residue hydrophobicity as features. The server allows the submission of up to 10 protein sequences, and the predictions for each sequence are provided on a web page and via email. The prediction results are provided in sequence format with predicted RBRs highlighted, in text format with the SVM threshold score indicated and as a graph which enables users to quickly identify those residues above any specific SVM threshold. The graph effectively enables the increase or decrease of the false positive rate. When tested on a non-redundant data set of 42 protein sequences not used in training, the PiRaNhA server achieved an accuracy of 85%, specificity of 90% and a Matthews correlation coefficient of 0.41 and outperformed other publicly available servers. The PiRaNhA prediction server is freely available at http://www.bioinformatics.sussex.ac.uk/PIRANHA. © The Author(s) 2010. Published by Oxford University Press
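The reported figures (accuracy, specificity, Matthews correlation coefficient) and the idea of reading off residues above a chosen SVM threshold can be illustrated with a minimal sketch. The function names and the example scores below are hypothetical and are not taken from the PiRaNhA server; only the metric definitions are standard.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, specificity and Matthews correlation coefficient
    from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, specificity, mcc

def residues_above(scores, threshold):
    """1-based positions whose per-residue score exceeds the threshold."""
    return [i for i, s in enumerate(scores, start=1) if s > threshold]

# Hypothetical per-residue SVM scores (not real server output).
scores = [-1.2, 0.3, 0.9, -0.1, 1.5]
print(residues_above(scores, 0.0))
print(residues_above(scores, 1.0))
```

Raising the threshold in this sketch shrinks the set of predicted residues, which is the mechanism by which a user trades sensitivity against the false positive rate.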

    Identifying Human Kinase-Specific Protein Phosphorylation Sites by Integrating Heterogeneous Information from Various Sources

    Phosphorylation is an important type of protein post-translational modification. Identification of possible phosphorylation sites of a protein is important for understanding its functions. Unbiased screening for phosphorylation sites by in vitro or in vivo experiments is time consuming and expensive; in silico prediction can provide functional candidates and help narrow down the experimental efforts. Most of the existing prediction algorithms take only the polypeptide sequence around the phosphorylation sites into consideration. However, protein phosphorylation is a very complex biological process in vivo. The polypeptide sequences around the potential sites are not sufficient to determine the phosphorylation status of those residues. In the current work, we integrated various data sources such as protein functional domains, protein subcellular location and protein-protein interactions, along with the polypeptide sequences to predict protein phosphorylation sites. The heterogeneous information significantly boosted the prediction accuracy for some kinase families. To demonstrate potential application of our method, we scanned a set of human proteins and predicted putative phosphorylation sites for Cyclin-dependent kinases, Casein kinase 2, Glycogen synthase kinase 3, Mitogen-activated protein kinases, protein kinase A, and protein kinase C families (available at http://cmbi.bjmu.edu.cn/huphospho). The predicted phosphorylation sites can serve as candidates for further experimental validation. Our strategy may also be applicable for the in silico identification of other post-translational modification substrates.

    Impact of residue accessible surface area on the prediction of protein secondary structures

    Background: The problem of accurate prediction of protein secondary structure continues to be one of the challenging problems in bioinformatics. It has been previously suggested that amino acid relative solvent accessibility (RSA) might be an effective factor for increasing the accuracy of protein secondary structure prediction. Previous studies have either used a single constant threshold to classify residues into discrete classes (buried vs. exposed), or used the real-value predicted RSAs in their prediction method. Results: We studied the effect of applying different RSA threshold types (namely, fixed thresholds vs. residue-dependent thresholds) on a variety of secondary structure prediction methods. With the consideration of DSSP-assigned RSA values we realized that improvement in the accuracy of prediction strictly depends on the selected threshold(s). Furthermore, we showed that choosing a single threshold for all amino acids is not the best possible parameter. We therefore used residue-dependent thresholds and most residues showed improvement in prediction. Next, we tried to consider predicted RSA values, since in the real-world problem, protein sequence is the only available information. We first predicted the RSA classes with the RVP-net program and then used these data in our method. Using this approach, improvement in prediction was also obtained. Conclusion: The success of applying the RSA information on different secondary structure prediction methods suggests that prediction accuracy can be improved independent of prediction approaches. Thus, solvent accessibility can be considered as a rich source of information to help the improvement of these methods.
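The contrast between a single fixed RSA threshold and residue-dependent thresholds can be sketched as follows. This is an illustration only: the threshold values are placeholders chosen for the example, not the values used in the paper, and the function names are hypothetical.

```python
# Placeholder values for illustration; the paper's thresholds are not given here.
DEFAULT_THRESHOLD = 0.25
RESIDUE_THRESHOLDS = {"A": 0.20, "K": 0.40, "L": 0.15}

def classify_fixed(rsa, threshold=DEFAULT_THRESHOLD):
    """Two-state class using one threshold for every amino acid."""
    return "exposed" if rsa >= threshold else "buried"

def classify_residue_dependent(residue, rsa):
    """Two-state class using a per-residue threshold,
    falling back to the fixed threshold for unlisted residues."""
    threshold = RESIDUE_THRESHOLDS.get(residue, DEFAULT_THRESHOLD)
    return classify_fixed(rsa, threshold)

# The same RSA value can land in different classes under the two schemes.
print(classify_fixed(0.3))
print(classify_residue_dependent("K", 0.3))
```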

    PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites

    PHOSIDA, a phosphorylation site database, integrates thousands of phosphosites identified by proteomics in various species

    In silico interaction analysis of cannabinoid receptor interacting protein 1b (CRIP1b) – CB1 cannabinoid receptor

    Cannabinoid Receptor Interacting Protein isoform 1b (CRIP1b) is known to interact with the CB1 receptor. Alternative splicing of the CNRIP1 gene produces CRIP1a and CRIP1b with a difference in the third exon only. Exons 1 and 2 encode a functional domain in both proteins. CRIP1a is involved in regulating CB1 receptor internalization, but the function of CRIP1b is not very well characterized. Since there are significant identities in functional domains of these proteins, CRIP1b is a potential target for drug discovery. We report here the predicted structure of CRIP1b followed by its interaction analysis with the CB1 receptor by in silico methods. A number of complementary computational techniques, including homology modeling, ab initio modeling and protein threading, were applied to generate three-dimensional molecular models for CRIP1b. The computed model of CRIP1b was refined, followed by docking with the C terminus of the CB1 receptor to generate a model for the CRIP1b-CB1 receptor interaction. The structure of CRIP1b obtained by homology modelling using RHOGDI-2 as template is a sandwich fold structure having beta sheets connected by loops, similar to the predicted CRIP1a structure. The best scoring refined model of CRIP1b in complex with the CB1 receptor C terminus peptide showed favourable polar interactions. The overall binding pocket of CRIP1b was found to overlap with that of CRIP1a. The Arg82 and Cys126 of CRIP1b are involved in the majority of hydrogen bond interactions with the CB1 receptor and are possible key residues required for interactions between the CB1 receptor and CRIP1b. © 2017 Elsevier Inc.

    Context dependent reference states of solvent accessibility derived from native protein structures and assessed by predictability analysis

    Background: Solvent accessibility (ASA) of amino acid residues is often transformed from absolute values of exposed surface area to their normalized relative values. This normalization is typically attained by assuming a highest exposure conformation based on the extended state of that residue when it is surrounded by Ala or Gly on both sides, i.e. the Ala-X-Ala or Gly-X-Gly solvent exposed area. Exact sequence context, the folding state of the residues, and the actual environment of a folded protein, which do impose additional constraints on the highest possible (or highest observed) values of ASA, are currently ignored. Here, we analyze the statistics of these constraints and examine how the normalization of absolute ASA values using context-dependent Highest Observed ASA (HOA) instead of context-free extended state ASA (ESA) of residues can influence the performance of sequence-based prediction of solvent accessibility. Characterization of burial and exposed states of residues based on this normalization has also been shown to provide better enrichment of DNA-binding sites in exposed residues. Results: We compiled the statistics of highest observed ASA (HOA) of residues in their different contexts and analyzed their distribution in all 400 possible combinations for each residue type. We observe that many tripeptides are more exposed than ESA and that HOA residues are often found in turn, coil and bend conformations. On the other hand, several residues are never observed in an exposure state close to ESA values. A neural network trained with HOA-normalized data outperforms the one trained with ESA-normalized values. However, the improvements are subtle in some residues, while they are more significant in others. Conclusion: HOA based normalization of solvent accessibility from native structures is proposed and it shows improvement in sequence-based predictability, as well as enrichment in interface residues on surface. There may still be some difference between the highest possible ASA and highest observed ASA due to an insufficiently covered space of ASA distribution in the PDB, which limits the overall improvement in prediction to a relatively modest degree.
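The difference between context-free (ESA) and context-dependent (HOA) normalization can be sketched as a lookup keyed by the residue and its two neighbours, which gives the 400 contexts per residue type mentioned in the abstract. The reference areas below are placeholder numbers, and the table and function names are hypothetical. The fallback to ESA for an unseen context is an assumption of this sketch, not something the paper states.

```python
# Placeholder reference areas in square angstroms (not values from the paper).
ESA = {"K": 200.0, "G": 80.0}        # context-free extended-state ASA
HOA = {("A", "K", "A"): 160.0}       # highest observed ASA per tripeptide context

def reference_asa(left, residue, right, use_context=True):
    """Pick the normalizing area: HOA for the tripeptide context if known,
    otherwise the context-free ESA for the residue type."""
    if use_context:
        ref = HOA.get((left, residue, right))
        if ref is not None:
            return ref
    return ESA[residue]

def relative_asa(absolute, left, residue, right, use_context=True):
    """Relative accessibility = absolute ASA / reference ASA."""
    return absolute / reference_asa(left, residue, right, use_context)

print(relative_asa(80.0, "A", "K", "A"))                     # HOA-normalized
print(relative_asa(80.0, "A", "K", "A", use_context=False))  # ESA-normalized
```

The result is deliberately not capped at 1.0, since the abstract notes that observed exposure can exceed the extended-state reference.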

    A multi-factor model for caspase degradome prediction

    Background: Caspases belong to a class of cysteine proteases which function as critical effectors in cellular processes such as apoptosis and inflammation by cleaving substrates immediately after unique tetrapeptide sites. With hundreds of reported substrates and many more expected to be discovered, the elucidation of the caspase degradome will be an important milestone in the study of these proteases in human health and disease. Several computational methods for predicting caspase cleavage sites have been developed recently for identifying potential substrates. However, as most of these methods are based primarily on the detection of the tetrapeptide cleavage sites - a factor necessary but not sufficient for predicting in vivo substrate cleavage - prediction outcomes will inevitably include many false positives. Results: In this paper, we show that structural factors such as the presence of disorder and solvent exposure in the vicinity of the cleavage site are important and can be used to enhance results from cleavage site prediction. We constructed a two-step model incorporating cleavage site prediction and these factors to predict caspase substrates. Sequences are first predicted for cleavage sites using CASVM or GraBCas. Predicted cleavage sites are then scored, ranked and filtered against a cut-off based on their propensities for locating in disordered and solvent exposed regions. Using an independent dataset of caspase substrates, the model was shown to achieve greater positive predictive values compared to CASVM or GraBCas alone, and was able to reduce the false positives pool by up to 13% and 53% respectively while retaining all true positives. We applied our prediction model on the family of receptor tyrosine kinases (RTKs) and highlighted several members as potential caspase targets. The results suggest that RTKs may be generally regulated by caspase cleavage and in some cases, promote the induction of apoptotic cell death - a function distinct from their role as transducers of survival and growth signals. Conclusion: As a step towards the prediction of in vivo caspase substrates, we have developed an accurate method incorporating cleavage site prediction and structural factors. The multi-factor model augments existing methods and complements experimental efforts to define the caspase degradome on a systems-wide basis.
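The second step of the two-step model (score, rank and filter predicted sites by structural propensity) might look like the sketch below. The abstract does not say how the disorder and exposure propensities are combined, so the simple average used here is an assumption. The `Site` type, field names and example values are likewise hypothetical, and step one (CASVM or GraBCas prediction) is represented only by its output.

```python
from dataclasses import dataclass

@dataclass
class Site:
    position: int          # residue position of the predicted cleavage site
    cleavage_score: float  # score from the step-one site predictor
    disorder: float        # propensity for a disordered region, 0..1
    exposure: float        # propensity for a solvent-exposed region, 0..1

def structural_score(site):
    """Assumed combination: plain average of the two propensities."""
    return (site.disorder + site.exposure) / 2

def filter_sites(sites, cutoff):
    """Keep sites at or above the cutoff, ranked best first."""
    kept = [s for s in sites if structural_score(s) >= cutoff]
    return sorted(kept, key=structural_score, reverse=True)

# Hypothetical step-one output.
candidates = [
    Site(10, 0.90, 0.8, 0.6),
    Site(55, 0.95, 0.1, 0.2),
    Site(120, 0.70, 0.9, 0.9),
]
print([s.position for s in filter_sites(candidates, cutoff=0.5)])
```

In this sketch a site with a strong sequence score but low structural propensity (position 55) is dropped, which is how the filter is meant to remove false positives.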

    Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information

    Background: Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio. Results: Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available. Conclusion: The predictive systems are publicly available at http://distill.ucd.ie. Funding: Science Foundation Ireland; Irish Research Council for Science, Engineering and Technology; Health Research Board; UCD President's Award 2004.

    Predicting protein interface residues using easily accessible on-line resources

    © The Author 2015. Published by Oxford University Press. It has been more than a decade since the completion of the Human Genome Project that provided us with a complete list of human proteins. The next obvious task is to figure out how various parts interact with each other. On that account, we review 10 methods for protein interface prediction, which are freely available as web servers. In addition, we comparatively evaluate their performance on a common data set comprising different quality target structures. We find that using experimental structures and high-quality homology models, structure-based methods outperform those using only protein sequences, with global template-based approaches providing the best performance. For moderate-quality models, sequence-based methods often perform better than those structure-based techniques that rely on fine atomic details. We note that post-processing protocols implemented in several methods quantitatively improve the results only for experimental structures, suggesting that these procedures should be tuned up for computer-generated models. Finally, we anticipate that advanced meta-prediction protocols are likely to enhance interface residue prediction. Notwithstanding further improvements, easily accessible web servers already provide the scientific community with convenient resources for the identification of protein-protein interaction sites.