
    Predicting sumoylation sites using support vector machines based on various sequence features, conformational flexibility and disorder

    Background: Sumoylation, a reversible and dynamic post-translational modification, is one of the vital processes in a cell. Before a protein matures to perform its function, sumoylation may alter its localization, interactions, and possibly its structural conformation. Aberrations in protein sumoylation have been linked to a variety of disorders and developmental anomalies. Experimental identification of sumoylation sites may be ineffective because of the dynamic nature of sumoylation and the labor and cost of the experiments. Therefore, computational approaches may guide the experimental identification of sumoylation sites and provide insights for further understanding of the sumoylation mechanism. Results: In this paper, the effectiveness of various sequence properties in predicting sumoylation sites was investigated with statistical analyses and a machine learning approach employing support vector machines. These sequence properties, derived from windows of size 7, include position-specific amino acid composition, hydrophobicity, estimated sub-window volumes, predicted disorder, and conformational flexibility. Five-fold cross-validation on experimentally identified sumoylation sites showed that our method predicts sumoylation sites with a Matthews correlation coefficient, sensitivity, specificity, and accuracy of 0.66, 73%, 98%, and 97%, respectively. Additionally, we have shown that our method compares favorably with existing prediction methods and with a basic regular-expression scanner. Conclusions: Using support vector machines, we introduced a new, robust method for sumoylation site prediction. In addition, to our knowledge, this is the first computational exploration of the possible effects of predicted conformational flexibility and disorder on sumoylation site recognition, as additional parameters that could aid sumoylation site prediction.
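
The four figures reported above follow the standard confusion-matrix definitions. As a minimal sketch, treating sumoylated lysines as positives, they can be computed as follows; the counts in the usage line are hypothetical and are not the paper's data.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient
    from confusion-matrix counts (positives = sumoylated lysines)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for illustration only.
print(classification_metrics(tp=73, tn=980, fp=20, fn=27))
```

On heavily imbalanced data such as lysine sites, accuracy is dominated by the negatives, which is why MCC is usually reported alongside it.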

    SUMOhydro: A Novel Method for the Prediction of Sumoylation Sites Based on Hydrophobic Properties

    Sumoylation is one of the most essential mechanisms of reversible protein post-translational modifications and is a crucial biochemical process in the regulation of a variety of important biological functions. Sumoylation is also closely involved in various human diseases. The accurate computational identification of sumoylation sites in protein sequences aids in experimental design and mechanistic research in cellular biology. In this study, we introduced amino acid hydrophobicity as a parameter into a traditional binary encoding scheme and developed a novel sumoylation site prediction tool termed SUMOhydro. With the assistance of a support vector machine, the proposed method was trained and tested using a stringent non-redundant sumoylation dataset. In a leave-one-out cross-validation, the proposed method yielded an excellent performance with a correlation coefficient, specificity, sensitivity and accuracy equal to 0.690, 98.6%, 71.1% and 97.5%, respectively. In addition, SUMOhydro has been benchmarked against previously described predictors based on an independent dataset, thereby suggesting that the introduction of hydrophobicity as an additional parameter could assist in the prediction of sumoylation sites. Currently, SUMOhydro is freely accessible at http://protein.cau.edu.cn/others/SUMOhydro/
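
A minimal sketch of a hydrophobicity-augmented binary encoding, assuming that each residue in a window around a lysine is one-hot encoded over the 20 amino acids and that the "hot" slot carries the residue's hydropathy value instead of 1. The window half-width of 3 (7 residues), the `X` padding character and the Kyte-Doolittle scale are assumptions for illustration; the abstract specifies none of them, and this is not SUMOhydro's published code.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed; the abstract does not name the scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def lysine_windows(seq: str, half: int = 3, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (0-based index, window) for every lysine, padding at the termini."""
    padded = pad * half + seq + pad * half
    return [(i, padded[i:i + 2 * half + 1]) for i, aa in enumerate(seq) if aa == "K"]


def hydro_binary_encode(window: str) -> List[float]:
    """20 slots per residue; the observed amino acid's slot holds its hydropathy
    value instead of 1. Padding or unknown residues stay all-zero."""
    vec: List[float] = []
    for aa in window:
        slot = [0.0] * len(AMINO_ACIDS)
        if aa in KYTE_DOOLITTLE:
            slot[AMINO_ACIDS.index(aa)] = KYTE_DOOLITTLE[aa]
        vec.extend(slot)
    return vec


for idx, win in lysine_windows("MAKEVLKTE"):
    print(idx, win, len(hydro_binary_encode(win)))
```

Each 7-residue window becomes a 140-dimensional vector that an SVM could consume directly.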

    SumSec: accurate prediction of Sumoylation sites using predicted secondary structure

    Post-translational modification (PTM) is defined as the modification of amino acids along a protein sequence after translation. These modifications significantly affect protein function. Therefore, a comprehensive understanding of the underlying mechanisms of PTMs is critical for studying the biological roles of proteins. Among the wide range of PTMs, sumoylation is one of the most important because of its known cellular functions, which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites experimentally is time-consuming and costly. This has created a strong demand for fast computational methods that can accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites, called SumSec. To do this, we used the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence, features that had not previously been used for this task. As a result, our proposed method improves sumoylation site prediction, outperforming previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than that reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec
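
The abstract does not say which two structural feature types SumSec extracts. As a hypothetical illustration only, the sketch below assumes a three-state (H/E/C) predicted secondary-structure string aligned with the sequence and summarizes each lysine's neighborhood by the fraction of helix, strand and coil positions in a window; the function name and the window half-width are invented for this example.

```python
from typing import Dict, Tuple


def ss_window_composition(seq: str, ss: str, half: int = 3) -> Dict[int, Tuple[float, ...]]:
    """For each lysine (0-based index), the fraction of H, E and C states
    among predicted secondary-structure labels within +/- `half` residues."""
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary-structure strings must align")
    features: Dict[int, Tuple[float, ...]] = {}
    for i, aa in enumerate(seq):
        if aa != "K":
            continue
        window = ss[max(0, i - half): i + half + 1]  # truncated at the termini
        features[i] = tuple(window.count(state) / len(window) for state in "HEC")
    return features


print(ss_window_composition("MAKLEKT", "CHHHHCC"))
```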

    Amino acid preferences at neddylation sites

    Neddylation is a dynamic post-translational modification in which NEDD8 proteins are covalently attached to a target lysine residue. Neddylation may affect a target protein’s localization, binding partners and structure. Targets of this modification are commonly found in the nucleus, and the best-characterized target family is the cullins, which modulate the ubiquitination and proteasomal degradation system in a cell. Disruptions in the neddylation pathway have been implicated in various diseases such as Alzheimer’s disease, Parkinson’s disease and cancer. Therefore, understanding neddylation site recognition is highly important for understanding the complete functional mechanism of this post-translational modification and for revealing the mechanisms of associated diseases toward a cure. However, no study in the literature has investigated whether a common neddylation site motif exists. In this work, we have identified various amino acid preferences and hydrophobicity patterns in neddylation sites that distinguish them from non-neddylated lysine residues.
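
One common way to quantify positional amino acid preferences is a log-odds table comparing residue frequencies at each window position in modified versus unmodified lysine windows. The sketch below is such a generic comparison, not the analysis used in the study; the pseudocount of 1 and the base-2 logarithm are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def position_log_odds(pos_windows: List[str], neg_windows: List[str],
                      pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position log2 ratio of residue frequency in positive (e.g. neddylated)
    windows versus negative windows. Values > 0 mark enriched residues."""
    width = len(pos_windows[0])
    if any(len(w) != width for w in pos_windows + neg_windows):
        raise ValueError("all windows must have the same length")
    alphabet = sorted({aa for w in pos_windows + neg_windows for aa in w})
    table = []
    for j in range(width):
        pos_counts = Counter(w[j] for w in pos_windows)
        neg_counts = Counter(w[j] for w in neg_windows)
        row = {}
        for aa in alphabet:
            # Pseudocounts avoid log(0) for residues unseen in one class.
            p = (pos_counts[aa] + pseudocount) / (len(pos_windows) + pseudocount * len(alphabet))
            q = (neg_counts[aa] + pseudocount) / (len(neg_windows) + pseudocount * len(alphabet))
            row[aa] = math.log2(p / q)
        table.append(row)
    return table


print(position_log_odds(["LKE", "LKE"], ["AKD", "AKD"])[0])
```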

    Swarm intelligence for optimizing the parameters of multiple sequence aligners

    Rubio-Largo, Á., Vanneschi, L., Castelli, M., & Vega-Rodríguez, M. A. (2018). Swarm intelligence for optimizing the parameters of multiple sequence aligners. Swarm and Evolutionary Computation. DOI: 10.1016/j.swevo.2018.04.003

    Different aligner heuristics can be found in the literature to solve the Multiple Sequence Alignment problem. These aligners rely on the parameter configuration proposed by their authors (also known as the default parameter configuration), which was chosen to obtain good results (alignments with high accuracy and conservation) for any input set of unaligned sequences. However, the default parameter configuration is not always the best one for every input set: depending on the biological characteristics of the input set, one may be able to find a better parameter configuration that produces an alignment with higher accuracy and conservation. The main contributions of this work are to study the biological characteristics of the input set and then to apply the best parameter configuration found for those characteristics. The framework uses a pre-computed file to retrieve the best parameter configuration found for a dataset with similar biological characteristics. To create this file, we use a Particle Swarm Optimization (PSO) algorithm, that is, an algorithm based on swarm intelligence. To test the effectiveness of the characteristic-based framework, we employ five well-known aligners: Clustal W, DIALIGN-TX, Kalign2, MAFFT, and MUSCLE. These aligners show clear improvements when using the proposed characteristic-based framework.
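
The following sketch shows a minimal global-best PSO of the kind named above. It minimizes a toy quadratic over two hypothetical parameters (`gap_open`, `gap_extend`); in the framework described, the objective would instead score the alignments produced by an aligner, and a quality score to be maximized would be negated. The inertia and acceleration coefficients (0.729, 1.49445) are common textbook defaults, not values taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(objective: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, iterations: int = 100,
                 inertia: float = 0.729, c_personal: float = 1.49445,
                 c_social: float = 1.49445, seed: int = 0) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization over a box-bounded space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm-wide best

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cost(params: List[float]) -> float:
    """Hypothetical stand-in for negated alignment quality."""
    gap_open, gap_extend = params
    return (gap_open - 10.0) ** 2 + (gap_extend - 0.5) ** 2


best, cost = pso_minimize(toy_cost, bounds=[(0.0, 20.0), (0.0, 5.0)])
print(best, cost)
```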

    Numerical characterization of protein sequences based on the generalized Chou's pseudo amino acid composition

    Techniques for comparing and analyzing biological sequences play an increasingly important role in computational biology and bioinformatics. A key step in developing such techniques is finding an appropriate way to represent a biological sequence. In this paper, on the basis of three physicochemical properties of amino acids, a protein primary sequence is reduced to a six-letter sequence, and a set of elements reflecting global and local sequence-order information is then extracted. These elements are combined with the frequencies of the 20 native amino acids to construct a (21+λ)-dimensional vector that characterizes the protein sequence. The utility of the proposed approach is illustrated by phylogenetic analysis and the identification of DNA-binding proteins.
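
A rough sketch of the general idea: reduce the sequence to a six-letter alphabet, compute λ sequence-order factors on the reduced sequence, and append them to the 20 amino acid frequencies. The letter grouping and the correlation function (fraction of positions whose reduced letters match at distance k) are hypothetical. The abstract gives neither the three physicochemical properties nor how the vector reaches 21+λ components, so this sketch yields 20+λ values and should not be read as the published method.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical six-group reduction (not the paper's grouping).
GROUPS = {"h": "AVLIMC", "r": "FWYH", "p": "STNQ", "b": "KR", "c": "DE", "s": "GP"}
REDUCE = {aa: letter for letter, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Map each residue to its group letter."""
    return "".join(REDUCE[aa] for aa in seq)


def pseaac_like(seq: str, lam: int = 3) -> List[float]:
    """20 amino acid frequencies followed by `lam` sequence-order factors."""
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    reduced = reduce_sequence(seq)
    theta = []
    for k in range(1, lam + 1):
        pairs = len(reduced) - k
        # Fraction of residue pairs k apart that fall in the same group.
        theta.append(sum(reduced[i] == reduced[i + k] for i in range(pairs)) / pairs)
    return freqs + theta


print(pseaac_like("MKRLLDEAVGHT", lam=3)[-3:])
```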

    Dual-functioning transcription factors in the developmental gene network of Drosophila melanogaster

    Quantitative models of transcriptional regulation have shown great promise for advancing our understanding of the biological mechanisms underlying gene regulation. However, all models to date assume that a transcription factor (TF) has either an activating or a repressing function toward all the genes it regulates. In this paper we demonstrate, using the developmental gene network of D. melanogaster as an example, that the data fit can be improved by up to 40% if the model allows certain TFs to have a dual function, that is, to act as an activator for some genes and as a repressor for others. We demonstrate that the improvement is not due to additional flexibility in the model but is rather derived from the data itself. We also found no evidence for the involvement of other known site-specific TFs in regulating this network. Finally, we propose SUMOylation as a candidate biological mechanism allowing TFs to switch their role when a small ubiquitin-like modifier (SUMO) is covalently attached to the TF. We strengthen this hypothesis by demonstrating that the TFs predicted to have a dual function also contain the known SUMO consensus motif, while TFs predicted to have only one role lack this motif. We argue that a SUMOylation-dependent mechanism allowing TFs to have a dual function represents a promising area for further research and might be another step toward uncovering the biological mechanisms underlying transcriptional regulation.
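
The SUMO consensus motif is commonly written ψKxE, where ψ is a large hydrophobic residue and x is any residue. A minimal regular-expression scan could look like the following. It assumes ψ ∈ {I, L, V, M, F}, since definitions of ψ vary between sources, and uses a lookahead so that overlapping matches are reported.

```python
import re
from typing import List

# psi-K-x-E, with psi assumed to be one of I, L, V, M, F.
SUMO_MOTIF = re.compile(r"(?=[ILVMF]K.E)")


def find_sumo_motifs(seq: str) -> List[int]:
    """Return 1-based positions of the lysine in each psi-K-x-E match."""
    # m.start() is the 0-based index of psi; the lysine follows it.
    return [m.start() + 2 for m in SUMO_MOTIF.finditer(seq)]


print(find_sumo_motifs("MVKAEPLKTE"))
```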


    Since large amounts of biological data are generated by various high-throughput technologies, efficient computational methods are important for understanding the biological meaning behind these complex data. Machine learning is particularly appealing for biological knowledge discovery. Tissue-specific gene expression and protein sumoylation play essential roles in the cell and are implicated in many human diseases. Protein destabilization is a common mechanism by which mutations cause human diseases. In this study, machine learning approaches were developed for predicting human tissue-specific genes, protein sumoylation sites, and protein stability changes upon single amino acid substitutions. Relevant biological features were selected for input vector encoding, and machine learning algorithms, including Random Forests and Support Vector Machines, were used for classifier construction. The results suggest that these approaches give more accurate predictions than previous studies and can provide valuable information for further experimental studies. Moreover, the seeSUMO and MuStab web servers were developed to make the classifiers accessible to the biological research community. Structure-based methods can be used to predict the effects of amino acid substitutions on protein function and stability. Nonsynonymous single nucleotide polymorphisms (nsSNPs) located at protein binding interfaces have dramatic effects on protein-protein interactions. To model these effects, the nsSNPs at the interfaces of 264 protein-protein complexes were mapped onto protein structures using homology-based methods. The results suggest that disease-causing nsSNPs tend to destabilize the electrostatic component of the binding energy and that nsSNPs at conserved positions have significant effects on binding energy changes. A structure-based approach was developed to quantitatively assess the effects of amino acid substitutions on protein stability and protein-protein interactions. It was shown that structure-based analysis could help elucidate the mechanisms by which mutations cause human genetic disorders. These new bioinformatic methods can be used to analyze genes and proteins of interest in human genetic research and to improve our understanding of the molecular mechanisms underlying human diseases.
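
Binding-energy changes of the kind discussed above are usually expressed as ΔΔG, the difference in binding free energy between the mutant and wild-type complexes, where each binding energy is G(complex) - G(partner A) - G(partner B). The sketch below applies this per energy term so that the electrostatic component can be inspected separately. The term names and numbers are hypothetical and do not come from the study, and the sign convention (negative binding energy means favorable binding) is an assumption.

```python
from typing import Dict, Tuple

Energy = Dict[str, float]  # energy term name -> value (kcal/mol assumed)


def binding_energy(complex_e: Energy, part_a: Energy, part_b: Energy) -> Energy:
    """Per-term binding energy: G(complex) - G(A) - G(B)."""
    return {t: complex_e[t] - part_a[t] - part_b[t] for t in complex_e}


def ddg_binding(wild_type: Tuple[Energy, Energy, Energy],
                mutant: Tuple[Energy, Energy, Energy]) -> Energy:
    """Per-term change in binding energy (mutant minus wild type). A positive
    value means the substitution makes that component less favorable."""
    dg_wt = binding_energy(*wild_type)
    dg_mut = binding_energy(*mutant)
    return {t: dg_mut[t] - dg_wt[t] for t in dg_wt}


# Hypothetical energies for one interface substitution.
a = {"electrostatic": -3.0, "vdw": -8.0}
b = {"electrostatic": -2.0, "vdw": -7.0}
wt = ({"electrostatic": -10.0, "vdw": -20.0}, a, b)
mut = ({"electrostatic": -8.0, "vdw": -20.0}, a, b)
print(ddg_binding(wt, mut))
```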

    HseSUMO: Sumoylation site prediction using half-sphere exposures of amino acid residues

    Background: Post-translational modifications are viewed as an important mechanism for controlling protein function and are believed to be involved in multiple important diseases. However, profiling them with laboratory-based techniques remains challenging. Therefore, developing accurate computational methods to predict post-translational modifications is particularly important for progress in this area of research. Results: This work explores the use of four half-sphere exposure-based features for the computational prediction of sumoylation sites. Unlike most previously proposed approaches, which focused on patterns of amino acid co-occurrence, we were able to demonstrate that protein structure-based features can be sufficiently informative to achieve good predictive performance. The evaluation of our method demonstrated high sensitivity (0.9), accuracy (0.89) and Matthews correlation coefficient (0.78–0.79). We compared these results with the recently released pSumo-CD method and demonstrated better performance of our method on the same evaluation dataset. Conclusions: The proposed predictor, HseSUMO, uses half-sphere exposures of amino acids to predict sumoylation sites. It has shown promising results on a benchmark dataset when compared with the state-of-the-art method.
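
Half-sphere exposure (HSE) splits a residue's spherical neighborhood into two halves using the plane through its Cα atom perpendicular to the Cα→Cβ vector, then counts neighboring Cα atoms in each half. A minimal sketch follows. It assumes Cβ (or pseudo-Cβ for glycine) coordinates are supplied and uses a 13 Å radius. How HseSUMO turns these counts into its four features is not described in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def half_sphere_exposure(ca: Sequence[Vec], cb: Sequence[Vec],
                         radius: float = 13.0) -> List[Tuple[int, int]]:
    """(HSE-up, HSE-down) per residue: neighbouring CA atoms within `radius`,
    split by the plane through CA_i perpendicular to the CA_i -> CB_i vector.
    Neighbours lying exactly on the plane are counted as 'down'."""
    result = []
    for i, (ca_i, cb_i) in enumerate(zip(ca, cb)):
        side = [b - a for a, b in zip(ca_i, cb_i)]          # CA -> CB direction
        up = down = 0
        for j, ca_j in enumerate(ca):
            if j == i or math.dist(ca_i, ca_j) > radius:     # math.dist: Python 3.8+
                continue
            offset = [b - a for a, b in zip(ca_i, ca_j)]     # CA_i -> CA_j
            if sum(s * o for s, o in zip(side, offset)) > 0:
                up += 1
            else:
                down += 1
        result.append((up, down))
    return result


ca = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (0.0, 0.0, 20.0)]
cb = [(0.0, 0.0, 1.0), (0.0, 0.0, 6.0), (0.0, 0.0, -4.0), (0.0, 0.0, 21.0)]
print(half_sphere_exposure(ca, cb))
```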