349 research outputs found

    PREDICTIVE BIOINFORMATIC METHODS FOR ANALYZING GENES AND PROTEINS

    Since large amounts of biological data are generated using various high-throughput technologies, efficient computational methods are important for understanding the biological meanings behind the complex data. Machine learning is particularly appealing for biological knowledge discovery. Tissue-specific gene expression and protein sumoylation play essential roles in the cell and are implicated in many human diseases. Protein destabilization is a common mechanism by which mutations cause human diseases. In this study, machine learning approaches were developed for predicting human tissue-specific genes, protein sumoylation sites and protein stability changes upon single amino acid substitutions. Relevant biological features were selected for input vector encoding, and machine learning algorithms, including Random Forests and Support Vector Machines, were used for classifier construction. The results suggest that the approaches give more accurate predictions than previous studies and can provide valuable information for further experimental studies. Moreover, the seeSUMO and MuStab web servers were developed to make the classifiers accessible to the biological research community. Structure-based methods can be used to predict the effects of amino acid substitutions on protein function and stability. The nonsynonymous Single Nucleotide Polymorphisms (nsSNPs) located at the protein binding interface have dramatic effects on protein-protein interactions. To model the effects, the nsSNPs at the interfaces of 264 protein-protein complexes were mapped onto the protein structures using homology-based methods. The results suggest that disease-causing nsSNPs tend to destabilize the electrostatic component of the binding energy and that nsSNPs at conserved positions have significant effects on binding energy changes. The structure-based approach was developed to quantitatively assess the effects of amino acid substitutions on protein stability and protein-protein interaction. It was shown that the structure-based analysis could help elucidate the mechanisms by which mutations cause human genetic disorders. These new bioinformatic methods can be used to analyze genes and proteins of interest for human genetic research and improve our understanding of the molecular mechanisms underlying human diseases.

    Predicting sumoylation sites using support vector machines based on various sequence features, conformational flexibility and disorder

    Background: Sumoylation, which is a reversible and dynamic post-translational modification, is one of the vital processes in a cell. Before a protein matures to perform its function, sumoylation may alter its localization, interactions, and possibly structural conformation. Aberrations in protein sumoylation have been linked with a variety of disorders and developmental anomalies. Experimental approaches to identifying sumoylation sites may not be effective because of the dynamic nature of sumoylation and the laborious, costly experiments involved. Therefore, computational approaches may guide experimental identification of sumoylation sites and provide insights for further understanding of the sumoylation mechanism. Results: In this paper, the effectiveness of using various sequence properties in predicting sumoylation sites was investigated with statistical analyses and a machine learning approach employing support vector machines. These sequence properties were derived from windows of size 7 and included position-specific amino acid composition, hydrophobicity, estimated sub-window volumes, predicted disorder, and conformational flexibility. 5-fold cross-validation results on experimentally identified sumoylation sites revealed that our method successfully predicts sumoylation sites with a Matthews correlation coefficient, sensitivity, specificity, and accuracy equal to 0.66, 73%, 98%, and 97%, respectively. Additionally, we have shown that our method compares favorably to the existing prediction methods and a basic regular-expression scanner. Conclusions: By using support vector machines, a new, robust method for sumoylation site prediction was introduced. In addition, the possible effects of predicted conformational flexibility and disorder on sumoylation site recognition were explored computationally, for the first time to our knowledge, as additional parameters that could aid in sumoylation site prediction.
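The abstract above describes features derived from windows of size 7. As a rough illustration only, the following Python sketch extracts a 7-residue window around each lysine in a protein sequence. It assumes the window is centred on the candidate lysine and that positions running past the sequence ends are padded with `X`; neither detail is stated in the abstract, and the function name `lysine_windows` is hypothetical.

```python
def lysine_windows(sequence, size=7, pad="X"):
    """Return (1-based position, window) pairs for every lysine (K).

    Assumptions (not taken from the paper): the window is centred on the
    lysine, and positions beyond the sequence ends are filled with `pad`.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the lysine sits at the centre")
    half = size // 2
    seq = sequence.upper()
    # Pad both ends so that every residue has a full-width window.
    padded = pad * half + seq + pad * half
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i of seq is residue i + half of padded, so the
            # window centred on it is padded[i : i + size].
            windows.append((i + 1, padded[i:i + size]))
    return windows


if __name__ == "__main__":
    for position, window in lysine_windows("MAKLVKE"):
        print(position, window)
```

Each window could then be encoded into numeric features (composition, hydrophobicity and so on) before being passed to a classifier; that encoding step is not sketched here.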

    Conjecture Regarding Posttranslational Modifications to the Arabidopsis Type I Proton-Pumping Pyrophosphatase (AVP1)

    Agbiotechnology uses genetic engineering to improve the output and value of crops. Altering the expression of the plant Type I Proton-pumping Pyrophosphatase (H⁺-PPase) has already proven to be a useful tool to enhance crop productivity. Despite the effective use of this gene in translational research, information regarding the intracellular localization and functional plasticity of the pump remains largely enigmatic. Using computer modeling, several putative phosphorylation, ubiquitination and sumoylation target sites were identified that may regulate the subcellular trafficking and activity of the Arabidopsis H⁺-PPase (AVP1, Arabidopsis Vacuolar Proton-pump 1). These putative regulatory sites will direct future research that specifically addresses the partitioning and transport characteristics of this pump. We posit that fine-tuning H⁺-PPase activity and cellular distribution will facilitate rational strategies for further genetic improvements in crop productivity. View the article as published at https://www.frontiersin.org/articles/10.3389/fpls.2017.01572/ful

    An investigation into the role and mechanism of action of small ubiquitin-like modifier interacting motifs in Arabidopsis thaliana proteins

    SUMO is a small protein that is ligated to other proteins to regulate their function. Ligation occurs at lysine residues within a SUMO site motif. A wide range of proteins are targets of SUMOylation, and in plants SUMO plays diverse roles in many important processes, including development, stress tolerance, hormone regulation, DNA repair and chromatin remodelling. SUMO affects protein function primarily by establishing interactions through SUMO interacting motifs (SIMs) in interacting protein partners. SUMO can also alter protein function by blocking access to protein domains and by causing conformational changes to the target. The ability to predict SIMs in plant proteins would be useful for research into the poorly understood mechanisms behind SUMO regulation. Large arrays of synthetic peptides were screened with SUMO to identify SIM peptides. These data were used to characterise the sequence composition of plant SIMs. The plant SIMs were compared and contrasted with human SIMs to highlight the functional differences between these two evolutionarily distinct species. The data were used to build a predictor for SIMs using random forest models. A new SUMO site predictor was also built using random forest models. The SIM predictor was used to identify putative SIM-containing proteins in the Arabidopsis thaliana genome, and the functional enrichment of these genes was analysed. The role of SUMO in the plant gibberellin (GA) pathway was also investigated. The DELLA protein RGA is a negative regulator of GA signalling, and this protein was shown to be SUMOylated. RGA stability is regulated by the GA receptor GID1, and it was demonstrated that GID1a contains a SIM. It was proposed that SUMOylated RGA interacted with GID1a through its SIM, which inhibited its function. The model was tested by investigating the binding of SUMO to GID1a and by generating mutants of GID1a that had reduced SUMO affinity. The results demonstrate that GA signalling can be enhanced by introducing a mutation into the GID1a SIM.

    Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants

    Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.

    SumSec: accurate prediction of Sumoylation sites using predicted secondary structure

    Post-translational modification (PTM) is defined as the modification of amino acids along the protein sequence after the translation process. These modifications significantly affect the functioning of proteins. Therefore, a comprehensive understanding of the underlying mechanisms of PTMs is critical in studying the biological roles of proteins. Among a wide range of PTMs, sumoylation is one of the most important modifications due to its known cellular functions, which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites via experimental methods is time-consuming and costly. This has led to a great demand for the development of fast computational methods able to accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites called SumSec. To do this, we employed the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence, which have never been used for this task. As a result, our proposed method is able to enhance the sumoylation site prediction task, outperforming previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than those reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec
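Several abstracts in this list report sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC). For readers unfamiliar with these measures, the following Python sketch computes them from confusion-matrix counts using the standard textbook definitions. The function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from any of the listed studies.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from raw counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with a zero denominator are returned as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)            # recall on the positive class
    specificity = ratio(tn, tn + fp)            # recall on the negative class
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = ratio(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(binary_metrics(tp=9, fp=1, tn=9, fn=1))
```

MCC is often preferred over accuracy for site prediction because non-modified lysines typically far outnumber modified ones, and accuracy alone can look high on such imbalanced data.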

    The SUMOylation pathway suppresses arbovirus replication in Aedes aegypti cells

    Mosquitoes are responsible for the transmission of many clinically important arboviruses that cause significant levels of annual mortality and socioeconomic health burden worldwide. Deciphering the mechanisms by which mosquitoes modulate arbovirus infection is crucial to understanding how viral-host interactions promote vector transmission and human disease. SUMOylation is a post-translational modification that leads to the covalent attachment of the Small Ubiquitin-like MOdifier (SUMO) protein to host factors, which in turn can modulate their stability, interaction networks, sub-cellular localisation, and biochemical function. While the SUMOylation pathway is known to play a key role in the regulation of host immune defences to virus infection in humans, the importance of this pathway during arbovirus infection in mosquito vectors, such as Aedes aegypti (Ae. aegypti), remains unknown. Here we characterise the sequence, structure, biochemical properties, and tissue-specific expression profiles of component proteins of the Ae. aegypti SUMOylation pathway. We demonstrate significant biochemical differences between Ae. aegypti and Homo sapiens SUMOylation pathways and identify cell-type specific patterns of SUMO expression in Ae. aegypti tissues known to support arbovirus replication. Importantly, depletion of core SUMOylation effector proteins (SUMO, Ubc9 and PIAS) in Ae. aegypti cells led to enhanced levels of replication of arboviruses from three different families: Zika (Flaviviridae), Semliki Forest (Togaviridae), and Bunyamwera (Bunyaviridae) viruses. Our findings identify an important role for mosquito SUMOylation in the cellular restriction of arboviruses that may directly influence vector competence and transmission of clinically important arboviruses.

    Cooperativity among Short Amyloid Stretches in Long Amyloidogenic Sequences

    Amyloid fibrillar aggregates of polypeptides are associated with many neurodegenerative diseases. Short peptide segments in protein sequences may trigger aggregation. Identifying these stretches and examining their behavior in longer protein segments is critical for understanding these diseases and obtaining potential therapies. In this study, we combined machine learning and structure-based energy evaluation to examine and predict amyloidogenic segments. Our feature selection method discovered that windows consisting of long amino acid segments of ∼30 residues, instead of the commonly used short hexapeptides, provided the highest accuracy. Weighted contributions of an amino acid at each position in a 27-residue window revealed three cooperative regions of short stretches, resembling the β-strand-turn-β-strand motif in Aβ peptide amyloid and the β-solenoid structure of the HET-s(218–289) prion (C). Using an in-house energy evaluation algorithm, the interaction energy between two short stretches in a long segment is computed and incorporated as an additional feature. The algorithm successfully predicted and classified amyloid segments with an overall accuracy of 75%. Our study revealed that genome-wide amyloid segments are dependent not only on short high-propensity stretches, but also on nearby residues.

    Prediction of Ubiquitination Sites by Using the Composition of k-Spaced Amino Acid Pairs

    As one of the most important reversible protein post-translational modifications, ubiquitination has been reported to be involved in many biological processes and closely implicated in various diseases. To fully decipher the molecular mechanisms of ubiquitination-related biological processes, an initial but crucial step is the recognition of ubiquitylated substrates and the corresponding ubiquitination sites. Here, a new bioinformatics tool named CKSAAP_UbSite was developed to predict ubiquitination sites from protein sequences. With the assistance of a Support Vector Machine (SVM), the highlight of CKSAAP_UbSite is that it employs the composition of k-spaced amino acid pairs surrounding a query site (i.e. any lysine in a query sequence) as input. When trained and tested on the dataset of yeast ubiquitination sites (Radivojac et al, Proteins, 2010, 78: 365–380), a 100-fold cross-validation on a 1:1 ratio of positive and negative samples revealed that the accuracy and MCC of CKSAAP_UbSite reached 73.40% and 0.4694, respectively. The proposed CKSAAP_UbSite has also been intensively benchmarked and exhibits better performance than some existing predictors, suggesting that it can serve as a useful tool for the community. Currently, CKSAAP_UbSite is freely accessible at http://protein.cau.edu.cn/cksaap_ubsite/. Moreover, we also found that the sequence patterns around ubiquitination sites are not conserved across different species. To ensure a reasonable prediction performance, the application of the current CKSAAP_UbSite should be limited to the proteome of yeast.
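The "composition of k-spaced amino acid pairs" mentioned above can be illustrated with a short Python sketch. This is a minimal version written under stated assumptions, not the CKSAAP_UbSite implementation: it counts only the 20 standard amino acids, treats a pair as two residues separated by exactly k other residues, uses k from 0 to a hypothetical `k_max` of 2, and normalises each count by the number of pair positions in the fragment. The abstract does not specify these choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumption)


def cksaap(fragment, k_max=2):
    """Composition of k-spaced amino acid pairs for one sequence fragment.

    Returns a dict mapping (k, pair) to the fraction of k-spaced positions
    in the fragment occupied by that pair, for k = 0 .. k_max.
    """
    fragment = fragment.upper()
    features = {}
    for k in range(k_max + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        # Number of (i, i + k + 1) index pairs that fit in the fragment.
        n_pairs = len(fragment) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # skip pairs with non-standard symbols
                counts[pair] += 1
        for pair, count in counts.items():
            features[(k, pair)] = count / n_pairs if n_pairs > 0 else 0.0
    return features


if __name__ == "__main__":
    feats = cksaap("AKAK", k_max=1)
    print(feats[(0, "AK")], feats[(1, "AA")], len(feats))
```

In a setting like the one the abstract describes, the fragment would be the residues surrounding a candidate lysine, and the resulting fixed-length vector (400 values per k under these assumptions) would be the input to an SVM.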