55 research outputs found

    Success: evolutionary and structural properties of amino acids prove effective for succinylation site prediction

    Post-translational modification is an important biological mechanism with a critical impact on the diversification of the proteome. Although a long list of such modifications has been studied, succinylation of lysine residues has recently attracted the interest of the scientific community. The experimental detection of succinylation sites is expensive and consumes considerable time and resources. Therefore, computational predictors of this covalent modification have emerged as an alternative for tackling lysine succinylation. In this paper, we propose a novel computational predictor called ‘Success’, which efficiently uses the structural and evolutionary information of amino acids for predicting succinylation sites. To do this, each lysine was described as a vector that combined the above information of surrounding amino acids. We then designed a support vector machine with a radial basis function kernel for discriminating between succinylated and non-succinylated residues. We finally compared the Success predictor with three state-of-the-art predictors in the literature. As a result, our proposed predictor showed a significant improvement over the compared predictors in statistical metrics, such as sensitivity (0.866), accuracy (0.838) and Matthews correlation coefficient (0.677) on a benchmark dataset. The proposed predictor effectively uses the structural and evolutionary information of the amino acids surrounding a lysine. The bigram feature extraction approach, while retaining the same number of features, facilitates a better description of lysines. A support vector machine with a radial basis function kernel was used to discriminate between modified and unmodified lysines. The aforementioned aspects make the Success predictor outperform three state-of-the-art predictors in succinylation detection.
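The abstract does not spell out how the bigram features are computed. As a minimal sketch, assuming the commonly used "profile bigram" formulation (an assumption, not confirmed by this abstract), each feature sums the products of column values at consecutive positions in the window around a lysine. The function name `profile_bigram` and the toy window below are hypothetical.

```python
from typing import List, Sequence


def profile_bigram(profile: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a (window_length x n_columns) profile into n_columns**2 features.

    Feature (i, j) is the sum, over consecutive window positions k and k + 1,
    of profile[k][i] * profile[k + 1][j]. This is an assumed formulation.
    """
    if len(profile) < 2:
        raise ValueError("need at least two window positions")
    n_cols = len(profile[0])
    features = [0.0] * (n_cols * n_cols)
    # Pair every window position with the position that follows it.
    for current_row, next_row in zip(profile, profile[1:]):
        for i, a in enumerate(current_row):
            for j, b in enumerate(next_row):
                features[i * n_cols + j] += a * b
    return features


# Hypothetical toy window: 3 positions, 2 profile columns.
# A real evolutionary profile would have 20 columns, giving 400 features.
toy_window = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
bigram = profile_bigram(toy_window)
print(bigram)
```

In this formulation the output length depends only on the number of profile columns, not on the window length, so windows of different sizes map to vectors of the same size.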

    Succinylation and malonylation of proteins in schizophrenia

    Advisor: Daniel Martins de Souza. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Biologia. Abstract: Schizophrenia is a multifactorial mental disorder that affects nearly 1% of the population worldwide. Patients are negatively affected in various ways, and there is no known cure for this disease. Pathways associated with energy metabolism are dysregulated, and metabolic disruption is also one of the side effects of antipsychotics, the principal way to manage the symptoms of schizophrenia. In 2011, two post-translational protein modifications, the succinylation and malonylation of lysine residues, were discovered to be widely present in likely all domains of life, and they have furthermore been observed on many proteins associated with glycolysis and metabolism. The precursors to these modifications, understood to be succinyl-CoA and malonyl-CoA, are both part of central metabolic processes, and the prevalence of the modifications in cells can vary with metabolism-associated stimuli such as hypoxia, a potential environmental trigger for developing schizophrenia. In this work, shotgun mass spectrometry-based quantitative proteomics was used to determine what differences in succinyllysine and malonyllysine profiles exist under various conditions. Postmortem brain tissue of schizophrenia patients was compared with tissue from mentally sound controls. Additionally, human oligodendrocyte precursor cell cultures (MO3.13 lineage) were treated with MK-801 and/or one of 3 antipsychotics and analyzed. The differences uncovered herein can potentially provide insight into the etiology, pathophysiology, symptoms, and treatment of schizophrenia. Master's degree. Biochemistry. Master in Functional and Molecular Biology. 2016/07948-8. FAPES

    PTMcode v2: a resource for functional associations of post-translational modifications within and between proteins

    The post-translational regulation of proteins is mainly driven by two molecular events: their modification by several types of moieties and their interaction with other proteins. These two processes are interdependent and together are responsible for the function of the protein in a particular cell state. Several databases focus on the prediction and compilation of protein-protein interactions (PPIs), and no fewer on the collection and analysis of protein post-translational modifications (PTMs); however, there are no resources that concentrate on describing the regulatory role of PTMs in PPIs. We developed several methods based on residue co-evolution and proximity to predict the functional associations of pairs of PTMs, which we apply to modifications in the same protein and between two interacting proteins. In order to make data available for understudied organisms, PTMcode v2 (http://ptmcode.embl.de) includes a new strategy to propagate PTMs from validated modified sites through orthologous proteins. The second release of PTMcode covers 19 eukaryotic species, from which we collected more than 300 000 experimentally verified PTMs (>1 300 000 propagated) of 69 types, extracting the post-translational regulation of >100 000 proteins and >100 000 interactions. In total, we report 8 million associations of PTMs regulating single proteins and over 9.4 million interplays tuning PPIs.

    SumSec: accurate prediction of Sumoylation sites using predicted secondary structure

    Post-translational modification (PTM) is defined as the modification of amino acids along the protein sequence after the translation process. These modifications significantly affect the functioning of proteins. Therefore, a comprehensive understanding of the underlying mechanism of PTMs is critical in studying the biological roles of proteins. Among a wide range of PTMs, sumoylation is one of the most important modifications due to its known cellular functions, which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites via experimental methods is time-consuming and costly. This has led to a great demand for fast computational methods able to accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites called SumSec. To do this, we employed the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence, which have never been used for this task. As a result, our proposed method is able to enhance the sumoylation site prediction task, outperforming previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than that reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec
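For readers comparing the figures quoted in these abstracts, the following minimal sketch shows how sensitivity, accuracy and the Matthews correlation coefficient (MCC) follow from the four confusion-matrix counts. The function name `binary_metrics` and the counts in the example are hypothetical and are not taken from any of the papers listed here.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute sensitivity, accuracy and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    # Sensitivity: fraction of true modification sites that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all sites classified correctly.
    accuracy = (tp + tn) / total
    # MCC ranges from -1 to 1 and stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

Returning 0.0 for MCC when a marginal count is zero is a common convention chosen here for convenience; other tools may report it as undefined.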

    Computational identification of microbial phosphorylation sites by the enhanced characteristics of sequence information

    Protein phosphorylation on serine (S) and threonine (T) has emerged as a key device in the control of many biological processes. Recently, phosphorylation in microbial organisms has attracted much attention for its critical roles in various cellular processes such as cell growth and cell division. Here, a novel machine learning predictor, MPSite (Microbial Phosphorylation Site predictor), was developed to identify microbial phosphorylation sites using the enhanced characteristics of sequence features. The final feature vectors were optimized via a Wilcoxon rank sum test. A random forest classifier was then trained using the optimum features to build the predictor. Benchmarking using 5-fold cross-validation and independent dataset tests showed that MPSite achieves robust performance on S- and T-phosphorylation site prediction. It also outperformed other existing methods on the comprehensive independent datasets. We anticipate that MPSite will be a powerful tool for proteome-wide prediction of microbial phosphorylation sites and will facilitate hypothesis-driven functional interrogation of phosphorylated proteins. A web application with the curated datasets is freely available at http://kurata14.bio.kyutech.ac.jp/MPSite/
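The abstract does not say how the Wilcoxon rank sum test was applied. One plausible reading, sketched below purely as an assumption, is a per-feature filter that keeps features whose values differ between positive and negative sites. The helper names, the 0.05 threshold and the toy data are all hypothetical; the p-value uses the normal approximation without a tie correction.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def rank_sum_p_value(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Two-sided rank sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(pos), len(neg)
    ranks = average_ranks(list(pos) + list(neg))
    w = sum(ranks[:n1])  # rank sum of the positive group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def select_features(pos_rows, neg_rows, alpha: float = 0.05) -> List[int]:
    """Keep indices of features whose distributions differ between classes."""
    keep = []
    for j in range(len(pos_rows[0])):
        p = rank_sum_p_value([r[j] for r in pos_rows], [r[j] for r in neg_rows])
        if p < alpha:
            keep.append(j)
    return keep


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
pos_rows = [[5, 1], [6, 4], [7, 5], [8, 8]]
neg_rows = [[1, 2], [2, 3], [3, 6], [4, 7]]
selected = select_features(pos_rows, neg_rows)
print(selected)
```

The retained columns would then be passed to the classifier; with real data, a library routine with exact p-values and tie handling would be preferable to this approximation.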

    Beyond repression of Nrf2: an update on Keap1


    SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins.

    Funders: Mahidol University; College of Arts, Media and Technology, Chiang Mai University; Chiang Mai University; Information Technology Service Center (ITSC) of Chiang Mai University. Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored 13 different feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were combined into a new feature vector. Finally, we utilized a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source code and datasets for this work are available for download from the GitHub repository (https://github.com/saeed344/SCORPION).

    iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization

    Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/. Zhen Chen, Pei Zhao, Chen Li, Fuyi Li, Dongxu Xiang, Yong-Zi Chen, Tatsuya Akutsu, Roger J. Daly, Geoffrey I. Webb, Quanzhi Zhao, Lukasz Kurgan, and Jiangning Son

    PreAcrs: a machine learning framework for identifying anti-CRISPR proteins

    Published online: 25 October 2022. Background: Anti-CRISPR proteins are potent modulators that inhibit the CRISPR-Cas immunity system and have huge potential in gene editing and gene therapy as a genome-editing tool. Extensive studies have shown that anti-CRISPR proteins are essential for modifying endogenous genes, promoting the RNA-guided binding and cleavage of DNA or RNA substrates. In recent years, identifying and characterizing anti-CRISPR proteins has become a hot and significant research topic in bioinformatics. However, as most anti-CRISPR proteins share little similarity with those currently known, traditional screening methods are time-consuming and inefficient. Machine learning methods could fill this gap with powerful predictive capability and provide a new perspective for anti-CRISPR protein identification. Results: Here, we present a novel machine learning ensemble predictor, called PreAcrs, to identify anti-CRISPR proteins directly from protein sequences. Three features and eight different machine learning algorithms were used to train PreAcrs. PreAcrs outperformed other existing methods and significantly improved the prediction accuracy for identifying anti-CRISPR proteins. Conclusions: In summary, the PreAcrs predictor achieved a competitive performance for predicting new anti-CRISPR proteins in terms of accuracy and robustness. We anticipate PreAcrs will be a valuable tool for researchers to speed up the research process. The source code is available at: https://github.com/Lyn-666/anti_CRISPR.git. Lin Zhu, Xiaoyu Wang, Fuyi Li and Jiangning Son