1,154 research outputs found

    Graph algorithms for predicting subcellular localization at the pathway level

    Protein subcellular localization is an important factor in normal cellular processes and disease. While many protein localization resources treat it as static, protein localization is dynamic and heavily influenced by biological context. Biological pathways are graphs that represent a specific biological context and can be inferred from large-scale data. We develop graph algorithms that predict the localization of every interaction in a biological pathway, framed as an edge-labeling task. We compare a variety of models, including graph neural networks, probabilistic graphical models, and discriminative classifiers, for predicting localization annotations from curated pathway databases. We also perform a case study in which we construct biological pathways for human fibroblasts undergoing viral infection and predict their localizations. Pathway localization prediction is a promising approach for integrating publicly available localization data into the analysis of large-scale biological data. Comment: 35 pages, 14 figures
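    The edge-labeling framing can be illustrated with a minimal sketch. It is not one of the models compared in the paper; it is a hypothetical baseline that assumes each protein (node) carries a prior localization score vector, for example derived from public localization data, smooths those priors with one round of neighbor averaging, and labels each interaction (edge) with the compartment that scores highest across its two endpoints. The compartment list, `smooth_priors`, and `label_edges` are illustrative names, not from the paper.

```python
from collections import defaultdict

# Hypothetical label set; curated pathway databases use richer vocabularies.
COMPARTMENTS = ["cytosol", "nucleus", "membrane"]
ZERO = [0.0] * len(COMPARTMENTS)


def smooth_priors(edges, node_priors, alpha=0.5):
    """One round of neighbor averaging: mix each node's prior localization
    vector with the mean prior of its neighbors (weight alpha)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    smoothed = {}
    for node, nbrs in neighbors.items():
        own = node_priors.get(node, ZERO)
        nbr_vectors = [node_priors.get(n, ZERO) for n in nbrs]
        mean = [sum(col) / len(nbr_vectors) for col in zip(*nbr_vectors)]
        smoothed[node] = [(1 - alpha) * o + alpha * m for o, m in zip(own, mean)]
    return smoothed


def label_edges(edges, node_scores):
    """Label each edge (u, v) with the compartment whose summed endpoint
    score is highest; unannotated nodes contribute zeros."""
    labels = {}
    for u, v in edges:
        combined = [a + b for a, b in zip(node_scores.get(u, ZERO),
                                          node_scores.get(v, ZERO))]
        best = max(range(len(COMPARTMENTS)), key=lambda k: combined[k])
        labels[(u, v)] = COMPARTMENTS[best]
    return labels


if __name__ == "__main__":
    edges = [("A", "B"), ("B", "C")]
    priors = {"A": [0.0, 1.0, 0.0], "B": [0.0, 0.8, 0.2], "C": [1.0, 0.0, 0.0]}
    print(label_edges(edges, priors))
    print(label_edges(edges, smooth_priors(edges, priors)))
```

    In this toy example the B–C interaction is labeled cytosol from the raw priors but nucleus after smoothing, because B's neighborhood shifts its scores; the trained models in the paper would replace this hand-set rule.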

    Bioinformatics Approaches for Predicting Kinase–Substrate Relationships

    Protein phosphorylation, catalyzed by protein kinases, is the main posttranslational modification in eukaryotes, regulating essential aspects of cellular function. Mass spectrometry techniques have yielded extensive knowledge of the locations of phosphorylated residues at the proteomic scale. However, the protein kinases responsible for these modifications remain largely unknown. To fill this gap, many computational algorithms capable of predicting kinase–substrate relationships have been developed. The greatest difficulty for these approaches is modeling the complex factors that determine kinase–substrate specificity. The vast majority of predictors are based on the linear primary sequence pattern surrounding phosphorylation sites. However, in the intracellular environment, protein kinase specificity is influenced by contextual factors such as protein–protein interactions, substrate co-expression patterns, and subcellular localization. Only recently has the development of phosphorylation predictors begun to incorporate these variables, significantly improving the specificity of these methods. Accurate modeling of kinase–substrate relationships could be the greatest contribution of bioinformatics to understanding physiological cell signaling and its pathological impairment.
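    Most of the predictors described above score the linear sequence window around a phosphosite. A minimal sketch of that idea, assuming a simple log-odds position weight matrix (PWM) against a uniform background, follows; the window size, pseudocount, and function names are illustrative assumptions rather than any specific published predictor, and the sketch deliberately ignores the contextual factors (interactions, co-expression, localization) the review highlights.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_window(seq, site, flank=7, pad="X"):
    """Return the residues within `flank` positions of a 0-based site,
    padding with `pad` where the window runs past either terminus."""
    left = seq[max(0, site - flank):site]
    right = seq[site + 1:site + 1 + flank]
    return pad * (flank - len(left)) + left + seq[site] + right + pad * (flank - len(right))


def build_pwm(windows, pseudocount=1.0):
    """Build a log-odds position weight matrix from aligned substrate
    windows against a uniform background over the 20 amino acids."""
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(len(windows[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for window in windows:
            if window[pos] in counts:  # skip padding and nonstandard residues
                counts[window[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pwm


def score_window(pwm, window):
    """Sum the log-odds weights; residues absent from the PWM score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pwm, window))


if __name__ == "__main__":
    # Hypothetical aligned training windows (flank=2) centered on the phosphosite.
    known = ["RRSLP", "RKSVP", "RRSIP"]
    pwm = build_pwm(known)
    candidate = extract_window("MARRSLPEG", 4, flank=2)
    print(candidate, round(score_window(pwm, candidate), 2))
```

    A higher score means the window resembles the kinase's known substrates more closely; in practice the training windows would come from experimentally verified kinase–substrate pairs.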

    Success: evolutionary and structural properties of amino acids prove effective for succinylation site prediction

    Post-translational modification is an important biological mechanism with a critical impact on the diversification of the proteome. Although many such modifications have been studied, succinylation of lysine residues has only recently attracted the interest of the scientific community. Experimental detection of succinylation sites is expensive and consumes considerable time and resources. Therefore, computational predictors of this covalent modification have emerged as an alternative approach to identifying lysine succinylation sites. In this paper, we propose a novel computational predictor called ‘Success’, which efficiently uses the structural and evolutionary information of amino acids to predict succinylation sites. To do this, each lysine was described as a vector combining the structural and evolutionary information of its surrounding amino acids. We then designed a support vector machine with a radial basis function kernel to discriminate between succinylated and non-succinylated residues. Finally, we compared the Success predictor with three state-of-the-art predictors from the literature. Our proposed predictor showed a significant improvement over the compared predictors in statistical metrics such as sensitivity (0.866), accuracy (0.838), and Matthews correlation coefficient (0.677) on a benchmark dataset. In summary, the proposed predictor effectively uses the structural and evolutionary information of the amino acids surrounding a lysine. The bigram feature extraction approach, while retaining the same number of features, facilitates a better description of lysines, and a support vector machine with a radial basis function kernel discriminates between modified and unmodified lysines. Together, these aspects allow the Success predictor to outperform three state-of-the-art predictors in succinylation detection.
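    The abstract does not spell out the bigram computation. A minimal sketch, assuming the common formulation in which bigram features are accumulated from consecutive rows of a position-specific profile such as a PSSM (how Success builds and combines its evolutionary and structural profiles is not described here, so the inputs are hypothetical), shows one useful property: the number of features depends only on the number of profile columns (400 for a 20-column profile), not on the window length.

```python
def bigram_features(profile):
    """Compute bigram features from a position-specific profile.

    profile: one row per residue in the window around the lysine, each row
    holding one score per amino acid (20 for a PSSM). Feature (m, n) sums
    profile[i][m] * profile[i + 1][n] over consecutive positions, so the
    output length is (number of columns) ** 2 regardless of window length.
    """
    n_cols = len(profile[0])
    features = [0.0] * (n_cols * n_cols)
    for row, next_row in zip(profile, profile[1:]):
        for m, a in enumerate(row):
            for n, b in enumerate(next_row):
                features[m * n_cols + n] += a * b
    return features


if __name__ == "__main__":
    # Toy 3-position profile over a hypothetical 2-letter alphabet.
    toy = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.0]]
    print(bigram_features(toy))  # [4.5, 4.0, 8.0, 8.0]
```

    The resulting fixed-length vector is the kind of input an RBF-kernel SVM expects, which is why bigram encodings pair naturally with the classifier described in the abstract.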

    Identification of Bacterial Cell Wall Lyases via Pseudo Amino Acid Composition

    Histopathological image analysis : a review

    Over the past decade, dramatic increases in computational power and improvements in image analysis algorithms have enabled the development of powerful computer-assisted analytical approaches to radiological data. With the recent advent of whole slide digital scanners, tissue histopathology slides can now be digitized and stored as digital images. Consequently, digitized tissue histopathology has become amenable to computerized image analysis and machine learning techniques. Analogous to the role of computer-assisted diagnosis (CAD) algorithms in medical imaging, where they complement the opinion of a radiologist, CAD algorithms have begun to be developed for disease detection, diagnosis, and prognosis prediction to complement the opinion of the pathologist. In this paper, we review the recent state-of-the-art CAD technology for digitized histopathology. This paper also briefly describes the development and application of novel image analysis technology for a few specific histopathology-related problems being pursued in the United States and Europe.

    Nuclear export signals (NESs) in Arabidopsis thaliana : development and experimental validation of a prediction tool

    Rubiano Castellanos CC. Nuclear export signals (NESs) in Arabidopsis thaliana : development and experimental validation of a prediction tool. Bielefeld (Germany): Bielefeld University; 2010. It is well established that nucleo-cytoplasmic shuttling regulates not only the localization but also the activity of many proteins, such as transcription factors, cell cycle regulators, and tumor suppressor proteins. In plants, too, the nucleo-cytoplasmic partitioning of proteins is emerging as an important regulatory mechanism for many plant-specific processes. One requirement for a protein to shuttle between the nucleus and the cytoplasm is nuclear export activity. The main mechanism for exporting proteins from the nucleus involves the receptor Exportin 1 and the presence of a nuclear export signal (NES) in the cargo protein. Given the large amount of sequence data now available, a computational tool for predicting which proteins potentially contain an NES would facilitate the screening and experimental characterization of NES-containing proteins. However, the computational prediction of NESs is a challenging task. Currently, only one NES prediction tool exists, and unfortunately it is not accurate for predicting these signals in plant proteins. Accordingly, this study aimed mainly to develop a prediction method for identifying NESs in Arabidopsis proteins and to validate its usefulness experimentally. The study also examined how the protein context of the NES influences the nuclear export activity of specific Arabidopsis proteins. Three machine-learning algorithms (k-NN, SVM, and Random Forests) were trained with experimentally validated NES sequences from proteins of Arabidopsis and other organisms. 
Two kinds of features were included: the NES sequence, expressed as the score obtained from an HMM profile constructed with the NES sequences of Arabidopsis proteins, and the physicochemical properties of the amino acid residues, expressed as amino acid index values. The Random Forest classifier was selected from among the three classifiers after its performance was evaluated by different methods. It proved highly accurate (accuracy over 85 percent, classification error around 10 percent, MCC around 0.7, and area under the ROC curve around 0.90) and performed better than the other two trained classifiers. Using the Random Forest classifier, around 5,000 of all Arabidopsis protein sequences were predicted to contain NESs. A subset of these proteins was selected using Gene Ontology (GO) annotations, and from this subset, 13 proteins were experimentally tested for nuclear export activity. Of these 13 proteins, 11 showed positive interaction with the Arabidopsis receptor Exportin 1 (XPO1a) in yeast two-hybrid assays. The proteins showing nuclear export activity include 9 transcription factors and 2 DNA metabolism-related proteins. Furthermore, it was established that the amino acid residues located between the hydrophobic residues in the NES, as well as the protein structure of the regions around the NES, could modify the nuclear export activity of some proteins. In conclusion, this work presents a new prediction tool for NESs in Arabidopsis proteins based on a Random Forest classifier. The experimental validation of nuclear export activity in a selected group of proteins indicates the usefulness of the tool. From a biological point of view, the nuclear export activity observed in these proteins strongly suggests that nucleo-cytoplasmic partitioning could be involved in regulating their functions. 
For follow-up research, further characterization of the proteins showing positive nuclear export activity, as well as validation of additional predicted NES-containing proteins, is envisioned. In the near future, the developed tool will be made available as a web application to facilitate and promote its further use.
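    The performance figures reported for the NES classifier (and for Success) are standard confusion-matrix metrics. For reference, a small sketch of how accuracy, classification error, sensitivity, and the Matthews correlation coefficient (MCC) are computed from binary counts; the counts in the usage line are hypothetical, not results from the thesis, and ROC AUC is omitted because it requires ranked classifier scores rather than counts.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Return accuracy, classification error, sensitivity, and Matthews
    correlation coefficient (MCC) from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "error": 1 - accuracy,
            "sensitivity": sensitivity, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts, not results from the thesis.
    print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))
```

    MCC is often preferred over accuracy for signal prediction because it stays informative when positives (true NESs) are much rarer than negatives.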

    FFPred 3: feature-based function prediction for all Gene Ontology domains

    Predicting protein function has been a major goal of bioinformatics for several decades, and it has gained fresh momentum thanks to recent community-wide blind tests aimed at benchmarking available tools on a genomic scale. Sequence-based predictors, especially those performing homology-based transfers, remain the most popular, but increasing understanding of their limitations has stimulated the development of complementary approaches, which mostly exploit machine learning. Here we present FFPred 3, which is intended for assigning Gene Ontology terms to human protein chains when homology with characterized proteins provides little aid. Predictions are made by scanning the input sequences against an array of Support Vector Machines (SVMs), each examining the relationship between protein function and biophysical attributes describing secondary structure, transmembrane helices, intrinsically disordered regions, signal peptides and other motifs. This update features a larger SVM library that extends coverage to the cellular component sub-ontology for the first time, prompted by the establishment of a dedicated evaluation category within the Critical Assessment of Functional Annotation. The effectiveness of this approach is demonstrated through benchmarking experiments, and its usefulness is illustrated by analysing the potential functional consequences of alternative splicing in humans and their relationship to patterns of biological features.
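    The scanning step, in which one feature vector is evaluated against a library of per-term SVMs, can be sketched as follows. This is a minimal illustration assuming each SVM is already trained and uses an RBF kernel; the kernel choice, the `TermSVM` class, the two feature dimensions, the term labels, and all parameter values are hypothetical and are not FFPred's actual models.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class TermSVM:
    """A trained binary SVM for one GO term, stored as support vectors,
    their dual coefficients (alpha_i * y_i), and a bias."""

    def __init__(self, term, support_vectors, dual_coefs, bias, gamma):
        self.term = term
        self.support_vectors = support_vectors
        self.dual_coefs = dual_coefs
        self.bias = bias
        self.gamma = gamma

    def decision(self, x):
        """Kernel SVM decision value: bias + sum_i coef_i * K(sv_i, x)."""
        return self.bias + sum(
            c * rbf_kernel(sv, x, self.gamma)
            for sv, c in zip(self.support_vectors, self.dual_coefs)
        )


def predict_terms(features, library, threshold=0.0):
    """Scan one feature vector against every SVM in the library and return
    (term, score) pairs above the threshold, highest score first."""
    scored = [(svm.term, svm.decision(features)) for svm in library]
    hits = [pair for pair in scored if pair[1] > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-D features (e.g. disorder fraction, transmembrane
    # helix count) and hand-set SVM parameters.
    library = [
        TermSVM("membrane", [[0.1, 2.0]], [1.0], -0.5, gamma=1.0),
        TermSVM("nucleus", [[0.8, 0.0]], [1.0], -0.5, gamma=1.0),
    ]
    print(predict_terms([0.1, 2.0], library))
```

    Keeping one independent classifier per term makes it straightforward to grow the library, for example to add cellular component terms, without retraining the existing models.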