    Prediction of peptides binding to MHC class I alleles by partial periodic pattern mining

    MHC (Major Histocompatibility Complex) is a key player in the immune response of an organism. It is important to be able to predict which antigenic peptides will bind to a specific MHC allele and which will not, as this creates possibilities for controlling the immune response and for applications in immunotherapy. However, a problem encountered in computational binding prediction methods for MHC class I is the presence of bulges and loops in the peptides, which change their total length. Most machine learning methods in use today require the sequences to be of the same length to successfully mine the binding motifs. We propose the use of time-based data mining methods for motif mining so that motifs can be mined position-independently. In addition, information from both binding and non-binding peptides is used, in contrast to other methods, which rely only on binding peptides. The prediction results range from 70% to 80% for the tested alleles.
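    The abstract does not spell out its partial periodic pattern miner, so the sketch below only illustrates the two ideas it names: patterns that are counted wherever they occur in a peptide (so peptide length does not matter), and weights learned from both binders and non-binders. It assumes a simplified pattern type, two residues separated by up to `max_gap` wildcards, and log-odds weighting with pseudocounts; all function names are hypothetical and this is not the authors' algorithm.

```python
from collections import Counter
from math import log

def gapped_pairs(peptide, max_gap=3):
    """Set of patterns 'A', '*' x g, 'B' (two residues separated by g wildcards,
    0 <= g <= max_gap) found anywhere in the peptide, independent of position."""
    patterns = set()
    for i, a in enumerate(peptide):
        for g in range(max_gap + 1):
            j = i + g + 1
            if j < len(peptide):
                patterns.add(a + "*" * g + peptide[j])
    return patterns

def support(peptides, max_gap=3):
    """Number of peptides in which each pattern occurs at least once."""
    counts = Counter()
    for p in peptides:
        counts.update(gapped_pairs(p, max_gap))
    return counts

def train(binders, non_binders, max_gap=3, pseudo=1.0):
    """Log-odds weight per pattern: frequency among binders vs. non-binders.
    Pseudocounts keep patterns seen in only one class from getting infinite weight."""
    pos, neg = support(binders, max_gap), support(non_binders, max_gap)
    weights = {}
    for pat in set(pos) | set(neg):
        f_pos = (pos[pat] + pseudo) / (len(binders) + 2 * pseudo)
        f_neg = (neg[pat] + pseudo) / (len(non_binders) + 2 * pseudo)
        weights[pat] = log(f_pos / f_neg)
    return weights

def score(peptide, weights, max_gap=3):
    """Sum of weights of the patterns present; any peptide length is accepted."""
    return sum(weights.get(p, 0.0) for p in gapped_pairs(peptide, max_gap))
```

    Because scoring only asks which patterns are present, a 9-mer and a 12-mer with an inserted bulge are handled by the same model; a positive score leans towards "binder" and a negative one towards "non-binder".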

    An entropy based heuristic model for predicting functional sub-type divisions of protein families

    Multiple sequence alignments of protein families are often used to locate residues that are far apart in the sequence but are considered influential in determining the functional specificity of proteins towards various substrates, ligands, DNA and other proteins. In this paper, we propose an entropy-score-based heuristic algorithm for predicting functional sub-family divisions of protein families, given only the multiple sequence alignment of the protein family as input, without any functional sub-type or key-site information for any protein sequence. Two of the test cases are reported in this paper. The first is the Nucleotidyl Cyclase protein family, consisting of guanylate and adenylate cyclases. The second is a dataset of proteins taken from six superfamilies in the Structure-Function Linkage Database (SFLD). Results from these test cases are reported in terms of sub-type divisions confirmed by phylogenetic relations from previous studies in the literature.
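    The exact entropy score of the heuristic is not given in the abstract. As a hedged illustration, the sketch below computes Shannon entropy per alignment column and scores a candidate division of the sequences by how much it lowers column entropy: columns conserved within each proposed sub-family but different between them get high scores. The score definition and function names are assumptions made for this example.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps count as a symbol."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def specificity_scores(alignment, groups):
    """Per column: overall entropy minus the size-weighted mean within-group entropy.
    `groups` is a list of lists of sequence indices forming a candidate division."""
    n_seqs = len(alignment)
    scores = []
    for j in range(len(alignment[0])):
        col = [seq[j] for seq in alignment]
        within = sum(len(g) / n_seqs * column_entropy([col[i] for i in g]) for g in groups)
        scores.append(column_entropy(col) - within)
    return scores
```

    A search heuristic could then compare candidate divisions by, for example, the sum of these column scores; fully conserved columns score 0 and do not favour any division.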

    Predicting sumoylation sites using support vector machines based on various sequence features, conformational flexibility and disorder

    Background: Sumoylation, a reversible and dynamic post-translational modification, is one of the vital processes in a cell. Before a protein matures to perform its function, sumoylation may alter its localization, interactions, and possibly its structural conformation. Aberrations in protein sumoylation have been linked with a variety of disorders and developmental anomalies. Experimental identification of sumoylation sites may not be effective due to the dynamic nature of sumoylation and the labor and cost of the experiments. Therefore, computational approaches may guide the experimental identification of sumoylation sites and provide insights for further understanding of the sumoylation mechanism. Results: In this paper, the effectiveness of using various sequence properties to predict sumoylation sites was investigated with statistical analyses and a machine learning approach employing support vector machines. These sequence properties, derived from windows of size 7, include position-specific amino acid composition, hydrophobicity, estimated sub-window volumes, predicted disorder, and conformational flexibility. Five-fold cross-validation on experimentally identified sumoylation sites showed that our method predicts sumoylation sites with a Matthews correlation coefficient, sensitivity, specificity, and accuracy of 0.66, 73%, 98%, and 97%, respectively. Additionally, we have shown that our method compares favorably to existing prediction methods and to a basic regular-expression scanner. Conclusions: Using support vector machines, a new, robust method for sumoylation site prediction was introduced. In addition, the possible effects of predicted conformational flexibility and disorder on sumoylation site recognition were explored computationally, to our knowledge for the first time, as additional parameters that could aid sumoylation site prediction.
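    A minimal sketch of two pieces mentioned above: extracting size-7 windows and encoding position-specific amino acid composition, and computing the reported evaluation metrics from a confusion matrix. It assumes windows are centred on lysine (the residue SUMO attaches to), that termini are padded with a hypothetical `X` symbol, and uses one-hot encoding as one plausible form of position-specific composition. The other features (hydrophobicity, volumes, disorder, flexibility) and the SVM itself, which could be trained with a library such as scikit-learn's `SVC`, are not shown.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def windows_around_lysines(sequence, size=7):
    """Yield (position, window) for each lysine (K), centred on it; termini padded with 'X'."""
    half = size // 2
    padded = "X" * half + sequence + "X" * half
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i, padded[i:i + size]

def one_hot(window):
    """20 binary features per window position; padding 'X' maps to all zeros."""
    features = []
    for residue in window:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features

def metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy, Matthews correlation coefficient)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc
```

    Reporting MCC alongside accuracy matters here: non-sites vastly outnumber true sites, so a high accuracy (97%) can coexist with a much lower MCC (0.66).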

    Prediction of peptides binding to MHC class I and II alleles by temporal motif mining

    Background: MHC (Major Histocompatibility Complex) is a key player in the immune response of most vertebrates. The computational prediction of whether a given antigenic peptide will bind to a specific MHC allele is important in the development of vaccines for emerging pathogens, the creation of possibilities for controlling the immune response, and applications in immunotherapy. One of the problems that make this computational prediction difficult is the detection of the binding core region in peptides, coupled with the presence of bulges and loops that cause variations in the total sequence length. Most machine learning methods require the sequences to be of the same length to successfully discover the binding motifs, ignoring the length variance in both the motif mining and prediction steps. In order to overcome this limitation, we propose the use of time-based motif mining methods that work position-independently. Results: The prediction method was tested on a benchmark set of 28 different alleles for MHC class I and 27 different alleles for MHC class II. The obtained results are comparable to state-of-the-art methods for both MHC classes, surpassing the published results for some alleles. The average prediction AUC values are 0.897 for class I and 0.858 for class II. Conclusions: Temporal motif mining using partial periodic patterns can capture information about the sequences well enough to predict the binding of the peptides and is comparable to state-of-the-art methods in the literature. Unlike neural networks or matrix-based predictors, our proposed method does not depend on peptide length and can work with both short and long fragments. This advantage allows better use of the available training data and the prediction of peptides of uncommon lengths.
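    The results are summarised as AUC values. As a reminder of what that number means, the sketch below computes AUC as the probability that a randomly chosen binder receives a higher score than a randomly chosen non-binder (the Mann-Whitney formulation), counting ties as one half. The function name is hypothetical; the benchmark's own evaluation code is not described in the abstract.

```python
def roc_auc(scores, labels):
    """AUC = P(score of random positive > score of random negative); ties count 1/2.
    `labels` uses 1 for binders and 0 for non-binders."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    Under this reading, an AUC of 0.897 means a binder outscores a non-binder in roughly 90% of such pairs. The pairwise loop is quadratic; a rank-based version is preferable for large benchmark sets.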

    Molecular characterization of cDNA encoding resistance gene-like sequences in Buchloe dactyloides

    Current knowledge of resistance (R) genes and their use for genetic improvement in buffalograss (Buchloe dactyloides [Nutt.] Engelm.) lags behind that of most crop plants. This study was conducted to clone and characterize cDNA encoding R gene-like (RGL) sequences in buffalograss. This report is the first to clone and characterize buffalograss RGLs. Degenerate primers designed from the conserved motifs of known R genes were used to amplify RGLs, and fragments of the expected size were isolated and cloned. Sequence analysis of the cDNA clones and analysis of putative translation products revealed that most encoded amino acid sequences shared conserved motifs similar to those found in the cloned plant disease resistance genes RPS2, MLA6, L6, RPM1, and Xa1. These results indicated diversity of the R gene candidate sequences in buffalograss. Analysis by 5' rapid amplification of cDNA ends (RACE), applied to investigate the regions upstream of the RGLs, indicated that regulatory sequences such as the TATA box were conserved among the identified RGLs. The RGLs cloned in this study will further enhance our knowledge of the organization, function, and evolution of the R gene family in buffalograss. With the primer sequences and marker sizes provided, these RGL markers are readily available for use in genomics-assisted selection in buffalograss.
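    Degenerate primers encode several bases at one position using IUPAC nucleotide codes, so a single primer is really a pool of sequences. The sketch below computes a primer's degeneracy and expands it into the concrete sequences it represents. The primer used in the usage line is a made-up example, not one of the primers from this study, and inosine or other modified bases are not handled.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for code in primer.upper():
        total *= len(IUPAC[code])
    return total

def expand(primer):
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in primer.upper()))]

# Hypothetical primer for illustration only.
print(degeneracy("GGNGTNGGNAARAC"))  # 128
```

    Degeneracy grows multiplicatively, which is why primers designed from conserved protein motifs are usually kept short or restricted to low-degeneracy codons.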

    DockPro: A VR-Based Tool for Protein-Protein Docking Problem

    Proteins are large molecules that are vital for all living organisms, and they are essential components of many industrial products. The process of binding one protein to another is called protein-protein docking. Many automated algorithms have been proposed to find docking configurations that might yield promising protein-protein complexes. However, these automated methods are prone to false positives and have high computational costs. Consequently, Virtual Reality has been used to take advantage of users' expertise in the problem, and the proposed applications can still be improved. Haptic devices have been used for molecular docking problems, but they are inappropriate for protein-protein docking due to their workspace limitations. Instead of haptic rendering of forces, we provide novel visual feedback for simulating the physicochemical forces between proteins. We propose an interactive 3D application, DockPro, which enables domain experts to find docking configurations of protein pairs by using magnetic trackers and gloves in front of a large display.
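    The abstract does not say which force model DockPro visualises. As a hedged sketch of the general idea of replacing haptic force rendering with visual feedback, the code below sums pairwise Coulomb energies between the atoms of two proteins and maps the result to a colour (blue for attraction, red for repulsion). The constant dielectric, the approximate conversion constant, the colour scale, and all names are assumptions for this example only.

```python
from math import dist

COULOMB_K = 332.06  # approx. kcal*Å/(mol*e^2), Coulomb constant in common force-field units

def coulomb_energy(atoms_a, atoms_b, dielectric=4.0):
    """Sum of pairwise Coulomb energies (kcal/mol) between two proteins.
    Each atom is (x, y, z, charge), coordinates in Å and charge in units of e."""
    energy = 0.0
    for xa, ya, za, qa in atoms_a:
        for xb, yb, zb, qb in atoms_b:
            r = dist((xa, ya, za), (xb, yb, zb))
            if r == 0.0:
                raise ValueError("overlapping atoms: distance is zero")
            energy += COULOMB_K * qa * qb / (dielectric * r)
    return energy

def energy_to_rgb(energy, scale=10.0):
    """Blue for attraction (negative), red for repulsion (positive), white near zero.
    Energies with magnitude >= scale saturate the colour."""
    t = max(-1.0, min(1.0, energy / scale))
    if t < 0:
        return (1.0 + t, 1.0 + t, 1.0)  # fade white -> blue
    return (1.0, 1.0 - t, 1.0 - t)      # fade white -> red
```

    In an interactive setting this energy would be recomputed as the user moves one protein, so the colour changes continuously without requiring a force-feedback device.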

    Optimization of morphological data in numerical taxonomy analysis using genetic algorithms feature selection method

    Studies in Numerical Taxonomy are carried out by measuring as many characters as possible. The workload on scientists and the labor needed to perform measurements increase proportionally with the number of variables (or characters) used in the study. However, part of the data may be irrelevant or even meaningless. In this study, we introduce an algorithm to obtain a subset of the data with the minimum number of characters that can still represent the original data. Morphological characters were optimized using the Genetic Algorithms Feature Selection method. The analyses were performed on an 18-character × 11-taxon data matrix with standardized continuous characters. The analyses yielded a minimum set of 2 characters, which means that the original tree based on the complete data can also be constructed from those two characters.
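    A minimal sketch of genetic-algorithm feature selection for this setting. Each chromosome is a bit mask over characters. Because rebuilding and comparing trees is beyond a short example, fitness here uses an assumed proxy: the Pearson correlation between pairwise taxon distances computed from the selected characters and from all characters, minus a small penalty per selected character. Truncation selection, one-point crossover, bit-flip mutation, and all names and parameter values are illustrative choices, not the study's configuration.

```python
import random
from itertools import combinations
from math import sqrt

def pairwise_distances(matrix, mask):
    """Euclidean distances between all taxon pairs using only the selected characters."""
    cols = [j for j, keep in enumerate(mask) if keep]
    return [sqrt(sum((a[j] - b[j]) ** 2 for j in cols)) for a, b in combinations(matrix, 2)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

def fitness(matrix, mask, reference, penalty=0.01):
    """Reward subsets whose distances track the full-data distances; penalise size."""
    if not any(mask):
        return -1.0
    return pearson(pairwise_distances(matrix, mask), reference) - penalty * sum(mask)

def ga_select(matrix, pop_size=30, generations=60, mutation=0.05, seed=0):
    """Return the best character mask found (list of 0/1, one entry per character)."""
    rng = random.Random(seed)
    n = len(matrix[0])
    reference = pairwise_distances(matrix, [1] * n)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(matrix, m, reference), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection; best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = [1 - g if rng.random() < mutation else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(matrix, m, reference))
```

    The size penalty is what pushes the search towards a minimal subset; a faithful reproduction would replace the distance correlation with a comparison of the reconstructed trees.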

    The identification of pathway markers in intracranial aneurysm using genome-wide association data from two different populations

    The identification of significant individual factors causing complex diseases is challenging in genome-wide association studies (GWAS), since each factor has only a modest effect on the disease development mechanism. In this study, we hypothesize that the biological pathways targeted by these individual factors show higher conservation within and across populations. To test this hypothesis, we searched for disease-related pathways in two intracranial aneurysm GWAS of European and Japanese case-control cohorts. Even though there were only a few significantly conserved SNPs within and between populations, seven of the top ten affected pathways were found to be significant in both populations. The probability of such an event occurring at random is 2.44 × 10⁻³⁶. We therefore claim that even though each individual has a unique combination of factors involved in the mechanism of disease development, the pathways that need to be altered by these factors are, for the most part, the same. These pathways can serve as disease markers. Individuals, for example, can be scanned for factors affecting the genes in marker pathways. Hence, individual factors of disease development can be determined, and this knowledge can be exploited for drug development and personalized therapeutic applications. Here, we discuss the potential uses of pathway markers in medicine and their translation to preventive and individualized health care.
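    The abstract does not state how the probability of a random overlap was computed. One standard way to ask "how likely is it that two independently drawn top-k lists share at least this many items?" is the hypergeometric upper tail, sketched below under that assumption. The total number of pathways tested is not given here, so this sketch does not reproduce the reported value; the numbers in the usage line are hypothetical.

```python
from math import comb

def overlap_pvalue(total, top_a, top_b, observed):
    """P(overlap >= observed) when a list of size top_b is drawn at random from
    `total` items, of which top_a are 'hits' (hypergeometric upper tail)."""
    return sum(
        comb(top_a, k) * comb(total - top_a, top_b - k)
        for k in range(observed, min(top_a, top_b) + 1)
    ) / comb(total, top_b)

# Hypothetical: 7 shared pathways between two top-10 lists drawn from 200 pathways.
p = overlap_pvalue(total=200, top_a=10, top_b=10, observed=7)
```

    Exact integer arithmetic in `math.comb` avoids overflow for large pathway counts; the final division then yields a floating-point probability.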