51 research outputs found

    Using structural motif descriptors for sequence-based binding site prediction

    Get PDF
    All authors are with the Biotechnological Center, TU Dresden, Tatzberg 47-51, 01307 Dresden, Germany and -- Wan Kyu Kim is with the Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USABackground: Many protein sequences are still poorly annotated. Functional characterization of a protein is often improved by the identification of its interaction partners. Here, we aim to predict protein-protein interactions (PPI) and protein-ligand interactions (PLI) on sequence level using 3D information. To this end, we use machine learning to compile sequential segments that constitute structural features of an interaction site into one profile Hidden Markov Model descriptor. The resulting collection of descriptors can be used to screen sequence databases in order to predict functional sites. -- Results: We generate descriptors for 740 classified types of protein-protein binding sites and for more than 3,000 protein-ligand binding sites. Cross validation reveals that two thirds of the PPI descriptors are sufficiently conserved and significant enough to be used for binding site recognition. We further validate 230 PPIs that were extracted from the literature, where we additionally identify the interface residues. Finally we test ligand-binding descriptors for the case of ATP. From sequences with Swiss-Prot annotation "ATP-binding", we achieve a recall of 25% with a precision of 89%, whereas Prosite's P-loop motif recognizes an equal amount of hits at the expense of a much higher number of false positives (precision: 57%). Our method yields 771 hits with a precision of 96% that were not previously picked up by any Prosite-pattern. -- Conclusion: The automatically generated descriptors are a useful complement to known Prosite/InterPro motifs. They serve to predict protein-protein as well as protein-ligand interactions along with their binding site residues for proteins where merely sequence information is available.Institute for Cellular and Molecular [email protected]

    Predicting and Validating Protein Interactions Using Network Structure

    Get PDF
    Protein interactions play a vital part in the function of a cell. As experimental techniques for detection and validation of protein interactions are time consuming, there is a need for computational methods for this task. Protein interactions appear to form a network with a relatively high degree of local clustering. In this paper we exploit this clustering by suggesting a score based on triplets of observed protein interactions. The score utilises both protein characteristics and network properties. Our score based on triplets is shown to complement existing techniques for predicting protein interactions, outperforming them on data sets which display a high degree of clustering. The predicted interactions score highly against test measures for accuracy. Compared to a similar score derived from pairwise interactions only, the triplet score displays higher sensitivity and specificity. By looking at specific examples, we show how an experimental set of interactions can be enriched and validated. As part of this work we also examine the effect of different prior databases upon the accuracy of prediction and find that the interactions from the same kingdom give better results than from across kingdoms, suggesting that there may be fundamental differences between the networks. These results all emphasize that network structure is important and helps in the accurate prediction of protein interactions. The protein interaction data set and the program used in our analysis, and a list of predictions and validations, are available at http://www.stats.ox.ac.uk/bioinfo/resources/PredictingInteractions

    Predicting the protein-protein interactions using primary structures with predicted protein surface

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Many biological functions involve various protein-protein interactions (PPIs). Elucidating such interactions is crucial for understanding general principles of cellular systems. Previous studies have shown the potential of predicting PPIs based on only sequence information. Compared to approaches that require other auxiliary information, these sequence-based approaches can be applied to a broader range of applications.</p> <p>Results</p> <p>This study presents a novel sequence-based method based on the assumption that protein-protein interactions are more related to amino acids at the surface than those at the core. The present method considers surface information and maintains the advantage of relying on only sequence data by including an accessible surface area (ASA) predictor recently proposed by the authors. This study also reports the experiments conducted to evaluate a) the performance of PPI prediction achieved by including the predicted surface and b) the quality of the predicted surface in comparison with the surface obtained from structures. The experimental results show that surface information helps to predict interacting protein pairs. Furthermore, the prediction performance achieved by using the surface estimated with the ASA predictor is close to that using the surface obtained from protein structures.</p> <p>Conclusion</p> <p>This work presents a sequence-based method that takes into account surface information for predicting PPIs. The proposed procedure of surface identification improves the prediction performance with an <it>F-measure </it>of 5.1%. The extracted surfaces are also valuable in other biomedical applications that require similar information.</p

    Identifying Cognate Binding Pairs among a Large Set of Paralogs: The Case of PE/PPE Proteins of Mycobacterium tuberculosis

    Get PDF
    We consider the problem of how to detect cognate pairs of proteins that bind when each belongs to a large family of paralogs. To illustrate the problem, we have undertaken a genomewide analysis of interactions of members of the PE and PPE protein families of Mycobacterium tuberculosis. Our computational method uses structural information, operon organization, and protein coevolution to infer the interaction of PE and PPE proteins. Some 289 PE/PPE complexes were predicted out of a possible 5,590 PE/PPE pairs genomewide. Thirty-five of these predicted complexes were also found to have correlated mRNA expression, providing additional evidence for these interactions. We show that our method is applicable to other protein families, by analyzing interactions of the Esx family of proteins. Our resulting set of predictions is a starting point for genomewide experimental interaction screens of the PE and PPE families, and our method may be generally useful for detecting interactions of proteins within families having many paralogs

    An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker's Yeast, Saccharomyces cerevisiae

    Get PDF
    Background: Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations. Methodology/Principal Findings: We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. Conclusions/Significance: YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.This work was supported by grants from the N.S.F. (IIS-0325116, EIA-0219061), N.I.H. (GM06779-01,GM076536-01), Welch (F-1515), and a Packard Fellowship (EMM). These agencies were not involved in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review, or approval of the manuscript.Cellular and Molecular Biolog

    Bayesian Inference for Genomic Data Integration Reduces Misclassification Rate in Predicting Protein-Protein Interactions

    Get PDF
    Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPIs networks. However, several critical difficulties exist in obtaining reliable predictions. Noticeably, false positive rates can be as high as >80%. Error correction from each generating source can be both time-consuming and inefficient due to the difficulty of covering the errors from multiple levels of data processing procedures within a single test. We propose a novel Bayesian integration method, deemed nonparametric Bayes ensemble learning (NBEL), to lower the misclassification rate (both false positives and negatives) through automatically up-weighting data sources that are most informative, while down-weighting less informative and biased sources. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set our NBEL approach predicts many more PPIs than naïve Bayes. This suggests that previous studies may have large numbers of not only false positives but also false negatives. The validation on two human PPIs datasets having high quality supports our observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability of predicting large numbers of PPIs both reliably and automatically may inspire people to use computational approaches to correct data errors in general, and may speed up PPIs prediction with high quality. Such a reliable prediction may provide a solid platform to other studies such as protein functions prediction and roles of PPIs in disease susceptibility

    SPPS: A Sequence-Based Method for Predicting Probability of Protein-Protein Interaction Partners

    Get PDF
    Background: The molecular network sustained by different types of interactions among proteins is widely manifested as the fundamental driving force of cellular operations. Many biological functions are determined by the crosstalk between proteins rather than by the characteristics of their individual components. Thus, the searches for protein partners in global networks are imperative when attempting to address the principles of biology. Results: We have developed a web-based tool ‘‘Sequence-based Protein Partners Search’ ’ (SPPS) to explore interacting partners of proteins, by searching over a large repertoire of proteins across many species. SPPS provides a database containing more than 60,000 protein sequences with annotations and a protein-partner search engine in two modes (Single Query and Multiple Query). Two interacting proteins of human FBXO6 protein have been found using the service in the study. In addition, users can refine potential protein partner hits by using annotations and possible interactive network in the SPPS web server. Conclusions: SPPS provides a new type of tool to facilitate the identification of direct or indirect protein partners which may guide scientists on the investigation of new signaling pathways. The SPPS server is available to the public a

    Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein-protein interactions underlie many important biological processes. Computational prediction methods can nicely complement experimental approaches for identifying protein-protein interactions. Recently, a unique category of sequence-based prediction methods has been put forward - unique in the sense that it does not require homologous protein sequences. This enables it to be universally applicable to all protein sequences unlike many of previous sequence-based prediction methods. If effective as claimed, these new sequence-based, universally applicable prediction methods would have far-reaching utilities in many areas of biology research.</p> <p>Results</p> <p>Upon close survey, I realized that many of these new methods were ill-tested. In addition, newer methods were often published without performance comparison with previous ones. Thus, it is not clear how good they are and whether there are significant performance differences among them. In this study, I have implemented and thoroughly tested 4 different methods on large-scale, non-redundant data sets. It reveals several important points. First, significant performance differences are noted among different methods. Second, data sets typically used for training prediction methods appear significantly biased, limiting the general applicability of prediction methods trained with them. Third, there is still ample room for further developments. In addition, my analysis illustrates the importance of complementary performance measures coupled with right-sized data sets for meaningful benchmark tests.</p> <p>Conclusions</p> <p>The current study reveals the potentials and limits of the new category of sequence-based protein-protein interaction prediction methods, which in turn provides a firm ground for future endeavours in this important area of contemporary bioinformatics.</p

    Toward a Comprehensive Approach to the Collection and Analysis of Pica Substances, with Emphasis on Geophagic Materials

    Get PDF
    Pica, the craving and subsequent consumption of non-food substances such as earth, charcoal, and raw starch, has been an enigma for more than 2000 years. Currently, there are little available data for testing major hypotheses about pica because of methodological limitations and lack of attention to the problem.In this paper we critically review procedures and guidelines for interviews and sample collection that are appropriate for a wide variety of pica substances. In addition, we outline methodologies for the physical, mineralogical, and chemical characterization of these substances, with particular focus on geophagic soils and clays. Many of these methods are standard procedures in anthropological, soil, or nutritional sciences, but have rarely or never been applied to the study of pica.Physical properties of geophagic materials including color, particle size distribution, consistency and dispersion/flocculation (coagulation) should be assessed by appropriate methods. Quantitative mineralogical analyses by X-ray diffraction should be made on bulk material as well as on separated clay fractions, and the various clay minerals should be characterized by a variety of supplementary tests. Concentrations of minerals should be determined using X-ray fluorescence for non-food substances and inductively coupled plasma-atomic emission spectroscopy for food-like substances. pH, salt content, cation exchange capacity, organic carbon content and labile forms of iron oxide should also be determined. Finally, analyses relating to biological interactions are recommended, including determination of the bioavailability of nutrients and other bioactive components from pica substances, as well as their detoxification capacities and parasitological profiles.This is the first review of appropriate methodologies for the study of human pica. The comprehensive and multi-disciplinary approach to the collection and analysis of pica substances detailed here is a necessary preliminary step to understanding the nutritional enigma of non-food consumption

    Search for gravitational waves from Scorpius X-1 in the second Advanced LIGO observing run with an improved hidden Markov model

    Get PDF
    We present results from a semicoherent search for continuous gravitational waves from the low-mass x-ray binary Scorpius X-1, using a hidden Markov model (HMM) to track spin wandering. This search improves on previous HMM-based searches of LIGO data by using an improved frequency domain matched filter, the J-statistic, and by analyzing data from Advanced LIGO's second observing run. In the frequency range searched, from 60 to 650 Hz, we find no evidence of gravitational radiation. At 194.6 Hz, the most sensitive search frequency, we report an upper limit on gravitational wave strain (at 95% confidence) of h095%=3.47×10-25 when marginalizing over source inclination angle. This is the most sensitive search for Scorpius X-1, to date, that is specifically designed to be robust in the presence of spin wandering. © 2019 American Physical Society
    corecore