    Bayesian Markov Random Field Analysis for Protein Function Prediction Based on Network Data

    Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions, but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that a protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high-quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but can also be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins, and we evaluate the predictions using the available literature.
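
The abstract above gives no formulas, so the following is only a minimal sketch of the general idea of Markov Random Field label inference on an interaction network. It assumes a simple autologistic conditional, logit P(x_i = 1) = alpha + beta * (n1 - n0), where n1 and n0 count neighbours with and without the function. The parameters `alpha` and `beta`, the function name, and the toy network are all hypothetical. Unlike the method described in the abstract, the parameters here are fixed rather than sampled jointly with the labels by adaptive MCMC.

```python
import math
import random


def gibbs_function_prediction(neighbors, labels, alpha, beta,
                              n_sweeps=2000, burn_in=500, seed=0):
    """Estimate P(protein has the function) for unannotated proteins.

    neighbors: dict mapping each protein to a list of interacting proteins.
    labels:    dict mapping annotated proteins to 0 or 1 (held fixed).
    alpha, beta: fixed MRF parameters (assumed form, see lead-in).
    """
    rng = random.Random(seed)
    unknown = sorted(p for p in neighbors if p not in labels)
    state = dict(labels)
    for p in unknown:
        state[p] = rng.randint(0, 1)  # random initial label

    counts = {p: 0 for p in unknown}
    for sweep in range(n_sweeps):
        for p in unknown:
            # Count neighbours currently labelled with / without the function.
            n1 = sum(state[q] for q in neighbors[p])
            n0 = len(neighbors[p]) - n1
            logit = alpha + beta * (n1 - n0)
            prob = 1.0 / (1.0 + math.exp(-logit))
            state[p] = 1 if rng.random() < prob else 0
        if sweep >= burn_in:
            for p in unknown:
                counts[p] += state[p]

    kept = n_sweeps - burn_in
    return {p: counts[p] / kept for p in unknown}


# Toy network: U1 sits among annotated proteins, U2 among proteins lacking the function.
toy_neighbors = {
    "P1": ["P2", "P3", "U1"], "P2": ["P1", "U1"], "P3": ["P1"],
    "N1": ["N2", "U2"], "N2": ["N1", "U2"],
    "U1": ["P1", "P2"], "U2": ["N1", "N2"],
}
toy_labels = {"P1": 1, "P2": 1, "P3": 1, "N1": 0, "N2": 0}
toy_probs = gibbs_function_prediction(toy_neighbors, toy_labels, alpha=0.0, beta=1.0)
print(toy_probs)
```

In a fully Bayesian treatment such as the one the abstract describes, `alpha` and `beta` would themselves receive priors and be updated inside the sampler instead of being passed in as constants.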

    Identification of Colorectal Cancer Related Genes with mRMR and Shortest Path in Protein-Protein Interaction Network

    One of the most important and challenging problems in biomedicine and genomics is how to identify disease genes. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) gene expression profiles, and (ii) shortest path analysis of functional protein association networks. The former has long been used to select differentially expressed genes as disease genes, while the latter has been widely used to study the mechanisms of diseases. With the existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), a weighted functional protein association network was constructed. By means of the mRMR (Maximum Relevance Minimum Redundancy) approach, six genes were identified that can distinguish colorectal tumors from normal adjacent colonic tissues based on their gene expression profiles. Meanwhile, using the shortest path approach, we found an additional 35 genes, of which some have been reported to be relevant to colorectal cancer and some are very likely to be relevant to it. Interestingly, the genes we identified from both the gene expression profiles and the functional protein association network include more cancer genes than the genes identified from the gene expression profiles alone. These genes also had greater functional similarity with the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. All of this indicates that the method presented in this paper is quite promising. The method may become a useful tool, or at least play a complementary role to existing methods, for identifying colorectal cancer genes. It has not escaped our notice that the method can be applied to identify the genes of other diseases as well.
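
The shortest path step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: edge weights are taken as 1000 minus an interaction confidence score on a 0 to 1000 scale (an assumed conversion), the gene names are placeholders, and `build_graph`, `shortest_path`, and `intermediate_genes` are hypothetical helper names. The idea is to collect genes that lie on shortest paths between already selected seed genes.

```python
import heapq
from itertools import combinations


def build_graph(edges):
    """Build an undirected weighted graph from (gene_a, gene_b, score) triples.

    Assumption: higher confidence means a shorter edge, weight = 1000 - score.
    """
    graph = {}
    for a, b, score in edges:
        weight = 1000 - score
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list of one shortest path, or []."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def intermediate_genes(graph, seeds):
    """Genes on shortest paths between every pair of seeds, excluding the seeds."""
    found = set()
    for s, t in combinations(sorted(seeds), 2):
        found.update(shortest_path(graph, s, t)[1:-1])
    return found - set(seeds)


# Toy network with placeholder gene names and made-up confidence scores.
toy_edges = [
    ("A", "X", 900), ("X", "B", 900), ("A", "B", 100),
    ("B", "Y", 800), ("Y", "C", 800), ("A", "C", 50),
]
toy_graph = build_graph(toy_edges)
print(sorted(intermediate_genes(toy_graph, {"A", "B", "C"})))
```

In this toy network the high-confidence routes through X and Y are shorter than the low-confidence direct edges, so X and Y are the candidates returned alongside the seed genes A, B, and C.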

    Metric Labeling and Semi-metric Embedding for Protein Annotation Prediction

    Computational techniques have been successful at predicting protein function from relational data (functional or physical interactions). These prediction techniques have been used to generate hypotheses and to direct experimental validation. With few exceptions, these predictive tasks are modeled as multi-label classification problems where the labels (functions) are treated independently or semi-independently. However, databases such as the Gene Ontology provide more information about the similarities between functions. It is a largely open question how much the use of relationships between functions can improve the quality of function prediction techniques. In this paper, we explore the use of the Metric Labeling combinatorial optimization problem to make use of heuristically computed distances between functions to make more accurate predictions of protein function in networks derived from both physical interactions and a combination of other data types. To do this, we give a new technique (based on convex optimization) for converting heuristic semimetric distances (from, e.g., Gene Ontology) into a metric that finds an embedding of the semimetric into a metric with minimum least-squares distortion (LSD). The Metric Labeling approach is shown to outperform 5 existing techniques for inferring function from networks. These results suggest Metric Labeling is useful for protein function prediction, and that our LSD minimization approach can help solve the problem of converting heuristic distances to a metric.
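
To make the semimetric-to-metric problem concrete, here is a small sketch of a different and much simpler technique than the convex least-squares embedding described above: the shortest-path (metric) closure computed with the Floyd-Warshall algorithm. It only ever shrinks distances until the triangle inequality holds, so it does not minimise least-squares distortion. The function names and the toy distance matrix are hypothetical.

```python
def metric_closure(d):
    """Return the shortest-path closure of a symmetric distance matrix.

    d is a list of lists with zero diagonal and non-negative entries.
    The result satisfies the triangle inequality and never exceeds d.
    """
    n = len(d)
    m = [row[:] for row in d]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m


def satisfies_triangle_inequality(m):
    """Check m[i][j] <= m[i][k] + m[k][j] for all index triples."""
    n = len(m)
    return all(m[i][j] <= m[i][k] + m[k][j]
               for i in range(n) for j in range(n) for k in range(n))


# Toy semimetric over three functions: 5 > 1 + 1 violates the triangle inequality.
toy_semimetric = [
    [0, 1, 5],
    [1, 0, 1],
    [5, 1, 0],
]
toy_metric = metric_closure(toy_semimetric)
print(toy_metric)
```

The closure replaces the offending distance of 5 with 2, the length of the two-step route. A least-squares approach would instead be free to adjust all three distances to spread the distortion.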

    Evolved hexose transporter enhances xylose uptake and glucose/xylose co-utilization in Saccharomyces cerevisiae

    Enhancing xylose utilization has been a major focus in Saccharomyces cerevisiae strain-engineering efforts. The incentive for these studies arises from the need to use all sugars in the typical carbon mixtures that comprise standard renewable plant-biomass-based carbon sources. While major advances have been made in developing utilization pathways, the efficient import of five-carbon sugars into the cell remains an important bottleneck in this endeavor. Here we used an engineered S. cerevisiae BY4742 strain, containing an established heterologous xylose utilization pathway, and imposed a laboratory evolution regime with xylose as the sole carbon source. We obtained several evolved strains with improved growth phenotypes and evaluated the best candidate using genome resequencing. We observed remarkably few single nucleotide polymorphisms in the evolved strain, among which we confirmed a single amino acid change in the hexose transporter HXT7 coding sequence to be responsible for the evolved phenotype. The mutant HXT7(F79S) shows improved xylose uptake rates (Vmax = 186.4 ± 20.1 nmol·min⁻¹·mg⁻¹) that allow the S. cerevisiae strain to show significant growth with xylose as the sole carbon source, as well as partial co-utilization of glucose and xylose in a mixed sugar cultivation.