12 research outputs found

    A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

    Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated. Results: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%. Conclusion: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.

    Bayesian Aggregation for Hierarchical Classification

    Large numbers of overlapping classes are found to be organized in hierarchies in many domains. In multi-label classification over such a hierarchy, members of a class must also belong to all of its parents. Training an independent classifier for each class is a common approach, but this may yield labels for a given example that collectively violate this constraint. We propose a principled method of resolving such inconsistencies to increase accuracy over all classes. Our approach is to view the hierarchy as a graphical model, and then to employ Bayesian inference to infer the most likely set of hierarchically consistent class labels from independent base classifier predictions. This method can work with any type of base classification algorithm. Experiments on synthetic data, as well as real data sets from bioinformatics and computer graphics domains, illustrate its behavior under a range of conditions, and demonstrate that it can improve accuracy over all levels of a hierarchy.
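
The consistency constraint described in this abstract (a positive class implies positive parents) can be illustrated with a minimal Python sketch. This is not the paper's Bayesian aggregation method: it is a brute-force stand-in that assumes each base classifier output is an independent probability and simply picks the highest-scoring labeling that respects the hierarchy. The toy hierarchy `PARENT`, the class names, and both function names are hypothetical.

```python
from itertools import product

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {"root": None, "a": "root", "b": "root", "a1": "a"}


def is_consistent(labels):
    """True if every positive class also has a positive parent."""
    return all(
        not on or PARENT[c] is None or labels[PARENT[c]]
        for c, on in labels.items()
    )


def most_likely_consistent(probs):
    """Brute-force search over hierarchically consistent labelings.

    probs[c] is treated as an independent estimate of P(class c is present).
    That independence assumption is ours, made only for illustration.
    """
    classes = list(probs)
    best, best_score = None, -1.0
    for bits in product([False, True], repeat=len(classes)):
        labels = dict(zip(classes, bits))
        if not is_consistent(labels):
            continue
        # Score a labeling by the product of per-class probabilities.
        score = 1.0
        for c in classes:
            score *= probs[c] if labels[c] else 1.0 - probs[c]
        if score > best_score:
            best, best_score = labels, score
    return best
```

With outputs such as `{"root": 0.9, "a": 0.3, "b": 0.2, "a1": 0.8}`, thresholding each class independently at 0.5 would mark `a1` positive and its parent `a` negative, which violates the constraint; the search above returns a consistent labeling instead. Enumeration is exponential in the number of classes, so this only works for very small hierarchies.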

    Relative performance of different methods with regard to the test set and novel set on GO biological process terms (size 101 to 300)

    The relative performance of individual groups differs between the test set and novel set. In addition, the performance on the novel set was generally worse than on the test set. This indicates that cross-validation should be used carefully in assessing the relative performance of different algorithms and that evaluation on novel biology is necessary. Asterisks indicate second round submissions. GO, Gene Ontology.
    Copyright information: Taken from "Predicting gene function in a hierarchical context with an ensemble of classifiers", http://genomebiology.com/2008/9/S1/S3. Genome Biology 2008;9(Suppl 1):S3-S3. Published online 27 Jun 2008. PMCID: PMC2447537.

    Microbial Forensics: Predicting Phenotypic Characteristics and Environmental Conditions from Large-Scale Gene Expression Profiles

    A tantalizing question in cellular physiology is whether the cellular state and environmental conditions can be inferred by the expression signature of an organism. To investigate this relationship, we created an extensive normalized gene expression compendium for the bacterium Escherichia coli that was further enriched with meta-information through an iterative learning procedure. We then constructed an ensemble method to predict environmental and cellular state, including strain, growth phase, medium, oxygen level, antibiotic and carbon source presence. Results show that gene expression is an excellent predictor of environmental structure, with multi-class ensemble models achieving balanced accuracy between 70.0% (±3.5%) and 98.3% (±2.3%) for the various characteristics. Interestingly, this performance can be significantly boosted when environmental and strain characteristics are simultaneously considered, as a composite classifier that captures the inter-dependencies of three characteristics (medium, phase and strain) achieved 10.6% (±1.0%) higher performance than any individual model. Contrary to expectations, only 59% of the top informative genes were also identified as differentially expressed under the respective conditions. Functional analysis of the respective genetic signatures implicates a wide spectrum of Gene Ontology terms and KEGG pathways with condition-specific information content, including iron transport, transferases, and enterobactin synthesis. Further experimental phenotypic-to-genotypic mapping that we conducted for knock-out mutants argues for the information content of top-ranked genes. This work demonstrates the degree to which genome-scale transcriptional information can be predictive of latent, heterogeneous and seemingly disparate phenotypic and environmental characteristics, with far-reaching applications.
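
The abstract reports balanced accuracy without defining it. A common definition, assumed here rather than taken from the paper, is the mean of per-class recall, which keeps a frequent class from dominating the score. The sketch below follows that definition; the function name and the example labels are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    if not total:
        raise ValueError("y_true must contain at least one label")
    # Average the recall of each class, weighting all classes equally.
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For example, if four of five samples are labeled `"aerobic"` and a model predicts `"aerobic"` for all five, plain accuracy is 0.8 while this measure gives 0.5, because recall on the minority class is zero.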

    Distribution of GO terms at several precision/recall performance points

    Proportion of Gene Ontology (GO) terms per evaluation category with a precision/recall performance point that is both above and to the right of a given precision/recall point in the contour plots. GO-BP, GO Biological process; GO-CC, GO Cellular component; GO-MF, GO Molecular function.
    Copyright information: Taken from "A critical assessment of gene function prediction using integrated genomic evidence", http://genomebiology.com/2008/9/S1/S2. Genome Biology 2008;9(Suppl 1):S2-S2. Published online 27 Jun 2008. PMCID: PMC2447536.
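
The quantity described in this caption can be sketched in Python as follows. This is an illustrative reading, not the authors' code: it assumes each term comes with a list of (precision, recall) points, and that "above and to the right" means meeting or exceeding both thresholds (whether the original comparison is strict is not stated). The function name and term identifiers are hypothetical.

```python
def proportion_above_and_right(term_points, precision, recall):
    """Fraction of terms with at least one point meeting both thresholds.

    term_points maps a term ID to a list of (precision, recall) pairs.
    """
    if not term_points:
        return 0.0
    hits = sum(
        any(p >= precision and r >= recall for p, r in points)
        for points in term_points.values()
    )
    return hits / len(term_points)
```

Evaluating this function over a grid of precision and recall thresholds would give the values needed for a contour plot of the kind the caption refers to.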