251 research outputs found
The GOA database in 2009—an integrated Gene Ontology Annotation resource
The Gene Ontology Annotation (GOA) project at the EBI (http://www.ebi.ac.uk/goa) provides high-quality electronic and manual associations (annotations) of Gene Ontology (GO) terms to UniProt Knowledgebase (UniProtKB) entries. Annotations created by the project are collated with annotations from external databases to provide an extensive, publicly available GO annotation resource. Currently covering over 160 000 taxa, with greater than 32 million annotations, GOA remains the largest and most comprehensive open-source contributor to the GO Consortium (GOC) project. Over the last five years, the group has augmented the number and coverage of their electronic pipelines and a number of new manual annotation projects and collaborations now further enhance this resource. A range of files facilitate the download of annotations for particular species, and GO term information and associated annotations can also be viewed and downloaded from the newly developed GOA QuickGO tool (http://www.ebi.ac.uk/QuickGO), which allows users to precisely tailor their annotation set
FFPred: an integrated feature-based function prediction server for vertebrate proteomes
One of the challenges of the post-genomic era is to provide accurate function annotations for large volumes of data resulting from genome sequencing projects. Most function prediction servers utilize methods that transfer existing database annotations between orthologous sequences. In contrast, there are few methods that are independent of homology and can annotate distant and orphan protein sequences. The FFPred server adopts a machine-learning approach to perform function prediction in protein feature space using feature characteristics predicted from amino acid sequence. The features are scanned against a library of support vector machines representing over 300 Gene Ontology (GO) classes and probabilistic confidence scores returned for each annotation term. The GO term library has been modelled on human protein annotations; however, benchmark performance testing showed robust performance across higher eukaryotes. FFPred offers important advantages over traditional function prediction servers in its ability to annotate distant homologues and orphan protein sequences, and achieves greater coverage and classification accuracy than other feature-based prediction servers. A user may upload an amino acid and receive annotation predictions via email. Feature information is provided as easy to interpret graphics displayed on the sequence of interest, allowing for back-interpretation of the associations between features and function classes
SmedGD: the Schmidtea mediterranea genome database
The planarian Schmidtea mediterranea is rapidly emerging as a model organism for the study of regeneration, tissue homeostasis and stem cell biology. The recent sequencing, assembly and annotation of its genome are expected to further buoy the biomedical importance of this organism. In order to make the extensive data associated with the genome sequence accessible to the biomedical and planarian communities, we have created the Schmidtea mediterranea Genome Database (SmedGD). SmedGD integrates in a single web-accessible portal all available data associated with the planarian genome, including predicted and annotated genes, ESTs, protein homologies, gene expression patterns and RNAi phenotypes. Moreover, SmedGD was designed using tools provided by the Generic Model Organism Database (GMOD) project, thus making its data structure compatible with other model organism databases. Because of the unique phylogenetic position of planarians, SmedGD (http://smedgd.neuro.utah.edu) will prove useful not only to the planarian research community, but also to those engaged in developmental and evolutionary biology, comparative genomics, stem cell research and regeneration
Mining the Gene Wiki for functional genomic knowledge
<p>Abstract</p> <p>Background</p> <p>Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki is comprised of more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology.</p> <p>Results</p> <p>Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses.</p> <p>Conclusions</p> <p>The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.</p
Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation
InterPro amalgamates predictive protein signatures from a number of well-known partner databases into a single resource. To aid with interpretation of results, InterPro entries are manually annotated with terms from the Gene Ontology (GO). The InterPro2GO mappings are comprised of the cross-references between these two resources and are the largest source of GO annotation predictions for proteins. Here, we describe the protocol by which InterPro curators integrate GO terms into the InterPro database. We discuss the unique challenges involved in integrating specific GO terms with entries that may describe a diverse set of proteins, and we illustrate, with examples, how InterPro hierarchies reflect GO terms of increasing specificity. We describe a revised protocol for GO mapping that enables us to assign GO terms to domains based on the function of the individual domain, rather than the function of the families in which the domain is found. We also discuss how taxonomic constraints are dealt with and those cases where we are unable to add any appropriate GO terms. Expert manual annotation of InterPro entries with GO terms enables users to infer function, process or subcellular information for uncharacterized sequences based on sequence matches to predictive models
AgBase: a functional genomics resource for agriculture
BACKGROUND: Many agricultural species and their pathogens have sequenced genomes and more are in progress. Agricultural species provide food, fiber, xenotransplant tissues, biopharmaceuticals and biomedical models. Moreover, many agricultural microorganisms are human zoonoses. However, systems biology from functional genomics data is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation and agricultural research communities are smaller with limited funding compared to many model organism communities. DESCRIPTION: To facilitate systems biology in these traditionally agricultural species we have established "AgBase", a curated, web-accessible, public resource for structural and functional annotation of agricultural genomes. The AgBase database includes a suite of computational tools to use GO annotations. We use standardized nomenclature following the Human Genome Organization Gene Nomenclature guidelines and are currently functionally annotating chicken, cow and sheep gene products using the Gene Ontology (GO). The computational tools we have developed accept and batch process data derived from different public databases (with different accession codes), return all existing GO annotations, provide a list of products without GO annotation, identify potential orthologs, model functional genomics data using GO and assist proteomics analysis of ESTs and EST assemblies. Our journal database helps prevent redundant manual GO curation. We encourage and publicly acknowledge GO annotations from researchers and provide a service for researchers interested in GO and analysis of functional genomics data. CONCLUSION: The AgBase database is the first database dedicated to functional genomics and systems biology analysis for agriculturally important species and their pathogens. We use experimental data to improve structural annotation of genomes and to functionally characterize gene products. AgBase is also directly relevant for researchers in fields as diverse as agricultural production, cancer biology, biopharmaceuticals, human health and evolutionary biology. Moreover, the experimental methods and bioinformatics tools we provide are widely applicable to many other species including model organisms
Quality of Computationally Inferred Gene Ontology Annotations
Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we not only found that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon—an important outcome given that >98% of all annotations are inferred without direct curation
Identification of Melatonin-Regulated Genes in the Ovine Pituitary Pars Tuberalis, a Target Site for Seasonal Hormone Control
The pars tuberalis (PT) of the pituitary gland expresses a high density of melatonin (MEL) receptors and is believed to regulate seasonal physiology by decoding changes in nocturnal melatonin secretion. Circadian clock genes are known to be expressed in the PT in response to the decline (Per1) and onset (Cry1) of MEL secretion, but to date little is known of other molecular changes in this key MEL target site. To identify transcriptional pathways that may be involved in the diurnal and photoperiod-transduction mechanism, we performed a whole genome transcriptome analysis using PT RNA isolated from sheep culled at three time points over the 24-h cycle under either long or short photoperiods. Our results reveal 153 transcripts where expression differs between photoperiods at the light-dark transition and 54 transcripts where expression level was more globally altered by photoperiod (all time points combined). Cry1 induction at night was associated with up-regulation of genes coding for NeuroD1 (neurogenic differentiation factor 1), Pbef / Nampt (nicotinamide phosphoribosyltransferase) , Hif1α (hypoxia-inducible factor-1α), and Kcnq5 (K channel) and down-regulation of Rorβ, a key clock gene regulator. Using in situ hybridization, we confirmed day-night differences in expression for Pbef / Nampt, NeuroD1, and Rorβ in the PT. Treatment of sheep with MEL increased PT expression for Cry1, Pbef / Nampt, NeuroD1, and Hif1α, but not Kcnq5. Our data thus reveal a cluster of Cry1-associated genes that are acutely responsive to MEL and novel transcriptional pathways involved in MEL action in the PT
Estimating the annotation error rate of curated GO database sequence annotations
Background Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences. Results We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%. Conclusion While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.Craig E. Jones, Alfred L. Brown and Ute Bauman
- …