Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development

Author: A Millard
C Mungall
Christopher J Mungall
CJ Mungall
E Camon
Emily C Dimmer
H Mi
I Vastrik
J Day-Richter
J Wielemaker
Jennifer I Deegan
JI Clark
L Evans
M Courtot
Reference Genome Group of the Gene Ontology Consortium
S Carbon
S Hunter
The Gene Ontology Consortium
The UniProt Consortium
TJP Hubbard
W Kuśnierczyk
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The Gene Ontology project supports categorization of gene products according to their location of action, the molecular functions that they carry out, and the processes that they are involved in. Although the ontologies are intentionally developed to be taxon neutral and to cover all species, there are inherent taxon specificities in some branches. For example, the process 'lactation' is specific to mammals and the location 'mitochondrion' is specific to eukaryotes. The lack of an explicit formalization of these constraints can lead to errors and inconsistencies in automated and manual annotation. Results: We have formalized the taxonomic constraints implicit in some GO classes, and specified these at various levels in the ontology. We have also developed an inference system that can be used to check for violations of these constraints in annotations. Using the constraints in conjunction with the inference system, we have detected and removed errors in annotations and improved the structure of the ontology. Conclusions: Detection of inconsistencies in taxon-specificity enables gradual improvement of the ontologies, the annotations, and the formalized constraints. This is progressively improving the quality of our data. The full system is available for download, and new constraints or proposed changes to constraints can be submitted online at https://sourceforge.net/tracker/?atid=605890&group_id=36855.
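The kind of constraint check described above can be sketched in a few lines. This is a minimal illustration, not the authors' inference system: it assumes a toy taxonomy and term hierarchy (every name and link below is a hypothetical stand-in), models only an `only_in_taxon`-style restriction, and flags an annotation when any ancestor of the annotated term is restricted to a taxon outside the annotated organism's lineage. That ancestor walk is how a constraint stated high in the ontology reaches more specific terms.

```python
# Hypothetical toy data: child taxon -> parent taxon.
TAXON_PARENT = {
    "Mus musculus": "Mammalia",
    "Mammalia": "Eukaryota",
    "Saccharomyces cerevisiae": "Eukaryota",
    "Escherichia coli": "Bacteria",
    "Eukaryota": "cellular organisms",
    "Bacteria": "cellular organisms",
}

# Hypothetical toy data: term -> its is_a / part_of parents.
TERM_PARENTS = {
    "mitochondrial inner membrane": ["mitochondrion"],
    "mitochondrion": [],
    "lactation": [],
}

# only_in_taxon-style constraints: term -> the only taxon it may occur in.
ONLY_IN_TAXON = {
    "mitochondrion": "Eukaryota",
    "lactation": "Mammalia",
}


def taxon_lineage(taxon):
    """Return the taxon followed by all of its ancestors."""
    lineage = [taxon]
    while taxon in TAXON_PARENT:
        taxon = TAXON_PARENT[taxon]
        lineage.append(taxon)
    return lineage


def term_ancestors(term):
    """Return the term plus every term reachable through its parents."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in TERM_PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def violations(term, taxon):
    """List (constrained term, required taxon) pairs broken by annotating term to taxon."""
    lineage = set(taxon_lineage(taxon))
    return sorted(
        (anc, ONLY_IN_TAXON[anc])
        for anc in term_ancestors(term)
        if anc in ONLY_IN_TAXON and ONLY_IN_TAXON[anc] not in lineage
    )
```

For example, `violations("mitochondrial inner membrane", "Escherichia coli")` reports the inherited `("mitochondrion", "Eukaryota")` constraint, while `violations("lactation", "Mus musculus")` returns an empty list.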

arXiv.org e-Print Archive

Towards a genome-wide transcriptogram: the Saccharomyces cerevisiae case

Author: Bader
Barabasi
Blatt
Enright
Fry
Hooper
Huang
Jelinsky
Jensen
José C. F. Moreira
José Luiz Rybarczyk-Filho
Kanehisa
Kirkpatrick
Leonardo G. Brunnet
Mauro A. A. Castro
Metropolis
Ravasz
Rita M. C. de Almeida
Rodrigo J. S. Dalmolin
Spirin
The Gene Ontology Consortium
Tu
Vinogradov
Watts
Publication date: 16/10/2009

A modular classification of the genome that associates cellular processes with modules could lead to a method for quantifying differences in gene expression levels across cellular stages or conditions; with it, the transcriptogram, a powerful tool for assessing cell performance, would be at hand. Here we present a computational method that orders genes on a line so that strongly interacting genes cluster together, defining functional modules associated with Gene Ontology terms. The starting point is a list of genes and a matrix specifying their interactions, available from large gene interaction databases. For the Saccharomyces cerevisiae genome, we produced a succession of plots of gene transcription levels during a fermentation process. These plots discriminate the fermentation stage the cell is going through and may be regarded as the first versions of a transcriptogram. The method is useful for extracting information from cell stimulus/response experiments and may be applied for diagnostic purposes to different organisms.
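The gene-ordering idea can be illustrated with a toy objective: place interacting genes close together on a line. The sketch below is a swapped-in simplification, not the method of the paper. It assumes the cost is the summed line distance between interacting pairs and lowers it by greedy pairwise swaps, which can stop at a local optimum. Gene names and interactions are hypothetical.

```python
from itertools import combinations


def arrangement_cost(order, edges):
    """Sum of line distances between interacting genes (lower = tighter clusters)."""
    pos = {gene: i for i, gene in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in edges)


def greedy_order(genes, edges):
    """Apply any pairwise swap that lowers the cost until no swap helps."""
    order = list(genes)
    best = arrangement_cost(order, edges)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            cost = arrangement_cost(order, edges)
            if cost < best:
                best = cost          # keep the improving swap
                improved = True
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
    return order, best
```

With the hypothetical genes `A`, `X`, `B`, `Y` in that order and interactions (A, B) and (X, Y), the starting cost is 4; the search ends at cost 2, where each interacting pair sits side by side.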

DeepBrain: Functional Representation of Neural In-Situ Hybridization Images for Gene Ontology Classification Using Deep Convolutional Autoencoders

Author: AM Henry
C Cortes
DG Lowe
ES Lein
FP Davis
GE Hinton
J Masci
K Puniyani
L Ng
M Ashburner
M Hawrylycz
MD Zeiler
MJ Rapoport
N Skunca
OD King
P Bork
P Pinoli
P Vincent
The Gene Ontology Consortium
U Shalit
Publication venue: Springer Science and Business Media LLC
Publication date: 27/11/2017

This paper presents a novel deep learning-based method for learning a functional representation of mammalian neural images. The method uses a deep convolutional denoising autoencoder (CDAE) to generate an invariant, compact representation of in situ hybridization (ISH) images. While most existing methods for bio-imaging analysis were not developed to handle images with highly complex anatomical structures, the results presented in this paper show that the functional representation extracted by the CDAE can help learn features of functional Gene Ontology categories and classify them with high accuracy. Using this CDAE representation, our method outperforms the previous state-of-the-art classification, improving the average AUC from 0.92 to 0.98, i.e., a 75% reduction in error. To make the computation feasible, the method operates on input images that were significantly downsampled from the originals.
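Assuming "error" here means 1 minus the AUC (the abstract does not define it), the 75% figure follows directly from the two AUC values:

```latex
\frac{(1 - 0.92) - (1 - 0.98)}{1 - 0.92} = \frac{0.08 - 0.02}{0.08} = \frac{0.06}{0.08} = 0.75
```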

Where differences resemble: sequence-feature analysis in curated databases of intrinsically disordered proteins

Author: Apweiler
Bardou
Brown
Damiano Piovesan
Das
Daughdrill
Dinkel
Dunker
Dyson
Fichó
Fu
Fukuchi
Gunasekaran
Holehouse
Lee
Marco Necci
Miskei
Mészáros
Necci
Peng
Piovesan
Radivojac
Receveur-Bréchot
Schad
Silvio C E Tosatto
The Gene Ontology Consortium
Tompa
Uversky
Van Roey
Vucetic
Ward
Wootton
Wright
Xue
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

Archivio istituzionale della ricerca - Università di Padova

Sequencing and Analysis of the Mediterranean Amphioxus (Branchiostoma lanceolatum) Transcriptome

Author: A Conesa
A Coppe
BA Fraser
C Canestro
The Gene Ontology Consortium
F Delsuc
Hector Escriva
I Somorjai
JK Yu
LZ Holland
M Fuentes
M Nohara
M Salem
M Schubert
MC Langlois
Mohamed R. Belgacem
N Takatori
NH Putnam
P Dehal
P Jin
S Bertrand
S Gotz
S Kuraku
Silvan Oulion
Stephanie Bertrand
Vincent Laudet
Yann Le Petillon
Publication venue: Public Library of Science
Publication date: 09/05/2012

BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata), as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. In particular, studies using the amphioxus model help our understanding of vertebrate evolution and development. Interest in the amphioxus model thus led to the characterization of both the transcriptome and the complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements that allow daily induction of spawning in the laboratory during the breeding season of the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model, even though no genomic or transcriptomic data have been available for it. To fill this need, we used pyrosequencing to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on a Roche GS FLX (Titanium mode). Around 1.4 million reads were produced and assembled into 70,530 contigs (average length 490 bp). Overall, 37% of the assembled sequences were annotated by BlastX, and their Gene Ontology terms were determined. These results were then compared with genomic and transcriptomic data of B. floridae to assess the similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum) reference transcriptome using a high-throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger-based sequencing. Therefore, given the high degree of sequence conservation between different amphioxus species, this set of ESTs may now be used as the reference transcriptome for the Branchiostoma genus.

Public Library of Science (PLOS)

FigShare

Genomes as geography: using GIS technology to build interactive genome feature maps

Author: AI Su
Carol J Bult
Constance C Holden
D Karolchik
DL Wheeler
E Birney
J Bertin
JT Eppig
LD Stein
M Kate Beard
Mary E Dolan
ME Dolan
SE Lewis
T Barrett
T Ormsby
The Gene Ontology Consortium
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. RESULTS: We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. 
CONCLUSION: Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries.
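GenoSIS itself is built on ArcGIS, whose API is not reproduced here. As a rough illustration only, a query that combines a feature attribute with genomic location can be sketched in plain Python; the `Feature` record, its 1-based inclusive coordinates and the sample features are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive
    kind: str   # attribute used for filtering, e.g. "gene" or "SNP"


def query(features, chrom, start, end, kind=None):
    """Names of features on `chrom` overlapping [start, end], optionally filtered by kind."""
    return [
        f.name
        for f in features
        if f.chrom == chrom
        and f.start <= end
        and f.end >= start
        and (kind is None or f.kind == kind)
    ]


FEATURES = [
    Feature("geneA", "chr1", 100, 500, "gene"),
    Feature("snp1", "chr1", 450, 450, "SNP"),
    Feature("geneB", "chr1", 1000, 2000, "gene"),
    Feature("geneC", "chr2", 100, 500, "gene"),
]
```

Here `query(FEATURES, "chr1", 400, 1200, kind="gene")` combines the spatial condition (overlap with a window) with the attribute condition and returns `["geneA", "geneB"]`. A GIS would answer the same question with spatial indexes rather than a linear scan.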

InterMitoBase: An annotated database and analysis platform of protein-protein interactions for human mitochondria

Author: B Aranda
C Stark
Chenyu Zhang
D Cotter
DJ Pagliarini
H Prokisch
HM McBride
HM Wain
Hua Xu
Jie Li
Jin Wang
Junling Wang
L Salwinski
Ming Gong
R Reja
Song Gao
The Gene Ontology Consortium
Uniprot Consortium
Y Benjamini
Zuguang Gu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The mitochondrion is an essential organelle that plays important roles in diverse biological processes, such as metabolism, apoptosis, signal transduction and the cell cycle. Characterizing the protein-protein interactions (PPIs) that execute mitochondrial functions is fundamental to understanding the mechanisms underlying biological functions and diseases associated with mitochondria. Investigations of mitochondria are expanding to the system level because of the accumulation of mitochondrial proteomes and the human interactome. Consequently, a database that provides the entire protein interaction map of the human mitochondrion is urgently required. Results: InterMitoBase provides a comprehensive interactome of human mitochondria. It contains the PPIs in biological pathways mediated by mitochondrial proteins, the PPIs between mitochondrial and non-mitochondrial proteins, and the PPIs among mitochondrial proteins. The current version of InterMitoBase covers 5,883 non-redundant PPIs of 2,813 proteins integrated from a wide range of resources, including PubMed, KEGG, BioGRID, HPRD, DIP and IntAct. The interactions derived from PubMed have been comprehensively curated. All interactions in InterMitoBase are annotated according to information collected from their original sources, GenBank and GO. Additionally, InterMitoBase features a user-friendly graphical visualization platform that presents functional and topological analyses of the identified PPI networks, which should aid researchers in studying the underlying biological properties. Conclusions: InterMitoBase is designed as an integrated PPI database that provides the most up-to-date PPI information for human mitochondria. It also works as an analysis platform by integrating several online tools for PPI analysis. As both an analysis platform and a PPI database, InterMitoBase will be an important resource for the study of mitochondrial biochemistry and should be particularly helpful in comprehensive analyses of the complex biological mechanisms underlying mitochondrial functions.

OREMPdb: a semantic dictionary of computational pathway models

Author: A Bauer-Mehren
AL Lister
B Olivier
B Schoeberl
C Forbes Dewey
C Li
F Krause
Giuseppe Nicosia
H Wang
M Hucka
M Kanehisa
M Magrane
MJ Schilstra
ML Hines
N Le Novère
NF Noy
P de Matos
R Hoehndorf
R Umeton
Renato Umeton
SA Brown
T Yu
The Gene Ontology Consortium
VAS Ayyadurai
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: The information coming from biomedical ontologies and computational pathway models is expanding continuously: research communities sustain this growth, and their advances are generally shared through dedicated resources published on the web. Pathway models are shared to characterize molecular processes, while biomedical ontologies provide semantic context for most of those pathways. Recent advances in both fields pave the way for scalable information integration based on aggregate knowledge repositories, but the lack of overall standard formats impedes this progress. Indeed, because they have different objectives and different abstraction levels, most of these resources "speak" different languages. Semantic web technologies are explored here as a means to address some of these problems. Methods: Employing an extensible collection of interpreters, we developed OREMP (Ontology Reasoning Engine for Molecular Pathways), a system that abstracts the information from different resources and combines it into a coherent ontology. Continuing this effort, we present OREMPdb: once different pathways are fed into OREMP, species are linked to the external ontologies they refer to and to the reactions in which they participate. Exploiting these links, the system builds species-sets, which encapsulate species that operate together. Composing all of the reactions together, the system computes all of the reaction paths from and to all of the species-sets. Results: OREMP has been applied to the curated branch of BioModels (2011/04/15 release), which contains 326 models, 9244 reactions, and 5636 species overall. OREMPdb, the semantic dictionary created as a result, consists of 7360 species-sets. For each of these sets, OREMPdb links to the original pathway and to the original paper where this information first appeared.
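The step of computing reaction paths between species-sets can be pictured as path enumeration in a directed graph whose nodes are species-sets and whose edges are reactions. This is a hypothetical sketch, not OREMP's ontology-based implementation: the species-set identifiers `S1` to `S5` and the reactions `r1` to `r5` are made up.

```python
# Hypothetical reactions: reaction id -> (source species-set, target species-set).
REACTIONS = {
    "r1": ("S1", "S2"),
    "r2": ("S2", "S3"),
    "r3": ("S1", "S4"),
    "r4": ("S4", "S3"),
    "r5": ("S3", "S5"),
}


def all_paths(reactions, source, target):
    """Enumerate every acyclic reaction path from source to target (depth-first)."""
    graph = {}
    for rid, (src, dst) in sorted(reactions.items()):
        graph.setdefault(src, []).append((rid, dst))

    paths = []
    # Each stack entry: current species-set, reactions taken so far, sets visited.
    stack = [(source, [], {source})]
    while stack:
        node, path, visited = stack.pop()
        if node == target and path:
            paths.append(path)
            continue
        for rid, nxt in graph.get(node, []):
            if nxt not in visited:  # skip cycles
                stack.append((nxt, path + [rid], visited | {nxt}))
    return sorted(paths)
```

`all_paths(REACTIONS, "S1", "S5")` returns `[["r1", "r2", "r5"], ["r3", "r4", "r5"]]`. Exhaustive enumeration of acyclic paths can grow exponentially with graph size, so this direct approach suits only small graphs like this one.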

ProFITS of maize: a database of protein families involved in the transduction of signalling in the maize genome

Author: A Gfeller
B Rhead
C Dardick
C Molina
C Soderlund
J Jia
J Lai
JD Thompson
K Higo
LP Hamel
M Gribskov
MA Gore
ME Skinner
N Alexandrov
P Rice
PS Schnable
S Fonseca
S Hunter
S Kumar
SF Altschul
TD Wu
The Gene Ontology Consortium
TZ Sen
Yi Ling
Z Du
Zhen Su
Zhenhai Zhang
Zhou Du
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Maize (Zea mays ssp. mays L.) is an important model for basic and applied plant research. In 2009, sequencing of the B73 maize genome using a clone-by-clone strategy made a great step forward; however, functional annotation and gene classification of the maize genome are still limited. Well-annotated datasets and an informative database will therefore be important for further research discoveries. Signal transduction is a fundamental biological process in living cells, and many protein families participate in it by sensing, amplifying and responding to various extracellular or internal stimuli. Integrating information on the maize functional genes involved in signal transduction is therefore a good starting point. Results: Here we introduce a comprehensive database, 'ProFITS' (Protein Families Involved in the Transduction of Signalling), which aims to identify and classify protein kinases/phosphatases, transcription factors and ubiquitin-proteasome-system-related genes in the B73 maize genome. Users can explore gene models, corresponding transcripts and FLcDNAs using these three protein hierarchical categories, and visualize them with an AJAX-based genome browser (JBrowse) or the Generic Genome Browser (GBrowse). Functional annotations such as GO annotations, protein signatures, and protein best hits in the Arabidopsis and rice genomes are provided. In addition, pre-calculated transcription factor binding sites are generated for each gene, and mutant information is incorporated into ProFITS. In short, ProFITS provides a user-friendly web interface for studies of the signal transduction process in maize. Conclusion: ProFITS, which uses both the B73 maize genome and full-length cDNA (FLcDNA) datasets, provides users with a comprehensive platform of maize annotation with a specific focus on the categorization of families involved in the signal transduction process. ProFITS is designed as a user-friendly web interface and is valuable for experimental researchers. It is freely available to all users at http://bioinfo.cau.edu.cn/ProFITS.