Modeling sample variables with an Experimental Factor Ontology

Author: Alvis Brazma
Anna Zhukova
Ele Holloway
Helen Parkinson
James Malone
Jie Zheng
Misha Kapushesky
Nikolay Kolesnikov
Tomasz Adamusiak
Publication venue: Oxford University Press
Publication date: 15/04/2010

Motivation: Describing biological sample variables with ontologies is complex due to the cross-domain nature of experiments. Ontologies provide annotation solutions; however, for cross-domain investigations, multiple ontologies are needed to represent the data. These are subject to rapid change, are often not interoperable and present complexities that are a barrier to biological resource users.

Crossref

PubMed Central

HAL Descartes

A Factor Graph Approach to Automated GO Annotation

Author: Elizabeth Tapia
Fernando Roda
Flavia Krsticevic
Flavio E Spetale
Pilar Bulacio
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

As the volume of genomic data grows, computational methods become essential for providing a first glimpse into gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.

Affiliation: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Affiliation: Krsticevic, Flavia Jorgelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Affiliation: Roda, Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Affiliation: Bulacio, Pilar Estela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
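
The idea in the abstract above, turning raw per-term scores into marginals that respect the GO hierarchy, can be illustrated with a very small sketch. Everything here is an assumption made for illustration: the three term names are a toy hierarchy rather than real GO identifiers, each raw score is treated as a simple unary factor, and the marginals are computed by brute-force enumeration instead of the iterative message passing the paper describes (on a graph this small the exact marginals are cheap to enumerate).

```python
from itertools import product

# Hypothetical toy hierarchy (child -> parent); names are illustrative only.
PARENT = {"binding": "molecular_function", "dna_binding": "binding"}
TERMS = ["molecular_function", "binding", "dna_binding"]


def consistent(assign):
    """Hierarchy factor: a child term may be on only if its parent is on."""
    return all(assign[parent] or not assign[child]
               for child, parent in PARENT.items())


def marginals(raw_scores):
    """Return P(term is on) given raw scores and the consistency constraint.

    Assumption: each raw score s acts as a unary factor with weight s for
    "on" and 1 - s for "off". Inconsistent assignments get weight zero.
    """
    weight_on = {t: 0.0 for t in TERMS}
    total = 0.0
    for bits in product([0, 1], repeat=len(TERMS)):
        assign = dict(zip(TERMS, bits))
        if not consistent(assign):
            continue
        w = 1.0
        for t in TERMS:
            w *= raw_scores[t] if assign[t] else 1.0 - raw_scores[t]
        total += w
        for t in TERMS:
            if assign[t]:
                weight_on[t] += w
    return {t: weight_on[t] / total for t in TERMS}


# Raw scores that violate the hierarchy: the child scores above its parent.
raw = {"molecular_function": 0.9, "binding": 0.3, "dna_binding": 0.8}
post = marginals(raw)
for term in TERMS:
    print(term, round(post[term], 3))
```

In this toy case the raw score for `dna_binding` exceeds that of its parent `binding`, while the resulting marginals are ordered consistently with the hierarchy. For a realistic ontology, enumeration is infeasible, which is where a message passing scheme becomes necessary.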

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Directory of Open Access Journals

PubMed Central

FigShare

Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures

Author: De la Cruz Bernard J.
Ghahramani Zoubin
Rasmussen Carl Edward
Wild David L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional nontime series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles which extend previously published cluster analyses of this data.
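
A minimal sketch of the similarity measure described in this abstract: given posterior samples of cluster assignments, the co-clustering probability of two genes can be estimated as the fraction of samples in which they share a label, and one minus that value serves as a dissimilarity. The sample data and function names below are hypothetical, and the sketch assumes the samples have already been drawn by some DPM sampler; it does not implement the sampler itself.

```python
def coclustering_probability(samples):
    """Estimate P(gene i and gene j share a cluster) from label samples.

    `samples` is a list of label vectors, one per posterior sample; only
    equality of labels within a sample matters, not the label values.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(samples)
    return [[c / m for c in row] for row in counts]


def dissimilarity(prob):
    """Turn co-clustering probabilities into dissimilarities (1 - p)."""
    return [[1.0 - p for p in row] for row in prob]


# Hypothetical assignments for four genes over four posterior samples.
samples = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
P = coclustering_probability(samples)
D = dissimilarity(P)
for row in D:
    print(row)
```

The resulting matrix `D` is symmetric with a zero diagonal, so it can be passed to a standard linkage routine; with SciPy, for example, one would typically convert it with `scipy.spatial.distance.squareform` before calling `scipy.cluster.hierarchy.linkage`.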

Crossref

Warwick Research Archives Portal Repository

MPG.PuRe

Automating class definitions from OWL to English

Author: Malone James
Power Richard
Stevens Robert
Williams Sandra
Publication date: 01/07/2010

Text definitions for entities within bio-ontologies are a cornerstone of the effort to gain a consensus in understanding and usage of those ontologies. Writing these definitions is, however, a considerable effort and there is often a lag between specification of the entities in the ontology and the development of the text-based definitions. As well as these text definitions, there can also be logical descriptions and definitions of an ontology's entities. The goal of natural language generation (NLG) from ontologies is to take the logical description of entities and generate fluent natural language. We should be able to use NLG to automatically provide text-based definitions from an ontology that has logical descriptions of its entities and thus avoid the bottleneck of authoring these definitions by hand. In this paper we present some early work in using NLG to provide such text definitions for the Experimental Factor Ontology (EFO). We present our results, discuss issues in generating text definitions, and highlight some future work.

Open Research Online (The Open University)

Ranking relations using analogies in biological and information networks

Author: Airoldi EM
Ghahramani Z
Heller K
Silva R
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/06/2010

Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects $\mathbf{S}=\{A^{(1)}:B^{(1)},A^{(2)}:B^{(2)},\ldots,A^{(N)}:B^{(N)}\}$, measures how well other pairs A:B fit in with the set $\mathbf{S}$. Our work addresses the following question: is the relation between objects A and B analogous to those relations found in $\mathbf{S}$? Such questions are particularly relevant in information retrieval, where an investigator might want to search for analogous pairs of objects that match the query set of interest. There are many ways in which objects can be related, making the task of measuring analogies very challenging. Our approach combines a similarity measure on function spaces with Bayesian analysis to produce a ranking. It requires data containing features of the objects of interest and a link matrix specifying which relationships exist; no further attributes of such relationships are necessary. We illustrate the potential of our method on text analysis and information networks. An application on discovering functional interactions between pairs of proteins is discussed in detail, where we show that our approach can work in practice even if a small set of protein pairs is provided.

Comment: Published in at http://dx.doi.org/10.1214/09-AOAS321 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

Harvard University - DASH

UCL Discovery

CUED - Cambridge University Engineering Department

TinkerCell: Modular CAD Tool for Synthetic Biology

Author: Bergmann Frank T.
Chandran Deepak
Sauro Herbert M.
Publication date: 01/01/2009

Synthetic biology brings together concepts and techniques from engineering and biology. In this field, computer-aided design (CAD) is necessary in order to bridge the gap between computational modeling and biological data. An application named TinkerCell has been created in order to serve as a CAD tool for synthetic biology. TinkerCell is a visual modeling tool that supports a hierarchy of biological parts. Each part in this hierarchy consists of a set of attributes that define the part, such as sequence or rate constants. Models that are constructed using these parts can be analyzed using various C and Python programs that are hosted by TinkerCell via an extensive C and Python API. TinkerCell supports the notion of modules, which are networks with interfaces. Such modules can be connected to each other, forming larger modular networks. Because TinkerCell associates parameters and equations in a model with their respective part, parts can be loaded from databases along with their parameters and rate equations. The modular network design can be used to exchange modules as well as test the concept of modularity in biological systems. The flexible modeling framework along with the C and Python API allows TinkerCell to serve as a host to numerous third-party algorithms. TinkerCell is a free and open-source project under the Berkeley Software Distribution license. Downloads, documentation, and tutorials are available at www.tinkercell.com.

Comment: 23 pages, 20 figures

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Transcriptome analysis of cortical tissue reveals shared sets of downregulated genes in autism and schizophrenia.

Author: Arking DE
Ellis SE
Panitch R
West AB
Publication venue: eScholarship, University of California
Publication date: 01/05/2016

Autism (AUT), schizophrenia (SCZ) and bipolar disorder (BPD) are three highly heritable neuropsychiatric conditions. Clinical similarities and genetic overlap between the three disorders have been reported; however, the causes and the downstream effects of this overlap remain elusive. By analyzing transcriptomic RNA-sequencing data generated from post-mortem cortical brain tissues from AUT, SCZ, BPD and control subjects, we have begun to characterize the extent of gene expression overlap between these disorders. We report that the AUT and SCZ transcriptomes are significantly correlated (P<0.001), whereas the other two cross-disorder comparisons (AUT-BPD and SCZ-BPD) are not. Among AUT and SCZ, we find that the genes differentially expressed across disorders are involved in neurotransmission and synapse regulation. Despite the lack of global transcriptomic overlap across all three disorders, we highlight two genes, IQSEC3 and COPS7A, which are significantly downregulated compared with controls across all three disorders, suggesting either shared etiology or compensatory changes across these neuropsychiatric conditions. Finally, we tested for enrichment of genes differentially expressed across disorders in genetic association signals in AUT, SCZ or BPD, reporting lack of signal in any of the previously published genome-wide association studies (GWAS). Together, these studies highlight the importance of examining gene expression from the primary tissue involved in neuropsychiatric conditions: the cortical brain. We identify a shared role for altered neurotransmission and synapse regulation in AUT and SCZ, in addition to two genes that may more generally contribute to neurodevelopmental and neuropsychiatric conditions.
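
One way to picture a cross-disorder transcriptome comparison of the kind reported above is to correlate per-gene effect sizes from two case-control analyses and assess the result with a permutation test. The sketch below is an illustration under stated assumptions, not the study's pipeline: the abstract does not say which statistic or test was used, and the fold-change values and function names here are invented for the example.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation of x and y.

    Gene labels in y are shuffled to break any pairing with x; the +1
    terms keep the estimate away from an impossible p-value of zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical log2 fold changes (case vs control) for the same six genes.
aut = [-1.2, -0.8, 0.1, 0.5, 1.1, -0.3]
scz = [-1.0, -0.5, 0.0, 0.7, 0.9, -0.4]
r = pearson(aut, scz)
p = permutation_p(aut, scz)
print(round(r, 3), p)
```

With only six genes the permutation p-value is coarse; a real comparison would use thousands of genes, where the same procedure gives much finer resolution.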

PubMed Central

eScholarship - University of California