Search CORE

35,807 research outputs found

Graph-based sequence annotation using a data integration approach

Author: Hassani-Pak K.
Hindle M. M.
Kohler J.
Lysenko A.
Pesch R.
Rawlings C. J.
Taubert J.
Thiele R.
Publication venue: De Gruyter
Publication date: 01/01/2008
Field of study

A Factor Graph Approach to Automated GO Annotation

Author: Elizabeth Tapia
Fernando Roda
Flavia Krsticevic
Flavio E Spetale
Pilar Bulacio
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016
Field of study

As volume of genomic data grows, computational methods become essential for providing a first glimpse onto gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.Fil: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Krsticevic, Flavia Jorgelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Roda, Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Bulacio, Pilar Estela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentin

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Directory of Open Access Journals

PubMed Central

FigShare

Automated data integration for developmental biological research

Author: Sternberg Paul W.
Zhong Weiwei
Publication venue: 'The Company of Biologists'
Publication date: 15/09/2007
Field of study

In an era exploding with genome-scale data, a major challenge for developmental biologists is how to extract significant clues from these publicly available data to benefit our studies of individual genes, and how to use them to improve our understanding of development at a systems level. Several studies have successfully demonstrated new approaches to classic developmental questions by computationally integrating various genome-wide data sets. Such computational approaches have shown great potential for facilitating research: instead of testing 20,000 genes, researchers might test 200 to the same effect. We discuss the nature and state of this art as it applies to developmental research

Caltech Authors

Supporting user-oriented analysis for multi-view domain-specific visual languages

Author: Alessio Malizia
Baresi
Berry
Buchholtz
Clarke
de Lara
Ehrig
Ehrig
Ermel
Esther Guerra
Green
Grundy
Grundy
Guerra
Guerra
Guerra
Juan de Lara
Lédeczi
Magee
Manna
Murata
Nentwich
Paloma Díaz
Schürr
Varró
Völter
Xie
Zhu
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008
Field of study

This is the post-print version of the final paper published in Information and Software Technology. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2008 Elsevier B.V.The integration of usable and flexible analysis support in modelling environments is a key success factor in Model-Driven Development. In this paradigm, models are the core asset from which code is automatically generated, and thus ensuring model correctness is a fundamental quality control activity. For this purpose, a common approach is to transform the system models into formal semantic domains for verification. However, if the analysis results are not shown in a proper way to the end-user (e.g. in terms of the original language) they may become useless. In this paper we present a novel DSVL called BaVeL that facilitates the flexible annotation of verification results obtained in semantic domains to different formats, including the context of the original language. BaVeL is used in combination with a consistency framework, providing support for all steps in a verification process: acquisition of additional input data, transformation of the system models into semantic domains, verification, and flexible annotation of analysis results. The approach has been validated analytically by the cognitive dimensions framework, and empirically by its implementation and application to several DSVLs. Here we present a case study of a notation in the area of Digital Libraries, where the analysis is performed by transformations into Petri nets and a process algebra.Spanish Ministry of Education and Science and MODUWEB

CiteSeerX

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Brunel University Research Archive

Biblos-e Archivo

htsint: a Python library for sequencing pipelines that combines data through gene set generation

Author: Bonneaud Camille
Herrel Anthony
Richards Adam J
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

Background: Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. Results: We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. Conclusion: The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

Towards a Knowledge Graph based Speech Interface

Author: Auer Sören
Kumar Ashwini Jaya
köhler Joachim
Schmidt Christoph
Publication venue
Publication date: 23/05/2017
Field of study

Applications which use human speech as an input require a speech interface with high recognition accuracy. The words or phrases in the recognised text are annotated with a machine-understandable meaning and linked to knowledge graphs for further processing by the target application. These semantic annotations of recognised words can be represented as a subject-predicate-object triples which collectively form a graph often referred to as a knowledge graph. This type of knowledge representation facilitates to use speech interfaces with any spoken input application, since the information is represented in logical, semantic form, retrieving and storing can be followed using any web standard query languages. In this work, we develop a methodology for linking speech input to knowledge graphs and study the impact of recognition errors in the overall process. We show that for a corpus with lower WER, the annotation and linking of entities to the DBpedia knowledge graph is considerable. DBpedia Spotlight, a tool to interlink text documents with the linked open data is used to link the speech recognition output to the DBpedia knowledge graph. Such a knowledge-based speech recognition interface is useful for applications such as question answering or spoken dialog systems.Comment: Under Review in International Workshop on Grounding Language Understanding, Satellite of Interspeech 201

arXiv.org e-Print Archive

Fraunhofer-ePrints