Search CORE

4 research outputs found

A Factor Graph Approach to Automated GO Annotation

Author: Elizabeth Tapia
Fernando Roda
Flavia Krsticevic
Flavio E Spetale
Pilar Bulacio
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

As the volume of genomic data grows, computational methods become essential for providing a first glimpse into gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when the interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, which focus mainly on annotation precision, and heuristic alternatives, which focus mainly on scalability, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.

Affiliation: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Affiliation: Krsticevic, Flavia Jorgelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Affiliation: Roda, Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Affiliation: Bulacio, Pilar Estela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
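The message-passing idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the four-term hierarchy, the term names, the scores, and the exact form of both factors are assumptions made for illustration. The toy hierarchy is a tree, so a two-pass sum-product computation is exact, whereas the real GO is a directed acyclic graph and the abstract describes an iterative scheme. Each term gets a unary factor standing in for a noisy classifier score, and each parent-child link gets a consistency factor that forbids a child being annotated without its parent.

```python
from itertools import product

# Hypothetical toy hierarchy (child -> parent). It is a tree, so sum-product is exact.
PARENT = {"GO:b": "GO:a", "GO:c": "GO:a", "GO:d": "GO:b"}
TERMS = ["GO:a", "GO:b", "GO:c", "GO:d"]
# Hypothetical raw base-classifier scores, read as P(term = 1 | features).
SCORES = {"GO:a": 0.4, "GO:b": 0.7, "GO:c": 0.2, "GO:d": 0.9}


def evidence(term, value):
    """Unary factor: weight of `value` given the noisy classifier score."""
    return SCORES[term] if value == 1 else 1.0 - SCORES[term]


def consistent(parent_value, child_value):
    """Pairwise factor: a child term may be on only if its parent is on."""
    return 0.0 if (child_value == 1 and parent_value == 0) else 1.0


def children(term):
    return [c for c, p in PARENT.items() if p == term]


def up_belief(term):
    """Unnormalised belief of `term` from its own subtree only."""
    belief = [evidence(term, 0), evidence(term, 1)]
    for child in children(term):
        msg = up_message(child)
        belief = [belief[v] * msg[v] for v in (0, 1)]
    return belief


def up_message(child):
    """Message from `child` to its parent, indexed by the parent's value."""
    b = up_belief(child)
    return [sum(consistent(vp, vc) * b[vc] for vc in (0, 1)) for vp in (0, 1)]


def down_message(term):
    """Message from the rest of the tree to `term`, indexed by term's value."""
    parent = PARENT.get(term)
    if parent is None:
        return [1.0, 1.0]  # the root receives no message from above
    outside = down_message(parent)
    context = [evidence(parent, v) * outside[v] for v in (0, 1)]
    for sibling in children(parent):
        if sibling != term:
            msg = up_message(sibling)
            context = [context[v] * msg[v] for v in (0, 1)]
    return [sum(consistent(vp, vc) * context[vp] for vp in (0, 1)) for vc in (0, 1)]


def marginals():
    """Posterior P(term = 1) for every term, via sum-product messages."""
    result = {}
    for term in TERMS:
        up, down = up_belief(term), down_message(term)
        weight = [up[v] * down[v] for v in (0, 1)]
        result[term] = weight[1] / (weight[0] + weight[1])
    return result


def brute_force():
    """Same marginals by enumerating all joint assignments (for checking)."""
    totals, z = {t: 0.0 for t in TERMS}, 0.0
    for values in product((0, 1), repeat=len(TERMS)):
        x = dict(zip(TERMS, values))
        w = 1.0
        for t in TERMS:
            w *= evidence(t, x[t])
        for c, p in PARENT.items():
            w *= consistent(x[p], x[c])
        z += w
        for t in TERMS:
            if x[t] == 1:
                totals[t] += w
    return {t: totals[t] / z for t in TERMS}


if __name__ == "__main__":
    for term, prob in marginals().items():
        print(term, round(prob, 4))
```

Because the consistency factor rules out any assignment in which a child is on and its parent is off, the resulting marginal of a parent term is never lower than that of its children, even when the raw scores disagree with the hierarchy.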

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Directory of Open Access Journals

PubMed Central

FigShare

Bio- and Agroinformatics at CIFASIS (Bio y agroInformática en CIFASIS)

Author: Angelone Laura
Bulacio Pilar
Coronel José
Iglesias Natalia
Murillo Javier
Ornella Leonardo
Spetale Flavio E.
Tapia Elizabeth
Publication date: 01/05/2011

High-throughput technologies in life-science projects generate exponentially growing amounts of data whose nature and complexity inspire the development of new computational methods for extracting and managing relevant biological information, with the aim of reaching a fuller understanding of life at both the molecular and the population level. This technological context defines a new multidisciplinary research field known as Bioinformatics. Our group is interested in developing bioinformatics algorithms and tools for the analysis, processing and management of spectroscopy, microarray, molecular marker and high-throughput sequencing data within basic and multidisciplinary biological research projects. Our work in Bioinformatics also motivates the introduction of high-throughput technologies and data processing into Precision Agriculture, within an emerging research field known as Agroinformatics.

Track: Signal processing and real-time systems
Red de Universidades con Carreras en Informática (RedUNCI)

Servicio de Difusión de la Creación Intelectual

Multiclass classification of microarray data samples with a reduced number of genes

Author: Elizabeth Tapia
Laura Angelone
Leonardo Ornella
Pilar Bulacio
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained.

Results: A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples.

Conclusions: A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples.
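One common way to build a multiclass classifier from binary ones is an output code: each class is assigned a binary codeword, one binary classifier predicts each bit, and the predicted bit vector is decoded to the nearest codeword. The sketch below illustrates only that general idea. The abstract does not state which decomposition or decoder the authors use, and the gene-count bound itself is not reproduced here. The 7-bit code matrix, the class names, and the `decode` helper are assumptions made for illustration.

```python
# Hypothetical 7-bit output code for four classes: one row per class,
# one column per binary classifier. Any two rows differ in 4 positions.
CODE = {
    "class_0": (1, 1, 1, 1, 1, 1, 1),
    "class_1": (0, 0, 0, 0, 1, 1, 1),
    "class_2": (0, 0, 1, 1, 0, 0, 1),
    "class_3": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance():
    """Smallest Hamming distance between any two distinct codewords."""
    labels = list(CODE)
    return min(
        hamming(CODE[a], CODE[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    )


def decode(bits):
    """Map the binary classifiers' outputs to the class with the nearest codeword."""
    return min(CODE, key=lambda label: hamming(CODE[label], bits))


def flip(bits, position):
    """Return a copy of `bits` with one binary prediction inverted."""
    return tuple(b ^ 1 if i == position else b for i, b in enumerate(bits))


if __name__ == "__main__":
    noisy = flip(CODE["class_2"], 3)  # one binary classifier makes a mistake
    print(min_distance(), decode(noisy))
```

With a minimum distance of 4 between codewords, a single wrong binary prediction still decodes to the intended class, which is one reason longer binary output codes can make the overall multiclass decision more robust.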

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

CONICET Digital

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central