
RCAAP - Repositório Científico de Acesso Aberto de Portugal

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

BayGO: Bayesian analysis of ontology term enrichment in microarray data

Author: de B Pereira Carlos A
Gomes Suely L
Koide Tie
Vêncio Ricardo ZN
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The search for enriched (also called over-represented or enhanced) ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for system-level analysis. This procedure summarizes the information by focusing on classification schemes such as the Gene Ontology, KEGG pathways and similar resources, instead of on individual genes. Although it is well known in statistics that association and significance are distinct concepts, only the latter approach has been used to deal with the ontology term enrichment problem. RESULTS: BayGO implements a Bayesian approach to search for enriched terms in microarray data. The R source code is freely available at in three versions: Linux, which can be easily incorporated into existing pipelines; Windows, to be controlled interactively; and as a web tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses. CONCLUSION: The Bayesian model accounts for the fact that not all the genes from a given category are necessarily observable in microarray data, owing to low-intensity signals, quality filters, genes that were not spotted, and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of relying only on the usual significance analysis.
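
BayGO's own model is not reproduced here. As a minimal sketch of the distinction the abstract draws between association and significance, the Python below treats one ontology term's 2x2 table (genes in or out of the term, crossed with differentially expressed or not) with a generic conjugate Dirichlet-multinomial model and summarizes the posterior of the log odds ratio, an association measure. The uniform prior, the cell ordering and all function names are assumptions for illustration; unlike BayGO, this sketch does not model genes that are unobservable on the array.

```python
# Illustrative sketch only: a generic Bayesian 2x2 association analysis,
# NOT BayGO's model. Prior, cell ordering and names are assumptions.
import math
import random
from typing import List, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def posterior_log_odds_ratio(
    counts: Sequence[int], prior: float = 1.0, n_samples: int = 10000, seed: int = 0
) -> List[float]:
    """Posterior samples of the log odds ratio for a 2x2 table.

    counts = (n11, n10, n01, n00): first index 1/0 = gene in term or not,
    second index 1/0 = differentially expressed or not.
    Assumes a symmetric Dirichlet(prior) prior on the four cell probabilities.
    """
    rng = random.Random(seed)
    alpha = [n + prior for n in counts]
    samples = []
    for _ in range(n_samples):
        p11, p10, p01, p00 = sample_dirichlet(alpha, rng)
        samples.append(math.log((p11 * p00) / (p10 * p01)))
    return samples


def summarize(samples: Sequence[float]) -> dict:
    """Posterior mean, 95% equal-tailed credible interval, and P(log OR > 0)."""
    ordered = sorted(samples)
    n = len(ordered)
    return {
        "mean": sum(ordered) / n,
        "ci95": (ordered[int(0.025 * n)], ordered[int(0.975 * n) - 1]),
        "p_positive": sum(s > 0 for s in ordered) / n,
    }
```

Usage, with made-up counts: `summarize(posterior_log_odds_ratio((20, 30, 50, 900)))`. A `p_positive` close to 1 expresses strong posterior support for a positive association between the term and differential expression, which is a different kind of statement from a p-value.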

RCAAP - Repositório Científico de Acesso Aberto de Portugal

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Lineage relationship of prostate cancer cell types based on gene expression

Author: Denyer Gareth
Liu Alvin Y
Pascal Laura E
Vessella Robert L
Vêncio Eneida F
Vêncio Ricardo ZN
Ware Carol B
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Prostate tumor heterogeneity is a major factor in disease management. Heterogeneity could be due to multiple cancer cell types with distinct gene expression. Of clinical importance is the so-called cancer stem cell type. Cell type-specific transcriptomes are used to examine the lineage relationship among cancer cell types and their expression similarity to normal cell types, including stem/progenitor cells. Methods: Transcriptomes were determined by Affymetrix DNA array analysis for the following cell types. Putative prostate progenitor cell populations were characterized and isolated by expression of the membrane transporter ABCG2. Stem cells were represented by embryonic stem and embryonal carcinoma cells. The cancer cell types were Gleason pattern 3 (glandular histomorphology) and pattern 4 (aglandular) cells sorted from primary tumors; cultured prostate cancer cell lines originally established from metastatic lesions; and the xenografts LuCaP 35 (adenocarcinoma phenotype) and LuCaP 49 (neuroendocrine/small cell carcinoma) grown in mice. No gene expression differences were detected among serial passages of the LuCaP xenografts. Results: Based on transcriptomes, the different cancer cell types could be clustered into a luminal-like grouping and a non-luminal-like (also not basal-like) grouping. The non-luminal-like types showed expression more similar to that of stem/progenitor cells than the luminal-like types did. However, none showed expression of stem cell genes known to maintain stemness. Conclusions: Non-luminal-like types are all representative of aggressive disease, and this could be attributed to the similarity of their overall gene expression to that of stem and progenitor cell types.
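
The abstract does not state which similarity measure was used. As a hedged illustration only, the sketch below compares one expression profile with reference profiles using Pearson correlation over a shared, ordered gene list and reports the most similar reference. The labels, the choice of Pearson correlation and the helper names are hypothetical, and real analyses would normalize and filter the array data first.

```python
# Illustrative sketch: correlation-based similarity between expression
# profiles. Measure, labels and names are assumptions, not the study's method.
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def most_similar(
    profile: Sequence[float], references: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the reference label with the highest correlation to `profile`."""
    scores = {label: pearson(profile, ref) for label, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```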

RCAAP - Repositório Científico de Acesso Aberto de Portugal

D-Scholarship@Pitt

RCAAP - Repositório Científico de Acesso Aberto de Portugal

A UML profile for the OBO relation ontology

Author: Farias Cléver RG de
Guardia Gabriela DA
Vêncio Ricardo ZN

Background: Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called the OBO Relation Ontology, which aims to standardize the different types of biological entity classes and their associated relationships. Since ontologies are primarily intended to be used by humans, graphical notations for ontology development facilitate the capture, comprehension and communication of knowledge among their users. However, OBO Foundry ontologies are captured and represented mainly using text-based notations. The Unified Modeling Language (UML) provides a standard and widely used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. This work therefore aims to develop a UML profile for the OBO Relation Ontology, providing a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain. Results: We studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and defined a profile that implements the extended metamodel. Finally, we applied the proposed UML profile to the development of a number of fragments from different ontologies, in particular the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO).
Conclusions: Using an established and well-known graphical language in the development of biomedical ontologies provides a more intuitive way of capturing and representing knowledge than using text-based notations alone. Using the profile requires the domain expert to reason about the underlying semantics of the concepts and relationships being modeled, which helps prevent the introduction of inconsistencies into an ontology under development and facilitates the identification and correction of errors in an already defined ontology. Funding: CAPES. International Conference on the Brazilian Association for Bioinformatics and Computational Biology. Florianópolis, Brazil, 12-15 October 201
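
The profile itself is a UML artifact and is not reproduced here. To make concrete the kind of inconsistency the abstract says the profile helps prevent, the Python sketch below stores typed relations between classes and rejects an assertion that would make an `is_a` or `part_of` hierarchy cyclic. The class name, the restriction to two relation types and the acyclicity rule are assumptions chosen for illustration, not the profile's actual constraints.

```python
# Illustrative sketch, not the UML profile: typed relations named after
# OBO Relation Ontology relations, with a hypothetical "no cycles" rule.
from collections import defaultdict
from typing import Dict, Set

# Hypothetical subset of relation types.
ALLOWED_RELATIONS = ("is_a", "part_of")


class OntologyFragment:
    """Toy ontology fragment whose typed relations must stay acyclic."""

    def __init__(self) -> None:
        # edges[relation][subject] -> set of objects
        self.edges: Dict[str, Dict[str, Set[str]]] = {
            rel: defaultdict(set) for rel in ALLOWED_RELATIONS
        }

    def _reaches(self, relation: str, start: str, target: str) -> bool:
        """Depth-first search: is `target` reachable from `start` via `relation`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[relation].get(node, ()))
        return False

    def add(self, subject: str, relation: str, obj: str) -> None:
        """Add `subject relation obj`, rejecting unknown relations and cycles."""
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown relation type: {relation}")
        if self._reaches(relation, obj, subject):
            raise ValueError(f"'{subject} {relation} {obj}' would create a cycle")
        self.edges[relation][subject].add(obj)
```

For example, after adding `A is_a B` and `B is_a C`, an attempt to add `C is_a A` raises `ValueError`, which is the sort of error a modeler would otherwise have to catch by inspection.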

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Evaluation of reference-based two-color methods for measurement of gene expression ratios using spotted cDNA microarrays

Author: Egidio Camila M
Mota-Vieira Luisa
Peixoto Bernardo R
Reis Eduardo M
Verjovski-Almeida Sergio
Vêncio Ricardo ZN
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Spotted cDNA microarrays generally employ co-hybridization of fluorescently labeled RNA targets to produce gene expression ratios for subsequent analysis. Direct comparison of two RNA samples in the same microarray provides the highest level of accuracy; however, because of the number of combinatorial pairwise comparisons, the direct method is impractical for studies that include a large number of individual samples (e.g., tumor classification studies). For such studies, indirect comparisons using a common reference standard have been the preferred method. Here we evaluated the precision and accuracy of reconstructed ratios from three indirect methods relative to ratios obtained from direct hybridizations, here considered the gold standard. RESULTS: We performed hybridizations using a fixed amount of Cy3-labeled reference oligonucleotide (RefOligo) against distinct Cy5-labeled targets from prostate, breast and kidney tumor samples. Reconstructed ratios between all tissue pairs were derived from the ratios between each tissue sample and RefOligo. Reconstructed ratios were compared to (i) ratios obtained in parallel from direct pairwise hybridizations of tissue samples, and (ii) reconstructed ratios derived from hybridization of each tissue against a reference RNA pool (RefPool). To evaluate the effect of the external references, reconstructed ratios were also calculated directly from the intensity values of single-channel (One-Color) measurements derived from tissue sample data collected in the RefOligo experiments. We show that the average coefficients of variation of ratios between intra- and inter-slide replicates derived from RefOligo, RefPool and One-Color were similar, and 2- to 4-fold higher than those of ratios obtained in direct hybridizations. Correlation coefficients calculated for all three tissue comparisons were also similar. In addition, the performance of all indirect methods, in terms of their robustness in identifying genes deemed differentially expressed based on direct hybridizations, as well as their false-positive and false-negative rates, was found to be comparable. CONCLUSION: RefOligo produces ratios as precise and accurate as ratios reconstructed from an RNA pool, thus representing a reliable alternative in reference-based hybridization experiments. In addition, One-Color measurements alone can reconstruct expression ratios without loss of precision or accuracy. We conclude that both methods are adequate options in large-scale projects, where the amount of a common reference RNA pool is usually limited.
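
The reconstruction step rests on simple arithmetic: if each tissue is measured against the same reference, the reference cancels out of the ratio, so log2(A/B) = log2(A/Ref) - log2(B/Ref); for One-Color data the single-channel intensities are used directly. The Python sketch below illustrates that identity and the coefficient of variation used to compare replicates. The function names are hypothetical, and the normalization and background correction that real pipelines apply first are omitted.

```python
# Illustrative sketch of ratio reconstruction and replicate CV.
# Function names are hypothetical; normalization is deliberately omitted.
import math
import statistics
from typing import Sequence


def reconstructed_log2_ratio(a_vs_ref: float, b_vs_ref: float) -> float:
    """log2(A/B) reconstructed from two ratios measured against a common reference.

    (A/Ref) / (B/Ref) = A/B, so the reference cancels out.
    """
    return math.log2(a_vs_ref) - math.log2(b_vs_ref)


def one_color_log2_ratio(intensity_a: float, intensity_b: float) -> float:
    """log2(A/B) computed directly from single-channel intensities."""
    return math.log2(intensity_a) - math.log2(intensity_b)


def coefficient_of_variation(replicate_ratios: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean of replicate ratios."""
    return statistics.stdev(replicate_ratios) / statistics.mean(replicate_ratios)
```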

RCAAP - Repositório Científico de Acesso Aberto de Portugal

ProbCD: enrichment analysis accounting for categorization uncertainty

Author: Ilya Shmulevich
Ricardo ZN Vêncio
Publication date: 01/01/2007

As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high-throughput datasets, current enrichment methods largely ignore this probabilistic information, since they are mainly based on variants of the Fisher exact test. We developed ProbCD, an open-source R package for probabilistic categorical data analysis that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli scheme stochastic process given the categorization probabilities. An online interface was created to allow use by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, for enrichment analysis, ProbCD can accommodate (i) the stochastic nature of high-throughput experimental techniques and (ii) probabilistic gene annotation.
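
A minimal sketch of the expected contingency table, under the assumption that each gene i carries an independent probability p_i of belonging to the category and q_i of being in the selected (e.g. differentially expressed) list: the expected count in each cell of the 2x2 table is then a sum of per-gene Bernoulli probabilities, for example E[n11] = sum over i of p_i * q_i. This is an illustration in Python, not ProbCD's R code; the function name and the independence assumption are hypothetical. The expected cells need not be integers, which is why a test built for fixed integer counts, such as Fisher's exact test, does not apply to them directly.

```python
# Illustrative sketch, not ProbCD's implementation. Assumes independent
# Bernoulli membership and selection per gene; the name is hypothetical.
from typing import List, Sequence


def expected_contingency_table(
    p_category: Sequence[float], p_selected: Sequence[float]
) -> List[List[float]]:
    """Expected 2x2 enrichment table from per-gene probabilities.

    Rows: in category / not in category. Columns: selected / not selected.
    """
    if len(p_category) != len(p_selected):
        raise ValueError("probability lists must have the same length")
    table = [[0.0, 0.0], [0.0, 0.0]]
    for pc, ps in zip(p_category, p_selected):
        if not (0.0 <= pc <= 1.0 and 0.0 <= ps <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        table[0][0] += pc * ps                  # in category and selected
        table[0][1] += pc * (1.0 - ps)          # in category, not selected
        table[1][0] += (1.0 - pc) * ps          # not in category, selected
        table[1][1] += (1.0 - pc) * (1.0 - ps)  # neither
    return table
```

With deterministic annotations (all probabilities 0 or 1) the function reduces to an ordinary count table, so the usual case is a special case of the probabilistic one.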

arXiv.org e-Print Archive

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Nature Precedings

ProbFAST: Probabilistic Functional Analysis System Tool

Author: Greice A Molfetta
Israel T Silva
Ricardo ZN Vêncio
Thiago YK Oliveira
Wilson A Silva
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The post-genomic era has brought new challenges regarding the understanding of the organization and function of the human genome. Many of these challenges center on the meaning of differential gene regulation under distinct biological conditions and can be addressed by analyzing the Multiple Differential Expression (MDE) of genes associated with normal and abnormal biological processes. Currently, MDE analyses are limited to standard differential expression methods originally designed for paired analysis. Results: We propose a web platform named ProbFAST for MDE analysis, which uses Bayesian inference to identify key genes that are intuitively prioritized by means of probabilities. A simulation study showed that our method performs better than other approaches, and when it was applied to public expression data, we demonstrated its flexibility in obtaining relevant genes biologically associated with normal and abnormal biological processes. Conclusions: ProbFAST is a freely accessible web-based application that enables MDE analysis on a global scale. It offers an efficient methodological approach for the MDE analysis of sets of genes that are turned on and off, linked to functional information, during tumor evolution or tissue differentiation. The ProbFAST server can be accessed at http://gdm.fmrp.usp.br/probfast.
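
ProbFAST's actual model is not given in the abstract. As a generic, hedged illustration of prioritizing a gene across more than two conditions by means of probabilities, the Python sketch below assumes count data (e.g. tag counts out of a library total), gives each condition an independent Beta-Binomial posterior with a uniform Beta(1, 1) prior, and uses Monte Carlo sampling to estimate the probability that each condition has the highest expression level. The model, the prior and the function name are assumptions.

```python
# Illustrative sketch, NOT ProbFAST's model: independent Beta posteriors per
# condition (uniform prior assumed) and a Monte Carlo "which is highest" check.
import random
from typing import List, Sequence


def prob_highest_expression(
    counts: Sequence[int],
    library_sizes: Sequence[int],
    n_samples: int = 20000,
    seed: int = 1,
) -> List[float]:
    """Monte Carlo estimate of P(condition k has the largest expression proportion).

    Condition k gets a Beta(counts[k] + 1, library_sizes[k] - counts[k] + 1)
    posterior for the gene's proportion.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have the same length")
    if any(c < 0 or c > n for c, n in zip(counts, library_sizes)):
        raise ValueError("each count must lie between 0 and its library size")
    rng = random.Random(seed)
    wins = [0] * len(counts)
    for _ in range(n_samples):
        draws = [rng.betavariate(c + 1, n - c + 1) for c, n in zip(counts, library_sizes)]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]
```

A probability near 1 for one condition flags the gene as a candidate for that condition, while probabilities spread evenly across conditions indicate no clear multiple-condition pattern.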

RCAAP - Repositório Científico de Acesso Aberto de Portugal

arXiv.org e-Print Archive

Simcluster: clustering enumeration gene expression data on the simplex space

Author: Carlos A de B Pereira
Helena Brentani
Ilya Shmulevich
Leonardo Varuzza
Ricardo ZN Vêncio
Publication date: 01/01/2007

Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data, exhibiting properties particular to the simplex space, where the sum of the components is constrained. These properties are not present in ordinary Euclidean spaces, in which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be uninformative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space.

Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster.

Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
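
One standard building block of Aitchison's compositional framework is the centered log-ratio (clr) transform, under which the Aitchison distance between two compositions is the ordinary Euclidean distance between their clr vectors. The Python sketch below applies it to count profiles; it is not Simcluster's C code, and the pseudocount used to handle zero counts is an assumption (zeros have no logarithm, and the choice of replacement affects results).

```python
# Illustrative sketch of clr / Aitchison distance, not Simcluster's code.
# The zero-handling pseudocount is an assumption.
import math
from typing import List, Sequence


def closure(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Add a pseudocount and rescale so the parts sum to 1 (a point on the simplex)."""
    shifted = [c + pseudocount for c in counts]
    if any(x <= 0 for x in shifted):
        raise ValueError("all parts must be positive after adding the pseudocount")
    total = sum(shifted)
    return [x / total for x in shifted]


def clr(composition: Sequence[float]) -> List[float]:
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(
    a: Sequence[float], b: Sequence[float], pseudocount: float = 0.5
) -> float:
    """Euclidean distance between the clr vectors of two count profiles."""
    ca = clr(closure(a, pseudocount))
    cb = clr(closure(b, pseudocount))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))
```

Because profiles are closed to proportions first, a library sequenced twice as deeply with the same relative composition lies at distance 0 (with `pseudocount=0.0` and no zero counts), which a plain Euclidean distance on raw counts would not give. Such a distance could be passed to a standard clustering routine; that pairing is again an assumption, not a description of Simcluster's algorithm.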

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Nature Precedings

Gene expression relationship between prostate cancer cells of Gleason 3, 4 and normal epithelial cells as revealed by cell type-specific transcriptomes

Author: Alvin Y Liu
Bruz Marzolf
Christina P Shadle
Emily S Liebeskind
Laura E Pascal
Laura S Page
Lawrence D True
Leroy E Hood
Pamela Troisch
Ricardo ZN Vêncio
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Prostate cancer cells in primary tumors have been typed CD10(-)/CD13(-)/CD24(hi)/CD26(+)/CD38(lo)/CD44(-)/CD104(-). This CD phenotype suggests a lineage relationship between cancer cells and luminal cells. The Gleason grade of a tumor describes its degree of glandular differentiation. Higher Gleason scores are associated with treatment failure. Methods: CD26(+) cancer cells were isolated from Gleason 3+3 (G3) and Gleason 4+4 (G4) tumors by cell sorting, and their gene expression, or transcriptome, was determined by Affymetrix DNA array analysis. Dataset analysis was used to determine gene expression similarities and differences between G3 and G4, as well as between these and prostate cancer cell lines and histologically normal prostate luminal cells. Results: The G3 and G4 transcriptomes were compared to those of non-cancerous prostatic cell types, which included luminal, basal, stromal fibromuscular, and endothelial cells. A principal components analysis of the various transcriptome datasets indicated a closer relationship between luminal and G3 than between luminal and G4. Dataset comparison also showed that the cancer transcriptomes differed substantially from those of prostate cancer cell lines. Conclusions: Genes differentially expressed in cancer are potential biomarkers for cancer detection, and those differentially expressed between G3 and G4 are potential biomarkers for disease stratification, given that G4 cancer is associated with poor outcomes. Differentially expressed genes likely contribute to the prostate cancer phenotype and constitute the signatures of these particular cancer cell types. Funding: National Institutes of Health (NIH) CA111244, CA98699, CA85859, DK63630; P50-GMO-76547
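
A principal components analysis projects each transcriptome onto the directions of greatest variance across samples. The Python sketch below computes scores on the first component by power iteration on the sample covariance matrix; it is a standard-library illustration, not the study's pipeline, and real array data with thousands of genes would normally be handled with an SVD-based routine (for example R's `prcomp`) after normalization. The function name and iteration count are hypothetical.

```python
# Illustrative sketch: first principal component scores via power iteration.
# Not the study's pipeline; the name and defaults are hypothetical.
import math
import random
from typing import List, Sequence


def first_pc_scores(
    samples: Sequence[Sequence[float]], iterations: int = 200, seed: int = 0
) -> List[float]:
    """Scores of each sample (row) on the first principal component.

    Rows are samples (e.g. cell-type transcriptomes), columns are genes.
    The sign of a principal component is arbitrary.
    """
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    # Sample covariance matrix between genes (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # Repeated multiplication by cov converges to the leading eigenvector.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data have no variance")
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in centered]
```

With toy rows standing in for luminal, G3 and G4 profiles, a smaller gap between the luminal and G3 scores than between the luminal and G4 scores is the kind of pattern the abstract describes; toy numbers carry no biological meaning.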

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Gene expression down-regulation in CD90+ prostate tumor-associated stromal cells involves potential organ-specific genes

Background: The prostate stroma is a key mediator of epithelial differentiation and development, and may play a role in the initiation and progression of prostate cancer. The tumor-associated stroma is marked by increased expression of CD90/THY1. Isolation and characterization of these stromal cells could provide valuable insight into the biology of the tumor microenvironment. Methods: Prostate CD90+ stromal fibromuscular cells from tumor specimens were isolated by cell sorting and analyzed by DNA microarray. Dataset analysis was used to compare gene expression between histologically normal and tumor-associated stromal cells. For comparison, stromal cells were also isolated from the urinary bladder and analyzed. Results: The tumor-associated stromal cells were found to have decreased expression of genes involved in smooth muscle differentiation, and of genes detected in prostate but not bladder stroma. Differential expression between the stromal cell types also included the CXC-chemokine genes. Conclusion: CD90+ prostate tumor-associated stromal cells differed from their normal counterparts in the expression of multiple genes, some of which are potentially involved in organ development.
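
As a hedged sketch of the kind of comparison described in the Results, the Python below flags genes whose log2 fold change (tumor-associated versus normal stroma) falls at or below a threshold and then keeps those detected in prostate but not bladder stroma. The two-fold threshold, the detection sets, the gene identifiers and the function names are all hypothetical; the study's actual criteria are not stated in the abstract.

```python
# Illustrative sketch; threshold, detection sets and names are hypothetical.
import math
from typing import Dict, List, Set


def down_regulated(
    tumor: Dict[str, float],
    normal: Dict[str, float],
    min_log2_drop: float = 1.0,
) -> List[str]:
    """Genes whose log2(tumor / normal) is at most -min_log2_drop (1.0 = two-fold)."""
    hits = []
    for gene in sorted(tumor.keys() & normal.keys()):
        if math.log2(tumor[gene] / normal[gene]) <= -min_log2_drop:
            hits.append(gene)
    return hits


def prostate_specific(
    down: List[str], detected_prostate: Set[str], detected_bladder: Set[str]
) -> List[str]:
    """Subset of `down` detected in prostate stroma but not in bladder stroma."""
    return [g for g in down if g in detected_prostate and g not in detected_bladder]
```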

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

RCAAP - Repositório Científico de Acesso Aberto de Portugal