99 research outputs found

Fold change and p-value cutoffs significantly alter microarray interpretations

Author: A Fujita
Anthony Deeter
AT Askari
DA Iacobas
DB Allison
DJ McCarthy
DM Witten
E Jacob
Gayathri Nimishakavi
GD Ruxton
IB Jeffery
IJ Marques
JP Scarth
JS Isaacs
L van der Weyden
MAQC Consortium
Mark R Dalman
N Mah
R Nadon
RA Miller
SCP Renn
SN Kahn
W Enard
WJ Lin
Zhong-Hui Duan
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: As context is important to gene expression, so is the preprocessing of microarray data to transcriptomics. Microarray data suffer from several normalization and significance problems. Arbitrary fold change (FC) cut-offs of >2 and significance p-values of <0.02 lead data collection to look only at genes which vary wildly amongst other genes. Therefore, questions arise as to whether the biology or the statistical cut-off is more important within the interpretation. In this paper, we reanalyzed a zebrafish (D. rerio) microarray data set using GeneSpring and different differential gene expression cut-offs and found that the data interpretation was drastically different. Furthermore, despite the advances in microarray technology, the array captures a large portion of known genes yet still leaves large voids in the number of genes assayed, such as leptin, a pleiotropic hormone directly related to hypoxia-induced angiogenesis. Results: The data strongly suggest that more differentially expressed genes are up-regulated than down-regulated, with many genes indicating conserved signalling to previously known functions. Recapitulated data from Marques et al. (2008) were similar but surprisingly different, with some genes showing unexpected signalling, which may be a product of the tissue (heart) or of a transient intended response. Conclusions: Our analyses suggest that, depending on the chosen statistical or fold change cut-off, microarray analysis can provide essentially more than one answer, implying that data interpretation is more of an art than a science, with follow-up gene expression studies a must. Furthermore, gene chip annotation and development need to keep pace not only with new genomes being sequenced but also with novel genes that are crucial to the overall gene chip's interpretation.
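
The effect of cut-off choice described in this abstract can be illustrated with a minimal Python sketch. The gene names, fold changes, and p-values below are invented for illustration and are not taken from the paper; `select_genes` is a hypothetical helper, and the relaxed cut-offs (FC > 1.5, p < 0.05) are an assumed alternative to the FC > 2, p < 0.02 pair mentioned above.

```python
import math

# Hypothetical per-gene results: (gene, log2 fold change, p-value).
# These numbers are made up for illustration only.
RESULTS = [
    ("geneA", 2.4, 0.001),
    ("geneB", 1.2, 0.015),
    ("geneC", -1.6, 0.030),
    ("geneD", 0.7, 0.004),
    ("geneE", -3.1, 0.0005),
    ("geneF", 0.3, 0.045),
]


def select_genes(results, min_fold_change, max_p_value):
    """Return genes whose absolute fold change and p-value pass both cut-offs."""
    # A fold change of 2 corresponds to |log2 FC| = 1, and so on.
    min_abs_log2fc = math.log2(min_fold_change)
    return [
        gene
        for gene, log2fc, p_value in results
        if abs(log2fc) > min_abs_log2fc and p_value < max_p_value
    ]


# Strict cut-offs as quoted in the abstract.
strict = select_genes(RESULTS, min_fold_change=2.0, max_p_value=0.02)
# Assumed, more permissive cut-offs for comparison.
relaxed = select_genes(RESULTS, min_fold_change=1.5, max_p_value=0.05)

print("strict :", strict)
print("relaxed:", relaxed)
```

With these toy values the relaxed cut-offs admit two genes that the strict pair excludes, which is the kind of list instability the abstract warns about.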

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A gene signature for post-infectious chronic fatigue syndrome

Author: Abhijit Chaudhuri
AK Smith
B Cameron
B Evengård
Celia Cannon
D Buchwald
D Maglott
DR Shafren
G Broderick
G Kennedy
H Fang
H Gräns
IB Jeffery
J Gu
J Smith
J Vecchiet
John W Gow
JR Kerr
JW Gow
K Fukuda
KA Kurian
L Carmel
L Jason
LA Karaivanova
LJ Morrison
LM Cope
M Maes
M Steinau
N Kaushik
Pawel Herzyk
Peter O Behan
PJ McLaren
R Breitling
R Edgar
RA Irizarry
RB Moss
RS Richards
RW Finberg
S Hafenstein
S Mikami
S Wessely
SD Vernon
Suzanne Hagan
T Pham
T Saiki
T Watanabe
T Whistler
Y Manuel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Background: At present, there are no clinically reliable disease markers for chronic fatigue syndrome (CFS). DNA chip microarray technology provides a method for examining the differential expression of mRNA from a large number of genes. Our hypothesis was that a gene expression signature, generated by microarray assays, could help identify genes which are dysregulated in patients with post-infectious CFS and so help identify biomarkers for the condition. Methods: Human genome-wide Affymetrix GeneChip arrays (39,000 transcripts derived from 33,000 gene sequences) were used to compare the levels of gene expression in the peripheral blood mononuclear cells of male patients with post-infectious chronic fatigue (n = 8) and male healthy control subjects (n = 7). Results: Patients and healthy subjects differed significantly in the level of expression of 366 genes. Analysis of the differentially expressed genes indicated functional implications in immune modulation, oxidative stress and apoptosis. Prototype biomarkers were identified on the basis of differential levels of gene expression and possible biological significance. Conclusion: Differential expression of key genes identified in this study offers an insight into the possible mechanism of chronic fatigue following infection. The representative biomarkers identified in this research appear promising as potential biomarkers for diagnosis and treatment.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Enlighten

ResearchOnline@GCU

Evaluation of Methods for Gene Selection in Melanoma Cell Lines

Author: Chaba Linda
Odhiambo John
Omolo Bernard
Publication venue: Lifescience Global
Publication date: 27/02/2016

A major objective in microarray experiments is to identify a panel of genes that are associated with a disease outcome or trait. Many statistical methods have been proposed for gene selection within the last fifteen years. While some of these methods have been compared, most comparisons concentrated on finding gene signatures based on two groups. This study evaluates four gene selection methods when the outcome of interest is continuous in nature. We provide a comparative review of four methods: the Statistical Analysis of Microarrays (SAM), the Linear Models for Microarray Analysis (LIMMA), the Lassoed Principal Components (LPC), and the Quantitative Trait Analysis (QTA). Comparison is based on the power to identify differentially expressed genes, the predictive ability of the genelists for a continuous outcome (G2 checkpoint function), and the prognostic properties of the genelists for distant metastasis-free survival. A simulated dataset and a publicly available melanoma cell lines dataset are used for simulations and validation, respectively. A primary melanoma dataset is used for assessment of prognosis. No common genes were found among the genelists from the four methods. While the SAM was generally the best in terms of power, the QTA genelist performed the best in the prediction of the G2 checkpoint function. Identification of genelists depends on the choice of the gene selection method. The QTA method would be preferred over the other approaches in predicting a quantitative outcome in melanoma research. We recommend the development of more robust statistical methods for differential gene expression analysis.

Publication Management System

Testing significance relative to a fold-change threshold is a TREAT

Author: Baldi
D. J. McCarthy
DeRisi
G. K. Smyth
Huggins
Jeffery
Kooperberg
Patterson
Raouf
Ritchie
Schena
Wright
Xie
Publication venue: Oxford University Press
Publication date: 15/03/2009

Motivation: Statistical methods are used to test for the differential expression of genes in microarray experiments. The most widely used methods successfully test whether the true differential expression is different from zero, but give no assurance that the differences found are large enough to be biologically meaningful.
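
The contrast drawn in this abstract, between testing against zero and testing against a fold-change threshold, can be written out as a short worked formula. The notation is an assumption introduced here and does not appear in the listing: \beta_g is the true log-fold change of gene g, \hat\beta_g its estimate with standard error s_g, \tau \ge 0 a chosen log-fold-change threshold, and T_d a t-distributed variable with d degrees of freedom. The p-value shown is one natural construction, evaluated at the boundary |\beta_g| = \tau; it is a sketch and not necessarily the exact statistic used by the authors.

```latex
\begin{align*}
\text{Conventional test:}\quad
  & H_0:\ \beta_g = 0
  && \text{vs.}\quad H_1:\ \beta_g \neq 0 \\
\text{Threshold test:}\quad
  & H_0:\ |\beta_g| \le \tau
  && \text{vs.}\quad H_1:\ |\beta_g| > \tau \\[4pt]
\text{Boundary p-value:}\quad
  & p_g = P\!\left(T_d > \frac{|\hat\beta_g| - \tau}{s_g}\right)
        + P\!\left(T_d > \frac{|\hat\beta_g| + \tau}{s_g}\right)
\end{align*}
```

Setting \tau = 0 recovers the usual two-sided p-value, 2\,P(T_d > |\hat\beta_g| / s_g), so the threshold test can be read as a generalisation of the conventional one.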

Crossref

PubMed Central

University of Melbourne Institutional Repository

Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies

Author: Aurelien de Reynies
B Wu
C Kooperberg
C Murie
C Yauk
Caroline Paccard
D Allison
D Chessel
D Rickman
F Jaffrezic
G Marot
G Smyth
G Wright
Gregory Nuel
I Jeffery
J Soulier
JD Storey
Kerby Shedden
L Lamant
L Van 't Veer
L Zhou
Laetitia Marisa
M Kerr
M McCall
M Pirooznia
M Sullivan Pepe
Marine Jeanmougin
Mickael Guedj
N Jain
P Bertheau
P Delmar
R Simon
S Boyault
S Dudoit
S Zhang
T Mary-Huard
T Sorlie
V Tusher
X Huang
Y Benjamini
Publication venue: Public Library of Science
Publication date: 01/01/2010

High-throughput post-genomic studies are now routinely, and promisingly, conducted in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which is the subject of criticism in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context, and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To help answer this question, we compare eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance, in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. In addition, two tests (limma and VarMixt) show significant improvement compared to the t-test, in particular when dealing with small sample sizes. Moreover, limma presents several practical advantages, so we advocate its application to the analysis of gene expression data.

CiteSeerX

Public Library of Science (PLOS)

HAL Evry

Crossref

Directory of Open Access Journals

PubMed Central

HAL Descartes

ProdInra

Density based pruning for identification of differentially expressed genes from microarray data

Author: Hu Jianjun
Xu Jia
Publication venue: BioMed Central
Publication date: 01/01/2010

Motivation: Identification of differentially expressed genes from microarray datasets is one of the most important analyses in microarray data mining. Popular algorithms such as the statistical t-test rank genes based on a single statistic. The false positive rate of these methods can be improved by considering other features of differentially expressed genes. Results: We propose a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two-dimensional feature space composed of the average difference in gene expression and the average expression level. A density based pruning algorithm (DB Pruning) is developed to single out potential differentially expressed genes, which are usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from the Gene Expression Omnibus (GEO) database with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as the t-test, rank product, and fold change. Conclusions: Density based pruning of non-differentially expressed genes is an effective method for enhancing statistical-testing-based algorithms for identifying differentially expressed genes. It improves the t-test, rank product, and fold change by 11% to 50% in the number of identified true differentially expressed genes. The source code of DB pruning is freely available on our website: http://mleg.cse.sc.edu/degprune
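
The idea of pruning genes by local density in a two-dimensional feature space can be sketched as follows. This is not the authors' DB Pruning algorithm, whose details are not given in the abstract; it is a minimal neighbour-counting stand-in. The function `density_prune`, the `radius` and `max_neighbors` parameters, and the toy coordinates are all assumptions made for illustration, and the two features are assumed to be on comparable scales.

```python
import math


def density_prune(points, radius, max_neighbors):
    """Keep genes that lie in sparse regions of a 2-D feature space.

    points: dict mapping gene -> (average expression difference,
            average expression level).
    A gene is kept as a candidate when at most `max_neighbors` other genes
    fall within `radius` of it; genes in dense regions are pruned.
    """
    kept = []
    for gene, (diff, level) in points.items():
        neighbors = sum(
            1
            for other, (o_diff, o_level) in points.items()
            if other != gene
            and math.hypot(diff - o_diff, level - o_level) <= radius
        )
        if neighbors <= max_neighbors:
            kept.append(gene)
    return kept


# Toy data: four genes clustered near zero difference, two outliers.
POINTS = {
    "g1": (0.0, 8.0),
    "g2": (0.1, 8.1),
    "g3": (-0.1, 7.9),
    "g4": (0.05, 8.05),
    "g5": (2.5, 9.0),
    "g6": (-2.2, 6.5),
}

candidates = density_prune(POINTS, radius=0.5, max_neighbors=1)
print(candidates)
```

In this toy setting the four clustered genes are pruned and the two isolated genes remain as candidates, which a downstream test such as a t-test or rank product could then rank.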

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Scholar Commons - Institutional Repository of the University of South Carolina

Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts

Author: A Subramanian
AP Oron
B Zhang
B Zheng
CA Joslyn
CMaS Manning
D Martin
D Nam
Ebenezer O. George
F Al-Shahrour
G Yona
HH van Haagen
IB Jeffery
JD Storey
JD Wren
Kevin Heinrich
L Wei
Lijing Xu
M Ashburner
M Chagoyen
M Schuemie
Michael W. Berry
MS Pepe
MW Berry
Nicholas Furlotte
P Minguez
R Homayouni
R Jelier
Ramin Homayouni
Ramy K. Aziz
S Chiaretti
S Raychaudhuri
SG Lee
TK Landauer
TM Kim
VK Mootha
W Pan
Y Pawitan
Yunyue Lin
Z Jiang
Publication venue: Public Library of Science
Publication date: 01/01/2011

High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in the GO biological process category and 90% of the gene sets in the GO molecular function and cellular component categories were functionally cohesive (LPv < 0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature.

University of Memphis Digital Commons

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Intra- and Inter-Individual Variance of Gene Expression in Clinical Studies

Author: Chang Cheng-Wei
Chen Chaang-Ray
Cheng Hung-Tsu
Cheng Wei-Chung
Hsu Ian C.
Li Chia-Yang
Shu Wun-Yi
Tsai Min-Lung
Wang Tzu-Hao
Publication venue: Public Library of Science
Publication date: 18/06/2012

BACKGROUND: Variance in microarray studies has been widely discussed as a critical topic in the identification of differentially expressed genes; however, few studies have addressed the influence of how variance is estimated. METHODOLOGY/PRINCIPAL FINDINGS: To break intra- and inter-individual variance in clinical studies down into three levels (technical, anatomic, and individual), we designed experiments and algorithms to investigate these three forms of variance. As a case study, a group of "inter-individual variable genes" was identified to exemplify the influence of underestimated variance on the statistical and biological aspects of identifying differentially expressed genes. Our results showed that inadequate estimation of variance inevitably led to the inclusion of statistically non-significant genes among those listed as significant, thereby interfering with the correct prediction of biological functions. Applying a higher fold-change cutoff in the selection of significant genes reduces or eliminates the effects of underestimated variance. CONCLUSIONS/SIGNIFICANCE: Our data demonstrated that correct variance evaluation is critical in selecting significant genes. If the degree of variance is underestimated, "noisy" genes are falsely identified as differentially expressed genes. These genes add noise to the biological interpretation, reducing the biological significance of the gene set. Our results also indicate that applying a higher fold-change cutoff as the selection criterion reduces or eliminates the differences between distinct estimations of variance.

Public Library of Science (PLOS)

Directory of Open Access Journals

National Health Research Institutes

PubMed Central

FigShare