246 research outputs found

Probe set algorithms: is there a rational best bet?

Author: B Harr
BM Bolstad
BP Durbin
C Li
DM Rocke
Eric P Hoffman
FF Millenaar
J Freudenberg
J Seo
Jinwook Seo
JN McClintick
M Bakay
M Inoue
P Zhao
RA Irizarry
S Huang
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2006

Affymetrix microarrays have become a standard experimental platform for studies of mRNA expression profiling. Their success is due, in part, to the multiple oligonucleotide features (probes) against each transcript (probe set). This multiple testing allows for more robust background assessments and gene expression measures, and has permitted the development of many computational methods to translate image data into a single normalized "signal" for mRNA transcript abundance. Many probe set algorithms have now been developed, with a gradual movement away from chip-by-chip methods (MAS5) toward project-based model-fitting methods (dCHIP, RMA, others). Data interpretation is often profoundly changed by the choice of algorithm, leaving disoriented biologists to question what the "accurate" interpretation of their experiment is. Here, we summarize the debate concerning probe set algorithms. We provide examples of how changes in mismatch weight, normalizations, and construction of expression ratios each dramatically change data interpretation. All interpretations can be considered computationally appropriate, but with varying biological credibility. We also illustrate the performance of two new hybrid algorithms (PLIER, GC-RMA) relative to more traditional algorithms (dCHIP, MAS5, Probe Profiler PCA, RMA) using an interactive power analysis tool. PLIER appears superior to other algorithms in avoiding false positives with poorly performing probe sets. Based on our interpretation of the literature, and the examples presented here, we suggest that the variability in performance of probe set algorithms depends more on assumptions regarding "background" than on calculations of "signal". We argue that "background" is an enormously complex variable that can only be vaguely quantified, and thus the "best" probe set algorithm will vary from project to project.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

George Washington University: Health Sciences Research Commons (HSRC)

A comprehensive re-analysis of the Golden Spike data: Towards a benchmark for differential expression methods

Author: A Hess
AA Fodor
AR Dabney
C Li
DB Allison
DP Gaile
E Hubbell
E Schuster
F Leisch
G Smyth
L Shi
LM Cope
P Baldi
RA Irizarry
RC Gentleman
Richard D Pearson
S Hochreiter
S Lemieux
SE Choe
T Sing
VG Tusher
X Liu
Z Chen
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background: The Golden Spike data set has been used to validate a number of methods for summarizing Affymetrix data sets, sometimes with seemingly contradictory results. Much less use has been made of this data set to evaluate differential expression methods. It has been suggested that this data set should not be used for method comparison due to a number of inherent flaws. Results: We have used this data set in a comparison of methods which is far more extensive than any previous study. We outline six stages in the analysis pipeline where decisions need to be made, and show how the results of these decisions can lead to the apparently contradictory results previously found. We also show that, while flawed, this data set is still a useful tool for method comparison, particularly for identifying combinations of summarization and differential expression methods that are unlikely to perform well on real data sets. We describe a new benchmark, AffyDEComp, that can be used for such a comparison. Conclusion: We conclude with recommendations for preferred Affymetrix analysis tools, and for the development of future spike-in data sets.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A robust method for estimating gene expression states using Affymetrix microarray probe level data

Author: Affymetrix
AI Su
BZ Wu
C Calza
C Li
C Wu
CV Jongeneel
E Hiyama
E Hubbell
Eiso Hiyama
FF Millenaar
JN McClintik
K Hiyama
K Otani
Keiko Hiyama
Keiko Otani
Kenichi Satoh
M Alaminos
M Kaneko
M Komatsu
M Lastowska
M Ohtaki
MAQC Consortium
Megu Ohtaki
MJ Zillox
MN McCall
Naomi Kamei
NK Sah
RA Irizarry
RC Segar
S Fumoto
T Martinez
VG Tusher
XX Tang
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background: Microarray technology is a high-throughput method for measuring the expression levels of thousands of genes simultaneously. The observed intensities include a non-specific binding component, which is a major disadvantage of microarray data. The Affymetrix GeneChip assigns a mismatch (MM) probe with the intention of measuring non-specific binding, but opinions differ on the usefulness of MM measures. It should be noted that not all observed intensities are associated with expressed genes; many are associated with unexpressed genes, whose measured values reflect mere noise due to non-specific binding, cross-hybridization, or stray signals. The implicit assumption that all genes are expressed leads to poor performance of microarray data analyses. We assume two functional states of a gene, expressed or unexpressed, and propose a robust method to estimate gene expression states using an order relationship between PM and MM measures. Results: An indicator, 'probability of a gene being expressed', was obtained using the number of probe pairs within a probe set where the PM measure exceeds the MM measure. We examined the validity of the proposed indicator using Human Genome U95 data sets provided by Affymetrix. The usefulness of 'probability of a gene being expressed' is illustrated through an exploration of candidate genes involved in neuroblastoma prognosis. We identified the candidate genes for which expression states differed (unexpressed or expressed) when compared between two outcomes. The validity of this result was subsequently confirmed by quantitative RT-PCR. Conclusion: The proposed qualitative evaluation, 'probability of a gene being expressed', is a useful indicator for improving microarray data analysis. It is useful for reducing the number of false discoveries. Expression states, expressed or unexpressed, correspond to the most fundamental gene function, 'On' and 'Off', which can lead to biologically meaningful results.
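
The counting step described in the Results above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract only says the indicator is derived from the number of probe pairs with PM > MM, so the function name `expressed_indicator`, the plain fraction, and the one-sided sign-test tail (null hypothesis: PM exceeds MM with probability 0.5) are all assumptions made here for illustration.

```python
from math import comb


def expressed_indicator(pm, mm):
    """Summarise one probe set by how often PM exceeds MM.

    Hypothetical helper: returns (k, n, fraction, p_value), where k is the
    number of probe pairs with PM > MM out of n pairs.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    n = len(pm)
    k = sum(1 for p, m in zip(pm, mm) if p > m)
    fraction = k / n
    # Assumed null model: each pair has PM > MM with probability 0.5.
    # One-sided tail P(X >= k) for X ~ Binomial(n, 0.5).
    p_value = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return k, n, fraction, p_value


# Toy intensities for a four-pair probe set (made-up numbers).
k, n, fraction, p_value = expressed_indicator([10, 20, 30, 40], [5, 25, 10, 10])
print(k, n, fraction, p_value)
```

A small tail probability would suggest that PM exceeds MM more often than chance, which is one plausible way to read "expressed"; the paper's actual mapping from the count to a probability may differ.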

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hiroshima University Institutional Repository

A Web-based and Grid-enabled dChip version for the analysis of large sets of gene expression data

Author: C Li
F Beltrame
FF Millenaar
I Porro
Ivan Porro
Livia Torterolo
Luca Corradi
LX Qin
Marco Fato
RA Irizarry
RC Gentleman
S Tuecke
Silvia Scaglione
U Pfeffer
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background: Microarray techniques are one of the main methods used to investigate thousands of gene expression profiles in order to shed light on the complex biological processes responsible for serious diseases, with a great scientific impact and a wide application area. Several standalone applications have been developed to analyze microarray data. Two of the best-known free analysis software packages are the R-based Bioconductor and dChip. The part of the dChip software concerning the calculation and analysis of gene expression has been modified to permit its execution on both cluster environments (supercomputers) and Grid infrastructures (distributed computing). This work is not aimed at replacing existing tools; rather, it provides researchers with a method to analyze large datasets without hardware or software constraints. Results: An application able to perform the computation and analysis of gene expression on large datasets has been developed using algorithms provided by dChip. Different tests have been carried out in order to validate the results and to compare the performance obtained on different infrastructures. Validation tests have been performed using a small dataset related to the comparison of HUVEC (Human Umbilical Vein Endothelial Cells) and fibroblasts, derived from the same donors, treated with IFN-α. Performance tests have also been executed to compare performance in different environments, using a large dataset of about 1000 samples from breast cancer patients. Conclusion: A Grid-enabled software application for the analysis of large microarray datasets has been proposed. The dChip software has been ported to the Linux platform and modified, using appropriate parallelization strategies, to permit its execution on both cluster environments and Grid infrastructures. The added value provided by the use of Grid technologies is the possibility of exploiting both computational and data Grid infrastructures to analyze large sets of distributed data. The software has been validated, and its performance on cluster and Grid environments has been compared, with quite good scalability results.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Genova

"Hook"-calibration of GeneChip-microarrays: Theory and algorithm

Author: A Halperin
Affymetrix
BP Durbin
C Li
CJ Burden
D Hekstra
DC Hoyle
E Carlon
F Naef
GA Held
H Binder
Hans Binder
L Zhang
M Havilio
N Sugimoto
RA Irizarry
Stephan Preibisch
T Heim
T Lu
W Huber
WH Press
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background: The improvement of microarray calibration methods is an essential prerequisite for quantitative expression analysis. This issue requires the formulation of an appropriate model describing the basic relationship between the probe intensity and the specific transcript concentration in a complex environment of competing interactions; the estimation of the magnitude of these effects and their correction using the intensity information of a given chip; and, finally, the development of practicable algorithms which judge the quality of a particular hybridization and estimate the expression degree from the intensity values. Results: We present the so-called hook-calibration method, which co-processes the log-difference (delta) and log-sum (sigma) of the perfect match (PM) and mismatch (MM) probe intensities. The MM probes are utilized as an internal reference which is subjected to the same hybridization law as the PM, but with modified characteristics. After sequence-specific affinity correction, the method fits the Langmuir adsorption model to the smoothed delta-versus-sigma plot. The geometrical dimensions of this so-called hook curve characterize the particular hybridization in terms of simple geometric parameters which provide information about the mean non-specific background intensity, the saturation value, the mean PM/MM sensitivity gain and the fraction of absent probes. This graphical summary spans a metrics system for expression estimates in natural units such as the mean binding constants and the occupancy of the probe spots. The method is single-chip based, i.e. it separately uses the intensities for each selected chip. Conclusion: The hook method corrects the raw intensities for the non-specific background hybridization in a sequence-specific manner, for the potential saturation of the probe spots with bound transcripts, and for the sequence-specific binding of specific transcripts. The obtained chip characteristics, in combination with the sensitivity-corrected probe-intensity values, provide expression estimates scaled in natural units which are given by the binding constants of the particular hybridization.
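
The delta and sigma coordinates named in the Results above can be illustrated with a short sketch. It covers only the coordinate transform, not the affinity correction, smoothing, or Langmuir fit. The function name `hook_coordinates`, the base-10 logarithm, and the factor of one half in sigma (so that sigma is the mean log intensity of the pair) are assumptions made here; the abstract states only that a log-difference and a log-sum are used.

```python
import math


def hook_coordinates(pm, mm):
    """Return (delta, sigma) lists for paired PM/MM intensities.

    Assumed definitions (not confirmed by the abstract):
      delta = log10(PM) - log10(MM)
      sigma = 0.5 * (log10(PM) + log10(MM))
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    delta, sigma = [], []
    for p, m in zip(pm, mm):
        if p <= 0 or m <= 0:
            raise ValueError("intensities must be positive to take logarithms")
        log_p, log_m = math.log10(p), math.log10(m)
        delta.append(log_p - log_m)
        sigma.append(0.5 * (log_p + log_m))
    return delta, sigma


# Toy intensities (made-up numbers) for three probe pairs.
delta, sigma = hook_coordinates([1000.0, 200.0, 50.0], [100.0, 150.0, 60.0])
for d, s in zip(delta, sigma):
    print(f"sigma={s:.3f}  delta={d:.3f}")
```

Plotting delta against sigma for all probe pairs on one chip would give the raw scatter from which the smoothed hook curve is derived.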

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MDC Repository

Determining gene expression on a single pair of microarrays

Author: A Provenzani
AA Fodor
AHI Hess
AM Hein
Anthony A Fodor
B Bolstad
BM Bolstad
C Cheng
C De Mees
CWWI Li
DB Allison
E Turro
L Klebanov
M Milo
MM Ryan
P Baldi
P Sommer
RA Irizarry
Robert W Reid
S Holm
S Pounds
WJ Lemon
X Liu
YHY Benjamini
YYD Benjamini
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background: In microarray experiments the number of replicates is often limited by factors such as cost, availability of sample, or poor hybridization. There are currently few choices for the analysis of a pair of microarrays where N = 1 in each condition. In this paper, we demonstrate the effectiveness of a new algorithm called PINC (PINC is Not Cyber-T) that can analyze Affymetrix microarray experiments. Results: PINC treats each pair of probes within a probeset as an independent measure of gene expression using the Bayesian framework of the Cyber-T algorithm and then assigns a corrected p-value for each gene comparison. The p-values generated by PINC accurately control the false discovery rate on Affymetrix control data sets, but are small enough that family-wise error rate procedures (such as Holm's step-down method) can be used as a conservative alternative to false discovery rate control with little loss of sensitivity on control data sets. Conclusion: PINC outperforms previously published methods for determining differentially expressed genes when comparing Affymetrix microarrays with N = 1 in each condition. When applied to biological samples, PINC can be used to assess the degree of variability observed among biological replicates in addition to analyzing isolated pairs of microarrays.
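
Holm's step-down method, mentioned above as a family-wise alternative, is a standard procedure and can be sketched in a few lines. This is a generic textbook version and is unrelated to the PINC code itself; the function name `holm_reject` and the toy p-values are illustrative only.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans (same order as the input) marking which
    hypotheses are rejected while controlling the family-wise error rate.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as hypotheses are rejected: alpha/m, alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first failure
    return reject


print(holm_reject([0.01, 0.04, 0.03]))
print(holm_reject([0.001, 0.01, 0.02]))
```

In the first call only the smallest p-value passes its threshold (0.05/3), so the procedure stops there; in the second call every p-value passes its successive threshold.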

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

BGX: a Bioconductor package for the Bayesian integrated analysis of Affymetrix GeneChips

Author: AM Law
AMK Hein
Anne-Mette K Hein
Ernest Turro
F Naef
GO Roberts
LM Cope
M McGee
Natalia Bochkina
RA Irizarry
SE Choe
Sylvia Richardson
T Anderson
YH Yang
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2007

Crossref

Springer - Publisher Connector

PubMed Central

Ranking differentially expressed genes from Affymetrix gene expression data: methods with reproducibility, sensitivity, and specificity

Author: C Li
E Hubbel
E Hubbell
F Hong
GK Smyth
H Binder
H Ge
J Xu
K Kadota
Kentaro Shimizu
Koji Kadota
L Gautier
L Shi
MA Sartor
MAQC Consortium
P Baldi
R Breitling
R Foundation for Statistical Computing
R Opgen-Rhein
RA Irizarry
RC Gentleman
RD Pearson
S Hochreiter
S Lemieux
SE Choe
VG Tusher
W Huber
X Liu
Y Nakai
Yuji Nakai
Z Chen
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract Background: To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix GeneChip system need to select both a preprocessing algorithm to obtain expression-level measurements and a way of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility. Results: We compared eight conventional methods for ranking genes: weighted average difference (WAD), average difference (AD), fold change (FC), rank products (RP), moderated t statistic (modT), significance analysis of microarrays (samT), shrinkage t statistic (shrinkT), and intensity-based moderated t statistic (ibmT) with six preprocessing algorithms (PLIER, VSN, FARMS, multi-mgMOS (mmgMOS), MBEI, and GCRMA). A total of 36 real experimental datasets was evaluated on the basis of the area under the receiver operating characteristic curve (AUC) as a measure for both sensitivity and specificity. We found that the RP method performed well for VSN-, FARMS-, MBEI-, and GCRMA-preprocessed data, and the WAD method performed well for mmgMOS-preprocessed data. Our analysis of the MicroArray Quality Control (MAQC) project's datasets showed that the FC-based gene ranking methods (WAD, AD, FC, and RP) had a higher level of reproducibility: the percentages of overlapping genes (POGs) across different sites for the FC-based methods were higher overall than those for the t-statistic-based methods (modT, samT, shrinkT, and ibmT). In particular, POG values for WAD were the highest overall among the FC-based methods irrespective of the choice of preprocessing algorithm. Conclusion: Our results demonstrate that to increase sensitivity, specificity, and reproducibility in microarray analyses, we need to select suitable combinations of preprocessing algorithms and gene ranking methods. We recommend the use of FC-based methods, in particular RP or WAD.
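
The reproducibility measure used above, the percentage of overlapping genes (POG), can be illustrated with a small sketch. It assumes that each site produces one score per gene (for example an average difference of log signals), that genes are ranked by absolute score, and that POG is the share of the top-k list common to both sites. The helper names `top_genes` and `pog`, the cutoff k, and the toy scores are all hypothetical; the paper's exact protocol may differ.

```python
def top_genes(scores, k):
    """Return the k gene IDs with the largest absolute score."""
    ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
    return ranked[:k]


def pog(scores_a, scores_b, k):
    """Percentage of overlapping genes between two top-k lists."""
    if k <= 0:
        raise ValueError("k must be positive")
    overlap = set(top_genes(scores_a, k)) & set(top_genes(scores_b, k))
    return 100.0 * len(overlap) / k


# Toy per-gene scores from two sites (made-up numbers).
site1 = {"g1": 2.0, "g2": -1.5, "g3": 0.1, "g4": 0.3}
site2 = {"g1": 1.8, "g2": 0.2, "g3": -0.1, "g4": 1.0}
print(pog(site1, site2, k=2))
```

Here the top-2 lists are (g1, g2) and (g1, g4), so one of two genes overlaps.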

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Accurate Estimates of Microarray Target Concentration from a Simple Sequence-Independent Langmuir Model

Author: AE Pozhitkov
Anthony A. Fodor
Arkady B. Khodursky
BM Bolstad
C Li
CJ Burden
Cynthia J. Gibas
D Abdueva
D Hekstra
DD Dalma-Weiszhausz
DP Kreil
E Chudin
GA Held
GC Mulders
H Binder
J Bishop
J SantaLucia Jr
JB Fan
JC Willey
L Shi
L Zhang
M Schena
MN McCall
PK Wolber
R Mei
RA Irizarry
Raad Z. Gharaibeh
RZ Gharaibeh
S Li
SC Baker
WB Langdon
Z Wu
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: Microarray technology is a commonly used tool for assessing global gene expression. Many models for estimation of target concentration based on observed microarray signal have been proposed, but, in general, these models have been complex and platform-dependent. Principal Findings: We introduce a universal Langmuir model for estimation of absolute target concentration from microarray experiments. We find that this sequence-independent model, characterized by only three free parameters, yields excellent predictions for four microarray platforms, including Affymetrix, Agilent, Illumina and a custom-printed microarray. The model also accurately predicts concentration for the MAQC data sets. This approach significantly reduces the computational complexity of quantitative target concentration estimates. Conclusions: Using a simple form of the Langmuir isotherm model, with a minimum of parameters and assumptions, and without explicit modeling of individual probe properties, we were able to recover absolute transcript concentrations with high R² on four different array platforms. The results obtained here suggest that with a "spiked-in" concentration serie
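
A three-parameter Langmuir isotherm of the kind described above can be sketched as follows. The abstract does not give the parametrization, so the form used here (a background term, a saturation amplitude, and a half-saturation constant) and the names `langmuir_intensity`, `langmuir_concentration`, `background`, `saturation` and `k_half` are assumptions chosen for illustration. The parameter values are made up and are not fitted to any platform.

```python
def langmuir_intensity(conc, background, saturation, k_half):
    """Predicted signal for a target concentration.

    Assumed form: I = background + saturation * c / (c + k_half)
    """
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    return background + saturation * conc / (conc + k_half)


def langmuir_concentration(intensity, background, saturation, k_half):
    """Invert the isotherm to estimate concentration from a signal."""
    signal = intensity - background
    if not 0 <= signal < saturation:
        raise ValueError("intensity is outside the invertible range of the model")
    return k_half * signal / (saturation - signal)


# Round trip with made-up parameters.
params = dict(background=50.0, saturation=10000.0, k_half=8.0)
signal = langmuir_intensity(2.0, **params)
print(signal, langmuir_concentration(signal, **params))
```

The inversion is only defined between the background level and saturation, which is why the sketch rejects intensities outside that range rather than returning a negative or infinite concentration.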

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central