A comprehensive evaluation of SAM, the SAM R-package and a simple modification to improve its performance

Author: AJ Rice
B Efron
B Efron
CM Kendziorski
GK Smyth
JG Thomas
M Newton
MA Newton
MK Kerr
O Larsson
P Delmar
S Dudoit
S Zhang
Shunpu Zhang
TR Golub
VG Tusher
W Huber
W Pan
W Pan
X Guo
Y Xie
Y Zhao
Publication venue: BioMed Central
Publication date: 01/06/2007

Abstract. Background: The Significance Analysis of Microarrays (SAM) is a popular method for detecting significantly expressed genes and controlling the false discovery rate (FDR). Recently, it has been reported in the literature that the FDR is not well controlled by SAM. Because SAM is so widely applied in microarray data analysis, an extensive evaluation of SAM and its associated R-package (sam2.20) is of great importance. Results: Our study has identified several discrepancies between SAM and sam2.20. One major difference is that SAM and sam2.20 use different methods for estimating FDR. Such discrepancies may cause confusion among researchers who are using SAM or developing SAM-like methods. We have also shown that SAM provides no meaningful estimates of FDR and that this problem has been corrected in sam2.20 by using a different formula for estimating FDR. However, we have found that, even with the improvement sam2.20 has made over SAM, sam2.20 may still produce erroneous and even conflicting results in certain situations. Using an example, we show that the problem with sam2.20 is caused by its use of asymmetric cutoffs, which are due to the large variability of null scores at both ends of the order statistics. An obvious approach without the complication of the order statistics is the conventional symmetric cutoff method. For this reason, we have carried out extensive simulations to compare the performance of sam2.20 and the symmetric cutoff method. Finally, a simple modification is proposed to improve the FDR estimation of sam2.20 and the symmetric cutoff method. Conclusion: Our study shows that the most serious drawback of SAM is its poor estimation of FDR. Although this drawback has been corrected in sam2.20, the control of FDR by sam2.20 is still not satisfactory. The comparison between sam2.20 and the symmetric cutoff method reveals that the performance of sam2.20 relative to the symmetric cutoff method depends on the ratio of induced to repressed genes in a microarray data set, and is also affected by the ratio of DE to EE genes and the distributions of induced and repressed genes. Numerical simulations show that the symmetric cutoff method has the biggest advantage over sam2.20 when there is an equal number of induced and repressed genes (i.e., the ratio of induced to repressed genes is 1). As the ratio of induced to repressed genes moves away from 1, the advantage of the symmetric cutoff method over sam2.20 gradually diminishes, until eventually sam2.20 becomes significantly better than the symmetric cutoff method when the differentially expressed (DE) genes are either all induced or all repressed. Simulation results also show that our proposed simple modification provides improved control of FDR for both sam2.20 and the symmetric cutoff method.
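As a rough illustration of the symmetric cutoff idea named in the abstract above, the following minimal sketch estimates FDR for the rule |score| >= cutoff by permuting sample labels. It is not the sam2.20 formula and not the modification the authors propose: the function names, the fudge constant `s0`, the toy data, and the omission of any estimate of the proportion of unchanged genes are all assumptions made for this example.

```python
import random
import statistics


def relative_difference(x, y, s0=0.1):
    """SAM-style score: difference in means over (pooled standard error + s0)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    std_err = (pooled_var * (1.0 / nx + 1.0 / ny)) ** 0.5
    return (statistics.fmean(y) - statistics.fmean(x)) / (std_err + s0)


def symmetric_cutoff_fdr(data, n_x, cutoff, n_perm=50, seed=0):
    """Return (genes called, estimated FDR) for the rule |score| >= cutoff."""
    rng = random.Random(seed)
    called = sum(abs(relative_difference(row[:n_x], row[n_x:])) >= cutoff
                 for row in data)
    columns = list(range(len(data[0])))
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(columns)  # permute sample labels; same permutation for every gene
        count = 0
        for row in data:
            shuffled = [row[j] for j in columns]
            if abs(relative_difference(shuffled[:n_x], shuffled[n_x:])) >= cutoff:
                count += 1
        null_counts.append(count)
    # Average number of null scores beyond the cutoff, relative to genes called.
    expected_false = statistics.fmean(null_counts)
    fdr = min(1.0, expected_false / called) if called else 0.0
    return called, fdr


# Toy data (hypothetical): 100 genes, 5 + 5 samples, first 10 genes induced by 4 units.
data_rng = random.Random(1)
data = []
for g in range(100):
    shift = 4.0 if g < 10 else 0.0
    row = [data_rng.gauss(0.0, 1.0) for _ in range(5)]
    row += [data_rng.gauss(shift, 1.0) for _ in range(5)]
    data.append(row)

called, fdr = symmetric_cutoff_fdr(data, n_x=5, cutoff=3.0)
print(called, round(fdr, 3))
```

The same cutoff is applied to both tails, so no ordering of null scores is needed; an asymmetric rule would instead use separate upper and lower thresholds.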

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

RELIC: a novel dye-bias correction method for Illumina Methylation BeadChip

Author: Jack A. Taylor
Liang Niu
Patrick De Boever
Sabine A. S. Langie
Zongli Xu
Publication venue: Springer Nature
Publication date: 01/01/2017

Supplementary Material. This docx file contains all supplementary tables and supplementary figures. (DOCX 424 kb)

Springer - Publisher Connector

FigShare

IMAGINE Final Report

Author: Arana C
Dattani I
Pick R
Recio I
Schmidt P
Publication venue: s.n.
Publication date: 01/09/2003

Southampton (e-Prints Soton)

Computational statistics using the Bayesian Inference Engine

Author: Babu
Berntsen
Feroz
Gelman
Gelman
Gelman
Geyer
Giakoumatos
Green
Gregory
Grubbs
Hastings
Hobson
Jeffreys
Kass
Kirkpatrick
Lewis
Lindley
Liu
Lu
Lu
Martin D. Weinberg
Metropolis
Neal
Neal
Newton
Pearson
Press
Price
Raftery
Robert
Skilling
Storn
Storn
Sérsic
Ter Braak
Verdinelli
Wall
Weinberg
Weinberg
Yoon
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimised software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organise and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasises hybrid tempered MCMC schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialisation system that stores the full byte-level image of the running inference and previously characterised posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented framework that implements every aspect of the Bayesian inference. Because the BIE provides a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU GPL. Comment: Resubmitted version.

arXiv.org e-Print Archive

CiteSeerX

Crossref

BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference.

Author: Eskin Eleazar
Gabel Eilon
Halperin Eran
Hofer Ira
Rahmani Elior
Schweiger Regev
Shenhav Liat
Wingert Theodora
Publication venue: eScholarship, University of California
Publication date: 01/09/2018

We introduce a Bayesian semi-supervised method for estimating cell counts from DNA methylation by leveraging easily obtainable prior knowledge on the cell-type composition distribution of the studied tissue. We show mathematically and empirically that alternative methods which attempt to infer cell counts without a methylation reference only capture linear combinations of cell counts rather than provide one component per cell type. Our approach allows the construction of components such that each component corresponds to a single cell type, and provides a new opportunity to investigate cell compositions in genomic studies of tissues for which this was not possible before.

Directory of Open Access Journals

eScholarship - University of California

FigShare

Fully Integrated Biochip Platforms for Advanced Healthcare

Author: Baj-Rossi Camilla
Burleson Wayne
Carrara Sandro
Cavallini Andrea
de Beeck Maaike Op
Dehollain Catherine
Ghoreishizadeh Sara
Guiseppi-Elie Anthony
Micheli Giovanni De
Moussy Francis Gabriel
Olivo Jacopo
Taurino Irene
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2012

Recent advances in microelectronics and biosensors are enabling the development of innovative biochips for advanced healthcare by providing fully integrated platforms for continuous monitoring of a large set of human disease biomarkers. Continuous monitoring of several human metabolites can be addressed by using fully integrated and minimally invasive devices located in the sub-cutis, typically in the peritoneal region. This extends the techniques of continuous monitoring of glucose currently being pursued with diabetic patients. However, several issues have to be considered in order to succeed in developing fully integrated and minimally invasive implantable devices. These innovative devices require a high degree of integration, minimally invasive surgery, long-term biocompatibility, security and privacy in data transmission, high reliability, high reproducibility, high specificity, low detection limit and high sensitivity. Recent advances in the field have already proposed possible solutions for several of these issues. The aim of the present paper is to present a broad spectrum of recent results and to propose future directions of development in order to obtain fully implantable systems for the continuous monitoring of the human metabolism in advanced healthcare applications.

Infoscience - École polytechnique fédérale de Lausanne

Multidisciplinary Digital Publishing Institute

CiteSeerX

Crossref

ScholarWorks@UMass Amherst

Directory of Open Access Journals

Ghent University Academic Bibliography

PubMed Central

Spiral - Imperial College Digital Repository

Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies

Author: Aurelien de Reynies
B Wu
C Kooperberg
C Murie
C Yauk
Caroline Paccard
D Allison
D Chessel
D Rickman
F Jaffrezic
G Marot
G Smyth
G Wright
Gregory Nuel
I Jeffery
J Soulier
JD Storey
Kerby Shedden
L Lamant
L Van 't Veer
L Zhou
Laetitia Marisa
M Kerr
M McCall
M Pirooznia
M Sullivan Pepe
Marine Jeanmougin
Mickael Guedj
N Jain
P Bertheau
P Delmar
R Simon
S Boyault
S Dudoit
S Zhang
T Mary-Huard
T Sorlie
V Tusher
X Huang
Y Benjamini
Publication venue: Public Library of Science
Publication date: 01/01/2010

High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to selecting genes differentially expressed between two groups is to apply a t-test, which is the subject of criticism in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context, and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To help answer this question, we conduct a comparison of eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance, in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. In addition, two tests (limma and VarMixt) show significant improvement compared to the t-test, in particular when dealing with small sample sizes. Limma also presents several practical advantages, so we advocate its application to analyze gene expression data.

CiteSeerX

Public Library of Science (PLOS)

HAL Evry

Crossref

Directory of Open Access Journals

PubMed Central

HAL Descartes

ProdInra

A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation

Author: Blum M. G. B.
Nunes M. A.
Prangle D.
Sisson S. A.
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2013

Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets. Comment: Published at http://dx.doi.org/10.1214/12-STS406 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
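To make the comparison of simulated and observed summary statistics concrete, here is a minimal sketch of plain ABC rejection sampling. It is not one of the dimension reduction methods reviewed in the article. The Gaussian model, the uniform prior, the choice of the sample mean as the single summary statistic, the tolerance, and the function name `abc_rejection` are all assumptions made for this example.

```python
import random
import statistics


def abc_rejection(observed, n_draws=5000, tolerance=0.1, seed=0):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    rng = random.Random(seed)
    observed_summary = statistics.fmean(observed)  # one-dimensional summary statistic
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # draw the parameter from an assumed uniform prior
        simulated = [rng.gauss(mu, 1.0) for _ in range(n)]  # simulate data given mu
        if abs(statistics.fmean(simulated) - observed_summary) <= tolerance:
            accepted.append(mu)  # the likelihood is never evaluated
    return accepted


# Hypothetical observed data: 50 draws from a Gaussian with mean 2 and unit variance.
data_rng = random.Random(42)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = abc_rejection(observed)
posterior_mean = statistics.fmean(posterior)
print(len(posterior), round(posterior_mean, 2))
```

With a single summary statistic the acceptance rate stays workable; with a long vector of summaries the distance comparison degrades, which is the motivation for reducing the dimension of the summaries.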

arXiv.org e-Print Archive

Central Archive at the University of Reading

Crossref

Hal - Université Grenoble Alpes

Lancaster E-Prints