
2 research outputs found

Two-part permutation tests for DNA methylation and microarray data

Author: Boes Tanja
Jöckel Karl-Heinz
Neuhäuser Markus
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: One important application of microarray experiments is to identify differentially expressed genes. Often, small and negative expression levels are clipped to an arbitrarily chosen cutoff value before a statistical test is carried out. This yields two types of data: truncated values and original observations. The truncated values are not just another point on the continuum of possible values; it is therefore appropriate to combine two statistical tests in a two-part model rather than to use standard statistical methods. A similar situation occurs when DNA methylation data are investigated. In that case, there are null values (undetectable methylation) and observed positive values. For these data, we propose a two-part permutation test. RESULTS: The proposed permutation test leads to smaller p-values than the original two-part test. We found this for both DNA methylation data and microarray data. A simulation study confirmed this result and showed that the two-part permutation test is, on average, more powerful. The new test also reduces, without any loss of power, to a standard test when there are no null or truncated values. CONCLUSION: The two-part permutation test can be used in routine analyses, since it reduces to a standard test when there are positive values only. Further advantages of the new test are that it opens the possibility of using other test statistics to construct the two-part test and that it avoids the use of any asymptotic distribution. The latter advantage is particularly important for the analysis of microarrays, since sample sizes are usually small.
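
The idea described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the two-part statistic is the sum of a squared two-sample proportion statistic (share of null or truncated values) and a squared Wilcoxon rank-sum statistic on the remaining values, and it estimates the p-value by randomly permuting group labels. The function names, the `cutoff` parameter, and the number of permutations are all hypothetical choices.

```python
import random


def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def two_part_statistic(x, y, cutoff=0.0):
    """Assumed form: B^2 (null proportions) + W^2 (rank sum of non-null values)."""
    n1, n2 = len(x), len(y)
    m1 = sum(v <= cutoff for v in x)  # null / truncated values in group 1
    m2 = sum(v <= cutoff for v in y)
    p = (m1 + m2) / (n1 + n2)
    b2 = 0.0
    if 0 < p < 1:
        b2 = (m1 / n1 - m2 / n2) ** 2 / (p * (1 - p) * (1 / n1 + 1 / n2))

    px = [v for v in x if v > cutoff]
    py = [v for v in y if v > cutoff]
    w2 = 0.0
    if px and py:
        r = midranks(px + py)
        n = len(r)
        mean_rank = sum(r) / n
        # Exact permutation variance of the rank sum; this handles ties.
        var = len(px) * len(py) / (n * (n - 1)) * sum((ri - mean_rank) ** 2 for ri in r)
        if var > 0:
            w2 = (sum(r[:len(px)]) - len(px) * mean_rank) ** 2 / var
    return b2 + w2


def two_part_permutation_test(x, y, n_perm=2000, cutoff=0.0, seed=0):
    """Monte Carlo permutation p-value for the two-part statistic."""
    rng = random.Random(seed)
    observed = two_part_statistic(x, y, cutoff)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        stat = two_part_statistic(pooled[:len(x)], pooled[len(x):], cutoff)
        if stat >= observed - 1e-12:
            hits += 1
    # Counting the observed labelling itself keeps the p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

When neither group contains a null or truncated value, the proportion term is zero and the sketch reduces to a permutation version of the rank-sum test, which mirrors the reduction property stated in the abstract. Because the reference distribution comes from the permutations, no asymptotic distribution is needed.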

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Application of Two-Part Statistics for Comparison of Sequence Variant Counts

Author: AF Zuur
AP Hallstrom
B Rosner
BD Wagner
BJ Haas
Brandie D. Wagner
C Bascoul-Mollevi
Charles E. Robertson
D Lambert
DA Hill
DA Relman
DJ Lane
DN Frank
Dongxiao Zhu
E Pruesse
EC Berglund
EL Korn
EP Nawrocki
J Aitchison
J. Kirk Harris
JA Eisen
JM Potts
JR Cole
JW Sahl
L Dethlefsen
M Neuhauser
NR Pace
P Lachenbruch
PA Lachenbruch
Q Wang
R Simon
S Taylor
SD Sagel
SF Altschul
TG Martin
TZ DeSantis
TZ DeSantis Jr
VM Markowitz
W Ludwig
Publication venue: Public Library of Science
Publication date: 23/05/2011

Investigation of microbial communities, particularly human-associated communities, is significantly enhanced by the vast amounts of sequence data produced by high-throughput sequencing technologies. However, these data create high-dimensional complex data sets that consist of a large proportion of zeros, non-negative skewed counts, and, frequently, a limited number of samples. These features distinguish sequence data from other forms of high-dimensional data and are not adequately addressed by statistical approaches in common use. Ultimately, medical studies may identify targeted interventions or treatments, but the lack of analytic tools for feature selection and for identification of the taxa responsible for differences between groups is hindering advancement. The objective of this paper is to examine the application of a two-part statistic to identify taxa that differ between two groups. The advantages of the two-part statistic over common statistical tests applied to sequence count datasets are discussed. Results from the t-test, the Wilcoxon test, and the two-part test are compared using sequence counts from microbial ecology studies in cystic fibrosis and from cenote samples. We show superior performance of the two-part statistic for analysis of sequence data. The improved performance in microbial ecology studies was independent of study type and sequence technology used.
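
As a rough illustration of how a two-part statistic can be applied to zero-heavy count data for a single taxon, the sketch below combines a test on the proportion of zero counts with a test on the positive counts and refers the sum to a chi-square distribution. It is an assumption-laden sketch rather than the method used in the paper: the continuous part here is a squared Welch-type t statistic (the paper may use a different component), the chi-square reference is a large-sample approximation, and the function name `two_part_chi2` is hypothetical.

```python
from math import erfc, exp, sqrt
from statistics import mean, variance


def two_part_chi2(a, b):
    """Return (statistic, df, p_value) for two lists of counts of one taxon."""
    n1, n2 = len(a), len(b)
    z1 = sum(v == 0 for v in a)  # samples in which the taxon is absent
    z2 = sum(v == 0 for v in b)
    stat, df = 0.0, 0

    # Part 1: difference in the proportion of zeros.
    p0 = (z1 + z2) / (n1 + n2)
    if 0 < p0 < 1:
        stat += (z1 / n1 - z2 / n2) ** 2 / (p0 * (1 - p0) * (1 / n1 + 1 / n2))
        df += 1

    # Part 2: difference in mean among the positive counts (assumed t-type part).
    pa = [v for v in a if v > 0]
    pb = [v for v in b if v > 0]
    if len(pa) >= 2 and len(pb) >= 2:
        se2 = variance(pa) / len(pa) + variance(pb) / len(pb)
        if se2 > 0:
            stat += (mean(pa) - mean(pb)) ** 2 / se2
            df += 1

    if df == 0:
        return 0.0, 0, 1.0
    # Chi-square upper tail: exp(-x/2) for 2 df, erfc(sqrt(x/2)) for 1 df.
    p_value = exp(-stat / 2) if df == 2 else erfc(sqrt(stat / 2))
    return stat, df, p_value


# Hypothetical counts for one taxon in two groups of six samples each.
stat, df, p_value = two_part_chi2([0, 0, 0, 5, 7, 0], [12, 30, 0, 25, 18, 40])
print(f"X2 = {stat:.2f}, df = {df}, p = {p_value:.4g}")
```

Only the parts that can be computed contribute a degree of freedom, so a taxon with no zeros in either group is assessed by the positive-count component alone. With the small sample sizes the abstract mentions, the chi-square approximation may be poor, and a permutation reference distribution is one alternative.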

Public Library of Science (PLOS)

Crossref

PubMed Central