88,659 research outputs found
Ball: An R package for detecting distribution difference and association in metric spaces
The rapid development of modern technology facilitates the appearance of
numerous unprecedented complex data which do not satisfy the axioms of
Euclidean geometry, while most of the statistical hypothesis tests are
available in Euclidean or Hilbert spaces. To properly analyze the data of more
complicated structures, efforts have been made to solve the fundamental test
problems in more general spaces. In this paper, a publicly available R package
Ball is provided to implement Ball statistical test procedures for K-sample
distribution comparison and test of mutual independence in metric spaces, which
extend the test procedures for two sample distribution comparison and test of
independence. The tailormade algorithms as well as engineering techniques are
employed on the Ball package to speed up computation to the best of our
ability. Two real data analyses and several numerical studies have been
performed and the results certify the powerfulness of Ball package in analyzing
complex data, e.g., spherical data and symmetric positive matrix data
A statistical framework for testing functional categories in microarray data
Ready access to emerging databases of gene annotation and functional pathways
has shifted assessments of differential expression in DNA microarray studies
from single genes to groups of genes with shared biological function. This
paper takes a critical look at existing methods for assessing the differential
expression of a group of genes (functional category), and provides some
suggestions for improved performance. We begin by presenting a general
framework, in which the set of genes in a functional category is compared to
the complementary set of genes on the array. The framework includes tests for
overrepresentation of a category within a list of significant genes, and
methods that consider continuous measures of differential expression. Existing
tests are divided into two classes. Class 1 tests assume gene-specific measures
of differential expression are independent, despite overwhelming evidence of
positive correlation. Analytic and simulated results are presented that
demonstrate Class 1 tests are strongly anti-conservative in practice. Class 2
tests account for gene correlation, typically through array permutation that by
construction has proper Type I error control for the induced null. However,
both Class 1 and Class 2 tests use a null hypothesis that all genes have the
same degree of differential expression. We introduce a more sensible and
general (Class 3) null under which the profile of differential expression is
the same within the category and complement. Under this broader null, Class 2
tests are shown to be conservative. We propose standard bootstrap methods for
testing against the Class 3 null and demonstrate they provide valid Type I
error control and more power than array permutation in simulated datasets and
real microarray experiments.Comment: Published in at http://dx.doi.org/10.1214/07-AOAS146 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A hierarchical Bayesian model for inference of copy number variants and their association to gene expression
A number of statistical models have been successfully developed for the
analysis of high-throughput data from a single source, but few methods are
available for integrating data from different sources. Here we focus on
integrating gene expression levels with comparative genomic hybridization (CGH)
array measurements collected on the same subjects. We specify a measurement
error model that relates the gene expression levels to latent copy number
states which, in turn, are related to the observed surrogate CGH measurements
via a hidden Markov model. We employ selection priors that exploit the
dependencies across adjacent copy number states and investigate MCMC stochastic
search techniques for posterior inference. Our approach results in a unified
modeling framework for simultaneously inferring copy number variants (CNV) and
identifying their significant associations with mRNA transcripts abundance. We
show performance on simulated data and illustrate an application to data from a
genomic study on human cancer cell lines.Comment: Published in at http://dx.doi.org/10.1214/13-AOAS705 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
- …