Multiple tests of association with biological annotation metadata
We propose a general and formal statistical framework for multiple tests of
association between known fixed features of a genome and unknown parameters of
the distribution of variable features of this genome in a population of
interest. The known gene-annotation profiles, corresponding to the fixed
features of the genome, may concern Gene Ontology (GO) annotation, pathway
membership, regulation by particular transcription factors, nucleotide
sequences, or protein sequences. The unknown gene-parameter profiles,
corresponding to the variable features of the genome, may be, for example,
regression coefficients relating possibly censored biological and clinical
outcomes to genome-wide transcript levels, DNA copy numbers, and other
covariates. A generic question of great interest in current genomic research
regards the detection of associations between biological annotation metadata
and genome-wide expression measures. This biological question may be translated
as the test of multiple hypotheses concerning association measures between
gene-annotation profiles and gene-parameter profiles. A general and rigorous
formulation of the statistical inference question allows us to apply the
multiple hypothesis testing methodology developed in [Multiple Testing
Procedures with Applications to Genomics (2008) Springer, New York] and related
articles, to control a broad class of Type I error rates, defined as
generalized tail probabilities and expected values for arbitrary functions of
the numbers of Type I errors and rejected hypotheses. The resampling-based
single-step and stepwise multiple testing procedures of [Multiple Testing
Procedures with Applications to Genomics (2008) Springer, New York] take into
account the joint distribution of the test statistics and provide Type I error
control in testing problems involving general data generating distributions
(with arbitrary dependence structures among variables), null hypotheses, and
test statistics. Comment: Published at http://dx.doi.org/10.1214/193940307000000446 in the IMS
Collections (http://www.imstat.org/publications/imscollections.htm) by the
Institute of Mathematical Statistics (http://www.imstat.org)
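As a rough illustration of how a resampling-based single-step procedure can use the joint distribution of the test statistics, the following is a minimal sketch of single-step max-T adjusted p-values. It rests on several assumptions that are not in the abstract: a two-group design, an absolute difference-in-means statistic, and a permutation null swapped in for simplicity. It is not claimed to be the null-distribution construction of the cited book, and the names `mean_diff_stats` and `single_step_maxt` are hypothetical.

```python
import random
from statistics import mean


def mean_diff_stats(data, labels):
    """Absolute difference in group means for each feature (one row per feature)."""
    stats = []
    for row in data:
        group1 = [x for x, lab in zip(row, labels) if lab == 1]
        group0 = [x for x, lab in zip(row, labels) if lab == 0]
        stats.append(abs(mean(group1) - mean(group0)))
    return stats


def single_step_maxt(data, labels, n_perm=1000, seed=0):
    """Single-step max-T adjusted p-values under a permutation null (assumed here).

    Each observed statistic is compared with the permutation distribution of the
    maximum statistic over all features, so dependence among features is kept.
    """
    rng = random.Random(seed)
    observed = mean_diff_stats(data, labels)
    exceed = [0] * len(observed)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute sample labels, keeping feature rows intact
        max_stat = max(mean_diff_stats(data, perm))
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1
    # Add-one correction keeps every adjusted p-value strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]


if __name__ == "__main__":
    labels = [0] * 6 + [1] * 6
    data = [
        [0.0, 0.2, 0.1, 0.3, 0.2, 0.1, 5.0, 5.2, 5.1, 5.3, 5.2, 5.1],  # shifted
        [0.1, 0.3, 0.2, 0.2, 0.1, 0.3, 0.2, 0.1, 0.3, 0.2, 0.3, 0.1],  # no shift
    ]
    print(single_step_maxt(data, labels))
```

Because every feature is compared against the same maximum statistic, a feature with a larger observed statistic can never receive a larger adjusted p-value than one with a smaller statistic.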
Discussion of: Treelets--An adaptive multi-scale basis for sparse unordered data
We would like to congratulate Lee, Nadler and Wasserman on their contribution
to clustering and data reduction methods for high $p$ and low $n$ situations. A
composite of clustering and traditional principal components analysis, treelets
is an innovative method for multi-resolution analysis of unordered data. It is
an improvement over traditional PCA and an important contribution to clustering
methodology. Their paper [arXiv:0707.0481] presents theory and supporting
applications addressing the two main goals of the treelet method: (1) Uncover
the underlying structure of the data and (2) Data reduction prior to
statistical learning methods. We will organize our discussion into two main
parts to address their methodology in terms of each of these two goals. We will
present and discuss treelets in terms of a clustering algorithm and an
improvement over traditional PCA. We will also discuss the applicability of
treelets to more general data, in particular, the application of treelets to
microarray data. Comment: Published at http://dx.doi.org/10.1214/08-AOAS137F in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
A practical illustration of the importance of realistic individualized treatment rules in causal inference
The effect of vigorous physical activity on mortality in the elderly is
difficult to estimate using conventional approaches to causal inference that
define this effect by comparing the mortality risks corresponding to
hypothetical scenarios in which all subjects in the target population engage in
a given level of vigorous physical activity. A causal effect defined on the
basis of such a static treatment intervention can only be identified from
observed data if all subjects in the target population have a positive
probability of selecting each of the candidate treatment options, an assumption
that is highly unrealistic in this case since subjects with serious health
problems will not be able to engage in higher levels of vigorous physical
activity. This problem can be addressed by focusing instead on causal effects
that are defined on the basis of realistic individualized treatment rules and
intention-to-treat rules that explicitly take into account the set of treatment
options that are available to each subject. We present a data analysis to
illustrate that estimators of static causal effects in fact tend to
overestimate the beneficial impact of high levels of vigorous physical activity
while corresponding estimators based on realistic individualized treatment
rules and intention-to-treat rules can yield unbiased estimates. We emphasize
that the problems encountered in estimating static causal effects are not
restricted to the IPTW estimator, but are also observed with the
G-computation estimator, the DR-IPTW estimator, and the targeted MLE. Our
analyses based on realistic individualized treatment rules and
intention-to-treat rules suggest that high levels of vigorous physical activity
may confer reductions in mortality risk on the order of 15-30%, although in
most cases the evidence for such an effect does not quite reach the 0.05 level
of significance. Comment: Published at http://dx.doi.org/10.1214/07-EJS105 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
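The positivity issue described in this abstract can be sketched on toy data. The example below is a minimal illustration under stated assumptions, not the authors' analysis: it assumes a single discrete covariate W, a binary treatment A, empirical stratum-wise treatment probabilities, and a hypothetical cutoff `alpha` for deciding which strata can realistically be treated. The function names and the toy records are made up for illustration.

```python
from collections import defaultdict


def propensities(records):
    """Empirical P(A=1 | W=w) per stratum; records are (w, a, y) tuples."""
    counts = defaultdict(lambda: [0, 0])  # w -> [number treated, number total]
    for w, a, _ in records:
        counts[w][0] += a
        counts[w][1] += 1
    return {w: treated / total for w, (treated, total) in counts.items()}


def realistic_rule(g, alpha=0.05):
    """Assign treatment only in strata where treatment is observed often enough.

    The cutoff alpha is an assumption of this sketch.
    """
    return lambda w: 1 if g[w] > alpha else 0


def iptw_mean_under_rule(records, rule, g):
    """IPTW estimate of the mean outcome under a rule d:
    the sample average of I(A = d(W)) * Y / P(A | W)."""
    total = 0.0
    for w, a, y in records:
        if a == rule(w):
            prob = g[w] if a == 1 else 1.0 - g[w]
            total += y / prob
    return total / len(records)


if __name__ == "__main__":
    # Toy data: nobody in the "frail" stratum is treated, so positivity fails there.
    records = [
        ("healthy", 1, 1), ("healthy", 1, 1), ("healthy", 0, 0), ("healthy", 0, 1),
        ("frail", 0, 0), ("frail", 0, 0), ("frail", 0, 1), ("frail", 0, 0),
    ]
    g = propensities(records)
    rule = realistic_rule(g)
    print(g, iptw_mean_under_rule(records, rule, g))
```

In the toy data no frail subject is treated, so a static "treat everyone" rule has no support in that stratum, whereas the realistic rule only asks for treatment where the data contain treated subjects.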
Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy
We consider challenges that arise in the estimation of the mean outcome under
an optimal individualized treatment strategy defined as the treatment rule that
maximizes the population mean outcome, where the candidate treatment rules are
restricted to depend on baseline covariates. We prove a necessary and
sufficient condition for the pathwise differentiability of the optimal value, a
key condition needed to develop a regular and asymptotically linear (RAL)
estimator of the optimal value. The stated condition is slightly more general
than the previous condition implied in the literature. We then describe an
approach to obtain root-$n$ rate confidence intervals for the optimal value
even when the parameter is not pathwise differentiable. We provide conditions
under which our estimator is RAL and asymptotically efficient when the mean
outcome is pathwise differentiable. We also outline an extension of our
approach to a multiple time point problem. All of our results are supported by
simulations. Comment: Published at http://dx.doi.org/10.1214/15-AOS1384 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
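To make the target parameter concrete, the sketch below computes a naive plug-in estimate of the mean outcome under the optimal rule for a discrete baseline covariate, and flags strata where the two treatments have equal estimated means, which is where the optimal rule would not be unique. This is only an illustration under assumptions (discrete W, binary A, both treatments observed in every stratum, exact ties in the estimated means). It is not the estimator or the confidence interval construction of the paper, and the function names are hypothetical.

```python
from collections import defaultdict


def stratum_means(records):
    """Empirical E[Y | A=a, W=w] from (w, a, y) records."""
    sums = defaultdict(lambda: [0.0, 0])  # (w, a) -> [sum of y, count]
    for w, a, y in records:
        sums[(w, a)][0] += y
        sums[(w, a)][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def plug_in_optimal_value(records):
    """Naive plug-in estimate of E[max_a E[Y | A=a, W]] and the tied strata.

    Raises KeyError if some stratum lacks one of the two treatment arms.
    """
    q = stratum_means(records)
    value = 0.0
    ties = set()
    for w, _, _ in records:
        q1, q0 = q[(w, 1)], q[(w, 0)]
        if q1 == q0:
            ties.add(w)  # either treatment is optimal in this stratum
        value += max(q1, q0)
    return value / len(records), ties


if __name__ == "__main__":
    records = [("x", 1, 1.0), ("x", 0, 0.0), ("y", 1, 0.5), ("y", 0, 0.5)]
    print(plug_in_optimal_value(records))
```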
Evaluating the Impact of Treating the Optimal Subgroup
Suppose we have a binary treatment used to influence an outcome. Given data
from an observational or controlled study, we wish to determine whether or not
there exists some subset of observed covariates in which the treatment is more
effective than the standard practice of no treatment. Furthermore, we wish to
quantify the improvement in population mean outcome that will be seen if this
subgroup receives treatment and the rest of the population remains untreated.
We show that this problem is surprisingly challenging given how often it is an
(at least implicit) study objective. Blindly applying standard techniques fails
to yield any apparent asymptotic results, while using existing techniques to
confront the non-regularity does not necessarily help at distributions where
there is no treatment effect. Here we describe an approach to estimate the
impact of treating the subgroup which benefits from treatment that is valid in
a nonparametric model and is able to deal with the case where there is no
treatment effect. The approach is a slight modification of an approach that
recently appeared in the individualized medicine literature
- …