Monitoring data in R with the lumberjack package
Monitoring data while it is processed and transformed can yield detailed
insight into the dynamics of a (running) production system. The lumberjack
package is a lightweight package allowing users to follow how an R object is
transformed as it is manipulated by R code. The package abstracts all logging
code from the user, who only needs to specify which objects are logged and what
information should be logged. A few default loggers are included with the
package but the package is extensible through user-defined logger objects. Comment: Accepted for publication in the Journal of Statistical Software
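The idea of following an object through a sequence of transformations can be illustrated with a minimal sketch. This is Python, not R, and it does not use or reproduce the lumberjack API; the names `run_logged`, `count_changed_cells`, `cap_height` and `fill_missing_weight` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def count_changed_cells(before: List[Record], after: List[Record]) -> int:
    """Count cells whose value differs between two equally shaped tables."""
    return sum(
        1
        for old, new in zip(before, after)
        for key in old
        if old[key] != new.get(key)
    )


def run_logged(data: List[Record], steps: List[Step]) -> Tuple[List[Record], List[dict]]:
    """Apply each step in turn and record how many cells it changed."""
    log = []
    for step in steps:
        # Hand the step a copy so the previous state stays available for comparison.
        result = step([dict(row) for row in data])
        log.append({"step": step.__name__, "changed": count_changed_cells(data, result)})
        data = result
    return data, log


def cap_height(rows: List[Record]) -> List[Record]:
    """Hypothetical cleaning step: cap implausible heights at 200."""
    return [{**r, "height": min(r["height"], 200.0)} for r in rows]


def fill_missing_weight(rows: List[Record]) -> List[Record]:
    """Hypothetical cleaning step: replace missing weights with 70."""
    return [{**r, "weight": 70.0 if r["weight"] is None else r["weight"]} for r in rows]


rows = [{"height": 250.0, "weight": None}, {"height": 180.0, "weight": 80.0}]
final, log = run_logged(rows, [cap_height, fill_missing_weight])
```

The processing steps themselves contain no logging code; the caller only chooses which object to track and which summary to record, which is the separation of concerns the abstract describes.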
Multiple tests of association with biological annotation metadata
We propose a general and formal statistical framework for multiple tests of
association between known fixed features of a genome and unknown parameters of
the distribution of variable features of this genome in a population of
interest. The known gene-annotation profiles, corresponding to the fixed
features of the genome, may concern Gene Ontology (GO) annotation, pathway
membership, regulation by particular transcription factors, nucleotide
sequences, or protein sequences. The unknown gene-parameter profiles,
corresponding to the variable features of the genome, may be, for example,
regression coefficients relating possibly censored biological and clinical
outcomes to genome-wide transcript levels, DNA copy numbers, and other
covariates. A generic question of great interest in current genomic research
regards the detection of associations between biological annotation metadata
and genome-wide expression measures. This biological question may be translated
as the test of multiple hypotheses concerning association measures between
gene-annotation profiles and gene-parameter profiles. A general and rigorous
formulation of the statistical inference question allows us to apply the
multiple hypothesis testing methodology developed in [Multiple Testing
Procedures with Applications to Genomics (2008) Springer, New York] and related
articles, to control a broad class of Type I error rates, defined as
generalized tail probabilities and expected values for arbitrary functions of
the numbers of Type I errors and rejected hypotheses. The resampling-based
single-step and stepwise multiple testing procedures of [Multiple Testing
Procedures with Applications to Genomics (2008) Springer, New York] take into
account the joint distribution of the test statistics and provide Type I error
control in testing problems involving general data generating distributions
(with arbitrary dependence structures among variables), null hypotheses, and
test statistics. Comment: Published at http://dx.doi.org/10.1214/193940307000000446 in the IMS
Collections (http://www.imstat.org/publications/imscollections.htm) by the
Institute of Mathematical Statistics (http://www.imstat.org)
Discussion of: Treelets--An adaptive multi-scale basis for sparse unordered data
We would like to congratulate Lee, Nadler and Wasserman on their contribution
to clustering and data reduction methods for high p and low n situations. A
composite of clustering and traditional principal components analysis, treelets
is an innovative method for multi-resolution analysis of unordered data. It is
an improvement over traditional PCA and an important contribution to clustering
methodology. Their paper [arXiv:0707.0481] presents theory and supporting
applications addressing the two main goals of the treelet method: (1) Uncover
the underlying structure of the data and (2) Data reduction prior to
statistical learning methods. We will organize our discussion into two main
parts to address their methodology in terms of each of these two goals. We will
present and discuss treelets in terms of a clustering algorithm and an
improvement over traditional PCA. We will also discuss the applicability of
treelets to more general data, in particular, the application of treelets to
microarray data. Comment: Published at http://dx.doi.org/10.1214/08-AOAS137F in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
A practical illustration of the importance of realistic individualized treatment rules in causal inference
The effect of vigorous physical activity on mortality in the elderly is
difficult to estimate using conventional approaches to causal inference that
define this effect by comparing the mortality risks corresponding to
hypothetical scenarios in which all subjects in the target population engage in
a given level of vigorous physical activity. A causal effect defined on the
basis of such a static treatment intervention can only be identified from
observed data if all subjects in the target population have a positive
probability of selecting each of the candidate treatment options, an assumption
that is highly unrealistic in this case since subjects with serious health
problems will not be able to engage in higher levels of vigorous physical
activity. This problem can be addressed by focusing instead on causal effects
that are defined on the basis of realistic individualized treatment rules and
intention-to-treat rules that explicitly take into account the set of treatment
options that are available to each subject. We present a data analysis to
illustrate that estimators of static causal effects in fact tend to
overestimate the beneficial impact of high levels of vigorous physical activity
while corresponding estimators based on realistic individualized treatment
rules and intention-to-treat rules can yield unbiased estimates. We emphasize
that the problems encountered in estimating static causal effects are not
restricted to the IPTW estimator, but are also observed with the
G-computation estimator, the DR-IPTW estimator, and the targeted MLE. Our
analyses based on realistic individualized treatment rules and
intention-to-treat rules suggest that high levels of vigorous physical activity
may confer reductions in mortality risk on the order of 15-30%, although in
most cases the evidence for such an effect does not quite reach the 0.05 level
of significance. Comment: Published at http://dx.doi.org/10.1214/07-EJS105 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
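The identifiability condition described in the abstract above can be written out explicitly. This is a sketch in assumed notation: baseline covariates W, activity level A, mortality outcome Y, and an estimated treatment mechanism g_n are labels introduced here and do not appear in the abstract.

```latex
% Positivity for a static intervention: every level a must be attainable for every subject
P(A = a \mid W) > 0 \quad \text{almost surely, for each candidate level } a .

% Standard IPTW estimator of the mean outcome had all subjects received level a
\hat{\psi}_{\mathrm{IPTW}}(a) = \frac{1}{n} \sum_{i=1}^{n} \frac{I(A_i = a)}{g_n(a \mid W_i)} \, Y_i .
```

When g_n(a | W_i) is close to zero for subjects who cannot realistically reach level a, the corresponding weights become extreme, which is one way to see why the static effect is hard to estimate from such data.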
Nonsquare Spectral Factorization for Nonlinear Control Systems
This paper considers nonsquare spectral factorization of nonlinear input affine state space systems in continuous time. More specifically, we obtain a parametrization of nonsquare spectral factors in terms of invariant Lagrangian submanifolds and associated solutions of Hamilton–Jacobi inequalities. This inequality is a nonlinear analogue of the bounded real lemma and the control algebraic Riccati inequality. By way of an application, we discuss an alternative characterization of minimum and maximum phase spectral factors and introduce the notion of a rigid nonlinear system.
Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy
We consider challenges that arise in the estimation of the mean outcome under
an optimal individualized treatment strategy defined as the treatment rule that
maximizes the population mean outcome, where the candidate treatment rules are
restricted to depend on baseline covariates. We prove a necessary and
sufficient condition for the pathwise differentiability of the optimal value, a
key condition needed to develop a regular and asymptotically linear (RAL)
estimator of the optimal value. The stated condition is slightly more general
than the previous condition implied in the literature. We then describe an
approach to obtain root-n rate confidence intervals for the optimal value
even when the parameter is not pathwise differentiable. We provide conditions
under which our estimator is RAL and asymptotically efficient when the mean
outcome is pathwise differentiable. We also outline an extension of our
approach to a multiple time point problem. All of our results are supported by
simulations. Comment: Published at http://dx.doi.org/10.1214/15-AOS1384 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
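The target parameter described in the abstract above can be sketched in assumed notation, where W denotes the baseline covariates and Y_d the outcome that would be observed under treatment rule d; neither symbol appears in the abstract.

```latex
\psi_0 = \max_{d \in \mathcal{D}} E\left[ Y_{d} \right],
\qquad
\mathcal{D} = \{\, d : d \text{ maps baseline covariates } W \text{ to a treatment} \,\},
\qquad
d_0 \in \arg\max_{d \in \mathcal{D}} E\left[ Y_{d} \right].
```

Writing d_0 as an element of the arg max, rather than as equal to it, reflects that the maximizing rule need not be unique, which is the situation named in the title.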