On the power of conditional independence testing under model-X
For testing conditional independence (CI) of a response Y and a predictor X
given covariates Z, the recently introduced model-X (MX) framework has been the
subject of active methodological research, especially in the context of MX
knockoffs and their successful application to genome-wide association studies.
In this paper, we study the power of MX CI tests, yielding quantitative
explanations for empirically observed phenomena and novel insights to guide the
design of MX methodology. We show that any valid MX CI test must also be valid
conditionally on Y and Z; this conditioning allows us to reformulate the
problem as testing a point null hypothesis involving the conditional
distribution of X. The Neyman-Pearson lemma then implies that the conditional
randomization test (CRT) based on a likelihood statistic is the most powerful
MX CI test against a point alternative. We also obtain a related optimality
result for MX knockoffs. Switching to an asymptotic framework with arbitrarily
growing covariate dimension, we derive an expression for the limiting power of
the CRT against local semiparametric alternatives in terms of the prediction
error of the machine learning algorithm on which its test statistic is based.
Finally, we exhibit a resampling-free test with uniform asymptotic Type-I error
control under the assumption that only the first two moments of X given Z are
known, a significant relaxation of the MX assumption.
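As a concrete illustration of the conditional randomization test described above, here is a minimal Python sketch; the Gaussian conditional model for X given Z and the correlation-based statistic are illustrative assumptions standing in for the paper's likelihood statistic.

```python
import numpy as np

def crt_pvalue(x, y, z, sample_x_given_z, stat, n_resamples=500, seed=None):
    """Conditional randomization test (CRT) p-value.

    Validity rests on the model-X assumption: `sample_x_given_z` must draw
    from the true conditional distribution of X given Z.
    """
    rng = np.random.default_rng(seed)
    t_obs = stat(x, y, z)
    # Resample X from its conditional law given Z, holding (Y, Z) fixed,
    # and recompute the test statistic on each synthetic copy.
    t_null = np.array([stat(sample_x_given_z(z, rng), y, z)
                       for _ in range(n_resamples)])
    # Finite-sample-valid p-value with the usual +1 correction.
    return (1 + np.sum(t_null >= t_obs)) / (n_resamples + 1)

# Illustrative conditional model (an assumption, not the paper's):
# X | Z ~ N(Z @ beta, 1).
def gaussian_x_sampler(beta):
    return lambda z, rng: z @ beta + rng.standard_normal(z.shape[0])

# A simple stand-in statistic: |correlation(X, residual of Y on Z)|.
def abs_corr_stat(x, y, z):
    resid = y - z @ np.linalg.lstsq(z, y, rcond=None)[0]
    return abs(np.corrcoef(x, resid)[0, 1])
```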
Minimax testing of a composite null hypothesis defined via a quadratic functional in the model of regression
We consider the problem of testing a particular type of composite null
hypothesis under a nonparametric multivariate regression model. For a given
quadratic functional $Q$, the null hypothesis states that the regression
function $f$ satisfies the constraint $Q[f] = 0$, while the alternative
corresponds to the functions $f$ for which $Q[f]$ is bounded away from zero. On the
one hand, we provide minimax rates of testing and the exact separation
constants, along with a sharp-optimal testing procedure, for diagonal and
nonnegative quadratic functionals. We consider smoothness classes of
ellipsoidal form and check that our conditions are fulfilled in the particular
case of ellipsoids corresponding to anisotropic Sobolev classes. In this case,
we present a closed form of the minimax rate and the separation constant. On
the other hand, minimax rates for quadratic functionals which are neither
positive nor negative give rise to two different regimes: "regular" and
"irregular". In the "regular" case, the minimax rate is equal to $n^{-1/4}$,
while in the "irregular" case, the rate depends on the smoothness class and is
slower than in the "regular" case. We apply this to the issue of testing the
equality of norms of two functions observed in noisy environments.
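In symbols, and with the caveat that the notation below is a plausible reconstruction rather than a quotation, the testing problem has the form:

```latex
% Sketch of the testing problem; the symbols Q (the quadratic functional),
% f (the regression function), rho (the separation radius) and the
% smoothness class \mathcal{F} are reconstructed from the abstract's wording.
\[
  H_0 : Q[f] = 0
  \qquad \text{versus} \qquad
  H_1 : f \in \mathcal{F} \ \text{and} \ |Q[f]| \ge \rho ,
\]
% with Q[f] >= rho for nonnegative functionals and |Q[f]| >= rho in the
% sign-indefinite ("regular"/"irregular") case.
```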
Detection of Sparse Positive Dependence
In a bivariate setting, we consider the problem of detecting a sparse
contamination or mixture component, where the effect manifests itself as a
positive dependence between the variables, which are otherwise independent in
the main component. We first look at this problem in the context of a normal
mixture model. In essence, the situation reduces to a univariate setting where
the effect is a decrease in variance. In particular, a higher criticism test
based on the pairwise differences is shown to achieve the detection boundary
defined by the (oracle) likelihood ratio test. We then turn to a Gaussian
copula model where the marginal distributions are unknown. Standard invariance
considerations lead us to consider rank tests. In fact, a higher criticism test
based on the pairwise rank differences achieves the detection boundary in the
normal mixture model, although not in the very sparse regime. We do not know of
any rank test that has any power in that regime.
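A minimal sketch of a higher criticism statistic of the kind used above; the normal-mixture illustration and the particular choice of p-values are assumptions for demonstration, the paper's tests being built on pairwise differences or pairwise rank differences.

```python
import numpy as np
from scipy.stats import norm

def higher_criticism(pvalues, alpha0=0.5):
    """Donoho-Jin style higher criticism over the smallest p-values.

    HC = max_{i <= alpha0*n} sqrt(n) * (i/n - p_(i)) / sqrt(p_(i)(1 - p_(i))).
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p) + 1e-12)
    return hc[: max(1, int(alpha0 * n))].max()

# Hypothetical normal-mixture illustration: under the null, the scaled
# differences D_i = (X_i - Y_i)/sqrt(2) are standard normal, and a sparse
# positive dependence shows up as a decrease in their variance.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(10_000), rng.standard_normal(10_000)
d = (x - y) / np.sqrt(2)
# P(|N(0,1)| <= |d_i|) is uniform under the null and small when |d_i| is
# small, so it is sensitive to a variance decrease.
pvals = 2 * norm.cdf(np.abs(d)) - 1
print(higher_criticism(pvals))
```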
Goodness-of-fit testing and quadratic functional estimation from indirect observations
We consider the convolution model where i.i.d. random variables $X_i$ having
unknown density $f$ are observed with additive i.i.d. noise, independent of the
$X_i$'s. We assume that the density $f$ belongs to either a Sobolev class or a
class of supersmooth functions. The noise distribution is known and its
characteristic function decays either polynomially or exponentially
asymptotically. We consider the problem of goodness-of-fit testing in the
convolution model. We prove upper bounds for the risk of a test statistic
derived from a kernel estimator of the quadratic functional $\int f^2$ based on
indirect observations. When the unknown density $f$ is sufficiently smoother than the
noise density, we prove that this estimator is $n^{-1/2}$-consistent,
asymptotically normal and efficient (for the variance we compute). Otherwise,
we give nonparametric upper bounds for the risk of the same estimator. We give
an approach unifying the proof of nonparametric minimax lower bounds for both
problems. We establish them for Sobolev densities and for supersmooth densities
less smooth than exponential noise. In the two setups we obtain exact testing
constants associated with the asymptotic minimax rates.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/009053607000000118
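A minimal numerical sketch of estimating the quadratic functional $\int f^2$ from indirect observations, assuming Gaussian noise with known characteristic function; the bandwidth and integration grid are placeholder choices, not the paper's rate-optimal calibration.

```python
import numpy as np

def quadratic_functional_deconv(y, noise_cf, h=0.4, n_grid=1000):
    """Estimate int f^2 from Y_i = X_i + eps_i (noise law known).

    Parseval gives int f^2 = (1/2pi) int |phi_f(t)|^2 dt, and in the
    convolution model E[e^{it(Y_j - Y_k)}] = |phi_f(t)|^2 |phi_eps(t)|^2
    for j != k; the diagonal terms are removed U-statistic style.
    """
    n = y.size
    t = np.linspace(-1.0 / h, 1.0 / h, n_grid)      # spectral cutoff 1/h
    e = np.exp(1j * np.outer(t, y))                  # e^{i t Y_j}
    s = np.abs(e.sum(axis=1)) ** 2 - n               # sum over pairs j != k
    integrand = s / (n * (n - 1)) / np.abs(noise_cf(t)) ** 2
    return np.trapz(integrand, t).real / (2 * np.pi)

# Hypothetical sanity check: X ~ N(0,1) has int f^2 = 1/(2 sqrt(pi)).
rng = np.random.default_rng(1)
y = rng.standard_normal(2000) + 0.3 * rng.standard_normal(2000)
print(quadratic_functional_deconv(y, lambda t: np.exp(-0.5 * (0.3 * t) ** 2)),
      1 / (2 * np.sqrt(np.pi)))
```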
A partially linear approach to modelling the dynamics of spot and futures prices
In this paper we consider the dynamics of spot and futures prices in the presence of arbitrage. We propose a partially linear error correction model in which the adjustment coefficient is allowed to depend non-linearly on the lagged price difference. We estimate our model using data on the DAX index and the DAX futures contract and find that the adjustment is indeed nonlinear: the linear alternative is rejected. The speed of price adjustment increases almost monotonically with the magnitude of the price difference.
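A minimal sketch of the partially linear error-correction idea, with the adjustment coefficient estimated as a smooth function of the lagged spot-futures difference; the kernel estimator, bandwidth, and variable names are illustrative, not the paper's exact specification.

```python
import numpy as np

def adjustment_coefficient(spot, futures, grid, bandwidth=0.5):
    """Kernel sketch of a state-dependent adjustment coefficient alpha(u)
    in  delta_spot_t = alpha(u_{t-1}) * u_{t-1} + error,  where
    u_t = spot_t - futures_t is the spot-futures price difference.
    """
    u = (spot - futures)[:-1]   # lagged price difference u_{t-1}
    dy = np.diff(spot)          # price adjustment delta_spot_t
    alphas = np.empty(len(grid))
    for m, u0 in enumerate(grid):
        w = np.exp(-0.5 * ((u - u0) / bandwidth) ** 2)  # Gaussian kernel
        # kernel-weighted least-squares slope through the origin
        alphas[m] = np.sum(w * u * dy) / (np.sum(w * u ** 2) + 1e-12)
    return alphas
```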
Tight conditions for consistency of variable selection in the context of high dimensionality
We address the issue of variable selection in the regression model with very
high ambient dimension, that is, when the number of variables is very large.
The main focus is on the situation where the number of relevant variables,
called the intrinsic dimension, is much smaller than the ambient dimension $d$.
Without assuming any parametric form of the underlying regression function, we
get tight conditions making it possible to consistently estimate the set of
relevant variables. These conditions relate the intrinsic dimension to the
ambient dimension and to the sample size. The procedure that is provably
consistent under these tight conditions is based on comparing quadratic
functionals of the empirical Fourier coefficients with appropriately chosen
threshold values. The asymptotic analysis reveals the presence of two quite
different regimes. The first regime is when the intrinsic dimension is fixed.
In this case the situation in nonparametric regression is the same as in linear
regression, that is, consistent variable selection is possible if and only if
$\log d$ is small compared to the sample size $n$. The picture is different in the
second regime, that is, when the number of relevant variables, denoted by $s$,
tends to infinity as $n \to \infty$. Then we prove that consistent variable
selection in the nonparametric set-up is possible only if $s + \log\log d$ is small
compared to $\log n$. We apply these results to derive minimax separation rates
for the problem of variable selection.
Comment: arXiv admin note: text overlap with arXiv:1102.3616. Published in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics; DOI: http://dx.doi.org/10.1214/12-AOS1046
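A minimal sketch of the selection rule described above; the uniform design on $[0, 2\pi]^d$, the restriction to a few first-order Fourier coefficients, and the threshold constant are illustrative simplifications of the paper's procedure.

```python
import numpy as np

def select_variables(x, y, n_freq=3, c_thresh=4.0):
    """Flag variable j as relevant when a quadratic functional of the
    empirical Fourier coefficients along coordinate j exceeds a threshold.

    x: (n, d) design, assumed (hypothetically) uniform on [0, 2pi]^d;
    y: (n,) responses.
    """
    n, d = x.shape
    # Squared empirical coefficients have noise level of order 1/n; the
    # log(d) factor and the constant c_thresh are illustrative choices.
    tau = c_thresh * np.log(max(d, 2)) / n
    relevant = []
    for j in range(d):
        coefs = [np.mean(y * np.exp(-1j * k * x[:, j]))
                 for k in range(1, n_freq + 1)]
        if np.sum(np.abs(coefs) ** 2) > tau:   # quadratic functional along j
            relevant.append(j)
    return relevant
```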