
    On the power of conditional independence testing under model-X

    For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the recently introduced model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their successful application to genome-wide association studies. In this paper, we study the power of MX CI tests, yielding quantitative explanations for empirically observed phenomena and novel insights to guide the design of MX methodology. We show that any valid MX CI test must also be valid conditionally on Y and Z; this conditioning allows us to reformulate the problem as testing a point null hypothesis involving the conditional distribution of X. The Neyman-Pearson lemma then implies that the conditional randomization test (CRT) based on a likelihood statistic is the most powerful MX CI test against a point alternative. We also obtain a related optimality result for MX knockoffs. Switching to an asymptotic framework with arbitrarily growing covariate dimension, we derive an expression for the limiting power of the CRT against local semiparametric alternatives in terms of the prediction error of the machine learning algorithm on which its test statistic is based. Finally, we exhibit a resampling-free test with uniform asymptotic Type-I error control under the assumption that only the first two moments of X given Z are known, a significant relaxation of the MX assumption.
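    A toy illustration of the CRT resampling scheme described in this abstract (not the paper's construction): a minimal sketch assuming X | Z is exactly Gaussian with known coefficients `beta`; the particular test statistic and the "+1" p-value correction are standard, generic choices.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def crt_p_value(x, y, z, beta, n_resamples=200):
        """Minimal conditional randomization test (CRT) sketch.

        Assumes the model-X distribution of X given Z is fully known:
        here X | Z ~ N(Z @ beta, 1).  The statistic is the absolute mean
        product of Y with the part of X not explained by Z.
        """
        mu = z @ beta

        def stat(x_):
            return abs(np.mean((x_ - mu) * y))

        t_obs = stat(x)
        # Resample X from its known conditional law given Z, keeping
        # (Y, Z) fixed, and recompute the statistic for each resample.
        t_null = np.array([stat(mu + rng.standard_normal(len(x)))
                           for _ in range(n_resamples)])
        # Finite-sample valid randomization p-value ("+1" correction).
        return (1 + np.sum(t_null >= t_obs)) / (n_resamples + 1)

    n, p = 500, 3
    beta = np.array([1.0, -0.5, 0.25])
    z = rng.standard_normal((n, p))
    x = z @ beta + rng.standard_normal(n)

    y_alt = 0.8 * x + z[:, 0] + rng.standard_normal(n)  # Y depends on X given Z
    y_null = z[:, 0] + rng.standard_normal(n)           # Y independent of X given Z

    p_alt = crt_p_value(x, y_alt, z, beta)
    p_null = crt_p_value(x, y_null, z, beta)
    ```

    Under the alternative the observed statistic dominates essentially all resampled ones, giving a small p-value; under the null the p-value is approximately uniform.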

    Minimax testing of a composite null hypothesis defined via a quadratic functional in the model of regression

    We consider the problem of testing a particular type of composite null hypothesis under a nonparametric multivariate regression model. For a given quadratic functional Q, the null hypothesis states that the regression function f satisfies the constraint Q[f] = 0, while the alternative corresponds to functions for which Q[f] is bounded away from zero. On the one hand, we provide minimax rates of testing and the exact separation constants, along with a sharp-optimal testing procedure, for diagonal and nonnegative quadratic functionals. We consider smoothness classes of ellipsoidal form and check that our conditions are fulfilled in the particular case of ellipsoids corresponding to anisotropic Sobolev classes. In this case, we present a closed form of the minimax rate and the separation constant. On the other hand, minimax rates for quadratic functionals which are neither positive nor negative reveal two different regimes: "regular" and "irregular". In the "regular" case, the minimax rate is n^{-1/4}, while in the "irregular" case the rate depends on the smoothness class and is slower than in the "regular" case. We apply these results to the problem of testing the equality of norms of two functions observed in noisy environments.

    Detection of Sparse Positive Dependence

    In a bivariate setting, we consider the problem of detecting a sparse contamination or mixture component, where the effect manifests itself as a positive dependence between the variables, which are otherwise independent in the main component. We first look at this problem in the context of a normal mixture model. In essence, the situation reduces to a univariate setting where the effect is a decrease in variance. In particular, a higher criticism test based on the pairwise differences is shown to achieve the detection boundary defined by the (oracle) likelihood ratio test. We then turn to a Gaussian copula model where the marginal distributions are unknown. Standard invariance considerations lead us to consider rank tests. In fact, a higher criticism test based on the pairwise rank differences achieves the detection boundary in the normal mixture model, although not in the very sparse regime. We do not know of any rank test that has any power in that regime.
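    A sketch of the higher criticism statistic that this abstract builds on (Donoho–Jin form; the specific simulated contamination below is illustrative, not the paper's bivariate model):

    ```python
    import numpy as np

    def higher_criticism(pvals):
        """Higher criticism statistic:
        HC = max_i sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
        maximised over the smaller half of the sorted p-values,
        restricted to p_(i) > 1/n for stability (the HC+ variant)."""
        p = np.sort(pvals)
        n = len(p)
        i = np.arange(1, n + 1)
        keep = (i <= n // 2) & (p > 1.0 / n) & (p < 1 - 1e-12)
        hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
        return hc[keep].max()

    rng = np.random.default_rng(1)
    n = 10000
    # A sparse contamination shows up as a small fraction of unusually
    # small p-values; the rest are uniform under the null.
    p_null = rng.uniform(size=n)
    p_alt = p_null.copy()
    p_alt[:100] = rng.uniform(0, 1e-4, size=100)  # 1% sparse signal

    hc_null = higher_criticism(p_null)
    hc_alt = higher_criticism(p_alt)
    ```

    Even though the individual contaminated p-values are too few to survey by eye, the HC statistic separates the two samples sharply.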

    Goodness-of-fit testing and quadratic functional estimation from indirect observations

    We consider the convolution model in which i.i.d. random variables X_i with unknown density f are observed with additive i.i.d. noise, independent of the X's. We assume that the density f belongs to either a Sobolev class or a class of supersmooth functions. The noise distribution is known, and its characteristic function decays asymptotically either polynomially or exponentially. We consider the problem of goodness-of-fit testing in the convolution model. We prove upper bounds for the risk of a test statistic derived from a kernel estimator of the quadratic functional ∫ f² based on the indirect observations. When the unknown density is sufficiently smoother than the noise density, we prove that this estimator is n^{-1/2} consistent, asymptotically normal and efficient (with a variance that we compute). Otherwise, we give nonparametric upper bounds for the risk of the same estimator. We give a unified approach to the proof of nonparametric minimax lower bounds for both problems, and establish them for Sobolev densities and for supersmooth densities less smooth than exponential noise. In both setups we obtain exact testing constants associated with the asymptotic minimax rates. Published in the Annals of Statistics (http://dx.doi.org/10.1214/009053607000000118) by the Institute of Mathematical Statistics (http://www.imstat.org/aos/).
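    A simplified Fourier-domain sketch of estimating ∫ f² from indirect observations (the paper uses a kernel estimator; the spectral cut-off T, the Gaussian noise law, and all numbers below are illustrative assumptions). By Plancherel, ∫ f² = (1/2π) ∫ |φ_f(t)|² dt, and |φ_f(t)|² = |φ_Y(t)|² / |φ_ε(t)|² can be estimated without bias by a pairwise U-statistic, truncated to |t| ≤ T to control variance:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    n = 20000
    sigma = 0.5                                # known noise level (assumed)
    x = rng.standard_normal(n)                 # f = N(0,1), so ∫ f² = 1/(2√π)
    y = x + sigma * rng.standard_normal(n)     # indirect observations Y = X + ε

    T = 2.0                                    # spectral cut-off
    t = np.linspace(-T, T, 201)
    dt = t[1] - t[0]

    # Empirical characteristic function of Y on the grid.
    phi_hat = np.array([np.exp(1j * ti * y).mean() for ti in t])
    # Unbiased pairwise (U-statistic) estimate of |φ_Y(t)|²:
    # (1/(n(n-1))) Σ_{j≠k} e^{it(Y_j - Y_k)} = (n|φ̂|² - 1)/(n - 1).
    phi_y_sq = (n * np.abs(phi_hat) ** 2 - 1) / (n - 1)
    # Known Gaussian noise: |φ_ε(t)|² = exp(-σ²t²).
    phi_eps_sq = np.exp(-(sigma * t) ** 2)

    estimate = np.sum(phi_y_sq / phi_eps_sq) * dt / (2 * np.pi)
    truth = 1 / (2 * np.sqrt(np.pi))           # ≈ 0.2821
    ```

    Because f here is much smoother than the noise, a moderate cut-off already makes the truncation bias negligible relative to the n^{-1/2} stochastic error.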

    A partially linear approach to modelling the dynamics of spot and futures prices

    In this paper we consider the dynamics of spot and futures prices in the presence of arbitrage. We propose a partially linear error correction model in which the adjustment coefficient is allowed to depend nonlinearly on the lagged price difference. We estimate our model using data on the DAX index and the DAX futures contract. We find that the adjustment is indeed nonlinear; the linear alternative is rejected. The speed of price adjustment increases almost monotonically with the magnitude of the price difference.
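    A minimal simulated sketch of the idea, assuming (for illustration only, not from the paper) a basis z_t = spot − futures whose adjustment speed jumps once the mispricing exceeds a transaction-cost band; the state-dependent speed is then recovered nonparametrically by kernel regression:

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # Illustrative data-generating process: adjustment is faster when
    # the mispricing |z| is large (outside a transaction-cost band).
    def lam(z):
        return 0.05 + 0.6 * (np.abs(z) > 0.5)

    T = 5000
    z = np.zeros(T)
    for s in range(1, T):
        z[s] = z[s - 1] - lam(z[s - 1]) * z[s - 1] + 0.2 * rng.standard_normal()

    dz, zlag = np.diff(z), z[:-1]

    def adjustment_speed(z0, h=0.1):
        """Nadaraya-Watson estimate of -E[dz_t | z_{t-1} = z0] / z0,
        i.e. the nonparametric speed of price adjustment at z0."""
        w = np.exp(-0.5 * ((zlag - z0) / h) ** 2)
        return -(w @ dz) / (w @ zlag)

    speed_small = adjustment_speed(0.25)  # inside the band: slow adjustment
    speed_large = adjustment_speed(0.7)   # outside the band: fast adjustment
    ```

    The recovered speed is much larger outside the band, mirroring the paper's finding that adjustment speed increases with the magnitude of the price difference.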

    Tight conditions for consistency of variable selection in the context of high dimensionality

    We address the issue of variable selection in the regression model with very high ambient dimension, that is, when the number of variables is very large. The main focus is on the situation where the number of relevant variables, called the intrinsic dimension, is much smaller than the ambient dimension d. Without assuming any parametric form of the underlying regression function, we obtain tight conditions making it possible to consistently estimate the set of relevant variables. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is based on comparing quadratic functionals of the empirical Fourier coefficients with appropriately chosen threshold values. The asymptotic analysis reveals the presence of two quite different regimes. The first regime is when the intrinsic dimension is fixed; in this case the situation in nonparametric regression is the same as in linear regression, that is, consistent variable selection is possible if and only if log d is small compared to the sample size n. The picture is different in the second regime, when the number of relevant variables, denoted by s, tends to infinity as n → ∞. Then we prove that consistent variable selection in the nonparametric setup is possible only if s + log log d is small compared to log n. We apply these results to derive minimax separation rates for the problem of variable selection. (arXiv admin note: text overlap with arXiv:1102.3616.) Published in the Annals of Statistics (http://dx.doi.org/10.1214/12-AOS1046) by the Institute of Mathematical Statistics (http://www.imstat.org/aos/).
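    A heavily simplified, marginal sketch of the thresholding idea: for each candidate variable, a quadratic functional of empirical Fourier (here, cosine-basis) coefficients is compared with a noise-calibrated threshold. The basis, the number of coefficients K, and the threshold constant below are illustrative assumptions; the paper's actual procedure and thresholds handle the full nonparametric high-dimensional setting, including non-additive dependence.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    n, d, K = 2000, 50, 5
    X = rng.uniform(size=(n, d))
    # Only variable 0 is relevant.
    y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * rng.standard_normal(n)

    selected = []
    sigma2_hat = np.var(y)  # crude upper bound on the noise variance
    for j in range(d):
        # Empirical coefficients c_k = mean of Y * sqrt(2) cos(pi k X_j).
        c = np.array([np.mean(y * np.sqrt(2) * np.cos(np.pi * k * X[:, j]))
                      for k in range(1, K + 1)])
        stat = np.sum(c ** 2)
        # For an irrelevant variable, stat concentrates near K*Var(Y)/n;
        # the factor 3 adds slack against random fluctuations.
        if stat > 3 * K * sigma2_hat / n:
            selected.append(j)
    ```

    The relevant variable produces coefficients of constant order, so its quadratic functional dwarfs the O(K/n) threshold, while irrelevant variables rarely cross it.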