A Flexible Framework for Hypothesis Testing in High-dimensions
Hypothesis testing in the linear regression model is a fundamental
statistical problem. We consider linear regression in the high-dimensional
regime where the number of parameters $p$ exceeds the number of samples $n$ ($p > n$).
In order to make informative inference, we assume that the model is
approximately sparse, that is, the effect of covariates on the response can be
well approximated by conditioning on a relatively small number of covariates
whose identities are unknown. We develop a framework for testing very general
hypotheses regarding the model parameters. Our framework encompasses testing
whether the parameter lies in a convex cone, testing the signal strength, and
testing arbitrary functionals of the parameter. We show that the proposed
procedure controls the type I error, and also analyze the power of the
procedure. Our numerical experiments confirm our theoretical findings and
demonstrate that we control the false positive rate (type I error) near the
nominal level and have high power. By the duality between hypothesis testing and
confidence intervals, the proposed framework can be used to obtain valid
confidence intervals for various functionals of the model parameters. For
linear functionals, the length of confidence intervals is shown to be minimax
rate optimal.
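As a side note on the duality invoked above (a standard fact, not a result specific to this paper): a family of level-$\alpha$ tests for $H_0\colon f(\theta) = c$ can be inverted into a confidence set for the functional $f(\theta)$.

```latex
% Test inversion: collect every value c that the level-alpha test
% T_alpha(c) fails to reject; the resulting set covers f(theta)
% with probability at least 1 - alpha.
\[
  C_{1-\alpha} = \bigl\{ c : T_\alpha(c) \text{ does not reject } H_0\colon f(\theta) = c \bigr\},
  \qquad
  \mathbb{P}\bigl( f(\theta) \in C_{1-\alpha} \bigr) \ge 1 - \alpha .
\]
```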
Optimal Sparsity Testing in Linear Regression Model
We consider the problem of sparsity testing in the high-dimensional linear
regression model. The problem is to test whether the number of non-zero
components (aka the sparsity) of the regression parameter $\theta$ is less
than or equal to $k_0$. We pinpoint the minimax separation distances for this
problem, which amounts to quantifying how far a $k_1$-sparse vector $\theta$
has to be from the set of $k_0$-sparse vectors so that a test is able to reject
the null hypothesis with high probability. Two scenarios are considered. In the
independent scenario, the covariates are i.i.d. normally distributed and the
noise level is known. In the general scenario, both the covariance matrix of
the covariates and the noise level are unknown. Although the minimax separation
distances differ in these two scenarios, both of them actually depend on $k_0$
and $k_1$, illustrating that for this composite-composite testing problem both
the size of the null and of the alternative hypotheses play a key role.
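In symbols, with $k_0$ and $k_1$ as above (a schematic formulation; the paper's exact normalizations are not reproduced here):

```latex
% Null: theta is k0-sparse. Alternative: theta is k1-sparse and at
% least rho away, in Euclidean distance, from every k0-sparse vector.
\[
  H_0\colon \|\theta\|_0 \le k_0
  \quad \text{vs.} \quad
  H_1\colon \|\theta\|_0 \le k_1
  \ \text{and}\
  \inf_{\|u\|_0 \le k_0} \|\theta - u\|_2 \ge \rho ,
\]
% and the minimax separation distance rho^*(k0, k1) is the smallest
% rho at which some test keeps both error probabilities small.
```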
Constrained High Dimensional Statistical Inference
In typical high dimensional statistical inference problems, confidence
intervals and hypothesis tests are performed for a low dimensional subset of
model parameters under the assumption that the parameters of interest are
unconstrained. However, in many problems, there are natural constraints on
model parameters and one is interested in whether the parameters are on the
boundary of the constraint or not, e.g., non-negativity constraints for
transmission rates in network diffusion. In this paper, we provide algorithms
to solve this problem of hypothesis testing in high-dimensional statistical
models under a constrained parameter space. We show that our testing
procedure attains the designed Type I error asymptotically under the null.
Numerical experiments demonstrate that our algorithm has greater power than the
standard algorithms where the constraints are ignored. We demonstrate the
effectiveness of our algorithms on two real datasets where we have
intrinsic constraints on the parameters.
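As a rough illustration only (not the paper's algorithm, which exploits the constraint geometry): given a debiased coordinate estimate and its standard error, a one-sided test against a non-negativity boundary could be assembled as below; the names theta_debiased_j and se_j are hypothetical.

```python
# Hypothetical sketch: one-sided z-test of H0: theta_j = 0 (the boundary
# of a non-negativity constraint) against H1: theta_j > 0.
from scipy.stats import norm

def boundary_test(theta_debiased_j: float, se_j: float, alpha: float = 0.05):
    """Return the one-sided p-value and the reject decision."""
    z = theta_debiased_j / se_j
    p_value = 1.0 - norm.cdf(z)      # reject for large positive z
    return p_value, p_value < alpha

# Example: debiased estimate 0.12 with standard error 0.04 (z = 3.0).
p, reject = boundary_test(0.12, 0.04)
print(f"p-value = {p:.4f}, reject H0: {reject}")
```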
Goodness-of-fit testing in high-dimensional generalized linear models
We propose a family of tests to assess the goodness-of-fit of a
high-dimensional generalized linear model. Our framework is flexible and may be
used to construct an omnibus test, to target specific non-linearities
and interaction effects, or to test the significance of
groups of variables. The methodology is based on extracting left-over signal in
the residuals from an initial fit of a generalized linear model. This can be
achieved by predicting this signal from the residuals using modern flexible
regression or machine learning methods such as random forests or boosted trees.
Under the null hypothesis that the generalized linear model is correct, no
signal is left in the residuals and our test statistic has a Gaussian limiting
distribution, translating to asymptotic control of type I error. Under a local
alternative, we establish a guarantee on the power of the test. We illustrate
the effectiveness of the methodology on simulated and real data examples by
testing goodness-of-fit in logistic regression models. Software implementing
the methodology is available in the R package `GRPtests'.
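A minimal sketch of the residual-prediction idea, written in Python with scikit-learn rather than the authors' R package, and omitting the sample splitting and calibration that make the actual test valid:

```python
# Hypothetical sketch: fit a penalized logistic regression, then ask a
# flexible learner (here a random forest) whether any signal is left in
# the residuals. A real test needs sample splitting and a studentized
# statistic; this only illustrates the mechanics.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
logits = X[:, 0] + 0.5 * X[:, 1] ** 2        # quadratic term the GLM misses
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

glm = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
resid = y - glm.predict_proba(X)[:, 1]       # raw residuals

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, resid)
leftover = rf.predict(X)

# Informal check: if the GLM were correct, the predicted leftover signal
# should be (nearly) uncorrelated with the residuals.
print(f"corr(leftover, resid) = {np.corrcoef(leftover, resid)[0, 1]:.3f}")
```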
Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications
We consider statistical inference for the explained variance
$\beta^{\top}\Sigma\beta$ under the high-dimensional linear model
$y = X\beta + \epsilon$ in the semi-supervised setting, where $\beta$ is the
regression vector and $\Sigma$ is the design covariance matrix. A calibrated
estimator, which efficiently integrates both labelled and unlabelled data, is
proposed. It is shown that the estimator achieves the minimax optimal rate of
convergence in the general semi-supervised framework. The optimality result
characterizes how the unlabelled data affects the minimax optimal rate.
Moreover, the limiting distribution for the proposed estimator is established
and data-driven confidence intervals for the explained variance are
constructed. We further develop a randomized calibration technique for
statistical inference in the presence of weak signals and apply the obtained
inference results to a range of important statistical problems, including
signal detection and global testing, prediction accuracy evaluation, and
confidence ball construction. The numerical performance of the proposed
methodology is demonstrated in simulation studies and an analysis of estimating
heritability for a yeast segregant data set with multiple traits.
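For reference, the standard identities behind "explained variance" in this notation (assuming covariates with covariance $\Sigma$ independent of the noise; these are textbook definitions, not additional results from the paper):

```latex
% Explained variance of y = X beta + eps with noise variance sigma^2,
% and the heritability-type ratio estimated in the genetics application.
\[
  Q = \beta^{\top}\Sigma\beta,
  \qquad
  \operatorname{Var}(y_i) = \beta^{\top}\Sigma\beta + \sigma^2,
  \qquad
  h^2 = \frac{\beta^{\top}\Sigma\beta}{\beta^{\top}\Sigma\beta + \sigma^2}.
\]
```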
Online Debiasing for Adaptively Collected High-dimensional Data with Applications to Time Series Analysis
Adaptive collection of data is commonplace in applications throughout science
and engineering. From the point of view of statistical inference, however,
adaptive data collection induces memory and correlation in the samples, and
poses significant challenges. We consider high-dimensional linear
regression where the samples are collected adaptively and the sample size
$n$ can be smaller than $p$, the number of covariates. In this setting, there are
two distinct sources of bias: the first due to regularization imposed for
consistent estimation, e.g. using the LASSO, and the second due to adaptivity
in collecting the samples. We propose "online debiasing", a general procedure
for estimators such as the LASSO, which addresses both sources of bias. In two
concrete contexts, time series analysis and batched data
collection, we demonstrate that online debiasing optimally debiases the LASSO
estimate when the underlying parameter has sparsity of order
$o(\sqrt{n}/\log p)$. In this regime, the debiased estimator can be used to
compute $p$-values and confidence intervals of optimal size.
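For contrast with the online procedure described above, here is a hypothetical sketch of the standard offline debiased-LASSO construction on i.i.d. (non-adaptive) data; the crude ridge-regularized choice of the decorrelating matrix M stands in for the convex programs used in practice, and none of this reflects the paper's online modification:

```python
# Hypothetical sketch of *offline* debiased LASSO on i.i.d. data.
# Online debiasing replaces M with a construction adapted to the
# data-collection filtration; that is not shown here.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 300, 5
X = rng.normal(size=(n, p))
theta = np.zeros(p)
theta[:s] = 1.0                              # s-sparse ground truth
y = X @ theta + rng.normal(size=n)

theta_hat = Lasso(alpha=0.1).fit(X, y).coef_

Sigma_hat = X.T @ X / n
# Crude decorrelating matrix: ridge-regularized inverse of Sigma_hat.
M = np.linalg.inv(Sigma_hat + 0.5 * np.eye(p))

# One-step correction built from the residuals removes the
# regularization-induced bias up to higher-order terms.
theta_debiased = theta_hat + M @ X.T @ (y - X @ theta_hat) / n

print("lasso error on support:   ", np.round(theta_hat[:s] - 1.0, 3))
print("debiased error on support:", np.round(theta_debiased[:s] - 1.0, 3))
```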