Hazard models with varying coefficients for multivariate failure time data
Statistical estimation and inference for marginal hazard models with varying
coefficients for multivariate failure time data are important subjects in
survival analysis. A local pseudo-partial likelihood procedure is proposed for
estimating the unknown coefficient functions. A weighted average estimator is
also proposed in an attempt to improve the efficiency of the estimator. The
consistency and asymptotic normality of the proposed estimators are established
and standard error formulas for the estimated coefficients are derived and
empirically tested. To reduce the computational burden of the maximum local
pseudo-partial likelihood estimator, a simple and useful one-step estimator is
proposed. Statistical properties of the one-step estimator are established and
simulation studies are conducted to compare the performance of the one-step
estimator to that of the maximum local pseudo-partial likelihood estimator. The
results show that the one-step estimator can save computational cost without
compromising performance, both asymptotically and empirically, and that an
optimal weighted average estimator is more efficient than the maximum local
pseudo-partial likelihood estimator. A data set from the Busselton Population
Health Surveys is analyzed to illustrate our proposed methodology.
Comment: Published at http://dx.doi.org/10.1214/009053606000001145 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
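The abstract does not spell out the one-step construction, but one-step
estimators of this kind are typically a single Newton-Raphson update from a
consistent initial estimate. A minimal sketch in Python, where `score` and
`hessian` are hypothetical placeholders for the gradient and Hessian of the
local log pseudo-partial likelihood (not the paper's notation):

    import numpy as np

    def one_step_estimator(beta0, score, hessian):
        # One Newton-Raphson step from the initial estimate beta0:
        #   beta1 = beta0 - H(beta0)^{-1} g(beta0)
        # score(b):   gradient of the log pseudo-partial likelihood at b
        # hessian(b): its Hessian matrix at b
        g = score(beta0)
        H = hessian(beta0)
        return beta0 - np.linalg.solve(H, g)  # solve, don't invert

Avoiding full iteration is what saves the computational cost: the local fit
must be repeated at many grid points, and a single update per point replaces
an iterative optimization at each.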
Local partial-likelihood estimation for lifetime data
This paper considers a proportional hazards model, which allows one to
examine the extent to which covariates interact nonlinearly with an exposure
variable, for analysis of lifetime data. A local partial-likelihood technique
is proposed to estimate nonlinear interactions. Asymptotic normality of the
proposed estimator is established. The baseline hazard function, the bias and
the variance of the local likelihood estimator are consistently estimated. In
addition, a one-step local partial-likelihood estimator is presented to
facilitate the computation of the proposed procedure and is demonstrated to be
as efficient as the fully iterated local partial-likelihood estimator.
Furthermore, a penalized local likelihood estimator is proposed to select
important risk variables in the model. Numerical examples are used to
illustrate the effectiveness of the proposed procedures.
Comment: Published at http://dx.doi.org/10.1214/009053605000000796 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
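As a rough illustration of the localization step, the sketch below writes a
kernel-weighted Cox negative log partial likelihood centered at an exposure
value w0. It is a simplified local-constant version assuming no tied event
times, with placeholder names (time, event, x, w, h for bandwidth); the
paper's estimator is the local-polynomial analogue with the asymptotics
developed there:

    import numpy as np

    def epanechnikov(u):
        return 0.75 * np.maximum(1.0 - u**2, 0.0)

    def local_neg_log_partial_lik(beta, time, event, x, w, w0, h):
        # Kernel weights localize the fit around the exposure value w0.
        k = epanechnikov((w - w0) / h)
        eta = x @ beta                       # linear predictor
        order = np.argsort(time)             # risk set R(t_i) = {j: t_j >= t_i}
        eta, event, k = eta[order], event[order], k[order]
        # log of the kernel-weighted risk-set sum, accumulated from the end
        lw = eta + np.log(np.maximum(k, 1e-300))
        log_risk = np.logaddexp.accumulate(lw[::-1])[::-1]
        keep = (event == 1) & (k > 0)
        return -np.sum(k[keep] * (eta[keep] - log_risk[keep]))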
Adaptive Huber Regression
Big data can easily be contaminated by outliers or contain variables with
heavy-tailed distributions, which makes many conventional methods inadequate.
To address this challenge, we propose the adaptive Huber regression for robust
estimation and inference. The key observation is that the robustification
parameter should adapt to the sample size, dimension and moments for optimal
tradeoff between bias and robustness. Our theoretical framework deals with
heavy-tailed distributions with bounded $(1+\delta)$-th moment for any
$\delta > 0$. We establish a sharp phase transition for robust estimation
of regression parameters in both low and high dimensions: when $\delta \ge 1$,
the estimator admits a sub-Gaussian-type deviation bound without
sub-Gaussian assumptions on the data, while only a slower rate is available
in the regime $0 < \delta < 1$.
Furthermore, this transition is smooth and optimal. In addition, we extend the
methodology to allow both heavy-tailed predictors and observation noise.
Simulation studies lend further support to the theory. In a genetic study of
cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown
to be more robust and predictive.
Comment: final version
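A minimal sketch of the method's central idea, namely that the
robustification parameter tau grows with n and shrinks with d rather than
staying fixed. The pilot OLS fit, the constant c, and the exact rate below
are illustrative choices, not the paper's prescription (in practice tau is
often tuned by cross-validation):

    import numpy as np
    from scipy.optimize import minimize

    def huber_loss(r, tau):
        a = np.abs(r)
        return np.where(a <= tau, 0.5 * r**2, tau * a - 0.5 * tau**2)

    def adaptive_huber_regression(X, y, c=1.0):
        n, d = X.shape
        beta0, *_ = np.linalg.lstsq(X, y, rcond=None)  # pilot fit
        sigma = np.std(y - X @ beta0)                  # crude noise scale
        # tau adapts to (n, d): of order sigma * sqrt(n / (d + log n))
        # for noise with finite variance (the delta >= 1 regime).
        tau = c * sigma * np.sqrt(n / (d + np.log(n)))
        obj = lambda b: huber_loss(y - X @ b, tau).sum()
        return minimize(obj, beta0, method="BFGS").x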
Are Discoveries Spurious? Distributions of Maximum Spurious Correlations and Their Applications
Over the last two decades, many exciting variable selection methods have been
developed for finding a small group of covariates that are associated with the
response from a large pool. Can the discoveries from these data mining
approaches be spurious due to high dimensionality and limited sample size? Can
our fundamental assumptions about the exogeneity of the covariates needed for
such variable selection be validated with the data? To answer these questions,
we need to derive the distributions of the maximum spurious correlations given
a certain number of predictors, namely, the distribution of the correlation
of a response variable $Y$ with the best $s$ linear combinations of $p$
covariates $\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent.
When the covariance matrix of $\mathbf{X}$ possesses the restricted
eigenvalue property, we derive such distributions for both a finite $s$ and
a diverging $s$, using Gaussian approximation and empirical process
techniques. However, such a distribution depends on the unknown covariance
matrix of $\mathbf{X}$. Hence,
we use the multiplier bootstrap procedure to approximate the unknown
distributions and establish the consistency of such a simple bootstrap
approach. The results are further extended to the situation where the residuals
are from regularized fits. Our approach is then used to construct the upper
confidence limit for the maximum spurious correlation and to test the
exogeneity of the covariates. The former provides a baseline for guarding
against false discoveries and the latter tests whether our fundamental
assumptions for high-dimensional model selection are statistically valid. Our
techniques and results are illustrated with both numerical examples and real
data analysis.
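For the simplest case $s = 1$ (the best single covariate), the multiplier
bootstrap idea can be sketched as follows: standardize the data, perturb
with i.i.d. Gaussian multipliers, and read off a high quantile of the
resulting maximum. This is an illustrative reduction, not the paper's
general procedure for arbitrary $s$ and regularized residuals:

    import numpy as np

    def max_spurious_corr_ucl(X, y, alpha=0.05, B=2000, seed=0):
        # Multiplier-bootstrap upper confidence limit for the maximum
        # absolute correlation between y and any single column of X
        # under independence (the s = 1 spurious-correlation baseline).
        rng = np.random.default_rng(seed)
        n, p = X.shape
        Xs = (X - X.mean(0)) / X.std(0)
        ys = (y - y.mean()) / y.std()
        stats = np.empty(B)
        for b in range(B):
            e = rng.standard_normal(n)           # Gaussian multipliers
            stats[b] = np.abs(Xs.T @ (e * ys)).max() / n
        return np.quantile(stats, 1 - alpha)

A selected covariate whose sample correlation with the response falls below
this limit cannot be distinguished from a spurious discovery, which is the
baseline for guarding against false discoveries that the abstract refers to.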
A New Perspective on Robust $M$-Estimation: Finite Sample Theory and Applications to Dependence-Adjusted Multiple Testing
Heavy-tailed errors impair the accuracy of the least squares estimate, which
can be spoiled by a single grossly outlying observation. As argued in the
seminal work of Peter Huber in 1973 [{\it Ann. Statist.} {\bf 1} (1973)
799--821], robust alternatives to the method of least squares are sorely
needed. To achieve robustness against heavy-tailed sampling distributions, we
revisit the Huber estimator from a new perspective by letting the tuning
parameter involved diverge with the sample size. In this paper, we develop
nonasymptotic concentration results for such an adaptive Huber estimator,
namely, the Huber estimator with the tuning parameter adapted to sample size,
dimension, and the variance of the noise. Specifically, we obtain a
sub-Gaussian-type deviation inequality and a nonasymptotic Bahadur
representation when noise variables only have finite second moments. The
nonasymptotic results further yield two conventional normal approximation
results that are of independent interest, the Berry-Esseen inequality and
Cram\'er-type moderate deviation. As an important application to large-scale
simultaneous inference, we apply these robust normal approximation results to
analyze a dependence-adjusted multiple testing procedure for moderately
heavy-tailed data. It is shown that the robust dependence-adjusted procedure
asymptotically controls the overall false discovery proportion at the nominal
level under mild moment conditions. Thorough numerical results on both
simulated and real datasets are also provided to back up our theory.
Comment: Ann. Statist. (in press)
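The simplest instance of the diverging-parameter idea is the Huber estimate
of a mean: as tau grows with n, the bias from truncation vanishes while
moderate outliers are still clipped. A sketch under illustrative tuning
(the paper's theory picks tau of order $\sigma\sqrt{n/t}$ for a target
deviation level $t$; the rate below is a stand-in):

    import numpy as np

    def adaptive_huber_mean(x, max_iter=100, tol=1e-10):
        # Solves sum_i psi_tau(x_i - mu) = 0, where psi_tau clips the
        # residual at +/- tau; tau diverges with n so the bias -> 0.
        n = x.size
        tau = np.std(x) * np.sqrt(n / np.log(n))   # illustrative rate
        mu = np.median(x)                          # robust starting point
        for _ in range(max_iter):
            step = np.clip(x - mu, -tau, tau).mean()
            mu += step
            if abs(step) < tol:
                break
        return mu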
FarmTest: Factor-Adjusted Robust Multiple Testing with Approximate False Discovery Control
Large-scale multiple testing with correlated and heavy-tailed data arises in
a wide range of research areas from genomics, medical imaging to finance.
Conventional methods for estimating the false discovery proportion (FDP) often
ignore the effect of heavy-tailedness and the dependence structure among test
statistics, and thus may lead to inefficient or even inconsistent estimation.
Also, the commonly imposed joint normality assumption is arguably too stringent
for many applications. To address these challenges, in this paper we propose a
Factor-Adjusted Robust Multiple Testing (FarmTest) procedure for large-scale
simultaneous inference with control of the false discovery proportion. We
demonstrate that robust factor adjustments are extremely important in both
controlling the FDP and improving the power. We identify general conditions
under which the proposed method produces consistent estimate of the FDP. As a
byproduct that is of independent interest, we establish an exponential-type
deviation inequality for a robust $U$-type covariance estimator under the
spectral norm. Extensive numerical experiments demonstrate the advantage of the
proposed method over several state-of-the-art methods especially when the data
are generated from heavy-tailed distributions. The proposed procedures are
implemented in the R-package FarmTest.
Comment: 52 pages, 9 figures
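A stripped-down sketch of the factor-adjustment pipeline: estimate the
common factor component by PCA, remove it, and test each coordinate's mean
on the adjusted data. The actual FarmTest procedure replaces the plain
means, t-tests, and Benjamini-Hochberg step used below with robust
(Huber-type) estimates and explicit FDP control; everything here is
illustrative:

    import numpy as np
    from scipy import stats

    def farmtest_sketch(X, n_factors=2, alpha=0.05):
        # X: n observations x p variables; test H0_j: mean_j = 0.
        n, p = X.shape
        Xc = X - X.mean(0)
        # 1. PCA estimate of the common (factor-driven) component.
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        common = (U[:, :n_factors] * S[:n_factors]) @ Vt[:n_factors]
        X_adj = X - common                       # factor-adjusted data
        # 2. Coordinate-wise t statistics on the adjusted data.
        t = X_adj.mean(0) / (X_adj.std(0, ddof=1) / np.sqrt(n))
        pvals = 2 * stats.t.sf(np.abs(t), df=n - 1)
        # 3. Benjamini-Hochberg step-up at level alpha.
        order = np.argsort(pvals)
        passed = pvals[order] <= alpha * np.arange(1, p + 1) / p
        k = passed.nonzero()[0].max() + 1 if passed.any() else 0
        reject = np.zeros(p, dtype=bool)
        reject[order[:k]] = True
        return reject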
FENDI: High-Fidelity Entanglement Distribution in the Quantum Internet
A quantum network distributes quantum entanglements between remote nodes,
which is key to many quantum applications. However, unavoidable noise in
quantum operations could lead to both low throughput and low quality of
entanglement distribution. This paper aims to address the simultaneous
exponential degradation in throughput and quality in a buffered multi-hop
quantum network. Based on an end-to-end fidelity model with worst-case
(isotropic) noise, we formulate the high-fidelity remote entanglement
distribution problem for a single source-destination pair, and prove its
NP-hardness. To address the problem, we develop a fully polynomial-time
approximation scheme for the control plane of the quantum network, and a
distributed data plane protocol that achieves the desired long-term throughput
and worst-case fidelity based on control plane outputs. To evaluate our
algorithm and protocol, we develop a discrete-time quantum network simulator.
Simulation results show the superior performance of our approach compared to
existing fidelity-agnostic and fidelity-aware solutions.
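The exponential degradation in quality can be made concrete with the
standard fidelity-composition rule for entanglement swapping of Werner
pairs, which is consistent with the worst-case isotropic-noise model the
paper assumes (ideal swap operations; the link fidelities below are
made-up inputs):

    from functools import reduce

    def swap_fidelity(f1, f2):
        # Fidelity after swapping two Werner (isotropic-noise) pairs.
        return f1 * f2 + (1.0 - f1) * (1.0 - f2) / 3.0

    def end_to_end_fidelity(link_fidelities):
        # Fold the pairwise rule along a multi-hop path.
        return reduce(swap_fidelity, link_fidelities)

    # Five hops at link fidelity 0.95 already fall to about 0.78, and
    # the decay toward the 1/4 floor is exponential in the hop count.
    print(end_to_end_fidelity([0.95] * 5))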