Hazard models with varying coefficients for multivariate failure time data
Statistical estimation and inference for marginal hazard models with varying
coefficients for multivariate failure time data are important subjects in
survival analysis. A local pseudo-partial likelihood procedure is proposed for
estimating the unknown coefficient functions. A weighted average estimator is
also proposed in an attempt to improve the efficiency of the estimator. The
consistency and asymptotic normality of the proposed estimators are established
and standard error formulas for the estimated coefficients are derived and
empirically tested. To reduce the computational burden of the maximum local
pseudo-partial likelihood estimator, a simple and useful one-step estimator is
proposed. Statistical properties of the one-step estimator are established and
simulation studies are conducted to compare the performance of the one-step
estimator to that of the maximum local pseudo-partial likelihood estimator. The
results show that the one-step estimator can save computational cost without
compromising performance, both asymptotically and empirically, and that an
optimal weighted average estimator is more efficient than the maximum local
pseudo-partial likelihood estimator. A data set from the Busselton Population
Health Surveys is analyzed to illustrate our proposed methodology.
Comment: Published at http://dx.doi.org/10.1214/009053606000001145 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
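The abstract does not spell out the one-step construction, but one-step
estimators of this kind are typically a single Newton-Raphson update from a
consistent initial estimate. A minimal sketch in Python, where `score` and
`hessian` are hypothetical placeholders for the gradient and Hessian of the
local log pseudo-partial likelihood (not the paper's notation):

    import numpy as np

    def one_step_estimator(beta0, score, hessian):
        # One Newton-Raphson step from the initial estimate beta0:
        #   beta1 = beta0 - H(beta0)^{-1} g(beta0)
        # score(b):   gradient of the log pseudo-partial likelihood at b
        # hessian(b): its Hessian matrix at b
        g = score(beta0)
        H = hessian(beta0)
        return beta0 - np.linalg.solve(H, g)  # solve, don't invert

Avoiding full iteration is what saves the computational cost: the local fit
must be repeated at many grid points, and a single update per point replaces
an iterative optimization at each.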
Local partial-likelihood estimation for lifetime data
This paper considers a proportional hazards model, which allows one to
examine the extent to which covariates interact nonlinearly with an exposure
variable, for analysis of lifetime data. A local partial-likelihood technique
is proposed to estimate nonlinear interactions. Asymptotic normality of the
proposed estimator is established. The baseline hazard function, the bias and
the variance of the local likelihood estimator are consistently estimated. In
addition, a one-step local partial-likelihood estimator is presented to
facilitate the computation of the proposed procedure and is demonstrated to be
as efficient as the fully iterated local partial-likelihood estimator.
Furthermore, a penalized local likelihood estimator is proposed to select
important risk variables in the model. Numerical examples are used to
illustrate the effectiveness of the proposed procedures.
Comment: Published at http://dx.doi.org/10.1214/009053605000000796 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
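As a rough illustration of the localization step, the sketch below writes a
kernel-weighted Cox negative log partial likelihood centered at an exposure
value w0. It is a simplified local-constant version assuming no tied event
times, with placeholder names (time, event, x, w, h for bandwidth); the
paper's estimator is the local-polynomial analogue with the asymptotics
developed there:

    import numpy as np

    def epanechnikov(u):
        return 0.75 * np.maximum(1.0 - u**2, 0.0)

    def local_neg_log_partial_lik(beta, time, event, x, w, w0, h):
        # Kernel weights localize the fit around the exposure value w0.
        k = epanechnikov((w - w0) / h)
        eta = x @ beta                       # linear predictor
        order = np.argsort(time)             # risk set R(t_i) = {j: t_j >= t_i}
        eta, event, k = eta[order], event[order], k[order]
        # log of the kernel-weighted risk-set sum, accumulated from the end
        lw = eta + np.log(np.maximum(k, 1e-300))
        log_risk = np.logaddexp.accumulate(lw[::-1])[::-1]
        keep = (event == 1) & (k > 0)
        return -np.sum(k[keep] * (eta[keep] - log_risk[keep]))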
Adaptive Huber Regression
Big data can easily be contaminated by outliers or contain variables with
heavy-tailed distributions, which makes many conventional methods inadequate.
To address this challenge, we propose the adaptive Huber regression for robust
estimation and inference. The key observation is that the robustification
parameter should adapt to the sample size, dimension and moments for optimal
tradeoff between bias and robustness. Our theoretical framework deals with
heavy-tailed distributions with bounded $(1+\delta)$-th moment for any
$\delta > 0$. We establish a sharp phase transition for robust estimation
of regression parameters in both low and high dimensions: when $\delta \ge 1$,
the estimator admits a sub-Gaussian-type deviation bound without
sub-Gaussian assumptions on the data, while only a slower rate is available
in the regime $0 < \delta < 1$.
Furthermore, this transition is smooth and optimal. In addition, we extend the
methodology to allow both heavy-tailed predictors and observation noise.
Simulation studies lend further support to the theory. In a genetic study of
cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown
to be more robust and predictive.
Comment: final version
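A minimal sketch of the method's central idea, namely that the
robustification parameter tau grows with n and shrinks with d rather than
staying fixed. The pilot OLS fit, the constant c, and the exact rate below
are illustrative choices, not the paper's prescription (in practice tau is
often tuned by cross-validation):

    import numpy as np
    from scipy.optimize import minimize

    def huber_loss(r, tau):
        a = np.abs(r)
        return np.where(a <= tau, 0.5 * r**2, tau * a - 0.5 * tau**2)

    def adaptive_huber_regression(X, y, c=1.0):
        n, d = X.shape
        beta0, *_ = np.linalg.lstsq(X, y, rcond=None)  # pilot fit
        sigma = np.std(y - X @ beta0)                  # crude noise scale
        # tau adapts to (n, d): of order sigma * sqrt(n / (d + log n))
        # for noise with finite variance (the delta >= 1 regime).
        tau = c * sigma * np.sqrt(n / (d + np.log(n)))
        obj = lambda b: huber_loss(y - X @ b, tau).sum()
        return minimize(obj, beta0, method="BFGS").x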
Are Discoveries Spurious? Distributions of Maximum Spurious Correlations and Their Applications
Over the last two decades, many exciting variable selection methods have been
developed for finding a small group of covariates that are associated with the
response from a large pool. Can the discoveries from these data mining
approaches be spurious due to high dimensionality and limited sample size? Can
our fundamental assumptions about the exogeneity of the covariates needed for
such variable selection be validated with the data? To answer these questions,
we need to derive the distributions of the maximum spurious correlations given
a certain number of predictors, namely, the distribution of the correlation
of a response variable $Y$ with the best $s$ linear combinations of $p$
covariates $\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent.
When the covariance matrix of $\mathbf{X}$ possesses the restricted
eigenvalue property, we derive such distributions for both a finite $s$ and
a diverging $s$, using Gaussian approximation and empirical process
techniques. However, such a distribution depends on the unknown covariance
matrix of $\mathbf{X}$. Hence,
we use the multiplier bootstrap procedure to approximate the unknown
distributions and establish the consistency of such a simple bootstrap
approach. The results are further extended to the situation where the residuals
are from regularized fits. Our approach is then used to construct the upper
confidence limit for the maximum spurious correlation and to test the
exogeneity of the covariates. The former provides a baseline for guarding
against false discoveries and the latter tests whether our fundamental
assumptions for high-dimensional model selection are statistically valid. Our
techniques and results are illustrated with both numerical examples and real
data analysis.
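For the simplest case $s = 1$ (the best single covariate), the multiplier
bootstrap idea can be sketched as follows: standardize the data, perturb
with i.i.d. Gaussian multipliers, and read off a high quantile of the
resulting maximum. This is an illustrative reduction, not the paper's
general procedure for arbitrary $s$ and regularized residuals:

    import numpy as np

    def max_spurious_corr_ucl(X, y, alpha=0.05, B=2000, seed=0):
        # Multiplier-bootstrap upper confidence limit for the maximum
        # absolute correlation between y and any single column of X
        # under independence (the s = 1 spurious-correlation baseline).
        rng = np.random.default_rng(seed)
        n, p = X.shape
        Xs = (X - X.mean(0)) / X.std(0)
        ys = (y - y.mean()) / y.std()
        stats = np.empty(B)
        for b in range(B):
            e = rng.standard_normal(n)           # Gaussian multipliers
            stats[b] = np.abs(Xs.T @ (e * ys)).max() / n
        return np.quantile(stats, 1 - alpha)

A selected covariate whose sample correlation with the response falls below
this limit cannot be distinguished from a spurious discovery, which is the
baseline for guarding against false discoveries that the abstract refers to.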
A New Perspective on Robust $M$-Estimation: Finite Sample Theory and Applications to Dependence-Adjusted Multiple Testing
Heavy-tailed errors impair the accuracy of the least squares estimate, which
can be spoiled by a single grossly outlying observation. As argued in the
seminal work of Peter Huber in 1973 [{\it Ann. Statist.} {\bf 1} (1973)
799--821], robust alternatives to the method of least squares are sorely
needed. To achieve robustness against heavy-tailed sampling distributions, we
revisit the Huber estimator from a new perspective by letting the tuning
parameter involved diverge with the sample size. In this paper, we develop
nonasymptotic concentration results for such an adaptive Huber estimator,
namely, the Huber estimator with the tuning parameter adapted to sample size,
dimension, and the variance of the noise. Specifically, we obtain a
sub-Gaussian-type deviation inequality and a nonasymptotic Bahadur
representation when noise variables only have finite second moments. The
nonasymptotic results further yield two conventional normal approximation
results that are of independent interest, the Berry-Esseen inequality and
Cram\'er-type moderate deviation. As an important application to large-scale
simultaneous inference, we apply these robust normal approximation results to
analyze a dependence-adjusted multiple testing procedure for moderately
heavy-tailed data. It is shown that the robust dependence-adjusted procedure
asymptotically controls the overall false discovery proportion at the nominal
level under mild moment conditions. Thorough numerical results on both
simulated and real datasets are also provided to back up our theory.
Comment: Ann. Statist. (in press)
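The simplest instance of the diverging-parameter idea is the Huber estimate
of a mean: as tau grows with n, the bias from truncation vanishes while
moderate outliers are still clipped. A sketch under illustrative tuning
(the paper's theory picks tau of order $\sigma\sqrt{n/t}$ for a target
deviation level $t$; the rate below is a stand-in):

    import numpy as np

    def adaptive_huber_mean(x, max_iter=100, tol=1e-10):
        # Solves sum_i psi_tau(x_i - mu) = 0, where psi_tau clips the
        # residual at +/- tau; tau diverges with n so the bias -> 0.
        n = x.size
        tau = np.std(x) * np.sqrt(n / np.log(n))   # illustrative rate
        mu = np.median(x)                          # robust starting point
        for _ in range(max_iter):
            step = np.clip(x - mu, -tau, tau).mean()
            mu += step
            if abs(step) < tol:
                break
        return mu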
FarmTest: Factor-Adjusted Robust Multiple Testing with Approximate False Discovery Control
Large-scale multiple testing with correlated and heavy-tailed data arises in
a wide range of research areas from genomics, medical imaging to finance.
Conventional methods for estimating the false discovery proportion (FDP) often
ignore the effect of heavy-tailedness and the dependence structure among test
statistics, and thus may lead to inefficient or even inconsistent estimation.
Also, the commonly imposed joint normality assumption is arguably too stringent
for many applications. To address these challenges, in this paper we propose a
Factor-Adjusted Robust Multiple Testing (FarmTest) procedure for large-scale
simultaneous inference with control of the false discovery proportion. We
demonstrate that robust factor adjustments are extremely important in both
controlling the FDP and improving the power. We identify general conditions
under which the proposed method produces consistent estimate of the FDP. As a
byproduct that is of independent interest, we establish an exponential-type
deviation inequality for a robust $U$-type covariance estimator under the
spectral norm. Extensive numerical experiments demonstrate the advantage of the
proposed method over several state-of-the-art methods especially when the data
are generated from heavy-tailed distributions. The proposed procedures are
implemented in the R-package FarmTest.
Comment: 52 pages, 9 figures
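A stripped-down sketch of the factor-adjustment pipeline: estimate the
common factor component by PCA, remove it, and test each coordinate's mean
on the adjusted data. The actual FarmTest procedure replaces the plain
means, t-tests, and Benjamini-Hochberg step used below with robust
(Huber-type) estimates and explicit FDP control; everything here is
illustrative:

    import numpy as np
    from scipy import stats

    def farmtest_sketch(X, n_factors=2, alpha=0.05):
        # X: n observations x p variables; test H0_j: mean_j = 0.
        n, p = X.shape
        Xc = X - X.mean(0)
        # 1. PCA estimate of the common (factor-driven) component.
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        common = (U[:, :n_factors] * S[:n_factors]) @ Vt[:n_factors]
        X_adj = X - common                       # factor-adjusted data
        # 2. Coordinate-wise t statistics on the adjusted data.
        t = X_adj.mean(0) / (X_adj.std(0, ddof=1) / np.sqrt(n))
        pvals = 2 * stats.t.sf(np.abs(t), df=n - 1)
        # 3. Benjamini-Hochberg step-up at level alpha.
        order = np.argsort(pvals)
        passed = pvals[order] <= alpha * np.arange(1, p + 1) / p
        k = passed.nonzero()[0].max() + 1 if passed.any() else 0
        reject = np.zeros(p, dtype=bool)
        reject[order[:k]] = True
        return reject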
FENDI: High-Fidelity Entanglement Distribution in the Quantum Internet
A quantum network distributes quantum entanglements between remote nodes,
which is key to many quantum applications. However, unavoidable noise in
quantum operations could lead to both low throughput and low quality of
entanglement distribution. This paper aims to address the simultaneous
exponential degradation in throughput and quality in a buffered multi-hop
quantum network. Based on an end-to-end fidelity model with worst-case
(isotropic) noise, we formulate the high-fidelity remote entanglement
distribution problem for a single source-destination pair, and prove its
NP-hardness. To address the problem, we develop a fully polynomial-time
approximation scheme for the control plane of the quantum network, and a
distributed data plane protocol that achieves the desired long-term throughput
and worst-case fidelity based on control plane outputs. To evaluate our
algorithm and protocol, we develop a discrete-time quantum network simulator.
Simulation results show the superior performance of our approach compared to
existing fidelity-agnostic and fidelity-aware solutions.
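The exponential degradation in quality can be made concrete with the
standard fidelity-composition rule for entanglement swapping of Werner
pairs, which is consistent with the worst-case isotropic-noise model the
paper assumes (ideal swap operations; the link fidelities below are
made-up inputs):

    from functools import reduce

    def swap_fidelity(f1, f2):
        # Fidelity after swapping two Werner (isotropic-noise) pairs.
        return f1 * f2 + (1.0 - f1) * (1.0 - f2) / 3.0

    def end_to_end_fidelity(link_fidelities):
        # Fold the pairwise rule along a multi-hop path.
        return reduce(swap_fidelity, link_fidelities)

    # Five hops at link fidelity 0.95 already fall to about 0.78, and
    # the decay toward the 1/4 floor is exponential in the hop count.
    print(end_to_end_fidelity([0.95] * 5))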