Adaptive variance function estimation in heteroscedastic nonparametric regression
We consider a wavelet thresholding approach to adaptive variance function
estimation in heteroscedastic nonparametric regression. A data-driven estimator
is constructed by applying wavelet thresholding to the squared first-order
differences of the observations. We show that the variance function estimator
is nearly optimally adaptive to the smoothness of both the mean and variance
functions. The estimator is shown to achieve the optimal adaptive rate of
convergence under the pointwise squared error simultaneously over a range of
smoothness classes. The estimator is also adaptively within a logarithmic
factor of the minimax risk under the global mean integrated squared error over
a collection of spatially inhomogeneous function classes. Numerical
implementation and simulation results are also discussed.
Comment: Published at http://dx.doi.org/10.1214/07-AOS509 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
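The difference-based idea in this abstract can be illustrated with a minimal sketch. It assumes equally spaced design points and independent errors, and it swaps the paper's wavelet thresholding for a plain moving average to stay dependency-free, so it is not the authors' estimator. The function names and the `half_width` parameter are hypothetical.

```python
import random


def squared_half_differences(y):
    """Return D_i = (y[i+1] - y[i])**2 / 2 for consecutive observations.

    If the mean function is smooth, the mean nearly cancels in the
    difference, so E[D_i] is approximately the local variance.
    """
    return [(b - a) ** 2 / 2.0 for a, b in zip(y, y[1:])]


def moving_average(values, half_width):
    """Smooth with a centered window that is truncated at the boundaries."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def estimate_variance_function(y, half_width=25):
    """Assumed stand-in smoother: moving average, NOT wavelet thresholding."""
    return moving_average(squared_half_differences(y), half_width)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    # Hypothetical test signal: smooth mean, standard deviation rising from 0.5 to 1.5.
    y = [(i / n) ** 2 + (0.5 + i / n) * rng.gauss(0.0, 1.0) for i in range(n)]
    v_hat = estimate_variance_function(y, half_width=100)
    print(v_hat[100], v_hat[-100])  # rough estimates near each end of the design
```

Dividing by 2 reflects that the difference of two independent errors with equal variance has twice that variance. No residuals from a fitted mean are needed.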
Covariate-assisted ranking and screening for large-scale two-sample inference
Two-sample multiple testing has a wide range of applications. The conventional practice first reduces the original observations to a vector of p-values and then chooses a cutoff to adjust for multiplicity. However, this data reduction step could cause significant loss of information and thus lead to suboptimal testing procedures. We introduce a new framework for two-sample multiple testing by incorporating a carefully constructed auxiliary variable in inference to improve the power. A data-driven multiple-testing procedure is developed by employing a covariate-assisted ranking and screening (CARS) approach that optimally combines the information from both the primary and the auxiliary variables. The proposed CARS procedure is shown to be asymptotically valid and optimal for false discovery rate control. The procedure is implemented in the R package CARS. Numerical results confirm the effectiveness of CARS in false discovery rate control and show that it achieves substantial power gain over existing methods. CARS is also illustrated through an application to the analysis of a satellite imaging data set for supernova detection.
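The "conventional practice" that this abstract contrasts with CARS (reduce the data to p-values, then choose a multiplicity-adjusted cutoff) can be sketched with the Benjamini-Hochberg step-up procedure. This is the baseline only, not the CARS procedure, and it does not use the API of the R package. The function name and the input p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, in sorted order) with
    p_(k) <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank  # keep the largest qualifying rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, e.g. from per-feature two-sample t-tests.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, alpha=0.05))
```

Only the p-values enter the decision, so any information the reduction discards, such as the auxiliary variable the abstract describes, cannot influence the ranking.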
New Bounds for Restricted Isometry Constants
In this paper we show that if the restricted isometry constant $\delta_k$ of
the compressed sensing matrix satisfies $\delta_k < 0.307$, then $k$-sparse
signals are guaranteed to be recovered exactly via $\ell_1$ minimization when
no noise is present and $k$-sparse signals can be estimated stably in the noisy
case. It is also shown that the bound cannot be substantively improved. An
explicit example is constructed in which $\delta_k = \frac{k-1}{2k-1} < 0.5$,
but it is impossible to recover certain $k$-sparse signals.
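For readers unfamiliar with the term, the restricted isometry constant is usually defined as below. This is the standard textbook definition, not text from the abstract, and the symbols $A$ (the sensing matrix), $x$ and $\delta_k$ are notation assumed here.

```latex
% delta_k is the smallest constant for which the inequality holds
\[
(1-\delta_k)\,\|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1+\delta_k)\,\|x\|_2^2
\qquad \text{for every $k$-sparse vector } x .
\]
```

A small $\delta_k$ means that $A$ acts almost like an isometry on $k$-sparse vectors, which is why upper bounds on it lead to recovery guarantees.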
Effect of mean on variance function estimation in nonparametric regression
Variance function estimation in nonparametric regression is considered and
the minimax rate of convergence is derived. We are particularly interested in
the effect of the unknown mean on the estimation of the variance function. Our
results indicate that, contrary to the common practice, it is not desirable to
base the estimator of the variance function on the residuals from an optimal
estimator of the mean when the mean function is not smooth. Instead it is more
desirable to use estimators of the mean with minimal bias. On the other hand,
when the mean function is very smooth, our numerical results show that the
residual-based method performs better, but not substantially better, than the
first-order-difference-based estimator. In addition, our asymptotic results also
correct the optimal rate claimed in Hall and Carroll [J. Roy. Statist. Soc.
Ser. B 51 (1989) 3--14].
Comment: Published at http://dx.doi.org/10.1214/009053607000000901 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Forbidden Facts: An Investigation of Competing Objectives in Llama-2
LLMs often face competing pressures (for example helpfulness vs.
harmlessness). To understand how models resolve such conflicts, we study
Llama-2-chat models on the forbidden fact task. Specifically, we instruct
Llama-2 to truthfully complete a factual recall statement while forbidding it
from saying the correct answer. This often makes the model give incorrect
answers. We decompose Llama-2 into 1000+ components, and rank each one with
respect to how useful it is for forbidding the correct answer. We find that in
aggregate, around 35 components are enough to reliably implement the full
suppression behavior. However, these components are fairly heterogeneous and
many operate using faulty heuristics. We discover that one of these heuristics
can be exploited via a manually designed adversarial attack which we call The
California Attack. Our results highlight some roadblocks standing in the way of
being able to successfully interpret advanced ML systems. Project website
available at https://forbiddenfacts.github.io .
Comment: Accepted to the ATTRIB and SoLaR workshops at NeurIPS 2023; (v3:
clarified experimental details)
Variance Function Estimation in Multivariate Nonparametric Regression
Variance function estimation in multivariate nonparametric regression is considered and the minimax rate of convergence is established in the iid Gaussian case. Our work uses the approach that generalizes the one used in [A. Munk, Bissantz, T. Wagner, G. Freitag, On difference based variance estimation in nonparametric regression when the covariate is high dimensional, J. R. Stat. Soc. B 67 (Part 1) (2005) 19--41] for the constant variance case. As is the case when the number of dimensions d=1, and very much contrary to standard thinking, it is often not desirable to base the estimator of the variance function on the residuals from an optimal estimator of the mean. Instead it is desirable to use estimators of the mean with minimal bias. Another important conclusion is that the first order difference based estimator that achieves minimax rate of convergence in the one-dimensional case does not do the same in the high dimensional case. Instead, the optimal order of differences depends on the number of dimensions.
SIHR: Statistical Inference in High-Dimensional Linear and Logistic Regression Models
We introduce the R package SIHR for statistical inference in
high-dimensional generalized linear models with continuous and binary outcomes.
The package provides functionalities for constructing confidence intervals and
performing hypothesis tests for low-dimensional objectives in both one-sample
and two-sample regression settings. We illustrate the usage of SIHR
through numerical examples and present real data applications to demonstrate
the package's performance and practicality.