493 research outputs found
Bootstrap confidence sets under model misspecification
A multiplier bootstrap procedure for construction of likelihood-based
confidence sets is considered for finite samples and a possible model
misspecification. Theoretical results justify the bootstrap validity for a
small or moderate sample size and make it possible to control the impact of the parameter
dimension $p$: the bootstrap approximation works if $p^3/n$ is small. The main
result about bootstrap validity continues to apply even if the underlying
parametric model is misspecified under the so-called small modelling bias
condition. In the case when the true model deviates significantly from the
considered parametric family, the bootstrap procedure is still applicable but
it becomes a bit conservative: the size of the constructed confidence sets is
increased by the modelling bias. We illustrate the results with numerical
examples for misspecified linear and logistic regressions.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1355 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
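As a rough illustration of the multiplier bootstrap idea described in this abstract, the sketch below builds a likelihood-based confidence interval for a location parameter. It is not the authors' implementation: the unit-variance Gaussian working model, the Exp(1) weights (mean 1, variance 1), and the function name `multiplier_bootstrap_interval` are all assumptions made for the example. Under this working model the likelihood-ratio statistic is n(theta_hat - theta)^2, and its reweighted analogue is W(theta_boot - theta_hat)^2, where W is the sum of the weights.

```python
import math
import random


def multiplier_bootstrap_interval(xs, level=0.95, n_boot=2000, seed=0):
    """Likelihood-based interval for a mean, calibrated by a multiplier bootstrap.

    Working model (an assumption of this sketch): x_i ~ N(theta, 1).
    The data need not follow this model.
    """
    rng = random.Random(seed)
    n = len(xs)
    theta_hat = sum(xs) / n  # maximiser of the unweighted log-likelihood

    boot_stats = []
    for _ in range(n_boot):
        # i.i.d. multipliers with mean 1 and variance 1 (Exp(1) is one choice)
        w = [rng.expovariate(1.0) for _ in xs]
        w_sum = sum(w)
        # maximiser of the reweighted log-likelihood
        theta_boot = sum(wi * xi for wi, xi in zip(w, xs)) / w_sum
        # twice the reweighted likelihood-ratio excess at theta_boot vs theta_hat
        boot_stats.append(w_sum * (theta_boot - theta_hat) ** 2)

    boot_stats.sort()
    # empirical quantile of the bootstrap likelihood-ratio statistic
    z = boot_stats[min(n_boot - 1, math.ceil(level * n_boot) - 1)]

    # confidence set {theta : n * (theta_hat - theta)^2 <= z}
    half_width = math.sqrt(z / n)
    return theta_hat - half_width, theta_hat + half_width
```

Calling `multiplier_bootstrap_interval(data)` on a list of floats returns the lower and upper endpoints. The critical value comes from the resampled weights rather than from a chi-square table, which is what allows the interval to widen when the working model fits poorly.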
Non-Gaussianity in the Weak Lensing Correlation Function Likelihood -- Implications for Cosmological Parameter Biases
We study the significance of non-Gaussianity in the likelihood of weak
lensing shear two-point correlation functions, detecting significantly non-zero
skewness and kurtosis in one-dimensional marginal distributions of shear
two-point correlation functions in simulated weak lensing data. We examine the
implications in the context of future surveys, in particular LSST, with
derivations of how the non-Gaussianity scales with survey area. We show that
there is no significant bias in one-dimensional posteriors of $\Omega_{\rm m}$
and $\sigma_8$ due to the non-Gaussian likelihood distributions of shear
correlation functions using the mock data ($100$ deg$^2$). We also present a
systematic approach to constructing approximate multivariate likelihoods with
one-dimensional parametric functions by assuming independence or more flexible
non-parametric multivariate methods after decorrelating the data points using
principal component analysis (PCA). While the use of PCA does not modify the
non-Gaussianity of the multivariate likelihood, we find empirically that the
one-dimensional marginal sampling distributions of the PCA components exhibit
less skewness and kurtosis than the original shear correlation
functions. Modeling the likelihood with marginal parametric functions based on
the assumption of independence between PCA components thus gives a lower limit
for the biases. We further demonstrate that the difference in cosmological
parameter constraints between the multivariate Gaussian likelihood model and
more complex non-Gaussian likelihood models would be even smaller for an
LSST-like survey. In addition, the PCA approach automatically serves as a data
compression method, enabling the retention of the majority of the cosmological
information while reducing the dimensionality of the data vector by a factor of
5.
Comment: 16 pages, 10 figures, published MNRA
U-Statistic Reduction: Higher-Order Accurate Risk Control and Statistical-Computational Trade-Off, with Application to Network Method-of-Moments
U-statistics play central roles in many statistical learning tools but face
the haunting issue of scalability. Significant efforts have been devoted to
accelerating computation by U-statistic reduction. However, existing results
almost exclusively focus on power analysis, while little work addresses risk
control accuracy -- comparatively, the latter requires distinct and much more
challenging techniques. In this paper, we establish the first statistical
inference procedure with provably higher-order accurate risk control for
incomplete U-statistics. The sharpness of our new result enables us to reveal
how risk control accuracy also trades off with speed for the first time in
the literature, which complements the well-known variance-speed trade-off. Our
proposed general framework converts the long-standing challenge of formulating
accurate statistical inference procedures for many different designs into a
surprisingly routine task. This paper covers non-degenerate and degenerate
U-statistics, and network moments. We conducted comprehensive numerical studies
and observed results that validate our theory's sharpness. Our method also
demonstrates effectiveness on real-world data applications
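To make the notion of U-statistic reduction concrete, here is a minimal sketch contrasting a complete order-2 U-statistic with an incomplete one that averages the kernel over randomly drawn pairs. The sampling design (uniform pairs, drawn with replacement across draws) and the function names are assumptions for illustration only; the paper covers a range of designs and this is not its inference procedure.

```python
import itertools
import random


def complete_u_statistic(xs, kernel):
    """Average a symmetric order-2 kernel over all n*(n-1)/2 unordered pairs."""
    pairs = list(itertools.combinations(xs, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def incomplete_u_statistic(xs, kernel, n_pairs, seed=0):
    """Average the kernel over n_pairs randomly chosen pairs of distinct indices.

    Cost is O(n_pairs) kernel evaluations instead of O(n^2), at the price of
    extra variance from the random choice of pairs.
    """
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        total += kernel(xs[i], xs[j])
    return total / n_pairs


def variance_kernel(a, b):
    """Kernel whose complete U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2
```

With `variance_kernel`, the complete statistic reproduces the usual unbiased sample variance, while `incomplete_u_statistic(xs, variance_kernel, n_pairs=m)` approximates it from only `m` kernel evaluations. Choosing `m` is the speed side of the trade-offs the abstract refers to.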
Conditional limit laws for goodness-of-fit tests
We study the conditional distribution of goodness of fit statistics of the
Cramér--von Mises type given the complete sufficient statistics in testing
for exponential family models. We show that this distribution is close, in
large samples, to that given by parametric bootstrapping, namely, the
unconditional distribution of the statistic under the value of the parameter
given by the maximum likelihood estimate. As part of the proof, we give uniform
Edgeworth expansions of Rao--Blackwell estimates in these models.
Comment: Published at http://dx.doi.org/10.3150/11-BEJ366 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
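The parametric bootstrap mentioned in this abstract can be sketched for one simple exponential family. The example below tests the fit of an exponential distribution with a Cramér--von Mises statistic: it fits the rate by maximum likelihood, then simulates from the fitted model to approximate the null distribution. The choice of the exponential model, the function names, and the p-value convention (one added to numerator and denominator) are assumptions of this sketch, not details taken from the paper.

```python
import math
import random


def cvm_exponential(xs):
    """Cramér--von Mises statistic for an exponential model with fitted rate."""
    n = len(xs)
    rate = n / sum(xs)  # maximum likelihood estimate of the rate
    # fitted CDF evaluated at the order statistics
    u = sorted(1.0 - math.exp(-rate * x) for x in xs)
    return 1.0 / (12 * n) + sum(
        (ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1)
    )


def parametric_bootstrap_pvalue(xs, n_boot=999, seed=0):
    """Approximate the p-value by simulating from the fitted exponential model."""
    rng = random.Random(seed)
    n = len(xs)
    rate_hat = n / sum(xs)
    observed = cvm_exponential(xs)
    exceed = 0
    for _ in range(n_boot):
        sample = [rng.expovariate(rate_hat) for _ in range(n)]
        # the rate is re-estimated inside cvm_exponential for each sample
        if cvm_exponential(sample) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

A small returned p-value suggests the exponential model fits poorly. Re-estimating the rate on every simulated sample matters: it makes the bootstrap statistic mimic the observed one, which is also computed with a fitted parameter.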
Cramér-type moderate deviations for Studentized two-sample $U$-statistics with applications
Two-sample $U$-statistics are widely used in a broad range of applications,
including those in the fields of biostatistics and econometrics. In this paper,
we establish sharp Cramér-type moderate deviation theorems for Studentized
two-sample $U$-statistics in a general framework, including the two-sample
$t$-statistic and Studentized Mann-Whitney test statistic as prototypical
examples. In particular, a refined moderate deviation theorem with second-order
accuracy is established for the two-sample $t$-statistic. These results extend
the applicability of the existing statistical methodologies from the one-sample
$t$-statistic to more general nonlinear statistics. Applications to two-sample
large-scale multiple testing problems with false discovery rate control and the
regularized bootstrap method are also discussed.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1375 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
A Higher-Order Correct Fast Moving-Average Bootstrap for Dependent Data
We develop and implement a novel fast bootstrap for dependent data. Our
scheme is based on the i.i.d. resampling of the smoothed moment indicators. We
characterize the class of parametric and semi-parametric estimation problems
for which the method is valid. We show the asymptotic refinements of the
proposed procedure, proving that it is higher-order correct under mild
assumptions on the time series, the estimating functions, and the smoothing
kernel. We illustrate the applicability and the advantages of our procedure for
Generalized Empirical Likelihood estimation. As a by-product, our fast
bootstrap provides higher-order correct asymptotic confidence distributions.
Monte Carlo simulations on an autoregressive conditional duration model provide
numerical evidence that the novel bootstrap yields higher-order accurate
confidence intervals. A real-data application on dynamics of trading volume of
stocks illustrates the advantage of our method over the routinely-applied
first-order asymptotic theory, when the underlying distribution of the test
statistic is skewed or fat-tailed