Early Stopping for Nonparametric Testing
Early stopping of iterative algorithms is an algorithmic regularization
method to avoid over-fitting in estimation and classification. In this paper,
we show that early stopping can also be applied to obtain minimax optimal
testing in a general nonparametric setup. Specifically, a Wald-type test
statistic is obtained based on an iterated estimate produced by functional
gradient descent algorithms in a reproducing kernel Hilbert space. A notable
contribution is to establish a "sharp" stopping rule: when the number of
iterations achieves an optimal order, testing optimality is achievable;
otherwise, testing optimality becomes impossible. As a by-product, a similar
sharpness result is also derived for minimax optimal estimation under early
stopping studied in [11] and [19]. All obtained results hold for various kernel
classes, including Sobolev smoothness classes and Gaussian kernel classes.
Comment: To appear in NIPS 201
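The iterated estimate in the abstract above can be pictured with a minimal sketch: functional gradient descent for squared loss in an RKHS reduces to gradient descent on the kernel expansion coefficients, and early stopping simply caps the iteration count. The kernel choice, step size, and function names below are illustrative, not the paper's construction.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian kernel matrix between the rows of X and Z (illustrative choice)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_gd(K, y, step=0.5, n_iter=50):
    # Functional gradient descent for least squares in an RKHS:
    # the estimate is f_t = K @ a_t, with a_{t+1} = a_t + step * (y - K a_t)/n.
    # Early stopping = returning after n_iter iterations instead of converging.
    n = len(y)
    a = np.zeros(n)
    for _ in range(n_iter):
        a += step * (y - K @ a) / n
    return a
```

Running the descent longer drives the training error down monotonically; the paper's point is that a particular order of `n_iter` is exactly what yields testing optimality.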
A Decision-Theoretic Comparison of Treatments to Resolve Air Leaks After Lung Surgery Based on Nonparametric Modeling
We propose a Bayesian nonparametric utility-based group sequential design for
a randomized clinical trial to compare a gel sealant to standard care for
resolving air leaks after pulmonary resection. Clinically, resolving air leaks
in the days soon after surgery is highly important, since longer resolution
time produces undesirable complications that require extended hospitalization.
The problem of comparing treatments is complicated by the fact that the
resolution time distributions are skewed and multi-modal, so using means is
misleading. We address these challenges by assuming Bayesian nonparametric
probability models for the resolution time distributions and basing the
comparative test on weighted means. The weights are elicited as clinical
utilities of the resolution times. The proposed design uses posterior expected
utilities as group sequential test criteria. The procedure's frequentist
properties are studied by extensive simulations.
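The case for utility-weighted means over raw means can be shown with a toy example; the step-function utility below is invented for illustration, not the trial's elicited utilities.

```python
def utility(t):
    # Illustrative clinical utility of an air-leak resolution time t (days):
    # fast resolution is highly valued, slow resolution barely at all.
    if t <= 3:
        return 100.0
    if t <= 7:
        return 60.0
    return 10.0

def expected_utility(times):
    # Utility-weighted summary of a sample (or posterior draw) of times.
    return sum(utility(t) for t in times) / len(times)
```

With a skewed sample like [1, 2, 30] the raw mean (11 days) looks worse than a uniform [8, 8, 8] arm (8 days), yet the utility-weighted comparison correctly favors the arm where most patients resolve quickly.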
Monitoring Procedures to Detect Unit Roots and Stationarity
When analysing a time series, an important issue is to decide whether it is
stationary or a random walk. Relaxing these notions, we consider the problem
of deciding in favor of the I(0)- or I(1)-property. Fixed-sample
statistical tests for that problem are well studied in the literature. In this
paper we provide first results for the problem of sequentially monitoring a time
series. Our stopping times are based on a sequential version of a
kernel-weighted variance-ratio statistic. The asymptotic distributions are
established for I(1) processes, a rich class of stationary processes, possibly
affected by local nonparametric alternatives, and the local-to-unity model.
Further, we consider the two interesting change-point models where the time
series changes its behaviour after a certain fraction of the observations and
derive the associated limiting laws. Our Monte-Carlo studies show that the
proposed detection procedures have high power when interpreted as a hypothesis
test, and that the decision can often be made very early.
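A minimal sketch of this kind of monitoring scheme: a KPSS-type variance-ratio statistic evaluated on growing samples, with a stopping time that fires once it exceeds a threshold. The naive variance estimate and the threshold are illustrative stand-ins for the paper's kernel-weighted statistic and calibrated critical values.

```python
import numpy as np

def variance_ratio(x):
    # KPSS-type variance-ratio statistic: squared partial sums of demeaned
    # data, scaled by the sample size and a (naive) variance estimate.
    # Small under stationarity, diverging under a random walk.
    x = np.asarray(x, float)
    n = len(x)
    s = np.cumsum(x - x.mean())
    var = np.mean((x - x.mean()) ** 2)
    return (s ** 2).sum() / (n ** 2 * var)

def monitor(x, threshold, n_min=20):
    # Sequential monitoring: recompute the statistic as observations arrive
    # and stop at the first time it crosses the threshold (I(1) detected).
    for t in range(n_min, len(x) + 1):
        if variance_ratio(x[:t]) > threshold:
            return t
    return None
```

A bounded oscillating series never triggers the rule, while a strongly trending one is flagged quickly, matching the high-power/early-decision behaviour reported in the abstract.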
Pointwise adaptive estimation for robust and quantile regression
A nonparametric procedure for robust regression estimation and for quantile
regression is proposed which is completely data-driven and adapts locally to
the regularity of the regression function. This is achieved by considering in
each point M-estimators over different local neighbourhoods and by a local
model selection procedure based on sequential testing. Non-asymptotic risk
bounds are obtained, which yield rate-optimality for large sample asymptotics
under weak conditions. Simulations for different univariate median regression
models show good finite sample properties, also in comparison to traditional
methods. The approach is extended to image denoising and applied to CT scans in
cancer research.
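The local model selection idea can be sketched as a Lepski-type rule with medians (the M-estimator for median regression): enlarge the neighbourhood as long as each new estimate stays consistent with the previously accepted ones. The fixed acceptance constant below is purely illustrative; the paper's sequential tests are properly calibrated.

```python
import statistics

def adaptive_local_median(y, i, widths, crit):
    # Pointwise adaptive estimation at index i: compute local medians over
    # nested windows of half-widths `widths` (increasing), and keep enlarging
    # while the new median lies within `crit` of every accepted estimate.
    est = statistics.median(y[max(0, i - widths[0]): i + widths[0] + 1])
    accepted = [est]
    for h in widths[1:]:
        cand = statistics.median(y[max(0, i - h): i + h + 1])
        if any(abs(cand - a) > crit for a in accepted):
            break  # larger window is inconsistent: stop, keep last estimate
        accepted.append(cand)
        est = cand
    return est
```

On a piecewise-constant signal the rule uses large windows deep inside each regime, which is exactly the local adaptation to regularity the abstract describes.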
msBP: An R package to perform Bayesian nonparametric inference using multiscale Bernstein polynomials mixtures
msBP is an R package that implements a new method to perform Bayesian multiscale nonparametric inference introduced by Canale and Dunson (2016). The method, based on mixtures of multiscale beta dictionary densities, overcomes the drawbacks of Pólya trees and inherits many of the advantages of Dirichlet process mixture models. The key idea is that an infinitely-deep binary tree is introduced, with a beta dictionary density assigned to each node of the tree. Using a multiscale stick-breaking characterization, stochastically decreasing weights are assigned to each node. The result is an infinite mixture model. The package msBP implements a series of basic functions to deal with this family of priors, such as random density and number generation, creation and manipulation of binary tree objects, and generic functions to plot and print the results. In addition, it implements the Gibbs samplers for posterior computation to perform multiscale density estimation and multiscale testing of group differences described in Canale and Dunson (2016).
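The multiscale stick-breaking construction can be sketched as follows. The Beta parameters and the exact split mechanism here are simplified stand-ins for the Canale and Dunson (2016) prior, which msBP implements in full: mass flows down a binary tree, each node keeps a random "stopping" fraction, and the rest is split between its two children.

```python
import numpy as np

def multiscale_weights(max_depth, a=5.0, rng=None):
    # Toy multiscale stick-breaking over a binary tree truncated at max_depth.
    # Node (s, h) at scale s keeps a Beta-distributed fraction of the mass
    # reaching it; the remainder is split between children 2h and 2h+1.
    rng = np.random.default_rng(rng)
    weights = {}
    arriving = {(0, 0): 1.0}          # mass flowing into each node; root gets 1
    while arriving:
        nxt = {}
        for (s, h), mass in arriving.items():
            stop = rng.beta(1.0, a)   # fraction of the stick stopping here
            weights[(s, h)] = mass * stop
            if s < max_depth:
                right = rng.beta(a, a)  # share of remaining mass sent right
                nxt[(s + 1, 2 * h)] = mass * (1 - stop) * (1 - right)
                nxt[(s + 1, 2 * h + 1)] = mass * (1 - stop) * right
        arriving = nxt
    return weights
```

Weights are nonnegative and sum to less than one at any finite truncation, with the remainder living below `max_depth`; this is the "stochastically decreasing weights on an infinitely-deep tree" picture from the abstract.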
Nearest-Neighbor Neural Networks for Geostatistics
Kriging is the predominant method used for spatial prediction, but relies on
the assumption that predictions are linear combinations of the observations.
Kriging often also relies on additional assumptions such as normality and
stationarity. We propose a more flexible spatial prediction method based on the
Nearest-Neighbor Neural Network (4N) process that embeds deep learning into a
geostatistical model. We show that the 4N process is a valid stochastic process
and propose a series of new ways to construct features to be used as inputs to
the deep learning model based on neighboring information. Our model framework
outperforms some existing state-of-the-art geostatistical modelling methods for
simulated non-Gaussian data and is applied to a massive forestry dataset.
Sequentially Updated Residuals and Detection of Stationary Errors in Polynomial Regression Models
The question whether a time series behaves as a random walk or as a
stationary process is an important and delicate problem, particularly arising in
financial statistics, econometrics, and engineering. This paper studies the
problem of sequentially detecting that the error terms in a polynomial regression
model no longer behave as a random walk but as a stationary process. We provide
the asymptotic distribution theory for a monitoring procedure given by a
control chart, i.e., a stopping time, which is related to a well known unit
root test statistic calculated from sequentially updated residuals. We provide
a functional central limit theorem for the corresponding stochastic process
which implies a central limit theorem for the control chart. The finite sample
properties are investigated by a simulation study.
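A toy proxy for the residual-based monitoring: fit a polynomial trend to the first t observations and inspect the lag-one autocorrelation of the residuals, which sits near one under random-walk errors and near zero under iid errors. The actual procedure uses a well-known unit root statistic and a proper control chart, not this autocorrelation.

```python
import numpy as np

def resid_ar1(x, t, degree=1):
    # Fit a degree-`degree` polynomial trend to x[:t], then return the
    # lag-one autocorrelation of the residuals: a crude stand-in for the
    # unit-root statistic computed from sequentially updated residuals.
    u = np.arange(t)
    resid = x[:t] - np.polyval(np.polyfit(u, x[:t], degree), u)
    r = resid - resid.mean()
    return (r[1:] * r[:-1]).sum() / (r ** 2).sum()
```

Sequentially recomputing this quantity as t grows, and stopping once it drops below a calibrated boundary, mirrors the structure of the control chart in the abstract.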
Selective Sequential Model Selection
Many model selection algorithms produce a path of fits specifying a sequence
of increasingly complex models. Given such a sequence and the data used to
produce them, we consider the problem of choosing the least complex model that
is not falsified by the data. Extending the selected-model tests of Fithian et
al. (2014), we construct p-values for each step in the path which account for
the adaptive selection of the model path using the data. In the case of linear
regression, we propose two specific tests, the max-t test for forward stepwise
regression (generalizing a proposal of Buja and Brown (2014)), and the
next-entry test for the lasso. These tests improve on the power of the
saturated-model test of Tibshirani et al. (2014), sometimes dramatically. In
addition, our framework extends beyond linear regression to a much more general
class of parametric and nonparametric model selection problems.
To select a model, we can feed our single-step p-values as inputs into
sequential stopping rules such as those proposed by G'Sell et al. (2013) and Li
and Barber (2015), achieving control of the familywise error rate or false
discovery rate (FDR) as desired. The FDR-controlling rules require the null
p-values to be independent of each other and of the non-null p-values, a
condition not satisfied by the saturated-model p-values of Tibshirani et al.
(2014). We derive intuitive and general sufficient conditions for independence,
and show that our proposed constructions yield independent p-values.
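One of the referenced stopping rules, ForwardStop of G'Sell et al. (2013), is simple enough to state in a few lines: reject the first k hypotheses, where k is the largest index at which the running mean of -log(1 - p_i) is at most alpha. This is a sketch of that published rule, not of the selected-model tests themselves.

```python
import math

def forward_stop(pvals, alpha=0.1):
    # ForwardStop: return the largest k such that
    # (1/k) * sum_{i<=k} -log(1 - p_i) <= alpha, or 0 if none exists.
    total, khat = 0.0, 0
    for k, p in enumerate(pvals, start=1):
        total += -math.log(1.0 - p)
        if total / k <= alpha:
            khat = k
    return khat
```

Feeding the single-step p-values from the path into this rule yields FDR control at level alpha, provided the null p-values satisfy the independence conditions discussed above.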
NP-optimal kernels for nonparametric sequential detection rules
An attractive nonparametric method to detect change-points sequentially is to apply control charts based on kernel smoothers. Recently, the strong convergence of the normed delay associated with such a sequential stopping rule has been studied under sequences of out-of-control models. Kernel smoothers employ a kernel function to downweight past data. Since kernel functions with values in the unit interval are sufficient for that task, we study the problem of optimizing the asymptotic normed delay over a class of kernels satisfying that restriction and certain additional moment constraints. We apply the key theorem to discuss several important examples where explicit solutions exist, illustrating that the results are applicable.
Keywords: control charts, financial data, nonparametric regression, quality control, statistical genetics
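A kernel-weighted control chart of the flavor discussed above can be sketched like this: downweight past observations with a kernel taking values in [0, 1] and signal when the smoothed statistic leaves the control limit. The kernel, bandwidth, and limit below are illustrative choices, not the optimized kernels of the paper.

```python
def kernel_chart(x, h, kernel, limit):
    # Sequential kernel-smoothed control chart: at each time t, form a
    # weighted mean of the observations so far, with weight kernel((t-1-i)/h)
    # on observation i (recent data weighted most); signal when the
    # statistic exceeds the control limit.
    for t in range(1, len(x) + 1):
        w = [kernel((t - 1 - i) / h) for i in range(t)]
        stat = sum(wi * xi for wi, xi in zip(w, x[:t])) / sum(w)
        if stat > limit:
            return t  # stopping time: first alarm
    return None
```

With a triangular kernel truncated to [0, 1], an in-control stream never alarms, while a level shift is flagged within a couple of observations; the normed delay studied in the paper measures exactly this detection lag.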
Group sequential designs for negative binomial outcomes
Count data and recurrent events in clinical trials, such as the number of
lesions in magnetic resonance imaging in multiple sclerosis, the number of
relapses in multiple sclerosis, the number of hospitalizations in heart
failure, and the number of exacerbations in asthma or in chronic obstructive
pulmonary disease (COPD) are often modeled by negative binomial distributions.
In this manuscript we study planning and analyzing clinical trials with group
sequential designs for negative binomial outcomes. We propose a group
sequential testing procedure for negative binomial outcomes based on Wald
statistics using maximum likelihood estimators. The asymptotic distributions
of the proposed group sequential test statistics are derived. The finite-sample
properties of the proposed group sequential test for negative binomial
outcomes and the methods for planning the respective clinical trials are
assessed in a simulation study. The simulation scenarios are motivated by
clinical trials in chronic heart failure and relapsing multiple sclerosis,
which cover a wide range of practically relevant settings. Our results confirm
that the asymptotic normal theory of group sequential designs can be applied to
negative binomial outcomes when the hypotheses are tested using Wald statistics
and maximum likelihood estimators. We also propose two methods, one based on
Student's t-distribution and one based on resampling, to improve type I error
rate control in small samples. The statistical methods studied in this
manuscript are implemented in the R package \textit{gscounts}, which is
available for download on the Comprehensive R Archive Network (CRAN).
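The Wald statistic at the core of such tests can be sketched for the simplified case of a known shape parameter, with rates estimated by sample means and the variance of the log rate estimate taken from the standard negative binomial variance mu + mu^2/shape via the delta method. This is an interim-analysis building block, not the full group sequential machinery of the gscounts package.

```python
import math

def nb_wald(x0, x1, shape):
    # Wald statistic for the log rate ratio between control arm x0 and
    # treatment arm x1 with negative binomial outcomes and known shape:
    # Var(log mean) is approximated by 1/(n*mu) + 1/(n*shape) per arm.
    n0, n1 = len(x0), len(x1)
    m0 = sum(x0) / n0
    m1 = sum(x1) / n1
    var = (1 / (n0 * m0) + 1 / (n0 * shape)) + (1 / (n1 * m1) + 1 / (n1 * shape))
    return (math.log(m1) - math.log(m0)) / math.sqrt(var)
```

In a group sequential design this statistic would be compared against stage-wise boundaries (e.g. O'Brien-Fleming type) at each interim analysis.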