
    Early Stopping for Nonparametric Testing

    Early stopping of iterative algorithms is an algorithmic regularization method for avoiding over-fitting in estimation and classification. In this paper, we show that early stopping can also be used to obtain minimax optimal testing in a general nonparametric setup. Specifically, a Wald-type test statistic is obtained from an iterated estimate produced by functional gradient descent algorithms in a reproducing kernel Hilbert space. A notable contribution is the establishment of a "sharp" stopping rule: when the number of iterations attains an optimal order, testing optimality is achievable; otherwise, testing optimality becomes impossible. As a by-product, a similar sharpness result is derived for minimax optimal estimation under early stopping as studied in [11] and [19]. All results hold for various kernel classes, including Sobolev smoothness classes and Gaussian kernel classes. Comment: To appear in NIPS 201
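
    As a rough illustration of the mechanism this abstract describes, the sketch below runs functional gradient descent on the squared loss in an RKHS induced by a Gaussian kernel and simply stops after a fixed number of iterations. The "Wald-type" quantity at the end is only a stand-in for the paper's test statistic; the data, bandwidth, step size, and iteration count are all hypothetical choices.

```python
import numpy as np

def kernel_gd(K, y, step=0.1, n_iter=50):
    """Functional gradient descent in an RKHS: the iterate is f_t = K @ alpha_t,
    and stopping at n_iter acts as the regularizer (no explicit penalty)."""
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        resid = K @ alpha - y          # current residuals
        alpha -= step * resid / len(y) # gradient step on the squared loss
    return alpha

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(60)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.05)  # Gaussian kernel matrix

alpha_T = kernel_gd(K, y, n_iter=40)   # early-stopped iterate
f_hat = K @ alpha_T
# crude Wald-type quantity: squared empirical norm of the iterated estimate
wald = float(np.mean(f_hat ** 2))
```

    In the paper the stopping index is tuned to an optimal order in the sample size; here it is fixed purely for illustration.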

    A Decision-Theoretic Comparison of Treatments to Resolve Air Leaks After Lung Surgery Based on Nonparametric Modeling

    We propose a Bayesian nonparametric utility-based group sequential design for a randomized clinical trial comparing a gel sealant to standard care for resolving air leaks after pulmonary resection. Clinically, resolving air leaks in the days soon after surgery is highly important, since longer resolution times produce undesirable complications that require extended hospitalization. The problem of comparing treatments is complicated by the fact that the resolution time distributions are skewed and multi-modal, so comparing means is misleading. We address these challenges by assuming Bayesian nonparametric probability models for the resolution time distributions and basing the comparative test on weighted means, with the weights elicited as clinical utilities of the resolution times. The proposed design uses posterior expected utilities as group sequential test criteria. The procedure's frequentist properties are studied by extensive simulations.
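
    The comparison of posterior expected utilities can be sketched with a Bayesian bootstrap (Dirichlet-weighted resampling) standing in for the paper's nonparametric model. The utility function, the simulated resolution times, and the arm parameters below are all hypothetical, chosen only to show the shape of the calculation.

```python
import numpy as np

def posterior_mean_utility(times, utility, n_draws=2000, rng=None):
    """Bayesian-bootstrap posterior draws of the mean utility of one arm:
    Dirichlet weights over the observed resolution times."""
    rng = rng or np.random.default_rng(0)
    u = utility(times)
    w = rng.dirichlet(np.ones(len(times)), size=n_draws)  # (n_draws, n) weights
    return w @ u                                          # one draw per row

# hypothetical elicited utility: shorter resolution time is better
utility = lambda t: np.clip(10.0 - t, 0.0, 10.0)

rng = np.random.default_rng(1)
arm_sealant = rng.gamma(2.0, 1.0, size=40)    # simulated resolution days
arm_standard = rng.gamma(2.0, 2.0, size=40)   # simulated, slower-resolving arm

u_seal = posterior_mean_utility(arm_sealant, utility, rng=rng)
u_std = posterior_mean_utility(arm_standard, utility, rng=rng)
# posterior probability that the sealant has higher expected utility;
# a group sequential rule would compare this to interim decision cutoffs
prob_better = float(np.mean(u_seal > u_std))
```

    The actual design uses a richer Bayesian nonparametric model and calibrated group sequential cutoffs; this only illustrates why weighted (utility) means, rather than raw means, drive the comparison.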

    Monitoring Procedures to Detect Unit Roots and Stationarity

    When analysing a time series, an important issue is deciding whether it is stationary or a random walk. Relaxing these notions, we consider the problem of deciding in favor of the I(0)- or I(1)-property. Fixed-sample statistical tests for this problem are well studied in the literature. In this paper we provide first results for the problem of monitoring a time series sequentially. Our stopping times are based on a sequential version of a kernel-weighted variance-ratio statistic. The asymptotic distributions are established for I(1) processes, for a rich class of stationary processes, possibly affected by local nonparametric alternatives, and for the local-to-unity model. Further, we consider two interesting change-point models in which the time series changes its behaviour after a certain fraction of the observations and derive the associated limiting laws. Our Monte Carlo studies show that the proposed detection procedures have high power when interpreted as hypothesis tests, and that the decision can often be made very early.
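
    A much-simplified version of such sequential monitoring can be sketched as follows: a KPSS-type variance-ratio statistic (with the long-run variance crudely replaced by the sample variance, unlike the kernel-weighted estimator of the paper) is recomputed as observations arrive, and the procedure stops once it crosses a threshold. The threshold and the minimal monitoring time are illustrative, not the paper's critical values.

```python
import numpy as np

def monitor_unit_root(x, threshold=1.0, t0=20):
    """Simplified sequential monitor: flags random-walk (I(1)) behaviour when
    a normalized partial-sum statistic crosses the threshold."""
    for t in range(t0, len(x) + 1):
        seg = x[:t]
        demeaned = seg - seg.mean()
        s = np.cumsum(demeaned)                 # partial sums
        sigma2 = np.mean(demeaned ** 2)         # crude variance estimate
        stat = np.sum(s ** 2) / (t ** 2 * sigma2)
        if stat > threshold:
            return t        # stopping time: decide in favor of I(1)
    return None             # no alarm: consistent with I(0)

rng = np.random.default_rng(0)
stationary = rng.standard_normal(300)
random_walk = np.cumsum(rng.standard_normal(300))

t_stationary = monitor_unit_root(stationary)   # ideally no (or a late) alarm
t_rw = monitor_unit_root(random_walk)          # alarm expected
```

    For an I(1) path the statistic grows roughly linearly in t, so the stopping time is finite; for stationary data it stays bounded in probability, which is what makes a threshold rule sensible.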

    Pointwise adaptive estimation for robust and quantile regression

    A nonparametric procedure for robust regression estimation and for quantile regression is proposed which is completely data-driven and adapts locally to the regularity of the regression function. This is achieved by considering, at each point, M-estimators over different local neighbourhoods and by a local model selection procedure based on sequential testing. Non-asymptotic risk bounds are obtained, which yield rate optimality for large-sample asymptotics under weak conditions. Simulations for different univariate median regression models show good finite-sample properties, also in comparison to traditional methods. The approach is extended to image denoising and applied to CT scans in cancer research.
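
    The local model selection idea can be sketched with a Lepski-type rule: enlarge the neighbourhood around the target point until the local median disagrees significantly with the estimates from smaller neighbourhoods, then keep the last accepted window. The window grid, the normal-theory standard error for the median, and the multiplier `z` are illustrative assumptions, not the paper's calibrated procedure.

```python
import numpy as np

def adaptive_local_median(x, y, x0, widths, z=2.0):
    """Lepski-type pointwise selection: accept growing windows until the new
    median estimate is inconsistent with the previously accepted ones."""
    estimates, scales = [], []
    chosen = None
    for h in sorted(widths):
        mask = np.abs(x - x0) <= h
        m = float(np.median(y[mask]))
        # approximate s.e. of the sample median under normality: 1.253 * s / sqrt(n)
        scale = 1.253 * y[mask].std(ddof=1) / np.sqrt(mask.sum())
        # sequential test: does m deviate too far from any smaller-window estimate?
        if any(abs(m - m_j) > z * (scale + s_j)
               for m_j, s_j in zip(estimates, scales)):
            break
        estimates.append(m)
        scales.append(scale)
        chosen = m
    return chosen

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 400)
y = np.where(x < 0.3, 0.0, 2.0) + 0.5 * rng.standard_normal(400)  # jump at 0.3
est = adaptive_local_median(x, y, x0=0.0, widths=[0.05, 0.1, 0.2, 0.4, 0.8])
```

    Near the jump at 0.3 the sequential test cuts the window down; far from it the largest windows are accepted, which is the "adapting locally to the regularity" behaviour the abstract describes.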

    msBP: An R package to perform Bayesian nonparametric inference using multiscale Bernstein polynomials mixtures

    msBP is an R package that implements a new method to perform Bayesian multiscale nonparametric inference introduced by Canale and Dunson (2016). The method, based on mixtures of multiscale beta dictionary densities, overcomes the drawbacks of Pólya trees and inherits many of the advantages of Dirichlet process mixture models. The key idea is that an infinitely deep binary tree is introduced, with a beta dictionary density assigned to each node of the tree. Using a multiscale stick-breaking characterization, stochastically decreasing weights are assigned to each node; the result is an infinite mixture model. The package msBP implements a series of basic functions to deal with this family of priors, such as random density and number generation, creation and manipulation of binary tree objects, and generic functions to plot and print the results. In addition, it implements the Gibbs samplers for posterior computation used to perform the multiscale density estimation and multiscale testing of group differences described in Canale and Dunson (2016).
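
    The multiscale stick-breaking construction can be sketched directly (in Python rather than the package's R): each tree node receives a stopping probability and a descend-right probability, and a node's mixture weight is the probability of stopping there. The Beta hyperparameters below are hypothetical, and the infinite tree is truncated at a finite depth, so the weights sum to just under one.

```python
import numpy as np

def msbp_tree_weights(max_depth, a=1.0, b=1.0, rng=None):
    """Multiscale stick-breaking sketch: draw per-node stopping (S) and
    descend-right (R) probabilities; return each node's mixture weight."""
    rng = rng or np.random.default_rng(0)
    weights = {}
    reach = {(0, 0): 1.0}   # probability of reaching node (scale s, position h)
    for s in range(max_depth + 1):
        for h in range(2 ** s):
            S = rng.beta(1.0, a)   # probability of stopping at this node
            R = rng.beta(b, b)     # probability of going right, given not stopping
            p = reach[(s, h)]
            weights[(s, h)] = p * S
            if s < max_depth:
                reach[(s + 1, 2 * h)] = p * (1 - S) * (1 - R)
                reach[(s + 1, 2 * h + 1)] = p * (1 - S) * R
    return weights

w = msbp_tree_weights(max_depth=12, a=2.0)
total = sum(w.values())   # approaches 1 as the truncation depth grows
```

    In the full model each node also carries a beta dictionary density, and these weights mix them into a random density; the stochastic decrease of the weights with depth is what the truncation error above reflects.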

    Nearest-Neighbor Neural Networks for Geostatistics

    Kriging is the predominant method used for spatial prediction, but it relies on the assumption that predictions are linear combinations of the observations, and often on additional assumptions such as normality and stationarity. We propose a more flexible spatial prediction method based on the Nearest-Neighbor Neural Network (4N) process, which embeds deep learning into a geostatistical model. We show that the 4N process is a valid stochastic process and propose a series of new ways to construct features, based on neighboring information, to be used as inputs to the deep learning model. Our model framework outperforms some existing state-of-the-art geostatistical modelling methods for simulated non-Gaussian data and is applied to a massive forestry dataset.
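
    The neighbor-based feature construction can be sketched as follows: for a query site, collect the distances to the m nearest observed sites together with their observed values, and use that vector as the model input. The simulated surface, the choice of m, and the inverse-distance weighting used below as a stand-in for the deep model are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def neighbor_features(coords, values, query, m=5):
    """Build neighbor-based inputs for one query site: distances to the m
    nearest observed sites followed by their observed values."""
    d = np.linalg.norm(coords - query, axis=1)
    idx = np.argsort(d)[:m]
    return np.concatenate([d[idx], values[idx]])   # length-2m feature vector

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(200, 2))               # observed sites
values = np.sin(3 * coords[:, 0]) + np.cos(3 * coords[:, 1])  # smooth surface

query = np.array([0.5, 0.5])
feats = neighbor_features(coords, values, query, m=5)

# a deep model would map `feats` to the prediction; as a sanity check,
# inverse-distance weighting of the neighbor values approximates the surface
d, v = feats[:5], feats[5:]
pred = float(np.sum(v / (d + 1e-9)) / np.sum(1.0 / (d + 1e-9)))
true = float(np.sin(1.5) + np.cos(1.5))
```

    Conditioning on a fixed number of nearest neighbors is also what keeps the method scalable to massive datasets, since each prediction touches only m observations.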

    Sequentially Updated Residuals and Detection of Stationary Errors in Polynomial Regression Models

    The question of whether a time series behaves as a random walk or as a stationary process is an important and delicate problem, particularly arising in financial statistics, econometrics, and engineering. This paper studies the problem of detecting sequentially that the error terms in a polynomial regression model no longer behave as a random walk but as a stationary process. We provide the asymptotic distribution theory for a monitoring procedure given by a control chart, i.e., a stopping time, which is related to a well-known unit root test statistic calculated from sequentially updated residuals. We provide a functional central limit theorem for the corresponding stochastic process, which implies a central limit theorem for the control chart. The finite-sample properties are investigated by a simulation study.
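
    A toy version of this monitoring scheme: refit the polynomial trend at each time point, apply a Dickey-Fuller-type regression to the sequentially updated residuals, and stop when the unit-root hypothesis is rejected. The critical value, the starting time, and the simulated series are illustrative choices, not the paper's calibrated control chart.

```python
import numpy as np

def detect_stationary_errors(y, x, deg=1, t0=30, crit=-3.0):
    """Control-chart sketch: refit the polynomial trend as observations
    arrive and test the updated residuals for a unit root."""
    for t in range(t0, len(y) + 1):
        beta = np.polyfit(x[:t], y[:t], deg)
        u = y[:t] - np.polyval(beta, x[:t])   # sequentially updated residuals
        du, ulag = np.diff(u), u[:-1]
        rho = (ulag @ du) / (ulag @ ulag)     # Dickey-Fuller regression slope
        se = np.sqrt(np.var(du - rho * ulag, ddof=1) / (ulag @ ulag))
        if rho / se < crit:                   # evidence against the unit root
            return t                          # stopping time of the chart
    return None

rng = np.random.default_rng(0)
x = np.arange(200, dtype=float)
y_rw = 1.0 + 0.05 * x + np.cumsum(rng.standard_normal(200))  # random-walk errors
y_st = 1.0 + 0.05 * x + rng.standard_normal(200)             # stationary errors

t_rw = detect_stationary_errors(y_rw, x)   # ideally no (or a late) alarm
t_st = detect_stationary_errors(y_st, x)   # alarm expected early
```

    The paper's contribution is the asymptotic distribution theory that lets such a threshold be calibrated; the value -3.0 here is only a placeholder.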

    Selective Sequential Model Selection

    Many model selection algorithms produce a path of fits specifying a sequence of increasingly complex models. Given such a sequence and the data used to produce it, we consider the problem of choosing the least complex model that is not falsified by the data. Extending the selected-model tests of Fithian et al. (2014), we construct p-values for each step in the path that account for the adaptive selection of the model path using the data. In the case of linear regression, we propose two specific tests: the max-t test for forward stepwise regression (generalizing a proposal of Buja and Brown (2014)), and the next-entry test for the lasso. These tests improve on the power of the saturated-model test of Tibshirani et al. (2014), sometimes dramatically. In addition, our framework extends beyond linear regression to a much more general class of parametric and nonparametric model selection problems. To select a model, we can feed our single-step p-values as inputs into sequential stopping rules such as those proposed by G'Sell et al. (2013) and Li and Barber (2015), achieving control of the familywise error rate or false discovery rate (FDR) as desired. The FDR-controlling rules require the null p-values to be independent of each other and of the non-null p-values, a condition not satisfied by the saturated-model p-values of Tibshirani et al. (2014). We derive intuitive and general sufficient conditions for independence, and show that our proposed constructions yield independent p-values.
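
    One of the sequential stopping rules the abstract refers to, ForwardStop of G'Sell et al., has a simple closed form and can be sketched directly: transform the ordered p-values and keep the largest model size whose running average of transformed values stays below the target FDR level. The path p-values below are made up for illustration.

```python
import numpy as np

def forward_stop(pvals, alpha=0.1):
    """ForwardStop rule: khat = max{k : (1/k) * sum_{i<=k} -log(1 - p_i) <= alpha},
    returning 0 when no k qualifies."""
    y = -np.log(1.0 - np.asarray(pvals, dtype=float))
    running = np.cumsum(y) / np.arange(1, len(pvals) + 1)
    passing = np.nonzero(running <= alpha)[0]
    return int(passing[-1]) + 1 if len(passing) else 0  # number of steps kept

# hypothetical path p-values: early steps clearly significant, later ones null
pvals = [0.001, 0.004, 0.02, 0.45, 0.6, 0.33, 0.9]
k_hat = forward_stop(pvals, alpha=0.1)
```

    The independence of the null p-values, which the abstract's constructions guarantee, is exactly what the FDR guarantee of this rule requires.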

    NP-optimal kernels for nonparametric sequential detection rules

    An attractive nonparametric method for detecting change-points sequentially is to apply control charts based on kernel smoothers. Recently, the strong convergence of the normed delay associated with such a sequential stopping rule has been studied under sequences of out-of-control models. Kernel smoothers employ a kernel function to downweight past data. Since kernel functions with values in the unit interval are sufficient for that task, we study the problem of optimizing the asymptotic normed delay over a class of kernels satisfying that restriction and certain additional moment constraints. We apply the key theorem to several important examples where explicit solutions exist, illustrating that the results are applicable. Keywords: control charts, financial data, nonparametric regression, quality control, statistical genetics
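
    A minimal sketch of such a chart: downweight past observations with a [0,1]-valued kernel and raise an alarm when the weighted mean drifts past a threshold; the delay between the change-point and the alarm is the quantity whose normed limit the abstract optimizes over kernels. The Gaussian-type kernel, bandwidth, threshold, and change-point below are hypothetical choices, not the optimal kernels of the paper.

```python
import numpy as np

def kernel_chart(x, h=20, threshold=0.8, kernel=lambda u: np.exp(-u ** 2)):
    """Kernel-weighted control chart: at each time n, smooth the observations
    with a kernel taking values in (0, 1] and signal when the weighted mean
    crosses the threshold."""
    for n in range(1, len(x) + 1):
        u = (n - np.arange(1, n + 1)) / h   # scaled distance into the past
        w = kernel(u)                       # weights in (0, 1], recent data ~1
        stat = np.sum(w * x[:n]) / np.sum(w)
        if stat > threshold:
            return n        # stopping time (alarm)
    return None

rng = np.random.default_rng(0)
in_control = rng.standard_normal(100)                       # mean 0
out_of_control = np.concatenate([in_control,
                                 1.5 + rng.standard_normal(100)])  # mean shift
alarm = kernel_chart(out_of_control)
delay = None if alarm is None else alarm - 100   # detection delay after change
```

    Restricting kernels to the unit interval, as the abstract notes, loses nothing for this downweighting task, which is what makes the optimization over that class meaningful.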

    Group sequential designs for negative binomial outcomes

    Count data and recurrent events in clinical trials, such as the number of lesions in magnetic resonance imaging in multiple sclerosis, the number of relapses in multiple sclerosis, the number of hospitalizations in heart failure, and the number of exacerbations in asthma or in chronic obstructive pulmonary disease (COPD), are often modeled by negative binomial distributions. In this manuscript we study planning and analyzing clinical trials with group sequential designs for negative binomial outcomes. We propose a group sequential testing procedure for negative binomial outcomes based on Wald statistics using maximum likelihood estimators. The asymptotic distributions of the proposed group sequential test statistics are derived. The finite-sample properties of the proposed group sequential tests for negative binomial outcomes, and the methods for planning the respective clinical trials, are assessed in a simulation study. The simulation scenarios are motivated by clinical trials in chronic heart failure and relapsing multiple sclerosis, which cover a wide range of practically relevant settings. Our results show that the asymptotic normal theory of group sequential designs can be applied to negative binomial outcomes when the hypotheses are tested using Wald statistics and maximum likelihood estimators. We also propose two methods, one based on Student's t-distribution and one based on resampling, to improve type I error rate control in small samples. The statistical methods studied in this manuscript are implemented in the R package gscounts, which is available for download from the Comprehensive R Archive Network (CRAN).
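
    A Wald statistic for comparing two negative binomial rates, together with a two-look group sequential check, can be sketched as follows. The dispersion is treated as known (so the rate MLE is the sample mean), and the critical values, sample sizes, and rate parameters are illustrative placeholders rather than a calibrated design or the gscounts implementation.

```python
import numpy as np

def nb_wald(x1, x2, shape):
    """Wald statistic for the log rate ratio of two negative binomial samples;
    with known dispersion `shape`, the rate MLE is the sample mean and the
    delta-method variance of log(mean) is (1/mu + 1/shape)/n."""
    m1, m2 = x1.mean(), x2.mean()
    var_log = (1 / m1 + 1 / shape) / len(x1) + (1 / m2 + 1 / shape) / len(x2)
    return float((np.log(m1) - np.log(m2)) / np.sqrt(var_log))

rng = np.random.default_rng(0)
shape, mu_ctrl, mu_trt = 2.0, 2.0, 1.2        # hypothetical trial parameters
# negative binomial counts via the gamma-Poisson mixture representation
draw = lambda mu, n: rng.poisson(rng.gamma(shape, mu / shape, n))

boundaries = [2.80, 1.98]       # illustrative O'Brien-Fleming-type critical values
n_per_look = 150
decision = "no rejection"
x_ctrl, x_trt = np.empty(0), np.empty(0)
for look, c in enumerate(boundaries, 1):
    x_ctrl = np.concatenate([x_ctrl, draw(mu_ctrl, n_per_look)])
    x_trt = np.concatenate([x_trt, draw(mu_trt, n_per_look)])
    z = nb_wald(x_ctrl, x_trt, shape)
    if abs(z) > c:
        decision = f"reject at look {look}"
        break
```

    In practice the dispersion is also estimated by maximum likelihood and the boundaries come from the design's alpha-spending; the sketch only shows how the interim Wald statistics feed the sequential decision.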