Early Stopping for Nonparametric Testing
Early stopping of iterative algorithms is an algorithmic regularization
method to avoid over-fitting in estimation and classification. In this paper,
we show that early stopping can also be applied to obtain minimax optimal
testing in a general nonparametric setup. Specifically, a Wald-type test
statistic is obtained based on an iterated estimate produced by functional
gradient descent algorithms in a reproducing kernel Hilbert space. A notable
contribution is to establish a "sharp" stopping rule: when the number of
iterations achieves an optimal order, testing optimality is achievable;
otherwise, testing optimality becomes impossible. As a by-product, a similar
sharpness result is also derived for minimax optimal estimation under early
stopping studied in [11] and [19]. All obtained results hold for various kernel
classes, including Sobolev smoothness classes and Gaussian kernel classes.
Comment: To appear in NIPS 201
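The iterated estimate in the abstract above can be pictured with a minimal sketch: functional gradient descent for squared loss in an RKHS reduces to gradient descent on the kernel expansion coefficients, and early stopping simply caps the iteration count. The kernel choice, step size, and function names below are illustrative, not the paper's construction.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian kernel matrix between the rows of X and Z (illustrative choice)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_gd(K, y, step=0.5, n_iter=50):
    # Functional gradient descent for least squares in an RKHS:
    # the estimate is f_t = K @ a_t, with a_{t+1} = a_t + step * (y - K a_t)/n.
    # Early stopping = returning after n_iter iterations instead of converging.
    n = len(y)
    a = np.zeros(n)
    for _ in range(n_iter):
        a += step * (y - K @ a) / n
    return a
```

Running the descent longer drives the training error down monotonically; the paper's point is that a particular order of `n_iter` is exactly what yields testing optimality.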
A Decision-Theoretic Comparison of Treatments to Resolve Air Leaks After Lung Surgery Based on Nonparametric Modeling
We propose a Bayesian nonparametric utility-based group sequential design for
a randomized clinical trial to compare a gel sealant to standard care for
resolving air leaks after pulmonary resection. Clinically, resolving air leaks
in the days soon after surgery is highly important, since longer resolution
time produces undesirable complications that require extended hospitalization.
The problem of comparing treatments is complicated by the fact that the
resolution time distributions are skewed and multi-modal, so using means is
misleading. We address these challenges by assuming Bayesian nonparametric
probability models for the resolution time distributions and basing the
comparative test on weighted means. The weights are elicited as clinical
utilities of the resolution times. The proposed design uses posterior expected
utilities as group sequential test criteria. The procedure's frequentist
properties are studied by extensive simulations.
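The case for utility-weighted means over raw means can be shown with a toy example; the step-function utility below is invented for illustration, not the trial's elicited utilities.

```python
def utility(t):
    # Illustrative clinical utility of an air-leak resolution time t (days):
    # fast resolution is highly valued, slow resolution barely at all.
    if t <= 3:
        return 100.0
    if t <= 7:
        return 60.0
    return 10.0

def expected_utility(times):
    # Utility-weighted summary of a sample (or posterior draw) of times.
    return sum(utility(t) for t in times) / len(times)
```

With a skewed sample like [1, 2, 30] the raw mean (11 days) looks worse than a uniform [8, 8, 8] arm (8 days), yet the utility-weighted comparison correctly favors the arm where most patients resolve quickly.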
Monitoring Procedures to Detect Unit Roots and Stationarity
When analysing a time series, an important issue is to decide whether it is
stationary or a random walk. Relaxing these notions, we consider the problem
of deciding in favor of the I(0)- or I(1)-property. Fixed-sample
statistical tests for that problem are well studied in the literature. In this
paper we provide first results for the problem of sequentially monitoring a time
series. Our stopping times are based on a sequential version of a
kernel-weighted variance-ratio statistic. The asymptotic distributions are
established for I(1) processes, a rich class of stationary processes, possibly
affected by local nonparametric alternatives, and the local-to-unity model.
Further, we consider the two interesting change-point models where the time
series changes its behaviour after a certain fraction of the observations and
derive the associated limiting laws. Our Monte-Carlo studies show that the
proposed detection procedures have high power when interpreted as a hypothesis
test, and that the decision can often be made very early.
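A minimal sketch of this kind of monitoring scheme: a KPSS-type variance-ratio statistic evaluated on growing samples, with a stopping time that fires once it exceeds a threshold. The naive variance estimate and the threshold are illustrative stand-ins for the paper's kernel-weighted statistic and calibrated critical values.

```python
import numpy as np

def variance_ratio(x):
    # KPSS-type variance-ratio statistic: squared partial sums of demeaned
    # data, scaled by the sample size and a (naive) variance estimate.
    # Small under stationarity, diverging under a random walk.
    x = np.asarray(x, float)
    n = len(x)
    s = np.cumsum(x - x.mean())
    var = np.mean((x - x.mean()) ** 2)
    return (s ** 2).sum() / (n ** 2 * var)

def monitor(x, threshold, n_min=20):
    # Sequential monitoring: recompute the statistic as observations arrive
    # and stop at the first time it crosses the threshold (I(1) detected).
    for t in range(n_min, len(x) + 1):
        if variance_ratio(x[:t]) > threshold:
            return t
    return None
```

A bounded oscillating series never triggers the rule, while a strongly trending one is flagged quickly, matching the high-power/early-decision behaviour reported in the abstract.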
Pointwise adaptive estimation for robust and quantile regression
A nonparametric procedure for robust regression estimation and for quantile
regression is proposed which is completely data-driven and adapts locally to
the regularity of the regression function. This is achieved by considering in
each point M-estimators over different local neighbourhoods and by a local
model selection procedure based on sequential testing. Non-asymptotic risk
bounds are obtained, which yield rate-optimality for large sample asymptotics
under weak conditions. Simulations for different univariate median regression
models show good finite sample properties, also in comparison to traditional
methods. The approach is extended to image denoising and applied to CT scans in
cancer research.
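The local model selection idea can be sketched as a Lepski-type rule with medians (the M-estimator for median regression): enlarge the neighbourhood as long as each new estimate stays consistent with the previously accepted ones. The fixed acceptance constant below is purely illustrative; the paper's sequential tests are properly calibrated.

```python
import statistics

def adaptive_local_median(y, i, widths, crit):
    # Pointwise adaptive estimation at index i: compute local medians over
    # nested windows of half-widths `widths` (increasing), and keep enlarging
    # while the new median lies within `crit` of every accepted estimate.
    est = statistics.median(y[max(0, i - widths[0]): i + widths[0] + 1])
    accepted = [est]
    for h in widths[1:]:
        cand = statistics.median(y[max(0, i - h): i + h + 1])
        if any(abs(cand - a) > crit for a in accepted):
            break  # larger window is inconsistent: stop, keep last estimate
        accepted.append(cand)
        est = cand
    return est
```

On a piecewise-constant signal the rule uses large windows deep inside each regime, which is exactly the local adaptation to regularity the abstract describes.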
msBP: An R package to perform Bayesian nonparametric inference using multiscale Bernstein polynomials mixtures
msBP is an R package that implements a new method to perform Bayesian multiscale nonparametric inference introduced by Canale and Dunson (2016). The method, based on mixtures of multiscale beta dictionary densities, overcomes the drawbacks of Pólya trees and inherits many of the advantages of Dirichlet process mixture models. The key idea is that an infinitely-deep binary tree is introduced, with a beta dictionary density assigned to each node of the tree. Using a multiscale stick-breaking characterization, stochastically decreasing weights are assigned to each node. The result is an infinite mixture model. The package msBP implements a series of basic functions to deal with this family of priors, such as random density and number generation, creation and manipulation of binary tree objects, and generic functions to plot and print the results. In addition, it implements the Gibbs samplers for posterior computation to perform multiscale density estimation and multiscale testing of group differences described in Canale and Dunson (2016).
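The multiscale stick-breaking construction can be sketched as follows. The Beta parameters and the exact split mechanism here are simplified stand-ins for the Canale and Dunson (2016) prior, which msBP implements in full: mass flows down a binary tree, each node keeps a random "stopping" fraction, and the rest is split between its two children.

```python
import numpy as np

def multiscale_weights(max_depth, a=5.0, rng=None):
    # Toy multiscale stick-breaking over a binary tree truncated at max_depth.
    # Node (s, h) at scale s keeps a Beta-distributed fraction of the mass
    # reaching it; the remainder is split between children 2h and 2h+1.
    rng = np.random.default_rng(rng)
    weights = {}
    arriving = {(0, 0): 1.0}          # mass flowing into each node; root gets 1
    while arriving:
        nxt = {}
        for (s, h), mass in arriving.items():
            stop = rng.beta(1.0, a)   # fraction of the stick stopping here
            weights[(s, h)] = mass * stop
            if s < max_depth:
                right = rng.beta(a, a)  # share of remaining mass sent right
                nxt[(s + 1, 2 * h)] = mass * (1 - stop) * (1 - right)
                nxt[(s + 1, 2 * h + 1)] = mass * (1 - stop) * right
        arriving = nxt
    return weights
```

Weights are nonnegative and sum to less than one at any finite truncation, with the remainder living below `max_depth`; this is the "stochastically decreasing weights on an infinitely-deep tree" picture from the abstract.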
Nearest-Neighbor Neural Networks for Geostatistics
Kriging is the predominant method used for spatial prediction, but relies on
the assumption that predictions are linear combinations of the observations.
Kriging often also relies on additional assumptions such as normality and
stationarity. We propose a more flexible spatial prediction method based on the
Nearest-Neighbor Neural Network (4N) process that embeds deep learning into a
geostatistical model. We show that the 4N process is a valid stochastic process
and propose a series of new ways to construct features to be used as inputs to
the deep learning model based on neighboring information. Our model framework
outperforms some existing state-of-the-art geostatistical modelling methods for
simulated non-Gaussian data and is applied to a massive forestry dataset.
Sequentially Updated Residuals and Detection of Stationary Errors in Polynomial Regression Models
The question whether a time series behaves as a random walk or as a
stationary process is an important and delicate problem, particularly arising in
financial statistics, econometrics, and engineering. This paper studies the
problem of sequentially detecting that the error terms in a polynomial regression
model no longer behave as a random walk but as a stationary process. We provide
the asymptotic distribution theory for a monitoring procedure given by a
control chart, i.e., a stopping time, which is related to a well known unit
root test statistic calculated from sequentially updated residuals. We provide
a functional central limit theorem for the corresponding stochastic process
which implies a central limit theorem for the control chart. The finite sample
properties are investigated by a simulation study.
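A toy proxy for the residual-based monitoring: fit a polynomial trend to the first t observations and inspect the lag-one autocorrelation of the residuals, which sits near one under random-walk errors and near zero under iid errors. The actual procedure uses a well-known unit root statistic and a proper control chart, not this autocorrelation.

```python
import numpy as np

def resid_ar1(x, t, degree=1):
    # Fit a degree-`degree` polynomial trend to x[:t], then return the
    # lag-one autocorrelation of the residuals: a crude stand-in for the
    # unit-root statistic computed from sequentially updated residuals.
    u = np.arange(t)
    resid = x[:t] - np.polyval(np.polyfit(u, x[:t], degree), u)
    r = resid - resid.mean()
    return (r[1:] * r[:-1]).sum() / (r ** 2).sum()
```

Sequentially recomputing this quantity as t grows, and stopping once it drops below a calibrated boundary, mirrors the structure of the control chart in the abstract.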
Selective Sequential Model Selection
Many model selection algorithms produce a path of fits specifying a sequence
of increasingly complex models. Given such a sequence and the data used to
produce them, we consider the problem of choosing the least complex model that
is not falsified by the data. Extending the selected-model tests of Fithian et
al. (2014), we construct p-values for each step in the path which account for
the adaptive selection of the model path using the data. In the case of linear
regression, we propose two specific tests, the max-t test for forward stepwise
regression (generalizing a proposal of Buja and Brown (2014)), and the
next-entry test for the lasso. These tests improve on the power of the
saturated-model test of Tibshirani et al. (2014), sometimes dramatically. In
addition, our framework extends beyond linear regression to a much more general
class of parametric and nonparametric model selection problems.
To select a model, we can feed our single-step p-values as inputs into
sequential stopping rules such as those proposed by G'Sell et al. (2013) and Li
and Barber (2015), achieving control of the familywise error rate or false
discovery rate (FDR) as desired. The FDR-controlling rules require the null
p-values to be independent of each other and of the non-null p-values, a
condition not satisfied by the saturated-model p-values of Tibshirani et al.
(2014). We derive intuitive and general sufficient conditions for independence,
and show that our proposed constructions yield independent p-values.
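One of the referenced stopping rules, ForwardStop of G'Sell et al. (2013), is simple enough to state in a few lines: reject the first k hypotheses, where k is the largest index at which the running mean of -log(1 - p_i) is at most alpha. This is a sketch of that published rule, not of the selected-model tests themselves.

```python
import math

def forward_stop(pvals, alpha=0.1):
    # ForwardStop: return the largest k such that
    # (1/k) * sum_{i<=k} -log(1 - p_i) <= alpha, or 0 if none exists.
    total, khat = 0.0, 0
    for k, p in enumerate(pvals, start=1):
        total += -math.log(1.0 - p)
        if total / k <= alpha:
            khat = k
    return khat
```

Feeding the single-step p-values from the path into this rule yields FDR control at level alpha, provided the null p-values satisfy the independence conditions discussed above.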
NP-optimal kernels for nonparametric sequential detection rules
An attractive nonparametric method to detect change-points sequentially is to apply control charts based on kernel smoothers. Recently, the strong convergence of the normed delay associated with such a sequential stopping rule has been studied under sequences of out-of-control models. Kernel smoothers employ a kernel function to downweight past data. Since kernel functions with values in the unit interval are sufficient for that task, we study the problem of optimizing the asymptotic normed delay over a class of kernels satisfying that restriction and certain additional moment constraints. We apply the key theorem to discuss several important examples where explicit solutions exist, illustrating that the results are applicable.
Keywords: control charts, financial data, nonparametric regression, quality control, statistical genetics
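A kernel-weighted control chart of the flavor discussed above can be sketched like this: downweight past observations with a kernel taking values in [0, 1] and signal when the smoothed statistic leaves the control limit. The kernel, bandwidth, and limit below are illustrative choices, not the optimized kernels of the paper.

```python
def kernel_chart(x, h, kernel, limit):
    # Sequential kernel-smoothed control chart: at each time t, form a
    # weighted mean of the observations so far, with weight kernel((t-1-i)/h)
    # on observation i (recent data weighted most); signal when the
    # statistic exceeds the control limit.
    for t in range(1, len(x) + 1):
        w = [kernel((t - 1 - i) / h) for i in range(t)]
        stat = sum(wi * xi for wi, xi in zip(w, x[:t])) / sum(w)
        if stat > limit:
            return t  # stopping time: first alarm
    return None
```

With a triangular kernel truncated to [0, 1], an in-control stream never alarms, while a level shift is flagged within a couple of observations; the normed delay studied in the paper measures exactly this detection lag.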
Group sequential designs for negative binomial outcomes
Count data and recurrent events in clinical trials, such as the number of
lesions in magnetic resonance imaging in multiple sclerosis, the number of
relapses in multiple sclerosis, the number of hospitalizations in heart
failure, and the number of exacerbations in asthma or in chronic obstructive
pulmonary disease (COPD) are often modeled by negative binomial distributions.
In this manuscript we study planning and analyzing clinical trials with group
sequential designs for negative binomial outcomes. We propose a group
sequential testing procedure for negative binomial outcomes based on Wald
statistics using maximum likelihood estimators. The asymptotic distributions
of the proposed group sequential test statistics are derived. The finite-sample
properties of the proposed group sequential test for negative binomial
outcomes and the methods for planning the respective clinical trials are
assessed in a simulation study. The simulation scenarios are motivated by
clinical trials in chronic heart failure and relapsing multiple sclerosis,
which cover a wide range of practically relevant settings. Our results confirm
that the asymptotic normal theory of group sequential designs can be applied to
negative binomial outcomes when the hypotheses are tested using Wald statistics
and maximum likelihood estimators. We also propose two methods, one based on
Student's t-distribution and one based on resampling, to improve type I error
rate control in small samples. The statistical methods studied in this
manuscript are implemented in the R package \textit{gscounts}, which is
available for download on the Comprehensive R Archive Network (CRAN).
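The Wald statistic at the core of such tests can be sketched for the simplified case of a known shape parameter, with rates estimated by sample means and the variance of the log rate estimate taken from the standard negative binomial variance mu + mu^2/shape via the delta method. This is an interim-analysis building block, not the full group sequential machinery of the gscounts package.

```python
import math

def nb_wald(x0, x1, shape):
    # Wald statistic for the log rate ratio between control arm x0 and
    # treatment arm x1 with negative binomial outcomes and known shape:
    # Var(log mean) is approximated by 1/(n*mu) + 1/(n*shape) per arm.
    n0, n1 = len(x0), len(x1)
    m0 = sum(x0) / n0
    m1 = sum(x1) / n1
    var = (1 / (n0 * m0) + 1 / (n0 * shape)) + (1 / (n1 * m1) + 1 / (n1 * shape))
    return (math.log(m1) - math.log(m0)) / math.sqrt(var)
```

In a group sequential design this statistic would be compared against stage-wise boundaries (e.g. O'Brien-Fleming type) at each interim analysis.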