78 research outputs found

    A note on an Adaptive Goodness-of-Fit test with Finite Sample Validity for Random Design Regression Models

    Given an i.i.d. sample $\{(X_i,Y_i)\}_{i \in \{1, \ldots, n\}}$ from the random design regression model $Y = f(X) + \epsilon$ with $(X,Y) \in [0,1] \times [-M,M]$, in this paper we consider the problem of testing the (simple) null hypothesis $f = f_0$ against the alternative $f \neq f_0$ for a fixed $f_0 \in L^2([0,1],G_X)$, where $G_X(\cdot)$ denotes the marginal distribution of the design variable $X$. The proposed procedure adapts to the regression setting a multiple testing technique introduced by Fromont and Laurent (2005): it considers a suitable collection of unbiased estimators of the $L^2$-distance $d_2(f,f_0) = \int [f(x) - f_0(x)]^2 \, dG_X(x)$ and rejects the null hypothesis when at least one of them is greater than its $(1-u_\alpha)$ quantile, with $u_\alpha$ calibrated to obtain a level-$\alpha$ test. To build these estimators, we use the warped wavelet basis introduced by Picard and Kerkyacharian (2004). We do not assume that the errors are normally distributed, and we do not assume that $X$ and $\epsilon$ are independent but, mainly for technical reasons, we assume, as in most of the current literature in learning theory, that $|f(x) - y|$ is uniformly bounded (almost everywhere). We show that our test is adaptive over a particular collection of approximation spaces linked to the classical Besov spaces.
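
    The rejection rule described in this abstract can be written compactly as follows. The notation is an assumption made for illustration and is not taken from the paper: $\mathcal{M}$ indexes the collection of estimators, $\hat{\theta}_m$ is the $m$-th unbiased estimator of $d_2(f,f_0)$, and $q_m(u)$ is the $(1-u)$ quantile of $\hat{\theta}_m$ under the null hypothesis.

```latex
\[
  \text{reject } H_0 : f = f_0
  \quad\Longleftrightarrow\quad
  \sup_{m \in \mathcal{M}} \bigl( \hat{\theta}_m - q_m(u_\alpha) \bigr) > 0,
\]
\[
  \text{with } u_\alpha \text{ chosen so that }\quad
  \mathbb{P}_{f_0}\!\Bigl( \sup_{m \in \mathcal{M}} \bigl( \hat{\theta}_m - q_m(u_\alpha) \bigr) > 0 \Bigr) \le \alpha .
\]
```

    The second display is one natural way to read "calibrated to obtain a level-$\alpha$ test"; the exact calibration used by the authors may differ.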

    Topological summaries for Time-Varying Data

    Topology has proven to be a useful tool in the current quest for "insights on the data", since it characterises objects through their connectivity structure, in an easy and interpretable way. More specifically, the new, but growing, field of TDA (Topological Data Analysis) deals with Persistent Homology, a multiscale version of Homology Groups summarized by the Persistence Diagram and its functional representations (Persistence Landscapes, Silhouettes, etc.). All of these objects, however, are designed and work only for static point clouds. We define a new topological summary, the Landscape Surface, that takes into account the changes in the topology of a dynamical point cloud such as a (possibly very high dimensional) time series. We prove its continuity and its stability and, finally, we sketch a simple example.

    Persistence Flamelets: multiscale Persistent Homology for kernel density exploration

    In recent years there has been noticeable interest in the study of the "shape of data". Among the many ways a "shape" could be defined, topology is the most general one, as it describes an object in terms of its connectivity structure: connected components (topological features of dimension 0), cycles (features of dimension 1) and so on. There is a growing number of techniques, generally denoted as Topological Data Analysis, aimed at estimating topological invariants of a fixed object; when we allow this object to change, however, little has been done to investigate the evolution of its topology. In this work we define the Persistence Flamelets, a multiscale version of one of the most popular tools in TDA, the Persistence Landscape. We examine its theoretical properties and we show how it can be used to gain insights into the bandwidth parameter of KDEs.
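
    For readers unfamiliar with the Persistence Landscape mentioned in this abstract, the following is a minimal sketch of how the k-th landscape function can be evaluated from a persistence diagram, using its standard definition as the k-th largest "tent" value at each point. The function name, the toy diagram and the grid are illustrative assumptions; the multiscale (Flamelet) construction itself is not reproduced here.

```python
import numpy as np

def persistence_landscape(diagram, k, grid):
    """Evaluate the k-th landscape function (k = 1 is the outermost) on a grid.

    diagram: sequence of (birth, death) pairs (hypothetical toy input).
    """
    diagram = np.asarray(diagram, dtype=float)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    # One tent per feature: max(0, min(t - birth, death - t)).
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births,
                                       deaths - grid[None, :]))
    if k > tents.shape[0]:
        # Fewer than k features: the k-th landscape is identically zero.
        return np.zeros_like(grid)
    # Sort each grid column in decreasing order and keep the k-th largest value.
    return -np.sort(-tents, axis=0)[k - 1]

# Toy diagram with two nested features (assumed values, not real data).
diagram = [(0.0, 4.0), (1.0, 3.0)]
grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
lam1 = persistence_landscape(diagram, 1, grid)
lam2 = persistence_landscape(diagram, 2, grid)
```

    Stacking such landscapes over a range of a scale parameter (for instance a kernel bandwidth) would give a surface in the spirit of the abstract, under the assumption that one diagram is available per scale.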

    Supervised Learning with Indefinite Topological Kernels

    Topological Data Analysis (TDA) is a recent and growing branch of statistics devoted to the study of the shape of the data. In this work we investigate the predictive power of TDA in the context of supervised learning. Since topological summaries, most notably the Persistence Diagram, are typically defined in complex spaces, we adopt a kernel approach to translate them into more familiar vector spaces. We define a topological exponential kernel, we characterize it, and we show that, despite not being positive semi-definite, it can be successfully used in regression and classification tasks.
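
    To illustrate why an exponential kernel built on a non-Euclidean distance may fail to be positive semi-definite, here is a small sketch. It assumes a kernel of the form exp(-d^2 / h); this form, the bandwidth h = 5 and the toy distance matrix (shortest-path distances between four points on a cycle, standing in for distances between persistence diagrams) are assumptions for illustration, not the definitions used in the paper.

```python
import numpy as np

def exponential_kernel(dist, h):
    """Gram matrix K[i, j] = exp(-dist[i, j]**2 / h) from pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    return np.exp(-dist ** 2 / h)

def min_eigenvalue(gram):
    """Smallest eigenvalue of a symmetric matrix; negative means indefinite."""
    return float(np.linalg.eigvalsh(gram).min())

# Shortest-path distances on a 4-cycle: neighbours at 1, opposite points at 2.
cycle_dist = np.array([
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 2.0, 1.0, 0.0],
])

gram = exponential_kernel(cycle_dist, h=5.0)
lam_min = min_eigenvalue(gram)
print("smallest eigenvalue:", lam_min)  # negative here, so the Gram matrix is indefinite
```

    With such a Gram matrix, methods that require positive semi-definiteness need some adaptation; which adaptation the authors use is not stated in the abstract.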

    Warped Wavelet and Vertical Thresholding

    Let $\{(X_i,Y_i)\}_{i\in \{1,\ldots, n\}}$ be an i.i.d. sample from the random design regression model $Y=f(X)+\epsilon$ with $(X,Y)\in [0,1]\times [-M,M]$. In dealing with such a model, adaptation is naturally to be understood in terms of the $L^2([0,1],G_X)$ norm, where $G_X(\cdot)$ denotes the (known) marginal distribution of the design variable $X$. Recently much work has been devoted to the construction of estimators that adapt in this setting (see, for example, [5,24,25,32]), but only a few of them come with an easy-to-implement computational scheme. Here we propose a family of estimators based on the warped wavelet basis recently introduced by Picard and Kerkyacharian [36] and a tree-like thresholding rule that takes into account the hierarchical (across-scale) structure of the wavelet coefficients. We show that, if the regression function belongs to a certain class of approximation spaces defined in terms of $G_X(\cdot)$, then our procedure is adaptive and converges to the true regression function at an optimal rate. The results are stated in terms of excess probabilities as in [19]. Comment: Submitted to the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Reference charts for fetal cerebellar vermis height: A prospective cross-sectional study of 10605 fetuses

    A prospective cross-sectional study between September 2009 and December 2014 was carried out at ALTAMEDICA Fetal–Maternal Medical Centre, Rome, Italy. Of 25203 fetal biometric measurements, 12167 (48%) measurements of the cerebellar vermis were available. After excluding 1562 (12.8%) measurements, a total of 10605 (87.2%) fetuses were considered, each analyzed only once. Parametric and nonparametric quantile regression models were used for the statistical analysis. In order to evaluate the robustness of the proposed reference charts to various distributional assumptions on the ultrasound measurements at hand, we compared the gestational age-specific reference curves produced by the different statistical methods. Normal mean heights based on parametric and nonparametric methods were defined for each week of gestation, and the regression equation expressing the height of the cerebellar vermis as a function of gestational age was calculated. Finally, the correlation between dimension and gestation was measured.
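
    As background on the quantile regression models mentioned in this abstract, the sketch below shows the check (pinball) loss that quantile regression minimizes, and verifies on toy numbers that its minimizer at level 0.5 is the sample median. The data and names are illustrative assumptions and have no connection to the study's measurements.

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Average check loss at quantile level tau for a constant prediction."""
    resid = np.asarray(y, dtype=float) - pred
    # tau * r for positive residuals, (tau - 1) * r for negative ones.
    return float(np.mean(np.maximum(tau * resid, (tau - 1.0) * resid)))

# Toy sample with one outlier (assumed values, not real data).
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Search over the observed values for the best constant predictor at tau = 0.5.
losses = [pinball_loss(y, c, 0.5) for c in y]
best = float(y[int(np.argmin(losses))])
print("minimizer of the check loss at tau = 0.5:", best)
```

    Replacing the constant predictor with a function of gestational age, and tau with the desired centile, gives the kind of reference curve the abstract refers to, under whatever model family is chosen.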

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
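
    The remark about the interpretation of ρ can be checked numerically for the CAR side. The sketch below builds the precision matrix of a proper CAR model in its common form Q = τ(D − ρW), where W is a binary adjacency matrix and D holds the neighbour counts, and compares ρ with the average correlation between neighbouring sites. The four-site path graph and the values ρ = 0.9, τ = 1 are assumptions for illustration; the DAGAR construction is not reproduced.

```python
import numpy as np

def car_precision(W, rho, tau=1.0):
    """Precision matrix tau * (D - rho * W) of a proper CAR model."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))  # diagonal matrix of neighbour counts
    return tau * (D - rho * W)

# Adjacency of four sites on a line: 1 - 2 - 3 - 4 (hypothetical map).
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

rho = 0.9
Q = car_precision(W, rho)

# Implied covariance and correlation of the spatial effect.
cov = np.linalg.inv(Q)
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

# Average correlation over neighbouring pairs only.
neighbor_corr = float(corr[W == 1].mean())
print("rho:", rho, "average neighbour correlation:", neighbor_corr)
```

    In this toy setting the average neighbour correlation is visibly smaller than ρ, which is consistent with the abstract's point that ρ in a CAR model lacks that direct interpretation.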

    RODEO for Sparse Nonparametric Regression and Quantile Regression with Censored Data

    RODEO is a recently developed general strategy for nonparametric estimation based on the regularization of the estimator derivatives with respect to the smoothing parameters. In the original nonparametric regression framework, RODEO results in a simple yet effective new algorithm for simultaneous bandwidth and variable selection with interesting theoretical properties. In this work we focus on a censored regression model in which only the response variable is (right) censored whereas the covariates, although fully observed, are assumed to live in a high dimensional space. In order to recover a sparse representation of both the regression function and the quantile regression function, we adapt RODEO to the present setting starting from the weighted local linear estimator proposed by Cai (2003). We study its theoretical properties and evaluate its performance on both real and simulated data sets.

    CNRS 2009
