78 research outputs found
A note on an Adaptive Goodness-of-Fit test with Finite Sample Validity for Random Design Regression Models
Given an i.i.d. sample $\{(X_i, Y_i)\}_{i=1}^{n}$ from the random
design regression model $Y = f(X) + \epsilon$ with $(X, Y) \in [0,1] \times [-M, M]$, in this paper we consider the problem of testing the (simple) null
hypothesis $f = f_0$ against the alternative $f \neq f_0$ for a fixed $f_0 \in L^2([0,1], G_X)$, where $G_X(\cdot)$ denotes the marginal distribution of the
design variable $X$. The procedure proposed is an adaptation to the regression
setting of a multiple testing technique introduced by Fromont and Laurent
(2005), and it amounts to considering a suitable collection of unbiased estimators
of the $L^2$-distance $d_2(f, f_0) = \int [f(x) - f_0(x)]^2 \, dG_X(x)$,
rejecting the null hypothesis when at least one of them is greater than its
$(1 - u_\alpha)$ quantile, with $u_\alpha$ calibrated to obtain a level-$\alpha$
test. To build these estimators, we will use the warped wavelet basis
introduced by Picard and Kerkyacharian (2004). We do not assume that the errors
are normally distributed, and we do not assume that $X$ and $\epsilon$ are
independent but, mainly for technical reasons, we will assume, as in most
of the current literature in learning theory, that $|f(x) - y|$ is uniformly
bounded (almost everywhere). We show that our test is adaptive over a
particular collection of approximation spaces linked to the classical Besov
spaces.
Topological summaries for Time-Varying Data
Topology has proven to be a useful tool in the current quest for "insights on the data", since it characterises objects through their connectivity structure, in an easy and interpretable way. More specifically, the new, but growing, field of TDA (Topological Data Analysis) deals with Persistent Homology, a multiscale version of Homology Groups summarized by the Persistence Diagram and its functional representations (Persistence Landscapes, Silhouettes, etc.). All of these objects, however, are designed and work only for static point clouds. We define a new topological summary, the Landscape Surface, that takes into account the changes in the topology of a dynamical point cloud such as a (possibly very high dimensional) time series. We prove its continuity and its stability and, finally, we sketch a simple example.
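As a minimal sketch of one of the static summaries mentioned above, the k-th Persistence Landscape at a point t can be taken as the k-th largest value of max(0, min(t - b, d - t)) over the (birth, death) pairs (b, d) of a persistence diagram. The function name and the toy diagram below are illustrative assumptions, not taken from the paper, and the Landscape Surface itself is not implemented here.

```python
def landscape(diagram, k, t):
    """k-th persistence landscape (k = 1 is the outermost layer) at t."""
    # Each (birth, death) pair contributes a "tent" function that peaks
    # at the midpoint of the interval and vanishes outside it.
    tents = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in diagram),
        reverse=True,
    )
    # The k-th landscape is the k-th largest tent value (0 if none is left).
    return tents[k - 1] if k <= len(tents) else 0.0


# Toy diagram with two nested intervals (hypothetical values).
diagram = [(0.0, 4.0), (1.0, 3.0)]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
first = [landscape(diagram, 1, t) for t in grid]
second = [landscape(diagram, 2, t) for t in grid]
```

Evaluating such a function on a grid for each time step of a time-varying point cloud would give one landscape per time point, which is the kind of object a surface over time could be assembled from.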
Persistence Flamelets: multiscale Persistent Homology for kernel density exploration
In recent years there has been noticeable interest in the study of the "shape
of data". Among the many ways a "shape" could be defined, topology is the most
general one, as it describes an object in terms of its connectivity structure:
connected components (topological features of dimension 0), cycles (features of
dimension 1) and so on. There is a growing number of techniques, generally
denoted as Topological Data Analysis, aimed at estimating topological
invariants of a fixed object; when we allow this object to change, however,
little has been done to investigate the evolution in its topology. In this work
we define the Persistence Flamelets, a multiscale version of one of the most
popular tools in TDA, the Persistence Landscape. We examine its theoretical
properties and we show how it could be used to gain insights on the bandwidth
parameter of KDEs.
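To illustrate why the bandwidth matters topologically, the sketch below counts the local maxima of a one-dimensional Gaussian kernel density estimate at two bandwidths. This is only a toy illustration under assumed data and grid settings (all names and values are hypothetical); it is not the Persistence Flamelets construction.

```python
import math


def kde(data, h, x):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def count_modes(data, h, lo=-10.0, hi=15.0, steps=500):
    """Count local maxima of the KDE on a regular grid over [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    f = [kde(data, h, x) for x in grid]
    # A grid point is a mode if it rises from the left and does not
    # rise further to the right (ties are attributed to the left point).
    return sum(
        1 for i in range(1, steps)
        if f[i] > f[i - 1] and f[i] >= f[i + 1]
    )


# Two well-separated clusters (toy data).
data = [-0.1, 0.0, 0.1, 4.9, 5.0, 5.1]
modes_small = count_modes(data, 0.5)  # small bandwidth keeps both clusters
modes_large = count_modes(data, 5.0)  # large bandwidth merges them
```

Tracking how such features appear and disappear as the bandwidth varies is the kind of question a multiscale summary is meant to answer.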
Supervised Learning with Indefinite Topological Kernels
Topological Data Analysis (TDA) is a recent and growing branch of statistics
devoted to the study of the shape of the data. In this work we investigate the
predictive power of TDA in the context of supervised learning. Since
topological summaries, most notably the Persistence Diagram, are typically
defined in complex spaces, we adopt a kernel approach to translate them into
more familiar vector spaces. We define a topological exponential kernel, we
characterize it, and we show that, despite not being positive semi-definite, it
can be successfully used in regression and classification tasks.
Warped Wavelet and Vertical Thresholding
Let $\{(X_i, Y_i)\}_{i=1}^{n}$ be an i.i.d. sample from the random
design regression model $Y = f(X) + \epsilon$ with $(X, Y) \in [0,1] \times [-M, M]$.
In dealing with such a model, adaptation is naturally understood in terms
of the $L^2([0,1], G_X)$ norm, where $G_X(\cdot)$ denotes the (known) marginal
distribution of the design variable $X$. Recently much work has been devoted to
the construction of estimators that adapt in this setting (see, for example,
[5,24,25,32]), but only a few of them come with an easy-to-implement
computational scheme. Here we propose a family of estimators based on the
warped wavelet basis recently introduced by Picard and Kerkyacharian [36] and a
tree-like thresholding rule that takes into account the hierarchical
(across-scale) structure of the wavelet coefficients. We show that, if the
regression function belongs to a certain class of approximation spaces defined
in terms of $G_X(\cdot)$, then our procedure is adaptive and converges to the
true regression function at an optimal rate. The results are stated in terms
of excess probabilities as in [19].
Comment: Submitted to the Electronic Journal of Statistics
(http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
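The warping idea can be sketched as follows: a wavelet is composed with the design distribution function G, and each coefficient is estimated by the sample mean of Y_i times the warped wavelet at X_i. The sketch assumes the Haar wavelet, a uniform design (so that G is the identity on [0, 1]) and plain hard thresholding; these are illustrative choices, and the tree-like thresholding rule of the paper is not reproduced.

```python
def haar(j, k, u):
    """Haar wavelet psi_{j,k}(u) = 2^(j/2) * psi(2^j * u - k)."""
    v = (2 ** j) * u - k
    if 0.0 <= v < 0.5:
        return 2 ** (j / 2)
    if 0.5 <= v < 1.0:
        return -(2 ** (j / 2))
    return 0.0


def warped_coefficient(xs, ys, j, k, G):
    """Empirical coefficient (1/n) * sum_i Y_i * psi_{j,k}(G(X_i))."""
    return sum(y * haar(j, k, G(x)) for x, y in zip(xs, ys)) / len(xs)


def hard_threshold(beta, t):
    """Keep a coefficient only if it exceeds the threshold in magnitude."""
    return beta if abs(beta) > t else 0.0


# Toy sample: a step function observed without noise (hypothetical data).
xs = [0.1, 0.3, 0.6, 0.9]
ys = [1.0, 1.0, -1.0, -1.0]
G = lambda x: x  # assumed known design distribution: uniform on [0, 1]
beta = warped_coefficient(xs, ys, 0, 0, G)
kept = hard_threshold(beta, 0.5)
```

With a non-uniform design, only the function passed as G would change; the wavelet and the averaging step stay the same.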
Reference charts for fetal cerebellar vermis height: A prospective cross-sectional study of 10605 fetuses
A prospective cross-sectional study between September 2009 and December 2014 was carried out at ALTAMEDICA Fetal–Maternal Medical Centre, Rome, Italy. Of 25203 fetal biometric measurements, 12167 (48%) measurements of the cerebellar vermis were available. After excluding 1562 (12.8%) measurements, a total of 10605 (87.2%) fetuses were considered and analyzed once only. Parametric and nonparametric quantile regression models were used for the statistical analysis. To evaluate the robustness of the proposed reference charts to various distributional assumptions on the ultrasound measurements, we compared the gestational age-specific reference curves produced by the different statistical methods. Normal mean heights based on parametric and nonparametric methods were defined for each week of gestation, and the regression equation expressing the height of the cerebellar vermis as a function of gestational age was calculated. Finally, the correlation between dimension and gestation was measured.
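Quantile regression in general is built on the check (pinball) loss, rho_tau(u) = u * (tau - 1{u < 0}). The sketch below shows that minimising this loss over a constant recovers a sample quantile. It is a generic illustration with toy numbers, not the models or the data of the study.

```python
def pinball(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(ys, tau):
    """Sample value that minimises the total check loss at level tau."""
    return min(ys, key=lambda q: sum(pinball(y - q, tau) for y in ys))


# Toy measurements with one outlier (hypothetical values, not study data).
heights = [1.0, 2.0, 3.0, 4.0, 100.0]
median = best_constant(heights, 0.5)
```

Replacing the constant with a function of gestational age and minimising the same loss would give an age-specific reference curve for the chosen quantile level.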
A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium
When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
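For orientation, one common parametrisation of the CAR model (the so-called proper CAR) specifies the spatial effect through the precision matrix Q = tau * (D - rho * W), where W is the site adjacency matrix and D the diagonal matrix of neighbour counts. The sketch below builds Q for a toy three-site map. The parametrisation and all names are assumptions made for illustration, and the DAGAR construction is not shown.

```python
def car_precision(adjacency, rho, tau=1.0):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR model."""
    n = len(adjacency)
    # D is diagonal and holds the number of neighbours of each site.
    degree = [sum(row) for row in adjacency]
    return [
        [tau * ((degree[i] if i == j else 0.0) - rho * adjacency[i][j])
         for j in range(n)]
        for i in range(n)
    ]


# Three sites on a line: site 1 neighbours both site 0 and site 2.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
Q = car_precision(W, rho=0.5)
```

In this form ρ enters only through the precision matrix, so it does not translate directly into a correlation between neighbouring sites, which is the interpretability issue the abstract contrasts with DAGAR.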
RODEO for Sparse Nonparametric Regression and Quantile Regression with Censored Data
RODEO is a recently developed general strategy for nonparametric estimation based on the regularization of the estimator derivatives with respect to the smoothing parameters. In the original nonparametric regression framework, RODEO results in a simple yet effective new algorithm for simultaneous bandwidth and variable selection with interesting theoretical properties. In this work we focus on a censored regression model in which only the response variable is (right) censored whereas the covariates, although fully observed, are assumed to live in a high dimensional space. In order to recover a sparse representation of both the regression function and the quantile regression function, we adapt RODEO to the present setting starting from the weighted local linear estimator proposed by Cai (2003). We study its theoretical properties and evaluate its performance on both real and simulated data sets.