Classification with the nearest neighbor rule in general finite dimensional spaces: necessary and sufficient conditions
Given an $n$-sample of random vectors $(X_i, Y_i)$ whose joint law is unknown, the long-standing problem of supervised classification aims to \textit{optimally} predict the label $Y$ of a given new observation $X$. In this context, the nearest neighbor rule is a popular, flexible and intuitive method in non-parametric situations.
Although this algorithm is commonly used in the machine learning and statistics communities, little is known about its prediction ability in general finite dimensional spaces, especially when the support of the density of the observations is $\mathbb{R}^d$. This paper is devoted to the study of the statistical properties of the nearest neighbor rule in various situations. In particular, attention is paid to the marginal law of $X$, as well as the smoothness and margin properties of the \textit{regression function} $\eta$. We identify two necessary and sufficient conditions to obtain uniform consistency rates of classification and to derive sharp
estimates in the case of the nearest neighbor rule. Some numerical experiments are proposed at the end of the paper to help illustrate the discussion.
Comment: 53 pages, 3 figures
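To make the rule concrete, the following is a minimal sketch of 1-nearest-neighbor classification; the two-class Gaussian setup, sample sizes and function name are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def nn_classify(X_train, y_train, X_new):
    """Predict labels for X_new with the 1-nearest-neighbor rule: each new
    point receives the label of its closest training point in Euclidean
    distance."""
    # Pairwise squared Euclidean distances, shape (n_new, n_train)
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)      # index of the closest training point
    return y_train[nearest]

# Illustrative data: two Gaussian classes in R^2 (an assumed setup)
rng = np.random.default_rng(0)
n = 200
X_train = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)),
                     rng.normal(+1.0, 1.0, size=(n, 2))])
y_train = np.repeat([0, 1], n)
X_test = rng.normal(0.0, 1.0, size=(10, 2))
print(nn_classify(X_train, y_train, X_test))
```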
Intensity estimation of non-homogeneous Poisson processes from shifted trajectories
In this paper, we consider the problem of estimating nonparametrically a mean pattern intensity λ from the observation of n independent and non-homogeneous Poisson processes N1,…,Nn on the interval [0,1]. This problem arises when data (counts) are collected independently from n individuals according to similar Poisson processes. We show that estimating this intensity is a deconvolution problem for which the density of the random shifts plays the role of the convolution operator. In an asymptotic setting where the number n of observed trajectories tends to infinity, we derive upper and lower bounds for the minimax quadratic risk over Besov balls. Non-linear thresholding in a Meyer wavelet basis is used to derive an adaptive estimator of the intensity. The proposed estimator is shown to achieve a near-minimax rate of convergence. This rate depends both on the smoothness of the intensity function and the density of the random shifts, which makes a connection between the classical deconvolution problem in nonparametric statistics and the estimation of a mean intensity from the observations of independent Poisson processes.
Intensity estimation of non-homogeneous Poisson processes from shifted trajectories
This paper considers the problem of adaptive estimation of a non-homogeneous
intensity function from the observation of n independent Poisson processes
having a common intensity that is randomly shifted for each observed
trajectory. We show that estimating this intensity is a deconvolution problem
for which the density of the random shifts plays the role of the convolution
operator. In an asymptotic setting where the number n of observed trajectories
tends to infinity, we derive upper and lower bounds for the minimax quadratic
risk over Besov balls. Non-linear thresholding in a Meyer wavelet basis is used
to derive an adaptive estimator of the intensity. The proposed estimator is
shown to achieve a near-minimax rate of convergence. This rate depends both on
the smoothness of the intensity function and the density of the random shifts,
which makes a connection between the classical deconvolution problem in
nonparametric statistics and the estimation of a mean intensity from the
observations of independent Poisson processes.
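To make the deconvolution step concrete, here is a simplified sketch that works in a trigonometric basis instead of the Meyer wavelet basis used in the paper; the function names (deconvolve_intensity, shift_char), the periodic treatment of the shifts, the hard-thresholding rule and the simulated example are assumptions of this illustration, not the authors' estimator.

```python
import numpy as np

def deconvolve_intensity(trajectories, shift_char, n_freq=15, threshold=None):
    """Simplified Fourier-domain sketch of the deconvolution idea.
    trajectories : list of arrays, each holding the event times of one
                   observed Poisson process on [0, 1] (shifts are treated
                   periodically here, an assumption of this sketch).
    shift_char   : k -> E[exp(-2i*pi*k*tau)], the Fourier coefficient of
                   the shift density, assumed known."""
    n = len(trajectories)
    ks = np.arange(-n_freq, n_freq + 1)
    # Empirical Fourier coefficients of the mean measure (1/n) sum_j dN_j,
    # which estimate the coefficients of (intensity * shift density).
    theta_hat = np.array([sum(np.exp(-2j * np.pi * k * t).sum()
                              for t in trajectories) / n for k in ks])
    # Deconvolution: divide by the coefficients of the shift density.
    coef = theta_hat / np.array([shift_char(k) for k in ks])
    if threshold is not None:
        coef[np.abs(coef) < threshold] = 0.0   # keep only large coefficients
    # Reconstruct the intensity on a grid of [0, 1].
    grid = np.linspace(0.0, 1.0, 200)
    return grid, np.real(coef @ np.exp(2j * np.pi * np.outer(ks, grid)))

# Illustrative simulation: shifts tau ~ N(0, 0.05^2), so
# E[exp(-2i*pi*k*tau)] = exp(-2*pi^2*k^2*0.05^2); the true intensity is assumed.
rng = np.random.default_rng(0)
lam = lambda t: 50.0 * (1.0 + np.cos(2.0 * np.pi * t))
trajectories = []
for _ in range(500):
    shift = rng.normal(0.0, 0.05)
    t = rng.uniform(0.0, 1.0, rng.poisson(100))                       # homogeneous, rate 100
    keep = rng.uniform(0.0, 100.0, t.size) < lam((t - shift) % 1.0)   # thinning
    trajectories.append(t[keep])
grid, lam_hat = deconvolve_intensity(
    trajectories, lambda k: np.exp(-2.0 * np.pi**2 * k**2 * 0.05**2), n_freq=8)
```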
Martingales and Profile of Binary Search Trees
We are interested in the asymptotic analysis of the binary search tree (BST)
under the random permutation model. Via an embedding in a continuous time
model, we obtain new results, in particular on the asymptotic behavior of the
profile.
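As a concrete illustration of the quantity studied here, the sketch below simulates the random permutation model by inserting a uniformly shuffled sequence of keys into a binary search tree and computing its profile, i.e. the number of nodes at each depth; the function name and the choice of n are illustrative.

```python
import random
from collections import Counter

def bst_profile(keys):
    """Insert `keys` (in the given order) into a binary search tree and
    return its profile: a Counter mapping depth -> number of nodes."""
    root = None                      # each node is [key, left, right]
    profile = Counter()
    for key in keys:
        if root is None:
            root = [key, None, None]
            profile[0] += 1
            continue
        depth, node = 0, root
        while True:
            child = 1 if key < node[0] else 2
            if node[child] is None:
                node[child] = [key, None, None]
                profile[depth + 1] += 1
                break
            node, depth = node[child], depth + 1
    return profile

# Random permutation model: the keys 1..n arrive in uniformly random order.
n = 10_000
keys = list(range(1, n + 1))
random.shuffle(keys)
print(sorted(bst_profile(keys).items()))
```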
Geodesic PCA in the Wasserstein space
We introduce the method of Geodesic Principal Component Analysis (GPCA) on
the space of probability measures on the line, with finite second moment,
endowed with the Wasserstein metric. We discuss the advantages of this
approach, over a standard functional PCA of probability densities in the
Hilbert space of square-integrable functions. We establish the consistency of
the method by showing that the empirical GPCA converges to its population
counterpart, as the sample size tends to infinity. A key property in the study
of GPCA is the isometry between the Wasserstein space and a closed convex
subset of the space of square-integrable functions, with respect to an
appropriate measure. Therefore, we consider the general problem of PCA in a
closed convex subset of a separable Hilbert space, which serves as a basis for
the analysis of GPCA and is also of interest in its own right. We provide
illustrative examples on simple statistical models, to show the benefits of
this approach for data analysis. The method is also applied to a real dataset
of population pyramids.
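As a rough illustration of how the isometry with quantile functions can be exploited, the sketch below runs an ordinary (linear) PCA on empirical quantile functions; it omits the convexity constraint that characterizes genuine geodesic PCA, and the synthetic Gaussian data and function names are assumptions of this illustration rather than the authors' procedure.

```python
import numpy as np

def quantile_pca(samples, n_grid=200, n_components=2):
    """Represent each one-dimensional sample by its empirical quantile
    function on a common grid of [0, 1], then run an ordinary PCA on these
    functions (the Wasserstein space on the line embeds isometrically into
    L^2([0, 1]) via quantile functions)."""
    u = (np.arange(n_grid) + 0.5) / n_grid
    Q = np.array([np.quantile(x, u) for x in samples])   # quantile functions
    barycenter = Q.mean(axis=0)       # quantile function of the Wasserstein barycenter
    centered = Q - barycenter
    # PCA of the centered quantile functions via SVD
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]    # principal directions in L^2([0, 1])
    scores = centered @ components.T  # coordinates of each distribution
    return barycenter, components, scores

# Illustrative data: Gaussian samples with varying means and scales (assumed)
rng = np.random.default_rng(1)
samples = [rng.normal(loc=m, scale=s, size=500)
           for m, s in zip(rng.normal(0.0, 1.0, 20), rng.uniform(0.5, 2.0, 20))]
barycenter, components, scores = quantile_pca(samples)
print(scores[:5])
```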