
    Classification with the nearest neighbor rule in general finite dimensional spaces: necessary and sufficient conditions

    Given an $n$-sample of random vectors $(X_i,Y_i)_{1 \leq i \leq n}$ whose joint law is unknown, the long-standing problem of supervised classification aims to optimally predict the label $Y$ of a given new observation $X$. In this context, the nearest neighbor rule is a popular, flexible and intuitive method in non-parametric situations. Although this algorithm is commonly used in the machine learning and statistics communities, less is known about its prediction ability in general finite dimensional spaces, especially when the support of the density of the observations is $\mathbb{R}^d$. This paper is devoted to the study of the statistical properties of the nearest neighbor rule in various situations. In particular, attention is paid to the marginal law of $X$, as well as the smoothness and margin properties of the regression function $\eta(X) = \mathbb{E}[Y | X]$. We identify two necessary and sufficient conditions to obtain uniform consistency rates of classification and to derive sharp estimates in the case of the nearest neighbor rule. Some numerical experiments are proposed at the end of the paper to help illustrate the discussion. Comment: 53 pages, 3 figures
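For readers unfamiliar with the nearest neighbor rule discussed in this abstract, the following is a minimal sketch of its simplest form (one neighbor, Euclidean distance). The function name `nearest_neighbor_predict` and the toy data are illustrative assumptions, not taken from the paper, which studies the rule theoretically rather than prescribing an implementation.

```python
import math

def nearest_neighbor_predict(train_points, train_labels, x_new):
    """Return the label of the training point closest to x_new.

    Hypothetical helper: 1-nearest-neighbor rule with Euclidean distance.
    Ties are broken in favor of the earliest training point.
    """
    if not train_points or len(train_points) != len(train_labels):
        raise ValueError("need a non-empty sample with one label per point")
    # Index of the training point X_i that minimizes ||X_i - x_new||
    best = min(range(len(train_points)),
               key=lambda i: math.dist(train_points[i], x_new))
    return train_labels[best]

# Toy sample (X_i, Y_i) in R^2 with binary labels (made-up data)
train_points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
train_labels = [0, 1, 0]

label_a = nearest_neighbor_predict(train_points, train_labels, (0.9, 0.8))
label_b = nearest_neighbor_predict(train_points, train_labels, (0.1, 0.2))
```

Here `label_a` comes from the neighbor `(1.0, 1.0)` and `label_b` from `(0.0, 0.0)`. Using more than one neighbor with a majority vote is a common variant and is not shown.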

    Intensity estimation of non-homogeneous Poisson processes from shifted trajectories

    In this paper, we consider the problem of estimating nonparametrically a mean pattern intensity λ from the observation of n independent and non-homogeneous Poisson processes $N_1, \ldots, N_n$ on the interval [0,1]. This problem arises when data (counts) are collected independently from n individuals according to similar Poisson processes. We show that estimating this intensity is a deconvolution problem for which the density of the random shifts plays the role of the convolution operator. In an asymptotic setting where the number n of observed trajectories tends to infinity, we derive upper and lower bounds for the minimax quadratic risk over Besov balls. Non-linear thresholding in a Meyer wavelet basis is used to derive an adaptive estimator of the intensity. The proposed estimator is shown to achieve a near-minimax rate of convergence. This rate depends both on the smoothness of the intensity function and the density of the random shifts, which makes a connection between the classical deconvolution problem in nonparametric statistics and the estimation of a mean intensity from the observations of independent Poisson processes.

    Intensity estimation of non-homogeneous Poisson processes from shifted trajectories

    This paper considers the problem of adaptive estimation of a non-homogeneous intensity function from the observation of n independent Poisson processes having a common intensity that is randomly shifted for each observed trajectory. We show that estimating this intensity is a deconvolution problem for which the density of the random shifts plays the role of the convolution operator. In an asymptotic setting where the number n of observed trajectories tends to infinity, we derive upper and lower bounds for the minimax quadratic risk over Besov balls. Non-linear thresholding in a Meyer wavelet basis is used to derive an adaptive estimator of the intensity. The proposed estimator is shown to achieve a near-minimax rate of convergence. This rate depends both on the smoothness of the intensity function and the density of the random shifts, which makes a connection between the classical deconvolution problem in nonparametric statistics and the estimation of a mean intensity from the observations of independent Poisson processes.

    Martingales and Profile of Binary Search Trees

    We are interested in the asymptotic analysis of the binary search tree (BST) under the random permutation model. Via an embedding in a continuous-time model, we obtain new results, in particular the asymptotic behavior of the profile.

    Geodesic PCA in the Wasserstein space

    We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures on the line, with finite second moment, endowed with the Wasserstein metric. We discuss the advantages of this approach over a standard functional PCA of probability densities in the Hilbert space of square-integrable functions. We establish the consistency of the method by showing that the empirical GPCA converges to its population counterpart as the sample size tends to infinity. A key property in the study of GPCA is the isometry between the Wasserstein space and a closed convex subset of the space of square-integrable functions, with respect to an appropriate measure. Therefore, we consider the general problem of PCA in a closed convex subset of a separable Hilbert space, which serves as a basis for the analysis of GPCA and is also of interest in its own right. We provide illustrative examples on simple statistical models to show the benefits of this approach for data analysis. The method is also applied to a real dataset of population pyramids.
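The isometry mentioned in this abstract can be illustrated with a well-known fact about measures on the line: the 2-Wasserstein distance between two such measures equals the L2 distance between their quantile functions on (0,1). The sketch below applies this to two empirical measures with the same number of equally weighted samples, where the quantile functions are step functions given by the sorted samples. The function name `wasserstein2_equal_size` and the data are assumptions for illustration; the abstract does not specify which embedding or reference measure the paper uses.

```python
import math

def wasserstein2_equal_size(xs, ys):
    """2-Wasserstein distance between two empirical measures on the line.

    Hypothetical helper. Assumes both samples have the same size and
    uniform weights, so that W_2^2 is the mean squared difference of
    the sorted samples (the empirical quantile functions).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    qx, qy = sorted(xs), sorted(ys)
    mean_sq = sum((a - b) ** 2 for a, b in zip(qx, qy)) / len(qx)
    return math.sqrt(mean_sq)

# Made-up data: ys is xs translated by 1, listed in a different order
xs = [0.0, 1.0, 2.0]
ys = [3.0, 1.0, 2.0]
distance = wasserstein2_equal_size(xs, ys)
```

Because `ys` is a translate of `xs` by 1, every sorted pair differs by exactly 1, so the computed distance is 1. Samples of unequal size or with non-uniform weights would need a proper quantile-function integration, which is not covered here.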