Classification with the nearest neighbor rule in general finite dimensional spaces: necessary and sufficient conditions
Given an $n$-sample of random vectors $(X_i, Y_i)$ whose joint law is unknown, the long-standing problem of supervised classification aims to \textit{optimally} predict the label $Y$ of a given new observation $X$. In this context, the nearest neighbor rule is a popular, flexible and intuitive method in non-parametric situations.
Although this algorithm is commonly used in the machine learning and statistics communities, little is known about its prediction ability in general finite dimensional spaces, especially when the support of the density of the observations is $\mathbb{R}^d$. This paper is devoted to the study of the statistical properties of the nearest neighbor rule in various situations. In particular, attention is paid to the marginal law of $X$, as well as the smoothness and margin properties of the \textit{regression function} $\eta$. We identify two necessary and sufficient conditions to obtain uniform consistency rates of classification and to derive sharp
estimates in the case of the nearest neighbor rule. Some numerical experiments are proposed at the end of the paper to help illustrate the discussion.
Comment: 53 pages, 3 figures
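To make the rule concrete, the following is a minimal sketch of 1-nearest-neighbor classification; the two-class Gaussian setup, sample sizes and function name are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def nn_classify(X_train, y_train, X_new):
    """Predict labels for X_new with the 1-nearest-neighbor rule: each new
    point receives the label of its closest training point in Euclidean
    distance."""
    # Pairwise squared Euclidean distances, shape (n_new, n_train)
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)      # index of the closest training point
    return y_train[nearest]

# Illustrative data: two Gaussian classes in R^2 (an assumed setup)
rng = np.random.default_rng(0)
n = 200
X_train = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)),
                     rng.normal(+1.0, 1.0, size=(n, 2))])
y_train = np.repeat([0, 1], n)
X_test = rng.normal(0.0, 1.0, size=(10, 2))
print(nn_classify(X_train, y_train, X_test))
```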
Intensity estimation of non-homogeneous Poisson processes from shifted trajectories
In this paper, we consider the problem of estimating nonparametrically a mean pattern intensity λ from the observation of n independent and non-homogeneous Poisson processes N1,…,Nn on the interval [0,1]. This problem arises when data (counts) are collected independently from n individuals according to similar Poisson processes. We show that estimating this intensity is a deconvolution problem for which the density of the random shifts plays the role of the convolution operator. In an asymptotic setting where the number n of observed trajectories tends to infinity, we derive upper and lower bounds for the minimax quadratic risk over Besov balls. Non-linear thresholding in a Meyer wavelet basis is used to derive an adaptive estimator of the intensity. The proposed estimator is shown to achieve a near-minimax rate of convergence. This rate depends both on the smoothness of the intensity function and the density of the random shifts, which makes a connection between the classical deconvolution problem in nonparametric statistics and the estimation of a mean intensity from the observations of independent Poisson processes.
Intensity estimation of non-homogeneous Poisson processes from shifted trajectories
This paper considers the problem of adaptive estimation of a non-homogeneous
intensity function from the observation of n independent Poisson processes
having a common intensity that is randomly shifted for each observed
trajectory. We show that estimating this intensity is a deconvolution problem
for which the density of the random shifts plays the role of the convolution
operator. In an asymptotic setting where the number n of observed trajectories
tends to infinity, we derive upper and lower bounds for the minimax quadratic
risk over Besov balls. Non-linear thresholding in a Meyer wavelet basis is used
to derive an adaptive estimator of the intensity. The proposed estimator is
shown to achieve a near-minimax rate of convergence. This rate depends both on
the smoothness of the intensity function and the density of the random shifts,
which makes a connection between the classical deconvolution problem in
nonparametric statistics and the estimation of a mean intensity from the
observations of independent Poisson processes.
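To make the deconvolution step concrete, here is a simplified sketch that works in a trigonometric basis instead of the Meyer wavelet basis used in the paper; the function names (deconvolve_intensity, shift_char), the periodic treatment of the shifts, the hard-thresholding rule and the simulated example are assumptions of this illustration, not the authors' estimator.

```python
import numpy as np

def deconvolve_intensity(trajectories, shift_char, n_freq=15, threshold=None):
    """Simplified Fourier-domain sketch of the deconvolution idea.
    trajectories : list of arrays, each holding the event times of one
                   observed Poisson process on [0, 1] (shifts are treated
                   periodically here, an assumption of this sketch).
    shift_char   : k -> E[exp(-2i*pi*k*tau)], the Fourier coefficient of
                   the shift density, assumed known."""
    n = len(trajectories)
    ks = np.arange(-n_freq, n_freq + 1)
    # Empirical Fourier coefficients of the mean measure (1/n) sum_j dN_j,
    # which estimate the coefficients of (intensity * shift density).
    theta_hat = np.array([sum(np.exp(-2j * np.pi * k * t).sum()
                              for t in trajectories) / n for k in ks])
    # Deconvolution: divide by the coefficients of the shift density.
    coef = theta_hat / np.array([shift_char(k) for k in ks])
    if threshold is not None:
        coef[np.abs(coef) < threshold] = 0.0   # keep only large coefficients
    # Reconstruct the intensity on a grid of [0, 1].
    grid = np.linspace(0.0, 1.0, 200)
    return grid, np.real(coef @ np.exp(2j * np.pi * np.outer(ks, grid)))

# Illustrative simulation: shifts tau ~ N(0, 0.05^2), so
# E[exp(-2i*pi*k*tau)] = exp(-2*pi^2*k^2*0.05^2); the true intensity is assumed.
rng = np.random.default_rng(0)
lam = lambda t: 50.0 * (1.0 + np.cos(2.0 * np.pi * t))
trajectories = []
for _ in range(500):
    shift = rng.normal(0.0, 0.05)
    t = rng.uniform(0.0, 1.0, rng.poisson(100))                       # homogeneous, rate 100
    keep = rng.uniform(0.0, 100.0, t.size) < lam((t - shift) % 1.0)   # thinning
    trajectories.append(t[keep])
grid, lam_hat = deconvolve_intensity(
    trajectories, lambda k: np.exp(-2.0 * np.pi**2 * k**2 * 0.05**2), n_freq=8)
```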
Martingales and Profile of Binary Search Trees
We are interested in the asymptotic analysis of the binary search tree (BST)
under the random permutation model. Via an embedding in a continuous time
model, we obtain new results, in particular on the asymptotic behavior of the
profile.
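As a concrete illustration of the quantity studied here, the sketch below simulates the random permutation model by inserting a uniformly shuffled sequence of keys into a binary search tree and computing its profile, i.e. the number of nodes at each depth; the function name and the choice of n are illustrative.

```python
import random
from collections import Counter

def bst_profile(keys):
    """Insert `keys` (in the given order) into a binary search tree and
    return its profile: a Counter mapping depth -> number of nodes."""
    root = None                      # each node is [key, left, right]
    profile = Counter()
    for key in keys:
        if root is None:
            root = [key, None, None]
            profile[0] += 1
            continue
        depth, node = 0, root
        while True:
            child = 1 if key < node[0] else 2
            if node[child] is None:
                node[child] = [key, None, None]
                profile[depth + 1] += 1
                break
            node, depth = node[child], depth + 1
    return profile

# Random permutation model: the keys 1..n arrive in uniformly random order.
n = 10_000
keys = list(range(1, n + 1))
random.shuffle(keys)
print(sorted(bst_profile(keys).items()))
```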
Geodesic PCA in the Wasserstein space
We introduce the method of Geodesic Principal Component Analysis (GPCA) on
the space of probability measures on the line, with finite second moment,
endowed with the Wasserstein metric. We discuss the advantages of this
approach, over a standard functional PCA of probability densities in the
Hilbert space of square-integrable functions. We establish the consistency of
the method by showing that the empirical GPCA converges to its population
counterpart, as the sample size tends to infinity. A key property in the study
of GPCA is the isometry between the Wasserstein space and a closed convex
subset of the space of square-integrable functions, with respect to an
appropriate measure. Therefore, we consider the general problem of PCA in a
closed convex subset of a separable Hilbert space, which serves as a basis for
the analysis of GPCA and is also of interest in its own right. We provide
illustrative examples on simple statistical models, to show the benefits of
this approach for data analysis. The method is also applied to a real dataset
of population pyramids.
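As a rough illustration of how the isometry with quantile functions can be exploited, the sketch below runs an ordinary (linear) PCA on empirical quantile functions; it omits the convexity constraint that characterizes genuine geodesic PCA, and the synthetic Gaussian data and function names are assumptions of this illustration rather than the authors' procedure.

```python
import numpy as np

def quantile_pca(samples, n_grid=200, n_components=2):
    """Represent each one-dimensional sample by its empirical quantile
    function on a common grid of [0, 1], then run an ordinary PCA on these
    functions (the Wasserstein space on the line embeds isometrically into
    L^2([0, 1]) via quantile functions)."""
    u = (np.arange(n_grid) + 0.5) / n_grid
    Q = np.array([np.quantile(x, u) for x in samples])   # quantile functions
    barycenter = Q.mean(axis=0)       # quantile function of the Wasserstein barycenter
    centered = Q - barycenter
    # PCA of the centered quantile functions via SVD
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]    # principal directions in L^2([0, 1])
    scores = centered @ components.T  # coordinates of each distribution
    return barycenter, components, scores

# Illustrative data: Gaussian samples with varying means and scales (assumed)
rng = np.random.default_rng(1)
samples = [rng.normal(loc=m, scale=s, size=500)
           for m, s in zip(rng.normal(0.0, 1.0, 20), rng.uniform(0.5, 2.0, 20))]
barycenter, components, scores = quantile_pca(samples)
print(scores[:5])
```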