
Operator norm convergence of spectral clustering on level sets

Author: Pelletier Bruno
Pudlo Pierre
Publication date: 11/02/2010

Following Hartigan, a cluster is defined as a connected component of the t-level set of the underlying density, i.e., the set of points at which the density is greater than t. A clustering algorithm that combines a density estimate with spectral clustering techniques is proposed. The algorithm has two steps. First, a nonparametric density estimate is used to extract the data points at which the estimated density exceeds t. Next, the extracted points are clustered based on the eigenvectors of a graph Laplacian matrix. Under mild assumptions, we prove the almost sure convergence in operator norm of the empirical graph Laplacian operator associated with the algorithm. Furthermore, we describe the typical behavior of the representation of the dataset in the feature space, which establishes the strong consistency of the proposed algorithm.

arXiv.org e-Print Archive

HAL Descartes

HAL-Rennes 1
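
The two-step procedure described in this abstract can be sketched in Python as follows. This is a minimal illustration under assumptions the abstract does not fix: a Gaussian kernel density estimate with bandwidth `h`, Gaussian affinities with scale `sigma` between retained points, the unnormalized Laplacian L = D - W, and k-means on the rows of the eigenvectors of the k smallest eigenvalues. The function names (`kde`, `jacobi_eigh`, `kmeans`, `level_set_spectral_clustering`) are hypothetical, and the eigensolver is a plain Jacobi iteration chosen only so the sketch needs nothing beyond the standard library.

```python
import math


def kde(points, x, h):
    """Gaussian kernel density estimate at x; points are tuples in R^d."""
    d = len(x)
    norm = len(points) * (2 * math.pi * h * h) ** (d / 2)
    return sum(math.exp(-math.dist(p, x) ** 2 / (2 * h * h)) for p in points) / norm


def jacobi_eigh(a, tol=1e-20, max_sweeps=50):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V), where column j of V is the j-th eigenvector."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def kmeans(rows, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(r, centers[j])) for r in rows]
        new = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centers[j])
        if new == centers:
            break
        centers = new
    return labels


def level_set_spectral_clustering(points, t, k, h=0.5, sigma=0.5):
    # Step 1: keep the points whose estimated density exceeds the level t.
    kept = [p for p in points if kde(points, p, h) > t]
    n = len(kept)
    # Step 2: Gaussian affinities W, degrees D, unnormalised Laplacian L = D - W.
    w = [[0.0 if i == j else math.exp(-math.dist(kept[i], kept[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    lap = [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigh(lap)
    # Embed each kept point as its row in the eigenvectors of the k smallest eigenvalues.
    order = sorted(range(n), key=lambda j: vals[j])[:k]
    rows = [[vecs[i][j] for j in order] for i in range(n)]
    return kept, kmeans(rows, k)


cluster_a = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (0.1, 0.1)]
cluster_b = [(x + 5.0, y + 5.0) for x, y in cluster_a]
points = cluster_a + cluster_b + [(2.5, -3.0)]  # last point: isolated, low density
kept, labels = level_set_spectral_clustering(points, t=0.1, k=2)
print(len(kept), labels)
```

In this toy example the isolated point falls below the level t and is discarded before the graph is built, so only the two dense groups reach the spectral step; this mirrors the role of thresholding in the abstract, though the paper's choice of kernel and graph weights may differ.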

On the convergence of maximum variance unfolding

Author: Arias-Castro Ery
Pelletier Bruno
Publication date: 01/01/2013

Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent.

arXiv.org e-Print Archive

CiteSeerX

HAL-Rennes 1

Nonparametric regression on closed Riemannian manifolds

Author: Pelletier Bruno
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2006

The nonparametric estimation of the regression function of a real-valued random variable Y on a random object X valued in a closed Riemannian manifold M is considered. A regression estimator which generalizes kernel regression estimators on Euclidean sample spaces is introduced. Under classical assumptions on the kernel and the bandwidth sequence, the asymptotic bias and variance are obtained, and the estimator is shown to converge at the same L2-rate as kernel regression estimators on Euclidean spaces.

HAL-Rennes 1
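
For intuition, here is a hedged Python sketch on the simplest closed Riemannian manifold, the unit circle S^1 parametrized by angle. It does not reproduce the paper's general estimator; it assumes a Gaussian-shaped kernel applied to the geodesic (arc-length) distance scaled by a bandwidth `h`, which on the flat circle gives a Nadaraya-Watson average. The names `geodesic_dist` and `circle_kernel_regression` are hypothetical.

```python
import math


def geodesic_dist(a, b):
    """Arc-length distance between two angles (radians) on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def circle_kernel_regression(xs, ys, x, h):
    """Nadaraya-Watson estimate of E[Y | X = x] for X on S^1, with a
    Gaussian-shaped kernel of the geodesic distance scaled by bandwidth h."""
    weights = [math.exp(-0.5 * (geodesic_dist(x, xi) / h) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations within reach of the kernel; increase h")
    return sum(w * y for w, y in zip(weights, ys)) / total


angles = [2 * math.pi * i / 24 for i in range(24)]
responses = [math.cos(a) for a in angles]
print(circle_kernel_regression(angles, responses, 0.0, h=0.3))
```

The design point is the distance: using the geodesic distance rather than the raw angle difference lets observations just below 2π contribute to the estimate at 0, which a Euclidean kernel on the angle coordinate would miss.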

Remember the Curse of Dimensionality: The Case of Goodness-of-Fit Testing in Arbitrary Dimension

Author: Arias-Castro Ery
Pelletier Bruno
Saligrama Venkatesh
Publication date: 01/01/2018

Despite a substantial literature, spanning decades, on nonparametric two-sample goodness-of-fit testing in arbitrary dimension, that literature makes no mention of any curse of dimensionality. Only more recently have Ramdas et al. (2015) discussed this issue in the context of kernel methods, showing that their performance degrades with the dimension even when the underlying distributions are isotropic Gaussians. We take a minimax perspective and follow in the footsteps of Ingster (1987) to derive the minimax rate in arbitrary dimension when the discrepancy is measured in the L2 metric. That rate is revealed to be nonparametric and to exhibit a prototypical curse of dimensionality. We further extend Ingster's work to show that the chi-squared test achieves the minimax rate. Moreover, we show that the test can be made to work when the distributions have support of low intrinsic dimension. Finally, inspired by Ingster (2000), we consider a multiscale version of the chi-squared test which can adapt to unknown smoothness and/or unknown intrinsic dimensionality without much loss in power.

Comment: This version comes after the publication of the paper in the Journal of Nonparametric Statistics. The main change is to cite the work of Ramdas et al. Some very minor typos were also corrected.

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

HAL-Rennes 1
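
A binned chi-squared two-sample statistic can be sketched in Python as below, to make the dimension dependence concrete. Assumptions not taken from the paper: both samples live in [0, 1]^d, the partition is the grid of m^d congruent cubes, the statistic is the classical Pearson homogeneity statistic for the resulting 2 x m^d table, and calibration is by label permutation. The multiscale variant is not reproduced, and all function names are hypothetical.

```python
import random
from collections import Counter


def bin_index(point, m):
    """Index of the cube (among m**d congruent cubes of [0, 1]^d) containing point."""
    return tuple(min(int(c * m), m - 1) for c in point)


def chi2_two_sample(xs, ys, m):
    """Pearson chi-squared homogeneity statistic on the m**d-cell partition."""
    n1, n2 = len(xs), len(ys)
    cx = Counter(bin_index(p, m) for p in xs)
    cy = Counter(bin_index(p, m) for p in ys)
    stat = 0.0
    for cell in set(cx) | set(cy):  # empty cells contribute nothing
        a, b = cx[cell], cy[cell]
        stat += (n2 * a - n1 * b) ** 2 / (n1 * n2 * (a + b))
    return stat


def permutation_pvalue(xs, ys, m, n_perm=200, seed=0):
    """Calibrate the statistic by permuting sample labels (no chi-squared table needed)."""
    rng = random.Random(seed)
    observed = chi2_two_sample(xs, ys, m)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if chi2_two_sample(pooled[:len(xs)], pooled[len(xs):], m) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


rng = random.Random(42)
d, m = 3, 4
xs = [tuple(rng.random() for _ in range(d)) for _ in range(100)]
ys = [tuple(rng.random() ** 2 for _ in range(d)) for _ in range(100)]  # skewed towards 0
print(m ** d, chi2_two_sample(xs, ys, m), permutation_pvalue(xs, ys, m))
```

The cell count m^d is where the dimension bites: with d = 10 and m = 4 the partition already has 4**10 = 1,048,576 cells, so almost every cell is empty for any realistic sample size.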

Inference in phi-families of distributions

Author: Pelletier Bruno
Publication venue: HAL CCSD
Publication date: 01/01/2011

This paper is devoted to the study of the parametric family of multivariate distributions obtained by minimizing a convex functional under linear constraints. Under certain assumptions on the convex functional, it is established that this family admits an affine parametrization, and parametric estimation from an i.i.d. random sample is studied. It is also shown that the members of this family are the limit distributions arising in inference based on empirical likelihood. As a consequence, given a probability measure μ0 and an i.i.d. random sample drawn from μ0, nonparametric confidence domains on the generalized moments of μ0 are obtained.

HAL-Rennes 1

The Normalized Graph Cut and Cheeger Constant: from Discrete to Continuous

Author: Arias-Castro Ery
Pelletier Bruno
Pudlo Pierre
Publication venue: 'Applied Probability Trust'
Publication date: 09/06/2011

Let M be a bounded domain of a Euclidean space with smooth boundary. We relate the Cheeger constant of M and the conductance of a neighborhood graph defined on a random sample from M. By restricting the minimization defining the latter to a particular class of subsets, we obtain consistency (after normalization) as the sample size increases, and show that any minimizing sequence of subsets has a subsequence converging to a Cheeger set of M.

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL-Rennes 1
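
The discrete side of this relation can be sketched in Python for a tiny sample. Assumptions for illustration only: an unweighted r-neighborhood graph, the conductance of a vertex subset S taken as cut(S, S^c) / min(vol S, vol S^c) with vol summing vertex degrees, and a brute-force minimum over all nonempty proper subsets instead of the restricted class of subsets and the normalization used in the paper. All function names are hypothetical.

```python
import math
from itertools import combinations


def neighborhood_graph(points, r):
    """Edges (i, j), i < j, of the r-neighbourhood graph: ||X_i - X_j|| <= r."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= r}


def conductance_of(subset, n, edges):
    """cut(S, S^c) / min(vol S, vol S^c), where vol sums vertex degrees."""
    s = set(subset)
    degree = [0] * n
    cut = 0
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if (i in s) != (j in s):  # edge crosses the cut
            cut += 1
    vol_s = sum(degree[i] for i in s)
    vol_c = sum(degree) - vol_s
    if min(vol_s, vol_c) == 0:
        return math.inf
    return cut / min(vol_s, vol_c)


def graph_conductance(points, r):
    """Brute-force minimum over all non-empty proper subsets (tiny n only)."""
    n = len(points)
    edges = neighborhood_graph(points, r)
    best = (math.inf, ())
    for size in range(1, n):
        for subset in combinations(range(n), size):
            best = min(best, (conductance_of(subset, n, edges), subset))
    return best


line = [(0.0,), (1.0,), (2.0,), (3.0,)]
print(graph_conductance(line, 1.0))
```

On the four-point path the minimizing subset splits the path in the middle: one edge crosses the cut and each side has volume 3, giving conductance 1/3. The exponential cost of this brute force is one reason to restrict the class of subsets.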

Bayesian Methodology for Ocean Color Remote Sensing

Author: Frouin Robert
Pelletier Bruno
Publication venue: HAL CCSD
Publication date: 13/05/2013

66 pages

The inverse ocean color problem, i.e., the retrieval of marine reflectance from top-of-atmosphere (TOA) reflectance, is examined in a Bayesian context. The solution is expressed as a probability distribution that measures the likelihood of encountering specific values of the marine reflectance given the observed TOA reflectance. This conditional distribution, the posterior distribution, allows the construction of reliable multi-dimensional confidence domains of the retrieved marine reflectance. The expectation and covariance of the posterior distribution are computed, which gives for each pixel an estimate of the marine reflectance and a measure of its uncertainty. Situations for which forward model and observation are incompatible are also identified. Prior distributions of the forward model parameters that are suitable for use at the global scale, as well as a noise model, are determined. Partition-based models are defined and implemented for SeaWiFS, to approximate numerically the expectation and covariance. The ill-posed nature of the inverse problem is illustrated, indicating that a large set of ocean and atmospheric states, or pre-images, may correspond to very close values of the satellite signal. Theoretical performance is good globally, i.e., on average over all the geometric and geophysical situations considered, with negligible biases and standard deviation decreasing from 0.004 at 412 nm to 0.001 at 670 nm. Errors are smaller for geometries that avoid Sun glint and minimize air mass and aerosol influence, and for small aerosol optical thickness and maritime aerosols. The estimated uncertainty is consistent with the inversion error. The theoretical concepts and inverse models are applied to actual SeaWiFS imagery, and comparisons are made with estimates from the SeaDAS standard atmospheric correction algorithm and in situ measurements.
The Bayesian and SeaDAS marine reflectance fields exhibit resemblance in patterns of variability, but the Bayesian imagery is less noisy and characterized by different spatial de-correlation scales, with more realistic values in the presence of absorbing aerosols. Experimental errors obtained from match-up data are similar to the theoretical errors determined from simulated data. Regionalization of the inverse models is a natural development to improve retrieval accuracy, for example by including explicit knowledge of the space and time variability of atmospheric variables.

HAL-Rennes 1

Maximum entropy solution to ill-posed inverse problems with approximately known operator

Author: Loubes Jean-Michel
Pelletier Bruno
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

We consider the linear inverse problem of reconstructing an unknown finite measure μ from a noisy observation of a generalized moment of μ defined as the integral of a continuous and bounded operator Φ with respect to μ. Motivated by various applications, we focus on the case where the operator Φ is unknown; instead, only an approximation Φm to it is available. An approximate maximum entropy solution to the inverse problem is introduced in the form of a minimizer of a convex functional subject to a sequence of convex constraints. Under several assumptions on the convex functional, the convergence of the approximate solution is established.

Elsevier - Publisher Connector

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

HAL-Rennes 1

On the estimation of the support of a density (Sur l'estimation du support d'une densité)

Author: Biau Gérard
Cadre Benoît
Pelletier Bruno
Publication venue: HAL CCSD
Publication date: 01/01/2010

Given an unknown multivariate probability density $f$ with compact support and an i.i.d. $n$-sample drawn from $f$, we study the estimator of the support of $f$ defined as the union of the balls of radius $r_n$ centered at the observations. To measure the quality of the estimate, we use a general criterion based on the volume of the symmetric difference. Under a few mild assumptions, and using tools from Riemannian geometry, we establish the exact rates of convergence of the support estimator and examine the statistical consequences of these results.

HAL-Rennes 1
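
The union-of-balls estimator and its symmetric-difference error can be sketched in Python. Assumptions not stated in the abstract: the true support is known for the experiment (here the unit square), and the Lebesgue measure of the symmetric difference is approximated by Monte Carlo inside an axis-aligned box assumed to contain both sets. The sample size, radii, and function names are hypothetical.

```python
import math
import random


def in_support_estimate(x, sample, r_n):
    """True if x lies in the union of the closed balls B(X_i, r_n)."""
    return any(math.dist(x, xi) <= r_n for xi in sample)


def symmetric_difference_volume(sample, r_n, in_true_support, box, n_mc=20000, seed=0):
    """Monte Carlo estimate of the Lebesgue measure of S Δ S_n, where box is
    [(lo_1, hi_1), ..., (lo_d, hi_d)] and is assumed to contain both sets."""
    rng = random.Random(seed)
    box_volume = math.prod(hi - lo for lo, hi in box)
    hits = 0
    for _ in range(n_mc):
        x = tuple(rng.uniform(lo, hi) for lo, hi in box)
        if in_true_support(x) != in_support_estimate(x, sample, r_n):
            hits += 1
    return box_volume * hits / n_mc


def unit_square(x):
    """Indicator of the (assumed) true support S = [0, 1]^2."""
    return all(0.0 <= c <= 1.0 for c in x)


rng = random.Random(1)
sample = [(rng.random(), rng.random()) for _ in range(100)]
for r_n in (0.02, 0.05, 0.1):
    err = symmetric_difference_volume(sample, r_n, unit_square, [(-0.1, 1.1), (-0.1, 1.1)], n_mc=5000)
    print(r_n, err)
```

Small radii leave holes between the balls, while large radii spill over the boundary of the support; the exact rates in the paper describe how r_n should shrink with n to balance these two effects.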

Clustering by Estimation of Density Level Sets at a Fixed Probability

Author: Cadre Benoît
Pelletier Bruno
Pudlo Pierre
Publication venue: HAL CCSD
Publication date: 22/06/2009

In density-based clustering methods, the clusters are defined as the connected components of the upper level sets of the underlying density $f$. In this setting, the practitioner fixes a probability $p$ and associates with it a threshold $t^{(p)}$ such that the level set $\{f \geq t^{(p)}\}$ has probability $p$ with respect to the distribution induced by $f$. This paper is devoted to the estimation of the threshold $t^{(p)}$, of the level set $\{f \geq t^{(p)}\}$, and of the number $k(t^{(p)})$ of connected components of this level set. Given a nonparametric density estimate $\hat f_n$ of $f$ based on an i.i.d. $n$-sample drawn from $f$, we first propose a computationally simple estimate $t_n^{(p)}$ of $t^{(p)}$ and establish a concentration inequality for this estimate. Next, we consider the plug-in level set estimate $\{\hat f_n \geq t_n^{(p)}\}$ and establish the exact convergence rate of the Lebesgue measure of the symmetric difference between $\{f \geq t^{(p)}\}$ and $\{\hat f_n \geq t_n^{(p)}\}$. Finally, we propose a computationally simple graph-based estimate of $k(t^{(p)})$, which is shown to be consistent. Thus, the methodology yields a complete procedure for analyzing the grouping structure of the data as $p$ varies over $(0,1)$.

HAL-Rennes 1
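
The three estimation steps can be sketched in Python under stated assumptions. We assume (this is not spelled out in the abstract) that $t_n^{(p)}$ is taken as an empirical $(1-p)$-quantile of the values $\hat f_n(X_1), \dots, \hat f_n(X_n)$, so that roughly a fraction $p$ of the observations fall in the estimated level set, and that $k(t^{(p)})$ is estimated by the number of connected components of the graph joining retained observations at distance at most a radius `r_n`. The Gaussian kernel, bandwidth `h`, radius `r_n`, and all function names are hypothetical.

```python
import math


def kde(sample, x, h):
    """Gaussian kernel density estimate f_n_hat at x (sample points are tuples)."""
    d = len(x)
    norm = len(sample) * (2 * math.pi * h * h) ** (d / 2)
    return sum(math.exp(-math.dist(x, xi) ** 2 / (2 * h * h)) for xi in sample) / norm


def threshold_estimate(sample, p, h):
    """Empirical (1 - p)-quantile of the values f_n_hat(X_i), 0 < p < 1."""
    values = sorted(kde(sample, xi, h) for xi in sample)
    # About ceil(p * n) observations satisfy f_n_hat(X_i) >= the returned value.
    return values[len(values) - math.ceil(p * len(values))]


def count_components(points, r):
    """Connected components of the graph joining points at distance <= r (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def estimate_clusters(sample, p, h, r_n):
    """Return (t_n, k_hat): threshold estimate and estimated number of clusters."""
    t_n = threshold_estimate(sample, p, h)
    level_set_points = [xi for xi in sample if kde(sample, xi, h) >= t_n]
    return t_n, count_components(level_set_points, r_n)


data = [(x,) for x in (0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3, 10.0, -5.0)]
print(estimate_clusters(data, p=0.8, h=0.3, r_n=0.5))
```

Repeating the call over a grid of values of p traces how the estimated number of groups changes with the probability content, which is the kind of analysis the abstract has in mind when p varies over (0,1).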