480 research outputs found

    Operator norm convergence of spectral clustering on level sets

    Following Hartigan, a cluster is defined as a connected component of the t-level set of the underlying density, i.e., the set of points for which the density is greater than t. A clustering algorithm that combines a density estimate with spectral clustering techniques is proposed. Our algorithm is composed of two steps. First, a nonparametric density estimate is used to extract the data points for which the estimated density takes a value greater than t. Next, the extracted points are clustered based on the eigenvectors of a graph Laplacian matrix. Under mild assumptions, we prove the almost sure convergence in operator norm of the empirical graph Laplacian operator associated with the algorithm. Furthermore, we give the typical behavior of the representation of the dataset into the feature space, which establishes the strong consistency of our proposed algorithm.
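The two steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel density estimate, the eps-neighborhood graph with 0/1 weights, the symmetric normalization, and the small k-means routine with farthest-point initialization are all assumptions chosen for brevity, as are the names `kde_at_sample`, `kmeans_farthest_init`, and `level_set_spectral_clustering`.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    return np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)

def kde_at_sample(X, h):
    """Gaussian kernel density estimate (bandwidth h) evaluated at the sample points."""
    n, d = X.shape
    K = np.exp(-pairwise_sq_dists(X, X) / (2.0 * h ** 2))
    return K.sum(axis=1) / (n * (2.0 * np.pi * h ** 2) ** (d / 2.0))

def kmeans_farthest_init(V, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [V[0]]
    for _ in range(1, k):
        d2 = pairwise_sq_dists(V, np.array(centers)).min(axis=1)
        centers.append(V[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = pairwise_sq_dists(V, centers).argmin(axis=1)
        new_centers = np.array([
            V[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def level_set_spectral_clustering(X, t, h, eps, k):
    """Return (keep, labels): mask of extracted points and their cluster labels."""
    # Step 1: keep the points whose estimated density exceeds the level t.
    keep = kde_at_sample(X, h) > t
    Y = X[keep]
    # Step 2: eps-neighborhood graph on the extracted points (self-loops included,
    # so every degree is at least 1), then a normalized similarity matrix.
    W = (pairwise_sq_dists(Y, Y) <= eps ** 2).astype(float)
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))
    # Top-k eigenvectors of S, rescaled so that rows are constant on each
    # connected component of the graph when the graph has exactly k components.
    _, U = np.linalg.eigh(S)
    V = U[:, -k:] / np.sqrt(deg)[:, None]
    return keep, kmeans_farthest_init(V, k)
```

In this sketch the parameters `t`, `h`, `eps`, and `k` are all supplied by the caller; the abstract does not say how they should be tuned in practice.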

    On the convergence of maximum variance unfolding

    Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent.

    Nonparametric regression on closed Riemannian manifolds

    The nonparametric estimation of the regression function of a real-valued random variable Y on a random object X valued in a closed Riemannian manifold M is considered. A regression estimator which generalizes kernel regression estimators on Euclidean sample spaces is introduced. Under classical assumptions on the kernel and the bandwidth sequence, the asymptotic bias and variance are obtained, and the estimator is shown to converge at the same L2-rate as kernel regression estimators on Euclidean spaces.

    Remember the Curse of Dimensionality: The Case of Goodness-of-Fit Testing in Arbitrary Dimension

    Despite a substantial literature on nonparametric two-sample goodness-of-fit testing in arbitrary dimensions spanning decades, there is no mention there of any curse of dimensionality. Only more recently Ramdas et al. (2015) have discussed this issue in the context of kernel methods by showing that their performance degrades with the dimension even when the underlying distributions are isotropic Gaussians. We take a minimax perspective and follow in the footsteps of Ingster (1987) to derive the minimax rate in arbitrary dimension when the discrepancy is measured in the L2 metric. That rate is revealed to be nonparametric and exhibit a prototypical curse of dimensionality. We further extend Ingster's work to show that the chi-squared test achieves the minimax rate. Moreover, we show that the test can be made to work when the distributions have support of low intrinsic dimension. Finally, inspired by Ingster (2000), we consider a multiscale version of the chi-square test which can adapt to unknown smoothness and/or unknown intrinsic dimensionality without much loss in power. Comment: This version comes after the publication of the paper in the Journal of Nonparametric Statistics. The main change is to cite the work of Ramdas et al. Some very minor typos were also corrected.
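To make the chi-squared test on a partition concrete, here is a rough two-sample sketch. It is only an illustration under stated assumptions: the bins are a regular grid with `b` cells per axis over the bounding box of the pooled sample, the statistic is the classical two-sample chi-squared statistic for possibly unequal sample sizes, and the permutation calibration is a convenience chosen here, not something taken from the paper. The function names are hypothetical.

```python
import numpy as np

def bin_counts(X, lo, hi, b):
    """Counts of the rows of X in a regular grid with b cells per axis on [lo, hi]."""
    d = X.shape[1]
    idx = np.floor((X - lo) / (hi - lo) * b).astype(int)
    idx = np.clip(idx, 0, b - 1)          # points on the upper boundary go to the last cell
    flat = np.ravel_multi_index(tuple(idx.T), (b,) * d)
    return np.bincount(flat, minlength=b ** d)

def chi2_two_sample_stat(X, Y, b):
    """Two-sample chi-squared statistic computed on the common grid."""
    Z = np.vstack([X, Y])
    lo, hi = Z.min(axis=0), Z.max(axis=0)  # assumes every coordinate is non-constant
    n1 = bin_counts(X, lo, hi, b).astype(float)
    n2 = bin_counts(Y, lo, hi, b).astype(float)
    n, m = len(X), len(Y)
    tot = n1 + n2
    nz = tot > 0                           # empty cells contribute nothing
    num = (np.sqrt(m / n) * n1[nz] - np.sqrt(n / m) * n2[nz]) ** 2
    return float(np.sum(num / tot[nz]))

def permutation_pvalue(X, Y, b, n_perm=199, seed=0):
    """Permutation p-value: share of relabelings with a statistic at least as large."""
    rng = np.random.default_rng(seed)
    obs = chi2_two_sample_stat(X, Y, b)
    Z = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))
        if chi2_two_sample_stat(Z[perm[:n]], Z[perm[n:]], b) >= obs:
            count += 1
    return (1 + count) / (1 + n_perm)
```

The number of cells grows as `b ** d`, which is one informal way to see why the choice of partition interacts with the dimension in this kind of test.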

    Inference in phi-families of distributions

    This paper is devoted to the study of the parametric family of multivariate distributions obtained by minimizing a convex functional under linear constraints. Under certain assumptions on the convex functional, it is established that this family admits an affine parametrization, and parametric estimation from an i.i.d. random sample is studied. It is also shown that the members of this family are the limit distributions arising in inference based on empirical likelihood. As a consequence, given a probability measure μ0 and an i.i.d. random sample drawn from μ0, nonparametric confidence domains on the generalized moments of μ0 are obtained.

    The Normalized Graph Cut and Cheeger Constant: from Discrete to Continuous

    Let M be a bounded domain of a Euclidean space with smooth boundary. We relate the Cheeger constant of M and the conductance of a neighborhood graph defined on a random sample from M. By restricting the minimization defining the latter over a particular class of subsets, we obtain consistency (after normalization) as the sample size increases, and show that any minimizing sequence of subsets has a subsequence converging to a Cheeger set of M.
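For readers unfamiliar with the discrete quantity involved, the conductance of a subset in a neighborhood graph can be computed as in the sketch below. It assumes an unweighted graph joining sample points at distance at most `r`, and the usual definition cut(S, S^c) / min(vol(S), vol(S^c)); the normalization needed for the consistency result and the restricted class of subsets are not reproduced here. The function name is hypothetical.

```python
import numpy as np

def conductance(points, r, in_S):
    """Conductance of the subset flagged by the boolean mask in_S.

    points : (n, d) array of sample points
    r      : connectivity radius of the neighborhood graph
    in_S   : boolean array of length n, True for vertices in S
    """
    points = np.asarray(points, dtype=float)
    in_S = np.asarray(in_S, dtype=bool)
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = (sq <= r ** 2).astype(float)
    np.fill_diagonal(W, 0.0)                 # no self-loops
    deg = W.sum(axis=1)
    cut = W[np.ix_(in_S, ~in_S)].sum()       # number of edges leaving S
    vol_S, vol_Sc = deg[in_S].sum(), deg[~in_S].sum()
    denom = min(vol_S, vol_Sc)
    if denom == 0:
        return float("inf")                  # S or its complement has no incident edge
    return float(cut / denom)
```

For example, four points at positions 0, 1, 2, 3 on a line with `r = 1.0` form a path graph; splitting it in the middle cuts one edge, and each side has volume 3, giving a conductance of 1/3.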

    Bayesian Methodology for Ocean Color Remote Sensing

    66 pages. The inverse ocean color problem, i.e., the retrieval of marine reflectance from top-of-atmosphere (TOA) reflectance, is examined in a Bayesian context. The solution is expressed as a probability distribution that measures the likelihood of encountering specific values of the marine reflectance given the observed TOA reflectance. This conditional distribution, the posterior distribution, allows the construction of reliable multi-dimensional confidence domains of the retrieved marine reflectance. The expectation and covariance of the posterior distribution are computed, which gives for each pixel an estimate of the marine reflectance and a measure of its uncertainty. Situations for which forward model and observation are incompatible are also identified. Prior distributions of the forward model parameters that are suitable for use at the global scale, as well as a noise model, are determined. Partition-based models are defined and implemented for SeaWiFS, to approximate numerically the expectation and covariance. The ill-posed nature of the inverse problem is illustrated, indicating that a large set of ocean and atmospheric states, or pre-images, may correspond to very close values of the satellite signal. Theoretical performance is good globally, i.e., on average over all the geometric and geophysical situations considered, with negligible biases and standard deviation decreasing from 0.004 at 412 nm to 0.001 at 670 nm. Errors are smaller for geometries that avoid Sun glint and minimize air mass and aerosol influence, and for small aerosol optical thickness and maritime aerosols. The estimated uncertainty is consistent with the inversion error. The theoretical concepts and inverse models are applied to actual SeaWiFS imagery, and comparisons are made with estimates from the SeaDAS standard atmospheric correction algorithm and in situ measurements. The Bayesian and SeaDAS marine reflectance fields exhibit resemblance in patterns of variability, but the Bayesian imagery is less noisy and characterized by different spatial de-correlation scales, with more realistic values in the presence of absorbing aerosols. Experimental errors obtained from match-up data are similar to the theoretical errors determined from simulated data. Regionalization of the inverse models is a natural development to improve retrieval accuracy, for example by including explicit knowledge of the space and time variability of atmospheric variables.

    Maximum entropy solution to ill-posed inverse problems with approximately known operator

    We consider the linear inverse problem of reconstructing an unknown finite measure μ from a noisy observation of a generalized moment of μ defined as the integral of a continuous and bounded operator Φ with respect to μ. Motivated by various applications, we focus on the case where the operator Φ is unknown; instead, only an approximation Φm to it is available. An approximate maximum entropy solution to the inverse problem is introduced in the form of a minimizer of a convex functional subject to a sequence of convex constraints. Under several assumptions on the convex functional, the convergence of the approximate solution is established.

    Sur l'estimation du support d'une densité (On the estimation of the support of a density)

    Given an unknown multivariate probability density $f$ with compact support and an i.i.d. $n$-sample drawn from $f$, we study the estimator of the support of $f$ defined as the union of the balls of radius $r_n$ centered at the observations. To measure the quality of the estimation, we use a general criterion based on the volume of the symmetric difference. Under a few mild assumptions, and using tools from Riemannian geometry, we establish the exact rates of convergence of the support estimator and examine the statistical consequences of these results.
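The union-of-balls support estimator and the symmetric-difference criterion can be illustrated with the short sketch below. It is only a numerical illustration: the Monte Carlo approximation of the volume over a user-supplied cube `[lo, hi]^dim` and the function names are assumptions made here, and the true support has to be provided as an indicator function, which is of course only possible in simulations.

```python
import numpy as np

def in_union_of_balls(query, sample, r):
    """True for each query point lying within distance r of at least one observation."""
    sq = np.sum((query[:, None, :] - sample[None, :, :]) ** 2, axis=-1)
    return sq.min(axis=1) <= r ** 2

def symmetric_difference_volume(sample, r, support_indicator, lo, hi, dim,
                                n_mc=100_000, seed=0):
    """Monte Carlo estimate of the volume of (estimator) symmetric-difference (support).

    The cube [lo, hi]^dim is assumed to contain both sets; support_indicator maps an
    (m, dim) array of points to a boolean array of length m.
    """
    rng = np.random.default_rng(seed)
    Q = rng.uniform(lo, hi, size=(n_mc, dim))
    in_est = in_union_of_balls(Q, sample, r)
    in_sup = np.asarray(support_indicator(Q), dtype=bool)
    frac = np.mean(in_est != in_sup)       # share of the cube where the two sets disagree
    return float(frac * (hi - lo) ** dim)
```

The pairwise distance array has size `n_mc` times the sample size, so for large samples the query points would need to be processed in chunks.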

    Clustering by Estimation of Density Level Sets at a Fixed Probability

    In density-based clustering methods, the clusters are defined as the connected components of the upper level sets of the underlying density $f$. In this setting, the practitioner fixes a probability $p$, and associates with it a threshold $t^{(p)}$ such that the level set $\{f\geq t^{(p)}\}$ has a probability $p$ with respect to the distribution induced by $f$. This paper is devoted to the estimation of the threshold $t^{(p)}$, of the level set $\{f\geq t^{(p)}\}$, as well as of the number $k(t^{(p)})$ of connected components of this level set. Given a nonparametric density estimate $\hat f_n$ of $f$ based on an i.i.d. $n$-sample drawn from $f$, we first propose a computationally simple estimate $t_n^{(p)}$ of $t^{(p)}$, and we establish a concentration inequality for this estimate. Next, we consider the plug-in level set estimate $\{\hat f_n\geq t_n^{(p)}\}$, and we establish the exact convergence rate of the Lebesgue measure of the symmetric difference between $\{f \geq t^{(p)}\}$ and $\{\hat f_n\geq t_n^{(p)}\}$. Finally, we propose a computationally simple graph-based estimate of $k(t^{(p)})$, which is shown to be consistent. Thus, the methodology yields a complete procedure for analyzing the grouping structure of the data, as $p$ varies over $(0;1)$.
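One plausible reading of the procedure in this abstract is sketched below. Both pieces are assumptions rather than the paper's exact definitions: the threshold is taken as the order statistic of the estimated density values at the sample points such that a fraction of about `p` of the sample lies at or above it, and the number of connected components is counted in a graph joining retained points at distance at most `r`. The density values are passed in directly so that any density estimator can be used; all function names are hypothetical.

```python
import numpy as np

def threshold_at_probability(dens_at_sample, p):
    """Level t such that roughly a fraction p of the sample has estimated density >= t."""
    dens = np.sort(np.asarray(dens_at_sample, dtype=float))[::-1]  # decreasing order
    k = int(np.ceil(p * len(dens)))
    k = min(max(k, 1), len(dens))
    return dens[k - 1]

def count_components(points, r):
    """Number of connected components of the graph joining points at distance <= r."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    if n == 0:
        return 0
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    adj = sq <= r ** 2
    seen = np.zeros(n, dtype=bool)
    count = 0
    for start in range(n):
        if seen[start]:
            continue
        count += 1                       # new component found, explore it depth-first
        seen[start] = True
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i] & ~seen):
                seen[j] = True
                stack.append(j)
    return count

def level_set_components(points, dens_at_sample, p, r):
    """Estimated threshold and number of components of the retained points."""
    t = threshold_at_probability(dens_at_sample, p)
    kept = np.asarray(points, dtype=float)[np.asarray(dens_at_sample) >= t]
    return t, count_components(kept, r)
```

Calling `level_set_components` for several values of `p` gives a crude picture of how the grouping structure changes with the probability level, in the spirit of the last sentence of the abstract.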