19 research outputs found

    Fast rates for noisy clustering

    Get PDF
    The effect of errors in variables in empirical minimization is investigated. Given a loss $\ell$ and a set of decision rules $\mathcal{G}$, we prove a general upper bound for an empirical minimization based on a deconvolution kernel and a noisy sample $Z_i = X_i + \epsilon_i$, $i = 1, \ldots, n$. We apply this general upper bound to give the rate of convergence for the expected excess risk in noisy clustering. A recent bound from \citet{levrard} proves that this rate is $\mathcal{O}(1/n)$ in the direct case, under Pollard's regularity assumptions. Here the effect of noisy measurements gives a rate of the form $\mathcal{O}(1/n^{\frac{\gamma}{\gamma+2\beta}})$, where $\gamma$ is the Hölder regularity of the density of $X$ whereas $\beta$ is the degree of ill-posedness.
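
    The estimator here is built from a deconvolution kernel. A minimal sketch of that ingredient, assuming a known noise characteristic function and a Fourier-cutoff (sinc) kernel; the function name, bandwidth, and grids are illustrative, not the paper's choices:

    import numpy as np

    def deconvolution_kde(z, x_grid, h, noise_cf):
        # Deconvolution kernel density estimate of the density of X from
        # noisy observations Z_i = X_i + eps_i.  `noise_cf` is the
        # characteristic function t -> E[exp(i t eps)] of the noise.
        # A Fourier-cutoff kernel restricts integration to |t| <= 1/h.
        t = np.linspace(-1.0 / h, 1.0 / h, 512)
        dt = t[1] - t[0]
        emp_cf = np.mean(np.exp(1j * t[:, None] * z[None, :]), axis=1)
        ratio = emp_cf / noise_cf(t)                      # the deconvolution step
        integrand = ratio[None, :] * np.exp(-1j * t[None, :] * x_grid[:, None])
        f_hat = (integrand.sum(axis=1) * dt).real / (2.0 * np.pi)
        return np.maximum(f_hat, 0.0)                     # clip negative wiggles

    # toy run: X ~ N(0,1) contaminated with Laplace(0, 0.3) noise
    rng = np.random.default_rng(0)
    z = rng.normal(size=2000) + rng.laplace(scale=0.3, size=2000)
    laplace_cf = lambda t: 1.0 / (1.0 + (0.3 * t) ** 2)   # CF of Laplace(0, 0.3)
    grid = np.linspace(-4.0, 4.0, 200)
    f_hat = deconvolution_kde(z, grid, h=0.2, noise_cf=laplace_cf)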

    Fast rates for empirical vector quantization

    Get PDF
    We consider the rate of convergence of the expected loss of empirically optimal vector quantizers. Earlier results show that the mean-squared expected distortion for any fixed distribution supported on a bounded set and satisfying some regularity conditions decreases at the rate $\mathcal{O}(\log n/n)$. We prove that this rate is actually $\mathcal{O}(1/n)$. Although these conditions are hard to check, we show that well-polarized distributions with continuous densities supported on a bounded set are included in the scope of this result.
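
    A hedged numerical illustration of the quantity under study, with sklearn's KMeans standing in for the empirically optimal quantizer (which it only approximates): out-of-sample distortion of quantizers fitted on growing samples from a bounded-support distribution.

    import numpy as np
    from sklearn.cluster import KMeans

    def distortion(points, centers):
        # mean squared distance to the nearest codepoint
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).mean()

    rng = np.random.default_rng(1)
    test = rng.uniform(-1.0, 1.0, size=(100_000, 2))      # bounded support
    for n in (100, 1_000, 10_000):
        train = rng.uniform(-1.0, 1.0, size=(n, 2))
        km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train)
        print(n, distortion(test, km.cluster_centers_))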

    Anisotropic oracle inequalities in noisy quantization

    Get PDF
    The effect of errors in variables in quantization is investigated. We prove general exact and non-exact oracle inequalities with fast rates for an empirical minimization based on a noisy sample $Z_i = X_i + \epsilon_i$, $i = 1, \ldots, n$, where the $X_i$ are i.i.d. with density $f$ and the $\epsilon_i$ are i.i.d. with density $\eta$. These rates depend on the geometry of the density $f$ and the asymptotic behaviour of the characteristic function of $\eta$. This general study can be applied to the problem of $k$-means clustering with noisy data. For this purpose, we introduce a deconvolution $k$-means stochastic minimization which reaches fast rates of convergence under Pollard's standard regularity assumptions.
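
    The minimization stage of such a deconvolution $k$-means scheme can be pictured as Lloyd iterations run against a density estimate rather than against the raw noisy points. A 1-D sketch under that reading, with a hand-made density standing in for the deconvolution estimate (all names illustrative):

    import numpy as np

    def weighted_lloyd(grid, weights, k, iters=50, seed=0):
        # Lloyd iterations for k-means against a density estimate on a
        # grid; in the noisy setting `weights` would come from a
        # deconvolution estimate of f, not from the noisy sample itself.
        rng = np.random.default_rng(seed)
        centers = rng.choice(grid, size=k, replace=False)
        for _ in range(iters):
            labels = np.argmin(np.abs(grid[:, None] - centers[None, :]), axis=1)
            for j in range(k):
                mask = labels == j
                if weights[mask].sum() > 0:
                    centers[j] = np.average(grid[mask], weights=weights[mask])
        return np.sort(centers)

    # toy 1-D density: mixture of two bumps; centers land near +-1.5
    grid = np.linspace(-4.0, 4.0, 400)
    f = 0.5 * np.exp(-(grid - 1.5) ** 2) + 0.5 * np.exp(-(grid + 1.5) ** 2)
    print(weighted_lloyd(grid, f / f.sum(), k=2))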

    Convergence and Rates for Fixed-Interval Multiple-Track Smoothing Using $k$-Means Type Optimization

    Get PDF
    We address the task of estimating multiple trajectories from unlabeled data. This problem arises in many settings: one could think, for example, of the construction of maps of transport networks from passive observation of travellers, or of the reconstruction of the behaviour of uncooperative vehicles from external observations. There are two coupled problems. The first is a data association problem: how to map data points onto individual trajectories. The second is, given a solution to the data association problem, to estimate those trajectories. We construct estimators as a solution to a regularized variational problem (to which approximate solutions can be obtained via the simple, efficient and widespread $k$-means method) and show that, as the number of data points $n$ increases, these estimators exhibit stable behaviour. More precisely, we show that they converge in probability in an appropriate Sobolev space, with rate $n^{-1/2}$.
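
    A toy version of the coupled scheme, under assumed structure: alternate assigning each point to its best-fitting track with refitting each track, which is exactly the $k$-means pattern. Polynomial least squares stands in here for the paper's Sobolev-regularized smoother.

    import numpy as np

    def multitrack_kmeans(t, y, k, degree=3, iters=20, seed=0):
        # alternate data association and per-track smoothing
        rng = np.random.default_rng(seed)
        labels = rng.integers(k, size=len(t))
        coefs = [np.polyfit(t[labels == j], y[labels == j], degree)
                 for j in range(k)]
        for _ in range(iters):
            preds = np.stack([np.polyval(c, t) for c in coefs])
            labels = np.argmin((preds - y[None, :]) ** 2, axis=0)   # association
            for j in range(k):                                      # smoothing
                if (labels == j).sum() > degree:
                    coefs[j] = np.polyfit(t[labels == j], y[labels == j], degree)
        return coefs, labels

    # two unlabeled, crossing tracks observed with noise
    rng = np.random.default_rng(2)
    t = rng.uniform(0.0, 1.0, 400)
    true_track = rng.integers(2, size=400)
    y = np.where(true_track == 0, np.sin(3.0 * t), 1.0 - t)
    y = y + 0.05 * rng.normal(size=400)
    coefs, labels = multitrack_kmeans(t, y, k=2)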

    Convergence of the $k$-Means Minimization Problem using $\Gamma$-Convergence

    Full text link
    The $k$-means method is an iterative clustering algorithm which associates each observation with one of $k$ clusters. It traditionally employs cluster centers in the same space as the observed data. By relaxing this requirement, it is possible to apply the $k$-means method to infinite dimensional problems, for example multiple target tracking and smoothing problems in the presence of unknown data association. Via a $\Gamma$-convergence argument, the associated optimization problem is shown to converge in the sense that both the $k$-means minimum and minimizers converge in the large data limit to quantities which depend upon the observed data only through its distribution. The theory is supplemented with two examples to demonstrate the range of problems now accessible by the $k$-means method. The first example combines a non-parametric smoothing problem with unknown data association. The second addresses tracking using sparse data from a network of passive sensors.
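
    The convergence statement can be watched numerically (illustrative only, with sklearn's KMeans as an approximate minimizer): the per-point $k$-means minimum settles down as $n$ grows, depending on the sample only through the underlying distribution.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(3)
    for n in (200, 2_000, 20_000):
        # two-component Gaussian mixture as the fixed data distribution
        sample = np.concatenate([rng.normal(-2.0, 1.0, n // 2),
                                 rng.normal(2.0, 1.0, n // 2)])
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(sample.reshape(-1, 1))
        print(n, km.inertia_ / n)    # empirical k-means minimum per point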

    Universal multiresolution source codes

    Get PDF
    A multiresolution source code is a single code giving an embedded source description that can be read at a variety of rates and thereby yields reproductions at a variety of resolutions. The resolution of a source reproduction here refers to the accuracy with which it approximates the original source. Thus, a reproduction with low distortion is a “high-resolution” reproduction, while a reproduction with high distortion is a “low-resolution” reproduction. This paper treats the generalization of universal lossy source coding from single-resolution source codes to multiresolution source codes. Results described in this work include new definitions for weakly minimax universal, strongly minimax universal, and weighted universal sequences of fixed- and variable-rate multiresolution source codes that extend the corresponding notions from lossless coding and (single-resolution) quantization to multiresolution quantizers. A variety of universal multiresolution source coding results follow, including necessary and sufficient conditions for the existence of universal multiresolution codes, rate of convergence bounds for universal multiresolution coding performance to the theoretical bound, and a new multiresolution approach to two-stage universal source coding.
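
    The defining property of an embedded description can be shown in miniature. Below is a toy successive-approximation scalar quantizer, not the paper's construction: one bit stream, readable at several rates, with distortion decreasing in the rate.

    def embedded_encode(x, lo=-1.0, hi=1.0, bits=8):
        # successive-approximation quantizer on [lo, hi]: each bit halves
        # the active cell, so every prefix of the code is itself a
        # (coarser) description of x
        code = []
        for _ in range(bits):
            mid = 0.5 * (lo + hi)
            bit = int(x >= mid)
            code.append(bit)
            lo, hi = (mid, hi) if bit else (lo, mid)
        return code

    def embedded_decode(code, lo=-1.0, hi=1.0):
        # reproduce from any prefix: fewer bits, lower resolution
        for bit in code:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if bit else (lo, mid)
        return 0.5 * (lo + hi)

    code = embedded_encode(0.3127)
    for rate in (2, 4, 8):                  # read one code at several rates
        print(rate, embedded_decode(code[:rate]))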