Fast rates for noisy clustering
The effect of errors in variables in empirical minimization is investigated.
Given a loss function and a set of decision rules, we prove a general
upper bound for an empirical minimization based on a deconvolution kernel and a
noisy sample $Z_i = X_i + \epsilon_i$, $i = 1, \ldots, n$. We apply this general upper bound
to give the rate of convergence for the expected excess risk in noisy
clustering. A recent bound from \citet{levrard} proves that this rate is
O(1/n) in the direct case, under Pollard's regularity assumptions.
Here the effect of noisy measurements leads to a slower rate whose exponent depends
on $\gamma$, the H\"older regularity of the density of $X$, and on $\beta$, the degree of
ill-posedness of the deconvolution problem.
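The key ingredient in this approach is a deconvolution kernel whose Fourier transform divides out the characteristic function of the noise. As a rough illustration, the Python sketch below builds the corresponding kernel density estimate of the clean density from a one-dimensional noisy sample; the Laplace noise law, the kernel with Fourier transform $(1-t^2)^3$, the bandwidth, and the simulated mixture are assumptions of this sketch, not the paper's exact setting.

```python
# Illustrative sketch (assumptions noted above): deconvolution kernel density
# estimate f_hat(x) = (1/(n h)) * sum_j K_tilde((x - Z_j) / h) built from a
# noisy sample Z = X + eps with known Laplace noise.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: clean X from a two-component Gaussian mixture, eps ~ Laplace(b).
n, b = 300, 0.2
X = np.where(rng.random(n) < 0.5,
             rng.normal(-1.0, 0.3, n),
             rng.normal(1.0, 0.3, n))
Z = X + rng.laplace(scale=b, size=n)

T = np.linspace(-1.0, 1.0, 401)   # integration grid in the Fourier domain

def deconv_kernel(u, h, b):
    """K_tilde(u) = (1/2pi) int phi_K(t) / phi_eps(t/h) cos(t u) dt, with
    phi_K(t) = (1 - t^2)^3 on [-1, 1] and phi_eps(t) = 1 / (1 + b^2 t^2)."""
    phi_K = (1.0 - T**2) ** 3
    inv_phi_eps = 1.0 + (b * T / h) ** 2          # 1 / phi_eps(T / h), Laplace noise
    integrand = phi_K * inv_phi_eps * np.cos(np.multiply.outer(u, T))
    return np.trapz(integrand, T, axis=-1) / (2.0 * np.pi)

def deconv_kde(x_points, Z, h, b):
    """Deconvolution kernel density estimate of the density of X."""
    return np.array([deconv_kernel((x - Z) / h, h, b).mean() / h for x in x_points])

print("estimated clean density at -1, 0, 1:",
      np.round(deconv_kde(np.array([-1.0, 0.0, 1.0]), Z, h=0.4, b=b), 3))
```

Replacing the standard kernel by this deconvoluted one is what lets the empirical criterion target the clean variable even though only the contaminated sample is observed.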
Fast rates for empirical vector quantization
We consider the rate of convergence of the expected loss of empirically
optimal vector quantizers. Earlier results show that the mean-squared expected
distortion for any fixed distribution supported on a bounded set and satisfying
some regularity conditions decreases at the rate O(log n/n). We prove that this
rate is actually O(1/n). Although these conditions are hard to check, we show
that well-polarized distributions with continuous densities supported on a
bounded set are included in the scope of this result.
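As a rough numerical companion to this statement (not taken from the paper), one can fit empirically optimal quantizers on samples of growing size $n$ from a fixed bounded-support mixture, meant to mimic the well-polarized case, and check that $n$ times the excess distortion stays roughly of constant order. The mixture, the choice of three codepoints, and the use of scikit-learn's KMeans as the empirical minimizer are assumptions of this sketch.

```python
# Illustrative experiment (assumptions noted above): n * excess distortion of
# empirically optimal k-means codebooks should stay roughly flat if the O(1/n)
# rate holds. Small or slightly negative values only reflect Monte Carlo error.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
k = 3

def sample(m):
    """Well-separated three-component mixture clipped to a bounded set."""
    centers = np.array([[-2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
    idx = rng.integers(0, k, size=m)
    return np.clip(centers[idx] + rng.normal(0.0, 0.3, size=(m, 2)), -4.0, 4.0)

def distortion(codebook, X):
    """Mean squared distance to the nearest codepoint."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean()

X_test = sample(200_000)   # large held-out sample, proxy for the population distortion
reference = KMeans(n_clusters=k, n_init=10, random_state=0).fit(sample(100_000))
ref_dist = distortion(reference.cluster_centers_, X_test)   # proxy for the optimum

for n in (500, 2000, 8000, 32000):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(sample(n))
    excess = distortion(km.cluster_centers_, X_test) - ref_dist
    print(f"n = {n:6d}   n * excess distortion ~ {n * excess:.3f}")
```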
Anisotropic oracle inequalities in noisy quantization
The effect of errors in variables in quantization is investigated. We prove
general exact and non-exact oracle inequalities with fast rates for an
empirical minimization based on a noisy sample $Z_i = X_i + \epsilon_i$,
$i = 1, \ldots, n$, where $X_i$ are i.i.d. with density $f$ and
$\epsilon_i$ are i.i.d. with density $\eta$. These rates depend on the geometry
of the density $f$ and the asymptotic behaviour of the characteristic function
of $\eta$.
This general study can be applied to the problem of $k$-means clustering with
noisy data. For this purpose, we introduce a deconvolution $k$-means stochastic
minimization which reaches fast rates of convergence under standard Pollard's
regularity assumptions.
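The computational core of such a deconvolution $k$-means is a Lloyd-type iteration run against an estimate of the density of the clean variable rather than against the noisy observations themselves. The sketch below shows that iteration in one dimension; to keep it short, the density estimate is a smoothed histogram stand-in rather than a genuine deconvolution estimator, and the noise scale, grid, and initialization are assumptions of this sketch.

```python
# Illustrative sketch (assumptions noted above): Lloyd-type minimization of
# int min_j (x - c_j)^2 f_hat(x) dx over 1-D codebooks, where f_hat plays the
# role of a deconvolved density estimate of the clean variable X.
import numpy as np

rng = np.random.default_rng(1)

# Noisy sample Z = X + eps around two clusters at -1 and +1.
n = 5000
X = np.where(rng.random(n) < 0.5, rng.normal(-1.0, 0.2, n), rng.normal(1.0, 0.2, n))
Z = X + rng.laplace(scale=0.3, size=n)

# Stand-in for a deconvolved density estimate: a smoothed, clipped histogram.
hist, edges = np.histogram(Z, bins=240, range=(-3.0, 3.0), density=True)
grid = 0.5 * (edges[:-1] + edges[1:])                       # bin centers
f_hat = np.maximum(np.convolve(hist, np.ones(9) / 9, mode="same"), 0.0)

def lloyd_on_density(grid, f, k=2, iters=50):
    """Alternate nearest-codepoint assignment of grid cells and density-weighted
    centroid updates, i.e. Lloyd's algorithm applied to the density f."""
    c = np.quantile(grid, np.linspace(0.25, 0.75, k))       # crude initialization
    w = f / np.trapz(f, grid)
    for _ in range(iters):
        assign = np.argmin((grid[:, None] - c[None, :]) ** 2, axis=1)
        for j in range(k):
            cell = assign == j
            mass = np.trapz(w[cell], grid[cell]) if cell.any() else 0.0
            if mass > 0:
                c[j] = np.trapz(grid[cell] * w[cell], grid[cell]) / mass
    return np.sort(c)

print("codebook fitted from noisy data:", np.round(lloyd_on_density(grid, f_hat), 3))
```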
Convergence and Rates for Fixed-Interval Multiple-Track Smoothing Using $k$-Means Type Optimization
We address the task of estimating multiple trajectories from unlabeled data.
This problem arises in many settings: one could think of the construction of
maps of transport networks from passive observation of travellers, or the
reconstruction of the behaviour of uncooperative vehicles from external
observations, for example. There are two coupled problems. The first is a data
association problem: how to map data points onto individual trajectories. The
second is, given a solution to the data association problem, to estimate those
trajectories. We construct estimators as a solution to a regularized
variational problem (to which approximate solutions can be obtained via the
simple, efficient and widespread $k$-means method) and show that, as the number
of data points, $n$, increases, these estimators exhibit stable behaviour. More
precisely, we show that they converge, in probability and with an explicit rate,
in an appropriate Sobolev space.
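A minimal version of the $k$-means-type alternation alluded to above assigns each unlabeled observation to the nearest current trajectory estimate and then refits every trajectory by regularized least squares. The sketch below does this with a polynomial basis and a ridge penalty on two synthetic tracks; the basis, the penalty, and the data are assumptions of this sketch, not the estimator analysed in the paper.

```python
# Illustrative sketch (assumptions noted above): alternate data association and
# regularized trajectory refitting, a k-means-type iteration in function space.
import numpy as np

rng = np.random.default_rng(2)

# Two true tracks on [0, 1], observed with noise and without labels.
n = 400
t = rng.random(n)
true_label = rng.integers(0, 2, n)
y = np.where(true_label == 0, np.sin(2.0 * np.pi * t), 1.0 - t) + rng.normal(0.0, 0.05, n)

k, deg, lam = 2, 5, 1e-3
B = np.vander(t, deg + 1)        # polynomial design matrix, one row per observation

def ridge_fit(Bm, ym, lam):
    """Regularized least-squares (ridge) coefficients for one trajectory."""
    A = Bm.T @ Bm + lam * np.eye(Bm.shape[1])
    return np.linalg.solve(A, Bm.T @ ym)

assign = rng.integers(0, k, n)   # random initial data association
for _ in range(30):
    coefs = [ridge_fit(B[assign == j], y[assign == j], lam) for j in range(k)]
    resid = np.stack([np.abs(y - B @ c) for c in coefs], axis=1)
    assign = resid.argmin(axis=1)   # reassign each point to its closest trajectory

print("points per recovered track:", np.bincount(assign, minlength=k))
```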
Convergence of the $k$-Means Minimization Problem using $\Gamma$-Convergence
The $k$-means method is an iterative clustering algorithm which associates
each observation with one of $k$ clusters. It traditionally employs cluster
centers in the same space as the observed data. By relaxing this requirement,
it is possible to apply the $k$-means method to infinite-dimensional problems,
for example multiple target tracking and smoothing problems in the presence of
unknown data association. Via a $\Gamma$-convergence argument, the associated
optimization problem is shown to converge in the sense that both the $k$-means
minimum and minimizers converge in the large data limit to quantities which
depend upon the observed data only through its distribution. The theory is
supplemented with two examples to demonstrate the range of problems now
accessible by the $k$-means method. The first example combines a non-parametric
smoothing problem with unknown data association. The second addresses tracking
using sparse data from a network of passive sensors.
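For concreteness, the finite-dimensional iteration referred to above, Lloyd's algorithm, can be written in a few lines; the relaxation discussed in the paper replaces the cluster centers by elements of a function space but keeps the same alternating structure. The data and the choice of three clusters below are arbitrary illustrative inputs.

```python
# Bare-bones k-means (Lloyd's algorithm): alternate between assigning each
# observation to its nearest center and recomputing each center as the mean
# of its assigned observations.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # stop once the centers are fixed
            break
        centers = new_centers
    return centers, assign

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 0.2, size=(100, 2)) for m in (-1.0, 0.0, 1.0)])
centers, assign = kmeans(X, k=3)
print("recovered centers (first coordinate):", np.round(np.sort(centers[:, 0]), 2))
```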
Universal multiresolution source codes
A multiresolution source code is a single code giving an embedded source description that can be read at a variety of rates and thereby yields reproductions at a variety of resolutions. The resolution of a source reproduction here refers to the accuracy with which it approximates the original source. Thus, a reproduction with low distortion is a “high-resolution” reproduction while a reproduction with high distortion is a “low-resolution” reproduction. This paper treats the generalization of universal lossy source coding from single-resolution source codes to multiresolution source codes. Results described in this work include new definitions for weakly minimax universal, strongly minimax universal, and weighted universal sequences of fixed- and variable-rate multiresolution source codes that extend the corresponding notions from lossless coding and (single-resolution) quantization to multiresolution quantizers. A variety of universal multiresolution source coding results follow, including necessary and sufficient conditions for the existence of universal multiresolution codes, rate of convergence bounds for universal multiresolution coding performance to the theoretical bound, and a new multiresolution approach to two-stage universal source coding
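To make the idea of an embedded description concrete, the toy sketch below (not taken from the paper) encodes a uniform source by its binary expansion: every prefix of the same bitstream can be decoded on its own, and the distortion shrinks as more of the description is read. The uniform source and the plain binary expansion are illustrative choices only.

```python
# Illustrative toy embedded code (assumptions noted above): one description per
# source sample, readable at any rate r, with distortion decreasing in r.
import numpy as np

rng = np.random.default_rng(4)
x = rng.random(10_000)            # source samples, uniform on [0, 1)

def encode_embedded(x, max_bits):
    """Bit planes of the binary expansion of x; plane r refines the first r-1."""
    planes, frac = [], x.copy()
    for _ in range(max_bits):
        frac *= 2.0
        bit = np.floor(frac).astype(int)
        planes.append(bit)
        frac -= bit
    return np.stack(planes)       # shape (max_bits, len(x))

def decode_prefix(planes, r):
    """Reconstruct from the first r bit planes (midpoint of the dyadic cell)."""
    weights = 0.5 ** np.arange(1, r + 1)
    return weights @ planes[:r] + 0.5 ** (r + 1)

planes = encode_embedded(x, max_bits=8)
for r in (1, 2, 4, 8):
    mse = np.mean((x - decode_prefix(planes, r)) ** 2)
    print(f"rate {r} bit(s)/sample -> distortion {mse:.2e}")
```

Universality in the paper's sense asks for embedded codes of this kind whose performance approaches the theoretical bound simultaneously over a class of sources, which this fixed toy code does not attempt.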