Total Jensen divergences: Definition, Properties and k-Means++ Clustering
We present a novel class of divergences induced by a smooth convex function
called total Jensen divergences. These total Jensen divergences are invariant
by construction to rotations, a feature yielding regularization of ordinary
Jensen divergences by a conformal factor. We analyze the relationships between
this novel class of total Jensen divergences and the recently introduced total
Bregman divergences. We then proceed by defining the total Jensen centroids as
average distortion minimizers, and study their robustness performance to
outliers. Finally, we prove that the k-means++ initialization that bypasses
explicit centroid computations is good enough in practice to guarantee
probabilistically a constant approximation factor to the optimal k-means
clustering. Comment: 27 pages
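The seeding step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard k-means++ rule (each new seed is drawn with probability proportional to its distortion to the nearest seed already chosen) and takes the distortion as a pluggable function. The names `kmeanspp_seeds` and `squared_euclidean` are hypothetical, and the squared Euclidean distance is only a stand-in: the paper's total Jensen divergence is not reproduced here. Because seeds are always drawn from the data points themselves, no centroid is ever computed.

```python
import random


def squared_euclidean(p, q):
    """Stand-in distortion; any non-negative divergence could be plugged in."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, divergence=squared_euclidean, rng=None):
    """Pick k seeds from `points` using k-means++ style sampling.

    Assumption: `divergence(p, q)` is non-negative and zero when p == q.
    """
    rng = rng or random.Random(0)
    # The first seed is drawn uniformly at random.
    seeds = [rng.choice(points)]
    # cost[i] holds the divergence from points[i] to its nearest seed so far.
    cost = [divergence(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(cost) == 0:
            # Every point coincides with a seed; fall back to uniform sampling.
            nxt = rng.choice(points)
        else:
            # Sample the next seed with probability proportional to its cost.
            nxt = rng.choices(points, weights=cost, k=1)[0]
        seeds.append(nxt)
        cost = [min(c, divergence(p, nxt)) for c, p in zip(cost, points)]
    return seeds


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
    print(kmeanspp_seeds(data, 2))
```

Passing a different `divergence` callable is the only change needed to seed under another distortion measure; whether an approximation guarantee carries over depends on the properties of that measure, as analyzed in the paper.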
Bregman Voronoi Diagrams: Properties, Algorithms and Applications
The Voronoi diagram of a finite set of objects is a fundamental geometric
structure that subdivides the embedding space into regions, each region
consisting of the points that are closer to a given object than to the others.
We may define many variants of Voronoi diagrams depending on the class of
objects, the distance functions and the embedding space. In this paper, we
investigate a framework for defining and building Voronoi diagrams for a broad
class of distance functions called Bregman divergences. Bregman divergences
include not only the traditional (squared) Euclidean distance but also various
divergence measures based on entropic functions. Accordingly, Bregman Voronoi
diagrams make it possible to define information-theoretic Voronoi diagrams in statistical
parametric spaces based on the relative entropy of distributions. We define
several types of Bregman diagrams, establish correspondences between those
diagrams (using the Legendre transformation), and show how to compute them
efficiently. We also introduce extensions of these diagrams, e.g. k-order and
k-bag Bregman Voronoi diagrams, and introduce Bregman triangulations of a set
of points and their connection with Bregman Voronoi diagrams. We show that these
triangulations capture many of the properties of the celebrated Delaunay
triangulation. Finally, we give some applications of Bregman Voronoi diagrams
which are of interest in the context of computational geometry and machine
learning. Comment: Extends the proceedings abstract of SODA 2007 (46 pages, 15 figures)
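As a small illustration of the claim that Bregman divergences cover both the squared Euclidean distance and entropy-based measures, the sketch below uses the standard definition D_F(p, q) = F(p) - F(q) - <p - q, grad F(q)> for a convex generator F. The function names are hypothetical and nothing here is taken from the paper's algorithms; it only evaluates the divergence for two common generators.

```python
import math


def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <p - q, grad F(q)>."""
    inner = sum((a - b) * g for a, b, g in zip(p, q, grad_F(q)))
    return F(p) - F(q) - inner


# Generator 1: squared norm, which yields the squared Euclidean distance.
def sq_norm(x):
    return sum(v * v for v in x)


def sq_norm_grad(x):
    return [2 * v for v in x]


# Generator 2: negative Shannon entropy (positive coordinates assumed),
# which yields the relative entropy when p and q each sum to 1.
def neg_entropy(x):
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    return [math.log(v) + 1 for v in x]


if __name__ == "__main__":
    print(bregman(sq_norm, sq_norm_grad, [1, 2], [3, 5]))
    print(bregman(neg_entropy, neg_entropy_grad, [0.5, 0.5], [0.25, 0.75]))
```

Note that the divergence is not symmetric in general: swapping `p` and `q` changes the value for the entropy generator, which is why several types of Bregman Voronoi diagrams can be defined.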
Clustering constrained by dependencies
Clustering is the unsupervised method of grouping data samples to form a partition of a given dataset. Such grouping is typically done based on homogeneity assumptions of clusters over an attribute space and hence the precise definition of the similarity metric affects the clusters inferred. In recent years, new formulations of clustering have emerged that posit indirect constraints on clustering, typically in terms of preserving dependencies between data samples and auxiliary variables. These formulations find applications in bioinformatics, web mining, social network analysis, and many other domains. The purpose of this survey is to provide a gentle introduction to these formulations, their mathematical assumptions, and the contexts under which they are applicable.