Clustering above Exponential Families with Tempered Exponential Measures
The link with exponential families has allowed k-means clustering to be
generalized to a wide variety of data generating distributions in exponential
families and clustering distortions among Bregman divergences. Getting the
framework to work above exponential families is important to lift roadblocks
like the lack of robustness of some population minimizers carved in their
axiomatization. Current generalisations of exponential families, such as
q-exponential families or even deformed exponential families, fail at
achieving the goal. In this paper, we provide a new attempt at getting the
complete framework, grounded in a new generalisation of exponential families
that we introduce, tempered exponential measures (TEMs). TEMs keep the maximum
entropy axiomatization framework of q-exponential families, but instead of
normalizing the measure, normalize a dual called a co-distribution. Numerous
interesting properties arise for clustering, such as improved and controllable
robustness for population minimizers, which keep a simple analytic form.
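The Bregman k-means framework the abstract starts from can be sketched as follows. This is a minimal illustration, not the paper's TEM-based algorithm: the function names are my own, and the generalized Kullback-Leibler generator is just one example generator; the key property shown is that the centroid update is the arithmetic mean for any Bregman divergence.

```python
import numpy as np

def gen_kl(p, q):
    # Bregman divergence of the generator F(x) = sum(x log x - x):
    # the generalized Kullback-Leibler divergence, for positive vectors.
    return np.sum(p * np.log(p / q) - p + q, axis=-1)

def bregman_kmeans(X, k, div=gen_kl, n_iter=50, seed=0):
    """Hard Bregman clustering sketch: points go to the centroid with the
    smallest divergence D(x, mu); the update step is the arithmetic mean,
    which minimizes the total right-sided Bregman divergence for any
    generator (this is what lets k-means generalize across the family)."""
    rng = np.random.default_rng(seed)
    # farthest-first seeding so the sketch behaves sensibly on separated data
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        d = np.min(np.stack([div(X, X[i]) for i in idx]), axis=0)
        idx.append(int(d.argmax()))
    centroids = X[idx].copy()
    for _ in range(n_iter):
        d = np.stack([div(X, c) for c in centroids], axis=1)  # assignment
        labels = d.argmin(axis=1)
        centroids = np.stack([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])  # update
    return labels, centroids
```

Swapping `gen_kl` for any other Bregman generator changes only the assignment step; the mean update stays the same, which is the population-minimizer form whose lack of robustness motivates the move above exponential families.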
The α-divergences associated with a pair of strictly comparable quasi-arithmetic means
We generalize the family of α-divergences using a pair of strictly
comparable weighted means. In particular, we obtain the 1-divergence in the
limit case α → 1 (a generalization of the Kullback-Leibler
divergence) and the 0-divergence in the limit case α → 0 (a
generalization of the reverse Kullback-Leibler divergence). We state the
condition for a pair of quasi-arithmetic means to be strictly comparable, and
report the formula for the quasi-arithmetic α-divergences and their
subfamily of bipower homogeneous α-divergences, which belong to
Csiszár's f-divergences. Finally, we show that these generalized
quasi-arithmetic 1-divergences and 0-divergences can be decomposed as the
sum of generalized cross-entropies minus entropies, and rewritten as conformal
Bregman divergences using monotone embeddings.
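The limit behavior described above can be checked numerically for the standard α-divergence between discrete probability distributions. This is a sketch of the classical (Amari-style) α-divergence, not the paper's generalized quasi-arithmetic construction; the function name is my own.

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    """Standard α-divergence between discrete probability distributions:
    D_α(p:q) = (1 - Σ p^α q^(1-α)) / (α(1-α)) for α ∉ {0, 1}.
    It recovers KL(p||q) as α → 1 and the reverse KL(q||p) as α → 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.isclose(alpha, 1.0):
        return float(np.sum(p * np.log(p / q)))   # Kullback-Leibler limit
    if np.isclose(alpha, 0.0):
        return float(np.sum(q * np.log(q / p)))   # reverse KL limit
    return float((1.0 - np.sum(p**alpha * q**(1.0 - alpha)))
                 / (alpha * (1.0 - alpha)))

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.4, 0.4, 0.2])
# near α = 1 the closed form approaches KL(p||q); α = 0.5 is symmetric
gap = abs(alpha_divergence(p, q, 0.999) - alpha_divergence(p, q, 1.0))
```

At α = 1/2 the divergence is symmetric in its arguments (it is proportional to the squared Hellinger distance), which makes the two KL limits at the endpoints easy to contrast.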
Centroid-Based Clustering with αβ-Divergences
Centroid-based clustering is a widely used technique within unsupervised learning
algorithms in many research fields. The success of any centroid-based clustering relies on the
choice of the similarity measure under use. In recent years, most studies focused on including several
divergence measures in the traditional hard k-means algorithm. In this article, we consider the
problem of centroid-based clustering using the family of αβ-divergences, which is governed by two
parameters, α and β. We propose a new iterative algorithm, αβ-k-means, giving closed-form solutions
for the computation of the sided centroids. The algorithm can be fine-tuned by means of this pair of
values, yielding a wide range of the most frequently used divergences. Moreover, it is guaranteed to
converge to local minima for a wide range of values of the pair (α, β). Our theoretical contribution
has been validated by several experiments performed with synthetic and real data and exploring the
(α, β) plane. The numerical results obtained confirm the quality of the algorithm and its suitability to
be used in several practical applications.
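An iteration of this kind of scheme can be sketched as follows. This is an illustrative sketch under my own assumptions, not the authors' αβ-k-means implementation: it uses the αβ-divergence for α, β, α+β ≠ 0, and a closed-form right-sided centroid update that, for this divergence, works out to the power mean of order α of the assigned points.

```python
import numpy as np

def ab_divergence(p, q, a, b):
    """αβ-divergence for positive vectors, with a = α, b = β and
    a, b, a + b all nonzero (the remaining cases are defined by limits)."""
    s = a + b
    return np.sum(-(p**a * q**b - (a / s) * p**s - (b / s) * q**s) / (a * b),
                  axis=-1)

def ab_kmeans(X, k, a, b, n_iter=50, seed=0):
    """Sketch of an αβ-k-means iteration: hard assignment under the
    αβ-divergence, then a closed-form right-sided centroid update.
    Setting the gradient of Σ_i D(x_i, c) to zero per coordinate gives
    c = (mean of x_i^α)^(1/α), i.e. the power mean of order α."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d = np.stack([ab_divergence(X, c, a, b) for c in centroids], axis=1)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centroids[j] = np.mean(pts**a, axis=0) ** (1.0 / a)
    return labels, centroids
```

Tuning (a, b) recovers familiar special cases, e.g. a = b = 1 gives half the squared Euclidean distance and the update reduces to the ordinary mean, which is the fine-tuning behavior the abstract describes.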