Tanaka Theorem for Inelastic Maxwell Models
We show that the Euclidean Wasserstein distance is contractive for inelastic
homogeneous Boltzmann kinetic equations in the Maxwellian approximation and its
associated Kac-like caricature. This property is a generalization of the
Tanaka theorem to inelastic interactions. Consequences are drawn for the
asymptotic behavior of solutions in terms only of the Euclidean Wasserstein
distance.
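In one dimension, the quadratic (Euclidean) Wasserstein distance between two empirical distributions reduces to matching order statistics. The following is a minimal numerical sketch of that quantity, not of the contraction result itself; the Gaussian samples and parameters are illustrative stand-ins for velocity distributions:

```python
import numpy as np

def wasserstein2_1d(x, y):
    """Quadratic (W2) Wasserstein distance between two one-dimensional
    empirical distributions with equal sample sizes: in 1-D the optimal
    coupling matches sorted samples, so W2^2 is the mean squared
    difference of order statistics."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 1000)   # stand-in for one distribution
b = rng.normal(2.0, 1.0, 1000)   # same shape, shifted mean
print(wasserstein2_1d(a, b))     # close to the mean shift of 2.0
```

Contractivity in the theorem means this distance between two solutions of the kinetic equation does not increase in time; the sketch only shows how the distance itself is evaluated.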
An Eulerian Approach to the Analysis of Krause's Consensus Models
Abstract. In this paper we analyze a class of multi-agent consensus dynamical systems inspired by Krause's original model. As in Krause's model, the basic assumption is so-called bounded confidence: two agents can influence each other only when their state values are within a given distance threshold R. We study the system from an Eulerian point of view, considering (possibly continuous) probability distributions of agents, and we present original convergence results. The limit distribution is always necessarily a convex combination of delta functions at least R apart from each other: in other words, these models are locally aggregating. The Eulerian perspective provides the natural framework for designing a numerical algorithm, with which we obtain several simulations in 1 and 2 dimensions.
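The bounded-confidence update behind such models is easy to simulate. The toy run below uses a Lagrangian, agent-based discretization (the classical Hegselmann-Krause iteration) rather than the paper's Eulerian analysis, and all parameter values are illustrative; it exhibits the local-aggregation behavior, with surviving opinion clusters more than R apart:

```python
import numpy as np

def hk_step(x, R):
    """One synchronous Hegselmann-Krause update: each agent moves to the
    average of all agents (itself included) within confidence radius R."""
    close = np.abs(x[:, None] - x[None, :]) <= R
    return (close * x[None, :]).sum(axis=1) / close.sum(axis=1)

# Illustrative run: 200 agents spread uniformly on [0, 10], threshold R = 1.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 200))
R = 1.0
for _ in range(500):
    x_new = hk_step(x, R)
    if np.allclose(x_new, x):   # HK dynamics reach a fixed point in finite time
        break
    x = x_new

clusters = np.unique(np.round(x, 6))
print(clusters)  # surviving opinion values, pairwise more than R apart
```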
Relax, no need to round: integrality of clustering formulations
We study exact recovery conditions for convex relaxations of point cloud
clustering problems, focusing on two of the most common optimization problems
for unsupervised clustering: k-means and k-median clustering. Motivations
for focusing on convex relaxations are: (a) they come with a certificate of
optimality, and (b) they are generic tools which are relatively parameter-free,
not tailored to specific assumptions over the input. More precisely, we
consider the distributional setting where there are k clusters, and the data
from each cluster consist of points sampled from a symmetric distribution
within a ball of unit radius. We ask: what is the minimal separation distance
between cluster centers needed for convex relaxations to exactly recover these
clusters as the optimal integral solution? For the k-median linear programming
relaxation we show a tight bound: exact recovery is obtained given arbitrarily
small pairwise separation between the balls. Under the same distributional
model, the k-means LP relaxation fails to recover such clusters even at
substantially larger separations. Yet, if we enforce PSD constraints on the
k-means LP, we obtain exact cluster recovery at a suitable center separation.
In contrast, common heuristics such as Lloyd's algorithm (a.k.a. the k-means
algorithm) can fail to recover clusters in this setting; even with arbitrarily
large cluster separation, k-means++ with overseeding by any constant factor
fails with high probability to achieve exact cluster recovery. To complement the
theoretical analysis, we provide an experimental study of the recovery
guarantees for these various methods, and discuss several open problems which
these experiments suggest.
Comment: 30 pages, ITCS 201
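For readers unfamiliar with the heuristic contrasted above, here is a minimal sketch of Lloyd's algorithm, the classical k-means heuristic; the planar data and all parameters below are illustrative, not the paper's experimental setup:

```python
import numpy as np

def lloyd_kmeans(X, k, iters=100, seed=0):
    """Minimal Lloyd's algorithm: alternate nearest-center assignment with
    recomputing each center as the mean of its cluster. It converges only
    to a local optimum, which is why it can miss even well-separated
    clusters from a bad start."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):                 # keep the old center if a cluster empties
                centers[j] = pts.mean(axis=0)
    return labels, centers

# Two unit balls (uniform) in the plane with centers 10 apart.
rng = np.random.default_rng(3)
ball = lambda c: np.asarray(c) + rng.uniform(-1, 1, size=(100, 2))
X = np.vstack([ball([0.0, 0.0]), ball([10.0, 0.0])])
labels, centers = lloyd_kmeans(X, k=2)
print(np.sort(centers[:, 0]).round(1))  # centers near x = 0 and x = 10
```

With only two clusters and large separation this run recovers the planted centers; the abstract's point is that such recovery is not guaranteed in general, unlike for the convex relaxations with a certificate of optimality.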
k-means clustering of extremes
The k-means clustering algorithm and its variant, the spherical k-means
clustering, are among the most important and popular methods in unsupervised
learning and pattern detection. In this paper, we explore how the spherical
k-means algorithm can be applied to the analysis of only the extremal
observations from a data set. By making use of multivariate extreme value
analysis we show how it can be adapted to find "prototypes" of extremal
dependence and we derive a consistency result for our suggested estimator. In
the special case of max-linear models we furthermore show that our procedure
provides an alternative way of performing statistical inference for this class
of models.
Finally, we provide data examples which show that our method is able to find
relevant patterns in extremal observations and allows us to classify extremal
events.
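A rough sketch of the idea: keep only observations with the largest norms, project them onto the unit sphere, and cluster their directions with spherical k-means. The heavy-tailed sample, the quantile threshold, and the number of prototypes below are illustrative choices, not the paper's estimator:

```python
import numpy as np

def spherical_kmeans(U, k, iters=50, seed=0):
    """Spherical k-means on unit vectors: assign each vector to the center
    with the largest cosine similarity, then renormalize each center to
    the mean direction of its cluster."""
    rng = np.random.default_rng(seed)
    C = U[rng.choice(len(U), k, replace=False)].copy()
    for _ in range(iters):
        labels = (U @ C.T).argmax(axis=1)      # most similar center
        for j in range(k):
            m = U[labels == j]
            if len(m):
                v = m.sum(axis=0)
                C[j] = v / np.linalg.norm(v)   # project back to the sphere
    return labels, C

# Keep only the most extreme observations (largest Euclidean norm)
# and cluster their directions.
rng = np.random.default_rng(2)
X = rng.standard_t(df=3, size=(2000, 2))       # heavy-tailed sample
r = np.linalg.norm(X, axis=1)
ext = X[r >= np.quantile(r, 0.95)]             # top 5% by radius
U = ext / np.linalg.norm(ext, axis=1, keepdims=True)
labels, C = spherical_kmeans(U, k=4)
print(C)  # unit-vector "prototypes" of extremal directions
```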
A survey of popular R packages for cluster analysis
Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring datasets. This article reviews the following popular libraries/commands in the R language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data, or a combination of the two. The contrasting methods in the different packages are briefly introduced and basic usage of the functions is discussed. The use of the different methods is compared and contrasted and then illustrated on example data. In the discussion, links to information on other available libraries for different clustering methods and extensions beyond basic clustering methods are given. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.
Using proper divergence functions to evaluate climate models
It has been argued persuasively that, in order to evaluate climate models,
the probability distributions of model output need to be compared to the
corresponding empirical distributions of observed data. Distance measures
between probability distributions, also called divergence functions, can be
used for this purpose. We contend that divergence functions ought to be proper,
in the sense that acting on modelers' true beliefs is an optimal strategy.
Score divergences that derive from proper scoring rules are proper, with the
integrated quadratic distance and the Kullback-Leibler divergence being
particularly attractive choices. Other commonly used divergences fail to be
proper. In an illustration, we evaluate and rank simulations from fifteen
climate models for temperature extremes in a comparison to re-analysis data.
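Both divergences named above are straightforward to compute. The sketch below uses synthetic stand-ins for model output and observations (all data here are illustrative, and the integrated quadratic distance is approximated by a Riemann sum over a finite grid):

```python
import numpy as np

def integrated_quadratic_distance(x, y, grid):
    """Integrated quadratic distance between two empirical CDFs:
    the integral of (F_x - F_y)^2, approximated by a left Riemann
    sum on a common evaluation grid."""
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.sum(((Fx - Fy) ** 2)[:-1] * np.diff(grid)))

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions;
    terms with p_i = 0 contribute zero."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

# Hypothetical comparison: "model" temperatures vs. "observations".
model = np.random.default_rng(4).normal(30.0, 2.0, 5000)
obs = np.random.default_rng(5).normal(31.0, 2.5, 5000)
grid = np.linspace(20.0, 42.0, 500)
print(integrated_quadratic_distance(model, obs, grid))

p, q = [0.2, 0.5, 0.3], [0.25, 0.45, 0.3]
print(kl_divergence(p, q))
```

Both quantities are zero exactly when the two distributions agree, which is the behavior a proper divergence rewards: a modeler minimizes the expected divergence by reporting their true predictive distribution.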