Cluster validation by measurement of clustering characteristics relevant to the user
There are many cluster analysis methods that can produce quite different
clusterings on the same dataset. Cluster validation is about the evaluation of
the quality of a clustering; "relative cluster validation" is about using such
criteria to compare clusterings. This can be used to select one of a set of
clusterings from different methods, or from the same method run with different
parameters such as different numbers of clusters.
There are many cluster validation indexes in the literature. Most of them
attempt to measure the overall quality of a clustering by a single number, but
this can be inappropriate. Various characteristics of a clustering can be
relevant in practice, depending on the aim of clustering, such as low
within-cluster distances and high between-cluster separation.
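To make the two characteristics just named concrete, here is a minimal sketch that measures each of them separately for a labelled clustering. The function names are hypothetical, and the measures (mean within-cluster pairwise distance, smallest between-cluster distance) are simple stand-ins assumed for illustration, not the paper's own index definitions.

```python
import math
from itertools import combinations


def within_cluster_mean_distance(points, labels):
    """Average Euclidean distance over all pairs of points sharing a cluster label.

    Lower values mean more homogeneous clusters. Assumes at least one cluster
    has two or more points.
    """
    dists = [
        math.dist(p, q)
        for (p, a), (q, b) in combinations(zip(points, labels), 2)
        if a == b
    ]
    return sum(dists) / len(dists)


def min_between_cluster_distance(points, labels):
    """Smallest distance between two points in different clusters.

    Higher values mean better-separated clusters. Assumes at least two clusters.
    """
    return min(
        math.dist(p, q)
        for (p, a), (q, b) in combinations(zip(points, labels), 2)
        if a != b
    )
```

For two tight, well-separated groups such as `[(0, 0), (0, 1), (5, 0), (5, 1)]` with labels `[0, 0, 1, 1]`, the within-cluster measure is 1.0 and the separation measure is 5.0. Keeping the two numbers apart, rather than folding them into one index, is what allows a user to weight them differently.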
In this paper, a number of validation criteria will be introduced that refer
to different desirable characteristics of a clustering, and that characterise a
clustering in a multidimensional way. In specific applications the user may be
interested in some of these criteria rather than others. A focus of the paper
is on methodology to standardise the different characteristics so that users
can aggregate them in a suitable way specifying weights for the various
criteria that are relevant in the clustering application at hand.
Comment: 20 pages, 2 figures
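The standardise-then-weight idea can be sketched as follows. This is only an illustration under an assumed standardisation: each criterion is z-scored across the candidate clusterings being compared, which is a simple substitute and not necessarily the standardisation the paper develops. The function name and table layout are hypothetical.

```python
from statistics import mean, pstdev


def aggregate_indexes(index_table, weights, higher_is_better):
    """Combine several validation criteria into one weighted score per clustering.

    index_table: {clustering_name: {criterion_name: value}}
    weights: {criterion_name: user-chosen weight}
    higher_is_better: {criterion_name: True if larger values are desirable}

    Each criterion is z-scored across the candidate clusterings (an assumed
    standardisation), sign-flipped so that larger is always better, and then
    summed with the user's weights.
    """
    names = list(index_table)
    scores = {name: 0.0 for name in names}
    for crit, w in weights.items():
        values = [index_table[n][crit] for n in names]
        mu, sd = mean(values), pstdev(values)
        sign = 1.0 if higher_is_better[crit] else -1.0
        for n in names:
            # A criterion that does not vary across candidates cannot discriminate.
            z = 0.0 if sd == 0 else (index_table[n][crit] - mu) / sd
            scores[n] += w * sign * z
    return scores


table = {
    "A": {"within": 1.0, "separation": 5.0},
    "B": {"within": 2.0, "separation": 3.0},
}
scores = aggregate_indexes(
    table,
    weights={"within": 1.0, "separation": 1.0},
    higher_is_better={"within": False, "separation": True},
)
best = max(scores, key=scores.get)
```

Here clustering "A" wins on both criteria, so it scores 2.0 and "B" scores -2.0. Changing the weights lets a user who cares mainly about, say, separation express that preference explicitly.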
Periodic travelling wave solutions of discrete nonlinear Schrödinger equations
The existence of nonzero periodic travelling wave solutions for a general
discrete nonlinear Schrödinger equation (DNLS) on finite one-dimensional
lattices is proved. The DNLS features a general nonlinear term and a variable
range of interactions going beyond the usual nearest-neighbour interaction. The
problem of the existence of travelling wave solutions is converted into a fixed
point problem for an operator on an appropriate function space, and this fixed
point problem is solved by means of Schauder's Fixed Point Theorem.
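As a sketch of how a travelling-wave problem for such a lattice equation becomes a functional equation, consider one representative form. The couplings $\kappa_m$, interaction range $M$, nonlinearity $f$, lattice size $N$ with periodic boundary conditions, and the ansatz below are all assumptions chosen for illustration; the paper's precise setting may differ.

```latex
% Assumed DNLS on the periodic lattice n \in \mathbb{Z}_N, interaction range M:
i\,\dot u_n = -\sum_{m=1}^{M} \kappa_m \bigl(u_{n+m} + u_{n-m} - 2u_n\bigr) - f\bigl(|u_n|^2\bigr)\,u_n .

% Travelling-wave ansatz with frequency \omega and speed c, \varphi(z+N) = \varphi(z):
u_n(t) = e^{-i\omega t}\,\varphi(z), \qquad z = n - ct,
\qquad\Rightarrow\qquad
i\,\dot u_n = e^{-i\omega t}\bigl(\omega\,\varphi(z) - i c\,\varphi'(z)\bigr).

% Substituting gives an advance-delay differential equation for \varphi:
\omega\,\varphi(z) - i c\,\varphi'(z)
  = -\sum_{m=1}^{M} \kappa_m \bigl(\varphi(z+m) + \varphi(z-m) - 2\varphi(z)\bigr)
    - f\bigl(|\varphi(z)|^2\bigr)\,\varphi(z).
```

Solving the last equation for $\varphi'$ and integrating over one period is one way to arrive at an integral operator whose fixed points are travelling waves, which is the kind of problem to which Schauder's theorem applies.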
Expectation Propagation on the Maximum of Correlated Normal Variables
Many inference problems involving questions of optimality ask for the maximum
or the minimum of a finite set of unknown quantities. This technical report
derives the first two posterior moments of the maximum of two correlated
Gaussian variables and the first two posterior moments of the two generating
variables (corresponding to Gaussian approximations minimizing relative
entropy). It is shown how this can be used to build a heuristic approximation
to the maximum relationship over a finite set of Gaussian variables, allowing
approximate inference by Expectation Propagation on such quantities.
Comment: 11 pages, 7 figures
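The basic building block can be illustrated with the classical closed-form first two moments of the maximum of two jointly Gaussian variables, together with a sequential (Clark-style) fold that extends them to a finite set by repeatedly replacing the running maximum with a moment-matched Gaussian. This is a sketch of the forward moments only. It is not the report's derivation of the *posterior* moments used inside Expectation Propagation, and the function names are hypothetical.

```python
import math


def _pdf(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_of_two(m1, v1, m2, v2, cov12):
    """Exact mean and variance of max(X1, X2) for jointly Gaussian X1, X2.

    Also returns Phi(alpha), the weight needed to propagate covariances.
    """
    theta = math.sqrt(max(v1 + v2 - 2.0 * cov12, 1e-300))  # sd of X1 - X2
    alpha = (m1 - m2) / theta
    p = _cdf(alpha)
    mean = m1 * p + m2 * (1.0 - p) + theta * _pdf(alpha)
    second = (m1**2 + v1) * p + (m2**2 + v2) * (1.0 - p) + (m1 + m2) * theta * _pdf(alpha)
    return mean, second - mean**2, p


def approx_max(means, cov):
    """Heuristic Gaussian approximation to max over a finite set of Gaussians.

    Folds the variables in one at a time; after each step the running maximum
    is treated as Gaussian with matched mean, variance and covariances.
    """
    m, v = means[0], cov[0][0]
    c = list(cov[0])  # covariance of the running maximum with every variable
    for k in range(1, len(means)):
        m_new, v_new, p = max_of_two(m, v, means[k], cov[k][k], c[k])
        # Cov(max(Y, X_k), X_j) = Phi(alpha) Cov(Y, X_j) + Phi(-alpha) Cov(X_k, X_j)
        c = [p * c[j] + (1.0 - p) * cov[k][j] for j in range(len(means))]
        m, v = m_new, v_new
    return m, v
```

For two independent standard normals, `max_of_two(0, 1, 0, 1, 0)` gives mean $1/\sqrt{\pi}$ and variance $1 - 1/\pi$. The sequential fold is order-dependent and only approximate beyond two variables, because the running maximum is not actually Gaussian.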
Social Facts Explained and Presupposed
Attempts are often made to explain collective action in terms of the interaction of individuals. A common objection to such attempts is that they are circular: since every interaction presupposes the existence of common practices, and common practices involve collective action, no analysis of collective agency in terms of interaction can reduce collectivity away. In this essay I will argue that this does not constitute a real circularity. It is true that common practices are presupposed in every attempt to explain collective action. However, this does not mean that every analysis of collective action presupposes an understanding of collective action. Common practices do not involve or presuppose particular collective actions. They are more fundamental than individual or collective agency. The subject of a common practice is not an "us" or "them", but the impersonal "one": "One does this and that". What "one does" is not yet a joint activity. It is not a particular action at all.
Breakdown points for maximum likelihood estimators of location-scale mixtures
ML-estimation based on mixtures of Normal distributions is a widely used tool
for cluster analysis. However, a single outlier can make the parameter
estimation of at least one of the mixture components break down. Among others,
the estimation of mixtures of t-distributions by McLachlan and
Peel [Finite Mixture Models (2000) Wiley, New York] and the addition of a
further mixture component accounting for "noise" by Fraley and Raftery
[The Computer J. 41 (1998) 578-588] were suggested as more robust
alternatives.
In this paper, the definition of an adequate robustness measure for cluster
analysis is discussed and bounds for the breakdown points of the mentioned
methods are given. It turns out that the two alternatives, while adding
stability in the presence of outliers of moderate size, do not possess a
substantially better breakdown behavior than estimation based on Normal
mixtures. If the number of clusters s is treated as fixed, r additional points
suffice for all three methods to let the parameters of r clusters explode. Only
in the case of r=s is this not possible for t-mixtures. The ability to estimate
the number of mixture components, for example, by use of the Bayesian
information criterion of Schwarz [Ann. Statist. 6 (1978)
461-464], and to isolate gross outliers as clusters of one point, is crucial
for an improved breakdown behavior of all three techniques. Furthermore, a
mixture of Normals with an improper uniform distribution is proposed to achieve
more robustness in the case of a fixed number of components.
Comment: Published by the Institute of Mathematical Statistics
(http://www.imstat.org) in the Annals of Statistics
(http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000057
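The breakdown effect and the benefit of a noise component can be seen in a small one-dimensional experiment. The sketch below runs EM for a Normal mixture with a fixed number of components, optionally adding a component of constant density in the spirit of the noise component mentioned above. Setting that constant to 1 / (data range) is an illustrative assumption, not the paper's improper-uniform proposal, and the function name, data and starting values are hypothetical.

```python
import math


def _log_normal(x, mean, sd):
    # Log density of N(mean, sd^2) evaluated at x.
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def _logsumexp(values):
    # Numerically stable log(sum(exp(values))).
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def em_normal_mixture(data, means, sds, use_noise=False, noise_weight=0.1,
                      n_iter=50, min_sd=1e-3):
    """EM for a 1-D Normal mixture with a fixed number of components.

    If use_noise is True, an extra component with constant density
    1 / (max(data) - min(data)) is added (assumed choice of constant).
    Returns (means, sds, weights, responsibilities).
    """
    k = len(means)
    means, sds = list(means), list(sds)
    log_noise = -math.log(max(data) - min(data)) if use_noise else None
    weights = [(1 - noise_weight) / k] * k + [noise_weight] if use_noise else [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step in log space, so a far-away outlier does not underflow to 0/0.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + _log_normal(x, means[j], sds[j]) for j in range(k)]
            if use_noise:
                logs.append(math.log(weights[k]) + log_noise)
            total = _logsumexp(logs)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: mixing proportions, weighted means and standard deviations.
        weights = [sum(r[j] for r in resp) / len(data) for j in range(len(weights))]
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return means, sds, weights, resp


clean = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
data = clean + [1000.0]  # one gross outlier
plain = em_normal_mixture(data, [0.0, 10.0], [1.0, 1.0])
noisy = em_normal_mixture(data, [0.0, 10.0], [1.0, 1.0], use_noise=True)
```

With two Normal components only, the outlier is absorbed by the second component, whose mean is dragged far from 10 and whose standard deviation blows up. With the constant-density component, the outlier is assigned to the noise component and both means stay near 0 and 10. The abstract notes that such noise components add stability against moderate outliers without substantially improving the breakdown point, so this example shows only the moderate-outlier behaviour.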