
    Cluster validation by measurement of clustering characteristics relevant to the user

    There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; "relative cluster validation" is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method run with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. Various characteristics of a clustering can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation. In this paper, a number of validation criteria will be introduced that refer to different desirable characteristics of a clustering, and that characterise a clustering in a multidimensional way. In specific applications the user may be interested in some of these criteria rather than others. A focus of the paper is on methodology to standardise the different characteristics so that users can aggregate them in a suitable way, specifying weights for the various criteria that are relevant in the clustering application at hand. Comment: 20 pages, 2 figures
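
    The weighted aggregation described in this abstract can be sketched roughly as follows. This is a minimal illustration only: the abstract does not say how the paper standardises the criteria, so the column-wise z-scoring here is an assumption, and the function name `aggregate_indexes` and the example numbers are hypothetical.

```python
import numpy as np


def aggregate_indexes(values, weights):
    """Combine several validation criteria into one score per clustering.

    values  : array of shape (n_clusterings, n_criteria), larger = better.
    weights : non-negative user weights, one per criterion.

    Assumption: each criterion is standardised by z-scoring across the
    candidate clusterings. The paper's own standardisation may differ.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.ndim != 2 or weights.shape != (values.shape[1],):
        raise ValueError("need a 2-D value matrix and one weight per criterion")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")

    mean = values.mean(axis=0)
    std = values.std(axis=0)
    std[std == 0] = 1.0  # a constant criterion then contributes zero everywhere
    z = (values - mean) / std

    # Weighted average of the standardised criteria, one score per clustering.
    return z @ (weights / weights.sum())


# Hypothetical values: three candidate clusterings, two criteria
# (say, within-cluster homogeneity and between-cluster separation).
candidates = [[0.2, 0.9], [0.5, 0.5], [0.8, 0.1]]
scores = aggregate_indexes(candidates, weights=[3.0, 1.0])
best = int(np.argmax(scores))
```

    Changing the weights changes which clustering is selected, which reflects the abstract's point that the relevant criteria depend on the application.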

    Periodic travelling wave solutions of discrete nonlinear Schrödinger equations

    The existence of nonzero periodic travelling wave solutions for a general discrete nonlinear Schrödinger equation (DNLS) on finite one-dimensional lattices is proved. The DNLS features a general nonlinear term and variable range of interactions going beyond the usual nearest-neighbour interaction. The problem of the existence of travelling wave solutions is converted into a fixed point problem for an operator on some appropriate function space, which is solved by means of Schauder's Fixed Point Theorem.

    Expectation Propagation on the Maximum of Correlated Normal Variables

    Many inference problems involving questions of optimality ask for the maximum or the minimum of a finite set of unknown quantities. This technical report derives the first two posterior moments of the maximum of two correlated Gaussian variables and the first two posterior moments of the two generating variables (corresponding to Gaussian approximations minimizing relative entropy). It is shown how this can be used to build a heuristic approximation to the maximum relationship over a finite set of Gaussian variables, allowing approximate inference by Expectation Propagation on such quantities. Comment: 11 pages, 7 figures
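
    As a rough illustration of the first ingredient named in this abstract, the mean and variance of the maximum of two correlated Gaussian variables have a well-known closed form (usually attributed to Clark, 1961). The sketch below implements that classical formula; it is not taken from the report, it does not cover the moments of the two generating variables, and the function name `max_moments` is hypothetical.

```python
import math


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_moments(m1, s1, m2, s2, rho):
    """Mean and variance of max(X1, X2) for jointly Gaussian X1, X2.

    X1 ~ N(m1, s1^2), X2 ~ N(m2, s2^2), correlation rho.
    """
    # Variance of the difference X1 - X2.
    theta2 = s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2
    if theta2 <= 1e-15:
        # Degenerate case: X1 - X2 is (numerically) constant, so the
        # maximum is simply the variable with the larger mean.
        return (m1, s1 * s1) if m1 >= m2 else (m2, s2 * s2)

    theta = math.sqrt(theta2)
    a = (m1 - m2) / theta

    mean = m1 * _cdf(a) + m2 * _cdf(-a) + theta * _pdf(a)
    second = ((m1 * m1 + s1 * s1) * _cdf(a)
              + (m2 * m2 + s2 * s2) * _cdf(-a)
              + (m1 + m2) * theta * _pdf(a))
    return mean, second - mean * mean


# Two independent standard normal variables.
mean, var = max_moments(0.0, 1.0, 0.0, 1.0, 0.0)
```

    A Gaussian with this mean and variance is the moment-matched approximation to the distribution of the maximum, which is the kind of approximation Expectation Propagation works with.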

    Social Facts Explained and Presupposed

    Attempts are often made to explain collective action in terms of the interaction of individuals. A common objection to such attempts is that they are circular: since every interaction presupposes the existence of common practices, and common practices involve collective action, no analysis of collective agency in terms of interaction can reduce collectivity away. In this essay I will argue that this does not constitute a real circularity. It is true that common practices are presupposed in every attempt to explain collective action. However, this does not mean that every analysis of collective action presupposes an understanding of collective action. Common practices do not involve or presuppose particular collective actions. They are more fundamental than individual or collective agency. The subject of a common practice is not an "us" or a "them", but the impersonal "one": "One does this and that". What "one does" is not yet a joint activity. It is not a particular action at all.

    Breakdown points for maximum likelihood estimators of location-scale mixtures

    ML-estimation based on mixtures of Normal distributions is a widely used tool for cluster analysis. However, a single outlier can make the parameter estimation of at least one of the mixture components break down. Among others, the estimation of mixtures of t-distributions by McLachlan and Peel [Finite Mixture Models (2000) Wiley, New York] and the addition of a further mixture component accounting for "noise" by Fraley and Raftery [The Computer J. 41 (1998) 578-588] were suggested as more robust alternatives. In this paper, the definition of an adequate robustness measure for cluster analysis is discussed and bounds for the breakdown points of the mentioned methods are given. It turns out that the two alternatives, while adding stability in the presence of outliers of moderate size, do not possess a substantially better breakdown behavior than estimation based on Normal mixtures. If the number of clusters s is treated as fixed, r additional points suffice for all three methods to let the parameters of r clusters explode. Only in the case of r=s is this not possible for t-mixtures. The ability to estimate the number of mixture components, for example, by use of the Bayesian information criterion of Schwarz [Ann. Statist. 6 (1978) 461-464], and to isolate gross outliers as clusters of one point, is crucial for an improved breakdown behavior of all three techniques. Furthermore, a mixture of Normals with an improper uniform distribution is proposed to achieve more robustness in the case of a fixed number of components. Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000057