
    What are the true clusters?

    Constructivist philosophy and Hasok Chang's active scientific realism are used to argue that the idea of "truth" in cluster analysis depends on the context and the clustering aims. Different characteristics of clusterings are required in different situations. Researchers should be explicit about the requirements and the idea of "true clusters" on which their research is based, because clustering becomes scientific not through uniqueness but through transparent and open communication. The idea of "natural kinds" is a human construct, but it highlights the human experience that the reality outside the observer's control seems to make certain distinctions between categories inevitable. Various desirable characteristics of clusterings and various approaches to define a context-dependent truth are listed, and I discuss what impact these ideas can have on the comparison of clustering methods, the choice of a clustering method, and related decisions in practice.

    Cluster validation by measurement of clustering characteristics relevant to the user

    There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; "relative cluster validation" is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method run with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. Various characteristics of a clustering can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation. In this paper, a number of validation criteria will be introduced that refer to different desirable characteristics of a clustering, and that characterise a clustering in a multidimensional way. In specific applications the user may be interested in some of these criteria rather than others. A focus of the paper is on methodology to standardise the different characteristics so that users can aggregate them in a suitable way, specifying weights for the various criteria that are relevant in the clustering application at hand. Comment: 20 pages, 2 figures
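
To make the idea of aggregating several clustering characteristics concrete, here is a minimal sketch in Python. It is not the paper's methodology: the two criteria (average within-cluster distance, minimum between-cluster separation), the min-max rescaling across candidate clusterings, and all function names and weights are assumptions chosen for illustration only.

```python
from itertools import combinations
from math import dist


def avg_within(points, labels):
    """Mean distance over all pairs of points that share a cluster."""
    d = [dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] == labels[j]]
    return sum(d) / len(d) if d else 0.0


def min_separation(points, labels):
    """Smallest distance between two points in different clusters."""
    d = [dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] != labels[j]]
    return min(d) if d else 0.0


def rescale(values, larger_is_better=True):
    """Min-max rescale to [0, 1] so that 1 is always the best value.

    Assumption: this simple rescaling stands in for whatever
    standardisation a real analysis would use.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if larger_is_better else [1.0 - s for s in scaled]


def aggregate(points, candidates, w_within=0.5, w_sep=0.5):
    """Weighted sum of the rescaled criteria, one score per clustering."""
    within = rescale([avg_within(points, c) for c in candidates],
                     larger_is_better=False)
    sep = rescale([min_separation(points, c) for c in candidates])
    return [w_within * a + w_sep * b for a, b in zip(within, sep)]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
candidates = [
    [0, 0, 1, 1],  # keeps the two nearby pairs together
    [0, 1, 0, 1],  # pairs each point with a distant one
]
scores = aggregate(points, candidates)
```

Changing `w_within` and `w_sep` is where a user would express which characteristic matters more for the application at hand.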

    Nonparametric Bayes dynamic modeling of relational data

    Symmetric binary matrices representing relations among entities are commonly collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest being in inference on the relationship structure and prediction. We propose a nonparametric Bayesian dynamic model, which reduces dimensionality in characterizing the binary matrix through a lower-dimensional latent space representation, with the latent coordinates evolving in continuous time via Gaussian processes. By using a logistic mapping function from the probability matrix space to the latent relational space, we obtain a flexible and computationally tractable formulation. Employing Pólya-Gamma data augmentation, an efficient Gibbs sampler is developed for posterior computation, with the dimension of the latent space automatically inferred. We provide some theoretical results on flexibility of the model, and illustrate performance via simulation experiments. We also consider an application to co-movements in world financial markets.
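
As a rough illustration of a logistic link between latent coordinates and relation probabilities, the sketch below maps each pair of entities to a probability at a single time point. The abstract does not give the model's exact form, so the inner-product similarity, the `baseline` term, and the function names are assumptions; the Gaussian process dynamics and the Gibbs sampler are not shown.

```python
import math


def edge_probability(x_i, x_j, baseline=0.0):
    """Logistic transform of baseline + inner product of two latent vectors.

    Assumption: an inner-product similarity; the abstract only states
    that a logistic mapping is used.
    """
    s = baseline + sum(a * b for a, b in zip(x_i, x_j))
    return 1.0 / (1.0 + math.exp(-s))


def probability_matrix(coords, baseline=0.0):
    """Symmetric matrix of pairwise probabilities; the diagonal is set to 0."""
    n = len(coords)
    return [[edge_probability(coords[i], coords[j], baseline) if i != j else 0.0
             for j in range(n)]
            for i in range(n)]


# Three entities with hypothetical 2-dimensional latent coordinates.
coords = [(1.0, 0.0), (1.0, 0.5), (-1.0, 0.0)]
probs = probability_matrix(coords)
```

Entities whose latent vectors point in similar directions get a probability above 0.5, and those pointing in opposite directions get one below 0.5.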

    Rogue seasonality detection in supply chains

    Rogue seasonality, or unintended cyclic variability in order and other supply chain variables, is an endogenous disturbance generated by a company’s internal processes such as inventory and production control systems. The ability to automatically detect, diagnose and discriminate rogue seasonality from exogenous disturbances is of prime importance to decision makers. This paper compares the effectiveness of alternative time series techniques based on Fourier and discrete wavelet transforms, autocorrelation and cross-correlation functions, and autoregressive models in detecting rogue seasonality. Rogue seasonalities of various intensities were generated using different simulation designs and demand patterns to evaluate each of these techniques. An index for rogue seasonality, based on the clustering profile of the supply chain variables, was defined and used in the evaluation. The Fourier transform technique was found to be the most effective for rogue seasonality detection, a finding subsequently validated using data from a steel supply network.
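
The following sketch shows the general idea behind Fourier-based detection of a cyclic component in a single series: compute the amplitude spectrum and read off the dominant period. It is not the paper's rogue seasonality index, which is based on clustering profiles; the synthetic `orders` series and the function names are assumptions made for illustration.

```python
import cmath
import math


def amplitude_spectrum(series):
    """Return (frequency index k, amplitude) pairs of the mean-centred DFT."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    amps = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        amps.append((k, abs(coeff) / n))
    return amps


def dominant_period(series):
    """Period (in samples) of the frequency with the largest amplitude."""
    n = len(series)
    k, _ = max(amplitude_spectrum(series), key=lambda pair: pair[1])
    return n / k


# Hypothetical order series: a constant level plus a 12-period cycle.
orders = [100 + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
period = dominant_period(orders)
```

A pronounced peak in an order series that has no counterpart in the demand series would be one hint that the cycle is generated internally rather than by customers.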

    Timescale effect estimation in time-series studies of air pollution and health: A Singular Spectrum Analysis approach

    A wealth of epidemiological data suggests an association between mortality/morbidity from pulmonary and cardiovascular adverse events and air pollution, but uncertainty remains about the extent of the effects implied by those associations, despite the abundance of data. In this paper we describe an approach based on Singular Spectrum Analysis (SSA) to decompose the time series of particulate matter concentration into a set of exposure variables, each one representing a different timescale. We apply our methodology to investigate both acute and long-term effects of PM10 exposure on morbidity from respiratory causes within the urban area of Bari, Italy. Comment: Published at http://dx.doi.org/10.1214/07-EJS123 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)