2,927 research outputs found

What are the true clusters?

Author: Hennig Christian
Publication date: 01/01/2015

Constructivist philosophy and Hasok Chang's active scientific realism are used to argue that the idea of "truth" in cluster analysis depends on the context and the clustering aims. Different characteristics of clusterings are required in different situations. Researchers should be explicit about the requirements and the idea of "true clusters" on which their research is based, because clustering becomes scientific not through uniqueness but through transparent and open communication. The idea of "natural kinds" is a human construct, but it highlights the human experience that the reality outside the observer's control seems to make certain distinctions between categories inevitable. Various desirable characteristics of clusterings and various approaches to defining a context-dependent truth are listed, and I discuss what impact these ideas can have on the comparison of clustering methods, the choice of a clustering method, and related decisions in practice.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Cluster validation by measurement of clustering characteristics relevant to the user

Author: Bowcock
Calinski
Coretto
Fang
Franck
Halkidi
Hausdorf
Hennig
Hubert
Katsnelson
Kaufman
Lago-Fernandez
Stigler
Tibshirani
Publication date: 01/01/2019

There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; "relative cluster validation" is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method run with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. Various characteristics of a clustering can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation. In this paper, a number of validation criteria will be introduced that refer to different desirable characteristics of a clustering, and that characterise a clustering in a multidimensional way. In specific applications the user may be interested in some of these criteria rather than others. A focus of the paper is on methodology to standardise the different characteristics so that users can aggregate them in a suitable way, specifying weights for the various criteria that are relevant in the clustering application at hand. Comment: 20 pages, 2 figures

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
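
The abstract above names two characteristics, low within-cluster distances and high between-cluster separation, and describes aggregating standardised criteria with user-chosen weights. A minimal Python sketch of that idea follows. The function names, the toy data, and the min-max rescaling across candidate clusterings are assumptions made for illustration; the abstract does not say how the paper standardises its criteria.

```python
from itertools import combinations
from math import dist


def avg_within(points, labels):
    """Mean distance over all pairs of points that share a cluster label.

    Assumes at least one cluster contains two or more points.
    """
    d = [dist(p, q)
         for (p, a), (q, b) in combinations(zip(points, labels), 2)
         if a == b]
    return sum(d) / len(d)


def min_separation(points, labels):
    """Smallest distance between two points from different clusters."""
    return min(dist(p, q)
               for (p, a), (q, b) in combinations(zip(points, labels), 2)
               if a != b)


def aggregate(points, candidates, w_within=0.5, w_sep=0.5):
    """Score each candidate clustering as a weighted sum of rescaled criteria."""
    # Negate the within-cluster distance so that larger is better for both criteria.
    raw = {name: (-avg_within(points, lab), min_separation(points, lab))
           for name, lab in candidates.items()}
    scores = {name: 0.0 for name in raw}
    for idx, weight in enumerate((w_within, w_sep)):
        values = [v[idx] for v in raw.values()]
        lo, hi = min(values), max(values)
        for name, v in raw.items():
            # Hypothetical standardisation: min-max rescaling to [0, 1] across
            # the candidates; a criterion that ties everywhere contributes 0.
            scaled = (v[idx] - lo) / (hi - lo) if hi > lo else 0.0
            scores[name] += weight * scaled
    return scores


# Two well-separated groups of three points each (toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {
    "compact": [0, 0, 0, 1, 1, 1],  # follows the two groups
    "mixed": [0, 1, 0, 1, 0, 1],    # splits both groups
}
scores = aggregate(points, candidates)
```

Changing `w_within` and `w_sep` shifts the ranking toward whichever characteristic matters for the application, which is the kind of user-driven aggregation the abstract describes.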

Nonparametric Bayes dynamic modeling of relational data

Author: Dunson David B.
Durante Daniele
Publication venue: 'Oxford University Press (OUP)'
Publication date: 19/11/2013

Symmetric binary matrices representing relations among entities are commonly collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest being in inference on the relationship structure and prediction. We propose a nonparametric Bayesian dynamic model, which reduces dimensionality in characterizing the binary matrix through a lower-dimensional latent space representation, with the latent coordinates evolving in continuous time via Gaussian processes. By using a logistic mapping function from the probability matrix space to the latent relational space, we obtain a flexible and computationally tractable formulation. Employing Pólya-Gamma data augmentation, an efficient Gibbs sampler is developed for posterior computation, with the dimension of the latent space automatically inferred. We provide some theoretical results on flexibility of the model, and illustrate performance via simulation experiments. We also consider an application to co-movements in world financial markets.

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Bocconi
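
The logistic link between latent coordinates and edge probabilities mentioned in the abstract above can be sketched for a single time point. The specific form used here, a baseline term plus the inner product of two entities' latent coordinates, is an assumption for illustration, as are the function and parameter names. The sketch does not cover the Gaussian process dynamics or the Pólya-Gamma Gibbs sampler.

```python
from math import exp


def edge_probabilities(coords, baseline=0.0):
    """Map latent coordinates to a symmetric matrix of edge probabilities.

    Assumed form: p_ij = 1 / (1 + exp(-(baseline + <x_i, x_j>))).
    The diagonal is left at 0.0 because self-relations are not modelled here.
    """
    n = len(coords)
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = baseline + sum(a * b for a, b in zip(coords[i], coords[j]))
            p = 1.0 / (1.0 + exp(-score))
            probs[i][j] = probs[j][i] = p  # relations are symmetric
    return probs


# Entities 0 and 1 sit close together in latent space; entity 2 points the other way.
coords = [(1.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]
probs = edge_probabilities(coords)
```

Entities whose latent coordinates point in similar directions get an edge probability above one half, and entities pointing in opposite directions get one below it.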

Rogue seasonality detection in supply chains

Author: Naim M.
Shukla V.
Thornhill N.
Publication venue: Elsevier
Publication date: 01/01/2012

Rogue seasonality, or unintended cyclic variability in order and other supply chain variables, is an endogenous disturbance generated by a company’s internal processes such as inventory and production control systems. The ability to automatically detect, diagnose and discriminate rogue seasonality from exogenous disturbances is of prime importance to decision makers. This paper compares the effectiveness of alternative time series techniques based on Fourier and discrete wavelet transforms, autocorrelation and cross-correlation functions, and autoregressive models in detecting rogue seasonality. Rogue seasonalities of various intensities were generated using different simulation designs and demand patterns to evaluate each of these techniques. An index for rogue seasonality, based on the clustering profile of the supply chain variables, was defined and used in the evaluation. The Fourier transform technique was found to be the most effective for rogue seasonality detection, which was also subsequently validated using data from a steel supply network.
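
As a rough illustration of why a Fourier transform exposes cyclic variability, the sketch below finds the strongest periodic component in a synthetic order series with a plain discrete Fourier transform. The function name and the test series are hypothetical, and this is not the clustering-based rogue seasonality index defined in the paper.

```python
import cmath
import math


def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant Fourier component.

    Returns None when the series has no variation around its mean.
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]  # remove the constant level
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        # Discrete Fourier coefficient at k cycles per n samples.
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(centred))
        power = abs(coef) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return None if best_k is None else n / best_k


# Synthetic orders: a level of 100 with a cycle that repeats every 12 samples.
orders = [100 + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
period = dominant_period(orders)
```

A sharp peak at one frequency in several supply chain variables at once is the kind of signature a detection method could then look for.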

Timescale effect estimation in time-series studies of air pollution and health: A Singular Spectrum Analysis approach

Author: Bilancia Massimo
Stea Girolamo
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2008

A wealth of epidemiological data suggests an association between mortality/morbidity from pulmonary and cardiovascular adverse events and air pollution, but uncertainty remains as to the extent implied by those associations despite the abundance of the data. In this paper we describe an SSA (Singular Spectrum Analysis) based approach to decompose the time series of particulate matter concentration into a set of exposure variables, each one representing a different timescale. We apply our methodology to investigate both acute and long-term effects of PM_{10} exposure on morbidity from respiratory causes within the urban area of Bari, Italy. Comment: Published at http://dx.doi.org/10.1214/07-EJS123 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive
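
For orientation, the standard steps of basic Singular Spectrum Analysis are written out below. The notation (window length $L$, index groups $I_1, \dots, I_m$) is assumed here for illustration. The abstract does not state which window length or grouping the authors use.

```latex
% Embedding: series x_1, ..., x_N, window length L, K = N - L + 1
X \in \mathbb{R}^{L \times K}, \qquad X_{ij} = x_{i+j-1}

% Singular value decomposition of the trajectory matrix
X = \sum_{k=1}^{d} \sqrt{\lambda_k}\, u_k v_k^{\top}

% Grouping: one matrix per index group I (for example, one per timescale)
X_{I} = \sum_{k \in I} \sqrt{\lambda_k}\, u_k v_k^{\top}

% Diagonal averaging back to a series, with
% A_n = \{(i,j) : i + j - 1 = n,\ 1 \le i \le L,\ 1 \le j \le K\}
\tilde{x}^{(I)}_n = \frac{1}{|A_n|} \sum_{(i,j) \in A_n} (X_I)_{ij}

% When the groups I_1, ..., I_m partition \{1, ..., d\}, the components sum to the series
x_n = \sum_{r=1}^{m} \tilde{x}^{(I_r)}_n
```

Each reconstructed component $\tilde{x}^{(I_r)}$ can then serve as one exposure variable tied to a particular timescale.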

Mapping of horizontal refrigerant two-phase flow patterns based on clustering of capacitive sensor signals

Author: Bauwens Bruno
Canière Hugo
De Paepe Michel
T'Joen Christophe
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010

Archivsystem Ask23