What are the true clusters?
Constructivist philosophy and Hasok Chang's active scientific realism are
used to argue that the idea of "truth" in cluster analysis depends on the
context and the clustering aims. Different characteristics of clusterings are
required in different situations. Researchers should be explicit about the
requirements and the idea of "true clusters" on which their research is based, because
clustering becomes scientific not through uniqueness but through transparent
and open communication. The idea of "natural kinds" is a human construct, but
it highlights the human experience that the reality outside the observer's
control seems to make certain distinctions between categories inevitable.
Various desirable characteristics of clusterings and various approaches to
define a context-dependent truth are listed, and I discuss what impact these
ideas can have on the comparison of clustering methods, the choice of a
clustering method, and related decisions in practice.
Cluster validation by measurement of clustering characteristics relevant to the user
There are many cluster analysis methods that can produce quite different
clusterings on the same dataset. Cluster validation is about the evaluation of
the quality of a clustering; "relative cluster validation" is about using such
criteria to compare clusterings. This can be used to select one of a set of
clusterings from different methods, or from the same method run with different
parameters such as different numbers of clusters.
There are many cluster validation indexes in the literature. Most of them
attempt to measure the overall quality of a clustering by a single number, but
this can be inappropriate. There are various different characteristics of a
clustering that can be relevant in practice, depending on the aim of
clustering, such as low within-cluster distances and high between-cluster
separation.
In this paper, a number of validation criteria will be introduced that refer
to different desirable characteristics of a clustering, and that characterise a
clustering in a multidimensional way. In specific applications the user may be
interested in some of these criteria rather than others. A focus of the paper
is on methodology to standardise the different characteristics so that users
can aggregate them in a suitable way, specifying weights for the various
criteria that are relevant in the clustering application at hand.
Comment: 20 pages, 2 figures
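The aggregation idea in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the two criteria (average within-cluster distance and minimum between-cluster separation), the min-max standardisation across candidate clusterings, and all function names are assumptions chosen for illustration only.

```python
from math import dist


def average_within_distance(points, labels):
    """Mean distance over all pairs of points that share a cluster label."""
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                total += dist(points[i], points[j])
                count += 1
    return total / count if count else 0.0


def minimum_separation(points, labels):
    """Smallest distance between two points in different clusters."""
    best = float("inf")
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j]:
                best = min(best, dist(points[i], points[j]))
    return best


def standardise(values):
    """Min-max rescaling to [0, 1] across candidates (an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_clusterings(points, candidates, weights):
    """Weighted sum of standardised criteria for each candidate clustering."""
    names = list(candidates)
    criteria = {
        # Negated so that larger is better for both criteria.
        "compactness": standardise(
            [-average_within_distance(points, candidates[n]) for n in names]
        ),
        "separation": standardise(
            [minimum_separation(points, candidates[n]) for n in names]
        ),
    }
    return {
        name: sum(weights[c] * criteria[c][k] for c in criteria)
        for k, name in enumerate(names)
    }


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
candidates = {"good": [0, 0, 1, 1], "bad": [0, 1, 0, 1]}
scores = score_clusterings(
    points, candidates, {"compactness": 0.5, "separation": 0.5}
)
```

Changing the weights shifts the ranking towards whichever characteristic matters for the application, which is the user-driven choice the abstract describes.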
Nonparametric Bayes dynamic modeling of relational data
Symmetric binary matrices representing relations among entities are commonly
collected in many areas. Our focus is on dynamically evolving binary relational
matrices, with interest being in inference on the relationship structure and
prediction. We propose a nonparametric Bayesian dynamic model, which reduces
dimensionality in characterizing the binary matrix through a lower-dimensional
latent space representation, with the latent coordinates evolving in continuous
time via Gaussian processes. By using a logistic mapping function from the
probability matrix space to the latent relational space, we obtain a flexible
and computationally tractable formulation. Employing Pólya-Gamma data
augmentation, an efficient Gibbs sampler is developed for posterior
computation, with the dimension of the latent space automatically inferred. We
provide some theoretical results on flexibility of the model, and illustrate
performance via simulation experiments. We also consider an application to
co-movements in world financial markets.
Rogue seasonality detection in supply chains
Rogue seasonality, or unintended cyclic variability in orders and other supply chain variables, is an endogenous disturbance generated by a company’s internal processes such as inventory and production control systems. The ability to automatically detect, diagnose and discriminate rogue seasonality from exogenous disturbances is of prime importance to decision makers. This paper compares the effectiveness of alternative time series techniques based on Fourier and discrete wavelet transforms, autocorrelation and cross-correlation functions, and autoregressive models in detecting rogue seasonality. Rogue seasonalities of various intensities were generated using different simulation designs and demand patterns to evaluate each of these techniques. An index for rogue seasonality, based on the clustering profile of the supply chain variables, was defined and used in the evaluation. The Fourier transform technique was found to be the most effective for rogue seasonality detection, a finding that was subsequently validated using data from a steel supply network.
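As a rough illustration of Fourier-based detection of cyclic variability, the sketch below computes a periodogram with a direct discrete Fourier transform and reports the dominant cycle length. The function names and the "share of spectral power" measure are hypothetical; in particular, that share is not the clustering-based rogue seasonality index defined in the paper.

```python
import cmath
import math


def periodogram(series):
    """Power at each positive Fourier frequency k = 1 .. n // 2."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]  # remove the level before transforming
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power.append((k, abs(coeff) ** 2 / n))
    return power


def dominant_cycle(series):
    """Return (period, share of total power) for the strongest frequency."""
    power = periodogram(series)
    total = sum(p for _, p in power)
    if total == 0:
        return None, 0.0  # a constant series has no cyclic component
    k, peak = max(power, key=lambda kp: kp[1])
    return len(series) / k, peak / total


# Hypothetical order series with a 12-period cycle around a level of 10.
orders = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(48)]
period, share = dominant_cycle(orders)
```

A share close to 1 means a single cycle dominates the variable. Deciding whether that cycle is endogenous (rogue) or driven by demand still requires comparing variables across the supply chain, as the abstract indicates.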
Timescale effect estimation in time-series studies of air pollution and health: A Singular Spectrum Analysis approach
A wealth of epidemiological data suggests an association between
mortality/morbidity from pulmonary and cardiovascular adverse events and air
pollution, but uncertainty remains as to the extent implied by those
associations despite the abundance of the data. In this paper we describe an
SSA (Singular Spectrum Analysis) based approach in order to decompose the
time-series of particulate matter concentration into a set of exposure
variables, each one representing a different timescale. We implement our
methodology to investigate both acute and long-term effects of
exposure on morbidity from respiratory causes within the urban area of Bari,
Italy.
Comment: Published at http://dx.doi.org/10.1214/07-EJS123 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org)