Asymptotics of hierarchical clustering for growing dimension
Modern-day science presents many challenges to data analysts. Advances in data collection provide very large data sets, in terms of both the number of observations and the number of dimensions. In many areas of data analysis, an informative task is to find natural separations of data into homogeneous groups, i.e. clusters. In this paper we study the asymptotic behavior of hierarchical clustering in situations where both the sample size and the dimension grow to infinity. We derive explicit signal vs. noise boundaries between different types of clustering behavior. We also show that the clustering behavior within the boundaries is the same across a wide spectrum of asymptotic settings.
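As a hedged illustration of why dimension matters, the following sketch simulates two Gaussian groups and runs naive single-linkage clustering on them. It assumes unit-variance noise and a mean shift of `shift` in every coordinate; in that setting within-group distances concentrate near sqrt(2d) and between-group distances near sqrt(d(2 + shift^2)), so the groups pull apart as d grows while the spread of the distances stays roughly constant. The helper names (`simulate_two_clusters`, `single_linkage`, `recovers_truth`) and all parameter values are hypothetical, and the paper's actual signal vs. noise boundaries are not reproduced here.

```python
import math
import random


def simulate_two_clusters(n_per_cluster, dim, shift, seed=0):
    """Two groups of standard Gaussian points in `dim` dimensions.

    The second group's mean is shifted by `shift` in every coordinate, so the
    squared distance between the two means grows like dim * shift**2.
    """
    rng = random.Random(seed)
    points, truth = [], []
    for label, offset in ((0, 0.0), (1, shift)):
        for _ in range(n_per_cluster):
            points.append([rng.gauss(offset, 1.0) for _ in range(dim)])
            truth.append(label)
    return points, truth


def single_linkage(points, k):
    """Naive agglomerative clustering (single linkage, Euclidean) down to k clusters."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        # Merge the pair of clusters whose closest members are nearest.
        _, a, b = min(
            (min(dist[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def recovers_truth(labels, truth):
    """True if the two partitions coincide up to relabelling."""
    pairs = set(zip(labels, truth))
    return len(pairs) == len(set(truth)) == len(set(labels))


for dim in (2, 50, 1000):
    points, truth = simulate_two_clusters(n_per_cluster=5, dim=dim, shift=1.0)
    print(dim, recovers_truth(single_linkage(points, k=2), truth))
```

With a fixed per-coordinate shift, recovery is unreliable at d = 2 and becomes essentially certain at large d; the exact printed flags depend on the seed.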
Clustering Financial Time Series: How Long is Enough?
Researchers have used anywhere from 30 days to several years of daily returns as
source data for clustering financial time series based on their correlations.
This paper sets up a statistical framework to study the validity of such
practices. We first show that clustering correlated random variables from their
observed values is statistically consistent. We then give a first
empirical answer to the much-debated question: how long should the time series
be? If too short, the clusters found can be spurious; if too long, dynamics can
be smoothed out.
Comment: Accepted at IJCAI 201
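A minimal sketch of correlation-based clustering, assuming a hypothetical two-factor model for daily returns, the distance sqrt(2(1 - rho)) (a common choice in the financial clustering literature, not necessarily the paper's exact setup) and single linkage. Varying `window` illustrates only the "too short" side of the trade-off: the simulated returns are stationary, so a long window cannot smooth out any dynamics here. All function names and parameters are illustrative.

```python
import math
import random


def simulate_returns(n_per_group, n_days, loading=1.0, seed=0):
    """Hypothetical two-factor model: return = loading * group_factor + noise."""
    rng = random.Random(seed)
    factors = [[rng.gauss(0.0, 1.0) for _ in range(n_days)] for _ in range(2)]
    series, truth = [], []
    for group, factor in enumerate(factors):
        for _ in range(n_per_group):
            series.append([loading * f + rng.gauss(0.0, 1.0) for f in factor])
            truth.append(group)
    return series, truth


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Map correlation rho to the distance sqrt(2 * (1 - rho))."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - pearson(x, y))))


def cluster_window(series, window, k=2):
    """Single-linkage clustering of the series using only their first `window` days."""
    cut = [s[:window] for s in series]
    n = len(cut)
    dist = [[correlation_distance(cut[i], cut[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        # Merge the two clusters whose closest members are nearest.
        _, a, b = min(
            (min(dist[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def recovers_truth(labels, truth):
    """True if the two partitions coincide up to relabelling."""
    pairs = set(zip(labels, truth))
    return len(pairs) == len(set(truth)) == len(set(labels))


series, truth = simulate_returns(n_per_group=4, n_days=500)
for window in (10, 60, 500):
    print(window, recovers_truth(cluster_window(series, window), truth))
```

Short windows give noisy correlation estimates, so the recovered groups can be spurious; with this toy model the full 500-day window should recover the two factor groups.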
A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series
We present in this paper an empirical framework motivated by the practitioner's
point of view on stability. The goal is both to assess clustering validity and
to yield market insights: the data perturbations we propose provide a
multi-view of the assets' clustering behaviour. The perturbation framework is
illustrated on an extensive credit default swap time series database available
online at www.datagrapple.com.
Comment: Accepted at ICMLA 201
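One way to read the perturbation idea, sketched under our own assumptions: recluster bootstrap resamples of the trading days and score their agreement with the full-sample partition using the Rand index. Bootstrap resampling, the Rand index, the correlation distance, single linkage and every name in the code (`bootstrap_stability`, `cluster_labels`, `rand_index`) are hypothetical choices, not the perturbations the paper proposes, and no CDS data are used.

```python
import math
import random
from itertools import combinations


def rand_index(a, b):
    """Fraction of item pairs on which two partitions agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def correlation_distance(x, y):
    """sqrt(2 * (1 - rho)) with rho the sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    rho = sxy / math.sqrt(sxx * syy)
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def cluster_labels(series, k):
    """Single-linkage clustering on correlation distance, stopped at k clusters."""
    n = len(series)
    dist = [[correlation_distance(series[i], series[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        _, a, b = min(
            (min(dist[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def bootstrap_stability(series, k, n_boot=20, seed=0):
    """Mean Rand index between the full-sample partition and partitions computed
    on bootstrap resamples of the days (one hypothetical perturbation scheme)."""
    rng = random.Random(seed)
    reference = cluster_labels(series, k)
    n_days = len(series[0])
    scores = []
    for _ in range(n_boot):
        days = [rng.randrange(n_days) for _ in range(n_days)]
        resampled = [[s[t] for t in days] for s in series]
        scores.append(rand_index(reference, cluster_labels(resampled, k)))
    return sum(scores) / n_boot


# Toy data: six series driven by two hypothetical common factors.
rng = random.Random(1)
factors = [[rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(2)]
series = [[f + rng.gauss(0.0, 1.0) for f in factors[g]] for g in (0, 0, 0, 1, 1, 1)]
print(bootstrap_stability(series, k=2))
```

A score near 1 means the partition survives the perturbation; weaker factor structure or shorter series would be expected to lower it.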
Discrete scale invariance and complex dimensions
We discuss the concept of discrete scale invariance and how it leads to
complex critical exponents (or dimensions), i.e. to the log-periodic
corrections to scaling. After their initial suggestion as formal solutions of
renormalization group equations in the seventies, complex exponents have been
studied in the eighties in relation to various problems of physics embedded in
hierarchical systems. Only recently has it been realized that discrete scale
invariance and its associated complex exponents may appear "spontaneously" in
Euclidean systems, i.e. without the need for a pre-existing hierarchy. Examples
are diffusion-limited-aggregation clusters, rupture in heterogeneous systems,
earthquakes, animals (a generalization of percolation) among many other
systems. We review the known mechanisms for the spontaneous generation of
discrete scale invariance and provide an extensive list of situations where
complex exponents have been found. This is done in order to provide a basis for
a better fundamental understanding of discrete scale invariance. The main
motivation to study discrete scale invariance and its signatures is that it
provides new insights in the underlying mechanisms of scale invariance. It may
also be very interesting for prediction purposes.
Comment: significantly extended version (Oct. 27, 1998) with new examples in
several domains of the review paper with the same title published in Physics
Reports 297, 239-270 (1998)
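Discrete scale invariance means an observable satisfies O(lambda x) = mu O(x) for one fixed ratio lambda rather than for every ratio. Its general solution is O(x) = x^alpha P(ln x / ln lambda), with alpha = ln mu / ln lambda and P periodic with period 1; expanding P in a Fourier series yields the complex exponents alpha + 2 pi i n / ln lambda. The sketch keeps only the n = ±1 terms, i.e. a single cosine (log-periodic) correction, and checks the scaling relation numerically. The amplitude `c`, the `phase` and the function names are illustrative.

```python
import math


def log_periodic(x, alpha, lam, c=0.1, phase=0.0):
    """Power law x**alpha with its leading log-periodic correction (n = +/-1 terms)."""
    angle = 2.0 * math.pi * math.log(x) / math.log(lam) + phase
    return x ** alpha * (1.0 + c * math.cos(angle))


def complex_exponents(alpha, lam, n_max):
    """Exponents alpha + 2*pi*i*n/ln(lam) for n = -n_max, ..., n_max."""
    return [complex(alpha, 2.0 * math.pi * n / math.log(lam)) for n in range(-n_max, n_max + 1)]


alpha, lam = 0.5, 2.0
mu = lam ** alpha  # discrete scale invariance: O(lam * x) = mu * O(x)
s = math.sqrt(lam)
for x in (1.3, 7.0, 42.0):
    # Rescaling by lam reproduces O up to the factor mu ...
    print(log_periodic(lam * x, alpha, lam) / log_periodic(x, alpha, lam), mu)
    # ... but rescaling by another factor such as sqrt(lam) generally does not.
    print(log_periodic(s * x, alpha, lam) / log_periodic(x, alpha, lam), s ** alpha)
```

Every exponent alpha_n satisfies lambda^(alpha_n) = lambda^alpha, which is why all of them are compatible with the same discrete rescaling.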