17,732 research outputs found
Adaptive Evolutionary Clustering
In many practical applications of clustering, the objects to be clustered
evolve over time, and a clustering result is desired at each time step. In such
applications, evolutionary clustering typically outperforms traditional static
clustering by producing clustering results that reflect long-term trends while
being robust to short-term variations. Several evolutionary clustering
algorithms have recently been proposed, often by adding a temporal smoothness
penalty to the cost function of a static clustering method. In this paper, we
introduce a different approach to evolutionary clustering by accurately
tracking the time-varying proximities between objects followed by static
clustering. We present an evolutionary clustering framework that adaptively
estimates the optimal smoothing parameter using shrinkage estimation, a
statistical approach that improves a naive estimate using additional
information. The proposed framework can be used to extend a variety of static
clustering algorithms, including hierarchical, k-means, and spectral
clustering, into evolutionary clustering algorithms. Experiments on synthetic
and real data sets indicate that the proposed framework outperforms static
clustering and existing evolutionary clustering algorithms in many scenarios.Comment: To appear in Data Mining and Knowledge Discovery, MATLAB toolbox
available at http://tbayes.eecs.umich.edu/xukevin/affec
Natural data structure extracted from neighborhood-similarity graphs
'Big' high-dimensional data are commonly analyzed in low-dimensions, after
performing a dimensionality-reduction step that inherently distorts the data
structure. For the same purpose, clustering methods are also often used. These
methods also introduce a bias, either by starting from the assumption of a
particular geometric form of the clusters, or by using iterative schemes to
enhance cluster contours, with uncontrollable consequences. The goal of data
analysis should, however, be to encode and detect structural data features at
all scales and densities simultaneously, without assuming a parametric form of
data point distances, or modifying them. We propose a novel approach that
directly encodes data point neighborhood similarities as a sparse graph. Our
non-iterative framework permits a transparent interpretation of data, without
altering the original data dimension and metric. Several natural and synthetic
data applications demonstrate the efficacy of our novel approach
- …