3 research outputs found
Detection and Evaluation of Clusters within Sequential Data
Motivated by theoretical advancements in dimensionality reduction techniques
we use a recent model, called Block Markov Chains, to conduct a practical study
of clustering in real-world sequential data. Clustering algorithms for Block
Markov Chains possess theoretical optimality guarantees and can be deployed in
sparse data regimes. Despite these favorable theoretical properties, a thorough
evaluation of these algorithms in realistic settings has been lacking.
We address this issue and investigate the suitability of these clustering
algorithms in exploratory data analysis of real-world sequential data. In
particular, our sequential data is derived from human DNA, written text, animal
movement data and financial markets. In order to evaluate the determined
clusters, and the associated Block Markov Chain model, we further develop a set
of evaluation tools. These tools include benchmarking, spectral noise analysis
and statistical model selection tools. An efficient implementation of the
clustering algorithm and the new evaluation tools is made available together
with this paper.
Practical challenges associated to real-world data are encountered and
discussed. It is ultimately found that the Block Markov Chain model assumption,
together with the tools developed here, can indeed produce meaningful insights
in exploratory data analyses despite the complexity and sparsity of real-world
data.Comment: 37 pages, 12 figure
Astronautics and aeronautics, 1967 - Chronology on science, technology, and policy
Chronology of astronautics and aeronautics in 196