Clustering in Block Markov Chains
This paper considers cluster detection in Block Markov Chains (BMCs). These
Markov chains are characterized by a block structure in their transition
matrix. More precisely, the possible states are divided into a finite
number of groups or clusters, such that states in the same cluster exhibit
the same transition rates to other states. One observes a trajectory of the
Markov chain, and the objective is to recover, from this observation only, the
(initially unknown) clusters. In this paper we devise a clustering procedure
that accurately, efficiently, and provably detects the clusters. We first
derive a fundamental information-theoretical lower bound on the detection error
rate satisfied under any clustering algorithm. This bound identifies the
parameters of the BMC, and trajectory lengths, for which it is possible to
accurately detect the clusters. We next develop two clustering algorithms that
can together accurately recover the cluster structure from the shortest
possible trajectories, whenever the parameters allow detection. These
algorithms thus reach the fundamental detectability limit, and are optimal in
that sense.
Comment: 73 pages, 18 plots, second revision
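The block structure described in this abstract can be illustrated with a minimal sketch. It is not the paper's clustering procedure: it only builds a transition matrix in which states of the same cluster share identical rows, simulates one trajectory, and tabulates the observed transitions that any clustering algorithm would start from. The function names, the cluster-level matrix `p`, and the choice to pick the target state uniformly within its cluster (self-transitions allowed) are assumptions made for illustration.

```python
import numpy as np

def bmc_transition_matrix(p, labels):
    """Build an n x n transition matrix with block structure.

    p[a, b] is the (assumed) probability of jumping from cluster a to
    cluster b; the target state is then drawn uniformly within cluster b.
    """
    p = np.asarray(p, dtype=float)
    labels = np.asarray(labels)
    sizes = np.bincount(labels, minlength=p.shape[0])
    # P[x, y] = p[cluster(x), cluster(y)] / size of cluster(y)
    return p[labels][:, labels] / sizes[labels][None, :]

def simulate(P, length, rng):
    """Sample one trajectory of the chain, starting from a uniform state."""
    n = P.shape[0]
    path = np.empty(length, dtype=int)
    path[0] = rng.integers(n)
    for t in range(1, length):
        path[t] = rng.choice(n, p=P[path[t - 1]])
    return path

def count_matrix(path, n):
    """Count observed transitions x -> y along the trajectory."""
    N = np.zeros((n, n), dtype=int)
    np.add.at(N, (path[:-1], path[1:]), 1)
    return N

# Hypothetical example: 5 states in 2 clusters.
labels = np.array([0, 0, 0, 1, 1])
p = np.array([[0.9, 0.1],
              [0.3, 0.7]])
P = bmc_transition_matrix(p, labels)
path = simulate(P, 200, np.random.default_rng(0))
N = count_matrix(path, len(labels))
```

Only `path` would be available to an observer in the setting of the abstract; the count matrix `N` is a natural summary of it, and the hidden `labels` are what a clustering procedure tries to recover.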
Detection and Evaluation of Clusters within Sequential Data
Motivated by theoretical advancements in dimensionality reduction techniques,
we use a recent model, called Block Markov Chains, to conduct a practical study
of clustering in real-world sequential data. Clustering algorithms for Block
Markov Chains possess theoretical optimality guarantees and can be deployed in
sparse data regimes. Despite these favorable theoretical properties, a thorough
evaluation of these algorithms in realistic settings has been lacking.
We address this issue and investigate the suitability of these clustering
algorithms in exploratory data analysis of real-world sequential data. In
particular, our sequential data is derived from human DNA, written text, animal
movement data and financial markets. In order to evaluate the determined
clusters, and the associated Block Markov Chain model, we further develop a set
of evaluation tools. These tools include benchmarking, spectral noise analysis
and statistical model selection tools. An efficient implementation of the
clustering algorithm and the new evaluation tools is made available together
with this paper.
Practical challenges associated with real-world data are encountered and
discussed. It is ultimately found that the Block Markov Chain model assumption,
together with the tools developed here, can indeed produce meaningful insights
in exploratory data analyses despite the complexity and sparsity of real-world
data.
Comment: 37 pages, 12 figures
A robust spectral method for finding lumpings and meta stable states of non-reversible Markov chains
A spectral method for identifying lumping in large Markov chains is
presented. Identification of meta stable states is treated as a special case.
The method is based on spectral analysis of a self-adjoint matrix that is a
function of the original transition matrix. It is demonstrated that the
technique is more robust than existing methods when applied to noisy
non-reversible Markov chains.
Comment: 10 pages, 7 figures
Modeling sequences and temporal networks with dynamic community structures
In evolving complex systems such as air traffic and social organizations,
collective effects emerge from their many components' dynamic interactions.
While the dynamic interactions can be represented by temporal networks with
nodes and links that change over time, they remain highly complex. It is
therefore often necessary to use methods that extract the temporal networks'
large-scale dynamic community structure. However, such methods are subject to
overfitting or suffer from effects of arbitrary, a priori imposed timescales,
which should instead be extracted from data. Here we simultaneously address
both problems and develop a principled data-driven method that determines
relevant timescales and identifies patterns of dynamics that take place on
networks as well as shape the networks themselves. We base our method on an
arbitrary-order Markov chain model with community structure, and develop a
nonparametric Bayesian inference framework that identifies the simplest such
model that can explain temporal interaction data.
Comment: 15 pages, 6 figures, 2 tables