    Clustering in Block Markov Chains

    This paper considers cluster detection in Block Markov Chains (BMCs). These Markov chains are characterized by a block structure in their transition matrix. More precisely, the n possible states are divided into a finite number K of groups or clusters, such that states in the same cluster exhibit the same transition rates to other states. One observes a trajectory of the Markov chain, and the objective is to recover, from this observation only, the (initially unknown) clusters. In this paper we devise a clustering procedure that accurately, efficiently, and provably detects the clusters. We first derive a fundamental information-theoretical lower bound on the detection error rate satisfied under any clustering algorithm. This bound identifies the parameters of the BMC, and trajectory lengths, for which it is possible to accurately detect the clusters. We next develop two clustering algorithms that can together accurately recover the cluster structure from the shortest possible trajectories, whenever the parameters allow detection. These algorithms thus reach the fundamental detectability limit, and are optimal in that sense. Comment: 73 pages, 18 plots, second revision
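The block structure described in this abstract can be illustrated with a minimal simulation sketch. It is not taken from the paper: the function names, the input format, and the choice to draw the next state uniformly inside the target cluster are all assumptions made for illustration. The sketch generates a trajectory and tallies the observed transitions, which is the only data a clustering procedure would see.

```python
import random

def simulate_bmc(cluster_of, block_probs, length, seed=0):
    """Simulate a trajectory of a Block Markov Chain (illustrative sketch).

    cluster_of[x]      -- cluster index of state x (hypothetical input format)
    block_probs[a][b]  -- probability of jumping from cluster a to cluster b
    Assumption: the next state is drawn uniformly within the target cluster,
    so all states in one cluster share the same transition rates.
    """
    rng = random.Random(seed)
    k = len(block_probs)
    # List the member states of each cluster once, up front.
    members = [[x for x, c in enumerate(cluster_of) if c == a] for a in range(k)]
    state = rng.randrange(len(cluster_of))
    path = [state]
    for _ in range(length - 1):
        a = cluster_of[state]
        b = rng.choices(range(k), weights=block_probs[a])[0]
        state = rng.choice(members[b])
        path.append(state)
    return path

def transition_counts(path, n):
    """Count observed transitions x -> y along the trajectory."""
    counts = [[0] * n for _ in range(n)]
    for x, y in zip(path, path[1:]):
        counts[x][y] += 1
    return counts

# Five states in two hidden clusters; the clusters are not passed to
# transition_counts, mirroring the setting where they must be recovered.
clusters = [0, 0, 1, 1, 1]
path = simulate_bmc(clusters, [[0.1, 0.9], [0.6, 0.4]], length=1000)
counts = transition_counts(path, n=len(clusters))
```

Rows of `counts` that belong to the same cluster should look statistically similar once the trajectory is long enough, which is the signal a cluster-recovery algorithm can exploit.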

    Detection and Evaluation of Clusters within Sequential Data

    Motivated by theoretical advancements in dimensionality reduction techniques, we use a recent model, called Block Markov Chains, to conduct a practical study of clustering in real-world sequential data. Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees and can be deployed in sparse data regimes. Despite these favorable theoretical properties, a thorough evaluation of these algorithms in realistic settings has been lacking. We address this issue and investigate the suitability of these clustering algorithms in exploratory data analysis of real-world sequential data. In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets. In order to evaluate the determined clusters, and the associated Block Markov Chain model, we further develop a set of evaluation tools. These tools include benchmarking, spectral noise analysis and statistical model selection tools. An efficient implementation of the clustering algorithm and the new evaluation tools is made available together with this paper. Practical challenges associated with real-world data are encountered and discussed. It is ultimately found that the Block Markov Chain model assumption, together with the tools developed here, can indeed produce meaningful insights in exploratory data analyses despite the complexity and sparsity of real-world data. Comment: 37 pages, 12 figures

    A robust spectral method for finding lumpings and meta stable states of non-reversible Markov chains

    A spectral method for identifying lumping in large Markov chains is presented. Identification of meta stable states is treated as a special case. The method is based on spectral analysis of a self-adjoint matrix that is a function of the original transition matrix. It is demonstrated that the technique is more robust than existing methods when applied to noisy non-reversible Markov chains. Comment: 10 pages, 7 figures
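For readers unfamiliar with lumping, the following sketch checks the standard (strong) lumpability condition for a given partition: every state in a block must send the same total probability mass to each block. This is only a verification of the definition, not the spectral method of the paper, and the function name and tolerance are assumptions.

```python
def is_lumpable(P, blocks, tol=1e-9):
    """Return True if the partition `blocks` is a strong lumping of P.

    P      -- row-stochastic transition matrix as a list of lists
    blocks -- partition of the state indices, e.g. [[0], [1, 2]]
    """
    for target in blocks:
        for block in blocks:
            # Total probability of moving from each state in `block` into `target`.
            mass = [sum(P[x][y] for y in target) for x in block]
            if max(mass) - min(mass) > tol:
                return False
    return True

P = [
    [0.5, 0.25, 0.25],
    [0.2, 0.4, 0.4],
    [0.2, 0.3, 0.5],
]
good = is_lumpable(P, [[0], [1, 2]])  # states 1 and 2 both send 0.2 to {0}
bad = is_lumpable(P, [[0, 1], [2]])   # states 0 and 1 send 0.25 vs 0.4 to {2}
```

A direct check like this requires a candidate partition and exact probabilities, which is why a method that finds the partition from a noisy matrix is needed in practice.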

    Modeling sequences and temporal networks with dynamic community structures

    In evolving complex systems such as air traffic and social organizations, collective effects emerge from their many components' dynamic interactions. While the dynamic interactions can be represented by temporal networks with nodes and links that change over time, they remain highly complex. It is therefore often necessary to use methods that extract the temporal networks' large-scale dynamic community structure. However, such methods are subject to overfitting or suffer from effects of arbitrary, a priori imposed timescales, which should instead be extracted from data. Here we simultaneously address both problems and develop a principled data-driven method that determines relevant timescales and identifies patterns of dynamics that take place on networks as well as shape the networks themselves. We base our method on an arbitrary-order Markov chain model with community structure, and develop a nonparametric Bayesian inference framework that identifies the simplest such model that can explain temporal interaction data. Comment: 15 pages, 6 figures, 2 tables