    Topological Point Cloud Clustering

    We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features. TPCC synthesizes desirable features from spectral clustering and topological data analysis and is based on the spectral properties of a simplicial complex associated with the point cloud. Because it relies on sparse eigenvector computations, TPCC is as easy to interpret and implement as spectral clustering. However, by focusing not just on a single matrix associated with a graph created from the point cloud data, but on a whole set of Hodge-Laplacians associated with an appropriately constructed simplicial complex, we can leverage a far richer set of topological features to characterize the data points within the point cloud and benefit from the relative robustness of topological techniques against noise. We test the performance of TPCC on both synthetic and real-world data and compare it with classical spectral clustering.
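The Hodge-Laplacians mentioned in this abstract can be illustrated on the smallest possible example. The following is a minimal sketch, not the TPCC method itself: it assumes the standard definition L1 = B1^T B1 + B2 B2^T of the first Hodge-Laplacian, where B1 and B2 are boundary matrices, and uses a single triangle as a toy simplicial complex. The helper names `hodge_laplacian_1` and `nullity` are hypothetical. The dimension of the kernel of L1 counts the one-dimensional holes (loops) of the complex.

```python
import numpy as np

# Node-to-edge boundary matrix B1 for a triangle on nodes 0, 1, 2.
# Columns are the oriented edges (0,1), (0,2), (1,2): -1 at the tail, +1 at the head.
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], dtype=float)

# Edge-to-triangle boundary matrix B2 for the single 2-simplex [0, 1, 2].
# Its boundary is (1,2) - (0,2) + (0,1), in the edge order used above.
B2 = np.array([[1.0], [-1.0], [1.0]])

def hodge_laplacian_1(B1, B2=None):
    """First Hodge-Laplacian; omit B2 when the complex has no triangles."""
    L1 = B1.T @ B1
    if B2 is not None:
        L1 = L1 + B2 @ B2.T
    return L1

def nullity(M):
    """Dimension of the kernel of a square matrix."""
    return M.shape[0] - np.linalg.matrix_rank(M)

# Hollow triangle (edges only) versus filled triangle (edges plus the 2-simplex).
betti1_hollow = nullity(hodge_laplacian_1(B1))
betti1_filled = nullity(hodge_laplacian_1(B1, B2))
print(betti1_hollow, betti1_filled)
```

The hollow triangle encloses one loop, while filling it in with a 2-simplex removes that loop; the kernel dimensions reflect this difference.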

    Spectral redemption: clustering sparse networks

    Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here we introduce a new class of spectral algorithms based on a non-backtracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the non-backtracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.
    Comment: 11 pages, 6 figures. Clarified to what extent our claims are rigorous, and to what extent they are conjectures; also added an interpretation of the eigenvectors of the 2n-dimensional version of the non-backtracking matrix.
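The non-backtracking operator described above can be built directly from an edge list. The sketch below is an illustration under assumptions, not the authors' code: it uses the common convention that the entry for the pair of directed edges (u→v, v→w) is 1 whenever w differs from u, and the function name `non_backtracking_matrix` and the toy graph (two triangles joined by one edge) are hypothetical.

```python
import numpy as np

def non_backtracking_matrix(edges):
    """Return (B, directed) for an undirected simple graph given as an edge list.

    B is indexed by directed edges; B[i, j] = 1 when edge i = (u -> v) can be
    followed by edge j = (v -> w) with w != u (no immediate backtracking).
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}
    B = np.zeros((len(directed), len(directed)))
    for (u, v), i in index.items():
        for (x, w), j in index.items():
            if x == v and w != u:
                B[i, j] = 1.0
    return B, directed

# Toy graph (an assumption for illustration): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
B, directed = non_backtracking_matrix(edges)

# B is not symmetric, so its eigenvalues are complex in general.
eigenvalues = np.linalg.eigvals(B)
print(B.shape, np.sort(eigenvalues.real)[-2:])
```

The double loop is quadratic in the number of edges, which is acceptable for a toy graph; a sparse construction would be needed for networks of realistic size.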

    Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

    In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. Combining it with spectral graph methods, I reveal the essential difference between global scale models and local scale ones in structure clustering, i.e., different optimization of Euclidean (or spatial) distances and sequential (or genomic) distances. More specifically, clusters from global scale models optimize Euclidean distance relations. Local scale models, on the other hand, result in clusters that optimize the genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, the sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and delivers different clusterings depending on the purpose. Further, my SeqMM is used to explore the hierarchical structures of chromosomes. I find that at the global scale, the Fiedler vector from my SeqMM bears a great similarity to the principal vector from principal component analysis, and can be used to study genomic compartments. In TAD analysis, I find that TADs evaluated at different scales are not consistent and vary a lot. Particularly when the sequence scale is small, the calculated TAD boundaries are dramatically different. Even for regions with high contact frequencies, TAD regions show no obvious consistency. However, when the scale value increases further, although TADs are still quite different, TAD boundaries in these high contact frequency regions become more and more consistent. Finally, I find that for a fixed local scale, my method can deliver very robust TAD boundaries across different cluster numbers.
    Comment: 22 pages, 13 figures.

    Relations Between Adjacency and Modularity Graph Partitioning

    In this paper, the exact linear relation between the leading eigenvector of the unnormalized modularity matrix and the eigenvectors of the adjacency matrix is developed. Based on this analysis, a method to approximate the leading eigenvector of the modularity matrix is given, and the relative error of the approximation is derived. A complete proof of the equivalence between normalized modularity clustering and normalized adjacency clustering is also given. Some applications and experiments are given to illustrate and corroborate the points that are made in the theoretical development.
    Comment: 11 pages.
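For readers unfamiliar with the objects this abstract relates, here is a minimal sketch of partitioning by the leading eigenvector of the unnormalized modularity matrix. It assumes the usual definition A - k k^T / (2m), with A the adjacency matrix, k the degree vector, and m the number of edges. It does not implement the paper's approximation method, and the toy graph (two triangles joined by one edge) is an assumption for illustration.

```python
import numpy as np

# Toy graph (an assumption for illustration): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
n = 6

# Adjacency matrix of the undirected graph.
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

k = A.sum(axis=1)   # degree vector
two_m = k.sum()     # twice the number of edges

# Unnormalized modularity matrix.
M = A - np.outer(k, k) / two_m

# M is symmetric, so eigh applies; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(M)
leading = eigenvectors[:, -1]

# Split the nodes by the sign of the leading eigenvector's components.
labels = (leading > 0).astype(int)
print(labels)
```

The overall sign of an eigenvector is arbitrary, so which triangle receives label 0 and which receives label 1 may differ between runs or library versions; only the grouping is meaningful.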

    Spectral and Dynamical Properties in Classes of Sparse Networks with Mesoscopic Inhomogeneities

    We study structure, eigenvalue spectra and diffusion dynamics in a wide class of networks with subgraphs (modules) at mesoscopic scale. The networks are grown using a model with three parameters controlling the number of modules, their internal structure as scale-free and correlated subgraphs, and the topology of the connecting network. Through an exhaustive spectral analysis of both the adjacency matrix and the normalized Laplacian matrix, we identify the spectral properties which characterize the mesoscopic structure of sparse cyclic graphs and trees. The minimally connected nodes, clustering, and the average connectivity affect the central part of the spectrum. The number of distinct modules leads to an extra peak at the lower part of the Laplacian spectrum in cyclic graphs. Such a peak does not occur in the case of topologically distinct tree-subgraphs connected on a tree. The associated eigenvectors, however, remain localized on the subgraphs in both trees and cyclic graphs. We also find a characteristic pattern of periodic localization along the chains on the tree for the eigenvector components associated with the largest eigenvalue of the Laplacian, which is equal to 2. We corroborate the results with simulations of the random walk on several types of networks. Our results for the distribution of return times of the walk to the origin (autocorrelator) agree well with the recent analytical solution for trees, and they appear to be independent of their mesoscopic and global structure. For the cyclic graphs we find new results with a stretching exponent of the tail of the distribution that is twice as large and virtually independent of the size of cycles. The modularity and clustering contribute to a power-law decay at short return times.
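The statement that the largest eigenvalue of the normalized Laplacian equals 2 on trees can be checked numerically on a small case. The sketch below assumes the symmetric normalization I - D^{-1/2} A D^{-1/2} and uses a path on five nodes as the simplest tree; it is an illustration, not the network model studied in the paper.

```python
import numpy as np

# A path on n nodes is the simplest tree (chosen here purely for illustration).
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

deg = A.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(deg)

# Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
L_norm = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
spectrum = np.linalg.eigvalsh(L_norm)
print(spectrum[0], spectrum[-1])
```

The spectrum of this matrix always lies in the interval [0, 2], and the value 2 is attained exactly when the graph has a bipartite connected component, which every tree is.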

    Hearing the clusters in a graph: A distributed algorithm

    We propose a novel distributed algorithm to cluster graphs. The algorithm recovers the solution obtained from spectral clustering without the need for expensive eigenvalue/vector computations. We prove that, by propagating waves through the graph, a local fast Fourier transform yields the local component of every eigenvector of the Laplacian matrix, thus providing clustering information. For large graphs, the proposed algorithm is orders of magnitude faster than random walk based approaches. We prove the equivalence of the proposed algorithm to spectral clustering and derive convergence rates. We demonstrate the benefit of using this decentralized clustering algorithm for community detection in social graphs, accelerating distributed estimation in sensor networks, and efficient computation of distributed multi-agent search strategies.
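As a point of reference for what the distributed algorithm is said to recover, the following is a minimal sketch of the centralized baseline: spectral bisection using the Fiedler vector of the unnormalized graph Laplacian L = D - A. It is not the wave-propagation algorithm from the abstract, and the toy graph (two triangles joined by one edge) is an assumption for illustration.

```python
import numpy as np

# Toy graph (an assumption for illustration): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
n = 6

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# Unnormalized graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues in ascending order; for a connected graph the first
# is 0 (constant eigenvector) and the second belongs to the Fiedler vector.
eigenvalues, eigenvectors = np.linalg.eigh(L)
fiedler = eigenvectors[:, 1]

# Bisect the graph by the sign of the Fiedler vector's components.
clusters = (fiedler > 0).astype(int)
print(clusters)
```

This baseline needs the full Laplacian in one place and a dense eigendecomposition, which is the centralized cost the abstract says the distributed approach avoids.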