Topological Point Cloud Clustering
We present Topological Point Cloud Clustering (TPCC), a new method to cluster
points in an arbitrary point cloud based on their contribution to global
topological features. TPCC synthesizes desirable features from spectral
clustering and topological data analysis and is based on the spectral
properties of a simplicial complex associated with the given point
cloud. Because it relies on sparse eigenvector computations, TPCC is as easy
to interpret and implement as spectral clustering. However, by
focusing not just on a single matrix associated with a graph created from the
point cloud data, but on a whole set of Hodge-Laplacians associated with an
appropriately constructed simplicial complex, we can leverage a far richer set
of topological features to characterize the data points within the point cloud
and benefit from the relative robustness of topological techniques against
noise. We test the performance of TPCC on both synthetic and real-world data
and compare it with classical spectral clustering.
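The Hodge-Laplacians mentioned above can be illustrated on a toy complex. This is a minimal sketch, not the TPCC implementation: it assumes a hand-built complex (a hollow square with one filled triangle glued to one edge) and the standard oriented boundary matrices B1 (vertices x edges) and B2 (edges x triangles). It forms L0 = B1 B1^T, L1 = B1^T B1 + B2 B2^T and L2 = B2^T B2 and counts near-zero eigenvalues, which equal the Betti numbers. The kernel vectors of L1 ("harmonic" edge signals) are the kind of topological features a method like TPCC can draw on. All helper names are hypothetical.

```python
import numpy as np

def boundary_matrices(n_vertices, edges, triangles):
    """Oriented boundary matrices; simplices are given as sorted vertex tuples."""
    edge_index = {e: i for i, e in enumerate(edges)}
    B1 = np.zeros((n_vertices, len(edges)))
    for col, (i, j) in enumerate(edges):
        B1[i, col], B1[j, col] = -1.0, 1.0          # d[i,j] = j - i
    B2 = np.zeros((len(edges), len(triangles)))
    for col, (i, j, k) in enumerate(triangles):
        B2[edge_index[(j, k)], col] = 1.0           # d[i,j,k] = [j,k] - [i,k] + [i,j]
        B2[edge_index[(i, k)], col] = -1.0
        B2[edge_index[(i, j)], col] = 1.0
    return B1, B2

def hodge_laplacians(B1, B2):
    L0 = B1 @ B1.T                  # graph Laplacian on vertices
    L1 = B1.T @ B1 + B2 @ B2.T      # Hodge-Laplacian on edges
    L2 = B2.T @ B2                  # Hodge-Laplacian on triangles
    return L0, L1, L2

def betti_numbers(laplacians, tol=1e-9):
    # dim ker L_p = p-th Betti number
    return [int(np.sum(np.linalg.eigvalsh(L) < tol)) for L in laplacians]

# Hollow square 0-1-2-3 plus a filled triangle (0, 1, 4) glued to edge [0, 1].
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 4), (1, 4)]
triangles = [(0, 1, 4)]
B1, B2 = boundary_matrices(5, edges, triangles)
laplacians = hodge_laplacians(B1, B2)
print(betti_numbers(laplacians))    # [1, 1, 0]: one component, one hole

# Harmonic 1-chain: the kernel eigenvector of L1, one value per edge.
eigvals, eigvecs = np.linalg.eigh(laplacians[1])
harmonic = eigvecs[:, 0]
```

The filled triangle is a cycle that is a boundary, so it does not count as a hole. Only the square contributes to the kernel of L1.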
Spectral redemption: clustering sparse networks
Spectral algorithms are classic approaches to clustering and community
detection in networks. However, for sparse networks the standard versions of
these algorithms are suboptimal, in some cases completely failing to detect
communities even when other algorithms such as belief propagation can do so.
Here we introduce a new class of spectral algorithms based on a
non-backtracking walk on the directed edges of the graph. The spectrum of this
operator is much better-behaved than that of the adjacency matrix or other
commonly used matrices, maintaining a strong separation between the bulk
eigenvalues and the eigenvalues relevant to community structure even in the
sparse case. We show that our algorithm is optimal for graphs generated by the
stochastic block model, detecting communities all the way down to the
theoretical limit. We also show the spectrum of the non-backtracking operator
for some real-world networks, illustrating its advantages over traditional
spectral clustering.
Comment: 11 pages, 6 figures. Clarified to what extent our claims are
rigorous, and to what extent they are conjectures; also added an
interpretation of the eigenvectors of the 2n-dimensional version of the
non-backtracking matrix
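The non-backtracking operator acts on the 2m directed edges of a graph, with B[(u->v),(w->x)] = 1 when v = w and x != u. A walk may continue along any edge except the one it just used. The minimal sketch below assumes dense NumPy matrices, so it is only suitable for small graphs. It also builds the 2n x 2n matrix [[0, D - I], [-I, A]], whose eigenvalues coincide with those of B apart from eigenvalues +1 and -1 (the Ihara-Bass relation). This is presumably the 2n-dimensional version the comment refers to, but that is an assumption. For a d-regular graph such as K4, both matrices have leading eigenvalue d - 1. Function names are illustrative.

```python
from itertools import combinations
import numpy as np

def non_backtracking_matrix(edges):
    # Each undirected edge {u, v} yields two directed edges u->v and v->u.
    directed = list(edges) + [(v, u) for u, v in edges]
    B = np.zeros((len(directed), len(directed)))
    for row, (u, v) in enumerate(directed):
        for col, (w, x) in enumerate(directed):
            # u->v may be followed by v->x, except the immediate reversal v->u.
            if v == w and x != u:
                B[row, col] = 1.0
    return B

def ihara_bass_matrix(n, edges):
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    D = np.diag(A.sum(axis=1))
    I = np.eye(n)
    return np.block([[np.zeros((n, n)), D - I], [-I, A]])

def leading_eigenvalue(M):
    # Neither matrix is symmetric, so use the general solver; take the largest real part.
    return np.linalg.eigvals(M).real.max()

k4_edges = list(combinations(range(4), 2))   # 3-regular graph on 4 nodes
B = non_backtracking_matrix(k4_edges)
B_small = ihara_bass_matrix(4, k4_edges)
print(B.shape, B_small.shape)                # (12, 12) (8, 8)
print(round(leading_eigenvalue(B), 6), round(leading_eigenvalue(B_small), 6))  # 2.0 2.0
```

In practice one would use sparse matrices and an iterative eigensolver. The smaller 2n-dimensional form is attractive precisely because it avoids storing the 2m x 2m operator.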
Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis
In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for
biomolecular data analysis. By combining it with spectral graph methods, I
reveal the essential difference between global scale models and local scale
models in structure clustering, namely that they optimize different quantities:
Euclidean (or spatial) distances versus sequential (or genomic) distances. More specifically,
clusters from global scale models optimize Euclidean distance relations. Local
scale models, on the other hand, result in clusters that optimize the genomic
distance relations. For biomolecular data, Euclidean distances and sequential
distances are two independent variables, which cannot be optimized
simultaneously in data clustering. However, the sequence scale in my SeqMM can
serve as a tuning parameter that balances these two variables and delivers
different clusterings depending on the purpose. Further, I use SeqMM to explore
the hierarchical structures of chromosomes. I find that at the global scale, the
Fiedler vector from my SeqMM closely resembles the principal vector
from principal component analysis and can be used to study genomic
compartments. In TAD analysis, I find that TADs evaluated at different scales
are inconsistent and vary considerably. Particularly when the sequence scale is
small, the calculated TAD boundaries differ dramatically. Even in
regions with high contact frequencies, the TAD regions show no obvious consistency.
However, as the scale value increases further, although the TADs still differ
considerably, the TAD boundaries in these high-contact-frequency regions become
increasingly consistent. Finally, I find that for a fixed local scale, my method
can deliver very robust TAD boundaries across different numbers of clusters.
Comment: 22 pages, 13 figures
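The Fiedler vector is the eigenvector of the second-smallest eigenvalue of a graph Laplacian, and its signs split the nodes into two groups. The sketch below is not SeqMM itself, because the abstract does not say how the sequence scale enters the graph. It assumes a made-up, symmetric toy "contact matrix" over 8 genomic bins with two blocks of strongly interacting bins, and it uses the unnormalized Laplacian L = D - W. The function name `fiedler_vector` and all contact values are hypothetical.

```python
import numpy as np

def fiedler_vector(W):
    """Eigenpair of the second-smallest eigenvalue of L = D - W (W symmetric)."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvals[1], eigvecs[:, 1]

# Toy contact map: 8 genomic bins, two blocks of 4 with strong internal contacts.
n_bins = 8
block = np.repeat([0, 1], n_bins // 2)
W = np.where(block[:, None] == block[None, :], 10.0, 1.0)
np.fill_diagonal(W, 0.0)                   # ignore self-contacts

lam2, v = fiedler_vector(W)
labels = (v > 0).astype(int)               # sign pattern = two-way partition
print(round(lam2, 6))                      # 8.0
```

The overall sign of an eigenvector is arbitrary, so only the grouping matters, not which block is labelled 1. On real Hi-C data the contact matrix would first need normalization, which this toy example skips.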
Relations Between Adjacency and Modularity Graph Partitioning
In this paper the exact linear relation between the leading eigenvector of
the unnormalized modularity matrix and the eigenvectors of the adjacency matrix
is developed. Based on this analysis a method to approximate the leading
eigenvector of the modularity matrix is given, and the relative error of the
approximation is derived. A complete proof of the equivalence between
normalized modularity clustering and normalized adjacency clustering is also
given. Some applications and experiments are given to illustrate and
corroborate the points that are made in the theoretical development.
Comment: 11 pages
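One way to see an exact linear relation of this kind uses the unnormalized modularity matrix B = A - k k^T/(2m). An eigenpair B b = beta b gives (A - beta I) b = k (k^T b)/(2m). Whenever beta is not an eigenvalue of A, it follows that b is proportional to sum_i (u_i^T k)/(lambda_i - beta) u_i, where (lambda_i, u_i) are the eigenpairs of A. This derivation may or may not match the paper's own development. The toy graph below is an arbitrary choice. The relation needs k^T b != 0, which fails on regular graphs (there, k is a multiple of the all-ones vector, which is orthogonal to b), so the example deliberately uses an irregular, asymmetric graph.

```python
import numpy as np

def adjacency(n, edges):
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

# Irregular, asymmetric toy graph: K4 on {0,1,2,3}, bridge 3-4,
# triangle {4,5,6}, pendant node 7 attached to 6.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (4, 6), (5, 6), (6, 7)]
A = adjacency(8, edges)
k = A.sum(axis=1)                    # degree vector
two_m = k.sum()                      # 2m
B = A - np.outer(k, k) / two_m       # unnormalized modularity matrix

betas, vecs = np.linalg.eigh(B)
beta, b = betas[-1], vecs[:, -1]     # leading eigenpair of B

# Rebuild b from the adjacency eigenpairs: b ~ sum_i (u_i . k) / (lambda_i - beta) u_i
lams, U = np.linalg.eigh(A)
b_pred = U @ ((U.T @ k) / (lams - beta))
b_pred /= np.linalg.norm(b_pred)

alignment = abs(b_pred @ b)          # 1 means identical up to sign
print(round(alignment, 6))
```

Truncating the sum to the adjacency eigenvectors with the largest weights |u_i^T k| / |lambda_i - beta| gives one natural approximation scheme. Its accuracy depends on how quickly those weights decay.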
Spectral and Dynamical Properties in Classes of Sparse Networks with Mesoscopic Inhomogeneities
We study structure, eigenvalue spectra and diffusion dynamics in a wide class
of networks with subgraphs (modules) at mesoscopic scale. The networks are
grown within a model with three parameters controlling the number of modules,
their internal structure as scale-free and correlated subgraphs, and the
topology of the connecting network. Through an exhaustive spectral analysis of
both the adjacency matrix and the normalized Laplacian matrix we identify the
spectral properties which characterize the mesoscopic structure of sparse
cyclic graphs and trees. The minimally connected nodes, clustering, and the
average connectivity affect the central part of the spectrum. The number of
distinct modules leads to an extra peak at the lower part of the Laplacian
spectrum in cyclic graphs. Such a peak does not occur in the case of
topologically distinct tree-subgraphs connected on a tree. However, the
associated eigenvectors remain localized on the subgraphs in both trees and
cyclic graphs. We also find a characteristic pattern of periodic localization
along the chains on the tree for the eigenvector components associated with the
largest eigenvalue of the Laplacian, which equals 2. We corroborate the results with
simulations of the random walk on several types of networks. Our results for
the distribution of return times of the walk to the origin (autocorrelator)
agree well with a recent analytical solution for trees and appear to be
independent of their mesoscopic and global structure. For the cyclic graphs we
find new results: a stretching exponent of the tail of the distribution that is
twice as large and virtually independent of the size of the cycles. The
modularity and clustering contribute to a power-law decay at short return
times.
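The eigenvalue 2 on trees reflects a general fact about the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}. A connected graph has eigenvalue 2 exactly when it is bipartite, and every tree is bipartite. The corresponding eigenvector is D^{1/2} s, where s = +1 or -1 according to the side of the bipartition, so it alternates in sign along every chain. The sketch below checks this numerically on an arbitrarily chosen small tree, with the 5-cycle as an odd-cycle counterexample. It does not reproduce the paper's modular network model.

```python
import numpy as np

def normalized_laplacian(n, edges):
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))        # assumes no isolated nodes
    return np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

tree_edges = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]
cycle5_edges = [(i, (i + 1) % 5) for i in range(5)]  # odd cycle, not bipartite

tree_vals, tree_vecs = np.linalg.eigh(normalized_laplacian(6, tree_edges))
cycle_vals = np.linalg.eigvalsh(normalized_laplacian(5, cycle5_edges))
print(round(tree_vals[-1], 6), round(cycle_vals[-1], 6))  # 2.0 1.809017

# Predicted top eigenvector of the tree: sqrt(degree) times the +/-1 side label.
side = np.array([1, -1, 1, 1, -1, -1])
degree = np.array([1, 3, 1, 3, 1, 1])
predicted = np.sqrt(degree) * side
predicted /= np.linalg.norm(predicted)
print(round(abs(predicted @ tree_vecs[:, -1]), 6))  # 1.0
```

For the 5-cycle the eigenvalues are 1 - cos(2 pi j / 5), so the largest is 1 + cos(pi/5), about 1.809, strictly below 2.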
Hearing the clusters in a graph: A distributed algorithm
We propose a novel distributed algorithm to cluster graphs. The algorithm
recovers the solution obtained from spectral clustering without the need for
expensive eigenvalue/vector computations. We prove that, by propagating waves
through the graph, a local fast Fourier transform yields the local component of
every eigenvector of the Laplacian matrix, thus providing clustering
information. For large graphs, the proposed algorithm is orders of magnitude
faster than random walk based approaches. We prove the equivalence of the
proposed algorithm to spectral clustering and derive convergence rates. We
demonstrate the benefit of using this decentralized clustering algorithm for
community detection in social graphs, accelerating distributed estimation in
sensor networks and efficient computation of distributed multi-agent search
strategies.
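The wave-propagation idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not the authors' algorithm. It assumes the discrete graph wave equation u(t+1) = 2u(t) - u(t-1) - c^2 L u(t), started from an impulse at one node with zero initial velocity, followed by a Hann-windowed FFT of each node's own time series. Each node's coefficient is then compared with the impulse node's coefficient, which is assumed to be shared with all nodes. The wave speed c, the number of steps T and the peak threshold are arbitrary choices, and all helper names are hypothetical. In the Laplacian eigenbasis, each mode oscillates at a frequency omega with cos(omega) = 1 - c^2 lambda / 2. The lowest nonzero spectral peak at a node therefore reveals lambda_2, and the sign of the node's FFT coefficient at that peak, relative to the impulse node's, places it on one side of the Fiedler split.

```python
import numpy as np

def laplacian(n, edges):
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return np.diag(A.sum(axis=1)) - A

def simulate_wave(L, source, c, steps):
    """Graph wave equation u(t+1) = 2u(t) - u(t-1) - c^2 L u(t).

    Row t holds every node's value at step t. A dense matrix is used for
    brevity; node i's update only needs its own and its neighbours' values.
    """
    u_prev = np.zeros(L.shape[0])
    u_prev[source] = 1.0                              # impulse at one node
    u_curr = u_prev - 0.5 * c**2 * (L @ u_prev)       # zero initial velocity
    history = [u_prev, u_curr]
    for _ in range(steps - 2):
        u_prev, u_curr = u_curr, 2 * u_curr - u_prev - c**2 * (L @ u_curr)
        history.append(u_curr)
    return np.array(history)

def lowest_peak(signal, rel_threshold=0.05):
    """Return (bin, complex coefficient) of the lowest non-DC spectral peak."""
    x = signal - signal.mean()                        # drop the lambda = 0 mode
    spectrum = np.fft.rfft(np.hanning(len(x)) * x)    # Hann window limits leakage
    mags = np.abs(spectrum)
    threshold = rel_threshold * mags[1:].max()
    k = 1
    while mags[k] <= threshold:                       # first bin above threshold
        k += 1
    while k + 1 < len(mags) and mags[k + 1] > mags[k]:  # climb to the local maximum
        k += 1
    return k, spectrum[k]

# Two 5-cliques {0..4} and {5..9} joined by the edge (4, 5).
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
edges += [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
edges.append((4, 5))
L = laplacian(10, edges)

c, T, source = 0.5, 1024, 0          # stable while c**2 * lambda_max < 4
signals = simulate_wave(L, source, c, T)
peaks = [lowest_peak(signals[:, i]) for i in range(10)]

k_src, coef_src = peaks[source]
omega = 2 * np.pi * k_src / T
lambda2_estimate = 2 * (1 - np.cos(omega)) / c**2
# Same sign as the source's coefficient -> same side of the Fiedler split.
labels = [int(np.real(coef * np.conj(coef_src)) > 0) for _, coef in peaks]
print(round(lambda2_estimate, 3), labels)
```

The frequency resolution is 2 pi / T, so the estimate of lambda_2 improves with longer simulations. A practical version would also need a way to agree on the reference phase without a central node.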