Tensor Spectral Clustering for Partitioning Higher-order Network Structures
Spectral graph theory-based methods represent an important class of tools for
studying the structure of networks. Spectral methods are based on a first-order
Markov chain derived from a random walk on the graph and thus they cannot take
advantage of important higher-order network substructures such as triangles,
cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering
(TSC) algorithm that allows for modeling higher-order network structures in a
graph partitioning framework. Our TSC algorithm allows the user to specify
which higher-order network structures (cycles, feed-forward loops, etc.) should
be preserved by the network clustering. Higher-order network structures of
interest are represented using a tensor, which we then partition by developing
a multilinear spectral method. Our framework can be applied to discovering
layered flows in networks as well as graph anomaly detection, which we
illustrate on synthetic networks. In directed networks, a higher-order
structure of particular interest is the directed 3-cycle, which captures
feedback loops in networks. We demonstrate that our TSC algorithm produces
large partitions that cut fewer directed 3-cycles than standard spectral
clustering algorithms. Comment: SDM 201
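The abstract judges partitions by how many directed 3-cycles they cut. Below is a minimal sketch of that evaluation metric only, not of the TSC algorithm. It assumes a cycle counts as "cut" when its three nodes do not all share one label. The function names are hypothetical.

```python
def directed_3_cycles(edges):
    """Return every directed 3-cycle i->j->k->i once, as the rotation that starts at its smallest node."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    cycles = set()
    for i, outs in succ.items():
        for j in outs:
            for k in succ.get(j, ()):
                # three distinct nodes, and the edge k->i closes the loop
                if len({i, j, k}) == 3 and i in succ.get(k, ()):
                    # rotations describe the same directed cycle; keep one canonical form
                    cycles.add(min((i, j, k), (j, k, i), (k, i, j)))
    return cycles


def count_cut_cycles(cycles, labels):
    """Count cycles whose nodes fall into more than one part (assumed definition of 'cut')."""
    return sum(1 for cyc in cycles if len({labels[v] for v in cyc}) > 1)


# Usage: two directed triangles joined by a single edge 2->3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
cycles = directed_3_cycles(edges)
print(count_cut_cycles(cycles, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))  # 0
print(count_cut_cycles(cycles, {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}))  # 1
```

A partition that separates the two triangles cuts no 3-cycle. Moving node 3 across the boundary cuts one.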
Hearing the clusters in a graph: A distributed algorithm
We propose a novel distributed algorithm to cluster graphs. The algorithm
recovers the solution obtained from spectral clustering without the need for
expensive eigenvalue/vector computations. We prove that, by propagating waves
through the graph, a local fast Fourier transform yields the local component of
every eigenvector of the Laplacian matrix, thus providing clustering
information. For large graphs, the proposed algorithm is orders of magnitude
faster than random-walk-based approaches. We prove the equivalence of the
proposed algorithm to spectral clustering and derive convergence rates. We
demonstrate the benefit of using this decentralized clustering algorithm for
community detection in social graphs, accelerating distributed estimation in
sensor networks and efficient computation of distributed multi-agent search
strategies.
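The abstract's central claim is that waves propagated over the graph carry Laplacian eigenvector components at frequencies a node can read off with an FFT. Below is a minimal centralized simulation of that idea, built on several assumptions that are not taken from the paper:

- the discretized wave equation u(t+1) = 2u(t) - u(t-1) - c²Lu(t);
- an impulse at one node as the initial condition;
- c² = 1/d_max for stability;
- a Hann window;
- node `start` as a shared phase reference for reading off signs.

The function names are hypothetical. This is not the authors' implementation. In a truly distributed run, each node would keep only its own column of `trace`.

```python
import numpy as np


def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A for an undirected edge list."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, v] -= 1.0
        L[v, u] -= 1.0
        L[u, u] += 1.0
        L[v, v] += 1.0
    return L


def wave_fiedler(L, steps=4096, start=0, min_bin=4, rel_threshold=0.1):
    """Estimate lambda_2 and the Fiedler sign pattern from simulated waves (sketch)."""
    n = L.shape[0]
    c2 = 1.0 / L.diagonal().max()  # lambda_max <= 2*d_max, so c2*lambda_max <= 2 < 4: stable
    u = np.zeros(n)
    u[start] = 1.0                 # assumed impulse initial condition
    u_prev = u.copy()              # zero initial velocity: u(-1) = u(0)
    trace = np.empty((steps, n))
    for t in range(steps):
        trace[t] = u
        # Row i of L @ u needs only node i and its neighbours, so the update is local.
        u, u_prev = 2.0 * u - u_prev - c2 * (L @ u), u
    trace -= trace.mean(axis=0)    # drop the constant (lambda = 0) mode
    spec = np.fft.rfft(trace * np.hanning(steps)[:, None], axis=0)
    mag = np.abs(spec)
    bins = []
    for i in range(n):
        m = mag[:, i]
        # first bin above the threshold, then climb to the top of that peak
        k = min_bin + int(np.argmax(m[min_bin:] >= rel_threshold * m[min_bin:].max()))
        while k + 1 < len(m) and m[k + 1] > m[k]:
            k += 1
        bins.append(k)
    k_star = max(set(bins), key=bins.count)    # most common lowest-peak bin
    omega = 2.0 * np.pi * k_star / steps
    lam2 = 2.0 * (1.0 - np.cos(omega)) / c2    # invert 2 - 2cos(omega) = c^2 * lambda
    coeff = spec[k_star]
    same_side = np.real(coeff * np.conj(coeff[start])) > 0  # sign relative to the reference node
    return lam2, same_side
```

Each eigenmode of L oscillates at an angular frequency ω satisfying 2 - 2cos ω = c²λ. The lowest nonzero spectral peak therefore yields λ₂. The relative signs of the FFT coefficients at that peak recover the sign pattern of the Fiedler vector, which is the spectral bipartition.

As an example, take two 4-cliques joined by one edge. The estimate should land near λ₂ = 3 - √7 ≈ 0.354, and the signs should separate the two cliques.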
Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis
Database theory and database practice are typically the domain of computer
scientists who adopt what may be termed an algorithmic perspective on their
data. This perspective is very different from the more statistical perspective
adopted by statisticians, scientific-computing researchers, machine learners, and others who
work on what may be broadly termed statistical data analysis. In this article,
I will address fundamental aspects of this algorithmic-statistical disconnect,
with an eye to bridging the gap between these two very different approaches. A
concept that lies at the heart of this disconnect is that of statistical
regularization, a notion that concerns how robust the output of an
algorithm is to the noise properties of the input data. Although it is nearly
completely absent from computer science, which historically has taken the input
data as given and modeled algorithms discretely, regularization in one form or
another is central to nearly every application domain that applies algorithms
to noisy data. By using several case studies, I will illustrate, both
theoretically and empirically, the nonobvious fact that approximate
computation, in and of itself, can implicitly lead to statistical
regularization. This and other recent work suggests that, by exploiting in a
more principled way the statistical properties implicit in worst-case
algorithms, one can in many cases satisfy the bicriteria of having algorithms
that are scalable to very large-scale databases and that also have good
inferential or predictive properties. Comment: To appear in the Proceedings of the 2012 ACM Symposium on Principles
of Database Systems (PODS 2012)
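This excerpt does not say which case studies the article uses. What follows is a substitute textbook illustration, not taken from the paper, of approximate computation acting as regularization. Gradient descent on least squares, started at zero and stopped after t steps of size η, scales each singular direction by the filter factor 1 - (1 - ηs²)^t. That factor shrinks poorly determined directions much as the ridge (Tikhonov) factor s²/(s² + λ) does. The sketch checks that filter identity numerically. The function names, data, and step size are assumptions.

```python
import numpy as np


def early_stopped_gd(A, b, steps, lr):
    """Plain gradient descent on 0.5*||Ax - b||^2, started at zero, stopped after `steps` iterations."""
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x -= lr * (A.T @ (A @ x - b))
    return x


def spectral_filter(A, b, factors):
    """Return sum_i factors[i] * (u_i . b / s_i) * v_i for the thin SVD A = U diag(s) V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ (factors * (U.T @ b) / s)


rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
s = np.linalg.svd(A, compute_uv=False)
lr = 0.5 / s.max() ** 2  # keeps lr * s**2 in (0, 0.5], so every factor lies in (0, 1)
steps = 20

x_gd = early_stopped_gd(A, b, steps, lr)
gd_factors = 1 - (1 - lr * s**2) ** steps             # implicit filter of early stopping
ridge_factors = s**2 / (s**2 + 1.0 / (steps * lr))    # explicit ridge filter; lambda = 1/(t*lr) is only a rough heuristic pairing
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]           # unregularized solution (all factors = 1)
print(np.round(gd_factors, 3))
print(np.round(ridge_factors, 3))
```

In this view the iteration count acts as an inverse regularization strength. Fewer steps give smaller filter factors and therefore stronger shrinkage toward zero.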