Consistency of Spectral Hypergraph Partitioning under Planted Partition Model
Hypergraph partitioning lies at the heart of a number of problems in machine
learning and network sciences. Many algorithms for hypergraph partitioning have
been proposed that extend standard approaches for graph partitioning to the
case of hypergraphs. However, theoretical aspects of such methods have seldom
received attention in the literature as compared to the extensive studies on
the guarantees of graph partitioning. For instance, consistency results of
spectral graph partitioning under the stochastic block model are well known. In
this paper, we present a planted partition model for sparse random non-uniform
hypergraphs that generalizes the stochastic block model. We derive an error
bound for a spectral hypergraph partitioning algorithm under this model using
matrix concentration inequalities. To the best of our knowledge, this is the
first consistency result related to partitioning non-uniform hypergraphs.
Comment: 35 pages, 2 figures, 1 table
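The spectral approach described above can be illustrated with a minimal sketch. This is a generic clique-expansion pipeline, not necessarily the exact algorithm analyzed in the paper: each hyperedge e is replaced by a clique with pairwise weight 1/(|e| - 1), and spectral clustering is run on the resulting normalized Laplacian. The planted two-block hypergraph at the end is a toy example for illustration only.

```python
import numpy as np

def hypergraph_spectral_partition(n, hyperedges, k):
    """Cluster n nodes of a non-uniform hypergraph (hyperedges may
    have different sizes).  Sketch: clique expansion with weight
    1/(|e|-1) per hyperedge, then spectral clustering (k=2 here)
    on the normalized Laplacian."""
    A = np.zeros((n, n))
    for e in hyperedges:
        w = 1.0 / (len(e) - 1)
        for i in e:
            for j in e:
                if i != j:
                    A[i, j] += w
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                   # ascending eigenvalues
    X = vecs[:, :k]                               # k smallest eigenvectors
    # deterministic Lloyd iterations, seeded at the extremes of the
    # second (Fiedler) coordinate so the result is reproducible
    centers = X[[np.argmin(X[:, 1]), np.argmax(X[:, 1])]]
    for _ in range(50):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return labels

# two planted groups {0,1,2} and {3,4,5} joined by one weak pair edge;
# note the hyperedges have different sizes (non-uniform hypergraph)
edges = [(0, 1, 2), (0, 1), (1, 2), (3, 4, 5), (3, 4), (4, 5), (2, 3)]
labels = hypergraph_spectral_partition(6, edges, 2)
```

With the weak bridge edge (2, 3), the Fiedler coordinate of the expanded graph separates the two planted blocks cleanly.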
Domain Adaptation on Graphs by Learning Graph Topologies: Theoretical Analysis and an Algorithm
Traditional machine learning algorithms assume that the training and test
data have the same distribution, but this assumption does not necessarily
hold in real applications. Domain adaptation methods take into account the
deviations in the data distribution. In this work, we study the problem of
domain adaptation on graphs. We consider a source graph and a target graph
constructed with samples drawn from data manifolds. We study the problem of
estimating the unknown class labels on the target graph using the label
information on the source graph and the similarity between the two graphs. We
particularly focus on a setting where the target label function is learnt such
that its spectrum is similar to that of the source label function. We first
propose a theoretical analysis of domain adaptation on graphs and present
performance bounds that characterize the target classification error in terms
of the properties of the graphs and the data manifolds. We show that the
classification performance improves as the topologies of the graphs get more
balanced, i.e., as the numbers of neighbors of different graph nodes become
more proportionate, and weak edges with small weights are avoided. Our results
also suggest that graph edges between too distant data samples should be
avoided for good generalization performance. We then propose a graph domain
adaptation algorithm inspired by our theoretical findings, which estimates the
label functions while learning the source and target graph topologies at the
same time. The joint graph learning and label estimation problem is formulated
through an objective function relying on our performance bounds, which is
minimized with an alternating optimization scheme. Experiments on synthetic and
real data sets suggest that the proposed method outperforms baseline
approaches.
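The "similar spectrum" idea above can be sketched in a few lines. This is an illustrative simplification, not the paper's algorithm (which also learns the graph topologies): the source label function is expanded in the source graph Laplacian eigenbasis, and a target label function with the same spectral coefficients is synthesized in the target eigenbasis.

```python
import numpy as np

def transfer_label_function(W_s, y_s, W_t):
    """Sketch of spectrum-based label transfer between graphs: expand
    the source label function y_s in the eigenbasis of the source
    graph Laplacian, then rebuild a function with the same spectral
    coefficients in the target graph's eigenbasis."""
    def eigenbasis(W):
        L = np.diag(W.sum(axis=1)) - W        # combinatorial Laplacian
        _, U = np.linalg.eigh(L)              # orthonormal eigenvectors
        return U
    U_s, U_t = eigenbasis(W_s), eigenbasis(W_t)
    alpha = U_s.T @ y_s                        # source spectral coefficients
    return U_t @ alpha                         # target label estimate

# sanity check: with identical graphs the label function is recovered
# exactly, since U_t = U_s and the eigenbasis is orthonormal
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 1.0, -1.0, -1.0])
y_hat = transfer_label_function(W, y, W)
```

In practice the two eigenbases differ, and eigenvector sign and ordering ambiguities must be handled; the paper's joint graph-learning formulation sidesteps these issues.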
Modularity bounds for clusters located by leading eigenvectors of the normalized modularity matrix
Nodal theorems for generalized modularity matrices ensure that the cluster
located by the positive entries of the leading eigenvector of various
modularity matrices induces a connected subgraph. In this paper we obtain lower
bounds for the modularity of that set of nodes, showing that, under certain
conditions, the nodal domains induced by eigenvectors corresponding to highly
positive eigenvalues of the normalized modularity matrix have indeed positive
modularity; that is, they can be recognized as modules inside the network.
Moreover, we establish Cheeger-type inequalities for the cut-modularity of the
graph, providing theoretical support to the common understanding that highly
positive eigenvalues of modularity matrices are related to the possibility of
subdividing a network into communities.
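The quantities involved can be computed directly. The sketch below, a hedged illustration rather than the paper's bounds, forms the normalized modularity matrix M = D^{-1/2} B D^{-1/2} with B = A - d d^T / (2m), locates the set S of positive entries of the leading eigenvector, and evaluates its modularity Q(S) = (1/2m) Σ_{i,j∈S} B_ij.

```python
import numpy as np

def leading_modularity_cluster(A):
    """Locate a candidate module from the positive entries of the
    leading eigenvector of the normalized modularity matrix and
    report the modularity of that set of nodes."""
    d = A.sum(axis=1)
    two_m = d.sum()
    B = A - np.outer(d, d) / two_m             # modularity matrix
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ B @ D_inv_sqrt            # normalized modularity matrix
    _, vecs = np.linalg.eigh(M)                # ascending eigenvalues
    v = vecs[:, -1]                            # leading eigenvector
    if v.sum() < 0:                            # fix the global sign
        v = -v
    S = np.flatnonzero(v > 0)
    Q = B[np.ix_(S, S)].sum() / two_m          # modularity of the set S
    return S, Q

# toy graph: a 5-clique and a 3-clique joined by a single edge
A = np.zeros((8, 8))
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)
A[4, 5] = A[5, 4] = 1.0
S, Q = leading_modularity_cluster(A)
```

On this toy graph the positive entries pick out one of the two cliques, and the modularity of that set is positive, in line with the abstract's claim about highly positive eigenvalues.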
Spectral Embedding Norm: Looking Deep into the Spectrum of the Graph Laplacian
The extraction of clusters from a dataset which includes multiple clusters
and a significant background component is a non-trivial task of practical
importance. In image analysis this manifests for example in anomaly detection
and target detection. The traditional spectral clustering algorithm, which
relies on the leading eigenvectors to detect clusters, fails in such
cases. In this paper we propose the {\it spectral embedding norm}, which sums
the squared values of the first $I$ normalized eigenvectors, where $I$ can be
significantly larger than the number of clusters $K$. We prove that this
quantity can be used to separate clusters from the background in unbalanced
settings, including extreme cases such as outlier detection. The performance
of the algorithm is not sensitive to the choice of $I$, and we demonstrate its
application on synthetic and real-world remote sensing and neuroimaging
datasets.
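A minimal sketch of the norm follows; the normalization convention here (eigenvectors of the symmetrically normalized Laplacian) is an assumption and may differ in detail from the paper's. Each node's score is the sum of squared entries of the first $I$ eigenvectors, and on an unbalanced toy affinity a tight cluster scores above a diffuse background.

```python
import numpy as np

def spectral_embedding_norm(A, I):
    """Per-node spectral embedding norm: sum of squared entries of
    the first I eigenvectors (smallest eigenvalues) of the
    symmetrically normalized graph Laplacian built from affinity A."""
    n = A.shape[0]
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                   # ascending eigenvalues
    low = vecs[:, :I]                             # first I eigenvectors
    return (low ** 2).sum(axis=1)

# one tight 5-node cluster against a diffuse 10-node background
n_c, n_b = 5, 10
A = np.full((n_c + n_b, n_c + n_b), 0.001)  # weak cluster-background ties
A[:n_c, :n_c] = 1.0                          # dense cluster
A[n_c:, n_c:] = 0.05                         # diffuse background
np.fill_diagonal(A, 0.0)
s = spectral_embedding_norm(A, I=2)          # cluster nodes score higher
```

Because the score is the diagonal of the projector onto the span of the first $I$ eigenvectors, it is stable under rotations within near-degenerate eigenvalue groups, which is one reason the choice of $I$ is not critical.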