A Method Based on Total Variation for Network Modularity Optimization using the MBO Scheme
The study of network structure is pervasive in sociology, biology, computer
science, and many other disciplines. One of the most important areas of network
science is the algorithmic detection of cohesive groups of nodes called
"communities". One popular approach to find communities is to maximize a
quality function known as modularity to achieve some sort of optimal
clustering of nodes. In this paper, we interpret the modularity function from a
novel perspective: we reformulate modularity optimization as a minimization
problem of an energy functional that consists of a total variation term and a
balance term. By employing numerical techniques from image processing
and compressive sensing -- such as convex splitting and the
Merriman-Bence-Osher (MBO) scheme -- we develop a variational algorithm for the
minimization problem. We present our computational results using both synthetic
benchmark networks and real data. Comment: 23 pages
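For orientation, the modularity quality function that this abstract and several below refer to can be evaluated directly for a given partition. The following is a minimal sketch of the standard Newman-Girvan formula, not the variational algorithm of the paper; the function name `modularity` and the list-of-lists adjacency format are assumptions made for illustration.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity of a partition of an undirected graph.

    adj    : symmetric adjacency matrix as a list of lists (weights allowed)
    labels : labels[i] is the community index of node i
    (Both the name and the input format are illustrative assumptions.)
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected under the
                # degree-preserving null model
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(modularity(two_triangles, [0, 0, 0, 1, 1, 1]))  # the natural split scores 5/14
```

Placing every node in one community gives modularity zero, which is why the split into the two triangles is preferred.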
Simplified Energy Landscape for Modularity Using Total Variation
Networks capture pairwise interactions between entities and are frequently
used in applications such as social networks, food networks, and protein
interaction networks, to name a few. Communities, cohesive groups of nodes,
often form in these applications, and identifying them gives insight into the
overall organization of the network. One common quality function used to
identify community structure is modularity. In Hu et al. [SIAM J. App. Math.,
73(6), 2013], it was shown that modularity optimization is equivalent to
minimizing a particular nonconvex total variation (TV) based functional over a
discrete domain. They solve this problem, assuming the number of communities is
known, using a Merriman, Bence, Osher (MBO) scheme.
We show that modularity optimization is equivalent to minimizing a convex
TV-based functional over a discrete domain, again, assuming the number of
communities is known. Furthermore, we show that modularity has no convex
relaxation satisfying certain natural conditions. We therefore find a
manageable non-convex approximation using a Ginzburg-Landau functional, which
provably converges to the correct energy in the limit of a certain parameter.
We then derive an MBO algorithm with fewer hand-tuned parameters than in Hu et
al. and which is 7 times faster at solving the associated diffusion equation
due to the fact that the underlying discretization is unconditionally stable.
Our numerical tests include a hyperspectral video whose associated graph has
2.9x10^7 edges, which is roughly 37 times larger than was handled in the paper
of Hu et al. Comment: 25 pages, 3 figures, 3 tables, submitted to SIAM J. App. Math.
Stochastic Block Models are a Discrete Surface Tension
Networks, which represent agents and interactions between them, arise in
myriad applications throughout the sciences, engineering, and even the
humanities. To understand large-scale structure in a network, a common task is
to cluster a network's nodes into sets called "communities", such that there
are dense connections within communities but sparse connections between them. A
popular and statistically principled method to perform such clustering is to
use a family of generative models known as stochastic block models (SBMs). In
this paper, we show that maximum likelihood estimation in an SBM is a network
analog of a well-known continuum surface-tension problem that arises from an
application in metallurgy. To illustrate the utility of this relationship, we
implement network analogs of three surface-tension algorithms, with which we
successfully recover planted community structure in synthetic networks and
which yield fascinating insights on empirical networks that we construct from
hyperspectral videos. Comment: to appear in Journal of Nonlinear Science
Multiclass Data Segmentation using Diffuse Interface Methods on Graphs
We present two graph-based algorithms for multiclass segmentation of
high-dimensional data. The algorithms use a diffuse interface model based on
the Ginzburg-Landau functional, related to total variation compressed sensing
and image processing. A multiclass extension is introduced using the Gibbs
simplex, with the functional's double-well potential modified to handle the
multiclass case. The first algorithm minimizes the functional using a convex
splitting numerical scheme. The second algorithm uses a graph adaptation
of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates
between diffusion and thresholding. We demonstrate the performance of both
algorithms experimentally on synthetic data, grayscale and color images, and
several benchmark data sets such as MNIST, COIL and WebKB. We also make use of
fast numerical solvers for finding the eigenvectors and eigenvalues of the
graph Laplacian, and take advantage of the sparsity of the matrix. Experiments
indicate that the results are competitive with or better than the current
state-of-the-art multiclass segmentation algorithms. Comment: 14 pages
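The alternation between diffusion and thresholding described in this abstract can be sketched on a small graph. This is an illustrative toy, not the authors' implementation: it replaces their eigenvector-based solver with plain explicit Euler steps of the graph heat equation, and it imposes known labels by hard clamping rather than a fidelity term. The names `mbo_segment`, `known`, `dt`, `n_diffuse`, and `n_iter` are all hypothetical.

```python
def mbo_segment(adj, known, n_classes, dt=0.1, n_diffuse=5, n_iter=10):
    """Toy graph MBO iteration: diffuse, then threshold to a simplex vertex.

    adj       : symmetric adjacency matrix (list of lists)
    known     : dict {node: class} of labelled nodes (assumed hard constraints)
    dt        : explicit Euler step; assumed small enough for stability
                (dt < 1 / max degree is sufficient)
    """
    n = len(adj)
    degree = [sum(row) for row in adj]

    def clamp(u):
        # reset labelled nodes to their one-hot class indicator
        for i, c in known.items():
            u[i] = [1.0 if k == c else 0.0 for k in range(n_classes)]

    # start unlabelled nodes at the centre of the Gibbs simplex
    u = [[1.0 / n_classes] * n_classes for _ in range(n)]
    clamp(u)
    for _ in range(n_iter):
        # diffusion: a few steps of u <- u - dt * L u with L = D - A
        for _ in range(n_diffuse):
            u = [
                [
                    u[i][k]
                    - dt * (degree[i] * u[i][k]
                            - sum(adj[i][j] * u[j][k] for j in range(n)))
                    for k in range(n_classes)
                ]
                for i in range(n)
            ]
            clamp(u)
        # thresholding: project each row onto the nearest simplex vertex
        winners = [max(range(n_classes), key=row.__getitem__) for row in u]
        u = [[1.0 if k == w else 0.0 for k in range(n_classes)] for w in winners]
        clamp(u)
    return [row.index(1.0) for row in u]


# Two triangles joined by the edge 2-3, with one labelled node per triangle.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mbo_segment(two_triangles, known={0: 0, 5: 1}, n_classes=2))
```

On this example the labels spread from nodes 0 and 5 to the rest of their respective triangles. An explicit scheme is used only to keep the sketch dependency-free; it does not scale the way a spectral or implicit solver would.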
Community detection in networks via nonlinear modularity eigenvectors
Revealing a community structure in a network or dataset is a central problem
arising in many scientific areas. The modularity function is an established
measure of the quality of a community, where a community is identified as a set
of nodes having high modularity. In our terminology, a set of nodes with positive
modularity is called a module, and a set that maximizes modularity is thus
called a leading module. Finding a leading module in a network is an
important task; however, the dimension of real-world problems makes the
maximization of modularity infeasible. This creates the need for approximation
techniques, which are typically based on a linear relaxation of modularity
induced by the spectrum of the modularity matrix. In this work we propose a
nonlinear relaxation which is instead based on the spectrum of a nonlinear
modularity operator. We show that the extremal eigenvalues of this operator
provide an exact relaxation of the modularity measure, at the price
of being more challenging to compute than those of the modularity matrix. We
thus extend the work done on nonlinear Laplacians by proposing a computational
scheme, named generalized RatioDCA, to compute such extremal eigenvalues. We show
monotonic ascent and convergence of the method. We finally apply the new method
to several synthetic and real-world data sets, showing both the effectiveness of
the model and the performance of the method.
Escape times for subgraph detection and graph partitioning
We provide a rearrangement based algorithm for fast detection of subgraphs of
vertices with long escape times for directed or undirected networks.
Complementing other notions of densest subgraphs and graph cuts, our method is
based on the mean hitting time required for a random walker to leave a
designated set and hit the complement. We provide a new relaxation of this
notion of hitting time on a given subgraph and use that relaxation to construct
a fast subgraph detection algorithm and a generalization to K-partitioning
schemes. Using a modification of the subgraph detector on each component, we
propose a graph partitioner that identifies regions where random walks live for
comparably large times. Importantly, our method implicitly respects the
directed nature of the data for directed graphs while also being applicable to
undirected graphs. We apply the partitioning method for community detection to
a large class of model and real-world data sets. Comment: 22 pages, 10 figures, 1 table, comments welcome!
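The quantity underlying this abstract, the mean time for a random walker to leave a designated set, can be computed exactly on small graphs. The sketch below is not the paper's relaxation or rearrangement algorithm; it simply solves the standard first-step equations h(i) = 1 + sum over j in S of P(i, j) h(j) by fixed-point iteration. The function name `mean_escape_times` is hypothetical, and the code assumes every node in the set has at least one outgoing edge and that the walker can eventually leave the set.

```python
def mean_escape_times(adj, subset, tol=1e-12, max_iter=100000):
    """Expected number of steps for a random walk to leave `subset`.

    adj[i][j] is the weight of the edge i -> j (directed or undirected).
    Returns a dict {start node in subset: mean escape time}.
    """
    nodes = sorted(subset)
    out_weight = {i: sum(adj[i]) for i in nodes}
    h = {i: 0.0 for i in nodes}
    for _ in range(max_iter):
        # first-step analysis: one step now, plus the remaining time
        # if the walker lands on another node of the subset
        new_h = {
            i: 1.0 + sum(adj[i][j] * h[j] for j in nodes) / out_weight[i]
            for i in nodes
        }
        change = max(abs(new_h[i] - h[i]) for i in nodes)
        h = new_h
        if change < tol:
            break
    return h


# Two triangles joined by the edge 2-3; how long does a walker stay in {0,1,2}?
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mean_escape_times(two_triangles, {0, 1, 2}))
```

Solving the three first-step equations by hand gives escape times of 9 steps from nodes 0 and 1 and 7 steps from node 2, the only node with an edge leaving the set.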
Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization
Geographically annotated social media is extremely valuable for modern
information retrieval. However, when researchers can only access
publicly-visible data, one quickly finds that social media users rarely publish
location information. In this work, we provide a method which can geolocate the
overwhelming majority of active Twitter users, independent of their location
sharing preferences, using only publicly-visible Twitter data.
Our method infers an unknown user's location by examining their friends'
locations. We frame the geotagging problem as an optimization over a social
network with a total variation-based objective and provide a scalable and
distributed algorithm for its solution. Furthermore, we show how a robust
estimate of the geographic dispersion of each user's ego network can be used as
a per-user accuracy measure which is effective at removing outlying errors.
Leave-many-out evaluation shows that our method is able to infer location for
101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag
over 80% of public tweets. Comment: 9 pages, 8 figures, accepted to IEEE BigData 2014. Compton, Ryan,
David Jurgens, and David Allen. "Geotagging one hundred million twitter
accounts with total variation minimization." Big Data (Big Data), 2014 IEEE
International Conference on. IEEE, 2014
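The idea of inferring a user's location from friends' locations can be illustrated with a simple propagation heuristic. This is a sketch under stated assumptions, not the distributed algorithm of the paper: each unlocated user is assigned the friend location that minimizes the summed great-circle distance to all located friends (a medoid, used here as a stand-in for a robust median). The names `haversine_km` and `propagate_locations` and the dictionary input format are hypothetical.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def propagate_locations(friends, known, n_rounds=5):
    """Assign locations to unlocated users from their located friends.

    friends : dict {user: list of friend ids}
    known   : dict {user: (lat, lon)} of users with a known location (kept fixed)
    """
    loc = dict(known)
    for _ in range(n_rounds):
        updates = {}
        for user, nbrs in friends.items():
            if user in known:
                continue
            candidates = [loc[v] for v in nbrs if v in loc]
            if not candidates:
                continue
            # medoid: the friend location closest, in total, to all the others
            updates[user] = min(
                candidates,
                key=lambda c: sum(haversine_km(c, o) for o in candidates),
            )
        loc.update(updates)  # apply all updates at once, one round at a time
    return loc


known = {
    "a": (34.05, -118.24),  # two friends near Los Angeles
    "b": (34.06, -118.25),
    "c": (40.71, -74.00),   # one friend near New York
}
friends = {"x": ["a", "b", "c"], "y": ["x"]}
print(propagate_locations(friends, known))
```

In this example user x is placed at one of the two nearby friends rather than being pulled toward the distant one, and user y inherits x's estimate in the following round.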
Total variation based community detection using a nonlinear optimization approach
Maximizing the modularity of a network is a successful tool to identify an
important community of nodes. However, this combinatorial optimization problem
is known to be NP-hard. Inspired by recent nonlinear modularity eigenvector
approaches, we introduce the modularity total variation and show that
its box-constrained global maximum coincides with the maximum of the original
discrete modularity function. Thus we describe a new nonlinear optimization
approach to solve the equivalent problem leading to a community detection
strategy based on the modularity total variation. The proposed approach relies
on the use of a fast
first-order method that embeds a tailored active-set strategy. We report
extensive numerical comparisons with standard matrix-based approaches and the
Generalized RatioDCA approach for nonlinear modularity eigenvectors, showing
that our new method compares favourably with state-of-the-art alternatives.
An MBO scheme for clustering and semi-supervised clustering of signed networks
We introduce a principled method for the signed clustering problem, where the goal is to partition a weighted undirected graph whose edge weights take both positive and negative values, such that edges within the same cluster are mostly positive, while edges spanning across clusters are
mostly negative. Our method relies on a graph-based diffuse interface model formulation utilizing the Ginzburg–Landau functional, based on an adaptation of the classic numerical Merriman–Bence–Osher (MBO) scheme for minimizing such graph-based functionals. The proposed objective function aims to minimize the total weight of inter-cluster positively-weighted edges, while maximizing the total weight of the inter-cluster negatively-weighted edges. Our method scales to large sparse networks, and can be easily adjusted to incorporate labelled data information, as is often the case in the context of semisupervised learning. We tested our method on a number of both synthetic stochastic block models and real-world data sets (including financial correlation matrices), and obtained promising results that compare favourably against a number of state-of-the-art approaches from the recent literature
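The two quantities that the objective in this abstract trades off can be made concrete with a small helper. This is only an evaluation sketch for a given partition, not the MBO scheme itself; the name `signed_cut_weights` and the dense weight-matrix input are assumptions made for illustration.

```python
def signed_cut_weights(weights, labels):
    """Return (positive, negative) weight carried by inter-cluster edges.

    weights : symmetric matrix of signed edge weights (list of lists)
    labels  : labels[i] is the cluster of node i
    A good signed clustering makes the first value small and the second large.
    """
    pos_cut = 0.0
    neg_cut = 0.0
    n = len(weights)
    for i in range(n):
        for j in range(i + 1, n):  # each undirected edge counted once
            if labels[i] != labels[j]:
                w = weights[i][j]
                if w > 0:
                    pos_cut += w
                else:
                    neg_cut += -w
    return pos_cut, neg_cut


# Four nodes: strong positive ties 0-1 and 2-3, negative ties across the pairs.
signed = [
    [0.0, 1.0, -1.0, 0.0],
    [1.0, 0.0, 0.5, -3.0],
    [-1.0, 0.5, 0.0, 2.0],
    [0.0, -3.0, 2.0, 0.0],
]
print(signed_cut_weights(signed, [0, 0, 1, 1]))  # (0.5, 4.0)
print(signed_cut_weights(signed, [0, 1, 0, 1]))  # (3.5, 0.0)
```

The partition {0, 1} versus {2, 3} cuts little positive weight and all of the negative weight, so it is the better of the two under this objective.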