Multi-Scale Link Prediction
The automated analysis of social networks has become an important problem due
to the proliferation of social networks, such as LiveJournal, Flickr and
Facebook. The scale of these social networks is massive and continues to grow
rapidly. An important problem in social network analysis is proximity
estimation that infers the closeness of different users. Link prediction, in
turn, is an important application of proximity estimation. However, many
methods for computing proximity measures have high computational complexity and
are thus prohibitive for large-scale link prediction problems. One way to
address this problem is to estimate proximity measures via low-rank
approximation. However, a single low-rank approximation may not be sufficient
to represent the behavior of the entire network. In this paper, we propose
Multi-Scale Link Prediction (MSLP), a framework for link prediction, which can
handle massive networks. The basic idea of MSLP is to construct low-rank
approximations of the network at multiple scales in an efficient manner. Based
on this approach, MSLP combines predictions at multiple scales to make robust
and accurate predictions. Experimental results on real-life datasets with more
than a million nodes show the superior performance and scalability of our
method.
Comment: 20 pages, 10 figures
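
As an illustration of estimating a proximity measure through low-rank approximation, the following minimal sketch scores node pairs with a Katz-style measure built from only the top-k eigenpairs of the adjacency matrix. It is not the MSLP algorithm itself, since the abstract does not describe the multi-scale construction. The choice of the Katz measure, the helper names `top_eigenpairs` and `katz_low_rank`, the toy graph, and the parameters `k` and `beta` are all assumptions made for illustration.

```python
import math
import random


def top_eigenpairs(A, k, iters=2000, seed=0):
    """Return the k algebraically largest eigenpairs of a symmetric matrix A.

    Uses power iteration with deflation (an illustrative choice; a real
    system would use a sparse eigensolver).
    """
    n = len(A)
    # Shifting by more than the largest absolute row sum makes A + shift*I
    # positive definite (Gershgorin), so power iteration converges to the
    # algebraically largest eigenvalue of A.
    shift = max(sum(abs(x) for x in row) for row in A) + 1.0
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(n)) + shift * v[i]
                 for i in range(n)]
            # Deflation: project out eigenvectors that were already found.
            for _, u in pairs:
                c = sum(wi * ui for wi, ui in zip(w, u))
                w = [wi - c * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient with the unshifted matrix gives the eigenvalue.
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(n))
                  for i in range(n))
        pairs.append((lam, v))
    return pairs


def katz_low_rank(A, k, beta):
    """Approximate Katz scores sum_{l>=1} beta^l A^l using k eigenpairs.

    Assumes beta < 1 / (largest eigenvalue of A) so the series converges.
    """
    n = len(A)
    scores = [[0.0] * n for _ in range(n)]
    for lam, v in top_eigenpairs(A, k):
        weight = beta * lam / (1.0 - beta * lam)
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * v[i] * v[j]
    return scores


# Toy graph (hypothetical): triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
scores = katz_low_rank(adj, k=2, beta=0.2)

# Rank currently unlinked pairs by their approximate proximity.
candidates = [(i, j) for i in range(6) for j in range(i + 1, 6)
              if not adj[i][j]]
ranked = sorted(candidates, key=lambda p: scores[p[0]][p[1]], reverse=True)
```

The point of the sketch is that the score matrix is never formed from powers of the full adjacency matrix. Only `k` eigenpairs are needed, which is what makes this family of estimators attractive for large graphs.
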
Correlation Clustering with Low-Rank Matrices
Correlation clustering is a technique for aggregating data based on
qualitative information about which pairs of objects are labeled 'similar' or
'dissimilar.' Because the optimization problem is NP-hard, much of the previous
literature focuses on finding approximation algorithms. In this paper we
explore how to solve the correlation clustering objective exactly when the data
to be clustered can be represented by a low-rank matrix. We prove in particular
that correlation clustering can be solved in polynomial time when the
underlying matrix is positive semidefinite with small constant rank, but that
the task remains NP-hard in the presence of even one negative eigenvalue. Based
on our theoretical results, we develop an algorithm for efficiently "solving"
low-rank positive semidefinite correlation clustering by employing a procedure
for zonotope vertex enumeration. We demonstrate the effectiveness and speed of
our algorithm by using it to solve several clustering problems on both
synthetic and real-world data.
Low-rank Similarity Measure for Role Model Extraction
Computing meaningful clusters of nodes is crucial for analyzing large networks.
In this paper, we present a pairwise node similarity measure that allows us to
extract roles, i.e., groups of nodes sharing similar flow patterns within a
network. We propose a low-rank iterative scheme to approximate the similarity
measure for very large networks. Finally, we show that our low-rank similarity
score successfully extracts the different roles in random graphs and that its
performance is similar to that of the pairwise similarity measure.
Comment: 7 pages, 2 columns, 4 figures, conference paper for MTNS201
Optimization via Low-rank Approximation for Community Detection in Networks
Community detection is one of the fundamental problems of network analysis,
for which a number of methods have been proposed. Most model-based or
criteria-based methods have to solve an optimization problem over a discrete
set of labels to find communities, which is computationally infeasible. Some
fast spectral algorithms have been proposed for specific methods or models, but
only on a case-by-case basis. Here we propose a general approach for maximizing
a function of a network adjacency matrix over discrete labels by projecting the
set of labels onto a subspace approximating the leading eigenvectors of the
expected adjacency matrix. This projection onto a low-dimensional space makes
the feasible set of labels much smaller and the optimization problem much
easier. We prove a general result about this method and show how to apply it to
several previously proposed community detection criteria, establishing its
consistency for label estimation in each case and demonstrating the fundamental
connection between spectral properties of the network and various model-based
approaches to community detection. Simulations and applications to real-world
data are included to demonstrate our method performs well for multiple problems
over a wide range of parameters.
Comment: 45 pages, 7 figures; added discussions about computational complexity
and extension to more than two communities
Stability of graph communities across time scales
The complexity of biological, social and engineering networks makes it
desirable to find natural partitions into communities that can act as
simplified descriptions and provide insight into the structure and function of
the overall system. Although community detection methods abound, there is a
lack of consensus on how to quantify and rank the quality of partitions. We
show here that the quality of a partition can be measured in terms of its
stability, defined in terms of the clustered autocovariance of a Markov process
taking place on the graph. Because the stability has an intrinsic dependence on
time scales of the graph, it allows us to compare and rank partitions at each
time and also to establish the time spans over which partitions are optimal.
Hence the Markov time acts effectively as an intrinsic resolution parameter
that establishes a hierarchy of increasingly coarse clusterings. Within our
framework we can then provide a unifying view of several standard partitioning
measures: modularity and normalized cut size can be interpreted as one-step
time measures, whereas Fiedler's spectral clustering emerges at long times. We
apply our method to characterize the relevance and persistence of partitions
over time for constructive and real networks, including hierarchical graphs and
social networks. We also obtain reduced descriptions for atomic level protein
structures over different time scales.
Comment: submitted; updated bibliography from v
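
The stability measure described in this abstract can be sketched in a few lines. The example below assumes a discrete-time random walk on an undirected, unweighted graph and computes stability as the sum, over node pairs in the same community, of the autocovariance term pi_i (M^t)_ij - pi_i pi_j. The function name `markov_stability`, the toy graph, and the partitions are hypothetical and chosen only for illustration.

```python
def markov_stability(adj, labels, t):
    """Stability of a partition at discrete Markov time t.

    Sums pi_i * (M^t)_ij - pi_i * pi_j over node pairs (i, j) that share a
    community, where M is the random-walk transition matrix and pi is its
    stationary distribution (degree / total degree).
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    total_degree = sum(degree)
    pi = [d / total_degree for d in degree]
    M = [[adj[i][j] / degree[i] for j in range(n)] for i in range(n)]
    # P = M^t by repeated multiplication (fine for small graphs).
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        P = [[sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return sum(pi[i] * P[i][j] - pi[i] * pi[j]
               for i in range(n) for j in range(n)
               if labels[i] == labels[j])


# Toy graph (hypothetical): triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
two_triangles = [0, 0, 0, 1, 1, 1]
alternating = [0, 1, 0, 1, 0, 1]

# Stability of each partition as the Markov time grows.
curve_good = [markov_stability(adj, two_triangles, t) for t in range(1, 6)]
curve_bad = [markov_stability(adj, alternating, t) for t in range(1, 6)]
```

At `t = 1` this quantity coincides with Newman-Girvan modularity, which matches the abstract's remark that modularity is a one-step measure. Comparing the two curves shows how partitions can be ranked at each Markov time.
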
Regression and Singular Value Decomposition in Dynamic Graphs
Most real-world graphs are dynamic, i.e., they change over time.
However, while problems such as regression and Singular Value Decomposition
(SVD) have been studied for static graphs, they have not yet been
investigated for dynamic graphs. In this paper, we introduce,
motivate and study regression and SVD over dynamic graphs. First, we present
the notion of an update-efficient matrix embedding, which defines the
conditions sufficient for a matrix embedding to be used for the dynamic graph
regression problem (under norm). We prove that given an
update-efficient matrix embedding (e.g., adjacency matrix), after an update
operation in the graph, the optimal solution of the graph regression problem
for the revised graph can be computed in time. We also study dynamic
graph regression under least absolute deviation. Then, we characterize a class
of matrix embeddings that can be used to efficiently update the SVD of a dynamic
graph. For the adjacency matrix and the Laplacian matrix, we study those graph
update operations for which the SVD (and low-rank approximation) can be updated
efficiently.
Spatial Random Sampling: A Structure-Preserving Data Sketching Tool
Random column sampling is not guaranteed to yield data sketches that preserve
the underlying structures of the data and may not sample sufficiently from
less-populated data clusters. Also, adaptive sampling can often provide
accurate low rank approximations, yet may fall short of producing descriptive
data sketches, especially when the cluster centers are linearly dependent.
Motivated by that, this paper introduces a novel randomized column sampling
tool dubbed Spatial Random Sampling (SRS), in which data points are sampled
based on their proximity to randomly sampled points on the unit sphere. The
most compelling feature of SRS is that the corresponding probability of
sampling from a given data cluster is proportional to the surface area the
cluster occupies on the unit sphere, independently from the size of the cluster
population. Although it is fully randomized, SRS is shown to provide
descriptive and balanced data representations. The proposed idea addresses a
pressing need in data science and holds potential to inspire many novel
approaches for the analysis of big data.
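
The sampling idea in this abstract can be illustrated with a minimal sketch. It assumes that "proximity" means the largest inner product between a unit-normalized data point and a random direction, and that all data points are nonzero. The function name `spatial_random_sampling`, the toy data, and the parameters are hypothetical, so this should be read as one plausible reading of the abstract rather than the authors' implementation.

```python
import math
import random


def spatial_random_sampling(points, num_samples, seed=0):
    """Sample point indices by proximity to random directions on the sphere.

    For each sample, draw a random direction (a Gaussian vector, whose
    direction is uniform on the unit sphere) and pick the data point whose
    unit-normalized version has the largest inner product with it.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Project every data point onto the unit sphere (assumes nonzero points).
    unit = []
    for p in points:
        norm = math.sqrt(sum(x * x for x in p))
        unit.append([x / norm for x in p])
    chosen = []
    for _ in range(num_samples):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        best = max(range(len(unit)),
                   key=lambda i: sum(a * b for a, b in zip(unit[i], direction)))
        chosen.append(best)
    return chosen


# Toy data (hypothetical): a dense cluster of 50 points near angle 0 and a
# single isolated point at angle pi.
angles = [-0.05 + 0.1 * i / 49 for i in range(50)]
points = [(math.cos(a), math.sin(a)) for a in angles]
points.append((-1.0, 0.0))

sampled = spatial_random_sampling(points, num_samples=200, seed=0)
rare_count = sampled.count(len(points) - 1)
```

Uniform sampling over the 51 points would rarely pick the isolated point. Here it is selected roughly in proportion to the arc of the circle it is closest to, regardless of how many points sit in the dense cluster.
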