Streaming, Memory Limited Algorithms for Community Detection
In this paper, we consider sparse networks consisting of a finite number of
non-overlapping communities, i.e. disjoint clusters, so that there is higher
density within clusters than across clusters. Both the intra- and inter-cluster
edge densities vanish when the size of the graph grows large, making the
cluster reconstruction problem noisier and hence difficult to solve. We are
interested in scenarios where the network size is very large, so that the
adjacency matrix of the graph is hard to manipulate and store. The data stream
model in which columns of the adjacency matrix are revealed sequentially
constitutes a natural framework in this setting. For this model, we develop two
novel clustering algorithms that extract the clusters asymptotically
accurately. The first algorithm is offline, as it needs to store and keep
the assignments of nodes to clusters, and requires a memory that scales
linearly with the network size. The second algorithm is online, as it may
classify a node when the corresponding column is revealed and then discard this
information. This algorithm requires a memory growing sub-linearly with the
network size. To construct these efficient streaming memory-limited clustering
algorithms, we first address the problem of clustering with partial
information, where only a small proportion of the columns of the adjacency
matrix is observed and develop, for this setting, a new spectral algorithm
which is of independent interest. Comment: NIPS 201
Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions
We present a new approach for solving (minimum disagreement) correlation
clustering that results in sublinear algorithms with highly efficient time and
space complexity for this problem. In particular, we obtain the following
algorithms for n-vertex (+/-)-labeled graphs G:
-- A sublinear-time algorithm that with high probability returns a constant
approximation clustering of G in O(n log^2 n) time assuming access to the
adjacency list of the (+)-labeled edges of G (this is almost quadratically
faster than even reading the input once). Previously, no sublinear-time
algorithm was known for this problem with any multiplicative approximation
guarantee.
-- A semi-streaming algorithm that with high probability returns a constant
approximation clustering of G in O(n log n) space and a single pass over
the edges of the graph (this memory is almost quadratically smaller than the
input size). Previously, no single-pass algorithm with o(n^2) space was known
for this problem with any approximation guarantee.
The main ingredient of our approach is a novel connection to sparse-dense
graph decompositions that are used extensively in the graph coloring
literature. To our knowledge, this connection is the first application of these
decompositions beyond graph coloring, and in particular for the correlation
clustering problem, and may be of independent interest.
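As a point of reference for the minimum-disagreement objective named in the abstract above, the following is a minimal sketch of how the cost of a given clustering could be evaluated. It is not the sublinear algorithm from the paper: it is a brute-force check over all pairs, and the function name `disagreements` and its arguments are hypothetical. It assumes a complete labeling in which every pair not listed as (+) is treated as (-).

```python
from itertools import combinations


def disagreements(n, positive_edges, cluster_of):
    """Count disagreements of a clustering on a (+/-)-labeled graph.

    Assumption: vertices are 0..n-1 and every pair that is not listed in
    positive_edges carries a (-) label.
    """
    positive = {frozenset(edge) for edge in positive_edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        # A disagreement is a (+) pair that is split across clusters
        # or a (-) pair that is placed in the same cluster.
        if is_positive != same_cluster:
            cost += 1
    return cost
```

This check reads all pairs and so takes quadratic time, which is exactly the cost the sublinear algorithms described above avoid.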
Streaming, Local, and MultiLevel (Hyper)Graph Decomposition
(Hyper)Graph decomposition is a family of problems that aim to break down large (hyper)graphs into smaller sub(hyper)graphs for easier analysis. The importance of this lies in its ability to enable efficient computation on large and complex (hyper)graphs, such as social networks, chemical compounds, and computer networks. This dissertation explores several types of (hyper)graph decomposition problems, including graph partitioning, hypergraph partitioning, local graph clustering, process mapping, and signed graph clustering. Our main focus is on streaming algorithms, local algorithms and multilevel algorithms. In terms of streaming algorithms, we make contributions with highly efficient and effective algorithms for (hyper)graph partitioning and process mapping. In terms of local algorithms, we propose sub-linear algorithms which are effective in detecting high-quality local communities around a given seed node in a graph based on the distribution of a given motif. In terms of multilevel algorithms, we engineer high-quality multilevel algorithms for process mapping and signed graph clustering. We provide a thorough discussion of each algorithm along with experimental results demonstrating their superiority over existing state-of-the-art techniques.
The results show that the proposed algorithms achieve improved performance and better solutions in various metrics, making them highly promising for practical applications. Overall, this dissertation showcases the effectiveness of advanced combinatorial algorithmic techniques in solving challenging (hyper)graph decomposition problems.
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a more thorough
and complete understanding of network sampling is critical to support the field
of network science. In this paper, we outline a framework for the general
problem of network sampling, by highlighting the different objectives,
population and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms.
Analyzing Massive Graphs in the Semi-streaming Model
Massive graphs arise in many scenarios, for example,
traffic data analysis in large networks, large scale scientific
experiments, and clustering of large data sets.
The semi-streaming model was proposed for processing massive graphs. In the semi-streaming model, we have a
random-access memory whose size is near-linear in the number of vertices.
The input graph (or equivalently, edges in the graph)
is presented as a sequential list of edges (insertion-only model)
or edge insertions and deletions (dynamic model). The list
is read-only but we may make multiple passes over the list.
There have been a few results in the insertion-only model
such as computing distance spanners and approximating
the maximum matching.
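To make the insertion-only semi-streaming setting concrete, here is a minimal sketch of the classic greedy one-pass matching, which keeps only the set of matched vertices and so uses memory linear in the number of vertices. It is a textbook baseline, not an algorithm from this thesis; the function name `greedy_matching` is hypothetical. The result is a maximal matching, which is known to contain at least half as many edges as a maximum matching.

```python
def greedy_matching(edge_stream):
    """One pass over an insertion-only edge stream, O(n) memory.

    Keeps an edge whenever both of its endpoints are still unmatched.
    """
    matched = set()   # vertices already covered by the matching
    matching = []     # edges kept so far
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

The outcome depends on arrival order: on the path 1-2-3-4 the stream (1,2), (2,3), (3,4) yields two edges, while (2,3), (1,2), (3,4) yields only one.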
In this thesis, we present some algorithms and techniques
for (i) solving more complex problems in the semi-streaming model,
(for example, problems in the dynamic model) and (ii) having
better solutions for the problems which have been studied
(for example, the maximum matching problem). In the course of both
of these, we develop new techniques with broad applications and
explore the rich trade-offs between the complexity of models
(insertion-only streams vs. dynamic streams), the number
of passes, space, accuracy, and running time.
1. We initiate the study of dynamic graph streams.
We start with basic problems such as the connectivity
problem and computing the minimum spanning tree.
These problems are
trivial in the insertion-only model. However, they require
non-trivial (and multiple passes for computing the exact minimum
spanning tree) algorithms in the
dynamic model.
2. Second, we present a graph sparsification algorithm in the
semi-streaming model. A graph sparsification
is a sparse graph that approximately preserves
all the cut values of a graph.
Such a graph acts as an oracle for solving cut-related problems,
for example, the minimum cut problem and the multicut problem.
Our algorithm produces a graph sparsification with high probability
in one pass.
3. Third, we use primal-dual algorithms
to develop semi-streaming algorithms.
The primal-dual algorithms have been widely accepted
as a framework for solving linear programs
and semidefinite programs faster.
In contrast, we apply the method for reducing space and
number of passes in addition to reducing the running time.
We also present some examples that arise in applications
and show how to apply the techniques:
the multicut problem, the correlation clustering problem,
and the maximum matching problem. As a consequence,
we also develop near-linear time algorithms for the b-matching
problems which were not known before.
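The remark in item 1 that connectivity is easy in the insertion-only model can be illustrated with a short sketch: a union-find structure over the vertices processes each arriving edge once and stores only one parent pointer per vertex. This is a standard technique, not code from the thesis, and the name `count_components` is hypothetical. It does not handle deletions, which is the case the thesis describes as non-trivial.

```python
def count_components(n, edge_stream):
    """Count connected components from an insertion-only edge stream.

    Assumption: vertices are labeled 0..n-1. Memory is O(n).
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in edge_stream:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v   # merge the two components
            components -= 1
    return components
```

An edge deletion cannot be undone in this structure, because a merge discards the information about which edge caused it.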
Online Inference for Mixture Model of Streaming Graph Signals with Non-White Excitation
This paper considers a joint multi-graph inference and clustering problem for
simultaneous inference of node centrality and association of graph signals with
their graphs. We study a mixture model of filtered low-pass graph signals with
possibly non-white and low-rank excitation. While the mixture model is
motivated by practical scenarios, it presents significant challenges to prior
graph learning methods. As a remedy, we consider an inference problem focusing
on the node centrality of graphs. We design an expectation-maximization (EM)
algorithm with a unique low-rank plus sparse prior derived from the low-pass signal
property. We propose a novel online EM algorithm for inference from streaming
data. As an example, we extend the online algorithm to detect if the signals
are generated from an abnormal graph. We show that the proposed algorithms
converge to a stationary point of the maximum a posteriori (MAP) problem.
Numerical experiments support our analysis.
Graph Sample and Hold: A Framework for Big-Graph Analytics
Sampling is a standard approach in big-graph analytics; the goal is to
efficiently estimate the graph properties by consulting a sample of the whole
population. A perfect sample is assumed to mirror every property of the whole
population. Unfortunately, such a perfect sample is hard to collect in complex
populations such as graphs (e.g., web graphs, social networks, etc.), where an
underlying network connects the units of the population. Therefore, a good
sample will be representative in the sense that graph properties of interest
can be estimated with a known degree of accuracy. While previous work focused
particularly on sampling schemes used to estimate certain graph properties
(e.g. triangle count), much less is known for the case when we need to estimate
various graph properties with the same sampling scheme. In this paper, we
propose a generic stream sampling framework for big-graph analytics, called
Graph Sample and Hold (gSH). To begin, the proposed framework samples from
massive graphs sequentially in a single pass, one edge at a time, while
maintaining a small state. We then show how to produce unbiased estimators for
various graph properties from the sample. Given that the graph analysis
algorithms will run on a sample instead of the whole population, the runtime
complexity of these algorithms is kept under control. Moreover, given that the
estimators of graph properties are unbiased, the approximation error is kept
under control. Finally, we show the performance of the proposed framework (gSH)
on various types of graphs, such as social graphs, among others.
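The idea of unbiased estimation from a single-pass edge sample can be illustrated with a much simpler scheme than gSH: independent uniform edge sampling. The sketch below is not the gSH framework itself, and all function names are hypothetical. It assumes a simple graph (no self-loops, no repeated edges) and a known sampling probability p. Each edge survives with probability p and each triangle with probability p^3, so dividing the sampled counts by those probabilities gives unbiased estimates.

```python
import random


def sample_edges(edge_stream, p, seed=0):
    """Keep each streamed edge independently with probability p (one pass)."""
    rng = random.Random(seed)
    return [edge for edge in edge_stream if rng.random() < p]


def estimate_edge_count(sample, p):
    """Unbiased estimate of the number of edges in the full graph."""
    return len(sample) / p


def estimate_triangle_count(sample, p):
    """Unbiased estimate of the number of triangles in the full graph."""
    adjacency = {}
    for u, v in sample:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # Each sampled triangle is seen once from each of its three edges.
    closed = sum(len(adjacency[u] & adjacency[v]) for u, v in sample)
    return (closed / 3) / p**3
```

With p = 1 the sample is the whole graph and both estimators return exact counts; for smaller p the variance grows, most quickly for the triangle estimator.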