Graph Sparsification in the Semi-streaming Model
Analyzing massive data sets has been one of the key motivations for studying
streaming algorithms. In recent years, there has been significant progress in
analyzing distributions in a streaming setting, but progress on graph
problems has been limited. A main reason for this has been the existence of
linear-space lower bounds for even simple problems such as determining the
connectedness of a graph. However, in many new scenarios that arise from social
and other interaction networks, the number of vertices is significantly less
than the number of edges. This has led to the formulation of the semi-streaming
model where we assume that the space is (near) linear in the number of vertices
(but not necessarily the edges), and the edges appear in an arbitrary (and
possibly adversarial) order.
In this paper we focus on graph sparsification, which is one of the major
building blocks in a variety of graph algorithms. There has been a long history
of (non-streaming) sampling algorithms that provide sparse graph approximations
and it is natural to ask whether sparsification can be achieved in small
space and, in addition, in a single pass over the data. The question is
interesting from the standpoint of both theory and practice, and we answer it
in the affirmative by providing a one pass
space algorithm that produces a sparsification that
approximates each cut to a factor. We also show that space is necessary for a one pass streaming algorithm to
approximate the min-cut, improving upon the lower bound that arises
from lower bounds for testing connectivity.
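To make the sampling idea concrete, here is a minimal one-pass sketch, assuming a single uniform sampling rate `p` (a hypothetical parameter): each streamed edge is kept with probability `p` and given weight `1/p`, so the expected weight of every cut in the sample equals its weight in the original graph. This is plain uniform sampling, not the paper's algorithm; uniform sampling alone does not give a guarantee for every cut simultaneously, which is what a sparsifier requires.

```python
import random
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]

def sample_stream(stream: Iterable[Edge], p: float, seed: int = 0) -> Dict[Edge, float]:
    """One pass over an edge stream: keep each edge with probability p, weight 1/p.

    Hypothetical helper for illustration; p is an assumed uniform sampling rate.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    sparse: Dict[Edge, float] = {}
    for u, v in stream:  # each edge is read exactly once
        if rng.random() < p:
            key = (min(u, v), max(u, v))  # store undirected edges canonically
            sparse[key] = sparse.get(key, 0.0) + 1.0 / p
    return sparse

def cut_weight(weighted_edges: Dict[Edge, float], side: Set[int]) -> float:
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in weighted_edges.items() if (u in side) != (v in side))
```

With `p = 1.0` the sample is the original graph with unit weights; with smaller `p` a single cut estimate is unbiased but fluctuates, and the fluctuation is largest for cuts with few edges.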
Analyzing Massive Graphs in the Semi-streaming Model
Massive graphs arise in many scenarios, for example
traffic data analysis in large networks, large-scale scientific
experiments, and clustering of large data sets.
The semi-streaming model was proposed for processing massive graphs. In the semi-streaming model, we have a random-access
memory whose size is near-linear in the number of vertices.
The input graph (equivalently, its set of edges)
is presented as a sequential list of edges (insertion-only model)
or of edge insertions and deletions (dynamic model). The list
is read-only, but we may make multiple passes over it.
There have been a few results in the insertion-only model,
such as computing distance spanners and approximating
the maximum matching.
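Connectivity being easy in the insertion-only model can be illustrated with a minimal sketch: a single pass with a union-find structure keeps O(n) words of state and a spanning forest of at most n - 1 edges, regardless of the number of edges m. The function and class names are hypothetical, and the approach does not carry over to the dynamic model, where a deleted edge may belong to the stored forest.

```python
from typing import Iterable, List, Tuple

class DisjointSets:
    """Union-find over vertices 0..n-1; memory is O(n), independent of the edge count."""

    def __init__(self, n: int) -> None:
        self.parent: List[int] = list(range(n))
        self.rank: List[int] = [0] * n
        self.components = n

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> bool:
        """Merge the sets of a and b; return False if they were already connected."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:  # union by rank
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        self.components -= 1
        return True

def stream_connectivity(n: int, stream: Iterable[Tuple[int, int]]) -> Tuple[bool, List[Tuple[int, int]]]:
    """One pass over an insertion-only edge stream; returns (is_connected, spanning_forest)."""
    ds = DisjointSets(n)
    forest: List[Tuple[int, int]] = []
    for u, v in stream:
        if ds.union(u, v):  # edge joins two components, so keep it in the forest
            forest.append((u, v))
    return ds.components == 1, forest
```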
In this thesis, we present some algorithms and techniques
for (i) solving more complex problems in the semi-streaming model,
(for example, problems in the dynamic model) and (ii) having
better solutions for the problems which have been studied
(for example, the maximum matching problem). In the course of both
of these, we develop new techniques with broad applications and
explore the rich trade-offs between the complexity of models
(insertion-only streams vs. dynamic streams), the number
of passes, space, accuracy, and running time.
1. First, we initiate the study of dynamic graph streams.
We start with basic problems such as the connectivity
problem and computing the minimum spanning tree.
These problems are
trivial in the insertion-only model. In the dynamic model,
however, they require non-trivial algorithms (and, for
computing the exact minimum spanning tree, multiple passes).
2. Second, we present a graph sparsification algorithm in the
semi-streaming model. A graph sparsification
is a sparse graph that approximately preserves
all the cut values of a graph.
Such a graph acts as an oracle for solving cut-related problems,
for example, the minimum cut problem and the multicut problem.
Our algorithm produces a graph sparsification with high probability
in one pass.
3. Third, we use primal-dual algorithms
to develop semi-streaming algorithms.
Primal-dual algorithms have been widely adopted
as a framework for solving linear programs
and semidefinite programs faster.
In contrast, we apply the method to reduce space and
the number of passes in addition to the running time.
We also present some examples that arise in applications
and show how to apply the techniques to them:
the multicut problem, the correlation clustering problem,
and the maximum matching problem. As a consequence,
we also obtain near-linear time algorithms for the -matching
problems, which were not known before.
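Item 2 describes a sparsifier as an oracle for cut problems: once a small weighted graph is available, cut queries are answered on it instead of on the original graph. The brute-force sketch below, practical only for a handful of vertices, enumerates every cut of a small weighted graph and returns the minimum. It illustrates the oracle role only; it is not an algorithm from the thesis, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

def min_cut_bruteforce(n: int, weights: Dict[Tuple[int, int], float]) -> Tuple[float, FrozenSet[int]]:
    """Exhaustive global min cut on vertices 0..n-1 (exponential in n; illustration only)."""
    best_value, best_side = float("inf"), frozenset()
    # Fix vertex 0 on one side so each cut {S, V \ S} is enumerated exactly once.
    for k in range(0, n - 1):
        for extra in combinations(range(1, n), k):
            side = frozenset((0, *extra))
            value = sum(w for (u, v), w in weights.items() if (u in side) != (v in side))
            if value < best_value:
                best_value, best_side = value, side
    return best_value, best_side
```

On a sparsifier whose cuts are all within a small factor of the original cuts, the minimum cut found this way is within the same factor of the true minimum cut.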
Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
The number of triangles is a computationally expensive graph statistic which
is frequently used in complex network analysis (e.g., transitivity ratio), in
various random graph models (e.g., exponential random graph model) and in
important real world applications such as spam detection, uncovering of the
hidden thematic structure of the Web and link recommendation. Counting
triangles in graphs with millions and billions of edges requires algorithms
that run fast, use a small amount of space, provide accurate estimates of the
number of triangles, and preferably are parallelizable.
In this paper we present an efficient triangle counting algorithm which can
be adapted to the semistreaming model. The key idea of our algorithm is to
combine the sampling algorithm of Tsourakakis et al. with the partitioning of
the set of vertices into a high-degree and a low-degree subset, as in the work
of Alon, Yuster and Zwick, treating each set appropriately. We obtain a
running time
and an approximation (multiplicative error), where is the number
of vertices, the number of edges and the maximum number of
triangles an edge is contained in.
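The two ingredients named above can be sketched as follows, assuming a simple undirected graph with integer vertex labels. (1) Edge sampling: keep each edge with probability `p` and rescale the count by 1/p^3, since a triangle survives only if all three of its edges do. (2) A degree threshold that splits vertices into low- and high-degree sets, each counted differently. The `threshold` parameter, the function names, and the brute-force handling of all-high-degree triangles (Alon, Yuster and Zwick use fast matrix multiplication there) are illustrative assumptions; the paper's own choices and running-time bounds are not reproduced.

```python
import random
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]

def build_adjacency(edges: Iterable[Edge]) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def count_triangles_partitioned(edges: Iterable[Edge], threshold: int) -> int:
    """Exact triangle count using a hypothetical degree threshold to split vertices."""
    adj = build_adjacency(edges)
    low = {v for v, nbrs in adj.items() if len(nbrs) < threshold}
    high = [v for v in adj if v not in low]
    count = 0
    # Triangles with at least one low-degree vertex: enumerate neighbor pairs of each
    # low vertex v and count the triangle only at its smallest low-degree vertex.
    for v in low:
        for a, b in combinations(adj[v], 2):
            if b in adj[a] and all(x not in low or x > v for x in (a, b)):
                count += 1
    # Triangles made only of high-degree vertices (plain enumeration, for clarity).
    for a, b, c in combinations(high, 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            count += 1
    return count

def sampled_triangle_estimate(edges: List[Edge], p: float, threshold: int, seed: int = 0) -> float:
    """Keep each edge with probability p, count exactly, then rescale by 1/p^3."""
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_triangles_partitioned(kept, threshold) / p ** 3
```

Low-degree vertices are cheap to process because they have few neighbor pairs, while there can be only a limited number of high-degree vertices (at most 2m divided by the threshold), which is what makes the split useful.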
Furthermore, we show how this algorithm can be adapted to the semistreaming
model with space usage and a constant number of passes (three) over the graph
stream. We apply our methods to various networks with several million edges
and obtain excellent results. Finally, we propose a random projection based
method for triangle counting and provide a sufficient condition to obtain an
estimate with low variance.
Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models
for the Web Graph (WAW 2010)
A Linear-time Algorithm for Sparsification of Unweighted Graphs
Given an undirected graph and an error parameter , the
graph sparsification problem requires sampling edges in and giving the
sampled edges appropriate weights to obtain a sparse graph with
the following property: the weight of every cut in is within a
factor of of the weight of the corresponding cut in . If
is unweighted, an -time algorithm for constructing
with edges in expectation, and an
-time algorithm for constructing with edges in expectation have recently been developed
(Hariharan-Panigrahi, 2010). In this paper, we improve these results by giving
an -time algorithm for constructing with edges in expectation, for unweighted graphs. Our algorithm is
optimal in terms of its time complexity; further, no efficient algorithm is
known for constructing a sparser . Our algorithm is Monte Carlo,
i.e., it produces the correct output with high probability, as do all efficient
graph sparsification algorithms.
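The defining property above can be checked directly on very small graphs by enumerating all cuts. The precise factor is elided in the abstract as extracted; this sketch assumes the common (1 ± eps) form, and the function names are hypothetical. It is a verifier for the definition, not a sparsification algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Weights = Dict[Tuple[int, int], float]

def cut_value(weights: Weights, side: FrozenSet[int]) -> float:
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in weights.items() if (u in side) != (v in side))

def is_cut_sparsifier(n: int, g: Weights, h: Weights, eps: float) -> bool:
    """True if every cut of h is within a (1 +/- eps) factor of the same cut of g (assumed form)."""
    # Each cut is visited twice (S and its complement); harmless for an illustration.
    for k in range(1, n):
        for side in combinations(range(n), k):
            s = frozenset(side)
            wg, wh = cut_value(g, s), cut_value(h, s)
            if not (1 - eps) * wg <= wh <= (1 + eps) * wg:
                return False
    return True
```

For example, dropping one edge of a unit-weight triangle and giving the other two weight 1.5 distorts single-vertex cuts by factors of 0.75 and 1.5, so the result passes with `eps = 0.5` but fails with `eps = 0.1`.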
Sketching Cuts in Graphs and Hypergraphs
Sketching and streaming algorithms are at the forefront of current research
directions for cut problems in graphs. In the streaming model, we show that
-approximation for Max-Cut must use space;
moreover, beating -approximation requires polynomial space. For the
sketching model, we show that -uniform hypergraphs admit a
-cut-sparsifier (i.e., a weighted subhypergraph that
approximately preserves all the cuts) with
edges. We also take first steps towards sketching general CSPs (Constraint
Satisfaction Problems).
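In a hypergraph, a hyperedge contributes to a cut when it has at least one vertex on each side; this is the quantity a hypergraph cut-sparsifier must approximately preserve. A minimal sketch, assuming a hypothetical representation of weighted hyperedges as a dictionary keyed by frozensets of vertices:

```python
from typing import Dict, FrozenSet, Set

def hypergraph_cut(hyperedges: Dict[FrozenSet[int], float], side: Set[int]) -> float:
    """Weight of hyperedges with at least one vertex on each side of the cut."""
    total = 0.0
    for edge, weight in hyperedges.items():
        inside = sum(1 for v in edge if v in side)
        if 0 < inside < len(edge):  # split by the cut, not entirely on one side
            total += weight
    return total
```

For ordinary graphs (every hyperedge of size 2) this reduces to the usual cut weight.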
Simple parallel and distributed algorithms for spectral graph sparsification
We describe a simple algorithm for spectral graph sparsification, based on
iterative computations of weighted spanners and uniform sampling. Leveraging
the algorithms of Baswana and Sen for computing spanners, we obtain the first
distributed spectral sparsification algorithm. We also obtain a parallel
algorithm with improved work and time guarantees. Combining this algorithm with
the parallel framework of Peng and Spielman for solving symmetric diagonally
dominant linear systems, we get a parallel solver which is much closer to being
practical and significantly more efficient in terms of the total work.
Comment: replaces "A simple parallel and distributed algorithm for spectral
sparsification". Minor change
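Spectral sparsification asks that a weighted subgraph H satisfy x^T L_H x ≈ x^T L_G x for every real vector x, where L is the weighted graph Laplacian. Restricting x to 0/1 indicator vectors recovers cut weights, which is why a spectral sparsifier is also a cut sparsifier. The sketch below evaluates the quadratic form from the edge list; it is not the spanner-based algorithm described above, and the function name is hypothetical.

```python
from typing import Dict, Sequence, Tuple

def laplacian_quadratic_form(weights: Dict[Tuple[int, int], float], x: Sequence[float]) -> float:
    """Compute x^T L x as the sum over edges (u, v) of w_uv * (x_u - x_v)^2."""
    return sum(w * (x[u] - x[v]) ** 2 for (u, v), w in weights.items())
```

With x the indicator of a vertex set S, each term is w_uv exactly when the edge crosses the cut and 0 otherwise, so the form equals the weight of the cut (S, V \ S).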
Probabilistic Spectral Sparsification In Sublinear Time
In this paper, we introduce a variant of spectral sparsification, called
probabilistic -spectral sparsification. Roughly speaking,
it preserves the cut value of any cut with a
multiplicative error and an additive error. We show how
to produce a probabilistic -spectral sparsifier with
edges in time
for unweighted undirected graphs. This gives the fastest known sublinear-time
algorithms for several cut problems on unweighted undirected graphs, such as:
- An time -approximation
algorithm for the sparsest cut problem and the balanced separator problem.
- A time approximation minimum s-t cut algorithm
with an additive error
- …