Massive graphs arise in many scenarios, for example,
traffic data analysis in large networks, large scale scientific
experiments, and clustering of large data sets.
The semi-streaming model was proposed for processing massive graphs. In the semi-streaming model, we have
random-access memory whose size is near-linear in the number of vertices.
The input graph (or equivalently, edges in the graph)
is presented as a sequential list of edges (insertion-only model)
or of edge insertions and deletions (dynamic model). The list
is read-only but we may make multiple passes over the list.
There have been a few results in the insertion-only model,
such as computing distance spanners and approximating
the maximum matching.
In this thesis, we present some algorithms and techniques
for (i) solving more complex problems in the semi-streaming model
(for example, problems in the dynamic model) and (ii) obtaining
better solutions for problems that have already been studied
(for example, the maximum matching problem). In the course of both
of these, we develop new techniques with broad applications and
explore the rich trade-offs between the complexity of models
(insertion-only streams vs. dynamic streams), the number
of passes, space, accuracy, and running time.
1. We initiate the study of dynamic graph streams.
We start with basic problems such as the connectivity
problem and computing the minimum spanning tree.
These problems are trivial in the insertion-only model. However,
in the dynamic model they require non-trivial algorithms, and
multiple passes when computing the exact minimum spanning tree.
2. We present a graph sparsification algorithm in the
semi-streaming model. A graph sparsification
is a sparse graph that approximately preserves
all the cut values of a graph.
Such a graph acts as an oracle for solving cut-related problems,
for example, the minimum cut problem and the multicut problem.
Our algorithm produces a graph sparsification with high probability
in one pass.
3. We use primal-dual algorithms
to develop semi-streaming algorithms.
Primal-dual algorithms are widely accepted
as a framework for solving linear programs
and semidefinite programs faster.
In contrast, we apply the method to reduce the space and the
number of passes in addition to reducing the running time.
We also present some examples that arise in applications
and show how to apply the techniques:
the multicut problem, the correlation clustering problem,
and the maximum matching problem. As a consequence,
we also develop near-linear time algorithms for b-matching
problems, which were not known before.
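
To illustrate why connectivity is considered trivial in the insertion-only model (item 1 above), the following is a minimal sketch of a one-pass algorithm that keeps only a union-find structure over the vertices, so its memory is linear in the number of vertices regardless of the number of edges. It is a standard textbook approach and not the algorithm developed in the thesis; the function name `count_components` and the representation of the stream as an iterable of vertex pairs are assumptions made for this example. The sketch has no way to undo a union, which gives some intuition for why edge deletions in the dynamic model call for different techniques.

```python
from typing import Iterable, Tuple


def count_components(n: int, edge_stream: Iterable[Tuple[int, int]]) -> int:
    """Count connected components in one pass over an insertion-only stream.

    Hypothetical helper for illustration: vertices are assumed to be
    labelled 0..n-1 and the stream yields one (u, v) pair per inserted edge.
    Only the `parent` array (n integers) is stored; edges are never kept.
    """
    parent = list(range(n))  # each vertex starts as its own component

    def find(x: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in edge_stream:  # single sequential pass over the edge list
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components
            components -= 1
    return components


if __name__ == "__main__":
    # Edges arrive one at a time; a generator stands in for the stream.
    stream = iter([(0, 1), (1, 2), (3, 4)])
    print(count_components(5, stream))
```

The graph is connected exactly when the returned count is 1. Recording the edges that trigger a merge would additionally give a spanning forest, still within space linear in the number of vertices.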