Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem
In this paper, we study linear programming based approaches to the maximum
matching problem in the semi-streaming model. The semi-streaming model has
gained attention as a model for processing massive graphs as the importance of
such graphs has increased. In this model, edges are streamed in adversarial
order and the algorithm is allowed space proportional to the number of
vertices in the graph.
In recent years, there have been several new results in this semi-streaming
model. However, broad techniques such as linear programming have not been
adapted to this model. We present several techniques to adapt and optimize
linear programming based approaches in the semi-streaming model with an
application to the maximum matching problem. As a consequence, we improve
(almost) all previous results on this problem, and also prove new results on
interesting variants.
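To make the model concrete, here is a minimal sketch of a one-pass algorithm that fits within the space bound: the textbook greedy maximal matching. It is not the linear programming approach of the paper, only an illustration of reading edges once while storing at most one edge per vertex. The function name `greedy_matching` and the representation of the stream as an iterable of vertex pairs are assumptions made for this example.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_matching(edge_stream: Iterable[Edge]) -> List[Edge]:
    """One pass over the edges, keeping a maximal matching.

    Memory holds only the matched vertices and the chosen edges,
    so it is proportional to the number of vertices, not edges.
    """
    matched: Set[Hashable] = set()   # vertices already covered by the matching
    matching: List[Edge] = []        # edges kept so far

    for u, v in edge_stream:
        # Keep the edge only if both endpoints are still free.
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# The result depends on the (possibly adversarial) arrival order.
path_in_order = [(1, 2), (2, 3), (3, 4)]
path_bad_order = [(2, 3), (1, 2), (3, 4)]
print(greedy_matching(path_in_order))
print(greedy_matching(path_bad_order))
```

On the path 1-2-3-4 the second ordering yields a single edge while the maximum matching has two, which shows why arrival order matters for one-pass algorithms.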
Improved approximation guarantees for weighted matching in the semi-streaming model
We study the maximum weight matching problem in the semi-streaming model, and
improve on the currently best one-pass algorithm due to Zelke (Proc. of
STACS 2008, pages 669-680) by devising a deterministic approach whose
performance guarantee is 4.91+epsilon. In addition, we study preemptive online
algorithms, a sub-class of one-pass algorithms where we are only allowed to
maintain a feasible matching in memory at any point in time. All known results
prior to Zelke's belong to this sub-class. We provide a lower bound of 4.967 on
the competitive ratio of any such deterministic algorithm, and hence show that
future improvements will have to store in memory a set of edges which is not
necessarily a feasible matching.
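The notion of a preemptive online algorithm can be sketched as follows. This is an illustration of the sub-class, not the algorithm of the paper: it keeps only a feasible matching in memory and swaps in a new edge when its weight exceeds the total weight of the conflicting matched edges by a factor of `1 + gamma`. The function name, the threshold parameter `gamma`, and the stream format `(u, v, weight)` are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple

WeightedEdge = Tuple[int, int, float]


def preemptive_weighted_matching(
    edge_stream: Iterable[WeightedEdge], gamma: float = 1.0
) -> List[WeightedEdge]:
    """One pass; memory always holds a feasible matching and nothing else."""
    # partner[x] = (y, w) means edge {x, y} of weight w is in the matching.
    partner: Dict[int, Tuple[int, float]] = {}

    for u, v, w in edge_stream:
        if u == v:
            continue
        # Collect the (at most two) matched edges that share an endpoint
        # with the new edge. The frozenset key removes duplicates when
        # u and v are currently matched to each other.
        conflicts = {}
        for x in (u, v):
            if x in partner:
                y, wt = partner[x]
                conflicts[frozenset((x, y))] = wt

        # Preempt the conflicting edges only if the new edge is
        # sufficiently heavier than all of them together.
        if w > (1 + gamma) * sum(conflicts.values()):
            for key in conflicts:
                for x in key:
                    del partner[x]
            partner[u] = (v, w)
            partner[v] = (u, w)

    return sorted({(min(x, y), max(x, y), w) for x, (y, w) in partner.items()})


stream = [(1, 2, 1.0), (2, 3, 3.0), (3, 4, 1.0)]
print(preemptive_weighted_matching(stream, gamma=1.0))
```

Discarded edges are never recovered, which is the restriction the lower bound in the abstract addresses.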
Bipartite Matching in the Semi-Streaming Model
We present the first deterministic 1+eps approximation algorithm for finding a large matching in a bipartite graph in the semi-streaming model which requires only passes over the input stream. In this model, the input graph is given as a stream of its edges in some arbitrary order, and storage of the algorithm is bounded by bits, where . The only previously known arbitrarily good approximation for general graphs is achieved by the randomized algorithm of McGregor (2005), which uses passes. We show that even for bipartite graphs, McGregor's algorithm needs passes, thus it is necessarily exponential in the approximation parameter. The design as well as the analysis of our algorithm require the introduction of some new techniques. A novelty of our algorithm is a new deterministic assignment of matching edges to augmenting paths, which is responsible for the complexity reduction and eliminates randomization. We repeatedly grow an initial matching using augmenting paths up to a length of . We terminate when the number of augmenting paths found in one iteration falls below a certain threshold also depending on , that guarantees a approximation. The main challenge is to find those augmenting paths without requiring an excessive number of passes. In each iteration, using multiple passes, we grow a set of alternating paths in parallel, considering each edge as a possible extension as it comes along in the stream. Backtracking is used on paths that fail to grow any further. Crucial are the so-called position limits: when a matching edge is the i-th matching edge in a path and it is then removed by backtracking, it will only be inserted into a path again at a position strictly less than i. This rule strikes a balance between terminating quickly on the one hand and giving the procedure enough freedom on the other hand.
Graph Sparsification in the Semi-streaming Model
Analyzing massive data sets has been one of the key motivations for studying
streaming algorithms. In recent years, there has been significant progress in
analyzing distributions in a streaming setting, but the progress on graph
problems has been limited. A main reason for this has been the existence of
linear space lower bounds for even simple problems such as determining the
connectedness of a graph. However, in many new scenarios that arise from social
and other interaction networks, the number of vertices is significantly less
than the number of edges. This has led to the formulation of the semi-streaming
model where we assume that the space is (near) linear in the number of vertices
(but not necessarily the edges), and the edges appear in an arbitrary (and
possibly adversarial) order.
In this paper we focus on graph sparsification, which is one of the major
building blocks in a variety of graph algorithms. There has been a long history
of (non-streaming) sampling algorithms that provide sparse graph approximations
and it is a natural question to ask whether sparsification can be achieved using
small space and, in addition, a single pass over the data. The question is
interesting from the standpoint of both theory and practice and we answer the
question in the affirmative, by providing a one pass
space algorithm that produces a sparsification that
approximates each cut to a factor. We also show that space is necessary for a one pass streaming algorithm to
approximate the min-cut, improving upon the lower bound that arises
from lower bounds for testing connectivity.
Analyzing Massive Graphs in the Semi-streaming Model
Massive graphs arise in many scenarios, for example,
traffic data analysis in large networks, large scale scientific
experiments, and clustering of large data sets.
The semi-streaming model was proposed for processing massive graphs. In the semi-streaming model, we have
random-access memory whose size is near-linear in the number of vertices.
The input graph (or equivalently, edges in the graph)
is presented as a sequential list of edges (insertion-only model)
or edge insertions and deletions (dynamic model). The list
is read-only but we may make multiple passes over the list.
There have been a few results in the insertion-only model
such as computing distance spanners and approximating
the maximum matching.
In this thesis, we present some algorithms and techniques
for (i) solving more complex problems in the semi-streaming model,
(for example, problems in the dynamic model) and (ii) having
better solutions for the problems which have been studied
(for example, the maximum matching problem). In the course of both
of these, we develop new techniques with broad applications and
explore the rich trade-offs between the complexity of models
(insertion-only streams vs. dynamic streams), the number
of passes, space, accuracy, and running time.
1. We initiate the study of dynamic graph streams.
We start with basic problems such as the connectivity
problem and computing the minimum spanning tree.
These problems are
trivial in the insertion-only model. In the dynamic model, however,
they require non-trivial algorithms (and multiple passes for
computing the exact minimum spanning tree).
2. Second, we present a graph sparsification algorithm in the
semi-streaming model. A graph sparsification
is a sparse graph that approximately preserves
all the cut values of a graph.
Such a graph acts as an oracle for solving cut-related problems,
for example, the minimum cut problem and the multicut problem.
Our algorithm produces a graph sparsification with high probability
in one pass.
3. Third, we use primal-dual algorithms
to develop semi-streaming algorithms.
Primal-dual algorithms have been widely accepted
as a framework for solving linear programs
and semidefinite programs faster.
In contrast, we apply the method for reducing space and
number of passes in addition to reducing the running time.
We also present some examples that arise in applications
and show how to apply the techniques:
the multicut problem, the correlation clustering problem,
and the maximum matching problem. As a consequence,
we also develop near-linear time algorithms for the -matching
problems, which were not known before.
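The remark in item 1 that connectivity is trivial in the insertion-only model can be illustrated with a short sketch. It assumes vertices are labeled 0 to n-1 and uses a standard union-find structure; the function name `count_components` is hypothetical. The approach does not carry over to the dynamic model, since a union cannot be undone when an edge is deleted.

```python
from typing import Iterable, List, Tuple


def count_components(n: int, edge_stream: Iterable[Tuple[int, int]]) -> int:
    """One pass over an insertion-only edge stream using O(n) words of memory."""
    parent: List[int] = list(range(n))  # union-find forest over the n vertices

    def find(x: int) -> int:
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            # The edge joins two different components: merge them.
            parent[ru] = rv
            components -= 1
    return components


print(count_components(5, [(0, 1), (1, 2), (3, 4)]))
```

The graph is connected exactly when the returned count is 1.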
Depth First Search in the Semi-streaming Model
A depth-first search (DFS) tree is a fundamental data structure for solving various graph problems. The classical algorithm for building a DFS tree requires O(m+n) time for a given undirected graph G having n vertices and m edges. In the streaming model, an algorithm is allowed several passes (preferably a single one) over the input graph, with a restriction on the size of the local space used.
Now, a DFS tree of a graph can be trivially computed using a single pass if O(m) space is allowed. In the semi-streaming model allowing O(n) space, it can be computed in O(n) passes over the input stream, where each pass adds one vertex to the DFS tree. However, it remains an open problem to compute a DFS tree in o(n) passes using o(m) space, even in any relaxed streaming environment.
We present the first semi-streaming algorithms that compute a DFS tree of an undirected graph in o(n) passes using o(m) space. We first describe an extremely simple algorithm that requires at most ceil[n/k] passes to compute a DFS tree using O(nk) space, where k is any positive integer. For example, using k=sqrt{n}, we can compute a DFS tree in sqrt{n} passes using O(n sqrt{n}) space. We then improve this algorithm by using more involved techniques to reduce the number of passes to ceil[h/k] under similar space constraints, where h is the height of the computed DFS tree. In particular, this algorithm improves the bounds for the case where the computed DFS tree is shallow (having o(n) height). Moreover, this algorithm is presented in the form of a framework that allows the flexibility of using any algorithm to maintain a DFS tree of a stored sparser subgraph as a black box, which may be of independent interest. Both these algorithms essentially demonstrate the existence of a trade-off between the space and the number of passes required for computing a DFS tree. Furthermore, we evaluate these algorithms experimentally, which reveals their exceptional performance in practice. For both random and real graphs, they require merely a few passes even when allowed just O(n) space.
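The baseline mentioned above, O(n) passes with O(n) space where each pass adds one vertex, can be sketched as follows. This is one plausible reading of that baseline, not the ceil[n/k] or ceil[h/k] algorithms of the paper. In each pass it looks for the deepest vertex on the current root-to-tip path that still has an unvisited neighbor, backtracks to it, and attaches that neighbor. The names `streaming_dfs` and `make_stream` (a callable that returns a fresh iterator over the edges, one call per pass) are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Edge = Tuple[int, int]


def streaming_dfs(
    make_stream: Callable[[], Iterable[Edge]], root: int = 0
) -> Tuple[Dict[int, Optional[int]], int]:
    """Build a DFS tree of root's component, adding one vertex per pass."""
    parent: Dict[int, Optional[int]] = {root: None}  # visited vertices and tree edges
    depth: Dict[int, int] = {root: 0}
    path = [root]        # current root-to-tip path (the DFS stack)
    on_path = {root}
    passes = 0

    while True:
        passes += 1
        best = None      # (depth of path vertex, path vertex, unvisited neighbor)
        for u, v in make_stream():
            for a, b in ((u, v), (v, u)):
                if a in on_path and b not in parent:
                    if best is None or depth[a] > best[0]:
                        best = (depth[a], a, b)
        if best is None:
            break        # no path vertex has an unvisited neighbor: done
        d, a, b = best
        # Backtrack: vertices below a on the path have no unvisited neighbors.
        while path[-1] != a:
            on_path.discard(path.pop())
        parent[b] = a
        depth[b] = d + 1
        path.append(b)
        on_path.add(b)

    return parent, passes


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
tree, num_passes = streaming_dfs(lambda: iter(edges))
print(tree, num_passes)
```

The last pass only confirms termination, so a component with c vertices takes c passes in this sketch.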