Semi-Streaming Algorithms for Annotated Graph Streams
Considerable effort has been devoted to the development of streaming
algorithms for analyzing massive graphs. Unfortunately, many results have been
negative, establishing that a wide variety of problems require
space to solve. One of the few bright spots has been the development of
semi-streaming algorithms for a handful of graph problems -- these algorithms
use O(n polylog n) space.
In the annotated data streaming model of Chakrabarti et al., a
computationally limited client wants to compute some property of a massive
input, but lacks the resources to store even a small fraction of the input, and
hence cannot perform the desired computation locally. The client therefore
accesses a powerful but untrusted service provider, who not only performs the
requested computation, but also proves that the answer is correct.
We put forth the notion of semi-streaming algorithms for annotated graph
streams (semi-streaming annotation schemes for short). These are protocols in
which both the client's space usage and the length of the proof are O(n polylog n). We give evidence that semi-streaming annotation schemes
represent a substantially more robust solution concept than does the standard
semi-streaming model. On the positive side, we give semi-streaming annotation
schemes for two dynamic graph problems that are intractable in the standard
model: (exactly) counting triangles, and (exactly) computing maximum matchings.
The former scheme answers a question of Cormode. On the negative side, we
identify for the first time two natural graph problems (connectivity and
bipartiteness in a certain edge update model) that can be solved in the
standard semi-streaming model, but cannot be solved by annotation schemes of
"sub-semi-streaming" cost. That is, these problems are just as hard in the
annotations model as they are in the standard model.
Comment: This update includes some additional discussion of the results
proven. The result on counting triangles was previously included in an ECCC
technical report by Chakrabarti et al. available at
http://eccc.hpi-web.de/report/2013/180/. That report has been superseded by
this manuscript, and the CCC 2015 paper "Verifiable Stream Computation and
Arthur-Merlin Communication" by Chakrabarti et a
Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs
A depth first search (DFS) tree is a fundamental data structure for solving
graph problems. The classical algorithm [SiComp74] for building a DFS tree
requires O(m+n) time for a given graph having n vertices and m edges.
Recently, Baswana et al. [SODA16] presented a simple algorithm for updating DFS
tree of an undirected graph after an edge/vertex update in time.
However, their algorithm is strictly sequential. We present an algorithm
achieving similar bounds that can be adapted easily to the parallel
environment.
In the parallel model, a DFS tree can be computed from scratch using
processors in expected time [SiComp90] on an EREW PRAM, whereas
the best deterministic algorithm takes time
[SiComp90,JAlg93] on a CRCW PRAM. Our algorithm can be used to develop optimal
(up to polylog n factors) deterministic algorithms for maintaining fully dynamic
DFS and fault tolerant DFS of an undirected graph.
1- Parallel Fully Dynamic DFS:
Given an arbitrary online sequence of vertex/edge updates, we can maintain a
DFS tree of an undirected graph in time per update using
processors on an EREW PRAM.
2- Parallel Fault tolerant DFS:
An undirected graph can be preprocessed to build a data structure of size
O(m) such that for a set of updates (where is constant) in the graph,
the updated DFS tree can be computed in time using
processors on an EREW PRAM.
Moreover, our fully dynamic DFS algorithm provides, in a seamless manner,
nearly optimal (up to polylog n factors) algorithms for maintaining a DFS tree
in the semi-streaming model and a restricted distributed model. These are the first
parallel, semi-streaming and distributed algorithms for maintaining a DFS tree
in the dynamic setting.
Comment: Accepted to appear in SPAA'17, 32 Pages, 5 Figure
Graph Sparsification in the Semi-streaming Model
Analyzing massive data sets has been one of the key motivations for studying
streaming algorithms. In recent years, there has been significant progress in
analysing distributions in a streaming setting, but the progress on graph
problems has been limited. A main reason for this has been the existence of
linear space lower bounds for even simple problems such as determining the
connectedness of a graph. However, in many new scenarios that arise from social
and other interaction networks, the number of vertices is significantly less
than the number of edges. This has led to the formulation of the semi-streaming
model where we assume that the space is (near) linear in the number of vertices
(but not necessarily the edges), and the edges appear in an arbitrary (and
possibly adversarial) order.
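As an illustration of the model (not taken from the paper), connectivity can be decided in a single pass with memory proportional to the number of vertices by keeping a union-find structure and discarding each edge after it is processed. The sketch below assumes vertices are labelled 0..n-1; the function names `count_components` and `is_connected` are hypothetical.

```python
from typing import Iterable, Tuple

Edge = Tuple[int, int]


def count_components(n: int, edge_stream: Iterable[Edge]) -> int:
    """Count connected components in one pass using O(n) words of memory.

    Assumes vertices are labelled 0..n-1. Edges may arrive in any order
    and are never stored; only the union-find parent array is kept.
    """
    parent = list(range(n))

    def find(v: int) -> int:
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
            components -= 1
    return components


def is_connected(n: int, edge_stream: Iterable[Edge]) -> bool:
    """True when the streamed graph on n vertices has a single component."""
    return count_components(n, edge_stream) == 1
```

The memory used depends only on n, not on the number of edges, which is the property the semi-streaming model asks for.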
In this paper we focus on graph sparsification, which is one of the major
building blocks in a variety of graph algorithms. There has been a long history
of (non-streaming) sampling algorithms that provide sparse graph approximations
and it is a natural question to ask if the sparsification can be achieved using a
small space, and in addition using a single pass over the data. The question is
interesting from the standpoint of both theory and practice and we answer the
question in the affirmative, by providing a one pass
space algorithm that produces a sparsification that
approximates each cut to a factor. We also show that space is necessary for a one pass streaming algorithm to
approximate the min-cut, improving upon the lower bound that arises
from lower bounds for testing connectivity
Depth First Search in the Semi-streaming Model
A depth first search (DFS) tree is a fundamental data structure for solving various graph problems. The classical algorithm for building a DFS tree requires O(m+n) time for a given undirected graph G having n vertices and m edges. In the streaming model, an algorithm is allowed several passes (preferably one) over the input graph, subject to a restriction on the size of the local space used.
Now, a DFS tree of a graph can be trivially computed using a single pass if O(m) space is allowed. In the semi-streaming model allowing O(n) space, it can be computed in O(n) passes over the input stream, where each pass adds one vertex to the DFS tree. However, it remains an open problem to compute a DFS tree using o(n) passes using o(m) space even in any relaxed streaming environment.
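The O(n)-pass baseline mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `dfs_tree_one_vertex_per_pass` and the `make_stream` callback (which returns a fresh iterator over the edges for each pass) are hypothetical. Each pass records, for every vertex on the current root-to-tip path, one unvisited neighbour; the deepest path vertex that has one then receives it as a child.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Edge = Tuple[int, int]


def dfs_tree_one_vertex_per_pass(
    n: int, root: int, make_stream: Callable[[], Iterable[Edge]]
) -> Tuple[Dict[int, Optional[int]], int]:
    """Build a DFS tree of root's component, adding one vertex per pass.

    Keeps only O(n) words: the parent map, the current tree path, and one
    candidate neighbour per path vertex. Returns (parent map, passes used).
    """
    parent: Dict[int, Optional[int]] = {root: None}
    path: List[int] = [root]  # current root-to-tip path (the DFS stack)
    passes = 0
    while path and len(parent) < n:
        depth = {v: i for i, v in enumerate(path)}
        # candidate[i] holds some unvisited neighbour of path[i], if any.
        candidate: List[Optional[int]] = [None] * len(path)
        passes += 1
        for u, v in make_stream():
            for a, b in ((u, v), (v, u)):  # edges are undirected
                if a in depth and b not in parent and candidate[depth[a]] is None:
                    candidate[depth[a]] = b
        # Backtrack past vertices whose neighbours are all visited.
        while path and candidate[len(path) - 1] is None:
            path.pop()
        if path:
            child = candidate[len(path) - 1]
            parent[child] = path[-1]
            path.append(child)
    return parent, passes
```

For example, with edges `[(0, 1), (0, 2), (1, 2), (2, 3)]` and root 0, calling the function with `lambda: iter(edges)` attaches vertex 2 below vertex 1 rather than below the root, as a depth first traversal should.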
We present the first semi-streaming algorithms that compute a DFS tree of an undirected graph in o(n) passes using o(m) space. We first describe an extremely simple algorithm that requires at most ⌈n/k⌉ passes to compute a DFS tree using O(nk) space, where k is any positive integer. For example, using k=√n, we can compute a DFS tree in √n passes using O(n√n) space. We then improve this algorithm by using more involved techniques to reduce the number of passes to ⌈h/k⌉ under similar space constraints, where h is the height of the computed DFS tree. In particular, this algorithm improves the bounds for the case where the computed DFS tree is shallow (having o(n) height). Moreover, this algorithm is presented in the form of a framework that allows the flexibility of using any algorithm to maintain a DFS tree of a stored sparser subgraph as a black box, which may be of independent interest. Both these algorithms essentially demonstrate the existence of a trade-off between the space and number of passes required for computing a DFS tree. Furthermore, an experimental evaluation of these algorithms reveals their exceptional performance in practice. For both random and real graphs, they require merely a few passes even when allowed just O(n) space
Scalable Auction Algorithms for Bipartite Maximum Matching Problems
In this paper, we give new auction algorithms for maximum weighted bipartite
matching (MWM) and maximum cardinality bipartite b-matching (MCbM). Our
algorithms run in and rounds, respectively, in the blackboard distributed
setting. We show that our MWM algorithm can be implemented in the distributed,
interactive setting using and bit messages,
respectively, directly answering the open question posed by Demange, Gale and
Sotomayor [DNO14]. Furthermore, we implement our algorithms in a variety of
other models including the semi-streaming model, the shared-memory
work-depth model, and the massively parallel computation model. Our
semi-streaming MWM algorithm uses passes in space and our MCbM algorithm runs in
passes using space (where parameters represent
the degree constraints on the b-matching and and represent the left
and right side of the bipartite graph, respectively). Both of these algorithms
improve \emph{exponentially} the dependence on in the space
complexity in the semi-streaming model against the best-known algorithms for
these problems, in addition to improvements in round complexity for MCbM.
Finally, our algorithms eliminate the large polylogarithmic dependence on
in depth and number of rounds in the work-depth and massively parallel
computation models, respectively, improving on previous results which have
large polylogarithmic dependence on (and exponential dependence on
in the MPC model).
Comment: To appear in APPROX 202
Algorithms for Big Data: Graphs and PageRank
This work studies a set of techniques and strategies for algorithm design whose
purpose is to solve problems on massive data sets efficiently. This field is
known as Algorithms for Big Data. In particular, the work covers streaming
algorithms, which form the basis of sketches, data structures that use
sublinear space. It also examines in depth graph problems in the
semi-streaming model. The PageRank algorithm is then analyzed as a concrete
case study. Finally, development has begun on a library for solving graph
problems, implemented on top of TensorFlow, a platform for intensive
mathematical computation.
Comment: in Spanish, 143 pages, final degree project (bachelor's thesis
Almost Optimal Streaming Algorithms for Coverage Problems
Maximum coverage and minimum set cover problems --collectively called
coverage problems-- have been studied extensively in streaming models. However,
previous research not only achieves sub-optimal approximation factors and space
complexities, but also studies a restricted set arrival model which makes an
explicit or implicit assumption on oracle access to the sets, ignoring the
complexity of reading and storing the whole set at once. In this paper, we
address the above shortcomings, and present algorithms with improved
approximation factor and improved space complexity, and prove that our results
are almost tight. Moreover, unlike most previous work, our results hold on a
more general edge arrival model. More specifically, we present (almost) optimal
approximation algorithms for maximum coverage and minimum set cover problems in
the streaming model with an (almost) optimal space complexity of
, i.e., the space is {\em independent of the size of the sets or
the size of the ground set of elements}. These results not only improve over
the best known algorithms for the set arrival model, but also are the first
such algorithms for the more powerful {\em edge arrival} model. In order to
achieve the above results, we introduce a new general sketching technique for
coverage functions: This sketching scheme can be applied to convert an
\alpha-approximation algorithm for a coverage problem to a
(1-\eps)\alpha-approximation algorithm for the same problem in streaming, or
RAM models. We show the significance of our sketching technique by ruling out
the possibility of solving coverage problems via accessing (as a black box) a
(1 \pm \eps)-approximate oracle (e.g., a sketch function) that estimates the
coverage function on any subfamily of the sets