13 research outputs found
Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution
The degree distribution is one of the most fundamental graph properties of
interest for real-world graphs. It has been widely observed in numerous domains
that graphs typically have a tailed or scale-free degree distribution. While
the average degree is usually quite small, the variance is quite high and there
are vertices with degrees at all scales. We focus on the problem of
approximating the degree distribution of a large streaming graph, with small
storage. We design an algorithm headtail, whose main novelty is a new estimator
of infrequent degrees using truncated geometric random variables. We give a
mathematical analysis of headtail and show that it has excellent behavior in
practice. We can process streams will millions of edges with storage less than
1% and get extremely accurate approximations for all scales in the degree
distribution.
We also introduce a new notion of Relative Hausdorff distance between tailed
histograms. Existing notions of distances between distributions are not
suitable, since they ignore infrequent degrees in the tail. The Relative
Hausdorff distance measures deviations at all scales, and is a more suitable
distance for comparing degree distributions. By tracking this new measure, we
are able to give strong empirical evidence of the convergence of headtail
Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs
Depth first search (DFS) tree is a fundamental data structure for solving
graph problems. The classical algorithm [SiComp74] for building a DFS tree
requires time for a given graph having vertices and edges.
Recently, Baswana et al. [SODA16] presented a simple algorithm for updating DFS
tree of an undirected graph after an edge/vertex update in time.
However, their algorithm is strictly sequential. We present an algorithm
achieving similar bounds, that can be adopted easily to the parallel
environment.
In the parallel model, a DFS tree can be computed from scratch using
processors in expected time [SiComp90] on an EREW PRAM, whereas
the best deterministic algorithm takes time
[SiComp90,JAlg93] on a CRCW PRAM. Our algorithm can be used to develop optimal
(upto polylog n factors deterministic algorithms for maintaining fully dynamic
DFS and fault tolerant DFS, of an undirected graph.
1- Parallel Fully Dynamic DFS:
Given an arbitrary online sequence of vertex/edge updates, we can maintain a
DFS tree of an undirected graph in time per update using
processors on an EREW PRAM.
2- Parallel Fault tolerant DFS:
An undirected graph can be preprocessed to build a data structure of size
O(m) such that for a set of updates (where is constant) in the graph,
the updated DFS tree can be computed in time using
processors on an EREW PRAM.
Moreover, our fully dynamic DFS algorithm provides, in a seamless manner,
nearly optimal (upto polylog n factors) algorithms for maintaining a DFS tree
in semi-streaming model and a restricted distributed model. These are the first
parallel, semi-streaming and distributed algorithms for maintaining a DFS tree
in the dynamic setting.Comment: Accepted to appear in SPAA'17, 32 Pages, 5 Figure
Recommended from our members
Correlation Clustering in Data Streams
Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as k-center, k-median, and k-means. Such algorithms need to be both time and and space efcient. In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on n nodes and the goal is to find a node-partition such that the end-points of negative-weight edges are typically in diferent clusters whereas the end-points of positive-weight edges are typically in the same cluster. We present polynomial-time, O(n â
polylog n)-space approximation algorithms for natural problems that arise. We frst develop data structures based on linear sketches that allow the âqualityâ of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. Unfortunately, the standard LP and SDP formulations are not obviously solvable in O(n â
polylog n)-space. Our work presents space-efcient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling
Dynamic graph stream algorithms in o(n) space
In this paper we study graph problems in the dynamic streaming model, where the input is defined by a sequence of edge insertions and deletions. As many natural problems require Ω(n) space, where n is the number of vertices, existing works mainly focused on designing O(nâ
polylogn) space algorithms. Although sublinear in the number of edges for dense graphs, it could still be too large for many applications (e.g., n is huge or the graph is sparse). In this work, we give single-pass algorithms beating this space barrier for two classes of problems. We present o(n) space algorithms for estimating the number of connected components with additive error Δn of a general graph and (1+Δ) -approximating the weight of the minimum spanning tree of a connected graph with bounded edge weights, for any small constant Δ>0 . The latter improves upon the previous O(nâ
polylogn) space algorithm given by Ahn et al. (SODA 2012) for the same class of graphs. We initiate the study of approximate graph property testing in the dynamic streaming model, where we want to distinguish graphs satisfying the property from graphs that are Δ -far from having the property. We consider the problem of testing k-edge connectivity, k-vertex connectivity, cycle-freeness and bipartiteness (of planar graphs), for which, we provide algorithms using roughly O(n1âΔâ
polylogn) space, which is o(n) for any constant Δ . To complement our algorithms, we present Ω(n1âO(Δ)) space lower bounds for these problems, which show that such a dependence on Δ is necessary