13 research outputs found

Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution

Author: McGregor Andrew
Seshadhri C.
Simpson Olivia
Publication date: 25/11/2015

The degree distribution is one of the most fundamental graph properties of interest for real-world graphs. It has been widely observed in numerous domains that graphs typically have a tailed or scale-free degree distribution. While the average degree is usually quite small, the variance is quite high and there are vertices with degrees at all scales. We focus on the problem of approximating the degree distribution of a large streaming graph, with small storage. We design an algorithm headtail, whose main novelty is a new estimator of infrequent degrees using truncated geometric random variables. We give a mathematical analysis of headtail and show that it has excellent behavior in practice. We can process streams with millions of edges with storage less than 1% and get extremely accurate approximations for all scales in the degree distribution. We also introduce a new notion of Relative Hausdorff distance between tailed histograms. Existing notions of distances between distributions are not suitable, since they ignore infrequent degrees in the tail. The Relative Hausdorff distance measures deviations at all scales, and is a more suitable distance for comparing degree distributions. By tracking this new measure, we are able to give strong empirical evidence of the convergence of headtail.
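
For orientation, the quantity being approximated can be computed exactly when the whole edge list fits in memory. The sketch below is only that exact baseline, not the headtail algorithm; the function names `degree_histogram` and `ccdh` and the sample edge list are illustrative assumptions.

```python
from collections import Counter


def degree_histogram(edges):
    """Map each degree to the number of vertices having that degree."""
    degree = Counter()
    for u, v in edges:
        # Each undirected edge contributes one to both endpoints.
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def ccdh(hist):
    """Complementary cumulative histogram: vertices with degree >= d."""
    result = {}
    running = 0
    for d in sorted(hist, reverse=True):
        running += hist[d]
        result[d] = running
    return result


# Hypothetical toy stream of undirected edges.
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
hist = degree_histogram(edges)
tail = ccdh(hist)
```

This baseline stores one counter per vertex, which is exactly the cost a small-storage streaming estimator tries to avoid.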

arXiv.org e-Print Archive

Crossref

Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs

Author: Khan Shahbaz
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2017

Depth first search (DFS) tree is a fundamental data structure for solving graph problems. The classical algorithm [SiComp74] for building a DFS tree requires $O(m+n)$ time for a given graph $G$ having $n$ vertices and $m$ edges. Recently, Baswana et al. [SODA16] presented a simple algorithm for updating the DFS tree of an undirected graph after an edge/vertex update in $\tilde{O}(n)$ time. However, their algorithm is strictly sequential. We present an algorithm achieving similar bounds that can be adapted easily to the parallel environment. In the parallel model, a DFS tree can be computed from scratch using $m$ processors in expected $\tilde{O}(1)$ time [SiComp90] on an EREW PRAM, whereas the best deterministic algorithm takes $\tilde{O}(\sqrt{n})$ time [SiComp90,JAlg93] on a CRCW PRAM. Our algorithm can be used to develop optimal (up to polylog n factors) deterministic algorithms for maintaining fully dynamic DFS and fault tolerant DFS of an undirected graph.

1- Parallel Fully Dynamic DFS: Given an arbitrary online sequence of vertex/edge updates, we can maintain a DFS tree of an undirected graph in $\tilde{O}(1)$ time per update using $m$ processors on an EREW PRAM.

2- Parallel Fault tolerant DFS: An undirected graph can be preprocessed to build a data structure of size $O(m)$ such that for a set of $k$ updates (where $k$ is constant) in the graph, the updated DFS tree can be computed in $\tilde{O}(1)$ time using $n$ processors on an EREW PRAM.

Moreover, our fully dynamic DFS algorithm provides, in a seamless manner, nearly optimal (up to polylog n factors) algorithms for maintaining a DFS tree in the semi-streaming model and a restricted distributed model. These are the first parallel, semi-streaming and distributed algorithms for maintaining a DFS tree in the dynamic setting.

Comment: Accepted to appear in SPAA'17, 32 Pages, 5 Figures
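
As a point of reference for the $O(m+n)$ bound mentioned in the abstract, a sequential DFS tree can be built with an explicit stack. This is a minimal sketch of the static, sequential case only, not the parallel or dynamic algorithm of the paper; the name `dfs_tree` and the adjacency-list input are assumptions made for illustration.

```python
def dfs_tree(adj, root):
    """Return {vertex: parent} for a DFS tree of the component containing root.

    adj is assumed to map each vertex to a list of its neighbours.
    Each adjacency list is scanned once, giving O(m + n) total work.
    """
    parent = {root: None}
    # Each stack entry keeps an iterator so a vertex resumes where it left off.
    stack = [(root, iter(adj[root]))]
    while stack:
        u, neighbours = stack[-1]
        for v in neighbours:
            if v not in parent:
                parent[v] = u
                stack.append((v, iter(adj[v])))
                break
        else:
            # All neighbours of u are explored: backtrack.
            stack.pop()
    return parent


# Hypothetical small undirected graph.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
tree = dfs_tree(adj, 0)
```

In the resulting tree every non-tree edge, here (0, 2), joins a vertex to one of its ancestors, which is the defining property of a DFS tree in an undirected graph.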

arXiv.org e-Print Archive

Crossref

Correlation Clustering in Data Streams

Author: Ahn Kook Jin
Cormode Graham
Guha Sudipto
McGregor Andrew
Wirth Anthony
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2021

Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as k-center, k-median, and k-means. Such algorithms need to be both time and space efficient. In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on n nodes and the goal is to find a node-partition such that the end-points of negative-weight edges are typically in different clusters whereas the end-points of positive-weight edges are typically in the same cluster. We present polynomial-time, O(n ⋅ polylog n)-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches that allow the “quality” of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. Unfortunately, the standard LP and SDP formulations are not obviously solvable in O(n ⋅ polylog n)-space. Our work presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling.
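
To make the notion of partition "quality" concrete, one common objective is the total weight of disagreements: positive edges that are cut plus negative edges that stay inside a cluster. The sketch below evaluates that objective exactly from a full edge-weight table; it assumes this particular objective and does not use the linear sketches described in the abstract. The names `disagreements`, `weights`, and `cluster_of` are illustrative.

```python
def disagreements(weights, cluster_of):
    """Total weight of edges that the partition gets 'wrong'.

    weights maps an edge (u, v) to a signed weight;
    cluster_of maps each node to its cluster label.
    """
    cost = 0
    for (u, v), w in weights.items():
        same = cluster_of[u] == cluster_of[v]
        # A positive edge should be inside a cluster, a negative edge across.
        if (w > 0 and not same) or (w < 0 and same):
            cost += abs(w)
    return cost


# Hypothetical triangle with two positive edges and one negative edge.
weights = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): -1}
one_cluster = disagreements(weights, {"a": 0, "b": 0, "c": 0})
split_c = disagreements(weights, {"a": 0, "b": 0, "c": 1})
singletons = disagreements(weights, {"a": 0, "b": 1, "c": 2})
```

On this triangle no partition reaches zero cost, which illustrates why the problem is posed as an approximation problem.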

ScholarWorks@UMass Amherst

Warwick Research Archives Portal Repository

University of Melbourne Institutional Repository

Dynamic graph stream algorithms in o(n) space

Author: Huang Z.
Peng P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/09/2018

In this paper we study graph problems in the dynamic streaming model, where the input is defined by a sequence of edge insertions and deletions. As many natural problems require Ω(n) space, where n is the number of vertices, existing works mainly focused on designing O(n ⋅ polylog n) space algorithms. Although sublinear in the number of edges for dense graphs, it could still be too large for many applications (e.g., n is huge or the graph is sparse). In this work, we give single-pass algorithms beating this space barrier for two classes of problems. We present o(n) space algorithms for estimating the number of connected components with additive error εn of a general graph and (1+ε)-approximating the weight of the minimum spanning tree of a connected graph with bounded edge weights, for any small constant ε > 0. The latter improves upon the previous O(n ⋅ polylog n) space algorithm given by Ahn et al. (SODA 2012) for the same class of graphs. We initiate the study of approximate graph property testing in the dynamic streaming model, where we want to distinguish graphs satisfying the property from graphs that are ε-far from having the property. We consider the problem of testing k-edge connectivity, k-vertex connectivity, cycle-freeness and bipartiteness (of planar graphs), for which we provide algorithms using roughly O(n^{1−ε} ⋅ polylog n) space, which is o(n) for any constant ε. To complement our algorithms, we present Ω(n^{1−O(ε)}) space lower bounds for these problems, which show that such a dependence on ε is necessary.
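
The dynamic streaming input model can be illustrated with an exact baseline that replays insertions and deletions and then counts connected components with union-find. This sketch stores every live edge, so it uses far more than o(n) space and is not the paper's algorithm; the stream encoding (`"+"` and `"-"` operations), the name `count_components`, and the assumption of a simple graph with vertices numbered 0 to n-1 are all illustrative.

```python
def count_components(n, stream):
    """Count connected components of the graph left after the whole stream.

    stream is a sequence of (op, u, v) with op "+" for insertion and
    "-" for deletion; the graph is assumed simple (no parallel edges).
    """
    live = set()
    for op, u, v in stream:
        edge = (min(u, v), max(u, v))
        if op == "+":
            live.add(edge)
        else:
            live.discard(edge)

    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in live:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


# Hypothetical stream on 5 vertices: edge (1, 2) is inserted, then deleted.
stream = [("+", 0, 1), ("+", 1, 2), ("+", 3, 4), ("-", 1, 2)]
result = count_components(5, stream)
```

Because deletions can undo earlier insertions, the answer depends on the final edge set rather than on any prefix of the stream, which is what makes small-space algorithms in this model nontrivial.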

White Rose Research Online

Graph Processing on GPUs: A Survey

Author: He Ligang
Hua Qiang-Sheng
Jin Hai
Liu Bo
Shi Xuanhua
Zheng Zhigao
Zhou Yongluan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Copenhagen University Research Information System