
    Densest Subgraph in Dynamic Graph Streams

    In this paper, we consider the problem of approximating the densest subgraph in the dynamic graph stream model. In this model of computation, the input graph is defined by an arbitrary sequence of edge insertions and deletions, and the goal is to analyze properties of the resulting graph given memory that is sub-linear in the size of the stream. We present a single-pass algorithm that returns a (1+\epsilon) approximation of the maximum density with high probability; the algorithm uses O(\epsilon^{-2} n \mathrm{polylog}(n)) space, processes each stream update in \mathrm{polylog}(n) time, and uses \mathrm{poly}(n) post-processing time, where n is the number of nodes. The space used by our algorithm matches the lower bound of Bahmani et al. (PVLDB 2012) up to a poly-logarithmic factor for constant \epsilon. The best existing results for this problem were established recently by Bhattacharya et al. (STOC 2015). They presented a (2+\epsilon) approximation algorithm using similar space and another algorithm that processes each update and maintains a (4+\epsilon) approximation of the current maximum density in \mathrm{polylog}(n) time per update. Comment: To appear in MFCS 2015
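
    A common route to results of this flavor is sample-then-solve: subsample the edges, solve densest subgraph on the sample, and rescale. The sketch below illustrates only that pattern, not the paper's algorithm: plain uniform sampling stands in for the sketch-based samplers required once deletions are allowed, Charikar's greedy peeling (a 2-approximation) stands in for an exact solver, and the sampling rate p is an assumed parameter.

```python
import random
from collections import defaultdict

def peel_densest(edges):
    """Charikar's greedy peeling: repeatedly delete a minimum-degree node
    and return the best density |E(S)|/|S| seen along the way (within a
    factor 2 of the true maximum density)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes, m, best = set(adj), len(edges), 0.0
    while nodes:
        best = max(best, m / len(nodes))
        u = min(nodes, key=lambda x: len(adj[x]))
        for v in adj[u]:
            adj[v].discard(u)
        m -= len(adj[u])          # edges incident to u leave the graph
        nodes.remove(u)
        del adj[u]
    return best

def estimate_max_density(edge_stream, p):
    """Sample-then-solve: keep each edge independently with probability p,
    solve on the sample, and rescale by 1/p. Uniform sampling is a
    stand-in for the samplers needed when the stream contains deletions."""
    sample = [e for e in edge_stream if random.random() < p]
    return peel_densest(sample) / p
```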

    Communication-Optimal Distributed Dynamic Graph Clustering

    We consider the problem of clustering graph nodes over large-scale dynamic graphs, such as citation networks, images, and web networks, when graph updates such as node/edge insertions/deletions are observed distributively. We propose communication-efficient algorithms for two well-established communication models, namely the message-passing and the blackboard models. Given a graph with n nodes that is observed at s remote sites over time [1,t], the two proposed algorithms have communication costs \tilde{O}(ns) and \tilde{O}(n+s) (\tilde{O} hides a polylogarithmic factor), almost matching their lower bounds, \Omega(ns) and \Omega(n+s), respectively, in the message-passing and the blackboard models. More importantly, we prove that at each time point in [1,t] our algorithms generate clustering quality nearly as good as that of centralizing all updates up to that time and then applying a standard centralized clustering algorithm. We conducted extensive experiments on both synthetic and real-life datasets, which confirmed the communication efficiency of our approach over baseline algorithms while achieving comparable clustering results. Comment: Accepted and to appear in AAAI'19
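
    As a toy illustration of why the two models admit different costs, the accounting below contrasts s private summaries (message passing) against one cooperatively maintained public summary (blackboard). The summarize and merge callables and the one-word coordination cost are assumptions made for the sketch; this is not the paper's protocol.

```python
def message_passing(site_updates, summarize):
    """Message-passing model: every site ships its own ~n-word summary to
    the coordinator over a private channel, so total communication grows
    like n * s words for s sites."""
    words, summaries = 0, []
    for updates in site_updates:
        sketch = summarize(updates)
        summaries.append(sketch)
        words += len(sketch)            # one private message per site
    return summaries, words

def blackboard(site_updates, summarize, merge):
    """Blackboard model: messages are public, so the sites can maintain a
    single shared ~n-word summary instead of s private ones, paying only
    a constant number of coordination words per site: roughly n + s."""
    board, words = None, 0
    for updates in site_updates:
        sketch = summarize(updates)
        board = sketch if board is None else merge(board, sketch)
        words += 1                      # constant-size public post per site
    return board, words + len(board)    # plus the shared summary itself
```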

    Sublinear Estimation of Weighted Matchings in Dynamic Data Streams

    This paper presents an algorithm for estimating the weight of a maximum weighted matching by augmenting any estimation routine for the size of an unweighted matching. The algorithm is implementable in any streaming model, including dynamic graph streams. We also give the first constant-factor estimation for the maximum matching size in a dynamic graph stream for planar graphs (or any graph with bounded arboricity) using \tilde{O}(n^{4/5}) space, which also extends to weighted matching. Using previous results by Kapralov, Khanna, and Sudan (2014), we obtain a \mathrm{polylog}(n) approximation for general graphs using \mathrm{polylog}(n) space in random-order streams. In addition, we give a space lower bound of \Omega(n^{1-\varepsilon}) for any randomized algorithm estimating the size of a maximum matching up to a 1+O(\varepsilon) factor for adversarial streams.
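
    One standard way to lift an unweighted matching-size estimator to weighted graphs, in the spirit of the reduction above, is to bucket edges into geometric weight classes and sum the per-class estimates. The sketch below is that simpler reduction, with the class base as an assumed parameter; it ignores conflicts between classes, can lose a logarithmic factor, and so does not match the paper's guarantees.

```python
import math
from collections import defaultdict

def weighted_matching_estimate(weighted_edges, unweighted_estimator, base=2.0):
    """Bucket edges into geometric weight classes [base^i, base^(i+1)),
    estimate the unweighted matching size inside each class, and charge
    each class its lower weight bound. The paper's augmentation handles
    the interaction between classes more carefully than this does."""
    classes = defaultdict(list)
    for u, v, w in weighted_edges:
        classes[math.floor(math.log(w, base))].append((u, v))
    return sum(base ** i * unweighted_estimator(es) for i, es in classes.items())
```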

    Maximum Matching in Turnstile Streams

    We consider the unweighted bipartite maximum matching problem in the one-pass turnstile streaming model, where the input stream consists of edge insertions and deletions. In the insertion-only model, a one-pass 2-approximation streaming algorithm can be easily obtained with space O(n \log n), where n denotes the number of vertices of the input graph. We show that no such result is possible if edge deletions are allowed, even if space O(n^{3/2-\delta}) is granted, for every \delta > 0. Specifically, for every 0 \le \epsilon \le 1, we show that in the one-pass turnstile streaming model, in order to compute an O(n^{\epsilon})-approximation, space \Omega(n^{3/2-4\epsilon}) is required for constant-error randomized algorithms, and, up to logarithmic factors, space O(n^{2-2\epsilon}) is sufficient. Our lower bound result is proved in the simultaneous message model of communication and may be of independent interest.
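
    The insertion-only baseline mentioned above is the classic one-pass greedy maximal matching, sketched here for concreteness:

```python
def greedy_matching_size(edge_stream):
    """One-pass greedy maximal matching for insertion-only streams: keep
    an edge iff both endpoints are currently unmatched. Any maximal
    matching has at least half the maximum size, so this is the classic
    2-approximation in O(n) words of space."""
    matched, size = set(), 0
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matched.update((u, v))
            size += 1
    return size
```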

    Approximate F_2-Sketching of Valuation Functions

    We study the problem of constructing a linear sketch of minimum dimension that allows approximation of a given real-valued function f : \mathbb{F}_2^n \rightarrow \mathbb{R} with small expected squared error. We develop a general theory of linear sketching for such functions, through which we analyze their dimension for the most commonly studied types of valuation functions: additive, budget-additive, coverage, \alpha-Lipschitz submodular, and matroid rank functions. This gives a characterization of how many bits of information have to be stored about the input x so that one can compute f under additive updates to its coordinates. Our results are tight in most cases, and we also give extensions to the distributional version of the problem, where the input x \in \mathbb{F}_2^n is generated uniformly at random. Using known connections with dynamic streaming algorithms, both upper and lower bounds on dimension obtained in our work extend to the space complexity of algorithms evaluating f(x) under long sequences of additive updates to the input x presented as a stream. Similar results hold for simultaneous communication in a distributed setting.
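
    To make "linear sketch over F_2 under additive updates" concrete: such a sketch stores parities of x, and flipping one coordinate of x updates every stored parity by an XOR in constant time per parity. The random-parity construction below is a generic illustration, not one of the paper's function-specific sketches.

```python
import random

def f2_sketch(n, k, seed=0):
    """A k-dimensional F_2-linear sketch of x in F_2^n: store the k
    parities <a_j, x> mod 2 for random rows a_j. Flipping coordinate i
    of x (an additive update) XORs each stored parity with a_j[i]."""
    rng = random.Random(seed)
    rows = [[rng.randrange(2) for _ in range(n)] for _ in range(k)]
    parities = [0] * k

    def flip(i):
        for j in range(k):
            parities[j] ^= rows[j][i]

    return parities, flip
```

    In the extreme case k = 1 with the all-ones row, the sketch computes f(x) = parity(x) exactly, an example of a function whose F_2-sketching dimension is 1.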

    Graph Sketches: Sparsification, Spanners, and Subgraphs

    When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses are sketches, i.e., those based on linear projections of the data. These are applicable in many models, including various parallel, stream, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching, where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements. In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of dense subgraphs. Our main result is a sketch-based sparsifier construction: we show that \tilde{O}(n\epsilon^{-2}) random linear projections of a graph on n nodes suffice to (1+\epsilon) approximate all cut values. Similarly, we show that O(\epsilon^{-2}) linear projections suffice for (additively) approximating the fraction of induced subgraphs that match a given pattern such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches, where each projection may be chosen dependent on the outcome of earlier sketches. All of the above results immediately give rise to data stream algorithms that also apply to dynamic graph streams where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertions and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.
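
    The reason linearity buys both deletions and mergeability, in minimal form: if the graph is encoded as a vector indexed by potential edges and the synopsis is a linear map of that vector, then a deletion is just an insertion with opposite sign, and sketches of separate streams add coordinate-wise. The random ±1 projection below is a generic stand-in for the paper's structured sketches.

```python
import random

def linear_graph_sketch(num_edge_slots, k, seed=0):
    """View the graph as a vector indexed by potential edges and store k
    random +/-1 linear projections of it. Linearity is the whole point:
    a deletion is the same update with delta = -1, and the sketches of
    two streams can be merged by adding their states."""
    rng = random.Random(seed)
    proj = [[rng.choice((-1, 1)) for _ in range(num_edge_slots)]
            for _ in range(k)]
    state = [0] * k

    def update(edge_id, delta):        # delta = +1 insert, -1 delete
        for j in range(k):
            state[j] += delta * proj[j][edge_id]

    return state, update
```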

    Online Row Sampling

    Finding a small spectral approximation for a tall n \times d matrix A is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of A. Row sampling improves interpretability, saves space when A is sparse, and preserves row structure, which is especially important, for example, when A represents a graph. However, correctly sampling rows from A can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the outputted approximation [KL13, KLM+14]. Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of A one by one and immediately decide whether each row should be kept in the spectral approximation or discarded, without ever retracting these decisions. We present an extremely simple algorithm that approximates A up to multiplicative error \epsilon and additive error \delta using O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2) online samples, with memory overhead proportional to the cost of storing the spectral approximation. We also present an algorithm that uses O(d^2) memory but only requires O(d \log(\epsilon\|A\|_2/\delta)/\epsilon^2) samples, which we show is optimal. Our methods are clean and intuitive, allow for lower memory usage than prior work, and expose new theoretical properties of leverage score based matrix approximation.
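
    A rough rendering of the algorithm as the abstract describes it: score each arriving row against a ridge-regularized approximation built from the rows kept so far, keep it with probability proportional to that online leverage score, and never revisit the decision. The constant c, the exact ridge term, and the rescaling below are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np

def online_row_sample(rows, eps, delta, c=8.0, seed=0):
    """Keep each incoming row with probability proportional to its online
    ridge leverage score, computed only from rows already kept; the
    decision for each row is made once and never retracted."""
    rng = np.random.default_rng(seed)
    d = len(rows[0])
    M = (delta / eps) * np.eye(d)       # ridge-regularized running A^T A
    kept = []
    for a in map(np.asarray, rows):
        score = min(1.0, float(a @ np.linalg.solve(M, a)))  # online leverage
        p = min(1.0, c * np.log(d) * score / eps ** 2)
        if rng.random() < p:
            kept.append(a / np.sqrt(p))  # rescale so expectations match
            M += np.outer(a, a) / p
        # a discarded row is gone for good: the decision is irrevocable
    return np.array(kept)
```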