4,333 research outputs found

A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem

Author: Tsourakakis Charalampos E.
Publication date: 20/05/2014

Many graph mining applications rely on detecting subgraphs which are near-cliques. There exists a dichotomy between the results in the existing work related to this problem: on the one hand, the densest subgraph problem (DSP), which maximizes the average degree over all subgraphs, is solvable in polynomial time but for many networks fails to find subgraphs which are near-cliques. On the other hand, formulations that are geared towards finding near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem. In this work, we propose a formulation which combines the best of both worlds: it is solvable in polynomial time and finds near-cliques when the DSP fails. Surprisingly, our formulation is a simple variation of the DSP. Specifically, we define the triangle densest subgraph problem (TDSP): given $G(V,E)$, find a subset of vertices $S^*$ such that $\tau(S^*)=\max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number of triangles induced by the set $S$. We provide various exact and approximation algorithms which solve the TDSP efficiently. Furthermore, we show how our algorithms adapt to the more general problem of maximizing the $k$-clique average density. Finally, we provide empirical evidence that the TDSP should be used whenever the DSP fails to output a near-clique.

Comment: 42 pages
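The triangle density $\tau(S) = t(S)/|S|$ defined in the abstract above can be illustrated with a small brute-force sketch. This is not one of the paper's algorithms: it enumerates every vertex subset, so it only works on tiny graphs. The function names and the edge-list input format are assumptions made for illustration.

```python
from itertools import combinations


def triangle_count(edges, subset):
    """Count triangles in the subgraph induced by `subset`."""
    subset = set(subset)
    adj = {v: set() for v in subset}
    for u, v in edges:
        if u != v and u in subset and v in subset:
            adj[u].add(v)
            adj[v].add(u)
    # Each triangle is counted once because (a, b, c) is an ordered triple.
    return sum(
        1
        for a, b, c in combinations(sorted(subset), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def triangle_density(edges, subset):
    """tau(S) = t(S) / |S| for a non-empty vertex subset."""
    return triangle_count(edges, subset) / len(subset)


def brute_force_tdsp(edges):
    """Exhaustively search all non-empty subsets (exponential time)."""
    vertices = sorted({v for e in edges for v in e})
    best_set, best = (), 0.0
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            density = triangle_density(edges, subset)
            if density > best:
                best_set, best = subset, density
    return best_set, best


# A 4-clique on {0, 1, 2, 3} with a pendant path 3-4-5 attached.
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
print(brute_force_tdsp(example_edges))
```

On this example the 4-clique contains four triangles on four vertices, so its triangle density is 1, and adding either path vertex only lowers the ratio.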

arXiv.org e-Print Archive

CiteSeerX

Streaming Verification of Graph Properties

Author: Abdullah Amirali
Daruki Samira
Roy Chitradeep Dutta
Venkatasubramanian Suresh
Publication date: 01/01/2016

Streaming interactive proofs (SIPs) are a framework for outsourced computation. A computationally limited streaming client (the verifier) hands over a large data set to an untrusted server (the prover) in the cloud, and the two parties run a protocol to confirm the correctness of the result with high probability. SIPs are particularly interesting for problems that are hard to solve (or even approximate) well in a streaming setting. The most notable of these problems is finding maximum matchings, which has received intense interest in recent years but has strong lower bounds even for constant factor approximations. In this paper, we present efficient streaming interactive proofs that can verify maximum matchings exactly. Our results cover all flavors of matchings (bipartite/non-bipartite and weighted). In addition, we also present streaming verifiers for approximate metric TSP. In particular, these are the first efficient results for weighted matchings and for metric TSP in any streaming verification model.

Comment: 26 pages, 2 figures, 1 table

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals

Author: Stoev Stilian A.
Taqqu Murad S.
Publication date: 01/01/2010

Max-stable random sketches can be computed efficiently on fast streaming positive data sets by using only sequential access to the data. They can be used to answer point and Lp-norm queries for the signal. There is an intriguing connection between the so-called p-stable (or sum-stable) and the max-stable sketches. Rigorous performance guarantees through error-probability estimates are derived and the algorithmic implementation is discussed.
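As a rough illustration of how a max-stable sketch can estimate an Lp-norm in one sequential pass, the sketch below relies on a standard property of Fréchet random variables: if the $Z_i$ are independent standard $\alpha$-Fréchet, then $\max_i f(i) Z_i$ is distributed as $\|f\|_\alpha Z$. The construction, the median-based estimator, and all function names are assumptions made for illustration; the paper's actual estimators and guarantees may differ.

```python
import math
import random
import statistics


def max_stable_sketch(signal, alpha, size, seed=0):
    """One pass over a non-negative signal, keeping `size` running maxima.

    Assumes each index of the signal appears exactly once, so fresh
    Frechet variables can be drawn per value instead of hashing indices.
    """
    rng = random.Random(seed)
    sketch = [0.0] * size
    for value in signal:
        for j in range(size):
            # E ** (-1/alpha) is standard alpha-Frechet when E ~ Exp(1).
            z = rng.expovariate(1.0) ** (-1.0 / alpha)
            sketch[j] = max(sketch[j], value * z)
    return sketch


def estimate_norm(sketch, alpha):
    """Estimate the L_alpha norm from the sample median of the sketch.

    The median of a standard alpha-Frechet variable is (ln 2) ** (-1/alpha),
    so multiplying by (ln 2) ** (1/alpha) rescales the median to the norm.
    """
    return statistics.median(sketch) * math.log(2) ** (1.0 / alpha)


sketch = max_stable_sketch([3.0, 4.0], alpha=2.0, size=2000, seed=1)
print(estimate_norm(sketch, 2.0))  # should be close to the true L2 norm, 5
```

The estimate is random, so it only approximates the true norm; a larger sketch size tightens it.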

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

Robust Densest Subgraph Discovery

Author: Miyauchi Atsushi
Takeda Akiko
Publication date: 13/09/2018

Dense subgraph discovery is an important primitive in graph mining, which has a wide variety of applications in diverse domains. In the densest subgraph problem, given an undirected graph $G=(V,E)$ with an edge-weight vector $w=(w_e)_{e\in E}$, we aim to find $S\subseteq V$ that maximizes the density, i.e., $w(S)/|S|$, where $w(S)$ is the sum of the weights of the edges in the subgraph induced by $S$. Although the densest subgraph problem is one of the most well-studied optimization problems for dense subgraph discovery, there is an implicit strong assumption: the weights of all the edges are assumed to be known exactly as input. In real-world applications, there are often cases where we have only uncertain information about the edge weights. In this study, we provide a framework for dense subgraph discovery under the uncertainty of edge weights. Specifically, we address this uncertainty using the theory of robust optimization. First, we formulate our fundamental problem, the robust densest subgraph problem, and present a simple algorithm. We then formulate the robust densest subgraph problem with sampling oracle, which models dense subgraph discovery using an edge-weight sampling oracle, and present an algorithm with a strong theoretical performance guarantee. Computational experiments using both synthetic graphs and popular real-world graphs demonstrate the effectiveness of our proposed algorithms.

Comment: 10 pages; Accepted to ICDM 201
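To make the density objective $w(S)/|S|$ concrete, here is a minimal sketch of the classical greedy peeling heuristic for the (non-robust) weighted densest subgraph problem, which is known to give a 1/2-approximation. It is not the robust algorithm proposed in the paper, and it assumes the edge weights are known exactly. The dictionary input format and function names are illustrative assumptions.

```python
def weighted_density(weights, subset):
    """w(S) / |S|, where `weights` maps an edge (u, v) to its weight."""
    subset = set(subset)
    total = sum(w for (u, v), w in weights.items() if u in subset and v in subset)
    return total / len(subset)


def greedy_peeling(weights):
    """Repeatedly remove a minimum weighted-degree vertex; keep the best set seen."""
    vertices = {v for edge in weights for v in edge}
    adj = {v: {} for v in vertices}
    degree = {v: 0.0 for v in vertices}
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
        degree[u] += w
        degree[v] += w

    remaining = set(vertices)
    total = float(sum(weights.values()))
    best_set, best = set(remaining), total / len(remaining)
    while len(remaining) > 1:
        # Ties are broken by vertex label so the run is deterministic.
        v = min(remaining, key=lambda x: (degree[x], x))
        remaining.remove(v)
        total -= degree[v]  # degree[v] only counts edges to remaining vertices
        for u, w in adj[v].items():
            if u in remaining:
                degree[u] -= w
        density = total / len(remaining)
        if density > best:
            best_set, best = set(remaining), density
    return best_set, best


# A triangle with weight-2 edges plus a light pendant edge.
example = {(0, 1): 2, (0, 2): 2, (1, 2): 2, (2, 3): 1}
print(greedy_peeling(example))
```

In the example, peeling the pendant vertex raises the density from 7/4 to 2, which is why the triangle is returned.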

arXiv.org e-Print Archive

Crossref

Randomized Composable Core-sets for Distributed Submodular Maximization

Author: Balcan M.-F.
Bateni M.
Dean J.
Guha S.
Mirzasoleiman B.
Publication date: 22/06/2015

An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of composable core-sets, and has been recently applied to solve diversity maximization problems as well as several clustering problems. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique [IMMM14]. In this paper, we focus on efficient construction of a randomized variant of composable core-sets where the above idea is applied on a random clustering of the data. We employ this technique for the coverage, monotone and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in distributed and streaming settings. In summary, we show that a simple greedy algorithm results in a $1/3$-approximate randomized composable core-set for submodular maximization under a cardinality constraint. This is in contrast to a known $O\left(\frac{\log k}{\sqrt{k}}\right)$ impossibility result for (non-randomized) composable core-sets. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with $O(n)$ total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm $\mathsf{PseudoGreedy}$, we present an improved $0.545$-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm beating factor $1/2$ in a constant number of rounds.
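The partition, solve, and merge pattern described in the abstract above can be sketched for maximum coverage, a standard monotone submodular objective. This is a simplified illustration rather than the paper's algorithm or analysis: items are assigned to machines uniformly at random, the plain greedy algorithm selects up to $k$ items per machine, and greedy runs once more on the union. The function names, the `machines` parameter, and the tie-breaking rule are assumptions.

```python
import random


def coverage(sets, chosen):
    """Number of ground elements covered by the chosen set indices."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy(sets, candidates, k):
    """Standard greedy: pick the candidate with the largest marginal gain."""
    chosen, covered = [], set()
    pool = list(candidates)
    for _ in range(k):
        options = [c for c in pool if c not in chosen]
        if not options:
            break
        # Ties go to the smallest index to keep the run deterministic.
        best = max(options, key=lambda c: (len(sets[c] - covered), -c))
        if not sets[best] - covered:
            break  # no remaining candidate adds coverage
        chosen.append(best)
        covered |= sets[best]
    return chosen


def coreset_greedy(sets, k, machines, seed=0):
    """Random partition -> greedy core-set per part -> greedy on the union."""
    rng = random.Random(seed)
    parts = [[] for _ in range(machines)]
    for i in range(len(sets)):
        parts[rng.randrange(machines)].append(i)
    union = [i for part in parts for i in greedy(sets, part, k)]
    return greedy(sets, union, k)


# Eight disjoint sets of sizes 1..8, so the best 3 are the three largest.
example_sets, start = [], 0
for size in range(1, 9):
    example_sets.append(set(range(start, start + size)))
    start += size
picked = coreset_greedy(example_sets, k=3, machines=3)
print(sorted(picked), coverage(example_sets, picked))
```

With disjoint sets, each of the three largest sets survives its machine's local greedy step, so the merged result matches the global optimum regardless of the random partition.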

arXiv.org e-Print Archive

Crossref

Recursive Sketching For Frequency Moments

Author: Braverman Vladimir
Ostrovsky Rafail
Publication date: 11/11/2010

In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to compute $F_k$ (for $k>2$) in space complexity $O(\mathrm{polylog}(n,m)\cdot n^{1-\frac{2}{k}})$, which is optimal up to (large) poly-logarithmic factors in $n$ and $m$, where $m$ is the length of the stream and $n$ is the upper bound on the number of distinct elements in a stream. The best known lower bound for large moments is $\Omega(\log(n)\, n^{1-\frac{2}{k}})$. A follow-up work of Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic factors of Indyk and Woodruff to $O(\log^2(m)\cdot (\log n+ \log m)\cdot n^{1-\frac{2}{k}})$. Further reduction of poly-log factors has been an elusive goal since 2006, when the Indyk and Woodruff method seemed to hit a natural "barrier." Using our simple recursive sketch, we provide a different yet simple approach to obtain an $O(\log(m)\log(nm)\cdot (\log\log n)^4\cdot n^{1-\frac{2}{k}})$ algorithm for constant $\epsilon$ (our bound is, in fact, somewhat stronger, where the $(\log\log n)$ term can be replaced by any constant number of $\log$ iterations instead of just two or three, thus approaching $\log^* n$). Our bound also works for non-constant $\epsilon$ (for details see the body of the paper). Further, our algorithm requires only $4$-wise independence, in contrast to existing methods that use pseudo-random generators for computing large frequency moments.
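For readers unfamiliar with the quantity being approximated, the $k$-th frequency moment of a stream is $F_k = \sum_i f_i^k$, where $f_i$ is the number of occurrences of item $i$. The sketch below computes it exactly using memory linear in the number of distinct items, which is the baseline that sublinear-space sketches such as the one in the abstract aim to beat. It is not the recursive sketch from the paper, and the function name is an assumption.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k = sum over distinct items of (frequency ** k)."""
    counts = Counter(stream)
    return sum(f ** k for f in counts.values())


stream = [1, 2, 1, 3, 1, 2]          # frequencies: 1 -> 3, 2 -> 2, 3 -> 1
print(frequency_moment(stream, 0))   # number of distinct items
print(frequency_moment(stream, 1))   # length of the stream
print(frequency_moment(stream, 3))   # 3**3 + 2**3 + 1**3
```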

arXiv.org e-Print Archive

CiteSeerX

Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem

Author: A. McGregor
D.E.D. Vinkemeier
J. Edmonds
J. Feigenbaum
J.E. Hopcroft
K.J. Ahn
L.K. Fleischer
R. Preis
S. Eggert
S. Pettie
Y. Freund
Z. Füredi
Publication date: 01/01/2011

In this paper, we study linear programming based approaches to the maximum matching problem in the semi-streaming model. The semi-streaming model has gained attention as a model for processing massive graphs as the importance of such graphs has increased. In this model, edges are streamed in an adversarial order and we are allowed space proportional to the number of vertices in the graph. In recent years, there have been several new results in this semi-streaming model. However, broad techniques such as linear programming have not been adapted to this model. We present several techniques to adapt and optimize linear programming based approaches in the semi-streaming model with an application to the maximum matching problem. As a consequence, we improve (almost) all previous results on this problem, and also prove new results on interesting variants.
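As context for the semi-streaming model described above, the following minimal sketch shows the classical one-pass greedy baseline: keep an edge whenever both endpoints are still unmatched. It stores at most one edge per pair of vertices, so its space is proportional to the number of vertices, and the resulting maximal matching has at least half the size of a maximum matching. This is not the linear programming approach of the paper; the function name is an assumption.

```python
def greedy_streaming_matching(edge_stream):
    """Single pass over the edges, keeping a maximal matching."""
    matched = set()   # vertices already used by the matching
    matching = []
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching


# On the path 1-2-3-4 the arrival order changes the result.
print(greedy_streaming_matching([(1, 2), (2, 3), (3, 4)]))  # two edges
print(greedy_streaming_matching([(2, 3), (1, 2), (3, 4)]))  # one edge
```

The second ordering shows why an adversarial stream can hold greedy to half the optimum, which is the gap that more refined semi-streaming techniques try to close.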

arXiv.org e-Print Archive

CiteSeerX

Crossref

ScholarlyCommons@Penn