4,333 research outputs found
A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem
Many graph mining applications rely on detecting subgraphs which are
near-cliques. There exists a dichotomy between the results in the existing work
related to this problem: on the one hand the densest subgraph problem (DSP)
which maximizes the average degree over all subgraphs is solvable in polynomial
time but for many networks fails to find subgraphs which are near-cliques. On
the other hand, formulations that are geared towards finding near-cliques are
NP-hard and frequently inapproximable due to connections with the Maximum
Clique problem.
In this work, we propose a formulation which combines the best of both
worlds: it is solvable in polynomial time and finds near-cliques when the DSP
fails. Surprisingly, our formulation is a simple variation of the DSP.
Specifically, we define the triangle densest subgraph problem (TDSP): given
, find a subset of vertices such that , where is the number of triangles induced
by the set . We provide various exact and approximation algorithms which the
solve the TDSP efficiently. Furthermore, we show how our algorithms adapt to
the more general problem of maximizing the -clique average density. Finally,
we provide empirical evidence that the TDSP should be used whenever the output
of the DSP fails to output a near-clique.Comment: 42 page
Streaming Verification of Graph Properties
Streaming interactive proofs (SIPs) are a framework for outsourced
computation. A computationally limited streaming client (the verifier) hands
over a large data set to an untrusted server (the prover) in the cloud and the
two parties run a protocol to confirm the correctness of result with high
probability. SIPs are particularly interesting for problems that are hard to
solve (or even approximate) well in a streaming setting. The most notable of
these problems is finding maximum matchings, which has received intense
interest in recent years but has strong lower bounds even for constant factor
approximations.
In this paper, we present efficient streaming interactive proofs that can
verify maximum matchings exactly. Our results cover all flavors of matchings
(bipartite/non-bipartite and weighted). In addition, we also present streaming
verifiers for approximate metric TSP. In particular, these are the first
efficient results for weighted matchings and for metric TSP in any streaming
verification model.Comment: 26 pages, 2 figure, 1 tabl
Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals
Max-stable random sketches can be computed efficiently on fast streaming
positive data sets by using only sequential access to the data. They can be
used to answer point and Lp-norm queries for the signal. There is an intriguing
connection between the so-called p-stable (or sum-stable) and the max-stable
sketches. Rigorous performance guarantees through error-probability estimates
are derived and the algorithmic implementation is discussed
Robust Densest Subgraph Discovery
Dense subgraph discovery is an important primitive in graph mining, which has
a wide variety of applications in diverse domains. In the densest subgraph
problem, given an undirected graph with an edge-weight vector
, we aim to find that maximizes the density,
i.e., , where is the sum of the weights of the edges in the
subgraph induced by . Although the densest subgraph problem is one of the
most well-studied optimization problems for dense subgraph discovery, there is
an implicit strong assumption; it is assumed that the weights of all the edges
are known exactly as input. In real-world applications, there are often cases
where we have only uncertain information of the edge weights. In this study, we
provide a framework for dense subgraph discovery under the uncertainty of edge
weights. Specifically, we address such an uncertainty issue using the theory of
robust optimization. First, we formulate our fundamental problem, the robust
densest subgraph problem, and present a simple algorithm. We then formulate the
robust densest subgraph problem with sampling oracle that models dense subgraph
discovery using an edge-weight sampling oracle, and present an algorithm with a
strong theoretical performance guarantee. Computational experiments using both
synthetic graphs and popular real-world graphs demonstrate the effectiveness of
our proposed algorithms.Comment: 10 pages; Accepted to ICDM 201
Randomized Composable Core-sets for Distributed Submodular Maximization
An effective technique for solving optimization problems over massive data
sets is to partition the data into smaller pieces, solve the problem on each
piece and compute a representative solution from it, and finally obtain a
solution inside the union of the representative solutions for all pieces. This
technique can be captured via the concept of {\em composable core-sets}, and
has been recently applied to solve diversity maximization problems as well as
several clustering problems. However, for coverage and submodular maximization
problems, impossibility bounds are known for this technique \cite{IMMM14}. In
this paper, we focus on efficient construction of a randomized variant of
composable core-sets where the above idea is applied on a {\em random
clustering} of the data. We employ this technique for the coverage, monotone
and non-monotone submodular maximization problems. Our results significantly
improve upon the hardness results for non-randomized core-sets, and imply
improved results for submodular maximization in a distributed and streaming
settings.
In summary, we show that a simple greedy algorithm results in a
-approximate randomized composable core-set for submodular maximization
under a cardinality constraint. This is in contrast to a known impossibility result for (non-randomized) composable core-set. Our
result also extends to non-monotone submodular functions, and leads to the
first 2-round MapReduce-based constant-factor approximation algorithm with
total communication complexity for either monotone or non-monotone
functions. Finally, using an improved analysis technique and a new algorithm
, we present an improved -approximation algorithm
for monotone submodular maximization, which is in turn the first
MapReduce-based algorithm beating factor in a constant number of rounds
Recursive Sketching For Frequency Moments
In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to
compute (for ) in space complexity O(\mbox{\em poly-log}(n,m)\cdot
n^{1-\frac2k}), which is optimal up to (large) poly-logarithmic factors in
and , where is the length of the stream and is the upper bound on
the number of distinct elements in a stream. The best known lower bound for
large moments is . A follow-up work of
Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic
factors of Indyk and Woodruff to . Further reduction of poly-log factors has been an elusive
goal since 2006, when Indyk and Woodruff method seemed to hit a natural
"barrier." Using our simple recursive sketch, we provide a different yet simple
approach to obtain a algorithm for constant (our bound is, in fact, somewhat
stronger, where the term can be replaced by any constant number
of iterations instead of just two or three, thus approaching .
Our bound also works for non-constant (for details see the body of
the paper). Further, our algorithm requires only -wise independence, in
contrast to existing methods that use pseudo-random generators for computing
large frequency moments
Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem
In this paper, we study linear programming based approaches to the maximum
matching problem in the semi-streaming model. The semi-streaming model has
gained attention as a model for processing massive graphs as the importance of
such graphs has increased. This is a model where edges are streamed-in in an
adversarial order and we are allowed a space proportional to the number of
vertices in a graph.
In recent years, there has been several new results in this semi-streaming
model. However broad techniques such as linear programming have not been
adapted to this model. We present several techniques to adapt and optimize
linear programming based approaches in the semi-streaming model with an
application to the maximum matching problem. As a consequence, we improve
(almost) all previous results on this problem, and also prove new results on
interesting variants
- …