31 research outputs found
Space- and Time-Efficient Algorithm for Maintaining Dense Subgraphs on One-Pass Dynamic Streams
While in many graph mining applications it is crucial to handle a stream of
updates efficiently in terms of {\em both} time and space, not much was known
about achieving such type of algorithm. In this paper we study this issue for a
problem which lies at the core of many graph mining applications called {\em
densest subgraph problem}. We develop an algorithm that achieves time- and
space-efficiency for this problem simultaneously. It is one of the first of its
kind for graph problems to the best of our knowledge.
In a graph , the "density" of a subgraph induced by a subset of
nodes is defined as , where is the set of
edges in with both endpoints in . In the densest subgraph problem, the
goal is to find a subset of nodes that maximizes the density of the
corresponding induced subgraph. For any , we present a dynamic
algorithm that, with high probability, maintains a -approximation
to the densest subgraph problem under a sequence of edge insertions and
deletions in a graph with nodes. It uses space, and has an
amortized update time of and a query time of . Here,
hides a O(\poly\log_{1+\epsilon} n) term. The approximation ratio
can be improved to at the cost of increasing the query time to
. It can be extended to a -approximation
sublinear-time algorithm and a distributed-streaming algorithm. Our algorithm
is the first streaming algorithm that can maintain the densest subgraph in {\em
one pass}. The previously best algorithm in this setting required
passes [Bahmani, Kumar and Vassilvitskii, VLDB'12]. The space required by our
algorithm is tight up to a polylogarithmic factor.Comment: A preliminary version of this paper appeared in STOC 201
Densest Subgraph in Dynamic Graph Streams
In this paper, we consider the problem of approximating the densest subgraph
in the dynamic graph stream model. In this model of computation, the input
graph is defined by an arbitrary sequence of edge insertions and deletions and
the goal is to analyze properties of the resulting graph given memory that is
sub-linear in the size of the stream. We present a single-pass algorithm that
returns a approximation of the maximum density with high
probability; the algorithm uses O(\epsilon^{-2} n \polylog n) space,
processes each stream update in \polylog (n) time, and uses \poly(n)
post-processing time where is the number of nodes. The space used by our
algorithm matches the lower bound of Bahmani et al.~(PVLDB 2012) up to a
poly-logarithmic factor for constant . The best existing results for
this problem were established recently by Bhattacharya et al.~(STOC 2015). They
presented a approximation algorithm using similar space and
another algorithm that both processed each update and maintained a
approximation of the current maximum density in \polylog (n)
time per-update.Comment: To appear in MFCS 201
Robust Densest Subgraph Discovery
Dense subgraph discovery is an important primitive in graph mining, which has
a wide variety of applications in diverse domains. In the densest subgraph
problem, given an undirected graph with an edge-weight vector
, we aim to find that maximizes the density,
i.e., , where is the sum of the weights of the edges in the
subgraph induced by . Although the densest subgraph problem is one of the
most well-studied optimization problems for dense subgraph discovery, there is
an implicit strong assumption; it is assumed that the weights of all the edges
are known exactly as input. In real-world applications, there are often cases
where we have only uncertain information of the edge weights. In this study, we
provide a framework for dense subgraph discovery under the uncertainty of edge
weights. Specifically, we address such an uncertainty issue using the theory of
robust optimization. First, we formulate our fundamental problem, the robust
densest subgraph problem, and present a simple algorithm. We then formulate the
robust densest subgraph problem with sampling oracle that models dense subgraph
discovery using an edge-weight sampling oracle, and present an algorithm with a
strong theoretical performance guarantee. Computational experiments using both
synthetic graphs and popular real-world graphs demonstrate the effectiveness of
our proposed algorithms.Comment: 10 pages; Accepted to ICDM 201
Dynamic Graph Stream Algorithms in Space
In this paper we study graph problems in dynamic streaming model, where the
input is defined by a sequence of edge insertions and deletions. As many
natural problems require space, where is the number of
vertices, existing works mainly focused on designing space
algorithms. Although sublinear in the number of edges for dense graphs, it
could still be too large for many applications (e.g. is huge or the graph
is sparse). In this work, we give single-pass algorithms beating this space
barrier for two classes of problems.
We present space algorithms for estimating the number of connected
components with additive error and
-approximating the weight of minimum spanning tree, for any
small constant . The latter improves previous
space algorithm given by Ahn et al. (SODA 2012) for connected graphs with
bounded edge weights.
We initiate the study of approximate graph property testing in the dynamic
streaming model, where we want to distinguish graphs satisfying the property
from graphs that are -far from having the property. We consider
the problem of testing -edge connectivity, -vertex connectivity,
cycle-freeness and bipartiteness (of planar graphs), for which, we provide
algorithms using roughly space, which is
for any constant .
To complement our algorithms, we present space
lower bounds for these problems, which show that such a dependence on
is necessary.Comment: ICALP 201
Better Streaming Algorithms for the Maximum Coverage Problem
We study the classic NP-Hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe {1,...,n}, find the k sets that cover the most number of distinct elements. The problem can be approximated up to a factor 1-1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to 1-1/e, that use sublinear space o(mn). Our main results are: 1) Two (1-1/e-epsilon) approximation algorithms: One uses O(1/epsilon) passes and O(k/epsilon^2 polylog(m,n)) space whereas the other uses only a single pass but O(m/epsilon^2 polylog(m,n)) space. 2) We show that any approximation factor better than (1-(1-1/k)^k) in constant passes require space that is linear in m for constant k even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, (1-epsilon) approximation algorithm using O(m/epsilon^2 min(k,1/epsilon) polylog(m,n)) space.
We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the most number of distinct edges. We show that any constant approximation in constant passes requires space that is linear in N for constant k whereas O(N/epsilon^2 polylog(m,n)) space is sufficient for a (1-epsilon) approximation and arbitrary k in a single pass. For regular graphs, we show that O(k/epsilon^3 polylog(m,n)) space is sufficient for a (1-epsilon) approximation in a single pass. We generalize this to a K-epsilon approximation when the ratio between the minimum and maximum degree is bounded below by K
Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs
Finding dense subgraphs is a fundamental algorithmic tool in data mining,
community detection, and clustering. In this problem, one aims to find an
induced subgraph whose edge-to-vertex ratio is maximized.
We study the directed case of this question in the context of semi-streaming
and massively parallel algorithms. In particular, we show that it is possible
to find a approximation on randomized streams even in a single
pass by using memory on -vertex graphs. Our
result improves over prior works, which were designed for arbitrary-ordered
streams: the algorithm by Bahmani et al. (VLDB 2012) which uses
passes, and the work by Esfandiari et al. (2015) which makes one pass but uses
memory. Moreover, our techniques extend to the Massively Parallel
Computation model yielding rounds in the super-linear and rounds in the nearly-linear memory regime. This constitutes a quadratic
improvement over state-of-the-art bounds by Bahmani et al. (VLDB 2012 and WAW
2014), which require rounds even in the super-linear memory regime.
Finally, we empirically evaluate our single-pass semi-streaming algorithm on
benchmarks and show that, even on non-randomly ordered streams, the quality
of its output is essentially the same as that of Bahmani et al. (VLDB 2012)
while it is times faster on large graphs