5,789 research outputs found
Streaming Complexity of Spanning Tree Computation
The semi-streaming model is a variant of the streaming model frequently used for the computation of graph problems. It allows the edges of an n-node input graph to be read sequentially in p passes using Õ(n) space. If the list of edges includes deletions, then the model is called the turnstile model; otherwise it is called the insertion-only model. In both models, some graph problems, such as spanning trees, k-connectivity, densest subgraph, degeneracy, cut-sparsifier, and (Δ+1)-coloring, can be exactly solved or (1+ε)-approximated in a single pass; while other graph problems, such as triangle detection and unweighted all-pairs shortest paths, are known to require Ω̃(n) passes to compute. For many fundamental graph problems, the tractability in these models is open. In this paper, we study the tractability of computing some standard spanning trees, including BFS, DFS, and maximum-leaf spanning trees. Our results, in both the insertion-only and the turnstile models, are as follows.
Maximum-Leaf Spanning Trees: This problem is known to be APX-complete with inapproximability constant ρ ∈ [245/244, 2). By constructing an ε-MLST sparsifier, we show that for every constant ε > 0, MLST can be approximated in a single pass to within a factor of 1+ε w.h.p. (albeit in super-polynomial time for ε ≤ ρ-1 assuming P ≠ NP) and can be approximated in polynomial time in a single pass to within a factor of ρ_n+ε w.h.p., where ρ_n is the supremum constant that MLST cannot be approximated to within using polynomial time and Õ(n) space. In the insertion-only model, these algorithms can be deterministic.
BFS Trees: It is known that BFS trees require ω(1) passes to compute, but the naïve approach needs O(n) passes. We devise a new randomized algorithm that reduces the pass complexity to O(√n), and it offers a smooth tradeoff between pass complexity and space usage. This gives a polynomial separation between single-source and all-pairs shortest paths for unweighted graphs.
DFS Trees: It is unknown whether DFS trees require more than one pass. The current best algorithm by Khan and Mehta [STACS 2019] takes Õ(h) passes, where h is the height of computed DFS trees. Note that h can be as large as Ω(m/n) for n-node m-edge graphs. Our contribution is twofold. First, we provide a simple alternative proof of this result, via a new connection to sparse certificates for k-node-connectivity. Second, we present a randomized algorithm that reduces the pass complexity to O(√n), and it also offers a smooth tradeoff between pass complexity and space usage.ISSN:1868-896
Streaming Verification of Graph Properties
Streaming interactive proofs (SIPs) are a framework for outsourced
computation. A computationally limited streaming client (the verifier) hands
over a large data set to an untrusted server (the prover) in the cloud and the
two parties run a protocol to confirm the correctness of result with high
probability. SIPs are particularly interesting for problems that are hard to
solve (or even approximate) well in a streaming setting. The most notable of
these problems is finding maximum matchings, which has received intense
interest in recent years but has strong lower bounds even for constant factor
approximations.
In this paper, we present efficient streaming interactive proofs that can
verify maximum matchings exactly. Our results cover all flavors of matchings
(bipartite/non-bipartite and weighted). In addition, we also present streaming
verifiers for approximate metric TSP. In particular, these are the first
efficient results for weighted matchings and for metric TSP in any streaming
verification model.Comment: 26 pages, 2 figure, 1 tabl
Low Diameter Graph Decompositions by Approximate Distance Computation
In many models for large-scale computation, decomposition of the problem is key to efficient algorithms. For distance-related graph problems, it is often crucial that such a decomposition results in clusters of small diameter, while the probability that an edge is cut by the decomposition scales linearly with the length of the edge. There is a large body of literature on low diameter graph decomposition with small edge cutting probabilities, with all existing techniques heavily building on single source shortest paths (SSSP) computations. Unfortunately, in many theoretical models for large-scale computations, the SSSP task constitutes a complexity bottleneck. Therefore, it is desirable to replace exact SSSP computations with approximate ones. However this imposes a fundamental challenge since the existing constructions of low diameter graph decomposition with small edge cutting probabilities inherently rely on the subtractive form of the triangle inequality, which fails to hold under distance approximation.
The current paper overcomes this obstacle by developing a technique termed blurry ball growing. By combining this technique with a clever algorithmic idea of Miller et al. (SPAA 2013), we obtain a construction of low diameter decompositions with small edge cutting probabilities which replaces exact SSSP computations by (a small number of) approximate ones. The utility of our approach is showcased by deriving efficient algorithms that work in the CONGEST, PRAM, and semi-streaming models of computation. As an application, we obtain metric tree embedding algorithms in the vein of Bartal (FOCS 1996) whose computational complexities in these models are optimal up to polylogarithmic factors. Our embeddings have the additional useful property that the tree can be mapped back to the original graph such that each edge is "used" only logaritmically many times, which is of interest for capacitated problems and simulating CONGEST algorithms on the tree into which the graph is embedded
Parallel Algorithms for Geometric Graph Problems
We give algorithms for geometric graph problems in the modern parallel models
inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem
over a set of points in the two-dimensional space, our algorithm computes a
-approximate MST. Our algorithms work in a constant number of
rounds of communication, while using total space and communication proportional
to the size of the data (linear space and near linear time algorithms). In
contrast, for general graphs, achieving the same result for MST (or even
connectivity) remains a challenging open problem, despite drawing significant
attention in recent years.
We develop a general algorithmic framework that, besides MST, also applies to
Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic
framework has implications beyond the MapReduce model. For example it yields a
new algorithm for computing EMD cost in the plane in near-linear time,
. We note that while recently Sharathkumar and Agarwal
developed a near-linear time algorithm for -approximating EMD,
our algorithm is fundamentally different, and, for example, also solves the
transportation (cost) problem, raised as an open question in their work.
Furthermore, our algorithm immediately gives a -approximation
algorithm with space in the streaming-with-sorting model with
passes. As such, it is tempting to conjecture that the
parallel models may also constitute a concrete playground in the quest for
efficient algorithms for EMD (and other similar problems) in the vanilla
streaming model, a well-known open problem
Crowdsourced Live Streaming over the Cloud
Empowered by today's rich tools for media generation and distribution, and
the convenient Internet access, crowdsourced streaming generalizes the
single-source streaming paradigm by including massive contributors for a video
channel. It calls a joint optimization along the path from crowdsourcers,
through streaming servers, to the end-users to minimize the overall latency.
The dynamics of the video sources, together with the globalized request demands
and the high computation demand from each sourcer, make crowdsourced live
streaming challenging even with powerful support from modern cloud computing.
In this paper, we present a generic framework that facilitates a cost-effective
cloud service for crowdsourced live streaming. Through adaptively leasing, the
cloud servers can be provisioned in a fine granularity to accommodate
geo-distributed video crowdsourcers. We present an optimal solution to deal
with service migration among cloud instances of diverse lease prices. It also
addresses the location impact to the streaming quality. To understand the
performance of the proposed strategies in the realworld, we have built a
prototype system running over the planetlab and the Amazon/Microsoft Cloud. Our
extensive experiments demonstrate that the effectiveness of our solution in
terms of deployment cost and streaming quality
Weighted Min-Cut: Sequential, Cut-Query and Streaming Algorithms
Consider the following 2-respecting min-cut problem. Given a weighted graph
and its spanning tree , find the minimum cut among the cuts that contain
at most two edges in . This problem is an important subroutine in Karger's
celebrated randomized near-linear-time min-cut algorithm [STOC'96]. We present
a new approach for this problem which can be easily implemented in many
settings, leading to the following randomized min-cut algorithms for weighted
graphs.
* An -time sequential algorithm:
This improves Karger's and bounds when the input graph is not extremely
sparse or dense. Improvements over Karger's bounds were previously known only
under a rather strong assumption that the input graph is simple [Henzinger et
al. SODA'17; Ghaffari et al. SODA'20]. For unweighted graphs with parallel
edges, our bound can be improved to .
* An algorithm requiring cut queries to compute the min-cut of
a weighted graph: This answers an open problem by Rubinstein et al. ITCS'18,
who obtained a similar bound for simple graphs.
* A streaming algorithm that requires space and
passes to compute the min-cut: The only previous non-trivial exact min-cut
algorithm in this setting is the 2-pass -space algorithm on simple
graphs [Rubinstein et al., ITCS'18] (observed by Assadi et al. STOC'19).
In contrast to Karger's 2-respecting min-cut algorithm which deploys
sophisticated dynamic programming techniques, our approach exploits some cute
structural properties so that it only needs to compute the values of cuts corresponding to removing pairs of tree edges, an
operation that can be done quickly in many settings.Comment: Updates on this version: (1) Minor corrections in Section 5.1, 5.2;
(2) Reference to newer results by GMW SOSA21 (arXiv:2008.02060v2), DEMN
STOC21 (arXiv:2004.09129v2) and LMN 21 (arXiv:2102.06565v1
A note on the data-driven capacity of P2P networks
We consider two capacity problems in P2P networks. In the first one, the
nodes have an infinite amount of data to send and the goal is to optimally
allocate their uplink bandwidths such that the demands of every peer in terms
of receiving data rate are met. We solve this problem through a mapping from a
node-weighted graph featuring two labels per node to a max flow problem on an
edge-weighted bipartite graph. In the second problem under consideration, the
resource allocation is driven by the availability of the data resource that the
peers are interested in sharing. That is a node cannot allocate its uplink
resources unless it has data to transmit first. The problem of uplink bandwidth
allocation is then equivalent to constructing a set of directed trees in the
overlay such that the number of nodes receiving the data is maximized while the
uplink capacities of the peers are not exceeded. We show that the problem is
NP-complete, and provide a linear programming decomposition decoupling it into
a master problem and multiple slave subproblems that can be resolved in
polynomial time. We also design a heuristic algorithm in order to compute a
suboptimal solution in a reasonable time. This algorithm requires only a local
knowledge from nodes, so it should support distributed implementations.
We analyze both problems through a series of simulation experiments featuring
different network sizes and network densities. On large networks, we compare
our heuristic and its variants with a genetic algorithm and show that our
heuristic computes the better resource allocation. On smaller networks, we
contrast these performances to that of the exact algorithm and show that
resource allocation fulfilling a large part of the peer can be found, even for
hard configuration where no resources are in excess.Comment: 10 pages, technical report assisting a submissio
- …