112,494 research outputs found

    Scalable Facility Location for Massive Graphs on Pregel-like Systems

    Full text link
    We propose a new scalable algorithm for facility location. Facility location is a classic problem, where the goal is to select a subset of facilities to open, from a set of candidate facilities F , in order to serve a set of clients C. The objective is to minimize the total cost of opening facilities plus the cost of serving each client from the facility it is assigned to. In this work, we are interested in the graph setting, where the cost of serving a client from a facility is represented by the shortest-path distance on the graph. This setting allows to model natural problems arising in the Web and in social media applications. It also allows to leverage the inherent sparsity of such graphs, as the input is much smaller than the full pairwise distances between all vertices. To obtain truly scalable performance, we design a parallel algorithm that operates on clusters of shared-nothing machines. In particular, we target modern Pregel-like architectures, and we implement our algorithm on Apache Giraph. Our solution makes use of a recent result to build sketches for massive graphs, and of a fast parallel algorithm to find maximal independent sets, as building blocks. In so doing, we show how these problems can be solved on a Pregel-like architecture, and we investigate the properties of these algorithms. Extensive experimental results show that our algorithm scales gracefully to graphs with billions of edges, while obtaining values of the objective function that are competitive with a state-of-the-art sequential algorithm

    Graph sparsification for derandomizing massively parallel computation with low space

    Get PDF
    Massively Parallel Computation (MPC) is an emerging model which distills core aspects of distributed and parallel computation. It was developed as a tool to solve (typically graph) problems in systems where input is distributed over many machines with limited space. Recent work has focused on the regime in which machines have sublinear (in n, number of nodes in the input graph) space, with randomized algorithms presented for the fundamental problems of Maximal Matching and Maximal Independent Set. There are, however, no prior corresponding deterministic algorithms. A major challenge in the sublinear space setting is that the local space of each machine may be too small to store all the edges incident to a single node. To overcome this barrier we introduce a new graph sparsification technique that deterministically computes a low-degree subgraph with additional desired properties: degrees in the subgraph are sufficiently small that nodes’ neighborhoods can be stored on single machines, and solving the problem on the subgraph provides significant global progress towards solving the problem for the original input graph. Using this framework to derandomize the well-known randomized algorithm of Luby [SICOMP’86], we obtain O(log(\Delta) + loglog(n))- round deterministic MPC algorithms for solving the fundamental problems of Maximal Matching and Maximal Independent Set with O(n epsilon) space on each machine for any constant epsilon > 0. Based on the recent work of Ghaffari et al. [FOCS’18], this additive O(loglog(n)) factor is conditionally essential. These algorithms can also be shown to run in O(log(\Delta)) rounds in the closely related model of CONGESTED CLIQUE, improving upon the state-of-the-art bound of O(log^2(\Delta)) rounds by Censor-Hillel et al. [DISC’17]

    Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    Full text link
    There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature report results on much smaller graphs, and the ones for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly-available as the Graph-Based Benchmark Suite (GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 201

    On the Enumeration of all Minimal Triangulations

    Full text link
    We present an algorithm that enumerates all the minimal triangulations of a graph in incremental polynomial time. Consequently, we get an algorithm for enumerating all the proper tree decompositions, in incremental polynomial time, where "proper" means that the tree decomposition cannot be improved by removing or splitting a bag

    Distributed and Parallel Algorithms for Set Cover Problems with Small Neighborhood Covers

    Get PDF
    In this paper, we study a class of set cover problems that satisfy a special property which we call the {\em small neighborhood cover} property. This class encompasses several well-studied problems including vertex cover, interval cover, bag interval cover and tree cover. We design unified distributed and parallel algorithms that can handle any set cover problem falling under the above framework and yield constant factor approximations. These algorithms run in polylogarithmic communication rounds in the distributed setting and are in NC, in the parallel setting.Comment: Full version of FSTTCS'13 pape

    Fast Local Computation Algorithms

    Full text link
    For input xx, let F(x)F(x) denote the set of outputs that are the "legal" answers for a computational problem FF. Suppose xx and members of F(x)F(x) are so large that there is not time to read them in their entirety. We propose a model of {\em local computation algorithms} which for a given input xx, support queries by a user to values of specified locations yiy_i in a legal output y∈F(x)y \in F(x). When more than one legal output yy exists for a given xx, the local computation algorithm should output in a way that is consistent with at least one such yy. Local computation algorithms are intended to distill the common features of several concepts that have appeared in various algorithmic subfields, including local distributed computation, local algorithms, locally decodable codes, and local reconstruction. We develop a technique, based on known constructions of small sample spaces of kk-wise independent random variables and Beck's analysis in his algorithmic approach to the Lov{\'{a}}sz Local Lemma, which under certain conditions can be applied to construct local computation algorithms that run in {\em polylogarithmic} time and space. We apply this technique to maximal independent set computations, scheduling radio network broadcasts, hypergraph coloring and satisfying kk-SAT formulas.Comment: A preliminary version of this paper appeared in ICS 2011, pp. 223-23
    • …
    corecore